The Trouble with Erlang

Tony Arcieri, author of the Reia Ruby-like language for the Erlang BEAM platform, wrote a piece in July, The Trouble with Erlang (or Erlang is a ghetto), bringing together a long laundry list of complaints about Erlang and the concepts behind it, and arguing at the end that Clojure now provides a better basis for parallel programming in practice.

While the complaints include many points about syntax, data types, and the like, the heart of the critique is two-fold: first, that Erlang has terrible problems managing memory and does not scale as advertised, and second, that these failures partly follow from "Erlang hat[ing] state. It especially hates shared state." He points to the Goetz and Click argument in Concurrency Revolution From a Hardware Perspective (2010) that local state is compatible with the Actors model. He further argues that SSA as it is used in Erlang is less safe than local state.

Interesting but possibly opinionated

I read this post with interest when it came out; it comes from someone who has spent considerable time inside the Erlang community -- I quite like some of the ideas behind Reia, and it's a shame it's been dropped.

I would warn, however, that some of the arguments actually seem quite subjective, even those that seem to come out of sensible experiments. The claim that Erlang doesn't scale so well in practice must be balanced against the results of the thesis Characterizing the Scalability of Erlang VM on Many-core Processors by Jianrong Zhang, 2011:

[..] This thesis presents a study on the scalability of the Erlang VM on a many-core processor with 64 cores, TILEPro64. The purpose is to study the implementation of parallel Erlang VM, investigate its performance, identify bottlenecks and provide optimization suggestions. To achieve this goal, the VM is tested with some benchmark programs. Then discovered problems are examined more closely with methods such as profiling and tracing. The results show that the current version of Erlang VM achieves good scalability on the processor with most benchmarks used. The maximum speedup is from about 40 to 50 on 60 cores. [..]

I'm interested in work on mixing a message-passing model with local or shared state; obviously the Erlang system has been optimized more for distributed computation than for single many-core machines. But it's clearly too soon to claim that the no-shared-state principle of Erlang is becoming a major bottleneck in general -- of course, the absence of shared state is already a killer for some specific kinds of parallelism. And the implementation may also improve over time, as those systems become more common and get onto the implementors' radar.
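To put the quoted figures in perspective, here is a small Python sketch of the arithmetic. The function name `parallel_efficiency` is my own; the only inputs taken from the abstract are the speedup range (about 40 to 50) and the 60 cores.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear speedup actually achieved."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup / cores


# Figures quoted from the thesis abstract: speedup of about 40 to 50 on 60 cores.
for speedup in (40, 50):
    eff = parallel_efficiency(speedup, 60)
    print(f"speedup {speedup} on 60 cores -> efficiency {eff:.2f}")
```

That works out to roughly two thirds to five sixths of perfectly linear scaling on those benchmarks, which is short of linear but hardly a collapse.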

This post came shortly after

This post came shortly after Tony was "shut down" on the Erlang mailing list when he discussed the addition of Ruby-like blocks to Erlang. I think a number of his points only stand up to criticism if you think Erlang is a general purpose language, which most people I know would argue it isn't.

Emotional context

I was aware of the suppositions around the emotional context of this post -- I read the discussion on the Erlang mailing list, but unfortunately the signal/troll ratio is relatively low, even if there are some interesting bits, and some seriously funny things such as the Ruby 'ennnnd' proposal discussion.

I'm unsure, however, that this emotional context is actually helpful for evaluating this post and getting the most out of it. No harm done, I trust LtU to keep a calm and considerate discussion; we are more easily victims -- and perpetrators -- of long, heated, partly-off-topic threads than of simple trolling.

True enough. One thing that

True enough.

One thing that stuck with me from his post is mutability. The argument seems to be that copying data between processes is good (so one cannot affect the other indirectly), but a process should be allowed to mutate its own data. But doesn't that go against what decades of programming have taught us? Mutability is difficult to do properly, even in serial programs.

The problem with mutability

The problem with mutability occurs when objects are shared. If mutability is adequately encapsulated (e.g., by not creating aliases to mutable objects), then it's not that bad. In this case, encapsulating a mutable object within a process is definitely a good thing, though you might still run into trouble depending on how the object is used within the process.
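As a rough illustration of encapsulated mutability, here is a minimal Python sketch, not Erlang and not anything from the blog post. The class `CounterActor` and its message names are hypothetical. The thread owns a mutable dict that is never aliased outside it, payloads are copied on send, and replies are snapshots.

```python
import copy
import queue
import threading


class CounterActor:
    """A thread that owns private mutable state; others only send messages."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, op, payload=None, reply_to=None):
        # Copy the payload on send, so the sender cannot later mutate
        # what the actor sees. The reply queue is shared on purpose.
        self._mailbox.put((op, copy.deepcopy(payload), reply_to))

    def _run(self):
        counts = {}  # mutable, but never aliased outside this thread
        while True:
            op, payload, reply_to = self._mailbox.get()
            if op == "add":
                for word in payload:
                    counts[word] = counts.get(word, 0) + 1
            elif op == "get":
                # Reply with a copy: the caller gets a snapshot, not an alias.
                reply_to.put(dict(counts))
            elif op == "stop":
                break


actor = CounterActor()
words = ["a", "b", "a"]
actor.send("add", words)
words.append("c")  # too late: the actor already holds its own copy

reply = queue.Queue()
actor.send("get", reply_to=reply)
snapshot = reply.get(timeout=1)
print(snapshot)  # {'a': 2, 'b': 1}
actor.send("stop")
```

The mutation inside `_run` is invisible to everyone else, which is the encapsulation being argued for; the copies at the boundary are the price paid for it.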

Hmm...

Well, all true, but having read the blog post in question, his complaint is really not with immutability per se, but rather with the ambiguity of binding and matching in Erlang (due to its roots in Prolog). The example he gives is something like:

(Foo, Bar) <- blah...
... later ...
(Baz, Bar) <- blah...

The third line intends to shadow Bar, but in Erlang it will attempt a match of the previously bound Bar with the right-hand side. A mismatch will result in a run-time error, which is of course a terrible time to find out that you re-used a variable name.

I agree with the blog post that this is probably the wrong default, and I can see why errors of this sort would be easy to make and very frustrating. But in my view it has essentially nothing to do with immutability.
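To make the bind-or-match rule concrete, here is a small Python model of it. This is a sketch of the semantics as I understand them, not Erlang code; `match` and `BadMatch` are names I made up, and in real Erlang the failing case surfaces as a `badmatch` error at run time.

```python
class BadMatch(Exception):
    """Raised when a bound name is matched against a different value."""


def match(env: dict, names: tuple, values: tuple) -> dict:
    """Match a tuple of variable names against a tuple of values.

    Unbound names are bound; already-bound names must equal the new value.
    Returns a new environment and leaves `env` unchanged.
    """
    if len(names) != len(values):
        raise BadMatch(f"arity mismatch: {names} vs {values}")
    new_env = dict(env)
    for name, value in zip(names, values):
        if name in new_env and new_env[name] != value:
            raise BadMatch(
                f"{name} is bound to {new_env[name]!r}, got {value!r}"
            )
        new_env[name] = value
    return new_env


env = match({}, ("Foo", "Bar"), (1, 2))   # binds Foo=1, Bar=2
env = match(env, ("Baz", "Bar"), (3, 2))  # fine: Bar already equals 2
try:
    match(env, ("Qux", "Bar"), (5, 4))    # Bar is 2, not 4
except BadMatch as err:
    print("run-time failure:", err)
```

Note that the second call succeeds silently because the values happen to agree, so the accidental reuse of a name only shows up when the data differs.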

I know what this behavior

I know what this behavior buys you in Prolog, but I don't recall if it buys you anything in Erlang. Does it?

As far as I can tell, it

As far as I can tell, it doesn't particularly buy you anything. I'm not aware of any use in the single-assignment syntax fragment. It's somewhat useful in conjunction with case statements/pattern matching, but that can be dealt with using guards with a smallish additional syntactic burden. I don't see how the benefits are enough to justify the behavior, even ignoring the problems that Matt pointed out.

Erlang's syntax has its moments of brilliance; I especially like the single-assignment flavor and its interactions with code branches. But on the whole it feels a little too quirky; I think you could go a long way towards improving Erlang syntax by getting rid of this particular behavior, making the statement separators more uniform, and making the anonymous function syntax a bit more lightweight. (Though Erlang's anonymous functions aren't terribly worse than ML's or Lisp's: I suppose I've been spoiled by Haskell.)

Thanks. That's pretty much

Thanks. That's pretty much what I thought.

see LFE?

LFE (Lisp Flavoured Erlang)? Maybe less bad?

LFE

I've heard of LFE, but I've never really tried it out. As far as I know, it isn't very widely used.

general-purpose

I'm not an Erlang expert (I've never even used it), but I've read a few things about it and I had no idea that it wasn't meant to be a general-purpose language. I mention that just as a data point, without any particular agenda.

General purpose, a definition

"General purpose" seems to have come to mean "supports, or at least tolerates, my favorite programming style." It doesn't seem to be related to any combined meaning of the words "general" and "purpose" any longer.

Or perhaps I'm just bitter.

General Purpose Blub

I'm a bit curious. Do you know what gives you this impression?

I've never thought general purpose has anything to do with programming style. To me, it is simply the counterpart of 'domain specific', i.e. a language is general purpose to the extent it supports solving multi-domain problems.

I do set a high bar for effective 'general purpose', but only because I've seen a lot of problem domains - ad-hoc sensors, actuators, foreign services, HCI, data-fusion, orchestration, workflows, embedded, hard-realtime, distributed, continuous. And I have learned there are a lot of general 'end-to-end' requirements that cross arbitrary domains, such as resource management, job control, concurrency, consistency, liveness, safety, security, modularity, extensibility, resilience, graceful degradation, scalability, maintenance and upgrade.

General purpose, to me, is a relative metric. Your Blub is a general purpose language, but might be weak compared to Paul Graham's Blub, which in turn is probably weak compared to Adam Chlipala's Blub.

By general, you must understand we don't mean general

I've always supposed that "general purpose" does not cover the requirements of systems programming, which covers a fair number of your problem domains.

Here are a few opposed pairs; by posting them I probably risk making the discussion too subjective:

  1. Fortran is, but APL isn't
  2. Haskell is, but FP isn't
  3. C is, but assembler isn't
  4. Perl 5 is, but Awk isn't

General : no weaker than necessary outside the comfort zone

We could say that a language is "general" if it doesn't degrade too much outside its comfort zone (or niche); we could define "not too much" as "not worse than what most others do". For a given domain, you should not compare it only to the niche languages -- otherwise no language would meet those requirements -- but to other languages that are "average" in this domain.

For example, Java is clearly not an ideal choice for writing a compiler, and was never intended to be. But it's not "much worse" than, say, C++ or Python for this purpose. (Well-written) Perl is bearable for writing a scientific application, while Awk would be truly horrible. Assembler used to be a "general purpose language" when there was nothing else in most domains; now nobody would agree to use it for most tasks. TeX is not general purpose.

Multi-domain programming

What a strange position you take. Consider: integer arithmetic is a "requirement of systems programming". Would you say, therefore, that "general purpose" does not cover integer arithmetic?

The requirements I presented above are cross-cutting. They are requirements of systems programming, but not specific to systems programming.

As I understand it, "general purpose" should ideally support the cross-cutting requirements and constraints, without becoming specialized or constrained to any particular domain. This allows our applications to compose multiple problem domains.

Would it be fair then to say that

Would it be fair then to say that RDP will be the first general purpose programming language?

No

First, Reactive Demand Programming (RDP) is not a language; it's a programming model. Second, RDP isn't the first general purpose programming model, just the best to date. ;)

General Purpose is not a binary property. It's a continuum of Blub.

Really, really general purpose programming language

Consider: integer arithmetic is a "requirement of systems programming".

I think you must be aware that you are taking my remark in a way I didn't intend. To put it another way, a PL does not need to be well-suited to systems programming to be a general purpose programming language.

Here, this is a weak sense of "general purpose", but I think it is just how the idiom is used. One is stuck with what such phrases do mean, not with what they ought to mean. You could come up with another term for what you do mean. Of course, the field has too many such terms already.

ill suited for every purpose

a PL does not need to be well-suited to systems programming to be a general purpose programming language

A PL does not need to be well-suited to [PICK ANY PURPOSE] to be a general purpose programming language. We could call Brainfuck a (degenerate) general purpose programming language: it is Turing complete, and clearly does not accommodate requirements specific to any domain. You seem to be picking on 'systems programming' for no good reason that I can discern.

Yet, there remains a question of quality, fitness, suitability. General purpose encompasses every purpose. A high-quality general purpose programming language will be effective (even if not ideal) for [PICK ANY PURPOSE], including systems programming, and including purposes that cross domains.

There is very little question of Brainfuck's 'quality' in this role.

Scalability requires massive concurrency, not just parallelism

The issue is that scalability requires massive concurrency, not just parallelism as in the lambda calculus. Change has to be encapsulated and managed locally. Non-local participants must communicate their requests using messages. These requirements led to the development of the Actor Model; see Actor Model of Computation: Scalable Robust Information Systems.

Blogs feature weak arguments

Something that is generally true, but it made me groan when I clicked on the link. The argument about memory management seems particularly weak. I've never used Erlang and so I won't defend its memory management at all, but a comparison to a single data-point from one benchmark on another system, running on custom hardware, is quite a poor comparison.

The complaint about the lack of zero-copy also slightly contradicts the main thesis. Sharing memory under the hood will improve performance when processes are really threads on a single processor. But it will introduce a hidden scaling problem when the same code is reused on a larger array of processors and sharing is no longer a viable option. Given that the main thrust of the argument is a rant that Erlang doesn't meet its own goals for scaling, this is a little counter-productive.

The last comment that the blog prompts is quite a general question: does any language / architecture combination provide shared state to the programmer and meet the performance goal of N times faster on N processors for a reasonably large value of N?

The last comment that the

The last comment that the blog prompts is quite a general question: does any language / architecture combination provide shared state to the programmer and meet the performance goal of N times faster on N processors for a reasonably large value of N?

Does any language / architecture combination at all, with or without shared state, scale linearly? Immutability is not a silver bullet. The closest I can think of are GPU languages, and they most definitely are built around the notion of a restricted memory model; even sharing immutable data has a cost!

Exactly

That is a better question to ask as it is slightly weaker. The main complaint in the blog seemed to be that Erlang doesn't scale linearly, but I've never come across a system that does. Even applications that are supposed to be embarrassingly parallel fail to scale linearly if you have to distribute (enough) immutable data between the processors. Ray-tracing is the classic example as it requires a cache for the rays, and the performance of the distributed cache dominates the performance of the rest of the system.
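One textbook way to see why linear scaling is so rare is Amdahl's law, which neither the blog post nor this thread invokes by name, so treat this Python sketch as my own framing. The 1% serial fraction is an illustrative assumption, not a measurement; think of it as standing in for whatever cannot be parallelized, such as distributing data between processors.

```python
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Speedup predicted by Amdahl's law for a given serial fraction."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


# Assumed serial fraction of 1%, purely for illustration.
for n in (1, 8, 64, 1024):
    print(f"{n:5d} workers -> speedup {amdahl_speedup(0.01, n):.1f}")
```

Under that assumption the speedup can never exceed 100 no matter how many processors are added, and the model says nothing about whether the language has shared state or not.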

Then I don't quite get your

Then I don't quite get your criticism. I have yet to see a functional (fun) programming language that scales even better than an imperative language, let alone one optimized for parallelization. Advocates of fun languages have pushed the concept that we will eventually get great scaling via immutability, but no one has really delivered yet. It turns out that scaling is much more than just a memory coherency problem, so why beat on Erlang about this point?

Because there wasn't one there.

I think you are trying to read something critically that was not a criticism. The original blog posting labored the point that Erlang fails to scale linearly. I was pointing out that the problem is not limited to Erlang: as far as I know there are no languages (imperative, logical or functional) that scale in such a way. It could be that avoiding shared state helps, but it certainly is not a complete solution.

Got it. Arguments of the

Got it. Arguments of the form "but it doesn't do X" always confuse me when nothing else happens to do "X" either.

Could be

because I always like to state my arguments backwards. Helps exercise the reader :)

Erlang's memory management

Quite a lot of work has gone into Erlang's memory management, but rather than making speed a top priority, the Erlang VM goes to great lengths to fight fragmentation in non-stop systems. As Erlang nodes in commercial systems have been known to run for years without restarting, this is crucial.

To this end, the memory management subsystem is pluggable, so that you can use your own allocator, and quite a number of parameters can be tuned and inspected at runtime. Memory "carriers" are selected based on data type and persistency expectations (e.g. data in the built-in hash tables are kept separate from the process heaps, since they tend to be more long-lived). Incremental adjustments are made to move towards NUMA for better scalability.

I'm sure that more could be done. The source is available on GitHub, and the Erlang/OTP team is quite receptive to patches. Not that many patch submissions fall in this category, though - presumably because there is little low-hanging fruit...

Does the post only feature weak arguments?

I was not impressed by many of the criticisms that Arcieri made, but I thought they had interest even so because of who made them.

I was impressed by his "Single assignment is just as problematic as destructive assignment" point, in the sense that it made me think and continues to do so. Now, while the behaviour Arcieri describes might be said to fit Erlang's "better to fail than risk doing the wrong thing" approach, I don't see any argument that Erlang's existing behaviour is better than the well-established restriction of destructive assignment to local variables. The question is, does SSA still look like a good thing for programmer-facing languages?

The point about zero copy and process boundaries reminds me of all the hard, hard problems that are so much more tractable with linear naming.

It seems like that to me

It seems to me as if he never wanted a functional programming language in the first place, considering that he was implementing something akin to Ruby on top of Erlang. In an interview in 2008, he made several statements to that effect. So I think that most of his frustration with Erlang stems from his desire for Erlang to be like Ruby, Python or Perl, but with an "Actor model." I think that his complaints about it greatly reflect his different language design priorities.

It really is not very different from a hypothetical scenario in which I went onto the Erlang mailing lists and complained that Erlang doesn't have strong static type checking. (I like Dialyzer, but to me the lack of static typing is the biggest weakness for large-scale development.) Such a feature obviously isn't something that Erlang was geared towards in the first place. (If it were, I'm sure that they would have borrowed from ML directly.) The fact that they made some steps towards addressing my scenario just means that they to some extent share my concerns and priorities, whereas they don't want Erlang to have more state. (And I'm glad that the Erlang designers were opinionated enough not to cave in to mainstream pressure for more mutability.)

His expectations of what is "general purpose" and expectation of linear scaling really don't hold for any programming language or concurrency framework either, so I think they are both mostly unfair.

Even though I disagree with basically all of his major points (or at least with the way they were phrased), I found them somewhat interesting, just coming from him in general.

Full disclosure: I'm not as experienced with Erlang as Mr. Arcieri is; I have written an email client in it, though.