## SMP Erlang vs. Haskell vs. ML

(Didn't find this on LtU yet, apologies if it is a dupe.) While all benchmarks are questionable, I found this 3 way comparison for a given concurrency situation to be of interest.

This is a continuation of Parallel Ant Colony Optimization for TSP in Standard ML and Multi-Core Ant Colony Optimization for TSP in Haskell. Here the program has been rewritten in the programming language Erlang.

## Comment viewing options

### Not a valid comparison

The text talked about Erlang being a byte-coded virtual machine (which is incorrect in itself, it uses threaded code), but there was no mention of using HIPE (the machine code generating compiler that's part of the standard distribution).

The Erlang code is written in a Haskell-style, with lots of folds and zips, which is usually not the choice for Erlang if you need high-performance. There are no guard tests to indicate that floating point math is being done. The classic Erlang optimization is to start processes with larger heap sizes, or else you're just measuring continual GC for fast-growing processes (this may or may not apply to this benchmark).

I could keep going on, but you get the point.

### Erlang style

Thanks for the review, it is informative for an Erlang newbie to hear about appropriate Erlang style. (Were I to dream, there would be a wiki that would accumulate such style information for sundry languages, for newbies to reference.)

### The use of a dict for the

The use of a dict for the matricies is a bit odd also. Rewriting the erlang program to use ets to store the matricies results in a ~2x speedup on my single processor machine. Specifically, timer:tc(ant,parallelMain,[]) went from around 62 seconds to about 27 seconds.

It would be interesting to see what an even more optimized version of the erlang program would do.

### Erlang implementation

I realize this is a year-and-a-half-old thread; I just found it here. I responded to the HiPE compilation suggestion in a later blog post:

Basically HiPE didn't speed up the code at the time, isn't compatible with SMP, and (June 2008) isn't available in the binary from Debian apt-get.

I do have a basic question about Erlang. If it isn't a byte-coded virtual machine, what are the .beam files produced by erlc? I assumed they were machine-independent byte-code files, like Java .class files. The erl interpreter would be similar to the java interpreter (except for no command-line interpreter). erlc +native would produce machine code (but on a per-function basis, not a standalone executable). Are .beam files portable across machine architectures like .class files? The documentation talks about copying .beam files to different machines, but I don't see mention of being concerned about different architectures.

### HiPE hype

Yes, I too have been less than impressed with HiPE. (OTOH, I've been really impressed with recent performance improvements (>6.8) in GHC. The compiler itself is a lot snappier, and my programs run a bit faster as well :-)

I'm not sure exactly what James was getting at. Bytecode tends to be a pretty broad term, whereas threaded code is more specific, but to say that threaded code is not bytecode would seem to be a mistake. (hmm, if you take threaded code to mean concrete procedure addresses as in Jones Forth (which is an interpreter, not a compiler, despite the headline), which seems to be one case where "threaded code" is not "bytecode" by most accounts. But my impression is that Erlang's beam format is largely backward-compatible, and can be shipped to OTP nodes running on different architectures on a single OTP cluster, so unless Erlang's loader translates a version-independent bytecode into direct threaded code, this definition doesn't apply)

But the latter portion of the original post is absolutely right: you need to understand what your implementation does well and does poorly. Most abstractions are leaky in some sense...

### the secret byte arrays

A very quick look at the code gave me the feeling that the destructive byte arrays in the hipe_bifs module might be a good candidate. They only handle normal ints (not floats or bignums), but as long as that's adequate, they're pretty fast. I don't have time to try it, though, and one may dismiss it as cheating. (:

Use of the byte arrays has been demonstrated in the debian shootout.

### Use of the byte arrays

Use of the byte arrays has been demonstrated in the ... and when we get it right the program works!

### What might be a valid concurrency comparison?

On a single processor system there seem to be a wonderful variety of "concurrency" implementations across programming languages - from roll-your-own C++ through roll-your-own Clean to batteries included Erlang.

Given the variety, what might be a valid concurrency comparison?