OCaml vs. C++ for Dynamic Programming

Some might want to check this /. thread,

On one particular problem instance (a garden of size 7x3), my C++ implementation finished in 1 second, while the OCaml implementation was still running after 16 minutes. Bear in mind that my OCaml implementation was dramatically faster than my equivalent Haskell code... If you write OCaml code that is isomorphic to C code, it will be fast---what about if you use OCaml the way it was meant to be used?

My critical view of this sort of thing is well known, and I don't want another holy war (the Knuth thread is enough for my taste), but I guess we should check comparisons of this sort from time to time, if only to keep everyone honest (including ourselves).

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Decent Thread

From reading the thread, I'd say the issue has been addressed well: the O'Caml implementation is rather naïve, and one can do considerably better, even staying within the functional paradigm apart from the memoization implementation. Other comments, of course, go farther: O'Caml features state/mutation/iteration for a reason, so use them when it makes sense to use them. The latter perspective seems to be gaining prominence, which should make Peter Van Roy happy. :-) And I agree: I'm not persuaded that paradigm-purity helps anyone; the challenge is to integrate multiple concepts in as orthogonal a way as possible, which is why Oz rocks.

And sorry again about the Knuth thing...

I agree

with Paul Snively. I don't normally post to /., but I did that time.

What annoys me about this topic is that it's not at all a valid comparison of languages. One implementation of one problem by one programmer who admits he doesn't know much about Ocaml serves to pass judgement on entire paradigm? Even if the implementation was flawless (it wasn't), one example isn't proof of anything (except maybe an existance proof- OK, I accept that it is possible to write an Ocaml program that is slower than some other C++ program. This doesn't mean much). And it gets posted to the largest technical blog in existance.

The Great Computer Language Shootout has it's shortcommings, and it's results (both good and bad) should be taken with a very large grain of salt. But at least it's trying to be meaningfull, with dozens of tests.

'It' annoys us all

And by it I mean the lack of reliable references when it comes to comparing language performances.

As long as the language designers/compiler writers themselves don't undertake some of this work people will always be conducting some of this benchmarking themselves and cause some poor soul to pass up on O'Caml because he saw a headline on /. that said it was way slower than C++.

I know that benchmarking is very complicated but not having the ideal, complete answer is worse than having no answer in this situation. I admire the attempt by The Language Shootout, but I've always expected the more qualified examples to be demonstrated by Sun/IBM, etc. themselves as to how Java is within x% of C++.

I just hope that this was indeed done and I've just missed it.

OCaml list responses

There is quite an active thread in response on the OCaml list, that I just joined.

( sorry no URL; the OCaml site seems to be having technical difficulties right now.)

Google Groups

Google Groups carries the OCaml list. And the OCaml site seemed to be down because they were doing an upgrade (looks nice!).

This posting from Jacques Garrigue is most informative

http://groups-beta.google.com/group/fa.caml/msg/49a8e8135eb9ca1a

An almost completely functional implementation performs about the same as the C++ implementation.

I think the guy actually has a point.

Though it's a really well-hidden one, that nobody (including him) has noticed. In the general case, purely functional programming, strictly speaking, is slower than imperative programming. At least, Chris Okasaki does say as much early on in his book.

The key, however, is that the data structures used in a purely functional program do more than the imperative equivalents. Following Okasaki once more, purely functional data structures are persisent; conceptually speaking, when you "put" a new binding into a purely functional map, you create a whole new map that has all the bindings as the old one, plus the one you added. Conceptually, both of those maps exist forever, so you can always refer back to the old "states" of the map. (In practice, most of the storage is shared between the maps, and garbage collection ensures that any state that can't be referred to any longer by the program is cleaned up.)

Now, a fair comparison between a purely functional program and an imperative one would involve some algorithm that exploits persistent data structures; for example, one that makes heavy use of backtracking. A naïve imperative program would probably do very badly in time, memory, or both (think of e.g. copying a whole hashtable every time you want a new state you can backtrack from). A good one, in effect, would have to implement a bunch of functionality for its data structures that comes for free in the implementation of the purely functional language.

And more...

More than that, the purely functional program often does even more than that. I know that when I write equivalent programs in, say, Haskell and C, the Haskell version often handles more "bells and whistles" and corner cases than the C version, simply because it's easy to make it do that. And, of course, the Haskell version is invaribaly segfault-proof and buffer overrun-proof.

Next time this comparison is done for a non-trivial program, it should be taken into account that pure programs can be far, far more robust than their impure equivalents.

That one's really because C i

That one's really because C is weakly typed. A program in a strongly typed imperative language shouldn't crash, either.

Anyway, the comparison I have in mind is between a correct C/C++ progam and a purely functional equivalent in OCaml, and ignores the effort and pitfalls involved in implementing the first of those...

Weak typing vs. crashing

What do you mean by weak typing, and what you would qualify as a strongly typed imperative language?

Automatic memory management and run-time array bounds checking would seem to have more to do with eliminating the possibility of crashes and overruns than types...with the exception of fixed size arrays, which are (inadequately) provided by the C type system, but not most type systems in functional languages.

"Well-typed programs don’t go wrong"

I think that meant: programs in stronger typed languages go wrong less frequently than those in weaker typed PLs. To reach the absolute guarantee of not going wrong, one has to run the program - the ultimate type of the program is the program itself.

I believe that the whole history of type systems in CS (and SE) is a quest for the best trade-off between complexity (both computational and conceptual) and the provided guarantees.

In SE, there is no such thing as a perfect universal trade-off, so for me, the ideal type system has to provide the whole spectrum of possibilities: from asserting just a kind of the term (like it's a function) to algebraic types to dependent types to behavioral types to a program as a type.

Obviously

And many of the common sources of errors in C can be framed as type errors.

My comment was more of a nitpick because when people talk about C as "weakly typed", they almost never refer to the most common kinds of errors which are present in what are also present in what are usually referred to as "strongly typed" imperative languages.

Those "strongly typed" languages are usually only slightly safer than C in practice, and hardly "crash-proof" compared to it.

Eat your spinach

when people talk about C as "weakly typed", they almost never refer to the most common kinds of errors which are present in what are also present in what are usually referred to as "strongly typed" imperative languages.

What kind of errors are you thinking of?

The one that concerns me most is the fact that I can set, eg. an int pointer at an arbitrary place in memory (via pointer arithmetic) and alter or use it as an int without knowing if what was put there was an int.

In this view, C is "weakly typed" because it has types, but does not enforce them at this basic level.

Well-typed programs don’t go wrong!

If you compare C to a strongly-typed language with dynamic types (like Lisp) or with implicit null-values (like Java), then of course you open the door to all kinds of NullPointerExceptions.

Take ML or Haskell with their sum types: failure is an explicit part of the type (e.g. in the "option" type), so you *have* to treat all cases.
case (call function) of
NONE => error
| SOME (nice value) => process value

Errors can't sneak in.

Oops

This is supposed to be a reply to Ville-Pertti Keinonen.

What do you mean by weak typi

What do you mean by weak typing, and what you would qualify as a strongly typed imperative language?

A language is strongly typed if its implementations can guarantee that no undetected type errors occur at runtime. A correct implementation of a strong statically typed language like OCaml can guarantee that every piece of compiled code treats every item of data as the correct type; you can't mistakenly store a value as an int, then read it back as if it were a float. A correct implementation of a strong dynamically typed language doesn't guarantee that, but it guarantees that if such a condition does arise at runtime, it will be signaled immediately as a runtime type error.

In a language like C, a correct implementation doesn't guarantee either. Your program can suffer from a type error that's never detected at all, or which has runtime side-effects that are not obvious at all.

Automatic memory management and run-time array bounds checking would seem to have more to do with eliminating the possibility of crashes and overruns than types...

Well, those are two of the key technologies used to enforce strong typing.

Sounds more like soundness

I constantly keep forgetting that "strong/weak typing" really has nothing to do with the descriptive strength of the type systems. Isn't it an unfortunate choice of the term? Or am I the only person confused by it?

I need to re-read Type Systems to restore my understanding.

'Weakly typed" is informal

I need to re-read Type Systems to restore my understanding.

The only catch is that Cardelli uses a slightly different, more formal terminology.

He uses the term "unsafe" for the property that we more colloquially call "weakly typed".

Unfortunately, that terminology is likely to start even worse flamewars with C enthusiasts. ;-)

Thanks for the clarification

Historically the attacks on C for being "weakly typed" compared to other imperative languages have often come from Pascal programmers, and terms were interpreted accordingly... Thus I was assuming that by saying C was "weakly typed" you were referring to the language allowing unsafe casts.

Personally I agree with your definition of strong/static typing (and avoid the term "weak typing" altogether) but they're often interpreted differently. Perhaps the most common modern example is Java being generally considered statically and strongly typed, but with null pointers and type narrowing casts which may implicitly generate run-time errors, that doesn't sound like real static typing...

Note that you can violate the type system in OCaml, as well, it just isn't strictly accepted programming practice (although it's reasonably common, and I sometimes do it myself).

OCaml type violation

Note that you can violate the type system in OCaml, as well, it just isn't strictly accepted programming practice

Interesting. Can you give me an example?

I just started using OCaml, so I'm still exploring the boundaries of the language. I'd like to know more about this.

Here are a few


# external evil : float -> int = "%identity";;
external evil : float -> int = "%identity"
# evil 1.0;;
- : int = 2852888
# (Obj.magic 1.0 : int);;
- : int = 2832240
# String.create 10;;
- : string = "\001\000\000\000\000\000\000\000\000\000"

Note that only the last is a documented interface, and what is happening is that strings are uninitialized, so when using String.create, you are effectively accessing arbitrary data as a string...

Forgot the worst example...

Since this is both documented and unsafe:

# (Marshal.from_string (Marshal.to_string [ "foo"; "bar" ] []) 0 : int list);;
- : int list = [2847668; 2847688]

Thanks!

Thanks for the examples: very interesting.

Coincidentally, I was just thinking the other day about the problems of type-safe marshalling across systems (In the context of a hypothetical statically-typed Oz) and wondering why more systems don't do it.

Relative Performance

In the general case, purely functional programming, strictly speaking, is slower than imperative programming.

I don't think this was the guy's point, or what people are reacting negatively to in the thread.

As the popularity of Python, Java, etc. has shown, people are willing to tolerate small constant factors of reduced performance relative to C for most applications if the ease of development and maintenance are worth it.

This is the kind of slow-down trade off one expects and can live with for functional programming as well.

The poster's contention, based on his results, seems to be that OCaml is in whole different Big O class of performance, outside the bounds of what is an acceptable trade off for the functional benefits.

The OCaml community is rightfully peeved at this, since it seems that the guy just isn't that good an FP programmer, and since it also smears one of their drawing cards as a "fast" FPL.

The lesson I think needs to be drawn from this is that, if you are worried about performance, invest in algorithmic improvements before going off on a spitting match between paradigms and PLs.

You can write slow code in any language

Still lots of Fortran code out there to do array processing that can't be beat in terms of performance.

Every language that I've come across requires a certain amount of skill and knowledge to come up with acceptable performance. The reason I dislike C++ is that the amount of arcana required to produce working code is immense - requiring a great deal of practice - more time and brain cells than I'm willing to invest based on the payoff.

Functional languages like ML and its derivatives don't relieve one of learning the do's and dont's, they just move the demarcation points. But generally, those stress points are easier to understand from a higher level perspective: Hash Arrays are slower than Maps; Tail Recursion can improve performance; etc... The bottom line is that you still have to have knowledge of data structures and algorithms, and how they are realized in the target language.

The (theoretical) general case

See this old LtU thread. It's a bit messay but the paper mentioned ("Pure vs Impure Lisp") is the one to read.

The functional pearls site is missing

That's it... it disappeared, apparently. I looked around with Google, to no avail; only got links to individual papers.

Measurement of Languages

To quote Guy Steele, the usefulness of a language should be measured by the time it takes for the user to get their results. And this includes the time spent programming or formulating the answer.

Do you have a reference for this?

I have this same viewpoint myself, but for me it means that I should write programs in such a way that other people can easily read the source.

I'd like to propose a new category to the Great Computer Language Shootout, namely Readability. But, I'm not sure how to unit test readability. The only tests I can think of are lines of code and metrics such as cyclomatic complexity. Any better suggestions?

How do you test code for understandability?

--Shae Erisson - ScannedInAvian.com

Could I suggest?..

Are Ours Really Smaller Than Theirs? paper. This paper overviews another formula for language expressiveness metric. And also refers to other experiments on expressiveness measurements.

Excellent, thanks!

This is exactly the sort of thing I've been searching for, thanks!

--Shae Erisson - ScannedInAvian.com