expressivity of lisp/scheme but speed of assembly/C/C++

I have recently found Scheme to be a very expressive language that allows me not only to quickly implement my project, but also to extend it easily. I also need my application to be as fast as possible (it's basically a database application, as mentioned in an earlier thread). From what I gather, scheme/lisp implementations are far slower than C or C++ (even OCaml seems very fast). I've seen benchmarks where even scheme-to-c ends up being several times slower than C++.

Now my question is this: is it at all possible to bring the performance of scheme closer to c++?

I would think that lisp/scheme would actually be easy to optimize. Could they be optimized further if Haskell-like typing were introduced (I think Qi does that)? The more information available to the compiler, the better it can work!

If there are some things intrinsic to lisp/scheme (perhaps eval?) that keep performance low, then I'll have to live with some performance upper limit. Otherwise, why don't we see faster implementations? Is it because those who use lisp/scheme just don't have a need for performance?

I don't even mind if current performance limits exist simply because no one has written an aggressive enough compiler. I can prototype my application...then start work on a compiler.

Not all implementations are slow

I think you have been misinformed about several points. Let me try to explain:

  • You are writing a database application. The performance of a database application depends mainly on the speed of the database engine you're using. Most database applications are written in interpreted implementations (VB, Access, Perl, ...) simply because speed does not matter much for these applications.
  • There exist scheme/lisp implementations that are fast. From experience I would say Chez Scheme is very fast. It's a fast compiler (it can easily compile 25,000 lines/sec of Scheme code, before expansion, on current hardware). The machine code it produces is very fast (otherwise the compiler couldn't have achieved such speed).
  • I would not say that optimizing scheme/lisp is easy. Having written a scheme compiler myself, I would say it's far more challenging to write a scheme compiler than it is to write a C compiler.
    Type-checking, higher-order-functions, data/control flow analysis, ... all add to the complexity. Decent scheme/lisp compilers would have to perform all of the optimizations that C compilers perform in addition to all the front-end optimizations that are unique to scheme/lisp.
  • eval does not cause a performance bottleneck since it's hardly used in production code or in performance-critical sections. My compiler, just like Chez, provides two versions of eval, interpret and compile, where the latter produces very efficient code for a slightly higher cost (see the sketch just after this list).
  • We don't see many high-performance implementations of scheme because many people like to write small toy implementations or have other goals besides performance (nice extended library for example). It's not because there is no need for performance. The need exists (though not as pronounced as the need for fast C compilers). The need is also satisfied by the existing high-performance implementations.
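
For what it's worth, here is a rough sketch of the two eval strategies mentioned above, using Chez-style names (current-eval, interpret, compile); the exact names and behavior vary by implementation, so treat this as an illustration rather than portable code:

    ;; A closed expression to be evaluated at runtime.
    (define expr
      '(let loop ((i 0) (acc 0))
         (if (= i 1000000) acc (loop (+ i 1) (+ acc i)))))

    ;; Interpreting eval: cheap to start, slower per iteration.
    (current-eval interpret)
    (time (eval expr))

    ;; Compiling eval: pays a small up-front compilation cost,
    ;; then runs the loop as native code.
    (current-eval compile)
    (time (eval expr))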

slow implementations

My interest is actually in building a database engine itself.

I've come across Chez Scheme...I'll try it out.

Your comment about ease of optimization is very interesting. I actually prefer to use compile-time type checking (perhaps like Haskell). I don't understand why higher-order functions or control-flow analysis would slow the runtime (I may simply not have gotten far enough in my compiler books).

By the way, database is just one example. I would like to use scheme/lisp for more complex programming such as pricing complex derivatives (Simon Peyton Jones showed how to do it)...but it is not likely to be used unless it can compete with C++ in terms of performance. Once again, I'll try Chez.

Fast Scheme

Let's not forget:

  1. Stalin
  2. Bigloo
  3. Gambit
  4. Chicken

These are all Scheme compilers of slightly varying degrees of R5RS compliance whose explicit goal is to generate small, fast code. From a compliance standpoint, some of them compromise, typically on call-with-current-continuation (surprise, surprise). For the rest, you have the usual suspects: typically some way to say "assume the standard bindings" coupled with "assume a closed world" coupled with some level of type inference so that you don't end up calling a generic + on a list of machine-sized integers, etc. There's no magic here, just a lot of elbow grease.
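
To make the "generic +" point concrete, here is a small illustrative sketch; fx+ is the R6RS fixnum-only addition (Chez and others provide it), while the R5RS compilers listed above get the same effect through declarations or inference:

    ;; Under "assume the standard bindings" plus a closed world, a compiler
    ;; can often prove that acc and (car xs) are always fixnums and emit a
    ;; raw machine add instead of dispatching through the generic +.
    ;; You can also say it yourself: fx+ below is the fixnum-only add.
    (define (sum-fixnums xs)
      (let loop ((xs xs) (acc 0))
        (if (null? xs)
            acc
            (loop (cdr xs) (fx+ acc (car xs))))))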

Used in performance-sensitive programs?

Have any of the compilers you list been used in real applications (not just one-off personal projects) that would normally be the domain of C? Stalin was never designed to work with large programs, for example.

optimizing scheme

falcon: Your comment about ease of optimization is very interesting. I actually prefer to use compile-time type checking (perhaps like Haskell).
The major benefit of static type checking is not the elimination of runtime type checks. Every statically typed language must still perform some runtime checks/pattern-matching.

falcon: I don't understand why higher-order functions or control-flow analysis would slow the runtime
This sentence does not make much sense. What I said is that higher-order functions make control-flow analysis harder. The effect of that on the runtime depends on many factors, including the transformations the compiler performs on higher-order functions (lambda-lifting/dropping, closure-size minimization, direct-call optimizations, inlining, ...) and also on how heavily the program actually uses higher-order functions.
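
A tiny example of what is at stake (an illustrative sketch; fold-left here is the R6RS left fold):

    ;; If the compiler can prove the lambda below never escapes, it can
    ;; inline it into the fold and compile the whole thing as a direct-call
    ;; loop with no closure allocation.  If the function argument could come
    ;; from anywhere at runtime, it must allocate a closure and make an
    ;; indirect call instead - that is the control-flow-analysis problem
    ;; in miniature.
    (define (sum-of-squares xs)
      (fold-left (lambda (acc x) (+ acc (* x x))) 0 xs))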

You claimed that optimizing scheme should be easy. I was trying to show that this is not the case.

As Fast As C

I've seen benchmarks where even scheme-to-c ends up being several times slower than c++.

You might be interested in C2 discussion of this issue:
As Fast As C.

Abdulaziz Ghuloum: You are writing a database application. The performance of a database application depends mainly on the speed of the database engine you're using.

I believe you misunderstood falcon: he is building exactly the database engine. I am not an expert, so I don't know whether I/O is the bottleneck anyway.

As Fast As C

Andris, thanks for the c2.com reference, that site is always amazing! You are right about my interest in building a DB engine, rather than an app which makes use of a DB. Please see my reply to Abdulaziz.

I've always found this to be

I've always found this to be an interesting issue. As you move toward the
utmost performance, you naturally move away from expressiveness and dynamism.
If you look at many benchmarks in OCaml, for example, they're nothing like what
you'd use as examples of the beauty of ML-family languages. This may be a
natural effect of using hardware that's designed for C.

Garbage collection is still a big issue for real-time applications, which is
often what's being talked about when performance is supposedly critical. Even
if you can avoid all pauses (which is possible in theory but rarely implemented
outside of reference counting), there's often still extra work going on to
support GC-able data: tagging, making sure data structures and registers are
initialized to safe values.

The big issue for me is having a language that's not "big OS"-centric, something
that's simple enough to get running on embedded hardware (like a MIPS chip, not
some little 8-bit microcontroller). Usually there's just a flat address space
with no memory mapping hardware. Language runtime systems designed for
UNIX-like memory systems don't work well here.

importance of performance

I guess one of the questions I keep wondering about is: if an implementation is slow, is it because of some fundamental reason, or because of the culture around that language?
At the same time I have to wonder how much productivity people like myself have lost because Java didn't think we could handle first-class functions, operator overloading, open classes (mixins), etc. Also makes me wish I had paid more attention to theory during university :)

"higher level" is fundamentally further from the machine

I guess one of the questions I keep wondering about is: if an implementation
is slow, is it because of some fundamental reason, or because of the culture
around that language?

When talking about languages like Scheme, ML, Haskell, and Erlang, in relation
to machine-oriented languages like C, I think it's because of fundamental
reasons. Simply put, you're operating on a higher-level in, say, Scheme.
You're not declaring types, you're not worrying about memory, etc. Trying to
optimize away those features completely is very difficult. You can also have
cases where you have two similar-seeming programs, but one program can have its
memory usage inferred, removing the need for garbage collection, and the other
can't. How do you get that across to the programmer?

I think the best you can do is to pick and choose features carefully, to the
point where you can finely craft a runtime and/or code generator for it. That's
the OCaml method. They eschew function composition, for example, because that
makes it harder to generate fast and straightforward code. Go down that road
too far, though, and you end up with a machine-oriented language again.

Many reasons

I guess one of the questions I keep wondering about is: if an implementation is slow, is it because of some fundamental reason, or because of the culture around that language?

Like most things, there are a number of reasons for the differences. Look at the $$$ poured into Java, or the sheer amount of time gcc has been around. It seems a little unfair to compare to something like Chicken. And then of course, there's the self-perpetuating nature of the whole enterprise. Intel makes their processors so that existing software will run fast. And a large portion of that existing software is written in C-like languages. So the hardware is designed to execute C efficiently. And then anyone who comes along and wants fast code will realize that the best option is C. Around and around we go.

Backus

Speed is relative

Or at least, speed of programming languages is very much relative to programming idioms and which particular libraries are being used, etc.

Siskind's exercise in compiler ruthlessness is a good example: understanding the optimisations as well as he does, he can get magical performance from scheme code, but hardly anyone else can replicate the trick.

slightly different thread

I hope I'm excused for asking almost random questions...but basically I'm trying to figure out why, on one hand many of these non-mainstream languages are claimed to be better, but on the other hand they remain below the radar. I'm hoping that by poking and prodding I'll understand.

So, one of the reasons people continue to use Java/C# is simply that they are well marketed and that fresh grads from Comp. Sci. programs think of them as the C/C++ they knew from school...but without pointers (that's certainly how I started with Java). But why don't we have hard-core hackers who have written a whole operating system in these languages? C abstracts away assembly code yet returns HUGE improvements in productivity. Why not let ML/Haskell/Scheme/Lisp abstract away further details and become orders of magnitude more productive?

On one end, programmers could add 'cool' features to the operating system, academics could do experiments more easily, and mediocre programmers could more easily contribute while gurus could contribute even more in less time.

On the other end some motivated programmers could continue to optimize everything that underlies the abstraction.

I wonder why that doesn't happen. Is it because, at the end of the day, many improvements in programming language theory are only marginally helpful to practicing developers...or is it that the benefits of these improvements just haven't been 'marketed'?

Applications

With regard to your question about creating a whole new OS, I suspect you'll find that part of the answer has to do with applications. Look how hard it is for even relatively mainstream approaches to OS design to get adopted (BeOS anyone?). Now look at things like Bell Labs/Lucent's Plan 9 and Inferno language/OS combinations - they've seen very little use because there hasn't been much effort to port existing apps to them.

There seem to be two approaches that do work on this regard:

  1. The Mac OS X approach - create a whole new development model (Cocoa) but provide backwards compatibility (Carbon) so that old apps still work.
  2. The Java/Erlang approach - don't create an OS, just create a platform that is almost an OS inside an OS, and port it to the "standard" OS's.

Why Aren't Boutique Languages Used More?

I'm trying to figure out why, on one hand many of these non-mainstream languages are claimed to be better, but on the other hand they remain below the radar.

If you can answer this question you will be placed among the marketing Gods :-). Part of me wants to say it's completely random (else how would one explain Perl?). Other parts of me think it's one of the universe's little jokes on us.

In reality, first understand that it's not a claim. Many of these languages are better. People using them are able to produce more understandable and maintainable code at a higher productivity than with other mainstream languages. In addition, they are often more linguistically coherent, making them easier to learn.

So why don't people use them?

First, many of these boutique languages have been designed to explore or highlight new programming paradigms - symbolic processing in Lisp's case, OO programming for Smalltalk, functional programming principles for ML, etc. As such, people who have seen only procedural languages see them as odd. To appreciate these languages you often have to work to learn a new paradigm as well as a new language.

Next, as prototypical languages, many of them do not have extensive libraries or support for external components. So, what in Java was a simple exercise in including the SAX XML parser might mean building a new interface to third-party code in a boutique language. This is exacerbated over time by more widely used languages having more support (and not only in libraries - this happens in training materials, documentation, etc.).

Finally, mainstream languages often have a large number of people proselytizing them. This leads to a network effect and/or buzz surrounding them. It also happens to provide a large number of people with whom you can discuss problems you might be having with understanding the language. Gathering together a similar group of boutique language users may be difficult.

Given all this a salient question might be "Why don't the boutique languages die out?"

Well, first of all, don't forget that they work and they do work well. That's a great start.

Next, people are fiercely loyal to them. They know these languages work well and have gone to the trouble of overcoming the learning hurdles inherent in using them. They have a vested interest in seeing them succeed. Mainstream language users have little "skin in the game". They can switch to the flavor of the month language very simply and next month... there will be another one waiting to take its place.

Finally, the people who do use these languages and take the trouble to learn new paradigms are generally more successful than those who do not. This is true even if they are forced to use languages other than their preferred ones. Their exposure to different paradigms makes them better programmers in general. In the land of the blind, the one-eyed man is king and all that.

So, in the final analysis, the real question might be "Where can I use these languages profitably?"

And the answer is "Just about anywhere where you aren't prohibited from using them."

Just do it.

A minor quibble

Claiming that the statement "non-mainstream languages are ... better" is not a claim is begging the question a bit, doncha think?

I'm not disagreeing with the claim... however it is still a proposition which may or may not be true. What works well for us PL geeks may not work well for other programmers--most claims of "better" or "worse" in programming languages usually need qualification. Saying that something is a "fact" rather than a "claim" strikes me as an attempt to declare a proposition to be an axiom... which is always a dangerous thing to do.

The statement you could make which is probably uncontroversially true is "expert programmers report greater productivity in non-mainstream languages". Which is a significant result, to be sure. Extrapolating such to claims of overall superiority strikes me as premature.

You are correct, of course, in describing the various network effects which keep many of the mainstream languages entrenched. At any rate, developers of non-mainstream languages would do well to study the examples of Perl and Python (and perhaps, to a lesser extent, Tcl, PHP, and Ruby)--not for their designs as programming languages, but for how they managed to cross the chasm from boutique language to mainstream.

Anyone found to be grousing that these languages (and/or their respective designers) somehow "sold out" or "compromised" would of course be missing the point. :)

Perl

Perl was in the right spot at the right time. The Web was just beginning and Perl fit the bill for web development on the server. Back then if you were in web development you used Perl. Perl made life a lot easier and there was not anything that could compare. This gave it a huge kickstart and large base of developers.

For any new language, OS, or whatever to take off, it needs a similar kickstart (kinda like jets catapulting off of a carrier). There has to be some fairly strong compelling reason for a very large number of people (managers, developers, users; this includes casual users) to switch. Most new stuff that comes along just doesn't provide enough benefit to merit its existence.

Surprisingly

Surprisingly, the bottlenecks are different depending on the usage:
a) Highly concurrent database servers are more concerned about:
i) caching/in-memory database
ii) multithreading (Microsoft's database uses lightweight threads, called fibers, which sound like cooperative coroutines)
iii) MVCC
iv) query plans
(Disk access speed used to be important, but with memory so cheap, the opportunities are different)
b) Embedded databases are concerned about:
i) footprint
ii) self-healing
c) Desktop databases are concerned about:
i) page locking
ii) query-optimization without benefit of cache-support

There was an essay on the net about a class that was given a problem which had to be computed 'as fast as possible': everyone else chose C, while one student used Lisp (?) and applied standard program transformations to his initial program until he ported the final version to C. If I remember correctly, he came out 2nd fastest. (Any readers who still have the link to that article?)

It's a talk by Dan Friedman

Great talk

I regularly use this example in my introductory lecture before embarking on EOPL.

database engine in perl

This is somewhat tangential to your post, in that it doesn't address scheme/lisp.

You might be interested in Genezzo, a database engine being written in Perl.

The speed of a typical language implementation is not necessarily indicative of an end system's performance. Some examples of high performance systems written in high(er) level languages:

The throughput of the Java-based WebLogic server was much better than Apache's (when I last saw it, a few years ago). The distributed transactions framework (which I co-wrote) is significantly faster than its counterpart in the Tuxedo TP monitor, which is written in C (the logging subsystem is 10 times as fast).

Here's a report of an implementation of sendmail in Erlang.

The PyGame framework allows you to write games in Python.

I don't know of any language/runtime that provides C's speed, the elegance of Haskell, and Java-style portable libraries. To me, the availability of fast libraries matters much more than language performance shootouts.

Have you considered PreScheme?

To quote Olin Shivers' History of T -

"Pre-scheme was quite interesting. Kelsey published a paper on it, as well, I believe. It was Scheme in the sense that you could load it into a Scheme system and run the code. But it was restrictive -- it required you to write in a fashion that allowed complete Hindley-Milner static type inference, and all higher-order procedures were beta-substituted away at compile time, meaning you could *straightforwardly* translate a prescheme program into "natural" C code with C-level efficiency. That is, you could view prescheme as a really pleasant alternative to C for low-level code. And you could debug your prescheme programs in the interactive Scheme development environment of your choice, before flipping a switch and translating to C code, because prescheme was just a restricted Scheme."

This page has links to all of the papers on it. The source is part of Scheme48.

interesting

I never looked at Pre-Scheme; it's very interesting. Thanks!

Why not use Ocaml instead?

Is there anything in Scheme that would be missing in OCaml, except for dynamic typing, which is actually very bad for real programming? Apart from making it much harder to make certain mistakes during programming, static typing is also one essential property that makes it possible for a language to do better optimizations.

syntax

I looked at OCaml and was very impressed with its performance (even though I had convinced myself that I didn't need great performance). Honestly, I haven't gotten over its syntax...I figured that if I had to learn a new syntax, I may as well make it lisp/scheme. OCaml does offer static typing...but projects like Qi showed that typing could be done in lisp as well.

Common lisp shows that typing

Common Lisp shows that typing can be done in lisp as well.

Bother to learn what's being discussed

Common Lisp's type system, while decent, is not comparable to OCaml's, for the specific reasons that Qi was created (on top of Common Lisp) to address.

Bother to read the words on your screen

Before you start throwing brickbats around, I suggest that you learn to read. The post I replied to said that Qi shows that lisp programs can be typed. It is incorrect to say that common lisp does not include a type system. It is also incorrect to say that either post discussed whether or not one type system was as powerful as any other.

Although I'm not an editor...

I'll still suggest that the two of you tone down the rhetoric a notch. This isn't /. after all.

The beatings will commence...

..until morale improves. Suggest reading the FAQ. And if that doesn't work, at least write something that's worth reading. Something like:

Both O'Caml and Lisp are horrid and have no redeeming qualities what-so-ever in the serious study of programming languages!!!

This is true. After all, thei

This is true. After all, their antiquated feature sets have been around long enough that anyone ought to consider them mainstream in a way that C# cannot yet hope to be.

Speaking of syntax

There is a very nice, although quite under-used, feature in OCaml which you may find interesting: Camlp4. Basically, this is OCaml's fully-static, fully-typed, and (I believe) richer counterpart to Lisp's macro system.

More specifically, it lets you fully customize OCaml's syntax, add or remove primitives of the language, etc. I have seen it used to produce a version of OCaml with the syntax of Pascal, an OCaml with built-in regular expressions, and a few others.

OCAML's "Shootout" Performance...

...is a bit misleading, in my opinion. Examining the OCAML sources reveals that they are almost all imperative. So I don't think they can really be considered a functional programming language benchmark. (OCAML does excel at mixing imperative and functional programming styles though, and it's good to know that you can get such good code when you need it.)

Concurrent Clean's showing is a much more impressive feat. (Though there is the argument that extensive use of uniqueness typing is essentially imperative programming.) But benchmarking of any type, *especially* your typical shootout, must be taken with a rather large grain of salt. After all, if your task is io-bound, you aren't going to notice a shred of difference between C and Python/Perl.

(It also amuses me that compiling C tends to be an io-bound task whereas compiling many functional programming languages tends to be a cpu-bound task.)

Good point

While being able to mix imperative and functional programming is a good thing,
it's almost certainly the case that people aren't flocking to OCaml so they can write C-style
imperative code with it. I'd like to see the OCaml benchmarks written in a pretty and functional
style (where applicable).

Microbenchmarks Lead to Microoptimization...

... which leads to suffering, etc.

I disagree with this thesis: it reads to me as "O'Caml is only good when it's used imperatively." But it ignores the obvious selection bias: what's being measured, at least with the default settings of the shootout, is strictly CPU performance, which these days, for most applications, is the least important factor.

What I find interesting about O'Caml is how hard it is to shake from being near the top when you add weights for code lines and memory use: I'm generally at least as interested in economy of expression and resource consumption as I am in CPU time, and often much more.

Finally, O'Caml remains an extremely expressive functional/OO language—AFAIK, only O'Haskell and Oz are in the same class in terms of expressive power, and neither is in terms of performance—but that isn't what's being measured in the shootout. If someone wanted to do a "functional language expressivity shootout," that would be very interesting indeed!

Measuring expressive power?

Could you define 'expressive power' in a way that could be measured? What are your criteria for ranking O'Caml, O'Haskell and OZ above other languages?

Measuring Expressive Power

Informally, I'd say "lines of code that aren't line-noise like some perl or most APL." The sense I'm speaking of is: how much boilerplate code can I avoid? How much "bookkeeping" does the language make me do/help me avoid?

More formally, of course, On the Expressive Power of Programming Languages is the canonical reference.

Are only a-machines covered?

More formally, of course, On the Expressive Power of Programming Languages is the canonical reference.
The paper is not short enough to figure out an answer quickly, so maybe you can help me - does it cover only languages describing functions from a finite input to finite output (Turing a-machines)? Or do interactive processes fit the framework as well?

I quickly browsed the paper and found that many definitions use the fact of termination of the program, which made me suspicious.

Shootout?

If you are mentioning the Big Language Shootout, keep in mind that a number of the "exercises" explicitly require the programmer to implement the solutions with a specific imperative algorithm.

>It also amuses me that co

>It also amuses me that compiling C tends to be an io-bound task whereas compiling many functional programming languages tend to be a cpu-bound task


Brilliant comment - it supports your own prejudices whether you are C-folk or lambda-folk! In C the compiler isn't doing enough to help you (verbosity designed to help the compiler - at your expense). Watch my assembly compiler screaming.