Compilation to C, a definitive guide?

Does anyone know of a "definitive guide" to using C as a compilation target? I have some of the dated references, like Appel's A Runtime System, and obviously "Cheney on the MTA" which forms the basis for Chicken Scheme, but there have been more recent developments, like Henderson's shadow stack and lazy pointer stacks.

I haven't found much recent work on supporting tail calls in C either, other than trampolines and Cheney on the MTA. Scheme has some papers on compiling to C, as does Bigloo, and Mercury. This Scheme paper seems like the best reference.

So does anyone know of a guide that discusses the various techniques and tradeoffs of compiling to C? Or any papers that you might consider relevant? I started writing my own guide, since I haven't found anything comprehensive yet, but figured someone here would know if something like it already exists.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.


There's a paper that came out of Microsoft Research's Barrelfish effort, "Filet-o-Fish: practical and dependable domain-specific languages for OS development, which touches on this.

Flattening Transformation

Just came across Eliminating the Call Stack to Save RAM, which describes a sort of inlining transformation of C code to eliminate stack frames. I was considering a very similar transformation to support mutually tail recursive functions in C; interesting to see that it has wider use.

Re: Filet-o-Fish, looks like a very relevant paper, thanks!

A similar approach

I'm working on implementing a somewhat similar approach for the C backend of my Rhope compiler. Essentially I compile the whole program down to a single C function containing a big switch statement. Each function gets a label and a case statement. After each call within the individual function bodies, there is an additional case statement. Calls which can be resolved statically are done with a goto. Calls which are resolved at runtime are handled by sticking the ID of the function (the number that corresponds with its initial case statement) into a certain variable and doing a jump to a label that precedes the switch statement. Returns are handled in a similar manner. Before the call is made, the number that corresponds to the case statement following the call is pushed onto a stack. When it's time to return, that number is pulled off the stack and then used in the same manner as with calls that are resolved at runtime.

Using computed gotos for returns/late-binding would probably be more efficient if you don't mind limiting yourself to C compilers that support them, but I haven't done any testing to see how big of an improvement it makes.

Switch threading

... I haven't done any testing to see how big of an improvement it makes.

But Anton Ertl has... In general, I highly recommend that page if you're interested in a comparison of relatively portable interpreter/VM techniques.

I love Ertl's idea of inline

I love Ertl's idea of inline threading, but it's rarely portable from even one gcc version to another. Last I spoke to him, he was working on making gcc more friendly to it.

single big C function can be a burden for the C compiler

A single C function as the whole program is a burden for the C compiler having to optimize it.

In practical terms, some versions of gcc with -O2 may show a quadratic (w.r.t. to the C function's size) behavior: you'll need a lot of RAM (4Gb is not enough) and a lot of CPU time to compile a big single 50KLOC function with -O2 optimization


This sounds like the C code produced by the Gambit-C scheme compiler. It avoids the huge function problem by splitting the compiled code into multiple host functions (unless you ask it not to). Gambit will happily consume 4Gb of RAM while compiling with optimizations and the --single-host switch.

Gambit performs well. Here is a simplifed version of the compiler presented by Marc Feely.

Yes, I read that about

Yes, I read that about Gambit. I'm interested to read more, but your link is broken. :-)

re: Missing Link

I assume this is the desired page :- The 90 Minute Scheme to C compiler.

It depends of what kind of C you target... (see MELT in GCC)

FWIW, I just submitted a paper to Compiler Construction 2010: MELT - A Lisp dialect matching GCC internals

And it really depends of what kind of C you want to compile to: if your translator targets a well defined small library, or just plain C code with almost no external dependencies (except perhaps your own runtime, e.g. garbage collector), it is very different from targeting C code fitting into a huge legacy software (in MELT case's, the GCC internals API) which has a big, still evolving API.

In my understanding, Chicken or Stalin (both are Scheme->C compilers) are different from MELT, mostly because the challenge inside MELT is to target a huge API (the GCC internals), and that requires various linguistic devices (describing how some operators should be translated to C following some particular style & API).

About MELT, see also MELT wikipage


Why compile to C?

Compiling to C is a bad idea.

You have to jump through hoops if you want language features such as tail call optimization or continuations. Not to mention the difficulty of implementation features such as precise garbage collection, single-stepping through compiled code, a fully compiled REPL, interactive code reloading, etc.

All of these incredibly basic things are easy with direct code generation, but toy functional languages struggle with them because they're such a pain to do when your compiler targets C. Entire papers have been written about simulating these features on top of C, using incredibly complex techniques which have various amounts of overhead. Nobody would write a paper about tail call optimization on machine code, because its trivial.

Basically, it is not possible to implement a modern, usable language that compiles to C. Bite the bullet and write a real compiler that targets machine code or LLVM.

Simple answer: portability.

Simple answer: portability. All of the features you mention are doable within C, and are not more difficult than in ASM, though the resulting program will definitely be slower. LLVM was a contender at one point, and may still be, but LLVM still isn't ported to enough architectures to satisfy the types of development I'd like to explore.


Compiling to C can be very useful in an environment dominated by C. Currently I'm working on a game written in Python, which is too slow for the critical numeric and graphics routines, so we add a few type declarations and compile to C with Cython (successor to Pyrex). We also interface directly with C libraries instead of using SWIG or ctypes FFI, thus avoiding a whole layer of dependencies.

Messy? You bet. Cython hides a ton of complexity. But it allows us to intermix static and dynamic code quite freely, effectively in the same language. And we stopped searching for "the perfect language"; this is good enough for now.

I agree, though: direct compilation is ideal -- and easy. It would be practical if it was used throughout the system/libraries instead of C and everybody used it. Question is, would it be held together by Lisp, Factor, etc, or some language-independent protocol, or what? There's a reason why TUNES is vaporware after 15 years :-) I suppose LLVM is the next best thing now that it's integrated with GCC and seeing widespread use.

Because it is easy

I compile from source code to lambda expressions to combinators to thunks which are trampolined. Not a lot of magic to it, each combinator is a function, and no need for tail call optimization. I don't go for optimal speed, so everything but graph rewriting, like handling integers, is handled through ffi.

Tail calls...

If you target gcc, you can just compile your tail calls to ... TaDa! ... tail calls, the compiler will do the elimination for you. Heck, it will even automatically transform a naive recursive faculty function into a proper tail recursive form (i.e. loop with an accumulator), a feat most current functional language compilers fail at, see slide 40 of Felix von Leitners recent talk on compiler optimization.

Um, no

As far as I can tell, GCC just can do tail recursion, and even that only under certain restrictions.

For the compilation of functional languages you need something much more general. In particular, you need to be able to do tail calls not just to other functions in scope (e.g. for mutual recursion), but even to first-class functions. In C terms, you want to tail-call through arbitrary function pointers where you don't know the callee statically.

What is this comment based on?

Under an old gcc version (3.4.4 that came with cygwin), using -O3 -S, I compile the following:

int bar(int n, int *acc, void (*f)())
    if(n == 0) return *acc;
    (*acc) ++;
    int (*rec)(int,int *,void (*)()) = (int (*)(int,int *,void (*)())) f;
    return rec(n - 1, acc, f);

It outputs the following:

	pushl	%ebp
	movl	%esp, %ebp
	movl	8(%ebp), %edx
	movl	12(%ebp), %eax
	movl	16(%ebp), %ecx
	testl	%edx, %edx
	jne	L5
	popl	%ebp
	movl	(%eax), %eax
	.p2align 4,,7
	incl	(%eax)
	leal	-1(%edx), %eax
	movl	%eax, 8(%ebp)
	popl	%ebp
	jmp	*%ecx

Can you give an example of a case it doesn't optimize?

Mh, interesting

Mh, interesting. I was under the impression that this does not work.

But still, what if you are calling a function that happens to have a different argument list than the caller? And to make it more interesting, one that doesn't fit in registers? Or where the caller's one doesn't?

One (not so big) problem

I haven't checked the other cases you mention, but one thing that I'd guess would be a problem (and have now confirmed as a problem) is this:

void bar(int *);

int foo(int n)
    int x;
    bar(&x); // Does bar retain a pointer?  Don't know
    return tc(n); // Can't tail call

Edit: Actually, this case can be re-written as:

int foo(int n)
        int x;
    return tc(n);

And this can be generalized to support any tail calls. The only real problem I think is that, since gcc considers this an optimization, it may decide it's more optimal in some cases to grow the stack instead of shuffling data around to allow the tail call. I may play with this more later to see when it makes that decision. It'd be nice if you could just instruct gcc to make a tail call.

It'd be nice if you could

It'd be nice if you could just instruct gcc to make a tail call.

I'm seeing a lot of this in the recent discussions about tail calls. Not a bad idea I don't think, for an optional "tail" annotation on calls (like how LLVM and CIL have in their IR). This way the user can be assured a call reduces to a tail call if the compiler does not spit out an error.

gcc sibcall rules

Gcc will not attempt TCO for calls needing more argument stack space than the caller, because the caller's stack frame (local slots) are built on top of its arguments, and making room for more arguments would force the caller to rearrange its locals first - and this is typically not an optimisation for C code.

It's less of a restriction for modern platforms where most parameters are passed in registers. (Some ABIs unwisely mandate argument stack space allocation for register arguments.)

As other people have said, it would be nice to have an annotation to enforce TCO.

Looking at the source

This is not a trivial change. It looks like it would be pretty trivial to ignore potential aliasing issues, but there are also fifty things that can block tail call optimization from being attempted or cause it to fail if attempted. And most of them look like they'd break the code if removed, rather than just potentially slowing things down.

Partial solution

A much smaller step (and a trivial change) would be to add an annotation of tail calls that makes the compiler produce an error (or warning) if a sibcall is not possible. That would at least guard against surprises when C is used as a back-end.