Request For Advice and Guidance On Writing a Scheme To C Compiler?
Dear Lambda The Ultimate,
I've been working very hard recently on writing something I thought would be entirely straightforward: A scheme to C compiler that bootstraps - nothing I do seems to work. On my first attempt (which was able to compile and run its own parser) there was no GC or tail calls so it demanded >8GB of memory and failed, on my next the emitted code blows up so huge that it grinds to a halt. I hope someone could give me some advice on what I should do or what I need to learn to complete this? It would also be interesting to discuss and read about other members experiences.
I've been interested in programming languages since I started programming a long time ago, and have explored and written parsers, type checkers and interpreters for a fair selection of different types of programming languages like lisp (of course!), prolog (and minikanren), a cut down version of haskell with hindly milnor type checking as well as some unusual things like linear and reversible programming languages.
A long time ago I read Marc Feeley - 90 Minute Scheme to C compiler when it came up on this site and it was fascinating to understand (or so I thought) how a scheme compiler could be done, in particular using the CPS transform to deal with tail calls and call-with-current-continuation. This got me interested in continuations and I studied Appel - Compiling with Continuations but I didn't write a compiler back then.
Since then I've read about Abstracting and Representing Control from Olivier Danvy and written metacircular interpreters that emit continuation semantics (pure lambda terms without any control operators). I also came to prefer delimited continuations rather than call-with-current-continuation and decided that I would like to implement those directly in my own compiler. I noticed that in Marc Feeley's compiler he was able to just use the continuation semantics of call-with-current-continuation as is because it's already in CPS form, on the other hand the continuation semantics for shift and reset are not in CPS form so this technique cannot be used: one would have to CPS transform twice to get the necessary metacontinuations - but this also causes a huge blowup in the number of lambda terms. So I studied other approaches to direct implementation of shift/reset:
In the end I felt like building a runtime for my compiler to target (in C) that handled delimited continuations was something I could add later on but for now I should forget about that (and if it was require in bootstrapping I had an easy way to interpret them), so I just used Matt Might's very clear and simple explanation of hybrid higher order CPS transform How to Compile with Continuations.
When the got the idea to write a self hosting scheme compiler I quickly wrote out a parser that's able to parse as much of scheme as I needed as well as the file itself. Then I initially came up with something very simple that just performed closure conversion: It was a surprise to me that this was basically enough to execute lambda using C procedures for code and vectors for their environment - it just lacked garbage collection and tail call elimination. I pushed on with this hoping lack of optimisations wouldn't be an issue ending up with a system that had one compiler pass per file as well, some utility functions and a script that chained them all together. This quite successfully compiles all the small simple scheme programs I gave to it (getting a program with the Y combinator to compile and run for the first time was very exciting!) but it just isn't good enough to bootstrap.
So I looked back a bit and read over Guy Steele's RABBIT paper and decided to start again from scratch. This time I started with a C runtime that had a simple two space copy and collect GC which uses the stack as the root - the stack itself is executed by a trampoline (a while loop that pops a continuation off the stack and calls it) and the continuations pop arguments off the stack, compute something with them then push a continuation onto the stack and return. I wrote some tests for the runtime: for example compiling recursive and iterative factorial functions by hand. The value of this approach is that you get tail call optimisation "for free" it only requires doing a CPS transform in the compiler.
So the overall architecture of the compiler is now:
after that actual C source code is emitted and some simple programs like an infinite lazy stream of fibonacci numbers do execute (with the correct space usage i.e. TCO and GC are working) - but when I started to head towards bootstrapping just adding a recursive definition of the EQUAL? function blow up horrible creating 40k lines of C.
Looking for more to read
I've looked for more modern things to read that might help me build a self hosting scheme compiler like:
and tried to find short readable implementations (studying real world compilers is just too hard since they are so complex and enormous)
but all in all I just don't know what I'm missing - I don't want to give up though because I think there's an important difference between understanding something and actually doing it. In particular I feel like I can't progress in my study of programming languages until I pass this block (I want to experiment with building compilers using Futamura projections like PyPy does after I complete this for one thing).
Active forum topics
New forum topics