archives

LtU database problem

On posting a comment, I get:

user error: Incorrect key file for table './ltu/cache.MYI'; try to repair it
query: DELETE FROM cache WHERE expire != 0 AND expire 

The comment is posted, but the above sounds wrong.

Request For Advice and Guidance On Writing a Scheme To C Compiler?

Dear Lambda The Ultimate,

I've been working very hard recently on writing something I thought would be entirely straightforward: A scheme to C compiler that bootstraps - nothing I do seems to work. On my first attempt (which was able to compile and run its own parser) there was no GC or tail calls so it demanded >8GB of memory and failed, on my next the emitted code blows up so huge that it grinds to a halt. I hope someone could give me some advice on what I should do or what I need to learn to complete this? It would also be interesting to discuss and read about other members experiences.

Starting

I've been interested in programming languages since I started programming a long time ago, and have explored and written parsers, type checkers and interpreters for a fair selection of different types of programming languages like lisp (of course!), prolog (and minikanren), a cut down version of haskell with hindly milnor type checking as well as some unusual things like linear and reversible programming languages.

A long time ago I read Marc Feeley - 90 Minute Scheme to C compiler when it came up on this site and it was fascinating to understand (or so I thought) how a scheme compiler could be done, in particular using the CPS transform to deal with tail calls and call-with-current-continuation. This got me interested in continuations and I studied Appel - Compiling with Continuations but I didn't write a compiler back then.

Continuations

Since then I've read about Abstracting and Representing Control from Olivier Danvy and written metacircular interpreters that emit continuation semantics (pure lambda terms without any control operators). I also came to prefer delimited continuations rather than call-with-current-continuation and decided that I would like to implement those directly in my own compiler. I noticed that in Marc Feeley's compiler he was able to just use the continuation semantics of call-with-current-continuation as is because it's already in CPS form, on the other hand the continuation semantics for shift and reset are not in CPS form so this technique cannot be used: one would have to CPS transform twice to get the necessary metacontinuations - but this also causes a huge blowup in the number of lambda terms. So I studied other approaches to direct implementation of shift/reset:

In the end I felt like building a runtime for my compiler to target (in C) that handled delimited continuations was something I could add later on but for now I should forget about that (and if it was require in bootstrapping I had an easy way to interpret them), so I just used Matt Might's very clear and simple explanation of hybrid higher order CPS transform How to Compile with Continuations.

Getting Started

When the got the idea to write a self hosting scheme compiler I quickly wrote out a parser that's able to parse as much of scheme as I needed as well as the file itself. Then I initially came up with something very simple that just performed closure conversion: It was a surprise to me that this was basically enough to execute lambda using C procedures for code and vectors for their environment - it just lacked garbage collection and tail call elimination. I pushed on with this hoping lack of optimisations wouldn't be an issue ending up with a system that had one compiler pass per file as well, some utility functions and a script that chained them all together. This quite successfully compiles all the small simple scheme programs I gave to it (getting a program with the Y combinator to compile and run for the first time was very exciting!) but it just isn't good enough to bootstrap.

So I looked back a bit and read over Guy Steele's RABBIT paper and decided to start again from scratch. This time I started with a C runtime that had a simple two space copy and collect GC which uses the stack as the root - the stack itself is executed by a trampoline (a while loop that pops a continuation off the stack and calls it) and the continuations pop arguments off the stack, compute something with them then push a continuation onto the stack and return. I wrote some tests for the runtime: for example compiling recursive and iterative factorial functions by hand. The value of this approach is that you get tail call optimisation "for free" it only requires doing a CPS transform in the compiler.

So the overall architecture of the compiler is now:

  • Parse a scheme file to get the s-expression tree
  • "desugaring" - I have looked into syntactic closures and I think that I would like to build a good macro system out of this in future but to start with I decided to just hard code expansions for all the special forms.
  • mutability analysis - since I use what I found out later is called "flat-closures" I had to box variables that SET! is used on
  • continuation passing style transform - this makes every call a tail call
  • lambda lifting - I added this optimisation at the end but it didn't help at all [it looks for in place lambda applications like ((lambda (var ..) body ...) param ...) and adds its free variables as extra parameters - this means that it can stack allocate them instead of heap allocate then]
  • closure conversion - this removes all free-variables from the code by creating closures (code & environment vector pairs that holds the closed over variables)
  • hoisting - I give every lambda a name and move it to the top level since the code part of each closure will be implemented as a C function
  • c-gen - this translates continuation passing procedure calls into stack machine operations that my runtime can execute, it also handles taking things off the stack to allocate them on the heap

after that actual C source code is emitted and some simple programs like an infinite lazy stream of fibonacci numbers do execute (with the correct space usage i.e. TCO and GC are working) - but when I started to head towards bootstrapping just adding a recursive definition of the EQUAL? function blow up horrible creating 40k lines of C.

Looking for more to read

I've looked for more modern things to read that might help me build a self hosting scheme compiler like:

and tried to find short readable implementations (studying real world compilers is just too hard since they are so complex and enormous)

  • http://canonical.org/~kragen/sw/urscheme/
  • https://github.com/darius/ichbins/
  • https://bitbucket.org/bunny351/bones/src

but all in all I just don't know what I'm missing - I don't want to give up though because I think there's an important difference between understanding something and actually doing it. In particular I feel like I can't progress in my study of programming languages until I pass this block (I want to experiment with building compilers using Futamura projections like PyPy does after I complete this for one thing).