An Incremental Approach to Compiler Construction

Abdulaziz Ghuloum (Indiana University), An Incremental Approach to Compiler Construction

Compilers are perceived to be magical artifacts, carefully crafted by the wizards, and unfathomable by the mere mortals. Books on compilers are better described as wizard-talk: written by and for a clique of all-knowing practitioners. Real-life compilers are too complex to serve as an educational tool. And the gap between real-life compilers and the educational toy compilers is too wide. The novice compiler writer stands puzzled facing an impenetrable barrier, “better write an interpreter instead.”


The goal of this paper is to break that barrier. We show that building a compiler can be as easy as building an interpreter.

This paper from the Scheme workshop presents an extended tutorial showing how to implement a compiler from Scheme to x86 assembly language. The compiler is written in Scheme, obviously (something that may confuse students, I might add, so beware).

An important aspect of the presentation of the material is that the compiler is built incrementally, and the product of each step is a working compiler for a growing subset of the language. This is similar to the EOPL approach. In contrast many compiler textbooks and courses follow a different route, in which each phase is dedicated to building a different part of the compiler (i.e., lexer, parser etc.), and thus the student only sees how the parts interconnect, and gets the satisfaction of seeing the compiler work, at the very end of the process.

Supporting material can be found here.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Bottom up

It's a particularly fun paper because it builds the compiler bottom up. The first thing he handles are fixnums: he moves them into eax, because that's what gcc does for a simple program that returns an integer. Then he moves on to tagging for different types of values, and so on.

A pretty cool idea, actually -- he assumes no knowledge of x86 in particular, instead using the code that gcc generates as a guide.

definitely fun

It's a particularly fun paper

I am working though the paper now and I can definitely agree that it is fun!

Nice

For the enthusiast that's not going to be building compilers for a living, it's nice to be able to get to the meat of a compiler without getting bogged down in theory, which can be explored later if further interest is there.

I'm sure it's been mentioned here before, but Jack Crenshaw's Let's Build a Compiler seems to be something along the same lines.

It has been mentioned

I first heard about Let's Buld a Compiler through this very forum, but was a bit disappointed when I tried reading it. It's a great series of articles, but it's rather long-in-the-tooth now, and I don't think the years have been kind to it. (68000 assembler + Pascal anyone?)

Well for readability, I'd

Well for readability, I'd take 68000 assembler instead of x86 assembler without any hesitation!

Minimalist good

I appreciate the minimalist style of this tutorial. Ghuloum seems to have put a lot of thought into giving the reader exactly the right amount of information -- just enough to leave the reader feeling confident that he can figure out what to do next for himself.

Previously on LtU: The 90 Minute Scheme to C Compiler.

Good, but familiar

Structure and Interpretation of Computer Programs makes the same point. Ditto for Norvig's wonderful Paradigms of AI Programming. And then of course there's Crenshaw's classic "Let's Build a Compiler!" (which now has a Forth version. Not that I get tired of papers like this, mind you :)

Minimalist Language Development

How does this approach lend itself to language development? It would be interesting to develop a language from object code primitives and build its compiler at the same time...

Compared to EOPL

How would this compare to the approach in Essentials of Programming Languages, from those who have read both?

Both? I am not sure exactly

Both? I am not sure exactly what you are referring to.

The EOPL appraoch (philosophy, if you will) is to gradually build language constructs, while having a working interepreter at all stages. That's was my main point.

As regards compilation, EOPL1 derives a compiler (and VM) from an interpreter. This specific technique is unrelated to the current paper.

Ah yes

Sorry. I somehow forgot the line in your initial post that says it is similar to EOPL in this respect.