Language Implementation 101

There has been a Language Design 101 with the comment: "Let's not to turn this into a thread about language implementation tips. We'll do that one later." I have not found this "later" so far. Here it is: How would you implement your language?

Of course, the requirements might be very different. Quick implementation to try out programs as soon as possible? Using LLVM for optimized code generation? Interpreter vs compiler? Just a language extension or a clean-room implementation? So the tips should clarify their context.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

C vs OMeta

For my current experimental language I have a nearly finished parser written in C. I intend the language to be used as an embedded scripting language (like Lua, Squirrel,...) and C-bindings are easy to embed nearly everywhere.

However, it becomes tedious. Since I do not want to include external dependencies for data structures, I'll have to implement the symbol table completely from the ground up.

To play around with my language, I'm thinking about a first implementation with OMeta/JS. This would also give a nice way to present it on the web (without downloading). However having no experience with OMeta, I wonder how it works out after the parsing, e.g. for implementing type inference.

To play around with

To play around with my language, I'm thinking about a first implementation with OMeta/JS. This would also give a nice way to present it on the web (without downloading). However having no experience with OMeta, I wonder how it works out after the parsing, e.g. for implementing type inference.

I, too, think that parser combinators and other "a la PEG" fluent interfaces defined in the meta language(*) are a nice, cheap and easy way to get started. I've read many times about OMeta/JS being a powerful helper in that respect as well (if you don't want or need to craft your own parsing support layer up).

Re: your remark about type inference, in the hope it's relevant/it could help, coincidentally, I have shared about two versions of the same HM(X) PoC I've made for my own learning process up there, on LtU. 'HTH!

(*) i.e., "implementation language"

Meta in C

Although JS is a good place to start, the Meta/OMeta technique more-or-less works in C. You definitely need an intermediate representation for type inference etc. There are some nice little data structure libraries; I used UTHash, which consists of macros only. I also found it cleaner to read/write streams everywhere, using open_memstream() to read/write string buffers.

Write your testcases first

I wouldn't even say write a quick implementation to try out programs as soon as possible. My advice is write those programs first in whatever language you have available. Find out how their implementation stumbles, and look for that common structure among them that you are going to bake into a language. See which things annoy you writing them in an existing language and check that your motivation for implementing a new language is worth the effort.

Language syntax and parsing is generally independent of the problems that you encounter in getting an implementation to work. I normally skip any kind of parser or fixed syntax and instead encode test programs in whatever intermediate representation I expect the parser to output. The semantics of those structures and how to implement them is generally where you discover if a new langage is going to "work" or not. Interpreters are a much more direct route to this discovery than compilers. If you encode your interesting testcases and write an interpreter for them as your initial approach then you are hitting the low-hanging fruit earlier. This type of implementation is not too different to the implementation of the motivating examples, and wrapping a layer of interpretation around a set of problems lets you verify that the common structure that you expected is present, and that is useful to merge it into one place.

After that it depends on your target audience: fixing the syntax and putting a parser on the front makes your implementation available to a much wider audience, while focusing on the code generation improves the quality of your solution to the motivating problems. So this also widens your audience but among a different group of people.

My example toy language with comprehensive testcases


Comments most welcome.

Edit: never mind, I misunderstood what you meant by testcases :/

I love the name :)

I love the name :)

Also love freedom from parentheses.

Got confused at first with "=" as assignment operator.

Not sure the propagation of labels is worth the performance penalty for apply.

For test cases more in the spirit of the previous poster, try answering some programming challenges that you can find on some sites (I used

A few tips

Here are a few language-agnostic tips that I've learned.

First make it correct, then make it fast. This is good advice in general, but especially true for languages. There will be many temptations to pursue optimizations before they are really necessary. If you end up making major changes in the design (or even refactoring the compiler's internals), any time invested in optimizations could be lost. So, write the code in the dumbest possible way first, and wait as long as you can before optimizing. I'm constantly surprised at how well a modern processor can execute my "dumb" code.

Know and love your intermediate formats. The language will probably have a few intermediate formats; there will probably be an AST-like format to start with, it'll probably end with some low-level bytecode, and in the middle you may have additional formats that bridge the gap between declarative and operational forms. At a high level, your compiler should be separated into several phases, where each phase spits out IR to the next.

Most of your debugging time will probably be spent staring at IR output to verify it's correct, so it will pay off to invest in some tools around this. For each IR format you'll want a way to pretty-print to the console, and you'll want the program to have command-line options that do nothing except dump the IR representation. You might also want to write a simple assembler-like parser that parses directly to IR, for testing. If you target an existing platform like JVM/LLVM then you'll benefit from existing tools around this.

As for writing a compiler versus writing an interpreter, I like writing an interpreter first; it's much easier to get started and to experiment. And it seems like, if you have a simple working interpreter, that a lot of the same code can then be reused when writing a proper compiler. That is, you could adopt your interpreter to be a trace-based compiler: it steps through every instruction and instead of executing it, writes the appropriate machine code. (I haven't written an implementation like this yet so this is a bit of speculation.)

There are languages built

There are languages built specifically for modeling semantics and executing programs in them. PLT redex is the most mature and feature-rich one I'm familiar with, and a (much more) experimental tool called K has recently been publicized. Term rewriting systems like Maude are more low-level but tend to be fairly fast. K compiles to Maude.

First figure out

First figure out what kind of formal semantics your language has. If you really want to be able to prove things and do aggressive optimizations that rely on theory and deep structure, it's best to have a simple, referentially transparent semantics that can be expressed as a denotational semantics. And you need to decide, as consistently as possible, what does and does not constitute a semantic error.

If the meaning of a given nontrivial (unbounded in complexity) subtree of your AST can be nontrivially different (unbounded in different possible meanings) depending on the context in which it appears, then your language isn't referentially transparent and you won't be able to formulate a denotational semantics for it.

Most languages have trivial subtrees that can be read with a limited number of different semantic meanings depending on context (variable names that can be read as LHS/binding or RHS/address in an assignment, for example). Modern LISPS have nontrivial (infinitely varying) subtrees that can be read with a limited number of distinct meanings (code or data) and play pretty hard with meta-semantics in terms of macros, but there exist coherent and capable lispy macrologies such as Scheme's define-syntax that can be expressed denotationally.

But languages like Perl, and lispy macrologies like Scheme's syntax-case or Common Lisp's define-macro, have nontrivial subtrees that have a nontrivial set of possible meanings and can only be parsed correctly by a system that is executing program code in the same language, possibly including code parsed earlier in the same run. You cannot write a denotational semantics for these. At best you can write an operational semantics, or a denotational semantics for a language subset capable of beginning a parse of programs written in them.

How important is this to you? It depends on your goals with the language itself. What is it supposed to be or do? I'm working on something whose major virtues are supposed to be that it's a viable language usable by human beings, and ALSO a viable translation target with 'obvious, straightforward, isomorphic' representation for code originally written in a wide variety of extant programming languages. The languages under consideration have very different semantics (functional/imperative/OO, lazy/strict/by-name arguments, static/dynamic typing and scoping, lisp macrologies, multithreading, mutable higher-order functions, etc). So I was not surprised to work out that there is no denotational semantics capable of expressing the design. That means there is a huge variety of static and runtime optimizations and transformations I will never be able to do, if I don't keep the program's AST around as a reference in case I need to undo them. But I was sort of resigned to that fact in the first place, and doing the math was just a confirmation.

But if my goal had been creating a language with programs easy to reason about, where the compiler can confidently do deep global transformations or general expression reduction and know for sure that it will not need to save enough information to undo them later if some presumed invariant changes, or statically analyze a program to reliably identify a confidently known set of semantic errors before attempting to run it, then being unable to create a denotational semantics would have been a disaster.

Aside: there is a vogue for calling any error you can identify via static analysis a "type error" these days, but in some cases I think that this may be unreasonably extending the once-precise notion of "type." I prefer "static error" in cases that go beyond simply keeping a consistent account of the meanings assigned to a specific non-executing location's bits.

And anyway, what is a semantic error? You the language designer have to decide. Division by Zero doesn't need to be a semantic error in a language that uses NaN values. Accessing a nonexistent member variable in an object doesn't have to be a semantic error in a language that auto-instantiates members or signals and handles such behavior. Doing a typed operation on the wrong type isn't a semantic error in languages with 'weak' types or the desired automatic coercions.

Even infinite loops, which are normally considered to be an error, don't have to be. If the language is multithreaded, and the "infinite" loop happens in a thread that will get terminated by another thread, bottom can be business-as-usual. (This is, btw, a possible control structure. 'DoOneOf' starts N threads, returns the result of whichever finishes first without an error, and cancels/cleans up all the remaining threads. If six or eight of 'em hit infinite loops, that's fine as long as at least one terminates correctly). Can you prove that the threads which hit bottom will in fact get terminated? It's capital-H Hard, unless you can detect that at least one of the other threads in the hypothetical 'DoOneOf' clause happens to be expressed in a sub-turing subset of the language.

The crucial question is whether having any of these things be an error, or NOT be an error, is a good idea in your language. Does it enable you to do anything you couldn't otherwise do? Is it related to other errors? Does it make your treatment of something inconsistent? And does it violate the use cases and expectations of programmers who may one day use your language?

Anyway, that's where I start. Decide what kind of semantic model your language has. If it's important to you that a consistent denotational semantics can exist, be very careful with macrologies and context-sensitivity. Formulate a list of things that constitute an error, and make sure that your design is very consistent about the way it treats those things. If you do your denotational semantics first, you will have a good idea of what your AST's are so you can design your syntax around that. Or you can just pick a simple well-known 'default' syntax like fully-parenthesized prefix notation and use that until/unless you have a better idea.


Denotational semantics is

Denotational semantics is not useful and is actually hard to reason about or even formulate. More operational approaches are more popular and practical. The most powerful methods for practical reasoning about contextual equivlence are logical relations that have an operational feel.

That's a strong claim, which

That's a strong claim, which I'd dispute. It may well be true with regards to the sort of work you've done, and the tradition you're familiar with, but there are other branches of work where it's much less the case. Denotational semantics can be incredibly useful, in moderation. For some languages (those which are referentially transparent, and especially those which are lazy), denotational semantics are an order of magnitude more reasonable to work with than small- (or even big-) step semantics. For more imperative languages, denotational semantics can be made to work with a little wrangling, but operational semantics certainly remain more common, and probably simpler (though I'd argue less powerful). Certainly, the granularity of denotational semantics with regards to latticies of definedness and the like can be a bit of a pain. But for "practical" reasoning we can use plenty of results to avoid tangling with the intricacies and just work at a fairly high-level of equational reasoning.

Personally, I find logical relations much harder to work with, though certainly of great importance. If you have some examples in mind of work that uses logical relations with an "operational feel" to good effect for "practical reasoning" (the former qualification, i suspect, ruling out free theorems and related), I'd be very interested.

Bootstrap early

Is your language designed to be general purpose? If so, consider writing an interpreter in another language first, and then writing a "real" implementation in the language itself, using the interpreter to bootstrap. Don't bother trying to write a compiler in the bootstrap language. You'll get more experience with tackling real-world problems in your language if your implementation is written in the same language.

The downside is that making major changes to your language requires a delicate bootstrap dance, and bugs in either the bootstrap interpreter or compiler can be tricky to tiptoe around. Also, until you are fully bootstrapped and can jettison the first interpreter, you'll have to make changes in two language implementations.

Also, consider being meta

Also, consider being meta circular :)


Go deep before you go wide. Get a few examples running with a small subset of the language. Expand incrementally, based on relevance or importance.

Try many different things

Write interpreters, compilers, JITs in different high- and low-level languages.

well, not in C++ ...

I'll try to provoke new ideas, or to amuse, rather than be really technically useful. (I would have liked this reply myself ten years ago.) But as a practical matter, implementing a language at least twice, in different ways, may help debug via comparison.

qznc: How would you implement your language?

Here's a story about how not to do it; I think it only has entertainment value. Around 1992 I started writing a (Lisp syntax) Dylan compiler in C++, for fun, but I got stuck when I reached the point where types were processed. To my horror, I realized the spec required the runtime to be present at compile time, to evaluate arbitrary expressions at compile time. I was slightly angry about it. :-) But it started me thinking about how much freedom I might have had working in assembler instead, whereas I was pretty wedged in C++, since I hadn't planned on a re-entrant compiler.

That got me really interested in continuations, though now I'm not sure why. How would they have fixed my problem? Something about Appel's description of compiling with continuations in Standard ML made me think really fluid control flow would let me get out of jams. I guess it was just an excuse to think outside the box more. I saw continuations as a means of context tricks giving more scope control, so I'd never find myself painted in a corner.

If I was going to implement a language now, I'd do it in C, but in a weird runtime variant using cactus stacks and green threads, so green thread support would end up in the resulting language too. Most languages without intrinsic async support (letting you write in direct style without lots of callbacks) look kind of similar to me in terms of sorts of results you can get. That's one reason Go looks novel to me: it tries to step out of the everything-is-sync model so standard elsewhere.

Then I'd compile to C, but assuming that weird C runtime just noted is present, or can be reached by another stage rewriting standard sync C into async stackless C with some CPS aspects to it. (This doesn't really sound fun now, the way it would have once; everything sounds like work to me now -- just stuff that needs doing. :-)

All other techniques noted sound like good ideas, especially when you can try things cheaply at first, getting early feedback before you commit to something hard to change. But choice of tools can limit scope of choices, if a tool thinks only certain things can occur at runtime or compile time. You need to pick basic runtime features you want, like garbage collection or green threads, before your tool picks for you and herds you into line.

I'd recommend thinking about how a language interacts with the host operating system. You might ask folks what models of OS relationship affect implementation a lot.

To my horror, I realized the

To my horror, I realized the spec required the runtime to be present at compile time, to evaluate arbitrary expressions at compile time. I was slightly angry about it. :-)

Out of curiosity, what corner of the language necessitated that?

I ask because I'm working on a bytecode compiler/interpreter for a hobby language that's quite similar to Dylan and I just ran into the same problem.


I'm guessing it's mainly because of macros.

in syntax to type variables

Instead of looking it up, I'll go by memory only, so there's a chance I'll confabulate something. There was a Dylan book and several spec refinements and enhancements as separate short documents. I forget where the part bothering me was located. It might have been in the slot specification, which I think was a separate document. I don't recall how many different points in Dylan syntax had the issue, but I think slot specifications did.

Since Dylan was a Lisp variant, it had typed values. But for optional static typing, it also had optional typed variables. A language with typed variables must have some way to associate variable and type. For example, if a slot in an object has a type, there's a spot in syntax where type is specified.

I naively assumed each type was a static expression, but it was dynamic: the type was a value returned by an expression, and (in the general case) was known only when a type expression executed at compile time. And yet to generate the value from an expression, you needed a working runtime at compile time to evaluate all expressions permitted in that spot.

It was the only place I noticed a working runtime at compile time was needed. Lack of advance warning shocked me. I supposed Dylan folks followed Common Lisp convention here in assuming a version of the end system also existed at compile time. (Dylan was developed at Apple's Cambridge office by the folks who worked on Apple's Common Lisp, including David Moon, who I think was most famous in the group.)

I should have just implemented a different language -- not quite Dylan -- which had static type expressions only. That's how you ship things: first do something simple that works, then chase down every last detail. But I was doing it to find out things, and that one led me on an entertaining tangent.

I don't know whether to be reassured or horrified.

I don't know whether to be reassured or horrified. That's the exact same issue that I have. My language is multimethod based. Each method specifies a pattern (in the pattern-matching sense) that it uses to select that method.

Ideally, I'd be able to statically compile a multimethod all at once, but to do that, I need to linearize those patterns (well, topologically sort, but close enough). But those patterns can contain expressions, which I would need to evaluate for that to work.

I haven't come up with a clean solution yet. Do you happen to remember how the Dylan compiler handled that? Did it just rely on being able to interpret expressions at compile time?

finite concrete trusted primitive sub-graph for basis?

Your language sounds interesting, but then I'm biased to like Dylan. :-) I never quite felt happy about generic multimethods; the cost/benefit tradeoff of obfuscation versus flexibility involved seemed slightly off, or else I was overly sensitive to added difficulty in finding code invoked for sure by a given call site. I think it's just personal taste that I'd rather avoid non-determinism when I can. (And when I can't, like in async code, I want more explicit control handles.)

I recall dispatch precedence for Dylan generic methods was defined as following the same scheme defined by Steele in the Common Lisp spec. This mattered a lot when it came to which method runs if you call next-method, which was Dylan's moral equivalent of a superclass method call. You might look at that part of Common Lisp's spec, in case it gives you an idea how to write the spec for your plan in a way developers are sure they understand. (Maybe you did already -- seems likely.)

I don't know how the Dylan compiler in Common Lisp handled that part. (The development env wanted at least 20MB of RAM, and we only had 16MB machines standard then for devs working on Pink at Apple in Cupertino; so it wasn't practical for me to use the compiler written in Common Lisp.) I was shy about asking, partly because I was junior then, and partly because the Lisp and C++ folks maintained a mutual formal distance, despite planning to cooperate. I was considered slightly nutty for liking both C++ and Lisp, but it meant I got to attend rare Dylan events.

Frankly, the Lisp devs were considered magicians, with implied non-scaleabilty.(For example, I heard this phrase several times: "Only David Moon needs to understand how the common memory management involved will work." I found the spec for cross C++/Lisp memory management mildly disturbing. Don't recall that spec's codename. [Edit: it was Creole.])

So all I can offer is rumination based on analogy to related storage problems. I worked on storage for a while despite liking languages more, because I thought apps that didn't remember anything were not very interesting. So I got side-tracked by designs for ideal efficient low-end no-SQL stores for dynamic languages run in resource starved contexts, like personal computers with just several megs of RAM to use, which explains why I know how to use resources efficiently now.

If a system has means to be self-describing, you want a fixed point basis for bootstrapping somewhere. There must be something that makes sense by itself, which doesn't require a meta-description to use, because it's part of the primitive trusted base. You want your graph of dependencies to ultimately resolve into to references to those primitives, preferably so a topological sort of dependencies has a static resolution. Cycles in primitives are okay, as long as they're defined ab initio, since they can't change and must be accepted by fiat. (For example, think about how Object, Class, and Metaclass relate in a Smalltalk class hierarchy.)

This is essentially a restatement of what makes a valid inductive proof, with a focus on the recursion present in self-describing meta-information. As you follow definitions to resolve their meaning, the scope of remaining work has to get smaller until it terminates in a basis case. You can probably constrain patterns so -- instead of being fully general -- they might be defined as having a resolution in terms of a starting set of primitive patterns that must be present when your language boostraps. Thus a pattern graph is anchored in something concrete.

Hope that helps provoke a line of thinking that leads somewhere useful.

I recall dispatch precedence

I recall dispatch precedence for Dylan generic methods was defined as following the same scheme defined by Steele in the Common Lisp spec.

I believe Dylan uses (and invented) C3 linearization which is what Python also uses. If I remember right (unlikely) that was a bit different than the original linearization CLOS used in some corner cases, I think.

Magpie has a slightly different algorithm in part because methods in Magpie have patterns and not just a fixed-length argument list. So it has to be able to handle things like determining precedence between a nested record and a type pattern and all sorts of other differently-shaped patterns.

Hope that helps provoke a line of thinking that leads somewhere useful.

Yes, thank you, it really does. This along with skimming some Scheme has given me lots of food for thought. Thanks!


Slate (and Atomo/Atomy to a greater extent) performs a much simpler thing than Dylan, defining a very restricted grammar for type expressions.

Isn't your dispatch

Isn't your dispatch mechanism very close to Instance Chains? Of course, the type matching language is different from the expression language.

It may very well be, but I

It may very well be, but I need to work on my Haskell skills before I can wade through that paper. Thanks for pointing it out, though. I'll try to get through it eventually.

Compiler-only bootstrap

It is not difficult to find a bootstrapping fixpoint in a compiler-only implementation, so that the compiler is able to evaluate arbitrary expressions at compile-time.

Lisp compilers must be able to evaluate arbitrary expressions at compile-time (due to procedural macros). Often, they require a separate interpreter, a different compiler, or an earlier version of the same compiler. But there's a different way.

I've implemented Lisp->C and Lisp->JS compilers that do away with these requirements.

In effect, you have an ordinary runtime of your language at compile-time, and evaluate (compile, link, and load) expressions that have compile-time effects in this "compile-time runtime". In Lisp, examples of such expressions are macro definitions (they populate an in-compiler table mapping macro names to expander functions) and expressions explicitly wrapped in EVAL-WHEN-COMPILE (which may used to define utility functions that are used by macro expander functions).

During compilation of a file, if the compiler encounters an expression with a compile-time effect, it compiles that expression, and immediately loads it into the compile-time runtime. My Lisp->C compiler separately processes Lisp files and spits out one .so shared object file for each. In addition, if the file contains expressions with compile-time effects, there will be another object file containing the code for those expressions. [This second object file effectively contains the concatenation of the individual compile-time expressions evaluated during compilation of the input file.]

I haven't found any difficulties with this approach, and I think it's applicable to any language that needs compile-time expression evaluation and where a compiler-only implementation is desired.

Axis of Eval link

During compilation of a file, if the compiler encounters an expression with a compile-time effect, it compiles that expression, and immediately loads it into the compile-time runtime. My Lisp->C compiler separately processes Lisp files and spits out one .so shared object file for each. In addition, if the file contains expressions with compile-time effects, there will be another object file containing the code for those expressions. [This second object file effectively contains the concatenation of the individual compile-time expressions evaluated during compilation of the input file.]

Is there a Part II article? I was left hanging and wondering what the CFASL file is used for. Is it to be able to use the macros in separately compiled modules?

No second part

Yes, the CFASL of a module M contains M's compile-time effects, and so you load it for compiling modules that import M's macros/compile-time effects.

Flatt's Composable and Compilable Macros has details on separate compilation in Lisp/Scheme.

Limited types

Are you thinking of Dylan's limited types?

some other design note besides limited types

No, not the limited types. It was likely a design note on slots. But we're talking about a design note from 1992 (correction: 1993) when the syntax was still Lisp (using parentheses), which preceded the later Algol style syntax.

There's a chance I have a copy of all the Dylan design notes on a laptop hard disk, after transfer from device to device over twenty years. I'll have a look, and maybe post the one I have in mind if I find it. (If Ehud won't hit me for doing so.)

Edit: I was unable to find the issue I mentioned after scanning a 1992 Dylan reference manual, as well as the first 21 auxiliary design notes. And it seems slots were a red herring because design note #9: Punt Slot Descriptors actually retracts slot descriptors from the language, even if there had been a type issue there. The manual even says the type: keyword argument can only take class or singleton specializers, rather than general expressions, so that couldn't be it either. It isn't easy to find, if it's there.

So my report is unsubstantiated, and should be downgraded in expected veracity, since all I have is the memory of hours of analysis after realizing I was stuck. Looking at the manual now, it looks really small, and taken together with design notes, the entire feel is one of a work in progress. So a hitch in spec detail ought to have little consequence anyway. Maybe I had an episode of newbie panic in response to something easily redefined.

1. Try to delay thinking

1. Try to delay thinking about parsing as long as possible.
2. Try not to think about executable (binary) file formats.

Trivial, I know. But many newcomers go down these rabbit holes never to come back.

How would you get to test stage then?

If you are delaying the parsing stage, how would you start to test your ideas? Would you suggest writing out the raw code, then hand-coding an AST (or RPN stack), and then developing the operators and memory semantics from there?

I know the major mistake I made on my first go-around was not having a good idea of variable storage / scope / lifetime. For my second try, I'm starting off with figuring out garbage collection and data scope, then building out from there. Also, along with your suggestion of delaying parsing, I'm starting out with RPN (Postfix) notation, since that is a simple thing to parse, and really what gives a language unique capabilities is what happens behind the scenes, the means of abstraction and combination (as was so eloquently stated by Abelson and Sussman).

With that being said, does anyone have any really good references on implementing garbage collection, without relying on anything special in the implementation language? Ideally it would describe algorithms that could easily be implemented in C or Assembler.

Starting at a Higher Level

You could easily use s-expressions, JSON, etc. to provide the initial structure. Reduces need for parsing.

But if you want to focus on your language ideas and testing them, I suggest embedding in a well established language - Haskell or Scheme or ML. Or compile to an established language that provides a runtime (e.g. compile to JavaScript, Clojure, Scala, or Haskell).

For my second try, I'm starting off with figuring out garbage collection

A simple bi-space Cheney GC takes about ten minutes to write in C.

GC is another rabbit hole. Don't fall in. I lost about a year down there. Henry Baker became a regular GC lagamorphologist.

The problem I'm having with GC...

The problem I'm having with GC (or memory management in general), is that although the simple ones look easy to implement, looking through the descriptions it seems to leave out some key items. For example, they all mention something about "when memory runs low", but on modern systems (multiple gigs of ram), you probably will never run low on memory for most trivial programs. So I'm assuming that you can set low and high water marks -- whenever you hit the high water mark, garbage collect. If you don't free up enough to increase your heap so that you fall below your low mark, then malloc some more memory until you have enough reserve. Now the high and low water marks can be absolute values, or can be a percentage of the most recent heap size. For example, start of with 1 MB allocated. Once that hits 100% full, garbage collect, and if you don't have 50% free afterwards, add to the heap until you have that 50% free.

Now the above makes sense to me, but I'd like to see a paper that describes this -- chances are it will also go over other implementation details that I haven't thought of. (I hope I'm making sense).

Growing the Heap

It is common to grow the heap as necessary. You can find many papers on the subject if it interests you.

Your technique would work. When I used Cheney's algorithm, I resized the heap if I was over 70% full after GC twice in a row. (I'd allocate more on the third GC, or after the first one if I didn't gain enough space for whatever I was allocating.)

Keep in mind that there lots of performance knobs and heuristics, resulting in a very large performance space, which ultimately depends on both the application and the hardware. Don't worry too much about getting them "right" (that's an optimization problem). Just get things working.

That is basically is how my

That is basically is how my collector works. The first -real- problem you encounter with GC, I found, is not how to write a collector, but when, and what, to collect. So, in my case, the collector is Cheney style, when became in between function calls, and what became everything in the from-space, but not in the statically allocated space.

GC isn't as simple as just choosing one algorithm, the implementation will warp all invariants thus that you always end up writing a collector from scratch, and the collectors are tightly bound to the language design and compilation scheme used.

Compiler/Runtime interface

I found the most difficult part of GC to be the interface between the compiler and the runtime system. For example, generating appropriate information to walk the stack and find the GC roots. I ended up shooting for a very dense encoding that was difficult to debug, both encoding and decoding. It also requires a bit of support in the register allocator for liveness. Writing the collector is easy!

On a broader note, the whole compiler/runtime interface is an extremely difficult knot to disentangle. We wrote a paper on this subject back in my Maxine days:

Improving compiler-runtime separation with XIR.

I found the most difficult

I found the most difficult part of GC to be the interface between the compiler and the runtime system.

There is indeed an interesting pair of relations not very often investigated between the compiler/translator, the runtime, and the GC. Of relevant interest, one can also check this SmartEiffel paper out, for instance: Compiler Support to Customize the Mark and Sweep Algorithm.

Henderson technique

For prototyping purposes, if you are emitting C code, consider using the Henderson technique as an interim implementation. It will cost you a bit of performance, but it will let you defer the challenge of dealing with GC-related code generation issues.

The link-up between code generation and the low-level GC runtime is important, and I agree with Ben that it's a significant challenge. That said, it is largely orthogonal to all of the other aspects of the language implementation, so using the Henderson approach as an interim solution probably won't lead you down any bad paths that you will have to back out of later.

If you can express your code within the safe subset of CLI, using that is a huge win. It will get you some (awkward) early debugging support as well.

A more interesting challenge is compiling for concurrency. That one doesn't have any neat interim solutions (at least so far as I know).

I meant using hand coded

I meant using hand coded ASTs as the input, allowing you to figure out the more critical aspects of the implementation. If you are working where s-expressions are available, use those, but the point stands regardless.

It quite depends on what

It quite depends on what phase you are at. The hand-coded AST approach is a great way to figure things out (I do this all the time in even in C#, great for pre-parser work).

But then again, building a parser isn't really that hard, and it can improve the design feedback process substantially; you can "see" more (psychologically, not formally, of course) when your language has a real syntax, rather than as just a more fluent library. Perhaps other languages (e.g., s-exprs) can help you see more before the real syntax phase; I'm not sure.

Now, if you are building a programming experience (editor, language, debugger), you should wind up with your own syntax much sooner to make progress.

I agree.

I agree.

suggest mark-and-sweep or stop-and-copy

derekp: Ideally it would describe algorithms that could easily be implemented in C or Assembler.

A goal of "clear and easy" tends to mean one or another flavor of stop-the-world garbage collection, where nothing else happens during gc. Fancy gc gets more complex, to have subtle effects, like incremental collection to avoid mode switch between gc and normal processing. You can go simple first, then fancy later.

If you want to implement Cheney style stop-and-copy, there's a description in Appel's Compiling with Continuations which is so clear you can read it, standing up in the bookstore, then tell yourself, "Yeah, I can do that easily." Maybe it's online too.

I'm unsure where to find great mark-and-sweep descriptions. The C code in David Betz's XLisp was so clear, I'd recommend reading an old version of the code instead of an English description. I'm only familiar with XLisp 1.5 from the 80's. If a later version gets too complex to be quite as helpful, I don't know which one to prefer.

Both stop-and-copy and mark-and-sweep are precise gc, because they know which pointers are valid still-living references, as opposed to conservative gc like a Boehm collection which assumes anything that looks like a pointer might be one and keeps memory pessimistically.

Typically mark-and-sweep neither de-fragments nor coalesces, because nothing moves. It just frees any previously used node that is no longer reachable from your root set of references (in stack frames still alive, or in global definitions still visible in module dictionaries). In a simple system like XLisp, this might imply all nodes are the same size, like a variant struct that unions every possible value type.

A slightly disturbing tactic you can see in mark-and-sweep is pointer reversal, where you modify nodes themselves to keep track of your graph traversal, before putting nodes back into original states before gc is done. (Otherwise you might have to keep track of gc state in additional memory.)

The easiest sort of Cheney stop-and-copy has an old (full) from-space and a new (empty) to-space, typically called hemispheres, where you copy still-living content from the from-space to the to-space, while leaving no longer reachable objects behind, and uncopied (... unless you want to support finalization, and might need to copy dead state one last time so you can call final methods). If you divide all available gc RAM in two contiguous pieces this way, the result is 100% overhead, as you might have heard folks claim is always true for gc. You can avoid that by getting fancy later.

Copying that way moves everything, so addresses change, which is perfectly safe as long as anyone who previously held a pointer into from-space now holds a pointer to a still living object in to-space instead after gc. Clearly that's very hostile to C and C++ unless expected and handled. To offset the cost of that drawback, you get benefits of compacting and shrinking the working set for memory still used. (And you typically end up with objects relocated near each other when they have a parent/child relationship.)

The main trick with Cheney is forward references: any time you copy object X, previously at X-from and now located at Y-to, you replace the old content of X in the from-space with a forward reference to Y-to. In this way, the from-space becomes a dictionary of new object positions as you go. After you copy an object once, you subsequently just update pointers to its new to-space address. The goal is to end up with a clone of your old from-space graph in your to-space, isomorphic to the old graph, with every old pointer swizzled to become the new pointer when something is still alive.

A short gloss of the Cheney algorithm sounds like this. First copy your root set -- globals and stack references -- to the start of the to-space. Now assign two cursors, say first and last, pointing to the first copied object and the last copied object, to represent the remaining work to be done in gc. (When those two cursors meet, you're done, modulo off-by-one of course.) Those two cursors create a todo list. For every object X in the to-space todo list, you examine every pointer inside to another object Y. Each reference to Y-from in from-space is changed to Y-to inside to-space. Either you copy Y-from to Y-to now, because it hasn't been copied yet, or else you find the forward reference to Y-to and use that. The address of each object Y copied to the to-space becomes the new last cursor, extending the todo list.

Afterwards, every X pointer in from-space has become a Y pointer in to-space. And if multiple copies of pointer X existed before, now they have all become the same new Y pointer in to-space. Free space for further allocation starts right after the last copied object.

(Anything that lives forever could live in a separate perm-space, in a range of addresses clearly different from the from-space and to-space, so it will be clear neither copying nor pointer swizzling will be necessary when a pointer into perm-space is encountered. You just leave perm-space pointers alone.)

Well put

That's a very nice, clear description of Cheney. I imagine it will be helpful for many readers down the line.

Last time I checked, the implementation of Lua contained a reasonably simple incremental mark/sweep collector. There are a few subtleties, but (again, last time I checked) no more than I would expect: incremental mark/sweep is a kind of Rubicon where GC becomes more complicated, as McC hinted above. (It's often worth it, but not always.) Anyway, Lua uses "a variant struct that unions every possible value type," which definitely helps keep the implementation clear.

weak reference support

If that description of Cheney is worth reading, I'll extend it here with a way to add weak ref support (which I recommend as a caching weenie: weak refs help caches avoid permanently referencing content).

The basic Cheney algorithm deals with strong ref pointers. Here "strong" means: if a pointer refers to space, that space is reachable and must be copied. So strong means "must be retained".

A weak ref points to something that might be copied, provided a strong ref exists somewhere else forcing the copy. An object reachable by only weak refs is weakly referenced only, and can be garbage collected. (Sorry if that verges on tautological.) When a weakly referenced object is collected, any weak pointers to it are replaced with nil, or false (or some other distinguished value showing the object is gone when tested).

Here's how you modify Cheney for weak refs. Say any object that contains some weak refs (they need not all be weak) possesses a true has-weak attribute. When a has-weak object is copied to the to-space, we want this done so we can efficiently visit them all again when gc is done. You can link all has-weak objects together in a list, or simply segregrate has-weak objects from other copied objects in to-space. For example, we might copy has-weak objects to the end of to-space (so free to-space is the span between used bytes at the front, and used space at the end where has-weak objects were copied during gc).

Then during Cheney, we process our todo list in to-space as normal, but with two changes. First, we track has-weak objects so we can revisit them, and second, any actual weak ref pointer is left alone so it continues pointing into the original from-space until we reach the end. So Cheney proceeds as described earlier, but weak ref pointers do not force a copy, and are not swizzled to point into to-space.

When the two cursors, first and last, finally meet so gc ends, now we have one more step to post-process all has-weak objects. For each has-weak object W, we look at each weak ref pointer X-from pointing into from-space. If the object at X-from contains a forward reference, that means it was copied to to-space because it was strongly referenced, so we swizzle the pointer to refer to the new Y-to location of that object. Otherwise if the object at X-from does not contain a forward reference, that means it was only weakly referenced and is now garbage -- so we nil out the weak ref pointer to show the object was lost.

That's it, unless you add finalization as a wrinkle. If an object had a forward reference because it was copied, but it was really garbage and only copied to dispatch a final method after gc, then you handle the pending-final attribute the same as if no forward reference was found: nil out each weak ref to it.

weak reference example

Lua's garbage collector also has a pretty simple implementation of weak references.

If you don't want to go old-skool..

Then there's always the option of implementing an interpreter in Java, or some other GC'ed language. That should give GC for free.

Not convinced

I agree with McCusker that it's better to wrap your head around GC with a simple system. Stop-and-copy is a good choice, but mark-compact is okay too. Implementations of these are usually very small and fairly easy to understand. The TinyScheme collector is pretty simple, for example. I'm not a big fan of the pointer reversing techniques, since they can't be used in concurrent collectors.

But once you're done with those, it's worth looking at some of the papers on concurrent collectors. This is relevant because the concurrent designs impose some requirements on the "shape" of your data structure representations and on the low-level semantics of data structure updates. You don't want to worry about the details of those too early, but you at least want to read enough to understand in a general way what the issues are so that you don't step on them by mistake.

it does pay to plan ahead early

I think you're actually going to get me to read about concurrent gc this way. :-) That will be distracting, since I can't easily turn off a module in my mind roughing out implementation as a concrete example when reading. Everything about threads, green or native, soaks up a lot of energy.

I'm interested in data structure "shape" issues you mentioned, if you feel like expanding. Shape concerns also affect plans for immutable data structures and copy-on-write (COW) support, and might be related to concurrent gc shape scenarios. For COW support, it's important to avoid strongly connected data structures, so children don't know parents or siblings, since that allows multiple parents to share a child and patch an immutable child via replacement, without breaking inter-sibling links (because there aren't any).

For example, if you want to do copy-on-write btrees, you can't let leaf nodes link each other directly (since that would break COW patching of immutable children) even if that would be convenient. I was kinda-sorta planning on efficient big-string implementations that way in gc space, with COW and scatter/gather. (Ropes?) Maybe there's a similar interesting interaction with concurrent gc I should think about.

Concurrent GC

I don't think that COW B-trees are impeded by concurrent collectors - all of the concurrent mechanisms have to deal with forwarding and re-mapping issues in any case. The C4 collector handles this with page re-mapping.

The main issue, so far as I can tell, has to do with the low-level implementation of vectors/arrays (but not necessarily strings). The problem is that the concurrent collector has to have a small enough unit of work that it makes progress on its copy operations even if the mutator is actively changing the structure while the copy is underway (which typically requires the copy to re-start). In practical terms, this means that you want an upper bound on the number of words in any runtime-level object that may be the subject of a copy operation by the GC. Unfortunately, that bound probably wants to be less than the system page size, which means that vectors probably need to be implemented as some form of "chunky" structure. It's potentially a concern for very large objects as well, but those are pretty unusual in practice.

In my opinion, it's OK for a concurrent collector to halt the mutator when operating in extremis. There has been a tendency in the research papers to seek completely non-blocking designs, but this complicates things quite a bit. There are a whole bunch of other system effects that can cause the mutator to halt in any case, so I'm not convinced that a micro-pause imposed by the collector is really that big a deal. By "micro-pause" I mean a pause short enough that an implementation with spin locks is credible. I would really like to see an evaluation of a collector that starts from an "optimistically non-blocking" perspective but is prepared to fall back to use of spin locks when active contention arises. Hmm. Come to think of it, the C4 collector essentially does what I am talking about by temporarily unmapping pages during copy.

The papers I would suggest on this subject are:

In my opinion, the last one is the most interesting work to date, but the discussion in the Stopless paper is very helpful. They filed a patent on incremental stack scanning, but I strongly suspect that prior art exists on the important parts.

Which papers do you

Which papers do you recommend?

See this comment

Answered here.

If you want to implement

If you want to implement Cheney style stop-and-copy, there's a description in Appel's Compiling with Continuations which is so clear you can read it, standing up in the bookstore, then tell yourself, "Yeah, I can do that easily." Maybe it's online too.

I was able to get a basic Cheney-style collector working just from the wikipedia article and the linked to paper. For what it's worth, the code is here. It's pretty simple, but I make no claims about its quality.

Even simpler than Cheney, though, is just ref counting. The first interpreter I wrote in C++ just used a simple ref-counted smart pointer class. If you just want to get something up and running, I don't think you can do any better than that.

Well, yes, you can: just don't collect anything at all. You'd be surprised how far your little toy programs will get before they exhaust all memory on today's machines. :)

mixed models

munificent: If you just want to get something up and running, I don't think you can do any better than that.

Refcounting is often considered a subtype of garbage collection. Administration can get complex, though, depending on assumptions and requirements you add (e.g. worst case consequences of native threads has unpleasant cache line sync costs). We could easily have a long rc-as-gc subthread. But as a 101 focus, maybe we should say it works but may have consequences if you plan to evolve and/or grow.

As a form of gc, it's not quite as agnostic, separable, or transparent as other kinds of gc. Code could know about refcounts in a way that interferes with swapping in different gc later. Space cost to record a per-instance count of refs has to be considered overhead, and extra size can have indirect time cost too. (I don't actually want to go into the litany of details; just note those details exist.)

I'm inclined to use both refcounting and another sort of gc for different purposes. For example, if you wanted a library version of green threads for embedding (in C apps with soft real time requirements), then refcounted cactus stack frames is a way to avoid another sort of gc which one's colleagues might veto. I know folks who won't object to refcounted frames; it's actually a more organized way of doing business than older options that pass muster too.

(Freeing refcounted objects might have to be async though, so it blocks a green thread if too much becomes free all at once, to avoid high latency to deallocate.)

Then in a different context, using that green thread library, you could implement a language using gc for everything except stack frames, where one of the value types is "reference to refcounted object". Then the runtime would be hybrid. Code interpreted here might be compiled to run in the other context, which only uses the library with refcounted cactus stack frames. In particular, writing tests would be much nicer in a high level language. Being able to compile tests for stress loads would still be helpful.

Mixing it up that way doesn't seem like a 101 place to start though. But it might help someone flesh out plans, or write requirements, since it's important to imagine what you want to have happen when you're done and this sort of weird case might spur different ideas.

just don't collect anything at all

That's a good way to start with gc as a just a stub to be added later. Other than latency, behavior shouldn't change when you gc. So tests written before gc should have the same results after gc. Then even later, collecting after every allocation is interesting way to search for bugs, to see if you ever manage to create a spot where gc can't figure out a newly allocated thing is still reachable.

BitC used S-expressions

In BitC, we made an initial decision to use an S-expression surface syntax in order to avoid getting trapped in syntax debates. S-expressions are pretty close to ASTs, and very simple to implement (at least in principle). Another option would be to use some form of XML. That has the advantage that you can specify a validation schema externally, and you can use the DOM tree as a (very) high-level AST.

But I would add that the S-expression syntax did come back to haunt us. It wasn't very good for type definitions, and when we went to build our mixfix mechanisms we found some things that didn't translate well into the final surface syntax. Not many, but a few.

In hindsight, I wouldn't do S-expressions again, but I would certainly adopt something as close as possible to an existing language until I had a clear-cut reason to depart. There are only so many ways to declare types, for example.

delay thinking about parsing

Given my experience in developing my hobby language for several years, I must say this is really important suggestion. Actually, that's what I was doing, I paid the least attention to parsing for quite long time, because it was the least interesting thing in implementing interpreters or compilers. If I had my focus on that, I probably had already got bored and given up years ago!

You can do some fun things

You can do some fun things with parsing....try writing a live programming language.

You are right on this

There may be many challenging problems in parsing for a live programming language (I doubt new comers will start with a live programming language). Anything challenging is unlikely to be boring. So I guess the bottom line is, don't get bored, which means focusing on the right things while designing and implementing a language.

Approaching parsing as a

Approaching parsing as a stable model with probability (to handle errors) is quite interesting. I can think of some other interesting approaches, too, such as modeling syntax in terms of a widget structure (with live views of data, sliders and knobs and dials and charts, and other canvas and manipulation elements built right into the syntax).

Sounds fun. I'm already into

Sounds fun. I'm already into stability via damage, repair, damage more, repair more. Memoization helps to isolate errors, but I haven't thought about using probability models yet.

Live syntax is cool also, ala Bret's presentation or Live text in Flogo II. Sounds like a nice project...

File format?

I agree with what I think Ehud is trying to say here, but I think that file formats are worth some thought. You need to make a fairly early decision about whether you are doing a whole-program language or a unit-of-compilation language. This has significant implications for compile-time and run-time visibility, and it's a hard decision to change. In the unit-of-compilation case, it's important to understand about linkage visibility, and also about the static initialization mechanisms that are implemented in various dynamic library schemes. These will manifest as constraints on what you can do in your module/interface system.

Thinking about the

Thinking about the compilation model is important. This is more of a language design issue than a straightforward implementation strategy question, though.

Implementing debugging?

What implementation approach would allow easy debugging of code in the new language? Compiling to a host language/VM and using its debugger isn't a pleasant experience if your language has significantly different semantics than the target language, or is much higher level than the target VM. Is there an alternative to writing a debugger, complete with ability to compile code in a way that enables pause/step/resume/conditional breakpoint?

use an interpreter

It seems to me that the fastest path is to instrument your interpreter. It can be a bytecode interpreter. You'll have to build a debugger anyway, and if you have enough resources you can see how to extend your debugger to compiled code. A few of the Self papers were about deoptimizing compiled code to enable it to be debugged.

Another option is to extend your host debugger to work with your language. Depending on what your language is like, you could emit the appropriate DWARF info and cajole GDB into understanding it.

On the CLR, for one, there

On the CLR, for one, there is for instance this "classic" example, in the C# compiler called “Blue” that Mike Stall contributed a while ago (with full C# source code provided), as a (quoting)

"giant reflection-emit test [...] emitting debugging information for all the code constructs (sequence points, local names, parameter names, etc) [...] an extreme example [here] because it includes most C# language features and generates fully debuggable + verifiable code.

Not my experience...

My compiler compiles down to combinators which are all translated to C functions which are trampolined. Also, all runtime structures are printable to 'okishly' readable formats; the combinators also hold code for printing the associated identifier.

The end result is that I can use the debugger of C to trace the runtime behavior, and at every moment print a 'trace' -i.e., the term as it is being evaluated- of whatever is going on.

It ain't that bad.

Counterpoint to "interpreter first"

A number of people have offered encouragement to write an interpreter prior to tackling a compiler. I think this is sound advice in general, but I would at least offer one caveat... Until you have some considerable experience, it's very easy to write the wrong kind of interpreter and end up in serious trouble with either insane semantics or no route to a faster implementation. This has happened to plenty of languages (including quite a number of today's very prominent "dynamic" languages), often because the initial implementation was an AST-walking interpreter or something with even less phase separation than that.

If you're not careful, it's quite easy to end up with semantics that rely on all kinds of quirks of the interpreter, even in such seemingly unrelated areas as scoping and the precise meaning of literal values. (Are literals mutable? Is a fresh value constructed each time code containing the literal is reached? In Python, for instance, the answer is "sometimes yes, sometimes no"...) It's even easier to end up with a language that is just incorrigibly imperative (think Ruby, Python) where basically the only thing one can say statically is "run it and see."

Therefore, I would strongly suggest something with a fairly strict phase separation and well-defined IR, such as a bytecode interpreter, abstract machine or some moral equivalent. This will really clarify your thinking about what happens when (and how) and give you a decent if very informal basis for operational reasoning about code. With a little care, it will also keep the door open to an eventual native code implementation (whether AOT or JIT). Lua is a fine example of this approach.

So, yes, I guess I would also suggest some form of interpreter prior to any kind of native code compiler, for many reasons, but be careful what kind of interpreter you choose...

Yep. We have to figure out


We have to figure out what we need erase (not look at) during execution; e.g., consider Gilad's law that static type checking should be unrelated to execution semantics. I went through a phase in my most recent language where I was conflating binding with type information. It was cool, powerful, and insanely undefined! Translating to a differently expressive format (e.g., IL) is definitely a nice way of pushing this point.

Phase separation is not enough: Java

The Java language has some semantic corners, which lead to inefficencies when compiled (GCJ).

For example, static initializers of are called "lazily". Since they can depend on each other, the result can depend on dynamic factors. In contrast, in C++ static initializers are all executed before main is called.

Another thing are interfaces. GCJ adds a runtime library for interface calls. It essentially does a linear search over method names (reflection) to find the method to call.


Right, maybe I should have been more clear. When it comes to a simple and fast native implementation, a clear phase separation is necessary but not sufficient.

If you're willing to give up on "simple", the phase separation is really not even necessary: HotSpot, GHC and SBCL all jump to mind. But these systems are anything but simple, and in the case of something like Common Lisp it's probably even more important to be clear about what representations are available when and how they're all related.

I'm actually for library

I'm actually for library first, then embedded DSL / specializer / macro-expander, and only then full-on interpreter and then compiler. In parallel, I'm 100% application-driven. No app, no language.

This isn't so much for general adoption (though it helps), but to iterate extensively. Even after I have an intuition for a feature, I need to understand the problem and then how different solutions work with it. I have consistently found my assumptions going in to a design problem to be off, so iteration on real applications showed what was really needed and actually worked. Software is complex and therefore specialized, so this shouldn't be a surprise :)

Library First

I agree, that library (or framework) first is the right approach, especially during design phases (where you're still figuring out why you want a new language and what is should contain). The main motivation I'd have to start establishing a language early is to support distributed dynamic applications (mobile agents, etc.) i.e. I need some representation to distribute.

Application vs. experienced-oriented language design

Sometimes language design isn't oriented at a specific application; say you are trying to design a tablet or live programming experience, which necessarily includes the editor. Then it makes sense to go beyond the library path. Of course, you can prototype your abstractions outside of a custom environment, but some investment must be made in syntax early on.

The live programming space

The live programming space is driven by doers - live performances, cool art projects, and an amazing demo scene. They're a lot of fun!

Funny enough, one of the designers of Max/msp confessed.that its popularity and subsequent serious dev took time away from his music hacking, which was the original point!

live coding vs. live programming

There is actually a serious story with live programming that involves a machine gun and a sniper rifle. Being able to operate on your code while its running and reacting to your changes can be a huge boon to productivity. This isn't really like live coding, that community has very different no less admirable goals.

Hmm, I'm not sure if I'm

Hmm, I'm not sure if I'm aware of the distinction. Perhaps I'm giving them too much credit: these systems are increasingly pushing towards on-the-fly programming support, with the older languages relying on flexible dataflow setups to fake it.

The live performance scene's productivity needs are often different from typical software devs. If live programming is useful for standard touch dev, my coding time would probably be in a race to getting an app written that I can experiment on :)

As an example, we recently decided to reuse our visualization language for enabling scripting of big data visualizations (~1,000,000 node layouts+renders). A student built the new compiler backend (in line with my above advice on library first, a compiler was the path of least resistance because it was only building a new backend for an existing one) and, just this weekend, he wrote his first app. It exposed some performance issues, but more fun, it showed the obvious (in hindsight) reality that you quickly run out of pixels for such visualizations.

Lessons: First, standard visualizations need to be rethought (you can't just reimplement D3's). Second, we now have a hypothesis that we want good interactive hierarchy support. Now we'll iterate on demos to see what else needs help. The above discussion on GC algorithms is funny because we still don't even know the workload we'll need to optimize. In our case, when we add GC support, it'll probably be whacky as we're on a GPU. Any new language that is to be adopted basically needs an underserved niche, so I don't think this sequence of events is at all surprising.

Yes, I totally agree with

Yes, I totally agree with this. But you can explore many aspects of the experience in parallel...where do you start? The live programming experience has some fundamental features that you can't really show off at the library level. The language needs a niche, ya, but that niche can manifest as a creation experience (Processing) as opposed to a specific kind of app (Ruby on Rails).

BTW, I would really love to find an live/incremental version of graphviz (or AGL) to support my programming system, as this is one of the tools I find very useful in compiler/interpreter/type system development. Imagine live programming a compiler and interacting with a visualized graph (linked to the code...) in real time!

I fully believe in hacks

I fully believe in hacks :)

For example, as we're seeing in the resurgence of direct manipulation prototypes, people will just reload/recompile on the fly. That eventually breaks down, so then you get something like libraries and source-to-source translators on top (e.g., for inserting 'live' values). My guess is that more aggressive ideas can be explored by hacking on top of an existing Smalltalk system.

The AGL guys are funny: they 'exposed' their stuff as a webservice on rise4fun.. but not really. I was excited and then disappointed :(

Phase separation is for the weak!

Yeah, tongue firmly in cheek. Phase separation is sane, and a giant help for static analysis. Languages that don't permit it are in some degree "insane".

The only problem with phase separation is that if you commit to it, and then persist in your pursuit of dynamic features (mutable higher-order functions, first-order code such as lisp macros that has to call user-defined subroutines, first-order runtime code such as type expressions, etc) your compilation model (specifically what gets evaluated when) will continue to sprout more and more complexity until it's eventually harder to map than a bowl of spaghetti and harder to understand than a schizophrenic bat on speed.

So you either decide early to avoid certain kinds of dynamism in order to keep things sane, or else you decide early that phase separation is for the weak, and try to produce a language design that capitalizes on its absence in a principled way.

But what constitutes "principled" once you've thrown out the primary means by which most languages achieve a sane and stable semantics? My "rule" is that dynamic features must be understandable in terms of the language and its semantics, and not require specific knowledge of the implementation and implementation-internal data structures to understand. This can be surprisingly subtle and hard to do, though.


Slippery Sloped Staging?

I think your proposed descent into spaghetti and guano is not a foregone conclusion, and that other design elements can easily avoid it.

For example, effective support for staged programming can push to to the developer many details about what gets evaluated when, thus keeping the compilation model simpler and offering the developers more expressive abstraction. And object capability model can control when and where and how much mutation is introduced - thus enabling dynamic features to be introduced with well controlled spatial-temporal scope.

I believe we can keep things simple and under control without sacrificing our ability to express flexible dynamic features. We just need to discard the idea that `static` must refer to one flat stage. It can refer to everything computed before the code is adapted to the client, and even some things computed during that adaptation.

I do agree with your rule. We should avoid the need to reason about the implementation details.

Diverging back to language design

This thread seems to be diverging back to language design too quickly. The point of the original post was to start a discussion on implementation tips, and from "101", I'm assuming basic implementation tips.

A divergence which seems

A divergence which seems somewhat ironical, if not paradoxical, considering this usually goes the other way round in that topic range: implementation details / caveats or related concerns that can pollute discussions or debates on language design. :)

Too interpretive can mean doomed to be slow

"If you're not careful, it's quite easy to end up with semantics that rely on all kinds of quirks of the interpreter, even in such seemingly unrelated areas as scoping and the precise meaning of literal values. ... It's even easier to end up with a language that is just incorrigibly imperative"

Python has run into this problem, as attempt after attempt to write an optimizing Python compiler (see the Unladen Swallow disaster) has run into serious trouble. The PyPy crowd has, after years of struggle, had some success. But they still can't optimize much. They do a little type inference to recognize when they can use unboxed numbers, but that's about it.

A big part of the problem is hidden gratuitous dynamism. Any thread in Python can store into the data of another thread at any time. The implementation must handle this. Any object can have any method replaced dynamically, even while the method is executing. The implementation must handle this. 99+% of the time, the code is doing the obvious case, and nothing exciting is happening dynamically. But the compiler can't know that, and has to compile worst-case code. Static analysis doesn't help, because modules can patch other modules, and can use functions which take strings, not identifiers, for that purpose.

CPython handles this by simply looking every non-local symbol up in a dictionary for each use. PyPy has a complex system involving two interpreters and a JIT compiler to recompile on the fly after something unexpected has happened. In neither case is the performance even close to a hard-compiled language.

Dynamism itself isn't the problem. It's that the dynamism can be hidden. The compiler can't determine, at compile time, that a block of code is not going to have anything unusual happen to it during execution. This blocks most of the big-win optimizations. The compiler can't inline a function; it might be replaced on the fly later. The compiler can't look at the side effects of a function and use that information at a call; the function might change. The compiler can't give an object a hard layout with fixed offsets - a field might be added later.

This is sad, because Python is a good language to write in. It just runs too slowly for most production applications.

Opportunity Costs

Dynamism itself isn't the problem. It's that the dynamism can be hidden.

This is often a symptom of a larger issue - of allowing implementation to drive design. I.e. we add a feature because it seems useful and, based on the current implementation, it isn't too difficult to add efficiently. Never mind that it might prohibit a more efficient implementation of the language as a whole. It is difficult to observe opportunity costs.

Consider, for example, the question 'How slow is reflection?' Such questions compare the cost of using reflection to the cost of not using it within the context of an implementation that supports reflection. The question unasked is which optimizations we cannot perform while supporting reflection.

I think it's important to consider in language design which opportunities are worth protecting even at cost to expressiveness, even when the current implementation does not realize those opportunities. Optimization opportunities should be included. In language design, we often make tradeoffs. (Tradeoffs are not essential, but we often lack a clever solution to avoid them.) Opportunity is one currency we have for such tradeoffs, and should not be spent blindly or easily.

Hmm... not such a good

Hmm... not such a good (counter-?) example, IMO.

Reflection certainly has its design and implementation goals and choices of its own, I'd say.

There are non trivial differences of respective extents and limitations between reflection a la JVM, vs., say, a la CLR. Much like Delphi's RTTI would have a whole different range of options and implications than C++'s for instance. More concretely: for Delphi, in a proprietary language but with an extensive set of use cases to ease the IDE's (and other libs) extensibility, in the case of C++ : pretty much the opposite pattern.

I'm not advocating for reflection to be part, a priori, of any default set of designed language features, but I'd argue its very own "meaning", or intent, can vary significantly from one language, or VM, or runtime or etc, to the other... anyway.

But you speak about costs. I don't know about you and others, but I like to think of costs with the exact opposite strategy to the separation of concerns or divide and conquer view point.

As layers in languages and platforms pile up, I prefer to think of costs, at least in S.E. specifically, more and more broadly:

incurred cost of prototyping ideas over that X or Y feature, cost of crafting up a v0 / beta, cost in pure dev time, cost in unit testing, cost in debugging (if applicable), cost in LOC while using to solve a problem vs. not having it, cost in maintaining, cost in having newbies NOT misuse it, etc. I try to embrace/think of as many of them guys as I can. In my humble 17 years non-academia practitioner experience I learned how far cost types never cease to surprise how well they can compose and prosper. It's really such a lot of fun. Especially in enterprise software...

Now, if I try to think of the cost we'd have had to go with that whole Web Services, WSDL, SOAP, etc shebang thing that the web era hype forced onto us, PLT wisdom around or not... well, I'm having difficulty to replay for myself the whole metadata-for-interop and loose system coupling film ... without some at minima reflection support in the platforms.

But sure, I got the point on the speed aspect overlooking. I do know what a reflection based ORM can be poor at swallowing when you throw too much at it. Granted.

Still, as of today, all in all, I like reflection to be around, especially when, for the problem domains I deal with, the state of the art has no better option to propose. Until new breakthroughs notice, just as much as I can appeal and cry for more ubiquity of proof carrying code, verifiability, composable, referential transparency-friendly languages, etc.

Sure, there can be many

Sure, there can be many different models and implementations of reflection. And there can be many different costs. And many different opportunity costs.

It is good to consider costs (and opportunity costs) of many kinds. Performance is one cost that's easy to measure, but I consider such things as live programming and runtime upgrades, administration, unit and integration testing, persistence, reasoning about partial failure, reasoning about security, ability to refactor applications into services (avoiding monolithic constructs), etc..

Reflection has a great many costs other than for performance. It's really quite terrible for a language on almost every cost dimension unless you can keep it scoped (spatially and temporally). A more formal approach involving staged programming and phase separations would be better in almost every respect. But reflection is sometimes a useful hack to bypass the limits a weakly expressive languages that lack effective staging.

My point is more about considering opportunity costs. Language developers paint themselves into corners is by only paying attention to the visible costs, allowing the current implementation to drive design, ignoring how "features" will constrain and affect future growth.


Interesting description, do you have any specific links, e.g. Mailing list posts that go into the specific problems?

Also, out of curiosity, do you know if anyone has simply ignored these problems to see what kind of gains can be made in subsets of Python? I ask as I wrote a static analyser a few years ago that worked pretty well on limited pieces of Python code. It assumed single threaded code and ignored references being switched out by pieces of code that it couldn't see (or understand). The code is sitting unfinished somewhere as it ran into difficult terrain when analysing the types of dictionaries that stored references to themselves and I did wonder if it worth tidying it up a bit and sticking it on the web.

Perfect example

Yes, this is precisely the kind of the thing I had in mind. Python, Ruby and Javascript all suffer form this to varying degrees. (A lot of other languages do too, but since those three are popular, more effort has gone into trying to make them fast and/or sane.)

I think when you talk about "hidden dynamism" and I say "incorrigibly imperative," we're talking about essentially the same thing. In languages like Python, the state of the interpreter (and, in fact, the program itself) is basically a big mutable data structure which is directly accessible to the programmer. In a multi-threaded environment, this is fully as painful as you'd imagine.

(Of course, at a low enough level, the same is true of all programs on a von Neumann machine, but even systems languages such as C were wise enough to leave that unspecified, and modern OSs generally don't allow writing to code pages anyway without some extra work.)

Add to this the fact that encountering a block of code like "class Foo: ..." basically amounts to instructions on how to modify that data structure and it's truly a world of pain. Apart from those who've given some serious thought to programming languages, I think it's very difficult for the average programmer to understand how vastly different the word "class" is in Java/C++ vs Python/Ruby.

But anyway, we're definitely talking about the same things, so thank you for the very concrete example.

What's the course text?

I'm surprised this fascinating discussion hasn't recommended any "course text", after all it's Programming Implementation 101.

What would your one book be? (Okay, perhaps two ...)

The Dragon book maybe?

At least, this is the one that was always recommended to me. But it is a bit on the heavy side, and primarily concentrates on compiler design (of course, the advice others gave to me was if writing an interpreter, just stop half way through the book -- not sure how good that advice is though). The other one would be SICP, it has a couple chapters on implementing a Scheme interpreter.

In keeping with the 101 theme, would a set of blog posts following the evolution of the design an implementation of a simple language help? I've had one go at writing an interpreter, but I was thinking of keeping a log of my second go-through, from initial design concepts through full implementation.

... writing an interpreter ...

Well I'd certainly take a look.

There are lots of good

There are lots of good recommendations in the Getting Started thread, although I suppose it can be a little tough to separate the implementation from theory material...

A problem is that as far as implementation goes, the best book really depends on what you're trying to do. Writing good interpreters is still (and I guess will always be) something of a black art, and writing a good JIT even more so. (On the other hand, I guess there is at least one book about JIT compiling nowadays, although I haven't seen such a thing myself, and there is lots of good material on interpreters scattered around the internet: the papers of Anton Ertl come to mind immediately.)

As far as books go, I guess depending on your starting point I would recommend any of:

  • Essentials of Programming Languages (which edition is best is a matter of debate)
  • Programming Languages: Application and Interpretation
  • Programming Language Pragmatics
  • Advanced Compiler Design and Implementation
  • The dragon book
  • Lisp in Small Pieces

I know those are far from proper citations, but I think they're plenty with a bit of googling. I'm sure there are other good choices, these are just the few that occur to me off the top of my head. If you're language is even a little bit non-"standard" (a logic language, for instance, or a query language, or lots of other things), these may not help much unless you need help with early stages like parsing or, on the other hand, can already compile to the point that things like register allocation are giving you trouble. In other words, you're probably back in the domain of "black art," and more specialized books like "Compiling Esterel" will be more up your alley.


Just ran across a blog post that offers a decent overview of some interpreter implementation tricks. Something of a black art, as I said, and this all presupposes a basically bytecode-VM design...

Separation anxiety...

"although I suppose it can be a little tough to separate the implementation from theory material..." That's what prompted me to ask.

Also in Implementation 101,

Also in Implementation 101, what sort of online communities exist that cover these sort of things that fall outside of LtU's focus?

All boils down to requirements.

I've been reading various answers, and -as was to be expected- there is no general answer. So maybe you could state more requirements?

In a nutshell, I would say that if you're aiming for a research languages, a prototype, explaining some nifty typing features, for example, then I would recommend implementing it in a functional language. It will be slow, and not usable for much more than expressing typed programs, but then the whole aim of an exercise like that is to get a paper published, and dump the language afterward, so that's your best bet.

If you're aiming for a DSL with some nifty features, which should be somewhat usable, then Java, LISP, .NET, or some high level language would be your best bet. You can start of with implementing a library implementing the 'runtime' and then target that with an 'ad-hoc' language which just interprets source code using a simple parser, and you won't need to bother with low level details like machine code to generate or GC.

If you're aiming at implementing a new GPL, your best bets are Java, .NET, LISP, or C(++) targeting an intermediate machine code language. Old-skool bootstrap is the longest route and probably, for more complex languages, means your looking at a road map of one year to a decade. The best thing for a GPL therefor probably is to start off with an interpreter in C or Java, or LISP, pending on the kind of language your aiming to implement.

It all boils down to requirements, but since in academia there is no real need for more languages, and you'll want to publish instead of producing industrial tangible results, the normal route almost always is an interpreter, or a shallow embedding, in a functional language.

Black art 101

My experience is also that back end issues are a black art. So what are you favorite under the radar solutions/dirty hacks, that people actually use but textbooks rarely mention?

update: I mostly meant black art that helps deal with the more mundane aspects of implementation, not with the parts that are clearly the realm of dark knights, er... wizards.

Weak linker symbols

Not exactly a black art, but weak linker symbols should be on the radar of everybody compiling to C.

In ordinary to-C compilation, you need a single C file that declares all the global variables (e.g. Lisp in Small Pieces uses this technique), because otherwise you may end up with multiple declarations of a symbol.

With weak symbols, every compilation unit may declare global variables it needs weakly. The linker will unify multiple weak declarations into a single symbol, greatly simplifying to-C compilation, especially separate compilation of units. Some pointers here.


Can I just say those of us using 25 years ago the dynamic linking of Apollo Domain OS, not to mention the Lisp Machine OS, are absolutely crying in their beers right now. Actually knowing this about Linux kind of makes me not so surly to be programming in that crap JavaScript.

I'm sorry. My rant is over... what were you saying again?


Manuel's suggestion was a neat trick.

Other than that, in to-C tricks, I guess we would include -fno-strict-aliasing and computed goto. Both are nonstandard C. It's difficult for fixnums to work well without -fno-strict-aliasing, in C at least, because the calling convention for returning unions is different from returning pointers on some common architectures (x86-32 for example).

Actually all of the common interpreter implementation techniques are hacks, like direct threading.

As an aspiring language designer/implementer

(Decloaking here after a few months of lurking, thanks for all the good stuff so far.)

So I'm trying to design a language that is statically typed, compiled to machine code, but otherwise much like Python. I've started prototyping a compiler in Python, with a hand-written lexer and Pratt parser and compiling to LLVM IR. I've ignored the LLVM APIs for that, so far, and just written out IR as I got to know the LLVM, which has worked nicely. I want this language for practical reasons, not academic ones; I'm kind of disappointed by both Go and Rust and want to see how far I can take elements of Python.

But some parts of the implementation have completely eluded me so far. For example, I've implemented fairly nice compile-time error messages (complete with carets that semi-accurately point out the error location, at least for the parser and lexer), but any tips on handling run-time errors would be very useful.

I think that there is huge value in error propagation, but cheaply recreating Python-style tracebacks is impossible in the face of inlining. I'm not sure if that means I should have a debug compilation mode where they can be enabled (and make it the default for now, premature optimization and all) or if there are other strategies that also lead to easy-to-understand run-time errors.

Something else I haven't figured out yet is how to do Python-like generators. Should I just allocate the stack frame on the heap and pass around a pointer to it, with some small int indicating the basic block to jump to when resuming?

Maybe issues like these are too specific (or too practical!) for this thread.

Lua coroutines

About error messages, debug compilation mode is a good default. Also, consider entering the error handler before unwinding the stack - many exceptions can be cleanly resumed.

For generators, look at how Lua implements its coroutines. (Lua is sufficiently Python-like, but the codebase is at least an order of magnitude smaller, and reasonably easy to follow.) You'll want to manage the stack(s) yourself, rather than using the implementation language's call stack. This can also help with better error messages.

I suddenly remembered the 90

I suddenly remembered the 90 minute scheme to c compiler.

Excellent! LtU is such a

Excellent! LtU is such a mine of references gems. :)

Avoid cascading error messages

I recommend going to some lengths to avoid cascading error messages, where one error results in a long sequence of dependent errors.

A useful trick is to have, internally to the compiler, an error type and an error value. When a type error, undefined symbol, or invalid expresion is detected, the problem is reported once and an appropriate error object goes into the parse tree. Any operation involving an error object results in another error object, so that error objects propagate. Any error detected involving an error object is not reported.

This prevents the classic error message cascade associated with some compilers. (GCC comes to mind.)

An Incremental Approach to Compiler Construction

An Incremental Approach to Compiler Construction on is a fascinating approach to implementation for beginners.

Tantalisingly, Ghuloum mentions an extended tutorial and test cases (as does his post on LtU). However the links are broken, and I can't find them elsewhere. Can anyone point me at them? I'd love to find out more about his ideas.

EDIT: Thanks to Nada Amin for the link:

Such a big subject

There are a lot of good things already posted. My two cents on those:

+1 for tracing of ASTs and other internal data structures. Have a way to print ASTs after each phase in the compiler or interpreter.

-1 for interpreter first. Spend the time early on to figure out what kind of staging makes sense for your problem, and implement that.

A few things I don't see mentioned so far:

- Write the manual or specification first. You can really fool yourself implementing simple cases of a programming language, only to need a comple redesign once you think about the general case. Take the time to a develop a complete draft of the language, including several hand-written examples, before you go to implement it. When changing the language later on, update the manual first, and only then the implementation.

- Add extra invariant checkers that you can run after each phase, e.g. that if an AST node refers to another one, the referee is still in the AST.

- If it's a typed language, freeze the types on the AST nodes early on in the pipeline, and from then on use AST constructors that maintain types automatically.

- Write verifiers for all input files and take the time to implement actionable error messages if they are bad. This applies most obviously to human-provided code in the language you are implementing, but also any other inputs you might use such as image files, separately compiled output from previous runs, and protobuf descriptors.

- When fixing bugs, take the time to widen the fix to a more general class. Watch out for branches that have 4-6 conditions, 3 of which were added by 3 subsequent bug fixes. In such cases you either have a problem with the language design, or you have botched the implementation in some way.

- Implement the Andrew Appel Tiger compiler.

Things that apply to any software, not just language implementations:

- Test first before each new feature, and keep your regression tests passing. This will often lead to writing additional parsers and unparsers for intermediate data structures.

- Even though language implementations have a convenient public API--run this code, get this output--resist the urge to structure all of your tests in this way. Instead, write unit tests of small chunks of the implementation.

- Get another pair of eyes to review your code.

+1 for tracing of ASTs and

+1 for tracing of ASTs and other internal data structures. Have a way to print ASTs after each phase in the compiler or interpreter.

One thing I learned from Sanjay Ghemawat: don't just print your AST/internal transformation structures, output them to a graph that can be rendered in graphviz. Hopefully this technique is commonly known, but if not, its a very important language implementation technique.

AST -> Graphviz

If this is not a commonly known technique then more people should tell each other about it. The gain in productivity is huge. The only problem is when you hit the threshold for stability in Graphviz and some output will render while other output will hang Graphviz. That threshold is much larger than the size at which textual descriptions of AST stop making sense.

You still have to put some

You still have to put some effort into filtering your graphs. Also, it offers a different benefit from textual tracing (I often use both methods).

First cut: Lex. Yacc. Lisp.

The easiest way to implement a new language?

First: Use standard tools like Lex and Yacc (or Flex and Bison, or Jlex, or JavaCC, or Antlr, or whatever) to lex your programs into tokens and then parse the token stream into abstract syntax trees.

Second: Make the parser output the ASTs in S-Expression form, replacing any tokens not acceptable to the parser of your favorite lisp system with "name mangled" versions that are. But keep them recognizable! You're going to wind up staring at these AST's trying to debug, so you need to be able to tell which token of your language corresponds to what "name mangled" token.

Third: Define your language semantics using Lisp or Scheme functions and macrology (a disarmingly brief and simple instruction, representing more than 90% of the work).

Fourth: Together with the definitions you created in the Third step, the AST you expressed as S-expressions in the second step should now be a valid program in your lisp system, so you can run it.

What this gets you is a first-cut implementation where you can write code and refine/debug the semantics. It lets you adjust the syntax (with simple changes to the lexing/parsing code) and semantics (by debugging the semantic definitions) independently. And it takes care of Garbage Collection for you so you don't need to worry about it at first.

It is likely to be inefficient, unless you're very careful with your Lisp code and choose a good Lisp implementation. It will probably involve a lot more memory-hoggery than a really good implementation would. Any "dynamic" semantics in your language are likely to require expression as macros that write other macros in Lisp, which can be confusing to code or trace. But dynamic semantics were going to be confusing to implement correctly anyway, one way or another.

But anyway, that'll get you a rapid first cut in a fairly easy-to-modify form.

Using the first cut, write some code. If it's supposed to be a domain-specific language, pick tasks in that particular domain. During this process, you'll discover missed cases you want to add, and changes you want to make to the syntax -- or the semantics. Otherwise (meaning, if it's not a domain-specific language), pick general computing tasks, and try to write that in your language. Some insist that your most important general computing project to try out your language on, ought to be its own compiler.

If successful, you will now have a second cut...


Another approach

Sometimes it also makes sense to extend an existing implementation, at least for experimentation purposes. Alternatively, use a library designed for building language extensions like polyglot for java.

Learning to implement languages in C#

For those interested in a very basic introduction to implementing programming languages I wrote an article on CodeProject a few months ago.