Learning language design

I have been very curious about how language design is done.So,I decided to read up on it.I have started with "Programming Language Pragmatics".And I would say,just reading the first 2 chapters have given me lot of insight into scanners,parsers(LL\LR),context-free grammars,production etc.I wanted to know,how should I proceed? There are topics which I would like to explore in detail.Or,should I just first read the book completely?
I also have book by Wirth on compiler construction.And,it seems its a nice practical book.So,I am bit confused which path to take now.?
My initial goal is to build interpreters in different languages.Then,create a small domain specific language,just for fun.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

I suggest pursuing the

I suggest pursuing the references given in the Getting Started discussion (see link on the left).

Design versus implementation

From your comments re parsing issues, etc., it sounds as though you are at least as interested in implementation as in design (Programming Language Pragmatics has more this focus, too, at least in the early edition I used). Of course implementation has a big influence on design, but there are different emphases. (I would recommend "Essentials of Programming Languages" if you're interested in interpreters and you don't mind Scheme as a meta-language, by the way.)

For language design proper, I think it's a bit harder to pick up directly from texts. Surely you should follow Ehud's advice above, but you might also check out the several (3?) volumes on the History of Programming Languages: these tend to provide a forum for designers to discuss the motivation, evolution, etc. of language style & features that is the real essence of design. (The recent paper on the history of Haskell is a good example of what I'm talking about.)

Oh, plus you should program in lots of interesting languages :) .

My Path

I am exploring Scheme through SICP,its fun and challenging.My confusion was,as I am reading the PLP book,there is some stuff I am not grasping completely-so,should I explore those topics(by reading online,if exists for those topics),or read through the book atleast once.Yes,I am planning to write interpreters first,which I feel will help me better understand some of the design principles.

Common Lisp

Common Lisp is a great language to study, even if you disagree with some of its design decisions. Language designers are often able to take interesting parts from CL (e.g. Ruby, Factor, Slate, Perl 6 and Dylan are all to some extent CL-inspired.)

The CL specification is a great example of the usefulness of defining a shared vocabulary among language designers, implementers, and users. Just take a look at the glossary.

write one

Sounds like you're in the same situation I was in last year. I'd read several books about compiler implementation. Then I created a compiler/interpreter in C using Flex for the lexer and hand coded a recursive descent parser. The language I created was a qbasic variant with a couple of novel features. The actual exercise was invaluable. I highly recommend it.

Thanks for replying.You said

Thanks for replying.You said "The actual exercise was invaluable".Can you share what insights you got? Did the exercise give you a different perspective about programming ?

I have yet to write a "real"

I have yet to write a "real" language in the sense that it has a complete set of features and would be practical, but in the toys I have written I've learned a lot about how to think at a sort of meta-level. In other words, as you think about how to make something like an "if" work or a function return or whatever, you end up twisting your mind in ways that are impossible to untwist later - and that's a good thing. :)

What I learned

By that comment I meant that taking a stream of tokens, transforming it into a tree structure and then executing it was very instructive. It requires a lot of planning; not only planning for lexing and parsing the language, but organizing data strucures for the symbol table and tree for the intermediate representation. In my case I had an 'executable tree' the interpreter walked that actually executed the program (along with a runtime stack).

My primary interest in languages and compilers is 'Domain Specific Languages.' I created a Basic with a couple of novel features I've seen: static tables that are essentially the same as global arrays of structs in C. Then I added 2d gfx and sound via the Allegro game programming library. I ended up with a DSL for 2d games. Hey, might as well do something fun with all this work huh? I plan on cleaning up the code and putting it on sourceforge this summer.

The project gave me a completely different view of programming. It was the largest project I've ever completed at about 4,000 lines of C. From a DSL perspective, I learned something, not quiet sure what yet. I learned a lot but also learned that there's a lot more to learn. Which is why I'm here.

A few years ago I got

A few years ago I got interested in what it would take to put together a language and interpreter. Specifically, I was curious how certain concepts would work if added to a language with a C-like syntax.

Now, at the time, all I could find were either very heavy texts on language theory and compiler design, or some examples of very specific language implementations, but no general tutorial on writing an interpreter. Of course, anyone that I asked about it said to pick up a good compiler book (such as the Dragon book), and an interpreter is basically the front-half of a compiler. Well, that advice only got me so far, so I decided to re-invent the wheel, so to speak. What I ended up with is something that is fairly usable, and the code is almost beautiful (once I go back and clean up some shortcuts and dead-ends).

The approach I took was to start off with a simple expresison parser, which translates an infix expression into a postfix representation using the Shunting Yard algorithim. The postfix code consists of an array of integers. An integer less than 256 represents an operator, where as 256 or greater represents an offset into a table of operands (variables, numeric values, etc). Then it was a simple matter to write a postfix evaluator using a stack-based postfix evaluation algorithm (the main loop for the evaluator consists of only a dozen or so lines of code). Whenever the evaluator comes across a value greater than / equal to 256, it pushes that onto the evaluation stack; anything less than 256 it calls the appropriate function for the operand that the number represents (using a lookup table and a state machine).

To round off the language, I went back and defined operators for conditional branching and loops, then figured out what the postfix expression code would look like to implement those features (initially I added a "jump if not equal" (jne) instruction that was only visible to the postfix side, but later worked on utilizing local first-class functions for the targets of the conditional and loops).

I think what I ended up with is a generic infix style imperative language, that gets translated into a sort of VM (a postfix-evaluation VM). I was considering writting up a paper or short book on the design and implementation, following a tutorial format. The main angle of this paper is that it would be written by someone who doesn't have an extensive background in language theory, so it may end up being more approachable for a beginner. The danger is that I might end up getting some of the terminology wrong, or worse yet present a less than optimal approach that could lead a beginner astray. But before publishing it, I'll probably run it by a community such as this one first.

Might be neat

For an audience who doesn't know the theory (like me) it could be good because it would start with how I would / could also plausibly think about tackling it. Then, if you can intertwine the theory to explain "but actually if you do X problems A, B, C happen if you do that, and so the right thing is X' as explained in papers 1, 2, 3"? To avoid teaching bad ideas and to motivate interest in theory?

If you can keep some clarity between essential and incidental details, as well, that helps to give the reader an education in core theory.


If you want to write an interpreter, especially for a language with interesting control flow, using monads and Haskell makes it very easy to make an interpreter and play with the semantics. Philip Wadler's paper The essence of functional programming shows how to write a simple monadic interpreter and extend it with all kinds of stuff like exceptions, continuations, backwards-in-time state, etc.

If you want a nice surface syntax parser combinators can be pretty handy, BNFC is also very easy.