Getting started in language design -- reading material?

I have a bunch of ideas for my own programming language (like every other CS undergraduate with an appropriate degree of hubris) and I'm starting to fiddle with writing my own compiler for it, but I'm having trouble understanding a lot of technical material out there. Most of the programming language papers and tutorials I'm finding on the net assume that I already know lambda calculus and other advanced mathematics.

What's some good, easy-to-understand material I can read to get the mathematical basis I need to understand all the jargon, or to understand some of it without the math? (e.g., I'd like to understand dependent typing, but the papers I've found turn my brain into knots.)


Try the "Getting Started" thread

The Getting Started thread has a number of suggestions.

Three suggestions:

This is what worked for me, starting from a knowledge base of "can program, but doesn't know any lambda calculus".

1. Learn how to program in ML or Haskell.

2. Work through at least the first half of Friedman, Wand and Haynes's Essentials of Programming Languages.

You can do these two steps in either order, or concurrently. This will give you the concrete intuition you need to understand the formal math -- the math is incredibly hard if you don't have an intuition for what the authors are trying to do, but once you have it, it becomes about as hard as reading code in a new language. Then you're ready for the next step:

3. Look at Benjamin Pierce's Types and Programming Languages. What you want to do is learn the notation and informally understand what he's trying to prove. Try implementing typecheckers for the languages he talks about; this will make what the proof rules are doing very clear. Implement the evaluators too, since you'll already know how to write them from EOPL, and that will give you the confidence to take on the typecheckers.
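To make step 3 concrete, here's a minimal sketch (in Python, purely for illustration -- the term encoding and function names are my own, not Pierce's) of an evaluator and typechecker for the little language of booleans and numbers from the early chapters of TAPL:

```python
# Minimal evaluator and typechecker for TAPL's arith/bool language.
# Terms are tuples: ("true",), ("false",), ("zero",),
# ("succ", t), ("pred", t), ("iszero", t), ("if", c, a, b).

def typecheck(t):
    tag = t[0]
    if tag in ("true", "false"):
        return "Bool"
    if tag == "zero":
        return "Nat"
    if tag in ("succ", "pred"):
        if typecheck(t[1]) != "Nat":
            raise TypeError("expected Nat")
        return "Nat"
    if tag == "iszero":
        if typecheck(t[1]) != "Nat":
            raise TypeError("expected Nat")
        return "Bool"
    if tag == "if":
        if typecheck(t[1]) != "Bool":
            raise TypeError("condition must be Bool")
        ty1, ty2 = typecheck(t[2]), typecheck(t[3])
        if ty1 != ty2:
            raise TypeError("branches must have the same type")
        return ty1
    raise ValueError(f"unknown term {tag}")

def evaluate(t):
    tag = t[0]
    if tag in ("true", "false", "zero"):
        return t                           # values evaluate to themselves
    if tag == "succ":
        return ("succ", evaluate(t[1]))
    if tag == "pred":
        v = evaluate(t[1])
        return v[1] if v[0] == "succ" else ("zero",)
    if tag == "iszero":
        return ("true",) if evaluate(t[1])[0] == "zero" else ("false",)
    if tag == "if":
        return evaluate(t[2]) if evaluate(t[1])[0] == "true" else evaluate(t[3])

# if iszero(pred(succ(zero))) then succ(zero) else zero
term = ("if", ("iszero", ("pred", ("succ", ("zero",)))),
        ("succ", ("zero",)), ("zero",))
print(typecheck(term))   # Nat
print(evaluate(term))    # ('succ', ('zero',))
```

Once the typing rules in the book are mirrored line-for-line by branches in a function like this, the inference-rule notation stops looking mysterious.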


This isn't too different from how Pugs got started. See the Pugs interview, for example.

As a side note, Haskell is also how I was really led into all this formal stuff, though that wasn't my goal in learning Haskell (I was interested in the formalism before, but not conversant with it).

Learn a wide variety of languages

So much language-design material -- and so many programmers with language interests -- are geared toward languages like Haskell, ML, and Scheme. I think the best thing you can do is to learn and use a wide variety of languages, including:

more warp-age

Other mind expanding languages:

Joy - a concatenative language
Icon - goal-directed evaluation
Signal, Lucid, or Lucid Synchrone - reactive, dataflow languages
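Icon's goal-directed evaluation in particular maps fairly naturally onto generators. A rough Python sketch of the idea (the helper names are invented for illustration): every expression is a generator of results, failure means producing no results, and enclosing expressions backtrack by pulling the next result:

```python
# Icon-style goal-directed evaluation sketched with Python generators.

def upto(lo, hi):
    # Roughly Icon's (lo to hi): generates each value in turn.
    yield from range(lo, hi + 1)

def such_that(gen, pred):
    # An expression "succeeds" only for results passing the test;
    # otherwise it fails (yields nothing), forcing backtracking.
    for x in gen:
        if pred(x):
            yield x

# Find pairs (i, j) with i * j == 12, roughly like Icon's
#   every i := 1 to 6 & j := 1 to 6 & i * j = 12
pairs = [(i, j)
         for i in upto(1, 6)
         for j in such_that(upto(1, 6), lambda j, i=i: i * j == 12)]
print(pairs)   # [(2, 6), (3, 4), (4, 3), (6, 2)]
```

The interesting part of Icon is that this machinery is built into every expression, not bolted on with explicit generator plumbing as here.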

Don't forget Prolog!

EDIT: Whoops; it's already on the list....

Oz, Curry

Everyone should learn Prolog. But in my opinion, there are languages that do logic programming much more pleasantly. While not purely logic programming languages, Curry and Oz are well worth looking into. In fact, Oz is practically a must.

Oz, Scheme

I second the suggestion about learning Oz.

And don't forget Scheme.


Oh, and check out Epigram for something completely different. It's a lot easier to see where it's coming from if you've seen inference rules before (check out Types and Programming Languages, mentioned above, if you haven't).

Don't do it!

Back in college (which ended a year ago) I wrote 2-4 programming languages, depending on how you count versions. One was relatively simple, and the other(s) were going to replace every other language.

Practically, I would start out with a tokenizer and/or parser to get stuff into a tree, and try to get a calculator working. I would stay away from the more mathematical papers and worry about the basics. Also, program it in a high-level language.
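That advice can be made concrete: a tokenizer plus a recursive-descent parser gets you a working calculator in well under a hundred lines. A rough Python sketch:

```python
# A tiny tokenizer and recursive-descent parser/evaluator for
# arithmetic -- the "get a calculator working" starting point.
import re

TOKEN = re.compile(r"\s*(?:(\d+)|(.))")   # numbers, or any single symbol

def tokenize(src):
    for num, op in TOKEN.findall(src):
        yield ("num", int(num)) if num else ("op", op)

def parse(tokens):
    toks = list(tokens) + [("end", None)]
    pos = 0

    def peek():
        return toks[pos]

    def eat():
        nonlocal pos
        tok = toks[pos]
        pos += 1
        return tok

    def expr():            # expr := term (('+'|'-') term)*
        value = term()
        while peek() in (("op", "+"), ("op", "-")):
            _, op = eat()
            value = value + term() if op == "+" else value - term()
        return value

    def term():            # term := factor (('*'|'/') factor)*
        value = factor()
        while peek() in (("op", "*"), ("op", "/")):
            _, op = eat()
            value = value * factor() if op == "*" else value / factor()
        return value

    def factor():          # factor := number | '(' expr ')'
        kind, val = eat()
        if kind == "num":
            return val
        if val == "(":
            value = expr()
            eat()          # consume the closing ')'
            return value
        raise SyntaxError(f"unexpected {val!r}")

    return expr()

print(parse(tokenize("2 + 3 * (4 - 1)")))   # 11
```

From here, the natural next step is to build a syntax tree instead of evaluating on the fly -- the tree is what the rest of a compiler hangs off of.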

This is a very open-ended and hard problem, and I would bet anyone $36.4 (or more) that they couldn't solve it. Not to discourage you... just don't get idealistic like I did :). A preprocessor, IDE, or debugger for an already existing language would be more useful to everyone (probably including you). After all, a language is like a lingua franca (a common language). If you have to make one, though, keep it simple and don't worry about "dependent typing" until you have types.


I don't plan to go to implementation until I feel like I have enough 'radical' ideas ;)

the language machine

The coal face for language is the process of analysis and transformation - hacking structure and meaning out of a stream of input symbols.

Most language tools start from a generative model of language, and so are forever condemned to operate at one remove from the coal face.

The language machine gives you a system for applying unrestricted recognition rules to a stream of input symbols. In other words, the pickaxe you need to hack directly into the coal face. It may not be perfect, but it is free software under the GNU GPL -- you can pick up a grindstone and help me sharpen the blade.

By unrestricted, I mean in a technical sense - the analytic equivalent of the generative rules in Chomsky type-0 grammars extended with variable bindings, but also in the common sense - without stupid limitations on what you can do.

So if you need a rule that produces more symbols than it consumes (in the code generator for example), you can have it. If you need a rule that consumes symbols and substitutes nothing, you can have it. If you want a rule that produces something from nothing (eg an optional clause with a default value), you can have it. If you need left-recursion, or right-recursion, or priority constraints, you have them. If you need the pattern you recognise to depend on what has already been seen, you can do it.
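As a toy illustration of that consume-m/produce-n idea (this is not the language machine's actual rule notation -- just a minimal Markov-style rewriter I've sketched in Python), rules map one sequence of symbols to another, and the two sides need not have the same length:

```python
# A toy symbol-stream rewriter: each rule maps a sequence of symbols
# to another sequence, and the two sides may differ in length.
# Rules are applied repeatedly until no rule matches anywhere.

def rewrite(symbols, rules):
    symbols = list(symbols)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            for i in range(len(symbols) - len(lhs) + 1):
                if symbols[i:i + len(lhs)] == list(lhs):
                    # Splice: consume len(lhs) symbols, produce len(rhs).
                    symbols[i:i + len(lhs)] = list(rhs)
                    changed = True
                    break
            if changed:
                break
    return symbols

rules = [
    (("num", "+", "num"), ("num",)),   # consumes 3 symbols, produces 1
    (("(", "num", ")"), ("num",)),     # consumes 3 symbols, produces 1
]
print(rewrite(["(", "num", "+", "num", ")", "+", "num"], rules))  # ['num']
```

Rules that produce more than they consume work the same way mechanically, but need a control strategy to guarantee termination -- which is where a real system like the language machine earns its keep.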

The system is directly usable - write a few rules, compile them to create a shebang script, try them out. Someone said to me that "The syntax is weird at first glance but is in fact really concise and efficient. I love it". There are several examples to start from, and a diagram that shows you what happens.

I have recently shown that the rule system of the language machine effectively contains the lambda calculus. In other words it really is a machine that does language.