Language Design 101

Some of our most-read threads are our introductions to type systems, monads and continuations, so I guess it wouldn't hurt to start yet another getting-started thread.

From time to time we have questions posted about how to start designing a language, or a DSL; I think it would be helpful to collect links to various resources that might help people trying to design their first (or second, or third...) language.

Resources may include design tips, methodological suggestions, detailed discussion of major features (e.g., how important is type inference?), etc. Think of utter beginners, but also of language mavens with little language design experience.

Two requests: (1) Let's not turn this into a thread about language implementation tips. We'll do that one later. (2) If we discussed the papers you recommend in the past, give links to the previous LtU threads.

A classic

Hoare's Hints for Programming Language Design is old but well worth reading.

A similar discussion

I kinda liked this discussion on the LL1 mailing list about these issues.

Meta Protocols

A language without a meta protocol is not a language I would like to use.

Examples: Lisp, Smalltalk.

Self referential interpreter / Partitioning

Distill out the simplest self-hosting interpreter for your language.

If it is hard, really hard, then rethink your language.

If it is hard to do this, your language will be hard to understand and hard to implement.

If you can do this, programs in your language will be easy to analyse.
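To give a feel for how small the core of such an interpreter can be, here is a rough sketch of an evaluator for a made-up Lisp-like mini-language. It is written in Python, so it is not self-hosting; it only shows the size of the eval/apply core that a self-hosting version would have to re-express in the language itself. The list-based syntax and all names (`evaluate`, `GLOBAL_ENV`) are assumptions for illustration.

```python
import operator

# Hypothetical mini-language: programs are nested Python lists, e.g. ["+", 1, 2].
GLOBAL_ENV = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "<": operator.lt,
}


def evaluate(expr, env):
    """Evaluate one expression in the given environment (a plain dict)."""
    if isinstance(expr, str):        # variable reference
        return env[expr]
    if not isinstance(expr, list):   # literal such as a number
        return expr
    head = expr[0]
    if head == "if":                 # ["if", test, then, else]
        _, test, then, alt = expr
        return evaluate(then if evaluate(test, env) else alt, env)
    if head == "lambda":             # ["lambda", [params...], body]
        _, params, body = expr
        return lambda *args: evaluate(body, {**env, **dict(zip(params, args))})
    # Anything else is a function application: evaluate operator and operands.
    fn = evaluate(head, env)
    args = [evaluate(arg, env) for arg in expr[1:]]
    return fn(*args)


# Usage: apply a squaring function to 5.
square_of_five = evaluate([["lambda", ["x"], ["*", "x", "x"]], 5], GLOBAL_ENV)
```

Three special cases plus application is the whole evaluator; if the equivalent for your language needs many more cases, that is roughly the difficulty the comment above is warning about.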

----

When programming in the Large, Good Design is all about how little you have to read to understand a portion of code. How well does your language aid this principle?

But doesn't that rule out sta

But doesn't that rule out statically typed languages?
As I understand it, the only languages which have a truly concise self-hosting interpreter are dynamically typed languages. The only such languages I am aware of that even do that are mini versions of various Lisps, Scheme, and Joy.

Haskell and OCaml

They are statically typed and have interpreters written in themselves. Just to name two. ;)

OTCC

I suppose it depends on your definition of conciseness, but you can create a pretty small self-hosted C compiler.

Constraint Languages

It is important in language design not to forget constraint languages. Most computer languages are not so much about describing an algorithm as they are about describing a problem. HTML, server config files, and CAD/CAM are such languages. It is important to keep this in mind. Computability is not the primary language property we should be investigating; cognitive tractability is.

Is the language good for thinking in? Is its notation an effective mental tool? What can be done to make it more effective?

Smaller is better

I believe the world has enough complete languages that somebody starting off should not try to design yet another one.

The more stuff which is left out of a DSL, to the point that it is not complete (that is, one couldn't write a self-compiler or an interpreter in the language), the more likely it is to be focused on the problem it was designed to solve.

That brings up the essential point: the problem the language is being designed to solve should have a domain-specific jargon and set of idioms that existing languages are ill-equipped to cater for. All programming languages are just user interfaces, and the mark of a good user interface is that it doesn't surprise (in a bad way) its users.
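To make "leaving stuff out" concrete, here is a minimal sketch of a deliberately incomplete rule language, hosted in Python. The shipping-fee domain, the rule format, and every name in it (`shipping_fee`, `COMPARATORS`) are made up for illustration. The language has no loops, variables, or functions, so every program terminates and nobody could write an interpreter in it.

```python
import operator

# The only operations the rule language knows about. Nothing else is expressible.
COMPARATORS = {"<": operator.lt, ">=": operator.ge, "==": operator.eq}


def shipping_fee(rules, order, default=0):
    """Return the fee of the first rule whose condition matches the order.

    A program is an ordered list of ((field, comparator, value), fee) pairs.
    """
    for (field, comparator, value), fee in rules:
        if COMPARATORS[comparator](order[field], value):
            return fee
    return default


# A complete "program" in the rule language: heavy parcels cost 15,
# otherwise domestic (hypothetical country code "NL") parcels ship free.
RULES = [
    (("weight_kg", ">=", 20), 15),
    (("country", "==", "NL"), 0),
]

fee = shipping_fee(RULES, {"weight_kg": 25, "country": "NL"}, default=5)
```

The rules read close to how someone in the domain would state them, which is the point made above about jargon and idioms.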

Consider your users

I think considering your audience is very important. If you are writing a language to do some fancy math work, perhaps you have different considerations than someone attempting to make a scripting language so that their game is more flexible. The first can aim for something powerful and elegant if it offers advantages in terms of the results it produces. The second will likely frustrate its users if it is too fiddly and not easy for them to pick up.

Resources

I think it would be best if we try to restrict this, as far as possible, to references to resources by established researchers, language designers etc.

Actual DSL design

How active is DSL design itself? Most of what I see on LL (or here, or in POPL, etc) is around core language semantics, not designing a language for a specific domain.

DSL design examples

I really recommend that anyone interested in DSL design learn about the cool Domain Specific Embedded Languages (DSELs) developed in the Haskell community.

Functional Images is about Conal Elliott's work (Pan and Fran).

Haskore is a DSEL for scoring music.

And see this related thread.
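For readers who have not seen the functional-images idea, its core is that an image is just a function from a point to a value, and the "language" is a set of combinators over such functions. Below is a rough sketch of that idea only, in Python rather than Haskell; none of the names (`disk`, `translate`, `union`, `difference`, `render`) are taken from Pan or Fran.

```python
from typing import Callable

# An image maps a point (x, y) to a value; here a bool meaning "ink" or "no ink".
Image = Callable[[float, float], bool]


def disk(radius: float) -> Image:
    """A filled disk of the given radius, centred on the origin."""
    return lambda x, y: x * x + y * y <= radius * radius


def translate(dx: float, dy: float, img: Image) -> Image:
    """Shift an image by (dx, dy) by sampling it at the shifted-back point."""
    return lambda x, y: img(x - dx, y - dy)


def union(a: Image, b: Image) -> Image:
    return lambda x, y: a(x, y) or b(x, y)


def difference(a: Image, b: Image) -> Image:
    return lambda x, y: a(x, y) and not b(x, y)


def render(img: Image, half: int = 4) -> str:
    """Sample the image on an integer grid and draw it with '#' and '.'."""
    rows = []
    for y in range(half, -half - 1, -1):
        rows.append("".join("#" if img(x, y) else "." for x in range(-half, half + 1)))
    return "\n".join(rows)


# Usage: a ring next to a small shifted disk, composed from the pieces above.
ring = difference(disk(3), disk(1.5))
picture = union(ring, translate(3, 3, disk(1)))
text = render(picture)
```

Because the host language supplies functions and naming for free, the embedded language needs no parser or evaluator of its own, which is much of the appeal of the embedded approach.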

DSLs

I submitted the DSL I'm working on to a conference focused on the domain the language is targeting. In other words, not a conference on programming languages, per se. Maybe there are lots of others who have done the same.

(Perhaps something will be submitted to LL/POPL/etc. in the future.)

Internal DSLs

I've implemented DSLs for internal applications. In two such cases, a DSL was used to replace the use of spreadsheets, to allow the data that would otherwise be in spreadsheets to be centralized in a database to support other applications. In both cases, a DSL was pretty much essential to achieve the necessary flexibility — in fact any solution other than an explicit DSL would have really been a disguised DSL.

Of course, configuration languages are everywhere, and even if they're not Turing-complete, they're not always trivial. I once implemented a layout language for a (domain-specific) graph-generating program, mainly because I didn't want to have to come up with the rather complex GUI that would otherwise have been necessary. That worked out well, because the users weren't looking for a dumbed-down interface, but rather something that would allow them to work efficiently.

I'm also aware of other companies which use in-house DSLs heavily internally. As is often the case, the stuff you hear about publicly is only a fraction of what's really going on.

More internal DSLs

"I'm also aware of other companies which use in-house DSLs heavily internally. As is often the case, the stuff you hear about publicly is only a fraction of what's really going on."

I'll second that. I am reluctant to consult the corporate layer, so all I can disclose is that one of our DSLs is for describing data storage, another for web forms, yet another for processes, plus some 5 I do not know exactly about, and they understand each other, and either generate Java [1] or are interpreted by Java. Sounds familiar? ;-)

[1] ...plus JSP, or JS, or HTML, or HTC, or Hibernate config, or SQL, you name it.

Our paper on language design patterns

Think like an engineer

One way to think about languages is to think about machines. Unfortunately, nowadays we have only one machine, the ubiquitous register machine. But we can always implement a new machine as a "virtual machine": code that simulates the machine. This is a way to really get back to the starting point.
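As a rough illustration of "code that simulates the machine", here is a minimal sketch of a stack machine (so, not a register machine) in Python. The instruction set (`PUSH`, `ADD`, `MUL`, `DUP`, `JNZ`) and the `run` function are invented for this example and do not correspond to any particular real VM.

```python
def run(program):
    """Execute a list of (opcode, operand) pairs and return the final stack."""
    stack = []
    pc = 0  # program counter: index of the next instruction
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "DUP":
            stack.append(stack[-1])
        elif op == "JNZ":  # pop the top value; jump to arg if it is non-zero
            if stack.pop() != 0:
                pc = arg
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack


# Usage: compute (2 + 3) * 4.
ARITH = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PUSH", 4), ("MUL", None)]

# Usage: count down from 3 to 0 with a backwards jump to instruction 1.
COUNTDOWN = [("PUSH", 3), ("PUSH", -1), ("ADD", None), ("DUP", None), ("JNZ", 1)]

arith_result = run(ARITH)
countdown_result = run(COUNTDOWN)
```

Designing the instruction set first, and only then a surface syntax that compiles to it, is one way to read the "think about machines" advice.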