Formal Frustration...

First off, I should qualify that I'm self-taught, and second that, budget permitting, LtU has been a major source of book purchases over the past few years (TaPL, CTM, EOPL, ...). So, I've been working through these books in a bit of a cyclical fashion (i.e. coming back to the parts I have no clue about and hoping they make sense this time...), and I avidly download and read papers that people link to from here (understanding them, of course, is an entirely different matter).

Well, lately, I've developed a DSL for some of my work, and I think it's all very innovative and unique, and I thought, "I should try and formalize the type system and semantics for this little language," and then I realized it: I have no clue.

I really just don't know what to do to formalize my language. Oh, I think I know the relevant concepts; I just don't know where to start.

Is there some book out there that would help with this? (Okay, I realize I may have already bought/read/skimmed said book and densely missed the chapter, "How to Formalize Your DSL in 3 Easy Steps".)

Forgive me if I've just shown my ignorance or laziness, but this is what got me interested in programming languages to start with (wanting to write down the formal semantics much the way I can write down the context-free grammar), yet this skill continues to elude me.

Perhaps my attempt to learn PLT by osmosis and waterfall method is not working out so great...


Start small

I'm not aware of any book fitting your request, but books like TaPL do try to teach you how to formalize. In particular, most books on semantics will start with some simple language (lambda calculus, while programs, etc), then try to add features to the formal system that match features you'd want in an actual language. The hope is that, by seeing this done for several common and important language features, you can get a feel for how to formalize in general.
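To give a feel for what "some simple language" looks like once formalized, here is a hedged sketch in Haskell of the small-step operational semantics for the tiny boolean-and-conditional calculus that opens TaPL. The datatype and function names are invented for illustration; each equation of `step` transcribes one evaluation rule.

```haskell
-- A toy term language: booleans and conditionals, in the style of
-- TaPL's first calculus.  All names here are illustrative.
data Term
  = TTrue
  | TFalse
  | TIf Term Term Term
  deriving (Eq, Show)

-- Single-step evaluation relation: step t = Just t' when t -> t'.
step :: Term -> Maybe Term
step (TIf TTrue  t2 _)  = Just t2              -- E-IfTrue
step (TIf TFalse _  t3) = Just t3              -- E-IfFalse
step (TIf t1 t2 t3)     = do t1' <- step t1    -- E-If (evaluate the guard)
                             Just (TIf t1' t2 t3)
step _                  = Nothing              -- values don't step

-- Multi-step evaluation: iterate until no rule applies.
eval :: Term -> Term
eval t = maybe t eval (step t)
```

For example, `eval (TIf (TIf TTrue TFalse TTrue) TTrue TFalse)` first reduces the guard and then picks the else-branch, yielding `TFalse`. Adding a language feature then means adding constructors to `Term` and equations to `step`, which mirrors the "extend this formal system with feature X" exercises.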

By the way, I believe that TaPL has several exercises of the form "extend this formal system with feature X". That might be a good way to get practice in a relatively simple setting.

The best advice, I think, is just to start small. Reading and understanding someone else's formal system is one thing, developing your own, another. You have to build up your skillset slowly, so you can't expect to formalize a complex language as a first step. Also, remember that the formalization is a *model* of your language. It doesn't have to be accurate or detailed, at first.

You can start with the operational semantics. Or you can start with the type system. But you don't have to do both at the same time! And I think working in either direction will help improve your understanding.
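To make the "start with the type system" direction concrete, here is a hedged sketch of a typing judgment written as a Haskell function, for an invented toy language of booleans and naturals. Each equation corresponds to one inference rule; `Nothing` plays the role of "no rule applies", i.e. the term is ill-typed.

```haskell
-- Illustrative types and terms for a tiny language (not anyone's
-- actual DSL).  typeOf t = Just ty  encodes the judgment  t : ty.
data Ty = TyBool | TyNat deriving (Eq, Show)

data Term
  = TTrue | TFalse
  | TZero
  | TSucc Term
  | TIf Term Term Term
  deriving (Eq, Show)

typeOf :: Term -> Maybe Ty
typeOf TTrue  = Just TyBool                  -- T-True
typeOf TFalse = Just TyBool                  -- T-False
typeOf TZero  = Just TyNat                   -- T-Zero
typeOf (TSucc t) = do                        -- T-Succ: premise t : Nat
  TyNat <- typeOf t
  Just TyNat
typeOf (TIf c t e) = do                      -- T-If: guard is Bool,
  TyBool <- typeOf c                         -- branches agree
  ty1 <- typeOf t
  ty2 <- typeOf e
  if ty1 == ty2 then Just ty1 else Nothing
```

Note how directly the function mirrors the rules you would write on paper: the premises become the monadic binds, and the conclusion becomes the returned type.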

In my experience, after you get accustomed to doing operational semantics, it becomes rather like programming. (If you program in ML, it becomes *very much* like programming). The main difference is that the "language" you're programming in is mathematics.

Finally, I would suggest looking at the implementation chapters of TaPL. Although that's going from formal system to implementation, the opposite of what you want, I think seeing how to match the formal semantics to an actual implementation can be very enlightening.

From another beginner...

Try The Denotational Description of Programming Languages - I found it helpful. Short and a lot of math, but they do things very step-by-step, first formalizing simple atoms, then expressions, then adding continuations and state functions and locations.

Also check out the formal semantics of, e.g., Scheme. It's still relatively impenetrable to me, but at least some of the symbols make sense now.

A couple of observations

1. Specifying formal semantics isn't easy. Note that most languages don't have formal semantics... At least they don't have them when they are created. Many interesting features are hard to model.

2. There are various levels on which you can specify semantics. Denotational semantics is perhaps the most advanced in terms of the math background you need. Operational semantics might be easier, and many prefer it anyway.

3. It is easier to learn to read formal definitions than to come up with them. Most texts that don't deal directly with semantics implicitly train you in the former, but not the latter. You might want to try a book on semantics (e.g., Winskel).

4. It is true that we don't often mention tutorials and such that might help you achieve your goal. We discussed these things early on, so do check the archives (the theory department is the place to look). The tutorials and lecture notes section on Frank's PLT page links to many useful resources.

Kernel language approach

The kernel language approach as defined in CTM is one good way to define a semantics. You start by peeling off the syntactic extensions to get to the simple language that is at the core of the execution. Then you define an abstract machine for this simple language. CTM uses this approach throughout and gives a lot of examples. With some luck your DSL will be close to one of the kernel languages given in CTM.
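The "peeling off the syntactic extensions" step can be sketched in code. Below is a hedged, invented example in Haskell: a surface language with `and`/`or` as conveniences, desugared into a kernel that has only literals and conditionals, so the semantics need only be defined for the kernel.

```haskell
-- Surface language: includes And/Or as syntactic sugar.
-- Both languages here are invented for illustration.
data Surface
  = SBool Bool
  | SAnd Surface Surface
  | SOr  Surface Surface
  | SIf  Surface Surface Surface

-- Kernel language: only literals and conditionals.
data Kernel
  = KBool Bool
  | KIf Kernel Kernel Kernel
  deriving (Eq, Show)

-- Peel off the sugar: And/Or become nested Ifs, following the
-- usual short-circuit reading (a && b = if a then b else false).
desugar :: Surface -> Kernel
desugar (SBool b)   = KBool b
desugar (SAnd a b)  = KIf (desugar a) (desugar b) (KBool False)
desugar (SOr  a b)  = KIf (desugar a) (KBool True) (desugar b)
desugar (SIf c t e) = KIf (desugar c) (desugar t) (desugar e)
```

The payoff is that the abstract machine only ever sees `Kernel` terms, so each surface convenience you add costs one `desugar` equation rather than new semantic rules.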


I also suggest that you look at the LtU discussion Mechanics of designing and implementing a language. In that discussion I made a posting giving some rules of thumb for language design that might be useful to you. I think you should plunge right in. Only by trying to do it will you learn how to do it.

Interpreter

It is pretty hard to answer your question without any details. But if I were you, and I wanted to describe the (dynamic) semantics of your DSL denotationally, I would start by sitting down and writing a super-slow, very transparent interpreter for it in Haskell. If you do it in the obvious way (define a datatype for ASTs and then an evaluation function) then it will give you a denotational semantics for (a superset of) your language.

Of course, depending on your DSL, it might be wordy and ugly since this sort of semantics is constrained by the fact that it needs to be constructive/computable. But it gives you a start, something you can run and test, and from there on you can 'optimize' it into something more concise and conventionally mathematical (e.g., throw in some existentials).
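The "obvious way" described above might look like this. A hedged sketch, not the poster's actual DSL: an AST datatype plus an evaluation function over environments, where `eval e env` reads as the denotation [[e]] env.

```haskell
-- Illustrative expression language with variables and let-binding.
import qualified Data.Map as Map

data Expr
  = Num Integer
  | Var String
  | Add Expr Expr
  | Let String Expr Expr   -- let x = e1 in e2
  deriving Show

type Env = Map.Map String Integer

-- The evaluation function *is* the (denotational) semantics:
-- Maybe models failure, e.g. an unbound variable.
eval :: Expr -> Env -> Maybe Integer
eval (Num n)       _   = Just n
eval (Var x)       env = Map.lookup x env
eval (Add a b)     env = (+) <$> eval a env <*> eval b env
eval (Let x e1 e2) env = do
  v <- eval e1 env
  eval e2 (Map.insert x v env)
```

For instance, `eval (Let "x" (Num 2) (Add (Var "x") (Num 3))) Map.empty` gives `Just 5`. From here, replacing `Maybe` with richer structure (state, continuations) is exactly the kind of step-by-step extension the denotational-semantics texts walk through.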