A modular toolchain for parsing and compiling code?

What do you think of Ted Neward's blog post about Modular Toolchains? Is it feasible to have a modular toolchain all the way from the source code to the executable, or is that just utopia? Is it possible to convert every language to a standardized AST? Which language features are difficult to represent in such an AST? Where are the problems?

Some thoughts:

* Modular toolchains aren't novel or utopian. They are just a matter of good design.

* A universal target AST won't be useful, since languages aren't really related by syntax.

* The future won't be a single pipeline but a graph of language transformers that generates different parts of the product, which are deployed in different ways (à la Haxe, Links, Volta).

* I can't wait for something like this to wind up in the open world.
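To make the "graph of transformers" idea a little more concrete, here is a minimal Python sketch, loosely in the spirit of Links' split between client-side and database code but not modeled on the actual design of Haxe, Links or Volta. The statement format, the tier names, `compile_program` and the two toy back ends are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical statement format: (tier, operation, argument).
# The tier says which part of the product the statement belongs to.
Stmt = Tuple[str, str, str]

def emit_client(stmts: List[Stmt]) -> str:
    """Toy JavaScript back end: understands only the "alert" operation."""
    return "\n".join(f'alert("{arg}");' for _, op, arg in stmts if op == "alert")

def emit_db(stmts: List[Stmt]) -> str:
    """Toy SQL back end: understands only the "select" operation."""
    return "\n".join(f"SELECT * FROM {arg};" for _, op, arg in stmts if op == "select")

# The graph: one front end fans out to one back end per tier.
BACKENDS: Dict[str, Callable[[List[Stmt]], str]] = {
    "client": emit_client,
    "db": emit_db,
}

def compile_program(program: List[Stmt]) -> Dict[str, str]:
    """Split the program by tier and hand each slice to its own back end."""
    by_tier: Dict[str, List[Stmt]] = {}
    for stmt in program:
        by_tier.setdefault(stmt[0], []).append(stmt)
    return {tier: BACKENDS[tier](stmts) for tier, stmts in by_tier.items()}

program = [("db", "select", "users"), ("client", "alert", "loaded")]
outputs = compile_program(program)
print(outputs["client"])  # alert("loaded");
print(outputs["db"])      # SELECT * FROM users;
```

Adding a deployment target here means adding one entry to `BACKENDS`; the front end stays untouched, which is the modularity the comment is after.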

Of course, a universal

Of course, a universal target AST won't be useful, but could the generated AST be transformed to a universal abstract tree on which everything else could operate in a uniform way?

Why, yes it could...

See S-expressions :-).

Or XML for that matter. :)

Of course, in both cases (sexprs and XML), simply identifying a convenient way to represent a tree as a stream of text doesn't buy you very much. Much more interesting is what goes *on* the tree, and here languages vary widely. While Turing completeness (and the existence of numerous IRs, VMs and CPU instruction sets) shows that a "universal AST" is possible, such an AST will be highly divorced from the semantics of the target language.
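How little the serialization itself buys can be seen in a minimal Python sketch: one generic node type, printed once as an S-expression and once as XML. The `Node` class and the node kinds `call`, `var` and `int` are hypothetical; nothing in the code says what a `call` means, which is exactly the part that differs between languages.

```python
from dataclasses import dataclass, field
from typing import List, Union
from xml.sax.saxutils import escape

@dataclass
class Node:
    """Hypothetical generic tree node: a kind tag plus children or text leaves."""
    kind: str
    children: List[Union["Node", str]] = field(default_factory=list)

def to_sexpr(n: Union[Node, str]) -> str:
    """Render the tree as an S-expression."""
    if isinstance(n, str):
        return n
    return "(" + " ".join([n.kind] + [to_sexpr(c) for c in n.children]) + ")"

def to_xml(n: Union[Node, str]) -> str:
    """Render the same tree as XML, escaping text leaves."""
    if isinstance(n, str):
        return escape(n)
    inner = "".join(to_xml(c) for c in n.children)
    return f"<{n.kind}>{inner}</{n.kind}>"

tree = Node("call", [Node("var", ["f"]), Node("int", ["42"])])
print(to_sexpr(tree))  # (call (var f) (int 42))
print(to_xml(tree))    # <call><var>f</var><int>42</int></call>
```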

It's been done...

...for some definition of "done." See the Zephyr Compiler Infrastructure project and SUIF2 for examples.

It's not clear that you need that much infrastructure, however. For example, the fact that OCaml separates its parser from the rest of the compiler, in the form of camlp4, is under-understood and under-appreciated. It's put to good use, however, in Graydon Hoare's One Day Compiler tutorial, which is essentially based on using camlp4's ability to construct and manipulate ASTs for the rest of the compiler to build from. The other interesting point is that he uses OCaml as an intermediate language to emit a C program, which is the ultimate thing the end user runs. Highly recommended reading.
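The shape of that design, a front end that only builds trees and a separate back end that emits C, can be sketched in Python. This is not camlp4 and not the tutorial's code: the toy arithmetic grammar, the `Num`/`BinOp` classes, `parse` and `emit_c` are hypothetical, chosen only to show a parser and a code generator meeting at a shared AST.

```python
import re
from dataclasses import dataclass
from typing import Union

# --- Shared AST: the only thing front end and back end agree on ---
@dataclass
class Num:
    value: int

@dataclass
class BinOp:
    op: str  # one of "+", "-", "*"
    left: "Expr"
    right: "Expr"

Expr = Union[Num, BinOp]

# --- Front end: recursive-descent parser for integer + - * and parentheses ---
class Parser:
    def __init__(self, src: str):
        self.tokens = re.findall(r"\d+|[-+*()]", src)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self) -> Expr:  # expr := term (("+" | "-") term)*
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = BinOp(op, node, self.term())
        return node

    def term(self) -> Expr:  # term := atom ("*" atom)*
        node = self.atom()
        while self.peek() == "*":
            self.take()
            node = BinOp("*", node, self.atom())
        return node

    def atom(self) -> Expr:  # atom := integer | "(" expr ")"
        tok = self.take()
        if tok == "(":
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"unexpected token {tok!r}")
        return Num(int(tok))

def parse(src: str) -> Expr:
    p = Parser(src)
    tree = p.expr()
    if p.peek() is not None:
        raise SyntaxError("trailing input")
    return tree

# --- Back end: walk the AST and emit a complete C program ---
def emit_c(tree: Expr) -> str:
    def go(e: Expr) -> str:
        if isinstance(e, Num):
            return str(e.value)
        return f"({go(e.left)} {e.op} {go(e.right)})"
    return ("#include <stdio.h>\n"
            "int main(void) {\n"
            f'    printf("%d\\n", {go(tree)});\n'
            "    return 0;\n"
            "}\n")

print(emit_c(parse("1 + 2 * (3 - 4)")))
```

Because the two halves share nothing but the AST classes, either one could be replaced (a different surface syntax, or a back end targeting something other than C) without touching the other.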

Something along the same lines

I don't think it's possible to convert all languages into a standard AST, but I think there's plenty of room for an AST-based language. The biggest problem with a universal AST is that every language defines the semantics of each language element just differently enough for it to be a problem. This was also reflected in the comments on the blog and above.
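Integer division is a small, concrete case of that semantic drift: Python's `//` rounds toward negative infinity, while C (since C99) truncates toward zero, so the same `div` node yields different answers. A minimal Python sketch; the tuple node format, the `SEMANTICS` table and `evaluate` are hypothetical.

```python
def div_floor(a: int, b: int) -> int:
    """Python semantics: // rounds toward negative infinity."""
    return a // b

def div_trunc(a: int, b: int) -> int:
    """C semantics (C99 onward): integer division truncates toward zero."""
    q = abs(a) // abs(b)
    # Negate when the operands have opposite signs (q is 0 anyway when a == 0).
    return q if (a >= 0) == (b > 0) else -q

# Hypothetical "universal" node ("div", a, b); each language supplies its meaning.
SEMANTICS = {
    "python": {"div": div_floor},
    "c": {"div": div_trunc},
}

def evaluate(node, lang: str) -> int:
    op, a, b = node
    return SEMANTICS[lang][op](a, b)

node = ("div", -7, 2)
print(evaluate(node, "python"))  # -4
print(evaluate(node, "c"))       # -3
```

A shared AST can carry the node, but it cannot also carry a single meaning for it; either the node records which semantics it has, or every consumer has to know.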

A while back I wrote a paper, rejected by OOPSLA, on a non-text-based language using a self-defined data format called Argot. The paper is at the URL below. I'd be interested in feedback from LTU people. It has themes of a universal AST, but doesn't go in the direction of trying to replicate other languages.


I'm just starting to work on this idea again. I've been looking at intentional programming, which is what started this thread.


I'm currently looking at the concept of strongly typed S-expressions; that is, each S-expression may only contain types defined by previous definitions. I've got some more work to do, but I will post about it on my blog and link it on LTU when I'm done.
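One possible reading of "strongly typed S-expressions", sketched in Python: a schema grows one definition at a time, each definition may refer only to built-in types or types defined before it, and an expression is checked against the schema. The `define`/`check` functions, the built-in types `int` and `symbol`, and the tuple encoding of S-expressions are hypothetical and not taken from the author's work.

```python
from typing import Dict, List, Union

# Hypothetical encoding: an S-expression is an int, a str (symbol) or a tuple
# whose first element names the node type, e.g. ("point", 0, 0).
SExpr = Union[int, str, tuple]
Schema = Dict[str, List[str]]

BUILTINS = {"int", "symbol"}

def define(schema: Schema, name: str, child_types: List[str]) -> None:
    """Add a node type; it may reference only built-ins or earlier definitions."""
    if name in schema or name in BUILTINS:
        raise TypeError(f"{name!r} is already defined")
    for t in child_types:
        if t not in BUILTINS and t not in schema:
            raise TypeError(f"{name!r} refers to undefined type {t!r}")
    schema[name] = list(child_types)

def check(schema: Schema, expr: SExpr, expected: str) -> bool:
    """Return True if expr is a well-typed value of type `expected`."""
    if expected == "int":
        # bool is a subclass of int in Python; exclude it explicitly.
        return isinstance(expr, int) and not isinstance(expr, bool)
    if expected == "symbol":
        return isinstance(expr, str)
    if expected not in schema:
        raise TypeError(f"undefined type {expected!r}")
    if not (isinstance(expr, tuple) and expr and expr[0] == expected):
        return False
    fields, args = schema[expected], expr[1:]
    return len(args) == len(fields) and all(
        check(schema, arg, t) for arg, t in zip(args, fields)
    )

schema: Schema = {}
define(schema, "point", ["int", "int"])
define(schema, "label", ["symbol", "point"])
print(check(schema, ("label", "origin", ("point", 0, 0)), "label"))    # True
print(check(schema, ("label", "origin", ("point", 0, "x")), "label"))  # False
```

Because a definition can only refer to earlier ones, this toy version cannot express recursive types such as lists or trees; a fuller design would need some way to allow them.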

Example: Stratego/XT

Stratego/XT implements such a modular toolchain.
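A central idea in Stratego is keeping rewrite rules separate from the strategies that decide where and how often those rules are applied. The following Python sketch imitates that split, with terms as tuples and rules as functions that return `None` on failure. It is not Stratego syntax or its library; the combinators only echo Stratego's `try`, `all`, `bottomup` and left choice (`<+`), and the rules `fold_add` and `mul_one` are hypothetical.

```python
from typing import Callable, Optional

# Hypothetical term encoding: ("Constructor", child, ...) tuples with int leaves.
# A strategy maps a term to a new term, or to None to signal failure.
Term = object
Strategy = Callable[[Term], Optional[Term]]

def choice(*strategies: Strategy) -> Strategy:
    """Left choice: the first strategy that succeeds wins."""
    def run(t):
        for s in strategies:
            out = s(t)
            if out is not None:
                return out
        return None
    return run

def try_(s: Strategy) -> Strategy:
    """Apply s, but keep the term unchanged if s fails."""
    def run(t):
        out = s(t)
        return t if out is None else out
    return run

def all_children(s: Strategy) -> Strategy:
    """Apply s to every direct child; fail if any child fails."""
    def run(t):
        if not isinstance(t, tuple):
            return t  # leaves have no children
        kids = []
        for child in t[1:]:
            out = s(child)
            if out is None:
                return None
            kids.append(out)
        return (t[0], *kids)
    return run

def bottomup(s: Strategy) -> Strategy:
    """Rewrite the children first, then apply s to the rebuilt node."""
    def run(t):
        out = all_children(run)(t)
        return None if out is None else s(out)
    return run

# Rewrite rules: each knows nothing about traversal order.
def fold_add(t):
    # Add(i, j) -> i + j when both operands are integer literals.
    if (isinstance(t, tuple) and len(t) == 3 and t[0] == "Add"
            and isinstance(t[1], int) and isinstance(t[2], int)):
        return t[1] + t[2]
    return None

def mul_one(t):
    # Mul(1, x) -> x
    if isinstance(t, tuple) and len(t) == 3 and t[0] == "Mul" and t[1] == 1:
        return t[2]
    return None

simplify = bottomup(try_(choice(fold_add, mul_one)))
print(simplify(("Mul", ("Add", 0, 1), ("Add", 2, 3))))  # 5
```

The same two rules could be reused under a different strategy (say, one that rewrites top-down or repeats until a fixed point) without changing them.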

I don't think a 'universal AST' would make much sense, but some languages play this role, as they are often used as a canonical intermediate language.