Meta-Compilation of Language Abstractions

Meta-Compilation of Language Abstractions, a dissertation by Pinku Surana.

High-level programming languages are currently transformed into efficient low-level code using optimizations that are encoded directly into the compiler. Libraries, which are semantically rich user-level abstractions, are largely ignored by the compiler. Consequently, library writers often provide a complex, low-level interface to which programmers "manually compile" their high-level ideas. If library writers provide a high-level interface, it generally comes at the cost of performance. Ideally, library writers should provide a high-level interface and a means to compile it efficiently.

This dissertation demonstrates that a compiler can be dynamically extended to support user-level abstractions. The Sausage meta-compilation system is an extensible source-to-source compiler for a subset of the Scheme programming language. Since the source language lacks nearly all the abstractions found in popular languages, new abstractions are implemented by a library and a compiler extension. In fact, Sausage implements all its general-purpose optimizations for functional languages as compiler extensions. A meta-compiler, therefore, is merely a shell that coordinates the execution of many external extensions to compile a single module. Sausage demonstrates that a compiler designed to be extended can evolve and adapt to new domains without a loss of efficiency.

A very interesting and detailed paper, which touches on many perennial LtU subjects, and once again shifts the line between user programs and the compiler. If you're tempted to say "this sounds like X...", then read Chapter 2, which gives a comprehensive comparison to alternative approaches, including static type inference, traditional macro systems, templates, partial evaluation, and multi-stage languages such as MetaML and MetaOCaml.

Some carefully selected quotes which I think summarize the summary quite well:

The Sausage system provides a framework for programmers to write new domain-specific optimizations and inject them into the main compiler.
...
This dissertation takes the rather extreme view that anything beyond Core Scheme is a compiler extension.
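To give a flavor of the idea, here is a minimal sketch of what an injected optimization might amount to. The names and representation below are invented, not Sausage's actual API; the essential point is just that an extension is, at bottom, a function from source trees to source trees:

    ;; A hypothetical compiler extension shipped alongside an imagined
    ;; matrix library: it teaches the compiler that transposing twice
    ;; is the identity.
    (define (matrix-simplify expr)
      (cond ((and (pair? expr)
                  (eq? (car expr) 'transpose)
                  (pair? (cdr expr))
                  (pair? (cadr expr))
                  (eq? (car (cadr expr)) 'transpose))
             ;; (transpose (transpose m)) -> m
             (matrix-simplify (cadr (cadr expr))))
            ((pair? expr)
             (map matrix-simplify expr))
            (else expr)))

    ;; (matrix-simplify '(f (transpose (transpose a))))  =>  (f a)

A meta-compiler in this style is then exactly the "shell" the abstract describes: it coordinates the execution of many such functions over each module.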

Pinku Surana will be presenting this work at a meeting of LispNYC on Feb 13th in New York City. An announcement with details of the meeting can be found here.

C#

I would like to see more metaprogramming support in C#.

For example, (pre/post) aspect-oriented programming support.

:)

Also, on the subject of source-to-source compilers, one might be interested in a C#-to-JavaScript one :)

Cheers

How is this relevant to the

How is this relevant to the post you are commenting on?

Well, the post is talking

Well, the post is talking about metaprogramming.

This blog is also about C#, as far as I can tell.

So, I have a project about C# that involves some metaprogramming.

I was linking to my project to get feedback on the idea, or to hear about others' experience with metaprogramming.

That's all.

But if I am wrong, that is, if there is no connection, then I am sorry. I need to sleep before posting on Lambda the Ultimate :)

Also related: Active

Also related: Active Libraries.

Interesting, but possibly limited

The idea behind Sausage is very powerful and very interesting, although the implementation in this thesis takes a lot of shortcuts. As they are the kind of shortcuts that I would expect in an MSc (because of the time pressure), it will be really interesting to see if the author continues the work into a PhD. If he, or anyone from his group, is reading, there are a few questions that it would be nice to discuss.

The description of termination in the metaprogramming chapter is a little ... loose. The phase-ordering problem is hard enough to have had a lot of work devoted to it, and yet in Sausage it seems to be brushed under the carpet. Is it a property of your rewriting system that the problem becomes easier, or is Sausage simply ignoring it?

One aspect that wasn't particularly clear is how the possible AST rewrites are applied. The text states that all rewrites are checked against each tree to make sure that they are applied whenever they can be - this assumes that the transformation graph is "sparse". What happens in a "dense" graph, where multiple rewrites can be applied to the same AST? What measure is used to choose between them? If an arbitrary rewrite is chosen, then other rewrites may be masked and can no longer be applied to the transformed tree.
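To make the masking concern concrete, here is an invented example (not taken from the thesis) of two rewrites competing for the same tree:

    ;; R1: map/map fusion, matching at the root.
    (define (fuse expr)
      (if (and (pair? expr) (eq? (car expr) 'map) (= (length expr) 3)
               (pair? (caddr expr)) (eq? (car (caddr expr)) 'map))
          (list 'map
                (list 'compose (cadr expr) (cadr (caddr expr)))
                (caddr (caddr expr)))
          expr))

    ;; R2: a hypothetical rewrite of the inner map call only.
    (define (specialize-inner expr)
      (if (and (pair? expr) (eq? (car expr) 'map) (= (length expr) 3)
               (pair? (caddr expr)) (eq? (car (caddr expr)) 'map))
          (list 'map
                (cadr expr)
                (list 'unrolled-map (cadr (caddr expr)) (caddr (caddr expr))))
          expr))

    ;; Both match the same tree:
    ;;   (fuse '(map f (map g xs)))             => (map (compose f g) xs)
    ;;   (specialize-inner '(map f (map g xs))) => (map f (unrolled-map g xs))
    ;; Apply R2 first and R1 can never fire: R2 has masked R1.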

A couple of the termination strategies seemed slightly dodgy. Generating AST hashes will only catch cycles; there are also infinite paths, where a rule unfolds some structure, that would not be caught by that strategy. This may be what is meant by the "-N -> N * -1" example: the wording makes it seem like it would be rejected because of the mutual recursion with the previous rule, but it could also be rejected because "-1 -> 1 * -1" can be applied infinitely often. The final solution seems a lot like hill climbing (is that what it is?), and so it would be limited in the solutions it could find.
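For the record, here is a tiny sketch of the unfolding I mean (my own illustration, not code from the thesis), writing -N as (neg N) and N * M as (mul N M):

    ;; The rule  -N -> N * -1  applied anywhere in a tree.
    (define (step term)
      (cond ((and (pair? term) (eq? (car term) 'neg))
             (list 'mul (cadr term) '(neg 1)))   ; -N -> N * -1
            ((pair? term)
             (map step term))
            (else term)))

    ;; (step '(neg 1))         => (mul 1 (neg 1))
    ;; (step (step '(neg 1)))  => (mul 1 (mul 1 (neg 1)))
    ;; Every iterate is a brand-new tree, so a strategy that hashes
    ;; whole ASTs and rejects repeats never triggers: the rewriting
    ;; path is infinite but contains no cycle.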

All in all it is an interesting result, and it's always good to see more metaprogramming in the world...

Clarification

Thanks for your comments - the author is aware of this thread, so I hope he will respond. Just one factual point: afaict, this is a PhD thesis (its front page contains a clue to that effect). Also, the linked meeting announcement mentions that Dr. Surana received a PhD in CS from Northwestern. Apologies for not making that clear in my post.

Doh!

Sorry, I must have mentally blacked it out after reading the word "dissertation". I did think it was rather in-depth and long for an MSc thesis...

Re: Interesting, but possibly limited

Thanks for reading my work. The shortcuts are there because I'm extremely lazy.

(1) Sausage doesn't attempt to prove termination at all (just like most rewrite systems). I opted for maximum power to see how far the technique would go, rather than add safety features too early. There's lots of work out there on rewrite systems that prove confluence and termination at compile-time; however, the rewrite rules are necessarily restricted so they can be analyzed. Solving termination is way over my head and not the main thrust of my idea. A "safe" rewrite system could easily be layered on top of my system.

(2) I didn't need complex rule ordering. Most systems, including mine, simply apply the rules in the order in which they appear in the file. Most Prolog systems do that, so it's good enough for me. A rule doesn't have to be 'match -> replace'. It can be 'high-level-match -> (big function to select the right replacement)'. So if you need to do more disambiguation, do it on the right-hand side.
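To sketch that second rule shape (the names and representation here are invented for illustration, not Sausage's actual rule language): the right-hand side can be an ordinary function that inspects the match and picks among several replacements.

    ;; 'high-level-match -> (big function to select the replacement)':
    ;; match any (append e1 e2), then disambiguate in plain Scheme.
    (define (rewrite-append expr)           ; expr is (append e1 e2)
      (let ((e1 (cadr expr))
            (e2 (caddr expr)))
        (cond ((equal? e1 ''())             ; (append '() ys) -> ys
               e2)
              ((and (pair? e1)              ; (append (list x) ys) -> (cons x ys)
                    (eq? (car e1) 'list)
                    (= (length e1) 2))
               (list 'cons (cadr e1) e2))
              (else expr))))                ; no profitable replacement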

(3) Phase ordering is a more complicated problem. Since I don't know which compiler extensions will be run together, I can't do any preprocessing to merge extensions or otherwise deal with phase ordering. That's why I run the extensions in such a simple and expensive way. If meta-compilation actually becomes more popular, someone smarter than me will solve this problem.
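Schematically - and this is only a sketch of the strategy, not the real implementation - "simple and expensive" means something like: run every extension over the module, and repeat until the tree stops changing.

    ;; Sketch only: apply each extension in turn, iterate to a fixpoint.
    (define (apply-all exts ast)
      (if (null? exts)
          ast
          (apply-all (cdr exts) ((car exts) ast))))

    (define (run-extensions exts ast)
      (let ((ast* (apply-all exts ast)))
        (if (equal? ast* ast)
            ast                             ; no extension fired: done
            (run-extensions exts ast*))))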

I don't want folks to get distracted by my simple prototype; only the idea is important. Consider: 10 years after XML became ubiquitous, Microsoft will add LINQ to C#. With my system, an XML library writer could have added this embedded DSL immediately. That's the point: complex library APIs are really embedded DSLs and can be simplified/optimized by custom compiler passes.
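To make that concrete with a deliberately tiny, invented example (lists standing in for XML, not any real library's API): the library exposes (map f (filter p doc)), which walks the data twice, and a library-shipped extension could rewrite such calls into the single pass a programmer would otherwise write by hand.

    ;; The fused form an extension could produce from (map f (filter p doc)).
    (define (fold-right f init xs)
      (if (null? xs)
          init
          (f (car xs) (fold-right f init (cdr xs)))))

    (define (map-filter-fused f p doc)
      (fold-right (lambda (node acc)
                    (if (p node) (cons (f node) acc) acc))
                  '()
                  doc))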

Good response

Your answers have satisfied my curiosity. If anything, I'd say that you're being overly modest ;^) The simplicity of your approach is a hallmark of elegance, and although I've seen a lot of systems that try to suck the library level into the language for analysis, most of them proved to be overly complex. A project that I worked on a couple of years ago tried to do something similar using Ciao Prolog as a metaprogramming language; although we achieved some results, the complexity of our approach was a killer in the end. Too many layers of abstraction become fragile.

Your point (2) is quite interesting. In my mind I was thinking of trying to automatically match up the possible transformations to sites in the AST, but of course doing that would create a large (hard) ordering problem. Using patterns as single heads with multiple clauses, Prolog-style, is a good solution. There is still the issue of merging clauses from different authors / libraries, but then what's research without future work? :^)

Fit with Albert?

I've not finished reading the paper yet, but it's interesting...

In the paper discussed at http://lambda-the-ultimate.org/node/2021, Alan Kay talks about Albert, a 'very dynamic' system:
http://piumarta.com/papers/albert.pdf

I'm trying to understand how Sausage and Albert "match"; I'm not sure yet.

See also Active Libraries

This thesis seems to have some connection with Active Libraries and Universal Languages, which approaches library-specific optimization by providing a language with a fixed Turing-complete core of guaranteed optimizations, and arranging for libraries to take advantage of these optimizations.

Interested But Lazy

Like the author, I'm extremely lazy :) Is there a conference- or journal-length summary of this work? The usual searches turned up nothing.

Help 4 the lazy