Multi-Stage Languages

From the MetaOCaml tutorial.

Hygienic quasi-quotation notation for distinguishing different computational stages (that is, the generating vs. the generated code) in a program.

Ignoring OCaml for now: in designing several languages, I have tried to syntactically distinguish between multiple stages (as C++ does with its template and macro systems), and now I have been asking myself the fundamental question: is that really a good idea?

If a language is pure (or at least side-effects are contained within some construct like monads) can the compiler not simply choose to apply as many stages as desired, using only partial evaluation?

The reason I ask the question is that if I do something like create a subset of the language with a different notation, it just obfuscates things. It seems it would be much easier to provide metaprogramming primitives, and let the compiler's partial evaluation phase try to work out what it can, and do the rest at run-time. This may even solve some thorny type-system problems.

So, to sum up: my hypothesis is that I should let people write code as if types were first-class citizens. The partial-evaluation phase will pre-execute what it can; any unresolved type expressions will be evaluated at run-time.

Any thoughts on the subject would be welcome.


It's an incredibly good idea

It's an incredibly good idea if you want type checking (with some notion of checking code without evaluating it) to make sense. The code being quoted retains its own syntax and notation still, so I don't see the problem there - whereas I really don't trust a partial evaluation phase.

Additionally, I might well want to perform IO in one of the stages of computation. That'll well and truly trash the purity.

Purity Trashing

Additionally, I might well want to perform IO in one of the stages of computation. That'll well and truly trash the purity.

Agreed. However the impure types can still be contained and localized. As far as I can tell, the worst thing that can happen during pre-analysis / compilation is that some types become indeterminate at compile-time, and we have to rely on runtime type-checking, and use some form of variant type.

Only that's not much good if

Only that's not much good if my staged program is in fact analogous to a compiler and I don't want to ship the source off to the user.

Types as Abstract Interpretation

I was going to suggest something similar [i.e. achieving types via partial evaluation] in response to your previous post, but didn't get around to it. There's been academic work on this, such as Types as Abstract Interpretation (previously mentioned here). I've seen other papers along similar lines, but I'd have to dig for them.

However, I think that as far as a special syntax for metaprogramming goes, there's more to it than you've mentioned. Good compilers for functional languages already do quite a bit of partial evaluation, but that doesn't necessarily deal with the case in which code manipulates other code. Usually, when new constructs are introduced to support metaprogramming, it's for this purpose. It's possible, though, that in concatenative languages like Joy and Cat the line between ordinary programs and metaprograms is sufficiently blurred that you might not need to make any special distinctions. If so, I'd be interested to see the results of such a design.

[Edit: ...but to address the metaprogramming issue specifically, rather than the type issue, special syntax would help to statically distinguish between code that's just manipulating e.g. an ordinary list, vs. code that's manipulating other code. So I suspect that there's no getting away from special syntax for metaprogramming in general.]


Anton van Straaten: So I suspect that there's no getting away from special syntax for metaprogramming in general.

I'll go one step farther and challenge anyone to come up with anything simpler than MetaOCaml's "brackets," "escape," and "run" in a statically-typed multi-stage language with the language at all stages being identical.

You can read this as a challenge for Qi programmers, if you wish. :-)
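For readers unfamiliar with those three constructs, here is a rough analogue in Python. This is an illustrative sketch only: code values are plain source strings rather than MetaOCaml's typed code values, and the names are mine.

```python
# A rough Python analogue of MetaOCaml's three staging constructs
# (an illustrative sketch, not MetaOCaml itself):
#   "brackets" -> building a code value (here, a source string)
#   "escape"   -> splicing an already-built code value into a larger one
#   "run"      -> compiling and executing the code value

def power_code(n: int) -> str:
    """Generate the body of x**n as explicit multiplications."""
    if n == 0:
        return "1"
    # "escape": splice the code for x**(n-1) into the larger expression
    return f"(x * {power_code(n - 1)})"

def specialize_power(n: int):
    """'run': turn the generated source into an ordinary function."""
    src = f"lambda x: {power_code(n)}"
    return eval(src)  # compile and execute the staged code

cube = specialize_power(3)
print(power_code(3))   # the generated code: (x * (x * (x * 1)))
print(cube(2))         # 8
```

The point of MetaOCaml's typed brackets is exactly what this sketch lacks: a staging error in the string version only surfaces when the generated code runs, whereas MetaOCaml catches it when the generator is type-checked.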

Perhaps I am naive

Perhaps I am naive, but I do not see what the limitations would be to simply treating programs as lists of sub-programs. Could we not simply have the entire language (no less and no more) used to operate on them to generate new code? Why does a programmer need to specify the stages that occur? Can they not be implementation-defined and resolved at compilation wherever possible?

So as far as I can tell, we could do everything we need with just two primitives: the i combinator (eval or exec if you prefer) and a program_to_list type conversion.

However, I suspect I am missing something important here.
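The two primitives proposed above can be sketched in Python, with nested lists standing in for programs; since the representation is already a list, program_to_list is just the identity. All names here are illustrative.

```python
# A minimal sketch of the two proposed primitives, using nested Python
# lists as the program representation (names are illustrative):
#   run_program -> the "i combinator" (eval/exec)
#   programs are already lists, so program_to_list is the identity
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def run_program(prog, env=None):
    """Evaluate a program given as a nested list: [op, arg1, arg2, ...]."""
    env = env or {}
    if isinstance(prog, (int, float)):
        return prog
    if isinstance(prog, str):          # variable reference
        return env[prog]
    op, *args = prog
    return OPS[op](*(run_program(a, env) for a in args))

# Because programs are ordinary lists, metaprograms are ordinary
# list-manipulating code: here we rewrite "+ 0" away before running.
def simplify(prog):
    if not isinstance(prog, list):
        return prog
    op, *args = prog
    args = [simplify(a) for a in args]
    if op == "+" and 0 in args:
        args = [a for a in args if a != 0] or [0]
        if len(args) == 1:
            return args[0]
    return [op, *args]

prog = ["+", ["*", "x", 2], 0]
print(simplify(prog))                          # ['*', 'x', 2]
print(run_program(simplify(prog), {"x": 5}))   # 10
```

Note that simplify is just ordinary code over lists, which is precisely the appeal of this representation.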


It is possible to automatically PE code, if that's what you meant, but certain non-obvious cases aren't possible, e.g. if the programmer wants to do IO or non-referentially transparent operations at compile time (or pre-runtime).

In these scenarios

In these scenarios I was thinking that the compiler should resort to inserting variants (if not already done by the programmer) and runtime type checks to maintain type safety.

This shuts down any further analysis at compile-time, at least local to the side-effect, but I can't see any serious detriment to doing so. It would appear to be precisely what the programmer desires anyway.
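As a rough illustration of that fallback, here is a Python sketch of a tagged variant plus the runtime check a compiler might insert where static analysis gave up. The names and representation are mine, not any particular compiler's scheme.

```python
# Sketch of the fallback described above: when a type can't be resolved
# statically, wrap the value in a variant (tag + payload) and check the
# tag at run time instead. Names are illustrative.
from dataclasses import dataclass
from typing import Any

@dataclass
class Variant:
    tag: str        # e.g. "int", "str" -- decided at run time
    value: Any

def checked_add(a: Variant, b: Variant) -> Variant:
    # The runtime check the compiler would insert where static
    # analysis gave up:
    if a.tag != "int" or b.tag != "int":
        raise TypeError(f"expected int + int, got {a.tag} + {b.tag}")
    return Variant("int", a.value + b.value)

print(checked_add(Variant("int", 2), Variant("int", 3)).value)  # 5
```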

Incremental Typing as in Epigram?

Epigram does incremental typing, which from my viewpoint is a form of partial evaluation. Since types are first-class citizens in Epigram, incremental typing lets you inductively define many interesting things.

I agree with your hypothesis; it reminds me of FrankA's idea that dynamic and static type checking are the same thing, with static meaning 'available as a library'.

I will have to take a look

I will have to take a look at Epigram. This sounds very much like what I am envisioning.

Multi-Stage Programming With Hygienic Macros

I'm glad someone posted something on Multi-Stage Programming, as I'm futzing around with implementing a form of it in Scheme. I'm experimenting to find out whether it's feasible to implement multi-stage evaluation by adding a construct to eval, which would add an additional stage either before or after (or possibly both; I don't yet know which would be best) the standard macro-expansion phase, in which another round of evaluation of the source code would take place.

I think that by making a construct, let's call it "par-eval", which you can use to wrap code which would benefit from additional stages of evaluation, and then reevaluating it again, you could enable some very interesting behaviors.

A trivial example is the precomputation of constants:

(define x (par-eval (+ 2 3)))

which would be the same as writing:

(define x 5)

A non-trivial example would be the calculation of a non-trivial invariant prior to run-time and deployment (assuming that you can compile the resulting code after the first stages are complete), such as a regex generator. Let's say that your final program needs to read a particular file format (such as CSV), and for convenience you'd like to read it the quick and dirty way using regular expressions. Using this technique, the FSM needed to grok your format can be pre-generated prior to evaluation time, so that there would be no real performance hit for using a convenient technique instead of a more efficient, but less convenient, one.
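To make the idea concrete outside Scheme, here is a Python sketch in which re.compile plays the role of the earlier stage that pre-generates the matching machinery. The pattern and parsing helper are my own illustration, not part of the proposal above.

```python
# Sketch of the regex example in Python: compiling the pattern once,
# ahead of time, stands in for the earlier evaluation stage that
# pre-generates the matching machinery.
import re

# Stage 1 ("par-eval" analogue): build the matcher before any data arrives.
CSV_FIELD = re.compile(r'("(?:[^"]|"")*"|[^,]*)(,|$)')

def parse_csv_line(line: str) -> list:
    """Stage 2: the runtime just drives the precompiled machine."""
    fields = []
    pos = 0
    while pos <= len(line):
        m = CSV_FIELD.match(line, pos)
        field, sep = m.group(1), m.group(2)
        if field.startswith('"'):                     # quoted field
            field = field[1:-1].replace('""', '"')
        fields.append(field)
        if sep == "":                                 # matched end of line
            break
        pos = m.end()
    return fields

print(parse_csv_line('a,"b,""c""",d'))   # ['a', 'b,"c"', 'd']
```

The staging win the comment describes is that the equivalent of re.compile would happen before deployment, so the shipped program pays no construction cost at all.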

I'm still new to Multi-Stage programming, so I'd appreciate any feedback that you have: Does the above sound like a good or usable idea? Has it already been done (in full or in part) elsewhere?

Edit: Fixed typo.

Special Syntax

If you consider run-time code generation as a kind of metaprogramming (instead of macro-style metaprogramming), you need no extra syntax.

For example with CodeBricks you can write a run-time regexp compiler.

My own little RTCG thingy relies on run-time VM code transformations while hiding bytecode details: Comment.
With futures, it is even possible to delay the code generation/transformation until the target function is actually invoked. "Just-In-Time"...

Multi-stage is a compromise

Explicitly staged languages are a compromise, not a desirable goal.

Even the designers of MetaOCaml agree: the staging is useful as an experimental tool to learn more about how to get rid of it.

What is required is a partial evaluator, in which unknowns are progressively bound, yielding a more and more specialised program. When finally everything is bound, you have an irreducible term, which is your final result.
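As a toy illustration of that progressive binding, here is a Python sketch of a partial evaluator over arithmetic terms; the term representation and names are mine.

```python
# A tiny partial evaluator: terms are nested tuples ("+"/"*", l, r),
# numbers, or variable names. Each call to specialize binds more
# variables and reduces whatever just became reducible; the residual
# term is the "more and more specialised program".

def specialize(term, env):
    if isinstance(term, str):                 # variable: bind it if known
        return env.get(term, term)
    if not isinstance(term, tuple):           # literal: already irreducible
        return term
    op, l, r = term
    l, r = specialize(l, env), specialize(r, env)
    if isinstance(l, (int, float)) and isinstance(r, (int, float)):
        return l + r if op == "+" else l * r  # reducible: fold it now
    return (op, l, r)                         # residual (specialised) term

term = ("+", ("*", "x", "y"), "z")
t1 = specialize(term, {"y": 3})        # bind y first
print(t1)                              # ('+', ('*', 'x', 3), 'z')
t2 = specialize(t1, {"x": 2})          # more data arrives
print(t2)                              # ('+', 6, 'z')
print(specialize(t2, {"z": 4}))        # 10 -- everything bound: final result
```

Note that nothing here fixes the order or grouping of the bindings, which is exactly the contrast with staging drawn below.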

Staging requires strict level boundaries and sequencing of the binding information in stages, which has two advantages: it is possible to specify clearly what the inputs for each stage are, and it is possible to actually write a partial evaluator which handles all the intermediate stages the same way (the initial and final processing is likely to be rather specialised).

In turn, the ability to multi-stage with explicit program guidance significantly simplifies the staging recursion. On the other hand, a 'generalised partial evaluator' is a very open-ended thing: any such system must not only be able to bind more or less arbitrary inputs, it also has to make clear to the programmer how to proceed.

Conceptually, you could examine the current terms to see which ones might become reducible if more data were supplied, supply it, reduce the terms, then look for another term which could be simplified.

There's a big gap between such an interactive, experimental, evaluation strategy and a programming language for which you can easily predict what data to supply, without examining the internal state.