Closing the Stage: From Staged Code to Typed Closures

Yukiyoshi Kameyama, Oleg Kiselyov, Chung-chieh Shan. Closing the Stage: From Staged Code to Typed Closures. PEPM'08. (code)

Code generation lets us write well-abstracted programs without performance penalty. Writing a correct code generator is easier than building a full-scale compiler but still hard. Typed multistage languages such as MetaOCaml help in two ways: they provide simple annotations to express code generation, and they assure that the generated code is well-typed and well-scoped. Unfortunately, the assurance only holds without side effects such as state and control. Without effects, generators often have to be written in a continuation-passing or monadic style that has proved inconvenient. It is thus a pressing open problem to combine effects with staging in a sound type system.

This paper takes a first step towards solving the problem, by translating the staging away. Our source language models MetaOCaml restricted to one future stage. It is a call-by-value language, with a sound type system and a small-step operational semantics, that supports building open code, running closed code, cross-stage persistence, and non-termination effects. We translate each typing derivation from this source language to the unstaged System F with constants. Our translation represents future-stage code using closures, yet preserves the typing, alpha-equivalence (hygiene), and (we conjecture) termination and evaluation order of the staged program...

Essentially, representation of staged programs in an unstaged language turns out to be related to typechecking/typed compilation: typechecking is a stage. While this basic idea is straightforward, the work presented in the paper is rather intricate and depends on previous work on typed compilation. Oleg will be able to put this work in context in the forum, hopefully.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Typed compilation, proofs and staging

Recent discussions have brought together two seemingly unrelated areas: typed compilation and the representation of staging. By typed compilation I mean a transformation from an untyped to typed tagless representations. The untyped form is an AST (represented as a regular data type), which is usually the result of parsing a file or similar plain data. The typed tagless representation takes form of either generalized algebraic data types GADT (the initial approach), or alternatively, type-constructor--polymorphic terms (the final approach). Either type representation can be interpreted in various ways (e.g., evaluated, CPS-transformed, partially evaluated, etc). All these interpretations are assuredly type-preserving and patently free of any `type errors' such as failure to pattern-match type tags or reference to an unbound variable. The freedom from pattern-match failures is syntactically apparent: the representation just has no type tags at all, corresponding to closed code by construction. Therefore, the typed representations are handled internally as if no types existed: indeed, after typechecking all the types can be erased.

Typed compilation is a generalization of typechecking (the latter merely verifies if an expression is well-typed). Typed compilation is related to dynamic loading of (typed) code and is a part of implementing typed DSL. Typed compilation is inherently a partial operation (the source term may be ill-typed) and it has to deal with (run-time) type representations. The result of compilation, however, has no type-related partiality and needs no type information. `Typecheck once, assuredly interpret many times' is our main goal. It is achievable, albeit not simply. We follow the seminal work of Baars and Swierstra Typing Dynamic Typing. Given below are the applications of the ideas of that paper:

Implementing a typed DSL with the typed evaluator and the the typed compiler from untyped terms to GADT
http://okmij.org/ftp/Haskell/staged/TypecheckedDSL.hs

Implementing a typed DSL with the typed compiled evaluator and the the typed compiler from untyped terms to GADT. We use template Haskell to `compile' GADT to `machine code'
http://www.haskell.org/pipermail/haskell-cafe/2007-October/032657.html
http://okmij.org/ftp/Haskell/staged/TypecheckedDSL.hs
http://okmij.org/ftp/Haskell/staged/TypedTermLiftTest.hs

Typed compilation from untyped terms to the Tagless Final representation (which can be evaluated by all the interpreters in the Tagless Final paper)
http://okmij.org/ftp/Haskell/staged/IncopeTypecheck.hs

It is the latter typed compiler that shows uncanny similarities with the staging. In the hindsight, the connection is obvious: typechecking is a stage in the evaluation of a source code expression. The connection is deeper than the mere conceptual correspondence, and is closely related to a different paper on the representation of staging in an unstaged language (described above). To type-compile an untyped term to a higher-order abstract syntax, we need staging, or its emulation. Both type-checking and staging have to manipulate open code, whose free variables are hypotheses, or type abstractions. Another remarkable part of the tagless final compilation is its close relationship with proofs in logic. Typechecking is truly building a proof, using hypothetical reasoning. Moreover, our type compiler must be explicit with weakening (to extend the typing environment) and the application of structural rules (which take the form of cut-rules). The result of type compiling of the body of the function happens do be the same as that of compiling the whole function, which is the statement of the Deduction Theorem. The result of our typed compilation includes many administrative redices, which would be absent had we used genuine staging. One may consider genuine staging to be `cut elimination' with respect to structural rules.

This is a fascinating read,

This is a fascinating paper, and even after two weeks, and multiple reads, I'm still struggling with it. It's to be expected though, and I've already garnered some very useful insights.

I think the tagless approach is significant, and this paper is an important exploration in the same space, and my enthusiasm has absolutely nothing to do with the fact that Oleg actually cited my post in that thread. Nope, none whatsoever. ;-)

In fact, I just posted about my tagless embedding of the Orc language in C#. The tagless approach seems to has a close relationship with combinators, and I used that to good effect in my embedding of Orc' combinators.

Even more fun

If you have understood all of the above, then there are a few further obvious remarks still worth making: in a staged environment, most type-level trickery is completely unnecessary, since ordinary terms can be used as witnesses for properties. In effect, you get arbitrary computations at the type (or kind or ...) level ``for free''.

This does not quite mean that you get dependent types for free. This is also obvious: general dependent types must exist, somehow, at run-time [think of the case of Matrices, where the size usually needs a run-time representation]. What you do get for free are all properties which are statically decidable. The interesting question then becomes, exactly how much type erasure can you still do? [The answer is obvious, but uncomputable: anything knowable statically can be erased; the hard part is to figure out decision procedures that are powerful and useful].