A type-correct, stack-safe, provably correct, expression compiler in Epigram

In the continuing story called Epigram, a new contribution by James McKinna and Joel Wright.

Conventional approaches to compiler correctness, type safety and type preservation have focused on off-line proofs, either on paper or formalised with a machine, of existing compilation schemes with respect to a reference operational semantics. This pearl shows how the use of dependent types in programming, illustrated here in Epigram, allows us not only to build in these properties, but to write programs which guarantee them by design and subsequent construction.
We focus here on a very simple expression language, compiled into tree-structured code for a simple stack machine. Our purpose here is not to claim any sophistication in the source language being modelled, but to show off the metalanguage as a tool for writing programs for which the type preservation and progress theorems are self-evident by construction, and finally, whose correctness can be proved directly in the system.
In this simple setting we achieve the following:
• a type-preserving evaluation semantics, which takes typed expressions to typed values;
• a compiler, which takes typed expressions to stack-safe intermediate code;
• an interpreter for compiled code, which takes stack-safe intermediate code to a big-step stack transition;
• a compiler correctness proof, described via a function whose type expresses the equational correctness property.
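
For a flavour of what this looks like, here is a rough rendering of the four ingredients in Haskell with GADTs. This is my own toy sketch with invented names (Expr, Code, exec, and so on), not the paper's Epigram code, but the shape is the same:

    {-# LANGUAGE GADTs, DataKinds, TypeFamilies, TypeOperators #-}

    -- Object-language types and their meanings as metalanguage types.
    data Ty = TNat | TBool

    type family Val (t :: Ty) where
      Val 'TNat  = Int
      Val 'TBool = Bool

    -- Typed expressions: only well-typed terms can be written down.
    data Expr (t :: Ty) where
      NatE  :: Int  -> Expr 'TNat
      BoolE :: Bool -> Expr 'TBool
      Plus  :: Expr 'TNat -> Expr 'TNat -> Expr 'TNat
      If    :: Expr 'TBool -> Expr t -> Expr t -> Expr t

    -- (1) Type-preserving evaluation: typed expressions to typed values.
    eval :: Expr t -> Val t
    eval (NatE n)   = n
    eval (BoolE b)  = b
    eval (Plus a b) = eval a + eval b
    eval (If c t e) = if eval c then eval t else eval e

    -- Stacks indexed by the types of their contents.
    infixr 5 :>
    data Stack (ts :: [Ty]) where
      Empty :: Stack '[]
      (:>)  :: Val t -> Stack ts -> Stack (t ': ts)

    -- (2) Tree-structured code indexed by input and output stack
    -- shapes, so stack underflow is a type error.
    data Code (i :: [Ty]) (o :: [Ty]) where
      PUSH :: Val t -> Code ts (t ': ts)
      ADD  :: Code ('TNat ': 'TNat ': ts) ('TNat ': ts)
      IFTE :: Code ts us -> Code ts us -> Code ('TBool ': ts) us
      Seq  :: Code i m -> Code m o -> Code i o

    -- The compiler: typed expressions to stack-safe intermediate code.
    compile :: Expr t -> Code ts (t ': ts)
    compile (NatE n)   = PUSH n
    compile (BoolE b)  = PUSH b
    compile (Plus a b) = Seq (compile a) (Seq (compile b) ADD)
    compile (If c t e) = Seq (compile c) (IFTE (compile t) (compile e))

    -- (3) The interpreter: compiled code as a big-step stack transition.
    exec :: Code i o -> Stack i -> Stack o
    exec (PUSH v)   s             = v :> s
    exec ADD        (y :> x :> s) = (x + y) :> s
    exec (IFTE t e) (b :> s)      = if b then exec t s else exec e s
    exec (Seq c d)  s             = exec d (exec c s)

    -- (4) Correctness: for all e and s, exec (compile e) s == eval e :> s.
    -- Here it is only a comment; in Epigram it is a theorem, proved by
    -- writing a function whose type states exactly this equation.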

P.S. I found it difficult to classify this paper correctly. Neither functional nor type theory seems the proper category for a dependently typed programming language, which is probably a separate paradigm.


The first half of the paper

The first half of the paper is essentially the standard example of using GADTs to safely evaluate a monomorphically typed language, albeit in a different notation from other papers.
The second half of the paper is a nice walkthrough of how to prove that a stack-based abstract machine is equivalent to an evaluation function.

Granted, there's nothing fundamentally new in this paper (as the authors admit at the end), but I'd say it's still worth reading because it presents some interesting ideas in a simple, elegant, wonderfully written manner.

Key Point

Yeah, in the end, the key point seems to be that dependent types give you more practical leverage from the Curry-Howard Correspondence (which, thanks to this forum, now always sounds in my head like "Dear Ma, Larry boinked me in the eyes again today...") than other systems do. On one hand, that is indeed a "no kidding" point, as anyone using any of the good theorem provers can tell you—and I do intend to review Coq'Art after I've gotten a bit farther through it.

But what's nice to see in this paper is how much Epigram succeeds at seeming like just a more powerful programming language, without the fumbling around with tactics and the "great, here's my proof; how do I get a program out of it?" aspect of even very good "certified program generating" theorem provers such as Coq. I think this is unbelievably important, and honestly I think that even a lot of programmers who have heard of the Curry-Howard Correspondence would be surprised to find that their software in a language like Epigram was simply "correct by construction" some of the time, and that for some fraction of the rest the correctness argument could still be expressed within the language itself. Results like that are why we care about type theory in the first place.

Agreed. I have to say though

Agreed.

I have to say, though, that I believe we won't see languages as powerful as Epigram gain widespread use until there are compilers that can propagate the types down to the assembly level, play nice with separate compilation, and do a respectable amount of interesting optimizations.

FLINT?

The FLINT project and related projects are close to that. I would be more surprised if there wasn't a compiler that fit those requirements somewhere (depending on what "interesting" means). Incidentally, the paper should also have an impact on proof-carrying code and related techniques (i.e. it should be much easier to generate PCC for more sophisticated properties given a more richly typed language).

Oops, I was imprecise in

Oops, I was imprecise in what I was saying.

I meant type-preserving compilation for type systems at least as powerful as those of Epigram or Coq. This turns out to be a technically challenging problem, because one has to find a point in the design space where issues such as termination and the phase distinction (to name but two) are dealt with satisfactorily. I claim that all currently existing approaches fall short.

Augustsson's Cayenne compiler throws away the type information and then just runs the LML compiler with type checking turned off.

Xi's work with ATS and such, while enabling the programmer to prove theorems about the program, is still not quite the right way to go. The explicit separation between proofs and programs (while good from an erasure standpoint) makes learning to use those features that much more challenging; moreover, the type system doesn't make the language more expressive, it is instead meant to capture more precisely the invariants needed to rule out errors that are possible with ML's type system (e.g. tail []).
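
(To make the tail [] point concrete, here is a hypothetical rendering with Haskell GADTs, my own sketch rather than Xi's actual DML/ATS syntax: once lengths appear in list types, tail can demand a non-empty argument.)

    {-# LANGUAGE GADTs, DataKinds, KindSignatures #-}

    data Nat = Z | S Nat

    -- Vectors: lists whose length is tracked in the type.
    data Vec (n :: Nat) a where
      VNil  :: Vec 'Z a
      VCons :: a -> Vec n a -> Vec ('S n) a

    -- vtail accepts only provably non-empty vectors, so the
    -- ML-permitted error "tail []" is ruled out at compile time.
    vtail :: Vec ('S n) a -> Vec n a
    vtail (VCons _ xs) = xs

    -- vtail VNil   -- rejected by the type checker: 'Z is not 'S n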

Epigram's compilation style doesn't play nice with separate compilation, and that's not good.

Coq's extraction to OCaml doesn't count, because of the need to introduce copious uses of magic coercions, and the added dependency on the correctness of the OCaml compiler (though that's probably not an issue in practice).

I do think we'll be seeing compilation technology that sits at the intersection of type-preserving compilation, separate compilation, and powerful type theories such as Epigram's sometime in the next year or so, but as yet it doesn't exist.

Right On!

I continue to wonder why we don't see more research around principal typings, especially given the results in "ML Has Principal Typings". Since someone recently asked: a good starting point for learning more about principal typings is Trevor Jim's "What are principal typings and what are they good for?" Of course, now I need to learn what the relationship between dependent types and principal typings is.

I should point out that the

I should point out that the notion of principal typings (while a property of a type system) has to do with type inference. So I suppose there is a connection in the sense that (partial) type inference for dependent types is in many respects an open problem, but that's about it.

by the way
Has anyone included a link to Barendregt's paper "Lambda Calculi with Types" in the tutorial section? It's a nice document that goes from the simply typed lambda calculus all the way to pure type systems and covers both the Church and Curry presentations of type systems.

Here's the link: ftp://ftp.cs.ru.nl/pub/CompMath.Found/HBK.ps

Specifics

We're apparently both being vague today. :-)

I meant that type systems with principal typings (vs. principal types!) support nice features such as incremental type inference and separate compilation, to your point about separate compilation. But it occurred to me that this might not be a win if we can't come up with a dependent type system that has principal typings! In googling around, it appears that most principal typings research has occurred in the context of intersection types, and there are some rumblings around dependent types and intersection types, but I still have a tremendous amount to learn about both, so I'll shut up now.

Actually, I think there's a

Actually, I think there's a bunch of work on dependent intersection types for Nuprl or MetaPRL... I don't have any links handy, but I do know that the work was done by some folks at Cornell.

Why separate compilation?

Why is separate compilation necessary? Incremental compilation a la Epigram 1 seems to reduce (if not eliminate) the need for it.

The only application I see is for non-free software libraries, which I have little interest in. Is this anything more than a political issue?

I would think that separate

I would think that separate compilation is vital for any massive collaborative project, especially if various parties are working on discrete components that serve as libraries to other components.

If memory serves, Epigram 1 (as described in Edwin Brady's thesis) transforms the code into a bunch of supercombinators. That's not amenable to incremental compilation; that's as whole-program as compilation schemes get!
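
(For anyone unfamiliar with the term: lambda-lifting turns every local function into a closed, top-level "supercombinator" by abstracting over its free variables. A toy illustration, my own example rather than anything from the thesis:)

    -- Before lifting: the lambda captures the free variable k.
    addAll :: Int -> [Int] -> [Int]
    addAll k xs = map (\x -> x + k) xs

    -- After lambda-lifting: k becomes an explicit parameter, and
    -- addK is a closed, top-level supercombinator.
    addK :: Int -> Int -> Int
    addK k x = x + k

    addAll' :: Int -> [Int] -> [Int]
    addAll' k xs = map (addK k) xs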

Modular Development of Certified Program Verifiers

This paper describes a certified program verifier constructed by propagating safety properties that arise from a high-level language's type system down, through a few intermediate stages, to assembly. It seems this work uses Coq as a dependently typed programming language, mostly reserving automation for constructing proof terms.

Supercombinators

Not quite: my thesis describes a prototype back end, since at the time of writing there wasn't a front end to link it to. Epigram 2 will eventually compile programs more or less in the way I described, but with more up-to-date technology than the G-machine.

I don't believe that compiling via supercombinators is inherently unamenable to incremental compilation. What I've always envisaged for Epigram is that as soon as a definition is "complete" (i.e. no more holes, and all external references are complete) it can be compiled to byte code, and beyond, just like you'd incrementally compile function definitions in a traditional language. I'd also expect to be able to run the bytecode at typechecking time, in the style of Grégoire and Leroy. Or did you mean something else by incremental compilation?

By the way, propagating dependent types down to the assembly level and doing (some) interesting optimisations wouldn't be completely compatible, at least not while preserving the full dependent types. Indeed, part of the point of the interesting optimisations is to erase the type information, and to use the extra static information to remove dynamic checks such as list bounds checking.
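
The standard example, sketched here with Haskell GADTs rather than Epigram syntax (the names are mine): an index type Fin n of numbers below n makes vector lookup total, so no dynamic bounds check is needed, and after type checking the index information is exactly what an optimiser wants to erase.

    {-# LANGUAGE GADTs, DataKinds, KindSignatures #-}

    data Nat = Z | S Nat

    data Vec (n :: Nat) a where
      VNil  :: Vec 'Z a
      VCons :: a -> Vec n a -> Vec ('S n) a

    -- Fin n: natural numbers strictly less than n.
    data Fin (n :: Nat) where
      FZ :: Fin ('S n)
      FS :: Fin n -> Fin ('S n)

    -- Total lookup: an in-range index is guaranteed by the type, so
    -- there is no bounds check and no error case to compile in.
    vlookup :: Fin n -> Vec n a -> a
    vlookup FZ     (VCons x _)  = x
    vlookup (FS i) (VCons _ xs) = vlookup i xs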

"incremental type inference and separate compilation"

I'm a neophyte in this discussion (er, as I am with most of what is on LtU), but I think terms have gotten confused along the way in this discussion :-) It wasn't incremental compilation, it was separate compilation: how do you have Epigram slap together disparate modules (some from raw source, some already "compiled") to support component-oriented production?

To continue on that riff, how do you get it to talk to libraries that were not written in Epigram? Does it have a good debugger? What about a profiler? Basically, how does one do "real" software production? (Presumably, one might take issue with how "real" software production is done today and outline how Epigram etc. might offer an importantly different approach.)