A Type-theoretic Foundation for Programming with Higher-order Abstract Syntax and First-class Substitutions

A Type-theoretic Foundation for Programming with Higher-order Abstract Syntax and First-class Substitutions by Brigitte Pientka, appeared in POPL 08.
Higher-order abstract syntax (HOAS) is a simple, powerful technique for implementing object languages, since it directly supports common and tricky routines dealing with variables, such as capture-avoiding substitution and renaming. This is achieved by representing binders in the object-language via binders in the meta-language. However, enriching functional programming languages with direct support for HOAS has been a major challenge, because recursion over HOAS encodings requires one to traverse - abstractions and necessitates programming with open objects.

We present a novel type-theoretic foundation based on contextual modal types which allows us to recursively analyze open terms via higher-order pattern matching. By design, variables occurring in open terms can never escape their scope. Using several examples, we demonstrate that our framework provides a name-safe foundation to operations typically found in nominal systems. In contrast to nominal systems however, we also support capture-avoiding substitution operations and even provide first-class substitutions to the programmer. The main contribution of this paper is a syntax directed bi-directional type system where we distinguish between the data language and the computation language together with the progress and preservation proof for our language.

Its been a while since I posted, but this paper appears that it may be of interest to some members of this community. It looks interesting to me, but I just wish I understood all the terminology. I don't know what "open objects" are, and why they are a problem. I don't understand what HOAS is. I don't even know what binders are. The list goes on. I surely can't be the only person who is interested, but feels that this is just out of their grasp. I bet that I probably could understand these things with a minimum of explanation, given I have experience implementing languages. If anyone is interested in rephrasing the abstract in more basic terms, I would be very appreciative. [Edit: corrected spelling of Brigitte Pientka. My apologies.]

A Verified Compiler for an Impure Functional Language

A Verified Compiler for an Impure Functional Language

We present a verified compiler to an idealized assembly language from a small, untyped functional language with mutable references and exceptions. The compiler is programmed in the Coq proof assistant and has a proof of total correctness with respect to big-step operational semantics for the source and target languages. Compilation is staged and includes standard phases like translation to continuation-passing style and closure conversion, as well as a common subexpression elimination optimization. In this work, our focus has been on discovering and using techniques that make our proofs easy to engineer and maintain. While most programming language work with proof assistants uses very manual proof styles, all of our proofs are implemented as adaptive programs in Coq’s tactic language, making it possible to reuse proofs unchanged as new language features are added.

In this paper, we focus especially on phases of compilation that rearrange the structure of syntax with nested variable binders. That aspect has been a key challenge area in past compiler verification projects, with much more effort expended in the statement and proof of binder-related lemmas than is found in standard pencil-and-paper proofs. We show how to exploit the representation technique of parametric higher-order abstract syntax to avoid the need to prove any of the usual lemmas about binder manipulation, often leading to proofs that are actually shorter than their pencil-and-paper analogues. Our strategy is based on a new approach to encoding operational semantics which delegates all concerns about substitution to the meta language, without using features incompatible with general-purpose type theories like Coq’s logic.

The latest from Adam Chlipala. Yet another evolutionary step for Lambda Tamer. Between this and Ynot the Coq/certified compiler story seems to be getting more impressive nearly daily.



LNgen generates locally nameless infrastructure for the Coq proof assistant from Ott-like specifications. Its output is based on the locally nameless style advocated in Engineering Formal Metatheory and includes all of the "infrastructure" lemmas associated with that style. Compared to Ott's locally nameless backend, LNgen favors generating a large collection of infrastructure lemmas over handling complex binding specifications and methods of defining syntax and relations.

There are really three stories here:

  1. Coq 8.2 shipped a while ago.
  2. Ott, a tool for PLT semantics work, has acquired a backend in support of the increasingly-popular "locally nameless" representation of binding structure in mechanized programming language metatheory.
  3. LNGen is another tool, using a subset of Ott syntax, that takes a slightly different approach from Ott's new backend to addressing the same issues.

From the U. Penn folks who brought us the Coq tutorial at POPL '08.

Achieving Security Despite Compromise Using Zero-Knowledge

Achieving Security Despite Compromise Using Zero-Knowledge

One of the important challenges when designing and analyzing cryptographic protocols is the enforcement of security properties in the presence of compromised participants. This paper presents a general technique for strengthening cryptographic protocols in order to satisfy authorization policies despite participant compromise. The central idea is to automatically transform the original cryptographic protocols by adding non-interactive zero-knowledge proofs. Each participant proves that the messages sent to the other participants are generated
in accordance to the protocol. The zero-knowledge proofs are forwarded to ensure the correct behavior of all participants involved in the protocol, without revealing any secret data. We use an enhanced type system for zero-knowledge to verify that the transformed protocols conform to their authorization policy even if some participants are compromised. Finally, we developed a tool that automatically generates ML implementations of protocols based on zero-knowledge proofs. The protocol transformation, the verification, and the generation of protocol implementations are fully automated.

This is the follow-up to this story. The prior work did not account for compromised participants. This work does.

I continue to be excited about the prospect of this previous story's work being applied to the type system described in this story, possibly resulting in an awesome new language for developing secure software.

Branching Time vs. Linear Time: Semantical Perspective

Sumit Nain and Moshe Vardi, Branching Time vs. Linear Time: Semantical Perspective, invited ATVA'07 paper.

...this paper puts forward an, admittedly provocative, thesis, which is that process-equivalence theory allowed itself to wander in the “wilderness” for lack of accepted guiding principles. The obvious definition of contextual equivalence was not scrupulously adhered to, and the underspecificity of the formalisms proposed led to too many interpretations of equivalence.

In revisiting the notion of process equivalence, which is a fairly central part of concurrency theory, Nain and Vardi end up arguing in favor of a purely trace-based notion of equivalence and the use of linear-time logics. This in turn leads to a rejection of bisimulation as a tool for establishing process equivalences:

The gist of our argument is that branching-time-based notions of process equivalence are not reasonable notions of process equivalence, as they distinguish between processes that are not contextually distinguishable. In contrast, the linear-time view does yield an appropriate notion of contextual equivalence.

In spite of its mathematical elegance and ubiquity in logic, bisimulation is not a reasonable notion of process equivalence, as it makes distinctions that cannot be observed.

They take pains to point out that they are not claiming that bisimulation or CTL should be abandoned or are not useful. Rather their emphasis is on the fact that bisimulation is not a contextual equivalence and is therefore not appropriate for establishing equivalence between (for example) a specification and its implementation. As they say in the conclusion of the paper:

While one may not realistically expect a single paper to overwrite about 30 years of research, a more modest hope would be for this paper to stimulate a lively discussion on the basic principles of process-equivalence theory.

Semantics of Memory Management for Polymorphic Languages

In Semantics of Memory Management for Polymorphic Languages (1997) Greg Morrisett and Robert Harper

...present a static and dynamic semantics for an abstract machine that evaluates expressions of a polymorphic programming language. Unlike traditional semantics, our abstract machine exposes many important issues of memory management, such as value sharing and control representation. We prove the soundness of the static semantics with respect to the dynamic semantics using traditional techniques. We then show how these same techniques may be used to establish the soundness of various memory management strategies, including type-based, tag-free garbage collection; tail-call elimination; and environment strengthening.

This should keep the formal semantics LtUers happy for a little while. But is all the machinery necessary? Is there an easier way to prove that garbage can be thrown out?

Denotational design with type class morphisms

Denotational design with type class morphisms. Conal Elliott.

Type classes provide a mechanism for varied implementations of standard interfaces. Many of these interfaces are founded in mathematical tradition and so have regularity not only of types but also of properties (laws) that must hold. Types and properties give strong guidance to the library implementor, while leaving freedom as well. Some of the remaining freedom is in how the implementation works, and some is in what it accomplishes.

To give additional guidance to the what, without impinging on the how, this paper proposes a principle of type class morphisms (TCMs), which further refines the compositional style of denotational semantics. The TCM idea is simply that the instance’s meaning is the meaning’s instance. This principle determines the meaning of each type class instance, and hence defines correctness of implementation. In some cases, it also provides a systematic guide to implementation, and in some cases, valuable design feedback.

The paper is illustrated with several examples of type, meanings, and morphisms.

To continue in our new all-Conal format... This paper brings together a bunch of things that Conal's been talking about lately, and "algebra of programming" fans will probably like his approach.

(I have a hunch that what he calls a "type class morphism" could be described using standard categorical jargon, but I haven't given it much thought. Suggestions?)


Luke Palmer and Nick Szabo can shoot me for this if they want, but I think this is warranted, and I want to connect a couple of dots as well. Luke is one of a number of computer scientists, with Conal Elliott probably being the best known, who have devoted quite a bit of attention to Functional Reactive Programming, or FRP. FRP has been discussed on LtU off and on over the years, but, unusually for LtU IMHO, does not seem to have gotten the traction that some other similarly abstruse subjects have.

In parallel, LtU has had a couple of interesting threads about Second Life's economy, smart contracts, usage control, denial of service, technical vs. legal remedies, and the like. I would particularly like to call attention to this post by Nick Szabo, in which he discusses a contract language that he designed:

Designing the contract language radically changed my idea of what program flow and instruction pointers can be. Its successor, a general-purpose programming language, may thereby make event-oriented programming and concurrency far easier. The language is targeted at GUI programming as well as smart contracts, real-time, and workflow programming. My answer to the problems of concurrency and event handling is to make the ordering of instructions the syntactic and semantic core of the language. The order of execution and event handlers are the easiest things to express in the language rather than kludged add-ons to a procedural or functional language.

In recent private correspondence, Nick commented that he'd determined that he was reinventing synchronous programming à la Esterel, and mentioned "Reactive" programming.


To make a potentially long entry somewhat shorter, Luke is working on a new language, Dana, which appears to have grown out of some frustration with existing FRP systems, including Conal Elliot's Reactive, currently perhaps the lynchpin of FRP research. Luke's motivating kickoff post for the Dana project can be found here, and there are several follow-up posts, including links to experimental source code repositories. Of particularly motivating interest, IMHO, is this post, where Luke discusses FRP's interaction with garbage collection succinctly but nevertheless in some depth. Luke's most recent post makes the connection from Dana, which Luke has determined needs to have a dependently-typed core, to Illative Combinatory Logic, explicit, and offers a ~100 line type checker for the core.

I find this very exciting, as I believe strongly in the project of being able to express computation centered on time, in the sense of Nick's contract language, in easy and safe ways extremely compelling. I've intuited for some time now that FRP represents a real breakthrough in moving us past the Von Neumann runtime paradigm in fundamental ways, and between Conal Elliott's and Luke's work (and no doubt that of others), it seems to me that my sense of this may be borne out, with Nick's contract language, or something like it, as an initial application realm.

So I wanted to call attention to Luke's work, and by extension recapitulate Conal's and Nick's, both for the PLT aspects that Luke's clearly represents, but also as a challenge to the community to assist in the realization of Nick's design efforts, if at all possible.

A Machine-Checked Model for a Java-Like Language, Virtual Machine, and Compiler

G. Klein and T. Nipkow, A Machine-Checked Model for a Java-Like Language, Virtual Machine, and Compiler, ACM TOPLAS, vol. 28, no. 4, 2006.

We introduce Jinja, a Java-like programming language with a formal semantics designed to exhibit core features of the Java language architecture. Jinja is a compromise between realism of the language and tractability and clarity of the formal semantics. The following aspects are formalised: a big and a small step operational semantics for Jinja and a proof of their equivalence; a type system and a definite initialization analysis; a type safety proof of the small step semantics; a virtual machine (JVM), its operational semantics and its type system; a type safety proof for the JVM; a bytecode verifier, i.e. dataflow analyzer for the JVM; a correctness proof of the bytecode verifiers w.r.t. the type system; a compiler and a proof that it preseves semantics and well-typedness.

The emphasis of this work is not on particular language features but on providing a unified model of the source language, the virtual machine and the compiler. The whole development has been carried out in the theorem prover Isabelle/HOL.

This is a fairly lengthy article (clocking in at 77 pages), in part because it presents a wealth of technical detail. The authors state that the aim of the article "is to demonstrate the state-of-the-art in machine-checked language definitions."

For those who are curious, the Isabelle theories are available in the Archive of Formal Proofs.

Parameterized Notions of Computation

Parameterized Notions of Computation, Robert Atkey, JFP 2008.

Moggi's Computational Monads and Power et al's equivalent notion of Freyd category have captured a large range of computational effects present in programming languages. Examples include non-termination, non-determinism, exceptions, continuations, side-effects and input/output. We present generalisations of both computational monads and Freyd categories, which we call parameterised monads and parameterised Freyd categories, that also capture computational effects with parameters.

Examples of such are composable continuations, side-effects where the type of the state varies and input/output where the range of inputs and outputs varies. By also considering structured parameterisation, we extend the range of effects to cover separated side-effects and multiple independent streams of I/O. We also present two typed λ-calculi that soundly and completely model our categorical definitions — with and without symmetric monoidal parameterisation — and act as prototypical languages with parameterised effects.

Once you've programmed with monads for a while, it's pretty common to start defining parameterized families of monads -- e.g., we might define a family of type constructors for IO, in which the program type additionally tracks which files the computation reads and writes from. This is a very convenient programming pattern, but the theory of it is honestly a little sketchy: on what basis do we conclude that the indices we define actually track what we intend them to? And furthermore, why can we believe that (say) the monadic equational laws still apply? That's the question Atkey lays out a nice solution to. He gives a nice categorical semantics for indexed, effectful computations, and then cooks up lambda calculi whose equational theory corresponds to the equations his semantics justifies.

The application to delimited continuations is quite nice, and the type theories can also give a little insight into the basics of how stuff like Hoare Type Theory works (which uses parameterized monads, with a very sophisticated language of parameters).

On a slightly tangential note, this also raises in my mind a methodological point. Over the last n years, we've seen many people identify certain type constructors, whose usage is pervasive, and greatly simplified with some syntactic extensions -- monads, comonads, applicative functors, arrows, and so on. It's incredible to suggest that we have exhausted the list of interesting types, and so together they constitute a good argument for some kind of language extension mechanism, such as macros. However, all these examples also raise the bar for when a macro is a good idea, because what makes them compelling is precisely that the right syntax yields an interesting and pretty equational theory in the extended language.

XML feed