How OCaml type checker works -- or what polymorphism and garbage collection have in common
There is more to Hindley-Milner type inference than the Algorithm W. In 1988, Didier Rémy was looking to speed up the type inference in Caml and discovered an elegant method of type generalization. Not only it is fast, avoiding the scan of the type environment. It smoothly extends to catching of locally-declared types about to escape, to type-checking of universals and existentials, and to implementing MLF.
Alas, both the algorithm and its implementation in the OCaml type checker are little known and little documented. This page is to explain and popularize Rémy's algorithm, and to decipher a part of the OCaml type checker. The page also aims to preserve the history of Rémy's algorithm.
The attraction of the algorithm is its insight into type generalization as dependency tracking -- the same sort of tracking used in automated memory management such as regions and generational garbage collection. Generalization can be viewed as finding dominators in the type-annotated abstract syntax tree with edges for shared types. Fluet and Morrisett's type system for regions and MetaOCaml environment classifiers use the generalization of a type variable as a criterion of region containment. Uncannily, Rémy's algorithm views the region containment as a test if a type variable is generalizable.
As usual with Oleg, there's a lot going on here. Personally, I see parallels with "lambda with letrec" and "call-by-push-value," although making the connection with the latter takes some squinting through some of Levy's work other than his CBPV thesis. Study this to understand OCaml type inference and/or MLF, or for insights into region typing, or, as the title suggests, for suggestive analogies between polymorphism and garbage collection.
Visi.io comes from David Pollak and aims at revolutionizing building tablet apps, but the main attraction now seems to be in exploring the way data flow and cloud computing can be integrated. The screencast is somewhat underwhelming but at least convinces me that there is a working prototype (I haven't looked further than the website as yet). The vision document has some nice ideas. Visi.io came up recently in the discussion of the future of spreadsheets.
The Milner Symposium 2012 was held in Edinburgh this April in memory of the late Robin Milner.
The Milner Symposium is a celebration of the life and work of one of the world's greatest computer scientists, Robin Milner. The symposium will feature leading researchers whose work is inspired by Robin Milner.
The programme consisted of academic talks by colleagues and past students. The talks and slides are available online.
I particularly liked the interleaving of the personal and human narrative underlying the scientific journey. A particularly good example is Joachim Parrow's talk on the origins of the pi calculus. Of particular interest to LtU members is the panel on the future of functional programming languages, consisting of Phil Wadler, Xavier Leroy, David MacQueen, Martin Odersky, Simon Peyton-Jones, and Don Syme.
If you are in Denmark in September you should attend. If not, raise a glass to salute what is now abundantly clear: FPLs are conquering the ("real") world.
Koka extends the idea of using row polymorphism to encode an effect system and the relations between them. Daan Leijen is the primary researcher behind it and his research was featured previously on LtU, mainly on row polymorphism in the Morrow Language.
So far there's no paper available on the language design, just the slides from a Lang.Next talk (which doesn't seem to have video available at Channel 9), but it's in the program for HOPE 2012.
The Brown PLT Blog, 2012-06-04
Testing is not enough. Despite our work, other researchers found a missing case in λJS. Today, we're introducing Mechanized λJS, which comes with a machine-checked proof of correctness, using the Coq proof assistant.
More work on mechanizing the actual, implemented semantics of a real language, rather than a toy.
How to Make Ad Hoc Proof Automation Less Ad Hoc
Georges Gonthier, Beta Ziliani, Aleksandar Nanevski, and Derek Dreyer, to appear in ICFP 2011
Most interactive theorem provers provide support for some form of user-customizable proof automation. In a number of popular systems, such as Coq and Isabelle, this automation is achieved primarily through tactics, which are programmed in a separate language from that of the prover’s base logic. While tactics are clearly useful in practice, they can be difficult to maintain and compose because, unlike lemmas, their behavior cannot be specified within the expressive type system of the prover itself.
We propose a novel approach to proof automation in Coq that allows the user to specify the behavior of custom automated routines in terms of Coq’s own type system. Our approach involves a sophisticated application of Coq’s canonical structures, which generalize Haskell type classes and facilitate a flexible style of dependently-typed logic programming. Specifically, just as Haskell type classes are used to infer the canonical implementation of an overloaded term at a given type, canonical structures can be used to infer the canonical proof of an overloaded lemma for a given instantiation of its parameters. We present a series of design patterns for canonical structure programming that enable one to carefully and predictably coax Coq’s type inference engine into triggering the execution of user-supplied algorithms during unification, and we illustrate these patterns through several realistic examples drawn from Hoare Type Theory. We assume no prior knowledge of Coq and describe the relevant aspects of Coq type inference from first principles.
If you've ever toyed with Coq but run into the difficulties that many encounter in trying to construct robust, comprehensible proof scripts using tactics, which manipulate the proof state and can leave you with the "ground" of the proof rather than the "figure," if you will, in addition to being fragile in the face of change, you may wish to give this a read. It frankly never would have occurred to me to try to turn Ltac scripts into lemmas at all. This is much more appealing than most other approaches to the subject I've seen.
Validating LR(1) parsers
An LR(1) parser is a finite-state automaton, equipped with a stack, which uses a combination of its current state and one lookahead symbol in order to determine which action to perform next. We present a validator which, when applied to a context-free grammar G and an automaton A, checks that A and G agree. Validating the parser provides the correctness guarantees required by verified compilers and other high-assurance software that involves parsing. The validation process is independent of which technique was used to construct A. The validator is implemented and proved correct using the Coq proof assistant. As an application, we build a formally-verified parser for the C99 language.
I've always been somewhat frustrated, while studying verified compiler technology, that the scope of the effort has generally been limited to ensuring that the AST and the generated code mean the same thing, as important as that obviously is. Not enough attention has been paid, IMHO, to other compiler phases. Parsing: The Solved Problem That Isn't does a good job illuminating some of the conceptual issues that arise in attempting to take parsers seriously as functions that we would like to compose etc. while maintaining some set of properties that hold of the individuals. Perhaps this work can shed some light on possible solutions to some of those issues, in addition to being worthwhile in its own right. Note the pleasing presence of an actual implementation that's been used on the parser of a real-world language, C99.
Programming with Algebraic Effects and Handlers. Andrej Bauer and Matija Pretnar, arXiv preprint.
Eff is a programming language based on the algebraic approach to computational effects, in which effects are viewed as algebraic operations and effect handlers as homomorphisms from free algebras. Eff supports first-class effects and handlers through which we may easily define new computational effects, seamlessly combine existing ones, and handle them in novel ways. We give a denotational semantics of eff and discuss a prototype implementation based on it. Through examples we demonstrate how the standard effects are treated in eff, and how eff supports programming techniques that use various forms of delimited continuations, such as backtracking, breadth-first search, selection functionals, cooperative multi-threading, and others.
Eff has been discussed here before, and it's nice to see some more progress and a much more complete introduction. The paper is intended for a general audience (well, a general audience of PL enthusiasts). It's quite clear and contains a wealth of examples.
A blog post about Call-By-Push-Value by Rob Simmons: What does focusing tell us about language design?
I think that one of the key observations of focusing/CBPV is that programs are dealing with two different things - data and computation - and that we tend to get the most tripped up when we confuse the two.
- Data is classified by data types (a.k.a. positive types). Data is defined by how it is constructed, and the way you use data is by pattern-matching against it.
- Computation is classified by computation types (a.k.a. negative types). Computations are defined their eliminations - that is, by how they respond to signals/messages/pokes/arguments.
There are two things I want to talk about, and they're both recursive types: call-by-push-value has positive recursive types (which have the feel of inductive types and/or algebras and/or what we're used to as datatypes in functional languages) and negative recursive types (which have the feel of recursive, lazy records and/or "codata" whatever that is and/or coalgebras and/or what William Cook calls objects). Both positive and negative recursive types are treated by Paul Blain Levy in his thesis (section 5.3.2) and in the Call-By-Push Value book (section 4.2.2).
In particular, I want to claim that Call-By-Push-Value and focusing suggest two fundamental features that should be, and generally aren't (at least simultaneously) in modern programming languages:
- Support for structured data with rich case analysis facilities (up to and beyond what are called views)
- Support for recursive records and negative recursive types.
Previously on Rob's blog: Embracing and extending the Levy language; on LtU: Call by push-value, Levy: a Toy Call-by-Push-Value Language.
And let me also repeat CBPV's slogan, which is one of the finest in PL advocacy: Once the fine structure has been exposed, why ignore it?