How OCaml type checker works -- or what polymorphism and garbage collection have in common

There is more to Hindley-Milner type inference than Algorithm W. In 1988, Didier Rémy, looking to speed up type inference in Caml, discovered an elegant method of type generalization. Not only is it fast, avoiding the scan of the type environment; it also smoothly extends to catching locally declared types about to escape, to type-checking universals and existentials, and to implementing MLF.
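To make the level idea concrete, here is a minimal sketch on a toy type language, in the spirit of the algorithm described above. The names (`typ`, `Unbound`, `generic_level`, the helpers) are illustrative, not the actual OCaml compiler internals: each `let` right-hand side is checked at a deeper level, unification lowers the levels of the variables it touches, and generalization then quantifies exactly the variables whose level is deeper than the current one, with no scan of the environment.

```ocaml
(* Toy types: mutable variables carry a binding level. *)
type typ =
  | TVar of tv ref
  | TArrow of typ * typ
and tv =
  | Unbound of string * int       (* variable name and its level *)
  | Link of typ                   (* variable instantiated by unification *)

let generic_level = max_int       (* marks variables quantified by generalize *)
let current_level = ref 1         (* bumped when entering a let right-hand side *)

let fresh_var =
  let counter = ref 0 in
  fun () ->
    incr counter;
    TVar (ref (Unbound ("'a" ^ string_of_int !counter, !current_level)))

(* Unifying a variable with a type lowers the levels of the type's
   variables to the variable's own level: this is the dependency
   tracking that makes generalization a mere level comparison. *)
let rec update_level lvl = function
  | TVar ({contents = Unbound (n, l)} as r) ->
      if l > lvl then r := Unbound (n, lvl)
  | TVar {contents = Link t} -> update_level lvl t
  | TArrow (a, b) -> update_level lvl a; update_level lvl b

let rec unify t1 t2 =
  match (t1, t2) with
  | TVar {contents = Link t1}, t2 | t1, TVar {contents = Link t2} ->
      unify t1 t2
  | TVar ({contents = Unbound (_, lvl)} as r), t
  | t, TVar ({contents = Unbound (_, lvl)} as r) ->
      if t1 != t2 then begin
        update_level lvl t;       (* no occurs check in this sketch *)
        r := Link t
      end
  | TArrow (a1, b1), TArrow (a2, b2) ->
      unify a1 a2; unify b1 b2

(* Generalize without scanning the environment: a variable is
   quantifiable exactly when its level is deeper than the current one. *)
let rec generalize = function
  | TVar ({contents = Unbound (n, l)} as r) when l > !current_level ->
      r := Unbound (n, generic_level)
  | TVar {contents = Link t} -> generalize t
  | TVar {contents = Unbound _} -> ()
  | TArrow (a, b) -> generalize a; generalize b
```

For `let id = fun x -> x in ...` one would bump `current_level`, infer `TArrow (a, a)` for a fresh `a`, drop the level back, and `generalize`: since `a` was created at the deeper level and never unified with anything shallower, its level check succeeds and it is quantified.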

Alas, both the algorithm and its implementation in the OCaml type checker are little known and little documented. This page is to explain and popularize Rémy's algorithm, and to decipher a part of the OCaml type checker. The page also aims to preserve the history of Rémy's algorithm.

The attraction of the algorithm is its insight into type generalization as dependency tracking -- the same sort of tracking used in automated memory management such as regions and generational garbage collection. Generalization can be viewed as finding dominators in the type-annotated abstract syntax tree with edges for shared types. Fluet and Morrisett's type system for regions and MetaOCaml environment classifiers use the generalization of a type variable as a criterion of region containment. Uncannily, Rémy's algorithm views the region containment as a test if a type variable is generalizable.

As usual with Oleg, there's a lot going on here. Personally, I see parallels with "lambda with letrec" and "call-by-push-value," although making the connection with the latter takes some squinting through some of Levy's work other than his CBPV thesis. Study this to understand OCaml type inference and/or MLF, or for insights into region typing, or, as the title suggests, for suggestive analogies between polymorphism and garbage collection.

A great document

I just deleted my corresponding forum entry. Below are my ramblings, which were superseded by Paul's submission.

I think there are interesting reasons why principal type inference is not as popular today as it was in past decades (for example, Rémy's further work on MLF has been mostly ignored by the research community at large, with the notable exception of Daan Leijen): the type languages we are now studying are so far past the original ML "sweet spot" between expressivity and simplicity that nobody expects principal types anymore. People are convinced from the start that type inference will be an unsolvable mess for their new language, and tend to go for local, bidirectional approaches that are simpler to develop but can have less robust properties.

There is also the issue of what happens when you move to dependent type theories. There has been interesting work on type inference in dependent settings (e.g. Andreas Abel and Brigitte Pientka in 2011, or Conor McBride and Adam Gundry in 2012), but I'm not sure how we could distinguish the different layers of the problem and keep robust principality guarantees for the ML-ish or System-F-ish portions of programs.