The Simple Essence of Algebraic Subtyping: Principal Type Inference with Subtyping Made Easy

The Simple Essence of Algebraic Subtyping: Principal Type Inference with Subtyping Made Easy, Lionel Parreaux, ICFP 2020.

MLsub extends traditional Hindley-Milner type inference with subtyping while preserving compact principal types, an exciting new development. However, its specification in terms of biunification is difficult to understand, relying on the new concepts of bisubstitution and polar types, and making use of advanced notions from abstract algebra. In this paper, we show that these are in fact not essential to understanding the mechanisms at play in MLsub. We propose an alternative algorithm called Simple-sub, which can be implemented efficiently in under 500 lines of code (including parsing, simplification, and pretty-printing), looks more familiar, and is easier to understand.

There's also an introductory blog post and an online demo.

Stephen Dolan's Algebraic Subtyping (discussion) unexpectedly provided a solution to the problem of combining type inference and subtyping, but used somewhat heavy and unusual machinery. Now Lionel Parreaux shows that the system can be implemented in a very straightforward and pleasing way. Here's to hoping that it makes it into real languages!

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Wow

There's a lot to assimilate here. I tackle a bit more of it every day or two, but I can see it's going to be some time before I grok it in fullness. And I do hope to get there, because it really is quite cool. I'm oddly reminded of the Church-Rosser theorem, which was first published in 1936 but the first simple proofs didn't become publicly available for about another three and a half decades. And I say this (not to misrepresent my position) as someone who doubts the advisability of advanced typing in programming languages, and has even stronger doubts about type inference in particular. Even at the height of my enthusiasm-for-powerful-typing phase, I'd never quite felt motivated to study Hindley-Milner in depth, so I'm going to have to play catch-up, here; but it seems well worth it, and you've provided plenty of hooks for the particular work (including the earlier LtU thread, which felt awfully esoteric at the time). And all this clearly relates to the notions of covariance and contravariance, which I've been probing lately as part of my ongoing effort to get a grip on tensors but somehow hadn't connected back to typing. Thanks for this!

Tensors and typing

Not related to algebraic subtyping at all, but, since you expressed (both here and in other places) an interest in understanding tensors, have you tried studying vector bundles? (Tensors, or rather tensor fields, are then “sections” of suitable tensor bundles.) Already in the one-dimensional case, line bundles have interesting geometric insights to offer, e.g., whether (and how) a variety can be immersed in projective space.

As for how this relates to typing, well, modern geometry has a very typeful flavor, in that you have very intricately structured objects (of more than one kind, of course), like “sheaves of modules” and “chain complexes of modules” (here “module” means “module over a ring”, which has nothing to do with software modules), and you want to relate them using operations that only make sense if you are manipulating objects of the right kind.

It is elegant, but is it useful?

Algebaric subtyping is the first mathematically elegant extension of Hindley-Milner in a rather long while. (Most of the extensions piled on top of the Haskell and OCaml type systems are, of course, ad hoc and not elegant at all.) However, it still has to prove its worth, in the sense of “Can you give an example of anything actually useful that I can do with algebraic subtypes that I cannot do with the existing ML type system?”

Here is something that would be useful to me. First some context: In ML, the combination of algebraic data types, parametric polymorphism and the module system can be used to implement purely functional data structures in such a manner that

  • Every module enforces a single invariant, or a single small set of closely related invariants.
  • No module can “accidentally” break another module's invariants. It is ruled out by type checking.
  • No runtime tests are performed that should never fail if your code has no bugs. That is, when branching, all branches are legitimately reachable. (Dynamic type tests, of course, count as branching.)

I have tried to provide similar modular decompositions of imperative data structures, but utterly failed. The reason why I failed is that the existing ML type system is incapable of expressing the fact that the allowable operations on an imperative data structure evolve over time, e.g., testing whether a stack is empty enables you to pop from it in the branch where it is not. Unfortunately, algebraic subtyping is not likely to help with this issue.