Foundations for Structured Programming with GADTs, Patricia Johann and Neil Ghani. POPL 2008.
GADTs are at the cutting edge of functional programming and become more widely used every day. Nevertheless, the semantic foundations underlying GADTs are not well understood. In this paper we solve this problem by showing that the standard theory of datatypes as carriers of initial algebras of functors can be extended from algebraic and nested data types to GADTs. We then use this observation to derive an initial algebra semantics for GADTs, thus ensuring that all of the accumulated knowledge about initial algebras can be brought to bear on them. Next, we use our initial algebra semantics for GADTs to derive expressive and principled tools -- analogous to the well-known and widely-used ones for algebraic and nested data types -- for reasoning about, programming with, and improving the performance of programs involving, GADTs; we christen such a collection of tools for a GADT an initial algebra package. Along the way, we give a constructive demonstration that every GADT can be reduced to one which uses only the equality GADT and existential quantification. Although other such reductions exist in the literature, ours is entirely local, is independent of any particular syntactic presentation of GADTs, and can be implemented in the host language, rather than existing solely as a metatheoretical artifact. The main technical ideas underlying our approach are (i) to modify the notion of a higher-order functor so that GADTs can be seen as carriers of initial algebras of higher-order functors, and (ii) to use left Kan extensions to trade arbitrary GADTs for simpler-but-equivalent ones for which initial algebra semantics can be derived.
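To make the reduction claim in the abstract concrete, here is a minimal Haskell sketch of the familiar encoding of a GADT using only an equality GADT plus existential quantification. It is an illustration of the general idea, not the paper's own construction (which goes through left Kan extensions), and every name in it (`Equal`, `Term`, `Term'`, `toPrim`, `eval'`) is made up for this example.

```haskell
{-# LANGUAGE GADTs, ExistentialQuantification #-}

-- The equality GADT: a value of type 'Equal a b' is evidence
-- that the types a and b are the same.
data Equal a b where
  Refl :: Equal a a

-- A small GADT written in the usual style (hypothetical example type).
data Term a where
  IntLit :: Int -> Term Int
  IsZero :: Term Int -> Term Bool
  Pair   :: Term b -> Term c -> Term (b, c)

-- The same datatype with an unconstrained return type. Each
-- constructor stores an equality proof instead of refining its
-- index, and the component types of a pair become existentials.
data Term' a
  = IntLit' (Equal a Int) Int
  | IsZero' (Equal a Bool) (Term' Int)
  | forall b c. Pair' (Equal a (b, c)) (Term' b) (Term' c)

-- Translate the GADT into the encoded form.
toPrim :: Term a -> Term' a
toPrim (IntLit n) = IntLit' Refl n
toPrim (IsZero t) = IsZero' Refl (toPrim t)
toPrim (Pair x y) = Pair' Refl (toPrim x) (toPrim y)

-- Matching on Refl recovers the type refinement that the GADT
-- constructors would have provided directly.
eval' :: Term' a -> a
eval' (IntLit' Refl n) = n
eval' (IsZero' Refl t) = eval' t == 0
eval' (Pair' Refl x y) = (eval' x, eval' y)
```

The point of the sketch is that `Term'` is an ordinary datatype apart from its use of `Equal` and `forall`, and the encoding lives entirely in the host language.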
I found this to be a really interesting paper, because the work had the opposite effect on my opinions from the one I expected. Usually, when I see a paper give a really clean semantics to an idea, I end up convinced that the idea is actually a good one. However, Johann and Ghani gave a really elegant treatment of GADTs, which had the effect of making me think that perhaps Haskell-style GADTs should not be added as-is to new programming languages!
This is because of the way they give semantics to GADTs as functors |C| -> C, where C is the usual semantic category (e.g., CPPO) and |C| is the discrete category formed from C that retains the objects and only the identity morphisms. If I understand rightly, this illustrates that the indices in a GADT are only being used as index terms, and their structure as types is going unused. So it's really kind of a pun, arising from the accident that Haskell has * as its only base kind. This makes me think that future languages should include indices, but that these index domains should not coincide with kind type. That is, users should be able to define new kinds distinct from *, and use those as indexes to types, and these are the datatypes which should get a semantics in the style of this paper.
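As a rough sketch of what indexing by a user-defined kind could look like, here is a small example in present-day GHC Haskell, assuming the `DataKinds` extension (which is newer than the paper). The names `Nat`, `Vec`, `vhead`, and `vtoList` are illustrative and do not come from the paper. With only `*` available, the index would instead have to be built from empty types such as `data Z` and `data S n`, and nothing would rule out a meaningless index like `Vec Bool a`.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

import Data.Kind (Type)

-- A user-defined index domain. With DataKinds, GHC promotes this
-- declaration to a kind Nat whose inhabitants are 'Z and 'S n.
data Nat = Z | S Nat

-- Vec is indexed by Nat rather than by Type, so an ill-formed
-- index such as 'Vec Bool a' is rejected by the kind checker.
data Vec (n :: Nat) (a :: Type) where
  VNil  :: Vec 'Z a
  VCons :: a -> Vec n a -> Vec ('S n) a

-- The index rules out the empty vector, so this match is total.
vhead :: Vec ('S n) a -> a
vhead (VCons x _) = x

-- Forget the index and return an ordinary list.
vtoList :: Vec n a -> [a]
vtoList VNil         = []
vtoList (VCons x xs) = x : vtoList xs
```

Here the index `n` is never treated as a type of values: it only selects which constructors apply, which matches the reading of indices as objects of a discrete category.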