Foundations for Structured Programming with GADTs

Foundations for Structured Programming with GADTs, Patricia Johann and Neil Ghani. POPL 2008.

GADTs are at the cutting edge of functional programming and become more widely used every day. Nevertheless, the semantic foundations underlying GADTs are not well understood. In this paper we solve this problem by showing that the standard theory of datatypes as carriers of initial algebras of functors can be extended from algebraic and nested data types to GADTs. We then use this observation to derive an initial algebra semantics for GADTs, thus ensuring that all of the accumulated knowledge about initial algebras can be brought to bear on them. Next, we use our initial algebra semantics for GADTs to derive expressive and principled tools -- analogous to the well-known and widely-used ones for algebraic and nested data types -- for reasoning about, programming with, and improving the performance of programs involving, GADTs; we christen such a collection of tools for a GADT an initial algebra package. Along the way, we give a constructive demonstration that every GADT can be reduced to one which uses only the equality GADT and existential quantification. Although other such reductions exist in the literature, ours is entirely local, is independent of any particular syntactic presentation of GADTs, and can be implemented in the host language, rather than existing solely as a metatheoretical artifact. The main technical ideas underlying our approach are (i) to modify the notion of a higher-order functor so that GADTs can be seen as carriers of initial algebras of higher-order functors, and (ii) to use left Kan extensions to trade arbitrary GADTs for simpler-but-equivalent ones for which initial algebra semantics can be derived.

I found this to be a really interesting paper, because the work had an opposite effect on my opinions from what I expected. Usually, when I see a paper give a really clean semantics to an idea, I end up convinced that this is actually a good idea. However, Johann and Ghani gave a really elegant treatment of GADTs, which had the effect of making me think that perhaps Haskell-style GADTs should not be added as is to new programming languages!

This is because of the way they give semantics to GADTs as functors |C| -> C, where C is the usual semantic category (eg, CPPO) and |C| is the discrete category formed from C that retains the objects and only the identity morphisms. If I understand rightly, this illustrates that the indices in a GADT are only being used as index terms, and their structure as types is going unused. So it's really kind of a pun, arising from the accident that Haskell has * as its only base kind. This makes me think that future languages should include indices, but that these index domains should not coincide with kind type. That is, users should be able to define new kinds distinct from *, and use those as indexes to types, and these are the datatypes which should get a semantics in the style of this paper.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

And a fine talk too

I really enjoyed Neil's talk at POPL 08. While I did not understand all of it, his main point (which worked with me) is that we should pay more attention to one more tool from category theory: Kan extensions. We already know and love (strong) functors, (co)monads, adjunctions and the Yoneda lemma, but there is yet another tool we should be using! I still don't quite "get" it, but his talk has succeeded in getting me to re-re-read the fundamental ideas around Kan extensions.

You make a very interesting point about kinds and why GADTs, although given a very elegant treatment in that paper, are perhaps not quite 'right' yet. It would indeed seem that if the underlying kind system was richer (yet still with easily decidable inference), we could gain a lot.

But what of the regression?

If you can define new kinds in a language, then someone's gonna want a "top" kind, and dynamic kinding, and so-on. And others will react to "dynamic kinding" by introducing yet another terminology--lets call them "genres" or "varieties", as those are English synonyms for "type" that I haven't yet seen overloaded in the mathematical or CS literature (unlike class, kind, sort, category, flavor, etc.)--to refer to the concept of a kind-of-a-kind. And pretty soon, computer science and the related fields of math and logic will have their own version of the Linnaean taxonomy, but with a countably infinite number of strata, rather than just the 20-30 (judiciously expanded from the original 7) layers used in biology.

Ow, my head hurts just thinking about it. :)

Sounds a lot like Sheard's

Sounds a lot like Sheard's Omega.

varieties

The name "variety" is already in mathematical use. See, for instance, http://en.wikipedia.org/wiki/Algebraic_variety

Hi,

Hi,

We saw your comment about our paper on lambda the ultimate and thought your reaction had a lot of merit to it. Indeed it is one of the conclusions that seem reasonable to draw from our work.

Here are some more ...

1. GADTs are syntactic sugar. The real structures at work are existentially quantified types and the equality GADT.

2. Indexing by a set, rather than a category, as GADTs do means that really this is dependently typed programming faked in Haskell by using kind level proxies for types

3. As a result, Haskell ends up with a curious "logic programming" feel to its type level computation and functional programming feel to its term-level computation.

Hope these thoughts help you in some way,

All the best
Patricia Johann and Neil Ghani

Comments on comments

1. You do show that the equality GADT is quite fundamental. It certainly would be interesting to examine what results we obtain depending on the power of that GADT; intensional equality is extremely weak, while extensional equality is medium-strong in its power. One can imagine types for which we have even stronger notions of equality, and have the compiler take advantage of that too. As an example, Bill Farmer shows the different kinds of equality that occur routinely in first-year calculus.
2. I am perfectly happy to 'fake' dependently typed programming via proxies in ``current'' languages, all the while ravenously absorbing the discoveries made in full-fledged dependently-typed languages [that are not yet mainstream-ready]. What I would really like is to have much more convenient and powerful proxies. I, for one, would be happy to be forced to employ kind-level programs that are trivially total as my proxies!
3. If only the type-level computations corresponded to a more classical "logic programming" or even "relational programming" methodology, that would only be half-bad. I think that most Haskellers feel that having the term-level computations have a "functional programming" feel is rather, er, a feature, shall we say?

Coming back to your first point: do you have a feeling for what the "next fundamental" GADT after the equality GADT is/should be? Ordering perhaps? The next question, naturally, is what would the pullback of those fundamental structure to programming-level syntactic sugar be?

Monads for programming

I hope this is not off topic, but I am curious to know if the paper Monads for Programming Languages is relevant to this issue?

not previously discussed?

I'm surprised this doesn't seem to be have been previously on LtU (?),
since the author writes (for example)

Another lingering question concerning monads is whether they are really the right tool for the job. They seem to have suggested themselves to Moggi originally because he was looking for something with nice mathematical properties, and monads were a known form with nice mathematical properties. Borrowing structures from one field to another like this can be hazardous; just because monads have properties that are helpful in mathematics doesn't mean they're necessarily right for modeling general computation (just as the suitability of traditional OO inheritance for coding 1960s-style simulation software doesn't necessarily make it ideal for general-purpose software engineering).

and right before that

There is no universally applicable technique for combining monads that represent impure phenomena...

[I won't quote more b/c copy & paste doesn't seem to work from this PDF]