How are GADTs useful in practical programming?

GADTs are clearly a hot topic in functional programming research at the moment. Most of the papers focus only on the GADT mechanism itself (how type checking works, etc.). The only example one usually sees is the "typed evaluator".

I am not an expert on this topic, and I'd like to know more about how they would actually be useful in practical programming.

For example, I wonder what a parser would look like if it were impossible to construct "wrong" ASTs. Would type checking then effectively take place during parsing? How would a type error in the parsed program be detected and reported?

What other interesting applications exist? In general, how do GADTs change the programming model?


Eliminates some problems, but not all

For the simple "typed evaluator" example, yes, you cannot define incorrect ASTs. However, this won't be the case in general. Most ASTs are more complicated and contain variable definitions and references (things that, AFAIK, aren't usually checked by leveraging the host's type system). Even for pure expression-evaluator ASTs, complete assurance requires that the host language's type system be able to express the type system of the target language.

GADTs are a mechanism for defining more precise types. They help you make stronger guarantees about an AST's structure, which reduces the number of explicit consistency checks you have to perform in code.
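
For concreteness, the "typed evaluator" everyone keeps referring to looks roughly like this (a minimal sketch, assuming GHC's GADTs extension): the expression type is indexed by the type its evaluation produces, so an ill-typed AST such as Add (BoolLit True) (IntLit 1) is simply rejected by the host type checker.

{-# LANGUAGE GADTs #-}

-- Expr a is an expression that evaluates to a value of type a.
data Expr a where
  IntLit  :: Int  -> Expr Int
  BoolLit :: Bool -> Expr Bool
  Add     :: Expr Int  -> Expr Int -> Expr Int
  If      :: Expr Bool -> Expr a   -> Expr a -> Expr a

-- The evaluator needs no tag checks: the types guarantee the cases line up.
eval :: Expr a -> a
eval (IntLit n)  = n
eval (BoolLit b) = b
eval (Add x y)   = eval x + eval y
eval (If c t e)  = if eval c then eval t else eval e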

I come from an OO background. When I started learning Haskell, I noticed that I couldn't exactly express the types I wanted, though I couldn't put my finger on exactly why. Turns out, Java-style OO with generics provides a level of precision that wasn't available in Haskell without GADTs. The following paper helped me see what GADTs bring to the table by looking at them from an OO perspective: Generalized Algebraic Data Types and Object-Oriented Programming

It's Kind of Hard to Say

To me, probably the most exciting aspect of GADTs is the possibility of adding dynamic security level support to type-based information flow security à la Flow Caml. However, it also seems that GADTs offer many, if not all, of the benefits of dependent types, and I think Tim Sweeney's POPL 06 slides do an excellent job of enumerating a few critical benefits of dependent types. But it's worth pointing out that GADTs are very general, which unfortunately means that you might need to read a lot of the literature to get a good cross-section of application domains, since you're right—everyone does seem to trot out the "typed term evaluator" example.

GADTs and Dependent Types

However, it also seems that GADTs offer many, if not all, of the benefits of dependent types...

That's interesting. One of the main uses for dependent types on Tim Sweeney's slides is to do array bounds checking. Can GADTs encode this sort of information?

No GADTs

...but you might like Eliminating Array Bound Checking through Non-dependent types...

There is a view that in order to gain static assurances such as an array index being always in range or tail being applied to a non-empty list, we must give up on something significant: on data structures such as arrays (to be replaced with nested tuples), on general recursion, on annotation-free programming, on clarity of code, on well-supported programming languages. That does not have to be the case. This message shows a non-trivial example involving native Haskell arrays, index computations, and general recursion. All array indexing operations are statically guaranteed to be safe -- and so we can safely use an efficient unsafeAt provided by GHC seemingly for that purpose. The code is efficient; the static assurances cost us no run-time overhead. The example uses only Haskell98 + higher-ranked types. No new type classes are introduced. The safety is based on: Haskell type system, quantified type variables, and a compact general-purpose trusted kernel.

...and Number-parameterized types...

Discussions about types parameterized by values -- especially types of arrays or finite groups parameterized by their size -- reoccur every couple of months on functional programming languages newsgroups and mailing lists. The often expressed wish is to guarantee that, for example, we never attempt to add two vectors of different lengths. As one poster said, ``This [feature] would be helpful in the crypto library where I end up having to either define new length Words all the time or using lists and losing the capability of ensuring I am manipulating lists of the same length.'' Number-parameterized types as other more expressive types let us tell the typechecker our intentions. The typechecker may then help us write the code correctly. Many errors (which are often trivial) can be detected at compile time. Furthermore, we no longer need to litter the code with array boundary match checks. The code therefore becomes more readable, reliable, and fast. Number-parameterized types when expressed in signatures also provide a better documentation of the code and let the invariants be checked across module boundaries.
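
A minimal sketch of the branding idea from the first paper (the type and function names here are my own illustration, not the paper's actual code): a phantom type variable s ties an index to the array it was checked against, and a rank-2 continuation keeps the brand from escaping, so indexing with a branded index needs no further bounds check.

{-# LANGUAGE RankNTypes #-}
import Data.Array

newtype BArray s i e = BArray (Array i e)  -- array carrying brand s
newtype BIndex s i   = BIndex i            -- index known to be in range for brand s

-- The rank-2 type prevents the brand s from leaking out of the continuation.
withBArray :: Array i e -> (forall s. BArray s i e -> r) -> r
withBArray arr k = k (BArray arr)

-- The one place a bounds check happens: passing the check confers the brand.
brandIndex :: Ix i => BArray s i e -> i -> Maybe (BIndex s i)
brandIndex (BArray arr) i
  | inRange (bounds arr) i = Just (BIndex i)
  | otherwise              = Nothing

-- Indexing with a matching brand is statically safe
-- (one could use GHC's unsafeAt here instead of (!)).
(!.) :: Ix i => BArray s i e -> BIndex s i -> e
(!.) (BArray arr) (BIndex i) = arr ! i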

Actually no

In Eliminating Array Bound Checking through Non-dependent types we have a way to ensure that certain functions are only used with branded values. This approach has two problems:

  1. How do we brand a value? (That's easy.)
  2. How do we re-brand a value? (That's hard, AFAIK.)

We can brand an index with:


brandIndex :: BArray s i a -> i -> Maybe (BIndex s i)
brandIndex barr i =
  let (l, h) = bounds barr
  in if l <= i && i <= h then Just (BIndex i) else Nothing

But how do we re-brand an index, so that we can use the same index for n arrays?


Lightweight dependent types

Daniel Yokomizo wrote:

But how do we re-brand an index, so that we can use the same index for n arrays?

Do you have a concrete example in mind? Like a piece of (pseudo-)code or Haskell code in which you'd like to eliminate as much of the array bound checking as possible? Your code doesn't have to actually run in Haskell; it would save me time if the code were at least potentially runnable in Haskell without too many contortions. Haskell can be replaced with OCaml or perhaps other languages, although I prefer to do what I know, in this case. Could you post the code here or send it to me privately? I'll see what I can do.

Concrete yet invented example



outer i = do bi1 <- brandIndex barr1 i
             bi2 <- rebrandIndex barr2 bi1
             bi3 <- rebrandIndex barr3 bi2
             someFunction barr1 barr2 barr3 bi3

The problem is: how can we create a branded index that works for n arrays? If we can share branded indices, someFunction knows statically that bi3 can be used to index barr1, barr2, and barr3. If we can't share branded indices, we must either pass three different brands of the same (unsafe) index or pass the unsafe index and lose the static guarantee.

Eliminating Multiple-Array Bound Checking through Non-dependent

Please see the following code


http://www.haskell.org/pipermail/haskell/2006-February/017558.html

It implements quite a general example (when the number of arrays to process is not statically known). It is significantly simpler to use the technique for a fixed number of arrays (especially if those arrays are supposed to be of the same size). In the latter case, an array of the wrong size merely triggers an exit via a special continuation, and so no range-intersection computations and brand adjustments are needed.
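
Roughly, the fixed-arity, same-size case described above might look like this (my own sketch reusing the BArray/BIndex types from the earlier branding example, not the code from the linked message): two arrays with equal bounds are given the same brand, so a single branded index works for both, and a size mismatch falls through to a failure continuation.

{-# LANGUAGE RankNTypes #-}

-- Both arrays receive the same brand s only if their bounds agree;
-- otherwise the mismatch continuation is taken and no brand is issued.
withSameBrand2 :: Ix i
               => Array i e -> Array i e
               -> (forall s. BArray s i e -> BArray s i e -> r)
               -> r    -- result for the size-mismatch case
               -> r
withSameBrand2 a b k mismatch
  | bounds a == bounds b = k (BArray a) (BArray b)
  | otherwise            = mismatch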

Jury's still out

You can get safe code by encoding the properties you desire, but just how easy it is to encode properties remains to be seen.

Check out Tim Sheard's page for a few examples. GADTs can be used to marshal/unmarshal data in a type-safe manner, enforce patterns of usage, and encode semantic properties (e.g. that a tree is balanced, or that a list has length > n). They've come up a few times on the Haskell' list, but people are still figuring out how they interact with Haskell's other features.

I don't know if this counts

I don't know if this counts as 'interesting' as such, but I needed GADTs to build an AST for the 'language' formed by a monad and to write an evaluator for it (this turns out to be a pretty easy way to implement a new monad). While return was trivial and bind doesn't need GADTs per se (just existentially quantified datatypes), the put and get operations for a state monad constrain the computation's return type, and GADTs allow that to be expressed in the types of the corresponding constructors.
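
Something along these lines, presumably (a minimal sketch of the idea with my own names, not the poster's actual code): the state monad's operations become constructors, and GADTs let Get and Put pin down their result types, which plain existential types cannot do.

{-# LANGUAGE GADTs #-}

-- The "language" of a state monad as data; the evaluator gives it the usual
-- state-passing semantics.
data StateOp s a where
  Return :: a -> StateOp s a
  Bind   :: StateOp s a -> (a -> StateOp s b) -> StateOp s b
  Get    :: StateOp s s        -- result type tied to the state type
  Put    :: s -> StateOp s ()  -- result type tied to ()

runStateOp :: StateOp s a -> s -> (a, s)
runStateOp (Return x) s = (x, s)
runStateOp (Bind m k) s = let (x, s') = runStateOp m s in runStateOp (k x) s'
runStateOp Get        s = (s, s)
runStateOp (Put s')   _ = ((), s')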

logic programming

In logic programming you don't need to know the size of an array (or the length of a list)! Deduction figures it out.

Typed Contracts for Functional Programming

Andres Löh shows an embedded DSL for contracts in Haskell using GADTs in Typed Contracts for Functional Programming.
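
Very roughly, the GADT at the heart of such a contract DSL looks like this (a simplified sketch of the general idea; the constructor names are my own, not the paper's exact definitions): a contract is indexed by the type of value it constrains, and a function contract combines a contract on the argument with one on the result.

{-# LANGUAGE GADTs #-}

data Contract a where
  Pred :: (a -> Bool) -> Contract a                     -- simple predicate contract
  Fun  :: Contract a -> Contract b -> Contract (a -> b) -- contract on functions

-- Attach a contract to a value; violations are still detected at run time,
-- but only well-typed contract/value pairings can be written at all.
assert :: Contract a -> a -> a
assert (Pred p)  x = if p x then x else error "contract violated"
assert (Fun c d) f = assert d . f . assert c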

Yampa

Dynamic Optimization for Functional Reactive Programming using Generalized Algebraic Data Types by Henrik Nilsson

A limited form of dependent types, called Generalized Algebraic Data Types (GADTs), has recently been added to the list of Haskell extensions supported by the Glasgow Haskell Compiler. Despite not being full-fledged dependent types, GADTs still offer considerably enlarged scope for enforcing important code and data invariants statically. Moreover, GADTs offer the tantalizing possibility of writing more efficient programs since capturing invariants statically through the type system sometimes obviates entire layers of dynamic tests and associated data markup. This paper is a case study on the applications of GADTs in the context of Yampa, a domain-specific language for Functional Reactive Programming in the form of a self-optimizing, arrow-based Haskell combinator library. The paper has two aims. Firstly, to explore what kind of optimizations GADTs make possible in this context. Much of that should also be relevant for other domain-specific embedded language implementations, in particular arrow-based ones. Secondly, as the actual performance impact of the GADT-based optimizations is not obvious, to quantify this impact, both on tailored micro benchmarks, to establish the effectiveness of individual optimizations, and on two fairly large, realistic applications, to gauge the overall impact. The performance gains for the micro benchmarks are substantial. This implies that the Yampa API could be simplified, as a number of “pre-composed” primitives that were there mainly for performance reasons are no longer needed. As to the applications, a worthwhile performance gain was obtained in one case whereas the performance was more or less unchanged in the other.
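
To give a flavour of the kind of optimization the paper is about (a toy sketch with my own constructors, not Yampa's actual SF representation): once combinators are reified as GADT constructors, composition can pattern match on its arguments, cancel identities and fuse pure stages; the type equalities GADTs provide (e.g. Identity :: SF a a forcing the input and output types to coincide) are what make those matches type check.

{-# LANGUAGE GADTs #-}

data SF a b where
  Identity :: SF a a                      -- this refinement needs GADTs
  Arr      :: (a -> b) -> SF a b
  Comp     :: SF a b -> SF b c -> SF a c

-- A "smart" composition that simplifies the structure as it is built.
compose :: SF a b -> SF b c -> SF a c
compose Identity g        = g            -- identity >>> g = g
compose f        Identity = f            -- f >>> identity = f
compose (Arr f)  (Arr g)  = Arr (g . f)  -- fuse two pure stages
compose f        g        = Comp f g

run :: SF a b -> a -> b
run Identity   = id
run (Arr f)    = f
run (Comp f g) = run g . run f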

Deriving combinator libraries a la John Hughes

GADTs are very useful for deriving combinator libraries in the style of John Hughes. His paper The design of a pretty-printing library is a good example of the methodology, although it doesn't use GADTs. A paper which could have used GADTs, had they been available at the time of writing, is Koen Claessen's Parallel Parsing Processes. In section 5 he needs to do a hack with the Symbol constructor because he didn't have GADTs available. Using GADTs we can instead write:

data P s a where
  Symbol :: P s s
  Fail   :: P s a
  (:+++) :: P s a -> P s a -> P s a
  (:>>=) :: P s a -> (a -> P s b) -> P s b
  Return :: a -> P s a
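
For illustration, a minimal list-of-successes interpreter for this type (my own sketch, not the paper's implementation) shows where the GADT earns its keep: matching on Symbol refines the result type a to the token type s, which is precisely what needed a workaround without GADTs.

parse :: P s a -> [s] -> [(a, [s])]
parse Symbol     (c:cs) = [(c, cs)]   -- here a is refined to s
parse Symbol     []     = []
parse Fail       _      = []
parse (p :+++ q) cs     = parse p cs ++ parse q cs
parse (p :>>= f) cs     = [ r | (x, cs') <- parse p cs, r <- parse (f x) cs' ]
parse (Return x) cs     = [(x, cs)]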

In my experience the use of GADTs pops up quite frequently when using this design methodology.