Generics as a Library

Generics as a Library. Oliveira, Hinze and Löh.

A generic function is a function that is defined on the structure of data types: with a single definition, we obtain a function that works for many data types. In contrast, an ad-hoc polymorphic function requires a separate implementation for each data type. Previous work by Hinze on lightweight generic programming has introduced techniques that allow the definition of generic functions directly in Haskell. A severe drawback of these approaches is that generic functions, once defined, cannot be extended with ad-hoc behaviour for new data types, precluding the design of a customizable generic programming library based on the se techniques. In this paper, we present a revised version of Hinze's Generics for the masses approach that overcomes this limitation. Using our new technique, writing a customizable generic programming library in Haskell 98 is possible.

Pushing forward the state of the generics art in Haskell 98. They also discuss the application of their technique to the expression problem.

(Thanks to Jacques Carette for pointing in this direction.)

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

[getting OTish already? sorry] Re: the expression problem

Is it funny, depressing, or sobering that the expression problem sounds like it shouldn't be a big deal when somebody first describes it in high-level terms, but quickly shows itself to be more trouble (ignoring O'Caml, Scala, etc. of course)? Does it mean most of our languages have large blind spots that most "main stream" folks don't understand, or just never run into and thus don't care?

Expression problem

I think most programmers do encounter the expression problem, do understand it and do care. I think that what most programmers lack is a good vocabulary for characterizing the difficulty, and perhaps a broad enough perspective to see its incarnations in various languages as instances of the same thing.

Most OO programmers, for instance, recognize the trade-offs between leaving a design open to data extension and leaving it open to operational extension. This results in a proliferation of patterns and best practices, and lots of folklore about which side to come down on for any particular software design problem, as well as various ad hoc ways of partially straddling the line.

Functional programmers are certainly aware of the difficulty in adding a new variant to a type, having to chase down functions all over the place adding cases, and the lack of static safety implied by non-exhaustive matches.

But I do think that for the most part the existence of this problem is taken for granted (probably one of the reasons it wasn't given a name earlier). So maybe the other thing that programmers often "lack" is the ability or desire to imagine making a fundamental change to the language in order to solve this problem. In lots of cases, this isn't really a deficiency... I'm definitely in favor of programmers spending time thinking critically about their tools, but if you're using Java for a project, well, too much time spent thinking about its deficiencies won't change anything, and you'll still be left with the same choice. So taking the existence of the expression problem as an axiom is fairly pragmatic.

Incidentally, I don't think this is OT. The expression problem is an important part of the motivation for this work, and assessing its actual importance to programmers is highly relevant. In fact, I have other concerns about solutions to the expression problem, but I'll save them for later...

Re: other concerns

Did those ever get mentioned somewhere? Thanks.

Question

Can someone explain how this approach coexists with the parametricity theorem?

For example, if we can handle lists of characters in an ad-hoc manner, but lists of all other data types generically, then can't we use that to write a non-constant, univerally quantified function of IsChar :: forall t. t->Bool enabling us to distinguish between characters and values of any other type? (Which is impossible in a language in which the parametricity theorem holds.)

Is the trick that only certain types have a representation typeclass Rep t enabling us to discriminate between values of those types, but not enabling us to discriminate between values of an arbitrary universally quantified type? That seems workable, because we can just look at those representation types when choosing to implement special-cases. This implies the existance of some sort of metasyntactic (e.g. aparametric) stage in compilation where these representation types are generated on behalf of the user for all or certain user-defined types, lest the user have to write boilerplate for each new user-defined datatype.

Here's how it co-exists. So,

Here's how it co-exists. So, suppose that you have a dependently typed language, and define an ordinary datatype like:

datatype tag = 
  | IntTag
  | BoolTag
  | PairTag of tag * tag

So what we have is just a regular algebraic datatype of tags. For each tag, we want to put a languagel-level type in correspondence with it, and we do this with a "large elimination", which is a type operator that deconstructs a data value and constructrs at type.

val typeof : tag -> type

fun typeof IntTag          = int
  | typeof BoolTag         = bool
  | typeof PairTag(t1, t2) = (typeof t1) * (typeof t2)

Now, the type of a generic function is something like:

val print : (t:tag) -> (typeof t) -> string

fun print IntTag n               = print_int n
  | print BoolTag b              = if b then "true" else "false"
  | print PairTag(t1,t2) (e1,e2) = 
      "(" + (print t1 e1) + ", " + (print t2 e2) + ")"

So far, none of this breaks parametricity, because we're not looking at a type; we're getting a tag and computing a function of a type based on the tag. In polytypic languages, what they do is

a) bake in a definition of tags and their interpretation, and
b) make the type inference system smart enough to understand tags and infer the tag arguments to function applications.

Parametricity Theorem

Can you suggest a good introduction to this theorem? One that's available online (unlike the 1983 paper by Reynolds).

Theorems for free from the

Theorems for free, from the ever insightful Philip Wadler. It was discussed here (warning the link from that discussion doesn't work anymore).

Coincidentally, I've been

Coincidentally, I've also been wondering about the parametricity theorem in the presence of representation types (like Rep t) and, more generally, for other GADTs.

I've recently given a talk on my work in progress. This talk adds representation types to System F and looks at the "Free Theorems" that result. (Note: It would be a good idea to read "Theorems for Free" first, as I don't include much of that background in this talk.)

The parametricity theorem holds (if it holds in Haskell)

Indeed you already got the right answer. If you wanted to have the function IsChar that distinguishes between characters and values of other types, you would need something with the following type: IsChar :: forall t . Rep t => t -> Bool

As you mention this does not allow us to discriminate values of an arbitrary universally quantified type. Instead it either implies that the user needs to provide some boilerplate code (instances of Rep) for each datatype that is representable, or it implies that we have an extra mechanism in the compiler that generates this for us.

When we refer to lightweight generic programming we typically have the first, since the intention is that we use an existing programming language without any modifications. This can be, of course, a bit painful since we need to provide some boilerplate, but it shows that generic programming can be implemented solely using alredy existing concepts such as parametric polymorphism or type classes (and specially constructor classes).

An advantage of doing it this way (as opposed to create a new programming language with special support for generic programming) is that we can assume all the properties of the language in which we are building the library. For example, the parametricity theorem holds for generic functions if it holds in Haskell.

So the interesting question, related to yours, is: does the parametricity theorem hold in Haskell?

Expression Problem

Mainstream programmers tend to primarily write very domain-specific code rather than the sort of high-level frameworks and libraries where the expression problem mainly shows up. Improved generic support will result in a more expressive language that better empowers framework and library writers (thus leading to better frameworks and libraries). But I don't suppose the typical everyman programmer will make a lot of use of it directly.

Parametricity in Haskell

So the interesting question, related to yours, is: does the parametricity theorem yield in Haskell?
My understanding is that parametricity holds in all referentially transparent Haskell code that does not use explicit forcing via "seq". Code using "seq" is "parametric mod _|_". Haskell does support code that isn't referentially transparent via unsafePerformIO and I'm not sure what implications this has on parametricity.

Side effects and parametricity

It for the most part destroys it. First off, lacking the value restriction, polymorphism and state lead to an unsound type system, but ignoring that, an infinity of values is added to every type, e.g. forall a.a -> a no longer contains the few (one, two or three depending on how "ideal" you want to make it) values it currently has, but instead it contains things like \x -> unsafePerformIO (putStrLn "Hello" >> return x) and the endless variations on that theme. There is a close connection between parametricity and (categorical) naturality; a transformation, τ, is natural if fmap f . τ = τ . fmap f, which is exactly the "free theorem" of a polymorphic function (for one argument). This fails miserably in the face of side effects. However, all this said, the actual action on the input and output is still much the same, at least for completely polymorphic arguments, i.e. forall a.a -> a will still return the same argument that it is given, assuming it returns, no matter what else it does. I believe Haskell implementors assume parametricity holds. GHC certainly does if for no other reason than fusion.

Also...

Take a look at the discussion of parametricity (with respect to seq) in section 10.3 of the History of Haskell paper. The upshot is:

We have sacrificed parametricity in the interests of programming agility and (sometimes dramatic) optimisations. GHC still uses short-cut deforestation, but it is unsound...