archives

Challenges

I think that this is a very well written paper about the challenges facing the computing and software worlds at the beginning of the 21st century. There are two on line versions: a word version and an HTML version.

Generic Discrimination: Sorting and Partitioning Unshared Data in Linear Time

Generic Discrimination: Sorting and Partitioning Unshared Data in Linear Time, Fritz Henglein. ICFP 2008.

We introduce the notion of discrimination as a generalization of both sorting and partitioning and show that worst-case linear-time discrimination functions (discriminators) can be defined generically, by (co-)induction on an expressive language of order denotations. The generic definition yields discriminators that generalize both distributive sorting and multiset discrimination. The generic discriminator can be coded compactly using list comprehensions, with order denotations specified using Generalized Algebraic Data Types (GADTs). A GADT-free combinator formulation of discriminators is also given.

We give some examples of the uses of discriminators, including a new most-significant-digit lexicographic sorting algorithm.

Discriminators generalize binary comparison functions: They operate on n arguments at a time, but do not expose more information than the underlying equivalence, respectively ordering relation on the arguments. We argue that primitive types with equality (such as references in ML) and ordered types (such as the machine integer type), should expose their equality, respectively standard ordering relation, as discriminators: Having only a binary equality test on a type requires Θ(n^2) time to find all the occurrences of an element in a list of length n, for each element in the list, even if the equality test takes only constant time. A discriminator accomplishes this in linear time. Likewise, having only a (constant-time) comparison function requires Θ(n log n) time to sort a list of n elements. A discriminator can do this in linear time.

If you're like me, at some point you probably heard about linear time sorts like radix sort and then dismissed them as an interesting curiosity --- the restriction to sorting numbers seems pretty limiting, after all.

Henglein did not, and in this paper he shows how to write a teeny-tiny library of type-directed combinators that can lift linear time sorts to structured types like lists and trees, and in addition lets you quotient out by order or repetition, if you want. It's terrifically pretty code.

Note: This is a link to the ACM digital library, and so is behind a paywall. I found a link to a tech report by Henglein covering the same material, but I don't exactly what is different.