Generic Discrimination: Sorting and Partitioning Unshared Data in Linear Time, Fritz Henglein. ICFP 2008.
We introduce the notion of discrimination as a generalization of both sorting and partitioning and show that worst-case linear-time discrimination functions (discriminators) can be defined generically, by (co-)induction on an expressive language of order denotations. The generic definition yields discriminators that generalize both distributive sorting and multiset discrimination. The generic discriminator can be coded compactly using list comprehensions, with order denotations specified using Generalized Algebraic Data Types (GADTs). A GADT-free combinator formulation of discriminators is also given.
We give some examples of the uses of discriminators, including a new most-significant-digit lexicographic sorting algorithm.
Discriminators generalize binary comparison functions: They operate on n arguments at a time, but do not expose more information than the underlying equivalence, respectively ordering relation on the arguments. We argue that primitive types with equality (such as references in ML) and ordered types (such as the machine integer type), should expose their equality, respectively standard ordering relation, as discriminators: Having only a binary equality test on a type requires Θ(n^2) time to find all the occurrences of an element in a list of length n, for each element in the list, even if the equality test takes only constant time. A discriminator accomplishes this in linear time. Likewise, having only a (constant-time) comparison function requires Θ(n log n) time to sort a list of n elements. A discriminator can do this in linear time.
If you're like me, at some point you probably heard about linear time sorts like radix sort and then dismissed them as an interesting curiosity --- the restriction to sorting numbers seems pretty limiting, after all.
Henglein did not, and in this paper he shows how to write a teeny-tiny library of type-directed combinators that can lift linear time sorts to structured types like lists and trees, and in addition lets you quotient out by order or repetition, if you want. It's terrifically pretty code.
Note: This is a link to the ACM digital library, and so is behind a paywall. I found a link to a tech report by Henglein covering the same material, but I don't exactly what is different.
Recent comments
27 weeks 2 days ago
27 weeks 2 days ago
27 weeks 2 days ago
49 weeks 3 days ago
1 year 1 week ago
1 year 3 weeks ago
1 year 3 weeks ago
1 year 5 weeks ago
1 year 10 weeks ago
1 year 10 weeks ago