
Points in the Pattern Matching Design Space

Functional languages often have a pattern matching construct. The weakest form is destructuring:

let (x,y) = f(z) in x+y

The left-hand side is an expression that would normally construct a data structure, but because it appears in a pattern position the compiler emits code to perform the inverse of that operation: instead of constructing a pair, we destructure a pair. The general form is:

let f(x) = y in ...

The compiler inverts the operation f and emits this:

let x = f^-1(y) in ...
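
Concretely, for the pair example the "inverse" of pair construction is projection. Here is a minimal sketch in OCaml of the kind of code the compiler could emit; the name desugared is just for illustration:

(* Source: let (x, y) = f z in x + y *)
(* The compiler inverts pair construction into the two projections: *)
let desugared f z =
  let p = f z in        (* evaluate the right-hand side once *)
  let x = fst p in      (* first component: the "inverse" of pairing *)
  let y = snd p in      (* second component *)
  x + y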

However, the operation f is not always surjective, that is, the equation f(x) = y may not have a solution for every y. For example, x::xs only ever produces non-empty lists, so the pattern x::xs only matches non-empty lists. Therefore languages like ML combine destructuring with conditionals:

match ys with
| [] -> ...
| x::xs -> ...

The other limitation is that f is not always injective, that is, the equation f(x) = y may have multiple solutions x, for example f(x) = abs(x). To handle this we can combine pattern matching with iteration:

for abs(x) = y do print(x)

For example, if y is 2, then x gets bound to -2 and 2 in turn, iterating over all solutions (a sketch of how such forms could be implemented follows below). Note that we can generalise this to an arbitrary predicate:

for P do ...

Where P is a predicate. For example:

for x in xs do ...

To iterate over all pairs x in xs and y in ys where x < y, we can write:

for x in xs && y in ys && x < y do ...

JMatch is an example of this kind of pattern matching.
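
Here is a minimal sketch, in OCaml, of how the two iterating forms above could execute, assuming the compiler (or the programmer) supplies a way to enumerate the solutions of each pattern; the helper names abs_solutions and for_pairs are made up for illustration:

(* Solutions of abs(x) = y: none for negative y, one for 0, two otherwise. *)
let abs_solutions y =
  if y < 0 then []
  else if y = 0 then [0]
  else [-y; y]

(* for abs(x) = y do print(x) *)
let () = List.iter (fun x -> Printf.printf "%d\n" x) (abs_solutions 2)

(* for x in xs && y in ys && x < y do ... becomes nested iteration
   with the guard as a filter: *)
let for_pairs xs ys body =
  List.iter (fun x -> List.iter (fun y -> if x < y then body x y) ys) xs

let () = for_pairs [1; 2; 3] [2; 3] (fun x y -> Printf.printf "%d < %d\n" x y)

Pushing the guard x < y into the enumeration of y, rather than testing it after the fact, is exactly the kind of improvement a smarter pattern compiler could make.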

This raises the question: which predicates can we use in a pattern match? For example:

for prime(n) && n < 100 do ...

We'd like this to iterate over the prime numbers that are smaller than 100, but how can we implement this? It is impossible to do this efficiently for arbitrary predicates, so how can we give the programmer the tools to implement the patterns they want to use efficiently?
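
One possible answer, sketched below, is to let the programmer attach an enumerator to a predicate such as prime, so that the n < 100 bound limits the enumeration instead of leaving us to test every integer. The function names here are hypothetical:

(* Naive primality test, adequate for a sketch. *)
let is_prime n =
  let rec loop d = d * d > n || (n mod d <> 0 && loop (d + 1)) in
  n >= 2 && loop 2

(* An enumerator for the prime predicate, bounded so that the
   n < 100 guard restricts the search rather than running forever. *)
let primes_below limit =
  List.filter is_prime (List.init limit (fun i -> i))

(* for prime(n) && n < 100 do ... *)
let () = List.iter (fun n -> Printf.printf "%d " n) (primes_below 100)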

Another extension of pattern matching is logic variables, as in Prolog. These allow us to reason about incomplete information, where a part of a data structure is left unspecified. For example, executing [X] = Y arranges that Y is a list containing an unknown value X. If we later execute X = 2, then Y becomes a list containing 2.
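
Here is a minimal sketch of logic variables in a language without them, assuming a variable is either unbound or bound to a single value; full unification of arbitrary terms is more involved, and the names lvar, fresh and unify are made up:

(* A logic variable is a cell that is either unbound or bound once. *)
type 'a lvar = 'a option ref

let fresh () : 'a lvar = ref None

(* Unify a variable with a value: bind it if unbound,
   otherwise succeed only if the existing binding agrees. *)
let unify v x =
  match !v with
  | None -> v := Some x; true
  | Some y -> y = x

(* [X] = Y: Y is a list containing the unknown X. *)
let x : int lvar = fresh ()
let y = [x]

(* X = 2: now Y is a list containing 2. *)
let () =
  ignore (unify x 2);
  match !(List.hd y) with
  | Some v -> Printf.printf "Y = [%d]\n" v
  | None -> print_endline "X is still unbound"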

A further extension is constraint programming, where we can not only leave out part of a data structure but also specify partial information about a value, for example the constraint X*X < 10. If we also add the constraint X in [-2,5], then X gets bound to -2. New types of constraints require new specialized algorithms to solve them efficiently. How do constraint handling rules fit in? How can we provide mechanisms for the programmer to add additional algorithms for constraint solving? Existing constraint systems are limited in that you need to specify that variables come out of a finite set of values. Can we lift this restriction?
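
A sketch of finite-domain solving for this example, assuming [-2,5] denotes the two-element set {-2, 5}: the constraints simply filter the domain, and X*X < 10 rules out 5. The names are again hypothetical:

(* X in [-2,5]: the finite domain of candidate values. *)
let domain = [-2; 5]

(* X*X < 10 as a constraint on a single value. *)
let square_lt_10 x = x * x < 10

(* Solving: keep exactly those domain values that satisfy every constraint. *)
let solve constraints dom =
  List.filter (fun x -> List.for_all (fun c -> c x) constraints) dom

let () =
  match solve [square_lt_10] domain with
  | [x] -> Printf.printf "X = %d\n" x      (* prints X = -2 *)
  | _ -> print_endline "X is not uniquely determined"

Real constraint solvers of course do much better than filtering a small enumerated set; the open questions above are about letting the programmer plug in those smarter algorithms.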

What is the state of the art in pattern matching and its extensions? How can we extend the scope of patterns that can be matched, and in which other ways can we increase the power of pattern matching?