The Expression Problem Revisited

The Expression Problem Revisited - Four new solutions using generics. Mads Torgersen. ECOOP'04.

...Over the years, many approaches to this problem have been proposed, each having different characteristics of type safety and reusability. While many of these rely on exotic or problem specific language extensions, this paper investigates the solution space within the framework of the soon-
to-be mainstream generic extensions of C# and the Java programming language.

Four new solutions are presented which, though quite different, all rely on techniques that can be used in everyday programming.

Same issue, same department (Daimi, Aarhus), different approaches.

Compared to the previous post, this paper is longer and thus perhaps easier to follow; the approach is more mainstream - and the code (in Java!) is downloadable.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Don't multimethods solve the problem?

Don't multimethods solve the problem? with multimethods, functions are decoupled from their objects and therefore both the structures domain and the functions domain can be infinitely extended.

Mostly...

...from Independently Extensible Solutions to the Expression Problem:

Programming languages supporting multiple dispatch via multi-methods provide good support for extensibility with default cases. MultiJava [8] is a Java-based programming language that allows programmers to add new methods to existing classes without modifying existing code and without breaking encapsulation properties. While new, externally specified methods require default cases, internal methods (i.e. methods that are defined inside of the corresponding class) are not subject to this restriction. A precise analysis of the constraints that are required to enable modular typechecking for such internal and external methods is given by Millstein, Bleckner, and Chambers, in their work on EML. Opposed to all the approaches mentioned before, EML makes it possible to use independent extensions jointly.

That's pretty good—as far as I can tell, it's only the need for default cases that makes it different from how Scala, O'Caml, and Sather deal with the issue. Of course, for us static-typing purists, that's bad enough: it should never be possible to apply a function to a value that it isn't equipped to deal with. Defaults are a cop-out violation of what it means to be statically typed. That was understandable before what was needed in order to truly solve the Expression Problem was well understood, but in the post-Sather/O'Caml/Scala world, we shouldn't be seeing new languages that don't solve it correctly.

However...

I believe that any class+functionality encoded using these approaches (i.e. using inernal methods as mentioned in the above quote) can be written using multimethods both more easily and more concisely (but the "more easily" part is subjective and therefore arguable). The type system should prevent using multimethods that have no default case - just as it would prevent using methods that aren't implemented in the approaches used in this paper. I believe that multimethods can indeed be more flexible as some kind of advanced type system could check whether the need for default case actually arises (maybe a multimethod is never used on a class with no case defined but only on its subclasses... and similar).

The possible weakness of multimethods that I see and wonder whether its "a real weakness" (meaning it can be the issue in practice) is that multimethods are "global", as compared the mechanism explained in the previous paper, The expression problem, Scandinavian style, where a class family can not be extended, but a new family can be defined, inheriting all the old functionality (so no code repetition) and extending it. In this manner, the old family is not modified and both (the new and the old) families can be used independently. Any comments?

What is really needed to solve the Expression Problem(?)

Seems to me that you don't need much of anything, just sum and recursive types, to basically solve the expression problem. Additional language features only serve to reduce, but (it seems) never completely eliminate, the boiler plate required to create particular ad hoc combinations.

Here is a toy example along the lines of Garrigue's solution using polymorphic variants except that here we use only ordinary variant types and SML. The essence of this technique is to produce a combination of the form fix (f1 + ... + fN). Multiple inheritance isn't an issue, because taking the fixpoint is only the last step.

First some reusable datatype and function fragments:

datatype 'e lam = ABS of string * 'e | APP of 'e * 'e | VAR of string

fun lookup s = List.find (fn (s', _) => s = s')
val gensym = let
   val i = ref 0
in
   fn () => "_" ^ Int.toString (!i) before i := !i + 1
end

fun lamEval {from, to} eval e =
 fn ABS (s, b) => let val s' = gensym ()
                  in to (ABS (s', eval ((s, to (VAR s'))::e) b))
                  end
  | APP (f, x) => let val f = eval e f
                      val x = eval e x
                  in case from f of
                        SOME (ABS (s, b)) => eval ((s, x)::e) b
                      | _                 => to (APP (f, x))
                  end
  | VAR s      => case lookup s e of
                     SOME (_, v) => v
                   | _ => to (VAR s)

fun lamToString toString =
 fn ABS (s, b) => "(\\"^s^"."^toString b^")"
  | APP (f, x) => "("^toString f^" "^toString x^")"
  | VAR s      => s

datatype 'e num = NUM of int | ADD of 'e * 'e | MUL of 'e * 'e

fun numEval {from, to} eval env = let
   fun evalBop bop con (l, r) = let
      val l = eval env l
      val r = eval env r
   in
      case (from l, from r) of
         (SOME (NUM l), SOME (NUM r)) => to (NUM (bop (l, r)))
       | _                            => to (con (l, r))
   end
in
   fn NUM i => to (NUM i)
    | ADD ? => evalBop op + ADD ?
    | MUL ? => evalBop op * MUL ?
end

fun numToString toString =
 fn NUM i => Int.toString i
  | ADD (l, r) => "("^toString l^" + "^toString r^")"
  | MUL (l, r) => "("^toString l^" * "^toString r^")"

datatype 'e cons = CONS of 'e * 'e | FST of 'e | SND of 'e

fun consToString toString =
 fn CONS (l, r) => "("^toString l^", "^toString r^")"
  | FST e       => "(#1 "^toString e^")"
  | SND e       => "(#2 "^toString e^")"

fun consEval {from, to} eval e = let
   fun prj f c t = let
      val t = eval e t
   in
      case from t of
         SOME (CONS p) => f p
       | _             => to (c t)
   end
in
   fn CONS (f, s) => to (CONS (eval e f, eval e s))
    | FST t       => prj #1 FST t
    | SND t       => prj #2 SND t
end

Then an ad hoc combination of said fragments:

datatype t = L of t lam | N of t num | C of t cons

fun toString t =
    case t of
       L l => lamToString  toString l
     | N n => numToString  toString n
     | C c => consToString toString c

fun eval e t =
    case t of
       L l => lamEval  {from = fn L l => SOME l | _ => NONE, to = L} eval e l
     | N n => numEval  {from = fn N n => SOME n | _ => NONE, to = N} eval e n
     | C c => consEval {from = fn C c => SOME c | _ => NONE, to = C} eval e c
Finally some examples:
val e = L (APP (L (ABS ("x",
                        N (ADD (C (FST (L (VAR "x"))),
                                N (NUM 1))))),
                C (CONS (N (NUM 2), N (NUM 3)))))

val N (NUM 3) = eval [] e
val "((\\x.((#1 x) + 1)) (2, 3))" = toString e
val L (VAR "y") = eval [] (L (APP (L (ABS ("x", L (VAR "x"))), L (VAR "y"))))

Adding both new datatype fragments and new functions is easy. Producing a new ad hoc combination takes some amount of boiler plate, but it is entirely straightforward and can be factored in various ways (e.g. using a generic sum type) for increased reuse (of the combination code). Indeed, all solutions to the expression problem (including Garrigues's use of polymorphic variants) that I've seen require some amount of ad hoc code to produce a particular combination and some authors argue that it is fundamentally unavoidable. However, I doubt that reducing the amount of boiler plate required to implement an ad hoc combination is all that significant in practice. Most applications aren't going to use dozens of combinations. What matters is the ability to build a library of combinable datatype and function fragments that allow one to produce a desired combination for a particular application.

More than Mostly...?

afaict unlike MultiJava the Nice programming language does not require default cases - there is no distinction between internal and external methods.

  1. // THE EXAMPLE PROBLEM
    package lang;
    interface Exp {
       override void print(); 
    }
    class Lit implements Exp {
       int value;
       print(){ print(value); }   
    }
    class Add implements Exp {
       Exp left; 
       Exp right;
       print(){ left.print(); print("+"); right.print(); }  
    }
  2. // ADDING A NEW OPERATION
    package eval; import lang;
    int eval(Exp e);
    eval(Lit e) = e.value;
    eval(Add e) = e.left.eval() + e.right.eval();
  3. // ADDING A NEW DATA STRUCTURE
    package neg; import lang;
    class Neg implements Exp {
       Exp exp;
       print(){ print("-("); exp.print(); print(")"); }
    }
  4. // COMPOSING BOTH EXTENSIONS
    package negeval; import neg; import eval; 
    eval(Neg e) = -e.exp.eval();

I added package and import definitions to make it obvious that are these separate files, being compiled separately.

afaict compiling each separate file does not require re-type-checking, although the imported files are parsed -

$ nicec negeval
nice.lang: parsing
lang: parsing
neg: parsing
eval: parsing
negeval: parsing
negeval: typechecking
negeval: generating code
negeval: linking

GEXL implements solution 4!

Thanks for this paper! I had started writing a general expression library in C# (GEXL), but ran into the expression problem. I've now implemented the 4th solution in this paper which permits full extensions in both directions as a C# library, with moderate type safety. I've described the implementation on my blog, along with an outline explaining how the 4th solution could be made more type safe, and more useful, given some modifications to C#'s type system.