## Matching Objects With Patterns

Matching Objects With Patterns. Burak Emir, Martin Odersky, and John Williams.

Data in object-oriented programming is organized in a hierarchy of classes. The problem of object-oriented pattern matching is how to explore this hierarchy from the outside. This usually involves classifying objects by their run-time type, accessing their members, or determining some other characteristic of a group of objects. In this paper we compare six different pattern matching techniques: object-oriented decomposition, visitors, type-tests/typecasts, typecase, case classes, and extractors. The techniques are compared on nine criteria related to conciseness, maintainability and performance. The paper introduces case classes and extractors as two new pattern-matching methods and shows that their combination works well for all of the established criteria.

## Comment viewing options

### Another approach

I have to mention here another interesting (though I'm probably biased) approach, proposed by the Tom language.
This language integrates algebraic pattern matching into languages such as Java or C. It permits pattern matching over arbitrary object structures, provided the user supplies a so-called anchor that tells how to view those concrete objects as algebraic terms. It was first proposed in A Pattern Matching Compiler for Multiple Target Languages,
and the notion of anchor was later formally defined in Formal Validation of Pattern Matching Code and Formal Islands.

Since pattern matching is independent of the concrete representation of objects, it can be used, for example, to manipulate XML algebraically using the standard DOM library in Java. The language currently offers not only classical algebraic pattern matching, but also associative pattern matching, aimed at manipulating lists, and provides a strategy language used to traverse tree structures for in-depth exploration of object structures.

All this makes it quite a powerful language for the manipulation, analysis, and transformation of complex object structures in a high-level manner.

### more than algebraic patterns

Dear Antoine, we cited the paper you mention (...multiple target languages), and any effort to promote pattern matching to Java people is great!

However, note that we don't just integrate algebraic pattern matching, but work towards an object-oriented concept of pattern matching -- one that encompasses algebraic patterns, but plays well with extensibility and data abstraction.

AFAICT Tom achieves integration in one direction because it is translated to a host language. It would be helpful (and a strong argument for TOM) if you could post a solution to one of the problems solved in the paper.

### .

I wanted to say that I wasn't criticizing the paper; I enjoyed reading it, and it gives a clear view of the different techniques and the issues.

I enjoyed the extractor part, as it seems close to the mapping Tom uses. For example, the "Twice" operator example in Section 4 can be written in Tom:

    %include { int.tom }

    %op int Twice(n:int) {
      is_fsym(t) { t%2==0 }
      get_slot(n, t) { t/2 }
      make(t) { t*2 }
    }

    public final static void main(String[] args) {
      int x = Twice(21);
      %match(x) {
        Twice(y) -> { System.out.println(x+" is two times "+y); }
        _        -> { System.out.println(x+" is a number"); }
      }
    }


The "_" part will always be executed, as the %match construct in Tom behaves like the switch instruction of Java: after a successful match, control flow passes to the next pattern unless there is an explicit return or break.
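For comparison, here is a sketch of the same example with a Scala extractor (adapted from the Twice example in Section 4 of the paper); note that a Scala match expression selects exactly one case, with no fall-through:

```scala
// Extractor viewing even numbers as "twice" some value.
object Twice {
  def apply(x: Int): Int = x * 2
  def unapply(z: Int): Option[Int] = if (z % 2 == 0) Some(z / 2) else None
}

val x = Twice(21)
val msg = x match {
  case Twice(y) => s"$x is two times $y"  // selected for even x; no fall-through
  case _        => s"$x is a number"
}
println(msg)
```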

### Interesting paper. I enjoyed

Interesting paper. I enjoyed reading it.

Stylistic suggestion: Less use of "on the one hand/on the other hand" constructions...

### Thanks for the compliment!

Thanks for the compliment!

Regarding the style, on the one hand, we should have avoided repeating that construction, on the other hand it only appears three times! : ) Seriously, it is not easy to make a comparison between different techniques without stressing the trade-offs involved.

Regarding the upcoming discussion in this forum, I wonder if functional programmers will recognize GADTs as such when they appear in a programming language that happens *not* to be Haskell or ML?

### .

I almost couldn't recognize the GADTs in this paper, even though they said it was about GADTs, and I'm an OO programmer by day ;)

Now for something completely different: as there is a connection between OO and coalgebra, wouldn't it make sense to have GCaDTs as well?

### Nice ideas

I too would like to thank the authors for an interesting paper.

I have long wondered why a simple-yet-powerful concept like pattern matching only ever seems to be provided by functional programming languages. I appreciate that it has a strong fit with the typical type systems used in the field, but when we have concepts like enumerations and inheritance in typical industrial OO languages, pattern matching seems a logical next step.

Personally, I have often wished for something like a "matches" operator, which I'll denote ~ in this post (though this symbol is already used in several major languages). It would effectively be syntactic shorthand for a single match, so that, for example, the expression c ~ Card Heart val would evaluate to true if c is any heart card, binding val to the value of c. (As an aside, you could also give the expression a Maybe ResultType type, so a successful match could yield a meaningful value, and you could bind a variable of static type Card during the match if c were of a more general type.)

Given the typical syntax in C-style languages, the deep example in the paper would become something like this:

    simplified_e = (e ~ Mul x (Num 1)) ? x : e;

which isn't far from the code presented. In the absence of deep matching, and with the typical shortcut evaluation semantics for logical-and in these languages, we could also write something like:

    simplified_e = (e ~ Mul x y && y ~ Num 1) ? x : e;

Here the scope of the variables bound in the matches can safely extend to the second term of the logical-and and into the true result for the ?: operator, because we know all necessary matches have succeeded at these points.
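In Scala, the deep pattern from the paper is written directly in a match expression; a minimal sketch using case classes for the expression types:

```scala
sealed trait Expr
case class Num(value: Int) extends Expr
case class Mul(left: Expr, right: Expr) extends Expr

// The nested pattern Mul(x, Num(1)) plays the role of the proposed `~` test:
// x is bound only when the whole pattern matches.
def simplify(e: Expr): Expr = e match {
  case Mul(x, Num(1)) => x
  case _              => e
}
```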

Going back to the paper, I was particularly impressed with the concept of injection and extraction. I had envisaged overloading the "matches" operator and conversion constructors on a class type to provide similar functionality, but providing a separate conversion interface independent of the definition of the class type has clear advantages over the approach I'd contemplated.

For what it's worth, I think the single most significant result in the paper is that in the benchmarks, there was not a huge performance penalty for using this approach rather than the traditional OO-style decomposition. In other words, we could benefit from the expressive power of pattern matching even in low-level, high-performance languages. Given that most RTTI-based features in mainstream OO languages come with some sort of performance penalty attached, this result is reassuring.

### It's the JIT

Dear Chris, thanks for the compliment. While it is reassuring, it also means that performance is much less predictable from the source, because one needs more information on what the JIT does (and ideally, how it does it). As remarked in the paper, we expected a performance penalty, and it came as quite a surprise that performance was that good (especially for cast). The Sun guys have embraced instanceof without telling anybody :)

### OO approach

Nice paper, especially the extractor idea. Like all Scala related papers very interesting work!

I think that the real OO approach is missing, though: OO is supposed to let the objects do the work, i.e. not to decompose them. Therefore the OO approach would look more like the following:

    trait Expr {
      def isUnit: Boolean = false
      def simplified: Expr = this
    }

    class Num(val value: Int) extends Expr {
      override def isUnit = value == 1
    }

    class Mul(val left: Expr, val right: Expr) extends Expr {
      override def simplified = if (right.isUnit) left.simplified else super.simplified
    }

    // simplification (x is some Expr)

    new Mul(x, new Num(1)).simplified // result: x.simplified


Compared to the OO-decomposition this is much more concise, because only one test is needed. Furthermore the test is related to the domain and does not expose the structure.
Compared to all examples in the paper, the method "simplified" simplifies the whole expression without having to iterate over it "by hand".

### fixed functionality

Dear Thorsten, thanks. The real OO approach you describe is left out for a reason. People often argued like this, e.g. in response to Martin's blog entry on pattern matching. If we want to make pattern matching popular among the masses, we will have to be clearer on where to draw the line -- but I think most experienced OO designers (experienced as in: burnt by mistakes in real-life projects) would see the problems with the approach.

Can you agree with this motto: "I would never use this approach, except when I am the author of the class and 100% sure that this functionality will not change"? Adding a new method means changing all the classes.

The paper, however, looks for solutions to decomposing "from the outside", the scenario being that one does not have the luxury of owning the classes, or that one has to design classes such that people can add meaningful operations on them without needing access to their source code.
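To illustrate decomposition from the outside with an extractor (Complex and Polar here are hypothetical names, not from the paper): an extractor object can add a pattern to a class without touching its source.

```scala
// A class we do not own and cannot modify:
class Complex(val re: Double, val im: Double)

// An extractor defined entirely outside the class gives it a polar "view":
object Polar {
  def unapply(c: Complex): Option[(Double, Double)] =
    Some((math.hypot(c.re, c.im), math.atan2(c.im, c.re)))
}

val desc = new Complex(3, 4) match {
  case Polar(mod, _) => f"modulus $mod%.1f"
}
println(desc) // prints "modulus 5.0"
```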

### spoilt by Smalltalk

I'm probably spoilt by using Smalltalk in real life projects, which allows extending existing classes with new functionality :-)

Transferring this idea to Scala, it would be nice to be able to write something like the following. Of course, the extensions should only be visible from within the Simplification module (and its users).

    module Expr {
      trait Expr { }
      class Num(val value: Int) extends Expr { }
      class Mul(val left: Expr, val right: Expr) extends Expr { }
    }

    module Simplification {
      import Expr

      extend trait Expr {
        def isUnit: Boolean = false
        def simplified: Expr = this
      }

      extend class Num {
        override def isUnit = value == 1
      }

      extend class Mul {
        override def simplified = if (right.isUnit) left.simplified else super.simplified
      }
    }


In mainstream languages where this is not possible, I agree that it will often be necessary to be able to decompose from the outside.

To clarify: I'm not against pattern matching. In fact I like it. But I like the OO approach, too :-)
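As a rough approximation of this kind of external extension in Scala itself, an implicit class (the NumOps wrapper below is a hypothetical name for illustration) can add new methods to an existing class from a separate scope, though unlike the proposed extend construct it cannot override the class's existing methods:

```scala
class Num(val value: Int)

// New methods added from the outside, visible wherever the import is in scope.
object Simplification {
  implicit class NumOps(n: Num) {
    def isUnit: Boolean = n.value == 1
  }
}

import Simplification._
println(new Num(1).isUnit) // prints "true"
```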

### virtual classes

In Scala it's fairly easy to get virtual classes using a pattern; here is your example in Scala:

    trait Base {
      type Expr <: ExprImpl
      trait ExprImpl { def self: Expr /* ... */ }
      type Num <: Expr with NumImpl
      trait NumImpl extends ExprImpl { def self: Num; val value: Int }
      def Num(value: Int): Num
      type Mult <: Expr with MultImpl
      trait MultImpl extends ExprImpl { def self: Mult; val left: Expr; val right: Expr }
      def Mult(left: Expr, right: Expr): Mult
    }

    trait Simplification extends Base {
      type Expr <: ExprImpl
      trait ExprImpl extends super.ExprImpl {
        def self: Expr
        def isUnit = false
        def simplified: Expr = self
      }
      type Num <: Expr with NumImpl
      trait NumImpl extends super.NumImpl with ExprImpl {
        def self: Num
        override def isUnit = value == 1
      }
      type Mult <: Expr with MultImpl
      trait MultImpl extends super.MultImpl with ExprImpl {
        def self: Mult
        override def simplified = if (right.isUnit) left.simplified else super.simplified
      }
    }

    class Compose extends Simplification {
      trait Expr extends ExprImpl { def self: Expr }
      case class Num(value: Int) extends NumImpl with Expr { def self = this }
      case class Mult(left: Expr, right: Expr) extends MultImpl with Expr { def self = this }
    }


You can compose multiple extensions together using mixin inheritance; it's very flexible. I've implemented a large software system using this pattern and I'm very satisfied with the results. Of course, we need better syntactic sugar for this; i.e., Scala should support virtual classes in its syntax to support virtual class extension directly.

### self-type annotations?

Thanks for the nice example.

Any reason not to use explicitly typed self references, instead of defining self methods in each trait & class?

I mean:

    trait Base {
      type Expr <: ExprImpl
      trait ExprImpl { self: Expr => /* ... */ }
      type Num <: Expr with NumImpl
      trait NumImpl extends ExprImpl { self: Num => val value: Int }
      val Num: Int => Num
      type Mult <: Expr with MultImpl
      trait MultImpl extends ExprImpl { self: Mult => val left: Expr; val right: Expr }
      val Mult: (Expr, Expr) => Mult
    }

    trait Simplification extends Base {
      type Expr <: ExprImpl
      trait ExprImpl extends super.ExprImpl { self: Expr =>
        def isUnit = false
        def simplified: Expr = self
      }
      type Num <: Expr with NumImpl
      trait NumImpl extends super.NumImpl with ExprImpl { self: Num =>
        override def isUnit = value == 1
      }
      type Mult <: Expr with MultImpl
      trait MultImpl extends super.MultImpl with ExprImpl { self: Mult =>
        override def simplified = if (right.isUnit) left.simplified else super.simplified
      }
    }

    class Compose extends Simplification {
      trait Expr extends ExprImpl
      case class Num(value: Int) extends NumImpl with Expr
      case class Mult(left: Expr, right: Expr) extends MultImpl with Expr
    }


Is there any reason NOT to do this? Thanks.

### No, not at all. When I did

No, not at all. When I did this (5 years ago), something pushed me away from explicit self types, though I have no recollection of what that was. Today, Scala has changed a lot; explicit self types are probably more robust now.

### Exhaustiveness/redundancy checks

I, too, enjoyed reading the paper. I have one major gripe, though: one central aspect of pattern matching seems to be completely missing from the discussion, and that is the ability to statically verify exhaustiveness and redundancy of patterns. Personally, I find this even more important than the terseness provided by nested patterns, especially when it comes to extending and maintaining existing programs. After all, that is the core motivation of static typing.

I fear that most of the presented approaches do not fare very well in this respect, since they all seem to require default cases. It may be interesting to discuss if and how each can be modified to support better checking.

### I noticed this too, and

I noticed this too, and assumed that the goal of extensibility conflicted with the goal of exhaustiveness checking.

I'd like to see some approach to this (maybe "final" case classes or somesuch) myself. So far all of my Scala code is "toy" sized, but my experience with OCaml (and C vs. Ada) has convinced me that exhaustiveness checking is a good thing. Much more important to me than having XML built into the language...

### incompleteness check implemented

Hello, I feel the urge to render this discussion thread a bit more timeless by announcing that the incompleteness check was added to the Scala implementation roughly a month or two after the post above. However, so far it is not combined with extractors: some workable convention is needed to tell the compiler "this set of extractors is mutually exclusive and complete for this type", so that incompleteness checking can be enabled for extractors as well.

It's true that extensibility clashes with incompleteness checking; that's why the current implementation does it only if the sealed keyword is present and the match is on case classes. Mikael Pettersson's 1992 CC paper is very helpful in implementing incompleteness checking.
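A minimal sketch of the check described above: when the root type is sealed and the match is on case classes, the compiler can verify coverage.

```scala
sealed trait Expr // sealed: all subclasses live in this file, enabling the check
case class Num(value: Int) extends Expr
case class Mul(left: Expr, right: Expr) extends Expr

def size(e: Expr): Int = e match {
  case Num(_)    => 1
  case Mul(l, r) => 1 + size(l) + size(r)
  // deleting either case would make the compiler warn:
  // "match may not be exhaustive"
}
println(size(Mul(Num(1), Num(2)))) // prints "3"
```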