Multiple Dispatch in Practice

"... there seem to be clear advantages to informing the design of future languages with evidence drawn by something other than anecdote, personal experience, small-scale observational studies, or personal morality (Dijkstra 1968). Similarly, maintenance and debugging tasks - and even teaching about programming paradigms - would surely benefit from being based in evidence about the world as it is, as well as the world as we would like it to be!"

Multiple Dispatch in Practice, OOPSLA 2008 (pdf)


The functional counterpart of multi-method dispatch?

This paper reminded me of something that's been bothering me for a while. I love the idea that functional and OO languages enjoy a kind of duality: assuming suitably idealised languages, algebraic data types correspond to class hierarchies, and functions defined as folds over the algebraic data types correspond to polymorphic methods (polymorphic in the OO sense). (I hope I've summarised that correctly; see here for the details.)

What's been bothering me is how multi-methods fit into this understanding of the relationship between the functional world and the OO world. If the duality is robust, but at the same time multi-methods are a bona fide feature of OO languages, then it seems reasonable to look for a counterpart in functional languages. So one might suppose that since virtual method dispatch is dual to case analysis, multi-methods are somehow dual to nested case analysis. Indeed (and as discussed by Muschevici et al.) this is roughly the approach we take in OO languages that don't have multi-methods: we emulate them by combining normal virtual dispatch with either explicit type switching or nested Visitor patterns, both of which simulate functional-style case analysis in an OO setting. But one of the troubling things about these emulation techniques is the possibility that the order in which the dispatch code gets executed might influence the outcome of the dispatch in unintended ways. Ideally we would like the outcome to be agnostic as to which ordering obtains. It's also often not obvious that the clauses are exhaustive or even that every clause is reachable.
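The ordering hazard described above can be sketched in a few lines of Python. The shape classes and rule names here are purely illustrative; the point is that with nested type tests, the textual order of the clauses silently becomes the dispatch policy.

```python
class Shape: pass
class Circle(Shape): pass
class Square(Shape): pass

def intersect(a, b):
    # The textual order of these tests *is* the dispatch order: the
    # (Circle, Shape) clause shadows the later (Shape, Square) clause
    # for a (Circle, Square) pair, whether or not that was intended.
    if isinstance(a, Circle) and isinstance(b, Shape):
        return "circle-first rule"
    if isinstance(a, Shape) and isinstance(b, Square):
        return "square-second rule"
    return "fallback"

print(intersect(Circle(), Square()))  # "circle-first rule": order decided
```

Swapping the two `if` clauses changes the answer for a `(Circle, Square)` pair to "square-second rule", even though both clauses are applicable — exactly the unintended order-dependence the emulation techniques suffer from.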

This brings me to my question for knowledgeable readers. Recently I read Neel Krishnaswami's excellent paper Focusing on Pattern Matching. I don't claim to understand it that well, but the idea seems to be that we should regard pattern matching constructs in functional languages, in logical terms, as "single-step" elimination forms for composite types such as A x (B + C x D). The significance of their being single-step is that they avoid exactly the kind of uninteresting (but potentially dangerous) distinctions between reorderings that can arise with nested elimination forms. So my question is this: is pattern-matching the functional counterpart of multi-method dispatch? If so then Neel's algorithms for checking exhaustiveness of patterns and compiling patterns into decision trees could be useful for implementing multi-methods.

(My apologies if this is obvious, or already well-understood in the world of multi-methods - or just plain wrong...I'm open to all possibilities :)

Formulae-as-types for pattern matching

neelk's paper probably deserves its own story; I haven't finished it yet, but I had observed that he cites Dummett's formidable The Logical Basis of Metaphysics.

You ask is pattern-matching the functional counterpart of multi-method dispatch? I can't say that I see a close relationship. I guess you are wondering whether the fact that both are ways of making a choice when there are a number of viable competitors means there is some sort of algorithmic similarity, but proof search is intrinsically non-deterministic, while multiple dispatch normally forces a choice deterministically through an arbitrary order. I suppose you could imagine an analogous paper that introduces a formulae-as-types correspondence for multiple dispatch, which would be a counterpart...

Agreed.

The full correspondence between multiple dispatch and pattern matching could only be achieved by applying a deterministic and arbitrary choice, such as a heuristic 'best-match' approach, to determine which pattern match among many to select. And to be 'multiple' dispatch, it would need to do so using all arguments to the function.

Ideally, one must also be able to modularize this pattern-matching in the same manner one expects to modularize multi-methods or polymorphic methods. 'Modularity' isn't part of any definition of polymorphism or multi-methods that I've ever read, but it is critical to the use of multi-methods in practice. This means one must be able to escape the Haskell/ML limitation whereby functions are wholly defined within one module. Multi-methods and polymorphic methods in general often achieve their primary benefit due to the ability for certain overrides to be introduced by new modules.

Modularized multi-functions using pattern matching would, in turn, introduce some extra complexities with regard to compilation and type inference, though these can be overcome by reducing support for partial compilation (i.e. link-then-compile may be necessary, though I prefer that order anyway).

I think the correspondence

I think the correspondence you draw makes sense. Multiple dispatch corresponds to pattern matching on multiple arguments.

Agreed

A telling point for this view (mentioned in one of the references in the paper above) is that multiple dispatch can be abstracted as single dispatch on a hierarchy of tuple types.
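The tuple-type view can be sketched directly: below, a registry keyed on tuples of types plays the role of a single-dispatch table over the "hierarchy of tuple types". The registry, decorator, and left-to-right linearization are illustrative choices, not a production design.

```python
from itertools import product

_registry = {}  # tuple of argument types -> implementation

def defmethod(*types):
    def register(fn):
        _registry[types] = fn
        return fn
    return register

def dispatch(*args):
    # Single dispatch on the tuple of argument types: walk the product of
    # each argument's MRO, leftmost argument varying slowest, so the
    # leftmost argument dominates specificity (a CLOS-like ordering).
    for combo in product(*[type(a).__mro__ for a in args]):
        fn = _registry.get(combo)
        if fn is not None:
            return fn(*args)
    raise TypeError("no applicable method")

class A: pass
class B(A): pass

@defmethod(A, A)
def _(x, y): return "A,A"

@defmethod(B, A)
def _(x, y): return "B,A"

print(dispatch(B(), B()))  # "B,A": most specific under left-to-right order
```

The interesting design decision is hidden in the iteration order of `product`: it is precisely a linearization of the tuple-type hierarchy, which is where different multimethod systems make different (and arbitrary) choices.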

In a language with multimethods you could imagine desugaring each pattern-matching construct into a locally-defined multimethod with a concrete method for each pattern case (CPS would make this easier...).

Similarly, if you dynamically generate optimized dispatch code for a multimethod once all the cases are known, you will probably benefit from the same optimizations used for implementing pattern matching.

In practice, implementations of the features differ along a lot of axes:
- pattern matching is usually "deep" on nested structure, while multimethods often only consider the types of the arguments
- multimethods are usually open while pattern matching is usually closed to new cases
- pattern matching is often implemented to take the first match, while multimethod systems usually try to find a "best" match
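The first-match/best-match axis is easy to demonstrate. In this sketch both rules apply to a `(B, B)` pair; a first-match system picks whichever rule is listed first, while a best-match system compares specificity. The `more_specific` test below is a deliberately simplified stand-in for real multimethod applicability rules.

```python
class A: pass
class B(A): pass

rules = [((A, A), "general"), ((B, B), "specific")]  # declaration order

def first_match(x, y):
    # Pattern-matching style: first applicable rule wins.
    for (t1, t2), result in rules:
        if isinstance(x, t1) and isinstance(y, t2):
            return result
    raise TypeError("no applicable rule")

def best_match(x, y):
    # Multimethod style: most specific applicable rule wins.
    applicable = [(sig, r) for sig, r in rules
                  if isinstance(x, sig[0]) and isinstance(y, sig[1])]
    def more_specific(s, t):
        # Simplified: every parameter type is a subclass of the other's.
        return all(issubclass(a, b) for a, b in zip(s, t))
    best = applicable[0]
    for cand in applicable[1:]:
        if more_specific(cand[0], best[0]):
            best = cand
    return best[1]

print(first_match(B(), B()))  # "general": declaration order decides
print(best_match(B(), B()))   # "specific": subtype specificity decides
```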

Given the pervasiveness of multi-argument pattern matching in functional languages, I am kind of surprised by the lack of uptake of a seemingly equivalent feature in the OO space.

I think the open nature of

I think the open nature of multimethods is what makes them difficult to use, and brittle to extend.

show me

Please show some evidence that multimethods are difficult to use and brittle to extend.

At a guess...

I expect he means in the context of implementation with static typing (esp. if the set of types can be extended), separate module compilation, and optimizations (such as partial evaluation or decision tree reductions). Multi-methods become easier to handle if you reject any one of those three.

Of course, I can only guess as to Naasking's actual reasoning, but I came to the same conclusion when I framed the problem in terms of implementing multi-methods or extensible pattern matching in ML. Based on previous conversations, I imagine Naasking subconsciously frames every feature idea in the context of how it might fit into OCaml and similar languages.

Part of 'difficult to use' is the inability to determine, wholly predict, or unit test the return values from an extensible multimethod (or pattern matching function) because they are extensible and thus are not wholly defined within any 'unit'.

extensible unit tests

Given extensible multimethods shouldn't we expect extensible unit tests? :-)

I imagine Naasking

I imagine Naasking subconsciously frames every feature idea in the context of how it might fit into OCaml and similar languages.

Somewhat. Being stuck with using C# at work will quickly engender an appreciation for better languages. :-)

I think static typing and separate compilation are indeed the salient points; I'm not overly concerned with optimizations if sufficient expressiveness is gained. Auditing source code with such flexible dispatch rules that are not guided by scoping or parameter passing does not sound like fun, particularly if method targets are in modules distributed in binary form, or are loaded dynamically. Then again, perhaps it's a problem for me because I'm simply not used to thinking in such terms.

As you mentioned, unit testing also seems challenging in the presence of multiple dispatch, and I raised a few issues in another post, re: dynamically loaded code and dispatch across module boundaries. Coming from a capability security background, I also have security concerns, but aside from the auditing difficulties above, these are closely related to respecting abstraction boundaries at the language level.

The converse...

While not stated, naasking, your comment seems to imply that pattern-matching is both easier to use and more robust under extensions.

I don't personally know of any approaches for allowing new cases to be added to an existing algebraic data type, and/or pattern-matching construct in a modular and safe way. In contrast I can point to the "Modular Statically Typed Multimethods" paper as an existence proof that multimethods can be made robust under certain constraints.

Adding multimethods to a Scala-like object/module system would be a really interesting exercise, and would yield a design very different from the much more "open" (and difficult to check) multimethods in CLOS or Dylan.

The open nature of multimethods...

That wasn't my impression at all. I think naasking was claiming that "the open nature of multimethods" is what makes them brittle and difficult to use, with the implied meaning being "if we make pattern-matching functions 'open'" they will be subject to the same cause and similarly would become brittle and difficult to use.

I see

So the unconstrained "open"-ness is the problem. I can buy that, but I'd add, though, that this degree of "open"-ness is not a necessary property of every multimethod system.

I strongly believe that multimethods can be integrated with good modularity and static checking (exhaustiveness, etc.), but since I can't point to a production language with all those properties I guess I need to concede the point until somebody does the work. :)

Yes, I think openness is

Yes, I think openness is generally the source of the problem, and particularly its interaction across abstraction boundaries. Also, I have a hard time understanding how multimethods interact with dynamic loading/mobile code, particularly untrusted code. Can an untrusted source inject a new function which overrides a trusted function? With single dispatch, trusted code can know precisely when it is crossing a trust boundary.

Some statically typed language which can tame the 'openness' might suffice. I think Open data types and open functions might fit the bill.

show me

Yes, I think openness is generally the source of the problem

I'm still curious to hear some evidence that multimethods are "difficult to use, and brittle to extend".

Some statically typed language ...

4 of the languages considered in "Multiple Dispatch in Practice" are statically typed.

4 of the languages

4 of the languages considered in "Multiple Dispatch in Practice" are statically typed.

Cecil has optional static type declarations. Only Nice and MultiJava enforce static typing, and that's not very strong. I'm not familiar enough with Diesel to comment.

As for evidence, I believe I've outlined my concerns regarding multimethods, but I'll concentrate and elaborate here. If you can describe how the following can be achieved in a language with multiple dispatch, then I will retract my statement:

1. Checked exceptions.
2. Some assurance of dispatch safety in the presence of potentially malicious mobile code.
3. Some argument describing how auditing such a code base would not be a nightmare, considering methods and data could be declared anywhere, and not necessarily bundled together (it sounds like a nightmare, but I will accept anecdotes to the contrary).
4. Statically checked absence of "method not understood" (which I believe Nice provides, but I'm not certain of that).
5. Abstraction boundaries are respected, which I believe Nice and MultiJava provide, though I'm not clear whether multimethod invocation and primitive/private method invocation are syntactically distinct; if they are not, upgrading a class which changed a primitive method to a multimethod could leave a consumer of this class vulnerable (this ties in with concern #2).

I think these objections justify my view, and I welcome any corrections.

Partial answers

1. Checked exceptions.

This one isn't a problem at all unless you're attempting simultaneously to provide type inference to determine the set of exceptions. If multi-methods are forced to adhere to a particular exception signature (e.g. via a declaration) then one would simply check the declared exceptions.

This also isn't a problem if you sacrifice separate compilation. By flipping the 'compile then link' process to become 'link then compile', you can achieve 'whole program' views when it comes to inference of actualized exceptions. This flipping is useful for optimizations, JIT, and interpretation, too, but may tend to require more link-time rework.

2. Some assurance of dispatch safety in the presence of potentially malicious mobile code.

I assume you mean to imply that this 'mobile code' can inject new cases into a multi-method that is potentially used by multiple other components which, themselves, may be operating under different privileges. This is a valid issue. If not handled carefully, it could lead to security violations (via privilege elevation) and other complications.

I believe you mean two particular 'other complications': checked exceptions and type safety. Type safety is a concern because the injected code may change the 'best match' heuristics and potentially widen the set of types allowed by the method. This potentially invalidates prior typechecking work. And 'checked exceptions' are a concern because the injected code can (if you are using inference) add new exceptions to the multi-method that go unchecked.

My own solution is to simply refuse to inject new methods into a multi-method based on the addition of code. I understand that this isn't very analogous to, say, Java, where a mobile object carries with it a reference to its polymorphic uni-methods. OTOH, I don't favor object-oriented approaches unless they use Erlang-style 'objects' that are essentially independent processes/actors/services, and I view communications processing as quite a distinct problem from function decision trees.

Such a refusal would limit the mobile code. However, it could achieve the same effect as runtime 'unimethod' polymorphism via first class functions, so it is difficult to claim there was any loss. If extension of multi-methods was absolutely required, I might give each remote user an extensible 'interpreter' that can perform any recompilation upon seeing a new multi-method case. Privilege elevation is a convincing reason to resist direct injection of multi-method cases into a multi-user environment.

3. Some argument describing how auditing such a code base would not be a nightmare, considering methods and data could be declared anywhere, and not necessarily bundled together (it sounds like a nightmare, but I will accept anecdotes to the contrary).

If one limits multi-methods to the instances found in a collection of linked modules, compiled all at once, then one could perform integration testing well enough.

As mentioned above, 'declared anywhere' is an unnecessary degree of "open"ness.

4. Statically checked absence of "method not understood" (which I believe Nice provides, but I'm not certain of that).

This cannot be done in combination with separate compilation of modules because no module can actually check and determine that a particular multi-method call won't become "understood" later. That said, one can suspend or delay that particular typecheck until link time (which is still static).

5. Abstraction boundaries are respected

I'm not certain what this means. I view abstraction and encapsulation as quite orthogonal especially for mobile code (which routinely violates abstraction boundaries).

If multi-methods are forced

If multi-methods are forced to adhere to a particular exception signature (e.g. via a declaration) then one would simply check the declared exceptions.

While definitely a solution, as is whole program compilation, they both seem a bit draconian. Whole program compilation also interferes with many industry applications which now expect a certain amount of dynamism, like dynamic loading for web applications in ASP.NET.

Checked exceptions in the presence of heavy abstraction is still something of an open problem though, i.e. consider a collection with two implementations, one backed by a list, and one by a hashmap. Conformity to an exception signature might be the only viable solution, unless some way to abstract over exception handling can be found, i.e. functions abstracted over a module implementation are also somehow parameterized by exception handlers.

'mobile code' can inject new cases into a multi-method that is potentially used by multiple other components which, themselves, may be operating under different privileges. This is a valid issue. If not handled carefully, it could lead to security violations (via privilege elevation) and other complications.

Yes, which is why I suggested, as a first step, that multimethod dispatch be syntactically distinguished from privileged/primitive methods/operations. At least then it becomes apparent when one may be vulnerable simply by local inspection of the code.

This cannot be done in combination with separate compilation of modules because no module can actually check and determine that a particular multi-method call won't become "understood" later. That said, one can suspend or delay that particular typecheck until link time (which is still static).

It's been awhile since I've read it, but I believe the "Open Types and Open Functions" paper I linked to accomplishes this. It allows exhaustively checked data type extensions and case extensions, with separate compilation given recursive modules.

Whole program compilation

Whole program compilation interferes with many industry applications which now expect a certain amount of dynamism, like dynamic loading for web applications in ASP.NET.

Keep in mind that whole program compilation is only closing off one avenue by which you might achieve dynamism. Don't confuse giving up partial compilation for giving up the features you believe are provided by it.

Even with whole program compilation, the dynamism you seek may be achieved by other mechanisms such as program composition or constructing service workflows. Consider compilation of programs into First Class language objects that can then be stored in the Operating System, passed to other objects, accessed to return other objects, invoked with parameters, registered in an object system, persisted, duplicated, mobilized, and debugged.

I believe the "Open Types and Open Functions" paper I linked to accomplishes this. It allows exhaustively checked data type extensions and case extensions, with separate compilation given recursive modules.

The "Open Types and Open Functions" paper describes a mechanism that achieves a limited form of partial compilation by shifting composition of open types and functions into the 'Main' module and making all other modules dependent upon the 'Main' module. This mechanism does not achieve the ability to inject new modules at runtime without partial recompilation. This mechanism also does not provide exhaustiveness checks for open data types in a modular fashion (those wait until 'Main' is compiled).

In any case, exhaustively checked data extensions and case extensions still effectively block additions at runtime regardless of how one goes about compiling. Suppose you have the following situation:


 import moduleA; // adds constructor A and function funA
 import moduleB; // adds constructor B and function funB
 // To make this pass the exhaustive checks, we now need:
 funA (B x) = ...
 funB (A x) = ... 

Attempting to 'import moduleC' at runtime which adds constructor C and function funC, and is unaware of moduleA and moduleB, will result in a 'funC' not possessing cases for constructors A and B. You could get around this by defining some defaults (funA _ = ...), but such defaults will rarely accomplish the intended effect. So in general one will always need to modify code in a module that imports one that declares open constructs or functions.
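The scenario can be mimicked in Python with "open functions" implemented as mutable case tables; the module names mirror the hypothetical example above. A later `moduleC` registers only its own case, so `funC` is silently partial with respect to the earlier constructors.

```python
funA_cases = {}  # constructor class -> implementation ("open function")
funB_cases = {}

class A: pass    # contributed by the hypothetical moduleA
class B: pass    # contributed by the hypothetical moduleB

funA_cases[A] = lambda x: "funA on A"
funB_cases[B] = lambda x: "funB on B"

# The importing module must supply the cross cases itself, mirroring the
# funA (B x) / funB (A x) lines in the example above:
funA_cases[B] = lambda x: "funA on B"
funB_cases[A] = lambda x: "funB on A"

# A moduleC loaded later, unaware of A and B, registers only its own case,
# so funC has no entries for A or B -- the exhaustiveness guarantee that
# held when 'Main' was compiled is gone.
class C: pass
funC_cases = {C: lambda x: "funC on C"}

print(funA_cases[B](B()))   # "funA on B"
print(A in funC_cases)      # False: funC is partial over A and B
```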

Checked exceptions in the presence of heavy abstraction is still something of an open problem though [...]

Indeed. They can be handled by use of exception signatures if you must, as suggested above. But 'checked exceptions' are only an issue if the behavior of an unchecked exception is 'undefined'. So the easy solution is to define the behavior of raising an unchecked exception. My own preference is to 'let it crash': an unchecked exception shuts down the active process that raised it.

functions abstracted over a module implementation are also somehow parameterized by exception handlers

Programmers already may implicitly provide exception handlers to functions (as part of the implicit continuation, via the frame stack). If you mean to pass them explicitly, I cannot recommend it because doing so forces interfaces to unite policy and mechanism. If you must have checked exceptions, I prefer you keep them inferred.

I suggest that multimethod dispatch be syntactically distinguished from privileged/primitive methods/operations. Then it becomes apparent when one may be vulnerable simply by local inspection of the code.

Agreed. Doing so is also good for metaprogramming (which may need a function result prior to all modules being linked), optimization, and a variety of other purposes. I'd keep multimethods distinct from lambdas.

an inability to disprove does not prove

Seems to me that you made a definite and sweepingly general claim about multimethods being "difficult to use, and brittle to extend".

But seeing "checked exceptions" and "potentially malicious mobile code" listed leads me to ask if you feel that any language that does not provide #1 and #2 is "difficult to use, and brittle to extend"?

#3 My expectation was that your "difficult to use, and brittle to extend" claim would be based on some project experience with multimethod languages.

Instead your claim seems to be based on conjecture, and rather than providing some demonstration you seem to feel others have the burden of disproving your conjecture.

Do you feel that any language that does not provide #4 and #5 is "difficult to use, and brittle to extend"?

But seeing "checked

But seeing "checked exceptions" and "potentially malicious mobile code" listed leads me to ask if you feel that any language that does not provide #1 and #2 is "difficult to use, and brittle to extend"?

With such languages, there is a clear migration path to removing the insecurities (removing ambient authority for #2), or adding checked exceptions. For me at least, the path is not so clear given multimethods.

#3 My expectation was that your "difficult to use, and brittle to extend" claim would be based on some project experience with multimethod languages.

No, sorry. Perhaps my statement was worded too strongly so as to come across as factual, rather than an inference based on the properties of multimethods.

you seem to feel others have the burden of disproving your conjecture.

Not at all. Feel free to ignore my comments if my arguments are unconvincing and you don't feel inclined to correct them, but I would hope a more constructive dialogue is possible.

Do you feel that any language that does not provide #4 and #5 is "difficult to use, and brittle to extend"?

Assuming clients use the facilities that violate abstraction boundaries and function/method lookup can fail at runtime, then yes (thus, some dynamically typed languages fall into this category for me as well).

so as to come across as factual

Let's just say it was worded too strongly for me to understand in the way you apparently intended.

I do find it somewhat amusing to see speculative claims given the quote I chose from the paper - would surely benefit from being based in evidence about the world as it is :-)

Safe/modular multimethods...

The following is my understanding of the core lesson of the "Modular Statically Typed Multimethods" paper that came out of the Cecil project. I'm not claiming it as a new insight, just one that seems to get ignored:

The primary source of concerns over multimethods is the idea that loading a new module can "inject" new methods into an existing multimethod and thus change the semantics of existing call sites and/or hijack the capabilities available to the existing methods. When the module being loaded is untrusted and/or loaded dynamically in the middle of program execution, these concerns become greater.

The crux of a possible solution, then, is to scope multimethods to individual modules. The language can allow a new module N to extend an existing module M, declaring new concrete methods for a multimethod first declared in M. What is important, though, is that this process does not inject any new behavior into M. Existing clients of M still see the exact same interface and implementation of the M module. Dynamically loading/unloading other modules can never change the behavior of M - it is an island unto itself.

Instead, the module N simply exports a "fresh" multimethod that is an extension of M's version of the same multimethod. Clients that expect an implementation of M (or rather, of some signature M implements) could be provided an N module instead - but this is a policy decision made "higher up" in the application. Whoever makes those policy decisions also decides what capabilities/permissions to grant to M and N, and those decisions can be made independently.
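The scoping idea can be sketched as follows: module N derives a fresh multimethod from module M's by copying M's dispatch table rather than mutating it, so loading N never changes what M's existing clients see. The `Doc`/`Pdf` classes and dict-based tables are illustrative only, not the Cecil design itself.

```python
class Doc: pass
class Pdf(Doc): pass

# Module M's multimethod -- frozen, as far as M's clients are concerned.
m_render = {Doc: lambda d: "M: generic render"}

def call(table, arg):
    # Dispatch against a given table, most-derived class first.
    for cls in type(arg).__mro__:
        if cls in table:
            return table[cls](arg)
    raise TypeError("no applicable method")

# Module N extends M's multimethod by *copying*, never mutating, the table.
n_render = dict(m_render)
n_render[Pdf] = lambda d: "N: pdf-specific render"

print(call(m_render, Pdf()))   # "M: generic render"  -- M is unaffected
print(call(n_render, Pdf()))   # "N: pdf-specific render"
```

Which table a client gets is then a policy decision made higher up, exactly as the comment describes: code handed `m_render` can never observe N's extension.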

The parallels to ordinary inheritance and virtual method overrides in class-based OOP should be clear - while a subclass can override any virtual method in any way it wants, it can't directly change the behavior of existing base-class instances.

Adding this kind of multimethod to a Scala-like language that unifies the module and object systems could lead to a really natural approach.

Why is this problem specific to multimethods?

Isn't the problem exactly the same for single dispatch?

It's a bit different...

A new module in most single-dispatch languages cannot change how instances of any existing class in another module responds to messages.

In contrast, a new module in (for example) the Dylan language (which supports "open" multimethods) could insert new methods on a globally-defined multimethod and thus change the meaning of existing code.

As a particularly heinous example, I believe that a Dylan module could define a new method on the "make" function (the standard-library function that constructs new instances of classes) that re-routes attempts to create an instance of one class so that they instead create instances of another. This exact kind of override is used to good effect in real Dylan programs, but could look pretty scary if you didn't trust all the modules going into your program.

In message-passing/single dispatch OOP, "old code" only calls "new code" when it is explicitly passed objects defined in the new code. With unrestricted multimethods, the "new code" can insert new semantics into functions declared in the "old code." Good programming practice for these languages tends to say that this is a bad idea (e.g. the Dylan reference manual recommends that a module should only add new methods to existing multimethods when the new method is specialized on one of the types declared in the new module), but no production language that I am aware of fully rules out these bad cases.

[To be fair, the Dylan language actually spends a lot of effort on the idea of "sealing" multimethods on particular domains and thus ruling out later extensions. This is an opt-in feature, though, and seems to be motivated by optimization more than safety.]

Dynamic Loading

Regarding your point that one can simply prevent modules loaded 'later' from affecting the behavior of modules loaded 'earlier' (presuming a DAG module dependency graph): I believe doing so defeats some of the purpose of multimethods. I think the solution better lies in another of your statements:

When the module being loaded is untrusted and/or loaded dynamically in the middle of program execution, these concerns become greater.

Dynamic loading of modules in the middle of program execution is what creates this problem. So the solution would seem to be to not dynamically link in new modules during program execution. Go ahead and allow loading of other code (e.g. via marshalling), but this other code shan't have 'module' privileges (such as aspect oriented inclusions, provision of program components 'required' by higher level modules, additions to multi-methods, etc.)

There is little difference between composing programs and composing modules if modules don't have any special privileges over programs. But lacking access to such privileges could make it difficult to separate policy (e.g. constructs, multimethods, and requirements declared in higher level modules) from implementation (e.g. construct components, multimethods, and provisionals implemented in lower level modules).

When you can assert truths about the 'time' at which modules will be composed, it makes sense to give them special privileges that might not make sense for code loaded at arbitrary times - especially in the context of partial evaluations, optimizations, referential transparency, etc.

I think this is not quite true.

One of the discussions that came up in the design of E was whether E was statically typeable. The answer was: not with anything less than a dependent type system, and possibly not then. The problem is that E has first-class method calls. A call can be bundled to be "applied" to an object later.

The issue I anticipate with viewing multimethods as pattern matching is the possibility of multiple return types where return type depends on the pattern that is matched.

There has been much work on

There has been much work on typing first-class messages. I believe GADTs are all you need.

.

.

A perfect moment of clarity?

Or, a moment of perfect clarity?