Open data types and open functions

Open data types and open functions (pdf)
by Andres Löh and Ralf Hinze
Abstract: In object-oriented languages, it is easy to extend data by defining new classes, but it is difficult to add new functions. In functional languages, the situation is reversed: adding new functions poses no problems, but extending data (adding new data constructors) requires modifying existing code. The problem of supporting both directions of extensibility is known as the expression problem. We present open data types and open functions as a lightweight solution to the expression problem in the Haskell language. The idea is that constructors of open data types, and equations of open functions can appear scattered throughout a program. In particular, they may reside in different modules. The intended semantics is as follows: the program should behave as if the data types and functions were closed, defined in one place. The order of function equations is determined by best-fit pattern matching, where a specific pattern takes precedence over an unspecific one. We show that our solution is applicable to the expression problem, generic programming, and exceptions. We sketch two implementations. A simple one, derived from the semantics, and one based on mutually recursive modules that permits separate compilation.
This (draft) paper was mentioned at the Spring School on Datatype-Generic Programming (discussed previously on LtU) where it received considerable interest.
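
For readers new to the expression problem, here is a minimal closed-world sketch in ordinary Haskell (not the paper's open syntax) showing the limitation the abstract describes; the names Expr and eval are the usual illustrative ones, not taken from the paper:

    -- Plain, closed Haskell: both the constructors and the equations of
    -- eval must live in one place.
    data Expr = Num Int
              | Add Expr Expr

    eval :: Expr -> Int
    eval (Num n)   = n
    eval (Add a b) = eval a + eval b

    -- Adding, say, a 'Mul Expr Expr' constructor from another module is
    -- impossible here; with open data types and open functions, both the
    -- new constructor and the extra eval equation could live elsewhere.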

Sounds interesting. It kinda

Sounds interesting. It kinda reminds me of something from a couple of years ago, but I am not sure what exactly...

P.S.: The best way to contribute items to LtU is to sign up to become a contributing editor. Hint, hint.

Related work doesn't mention CLOS or Dylan

With dynamic typing it's easy, and it was done 20 years ago.

Generic functions from CLOS, Dylan and Cecil are actually more powerful than the open functions from this paper. For example, they allow grouping related cases under supertypes and providing a default definition of a function for a given supertype.

The paper doesn't seem to identify static typing as the issue which differentiates this work from CLOS.

Other than that, it's indeed a much needed feature for languages like Haskell, in particular in the mentioned use case of exceptions, and I like the concrete design.

The example of exceptions actually shows one problem of static typing which remained unsolved in this paper and also in OCaml's extensible exn type: in order to put the key value in the KeyNotFound exception, it must have a concrete monomorphic type, called Key in the paper. Actual dictionaries are usually polymorphic to various degrees. For this reason OCaml's actual Not_found exception constructor has no parameters.
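
Roughly, the typing obstacle looks like this (a sketch; only Key and KeyNotFound echo the paper's example, the exception type and lookupOrThrow are made up for illustration):

    type Key = String

    data DictException
      = KeyNotFound Key      -- payload fixed to the monomorphic Key
      | KeyNotFoundNoInfo    -- what OCaml's parameterless Not_found amounts to

    lookupOrThrow :: Eq k => k -> [(k, v)] -> Either DictException v
    lookupOrThrow k kvs =
      case lookup k kvs of
        Just v  -> Right v
        -- 'Left (KeyNotFound k)' would not typecheck here: k has the
        -- polymorphic type 'k', not the concrete type 'Key'.
        Nothing -> Left KeyNotFoundNoInfo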

Well Understood Problem

It's well understood in the community that discusses the "expression problem" that this is essentially an interesting issue in type theory. The more-or-less standard solution is often referred to as "virtual types." One of the earliest approaches to virtual types was Beta's virtual classes, but these weren't type-safe. Relatively recently (2000), Didier Rémy and Jérôme Vouillon published On the (un)reality of virtual types, in which they observe that the combination of parametric polymorphism, correct typing of binary methods, and structural typing are sufficient to allow for the construction of type-safe virtual types. Their extensive examples are in O'Caml. I found the following comment in the introduction humorous:

In fact, using Ocaml as the implementation language, all examples that we know as involving virtual types can actually be typed in with no effort at all. Actually, the untyped code that one naturally writes needs only to be documented with class type parameters and, sometimes, with type constraints. No other change in the code is necessary. Had we not known about this challenge, we would not have noticed any particular difficulty.

Still, as you rightly point out, it's good to see this being tackled by the Haskell community.

Update: I also recommend Independently Extensible Solutions to the Expression Problem, which wonderfully summarizes the issues and addresses them using Scala as its vehicle. I continue to believe that O'Caml and Scala are the two best languages to become familiar with today—I find them considerably more approachable than Haskell, for a variety of reasons.

Qrczak: Actual dictionaries are usually polymorphic to various degrees.

Right again. Here's a Polymorphic Map for O'Caml.

design patterns

The paper you mention (Independently Extensible Solutions...) was a little strange to me in that they cast everything in terms of the "Interpreter" and "Visitor" design patterns.

Is this normal practice? It seems like this expression "problem" has nothing inherently tied to design patterns.

Depending Upon Your Definition of "Inherently"

dbfaken: The paper you mention (Independently Extensible Solutions...) was a little strange to me in that they cast everything in terms of the "Interpreter" and "Visitor" design patterns.

Is this normal practice? It seems like this expression "problem" has nothing inherently tied to design patterns.

Upon re-reading the paper, my sense is that they're just attempting to articulate how the expression problem has actually been tackled in languages that don't lend themselves to implementing virtual types. They're not trying to be doctrinaire; they're saying "Solutions with an OO flavor tend to rely on the interpreter pattern and solutions with a functional flavor tend to rely on the visitor pattern," and describing the limitations that arise from both approaches. This certainly reflects my own experience with attempting to address the expression problem in popular programming languages. But the sheer amount of material they were able to write on the suboptimal solution space reinforced the humor that I found in the comment that I quoted from "On the (un)reality of virtual types," in which the authors, using O'Caml, noted that they wouldn't have known there was a "problem" if they hadn't been told about it.

"problem"s

OK, thanks. That makes more sense.

As an OO type of person...

To me it's so obvious that you want this, that I'd say: drop the open syntax, and just make all types and functions open.

Off the Cuff

sjoerd_visscher: To me it's so obvious that you want this, that I'd say: drop the open syntax, and just make all types and functions open.

I immediately have two objections:

  1. Saying "As an OO type of person" ignores the paper's observation that a straight OO approach makes it difficult to add new functions to the system without modifying base classes.
  2. I want the ability to say "this isn't extensible." In the real world, not everything is. In particular, it isn't always practical to design for unanticipated extension. Not every member function in C++ should be "virtual;" many classes and methods in Java should be "final."

Off the cuff from me too

Re 1: Right, maybe I simply should have said: "As a Javascript programmer"

Re 2: In the case of stateful code I agree. But in this case I think you don't even have to design for extension. It is the responsibility of the extender of the data type to implement all the functions that have the data type as input.

That fundamentally messes up

That fundamentally messes up some existing properties of the relevant type systems: you'd never be able to tell that a function was total again.

The difference

How is making everything open different for type systems from just adding the open feature?

Some vs. All

Being able to selectively say "this function is open," i.e. may not terminate, is one thing. Saying "all functions are open," i.e. may not terminate, is another thing entirely. One thing that type systems do is constrain terms such that progress in their evaluation can be guaranteed. This naturally means that type systems have to be careful in the presence of general recursion, i.e. of "openness." If you make everything open, it quite literally means that you don't care about guaranteeing progress before delivering your software. That's not an indefensible position, but neither is it one that many of us are willing to adopt.

Are we talking about the same openness?

The paper does not even mention termination.

Not Only That...

sjoerd_visscher: Are we talking about the same openness?

Not directly, and I have to apologize for the confusion. They use "open" to mean "extensible," and that's it, as far as what's explicitly said in the paper is concerned. However, the moment you have an "extensible function" in a language with general recursion, you open (sorry) yourself up to what in some texts is called "divergence" and more often is called "non-termination." It's exactly what Philippa meant when she said "you could no longer prove that any function was total," because total functions don't diverge, i.e. they're guaranteed to terminate.

sjoerd_visscher: The paper does not even mention termination.

No, it's implicit, as described above. You should go through life with the implicit understanding that general recursion gives type theorists the vapors because of the risk of non-termination; that allowing arbitrary overloading of functions in languages with general recursion means that you run the risk of introducing general recursion, and hence non-termination, where it isn't wanted; and that having "open functions" implies arbitrary overloading of functions. :-)

I don't understand.

The paper says the idea can be implemented with a source-to-source transformation, the result being regular Haskell code. So if the target of the transformation is guaranteed to terminate, so is the input of the transformation, unless the transformation itself would not terminate. But that clearly isn't the case (it's a very simple transformation).

Termination isn't guaranteed

Termination isn't guaranteed by the type system alone. But the type system does currently guarantee that you don't get passed a constructor you weren't expecting and fail a pattern-match. If everything's open, it can't make that guarantee, and (to give an example) you could add a new constructor to the standard list type and watch a pile of previously total functions with no clue said constructor exists fall over. Adding a new case to every function on lists isn't practical either; if nothing else, some of them will be unexported functions inside other modules.
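
A sketch of that failure mode, using an ordinary closed type to stand in for an extended list (MyList, Snoc and len are invented for illustration):

    -- MyList stands in for a list type after an extension; imagine Snoc
    -- was added later, in a different module.
    data MyList a = Nil | Cons a (MyList a) | Snoc (MyList a) a

    len :: MyList a -> Int
    len Nil         = 0
    len (Cons _ xs) = 1 + len xs
    -- len was total before Snoc existed; 'len (Snoc Nil ())' now fails
    -- with a runtime pattern-match error, and the type system no longer
    -- flags it if Snoc lives in a module the author of len never sees.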

It's not much worse than it already is

Every type includes ⊥. There is little difference between encountering an unknown constructor and encountering ⊥; you can pass it around, but you can't look inside. Open types don't change anything w.r.t. safety in Haskell.

The only difference is that with closed functions you know that, e.g., length either returns a non-negative integer or ⊥, and thus various equivalences hold in code using length (e.g. that comparing the result for being non-zero is the same as comparing it for being greater than zero). With open functions those equivalences would no longer hold, so some implementation details would become visible.
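
For instance (a sketch):

    -- With the closed length these two definitions always agree, because
    -- length can only return a non-negative Int (or fail to terminate):
    nonEmpty1, nonEmpty2 :: [a] -> Bool
    nonEmpty1 xs = length xs /= 0
    nonEmpty2 xs = length xs > 0
    -- If length were open, an extension could in principle return a
    -- negative number for some new constructor, and the two would differ.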

There's one significant

There's one significant difference: one can knowingly avoid passing ⊥ to a strict function. Sure, functions can attempt to default on constructors they don't know about, but that doesn't always make sense.

Unit testing

There are rules certain functions have to follow. length will always have to return a non-negative integer or ⊥, open or not; that's what the semantics of the function requires. Unit testing can be used to ensure that is the case.
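
For example, that contract on length could be stated as a property test (a sketch, assuming the QuickCheck library is available):

    import Test.QuickCheck (quickCheck)

    -- The contract mentioned above, written as a testable property:
    prop_lengthNonNegative :: [Int] -> Bool
    prop_lengthNonNegative xs = length xs >= 0

    main :: IO ()
    main = quickCheck prop_lengthNonNegative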

In practice

I don't see the problem. In the OO world it is not unusual to implement the Collection interface, which I think is the equivalent of adding a new constructor to the standard list type.

If there are unexported functions inside other modules, I'm sure the type system would complain. Which would make certain modules unusable with certain extensions, but that's better than no extension at all.

Btw, I'm sure there are methods in the standard Java library that won't terminate when passed the wrong implementation. I've never heard anybody complain about that.

No, implementing Collection

No, implementing Collection isn't equivalent to adding a new constructor to [] - pattern-matching on algebraic datatypes comes about because more of their concrete structure is exposed. The equivalent of implementing Collection in Haskell would be providing an instance of a Collection type class - something which can already be done.
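
A rough sketch of the difference (the Collection class and Bag instance here are made up, not Haskell's standard container classes):

    class Collection c where
      elems :: c a -> [a]

    newtype Bag a = Bag [a]

    instance Collection Bag where
      elems (Bag xs) = xs

    -- Client code written against the class never pattern-matches on a
    -- particular constructor, so new instances cannot break it:
    total :: (Collection c, Num a) => c a -> a
    total = sum . elems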

I Dunno...

If I ever found one, I'd complain about it!

I am not a Java programmer

But what if you implement a Collection, by inheriting from AbstractCollection, that generates the Fibonacci series as an Iterator? Then do a .contains(-1) on an instance.

This suggests to me the need

This suggests to me the need for an AbstractFiniteCollection which adds termination guarantees for operations like contains. That's exactly the kind of property that still can't be maintained for what are essentially open discriminated unions, when structures are defined not just by individual constructors but by the interactions between them (consider what happens to the standard list type if you take out the nil constructor, or correspondingly to a stream when you add it!).
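
The Fibonacci example above can be written directly in Haskell (a sketch; fibs is the usual infinite Fibonacci stream): a list value that never reaches the nil case makes a membership test partial in exactly this way.

    fibs :: [Integer]
    fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

    found, stuck :: Bool
    found = 5 `elem` fibs      -- True after a few steps
    stuck = (-1) `elem` fibs   -- never returns: no match, and no nil case
                               -- at which the search could give up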

I'm glad someone's finally

I'm glad someone's finally sat down and figured this out. I've been wanting something like this in Haskell for a while.

http://haskell.org/haskellwiki/Extensible_datatypes (which now needs updating; their syntax is better, too).

Maude (and the OBJ languages family)

Maude, and I guess the OBJ family of languages altogether, seem to have this feature built in. My knowledge of Maude is very shallow, but I think that not only are all data types open, but when importing a module one can specify up to three modes of importing (protecting, extending or plain importing). This allows adding equations to existing functions (i.e. all functions are open too?).

Even more, the Maude type system supports subsorting, which helps extensibility by enabling really nifty tricks once one gets used to it (heterogeneous lists, for instance), and all this without polymorphism. Well, there is no polymorphism and no higher order either. Weird for me, but at least there is an advanced form of parametricity (I haven't reached that part of the language yet, though).

Maude has other (several) particularities compared to other functional languages. I've just started learning about it, and I find it a very intriguing language!
Opinions are requested, please...

Definitely Interesting

Maude and the OBJ family all fall loosely into the category of theorem provers, and have unusually rich type systems featuring dependent types. They have strictly more expressive power even than Epigram, let alone Haskell or O'Caml or Scala, nevermind Java, C++, Python... they definitely also make different choices about starting from an open or closed perspective, since they want to be able to support whatever can be proved in either context. e.g., if you want to prove that a function is total, you can. They're rich metalanguages implementing, thanks to the Curry-Howard Isomorphism, rich logics.

Dumb question

The OBJ stuff sounds neat to me, but I'm pretty un-smart when it comes to PLT, and so I'm unsure what they would/not be used for: the apps I've seen mentioned are all proof-esque things, rather than 'down to earth' applications. Would e.g. Maude be something one would ever want to use to write their next video game server or whatever? I'm just trying to get a feel for it all since I'm unlikely to actually understand the details.

Not a Dumb Question At All

You probably wouldn't use Maude or Obj or Coq or Isabelle or MetaPRL to develop the Unreal 4 technology. What you might do, however, is use one of them to prove that certain properties of UnrealScript 4 or the new anti-cheating architecture hold. One of the things that I've mentioned to Tim Sweeney is a desire to do just that: to use, e.g. MetaPRL to formalize the design of his new language, perhaps even going so far as to develop a certified compiler for it, since the ability to do so is an explicit design goal for MetaPRL, and because the MetaPRL team is relatively local to me.

Re: how to apply OBJ

OK, thanks for the clarification! It is what I suspected from the docs I'd read about Maude and things written with it. There's an interesting division between having annotations (e.g. JML) in your code, and having very separate languages & systems. I don't have deep experience with either (I did do some 'zed' back in my abortive grad school years), so I can't say if they are light-years apart. Certainly with JML you are restricted to what it supports, so you can't use 1.5 or Scala with it. External systems might provide one with a way to be more flexible. It sounds like one would design a system in the modeling / proof language, and then once that is all worked out, make an implementation in Java/C++/whatever. The thing is, you immediately get into the problem of drift between your logical and proven spec, and the 'real' implementation?

Even though I profess to know nothing, if you convince Mr. Sweeney, please hire me as your lackey. ;-) It would be amazing to learn this stuff via the (possibly) fun path of games.

Exactly

raould: External systems might provide one with a way to be more flexible. It sounds like one would design a system in the modeling / proof language, and then once that is all worked out, make an implementation in Java/C++/whatever. The thing is, you immediately get into the problem of drift between your logical and proven spec, and the 'real' implementation?

That's exactly right, which is a major reason why environments like MetaPRL are attempting to be not only theorem provers, but more extensive logical programming environments such that you can actually write a compiler for your language in them. Of course, this still begs the question as to why you should trust the portions of the proof engine that are written in the language that the proof engine is written in, so MetaPRL, and presumably other tools like it, include a formalization of (a subset of) O'Caml so that you can actually prove things about code written in that subset.

In practice, though, most people just don't care that much, so you're right: most people who even care enough to prove properties of algorithms and data structures with a theorem prover go ahead and do the actual implementation in Java/C++/whatever—languages for which a formal semantics doesn't even exist—and we simply hope that they know what they're doing.

The thing is, you

The thing is, you immediately get into the problem of drift between your logical and proven spec, and the 'real' implementation?

Right, so the other tactic is to try to extend a "real" language using ideas from a theorem prover, trying to find the right balance between expressive power and complexity. That's what is discussed in this thesis (a more coherent view of the final result, Omega, is in this amazing paper (discussed several times here)).

(and the thesis includes some "down to earth" discussion, imho).

Where to start? (theorem provers, etc.)

Does anyone have advice for where a person should begin to learn about things like Maude/OBJ/MetaPRL? Something like a tutorial level text, which covers the basic theory, but which also uses a particular concrete implementation to teach the syntax/system/idioms, and to help keep things grounded for a beginner. For example, if a person wanted to learn about functional programming in general, I might recommend Haskell and The Haskell School of Expression.

Or is this an area of study that isn't really (yet?) accessible to someone without an advanced degree in mathematics/compsci?

A Few Resources

Maude has a very nice-looking (I haven't worked through it) Primer available. In fact, as a non-user, the most impressive-looking thing about Maude is its documentation, although CINNI and the Open Calculus of Constructions sound awfully interesting, too.

OBJ3 has Introducing OBJ. Once again, I haven't actually worked through this. If you do happen to use OBJ3, please tell Joe Kiniry I said howdy. :-)

MetaPRL has a tutorial that's a work in progress, but one that I've found pretty easy going. Of course, I'm already an O'Camler, and if I really get stuck, I'm an hour's drive away from the team at Caltech. MetaPRL also benefits from having actually been used to develop a compiler or two, so if your interest is in programming language semantics, MetaPRL is highly recommended.

Coq is another prover written in O'Caml. It has an extensive set of libraries, a book "Coq'Art," and has been used to prove security properties of a methodology in conjunction with Java Card technology.

Twelf is another prover with a particular focus on programming language semantics. This one's in Standard ML rather than O'Caml. In some sense the "granddaddy" of the programming language theorem provers, with its connections to LF and Elf, which are venerable names in this space. There's a correspondingly rich literature on LF/Elf/Twelf, but it's unfortunately also correspondingly less accessible than the alternatives—some of the newer systems seem to have been more influenced by Tom Melham's observation that "Formal methods will only succeed when they can be used by people who don't understand them," and attempt to close that gap with tutorials and generally better documentation.

All of this is, of course, IMHO, YMMV, IANAL, etc. etc. etc.

Polymorphic variants in Haskell

In a similar vein, see Oleg Kiselyov's Polymorphic variants with HList:

The existing HList records support, without further programming, polymorphic variants: extensible recursive open sum datatypes, quite similar to Polymorphic variants of OCaml. HList thus solves the familiar (in OO, at least) `expression problem' -- an ability to add new variants to a datatype without changing the existing code. We should be able to extend old processing functions to deal with the extended data type, maximally reusing the old code. Furthermore, values of unextended data types should be processed by extended functions without any ado.

paper moved