Different approaches to letting a programmer define interface implementations.

I'm looking at how different statically typed languages tackle the problem of letting you define types and define interface implementations for those types.

For example, to define an implementation of interface I for type T, Java uses interface methods, Haskell uses typeclasses, Scala uses implicits.

The disadvantages of the Java approach compared with Haskell/Scala:

  • You need a value of type T to call the method.
  • equals() isn't symmetric: dispatch is on the receiver, so a.equals(b) and b.equals(a) can end up running different implementations.
  • Must implement the interface for T in the same module as T. This is probably not much of a problem for equality/ordering/hashing, but can be problematic for interfaces that aren't part of the core library (see the Haskell sketch just after this list).
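
For contrast, here is a minimal Haskell sketch of an implementation living outside the module that defines the type; the Pretty class and the PrettyInstances module are made-up names used only for illustration:

module PrettyInstances where

import Data.Map (Map)
import qualified Data.Map as Map

-- a user-defined interface
class Pretty a where
  pretty :: a -> String

-- ...implemented for a library type (Data.Map.Map) we did not define,
-- far away from the module that defines that type
instance (Show k, Show v) => Pretty (Map k v) where
  pretty m = unlines [show k ++ " -> " ++ show v | (k, v) <- Map.toList m]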

Haskell approach vs Scala approach:

  • Haskell only allows one interface implementation per type, whereas Scala lets the programmer specify the implementation explicitly at the constraint site, if necessary. I can see arguments for both.
  • Scala's implicits allow simpler overriding. I don't see why you'd want to override equality/ordering, but maybe redefining hashing could be useful? (A newtype-based Haskell workaround is sketched just after this list.)
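
As a point of comparison, a minimal Haskell sketch of the usual workaround under the one-instance-per-type rule: instead of overriding an instance, you wrap the type in a newtype (Desc is a made-up name):

-- Only one Ord instance per type is allowed, so an alternative ordering is
-- introduced via a wrapper rather than by swapping the instance in scope.
newtype Desc a = Desc a deriving (Eq, Show)

instance Ord a => Ord (Desc a) where
  compare (Desc x) (Desc y) = compare y x   -- reversed comparison

With this, sort (map Desc xs) sorts in descending order without touching the original Ord instance.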

What do you think is the "right" way to do this? Given my limited knowledge right now, I think I'd go with something Scala-like and maybe make some modifications. For example, I think the "only one implementation per type" restriction of Haskell might be useful.

Is there a good survey of the different approaches?

Mixins + virtual method

Mixins + virtual method dispatch is safe enough. Modular abstract method checks ensure safety. I find the OO solution to this problem to be much more elegant than the functional solution.

RTFM

We're discussing open data types. Read the article; the authors are aware of the problem and suggest a kludgy solution to deal with it.

Tell me when it's implemented.

Ur provides what you're looking for (for FP)

Any language based on modern row polymorphism has the necessary expressiveness.

Row polymorphism doesn't perform often

There are a large number of type inference mechanisms people prefer to avoid because the inference algorithm has bad complexity.

It boils down to what you can talk about intensionally and extensionally regarding types. Reasoning directly over the definition of a type, instead of abstractly about the label and properties of it, seems to involve lousy complexity.

You can probably get away with row typing in small languages, but I doubt it will ever end up as the preferred method of reasoning about types in compilers suited for programming in the large.

(I stopped reading type algorithms which don't have complexity results. You can probably reduce various problems with lousy complexity to row typing problems.)

Open Data Types

There is a type-level / value-level equivalence, which results in repeated constructs at both levels:

type-family         = datatype
type                = constructor
type-class          = record type
type-class instance = record value
type-level function = value-level function

It seemed fairly obvious working on the HList paper that a more-or-less mechanical translation could lift the value level definitions to the type level. For example:


data MyInterface a b = MkMyInterface {
  fn1 :: a -> a -> a,
  fn2 :: a -> b -> (a, b)
}

to

class MyInterface a b where
  fn1 :: a -> a -> a
  fn2 :: a -> b -> (a, b)

So an open function is just like a type class, and an open datatype is just like a type family, only at the value level instead of the type level.
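
A rough sketch of the datatype/constructor and function rows of that table, using a made-up Nat example (the type-level half relies on GHC's TypeFamilies and DataKinds extensions):

{-# LANGUAGE TypeFamilies, DataKinds #-}

data Nat = Zero | Succ Nat

-- value-level function over the (closed) datatype
plus :: Nat -> Nat -> Nat
plus Zero     m = m
plus (Succ n) m = Succ (plus n m)

-- the same definition lifted to the type level as an open type family
-- over the promoted constructors
type family Plus (n :: Nat) (m :: Nat) :: Nat
type instance Plus 'Zero     m = m
type instance Plus ('Succ n) m = 'Succ (Plus n m)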

The open/closed issue is exactly the same whether you operate at the type level or the value level. I have no idea why closed is the norm at the value level and open is the norm at the type level; these decisions seem arbitrary, and each level shows that the semantics works well for the other variant.

So I would suggest that datatypes and functions can be declared open or closed, and that this selects the semantics (either behaving like type-families, types, type-classes and instances, or like datatypes, constructors, functions and patterns); this behaviour applies whether you are at the type level or the value level. You could then also specify 'static' to force a definition to the type level only, 'dynamic' to force it to the value level only, or let the compiler generate both type- and value-level variants and optimise as much as it can to the type level.

Closed vs Open

I am not sure I totally get what you're hinting at, but for me it all seems to boil down to the level of indirection you need.

If you can get away with a closed world assumption, you express things directly in a closed world, and everything happens at the value level which also means you have some guarantee that you'll be as efficient as possible.

If you need an extra level of indirection, reasoning over an abstract property, you need something like type classes or interfaces. One of the most common mistakes in Haskell programming is probably choosing an extra level of indirection, type classes, when you don't need it.

And then you can stack the levels of indirection, late binding over late binding, find out that it doesn't perform, and introduce meta-programming constructs.

It all boils down to walking the tightrope between abstraction mechanisms and performance.

Best match first

If you can get away with a closed world assumption, you express things directly in a closed world, and everything happens at the value level which also means you have some guarantee that you'll be as efficient as possible.

You can have an open world at the value level too; you just need best-matching-pattern semantics instead of first-matching-pattern semantics.

Try it

Sorry, but I doubt anyone will want that kind of assumption on pattern matching. You can implement it, of course, but I think you'll end up with a dead language.

(And then there are the performance issues. It boils down to a compiler that can only start compiling after it has read in the whole program and can do whole-program analysis. You can do that, but I seem to be interested in programming-in-the-large languages only, therefore I reject it.)

Works for inheritance

Are you saying C++ and Java have performance issues and are not suitable for programming in the large? The earlier equivalence shows a mapping from inheritance hierarchies to datatypes, which would make C++/Java inheritance an open datatype. Also an exact match is always the best match - so for simple cases it is trivial.

In any case I did not say all datatypes should be open; I said you should be able to choose open or closed. So you only need to choose open when you want something that behaves like a type-class (but dynamically, on runtime values).

Considering unification and other operations on the AST I gave earlier, I see no problems at all:

-- right-hand sides elided; under best-match semantics the most specific
-- pattern wins, so the catch-all FAIL case applies only when nothing
-- more specific matches
unify x y                   = FAIL
unify (Var x) y             =
unify x (Var y)             =
unify (App x y) (App x' y') =
unify (Lit n) (Lit n)       =

You could later add

unify (Prod a b) (Prod c d) =

without affecting anything. However, if you added:

unify (Var x) (Prod a b) =

You would override the earlier selection. If you don't want this then use a closed datatype.

No I am not

You still equate pattern matching to late binding whereas I explicitly stated you shouldn't. Get it out of your head.

Your example is too trivial and I am too bored to write another trivial example which would show that it's in general unfeasible to work with a language where 'at random' alternatives are suddenly inserted in between other alternatives.

Suffice it to say that I wouldn't know how to stick alternatives in between default cases.

As I said, if you really believe in it then implement it. I don't think it'll sell.

Why

You still equate pattern matching to late binding whereas I explicitly stated you shouldn't. Get it out of your head.

Late binding is simply choosing a function overload at runtime. What is the difference between choosing a function instance by type or by constructor? If you are late binding, the type has to be a runtime value, which means it is a tag attached to the value, which is the same thing as a sum datatype (tagged union).

Effectively with type-classes you start from static, and insert a tag if you cannot resolve the instance statically. With a datatype you start with a tag, and can optimise it out if you know which part of the sum it is at compile time.
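
A minimal Haskell sketch of that correspondence (the Describe, DescribeDict and Shape names are made up for illustration): the constraint version is what you write, the explicit-dictionary version approximates what gets passed at runtime when the instance can't be resolved statically, and the sum type carries its tag from the start:

class Describe a where
  describe :: a -> String

-- roughly the record passed behind the scenes for a 'Describe a' constraint
data DescribeDict a = DescribeDict { describeWith :: a -> String }

report :: Describe a => a -> String        -- constraint version: static when possible
report x = "value: " ++ describe x

reportD :: DescribeDict a -> a -> String   -- explicit dictionary: the runtime 'tag'
reportD d x = "value: " ++ describeWith d x

-- sum datatype: the tag is attached from the start and can be optimised away
-- when the constructor is known at compile time
data Shape = Circle Double | Square Double

area :: Shape -> Double
area (Circle r) = pi * r * r
area (Square s) = s * s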

Your example is too trivial and I am too bored to write another trivial example which would show that it's in general unfeasible to work with a language where 'at random' alternatives are suddenly inserted in between other alternatives

I don't know how anyone copes with type-classes then, as they are open.

Turing Tarpit, again.

I agree with you that from an operational semantics perspective there isn't a big difference. But that's again the Turing tarpit.

Late binding provides an abstraction mechanism which you can largely compile out during whole-program analysis. But that doesn't imply that a system where alternatives are compiled into functions will handle most cases well, or that you'll be able to make much sense of it.

The abstraction mechanism offered by type classes/interfaces 'makes life simple' exactly because it gives you the chance to reason about runtime behavior with types, which informally group together a number of related values. These 'groups of values' are handled independently, which is a nice feature while you're trying to make sense of it all.

Like Oleg, you 'flattened' the world to arrive at a system without that abstraction mechanism. I don't think it'll work.

(Please notice that you don't pattern match on types, but on values. That's different.)

Unification

Actually matching instances of type classes and pattern matching on values both use unification. So calling instance selection 'pattern matching on types' makes complete sense.

I am not suggesting flattening the world. I am suggesting we use the same syntax for both levels, and that a single declaration can define entities on both levels at the same time, so that the compiler can choose a static or dynamic version as necessary. The semantics I am proposing makes a very clear distinction between static and dynamic and clearly has only two phases, compile time and run time. There is no staged compiler as with dependent types. Types only exist at compile time and all decisions using types happen then. At runtime there are only values. All I am proposing is to use the same syntax for both, to avoid having to write static and dynamic versions of every datatype and function separately.

Ah. Success and Goodbye.

You must have read 'Term Rewriting and All That' by Nipkow. Unification is the base of everything. (Inside joke, chuckle and all that.)

I thought we were discussing open data types, still. Okay, change of subject.

I guess you can do that. There is a perspective that a type is a value from the view of a compiled program anyway.

But again, note that while you can often compile out a lot of late binding, you usually cannot compile out all of it. (To be honest, from a different perspective, I am not even sure languages like Haskell have enough late binding; the list-of-shapes example is difficult to express in them.)

I think we've discussed enough. Success with whatever you do.

Thanks

Thanks for the comments, I will probably come to see the wisdom of them with time.

in general unfeasible to

in general unfeasible to work with a language where 'at random' alternatives are suddenly inserted in between other alternatives

Heuristic 'best match' does seem non-deterministic from the user's POV, but this can often be mitigated with explicit precedence.

As far as being 'unfeasible to work with', I feel you're exaggerating. I have some distaste for non-determinism built into the language. But I have been able to work well enough with operator overloading and multi-methods in the presence of subtypes and default arguments, plus a variety of home-rolled 'best match' systems for robotics control.

In practice, if there seems to be a difficulty matching a desired instance, one simply arranges a 'newtype' or similar such that the best match is more obvious.

Pattern matching on Types instead of values.

Open data types pattern match on values, as opposed to interfaces/type classes, which 'pattern match on types.' There's a difference. Agreed?

If you define open data type

If you define open data types as pattern matching on values, then I agree. But you shouldn't generalize properties of these open data types to open systems or the open world assumption.

For me, the whole distinction of 'type' vs. 'value' has become muddled quite a bit by my explorations into dependent types. Literal values can be lifted into an infinite variety of unit types, for example, and partial evaluation can be understood at the type level. Pattern matching certainly isn't the only technique available for type-level computations.

At least one distinction remains strong to me: that of staging and staged programming. It seems mostly a convention that we consider structures computed in one stage 'types' and structures computed in another stage 'values'.

The degree to which a system is 'open' - i.e. enables extension or manipulation without invasive modification - seems mostly orthogonal to the staging model. In a practical sense, it would not be a problem to model a system that is 'open' at earlier stages and closed at later ones. For example, we could have a multi-agent system where the set of agents is known statically at "link time".

The distinction of staging and extensibility seems most obvious in a live/reactive model because it decouples extension from commitment. On the fly, we can update and recompute an earlier stage, and propagate the changes. Thus, we can model agents entering or leaving the system.

I get the impression you're bringing into your discussion with Keean a few assumptions, about specific means of approaching the 'open world assumption' and 'programming in the large', such as open data types. Are there not other means to the same ends?

I would have loved to have avoided the Open discussion

Well, I certainly didn't bring it up and never gave it much thought. But the paper on open data types is both nice and hopeless. Nice that someone took the time to explore the idea, write it down, and get it published. Hopeless since it'll probably never work.

You can just shop around for other means. But personally I find a lot of current literature and thinking about types a load of hogwash; science will always rewrite its own history very carefully, so I am waiting to be proven wrong on that account.

Didn't answer your question

As far as I know, Haskell does something odd, which is that it infers types with respect to a database of facts, the instance declarations. (That is what I call 'a closed world assumption.') It's a design decision you can take, but I wouldn't have. (I understand the rationale somewhat.) It gives different type-checking results than simpler languages such as Java.

For my own language, I wanted to implement overloading in a Java-esque manner with interfaces. You can do that, but you will never be able to decide at runtime that a list is a list of integers when it doesn't contain any integer values.

I have no idea of the ramifications of inspecting runtime types only. It simply looks more suitable for programming-in-the-large scenarios.

AFAICT, Haskell doesn't

AFAICT, Haskell doesn't infer types with respect to a closed database of instance declarations. That is to say, if there is only one instance for a given parameter of a multi-param type class in the database, it will not uniquely infer the other type parameters from that instance. Presumably this is because the database might later be extended - i.e. the open world assumption. (Consequently, developers are often forced to annotate types explicitly when using MultiParamTypeClasses.)

Haskell could presumably infer types from a database of typeclasses if permitted by FunctionalDependencies, but I was disappointed when I expected it to do so. Instead, it seems, TypeFamilies are necessary to express that sort of relationship, to make the inference requirement more explicit.
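
For what it's worth, a small sketch of the two formulations using a hypothetical Container class; this only illustrates where the inference information lives, not the behaviour described above:

{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, TypeFamilies, FlexibleInstances #-}

-- Functional dependency: the class head records that c determines e,
-- so e can be improved from a known c.
class Container c e | c -> e where
  empty  :: c
  insert :: e -> c -> c

instance Container [a] a where
  empty  = []
  insert = (:)

-- Type-family formulation: the same relationship written as an explicit
-- type-level function, which makes the inference requirement more direct.
class Container' c where
  type Elem c
  empty'  :: c
  insert' :: Elem c -> c -> c

instance Container' [a] where
  type Elem [a] = a
  empty'  = []
  insert' = (:)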

Might be

There were a number of things I doubted; the other post confirms this is mostly about overlapping instances, but newer, specialized non-overlapping instances might give the same behavior.

I wouldn't know. It's a thing I was interested in a long time ago.

Haskell has an open world

Haskell has an open world assumption by default, though if you use the overlapping-instances extension you kinda break that.

Corrected HTML formatting

I have corrected the formatting, thanks.

As to OCaml's support for polymorphic recursion, the nice syntax for doing this is indeed recent. However, polymorphic recursion has been possible in OCaml for quite a long time, since OCaml got first-class polymorphism. The key paper, ``Extending ML with semi-explicit higher-order polymorphism'' by Jacques Garrigue and Didier Rémy, was published in 1997. It is a very nice paper, btw, very clear. The implementation in OCaml does reflect the paper.

To write a polymorphically recursive function before, one either had to use records with polymorphic components (which have to be declared), or object types (which at least don't have to be declared). It was ugly, but it worked.