Interfaces vs. Type Classes

I've recently been looking at a notion of interfaces for BitC. The language already has type classes in the style of Haskell, with all of their pros and cons. The deeper I dove into interfaces, the more confused I became about the difference between interfaces and type classes. This either means that I'm hopelessly confused or that I've stumbled onto something useful. I'm sure the LtU audience can explain my confusion. :-)

BitC has object-style methods in a syntactic sense, but no virtual methods or inheritance. In such a language, object methods are sugar. We also have a notion of interfaces, currently called "capsules". A capsule provides existential encapsulation of an underlying object combined with a known set of methods. In contrast to Java-style interfaces, a capsule type has an explicitly invocable constructor. This means that a capsule can have non-default instantiations for a given underlying type T. Capsule constructor arguments corresponding to method initializers are constrained to be compile-time evaluable. This has the consequence that every capsule "method table" is a compile-time constant in the same way that every type class instance method table is a compile-time constant.

Capsules may have either static or conventional methods. A static method does not receive or require a reference to the encapsulated object.

The lifetime of a capsule instance is guaranteed not to exceed the lifetime of the object (if any) that it encapsulates.

Q: How is a capsule type consisting exclusively of static methods different from a type class, if at all?

Performance and Multiple Dispatch

Interfaces provide runtime polymorphism, and type-classes provide compile-time polymorphism. Type-classes are a type-safe equivalent to C++'s templates. So when compiling a type-class, as with a template, the manipulation happens on the intermediate code, effectively generating specialised instances of the classes at compile time as required (this is the same as elaboration in Ada, I think). Implemented well, there will be zero run-time cost, and the generated code will look the same as if templates or macros had been used to generate it. With interfaces, because method definitions can vary at runtime (you do not know what will be inside the existential capsule), you have to call the method indirectly via the method table, and this causes overhead. This overhead can be non-linear on modern systems due to non-local memory (NUMA) and cache architecture (non-local references can cause collisions, or cache-thrashing). There are also extra optimisation opportunities when inlining the code statically with type-classes/templates.
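
A minimal Haskell sketch of the contrast (Shape, Circle, and Square are hypothetical names, not BitC or library code):

    {-# LANGUAGE ExistentialQuantification #-}

    class Shape a where
      area :: a -> Double

    data Circle = Circle Double
    data Square = Square Double

    instance Shape Circle where area (Circle r) = pi * r * r
    instance Shape Square where area (Square s) = s * s

    -- Compile-time polymorphism: at any monomorphic call site the
    -- compiler can specialise and inline this, as with a C++ template.
    totalArea :: Shape a => [a] -> Double
    totalArea = sum . map area

    -- Runtime polymorphism: the existential hides the concrete type, so
    -- each call to 'area' goes indirectly through a method dictionary.
    data AnyShape = forall a. Shape a => AnyShape a

    totalAreaDyn :: [AnyShape] -> Double
    totalAreaDyn xs = sum [ area s | AnyShape s <- xs ]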

So one answer is performance, and another is predictability (sometimes you don't want existential capsules appearing all over your memory; you may be restricted to stack memory, for example).

The other answer is multiple dispatch. Object systems generally dispatch only on the first argument, which is by convention put before the function name, as in obj.fn(args...), but the difference from fn(obj, args...) is just syntax. Most template/type-class systems resolve overloading on the types of all arguments. An example where this makes a big difference is unification, where you need to unify two types A and B. The action taken depends on the types of both A and B, which in an object model requires nested uses of the visitor pattern (i.e. N visitor classes for B nested inside the visitor class for A). This gets big and ugly fast. With type-classes, a single class for unification, with instances for the combinations of A and B, is much neater in my opinion (see the sketch below). Even if an object system does support multiple dispatch in method selection, there is the syntactic problem of why A is treated differently from B. Unification is applied equally to A and B, so 'unify(A, B)' looks better than 'A.unify(B)'. This also enables/encourages the separation of algorithms from data, which I think is a good thing.
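
For instance, a multi-parameter type class handles the pairwise cases directly. (Var, Con, and the toy Subst below are hypothetical stand-ins for real type representations.)

    {-# LANGUAGE MultiParamTypeClasses #-}

    -- Hypothetical term types; the point is that 'unify' dispatches on
    -- the types of *both* arguments, with one instance per combination.
    newtype Var = Var String
    newtype Con = Con String

    type Subst = [(String, String)]  -- toy substitution

    class Unify a b where
      unify :: a -> b -> Maybe Subst

    instance Unify Var Var where unify (Var x) (Var y) = Just [(x, y)]
    instance Unify Var Con where unify (Var x) (Con c) = Just [(x, c)]
    instance Unify Con Var where unify c v = unify v c  -- symmetric case
    instance Unify Con Con where
      unify (Con c) (Con d) = if c == d then Just [] else Nothing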

Further Thoughts

Type classes plus existential types give you something with the same functionality as objects with interfaces and interface inheritance (the type-classes provide interfaces and interface inheritance; the existential type provides the runtime polymorphism), but no implementation inheritance. Closures can be used instead of existential types for an alternative form of runtime polymorphism that requires explicit upcasting and downcasting. For implementation inheritance you need to add extensible records (although there is no equivalent of C++'s protected inheritance). The OOHaskell paper gives a good overview of all this.
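
As a hedged sketch of that correspondence (Printable and Loggable are hypothetical interfaces): superclass constraints play the role of interface inheritance, and the existential provides the runtime polymorphism:

    {-# LANGUAGE ExistentialQuantification #-}

    class Printable a where                -- the "interface"
      pretty :: a -> String

    class Printable a => Loggable a where  -- interface inheritance
      logLevel :: a -> Int

    -- The existential supplies the runtime polymorphism: a heterogeneous
    -- list whose elements each carry their own method dictionary.
    data SomeLoggable = forall a. Loggable a => SomeLoggable a

    logAll :: [SomeLoggable] -> [String]
    logAll xs = [ pretty a | SomeLoggable a <- xs ]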

Good summary

Setting aside the presence/absence of existentials, I think this characterization as compile-time polymorphism vs. run-time polymorphism is a very good way of capturing the difference. It goes a long way toward explaining why it can be so hard to tell a beginner which one to use.

I also agree that there is a difference in performance, but the right choice is not obvious. In the presence of JIT, of course, we can do polymorphic dispatch caching. In the absence of JIT, code multiplication can cause code to overflow a tier of instruction cache in small systems. The consequences of that can be pretty dramatic.

Whether OO languages dispatch exclusively on the first argument is a bit fuzzy. There is a certain kind of "second class" dispatch by means of method overloading in C++, and when the Koenig-style name resolution rules (argument-dependent lookup) are taken into account, it becomes considerably less clear whether this should be considered single-parameter dispatch or multi-parameter dispatch. I agree with the point I think you were trying to make; I'm just saying that real languages are messy, and sometimes that makes these statements complicated.

Separation of operations from data is a good thing, but it's purely a matter of surface syntax. Separation of operations from types is not a good thing. It is very often true that operations over scalar types - and even the supporting data structures and algorithms that implement those operations - want special treatment. This is the main reason that Haskell's overlapping instance resolution challenge is interesting, rather than being merely a side note.

Concur about unification, and I think that's a very interesting one. Parametric polymorphism over functions admits many implementations. Parametric polymorphism over methods is rather trickier to implement. Especially so if whole-program compilation is rejected as a design option.

Thanks a lot for your comments, Keean.

Can you pass capsules implicitly?

Capsule constructor arguments corresponding to method initializers are constrained to be compile-time evaluable. This has the consequence that every capsule "method table" is a compile-time constant in the same way that every type class instance method table is a compile-time constant.

I take this to mean that when a program invokes a static method of some object, that object isn't actually needed at run time; only its type information is used, and that's only used at compile time.
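
If so, that sounds analogous to a Haskell class method that takes only a type-level stand-in, as with Data.Proxy (the Named class below is hypothetical):

    import Data.Proxy (Proxy (..))

    -- A class whose only method never inspects a value of its type: the
    -- "object" contributes nothing at run time, only its type does.
    class Named a where
      typeName :: Proxy a -> String

    instance Named Int  where typeName _ = "Int"
    instance Named Bool where typeName _ = "Bool"

    label :: String
    label = typeName (Proxy :: Proxy Int)  -- no Int value ever exists here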

Q: How is a capsule type consisting exclusively of static methods different from a type class, if at all?

I haven't delved into typeclasses much, and my understanding is probably a mishmash of several GHC extensions that might not be what "type classes" mean to everyone.

However, I'm guessing your capsules are pretty close to type classes, with implicitness being the main (maybe only?) feature you lack. Still, I think there are good reasons to care about implicitness, and thus good reasons to say what you have isn't a satisfactory example of "type classes."

Implicit arguments, including type class instances, are pretty liberating not only because they reduce clutter, but because they let programmers write utilities that are usable only in a narrow range of statically checked call sites, and then generalize them to a wider variety of call sites later on.

foo :: [Foo] -> Bar
...can become...
foo :: (Monad m) => m Foo -> Bar
...or perhaps more generally...
foo :: (FooSignature t) => t

(As long as I'm using Haskell examples, the FooSignature approach might require GHC's FlexibleInstances or UndecidableInstances in order to support an instance for a deep pattern like ([Foo] -> Bar). I'm really not sure where the limits are here; I just remember trying deep patterns like this and getting errors. I'm probably embarrassing myself in several ways right now.)

If you add implicit argument support for capsules, you'll probably be much closer to something you can call type classes. In fact, if you just define a single type class that carries capsules, I wonder if that would bring you almost all the way to an implementation:

...instead of writing this...
foo :: (Monad m) => m Foo -> Bar
...write this...
foo :: (HasCapsule MonadCapsule m) => m Foo -> Bar

(Again, I don't know enough to see if there are limitations to this approach, especially for the purposes of advanced type class variations like multi-parameter type classes, functional dependencies, type families, and so on.)

Not sure what the question means.

Let me take your questions in turn.

By "static method", I mean something like a static member function in C++. This is a conventional function making no use of "this" and having no existential encapsulation properties. It lives within the namespace of the class type rather than the global name space. In C++ it has access to private member functions of objects of the same type, but I'm not sure yet how we are going to handle public/private in BitC, so that may not apply in our case.

Capsules are not type classes. A capsule type is an interface type. The main difference between the current BitC capsules and Java interfaces is that a capsule has an explicit constructor. In Java, when "class x implements interface y", the compiler binds the interface methods to compatible class methods having the same name. In BitC, there is (or will be) a convenience syntax that does this, but in the usual case there is an explicit construction step in which the programmer-desired binding is stated. So (a) there is no inherent dependency between the method names of the capsule and the method names of the thing it encapsulates, and (b) by instantiating a capsule with different parameters, two [or more] capsules can be constructed that wrap the same object but with different behavior. Finally, capsules do not support downcast to the underlying object type. Capsules share with interfaces a mechanism for existential encapsulation, but with a heavier emphasis on the encapsulation than interfaces seem to provide.
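
To a first approximation, and ignoring BitC syntax, a capsule can be modeled in Haskell as an existential pairing of an object with an explicitly constructed method table (Printer and its constructions below are hypothetical, not actual BitC):

    {-# LANGUAGE ExistentialQuantification #-}

    import Numeric (showHex)

    -- A "capsule": an existentially hidden object plus an explicitly
    -- constructed method table; no downcast to the underlying type.
    data Printer = forall a. Printer a (a -> String)

    runPrinter :: Printer -> String
    runPrinter (Printer obj method) = method obj

    -- Point (b): two capsules wrapping the *same* object, with different
    -- behavior chosen at construction time.
    decimal, hex :: Printer
    decimal = Printer (255 :: Int) show
    hex     = Printer (255 :: Int) (\n -> "0x" ++ showHex n "")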

The question of implicit capsule instantiation is an interesting one. A capsule instance is an object, and we don't need implicit objects. If you get as far as having an object, you're already holding what you need. The issue - which doesn't arise in any language I know of that has interfaces - is how to support an expression like:

    ObjectType ob;
    ...
    ob as CapsuleType

If ObjectType explicitly supplies a preferred conversion for CapsuleType, then we're done. But we also want a way to retrofit implementations of CapsuleType for third-party object types. It's easy enough to do that syntactically, but it brings with it the problem of instance selection in much the way that Haskell must select type class instances.

The reason that instance selection (in both cases) cannot be done in a purely lexical way is tied to parametric polymorphism. I think of this as the "three party problem". One party implements type T. A second implements interface/capsule type C. A third party defines the default instantiation rule for building C from T. No single lexical context exists - or is even possible - in which all three facts are known.
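
Haskell runs into exactly this shape of problem with what GHC calls orphan instances. Schematically (three hypothetical modules, shown together):

    -- Party one defines the type:
    module DataT where
    data T = MkT

    -- Party two defines the class (the interface):
    module ClassC where
    class C a where describe :: a -> String

    -- Party three supplies the binding between them. Neither DataT nor
    -- ClassC knows about it, so GHC warns that it is an "orphan
    -- instance"; no single module lexically contains all three facts.
    module Glue where
    import DataT
    import ClassC
    instance C T where describe MkT = "a T"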

Please follow up if this didn't answer your questions, and thanks for asking questions that are helping to improve my clarity on this (and hopefully yours as well. ☺)

Implicitness and instance selection

You were asking whether this capsule approach would count as support for type classes. Most of my lack of clarity is about what the common definition of "type class" is, but thanks anyway for reinforcing what you said about the capsule approach. :)

Now you're talking about (ob as CapsuleType) coercions, and I think that might count as a way to pass (obType -> CapsuleType) values as implicit arguments, if you can implement it soundly. For instance, it would mean you can define a function with a declared signature (a -> a -> Bool) that can internally call upon (() as (Eq a)) to do its dirty work. (Note that ob might as well be () for all type class instances, since we're representing them as capsules with only static methods.)

If you made the use of as apparent in the declared signature, I'm pretty sure this could be sound and could even be modularly typecheckable--and it would look just like an implicit argument.

(!=) :: (() -> Eq a) => a -> a -> Bool
(!=) x y = not (capsule.(==) x y)
  where capsule = () as (Eq a)

It's easy enough to do that syntactically, but it brings with it the problem of instance selection in much the way that Haskell must select type class instances.

If you want to avoid the problems of Haskell type classes, can you really call what you get "type classes"? :) This is something I can't begin to say, because I don't know the culture of this terminology outside of Haskell. I'm barely familiar with its connotations even in Haskell.

The reason that instance selection (in both cases) cannot be done in a purely lexical way is tied to parametric polymorphism. I think of this as the "three party problem". One party implements type T. A second implements interface/capsule type C. A third party defines the default instantiation rule for building C from T. No single lexical context exists - or is even possible - in which all three facts are known.

Personally, I think a good way to approach this is to export instances under specific names, or perhaps with other metadata, and then let each module declare its own rules for instance lookup in terms of that information (and perhaps even in terms of how the code behaves). That way, the lexical context is indeed possible inside a single module, but the module system does not automatically share these lookup preferences across multiple modules.

Calling a module with custom preferences

Adding to what I just said:

Personally, I think a good way to approach this is to export instances under specific names, or perhaps with other metadata, and then let each module declare its own rules for instance lookup in terms of that information (and perhaps even in terms of how the code behaves). That way, the lexical context is indeed possible inside a single module, but the module system does not automatically share these lookup preferences across multiple modules.

A comment of yours in another thread shows you're already thinking in these terms! Here's your entire comment, just to make this thread easier to follow:

Ordering is a great example, because it's something you can explain to everybody. There are lots of other examples, including a whole bunch in the area of I/O. Named instances are a precondition to any solution, because choice requires designation. But named instances are not a solution in and of themselves.

Note that once you have multiple instances you lose a bunch of subsumption properties. Consider that any implementation of "sort" over a vector is going to rely on Ord over the index (of type int or Nat), and also Ord over the element type 'a. When the element type 'a turns out to be int or Nat, it does not follow that the two uses of Ord int and Ord 'a = int can correctly be merged/simplified.

In this case it sounds like you want the second Ord to be an explicit parameter. But I'm sure you would have thought of that, so what's eating at you is probably the general task of moving instance selection preferences across module boundaries, from one module's call site into another module's function definition.
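
As a sketch of that explicit-parameter route in Haskell terms (OrdDict and sortWith are hypothetical names; Data.List.sortBy already takes the comparison explicitly):

    import Data.List (sortBy)

    -- A first-class stand-in for an Ord instance.
    newtype OrdDict a = OrdDict { cmp :: a -> a -> Ordering }

    ascending, descending :: Ord a => OrdDict a
    ascending  = OrdDict compare
    descending = OrdDict (flip compare)

    -- The element ordering is now an ordinary argument chosen at the
    -- call site, independent of whatever Ord the index type uses
    -- internally.
    sortWith :: OrdDict a -> [a] -> [a]
    sortWith d = sortBy (cmp d)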

Personally, I would see it as still being a parameter, just with fancier, more module-like syntaxes for passing it in and using (i.e. importing) it. I've actually been sketching out ideas for something like this, as part of a larger module system design, but I don't (yet) intend to pass around types this way, only first-class values. It would be hefty enough that I'd only want to use it for embarrassingly open-ended things like certain I/O operations, where there are so many possible configuration options that they have their own namespacing and conflict resolution needs.

If a module does provide preferences to another module this way, they'll have to be reified to some degree (at least until the full program is available for either execution or full-program compilation). Instead of implicitly using an instance that's a compile-time constant, the code would implicitly, dynamically look up an instance based on the nearest lexically surrounding import syntax.