Implementing abstract classes automatically?

Here is an interesting construct I've been working with recently. Say we have an abstract class (trait) with one abstract method, do:

trait A {
  abstract void do();
}

Now say we have two implementations:

trait B : A {
  override void do() { ... }
}
trait C : A {
  override void do() { ... }
}
object myUnderspecifiedA : A { ... }

Given an underspecified object that extends A, the compiler could choose either B or C to "complete" it; perhaps we have a model expressing that an object extending A more often extends B than C, so why not just select B? Also, the object might only be able to extend one trait and not the other to implement "do"; consider:

trait D : A {
  abstract void bar();
}
trait E : D {
  override void bar() { ... }
}
trait F : D {
  override void bar() { ... }
}
trait B : E {
  override void do() { ... }
}
trait G : A {
  abstract void foo();
}
trait C : F, G {
  override void do() { ... }
}
trait H : G {
  override void foo() { ... }
}

object myUnderspecifiedAWithF : A with F { ... }

In this case, the compiler could have the object extend C but not B, since in the latter case the bar method would be implemented in two different ways. Now, the completion of an object with additional traits is recursive: because we have the object extend C, we must now find a trait to implement the abstract foo method from an extended G trait, so we then also choose H (the final solution includes A, F, C, H). Finding a solution for the object then involves finding a trait that implements an abstract method, adding that trait to the extend list, then solving for additional abstract methods; the solution could always dead-end and require backtracking, so this algorithm kind of looks like a Prolog solver.
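
Here is a rough sketch of that search in Haskell, using the list monad for backtracking (the Trait record and all the function names are hypothetical, just one way to model the example):

import Control.Monad (guard)
import Data.List (nub, (\\))

data Trait = Trait
  { tName       :: String
  , tParents    :: [String]  -- traits this trait extends
  , tImplements :: [String]  -- methods it overrides
  , tDeclares   :: [String]  -- abstract methods it introduces
  }

-- Close a set of trait names over their parents.
closure :: [Trait] -> [String] -> [String]
closure env = fixp
  where
    step ns = nub (ns ++ concat [tParents t | t <- env, tName t `elem` ns])
    fixp ns = let ns' = step ns in if ns' == ns then ns else fixp ns'

-- Consistent: no method is implemented by two different traits.
consistent :: [Trait] -> [String] -> Bool
consistent env ts = impls == nub impls
  where impls = concat [tImplements t | t <- env, tName t `elem` ts]

-- Abstract methods that still lack an implementation.
missing :: [Trait] -> [String] -> [String]
missing env ts = nub decls \\ impls
  where decls = concat [tDeclares   t | t <- env, tName t `elem` ts]
        impls = concat [tImplements t | t <- env, tName t `elem` ts]

-- Pick an unimplemented method, try every trait that implements it,
-- prune inconsistent extensions, and recurse; the list monad backtracks.
complete :: [Trait] -> [String] -> [[String]]
complete env ts = case missing env ts of
  []      -> [ts]
  (m : _) -> do
    t <- [t' | t' <- env, m `elem` tImplements t']
    let ts' = closure env (tName t : ts)
    guard (consistent env ts')
    complete env ts'

example :: [Trait]
example =
  [ Trait "A" []         []      ["do"]
  , Trait "D" ["A"]      []      ["bar"]
  , Trait "E" ["D"]      ["bar"] []
  , Trait "F" ["D"]      ["bar"] []
  , Trait "B" ["E"]      ["do"]  []
  , Trait "G" ["A"]      []      ["foo"]
  , Trait "C" ["F", "G"] ["do"]  []
  , Trait "H" ["G"]      ["foo"] []
  ]

Running complete example (closure example ["A", "F"]) prunes the branch through B (B drags in E, so bar would be implemented twice) and returns the single completion {A, D, F, C, G, H}.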

Has anyone heard of a class system like this before? My goal is to have a language where you can underspecify a program and the compiler will fill in the gaps by finding traits that implement the abstract methods while maintaining consistency. A "best" solution could be chosen based on a model that includes "can-be" probabilities.

Inference, constraints and backtracking

Three ideas thrown quite randomly; pardon me for being possibly quite off-topic:

- elaboration of a type-class instance in the presence of overlapping instances and potential cyclicity (you must abandon a branch if it leads to a loop) needs backtracking; there has been work to represent them as general constraint resolution problems (the Prolog similarity you were mentioning), in particular by Martin Sulzmann. See e.g. "A theory of overloading" with Stuckey, around 2005.

- debugging problems: in a debugger, I see some (parts of) values and I have some information about the program's types; what is the type of the value I'm looking at? See "Dynamic Heap Type Inference for Program Understanding and Debugging".

- See the Mancoosi work on software package upgrades (with conflicting packages etc., and probabilities/preferences due to different version numbers). See their publication page: I'm thinking in particular of "Autocompletion for Mashups" and the general work on formalizing software dependencies; if a package can "provide" an abstract feature, "depend" on other packages (or abstract features), and "conflict" with other packages, I think you've got something roughly similar in expressivity to your trait language. They express this as graph and logical constraints, run general SAT technology on it, but also use special-purpose graph algorithms to study some aspects of the problem (such as "co-installability": can two traits cohabit in the same instance?).
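
To make the analogy concrete, your example might be hand-encoded as propositional clauses, roughly like this (a toy encoding of my own, not Mancoosi's actual tooling):

type Var    = String            -- one variable per trait: is it "installed"?
data Lit    = Pos Var | Neg Var
type Clause = [Lit]             -- a clause is a disjunction of literals

constraints :: [Clause]
constraints =
  [ [Neg "B", Pos "E"]          -- depends: B implies E
  , [Neg "C", Pos "F"]          -- depends: C implies F
  , [Neg "C", Pos "G"]          -- depends: C implies G
  , [Neg "E", Neg "F"]          -- conflict: E and F both implement bar
  , [Neg "B", Neg "C"]          -- conflict: B and C both implement do
  , [Pos "B", Pos "C"]          -- provide: something must implement do
  , [Neg "G", Pos "H"]          -- provide: if G is in, foo needs H
  , [Pos "F"]                   -- already installed: the object extends F
  ]

The only satisfying assignment sets C, F, G, H true and B, E false, which is exactly the completion you describe.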

Coherence

And like with overlapping type class instances, the big problem is what the type class community calls coherence -- or rather lack thereof in this case. In particular, the semantics of a piece of code will not be stable under weakening (i.e., extension of the declaration context). In fact, extending the context may even make the code ill-formed, because there no longer is an unambiguous, most specific instance to resolve to.
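
To make that concrete, here is a contrived Haskell example of my own with overlapping instances:

{-# LANGUAGE FlexibleInstances #-}

class Pretty a where
  pretty :: a -> String

instance Pretty (a, Int) where
  pretty _ = "a pair ending in Int"

ok :: String
ok = pretty (1 :: Int, 2 :: Int)  -- resolves fine today

-- If a later module "weakens" the context by adding
--   instance Pretty (Int, b) where pretty _ = "a pair starting with Int"
-- the call above becomes ambiguous: neither instance is more specific
-- than the other, and previously well-typed code stops compiling.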

Both these properties would make such a construct highly brittle and rather unpleasant to work with. If the compiler is not even forced to find a "best" solution based on clearly specified, deterministic rules (if that's what you are suggesting, I'm not sure), then matters become even more unwieldy.

Brittle and unpleasant? As a

Brittle and unpleasant? As a development abstraction, when we are prototyping and want our program to run continuously from the very beginning (live programming), what's wrong with the compiler inferring more of your program's missing gaps? If I say "shape" and the compiler picks circle or star, what's the damage?

I envy Haskell programmers who can design a program completely in their head with no need to iterate, write it down completely on the computer, then have it run correctly the first time because all the types are right. I'm designing a language that is more of a static/dynamic hybrid, one that supports exploration and experimentation, so it's quite different.

The damage

The damage is that people will use such a feature because they can, and uses will end up in production code. Then a couple of months later some poor soul tries to extend some code, in seemingly conservative ways. And something seemingly unrelated breaks for non-obvious reasons. Happy debugging.

And yet dynamic languages

And yet dynamic languages with duck typing and such are used in a lot of production code while the world hasn't ended yet; i.e., let me raise your anecdotal evidence with my anecdotal evidence!

Seriously, there are a few ways of looking at this:

First, we can make underspecified programs very unstable by selecting alternative solutions on each program run, or even within a single run. The point is not to let the programmer get away with never specifying the behavior; rather, it's just to allow them to specify it later at their convenience. That the program keeps changing in some way provides good feedback about where it still needs work. The compiler can definitely tell where under-specification has occurred, so programmers "know" where their program needs work without resorting to the debugger.

Second, treat this as code completion. The fact that the compiler is finding a solution to a chain of abstract methods is already much better than what existing IDEs are capable of. Right now, code completion asks me to work in steps, but here the compiler has already explored a bunch of solutions and their implications, so it can prune (or deemphasize) paths that don't yield good fruit, and it can probably rank them also.

Third, some forms of implementation inference should probably be stable and admissible into production code (one stable precedent is sketched after this list). Take default parameter bindings; why can't we have default implementations? Obviously, if someone changes the library to have a new default, then the program will break, but then the program has probably broken anyway.

Fourth, let's go even farther and think of this as Prolog. Perhaps any solution that matches the constraints the program has specified is "good enough."
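
For what it's worth, Haskell type classes already offer one stable, deterministic precedent for the third point: default method implementations (the Shape class below is a toy example of my own):

class Shape a where
  area     :: a -> Double
  describe :: a -> String
  describe x = "a shape of area " ++ show (area x)  -- the default

data Circle = Circle Double

instance Shape Circle where
  area (Circle r) = pi * r * r
  -- describe is filled in by the default, deterministically; the
  -- instance can still override it later, at its convenience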

I think we should explore static type systems other than ones that require code to be complete and deterministic from the beginning; those basically force programmers to think in completely typed chunks. But actually, I'm really against the dynamic/static hybrid type systems (e.g., latent typing in Dart) that are popping up as the "solution" to this problem. I think they have misidentified the problem and that we can do much better than that!

Essential properties

I'm perfectly fine with any form of inference, including default implementations. As long as it has two essential properties:

1. It is either fully deterministic, or guaranteed non-deterministic (i.e. different on *every single run*).

2. It's monotonic, i.e., the result is stable under context extension.

Everything else is a maintenance nightmare (ask e.g. some Coq programmers about heavy uses of its tactics language). In particular, given the following law of programming: stuff intended for "debugging purposes only" invariably ends up in production code. If not immediately, then eventually.

FWIW, I don't see what this has to do with dynamic typing, or with (*crossing fingers to exorcise the zombie terminology*) "duck typing".

For duck typing, if you use

For duck typing, if you use the object as a duck, it becomes a duck, and then you need an implementation for quack. It's not that the user has implemented the method already; it's that they need an implementation.

My interest is in comparing this to latent typing, where you iteratively go from a dynamically typed to a statically typed program. But I don't think typing is really the right way of looking at it; rather, I think we really want to go from an underspecified program to one that is completely specified. The examples don't match, for sure, but I think they got the examples wrong.

Types are not the problem, they are the solution

With LightTable reaching $300K on Kickstarter today, I believe that "types are not the problem, they are the solution" to the problem of exploratory/experiential programming with great, useful feedback in an IDE, in a language that doesn't force you to make decisions too early. So I think these projects are going in the wrong direction with dynamic languages; to better compete with them, we have to rethink what static typing is beyond mere discipline.

Types or Something Types Provide

I think perhaps you aren't looking for `types` but rather some features commonly coupled to types (other than 'safety') - description, constraints, modularization or packaging. Would you describe a service or a protocol by a type? or a grammar? or a language? Any answer might suffice, given sufficient support in the description language.

Types might be "a" solution to some problems, but they certainly aren't "the" solution. There are many designs that can achieve features attributed to types without representing types in the language, or at least not distinguishing types from modules.

Regarding "compete with these dynamic languages" - I can only say I'm loving dynamic types in Haskell. In Haskell, developers can safely convert between dynamic types and arbitrary static types. In the absence of cross-module dependent typing, dynamic types are useful for generic data plumbing, dependency injection, configurations, contexts.

I'm currently designing a dependency injection, configuration, and build model in Haskell that uses types to model dependencies. There are a lot of close parallels with what you are seeking from your object model:

  • The build context is a set of rules.
  • Each "rule" is a Haskell function from a set of dependencies to a set of dependencies.
  • Each dependency is described by a Haskell type (of class Data.Typeable).
  • Tuples are treated as sets of dependencies. Unit is the empty set of dependencies.
  • Two rules conflict if they intersect in the set of provided dependencies. That is, each dependency (each Haskell type) is built at most once.
  • For configuration, rules at least have a simple precedence model - i.e., you can add a rule with higher precedence than a default rule.
  • The requested product is a set of dependencies, usually a single type.

I'm modeling this as an Applicative (which allows static analysis that a solution is possible, unlike Monad). I've been thinking about how (and whether) to add soft or weighted constraints to this, i.e. to account for priorities, policies, and preferences.
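
A stripped-down sketch of the shape I mean, as a free-applicative-style encoding (reconstructed here for illustration; all names are invented):

{-# LANGUAGE GADTs #-}
import Data.Typeable (TypeRep, Typeable, typeRep)
import Data.Dynamic (Dynamic, fromDynamic)
import qualified Data.Map as M

-- Because there is no bind, the full set of required dependency
-- types is visible before anything runs.
data Rule a where
  Pure :: a -> Rule a
  Ap   :: Rule (x -> a) -> Rule x -> Rule a
  Need :: Typeable a => Rule a  -- request a dependency by its type

instance Functor Rule where fmap f = Ap (Pure f)
instance Applicative Rule where pure = Pure; (<*>) = Ap

-- Static analysis: which dependency types does this rule require?
requires :: Rule a -> [TypeRep]
requires (Pure _) = []
requires (Ap f x) = requires f ++ requires x
requires r@Need   = [typeRep r]

-- Run against an environment mapping each TypeRep to a value of that type.
run :: M.Map TypeRep Dynamic -> Rule a -> Maybe a
run _   (Pure a) = Just a
run env (Ap f x) = run env f <*> run env x
run env r@Need   = fromDynamic =<< M.lookup (typeRep r) env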

Anyhow, while I chose to model dependencies as types, that is because it works nicely in Haskell - a synergistic application of static and dynamic types. But types aren't essential to this sort of description and constraint model.

Good support for weakly

Good support for weakly specified programs is a path that can greatly improve productivity. Rather than abandon it for potential stability issues, tackle the issues. Make them obvious in the IDE, and explicit in the model. Consider use of weights - weighted constraints and weighted logics.

I still don't really get

I still don't really get what type classes do and why this involves inference. If they are ascribed, does it mean multiple type classes could be ascribed to the same expression and a best fit has to be selected? I've heard they can be used for type-directed code inference, but I can't really figure out whether this means something more than reified type parameters in templates. Definitely something I have to figure out.

The Mancoosi work looks interesting, but the feature I'm planning is more of a development aid: when you are prototyping a program, it would be nice if you could keep it "always running" by inferring what the programmer hasn't specified yet. I can't really see this as a feature for production code, and I never really believed in the generative programming techniques that their work seems to be based on. Maybe I'm being too closed-minded? It would be cool if I could eventually design a language feature that required a SAT solver in the compiler!

Two traits are incompatible (they can't be extended by the same object) if they implement the same method. Traits can define new slots (fields) and refine the types of those fields, making them more specific; this can cause a slot to extend traits that implement its abstract methods, or traits that introduce new abstract methods that the compiler must then solve for. We can't really express this as one bit vector to solve, since the slots beneath an object in the graph are capable of extending the same traits.

Type classes 101

Type classes are:

1. a means to globally define some symbols (`+`, `length`, monadic `bind`, `zero`) as overloaded, where the overloading resolution is determined by the type at the use site. For example:

class Monoid a where
   zero :: a
   plus :: a -> a -> a


defines overloaded symbols `zero` and `plus`, whose resolution is determined by some type `a`.

2. a means to register "instances" (possible resolutions) in a global knowledge base:

-- concrete instance
instance Monoid Int where
  zero = 0
  plus = (+)

-- parametric instance
-- provides instances for (List Int), (List Bool)...
instance Monoid (List a) where
  zero = []
  plus = (++)

-- instance constructed from simpler instances
instance (Monoid a, Monoid b) => Monoid (a,b) where
  zero = (zero, zero)
  plus (a,b) (a',b') = (plus a a', plus b b')

-- instances constructed from a dependency on another class
-- (which we assume defines `fromInt` and `add`)
instance Num a => Monoid a where 
  zero = fromInt 0
  plus = add

-- for all type operator f which satisfy Sequence,
-- for all type a, the type (f a) satisfies Monoid
instance Sequence f => Monoid (f a) where
  zero = empty
  plus = append

3. When using an overloaded symbol in the code, it is either resolved to a specific instance, if the type at the use site is known and has such an instance, or, instead of making an arbitrary choice, the instance resolution is delayed by abstracting over type class membership: `plus zero (1,2)` will use the pair instance at type `(Int, Int)`, while `big_sum list = fold plus zero list` will get the inferred type `Monoid a => List a -> a`.
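
Concretely, the delayed case yields a function that simply carries the unresolved constraint (the class is renamed here to dodge the Prelude's Monoid):

class Monoid' a where
  zero :: a
  plus :: a -> a -> a

-- No instance is chosen yet: the constraint is inferred and carried
-- along until some use site fixes the type a.
bigSum :: Monoid' a => [a] -> a
bigSum = foldr plus zero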

The resolution technique is roughly to try all instances whose head constructor matches the type of use (i.e., backtracking/search), resolve the dependencies, and use an agreed-upon conflict handling technique when several solutions are possible (fail with an error, define priorities...).

This is not the same as what you're doing; in particular, it is generally considered that there is at most one possible resolution at each type. In your example you could consider `void bar()` to be an overloaded symbol, but then instances `E` and `F` would be indistinguishable by their type alone; and instance `B` would not depend on just any class that provides `bar`, but specifically on the resolution coming from `E`.

To me, software package specifications, as defined by EDOS/Mancoosi, are a closer match: what your user has specified is what is "installed on the system". Traits are installable packages that provide features, depend on other features, and conflict with other traits providing the same feature (I don't think this needs to be explicitly specified). The question is: "how do I install a package providing feature A, with the constraint that I don't want to remove packages that I have already installed?" That's actually a quite natural problem in a package management scenario.

Thanks for the explanation,

Thanks for the explanation, this is quite useful! I don't think I like type classes, but this is because I'm in the camp that believes inheritance (implementation reuse) should mostly be conflated with subtyping.

I see how the Mancoosi work is related, but I'm looking at a finer-grained feature that is more language-level than what I think of as components. So it's like: what traits can I extend to resolve this abstractness, in relation to traits that are already extended by my objects? Put that way, it is very similar to a package management problem.

There is another use for this work that I cannot really discuss in a lot of detail yet. But we can use this technique to "annotate" a generic specification into a more specific specification. That is, the abstract methods are gaps that drive automated synthesis according to some seed context.