Type Inference in Object Oriented Languages

Is it possible to do type inference of an object oriented language (such as Java) without any type annotations? I've never seen it done and I'm curious about why that is. During my limited exposure (to type inference in general) I've been taught that it is possible to do type inference of rank 1 polymorphic functions but the general case (rank 2 and up) is undecidable without type annotations. How does this relate to the type of polymorphism used in object oriented languages?

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Subtyping

The biggest problem with OO is not type inference itself, but subtyping with type inference. OO rely a lot on subtyping, which is difficult to handle by using classic unification. You can have a look at haXe that implements both type inference and subtyping.

I'm not sure it's difficult

I'm not sure it's difficult to handle with something very close to classic unification (you're potentially going to substitute type constructors in various positions for their supertypes, which might require some reworking of the theoretical framework but is a piece of cake to implement).

No, its not that hard,

No, its not that hard, unless you include multiple inheritance. At least in my experience, expanding unfication to handle multiple inheritance creates problems simmilar to race conditions in that the "unification" function can't be asynchronic anymore. But perhaps where are other ways to handle the whole thing than those I thought of.

It is

I'm not sure I'm understanding what you are saying, but you cannot just arbitrarily go to supertypes, because of co/contravariance issues.

Unification only is applicable to type inference because you classically just solve type equations. With subtyping, you only ever have inequations, and generally no way to simplify them. So you cannot derive simple type terms as principal solutions. And it kills you complexity-wise.

How difficult?

So what would the time (or space) complexity of solving these inequations be? Is it a decidable problem?

Finding out whether the

Finding out whether the inequations have a solution isn't usually all that bad in practice, at least for type systems that are basically ML plus subtyping.

The fatal problem is that there is no "most general" solution to them. You really, really need a most general solution in order to simplify the inequations into a form that is convenient for a programmer to read. So the types you get out of OO type inference are these big bags of type constraints, whose shape depends on the precise details of how you implemented type inference, and they are basically incomprehensible.

Variance / Subtyping

Yes, that's one of the big difficulties : keeping variance informations distinct from types.

Let's say you have :

function id(x) { return x }

In classical ML, the type is 'a -> 'a but in OO subtyping, it's [>'a] -> [<'a]. Also in presence of subtyping, determining the most general type is quite difficult, and leads to complex (sometimes unreadable) error messages.

O'Haskell might be a

O'Haskell might be worth a quick glance.

Scala is an OO language that

Scala is an OO language that supports type inference (ETM).

How stupid was I?

How stupid was I?

Subclass/prototype polymorphism?

I know that I have run into headaches with Object Pascal's lack of subtype polymorphism.

Imagine that you have this taxonomy:


// interface
-------------------------------------------
type TItemClass = class
public
SomeData : String;
.
.
.
end;
-------------------------------------------
type TContainerClass = class
private
FList : TObjectList; // container for objects
FCurrentObj : Integer; // index of currently referenced item
public
function CurrentObject : TItemClass;
end;
-------------------------------------------
Type TItemClassA = class(TItemClass)
ClassASpecificMember : String;
end;
-------------------------------------------
Type TItemClassB = class(TItemClass)
.
end;
-------------------------------------------
Type TContainerClassA = class(TContainerClass)
procedure DoSomething;
end;
-------------------------------------------
Type TContainerClassB = class(TContainerClass)
...
end;
-------------------------------------------
etc...
.
.
.
-------------------------------------------
// Implementation
Function TContainerClass.CurrentObject : TItemClass;
begin
result := FList[fCurrentObj] as TItemClass;
end;

Now, imagine that TContainerClassA will only ever contain instances of TItemClassA, and TContainerClassB will only ever contain instances of TItemClassB, and so on.

The problem comes in when you try and use the CurrentObject function: The compiler only, and only ever, returns the object as the parent class (TItemClass) - not the true subtype of the item, as the RTTI should identify it.

Thus you cannot do the following:


-------------------------------------------
procedure TContainerClassA.DoSomething;
begin
print(CurrentObject.ClassASpecificMember); // compile-time error
end;

The only way to do this is to either typecast the CurrentObject reference EVERY time us use it within TContainerClassA, for example:


-------------------------------------------
procedure TContainerClassA.DoSomething;
begin
print(TItemClassA(CurrentObject).ClassASpecificMember); // compile-time error
end;

or else, override the CurrentObject method for each subclass of TContainerClass, i.e.


-------------------------------------------
type TContainerClassA = class(TContainerClass)
.
.
function CurrentObject : TItemClassA; reintroduce;
end;
-------------------------------------------
function TContainerClassA.CurrentObject : TItemClassA;
begin
result := FList[fCurrentObj] as TItemClassA;
end;

This approach solves the need to cast the returned object of CurrentObject every time you you use it in your code, but the need to override it in every subclass defeats the purpose of having it declared in the base class in the first place.

The solution would be for the compiler to do proper evaluation of the members accessed, and use the least general type which still allow unambigous compilation. Or else defer evaluation and polymorph ("quack-ify(tm)") the type according to RTTI.

i.e.

print(CurrentObject.ClassASpecificMember); // member is unambigous so use the correct type, dammit.

Self, Starkiller

Check the Cartesian Product Algorithm for type inferencing and static compilation of dynamically checked OOP languages as used in Self and Starkiller.

Nemerle has type inference

See Nemerle. Depends how much you want inferenced though, as Nemerle still requires type annotations at the method level.

Chuck, Type inference for Squeak using the DDP algorithm

There's also Chuck, which employs a Demand-Driven, subgoal Pruning (DDP) algorithm to do type inference for Squeak.

From the site:

To my knowledge, DDP is the first type inference algorithm that is effective on larger programs (100+ kloc) in Smalltalk-like languages, while still giving context-sensitive results.

See Lex Spoon's dissertation or his papers on the subject for more info.