## Type inference and union types

Hi,

Please be gentle with me, this is my first post :-)

I am currently writing my master thesis (CS) on the topic of adding type inference to the formal specification language VDM++.

VDM++ is an object-oriented, formal specification language, typically used to model mission-critical systems. The syntax is mathematically oriented, using a lot of abstract constructs such as sets, sequences, maps and unions as well as more widespread language constructs.

If anyone is interested in the language spec, it can be found here:
http://www.vdmbook.com/langmanpp_a4.pdf

Anyway, to my question...
I am looking for some pointers to literature on the use of union types, specifically the problems which union types may cause in connection with type inference. In VDM++, unions correspond to set-theoretic unions of two or more types which may be disjoint or non-disjoint.

There seems to be plenty written on intersection types, but union types and ways to statically handle these seem quite scarce.

## Comment viewing options

### Take a look at Joshua

Take a look at Joshua Dunfield's paper Tridirectional Typechecking; it's about typechecking intersection and union types.

### i think there may be some

i think there may be some stuff at BU's church project that relates to this too, but I can't think of anything in particular off hand

### Thanks for the pointers

Dunfield's paper looks promising. Thanks.

### Wikipedia article

According to the wikipedia article on union types:
http://en.wikipedia.org/wiki/Union_type

"Most type inference algorithms cannot cope with untagged union types."

I am curious as to why this is the case. Would it not be possible to infer the most general type of all the types in the union, even if this would be the "TOP" type ?

### Not sure, but I suspect that isn't strictly true

At the time of type checking we may freely propogate information about the type of a union without worrying about what its compiled representation will be, so it doesn't make sense to me to say that the inference algorithm can not cope with untagged union types.

Perhaps what they mean is that compilers are often unable to prove which variant of a type an element of a union type is, and so the compiler is unable to remove type tags. This interpretation seems to make more sense.

Gavin

### The problem...

As I see it, the problem with untagged unions and just having the type inferencer generalize as it goes is that you end up with an inferred type which doesn't actually prove anything useful about your program. In other words it may be possible, but it's certainly completely useless.

### Not entirely useless

Type inference with untagged unions is not neccessarily completely useless. If a program contains any explicit annotations, then the inferred type can be compared with the explicit type in order to check well-typedness of a phrase.

### True dat.

I'm just too used to type inferece without any annotations... Still, I think a weaker version of my statement still applies: You may end up diluting the static guarantees to a point where their value is dubious :).

### Unions and type lattices

The way I see it, union types mess up the type lattice. If you consider the union of a bool and a real (bool|real), then this is an artificial least upper bound in the type lattice. Bool and real would never usually have a common least upper bound, other than the TOP type. In this way, any two (or more) types can effectively get a new artificial LUB. In principal the inference algorithm can keep inferring new union types, eventually approaching the TOP type since the union could include all the types in the type system.

Is this interpretation way off ?

### Don't use Wikipedia

"Most type inference algorithms cannot cope with untagged union types."

I am curious as to why this is the case.

Well is it really the case? Which type inference algorithms can or can not cope with untagged unions? Is it really "most"? How many type inference algorithms did they survey? That's the problem with Wikipedia, you should never trust what they say, because they don't cite their references.

I suggest you use Citeseer to get your facts from reputable sources.

Alternatively, my programming language research engine revealed several interesting papers which seem to contradict the wikipedia article when I fed it the terms type inference untagged union. I hope this is helpful.

### Slightly OT

That's the problem with Wikipedia, you should never trust what they say, because they don't cite their references.

That's not entirely true. Wikipedia policy is that verifiable references should be provided for everything. Good articles are generaly pretty well-referenced, and an article can't reach "featured article" status these days without being comprehensively referenced. But people still have a tendency to add uncited material. You can help fix this problem: if you see something that's uncited, either tag it with the {{cite needed}} template, or (if it's blatantly wrong) just remove it. Alternatively, if you see something uncited for which you know of a good reference, add the reference. Wikipedia is, after all, editable by anyone (for good or ill).

### Does not mix with polymorphism

Untagged unions make type inference combinatorically explode, because you can never unify two type informations - they may always be unioned instead of being equal.

However, the real problem is that untagged unions do not mix well with polymorphism - independent from type inference. And most type inferenced languages value polymorphism pretty high.

Just consider:

f [T] (x : String+T) = case x of
x1:T -> ...
x2:String -> ...

What is the meaning of f if you instantiate T with type String? Or with String+something? Any answer will be like a hack.

### What is a hack?

I don't think this is a typing problem, just say A + A = A.

The behavior of your example depends on how case is implemented. Either you require the cases to be disjoint, then this is an error, or either the first or the last match wins.

You may loose a few nice properties simpler type-systems may have, but that doesn't mean it is a hack imho.

### Branching on Type?

It seems to me undesirable to allow a computation to ask if a value is of a particular type, and that with sufficiently expressive types such a question will often be undecidable or intractable. So from my viewpoint, branching should be done on 'tags' and not on types (and algebraic types are 'tagged').

(If I've mis-used terminology, or if there is a better way of expressing this viewpoint, please correct me - I'm still working my way through the literature. Also, this is my first post. Hello.)

### Hacks, IMO

[Also meant as an answer to Thomas_C]

The behavior of your example depends on how case is implemented. Either you require the cases to be disjoint, then this is an error,

Which basically means that you cannot have polymorphic sums, and the type language is not compositional.

or either the first or the last match wins.

Which means that polymorphic abstractions might silently break if you instantiate them with the "wrong" type.

On a related note, untagged unions are less expressive than tagged ones, because you cannot have alternatives with the same type - which is often needed (just consider abstract syntax trees).

You can work around the latter problems by using auxiliary (nominal) types in which you wrap the alternatives. But then you have just implemented tagged unions the tedious way, without all the nice additional checks from the compiler you get with the real thing.

You may loose a few nice properties simpler type-systems may have

Indeed, in particular the type language is no longer algebraic. That can break quite a few things. Effective type inference is one of them.

Branching on (polymorphic) types is also likely to be vastly more expensive.

### Does not mix with polymorphism ?

"However, the real problem is that untagged unions do not mix well with polymorphism - independent from type inference."

Andreas >> Could you elaborate on this ?

Could anyone elaborate on this ? (edit)

------------------------
The language I am working with (VDM++) features both higher order polymorphic functions and the notorious untagged union types.

(As well as arbitrarily complex type invariants, but thats another story altogether)

### Type invariants

Thank you everyone for the feedback on union types.

VDM types also allow arbitrary invariants on type definitions.
Example 1:

SmallNat = nat
inv s == s < 100

This reads : Define SmallNat to be of base type nat (0,1,2,...), with the additional constraint that values must be less than 100.

Example 2:

instance variables
day : Day
month : Month
year : Year
inv if month in set {4,9,6,11}
then day else (month = 2) => (day

types
Day = nat1
inv d == d <= 31;

Month = nat1
inv m == m <= 12;

Year = int;

Where nat1 is a built in type ranging from {1,2,3,...}

I haven't come across any languages where these sorts of arbitrary type invariants are allowed. Doing a little digging, I found that Mercury (http://www.cs.mu.oz.au/research/mercury/) has a notion of "solver types" that seem similar to this.

I also came across "conditional types" and "constrained types". Does anyone know whether these notions are similar to the type invariants in VDM ?

Has anyone come across anything similar ?

These would present somewhat of a problem for a type inference algorithm, since the invariants may be of arbitrary complexity and thus cannot necessarily be evaluated at compile time.

### Has anyone come across

Has anyone come across anything similar ?

These would present somewhat of a problem for a type inference algorithm, since the invariants may be of arbitrary complexity and thus cannot necessarily be evaluated at compile time.

The name for this idea in its most general form is "dependent types", in which types can include arbitrary computations, possibly involving run-time values. You're right in noting that it presents a problem not just for type inference, but for type checking as well. There are a number of problems that can occur, the most obvious of which are:

• Compile-time type checking might not terminate, since the type language can express non-termination.
• The phase distinction between compile- and run-time may be blurred or lost completely, since some values needed for type checking may not be available until run-time.

There are a number of possible ways to deal with these. We might restrict the language of types in some way that allows us to guarantee termination and a traditional phase distinction.

Alternatively, we might sacrifice the phase distinction completely and treat types more like assertions, and possibly attempt to recover some static checking through, for example, partial evaluation. The problem with this approach is that it's hard for the programmer to understand and predict what guarantees the compiler can make about a particular program, and what checks will need to be deferred until run-time. The partial evaluation techniques required are also an active area of research.

This is only my own off-the-cuff survey of the state of the art here, and I'm sure there are omissions and mistakes. I hope someone more knowledgeable will correct them.

### You got it backwards,

You got it backwards, actually.

Having a phase separation means that you never need to evaluate a program term in order to decide whether a program is well-typed.

If the language lacks a phase distinction, then typechecking may require evaluating some program expressions at compile-time. For example, consider a program (in a made up language) like:

(* This is a program *)

fact :: Nat -> Nat

fact 0     = 1
fact n + 1 = (n + 1) * fact(n)

(* This is a type declaration that uses the fact function.
If fact of the argument is large, it returns a string, and
if it's small it returns a boolean. *)

foo :: m:Nat -> (if fact(m) > 1000 then String else Bool)

foo n =
if fact(n) > 1000 then
"a string"
else
true


But running the program at run time still might not require keeping any type information around.

### Yes

I thought that was what I meant to say, unless I'm misunderstanding you. Thanks for the clarification and example!