Polymorphic Algebraic Data Type Reconstruction

In Polymorphic Algebraic Data Type Reconstruction, Tom Schrijvers and Maurice Bruynooghe create some magic: in addition to normal type declaration inference, they infer ADT definitions. From the abstract:

One of the disadvantages of statically typed languages is the programming overhead caused by writing all the necessary type information: Both type declarations and type definitions are typically
required. Traditional type inference aims at relieving the programmer from the former.

We present a rule-based constraint rewriting algorithm that reconstructs both type declarations and type definitions, allowing the programmer to effectively program type-less in a strictly typed language.

Is the algorithm guaranteed to terminate? Well...

...theoretically there must be some programs for which it does not terminate. However, as with Henglein’s algorithm, we are not aware of any actual program for which it does not terminate.

We quote Henglein to emphasize that the termination of the inference is only an issue for programs that have problematic types:

     . . . why type checking is no more practical than type inference:
     there are constructible ML-programs that fit on a
     page and are, at least theoretically, well typed, yet writing
     their principal types would require more than the number of
     atoms in the universe. So writing an explicitly typed version
     of the program is impossible to start with.


What are the differences

What are the differences between the proposed type system and Ocaml's polymorphic variants (I've not read the paper yet)?

I could not be *more*

I could not be *more* interested in seeing one of those ML programs Henglein mentions. Can anyone provide a link to such a program?

It doesn't seem hard...

I wasn't exactly being rigorous but
this seems to do the trick.

(Edit: neelk below was rigorous. :) )

Here's how to do it.

The basic idea is that the size of the type of a program can grow much faster than the program itself.

First, consider the program:

   val delta : 'a -> ('a * 'a) 
   let delta x = (x, x)

Note that if you apply delta to a term whose type has size N, you'll get back a term whose type has size 2N. So repeated application of delta lets us grow the size of the type exponentially, and by composing each function with itself we double the number of delta applications at every step, giving doubly-exponential growth in the size of the inferred type:

   let delta x = (x, x)  
 
   let f1 x = delta (delta x)

   let f2 x = f1 (f1 x)

   let f3 x = f2 (f2 x)   (* This function has a HUGE type *)

The reason the program's type can grow so quickly is because there's no form of abstraction in the type language, so common structure must be repeated over and over again.

This program is basically the reason why ML type inference takes exponential time in the worst case. If you hash-cons the type data structure, you can knock off an exponential growth factor, but since we grew the type doubly-exponentially, just writing down the type has to take at least exponential time!
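To see the growth concretely, here is a small OCaml sketch (the names `ty`, `size`, and `result_ty` are mine, not from the thread) that models inferred types as trees and measures how big the result type of each f_n gets:

```ocaml
(* Model inferred types as trees: a type variable or a pair. *)
type ty = Var | Pair of ty * ty

(* Number of nodes in the written-out type. *)
let rec size = function
  | Var -> 1
  | Pair (a, b) -> 1 + size a + size b

(* delta at the type level: 'a becomes 'a * 'a. *)
let delta t = Pair (t, t)

let rec iterate f n x = if n = 0 then x else iterate f (n - 1) (f x)

(* f_n applies delta 2^n times, so its result type has 2^(2^n)
   leaves and size 2^(2^n + 1) - 1: 3, 7, 31, 511, ... *)
let result_ty n = iterate delta (1 lsl n) Var

let () =
  List.iter
    (fun n -> Printf.printf "f%d result type size: %d\n" n (size (result_ty n)))
    [0; 1; 2; 3]
```

In particular, f3's result type already has 511 nodes, which matches the page-long type printed in the elaborated version below.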

The reason the program's

The reason the program's type can grow so quickly is because there's no form of abstraction in the type language, so common structure must be repeated over and over again.

What sort of abstractions in the type language would alleviate the problem?

let binding in types

What sort of abstractions in the type language would alleviate the problem?

Maybe there's a better one, but

pair : let f = \a:* . (a,a) in 'a -> f 'a
pair x = (x,x)

would do.

Elaborated...


(* val delta : 'a -> 'a * 'a *)
absval delta : let f = \a:* . (a,a) in 'a -> f 'a
let delta x = (x, x)

(* val f1 : 'a -> ('a * 'a) * ('a * 'a) *)
absval f1 : let f = \a:* . (a,a) in 'a -> f (f 'a)
let f1 x = delta (delta x)

(* val f2 :
  'a ->
  ( (('a * 'a) * ('a * 'a)) * (('a * 'a) * ('a * 'a)) ) *
  ( (('a * 'a) * ('a * 'a)) * (('a * 'a) * ('a * 'a)) )
*)
absval f2 : let f = \a:* . (a,a) in 'a -> f (f (f (f 'a)))
let f2 x = f1 (f1 x)

(* val f3 :
  'a ->
  ((((((('a * 'a) * ('a * 'a)) * (('a * 'a) * ('a * 'a))) *
      ((('a * 'a) * ('a * 'a)) * (('a * 'a) * ('a * 'a)))) *
     (((('a * 'a) * ('a * 'a)) * (('a * 'a) * ('a * 'a))) *
      ((('a * 'a) * ('a * 'a)) * (('a * 'a) * ('a * 'a))))) *
    ((((('a * 'a) * ('a * 'a)) * (('a * 'a) * ('a * 'a))) *
      ((('a * 'a) * ('a * 'a)) * (('a * 'a) * ('a * 'a)))) *
     (((('a * 'a) * ('a * 'a)) * (('a * 'a) * ('a * 'a))) *
      ((('a * 'a) * ('a * 'a)) * (('a * 'a) * ('a * 'a)))))) *
   (((((('a * 'a) * ('a * 'a)) * (('a * 'a) * ('a * 'a))) *
      ((('a * 'a) * ('a * 'a)) * (('a * 'a) * ('a * 'a)))) *
     (((('a * 'a) * ('a * 'a)) * (('a * 'a) * ('a * 'a))) *
      ((('a * 'a) * ('a * 'a)) * (('a * 'a) * ('a * 'a))))) *
    ((((('a * 'a) * ('a * 'a)) * (('a * 'a) * ('a * 'a))) *
      ((('a * 'a) * ('a * 'a)) * (('a * 'a) * ('a * 'a)))) *
     (((('a * 'a) * ('a * 'a)) * (('a * 'a) * ('a * 'a))) *
      ((('a * 'a) * ('a * 'a)) * (('a * 'a) * ('a * 'a))))))) *
  ((((((('a * 'a) * ('a * 'a)) * (('a * 'a) * ('a * 'a))) *
      ((('a * 'a) * ('a * 'a)) * (('a * 'a) * ('a * 'a)))) *
     (((('a * 'a) * ('a * 'a)) * (('a * 'a) * ('a * 'a))) *
      ((('a * 'a) * ('a * 'a)) * (('a * 'a) * ('a * 'a))))) *
    ((((('a * 'a) * ('a * 'a)) * (('a * 'a) * ('a * 'a))) *
      ((('a * 'a) * ('a * 'a)) * (('a * 'a) * ('a * 'a)))) *
     (((('a * 'a) * ('a * 'a)) * (('a * 'a) * ('a * 'a))) *
      ((('a * 'a) * ('a * 'a)) * (('a * 'a) * ('a * 'a)))))) *
   (((((('a * 'a) * ('a * 'a)) * (('a * 'a) * ('a * 'a))) *
      ((('a * 'a) * ('a * 'a)) * (('a * 'a) * ('a * 'a)))) *
     (((('a * 'a) * ('a * 'a)) * (('a * 'a) * ('a * 'a))) *
      ((('a * 'a) * ('a * 'a)) * (('a * 'a) * ('a * 'a))))) *
    ((((('a * 'a) * ('a * 'a)) * (('a * 'a) * ('a * 'a))) *
      ((('a * 'a) * ('a * 'a)) * (('a * 'a) * ('a * 'a)))) *
     (((('a * 'a) * ('a * 'a)) * (('a * 'a) * ('a * 'a))) *
      ((('a * 'a) * ('a * 'a)) * (('a * 'a) * ('a * 'a)))))))
*)
absval f3 : let f = \a:* . (a,a) in 'a -> f (f (f (f (f (f (f (f 'a)))))))
let f3 x = f2 (f2 x) 

One might want to double check.

I suppose one could also introduce letrec, etc. Has much work been done towards such an end? Has it proven useful?

"as"

OCaml has "as"-binding for types, which can be used for recursive (and non-recursive) bindings. It's actually not only useful in OCaml, it's essential. OCaml types may include unnamed row variables, as in
< m : t ; .. >

which denotes an object type with a method m and possibly some other methods. If you want to say that some other object type has the same (unknown) collection of methods as this one, then you need to somehow refer to these "other methods", which you can do using "as". For example, the identity function at this type might be written
(< m : t ; ..> as 'a) -> 'a

You can also use "as" to give a much shorter type to neelk's function, but OCaml infers (or, more likely, prints) the expanded version by default.
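For a concrete (if contrived) illustration, here's a small OCaml sketch; the method names are my own invention:

```ocaml
(* "as" names the whole object type, row variable included, so the
   result is known to have every method the argument has. *)
let id (x : < m : int ; .. > as 'a) : 'a = x

let o = object
  method m = 1
  method extra = "still here"   (* part of the row the .. stands for *)
end

(* Because id returns 'a, the extra method survives the call. *)
let () = print_endline (id o)#extra
```

Had id's type been written without "as", say as `< m : int ; .. > -> < m : int ; .. >`, the two rows would be unrelated and the result type would only be known to have method m.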

Sometimes I have to ask...

Out of all the things that programming-languages research has shown us *are* possible, why do we want to do this one? I consider type definitions a *feature* of programming languages that allows me to define structures other than pairs/lists/trees, not a hindrance or hassle to avoid.

Algorithms don't kill people

why do we want to do this one? I consider type definitions a *feature*

But, like type declaration inference, there are many ways to use it. Just because it's available doesn't mean you need to leave all type definitions to it.

Or here's another use case. Imagine you're using the SuperHaskell3000 IDE, you rapidly type a bit of code, then click "Refactor -> Type Definitions -> ADTs -> Infer" (or the shortcut chord C-M-S-Tab-x-}-PrtScn) and bam, you've got a type definition that matches the code you wrote. Maybe it doesn't quite match what you think you want, so you make a small alteration to the generated type definition and see where the code is broken by your new model.

Refactoring-driven programming

Or as a variant on James' suggestion, the SuperHaskell3000 IDE could suggest how to change your ADT definitions when you broke code that used them. You add a new clause to a case analysis, the IDE simultaneously suggests a new constructor for your ADT. Transactional editing, of sorts.

Re: Refactoring-driven programming

The danger with that style of programming is that you make "well-typed mistakes". Trying to get code out fast, you follow the suggestions given by the editor and end up with code that seemingly works but has the wrong semantics.

Think of the clippy thingie in Microsoft Word: you'd think it tells you the "right thing" to do but in fact it just follows some heuristics and you want to press Cancel very fast.

I'm not sure

Programming languages are a lot more constrained than natural languages. I realize there are dangers here, but I really think we should expect to see smarter and smarter IDEs, and expect to see more code written in deeply collaborative interaction with our tools. I really believe this is the future.

Semantic editing

As you say, the situation with programming languages is much better than with natural languages and the Office paperclip tool, which was always doomed to be somewhere between annoying and dangerous.

I guess it boils down to there often being a close connection between describing a type error and proposing a solution. In many situations a canonical description of an error is essentially a description of some minimal change to the program that would fix it, and in those cases the description might as well be executable.