## Constraint-Based Type Inference for Guarded Algebraic Data Types

Constraint-Based Type Inference for Guarded Algebraic Data Types

Quite a mouthful, but look:

Guarded algebraic data types, which subsume the concepts known in the literature as indexed types, guarded recursive datatype constructors, and phantom types, and are closely related to inductive types, have the distinguishing feature that, when typechecking a function defined by cases, every branch must be checked under different typing assumptions. This mechanism allows exploiting the presence of dynamic tests in the code to produce extra static type information.

I still have to grok this idea of "dynamic tests producing extra static type information" in its entirety, though.

## Comment viewing options

### It's a pretty simple idea at root, though verbose to explain

First, look at this signature for a small language with arithmetic and boolean expressions:

module type EXPR =
sig
(* The type of expressions. It has an extra type parameter, so that
we can record the type of an expression. *)
type 'a expr

(* For example, lit produces int expr-essions, and plus takes two
int expr's and produces a new one. *)

val lit : int -> int expr
val plus : (int expr * int expr) -> int expr

(* Likewise, t and f stand for true and false in our little
language, and we give them the type "bool expr".  *)

val t : bool expr
val f : bool expr
val and' : (bool expr * bool expr) -> bool expr

(* Because "int expr" and "bool expr" are different types, you can
never put in an int-producing expression in a context where a
bool-producing expression is wanted, and vice versa. *)

val is_zero : int expr -> bool expr
val ifthen : (bool expr * 'a expr * 'a expr) -> 'a expr

(* Let's give some functions to evaluate expressions, too. *)

val eval_int : int expr -> int
val eval_bool : bool expr -> bool
end

module Expr : EXPR =
struct
(* This is the private, implementation type of expressions. It's
just an ordinary abstract syntax tree as a datatype, with no special
features at all. *)

type t =
| Lit of int
| Plus of t * t
| True
| False
| And of t * t
| If of t * t * t
| IsZero of t

(* We equate 'a expr to t here. The 'a never shows up in t, which
leaves it free for us to stuff additional type constraints
into. This trick is often called the "phantom type" encoding. It's a
phantom, because the 'a in 'a expr never shows up in the definition
of t. *)

type 'a expr = t

(* The implementations of the creator functions are just the obvious
wrappers around the constructors. A polymorphic type is inferred for
them, but the module signature restricts them to the more limited
signatures that we want. *)

let lit n = Lit n
let plus (a, b) = Plus(a,b)

let t = True
let f = False
let and' (a,b) = And(a,b)

let is_zero e = IsZero e
let ifthen (test, cons, alt) = If(test, cons, alt)

(* Now, we want to evaluate an expression. Since an expression can
produce either an int or a bool, we create a datatype with the
alternatives, and then write the recursive eval function. *)

type value = Int of int | Bool of bool

(* The eval function does the obvious thing pretty much
everywhere. Notice that none of the assert false branches can ever
be tripped if you build an expression using the constructors in the
EXPR interface. *)

let rec eval e =
match e with
| Lit n -> Int n
| Plus(a, b) ->
(match (eval a, eval b) with
| Int a', Int b' -> Int (a' + b')
| _ -> assert false)
| True -> Bool true
| False -> Bool false
| And(a, b) ->
(match (eval a, eval b) with
| Bool a', Bool b' -> Bool (a' && b')
| _ -> assert false)
| If(test, cons, alt) ->
(match eval test with
| Bool b -> if b then eval cons else eval alt
| _ -> assert false)
| IsZero e ->
(match eval e with
| Int 0 -> Bool true
| Int n -> Bool false
| _ -> assert false)

(* Once you have the evaluator, you can specialize the type of
expressions it takes and then throw away the extra tag. *)

let eval_int e =
match eval e with
| Int n -> n
| _ -> assert false

let eval_bool e =
match eval e with
| Bool b -> b
| _ -> assert false
end


Now, there are two problems with this module. First, there's a whole lot of unnecessary match...assert false branches. Those branches will never be tripped in practice. Second, eval_int and eval_bool are basically identical; it should be possible to write a polymorphic function eval : 'a expr -> 'a.

In fact, with a well-typed expression, the following definition of eval is safe:

let rec eval e =
match e with
| Lit n -> n
| Plus(a,b) -> (eval a) + (eval b)
| True -> true
| False -> false
| And(a,b) -> (eval a) && (eval b)
| If(t,c,a) -> if eval t then eval c else eval a
| IfZero e' -> (eval e') = 0


This is a perfectly safe function of type 'a expr -> 'a (assuming you construct an expression using the safe interface), but would be forbidden by the ML type system because the branch Lit n -> n has type int and the branch True -> true has type bool.

Guarded types let you write functions like this, by specifying phantom-type-like constructors for a datatype, and using that to flow different type information into each of the branches of the pattern match expression. So if you have an 'a expr, you know when you see a Lit n that 'a = int, and so on.

### Thanks

Oh, thanks.

You could have just pointed in the direction of wobbly types paper...

As this one, it was already cited on LtU, but somehow I missed them both.

### This is a good paper. Folks i

This is a good paper. Folks interested in follow-up and more specifics, particularly pragmatics, should probably read An extension of HM(X) with bounded existential and universal data-types and Type inference with structural subtyping: A faithful formalization of an efficient constraint solver also.

There's an oh-my-God intersection between "An extension of HM(X) with bounded existential and universal data-types" and Wobbly types: type inference for generalised algebraic data types, posted here earlier. From the latter:

Our focus is on type inference rather than checking, unlike most previous work on GADTs. The exception is an excellent paper by Simonet and Pottier, written at much the same time as this one, and which complements our work very nicely [SP03]. Their treatment is more general (they use HM(X) as the type framework), but we solve two problems they identify as particularly tricky. First, we support lexically-scoped type variables and open type annotations; and second, we use a single set of type rules for all data types, rather than one set for "ordinary" data types and another for GADTs. We discuss this and other important related work in Section 6.
SP03 is "An extension of HM(X) with bounded existential and universal data-types," of course.

Couple these with concepts from Ontic, which was mentioned on the classic Lambda site, in which there's no syntactic distinction between types and terms, and perhaps you begin to approach a language that has both the features of modern type theory and the syntactic convenience of untyped languages.