The Type-System-Feature-Creep Death Spiral

(This is my first post on LtU, so please tell me if I do anything wrong!)

I've been chewing on this question for a while: How did Glasgow Haskell's type system get so complex? Miranda had a relatively simple Hindley-Milner-style type system. It could do a lot, so people started applying it to more complex tasks, and eventually bumped up against limitations. Haskell added type classes, and they overcame many of these limitations. After a few more iterations of this, we have multi-parameter type classes with functional dependencies. Similar extensions have occurred across the language, though few as dramatic as the type class situation.

Haskell's extended type system is Turing-complete. This is rather disturbing, because now Haskell is not one language, but two; one on each side of the :: sign.

Would adding basic data types like lists and integers to the type system be a useful extension? Probably; it would assist with doing compile-time computation, something that many Haskellers have been (ab)using their type system for. How about adding I/O capabilities to the type system? Could come in handy. Who's to say what you can and can't do with types?
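
To make the kind of compile-time computation I mean concrete, here is a small sketch (my own, in GHC Haskell with the usual extensions, not taken from any library) of Peano numerals encoded as types, with addition carried out by the type checker through instance resolution:

{-# LANGUAGE EmptyDataDecls, MultiParamTypeClasses, FunctionalDependencies,
             FlexibleInstances, UndecidableInstances #-}

-- Natural numbers at the type level.
data Zero
data Succ n

-- Addition, computed by the type checker: the functional dependency
-- says the first two types determine the third.
class Add a b c | a b -> c
instance Add Zero b b
instance Add a b c => Add (Succ a) b (Succ c)

-- Nothing happens at run time; the result type is worked out entirely
-- during compilation, e.g. the type of
--   add (undefined :: Succ Zero) (undefined :: Succ Zero)
-- is Succ (Succ Zero).
add :: Add a b c => a -> b -> c
add _ _ = undefined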

The conclusion, chillingly, is that carrying the type system idea to its logical conclusion would leave us with two languages, each with the same capabilities. Each time you added something to one language, you would want to add it to the other. You would have to update the compiler in two places. Any language design effort would be doubled.

This is an extreme example of what could happen, of course, but it definitely makes you ask: Is there a better way? Lisp has shown us that preprocessing code with code in the same language is a big win, both in terms of simplicity and expressive power. I propose that we do the same thing with the type system, and, in fact, that a type system will not be satisfactory until it is the same as the language it types.

I wish I had a paper to present on this; I don't. It just hit me an hour or so ago, and I need feedback on the idea before I can write a paper. So: what do you think? Am I totally off-target? Have you been thinking along the same lines, and, if so, have you got any more specific ideas?

I think you're trying to rein

I think you're trying to reinvent dependent types (types dependent on terms - so you get your type-level list by just using an ordinary list, for example). And yes, I do think they're the way to go - faking them with GHC-Haskell gets a little too painful.

What About Polymorphism?

I'm sorry, I forgot to mention this: I'm a dynamic-typing kind of person; that is, watertight type-checking at compile-time isn't very important to me; without polymorphism, I wouldn't bother with a type system at all. (With a sufficiently expressive type system, I wouldn't be opposed to static checking, though!)

While dependent types look very cool and solve a lot of unsolved problems, they don't seem to help with polymorphism. (I haven't read the entire paper, so please tell me off if they do!)

Allow me to clarify what I mean: The sort of thing I envision is being able to implement, say, type classes, as a sort of "type system macro," in a library, where they can be changed. This would make type system research easier, because you could just change the library, instead of hacking GHC; this would also keep feature creep out of the compiler.

Aside from this, I envision real-world programmers overcoming type system limitations with these "type system macros," a bit like Lisp programmers overcome limitations of their runtime language using macros.

partial evaluation

Imagine programming in an untyped language. Eek! You try to make it typesafe, at least dynamically, by creating structures with a type-tag and a value. You program with these and your own versions of the language primitives that always check the type tags and report type errors at run time.
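
A minimal sketch of that tagging discipline, written in Haskell for concreteness (the names are my own, hypothetical):

-- Every value carries a type tag; the replacement "primitives" check
-- the tags and report type errors at run time.
data Value = VInt Int | VBool Bool
  deriving Show

addV :: Value -> Value -> Value
addV (VInt a) (VInt b) = VInt (a + b)
addV _        _        = error "run-time type error: (+) expects integers"

ifV :: Value -> Value -> Value -> Value
ifV (VBool c) t e = if c then t else e
ifV _         _ _ = error "run-time type error: condition must be a boolean"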

So far, so slow. So you run your source code through a partial evaluator. It will execute and eliminate much of your type checking code. Perhaps some will remain, to do with tagged unions. Perhaps other code will remain to handle potential run-time type errors.

If the consequence of partial evaluation is that the code for handling run-time type errors gets deleted as unreachable, you know that your code passes static type checking.

So I think you could put the static typing issue into the larger context of partial evaluation. Would the attempt to produce a sufficiently capable partial evaluator lead to fame and fortune or obscurity and ruin?

Make it partial computation

If you actually go and read Futamura's own papers on the topic (see his DBLP entries), you'll see he was already there in at least 1988, and probably before that. His paper on the prototype implementation WSDFU is available as D-489 from the TOPPS group.

The only caveat: you need both a full-fledged automated theorem prover and a full-fledged symbolic computation system to make it work! Much worse though is the fact that some of these ideas are encumbered by patents. Futamura has gotten some fame from his scientific work, but I somehow really doubt that his patents have made him a fortune.

Let me see if I get what you'

Let me see if I get what you're suggesting: The language has no intrinsic type checking, users implement it themselves, and the compiler moves type-checking forward to compile-time where it can.

This solution is pretty far from what I was picturing: In this solution, type-checking code is intermingled with the rest of the code; in the solution I was envisioning, type-checking is separate from run-time code.

That's not to say it's a bad idea; I think it's a great idea. It blurs the line between type-checking code and non-type-checking code; it becomes "what we can do at compile-time" and "what we can do at runtime."

More links

I started a similar thread on the subject of partially evaluated latent types which contains links to further reading.

Goes deep into part of the problem

Thank you for the links. Futamura's work is very advanced. He is transforming the seventy-one function

f(u) = if u > 70 then u else f(f(u+1)) fi

into

f(u) = if u > 70 then u else 71 fi

That is way more ambitious than seeing that if u is an integer you can carry out the recursions with type checks and just write in "integer" to the result's type tag.

He also focusses on strict, functional languages, which is perhaps a necessary restriction if you want to make such deep simplifications to the program. Less ambitious work might apply more widely. I imagine that one could get a long way on removing dynamic type checks simply by propagating the constants that are set by object initialisers.

Type Specialization

IIUC the type specialization technique could give this result.

Good Question!

You may find Tim Sweeney's post regarding David McAllester's "Ontic" system worthwhile. Ontic very definitely does represent one approach to unifying types and terms in a consistent framework. More of Tim's thinking about types, influenced by Ontic, can be found in this post.

The Ontic specification is one of those papers that I keep going back to repeatedly, as I learn more about type theory: it's actually very approachable to anyone who's willing to work through the material in, say, Pierce's "Types and Programming Languages", and I agree with Tim's assertion that "There's something fundamentally new happening here that hasn't been seen in strongly-typed programming languages: a general means of combining types and values into a single Turing-complete framework that gains the expressive power of functional, logic, and constraint logic programming."

Sweet!

This is exactly the sort of thing I had in mind. Thanks for the tip!

(Incidentally, I made a few false starts on a system a lot like this a while back; I'm glad to see my attempts weren't completely misdirected.)

Ontic Spec link

The solution is simple: get rid of type declarations...

...and use plain ordinary functions as type declarators. For example, a point type could be defined by this piece of code (in an imaginary language of mine):

let point(xval, yval) = tuple (x: xval, y: yval)

Then the point function could be used both as a type and a function. When used as a type, it would mean the result of the function for specific arguments. The compiler would be forced to evaluate the function, in order to calculate the type, thus solving the problem you describe.

For example:

let point1 : point(integer, integer) = point(0, 0)
let point2 : point(real, real) = point(0.0, 0.0)

This solution solves the problem you mention and also makes the type system richer. For example, arrays could be declared as functions over tuple concatenations:

let array(t, n) = tuple (t) :: array(t, n - 1)
let array(t, 0) = tuple ()
let n = array(integer, 10)
let matrix3x3(t) = array(array(t, 3), 3)

Primitive data types would also be functions, allowing them to be used as conversion functions from one primitive type to another for example:

let n = real(integer(string(3)))
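
For comparison, here is a rough GHC Haskell rendering of the array(t, n) idea using a closed type family (a sketch of my own, not part of the imaginary language above):

{-# LANGUAGE TypeFamilies, DataKinds, TypeOperators, UndecidableInstances #-}

import Data.Kind (Type)
import GHC.TypeLits (Nat, type (-))

-- A function from type-level arguments to a type, evaluated by the
-- compiler: Array t n is an n-tuple of t built by recursion on n.
type family Array (t :: Type) (n :: Nat) :: Type where
  Array t 0 = ()
  Array t n = (t, Array t (n - 1))

-- e.g.  Array Int 3  reduces to  (Int, (Int, (Int, ())))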

This is the goal, but whither termination?

Both Epigram and Ontic have this sort of goal, to unify functions and types. Epigram gets really close to unifying them, there's almost no difference between a type and a function. Ontic I haven't looked at yet.

But you have some problems there. One important goal of type systems is termination: you want to know that type checking will stop at some point. That is why type systems are usually kept from being Turing complete, because termination of a Turing-complete system cannot be guaranteed.

For that reason, Epigram is not Turing complete.

Hopefully some bright cookie will discover some way around this sort of problem.

naive question, but why should we be interested in termination?

We just want to extract the form of a value from a function; therefore, we compute the value at compile time (i.e. execute the function) and store the resulting form as a type. We are not interested in proving that the function terminates for all types of values, but only for the values at the point of usage.

For example, we may have the following function that searches a list for a specific number, then returns that number:

let findNumber(number, head::tail) = if number == head then number else findNumber(number, tail)
let findNumber(number, []) = nil

It's obvious that we can't prove the function terminates. But we aren't really interested in that. If we use the function as a type, we may write:

let someFunction(a : findNumber(10, [20, 10, 40])) = a + 1

In the above function, the function findNumber is executed with the given data in order to find the type of parameter a.

In the same line of thought, we use the function point (see my post above) either for real or for integer or for any other type, primitive or not.

Anyway, I am probably saying something stupid here, so please feel free to correct me.

More to the point:

What's the worst that happens if a (compile-time) typechecker fails to terminate? And what about type inference algorithms (or functions used to define dependent types) which terminate but are non-polynomial? What if I encode Ackermann's function using C++ templates? (Ignoring the eventual and quick overflow on practical hardware?)

Such problems may be annoying; but I'd rather have the program enter an endless loop or unbounded recursion at my desk than at the customer's...

That's what I am also saying.

Executing a function at compile time will make the compiler understand if the function terminates or not. There is no need to prove, by a mathematical theorem, that a function really terminates for all possible input.

No, making the compiler execu

No, making the compiler execute a non-terminating function will only result in the compiler running for ever (ie not terminating).

And furthermore, many functio

And furthermore, many functions have an infinity of possible inputs - so any time you're running them on unknown input (user-derived input, perhaps), exhaustive testing cannot possibly terminate!

The point...

...I believe, is that since it is executing *at compile time*, the *programmer* can decide whether the *build stage* is taking longer than he wants. Then, he can either decide that his compiler is crappy, or consider whether he managed to write a non-terminating type expression and try to debug his types. But since he won't get an executable to ship to his customer until he makes typechecking succeed, it's only his own time wasted.

Maybe I'm missing the point...

...but is there any difference between compile-time and run-time when you are talking about evaluating functions? Perhaps some functions are evaluated only at compile time and some only at run time, but in essence all you've done is stage the run-time.

So the question back is what difference does it make when the function is evaluated. Evaluating a function has the same problem no matter what stage you want to evaluate it, unless there's a limiting syntax.

I think you are

A non-terminating function fails to terminate at both compile time and runtime. This is true. But at runtime, you don't have the luxury of terminating the function manually while it's running on your client's machine. But presumably compile time is completely within your domain, so you do have that luxury. Which is why you really don't want non-terminating behavior in an executable image, but it's tolerable during a build stage. Another option is for the compiler to simply stop after a large number of iterations and say: "I really think this type evaluation should have finished by now. Are you sure you want to continue?" That's essentially what happens in C++ when you blow the template instantiation stack.

Exactly.

I totally agree with your posts above. If a function does not terminate after a specific amount of time, the compiler would have to stop the compilation.

but..

Real men compile at runtime too.

I think it's all the same. Staging happens.

Frank Atanassow made a neat observation a few days back. Roughly paraphrased, he said: "Static languages are just dynamic languages with a library that implements a static type system. This library lets you use a language in two stages, where the values of the first stage are types and the terms of the second stage are typed terms."

To me, that implies that there isn't any difference between compile and run time for function evaluation. I think that most of the runtime work of dynamically checked languages could be done statically, and that partial evaluation is probably the most extreme case of that.
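
Read that way, the observation is easy to sketch: the "library" is just a stage-one program over terms whose values are types. A toy rendering in Haskell (my own, not Atanassow's formulation):

data Ty   = TInt | TBool deriving (Eq, Show)
data Expr = Lit Int | BoolLit Bool | Add Expr Expr | If Expr Expr Expr
  deriving Show

-- Stage one: an ordinary program that either rejects a term or assigns
-- it a type, before stage two (evaluation) ever runs.
check :: Expr -> Maybe Ty
check (Lit _)     = Just TInt
check (BoolLit _) = Just TBool
check (Add a b)   = do { TInt <- check a; TInt <- check b; Just TInt }
check (If c t e)  = do
  TBool <- check c
  tt    <- check t
  te    <- check e
  if tt == te then Just tt else Nothing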

Is there anything more static than partial evaluation?

Is there anything more dynamic than dynamically checked languages?

Stageable dynamically checked

Stageable dynamically checked languages are more dynamic still, but we've had lisp a long time now. Eval might even predate proper type checks in lisp for all I know.

Not everyone likes VMs

Ultimately, your processor only understands one language. The buck stops here. Unless you want to dynamically compile to machine code, you need a VM, and not everyone likes VMs or is willing (or allowed!) to use them in code. No, not every large program contains a bad Lisp implementation.

Compile-time is but a stage.

Compile-time is but a stage. Nothing stops you running the checking stage just once, building the output and shipping that.

Compilers are just programs

Some programs generate programs in various other languages - assembly, byte-codes, or even in the same language as the original program being compiled. Some compilers are written in the same language as that which they are compiling (e.g., C compilers are usually written in C, with some tweaks here and there for the target environment).

Ohmu

You might check out work by Delesley Hutchins, which I've referenced here before.
In the language he describes, the expression 1+1 returns 2, but the expression 1+Integer returns Integer, i.e. you can compute with types just as with values. Includes partial evaluation, etc.

The power of symmetry: unifying inheritance and generative programming (ACM subscription required)

The author, DeLesley Hutchins, is a student of Philip Wadler.

"The Ohmu model unifies functions, classes, instances, templates, and even aspects into a single construct - the structure. Function calls, instantiation, aspect-weaving, and inheritance are likewise unified into a single operation - the structure transformation"

I think Hutchins is blogging

I think Hutchins is blogging at http://delesley.blogspot.com/.

I've been playing (as an amateur) with function/type systems myself, trying to feel my way around the problem.

  • 5 is Integer | Constant
  • 5.2 is Double | Constant
  • Constant is Function[0]
  • Value[t] is Function[0]:t
  • Variable[t] is Value:t | Function[1,t]:This
  • Integer is Number
  • Double is Number
  • Byte is Number
  • Sin(Number) is Double | Unary[Number]:Double
  • Unary[t] = Function[1,t]:r
  • LargeNumber = (Number - Byte)
  • Add(Number, Number) is Binary[Number,Number]:Number
  • Binary[Number,Number] is Function[2,Number,Number]:Number
  • Data is Function[0]:This
  • Name is Value[String]
  • Address is Variable[String]
  • Birthday is Variable[String]
  • Budget is Variable[Number]
  • Person is Data with Name, Address, Birthday
  • Company is Person without Birthday with Budget
  • Soundex(String) is Function[1,String]:String
  • Variable ross = Person().Name("Ross").Address("home");
  • Variable sndx = ross.Name.Soundex;

The idea is that constants are functions, types are functions, records are functions, and types are combined by set mathematics, and can be parameterized. This has been a tough problem for me to get my head around -- my goal is to yield something like (what I think) Epigram does...create the most general definitions possible and be able to have the system show what it can "figure out" how to do, and where you need to fill in the blanks.

What does it mean...

...for a type to be a function? I see some things that look like types-as-functions in your list, but I'm not sure what the implications are of making types *actually* functions.

In my little system functions

In my little system functions can be instantiated with (); they are stateful. Types are parameterized and can be invoked with [], which returns a new type, as do the union (|) and other type-set operators.

SimpleNumberCollection = Collection[Number - Currency]

[Number - Currency] is an anonymous type. I could also have said:

SimpleNumber = Number - Currency
SimpleNumberCollection = Collection[SimpleNumber]

PTS / Haskell Modules

A couple of other people have recognized the passing similarity between a types-language and an expression-language. It turns out that LAMBDA (upper-case) from the second-order polymorphic lambda calculus is a replication of the lambda (lower-case) with Galois-connected abstractions of the type and value domains; see Cousot's 1997 POPL keynote.

Of course, this can be generalized to N levels, yielding pure type systems (see Stehr's thesis) and monadic type systems (see Barthe et al. 1998).

In a more practical sense, the same idea infuses the idea of using Haskell as its own module language---see Shields and Peyton-Jones' First-class Modules paper.

I think it also gives a formal description of effects, via the layered monad translation approach in Filinski's POPL 1999 paper. My own question is: how does multi-stage programming fit into this general structure of languages?

partial evaluation

I think Alan has the ultimate answer on the question of typing:

So I think you could put the static typing issue into the larger context of partial evaluation. Would the attempt to produce a sufficiently capable partial evaluator lead to fame and fortune or obscurity and ruin?

Partial evaluation (or abstract interpretation) is indeed the most general intersection between evaluation and typechecking. Building a language around this realization is quite a challenge, though. All of the issues of termination and partiality that are fairly simple in languages like Haskell become compile-time problems and require careful treatment of head normalization and partial approximation, and some degree of implicit or explicit staging.

I do believe this is the next big step in computing. From machine code to Algol/C was the previous big step - the moves from C to C++, Java, C#, and Python are small steps in comparison. But actual attempts at implementing this are few and far between.

"Get rid of type declarations"

One other thing I wanted to point out. In the example above:

let point(xval, yval) = tuple (x: xval, y: yval)

A formal way to make this work is to represent types as identity functions. Then every identity function represents both a type and a function (you can call it, and you can use it to specify types), while non-identity functions have no interpretation as types. The "point" function above happens to just be an identity function. You can then think of "int" as both the integer type (writing f(x:int) = x+1) and the identity function on integers (calling int(3)).
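
For a rough illustration in plain Haskell (my own sketch, with made-up names), a "type" here is just an identity-shaped function that can be both called and used to pin a type down:

int :: Int -> Int
int = id

real :: Double -> Double
real = id

-- Called as a function, 'int' returns its argument unchanged, but it
-- forces the argument to be an Int:
three :: Int
three = int 3

-- Applied at a use site, it plays the role of a type annotation:
succInt x = int x + 1   -- succInt is inferred as Int -> Int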

So the answer to "can you unify types and functions" is affirmative, but the "should" variant of the question remains open.

Keep in mind that the unification above works because there is a unique isomorphism between the types and the identity functions of a programming language. This is the essential criterion that justifies unifying two constructs in a programming language.

When one syntactically unifies constructs that are conceptually distinct, the result is less justifiable.

One example is the unification of functions and lists in LISP - which creates some very interesting possibilities for introspection, but it means that functions carry around a lot of observable metadata that breaks foundational properties like parametricity and extensional equality.

Another example is Java and C#'s unification of value types (like int) and object types (like Int). Though C#'s approach is more automatic, both create strange observable properties, such as exposing pointer equality on boxed Ints that differs from the underlying int equality.

In the long-run, such unification of disparate concepts will be recognized as "clever hacks" rather than valid programming language design practices.

Types and functions are not disparate concepts.

Functions are computations, types are forms that are the results of computations. Every expression creates a type of some kind. Every line of code is a type, with its own requirements.

The real advantage of using functions as type constructors is that a programming language can be kept minimal (no special syntax for types is required) while remaining truly versatile, and at the same time the compiler's work is reduced.