Scaling Type Inference

Coding Horror is a popular programming blog. A recent post concerns type inference in C#:

C# ... offers implicitly typed local variables. ... It's not dynamic typing, per se; C# is still very much a statically typed language. It's more of a compiler trick, a baby step toward a world of Static Typing Where Possible, and Dynamic Typing When Needed.

... I use implicit variable typing whenever and wherever it makes my code more concise. Anything that removes redundancy from our code should be aggressively pursued -- up to and including switching languages.

You might even say implicit variable typing is a gateway drug to more dynamically typed languages. And that's a good thing.

I think this post is interesting for a number of reasons, and the link to LtU is just the start. It appears the author is confused about what “implicitly typed local variables” are, conflating local type inference (which they are) with dynamic typing (which they are not). Many commenters suffer from the same confusion. Other commenters rightly note that the inferred type is not always the type the programmer wants (particularly important in the presence of sub-typing). Furthermore, type inference harms readability. I'm reminded of a recent discussion on the PLT Scheme mailing list on the merits of local and global type inference. The consensus there seems to be that while local type inference is useful, global inference is not.
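
To make the distinction concrete, here is a minimal sketch in Scala, whose local inference for variable declarations is directly analogous to C#'s var: the type is inferred once, at compile time, and enforced statically thereafter.

  var n = 42       // type inferred as Int at compile time; nothing dynamic here
  n = n + 1        // fine: still an Int
  // n = "hello"   // rejected by the compiler -- the inferred type is fixed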

So, wise people, what is the future of type inference? How useful is it really, especially when we look at type systems that go beyond what H-M can handle? How are we going to get working programmers to use it, and understand it? Do we need better tool support? Do we have any hope of better education for the average programmer?

Final Question

Given the author says "Anything that removes redundancy from our code should be aggressively pursued -- up to and including switching languages", should StackOverflow, his website, be programmed in Scala or Typed Scheme?

(I'm obviously a partisan for the latter. :-)

"Local type inference

"Local type inference doesn't tell you anything you don't already know. Global type inference can."

Two sides

And this is both good and bad. Many people (rightly, I believe) claim that extensive use of type inference can harm readability. This is particularly important in a language with ad-hoc polymorphism.

All well and good, of

All well and good, of course. But can you have the good without the bad? Clearly, local inference does not give the same benefits as global type inference, so they are not comparable. If we are thinking of the advantages of global inferencing, can we have those without the dangers you mention?

Tooling

Global inference with an IDE adding the appropriate annotations to the display (but not the source per se unless asked) seems a good way to go. Failing that, I'd like a way to separate out annotations that are there to directly express the programmer's intent from ones that're there for ease of understanding - that way, if I want to carry out transformations that preserve well-typedness but not types then I can just regenerate the latter afterwards and only worry about the former. This might also help in tracking type errors on occasion - if it trips a "documentation" annotation but would be well-typed without, that might be suggestive.

But local type inference can...

...save a lot of typing, and it sure makes refactoring easier.

Of course, if that's ALL you want, you can just use type aliases (typedef or similar).
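
For example, a small Scala sketch (the Index alias is hypothetical): the alias removes the repeated spelling of a verbose type, with no inference involved at all.

  type Index = Map[String, List[(Int, Double)]]   // hypothetical alias
  val byWord: Index = Map.empty                   // instead of writing the full type again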

Since I am in a sarcastic

Since I am in a sarcastic mood: people who type slowly perhaps should not be programmers.

Alternatively: By the time you have enough practice to be a good programmer, the amount of typing (on the keyboard) is less of an issue than the amount of typing (in the code).

Or else they should use Perl

not much typing of either sort there... :)

Not the problem

Modern IDEs handle "the typing" aspect of it. It's the reading aspect that is the problem - too much clutter, too much boilerplate, etc.

This is a double edged

This is a double-edged sword: I want the additional stuff when I am reading code, provided of course that it is readable and not line noise...

Read vs write (vs review vs refactoring...) modes

Why couldn't a tool derive a good read mode from a piece of source code, including (inferred) type annotations, documentation, comments, declarations reordered to follow some narrative hints, unit tests, and referenced papers? The more I think about reading vs. writing code, the less I like the WYSIWYG nature of today's editors; the linear structure used today is just one path among the possible narratives that would be interesting to read. Actually it makes me sad that the Smalltalk browser was created so long ago and is still the state of the art for this (don't get me wrong, I love the browser, but we could do so much more).

The idea of tools for

The idea of tools for producing different representations for reading code is not new. Eiffel is a prominent language that advocated exactly that (though not for type annotation, naturally).

What is local type inference?

Does this refer to module-local type inference? Or does it refer to the restricted flow of type information through the parse tree (at all levels)?

My Definition

My definition of local type inference is type inference within a local scope which is not dependent on anything other than the explicit types of local symbols. Practically, it means the arguments and return types of functions or methods must carry type annotations, as must any non-local symbols such as global variables.

Here is an example of a function written in GScript, our internal language that uses local type inference:

  function calculateTotalSalaries( emps : List<Employee> ) : double {
    var salaries = emps.map( \ e -> e.Salary )
    return salaries.sum()
  }

You have to annotate the parameter and return type, but everything else is inferred:

  • the "e" paramter in the closure (we use the haskell syntax for closures) is inferred to be of type "Employee"
  • the "salaries" variable is inferred to be List<double>

All of this inference depends only on the declared types of the parameters and the map and sum functions in a very straightforward way, with no complicated multi-pass inference algorithm necessary. So you get a lot of bang for your buck.
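
For comparison, here is a rough Scala rendering of the same function (Employee is assumed to be a simple class); Scala's local type inference fills in exactly the same blanks:

  case class Employee(salary: Double)   // assumed for the sketch

  def calculateTotalSalaries(emps: List[Employee]): Double = {
    val salaries = emps.map(e => e.salary)   // e: Employee, salaries: List[Double] inferred
    salaries.sum
  }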

Cheers,
Carson

My experience with writing

In my experience writing highly parametrically polymorphic code in C# (2.0, though the very limited type inference of 3.5 wouldn't have helped), type inference is crucial. The more "interesting" our types get, the less we want to write them down. In the particular example I'm thinking of, the type parameter instantiations were drowning out the content.
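
A hypothetical Scala sketch of that effect; with the instantiations written out, the one-line body is buried in types:

  def pairWith[A, B](xs: List[A])(f: A => B): List[(A, B)] =
    xs.map(x => (x, f(x)))

  // With the instantiations spelled out, the types drown out the content:
  val labelled = pairWith[Int, String](List(1, 2, 3))((x: Int) => x.toString)
  // With inference, only the content is left:
  val labelled2 = pairWith(List(1, 2, 3))(_.toString)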

My impression is that "working programmers" have little or no issue with type inference being added to their (statically typed) language of choice. All that, say, Microsoft has to do is implement it (which they've partially done, but they could go a lot further). No "education" or "marketing" is necessary.

the type parameter

the type parameter instantiations were drowning out the content.

Totally, 100% agreed. Tuples above 3 parameters are unusable in C#, sometimes forcing me to resort to arrays, with all the dynamic checking that involves.

I'm unconvinced...

Other commenters rightly note that the inferred type is not always the type the programmer wants (particularly important in the presence of sub-typing). Furthermore, type inference harms readability.

These sound fishy to me. The inferred type (at least under C#) is the type that the compiler can guarantee to be correct -- that is, the type that produces a correct program. If the programmer wants a different type for some reason, then they'd better use a transformation that maintains correctness, or they're coding through their hat. I just recently ran across a case like this in some production code, where type inference was choosing the correct type, but the programmer wanted to use a subtype which wasn't always valid. The code, of course, broke during testing, and was eventually rewritten to rely on inference.
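
A Scala sketch of the shape of that bug (the types are hypothetical): inference picks the most general type that is guaranteed correct, and insisting on the subtype the programmer "wanted" is exactly what fails.

  trait Shape { def area: Double }   // hypothetical hierarchy
  class Circle(r: Double) extends Shape { def area = math.Pi * r * r }
  class Square(s: Double) extends Shape { def area = s * s }

  val shapes = List(new Circle(1.0), new Square(2.0))   // inferred: List[Shape]
  // Insisting on the subtype anyway is the unsound move:
  val c = shapes.last.asInstanceOf[Circle]              // compiles, fails during testing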

In practice, I find that I use type inference under C# (along with the support of Visual Studio) as an interactive conversation with the type system. I'll indicate that I want to build an expression; C# responds with the appropriate type for the portion I'm working on; I respond by either using it directly or applying a transform to move the expression in the direction I want it to go. I get to lean on the language and tools to track the (sometimes overlarge) type system, and oftentimes it points out situations I may not have noticed "by eye". There's a lot less "wrestling with the type checker" at compile time.

As for inference harming readability, I seem to recall the same being said about generics, which we seem to have survived, as well as C++ templates, which have taken on a life of their own (and in some cases, leapt from the lab table and run off to frighten the villagers). I usually chalk it up to novelty, and expect that eventually, our children's children will ask "did you really have to declare storage types by hand, grandpa?"

Correction

Jeff Atwood is definitely NOT confusing local type inference with dynamic typing. He is merely hinting at the similarities between the two:
- with local type inference you don't have to mention the type, even though C# is a statically typed language
- with dynamic typing you don't ever have to mention the type, save for some exceptions, like when you call a ctor of a class/object

If you look at the code per se, outside of a compiler, you could say that the more type inference you have in a statically typed language, the more your code starts to look like it is written in a dynamically typed language. This gives you the benefit of being able to write generic code, as is the default in dynamically typed languages, while still having compile-time type checks.
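
A small Scala sketch of that point: the definition mentions no types in its body and could pass for dynamically typed code, yet every application is checked at compile time.

  def twice[A](f: A => A)(x: A) = f(f(x))   // reads like untyped code, checks statically

  twice((n: Int) => n + 1)(5)               // fine; twice("oops")(5) would be rejected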

Readability, and how it is not harmed in any way by "var"

Furthermore, type inference harms readability.

I don't agree. Particularly in an environment like Visual Studio, where the type information is primarily available from the IDE, not some overwrought declaration. How often is the declaration even in the file you are working on, much less visible in the viewport? If you are relying on the declaration syntax to increase readability, "ur doin it wrong".

Right, and this is what I

Right, and this is what I meant by "needing better tool support". So the burden of supporting type inference shifts to the IDE, similarly to how Lisp is unusable without editor support. This is fine, I think, but must be acknowledged.

Fine? We had this debate so

Fine? We have had this debate so many times I am reluctant to start it again (really - those interested should read the archives; this is flame war territory around here). As a personal choice, I think tool support is nice but semantics are nicer. I want code I can read printed. I don't care how many characters are needed. Motto: if that's your bottleneck, you are doing something wrong.

Views on code? Factoring types?

My bottleneck won't be the physical process of typing, but the process of going from "enough to be happy it'll be well-typed" to "I know the precise type and can serialise it into source code", which can be slow and gets slower as the amount of detail in our types grows. I'd far rather have a semantically-aware pretty-printer add inferred type annotations in appropriate places than have to hand-maintain them.

That said, most of our languages are appallingly spartan at type level compared to term level. Where are our local bindings, that we may wrap up a chunk of type behind a descriptive name and be done with it? Why doesn't the inferrer propagate all this stuff for us? Could we exploit techniques for common subexpression elimination to suggest where we might want to do so?

Over time this is likely to suggest a separation between a core type language and a metalanguage used to work with the inferrer/elaborator. It doesn't seem uncommon for proof assistants and the like to have a separation in this vein, either.

Is this the sort of semantic support (if any) you have in mind?

"a gateway drug to more dynamically typed languages"

Actually, I'd say it is the opposite: a gateway drug to more statically typed languages. With all the excitement around dynamic languages, statically typed languages are going to have to add more type inference to attract newer developers.

I'm strongly in the local type-inference camp. Annotating types at the function and constructor boundaries seems to be the best compromise between explicitness and implicitness. It cuts down dramatically on code bloat and doesn't require particularly fancy algorithms to implement, giving it the best power to weight ratio of the options out there.

There are more subtle advantages of annotating types at the function boundaries as well. For example, it allows you to lazily compile types: if Type A references Type B, you need only parse the declarations of Type B (and *none* of the types that B depends on) to compile Type A. Taking advantage of that allows you to have statically typed languages with the rapid-feedback cycle of dynamic languages.

Cheers,
Carson

So which is it?

I've heard static typing enthusiasts say declaring types is not a huge burden because type inference does this automatically. Now I hear some saying that (in the presence of subtypes), type inference doesn't work that well, or doesn't always give you what you want. All in all, I think I'll live in my mixed world of optional type declarations on top of a dynamic language, where I don't have to declare types unless I need to for performance reasons, and the few type errors that do show up are easily remedied. It just feels safer :-).

Now I hear some saying that

Now I hear some saying that (in the presence of subtypes), type inference doesn't work that well, or doesn't always give you what you want.

Which comment says that?

It just feels safer :-).

Sticking with the inferred type will always be "safe". What measure of "safety" are you using to make that judgment?

Lame type systems

Now I hear some saying that (in the presence of subtypes), type inference doesn't work that well, or doesn't always give you what you want.

The problem isn't subtyping but the particular subtyping chosen by these languages (which weren't designed with good type systems in mind). I'm pretty happy typing p :: a -> Parser (a, Int, Int) in Haskell, but the equivalent declaration in Java, <E> F<E, Parser<Tuple<E, Int, Int>>> p(), is harder to read and write, mostly because of bad syntax choices for the type language. Beyond that we have problems with inference because there's a poor man's unification for inferred types: if I write use(build(new SubBar())) where <E> Foo<E> build(E e) and void use(Foo<Bar> foo), the compiler fails to infer the appropriate type for build given the constraints, and the expression doesn't type check. Even worse, as a programmer I have two ways to "fix" this: add an upcast to the build parameter (i.e. (Bar) new SubBar()) or mess with wildcards to change the use definition (i.e. void use(Foo<? extends Bar> foo)). If the type systems were designed from the ground up with inference in mind, they would be significantly different and there wouldn't be more inference problems than we have in SML or Haskell.

Subtyping and type inference

The type inference problem in the presence of parametric polymorphism and subtyping like in Java and C# is significantly harder than in Haskell (98) or SML. You're very unlikely to ever get anything like those FP languages' type inference. Scala is a good language to look at in this regard. It too uses (only) local type inference which is more or less as much as one can reasonably expect. That said, you can do way, way more with local type inference than C# is doing (as Scala demonstrates.)

Both necessary and not

In Typed Scheme, because Scheme inherently uses subtyping, we were unable to choose an H-M style global inference algorithm. Further, based both on our experience with ML/Haskell and our experience with the HtDP style of giving a contract and a purpose statement for every function, we decided that that amount of annotation was a good middle ground. This also has the significant benefit that you can tell by looking at a function what the type is, which is often not the case in ML.

However, in some places, type inference should be considered mandatory. For example, with the exception of FX-89, no language I know of has ever had polymorphic functions which required explicit instantiation at call sites. For example, everyone wants (map add1 (list 1 2 3)) to work without an additional annotation. This, of course, requires type inference. We, like Scala, chose a solution based on Pierce and Turner's Local Type Inference system, but lots of other solutions are possible here (Java does inference at call sites as well, for example).
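
In Scala terms (which, as noted, also uses a Pierce and Turner style of local inference), the same expectation looks like this sketch:

  List(1, 2, 3).map(_ + 1)   // map's type parameter is inferred at the call site, never written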

Tell the type by looking at a function

This also has the significant benefit that you can tell by looking at a function what the type is, which is often not the case in ML.

It is not much of a benefit because, after (global) inference, the compiler knows the types; they can easily be written out, and then you can just look at the minibuffer to see the types of variables.

Addition: One thing worth pointing out is that the term "global" can be misleading when interpreted informally. H-M infers principal types and the context for the inference of the type of a particular expression is not the whole program. Also, in ML, definitions are strictly ordered and mutually recursive definitions are explicitly grouped. So, after the principal type of a function (or any binding) has been inferred, it remains invariant no matter what definitions subsequently follow.

(Current) compilers only report types intermittently

There's a practical problem with this when the code fails to compile for any reason, and the compiler can't tell you what the type of a function is supposed to be. This can be particularly problematic if you're working on someone else's code which you aren't already intimately familiar with.

Explicitly annotating types, at least at the top-level function definition level, has real benefits for just about anything more ambitious than code that only one person is going to read.

Types in signatures

There's a practical problem with this when the code fails to compile for any reason, and the compiler can't tell you what the type of a function is supposed to be.

Actually, compilers usually report, in case of a type conflict, the expected and actual types. In most cases one of them is usually a good approximation of what the type is supposed to be.

Explicitly annotating types, at least at the top-level function definition level, has real benefits for just about anything more ambitious than code that only one person is going to read.

Which happens quite naturally in ML via module signatures. (And let's not forget that ML also allows type annotations on any expression and pattern.) However, given that one can see the inferred types, I rarely find it useful to have type annotations on the implementation side of a module boundary. The main exception is when I'm puzzled by a type-error (which happens rarely to me while programming in SML --- almost all type errors are immediately clear to me after I look at the offending expression(s)) and insert type annotations to test my assumptions. I almost always remove such annotations after I understand the problem.

Re: Types in signatures

Actually, compilers usually report, in case of a type conflict, the expected and actual types. In most cases one of them is usually a good approximation of what the type is supposed to be.

Yes, but when major changes are being made, just getting the types at the places where errors are occurring may not be enough.

I agree that module signatures address this, though. But this brings us back to an approach comparable to the "significant benefit" claimed for Typed Scheme - that you can tell by looking at a function (or the associated module signature) what the type is.

Benefit

But this brings us back to an approach comparable to the "significant benefit" claimed for Typed Scheme - that you can tell by looking at a function (or the associated module signature) what the type is.

I'm puzzled. How is requiring annotations a benefit of Typed Scheme in this context? The alternative being compared is ML, which allows annotations to be used selectively by the programmer, and for which tool support that makes it painless to see the inferred types is already available. I really don't see this as a particular benefit (or advantage compared to the alternative being discussed) of Typed Scheme at all.

It would be more reasonable to claim that requiring some type annotations is not much of a hindrance compared to not requiring them, because type annotations can arguably improve readability in the absence of tool support.

No puzzle

Sorry, I was trying to say that when module signatures are used, the approaches are comparable, i.e. I consider module signatures to also be a "significant benefit" over no type signatures at all.

Type inference is needed for generic programming.

Generic programming needs type inference.
For example, in C++, in quite a number of situations (constructing a lambda expression, combining adaptors over iterators or ranges...), you do not really care about the actual type of an expression; you only care about its properties, in a static duck-typing way.

Worse, when you write a function that takes a higher-order function and applies it, the result type of your function depends on the result type of the higher-order function argument you took. In C++ this is usually solved by exposing it within the higher-order function object, but this is not always practical due to overloading, specializations, etc.

The next C++ standard recognizes those needs and introduces auto v = exp;, decltype(exp), and auto SomeFunction() -> decltype(exp) to address them.
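
For contrast, a rough Scala sketch of the same need, which local type inference covers without any special forms:

  // The result type of applyTo depends on the result type of its argument f,
  // with no decltype-style machinery needed:
  def applyTo[A, B](x: A)(f: A => B): B = f(x)

  val v = applyTo(21)(_ * 2)   // v is inferred to be Int, much like auto v = exp;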

You can then easily see where the confusion comes from: dynamic typing is usually associated with dynamic duck typing (actually, according to the definitions I've read, I hardly see how it could be anything else). So the fact that C++-style generic programming is static duck typing creates confusion.

Second-order polymorphism

It's probably not worth getting too bent out of shape about the introduction of type inference into your favorite statically-typed language because type inference only goes so far. There will always be places in your code where type annotations will be required rather than optional, so there will always be a type language with which to make explicit type coercions.
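
For instance, a Scala sketch (the names are hypothetical): a second-order polymorphic parameter is a place where the annotation can never be elided, because the argument must be usable at several instantiations.

  trait Poly { def apply[A](xs: List[A]): List[A] }   // this annotation is mandatory

  def onBoth(f: Poly): (List[Int], List[String]) =
    (f(List(1, 2, 3)), f(List("a", "b")))             // f used at two instantiations

  onBoth(new Poly { def apply[A](xs: List[A]) = xs.reverse })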

You don't like type inference? Then, explicitly annotate the types of your values, have a cola and smile. Got a chunk of code from somebody who doesn't share your fetish for explicit type annotation? That's what the -i flag on your compiler is for. Use it.

There is no crisis. Calm down.