Runtime type checking

A couple of loosely related items. First, Sun is updating the Java Class File Specification.

The main difference is the introduction of a new verification scheme based on type checking rather than type inference. This scheme has inherent advantages in performance (both space (on the order of 90% savings) and time (approximately 2x)) over the previous approach. It is also simpler and more robust, and helps pave the way for future evolution of the platform.
I haven't read through the fine print of the changes yet - at some point I need to update my class file disassembler - but then I still haven't gotten around to the changes in Java 5, which were much less involved. One thing I found interesting was the use of Prolog in the specification (predicates and Horn clauses).
The type rules that the typechecker enforces are specified by means of Prolog clauses. English language text is used to describe the type rules in an informal way, while the Prolog code provides a formal specification.
On a different note, there is a new thesis on A Semantics For Lazy Types, related to the Alice ML project. The runtime type checking there is specified in the more formal, operational style of the ML tradition, and it is more ambitious than Java's.
Type analysis needs to compare dynamic types, which have to be computed from (probably higher-order) type expressions that may contain types from other modules. If lazy linking has delayed the loading of these modules so far, then the corresponding types are represented by free type variables. For the type-level computation in our model this means that it may encounter free type variables whose denotation depends on yet unevaluated term-level expressions. To proceed, it inevitably has to eliminate these variables by triggering the necessary evaluations on term-level. In other words: type-level computation has to account for lazy types. We give two strategies for this and show soundness properties for the resulting calculi.
In general, it means you still have to contend with dynamic type checking in static languages - though the meaning is different from the runtime typing that occurs with dynamic PLs.

grammar

What word is missing from "still have to with dynamic type checking"? Usually, I can guess from context, and not bother you, but here I'm kinda hanging on the conclusion.

contend?

That's what I get for writing a story whilst in the middle of doing a bunch of other things.

Type erasure is fine to a point, but there's still type checking that has to be done at run time. In the case of Java, the type checking is mostly related to security. In the case of Alice ML, the type checking is used for open (distributed) programming.

Erasure is not a good solution

Type erasure only works if the values can be represented uniformly and are self-describing (i.e. tagged). Java's solution to this problem is to use boxing; since all Java objects have metadata attached that describes the type, and all references are represented as single-word pointers, this works fine.
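As a toy model (in Haskell, with made-up names - nothing like the JVM's actual machinery), a uniform, self-describing representation amounts to boxing every value behind a single tagged type:

    data Boxed
      = BInt Int
      | BBool Bool
      | BString String

    -- polymorphic code can handle any value through one case analysis,
    -- because each box carries its own type tag
    typeName :: Boxed -> String
    typeName (BInt _)    = "int"
    typeName (BBool _)   = "bool"
    typeName (BString _) = "string"

    main :: IO ()
    main = mapM_ (putStrLn . typeName) [BInt 1, BBool True, BString "x"]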

When the language can have complex value types (e.g. tuples), this technique no longer works, because at the implementation level the values can no longer be represented uniformly.

I think this comes down to the simple fact that at some level between the source language and the implementation, the polymorphism has to either be removed (i.e. code duplication, the heterogeneous translation) or replaced with a form of indirection (boxing, the homogeneous translation). The polymorphism is either there at the implementation level, or at the source level.
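GHC makes the same trade-off visible. By default a polymorphic function is compiled once against a uniform boxed representation (the homogeneous translation), while a SPECIALIZE pragma asks for an extra monomorphic copy (the heterogeneous translation). A minimal sketch:

    -- compiled once over boxed values by default; the pragma asks GHC
    -- to also emit a duplicate specialised to Int
    {-# SPECIALIZE sumSquares :: [Int] -> Int #-}
    sumSquares :: Num a => [a] -> a
    sumSquares xs = sum [x * x | x <- xs]

    main :: IO ()
    main = print (sumSquares [1 .. 10 :: Int])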

Personally I think Java's solution is ill-advised; but since there isn't any way to get complicated value types, cheating with boxes actually works. It fits nicely with Java's over-reliance on allocation of temporary objects.

Even with escape analysis, Java objects are pretty heavyweight. Sun's Java 5 implementation uses 8-byte headers. IBM's VM uses 12-byte headers. Ouch. VM and class library footprints aside, is it any wonder that Java heaps become monstrously huge?

Erasure works fine

Erasure works fine in OCaml for example, even though values don't have any tag or metadata describing their type. And there is polymorphism.

I don't know Java, where your statement may make sense, but I disagree that "erasure is not a good solution" in general.

The only problem with erasure in OCaml is that you cannot delay linking; you must do it all when types are available. That's where Alice ML is different.

A different erasure?

In the context of Java, "erasure" usually refers to the lack of precise run-time tags. In Java 5, the Java language gained parameterized types but the underlying VM didn't. So while the language's type for your object may be "List of Strings", the VM's run-time tag is just "List". This causes problems because certain language features depend on having accurate run-time tags (downcasts and array covariance).

I think the boxing problem is a separate issue and is related to the way Java does polymorphism. What would you consider a better solution? I think Haskell's typeclass feature is sometimes implemented not by embedding a VTBL pointer in the object but instead by passing it around as an extra parameter. This works fine in a language without downcasts.

Type class implementation

I think Haskell's typeclass feature is sometimes implemented not by embedding a VTBL pointer in the object but instead by passing it around as an extra parameter.

Actually, it's always implemented like that. You cannot OOishly attach type class dictionaries to objects, because the type class mechanism is much more general - it supports dispatch independent of concrete objects. The standard example is that you can dispatch on the return type of a function.
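A rough sketch of the translation (the ShowDict and DefVal names are made up for illustration): the dictionary is just a record of functions passed as an ordinary argument, and return-type dispatch falls out because the type expected at the call site picks the dictionary:

    -- an explicit stand-in for a Show dictionary
    data ShowDict a = ShowDict { showIt :: a -> String }

    showAll :: ShowDict a -> [a] -> String
    showAll dict = unwords . map (showIt dict)

    intShow :: ShowDict Int
    intShow = ShowDict show

    -- dispatch independent of any concrete object: the instance is
    -- selected purely by the type expected at the call site
    class DefVal a where
      defVal :: a

    instance DefVal Int  where defVal = 0
    instance DefVal Bool where defVal = False

    main :: IO ()
    main = do
      putStrLn (showAll intShow [1, 2, 3])
      print (defVal :: Bool)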

I don't think this is related to downcasts either. You can actually encode something akin to downcasts with type classes. See the Type-safe cast, for example. Note however that Haskell does not have subtyping, so this is somewhat incomparable to downcasts in OOPLs.
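For instance, here is a minimal sketch of such a downcast-like operation using Data.Typeable (the asInt helper is hypothetical):

    import Data.Typeable (Typeable, cast)

    -- succeeds only when the dynamic type matches the target type
    asInt :: Typeable a => a -> Maybe Int
    asInt = cast

    main :: IO ()
    main = do
      print (asInt (42 :: Int))   -- Just 42
      print (asInt "not an int")  -- Nothing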

Interesting

I wasn't familiar with Haskell's implementation, but I guessed that something like that must be so. When it comes down to the machine code level, the polymorphic code ultimately needs some kind of metadata to identify the types of values in order to do basic things like copying a value from one location to another, or dispatching to the correct operator implementation based on the type. Tagging values is one solution, but can be inefficient; passing an extra parameter is basically reifying the type and implementing the values as tuples that contain the type information as well. There really doesn't seem to be that large of a design space for implementations, but I suppose that these solutions are acceptable.

As for Java's covariant array types... besides being a language flaw, this is the one place where Java almost managed to have parametric types from the beginning. After all, it's straightforward to think of an "array" type constructor that simply takes an element type; and most VMs that I have worked on have a representation of the types that expresses arrays in this way. Java does reify these parametric types, however: the dynamic check on array stores requires that the element type of the array be known at runtime. So oddly enough, the broken covariant array types from the very first Java 1.0 were closer to the general solution than Java 5's (erased) generics....

It seems that there was a pretty big opportunity to unify arrays and generic classes at some point in Java's development, and it was not taken. Pity. It seems unlikely that any future revision of Java will undo this persistent design flaw due to backwards compatibility concerns.

Passing an extra parameter

Passing an extra parameter potentially opens up a nice avenue for optimisations as well, since general specialisation and partial evaluation techniques apply directly.

subtyping

Haskell does too have subtyping of type classes, like Java's interface inheritance. For example, if a type is an instance of RealFloat, it is required also to be RealFrac, Floating, Fractional, Real, Num, Ord, Show and Eq.
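In miniature (MyOrd is a made-up class): the superclass constraint means any instance of MyOrd must also supply an Eq instance, just as RealFloat drags in the chain above:

    class Eq a => MyOrd a where
      cmpLess :: a -> a -> Bool

    -- accepted only because an Eq Int instance is already in scope
    instance MyOrd Int where
      cmpLess = (<)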

Not the same as subtyping

While I see what you mean, this is not the same as subtyping, because type classes are not types. Technically, this is "constraint entailment". One important practical consequence is that subtyping allows you to (implicitly) coerce to a common supertype and build heterogeneous collections. With type classes, you have to resort to (semi-explicit) existential types to do that. Of course, vice versa, there are lots of things you cannot do with subtyping.
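A minimal sketch of that workaround, assuming GHC's ExistentialQuantification extension (Showable is a made-up wrapper):

    {-# LANGUAGE ExistentialQuantification #-}

    -- each element carries its own Show dictionary
    data Showable = forall a. Show a => MkShowable a

    instance Show Showable where
      show (MkShowable x) = show x

    -- a heterogeneous collection, impossible with a bare type variable
    heteroList :: [Showable]
    heteroList = [MkShowable (1 :: Int), MkShowable "two", MkShowable True]

    main :: IO ()
    main = mapM_ print heteroList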

Subkinding?

I don't know, but aren't type classes kinds? And if so, does that mean Haskell has subkinding?

Kinds are an entirely different issue

A kind determines how many type parameters a parametrized type takes, and what their kinds are.
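For example (illustrative declarations only):

    data Pair a b = Pair a b              -- Pair :: * -> * -> *
    newtype Fix f = Fix (f (Fix f))       -- Fix  :: (* -> *) -> *
    type StringPair = Pair String String  -- StringPair :: *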

I thought kinds were a more

I thought kinds were a more general concept: a sort of meta-type, i.e. a type of types, and not restricted to just the arity of a type.

Yes, but type classes aren't

Yes, but type classes aren't types of types; they're constraints on polymorphic types and accompanying evidence. Qualified Types and Kinds aren't the same thing.

GHC's Core language has a kind system that's a little more complex than Haskell 98's - it distinguishes between boxed and unboxed types, for example. In System Fomega, the kinds are effectively the simply-typed lambda calculus lifted up a level.
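GHC exposes that boxed/unboxed distinction through types like Int#, which live at an unboxed kind and so cannot instantiate an ordinary type variable of kind *. A small sketch, assuming GHC with the MagicHash extension:

    {-# LANGUAGE MagicHash #-}

    import GHC.Exts (Int (I#), (+#))

    -- Int# has an unboxed kind, so it must be used concretely;
    -- the boxed Int wraps it with the I# constructor
    plusOne :: Int -> Int
    plusOne (I# n) = I# (n +# 1#)

    main :: IO ()
    main = print (plusOne 41)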

Just thought of this and

Just thought of this and figured it's worth pointing out:

You can emulate qualified types with kinds when all your predicates only take one parameter, but if they take more than one you can't. So multiparameter type classes can't be emulated with kinds, for example.
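For example, a two-parameter predicate relates two types at once, so there is no single type whose kind could carry the constraint (a sketch, assuming GHC's MultiParamTypeClasses extension):

    {-# LANGUAGE MultiParamTypeClasses #-}

    -- the constraint Convert a b mentions two types at once
    class Convert a b where
      convert :: a -> b

    instance Convert Int Integer where
      convert = fromIntegral

    main :: IO ()
    main = print (convert (3 :: Int) :: Integer)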

Type classes as kinds

You'd also need to restrict the arguments of your predicates to be type variables (and ones that are bound at the "current" quantifier). This seems to be the case in Haskell 98, but not necessarily with the various TC extensions floating around.

Moreover, you'd need something like intersection kinds to capture multiple constraints...

Qi Prolog

It seems that Qi's type checker is essentially a Prolog program.