Runtime type checking

A couple of loosely related items. First, Sun is updating the Java Class File Specification.

The main difference is the introduction of a new verification scheme based on type checking rather than type inference. This scheme has inherent advantages in performance (both space (on the order of 90% savings) and time (approximately 2x)) over the previous approach. It is also simpler and more robust, and helps pave the way for future evolution of the platform.
I haven't read through the fine print of the changes yet - at some point I need to update my class file disassembler - but then I still haven't gotten around to the changes in Java 5, which were much less involved. One thing I found interesting was the use of Prolog in the specification (predicates and Horn clauses).
The type rules that the typechecker enforces are specified by means of Prolog clauses. English language text is used to describe the type rules in an informal way, while the Prolog code provides a formal specification.
On a different note, there is a new thesis on A Semantics For Lazy Types, related to the Alice ML project. The runtime type checking there is specified in the more formal, operational style of the ML tradition, and it is more ambitious than Java's.
Type analysis needs to compare dynamic types, which have to be computed from (probably higher-order) type expressions that may contain types from other modules. If lazy linking has delayed the loading of these modules so far, then the corresponding types are represented by free type variables. For the type-level computation in our model this means that it may encounter free type variables whose denotation depends on yet unevaluated term-level expressions. To proceed, it inevitably has to eliminate these variables by triggering the necessary evaluations on term-level. In other words: type-level computation has to account for lazy types. We give two strategies for this and show soundness properties for the resulting calculi.
In general, it means you still have to contend with dynamic type checking in static languages - though the meaning is different from the runtime typing that occurs with dynamic PLs.

grammar

What word is missing from "still have to with dynamic type checking"? Usually, I can guess from context, and not bother you, but here I'm kinda hanging on the conclusion.

contend?

That's what I get for writing a story whilst in the middle of doing a bunch of other things.

Type erasure is fine to a point, but there's still type checking that has to be done at run time. In the case of Java, the type checking is mostly related to security. In the case of Alice ML, the type checking is used for open (distributed) programming.

Erasure is not a good solution

Type erasure only works if the values can be represented uniformly and are self-describing (i.e. tagged). Java's solution to this problem is to use boxing; since all Java objects have metadata attached that describes the type, and all references are represented as single-word pointers, this works fine.
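As a toy model (in Haskell, with made-up names - nothing like the JVM's actual machinery), a uniform, self-describing representation amounts to boxing every value behind a single tagged type:

    data Boxed
      = BInt Int
      | BBool Bool
      | BString String

    -- polymorphic code can handle any value through one case analysis,
    -- because each box carries its own type tag
    typeName :: Boxed -> String
    typeName (BInt _)    = "int"
    typeName (BBool _)   = "bool"
    typeName (BString _) = "string"

    main :: IO ()
    main = mapM_ (putStrLn . typeName) [BInt 1, BBool True, BString "x"]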

When the language can have complex value types (e.g. tuples), this technique no longer works, because at the implementation level the values can no longer be represented uniformly.

I think this comes down to the simple fact that at some level between the source language and the implementation, the polymorphism has to either be removed (i.e. code duplication, the heterogeneous translation) or replaced with a form of indirection (boxing, the homogeneous translation). The polymorphism is either there at the implementation level, or at the source level.
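GHC makes the same trade-off visible. By default a polymorphic function is compiled once against a uniform boxed representation (the homogeneous translation), while a SPECIALIZE pragma asks for an extra monomorphic copy (the heterogeneous translation). A minimal sketch:

    -- compiled once over boxed values by default; the pragma asks GHC
    -- to also emit a duplicate specialised to Int
    {-# SPECIALIZE sumSquares :: [Int] -> Int #-}
    sumSquares :: Num a => [a] -> a
    sumSquares xs = sum [x * x | x <- xs]

    main :: IO ()
    main = print (sumSquares [1 .. 10 :: Int])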

Personally I think Java's solution is ill-advised; but since there isn't any way to get complicated value types, cheating with boxes actually works. It fits nicely with Java's over-reliance on allocation of temporary objects.

Even with escape analysis, Java objects are pretty heavyweight. Sun's Java 5 implementation uses 8-byte headers. IBM's VM uses 12-byte headers. Ouch. VM and class library footprints aside, is it any wonder that Java heaps become monstrously huge?

Erasure works fine

Erasure works fine in OCaml for example, even though values don't have any tag or metadata describing their type. And there is polymorphism.

I don't know Java, where your statement may make sense, but I disagree that "erasure is not a good solution" in general.

The only problem with erasure in OCaml is that you cannot delay linking; you must do it all when types are available. That's where Alice ML is different.

A different erasure?

In the context of Java, "erasure" usually refers to the lack of precise run-time tags. In Java 5, the Java language gained parameterized types but the underlying VM didn't. So while the language's type for your object may be "List of Strings", the VM's run-time tag is just "List". This causes problems because certain language features depend on having accurate run-time tags (downcasts and array covariance).

I think the boxing problem is a separate issue and is related to the way Java does polymorphism. What would you consider a better solution? I think Haskell's typeclass feature is sometimes implemented not by embedding a VTBL pointer in the object but instead by passing it around as an extra parameter. This works fine in a language without downcasts.

Type class implementation

I think Haskell's typeclass feature is sometimes implemented not by embedding a VTBL pointer in the object but instead by passing it around as an extra parameter.

Actually, it's always implemented like that. You cannot OOishly attach type class dictionaries to objects, because the type class mechanism is much more general - it supports dispatch independent of concrete objects. The standard example is that you can dispatch on the return type of a function.
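A rough sketch of the translation (the ShowDict and DefVal names are made up for illustration): the dictionary is just a record of functions passed as an ordinary argument, and return-type dispatch falls out because the type expected at the call site picks the dictionary:

    -- an explicit stand-in for a Show dictionary
    data ShowDict a = ShowDict { showIt :: a -> String }

    showAll :: ShowDict a -> [a] -> String
    showAll dict = unwords . map (showIt dict)

    intShow :: ShowDict Int
    intShow = ShowDict show

    -- dispatch independent of any concrete object: the instance is
    -- selected purely by the type expected at the call site
    class DefVal a where
      defVal :: a

    instance DefVal Int  where defVal = 0
    instance DefVal Bool where defVal = False

    main :: IO ()
    main = do
      putStrLn (showAll intShow [1, 2, 3])
      print (defVal :: Bool)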

I don't think this is related to downcasts either. You can actually encode something akin to downcasts with type classes. See the Type-safe cast, for example. Note however that Haskell does not have subtyping, so this is somewhat incomparable to downcasts in OOPLs.
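For instance, here is a minimal sketch of such a downcast-like operation using Data.Typeable (the asInt helper is hypothetical):

    import Data.Typeable (Typeable, cast)

    -- succeeds only when the dynamic type matches the target type
    asInt :: Typeable a => a -> Maybe Int
    asInt = cast

    main :: IO ()
    main = do
      print (asInt (42 :: Int))   -- Just 42
      print (asInt "not an int")  -- Nothing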

Interesting

I wasn't familiar with Haskell's implementation, but I guessed that something like that must be so. When it comes down to the machine code level, the polymorphic code ultimately needs some kind of metadata to identify the types of values in order to do basic things like copying a value from one location to another, or dispatching to the correct operator implementation based on the type. Tagging values is one solution, but can be inefficient; passing an extra parameter is basically reifying the type and implementing the values as tuples that contain the type information as well. There really doesn't seem to be that large of a design space for implementations, but I suppose that these solutions are acceptable.

As for Java's covariant array types... besides being a language flaw, this is the one place where Java almost managed to have parametric types from the beginning. After all, it's straightforward to think of an "array" type constructor that simply takes an element type; and most VMs that I have worked on have a representation of the types that expresses arrays in this way. Java does reify these parametric types, however: the dynamic check on array stores requires that the element type of the array be known at runtime. So oddly enough, the broken covariant array types from the very first Java 1.0 were closer to the general solution than Java 5's (erased) generics....

It seems that there was a pretty big opportunity to unify arrays and generic classes at some point in Java's development, and it was not taken. Pity. It seems unlikely that any future revision of Java will undo this persistent design flaw due to backwards compatibility concerns.

Passing an extra parameter

Passing an extra parameter potentially opens up a nice avenue for optimisations as well, since general specialisation and partial evaluation techniques apply directly.

subtyping

Haskell does too have subtyping of type classes, like Java's interface inheritance. For example, if a type is an instance of RealFloat, it is required also to be RealFrac, Floating, Fractional, Real, Num, Ord, Show and Eq.
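In miniature (MyOrd is a made-up class): the superclass constraint means any instance of MyOrd must also supply an Eq instance, just as RealFloat drags in the chain above:

    class Eq a => MyOrd a where
      cmpLess :: a -> a -> Bool

    -- accepted only because an Eq Int instance is already in scope
    instance MyOrd Int where
      cmpLess = (<)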

Not the same as subtyping

While I see what you mean, this is not the same as subtyping, because type classes are not types. Technically, this is "constraint entailment". One important practical consequence is that subtyping allows you to (implicitly) coerce to a common supertype and build heterogeneous collections. With type classes, you have to resort to (semi-explicit) existential types to do that. Of course, vice versa, there are lots of things you cannot do with subtyping.
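A minimal sketch of that workaround, assuming GHC's ExistentialQuantification extension (Showable is a made-up wrapper):

    {-# LANGUAGE ExistentialQuantification #-}

    -- each element carries its own Show dictionary
    data Showable = forall a. Show a => MkShowable a

    instance Show Showable where
      show (MkShowable x) = show x

    -- a heterogeneous collection, impossible with a bare type variable
    heteroList :: [Showable]
    heteroList = [MkShowable (1 :: Int), MkShowable "two", MkShowable True]

    main :: IO ()
    main = mapM_ print heteroList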

Subkinding?

I don't know, but aren't type classes kinds? And if so, does that mean Haskell has subkinding?

Kinds are an entirely different issue

A kind determines how many type parameters a parametrized type takes, and what their kinds are.
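For example (illustrative declarations only):

    data Pair a b = Pair a b              -- Pair :: * -> * -> *
    newtype Fix f = Fix (f (Fix f))       -- Fix  :: (* -> *) -> *
    type StringPair = Pair String String  -- StringPair :: *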

I thought kinds were a more

I thought kinds were a more general concept: a sort of meta-type, i.e. a type of types, and not restricted to just the arity of a type.

Yes, but type classes aren't

Yes, but type classes aren't types of types; they're constraints on polymorphic types and accompanying evidence. Qualified Types and Kinds aren't the same thing.

GHC's Core language has a kind system that's a little more complex than Haskell 98's - it distinguishes between boxed and unboxed types, for example. In System Fomega, the kinds are effectively the simply-typed lambda calculus lifted up a level.
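GHC exposes that boxed/unboxed distinction through types like Int#, which live at an unboxed kind and so cannot instantiate an ordinary type variable of kind *. A small sketch, assuming GHC with the MagicHash extension:

    {-# LANGUAGE MagicHash #-}

    import GHC.Exts (Int (I#), (+#))

    -- Int# has an unboxed kind, so it must be used concretely;
    -- the boxed Int wraps it with the I# constructor
    plusOne :: Int -> Int
    plusOne (I# n) = I# (n +# 1#)

    main :: IO ()
    main = print (plusOne 41)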

Just thought of this and

Just thought of this and figured it's worth pointing out:

You can emulate qualified types with kinds when all your predicates only take one parameter, but if they take more than one you can't. So multiparameter type classes can't be emulated with kinds, for example.
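For example, a two-parameter predicate relates two types at once, so there is no single type whose kind could carry the constraint (a sketch, assuming GHC's MultiParamTypeClasses extension):

    {-# LANGUAGE MultiParamTypeClasses #-}

    -- the constraint Convert a b mentions two types at once
    class Convert a b where
      convert :: a -> b

    instance Convert Int Integer where
      convert = fromIntegral

    main :: IO ()
    main = print (convert (3 :: Int) :: Integer)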

Type classes as kinds

You'd also need to restrict the arguments of your predicates to be type variables (and ones that are bound at the "current" quantifier). This seems to be the case in Haskell 98, but not necessarily with the various TC extensions floating around.

Moreover, you'd need something like intersection kinds to capture multiple constraints...

Qi Prolog

It seems that Qi's type checker is essentially a Prolog program.