## New JDK 7 Feature: Support for Dynamically Typed Languages in the JVM

Sun has a new article out describing the changes in the upcoming JDK 7 to support dynamically typed languages. Of special note is the new invokedynamic bytecode:

> The invokedynamic bytecode instruction enables an implementer of a dynamic language to translate a method invocation into bytecode without having to specify a target type that contains the method. The method specification in this case is a simplified constant pool reference, which specifies only a method name and type descriptor. The descriptor specifies the return type of the call and the types of the method arguments.
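JDK 7's companion java.lang.invoke API exposes the same name-plus-type-descriptor resolution to Java code. A minimal sketch (the class and method names here are mine, purely illustrative):

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class IndyDemo {
    // Resolve "concat(String)" on String from just a name and a type
    // descriptor, much like an invokedynamic call site would.
    static String concatViaHandle(String a, String b) throws Throwable {
        MethodType type = MethodType.methodType(String.class, String.class);
        MethodHandle concat = MethodHandles.lookup()
                .findVirtual(String.class, "concat", type);
        return (String) concat.invokeExact(a, b);
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(concatViaHandle("dynamic ", "languages"));
    }
}
```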

See past discussion on invokedynamic here, and the JAOO 2005 slides by Gilad Bracha.

### I think the Da Vinci Machine

I think the Da Vinci Machine is an even bigger story than invokedynamic, which is really of limited use. Da Vinci should bring Java closer to having DLR capabilities via lightweight dynamic code generation (without using classloaders), which can be exploited in many different ways.

Now what I want to see next on the JVM or CLR is a way to generate class implementations at run-time...and...oh, the debugging story for dynamically generated code currently needs work (at least on the CLR; maybe they will have something better set up in JDK 7?).

### I think the Da Vinci Machine

> I think the Da Vinci Machine is an even bigger story than invokedynamic...

Only no one wrote a story about it for us (all I found is this)!

### Collectible Dynamic Types

The upcoming fourth version of the CLR has a feature called collectible assemblies for dynamic type generation which is a leap forward in support for run-time generation of class implementations.

Previous versions have allowed you to dynamically generate class (and other type) definitions, but those definitions have been outside the scope of the garbage collector and thus less than useful for certain dynamic language paradigms. The only workaround was elaborate, expensive in time and space, and required manually tracking which definitions were eligible for collection.

The new version allows collection of a dynamic type definition when all instances of that type, metadata about it, and generic types that refer to it have been collected.

### I think the Da Vinci Machine

You can already do that with the CLR and even generate debug info.

This is how debugging works with the DLR. It's not perfect and it doesn't quite work under mono, but it's getting there.

One big thing that's missing on the CLR is type classes; these would allow for much richer uses of generics and maybe even get Haskell running on top of it.

The curious thing is that type classes wouldn't be that hard to add, given the machinery already used by existing implementations.

### The last time I looked,

The last time I looked, debugging DLR code required outputting to an assembly or something, and even then there was weirdness about how to connect back to your source code so that debugging makes sense. Things keep evolving, but right now I'm still stuck with CSharpCodeProvider if I want a decent debugging experience for dynamically generated code.

Type classes...are just one way of doing things. I'm not sure something so specific should go into the CLR, and...Haskell is a static language; what does it have to do with dynamically typed languages? :)

I share your confusion about the comments above on type classes. Type classes are a static concept. I have a ghci-style interpreter written in C# for a language that relies heavily on type classes. The standard dictionary-passing translation (it ends up as interface-passing in my implementation, but the idea is the same) works just fine.
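The dictionary-passing translation mentioned here can be sketched in any language with interfaces. A minimal Java version (the Num/NumInt names and the sum example are mine, purely illustrative):

```java
import java.util.Arrays;
import java.util.List;

// The "dictionary" for a type class is just an interface value passed explicitly.
interface Num<T> {                            // the class declaration
    T add(T a, T b);
    T fromInt(int n);
}

final class NumInt implements Num<Integer> {  // the Int instance
    public Integer add(Integer a, Integer b) { return a + b; }
    public Integer fromInt(int n) { return n; }
}

public class Dict {
    // sum :: Num a => [a] -> a becomes a function with an explicit dictionary parameter
    static <T> T sum(Num<T> dict, List<T> xs) {
        T acc = dict.fromInt(0);
        for (T x : xs) acc = dict.add(acc, x);
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(sum(new NumInt(), Arrays.asList(1, 2, 3)));
    }
}
```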

What kind of support did you have in mind?

If anything, I'd rather see better support for call-by-need. I can't figure out any way to implement a lazy int (even a 30- or 31-bit one) with anything less than 16 bytes of heap space: 4 for a vtable pointer, 4 for a "SyncBlockIndex" used to implement the CLR's (misguided) ability to lock on any object including this thunk, 4 for a reference to the delegate (closure) to lazily call (using null as a flag to indicate that the result has already been memoized), and 4 to memoize the result. Compare this to some mainstream functional language compilers which get away with 4 bytes, using a bit or two for flags which control whether the remaining 30 or 31 bits are interpreted as the memoized value or as a reference to the closure to invoke. Creating such a feature requires tight integration with the garbage collector (and in the CLR, might also affect verification etc.).

Not a "dynamic language" concern, per se, but for languages with both dynamic and lazy features it would be very nice.

### Type classes under the CLR

The issue with implementing type classes with dictionary-passing is that performance will suffer for valuetypes.

Maybe it's my lack of knowledge, but I fail to see how passing interfaces will lead to an efficient implementation of Num over primitive types.

Now, on how to better implement call-by-need: with your approach you could get by with a single pointer to the delegate and replace it with the result. This should work as long as the type of the delegate is not the same as that of the result. Type tests and unboxing are pretty fast operations.

Though, support for pointer tagging would be interesting, as it has many uses, such as fast call-by-need or numeric types in dynamic languages.

### The issue with implementing

> The issue with implementing type classes with dictionary-passing is that performance will suffer for valuetypes.
>
> Maybe it's my lack of knowledge, but I fail to see how passing interfaces will lead to an efficient implementation of Num over primitive types.

Firstly, the CLR specializes code for value types, so you can't squeeze much more performance out of that. Secondly, even if it didn't specialize, this penalty is only paid for by code that needs the abstraction. You pay for the abstraction you use. But because this abstraction isn't available, people now have to resort to all sorts of convoluted and less safe hacks to get around it, and that's a tragedy.

Besides, the CLR already has a limited form of type classes:

```csharp
void Foo<T>(T bar)
    where T : Baz
{
    // call Baz methods on bar, which is an instance of T
}
```

Of course, the problem is that the CLR does not support type constructor polymorphism, so all return types from Baz must either be known, or we must resort to casting, or the return type must itself be a type parameter to Foo, i.e.:

```csharp
void Foo<T, R>(T bar)
    where T : Baz<R>
{
    // call Baz<R> methods on bar, which is an instance of T, returning type R
}
```

C#'s type inference often falls down in such cases, so it's not pretty to do this either.

> This should work as long as the type of the delegate is not the same as that of the result. Type tests and unboxing are pretty fast operations.

The lack of typesafe union types really does make this harder than it need be. Currently, only non-pointer fields can overlap.

### Re: The issue with implementing

> Firstly, the CLR specializes code for value types, so you can't squeeze much more performance out of that. Secondly, even if it didn't specialize, this penalty is only paid for by code that needs the abstraction. You pay for the abstraction you use. But because this abstraction isn't available, people now have to resort to all sorts of convoluted and less safe hacks to get around it, and that's a tragedy.

Thinking more about it, yes, it's pretty easy to encode a limited form of type classes using interfaces in such a way that valuetypes are not boxed.

For example:

```csharp
interface Num<T> { T Add (T a, T b); }

struct IntAdd : Num<int> {
    public static readonly IntAdd instance = new IntAdd ();
    public int Add (int a, int b) { return a + b; }
}

T Double<T, K> (K typeClass, T a, T b) where K : struct, Num<T> {
    return typeClass.Add (a, b);
}

// Calling it as:
int res = Double (IntAdd.instance, 10, 20);
```

The Mono JIT can reduce this to "res = 30", so it's definitely possible to produce a decent enough encoding.

On the other hand, it will produce an interface dispatch for reference types, and this is not ideal, at least on Mono, where interface calls are slower than virtual calls.

### OTOH it will produce an

> On the other hand, it will produce an interface dispatch for reference types, and this is not ideal, at least on Mono, where interface calls are slower than virtual calls.

That's a serious mono deficiency, if true. I can't think of any reason interface dispatch should be slower than a virtual call. They're both just dispatch through a vtable.

Note also that the CLR is worse than the JVM at supporting type-class-like abstraction, because the CLR decided to do away with static interface members, allegedly because they weren't OO enough (they preferred factory patterns). Your example could have been written like this:

```csharp
T Double<T, K> (T a, T b) where K : Num<T>
{
    // with static interface members: return K.Add (a, b);
}
```

The way I've done it given the CLR limitations:

```csharp
interface Num<TSelf, TNum> where TSelf : struct, Num<TSelf, TNum>
{
    TNum Add (TNum a, TNum b);
}

struct IntNum : Num<IntNum, int>
{
    public int Add (int a, int b) { return a + b; }
}

T Double<T, K> (T a, T b) where K : struct, Num<K, T> {
    return default (K).Add (a, b); // or lift default(K) to a static member if possible
}
```

I haven't actually measured the costs associated with default(K) and dispatching based on jitted struct call sites, so they may be prohibitive. Where I've used this pattern, I generally lifted the dictionary to a static member to amortize any costs. If you can do this, you can remove the dependency on TSelf being a struct and make it a new() constraint instead, so it can be a class type if that's needed.

C#'s type inference can also get confused when you have nested type parameters, so it's not always the most pleasant. As you can tell, I've explored this encoding quite a bit. :-)

### Interface dispatch is not

Interface dispatch is not the same as virtual dispatch.

Since types can implement any number of interfaces, there is no simple fixed offset for vtable methods.

Mono originally implemented interface dispatch by assigning a globally unique offset to each interface, so each one would always be at the same vtable offset.

This gives interface calls the same performance as virtual calls, but it has a very nasty side effect: generics cause an explosion in the number of interfaces, so vtables started to get pretty huge.

Our current solution uses IMTs (Interface Method Tables), which allow pretty fast dispatch in most cases with a slight slowdown for types with tons of interfaces. It's slower, but it saved about 2 MB of vtable memory in a desktop media player application (Banshee).

This OOPSLA paper [1] explains the problem in greater depth. Mono originally did what it describes as selector indexed tables.

On the bright side, the IMT machinery was leveraged to implement generic virtual dispatch in a way that is a bit more performant than on the MS VM.

Static methods on interfaces would pretty much mean passing around method tables, which would be pretty useful indeed.

[1] http://www.research.ibm.com/people/d/dgrove/papers/oopsla01.html

### Constant-time interface dispatch

Resolving an interface vtable from the concrete class might be an expensive operation, but typically the layout of the interface vtable is known at any call site, so dispatch itself is not the expensive operation. Come to think of it, interface vtable resolution can also be made constant-time.

```csharp
interface IFoo {
    int Bar ();
}
class Baz : IFoo {
    public int Bar () { return 0; }
}
```

The metadata describes a tree of vtables rooted at the concrete class:

```csharp
// metadata for class Baz
struct MetaBaz {
    vtable methods;    // vtable for concrete instances
    Meta super;        // metadata ref for down-casting to the super class
    Meta[] interfaces; // metadata array for down-casting to an interface
    //...
}
// a MetaIFoo instance allocated for each implementor
struct MetaIFoo {
    vtable methods;
    //...
}
Baz_IFoo_slot = 0; // the IFoo vtable has a compile-time-known slot
```


When compiling, you should always be able to assign a unique slot for a given interface for a concrete class (not a slot global to all classes, just per class). Since you know the concrete class at all down-casting sites, resolving an interface vtable from the concrete class should be constant time.

I don't think generics would significantly impact this picture. Assuming a class implements a generic interface, it would simply have a new Meta struct entry in its interfaces array for each generic instantiation of the interface. These are all simple pointer arrays, so I don't think space is an issue. A generic class would need its own metadata struct anyway for each instantiation, and so would get its own set of vtables too.

The downside is that at runtime, the Meta pointer for interface parameters is passed along with a reference to the data; essentially, dictionary-passing type classes. The upside: these are all constant-time operations. There is additional register pressure from passing these extra vtable arguments, but I don't think I've seen the costs of this sort of design actually quantified, as compared to IMTs. The across-the-board constant-time operations are certainly seductive.

I think I first came across this sort of encoding in relation to .NET when browsing the LLVM docs back around 2004.
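For illustration, the fat-pointer scheme sketched above can be simulated in plain Java, with an explicit vtable object paired with the data reference (all names here are invented; a real implementation would live inside the VM, not in user code):

```java
// An interface-typed reference is a (vtable, data) pair, so dispatch is one
// indirect call through the vtable: no search, no IMT probe.
interface FooVtable { int bar(Object self); }

final class FatRef {
    final FooVtable vtable;  // resolved once, at the up-cast site
    final Object data;
    FatRef(FooVtable vtable, Object data) { this.vtable = vtable; this.data = data; }
    int bar() { return vtable.bar(data); }  // constant-time dispatch
}

class Baz { int n = 41; }

public class FatDemo {
    // the per-(class, interface) vtable, known statically wherever Baz is up-cast
    static final FooVtable BAZ_AS_FOO = self -> ((Baz) self).n + 1;

    public static void main(String[] args) {
        FatRef r = new FatRef(BAZ_AS_FOO, new Baz());
        System.out.println(r.bar());
    }
}
```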

### I have no idea of the

I have no idea of the performance difference of implementing interfaces using dictionary passing.

Dispatch would certainly be faster than with IMTs, but the cost of passing the dictionary around is non-trivial. It adds to register pressure and adds more weight to method calls. But what worries me about such an approach is the increased cost of usually fast operations.

Another issue is that a cast from object to an interface type would turn out to be pretty expensive.

The issue with the number of types is a problem with selector indexed dispatch, since there are way more interfaces.

Generics make things more complex because you have to consider variance, which results in weird dispatch semantics for types with ambiguous interfaces.

### Dispatch would certainly be

> Dispatch would certainly be faster than with IMTs, but the cost of passing the dictionary around is non-trivial.

I've seen this stated elsewhere, but I've not seen this cost quantified. Without measurement, it's all just speculation. I'm not convinced that the costs of IMT are lower than pairing a vtable with interface parameters.

> It adds to register pressure and adds more weight to method calls.

I'm not sure what you mean. It adds a little more overhead to using interfaces, but as you've said, there is already overhead associated with interfaces. The question is, which has less average overhead, and better worst case characteristics?

> Another issue is that a cast from object to an interface type would turn out to be pretty expensive.

Hmm, I'm not sure I understand this objection. In what way are casts more costly when using dictionary-passing?

[Edit: the LLVM notes I mentioned are here: vtables and type info, interfaces. Based on a quick skim, it's not exactly what I described, but a variation.]

### Those Lattner's emails you

Those Lattner emails you pointed to don't deal with all the ugly details of his approach, nor do they account for generics sharing or variance.

Anyway, the cost of using an [iface vtable, object] pair is pretty simple to measure.

For calls on x86, using the Linux ABI, it adds one extra memory store for each interface parameter and for the return value. For regular code it certainly has some cost.

On register-rich targets such as PPC or IA-64 the hit should be softer.

The issue with the object-to-iface cast can be illustrated with the following piece of code:

```csharp
void dispatch (object obj) { ((IFace)obj).Test (); }
```

The JIT has to resolve the iface vtable from an unknown object. Either the runtime holds a huge per-type vector of interface vtable offsets, or it does a binary search on a sorted list of interfaces.

Even if the binary search is done through a generated code thunk it won't be as fast as a simple type test.

There are other issues, such as how interface references are stored on the heap. You can either store them as boxed [iface vtable, object] pairs, using more memory and slowing down and complicating reference comparisons; or store them directly and have to retrieve the iface vtable from an unknown object all the time.

There isn't much point in arguing that retrieving the iface vtable of a static target can be reduced to pushing a constant, since in all those cases it's trivial to convert such an interface call into a virtual call and completely bypass the interface dispatch machinery.

### For calls on X86, using the

> For calls on x86, using the Linux ABI, it adds one extra memory store for each interface parameter and for the return value. For regular code it certainly has some cost.

I'm not disagreeing with you here: dictionary-passing code does have overhead with respect to fully specialized code. I simply take issue with the presumption that IMT would be faster than dictionary-passing, as I have yet to see dictionary-passing quantified in a way that we can relate to IMT implementations.

I reviewed the paper again, and in fact "directly indexed tables" were quite literally a hair's breadth slower than IMT. Considering the former involves a linear search of an interface array, and reshuffling the array to cache most recently used interfaces, I'm even more skeptical that dictionary passing couldn't be just as good, despite the register pressure.

I'm also confused as to why you originally said interface dispatch is slower than a virtual call, considering the paper's measurements indicate that the performance difference between interface dispatch and a virtual call is actually very small.

> `void dispatch (object obj) { ((IFace)obj).Test (); }`
>
> [...] or it does a binary search on a sorted list of interfaces.

Firstly, I believe that such code sequences are very rare, and indeed should be rare, and probably should even suffer a penalty to discourage their use. Secondly, the paper's measurements demonstrate that IMT performance is barely faster than a linear search of an ordered interface array, so considering that such sequences are rare anyway, your argument that a "simple type test" would be faster does not sound very convincing.

> You can either store them as boxed [iface vtable, object] pairs, using more memory and slowing down and complicating reference comparisons; or store them directly and have to retrieve the iface vtable from an unknown object all the time.

I was suggesting the iface vtable be passed as an extra parameter wherever interfaces are used. Essentially, it's an (object ref, iface vtable) fat pointer. This includes being stored in object fields. We know statically all points at which only an interface type is known, so we know all points where a fat pointer would be required.

### I'm not disagreeing with you

> I'm not disagreeing with you here: dictionary-passing code does have overhead with respect to fully specialized code. I simply take issue with the presumption that IMT would be faster than dictionary-passing, as I have yet to see dictionary-passing quantified in a way that we can relate to IMT implementations.

Without someone actually measuring both in a way that can be related, it's pretty hard to quantify. I'm a bit skeptical about dictionary-passing for a CLR-like VM.

> I reviewed the paper again, and in fact "directly indexed tables" were quite literally a hair's breadth slower than IMT.

I'm not sure about that, but doing what they call "selector indexed tables" is strictly faster than IMTs for collision cases, since it doesn't have collisions and allows dispatch with two memory loads. The only issue is the memory use.

> Firstly, I believe that such code sequences are very rare, and indeed should be rare, and probably should even suffer a penalty to discourage their use. Secondly, the paper's measurements demonstrate that IMT performance is barely faster than a linear search of an ordered interface array, so considering that such sequences are rare anyway, your argument that a "simple type test" would be faster does not sound very convincing.

Quite the opposite: this kind of sequence is very common in C# code, for two reasons: first, because it's common to have code that does interface probing, and second, because most real-world code sucks.

Maybe our experience with Mono (I work on it) can shed some light on this. We use fixed-size 32-entry tables, and the rate of slots with collisions is not big [1], nor is the number of colliding methods per slot.

The major shortcoming on our implementation is that we handle synthetic array interfaces very naively.

[1] The worst of our benchmarks has a 30% collision rate due to heavy use of array interfaces.

### first because it's common to

> first, because it's common to have code that does interface probing

I'm curious, what sort of code does "interface probing" that wouldn't simply place the interface constraint in the function parameter itself? I develop primarily in .NET nowadays, and I don't think I've ever done such probing. If I did and simply don't remember, I almost certainly never wrote a program whose performance would depend critically on a dynamic type test.

### Framework code does it all

Framework code does it all the time for optional interfaces, especially when reflection is part of the mix.

Non-generic collections and serialization are good examples of this. The former does it for sorting and all operations that require the objects to implement IComparable, the latter to check for ISerializable and such.

As a ballpark figure, Mono's mscorlib has about 508 type tests for interface types.
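The probing pattern being described looks roughly like this in Java terms (a hypothetical non-generic routine testing each element for an optional interface; the Probe/max names are mine):

```java
import java.util.Arrays;
import java.util.List;

public class Probe {
    // A non-generic routine that discovers at run time whether each element
    // supports an optional interface; the instanceof check is the "probe".
    @SuppressWarnings({"unchecked", "rawtypes"})
    static Object max(List<?> items) {
        Object best = null;
        for (Object item : items) {
            if (!(item instanceof Comparable))  // the interface type test
                throw new IllegalArgumentException("not comparable: " + item);
            if (best == null || ((Comparable) item).compareTo(best) > 0)
                best = item;
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(max(Arrays.asList(3, 1, 4, 1, 5)));
    }
}
```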

### Non generics collection and

> Non-generic collections and serialization are good examples of this.

Serialization is definitely a good example. Still, type check costs are very unlikely to dominate when you're using reflection.

I'm not sure how non-generic collections come into play here. Perhaps you mean probing for IComparable and the like?

Actually, one example of probing which is probably in widespread use is access to EqualityComparer<T>.Default and Comparer<T>.Default. However, the results of this probe are trivially cached within EqualityComparer and Comparer respectively, using a static variable, though you might know better whether generics code sharing poses a problem here.

### Call-by-need again

As you said, this is a reasonable way to implement call by need. (Have a single field of type object and store either a (possibly boxed) reference to the result of the call or a reference to the delegate to call.)

I may switch to this one day if the profiling indicates it would help a lot. The drawback, a fairly large one in my opinion, is that the "real" type of the parameter is now hidden. This is especially annoying when code written in the call-by-need language is exposed to callers from other CLR languages.
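As a sketch of that single-field scheme (in Java rather than on the CLR; the Lazy class and its names are invented for illustration), with the same caveat noted above that a result which is itself a delegate would be misidentified:

```java
import java.util.function.Supplier;

final class Lazy<T> {
    private Object state; // a Supplier<T> before forcing, the (boxed) result after

    Lazy(Supplier<T> thunk) { this.state = thunk; }

    @SuppressWarnings("unchecked")
    T force() {
        // The type test stands in for a tag bit; this breaks if T itself
        // is a Supplier, which is exactly the caveat discussed above.
        if (state instanceof Supplier) {
            state = ((Supplier<T>) state).get(); // overwrite the thunk, memoizing the result
        }
        return (T) state;
    }
}

public class LazyDemo {
    public static void main(String[] args) {
        Lazy<Integer> answer = new Lazy<>(() -> 41 + 1);
        System.out.println(answer.force());
    }
}
```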

### Generating class implementations at runtime

> Now what I want to see next on the JVM or CLR is a way to generate class implementations at run-time

Perhaps I've misunderstood, but Java has had support for this since 1.3 (which must be about a decade old by now!) in the shape of the java.lang.reflect.Proxy class. There are plenty of long-standing third-party libraries, such as BCEL and ASM, with similar goals too.

EDIT: corrected the link to the Proxy class.
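For reference, a minimal use of java.lang.reflect.Proxy looks like this (the Greeter interface and handler logic are hypothetical):

```java
import java.lang.reflect.Proxy;

public class ProxyDemo {
    interface Greeter { String greet(String name); }

    // Build an interface implementation at run time from an InvocationHandler
    // lambda; no hand-written class implementing Greeter exists anywhere.
    static Greeter makeGreeter() {
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[] { Greeter.class },
                (proxy, method, args) -> "hello, " + args[0]);
    }

    public static void main(String[] args) {
        System.out.println(makeGreeter().greet("world"));
    }
}
```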

### I meant in a lightweight,

I meant in a lightweight, garbage-collector-friendly way (which is the whole point of the DLR and Da Vinci). Sun has had dynamic linking forever, but it has always gone through a classloader, which means you can't do it often and you have to be very careful about lifetimes. Da Vinci (like the DLR) adds support for lightweight dynamic code loading, where the code you load is just a plain old object tracked by the garbage collector. You can create as many as you need, and once you forget them they are gone. That was never the case with the classloader approach.