Why can't you push instructions in MSIL/JVML?

Doesn't it seem like a good idea to be able to push instructions onto the stack in the MSIL and JVML, and evaluate them later on?

Optimization

The thing is, JIT compiling isn't exactly just-in-time; most of it is done at startup. I think they might even optimize a bit as it runs, but the point is that dynamic stuff would play havoc with the decidability of the system. It might also lead to some inadvertent security holes.

It doesn't have to be dynamic.

You don't have to resort to dynamic tricks to allow pushing instructions onto the stack. You can add the feature in a type-safe way (see http://www.cat-language.com/typing_stacks.pdf ). Since the checking can be done at compile time, I doubt it would lead to any security flaws.

I wonder if it is the case that many designers of stack-based languages make the same assumption?

When JIT happens

JIT compiling isn't exactly just-in-time; most of it is done at startup.

Do you have a citation for that? I have heard the opposite from a few former JavaSoft employees. IIRC they said that nothing happens at startup and nothing gets compiled until it has been interpreted 10,000 times.

Just In Time

"Remember how HotSpot works. It starts by running your program with an interpreter. When it discovers that some method is "hot" -- that is, executed a lot, either because it is called a lot or because it contains loops that loop a lot -- it sends that method off to be compiled. After that one of two things will happen, either the next time the method is called the compiled version will be invoked (instead of the interpreted version) or the currently long running loop will be replaced, while still running, with the compiled method. The latter is known as "on stack replacement", or OSR."

Frequently Asked Questions About the Java HotSpot VM version 1.4
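
To make the quoted behaviour a little more tangible, here is a small, purely illustrative Java program (the class name and loop bound are arbitrary choices of mine). Running it with HotSpot's -XX:+PrintCompilation flag prints methods as they get compiled, and a long-running loop like this one is exactly the kind of code that gets swapped out via on-stack replacement.

// Illustrative sketch only: a loop that becomes "hot" while it is running.
// Run with:  java -XX:+PrintCompilation HotLoop
// HotSpot logs each compilation as it happens; OSR compilations are flagged
// in that output.
public class HotLoop {
    public static void main(String[] args) {
        long sum = 0;
        // The loop runs long enough for HotSpot to decide it is hot and to
        // replace the interpreted frame with compiled code mid-loop (OSR).
        for (int i = 0; i < 100000000; i++) {
            sum += i;
        }
        System.out.println(sum);
    }
}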

What does this mean exactly?

I'm not clear what this means. What do you want to do that you can't already do with class loaders? Please don't say "push bytes onto the stack" because that's an implementation technique. What is the problem you're trying to solve?

"I'm not clear what this

I'm not clear what this means.

Treat op-codes as data, and have op-codes for executing and composing op-codes to create anonymous functions.

I want something like (in MSIL):

push(Ldc_I4_2) // push the instruction that pushes 2
push(Mul)      // push the multiply instruction
compose        // combine the two instructions into an anonymous function
Ldc_I4 4       // push the integer 4
swap           // swap the 4 and the anonymous function
eval           // evaluate the anonymous function; 8 is left on the stack

What do you want to do that you can't already do with class loaders?

Class loaders are overkill when you just want to make small lightweight anonymous functions.
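
To illustrate why that route feels heavyweight, here is a sketch of it (the names are made up for the example, and producing the class-file bytes, which would normally be done with a bytecode library such as ASM, is deliberately left out):

// Sketch: to get one small anonymous function today you must fabricate an
// entire class file and hand it to a ClassLoader.
public class TinyFunctionLoader extends ClassLoader {

    // defineClass is the standard protected ClassLoader hook that turns raw
    // class-file bytes into a loaded class.
    public Class<?> defineFunctionClass(String name, byte[] classFileBytes) {
        return defineClass(name, classFileBytes, 0, classFileBytes.length);
    }
}

// Hypothetical usage, assuming 'bytes' holds a generated class file whose
// single static method 'apply' multiplies its argument by 2:
//
//   Class<?> fn = new TinyFunctionLoader().defineFunctionClass("MulByTwo", bytes);
//   Object result = fn.getMethod("apply", int.class).invoke(null, 4);   // 8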

What is the problem you're trying to solve?

Efficiently convert a functional stack-based language into MSIL/JVML.

Security

The JVM, at least, ensures type safety by being able to statically determine the type of all stack entries and the maximum size of each stack frame. This is done not just during compilation, but also at classfile load-time, by the bytecode validator. The sort of scheme you describe would either make that completely impossible, or would vastly increase the runtime cost of bytecode validation.
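
For readers who haven't looked inside a class file lately, a small illustration of what the verifier relies on (the method is arbitrary; the exact numbers come from whatever the compiler emits):

// Every method's Code attribute declares its maximum operand stack depth and
// local-variable count up front, which is what load-time verification checks
// against.  Compiling this and running `javap -v` on the class shows a line
// like "stack=2, locals=1" for the method below.
class Verified {
    static int twice(int x) {
        return x * 2;   // iload_0, iconst_2, imul, ireturn -- stack never exceeds 2
    }
}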

That isn't true

The sort of scheme you describe would either make that completely impossible, or would vastly increase the runtime cost of bytecode validation.

This is not true. The byte-code verifier simply needs to type-check the code, and it is easy to extend the type system to include higher-order instructions (instructions operating on instructions). This is what all of my work on Cat has been about. I'll point again to my paper, which describes how a type system would look if you added higher-order instructions to a stack language: http://www.cat-language.com/typing_stacks.pdf

To provide a concrete example of how such a type system works, here is the previous example I posted, but with a description of the type configuration of the stack after each instruction:

push(Ldc_I4_2)  // push the instruction that pushes 2
// types on stack: ( -> int)

push(Mul) // push the multiply instruction
// types on stack: ( -> int) (int int -> int)

compose // compose instructions
// types on stack: (int -> int) 

Ldc_I4 4 // push 4
// types on stack: (int -> int) int=4

swap // swap top two items
// types on stack: int=4 (int -> int) 

eval // evaluate the anonymous function 
// types on stack: int=8

A previous implementation of Cat had a fully working type inference engine that did precisely what I describe (see: http://www.codeproject.com/csharp/cat.asp). The overhead of type-checking higher order functions was not huge and it was definitely not impossible.
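
To give a rough feel for what such checking involves, here is a toy sketch (in Java, purely for illustration; it is not the Cat implementation and handles only monomorphic stack effects, with no type variables) of the one interesting rule, composing two stack effects:

import java.util.ArrayList;
import java.util.List;

// Toy checker for composing two stack effects of the form (inputs -> outputs).
// Types are plain strings and the top of the stack is the END of each list.
final class Effect {
    final List<String> in, out;
    Effect(List<String> in, List<String> out) { this.in = in; this.out = out; }
    @Override public String toString() {
        return "(" + String.join(" ", in) + " -> " + String.join(" ", out) + ")";
    }
}

final class EffectChecker {
    // compose f;g: run f, then g.  g first consumes what f produced; anything
    // it still needs must come from deeper in the stack and becomes an input
    // of the composite.  A mismatch is a verification error.
    static Effect compose(Effect f, Effect g) {
        List<String> produced = new ArrayList<>(f.out);    // what f left on top
        List<String> extraIn = new ArrayList<>();           // what g needs beyond that
        for (int i = g.in.size() - 1; i >= 0; i--) {         // consume from the top down
            String needed = g.in.get(i);
            if (!produced.isEmpty()) {
                String got = produced.remove(produced.size() - 1);
                if (!got.equals(needed))
                    throw new IllegalStateException("type error: " + got + " vs " + needed);
            } else {
                extraIn.add(0, needed);                      // taken from deeper in the stack
            }
        }
        List<String> in = new ArrayList<>(extraIn);
        in.addAll(f.in);
        List<String> out = new ArrayList<>(produced);
        out.addAll(g.out);
        return new Effect(in, out);
    }

    public static void main(String[] args) {
        Effect pushTwo = new Effect(List.of(), List.of("int"));          // ( -> int)
        Effect mul = new Effect(List.of("int", "int"), List.of("int"));  // (int int -> int)
        System.out.println(compose(pushTwo, mul));                       // prints (int -> int)
    }
}

Running it on ( -> int) and (int int -> int) prints (int -> int), matching the annotated step above; a real verifier would additionally need type variables so that generic instructions like swap and eval can be given useful types.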

The addition of higher order instructions would also eliminate the need for generating lots of small class-files when compiling functional languages (like Scala).

I really appreciate this feedback. It is very helpful for me to identify how my work on Cat is relevant to the language community at large, and how to explain the most salient points.

Very impressive

I clearly spoke far too quickly, based on a guess of how hard higher-order type inference was and some knowledge of just how eager the JVM teams were to shave every cycle off of bytecode validation times. Does your solution work for recursive inline functions? Does it handle non-local returns, like Scala and some of the Java closure proposals require? Those would seem to make the type-inference algorithm much less tractable.

The addition of higher order instructions would also eliminate the need for generating lots of small class-files when compiling functional languages (like Scala).

If closures get added to Java 7, it seems most likely that some solution to this problem will be added to Java 8, as people see how the number of small classfiles explodes. I'd bet on a special cut-down class file format for single method anonymous classes, rather than higher-order bytecodes, but would be happy to be surprised.

Thanks!

Thanks for the compliment.

Does your solution work for recursive inline functions?

No, the paper's type system is quite trivial: just higher-order functions and primitives. This problem is next on my plate, but I believe it to be simply a matter of detecting circular dependencies while performing constraint resolution; in effect, a graph cycle detection problem.
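
For concreteness, the cycle detection itself is just the textbook algorithm; a minimal sketch follows, where the graph is my stand-in for the constraint graph and an edge means "this type variable's solution depends on that one" (that reading of the problem is mine, not an implemented part of Cat):

import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Minimal DFS cycle detection over a directed dependency graph.  A cycle
// among type-variable constraints would signal a recursive, self-referential
// type that needs special handling.
final class CycleDetector {
    static boolean hasCycle(Map<String, List<String>> graph) {
        Set<String> done = new HashSet<>();      // fully explored nodes
        Set<String> onPath = new HashSet<>();    // nodes on the current DFS path
        for (String node : graph.keySet())
            if (dfs(node, graph, done, onPath)) return true;
        return false;
    }

    private static boolean dfs(String node, Map<String, List<String>> graph,
                               Set<String> done, Set<String> onPath) {
        if (onPath.contains(node)) return true;   // back edge: cycle found
        if (done.contains(node)) return false;
        onPath.add(node);
        for (String next : graph.getOrDefault(node, List.of()))
            if (dfs(next, graph, done, onPath)) return true;
        onPath.remove(node);
        done.add(node);
        return false;
    }

    public static void main(String[] args) {
        // 'a' depends on 'b' and 'b' depends on 'a': a recursive constraint.
        System.out.println(hasCycle(Map.of("a", List.of("b"), "b", List.of("a"))));  // true
    }
}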

Does it handle non-local returns, like Scala and some of the Java closure proposals require?

In its current form, no. I believe this would simply require the addition of reference or pointer types to the type system. From a Cat perspective, it would result in a non-pure type system, though that is not necessarily a bad thing. I will look into this problem.

I'd bet on a special cut-down class file format for single method anonymous classes, rather than higher-order bytecodes, but would be happy to be surprised.

I hope they don't go that route; I doubt the performance would be as good as with higher-order instructions. If I am not mistaken, it would be harder to perform higher-order optimizations such as deforestation and function inlining.

Whatever happens, I hope others build on the groundwork I laid out in the paper.

Seems like this is what

Seems like this is what ILGenerator.Emit does. You can emit CIL instructions onto an ILStream and then have that stream saved to a file or just-in-time compiled and executed.

When I worked at Microsoft, we exploited this feature along with many other features found in the Reflection.Emit namespace to implement IronPython.

Huge overhead

That is correct, the ILGenerator.Emit approach is currently the only way to achieve the effect. It is, however, several orders of magnitude slower than what can be done with a simple extension of the type system with higher-order instructions. The Emit approach requires re-entering the compiler every time you compose two instructions together.

The code example I gave can be compiled into only a couple of assembly instructions on a single pass of the compiler. All that is needed is a more sophisticated type system for the MSIL byte code.

This may be the case if you

This may be the case if you use ILGenerator alone, but it's pretty simple to wrap around the Reflection.Emit namespace to implement lightweight anonymous functions/methods on the fly that are JIT'd very quickly. You can cache the result of previous JITs to avoid re-entering the compiler all the time if you like.

You can have a look at IronPython's source code to see how we did it (http://codeplex.com/IronPython)

JIT'd quickly

but it's pretty simple to wrap around the Reflection.Emit namespace to implement lightweight anonymous functions/methods on the fly that are JIT'd very quickly

I just don't see how reentering the compiler (no matter how much caching you do) is possibly comparable to the performance that you can achieve with lambda expressions if you have higher order instructions and compile the code once. In most assembly code, you can execute any data as code, so why not do it?

I would be curious though to look at the JIT caching code in IronPython, can you point me to the appropriate files?

You're absolutely correct

You're absolutely correct that re-entering the compiler repeatedly will be a performance hit. You can avoid re-entering the compiler by saving the result of the JIT into a DynamicMethod object so that you don't constantly re-compile it over and over again.

That's not to say that your idea of higher-order instructions couldn't provide a neat and elegant way of doing the same thing. For example, right now to create a DynamicMethod there is still some bulky work that needs to be done (example: http://msdn2.microsoft.com/en-us/library/exczf7b9.aspx); it's certainly not as clean and straightforward as the pseudocode you provided.

But in terms of performance, higher order instructions would still need to be compiled before they could be executed.

These aren't the stacks you're looking for. Move along.

Another aspect of the same problem is that a JVM function must leave the stack in the same condition it found it in, while a Cat function need not do so.

The most obvious reason for this is historical: the Java language compiler writers had no need for such a feature and so the JVML didn't have to support it. Now, if Java were a functional programming language, I can imagine higher-order functions getting a little more love in the JVML (which, I believe, would help you out a little). For example, "mul" would probably not be a low-level instruction that modifies the stack, but a regular function that could be pushed onto the stack and then eval'd. However, I still don't see a fundamental need for first-class status for stack-manipulation instructions (push, pop, swap, load-local, etc.).

The JVM stack is simply a way to track temporary results without having to give them explicit names (reminds me a little of De Bruijn numbering, actually). If the JVML designers had chosen a three-address-code-style IR (3AC), there wouldn't even be any stack manipulation instructions. In fact, JVML is just a space-optimized version of a more canonical 3AC IR.
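
A trivial illustration of that claim (the temporaries t1..t3 are invented names for the sketch; the opcodes are what a Java compiler actually emits for a method of this shape):

// The same computation, once as the JVML stored in the class file and once as
// the three-address form a JIT would typically recover from it.
class StackVersusThreeAddress {
    static int mul(int a, int b) {
        //   JVML (stack form)      3AC form
        //   iload_0                t1 := a
        //   iload_1                t2 := b
        //   imul                   t3 := t1 * t2
        //   ireturn                return t3
        return a * b;
    }
}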

Also, note that many (all?) JIT compilers convert JVML to a 3AC-style IR internally. A stack IR with higher-order stack-manipulation instructions couldn't be converted the same way.

Don't feel singled out :) You may have noticed that almost every non-Java compiler that targets JVML/MSIL ends up emitting non-optimal crap to emulate language features that don't fit in the Java/C# model (Python/Ruby-style dynamic method calls, higher-order functions, lazy evaluation, first-class continuations, disjoint unions, etc.).

I'd like to think that we can come up with an IR that can better handle the variety of languages out there (adding special support just for stack-manipulation functions obviously isn't a general solution). What sounds particularly attractive is an IR that lets you provide safety proofs to help the verifier and JIT. For example, the Cat compiler could emit code that manipulates a standard stack data structure but also emit a proof that would allow the JIT to omit overflow/underflow checks and maybe even lay out the data on the standard call stack. Based on my wild hand-waving, you've probably realized that I have no idea how this could actually be accomplished :)

Machine code is a functional stack-based language

Also, note that many (all?) JIT compilers convert JVML to a 3AC-style IR internally. A stack IR with higher-order stack-manipulation instructions couldn't be converted the same way.

Sure it can. It's just a bit more work.

Besides, what is it that JIT compilers compile to in the end? They compile to machine code. And what is this machine code? At least on my machine, it is a functional stack-based language. All operations are first-class values: they can be passed as data and executed.

Part of the point of the Cat type system is to leverage this fact by assigning types to stack operations (i.e., opcodes).

For example, the Cat compiler could emit code that manipulates a standard stack data structure but also emit a proof that would allow the JIT to omit overflow/underflow checks ...

Well-typed code in Cat guarantees no underflow. The type system has to do something right ;-)

As for overflow, that is an open question. It would be easy to show that some code only ever uses X stack slots, but some operations (recursion, for example) require unbounded stack access and always have the potential for overflow. Maybe a more sophisticated type system that uses dependent types could help in this regard.

Maybe you should compile directly to machine code?

Also, note that many (all?) JIT compilers convert JVML to a 3AC-style IR internally. A stack IR with higher-order stack-manipulation instructions couldn't be converted the same way.

Sure it can. It's just a bit more work.

The current JVML's IR is isomorphic to a 3AC IR and thus can be trivially translated. With higher-order stack manipulation, this is no longer the case. Though it's easy to write a transformation that preserves the execution semantics, it's a little trickier to preserve optimization opportunities.

Besides, what is it that JIT compilers compile to in the end? They compile to machine code. And what is this machine code? At least on my machine, it is a functional stack-based language. All operations are first-class values: they can be passed as data and executed.

Hmm...that's interesting. Now I'd actually be more interested in seeing how Cat maps to a real machine language than how it maps to JVML/MSIL. This would position Cat as a stack-based alternative to JVML/MSIL (both of which I still consider to be 3AC IRs that have been encoded as stack-like IRs for space-efficiency reasons).

I do have my doubts that modern instruction sets will actually work like a "functional stack-based" language. RISC instructions usually operate on registers and not the stack, there's no lightweight way to "eval" an individual instruction, etc. But these may only be superficial problems; I imagine you've thought about this more than I have.

Unfortunately, I suspect that even if you could find an elegant way to compile Cat, you may be the victim of the fact that, historically, stack-based languages have not been popular. A lot of processor design work has probably gone into optimizing for the needs of Fortran and C compilers, so even if Cat-compiler-produced code were fundamentally more efficient, you may not see the performance benefit on current processors. Of course, this would still be an interesting result.

As for overflow, that is an open question. It would be easy to show that some code only ever uses X stack slots, but some operations (recursion, for example) require unbounded stack access and always have the potential for overflow. Maybe a more sophisticated type system that uses dependent types could help in this regard.

If you're interested: http://citeseer.ist.psu.edu/hughes99recursion.html

This paper describes a variant of ML that incorporates space bounds tracking.