Marrying VMs

VMKit is an LLVM project. From the Proceedings of the 2008 LLVM Developers' Meeting:

VMKit is an implementation of the Java and .NET Virtual Machines that use LLVM to optimize and JIT compile the code. This talk [slides, video] describes how VMKit integrates components from various systems, how bytecode translation works, describes the current performance status of the system, and discusses areas for future extension.

much appreciated

The topic post here hasn't generated any discussion yet, but I wanted to let the poster (C. Stewart) know that it was much appreciated by at least one reader and may (we'll see) have changed some of my technical plans. LLVM is a nifty concept and this is a swell demonstration that it is more than just a concept.



Make that 2. I found it utterly fascinating.

Two replies to a reply *is* discussion

Many thanks, Tom & Matthew. One does sometimes wonder...

Tom: I'm curious, does this have anything to do with Pika, or is that abandoned?

re pika

Pika isn't abandoned, exactly, but that specific code branch is a now-pruned sub-tree of the search space - abandoned code, sadly, but not abandoned work. I got a lot of good ideas from the effort to make Pika. Had I been better financed during the project it probably would have been completed, but I wasn't, so it wasn't. I took my lessons, cut my losses, and am now reconfiguring those ideas and mixing in some later ones.

There are some - (this is perhaps my informal "thesis") - overlooked "sweet spots" in the design of run-time systems and virtual machines - designs so "sweet" that they have non-linearly better quality than what people are used to yet are simple and tractable. We'll see whether or not I eventually prove my thesis by making such a thing manifest :-)

Thanks for asking.



Another project with a similar goal is the Parrot VM, which recently made it to 1.0.

Multiple VMs

I think you may have missed the point of the story, which I expect would be my fault for not being clear, and so I shall attempt to remedy this here.

If I understand its goals correctly, Parrot is designed to be a target VM for a number of popular dynamically-typed PLs. Both LLVM and CLR are similarly designed as targets for many languages, with JVM having become a target of several languages through their implementors' ingenuity. The difference between these various multi-PL, single-VM ecologies is to do with which PLs they support, what kind of runtime support they give (e.g., JIT compilers, libraries, object reflection), and how good a fit the VMs are for the PLs they support.

The aim of VMKit is to provide a run-time system based on LLVM that can support several running VMs concurrently. By so doing it provides infrastructure that should be able to support interaction between programs running on the separate VM instances much better than OSs can; for instance, a single system for managing state might support much cleaner management of resources used by more than one VM.

dfn of "VM" breaks down

There is still, at the lowest level, the LLVM VM. It just happens that they can somewhat parsimoniously implement the instruction sets and memory models of other VMs on top of that. Is that multiple VMs? Or one VM that can be targeted via multiple intermediate formats?

In that sense, a comparison to Parrot is apt.

I think the present work gives a solid boost, though, to the hypothesis that "a very low level VM makes a good foundation."



I'd say that any code that executes other code that conforms to a VM specification is an instance of the VM. Here VMKit is extra runtime supporting LLVM that instantiates VMs that are almost completely conformant to the JVM and CLR specs. So it is two VMs and one VM at the same time.

I'm sure one doesn't have to look far to find well-established definitions of what a VM is that contradict mine.

ah... it's Parrot confusion (?)

Ok, perhaps the confusion is the result of something you may not know about Parrot?

For example, someone (or some people) is building JVM support for Parrot in the form of a translator from JVM object files to Parrot code. Thus, as that work matures, the notion is that Parrot (plus that translation library) will be "code that executes other code that conforms to a VM specification". That will be a JVM instance (using your vocabulary) but it will also be a Parrot instance able to concurrently run a mix of Perl6 and Java code. So, again, I'm not sure I see the distinction you want to draw between VMKit and Parrot in that regard.

I think the interesting practical way in which the two differ is that LLVM and VMKit were built more quickly, have a simpler core, and look (eyeball, not measured) likely to maintain a very sharp performance advantage - perhaps forever.

The key engineering way in which they differ seems to be the architectural style of the core: very low level VM core vs. higher. The low level style of LLVM simplifies the definition and implementation of compilers (batch and JIT) and supports a much wider range of interesting code that can run efficiently on the core VM. VMKit helps to validate this architectural choice.

Choosing a low-level architectural style for the core was not "obviously" better a priori. Indeed, it's almost counter-intuitive if you assume a model in which the primitive operations, whatever they are, all get taxed with interpretive overhead (even when JIT compiled): wouldn't one guess that paying those taxes on so many low-level ops would add up and eliminate any gains? Well, maybe or maybe not - hence, again, why I was tickled to see this empirical result.



Unified VM vs Married VMs

Actually, my point was that another way to achieve interoperability between different language environments is to use a common VM system, which is Parrot's approach.

Quoting from the Parrot site:

Parrot is also designed to provide interoperability between languages that compile to it. In theory, you will be able to write a class in Perl, subclass it in Python and then instantiate and use that subclass in a Tcl program.

I would also comment that, without having looked at either of these two projects in detail, the architectural approach of Parrot makes more sense to me with regard to providing interoperability. On the other hand, I'm sure that having independent VMs running under a common substrate also has its advantages. Off the top of my head, I'm guessing that having the VMs separated would make the task of synchronization easier.

"independent VMs [...] common substrate"

Parrot and VMKit are architecturally the same in that regard.

The popular vocabulary really does seem to break down in the face of reality in this area: "VM" is a confusing term.


VMKit vs Parrot

Sorry for (maybe) getting too deep into this, but I could use a good excuse to take a closer look at VMs, since I think it is an interesting subject.

Here's my understanding:

I think that it is pretty clear that VMKit is intended to act as a substrate for implementing higher level VMs.

From VMKit: a Substrate for Virtual Machines (pdf):

In this paper, we propose a solution for building VMs based on a two layer approach: (i) a common substrate providing basic functionalities such as threading, I/Os, GC based memory management, exceptions and a compilation engine (JIT, interpreter, dynamic optimizations) for a VM independent intermediate code, (ii) a high-level Virtual Machine providing the VM specific functionalities. More precisely, a high-level VM implements a runtime engine to find types, methods and fields, and a dynamic translator to translate from a VM specific code to the substrate intermediate code. Our precise contribution is to describe VMKit, a first implementation of such a substrate using LLVM as the compilation engine.
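
To make the two-layer split in that quote concrete, here is a toy sketch in Python. It is not VMKit's code or API: the class names (Substrate, ToyJavaFrontEnd, ToyCliFrontEnd) and the three-operation intermediate code are invented for illustration, and the only real names are the few JVM and CIL mnemonics used as input. The point is only that each high-level layer owns a translator, while a single shared layer executes the VM-independent code.

```python
class Substrate:
    """Toy stand-in for the common layer: runs VM-independent intermediate code.

    The intermediate code here is a hypothetical stack language with three
    operations (push, add, mul); a real substrate would use LLVM IR.
    """

    def run(self, ir):
        stack = []
        for op, arg in ir:
            if op == "push":
                stack.append(arg)
            elif op in ("add", "mul"):
                right, left = stack.pop(), stack.pop()
                stack.append(left + right if op == "add" else left * right)
            else:
                raise ValueError(f"unknown substrate op: {op}")
        return stack.pop()


class ToyJavaFrontEnd:
    """Toy high-level layer: translates a tiny subset of JVM mnemonics."""

    _ARITH = {"iadd": "add", "imul": "mul"}

    def translate(self, bytecode):
        ir = []
        for name, *operands in bytecode:
            if name == "bipush":
                ir.append(("push", operands[0]))
            else:
                ir.append((self._ARITH[name], None))
        return ir


class ToyCliFrontEnd:
    """Toy high-level layer: translates a tiny subset of CIL mnemonics."""

    _ARITH = {"add": "add", "mul": "mul"}

    def translate(self, bytecode):
        ir = []
        for name, *operands in bytecode:
            if name == "ldc.i4":
                ir.append(("push", operands[0]))
            else:
                ir.append((self._ARITH[name], None))
        return ir


# Two different front ends, one shared execution layer.
substrate = Substrate()
java_ir = ToyJavaFrontEnd().translate([("bipush", 2), ("bipush", 3), ("iadd",)])
cli_ir = ToyCliFrontEnd().translate([("ldc.i4", 4), ("ldc.i4", 5), ("mul",)])
java_result = substrate.run(java_ir)
cli_result = substrate.run(cli_ir)
```

Both programs end up as the same kind of intermediate code before they run, so anything the shared layer provides (in the paper's list: threading, GC, exceptions, the compilation engine) would be written once rather than once per high-level VM.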

Note that I'm not saying that VMKit/LLVM isn't a different kind of VM on its own :-).

Moreover, as stated in the slides of the talk, one of the goals of the VMKit project is to provide interoperability by providing means for the HL VMs to communicate via this common substrate.

Hence, I would argue that the architectural difference with regard to interoperability is that VMKit defines multiple HL VMs that communicate, while Parrot uses a single HL VM that aims to support "Inter-Language Calling" ([DRAFT] PDD 31: Inter-Language Calling).


we'll leave it there, then

I get what you're saying and don't disagree with the picture. We just use the words slightly differently.