VM's... What's the best?

I want to start making a compiler for a little programming language that I have in mind. I think that doing it right would teach me a lot of new things. I want to target JVM bytecode using Jasmin or Jamaica, but I'd like to hear your opinions first...

Is there another good tool for generating JVM bytecode from opcodes?

What about other VMs? The CLR is not as portable as the JVM... I used Parrot once, but I didn't like it at the time. Do you know of other options?

Or look at Neko:

Or look at Neko: http://nekovm.org/

Try LLVM

You should at least look at LLVM. That will facilitate integrating your language with code written in other languages, it will buy you a number of optimizing backends for free, and it's a rather more general design than the JVM (which incorporates many Java-the-language specific details that won't make your life any easier unless you're designing a language that is basically a simple extension to Java).

I like LLVM, but it's a bit

I like LLVM, but it's a bit intimidating for beginners. Targeting .NET or the JVM, with their high-level code generation interfaces, is certainly easier, though ultimately more limiting. Neko is also high-level, but the VM is quite tiny, suitable even for embedded systems.

Depends what's intimidating

I think it depends on the individual. If you're intimidated by the low-level machine stuff, structure layout, manual memory management, etc., then LLVM is certainly worse. But it's also a pretty small set of instructions that all have an obvious machine-level interpretation, and it can be learned thoroughly in a few hours. The JVM and .NET instruction sets are a lot bigger, and there are a lot more novel semantics to learn. So it really depends what you're familiar with, I think.

LLVM

I experimented a bit with LLVM and generally liked it a lot. I haven't used Neko. LLVM does not make it easy at all to deviate from C calling conventions (for instance, to implement a stackless CPS-based calling convention), but then, neither does the underlying machine, most likely. I don't know if Neko is more flexible in this regard.

target VM depends on the source language

The target VM depends on the source language.

In particular, if you have functional values (or closures) in your language, you need an apply primitive (which the JVM doesn't have).

If your source language has some form of parallelism, the VM needs to support it.

And of course, the source language primitives should be available in the VM.

And some features (in particular reflection and explicit continuations) need to be provided by the VM.

In addition to the other suggested VMs you might consider:

  • Ocaml VM
  • Lua VM
  • your own (coded from scratch) VM.
  • CLISP VM.
  • GNU Smalltalk VM.
  • Other LISP implementations (eg using LISP S-expr as your "VM").
  • reflective or multi-staged languages (MetaOcaml) for easy implementation of your VM.
  • Dynamic code generation (perhaps generating C or C++ at runtime, compiling it, and loading it, all in the same process).
  • generating machine code, sort of JITing your VM, (using GNU lightning, Libjit, LLVM JIT)

Regards.

In particular, if you have

In particular, if you have functional values (or closures) in your language, you need an apply primitive (which the JVM doesn't have).

Objects are closures. Example is in C#, since I'm not familiar with Java's generics syntax:

public interface Function1<T,R> {
  R Apply(T t);
}
public interface Function2<T,U,R> {
  R Apply(T t, U u);
}
...
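
To make the "objects are closures" point a bit more concrete, here is a minimal sketch of how a compiler might lower a closure to an object with a single apply method. It is in Python only for brevity, and the names Closure, adder_body and make_adder are made up for this illustration; a real compiler targeting the JVM or CLR would emit a class per lambda instead.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Closure:
    """A function value: a code pointer plus the variables it captured."""
    code: Callable[..., Any]   # plain top-level function; receives env first
    env: tuple                 # captured values, fixed when the closure is built

    def apply(self, *args: Any) -> Any:
        # The single entry point, playing the role of Function1.Apply above.
        return self.code(self.env, *args)


# Source-language program (hypothetical):  make_adder = \n -> (\x -> x + n)
def adder_body(env: tuple, x: int) -> int:
    (n,) = env                 # unpack the one captured variable
    return x + n


def make_adder(n: int) -> Closure:
    # Creating the closure copies the free variable n into the object.
    return Closure(adder_body, (n,))


add5 = make_adder(5)
result = add5.apply(10)
```

The body of the lambda never touches the enclosing scope directly; everything it needs arrives through env, which is exactly what a field on a generated class would hold.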

While in principle closures

While in principle closures and objects are equivalent, in practice it is not the case.

Witness the poor performance of any functional language implementation (Scheme, for example) on the JVM.

Scala seems to do OK

This comparison by Derek Young shows Scala and Java giving comparable performance. I'm no expert on benchmarks, but...

Scheme isn't used on the JVM

Scheme isn't used on the JVM because the JVM can't easily support all of Scheme's features, e.g. continuations. Any Scheme implementation on the JVM thus receives little attention, and correspondingly, there is little incentive to tune it. It has nothing to do with achievable performance. IronPython does pretty well on the CLR, and other than full continuations, I see nothing in Scheme that isn't present in Python in some form.

SISC

I'll be sure to tell that to the developer of SISC. :-)

To be fair, we're talking

To be fair, we're talking about compilers, not interpreters. Providing full R5RS seems easier in an interpreter than a compiler, though there are certainly techniques that can be used.

s48vm

I generally agree with this, though I don't think that first-class functions in your language means that the good choices of VM are those with closures.

It's worth advertising the interesting and somewhat neglected VM technology in scheme48, which consists of a prescheme compiler and a VM written in prescheme. Prescheme is comparable to C in the sophistication of the runtime environment it provides (no GC, no heap management, etc.), but is a subdialect of scheme which only permits stack-allocated lambda abstractions (i.e. code is only accepted by the prescheme compiler if, after macro expansion, each lambda form can be represented by a stack frame).

I would say...

... GNU Smalltalk or Lua. The latter if you want to play with something small and embeddable, the former might be easier if you want to write your compiler in a high-level language (i.e. not C).

other hll as a vm

Using any high-level language (not just lisp) as a "VM" works too, and more than a few languages have been implemented as a compiler to C; of course some will be more suitable than others. You could use C and the tiny c compiler. And don't forget about forth - lots of choices there, like gforth or FuelVM.

C has several shortcomings as an intermediate language

* Poor support for non-LIFO flow control. For non-local but LIFO flow control (such as exception-handling) one can use setjmp/longjmp, but that's ugly and expensive. Things like continuations or coroutines are much more difficult to code up in C (often needing tricks like "trampolines", or the One Big Interpreter Function Loaded With Gotos).

* No support for tail-call elimination. Some compilers may provide this as an optimization in some cases. In C, this isn't a big deal, as induction over something is handled with loops; but when compiling a recursion-heavy language, you want to have tail-call elimination.

* No portable (and fast) way to do arbitrary-precision math; in particular, there's no portable way to get access to "carry" bits and such. Inline assembly works here, but that restricts portability.

* Flat namespace.

* Dumb linkers. Many whiz-bang HLL features will find problems when you try separate compilation and linking. It took years for C/C++ linkers to sensibly handle templates, after all.

(Much material cribbed from this c2 page)
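
For readers who haven't met the "trampoline" trick mentioned in the list above, here is a minimal sketch of it. It is written in Python rather than C only because Python also lacks tail-call elimination and keeps the example short; TailCall, trampoline, is_even and is_odd are illustrative names, not part of any library.

```python
class TailCall:
    """A deferred call: 'call fn(*args) next' instead of calling it right now."""

    def __init__(self, fn, *args):
        self.fn = fn
        self.args = args


def trampoline(fn, *args):
    """Run fn, then keep bouncing while it hands back further TailCalls."""
    result = fn(*args)
    while isinstance(result, TailCall):
        # Each bounce runs in this loop, so the host stack never grows.
        result = result.fn(*result.args)
    return result


# Mutually recursive functions whose calls are all in tail position.
def is_even(n):
    return True if n == 0 else TailCall(is_odd, n - 1)


def is_odd(n):
    return False if n == 0 else TailCall(is_even, n - 1)


# Far deeper than Python's default recursion limit would allow directly.
answer = trampoline(is_even, 100_000)
```

The cost is visible here too: every tail call allocates a small object and returns to the driver loop, which is one reason the technique is considered ugly when the target language could have done the jump natively.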

Well... I'm in some sort of

Well... I'm in some sort of dilemma. I'm planning to implement the lexer and parser in a functional language (F#). Also, I want to implement some simple code optimization techniques, and those are things that LLVM already provides.

I think that using LLVM would teach me some basics of garbage collection and other low level issues, but maybe it would be too much for me.

I also wanted to get some interoperability with an "already working" programming language, and Basile STARYNKEVITCH's comment is enlightening. Targeting the Lua or Ocaml VM sounds pretty good to me... but I'll have to think a little bit more.

Thanks to all :D
All your comments have been a great help.

With F# you can generate

With F# you can generate .NET assemblies using System.Reflection.Emit. The benefits of this include being able to write a compiler that can double as an interpreter, and you don't need to use a stand-alone assembler/compiler.

or C#, or VB, or managed C++, or L#, or ...

This is true for any .NET language. Personally I use C# because the free MS IDE rocks my world and I can achieve some degree of portability with Mono.

There's an F# extension to

There's an F# extension to Visual Studio. It's rather nice, actually.

edit:
Oh, I see what you mean. The extension doesn't work with the free version of VS. Oh well...use vim?

It does, but you have to roll your own.

The F# extension may not work with the Express IDEs (possibly because they're based on the 'isolated' VS Shell?), but it does work with the free VS 2008 'integrated' Shell, which is available here.

I think that using LLVM

I think that using LLVM would teach me some basics of garbage collection and other low level issues, but maybe it would be too much for me.

You can certainly use GC libraries or other runtime support code developed in C and link these into your generated LLVM code. (You may also be interested in checking out c--, which is a portable intermediate representation that supports a variety of runtime support code.)

If your heart is set on targeting the JVM, I'd recommend using soot instead of one of the "JVM assemblers," but I prefer to deal with three-address code rather than stack-based bytecode directly, and this recommendation reflects that bias. (Although soot also supports a representation that is close to bytecode if you prefer that.) I don't think that targeting the JVM will be the easy way to go, though.
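
To illustrate the difference between the two styles mentioned above, here is a small sketch that lowers one expression tree to stack code and to three-address code. The instruction formats are invented for this example and are not soot's representations or real JVM opcodes.

```python
from itertools import count

# Expression tree: an int literal, a variable name, or (op, left, right).
expr = ("+", "a", ("*", "b", 2))


def to_stack_code(e):
    """Post-order walk: operands are pushed, operators pop two and push one."""
    if isinstance(e, tuple):
        op, left, right = e
        return to_stack_code(left) + to_stack_code(right) + [(op,)]
    if isinstance(e, str):
        return [("load", e)]
    return [("push", e)]


def to_three_address(e, code, fresh):
    """Same walk, but every intermediate result gets an explicit temporary."""
    if not isinstance(e, tuple):
        return e                      # literals and variables are operands as-is
    op, left, right = e
    lhs = to_three_address(left, code, fresh)
    rhs = to_three_address(right, code, fresh)
    tmp = f"t{next(fresh)}"
    code.append((tmp, op, lhs, rhs))  # reads as: tmp = lhs op rhs
    return tmp


def run_stack_code(code, env):
    """Tiny evaluator for the stack form, handling only + and *."""
    stack = []
    for instr in code:
        if instr[0] == "push":
            stack.append(instr[1])
        elif instr[0] == "load":
            stack.append(env[instr[1]])
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if instr[0] == "+" else left * right)
    return stack.pop()


stack_code = to_stack_code(expr)
tac = []
tac_result = to_three_address(expr, tac, count(1))
```

In the stack form the intermediate value of b * 2 is implicit in the operand stack, while in the three-address form it has a name (t1) that later passes can refer to, which is the main reason I find the latter easier to analyze and transform.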

You might have already done this, but if you haven't, you may want to consider developing an interpreter first, especially if your language has some novel features. That way, you can get a better picture of what you might want in a framework for (static or dynamic) compilation -- will you need a calling convention that doesn't obey a stack discipline? (What about exceptions?) How will you manage memory? How will you represent various kinds of values? In my experience, answering these questions is much easier in an interpreter.
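
As a rough idea of how little an interpreter needs in order to start answering those questions, here is a sketch of a tree-walking evaluator for a made-up language with numbers, variables, addition, one-argument lambdas and calls. The tuple-based syntax and the "closure" value representation are assumptions chosen for this example only.

```python
def evaluate(e, env):
    """Evaluate a tiny expression language encoded as nested tuples."""
    if isinstance(e, (int, float)):
        return e                              # numbers evaluate to themselves
    if isinstance(e, str):
        return env[e]                         # variable lookup
    tag = e[0]
    if tag == "lambda":                       # ("lambda", param, body)
        _, param, body = e
        # Value representation decision: a closure is code plus its environment.
        return ("closure", param, body, env)
    if tag == "call":                         # ("call", fn_expr, arg_expr)
        _, fn_expr, arg_expr = e
        _, param, body, saved_env = evaluate(fn_expr, env)
        arg = evaluate(arg_expr, env)
        # Calling convention decision: we borrow the host language's stack.
        return evaluate(body, {**saved_env, param: arg})
    if tag == "+":                            # ("+", left, right)
        return evaluate(e[1], env) + evaluate(e[2], env)
    raise ValueError(f"unknown form: {e!r}")


# ((lambda n. lambda x. x + n) 5) 10
program = ("call",
           ("call", ("lambda", "n", ("lambda", "x", ("+", "x", "n"))), 5),
           10)
value = evaluate(program, {})
```

Even at this size the sketch forces choices about value representation and calls; memory management is simply delegated to the host language, which is a decision you would have to revisit when moving to a compiler.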

Finally, you can find a wide variety of research infrastructures at compiler-tools.org. (I should probably have mentioned this first!) If your favorite tools aren't on the list, please let me know and I'll add them.

Good luck!

Portability of CLR

I just noticed that you had said the CLR is not as portable as the JVM. Not sure that's true, as Mono is available on quite a few platforms.

Quite few platforms, yes.

While I was pleasantly surprised at the number of platforms available now compared to the last time I looked, Mono is still very much less portable than the JVM.

Mono is available on (several variants of 32 bit x86) and (on various stuff running Linux) and (a couple of other platforms). Very reasonable, fairly mainstream combinations like e.g. FreeBSD running on x86 in 64-bit mode ("amd64") aren't supported. There are JVM interpreters available for WAY more alternatives. The list of Java virtual machines itself is longer than the list of platforms that Mono has been ported to: http://en.wikipedia.org/wiki/List_of_Java_virtual_machines (and includes listings of support for a fair number of platforms that Mono doesn't support).

Also, the JVM is much more commonly available - ie, installed - than Mono.

That's without any judgement as to whether the JVM or CLR are superior or portable in the technical sense, just availability today.

Eivind.

Mono is a great project

Mono is a great project, and currently _most_ of the things I have done in Linux with it have worked without problems. But IMHO Mono still has some issues, especially with the BCL, and as I said, I want to do some interoperation with an existing language (maybe libraries) on whatever VM I choose... In my limited experience with F#, I had to switch to my Windows session to get things to work properly.

About System.Reflection.Emit... I'll have to look at it. Thanks Curtis

Or you could just develop

Or you could just develop using Mono on Windows too, and you'd have complete portability from the get-go. No doubt Mono's BCL is incomplete, but enough functionality is there for the majority of applications. F# and Nemerle are both quite usable with Mono, and each of them provides their own mini-BCL (tuples in F# for instance), which I suspect you'll end up doing as well.

ASM for JVM bytecode generation

Try the ASM tool for generating Java bytecode: http://asm.objectweb.org/
It's used by other JVM languages like Groovy. It has tools for converting Java code to the ASM code needed to generate the same bytecode.
And for a parser there is ANTLR: http://www.antlr.org/

other VM's

If it's a functional language, you might try targeting it to a lesser known VM I wrote called avram. If you want high performance multi-threading, I would think Core Erlang is a good choice.

I have made my decision

I think that the Lua 5.1 VM has everything I need. I can do some interoperability with Lua's libraries. It's pretty fast. And I have found good documentation about it. There's also support for closures.

This is another useful thread:
http://lambda-the-ultimate.org/node/1617

Thanks to all