HLVM - High Level Virtual Machine toolkit for dynamic languages

Apologies if this has been posted already, but I haven't seen this mentioned on LtU before.

HLVM is...

"A complete compiler developer's toolkit for creating new languages easily. To write a new compiler, language designers simply write a plugin that describes the language to HLVM and how to translate the grammar productions into HLVM's comprehensive Abstract Syntax Tree (AST). After that, HLVM handles all aspects of code generation, bytecode storage, XML translation, JIT execution or interpretation, and native compilation."

"Aimed at supporting dynamic languages such as Ruby, Python, Perl, Jython, Haskell, Prolog, etc."

"A language interoperability framework."

I'm not sure if "dynamic" means dynamically typed or truly dynamic in that the semantics of the language can be changed from within the language itself.

HLVM is implemented on top of LLVM, which has been discussed on LtU a few times.




In what sense can Haskell be described as dynamic?

Haskell with the Dynamic

Haskell with the Dynamic type and hs-plugins isn't a bad choice for building dynamic systems.

Got method aliasing?

Their developers' wiki makes me think that they were not so informed. (got garbage collection, Scheme? Lambdas in ML?)

Wiki Never Got Started

The wiki was started as a place to discuss things, but we ended up just doing most of it on IRC. About the only useful content on the wiki is the downloadable IRC conversations. At some point (when I get around to it) the wiki will be taken down. We'll just keep the IRC conversations and turn them into documentation, eventually. Keep in mind, HLVM is being released *VERY* early. It's approaching release 0.2.

Dynamism, Legacy of the Visigoths

"Dynamic" is technical jargon used by programmers, meaning "good". It derives from the Latin dyno mite, meaning "I am extremely pleased", and is first recorded in the historical work Bona Aetas of noted Roman sage and pundit J.J. Walker. Its meaning evolved in the 4th century after monks copying an obscure manuscript on programming linguistics in their ignorance tried to deduce its meaning from context.

In this (occidental) manuscript, the Lingua Lambda, the author described how he had stumbled across Miranda, an early ancestor of Haskell, a typed language that had found its way to the West from the Orient, and which, though crude in some ways, supported many fine features and was, in fact, lauded as the language for discriminating hackers. The author wrote an essay about this language, describing its features, and noted (Miranda dyno mite!) how pleased he was with it.

These monks had lived in monasteries for most of their lives, programming only in C; most of them had never heard of languages like ML or Miranda, and, if they had, would have dismissed them as Oriental nonsense. But this century was, for these monks, also a time of change; the last barbarian invasion had been repelled, but the fleeing barbarians had left behind their legacy, the untyped programming languages. Many of these were adopted by such monks — and thus we now call them "scripting languages" — who were dazzled by features such as "blocks" and "duck typing". (One can still detect in these phrases the vulgarity of their barbarian progenitors. Naturally, the West promptly plunged into a dark age...)

In the Orient, though, typed languages had long supported features such as higher-order functions, structural typing, automatic garbage collection, REPL-style interactive interpretation and user-definable syntax. But for the monks laboring in darkness, these were thoroughly new ideas, and they reasoned that they must be uniquely characteristic of untyped languages.

So it was that they translated Miranda dyno mite as Miranda is untyped, and now we must live with that confusion. Given the meaning of the words in our everyday language, it is, when you pause to think about it, strange that when a programmer asks you how you are, the proper response is "Dynamic, thanks!" if you are feeling well and "Kinda static today..." if you are ill, but the history of language is full of twists and turns, and, after all, far from rational...

dynamic peanut gallery

Frank Atanassow: "Dynamic" is technical jargon used by programmers, meaning "good". It derives from the Latin dyno mite, meaning "I am extremely pleased", and is first recorded in the historical work Bona Aetas of noted Roman sage and pundit J.J. Walker.

That's priceless! The whole conceit works really well as a venue for figurative commentary. If you wrote a book like this, I'd be first in line for a copy. I hope you revisit that style again.

(To be consistent I'll keep all the rest of this post only figuratively related, then post a substantive remark elsewhere.)

A very interesting take on the opposition of dynamic and static appears in Robert Pirsig's Lila, which I read about a dozen years ago. Pirsig is best known for Zen and the Art of Motorcycle Maintenance in which he (among other things) explores the meaning of the word 'quality' at a profound level the way a rhetorician would in the pursuit of abstraction to the point of nervous breakdown.

Lila might be considered a philosophical sequel of sorts, continuing to explore what it means to have meaningful attributes (qualities) with respect to some perceiving agent. I had an 'aha' experience somewhere in Pirsig's presentation of dynamic vs static, so it stuck in my mind. Roughly speaking, either extreme has diminishing returns that make the other more valuable in context, and both are required (in some balance) to reap the benefit of the other. You can also read the same concepts as innovation vs standards in the technical world.

[Edit: somewhere in the middle between chaos and order, you find a good edgy place you might call the edge, where one finds interesting tension; for example, in stories this is a spot between violence and safety. In academic papers, it might be a spot between wildness and droning. The idea can inform a way of writing funny ironic material in a fashion I like to call waiting for the other shoe to drop. The first shoe is static (something you already know), and the second is one you suspect might be coming. Yes, there's an obvious information theoretic aspect to this.]

Oh, that's beautiful!

Thank you. You've made my day.


Very well written! I'm still chuckling.

Looks like a bit too early

...to judge the project, as it only started in April and seems not to have any content except administrative pages (correct me if I am wrong).

[on edit: ah, I didn't notice Latest Source link on the right hand side. But I still see just sources, and no docs/FAQ/etc.]

initial reactions to new hlvm project

I feel obliged to comment since I recently advocated variety in VMs for dynamic languages. This is my first notice of these two items. As an addendum below, I'll pick something I find interesting.

The LLVM (low level vm) Compiler Infrastructure is the older project of the two, funded by an NSF grant, with approximately twenty developers. One of these folks, Reid Spencer, apparently started the newer HLVM (high level vm) project a few months ago, with plans to use llvm as some form of underlying infrastructure. I'm uncertain from the docs how Spencer plans to apply llvm in a way especially suited for making high level languages.

I looked a bit more at the llvm docs and thought its scale made it somewhat hard for an individual to know everything. So it seems a team-oriented technology base, where one must trust that other folks have not made errors compromising (for example) memory integrity in a memory-safe language. Parts of the technology derive from gcc, such as gcc's frontend for C and C++ (because llvm appears to focus on those two languages right now), and thus the starting technology base, in terms of repurposed lines of code, is large.

Now, imagine you're someone who wants to use a technology base written by one or two (or at the very most three) developers, so it's feasible to know everything by reading every line of code and vetting your trust in the code at a microcosmic level. For such a person, llvm is a hard sell, but it has some bits that ought to inform specs for smaller projects.

Recently I've been looking at specs for various assemblers to find directives (etc) that express semantics I might also want to express in directives for other sorts of VM code assemblers. The llvm project has tech along these lines which isn't simply redundant with older traditional materials.

One of the general things I like about llvm is a focus on long term transformation and generation of code. I just wouldn't want to use something with a focus on C or C++ in the mix. I'd prefer not to let an address space I hope to make memory safe include assorted random bits of gathered C and C++ code utils, since I have a good feel for how large a risk is introduced that way.

HLVM Status

It is too early to judge the project yet. Even I don't know what it will be when it grows up. I have a concept of the end target, but there have already been several diversions from course. For example, I didn't expect to write a random test case generator, but I did, simply out of a need for test cases.

However, HLVM isn't vaporware either. As it stands, it is close to "Turing complete" (in the lax sense). That is the focus of the next release, 0.2. If you look around on both the left- and right-hand sides you will find documentation (mostly on how to build), the beginnings of an FAQ, source code, email lists, etc.

Because HLVM is still undergoing development, it is premature to commit time to documenting it thoroughly. I do try to keep the doxygen information up to date; it is current as of a week ago. While the doxygen information doesn't help much with getting a general idea of HLVM, you can glean a few things if you read between the lines. If you don't want to invest the time, wait a few months. I will write documentation for it at the right time.

Dynamic languages ?

Funny that. Haskell is now officially a dynamic language.

Edit: Gasp. Grzegorz drew faster than me.

C++ ?

Why use a low-level language like C++ for such a high-level task?

Also, I didn't find any documentation about the "Plugin" system. How does it compare to Neko? In Neko, you would write a Ruby-to-Neko source translator, then compile the generated Neko program to bytecode using the Neko compiler, and finally run the bytecode on the Neko VM.

I take it generating Neko

I take it generating Neko bytecode directly is frowned upon?

The NekoVM bytecode itself

The NekoVM bytecode itself is not documented, and targeting the bytecode directly is discouraged; instead, you generate Neko source code. There's a small extension called NXML that allows inclusion of original filenames and line numbers, for more accurate exception stack traces, for example.

Fair enough. Don't suppose

Fair enough. Don't suppose anyone else has built a Haskell datatype for the AST and a pretty-printer, so I don't have to?

Not yet. But it's an

Not yet. But it's interesting work, isn't it?

Will post source somewhere

Will post source somewhere and give you a heads-up if I ever get it done. Don't let that stop anyone else from doing it faster though.
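For what it's worth, here's a minimal sketch of what such a datatype and pretty-printer might look like. The constructors below are an illustrative, heavily simplified subset of Neko-style expressions, not Neko's actual grammar:

```haskell
import Data.List (intercalate)

-- Illustrative only: a tiny, made-up subset of Neko-style expressions.
data Expr
  = EInt Int            -- integer literal
  | EStr String         -- string literal
  | EId String          -- identifier
  | ECall Expr [Expr]   -- function call
  | EFun [String] Expr  -- anonymous function
  deriving (Show, Eq)

-- A naive pretty-printer emitting Neko-like source text.
pretty :: Expr -> String
pretty (EInt n)     = show n
pretty (EStr s)     = show s
pretty (EId x)      = x
pretty (ECall f as) = pretty f ++ "(" ++ intercalate ", " (map pretty as) ++ ")"
pretty (EFun ps b)  = "function(" ++ intercalate ", " ps ++ ") " ++ pretty b

main :: IO ()
main = putStrLn (pretty (ECall (EId "$print") [EStr "hello", EInt 42]))
-- prints: $print("hello", 42)
```

A real translator would of course need Neko's full expression and statement forms, plus a layout-aware printer, but the shape of the problem is roughly this.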

Maybe because it doesn't exist?

The key data structure will probably be the AST and there's some Doxygen output for it. Other than that, the status page seems to confirm that it's currently vaporware. That said, the roadmap does seem pretty reasonable and I didn't see any signs of the Far Reaching Yet Forcefully Vacuous Manifesto that so many other website-only software projects seem to have.

Also, I think that even if they're going to pass in-memory ASTs around from the front-end to the back-end, a human-readable text format for that data structure is an inevitability, if only for debugging.

HLVM Released Early

HLVM isn't vaporware, it's just being released early. I'm working under the premise of "Release Early, Release Often". Peer review, commentary, and criticism are all good things in my book. It helps make the end product better. For direct commentary, please send an email to hlvm-dev@hlvm.org.

To counter the vaporware claim, HLVM currently contains 28,331 lines of code, developed at an average rate of about 524 lines per day. There are also 3,179 lines of documentation (much more is needed) and 407 commits to the Subversion repository. HLVM isn't vaporware, but I will grant that it is very early in its lifecycle.

It seems that the vaporware reaction is in regards to the HLVM home page which states the goals of HLVM as if they were an accomplished fact. When I wrote that I didn't expect such a reaction. I was merely envisioning the key things I wanted HLVM to be and wrote what I was envisioning. Perhaps that was a mistake. I will update the home page to make clear that HLVM is a project under development.

The AST is the key data structure.

The AST's human-readable form is XML (barely!). You can find the RELAX NG grammar for the XML format here. The software currently exists to both read and write this format for the AST. For debugging, there is an "hlvm::dump(hlvm::Node*)" function associated with the XML writer that can be used to dump out any AST node in XML form; for example, from gdb just invoke "call hlvm::dump($X)".

The XML format is used for much more than just debugging. It is the basis of the test suite in HLVM, and it allows me to work directly with the AST and avoid front-end language issues. The front ends will have their own test suites too, but for now it is important to ensure that the AST is functioning well.

C++ A Practicality

The only effective way to access LLVM is via C++, so those portions of HLVM that need to access LLVM are written in C++. This is nothing other than a pragmatic decision. We fully expect to write higher-level portions of HLVM in other languages; we just haven't yet.

The "plugin system" doesn't exist yet. We've discussed it on IRC. Those discussions are in the wiki, but generally not in a highly accessible form. I'll try to correct this by turning our IRC discussions into some form of documentation.

HLVM and Neko are similar. The main difference is that HLVM uses LLVM for its code generation and bytecode. In HLVM, front ends can take three forms: (1) a plugin which uses the front-end library (doesn't exist yet) to build the AST, (2) a stand-alone program that produces the AST nodes in HLVM's XML format, which are then read by the hlvm-compiler (exists), or (3) a C++ program that directly creates the C++ AST nodes.

So in HLVM, you could write a Ruby-to-HLVM source translator as well. The hlvm-compiler tool can then be used to translate the program to LLVM bytecode, LLVM assembly, or a native executable. If bytecode is chosen, the program can be loaded by the HLVM runtime and JIT compiled and executed. The JITC execution is one of the goals of the next release, 0.2.