Cross language runtimes
Draining the Swamp: Micro Virtual Machines as Solid Foundation for Language Development
Kunshan Wang, Yi Lin, Stephen Blackburn, Michael Norrish, Antony Hosking
Many of today's programming languages are broken. Poor performance, lack of features and hard-to-reason-about semantics can cost dearly in software maintenance and inefficient execution. The problem is only getting worse with programming languages proliferating and hardware becoming more complicated. An important reason for this brokenness is that much of language design is implementation-driven. The difficulties in implementation and insufficient understanding of concepts bake bad designs into the language itself. Concurrency, architectural details and garbage collection are three fundamental concerns that contribute much to the complexities of implementing managed languages. We propose the micro virtual machine, a thin abstraction designed specifically to relieve implementers of managed languages of the most fundamental implementation challenges that currently impede good design. The micro virtual machine targets abstractions over memory (garbage collection), architecture (compiler backend), and concurrency. We motivate the micro virtual machine and give an account of the design and initial experience of a concrete instance, which we call Mu, built over a two year period. Our goal is to remove an important barrier to performant and semantically sound managed language design and implementation.
Inside you will find the specification of an LLVM-inspired virtual instruction set with a memory model (enables proper GC support) including a specification of concurrent weak-memory operations (reusing C(++)11, a debatable choice), relatively rich control-flow primitive (complete stack capture enabling coroutines or JIT-style de-optimization), and live code update.
The .NET Compiler Platform (Roslyn) provides open-source C# and Visual Basic compilers with rich code analysis APIs. You can build code analysis tools with the same APIs that Microsoft is using to implement Visual Studio!
In a nutshell: OPEN SOURCE C# COMPILER. Putting aside possible practical implications of this for the .NET ecosystem, I think it is good for programming language geeks to be able to peruse the source code for compilers and language tools.
For the debate about MS being evil, you can head directly to HN where you'll also find an explanation of what bootstrapping a compiler means.
Here's Gil Tene on Azul's Pauseless Garbage Collector for the JVM.
One of the key techniques that we use is massive and rapid manipulation of virtual memory mappings. We will change mappings of virutal to physical memory at the rate of Java allocation.
The same read barrier I mentioned before will also intercept any attempt to read a reference to an object that has been relocated, and that allows us to lazily relocate references without needing a pause. We compact by moving an entire virtual page worth of objects, we kind of blow it up, moving all the live objects to other places and thereby compacting them. But we don't try and locate and find all the pointers to that page immediately.
The challenge seems to be that standard OSes don't currently have enough hooks for them to do this kind of thing so their runtime must live in either their custom hardware and OS or a virtual machine.
a dynamically-typed concurrent language in which lightweight isolated processes communicate by message passing. Thorn includes powerful aggregate data types, a class-based object system, first-class functions, an expressive module system, and a variety of features supporting the gradual evolution of prototype scripts into robust programs.
Thorn is implemented by a compiler targeting the JVM and a Java interpreter, and syntactically resembles Scala, at least superficially.
One of those "features" is a unique (as far as I know) soft type system:
In Thorn, the type dyn (for dynamic) is assumed as default (and never written explicitly). At the other extreme, Thorn supports concrete types, as used in statically typed programming languages. A variable of a concrete type T is guaranteed to refer to a value of that type (or a subtype). [...] While concrete types help with performance and correctness, they introduce restrictions on how software can be used and make rapid development more difficult; scripting languages do not favor them.
As an intermediate step between the two, we propose like types, getting some of the safety of concrete types while retaining the flexibility of dynamic types. Concrete types for var x:T or fun f(x:T) are used in two main places. At a method call x.m(), a static type check ensures that x actually has an m method. At a binding or assignment, like x := y; or f(y), a static type check can ensure that y's value makes sense to assign to x, can reject it entirely, or can inspire a dynamic check. Like types, var x: like T or fun f(x:like T), give the expressive power of concrete type checks on method calls, but do not constrain binding or
assignment. They do require runtime checks and thus may cause programs to fail with runtime type errors: sometimes fewer and never more than dynamic types do.
Concurrency is also a little odd:
Every component (marked by the keyword spawn) runs in a different JVM. Component handles contains sufficient information to identify the node and port on which the component runs.
A couple of papers are linked to the home page; "Thorn - Robust, Concurrent, Extensible Scripting on the JVM", by Bard Bloom, et. al., is a general description of the language, from which come the quotes above; and "Integrating Typed and Untyped Code in a Scripting Language", by Tobias Wrigstad, et. al., with more information about like types.
I have not seen Thorn here before. Apologies if I have just missed it.
One of the future additions to C# announced by Anders Hejlsberg in this entertaining video from 2008 is Compiler as a Service. By that he means the ability to
eval code strings (and I'm guessing that this will also be integrated with C#'s built-in AST objects).
He shows this off at around minute 59, to great effect and great excitement by the audience. It feels like an inflection point. There probably won't be another REPL-less language from now on.
I predict that after that, they'll add hygienic macros and quasisyntax.
Here's a little sausage making article for JVM language implementors. In Compiling Structural Types on the JVM: A Comparison of Reflective and Generative Techniques from Scalaâ€™s Perspective, Gilles Dubochet and Martin Odersky describe
Scalaâ€™s compilation technique of structural types for the JVM. The technique uses Java reflection and polymorphic inline caches. Performance measurements of this technique are presented and analysed. Further measurements compare Scalaâ€™s reflective technique with the â€œgenerativeâ€ technique used by Whiteoak to compile structural types. The article ends with a comparison of reflective and generative techniques for compiling structural types. It concludes that generative techniques may, in specific cases, exhibit higher performances than reflective approaches, but that reflective techniques are easier to implement and have fewer restrictions.
There's no discussion of the the proposed JVM "method handles" and whether they might be an even better solution than runtime reflection.
Whiteoak was mentioned previously on LtU.
ACM Press Release:
The ACM Special Interest Group on Programming Languages (SIGPLAN) today presents its first-ever Programming Languages Software Award to Chris Lattner of Apple Inc. for his design and development of the Low Level Virtual Machine (LLVM), a compiler infrastructure that has been quickly adopted by a wide array of industry and academic organizations. Since LLVMâ€™s release as an open source compiler infrastructure in October 2003, companies including Apple, Adobe, and Cray have incorporated it into their commercial products, reflecting its simplicity, flexibility, and versatility.
VMKit: a Substrate for Managed Runtime Environments, VEE '10
Managed Runtime Environments (MREs), such as the JVM and the CLI, form an attractive environment for program execution, by providing portability and safety, via the use of a bytecode language and automatic memory management, as well as good performance, via just-in-time (JIT) compilation. Nevertheless, developing a fully featured MRE, including e.g. a garbage collector and JIT compiler, is a herculean task. As a result, new languages cannot easily take advantage of the benefits of MREs, and it is difficult to experiment with extensions of existing MRE based languages.
This paper describes and evaluates VMKit, a first attempt to build a common substrate that eases the development of high-level MREs. We have successfully used VMKit to build two MREs: a Java Virtual Machine and a Common Language Runtime. We provide an extensive study of the lessons learned in developing this infrastructure, and assess the ease of implementing new MREs or MRE extensions and the resulting performance. In particular, it took one of the authors only one month to develop a Common Language Runtime using VMKit. VMKit furthermore has performance comparable to the well established open source MREs Cacao, Apache Harmony and Mono, and is 1.2 to 3 times slower than JikesRVM on most of the DaCapo benchmarks.
So... One person built a CLR using VMKit in one month. One consequence of such faster development speeds is that language designers do not have to feel so restricted when targeting a Managed Runtime Environment for their language. If the MRE they want to target has restrictions, they can fork it. If the MRE specification has a gray area, then they can quickly prototype a solution to clarify what the behavior should be for that gray area of the specification. If you are a researcher/student and want to experiment with a new language design and implementation, then you can do so incrementally by first augmenting the MRE and then targeting your language to that new MRE; you can then benchmark the improvements by using the original MRE as a baseline.
Delimited Control in OCaml, Abstractly and Concretely, System Description
We describe the first implementation of multi-prompt delimited control operators in OCaml that is direct in that it captures only the needed part of the control stack. The implementation is a library that requires no changes to the OCaml compiler or run-time, so it is perfectly compatible with existing OCaml source code and byte-code. The library has been in fruitful practical use for four years.
We present the library as an implementation of an abstract machine derived by elaborating the definitional machine. The abstract view lets us distill a minimalistic API, scAPI, sufficient for implementing multi-prompt delimited control. We argue that a language system that supports exception and stack-overflow handling supports scAPI. Our library illustrates how to use scAPI to implement multi-prompt delimited control in a typed language. The approach is general and can be used to add multi-prompt delimited control to other existing language systems.
Oleg was kind enough to send me an e-mail letting me know of this paper's existence (it appears not yet to be linked from the "Computation" page under which it is stored) and to include me in the acknowledgements. Since the paper in its current form has been accepted for publication, he indicated that it can be made more widely available, so here it is. In typical Oleg fashion, it offers insights at both the theoretical and implementation levels.
Bytecodes meet Combinators: invokedynamic on the JVM. John Rose. VMIL'09.
The Java Virtual Machine (JVM) has been widely adopted in part because of its classfile format, which is portable, compact, modular, verifiable, and reasonably easy to work with. However, it was designed for just one languageâ€”Javaâ€”and so when it is used to express programs in other source languages, there are often â€œpain pointsâ€ which retard both development and execution. The most salient pain points show up at a familiar place, the method call site.
To generalize method calls on the JVM, the JSR 292 Expert Group has designed a new invokedynamic instruction that provides user-defined call site semantics. In the chosen design, invokedynamic serves as a hinge-point between two coexisting kinds of intermediate language: bytecode containing dynamic call sites, and combinator graphs specifying call targets. A dynamic compiler can traverse both representations simultaneously, producing optimized machine code which is the seamless union of both kinds of input. As a final twist, the user-defined linkage of a call site may change, allowing the code to adapt as the application evolves over time. The result is a system balancing the conciseness of bytecode with the dynamic flexibility of function pointers.
The abstract is pretty vague, but this paper is actually quite interesting, particularly if you're interested in meta-object protocols and if, like me, you don't have the interest or patience to read JSRs. Of course, invokedynamic has been discussed many times over the years. The wheels of Java turn slowly...