Implementation

Xtext: An IDE on the cheap

The Helios release (Eclipse 3.6) includes version 1.0 of Xtext, a language development framework.

With Xtext you can easily create your own programming languages and domain-specific languages (DSLs). The framework supports the development of language infrastructure, including compilers and interpreters, as well as full-blown Eclipse-based IDE integration.

Given a grammar, Xtext derives a parser and an IDE with syntax highlighting, code completion, code folding, an outline view, real-time error reporting, and quick fixes, among other standard IDE features. The resulting models can then be used as EMF Resources (e.g., consumed directly by an interpreter) or, with a little more work, used to generate code as well.

Check out the video clips on the Xtext website or the webinar for a more detailed look.

Racket Released

Racket is a programming language. Racket is the language implementation formerly known as PLT Scheme. Racket is available now. To quote the release announcement:

With Racket, you can script command shells and web servers; you can quickly prototype animations and complex GUIs; regexps and threads are here to serve you. To organize your systems, you can mix and match classes, modules or components. Best of all, you start without writing down types. If you later wish to turn your script into a program, equip your Racket modules with explicit type declarations as you wish. And Racket doesn't just come as a typed variant; you can also write your modules in a purely functional and lazy dialect.

Racket comes in so many flavors because Racket is much more than a standard scripting language or a plain programming language. Racket supports language extensibility to an unequaled degree. A Racket programmer knows that making up a new language is as easy as writing a new library.

If you haven't looked at, say, Typed Racket or how to create your own languages in Racket, now might be a good time.

Formal Compiler Implementation in a Logical Framework

Jason Hickey, Aleksey Nogin, Adam Granicz, and Brian Aydemir (2003). Formal Compiler Implementation in a Logical Framework. Technical Report, California Institute of Technology.

The task of designing and implementing a compiler can be a difficult and error-prone process. In this paper, we present a new approach based on the use of higher-order abstract syntax and term rewriting in a logical framework. All program transformations, from parsing to code generation, are cleanly isolated and specified as term rewrites. This has several advantages. The correctness of the compiler depends solely on a small set of rewrite rules that are written in the language of formal mathematics. In addition, the logical framework guarantees the preservation of scoping, and it automates many frequently-occurring tasks including substitution and rewriting strategies. As we show, compiler development in a logical framework can be easier than in a general-purpose language like ML, in part because of automation, and also because the framework provides extensive support for examination, validation, and debugging of the compiler transformations. The paper is organized around a case study, using the MetaPRL logical framework to compile an ML-like language to Intel x86 assembly. We also present a scoped formalization of x86 assembly in which all registers are immutable.

I don't understand the details of this paper, but the general approach of using rewriting for compilation (including passes such as closure conversion) is very interesting in itself.
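
To make the rewriting-for-compilation idea concrete, here is a minimal sketch in OCaml of a pass specified as a set of term rewrites plus a strategy that applies them. The toy expression language and rules are mine, purely for illustration; they are not the paper's MetaPRL formalization.

    (* A toy expression language; the rewrite rules below are the whole
       "pass" (here, constant folding and algebraic simplification). *)
    type exp =
      | Int of int
      | Add of exp * exp
      | Mul of exp * exp

    (* The rule set: return Some reduct if a rule fires at the root. *)
    let step : exp -> exp option = function
      | Add (Int a, Int b) -> Some (Int (a + b))
      | Mul (Int a, Int b) -> Some (Int (a * b))
      | Mul (e, Int 1) | Mul (Int 1, e) -> Some e
      | _ -> None

    (* A bottom-up strategy: normalize subterms, then rewrite the root
       to a fixed point.  Strategies like this, along with substitution,
       are what the logical framework automates. *)
    let rec normalize (e : exp) : exp =
      let e' =
        match e with
        | Add (l, r) -> Add (normalize l, normalize r)
        | Mul (l, r) -> Mul (normalize l, normalize r)
        | Int _ as i -> i
      in
      match step e' with
      | Some e'' -> normalize e''
      | None -> e'

    let () =
      (* (2 + 3) * 1 rewrites to 5 *)
      assert (normalize (Mul (Add (Int 2, Int 3), Int 1)) = Int 5)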

(Edit: As a bonus, the paper is derived from the literate MetaPRL sources.)

(Edit: Oleg's applicative-order term rewriting system for code generation shows how simpler rewriting systems can be used in the context of compilation.)

Milawa: A Self-Verifying Theorem Prover for an ACL2-Like Logic

Milawa is a "self-verifying" theorem prover for an ACL2-like logic.

We begin with a simple proof checker, call it A, which is short enough to verify by the "social process" of mathematics.

We then develop a series of increasingly powerful proof checkers, call them B, C, D, and so on. We show that each of these is sound: they accept only the same formulas as A. We use A to verify B, and B to verify C, and so on. Then, since we trust A, and A says B is sound, we can trust B, and so on for C, D, and the rest.

Our final proof checker is really a theorem prover; it can carry out a goal-directed proof search using assumptions, calculation, rewrite rules, and so on. We use this theorem prover to discover the proofs of soundness for B, C, and so on, and to emit these proofs in a format that A can check. Hence, "self verifying."

This might help inform discussions of the relationship between the de Bruijn criterion (the "social process" of mathematics) and formal verification. I think it also serves as an interesting signpost on the road forward: it's one thing to say that starting with a de Bruijn core and evolving a more powerful prover is possible in principle; it's another thing for it to actually have been done. The author's thesis defense slides provide a nice, quick overview.
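
The trust-extension pattern itself is easy to caricature in code. The OCaml sketch below stubs out everything about the logic; no names or types correspond to Milawa's sources, and the checkers here do nothing real. It only shows the shape of "a trusted level n admits level n+1":

    (* A caricature only: a "checker" is a predicate on (formula, proof)
       pairs, and every body below is a placeholder. *)
    type formula = Claims_sound of string | Other of string
    type proof = Proof of string list

    type checker = { name : string; check : formula -> proof -> bool }

    (* Level A: small enough to audit by the "social process".  The body
       is a stub standing in for a real proof checker. *)
    let a : checker =
      { name = "A"; check = fun _formula (Proof steps) -> steps <> [] }

    (* Admit a new level only if an already-trusted level accepts a proof
       of its soundness; the result may then check the next level up. *)
    let extend (trusted : checker) (candidate : checker) (p : proof)
        : checker option =
      if trusted.check (Claims_sound candidate.name) p
      then Some candidate
      else None

    let () =
      let b = { name = "B"; check = a.check (* pretend: more powerful *) } in
      match extend a b (Proof ["..."]) with
      | Some b' -> ignore (extend b' { b' with name = "C" } (Proof ["..."]))
      | None -> ()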

VMKit: a Substrate for Managed Runtime Environments, VEE '10

Managed Runtime Environments (MREs), such as the JVM and the CLI, form an attractive environment for program execution, by providing portability and safety, via the use of a bytecode language and automatic memory management, as well as good performance, via just-in-time (JIT) compilation. Nevertheless, developing a fully featured MRE, including e.g. a garbage collector and JIT compiler, is a herculean task. As a result, new languages cannot easily take advantage of the benefits of MREs, and it is difficult to experiment with extensions of existing MRE based languages.

This paper describes and evaluates VMKit, a first attempt to build a common substrate that eases the development of high-level MREs. We have successfully used VMKit to build two MREs: a Java Virtual Machine and a Common Language Runtime. We provide an extensive study of the lessons learned in developing this infrastructure, and assess the ease of implementing new MREs or MRE extensions and the resulting performance. In particular, it took one of the authors only one month to develop a Common Language Runtime using VMKit. VMKit furthermore has performance comparable to the well established open source MREs Cacao, Apache Harmony and Mono, and is 1.2 to 3 times slower than JikesRVM on most of the DaCapo benchmarks.

So... One person built a CLR using VMKit in one month. One consequence of such faster development speeds is that language designers do not have to feel so restricted when targeting a Managed Runtime Environment for their language. If the MRE they want to target has restrictions, they can fork it. If the MRE specification has a gray area, then they can quickly prototype a solution to clarify what the behavior should be for that gray area of the specification. If you are a researcher/student and want to experiment with a new language design and implementation, then you can do so incrementally by first augmenting the MRE and then targeting your language to that new MRE; you can then benchmark the improvements by using the original MRE as a baseline.

Bart De Smet on .NET 4's System.Interactive library

Microsoft employee Bart De Smet, who has a widely trafficked blog, has been writing a lot in the past few months about a new library being designed by his group at Microsoft. Here is a whole truckload of blog post links, in chronological order, which appears to be how Bart intended folks to read them:

Dec 26, 2009: More LINQ with System.Interactive – The Ultimate Imperative
Dec 27, 2009: More LINQ with System.Interactive – Exceptional Exception Handling
Dec 28, 2009: More LINQ with System.Interactive – Sequences under construction
Dec 29, 2009: More LINQ with System.Interactive – Exploiting the code = data relationship
Dec 30, 2009: More LINQ with System.Interactive – More combinators for your Swiss Army Knife
Jan 01, 2010: The Essence of LINQ – MinLINQ
Jan 07, 2010: More LINQ with System.Interactive – Functional fun and taming side-effects

I don't usually read blogs, but I thought this was a pretty cogent series of posts. Also, judging by how interested LtU and the surrounding blogosphere were in Erik Meijer's presentation on the Rx framework at the JVM Language Summit 2009, I figured people would like this as well.

Resolving and Exploiting the k-CFA Paradox

Resolving and Exploiting the k-CFA Paradox, Matthew Might, Yannis Smaragdakis, and David Van Horn. To appear in PLDI 2010.

Low-level program analysis is a fundamental problem, taking the shape of "flow analysis" in functional languages and "points-to" analysis in imperative and object-oriented (OO) languages. Despite the similarities, the vocabulary and results in the two communities remain largely distinct, with limited cross-understanding. One of the few links is Shivers's k-CFA work, which has advanced the concept of "context-sensitive analysis" and is widely known in both communities. Recent results, however, indicate that the relationship between the different incarnations of the analysis is not understood.

Van Horn and Mairson proved k-CFA for k ≥ 1 to be EXPTIME-complete, hence no polynomial algorithm exists. Yet there have been multiple polynomial formulations of context-sensitive points-to analyses in OO languages. Is functional k-CFA a profoundly different analysis from OO k-CFA? We resolve this paradox by showing that OO features conspire to make the exact same specification of k-CFA be polynomial-time: objects and closures are subtly different, in a way that interacts crucially with context-sensitivity. This leads to a significant practical result: by emulating the OO approximation, we derive a polynomial hierarchy of context-sensitive CFAs for functional programs, simultaneously achieving high precision and efficiency.

I learned that performance bounds on flow analysis were fascinating from earlier work by David van Horn and Harry Mairson, so it's good to see that this line of work is still being continued, and even better to see new algorithms come out of it.
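
The closure/object contrast driving the result can be seen in miniature. In the OCaml sketch below, a closure captures its environment implicitly and atomically at creation time, while the object-style rendering of the same function carries its "environment" as explicit record fields that an analysis can track separately. This is just an ordinary defunctionalization-flavored illustration, not the paper's formal development:

    (* A closure: the binding of x is packaged invisibly with the code. *)
    let make_adder (x : int) : int -> int = fun y -> x + y

    (* The object-style rendering of the same function: the "environment"
       is an explicit record of fields, plus code that reads them.  A
       context-sensitive analysis can now track each field on its own. *)
    type adder = { x : int }
    let apply_adder (self : adder) (y : int) : int = self.x + y

    let () =
      let f = make_adder 1 in
      let o = { x = 1 } in
      assert (f 2 = apply_adder o 2)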

Delimited Control in OCaml, Abstractly and Concretely, System Description

We describe the first implementation of multi-prompt delimited control operators in OCaml that is direct in that it captures only the needed part of the control stack. The implementation is a library that requires no changes to the OCaml compiler or run-time, so it is perfectly compatible with existing OCaml source code and byte-code. The library has been in fruitful practical use for four years.

We present the library as an implementation of an abstract machine derived by elaborating the definitional machine. The abstract view lets us distill a minimalistic API, scAPI, sufficient for implementing multi-prompt delimited control. We argue that a language system that supports exception and stack-overflow handling supports scAPI. Our library illustrates how to use scAPI to implement multi-prompt delimited control in a typed language. The approach is general and can be used to add multi-prompt delimited control to other existing language systems.

Oleg was kind enough to send me an e-mail letting me know of this paper's existence (it appears not yet to be linked from the "Computation" page under which it is stored) and to include me in the acknowledgements. Since the paper in its current form has been accepted for publication, he indicated that it can be made more widely available, so here it is. In typical Oleg fashion, it offers insights at both the theoretical and implementation levels.
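
For a taste of the interface, here is a tiny example using the library's multi-prompt operators. The Delimcc module and the new_prompt, push_prompt, and shift functions are the library's, as described in the paper; the example itself is mine:

    (* Requires the delimcc library described in the paper. *)
    open Delimcc

    let () =
      let p = new_prompt () in
      (* push_prompt delimits the context; shift captures the part of
         the stack between the shift and its prompt, reified as k. *)
      let v =
        push_prompt p (fun () ->
          5 + shift p (fun k -> k (k 1)))
      in
      (* k is "fun v -> 5 + v", applied twice: 5 + (5 + 1) = 11 *)
      assert (v = 11)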

Verified Just-In-Time Compiler on x86

Magnus O. Myreen

This paper presents a method for creating formally correct just-in-time (JIT) compilers. The tractability of our approach is demonstrated through, what we believe is the first, verification of a JIT compiler with respect to a realistic semantics of self-modifying x86 machine code. Our semantics includes a model of the instruction cache. Two versions of the verified JIT compiler are presented: one generates all of the machine code at once, the other one is incremental i.e. produces code on-demand. All proofs have been performed inside the HOL4 theorem prover.

(To appear in next week's POPL.)

I've been enjoying this paper on my commute this week. It's a nice little distillation of some of the basics of the engineering structure of a JITted language and how the pieces fit together in a correct implementation. As JIT compilers become more and more commonplace, I'd like to see them presented in such a way that they're no more scary or daunting -- at least in principle -- than traditional offline compilers. Perhaps a chapter in EoPL4?
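
For those who want the shape of the thing without the HOL4: the incremental variant's compile-on-demand structure looks roughly like the OCaml toy below, with ordinary closures standing in for generated x86 code blocks. Every name and type here is an illustrative stand-in, not Myreen's development:

    (* A toy source language of one-argument arithmetic functions. *)
    type expr = Arg | Const of int | Add of expr * expr

    (* "compile": turn source into executable code once, up front. *)
    let rec compile (e : expr) : int -> int =
      match e with
      | Arg -> fun x -> x
      | Const n -> fun _ -> n
      | Add (a, b) ->
          let ca = compile a and cb = compile b in
          fun x -> ca x + cb x

    (* The JIT's code table: compile a function the first time it is
       invoked, then reuse the cached code on later calls. *)
    let table : (string, int -> int) Hashtbl.t = Hashtbl.create 16

    let run (name : string) (src : expr) (arg : int) : int =
      let code =
        match Hashtbl.find_opt table name with
        | Some c -> c
        | None ->
            let c = compile src in
            Hashtbl.add table name c;
            c
      in
      code arg

    let () = assert (run "double" (Add (Arg, Arg)) 21 = 42)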

Syntactic Proofs of Compositional Compiler Correctness

Semantic preservation by compilers for higher-order languages can be verified using simple syntactic methods. At the heart of classic techniques are relations between source-level and target-level values. Unfortunately, these relations are specific to particular compilers, leading to correctness theorems that have nothing to say about linking programs with functions compiled by other compilers or written by hand in the target language. Theorems based on logical relations manage to avoid this problem, but at a cost: standard logical relations do not apply directly to programs with non-termination or impurity, and extensions to handle those features are relatively complicated, compared to the classical compiler verification literature.

In this paper, we present a new approach to “open” compiler correctness theorems that is “syntactic” in the sense that the core relations do not refer to semantics. Though the technique is much more elementary than previous proposals, it scales up nicely to realistic languages. In particular, untyped and impure programs may be handled simply, while previous work has addressed neither in this context.

Our approach is based on the observation that it is an unnecessary handicap to consider proofs as black boxes. We identify some theorem-specific proof skeletons, such that we can define an algebra of nondeterministic compilations and their proofs, and we can compose any two compilations to produce a correct-by-construction result. We have prototyped these ideas with a Coq implementation of multiple CPS translations for an untyped Mini-ML source language with recursive functions, sums, products, mutable references, and exceptions.

A submitted draft of another paper from Adam, continuing to expand LambdaTamer's reach.
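
For reference, a call-by-value CPS translation of the kind being composed and verified looks like this in OCaml; this is the standard Plotkin-style translation for the untyped lambda calculus, not the paper's Coq development:

    (* Untyped lambda terms. *)
    type term =
      | Var of string
      | Lam of string * term
      | App of term * term

    (* Fresh-name supply for the variables the translation introduces. *)
    let gensym =
      let n = ref 0 in
      fun base -> incr n; Printf.sprintf "%s%d" base !n

    (* Plotkin's call-by-value CPS translation:
         [x]     = \k. k x
         [\x.m]  = \k. k (\x. [m])
         [m n]   = \k. [m] (\f. [n] (\v. f v k))           *)
    let rec cps (t : term) : term =
      let k = gensym "k" in
      match t with
      | Var x -> Lam (k, App (Var k, Var x))
      | Lam (x, m) -> Lam (k, App (Var k, Lam (x, cps m)))
      | App (m, n) ->
          let f = gensym "f" and v = gensym "v" in
          Lam (k,
               App (cps m,
                    Lam (f,
                         App (cps n,
                              Lam (v, App (App (Var f, Var v), Var k))))))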
