Implementation

Continuation-Passing C: Compiling threads to events through continuations

Gabriel Kerneis and Juliusz Chroboczek, "Continuation-Passing C: Compiling threads to events through continuations", arXiv:1011.4558.

In this paper, we introduce Continuation-Passing C (CPC), a programming language for concurrent systems in which native and cooperative threads are unified and presented to the programmer as a single abstraction. The CPC compiler uses a compilation technique, based on the CPS transform, that yields efficient code and an extremely lightweight representation for contexts. We provide a complete proof of the correctness of our compilation scheme. We show in particular that lambda-lifting, a common compilation technique for functional languages, is also correct in an imperative language like C, under some conditions enforced by the CPC compiler. The current CPC compiler is mature enough to write substantial programs such as Hekate, a highly concurrent BitTorrent seeder. Our benchmark results show that CPC is as efficient as, while significantly cheaper than, the most efficient thread libraries available.

See also the CPC website.
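For readers unfamiliar with the technique at the heart of CPC, here is a minimal sketch of the CPS transform itself, written in Scala rather than C for brevity (CPC, of course, emits C); the names are mine, not CPC's:

    object CpsSketch {
      // Direct style: the call stack implicitly remembers "what to do next".
      def square(x: Int): Int = x * x
      def sumSquares(x: Int, y: Int): Int = square(x) + square(y)

      // CPS-transformed: each function takes an explicit continuation k.
      // "What to do next" is now an ordinary value that a scheduler can
      // store, suspend, and resume -- the lightweight context CPC exploits.
      def squareK(x: Int, k: Int => Unit): Unit = k(x * x)
      def sumSquaresK(x: Int, y: Int, k: Int => Unit): Unit =
        squareK(x, xx =>
          squareK(y, yy =>
            k(xx + yy)))

      def main(args: Array[String]): Unit =
        sumSquaresK(3, 4, r => println(r))  // prints 25
    }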

Automatic Staged Compilation

Automatic Staged Compilation, doctoral dissertation of Matthai Philipose:

[...] The past few years have seen the emergence of staged optimization, which produces run-time optimizations that often have much lower run-time overhead than traditional optimizers, yet do not sacrifice any of their functionality. The key to the technique is a method, called staging, to transfer optimization overhead to static compile time from run time. Unfortunately, developing staged variants of individual optimizations has been highly specialized, labor-intensive work; staging pipelines of optimizations even more so.

This dissertation presents a system called the Staged Compilation Framework (SCF), which automatically stages entire pipelines of compiler optimizations at arguably little additional engineering cost beyond building the slower traditional version of the pipeline. SCF harnesses two powerful but traditionally difficult-to-use techniques, partial evaluation and dead-store elimination, to achieve staging. An implementation of SCF shows that staged compilation can speed up pipelines of classical compiler optimizations by up to an order of magnitude, and more commonly by a factor of 4.5 to 5.

I haven't read through it all yet, but after a cursory skim it certainly looks interesting.
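To make "staging" concrete, here is a toy sketch in Scala (mine, not SCF's; SCF stages whole optimization pipelines, not single functions). The idea is simply to pay for the statically known part of a computation once, ahead of time, leaving only a cheap residual for run time:

    object StagingSketch {
      // Unstaged: the test and recursion on n are paid on every call.
      def pow(base: Int, n: Int): Int =
        if (n == 0) 1 else base * pow(base, n - 1)

      // Staged by hand: given n early, build a residual function in which
      // the recursion and all the n == 0 tests have been evaluated away.
      def powStaged(n: Int): Int => Int =
        if (n == 0) _ => 1
        else {
          val rest = powStaged(n - 1) // happens once, at the early stage
          base => base * rest(base)   // only multiplications remain
        }

      def main(args: Array[String]): Unit = {
        val cube = powStaged(3) // specialization cost paid up front
        println(cube(5))        // 125, with no exponent logic at run time
      }
    }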

Pure and Declarative Syntax Definition: Paradise Lost and Regained, Onward 2010

Pure and Declarative Syntax Definition: Paradise Lost and Regained by Lennart C. L. Kats, Eelco Visser, and Guido Wachsmuth from Delft

Syntax definitions are pervasive in modern software systems, and serve as the basis for language processing tools like parsers and compilers. Mainstream parser generators pose restrictions on syntax definitions that follow from their implementation algorithm. They hamper evolution, maintainability, and compositionality of syntax definitions. The pureness and declarativity of syntax definitions is lost. We analyze how these problems arise for different aspects of syntax definitions, discuss their consequences for language engineers, and show how the pure and declarative nature of syntax definitions can be regained.

I haven't compared this version with the Onward 2010 version, but they look essentially the same. It seems timely to post this paper, considering the other recent story Yacc is dead. There is not a whole lot to argue against in this paper, since we all "know" the other approaches aren't as elegant, and resort to them only for specific reasons such as efficiency. Yet this is the first paper I know of that tries to state the argument to software engineers.

For example, the Dragon Book, in every single edition, effectively brushes these topics aside. In particular, the Dragon Book does not even mention scannerless parsing as a technique, and instead only explains the "advantages" of using a scanner. Unfortunately, the authors of this paper don't consider other design proposals either, such as Van Wyk's context-aware scanners from GPCE 2007. It is examples like these that make me wish the paper were a bit more robust in its analysis; the examples seem focused on the authors' previous work.

If you are not familiar with the authors' previous work in this area, the paper covers it in the references. It includes Martin Bravenboer's work on modular Eclipse IDE support for AspectJ.

Yacc is dead

In Yacc is dead (2010) Matthew Might and David Darais of the University of Utah, Salt Lake City...

present two novel approaches to parsing context-free languages. The first approach is based on an extension of Brzozowski’s derivative from regular expressions to context-free grammars. The second approach is based on a generalization of the derivative to parser combinators. The payoff of these techniques is a small (less than 250 lines of code), easy-to-implement parsing library capable of parsing arbitrary context-free grammars into lazy parse forests. Implementations for both Scala and Haskell are provided. Preliminary experiments with S-Expressions parsed millions of tokens per second, which suggests this technique is efficient enough for use in practice.

It seems every problem in computer science can be solved with either one more level of indirection or a derivative.
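For the curious, here is a sketch of the regular-expression base case of Brzozowski's derivative in Scala (names mine; the paper's actual library extends this to full context-free grammars via laziness and memoization):

    object DerivativeSketch {
      sealed trait Re
      case object Empty extends Re                  // the empty language
      case object Eps   extends Re                  // matches only ""
      case class Chr(c: Char)      extends Re
      case class Alt(l: Re, r: Re) extends Re
      case class Cat(l: Re, r: Re) extends Re
      case class Star(r: Re)       extends Re

      // Does r accept the empty string?
      def nullable(r: Re): Boolean = r match {
        case Empty | Chr(_) => false
        case Eps | Star(_)  => true
        case Alt(l, rr)     => nullable(l) || nullable(rr)
        case Cat(l, rr)     => nullable(l) && nullable(rr)
      }

      // Brzozowski derivative: the residual language after consuming c.
      def deriv(r: Re, c: Char): Re = r match {
        case Empty | Eps => Empty
        case Chr(d)      => if (c == d) Eps else Empty
        case Alt(l, rr)  => Alt(deriv(l, c), deriv(rr, c))
        case Cat(l, rr)  =>
          val left = Cat(deriv(l, c), rr)
          if (nullable(l)) Alt(left, deriv(rr, c)) else left
        case Star(inner) => Cat(deriv(inner, c), r)
      }

      // Matching is repeated differentiation plus a final nullability test.
      def matches(r: Re, s: String): Boolean =
        nullable(s.foldLeft(r)(deriv))

      def main(args: Array[String]): Unit = {
        val ab = Cat(Chr('a'), Star(Chr('b')))  // the language a b*
        println(matches(ab, "abbb")) // true
        println(matches(ab, "ba"))   // false
      }
    }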

Sawzall Language Open Sourced

Google has open-sourced Szl, a compiler and runtime for the Sawzall language:

Sawzall is a procedural language developed for parallel analysis of very large data sets (such as logs). It provides protocol buffer handling, regular expression support, string and array manipulation, associative arrays (maps), structured data (tuples), data fingerprinting (64-bit hash values), time values, various utility operations and the usual library functions operating on floating-point and string values. For years Sawzall has been Google's logs processing language of choice and is used for various other data analysis tasks across the company.

The original paper from Rob Pike et al.

Turning down the LAMP: Software specialization for the cloud

Several years ago, a reading group I was in read about the Flux OSKit Project, which aimed to provide a modular basis for operating systems. One of the topics of discussion was the possibility of, and possible benefits of, an application-specific OS. (For example, the fearful spectre of EmacsOS was raised.)

Today, I ran across "Turning down the LAMP: Software specialization for the cloud", which actually makes a pretty strong case for the idea on a virtual machine infrastructure:

...We instead view the cloud as a stable hardware platform, and present a programming framework which permits applications to be constructed to run directly on top of it without intervening software layers. Our prototype (dubbed Mirage) is unashamedly academic; it extends the Objective Caml language with storage extensions and a custom run-time to emit binaries that execute as a guest operating system under Xen. Mirage applications exhibit significant performance speedups for I/O and memory handling versus the same code running under Linux/Xen.

As one example,

Frameworks which currently use (for example) fork(2) on a host to spawn processes would benefit from using cloud management APIs to request resources and eliminate the distinction between cores and hosts.

On the other hand, I suspect that this "unashamedly academic" idea may already be advancing into the commercial arena, if I am correctly reading between the lines of the VMware vFabric tc Server marketing material.

Joe Duffy: A (brief) retrospective on transactional memory

A (brief) retrospective on transactional memory, by Joe Duffy, January 3rd, 2010. Although this is a blog post, don't expect to read it all on your lunch break...

The STM.NET incubator project was canceled May 11, 2010, after beginning public life July 27, 2009 at DevLabs. In this blog post, written 4 months prior to its cancellation, Joe Duffy discusses the practical engineering challenges around implementing Software Transactional Memory in .NET. Note: He starts off with a disclaimer that he was not engaged in the STM.NET project past its initial working group phase.

In short, Joe argues, "Throughout, it became abundantly clear that TM, much like generics, was a systemic and platform-wide technology shift. It didn’t require type theory, but the road ahead sure wasn’t going to be easy." The rest of the blog post deals with the many implementation challenges that platform-wide support for STM would pose in .NET, including which options were considered. He does not mention Maurice Herlihy's SXM library approach, but refers to Tim Harris's work several times.

There was plenty here that surprised me, especially when comparing Concurrent Haskell's STM implementation to STM.NET's design decisions and the interesting debates the team had. In Concurrent Haskell, issues Joe raises, like making Console.WriteLine transactional, are delegated to the type system by the very nature of the STM monad and its TVars, preventing programmers from writing such wishy-washy code. To be honest, this is why I didn't understand Joe's "it didn't require type theory" gambit, since some of the design concerns are mediated in Concurrent Haskell via type theory. On the other hand, the pragmatics Joe discusses, and the platform-wide integration with the CLR they were shooting for, remind me of The Transactional Memory / Garbage Collection Analogy. Joe also wrote a briefer follow-up post, More thoughts on transactional memory, where he talks more about Barbara Liskov's Argus.
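To make the contrast concrete, here is a sketch using the ScalaSTM library (my choice of illustration; neither STM.NET nor Haskell). Nothing in the types stops irrevocable I/O inside a transaction that may re-execute, which is exactly the class of mistake Haskell's STM monad rules out statically:

    import scala.concurrent.stm._  // ScalaSTM, a library-level STM

    object StmSketch {
      val balance = Ref(100)

      def withdraw(amount: Int): Unit =
        atomic { implicit txn =>
          // The types permit this, but if the transaction conflicts and
          // retries, the message prints more than once. Haskell's STM
          // monad would reject the equivalent putStrLn at compile time.
          println("attempting withdrawal of " + amount)
          if (balance() >= amount) balance() = balance() - amount
        }
    }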

OCaml 3.12 released

This notice comes a little late, but the latest version of OCaml, version 3.12, has been released. Surprisingly for a point release, there are a lot of interesting new language features:

Some of the highlights in release 3.12 are:

  • Polymorphic recursion is supported, using explicit type declarations on the recursively-defined identifiers.
  • First-class modules: module expressions can be embedded as values of the core language, then manipulated like any other first-class value, then projected back to the module level.
  • New operator to modify a signature a posteriori: S with type t := tau denotes signature S where the t type component is removed and substituted by the type tau elsewhere.
  • New notations for record expressions and record patterns: { lbl } as shorthand for { lbl = lbl }, and { ...; _ } marks record patterns where some labels were intentionally omitted.
  • Local opens: let open ... in ... is now supported, by popular demand.
  • Type variables can be bound as type parameters to functions; such types are treated like abstract types within the function body, and like type variables (possibly generalized) outside.
  • The module type of construct makes it possible to recover the module type of a given module.
  • Explicit method override using the method! keyword, with associated warnings and errors.

I'm especially intrigued by first-class modules, and the destructive signature operations, both of which should make it much easier to write libraries.

Type Classes as Objects and Implicits

Type Classes as Objects and Implicits

Type classes were originally developed in Haskell as a disciplined alternative to ad-hoc polymorphism. Type classes have been shown to provide a type-safe solution to important challenges in software engineering and programming languages such as, for example, retroactive extension of programs. They are also recognized as a good mechanism for concept-based generic programming and, more recently, have evolved into a mechanism for type-level computation. This paper presents a lightweight approach to type classes in object-oriented (OO) languages with generics using the CONCEPT pattern and implicits (a type-directed implicit parameter passing mechanism).

This paper also shows how Scala’s type system conspires with implicits to enable, and even surpass, many common extensions of the Haskell type class system, making Scala ideally suited for generic programming in the large.

Martin Odersky and team's design decisions around how to do type classes in a unified OO and FP language continue to bear fascinating fruit. Implicits look less and less like a "poor man's type classes" and more and more like an improvement upon type classes; that, at least, is my impression from a quick read of this paper.
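Here is the CONCEPT pattern in miniature, a sketch with names of my choosing rather than the paper's: the type class becomes a generic trait, instances become implicit values, and a Haskell-style constraint becomes an implicit parameter the compiler fills in by type:

    object ConceptSketch {
      // The "concept": a type class as a generic interface.
      trait Ord[T] { def compare(a: T, b: T): Int }

      // An instance, made available for implicit lookup.
      implicit object IntOrd extends Ord[Int] {
        def compare(a: Int, b: Int): Int = a compare b
      }

      // A Haskell constraint `Ord t => ...` becomes an implicit
      // dictionary parameter, resolved by the compiler at the call site.
      def max[T](xs: List[T])(implicit ord: Ord[T]): T =
        xs.reduceLeft((a, b) => if (ord.compare(a, b) >= 0) a else b)

      def main(args: Array[String]): Unit =
        println(max(List(3, 1, 4)))  // IntOrd is supplied implicitly
    }

Unlike Haskell instances, these dictionaries are ordinary values that can be scoped, shadowed, or passed explicitly, which seems to be much of what the paper means by "surpass".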

Compiling Structural Types on the JVM

Here's a little sausage-making article for JVM language implementors. In Compiling Structural Types on the JVM: A Comparison of Reflective and Generative Techniques from Scala’s Perspective, Gilles Dubochet and Martin Odersky describe

Scala’s compilation technique of structural types for the JVM. The technique uses Java reflection and polymorphic inline caches. Performance measurements of this technique are presented and analysed. Further measurements compare Scala’s reflective technique with the “generative” technique used by Whiteoak to compile structural types. The article ends with a comparison of reflective and generative techniques for compiling structural types. It concludes that generative techniques may, in specific cases, exhibit higher performance than reflective approaches, but that reflective techniques are easier to implement and have fewer restrictions.

There's no discussion of the proposed JVM "method handles" and whether they might be an even better solution than runtime reflection.
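For a feel of what is being compiled, here is a small example (mine, not from the article): a call through a structural type has no interface to link against, so scalac dispatches it through java.lang.reflect.Method, caching the lookup in a polymorphic inline cache:

    import scala.language.reflectiveCalls

    object StructuralSketch {
      // Any object with a parameterless close() qualifies. With no common
      // interface to call through, scalac compiles c.close() reflectively.
      def withClose(c: { def close(): Unit })(body: => Unit): Unit =
        try body finally c.close()

      def main(args: Array[String]): Unit = {
        val w = new java.io.StringWriter
        withClose(w) { w.write("hello") }  // reflective close() afterwards
      }
    }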

Whiteoak was mentioned previously on LtU.
