Implementation

Taming the IXP network processor

Taming the IXP network processor, Lal George and Matthias Blume. PLDI 2003.

We compile Nova, a new language designed for writing network processing applications, using a back end based on integer-linear programming (ILP) for register allocation, optimal bank assignment, and spills. The compiler's optimizer employs CPS as its intermediate representation; some of the invariants that this IR guarantees are essential for the formulation of a practical ILP model.

Appel and George used a similar ILP-based technique for the IA32 to decide which variables reside in registers but deferred the actual assignment of colors to a later phase. We demonstrate how to carry over their idea to an architecture with many more banks, register aggregates, variables with multiple simultaneous register assignments, and, very importantly, one where bank- and register-assignment cannot be done in isolation from each other. Our approach performs well in practice, without causing an explosion in the size or solve time of the generated integer linear programs.

This is a nice paper showing how to design and compile a small functional language in a high-performance domain with a very irregular underlying machine model.
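To give a flavour of the encoding, here is a minimal sketch of joint bank/register assignment as a 0-1 ILP. This toy formulation is mine, not the paper's actual model: x_{v,r} = 1 means variable v lives in register r, s_v = 1 means v is spilled, c_v is v's spill cost, and bank(r) is the bank containing r.

    minimize    \sum_v c_v s_v
    subject to  s_v + \sum_r x_{v,r} = 1    for every variable v
                x_{u,r} + x_{v,r} \le 1     for every register r and every
                                            interfering pair u, v
                x_{v,r} = 0                 whenever an instruction using v
                                            requires a bank other than bank(r)
                x_{v,r}, s_v \in \{0, 1\}

The last family of constraints is what couples the two problems: whether a register is legal for v depends on its bank, so bank assignment and register assignment cannot be solved separately.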

Compiling with Continuations, Continued

Compiling with Continuations, Continued, Andrew Kennedy. ICFP 2007.

We present a series of CPS-based intermediate languages suitable for functional language compilation, arguing that they have practical benefits over direct-style languages based on A-normal form (ANF) or monads. Inlining of functions demonstrates the benefits most clearly: in ANF-based languages, inlining involves a renormalization step that rearranges let expressions and possibly introduces a new ‘join point’ function, and in monadic languages, commuting conversions must be applied; in contrast, inlining in our CPS language is a simple substitution of variables for variables.

We present a contification transformation implemented by simple rewrites on the intermediate language. Exceptions are modelled using so-called ‘double-barrelled’ CPS. Subtyping on exception constructors then gives a very straightforward effect analysis for exceptions. We also show how a graph-based representation of CPS terms can be implemented extremely efficiently, with linear-time term simplification.
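To make the "inlining is just substitution" claim concrete, here is a toy CPS term language in OCaml. This is my own stripped-down rendering, not Kennedy's actual IR, and it assumes the usual invariant that all binders are unique, so substitution need not worry about capture.

    type var = string

    type term =
      | LetVal  of var * value * term          (* let x = v in K *)
      | LetCont of var * var * term * term     (* letcont k x = K1 in K2 *)
      | AppCont of var * var                   (* k x : return x to continuation k *)
      | App     of var * var * var             (* f x k : call f, returning to k *)
    and value =
      | Lam of var * var * term                (* \x k. body *)
      | Int of int

    (* Apply a variable-to-variable renaming. *)
    let look m x = try List.assoc x m with Not_found -> x

    let rec subst m = function
      | LetVal (x, v, t)       -> LetVal (x, subst_value m v, subst m t)
      | LetCont (k, x, t1, t2) -> LetCont (k, x, subst m t1, subst m t2)
      | AppCont (k, x)         -> AppCont (look m k, look m x)
      | App (f, x, k)          -> App (look m f, look m x, look m k)
    and subst_value m = function
      | Lam (x, k, body) -> Lam (x, k, subst m body)
      | v                -> v

    (* Inlining f = Lam (y, j, body) at a call site App (f, x, k):
       substitute variables for variables, and we are done. *)
    let inline_call body y j x k = subst [ (y, x); (j, k) ] body

No let rearrangement, no renormalization, no fresh join points: the call's return continuation k simply takes the place of the function's return continuation j.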

Generational Real-time Garbage Collection

Generational Real-time Garbage Collection: A Three-part Invention for Young Objects, Daniel Frampton, David F. Bacon, Perry Cheng, David Grove. ECOOP 2007.

While real-time garbage collection is now available in production virtual machines, the lack of generational capability means applications with high allocation rates are subject to reduced throughput and high space overheads.

Since frequent allocation is often correlated with a high-level, object-oriented style of programming, this can force builders of real-time systems to compromise on software engineering.

We have developed a fully incremental, real-time generational collector based on a tri-partite nursery, which partitions the nursery into regions that are being allocated, collected, and promoted. Nursery collections are incremental, and can occur within any phase of a mature collection.

We present the design, mathematical model, and implementation of our collector in IBM's production Real-time Java virtual machine, and show both analytically and experimentally that the collector achieves real-time bounds comparable to a non-generational Metronome-style collector, while cutting memory consumption and total execution times by as much as 44% and 24% respectively.
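As a cartoon of the structure, and only that (this sketch is mine; it ignores the paper's central contribution of making every step incremental and interleavable with mature collection):

    (* The three roles a nursery region can play; when the allocating
       region fills, the roles rotate. A deliberate oversimplification
       of the paper's design. *)
    type role =
      | Allocating   (* new objects are bump-allocated here *)
      | Collecting   (* live objects are being evacuated from here *)
      | Promoting    (* survivors here are being moved to the mature space *)

    let rotate = function
      | Allocating -> Collecting
      | Collecting -> Promoting
      | Promoting  -> Allocating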

Since the last post on GC was so popular, I figured I should post another one. The next few days I'll probably post links to a subset of the papers presented at ECOOP that a) I saw the talk for, and b) I found particularly striking from a language design POV.

Garbage Collection Without Paging

Garbage Collection without Paging, Matthew Hertz, Yi Feng, Emery D. Berger. PLDI 2005.

Garbage collection offers numerous software engineering advantages, but interacts poorly with virtual memory managers. Existing garbage collectors require far more pages than the application’s working set and touch pages without regard to which ones are in memory, especially during full-heap garbage collection. The resulting paging can cause throughput to plummet and pause times to spike up to seconds or even minutes.

We present a garbage collector that avoids paging. This bookmarking collector cooperates with the virtual memory manager to guide its eviction decisions. Using summary information (“bookmarks”) recorded from evicted pages, the collector can perform in-memory full-heap collections. In the absence of memory pressure, the bookmarking collector matches the throughput of the best collector we tested while running in smaller heaps. In the face of memory pressure, it improves throughput by up to a factor of five and reduces pause times by up to a factor of 45 over the next best collector. Compared to a collector that consistently provides high throughput (generational mark-sweep), the bookmarking collector reduces pause times by up to 218x and improves throughput by up to 41x. Bookmarking collection thus provides greater utilization of available physical memory than other collectors while matching or exceeding their throughput.
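The mechanism is easy to caricature. The following sketch is my reading of the idea, not the authors' code, and scan is a hypothetical helper that enumerates a resident page's outgoing pointers:

    (* page number -> conservative summary of the page's outgoing pointers *)
    let bookmarks : (int, int list) Hashtbl.t = Hashtbl.create 64

    (* Called on the virtual memory manager's eviction notification, while
       the page is still resident and cheap to scan. *)
    let on_eviction_notice ~page ~(scan : int -> int list) =
      Hashtbl.replace bookmarks page (scan page)

    (* During a full-heap collection, the bookmark stands in for tracing an
       evicted page, so the collector never faults it back in. *)
    let targets_of ~page ~resident ~(scan : int -> int list) =
      if resident page then scan page
      else (try Hashtbl.find bookmarks page with Not_found -> [])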

Guaranteed Optimization

"Guaranteed Optimization", in Active Libraries and Universal Languages, by Todd L. Veldhuizen. PhD thesis, 2004.

This chapter describes a new technique for designing optimizers that provide proven guarantees of what optimizations they perform. We use rewrite rules to capture common patterns of 'adding abstraction' to programs, and develop a proof technique for compilers based on superanalysis (Chapter 2) to prove that any sequence of rule applications is undone in a single step. Such compilers address the problem of 'abstraction penalty' (see Section 1.4.1) and offer programmers an intuitive guarantee of what improvements the compiler will perform. We also show that compilers based on this approach find 'optimal programs' in a carefully defined sense.

I know it's a little unusual to link to a chapter of a thesis instead of a paper, but on his web page the author said the paper I was planning on linking to ("Guaranteed Optimization: Proving Nullspace Properties of Compilers") had been superseded.

Anyway, the basic idea well illustrates the adage that a small change of POV can yield a lot of insight. Here, Veldhuizen changes the compiler writer's POV from "optimization" to "de-pessimization", and gets a really nice result: an exact characterization of the specific kinds of inefficient patterns the compiler will eliminate. The value of this kind of predictability can be very large; functional programmers can rely on tail call optimization precisely because they know exactly when the compiler will do the transformation.
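A toy version of the shift in perspective (this OCaml fragment is mine, not Veldhuizen's system): write each rule so that it undoes one recognizable way of adding abstraction, and run the rules to a fixed point.

    (* Two 'de-pessimization' rules over a tiny expression language:
       constant folding and inlining of trivially bound lets. *)
    type expr =
      | Int of int
      | Var of string
      | Add of expr * expr
      | Let of string * expr * expr

    let rec subst x e = function
      | Var y when y = x -> e
      | Add (a, b)       -> Add (subst x e a, subst x e b)
      | Let (y, a, b)    -> Let (y, subst x e a, if y = x then b else subst x e b)
      | other            -> other

    let rec simplify = function
      | Add (a, b) ->
          (match simplify a, simplify b with
           | Int m, Int n -> Int (m + n)
           | a', b'       -> Add (a', b'))
      | Let (x, e, body) ->
          (match simplify e with
           | (Int _ | Var _) as v -> simplify (subst x v body)   (* undo the let *)
           | e'                   -> Let (x, e', simplify body))
      | e -> e

The guarantee Veldhuizen is after is exactly the part this toy omits: a proof that any sequence of abstraction-adding rewrites is undone in one pass, so the programmer can predict the output precisely.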

Cloning-Based Context-Sensitive Pointer Alias Analysis Using Binary Decision Diagrams

Cloning-Based Context-Sensitive Pointer Alias Analysis Using Binary Decision Diagrams, John Whaley and Monica S. Lam. PLDI 2004.

This paper presents the first scalable context-sensitive, inclusion-based pointer alias analysis for Java programs. Our approach to context sensitivity is to create a clone of a method for every context of interest, and run a context-insensitive algorithm over the expanded call graph to get context-sensitive results. For precision, we generate a clone for every acyclic path through a program's call graph, treating methods in a strongly connected component as a single node. Normally, this formulation is hopelessly intractable as a call graph often has 10^14 acyclic paths or more. We show that these exponential relations can be computed efficiently using binary decision diagrams (BDDs). Key to the scalability of the technique is a context numbering scheme that exposes the commonalities across contexts. We applied our algorithm to the most popular applications available on Sourceforge, and found that the largest programs, with hundreds of thousands of Java bytecodes, can be analyzed in under 20 minutes.

This paper shows that pointer analysis, and many other queries and algorithms, can be described succinctly and declaratively using Datalog, a logic programming language. We have developed a system called bddbddb that automatically translates Datalog programs into highly efficient BDD implementations. We used this approach to develop a variety of context-sensitive algorithms including side effect analysis, type analysis, and escape analysis.
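The heart of the context-insensitive version is essentially two Datalog rules: vP(v,h) :- vP0(v,h) for allocation sites, and vP(v,h) :- assign(v,u), vP(u,h) for copies (the real program adds rules for loads, stores, and calls). Here is a deliberately naive OCaml fixpoint over just those two rules; it is mine, list-based, and purely illustrative, since the paper's whole point is that BDDs make the exponential context-sensitive version tractable:

    (* vP0    : (variable, heap object) facts from allocation sites
       assign : (destination, source) copy edges
       Result : the least relation vP closed under the two rules. *)
    let rec fix step facts =
      let facts' = step facts in
      if List.length facts' = List.length facts then facts else fix step facts'

    let points_to vP0 assign =
      let step vP =
        let copied =
          List.concat_map
            (fun (v, u) ->
               List.filter_map
                 (fun (u', h) -> if u' = u then Some (v, h) else None)
                 vP)
            assign
        in
        List.sort_uniq compare (vP @ copied)
      in
      fix step (List.sort_uniq compare vP0)

For example, points_to [("x", "A")] [("y", "x"); ("z", "y")] reaches the fixed point [("x", "A"); ("y", "A"); ("z", "A")].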

Binary decision diagrams are one of the coolest data structures of the last 20 years, and are also one of the basic enabling data structures underlying modern SAT solvers. Using them to implement Datalog is a really clever idea.

EDIT: I relied on memory instead of Google, and got it wrong about BDDs and SAT solving. Modern DPLL-based SAT solvers generally do not use BDDs, because when the solutions are far apart in the solution space, the BDD representing them gets quite large. BDDs are widely used in verification, though, and are also still one of my favorite data structures. :)
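For the curious, the essence of a reduced, ordered BDD library is a hash-consed node constructor. This OCaml toy is my own, but the construction is the standard one:

    (* Node (id, var, low, high); ids make equality tests O(1). *)
    type bdd = Leaf of bool | Node of int * int * bdd * bdd

    let id = function Leaf false -> 0 | Leaf true -> 1 | Node (i, _, _, _) -> i

    let table : (int * int * int, bdd) Hashtbl.t = Hashtbl.create 1024
    let next_id = ref 2

    (* The two ROBDD invariants: never build a node whose branches are
       equal, and never build the same (var, low, high) triple twice. *)
    let mk v low high =
      if id low = id high then low
      else
        let key = (v, id low, id high) in
        match Hashtbl.find_opt table key with
        | Some n -> n
        | None ->
            let n = Node (!next_id, v, low, high) in
            incr next_id;
            Hashtbl.add table key n;
            n

An apply operation that recurses on the smallest top variable and memoizes on pairs of node ids completes a usable kernel; the hash-consing is what makes the DAG canonical, so equivalent boolean functions are represented by physically the same node.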

Extending Java with Yield

The Yielder library by infomancers (a.k.a. Aviad Ben Dov of the Chaotic Java weblog) is a library that provides Java with Ruby/Python/C#-style yield for iterators.

It works by using Java 1.5's facility for hooking user-defined class inspectors/transformers into the classloader: the bytecode of a yieldNextCore() method containing calls to yieldReturn(foo) is rewritten so that its local variables are promoted to fields of an anonymous inner class, which also maintains the state of the iterator/generator.
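Conceptually, the rewrite turns the method's locals into fields of a state object that survives across calls. The OCaml analogue below is mine and mirrors only the shape of the transformation, not the library's actual Java output:

    (* A generator yielding 0, 1, ..., limit-1. The promoted local i keeps
       its value across calls to next; a body with several yieldReturn
       sites would additionally need an explicit state index to pick the
       right resumption point. *)
    type gen = { mutable i : int; limit : int }

    let make limit = { i = 0; limit }

    let next g =
      if g.i < g.limit then begin
        let v = g.i in
        g.i <- g.i + 1;
        Some v
      end else None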

Examples, rationale and implementation are discussed in detail at Chaotic Java. There is also a collections library built on top of Yielder, with tree traversal iterators and map/filter-like functionality.

From a PL perspective, this ability to hook into the classloader and do bytecode manipulation on plain java classes provides an interesting mechanism for prototyping new language features in libraries. It could equally well support an aspect framework (via annotations), and perhaps even something like Terracotta's distributed heap.

Application-specific foreign-interface generation

Application-specific foreign-interface generation, John Reppy and Chunyan Song, October 2006.

A foreign interface (FI) mechanism to support interoperability with libraries written in other languages (especially C) is an important feature in most high-level language implementations. Such FI mechanisms provide a Foreign Function Interface (FFI) for the high-level language to call C functions and marshaling and unmarshaling mechanisms to support conversion between the high-level and C data representations. Often, systems provide tools to automate the generation of FIs, but these tools typically lock the user into a specific model of interoperability. It is our belief that the policy used to craft the mapping between the high-level language and C should be distinct from the underlying mechanism used to implement the mapping.

In this paper, we describe an FI generation tool, called FIG (for Foreign Interface Generator) that embodies a new approach to the problem of generating foreign interfaces for high-level languages. FIG takes as input raw C header files plus a declarative script that specifies the generation of the foreign interface from the header file. The script sets the policy for the translation, which allows the user to tailor the resulting FI to his or her application. We call this approach application-specific foreign-interface generation. The scripting language uses rewriting strategies as its execution model. The other major feature of the scripting language is a novel notion of composable typemaps that describe the mapping between high-level and low-level types.

FFIs are a perennial engineering problem, and it's very nice to see progress being made on automating what's automatable about building interfaces. Their interface specification language is built from two little DSLs. The first is a language for specifying how to map low-level types to high-level types, and the second is a rewriting-based language for translating API functions, which makes use of the type mappings defined with the first. The whole thing is quite pretty, and seems to read very well.

An interesting gimme for you stack-language fans: the DSL that Reppy and Song use to specify type mappings from low-level to high-level types is a combinator-based language that reads a bit like Forth or Postscript.
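As a guess at the flavour of those typemaps (the names and representation here are mine, not FIG's), a composable typemap can be modelled as a pair of conversions, with composition stacking them combinator-style:

    (* A typemap between a low-level (C-side) and a high-level
       representation of the same data. *)
    type ('lo, 'hi) typemap = {
      wrap   : 'lo -> 'hi;   (* C representation -> high-level value *)
      unwrap : 'hi -> 'lo;   (* high-level value -> C representation *)
    }

    let compose (a : ('lo, 'mid) typemap) (b : ('mid, 'hi) typemap)
        : ('lo, 'hi) typemap =
      { wrap   = (fun x -> b.wrap (a.wrap x));
        unwrap = (fun y -> a.unwrap (b.unwrap y)) }

    (* Example: a C int used as a boolean flag. *)
    let c_bool : (int, bool) typemap =
      { wrap = (fun n -> n <> 0); unwrap = (fun b -> if b then 1 else 0) }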

A Functional Description of TeX's Formula Layout

A Functional Description of TeX's Formula Layout, Reinhold Heckmann and Reinhard Wilhelm, Journal of Functional Programming 1997.

While the quality of the results of TeX's mathematical formula layout algorithm is convincing, its original description is hard to understand since it is presented as an imperative program with complex control flow and destructive manipulations of the data structures representing formulae. In this paper, we present a re-implementation of TeX's formula layout algorithm in the functional language SML, thereby providing a more readable description of the algorithm, extracted from the monolithic TeX system.

I've wanted a simple description of what TeX is doing for a long time, and it's nice to see that laid out in a clear and readable way.
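The functional reading boils down to building and measuring nested box structures. A toy OCaml fragment in that spirit (my abbreviation, not the paper's SML code):

    type dim = float   (* dimensions in points *)

    type box =
      | Glyph of char * dim        (* a character and its width *)
      | Glue  of dim               (* (stretchable) space *)
      | HBox  of box list          (* horizontal juxtaposition *)
      | Raise of dim * box         (* vertical shift, e.g. for superscripts *)

    let rec width = function
      | Glyph (_, w) -> w
      | Glue w       -> w
      | Raise (_, b) -> width b
      | HBox bs      -> List.fold_left (fun acc b -> acc +. width b) 0.0 bs

TeX's actual algorithm additionally threads style information (display, text, and script sizes) through the traversal, which the paper's functional presentation makes explicit.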

Type-sensitive control-flow analysis

Type-sensitive control-flow analysis, John Reppy, 2006 ML Workshop.

Higher-order typed languages, such as ML, provide strong support for data and type abstraction. While such abstraction is often viewed as costing performance, there are situations where it may provide opportunities for more aggressive program optimization. Specifically, we can exploit the fact that type abstraction guarantees representation independence, which allows the compiler to specialize data representations. This paper describes a first step in supporting such optimizations; namely a modular control-flow analysis that uses the program’s type information to compute more precise results. We present our algorithm as an extension of Serrano’s version of 0-CFA and we show that it respects types. We also discuss applications of the analysis with a specific focus on optimizing Concurrent ML programs.
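In caricature (this toy is mine, not Reppy's algorithm): where 0-CFA propagates sets of abstract functions to each call site, the type-sensitive variant prunes any candidate whose type cannot match the static type at that site:

    (* Abstract functions tagged with their types. *)
    type ty = TInt | TArrow of ty * ty
    type absfun = { name : string; fn_ty : ty }

    (* Plain 0-CFA would return the candidate set unchanged; consulting the
       static type of the operator position tightens it. Structural
       equality stands in for a real compatibility check. *)
    let callees_at (call_ty : ty) (candidates : absfun list) =
      List.filter (fun f -> f.fn_ty = call_ty) candidates

Since the paper shows the analysis respects types, the pruned sets remain sound, and the smaller flow sets are what enable the representation-specializing optimizations it targets.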
