Implementation

Kona

Kona is a new open-source implementation of Arthur Whitney's K, an ASCII-based APL like language. Kona is a fully working version of K3.

If you haven't ever tried APL/J/K or ilk you might find this language incomprehensible at first -- unless you like a challenge! Watch the screencasts or read some of our earlier APL/J stories.

Regardless of your interest in K, any LtUer worth his salt will enjoy the source code. We wrote a bit about the history of the remarkable C coding style used in the past, but I can't locate the link at the moment.

Finding and Understanding Bugs in C Compilers

In Finding and Understanding Bugs in C Compilers Xuejun Yang, Yang Chen, Eric Eide, and John Regehr of University of Utah, School of Computing describe Csmith, a fuzzer for testing C compilers. The hard part was avoiding undefined behavior.

Compilers should be correct. To improve the quality of C compilers, we created Csmith, a randomized test-case generation tool, and spent three years using it to find compiler bugs. During this period we reported more than 325 previously unknown bugs to compiler developers. Every compiler we tested was found to crash and also to silently generate wrong code when presented with valid input. In this paper we present our compiler-testing tool and the results of our bug-hunting study. Our first contribution is to advance the state of the art in compiler testing. Unlike previous tools, Csmith generates programs that cover a large subset of C while avoiding the undefined and unspecified behaviors that would destroy its ability to automatically find wrong code bugs. Our second contribution is a collection of qualitative and quantitative results about the bugs we have found in open-source C compilers.

Two bits really stuck out for me. First, formal verification has a real positive impact

The striking thing about our CompCert results is that the middleend bugs we found in all other compilers are absent. As of early 2011, the under-development version of CompCert is the only compiler we have tested for which Csmith cannot find wrong-code errors. This is not for lack of trying: we have devoted about six CPU-years to the task. The apparent unbreakability of CompCert supports a strong argument that developing compiler optimizations within a proof framework, where safety checks are explicit and machine-checked, has tangible benefits for compiler users.

And second, code coverage is inadequate for ensuring good test thoroughness for software as complex as a compiler.

Because we find many bugs, we hypothesized that randomly generated programs exercise large parts of the compilers that were not covered by existing test suites. To test this, we enabled code coverage monitoring in GCC and LLVM. We then used each compiler to build its own test suite, and also to build its test suite plus 10,000 Csmith-generated programs. Table 3 shows that the incremental coverage due to Csmith is so small as to be a negative result. Our best guess is that these metrics are too shallow to capture Csmith’s effects, and that we would generate useful additional coverage in terms of deeper metrics such as path or value coverage.

Leveled Garbage Collection

Leveled Garbage Collection by Guanshan Tong and Michael J. O'Donnell:

Generational garbage collection (GGC) is one of the most popular garbage collection techniques. GGC gains a performance advantage by performing minor collections on the younger objects in the heap, reducing the number of major collections of the whole heap. A promotion policy determines when an object moves from the younger generation to the older. The design of GGC has been justified by the plausible assumption that many objects die very young, and a few live a very long time. But, attempts to tune the performance of GGC by adjusting the promotion policy have been disappointing - only the simplest immediate promotion policy has proved attractive. The success of GGC is probably due to simplicity and to avoiding scans of the whole heap, rather than to accurate lifetime predictions.
This paper presents Leveled Garbage Collection (LGC), a new algorithm that is not based on object ages. It uses a heap structure and collection scheme similar to those of generational garbage collectors, and has a non-age-based promotion policy that doesn't promote all of the live objects, but still guarantees ample free space immediately after each garbage collection. By tuning LGC's promotion policy, we can often improve on GGC with immediate promotion.
Performance comparisons show that LGC outperforms GGC with immediate promotion policy in many cases, while losing only slightly on cases favorable to immediate promotion. LGC has a substantial advantage when the heap fits in main memory, and an even greater advantage as the heap gets paged to disk.

Leveled GC is based on a more general heuristic than generational GC, in that it tries to keep as many objects as possible in the nursery because minor collections are so much cheaper. What I found most interesting about this paper is that it scales well with virtual memory, which as we know can degrade performance significantly. They provide benchmarks demonstrating a marked difference when large heap sizes trigger paging (Section 5.2.2). LGC performance is hardly affected, while the runtime of generational GC degrades significantly.

Memory Models: A Case for Rethinking Parallel Languages and Hardware, CACM, August 2010

Memory Models: A Case for Rethinking Parallel Languages and Hardware by Sarita V. Adve and Hans-J. Boehm

This is a pre-print of the actual version.

The era of parallel computing for the masses is here, but writing correct parallel programs remains far more difficult than writing sequential programs. Aside from a few domains, most parallel programs are written using a shared-memory approach. The memory model, which specifies the meaning of shared variables, is at the heart of this programming model. Unfortunately, it has involved a tradeoff between programmability and performance, and has arguably been one of the most challenging and contentious areas in both hardware architecture and programming language specification. Recent broad community-scale efforts have finally led to a convergence in this debate, with popular languages such as Java and C++ and most hardware vendors publishing compatible memory model specifications. Although this convergence is a dramatic improvement, it has exposed fundamental shortcomings in current popular languages and systems that prevent achieving the vision of structured and safe parallel programming.

This paper discusses the path to the above convergence, the hard lessons learned, and their implications. A cornerstone of this convergence has been the view that the memory model should be a contract between the programmer and the system - if the programmer writes disciplined (data-race-free) programs, the system will provide high programmability (sequential consistency) and performance. We discuss why this view is the best we can do with current popular languages, and why it is inadequate moving forward. We then discuss research directions that eliminate much of the concern about the memory model, but require rethinking popular parallel languages and hardware. In particular, we argue that parallel languages should not only promote high-level disciplined models, but they should also enforce the discipline. Further, for scalable and efficient performance, hardware should be co-designed to take advantage of and support such disciplined models. The inadequacies of the state-of-the-art and the research agenda we outline have deep implications for the practice, research, and teaching of many computer science sub-disciplines, spanning theory, software, and hardware.

Invertible Syntax Descriptions: Unifying Parsing and Pretty Printing

In Invertible Syntax Descriptions: Unifying Parsing and Pretty Printing, Rendel Tillmann and Klaus Ostermann at the University of Marburg, Germany apply the "don't repeat yourself" principle to parsers and pretty printers.

Parsers and pretty-printers for a language are often quite similar, yet both are typically implemented separately, leading to redundancy and potential inconsistency. We propose a new interface of syntactic descriptions, with which both parser and pretty-printer can be described as a single program. Whether a syntactic description is used as a parser or as a pretty-printer is determined by the implementation of the interface. Syntactic descriptions enable programmers to describe the connection between concrete and abstract syntax once and for all, and use these descriptions for parsing or pretty-printing as needed. We also discuss the generalization of our programming technique towards an algebra of partial isomorphisms.

Language Virtualization for Heterogeneous Parallel Computing

Hassan Chafi, Zach DeVito, Adriaan Moors, Tiark Rompf, Arvind Sujeeth, Pat Hanrahan, Martin Odersky, and Kunle Olukotun describe an approach to parallel DSLs that is a hybrid between external DSLs and internal DSLs in Language Virtualization for Heterogeneous Parallel Computing.

As heterogeneous parallel systems become dominant, application developers are being forced to turn to an incompatible mix of low level programming models (e.g. OpenMP, MPI, CUDA, OpenCL). However, these models do little to shield developers from the difficult problems of parallelization, data decomposition and machine-specific details. Ordinary programmers are having a difficult time using these programming models effectively. To provide a programming model that addresses the productivity and performance requirements for the average programmer, we explore a domain-specific approach to heterogeneous parallel programming.

We propose language virtualization as a new principle that enables the construction of highly efficient parallel domain specific languages that are embedded in a common host language. We define criteria for language virtualization and present techniques to achieve them.We present two concrete case studies of domain-specific languages that are implemented using our virtualization approach.

While the motivation of the paper is parallelization the proposed design looks like LINQ expression trees dialed to 11.

Azul's Pauseless Garbage Collector

Here's Gil Tene on Azul's Pauseless Garbage Collector for the JVM.

One of the key techniques that we use is massive and rapid manipulation of virtual memory mappings. We will change mappings of virutal to physical memory at the rate of Java allocation.

And

The same read barrier I mentioned before will also intercept any attempt to read a reference to an object that has been relocated, and that allows us to lazily relocate references without needing a pause. We compact by moving an entire virtual page worth of objects, we kind of blow it up, moving all the live objects to other places and thereby compacting them. But we don't try and locate and find all the pointers to that page immediately.

The challenge seems to be that standard OSes don't currently have enough hooks for them to do this kind of thing so their runtime must live in either their custom hardware and OS or a virtual machine.

Continuation-Passing C: Compiling threads to events through continuations

Gabriel Kerneis and Juliusz Chroboczek, "Continuation-Passing C: Compiling threads to events through continuations", arXiv: 1011.4558.

In this paper, we introduce Continuation Passing C (CPC), a programming language for concurrent systems in which native and cooperative threads are unified and presented to the programmer as a single abstraction. The CPC compiler uses a compilation technique, based on the CPS transform, that yields efficient code and an extremely lightweight representation for contexts. We provide a complete proof of the correctness of our compilation scheme. We show in particular that lambda-lifting, a common compilation technique for functional languages, is also correct in an imperative language like C, under some conditions enforced by the CPC compiler. The current CPC compiler is mature enough to write substantial programs such as Hekate, a highly concurrent BitTorrent seeder. Our benchmark results show that CPC is as efficient, while significantly cheaper, as the most efficient thread libraries available.

See also the CPC website.

Automatic Staged Compilation

Automatic Staged Compilation, doctoral dissertation of Matthai Philipose:

[...] The past few years have seen the emergence of staged optimization, which produces run-time optimizations that often have much lower run-time overhead than traditional optimizers, yet do not sacrifice any of their functionality. The key to the technique is a method, called staging, to transfer optimization overhead to static compile time from run time. Unfortunately, developing staged variants of individual optimizations has been highly specialized, labor-intensive work; staging pipelines of optimizations even more so.

This dissertation presents a system called the Staged Compilation Framework (SCF), which automatically stages entire pipelines of compiler optimizations at arguably little additional engineering cost beyond building the slower traditional version of the pipeline. SCF harnesses two powerful but traditionally difficult-to-use techniques, partial evaluation and dead-store elimination, to achieve staging. An implementation of SCF shows that staged compilation can speed up pipelines of classical compiler optimizations by up to an order of magnitude, and more commonly by a factor of 4.5 to 5.

I haven't read through it all yet, but after a cursory skim it certainly looks interesting.

Pure and Declarative Syntax Definition: Paradise Lost and Regained, Onward 2010

Pure and Declarative Syntax Definition: Paradise Lost and Regained by Lennart C. L. Kats, Eelco Visser, Guido Wachsmuth from Delft

Syntax definitions are pervasive in modern software systems, and serve as the basis for language processing tools like parsers and compilers. Mainstream parser generators pose restrictions on syntax definitions that follow from their implementation algorithm. They hamper evolution, maintainability, and compositionality of syntax definitions. The pureness and declarativity of syntax definitions is lost. We analyze how these problems arise for different aspects of syntax definitions, discuss their consequences for language engineers, and show how the pure and declarative nature of syntax definitions can be regained.

I haven't compared this version with the Onward 2010 version, but they look essentially the same. It seems timely to post this paper, considering the other recent story Yacc is dead. There is not a whole lot to argue against in this paper, since we all "know" the other approaches aren't as elegant and only resort to them for specific reasons such as efficiency. Yet, this is the first paper I know of that tries to state the argument to software engineers.

For example, the Dragon Book, in every single edition, effectively brushes these topics aside. In particular, the Dragon Book does not even mention scannerless parsing as a technique, and instead only explains the "advantages" of using a scanner. Unfortunately, the authors of this paper don't consider other design proposals, either, such as Van Wyk's context-aware scanners from GPCE 2007. It is examples like these that made me wish the paper was a bit more robust in its analysis; the examples seem focused on the author's previous work.

If you are not familiar with the author's previous work in this area, the paper covers it in the references. It includes Martin Bravenboer's work on modular Eclipse IDE support for AspectJ.

XML feed