John C. Reynolds Doctoral Dissertation Award (SIGPLAN)

Presented annually to the author of the outstanding doctoral dissertation in the area of Programming Languages. The award includes a prize of $1,000. The winner can choose to receive the award at ICFP, OOPSLA, POPL, or PLDI.

Nominations must be submitted to the secretary of SIGPLAN by January 5th 2014 to be considered for this year's award. The nominated dissertation must be available in an English language version to facilitate evaluation by the selection committee and have been awarded in the year 2013.

I know of some outstanding dissertations. I am sure you do to. So why not nominate them for this honor (for further details see here)?

Taking Off the Gloves with Reference Counting Immix

Taking Off the Gloves with Reference Counting Immix, by Rifat Shahriyar, Stephen M. Blackburn, and Kathryn S. McKinley:

Despite some clear advantages and recent advances, reference counting remains a poor cousin to high-performance tracing garbage collectors. The advantages of reference counting include a) immediacy of reclamation, b) incrementality, and c) local scope of its operations. After decades of languishing with hopelessly bad performance, recent work narrowed the gap between reference counting and the fastest tracing collectors to within 10%. Though a major advance, this gap remains a substantial barrier to adoption in performance-conscious application domains. Our work identifies heap organization as the principal source of the remaining performance gap. We present the design, implementation, and analysis of a new collector, RCImmix, that replaces reference counting’s traditional free-list heap organization with the line and block heap structure introduced by the Immix collector. The key innovations of RCImmix are 1) to combine traditional reference counts with per-line live object counts to identify reusable memory and 2) to eliminate fragmentation by integrating copying with reference counting of new objects and with backup tracing cycle collection. In RCImmix, reference counting offers efficient collection and the line and block heap organization delivers excellent mutator locality and efficient allocation. With these advances, RCImmix closes the 10% performance gap, outperforming a highly tuned production generational collector. By removing the performance barrier, this work transforms reference counting into a serious alternative for meeting high performance objectives for garbage collected languages.

A new reference counting GC based on the Immix heap layout, which purports to close the remaining performance gap with tracing collectors. It builds on last year's work, Down for the count? Getting reference counting back in the ring, which describes various optimizations to raw reference counting that make it competitive with basic tracing. There's a remaining 10% performance gap with generational tracing that RCImmix closes by using the Immix heap layout with bump pointer allocation (as opposed to free lists typically used in RC). The improved cache locality of allocation makes RCImmix even faster than the generational tracing Immix collector.

However, the bump pointer allocation reduces the incrementality of reference counting and would impact latency. One glaring omission of this paper is the absence of latency/pause time measurements, which is typical of reference counting papers since ref counting is inherently incremental. Since RCImmix trades off some incrementality for throughput by using bump pointer allocation and copy collection, I'm curious how this impacts the pause times.

Reference counting has been discussed a few times here before, and some papers on past ref-counting GC's have been posted in comments, but this seems to be the first top-level post on competitive reference counting GC.

LVars: monotonic update for deterministic parallel programming

LVars are one outcome of Lindsey Kuper's ongoing PhD work at Indiana University. They generalize existing models for deterministic parallelism by considering a general framework of monotonic read and write operations. They were briefly mentioned on LtU before (along with the strongly related work on Bloom in the distributed systems community), and were recently presented in two distinct and complementary articles.

The first article describes the basic building blocks and ideas of LVars:

LVars: Lattice-Based Data Structures for Deterministic Parallelism
Lindsey Kuper, Ryan R. Newton
2013

Programs written using a deterministic-by-construction model of parallel computation are guaranteed to always produce the same observable results, offering programmers freedom from subtle, hard-to-reproduce nondeterministic bugs that are the scourge of parallel software. We present LVars, a new model for deterministic- by-construction parallel programming that generalizes existing single-assignment models to allow multiple assignments that are monotonically increasing with respect to a user-specified lattice.

LVars ensure determinism by allowing only monotonic writes and "threshold" reads that block until a lower bound is reached. We give a proof of determinism and a prototype implementation for a language with LVars and describe how to extend the LVars model to support a limited form of nondeterminism that admits failures but never wrong answers

The second relaxes the original model by introducing failure, which widens its applicability:

Freeze After Writing: Quasi-Deterministic Parallel Programming with LVars
Lindsey Kuper, Aaron Turon, Neelakantan Krishnaswami, Ryan R. Newton
2014

Deterministic-by-construction parallel programming models offer programmers the promise of freedom from subtle, hard-to-reproduce nondeterministic bugs in parallel code. A principled approach to deterministic-by-construction parallel programming with shared state is offered by LVars: shared memory locations whose semantics are defined in terms of a user-specified lattice. Writes to an LVar take the least upper bound of the old and new values with respect to the lattice, while reads from an LVar can observe only that its contents have crossed a specified threshold in the lattice. Although it guarantees determinism, this interface is quite limited.

We extend LVars in two ways. First, we add the ability to “freeze” and then read the contents of an LVar directly. Second, we add the ability to attach callback functions to an LVar, allowing events to be triggered by writes to it. Together, callbacks and freezing enable an expressive and useful style of parallel programming. We prove that in a language where communication takes place through freezable LVars, programs are at worst quasi-deterministic: on every run, they either produce the same answer or raise an error. We demonstrate the viability of our approach by implementing a library for Haskell supporting a variety of LVar-based data structures, together with two case studies that illustrate the programming model and yield promising parallel speedup.

Something I personally found surprising and impressive about LVars is that, while I was initially interested in the very formal aspects of providing a theoretical framework for deterministic concurrency, it very quickly produced a practical library that people can use to write parallel program -- and competitive with existing high-performance approaches. As described in a recent blog post, a Haskell library is available on Hackage -- but surely LVars-inspired libraries could make sense in a lot of other languages as well.

Copatterns: the final approach to codata?

Andreas Abel and Brigitte Pientka's Well-Founded Recursion with Copatterns; a Unified Approach to Termination and Productivity is one of my highlights of the just-finished ICFP 2013, but it makes sense to focus on the first paper on this work, published at POPL back in January.

Copatterns: Programming Infinite Structures by Observations
Andreas Abel, Brigitte Pientka, David Thibodeau, Anton Setzer
2013

Inductive datatypes provide mechanisms to define finite data such as finite lists and trees via constructors and allow programmers to analyze and manipulate finite data via pattern matching. In this paper, we develop a dual approach for working with infinite data structures such as streams. Infinite data inhabits coinductive datatypes which denote greatest fixpoints. Unlike finite data which is defined by constructors we define infinite data by observations. Dual to pattern matching, a tool for analyzing finite data, we develop the concept of copattern matching, which allows us to synthesize infinite data. This leads to a symmetric language design where pattern matching on finite and infinite data can be mixed. We present a core language for programming with infinite structures by observations together with its operational semantics based on (co)pattern matching and describe coverage of copatterns. Our language naturally supports both call-by-name and call-by-value interpretations and can be seamlessly integrated into existing languages like Haskell and ML. We prove type soundness for our language and sketch how copatterns open new directions for solving problems in the interaction of coinductive and dependent types.

Codata has been often discussed here and elsewhere. See notably the discussion on Turner's Total Functional Programming (historical note: this 2004 beautification of the original 1995 paper which had much of the same ideas), and on the category-theory-inspired Charity language. Given those precedents, it would be easy for the quick reader to "meh" on the novelty of putting "observation" first (elimination rather than introduction rules) when talking about codata; yet the above paper is the first concrete, usable presentation of an observation in a practical setting that feels right, and it solves long-standing problem that current dependently-typed languages (Coq and Agda) have.

Coinduction has an even more prominent role, due to its massive use to define program equivalence in concurrent process calculi; the relevant LtU discussion being about Davide Sangiorgi's On the origins of Bisimulation, Coinduction, and Fixed Points. The POPL'13 paper doesn't really tell us how coinduction should be seen with copatterns. It does not adress the question of termination, which is the topic of the more recent ICFP'13 paper, but I would say that the answer on that point feels less definitive.

xkcd: Functional

XKCD comic on functional programming. Overlay money quote:

Functional programming combines the flexibility and power of abstract mathematics with the intuitive clarity of abstract mathematics.

The Size-Change Termination Principle for Constructor Based Languages

The Size-Change Termination Principle for Constructor Based Languages, by Pierre Hyvernat:

This paper describes an automatic termination checker for a generic first-order call-by-value language in ML style. We use the fact that value are built from variants and tuples to keep some information about how arguments of recursive call evolve during evaluation. The result is a criterion for termination extending the size-change termination principle of Lee, Jones and Ben-Amram that can detect size changes inside subvalues of arguments. Moreover the corresponding algorithm is easy to implement, making it a good candidate for experimentation.

Looks like a relatively straightforward and complete description of a termination checker based on a notion of 'sized types' limited to first-order programs. LtU has covered this topic before, although this new paper doesn't seem to reference that particular Abel work.

Types for Flexible Objects

Types for Flexible Objects, by Pottayil Harisanker Menon, Zachary Palmer, Alexander Rozenshteyn, Scott Smith:

Scripting languages are popular in part due to their extremely flexible objects. These languages support numerous object features, including dynamic extension, mixins, traits, and first-class messages. While some work has succeeded in typing these features individually, the solutions have limitations in some cases and no project has combined the results.

In this paper we define TinyBang, a small typed language containing only functions, labeled data, a data combinator, and pattern matching. We show how it can directly express all of the aforementioned flexible object features and still have sound typing. We use a subtype constraint type inference system with several novel extensions to ensure full type inference; our algorithm refines parametric polymorphism for both flexibility and efficiency. We also use TinyBang to solve an open problem in OO literature: objects can be extended after being messaged without loss of width or depth subtyping and without dedicated metatheory. A core subset of TinyBang is proven sound and a preliminary implementation has been constructed.

An interesting paper I stumbled across quite by accident, it purports quite an ambitious set of features: generalizing previous work on first-class cases while supporting subtyping, mutation, and polymorphism all with full type inference, in an effort to match the flexibility of dynamically typed languages.

It does so by introducing a host of new concepts that are almost-but-not-quite generalizations of existing concepts, like "onions" which are kind of a type-indexed extensible record, and "scapes" which are sort of a generalization of pattern matching cases.

Instead of approaching objects via a record calculus, they approach it using its dual as variant matching. Matching functions then have degenerate dependent types, which I first saw in the paper Type Inference for First-Class Messages with Match-Functions. Interesting aside, Scott Smith was a coauthor on this last paper too, but it isn't referenced in the "flexible objects" paper, despite the fact that "scapes" are "match-functions".

Overall, quite a dense and ambitous paper, but the resulting TinyBang language looks very promising and quite expressive. Future work includes making the system more modular, as it currently requires whole program compilation, and adding first-class labels, which in past work has led to interesting results as well. Most work exploiting row polymorphism is particularly interesting because it supports efficient compilation to index-passing code for both records and variants. It's not clear if onions and scapes are also amenable to this sort of translation.

Edit: a previous paper was published in 2012, A Practical, Typed Variant Object Model -- Or, How to Stand On Your Head and Enjoy the View. BigBang is their language that provides syntactic sugar on top of TinyBang.

Edit 2: commas fixed, thanks!

Dynamic Region Inference

Dynamic Region Inference, by David Pereira and John Aycock:

We present a garbage collection scheme based on reference counting and region inference which, unlike the standard reference counting algorithm, handles cycles correctly. In our algorithm, the fundamental operations of region inference are performed dynamically. No assistance is required from the programmer or the compiler, making our algorithm particularly well-suited for use in dynamically-typed languages such as scripting languages. We provide a detailed algorithm and demonstrate how it can be implemented efficiently.

A novel garbage collector that solves reference counting's cycle problems by introducing "regions", which demarcate possibly cyclic subgraphs. These regions are updated by merge and split operations that occur on pointer update and incrementally on region allocation, respectively, ie. adding a pointer to B into aggregate C merges their regions, and trying to allocate a new region first attempts to split some random candidate region by computing the local reference counts via union-find of the region's members.

Obviously dynamic regions don't share contiguous storage like their static counterparts, so "regions" here are purely a logical concept to ensure completeness of reference counting. The implementation adds two words to each object, one for pointing to the object's current region, the other for a "next" pointer for the next object in the region.

The practicality of this approach isn't clear compared to other cycle detection algorithms, and no benchmarks are provided. I haven't found any follow-up work either.

Dependently-Typed Metaprogramming (in Agda)

Conor McBride gave an 8-lecture summer course on Dependently typed metaprogramming (in Agda) at the Cambridge University Computer Laboratory:

Dependently typed functional programming languages such as Agda are capable of expressing very precise types for data. When those data themselves encode types, we gain a powerful mechanism for abstracting generic operations over carefully circumscribed universes. This course will begin with a rapid depedently-typed programming primer in Agda, then explore techniques for and consequences of universe constructions. Of central importance are the “pattern functors” which determine the node structure of inductive and coinductive datatypes. We shall consider syntactic presentations of these functors (allowing operations as useful as symbolic differentiation), and relate them to the more uniform abstract notion of “container”. We shall expose the double-life containers lead as “interaction structures” describing systems of effects. Later, we step up to functors over universes, acquiring the power of inductive-recursive definitions, and we use that power to build universes of dependent types.

The lecture notes, code, and video captures are available online.

As with his previous course, the notes contain many(!) mind expanding exploratory exercises, some of which quite challenging.

Parsing people, unite! Call for position papers for Parsing@SLE (SPLASH, Indianapolis)

Parsing@SLE is a new workshop on parsing programming languages and other software languages. The intended participants are the authors of parser generation tools and parsers for programming languages and other software languages. For the purpose of this workshop "parsing" is a computation that takes a sequence of characters as input and produces a tree or graph shaped model as output. This possibly includes tokenization using regular expressions, deriving trees using context-free grammars, mapping to abstract syntax trees and perhaps even some semantic analysis.

The goal of the workshop is to bring together today's experts in the field of parsing, in order to explore open questions and possibly forge new collaborations. The topics may include algorithms, implementation and generation techniques, syntax and semantics of meta formalisms (BNF), etc. We expect to attract participants that have been or are developing theory, techniques and tools in the broad area of parsing non-natural languages such as programming languages and other software languages (domain specific languages, configuration languages, build languages, data description languages, query languages, etc.)

We solicit short abstracts, asking for positions, demonstrations and early achievements. The submissions will be reviewed on relevance and clarity, and used to plan the mostly interactive sessions of the day.

* workshop website
* Deadline August 15 2013
* Notification September 1 2013
* Submit a position paper