Taking Off the Gloves with Reference Counting Immix, by Rifat Shahriyar, Stephen M. Blackburn, and Kathryn S. McKinley:
Despite some clear advantages and recent advances, reference counting remains a poor cousin to high-performance tracing garbage collectors. The advantages of reference counting include a) immediacy of reclamation, b) incrementality, and c) local scope of its operations. After decades of languishing with hopelessly bad performance, recent work narrowed the gap between reference counting and the fastest tracing collectors to within 10%. Though a major advance, this gap remains a substantial barrier to adoption in performance-conscious application domains. Our work identiï¬es heap organization as the principal source of the remaining performance gap. We present the design, implementation, and analysis of a new collector, RCImmix, that replaces reference countingâ€™s traditional free-list heap organization with the line and block heap structure introduced by the Immix collector. The key innovations of RCImmix are 1) to combine traditional reference counts with per-line live object counts to identify reusable memory and 2) to eliminate fragmentation by integrating copying with reference counting of new objects and with backup tracing cycle collection. In RCImmix, reference counting offers efï¬cient collection and the line and block heap organization delivers excellent mutator locality and efï¬cient allocation. With these advances, RCImmix closes the 10% performance gap, outperforming a highly tuned production generational collector. By removing the performance barrier, this work transforms reference counting into a serious alternative for meeting high performance objectives for garbage collected languages.
A new reference counting GC based on the Immix heap layout, which purports to close the remaining performance gap with tracing collectors. It builds on last year's work, Down for the count? Getting reference counting back in the ring, which describes various optimizations to raw reference counting that make it competitive with basic tracing. There's a remaining 10% performance gap with generational tracing that RCImmix closes by using the Immix heap layout with bump pointer allocation (as opposed to free lists typically used in RC). The improved cache locality of allocation makes RCImmix even faster than the generational tracing Immix collector.
However, the bump pointer allocation reduces the incrementality of reference counting and would impact latency. One glaring omission of this paper is the absence of latency/pause time measurements, which is typical of reference counting papers since ref counting is inherently incremental. Since RCImmix trades off some incrementality for throughput by using bump pointer allocation and copy collection, I'm curious how this impacts the pause times.
Reference counting has been discussed a few times here before, and some papers on past ref-counting GC's have been posted in comments, but this seems to be the first top-level post on competitive reference counting GC.
Dynamic Region Inference, by David Pereira and John Aycock:
We present a garbage collection scheme based on reference counting and region inference which, unlike the standard reference counting algorithm, handles cycles correctly. In our algorithm, the fundamental operations of region inference are performed dynamically. No assistance is required from the programmer or the compiler, making our algorithm particularly well-suited for use in dynamically-typed languages such as scripting languages. We provide a detailed algorithm and demonstrate how it can be implemented efficiently.
A novel garbage collector that solves reference counting's cycle problems by introducing "regions", which demarcate possibly cyclic subgraphs. These regions are updated by merge and split operations that occur on pointer update and incrementally on region allocation, respectively, ie. adding a pointer to B into aggregate C merges their regions, and trying to allocate a new region first attempts to split some random candidate region by computing the local reference counts via union-find of the region's members.
Obviously dynamic regions don't share contiguous storage like their static counterparts, so "regions" here are purely a logical concept to ensure completeness of reference counting. The implementation adds two words to each object, one for pointing to the object's current region, the other for a "next" pointer for the next object in the region.
The practicality of this approach isn't clear compared to other cycle detection algorithms, and no benchmarks are provided. I haven't found any follow-up work either.
Heap space analysis for garbage collected languages, by Elvira Albert, Samir Genaim, Miguel GÃ³mez-Zamalloa:
Accurately predicting the dynamic memory consumption (or heap space) of programs can be critical during software development. It is well-known that garbage collection (GC) complicates such problem. The peak heap consumption of a program is the maximum size of the data on the heap during its execution, i.e., the minimum amount of heap space needed to safely run the program. Existing heap space analyses either do not take deallocation into account or adopt specific models of garbage collectors which do not necessarily correspond to the actual memory usage. This paper presents a novel static analysis for garbage collected imperative languages that infers accurate upper bounds on the peak heap usage, including exponential, logarithmic and polynomial bounds. A unique characteristic of the analysis is that it is parametric on the notion of object lifetime, i.e., on when objects become collectible.
Similar work has been covered here in the past.
Concurrent Revisions is a Microsoft Research project doing interesting work in making concurrent programming scalable and easier to reason about. These papers work have been mentioned a number of times here on LtU, but none of them seem to have been officially posted as stories.
Concurrent Revisions are a distributed version control-like abstraction  for concurrently mutable state that requires clients to specify merge functions that make fork-join deterministic, and so make concurrent programs inherently composable. The library provide default merge behaviour for various familiar objects like numbers and lists, and it seems somewhat straightforward to provide a merge function for many other object types.
They've also extended the work to seamlessly integrate incremental and parallel computation  in a fairly intuitive fashion, in my opinion.
Their latest work  extends these concurrent revisions to distributed scenarios with disconnected operations, which operate much like distributed version control works with source code, with guarantees of eventual consistency.
All in all, a very promising approach, and deserving of wider coverage.
 Sebastian Burckhardt and Daan Leijen, Semantics of Concurrent Revisions, in European Symposium on Programming (ESOP'11), Springer Verlag, Saarbrucken, Germany, March 2011
 Sebastian Burckhardt, Daan Leijen, Caitlin Sadowski, Jaeheon Yi, and Thomas Ball, Two for the Price of One: A Model for Parallel and Incremental Computation, in Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA'11), ACM SIGPLAN, Portland, Oregon, 22 October 2011
 Sebastian Burckhardt, Manuel Fahndrich, Daan Leijen, and Benjamin P. Wood, Cloud Types for Eventual Consistency, in Proceedings of the 26th European Conference on Object-Oriented Programming (ECOOP), Springer, 15 June 2012
Feature-Oriented Programming with Object Algebras, by Bruno C.d.S. Oliveira, Tijs van der Storm, Alex Loh, William R. Cook:
Object algebras are a new programming technique that enables a simple solution to basic extensibility and modularity issues in programming languages. While object algebras excel at deï¬ning modular features, the composition mechanisms for object algebras (and features) are still cumbersome and limited in expressiveness. In this paper we leverage two well-studied type system features, intersection types and type-constructor polymorphism, to provide object algebras with expressive and practical composition mechanisms. Intersection types are used for deï¬ning expressive run-time composition operators (combinators) that produce objects with multiple (feature) interfaces. Type-constructor polymorphism enables generic interfaces for the various object algebra combinators. Such generic interfaces can be used as a type-safe front end for a generic implementation of the combinators based on reï¬‚ection. Additionally, we also provide a modular mechanism to allow diï¬€erent forms of self-references in the presence of delegation-based combinators. The result is an expressive, type-safe, dynamic, delegation-based composition technique for object algebras, implemented in Scala, which eï¬€ectively enables a form of Feature-Oriented Programming using object algebras.
A follow-up to Object Algebras, this new paper addresses a few of the limitations described in that LtU thread by adding type constructor polymorphism to increase their safety. The paper describes an implementation in Scala, which is the only widely available statically typed OOP language with a sufficiently powerful type system needed to support FOP.
This new work also describes some composition mechanisms for object algebras in the context of more expressive languages.
How OCaml type checker works -- or what polymorphism and garbage collection have in common
There is more to Hindley-Milner type inference than the Algorithm W. In 1988, Didier RÃ©my was looking to speed up the type inference in Caml and discovered an elegant method of type generalization. Not only it is fast, avoiding the scan of the type environment. It smoothly extends to catching of locally-declared types about to escape, to type-checking of universals and existentials, and to implementing MLF.
Alas, both the algorithm and its implementation in the OCaml type checker are little known and little documented. This page is to explain and popularize RÃ©my's algorithm, and to decipher a part of the OCaml type checker. The page also aims to preserve the history of RÃ©my's algorithm.
The attraction of the algorithm is its insight into type generalization as dependency tracking -- the same sort of tracking used in automated memory management such as regions and generational garbage collection. Generalization can be viewed as finding dominators in the type-annotated abstract syntax tree with edges for shared types. Fluet and Morrisett's type system for regions and MetaOCaml environment classifiers use the generalization of a type variable as a criterion of region containment. Uncannily, RÃ©my's algorithm views the region containment as a test if a type variable is generalizable.
As usual with Oleg, there's a lot going on here. Personally, I see parallels with "lambda with letrec" and "call-by-push-value," although making the connection with the latter takes some squinting through some of Levy's work other than his CBPV thesis. Study this to understand OCaml type inference and/or MLF, or for insights into region typing, or, as the title suggests, for suggestive analogies between polymorphism and garbage collection.
Doug Gregor of Apple presented a talk on "A module system for the C family" at the 2012 LLVM Developers' Meeting.
The C preprocessor has long been a source of problems for programmers and tools alike. Programmers must contend with widespread macro pollution and include-ordering problems due to ill-behaved headers. Developers habitually employ various preprocessor workarounds, such as LONG_MACRO_PREFIXES, include guards, and the occasional #undef of a library macro to mitigate these problems. Tools, on the other hand, must cope with the inherent scalability problems associated with parsing the same headers repeatedly, because each different preprocessing context could effect how a header is interpreted---even though the programmer rarely wants it. Modules seeks to solve this problem by isolating the interface of a particular library and compiling it (once) into an efficient, serialized representation that can be efficiently imported whenever that library is used, improving both the programmer's experience and the scalability of the compilation process.
Slide[PDF] and Video[MP4]
Slides and videos from other presentations from the meeting are also available.
Video: Records, sums, cases, and exceptions: Row-polymorphism at work, Matthias Blume.
I will present the design of a programming language (called MLPolyR) whose type system makes significant use of row polymorphism (RÃ©my, 1991). MLPolyR (Blume et al. 2006) is a dialect of ML and provides extensible records as well as their exact dual, polymorphic sums with extensible first-class cases.
Found this to be an enjoyable and thorough overview of MLPolyR, a language created for a PL course that goes all-out on various dimensions of row polymorphism, resulting in a small yet powerful language. (previously)
How to Make Ad Hoc Proof Automation Less Ad Hoc
Georges Gonthier, Beta Ziliani, Aleksandar Nanevski, and Derek Dreyer, to appear in ICFP 2011
Most interactive theorem provers provide support for some form of user-customizable proof automation. In a number of popular systems, such as Coq and Isabelle, this automation is achieved primarily through tactics, which are programmed in a separate language from that of the proverâ€™s base logic. While tactics are clearly useful in practice, they can be difficult to maintain and compose because, unlike lemmas, their behavior cannot be specified within the expressive type system of the prover itself.
We propose a novel approach to proof automation in Coq that allows the user to specify the behavior of custom automated routines in terms of Coqâ€™s own type system. Our approach involves a sophisticated application of Coqâ€™s canonical structures, which generalize Haskell type classes and facilitate a flexible style of dependently-typed logic programming. Specifically, just as Haskell type classes are used to infer the canonical implementation of an overloaded term at a given type, canonical structures can be used to infer the canonical proof of an overloaded lemma for a given instantiation of its parameters. We present a series of design patterns for canonical structure programming that enable one to carefully and predictably coax Coqâ€™s type inference engine into triggering the execution of user-supplied algorithms during unification, and we illustrate these patterns through several realistic examples drawn from Hoare Type Theory. We assume no prior knowledge of Coq and describe the relevant aspects of Coq type inference from first principles.
If you've ever toyed with Coq but run into the difficulties that many encounter in trying to construct robust, comprehensible proof scripts using tactics, which manipulate the proof state and can leave you with the "ground" of the proof rather than the "figure," if you will, in addition to being fragile in the face of change, you may wish to give this a read. It frankly never would have occurred to me to try to turn Ltac scripts into lemmas at all. This is much more appealing than most other approaches to the subject I've seen.