Junfeng Yang, Heming Cui, Jingyue Wu, Yang Tang, and Gang Hu, "Determinism Is Not Enough: Making Parallel Programs Reliable with Stable Multithreading", Communications of the ACM, Vol. 57 No. 3, Pages 58-69.
We believe what makes multithreading hard is rather quantitative: multithreaded programs have too many schedules. The number of schedules for each input is already enormous because the parallel threads may interleave in many ways, depending on such factors as hardware timing and operating system scheduling. Aggregated over all inputs, the number is even greater. Finding a few schedules that trigger concurrency errors out of all enormously many schedules (so developers can prevent them) is like finding needles in a haystack. Although Deterministic Multi-Threading reduces schedules for each input, it may map each input to a different schedule, so the total set of schedules for all inputs remains enormous.
We attacked this root cause by asking: are all the enormously many schedules necessary? Our study reveals that many real-world programs can use a small set of schedules to efficiently process a wide range of inputs. Leveraging this insight, we envision a new approach we call stable multithreading (StableMT) that reuses each schedule on a wide range of inputs, mapping all inputs to a dramatically reduced set of schedules. By vastly shrinking the haystack, it makes the needles much easier to find. By mapping many inputs to the same schedule, it stabilizes program behaviors against small input perturbations.
The link above is to a publicly available pre-print of the article that appeared in the most recent CACM. The CACM article is a summary of work by Junfeng Yang's research group. Additional papers related to this research can be found at http://www.cs.columbia.edu/~junfeng/
Multiple Dispatch as Dispatch on Tuples, by Gary T. Leavens and Todd D. Millstein:
Many popular object-oriented programming languages, such as C++, Smalltalk-80, Java, and Eiffel, do not support multiple dispatch. Yet without multiple dispatch, programmers find it difficult to express binary methods and design patterns such as the "visitor" pattern. We describe a new, simple, and orthogonal way to add multimethods to single-dispatch object-oriented languages, without affecting existing code. The new mechanism also clarifies many differences between single and multiple dispatch.
Multimethods and multiple dispatch has been discussed numerous times here on LtU. While the theory has been fully fleshed out to the point of supporting full-fledged type systems for multiple dispatch, there has always remained a conceptual disconnect between multimethods and the OO model, namely that methods are supposed to be messages sends to objects with privileged access to that object's internal state. Multimethods would seem to violate encapsulation inherent to objects, and don't fit with the conceptual messaging model.
This paper goes some way to solving that disconnect, as multiple dispatch is simply single dispatch on a distinct, primitive class type which is predicated on N other class types and thus supporting N-ary dispatch. This multiple dispatch support can also be retrofitted to an existing single-dispatch languages without violating its existing dispatch model.
Taking Off the Gloves with Reference Counting Immix, by Rifat Shahriyar, Stephen M. Blackburn, and Kathryn S. McKinley:
Despite some clear advantages and recent advances, reference counting remains a poor cousin to high-performance tracing garbage collectors. The advantages of reference counting include a) immediacy of reclamation, b) incrementality, and c) local scope of its operations. After decades of languishing with hopelessly bad performance, recent work narrowed the gap between reference counting and the fastest tracing collectors to within 10%. Though a major advance, this gap remains a substantial barrier to adoption in performance-conscious application domains. Our work identiﬁes heap organization as the principal source of the remaining performance gap. We present the design, implementation, and analysis of a new collector, RCImmix, that replaces reference counting’s traditional free-list heap organization with the line and block heap structure introduced by the Immix collector. The key innovations of RCImmix are 1) to combine traditional reference counts with per-line live object counts to identify reusable memory and 2) to eliminate fragmentation by integrating copying with reference counting of new objects and with backup tracing cycle collection. In RCImmix, reference counting offers efﬁcient collection and the line and block heap organization delivers excellent mutator locality and efﬁcient allocation. With these advances, RCImmix closes the 10% performance gap, outperforming a highly tuned production generational collector. By removing the performance barrier, this work transforms reference counting into a serious alternative for meeting high performance objectives for garbage collected languages.
A new reference counting GC based on the Immix heap layout, which purports to close the remaining performance gap with tracing collectors. It builds on last year's work, Down for the count? Getting reference counting back in the ring, which describes various optimizations to raw reference counting that make it competitive with basic tracing. There's a remaining 10% performance gap with generational tracing that RCImmix closes by using the Immix heap layout with bump pointer allocation (as opposed to free lists typically used in RC). The improved cache locality of allocation makes RCImmix even faster than the generational tracing Immix collector.
However, the bump pointer allocation reduces the incrementality of reference counting and would impact latency. One glaring omission of this paper is the absence of latency/pause time measurements, which is typical of reference counting papers since ref counting is inherently incremental. Since RCImmix trades off some incrementality for throughput by using bump pointer allocation and copy collection, I'm curious how this impacts the pause times.
Reference counting has been discussed a few times here before, and some papers on past ref-counting GC's have been posted in comments, but this seems to be the first top-level post on competitive reference counting GC.
Dynamic Region Inference, by David Pereira John Aycock:
We present a garbage collection scheme based on reference counting and region inference which, unlike the standard reference counting algorithm, handles cycles correctly. In our algorithm, the fundamental operations of region inference are performed dynamically. No assistance is required from the programmer or the compiler, making our algorithm particularly well-suited for use in dynamically-typed languages such as scripting languages. We provide a detailed algorithm and demonstrate how it can be implemented efficiently.
A novel garbage collector that solves reference counting's cycle problems by introducing "regions", which demarcate possibly cyclic subgraphs. These regions are updated by merge and split operations that occur on pointer update and incrementally on region allocation, respectively, ie. adding a pointer to B into aggregate C merges their regions, and trying to allocate a new region first attempts to split some random candidate region by computing the local reference counts via union-find of the region's members.
Obviously dynamic regions don't share contiguous storage like their static counterparts, so "regions" here are merely a concept to get reference counting to work. The implementation adds two words to each object, one for pointing to the object's current region, the other for a "next" pointer for the next object in the region.
The practicality of this approach isn't clear compared to other cycle detection algorithms, and no benchmarks are provided. I haven't found any follow-up work either.
Heap space analysis for garbage collected languages, by Elvira Albert, Samir Genaim, Miguel Gómez-Zamalloa:
Accurately predicting the dynamic memory consumption (or heap space) of programs can be critical during software development. It is well-known that garbage collection (GC) complicates such problem. The peak heap consumption of a program is the maximum size of the data on the heap during its execution, i.e., the minimum amount of heap space needed to safely run the program. Existing heap space analyses either do not take deallocation into account or adopt specific models of garbage collectors which do not necessarily correspond to the actual memory usage. This paper presents a novel static analysis for garbage collected imperative languages that infers accurate upper bounds on the peak heap usage, including exponential, logarithmic and polynomial bounds. A unique characteristic of the analysis is that it is parametric on the notion of object lifetime, i.e., on when objects become collectible.
Similar work has been covered here in the past.
Concurrent Revisions is a Microsoft Research project doing interesting work in making concurrent programming scalable and easier to reason about. These papers work have been mentioned a number of times here on LtU, but none of them seem to have been officially posted as stories.
Concurrent Revisions are a distributed version control-like abstraction  for concurrently mutable state that requires clients to specify merge functions that make fork-join deterministic, and so make concurrent programs inherently composable. The library provide default merge behaviour for various familiar objects like numbers and lists, and it seems somewhat straightforward to provide a merge function for many other object types.
They've also extended the work to seamlessly integrate incremental and parallel computation  in a fairly intuitive fashion, in my opinion.
Their latest work  extends these concurrent revisions to distributed scenarios with disconnected operations, which operate much like distributed version control works with source code, with guarantees of eventual consistency.
All in all, a very promising approach, and deserving of wider coverage.
 Sebastian Burckhardt and Daan Leijen, Semantics of Concurrent Revisions, in European Symposium on Programming (ESOP'11), Springer Verlag, Saarbrucken, Germany, March 2011
 Sebastian Burckhardt, Daan Leijen, Caitlin Sadowski, Jaeheon Yi, and Thomas Ball, Two for the Price of One: A Model for Parallel and Incremental Computation, in Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA'11), ACM SIGPLAN, Portland, Oregon, 22 October 2011
 Sebastian Burckhardt, Manuel Fahndrich, Daan Leijen, and Benjamin P. Wood, Cloud Types for Eventual Consistency, in Proceedings of the 26th European Conference on Object-Oriented Programming (ECOOP), Springer, 15 June 2012
Feature-Oriented Programming with Object Algebras, by Bruno C.d.S. Oliveira, Tijs van der Storm, Alex Loh, William R. Cook:
Object algebras are a new programming technique that enables a simple solution to basic extensibility and modularity issues in programming languages. While object algebras excel at deﬁning modular features, the composition mechanisms for object algebras (and features) are still cumbersome and limited in expressiveness. In this paper we leverage two well-studied type system features, intersection types and type-constructor polymorphism, to provide object algebras with expressive and practical composition mechanisms. Intersection types are used for deﬁning expressive run-time composition operators (combinators) that produce objects with multiple (feature) interfaces. Type-constructor polymorphism enables generic interfaces for the various object algebra combinators. Such generic interfaces can be used as a type-safe front end for a generic implementation of the combinators based on reﬂection. Additionally, we also provide a modular mechanism to allow diﬀerent forms of self-references in the presence of delegation-based combinators. The result is an expressive, type-safe, dynamic, delegation-based composition technique for object algebras, implemented in Scala, which eﬀectively enables a form of Feature-Oriented Programming using object algebras.
A follow-up to Object Algebras, this new paper addresses a few of the limitations described in that LtU thread by adding type constructor polymorphism to increase their safety. The paper describes an implementation in Scala, which is the only widely available statically typed OOP language with a sufficiently powerful type system needed to support FOP.
This new work also describes some composition mechanisms for object algebras in the context of more expressive languages.
How OCaml type checker works -- or what polymorphism and garbage collection have in common
There is more to Hindley-Milner type inference than the Algorithm W. In 1988, Didier Rémy was looking to speed up the type inference in Caml and discovered an elegant method of type generalization. Not only it is fast, avoiding the scan of the type environment. It smoothly extends to catching of locally-declared types about to escape, to type-checking of universals and existentials, and to implementing MLF.
Alas, both the algorithm and its implementation in the OCaml type checker are little known and little documented. This page is to explain and popularize Rémy's algorithm, and to decipher a part of the OCaml type checker. The page also aims to preserve the history of Rémy's algorithm.
The attraction of the algorithm is its insight into type generalization as dependency tracking -- the same sort of tracking used in automated memory management such as regions and generational garbage collection. Generalization can be viewed as finding dominators in the type-annotated abstract syntax tree with edges for shared types. Fluet and Morrisett's type system for regions and MetaOCaml environment classifiers use the generalization of a type variable as a criterion of region containment. Uncannily, Rémy's algorithm views the region containment as a test if a type variable is generalizable.
As usual with Oleg, there's a lot going on here. Personally, I see parallels with "lambda with letrec" and "call-by-push-value," although making the connection with the latter takes some squinting through some of Levy's work other than his CBPV thesis. Study this to understand OCaml type inference and/or MLF, or for insights into region typing, or, as the title suggests, for suggestive analogies between polymorphism and garbage collection.
Doug Gregor of Apple presented a talk on "A module system for the C family" at the 2012 LLVM Developers' Meeting.
The C preprocessor has long been a source of problems for programmers and tools alike. Programmers must contend with widespread macro pollution and include-ordering problems due to ill-behaved headers. Developers habitually employ various preprocessor workarounds, such as LONG_MACRO_PREFIXES, include guards, and the occasional #undef of a library macro to mitigate these problems. Tools, on the other hand, must cope with the inherent scalability problems associated with parsing the same headers repeatedly, because each different preprocessing context could effect how a header is interpreted---even though the programmer rarely wants it. Modules seeks to solve this problem by isolating the interface of a particular library and compiling it (once) into an efficient, serialized representation that can be efficiently imported whenever that library is used, improving both the programmer's experience and the scalability of the compilation process.
Slide[PDF] and Video[MP4]
Slides and videos from other presentations from the meeting are also available.