Reagents: Expressing and Composing Fine-grained Concurrency

Reagents: Expressing and Composing Fine-grained Concurrency, by Aaron Turon:

Efficient communication and synchronization is crucial for fine-grained parallelism. Libraries providing such features, while indispensable, are difficult to write, and often cannot be tailored or composed to meet the needs of specific users. We introduce reagents, a set of combinators for concisely expressing concurrency algorithms. Reagents scale as well as their hand-coded counterparts, while providing the composability existing libraries lack.

This is a pretty neat approach to writing concurrent code, which lies somewhere between manually implementing low-level concurrent algorithms and STM. Concurrent algorithms are expressed and composed in a fairly direct, almost naive style, and the reagent machinery handles the failure cases for you: on transient failure (a CAS update lost a race with another thread), it retries automatically; on permanent failure (no input is available), it blocks until another thread supplies some.
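For a flavor of the style, here is a minimal sketch of the paper's Treiber stack example in Scala. The `Ref`/`upd`/`Reagent` names follow the paper's presentation; the released library's API may differ in detail:

```scala
import chemistry._ // package name is a guess at the accompanying library

class TreiberStack[A] {
  private val head = new Ref[List[A]](Nil)

  // A racing CAS on `head` is a transient failure: the reagent
  // machinery re-reads and retries automatically.
  val push: Reagent[A, Unit] =
    upd(head) { case (xs, x) => (x :: xs, ()) }

  // The update function is partial (undefined on Nil), signaling
  // permanent failure: the reagent blocks until someone pushes.
  val pop: Reagent[Unit, A] =
    upd(head) { case (x :: xs, ()) => (xs, x) }
}
```

Reagents are invoked with `!`, so `stack.push ! 42` performs the update and `stack.pop ! ()` blocks on an empty stack.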

The core seems to be k-CAS with synchronous communication between threads to coordinate reactions on shared state. The properties seem rather nice, as Aaron describes:

When used in isolation, reagents are guaranteed to perform only the CASes that the hand-written algorithm would, so they introduce no overhead on shared-memory operations; by recoding an algorithm to use reagents, you lose nothing. Yet unlike hand-written algorithms, reagents can be composed using choice, tailored with new blocking behavior, or combined into larger atomic blocks.
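Concretely, composition looks something like this (again a sketch in the paper's notation, where `+` is choice and `>>` sequences two reagents into a single atomic step):

```scala
val s1, s2 = new TreiberStack[Int]

// Choice: pop from whichever stack can supply an element,
// blocking only if both are empty.
val popEither: Reagent[Unit, Int] = s1.pop + s2.pop

// Sequencing: pop from s1 and push onto s2 as one atomic
// transfer -- both underlying CASes commit together via k-CAS.
val transfer: Reagent[Unit, Unit] = s1.pop >> s2.push
```

Running `transfer ! ()` either moves an element atomically or blocks until s1 is non-empty.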

The benchmarks in section 6 look promising. This appears to be work towards Aaron's thesis, which provides many more details.


Looks promising!

I quite enjoyed this paper. The primitive combinators make a whole heck of a lot of sense and the resulting example code is clear and concise.

It seems like this could be a strong foundation for making CSP-style code play nicer with concurrent data structures. I've found that when working with Go or Clojure's core.async, the channels tend to infect the surrounding code, turning everything into a "server" of sorts. Frequently, something from java.util.concurrent would be more appropriate. Unfortunately, it's impossible to multiplex on a core.async channel and a concurrent-hash-map lookup, for example.
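Reagent choice looks like it would cover exactly that case: make both sources reagents of the same type and multiplex them with `+`. A purely hypothetical sketch; the `swap`/`Endpoint` names follow the paper, while `ReagentMap` and its blocking `waitFor` are invented for illustration:

```scala
// Wait for whichever fires first: a message arriving on a channel
// endpoint, or a key becoming bound in a reagent-based map.
def awaitEither[V](ep: Endpoint[Unit, V],
                   m: ReagentMap[String, V], // hypothetical map type
                   key: String): V = {
  val fromChan: Reagent[Unit, V] = swap(ep)       // fires when a partner sends
  val fromMap:  Reagent[Unit, V] = m.waitFor(key) // hypothetical: fires when key is present
  (fromChan + fromMap) ! ()                       // choice blocks until either arm can commit
}
```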

I think this work would benefit from tighter MCAS integration

I think this work would benefit from tighter integration with the MCAS implementation. Non-blocking multi-word compare-and-swap algorithms tend to work as state machines that are executed cooperatively: threads help each other complete operations instead of blocking. If the type system can guarantee the referential integrity (and low, bounded latency) of computations, we can compute the new value in the middle of the state machine (e.g., to implement X *= 2) and avoid retries on stale values. The type system's help and first-class functions are pretty essential here, which is probably why systemsy people with their C background never explored that.
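To make the retry concern concrete, here is the classic optimistic loop for X *= 2 over a single word (plain java.util.concurrent.atomic, nothing reagent-specific): every failed CAS means the computed value was based on a stale read and must be recomputed, which is exactly what a cooperative state machine carrying the function itself could avoid:

```scala
import java.util.concurrent.atomic.AtomicLong

// Optimistic retry loop: under contention the CAS fails, the previously
// computed value is stale, and we must re-read and recompute.
def atomicDouble(x: AtomicLong): Long = {
  var old = x.get()
  while (!x.compareAndSet(old, old * 2)) {
    old = x.get() // lost a race: start over from the current value
  }
  old * 2 // the value we successfully installed
}
```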

From a low-level point of view, the paper glosses over a major weakness of many MCAS algorithms: we have to acquire mutable locations with an atomic compare-and-swap even when we only need consistent reads (e.g., to atomically update X = Y + Z, where Y and Z are only read). That's not very nice for read-mostly workloads, where a classical STM should work much better.