Joe Duffy: A (brief) retrospective on transactional memory

A (brief) retrospective on transactional memory, by Joe Duffy, January 3rd, 2010. Although this is a blog post, don't expect to read it all on your lunch break...

The STM.NET incubator project was canceled on May 11, 2010, after beginning public life on July 27, 2009, at DevLabs. In this blog post, written four months prior to its cancellation, Joe Duffy discusses the practical engineering challenges around implementing Software Transactional Memory in .NET. Note: he starts off with a disclaimer that he was not engaged in the STM.NET project past its initial working group phase.

In short, Joe argues, "Throughout, it became abundantly clear that TM, much like generics, was a systemic and platform-wide technology shift. It didn’t require type theory, but the road ahead sure wasn’t going to be easy." The blog post deals with the many implementation challenges that platform-wide support for STM would pose in .NET, including which options were considered. He does not mention Maurice Herlihy's SXM library approach, but refers to Tim Harris's work several times.

There was plenty here that surprised me, especially when you compare Concurrent Haskell's STM implementation to STM.NET's design decisions and the interesting debates the team had. In Concurrent Haskell, issues Joe raises, like making Console.WriteLine transactional, are delegated to the type system by the very nature of the STM monad, preventing programmers from writing such wishy-washy code. To be honest, this is why I didn't understand Joe's "it didn't require type theory" gambit, since some of the design concerns are mediated in Concurrent Haskell precisely via type theory. On the other hand, the pragmatics Joe discusses, and the platform-wide integration with the CLR they were shooting for, remind me of The Transactional Memory / Garbage Collection Analogy. Joe also wrote a briefer follow-up post, More thoughts on transactional memory, where he talks more about Barbara Liskov's Argus.
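For the curious, here is a minimal sketch of what that type-system mediation looks like in GHC's Concurrent Haskell (plain Control.Concurrent.STM; the names below are mine, not from Joe's post): atomically accepts only STM actions, so an IO effect like printing simply does not type-check inside a transaction.

import Control.Concurrent.STM

-- A transaction is an STM action; 'atomically' runs it as one atomic unit.
incr :: TVar Int -> STM ()
incr v = do
  n <- readTVar v
  writeTVar v (n + 1)

main :: IO ()
main = do
  v <- newTVarIO (0 :: Int)
  atomically (incr v)
  -- atomically (putStrLn "hi")   -- rejected by the compiler: putStrLn is
                                  -- IO, not STM, so a "transactional
                                  -- Console.WriteLine" cannot even be
                                  -- expressed here.
  readTVarIO v >>= print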


Disillusionment Part III

Disillusionment Part III: the Privatization Problem

I still remember the day like it was yesterday. A regular weekly team meeting, to discuss our project’s status, future, hard problems, and the like. A summer intern on board from a university doing pioneering work in TM, sipping his coffee. Me, sipping my tea. Then that same intern’s casual statement pointing out an Earth-shattering flaw that would threaten the kind of TM we (and most of the industry at the time) were building. We had been staring at the problem for over a year without having seen it. It is these kinds of moments that frighten me and make me a believer in formal computer science.

Here it is in a nutshell:

bool itIsOwned = false;
MyObj x = new MyObj();
...
atomic { // Tx0                          atomic { // Tx1
    // Claim the state for my use:           if (!itIsOwned)
    itIsOwned = true;                            x.field += 42;
}                                        }
int z = x.field; // non-transactional read of the just-claimed state
...

The Tx0 transaction changes itIsOwned to true, and then commits. After it has committed, it proceeds to use whatever state was claimed (in this case, an object referred to by variable x) outside of the purview of TM. Meanwhile, another transaction Tx1 has optimistically read itIsOwned as false, and has gone ahead to use x. An update-in-place system will allow that transaction to freely change the state of x. Of course, it will roll back here, because itIsOwned changed to true. But by then it is too late: the other thread, using x outside of a transaction, will see constantly changing state, torn reads even, and who knows what will happen from there. This is a known flaw in any weakly atomic, update-in-place TM.

I have real trouble seeing what is going on, or what is wrong, here, which must be due to my failing to understand what atomic means in his context. Did they find out that their semantics for atomic were too weak? What, then, are the weak atomic semantics?

(Is it that they assumed transactions could freely change non-shared variables across atomic regions?)

They're using a boolean lock

They're using a boolean to control exclusive access to x, but they are erroneously accessing the state of x outside of atomic scope. This permits dirty reads of x after Tx0 commits, due to optimistic reads of itIsOwned and writes of x in Tx1.

Joe seems to be saying that STM requires a tradeoff: optimistic reads/writes, short transactions, proper isolation; pick any two.
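For contrast, here is a minimal sketch of the same idiom in GHC's Concurrent Haskell (names are mine, not from the post). GHC buffers speculative writes in a per-transaction log and publishes them only at commit, so Tx1's tentative write to the field is never visible to anyone; the post-commit reader sees only committed state, and the torn-read hazard described above cannot arise.

import Control.Concurrent.STM

-- Tx0: claim the state for my use.
claim :: TVar Bool -> IO ()
claim itIsOwned = atomically (writeTVar itIsOwned True)

-- Tx1: update the field only if nobody owns it. A conflicting commit of
-- Tx0 forces this whole transaction to rerun, and its buffered write to
-- 'field' is simply discarded, never published.
maybeBump :: TVar Bool -> TVar Int -> IO ()
maybeBump itIsOwned field = atomically $ do
  owned <- readTVar itIsOwned
  if owned then return () else modifyTVar' field (+ 42)

-- After 'claim', this read observes only committed values of 'field'.
readField :: TVar Int -> IO Int
readField = readTVarIO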

Transactional Console.WriteLine

Transactional IO of the sort you mention goes beyond 'Transactional Memory' into 'Distributed Transactions', where one integrates transactional systems separated by IO barriers. A classic example would be composing STM with an SQL transaction so that we maintain atomicity and consistency properties between an application and a database. Achieving transactions that touch two or more databases is another common example. Transactional consoles and GUI forms would introduce 'long-running transactions', which are rarely well supported but remain interesting possibilities. It isn't unusual to want transactions that interact meaningfully with a user: filling out a multi-page form, making a purchase from Amazon, or obtaining flight, hotel, and vehicle rental together.

The value and utility of distributed transactions are much debated, no less than those of STM. But one must recall that 'Console.WriteLine' is syntactically little different from 'Object.MethodCall'. If we don't compose transactions across method calls, there isn't much reason to bother with STM in an OO system. If one does support transactions across method calls, then it seems only reasonable that a Console object shouldn't be any different.
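To make the composition point concrete, a small sketch in GHC's Concurrent Haskell (illustrative names, not from the post): two independently written transactional operations compose into a single atomic unit exactly the way ordinary method calls compose.

import Control.Concurrent.STM

withdraw, deposit :: TVar Int -> Int -> STM ()
withdraw acct n = modifyTVar' acct (subtract n)
deposit  acct n = modifyTVar' acct (+ n)

-- The composite is all-or-nothing: no interleaving can observe the
-- withdrawal without the matching deposit.
transfer :: TVar Int -> TVar Int -> Int -> IO ()
transfer from to n = atomically (withdraw from n >> deposit to n)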

I spent some time last year developing a variation on STM for the actors model and contemplating how to simplify integration of X/Open XA at my version of a foreign function interface (FFI) layer. I would love to see better language support for integrating both STM and the various industry standards for distributed transactions. (Though we really need three-phase commit to support long-lived transactions, something that X/Open XA does not provide.)

FFI proved to be the primary source of complexity. I could integrate transactions with non-transactional systems in any number of ad-hoc 'good enough' manners at the FFI. For example, if a file system is not transactional, then perhaps we can check file times and lock files to develop something 'good enough' for practical application. Or, for a printer, I could perhaps test-queue a job separately from executing it. A great many FFI systems are 'transaction-safe' with just a little bit of memory, e.g. asking for the current time. I provided logging facilities to the adaptor modules to help with the FFI. But adapting even a few FFI modules for three-phase commit was painful enough to make me reconsider the effort.
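A sketch of the kind of adaptor interface being described, with entirely hypothetical names: each FFI resource exposes prepare/commit/abort hooks, and a coordinator drives them through an ordinary two-phase protocol. (The three-phase variant mentioned above adds a pre-commit round and timeouts, elided here.)

-- Hypothetical FFI adaptor: one record of hooks per external resource.
data Adaptor = Adaptor
  { prepare :: IO Bool  -- vote: can this resource commit its work?
  , commit  :: IO ()    -- make the external effect durable
  , abort   :: IO ()    -- undo or discard the tentative work
  }

-- Two-phase commit over a list of adaptors: commit only on a unanimous yes.
twoPhase :: [Adaptor] -> IO Bool
twoPhase adaptors = do
  votes <- mapM prepare adaptors
  if and votes
    then do
      mapM_ commit adaptors
      return True
    else do
      mapM_ abort adaptors
      return False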

I'm no longer pursuing transactional actors, and instead pursue a new distributed computation model (called RDP) with inherently BASE (Basically Available, Soft State, Eventually Consistent) characteristics. But I've spent many hours considering the feasibility of achieving ACID properties for statically defined subgraphs of agents in an RDP system. I would lose the ad-hoc compositional properties of transactional systems, which I consider to be the defining characteristic of transactional systems. But the weaker goal of 'maintaining consistency within a well-defined subgraph of (potentially persisted) distributed or concurrent components' might be enough to keep an implementation simple and practical, avoiding the monumental challenge of a transactional FFI.

Neither Clojure's nor Haskell's STM makes distributed transactions especially feasible; they achieve the much simpler (and thus more practical) goal of local memory transactions, and integrate IO in an ad-hoc manner. IIRC, Clojure provides 'agents' that can queue up a bunch of messages to be delivered on commit. Haskell has an implementation of a Control.Concurrent.AdvSTM monad that can perform IO actions onCommit or onRetry, which is quite similar in utility to Clojure's agents. Haskell also has a library for transactional persistent memory, which attempts to be similar to Prevayler or Hibernate. It seems that someone very recently started developing transactions for Distributed Haskell.
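The on-commit pattern is easy to sketch over plain GHC STM, without the real AdvSTM API (the names here are illustrative): a transaction returns the IO it wants deferred, and that IO runs exactly once, only after the transaction has actually committed.

import Control.Concurrent.STM

-- Run a transaction that also yields deferred IO. 'atomically' may rerun
-- the STM part many times, but only the deferred actions of the final,
-- committed attempt are ever executed.
runWithOnCommit :: STM (a, [IO ()]) -> IO a
runWithOnCommit tx = do
  (result, deferred) <- atomically tx
  sequence_ deferred
  return result

-- Usage: bump a counter, printing only if the transaction commits.
bumpAndLog :: TVar Int -> IO Int
bumpAndLog counter = runWithOnCommit $ do
  n <- readTVar counter
  writeTVar counter (n + 1)
  return (n + 1, [putStrLn ("counter is now " ++ show (n + 1))])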

Anyhow, I wouldn't say that a transactional Console.WriteLine is necessarily "wishy-washy code" (disregarding whether STM.NET approached it poorly). There is no fundamental reason we couldn't develop consoles and browsers and such that support the X/Open XA standards (or some other DTP standard better suited to long-lived transactions), and even inform users when an input or form is transactional and may therefore be user-aborted or user-committed. It strikes me as more wishy-washy the way we currently have 'OK, Apply, Cancel' at the bottom of nigh every form, yet lack the software integration to assure us that 'Cancel' will do its job.

What a mess

What the hell is this garbage?! The write-up indicates the team's design and development process was filled with shoddy thinking and unsound premature optimization, stemming from the lack of a coherent notion of transactions. "Transactional Console.ReadLine"? "Weakly atomic update-in-place"? While I've come to expect rambling write-ups of crufty transaction folklore from the SQL Server blogs, I'm very surprised to see this from Microsoft Research.

Well

Joe does not represent the STM.NET team; he only helped put the team together. These are his personal views. The STM.NET team didn't blog much, but I agree that Joe gives the impression that Microsoft struggled with how to map Tim Harris's ideas onto a platform with pervasive impure, side-effecting functions.

STM and AdvSTM in Haskell are much cleaner. The conclusion of Joe's blog post is hopefully a sign of lessons learned, although even those conclusions don't match up with what the next generation of massively concurrent programming languages, like Chapel, Fortress, and X10, are proposing. In those languages, the things Joe talks about in his conclusion are implementation details hidden from the programmer.

While I've come to expect rambling write-ups of crufty transaction folklore from the SQL Server blogs, I'm very surprised to see this from Microsoft Research.

Interesting. I would like to see an example or two of this, by e-mail if you don't want to post here.

Joe Duffy and the team

Joe Duffy and the team working on STM were not from MSR; they were an internal incubation. BTW, I find the write-up pretty insightful. It seems to capture the problems and pitfalls in trying to realize a workable STM model that would see real use.

I'd actually like to see a system of atomics with pessimistic concurrency control. The idea of an "atomic" region to catch fine-grained locking errors seems much more tractable, and useful.
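One way to read that suggestion, sketched with entirely hypothetical names: an atomic region that simply takes a single global lock is trivially pessimistic (no speculation, no rollback), and even that degenerate form already catches the "forgot to lock" class of fine-grained locking errors.

import Control.Concurrent.MVar

-- A single global lock shared by all atomic regions.
type GlobalLock = MVar ()

-- Run an action as an atomic region: everything inside holds the lock.
atomicRegion :: GlobalLock -> IO a -> IO a
atomicRegion lock act = withMVar lock (\_ -> act)

main :: IO ()
main = do
  lock <- newMVar ()
  atomicRegion lock (putStrLn "inside the region")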

"In the imperative setting."

Your "capture problems and pittfals in trying to realize a workable STM model that would see real use" statement implicitly assumes imperative setting.

There is a short overview from Simon Peyton Jones about the relative profitability of STM in a functional (Haskell) versus an imperative (.NET) setting.