Patrick Logan on Software Transaction Memory

A detailed blog post on STM - and why it is a Bad Thing.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

contains no actual arguments

I didn't see any serious arguments in that post. It was just the repeated assertion, in inflammatory language, that STM is bad. It implies the false dichotomy that things have to be shared-everything or shared-nothing. It implies that the reason some people like it is because it's "shiney" without mentioning the reduced complexity of the programming model, or the fact that transactions have been used quite successfully in DBMSs for decades. All in all not something I would expect to see on the LtU homepage.

Ditto

I was extremely disappointed. Patrick, who I know to be a serious thinker about programming language issues, blithely suggests that everyone should simply rewrite their systems in Erlang or some other shared-nothing message-passing language, nevermind the prohibitive costs of doing so for any real-world production system. He also handwaves the issue of transaction composability away as "academic," which is particularly odd considering the real world effort to make transactions composable. Complaining that STM takes you "too far away from the domain" while recommending a shared-nothing message-passing rewrite is also pretty rich, IMHO.

I also suspect that Patrick hasn't read The Next Mainstream Programming Languages, where I think Tim makes a very persuasive case that:

  • Large-scale stateful software isn't going to go away, particularly when it already exists and is shipping.
  • Preemptive threads and monitors don't scale.
  • Imperative programming is the wrong default.
  • There are multiple viable approaches to effects in otherwise pure languages (monads, effect types...)
  • Slide 50: "Claim: Transactions are the only plausible solution to concurrent mutable state," where the alternatives considered on slide 49 are "referentially-transparent functions, message-passing concurrency, or continue using the sequential, single-threaded approach."

Now, it's possible that Tim's analysis doesn't apply to many domains other than games. But I don't see that argument being made effectively.

STM makes me nervous

I happen to agree completely with Patrick's observations. I have some reasonable experience in writing transactional systems (the JTA implementation for Weblogic, among others) and I'll attempt to state my concerns without actually having worked with an STM implementation. I'm sure someone will correct me if my concerns are misplaced or if I'm wrong.

STM is an optimistic transaction model suited for low levels of interference between threads of control. You can easily add a whole lot of overhead by introducing an innocuous looking method (that happens to touch a lot of state or merely increases the contention window) inside the atomic block. Unlike db transactions, STM has no timeout and may suffer mysterious slowdowns until the transaction goes through after many retries. At worst, it could livelock, if the window of contention and and the retry frequency are big enough. STM works for Haskell because mutation is very limited in the language; I very much doubt STM will be a good solution in the hands of a C#/Java programmer.

Second, there are many areas that don't come under the purview of STM (file activity, screen updates, real db transactions with different tx semantics). They don't work under an arbitrary retry model, unlike in a locking case where you know you own the critical section.

There seem to be a number of inconsistencies in Tim Sweeney's preferences. "large scale stateful software isn't going away" and "imperative programming is the wrong default" are contradictory, or at least sounds like an impasse. Performance is crucial to him, yet he's willing to live with a 4X slowdown for STM.

In any case, I think threads with shared-memory semantics are a huge disaster, and Erlang isn't the only *kind* of solution around (functional, shared-nothing).

It is possible to create (and I'm working on it) a message passing system that supports a gazillion tasks, non-interference between tasks, asynchronous message passing, uses collocation effectively if it is there, has user-level scheduling (what does the kernel know about your application's scheduling preferences anyway?).

It can be fast (blindingly fast if you talk to the Occam folks), composable (CSP, Occam), able to naturally take advantage of multiple cores and processors, and eliminate the unnecessary distinction (at compile time) between collocated and distributed systems. Finally, it can be intuitive to a normal programmer (someone who is still confused by monads, or lack of mutability) by retrofitting all the other concepts onto a traditional language and preserving referential integrity under aliasing.

I think the Actor/CSP model is a far better way of dealing with concurrency, but the implementations available so far (other than Erlang) haven't been able to demonstrate the full potential of the model.

STM nervousness

STM makes me nervous too, but raw shared-state concurrency is terrifying, so that's an improvement! Here are some thoughts:

"large scale stateful software isn't going away" and "imperative programming is the wrong default" are contradictory

My current analysis of the sort of code that's required in a game is that purely functional programming (plus Haskell-style ST non-imperative local state) is sufficiently expressive for around 70% of the code we write, which accounts for 95% of our performance profile. Thus I see functional programming as the right default.

The other 30% of our code (which accounts for 5% of performance) is necessarily so stateful that purely functional programming and message-passing concurrency are implausible. Thus I see imperative programming, in some form or another, as remaining an essential tool for some systems present in modern software.

Performance is crucial to him, yet he's willing to live with a 4X slowdown for STM

Without STM, the only tractable way to manage this code is to single-thread it. I'm not nearly smart enough to write race-free, deadlock-free code that scales to 10,000 freely-interacting objects from 1000 C++ classes maintained by tens of programmers in different locations.

With STM, I can continue writing software at full productivity, and it makes 5% of my performance profile become 4X slower. I break even at 4 threads, and come out ahead after that. So, eventually, STM wins over single-threading.

How do the message-passing guys implement robust interactions between independent objects with mutable state, like the classic "bank transfer" example that justifies transactions in databases? You end up writing an endless set of ad hoc transaction-like message exchanges for each interaction that may occur in the system. Thus my argument that STM is the only productivity-preserving concurrency solution for the kind of problems we encounter in complex object-oriented systems that truly necessitate state, such as games.

message passing and transactions

The other 30% of our code (which accounts for 5% of performance) is necessarily so stateful that purely functional programming and message-passing concurrency are implausible

Can you help me understand why message passing concurrency is implausible for your domain? Isn't the Opengl API itself a wrapper over a message passing architecture?

In the server/middleware space that I'm more conversant with, there are a large number of examples where mutability, concurrency and message passing are standard patterns. The Tuxedo transaction monitor has just a few primitives (tp_send, tp_enqueue etc.) and uses OS processes for isolation. It works for Visa and Amazon.

The Singularity OS is built on a foundation of message passing and isolated processes, where each process is written in a mutable style; there is no shortage of concurrency or mutability. (Incidentally, Tim Harris who co-wrote the STM-Haskell paper works on Singularity as well)

How do the message-passing guys implement robust interactions between independent objects with mutable state, like the classic "bank transfer" example that justifies transactions in database

The term "message passing" includes occam-style messaging to MQSeries-style queued messaging, and robustness is a factor in both spaces and dealt with differently.

The bank transfer kind of example is easily handled by enqueuing a "transfer" message to a known teller object (this is service oriented architecture). An online store handles an order by enqueuing a message to the store to reduce the inventory. In such cases, each message send itself is an ACID transaction and an auditable action, so the act of transferring money may involve several DB transactions. I would hazard that more than 95% of transactions in the enterprise world are not between databases, but between a db and a messaging system.

Another example of a high performance message passing architecture is SEDA, with explicit support for scheduling between stages.

Coming back to LtU, the languages for writing such systems are woefully backward. I am working on a pet project to attempt to fix it in a familiar imperative setting, which I hope to demo in a couple of months. I have learnt much from the Erlang effort. I like Erlang not so much for its language but because of its systemic features (the supervisor hierarchy, failure mechanisms, lightweight isolated processes, etc.)

Hear, Hear

Coming back to LtU, the languages for writing such systems are woefully backward. I am working on a pet project to attempt to fix it in a familiar imperative setting, which I hope to demo in a couple of months. I have learnt much from the Erlang effort. I like Erlang not so much for its language but because of its systemic features (the supervisor hierarchy, failure mechanisms, lightweight isolated processes, etc.)
I could not agree more. Looking forward to that.

Can you help me understand

Can you help me understand why message passing concurrency is implausible for your domain? Isn't the Opengl API itself a wrapper over a message passing architecture?

Consider a game like Grand Theft Auto, where 10,000 or more objects move around and interact independently. Here, the objects are things like people, cars, weapons, props, etc. Each object has attributes that change over time, such as position, damage, relationships with other objects (who's carrying what), etc. At any point, any set of objects can potentially interact with each other in a stateful way, for example I can get in a car, drive it around, and run into a mailbox, damaging it.

Many of these interactions require atomic updates of groups of objects. For example, to fire a weapon, I need to first determine whether I'm carrying the weapon, whether the weapon has any ammunition, and then I need to create a bullet object. If a weapon has one bullet and two people tried to fire it simultaneously, non-atomic updates might lead both players to conclude that they can fire the weapon, so two bullets are created, and the gun is left with -1 bullets, an inconsistent state.

Basically, all of the atomicity arguments that have been applied to databases (e.g. the bank-transfer example) directly map onto the game example.

Therefore, you need some way to guarantee atomicity. Candidates:

  • We do this today (on 3-core CPUs) by simply single-threading this kind of code.
  • You could implement this using message passing, but you'd be writing and debugging your own ad-hoc transactional protocol for each of the tens of thousands of updates and state transitions present in the program.
  • If our game were sufficiently simple, we could multi-thread it by carefully locking and synchronizing the objects at the appropriate point, but as software complexity scales up, the analysis of whether the program might eventually deadlock is intractable.

So, in this case, transactional memory is the most natural productive solution to this problem. Keep in mind, we live in a comparatively simple world, since we always targeting a single multi-core CPU with shared coherent memory, and aren't concerned with databases, distributed computing, or fault-tolerance.

ad-hoc protocols

"but you'd be writing and debugging your own ad-hoc transactional protocol for each of the tens of thousands of updates and state transitions present in the program"

This seems to be simply a language user interface problem. If passing a message looks just like a function call, then designing an ad-hoc message passing protocol is no different than designing a function or object interface for these things (which you are already doing).
"Sending a message" (as it is called) to an object already looks like a function call in OOPLs. There's no reason sending a message to a channel/process/actor can't also look like a function call.

Bloat

The problem with implementing ad-hoc transactions using message-passing is that it bloats the code tremendously. In the single-threaded world or the STM world, you say "if(Ammo>0) {Ammo--; FireBullet();}". That would bloat up into tens of lines of asynchronous code to ask the recipient if he has ammunition, to reserve that ammunition, to finalize the interaction by effecting the ammo reduction, handle failure asynchronously, etc. Such code needs far more thought and testing than the STM version.

A solution that imposes an order-of-magnitude code bloat, productivity decrease, etc, is unattractive.

async message passing syntax

[edit: actually I'm arguing more that async message passing doesn't have to be as code intensive as it is currently. I grant the atomicity problem is a problem.]

I don't see why message passing would have to be longer than your example using STM. Check out the syntax for Raph Levien's Io language. It makes continuation passing style easy to write. In his syntax, semicolon introduces a function. So the semicolon at the end of a statement is actually a function representing the continuation of the program. Channels are simply functions. A remote procedure call with an asynchronous continuation winds up looking just like sequential code.

level of abstraction

It seems as if you make the conceptual leap from "everything is an object" to "everything is a process". I don't see why you'd have pretty much the same code in Erlang, based on my layman's knowledge of guns. Just like noone hanging out of a car window, shooting like mad, would be dealing with individual "bullet objects", your program ought to work at the level of "if (clip_empty()) {reload()}". If another actor has to get involved, say because the shooter doesn't have a spare clip in his pocket, a request is needed - in Erlang, that would most likely be a one-liner, not tens of lines of code.

There is no need for two-phase commits etc. just like there isn't in the corresponding real-life scenario. There is also hardly any call for specific error handling where the gun has to account for a scenario where the clip doesn't magically appear. The gun doesn't reload itself, does it? And the banking analogy doesn't hold, since you wouldn't have two shooters in different cars loading the same gun.

The thing about concurrency-oriented programming is that you normally go to real life for abstractions, and many of the good abstractions are quite simple. What would be the point of not using a simple counter to keep track of how many bullets remain in the gun? There is no interesting concurrency happening, but a simple relationship: (shot fired) -> (one less bullet). Make gun, clips, bullets into passive objects controlled by an actor - the shooter.

Ad-hoc Transactional Protocols Considered Harmful

You could implement this using message passing, but you'd be writing and debugging your own ad-hoc transactional protocol for each of the tens of thousands of updates and state transitions present in the program.

You said that critical point so clearly and distinctly! I'm very impressed.

The necessity for atomic updates is an application-specific thing. Given variables a and b, whether two otherwise sequential updates to them must be transactional depends on the semantics of the update within the application. It cannot be inferred, because the information only exists in the programmer's head. For the software to treat the updates atomically, it must be explicitly informed of the need.

Thus the smallest possible amount of information the programmer must give the application is that the updates have to happen together. In other words:

atomic {
update a;
update b;
}

represents a floor on the amount of annotation. There can be no solution to atomicity that is any more concise than this. (The specific syntax is of course irrelevant; begin; ... commit; is equivalent.)

So that code has to exist somewhere. It will either 1) exist in client code, or it will 2) exist in the message-handler implementation of the "ad-hoc transactional protocol" mentioned above. Note that 1 is a proper subset of 2, and that the additional overhead of 2 is substantial, and must be paid again with each additional atomic update requirement.

Having used a variety of distributed computing techniques, including DCE, RMI, pub/sub, and raw TCP, I am convinced of the superiority of message passing, particularly pub/sub message passing. It is a great solution to the problems of shared-everything, as well as a great abstraction in its own right. However it is not a solution to atomic updates. The solution to the problem of atomic updates is transactions.

For example, to fire a

For example, to fire a weapon, I need to first determine whether I'm carrying the weapon, whether the weapon has any ammunition, and then I need to create a bullet object. If a weapon has one bullet and two people tried to fire it simultaneously, non-atomic updates might lead both players to conclude that they can fire the weapon, so two bullets are created, and the gun is left with -1 bullets, an inconsistent state.

I'm not sure that I understand why this would be a problem in a message-passing system. If the ammunition state of the gun is internal to the gun (as it should be, since it's a property of the gun), then the first "fire" message to reach the gun would result in the creation of a bullet object. The second "fire" message would fail to create a bullet, because the internal ammunition level of the gun would have already reached 0 as a result of the first "fire" message (in any sensible message-passing system, it is possible for actors to refuse to process a new message until they've completely processes earlier messages - that's one of the fundamental differences between actors and passive objects). You wouldn't end up with two bullets, or a negative ammo state. The only remaining problem is that two players think they own the same weapon, but I fail to see how STM would solve that problem either.

Am I missing something here? Can you please explain why you think a message-passing implementation (in say occam or Erlang) wouldn't preserve atomicity in this case?

Consider this example

Consider this example instead.
You fire the bullet at a petrol tank. Potentially lots of other bullets, and other objects may be near this tank too. All of them will need to:

* Check if they hit the tank
* If so cause it to explode
* and deal the appropriate damage to themselves (e.g. modify bullet trajectory)

This needs to happen atomically. In other words, even if you decide that you didn't hit the barrel and it should not be modified, there is no way of knowing this BEFORE you've read the location of the barrel (and in case you do need to update it, you better have exclusive access!).

You could do this by having the barrel object implement an ad hoc transaction protocol, but if you have lots of objects in this scene who all examine this barrel (99.9% do nothing to it) would cause contention. Or you could optimize it by having a non-transactional check of the location to avoid acquiring exclusive access when you don't need it, but this is familiar territory from locks: in the real world you'll have much more complicated logic than this, where you really can't know if you needed transactional semantics until you've basically done all the work already.

STM solves this because each thread could examine the object all it wants, it's only when it's actually modified that collisions could potentially happen. Message passing is great, but they don't really work well for everything.

Futures

I think that for games, actually using just straight futures might be the best approach, as you really dont want any observable concurency but just a clean (optional input) -> output loop.

Transactional Java Virtual Machine

Hi,

I'm not sure if this thread is still active, but I know Patrick Logan because he used to work at my company GemStone Systems. In Patrick's article he briefly mentioned how GemStone has successfully been selling STM for many years. In essence, our technology IP here is that we optimize the standard Sun Java virtual machine and make it transactional, i.e., we can transparently fetch and store objects from disk and let multiple threads and processes share data in a consistent manner. No bytecode rewriting and AOP hacks are required since the virtual machine manages read/write barriers and any class can be made orthogonally persistent.

Though not originally targeted for game simulation engines that share lots of state between threads on a many-core system, it might be viable to utilize such transactional VM technology as a simple-to-program, main-memory, game engine based on optimistic concurrency with full WW and RW conflict detection and resolution. Of course if transaction durability is required, there's persistence too. That might help with game scalability.

Just a thought from a non-gamer (last time I played a video game was Atari 2600) and non-game programmer (last time I programmed a video game was using shape tables on an Apple II).

Welcome to LtU. Threads here

Welcome to LtU. Threads here remain active for as long as there's interest...

Discrete Simulation

My understanding of game technology (particularly game AI) is that processing occurs in discrete chunks, with a message receipt & decision phase followed by an update phase. In this model, where is the need for STM? Can't you simply queue up updates in the read & decision phase and execute them serially in the update phase, partitioning among threads updates to different data regions?

In this way, there is no need for atomic blocks since all updates within a discrete interval happen in a single update phase, and there can be no conflicting cross-thread reads or updates.

I mention this model because Tim does not include it among his three candidates. I am keen to hear his response, if he's still out there.

Disclaimer: I am not a game programmer. My reference for the discrete model is Steve Rabin's "Designing a general robust AI engine" in Game Programming Gems Vol. I, and also the SIGMOD '07 paper "Scaling Games to Epic Proportions".

Math check

... it makes 5% of my performance profile become 4X slower. I break even at 4 threads,...

If the STM portion uses 5% of four processors, then shouldn't you be comparing STM to 20% of a single processor instead of 5% of each processor? I suppose this assumes something about the amount of work that must occur between world updates, but in the extreme case where you'd need 100% of one processor to run the game logic, it looks like you'd need 20 processors before requiring STM (assuming the 5% stays constant). Edit: And then it's not that you ever "break even" with STM - the single threaded model just hits a wall.

Lock-freedom

STM is based on Fraser's work on lock-free algorithms. Has the Haskell STM lost those properties? If not, then lock-free algorithms cannot suffer from (global) dead-lock or live-lock.

Not all STM is lock-free

Well, not all STM is lock-free - many different transaction mechanisms have been proposed.

But "lock-free" is one of those nice problem hiding euphemisms.

Deadlocks are replaced by cyclic conflicts in "lock-free" transactional mechanisms. The deadlock resolution that many "locking" systems have, including locking transactional systems like Oracle, is replaced by conflict resolution.

I say "replaced" in a very liberal way, they're conceptually very similar and can produce very similar problems in naive implementations :-)

Lock-free, not lock free

Lock-free is not a euphemism, it is a defined term (defined in the paper I linked). A lock-free algorithm, by definition, cannot dead- or live-lock. Starvation may still occur. The paper above suggests that wait-free systems (lock-free but further starvation cannot occur), do require a system like deadlock resolution, but lock-free systems do not. I'm not sure if you are arguing against the defined term. If so, can you provide an example of a case that would produce "cyclic conflicts" and what problem it would cause.

There is no reason...

There is no reason that a defined term can not be an euphemism, as I'm sure victims of "ethnic cleansing" will agree :-). "Lock-free" in many cases is a euphemism for "abort and retry".

Add to the fact that "lock-free" algorithms require locking at some level (usually in terms of processor bus locks) and that some may use OS level waiting locks to implement the "lock-free" protocol and you have a very murky term indeed.

Lock-free algorithms do have a system similar to deadlock resolution, but it's not a system that actually resolves dead- or live-lock (I never said the systems DID actually dead- or live- lock), it's the conflict resolution mechanism used to resolve data conflicts (the Conflict Resolution Algorithm inherent in Optimistic Concurrency Control Validation).

In one narrow lock-free forward validation scheme (which is NOT the totality of STM or lock-free STM conflict resolution algorithms), your conflict resolution algorithm is to roll back any transaction that would conflict with another as that conflict is found. This algorithm avoids the problem of "cyclic conflicts" (which in a waiting system would cause a deadlock).

In a backward validation scheme this method is not possible and both transactions must be aborted/retried, in the case of a cyclic conflict (such as pieces of state written in the opposite order order) - in a naive implementation of this lock-free algorithm, it is possible to get pathalogical starvation, in the same way Ethernet collisions would in a naive implementation.

In similar situations in a locking database, it will deadlock (seeing the cycle in the graph of "waiters" for acquiring locks), detect the deadlock and roll back one (or both) of the transactions in the waiting list, possibly to be retried. My point was that this deadlock resolution is very similar to the conflict resolution of backwards validating optimistic schemes (for instance).

Re: working on it

Do tell?

Erlang already has STM...

... it's called Mnesia.

Mnesia tables can hold any kind of object, and don't necessarily have to be persisted to disk. Transactions consist of supplying a transaction function for the database manager to execute (via mnesia:transaction(Fun)).

Erlang folklore is that not many Erlang programs need or use Mnesia, but it's there for when shared state is useful. Maybe Tim's "70% stateless, 30% stateful" statistic could be modified downward, say something like "70% stateless, 20% message-passing, 10% stateful"?

I've read Tim's case...

... and I have issues with some of those points based on personal experience. So, answering those points in order...

* Large stateful software is indeed here to stay... although, I tend to think single-owner state is still a good idea (even if it can be passed between owners).

Microsoft's Singularity project allows single owner state to be passed around and even passed across channels (a messaging mechanism).

* Preemptive threads and monitors don't scale... and neither do transactions. Anyone who's worked on a high performance online transaction server will tell you that while the transactions are there for their contractual consistency (a legal requirement in much financial software), not because they scale well.

Quite often you end up funneling data back to a single points because it's the only way to get the transactions occurring reliably at a decent speed. I'm talking about a custom in-memory database (check pointed) not an IO limited disk monster.

Tim's case of 10000 entities with few overlapping updates feels a bit contrived when you're talking about "mainstream" programming languages, or even games. For example, how to correlate large amounts of information about 10000 entities into a single place concurrently? What if the information needs to be worked on as it's being provided? Transactions scale poorly in such cases.

* Imperative programming being the wrong default... I tend to agree with it, but I don't think it should be dogmatic :-)

* No argument there.

* Well, I have supporting evidence otherwise. I've implemented highly parallel systems (scaling to 32 processors rather easily) that contain thousands of entities with mutable state all being correlated against real time events and providing real time information updates.

How did I do it? Well, most of it was done with lock-less message passing and by using user mode context switches in certain cases (implemented transparently). Transactions never would've met some of requirements of some of the response times required.

These systems weren't trivial either, they were large scale device management systems that often had to handle the interactions of multiple devices at once.

The best solution to use is purely dependent on your design and architecture. If you come up with your design and architecture with message passing in mind, it's just as viable. I believe this applies to games as well in my experience with them.

The solution to use should probably be decided on a case by case basis. Each has it's merits and flaws.

Agreed

Also, why is the under-the-covers complexity of STM bad, when the under-the-covers complexity of garbage collection is good?

Because garbage collection

Because garbage collection is usually a bad idea. Seriously it is. No Seriously. In the vast majority of cases the ownership of memory is clearly defined and best dealt with by the compiler or programmer. In the few cases where garbage collection is useful a hell of a lot of implementations fail to actually collect it.

Basically, using garbage collection should be the programmers choice (it's supposed to be their job after all). The same seems true of STM. Widespread application of a technique to all areas as if it were a silver bullet is always foolish.

GC

According to my reading, your post make two points:

1) Many implementations of garbage collectors are bad.
2) If we could avoid having a GC, and still get rid of the garbage that would be better.

In a language with functions as first class citizens, it is not obvoius to me that you can easily live without a garbage collector - especially if you do not want to require whole-program compilation at all times.

Maybe the seeds of some arguments?

The post seemed to claim that there are some issues with transactions (not detailed, maybe starvation or contention?) which will at the very least damage composability, and that STM doesn't address distributed systems at all.

I didn't see anything to back up these claims - as far as I can tell you could rewrite everything in terms of garbage collection: "dynamic memory is a bad idea, garbage collection moves memory management into the system but doesn't explain how it will solve any of the problems and I think it won't, garbage collection won't work in distributed systems"

Perhaps these arguments have more force against STM, and LtU seems like a fine place to ask for them to be worked out (homepage vs. forum is debatable, but not worth debating). Summing it up, I'd say Patrick Logan raised some reasonable questions, but didn't provide answers.

So, can anyone point to theoretical limits, classes of programs current STM implementations handle poorly, composable reasoning strategies for correctness and progress of programs under other concurrency strategies, etc.

Point, where art thou?

I'd say Patrick Logan raised some reasonable questions, but didn't provide answers.

It's more like hinted that reasonable questions exist but failed to show them. Right now the post is so heavily edited that it looks worse than a wiki page on thread mode during a flame war.
I read the post twice in this current incarnation (i.e. with the comments mixed in the original text in unpredictable ways) and I'm unable to find anything other than "STM is bad, mmmkay?". There are valid questions to be raised wrt STM, for example they fail to provide process isolation (e.g. I would like to ensure that only a group of process access a bunch of STM vars, so if starvation occurs only this subsystem needs to be crashed and restarted).
Just saying that STM is wrong and everything should be done in with message passing is bizarre. There are other concurrent models (e.g. Oz's dataflow variables) so where is the argument showing that message passing is better for most domains? This feels like Smalltalk advocates or pure FP aficionados that believe in one hammer to rule them all.
I'm sorry Patrick but sometimes message passing is the wrong tool (while I believe that it's the most useful default model).

DBMS transactions do not scale

Are there any transactional systems out there that are not the usual shared memory or superfast interconnect giants?

Distributed transactions are probably the hardest research problem in database theory, practically not solved in large scale.

Transactions can scale, given enough information about them - whether they commute etc., but that would probably lead to integrating a theorem prover in the language, something more radical than using message passing. It's an interesting paradigm though.

STM replaces locks & semaphores, not MapReduce

Many problems can be parallelized easily by splitting the data up and streaming it through many processors. And in these cases, you definitely want to use MapReduce, nested data parallelism, or some other kind of parallel 'map' abstraction.

Unfortunately, not all problems are so easy to parallelize. Witness Tim Sweeny's example from the Unreal Engine: 10,000 game-world entities spread across many cores, all being updated in parallel. There's no easy way to do this with any variant of 'map', because the entities in the game world interact in arbitrary ways, and affect each other's state.

Now, you could solve this problem with semaphores and locks, but you'd go insane. And this is where STM offers a huge win: It takes all the nightmarish complexity of use fine-grained locks, and replaces it with a simple transaction model, complete with automatic restarts and strong guarantees against deadlock.

I can't think of any reason to loath STM so strongly unless you are absolutely sure that you'll never encounter a problem which requires shared state, and which resists the other tools in your toolbox.

Might have a point...

Lots of dodgy assertions and straw-man--bashing, but there are some lucid moments. This sentence intrigued me:

And if some group is going to retrofit transactional memory into some significant Java or C# system, well, they would be far better off investing that time into a simpler, shared-nothing rewrite into a language like Erlang, a simpler language like Smalltalk, or even a better shared-nothing coordination mechanism like Javaspaces.

Does he have a point? Might industry invest resources in STM that would be better invested in the Erlang model?

How to fix a broken toolbox.

Does he have a point? Might industry invest resources in STM that would be better invested in the Erlang model?

Who knows? Both Java and C# provide almost none of the guarantees one has using STM in Haskell or message-passing in Erlang. He's comparing between kludging STM in an existing imperative language against rewriting millions of lines of code in an obscure language (I like Smalltalk and Erlang but they're unknown to most of the industry) or start using a tuplespace mechanism to solve all concurrent problems. It's an apples and rhinos comparison. A much better comparison would be: kludge STM and light-weight, isolated, processes with message passing in both languages and see what would be easier (hint: isolation in Java is far away from being lightweight). Or else let's compare between rewriting to Haskell with STM vs Erlang with message passing (assuming that both have the man-centuries necessary to create the libraries and tools available in Java or C#).
Also how would Smalltalk automagically solve concurrency issues. AFAICS most Smalltalk implementations today use the shared-memory, threads and locks as it's basic concurrent solution.

Java + Haskell = Quark?

I don't know enough about STM or Quark, but would it be easy to port STM to Quark so you could get access to the Java libraries via something very Haskell-ian?

It looks like there might be

It looks like there might be an implementation of STM (LibCMT) for C#. C# has a form of software isolated processes (through the use of AppDomains), though they're not fully-fledged as yet. They can be used to model the kind of isolation for transactional memory I believe you're referring to.

MC# and use message-passing concurrency, in slightly different forms, although they're not strictly C# (MC# compiles to C#, so they can be integrated at some level). They are similar enough to C# that the effort involved in porting existing concurrent C# code would be entirely due to the switch to message-passing.

It would be an interesting experiment to take an existing piece of concurrent C# code and adapt it to both the LibCMT library and MC#, to get a feel for which is the shorter trip.

STM package written in C# published 2005

Seems like Microsoft Research published a C# Software Transactional Memory package back in 2005...

The SXM is a software transactional memory package written in C#. It is much easier to use than prior STMs because it uses Reflection.Emit to transparently add synchronization code to sequential data structures.

Not much concrete info, but still interesting

I've seen lots of cheerleading about STM, but not much practical experience to back it up. Patrick is bright enough that I don't immediately dismiss his opinions as aimless ranting :)

Uses of STM?

Who is using/has used STM for real work these days? In what language?

Erlang, while obscure, does have somewhat demanding real-life applications.

Has the situation changed from eg. Haskell vs. Erlang, reloaded?

Yes, in Java

This incarnation is being used in an open-source university management system. The system was developed in-house and has been in production for a couple of years now. For more insight, you can read the SCOOL'05 paper, or go through the presentation.

Tonight make it magnificent

I didn't initially notice that the post in question seems to be a reaction, at least in part, to this ACM Queue article (at least, I think it is; I have a hard time telling what's really going on on that blog page.)
Among other things, the above article "illustrates how an atomic statement could be introduced and used in an object-oriented language such as Java." I think Guillaume Germain captured one of the concerns about this possibility quite well:

I can see misguided programmers starting to sprinkle their code with 'atomic' statements ("just in case"), a bit like one would do with 'yield' statements in a non-preemptive concurrent system.

I can see how this prospect could generate a certain amount of trepidation. The problem, though, is not with STM itself, and in that respect, Patrick errs when he says things like "this is a really bad feature that could screw up a lot of software for years to come". Refactoring for factuality, I think he means something more like "I'm concerned that this is an all-too-seductive feature that could screw up a lot of software for years to come". In that, he could be right, but it actually has nothing to do with the technical merits of STM.

I don't see the concern

Essentially that argument reduces to "some people might not use it right" which I must point out is nonfalsifiable (ahead of the fact anyway) and can be said of anything. For example, has it been our experience that misguided programmers have started sprinkling their Java code with synchronized blocks, just in case? I haven't seen that happening; the problem I've mostly seen is underuse of synchronized causing race conditions. (Please don't take this as an endorsement of Java shared-state concurrency.)

Predicting feature quality

Essentially that argument reduces to "some people might not use it right" which I must point out is nonfalsifiable (ahead of the fact anyway) and can be said of anything.

I almost agree, except I suppose that there must be some class of features which would actually be a bad idea to add to a language such as Java — for example, how about raw memory pointers? I think there are some plausible criteria for recognizing certain kinds of bad features in advance — for example, features which violate safety properties. However, in this case, an argument of that nature hasn't actually been made, afaict.

But many decisions in PL design are based largely on the goals and convictions of the designer. If Patrick were designing a PL, presumably he'd leave out STM, and then the question could be decided in the marketplace, which is probably where such decisions ultimately belong (because there doesn't seem to be any other reliable way to decide them).

Yes!

For example, has it been our experience that misguided programmers have started sprinkling their Java code with synchronized blocks, just in case?

For the record, Yes, this has most definitely been the case. I have seen exactly this phenomenon all over the codebase of at least one large (top 50) e-commerce site. When subtle synchronization and race condition bugs occur (e.g., users click the same effectful link several times in quick succession, session data gets messed up, something bad happens...), I've seen synchronized markers employed like pixie dust until the problem seems to go away. I wish I were kidding when I say that virtually no one understands how this software really works or why synchronization occurs where it does.

Essentially that argument

Essentially that argument reduces to "some people might not use it right" which I must point out is nonfalsifiable (ahead of the fact anyway)

Not completely non-falsifiable. The C# designers regretted adding the convenient "lock { }" statement to the language for just this reason: it gave people a false sense of security, and harmed performance.

A sympathetic take

Though I agree with those who found Patrick's post difficult to follow and possibly overly provocative (though it is a personal blog post... ;-) ), I found myself in sympathy with the overall feeling.

I have nothing against research into STM, of course, and can imagine some scenarios where it would be a nice tool to have if used lightly and tastefully, but there seems to be a danger that it might be becoming the latest panacea. Such "silver bullets" discourage people from trying known good solutions where they are available (such as message passing) and instead simply enable existing pathologies for a bit longer. (Or simply convince people that the problems will be gone Real Soon Now.)

If we follow the "we already use transactional DBs" meme to its logical conclusion, essentially what STM is doing is cutting out the middle-man and embedding the database into your app, but without the persistence.

As anyone who has worked with a transactional DB with highly contended data knows, transaction support in the DB gives you some things for free, but makes other kinds of headaches. Retry support after a rollback, for example, and subtle deadlock conditions can create unacceptable performance and reliability hits.

This doesn't mean I want to get rid of transactions, of course, but it makes me much less optimistic that adding more of them liberally to my system will make my life easier for large apps.

If we add STM as a tool high in the CTM hierarchy of statefulness, where we take it for granted that we are not using any more state than we really need to to solve our problem elegantly, I'm much less worried, of course.

On the comment of being the

On the comment of being the latest panacea, I have to argue that such thinking might make progress a bit stagnant wouldn't it? Researching something thoroughly until it's proven to not work well seems a better idea then just go with something that's proven. I'm not sure if that was your intent in that comment but it seems to be the result.

Eat your dinner before you reach for dessert

I have to argue that such thinking might make progress a bit stagnant wouldn't it? Researching something thoroughly until it's proven to not work well seems a better idea then just go with something that's proven.

Given that there are a whole bunch of great ideas on how to reduce state and improve concurrency that we have already that have yet to gain wide acceptance and implementation in industry, I can't say I'm all that worried I might be stifling innovation by critiquing an idea whose major appeal is that it looks just like the same rickety old techniques most people are familiar with.

What?

...an idea whose major appeal is that it looks just like the same rickety old techniques most people are familiar with.

How is that even slightly true? STM, as far as I can tell, doesn't look anything like conventional techniques for thread communication.

The main appeal of STM is not really that it looks like anything else, but that it gives you very real guarantees about the composability of working components. You can take two transactions which work well on their own, and compose them in a number of ways into new transactions, the semantics of which have nice properties. The most obvious way is sequentially, and you get a guarantee that the result of the transaction, supposing that it commits, is independent of the behaviour of all other threads -- even if things happen in another thread in between the two transactions you've composed, they can't impact the result.

You can also compose them such that if the first fails to commit at first, and decides to retry, then the second is tried instead (and if it retries, then the whole transaction retries). This allows you to wait for any of a number of resources to become available, and to do different things depending on which it was. The inner transactions don't need to be prepared specially for being composed in this manner.

You also get all this composability even in the face of exceptions. A subtransaction which wants to fail by throwing an exception doesn't need to care about what changes the transactions it might eventually be part of might have made in case they don't catch it (and be left half-committed). It just throws the exception, and that causes the transactions it's a part of to behave as if none of the things they've done had ever happened and rethrow the exception until it's caught.

Incidentally, this behaviour can be used to cause transactions to throw an exception when they otherwise would have retried. x `orElse` (throw ...).

The trouble with many of the other systems, including most of the message passing systems, is that in order to compose components that work separately, one either ends up rewriting them, or sticking a layer of indirection in front of them, or worse yet, setting up some system of locks. Or, you design some transactional system from the start in terms of your messages. It would be easy to put STM to work in a combined solution there (STM transactions as messages), but the other way around is trickier to get right.

But what I'd really like to point out, more than all of this, is that STM is not really about the operational behaviour of the system. All it does is provide a language for saying things that you need to be able to say with regard to thread communication. "These things happen together, or not at all.", "If this doesn't work, then try this instead.", and so on. It's the right level of abstraction because it leaves the compiler and library writers room to implement things in more and more clever ways, getting more performance, while keeping the semantics (in the sense of the overall effect of the program) fixed.

This is very much the same as garbage collection. Early garbage collectors were not great at what they did, but they provided the machinery for the right abstraction for working with memory. The early implementations of STM are not great, but they're not nearly as clever as they could be at managing the resource they've been given -- they're the somewhat obvious implementations so that we can see what programming using STM is like. Specifically, the decision as to when to retry a transaction, and which transactions in the system are likely to run into contention issues can probably be done in a much more clever way, though the current approach is not as bad as it could be. If it turns out that we like STM (and personally, I really do think we will), then the lower level implementation can improve without us having to rewrite our programs, much like garbage collectors did.

Another unrelated and small remark that I'd like to point out is that message passing is the sharing of state. It's just a particular form of threads communicating their state to one another. When one thread sends another a message, it is creating a data dependency between its local state and that other thread's local state. Systems which can't do this are not really expressing concurrency, just parallelism. I point this out only because it grates on my ears a bit when people expound the virtues of "no shared state", and then talk about systems where threads share state with one another (albeit in a responsible and selective fashion). I know that people are using "shared state" to refer to one particular model of thread communication, but that should not lead one to the idea that other models share no state between threads. On the flip side of the same coin, calling STM "shared state" would be somewhat degrading, because while it could be used in that way, it would be an unduly awkward way of achieving that effect, and one would get none of the benefits of using transactions. Similarly, one could use message passing where threads continuously communicated all changes in any parts of local state which might be relevant, but it would be an awkward misuse of the system.

Shared Context

How is that even slightly true? STM, as far as I can tell, doesn't look anything like conventional techniques for thread communication.

The familiar thing is transactional databases, which are quite commonly, though not always with conscious awareness, used as big multi-process global variables.

I point this out only because it grates on my ears a bit when people expound the virtues of "no shared state", and then talk about systems where threads share state with one another (albeit in a responsible and selective fashion).

Many of us here take for granted a theory of statefulness that is not binary, but rather hierarchical, following CTM. For us, shared state refers to a particular slot in that hierarchy. In this scheme, message passing is less stateful than full state-sharing.

CTM and Transactions on Cells

This may be a naive question, in that I don't know a lot of the details about STM.... but.... Isn't there a correlation between STM and the CTM section 8.5.4 on Transactions - Implementing transactions on cells?

Dey turk ma jugh!

But what I'd really like to point out, more than all of this, is that STM is not really about the operational behaviour of the system. All it does is provide a language for saying things that you need to be able to say with regard to thread communication. "These things happen together, or not at all.", "If this doesn't work, then try this instead.", and so on. It's the right level of abstraction because it leaves the compiler and library writers room to implement things in more and more clever ways, getting more performance, while keeping the semantics (in the sense of the overall effect of the program) fixed.

This is very much the same as garbage collection.

Aha. Which helps explain the reaction, too. On the purely pragmatic front, it's reasonable to be skeptical that compilers are going to do a good job of implementing a new high-level abstraction, until their success is well proven, and the appropriate use cases well understood.

There are also all the factors surrounding what happens when something that used to be hard becomes easier. Concurrency is a bit different from garbage collection, though. When garbage collection works well, which is often, programmers don't really have to reason about memory management much (or even understand it!) That seems unlikely to be the case for concurrency: it's still going to be necessary to reason about it, even if the way concurrency requirements are expressed becomes more high-level. So the concern that programmers will misuse the feature due to lack of understanding of the underlying issues seems to have more of a basis in this case, than in the case of garbage collection.

(This just helps me understand why STM might have provoked a negative reaction, I'm not agreeing with the reaction.)

GC works, programmers don't

When garbage collection works well, which is often, programmers don't really have to reason about memory management much (or even understand it!)

Tell that to my poor desktop with 1.5gb of RAM entirely occupied. Or to our customers servers running J2EE. People don't get memory management and just hope that the GC will magically keep the memory usage low, the same thing happens today with concurrency and will continue to happen, even if the entire industry adopts Haskell/STM or Erlang. Also as with memory management, concurrency solutions will keep showing up, even if the industry adopts STM as it adopted GC: there's still research on linear resources and regions because GC isn't enough. Programming is a very hard activity and all tools can (and probably will be) misused. Today "memory is cheap" solves all memory leaks problems, tomorrow starvation and friends will require the "cores are cheap" approach.

Tell that to my poor

Tell that to my poor desktop with 1.5gb of RAM entirely occupied. Or to our customers servers running J2EE. People don't get memory management and just hope that the GC will magically keep the memory usage low, the same thing happens today with concurrency and will continue to happen, even if the entire industry adopts Haskell/STM or Erlang.
J2EE is the antethesis of concurrent programming -- it is the new mainframe. Cram everything into one JVM then put a lot of textual boilerplate around it to make it appear "loosely coupled". Better concurrent programming tools would allow these systems to "spread out".

The same is true of desktops, certainly. Only until recently though they've not had multiple CPUs readily available, so there was no where to spread out to. Now that there is room to spread out, the languages we've been using prevent it.

STM will not help spread systems out. Message passing and shared-nothing sequential programming will. Most programmers should be learning about messaging and most tool development should be in support of them. STM is a shiny tool in the best interest of very few programmers.

Transactions

J2EE is the antethesis of concurrent programming -- it is the new mainframe.

J2EE is a collection of a whole lot of different standards. It includes successes such as servlets and JSP, failures such as EJB 1.0, and JMS: Java's message passing API, which I thought was what you were arguing for. And it's hard to see how a language (Java) with threading and synchronization primitives built in can be the "antithesis of concurrency". Even if we stipulate all the complaints about the model itself, it's undeniably deeply concurrent.

I'm unclear what you mean by the "new mainframe" remark. Did you mean that J2EE is a highly successful high-throughput centralized system? Somehow the remark came off as negative; perhaps this is one of those from-the-hip barbs you admitted to elsewhere?

STM will not help spread systems out.

You don't say what "spread out" means; my best guess is distributed computing. If so, when you say that transactions will not help with distributed computing you are correct. However this is unsurprising since transactions do not address distributed computing or communication of any kind; they address atomic updates. We could just as well have pointed out that message passing doesn't help with atomic updates, or that type inference doesn't help with male pattern baldness. The dichotomy you are pushing is entirely a false one: transactions and message passing are independent techniques. We can have one or the other or both or neither.

I happen to think that message passing is a good technique, although I lack the brazen confidence to say that it is what "most programmers should be learning about." As to who transactions will benefit, the answer is that it will benefit programmers who are working on software that requires atomic updates. Whether this is a "very few" programmers or not I have no idea; I know it includes me.

Finally, I note that the repeated characterization of people studying transactions as being driven by "shiny" things is inflammatory and counterfactual. If you don't care to address or even acknowledge the various arguments in this thread against your assertions, that is your prerogative, however if you could do so without insulting me I would appreciate it.

You don't say what "spread

You don't say what "spread out" means; my best guess is distributed computing. If so, when you say that transactions will not help with distributed computing you are correct. However this is unsurprising since transactions do not address distributed computing or communication of any kind; they address atomic updates.

I believe his core point was that both techniques can safely address concurrency, but that the message-passing abstraction also permits distribution with no additional machinery; it is thus ultimately more expressive and useful. Perhaps a CS version of Occam's razor is in order: don't multiply abstractions unnecessarily.

Irony

Perhaps a CS version of Occam's razor is in order: don't multiply abstractions unnecessarily.

Ironically, the occam programming language was developed as a message-passing language capable of expressing both concurrency (most occam programs were composed of networks of parallel "processes" which communicate through message-passing), and distribution (occam processes could be transparently distributed across a network of transputer microcontrollers). IIRC, the name of the language was chosen because Occam's Razor was a good expression of the underlying philosophy of the language.

Optimistic concurrency control is the big question mark

I've written (with some other students) and implementation of STM for Ocaml. The implementation was dirt simple; it took less than five hundred lines of code to implement it, and that includes extras like first-class transactions and optimizations to eliminate overhead in the first-order case. So I don't believe that it's a very complex model.

However, the giant question mark for me is the automatic rollback and retry. Being optimistic and rolling back on failure is extremely efficient when it works, but when it doesn't it can be very expensive. I just don't know how to predict the performance of an STM program, and I worry about debugging programs that fail to meet performance targets, because they may fail to hit targets nondeterministically.

And Maurice Herlihy (the guy who has mostly been pushing the idea) has never been quiet about this being the big research question behind STM.

If we add STM as a tool high

If we add STM as a tool high in the CTM hierarchy of statefulness, where we take it for granted that we are not using any more state than we really need to to solve our problem elegantly, I'm much less worried, of course.

I guess this is my perspective on the whole thing, which makes me see this as something of a mountain out of a molehill. I realize that STM and, for example, Mnesia, aren't precisely the same thing, but as compared to message passing, they're certainly more similar than different. So arguing for Erlang rather than STM seems a little short-sighted, when the Erlang community has already acknowledged the need for sharing at this level, and has in fact embraced and bragged about it! I'd refer back to PVR's Convergence paper.

The only thing I really have to say about the original blog post is that yes, Patrick is someone whose opinions I take seriously and yes, the current format of the post (99% comments by now?) makes it virtually impossible to tell what it originally said...

Em, peek. Is it safe?

I realize that STM and, for example, Mnesia, aren't precisely the same thing, but as compared to message passing, they're certainly more similar than different. So arguing for Erlang rather than STM seems a little short-sighted, when the Erlang community has already acknowledged the need for sharing at this level, and has in fact embraced and bragged about it!
The difference being that programming in Mnesia, even the "in memory" variety, is still deliberately a "database" thing. Most everything should be in Erlang.

Many of the Erlang processes might be very well be "database-like" stateful things, with the Erlang message queue and pattern matching essentially acting like a transaction processing monitor. No mnesia needed.

I can sympathize with game development. I've been a developer for digital electronics simulators where 80% is just objects and 20% is highly tuned C. I would imagine that not the average game developer even gets involved in that highly tuned part on a daily basis.

So it's not you I am worried for. It's the average game developer, the average financial systems developer, the average business systems developer. While the mechanism per se worries me, it's the increasing fascination with it that bothers me most.

Wow. I throw barbs like this out fairly often, shooting from the hip if you will, to see what comes back. I had no anticipation for this much feedback one way or the other. Very enlightening. I should have known it would be from the LtU crowd!

I had no anticipation for

I had no anticipation for this much feedback


You should post here more often... :-)

For those new to LtU: Patrick was a regular contributor in the early days of LtU, and I regularly try to persuade him to come back...

I Wish

I barely have time for my half-baked ideas on my own blog! Then I hit on something controversial like this and the firehose reminds me...

You see what I have to deal

You see what I have to deal with here, guys... ;-)

a bit more...

Nice to see you on LtU again!

Upon further reflection, I do have pretty big doubts about STM for Java/C#/etc. programmers... I had only encountered STM in the Haskell setting, and never really imagined that anyone would try to apply it in a mainstream language without radical changes to the host language.

It seems to me that, with the possibility of rollback and retry, STM requires a change to our understanding of side effects at least as radical as a switch to pure FP. Perhaps in the end the programming model won't have to change as much as switching to Haskell, for example, but I think that the understanding of side effects will have to change. Obviously the nice fit between Haskell and STM is well understood, but the other side of that coin is that this is a potentially huge hurdle for languages like Java.

Does anyone have a pointer to work on bringing STM to one of these mainstream languages, particularly solutions to IO operations and other side effects? Will we have to overhaul the entire JDK, for example? Even so, won't programmers need to understand that prompting a user for input needs to be "committed" before the program actually waits for that input?

STM package written in C# published 2005

Seems like Microsoft Research published a C# Software Transactional Memory package back in 2005...

The SXM is a software transactional memory package written in C#. It is much easier to use than prior STMs because it uses Reflection.Emit to transparently add synchronization code to sequential data structures.

Comments

the current format of the post (99% comments by now?) makes it virtually impossible to tell what it originally said...

Yeah, I was personally a bit annoyed by the fact that he duplicated and interspersed my response into the original post along with his replies. Replying in a comment would have been better form, I think. *hint*

While his point is

While his point is interesting, he must be an Outlook user: comments are before the article, etc.

The result is quite obscure, bleh.

Outlook User?

No. Just busy and sloppy.

No Sense

And no, I can't make heads or tails out of the original blog post anymore either.

Why Not Both?

In my day job working on databases I use transactions AND message-passing all the time. Transactions need to be kept small and fast or else performance and locking problems start affecting performance - and they aren't easily predictable because different users accessing a common store (database) with different requirements can interact in obscure ways.

The answer (in my experience) is to reduce shared state to an absolute minimum, use transactions with their Undo, Retry or Leave facilities to handle clean updates to global state and then message passing (via that state) for the top level processes.

It seems to be that the real crux of this argument is that STM in an imperative language like Java would be a terrible idea, but not in a language like Haskell in which the state that STM co-ordinates is both minimised and compartmentalised.

Good Point, IMHO

Thoughout this thread, I've been mulling over how the issues Patrick raised over STM, and what I believe to be the more reasonable responses to them, relate to Functional Relational Programming: Out of the Tarpit, which I still need to make time to go over in more detail.

I think you nail it in your

I think you nail it in your last paragraph.

Why terrible?

I wouldn't be so quick to assume that this is such a "terrible idea" in Java, at least from a technical perspective (unless there are technical barriers that haven't been raised yet).

Quite a bit of the use of Java in server-side corporate/enterprise work is quite stateless and transaction-oriented, and relies largely on the use of traditional databases to achieve that. As main memory gets bigger & cheaper, having ways to replace some of that database usage with similarly-architected in-memory solutions could be useful. Just one example.

In Theory

In theory.

Exactly!

I might be the one exception, or maybe it's because the only exposure to STM (in a practical sense) is Haskell, but I never even considered STM and message passing to be competing technologies!

I've always assumed that the consensus was "use implicit-ish approaches for parallellism where possible, use message passing for concurrency where suitable, and use STM at the very 'bottom' when you absolutely need shared state".

It seems like this whole argument here is about choosing "either or", and that it completely disappears once you look at it through the lens of a language like Haskell -- functional without state or with local state for the majority of the cases, message passing concurrency for most of the concurrent stuff underneath, and every now and then shared state with STM.
In my opinion that gives a really good toolkit which doesn't limit you in what kind of applications you can write conveniently. I think Tim Sweeney gave very good examples of when message passing just doesn't do the job -- can we please stop pretending that either approach solves all problems? They complement each other nicely -- people are just extra excited about STM because it very nicely solves the remaining hard problems after you've started using message passing everywhere it makes sense (locks, monitors, lack of compsoability etc. in the cases where you absolutely do need shared state). I'm currently working in the games industry as well, so it may be that games are just inherently ill-suited for message passing, but I doubt it; I'm sure lots of applications would do just fine with just message passing, but shared state will always be needed here and there.

STM in Erlang?

Hm, does that mean it would be great to have STMishness implemented for Erlang, to get the best of all possible worlds?

When I say STM I should be

When I say STM I should be clear that I mean "STM in Haskell". For example you can statically guarantee that only transactional actions can be performed in an "atomic" block, no others, and you can't run transactions without "atomic" either. (So all of those "people will forget and cause bugs, or abuse it and cause performance issues" complaints that have been raised, are dead wrong -- it can only be used for transactions thanks to the type system and it can never be forgotten and cause bugs, again thanks to the type system).

So to answer your question, I do think think STM comes to its right in a pure and statically typed setting, so Erlang wouldn't be my first choice. Haskell comes very close to the "ideal" (it already has STM and message passing, and purity). In order for messages to be truly useful we may need some more help from the type system (e.g. better records? union types?).

STM in Erlang

Interesting to see some discussion around implementing (something like) it in Erlang.

Would STM even be possible in Java?

It seems to me that STM requies a pure language. As I understand it, STM requires that the transactions can be retried as many times as necessary with the same meaning each time.

In Haskell, encapsulation within the STM monad ensures that the transaction has no dependencies other than the STM variables. Would this be possible to enforce in a language like Java?

Could it be done using a pure sublanguage (based on Featherweight Java perhaps)? Ignoring the issues of mutable state outside of the system (e.g. I/O), would treating the entire heap as an STM store be feasible?

Apparently

Somebody thinks so.

TFA^2

The article which seems to have triggered all this also describes how it might work in Java (e.g. "how an atomic statement could be introduced and used in an object-oriented language such as Java.")

Note that two of the authors of the article are from Intel, which has an obvious interest in promoting solutions that might help with multicore architectures.

The main problem

The main problem which languages other than Haskell face in implementing STM is defining a sublanguage of things which are permissible in transactions and can be rolled back transparently in case of a retry or exception, and then ensuring that nothing else accidentally lands in a transaction. You can include more than just interactions with transactional variables in this -- actions which simply read from the environment without affecting anything would also be okay. A language like Java would almost certainly need some extensions to its type system to support this. You could do it in dynamically typed languages like Python and Ruby as well, but you do need to take extra care. Personally, in cases like that, I'd probably want to compile the whole transaction and make sure it's okay before running any of it. Throwing an exception at the first bad action (and not performing it) would also work, but makes me feel somewhat uneasy. That might just be my tendency toward static guarantees though.

Setting aside the need to check that transactions really consist of valid actions, yes, you could go ahead and make every variable a transactional variable. That seems a little strange to me, but in principle it could be done. You'd really have to work on making the transaction machinery lightweight. Every variable update or read not already in an atomic block would become a small transaction in itself. Note that atomic blocks might be running optimistically that have read the variables you're modifying, and will have to be notified that they must restart, because something has trampled on their read set.

You would probably want whole assignment statements like:

x <- f(x) + g(y)

to occur as-if-atomically, so this might translate at a lower level into something along the lines of:

atomically
   x' <- read x
   y' <- read y
   write x (f(x') + g(y'))

where read and write are the primitive transactions for accessing transactional variables, so x' and y' are the actual data contents of x and y respectively. (This translation obviously depends on a whole lot of properties of your actual language. Here I'm assuming that f and g want actual values and not references as parameters, and that they are not themselves variables.)

The only trouble I see with this aside from the obvious performance concerns is that it does almost nothing to push users in the direction of limiting the state that they share. When you have to explicitly construct transactional cells, and hand them to your threads, it gets you to think about something which is fundamentally important to the design and well-being of your program. Then again, so do lots of features of Haskell which people using other languages like to ignore. ;)

STM locks and byzantine shared state

I've got some comments from personal experience of implementing a "kind of" STM in Askemos - a somewhat unusual distributed environment (somewhat akin to Erlang anyway), where STM suddenly makes sense.

I don't see shared state as a Devil

I may be naive, but I don't see shared state as a Devil to eradicate.

On the other hand, I do not believe that programmers should ever have to deal with locks-- they are way too easy to mess up.

So I will take any idea that allows parallel programming without locking. I like the idea of rollback and retry, but I am not sure I want to do that any time that a sharing conflict arises. Sometimes I may want to keep whatever value was written last, or first, or the one matching a predefined criteria without retrying the "losing" transaction. In fact if I was to chose I would not make rollback and retry the default.

STM may be composable with itself, but is it composable with other concurrency mechanisms? For example, how do I "rollback" message passing? Do I need to delay it until the transaction commits? Is it illegal to receive a message during a transaction? What if the program decides to parallelise transactions in a loop (should be safe), causing unnecessary conflicts?

Isn't it about cost?

I mean, all design choices ultimately come down to which costs you are willing to pay and which ones you are trying to minimize, right? In an ideal world, memory and CPU would be infinite, all data would be immutable and every computation would be reversible, right? Programming languages are simply compromises on that ideal that allow them to be implemented on physical computers with real constraints that sometimes dictate practical and impractical design choices.

Now I'm the last person in the world that could be called an expert on concurrency, so I need a simple, concrete example that I can wrap my head around. In the game World of Warcraft, there are mining nodes which players can run up to and mine, receiving raw materials as reward. However, the resource is limited and mutually exclusive, so only one person can mine a resource at a time, and it can only be mined four times or less before disappearing.

I'm imagining a game engine that would support this feature in which each player was represented by a dedicated thread/lwp. I'm trying to figure out how STM and MP compare in this task, so my simplistic understanding leads me to these two designs:

For the STM design, each player thread attempts to access the mining node optimistically. If they are the only player attempting to mine the node, they succeed, and a packet is sent from the server to the client indicating success. If two players attempt to mine the same node, one of them succeeds first, and sends a success packet, while the other one fails and must roll back, sending a failure packet. This scales up to N players.

For the MP design, one thread must "own" the mining node. Since it doesn't make sense for any of the player threads to own it, there must be some additional "environmental" thread which owns it. In the single player case, the player thread sends a message to the mining thread asking to mine the node. The mining thread sends a success message back to the player thread, and the player thread sends a success packet back to the client. In the simultaneous mining case, one player thread message reaches the mining thread first and wins, getting back a success message. The other message is rejected with a failure message.

They both seem like perfectly reasonable approaches to the problem, with the difference being that the MP solution requires an additional thread (which may already exist for other reasons). Furthermore, even in the non-contentious case, the MP version requires communication between two threads, whereas the STM version is basically as fast as a single-threaded implementation. In the highly contentious case of, say, 20 people all trying to mine the node at the same time, the STM case would force a potentially expensive rollback for all but one of the threads, while the MP case would simply send 20 message both ways (but at the cost of serializing the message processing, since the mining thread can only process one message at a time).

From this, I naively conclude that STM is probably faster in the optimistic case, and that MP probably degrades better under contention. If you know a priori whether a given interaction is likely to have low or high contention, then it seems to be a matter of sound design to choose one or the other, rather than dogmatically insisting on One Concurrency Solution To Rule Them All.

But, I'm open to being convinced otherwise.

P.S.

I forgot to mention that Erlang users are probably more likely to favor MP because threads are extremely cheap, while Java users are more likely to favor STM because threads are considerably more expensive. So it isn't necessarily that one group is right or wrong, but that the intrinsic costs involved make one solution or other more or less favorable. But maybe I just don't know what I'm talking about.

Intel STM compiler now available

We have a lot to learn before we can decide whether STM offers some relief from locks (they are NOT going away) and offers help for programming, or for tools which compose programs automatically.

We think that the existence of a C/C++ compiler supporting Software Tranactional Memory (STM) would be a great help. So...

Today, we released a prototype version of the Intel C/C++ Compiler with support for STM. It is available from Whatif.intel.com.

The Intel STM Compiler supports Linux and Windows producing 32 bit code for x86 (Intel and AMD) processors. We hope that the availability of such a prototype compiler allows unprecedented exploration by C / C++ software developers of a promising technique to make programming for multi-core easier.

This prototype compiler download requires that you already have the Intel compiler installed - our web site explains how to get an evaluation copy (free for limited time) for the compiler to install, or Linux users can get a 'non-commercial' license - either are enough to use the STM compiler download.

Releasing this compiler offers an opportunity for exploration and learning. I think we should all hope that it helps understand the promise or myth of the value of STM better. The opinions certainly run the full range today - as this blog indicates all too well!