Patrick Logan on Software Transaction Memory

A detailed blog post on STM - and why it is a Bad Thing.

contains no actual arguments

I didn't see any serious arguments in that post. It was just the repeated assertion, in inflammatory language, that STM is bad. It implies the false dichotomy that things have to be shared-everything or shared-nothing. It implies that the reason some people like it is because it's "shiney", without mentioning the reduced complexity of the programming model or the fact that transactions have been used quite successfully in DBMSs for decades. All in all, not something I would expect to see on the LtU homepage.

Ditto

I was extremely disappointed. Patrick, who I know to be a serious thinker about programming language issues, blithely suggests that everyone should simply rewrite their systems in Erlang or some other shared-nothing message-passing language, never mind the prohibitive costs of doing so for any real-world production system. He also handwaves the issue of transaction composability away as "academic," which is particularly odd considering the real-world effort to make transactions composable. Complaining that STM takes you "too far away from the domain" while recommending a shared-nothing message-passing rewrite is also pretty rich, IMHO.

I also suspect that Patrick hasn't read The Next Mainstream Programming Languages, where I think Tim makes a very persuasive case that:

• Large-scale stateful software isn't going to go away, particularly when it already exists and is shipping.
• Preemptive threads and monitors don't scale.
• Imperative programming is the wrong default.
• There are multiple viable approaches to effects in otherwise pure languages (monads, effect types...)
• Slide 50: "Claim: Transactions are the only plausible solution to concurrent mutable state," where the alternatives considered on slide 49 are "referentially-transparent functions, message-passing concurrency, or continue using the sequential, single-threaded approach."

Now, it's possible that Tim's analysis doesn't apply to many domains other than games. But I don't see that argument being made effectively.

STM makes me nervous

I happen to agree completely with Patrick's observations. I have some reasonable experience in writing transactional systems (the JTA implementation for Weblogic, among others) and I'll attempt to state my concerns without actually having worked with an STM implementation. I'm sure someone will correct me if my concerns are misplaced or if I'm wrong.

STM is an optimistic transaction model suited to low levels of interference between threads of control. You can easily add a whole lot of overhead by introducing an innocuous-looking method (one that happens to touch a lot of state or merely widens the contention window) inside the atomic block. Unlike DB transactions, STM has no timeout and may suffer mysterious slowdowns until the transaction goes through after many retries. At worst, it could livelock, if the window of contention and the retry frequency are big enough. STM works for Haskell because mutation is very limited in the language; I very much doubt STM will be a good solution in the hands of a C#/Java programmer.
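To make the retry concern concrete, here is a minimal Python sketch of the optimistic read-compute-validate-commit cycle that STMs build on. This is my construction, not any real STM: a single versioned cell with a hand-rolled commit lock. Every conflicting commit forces the loser to redo its work, which is exactly where the mysterious slowdowns come from; widen the window between read and commit, and the retry count climbs.

```python
import threading

class VersionedCell:
    """A mutable cell with a version stamp, for optimistic concurrency."""
    def __init__(self, value):
        self.value = value
        self.version = 0
        self.commit_lock = threading.Lock()  # guards only the commit step

def atomically(cell, update):
    """Optimistically apply `update`, retrying until the commit validates."""
    while True:
        seen = cell.version                  # read phase: no lock held
        tentative = update(cell.value)       # compute phase: may be redone
        with cell.commit_lock:               # commit phase: validate + write
            if cell.version == seen:
                cell.value = tentative
                cell.version += 1
                return
        # someone else committed in between: all that work is thrown away

cell = VersionedCell(0)

def worker():
    for _ in range(1000):
        atomically(cell, lambda v: v + 1)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(cell.value)  # 4000: no increment is ever lost, however many retries
```

The correctness comes from validation under the lock; the cost is that the compute phase runs unprotected and may be discarded arbitrarily often under contention.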

Second, there are many areas that don't come under the purview of STM (file activity, screen updates, real DB transactions with different transaction semantics). These don't work under an arbitrary-retry model, unlike in the locking case, where you know you own the critical section.

There seem to be a number of inconsistencies in Tim Sweeney's preferences. "Large-scale stateful software isn't going away" and "imperative programming is the wrong default" are contradictory, or at least sound like an impasse. Performance is crucial to him, yet he's willing to live with a 4X slowdown for STM.

In any case, I think threads with shared-memory semantics are a huge disaster, and Erlang isn't the only *kind* of solution around (functional, shared-nothing).

It is possible to create (and I'm working on it) a message-passing system that supports a gazillion tasks, guarantees non-interference between tasks, provides asynchronous message passing, exploits collocation when it is available, and has user-level scheduling (what does the kernel know about your application's scheduling preferences anyway?).

It can be fast (blindingly fast if you talk to the Occam folks), composable (CSP, Occam), able to naturally take advantage of multiple cores and processors, and eliminate the unnecessary distinction (at compile time) between collocated and distributed systems. Finally, it can be intuitive to a normal programmer (someone who is still confused by monads, or lack of mutability) by retrofitting all the other concepts onto a traditional language and preserving referential integrity under aliasing.

I think the Actor/CSP model is a far better way of dealing with concurrency, but the implementations available so far (other than Erlang) haven't been able to demonstrate the full potential of the model.

STM nervousness

STM makes me nervous too, but raw shared-state concurrency is terrifying, so that's an improvement! Here are some thoughts:

"large scale stateful software isn't going away" and "imperative programming is the wrong default" are contradictory

My current analysis of the sort of code that's required in a game is that purely functional programming (plus Haskell-style ST non-imperative local state) is sufficiently expressive for around 70% of the code we write, which accounts for 95% of our performance profile. Thus I see functional programming as the right default.

The other 30% of our code (which accounts for 5% of performance) is necessarily so stateful that purely functional programming and message-passing concurrency are implausible. Thus I see imperative programming, in some form or another, as remaining an essential tool for some systems present in modern software.

Performance is crucial to him, yet he's willing to live with a 4X slowdown for STM

Without STM, the only tractable way to manage this code is to single-thread it. I'm not nearly smart enough to write race-free, deadlock-free code that scales to 10,000 freely-interacting objects from 1000 C++ classes maintained by tens of programmers in different locations.

With STM, I can continue writing software at full productivity, and it makes 5% of my performance profile become 4X slower. I break even at 4 threads, and come out ahead after that. So, eventually, STM wins over single-threading.
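Taking the break-even claim literally, a back-of-the-envelope model reproduces the arithmetic. The assumptions are mine, not Tim's: the 95% parallelizes perfectly, and without STM the stateful 5% must remain single-threaded.

```python
# Model of the break-even claim. Assumptions (mine): the parallel 95% scales
# perfectly; without STM the stateful 5% is serialized on one core.
STM_SLOWDOWN = 4.0
PAR, SER = 0.95, 0.05

def time_without_stm(threads):
    # parallel part scales; stateful part stays single-threaded
    return PAR / threads + SER

def time_with_stm(threads):
    # everything scales, but the stateful 5% costs 4x more work
    return (PAR + SER * STM_SLOWDOWN) / threads

print(time_without_stm(4), time_with_stm(4))   # equal: break even at 4 threads
print(time_with_stm(8) < time_without_stm(8))  # True: STM wins past 4
```

Under these assumptions the two curves cross at exactly 4 threads, matching the claim; with more pessimistic scaling the crossover moves further out.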

How do the message-passing guys implement robust interactions between independent objects with mutable state, like the classic "bank transfer" example that justifies transactions in databases? You end up writing an endless set of ad hoc transaction-like message exchanges for each interaction that may occur in the system. Thus my argument that STM is the only productivity-preserving concurrency solution for the kind of problems we encounter in complex object-oriented systems that truly necessitate state, such as games.

message passing and transactions

The other 30% of our code (which accounts for 5% of performance) is necessarily so stateful that purely functional programming and message-passing concurrency are implausible

Can you help me understand why message passing concurrency is implausible for your domain? Isn't the OpenGL API itself a wrapper over a message passing architecture?

In the server/middleware space that I'm more conversant with, there are a large number of examples where mutability, concurrency and message passing are standard patterns. The Tuxedo transaction monitor has just a few primitives (tp_send, tp_enqueue etc.) and uses OS processes for isolation. It works for Visa and Amazon.

The Singularity OS is built on a foundation of message passing and isolated processes, where each process is written in a mutable style; there is no shortage of concurrency or mutability. (Incidentally, Tim Harris, who co-wrote the STM-Haskell paper, works on Singularity as well.)

How do the message-passing guys implement robust interactions between independent objects with mutable state, like the classic "bank transfer" example that justifies transactions in databases

The term "message passing" covers everything from occam-style synchronous messaging to MQSeries-style queued messaging, and robustness is a factor in both spaces, dealt with differently in each.

The bank transfer kind of example is easily handled by enqueuing a "transfer" message to a known teller object (this is service oriented architecture). An online store handles an order by enqueuing a message to the store to reduce the inventory. In such cases, each message send itself is an ACID transaction and an auditable action, so the act of transferring money may involve several DB transactions. I would hazard that more than 95% of transactions in the enterprise world are not between databases, but between a db and a messaging system.
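A sketch of that teller pattern in Python, with a plain in-process queue standing in for the transactional messaging layer (account names and amounts are invented): because the teller applies each transfer in full before dequeuing the next, every transfer is atomic with respect to the others.

```python
import queue

accounts = {"alice": 100, "bob": 0}
teller_inbox = queue.Queue()

def teller():
    """Applies each enqueued transfer completely before taking the next."""
    while True:
        msg = teller_inbox.get()
        if msg is None:                   # shutdown sentinel
            return
        src, dst, amount = msg
        if accounts[src] >= amount:       # check and transfer are one step
            accounts[src] -= amount
            accounts[dst] += amount

teller_inbox.put(("alice", "bob", 30))
teller_inbox.put(("alice", "bob", 90))    # refused: insufficient funds
teller_inbox.put(None)
teller()                                  # run inline for determinism

print(accounts)  # {'alice': 70, 'bob': 30}
```

In a real deployment each `put` would itself be a durable, ACID enqueue, as the comment describes; the atomicity of the transfer comes from the teller being the single owner of the account state.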

Another example of a high performance message passing architecture is SEDA, with explicit support for scheduling between stages.

Coming back to LtU, the languages for writing such systems are woefully backward. I am working on a pet project to attempt to fix it in a familiar imperative setting, which I hope to demo in a couple of months. I have learnt much from the Erlang effort. I like Erlang not so much for its language but because of its systemic features (the supervisor hierarchy, failure mechanisms, lightweight isolated processes, etc.)

Hear, Hear

Coming back to LtU, the languages for writing such systems are woefully backward. I am working on a pet project to attempt to fix it in a familiar imperative setting, which I hope to demo in a couple of months. I have learnt much from the Erlang effort. I like Erlang not so much for its language but because of its systemic features (the supervisor hierarchy, failure mechanisms, lightweight isolated processes, etc.)

I could not agree more. Looking forward to that.

Can you help me understand

Can you help me understand why message passing concurrency is implausible for your domain? Isn't the OpenGL API itself a wrapper over a message passing architecture?

Consider a game like Grand Theft Auto, where 10,000 or more objects move around and interact independently. Here, the objects are things like people, cars, weapons, props, etc. Each object has attributes that change over time, such as position, damage, relationships with other objects (who's carrying what), etc. At any point, any set of objects can potentially interact with each other in a stateful way, for example I can get in a car, drive it around, and run into a mailbox, damaging it.

Many of these interactions require atomic updates of groups of objects. For example, to fire a weapon, I need to first determine whether I'm carrying the weapon, whether the weapon has any ammunition, and then I need to create a bullet object. If a weapon has one bullet and two people try to fire it simultaneously, non-atomic updates might lead both players to conclude that they can fire the weapon, so two bullets are created and the gun is left with -1 bullets, an inconsistent state.
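This check-then-act race is easy to make concrete. The Python sketch below (a lock stands in for the atomic block, since stock Python has no STM) first replays the bad interleaving deterministically, then shows the atomic version refusing the second shot:

```python
import threading

class Gun:
    def __init__(self, ammo):
        self.ammo = ammo
        self._lock = threading.Lock()

    def fire(self):
        # check and decrement happen as one indivisible step
        with self._lock:
            if self.ammo > 0:
                self.ammo -= 1
                return True   # a bullet object would be created here
            return False

# Replay the bad interleaving by hand: with a window between the check
# and the update, both shooters pass the check before either decrements.
ammo = 1
check_a = ammo > 0
check_b = ammo > 0            # both see the last bullet
if check_a: ammo -= 1
if check_b: ammo -= 1
print(ammo)                   # -1: the inconsistent state described above

# With check-and-decrement made atomic, exactly one shooter succeeds.
gun = Gun(ammo=1)
results = [gun.fire(), gun.fire()]
print(gun.ammo, results)      # 0 [True, False]
```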

Basically, all of the atomicity arguments that have been applied to databases (e.g. the bank-transfer example) directly map onto the game example.

Therefore, you need some way to guarantee atomicity. Candidates:

• We do this today (on 3-core CPUs) by simply single-threading this kind of code.
• You could implement this using message passing, but you'd be writing and debugging your own ad-hoc transactional protocol for each of the tens of thousands of updates and state transitions present in the program.
• If our game were sufficiently simple, we could multi-thread it by carefully locking and synchronizing the objects at the appropriate point, but as software complexity scales up, the analysis of whether the program might eventually deadlock is intractable.

So, in this case, transactional memory is the most natural productive solution to this problem. Keep in mind, we live in a comparatively simple world, since we are always targeting a single multi-core CPU with shared coherent memory, and aren't concerned with databases, distributed computing, or fault-tolerance.

"but you'd be writing and debugging your own ad-hoc transactional protocol for each of the tens of thousands of updates and state transitions present in the program"

This seems to be simply a language user interface problem. If passing a message looks just like a function call, then designing an ad-hoc message passing protocol is no different than designing a function or object interface for these things (which you are already doing).
"Sending a message" (as it is called) to an object already looks like a function call in OOPLs. There's no reason sending a message to a channel/process/actor can't also look like a function call.

Bloat

The problem with implementing ad-hoc transactions using message-passing is that it bloats the code tremendously. In the single-threaded world or the STM world, you say "if(Ammo>0) {Ammo--; FireBullet();}". That would bloat up into tens of lines of asynchronous code to ask the recipient if he has ammunition, to reserve that ammunition, to finalize the interaction by effecting the ammo reduction, handle failure asynchronously, etc. Such code needs far more thought and testing than the STM version.
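For illustration, here is a hedged Python sketch (message names like "reserve" and "confirm" are invented, with queues standing in for an async runtime) of the kind of exchange that replaces the one-liner. Even with failure handling, timeouts, and multi-object interactions still omitted, it is already many times longer:

```python
import threading, queue

def gun_actor(inbox, ammo):
    """Implements an ad hoc reserve/confirm/abort protocol for firing."""
    reserved = 0
    while True:
        msg, reply = inbox.get()
        if msg == "reserve":                 # step 1: ask for the ammo
            if ammo - reserved > 0:
                reserved += 1
                reply.put("reserved")
            else:
                reply.put("empty")
        elif msg == "confirm":               # step 2: finalize the shot
            reserved -= 1
            ammo -= 1
            reply.put("fired")
        elif msg == "abort":                 # step 2': give the ammo back
            reserved -= 1
            reply.put("aborted")
        elif msg == "stop":
            reply.put(ammo)
            return

inbox = queue.Queue()
t = threading.Thread(target=gun_actor, args=(inbox, 1))
t.start()

def ask(msg):
    reply = queue.Queue()
    inbox.put((msg, reply))
    return reply.get()

r1 = ask("reserve")      # first shooter reserves the last round
r2 = ask("reserve")      # second shooter is refused
r3 = ask("confirm")      # first shooter completes the shot
remaining = ask("stop")
t.join()
print(r1, r2, r3, remaining)  # reserved empty fired 0
```

Every new interaction that must be atomic requires another such hand-written protocol, which is the productivity cost being argued here.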

A solution that imposes an order-of-magnitude code bloat, productivity decrease, etc, is unattractive.

async message passing syntax

[edit: actually I'm arguing more that async message passing doesn't have to be as code intensive as it is currently. I grant the atomicity problem is a problem.]

I don't see why message passing would have to be longer than your example using STM. Check out the syntax for Raph Levien's Io language. It makes continuation passing style easy to write. In his syntax, semicolon introduces a function. So the semicolon at the end of a statement is actually a function representing the continuation of the program. Channels are simply functions. A remote procedure call with an asynchronous continuation winds up looking just like sequential code.

level of abstraction

It seems as if you make the conceptual leap from "everything is an object" to "everything is a process". I don't see why the Erlang code would balloon the way you describe, based on my layman's knowledge of guns. Just like no one hanging out of a car window, shooting like mad, would be dealing with individual "bullet objects", your program ought to work at the level of "if (clip_empty()) {reload()}". If another actor has to get involved, say because the shooter doesn't have a spare clip in his pocket, a request is needed - in Erlang, that would most likely be a one-liner, not tens of lines of code.

There is no need for two-phase commits etc. just like there isn't in the corresponding real-life scenario. There is also hardly any call for specific error handling where the gun has to account for a scenario where the clip doesn't magically appear. The gun doesn't reload itself, does it? And the banking analogy doesn't hold, since you wouldn't have two shooters in different cars loading the same gun.

The thing about concurrency-oriented programming is that you normally go to real life for abstractions, and many of the good abstractions are quite simple. What would be the point of not using a simple counter to keep track of how many bullets remain in the gun? There is no interesting concurrency happening, but a simple relationship: (shot fired) -> (one less bullet). Make gun, clips, bullets into passive objects controlled by an actor - the shooter.

I want my 2pc

There is no need for two-phase commits etc. just like there isn't in the corresponding real-life scenario.

Dealing with race conditions in message passing scenarios has been motivation enough to adapt Software Transactional Memory to distributed transactions in Actors Model.

Besides, ideally software should be easier to reason about than real life.

I'm sort of curious what Patrick Logan would think about the idea that Message Passing doesn't solve the important problems solved by transactions.

You could implement this using message passing, but you'd be writing and debugging your own ad-hoc transactional protocol for each of the tens of thousands of updates and state transitions present in the program.

You said that critical point so clearly and distinctly! I'm very impressed.

The necessity for atomic updates is an application-specific thing. Given variables a and b, whether two otherwise sequential updates to them must be transactional depends on the semantics of the update within the application. It cannot be inferred, because the information only exists in the programmer's head. For the software to treat the updates atomically, it must be explicitly informed of the need.

Thus the smallest possible amount of information the programmer must give the application is that the updates have to happen together. In other words:
 atomic { update a; update b; } 

represents a floor on the amount of annotation. There can be no solution to atomicity that is any more concise than this. (The specific syntax is of course irrelevant; begin; ... commit; is equivalent.)

So that code has to exist somewhere. It will either 1) exist in client code, or it will 2) exist in the message-handler implementation of the "ad-hoc transactional protocol" mentioned above. Note that 1 is a proper subset of 2, and that the additional overhead of 2 is substantial, and must be paid again with each additional atomic update requirement.

Having used a variety of distributed computing techniques, including DCE, RMI, pub/sub, and raw TCP, I am convinced of the superiority of message passing, particularly pub/sub message passing. It is a great solution to the problems of shared-everything, as well as a great abstraction in its own right. However it is not a solution to atomic updates. The solution to the problem of atomic updates is transactions.

For example, to fire a

For example, to fire a weapon, I need to first determine whether I'm carrying the weapon, whether the weapon has any ammunition, and then I need to create a bullet object. If a weapon has one bullet and two people try to fire it simultaneously, non-atomic updates might lead both players to conclude that they can fire the weapon, so two bullets are created and the gun is left with -1 bullets, an inconsistent state.

I'm not sure that I understand why this would be a problem in a message-passing system. If the ammunition state of the gun is internal to the gun (as it should be, since it's a property of the gun), then the first "fire" message to reach the gun would result in the creation of a bullet object. The second "fire" message would fail to create a bullet, because the internal ammunition level of the gun would have already reached 0 as a result of the first "fire" message (in any sensible message-passing system, it is possible for actors to refuse to process a new message until they've completely processed earlier messages - that's one of the fundamental differences between actors and passive objects). You wouldn't end up with two bullets, or a negative ammo state. The only remaining problem is that two players think they own the same weapon, but I fail to see how STM would solve that problem either.
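That serialization can be sketched in a few lines of Python, with a Queue standing in for the actor's mailbox (the "bang"/"click" replies are my invention). Because the gun handles one message at a time, two simultaneous shooters can never drive the ammo count negative:

```python
import threading, queue

def gun_actor(mailbox, ammo):
    """Handles each 'fire' request fully before looking at the next one."""
    while True:
        reply = mailbox.get()
        if reply is None:        # shutdown sentinel
            return
        if ammo > 0:
            ammo -= 1
            reply.put("bang")    # a bullet object would be created here
        else:
            reply.put("click")   # no bullet, and never -1 ammo

mailbox = queue.Queue()
t = threading.Thread(target=gun_actor, args=(mailbox, 1))
t.start()

replies = []
for _ in range(2):               # two shooters fire 'simultaneously'
    r = queue.Queue()
    mailbox.put(r)
    replies.append(r.get())
mailbox.put(None)
t.join()
print(replies)  # ['bang', 'click']: exactly one bullet is created
```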

Am I missing something here? Can you please explain why you think a message-passing implementation (in say occam or Erlang) wouldn't preserve atomicity in this case?

Consider this example

You fire the bullet at a petrol tank. Potentially lots of other bullets, and other objects may be near this tank too. All of them will need to:

* Check if they hit the tank
* If so cause it to explode
* and deal the appropriate damage to themselves (e.g. modify bullet trajectory)

This needs to happen atomically. In other words, even if you decide that you didn't hit the tank and it should not be modified, there is no way of knowing this BEFORE you've read the location of the tank (and in case you do need to update it, you'd better have exclusive access!).

You could do this by having the tank object implement an ad hoc transaction protocol, but if you have lots of objects in the scene all examining this tank (99.9% of which do nothing to it), that would cause contention. Or you could optimize by doing a non-transactional check of the location to avoid acquiring exclusive access when you don't need it, but this is familiar territory from locks: in the real world you'll have much more complicated logic than this, where you really can't know whether you needed transactional semantics until you've basically done all the work already.

STM solves this because each thread can examine the object all it wants; it's only when the object is actually modified that collisions can happen. Message passing is great, but it doesn't really work well for everything.
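The read-mostly behavior can be sketched with a version counter (a seqlock-style scheme of my own, much simpler than a real STM but illustrating the same point): the 99.9% of objects that merely examine the tank never take a lock, and only actual writers pay for exclusion.

```python
import threading

class Tank:
    """Readers validate against a version stamp; only writers lock."""
    def __init__(self, position):
        self.position = position
        self.version = 0
        self.write_lock = threading.Lock()

    def read_position(self):
        while True:
            seen = self.version
            position = self.position      # no lock: readers never contend
            if self.version == seen:      # nothing committed in between
                return position
            # a writer got in: just re-read

    def move(self, position):
        with self.write_lock:             # only the rare writer pays
            self.position = position
            self.version += 1

tank = Tank((10, 20))
before = tank.read_position()   # the 99.9% case: a lock-free read
tank.move((11, 20))
after = tank.read_position()
print(before, after)
```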

Futures

I think that for games, actually using just straight futures might be the best approach, as you really don't want any observable concurrency, just a clean (optional input) -> output loop.

This is a better approach.

This is a better approach. At time T1, the player fires a bullet. The bullet object essentially ray-traces to see if it'll collide with another object, and at what time T2. It schedules a timer tick message to be sent to it at that time.

When receiving the timer tick at T2, the bullet then checks to see if the aforementioned object still exists. If so, kaboom! If not, ray tracing proceeds again, complete with a reschedule, until the bullet's range has been exceeded. Note that your locking, if needed at all, is constrained exclusively to one, and only one object at a time.

This is the basic algorithm for simulating electronics circuits in software (often without the use of multithreading, by the way), and since it's basically 100% event-driven, it can even be done entirely in a single process/thread. In fact, considering the sheer quantity of cycles invested in switching CPU state from thread to thread (even Erlang's is non-zero, folks), using a purely event-driven approach such as this just might be preferable.
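A minimal sketch of that event-driven loop in Python, using a heap-ordered event queue (the mailbox target and tick times are invented). Everything runs on one thread, so no locks or context switches are involved:

```python
import heapq

events = []       # min-heap of (time, sequence, callback)
counter = 0       # tie-breaker so equal times pop in schedule order

def schedule(time, callback):
    global counter
    heapq.heappush(events, (time, counter, callback))
    counter += 1

world = {"mailbox_exists": True}   # the bullet's intended target
log = []

def bullet_tick(t):
    if world["mailbox_exists"]:    # target still there at T2: kaboom!
        log.append((t, "hit"))
    else:
        schedule(t + 1, bullet_tick)   # re-trace and reschedule

schedule(1, lambda t: log.append((t, "fired")))
schedule(2, bullet_tick)           # T2 predicted by the ray-trace at T1

while events:                      # the entire 'simulation': one thread
    t, _, callback = heapq.heappop(events)
    callback(t)

print(log)  # [(1, 'fired'), (2, 'hit')]
```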

If you want to model the real world, then model the real world how it actually works. It's all pretty common sense, at least to me.

What a strange thing to say...

If I'm reading you correctly you're advocating an event-driven model (nothing wrong with that per se), but then you say:

If you want to model the real world, then model the real world how it actually works.

The real world is massively parallel -- not event-driven.

Massively parallel implies atomicity

I fail to see the problem that STM solves for the massively parallel objects in a game.

Tim Sweeney wrote:
http://lambda-the-ultimate.org/node/2048#comment-25105

The other 30% of our code (which accounts for 5% of performance) is necessarily so stateful
that purely functional programming and message-passing concurrency are implausible.
Thus I see imperative programming, in some form or another, as remaining an essential tool
for some systems present in modern software.

Without STM, the only tractable way to manage this code is to single-thread it. I'm not nearly
smart enough to write race-free, deadlock-free code that scales to 10,000 freely-interacting
objects from 1000 C++ classes maintained by tens of programmers in different locations.
How do the message-passing guys implement robust interactions between independent object
with mutable state...

http://lambda-the-ultimate.org/node/2048#comment-25151

At any point, any set of objects can potentially interact with each other in a stateful way,
for example I can get in a car, drive it around, and run into a mailbox, damaging it.

Many of these interactions require atomic updates of groups of objects.

sylvan wrote:
http://lambda-the-ultimate.org/node/2048#comment-40141

You fire the bullet at a petrol tank. Potentially lots of other bullets, and other objects may be near
this tank too. All of them will need to:

* Check if they hit the tank
* If so cause it to explode
* and deal the appropriate damage to themselves (e.g. modify bullet trajectory)

This needs to happen atomically.

Bárður Árantsson wrote:
http://lambda-the-ultimate.org/node/2048#comment-43313

The real world is massively parallel -- not event-driven.

How can a massively parallel world be built without orthogonal objects which are implicitly atomic?

In a massively parallel world, the atomicity should be implicit in the class design for orthogonal actors:

* Shooter requests the gun to fire
* Gun fires, atomically reduces its ammunition count, and replies to the shooter if necessary
* Bullet is an orthogonal object created by the gun
* Bullet does read-only access of other objects to determine its next hit
* Bullet requests the tank to hit it
* Tank records the hit, decides when it should explode, sends damage requests to nearby objects, and replies to the bullet as necessary
* Bullet records whether it is still active (moving or not)

Any other design would not be OO-orthogonal (you would have global logic trying to manage groups of objects) and thus would not be able to handle all scenarios that may arise in the massively parallel world. Composability comes from correct OO orthogonality.

Where are the race conditions? If there are any race conditions, i.e. nearby objects absorb/alter the shockwave of the explosion for other nearby objects, then STM does not help you with the needed logic.

It seems to me that as games scale to be more massively parallel (more permutations of things can happen) along with the power of more CPU cores, the class design is going to have to be atomically orthogonal anyway.

In short, any global algorithms need to be replaced by iterative atomic ones, i.e. the objects absorbing the shockwave reply to the tank's explosion object, the explosion object iteratively alters its shockwave, and the explosion object is responsible for traversing the objects in an order which correctly models the perturbation of the shockwave. If shockwave propagation is a global solution matrix, then you can argue that transactions are needed on a collection of nearby objects, but then how do you handle the unexpected event of another explosion nearby in the interim? I think ultimately massive parallelism resolves to iterative atomic algorithms.

I am not a game programmer (yet). Have I missed something?

Massively Parallel Worlds

One can still support global rules in a massively parallel system. Rules-based programming, for example, can easily be massively parallel - there is an inherent parallelism principle involved: parallelize evaluation of the conditions (which might be defined by pure logic or functions of arbitrary complexity), and parallelize execution of the events. Transactions, in this case, could serialize the triggered events to give programmers a handle on grokking rules interactions. Transactions would run in parallel in the absence of any conflicting operations.

Of course, you likely wouldn't want to define a shockwave as something 'instantaneous'. In reality, shockwaves and fireballs take milliseconds once they hit air. In games, shockwaves and fireballs last for seconds. To deal with conditions like this, you should probably create a shockwave/fireball-object in the world's database and let it influence the world for a little while before dying out.

A neat little tweak on the whole 'rules-based programming' is to support arbitrary reactive programming and data-binding - not just for the conditions defining the rules, but also for the state of the world itself. This would allow you to logically express, say, the position of a bullet or shockwave as a function of time, then trigger rules automatically over time based on the logical 'expansion' of the shockwave. One might also bind to external ('real world') data and event sources. Players would just be one example.

In any case, there are too many convenience advantages to defining 'gameplay rules' (including physics) at a global level, using rulebooks and such, for me to readily give them up. And I'd say that the ability to have gameplay operations reference and impact multiple gameplay 'objects' is a relatively critical feature - not something to easily abandon.

See the Inform language, which isn't parallel but does offer an example of rules-based gameplay development.

Example rule?

Could you give me even one example of a rule that requires locking the state of orthogonal instances across its update of each of the affected instances?

The visualization in my mind is that real-world rules interact incrementally; for example, with physics, once you've applied a reaction force to an object, there is a domino chain reaction of reaction forces as the physics calculation is repeated for all involved (nearby) objects. So I still fail to visualize an example of how a global rule is beneficially applied to numerous objects in one atomic step. Rather, I visualize that a global rule is applied as atomic incremental steps on interacting instances (actors).

If I am correct or for algorithms where I am correct, then it seems we need the language to lock the class instance so that it blocks synchronous (re-entrant) access and queues threads asynchronously, i.e. the transaction granularity is per-instance?

If there are global (batch) algorithms that require multi-instance (set-of-instances) transaction granularity, then we will need some transaction mechanism for that, but I do not yet visualize even one example of such an algorithm. One potential example is if physics algorithms calculate all the forces on all the interacting objects at one point in time simultaneously; in this case, the algorithm needs to tell the language which instances are blocked for the atomic operation. I do not see how a general STM memory-scale granularity is optimal, because the algorithm's granularity is still per-instance, applied over a set of instances. It seems to me that for efficiency a run-time check of a per-instance lock flag (mutex) occurs only once per thread's atomic (non-re-entrant due to mutex) method call; whereas a general STM memory-scale protection will involve a run-time check on every write to the class instance's data by the same thread.

In short, we need per-instance mutexes, automatically enforced by the language for single-instance atomicity, with language support for multi-instance transactions.

Note I started studying all the concepts (after reading Tim Sweeney's paper about future of game programming) about 24 hours ago, i.e. lazy vs. strict/lenient, imperative vs. functional, etc.., so perhaps I am incredibly wrong. My knowledge of functional programming is at earliest stage of conceptual awakening, and no experience other than foreach() in imperative OO languages and some HaXe (O'Caml derivative ActionScript 3 clone). Just sharing.

Batch updates emulate the asynchronous nature of real world

I realized the update order of instances is perhaps the reason we need batch (multi-instance, transaction) updates, but I am not yet convinced we can't adjust algorithms to per-instance incremental update propagation:

http://lambda-the-ultimate.org/node/3637#comment-51620

Rules, Actors, and STM

The visualization in my mind is that real world rules interact incrementally, i.e. for example with physics once you've applied a reaction force to an object, then there is a domino chain reaction of reaction forces as the physics calculation is repeated for all involved (nearby) objects.

In rules-based programming, describing 'incremental' simulations is not difficult.

For example, you might define a program that involves rules for what happens when one domino collides with another. In such a collision, the output behavior will want to look at and modify the state (velocity) of two different dominos. Given appropriate rules for determining collisions, one could knock one domino down and see them all fall.

Even for an incremental program, where each operation involves an interaction between only two elements, it is no less valuable that each operation on two dominos be atomic rather than suffering interference from yet more operations whilst halfway through completing the physics computations.
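For comparison, a Python sketch of that two-domino operation done with plain locks (the "physics" is a toy velocity transfer of my own): the handler must acquire both locks in a global order to stay deadlock-free, which is exactly the bookkeeping an atomic transaction would absorb.

```python
import threading
from itertools import count

class Domino:
    """Each domino carries its own lock and a global ordering id."""
    _ids = count()
    def __init__(self, velocity):
        self.id = next(Domino._ids)
        self.velocity = velocity
        self.lock = threading.Lock()

def collide(a, b):
    """Atomically update BOTH dominos; lock in id order to avoid deadlock."""
    first, second = (a, b) if a.id < b.id else (b, a)
    with first.lock, second.lock:
        # toy physics: a transfers all of its velocity to b
        a.velocity, b.velocity = 0.0, a.velocity + b.velocity

d1, d2 = Domino(1.0), Domino(0.0)
collide(d1, d2)
print(d1.velocity, d2.velocity)  # 0.0 1.0
```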

But to address your overall argument: your focus on 'real world' parallelism does not address a simple reality in the context of gameplay: games aren't the real world. Programmers need to describe gameplay rules, which may bear little relation to physics. For example, 'after the golden cat icon is placed on the pedestal, all cats in town become feral'. Not much physics there, eh?

If I am correct, or for the algorithms where I am correct, then it seems we need the language to lock the class instance so that it blocks synchronous (re-entrant) access and queues threads asynchronously, i.e. so that the transaction granularity is per-instance?

The model you describe is common to Actors Model implementations. It does serve well for describing program components (first-class processes) and supports object-capability security, distribution, concurrency.

I favor an asynchronous actors model (no queues, allows replicated distribution) with a primitive 'cell' for state across operations, protected by transactions.

I would not recommend Actors Model for representing domain or gameplay objects (tanks, bullets, etc.), though I'd rather not go into the (many) various reasons here. It is my assessment that OOP for domain modeling is a lot like pushing a cart with a rope.

As a point, you shouldn't equate transactions to locking. You might wish to look into optimistic concurrency.

It seems to me that, for efficiency, a run-time check of a per-instance lock flag (mutex) occurs only once per thread's atomic method call; whereas a general STM memory-scale (granularity) protection will involve a run-time check on every write to the class instance's data by the same thread.

STM's performance for isolated update to an isolated object is somewhat limited, true.

On the other hand, favoring STM allows you to avoid the actor's message-queue, which allows local computation (e.g. on a stack) and allows 'inlining' for references to a fixed actor, and even allows 'replies' from actors without risk of deadlock wait cycles. Further, a single actor can process many operations simultaneously (and roll back only for actual conflicts as opposed to blocking for potential ones), and a stateless asynchronous actor can effectively be replicated.

So, while STM may seem to be on the flaky side for performance, STM may optimize and scale better than queued messages for atomic actors. Then, of course, there are its many flexibility and safe composition advantages.

Still leaning towards the per-instance granularity for OO code

...Even for an incremental program, where each operation involves an interaction between only two elements, it is no less valuable that each operation on two dominos be atomic rather than suffering interference...

This is still per-instance mutex atomic as I proposed, because the interaction between the two is the method call and reply.

...For example, 'after the golden cat icon is placed on the pedestal, all cats in town become feral'...

It is still per-instance mutex atomic, because one possible correct OO design is the global cat feral state should be a static member of cat class. The instances could override, and should check the timestamp of their override to see if the global member has later timestamp and thus priority.

And if the feral global state is placed in a parent class, or in a child instance that the cat holds a static reference to, then composability improves because new programming in the future can leverage the global feral state. The important point is that composability improves with correct OO design, and thus I don't think it is hard, nor desirable to avoid it by trying to find some lower-level platform method of hiding the complexity; that will just lead to composability spaghetti. In short, as I wrote in the other thread today about the universal trend of entropy: the more global management code (the less work one has done to achieve correct OO design), the lower the re-usability of the code, the shorter the half-life of the code, and the faster the local exponential peak and decay.

So we need to be thinking more about correct design and algorithms. Correct OO design for multi-threading can't be solved with concepts that are not well matched to it (i.e. functional programming is not OO state programming, and STM granularity is not OO per-instance granularity).

I would not recommend Actors Model for representing domain or gameplay objects (tanks, bullets, etc.), though I'd rather not go into the (many) various reasons here. It is my assessment that OOP for domain modeling is a lot like pushing a cart with a rope.

As a point, you shouldn't equate transactions to locking. You might wish to look into optimistic concurrency.

Optimistic concurrency is smaller granularity than per-instance

...but at a cost in overhead, plus perhaps encouraging sub-optimal OOP design.

As a point, you shouldn't equate transactions to locking. You might wish to look into optimistic concurrency.

If threads are not OFTEN able to modify the same object (instance) AND NOT modify the same data members of the object, then optimistic concurrency is worse (than mutex-locking the entire instance) due to the additional overhead of per-data-member protection (comparison or checksum). Thus I conclude optimistic concurrency is only superior to per-instance locking when the atomic algorithms are OFTEN more granular than per-instance. Further, it will only be superior if the blocked threads OFTEN have no other equivalent-priority alternative work to do instead, i.e. if the blocked thread is not choosing randomly among which of its siblings to update incrementally.

This is an example of why we need to think carefully about algorithms. The chosen platform/language has to match well the granularity of propagation of the algorithms we choose.
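To make the trade-off concrete, here is a toy sketch (my own illustration, not a real STM) of optimistic concurrency as a version-validated read-compute-commit loop; the `retries` counter is exactly the non-optimistic-case overhead discussed above:

```python
import threading

class VersionedCell:
    """Toy optimistic cell: readers pay nothing up front, writers
    validate a version number at commit and retry on conflict."""
    def __init__(self, value):
        self._value = value
        self._version = 0
        self._commit_lock = threading.Lock()  # guards the commit point only
        self.retries = 0                      # count of optimistic failures

    def update(self, f):
        while True:
            # optimistic read: snapshot value and version without blocking
            snapshot, seen = self._value, self._version
            new_value = f(snapshot)           # compute outside any lock
            with self._commit_lock:
                if self._version == seen:     # nobody wrote in the meantime
                    self._value = new_value
                    self._version += 1
                    return new_value
            self.retries += 1                 # conflict: redo the work
```

Under heavy contention on one instance, each retry repeats the whole computation, which is the cost weighed against plain per-instance locking above.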

STM's performance for isolated update to an isolated object is somewhat limited, true.

But we must qualify "isolated", thus ditto above, with attention on the bolded logic.

On the other hand, favoring STM allows you to avoid the actor's message-queue, which allows local computation (e.g. on a stack) and allows 'inlining' for references to a fixed actor, and even allows 'replies' from actors without risk of deadlock wait cycles. Further, a single actor can process many operations simultaneously (and roll back only for actual conflicts as opposed to blocking for potential ones), and a stateless asynchronous actor can effectively be replicated.

I am not privy to the overall design you are using for "Actors Model", so the debate of the applicability of STM will hinge on my ability to analyze alternative ways to organize the design, potentially similar to the curve ball examples I threw in the prior post.

For example, I am proposing that the mutex be language-driven at the per-instance granularity, thus I am not visualizing what the message queue is for. Also I am proposing that the reply is the return from a method call from one instance to another, thus there is no wait-deadlock potential. In my current visualization, the interaction between an instance and its world is through a single thread (the per-instance mutex), so why would it need to spawn multiple interactions as messages instead of completing each one atomically? If there is indeed some need to integrate over multiple instances, then we need multi-instance transactions, and I have argued above that per-instance granularity is probably what is needed (or STM, if the granularity is OFTEN sub-instance data and otherwise blocks priority work). Thus I am visualizing that we do not send a message to each instance involved, but rather collect them in a mutex collection (or other language construct) that locks their per-instance mutexes, and then do array operations on all of them. What is to be gained by the sub-instance data-member granularity of STM, given my logic?

STM even worse for multi-instance transactions?

The cost of the bolded logic in the prior post is even greater with multi-instance transactions, because a rollback in the non-optimistic case will potentially undo multiple instances' worth of work. The likelihood of non-optimistic collisions probably increases with the number of instances in the transaction, and I suspect it increases non-linearly.

Mental wrote:

Having written several implementations of both STM and various message-passing-based concurrency models for Ruby lately, I'm a lot less sunny on STM than I used to be even a few weeks ago.

I was having fun implementing STM until I realized that I was able to implement the Actor model correctly in a few hours, versus several weeks for getting the fiddly aspects of STM down.

The biggest immediate problem for STM is starvation: a large transaction can just keep retrying, and I'm not sure there is a way to address that without breaking composability. ...and composability is the whole raison d'être of STM in the first place.

Transactional Actors Model

A prototypical example of a race condition:

* Actor B accepts the vocabulary 'getBalance', 'withdraw (amount)', and 'deposit (amount)'.
* Actors C and D, concurrently, want to withdraw half of the current balance. So each of them calls 'getBalance', divides the returned amount by two, then withdraws that amount.
* The result: based on interleaving of 'getBalance' and 'withdraw' messages, the final balance could be at 3/4 its original balance, or it could be at zero.
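The interleaving above can be written out deterministically in a few lines of Python (a toy sketch; the `Account` class here is my own illustration):

```python
import threading

class Account:
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()
    def get_balance(self):
        return self.balance
    def withdraw(self, amount):
        self.balance -= amount

# The racy interleaving, written out explicitly: C and D both read
# the balance before either withdraws.
racy = Account(100)
c_half = racy.get_balance() // 2   # C sees 100, plans to take 50
d_half = racy.get_balance() // 2   # D also sees 100, plans to take 50
racy.withdraw(c_half)
racy.withdraw(d_half)              # balance is now 0, not 25

# The fix: hold the instance lock across the whole read-compute-write
# sequence, so each 'withdraw half' is atomic.
safe = Account(100)
def withdraw_half(acct):
    with acct.lock:
        acct.withdraw(acct.get_balance() // 2)
withdraw_half(safe)                # 100 -> 50
withdraw_half(safe)                # 50 -> 25
```

Holding the lock across the read-compute-write sequence makes the get-then-withdraw pair atomic per caller, which is the manually declared atomic block discussed in the surrounding comments.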

The automated per-method-call mutex that I envisioned would also not automatically solve the race condition above, unless the getBalance() conditional logic were merged into the withdraw() method call (message). (However, note my realization at the bottom.)

The atomic transaction (per-instance, or STM optimistic concurrency over the entire instance data) will otherwise have to be declared manually at the block containing both the getBalance() and withdraw() method calls. The disadvantage (non-optimistic-outcome cost) of STM optimistic concurrency is that everything done in the containing atomic block declaration has to be undone, because there may have been domino effects from the wrong assumption about the value of getBalance().

The per-instance mutex locking seems preferable. The general rule is that if you call methods that do not return instance state, then the language compiler can set and unset the mutex automatically, so that no other threads operate on the instance concurrently. If we call methods and cache (store) state locally, then we need to hold the per-instance mutex (of the instance the state came from) until we discard that state.

Perhaps the language could do this block atomicity automatically also? If all class state is only revealed through getters, then the compiler can track the assignment of getters to local-scope (usually stack) variables, and derivative computations, and release the mutex automatically at block exit (or even earlier). The problem is the assignment of a getter (or a derivative of it) to external scope (i.e. a closure), which could cause divergence or a deadlock on the mutex. Thus it should be a compiler error to create a closure over the getter (or its derivative), or to pass it into a function that does. This closure problem will manifest as uncaught race errors when we do atomic block declarations manually and do not hold the mutex while external copies of the state still exist.
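A sketch of the proposed rule in today's terms: a block-scoped guard (my own `Locked` helper, purely illustrative) that holds the instance's mutex for exactly as long as the cached getter state is live on the stack:

```python
import threading

class Locked:
    """Sketch of the proposed rule: while a block holds cached getter
    state from an instance, it holds that instance's mutex; the lock is
    released exactly at block exit, when the cached copies go dead."""
    def __init__(self, obj):
        self.obj = obj
    def __enter__(self):
        self.obj._lock.acquire()
        return self.obj
    def __exit__(self, *exc):
        self.obj._lock.release()

class Account:
    def __init__(self, balance):
        self._lock = threading.RLock()
        self.balance = balance

acct = Account(100)
with Locked(acct) as a:          # cached state below is valid for this block
    half = a.balance // 2        # 'getter' result cached on the stack
    a.balance -= half            # safe: no other thread ran in between
```

What the comment asks the compiler to do automatically is, in effect, to insert this `with` block wherever getter results are cached, and to reject code that lets `half` escape the block via a closure.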

Probabilistic preferred to fatality?

It can easily become more complicated. For example, Actor B may need to negotiate with three actors U, V, W to perform a withdrawal, any of which might be similarly disrupted due to message interleaving, may require coordination from B (message sent to U might depend on responses from V, W), and might need to be reversible (response from U might require reversing an operation started with V, W)

If the aforementioned read-state lock is implemented, then afaics the only potential error due to multi-instance interaction is the deadlock where the locks prevent the interaction from completing and the locks from releasing. And this deadlock (or starvation, for optimistic rollbacks) risk is going to apply whenever multi-instance transactions are performed (no matter which concurrency mechanism is employed, e.g. per-instance mutex or STM optimistic), so this is another reason why we need to get the OO correct, and atomic operations need to be as narrow as possible.

It might be better to do no concurrency locks or rollbacks at all, and assume that the errors due to out-of-date cached instance state are small compared to the risk of deadlocks (which are fatal), especially if we design our atomic blocks to be as narrow as possible. The behavior becomes more probabilistic (like nature!) and less deterministic, but at least deterministic in the sense that deadlocks are assured never to happen if we never lock, and race-starvation divergence never happens if we never roll back. This is interesting, as the move to multi-core may be analogous to how the models of the universe got more probabilistic as we moved from the spacetime to the quantum scale:

My Theory of Everything (near bottom)

Is there any way to get determinism with cached instance state, and also determinism of termination, without deadlocks and infinite-rollback starvation divergence? I suspect not if there is any global state, not without 100% (infinite) regression, which takes me right back to my point that since we cannot sample for infinite time, we never know the deterministic truth. It seemed to piss off some people that science is faith, but alas it is. Evidence is convincing only in the frame of reference where it is, which is a small subset of the infinite universe (even of the infinite permutations of interactions/perceptions/realities occurring simultaneously here on earth).

Note that as the number of cores (threads) tends toward the infinity asymptote, the atomic operations must become asymptotically infinitely narrow (granular); thus the errors due to not locking or rolling back on cached state would asymptotically approach 0, and the prediction from My Theory of Everything would be fulfilled:

As the measuring capabilities have been proliferated into the quantum realm, the quantum shared reality has become probabilistic, i.e in between deterministic and random. I will agree with Einstein, that the quantum evidence is "silly", and offer the plausible conjecture/prediction that deterministic perception of matter at the quantum scale is possible, because if the current measuring devices are sampling below the period and frequency required by Nyquist, then random aliasing effects are observed. At quantum granularity, the entropy (i.e. disorder, information content) is much greater than at the space-time scale, thus providing a window to many more possible perceptions (realities). In short, my conjecture is that the frequency of the signals at quantum scale is much lower and/or higher than our current measuring devices can detect deterministically.

In short, we have a sampling theory problem.

==========
An optimization might be to do per-instance locking as per my prior post, then break deadlocks and accept the error. The advantage over doing no concurrency coordination at all (as proposed in this post, where the error trends toward infinitely small as we reduce the atomic time we cache state and improve our OO design to accommodate the trend toward infinite cores) comes at the cost of the (out-of-order execution) errors being greater, due to the longer time elapsed before we detect and break a deadlock.

No global means "shared nothing"

this deadlock (or starvation, for optimistic rollbacks) risk is going to apply whenever multi-instance transactions are performed (no matter which concurrency mechanism is employed, e.g. per-instance mutex or STM optimistic)

Mature implementations of database transactional systems tend to use a hybrid combination of optimism and pessimism in order to control the slider between 'progress' and 'parallelism'. The same can be applied to STM by, for example, becoming more pessimistic after the first roll-back, or by profiling (at both call-site and resources) statistics for how successful optimism has proven in the recent past. Pessimism may also be applied to only the specific resources that have caused conflicts in the past.

Deadlock is possible with locking transactions, of course, but transactions allow one to detect such deadlocks and safely break them (via aborting them), and do so automatically.

Deadlock caused by cyclic waits among actors can also be detected and broken, but the local semantics of the program are harmed by a non-local error. The cost is usually a lost message or two at some arbitrary point in the cycle. (If you're using futures, you can just break the future but still process the message.) One can work around such errors, but it does make composition more challenging (requiring self-discipline and deeper knowledge of implementation details).
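As a toy illustration of abort-based deadlock breaking (the names and timeouts are my own, not from any particular STM), here is a two-account transfer that releases everything and restarts when it cannot acquire both locks, instead of waiting forever:

```python
import random
import threading
import time

class Account:
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()

def transfer(src, dst, amount):
    """Try to lock both accounts; on failure, release everything and
    restart, i.e. 'abort the transaction' rather than deadlock."""
    while True:
        if src.lock.acquire(timeout=0.01):
            try:
                if dst.lock.acquire(timeout=0.01):
                    try:
                        src.balance -= amount
                        dst.balance += amount
                        return
                    finally:
                        dst.lock.release()
            finally:
                src.lock.release()
        # back off a random sliver so two aborted transfers
        # don't re-collide in lockstep
        time.sleep(random.uniform(0, 0.002))
```

Opposite-direction transfers can interleave their lock acquisitions in the classic deadly-embrace pattern, but here the timeout plays the role of the transaction system's deadlock detector: the loser aborts cleanly (no partial update has happened yet) and retries.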

Shelby Moore III wrote:

Agreed, as long as we accept the potential out-of-order execution error introduced by breaking them (with pessimistic locks we can't roll back the changes already accrued by the deadlocked threads that are chosen to lose their locks)...

I am not privy to the

I am not privy to the overall design you are using for "Actors Model" [...] I am proposing that the mutex be language driven at the per-instance granularity, thus I am not visualizing what the message queue is for? Also I am proposing that the reply is the return from a method call from one instance to another, thus there is no wait deadlock potential.

Waiting on a mutex puts the whole thread (including the message) into the mutex's queue. This approach is avoided because it is actually quite rare that actors need a reply from the method. Thus, Actors model is more often implemented such that the outgoing message ends up in a queue by itself, allowing the caller to move on and perform more activity.

Still, support for replies is quite popular. So many implementations of Actors model allow you to wait for a reply, either as part of the call or via a 'future' [wikipedia] that may be waited upon after performing some other activity.

If an actor can BOTH wait on replies AND delay incoming messages while processing a message, then it can deadlock given any cyclic message path. Such cycles are often difficult to statically eliminate in a composable system.

[...] OFTEN able to modify the same object (instance) AND NOT modify the same data members [...]

Even if there is a single point of contention, STM can still perform well... but it does require some extra support for 'merging' and for identifying when the functional processing after looking at the state results in the same 'answer'. (STM can readily distinguish read-only, read-write, and write-only operations for any given unit of state, though the latter requires that writes be a T->T transform returning unit to really be of utility.)

Additionally, many read-only transactions can work with the same object without any risk of conflict, especially given multi-version concurrency control.
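A toy sketch of the multi-version idea (my own `MVCCCell`, not a real MVCC implementation): writers append versions, and a read-only transaction pins the version current at its start, so it never blocks a writer and never conflicts:

```python
import threading

class MVCCCell:
    def __init__(self, value):
        self._versions = [value]       # full version history
        self._lock = threading.Lock()  # serializes writers only

    def write(self, value):
        with self._lock:
            self._versions.append(value)

    def snapshot(self):
        # a read-only transaction pins the newest version id at start
        return len(self._versions) - 1

    def read(self, version):
        # readers never take the lock; old versions are immutable
        return self._versions[version]

cell = MVCCCell('domestic')
v = cell.snapshot()               # read-only transaction begins
cell.write('feral')               # a writer commits concurrently
old = cell.read(v)                # still 'domestic': the pinned snapshot
new = cell.read(cell.snapshot())  # 'feral': a later transaction's view
```

The reader's snapshot stays internally consistent for its whole transaction even while writers commit, which is why read-heavy workloads on a shared object are cheap under MVCC.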

Dodging feral cats

...For example, 'after the golden cat icon is placed on the pedestal, all cats in town become feral'...

It is still per-instance mutex atomic, because one possible correct OO design is the global cat feral state should be a static member of cat class.

I was assuming you need to make each cat 'domestic' again as part of the game.

In any case, rules tend to interact with one another, and may be arbitrarily complex. It is not unusual for a single action to influence more than one object, especially when working with 'relationships' between objects, or interacting with an 'environment'.

The important point is that composability improves with correct OO design

Given that 'correct OO design' is pretty much defined by design patterns that meet certain composability requirements, I'll agree. But that's a trivial claim, really, since composability improves with good design no matter which paradigm you're using.

Correct OO design for multi-threading can't be solved with concepts that are not well matched (i.e. functional programming is not OO state programming, and STM granularity is not OO per-instance granularity).

I would suggest the contrary. OO design is not complemented all that much by concepts that very closely 'match' it, since those are exactly the concepts one could readily implement atop OO.

Most paradigms, including OO, are very well complemented by orthogonal concepts. These include pure functional programming for composing and analyzing immutable messages and instance state; logic programming for domain-logic processing; reactive expressions (both logic and functional) for liveness, performance, and disruption tolerance; event distribution, subscriptions, and 'plumbing' for efficient multi-cast, runtime configurability, and demand-driven support; syntax extensions and declarative meta-programming for compiling domain-specific stuff into OO; transactions for regulating multi-object interactions; persistence for scalability; etc.

OOP concepts extend to external order dependencies

In any case, rules tend to interact with one another, and may be arbitrarily complex. It is not unusual for a single action to influence more than one object, especially when working with with 'relationships' between objects, or interacting with an 'environment'.

But that doesn't necessarily mean the implementation of the rules must depend on order-of-execution. Marking all the cats feral or domestic is not very order-dependent in and of itself.

Relevant posts:

Given that 'correct OO design' is pretty much defined by design patterns that meet certain composability requirements, I'll agree. But that's a trivial claim, really, since composability improves with good design no matter which paradigm you're using.

I am referring not to the typical class-OOP principles (inheritance, encapsulation, etc.), but to the more general goal of OOP to minimize inter-class dependencies, which includes order-of-execution inter-class dependencies.

Another relevant post:

Is (up to WAN) propagation of transactions desirable?

I would suggest the contrary. OO design is not complemented all that much by concepts that very closely 'match' OO it, since those are exactly the concepts one could readily implement atop OO.

Most paradigms, including OO, are very well complemented by orthogonal concepts...transactions for regulating multi-object interactions...

I do not disagree about using the best tool for the job, nor that there are tools which complement OOP. But I was writing about optimum multi-threading design, not about complements for other goals (your quick synopsis of complements is appreciated, though). My point is that eliminating/reducing the need for transactions is the universal goal, because the less a class depends on external orderings, the more composable it will be, and the less those orderings can propagate so widely that conflicts reduce the distributed program to least-common-denominator brittleness.

Doesn't work the way you've

Doesn't work the way you've presented it: any good FPS player who's been playing a while has psyched someone into walking into a projectile they weren't otherwise going to walk into, often in reaction to that projectile's being fired in the first place! More generally, things being shot at move, and you don't actually know when the collision will take place, even if they do still collide.

Works only in some fairly restrictive conditions

This is a better approach. At time T1, the player fires a bullet. The bullet object essentially ray-traces to see if it'll collide with another object, and at what time T2. It schedules a timer tick message to be sent to it at that time.

That works only if 1) the original target can't unpredictably move closer to the firing point while the projectile is traveling and 2) nothing else can unpredictably move into the path during the interval between T1 and T2.

Transactional Java Virtual Machine

Hi,

I'm not sure if this thread is still active, but I know Patrick Logan because he used to work at my company GemStone Systems. In Patrick's article he briefly mentioned how GemStone has successfully been selling STM for many years. In essence, our technology IP here is that we optimize the standard Sun Java virtual machine and make it transactional, i.e., we can transparently fetch and store objects from disk and let multiple threads and processes share data in a consistent manner. No bytecode rewriting and AOP hacks are required since the virtual machine manages read/write barriers and any class can be made orthogonally persistent.

Though not originally targeted for game simulation engines that share lots of state between threads on a many-core system, it might be viable to utilize such transactional VM technology as a simple-to-program, main-memory, game engine based on optimistic concurrency with full WW and RW conflict detection and resolution. Of course if transaction durability is required, there's persistence too. That might help with game scalability.

Just a thought from a non-gamer (last time I played a video game was Atari 2600) and non-game programmer (last time I programmed a video game was using shape tables on an Apple II).

Welcome to LtU. Threads here remain active for as long as there's interest...

Discrete Simulation

My understanding of game technology (particularly game AI) is that processing occurs in discrete chunks, with a message-receipt & decision phase followed by an update phase. In this model, where is the need for STM? Can't you simply queue up updates in the read & decision phase and execute them serially in the update phase, partitioning updates to different data regions among threads?

In this way, there is no need for atomic blocks since all updates within a discrete interval happen in a single update phase, and there can be no conflicting cross-thread reads or updates.
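That two-phase model can be sketched in a few lines (all names here are my own illustration of the pattern, not from the cited papers):

```python
# Sketch of the two-phase discrete model: all decisions read a frozen
# snapshot of world state, updates are queued, then applied in a
# separate phase, so no atomic blocks are needed within a tick.
def tick(world, agents):
    updates = []
    # Phase 1: read & decide against the (conceptually immutable) snapshot
    for agent in agents:
        updates.extend(agent.decide(world))     # pure reads, no writes
    # Phase 2: apply updates serially (or partitioned by key across threads)
    for key, value in updates:
        world[key] = value
    return world

class Cat:
    def __init__(self, name):
        self.name = name
    def decide(self, world):
        # if the icon is on the pedestal, this cat decides to go feral
        if world.get('icon_on_pedestal'):
            return [(self.name, 'feral')]
        return []

world = {'icon_on_pedestal': True}
tick(world, [Cat('tom'), Cat('felix')])
```

Because no decision in phase 1 can observe a phase 2 write from the same tick, there are no conflicting cross-thread reads or updates to protect, at the cost of a one-tick latency on every state change.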

I mention this model because Tim does not include it among his three candidates. I am keen to hear his response, if he's still out there.

Disclaimer: I am not a game programmer. My reference for the discrete model is Steve Rabin's "Designing a general robust AI engine" in Game Programming Gems Vol. I, and also the SIGMOD '07 paper "Scaling Games to Epic Proportions".

Math check

... it makes 5% of my performance profile become 4X slower. I break even at 4 threads,...

If the STM portion uses 5% of four processors, then shouldn't you be comparing STM to 20% of a single processor instead of 5% of each processor? I suppose this assumes something about the amount of work that must occur between world updates, but in the extreme case where you'd need 100% of one processor to run the game logic, it looks like you'd need 20 processors before requiring STM (assuming the 5% stays constant). Edit: And then it's not that you ever "break even" with STM - the single threaded model just hits a wall.
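The arithmetic can be checked directly by treating the 5% stateful portion as a serial fraction in Amdahl's law (an assumption the comment above also makes in supposing the 5% stays constant):

```python
serial = 0.05   # stateful fraction of total work, assumed constant

def speedup(cores):
    # Amdahl's law: the serial part never parallelizes
    return 1 / (serial + (1 - serial) / cores)

# The serial 5% saturates one core when total work equals 20 cores' worth:
cores_at_saturation = 1 / serial          # ~20

# At that point overall throughput is about 10x a single core:
s20 = speedup(20)                         # ~10.3
```

This matches the comment's point: the single-threaded model does not gradually "break even" against STM; it simply hits a wall once the serial portion occupies a full core.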

STM risks

Tim Sweeney wrote:

The other 30% of our code (which accounts for 5% of performance) is necessarily so stateful that purely functional programming and message-passing concurrency are implausible. Thus I see imperative programming, in some form or another, as remaining an essential tool for some systems present in modern software...

...Without STM, the only tractable way to manage this code is to single-thread it. I'm not nearly smart enough to write race-free, deadlock-free code that scales to 10,000 freely-interacting objects from 1000 C++ classes maintained by tens of programmers in different locations.

With STM, I can continue writing software at full productivity...

...STM is the only productivity-preserving concurrency solution for the kind of problems we encounter in complex object-oriented systems that truly necessitate state, such as games.

If STM rollback collisions are not infrequent, then it is possible that real-time events can suffer. Also you have to isolate the propagation (to external resources and modules) of rollback dependencies.

Hacking away composability?

Additionally if the rollback dependencies got into your 70% (95% of performance) functional programming code, it could be worse perhaps than just running the 30% (5% of performance) code single-threaded.

RE: STM Risks

if the rollback dependencies got into your 70% (95% of performance) functional programming code, it could be worse perhaps than just running the 30% (5% of performance) code single-threaded

The reason for the small time spent in imperative code is that 5% is all the time it takes to do something with the result of the last functional computation and obtain parameters for the next functional computation.

One cannot just 'run the 30% of code single-threaded' and still readily parallelize the other 70%. Interdependence interferes. Functional code doesn't internally parallelize well for small or medium-scale computations (due to scatter-gather overhead, false sharing, etc.). Instead, a good chunk of functional-code parallelism is simply inherited from task parallelism.

If STM rollback collisions are not infrequent, then it is possible that real-time events can suffer.

Unlike most other concurrency control methods, transaction approaches can support priority override based on deadline or real-time requirements. And, much like other approaches to real-time, programmers will need to take care to keep the synchronization effort and computations small and predictable in order to achieve real-time requirements. It is difficult to see STM as being at a disadvantage relative to most alternatives for this purpose. (Wait-free is better for real-time, of course, but requires even more discipline and is not general purpose.)

Also you have to isolate the propagation (to external resources and modules) of rollback dependencies.

You should support propagation of transaction semantics through as many modules as feasible. Add transaction barriers where one must, but propagate where one can. This offers the most flexibility to programmers.

...

RE: STM Risks

One cannot just 'run the 30% of code single-threaded' and still readily parallelize the other 70%.

Queue up large-scale computations.

Functional code doesn't internally parallelize well for small or medium-scale computations.

Thus queue latency is not a factor, since functions can't do real-time computations anyway.

...transaction approaches can support priority override based on deadline or real-time requirements.

Afaics, I translate that to mean transactions can be aborted and rolled back, leading to starvation if, as I wrote, "STM rollback collisions are not infrequent"?

It is difficult to see STM as being at a disadvantage relative to most alternatives for this purpose.

You should support propagation of transaction semantics through as many modules as feasible. Add transaction barriers where one must...

The (ongoing) technical debate on that was already linked in my prior post above.

Philosophically, anything that is, in my terse pessimism, "everywhere, else aliasing-error rollback hacks glue" scares the crap out of me (25+ years of programming, a couple of large million-user projects, but no extensive concurrency experience). [Biblical wisdom rant]Futures contracts are slavery. Make no promises, do not be surety for anything, if you want to remain free[/Biblical wisdom rant].

RE: STM Risks

priority override based on deadline or real-time requirements.
I translate that to mean transactions can be aborted and rolledback, leading to starvation if as I wrote "If STM rollback collisions are not infrequent"?

Even if you've overloaded your computing resources and added architectural bottlenecks to the point that low-priority transactions are colliding and retrying on a regular basis, you can still push high-priority transactions through the system.

This is, as mentioned, superior to most alternatives. Lock-based approaches, even if you avoid deadlock, still force the high-priority operations to wait on the low-priority operations. Lock-free and wait-free approaches fail to support general-purpose operations (though they do support a useful subset).

Usefully, a mature transactions system can also use pessimism and interact with the scheduler to serialize the problematic transactions on the problematic resources and clean up any snaggles. Lock-based approaches don't have that sort of room to grow more intelligent.
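As a rough illustration of the priority argument (not any particular STM implementation), a contention manager can resolve a conflict by aborting the lower-priority transaction so the higher-priority one pushes through. The `Txn` and `ContentionManager` names below are hypothetical:

```python
import threading

class Txn:
    """A toy transaction record: just a priority and an abort flag."""
    def __init__(self, priority):
        self.priority = priority
        self.aborted = False

class ContentionManager:
    """On conflict, abort the lower-priority transaction so the
    higher-priority one can proceed (ties favor the incumbent)."""
    def __init__(self):
        self._lock = threading.Lock()
        self._owner = None  # transaction currently holding the contended resource

    def acquire(self, txn):
        with self._lock:
            if self._owner is None or self._owner.aborted:
                self._owner = txn
                return True
            if txn.priority > self._owner.priority:
                self._owner.aborted = True   # victim must roll back and retry
                self._owner = txn
                return True
            return False  # caller backs off and retries

mgr = ContentionManager()
low, high = Txn(priority=1), Txn(priority=10)
assert mgr.acquire(low)
assert mgr.acquire(high)   # high priority steals the resource...
assert low.aborted         # ...and the low-priority txn is told to retry
```

Compare this with a lock-based design, where the high-priority operation has no choice but to wait for the low-priority holder to finish.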

Queue up large-scale computations. [...] Thus queue latency is not a factor, since functions can't do real-time computations anyway.

It is unclear how queues are supposed to help, or why their latency should matter.

If my single-threaded stateful code needs the result of a functional computation to progress, how does putting that computation into a queue help me progress? What does my single thread of stateful code do next?

Am I to assume that I've already gone to the effort of implementing lightweight threads or some sort of continuation framework so that my (no-longer) single-threaded code may yield after putting something in a queue and find something else to do until its parallel computation is completed?

It may be wise to keep in mind that 'multi-threading' is not the same as 'multi-CPU-cores'. The issues surrounding multi-threaded operation are present also in green threads. Achieving cooperative multi-threading can help avoid a few of the pitfalls, but cannot readily be generalized.

consider single-threaded, with some special helper threads and some clever algorithms with judicious rare use of mutex or monitors. If 5% is supporting 95%, we can already support 20 cores with the 5% single-threaded.

There is an error in how you're imagining the imperative code's relationship to the functional code.

The imperative code tends to compute a function, do something with the result, gather some parameters, repeat by computing another function. It is this sort of pattern that leads to numbers like '5% time spent in imperative code and 95% functional'.

Unfortunately, this pattern also means you cannot simply 'separate' most of the functional computation. That is, attempting to run the imperative code single-threaded and run the functional code in another thread simply results in the imperative thread spending 95% of its time waiting for functional computes from other threads.

Now, above you suggest putting the large functional computes into a queue, where they'll presumably be processed by other threads. Further, you could (as mentioned above) relax the 'single-threaded' idea a bit and have programmers jump through a few hoops to support cooperative multi-threading on a single processor.

But, if you're parallelizing the 'large' functional computes, that means you'll still be performing small and medium computes locally (as you must, for performance). Now, as a simple guesstimate, one might say the further breakdown is: 5% imperative, 20% small and medium functional computes, 75% large functional computes. (I feel I'm guessing low at 20%.)

With those numbers, one would need to run the 5% imperative PLUS the 20% small and medium functional computes on a single thread. This would allow supporting not much more than four cores.

However, by parallelization of the initial 5%, you can very rapidly increase utilization. If you have four threads running tasks from a queue and rescheduling whenever a wait on a 'large' compute occurs, then each thread will still be spending about 5% of its time in imperative code and 20% of its time in medium and small functional computes. However, you can now run four tasks at once, and (ignoring overhead) each task thread can support about three more cores for large functional computes, for a total of sixteen cores.

Taking the '4x imperative cost' as STM's primary impact, the overall breakdown would be 20% imperative, 20% small and medium functional computes, 75% large functional computes, for a total of 115% of the original cost (before accounting for collision rate). However, each task would be spending roughly 40% of its time in local computes. Under this condition, each core focused 100% on tasks could support about two cores with large functional computes. With four task cores, one could achieve utilization of twelve cores total.

In any case, these numbers are all speculative. But you should take into account interdependence of code - i.e. the imperative code is not independent of the functional code - and note that even if one is keeping the imperative code to a single processor it is unrealistic to keep it to a single thread, lest one spend all one's time waiting on functional computes.
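For what it's worth, the guesstimated utilization figures above can be checked with a few lines of arithmetic (same speculative percentages; `cores_supported` is a hypothetical helper name):

```python
def cores_supported(imperative, local_functional, offloaded, task_threads=4):
    """Cores kept busy if each task thread runs the imperative and
    small/medium functional work locally and farms out large computes."""
    local = imperative + local_functional    # time units spent on-thread
    per_thread = offloaded / local           # compute cores one task thread can feed
    return task_threads * (1 + per_thread)   # task cores plus compute cores

# Original breakdown: 5% imperative, 20% local functional, 75% large computes.
assert cores_supported(5, 20, 75) == 16.0

# With a 4x STM cost on the imperative slice (5% becomes 20%):
assert abs(cores_supported(20, 20, 75) - 11.5) < 1e-9   # roughly the twelve cores cited
```

The model deliberately ignores scheduling overhead and collision rates, as the surrounding text does.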

When "If STM rollback collisions are not infrequent", then STM may be at a disadvantage compared to single-threaded, especially if it requires propagation everywhere as you assert below.
[...]
Philosophically any thing that is in my terse pessimism "every where, else aliasing error rollback hacks glue" scares the crap out of me.

Don't mistake a claim that transactions should be supported as widely as possible for a claim that transactions require being supported as widely as possible. It is programmers and users that benefit from widespread support for transactions (languages, file-systems, hardware, user-interfaces and multi-page web interactions, mission control protocols for unmanned systems, etc.).

RE: STM Risks

Even if you've overloaded your computing resources and added architectural bottlenecks to the point that low-priority transactions are colliding and retrying on a regular basis, you can still push high-priority transactions through the system.

Afaics, not if the high-priority transactions include (have termination/completion dependencies on) the low-priority ones algorithmically.

STM is not going to be able to separate the high-priority interwoven with low-priority logic of the execution-order dependent algorithms.

For example, just because we can complete the high-priority transaction of placing a mouse event into a message queue doesn't mean that the processing of that mouse event escapes synchronization perturbation (possibly even to the extent that the program's state machine becomes random in I/O responsiveness, and thus random in, for example, game outcome forks; more evidence of the theoretical infinite time sampling problem aliasing error I keep referring to).

Lock-based approaches, even if you avoid deadlock, still force the high-priority operations to wait on the low-priority operations.

Any concurrency approach (including STM, per above) which causes delays in the response to real-time demands is going to exhibit random aliasing error. There is no escaping that our algorithms have to be well structured for concurrency; any low-level mechanism will just push the aliasing error around, not eliminate it. The key advantage lock-based has over rollback-based is that, barring deadlock, we are sure not to waste any time. And as I wrote in the debate/discussion on the other LtU page, deadlocks can be broken by accepting atomicity error. AXIOM: if the algorithms are execution-order dependent, we are going to get error somewhere in concurrency. (This is due to the infinite time sampling problem aliasing error.) That is why I am postulating algorithmic approaches at the other LtU page.

Usefully, a mature transactions system can also use pessimism and interact with the scheduler to serialize the problematic transactions on the problematic resources and clean up any snaggles. Lock-based approaches don't have that sort of room to grow more intelligent.

Maybe it would be more productive to invest the energy in adjusting our algorithms to be concurrent, instead of expending energy to create more spaghetti to make a concurrently executing algorithm pretend it is running on a single thread.

It is unclear how queues are supposed to help, or why their latency should matter.

If my single-threaded stateful code needs the result of a functional computation to progress, how does putting that computation into a queue help me progress? What does my single thread of stateful code do next?

Am I to assume that I've already gone to the effort of implementing lightweight threads or some sort of continuation framework so that my (no-longer) single-threaded code may yield after putting something in a queue and find something else to do until its parallel computation is completed?

It may be wise to keep in mind that 'multi-threading' is not the same as 'multi-CPU-cores'.

I am assuming multi-core, because that is the context in which Tim Sweeney raised the need to make his 30% code concurrent.

My understanding is that in a game the 70% (95% of performance) code is mostly rendering, which can be split into parallel chunks (we do not need to wait on the result of one chunk to start the next). Thus you have M cores processing N queued slave tasks/chunks. Afaics, if each of the N is long-duration (as you asserted) compared to the latency the Nth task sits on the queue, then there is no significant synchronization perturbation error compared to converting the master thread to be concurrent. Note the slave tasks operate in parallel to the master thread and can place their results in a queue as well, so there is no blocking wait. Latency is the only issue.
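A minimal sketch of that queued-chunks model, assuming Python's thread-pool machinery as a stand-in for the M slave cores; `render_chunk` is a hypothetical placeholder for the real per-chunk work:

```python
from concurrent.futures import ThreadPoolExecutor

def render_chunk(chunk):
    # stand-in for a long-running, independent functional compute
    return sum(chunk)

def render_frame(scene, cores=4, chunks=8):
    """Split a monolithic render into independent chunks; the master
    thread pays only queue/submit latency, not a blocking wait per chunk."""
    size = max(1, len(scene) // chunks)
    parts = [scene[i:i + size] for i in range(0, len(scene), size)]
    with ThreadPoolExecutor(max_workers=cores) as pool:
        return list(pool.map(render_chunk, parts))

frame = render_frame(list(range(100)))
assert sum(frame) == sum(range(100))
```

The point of the sketch is only the shape of the pipeline: as long as each chunk's duration dwarfs its time in the queue, queue latency is in the noise.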

The imperative code tends to compute a function, do something with the result, gather some parameters, repeat by computing another function.

Agreed, if that is true and not the model I assumed above. You might be correct if the 5% must all be completed before the next frame's rendering parallelism can be started; however, the prior frame could still be rendering while the 5% for the next frame is running. I am assuming above that roughly up to 40% of the 5% is breaking the monolithic rendering task into parallel sub-tasks. I suspect I am correct (my experience was doing real-time 3D rendering on an Intel 386 processor before 1993, and Art-O-Matic real-time cel, depth-of-field blur and photo-realistic shading in 1997, 12 years before Street Fighter IV and with much less processor power; I worked on Corel Painter and EOS Photomodeler 1993-1996, and also on Ventura Publisher at the MacOS->Windows emulation layer).

Now, as a simple guesstimate, one might say the further breakdown is: 5% imperative, 20% small and medium functional computes, 75% large functional computes. (I feel I'm guessing low at 20%.)

I think the 70% code (95% of performance) is already running multi-threaded, see slide 42 of Tim Sweeney's presentation.

Don't mistake a claim that transactions should be supported as widely as possible for a claim that transactions require being supported as widely as possible.

I fail to see the distinction. I assume you think that barriers can be erected with the "hack glue" (apologies for the derogatory term; just my terse pessimism about them being perfectly damped barriers), and I visualize those as leaking the dependencies somewhere anyway (perhaps in future time, as composability deadlocks).

* High-latency operations that can benefit from concurrent waiting...
* Compute-intensive operations that can benefit from parallel execution...

RE: STM Risks

I'll make this my last comment, since I feel we're abusing LtU as a forum. Ehud attempted a while back to disabuse us of that notion: blog, not forum.

Regarding priority: It is possible to have a low-priority transaction update a queue (which naturally serves as a transaction barrier) such that an observer of the queue will later spin off a high-priority transaction. And vice versa. However, for I/O responsiveness, one ensures that the tasks associated with an input operation maintain a relatively high priority until appropriate feedback can be offered (such as greying a button and setting up a progress bar, if it will be a while).

Within any given transaction, transaction-priority cannot change. If there are hierarchical sub-transactions (as usually implied by 'atomic' blocks) then priority among those only impacts conflicts between sub-transactions.

One can use transactions at coarse or fine granularity much the same as one can use locks. If anything, the lock-based design has much greater pressure than the transactional approaches to favor coarse granularity in order to avoid deadlock issues. Examples include the Giant Kernel Lock and the Global Interpreter Lock.

our algorithms have to be well structured for concurrency, and any low-level mechanism will just push the aliasing error around but not eliminate it

Agreed.

The key advantage lock-based has over rollback-based is that, barring deadlock, we are sure not to waste any time

I suppose by not 'wasting any time' you refer to the need to occasionally perform 'rework' in a transaction-based approach. It is true this is an advantage of a lock-based approach.

However, to compare that properly against the STM approach, you need to determine how much 'time' is wasted by locks blocking safe concurrent reads on shared data and safe concurrent writes on subsets of shared data, plus how much 'time' is wasted working around the limitations of locks. That is, the granularity of locking also has costs (should there be enough cores for parallel operations).

deadlocks can be broken by accepting atomicity error

Deadlocks can be broken in other ways, too, that don't involve atomicity error.

For example, you could throw a deadlock exception in one of the threads still waiting on a mutex, choosing to break isolation and lose a message instead of breaking atomicity. I suggest that this is a wiser route, giving much more opportunity for programmers to take intelligent action after a deadlock.

Of course, it remains less than ideal that one need consider deadlock at all... and reliability will likely become a problem no matter the resolution mechanism.
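A toy demonstration of that resolution strategy, using lock timeouts as a crude stand-in for real wait-for-graph cycle detection (all names here are hypothetical):

```python
import threading

class DeadlockError(Exception):
    """Raised in a waiter instead of silently breaking atomicity."""

a, b = threading.Lock(), threading.Lock()
outcome = {}

def worker(first, second, name):
    with first:
        # timeout stands in for detecting a cycle in the wait-for graph
        if not second.acquire(timeout=0.2):
            outcome[name] = DeadlockError("possible deadlock: backing off")
            return  # this thread loses its message; atomicity is preserved
        second.release()
        outcome[name] = "ok"

# Opposite acquisition orders: the classic deadlock recipe.
t1 = threading.Thread(target=worker, args=(a, b, "t1"))
t2 = threading.Thread(target=worker, args=(b, a, "t2"))
t1.start(); t2.start(); t1.join(); t2.join()

# Every thread either completed or reported the cycle; no atomicity error.
assert all(v == "ok" or isinstance(v, DeadlockError) for v in outcome.values())
```

After the exception, the programmer still holds a consistent (if incomplete) state and can decide how to retry, which is the "intelligent action" argued for above.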

AXIOM: if the algorithms are execution-order dependent, we are going to get error somewhere in concurrency.

This axiom does not hold in our universe. One can have many partial-order dependencies and still achieve observable determinism in the presence of concurrency. This is a feature leveraged by data-parallelism and declarative concurrency.

That is why I am postulating algorithmic approaches [for execution-order independence]. Maybe it would be more productive to invest the energy in adjusting our algorithms to be concurrent, instead of expending energy to create more spaghetti to make a concurrently executing algorithm pretend it is running on a single thread

Not all services or algorithms can be made execution-order independent, especially not those involving stateful manipulations and other sorts of I/O. In part, that is because many important problem-domains simply do not allow execution-order independence.

Further, execution-order independence does not readily compose unless working with pure operations. That is, even if arbitrary services S1 and S2 are individually execution-order independent, such that final semantic state is based on the set of inputs received, it is generally not the case that a new algorithm or service S3, one that interacts with one service based on the state of the other, will also be semantically order-independent.

As far as adjusting services to be more concurrent, I agree. As noted above, one can use transactions at a fine granularity such that services possess a great deal of imperative (multi-task) concurrency tamed by large numbers of small transactions. For even more concurrency, individual tasks with large functional computes may leverage data-parallelism.

Characterizing transactions as producing single-threaded algorithm spaghetti ignores many patterns for leveraging transactions.

My understanding is that in a game the 70% (95% of performance) code is mostly rendering, which can be split into parallel chunks

The 90% CPU profile mentioned in Tim Sweeney's presentation referred to scene-graph traversal, physics simulation, collision detection, path finding, sound propagation, and so on (review slides 12, 17, 47). The scene-graph traversal would be included in determining what to render (and where to render it), but rendering itself was lumped with 'shading' as a separate compute running on the GPU.

When an object is updated, it is physics simulation and collision detection (and other gameplay rules) that determine with which other objects it must interact. It isn't as though the bullet knows when created that the tank will lie in its path. In this way, and others, imperative 'stateful' code interacts non-trivially with the functional code.

RE: STM Risks

...I feel we're abusing LtU as a forum...

I think we did a pretty good job of characterizing many of the issues and tradeoffs of one of the main points of the blog post (concurrency migration). It would be nice if one person could come in and say "here are all the issues and here is a link" without any discussion, but this blog page is 1 year old (the STM blog page is 3 years old) and no one had done that. I arrived here as a reader wanting that information and could not find it anywhere succinctly in a Google search. I do hope Ehud finds a way to close leaves so we can ignore sections that do not add value according to our individual interests. I hope he also fixes the bug where, if you don't close a tag, the rest of the page (below your post) is affected.

One can use transactions at coarse or fine granularity much the same as one can use locks. If anything, the lock-based design has much greater pressure than the transactional approaches to favor coarse granularity in order to avoid deadlock issues...

I agree with the deadlock risk and with the more well-defined granularity window for transactions, but the use of that window will still depend on the algorithmic opportunities available. I think with locks the window risk is harder to ascertain, depending on the algorithm, so it might be wider, narrower, unknowable, random, etc. It really depends on the situation.

...granularity of locking also has costs...

Agreed.

For example, you could throw a deadlock exception in one of the threads still waiting on a mutex, choosing to break isolation and lose a message instead of breaking atomicity.

Clever :) It will depend again on the situation and each choice can introduce domino effects and complexity.

AXIOM: if the algorithms are execution-order dependent, we are going to get error somewhere in concurrency.

This axiom does not hold in our universe. One can have many partial-order dependencies and still achieve observable determinism in the presence of concurrency. This is a feature leveraged by data-parallelism and declarative concurrency.

I think it does hold, if we interpret it correctly.

Just because you can fit perfectly doesn't mean you will. If one can find the perfect fit to the partial-order dependency, then one can get the minimum impacts. We must define error; I mean aliasing error. That can mean many different things. For example, one way we get determinism with STM can be via futures' cells, in which case the aliasing error can be real-time degradation, as we had already discussed (no need to rehash that). If we have done very good fitting to the partial orders, these effects are minimized, perhaps even unnoticeable (e.g. JPEG's lossy DCT compression has aliasing errors, but we usually can't see them).

Another point is that fitting to these partial orders is another way we are adjusting our algorithm to concurrency, which is my main point, which you agreed with. We are simply moving towards fitting our algorithm to concurrency in ways that minimize aliasing errors to tolerable levels. We have many tools to help us, and we must choose what fits best; each situation will vary, perhaps.

Not all services or algorithms can be made execution-order independent, especially not those involving stateful manipulations and other sorts of I/O. In part, that is because many important problem-domains simply do not allow execution-order independence.

Ditto what I wrote above. So I agree, but we will get (possibly unnoticeable and irrelevant) aliasing error to the degree we do or do not fit well enough to concurrency. It is still an algorithmic fit. STM is in the toolset for that.

Further, execution-order independence does not readily compose unless working with pure operations.

Agreed very much. That is why I suggested the multi-threaded parts be walled off as much as possible from the external interfaces. But of course that causes problems, as we discussed. The models used will depend on the situation. You really need an expert working on this! That is the main thing I hope readers take away from this. This will not be a cakewalk for a novice. Just slapping STM or monitors on single-threaded code could be a nightmare in many cases.

Characterizing transactions as producing single-threaded algorithm spaghetti ignores many patterns for leveraging transactions.

I meant that approximately in terms of someone blindly slapping transactions on a single-threaded code base and praying. Agreed that transactions belong in the toolset of the expert.

...imperative 'stateful' code interacts non-trivially with the functional code...

Agreed, I was thinking of that too. It is a challenge for sure. Should be fun if one still has a lot of hair to pull out :).

Will be interesting to see what they come out with for UT4. I read elsewhere to expect no new UT4 for several years.

I will give you the final reply; I just wanted to end by showing that we reached mutual understanding on the issues we discussed (and I assume we didn't discuss all the issues). I love that!

Lock-freedom

STM is based on Fraser's work on lock-free algorithms. Has the Haskell STM lost those properties? If not, then it cannot suffer from (global) dead-lock or live-lock, since lock-free algorithms cannot.

Not all STM is lock-free

Well, not all STM is lock-free - many different transaction mechanisms have been proposed.

But "lock-free" is one of those nice problem hiding euphemisms.

Deadlocks are replaced by cyclic conflicts in "lock-free" transactional mechanisms. The deadlock resolution that many "locking" systems have, including locking transactional systems like Oracle, is replaced by conflict resolution.

I say "replaced" in a very liberal way; they're conceptually very similar and can produce very similar problems in naive implementations :-)

Lock-free, not lock free

Lock-free is not a euphemism; it is a defined term (defined in the paper I linked). A lock-free algorithm, by definition, cannot dead- or live-lock. Starvation may still occur. The paper above suggests that wait-free systems (lock-free, but additionally starvation cannot occur) do require a system like deadlock resolution, but lock-free systems do not. I'm not sure if you are arguing against the defined term. If so, can you provide an example of a case that would produce "cyclic conflicts" and what problem it would cause?

There is no reason...

There is no reason that a defined term cannot be a euphemism, as I'm sure victims of "ethnic cleansing" will agree :-). "Lock-free" in many cases is a euphemism for "abort and retry".

Add to that the fact that "lock-free" algorithms require locking at some level (usually in terms of processor bus locks), and that some may use OS-level waiting locks to implement the "lock-free" protocol, and you have a very murky term indeed.

Lock-free algorithms do have a system similar to deadlock resolution, but it's not a system that actually resolves dead- or live-lock (I never said the systems DID actually dead- or live-lock); it's the conflict resolution mechanism used to resolve data conflicts (the Conflict Resolution Algorithm inherent in Optimistic Concurrency Control Validation).

In one narrow lock-free forward validation scheme (which is NOT the totality of STM or lock-free STM conflict resolution algorithms), your conflict resolution algorithm is to roll back any transaction that would conflict with another as that conflict is found. This algorithm avoids the problem of "cyclic conflicts" (which in a waiting system would cause a deadlock).

In a backward validation scheme this method is not possible, and in the case of a cyclic conflict (such as pieces of state written in the opposite order) both transactions must be aborted/retried. In a naive implementation of this lock-free algorithm it is possible to get pathological starvation, in the same way Ethernet collisions would in a naive implementation.

In similar situations in a locking database, it will deadlock, detect the deadlock (seeing the cycle in the graph of "waiters" acquiring locks) and roll back one (or both) of the transactions in the waiting list, possibly to be retried. My point was that this deadlock resolution is very similar to the conflict resolution of backward-validating optimistic schemes (for instance).
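The abort-and-retry pattern with randomized, Ethernet-style backoff might be sketched as follows. The internal lock merely simulates a hardware compare-and-swap instruction, and `VersionedCell` is a hypothetical name, not any real STM's API:

```python
import random
import threading
import time

class VersionedCell:
    """A cell supporting optimistic read/validate/write."""
    def __init__(self, value):
        self._value, self._version = value, 0
        self._cas_guard = threading.Lock()  # stands in for hardware CAS

    def read(self):
        return self._value, self._version

    def cas(self, expected_version, new_value):
        with self._cas_guard:
            if self._version != expected_version:
                return False            # conflict: someone else committed first
            self._value, self._version = new_value, self._version + 1
            return True

def increment(cell, times):
    for _ in range(times):
        while True:                     # abort-and-retry on conflict
            value, version = cell.read()
            if cell.cas(version, value + 1):
                break
            time.sleep(random.uniform(0, 0.001))  # randomized backoff

cell = VersionedCell(0)
threads = [threading.Thread(target=increment, args=(cell, 50)) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
assert cell.read()[0] == 200            # all increments commit despite conflicts
```

The randomized sleep is what keeps a naive implementation from the pathological starvation mentioned above, exactly as exponential backoff does for Ethernet collisions.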

Re: working on it

Do tell?

... it's called Mnesia.

Mnesia tables can hold any kind of object, and don't necessarily have to be persisted to disk. Transactions consist of supplying a transaction function for the database manager to execute (via mnesia:transaction(Fun)).

Erlang folklore is that not many Erlang programs need or use Mnesia, but it's there for when shared state is useful. Maybe Tim's "70% stateless, 30% stateful" statistic could be modified downward, say something like "70% stateless, 20% message-passing, 10% stateful"?

Unnecessary?

...the unnecessary distinction (at compile time) between collocated and distributed systems.

How is the distinction unnecessary? Do I really want to use Erlang threads for local SIMD? Isn't it nice when static analysis lets us do without format checks for local arrays, just chewing them up as memory blocks? More generally, do we really want to be forced to assume that every data item is potentially corrupt, and that every communication will fail to return?

collocation used when available

The distinction is unnecessary because one can take advantage of locality when it is available and automatically integrate extra systems for checking messages (and ordering them, ensuring reliable delivery, checking for type-safety, etc.) when distribution is actualized and a message comes from an untrusted remote platform or actor. As far as static analysis goes: no reason not to do that. Sriram's work includes typed actors for Java, and in my own work I'm certainly using statically typesafe actors as well.

Still, I don't agree with many of Sriram's conclusions, such as his idea that "shared nothing = failure isolation" actually applies in a system of shared services, or that actors eliminate the need for transactions. Partial failure in an actor configuration is a serious problem, race conditions still exist, and in the actors model transactions would do well to exist and simply be distributed. Similarly, I don't fully agree with his assertion about locality.

There are useful compromises in distinguishing (at compile time) between collocated and distributed systems. The language design I've worked on has first-class actor configurations and allows annotations of locality (e.g. actor A is nearby actor B) with semantics for automatic distribution and mobility, albeit distribution limited by certificate-based secrecy-level challenges between platforms (so that sensitive actor names aren't accidentally distributed). My approach results in "cliques" of actors that will (after distribution) always be localized on specific machines. These "cliques" may be compiled to take full advantage of guaranteed locality, while between cliques one still takes advantage of locality when it is present. (There are many other advantages of actor configurations as well, but none reflects the advantage of recognizing locality while still supporting automatic distribution at compile time as much as this one does.)

As far as SIMD goes, it can still be used in the actors model. One does require that the language provide the necessary abstractions for it, of course, such as pairwise operations over arrays or matrices.

No inconsistency?

Sriram Srinivasan wrote:

There seem to be a number of inconsistencies in Tim Sweeney's preferences. "Large-scale stateful software isn't going away" and "imperative programming is the wrong default" are contradictory, or at least sound like an impasse.

You stated why only the marriage of stateful with non-imperative (i.e. lots of monads, even if not the predominant default) might work in practice:

STM works for Haskell because mutation is very limited in the language; I very much doubt STM will be a good solution in the hands of a C#/Java programmer.

... and I have issues with some of those points based on personal experience. So, answering those points in order...

* Large stateful software is indeed here to stay... although I tend to think single-owner state is still a good idea (even if it can be passed between owners).

Microsoft's Singularity project allows single owner state to be passed around and even passed across channels (a messaging mechanism).

* Preemptive threads and monitors don't scale... and neither do transactions. Anyone who's worked on a high-performance online transaction server will tell you that the transactions are there for their contractual consistency (a legal requirement in much financial software), not because they scale well.

Quite often you end up funneling data back to a single point, because it's the only way to get the transactions occurring reliably at a decent speed. I'm talking about a custom in-memory database (checkpointed), not an IO-limited disk monster.

Tim's case of 10000 entities with few overlapping updates feels a bit contrived when you're talking about "mainstream" programming languages, or even games. For example, how do you correlate large amounts of information about 10000 entities into a single place concurrently? What if the information needs to be worked on as it's being provided? Transactions scale poorly in such cases.

* Imperative programming being the wrong default... I tend to agree with it, but I don't think it should be dogmatic :-)

* No argument there.

* Well, I have supporting evidence otherwise. I've implemented highly parallel systems (scaling to 32 processors rather easily) that contain thousands of entities with mutable state all being correlated against real time events and providing real time information updates.

How did I do it? Well, most of it was done with lock-less message passing and, in certain cases, user-mode context switches (implemented transparently). Transactions never would have met some of the response-time requirements.

These systems weren't trivial either, they were large scale device management systems that often had to handle the interactions of multiple devices at once.

The best solution to use is purely dependent on your design and architecture. If you come up with your design and architecture with message passing in mind, it's just as viable. I believe this applies to games as well in my experience with them.

The solution to use should probably be decided on a case-by-case basis. Each has its merits and flaws.

Agreed

Also, why is the under-the-covers complexity of STM bad, when the under-the-covers complexity of garbage collection is good?

Because garbage collection

Because garbage collection is usually a bad idea. Seriously, it is. No, seriously. In the vast majority of cases the ownership of memory is clearly defined and best dealt with by the compiler or programmer. In the few cases where garbage collection is useful, a hell of a lot of implementations fail to actually collect it.

Basically, using garbage collection should be the programmer's choice (it's supposed to be their job, after all). The same seems true of STM. Widespread application of a technique to all areas, as if it were a silver bullet, is always foolish.

GC

1) Many implementations of garbage collectors are bad.
2) If we could avoid having a GC and still get rid of the garbage, that would be better.

In a language with functions as first-class citizens, it is not obvious to me that you can easily live without a garbage collector, especially if you do not want to require whole-program compilation at all times.
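A tiny example of why: a closure can capture a local variable, so the variable's lifetime escapes any statically obvious scope and some runtime reclamation scheme has to take over.

```python
def make_counter():
    count = 0          # would be stack-allocated, but the closure captures it
    def bump():
        nonlocal count
        count += 1
        return count
    return bump        # 'count' must outlive make_counter's activation record

counter = make_counter()
assert [counter(), counter(), counter()] == [1, 2, 3]
```

Without whole-program analysis, the compiler generally cannot prove when `count` dies, which is exactly the gap garbage collection (or pervasive heap allocation with reference counting) fills.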

Maybe the seeds of some arguments?

The post seemed to claim that there are some issues with transactions (not detailed, maybe starvation or contention?) which will at the very least damage composability, and that STM doesn't address distributed systems at all.

I didn't see anything to back up these claims - as far as I can tell you could rewrite everything in terms of garbage collection: "dynamic memory is a bad idea, garbage collection moves memory management into the system but doesn't explain how it will solve any of the problems and I think it won't, garbage collection won't work in distributed systems"

Perhaps these arguments have more force against STM, and LtU seems like a fine place to ask for them to be worked out (homepage vs. forum is debatable, but not worth debating). Summing it up, I'd say Patrick Logan raised some reasonable questions, but didn't provide answers.

So, can anyone point to theoretical limits, classes of programs current STM implementations handle poorly, composable reasoning strategies for correctness and progress of programs under other concurrency strategies, etc.?

Point, where art thou?

I'd say Patrick Logan raised some reasonable questions, but didn't provide answers.

It's more like hinted that reasonable questions exist but failed to show them. Right now the post is so heavily edited that it looks worse than a wiki page on thread mode during a flame war.
I read the post twice in this current incarnation (i.e. with the comments mixed into the original text in unpredictable ways) and I'm unable to find anything other than "STM is bad, mmmkay?". There are valid questions to be raised wrt STM; for example, it fails to provide process isolation (e.g. I would like to ensure that only a group of processes accesses a bunch of STM vars, so that if starvation occurs only that subsystem needs to be crashed and restarted).
Just saying that STM is wrong and everything should be done with message passing is bizarre. There are other concurrency models (e.g. Oz's dataflow variables), so where is the argument showing that message passing is better for most domains? This feels like Smalltalk advocates or pure FP aficionados who believe in one hammer to rule them all.
I'm sorry Patrick but sometimes message passing is the wrong tool (while I believe that it's the most useful default model).

DBMS transactions do not scale

Are there any transactional systems out there that are not the usual shared memory or superfast interconnect giants?

Distributed transactions are probably the hardest research problem in database theory, and remain practically unsolved at large scale.

Transactions can scale, given enough information about them - whether they commute etc., but that would probably lead to integrating a theorem prover in the language, something more radical than using message passing. It's an interesting paradigm though.

Actor Transactions

Are there any transactional systems out there that are not the usual shared memory or superfast interconnect giants?

Transactions over actors should scale so long as the data or resources being accessed by the transactions are not highly focused on a very few processes.

STM replaces locks & semaphores, not MapReduce

Many problems can be parallelized easily by splitting the data up and streaming it through many processors. And in these cases, you definitely want to use MapReduce, nested data parallelism, or some other kind of parallel 'map' abstraction.

Unfortunately, not all problems are so easy to parallelize. Witness Tim Sweeney's example from the Unreal Engine: 10,000 game-world entities spread across many cores, all being updated in parallel. There's no easy way to do this with any variant of 'map', because the entities in the game world interact in arbitrary ways, and affect each other's state.

Now, you could solve this problem with semaphores and locks, but you'd go insane. And this is where STM offers a huge win: it takes all the nightmarish complexity of using fine-grained locks, and replaces it with a simple transaction model, complete with automatic restarts and strong guarantees against deadlock.
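To make the contrast with lock-based code concrete, here is a minimal sketch in Haskell using GHC's stm library (the names `transfer`, and the hit-point framing, are my own invention, not from the engine being discussed). The transaction either commits as a whole or is rolled back and retried; there is no lock-ordering discipline for the programmer to get wrong:

```haskell
import Control.Concurrent.STM

-- Move n hit points from one shared entity variable to another,
-- atomically. Intermediate states are never visible to other threads.
transfer :: Int -> TVar Int -> TVar Int -> STM ()
transfer n from to = do
  x <- readTVar from
  writeTVar from (x - n)
  y <- readTVar to
  writeTVar to (y + n)

main :: IO ()
main = do
  a <- newTVarIO 100
  b <- newTVarIO 50
  atomically (transfer 10 a b)
  (x, y) <- (,) <$> readTVarIO a <*> readTVarIO b
  print (x, y)  -- prints (90,60)
```

With locks, the equivalent code would need to acquire both entities' locks in a globally consistent order to avoid deadlock; here that burden falls on the runtime instead.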

I can't think of any reason to loathe STM so strongly unless you are absolutely sure that you'll never encounter a problem which requires shared state, and which resists the other tools in your toolbox.

Might have a point...

Lots of dodgy assertions and straw-man bashing, but there are some lucid moments. This sentence intrigued me:

And if some group is going to retrofit transactional memory into some significant Java or C# system, well, they would be far better off investing that time into a simpler, shared-nothing rewrite into a language like Erlang, a simpler language like Smalltalk, or even a better shared-nothing coordination mechanism like Javaspaces.

Does he have a point? Might industry invest resources in STM that would be better invested in the Erlang model?

How to fix a broken toolbox.

Does he have a point? Might industry invest resources in STM that would be better invested in the Erlang model?

Who knows? Both Java and C# provide almost none of the guarantees one has using STM in Haskell or message-passing in Erlang. He's comparing kludging STM into an existing imperative language against rewriting millions of lines of code in an obscure language (I like Smalltalk and Erlang but they're unknown to most of the industry) or starting to use a tuplespace mechanism to solve all concurrency problems. It's an apples-and-rhinos comparison. A much better comparison would be: kludge both STM and lightweight, isolated processes with message passing into both languages and see which would be easier (hint: isolation in Java is far from lightweight). Or else let's compare rewriting to Haskell with STM vs. Erlang with message passing (assuming that both have the man-centuries necessary to create the libraries and tools available in Java or C#).
Also, how would Smalltalk automagically solve concurrency issues? AFAICS most Smalltalk implementations today use shared memory, threads, and locks as their basic concurrency model.

I don't know enough about STM or Quark, but would it be easy to port STM to Quark so you could get access to the Java libraries via something very Haskell-ian?

I did just that

I have written an STM implementation integrated with quark: http://diversions.nfshost.com/blog/2008/03/27/a-cal-webapp-with-persistent-data-using-gwt-stm-and-bdb/

I don't think my STM implementation is terribly good -- it probably locks more than it should -- but it works with the various examples SPJ uses, and has similar syntax to the Haskell version.

My implementation persists the mutable state to Oracle's BDB.

I find Quark a great way of experimenting with functional programming while still having access to my favourite Java libraries.

(lucky these threads live forever :-)

It looks like there might be

It looks like there might be an implementation of STM (LibCMT) for C#. C# has a form of software isolated processes (through the use of AppDomains), though they're not fully-fledged as yet. They can be used to model the kind of isolation for transactional memory I believe you're referring to.

MC# and Cω use message-passing concurrency, in slightly different forms, although they're not strictly C# (MC# compiles to C#, so they can be integrated at some level). They are similar enough to C# that the effort involved in porting existing concurrent C# code would be entirely due to the switch to message-passing.

It would be an interesting experiment to take an existing piece of concurrent C# code and adapt it to both the LibCMT library and MC#, to get a feel for which is the shorter trip.

STM package written in C# published 2005

Seems like Microsoft Research published a C# Software Transactional Memory package back in 2005...

The SXM is a software transactional memory package written in C#. It is much easier to use than prior STMs because it uses Reflection.Emit to transparently add synchronization code to sequential data structures.

Not much concrete info, but still interesting

I've seen lots of cheerleading about STM, but not much practical experience to back it up. Patrick is bright enough that I don't immediately dismiss his opinions as aimless ranting :)

Uses of STM?

Who is using/has used STM for real work these days? In what language?

Erlang, while obscure, does have somewhat demanding real-life applications.

Yes, in Java

This incarnation is being used in an open-source university management system. The system was developed in-house and has been in production for a couple of years now. For more insight, you can read the SCOOL'05 paper, or go through the presentation.

Tonight make it magnificent

I didn't initially notice that the post in question seems to be a reaction, at least in part, to this ACM Queue article (at least, I think it is; I have a hard time telling what's really going on on that blog page.)
Among other things, the above article "illustrates how an atomic statement could be introduced and used in an object-oriented language such as Java." I think Guillaume Germain captured one of the concerns about this possibility quite well:

I can see misguided programmers starting to sprinkle their code with 'atomic' statements ("just in case"), a bit like one would do with 'yield' statements in a non-preemptive concurrent system.

I can see how this prospect could generate a certain amount of trepidation. The problem, though, is not with STM itself, and in that respect, Patrick errs when he says things like "this is a really bad feature that could screw up a lot of software for years to come". Refactoring for factuality, I think he means something more like "I'm concerned that this is an all-too-seductive feature that could screw up a lot of software for years to come". In that, he could be right, but it actually has nothing to do with the technical merits of STM.

I don't see the concern

Essentially that argument reduces to "some people might not use it right" which I must point out is nonfalsifiable (ahead of the fact anyway) and can be said of anything. For example, has it been our experience that misguided programmers have started sprinkling their Java code with synchronized blocks, just in case? I haven't seen that happening; the problem I've mostly seen is underuse of synchronized causing race conditions. (Please don't take this as an endorsement of Java shared-state concurrency.)

Predicting feature quality

Essentially that argument reduces to "some people might not use it right" which I must point out is nonfalsifiable (ahead of the fact anyway) and can be said of anything.

I almost agree, except I suppose that there must be some class of features which would actually be a bad idea to add to a language such as Java — for example, how about raw memory pointers? I think there are some plausible criteria for recognizing certain kinds of bad features in advance — for example, features which violate safety properties. However, in this case, an argument of that nature hasn't actually been made, afaict.

But many decisions in PL design are based largely on the goals and convictions of the designer. If Patrick were designing a PL, presumably he'd leave out STM, and then the question could be decided in the marketplace, which is probably where such decisions ultimately belong (because there doesn't seem to be any other reliable way to decide them).

Yes!

For example, has it been our experience that misguided programmers have started sprinkling their Java code with synchronized blocks, just in case?

For the record, Yes, this has most definitely been the case. I have seen exactly this phenomenon all over the codebase of at least one large (top 50) e-commerce site. When subtle synchronization and race condition bugs occur (e.g., users click the same effectful link several times in quick succession, session data gets messed up, something bad happens...), I've seen synchronized markers employed like pixie dust until the problem seems to go away. I wish I were kidding when I say that virtually no one understands how this software really works or why synchronization occurs where it does.

Essentially that argument

Essentially that argument reduces to "some people might not use it right" which I must point out is nonfalsifiable (ahead of the fact anyway)

Not completely non-falsifiable. The C# designers regretted adding the convenient "lock { }" statement to the language for just this reason: it gave people a false sense of security, and harmed performance.

A sympathetic take

Though I agree with those who found Patrick's post difficult to follow and possibly overly provocative (though it is a personal blog post... ;-) ), I found myself in sympathy with the overall feeling.

I have nothing against research into STM, of course, and can imagine some scenarios where it would be a nice tool to have if used lightly and tastefully, but there seems to be a danger that it might be becoming the latest panacea. Such "silver bullets" discourage people from trying known good solutions where they are available (such as message passing) and instead simply enable existing pathologies for a bit longer. (Or simply convince people that the problems will be gone Real Soon Now.)

If we follow the "we already use transactional DBs" meme to its logical conclusion, essentially what STM is doing is cutting out the middle-man and embedding the database into your app, but without the persistence.

As anyone who has worked with a transactional DB with highly contended data knows, transaction support in the DB gives you some things for free, but makes other kinds of headaches. Retry support after a rollback, for example, and subtle deadlock conditions can create unacceptable performance and reliability hits.

This doesn't mean I want to get rid of transactions, of course, but it makes me much less optimistic that adding more of them liberally to my system will make my life easier for large apps.

If we add STM as a tool high in the CTM hierarchy of statefulness, where we take it for granted that we are not using any more state than we really need in order to solve our problem elegantly, I'm much less worried, of course.

On the comment of being the

On the comment about being the latest panacea, I have to argue that such thinking might make progress a bit stagnant, wouldn't it? Researching something thoroughly until it's proven not to work well seems a better idea than just going with something that's proven. I'm not sure if that was your intent in that comment, but it seems to be the result.

Eat your dinner before you reach for dessert

I have to argue that such thinking might make progress a bit stagnant wouldn't it? Researching something thoroughly until it's proven to not work well seems a better idea then just go with something that's proven.

Given that there are a whole bunch of great ideas on how to reduce state and improve concurrency that we have already that have yet to gain wide acceptance and implementation in industry, I can't say I'm all that worried I might be stifling innovation by critiquing an idea whose major appeal is that it looks just like the same rickety old techniques most people are familiar with.

What?

...an idea whose major appeal is that it looks just like the same rickety old techniques most people are familiar with.

How is that even slightly true? STM, as far as I can tell, doesn't look anything like conventional techniques for thread communication.

The main appeal of STM is not really that it looks like anything else, but that it gives you very real guarantees about the composability of working components. You can take two transactions which work well on their own, and compose them in a number of ways into new transactions, the semantics of which have nice properties. The most obvious way is sequentially, and you get a guarantee that the result of the transaction, supposing that it commits, is independent of the behaviour of all other threads -- even if things happen in another thread in between the two transactions you've composed, they can't impact the result.

You can also compose them such that if the first fails to commit at first, and decides to retry, then the second is tried instead (and if it retries, then the whole transaction retries). This allows you to wait for any of a number of resources to become available, and to do different things depending on which it was. The inner transactions don't need to be prepared specially for being composed in this manner.

You also get all this composability even in the face of exceptions. A subtransaction which wants to fail by throwing an exception doesn't need to care about what changes the transactions it might eventually be part of might have made in case they don't catch it (and be left half-committed). It just throws the exception, and that causes the transactions it's a part of to behave as if none of the things they've done had ever happened and rethrow the exception until it's caught.

Incidentally, this behaviour can be used to cause transactions to throw an exception when they otherwise would have retried. x orElse (throw ...).
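For readers who haven't seen the combinators being described, here is a small sketch in Haskell (GHC's stm library; `takeItem` and `takeEither` are invented names) of how `retry` and `orElse` compose two blocking operations without either one being written for composition:

```haskell
import Control.Concurrent.STM

-- Pop an item from a shared list, blocking (via retry) until one exists.
takeItem :: TVar [a] -> STM a
takeItem tv = do
  xs <- readTVar tv
  case xs of
    []       -> retry                      -- abort and wait for a change
    (x:rest) -> writeTVar tv rest >> pure x

-- Composition for free: try the first queue; if it would retry,
-- try the second instead. If both retry, the whole transaction waits.
takeEither :: TVar [a] -> TVar [a] -> STM a
takeEither p q = takeItem p `orElse` takeItem q

main :: IO ()
main = do
  p <- newTVarIO []
  q <- newTVarIO ["from-q"]
  s <- atomically (takeEither p q)
  putStrLn s  -- prints "from-q"
```

Neither `takeItem` knows it is being composed, which is exactly the composability claim made above.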

The trouble with many of the other systems, including most of the message passing systems, is that in order to compose components that work separately, one either ends up rewriting them, or sticking a layer of indirection in front of them, or worse yet, setting up some system of locks. Or, you design some transactional system from the start in terms of your messages. It would be easy to put STM to work in a combined solution there (STM transactions as messages), but the other way around is trickier to get right.

But what I'd really like to point out, more than all of this, is that STM is not really about the operational behaviour of the system. All it does is provide a language for saying things that you need to be able to say with regard to thread communication. "These things happen together, or not at all.", "If this doesn't work, then try this instead.", and so on. It's the right level of abstraction because it leaves the compiler and library writers room to implement things in more and more clever ways, getting more performance, while keeping the semantics (in the sense of the overall effect of the program) fixed.

This is very much the same as garbage collection. Early garbage collectors were not great at what they did, but they provided the machinery for the right abstraction for working with memory. The early implementations of STM are not great, but they're not nearly as clever as they could be at managing the resource they've been given -- they're the somewhat obvious implementations so that we can see what programming using STM is like. Specifically, the decision as to when to retry a transaction, and which transactions in the system are likely to run into contention issues can probably be done in a much more clever way, though the current approach is not as bad as it could be. If it turns out that we like STM (and personally, I really do think we will), then the lower level implementation can improve without us having to rewrite our programs, much like garbage collectors did.

Another small, unrelated remark: message passing is the sharing of state. It's just a particular form of threads communicating their state to one another. When one thread sends another a message, it is creating a data dependency between its local state and that other thread's local state. Systems which can't do this are not really expressing concurrency, just parallelism.

I point this out only because it grates on my ears a bit when people expound the virtues of "no shared state", and then talk about systems where threads share state with one another (albeit in a responsible and selective fashion). I know that people are using "shared state" to refer to one particular model of thread communication, but that should not lead one to the idea that other models share no state between threads.

On the flip side of the same coin, calling STM "shared state" would be somewhat degrading, because while it could be used in that way, it would be an unduly awkward way of achieving that effect, and one would get none of the benefits of using transactions. Similarly, one could use message passing where threads continuously communicated all changes in any parts of local state which might be relevant, but it would be an awkward misuse of the system.

Shared Context

How is that even slightly true? STM, as far as I can tell, doesn't look anything like conventional techniques for thread communication.

The familiar thing is transactional databases, which are quite commonly, though not always with conscious awareness, used as big multi-process global variables.

I point this out only because it grates on my ears a bit when people expound the virtues of "no shared state", and then talk about systems where threads share state with one another (albeit in a responsible and selective fashion).

Many of us here take for granted a theory of statefulness that is not binary, but rather hierarchical, following CTM. For us, shared state refers to a particular slot in that hierarchy. In this scheme, message passing is less stateful than full state-sharing.

CTM and Transactions on Cells

This may be a naive question, in that I don't know a lot of the details about STM.... but.... Isn't there a correlation between STM and the CTM section 8.5.4 on Transactions - Implementing transactions on cells?

Dey turk ma jugh!

But what I'd really like to point out, more than all of this, is that STM is not really about the operational behaviour of the system. All it does is provide a language for saying things that you need to be able to say with regard to thread communication. "These things happen together, or not at all.", "If this doesn't work, then try this instead.", and so on. It's the right level of abstraction because it leaves the compiler and library writers room to implement things in more and more clever ways, getting more performance, while keeping the semantics (in the sense of the overall effect of the program) fixed.

This is very much the same as garbage collection.

Aha. Which helps explain the reaction, too. On the purely pragmatic front, it's reasonable to be skeptical that compilers are going to do a good job of implementing a new high-level abstraction, until their success is well proven, and the appropriate use cases well understood.

There are also all the factors surrounding what happens when something that used to be hard becomes easier. Concurrency is a bit different from garbage collection, though. When garbage collection works well, which is often, programmers don't really have to reason about memory management much (or even understand it!) That seems unlikely to be the case for concurrency: it's still going to be necessary to reason about it, even if the way concurrency requirements are expressed becomes more high-level. So the concern that programmers will misuse the feature due to lack of understanding of the underlying issues seems to have more of a basis in this case, than in the case of garbage collection.

(This just helps me understand why STM might have provoked a negative reaction, I'm not agreeing with the reaction.)

GC works, programmers don't

When garbage collection works well, which is often, programmers don't really have to reason about memory management much (or even understand it!)

Tell that to my poor desktop with 1.5gb of RAM entirely occupied. Or to our customers servers running J2EE. People don't get memory management and just hope that the GC will magically keep the memory usage low, the same thing happens today with concurrency and will continue to happen, even if the entire industry adopts Haskell/STM or Erlang. Also as with memory management, concurrency solutions will keep showing up, even if the industry adopts STM as it adopted GC: there's still research on linear resources and regions because GC isn't enough. Programming is a very hard activity and all tools can (and probably will be) misused. Today "memory is cheap" solves all memory leaks problems, tomorrow starvation and friends will require the "cores are cheap" approach.

Tell that to my poor

Tell that to my poor desktop with 1.5gb of RAM entirely occupied. Or to our customers servers running J2EE. People don't get memory management and just hope that the GC will magically keep the memory usage low, the same thing happens today with concurrency and will continue to happen, even if the entire industry adopts Haskell/STM or Erlang.
J2EE is the antithesis of concurrent programming -- it is the new mainframe. Cram everything into one JVM, then put a lot of textual boilerplate around it to make it appear "loosely coupled". Better concurrent programming tools would allow these systems to "spread out".

The same is true of desktops, certainly. Until recently, though, they've not had multiple CPUs readily available, so there was nowhere to spread out to. Now that there is room to spread out, the languages we've been using prevent it.

STM will not help spread systems out. Message passing and shared-nothing sequential programming will. Most programmers should be learning about messaging and most tool development should be in support of them. STM is a shiny tool in the best interest of very few programmers.

Transactions

J2EE is the antithesis of concurrent programming -- it is the new mainframe.

J2EE is a collection of a whole lot of different standards. It includes successes such as servlets and JSP, failures such as EJB 1.0, and JMS: Java's message passing API, which I thought was what you were arguing for. And it's hard to see how a language (Java) with threading and synchronization primitives built in can be the "antithesis of concurrency". Even if we stipulate all the complaints about the model itself, it's undeniably deeply concurrent.

I'm unclear what you mean by the "new mainframe" remark. Did you mean that J2EE is a highly successful high-throughput centralized system? Somehow the remark came off as negative; perhaps this is one of those from-the-hip barbs you admitted to elsewhere?

STM will not help spread systems out.

You don't say what "spread out" means; my best guess is distributed computing. If so, when you say that transactions will not help with distributed computing you are correct. However this is unsurprising since transactions do not address distributed computing or communication of any kind; they address atomic updates. We could just as well have pointed out that message passing doesn't help with atomic updates, or that type inference doesn't help with male pattern baldness. The dichotomy you are pushing is entirely a false one: transactions and message passing are independent techniques. We can have one or the other or both or neither.

I happen to think that message passing is a good technique, although I lack the brazen confidence to say that it is what "most programmers should be learning about." As to who transactions will benefit, the answer is that it will benefit programmers who are working on software that requires atomic updates. Whether this is a "very few" programmers or not I have no idea; I know it includes me.

Finally, I note that the repeated characterization of people studying transactions as being driven by "shiny" things is inflammatory and counterfactual. If you don't care to address or even acknowledge the various arguments in this thread against your assertions, that is your prerogative, however if you could do so without insulting me I would appreciate it.

You don't say what "spread out" means; my best guess is distributed computing. If so, when you say that transactions will not help with distributed computing you are correct. However this is unsurprising since transactions do not address distributed computing or communication of any kind; they address atomic updates.

I believe his core point was that both techniques can safely address concurrency, but that the message-passing abstraction also permits distribution with no additional machinery; it is thus ultimately more expressive and useful. Perhaps a CS version of Occam's razor is in order: don't multiply abstractions unnecessarily.

Irony

Perhaps a CS version of Occam's razor is in order: don't multiply abstractions unnecessarily.

Ironically, the occam programming language was developed as a message-passing language capable of expressing both concurrency (most occam programs were composed of networks of parallel "processes" which communicate through message-passing) and distribution (occam processes could be transparently distributed across a network of transputer microprocessors). IIRC, the name of the language was chosen because Occam's Razor was a good expression of the underlying philosophy of the language.

Optimistic concurrency control is the big question mark

I've written (with some other students) an implementation of STM for OCaml. The implementation was dirt simple; it took less than five hundred lines of code, and that includes extras like first-class transactions and optimizations to eliminate overhead in the first-order case. So I don't believe it's a very complex model.

However, the giant question mark for me is the automatic rollback and retry. Being optimistic and rolling back on failure is extremely efficient when it works, but when it doesn't it can be very expensive. I just don't know how to predict the performance of an STM program, and I worry about debugging programs that fail to meet performance targets, because they may fail to hit targets nondeterministically.

And Maurice Herlihy (the guy who has mostly been pushing the idea) has never been quiet about this being the big research question behind STM.

If we add STM as a tool high

If we add STM as a tool high in the CTM hierarchy of statefulness, where we take it for granted that we are not using any more state than we really need to to solve our problem elegantly, I'm much less worried, of course.

I guess this is my perspective on the whole thing, which makes me see this as something of a mountain out of a molehill. I realize that STM and, for example, Mnesia, aren't precisely the same thing, but as compared to message passing, they're certainly more similar than different. So arguing for Erlang rather than STM seems a little short-sighted, when the Erlang community has already acknowledged the need for sharing at this level, and has in fact embraced and bragged about it! I'd refer back to PVR's Convergence paper.

The only thing I really have to say about the original blog post is that yes, Patrick is someone whose opinions I take seriously and yes, the current format of the post (99% comments by now?) makes it virtually impossible to tell what it originally said...

Em, peek. Is it safe?

I realize that STM and, for example, Mnesia, aren't precisely the same thing, but as compared to message passing, they're certainly more similar than different. So arguing for Erlang rather than STM seems a little short-sighted, when the Erlang community has already acknowledged the need for sharing at this level, and has in fact embraced and bragged about it!
The difference being that programming in Mnesia, even the "in memory" variety, is still deliberately a "database" thing. Most everything should be in Erlang.

Many of the Erlang processes might very well be "database-like" stateful things, with the Erlang message queue and pattern matching essentially acting like a transaction processing monitor. No Mnesia needed.

I can sympathize with game development. I've been a developer for digital electronics simulators where 80% is just objects and 20% is highly tuned C. I would imagine that the average game developer doesn't even get involved in that highly tuned part on a daily basis.

So it's not you I am worried for. It's the average game developer, the average financial systems developer, the average business systems developer. While the mechanism per se worries me, it's the increasing fascination with it that bothers me most.

Wow. I throw barbs like this out fairly often, shooting from the hip if you will, to see what comes back. I had no anticipation for this much feedback one way or the other. Very enlightening. I should have known it would be from the LtU crowd!

I had no anticipation for this much feedback

You should post here more often... :-)

For those new to LtU: Patrick was a regular contributor in the early days of LtU, and I regularly try to persuade him to come back...

I Wish

I barely have time for my half-baked ideas on my own blog! Then I hit on something controversial like this and the firehose reminds me...

You see what I have to deal

You see what I have to deal with here, guys... ;-)

a bit more...

Nice to see you on LtU again!

Upon further reflection, I do have pretty big doubts about STM for Java/C#/etc. programmers... I had only encountered STM in the Haskell setting, and never really imagined that anyone would try to apply it in a mainstream language without radical changes to the host language.

It seems to me that, with the possibility of rollback and retry, STM requires a change to our understanding of side effects at least as radical as a switch to pure FP. Perhaps in the end the programming model won't have to change as much as switching to Haskell, for example, but I think that the understanding of side effects will have to change. Obviously the nice fit between Haskell and STM is well understood, but the other side of that coin is that this is a potentially huge hurdle for languages like Java.

Does anyone have a pointer to work on bringing STM to one of these mainstream languages, particularly solutions to IO operations and other side effects? Will we have to overhaul the entire JDK, for example? Even so, won't programmers need to understand that prompting a user for input needs to be "committed" before the program actually waits for that input?

STM package written in C# published 2005

Seems like Microsoft Research published a C# Software Transactional Memory package back in 2005...

The SXM is a software transactional memory package written in C#. It is much easier to use than prior STMs because it uses Reflection.Emit to transparently add synchronization code to sequential data structures.

the current format of the post (99% comments by now?) makes it virtually impossible to tell what it originally said...

Yeah, I was personally a bit annoyed by the fact that he duplicated and interspersed my response into the original post along with his replies. Replying in a comment would have been better form, I think. *hint*

While his point is

While his point is interesting, he must be an Outlook user: comments are before the article, etc.

The result is quite obscure, bleh.

Outlook User?

No. Just busy and sloppy.

No Sense

And no, I can't make heads or tails out of the original blog post anymore either.

Why Not Both?

In my day job working on databases I use transactions AND message-passing all the time. Transactions need to be kept small and fast, or else locking problems start hurting performance - and those problems aren't easily predictable, because different users accessing a common store (database) with different requirements can interact in obscure ways.

The answer (in my experience) is to reduce shared state to an absolute minimum, use transactions with their Undo, Retry or Leave facilities to handle clean updates to global state and then message passing (via that state) for the top level processes.

It seems to be that the real crux of this argument is that STM in an imperative language like Java would be a terrible idea, but not in a language like Haskell in which the state that STM co-ordinates is both minimised and compartmentalised.

Good Point, IMHO

Throughout this thread, I've been mulling over how the issues Patrick raised about STM, and what I believe to be the more reasonable responses to them, relate to Functional Relational Programming: Out of the Tarpit, which I still need to make time to go over in more detail.

I think you nail it in your

I think you nail it in your last paragraph.

Why terrible?

I wouldn't be so quick to assume that this is such a "terrible idea" in Java, at least from a technical perspective (unless there are technical barriers that haven't been raised yet).

Quite a bit of the use of Java in server-side corporate/enterprise work is quite stateless and transaction-oriented, and relies largely on the use of traditional databases to achieve that. As main memory gets bigger & cheaper, having ways to replace some of that database usage with similarly-architected in-memory solutions could be useful. Just one example.

In theory.

Exactly!

I might be the one exception, or maybe it's because my only practical exposure to STM is in Haskell, but I never even considered STM and message passing to be competing technologies!

I've always assumed that the consensus was "use implicit-ish approaches for parallellism where possible, use message passing for concurrency where suitable, and use STM at the very 'bottom' when you absolutely need shared state".

It seems like this whole argument here is about choosing "either or", and that it completely disappears once you look at it through the lens of a language like Haskell -- functional without state or with local state for the majority of the cases, message passing concurrency for most of the concurrent stuff underneath, and every now and then shared state with STM.
In my opinion that gives a really good toolkit which doesn't limit you in what kind of applications you can write conveniently. I think Tim Sweeney gave very good examples of when message passing just doesn't do the job -- can we please stop pretending that either approach solves all problems? They complement each other nicely -- people are just extra excited about STM because it very nicely solves the remaining hard problems after you've started using message passing everywhere it makes sense (locks, monitors, lack of composability etc. in the cases where you absolutely do need shared state).

I'm currently working in the games industry as well, so it may be that games are just inherently ill-suited for message passing, but I doubt it; I'm sure lots of applications would do just fine with just message passing, but shared state will always be needed here and there.

STM in Erlang?

Hm, does that mean it would be great to have STMishness implemented for Erlang, to get the best of all possible worlds?

When I say STM I should be

When I say STM I should be clear that I mean "STM in Haskell". For example you can statically guarantee that only transactional actions can be performed in an "atomic" block, no others, and you can't run transactions without "atomic" either. (So all of those "people will forget and cause bugs, or abuse it and cause performance issues" complaints that have been raised, are dead wrong -- it can only be used for transactions thanks to the type system and it can never be forgotten and cause bugs, again thanks to the type system).
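
A minimal sketch of that type-level guarantee, using GHC's standard Control.Concurrent.STM library (the function name and amounts here are just illustrative):

```haskell
import Control.Concurrent.STM

-- transfer lives in the STM monad, so it can only perform transactional
-- actions on TVars; calling an IO action like putStrLn inside it would
-- simply fail to type-check.
transfer :: TVar Int -> TVar Int -> Int -> STM ()
transfer from to amount = do
  balance <- readTVar from
  check (balance >= amount)        -- retries the transaction until this holds
  writeTVar from (balance - amount)
  writeTVar to   (balance + amount)

main :: IO ()
main = do
  a <- newTVarIO 100
  b <- newTVarIO 0
  atomically (transfer a b 30)     -- atomically is the only way to run it
  readTVarIO b >>= print           -- prints 30
```

Conversely, an STM action can't be run outside of atomically, so a transaction can never be started by accident either.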

So to answer your question, I do think STM comes into its own in a pure and statically typed setting, so Erlang wouldn't be my first choice. Haskell comes very close to the "ideal" (it already has STM and message passing, and purity). In order for messages to be truly useful we may need some more help from the type system (e.g. better records? union types?).

STM in Erlang

Interesting to see some discussion around implementing (something like) it in Erlang.

Would STM even be possible in Java?

It seems to me that STM requires a pure language. As I understand it, STM requires that transactions can be retried as many times as necessary with the same meaning each time.

In Haskell, encapsulation within the STM monad ensures that the transaction has no dependencies other than the STM variables. Would this be possible to enforce in a language like Java?

Could it be done using a pure sublanguage (based on Featherweight Java perhaps)? Ignoring the issues of mutable state outside of the system (e.g. I/O), would treating the entire heap as an STM store be feasible?

Apparently

Somebody thinks so.

TFA^2

The article which seems to have triggered all this also describes how it might work in Java (e.g. "how an atomic statement could be introduced and used in an object-oriented language such as Java.")

Note that two of the authors of the article are from Intel, which has an obvious interest in promoting solutions that might help with multicore architectures.

The main problem

The main problem which languages other than Haskell face in implementing STM is defining a sublanguage of things which are permissible in transactions and can be rolled back transparently in case of a retry or exception, and then ensuring that nothing else accidentally lands in a transaction. You can include more than just interactions with transactional variables in this -- actions which simply read from the environment without affecting anything would also be okay. A language like Java would almost certainly need some extensions to its type system to support this. You could do it in dynamically typed languages like Python and Ruby as well, but you do need to take extra care. Personally, in cases like that, I'd probably want to compile the whole transaction and make sure it's okay before running any of it. Throwing an exception at the first bad action (and not performing it) would also work, but makes me feel somewhat uneasy. That might just be my tendency toward static guarantees though.

Setting aside the need to check that transactions really consist of valid actions, yes, you could go ahead and make every variable a transactional variable. That seems a little strange to me, but in principle it could be done. You'd really have to work on making the transaction machinery lightweight. Every variable update or read not already in an atomic block would become a small transaction in itself. Note that atomic blocks that have read the variables you're modifying might be running optimistically, and they will have to be notified that they must restart, because something has trampled on their read set.

You would probably want whole assignment statements like:

x <- f(x) + g(y)

to occur as-if-atomically, so this might translate at a lower level into something along the lines of:

atomically
  x' <- read x
  y' <- read y
  write x (f(x') + g(y'))


where read and write are the primitive transactions for accessing transactional variables, so x' and y' are the actual data contents of x and y respectively. (This translation obviously depends on a whole lot of properties of your actual language. Here I'm assuming that f and g want actual values and not references as parameters, and that they are not themselves variables.)
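
In Haskell's actual STM library, that translation would come out roughly as follows (the definitions of f and g are just stand-ins for the functions in the text):

```haskell
import Control.Concurrent.STM

-- Stand-ins for the f and g above.
f, g :: Int -> Int
f = (* 2)
g = (+ 1)

-- The assignment  x <- f(x) + g(y)  as one transaction: both reads and
-- the write either all take effect together or are all rolled back.
assignAtomically :: TVar Int -> TVar Int -> IO ()
assignAtomically x y = atomically $ do
  x' <- readTVar x
  y' <- readTVar y
  writeTVar x (f x' + g y')

main :: IO ()
main = do
  x <- newTVarIO 10
  y <- newTVarIO 4
  assignAtomically x y
  readTVarIO x >>= print   -- prints 25, i.e. 2*10 + (4+1)
```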

The only trouble I see with this, aside from the obvious performance concerns, is that it does almost nothing to push users in the direction of limiting the state that they share. When you have to explicitly construct transactional cells and hand them to your threads, it gets you to think about something which is fundamentally important to the design and well-being of your program. Then again, so do lots of features of Haskell which people using other languages like to ignore. ;)

STM locks and byzantine shared state

I've got some comments from personal experience of implementing a "kind of" STM in Askemos - a somewhat unusual distributed environment (somewhat akin to Erlang anyway), where STM suddenly makes sense.

I don't see shared state as a Devil

I may be naive, but I don't see shared state as a Devil to eradicate.

On the other hand, I do not believe that programmers should ever have to deal with locks-- they are way too easy to mess up.

So I will take any idea that allows parallel programming without locking. I like the idea of rollback and retry, but I am not sure I want to do that any time that a sharing conflict arises. Sometimes I may want to keep whatever value was written last, or first, or the one matching a predefined criteria without retrying the "losing" transaction. In fact if I was to chose I would not make rollback and retry the default.

STM may be composable with itself, but is it composable with other concurrency mechanisms? For example, how do I "rollback" message passing? Do I need to delay it until the transaction commits? Is it illegal to receive a message during a transaction? What if the program decides to parallelise transactions in a loop (should be safe), causing unnecessary conflicts?
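
For what it's worth, Haskell's STM answers the message-passing part of this by making the channel itself transactional: a send performed inside an atomic block is buffered and only becomes visible when the transaction commits, so a rollback undoes the send along with everything else. A sketch (the names are mine):

```haskell
import Control.Concurrent.STM

-- Decrement a shared counter and announce the new value on a channel,
-- as one transaction: if it retries or rolls back, the send is undone too.
decrementAndNotify :: TVar Int -> TChan Int -> STM ()
decrementAndNotify counter chan = do
  n <- readTVar counter
  check (n > 0)            -- blocks (via retry) until the counter is positive
  writeTVar counter (n - 1)
  writeTChan chan (n - 1)  -- delivered only if this transaction commits

main :: IO ()
main = do
  counter <- newTVarIO 1
  chan    <- newTChanIO
  atomically (decrementAndNotify counter chan)
  atomically (readTChan chan) >>= print   -- prints 0
```

Receiving works the same way: readTChan inside a transaction simply retries until a message is available, so a receive that is rolled back leaves the message in the channel.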

I mean, all design choices ultimately come down to which costs you are willing to pay and which ones you are trying to minimize, right? In an ideal world, memory and CPU would be infinite, all data would be immutable and every computation would be reversible, right? Programming languages are simply compromises on that ideal that allow them to be implemented on physical computers with real constraints that sometimes dictate practical and impractical design choices.

Now I'm the last person in the world that could be called an expert on concurrency, so I need a simple, concrete example that I can wrap my head around. In the game World of Warcraft, there are mining nodes which players can run up to and mine, receiving raw materials as reward. However, the resource is limited and mutually exclusive, so only one person can mine a resource at a time, and it can only be mined four times or less before disappearing.

I'm imagining a game engine that would support this feature in which each player was represented by a dedicated thread/lwp. I'm trying to figure out how STM and MP compare in this task, so my simplistic understanding leads me to these two designs:

For the STM design, each player thread attempts to access the mining node optimistically. If they are the only player attempting to mine the node, they succeed, and a packet is sent from the server to the client indicating success. If two players attempt to mine the same node, one of them succeeds first, and sends a success packet, while the other one fails and must roll back, sending a failure packet. This scales up to N players.

For the MP design, the mining node is owned by a dedicated thread that receives mine-request messages from the player threads and processes them one at a time. A player thread sends a request and waits for the reply; the node thread decrements the remaining charges and answers with a success or failure message, which the player thread then forwards to the client as the corresponding packet. This also scales up to N players.

They both seem like perfectly reasonable approaches to the problem, with the difference being that the MP solution requires an additional thread (which may already exist for other reasons). Furthermore, even in the non-contentious case, the MP version requires communication between two threads, whereas the STM version is basically as fast as a single-threaded implementation. In the highly contentious case of, say, 20 people all trying to mine the node at the same time, the STM case would force a potentially expensive rollback for all but one of the threads, while the MP case would simply send 20 messages both ways (but at the cost of serializing the message processing, since the mining thread can only process one message at a time).

From this, I naively conclude that STM is probably faster in the optimistic case, and that MP probably degrades better under contention. If you know a priori whether a given interaction is likely to have low or high contention, then it seems to be a matter of sound design to choose one or the other, rather than dogmatically insisting on One Concurrency Solution To Rule Them All.
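
The STM side of that comparison can be sketched in a few lines of Haskell (a toy model of my own, run sequentially here; in the real design each attempt would run in its own player thread):

```haskell
import Control.Concurrent.STM

-- A mining node holding its remaining charges (at most four).
type Node = TVar Int

-- One optimistic mining attempt: decrement and succeed, or report failure.
-- Under contention the STM runtime rolls back and re-runs the losers.
mine :: Node -> STM Bool
mine node = do
  charges <- readTVar node
  if charges > 0
    then writeTVar node (charges - 1) >> return True
    else return False

main :: IO ()
main = do
  node <- newTVarIO 4
  results <- mapM (\_ -> atomically (mine node)) [1 .. 6 :: Int]
  print results   -- prints [True,True,True,True,False,False]
```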

But, I'm open to being convinced otherwise.

P.S.

I forgot to mention that Erlang users are probably more likely to favor MP because threads are extremely cheap, while Java users are more likely to favor STM because threads are considerably more expensive. So it isn't necessarily that one group is right or wrong, but that the intrinsic costs involved make one solution or other more or less favorable. But maybe I just don't know what I'm talking about.

Intel STM compiler now available

We have a lot to learn before we can decide whether STM offers some relief from locks (they are NOT going away) and offers help for programming, or for tools which compose programs automatically.

We think that the existence of a C/C++ compiler supporting Software Transactional Memory (STM) would be a great help. So...

Today, we released a prototype version of the Intel C/C++ Compiler with support for STM. It is available from Whatif.intel.com.

The Intel STM Compiler supports Linux and Windows producing 32 bit code for x86 (Intel and AMD) processors. We hope that the availability of such a prototype compiler allows unprecedented exploration by C / C++ software developers of a promising technique to make programming for multi-core easier.

This prototype compiler download requires that you already have the Intel compiler installed - our web site explains how to get an evaluation copy (free for limited time) for the compiler to install, or Linux users can get a 'non-commercial' license - either are enough to use the STM compiler download.

Releasing this compiler offers an opportunity for exploration and learning. I think we should all hope that it helps understand the promise or myth of the value of STM better. The opinions certainly run the full range today - as this blog indicates all too well!

Erlang / Actor Model + STM

There have been some recent discussions on the benefits and mechanisms of supporting transactions even in Actor Model languages such as Erlang.

I think there are plenty of good arguments for avoiding shared memory process models (where the primary vocabulary is 'get' and 'set'), but the Actor Model doesn't avoid shared state because each actor may be stateful. Thus, transactions still have a very important place when it comes to maintaining consistency and making concurrency and complex negotiations or coordination events manageable for programmers.

The Erlang model of concurrency by no means reduces the value of transaction-based coordination... effectively STM + Actor Model.

[Edit: I've since written a page on this subject; I believe it well enough formalized to implement.]