Seeking broad survey of wide range of concurrency idioms

Ideally, I'd love to find something like Wilson's GC survey, covering everything from low-level mechanisms such as raw thread/lock manipulation and condition variables, up to more stylized, idiomatic approaches to concurrency - Erlang-style actors, internal pipes or what I have heard called "flow-based programming", co-routines, and inversion-of-control strategies (not just performance-based approaches).

Also, I'm interested in "failed" or, rather, currently unpopular approaches such as parallel-apply (Connection Machine Lisp ?????) and so forth.

While not essential, it would be great to have examples explained in the context of what can be done in a strongly typed language.

Any and all pointers to such materials greatly appreciated.

Scott


Not many "We tried method XXX and it wasn't so Hot" papers.

I'd love to see such a paper myself.

Sadly it will be a collection of "Method X is cool because" hype rather than "we burned several million dollars using XXX idiom in an industrial scale system, and it was always flaky and hard. Things got better when we ripped it out and did YYY"

(PS: We burned lots and lots on slapping Java synchronization blocks around everything we believed needed them, and the result was slow, buggy and unreliable. We ripped them out and went to event queues, and things got better.

We're using event queues in our next system... and it only burns us when some sod breaks the paradigm.

I burned man months on Ruby threads and Condition Variables, things got better when I ripped them all out and did processes and "select" instead.)

Sadly it will be a

Sadly it will be a collection of "Method X is cool because" hype rather than "we burned several million dollars using XXX idiom in an industrial scale system, and it was always flaky and hard. Things got better when we ripped it out and did YYY"

The reason people rip things out in industry settings has to do with software not providing a factor-of-3 or better margin of safety, which is the industry-standard safety guarantee.

The London Stock Exchange started ripping out its entire Microsoft stack 3 years ago, including .NET, and is now running pure Linux as of about a month ago. Why? Margin of safety, related to the lack of good service buses with integrated complex event processors in .NET, etc. ... and Windows Server's inferior TCP/IP stack.

Practical concurrency idioms

Shared State Concurrency: The C/C++/Java/C# approach. Any state can be mutated by any thread at any time; the threads are responsible for using synchronization primitives (locks, mutexes, etc.) to coordinate accesses to shared state.
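A minimal sketch of this style, in Python standing in for the languages named above (the names `add_many` and `counter` are just illustrative): several threads mutate one shared variable, and each must take a lock to do so safely.

```python
import threading

counter = 0
lock = threading.Lock()

def add_many(n):
    """Each thread must take the lock before touching shared state."""
    global counter
    for _ in range(n):
        with lock:           # without this, concurrent increments can be lost
            counter += 1

threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000: the lock makes each read-modify-write atomic
```

Note that the coordination is entirely the programmer's responsibility: nothing stops a fifth thread from writing `counter` without the lock.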

Communicating Sequential Processes / Message Passing Concurrency: Erlang's approach. Each process can only access its own local variables; processes communicate by sending messages to each other.
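A hedged sketch of this style, using Python threads and queues as stand-ins for Erlang processes and mailboxes (`counter_process` is an invented name for illustration):

```python
import threading
import queue

def counter_process(inbox, replies):
    """All mutable state is local to this 'process'; the outside world
    can only influence it by sending messages to its inbox."""
    total = 0
    while True:
        msg = inbox.get()
        if msg == "stop":
            replies.put(total)   # the reply is itself a message
            return
        total += msg             # local state only; no locks needed

inbox, replies = queue.Queue(), queue.Queue()
worker = threading.Thread(target=counter_process, args=(inbox, replies))
worker.start()

for i in range(1, 101):
    inbox.put(i)     # communicate by message, never by shared variable
inbox.put("stop")
worker.join()

result = replies.get()
print(result)  # 5050
```

Because no state is shared, there is nothing for the sender and receiver to race on; ordering is imposed only by the mailbox.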

Pure Functional Programming: All functions are free of side-effects, so in any piece of code, its subexpressions can be evaluated in any order -- or concurrently -- without changing the result of the computation. This can be generalized to Lazy Functional Programming, where thunks represent as-yet-uninitialized values, causing threads that access them to suspend until the result is available. It can be further generalized to Concurrent Constraint Programming, where thunks (futures) can be specified (narrowed) at places other than their declaration site. When an uninitialized thunk is encountered, only the code that directly depends on it is suspended, while execution continues forward. The next step is Concurrent Constraint Logic Programming, adding backtracking as a means of exploring potential execution paths in a program concurrently.
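The "suspend on an unready thunk" behavior can be roughly approximated with Python's `concurrent.futures`, used here only as a stand-in for dataflow values, not as an implementation of any of the paradigms named above:

```python
from concurrent.futures import ThreadPoolExecutor

def slow_square(x):
    # A pure function: no side effects, so evaluation order is irrelevant.
    return x * x

with ThreadPoolExecutor() as pool:
    # Each submit() returns a future: a thunk whose value may not exist yet.
    thunks = [pool.submit(slow_square, x) for x in range(5)]
    # .result() suspends the caller until the value is available, so
    # however the pool schedules the work, the answer is the same.
    results = [f.result() for f in thunks]

print(results)  # [0, 1, 4, 9, 16]
```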

Transactions: The programmer subdivides effectful code into delimited blocks, and the language runtime is responsible for running the blocks concurrently while guaranteeing that each one operates atomically (without interference from other blocks), dynamically resolving dependencies.
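A toy sketch of the optimistic (retry-on-conflict) flavor of this idea; real transactional memories track read/write sets across many variables, but a single versioned cell shows the shape of it (`TxCell` and `atomically` are invented names):

```python
import threading

class TxCell:
    """A toy optimistic 'transactional' cell: read a snapshot, compute,
    then commit only if nobody else committed in between (else retry)."""
    def __init__(self, value):
        self.value = value
        self.version = 0
        self._lock = threading.Lock()  # guards commits, not the computation

    def atomically(self, fn):
        while True:
            with self._lock:
                seen_version, snapshot = self.version, self.value
            new_value = fn(snapshot)            # run the block outside the lock
            with self._lock:
                if self.version == seen_version:  # no interference: commit
                    self.value = new_value
                    self.version += 1
                    return new_value
                # else: another block committed first; retry from scratch

cell = TxCell(0)
threads = [threading.Thread(target=lambda: [cell.atomically(lambda v: v + 1)
                                            for _ in range(1000)])
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(cell.value)  # 4000: every increment commits exactly once
```

The programmer writes only the block passed to `atomically`; detecting interference and re-running the block is the runtime's job.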

Massively parallel implies atomic OO design

Tim Sweeney wrote:

Shared State Concurrency: The C/C++/Java/C# approach. Any state can be mutated by any thread at any time; the threads are responsible for using synchronization primitives (locks, mutexes, etc.) to coordinate accesses to shared state.

Isn't the massive parallelism of gaming implicitly going to require atomic, orthogonal OO:

http://lambda-the-ultimate.org/node/2048#comment-51596 (my new post in an old thread)

The race condition shouldn't be avoided, but rather embraced algorithmically as natural in the OO class model of the real massively parallel world. Then transactions (multiple-object atomic locks) are neither desired nor needed. Am I missing something (please see linked post)?

Am I missing something

Just below shared state concurrency, message passing is listed. This corresponds to the flavor of OO you seem to prefer, which has its roots in real-time embedded systems engineering. It is based on ultra-simple design principles: finite state automata, organizing "classes" at the analysis stage according to the problem domain, and specifying which classes collaborate with each other via navigation relationships. In this model, behavior is normalized as pure structure and data is normalized to BCNF, allowing the model to be checked for counterexamples / conceptual inconsistencies.

This is more systems design, though. A major goal of programming language design is to figure out how to represent this stuff at the implementation level.

Actors / OO Concurrency + Transactions

The race condition shouldn't be avoided, but rather embraced algorithmically as natural in the OO class model of the real massively parallel world.

I'd much rather my programs be easier to compose and comprehend than is the real massively parallel world...

Composition means the ability to construct a new service by referencing two or more other services and build contracts between them to benefit a user.

The ability to achieve a complex contract in multiple steps is extremely useful (since it allows negotiations as part of building the contract).

The ability to achieve a complex contract atomically is also extremely useful (since it avoids partial failures and half-built services).

The ability to achieve a complex contract in isolation is also extremely useful (since it allows programmers to grok code locally, without knowing or considering all possible external interference).

To achieve these properties together, without risk of deadlock, I can think of no simpler solution than the use of transactions.

For transactions to work across code-ownership and module boundaries, they must be part of the language.

A few references: Process Calculi for transactions [LtU node], Transactional Actor Model [c2]. The latter is my own work, and has a simple motivating example.

Per-instance mutex granularity

I have further theorized that typical algorithms in the game genre require per-instance mutex granularity, whether they be incrementally per-instance thread-atomic or multi-instance transactionally atomic:

http://lambda-the-ultimate.org/node/2048#comment-51617

Z-bro wrote:

A major goal of programming language design is to figure out how to represent this stuff at the implementation level

But the algorithm has to match well with the language's realistic parallelism (optimization) capabilities. For example, the conceptual intrigue over lenient vs. lazy, and functional versus imperative programming, seems to be an attempt to find the Holy Grail of one language that can optimize all algorithms. We may be able to find larger partitions of algorithms and language expression that optimize well, but it seems unlikely we will find global compiler optimizations that outperform all cases of simply thinking about matching algorithms to language a priori.

dmbarbour wrote:

I'd much rather my programs be easier to compose and comprehend than is the real massively parallel world...

Composition means the ability to construct a new service by referencing two or more other services and build contracts between them to benefit a user.

The ability to achieve a complex contract in multiple steps is extremely useful (since it allows negotiations as part of building the contract)...

Per-instance atomicity of algorithms should be the easiest to understand and compose, because the entropy has been maximized; according to the 1856 statement of thermodynamics, "the trend of the universe is maximum disorder", so we are not fighting against the universal trend of matter, thought, evolution:

http://esr.ibiblio.org/?p=1271#comment-240632
http://esr.ibiblio.org/?p=1271#comment-240650
http://esr.ibiblio.org/?p=1271#comment-240672
...(Eric Raymond's blog, author of Cathedral & Bazaar)

As soon as you introduce global state (decrease entropy), it will cascade and cause breakage of composition at the per-instance (class) granularity.

Think about how we interact in the real world: most productivity comes from infinite permutations of interactions (i.e. freedom). See the comparison table here for an illustration of freedom:

Bell Curve Economics

Those institutions that try to become the global organizers and be God/Universe (i.e. society, govt) eventually exponentially peak, break, and decay (wasting 99% of the productive capability of the world).

Analogously, for global contract algorithms, this is why we are always rewriting code. We need to minimize these global contracts in order to maximize code re-use and composition.

Writing algorithms that propagate incrementally between autonomous actors should not be impossible. We program each actor to independently model its incremental interaction with its environment. But how to handle instance ordering in an algorithm that needs to propagate globally? In nature, these things happen in parallel instantly, so perhaps order doesn't matter if we have one core (thread) for each instance. Hmmm, thus multi-instance transactions are necessary because we do not have enough cores, not because we have too many cores -- an opposite way of thinking about the future.

Then, if I am correct, composition is writing a new class actor. That actor cannot break any other actor already programmed in the environment.

Perhaps we need to be thinking more about algorithms for the next genre of multi-core, and not get too far off track on the Holy Grail of minimized entropy (the perfect one-size-fits-all algorithms language paradigm), which at rough glance looks to me to be a false God/Universe/Idol (2nd Commandment). There is only one universal truth, and that is the trend to maximum entropy/disorder, i.e. the maximum number of independent actors and permutations of interactions. Along the way, we can get off on local order. Local orders (code we throw away every few years) are a natural progression towards the universal trend of maximum disorder.

Apologies in advance if this post is overly philosophical. I tend to think in terms of paradigm shifts, as these have the biggest impact on computing. Each of the greats (e.g. Tim Sweeney) made their mark with a big paradigm shift (often by luck and not forethought). I think the big paradigm shifts are usually at a higher level than the language.

Philosophical macro view

I am thinking about distributed computing and its application to the organization of projects:

http://esr.ibiblio.org/?p=1332&cpage=1#comment-241469

Multi-threading problem space validates My Theory of Everything?

http://lambda-the-ultimate.org/node/2048#comment-51638

Interesting. I wonder if anyone else will latch on to the concepts.

Global?

Again, traditional Actors Model, from 1973, matches your notion of 'per-instance mutex granularity'. Technically, there isn't any need to instantiate mutexes, only to ensure that no more than one message enters a given actor at any given time (which may be done using lock-free algorithms and judicious use of single-pointer-width compare-and-swap.) One may optimize a bit further by recognizing stateless actors and allowing them to process any number of messages at once.
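Single-pointer CAS does not translate directly to Python, but the guarantee described (no more than one message inside a given actor at a time) can be sketched with one mailbox thread per actor, which serializes message handling without any per-message locking. This `Actor` class is a toy for illustration, not the 1973 model:

```python
import threading
import queue

class Actor:
    """One mailbox thread per actor: messages are handled strictly one at
    a time, so the handler body never needs its own lock."""
    def __init__(self):
        self.mailbox = queue.Queue()
        self.count = 0
        self._thread = threading.Thread(target=self._loop)
        self._thread.start()

    def _loop(self):
        while True:
            msg = self.mailbox.get()
            if msg is None:          # poison pill: shut down
                return
            self.count += msg        # unlocked: only this thread touches state

    def send(self, msg):
        self.mailbox.put(msg)

    def stop(self):
        self.mailbox.put(None)
        self._thread.join()

a = Actor()
senders = [threading.Thread(target=lambda: [a.send(1) for _ in range(1000)])
           for _ in range(4)]
for t in senders:
    t.start()
for t in senders:
    t.join()
a.stop()

print(a.count)  # 4000: no increments lost, despite four concurrent senders
```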

As soon as you introduce global state (decrease entropy), it will cascade and cause breakage of composition at the per-instance (class) granularity.

It is unclear to me how 'global state' is implied by an argument for transactions, especially in context of just having offered you a reference to discussions on transactions for actors model and process calculi.

However, it's worth pointing out that actors capture more than a little state, and that any notion of 'global state' corresponds to a bunch of actors sharing a common reference.

Per-instance atomicity of algorithms should be the easiest to understand and compose, because the entropy has been maximized; according to the 1856 statement of thermodynamics, "the trend of the universe is maximum disorder", so we are not fighting against the universal trend of matter, thought, evolution

There seems to be a logical leap here. I imagine a man setting his house on fire and saying, "It's easiest to build houses this way, because I'm not fighting the universal trend of matter, thought, and evolution. Embrace entropy, dude!"

Technically, there isn't any

Technically, there isn't any need to instantiate mutexes, only to ensure that no more than one message enters a given actor at any given time (which may be done using lock-free algorithms and judicious use of single-pointer-width compare-and-swap.)

Right. This is called run-to-completion semantics. As for your judicious use, this problem must be solved at two levels. First, if the problem domain is wrong, you cannot easily build an asynchronous state machine and process events off a queue; it's too easy for an event to be corrupted due to violating RTC semantics. For a great discussion of the pitfalls of implementing state machines, see Ch. 6 of Miro Samek's book Practical Statecharts in C/C++.
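A minimal illustration of run-to-completion dispatch, assuming a toy door state machine (the `Door` class and its transition table are invented for this sketch): each event is dispatched completely, through the transition table, before the next event is dequeued.

```python
import queue

class Door:
    """A tiny run-to-completion state machine: each event is processed
    fully before the next one is taken from the queue."""
    def __init__(self):
        self.state = "closed"
        self.events = queue.Queue()

    def dispatch(self, event):
        # Transition table; events with no entry are simply ignored,
        # so no event can observe or corrupt a half-done transition.
        transitions = {
            ("closed", "open"): "opened",
            ("opened", "close"): "closed",
            ("closed", "lock"): "locked",
            ("locked", "unlock"): "closed",
        }
        self.state = transitions.get((self.state, event), self.state)

    def run(self):
        while not self.events.empty():
            self.dispatch(self.events.get())  # run to completion, one at a time

door = Door()
for e in ["open", "close", "lock", "open", "unlock"]:
    door.events.put(e)   # "open" while locked has no table entry: ignored
door.run()

print(door.state)  # closed
```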

At the implementation level, if you maintain your code at this level then there is much risk of logic errors that cause incorrect event-dispatcher thread behavior. That's why MDA tools exist for translation from a model to a C++ code implementation.

Global is caching instance state external to owning instance

Again, traditional Actors Model, from 1973, matches your notion of 'per-instance mutex granularity'.

I tried to quickly read that 1977 MIT paper and couldn't find a quick list of the laws they proposed, so I am not sure whether I agree or not. But anyway, I think the more relevant point is that generalized (there is global state) multi-threaded interactions between actors cannot always result in deterministic outcomes.

Technically, there isn't any need to instantiate mutexes, only to ensure that no more than one message enters a given actor at any given time (which may be done using lock-free algorithms...

That is equivalent to a lock, i.e. it can still cause a circular deadlock state.

It is unclear to me how 'global state' is implied by an argument for transactions...

However, it's worth pointing out that actors capture more than a little state, and that any notion of 'global state' corresponds to a bunch of actors sharing a common reference.

I am differentiating between state that is stored and owned by an instance, i.e. its data members, and global state, which is any instance state cached outside the instance that owns the original copy of the state (possibly cached in another instance); the latter is what causes concurrency errors due to order of operations. So if actors have data about themselves, that is not global state, but if they store (cache) data about other actors, that is global state. Note that elimination of global state makes it impossible to get circular deadlocks with "no re-entrant methods" (the automated aspect of my per-instance mutex idea), because the only atomic locks exist for the duration of the method call only (since no global state is cached, no atomic blocks of imperative code are created).

There seems to be a logical leap here...

The point I am making is that the less entangled state is (i.e. the less cached), the more permutations of interactions are possible asynchronously. Or, looking at it from the functional programming metaphor, the order of evaluation of interactions between actors does not matter when there is no state cached. Functional programming in its pure form 100% eliminates shared (cached) state. I need to think more about how, if possible, functional logic (calculus) could be used to encapsulate independent actors; I think we will find that we need gradients between where we are now (single thread) and where we want to be in the future (infinite threads, no state, and incremental algorithms that model the atoms in the world without any global cached state).

The discussion is helping me see things, thank you.

No global means "shared nothing"

Jeff Darcy wrote:

...but then there's a split between those who propose STM as a solution and those who propose a more Erlang-like "shared nothing" approach based on messaging. Every once in a while, but too rarely in my opinion, somebody seems to notice that putting operations under a lock and putting them inside a transaction are actually very similar. Transactions offer some desirable properties such as isolation and composability, but the two approaches have much in common and require similar diligence from the programmer. Races, deadlock/livelock, starvation, and so on are still entirely possible...

...People given messaging often end up implementing their own kind of distributed state sharing, whether they realize they're doing it or not, and almost invariably implement it poorly so that they have stale or inconsistent data everywhere...

The enemy of concurrency is cached shared state. Period. Whether you exchange the state via messages or not is irrelevant.

Messages and Replies are Global State

I am differentiating between state that is stored and owned by an instance, i.e. its data members, and global state, which is any instance state cached outside the instance that owns the original copy of the state (possibly cached in another instance); the latter is what causes concurrency errors due to order of operations.

So if actors have data about themselves, that is not global state, but if they store (cache) data about other actors, that is global state.

To be clear, then, you're including all messages (including replies) whether they be in-transit or actively processed, as 'global' state. Right?

Machine main memory is equivalent to an enormous list of cells each possessing 'get()' and 'set(Value)'. Any operations on a reply from 'get()' would qualify as 'global state' according to your above definition. Further, the message to 'set(Value)' is likely influenced by local state (properties of an actor instance) and likely influenced by whatever message the actor was processing.

Even if you avoid such 'obvious' stateful designs as emulating machine main memory, you can't avoid 'global state' according to your above definition. After all, there is no purpose or value for local instance state EXCEPT to influence outgoing messages (including replies). If you don't want to use local instance state in a manner that produces cached results of its influence on the environment, you simply can't use local instance state at all.

generalized (there is global state) multi-threaded interactions between actors cannot always result in deterministic outcomes.

Even for interactions among actors possessing only local instance data, it is easy to have non-deterministic outcomes.

Note that elimination of global state makes it impossible to get deadlocks with "no re-entrant methods" (the automated aspect of my per-instance mutex idea), because the only atomic locks are method calls (since no global state is cached, then no atomic blocks of imperative code are created).

To be clear, a great many OO design patterns (observer pattern, frameworks, visitor pattern, etc.) make it difficult to statically eliminate reentrant methods. It is feasible to eliminate some of these patterns via judicious application of other language features (e.g. you don't need visitor pattern if your language has an alternative to OO (such as functional) for domain and data modeling).

I need to think more about how if possible functional logic (calculus) could be used to encapsulate independent actors

Look up Erlang language.

where we want to be in future (infinite threads, no state and incremental algorithms that model the atoms in the world without any global cached state)

You'll have more than a little trouble colliding atoms together without global state helping atoms determine which other atoms are nearby with which they might collide... (which, again, is getting into my position against OO for domain modeling).

My global state is relative to errors we can kill

See my post immediately above yours, we were both composing at same time.

...you're including all messages (including replies) whether they be in-transit or actively processed, as 'global' state. Right?...

...Machine main memory...would qualify as 'global state' according to your above definition...

...you simply can't use local instance state at all...

No. Please see this post:

http://lambda-the-ultimate.org/node/2048#comment-51638

My definition of global state pertained to the ability to avoid circular deadlocks, and starvation due to rollbacks.

Perhaps you are referring to the possible "out of order" execution errors (no longer deterministic in order of execution) in logic that can domino from multi-threading in general, which are impossible to avoid (except in useless/stateless pure functional logic calculus). Refer again to the linked post above. We have to accept some probability of error in stateful programs, even if they are single-threaded. If they are multi-threaded, we have out-of-order state-machine error risk (even more regression testing will be needed, all things being equal... which is not acceptable!).

Other than pure stateless functional logic programs (which practically do nothing!), the way to minimize this is careful stateful OOP design, just as with single-threaded programming. I have suggested that per-instance locking is the most natural match for some programming tasks. I have also suggested that when the state machines are contained within their own instances (no cached state external to themselves), the out-of-order execution error risk becomes much more manageable if it forces us to think of algorithms which do not depend on the order of execution of actor interactions. Also, if we don't receive and cache external instance state, then we do not need to manually declare atomic blocks of imperative code, because the atomic operation is the method call (assuming our language ensures methods are never re-entered by another thread). Again, this level of atomic concurrency protection will not prevent all out-of-order execution errors; no concurrent methodology will, other than pure stateless functional logic programs.

Perhaps we should call this out-of-order perturbation, since it is only an error if we programmed our OOP with the assumption of a certain order of execution. So I guess what I am saying is that we need to select the granularity of "out-of-order" possibilities that is most natural to our thinking about the algorithms we are using.

We should be thinking much more about algorithms that do not depend on order of execution than about crutches (STM, marriages of functional programming with monads, etc.) that do nothing to eliminate out-of-order execution risk unless they add deadlock or starvation risk.

In short, we need a model that ensures we never get deadlocks or starvation (analogous to garbage collection that is guaranteed never to leak memory, even with circular references), and which helps us think naturally about algorithms that scale naturally to out-of-order execution. Shorter still: we need to program for out-of-order execution.

Functional logic probably has a role to play, with the side effects isolated per-instance as in my suggestion, but I do not understand this well yet.

Even for interactions among actors possessing only local instance data, it is easy to have non-deterministic outcomes.

Agreed. Errors also occur in single-threaded programs. I guess the goal is to make the scaling of multi-threaded out-of-order execution errors manageable (and debuggable) within the programmer's resources.

...a great many OO design patterns (observer pattern, frameworks, visitor pattern, etc.) make it difficult to statically eliminate reentrant methods.

Good point to raise, but in what I suggested, we are only concerned about re-entry by a thread other than the one that originally entered the method (the one holding the mutex lock, or however you implement the exclusive access), so re-entry by the same thread would not be a problem. I am not sure whether that helps here, though. Also, can't this problem be avoided simply by having the thread that interfaces with external code spawn a new thread to do actor interactions in our internal scene graph?

You'll have more than a little trouble colliding atoms together without global state helping atoms determine which other atoms are nearby...

Afaics, the list of actors is not my definition of global state, unless we need write locks (or optimistic transactions) on the list, as these can lead to deadlocks (or starvation). My point was that as the interaction becomes an infinitesimal series of incremental nuclear forces, there is no higher-order shared state needed external to each atom (actor). The actors then need to know almost nothing about each other, just a few simple forces to which they react orthogonally (without any persistent state cached about each other).

See figure 4 on page 8 of Programming Paradigms, I assert atoms are on the "unorganized aggregates region".

Applicable programming paradigms for out-of-order execution?

For global logic external to my suggested ideal of incremental-propagation algorithms (no order of execution assumed), where instances can decide about themselves autonomously:

Consider "Constraint Programming" (e.g. AI rules) on the left side (most stateless) of Figure 2 on page 8 (13) of Programming Paradigms. Afaics, rules can be applied irrespective of order, e.g. "human instances at dark increase the randomness of their shooting accuracy". So at night, the rule will update the "shooting randomness" instance state for all human instances, but there is no order dependency on this update. As night starts, humans are updated out of order, but will this affect gameplay? Even if we had forced an order, it would make no difference, except if we had so many humans in the scene that ones not currently seen would be updated before those visually seen, thus causing a skip in the realism of the gameplay. We could alter the rule to update human instances in order of proximity to visual portions of the scene; thus there could be potential out-of-order errors (if updating the shooting randomness or other thread activities caused domino side effects on object order relative to scene occlusion) that would be no worse than the original out-of-order rule.

But how likely are we to be able to control these out-of-order effects in a single-threaded program anyway? Even in a single-threaded program, we have multiple competing priority tasks, and we are probably choosing them arbitrarily and assuming they all execute within a few frames, so that out-of-order effects are not noticeable.

In the case of real-time games programming, the killer out-of-order effects we need to avoid are those that diverge the gameplay, not just cause a temporal skip of less than 50 - 100 ms.

Global state and Concurrency

We should be thinking much more about algorithms that do not depend on order of execution than about crutches (STM, marriages of functional programming with monads, etc.) that do nothing to eliminate out-of-order execution risk unless they add deadlock or starvation risk.

You should also be thinking about the cost of designing and interfacing with these algorithms that do not depend on order of execution. It has been my experience that local locks, for the purpose you describe, require an ever-growing 'vocabulary', where the vocabulary describes 'negotiation rules' (as a sort of tree-based decision), which prevents any given design from ever being complete. These negotiations are also limited to two objects at a time.

For example, with STM I could use 'getBalance' and 'withdraw' and combine them into a correct program that withdraws half my balance. I have no need to modify vocabularies. But, with the local object lock, I need to add something like 'withdraw_half' or (being a bit more forward thinking) 'withdraw-fn(Function of money)'.

This withdrawal-fn accepts a simple negotiation, but more complex examples tend to require more complex negotiations.
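The 'withdraw-fn(Function of money)' idea might be sketched like this in Python (the `Account` and `withdraw_fn` names are assumptions for illustration, not an established API): the per-instance lock makes "read the balance, then withdraw half of it" a single atomic step on that one object, with no transaction machinery.

```python
import threading

class Account:
    """Per-instance lock; callers pass a function of the balance, so that
    'read then withdraw' happens as one atomic step on this one object."""
    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw_fn(self, amount_of):
        with self._lock:
            amount = amount_of(self.balance)  # negotiate against a live balance
            self.balance -= amount
            return amount

acct = Account(100)
taken = acct.withdraw_fn(lambda balance: balance // 2)  # withdraw half, atomically

print(taken, acct.balance)  # 50 50
```

The cost discussed above is visible here: every new composite operation ("withdraw half", "withdraw up to N", ...) must be expressed as a function the account's vocabulary accepts, and the negotiation is confined to this single object.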

we are only concerned about re-entry by a thread other than the one that originally entered the method (the one holding the mutex lock, or however you implement the exclusive access), so re-entry by the same thread would not be a problem

Even recursive algorithms must be reentrant.

have the thread that interfaces with external code spawn a new thread to do actor interactions in our internal scene graph

Not a bad idea if the 'thread' is kept cheap enough to spawn them rapidly.

My point was that as the interaction becomes an infinitesimal series of incremental nuclear forces, there is no higher-order shared state needed external to each atom (actor). The actors then need to know almost nothing about each other, just a few simple forces to which they react orthogonally (without any persistent state cached about each other).

It is my understanding that with such a simulation there are electrostatic forces and gravitational forces and magnetic forces that aren't readily being accounted for in the dimensions of space and time... and many of these forces result from interactions and motion of many atoms (e.g. magnetic fields caused by charged particles moving at a high velocity).

Even when working with 'infinitesimal' models of domains, it tends to be the case that domain objects influence both an 'environment' and other domain objects. And, realistically, most programmers are uninterested in programming domain models on the level of atoms.

Design priorities

For example, with STM I could use 'getBalance' and 'withdraw' and combine them into a correct program that withdraws half my balance. I have no need to modify vocabularies.

But you have a risk of fatal circular deadlock (if you are using locking) or divergent starvation (if you are using optimistic rollbacks). If you can quantify that risk as "almost never" or "occasional temporary starvation performance hit, that usually reconverges", then you have a competitive paradigm to compare against other options.

But, with the local object lock, I need to add something like 'withdraw_half' or...

Yes, you need to design for the level of out-of-order execution perturbation that you can tolerate. See the post I made immediately above yours, as I was again composing at the same time you were. Is a proliferation of class interfaces a bad thing? Doesn't it aid composability? And the proliferation could be mitigated to the extent that our algorithms do not care about order of execution, which is ultimately what we need anyway in order to minimize errors even in a single-threaded program that reacts to arbitrary external I/O and internal scene events. Can even the single-threaded programmer visualize all the order-of-execution of his code paths? That is why we need so much testing, and this will only increase with massively distributed games over LAN or WAN.

In both methods, we have to quantify the risk of the algorithms we are using.

Even recursive algorithms must be reentrant.

By the same thread, no? Anyway, you agreed that perhaps light-weight thread creation could be a solution.

It is my understanding that with such a simulation there are electrostatic forces and gravitational forces and magnetic forces that aren't readily being accounted for in the dimensions of space and time... and many of these forces result from interactions and motion of many atoms...

I am thinking we have no hope of modeling this if we have to integrate over all atoms in the scene (in the universe for that matter, since nuclear effects are known to extend across extended space and time); rather we have to hope there is some autonomous model where each force or chunk of matter can model its own interaction relative to the actors it considers. We know (thus far) that quantum mechanics is probabilistic, thus a probabilistic incremental algorithm may apply. For example, let's say we model each human in our scene according to his/her inputs and outputs, and we process these fast enough in real time that the order-of-operations is not visible in our frame rate; then the only question is whether the random order-of-operations is going to affect the outcome of the gameplay. But isn't that the whole point — that the game should be as realistic as possible, thus non-deterministic outcomes are desirable? For those cases where we want deterministic outcomes with respect to order-of-execution, e.g. your 'cats go feral' example, we must handle them adequately in our OOP design to ensure the determinism we desire. My point is that for many things we want to get away from deterministic outcomes, as that is more realistic (more freedom, more permutations of possible outcomes, more entropy, more information, more knowledge gained, more evolution, etc.).

Transaction Deadlock is not 'Fatal'.

But you have a risk of fatal circular deadlock (if you are using locking) or divergent starvation (if you are using optimistic rollbacks). If you can quantify that risk as "almost never" or "an occasional temporary starvation performance hit that usually reconverges", then you have a competitive paradigm to compare against other options.

Transaction deadlocks are not 'fatal'. They can be detected quite easily, and selectively broken to ensure progress of at least one transaction. That's actually an advantage of pessimistic transactions with locking: you can guarantee progress of at least one thread (and you even get to choose which one!)
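The "detected quite easily, and selectively broken" claim can be illustrated with a wait-for graph: each waiting thread points at the thread whose lock it awaits, a cycle in the graph is a deadlock, and the manager picks a victim to abort. A minimal sketch, not a production lock manager; the graph, priorities, and thread names are all hypothetical:

```python
def find_deadlock_cycle(waits_for):
    """Return one cycle of threads as a list, or None if no deadlock.

    waits_for maps a thread to the thread whose lock it is waiting on.
    """
    for start in waits_for:
        seen, node = [], start
        while node in waits_for:
            if node in seen:
                return seen[seen.index(node):]   # the cycle itself
            seen.append(node)
            node = waits_for[node]
    return None

def break_deadlock(waits_for, priority):
    """Abort the lowest-priority transaction in the cycle, guaranteeing
    progress for the others (the manager gets to choose the victim)."""
    cycle = find_deadlock_cycle(waits_for)
    if cycle is None:
        return None
    victim = min(cycle, key=lambda t: priority[t])
    del waits_for[victim]       # victim aborts and releases its locks
    return victim

# T1 waits on T2, T2 on T3, T3 on T1: a cycle. T4 merely waits on T1.
graph = {"T1": "T2", "T2": "T3", "T3": "T1", "T4": "T1"}
victim = break_deadlock(graph, {"T1": 3, "T2": 1, "T3": 2, "T4": 5})
```

After the lowest-priority member of the cycle aborts, the remaining graph is cycle-free and the surviving threads can make progress.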

There are also hybrids of optimism and pessimism. For example, I could be optimistic, but become pessimistic on a retry. And, if I need to become pessimistic recently and often, I could use profiling and a bit of advanced analysis to intelligently decide between optimism and pessimism (automatically, much like branch prediction).
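The optimistic-first, pessimistic-on-retry hybrid described above could look like the following toy sketch, where a version number stands in for conflict detection. The `Counter` and `increment` names are hypothetical, for illustration only:

```python
import threading

class Counter:
    def __init__(self):
        self.value = 0
        self.version = 0
        self.lock = threading.Lock()

def increment(counter, optimistic=True):
    """First attempt is optimistic (validate the version at commit);
    on conflict, retry pessimistically by taking the lock up front."""
    if optimistic:
        seen = counter.version
        new = counter.value + 1          # compute outside the lock
        with counter.lock:
            if counter.version == seen:  # nobody committed under us
                counter.value, counter.version = new, seen + 1
                return "optimistic"
        # fall through: a conflicting commit happened; turn pessimistic
    with counter.lock:                   # pessimistic path
        counter.value += 1
        counter.version += 1
        return "pessimistic"

c = Counter()
first = increment(c)                   # no contention: commits optimistically
forced = increment(c, optimistic=False)
```

A profiling layer, as suggested above, would simply flip the `optimistic` flag per call-site based on its conflict history.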

It is even possible to be optimistic for some resources and pessimistic for others, though there is a point where the overhead of tracking such details outweighs the benefits.

Transactions can even be hierarchical (such that one doesn't need to redo the whole for a partial error) or have automatic 'decay' barriers (such that some side-effects are atomic and the rest are simply messages delayed until post-commit).

Finally, it is the case that one can quantify 'starvation' as 'almost never', though it depends on the design of the program (how much read-write contention on a single point) and may further benefit from certain primitives (e.g. subscriptions) being specially treated.

Is a proliferation of class interfaces a bad thing? Doesn't it aid composability?

The need to grow vocabulary to achieve new and useful compositions is a very bad thing. It restricts composition across code-ownership barriers to whatever the originator had the foresight to imagine. It runs up against the 'expression problem', requiring you to locate and update across concrete classes. It increases coupling, since classes and interfaces become bound to the idioms in a particular program. It is not simple.

If objects are the unit of atomicity, classes have pressure to grow interfaces to (atomically) support complex 'negotiation' or 'command-patterns' that are, in essence, full DSLs, decision trees, functor objects, etc. Essentially, one ends up working around the limitations of the programming language by greenspunning a new one.

the proliferation could be mitigated to the extent our algorithms do not care about order-of-execution, which is ultimately what we need any way in order to minimize errors even in a single-threaded program that reacts to arbitrary external I/O and internal scene events

It is possible to achieve simplicity via transactions where you need ordering and not use transactions where you don't care about ordering, is it not?

Even recursive algorithms must be reentrant.
By the same thread no?

Yes. Even by the same thread.

My point is for many things we want to get away from deterministic outcomes as that is more realistic (more freedom, more permutations of possible outcomes, more entropy, more information, more knowledge gained, more evolution, etc).

'Transactions' do not achieve determinism. What they do achieve is 'isolation' for arbitrary-sized sub-program elements, thus allowing programmers to better understand the consequences of their code - i.e. to pretend that a given procedure executes 'instantly' with respect to the program logic of everything outside that transaction. The partial-order of transaction serialization is not deterministic.

Fatality vs. probabilistic tradeoff revisited

Transaction deadlocks are not 'fatal'. They can be detected quite easily, and selectively broken to ensure progress of at least one transaction. That's actually an advantage of pessimistic transactions with locking: you can guarantee progress of at least one thread.

Agreed, as long as we accept the potential out-of-order execution error introduced by breaking them (with pessimistic locks we can't roll back the changes already accrued by the deadlocked threads that are chosen to lose their locks). I had written:

An optimization might be to do per-instance locking as per my prior post, then break deadlocks and accept the error. The advantage is over accepting the (...) error that can occur if we do no concurrency coordination as proposed in this post, at the cost of (out-of-order execution) errors being greater due to longer time elapsed before we detect and break a deadlock.

dmbarbour wrote:

The inability to pin down a vocabulary, or an 'ever growing' vocabulary, is a very bad thing. It hurts composability across code-ownership barriers, and it runs contradictory to keeping things simple.

Hmmm, I think maybe I disagree, but maybe I can be persuaded. The English language has never stopped growing, and I think such growth is essential for adaptability. I imagine it helps composability across code-ownership barriers, if we get sub-classing correct as per Tim Sweeney's recent suggestion. If I don't have the API I need, I subclass and create it — easy to work around any API composability limitation for my application. Where we don't want ever-growing complexity is in the alphabet and grammar (i.e. the programming syntax). The only complexity cost of APIs that I can think of is the legacy of maintaining them, but if we've done our OOP well, this is isolated and eventually we can freeze our maintenance and create new super classes going forward.

Negotiations expressed may need to grow arbitrarily complex; if objects are the unit of atomicity, classes have pressure to grow interfaces to support complex 'command-patterns'...

Only to the extent that we are obnoxious about enforcing order-of-execution. My suggestion was in conjunction with a move towards more autonomous algorithms that are out-of-order execution tolerant.

Shelby Moore III wrote:

I am visualizing an algorithmic shift of mindset towards instance autonomy, where A calls B, B sets a few state changes, places itself on the thread scheduler list, then returns. Later B gets a chance to do some computation on this new state in another random thread. My point is to match the lock granularity well to the algorithm, so that both order-of-execution and lock recursion are irrelevant. I like per-instance method-duration locks because they can be automated by the compiler; then the code within each instance (most of the code in an application, I assume) doesn't have to worry about manual critical-section declaration.

dmbarbour wrote:

It is possible to achieve simplicity via transactions where you need ordering and not use transactions where you don't care about ordering, is it not?

Agreed, thanks and I will now capitulate that there will be cases where STM is advantageous, when you have some order-of-execution needs that are not reasonable to clutter the per-instance API (vocabulary) layer with and when rollback is well formed, but read what I wrote below please.

'Transactions' do not achieve determinism.

But transactions are required by the order-of-execution determinism assumptions inherent in the algorithms of the program. It may be possible to eliminate them in cases where the algorithms can be modified to be tolerant of out-of-order execution. And I am making a philosophical point that such an algorithmic shift is generally desirable on the large scale of the universal trend of maximum entropy. And I think that trend applies to how we will re-use code more widely (distributed computing) and build new, more granular models of code ownership, contribution/hosting, and maintenance. OTOH, nature needs determinism on its march towards global randomness (i.e. imagine the gradient search for a valley with simulated annealing, which is analogous to the cooling rate of ice determining how many fractures it has). Back on point, we must use the best algorithms and tools for the job at hand. I think games may be ready to make the leap towards less determinism in many facets (probably already are). But not everything can move to maximum entropy in finite time. So yes, we need both methods. I agree with you.

It is a pleasure debating/discussing with you. Most productive. Nobody's nose got bent out of joint :D

Agreed as long as we accept the potential out-of-order execution error introduced by breaking them (with pessimistic locks we can't rollback the changes already accrued by the deadlocked threads that are chosen to lose their locks).

This is incorrect. All transactions that are aborted to break a deadlock are, in fact, properly aborted and rolled back.

What you had assumed in your prior post (about "an optimization") is indeed a technical option, but unnecessarily violates the transactional properties of the system by disfavoring 'atomicity', for a marginal benefit to performance. You are more likely to benefit from a reduction to the 'isolation' or 'durability' property than to the 'atomicity' property (ACID).

transactions are required by order-of-execution determinism assumptions inherent in the algorithms of the program

Certainly, transactions can be used to achieve order-of-execution determinism. Nonetheless, they do not guarantee determinism externally, and they don't even guarantee determinism internally. I could write up an atomic block that spins up a few parallel computations, for example. Consider that nearly every transaction in Transactional Actors Model has concurrent elements: in traditional Actors Model, every object is running in its own implicit thread.

Transactions are a way for programmers to comprehend and decide correctness for a sub-program without worrying about interference from the various other sub-programs that are running concurrently.

Example: I use a transaction to create an object graph that needs to hook into the environment before it works properly, hook it in at several points, then end transaction.

As a programmer, given the above transaction, I can say for certain that there is no risk of a method-call from the environment while my object-graph is only half-installed. To the logic of other sub-programs, my object hooked into multiple locations simultaneously. I can also abort the transaction if I'm unable to install the entire object-graph, which offers me much more confidence in the safety of the installation.
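The install-and-abort scenario above can be sketched with an undo log: every hook point the transaction installs is recorded, so aborting uninstalls the half-installed graph and no other sub-program ever observes the partial state. A minimal illustration; `Environment`, `InstallTxn`, and the hook names are hypothetical:

```python
class Environment:
    """A toy environment with named hook points."""
    def __init__(self):
        self.hooks = {}

class InstallTxn:
    """An abortable installation: each installed hook is recorded in an
    undo log, so abort removes them all, restoring the pre-transaction
    state in one step."""
    def __init__(self, env):
        self.env = env
        self.undo = []

    def install(self, point, obj):
        if point in self.env.hooks:
            raise RuntimeError("hook point occupied: " + point)
        self.env.hooks[point] = obj
        self.undo.append(point)          # remember for rollback

    def abort(self):
        for point in reversed(self.undo):
            del self.env.hooks[point]    # uninstall in reverse order
        self.undo.clear()

env = Environment()
txn = InstallTxn(env)
txn.install("render", object())
try:
    txn.install("render", object())      # conflict: abort the whole install
except RuntimeError:
    txn.abort()                          # the first hook is removed too
```

After the abort, the environment is exactly as it was before the transaction began — no half-installed graph remains.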

Not without caching data for rollback?

This is incorrect. All transactions that are aborted to break a deadlock are, in fact, properly aborted and rolled back.

I was assuming that we were only using per-instance mutex locks; then afaics there is no way to roll back the instance data that has already been modified (by threads other than the one we will let continue after breaking) by the time we arrive at the deadlock.

I think you are assuming a model like STM, where rollback state is cached on each write.

What you had assumed in your prior post (about "an optimization") is indeed a technical option, but unnecessarily violates the transactional properties of the system by disfavoring 'atomicity', for a marginal benefit to performance.

It favors not breaking on external rollback dependencies (another form of error) and allows the operations to complete without rollback, accepting atomicity error in return.

And the same caveat applies as for STM, that rollback has to be well formed on any external dependencies.

I can also abort the transaction if I'm unable to install the entire object-graph, which offers me much more confidence in the safety of the installation.

To the extent that there are no external dependencies created during install, then rollback is the same as delete (thus rollback isn't necessary). If there are externalities, then rollback error is a risk.

Atomic Transactions

I think you are assuming a model like STM, where rollback state is cached on each write.

I'm assuming a model where transactions can be aborted without sacrificing 'atomicity'. Like STM. Like any serious transaction system. Actual implementation can be multi-version concurrency control (MVCC), snapshots with roll-forward from log, or any number of other approaches.
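One simple way to get abort-without-rollback-state-on-the-store (among MVCC, snapshots, and logs mentioned above) is a buffered write-set: writes go into a private buffer, reads see the buffer first, commit applies the buffer in one step, and abort just drops it. A toy sketch with hypothetical names, not any particular system's implementation:

```python
class WriteSet:
    """Buffered-write transaction: the shared store is never touched
    until commit, so abort is free — just discard the buffer."""
    def __init__(self, store):
        self.store = store
        self.buffer = {}

    def read(self, key):
        # Read-your-writes: the private buffer shadows the store.
        return self.buffer.get(key, self.store.get(key))

    def write(self, key, value):
        self.buffer[key] = value

    def commit(self):
        self.store.update(self.buffer)   # publish all writes at once
        self.buffer = {}

    def abort(self):
        self.buffer = {}                 # nothing ever reached the store

store = {"x": 1}
txn = WriteSet(store)
txn.write("x", 2)
inside = txn.read("x")       # 2 inside the transaction
txn.abort()
after_abort = store["x"]     # 1: the store was never modified
txn.write("x", 3)
txn.commit()                 # now the store sees the write
```

(A real system would add conflict validation at commit; this sketch shows only why abort needs no per-write rollback state on the shared data.)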

I honestly have never heard of a transactions implementation that doesn't support transactional abort.

There may be a few external resources where abort can't be fully performed, but even those can make a best-effort (which is no worse than best-effort of lock-based representations, and often quite reasonable).

Mutexes don't rollback

I honestly have never heard of a transactions implementation that doesn't support transactional abort.

I was referring to a theoretical automatic per-instance mutex, so that all methods of the instance are blocked to other threads while a thread has entered a method of that instance. In short, or in general, I was referring to a mutex lock on a resource. I was saying that any deadlocks which arise from this transactional (is atomicity a better word?) method are either fatal or incur an atomicity violation if the deadlocks are broken. My motivation for avoiding rollbacks is described in posts below this one on this page.

=====
ADD: I was describing Monitor, mutual-exclusion above.

Don't Need Mutexes; Have Transactions

In general, use of transactions means shunning mutual exclusion operations (except for those the transaction manager decides to apply). The two 'concurrency control' methods usually aren't compatible.

What you discuss should probably not be called transactions. 'Concurrency control' would be appropriate.

Agree rollback transactions are useful, but we need both

Suggest we end this fork and continue at the other fork the discussion/debate of my assertion that two (or more) concurrency strategies can co-exist, which adds to the utility of using rollback-capable transactions (e.g. STM) while decreasing their risk of "infinite code propagation".

Rollback more than Delete

To the extent that there are no external dependencies created during install, then rollback is the same as delete (thus rollback isn't necessary). If there are externalities, then rollback error is a risk.

"Installing" an object-graph means handing object references to objects that exist prior to the transaction starting. One cannot just 'delete' the objects... even if 'delete' is allowed in the language, one must additionally clean up resources hooked into the environment. A 'rollback' will easily uninstall the object when aborting the transaction.

External dependencies are not a risk, since I cannot 'install' a reference to a language object into a system that is not already language-aware.

Not without external dependencies propagation error

To the extent that there are no external dependencies created during install

"Installing" an object-graph means handing object references to objects that exist prior to the transaction starting...

In my semantics, those are "external dependencies created" — they are external to the installed object-graph. If you mean the entire pre-existing object-graph is also rollback-compliant, then I understand why you define them as non-external. Again, in the general case rollbacks propagate the external rollback dependency into infinite code space.

External dependencies are not a risk, since I cannot 'install' a reference to a language object into a system that is not already language-aware

You are assuming your object-graph has no external referencing members, thus you've just broken future composability in that case.

You are assuming your

You are assuming your object-graph has no external referencing members, thus you've just broken future composability in that case.

It seems I was unclear. I did not mean to say there were no external referencing members ('installation' implies that there must be some such members). What I meant to say is that there is no transaction safety risk, since anything that knows how to carry an object reference (something remarkably language-specific) inside my language will also know how to deal with transactions inside my language.

Non STM aware external references

My point still stands that there must be types of external references of which the language or STM is not aware, else composability is broken. If the ability to reference things in new ways depends on language updates, then you do not have composability in that area of the programming paradigm. Maybe an example will make it more clear: the language (and thus the STM) is not aware of distributed paradigm frameworks and protocols that reference external resources. I know you've made the point that everything everywhere (or at least "X/Open") should add STM awareness, and that is precisely my point of "infinite code propagation brittleness". May I suggest we carry on the discussion from this single fork, in order to reduce the proliferation of forks and focus our synopsis of the debate.

Transactions atomize groups of mutable effects

Certainly, transactions can be used to achieve order-of-execution determinism. Nonetheless, they do not guarantee determinism...

Transactions are a way for programmers to comprehend and decide correctness for a sub-program without worrying about interference from the various other sub-programs that are running concurrently.

The relevance to my point is that we should be striving to eliminate/reduce infinite propagation in the code space of dependencies on mutable effects (i.e. order-of-execution dependencies that have no bounds in code space), shifting the dependencies to the infinite sampling in the space-time domain, which has finite bounds in code space (I suggested the per-instance boundary). Refer to my example: if A requests to modify B, then B doesn't need to complete the modification of itself in any particular order in time; instead it records the message and places itself on the thread queue. We develop such algorithms that shift the dependencies to the space-time domain and to finite space in the code domain.

Transactions mask the infinite code space sampling constraint, as it is shifted from order-of-execution dependencies to the rollback of external dependencies.

Transactions are not eliminating error, just shifting it (perhaps successfully in many cases near-term, but building up brittleness over time and code space reach).

The universal trend should be towards eliminating/reducing code space dependencies, not increasing them by spreading STM rollback globally and infecting all code with "it is okay to ignore and not algorithmically deal with out-of-order-execution on multi-threading".

Your notions of describing 'algorithms' in time-independent ways, or attempting to "shift to the time-space domain", strike me as a severe limitation of expression. This is similar to requiring programmers to work only with commutative functions. It is true that commutative functions can operate in any order and return the same result, but they don't work for many problem domains.

Admittedly, you're sacrificing deterministic observation and simply requiring the semantics of each object be such that method calls are (semantically) commutative, such that there are no rules or behavioral contracts (even implicitly) involving pre-conditions and post-conditions for method calls. Those 'before' and 'after' conditions are bad mojo in your philosophy - the very essence of order dependence!

The same criticism applies: even if you stick to semantics (how programmers are expected to interpret behavior) rather than deterministic independence, I suspect this doesn't work (without semantic errors) for very many problem domains. The issue will be building semantically order-independent vocabularies for each interface that achieve the desired goals.

Transactions are not eliminating error, just shifting it

Transactions eliminate malign race-conditions and allow clean recovery after partial failure. Those are errors eliminated. They also support persistent language by allowing partial volatility, keep vocabularies simple, and allow some forms of simple backtracking and undo that can potentially be leveraged into the application layer and made available to users.

And one can control their extent, using a few barriers.

Use the best tool for the job

Note that all code that can potentially run forever is already infinite in the (space-)time domain. I will now further clarify: the issue I raised is whether trading potential code-space propagation dependencies for (space-)time indeterminism is beneficial in any cases (and in the gradients between the two extremes). Suggest we end this fork; as we continue at the other fork, consider how functional programming differs from imperative programming, the former in pure form being 100% commutative (order-of-execution tolerant). I was not suggesting that commutativity be the only beneficial property of an alternative concurrency programming paradigm. We will try to better explain the benefits and trade-offs of STM versus (and/or in combination with) other concurrency strategies in the other fork, as described and linked from there.

such that there are no rules or behavioral contracts (even implicitly) involving pre-conditions and post-conditions for method calls...

...The issue will be building semantically order-independent vocabularies for each interface that achieve the desired goals

Suggest we carry this debate/discussion forward to the other fork, so our effort remains digestible for readers (we are proliferating too many forks of discussion). I understand you are criticizing (justifiably in some cases, but with overriding considerations IMHO in others) that my proposed alternative concurrency strategy shifts the burden of correct semantics from the implementation layer to the higher-level algorithms. What I want to discuss with you at the other fork is that time-indeterminism has to come at the algorithm layer (whether we use semantics which force it at the language layer, i.e. 100% functional programming, or relax the semantics with monads or other paradigms that give more leeway for imperative time-determinism at the high-level algorithm layer), because if the algorithm is not inherently commutative (i.e. out-of-order-execution tolerant, aka time-indeterminate), then implementation methods (e.g. STM) which attempt to compensate for the commutative intolerance of the higher-level algorithms have the trade-off of shifting the problem to potential "infinite code propagation". In some cases we will choose to use STM rather than make the algorithm time-indeterminate, especially when the mutable-effects grouping is well isolated from external dependencies (this will be almost a free lunch, with some considerations and caveats). There is no free lunch in all cases; we must have all the options at our disposal and understand their advantages and trade-offs. Let's discuss these many topics further at the fork I am pointing to.

Re Starvation

My definition of global state pertained to the ability to avoid circular deadlocks, and starvation due to rollbacks.

There are, in fact, many forms of starvation with which you should concern yourself. Priority inversion, convoying, and arbitrary infinite delays are also problematic when working with real-time or soft-realtime systems.

Your current design requires a chain of locks, such that if A calls B calls C calls D, then the thread has locked A, B, C, and is waiting on D.

Even in the absence of cyclic wait, a thread that holds D's lock may hold it indefinitely, making recursive calls for example. This can bar any number of other threads, including those of higher priority, from ever making progress.

When I'm wearing my process-control and real-time hat, it is a considerable advantage of STM that one can abort a low-priority transaction in order to achieve a high-priority transaction, even if this means potential for starvation of the low-priority transaction.

No free lunch

Your current design requires a chain of locks, such that if A calls B calls C calls D, then the thread has locked A, B, C, and is waiting on D.

Even in the absence of cyclic wait...

I assume you agree, by "even in the absence of cyclic wait", that there can then be no deadlock, because all the methods can be re-entrant for the same thread? I was only proposing they be non-re-entrant for threads other than the one holding the current per-instance lock.

...a thread that holds D's lock may hold it indefinitely, making recursive calls for example. This can bar any number of other threads, including those of higher priority, from ever making progress.

Agreed, it defeats thread priority for shared resources, e.g. responding quickly to I/O from the user. But don't single-threaded programs already have this problem, due to external events, to the extent they are not event re-entrant (e.g. JavaScript in older browsers) or otherwise lock up resources to prevent out-of-order execution error?

Programs in general perform very poorly for real-time interaction if they cannot make long-duration algorithms run with full re-entrancy and either optimistic rollback or acceptable out-of-order execution error.

That is why I am visualizing an algorithmic shift of mindset towards instance autonomy, where A calls B, B sets a few state changes, places itself on the thread scheduler list, then returns. Later B gets a chance to do some computation on this new state in another random thread. My point is to match the lock granularity well to the algorithm, so that both order-of-execution and lock recursion are irrelevant. I like per-instance method-duration locks because they can be automated by the compiler; then the code within each instance (most of the code in an application, I assume) doesn't have to worry about manual critical-section declaration.
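The "A calls B, B records the message and schedules itself" style described above can be sketched with a mailbox and a scheduler queue: the call returns immediately, and the state change is applied later when the scheduler runs B. A minimal single-threaded illustration (the scheduler loop stands in for the "random thread"); all names are hypothetical:

```python
from collections import deque

class Autonomous:
    """Instance-autonomy sketch: callers only record a request; the
    instance does its computation later, when the scheduler runs it."""
    def __init__(self, scheduler):
        self.scheduler = scheduler
        self.mailbox = deque()
        self.state = 0

    def send(self, delta):
        self.mailbox.append(delta)    # record the request only
        self.scheduler.append(self)   # enqueue self for later work

    def step(self):
        # Apply one deferred request; order among senders is irrelevant
        # because addition here is commutative.
        self.state += self.mailbox.popleft()

scheduler = deque()
b = Autonomous(scheduler)
b.send(5)          # returns immediately; b.state is still 0 here
b.send(3)
while scheduler:   # the scheduler (conceptually another thread) runs B
    scheduler.popleft().step()
```

Because the deferred operations commute, the eventual state is the same regardless of the order in which the scheduler drains the queue — which is exactly the out-of-order tolerance being argued for.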

Many cases of recursion do not lock the instances being walked in recursive spaghetti (e.g. walking the BSP scene graph testing some conditional).

When I'm wearing my process-control and real-time hat, it is a considerable advantage of STM that one can abort a low-priority transaction in order to achieve a high-priority transaction, even if this means potential for starvation of the low-priority transaction.

But one potential "can of worms" is that many things do not rollback well. Rollback is not always a well formed proposition.

Second, there are many areas that don't come under the purview of STM (file activity, screen updates, real db transactions with different tx semantics). They don't work under an arbitrary retry model, unlike in a locking case where you know you own the critical section.

There are design tradeoffs and all of the options are going to introduce error, there will never be a bug-free program, so we have to choose wisely and ultimately the market chooses the winners.

But maybe we can agree that incremental algorithms (those that reduce the transaction size) are going to be beneficial in any case? Which is the thrust of my point towards autonomous instances and maximizing entropy.

In short, big monolithic things suck :D

Non-Transactional FFIs, Hardware, Protocols

But one potential "can of worms" is that many things do not rollback well. Rollback is not always a well formed proposition.

Agreed. When working with hardware controllers, foreign-program interfaces, protocols that don't support transactions, etc., one must favor alternatives. These alternatives include 'roll-forward' — useful if the reply for a request is independent of the current state. One may also use locks for the duration of a transaction, or may simplify just a bit by telling the runtime that you can handle only one transaction at a time.
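One common shape for the roll-forward alternative (also the "messages delayed until post-commit" decay barrier mentioned earlier in the thread) is to queue non-transactional side effects and release them only on commit, so an abort means the external device never saw anything. A toy sketch with hypothetical names; a list stands in for the real device:

```python
class DeferredEffects:
    """Non-transactional side effects (device commands, file writes)
    are queued during the transaction and only released post-commit,
    so abort never needs to 'un-do' an external effect."""
    def __init__(self, sink):
        self.sink = sink          # e.g. a hardware controller; here a list
        self.pending = []

    def emit(self, msg):
        self.pending.append(msg)  # inside the transaction: defer

    def commit(self):
        for msg in self.pending:  # roll forward: effects happen now
            self.sink.append(msg)
        self.pending = []

    def abort(self):
        self.pending = []         # nothing ever reached the device

device = []
fx = DeferredEffects(device)
fx.emit("laser on")
fx.abort()                 # aborted transaction: device untouched
fx.emit("laser off")
fx.commit()                # committed transaction: effect released
```

This only works when no read during the transaction depends on the effect having already happened — the sacrifice of transactional semantics noted further down the thread.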

On the other hand, I believe that widespread support for software-level transactions will (over many years) grow to heavily influence hardware interfaces, foreign function interfaces, and software protocols. It would be wise to at least ensure one can map into X/Open XA and other common transaction standards.

Is (up to WAN) propagation of transactions desirable?

These alternatives include 'roll-forward' - useful if the reply for a request is independent of the current state.

Is that another form of acceptable error (similar to the out-of-order error accepted when we break pessimistic deadlocks)? Or is it an exception that must be handled by the caller (request failed returned)?

I believe that widespread support for software-level transactions will (over many years) grow to heavily influence hardware interfaces, foreign function interfaces, and software protocols. It would be wise to at least ensure one can map into X/Open XA and other common transaction standards.

Hmmm, I think I may disagree that this is necessarily a desirable universal path forward. Again I think big monolithic things suck. We should be instead finding ways to eliminate/reduce dependencies on order (of execution, of static class vocabularies, etc), so that our distributed world can scale and be more free. We don't want the entire world connected to a single point of failure. That would be horridly brittle.

AXIOM: transactions should be as local and as narrow as possible, because the wider we propagate them, the more opportunities for conflicts. I.e., how are you going to profile the entire distributed world to ensure you've chosen the correct transaction strategy (optimistic, pessimistic, hierarchical, etc.)? It will become conflatingly intractable/impossible.

Shelby Moore III wrote:

But maybe we can agree that incremental algorithms (those that reduce the transaction size) are going to be beneficial in any case? Which is the thrust of my point towards autonomous instances and maximizing entropy.

In short, big monolithic things suck :D

AXIOM: transactions should

AXIOM: transactions should be as local and as narrow as possible, because the wider we propagate them, the more opportunities for conflicts.

I agree with this axiom (principle, really). I suspect I disagree on the extent of what "as local and as narrow as possible" actually entails, but I do support a provision in Transactional Actor Model [c2] for keeping transactions 'local' and 'narrow'. See the section on "Decaying" Transactions towards the bottom.

How are you going to profile the entire distributed world to ensure you've chosen the correct transaction strategy (optimistic, pessimistic, hierarchical, etc.)? It will become conflatingly intractable/impossible.

There is no need to do so.

First, keep in mind that profiling is an optimization strategy, not for 'correctness', and isn't any more critical than any other global distributed optimization. (Indeed, I suspect the benefits from tweaking transactions will ultimately be marginal compared to benefits from proper multi-cast and efficient message-routing on an overlay.)

Second, whatever global strategy you're imagining, I could probably achieve 70% of the same gains with a combination of more local, heuristic decisions by individual transaction managers (e.g. profile each resource, such that resources that have a long history of causing conflict get locked), and global decisions from the call-site (e.g. profile each site that starts a transaction, and favor pessimism for transactions that have a long history of encountering conflicts).
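To make the "local, heuristic decisions" idea concrete, here is a minimal sketch (names and the threshold are hypothetical, not drawn from any particular transaction manager) of a call-site profile that flips from optimistic to pessimistic once conflicts become common:

```python
# Hypothetical sketch: a transaction manager picking a strategy per
# call-site based on local conflict history, rather than profiling
# the entire distributed world.

class CallSiteStats:
    def __init__(self):
        self.attempts = 0
        self.conflicts = 0

    def record(self, conflicted):
        # Record the outcome of one transaction attempt at this site.
        self.attempts += 1
        if conflicted:
            self.conflicts += 1

    def strategy(self, threshold=0.3):
        # Favor pessimistic locking once the observed conflict rate
        # at this call site crosses the threshold.
        if self.attempts == 0:
            return "optimistic"
        rate = self.conflicts / self.attempts
        return "pessimistic" if rate > threshold else "optimistic"

stats = CallSiteStats()
for conflicted in [False, True, True, True]:
    stats.record(conflicted)
print(stats.strategy())  # 3/4 conflicts -> "pessimistic"
```

The point is that each decision uses only information already local to the transaction manager or the call site; no global profile is required.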

These alternatives include 'roll-forward' - useful if the reply for a request is independent of the current state.

Is that another form of acceptable error (similar to the out-of-order error accepted when we break pessimistic deadlocks)? Or is it an exception that must be handled by the caller (a 'request failed' result returned)?

The object of your sentence (that to which 'that' refers) is unclear. But, supposing a transaction involves a non-transactional resource for a simple command or query operation (as opposed to a long chain thereof), these are usually easily handled. A simple command op doesn't need to return anything meaningful (reply with '()'), and so can be rolled forward in the event of commit.

One will be sacrificing transactional semantics a bit if other operations should have been able to perform reads influenced by that command.

Rollback is predicated on infinite sampling model

I just realized that rollback is conceptually dependent on the model of infinite sampling, because the aliasing error introduced by a rollback is only 0 if the infinite reach of external dependencies is assumed to be compliant.

Shelby Moore III wrote:

It favors not breaking on external rollback dependencies (another form of error) and allows the operations to complete without rollback, accepting atomicity error in return.

And the same caveat applies as for STM, that rollback has to be well formed on any external dependencies.

Transactions shift infinite code space sampling dependency on order-of-execution to infinite code space sampling dependency on rollback externalities.

dmbarbour wrote:

There is no need to do so.

First, keep in mind that profiling is an optimization strategy, not for 'correctness', and isn't any more critical than any other global distributed optimization...

It is incorrect if the externalities do not roll back. How can you know if the externalities roll back when they propagate into infinite code space?

Perhaps "profile" was the wrong word. I didn't mean to imply measuring the incidence of rollback alone, but rather the incidence relative to the ability/cost of the externalities to roll back in those incidences. And you simply can't measure (profile) that external potential for failure or exponential cost, which can thus throw off your judgement about the cost of the local incidence of rollback.

Rollback is safe if used locally

The great potential I see for rollback transactions (e.g. STM) is that they can be used in local scenarios that are well isolated from external dependencies, and can be run asynchronously, orthogonal to operations which do not require atomicity, i.e. threads which do not need atomicity are never rolled back. Thus one can transition portions of one's OOP code to non-atomic algorithms as I have suggested, and keep STM around for everything else until the "everything else" is reduced to zero (probably never).
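For illustration, a deliberately naive optimistic-STM core in the spirit of "rollback used locally" might look like the following. The `TVar`/`atomically` names are borrowed from Haskell's STM vocabulary, and a single global commit lock stands in for a real validation protocol; this is a sketch, not a production design:

```python
import threading

# Toy optimistic STM: versioned cells, a transaction keeping a private
# read/write log, validation at commit, and retry (rollback) on conflict.

class TVar:
    def __init__(self, value):
        self.value = value
        self.version = 0

_commit_lock = threading.Lock()

def atomically(tx):
    while True:  # retry on conflict; rollback = discard the private log
        reads, writes = {}, {}

        def read(tv):
            if tv in writes:
                return writes[tv]      # see our own pending write
            if tv not in reads:
                reads[tv] = tv.version # remember version for validation
            return tv.value

        def write(tv, v):
            writes[tv] = v             # buffered, invisible until commit

        result = tx(read, write)
        with _commit_lock:
            if all(tv.version == ver for tv, ver in reads.items()):
                for tv, v in writes.items():
                    tv.value = v
                    tv.version += 1
                return result
        # conflict: nothing external ever observed the aborted attempt

a, b = TVar(10), TVar(0)

def transfer(read, write):
    write(a, read(a) - 3)
    write(b, read(b) + 3)

atomically(transfer)
print(a.value, b.value)  # 7 3
```

Note that rollback here is entirely local: an aborted attempt only discards its private log, which is exactly the "well isolated from external dependencies" case described above.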

Combining concurrency strategies

Rollback-capable transactions are useful for code where the atomicity is a groupable (typically per thread?) set of mutable effects and their rollback is deterministic in all code paths.

The trade-off is that the grouping and/or rollback determinism are in reality never contained in finite code space-time and thus are not deterministic because we can not sample for infinite space-time (i.e. in code space over all time). Attempting to contain (any type of, not just order-of-execution) determinism to finite code space-time thus impacts composability finiteness, and is why over time code bases become brittle and have to be discarded. This goes back to the central philosophical point that I launched this discussion with, where in quotes below "global state" was anything that caused order-of-execution dependency (errors), i.e. STM doesn't make the reduction in entropy or the infinite time sampling problem disappear:

Shelby Moore III wrote:

As soon as you introduce global state (decrease entropy), it will cascade and cause breakage of composition at the per-instance (class) granularity.

Shelby Moore III wrote:

Is there any way to get determinism with cached instance state and also determinism of termination without deadlocks and infinite rollbacks starvation divergence? I suspect no if there is any global state, not without 100% (infinite) regression, which takes me right back to my point that since we can not sample for infinite time, then we never know the deterministic truth.

But that is sort of trivial point in the sense that all code will be to some degree deterministic and thus to some degree brittle (in my Theory of Everything, existence is precisely order, and only disorder is not bounded/limited/brittle).

Shelby Moore III wrote:

But transactions are required by order-of-execution determinism assumptions inherent in the algorithms of the program. It may be possible to eliminate them in cases where the algorithms can be modified to be tolerant of out-of-order execution. And I am making a philosophical point that such an algorithmic shift is generally desirable on the large scale of the universal trend of maximum entropy. And I think that trend applies to how we will re-use code more widely (distributed computing) and build new, more granular models of code ownership, contribution/hosting, and maintenance. OTOH, nature needs determinism on its march towards global randomness (i.e. imagine the gradient search for a valley with simulated annealing, which is analogous to the cooling rate of ice determining how many fractures it has). Back on point, we must use the best algorithms and tools for the job at hand.

Thus the point I think I am trying to make is that we should try to survey and understand how the level of determinism we build into our algorithms and paradigms impacts the code robustness, programmer productivity, and composability (again determinism leaks into infinite code space-time). STM is a tool, not a panacea and we should survey where it lies on the gradient between indeterminism and determinism (specifically relative to order-of-execution dependency), what are the alternative paradigms, trade-offs, and composability between paradigms.

Inspired by the nature of interaction of objects in a scene graph of a game that emulates the real world, I postulated that perhaps the atomicity most natural for that type of program would be instances of those objects being time-indeterminant (i.e. order-of-execution indeterminant) in the algorithms of their interaction, but time-determinant within the imperative code for the instance class methods. I postulated a concurrency strategy that applied to that model, employing a theoretical automatic (by compiler) mutex lock (not rollback-capable), where methods of an instance are not re-entrant by other threads during the lock (i.e. no cross-thread asynchronous access to the instance). We debated/discussed some of the trade-offs. Perhaps you could help me re-summarize them?

I was driving towards making the global scope (algorithms external to the instance) time-indeterminant, so that determinism's brittle effects, and its leak into infinite code space-time, could be eliminated from the global scope. Intuitively I know that my goal will be defeated and that the time-dependence within the instance methods will somehow leak out. But to what degree? Because intuitively I also know it pays big paradigm-shift wins when we match algorithmic efficiencies to the model of what our program needs to do (the word is "resonance"). I want to survey and analyze. Afair, dmbarbour's main criticism was that the leakage would take the form of a proliferation of class methods (vocabularies), and Shelby Moore III replied:

I think the English language has never stopped growing. I think such growth is essential for adaptability. I visualize it helps composability across code-ownership barriers, if we get sub-classing correct as per Tim Sweeney's recent suggestion. If I don't have the API I need, I subclass and create it. Easy to work around any API composability limitation for my application...

Also it seems to me that both STM and the alternative concurrency strategies, such as the one I postulated above, can co-exist and be used asynchronously. This is true when the grouping for STM is distinct from the grouping for the other concurrency strategy, e.g. if STM is analyzing collisions on the same memory location by different threads and my postulated mutex locks are orthogonal and in no conflict. dmbarbour, you somewhere stated you did not agree, so I think we need to debate/discuss this.

Note the grouping for STM per thread is very broad and will result in a lot of determinism leakage into infinite code space-time. Reminds me of how the "clean or elegant" garbage collection algorithm dependencies apparently reach out into every plugin in Mozilla/Firefox, which they've been struggling to solve with all sorts of hacks. I would again like to re-emphasize the AXIOM that the transactions ideally be infinitely narrow, i.e. that we have no determinism in our program. Functional programming reduces determinism to a single return value and a termination guarantee. So the AXIOM means that maximum disorder yields maximum composability (flexibility, i.e. least brittleness). We know this is intuitively true, because we have the most freedom of creativity (not necessarily the highest productivity) when we start with raw random materials (a clean chalkboard).

Shelby Moore III

See figure 4 on page 8 of Programming Paradigms; I assert atoms are in the "unorganized aggregates" region.

I am thinking STM is going to be very useful for generalized imperative code (no time-indeterminism algorithms) where we isolate it to helper threads that do not interface with external resources at all. We know we give up the external composability, but we gain the time-determinism algorithms. In this case, the thread grouping is not broad, as we've put the threads in a sandbox. This may have some similarities to what dmbarbour stated about multiple threads running on each Transactional Actor, then messaging between them to isolate external time-determinism leakage. We then have to be careful the messaging doesn't leak the time-determinism.

Monitor mutual-exclusion

Shelby Moore III wrote:

...employing a theoretical automatic (by compiler) mutex lock (not rollback-capable) where methods of an instance are not re-entrant by other threads during the lock (i.e. no cross thread asynchronous access to instance).

I.e. a Monitor mutual-exclusion:
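A rough sketch of that postulated mechanism, with a class decorator standing in for the compiler-inserted lock (all names here are hypothetical; a real compiler would do this implicitly):

```python
import threading
import functools

# Sketch of the 'automatic per-instance mutex' idea: serialize all
# method calls on an instance, approximating a compiler-inserted monitor.

def monitor(cls):
    orig_init = cls.__init__

    def __init__(self, *args, **kwargs):
        self._monitor_lock = threading.RLock()  # reentrant for self-calls
        orig_init(self, *args, **kwargs)

    cls.__init__ = __init__
    for name, attr in list(vars(cls).items()):
        if callable(attr) and not name.startswith("__"):
            def wrap(method):
                @functools.wraps(method)
                def locked(self, *a, **kw):
                    with self._monitor_lock:  # no cross-thread re-entry
                        return method(self, *a, **kw)
                return locked
            setattr(cls, name, wrap(attr))
    return cls

@monitor
class Counter:
    def __init__(self):
        self.n = 0
    def incr(self):
        self.n += 1
    def value(self):
        return self.n

c = Counter()
threads = [threading.Thread(target=lambda: [c.incr() for _ in range(1000)])
           for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(c.value())  # 4000 -- no lost updates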

Transaction Barriers

The assertions of 'infinite propagation' for transactions (through 'code') ignore the wide variety of techniques for controlling the propagation of transactions, many of which are very easily applied to work with non-transactional resources.

A very simple transaction barrier is a cell. A transaction writes a value to the cell. An observer may later see the update, but not until after commit. A more implicit barrier is the 'decaying' transaction I mentioned earlier: one sends a message whilst within a transaction, but if the transaction itself is not waiting upon a reply from that message, then it may 'fall out' of the transaction by delaying the message in-transit until after commit. One could make such a 'decay' barrier explicit for actors that return 'unit'.
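A minimal sketch of such a barrier cell (the `Cell`/`Transaction` names are assumptions, not from any particular system): writes are buffered in the transaction and published to observers only at commit, so intermediate states never cross the barrier:

```python
# Illustrative 'barrier cell': a transaction's writes are buffered and
# published to observers only at commit; an abort publishes nothing.

class Cell:
    def __init__(self, value=None):
        self._value = value
        self._observers = []

    def subscribe(self, fn):
        self._observers.append(fn)

    def get(self):
        return self._value  # observers never see uncommitted writes

    def _publish(self, value):
        self._value = value
        for fn in self._observers:
            fn(value)

class Transaction:
    def __init__(self):
        self._pending = {}  # buffered writes, discarded on abort

    def write(self, cell, value):
        self._pending[cell] = value

    def commit(self):
        for cell, value in self._pending.items():
            cell._publish(value)  # updates cross the barrier only now
        self._pending.clear()

    def abort(self):
        self._pending.clear()  # rollback never crossed the barrier

seen = []
c = Cell(0)
c.subscribe(seen.append)

tx = Transaction()
tx.write(c, 42)
print(c.get(), seen)   # 0 [] -- update invisible before commit
tx.commit()
print(c.get(), seen)   # 42 [42] -- visible only after commit
```

The observer is fully decoupled from the transaction's fate: it either sees the committed value or nothing at all.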

Using these techniques, one can control how much exposure a resource has to transactions. It does require paying some attention to the semantics of the interface: the nature of these barriers naturally impacts the API to the resource, forcing code utilizing the API to curb the 'infinite extent' of its transactions and to use somewhat less convenient approaches for achieving safe concurrent operation. These barriers can be used for external resources and for frameworks or services within the language.

Many external resources, of course, are inherently transaction-safe (wall-clock, random number generators, static resource caches, mouse and keyboard inputs). Many more are close enough to bridge the gap - still achieving 'atomicity' and an acceptable level of 'isolation' - even if they can't achieve full ACID properties. For example, for a 'file' resource: your edits might stomp on the edits of other processes, or might not be complete in the event of power failure, but may still be good enough in practice.

For other external resources, you take care to build an interface and choose semantics that allow you to add a transaction barrier should you desire one.

So, despite your fears about 'infinite code-space influence' of transactions, the viral expansion of STM can be controlled wherever it needs to be controlled. Further, the transaction framework for several resources allows one to achieve something more flexible than non-transactional approaches.

In the large, I would be quite suspicious of a claim that a plain object approach achieves a more 'correct' behavior, as opposed to simply ignoring certain classes of error that transactional approaches or transaction barriers may catch.

Hacking away composability?

I, for one, do appreciate you taking your time to share your knowledge and helping to survey the various options and trade-offs.

The assertions of 'infinite propagation' for transactions (through 'code') ignore the wide variety of techniques for controlling the propagation of transactions, many of which are very easily applied to work with non-transactional resources.

How do these "hacks" impact composability? For example:

Shelby Moore III wrote:

Reminds me of how apparently the "clean or elegant" garbage collection algorithm dependencies reach out into every plugin in Mozilla/Firefox which they've apparently been struggling with all sorts of hacks to solve.

dmbarbour wrote:

A very simple transaction barrier is a cell. A transaction writes a value to the cell. An observer may later see the update, but not until after commit.

At least 2 problems:

1. The external observer is no longer seeing changes in real-time (as compared to running the same local program single-threaded without transactions); rather, external observer timing becomes dependent on transaction size. The observer updates may have nothing to do with orthogonal transactions. This is similar to the criticism you made about my monitor concurrency, wherein chains of recursive locks could interfere with thread priority. In short, this appears to trade the "infinite code propagation leakage" from rollback-capable dependency for synchronization perturbation. I hope you are starting to appreciate my statement that we have an "infinite sampling" theory problem, because these aliasing errors are just getting pushed around into different spaces due to the fact that our single-threaded algorithm is not out-of-order-execution tolerant.

2. These cells are not implemented at the low-level automatically, but require explicit changes to the local code that was ported from single-threaded.

A more implicit barrier is the 'decaying' transaction...

I do not completely grasp that, but it sounds like more synchronization perturbation propagation brittleness?

...the nature of these barriers naturally impacts the API to the resource, forcing code utilizing the API to curb the 'infinite extent' of its transactions and use somewhat less convenient approaches for achieving safe concurrent operation. This can be used for external resources and for frameworks or services within the language.

That is the destruction of composability. Once you have competing dependencies (i.e. STM and some other paradigm conflict, e.g. real-time synchronization dependency), you have the potential for deadlock on dependencies and then ALL code has to be re-factored. Besides the fact that the dependency is forcing API changes (or translation layer) to external things before they can be composed.

Many external resources, of course, are inherently transaction-safe ...mouse and keyboard inputs...

I do not think so in case of I/O, because there is a real-time synchronization problem.

...more are close enough to bridge the gap - still achieve 'atomicity' and an acceptable level of 'isolation' - even if they can't achieve full ACID properties.

In brittleness propagation, a little is a lot. Think of the Butterfly Effect, or more precisely that composability reduction is additive.

Also we have the interop issues of interfacing diverse APIs for rollback capabilities, and there will be deadlock incompatibilities between rollback-capable APIs.

For other external resources, you take care to build an interface and choose semantics that allow you to add a transaction barrier should you desire one.

I am conceptually thinking of using STM in those cases where I can completely isolate it in a sandbox, and where the algorithms needed wouldn't be natural without transactions. Because I view a little bit of external leakage as a futures contract on a lot of leakage.

So, despite your fears about 'infinite code-space influence' of transactions, the viral expansion of STM can be controlled wherever it needs to be controlled. Further, the transaction framework for several resources allows one to achieve something more flexible than non-transactional approaches.

That flexibility could have a huge leakage creep cost over time. I think the safest is to be very judicious about where one applies rollback-capable transactions. A wholesale approach could be dangerous in unforeseen ways.

...I would be quite suspicious of a claim that a plain object approach achieves a more 'correct' behavior, as opposed to simply ignoring certain classes of error that transactional approaches or transaction barriers may catch.

I am skeptical also of the monitor mutual-exclusion concurrency that I postulated, but regarding the catching of execution-order errors, the point of my proposal was to match algorithms to the application better, and if those algorithms could naturally be execution-order agnostic, then there are no such errors to catch. Then the externalities problem disappears also.

Composition with Transaction Barriers

How do these "hacks" impact composability?

Transaction barriers don't prevent any form of composition available to non-transactional Actors Model and Publish/Subscribe systems. They do impact expression of composition in a relatively simple way: rather than using return-values ('replies') for feedback, you must send a callback handle or request a subscription. This avoids any promise of a callback prior to commit.
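A sketch of that composition style, with hypothetical names: the caller hands over a callback handle instead of awaiting a return value, so no reply is promised before commit:

```python
# Composing across a transaction barrier without replies: the caller
# registers a callback; replies cross the barrier only after commit.

class Service:
    def __init__(self):
        self._queue = []

    def request(self, data, reply_to):
        # Accept the request now; no reply is promised pre-commit.
        self._queue.append((data, reply_to))

    def run_after_commit(self):
        # Deliver replies only once the enclosing transaction commits.
        for data, reply_to in self._queue:
            reply_to(data.upper())  # hypothetical 'work' on the request
        self._queue.clear()

replies = []
svc = Service()
svc.request("hello", replies.append)
print(replies)          # [] -- no reply before commit
svc.run_after_commit()
print(replies)          # ['HELLO']
```

The same shape works for subscriptions: the subscriber's handler simply fires on post-commit publications rather than on a synchronous return value.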

Transaction barriers do impact 'composability' in the sense that the system across a transaction barrier does not compose transactionally, but that is exactly their purpose. Overall, these are very clean "hacks".

external observer is no longer seeing changes in real-time (as compared to running the same local program single-threaded without transactions)

That is not a problem. The semantics of changes from a transaction are that, to external observers, the changes to the environment occur atomically upon 'commit', and that intermediate states are not visible.

If the external observer sees changes in real-time without transactions, then (ignoring temporal interference from the commit protocol itself) said observer will also see changes in real-time with transactions. Real-time will simply be measured from the point of 'commit', in accordance with transaction semantics.

this is similar to the criticism you made about my monitor concurrency wherein chains of recursive locks could interfere with thread priority

There is no interference with priority, since even pessimistic transactions may be aborted to make way for the higher-priority transaction to ensure responsiveness.

cells are not implemented at the low-level automatically, but require explicit changes to the local code that was ported from single-threaded

I'm afraid I don't follow this argument. How cells are expressed and implemented is language dependent, and they may be primitive. It is also unclear from where all this single-threaded code is supposedly being ported.

I do not completely grasp [decaying transactions], but sounds like more synchronization perturbation propagation brittleness?

The essence of transactional decay: messages whose replies are not needed get delayed until after commit. This isn't 'brittle' in any way I can see. It reduces transaction conflicts, increases chance of transaction success, and allows one to control the extent of transactions (keeping them narrow and shallow, as opposed to spreading wide and deep). I offered a reference to Transactional Actor Model [c2], near the bottom.
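For concreteness, a toy sketch of such decay (all names hypothetical): fire-and-forget messages are held in-transit and delivered only at commit, so the receiver is never entangled in the sender's transaction:

```python
# 'Decaying' transaction sketch: messages whose replies are not needed
# are delayed in-transit until after commit; an abort sends nothing.

class Outbox:
    def __init__(self):
        self._held = []

    def send(self, actor, msg):
        self._held.append((actor, msg))  # held in-transit, not delivered

    def commit(self):
        for actor, msg in self._held:
            actor(msg)  # message 'falls out' of the transaction now
        self._held.clear()

    def abort(self):
        self._held.clear()  # the aborted transaction sent nothing

log = []
printer = log.append  # a hypothetical fire-and-forget actor

ob = Outbox()
ob.send(printer, "job-1")
ob.send(printer, "job-2")
print(log)    # [] -- receiver untouched while the transaction runs
ob.commit()
print(log)    # ['job-1', 'job-2']
```

Because the receiver only ever sees post-commit messages, it cannot be rolled back by the sender, which is how the transaction stays narrow and shallow.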

As far as not grasping private language... 'synchronization perturbation propagation brittleness' ain't exactly a point on your side ;).

[controlled API] is the destruction of composability.

Transaction barriers only 'destroy composability' for transactions; that is, transactions do not compose across such a barrier. That is exactly their purpose.

Other orthogonal properties still compose, such as interaction priority, multi-cast, demand-driven subscriptions, capability security, type-safety, safe and deadlock-free concurrency, garbage collection, persistence, disruption-tolerance, automatic distribution, a variety of partial-evaluation optimizations, etc.

One claims programming systems are 'composable' when composition is able to meet desirable properties without imposing onerous or conflicting requirements on users. I posit that, even while using transaction barriers, the set of useful properties achieved in composition of a transactions-based approach can be greater than that of your monitors-based approach.

Once you have competing dependencies (i.e. STM and some other paradigm conflict, e.g. real-time synchronization dependency), you have the potential for deadlock on dependencies and then ALL code has to be re-factored.

What does 'deadlock on dependencies' mean?

the dependency is forcing API changes (or translation layer) to external things before they can be composed

Changes from what? Translation to what? Are you assuming an existing implementation of each API in a particular language?

All APIs require disciplined design in order to both match the expectations of the language and the system they control. One describes semantics for the API, and those semantics determine the requirements imposed on users (including ordering of calls) for safe, secure, high-performance, leak-free, deadlock-free, <desirable property etc.>, composition.

Any non-trivial API design impacts composition.

Many external resources, of course, are inherently transaction-safe ...mouse and keyboard inputs...

I do not think so in case of I/O, because there is a real-time synchronization problem.

I'm afraid you'll need to explain what the problem happens to be, and how transactions interfere with this input.

In brittleness propagation, a little is a lot. Think of the Butterfly Effect, or more precisely that composability reduction is additive.

When a restriction in the features a programmer can achieve is due to inherent limitations of the environment and systems being controlled - as opposed to a weakness in composition or expression inherent to the programming language - I do not believe the 'butterfly effect' to be a serious issue. It isn't as though a competing approach can do 'better' for working with that edge system.

One should instead express these limitations in the environment via corresponding limitations in the API and vocabulary, and the API then guides the code that interacts with that subsystem. If the system later upgrades, then upgrade the API.

A wholesale approach could be dangerous in unforeseen ways.

Even Smurfs and Carebears could be dangerous in unforeseen ways. Let's focus on what we can see, shall we?

my proposal was to match algorithms to the application better and if those algorithms could naturally be execution-order agnostic, then there are no such errors to catch. Then the externalities problem disappears also

The problem is that the domain is often not order-agnostic. You can't remove a job from a printer queue until after you've added it, for example.

I agree with the basic idea of 'avoid order-dependence where feasible'. More accurately, I agree with a corollary: 'take advantage of order-independence where possible'. And that can be done with data-parallelism, reactive programming, events distribution and events processing, making it cheap and easy to 'spawn' tasks, etc.

I simply think your overall assumption - that order independence 'matches algorithms to applications better' - is in error.

I also believe that to automatically take advantage of 'order independence' pretty much means concurrency with observable determinism. Programmers are plenty capable of achieving order independence when it is purely semantic (via spawning tasks, parallel 'foreach' operations on a list, etc.).
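A tiny illustration of such purely semantic order-independence, using a parallel 'foreach' (sketch only): when per-item work is independent, the result is the same regardless of execution order.

```python
from concurrent.futures import ThreadPoolExecutor

# Semantic order-independence: independent per-item work can be
# dispatched in any order; results are gathered in input order.

def parallel_foreach(fn, items, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, items))  # map preserves input order

squares = parallel_foreach(lambda x: x * x, range(8))
print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

No transaction or lock is needed here because the programmer has made the order-independence explicit in the structure of the computation itself.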

RE: Composition with Transaction Barriers

I may not be able to respond to each of your points, because apparently our models of reality are diverging, perhaps on the order of the difference between space-time and quantum mechanics. So (in this fork of the discussion at least) it is becoming increasingly difficult for us to understand each other, because the base models from which we are communicating are not resonating. How would one explain quantum mechanics or space-time in terms of each other? They are not (yet) unified. We may be able to unify our divergent models and resonate our understanding, if one or both of us is smart enough. I will not be disappointed if I fail on my side, as the discussion has been productive. Thank you very much. It was Einstein who quipped about the deceit of models (remember, sampling theory says we don't know the true signal until we've sampled for infinite time, e.g. "even a broken clock is correct 2x per day", where "broken" is relative to whose clock everyone else is using; i.e. whose clock is broken twice per day, yours or mine? It depends on whose shared reality we are modeling):

Einstein quote:
"If the facts don't fit the theory, change the facts"

As far as not grasping private language... 'synchronization perturbation propagation brittleness' ain't exactly a point on your side ;).

:) Yeah maybe it is just semantics of communication that is blocking the resonance of our mutual understanding. But that is to some degree correlated to the different models of reality in our minds and different areas of focus/expertise. Whoops on "monitor mutual exclusion" earlier :)

What does 'deadlock on dependencies' mean?

"Composability deadlock" means a plurality (2 or more) of things we want to compose all need changes to each other's interfaces in such circular spaghetti that we cannot make the changes without re-factoring code underneath the public interfaces. In other words, it is no longer possible to achieve the resonance at the translation/glue/Adapter Pattern (between interfaces) layer, i.e. composability is not achievable by adding code, only by re-factoring code.

I agree with the basic idea of 'avoid order-dependence where feasible'. More accurately, I agree with a corollary: 'take advantage of order-independence where possible'. And that can be done with data-parallelism, reactive programming, events distribution and events processing, making it cheap and easy to 'spawn' tasks, etc.

Agreed for example.

When a restriction in the features a programmer can achieve is due to inherent limitations of the environment and systems being controlled - as opposed to a weakness in composition or expression inherent to the programming language - I do not believe the 'butterfly effect' to be a serious issue. It isn't as though a competing approach can do 'better' for working with that edge system.

My point is that it is better to re-factor the algorithm for concurrency than to propagate composability brittleness. Agreed that without re-factoring the single-threaded algorithm, all concurrency strategies will leak errors into the composability domain/space (btw, I don't like the use of the shorthand "domain" for data domain, i.e. DOM... domain is a general term and should remain so).

In short, re-factor sooner than later, because composability deadlocks later impact more code.

The essence of transactional decay: messages whose replies are not needed get delayed until after commit. This isn't 'brittle' in any way I can see. It reduces transaction conflicts, increases chance of transaction success, and allows one to control the extent of transactions (keeping them narrow and shallow, as opposed to spreading wide and deep).

Ah, I see now what "decay" means in that context. Afair, you changed terminology since your prior post on that in our discussion; before, you wrote "transaction futures", which I think has more intuitive semantics.

Any futures contract is slavery (brittleness) if it obstructs nearer-term freedom (i.e. if the wait causes something else to wait in a way it wouldn't have otherwise; see below for more on the conflation of real-time atomicity with transaction atomicity). Otherwise I agree it is actually an algorithmic improvement towards concurrency, analogous to the queued replies from queued concurrent tasks. However, unlike the linked example where the parallel tasks are assumed to have no impact on latency of real-time I/O synchronization (because the Unreal 3 engine already works), if you are proposing transaction futures as a general mechanism, one has to be careful the futures don't impact real-time dependencies. That is what I mean by my private language, 'synchronization perturbation propagation brittleness'.

Thus, for transaction futures to work without slavery (in the algorithm, no conflation/interweaving of real-time atomicity and transaction atomicity), they are actually a modification of the single-threaded algorithm away from execution-order dependency. Thus we are re-factoring the single-threaded algorithm, as my AXIOM expects. There is no free lunch. We have to re-factor the single-threaded algorithm. I repeat what I wrote many posts ago, "we have a sampling theory problem", meaning we have to sample for infinite code space-time unless we fix the algorithm. That aliasing error can get pushed into the composability domain space.

Many external resources, of course, are inherently transaction-safe ...mouse and keyboard inputs...

I do not think so in case of I/O, because there is a real-time synchronization problem.

I'm afraid you'll need to explain what the problem happens to be, and how transactions interfere with this input.

Example here.

Changes from what? Translation to what?...

All APIs require disciplined design...[shelby paraphrases: semantics/laws must be adhered to]

Any non-trivial API design impacts composition.

Refer to my definition of "composability deadlock" above. The semantics of the interfaces collide -- agreed, they impact composition. So we must minimize the semantics we force outward. I am not saying we must minimize the features and vocabularies we offer outward. Offering is not the same as forcing. One is a free market, the other is slavery. The problem I have with STM is not the features it exposes locally (juicy and tempting, like what fiat offers), but the fact that it forces propagation externally (creeping brittleness or slavery, just like fiat+govt law does). That force is the greatest evil I can think of in software engineering. Interfaces should offer features and minimize forced requirements on the caller.

Statism is failure. The best designs are those that minimize the external dependencies they radiate (propagate). AXIOM: leave the smallest footprint for the largest productivity. COROLLARY: small things grow faster.

Afaics so far, STM is a mechanism for hiding the concurrency errors of the single-threaded algorithm and forcing the error into other spaces, e.g. composability space. It is this force (of the law of rollback semantics) that is its own undoing in the end, analogous to how all laws except the constant natural laws (as wisely enumerated in the Bible, apparently only for the 144,000 who want to know) are statism's undoing. Statism exponentially peaks and then utterly fails miserably (starvation, war, hyperinflation, total devaluation, etc.) over and over throughout history, every several decades. Ditto code. We throw it all away and start over every few years (Moore's Law compounds faster than the difference of interest rates and the population growth rate).

I am really a preacher masquerading as a software engineer :).

If we view STM as a tool to help the re-factoring from single-threaded to concurrent algorithms, and try to minimize STM's propagation (e.g. your notable example above of modifying, or taking advantage of, transaction futures concurrency in the single-threaded algorithm), then I can view STM as helpful and not evil. When I say "evil", I don't mean all that deceptive propaganda ("Global Warming", etc.) which is part and parcel of the statism evil, but rather the tapeworm economics (cancer) of statism that wastes 99% of human productivity. Statism means any law/force and/or group-think/ideology that propagates and thus concentrates control. Programming is just a microcosm of that.

Transaction barriers only 'destroy composability' for transactions; that is, transactions do not compose across such a barrier. That is exactly their purpose.

Other orthogonal properties still compose, such as interaction priority...

It is very common illusion amongst most people in the world that the semantics of law can protect them from random effects of nature. Ditto insurance.

The aliasing errors that I am mentioning are random and are a result of the inability to sample, for infinite time, all the code paths of the single-threaded algorithm (an infinite time regression). This is kind of a deep point; I suppose I cannot adequately convey it here, no matter how many times I mention it. Fundamentally, semantics cannot make any guarantees because of the Halting problem, which is another way of stating the infinite sampling problem. Look also at Gödel's Theorem, as it seems logically analogous to the work I did recently to reveal that sampling theory tells us we never know the true signal, thus science is faith (evidence of a shared reality only). That seems to anger a lot of "well reasoned" people (many invent all sorts of heuristics to try to push away the implications relative to faith in general), but four of the world's smartest men understood it. Carl Sagan: "Albert Einstein considered God to be essentially the sum total of the [natural law]... An atheist has to know a lot more than I know." I deduce the same of Isaac Asimov and Marvin Minsky: "If you wire it randomly, it will still have preconceptions of how to play. But you just won't know what those preconceptions are." (Reality is true only to the extent it is shared.) As an example of the false security of erroneously overextended reason (idols), the edge conditions of a local state diagram provide the illusion that the randomness of natural law is contained or bounded, but that ignores the infinite external state diagram. Plato demonstrated that scientific models are only approximations of the real world.

We have a very relevant and timely example of the human failure that results from over-idolizing semantics (for surety) with resultant erroneous claims blaming the OOP semantics for the failure. The OOP semantics didn't fail, rather it was the lack of respect for natural law and the limitations of semantics thereof.

That is not a problem. The semantics of a change from a transaction is that, to external observers, the changes to the environment occur atomically upon 'commit', and that intermediate states are not visible.

If the external observer sees changes in real-time without transactions, then (ignoring temporal interference from the commit protocol itself) said observer will also see changes in real-time with transactions.

Disagree, because as I said in a prior post, the transaction atomicity is not necessarily correlated to the real-time atomicity; i.e., in the non-transactional single-threaded execution, a reply to the real-time externality can occur before the completion of some atomicity in what will need to be (as you suggested) a transaction cell in the multi-threaded version.

As I wrote here, STM cannot unweave conflated atomicities of different types; only the algorithm can do that. The semantics of transactions are not orthogonal to the natural law of the world and cannot protect you. AXIOMS: No man is an island. Make many offers, but no promises. Again, refer to Greenspan's key blunder of mistaking slavery for what he and other academics think is a free market. They depended on (idolized and prayed to) semantics, not on the constant natural laws.

My proposal was to better match algorithms to the application; if those algorithms could naturally be execution-order agnostic, then there are no such errors to catch. Then the externalities problem disappears also.

Problem is that the domain is often not order agnostic. You can't remove a job from a printer queue until after you've added it, for example.

I assume by "domain" you mean the stateful world, i.e. data (DOM) or the side-effects domain, or the space of anything that is not purely functional? Is there a more specialized term in CompSci than "domain"?

Agreed the model of the stateful world matters. That is why I started this whole discussion by noting that for gaming in the real world actors have the most freedom when they do not form futures contracts and simply trade incrementally.

I will probably try to end my contribution to this discussion here, unless you have a major revelation to add.

Again thank you so much, I learned a lot in a very short time with you. You are obviously very knowledgeable in this field and you brought me up to speed. I will admit it, 2 days ago I didn't know squat specifically about transactions (other than use of MySQL 3.xx, my 25+ years of programming, common sense, and the Bible's natural laws), and I faked it. :D

Transactions: The programmer

Transactions: The programmer subdivides effectful code into delimited blocks, and the language runtime is responsible for running the blocks concurrently while guaranteeing that each one operates atomically (without interference from other blocks), dynamically resolving dependencies.

This is not the only form of transactional information system design. See Weikum and Vossen's excellent text Transactional Information Systems.

For example, for modern distributed data binding, the current flavor appears to be multi-version concurrency control (MVCC).
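A minimal single-threaded sketch of the optimistic, version-checked commit idea behind MVCC/STM-style designs; the names here (VersionedCell, atomically) are illustrative, not from any particular system:

```python
# Sketch of optimistic concurrency: a transaction records the versions it
# read, buffers its writes, and commits only if nothing changed underneath it.
import threading

class VersionedCell:
    def __init__(self, value):
        self.value = value
        self.version = 0

_commit_lock = threading.Lock()  # serializes commits only, not reads

def atomically(txn):
    """Run txn(read, write) until it commits without conflicts."""
    while True:
        reads = {}   # cell -> version observed
        writes = {}  # cell -> buffered new value
        def read(cell):
            if cell in writes:
                return writes[cell]
            reads[cell] = cell.version
            return cell.value
        def write(cell, value):
            writes[cell] = value
        result = txn(read, write)
        with _commit_lock:
            if all(cell.version == v for cell, v in reads.items()):
                for cell, value in writes.items():
                    cell.value = value
                    cell.version += 1
                return result
        # conflict: another transaction committed first; retry from scratch

# Example: transfer between two accounts as one atomic action.
a, b = VersionedCell(100), VersionedCell(0)
def transfer(read, write):
    write(a, read(a) - 30)
    write(b, read(b) + 30)
atomically(transfer)
```

Note the rollback is implicit: a failed transaction simply discards its buffered writes and re-runs, which is why effectful code inside a transaction is problematic.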

Communicating Sequential Processes / Message Passing Concurrency: Erlang's approach. Each process can only access its own local variables; processes communicate by sending messages to each other.

The more general style here is the Shared Nothing idiom.
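A rough sketch of the share-nothing, message-passing style in Python; the Process class and its names are illustrative. Each "process" touches only its own local state and communicates solely through queues:

```python
# Erlang-flavored message passing: a thread per process, a mailbox per
# process, and no shared mutable state between them.
import threading, queue

class Process:
    def __init__(self, behavior):
        self.mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, args=(behavior,))
        self._thread.start()
    def _run(self, behavior):
        state = {}                      # local state, never shared
        while True:
            msg = self.mailbox.get()
            if msg == "stop":
                break
            behavior(state, msg)
    def send(self, msg):
        self.mailbox.put(msg)
    def join(self):
        self._thread.join()

results = queue.Queue()
def counter(state, msg):
    state["n"] = state.get("n", 0) + msg
    results.put(state["n"])

p = Process(counter)
p.send(1); p.send(2); p.send(3)
p.send("stop"); p.join()
# running totals arrive in order: 1, 3, 6
```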

Shared Nothing?

The more general style here is the Shared Nothing idiom.

I never did understand the 'shared nothing idiom'. As I see it, the only way to share nothing is to never communicate even indirectly... which isn't a particularly useful idiom for composing a program.

What does 'shared nothing idiom' mean to you? (Just taking my hammer to your words and nailing stuff down. ;)

Shared Nothing generally

Shared Nothing generally describes Mike Stonebraker's classical database systems design paper, The Case For Shared Nothing Architecture

As I see it, the only way to share nothing is to never communicate even indirectly...

Well, often, that is exactly how shared nothing is implemented. Removing unnecessary communication is a good thing. This is called (application) partitioning in OO.

In REST, shared nothing means that the client does not have a workflow that describes N steps in a process, but instead negotiates each step with the server via the hypermedia document. Likewise, the server does not store client-side information to service each request. In this sense, REST has much in common with the kind of classical OO used in real-time embedded systems engineering, where objects are atomic entities in the problem domain and they collaborate by sending messages to each other - in OO, the engine is finite state automata, which is very similar to the Hypermedia as the Engine of Application State (HATEOAS) principle in REST. In OO, because the engine is finite state automata, doing such things as adding guards to a state transition effectively encodes workflow into client callers. And, in OO, guards are the source of architectural decay.

In database literature, and especially the relational model, there are no physical storage requirements for where a logical piece of data must go. So you can say things like "store all European customers in Europe" (sharding), and also have a rules engine sit in front of the database and do integrity checks for it. In this scenario, you might have a security policy where the database(s) only talk to the rules engine, and the rules engine acts as a proxy on behalf of the client. In Fielding's terminology, the rules engine is really just a connector.

which isn't a particularly useful idiom for composing a program.

Well, yes. But it is a useful idiom for application partitioning.

In the sharding example, you will lose real-time reporting capabilities on queries that involve both Europe and America: you would be doing a distributed join, and your bottleneck will be the network and the speed of light. But why do you want to ask those questions anyway? The only scenario I can think of is exchange markets, especially commodities, because the price of a commodity in one market is usually mirrored in another market within milliseconds, and you are looking to do an arbitrage scheme where you can make millions of dollars off pips or whatever.

I think his point was that

I think his point was that 'shared nothing' is a misleading term. If, in the semantic model, you're sharing something (an agreement on the value of something), it doesn't matter how you encode it. Writing the same application with value/message passing primitives just encodes the sharing, and thus ultimately should have the same problems. The libraries you write on top of the primitives are the interesting thing -- and, at that point, one has to wonder -- in a well-designed distributed system, should programmers even know message passing is going on?

I may have read too much into his question.

in a well-designed

in a well-designed distributed system, should programmers even know message passing is going on?

That's Erik Meijer's Volta project's question.

Another question:

in a well-designed distributed system, should programmers even need to distinguish shared nothing, or should proper OO control partitioning just be how they think?

If you are responding to

If you are responding to Leo's question (and thus Erik's Volta project), then you have to look at how Erik actually proposes to split things across tiers as required.

You didn't mention Dynamic Data Flow

Dynamic Data Flow: The programmer creates a virtual computation space and establishes (conditioned) relations between nodes in the space by sending messages.

The matrix multiplication loops

will transform into
We send matrices' elements into Arecv and Brecv and take results from the array C elements receiver.
This is long, but dataflow analysis could help transform ordinary array code into code like that.
See WaveScalar on the EDGE page.
I also have an (informal and unpublished) paper about dynamic dataflow here.
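For readers without the paper, here is a hedged, single-threaded sketch of the classic dataflow firing rule (a node fires as soon as every input port holds a token), used to compute one element of a matrix product. The Node class and port conventions are invented for illustration:

```python
# Token-based dataflow: nodes fire when all inputs are present and push
# their result downstream, so evaluation order emerges from data arrival.
import queue

class Node:
    def __init__(self, nports, op, downstream=None):
        self.inputs = [queue.Queue() for _ in range(nports)]
        self.op = op
        self.downstream = downstream    # (node, port) pair, or None
        self.results = queue.Queue()
    def send(self, port, token):
        self.inputs[port].put(token)
        self._try_fire()
    def _try_fire(self):
        # classic firing rule: fire only when every input port has a token
        if all(not q.empty() for q in self.inputs):
            args = [q.get() for q in self.inputs]
            out = self.op(*args)
            if self.downstream is not None:
                node, port = self.downstream
                node.send(port, out)
            else:
                self.results.put(out)

# c[0][0] of C = A x B as a tiny graph: two multipliers feed an adder.
add = Node(2, lambda x, y: x + y)
mul1 = Node(2, lambda x, y: x * y, downstream=(add, 0))
mul2 = Node(2, lambda x, y: x * y, downstream=(add, 1))
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
mul1.send(0, A[0][0]); mul1.send(1, B[0][0])   # A[0][0] * B[0][0]
mul2.send(0, A[0][1]); mul2.send(1, B[1][0])   # A[0][1] * B[1][0]
c00 = add.results.get()                        # 1*5 + 2*7 = 19
```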

That presentation seems

That presentation seems rather tied to historical dataflow computer architectures. There was a great paper (citation escapes me -- it was a survey one) describing the fork between architectural and linguistic approaches to dataflow.

In this case, many dynamic dataflow languages provide synchronous semantics and then achieve dynamism by adding a 'filter' construct: there is no longer an output for every input. I would argue that higher-order dataflow languages are also dynamic in the sense that you can model them as changing the shape of the dataflow graph over time. My point here is that dynamic dataflow is indeed already included above -- that's how I (often) think about FRP!

Also, I didn't realize there was a renewed interest in such hardware for general computation -- even the GPU companies seem to be going thicker, not thinner. Thank you!

A few parallel designs...

Peter Van Roy has written a few documents that survey different forms of concurrency. CTM and a recent chapter Programming Paradigms for Dummies [LTU Node] include such surveys.

Tim Sweeney's Next Mainstream Programming Language [LTU Node] includes a survey of a few and their 'places' in game design. He mentions transactions above, and support for wide-vector parallelization qualifies as a form of data-parallel computation.

My own interest is in 'open' distributed computing, but concurrency is part and parcel to distributed designs.

Data-Parallel computation, possible with pure logic computation and pure functional computation, allows parallelization without affecting semantics. This is generally considered "parallelization" rather than "concurrency" because only the implementation is concurrent, not the semantics.

Some systems, like Google's MapReduce, aim to achieve data-parallel computation by shipping the computation to a much larger data-set. Wide-vector operations, seen in supercomputers, graphics cards, and to a lesser degree in modern processors, achieve this at a much smaller scale, but still provide useful performance boosts.

The main challenge with data-parallelism is deciding where to break up the computations, a challenge greatly exacerbated by the fact that the cost-to-benefit ratio of any given breakdown depends heavily on both the implementation architecture and on pressure from other sources of concurrency. If decided by hand, there is a high syntactic overhead, and the decision is likely too 'local': the annotations will need to be adjusted for different architectures. One may do better with some sort of adaptive design that uses profiling to decide where to break a computation down.

For distributed data-parallelization, one must also manage inevitable operation and node failures, which isn't a trivial problem. One can't fire and forget.

Functional Reactive Programming with data-binding allows a great deal of parallelization and also has very nice features for caching, distribution and efficient multi-cast, disruption tolerance, and minimizing latencies. I personally think it an excellent bet for describing scene graphs. I think PVR called this 'synchronous dataflow programming' in Programming Paradigms for Dummies. One can also do logic programming with data-binding, which is neat.

In general, this is implemented such that updates invalidate caches eagerly and subscribers recompute lazily. One must avoid a common pitfall where observers see 'glitches': for example, x changes from 3 to 4 atomically, but observers of (x*(x+2)) see 15->20->24 or 15->18->24 rather than 15->24.
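One way to avoid such glitches is to recompute dependents in topological order, so no half-updated intermediate value is ever observed. A small sketch, with Signal and set_value as illustrative names rather than any real FRP library:

```python
# Glitch-free propagation: after a source changes, recompute its transitive
# dependents in dependency order (reverse DFS post-order = topological sort).
class Signal:
    def __init__(self, value=None, compute=None, deps=()):
        self.value, self.compute, self.deps = value, compute, deps
        self.subs = []
        for d in deps:
            d.subs.append(self)

def set_value(source, value, log):
    source.value = value
    order, seen = [], set()
    def visit(s):
        if s not in seen:
            seen.add(s)
            for t in s.subs:
                visit(t)
            order.append(s)
    visit(source)
    for s in reversed(order):               # topological order
        if s.compute is not None:
            s.value = s.compute(*[d.value for d in s.deps])
            log.append(s.value)             # what an observer of s sees

x = Signal(3)
y = Signal(compute=lambda xv: xv + 2, deps=(x,))          # y = x + 2
z = Signal(compute=lambda xv, yv: xv * yv, deps=(x, y))   # z = x * (x + 2)
y.value, z.value = 5, 15                                  # initial values
log = []
set_value(x, 4, log)
# y recomputes before z, so z's observers see 24 directly, never 20 or 18
```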

Presumably, data-binding could be usefully combined with transactions to allow more than one variable to update as a single atomic action. However, I have not seen this in practice.

Blackboard Architecture and Tuple Spaces achieve concurrency by having a bunch of active agents talking to a central database. One can generally wait until a tuple with a given pattern is identified. This has a major advantage in that the elements are not temporally coupled and have excellent disruption tolerance. One can also perform ACID transactions on the central database.
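A minimal, single-process sketch of a Linda-style tuple space; the out/rd/inp names follow Linda convention, None serves as a wildcard in patterns, and this is illustrative rather than a real implementation:

```python
# Linda-flavored tuple space: agents publish tuples, and consumers block
# until a tuple matching their pattern appears.
import threading

class TupleSpace:
    def __init__(self):
        self.tuples = []
        self.cond = threading.Condition()
    def out(self, tup):
        """Publish a tuple to the space."""
        with self.cond:
            self.tuples.append(tup)
            self.cond.notify_all()
    def _find(self, pattern):
        for tup in self.tuples:
            if len(tup) == len(pattern) and all(
                    p is None or p == v for p, v in zip(pattern, tup)):
                return tup
        return None
    def rd(self, pattern):
        """Block until a match exists; read it without removing it."""
        with self.cond:
            while (m := self._find(pattern)) is None:
                self.cond.wait()
            return m
    def inp(self, pattern):
        """Block until a match exists; remove and return it."""
        with self.cond:
            while (m := self._find(pattern)) is None:
                self.cond.wait()
            self.tuples.remove(m)
            return m

ts = TupleSpace()
ts.out(("job", 1, "render"))
ts.out(("job", 2, "compile"))
job = ts.inp(("job", None, None))   # removes the first matching job
```

The temporal decoupling is visible here: the producer and consumer never need to be running at the same time, only the space does.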

The main weaknesses of this design: (1) Coordination of components to perform a shared activity becomes very complex. Different components cannot see each other's transactions until they are completed, so multi-step behaviors between components cannot be performed atomically. Due to this limitation, individual components of the system will have inherent pressure to 'bloat up' to reinvent features rather than coordinate to achieve them. (2) Cleanup. Any given update might be observed by many components over a long period of time. It is rarely clear when or by whom a tuple should be eliminated from the repository. One can use expirations and such, but ultimately it's going to be heuristic and best guess. (3) Security and sharing (across organizations) - the use of a central database to coordinate elements is common, but not fundamentally secure. Such systems tend to have 'egg-shell' security - rather fragile, and a gooey middle that can rot quite easily after a breach.

Central point of failure is also a potential concern, but may be mitigated by varied design of the database for redundancy or distribution.

One could presumably use micro-databases to coordinate and secure them as object capabilities, but then distribution of the micro-db caps is a problem and so is multi-database pattern-matching and multi-database updates (one could fall back on FRP and distributed transactions, though).

Complex Event Processing achieves concurrency by having components produce and react to discrete events.

Unlike message passing concurrency, there is reduced coupling between producer and consumer, usually via some sort of 'channels' or 'component BUS' or other publish subscribe model. The reduced coupling is advantageous for distribution and modularity, and also allows the system as a whole to continue even when individual nodes fail.

The main challenges of the CEP architecture occur upon startup and shutdown, as components may 'miss' important events. Usefully, one can hybridize with the 'database' approach and have the network itself keep a history to reduce startup and shutdown issues.

There is much promise in combining CEP with distributed transactions, where a sequence of events - including reactions from listeners - may be part of a common (hierarchical) transaction.

One can also optimize CEP by ensuring it is demand-driven: a source for events is made available in a registry, but isn't 'activated' until there is a corresponding sink for events. This design can, for example, automatically enable and disable a webcam based on whether there is a frame-demand. This sort of demand-driven nature can be chained through a CEP system by creating each CEP 'channel' with three facets: one to subscribe, one to update, and one suitable for data-binding that simply indicates whether someone is subscribed. (An object-per-facet is quite suitable for object capability security...)
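The three-facet channel described above might look something like this sketch (the class and facet names are my own invention):

```python
# A demand-driven event channel with three facets: subscribe, update, and a
# demand indicator that upstream sources can data-bind to.
class Channel:
    def __init__(self):
        self._subs = []
    # facet 1: subscription (returns a capability to unsubscribe)
    def subscribe(self, callback):
        self._subs.append(callback)
        def unsubscribe():
            self._subs.remove(callback)
        return unsubscribe
    # facet 2: publishing updates to current subscribers
    def update(self, event):
        for cb in list(self._subs):
            cb(event)
    # facet 3: demand indicator, suitable for data-binding
    @property
    def has_demand(self):
        return bool(self._subs)

cam = Channel()
frames = []
# no demand yet: the webcam source can stay off
stop = cam.subscribe(frames.append)
# cam.has_demand is now True: activate the source
cam.update("frame-1")
stop()
# cam.has_demand is False again: deactivate the source
```

Handing out each facet as a separate object reference would give the object-capability separation mentioned above.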

I also propose Simple Event Processing (SEP): like CEP with a few limitations aimed to achieve greater 'implicit' parallelization by sharing computation efforts. SEP forbids creation of new events, forbids operations that reference more than one event, and forbids side-effects (no internal state, no 'sends'). An SEP network could union, filter, and transform event streams, and may be combined with functional-reactive and data-binding to filter, transform, and union streams based on mutable policies (using FRP). The main advantages of this tweak: (1) it becomes possible to automatically optimize multi-cast and sharing of common 'sub-network' computations (useful when a few thousand observers might be watching an ad-hoc 'query' whose results you don't wish to name and prepare explicitly in advance). (2) Demand-driven nature can be enforced automatically end-to-end on the SEP network. CEP could then be implemented at the very 'edges' of an SEP network. To my knowledge, these optimizations have never been targeted by an existing implementation.

I'll note that, in real life, events are very rarely discrete. Most 'events' (weddings, forest fires, battles, earthquakes, toilet flushes) have lifetimes, and have a status that varies over this lifetime. One might even include the 'aftermath' of a forest fire or earthquake or wedding (cleanup, honeymoon, flooding, etc.) as part of the event. Or, put another way, the lifetime of an event is for as long as information about it remains interesting. (Discrete events also are difficult to expose in a UI due to temporal coupling and display lifetime issues.) As a consequence, CEP and SEP are probably not the best ways to model 'real life' events. Rather, FRP used to obtain live views of a database (or more than one) is often a better design for that purpose.

Rather than modeling 'real life' events, SEP serves better in an intermediate role as a high-performance plumbing (multi-cast, demand-driven, modular, composable, 'smart' network) layer just below an objects or actors model - especially in a distributed system (where you don't want a single-point-of-failure that might come if you try to represent SEP channels as 'objects'). Messages between actors serve well as 'discrete events'.

Publish, Subscribe models focus on distribution of events and data (of the sort used by SEP/CEP/FRP). A given model determines how one describes a subscription, and how much the model 'knows' about the data being published. Of particular interest are 'domain-centric' schemes where one can (say) describe data and event feeds in terms of geographic and geometric regions (tell me about crime within 2 miles of lat/lon). This sort of scheme would work well with massively concurrent systems with a potentially large number of multi-cast consumers, such as an MMORPG.

Publish-subscribe can be securely composed by having capability-secure but composable 'registries' for data-sources. I.e. I can create a new registry and add some sources, have it compose Google's registry, so I can see Google's data but Google cannot see mine. The trick is to achieve this coordination without introducing a 'demand' on the event and data sources, without hindering multi-cast optimizations, and potentially under some anonymity/blinding constraints (e.g. via a trusted intermediary). That takes some design effort, but is doable.

Thanks - Looking also for coarser grained mechanisms

For a crude example, servlets operating under a web application server; or simple 'grep' and 'wc' style utilities tied together in a Unix pipeline. Etc.

Again, thanks much!

Scott

p.s. On the lower level control mechanism .....

Given the ire drawn by the threads + locks model, I'm curious what folks think is the most elegant concurrency control mechanism/framework.

Again, mucho, mucho thanks in advance.

Scott

The threads+locks+shared-state model contradicts many principles associated with 'elegance'. For example, elegant systems should be composable, but two lock-based programs - independently correct - may deadlock when composed.

Shared state design also introduces challenges for explicit memory management, mutable collections (esp. with idioms that map an operation across the collection), visitor patterns, etc.

That said, threads and shared state can work without locks. Wait-free and lock-free algorithms and data structures can be reasonably elegant. And a language with greater support for 'persistent' structures (such as ropes, copy-on-write maps, etc.) avoids many problems of shared state.
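As a tiny illustration of the 'persistent structure' idea, here is a naive copy-on-write map; a real persistent map (e.g. a HAMT) would share structure in O(log n) instead of copying:

```python
# A 'persistent' map: every update returns a new map and never mutates the
# old one, so concurrent readers hold stable snapshots without any locking.
class PersistentMap:
    def __init__(self, data=None):
        self._data = dict(data or {})
    def assoc(self, key, value):
        new = dict(self._data)   # naive O(n) copy; real structures share
        new[key] = value
        return PersistentMap(new)
    def get(self, key, default=None):
        return self._data.get(key, default)

m1 = PersistentMap({"a": 1})
m2 = m1.assoc("b", 2)
# m1 is unchanged; a reader in another thread holding m1 sees a snapshot
```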

wait-free can be hard, too

i got the impression that writing a wait or lock free system was one of those Really Hard To Get Right So 99% Of Us Shouldn't Think They Can things.

totally agreed

Wait-free is really, really, really hard to get right.

That's why I said 'elegant' and not 'easy'. Like ballet. ^_^

Flow-Based Programming (FBP)

FBP is usually considered a message-passing approach, but connections between processes are specified separately from the processes. Processes only communicate via data streams, so applications built this way are relatively free of side-effects, and applications can be grown incrementally; thus FBP has a number of the characteristics listed for SEP above. An early mainframe implementation of FBP, using "green" threads, is in use at a major Canadian bank as the infrastructure of a large application which has been in continuous use since the early '70s, processing millions of transactions a day and undergoing continuous changes to logic and hardware. In my opinion it has significantly improved application maintainability.
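A toy sketch of that FBP shape: components are written independently against abstract in/out ports, and the network wiring between them is specified separately (the component and channel names here are invented):

```python
# FBP-style components: each reads packets from an input port, writes to an
# output port, and knows nothing about what it is connected to.
import threading, queue

def upcase(inport, outport):
    while (pkt := inport.get()) is not None:
        outport.put(pkt.upper())
    outport.put(None)               # propagate end-of-stream downstream

def numberer(inport, outport):
    n = 0
    while (pkt := inport.get()) is not None:
        n += 1
        outport.put(f"{n}: {pkt}")
    outport.put(None)

# The network definition lives apart from the component code:
c1, c2, c3 = queue.Queue(), queue.Queue(), queue.Queue()
threads = [threading.Thread(target=upcase, args=(c1, c2)),
           threading.Thread(target=numberer, args=(c2, c3))]
for t in threads:
    t.start()
for line in ["alpha", "beta"]:
    c1.put(line)
c1.put(None)
for t in threads:
    t.join()
# c3 now holds "1: ALPHA", "2: BETA", then the end-of-stream marker
```

Rewiring the application is a change to the network definition only, which is much of what makes the style maintainable.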

I would very much like to see someone look at FBP in terms of the requirements of modern systems, especially multi-core and clusters, to evaluate how it should be extended to satisfy those requirements - and also to try to determine how it compares with other, related, approaches.

Great writeup of dataflow

Great writeup of dataflow systems on the FBP site (http://www.jpaulmorrison.com/fbp/cognates.htm )!

It'd be nice to add a lot of the more interesting theoretical work (Lucid/Synchrone/Lustre/clock calculus, several of the FRP dialects, the Ptolemy languages, Kahn process networks, etc.), systems projects (Capriccio-era attempts at event processing in web servers), continuous/streaming processing projects, and popularly successful forms (Max/MSP, LabVIEW, etc.), but, as is, it's an awesome slice of 70s/80s systems projects. Again, great!

Edit: to answer your question, check out capriccio (which is more about events than event streams) and the citation chain around it. Such work is often around the granularity that FBP is interested in. Extending it to multicore from multiprocessors and datacenters is an open question, but I don't think that's what you mean. More generally, while dataflow was inspired by parallel hardware, getting general versions to work well in parallel settings is still an open question (though believed to be better than say functional languages because of the emphasis on data movement).

Hi! I wanted to send this

Hi, Leo! I wanted to send this via email, but your last email address is broken:

Thanks for the kind words! And thanks for the pointers to all the neat stuff I need to look into - most of which I have never heard of! Your note is timely, as I am working on reissuing my book for Print On Demand, and I have been wondering what to cut out, and what to include... It did seem to me that it has references to technologies that today's readers will have never heard of, and probably will never use! There are also some descriptions of applications that were difficult then - but dead easy nowadays.

So I'll definitely look up Capriccio, and its citation chain, and keep things like Linda and CSP, but drop the stuff that hasn't survived... What about CEP and MapReduce? Also Lucid/Synchrone/Lustre/clock calculus, several of the FRP dialects, the Ptolemy languages, Kahn process networks, etc.; systems projects (Capriccio-era attempts at event processing in web servers); continuous/streaming processing projects; and popularly successful forms (Max/MSP, LabVIEW, etc.). And I am trying to find out more about ETL... I'd very much appreciate your feedback (either here or via email)!

I haven't followed CEP

I haven't followed CEP because (at least when I first read about it) it seemed rather niche relative to my interests. Map/Reduce is a nice framework for distributed computing concerns and optimizations (e.g., the 'stragglers' paper from OSDI) but not sure how it clearly fits in here. Never heard of ETL... and fixed my email, thanks :)

Map/Reduce is a nice

Map/Reduce is a nice framework for distributed computing concerns and optimizations (e.g., the 'stragglers' paper from OSDI) but not sure how it clearly fits in here.

I don't see much difference between the concurrent and the distributed case, except for the costs of communication. I would think the map-reduce distribution algorithm could be parameterized by these worker-communication costs, and so intelligently distribute work locally too.
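As a sketch of that idea, the same map/shuffle/reduce shape scaled back to a local thread pool, where "communication cost" reduces to in-memory handoffs (the function names are illustrative; a real framework would also partition the shuffle and handle failures):

```python
# Local MapReduce: parallel map over chunks, an in-memory shuffle grouping
# values by key, then a fold per key.
from concurrent.futures import ThreadPoolExecutor
from collections import defaultdict
from functools import reduce

def map_phase(chunk):
    # word-count mapper: emit (word, 1) pairs
    return [(w, 1) for w in chunk.split()]

def local_mapreduce(chunks, mapper, reducer, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(mapper, chunks))   # parallel map phase
    groups = defaultdict(list)                    # local 'shuffle'
    for pairs in mapped:
        for k, v in pairs:
            groups[k].append(v)
    return {k: reduce(reducer, vs) for k, vs in groups.items()}

counts = local_mapreduce(
    ["to be or", "not to be"], map_phase, lambda a, b: a + b)
# counts == {"to": 2, "be": 2, "or": 1, "not": 1}
```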

Add failure (a lot of fun

Add failure (a lot of fun work now on predicting types of failure). If you've got 20,000 machines for a job, some will fail hard and some will fail soft. Restarting work, duplicating work, dropping work, etc. are common concerns (and optimization targets) for Hadoop etc. folks. To many, Erlang is interesting because of the failure handling.

I think you might also be under-emphasizing the scale of data involved. E.g., Erlang supports concurrency, but nobody would claim it's fast. Judging by the frameworks and talks I've seen for Erlang, the model falls down when dealing with big data (rather than a lot of little data).

There is some interesting

There is some interesting stuff comparing Erlang and FBP, contributed by Joe Armstrong and Ulf Wiger some time ago, in http://www.jpaulmorrison.com/cgi-bin/wiki.pl?ErlangLanguage - Joe says that Erlang is very fast in its natural application area, "happily supporting hundreds of thousands of concurrent processes" - I believe fairly short-lived ones. FBP is usually in the tens to a few hundred relatively long-lived processes.

I suspect, in your context,

I suspect, in your context, they mean it scales, therefore allowing fast overall progress despite system saturation. They don't mean you should write an FFT or anything requiring fast single-threaded performance in it. This isn't me speculating; this is from their talks. Hence my distinction of computations over a lot of data, as opposed to a lot of computations over small bits of data. MapReduce etc. grew out of an unaddressed business need, and folks in companies that use it are still not happy enough with the performance there (meaning you have to get someone to sign off on you using cluster time) -- Erlang isn't even in the running.

I don't quite understand how

I don't quite understand how the issues you've raised prevent map-reduce from being scaled back from a distributed scenario to the local concurrent scenario. Certainly node failure is more catastrophic locally, particularly if it's hardware related, but I don't see how that diminishes map-reduce's usefulness as compared to other approaches to concurrency, which must suffer from the same problems.

How often do you assume your

How often do you assume your L1 cache fails in your dual-core GUI app, and would handling it be a first-class selling point for a JavaScript framework?

Edit:

So you don't need the failure handling benefits locally. You don't need the help in managing data distribution. You don't need help scheduling multiple user jobs. So if you don't need any of the benefits of what Hadoop has been engineered for.. what do you need?

If you just mean building programs out of map/localreduce/globalreduce.. I have no idea how to build a GUI out of that. Those functions are interesting if you do category theory or intro to computer science. PL abstractions and SE abstractions assume them as building blocks. I'm not saying anything you don't know; your question has me pretty confused.

I suspect when Sandro said

I suspect when Sandro said "I don't see much difference between the concurrent and the distributed case, except for the costs of communication," what he really meant was "I think we could usefully scale back the distributed case to a concurrent local one and avoid costs of communication". (Sandro, I assume, was responding to your emphasis on 'distributed' when you said "Map/Reduce is a nice framework for distributed computing concerns and optimizations".)

Thus, your answer - focusing on the 'much difference' and describing the multitude of relevant differences other than communication costs (such as failure handling) - confused Sandro, and Sandro's confusion is now confusing Leo...

I might be wrong, of course, but this has been my impression.

Right.. but that makes

Right.. but that makes little sense to me. It's tantamount to suggesting people install DrScheme, run in an even more reduced form of the beginner's level, and write the general stuff in Java. Essentially, that map/reduce is a good replacement for shell scripting to cobble together your app.

I'm a very problem-oriented person and that just rubs me wrong.

Map/Reduce

It makes sense from where I'm sitting, not so different from supporting Linq or SQL for local data - though, honestly, I'd prefer more general support for relations and logical programming in favor of map/reduce. Under relational DML, a Map/Reduce operation consists (in essence) of a Projection and an Aggregation (commutative+associative function, and identity). All this can be parallelized quite well.
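
For what it's worth, that decomposition is easy to sketch over local data (Python here; the names `map_reduce`, `project`, and `combine` are mine, not from any particular framework):

```python
from functools import reduce

def map_reduce(data, project, combine, identity):
    """Projection followed by aggregation. 'combine' is assumed
    associative and commutative with 'identity' as its unit, so
    the fold could be split across workers in any order."""
    return reduce(combine, (project(x) for x in data), identity)

# Total word count across a list of documents:
docs = ["a b c", "d e", "f"]
total_words = map_reduce(docs, lambda d: len(d.split()),
                         lambda a, b: a + b, 0)   # -> 6
```

The point is that nothing in the structure cares whether the fold runs on one core, many cores, or many machines.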

Admittedly, if I had only a single processor, or only a small amount of data, map/reduce wouldn't be all that helpful for performance (whereas the logic programming and relations would still be darn convenient). But given multi-core architectures or a GPGPU, map/reduce would still be useful over large, local lists and sets. (While 'dual core' may be common now, I suspect 'hexadecacore' won't be too rare in a decade...)

But it may be that Map/Reduce means something different to you than it does to me. There is, of course, the framework named 'MapReduce' provided by Google. Then there are the Map and Reduce operations pair, such that Map/Reduce refers to support for these operations. To which do you refer?

Clarification

David,

You are arguing for Map/Reduce/Merge, which changes the reduce stage and appends a merge stage. See: Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters

Locally, the advantage hinges on having many cores per processor and not using mechanical disk - instead, flash memory and solid state storage (which rely on quantum tunneling). Everyone talks about the multicore kerfuffle, but nobody pays attention to changes in the memory mountain.

Leo is speaking strictly about Map/Reduce.

For more fine-grained approaches, see JH Davenport's work in the '80s and '90s on using functional programming in distributed environments.

What I'd really like...

What I'd really like is a powerful total-functional programming language offering proofs of observational equality between functions.

Given observational equality, demand for a property such as commutativity, associativity, or an identity element becomes trivial to express:

define commutative F = ((\x y -> x F y) == (\x y -> y F x))
define associative F = ((\x y z -> x F (y F z)) == (\x y z -> (x F y) F z))
define idelement I F = ((\x y -> F (F I x) y) == F)

Knowing associative+commutative properties and an identity element for fairly complex functions allows one to parallelize a fold over a list - no matter how the list elements are divided, the result will be deterministic. It also allows deterministic results to be pulled out of sets without having overly specialized aggregation operators (I think 'sum', 'average', etc. are far too specialized).
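
You can't get the proofs without the total-functional language, but the properties themselves are easy to state and spot-check at runtime. A QuickCheck-flavored sketch in Python (`sampled_check` and the sample sizes are my own invention; sampling is evidence, not proof):

```python
import random

def sampled_check(prop, samples, arity, trials=200):
    """Sampling check (evidence, not a proof) that a property holds."""
    return all(prop(*random.choices(samples, k=arity))
               for _ in range(trials))

commutative = lambda f: lambda x, y: f(x, y) == f(y, x)
associative = lambda f: lambda x, y, z: f(x, f(y, z)) == f(f(x, y), z)

nums = list(range(-50, 50))
add = lambda a, b: a + b
sub = lambda a, b: a - b

assert sampled_check(commutative(add), nums, 2)
assert sampled_check(associative(add), nums, 3)
assert not sampled_check(commutative(sub), nums, 2)  # subtraction fails
```

A real system would want these as static proof obligations on the aggregation operator, not runtime checks; the sketch only shows how cheaply the laws themselves can be expressed.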

It is possible to express all relational operations (join, select, project, group-by) and all traditional aggregates (sum, average, multiply, standard deviation, etc.) in terms of these aggregation operations. 'Union' is associative and commutative, and 'empty set' serves as an identity element. Admittedly, it may be better to offer specialized syntax for a few of these, and more complex logic operations (akin to Datalog) can be difficult to express this way, and indexing might better leverage a dedicated 'select' (though it'd be nicer if query optimizers could leverage indexes based on peeking inside the functions).

I think that would be a sweet-spot for supporting implicit data-parallel operations in a pure-functional language 'layer', and allowing databases to essentially be first-class. (Combine it with FRP data-binding, capability-security, persistence, actors, and distributed transactions, and I'll be a happy camper.)

A set of immutable tagged-union data will serve very well as a 'database', with each constructor essentially identifying a distinct relation in the database. With some first-class support, one might refine such a 'set type' for internal structure (especially candidate keys, possibly referential integrity). Further, it would be flexible in an 'open data-type' sense: a 'missing' relation is simply an 'empty' relation and subtyping could occur on a lattice. Table-dee (true) and table-dum (false) can be easily expressed via the presence or absence of a 0-arity constructor inside a set, which is nice for logic and Datalog-style queries. (By comparison, if one used a record-of-relations, a 'missing' relation would prevent functions referencing such relations from operating on the record, and 'table-dee' and 'table-dum' become dedicated booleans.)
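
As a rough illustration of constructors-as-relations (Python sketch; `Employee`, `Dept`, and `relation` are made-up names, and real support would want indexing and key constraints):

```python
from collections import namedtuple

# Each constructor (namedtuple type) names a relation; the database is
# an immutable set of tagged tuples. A "missing" relation is simply one
# with no members, and set union is the associative, commutative merge
# with the empty set as its identity.
Employee = namedtuple("Employee", "name dept")
Dept = namedtuple("Dept", "dept city")

db = frozenset({Employee("ann", "dev"), Employee("bob", "ops"),
                Dept("dev", "nyc")})

def relation(db, ctor):
    """Project out one relation: the tuples built by a given constructor."""
    return {t for t in db if isinstance(t, ctor)}

dev_names = {e.name for e in relation(db, Employee) if e.dept == "dev"}
```

Here `relation(db, SomeUnusedCtor)` just yields the empty set, which is the 'open data-type' behavior described above.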

I'd really like to see a better unification of pure logic and functional programming in the lower layers. Haskell uses its lists for this sort of purpose, but that doesn't really offer much for data-parallelization, doesn't readily support indexing, and requires explicit refinement (uniquify).

A stepping stone is

A stepping stone is Fortress, where they are (or were) experimenting with allowing annotations of, e.g., commutativity. Not sure about creating your own annotations and whether they are checked (hard to do -- some races are benign and would therefore show up as false positives) -- Fortress seems to be answering first whether they can really use it.

As is.. writing (though obviously not checking) such programs seems like it'd be easier using a library over a Cilk- or TBB-style parallel system than over Map/Reduce/Merge, which forces the structure too much.

what he really meant was "I

what he really meant was "I think we could usefully scale back the distributed case to a concurrent local one and avoid costs of communication"

Well, there are communication costs, such as cross-processor cache contention, they are simply different, and orders of magnitude smaller than communication across a network. Map-reduce could then distribute much finer-grained workloads across processors.

focusing on the 'much difference' and describing the multitude of relevant differences other than communication costs (such as failure handling) - confused Sandro, and Sandro's confusion is now confusing Leo...

Perhaps it is simply a misunderstanding, as I took Leo's, "not sure how [map-reduce] clearly fits in here", to mean he wasn't sure how map-reduce could be considered a concurrency idiom.

How often do you assume your

How often do you assume your L1 cache fails in your dual-core GUI app

Certainly fault-tolerance is a useful feature of map-reduce, but not its only feature. Certainly it can move work off of a core if it suddenly fails, but map-reduce also enables transparent scheduling of a program across multiple workers, and there's no reason those workers couldn't be local instead of remote. I'm really having a hard time understanding what's so confusing.

You don't need the help in managing data distribution. You don't need help scheduling multiple user jobs.

Of course you need those, that's what a concurrency idiom is all about. Why would you want to manually handle data distribution or scheduling if a generic framework could handle it for you?

If you just mean building programs out of map/localreduce/globalreduce.. I have no idea how to build a GUI out of that.

What does a GUI have to do with it? This thread is about concurrency idioms, not about GUIs. Map-reduce may or may not be ill-suited to constructing GUIs, but that's irrelevant to its suitability as a concurrency idiom is it not?

And GUI interaction would not benefit from being concurrent in any case since a GUI is rarely a bottleneck of program performance.

I want an example of a

I want an example of a non-distributed program where you'd use map/reduce, as opposed to, Cilk, CML, Scheme, etc.

You seem to be either reading more into Map/Reduce than I am or underestimating its benefits, and it sounds like the former. I ask as a person in PL who works essentially daily on optimizing multicore code, so I'm curious.

Edit: example of a concurrent non-widely-distributed program

Any program over which you'd

Any program in which you'd perform folds and maps can be hooked into MapReduce. That seems like a fairly large class of programs, particularly in functional languages.
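
For instance, a fold with an associative operator and an identity can be chunked across local workers without changing its result. A hedged Python sketch (`parallel_map_reduce` is my own toy, not any shipping framework's API; CPython threads won't actually speed up CPU-bound work because of the GIL, so treat this as illustrating the structure only):

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def parallel_map_reduce(data, mapper, combine, identity, workers=4):
    """Split the input into chunks, map/fold each chunk in its own
    worker, then combine the partial results. Deterministic as long
    as 'combine' is associative with 'identity' as its unit."""
    chunk = max(1, len(data) // workers)
    pieces = [data[i:i + chunk] for i in range(0, len(data), chunk)]

    def fold_chunk(piece):
        return reduce(combine, map(mapper, piece), identity)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(fold_chunk, pieces)
    return reduce(combine, partials, identity)

# Sum of squares of 0..99, folded across 4 workers:
result = parallel_map_reduce(list(range(100)), lambda x: x * x,
                             lambda a, b: a + b, 0)
```

Swap the thread pool for processes or remote hosts and the program text is unchanged, which is the "same idiom, different scale" point being argued here.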

That repeats what I said I

That repeats what I said I hoped you didn't mean in my own replies.

If we now seem to be on the

If we now seem to be on the same track, do you agree or disagree that map-reduce can be used to write concurrent programs? If not, why not? If so, then is it not a "concurrency idiom" as requested in the original poster's question? That's really what it comes down to.

Whether map-reduce can be used to program a certain class of concurrent programs, ie. your GUI example, is not relevant: no concurrency idiom is so expressive as to naturally express all concurrent programs of interest without undesirable tradeoffs. I think as long as the class of concurrent programs it can express is sufficiently large, then it's interesting as a concurrency idiom in its own right.

Map/Reduce is implicitly

Map/Reduce is implicitly concurrent. Its restrictions make hard problems tractable in the distributed case. I don't see why it's interesting in the non-distributed one (e.g., a laptop or phone). I threw up GUIs as one class of concurrency problems on such devices; I can point to various concurrency models and say that's at least one domain they're good for. What about map/reduce? Stating that it can describe any program written with maps and folds isn't useful in this context, except perhaps to ask the question -- why is that better than writing it with maps and folds straight up? Clarity? Performance? Reasoning?

I don't see why it's

I don't see why it's interesting in the non-distributed one (e.g., a laptop or phone). Stating that it can describe any program written with maps and folds isn't useful in this context, except perhaps to ask the question -- why is that better than writing it with maps and folds straight up? Clarity? Performance? Reasoning?

Interesting or not, it satisfies the question asked in this thread. I personally find it interesting, because if more functional languages provided map-reduce and map-reduce-merge in their standard libraries instead of traditional maps and folds, perhaps with a simple effect system to ensure it's safe to parallelize, they could transparently parallelize many programs. This has been a long-standing claim of functional languages, and all it requires is a tweak to standard libraries.

If you're not interested in this type of solution or the types of problems it solves, that's your prerogative, but that doesn't make it any less relevant to the question asked.

I took issue with trying to

I took issue with trying to say that the distinction for map/reduce on the distributed vs. non-distributed case isn't significant. One has been proven, the other is a relative failure. It's worth researching, but pretending this dichotomy doesn't exist is odd.

I never said the differences

I never said the differences between distributed and non-distributed weren't significant, I said there weren't significant differences for the purposes of concurrency as specified in this thread: scaling across CPUs or scaling across hosts can be treated the same from map-reduce's perspective, modulo some communication cost overheads used to inform work chunk size. Also, the fact that map-reduce hasn't really been tried in the local concurrent case doesn't make it a failure, unless you meant something else by this.

ETL is

ETL is Extract/Transform/Load and is a key application type in the Data Mining area - very expensive in both run time and dollars! Here is some stuff about it: http://www.jpaulmorrison.com/cgi-bin/wiki.pl?ETL . Do you have anything more general on events vs. threads? I found the PPT presentations under Capriccio assumed too much knowledge! Thanks!

The one which I was thinking

The one which I was thinking of when I wrote Capriccio was SEDA (from some of the same folks): http://www.eecs.harvard.edu/~mdw/proj/seda/

They are partially looking at it from an expressive standpoint (events and cooperative scheduling) -- though the lack of continuation support is odd in this respect. These are systems guys, so there is more emphasis on performance. A lot of it seems to be geared to multiplexing many threads that are IO bound. Unclear how the multicore case maps in, but that's relatively nascent.

Classical Jackson Inversion

Michael Jackson, the structured analysis & design guru, did a case-analysis of input forms to output forms for transformation processing - this was done in the '70s. His idea came from compiler research in that era.

It is still very relevant today, as Michael Kay's Saxon XML libraries use its push-based semantics to handle streams of XML loads. XSLT is really just ETL for the XML-minded.

Jackson described his

Jackson described his approach as writing a bunch of main-lines, and then turning all except one inside-out. This is not always easy! Also you often run into what he called a "structure clash". In FBP, all the processes are essentially main-lines, and you avoid both of those problems...
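
Python generators give a cheap approximation of that picture (a sketch only; real FBP has bounded buffers and independently scheduled processes): each stage keeps its own main-line loop, and the connections do the 'inversion' for you.

```python
# Each stage reads from an upstream iterator and yields downstream.
# Every stage keeps its own main-line control flow, so none has to
# be turned inside-out as in classical Jackson inversion.
def source(lines):
    for line in lines:
        yield line

def transform(upstream):
    for line in upstream:
        yield line.upper()

def sink(upstream):
    return list(upstream)

out = sink(transform(source(["alpha", "beta"])))  # -> ["ALPHA", "BETA"]
```

In FBP terms each generator plays a process and the iterator connections play the bounded buffers (here demand-driven and unbounded, which is where the analogy stops).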

There are indeed

There are indeed alternatives to Classical Jackson Inversion that resolve structure clashes. Probably the most interesting is the paper "Structure Clashes -- An Alternative to Program Inversion". It uses prototypes, which incidentally is the object model used by Alessandro Warth and Alan Kay in their "computing with tabs" paper on the VPRI website, as well as Jonathan Edwards' object model for the Subtext/Coherence language.

CAR Hoare also used what he called the 'essence' of Jackson Inversion as an influence for CSP (excellent paper, by the way).

Anyway, the major original flaw with Jackson inversion was not structure clashes but rather the requirement for an intermediate representation, which flow-based programming has never had as a limitation, since it effectively uses pipeline promises (aka stream processing with futures) instead - see Barbara Liskov's work on pipeline promises and Ted Nelson's Project Xanadu. Object capability security fanatic Mark Miller happened to work on Xanadu's model of this. For a mathematical explanation of how to make Jackson Inversion work for things like XSLT libraries such as Michael Kay's Saxon, a theoretical treatment is given in A Theoretical Approach to Program Inversion. What this effectively means is that you can lift any program structure and its corresponding data structure into a stream and process it. The prototype paper, on the other hand, explains how to do this in a maintainable fashion.

Pipeline promises

Could you explain how pipeline promises (aka stream processing with futures) relate to FBP - I know the term, but have always had trouble visualizing how it's implemented!

FBP and Futures

flow-based programming has never had as a limitation since it effectively uses pipeline promises (aka stream processing with futures) instead

This assertion strikes me as somewhat unusual. It has always been my impression that FBP and promise pipelining are very much orthogonal. In what way have FBP implementations (especially the multi-process variations) historically supported promise pipelining?

Not an expert

While I've studied Jackson Inversion fairly well, FBP is a new idiom to me.

I started reading Paul's book only a few months ago, and we started an email correspondence almost immediately. He helped me track down some of his obscure citations, such as Nate Edwards' "configurable modularity" position paper.

Looks like we're continuing

Looks like we're continuing the conversation, here! I confess I'm a bit confused - didn't you say in an earlier post that flow-based programming effectively uses pipeline promises? I could imagine that they are related, but surely the mechanism is quite different...?

Herb Sutter's Effective Concurrency

Herb Sutter has a book coming out soon from AW about Effective Concurrency that will be based off his blog and DDJ articles:

The most recent one contains links to the previous ones: Effective Concurrency: Avoid Exposing Concurrency - Hide It Inside Synchronous Methods

OPL - Our Pattern Language

Ralph Johnson has a blog post discussing a pattern language for parallel programming.

Sort of (d)evolved conversation

I was hoping for a pointer to some survey of *current* and *past* practice, not yet another debate on the technologies more or less still restricted to research labs (STM and such).

More in the spirit of my original question: What does Oracle do on a big Sun box with oodles of CPU's? What are the various strategies for making use of oodles of CPU's on MVS machines? What were multi cpu scaled architectures before threads when we had pretty much only processes, some kind of shared memory and a few IPC mechanisms? How do the last decades' multi-cpu architecture super computers divvy up jobs across CPU's? How do old fashioned Unix pipelines benefit (or not) from adding oodles of CPU's to a machine.

Just examples, but you get the idea. I'm looking for info on deployed technologies that likely has not been the subject of research in years or even decades.

I'm really looking for a picture of recent past and current industry practice and have no interest (in this inquiry) in what *might* be coming down the pike with multicore chips in the future.

Many thanks again.

Scott

Solved problem (?)

Problems of large scale concurrency were largely solved in the 1970s and '80s by people building and using really big computers. Until recently there were only a few machines large enough for those solutions to make much difference, so of course the large body of practicing programmers is unaware of them. The main epistemic feature of computer programming is NIH -- there's little incentive to learn from our forebears when we can just Make It Up, which is where threads come from.

That said, in the community of programmers that deal with really large problems running on really large hardware, things have pretty much shaken out to the point where they mostly use OpenMP and its descendants. But there's not much programming language interest in that.

Thanks!

On long car rides, my dad used to tell of big systems using various MVS technologies (dedicated I/O processors, virtualization, and CICS on smart terminals and transaction monitors come to mind).

Then I remember walking by the cool blinking lights of the Connection Machine at Aiken Lab back in the 1980's.

And so nowadays, every once in a while I have the occasion to strongly doubt whether we haven't learned more than a thing or two about taking advantage of concurrency, in various ways, over the past 40+ years.

Scott (not as old and salty as he sounds)

You know, I wrote to Edward

You know, I wrote to Edward Lee to point out that FBP doesn't suffer from the kinds of problems he describes, and he wasn't too interested... This seems to me to highlight the divide between academics and practical developer types - academics aren't interested unless they can write a scholarly paper extending an idea - what they don't want to do is actually try something out! [end rant!]

Perhaps I read you wrong...

I'm sorry, but you're not actually suggesting that OpenMP is the way to maximize the amount of parallelizable code so that software benefits from hardware parallelism "for free" over time, the solution to the problem of which you speak, are you?

Massively Parallel Real Time

OK, since you asked for examples in addition to surveys, here's what we did at Flavors Technology (now PST) in the 80s...

Our application focus was hard real time, massively parallel. The hardware was custom but based on off-the-shelf 32-bit microprocessors; there were about 80 microprocessors in a system, time-sliced into about 10,000 virtual processors. Each virtual processor was individually programmable, and had guaranteed resources: local storage, processing time every "frame" (typically 60 Hz), N inputs from global memory, and M outputs to global memory. Communication was only through global memory. It was synchronized as follows...

Every frame proceeded in three phases:
- all inputs are read from global memory
- all of the virtual processors' code is run to completion
- all outputs are written to global memory

The order of writing to global memory is based on each virtual processor's priority; the rule is "last write wins", since only the last write is seen by other virtual processors. Because the writes are grouped by virtual processor, there is no need for locks, since writes of multiple values are coherent.

If you look at this sideways it is a kind of real time STM. The hard real time guarantees were enforced because the frame time was fixed; a program that tried to exceed its pre-allocated resource limits was either suspended (in some versions of the architecture) to resume on the next frame without re-reading its inputs, or terminated and an error flagged.
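
A toy, single-threaded rendition of that frame discipline (everything here is invented for illustration; the real system time-sliced ~10,000 virtual processors in hardware):

```python
# Per frame: all virtual processors read global memory, all run to
# completion, then all write back in priority order, so the last
# (highest-priority) write wins and no locks are needed.
def run_frame(global_mem, vprocs):
    snapshots = [dict(global_mem) for _ in vprocs]           # phase 1: read
    outputs = [vp["step"](snap)                               # phase 2: run
               for vp, snap in zip(vprocs, snapshots)]
    for vp, out in sorted(zip(vprocs, outputs),
                          key=lambda pair: pair[0]["priority"]):
        global_mem.update(out)                                # phase 3: write
    return global_mem

mem = {"x": 0}
vps = [{"priority": 1, "step": lambda s: {"x": s["x"] + 1}},
       {"priority": 2, "step": lambda s: {"x": 10}}]          # higher priority wins
run_frame(mem, vps)   # mem["x"] is now 10
```

Both processors saw the same frame-start snapshot (x = 0), and the priority-ordered write phase resolved the conflict deterministically, which is the lock-free coherence property described above.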

The compilers and runtime were implemented to order the code execution and global memory accesses to pipeline all operations, so all hardware was maximally utilized.

This architecture was deployed in many places over a few generations of implementation. It is still in operation at some sites. The newest systems are rooms full of stock quad or octo Xeon machines on high-speed networks. I believe the largest of these systems is a country-wide simulation of the entire Japanese Shinkansen; it runs much faster than real time, and is used in testing of new train schedules, operator training, disaster planning and recovery, testing of new station configurations, electrical grid loading simulations, etc.

There are lots of bells and whistles in the Paracell programming language used on these systems, but as Tom Duff mentioned, there seems to be little PLT interest in multiprocessing, and especially in real time systems.

Clarification

I didn't mean to needlessly *deprecate* the value of ongoing research by any stretch. If nothing else, I have no qualifications to start such a debate.

Personally, I was just curious 1) what was done successfully during the past few decades with "parallel processing" very roughly defined and 2) how might some of that practice be repackaged up and applied to current or upcoming programming languages running on the "family" of current multi-cpu/core architectures coming up soon or out there now.

I'm loving task parallelism

I'm loving task parallelism for multicores. Dataflow of course for GPUs (... but that's not novel). Obviously, pthreads dominate :(

Promises/futures, nested data parallelism, and software transactional memory are popular in research but haven't really crossed over into the application space. I wouldn't call them successes relative to task parallelism (where I often hear about deployments for TBB).

Looking at the new-Java languages aimed at parallelism helps: Fortress (Sun), X10 (IBM), and Habanero (from Rice U.).

As for conjecture: Work in data structures and algorithms might end up being more important for the question of parallelism. E.g., communication-minimizing linear algebra. Stuff like map/reduce might get us much of the rest of the way there. The middle ground is unclear (... and why I like research like Cilk/TBB and lightweight annotations).

Edit: actually, the biggest win outside of dataflow: web server technology. Actors, HTTP, SQL, anything that can use memcached.

1) what was done

1) what was done successfully during the past few decades with "parallel processing" very roughly defined

Umm... this is not just theory.

For example, my reply to Tim Sweeney about types of transactional information systems is very practical. MVCC is how Oracle's concurrency mechanism works. Moreover, I gave you an academic reference to look this up. That's extremely valuable, as MVCC is not documented well in most places, so a solid reference is extremely helpful in understanding it. MVCC flips a lot of transaction control concepts, such as serialization, on its head.
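
For readers unfamiliar with it, the core idea is small enough to sketch (illustrative Python only; real MVCC as in Oracle also handles write-write conflicts, undo, and garbage collection of old versions): readers never block, because each transaction sees the newest version no later than its snapshot timestamp.

```python
import itertools

class MVStore:
    """Toy multi-version store: every write appends a new version
    stamped with the writing transaction's timestamp."""
    def __init__(self):
        self.versions = {}               # key -> [(txn_id, value), ...]
        self.clock = itertools.count(1)

    def begin(self):
        return next(self.clock)          # snapshot timestamp

    def write(self, txn, key, value):
        self.versions.setdefault(key, []).append((txn, value))

    def read(self, txn, key):
        # Readers never block: return the newest version visible
        # at this transaction's snapshot.
        visible = [v for t, v in self.versions.get(key, []) if t <= txn]
        return visible[-1] if visible else None

store = MVStore()
t1 = store.begin(); store.write(t1, "x", "old")
t2 = store.begin()                       # t2's snapshot predates t3's write
t3 = store.begin(); store.write(t3, "x", "new")
reader_sees = store.read(t2, "x")        # -> "old", despite the later write
```

This is the sense in which MVCC flips serialization on its head: rather than locking writers out of a reader's way, every reader gets its own consistent view of history.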

Other replies have given you similar feedback.

2) how might some of that practice be repackaged up and applied to current or upcoming programming languages running on the "family" of current multi-cpu/core architectures coming up soon or out there now.

This is an open-ended question, and really depends on the kind of language you are building.

[Edit: for MVCC in programming languages, there is the Multiverse project that targets Java 1.6. At the syntactic level, programming languages tend to be biased toward representations of programs where the invariants on sequence dependencies are not clear. This is a modularity problem related to representing "state retention" correctly, and it affects both monad-based state and traditional OOPL where identity and state are tightly coupled.]