Tim Bray: On Threads

A report on Sun's "Chip Multi-Threading" summit.

Not much here is new, but the trend towards concurrency is one we have discussed a few times, and one that seemed to attract interest.

Bray's comments regarding Erlang and other languages are sure to annoy LtU readers. Let's not start a flamefest. Let me just remark that even if you want shared data structures, Java isn't your only option. You should at least check out Ada... (and you can compile Ada to the JVM, if that's what you want to do. Let's not confuse language and implementation).

Differences in philosophy.

The Erlang comments didn't annoy me; they just stated the obvious and pointed out a key philosophical issue:

Some folks think that Shared State Concurrency is a good idea, including many in the OO camp. Java seems to be the modern epitome of this school of thought, though the mantle was previously worn by Smalltalk.

Others, including those in the Erlang camp, think that shared state concurrency is a bad idea--that it's the fundamental problem. Bray seems to think that by getting rid of SSC, Erlangers are throwing the baby out with the bathwater; Erlangers think that this particular baby is the problem itself, no matter how much scrubbing is performed in the proverbial bathtub. Instead, Message Passing Concurrency is proposed as a better model: one which scales to distributed systems (including unreliable networks) more easily, and which offers far fewer opportunities to screw up.
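
To make the contrast concrete, here is a minimal sketch of the message-passing style (in Haskell rather than Erlang, purely for illustration): the counter's state lives only in the recursion of its own thread, and other threads can affect it only by sending messages.

import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.Chan

-- The worker owns its state outright; there is nothing to lock.
counter :: Chan Int -> Int -> IO ()
counter inbox total = do
  n <- readChan inbox               -- block until a message arrives
  print (total + n)
  counter inbox (total + n)         -- state is just a loop argument

main :: IO ()
main = do
  inbox <- newChan
  _ <- forkIO (counter inbox 0)
  mapM_ (writeChan inbox) [1, 2, 3] -- communicate rather than share
  threadDelay 100000                -- crude: give the worker time to drain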

Another proposed solution to the problem is the explicit transactional model--a database of some sort. Not necessarily relational, but that's a good way to go.

And of course, many in the FP community go further and dislike all side effects, not just those which are visible from multiple threads of control and thus introduce a whole class of nasty race conditions. Declarative concurrency is a productive model which is (almost) side-effect free (I say "almost" because there are ways to cheat, and exceptions can introduce observable non-determinism).
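
For the declarative flavour, a minimal sketch (again Haskell, using an empty MVar as a stand-in for a single-assignment dataflow variable): the reader blocks until the variable is bound, so every scheduling produces the same answer and no race is observable.

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar

main :: IO ()
main = do
  x <- newEmptyMVar
  _ <- forkIO (putMVar x (6 * 7))  -- the producer binds x exactly once
  v <- readMVar x                  -- the consumer waits for the binding
  print (v + 1)                    -- always 43, under every schedule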

CTM, of course, is an excellent reference for this particular debate, augmented with The Third Manifesto for arguments in favor of the explicit transactional model (with the caveat that Darwen and Date consider only relational databases to be worthy of discussion; other database topologies are given short shrift. Transaction semantics are, of course, largely orthogonal to database topology).

A quick suggestion to PVR: If you ever do a second edition of CTM, might a section on databases/explicit transactions (as a way of addressing concurrency issues) be a worthwhile addition?

Another suggestion - joins

Funny, just yesterday I started re-reading the "Declarative concurrency" chapter of CTM, with the goal of understanding whether a specific pattern of concurrent execution can be expressed naturally using dataflow variables, whether it needs message passing, or whether, for better or worse, it calls for a separate linguistic abstraction.

The pattern comes from the join calculus and consists of multiple joins reading from multiple shared inputs. No datum should be consumed by more than one join. So if we have J1 reading from X and Y, and J2 reading from Y and Z, then the sequence X, Y, Z fires J1, while the sequence Y, Z, X fires J2 (only). There is a precedence relation between joins (or, in more advanced settings, between more complex objects that include joins), so each potential conflict (like X, Z, Y) falls back to the precedence (e.g., J1 < J2, so X, Z, Y fires J1).

If you think this example is too simple, try a cyclic one: add J3 reading from X and Z. Specifying different requirements for conflict resolution is left as an exercise for the reader :-)
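
For concreteness, here is a rough sketch (in Haskell, illustrative only) of the matching rule described above: a join fires once all of its inputs are available, and when several joins are enabled at once the conflict falls back to the precedence order.

import Data.List (find, (\\))

type Input = Char
type Join  = (String, [Input])   -- (name, inputs it consumes)

joins :: [Join]                  -- listed in precedence order: J1 < J2
joins = [("J1", "XY"), ("J2", "YZ")]

-- On each arrival, fire the highest-precedence enabled join, if any,
-- consuming its inputs (multiplicities are ignored for simplicity).
arrive :: [Input] -> Input -> ([Input], Maybe String)
arrive avail i =
  let pool = i : avail
  in case find (\(_, need) -> all (`elem` pool) need) joins of
       Just (name, need) -> (pool \\ need, Just name)
       Nothing           -> (pool, Nothing)

-- run "XYZ" == ["J1"], run "YZX" == ["J2"], run "XZY" == ["J1"]
run :: [Input] -> [String]
run = go []
  where
    go _ []         = []
    go avail (i:is) = let (avail', fired) = arrive avail i
                      in maybe id (:) fired (go avail' is)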

So another quick suggestion to Peter for CTM2 (if I may) would be to at least use joins in examples. This has its appeal for the industry as well - joins are staples of workflow and process management.

Transactions and joins

A quick suggestion to PVR: If you ever do a second edition of CTM, might a section on databases/explicit transactions (as a way of addressing concurrency issues) be a worthwhile addition?

Section 8.5 is already about transactions, but it is more about understanding the concepts and how to implement them. Do you mean a section that is more about *using* transactions?

to at least use joins in examples.

Thanks for the suggestion, I'll keep it in mind. Can you recommend a good reference (from the join calculus or workflow literature) on the practical use of joins in the join calculus sense?

References

Can you recommend a good reference (from the join calculus or workflow literature) on the practical use of joins in the join calculus sense?
For an example of an algorithm based on joins, see Santa Claus in Polyphonic C# (for some reason the MS Research site is not responding at the moment).

For an example of a workflow, see YAWL Example 2 (click on workitems appearing in terminals to "complete" them). Actually, Example 1 is more focused - just a fork and a join - but it does not demonstrate the non-monotonic nature of workflows to the full extent (while Example 2 does: not only are agents able to "steal" workitems from each other, but different paths of execution are also possible; see actions D and H competing for C6). This non-monotonic feel is very similar to Petri nets - which looks like another suggestion :-)

Maybe, but I doubt Tim is right.

Well, Tim says that Erlang "abjures the use of global data," and this isn't quite right. Erlang abjures the use of global mutable data. If you have enabled the shared-heap option in the OTP compile, then there is no copying when passing a large message between two threads. The structure stays in the same location on the heap. This is essentially what a global data structure does... except that Erlang's structure is immutable.

Immutable vs mutable structures have a number of efficiency tradeoffs, but linear logic can make dealing with mutable structures nearly as natural as immutable structures.

I've talked at length with those in industry who actually use Erlang for telecommunications and financial services... They are quick to point out the "COPL all-day vacation", and the stability and robustness of Erlang versus threaded Java when under load. They attest that concurrency gets done in an order of magnitude less time in Erlang than in C. And they lament that subtle race conditions usually persist long after threaded imperative code is "finished."

Dataflow concurrency by future variables a la Oz is another possibility for declarative concurrency. Like Erlang, it's not entirely free of race conditions. However, the only race conditions happen when a unification error arises... thus, if you don't get an error, then you don't have a race condition. I think dataflow concurrency addresses Tim Bray's desire to have two threads operating on the same structure.

I have wondered how naturally and efficiently Erlang-style concurrency would support the stream processing model being touted for the Cell processor. In Erlang, a message has to be fully constructed before being passed along. One could break messages into smaller bits, which might break the natural structure of the program... but then shared-state concurrency might not be any better in this regard.
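
As a sketch of the chunking idea (Haskell again, purely illustrative): instead of one fully constructed message, the producer streams fixed-size pieces through a channel and the consumer processes them as they arrive.

import Control.Concurrent (forkIO)
import Control.Concurrent.Chan

data Piece = Chunk [Int] | Done    -- a big payload, streamed in pieces

chunks :: Int -> [a] -> [[a]]
chunks _ [] = []
chunks n xs = take n xs : chunks n (drop n xs)

main :: IO ()
main = do
  ch <- newChan
  _ <- forkIO $ do                 -- producer never builds the whole message
         mapM_ (writeChan ch . Chunk) (chunks 4 [1 .. 20])
         writeChan ch Done
  let consume acc = do             -- consumer works as pieces arrive
        p <- readChan ch
        case p of
          Chunk xs -> consume (acc + sum xs)
          Done     -> print acc    -- prints 210
  consume 0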

So, my response to Tim's "maybe, but I doubt Erlang has it right" is "maybe, but I doubt Tim is right." Furthermore, if Erlang's concurrency model truly has fundamental inefficiencies, I believe that they can be overcome without resorting to shared-state concurrency.

My best evidence is the modern graphics processor, which has eclipsed the CPU in transistor count and raw power. A GPU's registers are reset at each stage of the pipeline, which enforces a kind of declarativeness. While a GPU isn't a general-purpose processor, this declarativeness has allowed it to take better advantage of parallel processing and VLSI than general-purpose CPUs do.

Fully constructed messages?

I have wondered how naturally and efficiently Erlang-style concurrency would support the stream processing model being touted for the Cell processor. In Erlang, a message has to be fully constructed before being passed along.

How is it a problem that messages have to be fully constructed, as long as they coincide with atomic work units?

If they don't... well, can somebody come up with a convincing example of why they absolutely cannot be made that way without degrading program structure?

Erlang and global data

Well, Tim says that Erlang "abjures the use of global data," and this isn't quite right.

It is actually completely wrong! Erlang's model is message-passing concurrency with global transactional shared state (the Mnesia database). Erlang by no means advocates a pure message-passing approach, but a multi-layered approach: pure functional on the bottom (the definition of an Erlang lightweight process, which may be augmented with state local to a process), message passing in the middle, and transactional shared state on the top. Mnesia is crucial for Erlang's high availability (fault tolerance for software errors). On the other hand, it is important that Mnesia itself is implemented using message passing. I.e., the underlying computation model is still message passing with a functional core (like chapter 5 of CTM).

Parallel transitive closure algorithm

I think dataflow concurrency addresses Tim Bray's desire to have two threads operating on the same structure.

An example of how nice this is can be found on page 470 of CTM: Figure 6.13 shows a parallel dataflow version of the classic Floyd-Warshall transitive closure algorithm. This version is derived in a trivial way from the declarative version of Figure 6.12, by adding thread statements judiciously.
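
Not the CTM code itself, but a rough analogue of the same point in Haskell (illustrative, using evaluation strategies rather than dataflow variables): the declarative algorithm is parallelized by annotation alone, with no change to its logic.

import Control.Parallel.Strategies (parMap, rdeepseq)

type Matrix = [[Bool]]   -- adjacency matrix: m !! i !! j means "edge i -> j"

-- Floyd-Warshall transitive closure.  At step k every row can be
-- updated independently (row k itself is unchanged by step k), so the
-- rows are computed in parallel with no shared mutable state.
closure :: Matrix -> Matrix
closure m0 = foldl step m0 [0 .. length m0 - 1]
  where
    step m k  = parMap rdeepseq (row m k) [0 .. length m - 1]
    row m k i = [ (m !! i !! j) || ((m !! i !! k) && (m !! k !! j))
                | j <- [0 .. length m - 1] ]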

Thanks

Thanks for the pointer. I have a copy of CTM, and have made it through the first 4 chapters. I really need to spend more time in it. My statement was hedged because I may have misunderstood Tim's statement. Anyway...

Oz first caught my eye when I was working on understanding Hypersets, and I happened to work through one of Oz's web tutorials... I quickly realized that bisimulation was included in the language, which was more than I could do with non-wellfounded data structures in Haskell.

I am rather interested in techniques that would make working with non-wellfounded data as easy as pattern matching on trees. I don't have any ideas how, yet...

What is a "non-wellfounded data structure"?

What is a "non-wellfounded data structure"?

Well-foundedness and Bisimulation

What is a "non-wellfounded data structure"?

In context, Leon means a data structure representing a non-wellfounded set.

In practice, this would mean a data structure that contains one or more circular references, or one or more infinite chains of members (say, an infinite list built by a comprehension).

I quickly realized that bisimulation was included in the language, which was more than I could do with non-wellfounded data structures in Haskell.

I'm not sure what you mean, Leon, by Oz having bisimulation built-in: do you mean unification?

I can see that this might work for simpler examples, but I'd be worried about hairier instances. Interesting idea, though.

Bisimulation meaning Entailment

CTM calls it entailment. Maybe there is a difference between bisimulation and entailment, but my understanding of Oz isn't that deep yet.

For example:

local A B C A1 A2 B1 B2 C1 C2 in
   % A cyclic structure of three nodes...
   A = a(B C)
   B = b(C A)
   C = c(A B)
   % ...and a six-node unrolling of the same rational tree
   A1 = a(B2 C1)
   A2 = a(B2 C2)
   B1 = b(C2 A2)
   B2 = b(C1 A1)
   C1 = c(A1 B1)
   C2 = c(A2 B1)
   {Show A==A1}
end

will print out "true"... there are multiple representations of the same non-wellfounded structure. This is a major improvement in observability over analogous structures in Haskell, where one can only observe finite prefixes of non-wellfounded structures.
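
To illustrate the Haskell side of the comparison (a minimal sketch): the analogous knot-tied structures are easy to build, but (==) diverges on them, so only finite prefixes can be observed.

-- Two representations of the same non-wellfounded list:
ones, ones' :: [Int]
ones  = 1 : ones           -- a one-node cycle
ones' = 1 : 1 : ones'      -- a two-node unrolling of the same structure

main :: IO ()
main =
  -- print (ones == ones') would diverge: (==) unfolds forever.
  print (take 10 ones == take 10 ones')   -- True, but only for a prefix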

However, I also don't think this is enough to make direct representations of non-wellfounded data generally useful in either Haskell or Oz.

Trees and Graphs

I used induction and bisimulation as terms for these two concepts, but by now I think tree-like and graph-like are better terms.

More specifically, in the mathematical sense most graph-like [data] structures are well-founded (there is an ordering such that every non-empty subset has a minimal element), so the induction/bisimulation distinction is not correct.

To some extent, the standard bisimulation proof [scheme] for showing that structures describe similar shapes (entailment) is correct because you have well-founded sets.

What scales and what doesn't

Here:
I stuck my hand up and asked: is there anything that in principle doesn’t scale with multithreading? There wasn’t a lot that leapt to the minds’ eyes, except for compiler code.

Looking at the application level, Tim might be right: load-balancing web servers exist, and I would classify such solutions mainly as engineering effort. However, on the algorithm level there are some hard nuts for which it is not at all obvious how to parallelize well.

Depth-first search is one such candidate: computing the lexicographically-first DFS ordering is P-complete, so it parallelizes well only if NC=P, which is not likely.

Composable memory transactions

It'll be interesting to see how lock-free programming, and in particular software transactional memory, works in larger systems. The Haskell version of it is utterly beautiful, but still only single-threaded at the moment. I understand that they are developing a concurrent runtime using STM, which has the potential to be a big win.
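
For readers who haven't seen it, a small sketch of why the Haskell formulation composes so nicely (illustrative; the bank-account example is the customary one): two independent operations combine into a single atomic transaction, with no locks in user code.

import Control.Concurrent.STM

withdraw, deposit :: TVar Int -> Int -> STM ()
withdraw acct n = do
  b <- readTVar acct
  check (b >= n)           -- retry the whole transaction until funds suffice
  writeTVar acct (b - n)
deposit acct n = do
  b <- readTVar acct
  writeTVar acct (b + n)

-- Composition for free: both updates commit atomically, or neither does.
transfer :: TVar Int -> TVar Int -> Int -> IO ()
transfer from to n = atomically (withdraw from n >> deposit to n)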

More

Tim responds to comments.