## The right default: concurrent components with message passing

Here's something to offset all the
long discussions on typing that
have been taking place recently
(see Concurrent Components With Message Passing):

In our experience, the right default for structuring programs is as concurrent components that communicate through asynchronous message passing. Components should be sequential or communicate synchronously only if necessary. This is not a new idea; Carl Hewitt anticipated it thirty years ago in the Actor model. But today we have many strong reasons for accepting it. For example: [...] (discussion continues with
references to Erlang, E, and CTM)
[...] Unfortunately, these reasons and their conclusion are almost completely ignored by mainstream languages and by books on program design. In both, object-oriented programming and shared-state concurrency are given priority, even though they are the wrong default. [...]

So, is this heresy or an unfortunately ignored truth?

## Comment viewing options

### "Wrong default" is an awfully strong way to put it

Though I do definitely agree that a lot more discussion and emphasis on method calls as async messages is in order.

Unfortunately this is partly a chicken-and-egg problem. How do you properly teach this, in a way that students can make some use of it, if none of the languages they have any hope of making extensive use of don't support it?

I, for one, would very much like to see more support for async methods, and asynchrony in general, in languages and language runtimes and supporting systems. (Which, I suppose, I'm in a position to do to some extent, so I should put my money where my mouth is here...)

### Re: "Wrong default" is an awfully strong way to put it

Dan: "Wrong default" is an awfully strong way to put it

I would agree that it's an awfully strong way to put it. I would also agree that it's the correct way to put it. I just don't understand how any thinking person in our industry can defend the status quo given our horrible failure to build correct systems in all but the most rare of cases.

### Speaking of strong statements...

I just don't understand how any thinking person in our industry can defend the status quo given our horrible failure to build correct systems in all but the most rare of cases.

Now that is a strong way to put something! Not having a lot of concurrent programming experience, I won't comment on whether shared-state concurrency is bankrupt, but its being so is hardly a reason to condemn the entire industry. The status quo is hardly monolithic, so I would claim that "any thinking person" would refrain from passing judgement on the status quo as a whole. While you can point to plenty of software failures, I wouldn't say that correct (or at least "sufficiently correct") systems are all that hard to find. Right now, for example, I am composing this message in a fairly slick threaded browser (Safari) on top of a multitasking OS (MacOS), which together form a very stable system. The system doubless has bugs, and therefore isn't provably correct, but it's fine for my purposes.

### Strength and Failure

sean: Now that is a strong way to put something!

Yep, it is. On purpose.

sean: Not having a lot of concurrent programming experience, I won't comment on whether shared-state concurrency is bankrupt, but its being so is hardly a reason to condemn the entire industry.

On the contrary; it is indeed a reason to condemn the entire industry. It just isn't the only reason to condemn the entire industry. Other reasons are summed up nicely in <http://www.cse.buffalo.edu/~kershner/cse4-542/ErrorLecture.F01.ppt> and <http://case.syr.edu/assets/documents/SoftwareProjectManagement-7-26-01Roundtable.ppt>, among other places.

sean: The status quo is hardly monolithic, so I would claim that "any thinking person" would refrain from passing judgement on the status quo as a whole.

I would agree with you if there were trend data indicating that the failure rate of software projects were trending downward. There aren't.

sean: While you can point to plenty of software failures, I wouldn't say that correct (or at least "sufficiently correct") systems are all that hard to find.

But you never see the overwhelming majority of the failures, so you're suffering from selection bias.

sean: Right now, for example, I am composing this message in a fairly slick threaded browser (Safari) on top of a multitasking OS (MacOS), which together form a very stable system.

That's funny; we appear to be using the same software. Unfortunately, the existence of Safari and Mac OS X tell us precisely nothing about the claim either of us is making: it's statistically irrelevant to your claim, and I never claimed that there were no successful shared-state-concurrency-based systems.

sean: The system doubless has bugs, and therefore isn't provably correct, but it's fine for my purposes.

Mine too! But that doesn't change the fact that most software projects using shared-state concurrency have bugs and, as a consequence, schedule problems that software projects using asynchronous message-passing do not, or the fact that the overwhelming majority of software projects fail. Software development is a mess; it's a full-on crisis. Frankly, in a way, you're making my point for me: any time anyone dares to even suggest that the way software development is done could be improved, the defenders of the status quo come out of the woodwork. That's not how we're going to fix our utterly broken software development processes, some of which have to do with the tools we've crafted, and some of which do not.

### Other reasons are summed up n

Other reasons are summed up nicely in...
I managed to find a non-PPT version of the second, but since it mostly deals with planning and process, plus a heavy dose of marketing-ese, I don't see how it supports your contention that the alleged failure of shared-state concurrency, a particular programming technique, is a reason to condemn the software industry.

IMHO the one relevant part of the presentation was the claim in the title, that "85% of software projects fail". First, I'd like to know how Mr. Oddy defines "failure." If 85% of projects are completely scrapped without being replaced by a better alternative, that's bad. If 85% fail to completely meet their original (or even final) requirements documents, but are still usefully deployed, or if parts of their code are reused in a subsequent project, then 85% is not so bad.

Second, what is a reasonable failure rate for software? Biotech startups and fast-food restaurants have high failure rates, but we don't talk about "the fast-food crisis" because it simply doesn't matter that individuals fail, because replacements will always exist. In cases where it is important for an individual project to succeed, organizations like JPL have the expertise and money to get reasonably high success rates. For most commercial applications, it simply doesn't matter (in the big picture) if over half of projects fail, because there is so much redundant software in the market. You have to ask yourself whether, at some point, it's cheaper fund a few high-success projects, or to simply start more projects cheaply and allow most to fail.

If it's still cheaper on the whole to allow most projects to fail in many cases, there's no reason to see a downward trend in the failure rate. Furthermore, since software is getting more complex in both LOC and number of features, I could argue that it is amazing if the failure rate isn't increasing.

Frankly, in a way, you're making my point for me: any time anyone dares to even suggest that the way software development is done could be improved, the defenders of the status quo come out of the woodwork.
Yep, bring on the nattering nabobs of negativism, the chattering cockroaches of crapulence... ;) I'm not saying software development is as good as it will get, or even that easily-attainable improvements don't exist. I'm just saying that (1) there are huge distances between "suboptimal", "mediocre", and "utterly in crisis"; and that (2) not using a particular programming technique, even when a study could show that doing so would reduce failure rates, does not by itself constitute evidence of a universal, or even widespread "full-on crisis".

If you were to argue that shared-state concurrency is an example of a general failure to adopt better techniques, that this more general failure can potentially be addressed by changes to the software development process, and that not addressing it has some cost to society as a whole, you'd have a much better case. Maybe this is what you're arguing, but if so, I don't think the whole paper's been posted to LtU yet...

Edit: I think we agree that "software engineering" is somewhere between a misnomer and a farce. I'm just speculating that in many cases, software's economics don't support an engineering discipline, so the lack of such shouldn't be called a failure.

### Fail-Only?

sean: [Tons of good lead-in elided]

sean: In cases where it is important for an individual project to succeed, organizations like JPL have the expertise and money to get reasonably high success rates. For most commercial applications, it simply doesn't matter (in the big picture) if over half of projects fail, because there is so much redundant software in the market. You have to ask yourself whether, at some point, it's cheaper fund a few high-success projects, or to simply start more projects cheaply and allow most to fail.

An excellent point, I think, if you accept the notion that most personal computer software is an interchangable commodity, but now we're back to another orthogonal argument, namely whether or not that should be the case at this point in history.

Still, leaving that aside, an interesting point remains: is the software industry simply following the "fail-only design" of Erlang/Joe Armstrong's thesis writ large? Are we just implementing "components" at the granularity of entire applications, allowing them to fail, and implementing a "cold swap" replacement strategy ("Word crapped out on me, but I managed to import my important document into Word Perfect well enough")? It seems to me that this only works if your software is very cheap or, conversely, everyone who needs the software to "work" is very rich. Maybe that's the observation: America is rich enough that we can afford to work this way.

sean: If it's still cheaper on the whole to allow most projects to fail in many cases, there's no reason to see a downward trend in the failure rate. Furthermore, since software is getting more complex in both LOC and number of features, I could argue that it is amazing if the failure rate isn't increasing.

This suggests, though, that an economic collision is on the way: at what point will the increased failure rate not be affordable anymore?

sean: I'm not saying software development is as good as it will get, or even that easily-attainable improvements don't exist. I'm just saying that (1) there are huge distances between "suboptimal", "mediocre", and "utterly in crisis"; and that (2) not using a particular programming technique, even when a study could show that doing so would reduce failure rates, does not by itself constitute evidence of a universal, or even widespread "full-on crisis".

It's true that I shouldn't plot a curve from a single data point. :-) I think you've done an excellent job of revealing a differing set of assumptions: much of the industry likes to act as if it is mature and commoditized. My own sense is that the industry is in its infancy, and what commoditization we can identify arises from a set of dramatically reduced expectations on the part of the market (cf. Alan Kay's great quote about software being an exploitation market like the ones created for teenagers). Of course, one flaw in the commodity argument is the notion that even the most obvious software categories that could be described that way, e.g. word processing and spreadsheets, aren't actually fungible in practice: migrating from one word processor to another is excruciating, and people are loathe to do it. So what this economic perspective on software really means is that the ones who make money in software are the ones who:

1. Provide the necessary software infrastructure for other software to run. This is popularly called the "operating system."
2. Survive long enough with enough profitability that they can afford to only have 15% (OK, for a single company, that's almost certainly too low) succeed.
3. Achieve data-format lock-in via a combination of steps 1 and 2.
4. Charge nearly as much for an upgrade as for the original unit, which is supported by 1, 2, and 3.
In the end, I guess I don't feel that you're mistaken as much as I feel that it doesn't have to be this way, and that treating the industry this way ultimately only serves the Microsofts and Adobes and other hardware-attached (Windows, Postscript) monopoly-position (Windows, Photoshop/Illustrator) players. That is, if we were to make it an order of magnitude easier to write "complex" software, it would be beneficial to the industry by virtue of allowing developers who can't afford a 10% failure rate, let alone a 50% failure rate, to play.

sean: If you were to argue that shared-state concurrency is an example of a general failure to adopt better techniques, that this more general failure can potentially be addressed by changes to the software development process, and that not addressing it has some cost to society as a whole, you'd have a much better case. Maybe this is what you're arguing, but if so, I don't think the whole paper's been posted to LtU yet...

This is indeed the more global point, and you've stated it in a considerably better manner than I did. Unfortunately, you're right; I haven't found a single source to refer to, and haven't yet written the thesis myself. :-)

sean: I think we agree that "software engineering" is somewhere between a misnomer and a farce. I'm just speculating that in many cases, software's economics don't support an engineering discipline, so the lack of such shouldn't be called a failure.

Right. Instead we should open the discussion about whether software's economics should support an engineering discipline, Alan Kay's "exploitation market for teenagers," or yet something else.

### Real world examples

Unfortunately, the existence of Safari and Mac OS X tell us precisely nothing about the claim either of us is making: it's statistically irrelevant to your claim, and I never claimed that there were no successful shared-state-concurrency-based systems.

Well, I think it says something. People are able to write large multithreaded programs that work well and we should keep that in mind.

How much of a difference does it make to have separate processes and message-passing? My feeling, having written a little bit of multithreaded code again recently, is that it's similar to the difference that automatic memory management makes. Safari probably doesn't use automatic memory management either, though, so how do they do it? Are they programming really well and/or carefully and avoiding the problems we expect, or are they spending their weekends tracking down race conditions and memory leaks? It would be interesting to know.

Side-note: Off hand I can't find any program that I use on my Linux box that is multithreaded. I wonder if this is significant, or if I'm just overlooking some obvious programs.

### Safari and GC

Actually, the parts that are written in Objective C have GC (or at least reference counting -- I'm not sure which). I don't know how much of it is ObjC and how much is C, but judging by the crazy things people can do with plugins (e.g. PithHelmet, an excellent ad filter), a fair amount is ObjC.

### Safari and Threads, Java's AWT

Do you know what they're using threads for? (I don't suppose the source code is available?) What's the idiomatic way to use threads in OSX GUIs?

I recall that Java's AWT started out very multithreaded but (IIUC) this was bad and they switched to manipulating the GUI exclusively within a single-threaded event-loop. This requires some discipline (you still can use the objects with multiple threads) but it drastically simplifies the design: redrawing, handling mouse clicks, updating instance variables, etc, are all done by the same thread and therefore cannot happen concurrently. You don't even have to think about synchronizing them.

Assigning responsibilities (and objects) exclusively to individual threads looks like the winning strategy to me.

For an illuminating discussion of the difficulties see How to Use Threads in The Java Language Tutorial. This is closely related to the most incredibly frustrating multithreading problem that I have encountered personally.

### Cocoa, the interface system f

Cocoa, the interface system for Objective-C programs on OS X, is single-threaded. Events are delivered to a single thread, and GUI elements all get manipulated from that thread. (It's all done as callbacks on objects, as is standard with most OO GUI toolkits) There's some stuff you can do with the GUI from alternate threads, but it's really dodgy, and only on more recent versions of OS X. (The earlier you go the less you can do from alternate threads)

Generally you end up using alternate threads for network stuff, non-GUI rendering (into private bitmaps and suchlike things), and other stuff like that. I expect Safari uses threads for network IO, building and parsing its DOM trees (or whatever it does), rendering GIF images offscreen, and things of that nature, leaving the GUI thread to do the drawing and UI management.

So it sounds to me like we're all doing concurrency in the same general way: divide up the work into tasks that we want to do concurrently, assign one thread/process to each task, and try to make the interactions between tasks as simple and as few as possible.

### I'd disagree that it's the un

I'd disagree that it's the universally correct way to put it. Is it the correct answer sometimes? Sure, absolutely. But all the time? Nope, I just don't think so.

Maybe I spend too much time down at the metal trying to make these systems work, rather than actually using the resulting systems, but from my point of view there are very few universal absolute statements. There are an awful lot of cases where it's perfectly fine to assume that objects are local, method calls resolve in-process, and you can get at all an object's squidgy bits. There certainly are times when an object should be viewed as a fully opaque thing, but universally? Nah, I don't think so.

Maybe (maybe) once we've piled on an order of magnitude more speed to the current computing infrastructure (CPU, disk, bandwidth, and network latency) it might make more sense to view it as a reasonable option but even then I don't know that it'd be the right universal default.

And even then, you're stuck with a huge mass of programmers, and programming languages, which assume it isn't. That, more than the issues of performance, is where the momentum is.

### Claims vs. Non-Claims

Dan: I'd disagree that it's the universally correct way to put it. Is it the correct answer sometimes? Sure, absolutely. But all the time? Nope, I just don't think so.

But we're talking about experience and default choices, not absolutes. As was pointed out in another post, Peter and Seif feel that state is important to modularity. One of the major themes of CTM is that purity is overrated. They offer specific examples of where they think this purity is overdone in particular languages, e.g. the refusal to treat mutation as first-class results in the complexities of monads in Haskell. So these are not anti-state pedants.

I also think it's unfortunate that the specific references to Erlang and E were not quoted, because there are important true statements there: Erlang and E both are designed around asynchronous message-passing and they both have excellent correctness and security justifications for doing so. If someone wishes to attempt to rebut their observations, that's fine. But ignoring them won't do.

Finally, if you re-read what was actually written, it's as qualified as you could want: "In our experience... right default." My post was only intended to support that claim, including the "In our experience" part. My experience supports the view that shared-state concurrency introduces problems that asynchronous message-passing does not. Therefore I believe that software development tools should make asynchronous message-passing the default and shared-state concurrency an option. Why should that be even remotely controversial?

Dan: Maybe I spend too much time down at the metal trying to make these systems work, rather than actually using the resulting systems, but from my point of view there are very few universal absolute statements...

You're asserting an absolutism that isn't there. You may say "But the claim is that asynchronous message-passing is universally a better default than shared-state concurrency." Yes, I suppose there is a point there, but "better" is itself relative and an ontological judgment made in programming language design: as things stand, I'm in an excellent position to respond by observing that most popular languages treat shared-state concurrency as "better" than asynchronous message-passing concurrency, if they bother to treat concurrency at all at the language level (e.g. C, C++, and Objective-C do not). So given that, and given that shared-state concurrency and asynchronous message-passing are of equivalent expressive power, yes, I maintain that defaulting to asynchronous message-passing is a consistently better choice.

Dan: Maybe (maybe) once we've piled on an order of magnitude more speed to the current computing infrastructure (CPU, disk, bandwidth, and network latency) it might make more sense to view it as a reasonable option but even then I don't know that it'd be the right universal default.

I can't even parse this. What does speed have to do with it? Are you suggesting that asynchronous message-passing is somehow necessarily slower than shared-state concurrency by more than epsilon? If so, I suggest that you look at Erlang and the Ericcson telephone switching systems built with it again, or work through the exercises in CTM with Oz, or run Donut Lab or Den in E.

Dan: And even then, you're stuck with a huge mass of programmers, and programming languages, which assume it isn't. That, more than the issues of performance, is where the momentum is.

Here I think you've finally hit the nail on the head: most programmers think that they already have a mastery of shared-state concurrency, despite the overwhelming evidence to the contrary, and are afraid that they would have to learn something new. I don't know what to say about this, other than that it is precisely this stagnation of the industry due to its practitioners that offends me most. And I say this as a working C++ programmer.

### I think there's a lot more ab

I think there's a lot more absolutism in defaults than you give credit for, which might be where the apparent mismatch is coming in.

Right now, for most OO systems, there's no syntactic difference between a remote method call and a local one--unless you're peeking beneath the covers it's all hidden, or at least hideable. Presumably (and yeah, I know this is potentially a very large presumption :) the great mass of programmers out there aren't dead-stupid, so... that's a very good indication that the defaults have an enourmous amount of power, arguably a near-absolute amount. Changing the defaults is, then, essentually a universal change.

Some of the questions that really need to be asked here (again, with the presumption that the great mass of programmers, and the great mass of language designers, aren't dead-stupid) are "Why aren't more people using message passing method calls" and "What constructs could be used to make it easier to use as an alternative?"

One reason to not do so is cost -- objects with state have a non-trivial cost associated with them in a pure message-passing system (since that state needs to be either stored in a central location which is communicated with, presumably with some sort of synchronization, or needs to be serialized a lot) and if you use a threading system where each 'remote' object gets its own thread then there's a not-free thread-switch cost. (Erlang ameliorates this to some extent with a custom threading system) I dunno about you, but if I had to pay an extra couple of dozen cycles per method call that adds up really quickly, even on reasonably snappy hardware.

The only way I can reasonably see to change the current state of affairs (one that is somewhat perilous, as it is a significant change in the way most people look at object systems and language design) is to hoist up CCMP-stle objects (or classes) to the same level as local objects (or classes) with switching from one to the other as simple matter of a keyword on a declaration or class definition.

If you did that, though... what else do you have to do? (Especially if you're adding it to an exising language) What facilities need to be provided, what support does the runtime have to give, what impact does it have on other design decisions? It's a lot more complex an issue than just declaring that all objects live in isolation and communicate via immutable messaages.

On the other hand, if you do have the answers (or someone else does) I'd be more than thrilled to get 'em and, bluntly, I'm very likely to weld in good support to Parrot for it. (Not that that matters to most people, but it does for me :)

### Concurrency and Distribution

Dan: [Excellent discussion of message-passing vs. shared-state issues elided]

Dan: I dunno about you, but if I had to pay an extra couple of dozen cycles per method call that adds up really quickly, even on reasonably snappy hardware.

This is obviously fair: if there is indeed a couple of dozen extra cycles per method call, on all method calls, everywhere, that's probably a non-starter.

Dan: The only way I can reasonably see to change the current state of affairs (one that is somewhat perilous, as it is a significant change in the way most people look at object systems and language design)...

Correct, and about time. :-) Let me once again highly recommend working through Concepts, Techniques, and Models of Computer Programming. I think you'll enjoy it even if you end up disagreeing with most, or perhaps even all, of its conclusions.

Dan: ...is to hoist up CCMP-stle objects (or classes) to the same level as local objects (or classes) with switching from one to the other as simple matter of a keyword on a declaration or class definition.

The illusion of a common store is extended across multiple sites and automatically supported by very efficient protocols. In addition, full control is retained over network communication patterns, permitting very efficient use of network resources. Furthermore, reliable, fault tolerant applications can easily be developed.
This would naturally lead one to say "tell me more about these 'very efficient protocols.'" See A Lightweight Reliable Object Migration Protocol.

Dan: If you did that, though... what else do you have to do? (Especially if you're adding it to an exising language) What facilities need to be provided, what support does the runtime have to give, what impact does it have on other design decisions? It's a lot more complex an issue than just declaring that all objects live in isolation and communicate via immutable messaages.

Exactly right. Where I think you and I differ, if anywhere :-) is on the very real issue of whether you can reasonably implement Peter's point in existing languages or not. Your thesis seems to be that you can't. I don't disagree with that. I interpret your suggestion to be that the suggestion of changing defaults is therefore infeasible. Assuming that that's a fair interpretation, which I gleefully admit is not at all clear, I would have to respectfully disagree, and I hope that the reason is obvious: Java has succeeded in the marketplace despite its flaws, Python has succeeded in the marketplace despite its flaws... it's not to say that introducing a new language based on different fundamentals would be a sure-fire success. Far from it. But I would posit two things:

1. The Java experience strongly suggests that a cautious juxtaposition of controversial features (garbage collection, virtual machine, mandatory declared exceptions...) and familiar features (C++-like syntax, class-based object model, simple type system, monitor-based threading model, text-file-based edit/compile/link/run/crash/debug cycle) coupled with aggressive marketing can succeed in the marketplace.
2. The Python experience suggests that you can build a real community around even less mainstream ideas than Java's (mandatory whitespace, list comprehensions, not-necessarily-class-based object model, heavily reflective meta-architecture, no edit/compile/link/run/crash/debug cycle...)
So a goodly chunk of what I'm suggesting is that you can have a new language with different fundamentals, but you need to either accept being a relatively niche player for the predictable future, or you need to bite the bullet and put your spiffy new fundamentals under some serious syntactic sugar that rather strongly resembles... well... C or C++. And you have to do so in a way that when you really do interact with C or C++ APIs, the illusion doesn't completely collapse and your users discover that they're using Haskell in C++ clothing. :-) Oh, and you still need to market the living daylights out of the thing, preferably via the same mechanism that Sun succeeded with Java, i.e. by convincing someone else to use/embed your language in a major project a la Netscape and applets.

Dan: On the other hand, if you do have the answers (or someone else does) I'd be more than thrilled to get 'em and, bluntly, I'm very likely to weld in good support to Parrot for it. (Not that that matters to most people, but it does for me :)

Oh, you're that Dan! Well, I certainly don't have the answers, but it seems to me that the Oz folks are doing a bang-up job on the issues, as are the E folks; see <http://erights.org/elib/concurrency> and the page immediately following it on distribution for their designs.

### Oh, you're that Dan! D'oh! G

Oh, you're that Dan!
D'oh! Guilty as charged.

Anyway, I'm not particularly convinced you couldn't do this now, in perl, python, or ruby. You'd want to wedge into the new method for objects to decide whether the object in question was local or not (creating the object in another thread and returning a handle or something), and then wedge into the dispatch to pass messages instead of dispatching straight to the code. You could likely do it with just an attribute on the new method, leaving the actual class code alone.

This is probably doable in C++ as well, but I'd have to know C++ to know for sure. I don't, and I'd as soon keep it that way, thanks. :)

Now, whether this is a good idea or not is up in the air. I could see it going either way, but that's a syntax and object, thing, and I don't do either if I can help it.

Where things get interesting is at the semantic level--what does an engine have to provide to make this sort of thing work well transparently? I can see needing some sort of potentially distributed GC system (if objects can move across processes or machines), a good serialization system (potentially serializing continuations, though that's not tough), some sort of event system to manage async calls, and maybe support for lazy evaluation of high-level variables for transparent asynchrony. A message-passing system as a first-class part of the runtime/VM/whatever would be needed as well.

Hrm. This doesn't actually sound all that tough -- it almost looks like it could be done as simply as adding an isolated property/attribute/mixin/interface/parent/whatever to a class. You'd lose direct access to the guts of an object or variable or whatever, but if direct access is mediated by code anyway you'd not actually notice that.

Y'know, I may have to put the MMD/CPS/INTERCAL idea aside for a while and work on this. Sounds interesting, and possibly very useful.

### Shared-state concurrency to blame in N% of cases

for what value of N, would you say?

I know there's a class of particularly nasty bugs and pathologies associated with shared-state concurrency. From my own experience, I've found that trying to reduce exposure to such problems (while remaining within the same model of concurrency) means adding considerably to the complexity of programs, which can create fresh problems in turn. But there are surely many other kinds of things that can go wrong with programs, especially ones written in C++?

### Shared-State Concurrency, or C++?

Dominic Fox: for what value of N, would you say?

Well, we'd have to define "blame" and "cases." So: in what percentage of projects would I say that the use of shared-state concurrency has led directly to project failure, where "directly" means "ignoring other contributing factors" and "failure" is defined to include "schedule and/or budget overruns that resulted in the project not being profitable?" In something approaching 20 years of professional software development, I would have to say that that number, thus defined, approaches 100% asymptotically.

The push-back that I always get about this is that "software projects always go over schedule and/or over budget." Well, sure; that reflects the fact that there's no such thing (yet) as "software engineering." How exactly that contradicts my observation or my point is unclear.

Dominic: I know there's a class of particularly nasty bugs and pathologies associated with shared-state concurrency. From my own experience, I've found that trying to reduce exposure to such problems (while remaining within the same model of concurrency) means adding considerably to the complexity of programs, which can create fresh problems in turn.

I think you're making Peter's and my point for us: doing shared-state concurrency in currently-popular languages is problematic. When you attempt to minimize the problems without changing the model, presumably because your language either doesn't support any other models or because it makes such a model change too costly, you introduce new complexity and hence new risks. Doesn't this suggest precisely that a different concurrency model "by default" would be a better idea? Doesn't it make you think that a bit better language support for avoiding the problems would be nice?

Dominic: But there are surely many other kinds of things that can go wrong with programs, especially ones written in C++?

Of course, but the thread is about asynchronous message-passing concurrency vs. shared-state concurrency. It just so happens that I'm working in C++ on top of the pthreads and Win32 threads APIs, and so the architecture is one of shared-state concurrency, to put it mildly. Does the C++ code have other issues? Most definitely! But since the system in question deals with thousands of systems distributed over the Internet, issues of concurrency, scalability, throughput, etc. crop up early and often, kind of like Chicago voters. :-)

### Unix and the internet

Maybe I have the granularity wrong here, but I think that Unix and the Internet are a huge success story for isolated processes communicating by message-passing. The protocols in RFCs are specified with message passing and they're much clearer to me than the (mostly implied) protocols I've seen in shared-memory programs. It's also reasonably easy to think about crashes, recovery, and upgrades in systems that are split between isolated parts like separate processes and machines.

I think languages that do operating-system-like things (including concurrency) should take a close look at these nice properties and try to preserve them. I also think it's significant that Thread.stop() is considered too dangerous to exist even though kill -9 can be used with confidence.

But I don't really understand what "default" means in this context. Programs like compilers seem logically sequential and I don't know where they'd use concurrency. It's also interesting to consider systems like Emacs and (if I understand correctly) Oberon that get very nice properties as a consequence of being globally non-concurrent.

### REST and Ramblings on Sequentiality

Luke Gorrie: Maybe I have the granularity wrong here, but I think that Unix and the Internet are a huge success story for isolated processes communicating by message-passing.

This is an excellent point that I shouldn't have needed reminding of: it's precisely the point that the REST, or "Representational State Transfer," community is attempting to make. The principles of REST underly Waterken's Web Calculus, "that is independent of the application programming model. The model enables reasoning about the functionality and security of an application interface." The page referenced includes situations of hypertext, REST, RPC, DEM, NOM, CCP, and UML within the calculus. <http://www.waterken.com/dev/Web/AMP> defined an Abstract Messaging Protocol based on the Web Calculus; <http://www.waterken.com/dev/Server> is a Java-based web server that eases the development of web services based on the Web Calculus, including support for the "httpsy" scheme for decentralized identification as described at <http://www.waterken.com/dev/YURL>.

This is good stuff!

Luke: But I don't really understand what "default" means in this context. Programs like compilers seem logically sequential and I don't know where they'd use concurrency. It's also interesting to consider systems like Emacs and (if I understand correctly) Oberon that get very nice properties as a consequence of being globally non-concurrent.

I take "default" to mean, informally, that the constructs of the language and libraries support one approach with greater concision than the other. The meaning is made more formal in CTM, by the way: languages are defined as kernels with more-or-less syntactic sugar, and the book proceeds literally by only adding a kernel construct when it is absolutely necessary in order to gain expressive power. Concurrency is introduced in chapter four; state isn't added until chapter six; shared-state concurrency waits until chapter eight. This reflects both the added expressive power (in the literal sense of extending the kernel language with new features) and, from a pedagogical perspective, the increased difficulty of reasoning about the language thus extended (in the literal sense of requiring more complex proof techniques, and the intuitive sense of being easier to introduce bugs in the more expressively-powerful languages).

As for compilers being sequential, that's not necessarily true. For example, the Glasgow Haskell Compiler works largely via graph reduction, and graph reduction lends itself to parallelization modulo sharing. Generally speaking, it's easier to compile functional languages concurrently than stateful ones, although there's a certain amount one can do either way.

EMACS and Oberon are indeed interesting, as are the various other systems that do concurrency without doing preemptive threads and shared state. :-) There are many approaches to doing just that, ranging from coroutines to A Poor Man's Concurrency Monad to state threads that mimic a threads API but don't actually use threads and don't actually incur shared state to SEDA to yet other approaches.

Really, observing that shared-state concurrency is fraught with problems is an utterly non-controversial observation, or at least it should be, given the amount of time and energy the industry puts into addressing the problems. What seems to be controversial is the notion that perhaps our tools should enable alternative approaches in a native, natural fashion.

### Bitter Experience

Having spent about eight years in the server-side Java world, with its notions of shared-state concurrency, and now doing time again in the C++ ghetto, with its notions of shared-state concurrency, I have to come down on the side of "unfortunately ignored truth." It doesn't take much experience with Multilisp, Erlang, E, or Oz to see that, either.

In general, I would have to say that the reflexive insistence that mutation is necessary to get "real work" done is the most pernicious lie spread within our industry.

### Mutation and real work

Paul,
I discovered OCaml lately, and am enjoying it thoroughly.
At work, I use c++, python, php, Java.

One of the features of OCaml I like most is its practicality (save for its non concurrent threading): it has imperative constructs such as mutation.
Without it, how do you design a logging (traces and logs) that can be turned on/off or made more/less verbose during program execution? How to modify the environmnent while running? How do you keep the user's data in an http session without groing the stack without mutable data structures?
Not trolling, just genuinely interested in answers to the problem. I do not insist mutability is necessary, just find myself coming back to it.

Thank you

### Good Question!

First, it depresses me that anyone feels the need to defend against accusations of trolling from me. I'm going to attempt to excise that particular word from my vocabulary henceforth; whenever I find myself tempted to use it, I believe I'll simply attempt to exercise enough self-discipline not to post, and overcome my egoic need to have the last word.

With that out of the way, it's hopefully obvious that I agree that mutation is useful, since I am an O'Caml programmer rather than a Haskell or Concurrent Clean programmer. :-) The post you're replying to reflects my intense frustration over the state of affairs, not only with mutation, but specifically with shared-state concurrency, in Java and C++: it's just damnably difficult to write correct concurrent code in those languages. The alternative models of concurrency are declarative concurrency and message-passing concurrency; these, as well as shared-stated concurrency, are covered extremely well (as is everything else!) in Concepts, Techniques, and Models of Computer Programming, commonly referred to as "the CTM" around here. We're fortunate indeed to have Peter Van Roy as a guest editor/frequent commenter on LtU.

But you were asking about mutation in general. It's hard to know where to start with this, if only because there are so many LtU threads about it, and for me to attempt to address the question is especially difficult due to my own lack of extensive hands-on experience with purely functional languages. Still, I appreciate the power of hard-core referential transparency, particularly when it leads to stunning results such as ZipperFS, the existence of which prompted (no pun intended) me to ask Oleg Kiselyov to combine his pure-O'Caml implementation of delimited continuations, which requires code to be in monadic style, with his, Jacques Carette's, and Lydia E. van Dijk's syntax extension for monads in O'Caml, the point ultimately being to be able to implement Oleg's Zipper, which is a really good answer to the question "How do I make mutable immutable data structures?" with a minimum of syntactic hair due to the monadic style. For more information about the generic Zipper and the ZipperFS, look at the two entries starting here.

Your logging, modifying the environment, and HTTP session questions are also very good, and once again, Oleg Kiselyov has shown a powerful way to address them, in his dynamically scoped variables for O'Caml. This code relies on his native implementation of delimited continuations for O'Caml; an exercise for the reader (or writer!) would be to port it to the pure O'Caml delimited continuations included with the pa_monad syntax extension already mentioned.

Finally, elsewhere in this very same thread is a reference to Software Transactional Memory, which would probably be most syntactically familiar to most programmers. This post and the one it refers to are probably the best places to start exploring STM.

That's a lot of rambling from me; hope it helps!

### Sometimes the most natural

Sometimes the most natural and simple solution to a problem is to use mutable state; its use should just be limited to where it makes sense. Languages like OCaml and Oz allow unrestricted mutable state but reduce the need for it and discourage its overuse. Haskell on the other hand only allows impure computations within monads.

I think a good approach (for a strict language) is to allow effects to be part of the main language in the ML style, but trace their use through an effect-and-security system.

### Side-note on state

On the same page, Teaching Programming with
the Kernel Language Approach
by Peter Van Roy and Seif Haridi:

State is essential for modularity,
because it allows changing a component's behavior over time without
changing its interface.

While I do see, how state is essential for modularity, because it allows changing a component's behavior over time without changing its value (ok, value of its reference), I do not understand the statement about interface. Unfortunately, I cannot currently afford to buy the book to answer this question :)
Though probably I should start saving, judging by reviews...

### Changing behavior without changing interface

While I do see, how state is essential for modularity, because it allows changing a component's behavior over time without changing its value (ok, value of its reference), I do not understand the statement about interface.

Peter brought this up when he was a guest blogger last year; see State and modularity. The problem is caused by needing to emulate state by passing extra information as parameters; if new information needs to be passed, the interface changes.

The last post of that discussion notes however that if state is an abstract data type, then the interface doesn't need to change if its representation changes. What is wrong with that logic?

### ADT is simpler than state

The last post of that discussion notes however that if state is an abstract data type, then the interface doesn't need to change if its representation changes.

Exactly. Then why bring state where simpler (ADT) concept will do?

On the other hand, preserving value implies preserving interface. That's why I suggested that the real purpose of state was to allow changing behavior while preserving value, and preservation of interface is just a side effect.

### CCWMP - holy grail or just another software architecture?

"Concurrent components with message passing" - our company's product is architectured exactly as described by this concise little phrase (of course, we don't call it CCWMP). When I first started at the company and was introduced to the CCWMP architecture in the training session, it felt like some kind of holy grail of design.

After years of looking at it, I realize it is not the holy grail that I thought it was. Some complexities that come up in CCWMP include:

- Sometimes it is necessary to have the entire context of the message-pass (and transformation) as it goes through 7 components. Do you track that context in some global-overseer or do you tack the global context onto the message?

- How do you want to handle the routing of the messages between components? Do you want a hard coded top-level router or do you need to have something that is more generic? For bonus points, you can try to figure out how to trace the context of a message after it gets forked.

- How do you define a COMPONENT in your CCWMP? Does it have two interfaces, like a pipe or is it multi-pronged? Is each interface in your system single or bidirectional? How do you track 'sessions' of messages between two components (e.g. exchanges of multiple messages that are part of a transaction)?

- I haven't even gone into the implementation details discussions of which could be quite vocal at times. For example, how do you manage the memory for the message passing through the system? Reference-counted, garbage collected styled? I can tell you that this GC anything is controversial with the hardcore real-time guys. (Heck, even C++ is controversial at our company but that's a discussion on its own).

These are issues that the system designer of a CCWMP can deal with, with enough coffees and BEvERages.

### Slow motion method calls.

CCWMP is indeed undervalued in academic courses and text books. Students should have it firmly kicked into their furry heads that this is the way you do it _unless_ there is an excellent reasons not to _and_ you have a well thought out way of doing the alternative.

However, as Alfred said, there is more to it...

My favourite anti-pattern is the "slow motion method call". Where two or more components are passing an amazing flock of messages holding "session identifiers". If you look closely you find hundreds of lines of code to do with...

• Storing session ids.
• Matching session ids.
• A big state machine working around an item of state associated with a session id.
• Timeouts to cope with the case of a message not coming back in time.
What is happening here? They have got "slow motion procedure call", but with a huge tangle of code to cope with the lack of continuations in C/C++/Java.

A normal method call unwinds nicely giving a return code of success or fail, or throws an exception. A slow motion method call across threads uses time outs to detect failure of the far side.

Before CCWMP can really be pronounced Good, we need a Good Clean Way (or Alternative) to these @$#%$!# slow motion method calls.!

### John Carter: Before CCWMP can

John Carter: Before CCWMP can really be pronounced Good, we need a Good Clean Way (or Alternative) to these @$#%$!# slow motion method calls.!

Yep. It really does no good to say "we're going to move away from shared-state concurrency" if the surrounding language pretty much exposes the Von Neumann paradigm, and that's it, which is why elsewhere in the thread I've indicated my disbelief that attempting to graft other concurrency models onto currently-popular languages would work very well.

Now, having said that, the Nice examples using JCSP don't look too bad, but then, Nice isn't Java, and if I'm willing to use a non-Java that compiles to JVM bytecodes, why not Scala? So I think you're right: we can talk about different models until we're blue in the face, but unless we're willing to tackle the surrounding-language issues head-on, we're likely wasting our breath.

### microkernels

CCWMP is indeed undervalued in academic courses and text books.

Isn't the complexity and overhead of message passing the reason microkernels haven't lived up to the hype in the operating system world? I don't usually think of microkernel research as being an overlooked area of academia. Maybe there's some common ground to be had between the two groups. The OS people should be exposed to abstractions like continuations and monads, and the PL camp should investigate what can be gleened from modern microkernels like L4.

### Domains

Greg Buchholz: Isn't the complexity and overhead of message passing the reason microkernels haven't lived up to the hype in the operating system world?

Message passing is only complex if the surrounding context doesn't support it well. C and C++ don't support it well, so those communities think it's "complex."

Message passing at the OS level is a bit different than, e.g. a message-passing-oriented language that uses it intra-process, of course, due to the need to cross the user-space/kernel-space boundary, but that's true of every other approach also. L4 does seem like our best bet for getting this stuff right, especially with Jonathan Shapiro engaged on the security side and bringing his EROS experience to the table.

Greg: The OS people should be exposed to abstractions like continuations...

Continuations vs. threads is a big topic among OS designers; Using continuations to implement thread management and communication in operating systems is just one paper on the subject.

### Lengthy URL

That ACM URL is widening the page.

Taken care of!

### Two orthogonal issues?

I am all for message passing instead of shared state, but it seems to me that asynchronous vs. synchronous is an orthogonal issue.

Ada's model, based loosely on CSP, is essentially synchronous message passing. It seems to me this approach is easier on the programmer than asynchronous models such as erlang's (but I admit my erlang experience is rather limited).

### Flow-Based Programming

J. Paul Morrison's 1994 book is online, cited by CTM; summary paper.

The paper touches Ehud's question; the book touches a previous discussion about parallel programming simplicity:

Strangely though, in spite of being at the leading edge of application development, it is also simple enough that trainee programmers can pick it up, and it is a much better match with the primitives of data processing than the conventional primitives of procedural languages. The key, of course (and perhaps the reason why it hasn't caught on more widely), is that it involves a significant paradigm shift that changes the way you look at programming, and once you have made this transition, you find you can never go back!

Analog and digital hardware design always begins with a signal flow diagram. Is there a fundamental reason why software design should be different? On the subject of software engineering, let me ask: Is it possible that hardware engineering has something to teach us?

Hardware designers start by imagining multiple components spread across a board (or die), all doing their little bit in parallel, input to output, output to input. Note that hardware includes "stateful" components like flip-flops and integrating capacitors. Personally I find this mode of design appealing for software, and marvel at resistance to the notion that parallel programming can be easy.

### Like LabVIEW?

I personally can't wrap my head around LabVIEW at all - too many years of doing if-then-else-for etc. has made it hard for me to imagine coding in any other way.

### Hardware designers start by

Hardware designers start by imagining multiple components spread across a board (or die), all doing their little bit in parallel, input to output, output to input. Note that hardware includes "stateful" components like flip-flops and integrating capacitors. Personally I find this mode of design appealing for software, and marvel at resistance to the notion that parallel programming can be easy.

The thing is, hardware at the gate and IC level is parallel, so hardware designers are just using the platform.
When it comes to software, how many processors have you got? Two? If blocking is concern, adopt an event-driven model.
On the other hand, parallelism is very relevant when the program is to run on networked computers, the number of which can be quite high (and dynamic too)

Of course, threading can be simpler to program than event-driven designs, provided that shared state is well defined.

### Concurrency primitives

An interesting discussion, I must admit. I haven't read the CTM book, but I feel the authors follow the right way.

The question: what kind of concurrency primitives should be used in a general-purpose PLs (like Java, C# or Ocaml) in order to program easier and less buggy? Shared-state is no good, but is async MP really better?

So far when it comes to multi-threaded modules in Java, I worked out some patterns that are similar to this "CCWMP". I investigated actor-based PL, I studied some message-passing approaches, I read excellent pages on E language, and I also had some other ideas. Not to mention that old CSP, Ada, Monitors that we were taught at the university. And I conclude that

• there are many subtleties in this area
• I found no well-established terminology of these different concepts
• better concurrency paradigms are yet to be invented

So I think we should work out which concurency paradigms are which, what properties they do have in common and what diffrent; any good comparatory papers on that? Then we could continue to discuss pros and cons of diffrent approaches.

Or perhaps I'm just a confused newbie, and you have it all resolved already?

### Re: Flow-Based Programming

More from the 2001 paper.

A common measure of complexity in a program is simply the number of if-then-else decisions. If you analyze these in a conventional business application, you find that many of these decisions are related to synchronization of data - data has to be available when it is needed, and must disappear when it is no longer needed. In a purely synchronous program, this can result in extremely complex code. This in turn means that it has been extremely hard to come up with reusable components in the area of business programming. Once you have defined a few general purpose subroutines, such as formatting, table lookup, and some conversion routines, ideas start to dry up! There are "natural" primitives in business programming, such as "sort", "merge", "summarize", etc., but these are hard to do in a synchronous program, as they tend to be "spread out" in time. The technology described in this paper, on the other hand, tends to yield exactly this kind of primitive, with very little imagination or effort required!

We are thus starting to be able to articulate a rather fundamental question: why should we constantly have to force-fit a style of application that deals almost exclusively with transformations applied to data streams, into a predominantly synchronous, procedural, mathematically-oriented computing environment? When we add to this the growing requirement for world-wide distributed operation and 7/24 availability, which put additional strains on our machines and on the very talented and dedicated people who look after them, it should not be at all surprising that we have such a hard time implementing this type of application on languages and machines that were primarily designed for a completely different problem space.

Sometimes ideas and concepts sink in better with a more familiar syntax, and so to add to the excellent CTM mentioned above, I'd also like to recommend "Implicit Parallel Programming in pH" by Nikhil and Arvind.

The "implicit" part is important, because multiple CPUs will soon be the norm, and shared state concurrency across large numbers of CPUs simply won't scale. Peter is right, concurrent components with message passing should be the default. Things may be bad now, but if we don't change they will be a lot worse. We were able to ditch the GOTO statement, we can make this change as well.

And at the risk of being controversial, (and possibly to be enlightened from any misunderstandings of mine), one of the problems with monads is that they are able to work because they explicitly sequence operations in order to thread state through the program. But whilst this simple sequential approach may be appropriate to a single CPU, this is not the environment our programs will be running in shortly.