Concurrent Composition and Algebras of Events, Actions, and Processes

Possibly interesting, if you are into process algebras. I am insufficiently clueful about the topic area to pass serious judgement on the paper, but I found it a mostly accessible read at any rate.

Mark BURGIN and Marc L. SMITH 2006

PDF

There are many different models of concurrent processes. The goal of
this work is to introduce a common formalized framework for current research in
this area and to eliminate shortcomings of existing models of concurrency.
Following up the previous research of the authors and other researchers on
concurrency, here we build a high-level metamodel EAP (event-action-process) for
concurrent processes. This metamodel comprises a variety of other models of
concurrent processes.

I mean, any paper that says "One of the reasons is that, as experts claim, nontrivial concurrent programs based on threads, semaphores, and mutexes are incomprehensible to humans" can't be all bad.



And yet it moves

One of the reasons is that, as experts claim, nontrivial concurrent programs based on threads, semaphores, and mutexes are incomprehensible to humans.

I always find this claim a bit hard to reconcile with really existing reality, wherein every piece of computing infrastructure used to transfer, store, and process the bits I'm writing here relies on threads, semaphores, and mutexes.

Edit: As Bryan Cantrill writes:

"The heart of the problem is, perhaps, that no one really knows how to organize and maintain large systems that rely on locking" admonished Nir Shavit recently in CACM. (Which gives rise to the natural follow-up question: is the Solaris kernel not large, does it not rely on locking or do we not know how to organize and maintain it? Or is that we do not exist at all?)

(my emphasis)

Does it Compose?

In large lock-based systems we have a lot of difficulty avoiding race-conditions and deadlocks and priority inversions and starvation and convoying. In a large system with locking, you need to grok the whole system to prevent problems; independently valid components that use locking code might deadlock when composed. That makes locks very difficult to reconcile with large systems - especially those involving open (third-party pluggable) modularity.
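To make that composition failure concrete, here is a minimal sketch (a hypothetical account-transfer example of my own, not from the paper or thread): each transfer is locally sensible, yet two transfers running in opposite directions can deadlock, and on any given run the program may or may not hang.

```python
# Minimal deadlock-on-composition sketch (hypothetical example).
# Each call to transfer() is "independently valid", but composing two of them
# running in opposite directions can deadlock: one thread holds a.lock and
# waits for b.lock while the other holds b.lock and waits for a.lock.
import threading

class Account:
    def __init__(self, balance):
        self.lock = threading.Lock()
        self.balance = balance

def transfer(src, dst, amount):
    with src.lock:          # locks taken in argument order - locally reasonable
        with dst.lock:
            src.balance -= amount
            dst.balance += amount

a, b = Account(100), Account(100)
t1 = threading.Thread(target=lambda: [transfer(a, b, 1) for _ in range(100000)])
t2 = threading.Thread(target=lambda: [transfer(b, a, 1) for _ in range(100000)])
t1.start(); t2.start()
t1.join(); t2.join()        # may hang forever - that is the point
print("no deadlock this time:", a.balance, b.balance)
```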

We have a much easier time with data-parallelism, where we split large calculations into different threads (Cilk, Intel Concurrent Collections, OoOJava, Data Parallel Haskell). Where we need concurrent semantics, we have an easier time with shared-nothing and lock-free architectures (SOA, Actors, FBP, Publish/Subscribe), or event-loop concurrency (E, Croquet, Win32), or single-assignment concurrency (Oz, concurrent constraint), or temporal logic concurrency (FRP, Dedalus, TCC, UTCC), or transactional systems (databases, tuple spaces, blackboard, Hibernate, GemStone, STM).
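For contrast with the locking sketch above, here is a toy data-parallel sketch (my own example, not from the thread): the work is split across worker processes with no shared mutable state, so there are no locks to get wrong.

```python
# Toy data-parallel sketch in the spirit of parmap: split a pure computation
# across worker processes; no shared mutable state, hence no locks to misuse.
from concurrent.futures import ProcessPoolExecutor

def expensive(x):
    return x * x            # stand-in for real per-item work

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(expensive, range(10000)))
    print(sum(results))
```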

The fact that so much of our computing infrastructure explicitly uses semaphores and mutexes and shared-resource threading speaks of some odd perceptions of the subject, widespread ignorance of other options, maybe a naive concern for performance and hand-optimization, or perhaps just inadequate language design - the fault of a language community that too easily dismisses concurrency as something to 'tack on' later (excusable for C in 1972, but less so for BitC in 2008).

There is also a lot of inertia: the prevalence of synchronous IO from the OS (such as every process having three synchronous ports by default) - despite the fact that modern CPUs perform almost no synchronous operations. The limited tools offered by the OS make it difficult to implement other concurrency models efficiently, which certainly resists migration away from thread-based shared-resource concurrency, which in turn discourages experimenting with providing other concurrency models from the OS. Fighting that inertia is difficult, and sometimes requires radical measures.

Finally, there are also extremely low standards for success. Somehow, in a manner I find nigh incomprehensible, developers who struggle to get the locking right, fail, fight off recurring concurrency bugs repeatedly, discover rare race-conditions months after shipping, fix one bug only to create another... these developers will turn around and say that threading and locks were successful and comprehensible. Consider this report where students consistently reported that locking was 'easier' even though they consistently made more mistakes. Is this confabulation? Selective memory? Unconscious incompetence? Maybe there's something about human nature that associates a retrospective 'triumph' with 'good tools', even though it really means 'your tools sucked, you fool; you built this with your own blood, sweat, and parentheses'. Real progress is measured by the extent to which we no longer need to think about or struggle with what we are doing, yet can still accomplish our goals effectively and efficiently - thereby leaving us open to higher-level goals, but offering no 'triumph' for whatever purposes the tool serves.

Modern OS kernels are examples of 'triumph'. That should tell you just what I think of the tools the developers were using. Of course, until better tools are available (and the OS itself can in many ways be considered such a tool), a triumph every new kernel will be.

Re: Cantrill’s “concurrency hucksters and shysters”

Re: comment-62421:

As Bryan Cantrill writes, Is the Solaris kernel not large, does it not rely on locking or do we not know how to organize and maintain it?

This reminds me of the epic debate between Rich Hickey and Cliff Click:

Hickey:
“As we add locks, performance of both approaches improves, but a new source of errors comes into play for the manual approach — lock acquisition order and the resultant deadlock risk. At this point STM becomes dramatically less complex than manual locking.”
Click:
“Not interesting in practice. Because in practice, deadlocks are easy to find & fix. You get a crash dump, crawl the stacks, discover the lock cycle & reorganize. Sure the situation could be better, but deadlocks are a 'crash once' kind of bug. Dev's don't like 'em, but they don't lose sleep over them either.”

“Not interesting in

“Not interesting in practice. Because in practice, deadlocks are easy to find & fix. You get a crash dump, crawl the stacks, discover the lock cycle & reorganize. Sure the situation could be better, but deadlocks are a 'crash once' kind of bug. Dev's don't like 'em, but they don't lose sleep over them either.”

Deadlocks are also easy to introduce. Fixing one deadlock could just create another. And technically, every bug could be a 'crash once' kind of thing. Doesn't mean we shouldn't use every tool we have to eliminate as many types of bugs as possible.

"use every tool we have "

Doesn't mean we shouldn't use every tool we have to eliminate as many types of bugs as possible.

That would be true if it were the case that the marginal cost of adopting each new tool were negligible.

It isn't, so it isn't.

Cliff Click at VMIL 2010, during McKinley's keynote Q&A:

Cliff essentially said that every month he writes a lock-free, scalable version of some algorithm in his Azul Systems products.

He also said during his keynote that we've discovered that providing developers with the illusion that locks are cheap has in turn resulted in a horrible programming model - one that doesn't work in terms of human-writable, human-readable code, because people don't understand how to use locks effectively and just end up slapping locks all over their code haphazardly. He then said he doesn't believe the hype around STM as the future, but that he might at some point take a stab at writing a runtime that supports scalable STM.

Doesn't mean we shouldn't use every tool we have to eliminate as many types of bugs as possible.

Cliff also said that in his experience QA has been invaluable in preventing regressions, using techniques such

His keynote and McKinley's should both be available online at some point.

Deadlocks not so easy to find and fix

My experience does not click with Cliff Click's...

I do a lot of work with a project large enough to need considerable configuration management, with many pluggable modules and scriptable elements. Often, a deadlock won't occur except for particular configurations. Sometimes, those deadlocks don't occur except in particular scripted scenarios or users clicking various elements of a UI in some particular order coinciding with a particular scheduler timing.

Given those issues, deadlocks can be very difficult to find - and often aren't discovered until after the project is late in a testing phase, or possibly even shipped, at which point a fix is much more expensive.

Once found, deadlocks are not always easy to fix.

Many deadlocks are trivial fixes, but on occasion you'll find a 'subtle' deadlock error that cannot be removed (except by introducing a race-condition) with any small change in code. When this happens, developers will need to reorganize a considerable body of code - and there is rarely assurance that their 'fix' won't introduce more bugs (possibly for another configuration).
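For reference, the 'trivial fix' case usually amounts to imposing a total order on lock acquisition, as in this minimal sketch (a toy account-transfer example, not from the thread); the 'subtle' cases are exactly the ones where no such local rewrite exists.

```python
# The common "trivial" deadlock fix: acquire locks in one global order
# (here, by object id) so no cycle of lock waits can form.
import threading

class Account:
    def __init__(self, balance):
        self.lock = threading.Lock()
        self.balance = balance

def transfer(src, dst, amount):
    first, second = sorted((src, dst), key=id)   # total order over locks
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount
```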

Additionally, one isn't just concerned with bugs in one's own code. One uses libraries and third-party components. I've found several race-condition and deadlock bugs in Mozilla's JavaScript engine (SpiderMonkey), for example, simply because I used the API in the documented manner but in a slightly different way than browsers normally stress it.

These experiences have me desiring a language that lets me prevent these issues and reason about them locally, rather than waiting until there are huge, systemic, configuration-management-level problems to deal with. Of course, if you get the architecture right from the beginning, a lot of these deadlock and race-condition issues can be prevented. But getting the architecture right is not especially different from picking a better language.

An argument against locks is not the same as an argument for STM. There are many other approaches to supporting concurrency and parallelism while avoiding deadlock. But I certainly disagree with Click's characterization and dismissal of the problem.

Cliff's experience is about

Cliff's experience is about parallelism (GPU, multicore, cloud) not inducing hard-to-fix concurrency bugs. Naturally concurrent systems (your JS example) are, almost by definition, challenged by coordination. What Cliff is saying - and I actually agree with it (and can use statistics to back it up) - is that most of the stuff you want to parallelize doesn't have very fancy coordination, and thus parallelization should quickly reveal the relevant coordination bugs.

Getting on a different soapbox: when you *are* dealing with parallel code, the problem isn't coordination for functional correctness (in the sense of deadlocks, atomicity, etc.) but coordination in the performance sense (is the memory in the right place at the right time?).

Essentially, yes, parallelism has shown itself to be a language problem, but not (surprise!) one of coordination as people in the verification & testing community (e.g., see POPL/PLDI/ICSE/FSE for the last decade) make it out to be. Concurrency, in general, is an important area, but multicore/cloud typically isn't where the concurrency is coming from (and when it is, I'd argue the concurrency challenges aren't deadlocks and the like, but more tedious stuff like change propagation in data structures).

Concurrency as basis for Scalable Parallelism

I agree with some of what you're saying. Concurrency is where the coordination issues really pop up. OTOH, locks are unnecessarily expressive for parallelism. So is STM. People arguing Locking vs. STM should be discussing at least some form of concurrency.

Concurrency, in general, is an important area, but multicore/cloud typically isn't where the concurrency is coming from (and when it is, I'd argue the concurrency challenges aren't deadlocks and the like, but more tedious stuff like change propagation in data structures).

I believe that concurrency, via mobile code and tierless architectures, will become the most significant opportunity for scalable parallelism. To a lesser extent, I believe this is already the case, but we need to improve certain aspects of our languages and code-distribution architecture to take fine-grained advantage of it.

*David borrows Leo's soap box.*

It has been observed that many 'concurrent' applications fail to scale beyond a few parallel components. In many cases, some sort of data-parallelism is feasible (e.g. parmap, SIMD, GPU) and we should certainly be able to leverage those opportunities! But I'd like to address just the concurrency aspect - the coordination of diverse systems and interests - and argue that even that is a sufficient basis for scalable parallelism, assuming we leverage it properly.

The scalability of concurrency as a basis for parallelism comes from realizing that the number of relationships in a system can grow as O(N^2) with the number of components. Thus, we need to embed most significant computation into the relationships, rather than the components. This means, in the less hand-wavy sense, that we associate code directly with the relationships - code that could be executed at each end and in between. We end up with a 'smart' network and relatively 'dumb' endpoints, at least in the logical overlay. Establishing a relationship would involve mobilizing code to the remote machines to maintain the relationship, rather than using expensive one-size-fits-all communications.

When most computation is moved to the relationships, the resources used by a service will scale commensurately with the number of clients it directly services - and vice versa; resources used by a client will be commensurate with the number of services it directly uses. Further, the number of relationships can increase because we are no longer communications bounded.

There are modern barriers to that scalability.

Our inexpressive code-distribution architectures currently don't support patterns where a server can introduce and maintain ad-hoc point-to-point relationships between clients, nor can clients introduce and maintain ad-hoc point-to-point relationships between services. That sort of ad-hoc networking allows us to establish relationships between arbitrary nodes in the graph, at least once we possess the capabilities. Without it, we are restricted to 'shallow' relationships, which will be communications-bounded and fragile under disruption. Ad-hoc relationships established between services would consume client resources (since a client would always be 'in the loop' maintaining the relationship), and vice versa. Code distribution would allow us to establish a relationship, then maintain it only when the nature of the relationship needs to change.

To support code-distribution with ad-hoc networking, we need much better support for secure, robust composition than most of today's languages offer. Certainly, no language with locking would ever qualify! Also, sandbox-based security lacks the expressiveness required for remotely composing ad-hoc relationships. The code we mobilize must carry its own authority, allowing it to communicate with remote services, with multiple services on the host, and even with other mobile code. Naturally, this means authority must be very fine-grained. The capability security model, with distributed capabilities, is the most promising for this purpose.
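As a rough illustration of the fine-grained, carried authority I mean, here is an object-capability-flavored toy in plain Python (all names hypothetical; Python doesn't actually enforce confinement, so this only shows the discipline, not the security):

```python
# Toy object-capability-style sketch (hypothetical names). Guest code receives
# exactly the authorities it may use as arguments, instead of reaching for
# ambient globals like open() or socket.socket(). Python does not enforce
# confinement; this only illustrates the fine-grained-authority discipline.
def make_append_logger(append_only_file):
    def log(msg):
        append_only_file.write(msg + "\n")   # may append; cannot read or delete
    return log

def run_guest(guest_fn, *capabilities):
    # The host decides which capabilities the mobilized code gets - nothing else.
    return guest_fn(*capabilities)

with open("audit.log", "a") as f:
    log = make_append_logger(f)
    run_guest(lambda log: log("guest code ran with only the log capability"), log)
```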

Finally, we need paradigms suitable for sharding and replication to the appropriate remote hosts. This requires some control over information flow, annotating the more or less sensitive data with the requirements for hosting it. It also requires that the language know which state is used for caching, which state needs to be synchronized, which state can be regenerated... ideally, we would simply avoid expressing all non-essential state. (Reactive programming is suitable for implicit caching. Temporal semantics are useful for implicit streaming, synchronization, and coordination, which can avoid some need for explicit state.) And because disruption is a fact of life, it requires that we understand which behaviors are idempotent, which need to be reliable, where quality of service is critical, etc.

Ideally, we could write our code - integrating diverse client- and server-side APIs (where 'Document Object Model' might be called a client-side API) - as one big 'chunk' describing a bunch of relationships. And then our code should be automatically replicated and sharded as necessary to do its job efficiently, effectively, and securely. It would help if most code were literally specified as maintaining relationships rather than endpoints - e.g. a sort of networked FRP that allows some intelligent decisions on how much of the function to put server-side and how much client-side.

So it isn't that concurrency is a poor basis for parallelism. It's that our modern languages are deeply flawed in other ways that prevent us from effectively leveraging concurrency for parallelism.

I envision a world where ambassador patterns are widespread: clients mobilize publish/subscribe agents to live at or near a service, speak to the service on their behalf, translate a simpler client-side vocabulary into whatever rich interactions the service API requires. This would save bandwidth, improve latencies, maintain ad-hoc client interests across network disruption, and provide enormous opportunity for parallelism. Even assuming the shared service elements don't scale beyond 8-10 concurrent processors, one would still be able to utilize all 864 cores to take care of tens of thousands of ambassadors from thousands of clients. Just as easily, you could generate a lot of parallelism by shifting a good chunk of processing client-side, so that only the critical bits of communication actually move across the network and the service is even available (albeit slightly degraded) while offline.
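Here is a toy, single-process sketch of that ambassador idea (all names hypothetical; queues stand in for "near the service" and for the expensive network link): the ambassador absorbs the service's chatty event stream and sends the client only the digest it asked for.

```python
# Toy ambassador-pattern sketch (hypothetical names). The ambassador runs
# "near" the service, consumes its chatty event feed, and forwards only a
# small digest across the (expensive) link back to its client.
import queue, threading

service_events = queue.Queue()   # stands in for the service's local event feed
client_inbox   = queue.Queue()   # stands in for the network link to the client

def service():
    for i in range(1000):
        service_events.put({"tick": i, "noise": i % 7})
    service_events.put(None)     # end of stream

def ambassador(interesting):
    count = 0
    while (evt := service_events.get()) is not None:
        if evt["noise"] == interesting:
            count += 1           # rich interaction handled next to the service
    client_inbox.put(count)      # one summary message instead of a thousand

threading.Thread(target=ambassador, args=(3,), daemon=True).start()
service()
print("client received digest:", client_inbox.get())
```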

That is only the beginning of my vision. With the right language and architecture, we could also achieve resilience, graceful degradation, eventual consistency, partitioning tolerance, denial-of-service resistance, flash-crowd scalability, multi-cast, content-distribution, responsibility and legitimacy tracking, real-time reactivity and QOS properties, streaming data, and live upgrade. Temporal RDP is so far my best bet at achieving all of these at once.

Long term, I would also like pervasive incremental temporal constraint satisfaction - generating and executing a shared 'plan' that can change reactively but is 'incremental' (stubborn, changing only where the old plan failed to meet new constraints). That's simply the level of expressiveness people really need: the domain I work in - unmanned systems - demands an on-the-fly-configurable balance of operator control and multi-platform cooperative autonomy (swarming). Temporal RDP doesn't provide this long-term goal by itself (I haven't figured out how to achieve it without sacrificing security), so developers will need to settle for constraint-management 'design patterns' and frameworks or services, and make it 'pervasive' by hand. Fortunately, code-distribution does allow flexible late-bound distribution of these high-level patterns.

Anyhow, I'm not saying we should be rid of other sources for parallelism. By all means, use data-parallelism where it is available! But I believe concurrency will be a primary source for parallelism once various flaws are resolved in what today are extremely pathetic code-distribution and security models.

And, when we get it right, it really will be a compute 'cloud' - complete with competitive CPU and Memory and Privacy/Trust service markets. Today's clouds are pathetic imitations of what they can become.

parallelism is where the compute is

The scalability of concurrency as a basis for parallelism comes from realizing that the number of relationships in a system can grow as O(N^2) with the number of components. Thus, we need to embed most significant computation into the relationships, rather than the components.

That's really elegant!

But I believe concurrency will be a primary source for parallelism

... even if it's kind of a trick :) One of the 'aha!'s of FRP is that the edges can be thought of as values (and, in a sense, an enabler of the trick). A parmap over such an edge is still an edge (letting us get beefy edges)... but also still a form of data parallelism.

As for the security and client/cloud model, I'm starting to lean towards a secure-cloud-as-a-service model: consumers have a data+compute account somewhere they trust, visit a service that'll run on it, and done. Either users pay power/energy fees on their account activity (just like your normal utilities) or the consumed service runs ads and pays for you. The cloud utility is responsible for most of your security needs (e.g., your data not leaking to the consumed apps). I think there'll be important concurrency issues with many clients -- imagine walking into a room with a bunch of communicating phones -- but most of the heavy compute will be elsewhere and thus simpler.

Viewed the right way

I'd normally consider data-parallelism as involving "big values" - e.g. perhaps a single stream of 1.3 million LIDAR points per second, or perhaps face detection on a continuous stream of 4-megapixel frames. We don't need really huge values, but that's where you'd get the greatest benefits from the data-parallelism. This is, IMO, a very 'different flavor' of parallelism and scalability than I was describing above. If we have one stream of 4-megapixel frames, we can gain a lot of data-parallelism just processing each frame, and we can get even more parallelism by processing multiple frames in parallel (as a pipeline) but we don't have much concurrency-based parallelism because there is only one stream and only one consumer.

What I described earlier is certainly a concurrency-based parallelism, even if it does use mechanisms that - viewed the right way - look a lot like data-parallelism. You know it's concurrency-based parallelism because it doesn't scale for large-value computations, such as working with big sets and matrices. Rather, it scales for a large number of diverse activities and relationships each involving relatively small values (where 'relatively small' is being compared to high-performance sensor streams...). In many important ways, this concurrency-based parallelism is quite orthogonal (and complementary) to the data-parallelism.

Anyhow, the 'thought of as' logic is an interesting direction. I'll give you one of my own:

Viewed the right way, Firefox is a popular cloud-based remote-desktop service provided by Mozilla Corporation. The executable itself is just a shard of that service, replicated to different client machines in order to improve latency and network characteristics, and reduce communications burden from Mozilla (who must only communicate updates). This sharded service achieves a boost to parallelism and scalability of perhaps seven to eight orders of magnitude. That probably beats anything you've personally achieved with data-parallelism... and it was achieved with concurrent relationship based parallelism.

Is that a justifiable view? In the security sense, you'd need to consider what authority Mozilla corporation really has over your machine while you're running their client. You also need to consider the extent to which you trust Mozilla corp to not abuse those authorities, and to not allow other network services (which you obtain through Mozilla Corporation's Firefox cloud service) to abuse them. And then there's the continuous maintenance sense... automatic downloads of patches and upgrades. I'll grant it's quite arguable in either direction because we can't easily apply this pattern to modern cloud services (due to our flawed code-distribution architecture). But, even if you disagree with it, perhaps the example will offer a broader sense of what 'cloud' can potentially mean, and help you get a handle on the level of scalability and parallelism truly offered through concurrency.

We can play all sorts of games with 'viewed the right way' ;-)

Is what I did a 'trick' to achieve parallelism? I suppose it might be, viewed the right way.

You rock, David!

That is one of the most insightful posts I've read in a while, especially the bit about moving computations into relationships.

Look at Plaid

The Plaid programming language adds concepts of state and permissions to the type system, and would be a step in the right direction per your suggestion. While your vision is certainly powerful and I would love to see it come to fruition, there's also the question of how we go about teaching people to use these concepts (and the tools that will emerge) to write working code on a daily basis.

It moves but at what cost/effort?

Without endorsing STM: programming with threads is hard to get right even in mature projects like MySQL, Apache, Mozilla, and OpenOffice, as Learning from Mistakes - A Comprehensive Study on Real World Concurrency Bug Characteristics illustrates.

ignore me

(wrong area)