Use Cases for Shared-Memory Concurrency?

The async/await thread got me thinking about Coyotos, and the fact that outside of the kernel I couldn't think of any use-case where mutable shared-memory concurrency was motivated in application code. Assumptions:

async/await integrated into the application-level languages, and
non-shared-memory concurrency with messages in the style of Node worker threads remains available (shared physical heap, non-shared logical heap).
the usual separate address space mechanisms remain available

While there are clearly algorithms that benefit from mutable shared-memory concurrency, most have equivalently efficient implementations built on messaging without use of shared memory. This becomes especially true if we add in the ability to observe when data is deep-frozen (which address many cases of "initialize before going parallel) to support immutable shared memory concurrency. For example, we did a high-performance ethernet driver in EROS that utterly relied on mutable concurrent shared memory for performance, but could have been implemented with immutable shared memory if we had had the ability to transfer an observably frozen reference as part of a message. Offhand, I'm not convinced that message ports need to support mutable shared memory data transfers at all.

By "observably deep frozen", I mean a reference into a previously mutable graph that can no longer be reached (from anywhere) by a reference that permits further mutation. One way to do this is to return a single [transitively] read-only reference to some element of the graph that we can show has not otherwise escaped from the execution context in which it had been mutable, and whose mutable execution context is known to have terminated at the point of return.

A similar effect can be had from deep-copy, but deep-copy can have prohibitive cost for large object graphs.

I'm sure there will be some obvious answers, but I'm prompted to ask: what kinds of codes or problem types exist that truly require the shared memory concurrent model (i.e. that cannot be efficiently encoded using a combination of async/await and non-mutably-shared concurrent threads with messaging)? Is there a common pattern to them? Is the pool of such codes large enough to warrant first-class design support in programming languages and OS runtimes? How many of those cases would be eliminated if we had reliably observable deep frozenness in the type system or the language runtime?

To be clear, this is mostly an idle ponder on my part. Putting the async/await discussion together with observable frozen-ness in my head got me to thinking on this, so I thought I'd put the question out there for discussion.

Regards and thanks...

Observable freezing is hard

This is not an answer to your question, but observable freezing seems pretty damned hard to get right. C++'s const has been broken pretty much since its inception, precisely because it is unable to capture the actual immutability of the object in question - it only says “you cannot mutate this thing through this reference”. Things only get worse when you want to share linked data structures, because the semantic constness of this node means that you cannot modify that other node - or maybe you can modify it, but only in certain limited ways (e.g., if you are rolling your own laziness in a strict language).

This means that the state of being frozen is more subtle than a generic language feature can describe, and can only be adequately modeled using user-defined abstract data types. But few languages are simultaneously good for defining abstract data types and for concurrent programming...

By Eduardo León at Thu, 2020-09-10 18:46 | login or register to post comments

SSA perspective

My project (once I get it off the ground) might be able to manage usable observable freezing. I think it might take some effort to make it work with shared-memory multithreading per the OP, but it helps to start somewhere.

First, immutable values are the default. Mutable variables are available, but are categorized, marked, and handled as "resources" (along with the likes of filehandles). Conceptually, resources are modeled as a linear sequence of values (as in SSA). Potential mutation points are marked; the lifetime of each value in the sequence is bounded by these mutation points.

To make the values in this linear sequence naturally immutable requires alias control: by default (unless explicitly declared otherwise), variables may not alias other variables and function arguments must not alias other arguments. Mutation points can then act as potential updates for all aliased resources, and must be sequenced accordingly.

In order to support shared-memory concurrency, I think you'd need to work out how to integrate multithreaded memory management (which for performance purposes should probably be distinct from local allocations).

By Bruce J. Bell at Sat, 2020-09-12 02:50 | login or register to post comments

Not hard

Observably frozen is a straightforward extension of escape analysis. The tricky part is the type metacomstructor, and Typescriot has shown me a way to do that. So no, it’s not hard for a language to describe. We clearly understood how to do it in ButC. We got hung upon transitive read only, which is more like what you describe. Though that is also doable.

By shap at Sat, 2020-09-12 04:58 | login or register to post comments

Avoiding Shared Memory Concurrency is Difficult

It is exceedingly difficult to avoid introducing shared-memory concurrency.

With shard objects or actors, a shared mutable variable can be modeled as an object with get/set methods. Shared memory as a collection thereof.
With shared channels, a singleton buffer essentially models a variable. Multiple processes could read and write the buffer.
With zero-buffered channels, we can model buffered channels using rendezvous through an intermediate identity process.
With single-assignment futures/promises, we can indirectly model channels hence variables by assigning `(Value, Next Future)` pairs.
With temporal logic, we can model mutable shared variables as a predicate with `(name, time, value)` triples.
Any effects involving a tuple space, blackboard system, shared database or filesystem is essentially shared memory concurrency.

Even within pure computations, we can model shared memory concurrency using a continuation monad to model green threads operating upon a shared environment. Aside from lambda calculus, models of deterministic concurrency (such as Kahn Process Networks or Lafont Interaction Nets) can even more directly model interaction with a shared-memory environment without the monadic structure.

If you intend to remove shared memory from the path of least resistance for programmers via choice of syntax and keywords, that should be achievable. Change the default to something other than shared memory concurrency.

But I think you probably will not be able to avoid supporting shared memory concurrency "at all". Since we cannot easily avoid support for shared memory concurrency, the question becomes: Why not support it well? Shared memory concurrency can be difficult to reason about no matter how we choose to model it, but there are also lots of ways to mitigate those disadvantages, e.g. with support for observers, transactions, implicit history tracking, separation of reader and writer references, etc.. Basically, a richer memory model.

By dmbarbour at Mon, 2020-09-14 19:50 | login or register to post comments

Bullet responses

So with the clarification that I meant concurrently mutable shared memory, I think that most of your examples are removed. Even the channel examples do not require concurrent mutability.

One might wish otherwise, but the pure/imperative debate will still be going long after you and I have surrendered the ability to participate. I was attempting to drill down on the more specialized challenges of concurrent mutability.

By shap at Tue, 2020-09-15 18:04 | login or register to post comments

re Concurrently

I think you missed my point. I'm not saying any of these models require concurrency. Obviously, we can model objects without concurrency. I'm saying that if you have any of these features in a concurrent system, you (indirectly) support concurrently mutable shared memory for free. It's very difficult to avoid supporting concurrent mutable shared memory. If the users want to model it within your system, they'll probably find a way. And they'll subsequently suffer all of those "specialized challenges of concurrent mutability", such as dealing with race conditions (albeit, potentially at a more coarse granularity than byte-level).

By dmbarbour at Tue, 2020-09-15 18:44 | login or register to post comments

Use Cases for Shared Memory Concurrency

I can think of a couple use-cases for shared-memory concurrency within application-layer code.

Hierarchical embedding.
Multi-agent systems.

Hierarchical embedding is a form of composition for applications, i.e. we can embed one application within another, perhaps use it to control the host application. See Croquet Project for one old example. Embedding can also be useful for testing of apps, i.e. we 'embed' applications into a debug/test/integration environment. The essential constraint for hierarchical embedding is that applications can pretend to be the OS for embedded applications. Thus, if you've designed an application model that requires shared state within the 'kernel', it is more compositional to also support shared state within the 'apps'.

Regarding multi-agent systems, some of the better designs for these systems are oriented around shared state - blackboard metaphor, tuple spaces, shared database, CRDTs, etc.. Albeit, with some built-in control models.

By dmbarbour at Mon, 2020-09-14 20:48 | login or register to post comments

Interesting thoughts

I'll have to think about hierarchical embedding - it's a term I haven't heard before. Speaking in ignorance, I wonder if this isn't a case where the need for shared memory concurrency is a result of recursive dependencies on a previous concurrent mutable-shared system. Pretending to be the OS doesn't intrinsically require shared memory concurrency unless simulating existing OS architectures is a requirement. I've shipped two commercial debuggers, and the debug/test/integrate case is not compelling in my mind. That's a case that usually requires breaking the normal runtime semantic assumptions regardless.

None of the multi-agent examples you give strike me as compelling case for concurrent shared state. There is a significant amount of evidence that single-threaded databases would have dramatically better performance when async/await is used, and for most of the rest it is not unreasonable to imagine that the state being shared should be deep frozen.

If I had to guess, I'd say that driving use cases will most likely come from applications where mutable shared memory becomes intimately tied to control flow. Iterative weather simulations seem like a potential example. Also, my statement about async/await runs into some very hard challenges in the presence of timeouts. It also seems possible to me that video decoding might create significant memory pressure if the storage space for the frames cannot be reused.

Perhaps I should clarify that I'm using the term "concurrent" rigorously here, in the sense that instructions are being simultaneously issued on multiple hardware processors. Async/await and similar pseudo-concurrent patterns remain "in bounds" in my question.

I'm not suggesting or imagining that there is a "free lunch" to be had. I'm contemplating whether we may have reached a point of evolution in PL runtimes where new approaches to old problems are no longer risible.

By shap at Tue, 2020-09-15 17:35 | login or register to post comments

Parallelism and Concurrency

Your use of the term "concurrent" is not the one I'm familiar with. From Wikipedia "In more technical terms, concurrency refers to the decomposability property of a program, algorithm, or problem into order-independent or partially-ordered components or units." Concurrency is all about how we express programs or systems, not how we implement them.

Obviously, concurrency is usually an excellent opportunity for hardware parallel computation. But parallelism can be supported without concurrency (e.g. matrix multiplications in parallel), and concurrency can be supported without parallelism (e.g. via time-sharing of a single CPU).

it is not unreasonable to imagine that the state being shared should be deep frozen

We can easily have a shared database where the 'value' of the database is deep frozen. But we'd still have shared, concurrent mutable state for the top-level database variable, e.g. the `HEAD` variable. And if we model deep updates to different paths via persistent data structures, the `HEAD/PATH` combo is essentially another model of concurrently mutable shared memory, with all the same problems as before.

I'm not sure why you consider deep-freeze very relevant to the question of whether we have concurrent mutable state. At least in my mind, the questions of shared mutable state concurrency and *pervasive* state are wholly independent.

applications where mutable shared memory becomes intimately tied to control flow (...) weather simulations, video decoding

Memory pressure can be solved based on other techniques, without introducing shared memory. For example, bounded-buffer channels with pushback for static-sized data can be implemented to rotate through the same few memory slots while logically introducing new slots. Double-buffering can be viewed as an instance of bounded-buffer channels.

I'm contemplating whether we may have reached a point of evolution in PL runtimes where new approaches to old problems are no longer risible

New word for me. 'risible' - laughable. Thanks. :D

I believe we can and should design PLs that have nicer 'default' properties for concurrent and parallel computations. I don't believe we can avoid supporting mutable shared memory concurrency 'at all' (in the sense of preventing users from modeling it). But we can de-emphasize it, remove it from the path of least resistance.

By dmbarbour at Tue, 2020-09-15 19:44 | login or register to post comments

I have apparently been misusing terms

The term I wanted is "parallel", not "concurrent". The first issue I'm concerned about here is the situation where multiple hardware instructions might attempt simultaneous reads or writes on a memory location.

But I'm also concerned with mutable concurrency.

Maybe I need to go back and frame my original question a whole lot better and more clearly.

By shap at Wed, 2020-09-16 17:43 | login or register to post comments

deep freeze

(I may be accosted by the semantics police for using some of these terms, but here goes ...)

In the design of the Ecstasy programming language, we completely eliminated "shared mutable state". Shared state is either (deeply) immutable, or it is visible (and mutable) only via the single owner of the mutable state (conceptually a message passing metaphor, but the mechanism is purposefully opaque).

Thus far, it seems to have been a very wise design decision. However, we have not implemented the native code compiler yet, so there will be an accounting (for the performance and complexity implications) at some future point in time.

It was interesting to see the topic of "deep freezing" come up here, since it's a topic that we just recently had to deal with, as a side-effect of this design.

By cpurdy at Thu, 2020-09-17 20:34 | login or register to post comments

re deep freeze

I've often thought that Aldrich's approach in Plaid Language in this subject was interesting. IIRC:

objects could have linear and shared/weighted references
unless marked for shared mutable state, objects can only be mutated via linear references
attempting to mutate a shared reference can implicitly wait for all shares to release

Effectively, we get 'deep freeze' simply based on whether an object is shared or not. Naturally, we could easily force any object into the frozen state forever. But frozen can also be a temporary status, for temporary sharing.

Of course, without something like Rust's 'borrow' checker, it can be difficult to locally reason about whether a "temporary" freeze will actually be temporary.

By dmbarbour at Fri, 2020-09-18 20:44 | login or register to post comments

implementation approach

We actually mark the objects (those that can theoretically be mutable) as being immutable; that cannot be undone.

The interesting thing is how to make an object into a deeply immutable object. Here's how we do it for List objects, for example: https://github.com/xtclang/xvm/blob/master/ecstasy/src/main/x/collections/ListFreezer.x

You'll notice that last line calls "duplicate()", which comes from CopyableCollection, which itself derives from Replicable and Duplicable. These use virtual constructors, which are basically automatically-present factory methods with rules enforced at compile and runtime. Here's the Duplicable interface, for example; note how an calss implementing it agrees to have a compatible constructor available: https://github.com/xtclang/xvm/blob/master/ecstasy/src/main/x/Duplicable.x

By cpurdy at Fri, 2020-09-18 21:51 | login or register to post comments

Clarifying

After scratching my head about two puzzling responses from dmbarbour, I realized that my title was really bad. The title should have been "Use cases for mutable shared memory concurrency."

I did not mean to suggest doing away with shared memory. My question is whether we still have use-cases for concurrently mutable shared memory.

Apologies for my error.

By shap at Tue, 2020-09-15 17:58 | login or register to post comments

Use Cases for Parallel Update of Memory

In light of your recent clarification.

The first issue I'm concerned about here is the situation where multiple hardware instructions might attempt simultaneous reads or writes on a memory location.

Just a few thoughts:

There is a granularity of "location" issue. E.g. conceptually, a matrix is one big location, but we might wish to support slicing a matrix such that different threads update different rows or different columns. Uniqueness of slices might be expressed via substructural types, or via some structured program expression. Slices won't necessarily be aligned nicely with words or CPU cache lines. E.g. what happens when we instead take a slice of a bit-vector? But we could probably make the details of updating bit-level slices semantically nicer than it is today.

There are potential arguments for lock-free data structures. Though, I think that using persistent immutable data structures (especially finger trees and log-structured merge radix trees) would be a wiser option in 95% of the potential use cases for lock-free.

When I do FPGA programming, occasionally I use a shared register as a notifier between loops. But I always try to keep this as one-writer, multiple-readers.

By dmbarbour at Wed, 2020-09-16 22:24 | login or register to post comments

Lambda the Ultimate

User login

Navigation