## Functional building blocks as concurrency patterns

While teaching INGI1131, my concurrent programming course, I have become even more impressed by a concurrent paradigm, namely functional programming extended with threads and ports, which I call multi-agent dataflow programming. This paradigm has many good properties:

• The declarative concurrent subset (no ports) has no race conditions and can be programmed like a functional language. The basic concept is dataflow synchronization of single-assignment variables. A useful data structure is the stream, a list with dataflow tail used as a communication channel.
• Nondeterminism can be added exactly where needed and minimally, by using ports. A port is simply a named stream to which any thread can send.
• All functional building blocks are concurrency patterns. Map, fold, filter, etc., are all useful for building concurrent programs. Here are two examples: a contract net protocol and an observable port object. Chapter 5 of CTM gives many more examples.
• Concurrent systems can be configured in any order and even concurrently with actual use of the system. Note that this is true for ports even though they can express nondeterminism.
• Designing concurrent programs is amazingly easy. For example, any declarative part of the program can be put in its own thread, loosening the coupling between system's parts, without changing correctness.
• The paradigm is easy to implement efficiently.
• The paradigm is easily extended to support fault-tolerant transparent distributed programming: see Raphael Collet's dissertation. Google's MapReduce is a famous example.

This paradigm seems to be exactly what is needed for both small and big parallel systems (both multicore and Internet, tight and loose coupling). I am surprised that it is not used more often. What do you think? Does it deserve a bigger role?

## Comment viewing options

### What are the advantages of this model over the Actor model?

What are the advantages of this model over the Actor model? I have found the Actor model extremely adaptive to parallel programming, without sacrificing imperative programming. The same advantages you mention of the data flow model are valid for actors as well:

1) no race conditions. Each object is its own little world.

2) all building blocks are concurrency patterns. In fact, with the Actor model, any concurrency that exists in programs is automatically revealed.

3) designing concurrent programs is amazingly easy as well, without the need for programs to be declarative. The Actor model also maps extremely well to object-oriented design, which is very important.

4) it can easily be implemented in quite an efficient manner.

### Higher level model

Of course it's possible to have race conditions in Erlang. While each process is mostly its own little world they do tend to need to talk to each other in an unpredictable order and they can each have state. It's quite possible for some orderings of messages to leave two processes in a mutual state that is not consistent. What Erlang process isolation does is set a minimum granularity at which races can occur.

That aside, one downside to Erlang's "all parallelism is via messages" model is that you have to build a lot of infrastructure to do even simple parallel things. If I want to do some computation with lots of data parallelism in Erlang I have have to write a bunch of message dispatching related code that has nothing to do with the computation I want to perform. Data flow variables, on the other hand, can do it in a way that looks much more like the way you would write the algorithm in sequential code.

But the bulk of what PvR is talking about is the idea of viewing a "message queue" as a single stream which can manipulated using higher order functions like map, fold, etc. That's just not feasible in Erlang which is very much designed around user code reacting to one message at a time usually with an imperative effect.

### Perhaps it depends on the language?

Of course it's possible to have race conditions in Erlang. While each process is mostly its own little world they do tend to need to talk to each other in an unpredictable order and they can each have state. It's quite possible for some orderings of messages to leave two processes in a mutual state that is not consistent. What Erlang process isolation does is set a minimum granularity at which races can occur.

Assuming that your definition of a race condition is the common one (for example, as defined here), since there is no shared state in Actor objects, I do not see how race conditions can occur between two actors.

That aside, one downside to Erlang's "all parallelism is via messages" model is that you have to build a lot of infrastructure to do even simple parallel things. If I want to do some computation with lots of data parallelism in Erlang I have have to write a bunch of message dispatching related code that has nothing to do with the computation I want to perform. Data flow variables, on the other hand, can do it in a way that looks much more like the way you would write the algorithm in sequential code.

I don't know about Erlang, but minimal (or even none) setup code is required in object-oriented languages in which each object is an actor itself. Any expression involving actors in these languages is automatically threaded, and the code is written exactly in the same way as before, i.e. sequential. I've tried the Actor model using c++ and Java in a recent project, and the results where very good: code was automatically threaded, and increasing the number of cores increased the performance almost linearly (depending on algorithm, of course).

But the bulk of what PvR is talking about is the idea of viewing a "message queue" as a single stream which can manipulated using higher order functions like map, fold, etc. That's just not feasible in Erlang which is very much designed around user code reacting to one message at a time usually with an imperative effect.

It's quite feasible in c++/Java though. For example, an array of actor objects can apply a functor object on its elements in parallel. In fact, applying functional principles on Actors works wonderfully.

### Race conditions between actors

I do not see how race conditions can occur between two actors.

Race conditions can occur between three actors. For example, two clients and a server.

### Example?

Can you please give an example?

### Races

Actors embed state (an actor may "designate the behavior to be used for the next message it receives"). Two client actors, C1 and C2, can easily share this embedded state by sharing a reference to a service actor S. This allows race conditions to occur between C1 and C2.

E.g. imagine S is an actor that supports 'get (double)' and 'set (double)' messages. If C1 and C2 were using S to represent a bank account balance, and each wished to withdraw \$, then they each could 'get' the current status, determine how much they wish to withdraw, then 'set' the final status. This situation would exactly correspond to a 'classical' race condition exhibited in many programming texts.

If one needs shared data with manageable concurrency, Actor Model would integrate very well with Software Transactional Memory (e.g. associating messages with transactions so that state updates can be reversed and conflicts with other negotiations can be resolved). Actor Model also integrates well with other database services (e.g. representing SQL in messages between actors).

This is actually a problem with Erlang. It doesn't provide language features to support concurrent or transactional access to shared data that happens to be shared via common reference processes. And, thus, one will end up greenspunning it... getting a slow, buggy, non-standard, 80% solutions to shared resource management.

### Races Possible with STM Too

While I'm a big fan of STM (I'm using it in my own little pet language after all), it does suffer the same problem as the actor model in this case. You can just as easily read a value in one transaction and then write another value based on the one read in a second transaction. The solution to your withdrawal example is essentially the same with both the actor model and STM. In the case of the actor model, rather than get and set messages, the actor should have withdrawal and deposit methods similarly in the case of STM the whole withdrawal or deposit must be done within a single transaction.

From my perspective, the true advantage of STM over the actor model is that implemented properly you can have unrestricted read access while a long write transaction is taking place. In cases where there is a high probability of failure in the write process, you gain a similar advantage with the ability to run multiple write transactions simultaneously (though obviously at most one can succeed on the first try in such a scenario).

### STM Races

Absolutely, one would normally use 'withdraw' messages rather than 'set' messages. But that would not avoid the race condition if the amount withdrawn is to be a function of the amount currently in the account.

Actor Model doesn't eliminate the value of STM for performing ACID negotiations and transactions even when the actor's messages are dedicated to the domain. One could for this particular example use a 'withdraw (amount) if (function over account holds true)' message, but as negotiations grow more complex, especially regarding the implementation of server S (which might be a front-end service that must atomically negotiate a withdrawal with two or more other actors), one will eventually be reinventing and greenspunning transactions on a per-service basis - a phenomenal waste and duplication of effort. A variant of STM for the Actor Model would be far more composable, flexible, independently optimizable, and much less prone to programmer error.

And I agree that you need to keep STM to a single transaction to avoid race conditions... I'd imagine that goes without saying, but perhaps you've seen some people misunderstand.

### Transactions

Transactions involving only a single actor are quite simple to implement, whether in a message-passing or monitors/locks environment. E.g. in Java:

public class Account {
private int balance = 0;
/** @return amount deposited */
public synchronized int deposit(final int amount) {
if (amount < 0) { return 0; }
balance += amount;
return amount;
}
/** @return amount withdrawn */
public synchronized int withdraw(final int amount) {
if (balance - amount < 0) { return 0; }
balance -= amount;
return amount;
}
public synchronized int withdrawFunc(final BalanceFunc f) {
return this.withdraw(f.calculate(balance));
}
public interface BalanceFunc {
public int calculate(final int balance);
}
}


The difficulty comes not from user-defined code, but when there are multiple actors involved, in which case you need some managing code that can ensure order in which locks are acquired (or otherwise ensure consistency):

public void transfer(Account source, Account dest, int amount) {
Account a = source, b = dest;
if (System.identityHashCode(a) > System.identityHashCode(b)) {
a = dest; b = source;
}
synchronized (a) {
synchronized (b) {
dest.deposit(source.withdraw(amount));
}
}
}


Of course, for this to be correct you also need to ensure that all access to these actors/objects is via the transaction manager, otherwise another thread can still screw things up. Not impossible to do, but much much easier if your language runtime takes care of it for you.

### Not Actor Model

Actors are not simple objects. To get an 'actor' in a typical OOP implementation, you'd need to wrap each 'object' in a dedicated thread with a dedicated message queue (like the Erlang process & inbox). Then, when receiving method calls, you wrap them up in a message object and stick them in the queue.

The use of Java's "synchronized" really wouldn't apply. In typical OOP, the call occurs in the caller's thread, and this requires 'synchronization' when two or more callers in different threads interact with the same object. In Actor Model, the call occurs in the callee's thread, and (at least without dedicated messages/method-calls through the callee) there is no real way for different callers to interact or synchronize with each other - the Java 'synchronized' block wouldn't do anything at all, really.

In OOP, the synchronization of caller with callee allows a return value (from source.withdraw) to be received. In Actor Model, there is no such synchronization of caller with callee, and thus one does not have 'return values' (though one can use 'source.withdraw(caller,amount)' then wait for a result to appear in the caller's inbox.) More often one would create a new actor just to receive the callback ('source.withdraw(callbackID,amount)') in a sort of Actor-based continuation passing style.

[Edit: 'source.withdraw(new Actor(recv(X){dest.deposit(X);}),amount)' would, perhaps, have been a better example.]

...

One could, of course, transform the Actor Model so that actors automatically have a vocabulary to support locks and that every message contains the caller's ID so that the callee can determine whether or not to delay that message or process it immediately. However, choosing the sort of bondage and extreme self-discipline needed to support locks (especially in a potentially distributed environment with possible delay or disruption) over the simplicity of optimistic STM strikes me as borderline insane.

If I'm going to make a transform to support transactions, might as well make it 'Message M in Transaction T' rather than 'Message M from Caller C (who may or may not be synchronized)'.

Besides, the use of transactions is much better for delegation and capability security, too, allowing the same transaction to be used automatically as one actor starts to involve others (i.e. A calls B which calls C which calls D... all in one transaction) and allowing transactions themselves to be assigned a set of SPKI capabilities and Kerberos tickets. Really... locks... are... insane, Insane, INSANE!!!!

### I know

I know what the actor model is. As you say, you can implement an actor easily enough in a shared-memory setting, and you can implement shared memory cells easily enough in a message-passing actor model. The problem I demonstrated had little to do with the particular form of concurrency control used: it applies to shared-memory with locks, to actors, to monitors, or indeed most forms of observable non-deterministic concurrency. My point was precisely that transactions become particularly useful when there are multiple actors involved. The code was just to demonstrate that single-actor transactions supporting arbitrary caller-supplied code are easy to implement, even in an environment as impoverished as Java! Clearly, the example is even simpler to express in Erlang or Oz or Alice ML.

Supporting correct updates to multiple actors/object simultaneously, though, is hard(er) no matter whether you are using the actor model or shared state with locks/monitors, etc. You still have to deal with achieving mutual exclusion for all involved actors at the same time (i.e., all involved actors need to be blocked from processing other messages while the transfer is in progress[*]), and you still need to ensure reliable ordering to avoid deadlock. We can translate the Java example I gave into a hypothetical ActorsJava, and see that the code needed is almost identical. Instead of using synchronized blocks we use message sends. Other than that, the same steps are needed:

void transfer(Account source, Account dest, int amount) {
Account a = source, b = dest;
if (System.identityHashCode(a) > System.identityHashCode(b)) {
a = dest; b = source;
}
a.do {
b.do {
source.withdraw(amount) {|x| dest.deposit(x); }
}
}
}


Where the "do" message just executes an arbitrary block of code in the callee's process, and we introduce a Ruby-like syntax for lambdas/blocks for conciseness. You can't simply perform the inner message send (withdraw/deposit), for the same reason you can't in normal Java: a third party could then observe a state in which the money has left the source but not arrived at the destination. So we need to ensure that both objects are blocked processing messages (our "do" messages) before we continue with the transfer, and we need to ensure this blocking is done in a reliable order (to prevent deadlock). (I'm assuming that the deposit/withdraw messages from their own threads can still magically get through. In reality, it would be more complex).

An actor model doesn't really make this any easier. You still need to think hard about the design and all the concurrency problems. I don't believe approach advocated in the OP of this thread (i.e. declarative + ports) makes it any easier either. I'd welcome code to prove me wrong on either count. Transactions do make this easier, by doing the ordering and lock acquiring/actor suspending/... for you:

void transfer(Account source, Account dest, int amount) {
atomic {
dest.deposit(source.withdraw(amount));
}
}


That's not to say that there are no benefits to actors over explicit locks — there surely are for many operations. Just that they don't save you from the problems of correct composition at a larger (multi-actor) scale.

[*] Of course, there are other strategies than the pessimistic one shown here that blocks all involved actors until it completes. I don't think they fundamentally alter the story, however.

[EDIT 2: Just noticed that my actor version won't work as written assuming asynchronous message sends. The "do" messages also need to synchronise somehow so that they don't exit before the inner messages have completed.]

### why transactions are not enough?

Transactions have good properties over low-level lock-based mechanisms but I am not convinced they will take us through...
1) Transactions only provide ACID like guarantees. A lot of time there is an ordering intention between parallel code which until now had been achieved using low-level and bug-prone mechanism like wait/notify. Such ordering intention occurs across parallel loop iterations, producer-consumer pattern. This paper (sec 5.3) provides examples of real-world bugs that had some ordering intention in addition to isolation:
http://opera.cs.uiuc.edu/~shanlu/paper/asplos122-lu.pdf

2) Strong atomicity is a desirable property but to guarantee strong atomicity, all possible concurrent code needs to be made transaction aware. Hence STM will inherently have pay-for-all instead of pay-per-use cost.

3) Transaction focus on getting parallelism right instead of opportunities for more parallelism.

4) And other problems related to rolling back observable effects etc.

Due to the reasons above, I believe Actor model looks more promising. Actors are concurrent entities and the model is inherently concurrent. Parallelism is bound by the number of actors in the system. More the merrier, except that the language and run-time should encourage programmers to have fine-grained actor designs.

Message-passing captures the data access (which we intend to do when we wait for and hold locks) as well as the ordering intention.

Saying that I believe there is definitely an argument for language/library-level support for pay-per-use-only transactions for actors. I would like to call it Software Transactions rather than Software Transactional Memory, because there is no memory-level data sharing in actor model :)

### Transactor Model

... as long as we are making suggestions for names: Transactor Model, Actor Transactions, etc.

As far as 'pay-for-use' vs. 'pay-for-all': The loss of symmetry by specializing so only some actors support transactions will inevitably result in accidental complexity - programs components with code dedicated to working around other parts of the program based on whether or not they are transaction capable. Besides, there are many advantages in modularity and flexibility ad-hoc interactions if transactions are always available.

Leave optimization to the compiler: if the compiler can determine the lifetime of an actor and can prove the actor won't receive messages from a conflicting transaction during its lifespan, then that compiler may (for that actor) elide the extra code and memory necessary to recognize and resolve transaction conflicts.

[edit 2011: I've since developed a transactional actor model... and then discovered cause to abandon transactions.]

### Actor Transactions

I would favor the Actor Transactions since Transactors has been taken in a similar but yet different context:
http://portal.acm.org/citation.cfm?id=1040322

Another way to implement Actor transaction modularly with pay-per-use cost would be in the run-time. Actor languages inevitably have a run-time for message passing and other semantic features. So the way to implement transactions would be "install" meta-actors in response to request for transactions. These meta-actors intercept the messages and provide transactional capability for the duration of transaction. The cost would be higher than the compiled case but it would strictly be pay-per-use. Some relevant work in this direction:
http://osl.cs.uiuc.edu/docs/ecoop93/ecoop93.pdf

Also this way compiled actor code can participate in transactions without having to recompile them. By the way how much actor code is out there and where? :)

### Every Erlang process has all

Every Erlang process has all the necessary properties to call it an Actor. Web services would also be 'Actors', albeit not written in a dedicated language. So, I'd imagine there's a lot of actor code out there... and, seriously, a great deal of it would benefit from transactions.

The runtime specialization for transactions seems like it could work well enough, though I'm curious as to how much it would actually save.

### Code Organization

Actor Model doesn't eliminate the value of STM for performing ACID negotiations and transactions even when the actor's messages are dedicated to the domain. One could for this particular example use a 'withdraw (amount) if (function over account holds true)' message, but as negotiations grow more complex one will eventually be reinventing and greenspunning transactions on a per-service basis - a phenomenal waste and duplication of effort.

I've never used the actor model personally, but I would imagine that rather than trying to greenspun transactions you would just add to the vocabulary of messages that the actor maintaining the state supports. This obviously forces changes to the way your program is organized which may be undesirable.

And I agree that you need to keep STM to a single transaction to avoid race conditions... I'd imagine that goes without saying, but perhaps you've seen some people misunderstand.

My impression is that understanding of transactions amongst many programmers working in industry is poor. The large number of web applications that either use a database that doesn't support transactions (MySQL with MyISAM tables) or use auto-commit would seem to suggest understanding of transactions is poor at least among programmers in that sector.

Further, in a more complex example you might have subroutines that start a transaction, perform an operation, and commit. Multiple calls to such subroutines might not be safe without wrapping the whole thing in an outer transaction which may not be immediately obvious to a user of the subroutines who did not write them. Will this be a huge problem in practice? I don't really know as I've never worked on a multi-developer project that used STM.

### Vocabulary for State Management

I would imagine that rather than trying to greenspun transactions you would just add to the vocabulary of messages that the actor maintaining the state supports.

"Just" is a dangerous word ;).

All actors are implicitly stateful. One could capture this state into 'cells' that are maintained by actors (that approach has nice properties for distributed programming, mobility, and migration). But, if one does, then one still needs to somehow associate the updates with a given transaction.

Since actors, by default, don't support 'sessions' (they can't associate a previous message with a future message) what one really needs to do is transform the vocabulary for each actor. For each message M, one results with M = 'M in transaction T'. To match previous behavior, one could have a special default transaction T=unit, that acts as an 'open,send,commit' rolled into one message.

### Not what I meant

My intention was that rather adding state management to the existing messages, modify the vocabulary such that all operations that need to be done as a single atomic operation on the state associated with the actor can be accomplished with a single message. This obviously places certain requirements on your code organization as code that might have been located in a "client" of a particular actor would need to be in the actor itself. While I would imagine this could be rather undesirable in certain circumstances, it seems to me that it would be the most straightforward way to approach concurrency with the actor model.

I don't disagree with your assessment of what would be required to properly support transactions on top of the actor model though. For distributed I can see the combination making sense, but for shared memory systems if you have STM why bother with actors at all?

modify the vocabulary such that all operations that need to be done as a single atomic operation on the state associated with the actor can be accomplished with a single message

For specific cases one would be able to resolve this issue much of the time ''assuming'' that one has full control over both the server and the client code. However, even with well chosen vocabulary there would still be situations and negotiations between three or more actors where one would be forced to make compromises between atomicity, isolation, and consistency.

In the general (and more common) case where you don't control every actor - i.e. for ad-hoc negotiations with old servers or black-box clients - you'd effectively be out of luck. The strategy you suggest would mostly eliminate any benefits of modularity and implementation independence that one is aiming to gain by use of Actor Model to begin with.

For distributed I can see the combination making sense, but for shared memory systems if you have STM why bother with actors at all?

With STM alone nothing ever happens. You'll need some sort of process model. That model doesn't need to be the Actor Model, but the Actor Model is a fine one to choose. It avoids many constraints imposed by the Object Oriented and threads+stacks model where one must be careful (often in a system-wide sense) to avoid 'feedback' (callbacks that cause calls) that could eventually blow the stack.

In general, I disfavor designs where a programmer of a component needs to have implementation awareness of the context. Actor Model helps avoid need for this awareness in a way that CSP+STM or OOP+Threads+STM do not.

Integrating the transaction support at the Actor Model level (via a transform over the vocabulary) rather than directly in the shared memory buys you a huge amount of language symmetry... once you do so you can effectively make transactions themselves 'transparent' to the process model (such that processes can partake of transactions, even as they communicate with other processes, without directly knowing about it). This is a huge win, even on local machines, since transactional updates to the memory can then be effectively and consistently delegated to other processes (perhaps via an intermediary like a registry).

There are some similar gains for supporting capability security via a slight transform over the Actor Model (very nice if your aim is to blur the line between language and operating system).

There are, of course, other process models that will work well enough for most of these purposes, especially if one is unconcerned about distribution, migration, mobility, etc.

### For specific cases one would

For specific cases one would be able to resolve this issue much of the time ''assuming'' that one has full control over both the server and the client code.

This was my assumption in this case. Obviously, when you don't have full control over the design of a system you must be mindful of the constraints imposed.

In the general (and more common) case where you don't control every actor - i.e. for ad-hoc negotiations with old servers or black-box clients - you'd effectively be out of luck.

Quite correct. Of course, in order to use transactions properly in those scenarios the protocol used by those old servers or black box clients supports transactions. Otherwise how do you know when to commit? How do you properly indicate commit failure so that the client can retry? If you just start a new transaction for each request and commit it immediately you're back in danger of race conditions.

The strategy you suggest would mostly eliminate any benefits of modularity and implementation independence that one is aiming to gain by use of Actor Model to begin with.

I don't think it would eliminate modularity benefits, but it would put some (perhaps severe) restrictions on where your module boundaries can be and that is admittedly undesirable. I don't see how it impacts implementation independence though? Perhaps my problem is I'm not thinking about sufficiently complex examples?

With STM alone nothing ever happens. You'll need some sort of process model.

Well obviously. My intent was more that if you had a suitable implementation of STM, "traditional" process models are fine. My preference for process models is currently dataflow (which is why I'm using it in my language).

It avoids many constraints imposed by the Object Oriented and threads+stacks model where one must be careful (often in a system-wide sense) to avoid 'feedback' (callbacks that cause calls) that could eventually blow the stack.

Okay you lost me here. What does OOP have to do with process models? And what do you mean by "feedback"? Do you just mean regular recursion or are you talking about some kind of event system in which code inside an event handler can cause other event handlers to fire in such a way as to cause unbounded recursion?

Integrating the transaction support at the Actor Model level (via a transform over the vocabulary) rather than directly in the shared memory buys you a huge amount of language symmetry... once you do so you can effectively make transactions themselves 'transparent' to the process model (such that processes can partake of transactions, even as they communicate with other processes, without directly knowing about it). This is a huge win, even on local machines, since transactional updates to the memory can then be effectively and consistently delegated to other processes (perhaps via an intermediary like a registry).

I'm definitely sold on making transactions transparent. My approach with Rhope is to allow the user to declare subroutines as using transactions (currently only on so called "global stores" which are sort of like namespaces for global variables, eventually on it's arguments). Calling a subroutine declared such automatically starts a transaction (nesting doesn't work because of a flaw in the current implementation but will be fixed in the new implementation I'm working on) and the transaction is committed when the subroutine completes. If the commit fails, the subroutine call is automatically restarted. All of this is completely transparent to the caller and only requires a very simple annotation to the callee.

I'm just not sold on the Actor Model+STM combo, but I'm mainly interested in exploiting shared memory systems (and to a lesser extent, hardware like GPUs and the Cell) so some of the actor model's advantages aren't as relevant to me.

### Parry Parry Parley

Of course, in order to use transactions properly in those scenarios the protocol used by those old servers or black box clients supports transactions.

Indeed. And for that reason, STM can't just be 'greenspunned' in. If you don't recall what I was promoting earlier: I believe ALL actors should support transactions. By default. Automatically. In a standard manner. As in: it's a feature automatically provided by the language and distributed programming environment to every actor.

Of course, one does need to make current transaction implicit to all 'sends'. One mechanism is:

1. each message M is associated with a transaction T. One might denote this as (M,T).
2. an actor that is responding to message (M,T) is capable of at least:
• designating its behavior in response to the next message (this designation will be rolled back if T fails to commit)
• sending a finite number of messages in T
• create a finite number of new actors (these actors will be destroyed if T fails to commit)
3. Excepting during certain stages of a commit phase where it is explicitly forbidden, any actor involved in approving T may abort transaction T in response to any message (even one not associated with T). This implies that T must describe contact information.
4. An aborted transaction should result in a resumable exception being delivered to the actor that initiated that transaction.
5. an actor may enter a scoped transaction via an atomic block. For actors already in an implicit transaction, this would generally be made into a a hierarchical transaction to simplify partial rollback.
6. A scoped transaction can only be committed by the actor that initiated it.
7. A commit protocol is applied automatically, with the potential for a final abort.
8. There is an implicit 'root' transaction that corresponds to the global space. A procedure that is NOT reacting to a message (M,T), or any actor reacting to a message (M,root), is effectively forced to commit every change immediately (which may cause conflicting long-running transactions to abort).

That should answer a few of your questions: " [...] how do you know when to commit? How do you properly indicate commit failure so the client can retry?"

I don't see how it impacts implementation independence though? Perhaps my problem is I'm not thinking about sufficiently complex examples?

Perhaps. You seem to be assuming that service S is implemented in such a manner that the only state updated as part of a transaction or negotiation is that within S. This allows you to develop a language such that clients C1 and C2 can really big, complex messages to S. Since S will process these messages one at a time, the updates to S can be considered atomic.

But your assumptions place massive implementation constraints on S. In particular, your strategy doesn't really allow actor S to communicate or negotiate with multiple other services provided by multiple other actors (perhaps D, E, F) as part of implementing the service the actor S is intended to provide. That is an enormous implementation burden.

When the implementation of S is so heavily constrained by the particular strategy you chose to accomplish certain non-functional requirements, well, perhaps 'violation of implementation independence' isn't the best phrase for it... but it's the one I came up with.

My preference for process models is currently dataflow (which is why I'm using it in my language).

Dataflow model is also a good one, esp. for local operations. I'm a big fan of functional reactive programming myself. However, as a distributed systems afficionado, I tend to focus my efforts on achieving a number of non-functional requirements that you probably aren't concerned about... such as delay tolerance, disruption tolerance, mobility, migration, and security for open systems programming.

Models such as Dataflow often take advantage of assumptions that I'm often not able to make in my own projects. These assumptions can be used to simplify the syntax, verification, optimization, and implementation. This is a good thing. Go for it.

Okay you lost me here. What does OOP have to do with process models?

By itself? not much. But the whole phrase was "OOP and threads+stacks".

And what do you mean by "feedback"? Do you just mean regular recursion or are you talking about some kind of event system in which code inside an event handler can cause other event handlers to fire in such a way as to cause unbounded recursion?

I'm talking about the latter. Critically, without system-wide knowledge of the objects or callbacks involved, it is not usually possible to avoid unbounded recursion... especially not when so many useful programming patterns (e.g. CPS) effectively depend upon it.

I'm definitely sold on making transactions transparent. My approach with Rhope is [...] completely transparent to the caller and only requires a very simple annotation to the callee.

I tend to favor the other way around. The callee shouldn't need to know whether it is happening within a transaction or not. This allows the caller to set policy for what happens atomically (thus aiding with the separation of policy and implementation).

I'm just not sold on the Actor Model+STM combo, but I'm mainly interested in exploiting shared memory systems (and to a lesser extent, hardware like GPUs and the Cell) so some of the actor model's advantages aren't as relevant to me.

That's reasonable. I'm not going to claim Actor Model+STM is the end-all-be-all. I am going to claim that (Actor Model with STM) is much better than (Actor Model without STM) excepting, perhaps, from the initial implementation effort standpoint.

"To match previous behavior, one could have a special default transaction T=unit, that acts as an 'open,send,commit' rolled into one message."

This is not as simple as that. Imagine in response to this message m from A, the actor B sends out two more messages to C. Now since m is part of a "unit" transaction T, both the messages to C should be part of the transaction T, which I believe is not what you aimed for.

And because of this reason, the compiled code has to have two copies of behavior. One when the message is processed as part of the transaction and other when it is not. Makes the binary bulkier and relying on source to be available :)

I am working on Actor programming as part of my research work and coordination mechanism is something I am really interested in. I would be very interested in real-world actor programs and understanding their coordination patterns.

### Top Level Transactions

There are actually a number of ways to handle (Message,Transaction)=(m0,unit) (or (m0,'root') as I later called it). The immediate possibilities I can think of are: treat it as {open,send,commit}, {open,receive,commit}, or simply as a top-level send/receive outside of any transactions. A while after writing my earlier statement, I did realize that there really are some useful distinctions between these options... especially regarding how a stray command to abort will be processed.

In the case of {open,send,commit} one will coerce (m0,unit)=>(m0,new Transaction()) prior to sending it. If the resulting transaction is aborted, it will be as though the message were never sent. In the case of {open,recv,commit}, one will coerce (m0,unit)=>(m0,new Transaction()) prior to receiving it. If the resulting transaction is aborted, it will be as though the message were never received. In the pure Actor Model, where there is no interstitial processing of the message, these would be equivalent.

Those behaviors are actually pretty useful... but they can also be easily performed explicitly, which makes them less useful.

A top-level send/receive outside of any transaction would be closer to what I intended, and closer to the original Actor Model. In this case, a stray abort that is in transaction 'unit' (semantically 'outside of any transaction') will then need definition. One might have it throw a simple local and possibly resumable runtime exception like aborted.

 A sends (m0,unit) to B B sends (m1,unit) to C B sends (m2,unit) to C

Would really be the behavior I was aiming for with my original statement, corresponding exactly to the pre-transaction-support behavior:

 A sends m0 to B B sends m1 to C B sends m2 to C

And because of this reason, the compiled code has to have two copies of behavior. One when the message is processed as part of the transaction and other when it is not.

I don't believe that to be the case. Whether or not one is coercing transactions around the message (m0,unit) or treating it as a message outside of transactions one would normally need to follow the path involving transactions. The reason? Because processing (m0,unit) will still have the potential to disrupt other transactions.

The vast majority of setup code for transactions would occur when opening a new atomic {} block, and when updating state contingent upon a transactional commit. The latter could easily be bypassed based on the transaction happening to be 'unit', and the former is a non-issue (otherwise we wouldn't be receiving message (m0,unit)).

Keeping two copies of the behavior might, of course, be useful for a runtime specialization optimization... but is a speed vs. space tradeoff. In the pay-for-use scheme you suggested earlier, you could compile for messages of the form (m,unit) by default, only whipping out the transaction support and change the message handler code when an actor receives (m,T) with (T != unit). Of course, it may require you compile twice ahead of time... or that you keep a copy of the code available... or that you compile to a higher-level bytecode that can be processed directly to add/remove transactions (you could use a JIT to compile hotspots).

I would be very interested in real-world actor programs and understanding their coordination patterns.

I've rarely worked with a 'pure' actor model, but any system with asynchronous message passing would be a good place to observe such patterns. Erlang programs, various service designs, etc. would all qualify, so long as they have a vocabulary and operate in terms of datagrams. Coordination patterns are essentially synonymous with 'protocols' and include everything from handshakes to voting protocols to clock synchronization and commit protocols. Actors can support connections and streams in essentially the same way you'd support them in TCP... except that you'd be reinventing that wheel.

Useful properties with which to measure such coordination patterns are: latency, bandwidth, disruption and delay tolerance, requirement for synchronized clocks, graceful degradation, resilience (ability to recover quickly), robustness (ability to resist attack), anonymity (ability of an actor to hide its ID either directly or via an intermediary), support for migration and mobility (ability of actors to move around to different machines and physical locations mid-protocol), etc.

### That's not a race condition.

That is not a race condition, it's an erroneous algorithm, and the same error can take place using dataflow programming: two or more agents can read the same value and all of them can try to put another value back.

### Dataflow vs. Actors

and the same error can take place using dataflow programming: two or more agents can read the same value and all of them can try to put another value back.

Depends on what kind of dataflow you're talking about. In the paradigm described in the OP there are two flavors involved. A declarative core in which race conditions are impossible but there is no mutable state (at least as far as the user of the language is concerned, with enough information the implementation can invisibly use state for performance reasons) and a non-deterministic superset which can suffer from these sorts of problems, but is more flexible.

So for tasks in which non-deterministic concurrency is required they are equivalent, but the difference is that with actors you always need to use a non-deterministic mechanism when you want concurrency whereas with the paradigm described in the OP you can allow non-determinism only where needed. I haven't used the actor model, so I can't speak from experience there, but working with Rhope (not quite like what is described in the op, deterministic dataflow core + STM for global state) I have found this distinction very useful. Typically only a small number of sub-routines use transactions and for the rest I can get concurrency without having to worry about using state in an unsafe way.

### Seriously?

That is not a race condition,

That's like one of the two textbook example of a race condition found in most network/distributed programming books. It's a race because sometimes it works, sometimes it doesn't. Which depends on the behavior of a scheduler.

it's an erroneous algorithm

By definition a race condition is erroneous.

### Races can be benign.

Races can be benign.

### It's a race because

It's a race because sometimes it works, sometimes it doesn't. Which depends on the behavior of a scheduler.

Therefore, the race can happen in the declarative part of the multi-agent dataflow programming model, since it is under the restrictions of the same scheduler.

### What is a race condition then?

If you don't consider the bank account example a race condition, could you please give us an example of what you do consider a race condition?

### Ok, it's a race condition

Ok, it's a race condition then (I admit I haven't thought of that kind of errors as race conditions).

Can this race condition happen in the declarative concurrent dataflow paradigm?

### Racing to the finish line

A race condition (such as the one in the bank account example) is observable nondeterministic behavior. The declarative concurrent model cannot (by definition) exhibit observable nondeterminism. The price to be paid for this is that the declarative concurrent model simply isn't capable of expressing the bank account example (the operators necessary to - for example - select between inputs from different clients aren't built into the declarative concurrent model). So the answer to your question is no.

### A literal race condition

I do not see how race conditions can occur between two actors.

Imagine two actors that represent cars. Each car sends itself periodic messages updating its position on the track based on its speed. When one car reaches the finish line it prints to console "I won", sends a "you_lost" message to the other car and stops itself. Upon receiving a "you_lost" a message from another car, an actor prints "I lost" to the console and stops.

If the rate at which the cars move is roughly the same then it's possible that either will win. It's even possible that both will win (a tie). The outcome is unpredictable. Oddly, even if one car is programmed to be slower than the other, vaguaries of scheduler and message ordering can still make the slower car win sometimes. Hence we have a race condition in our race simulation. If we want a completely deterministic simulation then we'll have to do a lot more work above and beyond what is directly provided by the actor model.

### It also exists in the dataflow approach.

It's a matter of scheduling and the problem also exists when using ports to feed a dataflow stream.

In the topic text, it is mentioned that "The declarative concurrent subset (no ports) has no race conditions and can be programmed like a functional language". Assuming that the race conditions mentioned there refer to "standard" race conditions (i.e. the ones solved by the no-shared-memory constraint), the Actor model exhibits the same feature.

### Nonstandard?

Assuming that the race conditions mentioned there refer to "standard" race conditions

What's not standard about the race condition in my example? You linked to a Wikipedia article in an earlier message. I'll quote the article:

A race condition or race hazard is a flaw in a system or process whereby the output and/or result of the process is unexpectedly and critically dependent on the sequence or timing of other events.

That looks like a pretty standard definition of a race condition to me.

It's a matter of scheduling and the problem also exists when using ports to feed a dataflow stream.

Nobody has argued that PvR's ports and streams can't exhibit races. The argument was and is with your contention that actors can't have races.

### no race conditions == no logic errors?

I was referring to this definition (quoting the article):

Race conditions arise in software when separate processes or threads of execution depend on some shared state.

Since there is no shared state between actors, there can't be races between them.

But will accept the general definition of a race condition. How does multi-agent dataflow programming prevents race conditions? in the example posted above with the two cars, let's say that both cars (two independent agents, i.e. two different threads) write to a single assignment variable the result of the contest, i.e. who won the race. Depending on scheduling, either agent can set the result, depending on which thread runs first.

Obviously, when someone says "The declarative concurrent subset (no ports) has no race conditions. The basic concept is dataflow synchronization of single-assignment variables." it really means that
update of single assignment variables is atomic; it does not mean that logical race conditions are not possible. My point is that Actors have the same feature.

### Shared state

Since there is no shared state between actors, there can't be races between them.

You see, there's the problem right there. Actors might not share mutable memory, but they definitely can share state. State is that which makes something respond differently to the same inputs. Actors can be stateful objects. The behavior of several actors can depend critically on the state of one actor, hence shared state.

State != representation.

It really means that
update of single assignment variables is atomic; it does not mean that logical race conditions are not possible. My point is that Actors have the same feature.

Nope. He means exactly what he says. There cannot be races in the pure form of dataflow programming. There CAN be deadlocks, but not races.

One way to think of it is that dataflow variables are kissing cousins to call-by-need lets and parameters in Haskell.

### There cannot be races in

There cannot be races in the pure form of dataflow programming.

Certainly, if there is absolutely no shared state at all. But, let's consider the bank account transaction example given above: the problem definition includes a shared state, namely the same bank account.

Even if the withdrawal of money is written in an absolutely purely declarative style, with no destructive update whatsoever, in the end, the bank account's amount of money will have to be changed. So, two independent applications, written in purely declarative styles, can still cause the same problem as a race. The problem of changing state is simply transferred outside of the application (i.e. the database of the bank), but the logical outcome is still totally non-deterministic, depending on which agent finishes its job first.

I think that the 'no races possible' feature comes from the declarative programming style and not from the concurrency programming style. In other words, one can say that 'no races are possible in the purely declarative programming paradigm', but one can not say that 'no races are possible in the dataflow programming paradigm', if the latter lacks the attribute of pureness.

### There is shared state

Since there is no shared state between actors, there can't be races between them.

Each actor has state. Through the messages that actor supports, other actors can be given indirect access. Just because the actors B and C can't have direct pointers to actor A's data doesn't mean they don't have shared access to A's state.

Obviously, when someone says "The declarative concurrent subset (no ports) has no race conditions. The basic concept is dataflow synchronization of single-assignment variables." it really means that
update of single assignment variables is atomic; it does not mean that logical race conditions are not possible.

Not at all. The declarative subset prevents all access to shared state. Single assignment dataflow variables can only be assigned to once. All expressions that depend on those variables will wait until they have been populated at which point those variables can no longer be modified. An algorithm in which one car tries to access the state of another car in a non-deterministic way is unexpressable in such an environment.

### Is it unexpressable, i.e. is

Is it unexpressable, i.e. is it rejected by the compiler or it is rejected at run-time? the post below (by David Barbour) implies that failure is determined at run-time, not at compile time.

### Racing to the Set!

Depending on scheduling, either agent can set the result, depending on which thread runs first.

This is true. But with declarative concurrency either both agents will set the single-assignment result to the same value OR the program will end in failure when the second agent inevitably comes along and attempts to set an already bound variable to a different value.

Ending in failure is a deterministic outcome, too.

While there may be 'internal' race conditions due to scheduling regarding which of two threads gets to bind that variable first, the final result of declarative concurrency does not exhibit any observable race conditions.

### So, at run-time, only one

So, at run-time, only one agent gets to set a variable? Isn't it too late when this check happens at run-time?

I thought the point of all this is to prove that it can not happen at compile time.

### Not the Compiler's Job

Declarative Concurrency proves that there is "no observable non-determinism" at the design time of the Declarative Concurrency paradigm... well before the compiler gets to it.

Multiple agents can set the variable, but they'll all need to set that variable to the same value if the program is to continue. And failing to set it to the same value is deterministic.

### That sounds like arbitrarily

That sounds like arbitrarily powerful theorem prover land, which isn't very satisfying. If so, as we can do dynamic race detection for C threads, we can also employ symbolic/concolic execution there as well, and reap the same benefit. Something stronger seems to be desirable :)

I'm assuming there are some analyses to flag these points, a la barrier inference, at least to reduce runtime overhead.

Something sounds fishy.

### Huh?

It doesn't need to be proven by an automated theorem prover, Imeyerov. Humans are generally responsible for proving their language or language-subset has the properties of Declarative Concurrency before they call it 'Declarative Concurrency'. And among those properties: it ain't declarative concurrency if it can have observable non-determinism.

If you can find a subset of C that exhibits declarative concurrency, more power to you... though, given that C standard doesn't even admit to the possibility of multithreading, I'd be a bit surprised.

There is no "arbitrarily powerful theorem prover" involved, at least not of the automated sort. One tends to associate Declarative Concurrency with a language (or a subset of one), rather than with individual programs.

### Ah, reread it, makes more

Ah, reread it, makes more sense. I read your message in the small, not the large, zeroing in too closely on a particular sentence.

### Message passing and race conditions

the ones [race conditions] solved by the no-shared-memory constraint

It's quite possible to have race conditions (according to the Wikipedia definition) in message-passing systems without shared memory (which includes the Actor model). This is to be expected, otherwise such systems would be less expressive than shared-memory systems (e.g., Java threads and monitors)! Another way to say it is that message-passing systems without shared memory can express observable nondeterministic behavior. Declarative concurrency is a more restricted model; it cannot express observable nondeterministic behavior. But this restriction gives enormous dividends, similar to the ones functional languages give.

Ports are an extension to the declarative concurrent model. The extended model, with a deterministic core (declarative concurrency) and a nondeterministic extension (ports), has very good properties. It allows to add observable nondeterminism exactly where needed and nowhere else. It manages to combine the advantages of functional programming together with the advantages of message passing, for most practical programming situations. That is exactly my point.

### "Declarative concurrency is

"Declarative concurrency is a more restricted model; it cannot express observable nondeterministic behavior"

Why is that? aren't threads subject to the same scheduling constraints as in the case of Actors?

In the example given above (the two cars), supposing that we have two agents (two different cars represented by two different threads), if the cars themselves decide if they have won or not, then we can definitely can have a race condition: one of the two agents may consider that it has crossed the finish line depending on which thread the O/S decides to run.

Perhaps there is something missing in my understanding, I would be grateful if someone cared to elaborate.

### Yes but still no race

Why is that? aren't threads subject to the same scheduling constraints as in the case of Actors?

Threads do get scheduled whenever, but in the pure subset of dataflow your code can't observe the ordering. All possible execution orderings lead to the same result.

It's just like in ordinary purely functional code. Other than non-termination, lazy and eager evaluation both always lead to the same result. The only constraint on ordering is created by explicit dependencies. So if it it doesn't matter what order things are evaluated then it follows that you can do them in parallel up to the limits that the data dependencies allow. If you mix non-termination back into the mix then dataflow behaves basically like call-by-need in the sense that bottom doesn't break your program until you do something to force it (although, unlike call-by-need, an infinite loop can consume considerable system resources even when you don't try to use its value).

Threads in a declarative concurrency context have no ability to query whether or not a particular variable is already assigned to a value. They may only access the value (which will cause a wait if it is not already bound) or attempt to bind a value to the variable. (As James said: "your code can't observe the ordering".)

If two or more threads each attempt to bind a different value to the variable, the result is failure for that program or functor.

Moreover, because these threads don't have the ability to query whether the variable is already bound, this failure is deterministic - it will happen every time because the same set of threads will always (in every run) attempt to assign that same variable to the same different values.

Which thread is the first to perform the binding can vary. Which thread is second to attempt the binding and thus cause the failure can vary. But the observable result (failure) is deterministic - there is no observable non-determinism.

Is that the piece you're missing?

### Thanks.

Thanks a lot, your posts are informative and enlightening. Indeed, that was the missing piece.

Does this failure happen at run time or it is observed at compile time?

Isn't it too late when it happens at run-time?

EDIT:

Isn't it a race condition that a single-assignment variable is assigned in a non-deterministic way?

### Depends on the style of data flow variable

Does this failure happen at run time or it is observed at compile time?

That depends on what kind of data flow variable you are supporting. If you have full logic variables with unification, then there isn't much you can realistically do about checking it statically.

If you have more restricted ones, where you separate read and write capabilities and eliminating them is done merely by directed binding (e.g. certain styles of single assignment variables, aka promises), then you can impose a linear type system to rule out failure statically.

And if you restrict yourself further to something like plain futures, where there is only one computation that can deliver the result (e.g. with functional threads, where the future is implicit and represents the later result of the thread), then failure is excluded by construction.

### Not a Race Condition

Since you've been relying upon the Wikipedia def:

A race condition or race hazard is a flaw in a system or process whereby the output and/or result of the process is unexpectedly and critically dependent on the sequence or timing of other events. The term originates with the idea of two signals racing each other to influence the output first.

Since the output and/or result of the process in Declarative Concurrency exhibits no observable non-determinism and thus does not 'depend' (critically or otherwise) upon the sequence or timing of the thread events, it cannot be called a 'race condition'.

Does this failure happen at run time or it is observed at compile time?

Due to Halting Problem, this is not a failure one could consistently observe at compile-time except by running the code either forever (in case of deadlock or non-termination) or until it returns.

If the process or functor will be created with runtime-computed initial variables, even that much isn't possible.

Isn't it too late when it [the failure] happens at run-time?

It is not too late. Declarative Concurrency would need to be mixed with side-effects (thus resulting in something other than Declarative Concurrency) for the order to be observed by the outside world. As is, if there is a failure, that failure will happen before any side-effects and will be the only observable output of the program.

As Andreas states, Declarative Concurrency doesn't need to share variables and may, thus, avoid that sort of failure by construction. In the forms that don't share variables, the possibility of two threads even attempting to 'assign' the same variable is precluded. (The Oz/Mozart Declarative Concurrency described in PVR's CTM does use variables.) Based on the design, a dataflow language could instead share futures or even lazy data (with any given application being performed by at most one thread).

### Since the output and/or

Since the output and/or result of the process in Declarative Concurrency exhibits no observable non-determinism and thus does not 'depend' (critically or otherwise) upon the sequence or timing of the thread events, it cannot be called a 'race condition'.

Isn't it a race condition when there is a function which returns the result of the first thread that terminates? Suppose we have a function:

f() = a() or b()


With 'a' and 'b' being two threads, and the result being a single assignment variable, function f() may return either the result of a() or the result of b(), depending on the scheduler's mood.

That does not sound very deterministic to me; it has all the attributes of a race condition.

Due to Halting Problem, this is not a failure one could consistently observe at compile-time except by running the code either forever (in case of deadlock or non-termination) or until it returns.

If the process or functor will be created with runtime-computed initial variables, even that much isn't possible.

So the failure will be observed at run-time. Not of much use then (speaking from a practical point of view - the meaning of all this is to prevent these errors from even exist in the first place, isn't it?).

It is not too late. Declarative Concurrency would need to be mixed with side-effects (thus resulting in something other than Declarative Concurrency) for the order to be observed by the outside world. As is, if there is a failure, that failure will happen before any side-effects and will be the only observable output of the program.

That is at the discretion of the programmer though, i.e. if the side affects come after the observation of the failure.

Overall, after reading this thread, and various other things online, my answer to the comment 'I am surprised that it is not used more often' is that this model does not offer any substantial improvement over alternatives: errors are still not caught in compile time, the programmer must be careful not to introduce any IO before the lock (I mean the single assignment :-)).

### Choice operator

That does not sound very deterministic to me

That's because your "or" is a non-deterministic choice operator (or so it seems). Obviously, when you add a non-deterministic operator to the mix, the language becomes non-deterministic.

### function f() may return a point for Declarative Concurrency

function f() may return either the result of a() or the result of b(), depending on the scheduler's mood

You can't express a function f() with these properties you describe in a declarative concurrent language. If your arguments are going to consist of: "If I could violate Declarative Concurrency then my program wouldn't have the properties of Declarative Concurrency", I'll really feel like responding with a juvenile: "Well, duh!".

speaking from a practical point of view - the meaning of all this is to prevent these errors from even exist in the first place, isn't it?

Unification failure can happen even in a single thread (thread X = 1; X = 2; end). So preventing that sort of failure is not 'the point' of declarative concurrency. The advantage of declarative concurrency is allowing one to take advantage of concurrency and multiple cores whilst avoiding race conditions and the many various 'problems' uniquely associated with concurrency.

And, yes, you do pay for this by being limited in terms of what sort of programs you can describe. For example:

• you can't describe a non-deterministic choice operator with 'first one wins' semantics... such as your 'f()' above
• you generally can't use this form of concurrency to apply multiple different search strategies to a AI 'best move' problem or break up a search space because (in general) you'll ultimately need to terminate before studying all moves and choose a non-deterministic 'best' result of those you have discovered
• this sort of concurrency can't readily take advantage of resources and databases outside the concurrent environment without mixing in a little non-determinism... because those resources can change. (PVR's 'ports' introduce non-determinism for exactly this purpose.)

But, despite these and various other limitations, Declarative Concurrency can be used to make subcomputations for 'pure functions' (functions without side-effects) operate in parallel. Many solutions to a wide range of problems in a wide variety of domains can be described in terms of such functions, and so declarative concurrency can help for a fairly wide range of problems in a wide variety of domains.

So, no, the point isn't to prevent failure.

The point is to take advantage of multiple cores for a wide range of problems in a wide variety of domains while constructively avoiding the issues and gotchas often associated with concurrency.

That is at the discretion of the programmer though, i.e. if the side affects come after the observation of the failure.

It's actually determined by the language whether or not it's at the discretion of the programmer.

In Oz/Mozart a programmer is free to 'mix in' code with side-effects into any function. But that isn't true in every language. The ability to 'mix' procedural code and side-effects into functions is a language property I strongly disapprove for a number of reasons I won't clarify here.

I favor segregating functional and procedural, segregating evaluation and execution, and interleaving them without ever allowing them to blur. In a language that segregates functional code from procedural code, one may also segregate declarative concurrency (e.g. some sort of implicit or explicit 'call-by-parallel-future' as opposed to 'lazily-call-by-need') from procedural concurrency (for which I favor the Actor Model), effectively allowing pure functions to take advantage of multiple cores declare parallel processing in a manner independent of the process model. Whether actors could communicate so produced parallel futures to one another would be a design decision. It might only be allowed on the local machine (as a hidden optimization).

In a language such as I favor, it would NOT be at the discretion of the programmer whether side-effects could be introduced into the declarative concurrency. The ability to introduce side-effects wouldn't even be available to the writers of declarative concurrent code subset.

### Sounds very similar to Kamaelia

This sounds very similar to Kamaelia - http://www.kamaelia.org/Home

In Kamaelia you have components, which have inboxes and outboxes. You take data from inboxes and work on it. You send data to outboxes. If you have a piece of data, you own it. If you've sent it on, you don't.

This naturally gives you metaphors that are easy to work with, and also naturally gives you a co-ordination language - in the form of Pipelines, Graphlines, TPipes, Carousels, ConnectedServers, Backplanes (think pub/sub), and so on.

The core difference from the actor model is that the sender does NOT know who the receiver is. This allows complete independence of testing and decoupling of dependencies.

It's exactly the same model as other concurrent models we know work, specifically:

• Hardware (pins, wires, VHDL, verilog, asynchronous hardware design)
• Unix Pipelines (stdin/stdout/stderr, named pipes)
• Biological systems (neurons, axons, muscles, etc)
• Even web mashups follow a similar model. (you can equally argue that many are actually like an actor system of course, except things like SQS)

An overview of Kamaelia, and how you build systems using this model is here:
practical concurrent systems made simple using kamaelia

There's a whole slew of presentations on kamaelia on slideshare I've given at various places which may also be useful.

Probably the biggest benefits you get over the actor model is this:

• Composability, since it's integral to the approach, which leads to ...
• Dependencies become explicit, which leads to ...
• Testability & maintainability (Since you have high modularity, small interfaces, and explicit co-ordination)
• As a bonus, something that looks like recursion stays being recursion. (If method calls suddenly turn into messages, how do you know if a what looks like recursion is recursion or normal code?)

It's worth noting that this model we use in Kamaelia and described here, which I believe is also the same model used by Mascot 30 years ago is functionally equivalent to the actor model.

Best links regarding MASCOT I've found are here: http://async.org.uk/Hugo.Simpson/

PDF of the standard:
http://async.org.uk/Hugo.Simpson/MASCOT-3.1-Manual-June-1987.pdf (12MB)

Incidentally, having built many systems using this model over the past 5 years or so, we have come to the conclusion that this model (and hence the actor model by itself) is not sufficient for expressing all concurrent systems in the cleanest, most direct way possible. You do still need to share data periodically. In those scenarios, you need systems like software transactional memory to assist you.

Hints that this is the case are well known if you look at systems which are wildly concurrent:

• Large scale hardware systems often end up with some shared data storage area, even in highly concurrent systems.
• Biological systems of all kinds, even plants, have a secondary, global communications channel, which is the hormonal system for expressing issues or availability or lack thereof of resources.
• Unix shell systems have used the ENVironment (cf /usr/bin/env) for advertisement of resources, and service lookup.
• SOA systems still need storage
• Mashups rapidly grew the key/value storage they need.

Probably the most interesting thing you gain of course is the ability to have what we sometimes term higher order components. That is systems that are concurrent themselves that accept a concurrent system (or concurrent system factory) as an argument and do something interesting with it.

The carousel component is an interesting example of that, however generally speaking most co-ordination components have the same characteristic, and lead to the ability to express concurrent patterns in the form of (sensibly) reusable, testable and maintainable code.

If you prefer something more like a paper, there is also this paper on Kamaelia, which is now 3 years out of date.
However, one of the key results not described there is this: we have now tested this approach on novices as well as experienced developers. The novices get it and build interesting, highly concurrent, systems. Those who have built network software, unix shell software, or mashups, or anything based around "take this, move it here" get it and build interesting, highly concurrent systems.

Those who are experienced, generally expect life to be harder than it is and find it a bit of a mind-reset, and then have an epiphany and realise that concurrent systems can be fun and easy to work with :-)

By novices, I mean pre-university students as well as university students, and post-university students. Experienced developers means just that :-)

Caveat: I am naturally biased here since Kamaelia has been my baby for the past few years, but I believe it works and sounds very similar to this post.

### What you describe may be

What you describe may be better placed in a dedicated thread.

I'll certainly take a look. Kamaelia intrigues me a bit... I'm quite aware that the language/mechanism with which you describe service components and the language/mechanism with which you hook them together into communicating systems should generally be kept distinct. Based on how you distinguish Actor Model from Kamaelia, it seems the main benefit you're providing is exactly that.

### Does Kamaelia have a declarative concurrent subset?

Declarative concurrency (see Chapter 4 of CTM) is one of the pillars of the multi-agent dataflow paradigm. Is it supported by Kamaelia?

### Depends on humpty dumpty principle

I've no idea as to whether the answer here is yes or no, primarily because I don't own a copy of your book (I had to google "CTM concurrency" to find out what you were referring to). I suspect the answer is "yes, with the caveat that components themselves which you can orchestrate declaratively are written in python and hence internally sequential". (ie /similar/ to CSP)

Kamaelia's concurrency model is based primarily on the model employed by Occam, Unix pipelines, and asynchronous hardware systems, primarily through the realisation that asynchronous hardware systems and scalable network servers tend to have similar *logical* architecture, even though the implementation is radically different.

Kamaelia itself comprises of two halves: Axon which is a python library for wrapping up the concurrency aspects using a dataflow component metaphor (OK, actually something more fun than that if you skip through to slides , and Kamaelia which is a library of python/Axon based components.

However, components can (and are) containers for other components, meaning that you rapidly start creating interesting container components, such as graphline, pipeline, carousel, tpipe, Backplane/publishTo/SubscribetTo.

I suspect that given the caveat that Python is of course NOT declarative in nature, that the fact that we can use things like graphline/pipeline/backplane and friends in a declarative manner means that the answer to your question is "yes". However, I haven't read your book, so I can't say for certain whether what you say means what I think you mean or whether it means what you say it means (ie humpty dumpty principle applies :-).

Quickest answer is probably to point at few files and see whether you think they match your expectations:

Large: Whiteboard application

EITParsing, ChannelTranscoder, repeatingFile, & main part of this probably match:
from this 'record & transcode everything' program.

There's also this speak and write tool which has a fairly large declaration of the system at the end. Which is also described on the kamaelia website in a little more detail. (it's a tool for helping children learn to read and write based on gesture recognition and speech synthesis)

This shows something rather different - since we can effectively change subsystems of components, if designed for that - which shows a different form of declarative system creation: the core code from a greylisting proxy

All in all, I think the answer is "yes", but without reading your book and seeing what you call "the pillars of multi-agent dataflow" I can't really say yes or no.

To me the approach of having something you need for orchestration/composition (which happens to be declarative) I personally view as common sense and pretty common in the hardware & unix shell worlds.

We've been using this approach now for probably 5 years or so and it seems like a nice model, though there are aspects that would be nice to improve (primarily in the area of shards). Overall though it's more granular than occam or hardware, more fine grained than unix shell level, and retaining the clarity of python. Novices and experts seem able to pick it up nicely (novices seem to find it easier - probably because they have less preconceptions), which is nice - having tested that.

BTW, should I have heard of your book?

### CTM

BTW, should I have heard of your book?
CTM has been mentioned quite a bit here on LtU in the past, and we seem to have evolved an implicit assumption that people reading LtU are familiar with it. That said, I notice that you haven't been a member here for very long, so it's probable that you've not seen previous discussion of Peter's book. You can probably get a good feel for the content by looking back over previous discussions of CTM (via the link above), which might help you decide whether or not you think the book is of interest to you.

### Many thanks :-)

Cheers. It's been a while since I've thought about reading an undergrad CS book. I'll see if I can pick up a copy somewhere. Best way to judge after all is to judge for yourself.

I occasionally read random threads on Lambda the Ultimate when someone points at a good discussion here, but I'd not seen CTM mentioned (probably gives a good indication of how often I look here). I'm generally too tied up with work/home/the usual things to allow myself the luxury of popping by here let alone saying anything!

I thought I'd say something this time because we've used this approach (slightly different names for things I think) very successfully in a number of different systems to the extent that in many cases it's now easier for us to implement a naturally concurrent system rather than try to jump through hoops to make a system into a something sequential. (ie its often quicker to build a correct concurrent system this way than it is to build its sequential counterpart. Not always, but often. Wiki type systems for example don't fit this model so far AFAICT for example)

I've tried a bunch of different metaphors out for describing this - primarily aimed at making the area more accessible to the average developer and the most compelling one I've found is calling it "sociable software". ie making software more sociable.

After all, in concurrent systems, one thing is asking another thing to do something, and there's nice ways (sending a message - asking politely) and nasty ways (interrupting their state - calling a random method without checking its safe). One is more sociable than the other. I think that particular metaphor needs a bit of work, but this presentation "sociable software" at Barcamp Manchester did go down pretty well.

There's a reference to Manchester University's async hardware stuff that I worked on briefly which is partly what inspired (some time later) the approach we took in Kamaelia.

Anyway, hopefully I'm not made too much of a faux pas by not knowing the implicit assumptions of this community :-)

(It's my experience that academia assumes the world revolves what it knows, and that industry assumes the world revolves around it, and the same for any groups in between :-)

That said, since CTM is an implicit assumption on the part of this community though, I guess Peter may be upset by someone not knowing about it, so apologies for that.

### Confusion

When I see the phrase "Wiki type systems", I immediately wonder what a typed Wiki could possibly be. (Monadic pages? Contribution combinators? Co-recursive spam?) Of course you mean "Wiki-type systems", systems that are Wiki-like. But I was excited for a minute.

### A Wiki type system would,

A Wiki type system would, perhaps, support objects or object-descriptions as pages, complete with attributes that can be imported and accessed by other pages. Imagine a massive multi-programmer Smalltalk IDE (where anyone can edit, and where each page name is a potential object reference).

If the objects were user-edited, such a type system would likely be dynamic rather than static... though, with enough dependency-tracking, I suppose one could 'recompile' affected components after an update (or invalidate any JIT compilation).

I've been chewing on a similar idea - something like a Wiki-based IDE (and not just for mashups like QEDWiki). You could look at a topic on C2 for a little on it.

The June 2003 draft of CTM is still available for free on the Internet. But be warned that the draft has some errors: don't flame me if you find errors in it. The published version has a huge number of corrections and small improvements. I accept flames for errors in the published version.

Chapter 4 defines declarative concurrency. Chapter 5 defines message-passing concurrency, but many of the examples actually use the multi-agent dataflow paradigm. For example, two techniques are given for avoiding deadlocks with active objects doing callbacks: one using threads and one using continuations. The thread solution uses a property of declarative concurrency: the expression

Z=X+Y

can be replaced by

thread Z=X+Y end 

which removes the deadlock cleanly. It works because Z=X+Y is declarative. In the declarative concurrent model, adding threads only changes calculation order, not results of calculations.

### Thanks!

I think what you're talking about is more fine grained. I'll buy a copy of the book and once I've read the appropriate bit give some feedback. I think however, from what I'm seeing, our model validates yours (even if, maybe, not exactly the same - but they do sound close).

Perhaps worth noting is the key difference between this and something like the actor model is that by *not* defaulting to knowing who the receiver is you gain reusability. Indeed, you gain reusability of not just components, but also patterns of concurrency.

I'll defer further comment until I've looked at your book. Many thanks for the pointer to the PDF (until the paper copy arrives :).

Oh, incidentally, this MiniAxon tutorial was written originally for a (specific) pre-university trainee with about a week's development experience, and has been tweaked since then. Pretty much every student we've had through since then - either via trainees, or via google summer of code - or other routes - has been able to pick it up and build interesting things. They end up focussing on the problem they want to solve, rather than worrying about concurrency. (In a way reminiscient of people picking up a language with garbage collection vs one without)

(please note, the miniaxon tutorial is logically compatible with our core system, and components built for a miniaxon will work with mainline axon, but axon itself is implemented somewhat differently from the tutorial)

Picking up on the observable port object, being able to look inside running systems is a real boost - which we can do using the Axon Visualiser .

As a result, I think our two models are either the same or compatible.
The neat thing about this is that this means both of us have reinvented MASCOT, from different perspectives, which is over 30 years old - something I didn't hear about until last Christmas.

### Kamaelia does not directly support declarative concurrency.

I have read (most of) your book, and I've read enough about Kamaelia to be reasonably confident in asserting that it doesn't provide any language features supporting the expression of declarative concurrency.

Declarative Concurrency promises determinism (or, more accurately, "no observable non-determinism" in the outputs and results of the program). The examples in the CTM book usually achieve this by means of passing delayed 'tails' and other promises for data bindings (upon which a thread will implicitly wait) to different threads, with single assignment (each binding only ever receiving just one value). This fits very well into the 'functional building blocks as concurrency patterns' topic.

Kamaelia is comparable to the Actor Model where components are, by default, communicating in a largely asynchronous fashion. Whenever two different components fan-in (send messages) to one component, especially if to a single inbox, one will introduce potential and observable non-determinism in the result.

Through careful effort or discipline by the programmer (e.g. always waiting indefinitely on an empty inbox (no polling) + never checking more than one inbox at a time (no multi-wait) + never having two different components send messages to the same inbox (no sharing)), a programmer could ensure a deterministic result. But such self-discipline is not a reason to say that Kamaelia supports declarative concurrency any more than one can say that C supports functional programming.

### Cheers

I thought the humpty dumpty principle applied, which is why I was being cautious in my answer. Thanks for clearing that up.

*relurks*

### Do dataflow variables have to be primitive?

You could implement dataflow variables as a data structure, like refs/cells in some languages. For example, like x = ref(4); x := 5 you could do x = dvar(); unify(x,y). Is a library that implements dataflow variables and ports sufficient, or would it be too cumbersome?

### By completeness, no...

... one would generally expect they don't have to be primitive. However, when it comes to fitting concurrency into the functional paradigm (especially if it is to be 'pure' functional excepting the use of ports), I'm certain it helps quite a bit.

### Not possible as library

Jules: You could implement dataflow variables as a data structure

Unfortunately, no. The essential idea about data flow variables like the ones described by Peter is that synchronisation on them is implicit - you can pass them around in place of "real" data and any thread that needs to look at them automatically suspends until the value is available. This makes parallelism integrate seamlessly with the rest of the language.

In terms of typing this means that every type is inhabited by data flow variables. This is what e.g. logic variables in Oz or futures in Alice ML give you. Unfortunately then, all language primitives and the entire runtime system have to be aware of them[*]. A library approach would necessarily introduce a separate type for data flow variables, and you need a dereference operation to get to its content - which essentially is an explicit synchronisation operator.

[*] Having worked on and with the implementations of these languages, I have to disagree with Peter's claim that "the paradigm is easy to implement efficiently". I guess it depends on how you define "efficient", but making a language with data flow variables as efficient as a conventional sequential language is at least as hard as doing that for a lazy language. Neither of the two languages mentioned gets close, despite man years of engineering.

### Conversions and views

A library approach would necessarily introduce a separate type for data flow variables, and you need a dereference operation to get to its content - which essentially is an explicit synchronisation operator.

A language with a strong notion of user definable conversions might be able to do it pretty cleanly. In such a language, "add" wouldn't take two numbers but two values that can automatically be converted to numbers. The user would specify that a "Future a" is convertible to an "a" via a dereference. And voila, you could add two Future Num's or a Future Num with a Num.

Scala implicit conversions and view bounds are more or less what I'm thinking, but the standard library isn't generally geared towards view bounds thus conversions tend to happen before application. So your objection still stands in Scala.

### That doesn't work...

...unless you make every function view-bounded over every individual argument, including every trivial function the programmer ever writes. If you go that far you can as well just wrap everything into type Future. Both approaches are whole program transformations and I don't consider either practical (even ignoring the likely devastating performance implications).

The only modular approach would be to define Future t to be a subtype of t for all t. But to make that fly the language already needs to be fully subtyping-polymorphic in all its primitives, which AFAICS is even harder to implement efficiently than just hardwiring the special case of futures.

### Some thoughts

unless you make every function view-bounded over every individual argument,

Yup. I'm just blue skying here. Imagine that if a function requires a Number as an argument then it could leave it up to callers to specify how to convert actual arguments appropriately. It seems that at most the cost shouldn't be much more than working with dictionary passing (in say GHC type classes) times the cost of the user's conversion code. In fact, a bit of strictness analysis could allow the compiler to indicate when conversion can take place before function application and the compiler could statically elide most such conversion function passing. E.g. '+' is strict in both arguments, so a caller could be forced to do the conversion before calling it. But if the caller was also using strict numeric arguments then it might not have to do any conversion either. Only non-strict arguments would need to defer conversion until use.

Hmmm, on second thought that kind of analysis might destroy the whole purpose of using futures to begin with. That needs some more thought.

### Primitives

Note that by "primitives" I did refer not only to basic functions like addition and friends, but to all operations in the languages, including basic stuff like function application, field access etc. You would need to be able to "override" the behaviour of all of them, which makes it much more costly than Haskell-style overloading. As a particularly pathological example, there can well be futures of type unit even, so you'd need to be able to override the behaviour of matching unit values (which aren't even represented usually).

(As an aside, the subtyping idea also seems inherently incompatible with features like sealed classes and final methods etc.)

### Implementing the multi-agent dataflow paradigm efficiently

Having worked on and with the implementations of these languages, I have to disagree with Peter's claim that "the paradigm is easy to implement efficiently".

My comment is from my experience with Prolog, which can be compiled down to code similar in efficiency to a good low-level C-like language (see the Aquarius and Parma systems), and from my knowledge of KLIC, a concurrent logic language with similar C-like efficiency. No serious compiler work has been done for Oz. The Mozart compiler does a straightforward translation to a byte-coded emulator. So we cannot judge the efficiency of the multi-agent dataflow paradigm by the efficiency of Mozart.

For Alice ML, I am thinking of OCaml, which has very good implementations that give C-like efficiencies. I am also thinking of the compilation of dataflow languages, e.g., David Culler's work, in which a language that is pervasively concurrent (all operations implicitly execute in a separate thread, which Oz does not do BTW) is compiled efficiently by using dataflow analysis to partition it into sequential parts. Maybe you can comment on whether the techniques of these languages can be applied to Alice ML?

### Easy implementation?

My comment is from my experience with Prolog, which can be compiled down to code similar in efficiency to a good low-level C-like language (see the Aquarius and Parma systems), and from my knowledge of KLIC, a concurrent logic language with similar C-like efficiency.

Not being terribly familiar with these, I cannot comment. Do they qualify as "easy" implementations?

No serious compiler work has been done for Oz.

Indeed. But if data flow synchronisation was "easy" to implement, no "serious" work should be necessary for it.

On the other hand, the compiler is only half the story anyway. And lots of respective effort has been put into the Oz runtime, which is at least as important, because that's what you generally have to fall back to.

For Alice ML, I am thinking of OCaml, which has very good implementations that give C-like efficiencies.

OCaml does not have data flow synchronisation, so I don't see what it is supposed to demonstrate in this respect.

Alice ML certainly tries to follow similar implementation ideas where possible, but the presence of futures invalidates many of them. In particular, they rule out many standard optimisations, or at least make them much more complicated.

I am also thinking of the compilation of dataflow languages, e.g., David Culler's work, in which a language that is pervasively concurrent (all operations implicitly execute in a separate thread, which Oz does not do BTW) is compiled efficiently by using dataflow analysis to partition it into sequential parts. Maybe you can comment on whether the techniques of these languages can be applied to Alice ML?

I don't really know. I am convinced, though, that good optimisation of data flow variables requires advanced flow and strictness analysis. But that can only ever address intra module paths. All these techniques break down as soon as it comes to separate compilation. I consider this very much a killer, especially in languages like Oz or ML that encourage fine-grained and dynamic modularisation.

The only way around that is JIT compilation with dynamic analysis and potential re-jitting. We did some of that in Alice ML, and it helped quite a bit, but still it is far from competitive to OCaml. And it certainly is not an easy approach either.

(Aside: "pervasive concurrency" was in Oz 1, but was deemed a major mistake.)

### Syntax for dataflow programming

I've been working on an undergraduate senior thesis this year with the goal of designing a language, call Curin, that incorporates many of the ideas described in this post. So far, most of our time has been spent researching what syntax would work well for a dataflow language.

We are currently using a syntax that looks somewhat similar to that of Unix shells. In particular, input can be passed to a function from either the left or right side. The advantage of this is that the input and output of multiple functions can be connected without needing nested parenthesis. For example: 0 | f 1 | g 2 | h 3' compared to h(g(f(0, 1), 2), 3)'.

If anyone is interested in taking a look at our work, all of our code is available from our subversion server. The quick reference and sample grammar are probably the more interesting of the things we have done so far.

If anyone knows of any similar languages or has any suggestion on how to compile to a low level representation I would be interested in hearing about them.

### Synchronous dataflow programming

This looks quite closely related to the Lucid Synchrone or Reactive ML languages.

### Rhope

I'm not sure how similar my language, Rhope is to what you're working on (dataflow behavior is currently limited to local variables only unfortunately, though that will change), but I spent a bit of time trying to find a good textual representation for what used to be a purely visual dataflow language so perhaps it will be useful to you.

Parameters can be passed in a more or less traditional fashion with the exception of a few oddities that don't have much to do with the dataflow aspects (for instance, Do Stuff[1,2,3,4] is equivalent to [1,2]Do Stuff[3,4] and [1,2,3,4]Do Stuff). The part that I think is somewhat novel (at the very least I haven't seen it before) is it's notation for denoting dependencies out side of simple parameter passing. For instance, conditional evaluation looks something like this in Rhope:

If[<some boolean expression>]
{
Do Something[]
}{
Do Something Else[]
}

In the preceding code, each output of the If sub-routine (it has two) is mapped to one of the pairs of brackets that follow it. All code within the brackets has a dependency on the corresponding output of If. So Do Something[] will be executed if and when the first output of that call to If is populated and Do Something Else[] will be executed if and when the second output is populated.

If the data from the output is needed the ~ character can be used to refer to it within the corresponding block or the outputs can be assigned to local dataflow variables. For instance, the following two snippets both produce the same dataflow graph:

val1,val2 <- Make Some Values[1,2,3]
Do Something[val1,val2]

val1 <- Make Some Values[1,2,3] {}
{ Do Something[val1, ~] }


and likewise these two snippets:

f[g[3]]

g[3] { f[~] }`

In practice, this syntax has worked out okay for the most part. Biggest problems in my experience working with the language are the excess of brackets in heavily nested calls, especially when using infix notation and that ~ can only refer to the current output block when in many cases it would be convenient to refer to one (or more) level(s) up.

I don't have an efficient implementation of Rhope yet, so I can't offer much help on the compilation front. I've just begun work on a native code compiler, but it will be a while before I have something working.