Erlang concurrency: why asynchronious messages?

The title says it all. Asynchronious messages are a very nice solution when you're sending a command (send it and forget it, as opposed to twice the time required waiting for a response). But what about querying for information? In this case you're forced to fake synchronious messanging by maintaining information about your request (either in a hashmap, or in a message itself). Of course continuations are also a nice solution for faking synchronious messages.

It's pretty easy to see how message passing and special scheduling are great paradigms for the type of problems that Erlang was designed to solve. Not so easy for asynchronious messages, though. Can anyone comment on why Erlang designers went with asynchronious messanging? Is it because it's a more general feature and synchronious messages can be implemented on top of them?

Comment viewing options

O'Caml, too

I've wondered the same thing about O'Caml's Thread module. When creating a thread, the thread can return any value it wants but it will be discarded. Even though there is a function to wait for the termination of a thread, it returns nothing. I've often thought that this convention might be due to tradition (I know at least Java and Python work similarly).

Async is king

Asynchronous messaging is inherently how computer networks (e.g. the internet) work so it's an extremely important way of thinking.

For the answer to your question I would suggest asking the erlang-questions mailing list. That's where Erlang's designers hang out.

it's an extremely important

it's an extremely important way of thinking

Sure is. But so is reasoning about synchronous IPC.

It's interesting that Ada which was intended for somewhat similar types of software as erlang uses synchronous message passing (the rendezvous model).

CSP

IIRC, Ada's rendezvous model was somewhat influenced by Hoare's work on CSP. It's interesting to return to the source, so to speak, and see what Hoare had to say about synchronous vs. asynchronous messaging in CSP. From Communicating Sequential Processes:

For many years now, I have chosen to take unbuffered (synchronised) communication as basic. My reasons have been
1. It matches closely a physical realisation on wires which connect processing agents. Such wires cannot store messages.
2. It matches closely the effect of calling and returning from subroutines within a single processor, copying the values of the parameters and the results.
3. When buffering is wanted, it can be implemented simply as a process; and the degree of buffering can be precisely controlled by the programmer.

Hoare also lists several other reasons for preferring unbuffered communications as a primitive in another section. These reasons include: problems with deadlock situations when using finite-length buffers; implementation difficulties with infinite buffers; problems with the mathematical tractability of infinite buffers. On the other hand, Hoare also goes on to point out that

Of course, none of these arguments carry absolute conviction. For example, if buffered communication were taken as primitive, this would make no logical difference in the common case of alternation of subroutine call and return; and synchronisation can be achieved in all other cases by following every output by input of an acknowledgement, and every input by output of an acknowledgement.

Since sychronous (rendezvous) message-passing can emulate asynchronous message-passing, and vice versa, the issue isn't so much which is better or more primitive, but rather which makes more sense as the default message-passing paradigm for a particular language. Occam went with synchronous messaging, partly because it was influenced by CSP, and partly because it was primarily intended to be used in tight clusters of transputers communicating through reliable unbuffered links. OTOH, Erlang was designed for use in distributed switching systems with potentially faulty nodes, so it's not all that suprising that asynchronous message-passing made more sense as a default.

For the record, Ada95 added to something called "protected objects" to the language. These are extneded monitors, and moved Ada beyond CSP. Buffers, for example, can be implemented as protected objects. More here.

That's silly. Modifying

That's silly. Modifying registers is inherently how our computers work but we've managed to develop abstractions over this so that we're freed from the burden of doing this. Same applies to messanging.

Isn't async the abstraction?

Isn't asynchronous communication the abstraction here? I mean, the network is naturally synchronous (unbuffered). When I send a message down the wire to another machine, the machine actually physically has to pick that message up and do something with it. In asynchronous communication we abstract that process away: we don't worry about having to stop and listen to get a message, the runtime takes care of receiving and buffering messages for us until we need them.

I think Luke's point wasn't that asynchronous communication matched the actual physical nature of networks (e.g. the Internet), but that it is the model of many networks (especially the Internet). Now, I'm no network guy, so I can't verify this as true, but I believe that was his point.

Abstract asynchrony

I'm starting to detect a terminological quagmire here (I'm fond of pointing these out ;-) )

The simplest definition of an asynchronous message, pace Hoare, is "a message which doesn't require a response". Buffering is a technique for preventing messages from dropping on the floor on the other end, but your problem may or may not require that solution, and it certainly isn't required for asynchronous communication by definition.

As to what is more abstract, my own preference in technical contexts is NOT to define abstract as an opposite to concrete, as is often the case in more casual parlance, but rather as an opposite to particular.

The more abstract concept is the one that is parameterized over the widest ranging inputs (or similarly, the highest order inputs). Within this definition, asynchronous communication is less than or equally abstract to synchronous: it requires only a channel and a message to send down that channel.

Synchronous communication requires both of these elements and also some notion of a response channel and response message.

(If you want to create a definition of abstract that made more inputs more abstract than less inputs, I could live with that in some contexts, but over all, it seems to me that both ideas are at the same level of abstraction for my definitional taste.)

Which choice is the better default is largely a question of your default assumptions about your problem domain: some problems are naturally synchronous, some more naturally aren't.

Ah.

The simplest definition of an asynchronous message, pace Hoare, is "a message which doesn't require a response". Buffering is a technique for preventing messages from dropping on the floor on the other end, but your problem may or may not require that solution, and it certainly isn't required for asynchronous communication by definition.

Ah, okay, I get it now. That makes a lot more sense.

I wouldn't say the network is synchronous

The TCP protocol specifically takes into account the fact that messages may arrive at the destination server in a different order than they were sent. The UDP protocol is send and pray.

Synchronous network?

Not so much....

Let's say you have a 1Gbps network and 1.5GHz machine executing an instruction every two cycles. Sending a 64-byte message is something like 512ns or the equivalent of 256 instructions. That would be a pretty long delay itself, even neglecting transit time for the message and any extra delays introduced by hardware protocols.

Given most modern hardware, you're looking at asynchronous behavior even before the message leaves the host.

Building Blocks

I think an important consideration is which option) makes it easier to build reliable concurrency abstractions that can be used as building blocks.

You might be intrested in this old LtU thread.

For one thing, Erlang was designed for telecoms, and many of the protocols used in telecoms are asynchronous. Even the protocols with synchronous semantics (such as Diameter) have asynchronous underpinnings. One thing one notices when programming in e.g. Erlang is that there are many different forms of synchronous communication, and finding a single primitive that meets all requirements is difficult. Consider the simple case of sending a message to another "node". This normally (in Erlang) involves possibly establishing a TCP connection between the nodes and then sending the message. Should the sender be suspended while nodes are handshaking? This is sometimes desirable, but other times disastrous. Erlang now has a send_nosuspend() function to allow the programmer to choose between basically asynchronous and truly asynchronous. :-)

The work that led to Erlang included programming experiments in a number of languages. This was refered to as the POTS experiments (POTS = Plain Old Telephony System). One of the languages used was Ada, and the Ada rendezvous led to problems, esp in a distributed setting. Asynchronous message passing seemed like a more flexible approach.

It should be noted that the combination of asynchonous message passing and selective receive makes this "tradeoff" of rolling your own synchronous dialogues quite easy to bear. Selective receive with implicit buffering allows a synchronous dialogue to be completely(*) encapsulated within a function call, making it ideal for a library function. This is more difficult to achieve in event-based programming. Erlang programmers use synchronous dialogues quite frequently, usually through the gen_server behaviour. You simply call the function gen_server:call(Server, Request), and the function returns the response from the server.

(*) ... for practical purposes. One may object that Erlang's single message queue per process does not amount to complete encapsulation, but in practice, pattern matching on tagged messages in a single message queue offers an excellent compromise, IMO.

A simple form of parallel evaluation can be illustrated with the following two commands in an Erlang shell:
 Eshell V5.4.8 (abort with ^G) 1> Pids = [spawn(fun() -> receive {From,get} -> From ! {self(),I} end end) || I <- lists:seq(1,10)]. [<0.32.0>, <0.33.0>, <0.34.0>, <0.35.0>, <0.36.0>, <0.37.0>, <0.38.0>, <0.39.0>, <0.40.0>, <0.41.0>] 2> [begin P ! {self(),get}, receive {P,I} -> I+1 end end || P <- Pids]. [2,3,4,5,6,7,8,9,10,11] 

This was evaluated on a Windows machine with one cpu, and the scheduler is predictable enough that the results always arrive in the same order, but with the SMP version of erlang on a multi-core machine, the "evaluation" would be truly parallel. Not that this would be an especially efficient way of incrementing a list of integers, but you get the picture.

By the way, a faily obscure

By the way, a faily obscure and unknown part of Ada is optional (but standard) support for distributed programming (usually called Annex E in reference to its location in the langaue definition).

Isolation

Quoting from Joe Armstrong's thesis:

Isolation implies that message passing is asynchronous. If process communication is synchronous then a software error in the receiver of a message could indefinitely block the sender of the message destroying the property of isolation.
The idea is that asynchronous messaging promotes a robust design in which dropped messages don't cause system failure. In languages that use synchronous message-passing, such as occam, you'll sometimes find designs that insert an "overwriting buffer" (effectively emulating asynchronous message-passing) between processes that need to be isolated.

That's waht I meant earlier

That's waht I meant earlier when I wrote about "reliable" components.

I think it was a matter of

I think it was a matter of necessity: asynchronous messaging is what most telecom apps need.

In the 6 years I have been working on defense applications, I've never met a need for synchronous messaging.

Orthogonality

If you don't mind the plug...

In Alice ML (the same holds for Oz) we chose to make "message passing" and asynchronicity orthogonal features, using the concept of futures and "functional threads". This makes the schisma between synchronous and asynchronous message passing more or less disappear.

Message passing comes in the form of RPCs. If f is a so-called proxy function (an RPC wrapper), then

f(args)

performs an RPC - synchronously. If we want to make it asynchronous, we simply use a thread:

spawn f(args)

Here, "spawn e" is an expression that initiates evaluation of e in a new (light-weight) thread. It immediately returns a future of the result, which will be replaced by the proper result once it is available.

This covers both synchronous and asynchronous invocations. But unlike most other approaches, futures also provide middle ground: you can ignore the future created by spawn, giving you conventional asynchronous calls, but you can also use it, giving you asynchronous calls with results:

val x = spawn f(args)

Data flow synchronisation ensures that you synchronise implicitly if x is needed before being available - but unlike with synchronous calls, synchronisation now happens at the latest possible time, thus maximising concurrency and hiding network latency as much as possible.

In this approach, synchronicity and asynchronicity (and middle-ground) can easily be chosen on a per-call basis. We can also employ asynchronicity independent from message passing, which sometimes is useful.

Concurrent ML

Reppy's book Concurrent Programming in ML discusses the chose between asynchronous and synchronous messages, and motivates Concurrent ML's choice of the latter in terms of the value of common knowledge; when message passing is synchronous, both threads know that they managed to synchronize.

IIRC, the book also points out that both synchronous and asynchronous message passing can be used to implement the other style (and provides examples of implementing asynchronous message passing mechanisms on the basis of synchronous ones; the sender just spawns a thread to send the message).

CML and Erlang

I did some googling a while back trying to correlate the work on Concurrent ML with that on Erlang. I got the distinct impression that much of the work was carried out roughly in parallel, but with different input requirements and very little (if any) communication between the two camps.

The goal for the Erlang team was basically: "make something like PLEX (the programming language of Ericsson's legacy phone switches) - only better". One of the important goals for Concurrent ML was to make the concurrency constructs as decidable as everything else in ML.

Like Hoare wrote, it's easier to conduct strict reasoning about synchronous communication. This was also the conclusion Nordlander et al reached with OHaskell. And I recall from projects with Thomas Arts and Clara Benac Earle, model checking Erlang via uCRL, how unbounded buffers and asynchronous messaging tends to complicate formal reasoning.

OTOH, the products we build with Erlang tend to be too complex to reason about formally anyway, and constructs that (albeit perhaps theoretically flawed) seem to work well in practice become quite attractive. Our systems don't need to be fully decidable - but need to fulfill performance requirements with a certain probability. This allows us to use techniques that may be (almost) beyond the reach of formal reasoning, but subjectively simplify programming and human understanding of our specific type of product. In mission-critical hard-realtime systems, this approach would be questionable, to say the least.

OT

Welcome Ulf! Long time no see :-)

For one semi-formal treatment of asynchronous message passing, see Elements of Network Protocol Design, by Mohamed G. Gouda*. An old, somewhat out of date, introduction is Protocol Verification Made Simple.

I find myself intuitively dubious about it being "easier to conduct strict reasoning about synchronous communication," but I am interested in network protocols and synchronous communications simply do not cover the general case.

* Disclaimer: Mohamed was my advisor. But I don't get any money from the book.

[On edit: better title :-) ]

Asynchronous for the internet

... and motivates Concurrent ML's choice of the latter in terms of the value of common knowledge; when message passing is synchronous, both threads know that they managed to synchronize.

This can be problematic in distributed applications because different nodes will naturally see events in a different order. To create the rendezvous abstraction it seems like you'd need to do a lot of tricky implementation or impose limitations to make the protocol deterministic (e.g. strict request/response pairing). Has this been studied?

For an example the first few versions of SLIME's network protocol were built on CSP and this led us to a dead end. Any time the protocol allowed both Emacs and Lisp to initiate a transition we would lose synchronization (e.g. when Emacs sends a request and Lisp simultaneously enters the debugger to handle a unix signal). This was a big problem for a long time until we saw the light and switched to a simple asynchronous protocol. Sad because the original design was more beautiful -- it just couldn't handle every case.

NB: I like CSP/CML/Ada. I'm just suggesting that asynchronous messaging is a better match for internet protocols.

network rendezvous

To create the rendezvous abstraction it seems like you'd need to do a lot of tricky implementation or impose limitations to make the protocol deterministic (e.g. strict request/response pairing). Has this been studied?

I imagine it's probably been studied by a number of different groups. However the only one I can think of off the top of my head is Mario Schweigler's work on transparent networking for occam. The goal of Schweigler's work was to retain occam's CSP-like synchronization semantics when using "channels" that connect different nodes on a network. It does end up requiring a bit of complexity in the implementation, but all of that is apparently pretty much transparent to the user (I haven't tried it, but have heard good reports from others). The result is that you can use networked channels just like any other occam channel.

As for whether synchronous or asynchronous messaging is a better match for internet protocols, I suspect it depends on the application. After all, there's a reason that we have both UDP and TCP.

Well

Hey! UDP and TCP both provide asynchronous communication. :-)

It does occur though that most common internet protocols could probably be modelled neatly in CSP. For example in HTTP and SMTP I can't think of any important situations where both server and client are able to initiate a new event at the same time. In these cases there's no risk of losing synchronization so the synchronous and asynchronous designs produce the same program anyway. But it seems to me that introducing one "can occur at any time" state transition will wreck havoc on a synchronous design :-)

I read Mario Schweigler's paper but I didn't understand how they create their common ordering of events on different nodes. Can someone explain?

[Removed unnecessary digression.]

I do not think that word means what you think it means

Hey! UDP and TCP both provide asynchronous communication. :-)

Yes, but with different qualities (ordering guarantees and the like). You can see asynch vs. synch as a similar question of choosing the appropriate guarantees for a particular application.
But it seems to me that introducing one "can occur at any time" state transition will wreck havoc on a synchronous design :-)
That hasn't been my experience with occam. It's not uncommon to have process networks in which a single process is potentially communicating with several other proceses - any one of which might send it a message that changes its state. But perhaps there's been something of a misunderstanding here. Synchronous message-passing doesn't guarantee that whatever state machine you're communicating with is in a particular state, it simply guarantees that when you managed to send your message (a) the receiving state machine was in a state to receive it, and (b) the message was actually received. It may still be necessary to design your processes such that they don't make unwarranted assumptions about the state of other processes with which they're interacting.
I read Mario Schweigler's paper but I didn't understand how they create their common ordering of events on different nodes. Can someone explain?
The way I read it, events within a given channel are ordered. I didn't see any mention of guarantees beyond that. Is this question perhaps this is related to the possible misunderstanding I mentioned above?

Example

Suppose you have a protocol like this:

TwoWayRPC = client-send-request -> client-get-response -> TwoWayRPC
| server-send-request -> server-get-response -> TwoWayRPC


My question is, what happens if both the server and the client immediately decide to send a request? I'd assume they can both initiate their request events since they both think the other process is in the initial state. But client-send-request -> server-send-request -> ... would be a malformed trace, so how do you break the tie?

NB: I have not programmed in Occam so I may be missing something obvious. I'm thinking of the property of CSP that the stream of events is identical for all processes.

Possible solutions

Given the situation you have described, what you would end up with is deadlock. Both processes would be blocked waiting to complete their send, and neither would be in a state to receive a request. There are several possible solutions to this problem, including:

• Redesign the system to be a true client/server design (i.e. the server cannot make requests, only respond to them).
• Insert small buffering processes (occam supports extremely lightweight processes) between the client and server, allowing the sends to take place without blocking. Following a send the client and server could then use occam's ALT construct to choose between receiving a response or a request. This solution effectively introduces a limited amount of asynchrony into the message-passing.
• Separate the requesting and responding tasks into parallel processes, allowing requests to be both made and responded to simultaneously.
• Add a third, arbitrating process to the system. Both client and server make their requests to this arbitrator, which then decides which request will actually get processed.
There are probably other solutions as well. Which one is most appropriate will obviously depend on the context in which the design is being used.

OK

OK but this strikes me as a rather asynchronous way of thinking. I had expected there to be some protocol in the style of two-phase commit to break ties enforce CSP's universal ordering on events.

Horses for courses

OK but this strikes me as a rather asynchronous way of thinking.

Asynchrony is inherent in the problem, as you defined it. If both communicating processes are capable of nondeterministically initiating a communication, then they are both capable of reaching states where it doesn't make sense for them to synchronize. The design problem is then to find a way to move both processes back to states in which synchronization again makes sense. You would presumably face the dual problem in a language like Erlang: no extra effort is required for asynchrony, but anytime you have a problem in which you want to ensure some kind of synchronization between different nodes (say a guarantee that some particular message has been received and acted upon) you will need to add extra infrastructure. As I said elsewhere in this thread, which default makes the most sense depends on the application area you are working in. Considering how popular and widely used (especially in Europe) occam was in its heyday, I'd say its safe to assume that the synchronous model made sense for the application areas in which it was used (AFAIK, primarily embedded systems in the automotive and aerospace domains).
I had expected there to be some protocol in the style of two-phase commit to break ties enforce CSP's universal ordering on events.
That's effectively what the arbitrator solution gives you. But if you were looking for something built into the language, then no, AFAIK, there isn't anything (although I'm far from an expert on occam, so there may be something I don't know about). In CSP, one might specify both client and server as making an "external choice" between sending or receiving a request - the resulting system would "break the tie" nondeterministically, as a result of the way CSP's external choice operator is defined. But, unlike CSP, occam doesn't allow choice to be guarded by a message send (only a message receive). Hence the need for a little extra infrastructure.

A digression....

Luke, you have raised an issue that has been bothering me for a while regarding the relationship between Gouda's AP (that I mentioned earlier) and pi calculus-ish formalisms.

Your TwoWayRPC example, in hopefully good pseudo-Erlang:

TwoWay(p) ->
after 0 -> p!Request,
end
end.


(That is, using two processes each with the other's name as p.)

This system will deadlock if both processes send Requests simultaneously.

The roughly equivalent example in AP:

process p
begin
rcv Request from q -> send Reply to q
[] true -> send Request to q  # true here is similar to "after 0"
[] rcv Reply from q -> skip
end


(To get executable code similar to the Erlang, there's a bit more work to do, but it would be behaviorally identical to this.)

The two examples seem fairly similar, but the second will not deadlock, among other behavioral differences. The primary difference between the pi calculus and related languages and AP seems to be that the former allow "recursive" receives---like that receive after sending the request.

The ability to completely encapsulate communications, as allowed by Erlang-ish receives, is nice. But it seems to make the system significantly harder to reason about. What's up with that?

HTTP

For HTTP < 1.0, a simplistic CSP model might be adequate. For 1.1, you are looking at keepalives and streaming requests.

Suppose a client requests a large html page, and the server sends it. As soon as the client starts receiving the page, it begins parsing it and streaming requests for embedded elements (images, say) of the page. The server can stream replies for those elements, including breaking up each element into multiple chunks.

The single-step, request alternating with reply, leads to unacceptable performance and lovely tips like, "use (an empirically determined) 4 connections/requests per server".

I'd love to get the time and opportunity to learn and use KRoC type stuff, some day...

Seems like the various mailing lists and web sites related to Occam/CSP/KRoC are pretty dead. The only thing that looked alive was the upcoming '07 meeting. Is it dead Jim? [And JCSP looks quite dead.]

While reading the KRoC.net paper, I couldn't find (maybe I missed it) how they address the standard Fallacies of Distributed Computing. They kept saying they wanted networked end-points to look the same as local ones, which just seems nutty (although I am not an expert nor play one on TV).

Not as far as I know

Traffic on the occam and jcsp mailing lists tends to ebb and flow. You're correct that there hasn't been much going on lately on either list, but I don't see that as reason for concern.

As for KRoC.net, my understanding is that it's now been folded into the main KRoC codebase. KRoC itself is perhaps not progressing as quickly as it used to, since I believe that Fred Barnes (the primary developer) is now focusing most of his efforts on nocc, the "new occam compiler".

Somewhat off-topic article on leaky abstractions

I thought of this article on Joel on Software during the above discussion. It's a little rambling, but does discuss TCP a bit.

I think this section in particular explains why you would want to use asynchronous messages on a distributed app: