Erasmus: A Modular Language for Concurrent Programming

A Modular Language for Concurrent Programming, September 2006, Technical Report by Peter Grogono and Brian Shearing.

How will programmers respond to the long-promised concurrency revolution, which now appears both inevitable and imminent? One common answer is "by adding threads to objects". This paper presents an alternative answer that we believe will reduce rather than add complexity to the software of the future. Building on the ideas of an earlier generation, we propose a modern programming language based on message passing. A module cannot invoke a method in another module, but can only send data to it. Modules may be constructed from other modules, thus permitting processes within processes. Our goal is to provide the flexibility and expressiveness of concurrent programming while limiting, as much as possible, the complexity caused by nondeterminism.

The principal innovations reported in the paper derive from bringing together ideas -- some well known, but others almost forgotten -- found in the historical software literature, and combining these ideas to solve problems facing modern software developers. In addition, at least one idea reported here appears to be novel, namely the introduction of an interface hierarchy based not on data elements or methods, but on path expressions, on the actual flow of control within a module. It is more natural to classify components of a process-oriented system by control flow rather than data content.

Another novel feature is the integration of unit tests into the source of each component, thus reducing the possibilities for testing to get out of step with development.

The project home page is here.

Concurrent ML?

we propose a modern programming language based on message passing. A module cannot invoke a method in another module, but can only send data to it

What is the novelty in comparison to CML? Bad that it is not even mentioned in the bibliography...

Is CML really that relevant?

I am not an expert on concurrent languages, but is Concurrent ML really such a noteworthy language in this area that it deserves mention in their bibliography?
On the CML web page http://cml.cs.uchicago.edu/doc.html I find no academic publications.

CML publications

There are actually quite a few publications about Concurrent ML. I don't know why they're not listed on the CML page.

I believe CML was the first

I believe CML was the first concurrent functional language. Perhaps even the first "production language", period, to embed concurrency constructs. I'd definitely say it's noteworthy.

Really?

Does CML date from 1986-88, as Erlang does? But don't worry, Erlang is not mentioned in this article either ;-) They reinvent the wheel and, worse, they act as if the Sumerians never existed.

"Production language" with embedded concurrency constructs

Several "production languages" with embedded (message-passing) concurrency constructs predate CML by a number of years. The ones that immediately leap to mind are:

  • Erlang (initial experiments in 1987, production use in 1988)
  • occam (initial release in 1983)
  • Ada (international standard in 1983)

That's not to say that CML isn't noteworthy. John Reppy did some fantastic work on it, and the language has certainly been influential. But its noteworthiness is not tied to being the first "production language" to include concurrency constructs.

I probably should have been

I probably should have been more precise in my wording. CML was the first statically typed language to include flexible concurrency constructs that I'm aware of. Neither Erlang nor Occam is statically typed, though they provide flexible concurrency. Ada is statically typed, but its rendezvous model has long been considered too limited.

occam's type system

I'm fairly sure that occam has been statically typed since at least version 2 (released in 1986), in addition to statically checking various "parallel usage rules" on channels and variables. Occam 1 may well have been statically typed as well, although the only types it supported were machine words and arrays.

[Edit: Just to be clear, I think that there's no doubt that CML provides some substantially more powerful abstractions than occam. On the other hand, I think occam was, in its day, probably much more widely used in industry than CML.]

Hmm, for some reason I was

Hmm, for some reason I was under the impression that Occam was dynamically typed. I stand corrected!

A common mistake

No worries. It seems to be a common mistake. I'm not sure why. If anything, occam was the antithesis of "dynamic" - it didn't even support dynamically-sized arrays (let alone dynamic process networks) until Fred Barnes and Peter Welch added them to occam-pi. Classical occam was used a lot in the embedded domain, and static allocation was considered a good thing from the perspective of predictability.

CML

I have a great deal of respect for John Reppy's work, which I first encountered in his 1988 paper on synchronous operations as first-class values. I used to include a lecture or two on this paper when teaching ML. Re-reading this paper, I realize that CML choose and Erasmus select are very similar. However, other languages (CSP, occam, Joyce, Ada, ...) have used similar constructs.
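For readers who have not met this family of constructs, here is a minimal sketch of a selective wait. It uses Go's `select` statement as a stand-in; it is neither Erasmus nor CML syntax, and the function name `firstReady` is made up for illustration. One difference worth keeping in mind: CML's choose combines events into a new first-class event value, whereas Go's `select` is a statement.

```go
package main

import "fmt"

// firstReady waits on two channels at once and reports which one
// delivered a value. Exactly one branch of the select is taken.
func firstReady(a, b <-chan string) string {
	select {
	case msg := <-a:
		return "a: " + msg
	case msg := <-b:
		return "b: " + msg
	}
}

func main() {
	a := make(chan string)
	b := make(chan string)

	// Only one sender is started, so the outcome is deterministic here.
	// With senders on both channels the choice would be nondeterministic.
	go func() { b <- "hello" }()

	fmt.Println(firstReady(a, b)) // prints "b: hello"
}
```

When more than one branch is ready, the Go runtime picks one of them pseudo-randomly, which is the kind of nondeterminism these constructs introduce.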

Threads in CML are first-class continuations; Erasmus threads are not.

Joyce

I have been looking for a Joyce implementation (as described in "A Programming Language for Distributed Systems"). Any idea where I can find it, or who might be working with it now?

Any information at all would be appreciated.

Re: Joyce

author apparently passed away [LtU].

a message from the author mentioning occam and joyce.

a related language of his, super pascal, is apparently still available.

but i didn't stumble across joyce itself.

thanks,


 author apparently passed away [LtU].

I have been reading his papers.

I am trying to compile SuperPascal, but have not been successful yet (it seems to depend on extensions in SPARCompiler Pascal, which seems to have been discontinued by Sun).

Yes

Yes, CML is noteworthy and influential. The foldoc page has a nice summary of its features. From those features, one can guess why concurrency implementations in e.g. PLT Scheme and Scheme 48 were based on it.

Standing on the shoulders of many giants

CML is a great language, but it's hardly the only (or the most popular) language based on message-passing concurrency. It's a shame that their bibliography left out CML, but it also left out a lot of other concurrent languages (Alef, Limbo, Erlang, E, SALSA, etc.). What they did cover are the languages that seem most closely related to the goals of Erasmus.

Regarding novelty, although I haven't looked at Erasmus deeply yet, I believe the following are probably the key features:

  • Processes as the fundamental building block of programs, as opposed to using functions or objects (this is similar to occam, but embedded in a language that appears to have a richer feature set than occam).
  • Protocol definitions that act like types for channels, but define message sequencing, and may include pre- and post-conditions. These sound a little like the formal dialogs used in Active C#, but I don't think that dialogs support an inheritance hierarchy the way Erasmus' protocols do.

However, as the Erasmus group freely admits, pieces of what they're doing have been done before in various languages (including CML). The main goal of the Erasmus project seems to be to find a way to integrate all of the good ideas from earlier languages into a coherent whole.

Thanks for posting this Chris - I hadn't come across Erasmus before, but it sounds like an interesting project. I'm looking forward to reading more about it.

Not one Erlang citation

It sounds strange: an article about concurrent programming without a single mention of Erlang. Haven't the authors ever heard of this industrially proven and widely used language? More than half of this article is many years used and verified in practice.

Erlang citation

We do not cite Erlang in the report that Chris linked. We do cite it in the reports of April 2007 and September 2007. The success of Erlang is very encouraging because it demonstrates the feasibility of process-oriented languages. Unlike Erlang and CML, however, Erasmus is not a functional language. I would certainly not deny that "more than half of this article is many years used and verified in practice" — we looked for the good stuff before we started and are still learning!

CML and other languages

The report cited by Chris Diggins does not cite CML, but it probably should have done. However, there are a large number of concurrent languages, and it is hard to know where to stop. In other reports, we have cited Erlang, occam, Active C#, Ada and several other languages.

There is not a great deal that is novel in Erasmus. As Hoare said, the language designer's job is to consolidate rather than innovate. What is of interest to us is not so much a set of features as a way of thinking. Forget functions; forget objects; think processes. It's too early to know whether it will work or not, but we are having fun exploring.

The web page that Chris links to has a note saying "This page is provided mainly for people working on the Erasmus Project. We will put up a more useful page, for general reading, when the project reaches an appropriate level of maturity." Brian and I were not really ready for an open discussion but (to quote Brian) "In a way it's nice that the tops of our heads are finally being seen above the parapets." So, thanks Chris, and fire away!

Peter

Forget functions

Forget functions; forget objects; think processes


Isn't this the same thing the Erlang guys proselytized using the term concurrency oriented programming (COP)?

COP

According to Armstrong's History of Erlang, Erlang started as an "experiment in adding concurrency to Prolog". A "philosophy" developed within the Erlang group, and it is now called COP. COP has been explained in various ways, one of which is "a kind of hybrid language between concurrent languages and functional languages", which is why Erlang is often referred to as a "functional language".

Our ideas certainly have a lot in common with COP. We make no claim to be functional. However, we are finding that there are advantages to reducing the amount of state in processes, which suggests a possible convergence with Erlang.

What is different from the Actor model?

I read the paper carefully, understood it, but failed to understand how the Process model is different from the Actor model. It seems both models are exactly the same, i.e. based on sending messages from one part of a program to the other.

Similarities and differences

Yes, both models are exactly the same: processes sending messages. Analogously, Smalltalk and Java are exactly the same: instances of a hierarchy of classes invoking one another's methods.

A few differences

There are a lot of different models that can be summarized as "sending messages from one part of a program to another". These models differ in exactly how the messages are passed, and what the "parts of the program" look like. To me, the most immediate differences between what Erasmus is doing and the classical Hewitt/Agha Actor model are:

  1. Communication in Erasmus is fundamentally based on rendezvous, while Actors fundamentally communicate asynchronously. There has been a lot of argument about which makes the most sense as a "primitive" of communication. Both approaches have been used successfully in industrial applications as well as academia.
  2. Actors have names, and communicate by sending messages to named entities. Erasmus processes appear to be anonymous, in the sense that communication between processes is via ports rather than by sending to named Actors. It's possible to debate endlessly about the pros and cons of port/channel-based communications versus address-based communications. One advantage of port-based comms is that it makes it easier to achieve compositionality of processes, which would provide support for the scale-free assembly of systems that the Erasmus group seems to be shooting for.

The use of port/channel-based rendezvous communications makes Erasmus more similar to something like CSP or the pi-calculus than to the Actor model.
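To make the first difference concrete, here is a rough sketch in Go (not Erasmus, and not an Actor language). It assumes an unbuffered channel is a fair stand-in for a rendezvous port and a buffered channel for a mailbox. Real Actor mailboxes are conceptually unbounded, so the fixed capacity is a simplification, and the function names are made up for illustration.

```go
package main

import "fmt"

// rendezvous passes one value over an unbuffered channel. The send
// completes only when the receiver takes the value, so the two sides
// synchronize at the moment of communication.
func rendezvous(v int) int {
	port := make(chan int) // unbuffered: a send blocks until it is received
	done := make(chan struct{})
	var got int
	go func() {
		got = <-port
		close(done)
	}()
	port <- v // blocks until the goroutine above receives
	<-done
	return got
}

// mailbox approximates asynchronous sending with a buffered channel.
// Every send returns at once because the buffer has room, so the
// sender never waits for the receiver.
func mailbox(vs []int) []int {
	box := make(chan int, len(vs))
	for _, v := range vs {
		box <- v // does not block while the buffer has space
	}
	close(box)
	var out []int
	for v := range box {
		out = append(out, v)
	}
	return out
}

func main() {
	fmt.Println(rendezvous(42))          // prints 42
	fmt.Println(mailbox([]int{1, 2, 3})) // prints [1 2 3]
}
```

In `mailbox` all sends finish before any receive begins, which is impossible in `rendezvous`: there the sender cannot get past `port <- v` until the receiver is ready.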

It's possible to debate

It's possible to debate endlessly about the pros and cons of port/channel-based communications versus address-based communications.

The two approaches may have the same network-level semantics, but they don't have the same security properties. Protected ports (aka capabilities) are fundamentally more secure than addresses. Unless you spawn an agent for every channel, in which case addresses and ports are now synonymous.

Forgeability

Aren't addresses forgeable where channels are not? I don't see how we can make addresses and ports equivalent if we can forge an arbitrary address, and thereby gain access to the process behind it, while ports must be passed explicitly.

I have thought of that, but

I have thought of that, but I would imagine an address is not forgeable within a language. Even if it is, we could make them capabilities by making the addresses sparse (i.e. unguessable).

Can vs. Must

We can make capabilities out of them, but if we can forge addresses then there's this hole where addresses aren't used as capabilities, similar to OO languages that can be used to write programs using caps but basically fail at it because the backdoor is there and widely used in the base libraries. I agree that we can make forgeable addresses unguessable (e.g. the whole safe URI concept) but this is an encoding and the properties of the encoding don't hold outside it (i.e. in the host language), so we can only say that sparse unguessable addresses are equivalent to capabilities, but if the program uses something else all the bets are off. In a capability environment we can look at the port definition and know for sure what capabilities the program attached to it has, but sparse unguessable addresses don't have the same property.
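As an illustration of the "encoding" point, here is a minimal sketch of sparse addresses in Go. The `registry` type and its methods are hypothetical, and the choice of 128 random bits is an assumption rather than anything proposed in this thread. An address is just a string, so nothing in the language stops code from fabricating one; the protection comes only from the address being infeasible to guess.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"sync"
)

// registry maps sparse addresses to mailboxes. It is a hypothetical
// helper written for this sketch.
type registry struct {
	mu    sync.Mutex
	boxes map[string]chan string
}

func newRegistry() *registry {
	return &registry{boxes: make(map[string]chan string)}
}

// register creates a mailbox and returns its address: 128 random bits
// from the system's secure random source, hex-encoded.
func (r *registry) register() (string, <-chan string, error) {
	raw := make([]byte, 16)
	if _, err := rand.Read(raw); err != nil {
		return "", nil, err
	}
	addr := hex.EncodeToString(raw)
	box := make(chan string, 1)
	r.mu.Lock()
	r.boxes[addr] = box
	r.mu.Unlock()
	return addr, box, nil
}

// send delivers msg only if addr names a registered mailbox with room.
func (r *registry) send(addr, msg string) bool {
	r.mu.Lock()
	box, ok := r.boxes[addr]
	r.mu.Unlock()
	if !ok {
		return false // unknown address: a wrong guess gets nothing
	}
	select {
	case box <- msg:
		return true
	default:
		return false // mailbox full
	}
}

func main() {
	r := newRegistry()
	addr, box, err := r.register()
	if err != nil {
		panic(err)
	}
	delivered := r.send(addr, "hello")
	fmt.Println(delivered, <-box)                   // prints "true hello"
	fmt.Println(r.send("0000000000000000", "oops")) // prints "false"
}
```

Note that the guarantee lives entirely in the registry: any code holding a reference to `boxes` bypasses it, which is the hole described above.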

On an open network, sparse

On an open network, sparse capabilities are all we have. It's not clear to me that addresses are forgeable though.

advantages of ports

Thank you for the clarification! One reason for preferring ports is that there is less coupling between components. A disadvantage is that components must be explicitly linked: the 'plumbing problem'. We have various plans for dealing with the plumbing problem, notably the distinction between 'cells' that provide structuring and 'processes' that perform actions.

Security will be very important for future concurrent and distributed systems. In IBM's Hermes, a process starts out with one port, through which it receives all of the capabilities it needs to do its job, including ports linking it to other processes. Erasmus can do this too, since ports are first-class, and this approach may turn out to be the basis of our security model.
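A rough sketch of that start-up pattern, written in Go rather than Erasmus or Hermes: the process begins with a single port and receives every other port through it. The `capabilities` struct, its field names, and the `doubler` process are all invented for illustration. Go does not enforce the confinement (a goroutine can still reach package-level state), so this shows only the shape of the idea.

```go
package main

import "fmt"

// capabilities is the bundle a process receives through its initial
// port. The direction-restricted channel types limit what the holder
// can do with each port.
type capabilities struct {
	input  <-chan int // receive-only port
	output chan<- int // send-only port
}

// doubler starts with exactly one port, boot. Every other port it
// uses arrives as a message on boot.
func doubler(boot <-chan capabilities) {
	caps := <-boot
	for v := range caps.input {
		caps.output <- 2 * v
	}
	close(caps.output)
}

// run wires up a doubler, feeds it values, and collects the results.
func run(values []int) []int {
	boot := make(chan capabilities)
	in := make(chan int)
	out := make(chan int)

	go doubler(boot)
	boot <- capabilities{input: in, output: out} // grant the ports

	go func() {
		for _, v := range values {
			in <- v
		}
		close(in)
	}()

	var results []int
	for v := range out {
		results = append(results, v)
	}
	return results
}

func main() {
	fmt.Println(run([]int{1, 2, 3})) // prints [2 4 6]
}
```

Because channels are first-class values in Go, they can be sent over other channels, which is what lets the initial port carry the rest of the process's wiring.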