Asynchronous messaging as integral part of programming language

I come from a world where asynchronous message passing is the normal way of designing applications. The operating system, called OSE, is designed primarily for this; it took many years before it even got a mutex...

Now that things are becoming more and more concurrent, this programming paradigm is becoming more and more attractive. The reason is that it creates concurrency at a much higher level, so that communication between processes takes less bandwidth.

My own finding is that if you construct a thread/process that has the following design:

myProcess
   initialization code
   forever
      getMessage
      if message in wait queue
         continue executing message
      else
         start message execution
         if response needed, put message into wait queue
      end if
   end forever
end myProcess

The important thing here is that the process has only one synchronization point, the line saying "getMessage". After that, it will work on the message until it needs a response from someone in order to continue, or until the work is done and a reply has been sent.

The problem is that the above construction is non-trivial to implement in languages like C, and it is the wait queue that carries the complexity. I would like support from a language to make this easier.
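
As a rough sketch of what I mean, here is the loop in C++. All names here (Message, getMessage, handleNewMessage, waitQueue, request_id) are invented for illustration, not OSE APIs; the point is only that suspended work is stored as continuations keyed by the outstanding request.

#include <cstdint>
#include <functional>
#include <map>

struct Message {
    std::uint32_t request_id;   // non-zero when this message is a reply to an earlier request
    std::uint32_t signal;       // identifies the kind of message
    // ... payload ...
};

Message getMessage();                      // stands in for the OS receive primitive
void handleNewMessage(const Message& m);   // may send requests and register continuations below

// The wait queue: continuations suspended until a reply with a matching id arrives.
std::map<std::uint32_t, std::function<void(const Message&)>> waitQueue;

void myProcess() {
    // initialization code ...
    for (;;) {
        Message m = getMessage();                  // the only synchronization point
        auto it = waitQueue.find(m.request_id);
        if (it != waitQueue.end()) {
            auto resume = std::move(it->second);   // continue executing the suspended work
            waitQueue.erase(it);
            resume(m);
        } else {
            handleNewMessage(m);                   // start message execution; if a response is
        }                                          // needed, it inserts a continuation into waitQueue
    }
}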

If you also consider that you might want to decide late in the design which objects go in which core/process/thread, and perhaps change this even after the first version has shipped, this becomes an interesting problem. I come from a real-time world where you want things to be quick and easy to design and implement from the beginning, and then possible to fine-tune until the performance and characteristics are just the way you want them.

My idea here was to concentrate on the popular object-oriented approach. The process above would then be a (special?) class, where the methods would actually be messages being sent to an instance of that class. This way, the programmer does not need to learn any new special stuff to get things done.

The idea here would be that the methods describe sequences to be executed, where each method corresponds to an asynchronous message being received by the object. If the sequence calls another object that uses asynchronous message passing, the compiler handles the fact that the method will return to the main loop, put its state into the wait queue, send the request, and then wait for an answer. When the answer arrives, it continues in the method where it left off.

One implication of this is that accesses to the internal state of the object are subject to concurrency, so the programmer must be offered some way of specifying what needs to be atomic, to help the compiler. Or at least, I think so...

Anyone have any thoughts on this? Am I on to something here, or just out in the blue dreaming?


Vat Concurrency

I think you're looking for something close to the E language vat model. In E vats, each thread owns multiple objects, and there is no shared-memory concurrency (i.e. different vats for parallelism), and there are no waits between vats. Rather than a 'message' queue (which treats the thread as one big object), we have an 'operations' queue per thread.

Invoking an object adds an operation to the queue of the relevant thread. Since we aren't binding directly to threads (i.e. there is no visible 'Vat' reference), this allows late-binding of the decisions on how objects are distributed to different vats. Also, authority to add operations to a given vat is not uniform - i.e. your rights to manipulate a vat depend on which object references you hold. This is a valuable property for security.

Local objects can be invoked in a more stack-based manner. Doing so gives you only semi-transparent distribution, since you cannot later break apart cliques of objects that invoke one another locally.

send the request and then wait for an answer. When the answer comes, it will continue in the method where it left off

While this can be (and has been) done, I cannot recommend it because it means your 'local view' of a method call still has attributes changing underneath you (i.e. weaving between two different messages to a single object). Implicitly 'continuing where you left off' would severely undermine the benefits of having a 'single synchronization point'.

What E uses instead is the notion of promise pipelining. Developers immediately get a token that represents a future answer. They can do a number of things with this token, such as assign new events to run locally when it is fulfilled, or even pass it as a parameter.

A simple alternative to promises, of course, is to simply send a callback object with each message. This can be a bit painful to use, but at least it avoids any harmful illusions.
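
For illustration, such a callback object might look like the following C++ sketch (Callback, Mailbox and ResumeWork are invented names, not E's API); the reply handler travels with the request, and the failure case stays explicit.

#include <memory>
#include <string>

// The object to invoke when the answer (or failure) arrives.
struct Callback {
    virtual ~Callback() = default;
    virtual void onResult(const std::string& result) = 0;
    virtual void onFailure(const std::string& reason) = 0;   // no harmful illusion of success
};

// An asynchronous endpoint: requests carry their reply handler with them.
struct Mailbox {
    virtual ~Mailbox() = default;
    virtual void send(const std::string& request, std::shared_ptr<Callback> reply) = 0;
};

// A concrete handler that continues the interrupted work.
struct ResumeWork : Callback {
    void onResult(const std::string& r) override { /* continue where we left off */ }
    void onFailure(const std::string& e) override { /* handle the lost or failed request */ }
};

// usage (illustrative): box.send("getTemperature", std::make_shared<ResumeWork>());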

There is work to bring the vat or similar models into other languages, such as Scala and Haskell.

Asynchronous messaging as integral part of programming language

There are, of course, plenty of languages with asynchronous messaging built in. Erlang would be the classic example, which is one of many actors model languages. I just don't believe that's what you're looking for, really.

Asynchronous thread - good or bad?

That the thread is asynchronous, i.e. it can receive new messages while waiting for answers to requests, is intended to solve potential priority inheritance problems. Imagine the following scenario:

Three threads are communicating, T1, T2 and T3, in falling priority order. Imagine that T3 receives a request from somewhere and, to fulfill that request, sends a message to T2 and then waits for an answer. T2 takes a considerable amount of time to fulfill this request, and during this time T1 sends a request to T3. This means that T1 now suddenly waits for T2 to finish its job, even though T2 has lower priority. This is a classical priority inversion problem. This was the problem I tried to solve using what I call asynchronous threads.

I'm aware that this means each object must handle concurrency; however, it is a much simpler case since the concurrency is within the same thread. This means that the places where the concurrency occurs can be precisely determined.

My vision was that the language would give as much support as possible to make this concurrency as easy to handle as possible.

Thanks for the reference to the E language! It seems very close to what I want. I'll read on it more closely to see if it actually might be exactly what I want...

Priority Perversion

Solving the priority inversion problem is good, all else being equal. But there are many other criteria by which we should also aim to simplify the lives of our developers!

You should not assume that long waits on a remote thread are due to IO waits. Consider the possibility that T3 requests something from T2 such as "please render this POV-Ray scene" and T2 uses a ray-tracing algorithm which can take considerable time. When T1 comes along and says: "please render MY POV-Ray scene with my uber-priority" , well, T2 doesn't see the request until it waits on a remote request, right?

E doesn't directly solve this problem either, though easy access to a task-queue certainly makes it easy for developers to favor idioms that break big tasks into a lot of smaller ones - i.e. use the task-queue as a partial replacement for the stack. You'll see similar idioms in other event-queue models, such as Grand Central Dispatch. I've been imagining that I'll force the issue in my next language design - i.e. ensure a bounded amount of work per event at the language level, such that I can make real-time guarantees in a closed subsystem.

Anyhow, I think you should stop for a minute and reconsider one of your assumptions: Why should priorities belong to 'threads'? I would note that assigning priorities to threads has its own problems, such as implicit priority elevation: T3 can implicitly elevate its own priority by asking T2 to act on its behalf. (Doesn't that sound like something we'd want to control?)

More generally, we might ask questions such as: Where do priorities belong? Who assigns priorities? Why should we respect them?

For my own goals, I had to think about these problems in the context of distributed systems, where there is no Master Control Program providing ultimate authority over priority. In distributed models, we must represent authority for priority in a way that foreign subsystems can respect without relying on trust. Also, within each subsystem, we must somehow encode which priorities we locally respect. In other words: Priority must be an explicit part of our programming model, protected by a security model. Making priority 'implicit' to the environment is a design mistake.

Of course, there are plenty of security models. I figured out the above principle regarding priorities years before I even heard of object capability security. My earliest approaches were closer to the SPKI model, which is flexible and tolerable (barely).

I now prefer an ocap basis for security, and I simply stamp messages with a priority field - i.e. priority is just data sent along with each request. Controlling priority can then be achieved as a simple attenuation pattern:

priorityWrap :: priority -> ((priority,a) -> action) -> (a -> action)
priorityWrap n fn x = fn (n,x)

Alice can give Bob a capability after wrapping it with whatever authority we decide Bob should have.

But slightly alter the scenario with delegation: if Alice gave Charlie the full capability (so that Charlie can pick his own priority), then Charlie would be able to assign priorities to Bob. Thus, Charlie becomes an administrator for Alice's service. But, now it seems that Bob is unable to assign a priority for Alice's service, so if Bob wants to further delegate... he's stuck! He can only share the capability with the full priority he already possesses.

To fix this, we need to make priority itself compositional... e.g. a list of numbers rather than a single number. The new priority wrapper would look like:

priorityWrap :: (Monoid priority) => priority 
  -> ((priority,a) -> action) -> ((priority,a) -> action)
priorityWrap nAuth fn (nDisc,x) = fn (nAuth `mappend` nDisc, x)

Now the message accounts for both authoritative and discretionary priorities, such that Bob can assign different fine-grained priorities for his own behaviors or when further delegating.

If you are unfamiliar with monoids: it just means values that can be combined in a useful (associative) way and that have an identity element. Naively, we might append two lists of priority numbers and use a lexicographic priority order. A more realistic model for the priority type would probably involve granting 'ranges' of authority for certain priorities (e.g. Bob can use priority values 20-40) and intersecting or weaving ranges at different steps.

Anyhow, now Alice can assign priorities, and Bob can assign sub-priorities, and so on, down to whatever fine-grained level we wish to assign them. And these values are secure, since nobody can bypass the capability path.

But we're still left with a concern: Alice might be answering a low-priority request when she receives Bob's high-priority request. Or Alice might have a very large queue of tasks, filled with a bunch of low-priority actions.

This issue is a big part of what led to me rejecting queues!

The basic solution, as I see it, is that all incoming requests must somehow be 'visible' to Alice, like a database or tuple space, such that she can pick the ones she wants to work on. This serves triple purposes:

  1. Alice can pick high-priority tasks over low-priority tasks.
  2. Alice now has a lot more local discretionary ability to choose between finishing a low-priority task and switching to a high-priority task. (This is valuable because keeping a task half-finished might be wasting a LOT of resources - open files, memory, etc. Also, it may be unsafe to abandon a task at certain stages.)
  3. Since Alice sees all the tasks at once, she can aim to solve many requests at once. Ten birds with one stone is not at all out of line, which potentially means an order of magnitude performance improvement.

Okay, so I say: out with the queues, and good riddance. Tasks are instead given a more content-centric representation in some sort of tuple space, where one of the attributes might be 'priority' (if it matters).
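
As a minimal illustration of what that might look like, consider the C++ sketch below. The Task/TaskSpace names and the priority-only ordering are invented for the example (this is not the demand monitor described next); the point is only that pending requests form a visible set the receiver can query, rather than a FIFO it must consume in order.

#include <set>
#include <string>
#include <utility>

struct Task {
    int priority;            // one attribute among several; higher is more urgent here
    std::string request;
    bool operator<(const Task& o) const { return priority > o.priority; }  // order by urgency
};

struct TaskSpace {
    std::multiset<Task> tasks;    // every pending request is visible at once

    void post(Task t) { tasks.insert(std::move(t)); }

    // The receiver chooses: take the most urgent task, finish what it already started,
    // or scan the whole set and answer several related requests in one pass.
    bool takeMostUrgent(Task& out) {
        if (tasks.empty()) return false;
        out = *tasks.begin();
        tasks.erase(tasks.begin());
        return true;
    }
};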

Eventually, this notion of tuple spaces replacing queues became my 'demand monitor' concept in RDP, which allows Alice to observe demands (requests) that hold at the same instant. Queues are replaced with temporal semantics, such that we can model and track future requests orthogonally to present ones. But, back when I was still favoring the actors model, I was envisioning something more along the lines of many miniature stateful databases. Bloom achieves something similar to my vision, with first-class relations and queries. (Like Bloom, I was using Datalog as a basis for the query language. I might still, though now I'm enamored of grammars as a feasible basis for large, lazy, structured data.)

A careful consideration of priority management leads me quite far afield of the model you seem to be imagining, but it was one (of many) concerns that have shaped my language designs.

Priority vs security

When I was talking about thread priority I didn't think about security aspects. In embedded applications security is often only a concern at the "edges". Within the application it is more bugs and bad characteristics that are of concern. So my view of thread priorities is more "knobs to use to fine-tune your application characteristics" than anything else.

The whole security concept is a rather big bite to chew all at once. I have to ponder a bit how it should affect my language. The problem I see is that for some people it is not an issue. Simple applications run from the command line might not care, assuming there is an OS that exists to take care of it. This would be true for applications that do text formatting and other things that do not really concern security. For embedded applications the thinking is quite often the same: there is some kind of framework around the application that handles the security, and the application itself just tries to get the job done. Other applications might have something to adapt to, e.g. Kerberos, and therefore need something that integrates well with that. After a while this might get messy... So I'm not so sure there is one single security model to support, and therefore I'm not sure the language should require one. It might, however, support some hooks/tokens/whatever to make it possible to integrate your favorite security mechanism without too much effort.

While my design using "asynchronous threads" is a bit more advanced for the programmer, it also gives more opportunities to fine-tune characteristics. My experience is that if the customer is given something that cannot be fine-tuned well enough, they'll just respond with "OK, this was nice. Now, implement some knobs for us!". When designing a language, you must be quite open to this. But you are quite correct that the programmer must split big operations into chunks, or else the "asynchronous threads" will become quite synchronous.

Having said all this, priorities within a tightly coupled cluster of processors in an SMP system are a difficult problem. This is because priorities are probably meant to be global across the whole cluster, since the threads can move around within it, but now threads with different priorities might run in parallel. This creates some new challenges for the application designer. I think asynchronous messaging might make this less painful, but there are probably some caveats left.

I've also considered supporting both "asynchronous threads" and "synchronous threads" (which is what I'd call the vat design when it comes to how incoming messages are handled). But on the other hand, if I could just make the asynchronous case simple enough, it wouldn't be worth supporting both modes. And that would be kind of neat...

Security does not need to be

Security does not need to be complicated, expensive, or difficult. Object capability security, for example, is just good modularity - rigorously applied at every level and protected by the language. Type systems can also support secure construction. Language level security mechanisms can actually improve developer productivity (simplify debugging, testing, maintenance, reasoning, modularity and reuse), and performance (simplify parallelism, heterogeneous memory targets, optimizer analysis).

From a security perspective, buggy code is not so distinct from malign code. Within an application, different code should have different levels of 'trust', e.g. scripts, plugins.

Security frameworks and perimeter - or 'eggshell' - security is only complicated and heavyweight because the programming model is not robust in the first place. If this is your only understanding of what security can be, then you are seriously missing out!

Security is not something to absorb in one bite. I do recommend you seek and read more about the relationship between security and language design. The E language website, erights.org, is a good initial resource.

With respect to 'tuning' of applications - I agree, it is important to be able to do so. But, if your goal is to late-bind objects into threads, and to have tasks that cross multiple threads, then threads are certainly not the right granularity for your priorities and tuning.

I would avoid the phrase 'synchronous threads' to describe vats. Just call them 'vats' - that way you can point to prior work on the subject, and people can find information on it through google. And you'll avoid unnecessary connotations to synchronous programming, which is a different subject associated with temporal semantics.

along those lines

podcast: making security decisions disappear into the users workflow, and i can never effing find the short paper that's the 7 or 10 or 12 or whatever guidelines version of that. if i listened to the podcast again i'm sure it would tell me...

Bugs vs malign code

I do agree with all you say, except that I would still like to differentiate between bugs and malign code when talking about security. There are some significant differences in my opinion. One of them is that bugs can be analyzed statistically and, with that as a tool, made less common. Doing the same for malign code is also possible, but it is more dangerous and would likely produce a somewhat different result.

There is also another thing that differs: when protecting against malign code, application certificates and means of verifying an end user's identity are often used, while these techniques are not so interesting for bugs.

Having said that, I also realize that when it comes to distributed security it becomes harder to differentiate the two. So you still have a point that the language needs to implement features that help security in all respects.

I'll do some more reading on the E language and see what I learn. It is a very interesting language.

'Reply' Link

Hey, Mats! There's a 'reply' link by my name. If you use that, the conversation gets structured as a proper thread.

Buggy or Malign

To a client of buggy code, or a client of malign code, the problem is the same: how to control damage and recover from it. From an omniscient perspective, the two might seem different though, I agree.

Certificates are effectively a 'rights amplification' pattern. That's something we should avoid unless necessary for a security policy - i.e. it's better to be secure than to be insecure and accept only 'signed' modules. Still, rights amplification can be a useful technique.

Reordering of the incoming message queue

If I understand the documentation for the E language correctly, it provides the ability to reorder the incoming message queue. Spontaneously, this feels dangerous, since it would make things execute in a hard-to-predict order. While the vat itself can probably handle this, the question is whether the whole system can. The hard-to-predict behavior also means that it might be hard to predict whether it would create problems...

Is there any experience whether this is a real issue, or just a theoretical problem?

E order

E is quite rigorous about ordering. Messages between two vats are strictly ordered, for example. 'Eventual' sends are ordered as called. The only unpredictability in ordering I know of is the merge-order of eventual messages from other vats and local tasks.

E's semantics make weaker guarantees than that, though: order is specified in terms of messages to a reference, because some far objects are passed-by-constructor and we want to maintain the ability to semi-transparently distribute objects among vats. So developers should not reason about the order of messages to a particular vat, but rather about the order of messages to a particular object.

I have never seen any E code that reorders the incoming message queue, and certainly not in any unpredictable manner. Any ability to reorder the queue would certainly be a capability, not an ambient authority - i.e. only a small amount of trusted code would have access to it, and problems would be easily isolated. So, at most, this would be a theoretical problem.

I'm a bit curious as to what documentation brought you to this concern. Can you provide a link?

I won't say E order is perfect. For perfection, join-order should be a precise, formal part of the semantics itself, e.g. using temporal semantics.

Reordering incoming queue

I tried to find where I read about it, but failed. I do remember it was talking about priorities, and that the mechanism was intended to let high-priority messages bypass low-priority ones.

Some things to think about

What is a message in your language and how does it appear to behave to the programmer? Often messages have some kind of payload data that can be accessed. Does a message behave like a reference once it is retrieved from the queue, may the programmer cache it for later use, and what kind of lifetime does it have?

Are you aiming for concurrency, or for parallelism?

If methods can be accessed globally (via messages), then what about members? Do threads share memory, or do they possess private memories and communicate only via messages?

How do you construct data structures that allow concurrent access? If the wait queue processing is complex enough to deserve language support then what about more complex structures?

How do threads perform I/O - in what order do side-effects occur for an outside observer (i.e. the environment)?

Lastly, I didn't follow your point about bandwidth. Synchronous message passing needs one extra bit. Reliable message transfer needs that bit anyway, so you only save bandwidth if your messages are unreliable - how would that affect your semantics, especially processes waiting for responses?

Latency is of course clearly reduced in an asynchronous message passing system (with unreliable messages), and your idea seems to map cleanly onto the low level architectures that are commonly in use. To take advantage of the fit you might look at applications built on stateless requests. Flakey 3g and tiny keyboard made this comment excessively terse, insert padding as appropriate :)

Some things to think about

Well, let's see if I manage to answer all those questions:

I intended to make messages look the same as method invocations. This would actually go one step further than the E language, since E uses a different syntax for (potentially) remote method calls. It also has a special primitive, called "when", to get an answer to such a call if needed; it works like "when an answer has been received, do this". My idea was that neither of these would be needed, i.e. a programmer does not need to care whether a call is translated into a message or is a direct call. The idea is that this is resolved at link time, and which object should be placed in which process/CPU/board is described elsewhere. This should make it possible to make late changes to how objects are distributed.

What the message itself looks like is solved outside of the language specification. It should be possible to have several alternative implementations, depending on needs like encryption, reliable transport algorithms, etc. The message itself is translated into parameters to a method, and the return message is derived from the return value of the method.

If you want to cache message content, you must cache the values of the method parameters. I do not (currently...) see a need to have a special reference for the message itself. I'd say the lifetime of the message is over when it has been removed from the incoming queue and the corresponding method call has been issued.

If by parallelism you mean multiple threads concurrently accessing shared memory, then this is what I try to avoid. If concurrency means that you have several CPUs running in parallel and thereby creating true multi-tasking, then this is but one scenario that I want to support. I want to support all kinds of parallelism, regardless of whether it means several computers or just one single-core computer.

I think members should never be accessed directly, but only via methods. The thing I'm pondering is whether to offer the possibility of declaring members for which getter/setter methods are automatically created.

The default behavior is to not share any memory between threads. I suspect there will be cases where you really must do so, for which I'm planning to have a special class available that realizes this. I have the same way of thinking when it comes to hardware resources: encapsulate them in language-provided objects. This means that it becomes quite obvious when you access memory directly. It also means that you seldom do that; instead you should use references and indexes for normal accesses.

How to build data structures that support the concurrency you get with this "asynchronous thread" design (does anyone know a better word for this?) is something I'm currently trying to solve. One idea is to have atomic low-level methods which cannot be interrupted (which means they are forbidden to communicate with other objects), and only these methods are allowed to change the state of the object. Whether this is a good idea or not, I do not know. I've had ideas about being able to specify properties on members, saying things like "this member depends on changes to that member" and so on. But in the end, isn't the whole purpose of an object to collect things that are closely dependent on each other? With that thinking I got back to atomic methods instead...
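
As an illustration of the atomic low-level method idea, here is a hedged C++ sketch (the class and method names are invented for the example). The message-level method may suspend while talking to other objects, but every state change goes through a short method that cannot communicate and therefore cannot be interleaved part-way through.

class Account {
public:
    // Message-level method: may suspend while waiting on other objects, so it never
    // leaves the fields half-updated across a suspension point.
    void handleDeposit(long amount) {
        // ... possibly ask another object to validate 'amount' (a suspension point) ...
        applyDeposit(amount);     // the state change happens in one uninterrupted step
    }

private:
    // Atomic low-level method: forbidden to communicate, so no other message to this
    // object can run in the middle of it.
    void applyDeposit(long amount) {
        balance += amount;
        lastChange = amount;
    }

    long balance = 0;
    long lastChange = 0;
};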

I intend to support detailed descriptions of what kinds of side-effects might occur. A coarse grading could be something like the following:

  • Does not change any states. Return values, if applicable, can be safely cached for the duration of the object.
  • Changes the state of the object. Optimizations of when the object is called and when the return value is collected can be done, as long as the sequence as observed from the outside remains the same.
  • Changes external states, e.g. I/O. Means that things need to occur where they appear in the program.

I apparently wasn't clear about synchronous vs asynchronous messages. I would say that when there is no return value, you save bandwidth, since you only have the low-level reliable transport communication to take care of. That transport can employ sliding windows and piggyback its acknowledgements on reply messages to make it efficient. If there is a reply message, it has to be sent, there have to be some guarantees about message ordering, and there is a payload to include. I think this is what I referred to as taking less bandwidth.

There are also some concerns about latency. If the only answer you need is whether the operation went well or not, you can assume that it went well until you get an exception message saying otherwise. This might not be applicable in all cases, since it depends on how much you need to know the result before you go on, but asynchronous messaging opens up this opportunity.

I've heard about stateless requests, but I'm not sure what problem they solve. Do you have any links?

I hope I managed to answer all your questions...

    Sounds like you want RPC!

    Sounds to me like you are talking about object-oriented RPC. Obj->method(args) is either handled entirely within the same thread or, if the actual object lives in another thread, turns into a request to the other thread and, when done, a reply message to the original thread. Then your main receive loop is hidden within the RPC layer. Marshalling and unmarshalling of args and results is done by the runtime. Given such class definitions in an IDL (interface definition language), a compiler can generate the stub functions for the clients and glue for the servers.

    BTW, we use OSE at $work and I am well familiar with the problem you describe. Not having an RPC layer has resulted in very difficult-to-read code. Basically your logic has to be chopped up into pieces bracketed by such messages, and the pieces connected via a state machine, usually a very complicated one. Basically one ends up implementing this RPC layer by hand, only badly. And every reader is forced to look at these low-level details. Unfortunately there is far too much legacy code.

    Enea should consider providing an IDL that can be used from c/c++.

    RPC++?

    Describing what I try to accomplish as RPC might be a good start, but I do have higher ambitions. Making the compiler take care of how to reorder statements depending on whether the call is remote or local would be one nice thing.

    If you think writing an OSE process is hard, then having it designed the way I have described (what I have called an "asynchronous thread") is a lot harder... It would be nice to have some language support here.

    I wonder when E language gets ported to C++ and OSE... ;-)

    An IDL is basically language support!

    An IDL is basically language support! You don't need to invent a whole new language!

    But I don't think of what an IDL compiler would do as reordering. Consider a typical server request processing procedure:

    Response* process_request(Req* req) {
        ....
        resp1 = foo->process_req1(req1);
        ....
        resp2 = bar->process_req2(req2);
        ....
        return resp;
    }
    

    Note that, in effect, most server requests can themselves be seen as RPCs from a client thread somewhere in the world.

    Now (to a first approximation) I don't care whether foo and bar are references to local or remote objects. This may not even be known until run time (or may change even after that). Given definitions of types of foo and bar an IDL compiler should create glue logic that does the right thing in either case. The RPC layer will have a generic receive/dispatch loop.

    But it might be better to think of the control thread itself moving to another place (since the original piece of thread code can't make progress, it is effectively in a "zombie" state -- its "spirit" has moved to another place! -- you must have seen how people who are waiting for service have a vacant stare :-)). And to think of what is normally called a thread as a thread container. For a better exposition, see the papers on the Alpha Real-Time Kernel (E. Douglas Jensen). I think this model makes it easier to reason about program logic.

    RPC Layer

    If you think you want RPC, that's the first thing to fix!

    An RPC layer is a good way to hide the costs and fallacies of the network from the developer. This is a bad thing.

    Procedural and OOP work locally because we can compose many low-level methods and objects to build domain-specialized objects and methods. But, if you try this with RPC, you end up having a lot of extra round-trip latencies between the client and server threads. Developers are quickly disillusioned of any 'separation of concerns between abstraction and performance'. They are pressured to provide very 'coarse-grained' method calls with bulky messages. This is painful to do (conflicts with many OO design patterns and basic abstraction principles), and also results in more specialized systems.

    Even without the performance issues, the extra failure modes would be a problem. Partial failure is one of the Big Problems for distributed programming.

    William Cook explains this problem well, and answers with batching mechanisms. E language, similarly, achieves implicit batching based on promise pipelining.

    RPC vs E

    Could you explain more why RPC is a bad thing, and what E does better?

    Abstraction and Composition

    Consider a very simple composite task implementing a single step of the Collatz function:

    n = b.get()
    n' = if even n then n/2 else 3n+1
    b.set(n')
    return n'

    If b is a local object in a synchronous thread, this operation will take at most a microsecond, and is guaranteed to succeed. Yay.

    If b is a remote object, then in a success case, we have two round-trip latencies. We can assume two to four orders of magnitude performance loss. Further, there are four failure modes (assuming 'get' is pure, and 'b' is not itself a composite):

    1. We can fail at b.get() - message or return-value lost
    2. Someone else can use b.set() while we're computing n' (race)
    3. We can fail at b.set() - message lost
    4. We can fail at b.set() - confirmation lost

    In order to work around these problems, developers are likely to say, well, we can put more of the operation on the remote host.

    return b.runCollatz();

    This reduces it to one round-trip with two failure modes (fail at command, or fail at response). That's probably as good as we can do in an imperative model - unless, of course, we intend to use that return value for something else on the remote system.

    But adding methods like 'runCollatz' means (a) we must have a lot of foresight about how our remote objects are going to be used, and (b) we have no inherent upper bound on the vocabulary for each object.

    Effectively, we lose composition and abstraction. We cannot extend our languages in rich, ad-hoc manners we desire. A method such as 'runCollatz' is a symptom of this problem.

    In order to regain composition and abstraction in a distributed scenario, we need either fine-grained code distribution, or batching. For example, SQL is an example of code-distribution - we could use SQL to implement Collatz, assuming the value were in a database.

    I would not suggest that E is perfect, but E language does two things better than the above example. First, developers can know just by glancing at the code what are the failure modes and performance properties. Second, developers can achieve simple batching for a number of common cases by use of promise pipelining and clever use of the messages on integers and booleans or other pass-by-copy or pass-by-constructor objects.

    RPC properties

    RPC has latency issues when the nodes are connected by long fat pipes (you can't keep the pipe full - and it gets worse with distance - this is a real problem with plan9's 9p protocol). But they work well when the nodes are connected by short fat pipes, typically the kind of places where OSE might be used. RPC looks like a function or method call but has additional failure modes. But these modes (+ associated cleanup & recovery) exist in any distributed system. I haven't looked at E in ages, nor have I any working experience with it, so I will refrain from commenting on it. IMHO when you have a request/response model you have these issues no matter how you dress them up. To me the main concerns are whether a) this is easy to program and maintain, b) easy to understand, c) efficient and d) scalable.

    What I can tell you is that compared to RPC based programming, event based programming sucks. Big time! This based on having written multithreaded servers in both styles. Note: using RPC to effect bulk data transfer is not a good idea! Use a pipe for that. Similarly, when you need multicast, use multicast. One size does not fit all.

    Event Programming

    I am curious how much your bad experience with event-based programming is based on your particular approach to it.

    I've found that if you can associate events with callback objects - i.e. such that you can handle a growing number of ad-hoc events at runtime - then this is very easy to program, understand, and maintain. E is based around this idea, but I've implemented it in C++ as well and - even without the promise pipelining and batching - it was easy to use compared to threads and local shared memory.

    Depending on how many local shared memory interactions your threads need (e.g. to maintain an abstract scene-graph or some multi-user state), perhaps your server is not a representative sample.

    Callbacks are nasty

    Callbacks are how I implemented a server (long before I knew of OSE), but here is what happens. The logical flow might be like this:

    a() { ... b(); ... }
    b() { ... c(); ... }
    c() { ... }
    

    But c() has to make a request to another process (or thread -- doesn't matter) and can't proceed until it gets a response. So now I have to do this:

    a1() { ... b1(); }
    b1() { ... c1(); }
    c1() { make a request & register a callback to c2(); }
    c2() { ...; b2(); }
    b2() { ...; a2(); }
    a2() { ...; }
    

    When the response arrives, the main event loop will call back c2() and so on. As you can see, the callback *inverted* the control flow! Don't know about you, but this complicated things for me. One request/response split these functions in two pieces! Worse, now you need to wrap all this in some state machine, because typically the control flow is not as simple as a calling b calling c, etc. Basically you end up doing the work of a compiler. And not everyone on a team is clear about this, so then the code starts growing barnacles... And rather than factor code out, people start inlining code to reduce the number of functions they may have to split in two.

    Yes, this would be easier in a language like Scheme because basically a callback is really just a continuation but one doesn't usually have the luxury of choosing an implementation language. This however is a good way of looking at RPC. Basically a `callback' from the RPC layer turns into a `return' in the originator!

    Shared memory is a non issue here (except that it tempts masochists to use shared data structures).

    Asynchronous thread the solution?

    As for your specific problem, I feel these asynchronous threads would help. This is because when c wants to communicate, the state of the sequence is stored and execution returns to the main loop. Whenever an answer to that request arrives, the sequence c was executing is revived and continued.

    This does have other consequences, such that other things might happen while c is waiting, which complicate things. But that's another story... ;-)

    I don't see how "asynchronous threads" help

    I don't see how "asynchronous threads" help!

    This is because when c wants to communicate, the state of the sequence is stored and execution returns to the main loop. Whenever an answer to that request arrives, the sequence c was executing is revived and continued.

    This is exactly what happens *within* the RPC runtime layer. The RPC runtime does use unidirectional messages. If you look at just the messages, the same information is conveyed in the same sequence.

    When you have a nested request/response model of communication, someone has to handle all the complexity of saving state, resuming from a saved continuation when response arrives etc. The RPC layer factors this common case out so that not every programmer has to implement the same thing in every piece of code (and usually differently).

    Of course, you still have to deal with the unexpected disappearance of a client or server or any breaking/unreliability of the communication link. The RPC layer must provide a way to deal with these. Here some of the existing infrastructure provided by OSE can help quite a bit.

    Not so nasty

    You seem to be assuming second-class callback functions, which I agree are a problem. But with callback objects, your control-flow would instead look like:

    a() { ... b(new a2()); ...}
    b(cb) { ... c(new b2(cb)); ...}
    c(cb) { ... remoteCall(new c2(cb)); ... }

    At the very least, this avoids the bi-directional dependencies in source (i.e. developer of 'c' doesn't need to know anything about 'b' or 'a').

    One way to push more into the imperative layer (for consistency) is to capture the typical response-callback pattern with local promises, to which we can attach callbacks.

    a() { ... p = b(); p.attach(new a2()); ... }
    b() { ... p = c(); p.attach(new b2()); ... }
    c() { ... p = remoteCall(...); p.attach(new c2()); ... }

    In this case it would be more verbose, but we can potentially abstract the pattern.

    I grant that not every language makes managing callbacks or promises syntactically convenient. It would be better if the language directly supports some sort of lambda or block syntax. And operator overloading would allow us to achieve features such as converting numbers to promises and adding promised numbers to generate new promised numbers.
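
    As a rough sketch of such an abstraction in C++ (IntPromise is an invented name; this is only a local promise, not E's promise pipelining), an overloaded operator can combine two promised numbers into a new promised number:

    #include <functional>
    #include <memory>
    #include <vector>

    struct IntPromise {
        bool ready = false;
        int value = 0;
        std::vector<std::function<void(int)>> listeners;

        void attach(std::function<void(int)> cb) {
            if (ready) cb(value);
            else listeners.push_back(std::move(cb));
        }
        void fulfill(int v) {
            ready = true; value = v;
            for (auto& cb : listeners) cb(v);
            listeners.clear();
        }
    };

    // p + q is a new promise, fulfilled with the sum once both inputs are fulfilled.
    std::shared_ptr<IntPromise> operator+(std::shared_ptr<IntPromise> p,
                                          std::shared_ptr<IntPromise> q) {
        auto sum = std::make_shared<IntPromise>();
        p->attach([q, sum](int a) { q->attach([a, sum](int b) { sum->fulfill(a + b); }); });
        return sum;
    }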

    This however is a good way of looking at RPC. Basically a `callback' from the RPC layer turns into a `return' in the originator!

    That does nothing to make RPC easier to reason about. You're just hiding latencies and failure modes like landmines in that return value. If this is 'simplicity', then your driving to work would be much simpler if you just wore a blindfold.

    Shared memory is a non issue here (except that it tempts masochists to use shared data structures).

    Shared memory certainly is an issue!

    Most useful applications involve some form of shared data structure - scene-graphs, databases, et cetera. Even if these structures are 'owned' by one thread or actor, we can have calls from different threads whose purpose is to adjust the structure. And we'll still bump into all the same race-conditions, reentrancy, and deadlock issues, depending on whether threads 'wait' for a callback, or need to manipulate it across more than one call (i.e. transactions).

    They don't seem equivalent

    Let me try again.

    a() { before_b; b(); after_b }
    b() { before_c; c(); after_c }
    c() { before_RC; RC(...); after_RC }
    

    Here I use before/after labels to distinguish various pieces of code. In event driven programming this becomes:

    a1() { before_b; b1(mk_ctx(a2,a_state,0)); }
    b1(ctx) { before_c; c1(mk_ctx(b2,b_state,ctx)); }
    c1(ctx) { before_RC; make_RC(...); mk_ctx(c2,c_state,ctx); }
    c2(ctx) { c_state = ctx.state; after_RC; ctx.fun(ctx.prev); }
    b2(ctx) { b_state = ctx.state; after_c; ctx.fun(ctx.prev); }
    a2(ctx) { a_state = ctx.state; after_b; }
    

    mk_ctx basically wraps up the continuation function to call, the state we care about + the caller's context if any. Not shown is the main loop where messages are received or how the relevant context is associated with received messages. The RPC layer used in the first code block basically does exactly this, but in one place, and the user doesn't have to worry about it. It can rely on the stack, since all three of these are already on the stack! In the event-driven style you are basically doing the work of a compiler or runtime; only you can't rely on what is already available on the stack.

    [Note: the RPC layer *doesn't* have to do *anything* to save the state of a, b or c when the remote call is made, since this state is already on the stack. Basically the RPC layer marshalls the arguments into a message, sends the message, associates the caller's context with the expected response and context-switches out the caller thread.]

    So let us see how you would do this in an equally simple way!

    You're just hiding latencies and failure modes like landmines in that return value. If this is 'simplicity', then your driving to work would be much simpler if you just wore a blindfold.

    :-)

    Every abstraction, every library function hides something by factoring things out. Second, you can't *hide* latency, it is always there, but that doesn't mean you have to make every piece of code aware of it. As I said in my response to Mats, you still have to deal with unexpected situations.

    I get the feeling you have read about RPC but not actually tried to implement it (if you use NFS or any remote filesystem, you must have used it but you may not be aware of it).

    Inadequate abstraction is nasty

    This would be closer to what I was describing earlier:

    a() { before_b; b(function(r){ after_b; }); }
    b(cb) { before_c; c(function(r) { after_c; cb(result); }); }
    c(cb) { before_rc; rc(function(r) { after_rc; cb(result); }); }

    Note that the above 'function(r)' statements can implicitly capture values such as 'cb'. This also applies to the whole 'after_c' expression, for example. Quite a few languages offer mechanisms of this form: C++ has boost lambdas. (C++2011 has dedicated lambda types.) Java has anon-class objects. Go has goroutines. Apple augments several languages (C, C++, Objective-C) with Grand Central Dispatch and blocks for easy access to simple lambdas for callback events. Digital Mars D and C# have very good support for delegates and anonymous functions. And I shouldn't need to mention that Scheme, Lisp, ML, Haskell, etc. have first-class support for this sort of behavior!

    While imperative callbacks have their share of inherent problems, it is unreasonable of you to blame 'callbacks' for the inadequacies of your programming language! The fact that you have a lot of boiler-plate in your example should tell you that you have an abstraction/expression problem, not a semantic problem.

    Every abstraction, every library function hides something by factoring things out. Second, you can't *hide* latency, it is always there, but that doesn't mean you have to make every piece of code aware of it.

    It is irresponsible to abstract away a property that cannot be hidden. That's just asking for abstraction violations down the road. And the idea that it somehow simplifies life for the users is just plain delusional. There is a difference between simple and simplistic - and the difference is that 'simple' is still sufficient to capture all relevant complexity. For distributed programming, latency is certainly relevant.

    We can't hide latency, so it is foolish to try. RPC is a very foolish approach to distributed programming.

    We have reached the fixed point of discussion


    a() { before_b; b(function(r){ after_b; }); }
    b(cb) { before_c; c(function(r) { after_c; cb(result); }); }
    c(cb) { before_rc; rc(function(r) { after_rc; cb(result); }); }
    

    Sigh... Now suppose a() has N such calls that can indirectly block.

    a() {   
        before_b;
        b();
        before_d;
        d();
        before_e;
        e();
        ...
    }
    

    In your scheme this will translate to

    a() {
        before_b;
        b(function(r){
            before_d;
            d(function(r){
                before_e;
                e(function(){
                    ...
                  })
              })
          });
     }
    

    Gag me with a spoon.

    And the idea that it somehow simplifies life for the users is just plain delusional.
    ...
    RPC is a very foolish approach to distributed programming.

    :-) :-)

    You keep making such statements but you have not shown them to be true. Even Go has an RPC package! RPC is not a general solution but it works very well when the bandwidth*latency between comm. endpoints is low.

    Use more abstraction

    Clearly, you have forgotten the most basic thing about programming: use more abstraction! (or a better abstraction)

    In this case, the answer is simple: abstract a pipe function or operator that composes two callbacks into a bigger callback. This would allow you to reduce your monster code into something like:

    a() {
      let l = function(_)       { before_b; return bArgs; } `pipe` b `pipe`
              function(bResult) { before_d; return dArgs; } `pipe` d `pipe`
              function(dResult) { before_e; return eArgs; } `pipe` e
      in call(l)
    }

    I've done this in Haskell, and in C++ (with functor objects and operator overloading), and I know it can be done in many other languages.

    Callbacks work fine. But it is clear, from your last several arguments, that you have not spent any time thinking about abstraction and composition of callbacks. Pipelines are one simple option for composition among many, yet make it clear to developers where failures may occur.
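
    For what it's worth, here is a rough C++ sketch of such a pipe combinator using std::function (illustrative only, not the actual functor/operator-overloading code mentioned above):

    #include <functional>

    // An asynchronous step: takes an input plus a callback to receive its output.
    template <typename In, typename Out>
    using Step = std::function<void(In, std::function<void(Out)>)>;

    // pipe(f, g): run f, feed its result into g; failure continuations could be
    // threaded through in exactly the same way.
    template <typename A, typename B, typename C>
    Step<A, C> pipe(Step<A, B> f, Step<B, C> g) {
        return [f, g](A a, std::function<void(C)> done) {
            f(a, [g, done](B b) { g(b, done); });
        };
    }

    // usage (illustrative): auto chain = pipe<Req, Mid, Resp>(stepB, stepD);
    //                       chain(request, finish);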

    RPC is not a general solution but it works very well when the bandwidth*latency between comm. endpoints is low.

    Right. Remote procedure calls work very well when the endpoint is not remote. Sounds like a placebo solution.

    But even that is untrue. Remote procedure calls DO NOT work very well even for low-latency endpoints. They still introduce those partial-failure issues and unnecessary round-tripping. Compared to batching or pipelines, RPC is not very effective at all.

    I think RPC is your Blub.

    Arguably, and relating to a

    Arguably, and relating to a concurrent discussion, pipes = event streams and callbacks = direct imperative style (assuming continuations). Both can be fine! Event streams make the asynchrony clearer under standard use, though there have been funny options for continuations as well.

    Arguably, the world is flat.

    And cancer causes cell phones. People argue nonsense every day. ;)

    While it is true that we can model pipelining in imperative-with-continuations, the inverse is generally not true: most pipeline models lack loops and joins, for example. Constraints on structure may undermine arguments about equality between the models, even if they look similar in the particular case we're studying.

    Conversely, the 'generality' of the imperative model may prevent us from achieving benefits that pipelines may support. The model of pipelined callbacks I developed for my project allows me to avoid round-tripping with the original thread while structurally maintaining certain performance isolation properties.

    IMO, what matters is protecting abstractions, and having good abstractions worth protecting. We could have a 'direct imperative style' where every operation is allowed to fail or block and developers require intimate knowledge of the implementation to know program behavior. But that is not a good thing.

    Based on our prior discussions, I imagine you prefer to 'support' good abstractions, rather than to 'protect' them. My own interest is open distributed systems - a context in which protection and secure abstractions are essential. As I secure and scale, I tend to find the subtle differences and problematic interactions more obvious because I am far less able to hack around inadequacies of the model.

    Wow!


    a() {
      let l = function(_)       { before_b; return bArgs; } `pipe` b `pipe`
              function(bResult) { before_d; return dArgs; } `pipe` d `pipe`
              function(dResult) { before_e; return eArgs; } `pipe` e
      in call(l)
    }
    

    And this is clearer?

    Layering on more 'abstractions' and `composition' of callbacks when none are needed just complicates things even more.

    I will give you one final example. Consider:

    f() { XXX; count = write(fd, buf, size); YYY; }
    

    Here the write can block. In effect you are doing an RPC. Usually the OS will fire off a request on your behalf to the disk (or the remote server or a user-level process etc.) and put the caller thread to sleep until the request is satisfied or fails. Some OSes may allow your thread to continue right away by returning a handle (in effect a ticket with which you can claim the response data later). This complicates user code considerably, but sometimes this is justified in the name of efficiency.

    The blocking semantics are easier to reason about, even when the call may fail! Typically you can test the result of write (using the convention of a negative return value). Or you can raise an exception (try ... except ... or a signal handler), or you can provide an error code as a separate value as in Go. All of these are far less unwieldy from a user perspective than an explicit continuation-passing style.

    Maybe a Haskell-based OS API would do things differently -- I don't know; I haven't seen an OS written in Haskell -- but it is no use saying throw away your existing, limited (but proven to be quite capable) languages and libraries and use this newfangled language which is *not* proven in many areas where RPC is being used, and where dealing with partial failure etc. is equally painful. And even so, so far you have not shown anything that is better!

    Unfortunately this discussion is not going (to go) anywhere :-(

    Simplistic

    With blocking calls, you are creating complications elsewhere - i.e. because you'll need more processes or threads to keep the CPU busy and the program responsive, and you'll need more synchronization in order to protect any shared data models or resources, and you'll need more discipline in order to avoid starvation or deadlock or priority inversion. These concurrency complications are not visible in your toy examples, but they'll certainly be part of any real program.

    When a model seems locally simple, but causes complications in its clients, we call that 'simplistic'. It is important to capture essential complexity, otherwise we multiply that complexity by the number of clients. The classic example of a model being 'simplistic' was putting the Earth at the center of the system, then deriving complicated Ptolemaic orbits and epicycles for every other planet - locally simple, externally far more complicated than necessary.

    RPC is simplistic. For a toy example, it seems simple, but you create complications for the clients of the blocking functions. The client of your function 'f' must know implementation details to reason about program responsiveness and possibility of deadlock. The client of your function 'f' may need to create new threads to achieve responsiveness or manage concurrent interactions with the OS.

    To answer your question: yes, the callback piping is clearer, even though it fails to put imperative syntax at the center of your universe. It is certainly more consistent and amenable to refactoring and abstraction than the callback example you provided. Points of local delay are clear - i.e. right at the `pipe` statements. Responsiveness, synchronization, and stability of data models will be clearer because we don't need blocking functions, e.g. when using vats. Even stack behavior will generally be simpler and flatter because we aren't preempting and storing the stack at arbitrary positions.

    Regarding the rest of your argument: You are not constrained to directly using the abstractions your OS provides. I have no clue why you're talking about a Haskell OS. Callback-based event management has a long history, is well-proven, and does not require Haskell. Pipelining similarly has a long history, is well proven, and does not require Haskell.

    RPC is a very, very bad idea

    > RPC is a very foolish approach to distributed programming.
    >:-) :-)
    >You keep making such statements but you have not shown them to be true.

    That's my experience also...
    I've used RPC, and IMHO RPC is a very nasty feature: it looks very simple to use, because it hides the difference between a normal function call and a remote call, so "inexperienced" programmers like RPC and use it as a normal function call(!).
    That's the big issue: when 'remote' errors happen, it becomes a nightmare, because the "inexperienced" programmers will not have handled the error cases correctly.

    Inexperienced programmers

    I've seen inexperienced programmers make an even bigger hash of event driven programming (among other things). The key is to either train them or not hire them at all. An RPC is different from a normal call and they should understand the differences. And they should also understand that normal calls can also fail and so they must write code to deal with such failures. You can use a naming convention for RPC to remind them if necessary.

    Inexperienced programmers (hopefully) learn and become skillful over time. In my experience, a development environment optimized for the skillful programmer is generally more productive overall. If you build in a lot of training wheels & safety nets and whatnot for the newbies, the problem is that these measures never get removed and they slow everyone down.

    This is getting very far from PLs so I will resist further temptations to respond :-)

    callbacks?

    i thought i'd heard somewhere that they kind of suck?

    callbacks

    I agree that 'callbacks kind of suck', but I feel this is more a function of imperative being a rather poor programming model to start with - especially in the context of concurrency or distribution.

    As a note, Jonathan Edwards had to abandon his notion of coherent reactions because, well, they really weren't all that coherent.

    improving callbacks via metadata for compilers?

    Yeah, callbacks do kinda suck in imperative code, perhaps most when nice control flow is hard to define in one thread. Let's ignore manual memory management as just spin on the ball making things tougher. (And I haven't been working on my language for a while, but this would be tangentially related.)

    A callback implies you did something async, at least slightly, as if two independent threads did something in parallel, in which case a callback represents a control-flow join. But if there was only one thread in the first place, presumably one control flow terminates, or more typically, they both interleave future behavior somehow. Getting effects to occur in a reasonable order, with reasonable latency and stack depth, can be headache-inducing. It's hard to control in the presence of side effects; mostly-functional dataflow is not as nasty when dataflow acts like queueing to defuse "bunching up" conflicts.

    Worst of all, callbacks are hard to grasp when a language provides no support letting you see structure in what should happen. It's hard to trace, study, or validate patterns and rules written down nowhere. Often the plan describing what must happen exists only in one programmer's mind, typically not expressed in a comment. (Many folks believe "comments are bugs", or else act as if they believe this. I much prefer analyzable redundancy, even at the risk that comments are wrong.)

    Ability for anything to happen represents a form of obfuscation. Callbacks let you run code as soon as appropriate, just as refcounts let you deallocate space as soon as appropriate, and in both cases auditing what actually happens is tough since deviation from plan is hard to see.

    A model I like better than callbacks is event-oriented and/or dataflow oriented, and I can usually turn callbacks into events. (A callback can change state of a waiting caller to say, "Wake up, the callback occurred." And then scheduling goes on as normal instead of handling the callback reaction right then and there.) It does add a bit more latency to handle a callback.
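    For what it's worth, here's a minimal Go sketch of that callback-to-event conversion (the names and the toy asyncOp are made up for illustration): the callback itself only records "wake up, it happened", and the actual reaction runs later inside the ordinary event loop.

        package main

        import "fmt"

        // Event is what gets queued instead of running the callback
        // reaction inline, right then and there.
        type Event struct {
            Name    string
            Payload int
        }

        // asyncOp stands in for a library that insists on a callback.
        func asyncOp(n int, cb func(int)) {
            go cb(n * n) // the callback fires on some other goroutine
        }

        func main() {
            events := make(chan Event, 16)

            // Adapter: the callback does nothing except post an event.
            asyncOp(7, func(result int) {
                events <- Event{Name: "op-done", Payload: result}
            })

            // All reactions happen here, in one place and in a
            // predictable order, at the cost of a little extra latency.
            for ev := range events {
                if ev.Name == "op-done" {
                    fmt.Println("result:", ev.Payload) // 49
                    return
                }
            }
        }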

    Here's why I wrote this post: it would be nice if a language let you declare planned behavior of async systems, perhaps out-of-line with respect to actual code, so you could statically (or dynamically) analyze whether code looked consistent with that plan.

    It's hard to use syntax and grammar to indicate everything you want to say in one place. So maybe additional related annotation somewhere else would help. In fact, your syntax could be very simple, even Lisp-like in simplicity. And then declarations of meta-language features can appear somewhere else, also in simple syntax, but a compiler would know you meant this as annotation of that.

    In principle, nothing stops meta-annotations from being as large as you want, and having their own meta annotations, so you can express layered patterns of rules in metainfo as needed. But instead of a concrete example, consider hand-wavy references to other metadata systems:

    For example, RDF tries to express everything in standardized tuples (usually triples) letting everything be described via properties and literals etc. While I don't like RDF much, because static ontologies are really irritating (because worldview assuming and inducing), its data representation is useful to consider when you think about encompassing arbitrary navigable metadata.

    marginal improvements

    It isn't clear to me that adding an ontology and analysis for declarative async properties will actually save us any complications.

    I think our efforts would be better spent developing a programming model or framework...

    marginal but closing feedback loop

    I agree async code will be just as complex, and time is better spent on models and frameworks, since otherwise metadata would mean nothing without a model to target.

    Step one is making async code work correctly. Steps two and three are debugging and defending async code from reasonable challenges by other developers. (I only mean to say we have multiple steps, not to characterize them better than sloppily.)

    I find myself studying log traces by hand when another dev says, "I don't think your code works when I do this." Doing it by hand seems a mistake. Among other things, it's expensive.

    So an ontology (better yet, the model an ontology describes) is a kind of overhead, like unit tests, intended to help automate checking that actual behavior complies with planned behavior. As soon as you think of automating compliance analysis, you must ask, "Compliance with what?" Tools have no access to ideas not written down.

    I was suggesting a way to get a grip on complications and whether models address them. Maybe that's too subtle early in the game.

    static ontologies are really

    > static ontologies are really irritating (because worldview assuming and inducing)

    I'm no big fan of RDF, but one of its goals is to not depend on any one static worldview. Features like sameAs (oh wait, that's OWL again) are specifically designed for linking different ontologies.

    Nested Meta-info

    If we do a lot of programming in the 'meta-info' layer, I imagine we'll want all the normal facilities in this layer - abstraction, conditionals, modularity, extensibility, distribution, et cetera. At some point, the 'layering' becomes indistinct, though there still may be a clear dependency path - staging rather than layering.

    Summary and search term

    Ok, so I'll try and summarise your answers in slightly different terminology, and then you can tell me if I've understood you correctly or missed some of the details. Your overall system is composed of objects; each object contains some private state and a set of methods that provide an interface. Each object runs inside its own thread / process (where the semantics look like private processes, but the compiler may merge some objects into threads within the same process as an optimisation). The messaging interface between the objects is network-neutral, so that method calls may be marshalled across a socket to another machine. Each thread executes a message despatch loop; this is normally called Event-Based Programming and is naturally asynchronous - I think this may be the search term that you are looking for.

    The part that you want to simplify with language support is the decision of which message to process next. In a synchronous process there would normally be some sort of alternation across the channels that the messages arrive on; if this includes priorities then the semantics can become very complex. Your example used a total ordering for priorities; this may make the problem easier, as you do not have to worry about justice and fairness within equivalent priority levels.
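    As a rough illustration of that despatch loop, here is a Go sketch (the counter object and the two-level priority scheme are assumptions of mine, not anything from OSE or E): the object is a single goroutine owning its private state, and urgent messages are always drained before normal ones.

        package main

        import "fmt"

        // Msg is one request to the object; Reply carries the answer back
        // asynchronously, so the sender never blocks the object's loop.
        type Msg struct {
            Op    string
            Arg   int
            Reply chan int
        }

        func handle(state int, m Msg) int {
            switch m.Op {
            case "add":
                state += m.Arg
            case "get":
                m.Reply <- state
            }
            return state
        }

        // counter is one 'object': private state plus a despatch loop.
        // Two mailboxes stand in for a total ordering of priorities.
        func counter(urgent, normal <-chan Msg) {
            state := 0
            for {
                select {
                case m := <-urgent: // always prefer urgent messages
                    state = handle(state, m)
                default:
                    select {
                    case m := <-urgent:
                        state = handle(state, m)
                    case m := <-normal:
                        state = handle(state, m)
                    }
                }
            }
        }

        func main() {
            urgent, normal := make(chan Msg, 8), make(chan Msg, 8)
            go counter(urgent, normal)

            normal <- Msg{Op: "add", Arg: 5}
            reply := make(chan int, 1)
            normal <- Msg{Op: "get", Reply: reply}
            fmt.Println(<-reply) // 5
        }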

    There are some issues that might be trickier than they first seem. Each object needs some kind of handle or reference to allow other objects to make method calls. If these object references can be passed within messages to other objects, then tracking their lifetimes / validity becomes difficult. If objects can move during execution (e.g. for load-balancing), then associating references with locations and handling routing can be difficult. On the other hand, if object lifetime is static (and thus the object graph within the program is static), then you limit the set of programs that can be written.

    Atomic methods impose some difficulties for the programmer; if I want to write one logical procedure that depends on a series of interactions with other objects, then I need some kind of persistent storage between each step of event processing. If this storage is at the object level, then you need some kind of consistency between separate methods; a very coarse-grained guarantee would be that each method uses a disjoint set of member variables. This is one of the trickier issues for the programmer to track in this style of programming. Although the asynchronous model is closer to the architecture, one could see subsequent processing steps (event despatches) being synthesized from one synchronous block (i.e. a method scope). The transformation to do this is not too complex, as every method call (with a return value) causes a message to be sent and the processing step to stop, with a new event handler for the remainder of the scope.
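    Here is a hypothetical Go sketch of that transformation (the account/balance names are invented): the synchronous block a.balance() + b.balance() is split so that each call needing a return value sends a request, stops, and leaves a continuation in a wait queue for the reply event to resume.

        package main

        import "fmt"

        // reply is the event delivered when an outstanding request answers.
        type reply struct {
            tag   string
            value int
        }

        func main() {
            events := make(chan reply, 4)

            // Stand-in for remote 'balance' calls: they answer asynchronously.
            ask := func(tag string, v int) {
                go func() { events <- reply{tag, v} }()
            }

            // The 'wait queue': per-request continuations holding the state
            // that the synchronous version would have kept on its stack.
            waiting := map[string]func(int){}

            // Step 1 of the original scope: request a.balance().
            ask("a.balance", 40)
            waiting["a.balance"] = func(a int) {
                // Step 2: request b.balance(); 'a' survives in the closure.
                ask("b.balance", 2)
                waiting["b.balance"] = func(b int) {
                    fmt.Println("total:", a+b) // remainder of the scope
                }
            }

            // The object's despatch loop resumes whichever step is waiting.
            for ev := range events {
                k := waiting[ev.tag]
                delete(waiting, ev.tag)
                k(ev.value)
                if len(waiting) == 0 {
                    return
                }
            }
        }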

    I've not used OSE but from writing a lot of code in this style for embedded applications I would say that the trickier part of handling the queue is actually splitting up synchronous operations into smaller asynchronous (event-based) steps. If you have the freedom to design the input language as you like, would that not be a more helpful transformation to automate for the programmer?

    Seems like you grasped my

    Seems like you grasped my design idea.

    As for the lifetime of object references, an object reference will either be alive or dead, and once dead it will never wake up again. The definition of dead is trickier: how long do you wait before deciding it's dead? Timeouts have a tendency to be too short or too long, sometimes even both at the same time...

    The application will then need to worry about a reference going bad. The interesting question here is how much has been done before the connection was broken? Not sure how to best support the programmer in this situation without risking too much latency or bandwidth. E seems to have some kind of checkpointing which I haven't really got my head around yet, but it might be how E solves this problem.

    Object references must be something that can be sent around. There might be some translation as to how the reference is expressed, but that would merely be something for the compiler to care about. As for tracking lifetime and validity: before using a reference you don't know. Once you start using it, there should probably be some kind of lost-connection detection mechanism that raises an exception whenever the connection is lost. There will, however, be a latency between the last successful method call on the object and the moment the caller is notified that the reference has become stale. There may have been several method calls in between, given that none of them produced any return values.
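    A small hypothetical Go sketch of what that looks like from the caller's side (remoteRef and its methods are made up): fire-and-forget calls are just queued, so the staleness of a reference only becomes visible at the next call that needs an answer.

        package main

        import (
            "errors"
            "fmt"
        )

        var ErrStale = errors.New("object reference has gone stale")

        // remoteRef is a toy proxy for a reference to a remote object.
        type remoteRef struct {
            dead bool // set asynchronously by a lost-connection detector
        }

        // notify is a call with no return value: nothing to report yet,
        // even if the link is already broken underneath us.
        func (r *remoteRef) notify(op string) {
            if r.dead {
                return // silently lost; the caller learns about it later
            }
            fmt.Println("queued:", op)
        }

        // ask needs an answer, so only here does staleness become visible.
        func (r *remoteRef) ask(op string) (string, error) {
            if r.dead {
                return "", ErrStale
            }
            return "reply to " + op, nil
        }

        func main() {
            ref := &remoteRef{}
            ref.notify("log") // fine
            ref.dead = true   // the detector notices the link is gone
            ref.notify("log") // lost, with no immediate notice
            if _, err := ref.ask("get"); err != nil {
                fmt.Println("caller finally sees:", err)
            }
        }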

    Moving objects around for load-balancing reasons is another interesting thing. I'm not sure how important this really is; I think you can solve load-balancing without having to move objects. The question is, do we really need to perform load-balancing in the middle of a transaction? If not, then we could have different routes depending on load levels instead, which is something the application itself implements.

    The atomic methods do impose difficulties, but I'm wondering whether this means that I'm just exposing the problems associated with coding applications using these asynchronous threads (truly asynchronous?) or whether it actually adds to the complexity. The thing here is that for each object call, you might get another message while waiting for the answer. This new message might do something that also affects the object's state. One thing to note here is that method calls should be OK inside an atomic method as long as they don't need any return values. So some method calls should be possible, but not all. Oh, this is getting messy...

    I don't think I've ever broken up a synchronous operation into several asynchronous ones; I'm probably not daring enough. ;-) But I have seen the need, though I'd have wanted a more complete redesign to make it decent.

    Forgot to answer your question

    The main protocol that I was thinking of for stateless requests was HTTP, but it may be a bad example as most uses now seem to bodge a huge amount of state into it.

    Isn't there a risk that

    Isn't there a risk that stateless communication results in a lot more bandwidth being used, since fewer assumptions can be made?

    Bandwidth and Stateless Communication

    That can go either way, actually.

    The cost of 'stateless' communication models is that, in the most direct implementation, the whole context must be sent back and forth. On the other hand, this is much more subject to various optimizations - caching, streaming, propagating differences (patches), compression/abstraction, parallel computations.

    Monotonicity, idempotence, and commutativity can further increase the number of valid assumptions. You might read about Consistency Analysis and Logical Monotonicity.

    The only trouble is, this ability to make useful assumptions doesn't offer sufficient performance advantages to offset the costs unless you do support the relevant optimizations. (There are often non-performance advantages that can make up the difference, though.)
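    As a small illustration of the caching point, here is a Go sketch using plain HTTP ETags (the /report handler is made up): because the request carries its whole context, any replica can answer it, and a client or intermediary can validate a cached copy instead of re-downloading the body.

        package main

        import (
            "crypto/sha256"
            "fmt"
            "net/http"
        )

        // A stateless handler: everything needed to answer is in the request,
        // so standard HTTP caching can win back much of the bandwidth spent
        // resending context.
        func report(w http.ResponseWriter, r *http.Request) {
            body := "report for " + r.URL.Query().Get("account")
            etag := fmt.Sprintf("\"%x\"", sha256.Sum256([]byte(body)))

            w.Header().Set("ETag", etag)
            if r.Header.Get("If-None-Match") == etag {
                w.WriteHeader(http.StatusNotModified) // client already has it
                return
            }
            fmt.Fprint(w, body)
        }

        func main() {
            http.HandleFunc("/report", report)
            http.ListenAndServe(":8080", nil)
        }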

    Collection of research papers

    You seem to know a lot of interesting research papers. Do you have any indexed collections of them available on-line? I know there are a number of links scattered around at the erights web site, but it would be helpful to have them collected together.

    Go

    Forgot to add this. Do check out the Go language for an example of a practical concurrent programming language. Its typed channels, and its select statement for choosing which channel to send to or receive from, make concurrent programming very easy. In OSE each thread has one receive queue and everyone sends into that same queue, so its receive() call needs a filter to pick a message from a particular sender or with a particular "signal number". In Go you don't care who holds the other end of a channel, and you can have as many channels as you want. OSE's "link handlers" allow threads even in different nodes to exchange messages, which is very useful. Go is a compiled language, so it would be difficult to build generic "link handlers" to put channel ends in different nodes. Anyway, Go might provide some inspiration!
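    For readers who haven't seen it, a tiny sketch of what that looks like in Go (the channel names are of course made up):

        package main

        import (
            "fmt"
            "time"
        )

        func main() {
            // Two typed channels; in OSE terms these would be two "signal
            // numbers" sharing one queue and needing a filter in receive().
            temperature := make(chan float64)
            shutdown := make(chan struct{})

            go func() {
                for i := 0; i < 3; i++ {
                    temperature <- 20.0 + float64(i)
                    time.Sleep(10 * time.Millisecond)
                }
                close(shutdown)
            }()

            // select picks whichever channel is ready; no filtering, and we
            // never care who holds the other end.
            for {
                select {
                case t := <-temperature:
                    fmt.Println("temperature:", t)
                case <-shutdown:
                    fmt.Println("done")
                    return
                }
            }
        }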