A Language-Based Approach to Unifying Events and Threads


This paper presents a language-based technique to unify two seemingly opposite programming models for building massively concurrent network services: the event-driven model and the multithreaded model. The result is a unified concurrency model providing both thread abstractions and event abstractions.

The implementation uses the CPS monad in such a way that the final result is a trace, that is, an ordered sequence of function calls. Threading is part of the basic monad implementation, and a scheduler is as simple as a tree traversal function over a queue of traces. Once you have a scheduler, events are obvious.
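The idea can be sketched in a few lines. The following is my own illustrative reconstruction, not the paper's actual code: the names `Trace`, `SYS_PRINT`, `sysYield`, and so on are invented for this sketch. A thread is a CPS computation whose result is a lazy trace of "system calls", and the scheduler is just a traversal over a queue of traces:

```haskell
module Main where

import Data.Sequence (Seq, ViewL(..), viewl, (|>), fromList)

-- A trace: the ordered sequence of "system calls" a thread makes.
data Trace
  = SYS_RET                 -- thread finished
  | SYS_PRINT String Trace  -- print, then continue with the rest
  | SYS_YIELD Trace         -- yield to the scheduler, then continue

-- The CPS monad: a computation takes its continuation and builds a trace.
newtype CPS a = CPS { runCPS :: (a -> Trace) -> Trace }

instance Functor CPS where
  fmap f (CPS m) = CPS (\k -> m (k . f))
instance Applicative CPS where
  pure x = CPS (\k -> k x)
  CPS mf <*> CPS mx = CPS (\k -> mf (\f -> mx (k . f)))
instance Monad CPS where
  CPS m >>= f = CPS (\k -> m (\x -> runCPS (f x) k))

sysPrint :: String -> CPS ()
sysPrint s = CPS (\k -> SYS_PRINT s (k ()))

sysYield :: CPS ()
sysYield = CPS (\k -> SYS_YIELD (k ()))

-- Turn a thread into its trace by supplying the final continuation.
toTrace :: CPS () -> Trace
toTrace m = runCPS m (\_ -> SYS_RET)

-- The scheduler: a round-robin traversal over a queue of traces.
-- I/O effects (here, just the strings to print) are performed by the
-- scheduler, not inside the threads.
schedule :: Seq Trace -> [String]
schedule q = case viewl q of
  EmptyL              -> []
  SYS_RET       :< q' -> schedule q'
  SYS_PRINT s t :< q' -> s : schedule (q' |> t)
  SYS_YIELD t   :< q' -> schedule (q' |> t)

-- Two threads written in ordinary sequential style.
threadA, threadB :: CPS ()
threadA = sysPrint "A1" >> sysYield >> sysPrint "A2"
threadB = sysPrint "B1" >> sysYield >> sysPrint "B2"

main :: IO ()
main = mapM_ putStrLn (schedule (fromList [toTrace threadA, toTrace threadB]))
```

Running this interleaves the two threads at their yield points, printing A1, B1, A2, B2: the threads read like sequential code, while the scheduler sees only events.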

This is quite elegant, I'll start using it for my own applications as soon as I get hold of the source.


What is the main result of this paper?

The paper is an interesting read, but I'm missing what is novel about the approach. Figure 3 gives a good overview of what the implementation does. The section in the middle seems to be the interface between the event-based and thread-based representations.

But isn't this what every scheduler implementation provides? The scheduler is normally the interface between the machine and the processes being executed. The interface to the hardware is event-based, from the scheduler's point of view, while the interface provided to processes is thread-based.

It seems obvious that I've missed something in reading the paper, so could somebody else explain where the novelty in the paper is?

Could it be that it allows both representations simultaneously

(I only skimmed the document so...)
The nice aspect is "decomposing" transactionality and using transformation techniques to move from one representation to the other (threads versus events) within a single program space. This allows different parts of your design to work with different representations, and yet they all work together. I am not sure that this paper is arguing this advantage...

Some snippets should illuminate the novelty

Page 8, bottom right:

This event-driven architecture is similar to that in SEDA [25], but our events are finer-grained: instead of requiring the programmer manually decompose a computation into stages and specify what stages can be performed in parallel, this event-driven scheduler automatically decomposes a threaded computation into fine-grained segments separated by system calls. Haskell’s type system ensures that each segment is a purely functional computation without I/O, so such segments can be safely executed in parallel.

Most user-level thread libraries do not take advantage of multiple processors, primarily because synchronization is difficult due to shared state in their implementations. Our event abstraction makes this task easier, because it uses a strongly typed interface in which pure computations and I/O operations are completely separated. In Figure 4, the five steps of lazy thread execution are purely functional: they process an “effect-free” segment of the user thread execution. All I/O operations (including updates to shared memory) are submitted via system calls and are explicitly performed in the scheduler; the programmer has complete control on what I/O operations to run in parallel.

In other words, this library enables you to write code in a straightforward thread-based style, which transparently uses efficient asynchronous I/O à la event loops. You can also provide a custom application-level scheduler written in event-loop style should you need to provide application-specific optimizations. It also transparently multiplexes all of your "user-level"/"application-level" threads over real OS threads for optimal CPU utilization. And it's all type-safe. Sounds pretty good to me. :-)

Process languages

Occam has allowed this at the language level for 20 years. Any of the process-based family of languages has a much tighter integration between the thread-view and the event-view. The underlying implementation is very efficient, so that code that looks threaded to the programmer is scheduled as events. The transputer architecture offered parallelism for programs written in this language.

Of course the process languages never really became mainstream, and maybe it's time for this to be reinvented to see if it catches on the second time around.

One aspect you've quoted that does go beyond the CSP languages is the automatic extraction of parallelism. In Occam the programmer is responsible for explicitly identifying the parallelism in a program. Allowing the scheduler to decompose the program into segments and decide the constraints on their execution is a good step forward.

Web continuations have already been mentioned, but I wonder how this would cope with a loosely coupled cluster as an execution environment? If the segments are relatively long or the dataflow between them is sparse then this could have interesting results.

re: Process languages

Well, when it comes to modern process-based languages there's always Erlang, which is kind of mainstream (at least for some applications). And the KRoC team is working hard at making occam relevant again, by adding various capabilities for mobile channels and mobile processes. OTOH, occam-pi only runs on x86 machines at the moment, so you can't really take advantage of the nifty parallel scalability offered by transputer clusters. But now that the KRoC's transparent networking capability is in place, I imagine that you could do some neat things with dynamically distributed mobile processes.

+1

I have to agree with Andrew Moss here. I feel like the transformation between events and threads via CPS is implicit in, say, all the web continuations stuff. I would love to be proven wrong here though.

+1 naasking

I guess I'll just have to parrot naasking here: the benefits are that the scheduler is user-level and customizable and that the user-visible implementations are simple, simple, simple: typically ~200 lines, and type-safe. Pragmatically, this makes a huge difference. I've lost count of the number of "frameworks" of various kinds I've used where crucial user-visible state-machine pre-conditions, invariants, and post-conditions weren't documented at all, let alone checked, and actually extending anything would break the system at the drop of a hat. Having source code has often been of only modest benefit, with individual functions/methods running into the dozens of screens of densely-intertwined code.

So the main result isn't that events and threads are dual; as the paper points out, that point has been made elsewhere. The main result seems to be that this duality can be exploited in such a way that subsystems that traditionally haven't been user-visible are, that these subsystems can be modified realistically by users of the framework, and that this is due to the leverage offered by the expressive type system of a good language—in this case, GHC with its extensions to Haskell 98.

Pity that web server continuations are patented

Unfortunately, one of the main applications of something like this is a web server. I say unfortunately, because it's patented. http://lambda-the-ultimate.org/node/1365

Paul.

How does this differ from the OGI OSKer/House kernels?

I just downloaded this paper, but my immediate impression is that it is very similar to the work done with the OSKer kernel and House Operating System. However, those systems use a "resumption monad" instead of CPS, which they claim is more limited, but cheaper and easier to reason about.

So, my initial impression is that this paper is slightly more verbose in its naming of events and that it relies on CPS, which may be more powerful, but less intuitive than resumption monads. Am I correct in this, or is there something more here?
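For concreteness, here is my own bare-bones sketch of a resumption monad, not OSKer/House code: a computation is either finished, or paused carrying "the rest of the computation" directly as data, with no explicit continuation argument as in CPS. This is the sense in which resumptions are simpler but less powerful:

```haskell
-- A minimal resumption monad: Done ends with a value; Pause suspends,
-- holding the remainder of the computation.
data Res a = Done a | Pause (Res a)

instance Functor Res where
  fmap f (Done a)  = Done (f a)
  fmap f (Pause r) = Pause (fmap f r)
instance Applicative Res where
  pure = Done
  Done f  <*> r = fmap f r
  Pause g <*> r = Pause (g <*> r)
instance Monad Res where
  Done a  >>= f = f a
  Pause r >>= f = Pause (r >>= f)

-- yield: pause once, then continue.
yield :: Res ()
yield = Pause (Done ())

-- A trivial "scheduler": run to completion, counting the pauses
-- (i.e. the points where a real scheduler could switch threads).
run :: Res a -> (Int, a)
run (Done a)  = (0, a)
run (Pause r) = let (n, a) = run r in (n + 1, a)

main :: IO ()
main = print (run (yield >> yield >> pure "hi"))  -- prints (2,"hi")
```

A real scheduler would interleave several `Res` values by peeling one `Pause` layer off each in turn; the point of the contrast is that the suspension structure is a first-class datum you can inspect, whereas in CPS it is hidden inside a function.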


Cheap (but Functional) Threads: paper.

A Principled Approach to Operating System Construction in Haskell: abstract, with links to the paper.

House webpage: an operating system authored in Haskell. It's been discussed here before.

Userland Threading

Is the approach they are advocating in the paper similar to a user-space thread library where each I/O-related system call is wrapped by the thread library, so that each I/O call yields to the scheduler before calling a blocking I/O function?

Thus the I/O completes in the background while the main loop runs another thread that isn't doing I/O, and the 'blocked' thread resumes not immediately when its I/O call returns but when the scheduler next activates it.

Something like GNU Portable Threads (but with the added async I/O properties)

Sources available

Sources for Unify are available now. You'll need Linux 2.6, libaio, and GHC 6.5 to get the full effect.

Importance of the type-system

They say that Haskell's advanced type-system is a major factor in the ease/clarity of implementation and usage. But the only feature that I can see as crucial is knowing most of the code is purely functional; this allows them to do multithreaded processing safely.

But should purity (no side effects) really be construed as a type-system benefit? It seems one could just use, e.g., a pure subset of Scheme (or some other language with at least one-shot continuations), and only allow side effects in the scheduler code.

[note: this is not a statement about the usefulness of static typing - just about how this system is tied to it]



I also find it ironic that they use a pure functional language to allow an imperative style.

Perhaps this just reflects the necessity of ordering for usable concurrency.

Starting with purity and

Starting with purity and using something akin to a monadic typing discipline ("all effects take place via a monad which shows up in the type of the result values") allows precise control of which side-effects are possible when. So being able to mix purity and impurity effectively without blurring the boundaries is indeed a type system benefit.
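A tiny illustration of that point (my own example, with invented function names): with effects confined to the IO monad, the type alone tells you which code can perform I/O, and hence which segments are safe to run in parallel.

```haskell
-- No IO in the type: this code cannot perform I/O, so the scheduler
-- may safely evaluate it in parallel with other pure segments.
pureSegment :: Int -> Int
pureSegment n = n * 2

-- IO in the type: this code may perform I/O, so the scheduler must
-- decide explicitly when and where to run it.
effectfulSegment :: Int -> IO Int
effectfulSegment n = do
  putStrLn ("processing " ++ show n)
  return (pureSegment n)

main :: IO ()
main = effectfulSegment 21 >>= print
```

In an untyped (or dynamically typed) pure language the same discipline is possible by convention, but here the boundary is machine-checked.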

How does yield work?

I found this article very interesting...

It seems to me that the purely functional code that lies "between" the system calls consists of lazy thunks, which get evaluated when a system call requires their values. Is this right?

If so, how would the yield primitive work? What forces the evaluation of the pure code that lies between a pair of yield syscalls in the trace stream?

the scheduler I'd think.

I'd think it'd be the scheduler that forces the evaluation.
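A guess at the mechanism, as a toy sketch (illustrative only, not the paper's code): the trace is a lazy data structure, so the scheduler's pattern match on the next constructor is precisely what forces the pure segment that computes it.

```haskell
import Debug.Trace (trace)  -- prints a message when a thunk is forced

data T = Yield T | Ret
  deriving (Eq, Show)

-- The segment after the first yield is a lazy thunk; Debug.Trace lets
-- us observe when it actually gets evaluated.
thread :: T
thread = Yield (trace "segment forced" Ret)

-- One scheduler step: matching on Yield forces only the outermost
-- constructor; the rest stays an unevaluated thunk until the next step.
step :: T -> Maybe T
step (Yield rest) = Just rest
step Ret          = Nothing

main :: IO ()
main = case step thread of
  Just rest -> case step rest of  -- this second match forces the segment:
    Nothing -> putStrLn "done"    -- "segment forced" appears here
    Just _  -> putStrLn "more"
  Nothing   -> putStrLn "empty"
```

So the pure code between two yields runs lazily, on demand, exactly when the scheduler inspects that part of the trace.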

Continuation-Value duality

Every time I see this paper or look at it, I immediately think of Olin Shivers's Continuations and Threads: Expressing Machine Concurrency Directly in Advanced Languages (PS), and that, essentially, down at the bottom, the continuation-value duality (or the duality induced by the not operator) is where Lauer and Needham's duality comes from. I have the beginnings of some half-baked ideas on how.

[edit]
Hmm, this line:

The relationship between threads and events can be made explicit using a continuation-passing style (CPS) translation [3].

suggests Andrew Appel already did this in Compiling with Continuations.

Paper has moved

Now here.

Paper missing

Seems that link is now also dead.

I found an older (?) paper here:

A Language-based Approach to Unifying Events and Threads