Asynchronous Programming, Monads and Continuations in C#, F# and Scala

Anders Heljsberg presented new asynchronous programming constructs for C# at PDC10. There is documentation, videos and a technology preview download.

When your user interface is unresponsive or your server doesn’t scale, chances are you need your code to be more asynchronous. With today’s .NET Framework and language features, though, that is easier said than done.
The Microsoft Visual Studio Async CTP proposes a new language feature in C# and VB, and a new framework pattern to go with it, that will make asynchronous programming similar to – and about as straightforward as –synchronous programming.

How is this related to F#'s asynchronous workflows, and monads in general?

We describe the asynchronous programming model in F#, and its applications to reactive, parallel and concurrent programming. The key feature combines a core language with a non-blocking modality to author lightweight asynchronous tasks, where the modality has control flow constructs that are syntactically a superset of the core language and are given an asynchronous semantic interpretation. This allows smooth transitions between synchronous and asynchronous code and eliminates callback-style treatments of inversion of control, without disturbing the foundation of CPU-intensive programming that allows F# to interoperate smoothly and compile efficiently.

What's the connection to Scala's delimited continuations by type directed CPS transform?

We describe the implementation of first-class polymorphic delimited continuations in the programming language Scala. We use Scala’s pluggable typing architecture to implement a simple type and effect system, which discriminates expressions with control effects from those without and accurately tracks answer type modification incurred by control effects. To tackle the problem of implementing first-class continuations under the adverse conditions brought upon by the Java VM, we employ a selective CPS transform, which is driven entirely by effect-annotated types and leaves pure code in direct style. Benchmarks indicate that this high-level approach performs competitively.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.


A Language-Based Approach to Unifying Events and Threads - leverages monads and laziness in Haskell.

E language's 'when' construct.

It's nice to see these features inside a 'mainstream' language.

It's sad that Haskell still doesn't qualify. :(


When your user interface is unresponsive or your server doesn’t scale, chances are you need your code to be more asynchronous.

This paper focuses on 'wait'-related asynchrony, such as network IO. Most applications I've worked with perform a rough mix of waits and high-expense 'calculations'. The expensive calculations might be rendering a scene, planning a chess move, etc.

Even with the obvious additions of parallelism and preemption to help with the calculations, responsiveness can suffer when a few such calculations are active. For example, assume preemption is 20 milliseconds (the default value for GHC's green threads as of version 6.12). The expensive 'calculation' threads or 'par' sparks will tend to run for their whole time slice for each time slice until completion. Meanwhile, the UI threads tend to run in short bursts, running several asynchronous tasks before awaiting further input - and thus waiting for a future time slice.

In these circumstances, responsiveness can suffer greatly. Our UI thread might run every 20-60 milliseconds, depending upon the whims of the scheduler. It runs just a few milliseconds, then waits. A complete response to a user action might involve a sequence of these short 'bursts', and thus (under load) the user suffers perhaps (40 milliseconds) * (N incremental UI-thread 'steps') before users observing a response.

The result is often unacceptable responsiveness while under load - which, as an indirect consequence, encourages developers to grossly underutilize their CPUs.

There are ways around the problem - priority management, for example - but those techniques often remain unsatisfactory. For high responsiveness, I feel that developers need more than to simply introduce asynchronous behavior. We also need effective control over the temporal semantics of our programs. I'd like to have effective control over the balance of responsiveness and throughput, rather than leave that balance to whims of a relatively uninformed scheduler.

This means we must be explicit about when events happen. We need to balance a rather nasty problem: a large calculation must often occur before a deadline in order to achieve communications-effects (side-effects, FFI) within a temporal window. For calculations, the penalty for computing a value too early is at worst a little memory overhead. For FFI, an event happening too early can easily mean an incorrect program, or at least a less predictable program (a challenge for software verification and reuse). Anyhow, this means figuring out which calculations are needed ahead of performing the IO. Additionally, we likely need to adapt priorities based upon which results are needed soonest, and we need to adapt priorities for which threads are performing the IO vs. which are performing expensive calculations.

Within Haskell, I'm experimenting with incremental folds (including incremental groups of 'par' sparks) in a clocked event-loop monad as a basis for managing large calculations. I might compute a different amount of data each logical millisecond, based upon relative deadlines and adapting to avoid time-leaks (where logical time for the event-loop falls behind real-time). But, while this is a little more convenient than using the raw monadic interface, this still requires some ugly maintenance overhead by the developer to specify which calculations to perform ahead of time and express them incrementally. (Fortunately, Haskell is quite flexible. Use of monad transformers can serve as an intermediate language between threading and event loops. I've yet to give that a try, but I do think it will help.)

I'm increasingly convinced

I'm increasingly convinced that the challenge is better met at the system level: notions like responsiveness and data quality are subjective so typical optimization criteria should be shifted. E.g., in my field, we need to optimize for perceived response time, not actual full computation time (which may happen some time after). Similarly, for many map/reduce scenarios, the full map reduce is irrelevant; a continuous/online variant is sufficient (think of watching a statistic converge). In a sense, the controls you are advocating for here (and I often do as well!) seem like a rather cumbersome and low-level approach to what may be much simpler to do by system-level rejiggering. Unfortunately, I have no idea how to leverage languages to express these needs and capitalize on them.

Anyways, back to the main topic: in related news, our friends at Asana (who previously brought us Lunascript) now bring (native) fibers to the V8 JavaScript engine in order to simplify serverside scripting. This is less crufty than the typical source-to-source rewriting approaches.

Improving Values

I agree that much also can be done at the architectural level.

Improving values offer an imprecise or incomplete answer right away then having it incrementally 'improve' - grow more precise and accurate - over time. At 20 milliseconds you might know the answer is between 0.27 and 0.83. At 100 milliseconds, you might know it is between 0.56 and 0.59. These 'improving' values could easily be used to render a scene or drive a robotic arm. Generally, you'll want improving values to be 'stable'. For example, you'd favor a Verlet integration rather than Euler's method.

I agree this is a useful path to take to improve response times, and seems a natural fit for concurrent constraint programming (where improving incomplete answers is the whole basis of computation), or continuous functions (where you have nice stability guarantees that ∀a.limx→af(x) = f(a)).

But I don't believe this a substitute for the lower level control over temporal semantics. Rather, the two techniques quite complement one another, especially in reactive programming.

So you're "increasingly convinced", eh? Is that a stable function? ;)

"Task" vs future?

I must be missing something. From looking at the whitepaper, this very much looks like futures, but implemented via something coroutine'ish, thus the complicated CPS transform. But why not just use futures+threads and avoid all the compiler complication and language adhocery?

Threads are expensive on

Threads are expensive on .NET. What I think may be interesting is that you may be able to do something similar for other monads than the Async monad. For example you could say:

async List<int> Foo(){
  a = await [1,2,3]
  b = await [4,5,6]
  return a+b

=> [1+4,1+5,...,3+6]

This makes me wonder why the monad notation in Haskell and F# doesn't have an await-like operator, and instead requires you to insert monadic lets for nested expressions? For example if await is spelled @, and a and b are sequences:

seq { @a + @b }

Would be the same as:

seq {
  let! the_a = a
  let! the_b = b
  return the_a + the_b 

Or in haskell land:

do @a + @b


do the_a <- a
   the_b <- b
   return a + b

Why not fix that?

Threads are expensive on .NET.

Well OK, frankly, then why not fix that first? If building continuation closures is cheap enough, a VM thread shouldn't be more expensive.

Interop with native code and

Interop with native code and thread affinity outside the VM's purview complicates things greatly in practice. The thread you're running on may be the UI thread, called by OS indirectly via the message loop; and it may, in the general case, need to call out to third-party code that relies on running on the UI thread for correctness. It will also likely expect to have a decent stack address space allocated to it (committed on demand by touching the last committed page). The address space reserved for stack is probably the most costly thing about threads on Win32 (as opposed to Win64).

Since a large part of MS's motivation for using C# (and previously, extending J#) is better interoperation with the underlying platform chiefly by making it easier to call into unmanaged code, they would have a particularly hard time breaking the implied linkage between a VM thread and an OS thread.

Not that that isn't done; it is done in SQL Server hosting the CLR, which as far as I'm aware, largely uses its own scheduler. But software running in the context of the database is much more locked down than general purpose code.


Interesting, thanks for the explanation. I'm not entirely convinced, though. :)

I don't know the thread implementation of dot-net, but one obvious approach to VM threads is to multiplex them onto a (potentially dynamic) pool of OS threads. When you call out to the OS or other unsafe code, you "own" the respective OS thread for the duration of that call, so the problem you describe shouldn't be too hard an issue.

The trouble is that

The trouble is that third-party code may have its own thread affinity with thread-local storage. If you make two separate calls to a third-party library from the same VM thread, and the third-party library has thread affinity, it will be very confused to be reentered on a different OS thread.

Basically, any interop which makes the code calling from the VM look different to traditional (i.e. C level) code is going to cause compatibility problems, and compatibility is of extremely high importance - it's really quite hard to overstate - in a non-academic context.

.NET Tasks are indeed like

.NET Tasks are indeed like futures. As for why not use threads, .NET threads are indeed rather heavy, though not as heavy as OS threads. I think the more important issue though, is that threads implicitly keep references to a lot of state that won't be used in the future (on the stack). I would think this directed CPS transform allows the developer to more precisely specify what he's keeping around. I'm not sure how much this amounts to, but it could be significant for I/O bound programs.

Edit: not to mention, serializing continuations is possible, where serializing suspended threads, not so much.

SaaS and Split and Join

Looks to me they just (rightly) tackled a programming problem which is expected to occur in SaaS applications often: How to deal with splits and joins in programs which use and offer network services. Since that will be more about dealing with expected (network) latency and therefore exploiting possible concurrency in the program while coupling multiple services, they just tossed in a set of keywords as syntactic sugar for something which otherwise would have been solved with explicit worker threads and a cumbersome manner of joining their results when they become available.

In other words, it's syntactic sugaring which enables the `dumb' programmer to write SaaS applications more trivial - looks likes it's less about exploiting hardware parallelism.

So, you're right to question their motives for throwing in this particular syntactic extensions (and not others) into C#. To me, the conclusion is that Microsoft, at the moment, is looking more into how to build a platform which will dominate the server application programming language (SaaS) market, than looking into building a platform which will dominate the systems applications market.

(In other words, this is just about [providing solutions] where the money is [and futures might just have been dissed as too complex for the average Joe programmer].)

[ Also. You stated "why not just use futures+threads and avoid all the compiler complication and language adhocery?" In short, this is futures and threads and syntactic sugar by CPS transform for dealing with the 'split' and 'join' points which one would otherwise need to provide in an elaborate and error-prone manner. It is certainly not a bad solution. ]

Re: just use futures+threads

But why not just use futures+threads and avoid all the compiler complication and language adhocery?

I understand your sentiment. If they had fine-grained threading to start with, they wouldn't need these hacks to represent fine-grained threading as a CPS transform. Or would they?

I've begun to favor event-loop, turn-based threading models, even despite the apparent duplication of effort. Preemptive threads seem to have too many responsibilities: parallelism and concurrency, at the same time. When we split concurrency from parallelism, it becomes easier to reason about state management, progress, latencies, and failure modes.

I'm currently implementing an E-like 'vat' model in Haskell, specialized for my temporal reactive demand programming (RDP) paradigm. I previously tried threads + STM, but threads too often competed to manage reactive updates, would wait upon one another or undo work making it difficult to reason about system performance (latency, efficiency, parallelism). Similarly, I was disappointed by threads + pervasive MVars, which is quite close to threads + futures. But a vat concept greatly simplifies the reasoning. Every object is maintained by exactly one vat, vats don't compete and never wait upon one another (wait-free concurrency, which gives me local reasoning about progress) and vats are internally deterministic - allowing them to serve as independently verifiable packages (modules, services, application instances). I can still support considerable parallelism between vats or (via Haskell sparks) within vats.

But it is painful (syntactically) to explicitly use event-loop or callback-based concurrency. Much of the paper was describing various ways in which this is painful. That motivates providing an intermediate layer (perhaps a monad transform or a GADT) that can represent more convenient models atop the event-loops. This 'intermediate layer' is the role I see these "tasks" as fulfilling, which is why I mentioned the unifying threads and events paper in the first post.

Threads+futures is simplistic - inadequate for the sort of reasoning I'd like to support in RDP, and certainly inadequate for many other applications. This 'worse-is-better' language adhocery is needed because threading was the wrong model in the first place.