Can javascript concurrency be expressed as a monad?

It is easy to find questionable aspects of the javascript language, but there is at least one feature that it gets right: the composable concurrency model. The built in event loop and the continuation passing style allows completely different IO stacks, or many instances of the same stack to be easily composed in nodejs. Since every IO is asynchronous, the javascript programmer is forced to do the hard work and write code that fires as much asynchronous IO operation concurrently as possible, and to join the results in callbacks. This is not possible in a threaded programming language, since that encourages the programmer to write serial code, and the same holds if you use monads in Haskell which serializes all calls for you.

Here is an example application: suppose you want to traverse a large tree (directory structure or tree stored in a database) and print it out in a depth first fashion (or as an XML file with embedded child objects). If you use monads, then you will issue IO operations sequentially: first get the directory of the root, then the directory of the first child, then the directory of the first child of the first child, etc. However, the IO operations are slow, so you want to fire them concurrently. If you do this blindly, then you effectively do breadth first search as you issue IO operations, but then you have to store the results in memory to print it out in depth first, which will explode the memory overhead. Clearly, the best is some combination (use depth first search to drive the traversal, but at any point you issue extra queries that are expected to come soon).

Is it possible to write this depth first search concurrently in Haskell? The best would be to have a "handle" for file IO, or for socket IO, which you can use to serialize IO operations when necessary and execute IO parallel when necessary. I would like to execute operations A and B concurrently, but force the console IO to be serialized, however allow e.g. read only file IO to be parallel. I do not want to solve this problem by serializing the execution of A and B, only their console IO needs to be serialized. Is this possible?

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

I'm not sure but there might

I'm not sure but there might be some confusion... I've always thought of the intent of the monadic forms to have more of a static constraint dimension (at parsing- and type check-time) which, by very design/construction, means to induce a sequential chaining of the operations (once type checked), later at runtime.

(and yeah, then some debates may ensue on where compile time "ends" and runtime "starts", for some language processors/execution environments, okay)

I haven't seen anywhere yet where one of the "best" (or, say, "typical"/intended) use cases for monadic forms is claimed to be for parallelizing computations held in the monad's sub-terms.

For another related "parallel" / analogy (er... no pun intended!) :

if you want to ensure the serialization of threads re: a specific portion of code, you'll probably be using a statically scope critical section, reified in the language, via e.g., "lock(...) { ... }" in C#, but if you want to have them threads meet/join, well... you'll use "Thread.Join()" instead (for the same language).

Which are two, both short, but much different syntactical forms that by design / by construction aims to make their usage convey the distinct intents and their targeted effects very explicit at compile time, for what will happen later at runtime.

Or maybe I missed your point.

I think there are a few

I think there are a few different possibilities here. If we use the monad-as-semicolon analogy, then we can insert arbitrary computation in-between our actions (in the case of (2) below, we insert polling):

1) Request => Response => Request => Response => Request ....
This is the synchronous, serial version which can be easily implemented within the IO monad.
2) Request => Request => ... => Response => Response => ...
This is still a serial version, but is now asynchronous and non-deterministic. We split our processing function into a request-maker and a response-handler, with some kind of preserved ID to be able to associate response messages with their handler. Our monad's bind function (>>=) chains together (request,handler) pairs. The first thing >>= does is poll the input channel (with a small timeout) for responses, passing any it finds on to their handlers, which it keeps in a Map; then it dumps the next handler into its map and fires the request. In this way, we're still serial, but we can rattle through requests as we find them, and only when responses are received do we handle them. We can get away with this non-determinism whilst still being referentially transparent, since our monad relies on non-deterministic input coming from the World (whether a response has been received yet or not); given exactly the same timing, we'd get the same sequence of actions.
NOTE: We would need to add a "cleanup" mechanism too, which keeps doing 'empty requests' (eg. Nothings) until the callback map is empty (ie. all responses have been received), or some other criterion is met (like a global timeout).
3) [Request] => [Response]
This is the parallel version. I imagine we can do the same trick here that we did for (2), to make the code pure (referentially transparent), but parameterised by some real-world timings. In fact, we may be able to map-reduce the whole thing by firing off a synchronous request/response in its own thread and using MVars and the like to fold together the results.

I may be completely off here, BTW ;)

Re: I think there are a few

Yes, 1) works fine with the IO monad, but it forces all computations to be synchronous not just the IO.

2) I see what you are doing, so essentially you use a state monad to keep track of the outstanding response handlers and fire them when they arrive. However, you need to put the whole thing into a loop to wait for the completion of all requests. Is this a composable solution? Could I run two such request chains concurrently?

3) The completely parallel version cannot work, since I want to send new requests in response to the results I obtain from previous requests (see the directory tree analogy).

In node.js everything seem to work because the event pump is part of the runtime. Can these asyncronous IO operations somehow implemented in a way that uses a single thread and is composable? Maybe I did not understand properly your solution 2).

I'm pretty new to functional

I'm pretty new to functional programming in general, and haven't got an intuitive feel for the really high-level stuff yet (eg. monad transformers and the like), so at first glance I can't think of a generic way to compose (2) with arbitrary computations (although you may be able to make the requests/responses implementation generic enough to shoehorn any particular application into).

It may be possible to do this with arrows; the Arrowlets library[1] gives a nice API to events in Javascript, eg. SignalA and EventA. It may be possible to use a similar arrow library in Haskell[2] to access an event loop implementation (eg. a wrapper around solution (2)).

[1] http://www.cs.umd.edu/projects/PL/arrowlets/index.xhtml
[2] http://www.haskell.org/haskellwiki/Arrow

Threads, events, continuations

The relationship between (logical) threads, evens, and continuations seems to be what you're getting at. Here is the first thing that popped into my head on the subject