Parallel/Distributed

Actors that Unify Threads and Events

In this paper we present an abstraction of actors that combines the benefits of thread-based and event-based concurrency. Threads support blocking operations such as system I/O, and can be executed on multiple processor cores in parallel. Event-based computation, on the other hand, is more lightweight and scales to large numbers of actors. We also present a set of combinators that allows a flexible composition of these actors.

Philipp Haller and Martin Odersky also wrote Event-Based Programming without Inversion of Control, which was discussed here last year. You may also remember A Language-based Approach to Unifying Events and Threads (Li and Zdancewic).
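The paper itself is about Scala, but the thread-based half of the design space it unifies is easy to sketch in any language with threads and mailboxes. Below is a purely illustrative thread-per-actor counter in Haskell (forkIO plus a Chan as mailbox; none of this comes from the paper). The point of Haller and Odersky's work is to get this programming model without dedicating a native thread, and its stack, to every actor.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)

-- Illustrative sketch only: a thread-per-actor counter, not the paper's
-- Scala library. The actor is a thread that owns its state and serves
-- messages from its mailbox (a Chan) one at a time.
data Msg = Inc | Get (MVar Int)

spawnCounter :: IO (Chan Msg)
spawnCounter = do
  inbox <- newChan
  _ <- forkIO (loop inbox 0)
  return inbox
  where
    loop inbox n = do
      msg <- readChan inbox
      case msg of
        Inc       -> loop inbox (n + 1)
        Get reply -> putMVar reply n >> loop inbox n

main :: IO ()
main = do
  counter <- spawnCounter
  writeChan counter Inc
  writeChan counter Inc
  reply <- newEmptyMVar
  writeChan counter (Get reply)
  print =<< takeMVar reply   -- prints 2
```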

Threads in JavaScript?

Threads in JavaScript? "Over your dead body," says Brendan.

But Neil Mix begs to differ -- they're already there!

Neil's latest blog post presents a cool hack combining JavaScript 1.7's generators with trampolined style to implement very lightweight cooperative threads.

The implementation weighs in at a breathtakingly small 4k.
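Neil's code is JavaScript, but the trampolining trick itself is language-neutral. Here is a rough sketch of the idea in Haskell (mine, not his): each cooperative thread performs one small step and hands back either Done or its continuation, and a tiny scheduler interleaves the continuations round-robin.

```haskell
-- Illustrative sketch of trampolined cooperative threads, not Neil Mix's
-- JavaScript code. One step of a thread is either Done or "here is the rest".
data Step = Done | More (IO Step)

-- A toy thread that yields back to the scheduler after each line it prints.
countdown :: String -> Int -> Step
countdown _    0 = Done
countdown name n = More $ do
  putStrLn (name ++ ": " ++ show n)
  return (countdown name (n - 1))

-- Round-robin scheduler: run one step of the first thread, requeue the rest.
schedule :: [Step] -> IO ()
schedule []              = return ()
schedule (Done   : rest) = schedule rest
schedule (More m : rest) = do
  next <- m
  schedule (rest ++ [next])

main :: IO ()
main = schedule [countdown "ping" 3, countdown "pong" 3]
```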

The Landscape of Parallel Computing Research: A View from Berkeley

(via Tim Bray)

Since real world applications are naturally parallel and hardware is naturally parallel, what we need is a programming model, system software, and a supporting architecture that are naturally parallel. Researchers have the rare opportunity to re-invent these cornerstones of computing, provided they simplify the efficient programming of highly parallel systems.

An interesting survey by a large and multidisciplinary group of Berkeley researchers. I think it is useful background reading for programming language researchers, even if the discussion of programming models is not up to the standards of LtU...

And yes, STM is mentioned (albeit briefly)...

Patrick Logan on Software Transactional Memory

A detailed blog post on STM - and why it is a Bad Thing.

Programming Parallel Algorithms

Programming Parallel Algorithms, Guy Blelloch. 1996.

Many constructs have been suggested for expressing parallelism in programming languages, including fork-and-join constructs, data-parallel constructs, and futures, among others. The question is which of these are most useful for specifying parallel algorithms? If we look at the parallel algorithms that are described in the literature and their pseudocode, we find that nearly all are described as parallel operations over collections of values. For example "in parallel for each vertex in a graph, find its minimum neighbor", or "in parallel for each row in a matrix, sum the row". Of course the algorithms are not this simple--they usually consist of many such parallel calls interleaved with operations that rearrange the order of a collection, and can be called recursively in parallel, as in Quicksort. This ability to operate in parallel over sets of data is often referred to as data-parallelism, and languages based on it are often referred to as data-parallel languages or collection-oriented languages. We note that many parallel languages have data-parallel features in conjunction with other forms of parallelism.

Before we come to the rash conclusion that data-parallel languages are the panacea for programming parallel algorithms, we make a distinction between flat and nested data-parallel languages. In flat data-parallel languages a function can be applied in parallel over a set of values, but the function itself must be sequential. In nested data-parallel languages any function can be applied over a set of values, including parallel functions. For example, the summation of each row of the matrix mentioned above could itself execute in parallel using a tree sum. We claim that the ability to nest parallel calls is critical for expressing algorithms in a way that matches our high-level intuition of how they work. In particular, nested parallelism can be used to implement nested loops and divide-and-conquer algorithms in parallel (five out of the seven algorithms described in this article use nesting in a crucial way).

IIRC, the Fortress team is putting nested data parallelism into the language. I think this is one of the most intriguing ways of getting heavy-duty parallelism -- something that will grow ever more important as the number of cores in our computers grows.
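For readers who want to see the row-summing example in code, here is a rough rendering in present-day GHC Haskell using the parallel package (not NESL, and not from the article): the outer map over the rows is parallel, and each row is itself summed by a parallel divide-and-conquer reduction, which is exactly the kind of nesting Blelloch is arguing for.

```haskell
-- Illustrative sketch only (requires the `parallel` package; compile with
-- -threaded and run with +RTS -N). In practice you would stop sparking
-- below some cutoff size to control granularity.
import Control.Parallel (par, pseq)
import Control.Parallel.Strategies (parMap, rseq)

-- Parallel divide-and-conquer sum of one row: a tree sum.
treeSum :: [Int] -> Int
treeSum []  = 0
treeSum [x] = x
treeSum xs  = left `par` (right `pseq` (left + right))
  where
    (ls, rs) = splitAt (length xs `div` 2) xs
    left     = treeSum ls
    right    = treeSum rs

-- Sum every row in parallel; each row sum is itself parallel (the nesting).
rowSums :: [[Int]] -> [Int]
rowSums = parMap rseq treeSum

main :: IO ()
main = print (rowSums (replicate 8 [1 .. 100000]))
```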

Programming the Greedy CAM Machine

Programming the Greedy CAM Machine. Erik Ruf. January 2007.

The Greedy CAM architecture describes a class of experimental processors that aim to cope with memory latency and enable parallelism by combining a streams-and-kernels execution model with a relational-query-based memory model. This article focuses on the programming abstraction (equivalent to the ISA in more conventional systems) of Greedy CAM systems, as exemplified by a low-level intermediate language. Using a series of small example programs, we demonstrate several programming idioms and analyze their performance using a simple functional-level simulator. We also suggest extensions needed for the implementation of higher-level programming abstractions.

Section 6.8 is on the suitability of LINT, the low-level intermediate language described in the paper, as a target language for the compilation of higher-level abstractions. But comments on the general issues discussed in the paper are welcome as well...

Extending the Multilisp Sponsor Model

Extending the Multilisp Sponsor Model by Randy B. Osborne. 1993.

Speculative computing is a technique to improve the execution time of certain applications by starting some computations before it is known that the computations are required. A speculative computation will eventually become mandatory (i.e. required) or irrelevant (i.e. not required). In the absence of side effects irrelevant computations may be aborted. However, with side effects a computation which is irrelevant for the value it produces may still be relevant for the side effects it performs. One problem that can result is the "relevant synchronization" problem wherein one computation requires some side effect event (a "relevant synchronization") to be performed by another computation, which might be aborted, before the first computation can make progress. Another problem that can arise is the "preemptive delay" problem wherein a computation that will perform some awaited side effect event is preempted by a computation whose importance (e.g. priority) is less than that of computations waiting for the event. In this paper we show how the sponsor model developed for speculative computation in Multilisp can be extended to provide a novel solution to these two problems. The idea is for the computation awaiting some action, such as the production of a value or the release of a semaphore, to sponsor the computation or set of computations that will perform the awaited action. This sponsorship ensures that the awaited action executes, and executes with at least the waiter's level of importance. We show how to apply this technique to solve the above problems for several producer/consumer and semaphore applications. The idea extends naturally to other synchronization mechanisms.

Old news, one may say, but then why does the industry keep ignoring this knowledge (What really happened on Mars? -- Authoritative Account)? See also the author's earlier paper, which, among other things, gives an overview of speculative computation in general: Speculative computation in Multilisp (1990).

Simon Peyton Jones: Beautiful concurrency

I am writing a chapter for a book called "Beautiful code", edited by Greg Wilson. My draft chapter is about Software Transactional Memory in Haskell.

I would welcome any comments or questions you have on the paper, or constructive suggestions for improving it; the more concrete the better.


The book is aimed at a general audience of programmers, not Haskell geeks, so I have tried to explain everything necessary as I go along. So if you are not a Haskell expert, your input would be particularly valuable to me.

You can post your comments on the Haskell wiki.

STM was discussed here many times before, of course. For me the original papers were easier to follow, but some may prefer the style of presentation used here.
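For those who have not seen the Haskell API, here is a small sketch in the spirit of the chapter's bank-account example (my code, not the chapter's). It shows the two things that make STM in Haskell pleasant: atomic blocks compose, and retry lets a transaction block until the variables it read change.

```haskell
import Control.Concurrent.STM

-- Illustrative sketch, not the chapter's code.
type Account = TVar Int

-- Move money atomically. If the source has insufficient funds, retry aborts
-- and blocks the transaction until another transaction updates the account.
transfer :: Account -> Account -> Int -> STM ()
transfer from to amount = do
  balance <- readTVar from
  if balance < amount
    then retry
    else do
      writeTVar from (balance - amount)
      modifyTVar' to (+ amount)

main :: IO ()
main = do
  a <- newTVarIO 100
  b <- newTVarIO 0
  -- Transfers compose: atomically (transfer a b 10 >> transfer b a 5)
  -- would also run as a single atomic transaction.
  atomically (transfer a b 30)
  print =<< atomically ((,) <$> readTVar a <*> readTVar b)   -- (70,30)
```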

Data Parallel Haskell: a status report

Data Parallel Haskell: a status report. Manuel M. T. Chakravarty, Roman Leshchinskiy, Simon Peyton Jones, Gabriele Keller, and Simon Marlow.

We describe the design and current status of our effort to implement the programming model of nested data parallelism into the Glasgow Haskell Compiler. We extended the programming model and its implementation, both of which were first popularised by the NESL language, in terms of expressiveness as well as efficiency of its implementation. Our current aim is to provide a convenient programming environment for SMP parallelism, and especially multicore architectures. Preliminary benchmarks show that we are, at least for some programs, able to achieve good absolute performance and excellent speedups.

NESL was mentioned here in the past a few times, as was data parallelism in general.
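To give a flavour of the surface notation, as I understand it from the paper: parallel arrays are written [: ... :], come with parallel combinators such as sumP and mapP, and support array comprehensions, so nested data parallelism reads much like ordinary list processing. The snippet below is schematic; compiling it required a DPH-enabled GHC of that era with vectorisation turned on.

```haskell
{-# LANGUAGE ParallelArrays #-}
{-# OPTIONS_GHC -fvectorise #-}
-- Schematic only: this needed a DPH-enabled GHC build of the time, and the
-- module and flag names are from my reading of the papers, not verified.
import Data.Array.Parallel

-- Dot product over parallel arrays. The comprehension and sumP are both
-- data-parallel, and the vectoriser is what makes nesting them cheap.
dotp :: [:Double:] -> [:Double:] -> Double
dotp xs ys = sumP [: x * y | x <- xs | y <- ys :]
```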

The Joins Concurrency Library

The Joins Concurrency Library. Claudio Russo.

Comega extended C# 1.x with a simple, declarative and powerful model of concurrency - join patterns - applicable both to multithreaded applications and to the orchestration of asynchronous, event-based distributed applications. With Generics available in C# 2.0, we can now provide join patterns as a library rather than a language feature. The Joins library extends its clients with an embedded, type-safe and mostly declarative language for expressing synchronization patterns. The library has some advantages over Comega: it is language neutral, supporting other languages like Visual Basic; its join patterns are more dynamic, allowing solutions difficult to express with Comega; its code is easy to modify, fostering experimentation.

In the Joins library, the scheduling logic that would be compiled into the corresponding Comega class receives a separate, first-class representation as an object of the special Join class. The Join class provides a mostly declarative, type-safe mechanism for defining thread-safe synchronous and asynchronous communication channels and patterns. Instead of (a)synchronous methods, as in Comega, the communication channels are special delegate values (first-class methods) obtained from a common Join object. Communication and/or synchronization takes place by invoking these delegates, passing arguments and optionally waiting for return values.

The library also supports dynamic joins, which are not available in Comega, implemented by joining arrays of channels (with the size of the array being determined at runtime).

The major weakness of the Joins library is speed, since the static approach of Comega provides more room for optimization (see the brief discussion in sec. 6).
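The library is C#, but the declarative flavour of a join pattern (fire this body once a message is pending on every one of these channels) is easy to convey with Haskell's STM, since reading from an empty TQueue retries: a transaction that reads from two queues fires only when both have a message, and consumes them atomically. This is a sketch of the semantics only, not of Russo's API.

```haskell
import Control.Concurrent.STM

-- Sketch of join-pattern *semantics* only, not the Joins library's C# API:
-- the body runs once a message is pending on both channels, consuming one
-- from each atomically; otherwise readTQueue retries and the transaction
-- blocks until new messages arrive.
joinWith :: TQueue a -> TQueue b -> (a -> b -> IO ()) -> IO ()
joinWith qa qb body = do
  (a, b) <- atomically ((,) <$> readTQueue qa <*> readTQueue qb)
  body a b

main :: IO ()
main = do
  names   <- newTQueueIO
  numbers <- newTQueueIO
  atomically (writeTQueue names "answer")
  atomically (writeTQueue numbers (42 :: Int))
  joinWith names numbers (\n x -> putStrLn (n ++ " = " ++ show x))
```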
