## Programming Parallel Algorithms

*Programming Parallel Algorithms*, Guy Blelloch. Communications of the ACM, 1996.

Many constructs have been suggested for expressing parallelism in programming languages, including fork-and-join constructs, data-parallel constructs, and futures, among others. The question is: which of these are most useful for specifying parallel algorithms? If we look at the parallel algorithms described in the literature and their pseudocode, we find that nearly all are described as parallel operations over collections of values. For example, "in parallel, for each vertex in a graph, find its minimum neighbor", or "in parallel, for each row in a matrix, sum the row". Of course the algorithms are not this simple -- they usually consist of many such parallel calls interleaved with operations that rearrange the order of a collection, and they can be called recursively in parallel, as in Quicksort. This ability to operate in parallel over sets of data is often referred to as data-parallelism, and languages based on it are often referred to as data-parallel languages or collection-oriented languages. We note that many parallel languages have data-parallel features in conjunction with other forms of parallelism.
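The "sum each row" example above can be sketched as a flat data-parallel map: a sequential function applied in parallel across a collection. This is a minimal illustration in Python, with a thread pool standing in for whatever parallel substrate a data-parallel language would compile to; the function and variable names are mine, not from the article.

```python
# Flat data-parallelism: apply a *sequential* function (summing one
# row) in parallel over a collection (the rows of a matrix).
from concurrent.futures import ThreadPoolExecutor

def row_sums(matrix):
    """Sum each row of `matrix`; the per-row sums may run in parallel."""
    with ThreadPoolExecutor() as pool:
        # pool.map applies the sequential `sum` to every row concurrently.
        return list(pool.map(sum, matrix))

print(row_sums([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # [6, 15, 24]
```

Note that the inner function here (`sum`) is strictly sequential -- exactly the restriction that distinguishes flat from nested data-parallelism, discussed next.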

Before we come to the rash conclusion that data-parallel languages are the panacea for programming parallel algorithms, we make a distinction between flat and nested data-parallel languages. In flat data-parallel languages a function can be applied in parallel over a set of values, but the function itself must be sequential. In nested data-parallel languages any function can be applied over a set of values, including parallel functions. For example, the summation of each row of the matrix mentioned above could itself execute in parallel using a tree sum. We claim that the ability to nest parallel calls is critical for expressing algorithms in a way that matches our high-level intuition of how they work. In particular, nested parallelism can be used to implement nested loops and divide-and-conquer algorithms in parallel (five out of the seven algorithms described in this article use nesting in a crucial way).
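The row-sum example can illustrate the nesting the paragraph describes: an outer parallel map over rows where each row is itself summed by a parallel tree sum. The sketch below is sequential Python; the comments mark where a nested data-parallel language would actually fork, and the names are my own.

```python
def tree_sum(xs):
    """Sum a non-empty list by recursive halving.
    In a nested data-parallel language the two halves would be
    summed concurrently: O(n) work, O(log n) depth."""
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    # These two recursive calls are independent and could run in parallel.
    return tree_sum(xs[:mid]) + tree_sum(xs[mid:])

def nested_row_sums(matrix):
    """Outer parallel map over rows, where each element of the map is
    itself a parallel function (tree_sum) -- parallelism inside
    parallelism, which flat data-parallel languages cannot express."""
    return [tree_sum(row) for row in matrix]

print(nested_row_sums([[1, 2, 3, 4], [5, 6, 7, 8]]))  # [10, 26]
```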

IIRC, the team developing Fortress is putting nested data parallelism into Fortress. I think this is one of the most intriguing ways of getting heavy-duty parallelism -- something that will grow ever more important as the number of cores in our computers grows.

### Solid work

I haven't come across NESL before, so the language is quite new to me. The basic definitions of work and time are exactly how most researchers attack the problem (normally as complexity / time-complexity), but tying them into the actual language semantics is a really strong step.
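To make the work/time idea concrete, here is a toy cost calculus in the same spirit: sequential steps add both work and depth, while a parallel step adds the works of its branches but takes only the maximum of their depths. The rules and the exact +1 accounting are my own simplification for illustration, not NESL's actual cost semantics.

```python
def par_cost(branch_costs):
    """Combine (work, depth) pairs for branches that run in parallel:
    total work is the sum, but depth is only the max, plus one step
    for the parallel application itself."""
    work = sum(w for w, d in branch_costs)
    depth = max(d for w, d in branch_costs)
    return work + 1, depth + 1

def tree_sum_cost(n):
    """Work and depth of a balanced tree sum over n elements under
    this toy model: work grows linearly, depth logarithmically."""
    if n == 1:
        return (1, 1)
    w, d = par_cost([tree_sum_cost(n // 2), tree_sum_cost(n - n // 2)])
    return (w + 1, d + 1)  # one extra add to combine the two halves

print(tree_sum_cost(2))        # (4, 3)
print(tree_sum_cost(1024)[1])  # depth 21: logarithmic in n
```

Even this toy version shows the tradeoff the rules let you teach: summing 1024 numbers takes thousands of units of work but only ~21 units of time given enough processors.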

As a language for experimenting with parallel algorithms this looks interesting, but as a language for teaching undergraduates parallel algorithms it looks amazing. Each of the time/work rules associated with the language is simple enough to teach the basic tradeoffs with. This is a very impressive formulation of the existing body of parallel techniques in a language.