Scan-based Data Parallel Languages
Given the resurgence of interest in data parallel languages due to ubiquity of GPUs in laptops and now even phones, as well as the upcoming AVX extensions to CPU short fixed-length SSE instructions, I thought there'd be interest in the following papers I've stumbled upon by Guy Blelloch.
First, scan primitives for vector computers, including segmented ones, can be efficiently implemented. Next, we can start to consider scans as primitive parallel operations in the sense of complexity models: each scan takes one step on an idealized vector machine.
Once you have constant time permutes, scatters, and gathers, you can do some interesting tricks. For example, for invertible recursive tree computations (lenses anybody?), we can compute them in O(1)! For some fun examples, see 5.5 of vector models for data-parallel computing, showing the trick for tree sums. (It references Leiserson's work on contractions).
When reading this work, it is worth thinking about hardware trends: work efficiency is important (parallelism is often important due to the power wall), the cost of indexed vector operations (contiguous sequential access may be better than indexed parallel), and the limited and fixed sizes of many vector units (e.g,, SSE). Finally, just like for memory, irregularity in the computation -- e.g., instruction divergence -- often makes the theoretical models hard to apply without losing to constant overheads. Indeed, aligning trends in CPUs and GPUs with these models seems tricky for the practicing parallel programmer, leading to studies showing CPU versions of GPU programs are often comparable.
Finally, I've found it interesting that the work of Keller et al in Data Parallel Haskell and GPU spin-offs, while following the NESL transformation into scan primitives model, does not appear to do so in the advised manner for irregular structures like trees. I've been increasingly wondering why.
Active forum topics
New forum topics