Accelerating Haskell Array Codes with Multicore GPUs

So, I stumbled onto a rather pleasant surprise today.

Current GPUs are massively parallel multicore processors optimised for workloads with a large degree of SIMD parallelism. Good performance requires highly idiomatic programs, whose development is work intensive and requires expert knowledge.

To raise the level of abstraction, we propose a domain-specific high-level language of array computations that captures appropriate idioms in the form of collective array operations. We embed this purely functional array language in Haskell with an online code generator for NVIDIA's CUDA GPGPU programming environment. We regard the embedded language's collective array operations as algorithmic skeletons; our code generator instantiates CUDA implementations of those skeletons to execute embedded array programs.

This paper outlines our embedding in Haskell, details the design and implementation of the dynamic code generator, and reports on initial benchmark results. These results indicate that we can compete with moderately optimised native CUDA code, while enabling much simpler source programs.
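
To make the "skeleton" idea from the abstract a bit more concrete, here is a minimal sketch in plain Haskell. It is not the paper's API or its actual code generator: the `Exp` type, `cExp`, and `zipWithKernel` are names I made up for illustration, and it assumes a single element-wise operation over two `float` arrays. The point is only that the kernel structure is a fixed template and the scalar function from the embedded program is spliced into it at run time.

```haskell
-- Hypothetical scalar expression AST for the embedded language.
-- (Illustrative only; the real embedding in the paper is richer.)
data Exp
  = Var String
  | Lit Float
  | Add Exp Exp
  | Mul Exp Exp

-- Render a scalar expression as a C expression string.
cExp :: Exp -> String
cExp (Var v)   = v
cExp (Lit x)   = show x ++ "f"
cExp (Add a b) = "(" ++ cExp a ++ " + " ++ cExp b ++ ")"
cExp (Mul a b) = "(" ++ cExp a ++ " * " ++ cExp b ++ ")"

-- A skeleton for an element-wise zipWith: the surrounding CUDA kernel
-- is fixed, and only the scalar body is instantiated per program.
zipWithKernel :: String -> (Exp -> Exp -> Exp) -> String
zipWithKernel name f = unlines
  [ "extern \"C\" __global__ void " ++ name
      ++ "(const float *xs, const float *ys, float *out, int n)"
  , "{"
  , "    int i = blockIdx.x * blockDim.x + threadIdx.x;"
  , "    if (i < n) out[i] = " ++ cExp (f (Var "xs[i]") (Var "ys[i]")) ++ ";"
  , "}"
  ]
```

In GHCi, `putStr (zipWithKernel "mulAdd" (\x y -> Add (Mul x y) (Lit 1)))` prints a kernel whose body line is `if (i < n) out[i] = ((xs[i] * ys[i]) + 1.0f);`. A real backend would also have to compile, load, and launch the generated source, which this sketch leaves out.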

I've been waiting for this paper for a pretty long time. I've known about the project for a while, but sadly had not seen too much on the implementation side of things. But as of a few weeks ago, it now has a CUDA backend (!!). I'm really excited, as I've been waiting for a nice Haskell GPGPU binding for a while, and now a usable proof of concept is here. What do you all think of it? Any interesting killer apps in mind? I'm personally thinking of some fun music programming stuff. The timing of this is excellent, you see, as the realtime kernel in Ubuntu Lucid is now usable enough to work pretty stably with my Firewire audio interface, so I have started playing around with all the lovely free Linux audio software. Because of this recent revelation, however, I may have to start work on a GPGPU-accelerated version of a synthesizer module for Haskore. But, I digress. Discuss!

This work is very similar to other existing high-level GPU programming projects in terms of what you can express (see their related work); the distinction is often only in what their current targets are (some DirectX version vs. CUDA vs. OpenCL). The rest is largely noise about the choice of embedded DSL strategy (important, but seemingly not that significant relative to using lower-level languages).

BUT... there is an interesting comment when comparing to Nikola: "Nikola does not support generative functions, such as replicate, whose memory requirements can't be statically determined by Nikola's size inference." It's not clear to me how this paper entirely gets around this either (or is it limited, in that the generated content must be related to the contents of some input array?)

Also, given that Manuel Chakravarty and Gabriele Keller are the first authors, it is interesting that they do not use their approach for supporting interesting data types beyond the traditional NESL (well, Fortran?) trick of representing an array of structs as a struct of arrays.
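
For anyone unfamiliar with the array-of-structs to struct-of-arrays trick, here is a small sketch of it in plain Haskell. The `Particle` and `Particles` types and the function names are hypothetical, and lists stand in for the flat device arrays a GPU backend would actually use; the paper's own representation may well differ.

```haskell
-- A hypothetical "struct" with two primitive fields.
data Particle = Particle { mass :: Float, velocity :: Float }
  deriving (Eq, Show)

-- The struct-of-arrays counterpart: one flat sequence per field.
-- (Lists are used here only as a stand-in for unboxed arrays.)
data Particles = Particles { masses :: [Float], velocities :: [Float] }
  deriving (Eq, Show)

-- Convert an array of structs into a struct of arrays.
toSoA :: [Particle] -> Particles
toSoA ps = Particles (map mass ps) (map velocity ps)

-- Convert back, pairing up the fields element by element.
fromSoA :: Particles -> [Particle]
fromSoA (Particles ms vs) = zipWith Particle ms vs

-- A collective operation now only ever touches flat arrays of
-- primitives, which is what a GPU kernel wants to see.
kineticEnergies :: Particles -> [Float]
kineticEnergies (Particles ms vs) =
  zipWith (\m v -> 0.5 * m * v * v) ms vs
```

For example, `kineticEnergies (toSoA [Particle 2 3, Particle 4 1])` evaluates to `[9.0,2.0]`. The limitation being pointed at is that this only covers flat records of primitives, not nested or irregular structures.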



I'm certainly interested in seeing more support for SIMD and GPGPU.