
BSGP: bulk-synchronous GPU programming

A SIGGRAPH paper by Qiming Hou, Kun Zhou, and Baining Guo. Abstract:

We present BSGP, a new programming language for general purpose computation on the GPU. A BSGP program looks much the same as a sequential C program. Programmers only need to supply a bare minimum of extra information to describe parallel processing on GPUs. As a result, BSGP programs are easy to read, write, and maintain. Moreover, the ease of programming does not come at the cost of performance. A well-designed BSGP compiler converts BSGP programs to kernels and combines them using optimally allocated temporary streams. In our benchmark, BSGP programs achieve similar or better performance than well-optimized CUDA programs, while the source code complexity and programming time are significantly reduced. To test BSGP's code efficiency and ease of programming, we implemented a variety of GPU applications, including a highly sophisticated X3D parser that would be extremely difficult to develop with existing GPU programming languages.

The language aims to simplify CUDA programming, which reminds me of assembly code even though it uses C syntax. Among other things, BSGP offers a higher-level memory model and implicit data flow, so you don't have to explicitly partition your code between different kernels. Here is one trick that really impressed me:

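findFaces(int* pf, int* hd, int* ib, int n) {
  // one thread per triangle corner
  spawn(n*3) {
    rk = thread.rank;
    f = rk/3;     // face id
    v = ib[rk];   // vertex id
    // re-rank the threads by vertex id
    thread.sortby(v);
    // allocate a temporary list
    require
      owner = dtempnew[n]int;
    rk = thread.rank;   // rank after sorting
    pf[rk] = f;
    owner[rk] = v;
    // wait until every thread has written owner
    barrier;
    // the first thread of each vertex group records the group start
    if (rk == 0 || owner[rk-1] != v)
      hd[v] = rk;
  }
}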

After the call to sortby, the threads are re-ranked according to their values of v, so reading thread.rank again gives each thread its position in the sorted order. Nothing is sorted explicitly: there is no list or other auxiliary data structure that would have to be allocated in memory just to hold the ordering. In other words, the call forces a reality where all the threads are coincidentally arranged in the way we want them to be... an interesting PL concept.
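To make the effect of sortby concrete, here is a sequential Python sketch of what the kernel above appears to compute. It is an emulation under assumptions, not BSGP itself: the name find_faces, the stable sort, the dict used for hd, and sizing owner at one entry per thread are illustrative choices, not something the paper specifies.

```python
def find_faces(ib, n):
    """Sequentially emulate the findFaces kernel.

    ib is a flat index buffer with 3 vertex ids per triangle (length n*3).
    Returns (pf, hd): pf holds face ids grouped by vertex, and hd[v] is the
    position in pf where the group for vertex v starts.
    """
    # One "thread" per triangle corner; its initial rank is its index in ib.
    old_ranks = range(n * 3)

    # Stand-in for thread.sortby(v): order the threads by vertex id.
    # Python's sorted() is stable; whether sortby is stable is an assumption.
    order = sorted(old_ranks, key=lambda rk: ib[rk])

    pf = [0] * (n * 3)
    owner = [0] * (n * 3)  # assumed size: one entry per thread
    for new_rk, old_rk in enumerate(order):
        pf[new_rk] = old_rk // 3   # f was computed from the rank before sorting
        owner[new_rk] = ib[old_rk]

    # Everything after the barrier: the first thread in each run of equal
    # vertex ids records where that run starts.
    hd = {}
    for rk in range(n * 3):
        v = owner[rk]
        if rk == 0 or owner[rk - 1] != v:
            hd[v] = rk
    return pf, hd
```

For two triangles sharing an edge, find_faces([0, 1, 2, 2, 1, 3], 2) groups the corner entries by vertex, so the faces touching vertex v start at pf[hd[v]]. The sequential version needs the explicit order list that BSGP lets the programmer leave implicit.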

ChucK: A strongly-timed language

Hi all,

I've been lurking here for a while - but this is my first post - so I hope I've got the formatting right!

I thought perhaps this might be of interest to the community here:

ChucK : Strongly-timed, Concurrent, and On-the-fly Audio Programming Language

From the language specification:

"ChucK is a strongly-timed language, meaning that time is fundamentally embedded in the language. ChucK allows the programmer to explicitly reason about time from the code. This gives extremely flexible and precise control over time and (therefore) sound synthesis."

The idea of a programming language explicitly supporting static reasoning about how much real time a procedure requires (as opposed to its O(n) time complexity) seems new to me. It is potentially of significant value in areas well outside ChucK's domain of audio processing, e.g. real-time systems in general, and possibly other application areas as well.
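One way to read "time is fundamentally embedded in the language" is that code runs against a logical clock that only moves when the program says so, as with ChucK's `dur => now`. The Python sketch below illustrates that reading with generators standing in for concurrent shreds. It is not how ChucK is implemented; Clock, beeper, and run are hypothetical names, and durations are plain integers (think samples).

```python
import heapq


class Clock:
    """Shared logical clock; `now` changes only when the scheduler moves it."""

    def __init__(self):
        self.now = 0


def beeper(clock, log, name, period):
    """A toy shred: record the current logical time, then advance by `period`."""
    while True:
        log.append((clock.now, name))
        yield period  # loosely analogous to `period => now;` in ChucK


def run(clock, shreds, until):
    """Resume shreds in logical-time order until `until` is reached."""
    # Queue entries are (wake_time, index, shred); the unique index breaks
    # ties deterministically, so the generators are never compared.
    queue = [(0, i, s) for i, s in enumerate(shreds)]
    heapq.heapify(queue)
    while queue and queue[0][0] < until:
        wake, i, shred = heapq.heappop(queue)
        clock.now = wake  # time jumps straight to the next wake-up
        try:
            delay = next(shred)
        except StopIteration:
            continue  # the shred finished; drop it
        heapq.heappush(queue, (wake + delay, i, shred))
```

Running two beepers with periods 2 and 3 until time 7 logs events at exactly the times the code asked for, regardless of how long the host machine takes to execute each step. That determinism is what makes timing something you can read off the source.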

Your thoughts appreciated!

Implementation of Cardelli and Daan Leijen Style Record Systems?

I've been reading through "Extensible records with scoped labels", and I'll soon be rereading Cardelli's "Operations on Records" (should be on Citeseer).

My question is simple: are there language implementations, papers, web pages, or lore (anything?) on analyzing, optimizing, and thus efficiently implementing the underlying machine data structures and record operators behind these elegant record systems?
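For reference, the three basic operations in the scoped-labels system (extension, selection, restriction) can be modelled naively as below. This is only a semantic sketch in Python, with a hypothetical Record class backed by a tuple of pairs. Every operation is linear in the number of fields, which is exactly the cost an efficient implementation would want to avoid.

```python
class Record:
    """Immutable record in which a label may occur more than once."""

    def __init__(self, fields=()):
        # (label, value) pairs, most recently added first.
        self._fields = tuple(fields)

    def extend(self, label, value):
        """Add a field; an existing field with the same label is shadowed."""
        return Record(((label, value),) + self._fields)

    def select(self, label):
        """Return the value of the most recently added field named `label`."""
        for name, value in self._fields:
            if name == label:
                return value
        raise KeyError(label)

    def restrict(self, label):
        """Remove the most recent field named `label`, exposing any older one."""
        for i, (name, _) in enumerate(self._fields):
            if name == label:
                return Record(self._fields[:i] + self._fields[i + 1:])
        raise KeyError(label)
```

For example, Record().extend("x", 1).extend("x", 2) selects 2 for "x", and restricting "x" once brings the 1 back into scope.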

And while I'm asking, are any other LTU'ers interested in these record systems, and do you have any insight to share?

Mucho thanks.

Scott