BSGP: bulk-synchronous GPU programming

A SIGGRAPH paper by Qiming Hou, Kun Zhou, and Baining Guo.
The language aims to simplify CUDA, which reminds me of assembly code even though it uses C syntax. Among other things, BSGP offers a higher-level memory model and implicit data flow, so you don't have to explicitly partition your code between different kernels. Here is one trick that really impressed me:

    findFaces(int* pf, int* hd, int* ib, int n) {
      spawn(n*3) {
        rk = thread.rank;
        f = rk/3;    // face id: three indices per face
        v = ib[rk];  // vertex id read from the index buffer
        thread.sortby(v);
        require
          owner = dtempnew[n]int;
        rk = thread.rank;  // rank after the sort
        pf[rk] = f;
        owner[rk] = v;
        barrier;
        if (rk == 0 || owner[rk-1] != v)
          hd[v] = rk;
      }
    }

After the call to sortby, the threads are reordered so that their ranks follow the values of v. Nothing in the source explicitly sorts a list or some other auxiliary data structure that would have to be allocated in memory. In other words, the call forces a reality where all the threads are coincidentally arranged in the way we want them to be. An interesting PL concept.

By Sean McDirmid at 2009-12-10 00:26 | LtU Forum
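
To make the effect of thread.sortby in the listing above more concrete, here is a rough sequential emulation in Python. It is only a sketch under stated assumptions, not how BSGP compiles the code: each "thread" is modelled as a (v, f) pair, the sort is assumed to be stable (the post does not say), and find_faces, num_vertices, and the -1 marker for unused vertices are names and conventions introduced here for illustration.

```python
# Sequential emulation of the findFaces listing (a sketch, not BSGP itself).
# Assumptions: ib holds 3 vertex ids per face; num_vertices is a hypothetical
# extra parameter used to size hd; thread.sortby is modelled as a stable sort.

def find_faces(ib, num_vertices):
    n_threads = len(ib)  # one "thread" per index, like spawn(n*3)

    # Before sortby: each thread derives its face id and vertex id from its rank.
    threads = [(ib[rk], rk // 3) for rk in range(n_threads)]  # (v, f)

    # thread.sortby(v): reorder the threads themselves by their value of v.
    threads.sort(key=lambda t: t[0])

    pf = [0] * n_threads
    owner = [0] * n_threads
    hd = [-1] * num_vertices  # -1 marks a vertex that no face references

    # After sortby: a thread's position in the sorted order is its new rank.
    for rk, (v, f) in enumerate(threads):
        pf[rk] = f
        owner[rk] = v

    # barrier: every owner[] write is finished before any neighbour is read.
    for rk, (v, _) in enumerate(threads):
        if rk == 0 or owner[rk - 1] != v:
            hd[v] = rk  # first slot in pf that belongs to vertex v

    return pf, hd


if __name__ == "__main__":
    # Two triangles sharing the edge (1, 2).
    pf, hd = find_faces([0, 1, 2, 2, 1, 3], num_vertices=4)
    print(pf, hd)
```

In this emulation the sorted list of pairs is exactly the auxiliary structure that the BSGP version never names: there, the per-thread locals v and f simply travel with the thread when its rank changes.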