BSGP: bulk-synchronous GPU programming

A SIGGRAPH paper by Qiming Hou, Kun Zhou, Baining Guo.
The language aims to simplify CUDA, which reminds me of assembly code even though it uses C syntax. Among other things, BSGP offers a higher-level memory model and implicit data flow (so you don't have to explicitly partition your code between different kernels). Here is one trick that really impressed me:

    findFaces(int* pf, int* hd, int* ib, int n) {
      spawn(n*3) {
        rk = thread.rank;
        f = rk/3;            // face that this index-buffer entry belongs to
        v = ib[rk];          // vertex referenced by this entry
        thread.sortby(v);    // re-rank the threads by v
        require
          owner = dtempnew[n]int;
        rk = thread.rank;    // rank after the sort
        pf[rk] = f;
        owner[rk] = v;
        barrier;
        if (rk == 0 || owner[rk-1] != v)
          hd[v] = rk;
      }
    }

After the call to sortby, the threads are re-ranked according to their values of v, so nothing has to be sorted explicitly: there is no list or other auxiliary data structure that would have to be allocated in memory just to hold the ordering. In other words, the call forces a reality where all the threads are coincidentally arranged in the way we want them to be... an interesting PL concept.

By Sean McDirmid at 2009-12-10 00:26 | LtU Forum
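To make the effect of sortby concrete, here is a minimal sequential sketch in Python of what the snippet above appears to compute. It is an emulation, not BSGP or the paper's implementation: the function name `find_faces`, the `num_vertices` parameter, and the use of -1 for "no face" are hypothetical, the sort is assumed to be stable (the post does not say whether sortby is), and `owner` is sized to one entry per thread for simplicity.

```python
def find_faces(ib, num_vertices):
    """Sequentially emulate the findFaces example.

    ib is a flat index buffer with three vertex ids per triangle.
    Returns (pf, hd): pf lists face ids grouped by vertex, and hd[v] is
    the position in pf where the faces of vertex v start (-1 if none).
    """
    count = len(ib)

    # One simulated "thread" per index-buffer entry: (vertex id, face id).
    threads = [(ib[rk], rk // 3) for rk in range(count)]

    # Stand-in for thread.sortby(v): after sorting, a thread's position
    # in the list plays the role of its new rank.
    threads.sort(key=lambda t: t[0])

    pf = [0] * count
    owner = [0] * count              # assumption: one slot per thread
    hd = [-1] * num_vertices

    # Phase before the barrier: every thread writes at its new rank.
    for rk, (v, f) in enumerate(threads):
        pf[rk] = f
        owner[rk] = v

    # Phase after the barrier: a thread whose left neighbour holds a
    # different vertex marks the start of that vertex's run.
    for rk, (v, _) in enumerate(threads):
        if rk == 0 or owner[rk - 1] != v:
            hd[v] = rk

    return pf, hd


# Two triangles sharing the edge between vertices 1 and 2.
pf, hd = find_faces([0, 1, 2, 2, 1, 3], num_vertices=4)
print(pf)  # [0, 0, 1, 0, 1, 1]
print(hd)  # [0, 1, 3, 5]
```

The two loops correspond to the code before and after `barrier`. What the BSGP version avoids is the explicit `threads` list: there the ordering lives in the thread ranks themselves.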