BSGP: bulk-synchronous GPU programming
A SIGGRAPH paper by Qiming Hou, Kun Zhou, and Baining Guo.
The language simplifies CUDA, which reminds me of assembly code even though it uses C syntax. Among other things, BSGP offers a higher-level memory model and implicit data flow, so you don't have to explicitly partition your code between different kernels. Here is one trick that really impressed me:
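findFaces(int* pf, int* hd, int* ib, int n) {
  spawn(n*3) {
    rk = thread.rank;
    f = rk/3;            // face id
    v = ib[rk];          // vertex id
    thread.sortby(v);    // re-rank threads by vertex id
    // allocate a temp list
    require
      owner = dtempnew[n]int;
    rk = thread.rank;    // rank after sorting
    pf[rk] = f;
    owner[rk] = v;
    barrier;
    // the first thread of each vertex group records where the group starts
    if (rk == 0||owner[rk-1] != v)
      hd[v] = rk;
  }
}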
After the call to sortby, the threads themselves are re-ranked so that their ranks follow the order of their values of v. Nothing sorts an explicit list or some other auxiliary data structure that would have to be allocated in memory. In other words, the call forces a reality where all the threads are coincidentally arranged in the way we want them to be... an interesting PL concept.

By Sean McDirmid at 2009-12-10 00:26 | LtU Forum
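
For readers who don't know BSGP, here is a rough sequential sketch in Python of what the findFaces listing appears to compute. It is an emulation under stated assumptions, not the paper's implementation: the name find_faces is hypothetical, each "thread" is modelled as a (face, vertex) tuple, thread.sortby is stood in for by an ordinary stable list sort (whether BSGP's sort is stable is an assumption), and hd is returned as a dictionary instead of being written into a preallocated array.

```python
def find_faces(ib):
    """Sequentially emulate the BSGP findFaces listing.

    ib is a triangle index buffer: ib[3*f .. 3*f+2] are the vertex ids
    of face f. Returns (pf, hd), where pf lists face ids grouped by
    vertex and hd[v] is the position in pf where vertex v's group starts.
    """
    # One "thread" per triangle corner, holding (face id, vertex id).
    threads = [(rk // 3, ib[rk]) for rk in range(len(ib))]

    # Stand-in for thread.sortby(v): reorder the threads by vertex id.
    # list.sort is stable; stability of BSGP's sortby is an assumption.
    threads.sort(key=lambda t: t[1])

    # After sorting, a thread's list position plays the role of its new rank.
    pf = [f for f, _ in threads]
    owner = [v for _, v in threads]

    # A thread whose predecessor holds a different vertex starts a new group.
    hd = {}
    for rk, v in enumerate(owner):
        if rk == 0 or owner[rk - 1] != v:
            hd[v] = rk
    return pf, hd


# Two triangles sharing the edge between vertices 1 and 2.
pf, hd = find_faces([0, 1, 2, 2, 1, 3])
print(pf)  # [0, 0, 1, 0, 1, 1]
print(hd)  # {0: 0, 1: 1, 2: 3, 3: 5}
```

Reading the result: the faces touching vertex 1 are pf[1] and pf[2], that is, faces 0 and 1. The sketch has to build and sort an explicit list of tuples, which is exactly the auxiliary structure the BSGP version avoids by sorting the threads themselves.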