This work is motivated by the following considerations:
- Parallelism in scientific computing is often of the data-parallel variety: a program has de facto sequential semantics, but operates on objects that are distributed over many locales. I'm not aware of much theory about this: theoretical computer science usually takes a CSP approach to parallelism, in which each process does its own thing, and other processes are merely sources of messages that it reacts to.
- In most contemporary processors data movement is very costly, more so than arithmetic operations. There are few models that explicitly account for the fact that data resides somewhere, and that you cannot simply access data without first making sure it is actually close to you.
- There are many `modes' of parallelism these days: multi-threading, co-processor offloading, distributed-memory message passing. Code written for one mode is essentially impossible to translate to another. So I wanted a mode-independent way of describing algorithms that could then be `realized' in terms of existing systems.
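To make the first point concrete, here is a toy sketch (entirely my own illustration, not the author's system; all names are hypothetical) of a program with sequential semantics whose operations act on data partitioned over several "locales", and where an elementwise operation needs no data movement because the operands are identically partitioned:

```python
# Hypothetical illustration of data parallelism with sequential semantics.
class DistributedVector:
    """A vector whose data is split into one chunk per locale."""

    def __init__(self, chunks):
        self.chunks = chunks  # chunks[i] lives on locale i

    @classmethod
    def from_list(cls, data, nlocales):
        # Block-partition the data over the locales.
        n = len(data)
        bounds = [n * i // nlocales for i in range(nlocales + 1)]
        return cls([data[bounds[i]:bounds[i + 1]] for i in range(nlocales)])

    def map2(self, other, op):
        # Each locale combines its own chunk with the matching chunk of
        # `other`: for identically partitioned operands this elementwise
        # operation involves no communication between locales.
        return DistributedVector(
            [[op(a, b) for a, b in zip(ca, cb)]
             for ca, cb in zip(self.chunks, other.chunks)])

    def to_list(self):
        return [x for chunk in self.chunks for x in chunk]

# The driver reads like an ordinary sequential program, even though each
# operation is conceptually executed in parallel across the locales.
x = DistributedVector.from_list([1, 2, 3, 4], nlocales=2)
y = DistributedVector.from_list([10, 20, 30, 40], nlocales=2)
z = x.map2(y, lambda a, b: a + b)
```

A mode-independent description would stop at the `map2` level; a `realization' could then lower it to threads, offloaded kernels, or message passing as appropriate.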
Here is my theory story. I have prototype software, and you can take my word for it that (within its limited functionality) it is as efficient as hand-coded, mode-specific software.
pdf document, slightly updated
Feedback much appreciated.