This work is motivated by the following considerations
- Parallelism in scientific computing is often of the data-parallel variety: a program has de facto sequential semantics, but operating on objects that are distributed over many locales. I'm not aware of much theory about this: theoretical computer science usually takes a CSP approach to parallelism, where each process does its own thing, and other processes are a source of messages that it reacts to.
- In most contemporary processors data movement is very costly, more so than operations. There are few models that explicitly take into account that data is somewhere, and you can not just access data without making sure it's actually close to you.
- There are many `modes' of parallelism these days: multi-threading, co-processor offloading, distributed memory message passing. A code written for one mode is impossible to translate to another. So I wanted to have a mode-independent way of describing algorithms that could be `realized' in terms of existing systems.
Here is my theory story. I have prototype software, and you can take my word for it that (within its limited functionality) it is as efficient as hand-coded mode-specific software.
pdf document, slightly updated
Feedback much appreciated.
Recent comments
22 weeks 6 days ago
22 weeks 6 days ago
22 weeks 6 days ago
45 weeks 23 hours ago
49 weeks 2 days ago
51 weeks 2 hours ago
51 weeks 2 hours ago
1 year 1 week ago
1 year 6 weeks ago
1 year 6 weeks ago