
Incompleteness Theorems: The Logical Necessity of Inconsistency

Some readers might be interested in this article, which is published at

http://knol.google.com/k/carl-hewitt-see-http-carlhewitt-info/incompleteness-theorems/pcxtp4rx7g1t/28#

where a link to the video of a Stanford Logic colloquium on the article is available.

In the video, ⊢ is proposed as the "ultimate" :-)

Towards general nested data parallelism

There is quite a performance difference between flattening nested data parallelism systems, like the seminal NESL language, which transform nested vector manipulations into flat Cray vector instructions, and multicore languages like Cilk, which achieve coarser-grained parallelism but for more general computations. As explored by Guy Blelloch in the mid-90s, work-stealing techniques can also be used for coarse-grained parallelism in constrained vectorizing languages. However, the reverse is not obvious: can we apply flattening techniques for fine-grained parallelism to general data parallel languages, not just to manipulations of the restrictive nested vector datatype?
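To make the flattening idea concrete, here is a minimal sketch (my own simplification, not NESL's actual representation or names): a nested vector is stored as one flat data array plus a segment descriptor of sublist lengths, so nested operations become flat, vectorizable ones.

```python
def to_segmented(nested):
    """Flatten a list of lists into (data, segment_lengths)."""
    data = [x for seg in nested for x in seg]
    lengths = [len(seg) for seg in nested]
    return data, lengths

def seg_sum(data, lengths):
    """Segmented reduction: one pass over the flat data array."""
    sums, i = [], 0
    for n in lengths:
        sums.append(sum(data[i:i+n]))
        i += n
    return sums

nested = [[1, 2], [3], [4, 5, 6]]
data, lengths = to_segmented(nested)

# A nested map, e.g. incrementing every element of every sublist,
# flattens to a single flat map over `data`; the segment descriptor
# is untouched.
incremented = [x + 1 for x in data]

# A nested reduction flattens to a segmented reduction.
print(seg_sum(data, lengths))  # [3, 3, 15], one sum per sublist
```

The point is that the flat map and the segmented reduction are exactly the shapes a vector unit (or SIMD instruction set) can consume, regardless of how irregular the original nesting was.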

Two papers provide a semantic foundation for bridging this gap. First, Keller showed how to naturally support irregular structures like trees (paper, but her thesis is cleaner): add recursive types and encode them by representing every (entire!) level of unrolling by a segmented vector. Second, Prins showed how to include higher-order values in a first-class way with the NESL-era language Proteus. Finally, Chakravarty and Keller followed up on their u-type work to essentially clean up the rest of the expressiveness portion in their succinct paper More Types for Nested Data Parallelism.
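A sketch of the level-by-level encoding (again my own simplification of Keller's idea, not her notation): an n-ary tree unrolls into one segmented vector per level, each holding a flat array of node values plus a descriptor of how many children each node has.

```python
class Tree:
    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)

def to_levels(root):
    """Return [(values, child_counts)]: one segmented vector per level."""
    levels, frontier = [], [root]
    while frontier:
        values = [t.value for t in frontier]
        counts = [len(t.children) for t in frontier]
        levels.append((values, counts))
        frontier = [c for t in frontier for c in t.children]
    return levels

#        1
#       / \
#      2   3
#     / \
#    4   5
t = Tree(1, [Tree(2, [Tree(4), Tree(5)]), Tree(3)])
for values, counts in to_levels(t):
    print(values, counts)
# ([1], [2]), ([2, 3], [2, 0]), ([4, 5], [0, 0])
```

An operation on all nodes of the tree is now a flat map over each level's `values` array, so the irregular recursive structure costs nothing at execution time.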

With such a foundation, we reach Nepal, rebooted as Data Parallel Haskell, and Manticore (from the ML camp): flattening for general functional languages. However, it looks like life isn't so easy (spj on youtube): none of these embeddings are actually using vector/SIMD instructions, both primarily focus on multicore and work-stealing (which does not require flattening), and they still struggle to beat C (sequential breakpoint / strong scaling issues). One of my guesses is that, while we now have a fundamental automated transformation for flattening richer structures, we still lack many of the optimizations that HPC coders typically apply to such structures by hand. For example (and sparking my own interest), the breadth-first layout of recursively typed structures is inappropriate for common tree access patterns.
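To see why the breadth-first layout can hurt, consider a root-to-leaf walk in the level-by-level encoding (a construction of my own for illustration): locating a node's first child takes a prefix sum of child counts over everything to its left in the level, and each step of the walk lands in a different level's array, which is poor locality for the pointer-chasing traversals common on trees.

```python
def first_child_index(child_counts, i):
    """Index of node i's first child in the NEXT level's flat array,
    computed as a prefix sum of child counts within the current level."""
    return sum(child_counts[:i])

# One level of a tree: three nodes with 2, 1, and 3 children.
counts = [2, 1, 3]
print(first_child_index(counts, 0))  # 0: node 0's children start the next level
print(first_child_index(counts, 2))  # 3: skip node 0's two and node 1's one
```

A depth-first (e.g. preorder) layout would keep each root-to-leaf path nearly contiguous, but it no longer makes whole-level maps trivially flat; the tension between the two is exactly the kind of hand optimization the flattening transformation does not yet capture.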