Directly Reflective Meta-Programming

I was recently pointed to the following fascinating paper:

Directly Reflective Meta-Programming, Aaron Stump (HOSC 22(2), June 2009).

Existing meta-programming languages operate on encodings of programs as data. This paper presents a new meta-programming language, based on an untyped lambda calculus, in which structurally reflective programming is supported directly, without any encoding. The language features call-by-value and call-by-name lambda abstractions, as well as novel reflective features enabling the intensional manipulation of arbitrary program terms. The language is scope safe, in the sense that variables can neither be captured nor escape their scopes. The expressiveness of the language is demonstrated by showing how to implement quotation and evaluation operations, as proposed by Wand. The language's utility for meta-programming is further demonstrated through additional representative examples. A prototype implementation is described and evaluated.

Meta-programming without any encoding at all. The only minor drawback that I can see is that this language is untyped - and designing a type system for it would be extremely challenging.

Scan-based Data Parallel Languages

Given the resurgence of interest in data parallel languages due to the ubiquity of GPUs in laptops and now even phones, as well as the upcoming AVX extension to CPUs' short fixed-length SSE vector instructions, I thought there'd be interest in the following papers by Guy Blelloch that I've stumbled upon.

First, scan primitives, including segmented scans, can be implemented efficiently on vector computers. Next, we can start to consider scans as primitive parallel operations in the sense of complexity models: each scan takes one step on an idealized vector machine.
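To make the primitives concrete, here is a minimal sequential sketch in Haskell of an inclusive plus-scan and a segmented plus-scan (head flags mark segment starts). The function names are mine; on the idealized vector machine each of these counts as a single step, rather than the linear walk shown here.

    -- Inclusive plus-scan: position i holds the sum of elements 0..i.
    plusScan :: [Int] -> [Int]
    plusScan = scanl1 (+)

    -- Segmented plus-scan: a True flag marks the start of a segment,
    -- and the running sum resets there.
    segScan :: [Bool] -> [Int] -> [Int]
    segScan flags xs = tail (scanl step 0 (zip flags xs))
      where
        step _   (True,  x) = x         -- new segment: restart the sum
        step acc (False, x) = acc + x   -- same segment: accumulate

    -- plusScan [1,2,3,4,5]                               == [1,3,6,10,15]
    -- segScan  [True,False,False,True,False] [1,2,3,4,5] == [1,3,6,4,9]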

Once you have constant-time permutes, scatters, and gathers, you can do some interesting tricks. For example, invertible recursive tree computations (lenses, anybody?) can be performed in O(1) scan steps! For some fun examples, see section 5.5 of Vector Models for Data-Parallel Computing, which shows the trick for tree sums. (It references Leiserson's work on contractions.)
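To give the flavor of such tricks (this is Blelloch's split operation, not the tree-sum example from section 5.5): a stable partition can be built from two exclusive plus-scans and one scatter, i.e., a constant number of primitive steps in the scan model. A sketch using Data.Vector:

    import qualified Data.Vector as V

    -- Stable partition ("split"): False-flagged elements first, then
    -- True-flagged ones. Two exclusive plus-scans compute a destination
    -- index for every element; one scatter then moves them all at once.
    split :: V.Vector Bool -> V.Vector a -> V.Vector a
    split flags xs = V.update_ xs idx xs   -- scatter: idx is a permutation
      where
        falses  = V.map (\f -> if f then 0 else 1) flags
        downIdx = V.prescanl (+) 0 falses               -- slots for False
        nFalse  = V.sum falses
        upIdx   = V.map (+ nFalse)
                        (V.prescanl (+) 0 (V.map fromEnum flags))
        idx     = V.zipWith3 (\f d u -> if f then u else d)
                             flags downIdx upIdx

    -- split (V.fromList [False,True,False,True]) (V.fromList "abcd")
    --   == V.fromList "acbd"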

When reading this work, it is worth thinking about hardware trends: work efficiency matters (parallelism is often motivated by the power wall), indexed vector operations have a cost (contiguous sequential access may beat indexed parallel access), and many vector units are limited and fixed in size (e.g., SSE). Finally, just as for memory, irregularity in the computation -- e.g., instruction divergence -- often makes the theoretical models hard to apply without losing to constant overheads. Indeed, aligning CPU and GPU trends with these models seems tricky for the practicing parallel programmer, which has led to studies showing that tuned CPU versions of GPU programs are often competitive.

Finally, I've found it interesting that the work of Keller et al. on Data Parallel Haskell and its GPU spin-offs, while following NESL's transformation into the scan-primitives model, does not appear to handle irregular structures like trees in the advised manner. I've been increasingly wondering why.
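For reference, the transformation I have in mind, in miniature: a nested array is flattened to a flat value vector plus a segment descriptor of lengths, and a per-segment reduction becomes one segmented scan followed by a gather of each segment's last element. A toy sketch (my Segmented type; segments assumed non-empty):

    -- NESL-style flattening: a nested vector becomes flat values plus a
    -- segment descriptor of lengths. (Segments assumed non-empty.)
    data Segmented a = Segmented { lengths :: [Int], values :: [a] }

    -- Per-segment sums: one segmented inclusive plus-scan, then gather
    -- the last element of each segment (here via (!!); a vector machine
    -- would do this as a single parallel gather).
    segSums :: Segmented Int -> [Int]
    segSums (Segmented lens xs) = map (scanned !!) lastIdxs
      where
        flags    = concatMap (\n -> True : replicate (n - 1) False) lens
        scanned  = tail (scanl step 0 (zip flags xs))
        step _   (True,  x) = x
        step acc (False, x) = acc + x
        lastIdxs = map (subtract 1) (scanl1 (+) lens)

    -- segSums (Segmented [2,1,3] [1,2,3,4,5,6]) == [3,3,15]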