Staging with Delimited Control

Yukiyoshi Kameyama, Oleg Kiselyov, Chung-chieh Shan, Shifting the Stage: Staging with Delimited Control. PEPM2009.

It is often hard to write programs that are ef?cient yet reusable. For example, an efficient implementation of Gaussian elimination should be specialized to the structure and known static properties of the input matrix. The most profitable optimizations, such as choosing the best pivoting or memoization, cannot be expected of even an advanced compiler because they are specific to the domain, but expressing these optimizations directly makes for ungainly source code. Instead, a promising and popular way to reconcile efficiency with reusability is for a domain expert to write code generators. Two pillars of this approach are types and effects. Typed multilevel languages such as MetaOCaml ensure safety: a well-typed code generator neither goes wrong nor generates code that goes wrong. Side effects such as state and control ease correctness: an effectful generator can resemble the textbook presentation of an algorithm, as is familiar to domain experts, yet insert let for memoization and if for bounds-checking, as is necessary for efficiency. However, adding effects blindly renders multilevel types unsound...

Code generators have to be written in CPS or monadic style due to the need for let-insertion and if-insertion. This makes writing code generators unattractive to many programmers. The paper provides an alternative based on delimited continuations, and paper proposes a sound combination of staging and delimited continuations. The combination has been implemented. The authors show how pleasant it becomes to write interesting code generators (specializing dynamic programming and Gaussian elimination codes). The paper and the code for many generators is available online. I suggest reading the explanation of the problem, and then looking at the actual code generators.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

I won't have time to read

I won't have time to read this for another couple of weeks, but this sentence suggests a hammer in search of nails:

Instead, a promising and popular way to reconcile efficiency with reusability is for a domain expert to write code generators. Two pillars of this approach are types and effects.

For example, synthesis/sketching techniques are friendly ways to avoid such reasoning without significantly moving beyond the traditional language. Arguably, the tools I'd expect to see in this paper might be useful for the person implementing the synthesis tools, but they should be out of sight for the domain expert.

I'm a little biased as many around me work on this approach, but thought it'd be worth bringing up as a contrary (?) language design approach to the same problem domain, as it was not mentioned in the related work. A reason could be because synthesis is generally flat -- you work on one layer -- while these approaches are more operational/imperative and thus discuss levels.

We do solve the documented problem

The claim that the paper offers a solution in search of a problem can
be refuted by reading the introduction. In fact, the first sentence
of the introduction sets our problem area as high-performance and
high-assurance embedded computing. The problems we solve have been
posed in the literature. I would recommend the Introduction sections of
papers published in `Science of Computer Programming':


http://www-rocq.inria.fr/~acohen/publications/CDGHKP06.ps.gz

and


http://www.cas.mcmaster.ca/~carette/metamonads/index.html

Most of the authors of the first paper are affiliated with
supercomputing centers of U.S., France and Germany.

I believe the papers cited in these introduction sections
(specifically, the papers on SPIRAL, ATLAS, telescopic languages and
active libraries; the Hume project) make a convincing case for code
generation and making the `compiler technology' available and
accessible to application programmers. The most profitable
optimizations are algorithmic optimizations, which are domain- and
problem-specific rather than general purpose. These optimizations,
which often rely on mathematical theorems, are unlikely to be
discovered automatically. They require a domain expert to recognize
and apply. Our ultimate goal is to give the experts the tools to build
their own optimizing compiler, so to speak, without giving up domain
abstractions or learning assembler.

I did not mean to make

I did not mean to make tuning sound like it wasn't a problem; one of this year's best paper awards in SC went to a labmate tuning GPUs and he has to work a lot for his results. My comment was more aimed towards the matching of multilevel programming supported by types & effect systems to building HPC (and, more generally, tuned) code.

I agree that tuned code is horrendous and code generation is necessary for exploration and good code: most tuners are manually written perl transformers of c and fortran. Correctness and clear expression are major challenges. I'm not as convinced that developers should be so tied to guiding the code generation as suggested by this paper for these two reasons.

I brought up sketching as an example of an alternative programming model. In the sketching approach, one provides a full but simple implementation (eg., the naive matrix multiply), and than a partial specification of the optimized implementation to allow the space of tuning parameters. The parameters may be general code (discovery as you described) or a restricted language (choice of functions, etc.). In terms of the problem statement, you get both correctness (well, more likely, testing up to the small universe assumption) and clear expression (you skip many of the hairy parts).

The approach of your paper is, in part, a natural cleanup of existing practices. However, unlike sketching, it helps with only a very limited form of correctness out of the box (and I am not convinced it is the part tuners are most desiring), and I'm not convinced that conciseness translates directly to productivity. Leaving the paper, I'm still not sure what language features are useful for this hpc/tuning -- perhaps we're both way off, and hierarchical memory should be reified in some control way. I see multilevel programming problems being explored here, not HPC nor tuning ones. Hence, the comment about sketch language implementer, who is doing code generation, but not the HPC/tuning consumer.

Perhaps these approaches are complementary. Sketching, as I've seen it, has been applied to tricky HPC algorithms and parallel codes -- but not to tuning. For example, the tuning would be over code generators. In addition, multilevel programming with sketching (or an amb-like construct) makes sense as often there are different transformations to be applied and different combinations of them.

I'm not critical of the technical contributions for MLP techniques, but of the link to go beyond the PL community. The ability to mix and match transforms given a combinator library would be a powerful step away from perl-land (though a DSL over a XOC-like system would also seem sufficient). I guess the main point is I'm curious to see lessons learned from future work in building systems with the linguistic approach :) It's hard enough to write these algorithms without the extra tuning meta-programming step, and the multi-core world makes it even worse.

Tuning is a solution, not a problem

Sketching is another way to reconcile efficiency with reusability, and we did not mean for our paper to imply that it is not promising or not popular. You seem to see sketching as "contrary (?)" and "alternative" to code generation, in which case it'd be unfair for us to mention sketching as related work but not every other approach to HPC or to reconciling efficiency with reusability. But it would be a tragedy for the two approaches to stop at citing each other or hypothesizing themselves to be "complementary". Instead, I'd like to relate the two HPC approaches in more detail by breaking them into parts.

Tuning and sketching both treat HPC as a search problem: the search space consists of specialized programs, and the goal is to find a specialized program that is correct and fast. Code generation is often used in the context of tuning (as in ATLAS and SPIRAL), but even when it is used alone for HPC, we can still regard it as expressing a search space with a strong restriction bias. Viewed this way, tuning and sketching each combine combinatorial optimization and code generation in a different way:

  • In tuning, the programmer expresses a set of presumed-correct programs, in which the search looks for a fast program.
  • In sketching, the programmer expresses a set of presumed-fast programs, in which the search looks for a correct program.
Whether a problem is suitable for tuning or sketching thus depends much on whether performance or correctness is easy to test and optimize. For example, correctness is easier to test when the number of possible inputs to the generated program is finite, so sketching also "helps with only a very limited form of correctness out of the box" -- to take another phrase from you, it's hard enough to write these algorithms without testing them for correctness, and infinite inputs make it even worse.

To express a set of programs is to write a code generator, whether in tuning or in sketching. Thus, I don't understand your remark (which I've heard elsewhere) that, in sketching, "synthesis is generally flat -- you work on one layer". A sketch has two levels like any other code generator: the holes are at the present stage whereas the literal expressions are at the future stage. The developer of a sketch is as "tied to guiding the code generation" as the developer of any other code generator. Adding holes to the language, as sketching does, is a priori no more "friendly ... without significantly moving beyond the traditional language" than adding brackets and escape.

To specialize the remarks above to our work (: On one hand, sketching shows us that nondeterminism ("an amb-like construct") is desirable in a code-generation language. This desire provides additional motivation for us to let control operators reach beyond the nearest generated binder. On the other hand, our work is about generating code with a variable number of binders, so it would be interesting to see how sketching might express examples of dynamic programming, such as gib in our paper. Given that sketching is a linguistic approach, such expression would be another lesson "in building systems with the linguistic approach", in addition to those lessons recounted in the introductory sections that Oleg already cited in his comments.