Causal Commutative Arrows and Their Optimization

Causal Commutative Arrows and Their Optimization, Hai Liu. Eric Cheng. Paul Hudak. ICFP 2009.

Arrows are a popular form of abstract computation. Being more general than monads, they are more broadly applicable, and in particular are a good abstraction for signal processing and dataflow computations. Most notably, arrows form the basis for a domain specific language called Yampa, which has been used in a variety of concrete applications, including animation, robotics, sound synthesis, control systems, and graphical user interfaces.

Our primary interest is in better understanding the class of abstract computations captured by Yampa. Unfortunately, arrows are not concrete enough to do this with precision. To remedy this situation we introduce the concept of commutative arrows that capture a kind of non-interference property of concurrent computations. We also add an init operator, and identify a crucial law that captures the causal nature of arrow effects. We call the resulting computational model causal commutative arrows.

To study this class of computations in more detail, we define an extension to the simply typed lambda calculus called causal commutative arrows (CCA), and study its properties. Our key contribution is the identification of a normal form for CCA called causal commutative normal form (CCNF). By defining a normalization procedure we have developed an optimization strategy that yields dramatic improvements in performance over conventional implementations of arrows. We have implemented this technique in Haskell, and conducted benchmarks that validate the effectiveness of our approach. When combined with stream fusion, the overall methodology can result in speed-ups of greater than two orders of magnitude.

One way of understanding what is going on in this paper is that in terms of dataflow programming, FRP programs correspond to programs with single-entry, single-exit dataflow graphs. This means that none of the internal dataflow nodes in an FRP program are actually necessary -- you can coalesce all those nodes into a single node while preserving the observable behavior. (They briefly touch on this point when they mention that synchronous languages try to compile to "single loop code".) What's very slick is that they have a nice normal form procedure that (a) is defined entirely in terms of their high-level language, and (b) always yields code corresponding to the the coalesced dataflow graph. It's an elegant demonstration of the power of equational reasoning.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Going by your description,

Going by your description, for (uncited) previous work in compacting computations into one function call, see "Lowering: A Static Optimization Technique for Functional Reactive Languages." (2007) Similar stuff throughout the SDF community (compiling to state machines, which enables run-of-the-mill compiler optimizations).

Reasoning here is also useful for when you *can't* do the compaction (e.g., in FRP languages that support asynchrony or side effects), and for optimizations beyond this one (e.g., parallelism). Always fun to see what this groups comes out with :)

Thanks for pointing out the

Thanks for pointing out the missed related work. A notable difference is that lowering can't handle stateful computation.

Also I'd disagree with OP's description "none of the internal dataflow nodes in an FRP program are actually necessary", since it needs both pure and stateful computations. It's evident from the CCNF that a mimimal dataflow computation consists of exactly one of each of the following components: pure computation (arr), states (init), recursion (loop), sequential composition (>>>), and parallel composition (second).

Disclaimer: I'm the first author of the CCA paper.

Thanks for commenting!

It's evident from the CCNF that a mimimal dataflow computation consists of exactly one of each of the following components: pure computation (arr), states (init), recursion (loop), sequential composition (>>>), and parallel composition (second).

This is in fact what made me think that internal dataflow nodes weren't necessary. :) That is, if you compile a normal form to some sequential, imperative code to hand off to an event loop, then:

1) the event loop implements the outer recursion,
2) the sequential composition becomes sequential composition in the sequential language,
3) state is implemented by, er, state, and
4) parallel composition can be implemented by having multiple variables in the sequential language.

So it looks to me like you guys can escape the need to implement a change-propagation mechanism, which is pretty cool. OTOH, with something like a spreadsheet, where you've got a dataflow network where any node (ie, cell) can be modified at any time, there's no way to avoid building that mechanism.

Is this right, or did I miss something important?

Anyway, I enjoyed reading your paper quite a bit!

By change propagation, do

By change propagation, do you mean the problem of adjusting the output when only part of the input changes? No, CCA doesn't solve this problem (of incremental computation), although I think it may simplify the dependency analysis used by those techniques. Anyway, it's an orthogonal problem to what we considered in the paper.

The spreadsheet problem you described actually looks like a simple dataflow with no internal states, i.e., it is a pure function mapping from all cell's old values to new values.

Change propagation vs.

Change propagation vs. recomputation is tricky. E.g., Kim's work showed compaction is often a huge win. However, in Flapjax, an important chunk of the library was imperatively interfacing with the DOM (browser document) API. When to switch between these is unclear and it is not obvious if static knowledge is sufficient: which way to go might depend on if/how reevaluation occurs, which I think was Neel's point. Haven't seen an examination of annotations vs. static inference vs. dynamic feedback; would probably be too much of a niche interest..

There is a lot more work along these lines in related communities, such as pipeline optimizations to better use limited hardware resources. I suspect there's a lot of room for FRP systems to grow implementation-wise by adapting these techniques and maybe even doing them better.

Video of Presentation

One of my favourite ICFP '09 papers

I see it as a variation on the old classic On Folk Theorems.


I've missed that paper - thanks for pointing it out! That's simply hilarious, and yes - I do see the connection.