Incremental computation with divide and conquer memoization

Memoization is an interesting technique that could be used to implement a purely functional spreadsheet, without the need of maintaining a dependency graph. I've written a small article on the subject.

There are many ways to implement incremental computation. One popular method - implemented by spreadsheets - is to keep an up-to-date topologically sorted dependency graph of all cells. 
After modification of a cell, a spreadsheet can optimally determine which cells need to be recalculated (without recalculating everything) given its dependency graph.

In this short paper we will discuss another approach that is based on memoization. 
An example is given that uses a divide and conquer algorithm to maximally re-use sums of integers. Next to that, we sketch a possible implementation based on uniquely represented confluently persistent treaps.

edit: small typo

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Memoization is a common

Memoization is a common technique for efficient computation in self-adjusting computation, interactive programming, and various reactive models.

But memoization has its weaknesses:

  1. In a long-running system, it is difficult to determine how long to keep memoized values around.
  2. For simple functions, space management overheads can be orders of magnitude larger than the CPU savings.
  3. Memoization is difficult to apply in many cases. Not all types have a computable total ordering.

(Space management overheads include but are not limited to allocation, cache management, paging, GC, index, search or lookup.)

I recall reading an article on FrTime perhaps six years ago that described how eliminating implicit intermediate caching where feasible resulted in about two orders of magnitude performance improvement. While caching isn't quite the same as memoization, it is similar enough that such numbers should be alarming.

For long-running programs, I've been considering various approaches involving windowed memoization or exponential decay, and the possibility of expressing deep-memoized functions as a monad or arrow. But this sort of optimization is low priority for me.

The more nodes you have in

The more nodes you have in your dataflow graph, the slower your batch performance and you only save big if a change can be isolated to one mode. Coarser grained dataflow graphs make more sense, and Superglue had a massive performance improvement when we stopped memoizing EVERYTHING. I prefer the "two for the price" of one technique of just specifying memoization points explicitly in the program rather than doing it transparently.

Explicit memoization is the

Explicit memoization is the way to go, I agree.

Sirea is a push-based implementation, and I currently use implicit caching at every `merge` and `zip` node (where two signals are combined). Did you have a technique to avoid that in Superglue? (edit: I vaguely recall - did you simply re-compute the entire graph every 'step'?)

(I've been thinking about using a type layer to compile certain behavior subgraphs into straight-line Haskell functions, or even into CUDA functions. But, I keep reminding myself, now is not the time to think about further optimization. :)

So the last version of

So the last version of SuperGlue performed no memoization at all implicitly. It just had cells (explicit state) and sinks/sources from the UI library, which constituted the entire state of a SuperGlue program. Recomputation was only performed on a connection into a sink, so you weren't exactly re-evaluating the entire program when something changed, just the graph that happened to connect that sink.

purely functional spreadsheets

I also don't advocate transparent memoization for all functions. I'm also pretty sure that spreadsheets don't hold all memoized values indefinitely.
The neat thing is that you don't have to: it won't change the semantics of a program. Of course, throwing away memoized values does have a major impact on running time.
That said, I'm trying to implement an efficient purely functional spreadsheet engine without using dependency graphs explicitly. So for me, memoization is not about efficiency but rather about implementation.

memoization is not about

memoization is not about efficiency but rather about implementation

Those aren't usually exclusive. :)

Memoization should never

Memoization should never have a semantic consequence, though we can and have argued about glitching caused by inconsistent buffering (which the sorted graph is supposed to avoid).

But memoization has its

But memoization has its weaknesses

I haven't yet had the time to read the article presented in this post, but my gut feeling is that if you want to "do memoization right" (for example evaluate smartly when to garbage-collect memoized data) you ultimately will want to make the dependency graph explicit (what the author presented as the common solution he wanted to avoid).

(This isn't directly related to what you said in your reply, I'm just jumping on this sentence there.)

There are a lot more points

There are a lot more points in the design space than you've covered, but its nice to see people getting interested in this again.

BTW, the correct term for divide-and-conquer sum is "partial aggregation." It especially becomes relevant in the MapReduce world, but as the "two for the price one" paper told us, incrementalism and parallelism are fairly similar in their decomposition requirements.

Ideally, all "optimization" decisions should be tested.

Compilers are fiercely complex pieces of software, and I recognize that actually testing whether optimizations are beneficial is additional complexity to pile onto them. But it's necessary, IMO, because actually testing whether optimizations are really beneficial is also known as being grounded in reality.

Compilers do a lot of stuff on faith and belief, according to someone's best guesses about how to optimize for a system that is never precisely the system the code will run on, without actually testing whether those guesses are relevant or productive on the current machine.

For a laughably extreme example, fire up a compiler a few generations old, and it will cheerfully do Pentium-specific optimizations including coding around the thrice-damned FDIV bug, and optimize for a 2Mbyte cache and 512Mbyte memory. And that code will run, albeit far more slowly than it could, on your 4-processor quad-core IA64 machine. It may be a faster compilation for not checking, but the code it compiles will waste time *EVERY* time it's run.

Anyway; compilers should be sensitive to actual measured performance. In an architecture where memoization hinders rather than helps, the simple fact that code compiled one way runs slower than code compiled another way should be detected, and result in that information getting stored somewhere (and occasionally rechecked just to make sure) so the compiler knows not to do memoization by default.

All optimizations should be shown to carry their own weight.


But doesn't far more slower

But doesn't far more slower code compiled by the old compiler just a few percentage points slower, or maybe at most 20%?

Memoization saves lots of times in cases where computation needs to be incremental. So much so that you can't live without it. Partial aggregation also, so you are already going to code this by hand when you need it. The small document isn't talking about these as transparently applied optimizations, you have to know what you are doing and buy into it.