Performance Problems You Can Fix: A Dynamic Analysis of Memoization Opportunities

Luca Della Toffola, Michael Pradel, Thomas Gross

Performance bugs are a prevalent problem and recent research proposes various techniques to identify such bugs. This paper addresses a kind of performance problem that often is easy to address but difficult to identify: redundant computations that may be avoided by reusing already computed results for particular inputs, a technique called memoization. To help developers find and use memoization opportunities, we present MemoizeIt, a dynamic analysis that identifies methods that repeatedly perform the same computation. The key idea is to compare inputs and outputs of method calls in a scalable yet precise way. To avoid the overhead of comparing objects at all method invocations in detail, MemoizeIt first compares objects without following any references and iteratively increases the depth of exploration while shrinking the set of considered methods. After each iteration, the approach ignores methods that cannot benefit from memoization, allowing it to analyze calls to the remaining methods in more detail. For every memoization opportunity that MemoizeIt detects, it provides hints on how to implement memoization, making it easy for the developer to fix the performance issue. Applying MemoizeIt to eleven real-world Java programs reveals nine profitable memoization opportunities, most of which are missed by traditional CPU time profilers, conservative compiler optimizations, and other existing approaches for finding performance bugs. Adding memoization as proposed by MemoizeIt leads to statistically significant speedups by factors between 1.04x and 12.93x.

This paper was recommended by Asumu Takikawa. It is a nice idea that seems surprisingly effective. The examples they analysed (sadly, they don't really explain how they picked the programs to study) show a mix of memoization opportunities in fairly different parts of the codebase. There are several examples of what we could call "peripheral communication logic", e.g. date-formatting code, which we can assume is not studied very closely by programmers focused on the problem-domain logic. But there is also an interesting, subtle domain-specific memoization opportunity: an existing cache was over-pessimistic and would reset itself at times when doing so was in fact not necessary, and this observation corresponds to a non-trivial program invariant.

The authors apparently had some difficulty finding program inputs with which to exercise the profiler. Programs should more often be distributed with "performance-representative inputs", not just functionality-testing inputs. In one example, a linting tool's provided "standard test" was to feed the code of the linting tool to itself. But this ran under a default configuration for which the tool's developers had already fixed all alarms, so the alarm-creating code (which actually contained an optimization opportunity) was never exercised by this input.

Note that the caching performed is very lightweight, usually not a full tabulation of the function. Most examples just add a static variable caching the last (input, output) pair, which is useful only when the function is typically called several times in a row with the same input, but costs very little space.
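Concretely, the fixes mostly follow a pattern like the sketch below: a static field remembering only the most recent (input, output) pair. The class, method names, and the "expensive" computation are made up for illustration, not taken from the paper.

```java
// Sketch of the lightweight last-pair caching pattern described above.
public class LastPairMemo {
    private static String lastInput;   // last argument seen
    private static String lastOutput;  // result computed for lastInput

    // Stand-in for an expensive pure function, e.g. date formatting.
    static String expensiveFormat(String input) {
        if (input.equals(lastInput)) {
            return lastOutput;         // hit: reuse the previous result
        }
        // Miss: recompute and remember the new (input, output) pair.
        StringBuilder sb = new StringBuilder();
        for (char c : input.toCharArray()) {
            sb.append(Character.toUpperCase(c));
        }
        lastInput = input;
        lastOutput = sb.toString();
        return lastOutput;
    }
}
```

The cache occupies two references regardless of how many distinct inputs the program ever uses, which is why this pattern is cheap enough to apply speculatively.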


JIT, take me away

Nice. Somebody put that into a JIT for me so I can keep not having to think about it and yet eat my cake too. ;-) Seems like languages that lean towards referential transparency should try to build this in ASAP just in terms of marketing themselves.

Miranda or Hugs had this

Miranda or Hugs has/had this. You could check because running the same computation twice cost less.

But the size of the memoization table makes it prohibitive to use in practice.

This is a very neat idea.

This is a very neat idea. Sort of a profiler for memoization. I imagine one could also determine useful static memo-tables (from a good set of representative inputs) or automatically select a good memoization strategy (e.g. last item vs. windowed history vs. exponential decay).
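The "windowed history" alternative mentioned above could be sketched as a small LRU cache, e.g. with Java's `LinkedHashMap` in access order. This is purely an illustrative example of one such strategy, not code from the paper; the names and the squaring function are invented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a "windowed history" memoization strategy: keep the
// last `window` distinct (input, output) pairs, evicting LRU-style.
public class WindowedMemo {
    private final int window;
    private final Map<Integer, Integer> cache;
    int computeCalls = 0;   // how many times the raw function ran

    WindowedMemo(int window) {
        this.window = window;
        // Access-order LinkedHashMap; removeEldestEntry gives LRU eviction.
        this.cache = new LinkedHashMap<Integer, Integer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Integer> e) {
                return size() > WindowedMemo.this.window;
            }
        };
    }

    // Stand-in for an expensive pure function.
    int square(int x) {
        Integer hit = cache.get(x);
        if (hit != null) return hit;   // served from the window
        computeCalls++;
        int result = x * x;
        cache.put(x, result);
        return result;
    }
}
```

A profiler like the one imagined here could pick the window size (1 recovers the last-pair strategy) by replaying a recorded call trace against each candidate.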

I'd love to see similar profiling efforts for discovering the best places to inject parallelism (assuming opportunities for safe parallelism are prevalent in a language).

playing with scope of problem statement

A similar tool focused on a "layer" of state might be better than one focused on recomputation per function. I'll explain what I mean by that next.

When I worked on some persistent storage systems (more than one flavor of ad hoc database), I noticed that memoization for performance reasons had a kind of locality in the model: the backend and the frontend each needed to do their own memoization, partly because of encapsulation. Closer to storage you want to minimize high-latency I/O calls, and close to UI display you want to minimize gratuitous transformation of representations.

Each layer can model a sort of working set, representing a cache, whose state can mutate with associated costs. If you can re-arrange some of the API traffic into idempotent reads, they don't change the cache state and so have low cost, which is the point of memoization, just said differently.

I'm done, so here's a relevant anecdote you can ignore. The UI of an app used by many folks had a model of content that was matrix-oriented: displayed lines of content arranged in rows of attributes for items that had identity (like an email address). So when rendering, the display algorithm iterated over rows, then cells within a single row. For each cell, it looked up the same database entry, N times for N displayed cells, then pulled out the attribute for that cell. This accessed the same btree N times to provide the same answer. So a cache size of one reduced database traffic by a factor of N. (Obviously a somewhat bigger cache is better.)
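That anecdote could look roughly like the following sketch: a size-one cache in front of the per-cell record lookup, so that rendering N cells of one row costs one store access instead of N. All names here are hypothetical, and a `HashMap` stands in for the btree-backed store.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the size-one cache from the anecdote above.
public class RowRenderCache {
    // Stand-in for the btree-backed store: key -> attribute map.
    static Map<String, Map<String, String>> db = new HashMap<>();
    static int lookups = 0;            // counts simulated btree accesses

    static String cachedKey;
    static Map<String, String> cachedRecord;

    static Map<String, String> fetch(String key) {
        if (key.equals(cachedKey)) {
            return cachedRecord;       // same row: no database traffic
        }
        lookups++;                     // simulate the expensive lookup
        cachedKey = key;
        cachedRecord = db.get(key);
        return cachedRecord;
    }

    // Render one row of cells; fetch() is called once per cell, but the
    // size-one cache collapses the N lookups into a single one.
    static String renderRow(String key, String[] attrs) {
        StringBuilder row = new StringBuilder();
        for (String a : attrs) {
            row.append(fetch(key).get(a)).append(' ');
        }
        return row.toString().trim();
    }
}
```

Because the renderer visits cells row by row, consecutive `fetch` calls share a key, which is exactly the access pattern a last-item cache exploits.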

Stu: What does this have to do with languages? Adding layer to PL types?
Wil: Yes.
Stu: That's absurd. We never represent layers in languages.
Wil: Okay.