Performance Problems You Can Fix: A Dynamic Analysis of Memoization Opportunities
Luca Della Toffola, Michael Pradel, Thomas Gross
2015

Performance bugs are a prevalent problem and recent research proposes various techniques to identify such bugs. This paper addresses a kind of performance problem that often is easy to address but difficult to identify: redundant computations that may be avoided by reusing already computed results for particular inputs, a technique called memoization. To help developers find and use memoization opportunities, we present MemoizeIt, a dynamic analysis that identifies methods that repeatedly perform the same computation. The key idea is to compare inputs and outputs of method calls in a scalable yet precise way. To avoid the overhead of comparing objects at all method invocations in detail, MemoizeIt first compares objects without following any references and iteratively increases the depth of exploration while shrinking the set of considered methods. After each iteration, the approach ignores methods that cannot benefit from memoization, allowing it to analyze calls to the remaining methods in more detail. For every memoization opportunity that MemoizeIt detects, it provides hints on how to implement memoization, making it easy for the developer to fix the performance issue. Applying MemoizeIt to eleven real-world Java programs reveals nine profitable memoization opportunities, most of which are missed by traditional CPU time profilers, conservative compiler optimizations, and other existing approaches for finding performance bugs. Adding memoization as proposed by MemoizeIt leads to statistically significant speedups by factors between 1.04x and 12.93x.
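
To make the iterative-deepening idea concrete, here is a minimal sketch of my own (not the authors' implementation; all class and method names are invented) of a depth-limited flattening of an object graph. The point is that comparing inputs and outputs at depth 0 is cheap, and only methods whose calls still look redundant get re-profiled at larger depths:

    import java.lang.reflect.Field;
    import java.lang.reflect.Modifier;
    import java.util.ArrayList;
    import java.util.List;

    public class DepthLimitedRepr {
        // Flatten an object graph into a list of tokens, following references
        // at most 'depth' levels deep. Two method calls whose (input, output)
        // token lists agree at every explored depth are memoization candidates.
        static List<Object> flatten(Object obj, int depth) {
            List<Object> tokens = new ArrayList<>();
            flattenInto(obj, depth, tokens);
            return tokens;
        }

        private static void flattenInto(Object obj, int depth, List<Object> tokens) {
            if (obj == null) { tokens.add("null"); return; }
            if (obj instanceof Number || obj instanceof Boolean
                    || obj instanceof Character || obj instanceof String) {
                tokens.add(obj);      // leaf values are compared directly
                return;
            }
            tokens.add(obj.getClass().getName());
            if (depth == 0) {         // budget exhausted: do not follow references;
                tokens.add("?");      // the "?" is refined in the next iteration
                return;
            }
            for (Field f : obj.getClass().getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers())) continue;
                f.setAccessible(true);
                try {
                    flattenInto(f.get(obj), depth - 1, tokens);
                } catch (IllegalAccessException e) {
                    tokens.add("?");
                }
            }
        }
    }

At depth 0 only top-level values are compared; methods whose calls already differ there are discarded, and only the survivors are re-profiled at depths 1, 2, and so on. This shrinking of the candidate set at each iteration is what keeps the analysis scalable.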

This paper was recommended by Asumu Takikawa. It is a nice idea that seems surprisingly effective. The programs they analysed (sadly they don't really explain how they picked them) contain a mix of memoization opportunities in fairly different parts of the codebase. There are several examples of what we could call "peripheral communication logic", e.g. date-formatting code, which we can assume is not scrutinized very closely by programmers focused on the problem-domain logic. But there is also an interesting example of a subtle domain-specific memoization opportunity: an existing cache was over-pessimistic and would reset itself at times when that was in fact not necessary, and this observation corresponds to a non-trivial program invariant.

The authors apparently had some difficulty finding program inputs with which to exercise the profiler. Programs should more often be distributed with "performance-representative inputs", not just functionality-testing inputs. In one example, a linting tool, the provided "standard test" was to feed the tool's own code to itself. But this ran under a default configuration for which the tool's developers had already fixed all alarms, so the alarm-creating code (which actually contained an optimization opportunity) was never exercised by this input.

Note that the caching performed is very lightweight, usually not a full tabulation of the function. Most examples just add a static variable to cache the last (input, output) pair, which is only useful when the function is typically called with the same input several times in a row, but costs very little space.
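
For concreteness, a last-pair cache of this flavour might look like the following sketch (the class and method names are made up for illustration, not taken from one of the studied programs):

    import java.text.SimpleDateFormat;
    import java.util.Date;

    public class ReportDates {
        // Single-entry cache: remember only the most recent (input, output) pair.
        // Pays off when the same timestamp tends to arrive several times in a
        // row, and costs only two static fields of space.
        private static long cachedTimestamp = Long.MIN_VALUE;
        private static String cachedResult = null;

        static String format(long timestamp) {
            if (cachedResult != null && timestamp == cachedTimestamp) {
                return cachedResult;                  // hit: reuse previous output
            }
            cachedResult = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
                    .format(new Date(timestamp));     // miss: recompute and store
            cachedTimestamp = timestamp;
            return cachedResult;
        }
    }

Unlike full tabulation, this never grows and needs no eviction policy; the trade-off is that any alternation between two distinct inputs defeats it completely.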

Typed X (Racket, Clojure, Lua) just doesn't pan out?

I have used Typed Lua, and I have kept an eye on Typed Racket and Typed Clojure. Overall I currently get the impression that they somehow don't quite get over some hurdles that would allow them to really shine. And thus people who wanted to love and use them are leaving them instead.

A post today re: Typed Clojure echoes this previous one:

In September 2013 we blogged about why we’re supporting Typed Clojure, and you should too! Now, 2 years later, our engineering team has made a collective decision to stop using Typed Clojure (specifically the core.typed library).

I come not to bury Typed X, but to praise them. Can somebody please work on figuring out what is missing or needs to be tweaked to make them more usable? (Anything from speed to culture.) Or should we truly conclude that trying to augment or paper over a dynamic ecosystem is, for the most part, just doomed to fail?