Performance Problems You Can Fix: A Dynamic Analysis of Memoization Opportunities

Performance Problems You Can Fix: A Dynamic Analysis of Memoization Opportunities
Luca Della Toffola, Michael Pradel, Thomas Gross

Performance bugs are a prevalent problem and recent research proposes various techniques to identify such bugs. This paper addresses a kind of performance problem that often is easy to address but difficult to identify: redundant computations that may be avoided by reusing already computed results for particular inputs, a technique called memoization. To help developers find and use memoization opportunities, we present MemoizeIt, a dynamic analysis that identifies methods that repeatedly perform the same computation. The key idea is to compare inputs and outputs of method calls in a scalable yet precise way. To avoid the overhead of comparing objects at all method invocations in detail, MemoizeIt first compares objects without following any references and iteratively increases the depth of exploration while shrinking the set of considered methods. After each iteration, the approach ignores methods that cannot benefit from memoization, allowing it to analyze calls to the remaining methods in more detail. For every memoization opportunity that MemoizeIt detects, it provides hints on how to implement memoization, making it easy for the developer to fix the performance issue. Applying MemoizeIt to eleven real-world Java programs reveals nine profitable memoization opportunities, most of which are missed by traditional CPU time profilers, conservative compiler optimizations, and other existing approaches for finding performance bugs. Adding memoization as proposed by MemoizeIt leads to statistically significant speedups by factors between 1.04x and 12.93x.

This paper was recommended by Asumu Takikawa. It is a nice idea, which seems surprisingly effective. The examples they analysed (sadly they don't really explain how they picked the program to study) have a mix of memoization opportunities in fairly different parts of the codebase. There are several examples of what we could call "peripheral communication logic", eg. date formatting stuff, which we could assume is not studied really closely by the programmers focusing more on the problem-domain logic. But there is also an interesting of subtle domain-specific memoization opportunity: an existing cache was over-pessimistic and would reset itself at times where it was in fact not necessary, and this observation corresponds to a non-trivial program invariant.

The authors apparently had some difficulties finding program inputs to exercise profiling. Programs should more often be distributed with "performance-representative inputs", not just functionality-testing inputs. In one example of a linting tool, the provided "standard test" was to feed the code of the linting tool to itself. But this was under a default configuration for which the tools' developers had already fixed all alarms, so the alarm-creating code (which actually had an optimization opportunity) was never exercised by this proposed input.

Note that the caching performed is very lightweight, usually not a full tabulation of the function. Most examples just add a static variable to cache the last (input, output) pair, which is only useful when the same input is typically called several times in a row, but costs very little space.

Progress on Gradual Typing

Among many interesting works, the POPL 2016 papers have a bunch of nice articles on Gradual Typing.

The Gradualizer: a methodology and algorithm for generating gradual type systems

The Gradualizer: a methodology and algorithm for generating gradual type systems
by Matteo Cimini, Jeremy Siek

Many languages are beginning to integrate dynamic and static typing. Siek and Taha offered gradual typing as an approach to this integration that provides the benefits of a coherent and full-span migration between the two disciplines. However, the literature lacks a general methodology for designing gradually typed languages. Our first contribution is to provide such a methodology insofar as the static aspects of gradual typing are concerned: deriving the gradual type system and the compilation to the cast calculus.

Based on this methodology, we present the Gradualizer, an algorithm that generates a gradual type system from a normal type system (expressed as a logic program) and generates a compiler to the cast calculus. Our algorithm handles a large class of type systems and generates systems that are correct with respect to the formal criteria of gradual typing. We also report on an implementation of the Gradualizer that takes type systems expressed in lambda-prolog and outputs their gradually typed version (and compiler to the cast calculus) in lambda-prolog.

One can think of the Gradualizer as a kind of meta-programming algorithm that takes a type system in input, and returns a gradual version of this type system as output. I find it interesting that these type systems are encoded as lambda-prolog programs (a notable use-case for functional logic programming). This is a very nice way to bridge the gap between describing a transformation that is "in principle" mechanizable and a running implementation.

An interesting phenomenon happening once you want to implement these ideas in practice is that it forced the authors to define precisely many intuitions everyone has when reading the description of a type system as a system of inference rules. These intuitions are, broadly, about the relation between the static and the dynamic semantics of a system, the flow of typing information, and the flow of values; two occurrences of the same type in a typing rule may play very different roles, some of which are discussed in this article.

Is Sound Gradual Typing Dead?

Is Sound Gradual Typing Dead?
by Asumu Takikawa, Daniel Feltey, Ben Greenman, Max New, Jan Vitek, Matthias Felleisen

Programmers have come to embrace dynamically typed languages for prototyping and delivering large and complex systems. When it comes to maintaining and evolving these systems, the lack of explicit static typing becomes a bottleneck. In response, researchers have explored the idea of gradually typed programming languages which allow the post-hoc addition of type annotations to software written in one of these “untyped” languages. Some of these new hybrid languages insert run-time checks at the boundary between typed and untyped code to establish type soundness for the overall system. With sound gradual typing programmers can rely on the language implementation to provide meaningful error messages when “untyped” code misbehaves.

While most research on sound gradual typing has remained theoretical, the few emerging implementations incur performance overheads due to these checks. Indeed, none of the publications on this topic come with a comprehensive performance evaluation; a few report disastrous numbers on toy benchmarks. In response, this paper proposes a methodology for evaluating the performance of gradually typed programming languages. The key is to explore the performance impact of adding type annotations to different parts of a software system. The methodology takes takes the idea of a gradual conversion from untyped to typed seriously and calls for measuring the performance of all possible conversions of a given untyped benchmark. Finally the paper validates the proposed methodology using Typed Racket, a mature implementation of sound gradual typing, and a suite of real-world programs of various sizes and complexities. Based on the results obtained in this study, the paper concludes that, given the state of current implementation technologies, sound gradual typing is dead. Conversely, it raises the question of how implementations could reduce the overheads associated with ensuring soundness and how tools could be used to steer programmers clear from pathological cases.

In a fully dynamic system, typing checks are often superficial (only the existence of a particular field is tested) and done lazily (the check is made when the field is accessed). Gradual typing changes this, as typing assumptions can be made earlier than the value is used, and range over parts of the program that are not exercised in all execution branches. This has the potentially counter-intuitive consequence that the overhead of runtime checks may be sensibly larger than for fully-dynamic systems. This paper presents a methodology to evaluate the "annotation space" of a Typed Racket program, studying how the possible choices of which parts to annotate affect overall performance.

Many would find this article surprisingly grounded in reality for a POPL paper. It puts the spotlight on a question that is too rarely discussed, and could be presented as a strong illustration of why it matters to be serious about implementing our research.

Abstracting Gradual Typing

Abstracting Gradual Typing
by Ronald Garcia, Alison M. Clark, Éric Tanter

Language researchers and designers have extended a wide variety of type systems to support gradual typing, which enables languages to seamlessly combine dynamic and static checking. These efforts consistently demonstrate that designing a satisfactory gradual counterpart to a static type system is challenging, and this challenge only increases with the sophistication of the type system. Gradual type system designers need more formal tools to help them conceptualize, structure, and evaluate their designs.

In this paper, we propose a new formal foundation for gradual typing, drawing on principles from abstract interpretation to give gradual types a semantics in terms of pre-existing static types. Abstracting Gradual Typing (AGT for short) yields a formal account of consistency—one of the cornerstones of the gradual typing approach—that subsumes existing notions of consistency, which were developed through intuition and ad hoc reasoning.

Given a syntax-directed static typing judgment, the AGT approach induces a corresponding gradual typing judgment. Then the subject-reduction proof for the underlying static discipline induces a dynamic semantics for gradual programs defined over source-language typing derivations. The AGT approach does not recourse to an externally justified cast calculus: instead, run-time checks naturally arise by deducing evidence for consistent judgments during proof-reduction.

To illustrate our approach, we develop novel gradually-typed counterparts for two languages: one with record subtyping and one with information-flow security types. Gradual languages designed with the AGT approach satisfy, by construction, the refined criteria for gradual typing set forth by Siek and colleagues.

At first sight this description seems to overlap with the Gradualizer work cited above, but in fact the two approaches are highly complementary. The Abstract Gradual Typing effort seems mechanizable, but it is far from being implementable in practice as done in the Gradualizer work. It remains a translation to be done on paper by skilled expert, although, as standard in abstract interpretation works, many aspects are deeply computational -- computing the best abstractions. On the other hand, it is extremely powerful to guide system design, as it provides not only a static semantics for a gradual system, but also a model dynamic semantics.

The central idea of the paper is to think of a missing type annotation not as "a special Dyn type that can contain anything" but "a specific static type, but I don't know which one it is". A problem is then to be understood as a family of potential programs, one for each possible static choice that could have been put there. Not all choices are consistent (type soundness imposes constraints on different missing annotations), so we can study the space of possible interpretations -- using only the original, non-gradually-typed system to make those deductions.

An obvious consequence is that a static type error occurs exactly when we can prove that there is no possible consistent typing. A much less obvious contribution is that, when there is a consistent set of types, we can consider this set as "evidence" that the program may be correct, and transport evidence along values while running the program. This gives a runtime semantics for the gradual system that automatically does what it should -- but it, of course, would fare terribly in the performance harness described above.

Some context

The Abstract Gradual Typing work feels like a real breakthrough, and it is interesting to idly wonder about which previous works in particular enabled this advance. I would make two guesses.

First, there was a very nice conceptualization work in 2015, drawing general principles from existing gradual typing system, and highlighting in particular a specific difficulty in designing dynamic semantics for gradual systems (removing annotations must not make program fail more).

Refined Criteria for Gradual Typing
by Jeremy Siek, Michael Vitousek, Matteo Cimini, and John Tang Boyland

Siek and Taha [2006] coined the term gradual typing to describe a theory for integrating static and dynamic typing within a single language that 1) puts the programmer in control of which regions of code are statically or dynamically typed and 2) enables the gradual evolution of code between the two typing disciplines. Since 2006, the term gradual typing has become quite popular but its meaning has become diluted to encompass anything related to the integration of static and dynamic typing. This dilution is partly the fault of the original paper, which provided an incomplete formal characterization of what it means to be gradually typed. In this paper we draw a crisp line in the sand that includes a new formal property, named the gradual guarantee, that relates the behavior of programs that differ only with respect to their type annotations. We argue that the gradual guarantee provides important guidance for designers of gradually typed languages. We survey the gradual typing literature, critiquing designs in light of the gradual guarantee. We also report on a mechanized proof that the gradual guarantee holds for the Gradually Typed Lambda Calculus.

Second, the marriage of gradual typing and abstract interpretation was already consumed in previous work (2014), studying the gradual classification of effects rather than types.

A Theory of Gradual Effect Systems
by Felipe Bañados Schwerter, Ronad Garcia, Éric Tanter

Effect systems have the potential to help software developers, but their practical adoption has been very limited. We conjecture that this limited adoption is due in part to the difficulty of transitioning from a system where effects are implicit and unrestricted to a system with a static effect discipline, which must settle for conservative checking in order to be decidable. To address this hindrance, we develop a theory of gradual effect checking, which makes it possible to incrementally annotate and statically check effects, while still rejecting statically inconsistent programs. We extend the generic type-and-effect framework of Marino and Millstein with a notion of unknown effects, which turns out to be significantly more subtle than unknown types in traditional gradual typing. We appeal to abstract interpretation to develop and validate the concepts of gradual effect checking. We also demonstrate how an effect system formulated in Marino and Millstein’s framework can be automatically extended to support gradual checking.

Difficulty rewards: gradual effects are more difficult than gradual simply-typed systems, so you get strong and powerful ideas when you study them. The choice of working on effect systems is also useful in practice, as nicely said by Philip Wadler in the conclusion of his 2015 article A Complement to Blame:

I [Philip Wadler] always assumed gradual types were to help those poor schmucks using untyped languages to migrate to typed languages. I now realize that I am one of the poor schmucks. My recent research involves session types, a linear type system that declares protocols for sending messages along channels. Sending messages along channels is an example of an effect. Haskell uses monads to track effects (Wadler, 1992), and a few experimental languages such as Links (Cooper et al., 2007), Eff (Bauer and Pretnar, 2014), and Koka (Leijen, 2014) support effect typing. But, by and large, every programming language is untyped when it comes to effects. To encourage migration from legacy code to code with effect types, such as session types, some form of gradual typing may be essential.

Static vs. Dynamic Languages: A Literature Review

We've mentioned some empirical studies of programming languages a few times, but I haven't seen a comprehensive list we can use as a reference.

Fortunately, I just came across this pretty decent overview of existing literature on how types impact development. Agree or disagree with Dan Luu's position, the comprehensive list warrants a front-page post in my opinion.

One point worth noting is that all the studies used relatively inexpressive languages with bland type systems, like C and Java, and compared those against typed equivalents. A future study ought to compare a more expressive language, like OCaml, Haskell or F#, which should I think would yield more pertinent data to this age-old debate.

Part of the benefits of types allegedly surround documentation to help refactoring without violating invariants. So another future study I'd like to see is one where participants develop a program meeting certain requirements in their language of choice. They will have as much time as needed to satisfy a correctness test suite. They should then be asked many months later to add a new feature to the program they developed. I expect that the maintenance effort required of a language is more important than the effort required of initial development, because programs change more often than they are written from scratch.

This could be a good thread on how to test the various beliefs surrounding statically typed and dynamically languages. If you have any studies that aren't mentioned above, or some ideas on what would make a good study, let's hear it!

BWK on "How to succeed in language design without really trying"

A talk by Brian Kernighan at The University of Nottingham.

Describes all the usual suspects: AWK, EQN, PIC. Even AMPL. I was wondering which languages he had in mind when he mentioned that some of his creations were total flops. I'd love to learn from those!

The talk is a fun way to spend an hour, and the audio would be good for commuters. For real aficionados I would recommend reading Jon Bentley's articles on the design of these languages (as well as CHEM and others) instead.

Compilers as Assistants

Designers of Elm want their compiler to produce friendly error messages. They show some examples of helpful error messages from their newer compiler on the blog post.

Compilers as Assistants

One of Elm’s goals is to change our relationship with compilers. Compilers should be assistants, not adversaries. A compiler should not just detect bugs, it should then help you understand why there is a bug. It should not berate you in a robot voice, it should give you specific hints that help you write better code. Ultimately, a compiler should make programming faster and more fun!

Breaking Through the Normalization Barrier: A Self-Interpreter for F-omega

Breaking Through the Normalization Barrier: A Self-Interpreter for F-omega, by Matt Brown and Jens Palsberg:

According to conventional wisdom, a self-interpreter for a strongly normalizing λ-calculus is impossible. We call this the normalization barrier. The normalization barrier stems from a theorem in computability theory that says that a total universal function for the total computable functions is impossible. In this paper we break through the normalization barrier and define a self-interpreter for System Fω, a strongly normalizing λ-calculus. After a careful analysis of the classical theorem, we show that static type checking in Fω can exclude the proof’s diagonalization gadget, leaving open the possibility for a self-interpreter. Along with the self-interpreter, we program four other operations in Fω, including a continuation-passing style transformation. Our operations rely on a new approach to program representation that may be useful in theorem provers and compilers.

I haven't gone through the whole paper, but their claims are compelling. They have created self-interpreters in System F, System Fω and System Fω+, which are all strongly normalizing typed languages. Previously, the only instance of this for a typed language was Girard's System U, which is not strongly normalizing. The key lynchpin appears in this paragraph on page 2:

Our result breaks through the normalization barrier. The conventional wisdom underlying the normalization barrier makes an implicit assumption that all representations will behave like their counterpart in the computability theorem, and therefore the theorem must apply to them as well. This assumption excludes other notions of representation, about which the theorem says nothing. Thus, our result does not contradict the theorem, but shows that the theorem is less far-reaching than previously thought.

Pretty cool if this isn't too complicated in any given language. Could let one move some previously non-typesafe runtime features, into type safe libraries.

On type safety for core Scala: "From F to DOT: Type Soundness Proofs with Definitional Interpreters"

From F to DOT: Type Soundness Proofs with Definitional Interpreters by Tiark Rompf and Nada Amin:

Scala's type system unifies aspects of ML-style module systems, object-oriented, and functional programming paradigms. The DOT (Dependent Object Types) family of calculi has been proposed as a new theoretic foundation for Scala and similar expressive languages. Unfortunately, type soundness has only been established for a very restricted subset of DOT (muDOT), and it has been shown that adding important Scala features such as type refinement or extending subtyping to a lattice breaks at least one key metatheoretic property such as narrowing or subtyping transitivity, which are usually required for a type soundness proof.
The first main contribution of this paper is to demonstrate how, perhaps surprisingly, even though these properties are lost in their full generality, a richer DOT calculus that includes both type refinement and a subtyping lattice with intersection types can still be proved sound. The key insight is that narrowing and subtyping transitivity only need to hold for runtime objects, but not for code that is never executed. Alas, the dominant method of proving type soundness, Wright and Felleisen's syntactic approach, is based on term rewriting, which does not make an adequate distinction between runtime and type assignment time.
The second main contribution of this paper is to demonstrate how type soundness proofs for advanced, polymorphic, type systems can be carried out with an operational semantics based on high-level, definitional interpreters, implemented in Coq. We present the first mechanized soundness proof for System F<: based on a definitional interpreter. We discuss the challenges that arise in this setting, in particular due to abstract types, and we illustrate in detail how DOT-like calculi emerge from straightforward generalizations of the operational aspects of F<:.

Not only they solve a problem that has been open for 12 years, but they also deploy interesting techniques to make the proof possible and simple. As they write themselves, that includes the first type-soundness proof for F<: using definitional interpreters — that is, at least according to some, denotational semantics.

Understated Twitter announcement here.

GADTs Meet Their Match: Pattern-Matching Warnings That Account for GADTs, Guards, and Laziness

GADTs Meet Their Match: Pattern-Matching Warnings That Account for GADTs, Guards, and Laziness by Georgios Karachalias, Tom Schrijvers, Dimitrios Vytiniotis, Simon Peyton Jones:

For ML and Haskell, accurate warnings when a function definition has redundant or missing patterns are mission critical. But today’s compilers generate bogus warnings when the programmer uses guards (even simple ones), GADTs, pattern guards, or view patterns. We give the first algorithm that handles all these cases in a single, uniform framework, together with an implementation in GHC, and evidence of its utility in practice.

Another great paper on a very useful incremental improvement on GADTs as found in Haskell, OCaml and Idris. Exhaustiveness checking is critical for a type system's effectiveness, and the redundant matching warnings are a nice bonus.

Optimizing Closures in O(0) time

Optimizing Closures in O(0) time, by Andrew W. Keep, Alex Hearn, R. Kent Dybvig:

The flat-closure model for the representation of first-class procedures is simple, safe-for-space, and efficient, allowing the values or locations of free variables to be accessed with a single memory indirect. It is a straightforward model for programmers to understand, allowing programmers to predict the worst-case behavior of their programs. This paper presents a set of optimizations that improve upon the flat-closure model along with an algorithm that implements them, and it shows that the optimizations together eliminate over 50% of run-time closure-creation and free-variable access overhead in practice, with insignificant compile-time overhead. The optimizations never add overhead and remain safe-for-space, thus preserving the benefits of the flat-closure model.

Looks like a nice and simple set of optimizations for probably the most widely deployed closure representation.

Dependent Types for Low-Level Programming

Dependent Types for Low-Level Programming by Jeremy Condit, Matthew Harren, Zachary Anderson, David Gay, and George C. Necula:

In this paper, we describe the key principles of a dependent type system for low-level imperative languages. The major contributions of this work are (1) a sound type system that combines dependent types and mutation for variables and for heap-allocated structures in a more flexible way than before and (2) a technique for automatically inferring dependent types for local variables. We have applied these general principles to design Deputy, a dependent type system for C that allows the user to describe bounded pointers and tagged unions. Deputy has been used to annotate and check a number of real-world C programs.

A conceptually simple approach to verifying the safety of C programs, which proceeeds in 3 phases: 1. infer locals that hold pointer bounds, 2. flow-insensitive checking introduces runtime assertions using these locals, 3. flow-sensitive optimization that removes the assertions that it can prove always hold.

You're left with a program that ensures runtime safety with as few runtime checks as possible, and the resulting C program is compiled with gcc which can perform its own optimizations.

This work is from 2007, and the project grew into the Ivy language, which is a C dialect that is fully backwards compatible with C if you #include a small header file that includes the extensions.

It's application to C probably won't get much uptake at this point, but I can see this as a useful compiler plugin to verify unsafe Rust code.