Type-sensitive control-flow analysis

Type-sensitive control-flow analysis, John Reppy, 2006 ML Workshop.

Higher-order typed languages, such as ML, provide strong support for data and type abstraction. While such abstraction is often viewed as costing performance, there are situations where it may provide opportunities for more aggressive program optimization. Specifically, we can exploit the fact that type abstraction guarantees representation independence, which allows the compiler to specialize data representations. This paper describes a first step in supporting such optimizations; namely a modular control-flow analysis that uses the program’s type information to compute more precise results. We present our algorithm as an extension of Serrano’s version of 0-CFA and we show that it respects types. We also discuss applications of the analysis with a specific focus on optimizing Concurrent ML programs.

a common mischaracterization in the abstract

The paper is cool but it starts by claiming an expectation that IMHO is backwards. Type checking is widely recognized as improving performance, and that includes abstraction constructs. In fact, I think theoreticians have underestimated just how important performance is to many practitioners. Theoreticians have seized on the error-removal possibilities of type checking, but practitioners seem to use it (when they do) to get speed. If practitioners were looking mainly for error removal, then they would have more quickly adopted array bounds checking and checked dynamic casts.

Anyway, this paper has cool ways to use static type information to improve analysis. In particular, Reppy exploits modularization features to improve the analysis when you run it on an individual module. Specifically, if a data type is abstract (private), then it can only be instantiated inside its own module. Any such value that flows in through the public interface must be one that earlier flowed out through the public interface.
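A minimal sketch of that situation, in OCaml (the `Counter` module and its operations are made up for illustration): outside the module, values of the abstract type `Counter.t` can only originate from `Counter.make`, so a modular analysis that sees only this module knows the single allocation site for every `t` that later flows back in through the public interface.

```ocaml
(* Hypothetical module with an abstract type. The signature hides the
   representation, so no outside code can build a Counter.t directly. *)
module Counter : sig
  type t                      (* abstract: representation hidden *)
  val make : unit -> t
  val incr : t -> t
  val read : t -> int
end = struct
  type t = int
  let make () = 0             (* the only place a t is ever created *)
  let incr c = c + 1
  let read c = c
end

let () =
  let c = Counter.(read (incr (incr (make ())))) in
  print_int c                 (* prints 2 *)
```

The analysis can treat `make`'s body as the complete set of abstract values reaching `incr` and `read`, even without seeing the rest of the program.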

You can do some of this kind of reasoning, by the way, whether or not you have static types. For example, a private inner class can only be instantiated inside its enclosing scope, so whenever you see an instance of one, you know all of the places it could have been created, even without the whole program. Likewise for a Smalltalk block: there is a single location in the entire program where it could have been turned into a closure.
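The same single-creation-site reasoning can be sketched with a lexically scoped closure in OCaml (names are made up for illustration): because `step` is the only lambda in scope that can flow into `iterate`'s function argument, an analysis seeing just this definition knows that every call to `f` invokes `step`.

```ocaml
(* The closure bound to [step] is created at exactly one syntactic
   site, so a local analysis can resolve every call through [f]. *)
let sum_to n =
  let step acc i = acc + i in        (* the single creation site *)
  let rec iterate f acc i =
    if i > n then acc else iterate f (f acc i) (i + 1)
  in
  iterate step 0 1

let () = print_int (sum_to 10)       (* prints 55 *)
```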

Abstraction, not type checking

The paper claims it is abstraction that is commonly considered to cost performance. This seems pretty reasonable, especially in untyped languages, where there is no chance to enforce an abstraction purely with the type system. On the other hand, the static and separate compilation that most statically typed language implementations use costs some inlining opportunities compared to a good JIT (re)compiler.

I wonder how keeping separate contexts for different (approximations to) instantiations of polymorphic functions compares with other precision improvements like tracking call stacks in k-CFA.