Whither Flow Analysis?

In the comments to a post about CSE in Guile, NeelK mentions we need more flow analysis. The other day I was reading A Flow-Analysis Framework for Realistic Scheme Programs, which was done in Scheme48. GHC does some fusiony stuff. What is the state of the art? Who is working on it? When will it come to a compiler / JavaScript interpreter near me?


It already did

See http://rfrn.org/~shu/drafts/ti.pdf for SpiderMonkey.

recent breakthroughs

A lot of program analyses reduce to the same problems, and three particular trends are exciting to me right now:

  • leveraging commonalities between program analysis optimizations and SAT solver optimizations (Vijay D'Silva). Sexy, but no big wins yet.
  • using machine learning for context-sensitive context-sensitivity (Percy Liang & Mayur Naik). Pretty exciting.
  • scaling via GPUs/distribution (forgot the references, but think Lam-style Datalog formulations). More generally, we need to be scaling to real-time and handling bigger data sets, and while the 'real world' has switched to these methods, researchers generally haven't.

IMO, the literature on pointer analysis is probably where to look for algorithmic / precision breakthroughs, not flow analysis, unless explicitly tackling oddities such as control implications for information flow.

Re: JavaScript, doing it in an on-demand JIT context is tricky. JS systems generally do more quasi-trivial lightweight things. I'd love a way to provide precomputed hints, but am not holding my breath. The use cases are more for bug finding, specification mining, offline optimization, and IDE tooling.

LtU posts?

Those look like interesting references, thanks. Why haven't we heard of those on LtU yet? Since I got the ability to post front page stories last year, I've tried to post a few things, but I generally try to write about stuff I have a reasonable understanding of -- so it would take quite some time for me to write something about this work.

If you don't have the ability to post front page stories, you should ask Ehud for it. In the meantime, feel free to send me proposals of things to post (maybe a paragraph describing why you think the work is interesting would be nice, to publish along the article abstract) -- gasche dot dylc at gmail.

There is just a lot of it to

There is just a lot of it to go through. It would be nice if, when someone sees a paper they like, they post it, since signal-to-noise ratios aren't great otherwise.

I've dabbled in this field before, and I found the biggest problems to be accuracy and predictability. Since the analysis would always lose precision, it was difficult for the programmer to predict how their code would be optimized... i.e. they couldn't depend on it. Even if their code achieved good performance in one version, there was no guarantee that a small change wouldn't cause significantly different performance.

The results are perversely only useful if you don't depend on them... say, in the few-percentage-point range. Anything greater can lead to observable performance regressions. Something interactive with checked assertions could get around that, though I'm not familiar with any work that has tried it.

Taming Lambda

CFA for higher order languages is fascinating. When working on my Scheme compiler I heap allocated all environments: it's the simplest and most general strategy, but it's not always the most efficient. Even with a big improvement to the CPS algorithm, I think this is making the emitted code too slow to handle a bootstrap. From reading RABBIT and ORBIT I figured that the key is to stack allocate closures when possible: to know how to do this you need CFA. Of course CFA tells you a lot, and you can do many other optimizations based on the information it provides.

So I started to research this algorithm. Olin Shivers developed the 0-CFA and k-CFA algorithms in his dissertation Control-Flow Analysis of Higher-Order Languages, or Taming Lambda. He uses the CPS transform to reify control flow using the lambda operator, continuing the Scheme tradition -- he calls this The Hedgehog Approach. In a higher order language (since you can pass lambdas around as data) control flow and data flow are mutually interrelated, so the analysis is more involved than the corresponding one for first order languages.
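To see what "reifying control flow with lambda" means, here's a toy comparison in Python (my own example, not from Shivers's dissertation): in CPS, every function takes an explicit continuation, so "what happens next" becomes an ordinary value an analysis can track like any other.

```python
# Direct style: control flow (call/return) is implicit.
def square(x):
    return x * x

def pythagoras(a, b):
    return square(a) + square(b)

# CPS: every function takes an explicit continuation k. The "return"
# is now just a call to a lambda, so control flow is ordinary data flow.
def square_k(x, k):
    return k(x * x)

def add_k(a, b, k):
    return k(a + b)

def pythagoras_k(a, b, k):
    return square_k(a, lambda a2:
           square_k(b, lambda b2:
           add_k(a2, b2, k)))

print(pythagoras(3, 4))                 # 25
print(pythagoras_k(3, 4, lambda r: r))  # 25
```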

0th order control flow analysis works by figuring out which lambda values may be bound at the various binding sites. The analysis at higher orders also takes into account what the environments might hold. A good way to implement it is to emit subset constraints and then solve them afterwards. It's covered in the Principles of Program Analysis book, but I find that chapter pretty confusing. I also came across some slides that were quite clear here, here and here (page 41 and on).
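As a toy illustration of the subset-constraint idea, here's a minimal 0-CFA over a tiny lambda calculus, solved by naive fixpoint iteration rather than the book's worklist/constraint-graph algorithm (the term representation and names are my own):

```python
# Terms: ('var', name) | ('lam', param, body) | ('app', fn, arg)
# The analysis computes, for every sub-term and variable, the set of
# lambdas that may flow there. Variables are monovariant (one flow set
# per name), which is exactly 0-CFA's context-insensitivity.
def zero_cfa(root):
    flow = {}                                   # term-or-name -> set of lams

    def get(k):
        return flow.setdefault(k, set())

    def subterms(t):
        yield t
        if t[0] == 'lam':
            yield from subterms(t[2])
        elif t[0] == 'app':
            yield from subterms(t[1])
            yield from subterms(t[2])

    terms = list(subterms(root))
    changed = True
    while changed:                              # naive fixpoint iteration
        changed = False
        for t in terms:
            if t[0] == 'lam':                   # a lambda flows to itself
                if t not in get(t):
                    get(t).add(t); changed = True
            elif t[0] == 'var':                 # var's flow <= its name's flow
                new = get(t[1]) - get(t)
                if new:
                    get(t).update(new); changed = True
            elif t[0] == 'app':
                for lam in list(get(t[1])):     # for each possible callee...
                    _, param, body = lam
                    new = get(t[2]) - get(param)   # arg flows into param
                    if new:
                        get(param).update(new); changed = True
                    new = get(body) - get(t)       # body result flows out
                    if new:
                        get(t).update(new); changed = True
    return flow

# Example: ((lambda x. x) (lambda y. y)) -- what can the application be?
id_x = ('lam', 'x', ('var', 'x'))
id_y = ('lam', 'y', ('var', 'y'))
prog = ('app', id_x, id_y)
assert zero_cfa(prog)[prog] == {id_y}
```

The subset constraints are implicit in the update rules: each `get(a) - get(b)` check enforces a constraint of the form C(a) ⊆ C(b), and iteration runs until no constraint adds anything new.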

The Bigloo Scheme compiler makes use of 0CFA (on direct, non-CPS code) to work out the best strategy for allocating each closure: Control Flow Analysis: a Functional Languages Compilation Paradigm. Overall it's very difficult stuff; I can see why people would try to avoid implementing it if possible. I feel like it's too hard for me to work on before I have a bootstrapping compiler, so I think I'll try to get something working without CPS first, then go back and use CPS with this optimization if I can.

Matt Might's advisor was Olin Shivers, and his dissertation Environment Analysis of Higher-Order Languages starts by recasting Shivers's work in terms of abstract interpretation. Then he extends the analysis in a few directions: for example, abstract garbage collection and abstract reference counting -- these seem to have very powerful applications. It would be really cool to build compilers that use these techniques! I expect it would give a competitive edge over compilers that don't -- and I'm not sure anyone uses these modern techniques yet.

off the wall question

So it sounds like (I am clueless) "lower level" imperative/procedural stuff sort of has explicit control flow in it, since people tend not to pass around function pointers all that often, vs. HLLs where people do (lambda (f) (f x )) all the time.

What are the gradients in-between? Like, how can HLLs be slightly changed to be more amenable to CFA? E.g. could there be an optimizer feedback loop where it tells us "i can't figure out enough about that to optimize" and then we go in and make annotations or something explicit?

Even more zany, are there completely different syntax/semantic building blocks that make CFA fall out more easily, yet don't make us into procedural cavemen?


Instead people pass around objects with methods. Doesn't that still reduce to the same problems of context-sensitivity and whole program analysis?

Ya, only the degrees are

Ya, only the degrees are different. Still, virtual method dispatch can be a different problem to solve vs. full HOFs (you at least have the name to restrict your search space!).


Agreed. That difference is one of the main challenges for the armies of phds who have switched from doing static analysis of Java to JavaScript. (E.g., doing pointer & inf flow analysis that switches from named objects to raw records.)

Ya, only the degrees are

Ya, only the degrees are different. Still, virtual method dispatch can be a different problem to solve vs. full HOFs (you at least have the name to restrict your search space!).

This is much more significant than you make it sound. k-CFA is polynomial time in OO, but EXPTIME in closure-based languages!

See above. Less important,

See above. Less importantly, high-polynomial algorithms can already break at this scale, especially for the needed precision.

I don't think this is right.

I don't think this is right. As that paper describes, people really use nonzero levels of both object and context sensitivity in analysis of Java programs, but not for functional languages. So the difference matters in practice.

physical constants

Does the blowup happen outside of deeply nested contexts? Programs aren't (statically) deep -- our linter goes as far as rejecting ones that go beyond 5 or 6 levels.

Fwiw, analysis of JS is at least 1- or 2-deep. (Though yes, more due to getters/setters etc. than its lambdas.)

Wait, which FPs?

Racket libraries use objects/mixins/etc., so which FPs are you referencing? Also, in Racket's case, is the need somehow being avoided via a mechanism like macros? (So expanding before hitting the analysis?)

I think you are reading Sam backwards...

...he's saying that people are using algorithms like this to analyze big Java programs, so these algorithms *do* scale, despite having high (i.e. >1) polynomial factors.

i'm (easily) confused

I read the other excerpt and the thing you said, "k-CFA is polynomial time in OO, but EXPTIME in closure-based languages!" as saying OO was easier, not harder?

I see it like this: the more

I see it like this: the more generic the code, the harder to analyze. In a closure-based language, you can write really generic code without trying very hard, so the code tends to be generic. In an OO language, writing generic code is harder, so the code tends to be less generic.

However, you can always write generic code in an OOPL; it is harder, but the result would be the same: the code would be hard to analyze. Likewise, in a closure-based language, you can write non-generic code by abstaining, and the analysis would be easier.

Static Generics

Is that true when the generics are expanded statically like C++ templates?

In that case, you simply

In that case, you simply have more code to analyze. There is no free lunch. In any case, you can easily accomplish the same thing in an analysis if indirection isn't a concern (it's necessary for inlining).


There's a difference between having worked at all at some precision and working well. We don't put strong analyses into JITs. Unless we're doing something like MKL, we won't wait long for an optimizing compiler. Even in bug finding, we'll skimp. Ex: people pay Coverity to run a battery of analyses -- it's not just flow analysis once and that's it -- and for speed, they still go lightweight.

In this paper, they explode even for offline use at ~300 KLOC: https://www.cs.utexas.edu/users/lin/papers/cgo11.pdf .

FWIW, Mayur's papers are interesting because they help answer how much sensitivity is needed to answer queries: there *is* a notion of enough speed for big code bases, but we're not there.

I'm still curious about whether/why 0-cfa is sufficient for FPs. I suspect there's something else going on here, but don't know enough about what people are using it for nor on what.

0CFA and FPLs

To my knowledge, very few functional language compilers use a higher-order control flow analysis. Off the top of my head, only MLton, Stalin, Bigloo, and Chez Scheme (subzero-CFA) do. In particular, Scala, Racket, SML/NJ, GHC and OCaml don't use it. So the most popular FP implementations don't use 0CFA!

Beyond the fact that it's a whole-program analysis, I've found it surprisingly tricky to use the information 0CFA gives you to compile functional languages. The biggest benefit it gives you is that it lets you turn some indirect calls into direct calls, which you may then be able to inline.

However, the most important indirect calls in functional programs arise in the bodies of invocations of higher-order functions (e.g., in the definition of map), and CFA can't help you much with them, because all the flow analysis will end up doing is telling you that lots of different lambdas are passed as functional arguments to map. (My possibly-wrong understanding is that GHC and Scala use a first-order analysis plus very aggressive inlining to eliminate these HOFs.)

I think you need to combine it with some kind of function cloning optimization/analysis, so that you can generate specialized code at each call site to eliminate the HOF. I'm sure it's folklore -- surely Olin Shivers or John Reppy or Manuel Serrano knows how to do it -- but I haven't seen it documented in the literature.
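A hand-done Python sketch of what that cloning would buy you (my own toy example; `my_map` and `my_map_inc` are made-up names): the generic HOF has an indirect call in its loop, while the per-call-site clone turns it into inlined straight-line code.

```python
# Generic map: the call f(x) inside the loop is indirect, and 0-CFA can
# only report that many different lambdas flow to f.
def my_map(f, xs):
    return [f(x) for x in xs]

# A clone specialized to one call site's argument (f = lambda x: x + 1):
# the indirect call becomes a direct one and is then inlined away.
def my_map_inc(xs):
    return [x + 1 for x in xs]

assert my_map(lambda x: x + 1, [1, 2, 3]) == my_map_inc([1, 2, 3]) == [2, 3, 4]
```

The analysis's job would be deciding which call sites are worth a clone, since cloning every one blows up code size.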

I remember that there was

I remember that there was some work related to automatic differentiation where they used flow analysis and subsequent specialization to eliminate all closures & all indirect calls. Of course that doesn't work for arbitrary code (e.g. for a church encoding of naturals) but it's interesting nonetheless.


I've been thinking a few things here, but really don't know the community.

1. The richer type structures and potentially induced programming style simplify optimization, e.g., for data representation, and as you mentioned, inlining. Referential transparency, macros, lack of subclassing, and avoidance of state all simplify these. (OTOH, I'm not sure on nominal vs. structural, which may make flow analysis useful again for data rep.)

2. The compiler teams are too small or research-focused to do the same passes that armies of Intel etc. engineers would. Of the ones you list, GHC probably has the most engineering hours behind it and Racket has great people, but Intel etc. have armies of compiler engineers. (For Racket in particular, I'd instead look at what's happening with JS.)

3. Less security analysis. This is likely a mix of most non-type-based analysis research here being on Java and maybe C, and, relatedly, most commercial needs being for analyzing security-critical code bases written in them. (And to an unclear extent, not needing the analyses in the first place: Coverity's double-free checks generally don't apply to, say, Haskell.)

It really is the complexity

A few points:

- No one uses serious inter-procedural flow analysis in any widely-used compiler that I've ever heard of. All of the C/Java/FP/JS/etc analyses that I've seen break down into two cases:

* Simple, low-cost analyses, along the lines of Dybvig's Sub-0CFA or callgraph construction in C compilers. Similar simple flow analyses are used in JS (see Hackett and Guo), GHC, Racket, OCaml, you name it.

* Expensive analyses, used in systems that are not compilers. This includes all the security analyses, all the verification using control flow, all the C static analysis, etc.

* MLton is an exception to the above.

- Academic and industrial systems based on kCFA that I've seen in OO languages really do have non-zero object and caller sensitivity, whereas this is very rare in functional languages. This is because of the complexity difference due to closures (see Van Horn et al, PLDI 2011).

- 0CFA isn't "sufficient for FPs". There's lots of imprecision there that people would love to get rid of. But 1CFA is EXPTIME-complete, and slow in practice for serious programs. Even 0CFA is slow enough that Dybvig invented a whole new algorithm to put into his compiler (Adams et al, OOPSLA 12).

A few related papers

Andrew Appel's paper Loop Headers in Lambda-calculus or CPS covers a very simple and (I think?) commonly-used technique for efficiently optimizing higher-order call sites like map.

Fast and Effective Procedure Inlining presents an aggressive but linear-time hybrid inlining/constant-propagation/loop-unrolling algorithm. GHC's inliner uses a closely-related variant of this approach, as does Erlang.



Has anyone tried incremental flow analysis at runtime?

Sort of like running gprof at runtime and then focusing on how to speed up the top N time-hogs. Just like GC, flow analysis is also a graph algorithm. While GC cares about dead vs live, this will care about cold vs hot. The idea is to analyze only well-trodden code paths rather than the whole program.

Yes, I've done something similar.

In a hobby lisp implementation, I put counters on most control flow paths, and the interleaved runtime optimizer consumes (zeros) the counts, treating the counts it has reaped in a given area as "fuel" authorizing it to spend proportional effort trying to analyze/optimize code in that area.

When it doesn't know how to do anything more, it generates a code vector with no counter (final optimization), which cuts off further effort on that area.

It hasn't worked as well as I hoped. It's good at concentrating on the most heavily used code segments, but that mainly pays off only when doing strictly local optimizations. More fundamental algorithmic optimizations require increasingly non-local analysis to support. If the only way to spend effort on analyzing a given area is derived from counters incremented by repeating execution of that area, then crucial information about what code in that area will NEVER do - which is vital to making algorithmic transformative optimizations in heavily-used areas - is gathered very slowly from segments that aren't heavily used.

So it's really good for focusing peephole-type optimizations, but you still need a better way to guide effort spent on whole-program or transformative optimizations (which, when you do them, will mostly undo the peephole optimizations you've already done...).

My working theory now is that it's worth doing a fair amount of whole-program analysis and transformations across the whole of the code before I start paying attention to the counters and focusing on individual areas. Local "heat map" information doesn't seem all that useful until it's time to get down to strictly local optimizations.
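My reading of the counters-as-fuel scheme described above, as a toy Python sketch (the block names and `fuel_per_count` knob are made up for illustration): execution bumps a per-block counter, and the optimizer periodically reaps (zeros) the counts, spending effort proportional to how hot each block is.

```python
counters = {}          # block name -> executions since last reap
effort_spent = {}      # block name -> total optimization effort so far

def run_block(name):
    # Instrumented execution: each pass through a block bumps its counter.
    counters[name] = counters.get(name, 0) + 1

def reap_and_optimize(fuel_per_count=1):
    # Consume the counts as "fuel" authorizing proportional analysis effort.
    for name, count in list(counters.items()):
        counters[name] = 0
        effort_spent[name] = effort_spent.get(name, 0) + count * fuel_per_count

for _ in range(100):
    run_block('hot_loop')
run_block('cold_path')
reap_and_optimize()
assert effort_spent['hot_loop'] > effort_spent['cold_path']
```

The failure mode described above falls out of this structure: `cold_path` accumulates fuel so slowly that facts it could contribute to a whole-program transformation arrive very late.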

Thanks for relaying your experience.

Thanks for relaying your experience. I was thinking of cases where an expensive pure function is called with the same args from a given call site. You can simply replace the call with its result! For an impure function, capture its external dependencies; as long as they don't change, it can be considered "pure". The latter feels a bit like dealing with old-to-new pointers in a generational GC. Likely there are other such opportunities...

Better precision can lead to

Better precision can lead to reduced running time for flow analysis. If a flow analysis is imprecise then a value can flow somewhere that it can't flow to in reality, and then from that location it flows to other locations, thus causing a lot of work.

Specific result

k-CFA is polynomial time in OO, but EXPTIME in closure-based languages!

As far as I can tell, this result only applies to a rather narrow definition of "OO", though. That is, class-based, nominal, with no local classes (or object literals). In other words, Java.

Can someone provide intuition?

Why can't you convert a functional program (with closures) into an OO program by converting each function type to an interface and each lambda to a class with fields for the closed-over variables (one that implements the interface corresponding to its type)?
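For concreteness, here's that conversion done by hand in Python (my own toy rendering of defunctionalization; the names `IntFn`, `AddN`, `twice` are made up): the function type becomes an "interface" class, and each lambda becomes a class whose fields are its closed-over variables.

```python
class IntFn:                           # the "interface" for int -> int
    def apply(self, x):
        raise NotImplementedError

class AddN(IntFn):                     # the lambda: lambda x: x + n
    def __init__(self, n):
        self.n = n                     # closed-over variable, now a field
    def apply(self, x):
        return x + self.n

def twice(f, x):                       # a higher-order call site:
    return f.apply(f.apply(x))         # indirect call via the interface

assert twice(AddN(5), 0) == 10
```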

Read the paper!

You should really read the paper referred to earlier.

Is functional k-CFA a profoundly different analysis from OO k-CFA? We resolve this paradox by showing that OO features conspire to make the exact same specification of k-CFA be polynomial-time: objects and closures are subtly different, in a way that interacts crucially with context-sensitivity. This leads to a significant practical result: by emulating the OO approximation, we derive a polynomial hierarchy of context-sensitive CFAs for functional programs, simultaneously achieving high precision and efficiency.
As might be expected, our finding hinges on the fundamental difference between typical functional and OO languages: the former create implicit closures when lambda expressions are created, while the latter require the programmer to explicitly “close” (i.e., pass to a constructor) the data that a newly created object can reference. At an intuitive level, this difference also explains why the exact same k-CFA analysis will not yield the same results if a functional program is automatically rewritten into an OO program: the call-site context-sensitivity of the analysis leads to loss of precision when the values are explicitly copied—the analysis merges the information for all paths with the same k-calling-context into the same entry for the copied data.

what else is possible?

if somebody smart were to get all tripped out and hallucinate some crazy programming syntax that gave us control flow for free, what would it look like?

would it have to look like something with no abstraction? so that the compiler never has to infer anything? or are there some ways to have abstractions but not make the compiler confused or have to have a complicated implementation?

random e.g. when people draw dataflow in FBP or LabView or whatever, it looks to clueless me that they are also drawing some parts of the control flow analysis because i think the wires are usually connecting 2 concrete things, not 2 generic things.

i'm trying to ask: for each bullet point question/issue/problem/hurdle that has led us to CFA (e.g. "what functions can really be passed into this (lambda (f) (f x)) if i do whole program analysis so i can then invoke some optimizations"), what would a syntax gordian knot solution look like? it might just look like crap, like bad C style, but are there other crazy alternatives anybody sees in their head?

You can substitute data flow

You can substitute data flow for control flow...this is how pure functional programming actually works (well + reduction semantics). In FRP, for example, data flow substitutes for control flow that is otherwise completely encapsulated.

CFA and DFA are then really similar activities, especially since any indirection through an assigned field will require some kind of pointer analysis to resolve.

The Gordian knot solution doesn't really exist: you can only make control flow/data flow explicit by disallowing parametricity and polymorphism; i.e. all code is used in a single context. But then you have other problems (literal code explosions).

kicking a dead horse

if the ide were what did the copy-paste and future maintenance of that duplicated code, rather than using generics/templates/polymorphism that the compiler executes & studies, would that be a weird but somehow not entirely useless one weird trick? doesn't really help in the case of trying to make libraries since they more often want to remain generic (e.g. qsort). [edit: unless the libraries are put out in a way that the ide can do the copy-paste-customize from them, hrm...]

zombie horse kicking people

Let's say you had the IDE do the duplication, you would still have all that duplicated code to analyze at roughly the same (or even greater) complexity...you wouldn't win by making CFA easy.

The best you could hope for is a constructed result on a property enforced by a very powerful type system (or proof system). But that property is probably global and unrelated to the code you are duplicating, and would have to be specified explicitly somewhere!

Going back to my typeless work, perhaps there is a way for the IDE to simplify the signature of the analysis to produce directed acyclic graphs of semi-unification relationships, where nodes are hidden as the graphs propagate out of each abstraction level. OK, so that is pointer analysis, not CFA, but the problem is really similar.