Cloning-Based Context-Sensitive Pointer Alias Analysis Using Binary Decision Diagrams

Cloning-Based Context-Sensitive Pointer Alias Analysis Using Binary Decision Diagrams, John Whaley and Monica S. Lam. PLDI 2004.

This paper presents the first scalable context-sensitive, inclusion-based pointer alias analysis for Java programs. Our approach to context sensitivity is to create a clone of a method for every context of interest, and run a context-insensitive algorithm over the ex-panded call graph to get context-sensitive results. For precision, we generate a clone for every acyclic path through a program's call graph, treating methods in a strongly connected component as a single node. Normally, this formulation is hopelessly intractable as a call graph often has 10^14 acyclic paths or more. We show that these exponential relations can be computed efficiently using binary decision diagrams (BDDs). Key to the scalability of the technique is a context numbering scheme that exposes the commonalities across contexts. We applied our algorithm to the most popular applications available on Sourceforge, and found that the largest programs, with hundreds of thousands of Java bytecodes, can be analyzed in under 20 minutes.

This paper shows that pointer analysis, and many other queries and algorithms, can be described succinctly and declaratively using Datalog, a logic programming language. We have developed a system called bddbddb that automatically translates Datalog programs into highly efficient BDD implementations. We used this approach to develop a variety of context-sensitive algorithms including side effect analysis, type analysis, and escape analysis.

Binary decision diagrams are one of the coolest data structures of the last 20 years, and are also one of the basic enabling data structures underlying modern SAT solvers. Using them to implement Datalog is a really clever idea.

EDIT: I relied on memory instead of Google, and got it wrong about BDDs and SAT solving. Modern DPLL-based SAT solvers generally do not use BDDs, because when your solutions are far apart in the solution space the representing BDD would get quite large. BDDs are widely used in verification, though, and are also still one my favorite data structures. :)

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

neat!

I just happened to write a simple implementation of Datalog the other day to implement 0CFA. It would be fun to improve the implementation using this paper's technique.

I did a little class project

I did a little class project using Datalog to implement a few compiler analysis; the report is here.
Partial-redundacy elimination fits on a page. :)

Teaser: can you write rules like

f x :- (forall y. g x y -> f y)

in Datalog?

Answer: Surprisingly (at least to me when I realized it), yes you sometimes can!

Thanks.

Thanks for the reference.

For those who are interested in playing around with this, check out bddbddb, an open-source BDD-based Datalog solver.

BTW, the 1014 paths is actually 1014. Makes the problem a whole lot more interesting ;-).

Paddle

For those who are interested: Paddle is another system that supports points-to analysis via BDDs.