## path-sensitive dataflow analysis. How?

This is a beginner question. In the books I've seen so far (e.g. "Principles of Program Analysis" Nielson&Nielson) simple dataflow analyses are explained. But the more practical/precise applications using DFA are using one which is path-sensitive (SLAM,ESP,etc.).
How do you make a DFA path-sensitive? Do you know any good papers/books which cover this topic?

Thanks.

## Comment viewing options

### Hi Jonathan, Making a DFA

Hi Jonathan,

Making a DFA path-sensitive is pretty easy conceptually, and a bit of a PITA in practice.

So, in a path-insensitive DFA, you build a control-flow graph that tells you which basic block can transfer control to which other ones. Then, in your flow functions for the analysis, you take the meet over all the predecessors of a node in order to compute the value for the current node.

To make the data flow analysis path-sensitive, what you need to do is label each edge in the CFA with a test expression under which it will transfer control. So if you have code like

L1: ...
...
if x = 5 then goto L2 else goto L3


then you will have a control flow graph with at least:

          L1
|
+--------+--------+
|                 |
| x = 5           | x != 5
|                 |
L2                L3


Then, you modify your flow function in the DFA. When computing the predecessors/successors of a node to take a meet over, you use symbolic execution on the edge's test expression to figure out whether the edge is possible or not -- if you can show it's false, then you know that the edge can't really be taken, and you can leave it out of the set of nodes to do the meet over.

### Ah, thanks! So that means

Ah, thanks! So that means for a path-sensitive DFA you will always need some kind of decision procedure to execute and compare these test expressions? Additionally those edge test expressions must be simple or abstract enough to be symbolically executed?

### Yeah, that's right, though

Yeah, that's right, though you're usually doing this analysis over the IR, which *should* have a very simple language of expressions, which you can usually easily symbolically execute. So the decision procedure is really "is this expression equal to false"? This assumes that you just punt and are conservative whenever there is aliasing.

If you must be careful about aliasing, then IIUC life gets tremendously harder, and your decision procedures start wandering off into theorem proving territory (because you have to deduce facts about the state of the heap at each program point).

(Fair warning: I'm a PL guy, not an analysis guy, so this is stuff I know about rather than stuff I do for real.)

### that helped me, thanks!

that helped me, thanks!

### Propagation of information and Theorem Proving

If we are in a functional programming language without side effects we can propagate positive information (as in the L2 branch) using unification. Unification will take care of aliasing issues as well.

In order to be able to propagate the negative information into the L3 branch we need to be able to work with disequations as well as equations, which requires us to extend beyond simple unification.

In the case that the condition is not a simple equality test but some predicate of type (... -> boole) then we start getting into theorem proving territory. Now (in the case that we can't restrict directly at compile time the values of x based on the predicate) we will have to propagate the predicate as a constraint on the variable x in L2, and the negation of the predicate as a constraint in the branch L3.