Lambda the Ultimate

SLAM (at POPL)
started 1/24/2002; 2:31:19 AM - last post 1/28/2002; 12:54:32 PM

Ehud Lamm - SLAM (at POPL)

1/24/2002; 2:31:19 AM (reads: 2474, responses: 12)

SLAM (at POPL)
Oleg returned from POPL/PADL and suggested posting his comments from various talks. I was delighted, of course. Posted to Software-Eng by Ehud Lamm on 1/24/02; 2:36:11 AM

Ehud Lamm - Re: SLAM (at POPL)

1/24/2002; 2:33:55 AM (reads: 1669, responses: 0)

On the first day of POPL, Thomas Ball from Microsoft Research presented an invited talk on SLAM: "The SLAM Project: Debugging system software via static analysis", on behalf of himself and Sriram K. Rajamani. SLAM has already been covered on LtU. Hopefully, my summary provides a bit more detail.

Most of the time, programmers write code within a particular framework -- (X) Windows API, Qt, device driver API, etc. API is a set of functions, which must be invoked in a correct sequence. The most primitive example (discussed in a talk) is a locking API: given a lock handle, you can invoke a lock() function -- but should do it only once. After that, you must eventually invoke an unlock() function -- again, exactly once. Verifying this pattern of usage seems trivial, until we note that calls to a lock() function may occur within a conditional statement. The call to an unlock() function must also occur within a conditional statement -- such that the condition of the first statement must imply the condition of the second. Checking this implication given the imperative source language is highly non-trivial. Naive model checking (which assumes that the conditional branch can be either taken or not) will report a violation of the locking API -- which often is a false positive. To be sure that we notice a real bug, we need to know which of the possible program paths are feasible.

SLAM tool takes C code -- as is, without any annotations. SLAM also takes a specification of the API -- as a finite state machine. SLAM iteratively infers program invariants and tries to improve the analysis to eliminate false positives.

SLAM creates an abstraction of the code: at first the abstraction is cruel: all variables can take any possible value and all conditional branches can be either taken or bypassed.
SLAM runs a model checker on the abstract program to see if calls to API functions fit the API specification (the FSM). If the model checker finds no counter-example, the program is correct (wrt the API usage).
If the model checker finds a counter-example (which shows a possible program trace), a special component (called 'Newton') checks the conditionals of the branches taken in the counter-example trace, notes the variables used in these conditions, and symbolically evaluates the program with respect to the noted variables. This evaluation yields new predicates -- constraints on the values variables can really take and constraints on the branches taken. Newton adds these new predicates to the abstraction of the program and repeats the model checking.

Thus SLAM is a combination of three tools -- create abstraction, model check, symbolic evaluation (abstract interpretation). The tools are indeed independently-compiled modules run through a command line. Many of the tools are off-the-shelf components: one component was written in OCaml! OCaml was also used as a glue language.

Thomas Ball claimed that the approach scales to 100k lines of code. In their experiments, they have analyzed several real (Winxx) device drivers and checked their use of DDR API. They found several real bugs in a floppy device driver.

Oleg - Mining specifications

1/24/2002; 8:00:27 PM (reads: 1795, responses: 4)

Mining Specifications
Glenn Ammons, Rastislav Bodik (U. Wisconsin @ Madison); James R. Larus (Microsoft Research). POPL, Jan 16, 2002.

To verify program's compliance to an API specification, we need the specification. The Windows Device Driver API used in the SLAM project was derived by hand, with expertise of people who developed the API. It is often the case however that (plain-text) API manuals are not precise enough for formal verification, and the original developers are not available for comments.

The talk proposes another, AI inspired, model: learn the behavior from observing a training set. In our case, the training set is running a set of test cases. Observations are dynamic traces of the program executions. The authors have built a tool that, given such traces, outputs a finite state machine that specifies the correct protocol of API usage. The main assumption is that common behavior is the correct behavior.

In more detail, the tool ingests the traces, notes calls to API functions, and builds a probabilistic finite state machine. Removing infrequent transitions from the latter yields the desired API specification.

The probabilistic FSA learner was "off the shelf". The contributions of the paper are

Figuring out which calls to API functions are related (by examining data dependencies).
E.g., open() returning 7, write(7,...) and close(7) are related, but write(1,...) is not related to the above
Simplifying the flow control to "scenarios". This involves some manual tuning and seeding, but drastically reduces the size of training data and makes learning faster
Deciding which scenarios are equivalent:
open()->7...close(7) and open()->8...close(8) are equivalent.

The authors inferred the basic X windows API (libX and libXi) by analyzing traces of 17 programs (five from the X Windows distribution). They used human-assisted learning: run a program, collect a trace, verify against the previously inferred API. If the verification fails, get the expert to inspect the failure. If the expert deems the program correct, extend the API specification. The authors found a real bug in two programs and benign violations of the X window protocol rules in three other programs (xconsole, xclipboard and clipboard).

Ehud Lamm - Re: Mining specifications

1/25/2002; 6:02:53 AM (reads: 1793, responses: 0)

Any more info on the FSA learner?

Frank Atanassow - Re: SLAM (at POPL)

1/25/2002; 11:58:27 AM (reads: 1443, responses: 1)

Re: Mining Specifications: Sounds like "correctness by consensus"?

I wonder what happens when there is no most-general-model of two traces, i.e., how inconsistency is treated. This is probabilistic, so I guess the users get to decide "what the specification ought to be". Very strange, thought-provoking, politically correct...! :)

Ehud Lamm - Re: SLAM (at POPL)

1/25/2002; 12:14:12 PM (reads: 1500, responses: 0)

Right on. I wanted to comment on the fact that as far as I recall specification ought to come before implemenetation

Oleg - Re: SLAM (at POPL)

1/25/2002; 8:16:51 PM (reads: 1404, responses: 1)

Any more info on the FSA learner?

As the authors indicate, they used an off-the-shelf PFSA learner

Anand V. Raman and Jon D. Patrick The sk-strings method for inferring PFSA. http://citeseer.nj.nec.com/103674.html

"which is the variation of the classical k-tails algorithm"

@article{ biermann72synthesis,
    author = "A. W. Biermann and J. A. Feldman",
    title = "On the synthesis of finite-state machines from samples of their behaviour",
    journal = "IEEE Transactions on Computers",
    volume = "C",
    number = "21",
    pages = "592--597",
    year = "1972"
}

Oleg - Re: SLAM (at POPL)

1/25/2002; 8:17:24 PM (reads: 1389, responses: 0)

The paper tries to solve the following problem:

an API manipulates at most k objects
For example, the file API manipulates only one object: the file handle
suppose the correct protocol of API usage can be represented by M, a probabilistic finite state automaton (PFSA). PFSA is a non-deterministic FSA with output alphabet being the set of API function names and with a probabilistic state transition function. Note that the correct API protocol is still probabilistic: some usage patterns are common and some are less common (either because they are rarely used or because we can know the correct API only approximately)
suppose we have a sample of execution traces of mostly correct programs.

The problem is to efficiently derive M', which is an approximation of M, given a confidence parameter, an approximation parameter, and the upper bound on the number of states in M.

Again, the fundamental assumption is that bugs are relatively rare -- at least for the programs and the traces used in a training set.

When it came to experiments, the authors did not have any reliable training set. Therefore, they tried to construct one incrementally. They started with traces of one program believed correct, extracted scenarios and inferred the automaton. They picked another program, got the set of traces, and extracted scenarios. The author then checked if all such scenarios (up to the specified max size) can be described by the previously learned automaton. Scenarios rejected by the automaton are manually inspected. If an expert believes a scenario is correct, it is added to the learning set, and a new automaton is inferred.

Ehud Lamm - Re: SLAM (at POPL)

1/26/2002; 7:01:15 AM (reads: 1455, responses: 0)

Thanks. Looks interesting.

Ehud Lamm - Re: Mining specifications

1/26/2002; 1:07:53 PM (reads: 1663, responses: 2)

Let's connect this to programming languages.

The issue here is that most languages are not expressive enough, and don't allow libraries to specify correct order for subroutine calls.

Notice that even if a tool such as the one described deduces a correct specificatoin, this specification can not be enforced by the programming language (e.g. C). Contrast this, for example, with deducing assertions, which can then be put inside the code - so that they will be enforced in the future.

Holland and Helm wrote some papers about reifying Contracts, which among other things enable the specification of allowed interactions between objects and methods.

Frank Atanassow - Re: Mining specifications

1/26/2002; 1:21:05 PM (reads: 1720, responses: 1)

C has assertions. All you need for assertions is booleans and a mechanism for crashing your program...

Ehud Lamm - Re: Mining specifications

1/26/2002; 2:05:04 PM (reads: 1786, responses: 0)

I guess I didn't make myself clear (though I did use the word contrast)...

Assertions are easy. Specifying API usage is much more problematic. (Examples: if open(file-id) is called, then eventually (before the program terminates) close(fid) must be called; read(file-id) is legal if open() was successful, and close() haven't been called yet.)

Noel Welsh - Re: SLAM (at POPL)

1/28/2002; 12:54:32 PM (reads: 1323, responses: 0)

This is interesting work! I wonder if having extracted the scenarios, one could run the FSA backwards to generate code. This could have interesting applications in smart IDEs. Imagine a code wizard looking over your shoulder and suggesting common usage patterns. Should be good for rookies; would probably drive the grey-beards insane! :)