
DySy: Dynamic Symbolic Execution for Invariant Inference

DySy: Dynamic Symbolic Execution for Invariant Inference. Christoph Csallner, Nikolai Tillmann, Yannis Smaragdakis

Dynamically discovering likely program invariants from concrete test executions has emerged as a highly promising software engineering technique. Dynamic invariant inference has the advantage of succinctly summarizing both “expected” program inputs and the subset of program behaviors that is normal under those inputs. In this paper, we introduce a technique that can drastically increase the relevance of inferred invariants, or reduce the size of the test suite required to obtain good invariants. Instead of falsifying invariants produced by pre-set patterns, we determine likely program invariants by combining the concrete execution of actual test cases with a simultaneous symbolic execution of the same tests. The symbolic execution produces abstract conditions over program variables that the concrete tests satisfy during their execution. In this way, we obtain the benefits of dynamic inference tools like Daikon: the inferred invariants correspond to the observed program behaviors. At the same time, however, our inferred invariants are much more suited to the program at hand than Daikon’s hardcoded invariant patterns. The symbolic invariants are literally derived from the program text itself, with appropriate value substitutions as dictated by symbolic execution. We implemented our technique in the DySy tool, which utilizes a powerful symbolic execution and simplification engine. The results confirm the benefits of our approach. In Daikon’s prime example benchmark, we infer the majority of the interesting Daikon invariants, while eliminating invariants that a human user is likely to consider irrelevant.

The Daikon invariant detector mentioned in the abstract is here.

Seems like a pretty straightforward idea, I guess, but as always: God is in the details.
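To make the idea a bit more concrete, here is a minimal Haskell sketch of the dual concrete/symbolic execution. The names (Sym, absDual, inferInvariants) are invented for illustration; DySy itself, per the abstract, uses a powerful symbolic execution and simplification engine, so this shows only the principle, not the tool.

-- A toy symbolic expression language over one integer variable.
data Sym = Var String | Lit Int | Neg Sym | Ge Sym Sym | Lt Sym Sym
  deriving (Eq, Show)

-- A concrete value paired with its symbolic counterpart.
data Conc = Conc Int Sym

-- An instrumented absolute-value function: each branch taken on the
-- concrete value also records the symbolic condition that guarded it.
-- Result: (concrete result, path condition, symbolic result).
absDual :: Conc -> (Int, Sym, Sym)
absDual (Conc v s)
  | v >= 0    = (v,        Ge s (Lit 0), s)
  | otherwise = (negate v, Lt s (Lit 0), Neg s)

-- Candidate invariant: one (path condition, symbolic result) pair per
-- distinct path exercised by the concrete test suite.
inferInvariants :: [Int] -> [(Sym, Sym)]
inferInvariants tests = dedup
  [ (path, res) | x <- tests
                , let (_, path, res) = absDual (Conc x (Var "x")) ]
  where dedup = foldr (\p acc -> if p `elem` acc then acc else p : acc) []

main :: IO ()
main = mapM_ print (inferInvariants [5, -3, 0, 7])
-- Prints the two observed path/result pairs, i.e. the candidate invariant
--   x >= 0  ==>  result = x
--   x <  0  ==>  result = -x
-- read off the program text itself rather than matched against fixed patterns.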

Parametric datatype-genericity

Parametric datatype-genericity. Jeremy Gibbons and Ross Paterson. Submitted for publication.

Datatype-generic programs are programs that are parametrized by a datatype or type functor. There are two main styles of datatype-generic programming: the Algebra of Programming approach, characterized by structured recursion operators parametrized by a shape functor, and the Generic Haskell approach, characterized by case analysis over the structure of a datatype. We show that the former enjoys a kind of parametricity, relating the behaviours of generic functions at different types; in contrast, the latter is more ad hoc, with no coherence required or provided between the various clauses of a definition.
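For readers who haven't met the first style before, here is a standard Haskell sketch (not taken from the paper) of a structured recursion operator parametrized by a shape functor:

{-# LANGUAGE DeriveFunctor #-}

-- Tie the recursive knot over an arbitrary shape functor f.
newtype Fix f = In (f (Fix f))

-- The datatype-generic fold (catamorphism): one definition serves every
-- datatype expressible as Fix f.
cata :: Functor f => (f a -> a) -> Fix f -> a
cata alg (In t) = alg (fmap (cata alg) t)

-- Shape functor for lists of Ints; Fix ListF is (isomorphic to) [Int].
data ListF r = NilF | ConsF Int r deriving Functor

sumList :: Fix ListF -> Int
sumList = cata alg
  where alg NilF        = 0
        alg (ConsF n r) = n + r

-- sumList (In (ConsF 1 (In (ConsF 2 (In NilF)))))  ==  3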

How could we not have mentioned this before?

The main result of this paper is that fold is a higher-order natural transformation. This means, the authors explain, that fold is a rather special kind of datatype-generic operator, both enjoying and requiring coherence between its datatype-specific instances.

We had several long discussions about the uniqueness of fold, which may serve as an introduction for those new to this sort of discussion. The tutorial on the universality of fold is, of course, on the papers page.
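For reference, the universal property in question fits in a few lines of Haskell (standard textbook material; the names sum' and sumAsFold are just for illustration):

-- h = foldr f e   if and only if   h []     = e
--                                  h (x:xs) = f x (h xs)
-- The directly recursive sum satisfies these equations with f = (+), e = 0,
-- so uniqueness forces it to coincide with foldr (+) 0.

sum' :: [Int] -> Int
sum' []     = 0
sum' (x:xs) = x + sum' xs

sumAsFold :: [Int] -> Int
sumAsFold = foldr (+) 0

-- Sanity check on one input (a real test would quantify over all xs):
check :: Bool
check = sum' [1..10] == sumAsFold [1..10]   -- True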

For some reason I feel a craving for bananas...

Subtext 2 Video

For those who haven't seen it, Jonathan Edwards has released a video demonstrating a prototype of his (live, dataflow-based) Subtext 2 programming environment. It follows up on his OOPSLA07 paper, which was discussed on LTU recently.

Concurrency: The Compiler Writer's Perspective

A short interview with Brian Grant (of PeakStream, now at Google). From SD Times, Nov. 15, 2007.

SD Times: Is concurrency too difficult for developers accustomed to linear programming to grasp?

Brian Grant: It is challenging, but I don’t think it’s too challenging. I do think certain languages, especially C and C++, make it very challenging to write robust big multithreaded systems. Basically, they have features that are hostile to concurrency.

Does your experience in compilers give you a different perspective than the average developer?

[...]
Higher-level languages favor ease of correctness over ease of performance. For example, in functional languages such as Haskell, the code you write does not have any side effects. The model is that they don’t modify existing data; they generate new data. The advantage is that it’s easier for the compiler to reason about what different components of the code can do.
[...]
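To illustrate that last point in a couple of lines (my own toy example, not from the interview): in Haskell an "update" builds new data and leaves the original untouched, so separate pieces of code cannot interfere through shared mutable state.

import qualified Data.Map as Map

original :: Map.Map String Int
original = Map.fromList [("x", 1), ("y", 2)]

-- "Updating" produces a brand-new map; `original` is never modified.
updated :: Map.Map String Int
updated = Map.insert "x" 42 original

main :: IO ()
main = do
  print (Map.lookup "x" original)  -- Just 1   (old data, untouched)
  print (Map.lookup "x" updated)   -- Just 42  (new data)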