Path Feasibility Analysis for String-Manipulating Programs

Path Feasibility Analysis for String-Manipulating Programs. Nikolaj Bjorner, Nikolai Tillmann, Andrei Voronkov.

We discuss the problem of path feasibility for programs manipulating strings using a collection of standard string library functions. We prove results on the complexity of this problem, including its undecidability in the general case and decidability of some special cases. In the context of test-case generation, we are interested in an efficient finite model finding method for string constraints. To this end we develop a two-tier finite model finding procedure. First, an integer abstraction of string constraints are passed to an SMT (Satisfiability Modulo Theories) solver. The abstraction is either unsatisfiable, or the solver produces a model that fixes lengths of enough strings to reduce the entire problem to be finite domain. The resulting fixed-length string constraints are then solved in a second phase. We implemented the procedure in a symbolic execution framework, report on the encouraging results and discuss directions for improving the method further.

The authors note that while strings are a fundamental data type, the main existing way of handling strings in symbolic test-case generation tools is to treat them as regular arrays, and to treat string library subroutines as regular procedure calls, thus turning even simple calls to library functions into loop constructs.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Envelopes, Bananas, Barbed Wire for Strings?

Essentially, what they are trying to do is find a “calculus” of sorts for strings, the way the Bananas, Envelopes, Barbed Wire, and Lenses paper did for cons-lists?
And how about those of us who use cons-lists for strings; don't we already have what they are talking about?

You mean this paper?

This paper was of course

This paper was of course discuss numerous times on LtU.

Of further interest:

Of further interest: http://www.cs.virginia.edu/~ph4u/talks/muri.pdf . I know Pieter had written up a draft about solving string constraints last summer, but can't find it anywhere.

Such work is interesting for empirical reasons. When trying to model check, say, the linux kernel, what abstractions (and refinements) you use matter. Just say 'cons' will give you a blow up -- it's just one step above modeling the heap as a giant bit array. Furthermore, consider a lot of the code being written in safer languages: as popular safety concerns include races, xss, and sql-injection, it'd be nice to have a calculus for concurrency and strings, as they cover a lot of what is written. My bet is that something similar will be done for other popular ADTs like the web DOM.

I'm happy to report that the

I'm happy to report that the draft, "A Decision Procedure for Subset Constraints over Regular Languages," has since made its way to PLDI 09. The preprint is available upon request -- please find my contact info at http://www.cs.virginia.edu/~ph4u/ .

String rewriting systems

The existing body of literature surrounding rewriting systems in general, and string rewriting systems in particular, already is a "calculus" of modifying strings. (Toyama's theorem, for example, provides ways to compose rewriting systems as we would have with functions.) This work with path feasibility seems to reuse existing halting results of string rewriting systems -- specifically regarding length -- to enrich analysis of string manipulation by .net languages.