Path Feasibility Analysis for String-Manipulating Programs

Path Feasibility Analysis for String-Manipulating Programs. Nikolaj Bjorner, Nikolai Tillmann, Andrei Voronkov.

We discuss the problem of path feasibility for programs manipulating strings using a collection of standard string library functions. We prove results on the complexity of this problem, including its undecidability in the general case and decidability of some special cases. In the context of test-case generation, we are interested in an efficient finite model finding method for string constraints. To this end we develop a two-tier finite model finding procedure. First, an integer abstraction of string constraints are passed to an SMT (Satisfiability Modulo Theories) solver. The abstraction is either unsatisfiable, or the solver produces a model that fixes lengths of enough strings to reduce the entire problem to be finite domain. The resulting fixed-length string constraints are then solved in a second phase. We implemented the procedure in a symbolic execution framework, report on the encouraging results and discuss directions for improving the method further.

The authors note that while strings are a fundamental data type, the main existing way of handling strings in symbolic test-case generation tools is to treat them as regular arrays, and to treat string library subroutines as regular procedure calls, thus turning even simple calls to library functions into loop constructs.

By Ehud Lamm at 2009-01-17 23:28 | Software Engineering | other blogs | 8769 reads

Envelopes, Bananas, Barbed Wire for Strings?

Essentially, what they are trying to do is find a “calculus” of sorts for strings, the way the Bananas, Envelopes, Barbed Wire, and Lenses paper did for cons-lists?
And how about those of us who use cons-lists for strings; don't we already have what they are talking about?

By revence27 at Sun, 2009-01-18 14:56 | login or register to post comments

You mean this paper?

here?

By Scott Johnson at Sun, 2009-01-18 20:04 | login or register to post comments

This paper was of course

This paper was of course discuss numerous times on LtU.

By Ehud Lamm at Sun, 2009-01-18 20:32 | login or register to post comments

Of further interest:

Of further interest: http://www.cs.virginia.edu/~ph4u/talks/muri.pdf . I know Pieter had written up a draft about solving string constraints last summer, but can't find it anywhere.

Such work is interesting for empirical reasons. When trying to model check, say, the linux kernel, what abstractions (and refinements) you use matter. Just say 'cons' will give you a blow up -- it's just one step above modeling the heap as a giant bit array. Furthermore, consider a lot of the code being written in safer languages: as popular safety concerns include races, xss, and sql-injection, it'd be nice to have a calculus for concurrency and strings, as they cover a lot of what is written. My bet is that something similar will be done for other popular ADTs like the web DOM.

By lmeyerov at Sun, 2009-01-18 22:49 | login or register to post comments

I'm happy to report that the

I'm happy to report that the draft, "A Decision Procedure for Subset Constraints over Regular Languages," has since made its way to PLDI 09. The preprint is available upon request -- please find my contact info at http://www.cs.virginia.edu/~ph4u/ .

By phooji at Mon, 2009-02-09 02:38 | login or register to post comments

String rewriting systems

The existing body of literature surrounding rewriting systems in general, and string rewriting systems in particular, already is a "calculus" of modifying strings. (Toyama's theorem, for example, provides ways to compose rewriting systems as we would have with functions.) This work with path feasibility seems to reuse existing halting results of string rewriting systems -- specifically regarding length -- to enrich analysis of string manipulation by .net languages.

By aed at Sun, 2009-01-18 22:50 | login or register to post comments

Lambda the Ultimate

User login

Navigation