Software Engineering

Detecting Data Race and Atomicity Violation via Typestate-Guided Static Analysis

Detecting Data Race and Atomicity Violation via Typestate-Guided Static Analysis. Yue Yang, Anna Gringauze, Dinghao Wu, Henning Rohde. Aug. 2008

The correctness of typestate properties in a multithreaded program often depends on the assumption of certain concurrency invariants. However, standard typestate analysis and concurrency analysis are disjoint in that the former is unable to understand threading effects and the latter does not take typestate properties into consideration. We combine these two previously separate approaches and develop a novel typestate-driven concurrency analysis for detecting race conditions and atomicity violations. Our analysis is based on a reformulation of typestate systems in which state transitions of a shared variable are controlled by the locking state of that variable. By combining typestate checking with lockset analysis, we can selectively transfer the typestate to a transient state to simulate the thread interference effect, thus uncovering a new class of typestate errors directly related to low-level or high-level data races. Such a concurrency bug is more likely to be harmful, compared with those found by existing concurrency checkers, because there exists a concrete evidence that it may eventually lead to a typestate error as well. We have implemented a race and atomicity checker for C/C++ programs by extending a NULL pointer dereference analysis. To support large legacy code, our approach does not require a priori annotations; instead, it automatically infers the lock/data guardianship relation and variable correlations. We have applied the toolset to check a future version of the Windows operating system, finding many concurrency errors that cannot be discovered by previous tools.

Typestates extend the ordinary types by recoding the state of objects and allowing the safety violations that stem from operations being invoked on objects that are in the wrong state.

Swift: making web applications secure by construction

Swift is a language-based approach to building web applications that are secure by construction. Swift applications are written in the Jif language, a Java-based language that incorporates "security-typing" to manage the flow of information within an application. The Swift compiler automatically partitions the application code into a client-side JavaScript application and a server-side Java application, with code placement constrained by declarative information flow policies that strongly enforce the confidentiality and integrity of server-side information.

Swift was recently featured in the "Research Highlights" section of the Communications of the ACM, as a condensed version of an earlier conference paper. The original conference paper is Stephen Chong, Jed Liu, Andrew C. Myers, Xin Qi, K. Vikram, Lantian Zheng, and Xin Zheng, Secure web applications via automatic partitioning, Proceedings of the 21st ACM Symposium on Operating Systems Principles (SOSP'07), pages 31–44, October 2007.

Jif has been mentioned previously on LtU here and here.

Ziggurat

Ziggurat is a framework for writing extensible programming languages, and for defining static semantics for those languages. In other words, it is a language designer's toolkit.

Ziggurat is based on macros. When building a language using Ziggurat, it is easy to make that language extensible by adding a macro system. Ziggurat macros allow for incremental extension of the language by rewriting. What makes Ziggurat different from other macro systems is that Ziggurat allows the language extender to optionally define static semantics for her new language, and connect these static semantics amongst language levels. This makes it possible to write specialized analysis algorithms for the higher-level languages, either for optimization purposes, profiling purposes, debugging purposes, or whatever task analysis is put to.

Strangely enough this project from Olin Shivers and David Fisher was not mentioned here before.

Those with access may want to check out the paper on Ziggurat in the September 2008 double issue of JFP.

Path Feasibility Analysis for String-Manipulating Programs

Path Feasibility Analysis for String-Manipulating Programs. Nikolaj Bjorner, Nikolai Tillmann, Andrei Voronkov.

We discuss the problem of path feasibility for programs manipulating strings using a collection of standard string library functions. We prove results on the complexity of this problem, including its undecidability in the general case and decidability of some special cases. In the context of test-case generation, we are interested in an efficient finite model finding method for string constraints. To this end we develop a two-tier finite model finding procedure. First, an integer abstraction of string constraints are passed to an SMT (Satisfiability Modulo Theories) solver. The abstraction is either unsatisfiable, or the solver produces a model that fixes lengths of enough strings to reduce the entire problem to be finite domain. The resulting fixed-length string constraints are then solved in a second phase. We implemented the procedure in a symbolic execution framework, report on the encouraging results and discuss directions for improving the method further.

The authors note that while strings are a fundamental data type, the main existing way of handling strings in symbolic test-case generation tools is to treat them as regular arrays, and to treat string library subroutines as regular procedure calls, thus turning even simple calls to library functions into loop constructs.

The RiSE of Research in Software Engineering

Wolfram Schulte introduces us to the Research in Software Engineering (RiSE) group.

The RiSE group will start publishing weekly videos on Channel9. The videos will not only describe tools or technologies the researchers are working on, but they will also explain how the tools work through 'classroom' videos.

Screencast announcement at Channel9.
Page in Microsoft Research.

MISRA C++:2008

Probably worth noting MISRA-C++. Not much to go on since you have to pay for the document that outlines the standards.

MISRA C++:2008 was finally released on June 5, 2008, following 3 years of work by a group of willing volunteers. They set out to craft a set of rules for the safe use of C++ in critical systems...

It seems not so long ago that the insurrection to fork a safer subset of C++ in Europe was suppressed. Instead of redefining the language, the efforts are now on trying to enforce coding standards and best practices. Try to solve things on the engineering side, rather than the programming language specification side.

Verifiable Functional Purity in Java

Verifiable Functional Purity in Java. Matthew Finifter, Adrian Mettler, Naveen Sastry, and David Wagner. To appear at 15th ACM Conference on Computer and Communication Security (CCS 2008).

Proving that particular methods within a code base are functionally pure - deterministic and side-effect free - would aid verification of security properties including function invertibility, reproducibility of computation, and safety of untrusted code execution. Until now it has not been possible to automatically prove a method is functionally pure within a high-level imperative language in wide use such as Java. We discuss a technique to prove that methods are functionally pure by writing programs in a subset of Java called Joe-E; a static verifier ensures that programs fall within the subset. In Joe-E, pure methods can be trivially recognized from their method signature.

The paper includes a nice discussion of security benefits that can stem from being able to identify pure functions (of course, it is not obvious that guarantees at the programming language level are maintained at the run time level). I am sure many here have opinions about whether it makes more sense to try to graft purity on an imperative language, exposing it as an added feature, or to move programmers to functional languages..

UpgradeJ: Incremental Typechecking for Class Upgrades

UpgradeJ: Incremental Typechecking for Class Upgrades, Gavin Bierman, Matthew Parkinson and James Noble.

One of the problems facing developers is the constant evolution of components that are used to build applications. This evolution is typical of any multi-person or multi-site software project. How can we program in this environment? More precisely, how can language design address such evolution? In this paper we attack two significant issues that arise from constant component evolution: we propose language-level extensions that permit multiple, co-existing versions of classes and the ability to dynamically upgrade from one version of a class to another, whilst still maintaining type safety guarantees and requiring only lightweight extensions to the runtime infrastructure. We show how our extensions, whilst intuitive, provide a great deal of power by giving a number of examples. Given the subtlety of the problem, we formalize a core fragment of our language and prove a number of important safety properties.

There has been an energetic discussion of API evolution in the forum, so when I saw this paper I thought it might be of interest to LtU readers.

In Praise of Scripting: Real Programming Pragmatism

Ronald Loui, In Praise of Scripting: Real Programming Pragmatism, IEEE Computer, vol. 41, no. 7, July 2008. [Openly accessible draft here]

The July IEEE Computer carries an article arguing for the use of scripting languages as first programming languages, and also arguing for a greater study of what the author calls "language pragmatics" (the original article is behind the IEEE paywall, but you can find a draft that has roughly the same content here). The argument for using scripting languages as educational languages can be summed up by Loui's abstract:

The author recommends that scripting, not Java, be taught first, asserting that students should learn to love their own possibilities before they learn to loathe other people's restrictions.
The bulk of the article is devoted to exploring this basic theme in more depth, and provides an interesting contrast to the arguments in favor of moving away from Java (and scripting languages) advanced in Computer Science Education: Where Are the Software Engineers of Tomorrow? (discussed earlier on LtU here).


Loui spends the latter part of the article arguing that, in addition to syntax and semantics, research on programming language should include a formal study of language pragmatics. According to Loui, a formal study of pragmatics would address questions such as:

  • What is the average lifetime of a program written in language X for programmers of type Y, for a program of type Z?
  • What is the average time spent authoring versus debugging a program in language X for programmers of type Y, for a program of type Z?
  • What is the consumption of short-term memory when programming in language X for programmers of type Y, for a program of type Z?

Lightweight Monadic Regions

Oleg Kiselyov and Chung-chieh Shan. Lightweight Monadic Regions. Haskell'08.
We present Haskell libraries that statically ensure the safe use of resources such as file handles. We statically prevent accessing an already closed handle or forgetting to close it. The libraries can be trivially extended to other resources such as database connections and graphic contexts...

Region annotations are part of an expression's inferred type. Our new Haskell encoding of monadic regions as monad transformers needs no witness terms. It assures timely deallocation even when resources have markedly different lifetimes and the identity of the longest-living resource is determined only dynamically.

For contrast, we also implement a Haskell library for manual resource management, where deallocation is explicit and safety is assured by a form of linear types. We implement the linear typing in Haskell with the help of phantom types and a parameterized monad to statically track the type-state of resources.

I am starting to think we need a department for effect systems and related topics (though we managed without a monads department!)...

You'll probably want to read the code, so go ahead. The code makes it plain which features of the type system are needed to achieve the end result.

XML feed