Software Engineering

Reflective Program Generation with Patterns

Reflective Program Generation with Patterns. Manuel Fähndrich, Michael Carbin, James R. Larus. October 2006.

Runtime reflection facilities, as present in Java and .NET, are powerful mechanisms for inspecting existing code and metadata, as well as generating new code and metadata on the fly. Such power comes at a high price, though. The runtime reflection support in Java and .NET imposes a cost on all programs, whether they use reflection or not, simply by the necessity of keeping all metadata around and the inability to optimize code because of possible future code changes. A second, often overlooked, cost is the difficulty of writing correct reflection code to inspect or emit new metadata and code, and the risk that the emitted code is not well-formed. In this paper we examine a subclass of problems that can be addressed using a simpler mechanism than runtime reflection, which we call compile-time reflection. We argue for a high-level construct called a transform that allows programmers to write inspection and generation code in a pattern-matching and template style. Transforms avoid the complexities of reflection APIs while providing the benefits of staged compilation, in that the generated code and metadata are known to be well-formed and type safe ahead of time.

Macros, multi-stage programming, etc., are the appropriate buzzwords.

LtU readers will probably be interested in the STM example (see sec. 7.1)
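To make the compile-time flavor concrete, compare Haskell's Template Haskell, which also generates code at compile time in a quotation/template style (though without the paper's guarantee that every generated program is well-formed ahead of time). A minimal sketch of my own, not an example from the paper:

```haskell
{-# LANGUAGE TemplateHaskell #-}
-- Compile-time code generation: build the expression
--   \x -> x * x * ... * x * 1    (n factors of x)
-- The quotations [| ... |] are templates; $( ... ) splices generated code.
module Gen where

import Language.Haskell.TH

power :: Int -> Q Exp
power 0 = [| \_ -> 1 :: Int |]
power n = [| \x -> x * $(power (n - 1)) x |]

-- In another module (TH's stage restriction requires this):
--   pow5 = $(power 5)
--   pow5 3  ==  243
```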

Lightweight Static Capabilities (II)

The slides for the talk discussed here are now available for download. Like all of Ken and Oleg's work, this stuff is both cool and important.

Keep in mind that the talk is quite different from the paper. The safety claims were formalized and proved in Twelf: list example, array example. To follow the proofs, you should read them alongside the presentation slides. I am told that the first file might change soon, to reflect a more general proof. Perhaps Ken or Oleg would like to comment on the experience of doing mechanized proofs, a subject that comes up regularly on LtU.

LtU newcomers who have already managed to take in a little Haskell or ML may want to spend a little time chewing on this, and to ask questions if something remains unclear, since this work may eventually have practical importance and can teach quite a few interesting techniques.
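For those who want something to chew on right away, here is a tiny Haskell sketch of the core trick as I understand it (my simplification, not Ken and Oleg's actual code): a trusted kernel brands an array with a phantom type, checks an index once, and thereafter serves accesses without rechecking, while a rank-2 type keeps branded indices from escaping to the wrong array.

```haskell
{-# LANGUAGE RankNTypes #-}
import qualified Data.Array as A

-- The phantom brand s ties indices to the one array they were checked against.
newtype BArray s a = BArray (A.Array Int a)
newtype BIndex s   = BIndex Int

-- The rank-2 type prevents the brand (and its indices) from escaping.
withBrand :: A.Array Int a -> (forall s. BArray s a -> r) -> r
withBrand arr k = k (BArray arr)

-- Bounds are checked once, here; success certifies the index.
checkIndex :: BArray s a -> Int -> Maybe (BIndex s)
checkIndex (BArray arr) i
  | A.inRange (A.bounds arr) i = Just (BIndex i)
  | otherwise                  = Nothing

-- Certified access: a real kernel would use an unchecked read here.
at :: BArray s a -> BIndex s -> a
at (BArray arr) (BIndex i) = arr A.! i

example :: Maybe Int
example = withBrand (A.listArray (0, 2) [10, 20, 30])
                    (\arr -> fmap (at arr) (checkIndex arr 1))  -- Just 20
```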

The Daikon Invariant Detector

Daikon

Daikon is an implementation of dynamic detection of likely invariants; that is, the Daikon invariant detector reports likely program invariants. An invariant is a property that holds at a certain point or points in a program; these are often seen in assert statements, documentation, and formal specifications. Invariants can be useful in program understanding and a host of other applications. Examples include “x.field > abs(y)”; “y = 2*x+3”; “array a is sorted”; “for all list objects lst, lst.next.prev = lst”; “for all treenode objects n, n.left.value < n.right.value”; “p != null => p.content in myArray”; and many more. You can extend Daikon to add new properties (see Enhancing Daikon output).

Dynamic invariant detection runs a program, observes the values that the program computes, and then reports properties that were true over the observed executions.

Daikon can detect properties in C, C++, Java, Perl, and IOA programs; in spreadsheet files; and in other data sources. (Dynamic invariant detection is a machine learning technique that can be applied to arbitrary data.) It is easy to extend Daikon to other applications; as one example, an interface exists to the Java PathFinder model checker.
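In miniature, the idea can be captured in a few lines of Haskell (a toy of mine, nothing like Daikon's actual engine or its grammar of invariants): fix a set of candidate properties, run the program, and report the candidates that survived every observation.

```haskell
-- Candidate invariants: named predicates over the values observed at a
-- program point, here a pair (x, y).
type Observation = (Int, Int)

candidates :: [(String, Observation -> Bool)]
candidates =
  [ ("x <= y",       \(x, y) -> x <= y)
  , ("y == 2*x + 3", \(x, y) -> y == 2 * x + 3)
  , ("x /= 0",       \(x, _) -> x /= 0)
  ]

-- Report every candidate that held over all observed executions.
likelyInvariants :: [Observation] -> [String]
likelyInvariants obs = [name | (name, p) <- candidates, all p obs]

main :: IO ()
main = print (likelyInvariants [(0, 3), (1, 5), (2, 7)])
-- ["x <= y","y == 2*x + 3"]   ("x /= 0" is falsified by the first run)
```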

I spend a lot of time here talking about static typing, but let's face it: most often we're dealing with existing code that probably isn't written in a language with a very expressive type system, and that has rarely been formally specified, whether through an expressive type system or otherwise. Daikon is interesting because it attempts to learn important properties of a piece of software by execution and observation. Combine it with a model checker like Java PathFinder, and you have an unusually powerful means of evolving the correctness of even quite complex code. There's also a relationship to the already-mentioned JML and ESC/Java 2, which in turn has a connection to the popular JUnit unit-testing framework. In short, the gap between compile time and runtime, and between static and dynamic typing, seems to be narrowed in powerful ways by these and related tools.

Verified Software: Theories, Tools, and Experiments

Verified Software: Theories, Tools, and Experiments (VSTTE 2006), workshop proceedings. K. Rustan M. Leino and Wolfram Schulte (editors). August 2006.

The papers are:

  • Formalising a High-Performance Microkernel written in Haskell, using Isabelle/HOL.
  • Automated Verification of UPC Memory Consistency. UPC is a shared memory extension of C.
  • Automated Model-based Verification of Object-Oriented Code, describing ESpec which is a suite of tools that facilitate the testing and verification of Eiffel programs.
  • Cross-Verification of JML Tools: An ESC/Java2 Case Study, which concludes that the use of JML RAC uncovered design flaws that ESC/Java2 was unable to report.
  • Static Stability Analysis of Embedded, Autocoded Software.

Revisiting Google's MapReduce

Google's MapReduce Programming Model -- Revisited. Ralf Lämmel.

MapReduce is a programming model (and an associated implementation) used at Google for processing large amounts of input data (such as millions of documents) and generating keyed or indexed query results (such as the indexes used for Google's web search). The programming model is stunningly simple, and it allows the programmer to largely abstract from the parallel and distributed execution of the data-processing computations. Parallel execution is promoted by the assumed skeleton of MapReduce computations. Load balancing, network performance and fault tolerance are taken care of by the MapReduce implementation.

We revisit the MapReduce programming model in an attempt to provide a rigorous description of the model. We focus on the key abstraction for MapReduce computations; this abstraction is parameterized by the problem-specific ingredients for data extraction and reduction. We use Haskell as a lightweight specification language to capture the essence of MapReduce computations in a succinct, executable and strongly typed manner. Our study substantiates that clarity, generality and correctness of designs (or the presentations thereof) are easily improved, if modest functional programming skills are put to work.
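To give a taste of what such a Haskell specification looks like, here is a minimal sketch of the MapReduce skeleton (my simplification, not the paper's exact types): the programmer supplies only the two problem-specific ingredients, and the skeleton handles grouping by key.

```haskell
import qualified Data.Map as Map
import Data.Map (Map)

-- The skeleton: a problem-specific mapper turns each input record into
-- intermediate key/value pairs; a reducer collapses all values per key.
mapReduce :: Ord k2
          => (k1 -> v1 -> [(k2, v2)])  -- MAP: extract intermediate pairs
          -> (k2 -> [v2] -> v3)        -- REDUCE: collapse values per key
          -> Map k1 v1                 -- input key/value pairs
          -> Map k2 v3                 -- output key/value pairs
mapReduce mapper reducer =
    Map.mapWithKey reducer            -- 3. reduce each group
  . Map.fromListWith (++)             -- 2. group intermediate values by key
  . map (\(k, v) -> (k, [v]))
  . concatMap (uncurry mapper)        -- 1. map over all input records
  . Map.toList

-- The canonical example: word counting over a set of documents.
wordCount :: Map FilePath String -> Map String Int
wordCount = mapReduce (\_ doc -> [(w, 1 :: Int) | w <- words doc])
                      (\_ ns  -> sum ns)
```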

This is a fun example of type-directed exploration and prototyping, and it pokes some gentle fun at Google's (ab)use of standard terminology. It's neat to get a glimpse into the programming process of a very experienced Haskell programmer.

(From the forums.)

Software Extension and Integration with Type Classes

Software Extension and Integration with Type Classes. Ralf Lämmel and Klaus Ostermann.

The abilities to extend a software module and to integrate a software module into an existing software system without changing existing source code are fundamental challenges in software engineering and programming-language design. We show that these challenges can be tackled, at the level of language expressiveness, by using the language concept of type classes, as it is available in the functional programming language Haskell. A detailed comparison with related work shows that type classes provide a powerful framework in which solutions to known software extension and integration problems can be provided. We also pinpoint several limitations of type classes in this context.

We've had a number of papers lately with solutions to the expression problem and related extensibility challenges, using various techniques in various languages. Here's one exploring the expressiveness of Haskell's type classes.
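The flavor of the approach, for those who haven't seen it (my toy rendition of the well-known idiom, not an example taken from the paper): represent each data variant as its own type and each operation as a type class, so that both new variants and new operations can be added without touching existing code.

```haskell
-- Base variants as separate types; operations as type classes.
data Lit     = Lit Int
data Add a b = Add a b

class Eval e where
  eval :: e -> Int

instance Eval Lit where
  eval (Lit n) = n

instance (Eval a, Eval b) => Eval (Add a b) where
  eval (Add x y) = eval x + eval y

-- Extension 1: a new variant, with no existing code modified.
data Neg a = Neg a
instance Eval a => Eval (Neg a) where
  eval (Neg x) = negate (eval x)

-- Extension 2: a new operation over old and new variants alike.
class Render e where
  render :: e -> String

instance Render Lit where
  render (Lit n) = show n

instance (Render a, Render b) => Render (Add a b) where
  render (Add x y) = "(" ++ render x ++ " + " ++ render y ++ ")"

instance Render a => Render (Neg a) where
  render (Neg x) = "-" ++ render x

-- eval   (Add (Lit 1) (Neg (Lit 2)))  ==  -1
-- render (Add (Lit 1) (Neg (Lit 2)))  ==  "(1 + -2)"
```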

It's extremely instructive to compare different approaches to these now-standard problems, and in fact I wonder whether this would make an interesting approach to a programming languages survey course: "In CS 3xx we explore and compare a number of programming languages by studying idiomatic solutions to standard software engineering challenges."

The authors are looking for comments on this draft for a few more days.

A Comparison of Ada and Real-Time Java for Safety-Critical Applications

The presentation slides for this Ada-Europe paper are online (the paper itself is probably behind a paywall). The authors are Ben Brosgol from AdaCore and Andy Wellings from the University of York (UK), really the guys to read if you are interested in these topics.

Some of the issues alluded to in the slides were discussed here in the past, either in general discussions about Ada or in discussions about the specific issues (RTSJ, asynchronous transfer of control, etc.).

Failure-oblivious computing

There have been a couple of threads recently about safety-critical code (rules for writing it, and its use in real-time Java). Safety-critical code necessarily includes careful handling of failure situations. Or does it? Here's another take on failure handling, from the opposite direction:

Enhancing Server Availability and Security Through Failure-Oblivious Computing (Rinard et al., 2004) was originally presented at OSDI '04, but hasn't previously been featured here:

We present a new technique, failure-oblivious computing, that enables servers to execute through memory errors without memory corruption. Our safe compiler for C inserts checks that dynamically detect invalid memory accesses. Instead of terminating or throwing an exception, the generated code simply discards invalid writes and manufactures values to return for invalid reads, enabling the server to continue its normal execution path.

We have applied failure-oblivious computing to a set of widely-used servers from the Linux-based open-source computing environment. Our results show that our techniques 1) make these servers invulnerable to known security attacks that exploit memory errors, and 2) enable the servers to continue to operate successfully to service legitimate requests and satisfy the needs of their users even after attacks trigger their memory errors.

The paper includes descriptions of how this technique was applied with good results to servers such as Apache and Sendmail, as well as to client programs such as Pine, Mutt, and Midnight Commander.
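The semantics is easy to model. Here is a toy sketch in Haskell (mine, and of course nothing like the paper's C compiler instrumentation): writes outside the valid region are discarded, and reads outside it manufacture a value, so execution continues along its normal path.

```haskell
import qualified Data.Map as Map
import Data.Map (Map)

-- A toy memory: addresses mapped to values, with a valid address range.
type Memory = Map Int Int

-- Failure-oblivious write: an invalid write is silently discarded.
fWrite :: (Int, Int) -> Int -> Int -> Memory -> Memory
fWrite (lo, hi) addr val mem
  | lo <= addr && addr <= hi = Map.insert addr val mem
  | otherwise                = mem      -- discard, keep running

-- Failure-oblivious read: an invalid read manufactures a value
-- instead of terminating or throwing an exception.
fRead :: (Int, Int) -> Int -> Memory -> Int
fRead (lo, hi) addr mem
  | lo <= addr && addr <= hi = Map.findWithDefault 0 addr mem
  | otherwise                = 0        -- manufactured value
```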

The paper also raises concerns about the potential for such techniques to create a bystander effect (although the term moral hazard might be more appropriate here), influencing programmers to be less careful about error handling because they have a safety net.

This work was performed on programs written in C, and there's a temptation to think that the approach is only applicable to memory-unsafe languages. However, there's a connection here to the approach used by some of the classic shell scripting languages and their descendants such as Perl, in which certain kinds of failure are silently tolerated in the interests of keeping the program running. The approach could also have potential applications in other memory-safe languages, providing the potential for higher-availability programs, as noted in the paper's conclusion.

While most language implementors aren't going to rush to incorporate failure-oblivious approaches in their languages, the positive results obtained from this work are thought-provoking, and could inspire other less traditional but effective ways of handling failure.

Variance and Generalized Constraints for C# Generics

Variance and Generalized Constraints for C# Generics. Burak Emir, Andrew J. Kennedy, Claudio Russo, Dachuan Yu. July 2006.

Generic types in C# behave invariantly with respect to sub-typing. We propose a system of type-safe variance for C# that supports the declaration of covariant and contravariant type parameters on generic types. To support more widespread application of variance we also generalize the existing constraint mechanism with arbitrary subtype assertions on classes and methods. This extension is useful even in the absence of variance, and subsumes equational constraints proposed for Generalized Algebraic Data Types (GADTs). We formalize the subtype relation in both declarative and syntax-directed style, and describe and prove the correctness of algorithms for constraint closure and subtyping. Finally, we formalize and prove a type safety theorem for a featherweight language with variant classes and generalized constraints.

Discussion of previous C# GADT paper on LtU.

I am unsure about use-site versus definition-site variance declarations. It would be interesting to hear what others think.

Also check out the LtU discussion on wildcards in Java.
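For readers more at home in Haskell than C#, the co/contra distinction can be stated without any subtyping at all (a rough analogy of mine, not the paper's formalism): a covariant parameter occurs only in producer positions, a contravariant one only in consumer positions, and that determines the direction in which it can be mapped.

```haskell
-- A producer of a's: covariant (a occurs only as an output).
newtype Source a = Source (IO a)

mapSource :: (a -> b) -> Source a -> Source b
mapSource f (Source m) = Source (fmap f m)

-- A consumer of a's: contravariant (a occurs only as an input),
-- so mapping runs in the opposite direction.
newtype Sink a = Sink (a -> IO ())

contramapSink :: (b -> a) -> Sink a -> Sink b
contramapSink f (Sink k) = Sink (k . f)
```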

Code Generation Network

It's been quite a while since I visited codegeneration.org, and it seems the site has grown considerably, so you might want to check it out again too.

Code generation is an important programming technique (not to be confused with the code generation phase of compilers), one I am sure everyone here is familiar with. Still, it seems to me that the percentage of programmers at large who know about code generation is relatively small. Am I right in this assumption? I am not asking about people actually using the technique, mind you, just about knowing that it exists and what it means, and not thinking that the basic idea is "strange" or involves dark magic.

I wonder where, if anywhere, programmers (and CS students) should learn about it. And no, the answer "well, on LtU, of course" isn't a good option!
