Software Engineering

Lightweight Monadic Regions

Oleg Kiselyov and Chung-chieh Shan. Lightweight Monadic Regions. Haskell'08.
We present Haskell libraries that statically ensure the safe use of resources such as file handles. We statically prevent accessing an already closed handle or forgetting to close it. The libraries can be trivially extended to other resources such as database connections and graphic contexts...

Region annotations are part of an expression's inferred type. Our new Haskell encoding of monadic regions as monad transformers needs no witness terms. It assures timely deallocation even when resources have markedly different lifetimes and the identity of the longest-living resource is determined only dynamically.

For contrast, we also implement a Haskell library for manual resource management, where deallocation is explicit and safety is assured by a form of linear types. We implement the linear typing in Haskell with the help of phantom types and a parameterized monad to statically track the type-state of resources.

I am starting to think we need a department for effect systems and related topics (though we managed without a monads department!)...

You'll probably want to read the code, so go ahead. The code makes it plain which features of the type system are needed to achieve the end result.

Catch me if you can: Towards type-safe, hierarchical, lightweight, polymorphic and efficient error management in OCaml

Catch me if you can: Towards type-safe, hierarchical, lightweight, polymorphic and efficient error management in OCaml, by David Teller, Arnaud Spiwack, Till Varoquaux:

This is the year 2008 and ML-style exceptions are everywhere. Most modern languages, whether academic or industrial, feature some variant of this mechanism. Languages such as Java even have a degree of out-of-the-box static coverage-checking for such exceptions, which is currently not available for ML languages, at least not without resorting to external tools.

In this document, we demonstrate a design principle and a tiny library for managing errors in a functional manner, with static coverage-checking, automatically-inferred, structurally typed and hierarchical exceptional cases, all of this for what we believe is a reasonable run-time cost. Our work is based on OCaml and features simple uses of higher-order programming, low-level exceptions, phantom types, polymorphic variants and compile-time code
rewriting.

Exhaustively checked, user-friendly exception handling was a bit of an open problem for awhile. As the paper details, languages supported either cumbersome, exhaustively checked polymorphic exceptions, as in Haskell, or we had unchecked easily extensible monomorphic exceptions, as in ML, or we had checked, extensible exceptions using a universal type as in Java.

Supporting exhaustively checked, easily extensible polymorphic exceptions seemed quite a challenge, which this paper solves using monadic error handling and nested polymorphic variants. The paper also gives a good overview of current techniques of exception checking in OCaml, ie. ocamlexc.

The performance of such exceptions is understandably lower than native exceptions, given all the thunking and indirection that monads entail. The authors attempt various implementations and test their performance against native exceptions. Ultimately, monadic error management seems acceptable for actual error handling, but not for control flow as native exceptions are sometimes used in OCaml.

One interesting extension is to consider how efficient the implementations would be given more sophisticated control flow operators, such as continuations, coroutines, or delimited continuations, or whether native exceptions can be salvaged using a type and effects system in place of monads.

The irreducible physicality of security properties

The recent discussion around Safe and Secure Software in Ada involved some amount of discussion around what is involved in proving software secure, and what role do PLs play in this. I recommend two papers for further discussion:

  • First, Rao & Rohatgi (2001), EMpowering Side-channel attacks, which discusses a fairly new technology for gathering information from running systems by monitoring their EM emissions; and
  • Rae & Wildman (2003), A Taxonomy of Attacks on Secure Devices, which provides a synthetic classification of attacks on computer systems based on the attackers degree of access to the machinery and the attacker's objectives, and which catalogues a range of attacks into the classification.

So I hereby advance three slogans:

  1. Security is physical: neither applications nor operating systems can satisfy elementary security properties, but only deployed computer systems. This is because elementary security properties are about what the attacker does, which ultimately has a physical basis;
  2. Security is non-modular: Programming languages and software engineering practices can ensure that software possesses properties helpful to security, but the properties are only meaningful in the context of a strategy to ensure a computer system satisfies its security policy;
  3. We should not talk of secure programming languages and secure programs, such talk does mislead; to talk instead of software being secureable might promote better understanding.

Edited following Dave Griffith's remarks.

Applied Metamodelling: A Foundation for Language Driven Development

Applied Metamodelling: A Foundation for Language Driven Development (2004)
by Tony Clark, Paul Sammut, James Willans

An excerpt:

Language-driven development is fundamentally based on the ability to rapidly design new languages and tools in a unified and interoperable manner. We argue that existing technologies do not provide this capability, but a language engineering approach based on metamodelling can. The detailed study of metamodelling and how it can realise the Language-Driven Development vision will form the focus for the remainder of this book.

In software engineering circles the term "language driven development" is synonymous with "language oriented programming", a term which LtU members are more familiar with (thanks to Martin Ward's article Language Oriented Programming which first appeared in 1994, and then Martin Fowler's essays on the topic). The book hasn't appeared on the radar here on LtU, despite 41 citations. I suspect this is due in part to only one citation at Citeseer, and the lack of cross-talk between computer scientists and software engineers.

There are a lot of similarities between the XMF language (discussion at LtU) and that of the Katahdin language (discussion at LtU). Other related discussions here at LtU, include Language Workbenches: The Killer App for DSLs - about the essay by Martin Fowler, Ralph Johnson: Language workbenches - a response to Fowler's essay, XActium - Lightweight Language Engineering? - which discusses an essay about a previous version of XMF, Generating Interpreters? , Language Oriented Programming - discusses an essay by Jetbrain's Sergey Dmitriev, "Language Oriented Programming" Meta Programming System - discussion of the Jetbrain MPS system, The DSL, MDA, UML thing again... - an older discussion on the relationship between DSLs and MDA.

(Disclaimer: Some may notice that I am mentioned on the XMF web site, but this is just because I subjected their XMF language to a number of grueling challenges which they passed with flying colors: see the language snippets in the documentation. I have no affiliation with their company.)

Software Craftsmanship: Apprentice to Journeyman

O'Reilly is hosting a collaborative book/wiki called Software Craftsmanship: Apprentice to Journeyman. It's structured as a series of "recipes" on how to approach different aspects of software development.

Jumbala : An Action Language for UML State Machines

Jumbala : An Action Language for UML State Machines, Juro Dubrovin, Master's Thesis.

UML 2.0 is a language for modeling complex software systems. A UML model may describe the dynamic aspects of software as well as the static structure. We concentrate on models of reactive systems, such as embedded controllers or telecommunications switches. The behavior of such systems is modeled using UML state machines. Although UML defines the structure of state machines, it leaves open the choice of an action language,which is the language used to specify how the transitions of a state machine affect the configuration of the underlying model.

A UML action language named Jumbala is introduced. The language has been designed as part of a project where the goal is to formally analyze behavioral UML models. Jumbala is based on the Java programming language. It has nearly the same syntax and semantics for statements and expressions as Java. Some new programming constructs have been added to facilitate state machine modeling. Jumbala also supports object-oriented programming with classes and inheritance.

An interpreter that parses and executes Jumbala programs has been developed. The interpreter will be part of a prototype tool set for analyzing the behavior of reactive computer systems modeled in UML.

This is interesting because it is another example of efforts from the modeling community towards combining models and programming languages to provide a single compilable specification of software. Some of these efforts are being coordinated using the term model-driven architecture (MDA).

[edit: fixed formatting issues]

Kermeta Programming Language

In my recent adventures researching modeling languages I came across a language not previously mentioned on Lambda-the-Ultimate.org called Kermeta. From the reference manual:

Kermeta is a Domain Specific Language dedicated to metamodel engineering. It fills the gap let by MOF which defines only the structure of meta-models, by adding a way to specify static semantic (similar to OCL) and dynamic semantic (using operational semantic in the operation of the metamodel). Kermeta uses the object-oriented paradigm like Java or Eiffel.

There is a list of published papers related to Kermeta here.

J&: Nested Intersection for Scalable Software Composition

J&: Nested Intersection for Scalable Software Composition by Nathaniel Nystrom, Xin Qi, Andrew C. Myers. 2006.
We identify the following requirements for general extension and composition of software systems:
  1. Orthogonal extension: Extensions may require both new data types and new operations.
  2. Type safety: Extensions cannot create run-time type errors.
  3. Modularity: The base system can be extended without modifying or recompiling its code.
  4. Scalability: Extensions should be scalable. The amount of code needed should be proportional to the functionality added.
  5. Non-destructive extension: The base system should still be available for use within the extended system.
  6. Composability of extensions.
The first three of these requirements correspond to Wadler’s expression problem. Scalability (4) is often but not necessarily satisfied by supporting separate compilation; it is important for extending large software. Non-destructive extension (5) enables existing clients of the base system and also the extended system itself to interoperate with code and data of the base system, an important requirement for backward compatibility. Nested inheritance addresses the first five requirements, but it does not support extension composition. Nested intersection adds this capability.
Compare this approach to one taken by Scala (or read the section 7).

DySy: Dynamic Symbolic Execution for Invariant Inference

DySy: Dynamic Symbolic Execution for Invariant Inference. Christoph Csallner, Nikolai Tillmann, Yannis Smaragdakis

Dynamically discovering likely program invariants from concrete test executions has emerged as a highly promising software engineering technique. Dynamic invariant inference has the advantage of succinctly summarizing both “expected”program inputs and the subset of program behaviors that is normal under those inputs. In this paper, we introduce a technique that can drastically increase the relevance of inferred invariants, or reduce the size of the test suite required to obtain good invariants. Instead of falsifying invariants produced by pre-set patterns, we determine likely program invariants by combining the concrete execution of actual test cases with a simultaneous symbolic execution of the same tests. The symbolic execution produces abstract conditions over program variables that the concrete tests satisfy during their execution. In this way, we obtain the benefits of dynamic inference tools like Daikon: the inferred invariants correspond to the observed program behaviors. At the same time, however, our inferred invariants are much more suited to the program at hand than Daikon’s hardcoded invariant patterns. The symbolic invariants are literally derived from the program text itself, with appropriate value substitutions as dictated by symbolic execution. We implemented our technique in the DySy tool, which utilizes a powerful symbolic execution and simplification engine. The results confirm the benefits of our approach. In Daikon’s prime example benchmark, we infer the majority of the interesting Daikon invariants, while eliminating invariants that a human user is likely to consider irrelevant.

The Daikon invariant detector mentioned in the abstract is here.

Seems like a pretty straightforward idea, I guess, but as always: God is in the details.

Samurai - Protecting Critical Data in Unsafe Languages

Samurai - Protecting Critical Data in Unsafe Languages. Karthik Pattabiraman, Vinod Grover, Benjamin G. Zorn.

Programs written in type-unsafe languages such as C and C++ incur costly memory errors that result in corrupted data structures, program crashes, and incorrect results. We present a data-centric solution to memory corruption called critical memory, a memory model that allows programmers to identify and protect data that is critical for correct program execution. Critical memory defines operations to consistently read and update critical data, and ensures that other non-critical updates in the program will not corrupt it. We also present Samurai, a runtime system that implements critical memory in software. Samurai uses replication and forward error correction to provide probabilistic guarantees of critical memory semantics. Because Samurai does not modify memory operations on non-critical data, the majority of memory operations in programs run at full speed, and Samurai is compatible with third party libraries. Using both applications, including a Web server, and libraries (an STL list class and a memory allocator), we evaluate the performance overhead and fault tolerance that Samurai provides.

An acceptable overhead for the pleasure of using an unsafe language? You decide.

XML feed