Software Engineering


It is increasingly important for applications to protect user privacy. Unfortunately, it is often non-trivial for programmers to enforce privacy policies. We have developed Jeeves to make it easier for programmers to enforce information flow policies: policies that describe who can see what information flows through a program. Jeeves allows the programmer to write policy-agnostic programs, separately implementing policies on sensitive values from other functionality. Just like Wooster's clever valet Jeeves in Wodehouse's stories, the Jeeves runtime does the hard work, automatically enforcing the policies to show the appropriate output to each viewer.

From what I gather, Jeeves takes Aspect Oriented approach to privacy. This is of course not a new idea. I presume that many of the classic problems with AOP would apply to Jeeves. Likewise, using information flow analysis for handling privacy policies is not an new idea. Combining the two, however, seems like a smart move. Putting the enforcement at the run-time level makes this sound more practical than other ideas I have heard before. Still, I personally think that specifying privacy policies at the end-user level and clarifying the concept of privacy at the normative, legal and conceptual levels are more pressing concerns. Indeed, come to think of it: I don't really recall a privacy breach that was caused by a simple information flow bug. Privacy expectations are broken on purpose by many companies and major data breaches occur when big databases are shared (recall the Netflix Prize thing). Given this, I assume the major use-case is for Apps, maybe even as a technology that someone like Apple could use to enforce the compliance of third-party Apps to their privacy policies.

I haven't looked too closely, so comments from more informed people are welcome.

Jeeves is implemented as an embedded DSL in Scala and Python.

Simple Generators v. Lazy Evaluation

Oleg Kiselyov, Simon Peyton-Jones and Amr Sabry: Simple Generators:

Incremental stream processing, pervasive in practice, makes the best case for lazy evaluation. Lazy evaluation promotes modularity, letting us glue together separately developed stream producers, consumers and transformers. Lazy list processing has become a cardinal feature of Haskell. It also brings the worst in lazy evaluation: its incompatibility with effects and unpredictable and often extraordinary use of memory. Much of the Haskell programming lore are the ways to get around lazy evaluation.

We propose a programming style for incremental stream processing based on typed simple generators. It promotes modularity and decoupling of producers and consumers just like lazy evaluation. Simple generators, however, expose the implicit suspension and resumption inherent in lazy evaluation as computational effects, and hence are robust in the presence of other effects. Simple generators let us accurately reason about memory consumption and latency. The remarkable implementation simplicity and efficiency of simple generators strongly motivates investigating and pushing the limits of their expressiveness.

To substantiate our claims we give a new solution to the notorious pretty-printing problem. Like earlier solutions, it is linear, backtracking-free and with bounded latency. It is also modular, structured as a cascade of separately developed stream transducers, which makes it simpler to write, test and to precisely analyze latency, time and space consumption. It is compatible with effects including IO, letting us read the source document from a file, and format it as we read.

This is fascinating work that shows how to gain the benefits of lazy evaluation - decoupling of producers, transformers, and consumers of data, and producing only as much data as needed - in a strict, effectful setting that works well with resources that need to be disposed of once computation is done, e.g. file handles.

The basic idea is that of Common Lisp signal handling: use a hierarchical, dynamically-scoped chain of handler procedures, which get called - on the stack, without unwinding it - to parameterize code. In this case, the producer code (which e.g. reads a file character by character) is the parameterized code: every time data (a character) is produced, it calls the dynamically innermost handler procedure with the data (it yields the data to the handler). This handler is the data consumer (it could e.g. print the received character to the console). Through dynamic scoping, each handler may also have a super-handler, to which it may yield data. In this way, data flows containing multiple transformers can be composed.

I especially like the OCaml version of the code, which is just a page of code, implementing a dynamically-scoped chain of handlers. After that we can already write map and fold in this framework (fold using a loop and a state cell, notably.) There's more sample code.

This also ties in with mainstream yield.

Koka a function oriented language with effect inference

Koka is a function-oriented programming language that seperates pure values from side-effecting computations, where the effect of every function is automatically inferred. Koka has many features that help programmers to easily change their data types and code organization correctly, while having a small language core with a familiar JavaScript like syntax.

Koka extends the idea of using row polymorphism to encode an effect system and the relations between them. Daan Leijen is the primary researcher behind it and his research was featured previously on LtU, mainly on row polymorphism in the Morrow Language.

So far there's no paper available on the language design, just the slides from a Lang.Next talk (which doesn't seem to have video available at Channel 9), but it's in the program for HOPE 2012.

Effective Scala

It's from Twitter, It's open source, It's about Scala. What's not to like?

Seven Myths of Formal Methods Revisited

Software Engineering with Formal Methods: The Development of a Storm Surge Barrier Control System - Seven Myths of Formal Methods Revisited (2001), by Jan Tretmans, Klaas Wijbrans, Michel Chaudron:

Bos is the software system which controls and operates the storm surge barrier in the Nieuwe Waterweg near Rotterdam. It is a complex, safety-critical system of average size, which was developed by CMG Den Haag B.V., commissioned by Rijkswaterstaat (RWS) – the Dutch Ministry of Transport, Public Works and Water Management. It was completed in October 1998 on time and within budget.

CMG used formal methods in the development of the Bos software. This paper discusses the experiences obtained from their use. Some people claim that the use of formal methods helps in developing correct and reliable software, others claim that formal methods are useless and unworkable. Some of these claims have almost become myths. A number of these myths are described and discussed in a famous article: Seven Myths of Formal Methods [Hal90]. The experiences obtained from using formal methods for the development of Bos will be discussed on the basis of this article. We will discuss to what extent these myths are true for the Bos project.

The data for this survey were collected by means of interviews with software engineers working on the Bos project. These include the project manager, designers, implementers and testers, people who participated from the beginning in 1995 until the end in 1998 as well as engineers who only participated in the implementation phase, and engineers with and without previous, large-scale software engineering experience.

This paper concentrates on the experiences of the software engineers with formal methods. These experiences, placed in the context of the seven myths, are described in section 3. This paper does not discuss technical details about the particular formal methods used or the way they were used; see [Kar97, Kar98] for these aspects. Moreover, formal methods were only one technique used in the development of Bos. The overall engineering approach and the way different methods and techniques were combined to assure the required safetycritical quality, are described [WBG98, WB98]. Testing in Bos is described in more detail in [GWT98], while [CTW99] will give a more systematic analysis of the results of the interviews
with the developers.

Discussion of formal methods and verification has come up a few times here on LtU. In line with the recent discussions on the need for more empirical data in our field, this was an interesting case study on the use of formal methods. The seven myths of formal methods are reviewed in light of a real project:

  1. Myth 1: Formal methods can guarantee that software is perfect
  2. Myth 2: Formal methods are all about program proving
  3. Myth 3: Formal methods are only useful for safety-critical system
  4. Myth 4: Formal methods require highly trained mathematicians
  5. Myth 5: Formal methods increase the cost of developmen
  6. Myth 6: Formal methods are unacceptable to users
  7. Myth 7: Formal methods are not used on real, large-scale software

Extensible Programming with First-Class Cases

Extensible Programming with First-Class Cases, by Matthias Blume, Umut A. Acar, and Wonseok Chae:

We present language mechanisms for polymorphic, extensible records and their exact dual, polymorphic sums with extensible first-class cases. These features make it possible to easily extend existing code with new cases. In fact, such extensions do not require any changes to code that adheres to a particular programming style. Using that style, individual extensions can be written independently and later be composed to form larger components. These language mechanisms provide a solution to the expression problem.

We study the proposed mechanisms in the context of an implicitly typed, purely functional language PolyR. We give a type system for the language and provide rules for a 2-phase transformation: first into an explicitly typed λ-calculus with record polymorphism, and finally to efficient index-passing code. The first phase eliminates sums and cases by taking advantage of the duality with records.

We implement a version of PolyR extended with imperative features and pattern matching—we call this language MLPolyR. Programs in MLPolyR require no type annotations—the implementation employs a reconstruction algorithm to infer all types. The compiler generates machine code (currently for PowerPC) and optimizes the representation of sums by eliminating closures generated by the dual construction.

This is an elegant solution to the expression problem for languages with pattern matching. This paper was posted twice in LtU comments, but it definitely deserves its own story. Previous solutions to the exression problem are rather more involved, like Garrigue's use of recursion and polymorphic variants, because they lack support for extensible records which makes this solution so elegant.

Extensible records and first-class cases unify object-oriented and functional paradigms on a deeper level, since they enable first-class messages to be directly encoded. Add a sensible system for dynamics, and I argue you have most of the power people claim of dynamic languages without sacrificing the safety of static typing.

Microsoft Roslyn Project whitepaper

Microsoft has recently detailed their "Compiler as a Service" initiative in a whitepaper. The whitepaper calls the project Roslyn.

Related, IBM sponsors the Eclipse IMP project for its X10 language (and the X10DT). IMP is also used for Eelco Visser's Spoofax IDE and WebDSL IDE.

Specification and Verification: The Spec# Experience

Mike Barnett, Manuel Fähndrich, K. Rustan M. Leino, Peter Müller, Wolfram Schulte, and Herman Venter, Specification and Verification: The Spec# Experience" Preprint of an article appearing in the June 2011 CACM.

CACM tagline: Can a programming language really help programmers write better programs?

Spec# is a programming system that facilitates the development of correct software. The Spec# language extends C# with contracts that allow programmers to express their design intent in the code. The Spec# tool suite consists of a compiler that emits run-time checks for contracts, a static program verifier that attempts to mathematically prove the correctness of programs, and an integration into the Visual Studio development environment. Spec# shows how contracts and verifiers can be integrated seamlessly into the software development process. This paper reflects on the six-year history of the Spec# project, scientific contributions it has made, remaining challenges for tools that seek to establish program correctness, and prospects of incorporating verification into everyday software engineering.

Spec# is, in some ways, quite similar to JML+ESC/Java2. But Spec# is a language rather than a set of annotations, which allows it to incorporate features such as a non-null type system and a very tight integration with the IDE.

Spec# was previously mentioned on LtU back in 2005.

Passing a Language through the Eye of a Needle

Roberto Ierusalimschy, Luiz Henrique de Figueiredo, and Waldemar Celes, "Passing a Language through the Eye of a Needle: How the embeddability of Lua impacted its design", ACM Queue vol. 9, no. 5, May 2011.

A key feature of a scripting language is its ability to integrate with a system language. This integration takes two main forms: extending and embedding. In the first form, you extend the scripting language with libraries and functions written in the system language and write your main program in the scripting language. In the second form, you embed the scripting language in a host program (written in the system language) so that the host can run scripts and call functions defined in the scripts; the main program is the host program.
In this article we discuss how embeddability can impact the design of a language, and in particular how it impacted the design of Lua from day one. Lua is a scripting language with a particularly strong emphasis on embeddability. It has been embedded in a wide range of applications and is a leading language for scripting games.

An interesting discussion of some of the considerations that go into supporting embeddability. The design of a language clearly has an influence over the API it supports, but conversely the design of an API can have a lot of influence over the design of the language.

Asynchronous Proof Processing with Isabelle/Scala and Isabelle/jEdit

Asynchronous Proof Processing with Isabelle/Scala and Isabelle/jEdit. Makarius Wenzel, UITP 2010.

After several decades, most proof assistants are still centered around TTY-based interaction in a tight read-eval-print loop. Even well-known Emacs modes for such provers follow this synchronous model based on single commands with immediate response, meaning that the editor waits for the prover after each command. There have been some attempts to re-implement prover interfaces in big IDE frameworks, while keeping the old interaction model. Can we do better than that?

Ten years ago, the Isabelle/Isar proof language already emphasized the idea of proof document (structured text) instead of proof script (sequence of commands), although the implementation was still emulating TTY interaction in order to be able to work with the then emerging Proof General interface. After some recent reworking of Isabelle internals, to support parallel processing of theories and proofs, the original idea of structured document processing has surfaced again.

Isabelle versions from 2009 or later already provide some support for interactive proof documents with asynchronous checking, which awaits to be connected to a suitable editor framework or full-scale IDE. The remaining problem is how to do that systematically, without having to specify and implement complex protocols for prover interaction.

This is the point where we introduce the new Isabelle/Scala layer, which is meant to expose certain aspects of Isabelle/ML to the outside world. The Scala language (by Martin Odersky) is sufficiently close to ML in order to model well-known prover concepts conveniently, but Scala also runs on the JVM and can access existing Java libraries directly. By building more and more external system wrapping for Isabelle in Scala, we eventually reach the point where we can integrate the prover seamlessly into existing IDEs (say Netbeans).

To avoid getting side-tracked by IDE platform complexity, our current experiments are focused on jEdit, which is a powerful editor framework written in Java that can be easily extended by plugin modules. Our plugins are written again in Scala for our convenience, and to leverage the Scala actor library for parallel and interactive programming. Thanks to the Isabelle/Scala layer, the Isabelle/jEdit implementation is very small and simple.

I thought this was a nice paper on the pragmatics of incremental, interactive proof editing. I've suspected for a while that as programming languages and IDEs grow more sophisticated and do more computationally-intensive checks at compile time (including but not limited to theorem proving), it will become similarly important to design our languages to support modular and incremental analysis.

However, IDE designs also need more experimentation, and unfortunately the choice of IDEs to extend seem to be limited to archaic systems like Emacs or industrial behemoths like Eclipse or Visual Studio, both of which constrain the scope for new design -- Emacs is too limited, and the API surface of Eclipse/VS is just too big and irregular. (Agda-mode for Emacs is a heroic but somewhat terrifying piece of elisp.)

XML feed