## Asynchronous Proof Processing with Isabelle/Scala and Isabelle/jEdit

Asynchronous Proof Processing with Isabelle/Scala and Isabelle/jEdit. Makarius Wenzel, UITP 2010.

After several decades, most proof assistants are still centered around TTY-based interaction in a tight read-eval-print loop. Even well-known Emacs modes for such provers follow this synchronous model based on single commands with immediate response, meaning that the editor waits for the prover after each command. There have been some attempts to re-implement prover interfaces in big IDE frameworks, while keeping the old interaction model. Can we do better than that?

Ten years ago, the Isabelle/Isar proof language already emphasized the idea of proof document (structured text) instead of proof script (sequence of commands), although the implementation was still emulating TTY interaction in order to be able to work with the then emerging Proof General interface. After some recent reworking of Isabelle internals, to support parallel processing of theories and proofs, the original idea of structured document processing has surfaced again.

Isabelle versions from 2009 or later already provide some support for interactive proof documents with asynchronous checking, which awaits to be connected to a suitable editor framework or full-scale IDE. The remaining problem is how to do that systematically, without having to specify and implement complex protocols for prover interaction.

This is the point where we introduce the new Isabelle/Scala layer, which is meant to expose certain aspects of Isabelle/ML to the outside world. The Scala language (by Martin Odersky) is suﬃciently close to ML in order to model well-known prover concepts conveniently, but Scala also runs on the JVM and can access existing Java libraries directly. By building more and more external system wrapping for Isabelle in Scala, we eventually reach the point where we can integrate the prover seamlessly into existing IDEs (say Netbeans).

To avoid getting side-tracked by IDE platform complexity, our current experiments are focused on jEdit, which is a powerful editor framework written in Java that can be easily extended by plugin modules. Our plugins are written again in Scala for our convenience, and to leverage the Scala actor library for parallel and interactive programming. Thanks to the Isabelle/Scala layer, the Isabelle/jEdit implementation is very small and simple.

I thought this was a nice paper on the pragmatics of incremental, interactive proof editing. I've suspected for a while that as programming languages and IDEs grow more sophisticated and do more computationally-intensive checks at compile time (including but not limited to theorem proving), it will become similarly important to design our languages to support modular and incremental analysis.

However, IDE designs also need more experimentation, and unfortunately the choice of IDEs to extend seem to be limited to archaic systems like Emacs or industrial behemoths like Eclipse or Visual Studio, both of which constrain the scope for new design -- Emacs is too limited, and the API surface of Eclipse/VS is just too big and irregular. (Agda-mode for Emacs is a heroic but somewhat terrifying piece of elisp.)

## Finding and Understanding Bugs in C Compilers

In Finding and Understanding Bugs in C Compilers Xuejun Yang, Yang Chen, Eric Eide, and John Regehr of University of Utah, School of Computing describe Csmith, a fuzzer for testing C compilers. The hard part was avoiding undefined behavior.

Compilers should be correct. To improve the quality of C compilers, we created Csmith, a randomized test-case generation tool, and spent three years using it to ﬁnd compiler bugs. During this period we reported more than 325 previously unknown bugs to compiler developers. Every compiler we tested was found to crash and also to silently generate wrong code when presented with valid input. In this paper we present our compiler-testing tool and the results of our bug-hunting study. Our ﬁrst contribution is to advance the state of the art in compiler testing. Unlike previous tools, Csmith generates programs that cover a large subset of C while avoiding the undeﬁned and unspeciﬁed behaviors that would destroy its ability to automatically ﬁnd wrong code bugs. Our second contribution is a collection of qualitative and quantitative results about the bugs we have found in open-source C compilers.

Two bits really stuck out for me. First, formal verification has a real positive impact

The striking thing about our CompCert results is that the middleend bugs we found in all other compilers are absent. As of early 2011, the under-development version of CompCert is the only compiler we have tested for which Csmith cannot ﬁnd wrong-code errors. This is not for lack of trying: we have devoted about six CPU-years to the task. The apparent unbreakability of CompCert supports a strong argument that developing compiler optimizations within a proof framework, where safety checks are explicit and machine-checked, has tangible beneﬁts for compiler users.

And second, code coverage is inadequate for ensuring good test thoroughness for software as complex as a compiler.

Because we ﬁnd many bugs, we hypothesized that randomly generated programs exercise large parts of the compilers that were not covered by existing test suites. To test this, we enabled code coverage monitoring in GCC and LLVM. We then used each compiler to build its own test suite, and also to build its test suite plus 10,000 Csmith-generated programs. Table 3 shows that the incremental coverage due to Csmith is so small as to be a negative result. Our best guess is that these metrics are too shallow to capture Csmith’s effects, and that we would generate useful additional coverage in terms of deeper metrics such as path or value coverage.

## The Habit Programming Language: The Revised Preliminary Report

Habit is a systems programming dialect of Haskell from the High-Assurance Systems Programming (HASP) project at Portland State University. From The Habit Programming Language: The Revised Preliminary Report

This report presents a preliminary design for the programming language Habit, a dialect of Haskell that supports the development of high quality systems software. The primary commitments of the design are as follows:

* Systems programming: Unlike Haskell, which was intended to serve as a general purpose functional programming language, the design of Habit focusses on features that are needed in systems software development. These priorities are reﬂected fairly directly in the new features that Habit provides for describing bit-level and memory-based data representations, the introduction of new syntactic extensions to facilitate monadic programming, and, most signiﬁcantly, the adoption of a call-by-value semantics to improve predictability of execution. The emphasis on systems programming also impacts the design in less direct ways, including assumptions about the expected use of whole program compilation and optimization strategies in a practical Habit implementation.

* High assurance: Although most details of Haskell’s semantics have been formalized at some point in the research literature, there is no consolidated formal description of the whole language. There are also known differences in semantics, particularly with respect to operational behavior, between different Haskell implementations in areas where the Haskell report provides no guidance. Although it is not addressed in the current report, a high-priority for Habit is to provide a full, formal semantics for the complete language that can be used as a foundation for reasoning and formal veriﬁcation, a mechanism for ensuring consistency between implementations, and a basis for reliably predicting details about memory allocation, asymptotic behavior, and resource utilization.

HASP has a couple of postdoc positions open to help with Habit.

## Ghosts of Unix Past: a historical search for design patterns

Not strictly PLT-related, but Neil Brown has contributed an amazing series of articles to Linux Weekly News:

For this series we try to look for patterns which become visible only over an extended time period. As development of a system proceeds, early decisions can have consequences that were not fully appreciated when they were made. If we can find patterns relating these decisions to their outcomes, it might be hoped that a review of these patterns while making new decisions will help to avoid old mistakes or to leverage established successes.

## Pure and Declarative Syntax Definition: Paradise Lost and Regained, Onward 2010

Pure and Declarative Syntax Definition: Paradise Lost and Regained by Lennart C. L. Kats, Eelco Visser, Guido Wachsmuth from Delft

Syntax definitions are pervasive in modern software systems, and serve as the basis for language processing tools like parsers and compilers. Mainstream parser generators pose restrictions on syntax definitions that follow from their implementation algorithm. They hamper evolution, maintainability, and compositionality of syntax definitions. The pureness and declarativity of syntax definitions is lost. We analyze how these problems arise for different aspects of syntax definitions, discuss their consequences for language engineers, and show how the pure and declarative nature of syntax definitions can be regained.

I haven't compared this version with the Onward 2010 version, but they look essentially the same. It seems timely to post this paper, considering the other recent story Yacc is dead. There is not a whole lot to argue against in this paper, since we all "know" the other approaches aren't as elegant and only resort to them for specific reasons such as efficiency. Yet, this is the first paper I know of that tries to state the argument to software engineers.

For example, the Dragon Book, in every single edition, effectively brushes these topics aside. In particular, the Dragon Book does not even mention scannerless parsing as a technique, and instead only explains the "advantages" of using a scanner. Unfortunately, the authors of this paper don't consider other design proposals, either, such as Van Wyk's context-aware scanners from GPCE 2007. It is examples like these that made me wish the paper was a bit more robust in its analysis; the examples seem focused on the author's previous work.

If you are not familiar with the author's previous work in this area, the paper covers it in the references. It includes Martin Bravenboer's work on modular Eclipse IDE support for AspectJ.

## First-class modules: hidden power and tantalizing promises

Oleg just posted a new page, First-class modules: hidden power and tantalizing promises, related to new features in OCaml 3.12 (on LtU).

First-class modules introduced in OCaml 3.12 make type constructors first-class, permitting type constructor abstraction and polymorphism. It becomes possible to manipulate and quantify over types of higher kind. We demonstrate that as a consequence, full-scale, efficient generalized algebraic data types (GADTs) become expressible in OCaml 3.12 as it is, without any further extensions. Value-independent generic programming along the lines of Haskell's popular Generics for the masses'' become possible in OCaml for the first time. We discuss extensions such as a better implementation of polymorphic equality on modules, which can give us intensional type analysis (aka, type-case), permitting generic programming frameworks like SYB.

It includes a nice intro to first-class modules by Frisch and Garrigue: First-class modules and composable signatures in Objective Caml 3.12.

OCaml definitely just got even more interesting.

## Turning down the LAMP: Software specialization for the cloud

Several years ago, a reading group I was in read about the Flux OSKit Project, which aimed to provide a modular basis for operating systems. One of the topics of discussion was the possibility of, and possible benefits of, an application-specific OS. (For example, the fearful spectre of EmacsOS was raised.)

Today, I ran across "Turning down the LAMP: Software specialization for the cloud", which actually makes a pretty strong case for the idea on a virtual machine infrastructure,

...We instead view the cloud as a stable hardware platform, and present a programming framework which permits applications to be constructed to run directly on top of it without intervening software layers. Our prototype (dubbed Mirage) is unashamedly academic; it extends the Objective Caml language with storage extensions and a custom run-time to emit binaries that execute as a guest operating system under Xen. Mirage applications exhibit significant performance speedups for I/O and memory handling versus the same code running under Linux/Xen.

As one example,

Frameworks which currently use (for example) fork(2) on a host to spawn processes would benefit from using cloud management APIs to request resources and eliminate the distinction between cores and hosts.

On the other hand, I suspect that this "unashamedly academic" idea may already be advancing into the commercial arena, if I am correctly reading between the lines of the VMware vFabric tc ServerTM marketing material.

## Software Development with Code Maps

Robert DeLine, Gina Venolia, and Kael Rowan, "Software Development with Code Maps", Communications of the ACM, Vol. 53 No. 8, Pages 48-54, 10.1145/1787234.1787250

Getting lost in a large code base is altogether too easy. The code consists of many thousands of symbols, with few visual landmarks to guide the eye. As a developer navigates the code, she follows hyperlinks, such as jumping from a method caller to a callee, with no visual transition to show where the jump landed. ... Better support for code diagrams in the development environment could support code understanding and communication, and could serve as a "map" to help keep developers oriented. ... Our goal is to integrate maps into the development environment such that developers can carry out most tasks within the map.

Although the focus of this article is largely on "Code Map as UI", there are hints of the possibility that we might eventually see "Code Map as Language Element" (for example, the comment that "An important lesson from the Oahu research is that developers assign meaning to the spatial layout of the code. Code Canvas therefore takes a mixed initiative approach to layout. The user is able to place any box on the map through direct manipulation..."). The same ideas will of course be familiar to anyone who has worked with environments like Simulink, which provide a combination of diagrammatic structuring and textual definition of algorithms. But in the past such environments have only really been found in specific application domains -- control systems and signal processing in the case of Simulink -- while the Code Map idea seems targeted at more general-purpose software development. Is the complexity of large software systems pushing us towards a situation in which graphical structures like Code Maps will become a common part of the syntax of general-purpose programming languages?

## Is Transactional Programming Actually Easier?

Is Transactional Programming Actually Easier?, WDDD '09, Christopher J. Rossbach, Owen S. Hofmann, and Emmett Witchel.

Chip multi-processors (CMPs) have become ubiquitous, while tools that ease concurrent programming have not. The promise of increased performance for all applications through ever more parallel hardware requires good tools for concurrent programming, especially for average programmers. Transactional memory (TM) has enjoyed recent interest as a tool that can help programmers program concurrently.

The TM research community claims that programming with transactional memory is easier than alternatives (like locks), but evidence is scant. In this paper, we describe a user-study in which 147 undergraduate students in an operating systems course implemented the same programs using coarse and fine-grain locks, monitors, and transactions. We surveyed the students after the assignment, and examined their code to determine the types and frequency of programming errors for each synchronization technique. Inexperienced programmers found baroque syntax a barrier to entry for transactional programming. On average, subjective evaluation showed that students found transactions harder to use than coarse-grain locks, but slightly easier to use than fine-grained locks. Detailed examination of synchronization errors in the students’ code tells a rather different story. Overwhelmingly, the number and types of programming errors the students made was much lower for transactions than for locks. On a similar programming problem, over 70% of students made errors with fine-grained locking, while less than 10% made errors with transactions.

I've recently discovered the Workshop on Duplicating, Deconstructing, and Debunking (WDDD) and have found a handful of neat papers, and this one seemed especially relevant to LtU.

[Edit: Apparently, there is a PPoPP'10 version of this paper with 237 undergraduate students.]

Also, previously on LtU:

Transactional Memory versus Locks - A Comparative Case Study

Despite the fact Tommy McGuire's post mentions Dr. Victor Pankratius's talk was at UT-Austin and the authors of this WDDD'09 paper represent UT-Austin, these are two independent case studies with different programming assignments. The difference in assignments is interesting because it may indicate some statistical noise associated with problem domain complexity (as perceived by the test subjects) and could account for differences between the two studies.

Everyone always likes to talk about usability in programming languages without trying to do it. Some claim it can't even be done, despite the fact Horning and Gannon did work on the subject 3+ decades ago, assessing how one can Language Design to Enhance Program Reliability. This gives a glimpse both on (a) why it is hard (b) how you can still try to do usability testing, rather than determine the truthiness of a language design decision.

## Joe Duffy: A (brief) retrospective on transactional memory

A (brief) retrospective on transactional memory, by Joe Duffy, January 3rd, 2010. Although this is a blog post, don't expect to read it all on your lunch break...

The STM.NET incubator project was canceled May 11, 2010, after beginning public life July 27, 2009 at DevLabs. In this blog post, written 4 months prior to its cancellation, Joe Duffy discusses the practical engineering challenges around implementing Software Transactional Memory in .NET. Note: He starts off with a disclaimer that he was not engaged in the STM.NET project past its initial working group phase.

In short, Joe argues, "Throughout, it became abundantly clear that TM, much like generics, was a systemic and platform-wide technology shift. It didn’t require type theory, but the road ahead sure wasn’t going to be easy." The whole blog post deals with how many implementation challenges platform-wide support for STM would be in .NET, including what options were considered. He does not mention Maurice Herlihy's SXM library approach, but refers to Tim Harris's work several times.

There was plenty here that surprised me, especially when you compare Concurrent Haskell's STM implementation to STM.NET design decisions and interesting debates the team had. In Concurrent Haskell, issues Joe raises, like making Console.WriteLine transactional, are delegated to the type system by the very nature of the TVar monad, preventing programmers from writing such wishywashy code. To be honest, this is why I didn't understand what Joe meant by "it didn't require type theory" gambit, since some of the design concerns are mediated in Concurrent Haskell via type theory. On the other hand, based on the pragmatics Joe discusses, and the platform-wide integration with the CLR they were shooting for, reminds me of The Transactional Memory / Garbage Collection Analogy. Joe also wrote a briefer follow-up post, More thoughts on transactional memory, where he talks more about Barbara Liskov's Argus.