Software Engineering

Pure and Declarative Syntax Definition: Paradise Lost and Regained, Onward 2010

Pure and Declarative Syntax Definition: Paradise Lost and Regained by Lennart C. L. Kats, Eelco Visser, Guido Wachsmuth from Delft

Syntax definitions are pervasive in modern software systems, and serve as the basis for language processing tools like parsers and compilers. Mainstream parser generators pose restrictions on syntax definitions that follow from their implementation algorithm. They hamper evolution, maintainability, and compositionality of syntax definitions. The pureness and declarativity of syntax definitions is lost. We analyze how these problems arise for different aspects of syntax definitions, discuss their consequences for language engineers, and show how the pure and declarative nature of syntax definitions can be regained.

I haven't compared this version with the Onward 2010 version, but they look essentially the same. It seems timely to post this paper, considering the other recent story Yacc is dead. There is not a whole lot to argue against in this paper, since we all "know" the other approaches aren't as elegant and only resort to them for specific reasons such as efficiency. Yet, this is the first paper I know of that tries to state the argument to software engineers.

For example, the Dragon Book, in every single edition, effectively brushes these topics aside. In particular, the Dragon Book does not even mention scannerless parsing as a technique, and instead only explains the "advantages" of using a scanner. Unfortunately, the authors of this paper don't consider other design proposals, either, such as Van Wyk's context-aware scanners from GPCE 2007. It is examples like these that made me wish the paper was a bit more robust in its analysis; the examples seem focused on the author's previous work.

If you are not familiar with the author's previous work in this area, the paper covers it in the references. It includes Martin Bravenboer's work on modular Eclipse IDE support for AspectJ.

First-class modules: hidden power and tantalizing promises

Oleg just posted a new page, First-class modules: hidden power and tantalizing promises, related to new features in OCaml 3.12 (on LtU).

First-class modules introduced in OCaml 3.12 make type constructors first-class, permitting type constructor abstraction and polymorphism. It becomes possible to manipulate and quantify over types of higher kind. We demonstrate that as a consequence, full-scale, efficient generalized algebraic data types (GADTs) become expressible in OCaml 3.12 as it is, without any further extensions. Value-independent generic programming along the lines of Haskell's popular ``Generics for the masses'' become possible in OCaml for the first time. We discuss extensions such as a better implementation of polymorphic equality on modules, which can give us intensional type analysis (aka, type-case), permitting generic programming frameworks like SYB.

It includes a nice intro to first-class modules by Frisch and Garrigue: First-class modules and composable signatures in Objective Caml 3.12.

OCaml definitely just got even more interesting.

Turning down the LAMP: Software specialization for the cloud

Several years ago, a reading group I was in read about the Flux OSKit Project, which aimed to provide a modular basis for operating systems. One of the topics of discussion was the possibility of, and possible benefits of, an application-specific OS. (For example, the fearful spectre of EmacsOS was raised.)

Today, I ran across "Turning down the LAMP: Software specialization for the cloud", which actually makes a pretty strong case for the idea on a virtual machine infrastructure,

...We instead view the cloud as a stable hardware platform, and present a programming framework which permits applications to be constructed to run directly on top of it without intervening software layers. Our prototype (dubbed Mirage) is unashamedly academic; it extends the Objective Caml language with storage extensions and a custom run-time to emit binaries that execute as a guest operating system under Xen. Mirage applications exhibit significant performance speedups for I/O and memory handling versus the same code running under Linux/Xen.

As one example,

Frameworks which currently use (for example) fork(2) on a host to spawn processes would benefit from using cloud management APIs to request resources and eliminate the distinction between cores and hosts.

On the other hand, I suspect that this "unashamedly academic" idea may already be advancing into the commercial arena, if I am correctly reading between the lines of the VMware vFabric tc ServerTM marketing material.

Software Development with Code Maps

Robert DeLine, Gina Venolia, and Kael Rowan, "Software Development with Code Maps", Communications of the ACM, Vol. 53 No. 8, Pages 48-54, 10.1145/1787234.1787250

Getting lost in a large code base is altogether too easy. The code consists of many thousands of symbols, with few visual landmarks to guide the eye. As a developer navigates the code, she follows hyperlinks, such as jumping from a method caller to a callee, with no visual transition to show where the jump landed. ... Better support for code diagrams in the development environment could support code understanding and communication, and could serve as a "map" to help keep developers oriented. ... Our goal is to integrate maps into the development environment such that developers can carry out most tasks within the map.

Although the focus of this article is largely on "Code Map as UI", there are hints of the possibility that we might eventually see "Code Map as Language Element" (for example, the comment that "An important lesson from the Oahu research is that developers assign meaning to the spatial layout of the code. Code Canvas therefore takes a mixed initiative approach to layout. The user is able to place any box on the map through direct manipulation..."). The same ideas will of course be familiar to anyone who has worked with environments like Simulink, which provide a combination of diagrammatic structuring and textual definition of algorithms. But in the past such environments have only really been found in specific application domains -- control systems and signal processing in the case of Simulink -- while the Code Map idea seems targeted at more general-purpose software development. Is the complexity of large software systems pushing us towards a situation in which graphical structures like Code Maps will become a common part of the syntax of general-purpose programming languages?

Is Transactional Programming Actually Easier?

Is Transactional Programming Actually Easier?, WDDD '09, Christopher J. Rossbach, Owen S. Hofmann, and Emmett Witchel.

Chip multi-processors (CMPs) have become ubiquitous, while tools that ease concurrent programming have not. The promise of increased performance for all applications through ever more parallel hardware requires good tools for concurrent programming, especially for average programmers. Transactional memory (TM) has enjoyed recent interest as a tool that can help programmers program concurrently.

The TM research community claims that programming with transactional memory is easier than alternatives (like locks), but evidence is scant. In this paper, we describe a user-study in which 147 undergraduate students in an operating systems course implemented the same programs using coarse and fine-grain locks, monitors, and transactions. We surveyed the students after the assignment, and examined their code to determine the types and frequency of programming errors for each synchronization technique. Inexperienced programmers found baroque syntax a barrier to entry for transactional programming. On average, subjective evaluation showed that students found transactions harder to use than coarse-grain locks, but slightly easier to use than fine-grained locks. Detailed examination of synchronization errors in the students’ code tells a rather different story. Overwhelmingly, the number and types of programming errors the students made was much lower for transactions than for locks. On a similar programming problem, over 70% of students made errors with fine-grained locking, while less than 10% made errors with transactions.

I've recently discovered the Workshop on Duplicating, Deconstructing, and Debunking (WDDD) and have found a handful of neat papers, and this one seemed especially relevant to LtU.

[Edit: Apparently, there is a PPoPP'10 version of this paper with 237 undergraduate students.]

Also, previously on LtU:

Transactional Memory versus Locks - A Comparative Case Study

Despite the fact Tommy McGuire's post mentions Dr. Victor Pankratius's talk was at UT-Austin and the authors of this WDDD'09 paper represent UT-Austin, these are two independent case studies with different programming assignments. The difference in assignments is interesting because it may indicate some statistical noise associated with problem domain complexity (as perceived by the test subjects) and could account for differences between the two studies.

Everyone always likes to talk about usability in programming languages without trying to do it. Some claim it can't even be done, despite the fact Horning and Gannon did work on the subject 3+ decades ago, assessing how one can Language Design to Enhance Program Reliability. This gives a glimpse both on (a) why it is hard (b) how you can still try to do usability testing, rather than determine the truthiness of a language design decision.

Joe Duffy: A (brief) retrospective on transactional memory

A (brief) retrospective on transactional memory, by Joe Duffy, January 3rd, 2010. Although this is a blog post, don't expect to read it all on your lunch break...

The STM.NET incubator project was canceled May 11, 2010, after beginning public life July 27, 2009 at DevLabs. In this blog post, written 4 months prior to its cancellation, Joe Duffy discusses the practical engineering challenges around implementing Software Transactional Memory in .NET. Note: He starts off with a disclaimer that he was not engaged in the STM.NET project past its initial working group phase.

In short, Joe argues, "Throughout, it became abundantly clear that TM, much like generics, was a systemic and platform-wide technology shift. It didn’t require type theory, but the road ahead sure wasn’t going to be easy." The whole blog post deals with how many implementation challenges platform-wide support for STM would be in .NET, including what options were considered. He does not mention Maurice Herlihy's SXM library approach, but refers to Tim Harris's work several times.

There was plenty here that surprised me, especially when you compare Concurrent Haskell's STM implementation to STM.NET design decisions and interesting debates the team had. In Concurrent Haskell, issues Joe raises, like making Console.WriteLine transactional, are delegated to the type system by the very nature of the TVar monad, preventing programmers from writing such wishywashy code. To be honest, this is why I didn't understand what Joe meant by "it didn't require type theory" gambit, since some of the design concerns are mediated in Concurrent Haskell via type theory. On the other hand, based on the pragmatics Joe discusses, and the platform-wide integration with the CLR they were shooting for, reminds me of The Transactional Memory / Garbage Collection Analogy. Joe also wrote a briefer follow-up post, More thoughts on transactional memory, where he talks more about Barbara Liskov's Argus.

Adding Type Constructor Parameterization to Java

Vincent Cremet and Philippe Altherr: Adding Type Constructor Parameterization to Java, JOT vol. 7, no. 5.

We present a generalization of Java’s parametric polymorphism that enables parameterization of classes and methods by type constructors, i.e., functions from types to types. Our extension is formalized as a calculus called FGJω. It is implemented in a prototype compiler and its type system is proven safe and decidable. We describe our extension and motivate its introduction in an object-oriented context through two examples: the definition of generic data-types with binary methods and the definition of generalized algebraic data-types. The Coq proof assistant was used to formalize FGJω and to mechanically check its proof of type safety.

FGJω is a simple extension of (Featherweight) Java's generics, where type parameters may be type constructors (functions from types to types). This very readable paper finally made me understand GADTs.

(Previously: Generics of a Higher Kind on Scala's support for the same idea.)

On Iteration

On Iteration, by Andrei Alexandrescu.

Lisp pioneered forward iteration using singly-linked lists. Later object-oriented container designs often used the Iterator design pattern to offer sequential access using iterators. Though iterators are safe and sensible, their interface prevents definition of flexible, general, and efficient container-independent algorithms. For example, you can't reasonably expect to sort, organize as a binary heap, or even reverse a container by just using its Iterator. At about the same time, C++'s Standard Template Library (STL) defines its own conceptual hierarchy of iterators and shows that container-independent algorithms are possible using that hierarchy. However, STL iterators are marred by lack of safety, difficulty of usage, difficulty of definition, and a very close relationship to C++ that limits adoption by other languages. I propose an API that combines the advantages of Iterator and STL, and I bring evidence that the proposed abstraction is sensible by implementing a superset of STL's algorithms in the D language's standard library.

Previously: Iterators Must Go.

DesignerUnits

One goal in a public release is influence by example. I'd like future software to sport nice measurement units.

A review sequence by depth of interest: overview, worked examples, backgrounder, unit catalogs, QuickStart.nb, and finally DesignerUnits.nb which houses code. Core sections are "Unit Algebra - Productions - Main Algebra" and "Quantity Analysis."

The Structure of Authority: Why security is not a separable concern

The Structure of Authority: Why security is not a separable concern, by Mark S. Miller, Bill Tulloh, and Jonathan Shapiro:

Common programming practice grants excess authority for the sake of functionality; programming principles require least authority for the sake of security. If we practice our principles, we could have both security and functionality. Treating security as a separate concern has not succeeded in bridging the gap between principle and practice, because it operates without knowledge of what constitutes least authority. Only when requests are made -- whether by humans acting through a user interface, or by one object invoking another -- can we determine how much authority is adequate. Without this knowledge, we must provide programs with enough authority to do anything they might be requested to do.

We examine the practice of least authority at four major layers of abstraction -- from humans in an organization down to individual objects within a programming language. We explain the special role of object-capability languages -- such as E or the proposed Oz-E -- in supporting practical least authority.

An important overview of why security properties cannot be an after-thought for any platform, languages and operating systems included. To this end, the paper covers security properties at various granularities from desktop down to object-level granularity, and how object-capabilities provide security properties that are compositional, and permit safely composing mutually suspicious programs.

A recent LtU discussion on achieving security by built-in object-capabilities vs. building security frameworks as libraries reminded me of this paper. Ultimately, the library approach can work assuming side-effects are properly controlled via some mechanism, ie. effect types or monads, but any solution should conform to object capability principles to maintain safe composition.

An example of a capability-secure legacy/library approach is Plash (Principle of Least Authority SHell), which provides object-specific file system name spaces. Any library interface to the file system should mimic this file system virtualization, which effectively pushes side-effect control down to OS-level objects, and which is essential to safely composing mutually suspicious programs that access the file system.

XML feed