The Design and Implementation of a Dataflow Language for Scriptable Debugging

The Design and Implementation of a Dataflow Language for Scriptable Debugging, Guillaume Marceau, Gregory H. Cooper, Jonathan P. Spiro, Shriram Krishnamurthi, and Steven P. Reiss.

Debugging is a laborious, manual activity that often involves the repetition of common operations. Ideally, users should be able to describe these repetitious operations as little programs. Debuggers should therefore be programmable, or scriptable. The operating environment of these scripts, however, imposes interesting design challenges on the programming language in which these scripts are written.

This paper presents our design of a language for scripting debuggers. The language offers powerful primitives that can precisely and concisely capture many important debugging and comprehension metaphors. The paper also describes a pair of debuggers, one for Java and the other for Scheme, built in accordance with these principles. The paper includes concrete examples of applying this debugger to programs.

We've seen a paper on compiling dataflow languages, so here's one on an interesting application.

Shape analysis for composite data structures

Shape analysis for composite data structures. MSR-TR-2007-13.

We propose a shape analysis that adapts to some of the complex composite data structures found in industrial systems-level programs. Examples of such data structures include “cyclic doubly-linked lists of acyclic doubly-linked lists”, “singly-linked lists of cyclic doubly-linked lists with back-pointers to head nodes”, etc. The analysis introduces the use of generic higher-order inductive predicates describing spatial relationships together with a method of synthesizing new parametrized spatial predicates which can be used in combination with the higher-order predicates. In order to evaluate the proposed approach for realistic programs we have implemented the analysis and performed experiments with it on examples drawn from device drivers. During our experiments the analysis proved the memory safety of several routines belonging to an IEEE 1394 (firewire) driver, and also found several previously unknown memory safety bugs and memory leaks. To our knowledge this represents the first known successful application of shape analysis to non-trivial systems-level code.

Seems relevant to some of the discussions currently going on (e.g., how PLT can help practitioners).

The analysis described in this paper fits into the common structure of shape analyses and is based on abstract interpretation.
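
For readers who haven't seen such analyses: the abstract states in this line of work are separation-logic formulas built from inductive spatial predicates. As a much simpler, hand-picked illustration (the standard list-segment predicate, not the paper's higher-order, parameterized ones), a chain of cells from x to y can be defined by

    ls(x, y) \iff (x = y \wedge \mathsf{emp}) \vee (\exists z.\; x \mapsto z \ast ls(z, y))

The paper's predicates are, roughly, this idea made generic in the shape of each node and in the nested structure it points to, with new parametrized predicates synthesized during the analysis.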

Lowering: A Static Optimization Technique for Transparent Functional Reactivity

Lowering: A Static Optimization Technique for Transparent Functional Reactivity, Kimberley Burchett, Gregory H. Cooper, and Shriram Krishnamurthi. PEPM 2007.

Functional Reactive Programming (FRP) extends traditional functional programming with dataflow evaluation, making it possible to write interactive programs in a declarative style. An FRP language creates a dynamic graph of data dependencies and reacts to changes by propagating updates through the graph. In a transparent FRP language, the primitive operators are implicitly lifted, so they construct graph nodes when they are applied to time-varying values. This model has some attractive properties, but it tends to produce a large graph that is costly to maintain. In this paper, we develop a transformation we call lowering, which improves performance by reducing the size of the graph. We present a static analysis that guides the sound application of this optimization, and we present benchmark results that demonstrate dramatic improvements in both speed and memory usage for real programs.
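
A hand-made sketch of what lowering does to a lifted expression (my own, not the paper's code; Behavior, lift2, and lift3 are stand-in names rather than FrTime's actual API):

    -- A time-varying value, modelled here in the simplest pull-based way;
    -- a real transparent FRP system would build explicit dependency-graph
    -- nodes instead.
    newtype Behavior a = Behavior { sample :: IO a }

    lift2 :: (a -> b -> c) -> Behavior a -> Behavior b -> Behavior c
    lift2 f x y = Behavior (f <$> sample x <*> sample y)

    lift3 :: (a -> b -> c -> d) -> Behavior a -> Behavior b -> Behavior c -> Behavior d
    lift3 f x y z = Behavior (f <$> sample x <*> sample y <*> sample z)

    -- As a transparent FRP front end would elaborate it: each lifted
    -- primitive ((*) and (+)) becomes its own node.
    cost :: Behavior Double -> Behavior Double -> Behavior Double -> Behavior Double
    cost price qty shipping = lift2 (+) (lift2 (*) price qty) shipping

    -- After lowering: the two lifted applications are merged into one node
    -- whose update function does all of the arithmetic at once.
    costLowered :: Behavior Double -> Behavior Double -> Behavior Double -> Behavior Double
    costLowered price qty shipping = lift3 (\p q s -> p * q + s) price qty shipping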

Whenever I read about compiler optimizations, I try (with mixed success) to relate them to transformations in the lambda calculus. I haven't managed to figure out what's going on with the dip construct the authors propose, but I would guess that the place to look is the proof theory of the necessity operator in modal logic -- dataflow programming can be seen as a kind of stream programming, streams form a comonad over the lambda calculus, and comonads give semantics to the modal necessity (box) operator.
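
For what it's worth, here is the stream comonad just mentioned, as a minimal Haskell sketch (my own, not anything from the paper):

    -- Infinite streams: a dataflow value observed at every instant.
    data Stream a = Cons a (Stream a)

    -- extract reads the current value; duplicate turns a stream into the
    -- stream of all of its "futures", one per instant.
    extract :: Stream a -> a
    extract (Cons x _) = x

    duplicate :: Stream a -> Stream (Stream a)
    duplicate s@(Cons _ xs) = Cons s (duplicate xs)

    -- extend applies a context-aware function at every instant, which is
    -- one way to read "lifting" an operator over time-varying values.
    extend :: (Stream a -> b) -> Stream a -> Stream b
    extend f s@(Cons _ xs) = Cons (f s) (extend f xs)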

Threads in JavaScript?

Threads in JavaScript? "Over my dead body," says Brendan.

But Neil Mix begs to differ -- they're already there!

Neil's latest blog post presents a cool hack combining JavaScript 1.7's generators with trampolined style to implement very lightweight cooperative threads.

The implementation weighs in at a breathtakingly small 4k.
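
Here is the same trampolining idea sketched in Haskell rather than Neil's JavaScript (the names and the round-robin scheduler below are my own invention): a thread is either finished or one step of work that yields the rest of the thread, and the scheduler bounces between those residues.

    -- A thread is either done, or one step of work that returns the rest.
    data Thread = Done | Step (IO Thread)

    counter :: String -> Int -> Thread
    counter _    0 = Done
    counter name n = Step $ do
      putStrLn (name ++ ": " ++ show n)
      pure (counter name (n - 1))

    -- Round-robin scheduler: run one step of the first live thread, then
    -- move it to the back of the queue.
    schedule :: [Thread] -> IO ()
    schedule []             = pure ()
    schedule (Done    : ts) = schedule ts
    schedule (Step io : ts) = do
      rest <- io
      schedule (ts ++ [rest])

    main :: IO ()
    main = schedule [counter "a" 3, counter "b" 2]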

Regular Expression Matching Can Be Simple And Fast

With Peter's observation that everything good in Computer Science happened during the "Golden Age" freshly in mind, I found Russ Cox's recent article on regular expressions to be enjoyable reading.

This is a tale of two approaches to regular expression matching. One of them is in widespread use in the standard interpreters for many languages, including Perl. The other is used only in a few places, notably most implementations of awk and grep. The two approaches have wildly different performance characteristics ... The trends shown in the graph continue: the Thompson NFA handles a 100-character string in under 200 microseconds, while Perl would require over 10^15 years.

Combining implementation details, finite automata, and a foray into decades-old theory, this article shows how most of our favorite little languages have an enormous performance bottleneck for certain categories of string comparisons.
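
To make the contrast concrete: a backtracking matcher explores alternatives one at a time, so a pattern like a?a?a?aaa matched against "aaa" revisits exponentially many choices as it grows, while the Thompson approach tracks the whole set of NFA states at once. A rough Haskell sketch of that set-of-states simulation (my own, not Cox's code; it omits regex compilation and epsilon transitions):

    import qualified Data.Set as Set

    -- An NFA over states q: start states, an accepting test, and a
    -- transition relation on characters (epsilon moves assumed eliminated).
    data NFA q = NFA
      { start  :: Set.Set q
      , accept :: q -> Bool
      , step   :: q -> Char -> Set.Set q
      }

    -- Advance every possible state in lockstep: O(input length * number of
    -- states), with no backtracking, no matter how the pattern is written.
    match :: Ord q => NFA q -> String -> Bool
    match nfa = any (accept nfa) . Set.toList . foldl advance (start nfa)
      where
        advance states c = Set.unions [ step nfa q c | q <- Set.toList states ]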

An additional data point: The Shootout benchmarks have a large string comparison test. It's interesting that Tcl is at the top of the heap for performance. Guess which one is using the Thompson NFA algorithm for regular expressions?

Lightweight Fusion by Fixed Point Promotion

Lightweight Fusion by Fixed Point Promotion, Atsushi Ohori and Isao Sasano.

This paper proposes a lightweight fusion method for general recursive function definitions. Compared with existing proposals, our method has several significant practical features: it works for general recursive functions on general algebraic data types; it does not produce extra runtime overhead (except for possible code size increase due to the success of fusion); and it is readily incorporated in standard inlining optimization. This is achieved by extending the ordinary inlining process with a new fusion law that transforms a term of the form f o (fix g.λx.E) to a new fixed point term fix h.λx.E' by promoting the function f through the fixed point operator.

This is a sound syntactic transformation rule that is not sensitive to the types of f and g. This property makes our method applicable to a wide range of functions, including those with multiple parameters in both curried and uncurried forms. Although this method does not guarantee any form of completeness, it fuses typical examples discussed in the literature and others that involve accumulating parameters, either in the foldl-like specific forms or in general recursive forms, without any additional machinery.

In order to substantiate our claim, we have implemented our method in a compiler. Although it is preliminary, it demonstrates the practical feasibility of this method.

Deforestation is one of those optimizations that every functional programmer who has ever had to rewrite a beautiful composition of maps and filters into an evil, ugly explicit fold has longed for. Unfortunately, the standard lightweight fusion algorithms have trouble with examples as simple as foldl, and this paper gives a very nice account of a simple algorithm that can handle it.
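
A hand-worked instance of the kind of result the fusion law produces (my own example, not taken from the paper): promoting the consumer through the fixed point that defines the producer turns f o (fix g) into a single recursive function with no intermediate list.

    square :: Int -> Int
    square x = x * x

    -- Before: sum composed with (the fixed point defining) map square;
    -- the intermediate list is built only to be consumed immediately.
    sumSquares :: [Int] -> Int
    sumSquares = sum . map square

    -- After fusion: one recursive function, no intermediate list.
    sumSquaresFused :: [Int] -> Int
    sumSquaresFused []       = 0
    sumSquaresFused (x : xs) = square x + sumSquaresFused xs

The paper's contribution is doing this mechanically, as part of ordinary inlining, including for accumulating (foldl-style) definitions that trip up the usual shortcut-fusion rules.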

Gradual Typing for Objects

Gradual Typing for Objects. Jeremy Siek and Walid Taha.

Static and dynamic type systems have well-known strengths and weaknesses. In previous work we developed a gradual type system for a functional calculus named [...]. Gradual typing provides the benefits of both static and dynamic checking in a single language by allowing the programmer to control whether a portion of the program is type checked at compile-time or run-time by adding or removing type annotations on variables. Several object-oriented scripting languages are preparing to add static checking. To support that work this paper develops [another calculus], a gradual type system for object-based languages, extending the [...] calculus of Abadi and Cardelli. Our primary contribution is to show that gradual typing and subtyping are orthogonal and can be combined in a principled fashion. We also develop a small-step semantics, provide a machine-checked proof of type safety, and improve the space efficiency of higher-order casts.

The authors' previous work on gradual typing was discussed here. This brings it to an object-oriented setting, which (as the abstract points out) is very directly applicable to mainstream scripting languages, at least in principle.

[Edit: This is from the types list, where the authors also added: "We will present the paper at ECOOP 2007 and would be especially interested in any feedback on the paper before the final submission is due on April 25."]
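
On the last point in the abstract, a hedged illustration (not the paper's calculus) of why higher-order casts need care: a cast at a function type cannot be checked on the spot, so it is deferred by wrapping the function, casting the argument contravariantly and the result covariantly at each call.

    -- Wrap a function so its argument and result are cast at every
    -- application; in a gradual language such wrappers are inserted at the
    -- boundary between typed and untyped code.
    castFun :: (a2 -> a1) -> (b1 -> b2) -> (a1 -> b1) -> (a2 -> b2)
    castFun castDom castCod f = castCod . f . castDom

Naively, every crossing of the boundary adds another wrapper, which is the space problem the abstract's last contribution addresses.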

Almost everything happened in the Golden Age, right?

When writing CTM, I was struck by how many of the good ideas in programming languages were discovered early on. The decade 1964-1974 seems to have been a "Golden Age": most of the good ideas of programming languages appeared then. For example:

  • Functional programming: Landin's SECD machine (1964)
  • Object-oriented programming: Dahl and Nygaard's Simula (1966)
  • Axiomatic semantics: Hoare (1969)
  • Logic programming: Elcock's Absys (1965), Colmerauer's Prolog (1972)
  • Backtracking: Floyd (1967)
  • Capability security: Dennis and Van Horn (1965)
  • Declarative concurrency: Kahn (1974)
  • Message-passing concurrency: Hewitt's Actor model (1973)
  • Shared-state concurrency: Hoare's monitors (1974)
  • Software engineering: Brooks's mythical man-month (1974)

It is a sobering thought that not much new stuff has come along since then: Hindley-Milner type inference in 1978, constraint programming in 1980, CCS (the precursor of the pi-calculus) in 1980. What revolutionary new ideas have come since 1980? Most of the work since then seems to have been consolidation and integration (combining the power of the different ideas). Right?

50 years of “Syntactic Structures”

It seems that Chomsky’s Syntactic Structures was published fifty years ago this month. Unquestionably a historic event as far as linguistics and cognitive science are concerned.

Just the other day I had a conversation with a (non-CS) colleague about Chomsky. My friend argued that Chomsky is regarded as very important for CS. Without going into a full historical analysis, I argued that most CS people are unaware of Chomsky, or at least don't see him as all that crucial to the discipline. A CS undergrad probably encounters Chomsky when learning about the Chomsky hierarchy in a formal languages course, which deals mostly with regular and context-free languages, the lower rungs of the hierarchy.

A more cautious historical analysis will show how Chomsky influenced CS, as well as much else, but I think my assessment of the way most computer scientists view Chomsky's impact on the field is fairly accurate. Still, many of us are interested in languages in general, as well as programming languages, and in the interaction between PLT and linguistics. So I think LtU is a good place to mention the fiftieth anniversary of Syntactic Structures, surely a landmark affair in 20th century science.

The Missing Link - Dynamic Components for ML

The Missing Link - Dynamic Components for ML, Andreas Rossberg. ICFP 2006.

Despite its powerful module system, ML has not yet evolved for the modern world of dynamic and open modular programming, to which more primitive languages have adapted better so far. We present the design and semantics of a simple yet expressive first-class component system for ML. It provides dynamic linking in a type-safe and type-flexible manner, and allows selective execution in sandboxes. The system is defined solely by reduction to higher-order modules plus an extension with simple module-level dynamics, which we call packages. To represent components outside processes we employ generic pickling. We give a module calculus formalising the semantics of packages and pickling.

This is a very nice paper showing how to integrate dynamic loading into the ML module system. Er, I guess I'm repeating the abstract. Beyond the feature itself, I thought this paper was also a good demonstration of how to put the Dreyer-Crary-Harper account of ML modules to work.
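
As a very loose analogue (Haskell's Data.Dynamic, acting on plain values rather than whole modules, and not the paper's packages), the flavour of module-level dynamics is that something is sealed together with a runtime representation of its type, and unpacking checks that representation before handing back a statically typed value:

    import Data.Dynamic

    -- Seal a value together with a runtime representation of its type.
    sealed :: Dynamic
    sealed = toDyn (reverse :: String -> String)

    -- Unpacking succeeds only if the expected type matches the sealed one.
    unpackOk :: Maybe (String -> String)
    unpackOk = fromDynamic sealed      -- Just reverse

    unpackBad :: Maybe (Int -> Int)
    unpackBad = fromDynamic sealed     -- Nothing: the run-time check fails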