
Yacc is dead

In Yacc is dead (2010), Matthew Might and David Darais of the University of Utah, Salt Lake City...

present two novel approaches to parsing context-free languages. The first approach is based on an extension of Brzozowski’s derivative from regular expressions to context-free grammars. The second approach is based on a generalization of the derivative to parser combinators. The payoff of these techniques is a small (less than 250 lines of code), easy-to-implement parsing library capable of parsing arbitrary context-free grammars into lazy parse forests. Implementations for both Scala and Haskell are provided. Preliminary experiments with S-Expressions parsed millions of tokens per second, which suggests this technique is efficient enough for use in practice.

It seems every problem in computer science can be solved with either one more level of indirection or a derivative.
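
For those who haven't seen it, the regular-expression version of the derivative fits in a few lines of Haskell. What follows is my own minimal sketch, not code from the paper; the paper's contribution is extending the idea to context-free grammars and parser combinators.

data Regex
  = Empty              -- the empty language; rejects everything
  | Eps                -- matches only the empty string
  | Chr Char           -- matches a single character
  | Alt Regex Regex    -- union
  | Seq Regex Regex    -- concatenation
  | Star Regex         -- Kleene star

-- Does the language of r contain the empty string?
nullable :: Regex -> Bool
nullable Empty     = False
nullable Eps       = True
nullable (Chr _)   = False
nullable (Alt r s) = nullable r || nullable s
nullable (Seq r s) = nullable r && nullable s
nullable (Star _)  = True

-- The derivative of r with respect to c: the language of all
-- suffixes w such that c:w is in the language of r.
deriv :: Char -> Regex -> Regex
deriv _ Empty     = Empty
deriv _ Eps       = Empty
deriv c (Chr c')  = if c == c' then Eps else Empty
deriv c (Alt r s) = Alt (deriv c r) (deriv c s)
deriv c (Seq r s)
  | nullable r    = Alt (Seq (deriv c r) s) (deriv c s)
  | otherwise     = Seq (deriv c r) s
deriv c (Star r)  = Seq (deriv c r) (Star r)

-- Matching is just repeated derivation: take the derivative with
-- respect to each input character, then check nullability.
matches :: Regex -> String -> Bool
matches r = nullable . foldl (flip deriv) r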

Recursion over the structure of a type

Hi! I am building a system for monadic reference counting in Haskell. A reference is defined as:

class Reference r where
  new  :: a -> St s (r a)
  get  :: r a -> St s a
  incr :: r a -> St s ()
  decr :: r a -> St s ()

The problem is that I would like to automatically inspect the structure of the type a, so that incr and decr also recursively increment and decrement any references contained in a. Any suggestions?
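
One possible shape for an answer, sketched below. The names (SomeRef, HasRefs, incrDeep) are my own inventions, and it assumes the Reference class above is in scope and that St s has a Monad instance:

{-# LANGUAGE ExistentialQuantification, MultiParamTypeClasses,
             FlexibleInstances, FlexibleContexts #-}

-- A reference of any content type, packaged with the evidence
-- needed to keep recursing into its contents.
data SomeRef r = forall b. HasRefs r b => SomeRef (r b)

-- Types declare which references they directly contain.
class HasRefs r a where
  refs :: a -> [SomeRef r]

-- Base types contain no references.
instance HasRefs r Int where
  refs _ = []

-- A pair contains the references of both components.
instance (HasRefs r a, HasRefs r b) => HasRefs r (a, b) where
  refs (x, y) = refs x ++ refs y

-- Deep increment: bump this reference, then recurse into every
-- reference reachable from its contents (decrDeep is symmetric).
incrDeep :: (Reference r, Monad (St s), HasRefs r a)
         => r a -> St s ()
incrDeep ref = do
  incr ref
  a <- get ref
  mapM_ (\(SomeRef inner) -> incrDeep inner) (refs a)

To avoid writing the instances by hand, the usual tricks are SYB (Data.Generics) or GHC.Generics; the class above is just the recursion scheme made explicit.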

Pure and Declarative Syntax Definition: Paradise Lost and Regained, Onward 2010

Pure and Declarative Syntax Definition: Paradise Lost and Regained by Lennart C. L. Kats, Eelco Visser, and Guido Wachsmuth from Delft University of Technology

Syntax definitions are pervasive in modern software systems, and serve as the basis for language processing tools like parsers and compilers. Mainstream parser generators pose restrictions on syntax definitions that follow from their implementation algorithm. They hamper evolution, maintainability, and compositionality of syntax definitions. The pureness and declarativity of syntax definitions is lost. We analyze how these problems arise for different aspects of syntax definitions, discuss their consequences for language engineers, and show how the pure and declarative nature of syntax definitions can be regained.

I haven't compared this version with the Onward 2010 version, but they look essentially the same. It seems timely to post this paper, considering the other recent story Yacc is dead. There is not a whole lot to argue against in this paper, since we all "know" the other approaches aren't as elegant, and we resort to them only for specific reasons such as efficiency. Yet this is the first paper I know of that tries to make that argument directly to software engineers.

For example, the Dragon Book, in every single edition, effectively brushes these topics aside. In particular, the Dragon Book does not even mention scannerless parsing as a technique, and instead only explains the "advantages" of using a scanner. Unfortunately, the authors of this paper don't consider other design proposals either, such as Van Wyk's context-aware scanners from GPCE 2007. Examples like these make me wish the paper were a bit more robust in its analysis; the examples seem focused on the authors' previous work.
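
To make the contrast concrete for readers who haven't seen scannerless parsing: there is no token stream, and lexical syntax is just more grammar. Here is a throwaway sketch in Haskell, using list-of-successes combinators of my own; it has nothing to do with the paper's SDF machinery:

import Data.Char (isAlpha, isSpace)

-- A parser over raw characters; no separate tokenizer phase.
type P a = String -> [(a, String)]

-- Match a literal string, character by character.
lit :: String -> P String
lit w s | w == take (length w) s = [(w, drop (length w) s)]
        | otherwise              = []

-- The "lexical" rule for identifiers is just another parser,
-- living in the same grammar as the context-free structure.
ident :: P String
ident s = case span isAlpha s of
  ([], _)   -> []
  (w, rest) -> [(w, rest)]

-- Whitespace is explicit in a scannerless grammar.
ws :: P ()
ws s = [((), dropWhile isSpace s)]

-- A toy rule: "let" IDENT. Note that keyword reservation must be
-- stated in the grammar itself, and that on input like "letter"
-- this naive rule misfires; declarative formalisms such as SDF
-- add follow restrictions for exactly this reason.
letDecl :: P String
letDecl s0 =
  [ (name, s3)
  | (_, s1)    <- lit "let" s0
  , (_, s2)    <- ws s1
  , (name, s3) <- ident s2
  , name /= "let"
  ]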

If you are not familiar with the authors' previous work in this area, the paper covers it in the references. It includes Martin Bravenboer's work on modular Eclipse IDE support for AspectJ.

Automatic Staged Compilation

Automatic Staged Compilation, the doctoral dissertation of Matthai Philipose:

[...] The past few years have seen the emergence of staged optimization, which produces run-time optimizations that often have much lower run-time overhead than traditional optimizers, yet do not sacrifice any of their functionality. The key to the technique is a method, called staging, to transfer optimization overhead to static compile time from run time. Unfortunately, developing staged variants of individual optimizations has been highly specialized, labor-intensive work; staging pipelines of optimizations even more so.

This dissertation presents a system called the Staged Compilation Framework (SCF), which automatically stages entire pipelines of compiler optimizations at arguably little additional engineering cost beyond building the slower traditional version of the pipeline. SCF harnesses two powerful but traditionally difficult-to-use techniques, partial evaluation and dead-store elimination, to achieve staging. An implementation of SCF shows that staged compilation can speed up pipelines of classical compiler optimizations by up to an order of magnitude, and more commonly by a factor of 4.5 to 5.

I haven't read through it all yet, but after a cursory skim it certainly looks interesting.
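
The staging idea itself is easy to illustrate, though. Below is a toy example of my own in Haskell, showing partial evaluation with respect to an early-known input; it is only an intuition pump, not SCF, which stages entire optimization pipelines:

-- Unstaged: the recursion on n runs on every call.
power :: Int -> Int -> Int
power 0 _ = 1
power n x = x * power (n - 1) x

-- Staged: consume n first and return a specialized function.
-- The recursion on n happens once, when the closure is built,
-- leaving only the multiplications for the late stage.
powerStaged :: Int -> (Int -> Int)
powerStaged 0 = const 1
powerStaged n = let rest = powerStaged (n - 1)
                in \x -> x * rest x

main :: IO ()
main = do
  let cube = powerStaged 3   -- early stage: specialize to n = 3
  print (map cube [1, 2, 3]) -- late stage: prints [1,8,27]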