archives

The fundamental limitations of parser combinators... and how to fix them.

I am presenting a paper at PADL 2011, in which I discuss the fundamental limitations of current Haskell parser combinator libraries like Parsec and UUParse. The limitations are inherent to the modelling of non-terminal references using direct recursion in a referentially transparent functional language like Haskell.

The basic problem is that because of the model used, an algorithm can only ever "look at" a finite top part of the infinite tree representation of the grammar, and the full grammar can never be observed. If you have ever wondered why libraries like Parsec or UUParse only ever use LL-style algorithms (sometimes with memoization or lazy computation of first sets), it is because only these algorithms can work with such a finite top part of a grammar. Bottom-up algorithms or grammar transformations (which are common in parsing literature and widely used in parser generators) cannot be implemented. Even printing a BNF representation of the grammar is fundamentally impossible.

In my paper, I show how a different representation of the recursion in the grammar allows for a full "view" on the structure of the grammar, while keeping the DSL very shallow. I demonstrate the additional power with implementations of a few grammar algorithms in the accompanying technical report. An elaborate implementation is available in the grammar-combinators library (featuring a packrat parsing algorithm, left-corner grammar transform, many grammar manipulation primitives, basic UUParse and Parsec compatibility etc.).

Details are in the paper. I'm very interested in feedback from the community. For those of you interested in playing with the code, there is a tutorial available.