Lambda the Ultimate

Packrat Parsing
started 2/13/2004; 12:50:30 PM - last post 2/14/2004; 9:53:33 AM

2/13/2004; 12:50:30 PM (reads: 8605, responses: 3)

Packrat Parsing

Packrat parsing is a novel and practical method for implementing linear-time parsers for grammars defined in Top-Down Parsing Language (TDPL). While TDPL was originally created as a formal model for top-down parsers with backtracking capability, this thesis extends TDPL into a powerful general-purpose notation for describing language syntax, providing a compelling alternative to traditional context-free grammars (CFGs). Common syntactic idioms that cannot be represented concisely in a CFG are easily expressed in TDPL, such as longest-match disambiguation and "syntactic predicates," making it possible to describe the complete lexical and grammatical syntax of a practical programming language in a single TDPL grammar.

There's also an implementation.

From ll1-discuss
Posted to implementation by andrew cooke on 2/13/04; 12:51:26 PM

Andris Birkmanis - Re: Packrat Parsing

2/14/2004; 8:41:49 AM (reads: 400, responses: 1)

In a lazy functional language, packrat parsing requires only ordinary algebraic data types, with no explicit hash tables or other costly lookup structures. A packrat parser implemented in this way is almost identical in style and structure to a conventional recursive-descent parser with backtracking: the packrat parser essentially just uses a different method to "tie up" the mutually recursive calls that are made between the functions comprising the parser...

Another variation on theme of monadic transformers-based parser? Then they are really like their robotic namesakes - being whatever you want them to be to solve the problem at hand :-)

But the phrase about fighting hash lookups with lazy evaluation is even more interesting, I have to read this paper through.

Andris Birkmanis - Re: Packrat Parsing

2/14/2004; 8:58:48 AM (reads: 397, responses: 0)

It just occurred to me that the Haskell parsers I've seen were geared towards TDPG, too, and not to CFG. Whereas the rules of a CFG most directly represent "expansions" from nonterminals to right-hand-side strings, the rules of a PG most directly represent "reductions" from right-hand-side parsing expressions to nonterminals. Furthermore, whereas the expansions expressed in a CFG represent operations on whole strings, the reductions expressed in a TDPL grammar represent operations on prefixes of an input string.

Andris Birkmanis - Re: Packrat Parsing

2/14/2004; 9:53:33 AM (reads: 396, responses: 0)

As a result, the Haskell specification, as with the specifications of most other programming languages in similar situations, simply provide an ambiguous CFG and an informal side-note specifying how a parser for the language should resolve the ambiguity.

Last time I read Declarative Continuations and Categorical Duality I thought I understood how to explicitly express evaluation strategy inside your code, and not as a side-note for compiler to use CBN or CBV, whether with eager coproducts or lazy products, etc. Arguing whether TDPGs are better than CFGs looks similar to preferring CBN over CBV. Why not give the user the control? Why not allow him mixing the paradigms to better approach the problem at hand? Or will it mean replacing a grammar language with a general purpose PL, and very esoteric at that...

Sorry for usurping the thread :-)