Validating LR(1) parsers

Validating LR(1) parsers

An LR(1) parser is a finite-state automaton, equipped with a stack, which uses a combination of its current state and one lookahead symbol in order to determine which action to perform next. We present a validator which, when applied to a context-free grammar G and an automaton A, checks that A and G agree. Validating the parser provides the correctness guarantees required by verified compilers and other high-assurance software that involves parsing. The validation process is independent of which technique was used to construct A. The validator is implemented and proved correct using the Coq proof assistant. As an application, we build a formally-verified parser for the C99 language.

I've always been somewhat frustrated, while studying verified compiler technology, that the scope of the effort has generally been limited to ensuring that the AST and the generated code mean the same thing, as important as that obviously is. Not enough attention has been paid, IMHO, to other compiler phases. Parsing: The Solved Problem That Isn't does a good job illuminating some of the conceptual issues that arise in attempting to take parsers seriously as functions that we would like to compose etc. while maintaining some set of properties that hold of the individuals. Perhaps this work can shed some light on possible solutions to some of those issues, in addition to being worthwhile in its own right. Note the pleasing presence of an actual implementation that's been used on the parser of a real-world language, C99.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Most parsers sucks at semantic hooks

Since I have been frustrated as everyone else with the "unsolved parsing problem".
I got to try my own and I would appreciate some feedback: LRTT parser.
Its a single Ocaml functor and has only been summarily tested on an Amd64 Linux box, yet is provided with a working demo.
LRTT means Left to Right Tree Traversal.
(BTW, here is my old account)

Interesting and promising

Interesting and promising. Too bad I can't read OCaml easily, but I'd definitely give a try at translating your solution into JavaScript or C# I'm much more familiar with if I just could find the time.

Two questions, though:

1. do you think a formal proof that it would solve the issues you mention can be contemplated ? That could bring another definitive and spectacular boost to PEG-based parsing techniques, I'd speculate !

2. besides the parsing problem proper, I'm not sure I understand your remark re: the "semantic hooks". As I see it, PEGs can be very friendly in some host languages (e.g., OCaml or JavaScript) as they often allow to implement either a direct evaluation (interpreter) or a deferred one, after a full AST construction, via a compiler's code producer visitor. In either cases, I wrote "friendly", if only because it's usually pretty easy to encode as a DSL embedded in those hosts, as we know. (Provided the host has closures, which helps a lot to make it feasible that way.)

But I believe I'm missing another and more subtle point that you're making in your remark, though (?)

Translating the parser

All you need is
- mutable records for the graph structure.
- garbage collection not to bother with memory management.
- a proper handling of tail recursive calls
I don't know anything about C# but I think most Javascript engines do not deal properly with tail calls, this is an absolute requirement, due to heavy use of mutual recursive functions even the simplest parses will blow up the stack.
I have no idea either how this could be translated to a purely functional language like Haskell, inplace updating of the graph nodes is the very basis of the algorithm.
I am indeed contemplating a formal proof but I am not entirely familiar with this subject domain, it appears that proving properties of pointer programs (not just functional) is a LOT of work.
As for "most parsers suck" I meant that LR semantic callbacks occur only upon reduces and are therefore quite awkward while up to now, and despite some attempts, PEG parsers didn't really supported left recursion.


(wrong sub thread)