Parsing with error recovery?

Are there parser DSLs which allow you to 'nicely' specify error handling? It seems natural with, say, parser combinators, though something that is otherwise more akin to LL(1) would be more interesting.

The context is that some of the most common languages are generally thought of as LL(1), CFGs, etc., yet, in reality, their informal specifications allow some forms of erroneous input and actually define how to handle them. This means they're not really erroneous, but pockets of complication that should be somewhat uniformly handled in an isolated manner.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

The ideal case with error recovery DSL

The ideal case with error recovery is when you do not have to specify it :) and delegate work to the parser framework. But to achieve this, the language should have some regularities in the syntax. But in this case you have to plan for it to happen.

For example parsers in XML editors usually do quite sensible error recovery. The python parser has a good error recovery as well (provided that the source is indented correctly), and error recovery potential is quite good for python.

Generally I think that such "ideal" error recovery is possible when there is a phrase syntax in the language, and to have a language definition mechanism that is of the higher level than EBNF-based (so it rules out unrestricted LL(1)).

This is why have introduced a phrase syntax in ETL. It greatly simplifies error recovery and extensibility of the language. There are also plans for better error recovery in the next versions of the ETL language, for example by automatic detection of statement continuation using keywords and block starts in the statements, but I have my hands full with other tasks.

I've run into this before.

I've run into this before. Really, in the context of error recovery, you want absolutely nothing to screw up your parser: you want to accept as much erroneous input as possible, if only to give proper error messages, but and often to keep compiling beyond to type check and generate code also!

The algorithm I came up with was quite robust: use a precedence parser combined with a brace matcher and "something else" to adjust precedences based on token flow. The technique is mostly bottom up, meaning that it can recover from arbitrary errors. I actually implemented this for my Live Programming version of SuperGlue (the one presented at Onward '07), given that live programming requires really robust error recovery (you never want to stop executing the program). Of course, syntax is a part of this equation: I simplified syntax where needed to make my scheme work, but I didn't have to make too many compromises (the language is very C-like vs. Scheme-like).

I believe (but can't prove) that grammar-based parsing is kind of a dead end I think when it comes to robust parsing. The problem is the top-down specification of the language, rather than having a bunch of small bottom-up rules. Of course, there are also bottom-up grammars, but I was never able to quite figure those out.

You could still go with grammars.

We still could do with grammars, but they have to be higher level than EBNF. And there is a need for phrase syntax as well. Basically having phrase syntax allows for having both at the same time: a kind of top-down grammar, and a set of bottom-up rules at the same time.

This section in ETL tutorial explains the automatic ETL error recovery a bit (nothing have to be coded into grammar to support it). And this AST+source demonstrates error recovery in presence of errors. As you could see, the big part of AST could be built despite of having errors.