Generating useful errors in a functional stream parser

I don't know if this post strays too far from programming language theory for this forum, so please let me know if I'm criminally Off Topic. That said, I'm writing a parser library for C++ which uses a limited form of backtracking that resembles functional stream parsing. I find this parse strategy to be elegant and powerful, but it makes locating source of parse errors seemingly impossible. Does anyone know how to generate specific/useful error information in a functional stream parser?

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Slightly off topic

This is indeed not quite language related. Let's try to keep the discussion focused on links to relevant resources.

Out of interest, would you

Out of interest, would you consider significant parsing and/or grammar problems on-topic? I'm thinking more "is parsing this decidable?" than "how do I implement this technique".

Sure, but also just

Sure, but also just slightly. We usually don't discuss parsing and front-end compilation techniques. We also try not to talk about techniques that are language specific. Of course if something is particularly worthwhile, or is an example of a construct that is of general use, that the discussion is on topic.

Notice that I said "slightly off topic". I didn't make an admin decision, just mentioned my opinion. The community may find this very much on topic, contrary to my feeling. That would be ok, of course.

I'll answer nevertheless.

I'll answer nevertheless.

We compute maximal reached position in alternating combinator:

(P p) ||| (P q) = P $ \s othermaxpos -> let
        (maxposP,rP) = p s othermaxpos
        (maxposQ,rQ) = q s othermaxpos
    in
        (max maxposP maxposQ,rP++rQ)

We should do that in item combinator also.

explicitly reified error nodes

viergroupie: Does anyone know how to generate specific/useful error information in a functional stream parser?

Just reify errors in output. Then the errors will dataflow to wherever a handler downstream can reach an end user.

I generate explicit error nodes for things a parser doesn't like, so they end up in the AST along with good nodes. (Then if you try to compile or interpret the AST, the error nodes complain; if you just print the AST, users will see what's wrong.)

When I'm trying to be thorough, error nodes include byte ranges of "bad" content in addition to line numbers, etc. Such error nodes encode "for this range of bytes, I'm unhappy for this specific reason" and are basically a fold of the erroneous input.

Instead of making error states illegal, give them a representation so all input content maps to something, getting closer to a total function of the input domain. It's more work, but it's proportional to additional usefulness of a parser under error inputs.

I don't know if finding errors exactly is possible

It could be that the point where an error is reached is not the site where the user made a mistake. For example take the following grammer:

address := name "\n" street "\n" city "\n" zip

where name can include all alphanumerics but cannot be empty. Then parse an address without the name. What will happen is the parser will interpret the street address as a name, and choke on the city as it is not a number. The error is forgetting the name, but it the parser will think it is because the house number is a city name. If we give an AST it might make it a bit more recognizable, but you could still have very nasty trees to look through for a mistake.