archives

Guppy: trying to make parsing simple, fun, and powerful

I've been working on designing a parser definition language called Guppy. I have not implemented it yet, as the syntax and semantics of it are a moving target subject to my whims. My knowledge and insight is limited, so I want to discuss the design of Guppy with a community.

Edit: I read the policies document after posting rather than before (this is my first post). Rather than having a mega-post here, I delegated most of its content to my blog.

In a nutshell, Guppy's mission statement is to make parsing simple, fun, and powerful. My ideas for its feature set are in flux; the list currently includes:

  • Syntax strongly influenced by traditional regular expressions
  • Indentation-based syntax
  • Ability to nest production rules
  • Ability to isolate sets of production rules (e.g. a set for the "parser", and a set for the "lexer").

See here for an in-depth description of my ideas so far along with snippets of Guppy code.

To start discussion, I'll introduce the biggest fundamental problem I'm currently faced with: should ambiguous grammars be allowed in Guppy? Consider the following:

expr:
    number
    identifier
    ( expr )
    expr '*' expr
    expr '/' expr
    expr '+' expr
    expr '-' expr

I think this is the most obvious way to define a grammar for basic arithmetic, but there are a couple problems. First of all, the parser doesn't know the order of operations here, and telling Guppy the order of operations might require non-obvious syntax. Second of all, the grammar really isn't correct. One could produce "1+2*3+4" by following expr '*' expr first; that would be an incorrect derivation.

The problem is that allowing ambiguous grammars can make writing grammars easy, but it can result in a less intuitive syntax and violations in correctness. How might I resolve this problem elegantly?