Operator precedence

Getting an expression grammar right is a core part of any language. Simple two-level (mult binds tighter than add) examples abound in texts and such. But how far should it be taken?

I'm looking at a grammar now (the lamented - by me - HyperTalk) that has ten levels. Is that going overboard? In particular, what do you get by separating equalities from inequalities like this:

( ) // bind tightest
unary operators
multiplicative
additive
inequality
equality
and
or // bind most globally

I'm constructing a grammar in which the equalities and inequalities have the same level of binding, i.e. it's left to associativity to sort out who goes first. Is there a gotcha waiting for me?

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

I suspect 1 == 2 != 3 == 4

I suspect 1 == 2 != 3 == 4 won't do what most people expect if you leave it to associativity.

Honestly, precedence isn't that big a deal - it's a little tedious to put together admittedly, but we're used to it. I remember a talk on operators in Agda where precedence was a DAG though, which is makes some sense if you've got lots'n'lots'n'lots of operators the way Agda and Haskell do. Haskell has around 10 levels counting parens, that doesn't cause a great deal of trouble.

Operator precedence

Operator precedence techniques, like the Pratt parser/top-down operator precedence, generally use a simple numbering scheme, which ultimately requires the developer to build and compile the precedence DAG manually while assigning precedences in the grammar. The impression I got from the Agda paper was that they automated this step, so all you need is to declare an operator's precedence relative to at least one other operator.

I think you could mimic this approach with a simple a combinator library where you declare precedence relationships, and it builds this DAG for you. Given the DAG, which is a total order on operator precedences, you can compile a list of (operator symbol, precedence number) pairs usable in something like a Pratt parser.

Ten is nothing.

C++ has 18 levels. http://www.cppreference.com/wiki/operator_precedence. I don't find it overboard, but I've been programming in it so long, I'm probably brainwashed.

10 == 0?

Well it's a good thing that at least there is a clear & defined precedence. I shudder to think of writing a C++ compiler. Thank goddess for gcc, which gives us C, C++, Obj-C, and even (bane or boon?) Obj-C++.

You may be brainwashed, but that could also be a good thing because it takes a special immersion to be (and stay) sharp at C++. We're not all Herb Sutter.

1 == 2 != 3 == 4

Ha, Philippa, that's exactly the sort of thing I suspected I was overlooking. Thank you.

Well, my language is (so far) as permissive as possible, allowing all combinations of operators and types. (Though some combinations are just silly.. well they have silly defaults. But no runtime type error.)

It's an LL(1) recursive descent, but I reserve the right to peek ahead a token if I d*rn well please. Keeps the grammar from being over-factored...

Thanks all!

Depends on your users

This is a 'human' issue not a PLT issue but so who are you the users you're targetting?

Beginners? So the right number is of operator's precedence is "just enough so that mathematics expression are correct", make the user explain what he wants after this (ie make it an error if there's no parenthesis), otherwise users will 'guess' often incorrectly.

Language X programmers? Then you need to reproduce the language X level of precedence..

Target audience

I do intend the language (provisionally called STalk for SimpleTalk) for beginners, or at least those who wouldn't know a dangling pointer from a third leg. An embedded scripting language inside a simple toolkit for making buttons and windows. Like HyperCard, Runtime Rev, TileStack, etc.

Now struggling over whether to extend the "put into" command to support "put before" and "put after" or keep it mono-purposed and add "prepend" and "append" commands. (Either way, gives the user the semantics of "in place modification" even if internally the interpreter implementes it like Static Single Assignment.) Have already bit the bullet and kept separate the concept of command and expression... :-)

But I know one thing: whatever I choose, someone will find it abominable.