Languages without operator precedence

(If this has already been discussed here, and I assume it probably has although I haven't been able to find such a discussion by search, please be so kind as to point me in the right direction.)

It seems like most languages come up against the operator precedence problem and take one of these two choices:

1. Make an operator precedence table. (C and many, many others)

pro: assigns a meaning to unparenthesized expressions, so 2 + 3 * 7 = 23 (although if you really like 35 or dislike the normal rules, you can have that interpretation too)
con: readers have to know the precedence table to mentally arrive at the same parse tree that the parser does

2. Forget the whole thing, and always make left (or right) deep ASTs. (lisp, APL, many others)

pro: easy to remember
con: you still need to know that this is the rule when coming across the language for the first time

I'm wondering about the (less-explored?) third option, and would appreciate information about languages that have taken it, or papers exploring it, or just peoples' thoughts:

3. Forget the whole thing, and _always require explicit grouping_ (at least when properties of the operators in question can't be used to arrive at a "don't care" conclusion).

pro: easy to remember, and easy to mentally parse even if you are unfamiliar with the rules
pro: enforces the (common?) practice of using grouping to communicate intent
con: you have to write either 2 + (3 * 7) or (2 + 3) * 7 (although potentially a clever editor could offer you a choice of candidate interpretations when what you write is ambiguous)

Note that we can use the associativity of + to allow expressions like 2 + 3 + 7 because we are indifferent between (2 + 3) + 7 and 2 + (3 + 7). Similar observations apply to a number of other common operators.

Note also that this doesn't completely eliminate operator precedence, since we still need ( ) to bind more tightly than +, for example. This looks like a matter of degree more than an iron-clad rule, so I'm hoping to avoid stepping in a bees' nest here :). Some might argue to relax the rules even further and allow tight bindings for unary negation, which I think makes good sense, since -x + 2 might not be likely in practice to be confused with -(x + 2).
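To make the idea concrete, here is a minimal sketch (in Python, for no language in particular) of an option-3 parser: it requires explicit grouping, but permits unparenthesized chains of a single associative operator like 2 + 3 + 7, since there we are indifferent to the nesting. The operator set and the `ASSOCIATIVE` whitelist are illustrative assumptions, not any real language's rules.

```python
import re

ASSOCIATIVE = {"+", "*"}  # operators we treat as "don't care" when chained

def tokenize(src):
    return re.findall(r"\d+|[()+\-*/]", src)

def parse(tokens):
    tree, rest = parse_expr(tokens)
    if rest:
        raise SyntaxError(f"trailing input: {rest}")
    return tree

def parse_expr(tokens):
    left, tokens = parse_term(tokens)
    chain_op = None
    while tokens and tokens[0] in "+-*/":
        op = tokens[0]
        if chain_op is None:
            chain_op = op
        elif op != chain_op or op not in ASSOCIATIVE:
            # mixing operators (or chaining a non-associative one like -)
            # is ambiguous without explicit grouping, so we refuse to guess
            raise SyntaxError(f"'{chain_op}' and '{op}' need explicit parentheses")
        right, tokens = parse_term(tokens[1:])
        left = (op, left, right)  # associative, so left-nesting is harmless
    return left, tokens

def parse_term(tokens):
    if not tokens:
        raise SyntaxError("unexpected end of input")
    if tokens[0] == "(":
        inner, rest = parse_expr(tokens[1:])
        if not rest or rest[0] != ")":
            raise SyntaxError("missing ')'")
        return inner, rest[1:]
    return int(tokens[0]), tokens[1:]
```

So parse(tokenize("2 + 3 + 7")) succeeds, while parse(tokenize("2 + 3 * 7")) is rejected until the programmer writes 2 + (3 * 7) or (2 + 3) * 7.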

It's possible to write an LALR (and possibly simpler?) grammar that deals with all of this. I'm still trying to wrap my head around techniques for generating sensible errors from parsing failures, so I'm not sure of the extent to which it is possible to give intelligible error messages.

Is this idea hopeless? Misguided? Already in 17 languages I should have heard of? All of the above?


I'm wondering about the

I'm wondering about the (less-explored?) third option, [...]

3. Forget the whole thing, and _always require explicit grouping_ (at least when properties of the operators in question can't be used to arrive at a "don't care" conclusion).

It seems to me that Lisp and Scheme fall into this third category, not the second one, since they require proper grouping *everywhere*. Seems pretty explored to me! :-)

Good point

Interesting point, that approach is at both ends of the spectrum between 2 and 3.

Although Lisp also applies tighter binding for backquotes, and creates right-deep ASTs for implicit sequences of applications of cons when lists are formed by juxtaposition and don't explicitly (edit: i had said implicitly) mention the . operator. (I'm not a Lisp language lawyer, this is just my understanding from a course a few years back. I could be either entirely wrong or wrong on some technicality here.)

Smalltalk, OTOH

always groups operators to the left

The expression 2 + 3 * 5 is 25 in Smalltalk, not 17 as it would be in C or Java.

Smalltalk doesn't have operators

I would argue that Smalltalk doesn't have operators. The above expression means "Send the message '+' with one parameter '3' to the object '2'; then send the message '*' with one parameter '5' to the result."

That's actually another way of dealing with operator precedence: denial. Without operators there's no operator precedence.

What are operators?

The same could be said about Haskell: x + y means calling the function + on x and y. But there is a difference between regular functions and operators: the syntax.

What are operators? (part 2)

In Scala, like Smalltalk, 3 + 2 is sending a + message to a 3 object. In .() notation, it's the equivalent of 3.+(2). And + is an ordinary method name - you're free to name your own methods +.

Nonetheless, Scala has precedence for messages that are named as "operators". 2 + 3 * 7 = 23 translates to 2.+(3.*(7)).

While I'm at it, messages that end with : are messages sent to the argument on the right. For instance, List has a :: method, which in good functional tradition is pronounced "cons." Because of this right-binding rule, 1::2::3::Nil translates to Nil.::(3).::(2).::(1).

There are even ways to create some unary methods so that !a is just a message sent to a.

It's not quite as flexible as Haskell which allows you to specify your own precedence and fixity, but it's pretty nice and illustrates another way a language can have precedence without really having operators.

XML DTDs and RELAX NG compact schemas use #3

Unlike the Lisps, they don't require full parentheses, but they do require parentheses where two operators meet. Thus A, B, C is fine, and so is A | B | C, but A, B | C is not: you must write (A, B) | C or A, (B | C) as the case may be. (Comma is sequence, vertical bar is alternation.)
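The rule is simple enough to check mechanically: within any one parenthesis level, only one connector may appear. A rough sketch of such a validator (the function name and string-scanning approach are mine, not anything from the RELAX NG tooling):

```python
def check_content_model(model):
    """Accept a content model only if each parenthesis level mixes
    at most one of ',' (sequence) and '|' (alternation)."""
    levels = [set()]  # connectors seen at each nesting depth
    for ch in model:
        if ch == "(":
            levels.append(set())
        elif ch == ")":
            levels.pop()
        elif ch in ",|":
            levels[-1].add(ch)
            if len(levels[-1]) > 1:
                return False  # two different connectors met without parens
    return True
```

So "A, B, C" and "A | B | C" pass, "A, B | C" fails, and both disambiguations "(A, B) | C" and "A, (B | C)" pass.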

I knew there was a reason I should branch out from xml-schema

I knew there was a reason I should stop using xml-schema all the time and learn a few more of these document structure languages.

Interesting example of #3, as opposed to the #1 choice made by a bunch of regular expression dialects (among them Posix and Perl) in essentially the same problem space. The RELAX NG grammar uses the same approach I had in mind, so that is comforting that I'm not totally off-base.

A couple other possibilities

not mentioned.

4) Make operator precedence (for new ops) user-definable. Prolog is one example in which you can define tokens to be operators, with a given fixity and precedence. (In all such languages I'm familiar with, though, precedence is specified by some arbitrary number; I don't know of any that allow use of constraints to specify precedence--i.e. "* is higher than +", and have the compiler work it out and fail if there are any cycles).
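The constraint variant is easy to sketch: derive numeric levels from the stated relative facts and fail on cycles. This is a hypothetical design, not Prolog's mechanism; the function and its pair format are invented for illustration.

```python
def precedence_levels(tighter_than):
    """tighter_than: (hi, lo) pairs meaning hi binds tighter than lo.
    Returns a dict op -> level, where a smaller level binds tighter.
    Raises ValueError if the constraints contain a cycle."""
    ops = {op for pair in tighter_than for op in pair}
    tighter = {op: set() for op in ops}
    for hi, lo in tighter_than:
        tighter[lo].add(hi)  # lo's level depends on hi's level

    levels, visiting = {}, set()

    def level(op):
        if op in levels:
            return levels[op]
        if op in visiting:
            raise ValueError(f"cycle in precedence constraints at '{op}'")
        visiting.add(op)
        # one level looser than the tightest-binding ops above it
        levels[op] = 1 + max((level(t) for t in tighter[op]), default=0)
        visiting.discard(op)
        return levels[op]

    for op in ops:
        level(op)
    return levels
```

Given [("*", "+"), ("^", "*")] the compiler would work out ^ tighter than * tighter than +, while [("*", "+"), ("+", "*")] would be rejected as cyclic.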

Also, I'm not sure s-expressions, other parenthetical forms, or any other fully-bracketed construct, should be considered as an "operator" for purposes of this discussion. The ambiguity only comes with unbracketed syntactic forms, such as the human notational conventions for arithmetic which find themselves duplicated in programming languages.

My language, Kayia, does it this way

The language/method I developed does it this way. There is default precedence but you can modify it. Let me know if you're interested in looking at it.

occam

As it turns out, option 3 is exactly what was implemented in occam. Quoting from the occam 2.1 reference manual:

There is no operator precedence and so the hierarchical structure of a large expression must be defined by
parentheses.

I can recall this behavior being quite irritating when I first encountered it, because I was so used to languages with precedence. But you get used to it fairly quickly, and it does have the advantage of preventing you from introducing simple arithmetic errors caused by not fully thinking through the precedence rules for an expression.

A sixth option: relative precedence

A sixth option is implemented in Fortress: relative operator precedence. Operators in Fortress have precedence, but there is no total ordering, no complete precedence table. Instead, operators only have relative precedence, and only if they are actually commonly used that way in scientific publications.

Mixing operators that don't have defined relative precedence is a compile error; in that case parentheses must be used.

Also, Fortress is whitespace sensitive, but whitespace can only be used to emphasize existing implicit grouping, never to override it. So, 2+3*5 is okay, 2 + 3 * 5 is okay, and 2 + 3*5 is okay because it only emphasizes the already-present precedence, but 2+3 * 5 is a compile error, because the visually emphasized grouping doesn't match the actual parse tree. Whitespace can also be used to introduce grouping between two operators that have explicitly defined identical precedence, but not between two operators which don't have relative precedence.
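My reading of that whitespace rule can be sketched as a simple consistency check: classify each binary operator occurrence as tight (no spaces) or loose (spaces on both sides), and complain if any tight operator binds more loosely than some loose one. This is only my interpretation of the description above, with a toy precedence table; it handles binary operators only, not unary minus.

```python
import re

PREC = {"+": 1, "-": 1, "*": 2, "/": 2}  # illustrative precedence table

def check_whitespace_grouping(expr):
    """Reject spacings that contradict precedence (Fortress-style sketch)."""
    tight, loose = [], []
    for m in re.finditer(r"[+\-*/]", expr):
        left_sp = m.start() > 0 and expr[m.start() - 1] == " "
        right_sp = m.end() < len(expr) and expr[m.end()] == " "
        if left_sp != right_sp:
            raise SyntaxError(f"asymmetric spacing around '{m.group()}'")
        (loose if left_sp else tight).append(PREC[m.group()])
    # a visually tight operator must not bind more loosely than a loose one
    if tight and loose and min(tight) < max(loose):
        raise SyntaxError("whitespace grouping contradicts precedence")
```

Under this check 2+3*5, 2 + 3 * 5 and 2 + 3*5 all pass, while 2+3 * 5 is rejected because the tight + is looser than the spaced-out *.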

Thanks

Thank you for the pointers to occam and Fortress, I was not familiar with either (oddly, because I was taught about CSP with a very similar notation to the concrete syntax of occam).

The Fortress partial order approach is quite interesting (and seems also to be an example of the kind of extensibility--or at any rate potential for extensibility--mentioned above by Scott Johnson).

Fortress Extensibility

The Fortress partial order approach is quite interesting (and seems also to be an example of the kind of extensibility--or at any rate potential for extensibility--mentioned above by Scott Johnson).

Extensibility is one of the major goals of Fortress. After all, it is designed by Guy "Growing a Language" Steele!

The idea is that Sun could never imagine all the possible uses of Fortress up front. And since the whole idea of Fortress is to provide computational scientists with a programming language that looks as much as possible like the LaTeX-typeset scientific papers or blackboard scribbles that they already write anyway, being able to adapt not only the APIs but also the operators and even the syntax to their specific scientific field is crucial. A geneticist, a quantum physicist and a fluid-dynamics expert would never be able to agree on one single syntax and one single set of operators with one single precedence table. And nor should they!

So, in Fortress being able to add and redefine operators, precedence and syntax, is crucial.

Two notes

Just to remind two things, to further complicate the subject :)

1. Knowing the operator precedence is not the only thing needed in order to make an expression unambiguous. In the presence of side effects, the order of argument evaluation also matters, and most languages leave it unspecified. That is why in C, if x is initially 0, the expression ++x-x++ evaluates to 0 or 2, depending on the compiler.

2. Although both APL/J/K, on the one hand, and prefix (Lisp etc) and postfix (Forth, PostScript etc) languages, on the other, assign equal precedence to all operators, they differ in other details:

– In APL, an operator and its arguments are always close to each other; in Lisp or Forth, they may be far apart. Thus APL does ‘better’, but at the expense of leaving out operators of valence >2.

– In APL, parentheses are still needed for explicit precedence; in Lisp, they are obligatory (for other reasons), and in Forth etc they are not needed at all.

Side effects

Good point about side effects. I hadn't been considering it because the language I was thinking about when I started the thread doesn't have expressions with side effects.

In C, the issue with side effects

has little to do with operators per se--function calls suffer from the same problem.

The behavior of the following code

#include <stdio.h>

int subtract (int a, int b) {
    return a - b;
}

int main ()
{
    int x = 0;
    int y = subtract (++x, x++);
    printf ("The answer is %d\n", y);
    return 0;
}

is also not well-defined by the standard. The issue is that the arguments to a function, while strict, may be evaluated in any order the implementation chooses. If the leftmost argument is evaluated first, you get zero; if the rightmost argument is evaluated first, you get two. Most implementations I'm familiar with are left-to-right, but portable code cannot depend on that.

C has more flexibility than just choosing an order

subtract(++x, x++) gives undefined behavior in C, meaning that there is no guarantee what it will do - akin to an unbounded error in Ada. Modifying a variable twice with no intervening sequencing isn't allowed. It's not just that either order may be chosen: an implementation might do both operations concurrently and lock up a platform that detects conflicting writes/race conditions, or its optimizer might decide to remove the code completely as it could never be called by a valid program.

Which is rather unrelated to the discussion on precedence, except to show that there's more to specifying evaluation sequence than just precedence. As concurrency continues to get more press, *not* constraining evaluation order unnecessarily might get to be more important.

Where aren't the nasal demons lurking in C?

I mean, seriously. Wow.

Be fair

Where aren't the nasal demons lurking in C?

IIUC one of the advantages of using an operational semantics to define R6RS Scheme was that it made it easier (possible?) to formally specify that the evaluation order of arguments to a function is undefined. I personally find this to be a perfectly reasonable position to take. If you are going to make that requirement for an M-expression(ish) language like C, doesn't it make sense to extend it to infix operator application as well?

in Lisp, they are obligatory

in Lisp, they are obligatory (for other reasons)

Well, indeed - just to belabor the point, in lisp, parens aren't even for precedence particularly, they're just denoting cons-list/tree structure (which is then (maybe) evaluated*). (1+1) == 1+1 in maths, but '(hello) != 'hello in lisp.

* yielding another "false friend": lisp parens look a bit like the func(blah, baz) parens in C, only on the other side of func and without commas (func blah baz). That's the apparent result, sure, but only because lisp's eval is defined so that, when it gets the list (func blah baz) to evaluate, it calls the function bound to func with the values bound to blah and baz.

Whitespace implied parenthesis

The creator of merd implements what he calls "horizontal layout". This is like the Fortress example above, except that it actually writes the parentheses into the parse tree, based on the amount of whitespace between tokens. That is:

2 * 1+3 -> 2 * (1+3) -> 8
2*1 + 3 -> (2*1) + 3 -> 5

And, presumably, the operators have their normal arithmetic precedences to act as tie breakers, so:

2*1+3 -> 5
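As I understand the description, the whole scheme amounts to a preprocessing pass: wrap each space-free run of tokens in parentheses, then hand the result to an ordinary precedence-based evaluator. A toy sketch of that reading (the function names are mine; Python's own eval stands in for the conventional evaluator, purely for illustration):

```python
import re

def horizontal_layout(expr):
    def wrap(chunk):
        # parenthesize chunks that contain an operator written tightly,
        # i.e. a digit immediately followed by an operator character
        return f"({chunk})" if re.search(r"\d[+\-*/]", chunk) else chunk
    return " ".join(wrap(chunk) for chunk in expr.split())

def evaluate(expr):
    # normal precedence still acts as the tie breaker within a tight run
    return eval(horizontal_layout(expr))  # illustration only
```

With this, "2 * 1+3" becomes "2 * (1+3)" and evaluates to 8, "2*1 + 3" becomes "(2*1) + 3" and evaluates to 5, and the fully tight "2*1+3" falls back to ordinary precedence and gives 5.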

Or you could infer...

I am currently writing an experimental (C#-like) language that infers the order of operations. Instead of generating an AST during the traditional parsing step, it's moved closer to the analysis side of things. All statements are required to be of type void/unit, and the compiler builds an AST from a flat stream of tokens that satisfies that constraint.

You'll still run into cases of ambiguity, and you'll still run into issues with the 2+3*5 sort of code; but it's better than parens all over the place. Better than specifying a precedence and hoping that fits into other libraries' precedences...

Not sure how viable it'll be in practice though, hence the experiment.

Mathematics class

As children in school, most of us are taught about arithmetic precedence. These laws of mathematics should be upheld in any naturalistic programming language. If a software developer happened across some math formula on the Internet and wanted to use it in their program, it would be best if they could use it directly without having to convert from the arithmetic precedence to some reverse programmatic form.
On another note, the vast majority of C++/C#/Java/etc. developers are already used to operator precedence. It would be hard for them to unlearn it.