Parsing user defined operators with precedence/assoc ala ML

I Googled around quite a bit and didn't turn up much. I do note mllex and mlyacc distributed with SMLNJ, but it's not clear how to handle user defined operators with a yacc based strategy, or even whether SMLNJ uses these tools to implement itself.

Is this typically a job for good old hand rolled recursive descent?

For any and all pointers, info - thanks much!

Scott

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Yacc + some handwork

Yes, SML/NJ (and similarly other ML implementations) use these tools. To handle user-defined scoped infix directives a la SML you have to do bit of extra work, though. The usual trick is to first parse all application (infix or curried) simply as a list of atomic (aka primary) expressions, and then do infix resolution later. Infix resolution can then be a very simple hand-written parser that is invoked locally on these lists. It treats atomic expressions as tokens and only classifies them as nonfix or infix (of different precedence and associativity) based on the current infix environment.

You can have a look at my simplistic HaMLet implementation of SML: http://www.mpi-sws.org/~rossberg/hamlet/. When you unzip the sources, the Yacc parser is in parse/Parser.grm, and it invokes the infix parser from parse/Infix.sml on the fly. Should be pretty much self-explanatory.

(In principle it is even possible to get away without the separate parser - at least for plain SML, where all fixity scopes are syntactically apparent. You can manage the infix environment as global state that is updated inside the semantic actions of the parser. The lexer could use that environment to deliver the proper tokens.)

Javascript may not be

the fanciest language 'round here, but nevertheless the Pratt parser implementation at

http://javascript.crockford.com/tdop/tdop.html

is quite interesting.

So I'm thinking....

Operators are defined in a table. The lexer produces tokens for operators like oper2-R, oper6-L, etc. via table look up in the lexer. Yacc populates the table as it encounters new operator definitions.

Seems pretty straightforward in general, and fits in with my latest plan to use the Lex/Yacc implementation in PLT Scheme.

Yes, but...

Yes, but note that this particular approach does not scale to more flexible forms of infix definition. For example, in some dialects of SML a module signature can specify infix status. When you open a module, this becomes effective in the current scope. Obviously, with such a feature you cannot do infix resolution before you do type checking.

(The approach also requires global state and thus makes the parser non-reentrant, but that's probably not a big deal.)

Global state

You can manage without global state by making the fixity table an attribute of the infix operator tokens, so that it's available in the semantic actions. You'll need some way to maintain the context in the lexer as well, such as ocamllex's parameterized lexing rules

Avoiding mutable state altogether?

Yes, though you still need a way to update that state from the parser when you process (scoped) fixity declarations. I can see how you can do that if the table is mutable, but wonder if there is a way to avoid mutable state altogether given the typical lex/yacc interfaces.

All things considered I think it is actually cleaner and more scalable to do infix resolution separately from yacc parsing.

I think that the typical

I think that the typical interfaces don't provide any way for the parser to pass values to the lexer apart from (global or local) mutable state.

For some languages you could handle infix resolution in the lexer rather than the parser (again assuming parameterised lexing rules) if you were willing to give special treatment to the tokens that open and close scopes and to the tokens that specify fixity. This approach wouldn't require any mutable state, but I certainly don't recommend it!

All things considered I think it is actually cleaner and more scalable to do infix resolution separately from yacc parsing.

I'm inclined to agree.