javascript shift-reduce parser

Here is an universal parser in javascript:
http://synth.wink.ws/moonyparser/

I've been to hell and back to make it faster, but that's it, shift-reduce can't be much faster. Unfortunately, finding errors in text makes it three times slower. At least it has linear parsing time.

parser utilizes a new grammar language borrowed from Synth project. It is a kind of structured BNF language.

Cool!

This looks really neat; thanks for posting.

Could you explain more about it? I've worked on building parser generators as well, so I'd love to hear your opinions, thoughts etc. on any of the following that you wouldn't mind sharing:

- how parse errors are reported, what information error reports contain, whether the user has control
- what the motivation for the project was (i.e. were there shortcomings in similar projects that you decided to address)
- why you decided to go with shift-reduce
- what challenges you faced
- what is easy/hard to do using your library
- what limitations it has

And a couple more less interesting questions:

- what languages it can parse (all context-free?)
- does it detect ambiguous grammars? Can it deal with ambiguous languages?
- why does finding errors slow parsing down? (I'm not very familiar with shift-reduce)

Also, I haven't heard of a universal parser before. What is that?

By Matt Fenwick at Wed, 2013-07-17 21:24 | login or register to post comments

GLR parsing

I believe what he's calling a universal/shift-reduce parser is mostly known as "generalized left-right" (GLR) parsing, sometimes also referred to as Tomita parsing. That would mean it would parse all context-free grammars, handling ambiguity through splitting the tree into a (possibly implicit) parse forest. GLR parsing is indeed entirely shift-reduce, with the added complication that ambiguities create points where a shift and/or multiple reduces may all trigger at a given point. One then has to prune these trees if a later conflict shows the ambiguity to be local.

I, too, am interested in excellent exposition of these kinds of parsers. I'm familiar with GLR parsing in general (and use a GLR parser generator), but am missing the experience of building one ground-up. I've been stymied by the lack of material on GLR parsing that isn't raw source of the few projects that use it (I understand Bison recently grew GLR parsing) or specialized more for natural language parsing.

By Chad Wellington at Wed, 2013-07-17 23:16 | login or register to post comments

Hi Matt, tx for interest.

Hi Matt, tx for interest. Here are answers:

- how parse errors are reported, what information error reports contain, whether the user has control?

For parsing grammar there are syntax and logical errors like variable not found and direct circular reference error. For parsing text there is only syntax error at specified line and column. After a text is parsed, user should look for logical errors like if date is parsed then month can't be more than 12.

- what the motivation for the project was (i.e. were there shortcomings in similar projects that you decided to address)

Moony Parser is a child project of bigger project of making all in one web 3.0 / new filesystem / database / case tool / programming language with given grammar syntax extended with some dynamics used to do programming. As I didn't find any context free grammar (CFG) parser, I've decided to do my own one. There are some parsed expression grammar (PEG -search google) parsers that are blazing fast, but they are not complete in a sense that not every CFG can be parsed. As parent project is intended to parse any definable grammar, I had to do my own parser, but it is really slow (5 lines per second on slow Celeron, 10 lines per second on my mother's other old Pentium). If U imagine standard C compiler speed written in asm ported to client Javascript interpreter, then U won't be surprised. But what a heck, 100 seconds of parsing 1000 lines would also do.

- why you decided to go with shift-reduce

I have written three different parsers and picked shift-reduce as the fastest one and with linear parsed time. Other two was 1. some combination between top-down and bottom-up (i had exponential parsed time here with some grammars) and 2. infinite look ahead top down parser that was blazing fast when text didn't contain errors (something like PEG parser), but when an errors came for testing, i had sloooooooow exponential time of reaching end state of parsing text. I'm still shocked, but I didn't know what to do, so I picked shift-reduce.

- what challenges you faced

Finding grammatical errors still give me creeps with shift-reduce.

- what languages it can parse (all context-free?)

I hope so. It is written to do so. But it has its own grammar. I hope that this grammar is as complete as context free one.

- does it detect ambiguous grammars? Can it deal with ambiguous languages?

Sorry, one parsed text, one syntax tree. It returns the first combination it finds in ambiguous grammar. Can be extended, but I don't see the point (yet).

- why does finding errors slow parsing down? (I'm not very familiar with shift-reduce)

After shift-reduce with errors U get fragmented syntax tree without a clue why fragmented trees are not stitched together. To find an error I used the method of parsing all incomplete combinations of all sequences (without last parameter, then without one before, then without one before...) on all parsed tokens, and when EOF is reached, the parser prints the farest point it got in one piece. I got three times slower parser (depends on average number of parameters in a sequence). I still don't have any other idea so I'll use this slow parser in parent project.

Also, I haven't heard of a universal parser before. What is that?

It is not a regular buzzword, I've called it universal because it should be able to parse any provided grammar.

Again tx for interest and I hope that the parser is fast enough for some use. At least it is free, if nothing else.

By Ivan V. at Wed, 2013-07-17 23:20 | login or register to post comments

Thanks for the responses!

Is the code posted on a site like github or googlecode?

By Matt Fenwick at Thu, 2013-07-18 18:52 | login or register to post comments

code is posted just on

code is posted just on provided web site.

By Ivan V. at Fri, 2013-07-19 02:06 | login or register to post comments

I don' understand why is git

I don' understand why is git so popular

By Ivan V. at Fri, 2013-07-19 13:34 | login or register to post comments

CYK handles CFG's

CYK algorithm on wikipedia

I haven't implemented this myself. I don't have any ideas as to how your thorough error handling could be duplicated in this framework, unless the error happened to be an ambiguous parse.

By mtenright at Sat, 2013-07-27 17:29 | login or register to post comments

moony parser 2

Hi all.

A new version of Moony Parser is out. This time it implements "Earley" parser. Also handles CFG. It is up to 100 times faster than previous version in some cases. There should be less bugs because the code is simpler.

link to homepage

By Ivan V. at Sat, 2013-11-16 00:54 | login or register to post comments

Lambda the Ultimate

User login

Navigation