Grammars as a shared executable specification for language processing tools

Parsers written using our library are remarkably similar to BNF; they are almost entirely free of solution-space (i.e., programming language) artifacts. Our system allows the grammar to be specified as a separate class or mixin, independent of tools that rely upon it such as parsers, syntax colorizers, etc. Thus, our grammars serve as a shared executable specification for a variety of language processing tools. This motivates our use of the term executable grammar. We discuss the language features that enable these pleasing results, and, in contrast, the challenge our library poses for static type systems.
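As a rough illustration of the idea (a hypothetical Python sketch, not the paper's Newspeak code; the combinator names and the `NumberGrammar` class are invented here), a grammar can be written as a separate class whose rule methods read almost like their BNF productions, leaving the grammar reusable by any tool that wants it:

```python
def char(c):
    # Match one literal character; a parser maps s -> (value, rest) or None.
    def parse(s):
        return (c, s[1:]) if s[:1] == c else None
    return parse

def alt(*ps):
    # p1 | p2 | ... : try each alternative in order.
    def parse(s):
        for p in ps:
            r = p(s)
            if r is not None:
                return r
        return None
    return parse

def many1(p):
    # p+ : one or more repetitions, collected into a string.
    def parse(s):
        r = p(s)
        if r is None:
            return None
        out = []
        while r is not None:
            v, s = r
            out.append(v)
            r = p(s)
        return (''.join(out), s)
    return parse

class NumberGrammar:
    # digit  ::= '0' | '1' | ... | '9'
    # number ::= digit+
    def digit(self):
        return alt(*[char(d) for d in "0123456789"])

    def number(self):
        return many1(self.digit())

g = NumberGrammar()
print(g.number()("42abc"))  # ('42', 'abc')
```

The point being made in the abstract is that the rule bodies carry essentially no solution-space artifacts, and that a colorizer or other tool could consume the same `NumberGrammar` class.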

Executable Grammars in Newspeak (pdf) Gilad Bracha

Executable Grammars in Newspeak JAOO 2007 slides (pdf)


Need for explanation and a short rant

I wonder about two things. First: why isn't this article presented on the homepage but in the discussion forum instead?

Secondly, lots of annoying complaints.

I miss a rationale for this work, but maybe it's just that I don't get this new trend of writing all kinds of handcrafted parsers in a functional style. My consciousness obviously lacks information about the subconscious needs of other programmers.

IMO this trend towards parser combinators or the OO-style interpreter design pattern indicates a regression behind parser generators (at least for CFGs) and the clean separation of grammar descriptions and actions. However, one does not find many useful grammars in the infosphere anyway. ANTLR has kind of a repository, but even those grammars are cluttered with all kinds of parser actions and are mostly useless for tools other than ANTLR. Now things become even worse with idiosyncratic embedded formats that require a very particular general-purpose language to be interpreted. The public description of a language becomes an implementation detail. Upside-down evolution into programming language provincialism.

Fortunately there is still XML for the one thing it did right.

So my naive question is: what exactly is wrong with EBNF or PEG? I'm completely stunned.

It might very well have been

It might very well have been on the front page somewhere, this seems like a dupe to me but I might be mistaken.

I'm not exactly thrilled about parser combinators either, at least from the perspective of building non-toy parsers. Related to my own field, parser combinators seem impossible to make incremental. Why would anyone want to use them when they could just crank out some quick ANTLR code instead?

As for EBNF, it's hardly sufficient: such standard grammars are merely implementations themselves and don't capture the pure essence of the language's syntax. For example, there is a way to translate precedence into an EBNF, but reverse engineering precedence from an EBNF is impossible. When you consider performance, error recovery, and incrementality, you need more than an EBNF.

Actually, I'm still seeing a lot of hand coded recursive descent parsers, which doesn't look half bad if done in a decent language like Scala. For what I'm doing, there are even pretty straightforward techniques for making these parsers incremental.

It should definitely be

It should definitely be possible to make at least some classes of parsing combinator incremental. There's more than one possible implementation strategy, and one is to 'compile' via an explicit grammar representation - you don't even have to stick to just the one parsing algorithm once you've done that.
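One way to realise that 'compile via an explicit grammar representation' strategy (my own sketch, with invented names, not anything from the thread) is to let the combinators build a plain data structure first, and derive the actual parser from it in a separate step. At that point the representation could also be inspected, optimised, or handed to a different parsing algorithm:

```python
# Grammar nodes: plain data, independent of any parsing strategy.
class Lit:
    def __init__(self, s): self.s = s

class Seq:
    def __init__(self, *parts): self.parts = parts

class Alt:
    def __init__(self, *opts): self.opts = opts

def compile_grammar(node):
    # Turn the grammar tree into a recursive-descent parser;
    # each parser maps input -> (value, rest) or None.
    if isinstance(node, Lit):
        return lambda s: (node.s, s[len(node.s):]) if s.startswith(node.s) else None
    if isinstance(node, Seq):
        subs = [compile_grammar(p) for p in node.parts]
        def parse(s):
            vals = []
            for p in subs:
                r = p(s)
                if r is None:
                    return None
                v, s = r
                vals.append(v)
            return (vals, s)
        return parse
    if isinstance(node, Alt):
        subs = [compile_grammar(p) for p in node.opts]
        def parse(s):
            for p in subs:
                r = p(s)
                if r is not None:
                    return r
            return None
        return parse

g = Seq(Alt(Lit("ab"), Lit("a")), Lit("c"))
print(compile_grammar(g)("abc"))  # (['ab', 'c'], '')
```

Since `g` is just a tree, a hypothetical optimiser or incremental engine could walk it before (or instead of) calling `compile_grammar`.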

For example, there is a way

For example, there is a way to translate precedence into an EBNF, but reverse engineering precedence from an EBNF is impossible.

I do not understand this claim. The usual way to translate precedence into EBNF is to split rules and organize them in a hierarchical fashion:
  sum_expr: mul_expr (('+'|'-') mul_expr)*
  mul_expr: atom (('*'|'/') atom)*
  atom: NAME | NUMBER
What kind of use case do you have in mind where this is unfeasible or hard to perform?
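For what it's worth, that layered grammar maps directly onto a hand-coded recursive-descent parser (a sketch with assumed names, restricted to numbers for brevity; the NAME branch of atom is omitted): each precedence level becomes one method, and '*'/'/' bind tighter than '+'/'-' simply because mul_expr is invoked from inside sum_expr:

```python
import re

def tokenize(src):
    # Split into numbers and operator/paren tokens.
    return re.findall(r"\d+|[+\-*/()]", src)

class Parser:
    def __init__(self, tokens):
        self.toks, self.i = tokens, 0

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None

    def next(self):
        t = self.peek()
        self.i += 1
        return t

    def sum_expr(self):  # sum_expr: mul_expr (('+'|'-') mul_expr)*
        v = self.mul_expr()
        while self.peek() in ('+', '-'):
            op = self.next()
            v = v + self.mul_expr() if op == '+' else v - self.mul_expr()
        return v

    def mul_expr(self):  # mul_expr: atom (('*'|'/') atom)*
        v = self.atom()
        while self.peek() in ('*', '/'):
            op = self.next()
            v = v * self.atom() if op == '*' else v / self.atom()
        return v

    def atom(self):      # atom: NUMBER   (NAME branch omitted in this sketch)
        return int(self.next())

print(Parser(tokenize("1+2*3")).sum_expr())  # 7
```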

I've not completely made up my mind about the unnaturalness of the representation. I guess you refer to the AST/CST mismatch? There are tons of libraries that address the object/relational mismatch for any mainstream programming language, and even more literature about it, but the AST/CST mismatch seems to be only subconsciously present, and you see all kinds of half-baked workarounds. In discussion forums people come up with ideas like translating parse trees to XML to be able to deal with them. It's grotesque, but there is very little analysis.

Maybe this paper is an

Maybe this paper is an interesting read in the context of the above comments.

Describing parsers written

Describing parsers written with parser combinators as 'hand-crafted' is IMO misleading. The combinators are generally an embedding of an appropriate metalanguage much as a parser generator would work with, and where the combinators aren't more powerful than a sane parser generator would be (Parsec would be such an example, with support for all the context-sensitivity you could ever want) they can actually create a representation of the grammar and optimise it before an actual parser is created.

It's true that not everyone is going for a clean separation of grammars and actions, but in many cases this is actually the right thing - providing that separation requires additional code and possibility for error. If you really care that much you can, of course, parameterise the parser on the actions!
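Parameterising the parser on the actions can be as simple as passing the semantic actions in as arguments (a toy sketch with invented names), so the same grammar code can evaluate an expression or build a tree depending on what the caller supplies:

```python
import re

def parse_sum(tokens, number, add):
    # sum: NUMBER ('+' NUMBER)*  -- the actions 'number' and 'add'
    # are supplied by the caller, keeping the grammar action-free.
    i = 0
    v = number(tokens[i])
    i += 1
    while i < len(tokens) and tokens[i] == '+':
        v = add(v, number(tokens[i + 1]))
        i += 2
    return v

toks = re.findall(r"\d+|\+", "1+2+3")
print(parse_sum(toks, int, lambda a, b: a + b))        # 6: evaluator actions
print(parse_sum(toks, str, lambda a, b: ('+', a, b)))  # tree-building actions
```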

As for EBNF and PEG, they're not necessarily powerful enough and they're certainly no good for factoring a grammar. Despite this, a good many parsing combinator libraries take one of the two as a starting point.

The combinators are generally an embedding of an appropriate metalanguage much as a parser generator would work with

I think this is a big part of the understandable reaction to them that Kay expressed. Work on parser combinators has been driven by the Haskell community, where embedded domain-specific languages are a preferred approach. Ease of development and the other advantages provided by the embedding are traded off against the advantages of using an independent DSL with its own custom syntax. Choosing the DSEL side of this tradeoff is part of the Haskell philosophy, which I think was first explicitly expressed in Hudak's Building Domain-Specific Embedded Languages and later in Modular Domain Specific Languages and Tools.

However, those not steeped in that philosophy often find these embedded DSLs rather clunky (technical PLT term), particularly when compared to existing non-embedded DSL equivalents. Parsers are a good example of that.

There're definite tradeoffs - myself I've found the non-embedded DSLs clunkier for my own purposes. With Haskell at least I suspect the clunkiness tends to be a small amount of constant overhead and some metalanguage tokens that could be shorter if they weren't competing with the rest of the host language for namespace - I'd welcome good counterexamples though, even if I'd be likely to take them as a challenge to match!

edit: There's an important disclaimer here - my preferred flavours of combinator have existing sugar for them in the form of things like the do notation. I've talked before about looking for more general notions of binding to sugar, this would be somewhere it could be applied.

sugar

Could you say more about what you mean by

"...my preferred flavours of combinator have existing sugar for them in the form of things like the do notation. I've talked before about looking for more general notions of binding to sugar..."

oh, honey honey

Here's a recipe for enhancing support for embedded domain specific languages in a host language, without resorting to redefinable syntax (macros):

1. Come up with a way of translating from a source language (a standalone DSL) to code in the host language. Under some circumstances, and many more if you squint, this translation is known as "denotational semantics". (Explaining that further would require a few more paragraphs, available upon request.)

2. If your host language is a pure functional language, you're going to want to define a generic data type which is capable of tractably expressing the translated code from point #1, otherwise your translated code is going to be pretty unmanageable. In Haskell, this data type is known as a warm fuzzy thing, which for historical reasons is abbreviated as "monad".

3. Define a syntax which exploits the warm fuzzy data type's genericity to allow the translated output from multiple source languages to be expressed consistently using the same syntactic construct. In Haskell, the necessary genericity is provided by the typeclass feature, and the syntactic construct is 'do' notation.

So, whenever you write code using Haskell's do notation, if you're perverse (like me) you can think of what you're doing as manually expressing that part of the program as the output of a denotational compiler from some source language(s). The source language may not explicitly exist, but it already has a ready-made semantics — the monadic code you're writing is the output of that semantics — and it'd be easy enough to make up a syntax for it.

Any of steps 1 through 3 could be done in a different way. Each of them involves a slew of choices and deliberate restrictions. It's easy to imagine using a similar approach with different details, although it'd require some good language design skills to do it as generally and well.
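As a toy rendering of the recipe (in Python, with invented names, so the typeclass genericity and real 'do' sugar of step 3 are only approximated by nested binds): step 1's translation targets functions from input to (value, rest), and step 2's warm fuzzy thing is the bind that chains them:

```python
def unit(v):
    # 'return' of a parser monad: consume nothing, yield v.
    return lambda s: (v, s)

def bind(p, f):
    # Sequence parser p, then feed its value to f to get the next parser.
    def parse(s):
        r = p(s)
        if r is None:
            return None
        v, rest = r
        return f(v)(rest)
    return parse

def item(s):
    # Consume one character, failing on empty input.
    return (s[0], s[1:]) if s else None

# What do notation would write as:  a <- item; b <- item; return (a, b)
pair = bind(item, lambda a: bind(item, lambda b: unit((a, b))))
print(pair("xyz"))  # (('x', 'y'), 'z')
```

The nested lambdas are exactly the clunkiness that 'do' notation sugars away in Haskell, which is why step 3 matters.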

Critique of this characterization welcome. I hope Philippa will chime in if I missed anything.

Critique of this characterization welcome.

I'm trying to decide where the approach discussed in this thread/paper fits into this characterization.

I can see where point 1 applies, but to my naive eye it doesn't seem to mesh with points 2 and 3 quite as well. I'm open to arguments though. ;-)

and Hot Dog, too.

That's a thought-provoking connection, thanks. I'd say that my points 2 and 3 are outside the scope of the paper, but if you were going to use the paper's approach to embed realistic DSLs, those points could become relevant.

For example, if you used the paper's approach to embed unusual control handling via CPS, or mutable state, then something like monads and/or 'do' syntax could come in handy. It's also quite possible that you could come up with some general syntactic shortcuts specifically tuned to the paper's approach, which would be a point 3 feature (and could be interesting!)

The paper states that it "leaves aside the solved problem of writing a parser/type-checker, for embedding object language objects into the metalanguage". My points 2 & 3 are about addressing that problem without resorting to a special parser, other than for some kind of fixed multi-DSL syntax such as 'do' notation.

Personally, once I

Personally, once I encountered parser combinators, I never understood why you'd want to deal with an entirely separate language for them. A whole new language and semantics to learn seems like an unwarranted additional level of complexity. Of course, this depends on the parser combinator library's friendliness, i.e., flexible error propagation, etc.

Homepage vs. forum

why isn't this article presented on the homepage but in the discussion forum instead?

Isaac is not an official contributing editor, so cannot post to the homepage. If he'd like to be one, he should ask Ehud (who I assume is fairly busy with his move right now, btw).

[Edit: I've promoted this item to the front page, and I'm off to spend my bingo winnings.]

Bingo!

Editing

I'm well aware that what I find interesting for a moment might not be that interesting to others, and in the past LtU editors have usually managed to push forum topics they considered generally interesting to the front page within a short time.

You quoted a research paper

You quoted a research paper and also quoted from it. That is real news, and quite different in character from opinionated statements like mine in the subsequent comment section, which give rise to a true forum topic of the kind "parser combinators vs. parser generators".

Just a quick note, since I

Just a quick note, since I am still buried here...

If I am not mistaken I invited Isaac to become a CE, but he prefers the current situation (as he explains above). I usually promote to the home page items that seem to me worth the attention.

However, one of the nice things about LtU is that people share things they find interesting, if there's a chance they will interest others. So, Isaac, the invitation is still open, as a CE you can decide which items to post to the home page and which you prefer to put in the discussion group.