Parametric Grammars

I am curious why it seems there is little or no research on adapting grammars and parsers to support parametric polymorphism. It seems to me that grammars and types are more or less the same thing, and technology related to polymorphic type systems should apply to grammars and parsing as well.

Existing grammar technology has woeful compositional properties compared to type systems. I am currently using Dypgen which allows dynamic extension of a grammar. But first let me backtrack a bit:

Suppose to have an executable recursive descent parser for statements, where the parser accepts a list of statement forms and tries each one until it succeeds. If you put that list in a global variable, it is easy to extend the system by constructing a suitable data structure for parsing a statement at run time, push it onto the statement list and store the resulting list in the global variable.

The use of a global variable here rather than a weak functional technique is mandatory when you consider that some statements may be composed from others, and we want the recursion to extend nested statements to include the new production too.

Now as to Dypgen, it is better because it is purely functional in that after adding a new production for a statement, it rebuilds the parser engine, and so the recursion required to support nested statements works.

BUT .. we are still adding a new production to a statement, which is similar to hacking an Ocaml variant type and adding a new case, then recompiling. It's not the recompilation that concerns me here, but the fact we're forced to modify the old grammar to extend it.

The thing is the *right way(tm)* to do this would seem to be to use open recursion: in Ocaml you can use polymorphic variants with a parameter which is closed to form a concrete type, and for an extension you can add new variants to the open form and then close that. With this technology we have real subtyping: we have a type which is open for modification, and can be trivially closed for use, thus satisfying the open/closed principle.

Why can't we do this for grammars?

There are some real trivial uses for this. Dypgen supports 3 polymorphic operators already, namely * + and ?. But now, suppose I want to define "comma separated list of arbitrary-nonterminal" which in fact I need a lot.

I'm being asked to use a technology so seriously archaic it is worse than Basic or Cobol: it doesn't even have "subroutines". What I need here is actually quite flat: it needs parametric grammars, though not open recursion.

Given the huge amount of research into type systems .. why am I still using Assembler to write my grammars?

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Compositional grammars

I agree with you intuition that grammar is a type of text.

I'm not sure that problem is parametric polymorphism. I thing the problem is a generic composition.

Note that there are high-level compoistional grammars here. But they all are for the languages that has common token and phrase layer. Currently it is XML/SGML and analogs. They are unsuitable for programming languages. But a lot of composable DSLs are created using them. Just consider web services with all their extensions.

For programming languages it is becames increasingly common to support syntax extentions for the expressions. The list of operators forms a kind of modular language definition. I first met this approach in the Prolog ("op" declaration), but possibly they have taken the idea from somewhere else.

I have been moving in that direction with ETL previously, currently the project is on hold since I need to rework the way keword are handled and unicode support. IMHO the trick for the grammars with rich compoistion language is starting to define grammras in terms of contexts, statements, and operators. Then they would compose. If we stick to productions, we could go only so far.

Pragmatic: menhir

If you don't need dynamic extensibility, you could try Menhir, by François Pottier and Yann Régis-Gianas, which allows parametrized grammar rule, with even higher-order parameters.

The semantics is quite simple -- the engine perform substitutions/expansions/duplications to get a non-parametrized grammar before compilation into the LR automaton -- but it is still carefully defined, with well-formedness checks to ensure termination of the grammar, for example.

Menhir

Ah yes, thanks for reminding me about Menhir! My project is using Dypgen, however my actual query is more about theoretical developments and application thereof.

Definite clause grammars can

Definite clause grammars can be parametric.

commaSeparated(X) --> X
commaSeparated(X) --> X, [COMMA], commaSeparated(X)

Where COMMA is the comma token. This is assuming that the underlying logic language has higher order relations.

Open recursion in grammars can be found in OMeta.

A few references

Your intuition is the basis for the PADS project (www.padsproj.org). PADS/ML, the OCaml variant of the language, incorporated parametric polymorphism for just the reasons you suggest. The theory behind it is detailed in "The Next 700 Data Description Languages" JACM 2010.

Also, there's a paper by Peter Theiman called "Parameterized LR Parsing" which is closely related to what you're describing.

amethyst has good compositional properties

My pattern matching engine amethyst support inheritance and parametrized rules with lambda abstraction.
Nice examples can be found in amethyst standard prologue.

http://kam.mff.cuni.cz/~ondra/amethyst/amethyst.ame.html

More lambdas can be found in highlighter which is just annotated parser.


http://kam.mff.cuni.cz/~ondra/amethyst/parser_highlight.ame.html