Extensible Languages

I'm interested in some thoughts on issues related to syntactic extensions to programming languages. As mentioned Felix now has DSSL (Domain Specific Sublanguage) support. The Felix grammar is now loaded dynamically from the library at run time. This looks like:

  syntax dssl1 {
    nt := sym1 sym2 ..  =># "scheme code";
    ...
  }
  ..
  open syntax dssl1, dssl2;

Here every name used is taken as a nonterminal symbol. There is no error checking. A new GLR automaton is built where the open syntax statement is issued, and applied until the end of the containing scope; more precisely, the scope is determined directly by the grammar itself: state modifications to the parser effected in a reduction propagate to the end of the production parsing that nonterminal.

Attempting to parse with an undefined nonterminal raises an exception.

Several DSSLs, including the current set, may already define nonterminals required for some other DSSL. There is no way to specify such dependencies. The same DSSL can be loaded twice. No attempt is made to stop that, the grammar is just extended again. My syntax doesn't allow for handling merges: the parser engine takes the most recent rule and uses that, so an ambiguous parse of a nonterminal will be silently repaired.

Clearly, we'd like DSSLs to be modular and well typed, with the usual kinds of properties we expect from libraries and module constructs. How would you suggest to fix these problems to achieve those goals?

Whilst this question is aimed at programmers, I have one for the category theorists too. In a language like Ocaml, an interesting technique termed open recursion can be used to extend types and functions covariantly an in a modular fashion. An otherwise recursive construction is made non-recursive by replacing inner calls with a parameter to form an open version of it, and then a second construction defines the desired closure by applying the symbol to itself (yes I know that's a crude description). The parameter also allows covariant extension by composing several such open constructions, and then closing them with respect to the composition.

To me there is an obvious and unexplored extension to this technique to grammars. We make a production non-recursive by replacing a recursion by a parameter, so now we have a new notion: a nonterminal parametrised by another nonterminal, and we form a closed term by applying the nonterminal to itself. If we add new productions to the nonterminal grammar, we can form closure of that too, to obtain a new language.

An extension of this idea is to parameterise a whole DSSL (grammar package) with other DSSLs, much like an Ocaml functor, and again form a closed language by a self-application.

It seems to me a nonterminal is really a just a type, and alternate productions are really constructors, so the analogy with open recursion for types is strong, and it would seem the user actions similarly would need to be open recursive.

Is there any theoretical work on this idea that suggests how a statically typed grammar with open recursion would be typechecked? For the modular extension? Is there even a model of it?

To me this kind of thing is better than what I am doing now, which effectively adds productions to an existing nonterminal, making the old version unavailable. Although this is dynamically covariant, it is not pleasing because it is a mutation rather than being purely functional and combinatorial.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Two level van Wijngaarden grammars?