OO type systems and BNFs

I'm looking at a type system that attempts to combine aspects of syntactic and semantic analysis. Say in a BNF, we have a grammar production for a non-terminal like:

statement := block | method-call | assignment | ...

Now, to me, it simply looks like an OO type declaration with a closed set of subtypes; i.e.,

abstract class Statement { ... }
class Block : Statement { ... }       // block
class MethodCall : Statement { ... }  // method-call
class Assignment : Statement { ... }  // assignment

The BNF approach resembles ML datatype declarations, whose alternatives behave like a closed set of subtypes. The OO approach is more extensible (new subclasses can be added at any time) but lacks the ability to define other aspects of syntax (e.g., adjacency of elements). Also, we never really think of a type as something that drives refinement automatically (e.g., an automatic specialization from statement to method-call).
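For concreteness, the closed-set reading can be sketched as a sealed ADT; in this Scala sketch (the `describe` function is mine, purely illustrative), the compiler treats the alternatives as exhaustive:

```scala
// Closed set of alternatives, mirroring the production
// statement := block | method-call | assignment
sealed trait Statement
case class Block(body: List[Statement]) extends Statement
case class MethodCall(name: String, args: List[String]) extends Statement
case class Assignment(lhs: String, rhs: String) extends Statement

// Because the trait is sealed, a match over Statement is checked for
// exhaustiveness -- exactly the "closed set of subtypes" reading of a BNF rule.
def describe(s: Statement): String = s match {
  case Block(b)         => s"block of ${b.length} statements"
  case MethodCall(n, _) => s"call to $n"
  case Assignment(l, _) => s"assignment to $l"
}
```

Dropping `sealed` recovers the open OO reading: new alternatives can be added later, at the cost of exhaustiveness checking.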

So I'm wondering if anyone has ever tried to unify "types" with "grammars" before? I'm sure this isn't an original idea, but I'm not sure what keywords to google (term rewriting, transformational PL, ... haven't gotten me anywhere).


I suspect that the use of

I suspect that the use of compositional parsers such as boost::spirit will lead to this sort of structure; it's the closest thing I've heard of.

Hope that helps.

Ensō

I have not yet read the papers, so I'm not sure what it is about, but from the outside it seems related to the research on Ensō.

Enso or OMeta.

I agree with the apparent similarity to Enso. It also seems similar to Alex Warth's OMeta (part of the FoNC project).

I've also seen something similar done in Scheme, but I can't recall the project name.

I don't follow

The OO approach is more extensible

It isn't, as we all know. That's the good old OO-is-more-modular fallacy, which emphasizes the case axis and ignores the operation axis. For ASTs, the latter is typically at least as important. (All related to the expression problem, of course.)
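The two axes are easy to see side by side. In this Scala sketch (hypothetical `eval`/`show`, not from the thread), the ADT style makes a new operation a single local function, while a new case would force edits to every existing match; the OO encoding inverts exactly that trade-off:

```scala
sealed trait Expr
case class Lit(n: Int) extends Expr
case class Add(l: Expr, r: Expr) extends Expr

// ADT style: adding an operation is easy -- just one more function...
def eval(e: Expr): Int = e match {
  case Lit(n)    => n
  case Add(l, r) => eval(l) + eval(r)
}
def show(e: Expr): String = e match {
  case Lit(n)    => n.toString
  case Add(l, r) => s"(${show(l)} + ${show(r)})"
}
// ...but adding a case (say, Mul) requires editing eval AND show.
// OO style flips this: Mul would be one new subclass, but a new
// operation would mean touching every existing class.
```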

I'm not sure I understand what your goal is (in particular, where the "semantic analysis" comes in). It usually is straightforward to achieve a 1-to-1 correspondence between an algebraic data type for an AST and the respective abstract grammar. What is it that you are missing there?

I should have been more

I should have been more clear. In OO, you can add a subclass after the superclass is defined (hence "extensible"), while with an algebraic datatype, the set of alternatives is closed once the supertype is defined.

I'm looking for previous work in PL. Our goal is to piggyback semantic analysis on top of syntax analysis (a.k.a. parsing) with a probabilistic grammar (i.e., a PCFG). Since there are not many fuzzy type systems out there, we want to see if we can make a connection between grammar and type definitions and work from there.

What happens with recursion,

What happens with recursion, e.g., adding "block -> statement", so "statement : block"?

More seriously, there is a bunch of work on "OO grammars", though they lean to the dynamic and lazy side, so are generally less analysable.

More related seems to be, as Andreas pointed out, ADTs, as well as funny attempts at typing XML schemas.

Recursion in the has-a

Recursion in the has-a direction is fine, just not in the is-a direction. So things get weird: BNFs are usually has-a only, so the statement node will be a parent of the block node, but we could distinguish between is-a and has-a relationships in the grammar.
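In type terms the distinction is simple: has-a recursion is an ordinary recursive field, while is-a recursion would be a cycle in the inheritance graph, which compilers reject. A small Scala sketch (hypothetical `Skip`/`depth`):

```scala
sealed trait Statement
// has-a recursion: a block *contains* statements -- perfectly fine.
case class Block(body: List[Statement]) extends Statement
case class Skip() extends Statement

// is-a recursion, e.g. Statement extending Block while Block extends
// Statement, would be an inheritance cycle and does not compile.

// Has-a recursion is consumed by ordinary structural recursion:
def depth(s: Statement): Int = s match {
  case Block(body) => 1 + (body.map(depth) :+ 0).max
  case Skip()      => 0
}
```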

usually BNFs are has-a

usually BNFs are has-a only

Statically unbounded trees are useful, e.g., ASTs :) I'm guessing you mean unusual in your domain :) Put differently, "buffalo buffalo buffalo buffalo buffalo buffalo."

Using things like regexes (not even BNF) for types is rare. I believe Shriram has a whitepaper on doing this for fancy record types to describe some duck-typing patterns in JS.

The closest really seems to be XML schema types (Wadler, Meijer, and an overall cottage industry), which then you can use encodings to equate to OO types (I'm guessing what you mean by OO doesn't matter a whole lot here).

Interfaces

This is similar to what I have done in my parser generator, but instead of abstract classes I use interfaces since, in my case, alternative rules have no implementation (or fields). Interfaces also support multiple inheritance. Methods can then be added to the interface and implemented by the classes. This approach also works well with structured grammars.
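The interface reading can be illustrated with Scala traits (only my sketch of the idea, not the commenter's actual generator): each alternative rule becomes a trait, and a node form that belongs to several nonterminals mixes in several traits, which single-inheritance abstract classes could not express:

```scala
// Each nonterminal becomes a trait carrying only method signatures.
trait Statement { def pretty: String }
trait Expression { def pretty: String }

// A method call occurs both as a statement and as an expression,
// so it implements both interfaces at once.
case class MethodCall(name: String) extends Statement with Expression {
  def pretty: String = name + "()"
}
```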

I would...

...look at the work on CDuce and other XML type systems. IIRC, they feature strong support for subtyping and OO-style grammar extension, which works well since context-free grammars are closed under union. As a result, adding productions to a CFG always leaves you with a CFG.

In fact, they are based on regular tree languages, which are exactly the class of parse trees that context-free grammars produce. But by focusing on trees rather than strings, they are able to support operations that CFGs can't, like intersection and difference. Here's a link to the book on tree automata:

Tree Automata Techniques and Applications
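The core notion from that book can be sketched compactly: a regular tree language is what a bottom-up tree automaton accepts, assigning states at the leaves and propagating them upward through transitions on (label, child states). A hedged Scala sketch (all names mine):

```scala
// Trees over a ranked alphabet: a label plus children.
case class Tree(label: String, children: List[Tree] = Nil)

// A deterministic, partial bottom-up tree automaton: a transition maps
// (label, states of the children) to a state; a tree is accepted when
// the state reached at the root is final.
case class TreeAutomaton(
  delta: Map[(String, List[String]), String],
  finals: Set[String]
) {
  def run(t: Tree): Option[String] = {
    val kids = t.children.map(run)
    if (kids.exists(_.isEmpty)) None
    else delta.get((t.label, kids.map(_.get)))
  }
  def accepts(t: Tree): Boolean = run(t).exists(finals.contains)
}

// Example: boolean trees of "and"/"true"/"false"; accept those
// that evaluate to true.
val bools = TreeAutomaton(
  Map(
    ("true",  Nil)          -> "T",
    ("false", Nil)          -> "F",
    ("and", List("T", "T")) -> "T",
    ("and", List("T", "F")) -> "F",
    ("and", List("F", "T")) -> "F",
    ("and", List("F", "F")) -> "F"
  ),
  finals = Set("T")
)
```

Intersection and difference then fall out as product and complement constructions on such automata, which is the machinery behind XDuce/CDuce-style subtyping checks.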

+1 on that book, I found it

+1 on that book, I found it very easy to read

In the same vein, then,

In the same vein, then, I think the OP shouldn't miss checking out, if not done yet, James Clark's nice algorithm for RELAX NG validation(*), based on pattern derivatives and their nullable criterion. Unlike XML Schema, XML document validation via RELAX NG benefited from day one from being usefully informed by a clear and precise surrounding hedge-automata theory.

(*) which needs only a small subset of Haskell to present; the algorithm has also been translated to JavaScript by Nicolas Debeissat.
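For intuition, the derivative technique scales down from hedges to plain strings: nullable(p) asks whether p matches the empty input, deriv(p, c) is what remains of p after consuming character c, and matching is just repeated derivation. A Scala transcription of that standard string version (the RELAX NG algorithm plays the same game over trees):

```scala
sealed trait Pat
case object Empty              extends Pat // matches ""
case object Fail               extends Pat // matches nothing
case class Chr(c: Char)        extends Pat // a single character
case class Alt(l: Pat, r: Pat) extends Pat // choice
case class Cat(l: Pat, r: Pat) extends Pat // concatenation
case class Star(p: Pat)        extends Pat // repetition

// Does p match the empty string?
def nullable(p: Pat): Boolean = p match {
  case Empty     => true
  case Fail      => false
  case Chr(_)    => false
  case Alt(l, r) => nullable(l) || nullable(r)
  case Cat(l, r) => nullable(l) && nullable(r)
  case Star(_)   => true
}

// What remains of p after consuming c?
def deriv(p: Pat, c: Char): Pat = p match {
  case Empty | Fail => Fail
  case Chr(d)       => if (c == d) Empty else Fail
  case Alt(l, r)    => Alt(deriv(l, c), deriv(r, c))
  case Cat(l, r)    =>
    val d = Cat(deriv(l, c), r)
    if (nullable(l)) Alt(d, deriv(r, c)) else d
  case Star(q)      => Cat(deriv(q, c), Star(q))
}

// Matching: derive by each character, then check nullability.
def matches(p: Pat, s: String): Boolean =
  nullable(s.foldLeft(p)(deriv))
```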