Grammar Visualization

An interesting visual comparison of the grammars of Ruby, Java 1.5 and Javascript.

Anyone care to interpret the graphs?

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Some random reflections

I'm not sure if one can draw any sensible conclusions from these graphs but for what it's worth here's my thoughts.

I think the Ruby grammar reflects the situation of a language which have been revised over and over again, trying to allow more and more syntactic quirks. And this while not having stepped back and rethought the grammatical structure from the beginning to make it more general. Compare with the JavaScript grammar which looks really nice.

You can see a similar tendency in the Java graph towards the bottom. I was actually surprised to see this, especially when it comes to the new annotation language in Java 1.5. I think it has a very nice syntax which blends nicely with the rest of Java. I would have expected more sharing with the rest of the grammar. But I think this is due to the fact that ANTLR (together with many other parsing tools) doesn't have higher order productions so the possible sharing is limited.

It seems that ANTLR doesn't support operator precedences. They introduce artifacts in the graph (and hence in the grammar) which I just find confusing. They produce long chains in the graphs which doesn't really say anything about the structure of the grammar. Several of the grammars could have benefited form having that.

I'm really impressed with the simplicity of C's grammar. I hadn't looked at it before. Python's is also surprisingly neat. But simplicity of the grammar might not be the most important goal when designing a syntax, otoh.

I'm really impressed with

I'm really impressed with the simplicity of C's grammar.

the c grammar is actually ambiguous. the parse phase requires semantical input. this makes parsing such a grammar rather difficult, as the job requires good knowledge of the language specification.

in this regard the simplicity of a language's grammar structure can be rather misleading.

In what sense does it not support operator precedence?

Do mean that you can't describe a set of operators with one BNF rule, and then describe what order they should be evaluated in?

AFAIK, ANTLR aims to produce essentially what you'd write if you wrote a recursive descent parser by hand. Doing operator precedences by hand is painful and is a bit of wart from a code perspective, when it can be done by simply adding another BNF rule. In fact, I find the BNF way of setting up precedence to be a very "orthogonal" method when designing grammars, although it may make things less pretty when looking at a graph of the grammar.

Danger, WR: Precedence

The Javascript grammar looks cleaner, but I think this is mostly because it is stretched out by that long chain down the middle. And that corresponds to a zillion level of precedence, practically one for each operator (!). This just goes to show that visual appeal is only vaguely related to being easy to understand. I'm tempted to give Perl's grammar the same treatment. I suspect it would look fairly clean, since it also has many levels of precedence, and since so much work is done in the sentient lexer.

I may be mis-reading you..

..but what's so hard to understand about Javascript's syntax? I find it rather straight forward, personally, and it's one of my favorite "get stuff done" languages. (If only it wasn't so hampered by it's usual operating environments aka web browsers...)

I meant that it's "hard to

I meant that it's "hard to understand" by LtU standards, where difficulty simply increases with the number of distinct syntactic classes. For example, Scheme and Smalltalk are "easy" because they have a small number of syntactic forms (the fact that many people are confused by the profusion of parentheses or lack of operator precedence doesn't matter), while C/Java is "hard" because it has many more syntactic forms (that most people have already learned many of these elsewhere is not relevant). Basically, "hard" in this context means "hard to write a parser for it" or "hard to remember what happens in edge cases".

What is beauty?

To most untrained observers, a graph looks "clean" or "simple" if:

  • it is planar,
  • its edges are short and do not vary too much in length,
  • its vertices have low in/out degrees,
  • its edges and vertices are well spread out over the available space without clustering

It is not at all clear that any of these criteria confer any useful properties to the grammar or the usability of the language described by it. For instance, a highly connected vertex could indicate a language element that can be used in many places, which increases (perceived) orthogonality in the language.