Extensible Term Language 0.2.1

The Extensible Term Language (ETL) is a high-level meta-syntax language for defining languages, small or large, that use blocks, expressions, operators, and statements as their primary meta-syntax elements. The language definition is then compiled to an LL(1) grammar.

ETL tries to find a new balance between syntax generality and extensibility. It is designed to allow creating DSLs and programming languages that are almost as extensible as Lisp at the syntax level (although macros are supposed to be implemented as tree rewriting rules) while retaining a nice surface syntax (this example tries to be as close to Java as possible, and this one is somewhat close to dynamic functional languages). The parser also supports automatic error recovery.

A Java implementation of the parser is available for download. The documentation is also available online on the project's web site.

Since the previous announcement, the changes have mainly been usability improvements to the grammar definition language, which is now much more compact. Many bugs have been fixed. Finally, there is a tutorial that demonstrates how to implement your own DSL using the AST parser.

ETL might be a nice tool for quickly implementing your own DSL with a nice surface syntax and for creating new experimental programming languages.

Indentation-sensitive parsing.

With such a high-level syntax description, it seems like you could generate an indentation-sensitive parser as well. Something like that would be a great public service (I feel that more people don't define indentation-sensitive syntax simply because it's slightly more work). I imagine most "normal" grammars have only one set of multi-line nesting delimiters (normally "{" and "}"), so it seems feasible to replace them with indentation.
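A minimal sketch of that idea in Python, not taken from ETL: a preprocessing pass that turns changes in indentation into explicit "{" and "}" lines, so that a brace-based parser could consume the result. The function name indent_to_braces is hypothetical, and the sketch assumes space-only indentation and ignores blank lines, comments, and line continuation.

```python
def indent_to_braces(source: str) -> str:
    """Rewrite indentation-delimited blocks as {}-delimited ones."""
    out = []
    stack = [0]  # indentation widths of the currently open blocks
    for raw in source.splitlines():
        if not raw.strip():
            continue  # blank lines neither open nor close a block
        width = len(raw) - len(raw.lstrip(" "))
        if width > stack[-1]:
            # deeper indentation opens a new block
            stack.append(width)
            out.append("{")
        while width < stack[-1]:
            # shallower indentation closes every block that is deeper
            stack.pop()
            out.append("}")
        if width != stack[-1]:
            raise SyntaxError(f"inconsistent dedent: {raw!r}")
        out.append(raw.strip())
    # close the blocks that are still open at the end of the input
    out.extend("}" for _ in stack[1:])
    return "\n".join(out)


print(indent_to_braces("if a\n    b\n    if c\n        d\ne"))
```

This only works cleanly when a block is the last thing on a logical line, which is the restriction discussed in the replies below.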

Python-like phrase syntax

I had a Python-like phrase syntax at some point during internal development. I'm not sure, but it could even be in the sf.net CVS for the xc4j project. Actually, I came up with the idea of phrase syntax and statements mostly during that period of work on the project, since without phrase syntax it is not possible to create a Python-like language. I have found other uses for phrase syntax since then, for example error recovery.

Before that I was trying to get a Dylan-like approach working, but with very little success. Dylan also has a kind of phrase syntax, but it is not as generic.

Because of problems with lambdas, I gave up on the Python-like phrase syntax. It is not difficult to recreate it using the current mechanics; I just feel that an indentation-insensitive {}-based syntax has more applications. It is easier to do lambda expressions, and code generators do not have to be too careful about the code they emit. It is also more predictable and easier to use.

If you wish, you could even fork the project. Basically, the phrase parser has to be redone, and the grammar compiler has to be twisted a bit to allow statements to span several segments.

You tried it from the wrong end

Fitting block structures into expressions doesn't work well but turning each expression into a compound statement is easy.

Just add the following rule to Python's grammar:

thunk_stmt: small_stmt ':' suite

and connect it with the rest

compound_stmt: if_stmt | while_stmt | ... | classdef | thunk_stmt

Examples:

1. Property definition

    x = property:
        def fget(self):
             return self._x
        def fset(self, value):
             self._x = value

2. Anonymous functions

    r = 42
    a = yield fn(x,y):
              z = y-x*r
              return z

The modified grammar isn't LL(1) anymore, though.

I still see the problem with it

There is no problem with having blocks and having an LL(1) grammar. The block should simply be treated as a postfix operator in the expression, or as part of a primary operator in the expression. In the ETL tutorial, for example, I define both kinds of blocks. It was done with C-like blocks, but the solution is still applicable; it was developed for Python-like syntax, after all.

The problem arises because in a Python-like syntax the block has to be the last component of the expression, so it cannot be used in the middle of it, and there are cases where that is desirable.

As a quick generality check, you could try the letrec loop pattern for Java:

new Object() {
  boolean isOdd(int i) {
    if(i < 0) i = -i;
    return i == 0 ? false : isEven(i-1);
  }
  boolean isEven(int i) {
    if(i < 0) i = -i;
    return i == 0 ? true : isOdd(i-1);
  }
}.isOdd(5);

Another thing is that there are uses for objects inside expressions. Consider the let and if statements. With open/close blocks I could easily do the following (using the calculator grammar from the tutorial):

assert {?a,b; assert a > b; a * b;} ({let a = {let b = 1; b + b;}; a * a;}, 3) == 12;

Here we have a block used to create a lambda expression ({? a,b; ... }) that checks its arguments before multiplication, and a block expression with "let" statements as the first argument of the function invocation.

Try to encode it as a one-liner in Python-like syntax while keeping the same AST and order of operations.

It is possible to refactor the expression so that it is expressible in Python syntax by splitting its components into different statements, but the fact that the phrase syntax so strongly affects expression and statement syntax is what I did not like about it.
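For illustration, the split into statements might look roughly like this in plain Python. This is a sketch that assumes the calculator semantics suggested above (a block evaluates to its last expression, and "let" binds a local name); the names checked_product and first_argument are invented here, precisely because Python forces the two blocks to be named and moved out of the expression.

```python
def checked_product(a, b):
    # Stands in for the lambda block {?a,b; assert a > b; a * b;}
    assert a > b
    return a * b


def first_argument():
    # Stands in for the block {let a = {let b = 1; b + b;}; a * a;}
    b = 1
    a = b + b
    return a * a


# The single calculator expression becomes two definitions plus a call.
result = checked_product(first_argument(), 3)
assert result == 12
```

The value is the same, but the AST is not: the blocks are no longer subexpressions of the assertion.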

Of course I agree that

Of course I agree that placing blocks in the middle of an expression causes technical problems in languages that delimit blocks using indentation.

I also agree with you about preserving LL(1). If one applies a slightly more extensive modification to Python's grammar, it will be LL(1) again. These are the rewrites:

stmt: compound_stmt
single_input: NEWLINE | compound_stmt NEWLINE
compound_stmt: if_stmt | while_stmt | for_stmt | try_stmt | with_stmt | funcdef | classdef | thunk_stmt
thunk_stmt: small_stmt ( ':' suite | (';' small_stmt)* [';'] NEWLINE )
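The effect of this left-factoring can be checked on a toy version of the grammar. The Python sketch below is only an illustration under strong assumptions: small_stmt is reduced to a single NAME token, if_stmt stands in for all keyword-led statements, the ';' repetition is dropped, and thunk_tail is a helper nonterminal invented here to express the parenthesized group. It does not verify the claim for the full Python grammar. For a grammar without empty productions, LL(1) means that the alternatives of every nonterminal start with disjoint token sets.

```python
def first_sets(grammar):
    """FIRST sets for a grammar that has no empty productions."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                head = alt[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def ll1_conflicts(grammar):
    """Map each nonterminal to the tokens shared by two of its alternatives."""
    first = first_sets(grammar)
    conflicts = {}
    for nt, alternatives in grammar.items():
        seen = set()
        for alt in alternatives:
            head = alt[0]
            starts = first[head] if head in grammar else {head}
            if seen & starts:
                conflicts.setdefault(nt, set()).update(seen & starts)
            seen |= starts
    return conflicts


COMMON = {
    "compound_stmt": [["if_stmt"], ["thunk_stmt"]],
    "if_stmt": [["if", "small_stmt", ":", "suite"]],
    "small_stmt": [["NAME"]],
    "suite": [["NEWLINE", "INDENT", "stmt", "DEDENT"]],
}
# First proposal: thunk_stmt added next to the existing simple_stmt.
BEFORE = dict(COMMON,
              stmt=[["simple_stmt"], ["compound_stmt"]],
              simple_stmt=[["small_stmt", "NEWLINE"]],
              thunk_stmt=[["small_stmt", ":", "suite"]])
# Rewritten proposal: simple statements folded into thunk_stmt.
AFTER = dict(COMMON,
             stmt=[["compound_stmt"]],
             thunk_stmt=[["small_stmt", "thunk_tail"]],
             thunk_tail=[[":", "suite"], ["NEWLINE"]])

print(ll1_conflicts(BEFORE))  # stmt cannot choose an alternative on NAME
print(ll1_conflicts(AFTER))   # no conflicts in the toy grammar
```

In the first version both alternatives of stmt can begin with a small_stmt, so one token of lookahead cannot decide between them; after the rewrite the decision is postponed until the ':' or NEWLINE that follows.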

What I do not agree with is that this is also a design problem, though that is a fuzzy target. As I see it, you prefer JavaScript-like syntax over Python's, or aim at maximal expression density per line, so it is no wonder that I couldn't convince you that my claim makes sense.

Yes. The difference is in value judgements

Yes. It looks like we understand the problem in almost the same way, but we assign different values to the different parts of the mosaic. And your claim makes sense. If I were designing a fixed imperative programming language, the Python syntax might have won. But I'm approaching a somewhat wider class of problems.

I value the freedom to use blocks in the middle of an expression, since it puts fewer constraints on the language designer. Blocks and block-like constructs are used a lot in functional languages. C/Java/JavaScript-like blocks just enable such usages with the least hassle, from what I can see.

I also value elegance and friendliness to touch-typing. On these points Python-like syntax wins (but only in cases where it supports the required syntax as well).

I just value the first group more than the second one. You possibly value the second group over the first one.