Non-Lisp, natively extensible?

What languages that actually have a richer, less uniform syntax than Lisp provide good support for adding new syntactic features? The ability to introduce new special forms in Lisp is awesome... what about the same in a less homogeneous language? I understand it would be difficult to open up a language's grammar like that, but is there any work on this, using parsing expression grammars or something of that sort?



I hear OCaml and Logix have some support for these things.



Oddly enough, there are some projects that help you create syntax extensions for Java. There's a thread about it here.

Open Java


I recently posted a forum topic about Logix, a "multi-language programming system" which supports dynamic syntax extension. It was previously discussed on LtU here as well. Apparently Nemerle also supports syntax extension (although I haven't actually looked at it yet). And let's not forget OCaml, which allows syntax extension via Camlp4 (unless you don't consider the application of a pre-processor to be "native" extension).

O'Caml Nitpick

Allan McInnes: And let's not forget OCaml, which allows syntax extension via Camlp4 (unless you don't consider the application of a pre-processor to be "native" extension).

It's rather unfortunate that they call it a pre-processor, because it really isn't one: it is a proper separation of the surface-syntax-to-abstract-syntax transformation from the rest of the compiler. In other words, your camlp4 syntax extensions are not parsed again when they reach the rest of the compiler. The compiler operates on the ASTs generated by camlp4 just as surely as it operates on the ASTs it builds from the standard syntax when you don't use camlp4.
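The same separation can be sketched in Python, whose built-in `compile()` accepts an AST object directly. This is not camlp4; it only illustrates the idea that two surface syntaxes can feed one back end without any re-parsing. The prefix notation and the `parse_prefix` function are hypothetical, made up for this example.

```python
import ast

def parse_prefix(src: str) -> ast.Expression:
    """Parse a hypothetical prefix syntax such as "(+ 1 (* 2 3))" straight
    into a Python AST, bypassing Python's own parser entirely."""
    tokens = src.replace("(", " ( ").replace(")", " ) ").split()
    ops = {"+": ast.Add(), "-": ast.Sub(), "*": ast.Mult()}

    def expr(pos):
        tok = tokens[pos]
        if tok == "(":
            op = ops[tokens[pos + 1]]
            left, pos = expr(pos + 2)
            right, pos = expr(pos)
            if tokens[pos] != ")":
                raise SyntaxError("expected ')'")
            return ast.BinOp(left=left, op=op, right=right), pos + 1
        return ast.Constant(int(tok)), pos + 1

    node, end = expr(0)
    if end != len(tokens):
        raise SyntaxError("trailing tokens")
    # The back end needs line/column info, so fill in defaults.
    return ast.fix_missing_locations(ast.Expression(body=node))

# Two surface syntaxes, one back end: both trees go to the same compile().
tree_infix = ast.parse("1 + 2 * 3", mode="eval")
tree_prefix = parse_prefix("(+ 1 (* 2 3))")
print(eval(compile(tree_infix, "<infix>", "eval")))    # 7
print(eval(compile(tree_prefix, "<prefix>", "eval")))  # 7
```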

Thanks for the clarification.

Thanks Paul. I've never actually used Camlp4, only read about it in a general manner. I appreciate your clarification of the way it operates.

No worries... It just occurred to me that not only might someone object to counting a "pre-processor," but that camlp4 actually tends to be under-appreciated relative to, say, Lisp macros. Once in a while someone will say something like "gee, wouldn't it be nice if compiler developers would separate the parser from the rest of the compiler so we could use any surface syntax that generated an appropriate AST," not realizing that that's exactly what the O'Caml team did.

Still worried

camlp4 actually tends to be under-appreciated relative to, say, Lisp macros

Isn't that because O'Caml's AST seems to be under-documented? No offense meant, but are the quotations for creating abstract syntax trees "the" documentation of the AST? If so, they seem to be at somewhat the wrong level of abstraction :-(

Separating parser and compiler

Well, it is certainly useful that OCaml separates the parser from the rest of the compiler. However, I'm not sure that letting developers use a different concrete syntax at the file level is very helpful, particularly if you are using source-level version control (e.g., CVS). It would be better if syntax changes were part of an editor view, and the underlying file stayed in a canonical syntax. Still, camlp4 would make implementing such a thing considerably easier. Can you use it as a library, or only as a standalone tool?
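A rough sketch of the "editor view over a canonical file" idea, using Python's `ast` module rather than camlp4: the file keeps ordinary infix syntax, and the editor renders the parsed tree in an alternative notation. The prefix view and `to_prefix_view` are hypothetical and only handle simple arithmetic; `ast.unparse` requires Python 3.9 or later.

```python
import ast

OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

def to_prefix_view(node):
    """Render an arithmetic expression AST in a hypothetical prefix 'view'
    syntax; the file under version control keeps the canonical infix form."""
    if isinstance(node, ast.Expression):
        return to_prefix_view(node.body)
    if isinstance(node, ast.BinOp):
        op = OPS[type(node.op)]
        return f"({op} {to_prefix_view(node.left)} {to_prefix_view(node.right)})"
    if isinstance(node, ast.Constant):
        return repr(node.value)
    if isinstance(node, ast.Name):
        return node.id
    raise NotImplementedError(type(node).__name__)

canonical = "a + b * 2"                 # what version control sees
tree = ast.parse(canonical, mode="eval")
print(to_prefix_view(tree))             # (+ a (* b 2))
print(ast.unparse(tree))                # a + b * 2
```

Diffs and merges then always happen on the canonical text, whatever view each developer prefers.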

Homogenous, but not Lisp...

... is Maude, which uses its equational rewriting logic to implement a more interesting, extensible mixfix parser.
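Maude's mixfix parsing works from user-declared operator signatures and is far more general than this, but a much simpler relative of the idea is an operator-precedence parser whose operator table can be extended at run time. The sketch below is a precedence-climbing parser in Python; the `ExtensibleParser` class and its `add_infix` method are hypothetical names, not any real library's API, and it evaluates as it parses to stay short.

```python
import re

class ExtensibleParser:
    """Toy precedence-climbing parser whose infix operators can be added
    at run time. Each operator carries its own semantic action."""

    def __init__(self):
        self.infix = {}  # symbol -> (precedence, right_assoc, action)

    def add_infix(self, symbol, prec, action, right_assoc=False):
        self.infix[symbol] = (prec, right_assoc, action)

    def tokenize(self, src):
        # Longest operators first, so "**" is preferred over "*".
        ops = sorted(self.infix, key=len, reverse=True)
        alternatives = [r"\d+", r"\(", r"\)"] + [re.escape(op) for op in ops]
        regex = re.compile(r"\s*(" + "|".join(alternatives) + r")")
        src, pos, tokens = src.rstrip(), 0, []
        while pos < len(src):
            m = regex.match(src, pos)
            if not m:
                raise SyntaxError(f"unexpected input: {src[pos:]!r}")
            tokens.append(m.group(1))
            pos = m.end()
        return tokens

    def parse(self, src):
        self.tokens, self.pos = self.tokenize(src), 0
        value = self._expr(0)
        if self.pos != len(self.tokens):
            raise SyntaxError("trailing input")
        return value

    def _atom(self):
        if self.pos >= len(self.tokens):
            raise SyntaxError("unexpected end of input")
        tok = self.tokens[self.pos]
        self.pos += 1
        if tok == "(":
            value = self._expr(0)
            if self.pos >= len(self.tokens) or self.tokens[self.pos] != ")":
                raise SyntaxError("expected ')'")
            self.pos += 1
            return value
        return int(tok)

    def _expr(self, min_prec):
        left = self._atom()
        while self.pos < len(self.tokens) and self.tokens[self.pos] in self.infix:
            prec, right_assoc, action = self.infix[self.tokens[self.pos]]
            if prec < min_prec:
                break
            self.pos += 1
            right = self._expr(prec if right_assoc else prec + 1)
            left = action(left, right)
        return left

p = ExtensibleParser()
p.add_infix("+", 10, lambda a, b: a + b)
p.add_infix("*", 20, lambda a, b: a * b)
print(p.parse("1 + 2 * 3"))    # 7
# Extend the syntax at run time with a right-associative power operator.
p.add_infix("**", 30, lambda a, b: a ** b, right_assoc=True)
print(p.parse("2 ** 3 ** 2"))  # 512
```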


The ones that expose their compilers, that is.

The key, though (and one that most language systems, including Lisp systems, don't get right), is that after one modifies the language, the debugger should be modified too, so that debugging can be done at the same syntactic level as the code the user writes. Smalltalk, which generally has the debugger source available, is very good at this. Most Lisp systems don't expose their debugger code, so it's pretty hard to hack it to work at anything other than the primitive level. After I've written a good compile-level transform, that sometimes feels as troublesome as adding some syntax to a C compiler and then making the user debug that syntax at the assembly level.
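A full debugger extension is beyond a short example, but one prerequisite can be sketched: a transform should carry the user's source positions onto the code it generates, so that the runtime reports errors in the user's terms. The Python sketch below rewrites `a / b` into a call to `checked_div`, a hypothetical runtime helper, and uses `ast.copy_location` so the traceback still points at the user's line rather than at generated code. This says nothing about how Smalltalk or Lisp debuggers actually work.

```python
import ast
import traceback

def checked_div(a, b):
    # Hypothetical runtime helper that the rewritten code calls.
    if b == 0:
        raise ZeroDivisionError(f"division of {a!r} by zero")
    return a / b

class DivToCall(ast.NodeTransformer):
    """Rewrite `a / b` as `checked_div(a, b)`, copying the original source
    location onto the new node so errors point at the user's code."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # rewrite nested divisions first
        if isinstance(node.op, ast.Div):
            call = ast.Call(func=ast.Name(id="checked_div", ctx=ast.Load()),
                            args=[node.left, node.right], keywords=[])
            return ast.copy_location(call, node)
        return node

source = "x = 10\ny = 0\nz = x / y\n"
tree = DivToCall().visit(ast.parse(source, filename="user.py"))
ast.fix_missing_locations(tree)  # gives the new Name node its parent's position
code = compile(tree, "user.py", "exec")

def failing_line():
    """Run the transformed code and report which user line failed."""
    try:
        exec(code, {"checked_div": checked_div})
    except ZeroDivisionError as exc:
        frames = [f for f in traceback.extract_tb(exc.__traceback__)
                  if f.filename == "user.py"]
        return frames[-1].lineno
    return None

print(failing_line())  # 3
```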


I think any language that separates AST generation from bytecode compilation and provides an API to its ASTs should be suitable. I dug into the Python standard library and found a function called list2ast() (and the converse ast2list()) that converts a nested list of certain objects into an AST (and back), which can then be compiled into Python bytecode. Since there are pure-Python parsers that, given a grammar, parse source into an AST, extending the language needs only changes to the grammar file plus AST-to-AST transformers (and, for convenience, a few other files of the standard library). I don't know much about the internals of Ruby, which seems to be your favourite language at the moment, but I would try to find out whether something similar is possible without touching any of the interpreter's C sources.
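In current Python 3, AST-to-AST rewriting is usually done with the standard `ast` module: `ast.parse` builds the tree, an `ast.NodeTransformer` subclass rewrites it, and the built-in `compile()` accepts the result directly. The sketch below gives `>>` a hypothetical "pipeline" meaning, so that `x >> f` becomes `f(x)`. Because Python's grammar is not changed, this can only reinterpret syntax that already parses; a genuinely new construct would still need grammar changes.

```python
import ast

class PipeRewriter(ast.NodeTransformer):
    """Hypothetical dialect: treat `x >> f` as a pipeline and rewrite it
    to the call `f(x)`. This changes the meaning of `>>` everywhere."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # handle chains such as `x >> f >> g`
        if isinstance(node.op, ast.RShift):
            call = ast.Call(func=node.right, args=[node.left], keywords=[])
            return ast.copy_location(call, node)
        return node

def run_dialect(source, env):
    """Parse, transform, compile and evaluate one dialect expression."""
    tree = PipeRewriter().visit(ast.parse(source, mode="eval"))
    ast.fix_missing_locations(tree)
    return eval(compile(tree, "<dialect>", "eval"), env)

print(run_dialect("[3, 1, 2] >> sorted", {}))               # [1, 2, 3]
print(run_dialect("' hello ' >> str.strip >> str.upper", {}))  # HELLO
```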

Tcl and Pop-11

Tcl allows syntactic extension in that it does very little of its own parsing beyond basic tokenization. This leaves individual commands with a very free hand to implement new syntax (within scripts passed to that command). This is used e.g. by the critcl extension, which allows direct embedding of C code into Tcl:

critcl::cproc quadruple {int i} int {
    return i * 4;    /* this is C code */
}

I believe there are similar extensions for Perl, but I don't know whether they look as integrated as critcl code does. You can also override [source] if you want to try global syntax alterations, but I prefer the per-command approach, as it at least delimits the scope of the new syntax somewhat. Tcl lacks a really good parser framework, though, so most syntax extension is fairly conservative.
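The per-command approach rests on one rule: a braced word reaches its command as raw, unparsed text, and the command decides what that text means. The Python sketch below is loosely modelled on that bracing rule only; it has no substitution, quoting or command separators, so it is not how Tcl's parser really works. The `set` and `record` commands and the `key: value` body syntax are hypothetical.

```python
def split_words(text):
    """Split a command into words. A {...} word is kept as raw text, with
    nested braces allowed, loosely following Tcl's bracing rule."""
    words, i = [], 0
    while i < len(text):
        if text[i].isspace():
            i += 1
        elif text[i] == "{":
            depth, start = 1, i + 1
            i += 1
            while depth:
                if i >= len(text):
                    raise SyntaxError("missing close-brace")
                if text[i] == "{":
                    depth += 1
                elif text[i] == "}":
                    depth -= 1
                i += 1
            words.append(text[start:i - 1])
        else:
            start = i
            while i < len(text) and not text[i].isspace():
                i += 1
            words.append(text[start:i])
    return words

COMMANDS = {}

def command(name):
    """Register a function as a command under the given name."""
    def register(fn):
        COMMANDS[name] = fn
        return fn
    return register

@command("set")
def cmd_set(interp, name, value):
    interp[name] = value
    return value

@command("record")
def cmd_record(interp, name, body):
    # This command gives its braced body a private "key: value" syntax.
    fields = {}
    for line in body.strip().splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    interp[name] = fields
    return fields

def evaluate(interp, command_text):
    name, *args = split_words(command_text)
    return COMMANDS[name](interp, *args)

env = {}
evaluate(env, "set greeting hello")
evaluate(env, """record point {
    x: 3
    y: 4
}""")
print(env)  # {'greeting': 'hello', 'point': {'x': '3', 'y': '4'}}
```

Because the new syntax lives inside one command's braces, it cannot leak into the rest of the script, which is the delimiting effect described above.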

Pop-11 provides a full interface to the workings of its incremental compiler as a suite of library procedures, allowing unlimited syntax extension (and wholesale modification). From the comp.lang.pop FAQ:

Because Poplog incremental compiler facilities are available in a suite of Pop-11 procedures which compile code incrementally to a powerful general purpose virtual machine, it is not hard to develop new languages or extensions to old ones. This is how Common Lisp, Prolog and ML are implemented in Poplog, and many people have implemented various extensions to Pop-11 and other languages based on Pop-11.

Forth (and Factor, too)

Forth is lower level than Lisp, but extensibility is the name of the game.


No one's mentioned it yet, I think.
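Part of why Forth is so extensible is that even "syntax" such as `:` (define a word) and `(` (comment) is just a word that reads ahead in the input stream, so new syntax can be added the same way as any library word. The Python sketch below shows only that idea. Real Forth compiles definitions into threaded code and has immediate words; this toy simply stores token lists, and the `Forth` class is hypothetical.

```python
class Forth:
    """Toy Forth-like interpreter. Every word, including ':' and '(', is an
    ordinary dictionary entry that may consume upcoming input tokens."""

    def __init__(self):
        self.stack = []
        self.tokens = iter(())
        self.words = {
            "+": lambda: self.push(self.pop() + self.pop()),
            "*": lambda: self.push(self.pop() * self.pop()),
            "dup": lambda: self.push(self.stack[-1]),
            ":": self.define,
            "(": self.comment,
        }

    def push(self, x):
        self.stack.append(x)

    def pop(self):
        return self.stack.pop()

    def define(self):
        # ':' reads the new word's name, then its body up to ';'.
        name = next(self.tokens)
        body = []
        for tok in self.tokens:
            if tok == ";":
                break
            body.append(tok)
        self.words[name] = lambda: self.run_tokens(body)

    def comment(self):
        # '(' skips input up to ')': new syntax written as an ordinary word.
        for tok in self.tokens:
            if tok == ")":
                break

    def run_tokens(self, toks):
        saved, self.tokens = self.tokens, iter(toks)
        try:
            for tok in self.tokens:
                if tok in self.words:
                    self.words[tok]()
                else:
                    self.push(int(tok))
        finally:
            self.tokens = saved

    def run(self, source):
        self.run_tokens(source.split())
        return self.stack

print(Forth().run(": square dup * ; ( squares then adds ) 3 square 4 square +"))  # [25]
```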


You'll want to have a look at Pliant at its new home. It's Lispy underneath, but has quite a developed syntax on top. Its author manages to have most of his cake and eat it too.


Might as well mention Rebol again. Extending the language by creating new "dialects" is a major part of Rebol.