Syntactic abstraction? (newbie question)

Hi,

I'm trying to find out about languages/papers which deal with modifying the PL's syntax within itself. For example: incorporating EBNF so it can dynamically extend it's grammar. I'm not sure, but I was wondering if this is covered by the term "syntactic abstraction"?

My background is very much imperative languages, but I've done a fair bit with parsers, and I'm learning some Lisp. My web searches have only yielded "Boo" and S-expression langages so far.

Any help gratefully received.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Logix

You might want to look at Tom Locke's Logix language. Logix is described as

[f]eaturing a procedural macro facility, dynamic syntax extension, and multi-language parsing. A new operator, complete with syntax and semantics, can be added on-the-fly with a single line of code.
It was previously mentioned on LtU here and here.

Livelogix is not alive

Logix development was abandoned already in 2005.

When you want to have a macro system reified in non S-expr language there is Converge but also MetaLua. A true EBNF based extensibility of Python is provided by EasyExtend. However, with regard to the language, I wouldn't say it's in itself since I didn't saw the point of confusing the levels.

Discussed many times before

Specifically extensible syntax:

1. Katahdin: Modifying your programming language as it runs

2. Parser that allow syntax extensions

There are other items of interest you can google for: dypgen, which Felix is now using to support extensible syntax, "Formal Islands", the Nemerle programming language (which has more powerful macros than Boo), MetaOCaml (multistage programming instead of macros), Dylan programming language macros, and so on. I'm sure others can fill in some more. :-)

Macros

Usually "syntactic abstraction" refers to macros (and related facilities) that operate within the language and allow controled extension and addition of syntax. It seems to me that the ability to change the syntax per se, is not enough for something to be considered a syntactic abstraction facility.

Another puzzle: does the word abstraction in "syntactic abstraction" refer to the fact that adding new syntax is a way to create abstractions, or do we require that the syntax extension mechanism (e.g., macros) provde abstraction properties (e.g., work well with scoping, etc.)?

Good question

Ehud wrote: does the word abstraction in "syntactic abstraction" refer to the fact that adding new syntax is a way to create abstractions, or do we require that the syntax extension mechanism (e.g., macros) provde abstraction properties (e.g., work well with scoping, etc.)?

I think, probably, neither, but I don't suppose there is a great deal of consensus on the matter. The latter should probably be qualified by something like "semantics-respecting syntactic abstraction", the former is an observation about a property a language has, rather than the property itself.

I think syntactic abstraction means "possesses abstractions for representing, manipulating and putting to use the PL syntax": this, I think, is the sense that CLers have in mind when they talk about CL having good syntax abstraction.

A counter question: if we allow that CL has syntactic abstraction, does Tcl?

abstracting over syntax

I think, probably, neither, but I don't suppose there is a great deal of consensus on the matter. The latter should probably be qualified by something like "semantics-respecting syntactic abstraction"

Isn't "semantics-respecting" more or less what the term "hygienic" is for? I agree that the qualification is necessary - languages with "dirty" macros still have syntactic abstraction, even if it's unsafe.

The characterization of syntactic abstraction that I'm most familiar was expressed by Matthias Felleisen on the LL1 list:

Macros exist for the very same reason that lambda exists (or Lambda): to abstract over patterns.

lambda abstracts over values.
Lambda abstracts over types.
Syntax abstracts over syntax.

A phase separation (theorem) should ensure that all three of them abstract at the right time and in a way that doesn't affect later time.

"Abstracts over syntax" implies Charles' "possesses abstractions for representing, manipulating and putting to use the PL syntax". Syntactic abstractions take syntax as input and produce syntax as output. I take the "should" in Felleisen's last sentence literally — he didn't say "must"!

A counter question: if we allow that CL has syntactic abstraction, does Tcl?

The existence of TMac, and the fact that it works by preprocessing Tcl source, would imply an answer of "not so much" at least. But I don't really know what the creation of a new syntactic abstraction would look like in pure Tcl. I'm sure someone can horrify me with an example!

Seems to me that "abstracts

Seems to me that "abstracts over syntax" implies that "the syntax extension mechanism (e.g., macros) provide abstraction properties" and that "a phase separation (theorem)" means that the construct should respect scoping - the properties I referred to.

defmacro?

Seems to me that "abstracts over syntax" implies that "the syntax extension mechanism (e.g., macros) provide abstraction properties"

Agreed.

and that "a phase separation (theorem)" means that the construct should respect scoping

Wouldn't that rule out CL's defmacro?

I'd argue that CL macros

I'd argue that CL macros fail the phase separation test, which means that they could be improved upon. Which is exactly what Scheme macros did...

Why the need for phase

Why the need for phase separation?

Confused

I don't follow Felleisen's post. Is he using "abstracts" as a synonym for "quantifies"?

Tcl's ability to define new constructs is pretty powerful, if you don't mind some extraneous $ signs appearing. It's cleaner than UNIX shell, but there is plenty of opportunity for horror.

Thanks

Plenty of reading material there -- in particular Chris Seaton's thesis. Looks like I came to the right place :-)

Thanks for the help.

Felix inline syntax extension

Just a reminder Felix provides inline syntax extension utilising the (Ocaml encoded) Dypgen extensible GLR parser. Almost the entire grammar is defined in the library. User actions of productions are generally written in (Ocaml encoded) OCS Scheme.

Although Felix has an ML like native syntax, this mechanism allows many familiar C constructions to be parsed and translated. Here is a snippet:


syntax csyntax {
  requires statements;
  ...
  c_type := 
    | pointer_type =># "_1"
    | c_type lpar star rpar lpar c_type_list rpar =>#
      "`(ast_longarrow (,_6 ,_1))"           // cfunction
  ;
  ...
}

The DSSL (Domain Specific Sub-Language) csyntax must be opened to enable parsing, the requirement on DSSL statements ensure that DSSL is loaded before csyntax.


open syntax csyntax;
int x = 1;

Some work has commenced to provide an SQL syntax module.

d-expressions, dave moon, perl 6, ...

You might also want to check out Johnathan Bachrach's D-Expressions: Lisp Power, Dylan Style, the Java Syntactic Extender, Perl 6 rules, and Dave Moon's suggestions for extensible syntax in the Arc suggestions (there are two posts by Moon on syntax in there).

Forth

Forth (and many of its descendants, my favorite of which is Factor) have a parser which is fully written in the language. Macrology works by simply extending that parser. It's possible, in theory, to change everything down to the parser's meaning of a word. (A word is a whitespace-delimited token which operates as a function does in other languages, except it operates on the stack.) But usually, when extending the syntax, we just add new immediate words (or, in Factor's terminology, parsing words) which are basically functions, called with normal function syntax, which execute at parsetime. (Of course, parsetime can be anytime!)

As an example, here's the definition in Factor of the word used for most word definitions, : (colon)

: :
    CREATE dup reset-generic parse-definition define-compound ; parsing

Oooh, see the metacircularity! The first colon in that definition begins the definition, and the second colon specifies the name. This may sound like a low level way of doing macros, but in my experience, it's no more difficult than CL or Scheme macros. This is because the parser has a high level interface for those who don't want to go into its depths.

So far, this completely self-hosted parsing style, where the parser is written in user level code, has been done in just a few languages, mostly in the Forth family. Looking at the above comments (shoulda done that before posting initially, where I got this wrong) it looks like a few other programming languages are catching up to Forth in this respect. I think it's a really interesting feature which allows for more flexibility in certain cases than traditional macro systems.

...and back.

So far, this completely self-hosted parsing style, where the parser is written in user level code, has only been done in languages from the Forth family, as far as I know. But there's no reason why that has to be (except that Forth/Factor present a particularly easy case for this sort of parsing, as their syntax is extremely simple).

Many language implementations have parsers that are "fully written in the language", in user-level code. Many self-hosted language implementations would qualify on that count.

However, for languages with syntaxes more complex than Forth-like syntax, making it possible to usefully (and safely) extend that parser, without having to directly modify its source, requires some work.

Typically, the way syntax extension is achieved in such languages is to provide a parser whose rule set can be extended or replaced in user code. That's the approach taken by both Camlp4 and Scheme, for example, and quite a few more experimental languages.

In such cases, whether the parser itself is written in user-level code isn't particularly relevant, because extension is done by adding rules using (typically) a DSL provided by the parser. Although the Camlp4 parser is written in OCaml, and many Scheme macro systems are written in Scheme, it wouldn't be very practical to extend syntax by extending the parser itself. That is, unless you count the ruleset as part of the parser, in which case that's exactly what those languages let you do.

Kinda

Soon after making that comment, I edited it, because I realized it wasn't completely right. One thing that I was unclear about was what I meant by "user level code." GHC is a completely (as far as I can tell) self-hosted GHC implementation, yet I wouldn't call its code "user level" because, to change something about it, you have to go hack GHC; you can't just add something to your program which changes the way the compiler or parser works. It looks like this is the way Camlp4 works (or maybe it's just a preprocesor; I can't tell because I haven't looked into it much), and also Katahdin works this way, so I have to retract my earlier comment. But as far as I can tell, Scheme/Common Lisp/Dylan macros don't; the language is still fixed to use mostly S-expressions (or D-expressions), and reader macros have limits. But you're right that this doesn't really matter in practice. It's just a slightly different way of doing things which is both interesting and useful in languages where it works well. Sorry for my misstatements.

an old saw

Forth wins the joke, here, I think.

Forth syntax is "simple" in the sense it doesn't have to deal with variable names -- nice combinator language, that.

Nevertheless, you may not realize, Anton, that in Forth, syntax extensions *are* written in an extensible, compositional, hygenic DSL (with intentional violations of hygiene permitted), defined as part of the parser. It's damn similar to Scheme, in some ways, actually.

There is a difference, that I think Daniel was getting at:

In Scheme, the "primitives" of the DSL for syntax extension must be provided by the implementation. There is no (reasonable) way to define a facility for macros in terms of "macro-less" Scheme except in the form of a source to source translator.

In Forth, the "primitives" of the DSL for syntax extension are written in "macro-less" Forth. That is to say, the "core" definition of Forth, a minimal set of "primitives" one has to implement to achieve forth, lacks anything like explicit syntactic abstraction facilities -- but then such are quickly (and usually) implemented in a portable way on top of that core.

This is because Forth, unlike Scheme, has not inherited an everything-is-batch-compiled-always fetish, like Scheme and, consequently, has never abandoned the idea of presenting the core "VM" of the operational model in a fully reflective way.

"Folklore Scheme" -- essentially the idea of a virtual machine with a GC'ed heap, three registers, and a trivial single-step rule -- that language is a lot more like forth. In "folklore scheme" you wouldn't talk about a Scheme parser as an essential concept -- but rather as a class of optimizers than can be applied to some, but not all libraries.

-t

same diff

There is a difference, that I think Daniel was getting at:

In Scheme, the "primitives" of the DSL for syntax extension must be provided by the implementation. There is no (reasonable) way to define a facility for macros in terms of "macro-less" Scheme except in the form of a source to source translator.

This is not very different from what I was getting at. I didn't mean to imply that Forth/Factor doesn't use a DSL here, but rather was primarily getting at why the core parser code tends not to be extensible in languages with more complex syntax.

My comments about the use of a DSL were a response to Daniel's references to "written in the language" and "user level code". Since as I mentioned, both of those descriptions can and do apply beyond Forth, I thought he might be saying something about how syntax extension code in Forth didn't have to rely on a DSL. E.g., people sometimes complain about Scheme's syntax-rules macros not being ordinary Scheme code, the way that defmacro macros are ordinary Lisp code. I'm sure your clarification is closer to the mark, thanks.

Forth syntax is "simple" in the sense it doesn't have to deal with variable names

It also doesn't have to deal with much structure.