A rationale for semantically enhanced library languages

Bjarne Stroustrup. A rationale for semantically enhanced library languages. LCSD05. October 2005.

This paper presents the rationale for a novel approach to providing expressive, teachable, maintainable, and cost-effective special-purpose languages: A Semantically Enhanced Library Language (a SEL language or a SELL) is a dialect created by supersetting a language using a library and then subsetting the result using a tool that “understands” the syntax and semantics of both the underlying language and the library.
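To make the two steps concrete, here is a minimal Python analog. It is not Pivot and not C++. A hypothetical library function `checked_div` supersets the language, and a small checker built on Python's standard `ast` module subsets it by rejecting `/`, `//` (including `/=` and `//=`) and `global`. The function and the banned constructs are illustrative choices made for this sketch only.

```python
import ast


# Superset step: a library adds a domain-specific facility.
# checked_div is a hypothetical library function used only for this sketch.
def checked_div(a, b):
    """Division that refuses to divide by zero."""
    if b == 0:
        raise ZeroDivisionError("checked_div: divisor is zero")
    return a / b


# Subset step: a checker that understands the host syntax (via Python's own
# parser) rejects constructs the dialect forbids.
class DialectChecker(ast.NodeVisitor):
    def __init__(self):
        self.errors = []

    def _check_op(self, node):
        # Raw division is banned; users must call checked_div instead.
        if isinstance(node.op, (ast.Div, ast.FloorDiv)):
            self.errors.append(f"line {node.lineno}: use checked_div instead of '/' or '//'")
        self.generic_visit(node)

    visit_BinOp = _check_op      # a / b, a // b
    visit_AugAssign = _check_op  # a /= b, a //= b

    def visit_Global(self, node):
        # Mutable global state is not part of the dialect.
        self.errors.append(f"line {node.lineno}: 'global' is not allowed")
        self.generic_visit(node)


def check(source):
    """Return the list of dialect violations in source (empty if none)."""
    checker = DialectChecker()
    checker.visit(ast.parse(source))
    return checker.errors


print(check("ratio = checked_div(total, count)"))  # []
print(check("ratio = total / count"))
```

The library side only adds; all restriction lives in the checker, which is the division of labor the SELL idea describes.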

How similar is this idea, really, to the facilities found in PLT Scheme and other previous approaches to this issue, and how does it differ?

pivot, the discussed

pivot, the discussed framework, provides program transformation on an intermediate format. in that light the novelty of the concept is more than questionable as it basically patches a deficiency that never existed in s-expression based languages.
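To illustrate why s-expression languages never had this deficiency, here is a small Python sketch in which nested lists stand in for s-expressions. Because the program already is a data structure, a macro expander is just a recursive function over lists. The `unless` macro is an illustrative example, not taken from the paper or from PLT Scheme's macro system.

```python
# Code represented as nested lists, the way s-expressions represent it in Scheme.
# `unless` is an illustrative macro; "if", "not" and "begin" are core forms.
def expand(expr):
    """Rewrite every (unless cond body...) into (if (not cond) (begin body...))."""
    if not isinstance(expr, list) or not expr:
        return expr                    # atoms and empty lists are left alone
    expr = [expand(e) for e in expr]   # expand sub-forms first
    if expr[0] == "unless":
        _, cond, *body = expr
        return ["if", ["not", cond], ["begin", *body]]
    return expr


program = ["define", ["check", "x"],
           ["unless", ["=", "x", 0], ["display", "x"]]]
print(expand(program))
# → ['define', ['check', 'x'], ['if', ['not', ['=', 'x', 0]], ['begin', ['display', 'x']]]]
```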

on the other hand, in the paper bjarne hints at the irony that languages like scheme provide a general solution to dsl implementation, while being even less successful commercially than kludgy implementations of dsls in not so ideal host languages.

But it might be better from

But it might be better, from an architectural point of view, to avoid macro definitions scattered throughout the application code and to factor them out into a preprocessor framework like Pivot instead, where they are (hopefully) kept well separated, together with the description of the extension/DSL. Writing all application code in a style chosen so that a certain transformer component is as easy as possible to write strikes me as a somewhat strange language-design philosophy. So the advancement may simply be a well-factored solution without the arbitrary constraint of writing source code in the form of parse trees?

i agree that this is an

i agree that this is an advancement (some lisp/scheme aficionados might not ;). still, the concept is the same: a code analysis and generation phase for a machine processable representation of the source code. the paper mainly talks about the high costs for dsl support in tool chains and the like, which is a non-issue for scheme.

besides, this is not new in the domain of non-machine-processable languages either. see A Metaobject Protocol for C++ for a compile-time programming framework quite similar to the pivot in stroustrup's paper (though it lacks the explicitness with regard to design conventions and associated costs). openc++ is a mature implementation of that protocol. similarly, gccxml provides machine-processable intermediate xml output. edg earns a living by licensing their front ends for that kind of work.

it's a good idea, but not a new one...

Wouldn't meta-programming be enough for all these tasks?

Interesting paper; Stroustrup again touches on a very important subject. But I think that meta-programming can cover all these needs, because it effectively allows the compiler to be extended without limit.

One major use of meta-programming that I have lately been discussing is to set up a project's environment. Many business projects require lots of work to set up: installing and configuring the database, installing and configuring the application server, etc. All these tasks could be automated through meta-programming.

Meta-programming could also be used to achieve reasonable garbage collection for C++...

Which Meta-programming?

Could you be a bit more precise about what you mean by "meta-programming"? Macros are meta-programming, and so are meta-object protocols. Sure, extensible programming languages are incredibly powerful. But, speaking of (Lisp-like) macros, I wonder whether there will ever be a macro system for languages "with syntax" that is as powerful and as easy to use as the ones for Lisp. Trying to use Template Haskell just showed me that there's still a long way to go, and I'm not even sure it is necessary...

(In fact, Lisp code is much easier still to write once you have structural editing support, which is very easy to implement thanks to the simple syntax and helps you avoid stupid "the compiler interprets it differently than you do" bugs.)

Not templates - the language itself executed at compile time

Taking languages like Java, Visual Basic, C# and C++ into consideration, meta-programming means executing some code at compile time instead of run time. The compile-time code would be given full access to the compiler's environment, and would thus be able to transform any program into any other program, as well as prepare the infrastructure the program needs to run.

Template Haskell

This is exactly what Template Haskell does. It even integrates with the type checker: the transformer function is type-checked to return the correct type (basically a syntax tree), and the resulting code is of course type-checked, too.

The big problem is that, without a nice quoting facility, this becomes incredibly ugly and hard to use. The following code:

 InstanceD [] (AppT (ConT (mkName "Compos")) (ConT name))
   [FunD (mkName "compos")
      [Clause [VarP (mkName "return"),
               VarP (mkName "ap"),
               VarP (mkName "f"),
               VarP (mkName "t")]
         (NormalB
            (CaseE (VarE (mkName "t"))
               (matches ++
                [Match WildP
                   (NormalB (AppE (VarE (mkName "return"))
                                  (VarE (mkName "t"))))
                   []])))
         []]]

generates this simple snippet:

instance Compos Main.SExp where 
  compos return ap f t = 
    case t of
     ... other matches ...
     _ -> return t

Furthermore, it isn't clean either: names built with mkName may clash with names in the surrounding context. There is a simple quoting facility, where you can write your "code template" like [| let x = foo b in ... |], but you can no longer use it once you do anything a little more complicated. The description of the syntax tree datatype also takes up multiple pages.

So if you want to design a language with a usable meta-programming scheme in this sense, you have to design its syntax accordingly. Otherwise, it can quickly become a big PITA.
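For comparison, here is a minimal Python analog, assuming Python 3.9+. The standard `ast` module plays the role of TH's syntax-tree datatype, and `ast.parse` on a source string stands in, roughly, for a `[| ... |]` quote. Unlike a TH quote, the string is not checked until it is parsed and there is no hygiene, so this only shows the difference in verbosity.

```python
import ast

# Building the tree for "x + 1" constructor by constructor, in the style of
# the Template Haskell example above.
built = ast.Expression(
    body=ast.BinOp(
        left=ast.Name(id="x", ctx=ast.Load()),
        op=ast.Add(),
        right=ast.Constant(value=1),
    )
)

# Getting the same tree from a "quoted" source template instead.
quoted = ast.parse("x + 1", mode="eval")

# ast.dump compares structure only; it omits line/column information by default.
print(ast.dump(built) == ast.dump(quoted))  # True

# The hand-built tree needs positions filled in before it can be compiled.
code = compile(ast.fix_missing_locations(built), "<built>", "eval")
print(eval(code, {"x": 41}))  # 42
```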

You don't need any special syntax except for 1 keyword.

You don't need any special syntax except for one keyword. This keyword is responsible for identifying the parts of code to be executed at compile-time.
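A minimal sketch of the one-keyword idea in Python, assuming Python 3.9+ for `ast.unparse`. Python has no such keyword, so a call to a hypothetical marker function `comptime(...)` plays that role: a transformer evaluates its argument before the program runs and splices the result back into the tree as a constant. `ComptimeEvaluator` and `compile_with_comptime` are illustrative names, not an existing API.

```python
import ast


class ComptimeEvaluator(ast.NodeTransformer):
    """Replace comptime(<expr>) with the value of <expr>, computed before the program runs.

    `comptime` is a hypothetical marker playing the role of the single keyword.
    """

    def __init__(self, env):
        self.env = env  # names visible to compile-time code

    def visit_Call(self, node):
        self.generic_visit(node)  # evaluate nested comptime(...) calls first
        if isinstance(node.func, ast.Name) and node.func.id == "comptime" and len(node.args) == 1:
            expr = ast.Expression(body=node.args[0])
            value = eval(compile(expr, "<comptime>", "eval"), dict(self.env))
            # The result must be a literal-like value (int, str, tuple, ...)
            # so that it can be embedded back into the tree as a constant.
            return ast.copy_location(ast.Constant(value=value), node)
        return node


def compile_with_comptime(source, env=None):
    tree = ComptimeEvaluator(env or {}).visit(ast.parse(source))
    return ast.fix_missing_locations(tree)


tree = compile_with_comptime("TABLE_SIZE = comptime(2 ** 10)\nprint(TABLE_SIZE)")
print(ast.unparse(tree))
```

Nothing named `comptime` survives into the transformed program, so the run-time code never needs the marker to exist.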

Not even a keyword

A compiler should just be smart and figure out which parts can be partially evaluated at compile time. What you do need is two environments, so that you can write either CompileTimeSystem.loadFile or RunTimeSystem.loadFile. For websites, you could have three environments: compile time, server-side run time and client-side run time.
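The "no keyword" variant can be sketched as constant folding, the simplest form of partial evaluation: the transformer decides on its own which subexpressions can be computed ahead of time. This Python sketch (3.9+) only folds `+`, `-` and `*` on literals; deciding which calls are safe to run at compile time, and in which environment, is the hard part it leaves out.

```python
import ast
import operator

# Operators this sketch knows how to fold.
_FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


class ConstantFolder(ast.NodeTransformer):
    """Evaluate binary operations whose operands are both literals."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold sub-expressions first
        fn = _FOLDABLE.get(type(node.op))
        if fn and isinstance(node.left, ast.Constant) and isinstance(node.right, ast.Constant):
            try:
                value = fn(node.left.value, node.right.value)
            except TypeError:
                return node  # e.g. 'a' - 1: leave it for run time to report
            return ast.copy_location(ast.Constant(value=value), node)
        return node


def fold(source):
    tree = ConstantFolder().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


print(fold("seconds = days * (60 * 60 * 24)"))  # seconds = days * 86400
```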

I don't think your argument

I don't think your argument is valid. The constructive approach of creating nodes, as shown in the TH example, is just a basic but essential step toward building a more user-friendly macro facility on top of it. Once you have established it, you can reverse the creation process: start with the target of the transformation and insert the necessary nodes found in the source.

I've done all this recently for Python using the following ingredients:

1) A grammar rule description of source code to be transformed into the target language e.g.

repeat_stmt ::= 'repeat' ':' suite 'until' ':' test 

2) A function that extracts nodes like suite and test from the parse tree, which is trivial.

3) The description of the transformation target with slots / node variables being interspersed:

while True:
   <suite>
   if <test>:
       break

4) A function that takes the target, passes the nodes into the appropriate slots, and calls the transformer to spit out the resulting tree. I simply called it expand().
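The four steps above can be sketched with Python's standard `ast` module (3.9+). This is not the poster's ~400-line implementation: the `__suite__`/`__test__` placeholder names and the `SlotFiller` class are assumptions, and because `repeat ... until` is not valid Python, step 2 is simulated by parsing the suite and the test separately.

```python
import ast

# Step 3: the transformation target, written as ordinary Python in which the
# placeholder names __suite__ and __test__ mark the <suite> and <test> slots.
TEMPLATE = """
while True:
    __suite__
    if __test__:
        break
"""


class SlotFiller(ast.NodeTransformer):
    def __init__(self, suite, test):
        self.suite = suite  # list of statement nodes
        self.test = test    # a single expression node

    def visit_Expr(self, node):
        # A bare `__suite__` statement is replaced by all statements of the suite.
        if isinstance(node.value, ast.Name) and node.value.id == "__suite__":
            return self.suite
        return self.generic_visit(node)

    def visit_Name(self, node):
        # An occurrence of `__test__` becomes the test expression.
        return self.test if node.id == "__test__" else node


def expand(suite, test):
    """Step 4: parse the target, fill its slots with the extracted nodes, return the tree."""
    tree = SlotFiller(suite, test).visit(ast.parse(TEMPLATE))
    return ast.fix_missing_locations(tree)


# Step 2, simulated: the nodes a real parser would extract from
#   repeat:
#       i += 1
#       log.append(i)
#   until: i >= 3
suite = ast.parse("i += 1\nlog.append(i)").body
test = ast.parse("i >= 3", mode="eval").body
print(ast.unparse(expand(suite, test)))
```

Because the suite is placed before the check, the expanded loop runs its body at least once, as `repeat ... until` should.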

This can be enhanced by introducing an operator that executes arbitrary code during transformation/expansion within the target. It is mostly used for conditional expansion with an unknown number of branches (think of a switch statement that is to be transformed into a long sequence of else-ifs), or in cases where, e.g., <test> represents a list of nodes and exactly one must be selected using a subscript, e.g. <test>[0].
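The switch-to-else-if case can be sketched as a function that builds the chain from an arbitrary number of cases. `switch_to_if_chain` is a hypothetical helper operating on Python `ast` nodes (3.9+), standing in for the evaluation operator described above.

```python
import ast
import copy


def switch_to_if_chain(subject, cases, default=None):
    """Expand a switch into an if/elif/else chain.

    subject: expression node being switched on (assumed side-effect free,
             since it is re-evaluated in every comparison)
    cases:   list of (value_expr, body_stmts) pairs, in order
    default: optional list of statements for the fallback branch
    """
    chain = default or []
    # Build from the last case backwards, so each earlier case gets the rest
    # of the chain as its else-branch; the number of cases is arbitrary.
    for value, body in reversed(cases):
        test = ast.Compare(left=copy.deepcopy(subject), ops=[ast.Eq()], comparators=[value])
        chain = [ast.If(test=test, body=body, orelse=chain)]
    return chain


def stmts(src):
    return ast.parse(src).body


module = ast.Module(
    body=switch_to_if_chain(
        ast.Name(id="x", ctx=ast.Load()),
        [(ast.Constant(value=1), stmts("result = 'one'")),
         (ast.Constant(value=2), stmts("result = 'two'"))],
        default=stmts("result = 'many'"),
    ),
    type_ignores=[],
)
module = ast.fix_missing_locations(module)
print(ast.unparse(module))
```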

The implementation of the machinery and of the particular evaluation operator is neither very extensive (~400 LOC) nor trivial, but that doesn't matter because it is hidden from the user.

"nor trivial but it doesn't matter"

As long as all of the bugs really are worked out ;-)

Some bugs must stay.

Some bugs must stay. Otherwise people have no fun in finding them. Where should the self-affirmation come from in an age where the best development methodology is to use Google codesearch and paste found code snippets together?

A noteworthy special case of

A noteworthy special case of Stroustrup's method might be a library (in C++) that is also a dynamic language. An example could be Tcl/Tk. Stroustrup specifically doesn't discuss this case because it is "out of scope", but this approach seems to work for the same reasons that SELL might work.

I thought the point was

That he's proposing a way to extend languages without introducing a full-blown macro system, and without the problem of coders then creating their own little DSLs that nobody is familiar with. He proposes that you add the things your 'DSL' can do by creating a library, and then use Pivot to make only a subset of normal C++ legal. So you carve out your DSL by adding things through a familiar mechanism, and you only take away things that would otherwise be permissible rather than adding new language constructs. I got the impression he thinks this is a nice compromise between no language extensions and Lisp-style macros.

I'm unconvinced that this would be enough in every case. I think he has selected case studies (embedded C++ and safe C++) that are biased towards his approach, because both focus on limiting what users can do rather than on increasing expressiveness.

The macro dilemma

The most severe problem with a powerful macro system is that it conflicts with the language designer's intention to grow the language himself. He decides on the default syntax and semantics of all language constructs. So either the language is still immature and has a way to go, in which case using macros is premature design, and syntactic decisions made by a framework/library programmer are likely to be duplicated or broken as the language advances; or the language is mature and caters to everything in its huge ecosystem, in which case macros are superfluous and confuse programmers who are not familiar with the custom syntax.