Syntax Design

I am in the process of designing a few domain-specific languages, so I am wondering whether people have suggestions about syntax design. It's generally not a topic of LtU, but it is a practical problem for language designers. In a few cases, syntax does make a difference, e.g. Python and Lisp. Maybe it's a vague question, but I don't know how else to bring order to this topic.

Look at existing languages

I'm not aware of any guidelines or other good material. One obvious point is that a good syntax should preferably have a straightforward BNF grammar that is at least LALR(1), to ensure easy parsing for the implementer and an absence of nasty surprises for the programmer, and it should not rely on context-sensitive lexical tricks.

It's probably best to look at a bunch of existing languages and try to classify their pros and cons. Some noteworthy points in the design space and respective representatives might be the following, in no particular order:

  • trivial to parse (Lisp)
  • simple to parse (Pascal)
  • mainstream (C, Java)
  • succinct (Haskell)
  • down to earth (Basic)
  • ad-hoc complex (C++)
  • black magic (Perl)

Discrimination

Ah, as always, only textual syntax is considered. DSLs, even more than ordinary PLs, are user interfaces. Why limit yourself to a textual UI before even considering alternatives?

ha ha only serious

GUI PL

Actually, I've been wondering about this myself recently. If the most basic units of a textual UI are the characters that make up a character set (ASCII, Unicode), then what is the set of primitives for a GUI? Is it just the pixel? Or is it the typical primitives of a 2D or 3D vector package, such as point, line, circle, square, rectangle, etc.?

Today, compilers can be pretty intelligent about producing machine code (or an intermediate representation) that is better than most human developers can produce. Could a 'compiler' take a general description of a GUI and produce a better-looking (according to some design rules), more consistent, and much easier to use GUI?

More and more I see the ability to link GUI components directly to data structures...are there any more techniques which reduce GUI development time?

Naked objects?

More and more I see the ability to link GUI components directly to data structures...are there any more techniques which reduce GUI development time?

I suggest looking for "naked objects" (including on LtU). I do not endorse them, but you may find a lot of relevant info buried in discussions about them.

Or, better, Squeak

Squeak is 'Smalltalk as it was meant to be', with a very direct manipulation of objects.

As regards naked objects, here is the book about them. Read with caution :-)

Self

The philosophy of Self is/was much more about direct manipulation of objects, and it included research on UI aspects as well.

Self is cool, but...

CLIM is better, since it allows you to manipulate information objects themselves, not just the behavior and state of glyphs on the screen. The development API has a real object-verb declaration form to it, which powers the CLIM command line and the context highlighting and menus for on-screen objects. It doesn't go quite as far in the other dimension as providing the equivalent of declarative style sheets, though, even though it abstracts over the output medium and input facilities. And I should point out that the Morphic way of building interfaces is too literal, in that it stands against the desire to use style sheets.

More importantly, it answers the question of what category of software addresses the original concerns: it's a UI manager.

Language for HCI

Could a 'compiler' take a general description of a GUI and produce a better-looking (according to some design rules), more consistent, and much easier to use GUI?

It seems to me that in order to be able to describe design rules you need a way of talking about human-computer interactions. Once you have that, instead of talking in terms of graphic primitives, why not create GUIs directly in terms of HCI primitives (whatever they may be - perhaps things like "selection", "mutually exclusive selection", etc - I'm not an HCI expert)? If you use graphics, the compiler has to infer semantics. If you use this notional HCI language, the semantics are already defined. Of course, I suppose there's nothing to prevent you from having a graphical language for specifying HCI. But would the "compiled" result necessarily bear any resemblance to the original spec?

As I said, I'm not an HCI guy. Does anyone know if there's been any work on a formalization of HCI concepts?
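
To make the idea a bit more concrete, here's a rough sketch (I'm only guessing at the primitives) of how "mutually exclusive selection" might look when mapped onto an existing toolkit. In Tk it already exists in concrete form, as a group of radiobuttons sharing one variable:

  package require Tk
  # the HCI primitive "mutually exclusive selection", rendered in Tk
  # as a group of radiobuttons linked to a single variable
  set choice none
  foreach option {red green blue} {
      radiobutton .$option -text $option -variable choice -value $option
      pack .$option
  }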

low level vs. high level primitives

I can see 'higher-level' 'primitives' based on the relational model. This would work extremely well for data-oriented applications (Excel, scientific, accounting, etc.). Things will get murky when applications such as Word or PowerPoint are included.


On the other hand, visual primitives will very likely have to be based on something like NURBS (which, according to Wikipedia, provide a basic set of equations for the artifacts we see on screen... I don't know anything about NURBS beyond Wikipedia).


The actual look and feel could be separated using something like CSS for the web.


Currently, GUI components seem to be completely separate from algorithms and 'objects' (in Java, C#, etc.). It seems to me that in reality we should be able to generate at least a basic but functional GUI/visual representation automatically.


I read some of Edward Tufte's work hoping to find the most basic rules for data visualization. Unfortunately, his rules are things like: don't clutter up a chart, only provide as much information as you need, and some math formulas related to such rules. I also read some literature on information visualization, which gets closer to what I am looking for (graph layouts), but still nothing as fundamental as the relational model or lambda calculus. In fact, there should be a direct relationship between relational theories (this should be obvious) and lambda calculus (I'm not completely up to speed on the lambda calculus, but if functions can be defined by a description, the same description should be enough to represent them visually).


I haven't yet looked at naked objects or Squeak :)

That was kind of my point

The actual look and feel could be separated using something like CSS for the web.

Yes, but in order to do that, you need some way of describing the "content" in a manner that is separable from the "look and feel". HTML/CSS uses basic document elements like body, heading, paragraph, and list as its core language for describing content. My point was that you would need something similar for describing GUI content (as well as wondering whether anyone had already done it).

An aside on NURBS: NURBS are a wonderfully general way to describe vector graphics. I'm not sure that they'd necessarily be the best choice as the ultimate primitive for a GUI, though - they can be a little heavyweight for describing simple shapes (e.g. a NURBS description of a circle might require 8 or 9 points, versus just specifying a center and a radius). It might be better to have a larger set of graphical primitives (which might include NURBS). That seems to be the way most vector graphics editors handle things.

Gaaaaaaaaaa!

Not another UI programming language!

Incremental/continuous parsing

These days, you might also want to consider how hard the language is to parse incrementally and continuously in an IDE like Eclipse. Statement-based imperative languages have an advantage here over expression-based functional languages, as there are really no parsing dependencies between different statements. (Although BNFs often introduce false dependencies in statement-based grammars through the use of recursion to represent lists of statements, just as functional languages introduce false dependencies by using recursion to represent list traversals. But I digress.)

Tcl

Tcl's approach is, IMO, a reasonable compromise between Lisp and more 'mainstream' languages. It's easy to parse, but still gives you a few visual hints as to what's going on. And since it does take after Lisp, it's very easy to mold to fit your needs - you can write your own control structures and all kinds of fun stuff like that.
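
For example, here's a minimal sketch of a user-defined looping construct (the name repeat and its behaviour are just my illustration, not something built into Tcl):

  # repeat: run a script n times, as if it were a built-in loop
  proc repeat {n body} {
      for {set i 0} {$i < $n} {incr i} {
          # uplevel 1 evaluates the body in the caller's scope,
          # so variables in the body resolve as the caller expects
          uplevel 1 $body
      }
  }

  repeat 3 {puts "hello"}   ;# prints hello three times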

Not Tcl

I have to say that I find Tcl a particularly terrible approach to syntax. The minimalistic everything-is-a-string philosophy is just superficially simple - in effect, it renders syntax completely ad-hoc beyond the most basic whitespace-separates-arguments convention. There is no separation between concrete and abstract syntax, or rather: there is no abstract syntax. All parsing and syntactic interpretation more or less depends on the context where a string is used. That puts you right into quoting hell as soon as you need some non-trivial structure.
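
A tiny sketch of the kind of trouble I mean (the data is hypothetical, but the pattern is typical):

  set name {Mary Poppins}
  # naive string pasting: the result is re-parsed on evaluation,
  # so the value splits on whitespace into extra words
  eval "puts Hello,$name"          ;# fails: puts misreads "Hello,Mary" as a channel
  # the 'safe' idiom must construct the command as a list instead
  eval [list puts "Hello, $name"]  ;# Hello, Mary Poppins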

Strings and glue

Yes, syntactic interpretation depends on context in Tcl. But this is why it makes such a good glue language: I can happily expose two components (e.g. written by third parties) to each other and let them choose how to interpret the representations they are passed, even if these interpretations conflict. For instance, a trivial example from porting parser combinators to Tcl. In the Haskell version there is code which looks like:

addop = do { symb "+"; return (+) } +++ do { symb "-"; return (-) }

The Tcl equivalent is just:

def addop      ::= [[symb +] | [symb -]]

and even that is more verbose than it need be. Why in the Haskell version do I have to mention "+" twice (same for "-" and the other operators), merely because its use depends on context? To me, this sort of thing is "paying attention to the irrelevant" — it has nothing to do with my problem domain that "+" can be a string or an operator — and so by Perlis's definition (quoted elsewhere on LtU recently), I'd have to conclude that Tcl is a higher-level language than Haskell! :-)

OK, so that is a trivial example. I contend, without evidence, that these sorts of situations crop up quite frequently in Real World programming, particularly when trying to integrate code from various sources. In particular, I would say that this is more likely to be true at higher levels of abstraction, where interpretations are much more specific, and entail a much greater semantic commitment (e.g., at a low-level saying "this is a record" doesn't say an awful lot, but saying "this represents the state of a person's bank account" does). Tcl was designed to be precisely this kind of high-level glue language for putting together programs on an application level. I think it does better at this than almost any other language I have come across. Everything-is-a-string doesn't mean that you have to spend your time thinking in terms of strings. Rather, it allows flexibility in crafting abstractions so that there is less difference between the abstraction and the problem domain being described.

But it is also clear that something has been given up in order to have this flexibility: individual commands are free to interpret representations as they wish precisely because the "language" has done less for us. This is less good, obviously. Firstly, it means that individual commands have to do more work, and secondly it means there is less of a common foundation for programs to build on, so they can be wildly inconsistent. A good language is not just syntax, but also a body of shared concepts (a culture, if you will) that can be used to express ideas.

Tcl's approach to both of these problems is via its library. It provides a bunch of standard data structures (lists, dictionaries, numbers, patterns, etc) and each "type" (loosely speaking) provides routines for parsing, converting and serialising these types. So, in practice much of the day-to-day parsing is delegated to standard routines (and the result of this parsing is cached, so it usually only has to be done once). Incidentally, one place I think Tcl fails is that it doesn't provide public routines for parsing and manipulating commands and scripts (which, unlike in Lisp, don't correspond entirely with the syntax of lists), but this is hardly a criticism unique to Tcl.

Of course, even after parsing, values aren't entirely safe from further revision or reinterpretation (I'm beginning to sound like Daniel Dennett talking about "multiple drafts", and not entirely by accident). Thus, you still get very few guarantees, which is why Tcl is quite resistant to optimisation, and why things like garbage collection are tricky (how do you recognise a reference?).

I think there are ways to solve all of these problems: to layer levels of abstraction on top in a way that allows you to make reasonable assumptions, while still allowing you to drop down to lower levels if necessary. One such way is to bundle up values with an interpretation (a "type", again talking loosely) so that the type is explicitly part of the representation, thus avoiding reinterpretation (mostly). I have an extension which does just this (TOOT), but there is more work to do to make it efficient and robust.

The guarantees you get will always be weaker conditional guarantees (if you use the approved interfaces only, then you get these guarantees) rather than the strong guarantees some other languages provide (you can only use these interfaces, thus the guarantee will always hold). I think the weaker form is usually sufficient, but it needs to be coupled with a way to identify where an interface was bypassed (i.e., where a contract was violated). The exception to this is where security is at stake: there you most certainly want the stronger guarantees, at least for some interfaces.

The second problem (lack of a common "culture" of high-level concepts) is a criticism that I think is more of a problem for Tcl. The standard library is very good in some respects. It has good routines for event-based I/O, GUIs (Tk), pattern matching (regexp, globs), and interfacing with operating systems and file systems. However, it has long been a criticism that Tcl is weak on data structures. This is partly true, and there is a tendency in the Tcl community to advocate a do-everything-with-associative-arrays-and-lists approach. Usually though, you want more problem-specific data structures, but without a standard mechanism of building these people have to build them in an ad-hoc way. Ironically, given Tcl's abilities at code reuse, this has led to a situation where the single most popular Tcl coding task (by far!) is to write your own object system. I think this is an area where Tcl could really improve; by standardising on a greater base of functionality. I said earlier that Tcl provides flexibility in crafting abstractions. More truthful would be to say that Tcl provides a flexible medium in which to craft abstractions. The tools it gives you to do the crafting could do with some improvements.
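
(To illustrate what I mean by rolling your own: here is a deliberately naive counter "object", built from nothing but a proc and a global array. Every name here is my own invention, not a standard idiom:)

  proc counter_new {name} {
      upvar #0 $name state
      set state(count) 0
      # each instance is a command tied to its own global state array
      proc $name {method} [format {
          upvar #0 %s state
          switch -- $method {
              incr { incr state(count) }
              get  { return $state(count) }
          }
      } $name]
  }

  counter_new c1
  c1 incr
  puts [c1 get]   ;# 1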

Abstraction in Tcl

Everything-is-a-string doesn't mean that you have to spend your time thinking in terms of strings. Rather, it allows flexibility in crafting abstractions so that there is less difference between the abstraction and the problem domain being described.

I disagree strongly. Everything-is-a-string is a (mis)feature that rather compromises proper abstraction, and forces you to think at the level of bare strings much more often than you would like to, because of quoting issues. Representation is not abstract, and it is almost impossible to enforce any abstraction in order to preclude quoting errors or other dumb mistakes (the closest you can get is to generate names and do a tedious and expensive indirection through an associative array at every level of abstraction). The errors you get can be incomprehensible and are not localised - as in the worst unchecked languages.

In my experience - which admittedly dates from ten years back - this was horrible, and I guess it is one reason why Tcl is weak on data structures: the string approach does not scale to compositional structure. Maybe I was just too stupid to use it properly, though...

Strings and things

Everything-is-a-string is a (mis)feature that rather compromises proper abstraction, and forces you to think at the level of bare strings much more often than you would like to, because of quoting issues.

I think you have things the wrong way round. Using proper abstraction allows you to avoid thinking about quoting issues. For instance, I can use Tcl's list commands to manipulate lists without ever considering that they have a string representation; the commands take care of any necessary quoting. The only time you have to deal with quoting is if you do direct string manipulation. If you just use the standard abstractions (which can be the same as in any other language), then you don't have to worry about it.
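
For example (with made-up values):

  set item {a value with spaces}
  set l {}
  lappend l $item "another item"
  puts $l              ;# {a value with spaces} {another item}: quoting added for me
  puts [llength $l]    ;# 2: the embedded spaces did not split the element
  puts [lindex $l 0]   ;# a value with spaces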

Representation is not abstract, and it is almost impossible to enforce any abstraction in order to preclude errors with quoting or other dumb mistakes (the closest you can do is to generate names and do a tedious and expensive indirection through an associative array at every level of abstraction).

Naming/abstraction always involves indirection. Whether that's looking up a string name in a hashtable (the lookup of which can be cached), or looking up an address in memory via a pointer. You can build up whatever layers of abstraction you want. You don't have to use arrays, you could use procedures like in most other languages.
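
For instance, a stack abstraction built from plain procedures (the names are mine):

  proc stack_push {varName item} {
      upvar 1 $varName s   ;# alias the caller's variable
      lappend s $item
  }
  proc stack_pop {varName} {
      upvar 1 $varName s
      set top [lindex $s end]
      set s [lrange $s 0 end-1]
      return $top
  }

  set s {}
  stack_push s a
  stack_push s b
  puts [stack_pop s]   ;# b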

... the string approach does not scale to compositional structure.

This would seem to be refuted by the fact that pretty much every programming language in existence still starts from strings: source code. Tcl has its faults, but the idea that you can't build abstractions on top of strings is patently false.

Abstraction

If you just use the standard abstractions (which can be the same as in any other language), then you don't have to worry about it.

Maybe, but one problem is that consistent use of these abstractions is not enforced in any way - unlike in other languages. Hence I hesitate to call them abstractions, really. If you fail to use them properly at one place, it can lead to obscure and pathological errors that can be hard to track down.

Naming/abstraction always involves indirection.

That is not true. Take type abstraction, for example, or dynamic forms of sealing.


... the string approach does not scale to compositional structure.

This would seem to be refuted by the fact that pretty much every programming language in existence still starts from strings: source code.

There is a fundamental difference. I already mentioned the issue of concrete vs abstract syntax: in most other languages there is a strict separation between the two, and the language semantics is completely independent from concrete syntax. Not so in Tcl, where semantics cannot be defined without going all the way down to the level of byte strings and individual characters.

Likewise, Tcl's semantics is not separated from representational issues: it is fully visible in the semantics how, say, a float is represented (although it may be underspecified - which isn't any better).

So, while it is standard practice to fully separate syntax vs semantics vs representation (a fundamental instance of abstraction, btw), Tcl's string approach precludes this. I view this as inherently low-level.

Maybe, but one problem is tha

Maybe, but one problem is that consistent use of these abstractions is not enforced in any way - unlike in other languages. Hence I hesitate to call them abstractions, really. If you fail to use them properly at one place, it can lead to obscure and pathological errors that can be hard to track down.

So abstractions are only abstractions if they are enforced? I can't agree with that definition. Are the abstractions I form in my head to comprehend the world around me enforced? If not, are they then not abstractions?

Take type abstraction, for example, or dynamic forms of sealing.

OK, I concede that point. Type abstraction is indeed an area that Tcl is weak(ish) in.

There is a fundamental difference. I already mentioned the issue of concrete vs abstract syntax: in most other languages there is a strict separation between the two, and the language semantics is completely independent from concrete syntax. Not so in Tcl, where semantics cannot be defined without going all the way down to the level of byte strings and individual characters.

This is a fundamental misunderstanding about Tcl. Tcl's semantics are defined by individual commands, in abstract terms. The fact that all values are required to have some concrete string representation doesn't mean that you have to define the semantics in terms of strings! For instance, as I've mentioned more than once, Tcl lists have an abstract semantics in terms of the operations that can be performed on them, and this semantics is completely unrelated to their canonical string representation. You could completely change the string representations and all properly written code would continue to work. What would fail would be other code that used a different interpretation (e.g., as a string, or as a dictionary, etc). Those are, of course, exactly the places which would fail in another language anyway ("type errors").

But this is missing the point. Values in Tcl are not implemented in terms of strings; they merely have a string representation. In actual fact, (interpretations of) values are implemented in Tcl in terms of efficient structures at the C level (hash tables, arrays, etc). In other words, as well as having multiple possible interpretations, values in Tcl also have more than one representation: the canonical string representation, and one (or more) internal representations, each an implementation geared towards a particular interpretation.

The concrete representation should not be thought of as an implementation detail, but as part of the public interface. We don't expect the syntax of source code to change much, nor that of network protocols or file formats. The implementations of the things they describe can change, and that's fine. The implementation of values in Tcl has changed several times, and yet you can still run Tcl code from a decade or more ago with minimal changes, sometimes none. So it is clear that insisting on a string representation has not been a problem.
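
To see the two representations at work (the value is illustrative):

  set v {1 2 3}
  puts [llength $v]        ;# 3: v interpreted as a list
  puts [string length $v]  ;# 5: the same value interpreted as a string
  # behind the scenes, Tcl caches an efficient internal representation
  # for whichever interpretation was used last; the value itself is unchanged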

While I don't agree with everything the Third Manifesto crew say, I think C. J. Date has expressed a similar view when discussing how to support both encapsulation and ad-hoc queries (although he may not think that it applies to Tcl, I do):

“...Hugh Darwen and I deal with the foregoing conflict [between encapsulation and allowing ad hoc queries] by requiring that, for any given type, operators be defined that expose some possible representation for instances of that type. ... Note carefully, however, that this fact doesn't mean that [values of this type] are actually represented [in this way] inside the system; it merely means, to repeat, that [this representation is] a possible representation. The actual representation might be ... something else entirely. In other words, [these operators] don't violate encapsulation, and they don't undermine data independence.”

I broadly agree with this view, and think that Tcl's insistence that each type provide a string representation provides exactly this capability. I would argue that there doesn't need to be a particular representation that is the "actual" representation. The "actual" representation is whichever one is currently useful to some process.

Likewise, Tcl's semantics is not separated from representational issues: it is fully visible in the semantics how, say, a float is represented (although it may be underspecified - which isn't any better).

How so? Tcl is currently being upgraded to handle big integers. I don't expect the semantics to change, only the range of acceptable strings that can be treated as integers.

So, while it is standard practice to fully separate syntax vs semantics vs representation (a fundamental instance of abstraction, btw), Tcl's string approach precludes this. I view this as inherently low-level.

It is standard practice to separate them in the definition of the language itself. It is not standard practice to separate them in the use of the language. The interpretation of a value (its type) is usually fixed, and not open to reinterpretation (although it may be subject to further interpretation within the constraints of the previous interpretation). In many languages, any trace of there ever having been a representation at all is eliminated. Tcl ensures that there is always a distinction between representation and interpretation. It is wrong to characterise this as a "string approach"; it is better seen as a capability.

Safety again

So abstractions are only abstractions if they are enforced? I can't agree with that definition. Are the abstractions I form in my head to comprehend the world around me enforced? If not, are they then not abstractions?

The distinction in question here, in PL theory, is usually described as "safety" - e.g. to quote TAPL, "a safe language is one that protects its own abstractions [...] Safety refers to the language's ability to guarantee the integrity of these abstractions and of higher-level abstractions introduced by the programmer using the definitional facilities of the language."

One distinction this results in (paraphrasing TAPL) is that unsafe abstractions can't be used abstractly — you have to be aware of, and deal with, low-level details related to the representation of abstractions. As I was recently arguing related to concatenative languages, aside from the issue of safety itself, languages in which you're required to use unsafe abstractions tend to be lower-level (or result in lower-level code) for this reason.

There are definitely cases where unsafety is useful, though. For very basic abstractions, like lexical variables, there may be very good reasons to make them safe, but even in that particular case, some unsafety might allow e.g. introspection which might otherwise be more difficult or impossible. But for (usually) higher-level abstractions, "unsafety" can make all sorts of useful flexibility possible.

Building structures out of pairs in Lisp is an example of this, and XML is another closely related example. In both cases, complex abstractions can be represented by tree structures, but unless steps are taken to encapsulate these representations, they're available for manipulation without regard to the rules of the abstraction they're supposed to represent. In some cases, being able to do this has enough advantages to at least balance the disadvantages.

I'll refrain from drawing any specific connection to Tcl here - I try to avoid offending more than one language community per month.

Re: Safety

One distinction this results in (paraphrasing TAPL) is that unsafe abstractions can't be used abstractly — you have to be aware of, and deal with, low-level details related to the representation of abstractions.

I don't think this follows. The fact that an abstraction can be broken doesn't mean that you have to deal in terms of representations. For instance, to use the example of Tcl lists again, it is quite possible to break the abstraction offered by Tcl's list commands and manipulate a list as a string. However, this doesn't imply that I need to take that into account when manipulating lists. So long as I don't break the abstraction then my code will work ok. If someone else breaks the abstraction and hands me the result, then my code may well error. But I still don't have to think in terms of representations to catch this; the list commands will do it for me. Indeed, it's hard to see how thinking in terms of representations would help in any manner.
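
A small example of what I mean (values made up):

  set good [list a {b c} d]
  puts [lindex $good 1]   ;# b c: fine, accessed through the list interface

  set bad "a {b c"        ;# string manipulation broke the list abstraction
  lindex $bad 1           ;# error: unmatched open brace in list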

I don't want to give the impression that I think safe/unbreakable abstractions aren't useful, or that Tcl doesn't support them. Just that they are not always what you want, particularly at the application level that Tcl was designed to operate at.

There are definitely cases where unsafety is useful, though. For very basic abstractions, like lexical variables, there may be very good reasons to make them safe, but even in that particular case, some unsafety might allow e.g. introspection which might otherwise be more difficult or impossible. But for (usually) higher-level abstractions, "unsafety" can make all sorts of useful flexibility possible.

Exactly.

Building structures out of pairs in Lisp is an example of this, and XML is another closely related example. In both cases, complex abstractions can be represented by tree structures, but unless steps are taken to encapsulate these representations, they're available for manipulation without regard to the rules of the abstraction they're supposed to represent. In some cases, being able to do this has enough advantages to at least balance the disadvantages.

Sure, you give up one abstraction in favour of another. Most of the time you want to drop down to a lower level of abstraction (e.g. representation) is not because you want to work at a lower level, but rather so that you can build a different high level abstraction.

Re: Safety

One distinction this results in (paraphrasing TAPL) is that unsafe abstractions can't be used abstractly — you have to be aware of, and deal with, low-level details related to the representation of abstractions.
I don't think this follows. The fact that an abstraction can be broken doesn't mean that you have to deal in terms of representations.

That's right, my paraphrasing was overly broad. TAPL carries on to say "in order to completely understand how a program may (mis)behave, it is necessary to keep in mind all sorts of low-level details...". When you're dealing with an abstraction that has a properly abstract interface, which nevertheless supports direct access to its representation, then knowledge of low-level details is still needed to understand how it could misbehave if the abstraction is broken. (There are other kinds of abstractions which can't even be used without resorting to low-level details, though — is there a term for these "imaginary abstractions" which exist more in the programmer's mind than in the program's semantics?)

Sure, you give up one abstraction in favour of another. Most of the time you want to drop down to a lower level of abstraction (e.g. representation) is not because you want to work at a lower level, but rather so that you can build a different high level abstraction.

Yes. Still, to support this, a language doesn't have to allow arbitrary clients to break the original abstraction. There are various ways that code can be designated as being allowed to access the internals of an abstraction — e.g. by locating the code in the same organizational unit, or in OO languages, inheriting from a class, or declaring a "friend" class (in C++), etc. This is an area in which many of the safer languages could do better, which is one reason that some of the less safe languages can be so appealing.

(There are other kinds of abs

(There are other kinds of abstractions which can't even be used without resorting to low-level details, though — is there a term for these "imaginary abstractions" which exist more in the programmer's mind than in the program's semantics?)

"Patterns"? Yes, I realize that patterns most often refer to a particular approach to OO design, and on the other hand I realize that this characterization of patterns as merely compensating for a weakness in the language is a little unfair. But in general, I'd say it's the best term we have for this phenomenon...

Re: Patterns

Thanks, that's close, but I think we'd at least need to define a specific subtype of patterns. The problem is that the actual application of a pattern can often be completely abstractly encoded in a program's semantics. It's only the pattern itself, as a meta-definition, that can't be usefully encoded in many programming languages, which is why it needs to be expressed as a pattern.

An abstraction based on a pattern wouldn't qualify as what I called an "imaginary abstraction" unless it couldn't be fully encoded in the target language. It might be valid to say that all imaginary abstractions are the result of applying a pattern, but not all applications of patterns result in imaginary abstractions, depending on the pattern and target language.

I agree...

I basically agree, and that's why I've always had reservations about the point of view that "the existence of design patterns is a sign of shortcomings in language design." [1]

I guess there are really two ways that design patterns are sometimes "imaginary abstractions". There is, as you say, "the pattern itself, as a meta-definition," and then there's the actual language-specific encoding of the pattern. In some cases this encoding can be abstract and enforced by the language mechanics, and in fact, in some cases this is the entire point of the pattern: to extend language-enforced abstraction to a new area. In other cases, as you say, this encoding relies on contracts that can't be enforced, and the resulting abstraction is thus "imaginary". (Can we say "conventional" instead? It's a bit more charitable...)

But of course it's true that we might wish to be able to encode the actual pattern itself, as a meta-definition, in the language. In many languages, we can't, which of course leaves the pattern as an "imaginary" abstraction that we follow by convention and re-implement each time.

Some people have proposed that truly high-level languages will have no patterns (because they'll all be expressible as programs), but I don't really buy that. I think a more useful metric is to look at the "level" of patterns that a programming community produces.

In assembly language, for example, "if-then-else" is a pattern, and "subroutines" are a pattern, and "records" are a pattern, and... This is decidedly low-level stuff... In Java, we get all of the above in the language, but we have "abstract factory" and "visitor" as patterns. In a higher-level language we might express those in the code and uncover another higher level of patterns. That's a virtue of abstraction: it allows us to synthesize new perspectives.

So the problem with Java isn't the existence of patterns, it's the particular patterns... Anyway, now I feel like I'm just rambling...

[1] "A Core Calculus of Metaclasses" (great paper, go read it if you skipped it the first time...)

Re: Strings and glue

In the Haskell version there is code which looks like:

addop = do { symb "+"; return (+) } +++ do { symb "-"; return (-) }

The Tcl equivalent is just:

def addop      ::= [[symb +] | [symb -]]

and even that is more verbose than it need be. Why in the Haskell version do I have to mention "+" twice (same for "-" and the other operators), merely because its use depends on context?

Sure, but what if the implementation of "+" is named 'plus' or 'arbitraryName1066' instead of '(+)'? The fact is, the string "+" does represent different things depending on the context, just like the word "closet" means different things depending on whether you're speaking German or English.

... but what if the implement

... but what if the implementation of "+" is named 'plus' or 'arbitraryName1066' instead of '(+)'?

Then I (in the Tcl version) would add back in the explicit return. I did say it was a trivial example.

The fact is, the string "+" does represent different things depending on the context...

This is exactly my point. It is Tcl which allows you to treat the string differently depending on context, while Haskell requires you to distinguish between "+" (the string) and (+) (the operator).

This is not intended as an argument against Haskell (or static typing), in case anyone mistakes it for that. Instead, the point is that as you ascend layers of abstraction towards more domain/application-specific concepts, there is a greater chance that at some point there will be an inconsistency in the interpretations that two or more components place on a value. At this level, then, I would argue that it makes sense to have a language which is less semantically committed to any particular interpretation, and which can mediate between the various components. In practice, I think this implies maintaining a separation between representation and interpretation. At lower levels, I think precisely the opposite is true: components are smaller and more likely to have been developed by the same person/team, therefore there is a good chance that they are consistent, so it pays off to commit to particular interpretations.
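
To make the representation/interpretation split concrete (this needs Tcl 8.5+ for dict; the value is made up):

  set v {a 1 b 2}
  puts [llength $v]      ;# 4: one component reads v as a list
  puts [dict get $v a]   ;# 1: another reads the very same value as a dictionary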

experimentation

I suggest you use tools that allow you to experiment easily with syntax and translation.

The Language Machine makes it exceptionally easy to create a shell script that applies grammatical rules covering the whole spectrum from lexical analysis to generating transformed representations. You can experiment with grammars that are partially defined, and the built-in lm-diagram generator makes it very easy to see how the rules are applied. It's very easy to create different front-end/back-end combinations, as they are all written in the same notation, and it's easy to call external procedures in the D and C languages.

Of course I have some bias, as the author of the Language Machine. But it is exceptionally hard to design a usable syntax unless you can experiment with it, and most other tools are fragmented, complex, and intolerant of partial solutions.

On the subject of GUI grammars, the real grammar of a GUI is a grammar of permissible action sequences. The trick would be to map actions to symbols that can be used as terminal symbols in a grammar. The distinction between terminal and nonterminal symbols is merely that the nonterminal symbols never occur directly in the sentences of the language - in all other ways they are just opaque values that can be compared for equality.

The point that I have been trying to make elsewhere is that analytic or recognition grammars of every kind can be really easy to understand and play with, and that the received wisdom has induced unnecessary fear.

Comparison resource

A message I received today on the e-lang mailing list had a pointer to what looks like a useful comparative resource: syntax across languages. This is part of a general "study of programming languages" that includes one of the genealogical diagrams linked from this site.