Designing an alternative to s-expressions for language extensibility

While designing my own programming language, I experimented a lot with syntax, but I quickly determined that there are three things I really wanted:

  • I wanted the language to be as extensible as possible: if I wanted to add some control structure like "unless" or transform code arbitrarily with macros, then I should be able to.
  • I wanted syntax as regular as possible and a simple program representation, so that extensions could focus on semantics rather than grammar rules.
  • I wanted the language to look clean and to have little punctuation, similar to Python. I wanted infix operators. S-expressions are regular and simple, but I think they involve too much punctuation and I wasn't willing to compromise on aesthetic preference.

This was an incremental process that I barely documented as I went through it, but in the end I think I came up with something interesting: a kind of syntax that I call o-expressions. I have thought about it a good deal post hoc so that I could explain and justify my findings, and I am now at a point where I would appreciate some external feedback and criticism. I wrote two blog posts about it:

First, this post describes Liso, a translation of my syntax to s-expressions. It contains many examples, so you can get an intuitive grasp of it. It demonstrates a concrete use as syntactic sugar for Lisp-like languages and I implemented it as a Racket #lang, so you can try it out if you want (source).

My second post is more theoretical. In it I build o-expressions from scratch using principles that I believe are essential to language extensibility.

I plan to keep working on the syntax and port it to other languages, so feedback would be very welcome :)

Sigh, here we go again

You've devised yet another semantics-specific alternate notation for Lisp. Throughout the history of Lisp from 1958 onwards, these M-expression systems have been invented and discarded, invented and discarded; as Squirrel says, "But that trick never works!", to which Moose replies "This time, for sure!" The only time they get adopted in Lisp-derived languages is when macros are excluded, as in Logo.

I would urge you to check out sweet-expressions. There is a bidirectional implementation in mostly-portable Scheme (you just have to add the hook to the system reader to use the translator directly) and one in portable Common Lisp. I don't think they have much hope of being adopted either, but at least they work uniformly and independently of code semantics, and at least they were created by a team and not an individual.

Let me clarify a few things

Maybe I explained it improperly (maybe I shouldn't have mentioned s-expressions to begin with) but the notation is not semantics-specific, or if it is, I would like to know what exactly you mean by that. O-expressions are parsed the same regardless of location, context or semantics. Macros are supported without any issues.

I know all about sweet-expressions, but they were specifically designed to be backwards-compatible sugar for s-expressions, which o-expressions are not. I ported o-expressions to Racket, sure, but I did that as a proof of concept and not as an end in itself. Right now, I am porting them to JavaScript's semantics (along with a macro system). Then I might try other languages. In other words, I view o-expressions not as sugar for s-expressions, but as a kind of syntax that combines familiarity with high regularity and adaptability. That might be insufficient as an alternate syntax for Scheme, because people are familiar with s-expressions there, but I think it has potential as alternate syntax for other existing languages (or for entirely new languages).

Whether a notation was created by an individual or a team is irrelevant to its merits. It's not an argument. Besides, I'm asking for feedback here -- if I got some useful criticism that led to improvements to the syntax, then it would become a team effort, wouldn't it?

Did you look at PLOT and Dylan?

Dylan is strongly influenced by Lisp/CLOS and Smalltalk. In contrast to Lisp, it has a clear separation of compile time and runtime (producing binaries). It also has an ALGOL-like syntax and a macro system. Also make sure to read the work done by Jonathan Bachrach and Keith Playford on D-expressions: Lisp power, Dylan style.

Also, David Moon presented PLOT, the Programming Language for Old Timers, at ILC'09, also discussed on LtU, which in my impression is a modern syntax design with macros and as little punctuation as you can get. As far as I can see, there is no implementation (yet?) ;)

Didn't know about PLOT

I've looked at Dylan, but I did not know about PLOT. Thanks for the link :)

At a glance, my solution seems to differ from both in that I don't allow macros to modify the grammar or to work on token streams. I guess I could, but I am not sure that much flexibility is very productive. For instance, you'd have to inform editors about every new structure you make, and I think that disincentivizes making them.

O expressions

These are less noisy than S-expressions. But the priority graph seems a bit awkward and arbitrary to me. We have neither a simple set of precedence rules nor the ability for developers to provide priority for their own operators. I suspect a developer creating a DSL will be unable to create a clean, noise-free syntax without overloading your special operators.

Operator priority customization

I prefer not to provide the ability to customize operator priority because I believe that (ideally) someone should be able to parse a DSL without learning new syntax. I mean, if you add some new operator like $++, I don't think it's unreasonable to make a + b $++ c a syntax error and force users to parenthesize the expression: it's not intuitively clear what the expression means. The extra noise helps a reader learn the DSL faster, which is probably worth it unless the operator is ubiquitous.

Still, though, it wouldn't be too complicated to let people customize priority. I don't like it, but if I see evidence that it would be useful, I might change my mind.
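
To make this concrete, here is a minimal sketch in Racket of the scheme I have in mind: precedence as a partial order, where operators compare only if a relation has been declared. This is illustrative only, not the actual Liso parser.

;; Illustrative sketch, not Liso's real parser: precedence as a
;; partial order. A novel operator like $++ has no declared relation
;; to +, so `a + b $++ c` is rejected instead of guessed at.
(define tighter-than '((* . +) (* . -) (/ . +) (/ . -)))

(define (tighter? a b)
  (and (member (cons a b) tighter-than) #t))

(define (compare-ops a b)
  (cond [(tighter? a b) 'left]
        [(tighter? b a) 'right]
        [else (error 'parse "ambiguous mix of ~a and ~a: please parenthesize" a b)]))

(compare-ops '* '+)    ;; => 'left
(compare-ops '+ '$++)  ;; parse error: the user must add parens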

Laziness and control flow

As far as I'm aware, laziness allows pretty much any control structure to be expressed as a regular function. Such laziness doesn't have to be pervasive either, it can be restricted to specific arguments with some kind of annotation (like an inverse of "!" in Haskell). For example (my LISP is rusty):

(defun unless (condition !branch)  ; "!" here is a hypothetical by-need annotation
  (if condition nil branch))

Not really. Can you write

Not really. Can you write LOOP or pattern matching as a regular function? How about let, or variants like if-let? do notation? Comprehension syntax?

There's sometimes a workaround that involves runtime interpretation and explicit lambdas (I have to wonder what's the point of passing functions by need), but at some point laziness is as convincingly "universal" as a Turing machine. Often, the reason for specialised operators is to provide a convenient syntax for constructs that we wish to encourage. Merely replicating their behaviour without the convenience doesn't cut it.

Can you write LOOP or

Can you write LOOP or pattern matching as a regular function? How about let, or variants like if-let? do notation? Comprehension syntax?

Those are the kind of things macros are useful for. I was not arguing against macros. The post makes a distinction between control structures and macros:

I wanted the language to be as extensible as possible: if I wanted to add some control structure like "unless" or transform code arbitrarily with macros, then I should be able to.

I was pointing out a relatively easy way to implement the former, I wasn't saying anything about the latter.

EMSIP

I hate to be a wet blanket, but honestly I think that's exactly the wrong direction. There's nothing wrong with parentheses. If anything, the problem with Lisp and Scheme syntax is that some irregularities still sneak in. E.g., strings and their escape sequences, and "#" and all its ramifications.

I've proposed a way to get rid of even those irregularities. I call it EMSIP - stands for Even More Silly Irritating Parentheses, so the name is kind of a poke in the eye to parenthophobes.

Basically, the three bracket types have slightly different meanings (parentheses, curly braces, and square brackets). In all cases, brackets delineate a lexical scope, and essentially nothing but brackets does so. The parser is actually simpler than `read', yet quite extensible.

While no tool can understand future extensions that have yet to be created, this is as future-proof as anything can be. A parsing tool that's only interested in some subset of structure - say, strings for translation - can pick out the stuff that it's interested in, with 100% certainty, forever, no matter what extensions get added later.

I wrote it up a few years back:

[meta] broken EMSIP links

the latter two need to be fixed.

Thanks

Thanks for mentioning the broken links. That's what I get for editing HTML by hand. Fixed.

It depends on your objective

A direction is only "right" or "wrong" with respect to a goal or an objective. To say that "there is nothing wrong with parentheses" is a bit like saying "there is nothing wrong with grey cars". I guess there isn't, but that's not relevant to someone who is trying to paint their car red. Perhaps they just like that color better. I don't think we ought to be purists about these things. The vast majority of code is written in languages that are horribly irregular and there is potential in making a syntax that is more regular while remaining appealing. Perhaps not as an alt syntax for Scheme, but it could work for a new language. I don't think either of us is going in the "wrong" direction, because we aim for different demographics.

Regarding editor or tool support, and disregarding significant indent (which is optional anyway), Liso is actually easier to highlight and to properly auto-indent than Lisp or Scheme because the syntax provides better hints. In these languages, you normally have to specify an indent policy for every control structure that needs one. For instance, if you define (match expr clauses ...), you need to tell the editor that it should indent expr more than the clauses (lisp-indent-function 1). Other control structures may have two significant subforms instead of one, or three, or a variable number, etc. You simply can't write a fully generic indent policy for these languages. In Liso, though, control structures will all look like @match expr: {clauses...}, where @match and : act a bit like a bracket pair. Highlighting is trivial: you highlight @anything_like_this. Indent is also trivial because the colon marks a clear boundary.

Point being, you can't afford to be too dogmatic about these things. In order to be productive in a sexp language, you need good auto-indent, but sexps don't embed proper indent rules syntactically. So you need to specify them elsewhere. I don't think it's a big deal -- it's usually easy to do and other languages are much worse -- but you can design syntax that's more editor-friendly than s-expressions if you are clever about it.

I'll plug my sugar for s-expressions

I think parts of Wart meet the criteria you mention up top -- and a lot more parsimoniously.

More complete details on the paren-insertion rules and on the infix rules. (I too partition the space of characters into sym and infix.)

But perhaps I'm missing something. I'd love to hear your response.

I don't think it's fair to say...

... that Wart is a lot more parsimonious than Liso:

  • My indent rules are simpler. In fact, I can fully describe them in nine words: "INDENT is '{', DEDENT is '}', NEWLINE is ','".
  • My brackets are not context sensitive: {} is always for grouping; [] is always for defining lists. Wart, on the other hand, has to make a distinction between (+) and (f).
  • My function call syntax looks more irregular than it actually is. f x means (apply f x) regardless of the form of x. Considering [] always defines lists, f[x, y] means (apply f (list x y)).

Regarding Wart itself I have a few comments:

  • Using whitespace for priority is interesting. I did the same thing for a markup language that I designed. Stylistically, I don't like to write operator expressions without whitespace, but it makes sense to associate whitespace-less expressions more tightly.
  • Not a big fan of f a + b c + d being equivalent to f (a + b) (c + d). That does not parse well at all, visually. I would prefer it to be equivalent to (f a) + (b c) + d.
  • I think right-associativity is a better default than left-associativity. Operators that work better left-associatively are not all that common: arithmetic and logic will work fine either way, whereas declaration, assignment, lambda, cons and function application become a lot more practical when they associate from the right. For instance:
    a <- 1 + len . cdr . xs

    Division and currying application are the only common operators I can think of that gain from left-associativity.
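
To spell out that example (this gloss, including the assumed relative priorities, is my reading, not wart's documented behavior):

;; Assuming `.` is function application binding tighter than `+`,
;; which binds tighter than `<-`, right-associativity groups
;;   a <- 1 + len . cdr . xs
;; as
;;   a <- (1 + (len . (cdr . xs)))
;; i.e., in s-expression form:
'(<- a (+ 1 (len (cdr xs))))
;; Left-associativity would instead group ((len . cdr) . xs),
;; applying len to the function cdr, which is nonsense.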

Parsimony is not simplicity

Rich Hickey would say you're decomplecting :)

Thanks for the comments. Since my goal is to be obvious at a glance to those who don't know the rules, I never use f a + b c + d outside of the doc. I would just phrase that with parens. Parens around function arguments are both common and, past some point, ineradicable; no point fighting it, IMO.

I've gone back and forth on left- vs right-associativity. On one hand, as you said, I can't have an operator for cons, and I am forced to use parens for nested assignment. On the other hand, I constantly use the idiom cons?&car.x, which means ((& cons? car) x). And the clojure-like pipe operator works beautifully:

(c -> downcase -> rot13 -> upcase)

The only wrong decision here is trying to support both, I suspect.

In general, wart is able to be more parsimonious because my domain of concern is smaller. I'm quite happy to fall back to regular s-expressions where these rules don't work well, and I haven't yet run into a domain where the added parens are a big burden. If you run into a code example where, say, right-associative operators are much easier on the eye, I'd love to see it. It would make a good counterpoint to the arithmetically-focussed Bresenham's algorithm in wart.

Custom flow control? Really?

Color me doubtful: usually custom flow controls are a hack, nothing else. For example, in Perl while( .. ) { ... } allows next/last to skip/exit the loop, but do { .. } while( .. ) doesn't work with next/last!

usually custom flow controls

usually custom flow controls are a hack, nothing else

I disagree. For example, one of the most interesting things about monads is their ability to sequence imperative statements, i.e. perform custom control flow. This means I can use nice, uncluttered code like this:

return x >>= foo >>= bar >>= baz

Then behind the scenes my monad can automatically take care of null checks (Maybe monad), execution traces (Writer monad), dependency injection (Reader monad), inverting control (Continuation monad), backtracking, unification, state-handling, transactional memory, co-termination, and so on.

Maybe you don't agree with making such things implicit. Fair enough, but in that case I'd say it's *even better* to have custom flow control, so that we can make the explicit handling clearer. Compare:

-- Implicit null-checks with the Maybe monad
blah x = do y <- foo x
            z <- bar y
            baz z

-- Explicit null checks (ewwww...)
blah x = case foo x of
           Nothing -> Nothing
           Just y  -> case bar y of
             Nothing -> Nothing
             Just z  -> baz z

-- Explicit null checks with a custom control-flow function
ifNotNull f Nothing  = Nothing
ifNotNull f (Just x) = f x

blah x = ifNotNull baz (ifNotNull bar (foo x))

-- Or, point-free
blah = ifNotNull baz . ifNotNull bar . foo

-- Alternatively, with the functions 'in the right order'
(==>) = flip (.)
blah = foo ==> ifNotNull bar ==> ifNotNull baz

Take a look at Dave Herman's blog post...

...on reading and parsing and how they interact with macros. It's really good! Once you read this, you will understand more than about 95% of what is written on the Internet about macros.

Sounds like the same result

Sounds like the same result as from my techreport on well-behaved parsing of extensible syntax, where I concluded the parser can't be well-behaved unless the structure of syntax trees can be deduced before one knows what the syntax extensions are. (here)

I've been thinking about liso overnight

Are you trying to build just one possible syntactic alternative to s-expressions, or the one, timeless alternative?

That table of operators seems like a lot. Do you really need every single one of them? Also, I've always found hardcoded precedence to be a hack.

You say you want to reduce punctuation, but almost all of your rules involve new punctuation. Do you mean that you want to reduce parentheses?

What's with the @ operator? I can't wrap my head around it.

In general, I fear you're making a mistake in common with sweet expressions: trying to eliminate all parens. It's one thing to reduce their use, but there's a sweet spot past which features and rules cost global understanding far more than any local benefits they provide.

Finally, you might be interested in this other attempt at something similar (which also makes the same mistake IMO but far more mildly). It was discussed at the arc forum and reddit.

Another attempt (mine, this time) that uses adjacency to mean application like liso does. Discussion at arc forum.

Ah, I think I get @

I just noticed and read your second link.

It seems to serve a purpose similar to keyword args in wart.

Paradoxically, @ in wart means.. apply.

(f @args)   ; like (apply f args), analogous to ,@ in macros
(f @args x) ; let's see your apply do that

More info.

That's what my post suggests, true...

... but in Liso I treat @ a bit differently. Essentially, @xyz acts as a named opening bracket matched by a colon. Think of it this way: most editors will indent Lisp expressions by looking up the "indenting policy" associated to the first atom. By and large, there's a certain number of special subforms, then a body. I'm just encoding the same thing syntactically. If you define some new control structure as a Lisp macro, it'll indent like crap unless you manually enter the rule somewhere or use something like SLIME to scan the code. My syntax requires neither analysis nor extra manual effort. Like it or not, this is the kind of advantage that a user is bound to notice.

Regarding your example, I didn't bother implementing splicing syntax for lists. If I did, f[*args, x] would work just fine (or some other operator, I don't know). Otherwise there's always f{args ++ [x]}.

If we're going to play this game, though... "let's see your apply" implement multimodal functions :)

prepend[text] =
   msg ->
      @match msg:
         "text" -> text
         [s] -> text ++ ", " ++ s

hello = prepend["hello"]
hello "text"   ;; "hello"
hello["text"]  ;; "hello, text"

Unlike Lisp or Scheme, Liso does not have to restrict function application to lists of arguments. The terse apply syntax makes it painless to embed various modalities into the type of the message you send to an object. OOP languages generally have five major modalities (function call, getter, setter, getindex, setindex), the last two sometimes conflated with the middle two. Liso can map all of them to f x for some type of x, add new modalities dynamically, and make metaprogramming completely trivial without any need for dirty hacks like magic methods. Adapting s-expressions to the semantics of these languages is more awkward, because you only have one modality to work with.
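
Here is roughly how I picture the prepend example above in plain Racket, with the modality dispatch made explicit. This is a loose sketch (it simplifies the literal "text" pattern to a type check), not how Liso is actually implemented:

;; One entry point; the object dispatches on the shape of the message.
(define (make-prepend text)
  (lambda (msg)
    (cond
      ;; hello["text"] -- a list message: the call modality
      [(list? msg) (string-append text ", " (car msg))]
      ;; hello "text" -- a bare message: another modality
      [(string? msg) text]
      [else (error "unknown modality:" msg)])))

(define hello (make-prepend "hello"))
(hello "text")          ;; => "hello"
(hello (list "text"))   ;; => "hello, text"

Adding a modality is just adding a clause to the dispatch, which is the point.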

Or consider this: a symbol that can act both as a macro and as a normal function. With o-expressions, you can do this. With s-expressions? Not cleanly.

I think it's closer to a

I think it's closer to a semantic alternative to s-expressions. In s-expressions, a pair has different semantics depending on whether it is found in car or cdr position in another pair: (b c) and (a b c) both contain the substructure (b c). The first occurrence has application semantics because it's in car position. The second occurrence has cons semantics because it's in cdr position.

I believe that this conflates what are fundamentally two different node types. So I split them up. In Liso, every pair is an application. The s-expression (a . (b . c)) would mean: apply b on c, then apply a on the result. Pairs define binary trees and not lists of lists. For lists, some other data structure ought to be used. Abstractly speaking it's a bit like this:

(f x (g y z))         ;; Standard syntax, standard semantics (function calls are represented by lists)
(f . [x (g . [y z])]) ;; Standard syntax, my semantics ([...] is a different data structure, not reducible to conses)

I have no use for s-expressions: they are the wrong syntactic sugar for the alternate semantics that I chose. In principle your own "Blub" attempt comes very close to what I'm doing, but I feel that it stops short of what I believe to be the next logical step: ditching lists for program representation in favor of pairs and adding a second internal node type for lists specifically.
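
To make the split concrete, here is how I would model the two node types as Racket structs (an illustrative model, not my actual implementation):

(struct app (head arg) #:transparent)  ;; every pair is an application
(struct lst (items)    #:transparent)  ;; a distinct node type for lists

;; (f x (g y z)) under my semantics, i.e. (f . [x (g . [y z])]):
(app 'f (lst (list 'x (app 'g (lst (list 'y 'z))))))

The list node is plain data: it is not reducible to a spine of application pairs.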

The priority graph reflects what I believe is likely to increase programmer efficiency in practice. I am not a purist. There are standard priorities for arithmetic and logic that nearly everyone is familiar with. Pretty much nobody will complain about their presence but their absence would likely cause irritation. Priority parsing is trivial. Is it a hack? Maybe. Should I care? Probably not.

Regarding parens elimination, one thing you have to realize is that Lisp-like syntax eliminates them whenever it can get away with it. Consider the following comparisons:

(when test a b c)         ;; Lisp: 2 parens
@when test: {a, b, c}     ;; Liso: 2 braces
(when test (begin a b c)) ;; Equivalent s-expression: FOUR brackets

(f :x 1 :y 2)         ;; Lisp: 2 parens
f[x = 1, y = 2]       ;; Liso: 2 brackets
(f (= x 1) (= y 2))   ;; Equivalent s-expression: SIX brackets

Syntax based on o-expressions produces deeper nesting than the typical s-expression-based syntax. It preserves structural information that Lisp throws away. There are essentially two ways to solve parenthesitis: more syntax, or less structure. I do the former. Lisp does the latter. Clojure does the latter so much that I find its syntax offensive. I prefer my solution. I mean, let's be honest here, if you had to analyze code to find key/value pairs... you're going to have a harder time with Lisp than with Liso.

Fair enough

I'm going to play with this some more.

Syntax with clean macros is the holy grail of lisp. Are you really saying you've done it?

Thanks for the interesting factoid on clojure complecting things.

Another project you might be interested in.

^-expressions

I'm pretty late to the party but this thread got me thinking about s-expression syntax and I had an idea for handling infix operators. It would be possible to add support for arbitrary infix operations to s-expressions just by using an extra symbol to denote them. I think the caret symbol is a good choice because it is easy to type and looks like a branching tree (which corresponds roughly to an AST representation).

The ^ would transform triples like x ^y z to (y x z). It would be left-associative, so 1 ^+ 2 ^* 3 would be (* (+ 1 2) 3).

These programs would be equivalent:


(fib n) ^=
  (if n ^<= 1
    n
    (fib n ^- 1) ^+ (fib n ^- 2))

(= (fib n)
   (if (<= n 1)
       n
       (+ (fib (- n 1))
          (fib (- n 2)))))
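
If it helps, the transform can be sketched as a small post-read pass over ordinary s-expressions. Here is one possible rendering in Racket (the names and details are mine, not a spec):

;; Rewrite x ^op y triples left-associatively inside any read list.
(require racket/match)

(define (caret-op? x)
  (and (symbol? x)
       (let ([s (symbol->string x)])
         (and (> (string-length s) 1)
              (char=? (string-ref s 0) #\^)))))

(define (strip-caret op)
  (string->symbol (substring (symbol->string op) 1)))

(define (expand-carets form)
  (if (not (list? form))
      form
      (let loop ([items (map expand-carets form)] [done '()])
        (match items
          [(list* lhs (? caret-op? op) rhs rest)
           ;; reduce the leftmost triple first => left-associativity
           (loop (cons (list (strip-caret op) lhs rhs) rest) done)]
          [(cons x rest) (loop rest (cons x done))]
          ['() (reverse done)]))))

(expand-carets '(if n ^<= 1 n (fib n ^- 1)))
;; => '(if (<= n 1) n (fib (- n 1)))

The top-level (fib n) ^= form would need the same pass applied to the sequence of forms read from the file.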


I've been toying with exactly that.

I have this exact operator in a lispy syntax I've been experimenting with, complete with that left-associativity behavior. (Technically I chose a different character than ^.) I wanted to let programmers express various flavors of qualified name lookup and object property access without burying the variables/properties under parentheses everywhere they occur.

It turns out I haven't used this syntax much. I have trouble finding a nice way to indent my code whenever I even think about using it, and (fib n ^- 1) looks too much like it means (- (fib n) 1). I was thinking of changing this syntax to have punctuation on both sides so I could use it without whitespace: (fib n^-^1). I think this would clarify the precedence and suggest a more uniform indentation style at the same time.

When thinking along these lines

When thinking along these lines, I envision using the first character of an atom to determine whether it's a keyword or a symbol, and requiring symbols and keywords to strictly alternate. Parentheses, imho, are highly desirable for small expressions; in fact, I fully parenthesize expressions when writing in C, JavaScript, you name it, because this rescues programmers from the perils of operator precedence. I'd replace parentheses with other notations at a larger scale, which, ironically, seems to be where most alternative Lisp notations don't try to supplant parentheses.

(As an alternative to strictly bipartitioning leading characters between keyword/symbol, I've sometimes favored having a set of keyword-only leading characters, and then allowing other atoms to be keywords or symbols depending on context. If the first atom is a keyword, odd elements of the expression are keywords; otherwise, the even elements are keywords. This does introduce a new possible syntax error: misplaced keyword, when an element of an expression is supposed to be a subexpression but instead is an unparenthesized keyword.)

[Didn't make clear enough, in that, the role of literal constants and subexpressions, which can occur wherever a symbol can. Straightforward, though.]

I'm annoyed with myself for

I'm annoyed with myself for that; I botched the details. The rule should be, if the first element of the list is ambiguously either a symbol or a keyword, you look at the second element, and if it's ambiguous too, you make an assumption about which of the two is a keyword. (Doh.) All of which points out the desirability of just choosing in the first place an unambiguous rule for which atoms are keywords.