syntax and nesting: Lispy or Algol'ish?

I've recently made an observation about syntax that has provided the first compelling reason I've seen to *NOT* use the traditional fully-parenthesized prefix syntax of Lispy languages. Instead, it looks like a better option for avoiding deeply nested syntax is the argument-parenthesized prefix syntax of Algol-like languages, along with an infix dot as syntax for an infix field-dereference operator for objects.

The rationale is that with these two syntax additions, you are able to avoid a lot of nesting levels when programming in a functional style -- and nesting levels are often one of the things people find confusing to keep track of.

When programming in a functional style, you usually want a "functional OO" style, ie, all operations on objects return new objects rather than modifying extant objects in place. This is frequently referred to as "monadic" programming, but IMO that terminology mainly gives rise to confusion.

So let's look at a straight Lispy syntax for a compound operation involving three method calls with two arguments each:

((getmethod
   ((getmethod
      ((getmethod object method1) arg11 arg21) 
         method2) arg12 arg22) 
            method3) arg13 arg23)

Here the syntactic nesting is six paren-levels. The same operation with the dot for an infix field-dereference (eg, getmethod) operator gives this:

(((object.method1 arg11 arg21)
      .method2 arg12 arg22)
         .method3 arg13 arg23)

which is three paren-nesting levels and three infix dereference operators, and seems easier for humans to parse. Now, if we switch the lispy fully-parenthesized prefix notation for the Algol-ish argument-parenthesized prefix notation, we get:

object.method1(arg11 arg21)
   .method2(arg12 arg22)
      .method3(arg13 arg23)

Which is still the same six levels of semantic nesting, but no longer syntactically nested in parentheses at all. And this seems easiest for humans to parse, because the whole syntax for each operation "extends" the syntax for previous operations instead of having to be nested inside it.
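To make the comparison concrete, here is a minimal sketch in Python (not the language under discussion) of the functional-OO style with dot chaining. The `Account` class and its `deposit`/`withdraw` methods are hypothetical names invented for illustration; each method returns a new object rather than mutating the receiver.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Account:
    """Hypothetical immutable object; every method returns a new Account."""
    balance: int = 0

    def deposit(self, amount: int, fee: int) -> "Account":
        return replace(self, balance=self.balance + amount - fee)

    def withdraw(self, amount: int, fee: int) -> "Account":
        return replace(self, balance=self.balance - amount - fee)


start = Account()

# Chained form: each call extends the previous one, with no extra nesting.
chained = (start.deposit(100, 1)
                .withdraw(30, 1)
                .deposit(5, 0))

# Nested form with explicit method lookup, mirroring the getmethod version.
nested = getattr(
    getattr(
        getattr(start, "deposit")(100, 1),
        "withdraw")(30, 1),
    "deposit")(5, 0)
```

Both forms compute the same value; only the amount of syntactic nesting differs.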

Argument-parenthesized notation is just as 'regular' as fully-parenthesized, so macrology/call syntax can work the same way. The "infix dot" can be a reader macro as typical for lisps giving a fully-regular syntax. So, although the usual downside of losing lispy syntax is loss of ability to manipulate the code as data, I'm not seeing it here.

Just an observation at this point, but did I miss anything important?

another variant of Lisp syntax

Just for this example, there's another way to reduce nesting. I expect to support another syntax in any Lisp dialect I implement, borrowing from a Smalltalk model. (I plan to name a Smalltalk dialect "Gab" because it's a short word of like meaning. So the following syntax is Gabby as well as Lispy.) I'd write your expression like this:

(((object method1 arg11 arg21) method2 arg12 arg22) method3 arg13 arg23)

Symbols method1, method2, and method3 are not evaluated — they are just method selectors. Evaluation logic goes as follows. Does a special form symbol appear in first position? If not, evaluate it. Is the value executable? If not, then the expression in second position must be a symbol, used to lookup the method with that selector, but all other expressions after the selector get evaluated and passed as args to the method invoked. Basically this extends Scheme evaluation by turning an error case (non-executable value in first position) into Smalltalk style method dispatch.
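The evaluation rule just described can be sketched as a toy evaluator. This is a rough Python illustration, not the Ygg or Gab implementation: expressions are nested tuples, every string is treated as a symbol, `quote` is the only special form, and `getattr` stands in for selector lookup. All of those choices are assumptions made for the example.

```python
import operator


def evaluate(expr, env):
    """Evaluate a nested-tuple expression with Smalltalk-style fallback dispatch."""
    if isinstance(expr, str):          # a symbol: look it up
        return env[expr]
    if not isinstance(expr, tuple):    # a literal such as a number
        return expr
    head = expr[0]
    if head == "quote":                # special form: argument is not evaluated
        return expr[1]
    target = evaluate(head, env)
    if callable(target):               # ordinary Scheme-style application
        return target(*(evaluate(arg, env) for arg in expr[1:]))
    # Non-executable value in first position: the second position is an
    # unevaluated selector; the remaining expressions are evaluated as args.
    selector = expr[1]
    method = getattr(target, selector)
    return method(*(evaluate(arg, env) for arg in expr[2:]))


env = {"add": operator.add, "greeting": "hello world", "sep": " "}

# ((greeting split sep) index (quote world)): two chained method sends
result = evaluate((("greeting", "split", "sep"), "index", ("quote", "world")), env)
```

Here `split` and `index` are never looked up as variables; they are used only as selectors on the value in first position.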

To get code like the above to work in Smalltalk, with arity of two for each method, we need method names in "keyword message selector" format with two colons apiece. Suppose we replace method1 with method1:x:, method2 with method2:y:, and method3 with method3:z:. Then this does the same thing:

((object method1: arg11 x: arg21) method2: arg12 y: arg22) method3: arg13 z: arg23

Of course it would be shorter if we had replaced method1 with m1:x:, etc. The amount of nesting in Smalltalk syntax for that example is only one less than a Lisp version, so Lisp doesn't look especially verbose here. Binding local variables in let expressions can look overly nested to me, but not enough to annoy.

By Rys McCusker at Fri, 2013-03-08 10:45

This is more or less a pure version of how SICP introduces objects but you have one more layer of parentheses:

((object 'method) arg1 arg2)

Another option (perhaps a compromise) could be:

(object 'method (arg1 arg2))

In ISWIM-syntax languages, this would just be:

object 'method (arg1 arg2)

Which is only a few characters away from the dot syntax.

One issue with chaining methods is the return value of a method is not necessarily the object, e.g.

dict.get(key1).set(key2, value)

There are two different ways to interpret:

dict.get(key2)

Will it return value or something else? In the pure version, each method would return object. In the side effect version, the return value is almost never object.
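A small Python sketch of the two behaviours, for illustration only. `pure_set` is a hypothetical helper; `dict.update` is the standard in-place method, which returns `None`.

```python
def pure_set(d: dict, key, value) -> dict:
    """Pure version: return a new dict, leaving the original untouched."""
    return {**d, key: value}


original = {"a": 1}

# Pure style: every step returns a dictionary, so steps chain.
updated = pure_set(pure_set(original, "b", 2), "c", 3)

# Side-effect style: dict.update mutates in place and returns None,
# so there is nothing to chain the next call onto.
mutable = {"a": 1}
returned = mutable.update({"b": 2})
```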

By kms at Fri, 2013-03-08 15:42

maybe more impure than it looks

(I read both SICP and EOPL in the early 90's, but my post here conflates them so when kms says SICP I respond as if EOPL was mentioned instead ... which is sorta weird and slightly embarrassing to me. But more importantly, it makes my response even more rude because it's both inappropriate and wrong, for which I apologize.)

I like Ray's comparison of alternatives, so I added one — which is not a reference to SICP because it comes from an interpreter named Ygg I wrote in the early 90's, revised several times from 1990 to 1993. That I may do it again is boring; only green threads would make it different. I like Ray's criticism it's only single dispatch, because discussing that in docs would be useful.

This is more or less a pure version of how SICP introduces objects but you have one more layer of parentheses

Hmm, I thought I had fewer parens. I like SICP but haven't thought much about it in twenty years. At Taligent several of us started casually going through SICP, so we weren't thinking solely about C++ all the time, and because it was fun. I forget the variant of Scheme I was using for SICP, but its macro system didn't match that assumed by SICP materials. I made a wild guess its macros might work just like non-hygienic macros I had coded for Ygg, and they did.

You can still find my SICP (cor: EOPL) contributions for define-record and variant-case online here and there, with a 1993 copyright plus edits by Friedman and other folks. So there's proof. :-) Posting them here in no way helps this topic though, and I basically detest appeal to authority. [Edit: It was MacGambit; for the curious, a clean copy appears at http://www.cs.indiana.edu/pub/eopl/gambitmacros.s, and my original Usenet post notes the odd use of eval to cause top-level definitions.]

Did my description make sense by itself? Sometimes I'm puzzled when a short and simple description causes a person to invoke a book, which adds burden on readers when they may assume they can't understand without boning up first on dusty tomes.

Change of subject: I had assumed Ray's example did not return the same object from each call. Smalltalk actually has specialized syntax to send multiple messages to the same object. But it occurs rarely enough in practice that folks feel semi-colon was wasted on such a narrow purpose.

By Rys McCusker at Fri, 2013-03-08 23:52

I, too, liked the comparison of alternatives, so I added a few! When I said "you", I did not mean McCusker -- I meant generic you, as in, "If you do it this way I am about to suggest, then you have one more layer of parentheses."

I didn't mean to appeal to authority. I didn't grok objects/closures until I fully digested this bit of SICP. Your description was good! This way makes the object/closure relationship clearer for me.

Sorry for being so defensive. There I'm doing it again!

Regarding methods that would otherwise be void that return the modified object, I just was brushing up on JavaScript and Crockford refers to this style of method as a cascade. Is this terminology founded elsewhere (perhaps in PLT) or just an analogy to CSS?

By kms at Sat, 2013-03-09 04:11

my error then

Darn, my misinterpretation of your first sentence was more my fault than yours, as if you meant Above by "This" when you meant Below, and then I saw you as confirming what I expected. I fell prey to a classic error I worry about more when writing, which is, "If a reader can interpret a comment as about themselves, they usually will." So I should have known better. On re-reading your remark, I belatedly noticed there was a second interpretation, but by then it seemed best to see what you reply. Thus I'm the one owing an apology. Thanks for being good natured.

Don't worry about being defensive. I say sorry a lot myself. Somehow I used to provoke folks into barking, "Stop saying you're sorry!" Now I think they felt slightly like jerks for having used too critical a tone themselves. I'm usually keen to hear criticism about something I did, because I might learn something.

Now you mention it, Smalltalk used cascade to mean nearly that, using a semi-colon between messages to target the same object, so returning self instead of void was not necessary. But I recall no other use of the term in other contexts beyond phrases "cascading failure" and "cascading style sheets." So I'd expect precedent is Smalltalk, or perhaps another language too like Self, with which I'm not familiar.
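For readers who have not seen a cascade, here is a rough Python analogue. The `cascade` helper is hypothetical; it returns the receiver, which is similar in spirit to ending a Smalltalk cascade with `yourself`.

```python
def cascade(receiver, *messages):
    """Send each (selector, *args) message to the same receiver.

    Return values of the individual sends are ignored, so the methods
    do not need to return the receiver for this to work.
    """
    for selector, *args in messages:
        getattr(receiver, selector)(*args)
    return receiver


# list.append and list.sort both return None, yet the sends still compose.
items = cascade([], ("append", 3), ("append", 1), ("sort",))
```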

By Rys McCusker at Sat, 2013-03-09 06:55

Cascade

I've never heard "cascade" used to describe that. I've heard "method chaining" and "fluent interface" (which is a truly horrible name).

By Matt Hellige at Mon, 2013-03-11 02:33

"Cascade" is a term from Smalltalk. Smalltalk has built-in support for sending a bunch of messages to the same receiver. Fluent interfaces, as I understand it, were invented by a bunch of Smalltalkers to try to emulate that feature in languages like Java that didn't support it natively. See here.

By munificent at Fri, 2013-03-15 18:02

The dot can also be an ordinary function if your language supports infix operators.

x.f = (f x)

You would have to arrange the precedence so that

x.f(y).g(z)

works correctly. This is what they do in F#, except they use |> as the operator (and for some reason they still have a separate syntax for .NET object method invocation, instead of treating methods as functions). So in F#:

x |> f y |> g z

is equal to:

g z (f y x)

e.g.

x |> map f |> filter p |> fold g 0
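The same left-to-right pipeline can be approximated in Python with a fold over a list of functions. This is only a sketch: `pipe` is a hypothetical helper, and the lambdas stand in for the partially applied `map f`, `filter p`, and `fold g 0` above.

```python
from functools import reduce


def pipe(value, *steps):
    """Feed value through each one-argument function in turn, left to right."""
    return reduce(lambda acc, step: step(acc), steps, value)


result = pipe(
    range(1, 7),
    lambda xs: map(lambda x: x * x, xs),           # map f
    lambda xs: filter(lambda x: x % 2 == 0, xs),   # filter p
    lambda xs: reduce(lambda a, b: a + b, xs, 0),  # fold g 0
)
```

Each stage receives the previous result as its last (here, only) argument, which is the property the pipe operator relies on.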

By Jules Jacobs at Fri, 2013-03-08 14:51

Why F# method syntax is different

The reason F# needs a separate syntax for .NET methods is that methods need to support overloading. Multiple .NET classes can have methods with the same name, and in addition a single class can have multiple overloads of the same method. The non-OO parts of F# are basically just Caml, which cannot support overloading functions without totally messing up type inference. So there needs to be some syntactic way of distinguishing .NET method calls from function calls, in order to allow the functional part of the language to continue to work.

By Adam Jenkins at Mon, 2013-03-11 16:04

Hmm. Now considering multi-arg dispatch methods....

The above is highly suitable for single-argument dispatch OO, but, SICP's gentle introduction aside, the traditional way to do OO in Lisps is with generic functions dispatching on multiple argument types.

Multi-argument dispatch is a generalization of single-argument dispatch, but because the method is no longer a property of a single entity, you wouldn't reach it via a field dereference operator.

The compound call above in traditional lispy syntax would wind up being something like:

(method3 
   (method2 
      (method1 object arg11 arg21) 
   arg12 arg22) 
arg13 arg23)

The proposed Algol-ish argument-parenthesized notation makes it into:

method3(
   method2(
      method1( object arg11 arg21)
   arg12 arg22)
arg13 arg23)

Which is no better, really. Dang.

By Ray Dillinger at Fri, 2013-03-08 17:33

Macro

Here's a way to wrap the traditional syntax:

; Assumes Common Lisp-style defmacro; expands recursively into nested calls.
(defmacro bind (object method-list)
   (if (null method-list) object
       `(bind (,(caar method-list) ,object ,@(cadar method-list))
              ,(cdr method-list))))

(bind object ((method1 (arg11 arg21))
              (method2 (arg12 arg22))
              (method3 (arg13 arg23))))
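The same threading idea can be written as an ordinary function when the methods are first-class values. The following is a Python sketch rather than a macro, so the nesting is built at run time; `clamp`, `scale`, and `shift` are hypothetical three-argument functions standing in for method1 through method3.

```python
def bind(obj, method_list):
    """Thread obj through (function, args) pairs: f3(f2(f1(obj, *a1), *a2), *a3)."""
    for function, args in method_list:
        obj = function(obj, *args)
    return obj


def clamp(x, lo, hi):
    return max(lo, min(x, hi))


def scale(x, num, den):
    return x * num // den


def shift(x, dx, dy):
    return x + dx + dy


threaded = bind(12, [(clamp, (0, 10)), (scale, (3, 2)), (shift, (1, 1))])
nested = shift(scale(clamp(12, 0, 10), 3, 2), 1, 1)
```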

By kms at Fri, 2013-03-08 17:50

What I was going to say

This also addresses your issue in another comment: we no longer care what each method returns.

By Kartik Agaram at Sat, 2013-03-09 07:32

Existing usage

In Arc, which doesn't particularly use objects, my code would tend to look similar to examples already posted.

(method3 (method2 (method1 object arg11 arg21) arg12 arg22)
  arg13 arg23)

; anaphoric and, which short-circuits if an intermediate value is nil
(aand object
  (method1 it arg11 arg21)
  (method2 it arg12 arg22)
  (method3 it arg13 arg23))

The short-circuiting of aand isn't always convenient for this purpose, but it can come in handy elsewhere.
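A rough Python rendering of the short-circuiting behaviour, assuming `None` plays the role of nil (Arc's nil is also the empty list, which this sketch does not model). `thread_until_none` is a hypothetical name.

```python
def thread_until_none(value, *steps):
    """Apply steps in order, stopping early if any intermediate value is None."""
    for step in steps:
        if value is None:
            return None
        value = step(value)
    return value


config = {"db": {"host": "localhost"}}

# Each lambda's parameter plays the role of the anaphoric `it`.
host = thread_until_none(config, lambda it: it.get("db"), lambda it: it.get("host"))
missing = thread_until_none(config, lambda it: it.get("cache"), lambda it: it.get("host"))
```

In the second call the lookup of "cache" yields `None`, so the final step never runs.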

Switching to a more obviously OO-related viewpoint, Clojure code takes a similar approach when accessing JVM methods, from what I've seen:

(.method3 (.method2 (.method1 object arg11 arg21) arg12 arg22)
  arg13 arg23)

(-> object
    (.method1 arg11 arg21)
    (.method2 arg12 arg22)
    (.method3 arg13 arg23))

So, although the usual downside of losing lispy syntax is loss of ability to manipulate the code as data, I'm not seeing it here.

Here's a list of lisps I know of that amp up the sugar, yet still parse to the ad hoc structure of s-expressions before semantic processing. It's a frequent topic on Arc Forum.

The "infix dot" can be a reader macro as typical for lisps giving a fully-regular syntax.

A nice feature of a syntax based on stream reader macros is that once the outermost running reader macro has finished, the read operation is complete. Infix at the top level compromises that. After reading any value, the reader will have to look ahead to verify that a non-whitespace, non-infix reader macro is coming up, which is unfortunate if the input stream is a REPL.

Nevertheless, it wouldn't be the end of the world for there to be another kind of reader macro that acted as a command terminator, like a statement semicolon for lisp, but perhaps bound to newline instead. :)

By Ross Angle at Sat, 2013-03-09 00:41

Dot notation is just syntax, it works perfectly well for any kind of dispatch. a.method(b,c) ===> method(a,b,c)
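Python happens to expose this equivalence directly for its own (single-dispatch) methods, which makes for a quick illustration. `Vec` and `translate` are hypothetical names; the sketch does not demonstrate multiple dispatch, only that the dot form and the function form denote the same call.

```python
class Vec:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def translate(self, dx, dy):
        return Vec(self.x + dx, self.y + dy)


a = Vec(1, 2)

via_dot = a.translate(10, 20)            # a.method(b, c)
via_function = Vec.translate(a, 10, 20)  # method(a, b, c)
```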

By Jules Jacobs at Sat, 2013-03-09 15:39

Syntax...

I've been going with a fairly lisp-based design, but intend to add some basic syntax to make it easier for people to use.

Syntax: right now I'm pretty certain of the dot as a non-evaluating infix dereference operator, square braces as an evaluating dereference operator, and numbers and numerically typed promises having an infix-mathematics expression evaluator as their call semantics. Symbols are divided into three lexically distinct classes corresponding to the names of lexically scoped variables (basic formation rules), the names of dynamically scoped variables (basic formation rules, plus a beginning and ending asterisk), and self-evaluating constants (basic formation rules, plus a leading octothorpe). I am not (yet) sold on Algol'ish function call notation.

The distinction between evaluating and non-evaluating dereference operators is that the non-evaluating operator takes a single token and the evaluating operator takes an expression. So for example if you say foo.5 and foo[3 + 2] both would mean "the value stored in the field named 5 of the variable foo" but (foo.3 + 2) would mean "two more than the value stored in the field named 3 of the variable foo." Or, as another example, foo.bar would mean "the value stored in the field named bar of the variable foo" whereas foo[bar] would mean "the value stored in that field of the variable foo whose name is the result of evaluating the variable bar."
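The distinction can be mimicked in Python with a dictionary standing in for the object's fields. This is a sketch under that assumption; `deref_literal` and `deref_evaluated` are hypothetical names, and a zero-argument lambda models the unevaluated expression inside the brackets.

```python
foo = {"bar": "field named bar", "baz": "field named baz", 5: "field five", 3: 40}
bar = "baz"  # a variable whose value names a different field


def deref_literal(obj, token):
    """foo.token: the token itself is the field name; nothing is evaluated."""
    return obj[token]


def deref_evaluated(obj, thunk):
    """foo[expr]: the expression is evaluated first, then used as the field name."""
    return obj[thunk()]


dot_bar = deref_literal(foo, "bar")               # foo.bar
bracket_bar = deref_evaluated(foo, lambda: bar)   # foo[bar]
dot_five = deref_literal(foo, 5)                  # foo.5
bracket_sum = deref_evaluated(foo, lambda: 3 + 2) # foo[3 + 2]
dot_three_plus_two = deref_literal(foo, 3) + 2    # (foo.3 + 2)
```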

Simplified basic types

I'm considering characters to be nothing more nor less than strings that happen to be short. I'm also making no distinction between various kinds of collections (lists, arrays, trees, hash tables, etc) -- that sort of implementation choice is to be left up to a profiling optimizer. A dictionary is no more nor less than a collection of tuples, which is also the operating definition for a database table.

And finally, I've thought a lot about the behavior of numbers and I'm trying to eliminate most of the "accidental" degenerate cases. Firstly, I'm making no type distinction between integer, rational, and decimal-fraction numbers. All of those are just sets of values and a number may have any value from one of those sets. Basic mathematical operations track non-exactness using contagion rules so you can tell if your results are inexact, but the axioms involved are not amenable to Hindley-Milner, etc, so in terms of type theory exact and inexact numbers are the same type. Both exact and inexact numbers have exactly the same set of possible numeric values, and have representations limited to the same precision/size.

There is a bignum type lexically and semantically distinguished from other numbers. If you're not using bignums, the system assumes you don't want results more precise than the standard representation can handle, and will round off (and coerce to inexact) as necessary to keep results inside that size/format. OTOH, if you actually do want unlimited-precision mathematics (at least until your machine runs out of memory) you have to be sure to use only exact operations and you have to make sure that all arguments to those operations are exact bignums, even those whose values could be represented as ordinary numbers.

If you use inexact bignums with exact operations, the result will be inexact but *as* precise in representation as the least-precise inexact argument, even if that's substantially more precise than the standard numerics. If you use exact bignums with inexact operations, the result will be inexact but as precise as the most-precise argument.

#nil is not merely a value in this lisp; it is reserved as the result of an operation that does not make sense. If you ask for the first element of an empty list, you get #nil because the operation does not make sense, not because that's the representation of the empty list. If you try to store #nil anywhere, it will not store; the operation returns #nil because an attempt to store #nil does not make sense. Likewise attempts to add #nil to a number return #nil, because that doesn't make sense either. Just about the only thing you can do with #nil that does make sense is check to see whether a value is #nil.
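A minimal Python sketch of these rules, with a sentinel object standing in for #nil. The names `NIL`, `first`, and `store` are hypothetical, and only addition is modelled.

```python
class _Nil:
    """Hypothetical stand-in for #nil: the result of an operation that makes no sense."""

    def __repr__(self):
        return "#nil"

    def __add__(self, other):   # adding #nil to anything yields #nil
        return self

    __radd__ = __add__


NIL = _Nil()


def first(items):
    """First element, or NIL when asking makes no sense (empty list)."""
    return items[0] if items else NIL


def store(table, key, value):
    """Refuse to store NIL; report the refusal by returning NIL."""
    if value is NIL:
        return NIL
    table[key] = value
    return value


table = {}
attempt = store(table, "k", first([]))
```

The identity check `value is NIL` is the one operation on the sentinel that yields an ordinary answer, matching the last sentence above.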

There are several other sets of distinguished values; boolean #true and #false, #uninitialized, and eight different #NaNs to name a few.

By Ray Dillinger at Sat, 2013-03-09 20:49

Working through implications

Argument-parenthesized notation is just as 'regular' as fully-parenthesized, so macrology/call syntax can work the same way.

Say we decided to build a lisp with all the power of macros, but we put the function name outside the parens.

  f(arg1 arg2)

(Nice touch dropping the commas!)

How would you represent this as data? Would it still have the same structure as traditional s-exprs?

  car(f(arg1 arg2)) = f
  cdr(f(arg1 arg2)) = arg1(arg2)

Do you want to be able to represent non-call lists in the traditional way? It's kinda weird to have to say arg1(arg2). But if you want to be able to say (arg1 arg2), then you're now no longer as regular as traditional syntax. Code isn't quite data, in a tiny, subtle way. Ambiguities abound, and you have to do things like require no space between function and open-paren. Clunky. Let's choose door 2.

---

Traditionally:

  (a b c) == (a . (b . (c . ())))

The equivalent transformation now would seem to be:

  a(b c) == a(b(c()))

which suggests a different language, one with a sort of reverse-currying semantic..

I find the syntax for dotted lists interesting:

  (a b c . d) => a(b c(d))

---

Macros. We would like macros to be able to, among other things, create control flow operators that are indistinguishable from our primitives. So our primitives need to look like this as well:

  if(test(arg1 arg2)
    do(expr1
       expr2)
    do(expr3
       expr4))

  if(test(arg1) do_something(arg1) do_something_else(arg2))

  def(f(param1 param2 param3)
    do_something(param1 param2)
    do_something_else(param1 param2 param3))

  assign(f fn((param1 param2 param3)
             dosomething(param1 param2)
             dosomething_else(param1 param2 param3)))

Kinda jarring to the eye, but ok, maybe it's just novelty. Let's keep going:

  mac(when(cond . body)
    `(if(,cond do ,body)))

On the pro side it's cool how we don't need a splice operator, we can just say (list a ,(b) or (list a ,rest) when we mean (list a ,b) or (list a ,@b).

On the con side: backquote needs parens that don't really exist in the rest. But ok, maybe `() looks like a function.

Which reminds me, regular quote has the same problem:

  'x => '(x)
  '(a b c) => '(a(b c))

Between quote and unquote, there could end up being a lot more parens.

...to be continued (maybe).

By Kartik Agaram at Sun, 2013-03-10 02:13

Right....

As I said, I'm not quite sold on the Algol'ish function call syntax.

These are some pretty good reasons.

Quote and unquote being functions and having the syntax of function calls (parens and all) is fine. But the list/call syntax mismatch is not so fine, unless there is something else besides lists, just as general, that has the same syntax in data as a call has in code.

I'm not going to break homoiconicity; the idea that all code can be read as data is too important to get rid of. That's the main incomplete notion that still has me hesitating about embracing an Algol'ish call syntax.

Ray

By Ray Dillinger at Sun, 2013-03-10 05:33

The only reason conventional syntax can't have macros

..is algol'ish call syntax, I think. For the reasons I showed above. It can be done, but it ends up becoming onerous.

Everything else in an algol-like syntax can be desugared pretty easily, IMO. Statement separators/terminators can just become comment tokens :) And things like begin/end or {} can be turned into lists. You can have a seemingly-statement oriented language without creating the expression-statement divide. It's just f(x) that's the pain. Better to just use (f x) everywhere.

I discussed this before here. It's based on experiences with my toy language, Wart, that relies on a lisp-like evaluation model.

By Kartik Agaram at Sun, 2013-03-10 05:50

C is homoiconic: strings are data. Why doesn't C have the advantages of Lisp then? Because sexprs have more structure than strings. But sexprs don't capture all the semantic structure there is in code. (list 'if (list '<= 'a 'b) 'a 'b) is not the most structured representation of an if AST node. Lisp sexprs are just an intermediate between completely unstructured (strings) and completely structured (an abstract data type for ASTs). The problems of Lisp macros, e.g. unintentional variable capture, also come from code representation not being completely structured. Therefore you can and should be *more* homoiconic than Lisp, by representing code with an abstract data type for ASTs.
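One possible reading of "an abstract data type for ASTs", sketched in Python for illustration. The node classes `Var`, `LessEq`, and `If` and the `free_vars` function are hypothetical; the point is only the contrast between positional list structure and named fields.

```python
from dataclasses import dataclass

# S-expression style: structure is positional and every consumer re-parses it.
sexpr_if = ["if", ["<=", "a", "b"], "a", "b"]


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class LessEq:
    left: object
    right: object


@dataclass(frozen=True)
class If:
    test: object
    then: object
    otherwise: object


# AST style: each node type names its parts.
ast_if = If(test=LessEq(Var("a"), Var("b")), then=Var("a"), otherwise=Var("b"))


def free_vars(node):
    """Collect variable names by dispatching on node type, not list position."""
    if isinstance(node, Var):
        return {node.name}
    if isinstance(node, LessEq):
        return free_vars(node.left) | free_vars(node.right)
    if isinstance(node, If):
        return free_vars(node.test) | free_vars(node.then) | free_vars(node.otherwise)
    raise TypeError(f"not an AST node: {node!r}")
```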

By Jules Jacobs at Sun, 2013-03-10 11:59

The problems of Lisp macros, e.g. unintentional variable capture, also come from code representation not being completely structured.

I view structure and hygiene as orthogonal. Scheme hygienic macro systems achieve hygiene without a full AST-based model.

By Manuel J. Simoni at Sun, 2013-03-10 13:14

Although it is possible to do hygienic macros without changing the data type you're working on, the resulting macro systems are limited (e.g. syntax-rules). So many hygienic macro systems don't work on sexprs but on their own syntax objects. If you use an AST abstract data type from the start, hygiene comes naturally rather than something you have to specifically design for.

By Jules Jacobs at Sun, 2013-03-10 16:56

So many hygienic macro systems don't work on sexprs but on their own syntax objects.

Yes, but those are effectively conses and symbols with hygiene information attached - the basic model is still Lisp's AST-less, list-based syntax.

By Manuel J. Simoni at Sun, 2013-03-10 17:49

Yes, they are conses and symbols plus extra structure. So structure and hygiene are not orthogonal. For another example of a problem that has to be explicitly fixed with sexprs based macros that wouldn't even exist in the first place with structured ASTs, see Fortifying Macros. The problem arises because the meaning of sexprs is not automatically aligned with the representation, so sexprs need to be parsed to extract meaning, just like strings need to be parsed to extract meaning.

By Jules Jacobs at Mon, 2013-03-11 01:07

C lacks ubiquitous first-class values, a true function eval, and sufficiently powerful tools for manipulating strings as structured entities. Although no tools can fully compensate for the inherent lack of structure of strings vs sexprs, there are things one can do. Add a true eval, fcvs, and regular expressions, and some aggressive mutability for good measure, and you have... javascript. I'm not fond of javascript, but I recognize that for all its faults it nevertheless taps into the fringes of the power of Lisp.

By John Shutt at Sun, 2013-03-10 14:49

I think I need example to understand you. What could be the most structured representation of IF AST node?

By Kazimir Majorinc at Fri, 2013-03-15 16:14

Oh ... lists are collections.

Lists are collections. They are not recursively built out of cons cells as I've formulated it; in fact there is no such thing as a cons cell as distinguished from a collection of two elements. Collections can as easily be arrays or something else. One of the results is that the empty list or #nil has nothing to do with terminating a list. Another is that the "dot syntax" for pairs is not tied to that particular meaning and dot can be used instead as an infix operator for dereference.

car and cdr as functions will return #nil when you attempt them on an empty list, and in fact this is how the end of some iterations is detected. But that's nothing to do with list structure as such.

so, the traditional list syntax being a shortcut for pair syntax, or

(a b c) == (a . (b . (c . ())))

is simply not valid. Instead list syntax denotes collections whose storage format is unspecified.

By Ray Dillinger at Sun, 2013-03-10 05:46

I see!

So did you imagine just using some syntax like the following?

[a b c]

By Kartik Agaram at Sun, 2013-03-10 05:56

Precedence error

I think the sentence wasn't "so, this syntax (being a shortcut for that one) is simply not valid."

It was more like "so, 'this syntax being a shortcut for that one' is simply not valid."

The way I understand it, the syntax is still (a b c), but it isn't a shortcut for any other syntax. The resulting collection may or may not be implemented in terms of cons cells; this detail is invisible.

By Ross Angle at Sun, 2013-03-10 09:51

Um, no?

I'm not sure why I should. I'm using parens to denote collections. The only thing I've considered an alternate enclosing form for, so far, is denoting whether the collection is or is not ordered.

Ray

By Ray Dillinger at Sun, 2013-03-10 15:42

You're punning parens

Hmm, if I'm following you right, it looks like you chose door 1 after all. How would you parse this?

f(a (g h))

Is it a call to f with 2 args, or with one arg that's constructed by calling a on the list (g h)? So you're forced to constrain spaces one way or another to avoid ambiguity, just like the readable project above.

By Kartik Agaram at Mon, 2013-03-11 06:45

Another question

When your language sees:

(a b)

Does it implicitly eval all elements of the list? Or does it assume the list is implicitly quoted? Neither option feels salubrious..

By Kartik Agaram at Mon, 2013-03-11 06:48

Correction

Ross Angle pointed out that my equivalence for dotted lists is incorrect:

(a b c . d) => a(b c(d))

That would instead expand to (a b (c d)). So it seems we need dot syntax even with this proposal.

By Kartik Agaram at Sun, 2013-03-10 05:54

implications, a different way.

Suppose there is a fairly unmodified list syntax, and we require top level function applications to be enclosed in parens. So where inside expressions we would say

f(x y z)

if we're evaluating the same thing at top level we have to say

(f(x y z)).

If y turns out to be a function with two arguments a and b, the expression can be written as:

(f (x y (a b) z))

The rule being that a list is always an argument list attaching to the function bound to the preceding element (or the function resulting from the previous expression).

The top-level forms always being lists sort of implies that they are argument lists for an "implicit" function -- which, of course, is eval curried over the persistent environment. But if we're going to do that we have to be consistent. So we need to put parens around *every* top level form, not just function applications. So instead of saying

foo[bar]

at the top level, we'd have to say

(foo[bar])

Which is, well, ugly, but better than a statement terminator because now we have evaluable top level forms always in the form of a list which we can destructure using car/cdr/etc, where a statement terminator by itself does not give the form any knowable structure.

So a list denoting a function application has two elements: a function name (or function expression) and an arglist. According to the logic that car returns the first element and cdr returns a list of the remaining elements, you get:

(car (f (x y z))) => f

(cdr (f (x y z))) => ((x y z))

which is again, consistent, but ugly. The result of the second expression is a list having one element; but that element is itself a list.
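The extra level of list structure is easy to see with Python lists standing in for the proposed collections. This is only an illustration: `car` and `cdr` here are plain slicing helpers, and symbols are represented as strings.

```python
def car(lst):
    return lst[0]


def cdr(lst):
    return lst[1:]


# (f (x y z)) read as data: a two-element list of function name and arglist.
call_form = ["f", ["x", "y", "z"]]

# Traditional (f x y z), for comparison.
traditional = ["f", "x", "y", "z"]

args_new = car(cdr(call_form))  # one extra step to reach the arguments
args_old = cdr(traditional)
```

A macro working on the new form has to unwrap one more level (`car` of `cdr`) to get at the same argument list.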

I can make a consistent homoiconic macrology here with cons, car, cdr - but the price of algol'ish procedure call syntax appears to be an additional level of list structure to deal with in macros and additional parens at top level around every complete expression.

By Ray Dillinger at Sun, 2013-03-10 16:33

Take a look at Honu

Honu: Syntactic Extension for Algebraic Notation through Enforestation, Jon Rafkind and Matthew Flatt.

Honu is a new language that fuses traditional algebraic notation (e.g., infix binary operators) with Scheme-style language extensibility. A key element of Honu's design is an enforestation parsing step, which converts a flat stream of tokens into an S-expression-like tree, in addition to the initial "read" phase of parsing and interleaved with the "macro-expand" phase. We present the design of Honu, explain its parsing and macro-extension algorithm, and show example syntactic extensions.

If I understand it properly, the basic idea is to view "infix languages" as being composed of a pure recursive-descent/LL(1)-parseable part (most of the language), and a top-down operator precedence part (e.g., for infix arithmetic operators). Basically, standard Lisp macro technology can be adapted to work for the LL(1) part, and you can add a second infix operator macro facility for the second class of expressions.
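To give a feel for the top-down operator precedence part only, here is a tiny Python sketch that turns a flat token list into a prefix tree. It is not Honu's enforestation algorithm; the `BINDING_POWER` table and the `parse` function are assumptions made for the example, and operands are restricted to single tokens.

```python
BINDING_POWER = {"+": 10, "-": 10, "*": 20, "/": 20}


def parse(tokens):
    """Turn a flat infix token list into a nested prefix tree."""
    pos = 0

    def parse_expr(min_power):
        nonlocal pos
        left = tokens[pos]              # operands are single tokens here
        pos += 1
        while pos < len(tokens):
            op = tokens[pos]
            power = BINDING_POWER[op]
            if power <= min_power:      # weaker or equal operator: let caller take it
                break
            pos += 1
            right = parse_expr(power)   # bind tighter operators on the right first
            left = [op, left, right]
        return left

    return parse_expr(0)


tree = parse(["a", "+", "b", "*", "c"])
```

The output is an S-expression-like tree, so ordinary list-based macros could in principle operate on it from that point on.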

It's a simple, but elegant and well-executed idea.

By neelk at Sun, 2013-03-10 16:16

Well, okay... I guess I've reached a decision.

After looking at the implications in terms of consistency and source code manipulation, I cannot justify moving to the Algol'ish call notation. The additional elegance you get by escaping nesting in some circumstances is more than destroyed by the kluges and additional nesting you'd have to have in others. So calls are going to remain fully-parenthesized. I'm keeping the [square brackets] and .dot notation as accessors, and adding some other syntax for multivalent variables and constants and for capturing multiple-value returns, but with fully-parenthesized forms as the default, it's going to definitely remain recognizable as some kind of Lisp.

Thanks to everyone who helped kick this topic around; you've teased out issues that I missed when I was considering it by myself.

Special thanx to Kartik Agaram for working out implications a couple different ways, to neelk for the reference to the Honu paper, to Ross Angle for the link to the Arc Wiki, and especially to Jules Jacobs for the (valuable!) Fortifying-Macros paper.

By Ray Dillinger at Tue, 2013-03-12 08:51

future updates welcome

You write really well; I hope further updates are coming as you think more, because it's a pleasure to read your reasoning. The only person whose posts I admire more is Jules Jacobs, who has a scary kind of clarity in expression. (Apologies to anyone else who feels slighted. I'm resisting a mini dialog where Crash claims Ross Angle is a member of the Geometry Clan, and a generally trig fellow.)

By Rys McCusker at Wed, 2013-03-13 02:13
