Stroustrup's Rule and Layering Over Time

Dave Herman is the voice of the oppressed: syntax is important, contrary to what you have been told!

To illustrate he discusses what he calls Stroustrup's Rule:

  • For new features, people insist on LOUD explicit syntax.
  • For established features, people want terse notation.

whaddawewant? syntax! whendowewantit? now!

Ehud, you troll, you. :P

Seriously, I didn't really mean to be outrageous, but as PL wonks I do think it's worth asking ourselves why "notation is a tool of thought" and "syntax doesn't matter" are both standard beliefs in our community.

I guess I always thought

I guess I always thought that these are (mostly) non-overlapping sub-communities...

It's a shame

By now you'd think that some subset of each sub-community would have figured out that everything has a dual nature. Syntax is semantics, semantics are syntax. Let's all work together and share ideas :)

Interesting

It is nice to explicitly discuss the evolution of design decisions over time, and the influence, even *within* a single programming community, of (also time-evolving) familiarity with the manipulated concepts.

An amusing aspect of this blog post is that it can be turned on its head somewhat and read back as a praise of semantics: one reason why a construction such as (foo?) is appreciated today is that it has a very clear semantics, given by its elaboration into (match foo with Ok x -> x | Error err -> return err), that is now internalized by users. If foo? had been proposed from the start, the design space of its intended semantics would be more open/uncertain, and there would possibly not have been a strong consensus on it.
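
To make the elaboration concrete, here is a minimal OCaml sketch (my own illustration, not taken from the post being discussed): the short form stands for a match on a result value, and a binding operator captures exactly that expansion.

  (* A hedged sketch of the elaboration quoted above: the short form stands
     for matching a result, continuing on Ok and returning early on Error.
     The operator name (let*) and the example function are my own. *)
  let ( let* ) r k =
    match r with
    | Ok x -> k x              (* "Ok x -> x": keep going with x *)
    | Error err -> Error err   (* "Error err -> return err": early return *)

  (* Roughly what a use of foo? inside a result-returning function elaborates to: *)
  let use_foo foo =
    let* x = foo in
    Ok (x + 1)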

semantics is a way of thinking about syntax

I trace the syntax-denigrating attitude at least back to Strachey 1967; "Basic irrelevance of syntax and primacy of semantics." I'm tempted to express my premise for abstraction theory as "basic irrelevance of semantics and primacy of syntax"; semantics either reduces to syntax or (to put the same thing differently) doesn't exist.

[...] abstraction defies the separation between syntax and semantics. Sometime during the 1988/89 academic year, I opined to a professor that the artificial distinction between syntax and semantics had been retarding the development of abstraction technology for over twenty years — and the professor I said it to laughed, and after a moment distractedly remarked "that's funny" as they headed off to their next appointment. Nonplussed [...], to myself alone I thought, but I wasn't joking.

Semantics as an equivalence of different syntax forms

We can think of syntax as a way to parse an input. Sure, we can cover all possible inputs, but for understanding that input, syntax is not all we need. We have to interpret that syntax by using some semantic rules. Although my understanding of semantics is somewhat simpler than the usual generalized definition, semantics plays an important role in the whole process of understanding different expressions.

Since I've been investigating the generalized understanding process for a long time, I've come up with an interesting result: knowing a semantics is merely knowing how to translate an expression of some syntax into an expression of some other syntax. If that other syntax is something we already understand (we know its semantics), we can say that we know the meaning of the starting expression.

To put it simply: by the standard generalized definition, the semantics of an expression is the meaning of that expression. So we say X means Y. Hence X translates to Y. And if we understand Y, then we indirectly understand X.
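
As a toy illustration of "X means Y, hence X translates to Y", here is a small OCaml sketch (both syntaxes are invented for the example): understanding the source expressions amounts to knowing how to translate them into a target syntax we already understand.

  (* Source syntax X and target syntax Y; knowing the semantics of X is
     knowing this translation into Y. *)
  type source = Lit of int | Plus of source * source
  type target = Push of int | Add | Seq of target * target

  let rec translate (x : source) : target =
    match x with
    | Lit n -> Push n
    | Plus (a, b) -> Seq (translate a, Seq (translate b, Add))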

I'm still having trouble with the semantics of the bottom-most expression form in the chain, and I think it might not be that important. This bottom-most expression form could be the Universe itself, meaning: if we can reproduce X in the Universe, then we have an understanding of X. We don't have to understand the whole Universe to deal with X.

By this theory, the semantics of X could be a relative notion. In the example X -> Y (X translates to Y), we could say that we understand X relative to Y, or that we understand X in terms of Y. Note that there could also be a parallel semantics X -> Z, or X -> W, or whatever the target expression is, but the target expression has to be descriptive enough if we want a proper semantics of the source expression.

I'm using this semantics theory in a programming language I'm developing.

syntactic semantics

My approach is to say the semantics of a sentence is a language. Thus, a language maps sentences to languages. And since this means a language maps each sentence to something that maps sentences to something that maps sentences to, etc., I simply say that a language is a set of sequences of sentences, that is closed under prefixing; i.e., if L is a language and X and Y are sequences of sentences and XY is in L, then X is in L.
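
For concreteness, here is a small OCaml sketch of the closure property (the finite representation is mine, purely for illustration): a language is a set of sequences of terms, and every prefix of a member sequence, including the empty one, is also a member.

  (* Check prefix closure on a finite "language" given as a list of
     sequences (each sequence is a list of terms). *)
  let rec prefixes = function
    | [] -> [ [] ]
    | x :: rest -> [] :: List.map (fun p -> x :: p) (prefixes rest)

  let prefix_closed (lang : 'a list list) =
    List.for_all
      (fun seq -> List.for_all (fun p -> List.mem p lang) (prefixes seq))
      lang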

Curious

I'm thinking about your approach, and I'd like it to be possible. Thus, a language transforms a stream of characters into an AST (it acts like a specialized function, determined by the grammar). But to have something we could really use, in my opinion, we need a notion of language that can transform an AST into another arbitrary AST through the same interface it uses to operate on streams. The language from the beginning of this post deals with a CFG, but is a CFG enough to operate on ASTs on both sides?

Word choice

Actually, when I say "sentence" I suppose that's not the best word choice, as in my formal treatment (described yonder) the elements of a sequence are terms — essentially, ASTs.

I guess...

... we could also call the notion that connects elements in a sequence a part of syntax.

But whichever way we turn it around, a CFG, used for describing elements, doesn't seem expressive enough to hold the very process of transformation between elements. However, this transformation could be expressed by functions, and that means extending a CFG-based language with the notion of functions, despite all my attempts to keep the CFG language free from outside extensions.

Syntax is Algebra

I believe syntax matters, but I think rules like the above are so absurd that they explain why theoreticians ignore it. As a good example from C++ of why the rule is clearly wrong, just try writing a template. You have to put so many long-winded keywords in there, like template and typename, that it's impossible to read.

An important property of syntax is that it should relate directly to an underlying algebra, so that the visual pattern matching programmers do when looking at screens of code translates internally into algebraic expressions. A good example where this is recognised is Haskell type classes, where a standard set of names is used with the intent of expressing an algebra, made concrete in the instances. The programmer reads fold and filter and map and comprehends the operation without knowing the underlying datatype.

In a much more extreme form, my Felix language allows the user to define syntax, which is much more powerful than mere operator overloading. The parser handles a lot of boilerplate automatically, which provides stronger correctness assurances and makes the code easier to read, which in turn gives a better opportunity to reason about correctness.

In fact many theoreticians seem to miss the Grammar/Type isomorphism. This is the isomorphism that says all grammars are types, and vice versa, and is an analogue of the Curry/Howard isomorphism. The construction is trivial: just take a grammar and name each production with a type constructor that accepts a value for each symbol in the production; each nonterminal is then a type which is the sum of its productions' type constructors.
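
A minimal OCaml sketch of that construction, using the toy expression grammar that appears later in this thread (the transcription is mine, not Felix code):

  (* Grammar:                                 Types:
       expr   = term "+" expr   | term          each production becomes a
       term   = factor "*" term | factor        constructor; each nonterminal
       factor = atom "^" factor | atom          becomes the sum of its
       atom   = "(" expr ")"    | "9"           productions' constructors   *)
  type expr   = Plus  of term * expr   | ETerm   of term
  and  term   = Mul   of factor * term | TFactor of factor
  and  factor = Exp   of atom * factor | FAtom   of atom
  and  atom   = Group of expr          | Digit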

This also means a sequence of events in time can be parsed by some parser to produce a data structure in space, which defines a type for classes of event sequences. It therefore specifies how to at least dynamically check that event sequences conform to a protocol specified by a type, what I call a control type, using existing spatial type theory.

Types form algebras; by the isomorphism, so do grammars; hence syntax is a way of modelling an algebra (sequentially).

Indeed, many ideas from type theory and functional programming can be applied to grammars, yet I have not seen much work on this. For example, a parametrised nonterminal... hey, that's just lambda abstraction. And with that comes a secure way to extend grammars that obeys the fundamental Open/Closed principle, by using open recursion and fixpoint operators.

So I would argue, a grammar must reflect and generate the underlying semantic and type calculus so it has combinatorial properties which define what an algebra is.

A really good example (or two) of BAD syntax: C type declarations. Not combinatorial syntactically. Lots of constructions in Ocaml fail to support arbitrary nesting (such as recursion between types and classes). The horrible distinction between statements and expressions in a lot of languages (including Felix!). One must never forget: referential transparency is a syntactic property, vital to composition.

Editable views

I've been experimenting with projectional editing, even using plain text. I use a simple Forth-like syntax under the hood. But I can have variables or lexical closures or infix notations if I want them, using simple bidirectional transforms. View->Code, Code->View.

A convenient property of modeling syntax this way is that it becomes easy to adapt syntax to a problem or to a user. Syntax can be extended or deprecated quite freely, without concern for forward or backwards compatibility. A simple test that a round-trip translation from code is the identity can guard against ambiguity. And the round trip from view can show how code is interpreted, removing extraneous formatting for example.
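
A minimal sketch of that round-trip check in OCaml, assuming nothing about the actual implementation: a view is a pair of translations, and translating code out to the view and back should be the identity.

  (* View->Code and Code->View as a pair of functions; the ambiguity guard
     is just "of_view (to_view code) = code". *)
  type ('code, 'view) view = {
    to_view : 'code -> 'view;
    of_view : 'view -> 'code;
  }

  let round_trips (v : ('code, 'view) view) (code : 'code) =
    v.of_view (v.to_view code) = code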

An additional benefit is that my evaluation model is based on rewriting and lazy linking. So the evaluation function has type Program->Program. The use of editable views as the basis for syntax then allows me to view the evaluated program using the same syntax as I use for input.

And all of these features should theoretically extend to more structured or graphical view models where we might view SVG canvases and radio buttons and tables for input or output. I imagine the ability to evaluate hypermedia objects that are projections of programs. But I haven't gotten quite that far in my developments yet.

Does syntax matter? The simplicity of the Forth-like syntax I use under the hood is valuable to make views and rewriting feasible. OTOH, I don't need to commit to any particular syntax for the view presented to humans as part of my language design. Instead, it becomes part of the view model design. Which is still important. I do believe HCI matters.

Editable Views

This sounds very interesting, care to elaborate, perhaps show some examples?

I have used systems where 2D graphical elements represent code but I have no faith in that at all: my experience is that it's clumsy and hard to use, and fails whenever one tries something difficult. Two cases: one, a database access system, worked just fine for basic stuff but couldn't handle the complexity of invoicing. Another, a system based on black boxes designed to model telephony business logic, where some people had a graphical design tool. Again, fine for trivial operations, useless for complex routing logic.

I've heard other stories of how bad OO editors can be, ones that make you define your methods in little boxes.

I suspect two factors here: first, sequential text editors have had a long time to develop, and have powerful tools for formatting, colouring, search and replace, etc. Maybe this is the reciprocating engine beating the superior rotary design, simply due to a long history of engineering. The second factor is simply that linear things are much easier to work with as a base, because what is represented isn't merely 2D but often has a much higher, fractal, dimension. So going to 2D really gets in the way rather than helping.

So I suspect "projectional editing" of text, where the code and views are formatted and colourised in a conventional kind of way, is probably optimal. Something like "folding", which many editors can do, even based on syntax, is a fairly simple example of this which can already be done (though not very well).

I'm really curious about the kinds of "bidirectional" operations you can do. Everyone has used one-direction operations, e.g. LaTeX code to PDF, and some nasty bidirectional editors like MS Word RTF editors.

The advantages of text-based

The advantages of text-based representation have been a theme for me; looking at the broad sweep of computing history, it seems to me text-based representation is common to everything that lasts. TeX/LaTeX. UNIX. Programming languages. Html. Wikis.

I've lately been developing a specialized Lisp, for writing bits of glue code connecting certain kinds of other stuff together; leaving out lots of usual Lisp features because they're just undesirable complications, e.g. no improper lists and no unbounded recursion (which for a Lisp hacker was both weird and fascinating). I speculated early on that anything long enough to need comments was probably longer than it ought to be, so I left out comments — and finally admitted defeat and modified the tokenizer to ignore everything from a semicolon outside any string literal to the end of the line. Lisp itself might be thought of as introducing a second layer of representational simplicity: first, the base representation for everything is text; second, the text represents an elegantly simple and wildly versatile data structure (in a word, S-expressions) — but the text representation of these structures has subtle extra information encoded in it, through symbolic name choice, whitespace and indentation, and comments.
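
As an aside, the comment rule described above is simple enough to sketch in a few lines of OCaml (this is my approximation, not the actual tokenizer, and it ignores escape sequences inside string literals):

  (* Drop everything from a semicolon to the end of the line, unless the
     semicolon appears inside a string literal. *)
  let strip_comment line =
    let buf = Buffer.create 64 in
    let in_string = ref false in
    (try
       String.iter
         (fun c ->
           if c = '"' then in_string := not !in_string;
           if c = ';' && not !in_string then raise Exit;
           Buffer.add_char buf c)
         line
     with Exit -> ());
    Buffer.contents buf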

Lisp and S-expressions

That's very interesting. Roughly you're saying: look, text is great, but S-expressions have a bit more structure; they're not linear, but they're about as close as you can get to the next most complicated thing after a simple sequence. In particular, Scheme's linear syntax is very close to actual S-expressions (just replace matching parens with a tree branch).

Here's a question: if the fractal dimension of text is 1.01 (the extra bit is because we have indentation and ends of lines, so it's just a tad more than linear), what is the fractal dimension of an S-expression?

It's interesting that your translation in some way adds structure (tree like) but also throws out the "subtle" things like variable names, comments, indentation, etc.

My guess is that the fractal dimension of Haskell is about 2.718 .. :-)

Dimensionality

I think you've pegged the dimensionality of Haskell. I suspect the dimensionality of S-expressions may be about 4.669 (supporting evidence here).

form

In all seriousness (and using the term "dimensionality" somewhat poetically), the question here seems to be to what extent the form of the representation is used to reflect the form of the thing described. The thing described has, as you say, a very high, fractal dimensionality. Text has a one-dimensional form, but that just means we get the description from the logical content of the text rather than from its physical form. S-expressions are simple enough that, with some practice, from the text one boosts fairly directly up to the logical form of the S-expressions, which can be visualized fairly well although one isn't being shown it directly, and is flexible enough to describe, indeed, high-dimensional structures of great versatility. The question is then whether they're flexible enough that the mind's-eye-visualized representation doesn't fall short of the dimensionality of the thing described (the program and its data). Experience suggests S-expressions are quite versatile, yes. There's a temptation to add on more data structures, but, in my experience, this reduces flexibility rather than increasing it. This seems to be the Achilles heel of so-called "user-defined data types": they limit how the data can be used. Lisp, by insisting on a ball of mud, never has to worry about fitting a square peg into a round hole; it's all mud.

All mud

Church and Scott encodings, using functions as the only type, would move even further in that direction.
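
For readers unfamiliar with the idea, here is a tiny OCaml sketch of what "functions as the only type" looks like for booleans and pairs (the Church and Scott encodings coincide for these; the names are mine):

  (* Booleans and pairs represented purely as functions. *)
  let tru t _f = t
  let fls _t f = f
  let if_ b th el = b th el

  let pair a b = fun k -> k a b
  let first p = p (fun a _b -> a)
  let second p = p (fun _a b -> b)

  (* if_ tru "mud" "tidy" evaluates to "mud"; first (pair 1 2) to 1. *)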

I've taken that approach with my language. I've been satisfied with it. Though I did need to find a way to encode labels and extensible labeled data (row-typed records, polymorphic variants) before I was fully accepting.

Neat vs muddy

Hm. If it's working for you, great. My own experience does not beckon me in the direction of Church/Scott encodings, nor of records (which you talk about at the linked remark). A point of divergence in our perceptions may be hidden in your use of the phrase "that direction"; the question is which direction one thinks we're talking about. Church-Scott encodings are, I observe, just that: encodings. Most data gets insanely obfuscated when represented using an encoding that builds everything out of functions (the primitives aren't at the same level of abstraction as the data; good mathematical exercise, not so good for trivially easy visualization). Records are a prime case I had in mind when I wrote of reducing flexibility; they're too complicated. Handling records involves packaging things up into them and then taking them apart later, which, though less extreme than Church/Scott, still feels like encoding/decoding. With S-expressions it's much more as if the stuff was never encoded in the first place. As a rule of thumb, I suggest, anything you need a sophisticated type theory to explain is too structured. Sophisticated type theories are neat; mud is not so tidy.

Still mud

Using S expressions for everything or using lambdas (or combinators, in my case) for everything - both have the similar property of being uniform, simple, easy to visualize, "flexible enough to describe, indeed, high-dimensional structures of great versatility."

You say here that "Church-Scott encodings are, I observe, just that: encodings." The same can be said of encoding data or behavior in S expressions, texts, or bit strings - ALL of these things are "just that: encodings". But bit strings and plain text lack the hierarchical structure of S expressions or lambdas. Not all encodings are equivalent.

You claim "data gets insanely obfuscated when represented using an encoding that builds everything out of functions". I guess this would be the case if your PL isn't designed for it. But my work on editable views discussed earlier is for my same language where all data is Church or Scott encoded. I can nonetheless support texts, lists, numbers, records, etc. and render them appropriately. I use lazy linking so booleans, for example, simply become function names like true, false.

You say "Handling records involves packaging things up into them and then taking them apart later". You could say the same of cons cells, lists, S expressions. People also encode records in S expressions on a regular basis (E.g. via association lists). I think the issue with conventional records isn't that they package data, but rather that they are second class and rigid in most languages. You can't just incrementally add and remove and rename fields or compose and decompose records until you have the types you need, in most languages. There is no fluidity, no continuity. The encoding I linked above avoids this issue.

Lambdas don't need a type theory to explain. Don't conflate "I can explain this using a type theory" (which is good) with "I need a type theory to explain this" (which is bad). For sophistication, of course, simpler is better. So something like my records which can be explained by simple algebraic types are nicer than heterogeneous association lists which, if you attempt to explain them using types, would require sophisticated dependent types. (Not that you need types to explain association lists. But if you wanted to, you don't have a simple option.)

More on Mud

I suspect it depends partly on experience. I use Scheme in my system for the action codes in the parser and I find it completely unreadable. I can't reason about it very well and it's really confusing when I get errors because my Ocaml embedded Scheme processor (OCS Scheme) is not very good with error messages.

To me, visually, Scheme just doesn't have enough "punctuation" to be readable. I'm just not used to it. On the other hand my actual language Felix has a very rich set of syntactic forms, in fact so diverse I often have to look in the library or source code to remember the right syntax for something I haven't used for a while. That system has an extensible syntax and having used that to advantage I basically laugh at the idea that any modern language can possibly be designed that does not have this facility. You have to understand the *whole* language syntax is defined in the library (apart from a hard coded bootstrap which can only parse grammar extensions).

This is not because I have weak or non-orthogonal primitives which are not properly reflected in the syntax, although that is always an issue and always something to examine to seek improvement by simplicity. It's simply because particular classes of problems do not automate well with any core model you care to name. Functional programming can do parsing, but specifying a grammar without a domain specific sublanguage exposes a whole heap of boilerplate.

Here's an example: first, grammar fragment written out long hand:

  var xprod = Alt([
    Seq ([Nonterminal[prod_t] "term", Strng[prod_t] "+", Nonterminal[prod_t] "expr"]),
    Nonterminal[prod_t] "term"]);
     
  var tprod = Alt ([
    Seq ([Nonterminal[prod_t] "factor", Strng[prod_t] "*", Nonterminal[prod_t] "term"]),
    Nonterminal[prod_t] "factor"]);

  var fprod = Alt ([Seq ([Nonterminal[prod_t] "atom", Strng[prod_t] "^", Nonterminal[prod_t] "factor"]),
    Nonterminal[prod_t] "atom"]);

  var atom = Alt ([
    Seq ([Strng[prod_t] "(", Nonterminal[prod_t] "expr", Strng[prod_t] ")"]),
    Strng[prod_t] "9"]);


  // library
  var xlib = ([
    ("expr",xprod),
    ("term",tprod),
    ("factor",fprod),
    ("atom",atom)
  ]);

The reason I need the type parameter after Nonterminal is that I used an open, extensible, parametric base type, extended it, then closed it by fixation, and the argument tells which fixation I need (and Felix doesn't do type inference).

And now with some functional wrappers but extended to an action grammar:

  var xprod = ALT([
    SEQ ([NT "term", STR "+", NT "expr", REDUCE ("PLUS",3) ]),
    NT "term"]);
     
  var tprod = ALT ([
    SEQ ([NT "factor", STR "*", NT "term", REDUCE ("MUL",3) ]),
    NT "factor"]);

  var fprod = ALT ([SEQ ([NT "atom", STR "^", NT "factor", REDUCE ("EXP", 3) ]),
    NT "atom"]);

  var atom = ALT ([
    SEQ ([STR "(", NT "expr", STR ")", REDUCE ("GROUP",3)]),
    SEQ ([STR "9", REDUCE ("DIGIT", 1) ])]);

  var stmt = SEQ ([ NT "expr", STR ";", REDUCE ("STMT",2)]);

and finally using a DSSL (domain specific sub language):

  gramlib xlib {
    expr = expr "+" term {REDUCE ("PLUS",3)} | term;
    term = term "*" factor {REDUCE ("MUL",3)} | factor;
    factor = factor "^" atom {REDUCE ("EXP", 3)} | atom;
    atom = 
       | "(" expr ")" { REDUCE ("GROUP",3) } 
       | "9" { REDUCE ("DIGIT", 1) }
    ;
    stmt = expr ";" { REDUCE ("STMT",2) };
  }

Just for fun here is the specification for the DSSL:

  priority 
    palt_pri # "_1";

  plibrary := "gramlib" sname "{" plibentry* "}" =>#
    """
    (let*
      (
        (tup `(ast_tuple ,_sr ,_4))
        (v `(ast_apply ,_sr (,(nos "list") ,tup)))
      )
      `(ast_var_decl ,_sr ,_2 ,dfltvs none (some ,v))
    )
    """
  ; 

  plibentry := sname "=" pexpr[palt_pri] ";" =>#
  """`(ast_tuple ,_sr (,(strlit _1) ,_3))""";

  sexpr := "parser" "(" pexpr[palt_pri] ")" =># "_3";

  private pexpr[palt_pri] := "|"? pexpr[>palt_pri] ("|" pexpr[>palt_pri])+ =># 
    """`(ast_apply ,_sr (  
      ,(qnoi 'Parser_synlib 'ALT)
      (ast_apply ,_sr (,(noi 'list) ,(cons _2 (map second _3))))))"""
  ;

  private pexpr[pseq_pri] := pexpr[>pseq_pri] (pexpr[>pseq_pri])+ =># 
    """`(ast_apply ,_sr ( 
      ,(qnoi 'Parser_synlib 'SEQ)
      (ast_apply ,_sr (,(noi 'list) ,(cons _1 _2)))))"""
  ;

  private pexpr[patom_pri] := "(" pexpr[palt_pri] ")" =># "_2";

  private pexpr[patom_pri] := String =># 
    """`(ast_apply ,_sr ( ,(qnoi 'Parser_synlib 'STR) ,_1)) """
  ;

  private pexpr[patom_pri] := "#EPS" =>#
    """`(ast_apply ,_sr ( ,(qnoi 'Parser_synlib 'EPS) ())) """
  ;

  private pexpr[patom_pri] := sname=>#
    """`(ast_apply ,_sr ( ,(qnoi 'Parser_synlib 'NT) ,(strlit _1))) """
  ;

  private pexpr[patom_pri] := "{" sexpr "}" =># "_2";


}

which uses the following library support:


class Parser_synlib
{
  open Parsers;
  open Grammars;
  fun NT (s:string) => Recog (Nonterminal [pgram_t] s);
  fun STR (s:string) => Recog (Strng [pgram_t] s);
  fun REDUCE (s:string, n:int) => Action[pgram_t] (Reduce (s,n));
  fun ALT (ls: list[pgram_t]) => Recog (Alt[pgram_t] ls);
  fun SEQ (ls: list[pgram_t]) => Recog (Seq[pgram_t] ls);
  fun EPS () => Recog (Epsilon[pgram_t]);
}

Using this machinery has greatly improved my ability to construct syntax error free grammars for testing my parser, and allowed extensions to be incorporated easily.

The point is that the (Felix) grammar extensions automate boilerplate constructions systematically. They take the particular domain's algebra, represented in a functional algebra that is way too powerful for the job and hence requires lots of boilerplate, and turn it into a much simpler algebra by a syntactic mapping, which is itself driven by a syntactic algebra (namely the grammar extensions).

Error messages on mud

it's really confusing when I get errors because my Ocaml embedded Scheme processor (OCS Scheme) is not very good with error messages.

Error messages keep turning up as an issue; perhaps they're a clue to something. Key to a really good error message, methinks, is that it has a human element shining through it that simply cannot be generated formally; that element has to come from the human programmer who wrote the code in the first place. This means the language processor has to preserve the human element through everything that happens to the code up through the moment when the error message gets generated; but while this takes a well-designed language processor, I'm not so sure all languages can support such a processor. Maybe we should be asking what properties of a language allow this sort of humanity-preserving error messaging.

Errors

I agree. And some algorithms seem better suited to good error reporting than others.

I'm building a parser at the moment that uses coroutines and explores all paths. In principle, the parse fails if no path makes it to the end of the string. I have no idea how to explain an error other than "Syntax Error" without even being able to give a location.

Compare this with a GLR algorithm, where when the parse fails, it does so at a particular location. You can still only say "Syntax error at line 20 column 3" but that's better than "We can't find a parse".

This is also why I dislike HM type inference.

re: Errors

PEG.js has an interesting solution to reporting errors. It reports a syntax error, but it also outputs a list of terminals which were expected to be parsed at the maximum location it reached while parsing.

Menhir's error support

Thanks to the work of Frédéric Bour and François Pottier, the LALR parser generator Menhir now supports the approach of manually writing a custom error message for each possible error state of the automaton (attributed to Clinton Jeffery), and the results are rather good. See:

I have used it on a small grammar for an idealized textual IR, and I had to write 32 error messages. The result is very clear, and although there is some regularity in the error message structures there is no clear way to entirely automate the process.

For example, here is one of the error states described by Menhir on my grammar:

  program: BRANCH NIL TRIPLE_DOT
  ##
  ## Ends in an error in state: 53.
  ##
  ## instruction -> BRANCH expression . label label [ NEWLINE ]
  ##
  ## The known suffix of the stack is as follows:
  ## BRANCH expression
  ##

and here is my human-written error message:

  Parsing an instruction, we parsed "branch <expr>" so far; a label, for
  example "foo", is now expected to construct a branch instruction
  "branch <expr> <label> <label>".

Deep understanding

It sounds as if one would have to have a fairly deep understanding of the parsing algorithm and the grammar to design the error handling. Supporting fluent handling of that sort is presumably well worthwhile for the results it can produce in situations where that sort of deep understanding is feasible. I'd really like to be able to give a minimal, intuitive description of a grammar and have the parser produce lovely error messages by deriving them from what I provided.

I'm peripherally reminded of a brief anecdote that usually came up during the coverage of LR parsing in compilers classes at my alma mater, concerning the choice of an error message to put in an LR parser table when something really ought to be allowed but isn't; the proposed message was "Error in language design".

Stability

How stable are these error messages in the face of grammar modifications?

Early to tell

I have made a couple of changes to the grammar so far, but nothing very invasive.

The state numbers will shift a lot, of course, but menhir can keep track of that reliably. The menhir-generated description starts with the terminal stream that gets to the error state, so when you update the error file with the new grammar (menhir has a command to do that) it will robustly find the new error states for each previous error message thanks to this terminal stream (by just running the automaton on it). If some previous errors are not errors anymore, you get told, and if new error states appear they get added to the file.

Another thing that changes is the description of the error context. I changed the grammar to allow explicit newlines in the middle of instructions (previously NEWLINE was a terminal that would terminate an instruction), and I got list(NEWLINE) appearing in the middle of the context -- in the places where I inserted them in the grammar.

I think that one thing to watch for would be error states getting merged or split: going from several different possible contexts to one, or conversely, will require you to rewrite your error message. This has not happened to me yet.

(Then menhir provides some machinery to, for example, mark some symbols to provoke extra reductions when they are at the top of the stack in an error state, which has the effect of "unrolling the context further" and thus gives a coarser-granularity error instead of a finer-granularity one, which is sometimes what you want. These advanced aspects are explained in the article on the C error messages that I linked above.)

Ideally,

one would like to have error messages generated automatically from the grammar in some way that inherently creates lucid diagnostic information by preserving human perspective. The two questions are then what human-perspective information to provide to start with, and how to hang on to it and present it when the error occurs. In those terms, locating error-handling decisions at the parse states is a somewhat downstream option along a spectrum; one might then ask, how might one shift such information upstream to the grammar description, so that when modifying the grammar one would not then have any multiplied considerations of separately modifying the error-handling.

Indeed, but

Frédéric Bour, an author of the Merlin editor service for OCaml, has implemented an approach going in the direction of what you describe. The grammar is annotated with extra information that helps in reacting to errors. (In the Merlin case, the annotations are not so much to decide an error message as to direct error recovery, that is, to produce an artificial fix of the parsing state so as to continue reading the program buffer and usefully report other errors down the line.)

That said, there is a fundamental tension here: one of the things that the parser generator does is to run a grammar analysis that extracts information that is not, by design, present in the original grammar. This means that you may need to annotate "intent" on things that don't show up, declaratively, in the grammar description.

For example, the ability to write two separate productions that share a common prefix (instead of having to factorize them manually) is part of the core value of LR parser generators, as it lets you write more readable rules. The existence of this sharing will have an error-message impact, as errors in the shared part will have ambiguous continuations -- and this is information you need to exploit to write a good error message, and generally want to know as a grammar designer. But I don't know where I would attach information for this case in the source grammar.

So I think it is useful to think of separating the error-reporting logic from the nitty-gritty details of the parsing automaton as much as possible, but we should also recognize that said details also play the role of a powerful grammar analysis tool whose information we want to exploit. How to expose this in a better way, I don't know.

If we were thinking of this question as just another case of compiler error reporting (type error, syntax error), one thing we would consider is to ask the grammar author to write a "shared prefix handling clause" somewhere in the grammar (maybe in a part dedicated to grammar errors), and then having a warning or error at compilation time if a mixed-continuation error exists without the corresponding sharing-aware clause, or conversely if a previous sharing clause has been rendered inadequate by a change in the grammar.

explaining stuff

The approach in my long-ago toy compiler was to produce a diagnostic message saying "here's the part of the source code where the problem happened, this part of it is the partial expression that was being put together, this token here is the one that didn't fit, and these are the sorts of things that would have made sense". Expressed far more compactly and effectively than that; but that's the information it conveyed (as best I recall; alas I don't have a platform set up where I can run it). The technique was probably helped by the fact that the grammar was set up so there was always just one unambiguous choice of what partial expression was being put together at a given point; if there's more than one choice of how many tokens before the current point might be included in the current expression, it would be more difficult to succinctly illustrate what the parser was in the process of doing when the problem was encountered.

The existence of this

The existence of this sharing will have an error-message impact, as errors in the shared part will have ambiguous continuations -- and this is information you need to exploit to write a good error message, and generally want to know as a grammar designer. But I don't know where I would attach information for this case in the source grammar.

You could attach it to the enclosing alternative. Let's say you have a piece of grammar A B | A C, but the input is A D. You could attach error messages to the B and C nodes in the grammar that give an explanation of why the parse failed if the last hope of parsing the string died there. Such nodes would have type Parser<T, E>, where T is the value of the node if successfully parsed and E is the type of error. By default the error type could be a pair (n:Int, errormsg:Str) where n indicates how far into the input string we got. The default error combiner for | could take the errors from both branches and choose the one with the largest n. If both branches have the same n, display both (as in "We parsed up to {n}, and were expecting B or C, but got D"). If you want a custom error message for the shared prefix, you could write a custom error combiner on the | node (and change the error type for the B and C nodes from strings to something you can analyse programmatically in the error combiner).
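
A hedged OCaml sketch of that default rule (the representation is mine, not from any particular parser library): each branch reports how far it got, and the combiner for | keeps the error that reached furthest, merging the expectations on a tie.

  type error = { pos : int; expected : string }

  (* A parser takes a start position and the input, and either succeeds
     with a value and the next position, or fails with an error. *)
  type 'a parser = int -> string -> ('a * int, error) result

  let default_combine e1 e2 =
    if e1.pos > e2.pos then e1
    else if e2.pos > e1.pos then e2
    else { pos = e1.pos; expected = e1.expected ^ " or " ^ e2.expected }

  (* The "|" combinator, with an overridable error combiner for cases like
     the shared-prefix message discussed above. *)
  let alt ?(combine = default_combine) (p : 'a parser) (q : 'a parser) : 'a parser =
   fun pos input ->
    match p pos input with
    | Ok _ as ok -> ok
    | Error e1 -> (
        match q pos input with
        | Ok _ as ok -> ok
        | Error e2 -> Error (combine e1 e2))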

A custom error combiner on

A custom error combiner on the | node would seem to expect an explicit branch. Grammars tend to introduce alternatives by simply having multiple grammar rules existing in parallel, in which case the choice is implicit. One might of course provide a custom error combiner for implicit choice; that seems rather... broad in scope, for customization.

Those implicit choices are

Those implicit choices are always associated with a named grammar rule, so you could attach the error combiner to the rule.

Let's say

Let's say we've got

A -> B !
A -> B , A
B -> C ( D )
B -> C [ D ]

We parse a C, and then instead of a "(" or a "[" we find a semicolon. Which rule would the error combiner be attached to?

B

B

Oh.

Thanks. That clarifies things; I would have said that's a nonterminal, rather than a rule, hence my momentary confusion. If you're willing to attach the error handling to (what I would call) a nonterminal rather than to an operator, that does limit its scope — though one might wonder if the scope might then be too narrow.

Functions as records

I have found it reasonable to treat records (and arrays, and lists, etc.) as particular kinds of function closure. You call them with arguments (keys to look up) and they return results (data stored at that key). And there's a binding procedure that you call with a record, a key, and a value as arguments, which stores the value into the closure at that key.
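
A small OCaml sketch of that idea, in a pure style rather than the in-place update described above (the representation is mine): a record is a function from keys to values, and "binding" builds a new closure that shadows one key.

  type value = Int of int | Str of string
  type record = string -> value option

  let empty : record = fun _ -> None

  (* "Store" a value at a key by wrapping the old closure. *)
  let bind (r : record) (key : string) (v : value) : record =
   fun k -> if k = key then Some v else r k

  (* let r = bind (bind empty "x" (Int 1)) "name" (Str "mud") in r "x" *)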

Closures, however, are not my favorite kind of function. Their semantics rapidly get ugly.

FWIW, I share the opinion that improper lists are not needed or wanted as program representation in Lisps.

Bidirectional operations

I started with numbers. My base language has support for natural numbers and texts. But in a view I can translate between expressions like "#0 #3 integer" and "-3", between "3 #5 ratio" and "3/5", and between "3141 -3 decimal" and "3.141". (Here I use #3 in the view for natural numbers, but that translates to plain 3 in the source.)

My underlying language doesn't support comments. But I can model comments in views in terms of simply dropping a text. Comment-like structures can also support qualified namespaces and other view hints.

The translation for local variables and lexical closures is a bit more involved, derived from the proof of equivalence between SKI and lambda calculus. I blogged about it recently. It was finding this translation, and how simple it was, that finally convinced me this approach was really doable.

Once we have variables, infix notations aren't difficult. We can simply translate between "A B op" and "(A op B)" for a known set of operator-like words whenever A and B are variables, values, or themselves infix expressions. (We could further leverage operator precedence to eliminate unnecessary parentheses... but I personally never use operator precedence in my code.)
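
A toy OCaml sketch of that rewrite (the representation is mine, not the author's code): postfix "A B op" in the code becomes an infix node for the view, and flattening the node restores the original order.

  type term =
    | Word of string                  (* a plain word in the code      *)
    | Infix of term * string * term   (* view-only infix node (A op B) *)

  let ops = [ "+"; "-"; "*"; "/" ]

  (* Code -> View: collapse "A B op" into infix nodes, left to right. *)
  let to_view code =
    let rec go stack = function
      | [] -> List.rev stack
      | Word op :: rest when List.mem op ops -> (
          match stack with
          | b :: a :: stack' -> go (Infix (a, op, b) :: stack') rest
          | _ -> go (Word op :: stack) rest)
      | t :: rest -> go (t :: stack) rest
    in
    go [] code

  (* View -> Code: flatten infix nodes back into postfix order. *)
  let rec of_view = function
    | [] -> []
    | Infix (a, op, b) :: rest ->
        of_view [ a ] @ of_view [ b ] @ (Word op :: of_view rest)
    | t :: rest -> t :: of_view rest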

I have also experimented with concise expression of sequences, useful for lists, data, and continuation-passing code. A sequence translates between "[[a] {b,c} cons]" and "{a, b, c}". And "[[c] nil cons]" becomes "{c}".

Note how, for both sequence and numbers, I derive views from other views, incrementally building towards higher level expressions. I've found this a very convenient strategy, and it means we still have a comprehensible presentation like "2 #3 ratio" if a particular view fails to support rational numbers.

Editable views are also very useful for translating secure hashes to human comprehensible names. My language can link code and binaries and hierarchical codebases by secure hash, which is convenient for some code sharing and structure sharing patterns.

Cool

I like this idea a lot! Please report back as you make progress!

Having seen C++

I thought Stroustrup's Rule was "There's always enough room for another kitchen sink."

This has some implications for minimal-syntax languages.

Stroustrup's rule, if applied to minimal-syntax languages (Lisps, Forths, etc.), implies that all features are established features.

And indeed programmers in those languages DO treat all features as primitives. Lisp programmers use fairly deep concepts of semantics in casual, flexible, fluent ways. I hear the same thing is true of forth programmers.

It makes me wonder whether having the language structure enforce minimal syntax on everything somehow facilitates programmers in getting used to new things. Just the fact that the syntax does *NOT* call them out as remarkable or exceptional might help programmers not *think* of them as remarkable or exceptional.

Interesting

Lisp programmers use fairly deep concepts of semantics in casual, flexible, fluent ways. I hear the same thing is true of forth programmers.

I'm not completely sure what you are talking about. Would you mind giving a few examples?

Well, okay...

Procedures as values that can be held in variables, stored in structures, etc.

Persistent closures.

Programs as expressions.

Functional programming.

Persistent closures as objects, long before OO became popular elsewhere.

Lambda and anonymous procedures.

'map' back before it was cool and all the kids started doing it.

Procedures with mutual recursion.

Macros and general translation of data into code.

Macros that generate macros.

Reflective interpreters.

Enumerations and generic accessors, long before it was cool and all the kids were doing it.

Distinguished equality predicates that test for different *kinds* of equality.

Type of an entity as an idea distinct from its storage format in memory.

And the list goes on. Each of these things emerged and became accepted practice, treated as primitive, very *early* in Lisp history, for the most part long before programmers elsewhere started using them; and it seems to my (subjective, probably biased) memory that Lisp programmers got *comfortable* with these concepts much more rapidly than (later) programmers who encountered them in languages where loud syntax called them out.

Counter example

As an occasional Scheme user I would guess that this rigid simplicity falls apart when one needs an extension for a domain not well supported by the syntax. I may be wrong, but the thing I miss in Scheme is pattern matching.

I can pattern match manually in Scheme but it seems really messy. In Ocaml, it's relatively simple. Racket is a Scheme with pattern matching, but it required an actual syntax extension, if I understand correctly. Scheme macros don't do the job, because they're not recursive, and pattern matching is.

Pattern matching....

I remember thinking pattern matching was a problem in Lisp. Then I started using OPS5 Lisp when I needed pattern matching and the problem went away. :-)

Alternatively there's a package of macros that does essentially the same thing as a reflective interpreter extending scheme.

Rigid simplicity

The classic "ball of mud" quote about Lisp works so well because it captures something about Lisp that's very hard to put into words. Lisp's simplicity isn't rigid.

Pattern matching implemented in macros

If the Scheme supports something like defmacro, then a macro can call a procedure during macroexpansion, and that procedure can be recursive. That procedure can implement a custom notion of macroexpansion, or indeed its own whole compiled language.

Pattern matching is a good example of where that comes in handy. To write a macro where one part of the macro body is a pattern, we want to know what local variables are bound by that pattern so we can expand into a let or lambda with those variables. If we're willing to invoke a user-defined macroexpander on the pattern, we can design that macroexpander so that part of its expansion result is a set of variables.

Lisps often use macros as simple sugar for higher-order function calls, but a language which already has convenient lambdas doesn't need macros for that. I think macros pay off once we want to do a tree traversal, like we do with a macro for pattern matching.

Will just leave these here…