In defense of the semicolon

I've noticed a lot of noise recently about getting rid of mandatory semicolons in mainstream imperative languages because...well...we can! I remember when this move was made in Scala; it was a bit painful with corner cases and error messages, but most of the community was happier with it. But I think we lost something, so let's look at the good and bad of semicolons:

Bad:

  • That semicolons can be inferred deterministically means that they are redundant.
  • Visual verbosity and clutter.
  • Extra (keyboard) typing.
  • Anachronistic.

Good:

  • Redundancy enhances error detection (and robust IDE tooling in general).
  • Redundancy enhances readability.
  • They aren't really that hard to type.
  • Inference that depends on parsing probably goes a bit too far (Scala) in that it's not always obvious to human readers where semicolons are inferred (see the sketch below).
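
To make that last point concrete, here is a minimal Scala sketch (mine, not from the original post) of how line-based inference can surprise a reader:

object SemicolonInference {
  // A line ending in an infix operator cannot end a statement, so no
  // semicolon is inferred and the expression continues onto the next line:
  val total = 1 + 2 +
    3 // total == 6

  // Start the next line with the operator instead and a semicolon *is*
  // inferred after the 1; "+ 2" becomes a separate, discarded statement:
  val x = 1
  + 2 // compiles, but only with a "pure expression does nothing" warning; x == 1
}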

Did I miss anything?

Another case of implicit vs. explicit

With IDEs we can adopt a hybrid approach: implicit resolution (of lines, in this case) at edit time, coupled with explicit, even if subtle, feedback indicating the resolution. I'd argue this is a best-of-both-worlds approach.

Yes. The IDE can permanently

Yes. The IDE can permanently infer a semicolon as a service to the user. By this I mean the semicolons become sticky, and future changes to the code will not blow them away, and this is probably a good thing. But many users are uncomfortable with this kind of inference; they also get annoyed when the IDE infers a closing brace.

These inferences could also be made invisible and transient: the IDE infers a semicolon or closing brace so it can provide decent feedback, and then, if the code is left untouched (i.e., a semicolon or closing brace wasn't typed manually), it can add a permanent token as above.

invisible and transient ...

invisible and transient

... or visible but faded and semi-transparent, such that the user must type it or move on for it to become permanent.

Ya, that's how it worked in

Ya, that's how it worked in the Scala IDE (for parentheses at least; we didn't bother with semicolon inference, but probably should have).

Redundancy doesn't always enhance readability.

I don't think redundancy always enhances readability.

I'm against having semicolons, but I think braces vs indentation is a simpler argument. In brace-based programming languages, programmers still typically indent their code so humans can easily see what's going on. The braces end up being just for the compiler.

Indentation-sensitive languages make the compiler use the same indicator of structure that humans use. Adding redundant braces, I believe, harms readability (though maybe you already covered this case in your "visual verbosity and clutter" point).

I think that, roughly, semicolons are for compilers and newlines are for humans. But this doesn't always work out as well as the braces vs indentation situation. (And then there's omitting parens on function calls, which gets even messier.)

Well, look at Haskell, which

Well, look at Haskell, which is minimally redundant, and just see how hard it is for new users to grok Haskell's syntax compared to something more redundant like Java (where redundancy comes from syntax and formatting). I believe some amount of redundancy is beneficial to readability.

Semicolons are for humans; it's obvious that the compiler can do without them. However, semicolons do make the compiler's job easier in forming decent error messages, but I'd argue that is for humans also.

Too broad?

Haskell does a lot of things non-redundantly, and I think it's useful to look at the features independently.

For example, do you think adding braces and semicolons to Haskell would improve readability of the code (for someone who has been using the language for, say, one month)?

I must remark (in reaction

I must remark (in reaction to Kannan's reply) that Haskell syntax *has* semicolons and braces, which can be used anywhere locally as an alternative to the more common whitespace-sensitive syntax.

Sean, I am not at all convinced by your statement that "it is hard for new users to grok Haskell's syntax compared to something more redundant like Java". This sounds like the typical claim that is believed but has not been empirically validated (or else I would be interested in references).

I think any unease you could observe may very well be a "familiar vs. new" difference rather than something about Java or Haskell syntax (as Java syntax is basically C syntax and therefore minimally different from what people already see all around). I have heard of Haskell being used to teach non-programmers (i.e., a programming course in a mathematics programme), and have not heard of particular syntactic difficulties.

We also often hear the argument that Python syntax (not that far from Haskell's) is nice and easy to work with, and looks like intuitive pseudo-code. My guess is that there is some truth in this idea, though I haven't heard of concrete experiments. On the other hand, Chris Okasaki reports that indentation-sensitivity was a great help in teaching programming to beginners, by forcing them to pay more attention to code structure.

Of course optional doesn't

Of course optional doesn't mean required, but dealing with other people's styles is a curse of having "choice"; language consistency is another dead horse to beat later.

I would argue that some amount of redundancy leads to reinforcement, which leads to both (a) quicker learning of the language and (b) better readability when looking at some code you didn't write. Oddly enough, books used to be written without periods. Punctuation is actually a recent and extremely useful invention. Familiarity is obviously one factor that would have to be accounted for in an empirical study; right now I'm just working off of anecdote.

When I look at Haskell code, I always get lost without redundant syntactic signage and type annotations. I'm not really talking so much about semicolons in this case (type annotations are probably more important); I was only using Haskell to make a point (I would rather use Scala to debate semicolons).

wrong link

Your link to Okasaki's report seems to be incorrect.

Thanks, I just fixed it. In

Thanks, I just fixed it. In the process I noticed that it had been mentioned on LtU before, but without really starting a discussion.

oh the humanity

and we all know how easy it is to copy-and-paste-and-auto-indent-correctly python code of any significant length. not that we should actually be doing much copy and paste, but it does have to happen sometimes, and i freaking hate whitespace sensitive languages then. and other times, to boot.

Whitespace Programming Language

It is possible to program using only whitespace: Link

Edit: The link is to the Wikipedia page of the language; not sure what the person below is referring to.

Is there anything that you

Is there anything that you find valuable in the link you gave? I've glanced over it and found little more than trivia and unstructured discussion -- with the historical remark that whitespace significance was promoted by ISWIM a long time ago.

If it makes specific points that are pertinent to this discussion, please feel free to quote them more precisely.

JavaScript semicolon insertion considered problematic

JavaScript code that returns a nicely formatted object literal can be problematic because of semicolon insertion that is algorithmic but does something the programmer doesn't expect. Consider this example due to Colin Ihrig:

function getObject() {
  return        // the newline here triggers automatic semicolon insertion: "return;"
  {
    foo : 1
    // many more fields
  };
}

This does not return the object, but rather the undefined value, due to semicolon insertion after return (and likewise after throw). Furthermore, the semicolons in for statements must be literal; newlines cannot replace them.

JavaScript ASI

The way JavaScript does automatic semicolon insertion is very hacky and hence shouldn't be taken as an argument either way. If you want to look at a more reasonable example, look at Haskell.

Consider indentation

In Python, where indentation decides block levels, I would argue that semicolons do not (significantly) enhance error detection or readability, because both of those are already served so much better by the indentation/significant whitespace. In Python, it's also generally quite obvious to human readers where the end of statements/expressions would be inferred.

So, I would argue that three of your "Good" points aren't significant in Python, and then the arguments for semicolons are quite weak.

I don't get it, indentation

I don't get it; indentation is about blocks, not statements, which are demarcated by newlines, right? Then I have no idea why Python would be any better than Scala, unless you are saying Python doesn't support unblocked multiline statements. If so, then semicolons aren't necessary.

I'm not sure about Scala.

I'm not sure about Scala. Python does support unblocked multiline statements. But I'm not sure how that validates the case for the semicolon?

Programming Languages are Dead

The very fact that we're even having this debate says to me that programming languages - at least the popular ones, the ones defined as some combination of features found in either Algol-68 or Smalltalk-80 - are dead. The design space has been exhausted, the optimal points in the design space have been discovered, and the only thing left to distinguish new languages is utter trivialities. Like whether semicolons are needed or not, or how fast the compiler runs.

This isn't to say that language design outside of the mainstream is dead; quite the contrary. One only has to compare, say, Haskell and OCaml to find large, important differences. Type classes or modules and functors? Lazy or strict evaluation? Monadic side effects or unlimited side effects? Which is better or worse is subject to debate, but my point here is that the differences are not trivial. Note that basically no one says "I use OCaml because I don't like Haskell's significant whitespace", or vice versa. Or consider Scheme vs. Lisp - same deal.

If someone were to show up and go "Hey, I implemented an OCaml, but changed it so you only have to use one colon for list cons instead of two!", everyone would be like "Um, why?" That's the level of triviality we're at here. "My language is better because of some small syntactic difference." Your language is virtually indistinguishable from what it came from.

Yeesh

I don't think Sean was at all attempting to elevate semicolons vs. whitespace to the level of lazy vs. strict or pure vs. impure. He was just raising a discussion about one aspect of concrete syntax design. If you think it's too trivial to discuss then by all means don't discuss it.

If, however...

If someone were to show up and go "Hey, I implemented an OCaml, but changed it so you only have to use one colon for list cons instead of two!", everyone would be like "Um, why?"

If, however, somebody showed up saying "Hey, I implemented a Haskell, but fixed it to properly use a single colon for type ascription", then that would be a BFD indeed. :)

Innovation in Mainstream

Finding yet another example of Parkinson's Law of Triviality does not indicate that the design space, even for languages near Smalltalk-80 and Algol-68, is anywhere close to fully explored or optimal, not even within mainstream features. A few areas of significant variation today include traits/inheritance models, reflection, concurrency, persistence, partial failure handling, module systems, libraries, and even syntax (which is often more significant than a semicolon).

If you leave the mainstream, there's lots going on

Granted. But I don't see those ideas having much of an impact on the mainstream. Maybe I just read too much Hacker News and Reddit, but as far as I can see, to the extent that there is a debate in the mainstream about what languages to adopt, many if not most of the arguments are about trivialities - semicolons and compilation speed.

And I don't even mean to cast aspersions on the importance of syntax in language design. I suppose the no-semicolons claim could be a stand-in for a much larger collection of syntactic changes, which in aggregate make for a sufficiently improved programming experience. But I haven't seen that argument made. And I would expect that if that were the case, there'd be something less trivial that could be the symbolic change - maybe significant whitespace or something like that.

Orders of magnitude

Orders of magnitude improvement in any facet, even compilation speed, is never a trivial detail. Quantity impacts quality! If the arguments are about small differences in compilation times, it's trivial. If the arguments are about the orders of magnitude difference between Go and C compile times, it is not trivial - at that point you'll have a significant and visceral impact on programmer experience and productivity.

But, that aside, people will persist in arguing trivialities. I think you'll find that people arguing trivialities on hacker news and redd.it are also consistent in failing to convince anyone to adopt a language. There is at least a subconscious recognition of the triviality in most participants. But people tend to argue where they feel they can contribute, even if only for the trivial.

not bike shedding

Really, bike shedding is focusing on a trivial aspect of a language when there are more important things to argue about, whereas I'm bringing up the semicolon topic in isolation from other issues, which admittedly would often take priority. However, making semicolons optional in Scala was a big deal with many important consequences. It's a detail to be sure, but a very important detail.

Once we figured out how to eliminate the need for semicolons via clever parsing (talking Scala here; JavaScript is much less sophisticated in its semicolon inference), many people thought it was a no-brainer to just get rid of them. If they aren't necessary, why should they exist? However, I'm arguing that the extra signals they provide about programmer intent, both from the writer (I meant this to be a statement boundary) and to an invested reader (ah, the statement boundary is really here), can be very useful; we lose something when we drop them, and therefore "optional semicolons" must be considered as another tradeoff to be made. Or to put it another way, a minimal syntax is not necessarily an ideal one.

No one is going to bother building a language out of one syntax improvement. Rather, there is a buffet of features to grab on to. In the context of my current research, live programming, I need the compiler to be extremely responsive and incredibly robust with respect to transient errors, so having non-optional semicolons might be very useful in that regard.

Feature Buffet

While semicolons might serve your purpose, I think you could find other features that would serve the same purpose without the visual clutter of semicolons. Syntax can be designed with localizing errors in mind, as can IDEs.

One useful technique is to build localization rules into your probabilistic, error-robust parser: multiple blank lines indicate progressively lower probability of a relationship. Even without an offside rule or significant whitespace, this should often be sufficient.
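
A hypothetical sketch of that heuristic in Scala (the function name and decay factor are my assumptions, not any existing parser's API):

object BlankLineHeuristic {
  // The more blank lines separating two statements, the less likely an
  // error-recovering parser should be to join them into one construct.
  def continuationLikelihood(blankLinesBetween: Int, decay: Double = 0.5): Double =
    math.pow(decay, blankLinesBetween)

  def main(args: Array[String]): Unit = {
    println(continuationLikelihood(0)) // 1.0  -> adjacent lines, likely related
    println(continuationLikelihood(2)) // 0.25 -> probably separate constructs
  }
}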

I would argue that such

I would argue that such techniques are orthogonal and probably composable in their benefits.

I agree with that.

I agree with that. Unfortunately, such techniques are also composable in their disadvantages. Be careful what you choose.

Casual Dismissal of Syntax

It sometimes seems like PL folks are all in a hurry to prove how little they care about syntax. This is sometimes a benefit, because we all want to avoid "bikeshedding," but it also speaks to an endemic resistance to actually considering user interface/experience as an aspect of language design.

Syntax is user interface; so are error/diagnostic messages. User interface is never a "triviality" when a human is in the loop.

As a small, concrete example: anybody with even moderate experience using a language that has comma-separated lists will probably have run into a situation where they want to be able to have a "dangling" comma at the end (that is, treat the comma like a terminator rather than a separator). This may be a "triviality" in some respects, but it makes it easier for a programmer to reorder items in the list, or to merge changes in a version-control system, without a lot of tedious busywork.
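
A tiny illustration (in Scala, which accepts a trailing comma in recent versions; the list contents here are made up):

object TrailingComma {
  // With a trailing comma after the last entry, adding, removing, or
  // reordering entries touches exactly one line, which keeps diffs and
  // version-control merges small.
  val languages = Seq(
    "Scala",
    "Haskell",
    "Python",
  )
}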

Making a language with the right underlying concepts, and then failing to think carefully about the interface through which programmers access those concepts, strikes me as being like building an elegant race car that you can only enter or exit by climbing through the window.

race car

Strange analogy, since most track race car designs don't have doors that open, and the driver enters and exits through either the window or an open cockpit. That's done not because of a misstep in user interface design but to improve safety and reduce weight. Which is to say, it's a good design for that domain.

To bring this back to ground: racing is often justified as a research bed for technology improvements that might make it into consumer cars. I don't expect anybody will be climbing through the window of their family minivan, but it does seem at least plausible to do research on, say, engine monitoring technology in a racing domain without worrying about user interface elements like cupholders and doors.

Same deal with language research - not all research needs to deal with every aspect of language design equally.

Should have said "sports car"

Yeah, not exactly a perfect analogy.

Certainly much research (investigating new "engine" designs) can be done on languages without all the amenities that the average programmer expects. That said, there is also plenty of room for research on the human/social aspects of language design.

For anybody who isn't doing pure research, though, and intends to make languages that programmers actually use, it seems worthwhile to take a measured, pragmatic interest in issues of user interface.

Novices certainly have a tendency to get over-invested in issues of syntax, but that doesn't mean that disinterest in syntax is required for expertise.

The C# language is an interesting example to me. While we may disagree with this or that choice, it is clear that many of their design choices with respect to syntax, scoping, etc. were motivated by actual study of programmer practices. For example, C# has rules related to local variable scoping that are intended to reduce the likelihood of certain errors caused by copy-paste programming.

Bringing up C#, it's also a

Bringing up C#, it's also a language where eliminating the need for semicolons is pretty straightforward (with the rest of its grammar mostly unchanged, like Scala), but that will probably never happen given that the developers it targets really value good error messages and consistency in readability.

Syntax not Dismissed

There are many PL folk who consider syntax very important. John Shutt, William Cook, Sean McDirmid, Jonathan Edwards, and myself are just a few among them.

One of my goals is to unify UX and PX - bringing them together via live programming, reactive semantics, tangible values (or naked objects). I have ideas for programming grammars based on widget composition, and I'm especially interested in set-based or proximity-based composition (similar to traits models in OOP) where the system can auto-wire the components in a predictable manner. The effect of PL semantics on UI - on what is easy, and therefore on what is actually achieved - is deeper than one might expect. I'd like to elevate both PX and UX, e.g., to support programmable UIs, open composition, mashups, cooperative work (externalize state, @unhosted), and secure interaction designs (object capability model).

Sean is another who considers UX a very important part of PL design. There seems to be a group of such people gathering (slowly) in the augmented-programming google group.

Semicolons can improve readability

With this kind of layout:

foo() {
  ; doThis (
      , bar
      , baz
      )
  ; thenThat ()
  }

Semicolons can be useful syntax.

They can be useful. Perhaps they aren't absolutely necessary in correct programs in most languages. But they certainly help localize errors in their role as a statement terminator/separator. They enable the compiler to recover from errors so it can report more than one error at a time, and in most languages reduce the ambiguity of intermediate parses so that the compiler can proceed faster.
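
To make the error-recovery point concrete, here is a minimal Scala sketch over a toy token model (my own; not any particular compiler's API): after a syntax error, the parser can skip ahead to the next semicolon and resume, so later statements still get checked and reported.

object PanicModeRecovery {
  sealed trait Token
  case object Semi extends Token
  final case class Word(text: String) extends Token

  // Discard tokens up to and including the next ';' so parsing can resume
  // at the following statement and report further errors.
  def resyncAfterError(rest: List[Token]): List[Token] =
    rest.dropWhile(_ != Semi) match {
      case Semi :: tail => tail // resume right after the synchronizing ';'
      case _            => Nil  // no semicolon left: nothing to resume on
    }
}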

Those are all services to the programmer and IMO worthwhile design points depending on what you're doing with your language.

You can't say "useless" or "needed" or "trivial" or "significant" without knowing the set of jobs that the language in question is intended to do. If you're doing compiles of million-plus-line monolithic operating system kernels, for example, the extra speed is no joke.

Unfortunately, even in languages that have separators/terminators for their statements, the error messages produced by popular compilers could definitely be better. Near-useless error messages, it seems, have gotten more common over time rather than less. Few of the lexer-parser generation tools available have a well-thought-out strategy for generating good error messages, and lots of older, revamped, or often-revised languages have a syntax that has grown over time to be very complex with lots of obscure corners.

So, yeah. Potentially useful; I just wish people would more reliably use 'em to their full potential for error reporting.

Ray

Wadler's Law Strikes Again

Treating unescaped line breaks as semicolons is a useful thing to do, but the guessing that some languages (like JS) like to do can sometimes lead to surprising results.

Of course, if you include enough parens, semicolons (and other separator tokens) become unnecessary. :P