A discussion from the trenches.

Over at LWN there is currently a discussion going on about the little things in programming languages that make a difference: commas, decimal points, braces, and so on.

The content is subscriber-only for the first week, so it won't be generally available until June 16th.

syntactic conveniences for common idioms

I am a big fan of Lispy languages, but they do mostly fail to make common idioms simple to write or read.

Being able to write foo[23] to get the twenty-third element of a sequence, or write foo.bar to access the subfield bar of the variable foo, is not really something that adds anything to the semantic model as such, and having more syntax does complicate the macrology -- but the syntactic convenience and brevity of common idioms makes code easier for humans to read, and that counts for a lot. So, yeah, I think AREF and LIST-NTH are excessively verbose and 'hide' common idioms from the eye by putting words into the code where these things are so simple to experienced coders that they don't need talking about.
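
For what it's worth, part of that bracket convenience can be layered onto a Lisp through the reader. Here is a minimal sketch in Common Lisp; it buys you [foo 23] rather than foo[23], and does nothing for the foo.bar case, so it only goes part of the way:

;; Teach the reader a bracket shorthand for AREF.
(set-macro-character #\[
  (lambda (stream char)
    (declare (ignore char))
    ;; read everything up to the matching ] and wrap it in AREF
    (cons 'aref (read-delimited-list #\] stream t))))
(set-macro-character #\] (get-macro-character #\) nil))

;; [foo 23] now reads as (AREF FOO 23)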

Likewise, I'm fine with the fully parenthesized prefix notation for most function and procedure calls -- but even so, I want an extension that allows people to write mathematical expressions (especially complicated ones) in the same format they've been using for mathematical expressions since second grade.

Fortunately, in Lispy languages you can write one. So instead of saying

(+ (* 3 y) (/ z (+ 3 x)))

you can say

(math 3 y + z / (3 + x))

and it's just easier to read.
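
For concreteness, here is a minimal sketch of such a MATH macro in Common Lisp. It handles only + - * / with the usual precedence, treats a nested list as a parenthesized subexpression, and (unlike the shorthand above) wants an explicit * between a coefficient and a variable; juxtaposition-as-multiplication would take one extra rule. The names are illustrative, not anything standard:

;; PARSE-INFIX lives in EVAL-WHEN so the MATH macro can call it at
;; macroexpansion time even in compiled files.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (defun parse-infix (tokens)
    "Turn a flat infix token list like (3 * Y + Z / (3 + X)) into prefix."
    (labels ((operand (tok)
               ;; a nested list is a parenthesized subexpression
               (if (consp tok) (parse-infix tok) tok))
             (reduce-ops (items ops)
               ;; combine operands joined by an operator in OPS, leaving
               ;; other operators alone for a later, lower-precedence pass
               (let ((out (list (pop items))))
                 (loop until (null items)
                       do (let ((op  (pop items))
                                (arg (pop items)))
                            (if (member op ops)
                                (setf (first out) (list op (first out) arg))
                                (progn (push op out) (push arg out)))))
                 (nreverse out))))
      (let* ((items (mapcar #'operand tokens))
             (items (reduce-ops items '(* /)))   ; higher precedence first
             (items (reduce-ops items '(+ -))))
        (first items)))))

(defmacro math (&rest tokens)
  "Infix arithmetic, e.g. (math 3 * y + z / (3 + x))."
  (parse-infix tokens))

;; (macroexpand-1 '(math 3 * y + z / (3 + x)))
;;   => (+ (* 3 Y) (/ Z (+ 3 X)))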

To make it more 'magical,' if you have a MetaObject Protocol in your language you can just give numbers call semantics that entail calling an infix-processing macro on the expression they start, and that will make the infix behavior into a 'builtin' that you don't even have to explicitly invoke. So instead of writing

(+ (* 3 y) (/ z (+ 3 x)))
if you have a MOP, you could just write
(3 y + z / (3 + x))

I don't usually approve of the use of MetaObject Protocols; they make things behave in ways that experienced programmers otherwise familiar with your language do not expect them to behave. This is an exceptional case, first because it can be done while leaving the original syntax working with the original semantics, and second because the behavior it invokes is both universally familiar and semantically clear once people know that the behavior is enabled.

Anyway, yes, I've thought about syntactic inconveniences and their alienating effect on programmers. I think most programmers could get over using parenthesized prefix expressions as a standard procedure-calling syntax, but they hate it passionately largely because they aren't willing to part with the syntactic conveniences they're used to from other languages. Syntactic conveniences like array references, subfield references, and mathematical expressions, for starters.

Even more basic

The discussion delves into things like: should a number starting with 0 be octal or decimal? Should numbers starting with 0x be hexadecimal? How should floating-point and imaginary numbers be represented?

Should commas be legal in decimal numbers? Should underscores be legal in decimal numbers, to make them easier to read?

Should ; be a statement terminator or a statement separator? And so on.

For Lisp, with its very simple syntax and its codification such a long time ago, I don't expect most of these are open options anymore. For new languages, all of these little choices that don't matter in the big picture tend to be very interesting during design and adoption, because when used day in and day out the little things make a difference.

Compute(100,265)

Compute(100,265)

Is it a function of one argument or two? Lisp gets away with avoiding these choices by not having significant syntax.

Octal? Why?

I think that we have reached a point in history where numbers in octal are no longer useful for any practical purpose, with the possible exception of UNIX file permissions, and that only by an accident of history. They were excusable, maybe, when we didn't want to spare cycles to convert to decimal for I/O, and for writing ADD and NEG and MUL and DIV and REM assembly macros on braindead little controller chips that came from the factory only knowing AND, OR, and NOT. But those days are over.

Octal numbers (and some cases of hex numbers) look too much like decimal numbers to be allowed as basic syntax, in my opinion. Reduce the opportunities for people to become confused.

Because groups of 4 bits evenly divide all the popular word lengths, hexadecimal is still occasionally useful for expressing numeric limits or bit patterns. But it's useful rarely, and IMO it needs to be marked as exceptional in the syntax, so I would actually prefer writing something like (hex "DEADBEEF") or hex("DEADBEEF"), depending on the language, and letting partial evaluation sort it out when preparing the program for execution.
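
For what it's worth, the (hex "DEADBEEF") idea is easy to sketch in a Lisp; here is one way in Common Lisp, with a compiler macro standing in for the partial-evaluation step. The HEX name is just for illustration, not any standard API:

(defun hex (string)
  "Parse STRING as an unsigned hexadecimal integer."
  (parse-integer string :radix 16))

;; Fold constant arguments at compile time so (hex "DEADBEEF") costs
;; nothing at run time; non-constant arguments fall back to the function.
(define-compiler-macro hex (&whole form string)
  (if (stringp string)
      (parse-integer string :radix 16)
      form))

;; (hex "DEADBEEF") => 3735928559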

This is partly because numeric syntax in modern languages has other, newer complications, and it's good to keep the number of complications limited. For example, it has become useful to mark whether a number is or is not a bignum (if bignums have distinguished semantics) and whether it is or is not exact. It has also been useful for a long time to allow a value to carry optional signed decimal and/or binary exponents: the decimal exponent because people work with scientific notation or, more recently, with the decimal-float formats supported on some modern CPUs, and the binary exponent because people are sometimes starting with or converting to binary-float formats.

Other useful complications include allowing Cartesian and Polar complex numbers. Maybe at some point Cartesian and Polar quaternions will be worth the effort. None of these new complications were really worthwhile to build into language syntax back when we were all bitbanging in binary, but are becoming increasingly useful as we're able to move to higher-level models of mathematics.

Anyway; I think we've moved beyond the era when building alternate-base representation directly into numeric syntax is worth it.

Octals please die

Agreed on octal numbers, but I still find hexadecimal useful enough, and 0xABCDEF98 has become enough of a de facto standard that I think it makes sense to continue using it. However, the idea of adding syntax for complex numbers and other compounds leads to madness. This use case, I think, is what drove C++'s insane system of implicit conversions and copy constructors. I think it's worth giving up a little conciseness in order to be unambiguous, actually.

Inheriting bugs of old languages.

At a quick survey, Go and Java both inherit this syntax from C.
C# does not, though Visual Basic still provides the &O prefix for an octal constant.
I completely agree that octal constants seem pointless on a modern byte-oriented computer.

Which is the point of the discussion: new languages, by focusing on the big things and not revisiting the little things, sometimes inherit old bugs.

Perhaps it is supporting a confusing syntax for octal. Perhaps it is requiring a context-sensitive parser that has to do a symbol-table lookup on identifiers just to parse the language. Perhaps it is not providing a simple idiom for common practice.

Which is a bad idea

[At a quick survey, Go and Java both inherit this syntax from C.]

Which is a bad idea. D did it right: even though at first it supported the C syntax for octals, it added a new, more explicit syntax and deprecated the ugly C/Unix leading-0 notation.
It is strange that new C revisions don't fix old warts in the language; this isn't magic: add a better notation (and a macro for old compilers) plus a warning for the old notation, and eventually make the old notation an error by default.

everything is a tradeoff

Oversimplifying slightly, every decision has a bad effect in some context, if that context is reachable. There are many ways to restate this. Each decision favors one quality over another, so that quality causes a disadvantage when a situation penalizes it, actively or passively. I don't know a precise name for this perspective; engineers tend to say "tradeoff" when nodding in this direction. Given imagination and pragmatic bent, all you need do is ask yourself, "Once I choose to pursue X instead of alternative Y, what bad thing can occur if circumstances permit?" Design reasoning is partly about gauging exposure to negative consequences of choices. For example, if I make this paragraph long enough, no one will read it even if each detail I add makes the concept more clear. Finding a goldilocks "just right" zone in tradeoffs is a real pain.

On the topic of little things in PL syntax making a difference, you can apply this idea to ponder the effects of syntax, especially if the rules permit syntax to remain subject to change, so each person can tune their own personal usage (or that of their social group) so that clarity and efficiency are maximized for themselves, at the cost of some kind of ambiguity from the lack of a constant definition over all contexts of usage. Talking about small details is fun. I like having syntax tuned so code is clear to me. But Irv and I might not agree on what looks clear, since a one-size-fits-all model isn't true. Voting on clarity causes a tyranny of mediocrity when the hump of the bell curve has the most votes.

Now I'm mostly done with high level over-generalizations. However, keep asking, "What bad effect can this choice cause?" Address the downside. Present both pros and cons because there are always cons.

Brevity of syntax in a PL is a real boon. You can shorten a notation by using more characters to suit your needs, but that either denies use of the same characters to other purposes, or gives them multiple meanings depending on context. The number of contexts can keep growing as long as folks come up with new domains in which they want a terse local notation. This is basically just information theory structured as a game between players with the ability to make new contexts defining character usage differently. (To a certain extent, making up a new PL aims to find a new locally stable set of syntax definitions which happen to suit the way the language will get used and the vagaries of preferences in the user base.)

Supporting multiple bases in number notation is easy, but the variance in notation itself can cause problems. That octal is supported when someone wants it is great, but bad when it tricks a new coder into assuming it's decimal. When you support bignum formats, it becomes a performance sink if the base used natively to compute doesn't match the one used for input and output and it takes a lot of multiplication or division to convert between bases. (I'm not going to say anything at all about data loss caused by imprecision in floating-point formats, except to note, yeah, we can go there too.)

Smalltalk actually had support for arbitrary radixes less than some practical maximum. Octal 0666 in C would be 8r0666 in Smalltalk, where the decimal 8 before the r (for radix) means the base is eight, so the value is given in octal. (You can write an invalid number token by using a digit that's too big for the base.) Instead of building special notation conventions for different bases into the language, Smalltalk just used the same radix-escape prefix as the syntax for all non-decimal bases. If you standardize a general escape mechanism in the notation, you can get by with fewer special cases for privileged variations from the default notation.
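
Common Lisp kept the same kind of general escape, incidentally: the #nR reader prefix accepts any radix from 2 to 36 (with #o and #x as mere abbreviations), and PARSE-INTEGER takes a :radix argument for the run-time case:

#8r666         ; => 438, the value C spells 0666
#16rDEADBEEF   ; => 3735928559
#2r101010      ; => 42
(parse-integer "666" :radix 8)   ; => 438, the run-time equivalent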

Here I was going to talk about escape mechanisms in further detail, including the idea of a notation that says, "The following is a token I want parsed by this specialized reader whose name is XYZ." Before I say more, I'll see if anyone cares. The downside of this is that anyone can embed tokens requiring arbitrary readers you might not have installed; you might not even be able to find an extant copy of code for an old version of a reader if archived code is old enough.

Some tradeoffs have simple answers

Some things can be added with nothing but local cost. (Adding _ in numbers).

Some things by their very nature create problems, like C's octal syntax, especially in an era when octal is no longer used.

Some things that would otherwise be reasonable are so obviously a problem that no one will implement them. (Using commas in numbers.)

So it is possible to find better trade-offs. And if you are designing a new language, it is worth looking to see whether you can do better, especially since many parts of the syntax cannot be changed until the next new language.
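
As a rough illustration of the "nothing but local cost" point about underscores: supporting them is little more than stripping the separator before parsing. A sketch in Common Lisp, at the level of a run-time parser (the function name is made up for the example):

(defun parse-grouped-integer (string &key (radix 10))
  "Parse an integer that may use _ as a digit-group separator."
  (parse-integer (remove #\_ string) :radix radix))

;; (parse-grouped-integer "1_000_000")           => 1000000
;; (parse-grouped-integer "DEAD_BEEF" :radix 16) => 3735928559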

simple is my favorite flavor

I'm not sure I get your concern. Surely you don't mean you'll be happy provided numbers can embed underscores and octal is forbidden?

I'm in favor of permitting things to occur. So I'd allow embedded underscores and explicit octal syntax like 8r0666. In particular, when disambiguating numbers from symbol names, having more ways to prefer seeing a number token seems like a good idea. But as you add extensions to accept more number syntaxes, there can be non-technical conflicts between folks with different intentions and plans. And applying tools from different backgrounds can surface gotchas.

Edit: sorry, sometimes it takes me a day to notice when I'm being a prick. I apologize for my first short paragraph, which rudely implies my take on your state of mind matters, when it doesn't. The way I phrased it would have made it hard to object, which is where being a prick comes into the picture. What I really should have said was: thanks for posting something interesting to talk about, and I hope you keep doing it.

Link to the full article

LWN allows subscribers to make a link allowing access to the article for non-subscribers, so I did:

Little things that matter in language design

You should really subscribe to LWN, though. It's good (and cheap).