
Do names and symbols really imply semantics? And if so, what should we do about it?

Some languages, like APL, are written in Martian script. Their proponents insist that a set of characters humans have never seen before in their lives have intrinsic, unambiguous semantic meanings which can be inferred directly from their shapes.

I don't think that is true. Not to humans, anyway. Martians probably see differently.

Some languages place strong requirements on the semantics of operators, based on the familiar meanings and relationships of those operators. For example, there was a beta release of Pascal 7 (from Borland, around 1995 I think) that allowed people to overload operators, but warned that the compiler would simplify expressions according to the precedence rules the language set for them, and the distributive, associative, etc. properties implied by those operators, before any overload definitions were looked up. If an operation was commutative, the compiler was allowed to reorder its arguments arbitrarily. If it was distributive, expressions like (a*b)+(a*c) would be reduced to a*(b+c) before the operator overloads were looked up. If you defined any two relational operators, the compiler would automatically define the rest using the usual identities. You were not allowed to define three or more relational operators. Etc.
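
To make the idea concrete, here is a toy sketch (in Scheme, since that's where this piece ends up anyway; my own illustration, not Borland's implementation) of "normalize the expression algebraically before you ever look at the user's operator definitions":

    ;; One rewrite rule, distributivity: (+ (* a b) (* a c)) => (* a (+ b c)).
    ;; A compiler taking this stance applies such rules *before* consulting
    ;; any user-supplied overload for '+' or '*'.
    (define (normalize expr)
      (if (pair? expr)
          (let ((expr (map normalize expr)))
            (if (and (eq? (car expr) '+) (= (length expr) 3)
                     (pair? (cadr expr))  (eq? (car (cadr expr)) '*)
                     (pair? (caddr expr)) (eq? (car (caddr expr)) '*)
                     (equal? (cadr (cadr expr)) (cadr (caddr expr))))
                (list '* (cadr (cadr expr))
                      (list '+ (caddr (cadr expr)) (caddr (caddr expr))))
                expr))
          expr))

    (normalize '(+ (* a b) (* a c)))   ; => (* a (+ b c))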

This strong assumption that the semantics of the overloads must follow, in most respects, the semantics of the operators whose names they were using really upset a lot of people who wanted to use '+' to concatenate strings, or who wanted to use '<=' and '>=' (which is how Pascal spelled 'greater-or-equal') as some kind of redirection operators. A lot of it (but not all of it) got changed between beta and release.

I was never really convinced that changing it was the right thing. It seemed perfectly reasonable to me that these familiar symbols should be constrained to have the familiar semantic properties that we infer when we're looking at an expression built of them. I thought people who wanted a string-concatenation operator or a redirection operator should be using different names for their operators, like '$+' or '|>'. Sadly this was impossible, as there was no way to define new operator symbols. That deficiency remains in almost all languages that allow operator overloading.

In Scheme there is a strong assumption/tradition that any variable whose name ends in the character '?' will be bound to a procedure that returns a boolean value. The same idea in Common Lisp is associated with the trailing character 'p'.
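
A small sketch of the convention (leap-year? is my own name, not from any standard library):

    ;; The trailing '?' marks a predicate: a procedure that returns a boolean.
    ;; (Common Lisp would call the same thing LEAP-YEAR-P.)
    (define (leap-year? y)
      (and (zero? (modulo y 4))
           (or (not (zero? (modulo y 100)))
               (zero? (modulo y 400)))))

    (leap-year? 2000)   ; => #t
    (leap-year? 1900)   ; => #f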

The Scheme tradition goes on to say that any variable whose name ends in '!' is bound to a procedure that has a side effect. Many Schemes will produce a warning if any such variable is ever bound to anything else. And some consider it an error if a side effect is produced by any procedure not so named, because the name is considered to be an important warning to the programmer that a side effect is possible.
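
In the same spirit, another sketch (swap! is my own name, following the standard library's set-car! and vector-set!):

    ;; The trailing '!' warns that the procedure mutates its argument.
    (define (swap! v i j)
      (let ((tmp (vector-ref v i)))
        (vector-set! v i (vector-ref v j))
        (vector-set! v j tmp)))

    (define v (vector 1 2 3))
    (swap! v 0 2)
    v   ; => #(3 2 1)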

Scheme used to have 'indeterminate digits' in numbers, written '#', so for example 123# denoted 'one thousand two hundred and thirty-something,' an inexact integer. This got scrapped once it became clear that implementors had no interest in keeping track of how many significant figures of decimal accuracy a calculation represented. They were only using 'inexact' to mean 'IEEE 754 floating-point hardware representation' and were openly hostile to the notion that anyone might hope for it to mean anything more. Many regarded it as a contradiction in terms that something could be both 'integer?' and 'inexact?', and I think in the current Scheme standard it may in fact be a contradiction in terms. But getting rid of it meant abandoning the possibility of tracking significant figures of accuracy. So that character used to have a semantic meaning and purpose, but nobody valued that purpose.
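
What survives today, in a typical implementation (a sketch, not tied to any particular Scheme):

    (exact? 1230)       ; => #t
    (exact? 1230.0)     ; => #f  'inexact' now just means an IEEE 754 float
    (integer? 1230.0)   ; => #t  an inexact integer, in most implementations
    (* 1230.0 3)        ; => 3690.0  no record of how many figures were significant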

Early BASICs had 'type sigils' so that the programmer (and the interpreter) could know the type of a variable just by looking at the name: FOO$ always referred to a string, FOO% to an integer, FOO# to a double-precision number, and so on.

People made an awful lot of fun of those type sigils in old BASICs. Until the same people turned around and started using Hungarian notation to keep track of the types of their variables in C++. Because keeping track of the type was HARD, and they wanted to be able to see what type something was by looking at it. So they defined their pszName and their hwndInteractionWindow and their dwIDnumber and so on, and didn't think about BASIC's type sigils at all, because this, after all, was something different.

And after all these examples of naming semantics: how much semantics is it reasonable to expect to be able to infer from syntax? How much is it reasonable for the compiler to enforce, based on the name alone? And what does programmers liking it have to do with helping them produce good code?