Lambda the Ultimate

Type Safety anecdote
started 10/18/2002; 2:07:55 AM - last post 10/20/2002; 10:50:13 AM
Ehud Lamm - Type Safety anecdote
10/18/2002; 2:07:55 AM (reads: 1327, responses: 12)
Type Safety anecdote
Now for fun: what is '1' + '2'? The answer may surprise you. C# and Java agree (insert obvious joke here), but get a different answer than C, Perl, or Python do (which get three different answers).

The next time someone insists that type safety isn't important, let him try to answer this question...
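A rough way to see the spread of answers is to emulate each reading of '1' + '2' in a single language. The sketch below uses Python. The function names (c_style, java_style, perl_style, python_style) are made up for illustration, each one only approximates the named language's rule, and an ASCII-compatible encoding is assumed. The Java/C# reading reflects my understanding that both promote the operands to integers and leave the sum as an integer.

```python
def c_style(a: str, b: str) -> str:
    """Add the character codes and read the sum back as a character."""
    return chr(ord(a) + ord(b))


def java_style(a: str, b: str) -> int:
    """Add the character codes and keep the result as an integer."""
    return ord(a) + ord(b)


def perl_style(a: str, b: str) -> int:
    """Treat both strings as numbers before adding them."""
    return int(a) + int(b)


def python_style(a: str, b: str) -> str:
    """Characters are length-1 strings, so + concatenates."""
    return a + b


for reading in (c_style, java_style, perl_style, python_style):
    print(reading.__name__, repr(reading("1", "2")))
```

The same two operands give a character, two different integers, or a string, depending only on which rule the language attaches to +.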


Posted to general by Ehud Lamm on 10/18/02; 2:08:09 AM

Frank Atanassow - Re: Type Safety anecdote
10/18/2002; 6:56:25 AM (reads: 1317, responses: 2)
The only thing surprising about this is that anyone would be surprised by it!

Different languages will assign different semantics to the same piece of syntax. Surely that's to be expected... (and suggests why the so-called "principle of least astonishment" is vacuous).

Ehud Lamm - Re: Type Safety anecdote
10/18/2002; 7:15:46 AM (reads: 1347, responses: 1)
Sure. The people who find it surprising are exactly the same people who don't realize that C is weakly typed... That's why such examples can be useful.

More seriously, not all of the possible semantic interpretations of this expression are equally good. The behavior tells you about the type system and type checking of the different languages. So the moral is not that different languages give different values, but that some interpretations make less sense than others.

Frank Atanassow - Re: Type Safety anecdote
10/18/2002; 8:51:28 AM (reads: 1292, responses: 0)
I don't really see what this has to do with weak typing. The result in C, '1' + '2' = 'c', is perfectly consistent with what the result would be in a strongly typed language which adopted the definition (here written in Ocamlish):
let Char.+ m n = Char.chr (Int.+ (Char.code m) (Char.code n))

pixel - Re: Type Safety anecdote
10/18/2002; 8:52:50 AM (reads: 1392, responses: 0)
Which semantics do you find good?

As for me, I think "char" being a subtype of "int" in character encoding is no good (which is the behaviour of C, C#, Java)

  • keeping the ASCII representation, it would be better handled the "char *p" way: p+1 is allowed, but not p+p. '0' + 2 and '2' - '0' are still allowed, but not '0' + '2'.
  • in Python and Perl, there is no such thing as a character type; characters are strings of length 1.
  • in Perl, numbers are a subtype of strings. Strings used as numbers are downcast (and generate a warning when using -w)
  • is overloading "+" good? I like it (from the expressivity standpoint), but this may badly clash with the mathematics idea.
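The "char *p" style of arithmetic described in the first bullet can be sketched as a small wrapper type. This is only an illustration in Python, not something proposed in the thread: the class name Char and its methods are hypothetical, and the check happens at run time rather than in a static type system.

```python
class Char:
    """A character that allows offsets and differences, but not char + char."""

    def __init__(self, ch: str):
        if len(ch) != 1:
            raise ValueError("Char wraps exactly one character")
        self.ch = ch

    def __add__(self, other):
        # Char + int -> Char (like p + 1 on a pointer)
        if isinstance(other, int) and not isinstance(other, bool):
            return Char(chr(ord(self.ch) + other))
        return NotImplemented  # Char + Char ends up as a TypeError

    def __sub__(self, other):
        # Char - Char -> int (like p - q on two pointers)
        if isinstance(other, Char):
            return ord(self.ch) - ord(other.ch)
        # Char - int -> Char
        if isinstance(other, int) and not isinstance(other, bool):
            return Char(chr(ord(self.ch) - other))
        return NotImplemented

    def __eq__(self, other):
        return isinstance(other, Char) and self.ch == other.ch

    def __hash__(self):
        return hash(self.ch)

    def __repr__(self):
        return f"Char({self.ch!r})"


print(Char("0") + 2)          # offset from '0'
print(Char("2") - Char("0"))  # distance between two characters
try:
    Char("0") + Char("2")
except TypeError as exc:
    print("rejected:", exc)
```

The design mirrors pointer arithmetic: characters behave like positions, integers like distances, and adding two positions is simply not defined.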

Jo Totland - Re: Type Safety anecdote
10/18/2002; 4:54:42 PM (reads: 1236, responses: 0)
Ehh, what exactly has this got to do with type-safety? Different languages have different semantics. So what?

Personally, I find all three answers: 'c', 3, and "12" to be reasonable. I think 3 is a bit weird, but considering what Perl is good at (text-processing), it makes sense (and I've used it more than once).

The fact that C's semantics dictates characters to be the same as small integers is perfectly type-safe. If you really want it, you can always write:


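/* Wrap char in a struct so it no longer converts implicitly to int. */
typedef struct { char val; } Char;
/* Explicit conversion from Char to its integer code. */
int ord(Char c) { return c.val; }
/* Explicit conversion from an integer code to Char. */
Char chr(int val) { Char c; c.val = (char)val; return c; }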

... but few C programmers think that would be an improvement. Now, you may claim that C is weakly typed anyway, and you would be absolutely right. But this is not an example of it.

jon fernquest - Re: Type Safety anecdote
10/19/2002; 1:42:13 AM (reads: 1186, responses: 0)
> As for me, I think "char" being a subtype of
> "int" in character encoding is no good
> (which is the behaviour of C, C#, Java)
> ...is overloading "+" good? I like it (from
> the expressivity standpoint),
> but this may badly clash with the mathematics idea.

I think this is the important point. For thousands of years mathematics has evolved useful ideas that are broken in about 50 seconds with a programming kludge.

Type systems enforce mathematical purity and allow programming languages to converge on a canonical way of programming according to mathematical convention, the end of re-inventing "the wheel", a form of punishment used in the middle ages.

> Different languages will
> assign different semantics to the same piece of syntax.

Particularly to pieces of syntax that have no clear meaning. What does it mean to add two characters together? It perhaps has meaning in some context like C pointers, but not in the wider cross-language domain of character encodings which have no intrinsic meaning apart from being conventions.

(I shudder when I think of the Burmese language: In Burma nowadays the font encoding for the most popular Burmese Windows font is the default Burmese "Ascii". Burmese characters are composed of multiple glyphs combined into a syllable, and you have to adjust the size of many glyphs depending on their syllable context, so characters or glyphs only have an orthographic, font-design meaning and addition is even more meaningless. Roman orthography is not universal in the world of natural languages like it is in the world of programming languages. It's not much of an issue right now, but it will be... the largest countries on earth, India and China, are non-Roman orthography countries...)

Isaac Gouy - Re: Type Safety anecdote
10/19/2002; 9:56:12 AM (reads: 1156, responses: 0)
> what exactly has this got to do with type-safety
I wonder what we are all assuming "type-safety" means?

> C's semantics dictates characters to be the same as small integers
If the programming language doesn't distinguish operations on integers, characters, or booleans, or enums (any scalar value) isn't type-safety inevitably the issue?

'1' * '2' ???

> Different languages have different semantics. So what?
Some provide type-safety as standard, some don't. That's what.

> is overloading "+" good?
> I think this is the important point. For thousands of
> years mathematics has evolved useful ideas
I would agree if we were redefining 1 + 3 to mean 1 * 3. Instead we're taking a sign that has a specific meaning in one context (math) and using it in a different context with a different meaning. Is it bad that ">" is used to mark quoted text in email?

Jo Totland - Re: Type Safety anecdote
10/20/2002; 7:08:07 AM (reads: 1094, responses: 2)
> If the programming language doesn't distinguish operations on integers, characters, or booleans, or enums (any scalar value) isn't type-safety inevitably the issue?

To some degree, yes. But I wouldn't say inevitable (perhaps with the exception of booleans/enums). You could just as well say that C doesn't have characters as a basic datatype, and that the one with the name "char" actually represents the smallest integer available. In addition, there is some syntactic sugar to convert character constants into those integers. Does that make you happy?

I see type-safety as something that is intended to help the programmer prevent the errors that occur when data of one type, with one bit-layout, is treated as data of another type with a different bit-layout. The canonical example of this is integers and pointers. If you treat an integer like a pointer, disaster will usually occur.

Similarly, there can be pointers to different kinds of objects. Treating one like the other can also lead to disaster.

These kinds of errors can easily be detected by the type-system, whether it's dynamic or static (C is static and does it most of the time, though not always, but that's beside the point here). On the other hand, there are a lot of errors that can't be detected even dynamically, unless the type-system knows what the programmer is thinking. Such errors can be as simple as reversing the condition in an if-statement, or using a construct of the language the programmer doesn't understand (such as adding two characters).

I find it very surprising that this is the forum where I have to remind people that language semantics don't always mirror the expectations of the outsider, and very often for good reasons. Treating characters as integers means that decisions such as character-set encodings, multi-byte characters, internationalization, etc. can be kept out of the language definition. And because of that, we can implement other languages that deal with those issues in C.

And finally, whether we use the symbol + or the symbol % to denote addition has nothing to do with type-safety. It does have something to do with the principle of least surprise, but that's another story.

Ehud Lamm - Re: Type Safety anecdote
10/20/2002; 8:02:52 AM (reads: 1130, responses: 1)
Since I started this mess, let me try and set it straight.

There are two issues here. First, is the C approach good, or at least reasonable? Second, does this have anything to do with strong typing?

As to the first question, my answer is that I don't particularly like the C approach. It makes the program semantics dependent on the character encoding (EBCDIC will give a different result than ASCII), and it makes less sense when we move to Unicode, etc. But then again, to each his own.
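The encoding dependence is easy to check with a small sketch. The Python below sums the byte codes of '1' and '2' under two encodings. The helper name code_sum is hypothetical, and cp037 is used only as a stand-in for "EBCDIC" (it is one EBCDIC code page among several, chosen here because Python ships a codec for it).

```python
def code_sum(a: str, b: str, encoding: str) -> int:
    """Sum the single-byte codes of two characters under the given encoding."""
    (x,) = a.encode(encoding)  # unpacking fails unless the result is one byte
    (y,) = b.encode(encoding)
    return x + y


ascii_sum = code_sum("1", "2", "ascii")   # 0x31 + 0x32
ebcdic_sum = code_sum("1", "2", "cp037")  # 0xF1 + 0xF2

print("ASCII :", ascii_sum)
print("EBCDIC:", ebcdic_sum)
print("fits in one byte under EBCDIC?", ebcdic_sum <= 0xFF)
```

Under ASCII the sum is the code of 'c'; under this EBCDIC code page it does not even fit in a single byte, so what a C program does with the result depends on the platform as well as on the encoding.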

As to strong typing, I have to agree that I shouldn't have used this concept. To quote a definition from Cardelli: Strongly checked language: A language where no forbidden error can occur at run time (depending on the definition of forbidden error) [my emphasis].

In this case the problem is not that the checking is weak, it is that C's type system is weak. Indeed, many here said that char is a subtype of int. However, C doesn't really have a notion of subtypes. It has automatic conversions. This is hardly the same thing, as I hope you realize. Char, Int and Float all look like different types; indeed, that's what K&R calls them. The range of an integer type is, likewise, not defined by the language or definable in the language. The behaviour on overflow and underflow is also not defined by the language, and is machine dependent.

Of course, the term weak type system is not well defined, and this is a matter of taste. Strong checking is defined, and I agree that the code given was not an example of weak checking (even though C is weakly checked).

And yes, a unityped language can be strongly checked...

Jo Totland - Re: Type Safety anecdote
10/20/2002; 10:33:33 AM (reads: 1092, responses: 1)
> In this case the problem is not that the checking is weak, it is that C's type system is weak.

My first response would be: No, it is that people expect C to be intuitive, whereas it is not. And neither is any other computer language out there...

On the other hand, what you say is rather interesting. What do you mean by a weak type system? I would think that one would have to come up with some definition of a strong type system, such that it guarantees (i.e. provably, and in practice only if correctly implemented) certain properties of the author's or speaker's choice, such as no uncatchable run-time errors, no reading or writing of random memory, etc. Then one could use that to define a weak type system as a type-system in which proofs of those desirable properties are not possible.

Of course, C fails to have a strong type system (under the above interpretation) given the choice of almost any reasonable property the type system should guarantee. What I fail to see, however, is exactly which property you implicitly refer to when mentioning strong and weak type systems, that C's type system fails to guarantee, but that would make '1'+'2' illegal (or whatever you want except for returning "(char)((int8)'1'+(int8)'2')").

I still think this has nothing to do with type-systems, and everything to do with people's wrong expectations.

Ehud Lamm - Re: Type Safety anecdote
10/20/2002; 10:50:13 AM (reads: 1126, responses: 0)
Sure. I expect that something that is called a type by K&R (like char) will indeed be a type and not a subtype...

Ehud Lamm - Re: Type Safety anecdote
10/25/2002; 2:28:24 AM (reads: 991, responses: 0)
Read about the design of a type system, as part of the design of a programming language.