archives

Binary Representation - Is it something to rise above?

There is a tension in programming language design. Are we describing operations, or are we describing semantics?

I have a strong opinion that these are separate spheres - that representation should be either fully above or fully below the language's native level of abstraction - and that it is therefore hard for a language to do both well.

At one extreme, C, Pascal, Fortran, etc -- all those old classic statically compiled languages -- are primarily for describing machine operations, and specifically machine operations on bits and bytes. To the extent that they have type systems at all, the types each refer to a specific range of memory - bits and bytes - mapped directly to a template that specifies the bit-level encoding of each value.

At the other end, R4RS scheme - I think that was the last version that this mostly applied to. R4RS scheme was, aside from a few warts, a language in which caring how something was represented in bits and bytes just wasn't a relevant question. The programming model was a simple one in which things were values, or containers for values, and if you cared how a character or a number was represented in bits, then it could only be because you were making a mistake.

Numbers were numeric values, not a specific number of bits with a specific mapping of bit patterns to numbers. Characters were character values, not a specific number of bits with a specific mapping. They had no numeric values. In principle you didn't know, or even care, whether you were working on a machine were everything was represented in nine-trit "trytes" at the bottom level; there was just no reference to representation anywhere in the language's semantics, because the abstraction barrier was drawn above the level of representation.

I recently observed that C was a language for describing machine operations, and scheme a language for describing semantics, and this is what I meant. The C programming model makes no sense if types don't represent specific memory-aligned areas with specific bit-pattern-to-value mappings. R4RS scheme's programming model would have made no sense if they were. They operate at entirely different levels of abstraction.

Another topic recently asked what would be the characteristics of a good successor language to scheme, or of a future direction for scheme - and I think that representational abstraction barrier is an important property of any high-level language which ought to be preserved.

If you're not building a binary interface to something (like hardware or a communications channel that's specified in binary) and you're not working in a hardware-oriented language where those units and specific mappings of bit patterns to values ARE the sole underlying paradigm of your whole type system, then I submit that there is a deficiency in the design of the language.

Conversely, (and here R4RS scheme failed horribly) in such a language when you *are* building an interface to something, the patterns of bits you read or write should get interpreted according to a very specific mapping. The language that abstracts above representation issues for other operations, cannot be used to interface with anything unless the programmer specifies explicitly what mapping of bits to values is to be used for I/O. And this is because you're interfacing to things that *DON'T* have a common language for communication that abstracts above the level of representation, so you can't keep the abstraction barrier above that for the channel itself. A particularly egregious failure of this principle happened in a graphics library written in Scheme. Scheme provided no 'blob' type, and there was no way to specify a bits-to-values mapping on the I/O channels, so the implementor used strings to hold bytes, assuming that the I/O would be done in Ascii + Code page 1386. And of course the code exploded when used on an interface where reading and writing characters was by default done with a different code page or in UTF16.

So... in designing a programming language to operate above the representation abstraction barrier, I would absolutely ensure that bit and byte width of representations entered into the language semantics only when invoked specifically for I/O purposes. If the word "bit" came up in any chapter of the documentation other than the one about how to do I/O, I would be certain that I had made a mistake. If it were used referring to something the programmer actually needs to know to get useful work done, I would be certain that the mistake was egregious.

And this is why I am so completely certain that "uniform vectors" are a mistake in Scheme. They mean the programmer has to care about binary representation for some reason other than I/O.