"C Is Not a Low-level Language"

David Chisnall, "C Is Not a Low-level Language. Your computer is not a fast PDP-11.", ACM Queue, Volume 16, issue 2.

"For a language to be "close to the metal," it must provide an abstract machine that maps easily to the abstractions exposed by the target platform. It's easy to argue that C was a low-level language for the PDP-11.
...
it is possible to make C code run quickly but only by spending thousands of person-years building a sufficiently smart compiler—and even then, only if you violate some of the language rules. Compiler writers let C programmers pretend that they are writing code that is "close to the metal" but must then generate machine code that has very different behavior if they want C programmers to keep believing that they are using a fast language."

Includes a discussion of various ways in which modern processors break the C abstract machine, as well as some interesting speculation on what a "non-C processor" might look like. The latter leads to thinking about what a low-level language for such a processor should look like.

Comment viewing options

C

That's a good article and illustrates a number of things. One is the observation of what happens when a bunch of people without even first year maths try to write a specification without understanding how to define representations. Technically the representation model is entirely non-normative because it's logically circular.

Worse, even if you excuse the circularity some changes from C98 to C11 introduced dangerous, exploitable, over-specification. For example instead of specifying simply that the representation of non-char integer types was monomorphic, the committee decided to fix three allowed representations (1- and 2-complement or signed mag) and then allow the "bits" to be scrambled possibly with holes added. This excludes an otherwise perfectly valid BCD representation of integers, but worse, a programmer can now *calculate* the representation and legitimately bit-tweak an array of char and cast it to an integer. They can also hide secret data in any "padding" bits. This makes optimisation very hard and it makes programs incomprehensible.

Of course programmers regularly do bit fiddling: everyone uses cheats like fiddling the low bits of pointers to store extra information (my own garbage collector relies on malloc returning pointers with the two low bits clear).

C has always used a PDP-11 like model of the world and I suspect the article misses the point when saying that modern processors no longer fit this model, so C is no longer a good low level language: the real point is that hardware designers have indeed constrained their designs so OS kernels run faster than their competitors, resulting in perverted development of processors. Indeed the failure of the IA64 could be because with two stacks, it breaks the huge number of cheats used in low level code. Again my own GC assumes there is only one stack because C fails to provide a function to obtain the stack range for the kind of analysis a conservative scan requires .. and it assumes the pointers on the stack are aligned machine words.

The C stack is the biggest obstacle to advanced programming techniques: my own system has to use malloc allocated heap frames to obtain micro threads, retaining C/C++ compatibility, whereas Go uses (very clever) method of suballocating p-thread stacks at the cost of losing C compatibility. In both cases the stack is an evil enemy and paged virtual memory defeats simple microthreading.

There's little hope when people continue to expect computing to be deterministic. The human brain and its peripherals demonstrate that to integrate both the massively parallel capabilities (for example edge detection in the simple three layer neural net that forms the retina) with the primarily sequential logic of conscious thinking we have to accept a degree of non-determinism: roughly speaking we have to give up the idea it is possible to serialise contended access to shared memory.

Determinism is good

I would argue that only specialized performance critical computing tasks should be non-deterministic. Getting the same answer when you do the same thing twice, possibly on two different machines, is hugely valuable.

Determinism

A friend, learning C, asked me for help in understand the results of some such expression as

a = (b++) + ((b++) + (b++))

It emerged that one could run this on different compilers, even on the same machine, and get about as many results as one tried different compilers, introducing several useful ideas such as areas in which standards, even if written unambiguously, are especially likely to be unreliably conformed to, and kinds of code one ought in practice to avoid writing so as not to invite problems.

That expression is undefined

That expression is undefined as per the language definition. See for instance the discussion in stackoverflow So it's perfectly fine for diffrent compilers to give different results.

compiling undefined behavior

AFAICT, when a compiler encounters undefined behavior, the only "perfectly fine" response is to complain to the programmer. Applying an ad-hoc interpretation seems extremely dubious.

UB is dynamic

The problem is that most of the UB in C/C++ is dynamic, i..e, cannot generally be diagnosed statically. And it exists because the language designers don't want to pay the cost of even the cheapest runtime checks either (or in some cases, because the language is constructed such that it simply cannot be checked at runtime).

Sanitizers like UBSan try to inject runtime checks where possible, but it's far from complete or cheap.

determinism

Determinism is not useful in many applications. For example a company does not need to know the precise value of its assets, only an approximation. Search engines don't find a fixed answer but many, and the list can vary over time. Tracking systems follow targets within bounds just as drivers stick within lanes wider than their vehicles.

Almost NO real world problems require definite answers, only answers close enough to be useful. We need to not only learn to think approximately, but to accept that machines cannot go fast if they have to find exact solutions. We need to be able to build models that require you to *pay* for greater precision. For example an image in your browser should be delivered over time with increasing precision at increasing cost, perhaps by fractal decomposition and serialisation .. because you just aren't always interested in wasting bandwidth on every image.

After all, computer models of phenomena are usually approximations in the first place. Numerical analysts have lived with non-determinism since Fortran .. quantum mechanics is intrinsically non-determinate. And of course the functional programming community lauds the value of non-determinism, for that is precisely what type systems provide: approximations weak enough that static verification is possible.

Determinism is not useful in

Determinism is not useful in many applications. For example a company does not need to know the precise value of its assets, only an approximation.

I think you're confusing determinism with precision. Determinism means that your imprecise calculation will always produce the same (imprecise) result given the same inputs. That is an invaluable property, even when complete precision isn't required.

^ This

Determinism =/= Exactness

You seem to be confusing two ideas. All of the examples you give above would benefit from determinism, even the examples where we don't generally have it. Haven't you ever asked someone to "google XYZ and see the third link"? It doesn't always work because Google isn't deterministic (or at least depends on other state), but it often does and it's nice when it does.

There are applications where determinism is critical, such as in certain games where all of the players maintain their own copy of a complex simulation. Approximation in the simulation is fine, but we need all of the players to get precisely the same answer or over time small changes snowball into drastically different game states.

Determinism should almost certainly be the default in a programming language. Sometimes we need to trade it for performance, but that decision should always be explicit and obvious.

Decent article, but has some unfortunate sensationalism

The fact is that most of these points are also true of assembly. And assembly is the lowest level programming language that 99.9% of programmers will ever touch. So, the conclusion would then be that there are no low-level programming languages in existence.

But that takes away a very useful way of thinking of and speaking about things. Level of abstraction is a continuum, and it's very useful to be able to speak about the difference between C, which is a relatively thin layer over the underlying computing model, and something like Java, which is a thicker level, and then you have your Javascript, Ruby, Prolog, Haskell et al which exist even further removed from that underlying model. This remains true even if the underlying model exposed by the machine code is itself a kind of virtual machine emulating an obsolete design, hiding away things like multi-level caches and out-of-order execution and a million other details that didn't exist in 1970.

A pet peeve of mine is when people redefine words to mean things that people who use them don't intend when they use them, and then writing exposés showing you how everything you thought you knew was wrong. I have no quibbles with the technical content of the article, I just disapprove of the tone and message behind it. Programmers aren't that naive: those who work with C or even higher-level languages in domains that require absolute performance do know about and think hard about things like cache coherency, ILP, and the like.

It would be nice to have an article that explained these things in detail without simultaneously trying to invalidate a useful shorthand way of speaking about things and also pretending like the audience consists entirely of naive fools.