The Unreasonable Effectiveness of C

It seems to me to be sufficiently on the side of reasonable argument rather than just blithering rant, so I'm posting the link here. A Damien Katz blog entry: The Unreasonable Effectiveness of C:

For years I've tried my damnedest to get away from C. Too simple, too many details to manage, too old and crufty, too low level. I've had intense and torrid love affairs with Java, C++, and Erlang. I've built things I'm proud of with all of them, and yet each has broken my heart. They've made promises they couldn't keep, created cultures that focus on the wrong things, and made devastating tradeoffs that eventually make you suffer painfully. And I keep crawling back to C.

close alternatives

even after reading the blog post i still kinda detest C, so now I'm interested in C0 and Cilk as close-but-not-as-bad alternatives. i woulda said other things like ATS, but they use a GC, which is disallowed when playing by the rules of the blog post, i think.

[update: oh C0 probably doesn't really count since it doesn't claim to be able to work with C libraries.]

Pretty sure ATS doesn't

Pretty sure ATS doesn't require a GC if you stick to the subset that doesn't require the runtime.

No GC needed

You are correct: you don't need the GC for ATS. Much of the runtime also doesn't need the GC. The libraries often have a GC version of a data structure and a non-GC version. Almost all of my ATS programming has been done without using the GC.

Often the not-as-bad C

Often the not-as-bad C alternatives miss the point as alternatives. C is neither a high- nor a low-level language, because it exists at two levels at once. Each structured object can be manipulated as a buffer, which is the reason for both C's benefits and its dangers. "Close to the metal" just means this correspondence.
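To make the correspondence concrete, here is a minimal sketch (illustrative code, not from the comment): the same object can be manipulated through its typed struct interface or as a raw buffer of bytes.

#include <stdio.h>
#include <string.h>

struct point { int x, y; };

int main(void)
{
        struct point p = { 1, 2 };
        unsigned char bytes[sizeof p];
        size_t i;

        /* the structured object, viewed as a buffer */
        memcpy(bytes, &p, sizeof p);
        for (i = 0; i < sizeof p; i++)
                printf("%02x ", bytes[i]); /* its raw representation */
        printf("\n");
        return 0;
}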

So it isn't quite that one could preserve the benefits of C by adding guards everywhere or by using more complicated internal representations. Instead, one could help programmers achieve what they want by using, e.g., monadic-style programming, where for certain (if not all) manipulations a struct has to be wrapped in a monad that may provide additional safety checks - or does nothing, which would reproduce the current behavior.

Ordinary safety checks as envisioned by better-than-C alternatives may not suffice in certain domains. In security-oriented programming, data elements may be guarded by a checksum, which is checked on read and re-computed on write. A mismatched checksum may indicate corruption or an attack and irreversibly kill the system, i.e. leave a trace in a section of persistent memory that is checked on startup. So while C doesn't offer anything here out of the box, programmers at least don't have to fight semi-intelligent compilers and their default assumptions. This is the primary reason why many people believe that C cannot be significantly improved. Static metaprogramming may be the way to go. Maybe one might even get rid of the preprocessor?
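Out of the box, C at least lets one hand-roll such a guard. A hypothetical sketch, where guarded_t, checksum and die are invented names rather than any real API:

#include <stdint.h>
#include <stdlib.h>

typedef struct { uint32_t value; uint32_t sum; } guarded_t;

static uint32_t checksum(uint32_t v) { return v ^ 0xA5A5A5A5u; }
static void die(void) { abort(); } /* stand-in for "kill and leave a trace" */

static uint32_t guarded_read(const guarded_t *g)
{
        if (checksum(g->value) != g->sum)
                die();                 /* mismatch: corruption or attack */
        return g->value;               /* checked on read */
}

static void guarded_write(guarded_t *g, uint32_t v)
{
        g->value = v;
        g->sum = checksum(v);          /* re-computed on write */
}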

Thanks for the pointer to ATS.

Thanks for the pointer to ATS; it looks interesting, and I had not stumbled across it before.

Unreasonable admiration of C

Katz begins the article saying "C is the total package". Then he faults Erlang for having a bug in it and says that any language that abstracts too far from the computer has a fundamental problem. He finishes the article by saying he would like to make a better C; however, he says, making a better C might not be possible or even worth the effort.

So which is it? Is C the total package? Or is it lacking so much that it's not even possible to fix it?

There's obviously a reason why Couchbase is written in Erlang instead of C. It's that reproducing, in C, the functionality that Erlang brings to Couchbase would be a huge effort that he might not ever get right. So much for C's 'effectiveness'.

don't fully disagree

i'm not bought into the blog post; e.g. a counterpoint (if we are blog fighting) is here, with counter-counter-points in the comments of course.

i wouldn't personally want to use C, and have been using e.g. haxe of late for fun, but at the same time i do pretty much agree that there is tremendous danger in using a VM or sufficiently smart compiler. :-) or in not having a real debugger, etc.

the irony of course is that these days i figure C compilers are pretty crazy technology in and of themselves wrt getting performance out of the underlying machine.

"everything is a trade-off." and the trade-offs are then evaluated subjectively.

nevertheless, it got me considering what kinds of things don't rely on vms and jits and gcs. :-)

'restrict' makes the

'restrict' makes the aliasing performance complaint obsolete, although the analysis cost is pushed onto the programmer.
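For instance (a minimal sketch, assuming the usual C99 semantics): with both pointers restrict-qualified, the compiler may assume the regions do not overlap and vectorize the loop instead of re-reading src after every store through dst.

/* restrict asserts that dst and src never overlap, so the loads from
   src need not be repeated after each store through dst. */
void scale(float *restrict dst, const float *restrict src, float k, int n)
{
        int i;
        for (i = 0; i < n; i++)
                dst[i] = src[i] * k;   /* candidate for vectorization */
}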

restrict is clumsy

The restrict keyword is quite clumsy for expressing alias information. For example, you cannot express that 'a' and 'b' never alias each other while both may alias 'c'.

You can specify that *p and

You can specify that *p and *q never alias one another for the lifetimes of p and q.
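In code, that guarantee looks like this (a small sketch):

/* p and q are restrict-qualified, so the compiler may assume the store
   through q cannot modify *p and fold the return value to 1. */
int f(int *restrict p, int *restrict q)
{
        *p = 1;
        *q = 2;
        return *p;                     /* constant-foldable: no aliasing */
}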

C doesn't provide a standard

C doesn't provide a standard ABI, and many C compilers produce mutually incomprehensible output.

On the other points, he might do well to look into Go or Limbo.

Architectures provide a standard C ABI

The comment that C does not provide a standard ABI is rather bizarre.

Most if not all ABIs are defined in terms of the C ABI for a platform/architecture. I can think of one or maybe two ABI issues with C over the last 20 years. Nor have I ever worried about code compiled with different C compilers being ABI-incompatible.

Perhaps you were thinking of C++ which has had a fair share of issues over the years.

The ABI for x86-64 for

The ABI for x86-64 on Windows (Win64) is different from the Unix ABI (arguments are passed in different registers). In fact the same is true of Win32 functions, which use the Pascal calling convention, although most x86 code still uses the standard C calling convention.

C ABI == OS ABI

The C ABI only exists as a common ABI on OSes written in C; that is not the case on the few OSes written in other languages.

Katz has some good points

Katz has some good points. C has essentially removed the need for almost all assembler programming. Entire operating systems used to be written in assembler; today, even 8-bit microcontrollers are programmed in C. The problem with object-oriented languages has turned out to be painful levels of library complexity. See any Java stack backtrace, or read some C++ templates. Some languages seem to encourage excesses in that direction. That's an important point to consider in language design.

C's big problems involve reliability and security, not expressive power. The language doesn't express how big many arrays are, which leads to buffer overflows. The language doesn't express who owns what, which leads to memory leaks and dangling pointers. The language doesn't deal with concurrency, which leads to race conditions. Most real-world problems in C come from those issues. (I've previously described how to fix the first one, but that's another issue.)
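A minimal sketch of the first problem (illustrative code): once an array decays to a pointer, its size is no longer part of the type the callee sees, so nothing in the language prevents the overflow below.

#include <string.h>

/* The parameter carries no size information, so the language cannot
   reject this call: strcpy writes past buf whenever strlen(name) >= 8. */
void greet(const char *name)
{
        char buf[8];
        strcpy(buf, name);             /* buffer overflow waiting to happen */
}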

Previously on LtU:

see also: http://lambda-the-ultimate.org/node/3915

You don't have to like it, but you have to respect it.

it's a question of style.

Is C better than C++? I think it's a trade-off that comes down to the following question: does the longer compilation time of C++ buy you a higher level of abstraction that is worth the cost?

The answer depends on the programming style:

If one is doing lots of fine-grained objects - more objects and smaller ones at that - then C++ is better, as C will be too tedious.

If one is doing coarse-grained objects, then C is all right.

Also, the benefits of exception handling are a controversial topic, and templates are hard to do well; so C is still the preferred choice for many programmers.

C provides one highly useful non-leaking abstraction.

C does provide one fairly comprehensive abstraction. So comprehensive in fact that most folks don't even remember it's there.

C's instructions operate on a fairly uniform abstract machine that papers over most complexities of the actual hardware across many diverse platforms.

We don't write code that deals explicitly with registers and L1 cache in C for example, because that is effectively abstracted over by C's uniform model of the machine and memory. How to manage these resources to achieve the semantics set out by the C code is quite properly left to the compiler.

I mention this abstraction of hardware in C precisely because it is that very rare thing: an abstraction which does not leak. And because it doesn't leak, it has become almost completely invisible to everyone except assembly-language programmers.

Pretty much every higher-level language provides abstractions that leak, exposing their seams and workings at inopportune moments and places, or imposing intolerably, unnecessarily bad performance in some cases, and therefore inspires people to code below the level of the abstraction - which means not using that higher-level language.

Such abstractions never quite fade into invisibility the way C's abstraction over the hardware has, because every "abstraction leak" and "performance penalty" is complexity that must be internalized by the programmer. The programmer learns what usage patterns evoke performance penalties, and learns to avoid using those abstractions in those ways. Usually the programmer learns this by studying how the abstraction is implemented, which he oughtn't need to do if the abstracted resource were managed well.

It is widely considered true that you have to understand both a leaky abstraction *and* its implementation in order to use it effectively, and that unnecessarily increases the cognitive load on users of a higher-level language. If an abstraction doesn't leak, on the other hand, you do not need to understand its implementation.

It may be best not to provide any abstractions unless they are completely, totally leak-proof, and the underlying implementation is highly effective in taking advantage of the resources you're abstracting over. So effective, in fact, that no one can reasonably expect to do better with any reasonable effort, and no one has to think of those resources.

Only then do your abstractions achieve the "simplicity" of C's hardware model. We are vaguely aware that the machine that's running the code has registers, L1 cache, cache coherency hardware, and probably an MMU built into the CPU -- but because we can trust the C compiler to manage these things as well as or better than we could ourselves (or at least so well that it's not worth our time and trouble to think about them or learn how to manage them) these things have all become effectively invisible.

Ray

Non-leaky?

C has a lot of implementation- and target-dependent features, such as the size and endianness of integers, and potentially the layout of pointers. Algorithms written by C developers must often be explicitly cache-aware. C's model of the machine is a useful abstraction, but I wouldn't call it a non-leaking abstraction.

The idea that we need to "understand both a leaky abstraction *and* its implementation in order to use it effectively" isn't really about whether or not the abstraction is leaky, but whether the abstraction is compositional. By definition, 'compositional' properties are exactly those for which you don't need to know the implementation of each component - i.e. P(X*Y)=f(P(X),'*',P(Y)). If your abstractions are not compositional, then most software grows monolithic because it requires a deep understanding of implementation.

C's abstractions are not very compositional, unless they fit into the narrow band of expressiveness associated with simple procedures (no signals, callbacks, main-loops).

Even local variables leak

Disagree. Even the abstraction of local variables within a function is leaky. One can take a local's address, pass it around (even let that pointer escape, for a truly nasty surprise later), and through bad pointer arithmetic destroy other local variables as well. Actually, I am hard pressed to think of a single abstraction in C that *does not* leak. One can always wield pointers and blast holes in the world underneath--and worse yet, some styles, many deep library functions, the implementations of C's lower-level facilities and operating-system support, and a lot of code in the wild *do*.
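A short sketch of that particular leak (illustrative): the address of a local escapes the function, leaving the caller with a dangling pointer into a dead stack frame.

/* The pointer outlives the variable it points to; any dereference after
   escape() returns is undefined behavior. */
int *escape(void)
{
        int local = 42;
        return &local;                 /* dangling as soon as we return */
}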

In C, non-leaking pointers FORCE leaking local vars.

I think you misunderstood. It isn't local variables that I claim don't leak; it's the way C models the hardware that doesn't leak.

The model of the hardware presented by C includes the general concept of pointers. And because the hardware model does not leak, pointers can point absolutely anywhere on the machine. To restrict pointers would force the hardware abstraction to have holes in it. It follows that everything which has a specific location in memory, including local variables, can be pointed at.

To restrict the "escape" of those addresses, you'd have to force inconsistencies in the way values are handled -- at the very least you'd have to introduce things that happen even though no specific source code instructs the machine to do them. The absence of routines that run without being specifically called is one of the few guarantees that C programmers can use (and rely on) to trace and analyze their code.

This results from presenting the simplest possible mental model of a pointer for programmers to deal with, and having it be absolutely general within the context of the hardware abstraction. Absolutely every constraint anyone might want on pointers for "safety" would make the pointers more complex and harder to understand, and would force the hardware abstraction that C programmers work with to have conspicuous gaps.

In short, if you want higher-level abstractions (like local variables) that cannot be forced to leak, you absolutely cannot achieve them while simultaneously providing a non-leaky model of general pointers.

Ray

That C isn't anymore

The model of the hardware presented by C includes the general concept of pointers. … It follows that everything which has a specific location in memory, including local variables, can be pointed at.

Not in the recent (C99/C1x) specifications. For example,

#include <stdlib.h> /* for malloc */

extern void foo (char *);
int bar ()
{
        char str = 'c';
        char * ptr = malloc(1);
        *ptr = 1;
        foo(&str);
        return *ptr;
}

compiles to

        pushq   %rax # stack alignment noise
        movb    $99, 7(%rsp)
        leaq    7(%rsp), %rdi
        callq   _foo
        movl    $1, %eax # no need to read *ptr after call
        popq    %rdx
        ret

with clang 3.2 (gcc 4.7 is similar). str and ptr are different allocations, so, while foo can do anything to str, a standard-compliant program can never get to the malloc'd data. Better still: the compiler is even able to eliminate the call to malloc itself.

The standard actually specifies pointers as offsets into objects. Even pointer arithmetic isn't allowed to go more than one past the end of the object (allocation) whence the pointer originates:

If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
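In code, a small sketch of the quoted rule:

void past_the_end(void)
{
        int a[4];
        int *end = a + 4;  /* OK: points one past the last element */
        int *bad = a + 5;  /* undefined behavior, even if never dereferenced */
        (void) end;
        (void) bad;
}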

You might argue that the spec mandates a very abstract model, but that actual implementations are more transparent. That might be true at the runtime level. However, as shown above, compilers do exploit the spec's abstractness to enable aggressive optimisations. The example above is fairly benign, but look at all the bugs caused by programs that check for signed overflow after the fact: the spec says that overflow on signed arithmetic is undefined behaviour, and compilers thus assume it doesn't happen.

Signed overflow is a bit beside the point, but still interesting, so here's an example with gcc 4.7 (clang 3.2 is similar, but doesn't even need the redundant conditional).

long overflow (long x)
{
        if (x >= 0) return 0; /* redundant but bamboozles GCC into emitting wrong code */
        return (-x) > 0;
}

is turned into

        movq    %rdi, %rax
        shrq    $63, %rax
        ret

GCC is telling us that the result of negating a long is positive iff the argument is negative. That's almost always true: the exception is LONG_MIN (check your custom bignum code; this affected Clozure CL until very recently)… but that's an overflow, and, as is commonly known, overflows never happen.

C-the-spec isn't close to the machine, and C-the-implementations have been taking advantage of that for some time now. The C that doubled as a convenient assembler doesn't exist anymore.

Besides the points made

Besides the points made above, C doesn't abstract over the L1 cache any more than machine code does. Caches are something that CPUs manage behind your back (with the exception of prefetch instructions).

Hardware details not important? Non-leaky?

If you want to write high-performance stuff then you very much need to know about hardware details: cache lines and the L1-L3 caches; virtual memory and TLB tables; how memory and the bus work. Modern hardware architecture has become terribly complex compared with the PC of the early 90s. All of this matters for writing servers, games, or any system where performance is critical and Java can't be used.

Please see "What Every Programmer Should Know About Memory" by Ulrich Drepper

http://www.akkadia.org/drepper/cpumemory.pdf

Bah

1) C doesn't expose the overflow or carry flags on integers; that's a very leaky abstraction on a basic type (see the sketch after this list).

2) nearly all languages abstract over registers and cache, so this isn't C-specific.

3) as for the other abstractions provided by higher-level languages, it depends: in C++ you're not obliged to use them (many use C++ as a 'better typed' C); in other languages you're correct.
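As a hedged illustration of point 1 (illustrative names, not a standard API): since C exposes no carry flag, the carry out of an unsigned addition has to be reconstructed from the wrapped result.

#include <stdint.h>

/* Unsigned overflow wraps (well-defined), and a carry occurred iff the
   wrapped sum is smaller than either operand. */
static uint32_t add_with_carry(uint32_t a, uint32_t b, int *carry)
{
        uint32_t sum = a + b;          /* wraps modulo 2^32 */
        *carry = sum < a;              /* 1 iff the addition carried out */
        return sum;
}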

Is an apple better than an orange?

Not all PLs were created for the same set of problems. Why would anyone compare a language that can be used to create operating systems with languages that cannot? I read one blog post (cited above) that said Fortran was way faster than C. Can Fortran be used as the source for an operating system? Can Java? Haskell? etc.

If C/C++ is to be criticized, wouldn't it make sense to compare it to one of its peers rather than PLs that were clearly designed for other programming problem domains?

Does anybody believe that there could exist just one programming language that is better than all others in all problem domains? Is a Swiss army knife better than a Bowie knife or a good pair of scissors?

Shouldn't there be some context in the discussion of the relative merits of PLs and the features they possess? Continuations might be a great feature in a language that you are using to build another PL, but are they appropriate in a language used to make an accounting system for a company? A web-based game?

Fallacy of the Right Tool

Can Fortran be used as the source for an operating system? Can Java? Haskell?

Yes. Yes. Yes.

If C/C++ is to be criticized, wouldn't it make sense to compare it to one of it's peers rather than PLs that were clearly designed for other programming problem domains?

C or C++ can reasonably be compared to any other general purpose programming language. They're all peers.

Is a Swiss army knife better than a Bowie knife or a good pair of scissors?

Tools are a misleading metaphor for languages. Tools go away when the project ends. It is easy to switch between tools many times during development. Tools rarely need to be integrated (though it can be convenient to package them together, like a Swiss army knife or a toolbox).

Languages are materials: balsa, bone, fiberglass, steel, Lego bricks. Materials stick around when the project is finished, and may occasionally need maintenance or extension. Materials have varying qualities, e.g. robustness, rigidity, resilience, flexibility, mass. Some materials are brittle and break easily if stressed; some might cut you upon doing so. Integrating different materials, i.e. into composite materials or fused components, is often a challenge with impedance issues (though it may be worth the overhead in some cases).

Treating languages as tools to make a point in an argument is the 'fallacy of the right tool'.

One can certainly compare materials, and judge some more effective for some purposes than for others. But due to the integration challenges, it is quite fair to criticize general purpose languages (e.g. C/C++, or Haskell) for any problem domain, whether it be machine learning, operating systems, or web apps.

Similarly, treating languages as an apple vs. orange comparison is an analogy that breaks down very quickly. Apples and oranges at least have similar pre-processes (growth on trees, shipping, sales) and similar post-processes (digestion in stomach, nutrients). If you want languages to be a matter of taste, then you'll need to guarantee similar properties - e.g. different syntax but a common model for development, linking, and integration. There are some ecosystems of languages that have such properties (e.g. .Net).

Shouldn't there be some context in the discussion of the relative merits of PLs and the features they possess?

With a general purpose language, every context is fair game. You can bring up GUIs or web-app servers or OS or any other context if you want to make a particular point.

Continuations might be a great feature in a language that you are using to build another PL but is it appropriate in a language used to make an accounting system for a company? A web based game?

Interestingly, the idea of continuations for stateless web-app development is quite useful. I vaguely recall that some OCaml project was the first to use them. Continuations in distributed systems can model mobile agents. I imagine you could model a valid distributed account transaction system using continuations.

It can be difficult to judge whether a given feature is useful for a problem or domain. I believe we can make sound judgements only based on language properties, not language features.