Laurence Tratt: How can C Programs be so Reliable?

Henry Spencer said, "Those who don't understand UNIX are doomed to reinvent it, poorly." And that's probably why so many of the programs written in C are more reliable than our prejudices might suggest - the UNIX culture, the oldest and wisest in mainstream computing, has found ways of turning some of C's limitations and flaws into advantages.

A nice post about one man's experience with C.


C Done Well is a Thing of Beauty

No, I'm not being facetious. Look at the (open!) source code to any of id Software's game engines for examples of what I mean.

The problem, IMHO, is that to accomplish that, you essentially have to arrive at a Zen Buddhist's level of understanding and appreciation of the relationship between simplicity and correctness. C itself—the compiler, the libraries, the literature, the surrounding culture—won't help you get there. The primary reason I'm interested in PLT is that I'm interested in how the language itself—the compiler—can become your greatest ally in ensuring the correctness of what you write. This becomes especially important on development teams of size > 2-3, and I would note that my id Software engine examples were written by development teams of 2-3 also. :-)

C++ a little bit, too

Stroustrup has written too much for me to be able to find the paper / interview I have fuzzily in my memory, but he has done C vs. C++ comparisons which, at least at first glance, show how there is something somewhere inside all of C++ that could be considered kinda nifty -- the C version being horribly complicated by error handling, the C++ version magically hiding all that in a nice way.


I've been writing C code (God help me) for 28 years now, and my monkey brain is still depressingly active.

What I do find astonishing, however, is the interaction between C/C++ and Agile development. Increasingly, what employers seem to value is the ability to re-write code quickly. A few, but most notably Microsoft and Google, seem to be selecting for the ability to "deep dive" into an existing code base and make an incremental change. Not exclusively, by any means, but it's a significant focus. The idea seems to be very old-school Toyota: that the key to continued revenue is continued incremental advance, and that a large number of small incremental advances add up to perceived value in the eye of the customer. When the code is in C/C++, the implications of accumulated cruft are pretty scary. Microsoft, arguably, has done such a good job on static analysis because they really didn't have any choice.

A corollary of this is that software architecture and design are perceived as less-relevant (indeed irrelevant) skills, because (a) there is a largely justified negative sentiment about analysis paralysis, (b) churn means that architecture/design doesn't hold up over time anyway, and (c) up-front process is believed to be incompatible with quick-turn incremental releases (in spite of decades of contrary data on this). Having never had the commercial advantage that accrues from something like the S/360 Principles of Operation (i.e. a behavioral contract that you can enforce on your customers and partners), Microsoft doesn't quite appreciate what having one would mean. They had an opportunity to establish one with .NET and notably failed to do so. Google seems to be following this path. I wonder to what degree this is why both companies perceive C/C++ as "okay".

It bears note that these are easily the two most successful software companies on the planet, which raises the question: are they right? And if so, then a question for LtU: what elements should exist in a language that is designed to support survivable churn?

Agile or steadfast?

I should think that the thing about C is that it is awful for agile development, and the UNIX culture seems to be somewhat wedded to an opposite approach to development.

Without having any —God forbid— new ideas, let's call steadfast development the approach for which:

  1. The bulk of software development is about writing programs for compilation into program files, be they executable or sharable libraries, which will have a known place in the OS;
  2. Each of these pieces has a clear purpose, and it tries to fulfil this purpose as directly as possible;
  3. If what some code wants from some other code changes, you can extend that code if it doesn't disturb anything else. A bug fix? Well, maybe the consequences of fixing the bug are worse than the bad consequences of the code failing, and it's often hard to be really sure about this...
  4. If what you want from some program changes, you mustn't change the program. Code reuse is bad, so perhaps you need to write a whole new program?
  5. Some code isn't like this; we call it scripting. Scripting is about trivial gluing together of real programs;
  6. And we're happy about doing things this way!

UNIX culture isn't completely steadfast, but it's not really all that agile. C seems to be OK as a steadfast language, if it is operated at very low speeds, but lousy as an agile language.

But is that a bad thing?

You give this description as if it's a bad thing, but what you are describing are essentially the requirements for robust and sustainable systems. So if you are putting this up as a contrast to agile development, it's a strong indictment of agile methods.

I take issue with (4). It's not that code reuse is bad. It's that sometimes the requirement for preserving known behavior means that the first step of code reuse really should be copy&paste. That's not always bad. Sometimes it's simply what responsible behavior requires.

It's worth looking at the obverse. Agile may get you short-term flexibility and quicker time to market and quicker turn times, but it rarely (in practice) leads to coherent or specified (not even post-specified) systems. That's not a sustainable or engineerable result. The agile folks argue that when you start you often don't know (or know wrong) what the requirements are. Well and good. But why is it that no agile project ever seems to learn what the requirements are? And yes, sometimes the requirements do keep moving, but I claim that (a) if the requirements don't converge, you are doomed regardless of method, and (b) if they do, then at some point sustainability must become an important consideration.

False dichotomy

I don't prefer agile over steadfast! I much prefer dealing with the UNIX core to, say, dynamic scripting APIs for web browsers.

But I don't see this as an either-or choice —hence my bipartisan snarkiness— and while I agree with what you say about agile development, I think that what is wrong with what I call steadfast development is that it both: (i) takes a paranoid attitude to code construction, and (ii) avoids tools that help us build confidence in code correctness. The steadfast guys aren't really better at requirements than the agile crowd.

The consequence is that, say, the OpenBSD team write their ssh suite in C, ensuring correctness through a heroic coding discipline, which, surprise, surprise, hasn't ensured correctness.

Good and Bad of Agile

I think there are many very interesting and useful concepts and practices of agile development. Pair-programming is spot on, IMHO.

But at its very worst, agile programming can degrade into typical mainframe bank/insurance data processing houses (where my father spent his career). A small army of developers endlessly changes both the code base and the (giant) data sets to continually reflect the ever changing banking and insurance policies of the company. 24/7. Nothing is ever "finished." Blech.

There are almost no invariants! So a subset of the coding army is always "on call" to address the inevitable, routine and downright frequent middle-of-the-night ABENDs that come from this degenerate form of software development. Pretty sad.

You make me feel like a dinosaur :-)

I more or less still adhere to this "steadfast" development philosophy, I suppose, although I think that there's a perfectly fine tradition of code reuse in the Unix/C derived culture.

At my old development firm, one of our software design mottos was "reliable within its (limited) function." Of course, we could apply this to software components, only conservatively changed, which in turn might be (initially) assembled in more "loosey goosey" ways.

A big virtue of the "steadfast" approach is that it makes one think very hard about an application or system's *invariants* and to ensure that these are rock solid before exploring the murkier portions of the problem space.


S/360 Principles of Operation

something like the S/360 Principles of Operation (i.e. a behavioral contract that you can enforce on your customers and partners)

I found this statement interesting; I downloaded this document, and at a rather cursory glance, this S/360 Principles of Operation looks like a regular technical reference manual.

(Sorry; a non-paywall link wasn't obvious, though I didn't do much searching)

So do you mean that the information laid out in the S/360 was basically immutable over time? What is it that makes this document special? How is it different from, say, an ARM reference manual?

The business policy

Indeed the S/360 Principles of Operation is a reference manual. It is unique in several regards:

  1. Wherever the interface is unspecified, the programmer behavior required for future-proofing is stated.
  2. Wherever actual machines turned out to diverge from the specification, it was the machine that got changed.

So on the one hand, customers could rely on forward compatibility. And on the other, IBM repeatedly told important customers who had relied on accidents of implementation that IBM felt no obligation to preserve those accidents. The contract was stated, and it was skirted at the buyer's risk.

Contrast this with Windows, where the tyranny of printer driver compatibility forced Microsoft to code-sign a whole bunch of stuff that flatly didn't meet the requirements, and a huge amount of legacy code remains, in part, because the specification for what it does isn't clear enough to make the risk of removal or replacement tolerable.

So what is really unique is both the quality of the specification and the fact that it allowed IBM not to be at the mercy of accidents of implementation or outright flaws in their design.

The ARM reference manual is a maze of underspecification on the one hand, and it doesn't say much of anything about the peripheral interconnects on the other, so it's not useful as a contract without lots of additional information. It's weak enough that ARM is effectively abandoning it as they move into the 64-bit realm.

I don't know that I would advocate today carving things so firmly in stone as the S/360 specification did. There is merit to staged evolution. But the key word is "staged".

C is like a koan.

Zen Buddhists tell each other stories that, to an analytical or western perspective, make no sense, in order to provide guideposts to enlightenment. These koans are as simple and powerful as water. They remain completely mysterious until you "get it" - at which time you discover that you still can't really explain it to anyone who doesn't.

C is like that. PLT folk will say it has horrifying flaws, that it makes no sense, that it is horribly unsafe. These things are true. But in the same way that water, or a koan, is simple and powerful, C is simple and powerful. It is the most direct available communication with the machine other than binary code and assembly language.

It forbids nothing. That is its great virtue and power and also its great vice and weakness. Attempts to cure its vice and weakness (java, objective-c, etc) also remove the virtue and power which was the only reason why such vice and weakness were tolerable.

Most other languages are "safer", make more "sense" from a PLT perspective, admit of greater analysis, etc etc etc, but in order to achieve those very different virtues, they must abandon the singular virtue of C. There are things they must forbid. The moment you forbid anything, you've created some task that cannot be done naturally or well in your language. Those tasks are why C exists, and, perhaps, why C will always exist.

It will make a nice exercise

It will make a nice exercise for students to find all the flaws in your argument, but seeing as all issues were discussed hundreds of times I'll just highlight one claim that should be added to the ever growing list of myths: "The moment you forbid anything, you've created some task that cannot be done naturally or well in your language." Perhaps most amusing is that this concludes a message that began with Zen...

C is like that. PLT folk

C is like that. PLT folk will say it has horrifying flaws, that it makes no sense, that it is horribly unsafe. These things are true. But in the same way that water, or a koan, is simple and powerful, C is simple and powerful.

Very true. I alternately love and hate it, depending on the days of the week and the phase of the moon. Right now I'm playing with a set of macros and a simple C coding style that gives you polymorphism and type class-like overloading. No joke.

If anyone wants to hasten my descent into insanity, feel free to point me to some very simple type class programs I can translate and use as examples, and I can finish my write up sooner. ;-)

I guess logic demands that

I guess logic demands that you be expelled from the class of "PLT folk" now...

I figured it would just

I figured it would just expel me from the class of sane folk. ;-)

Since in my experience these

Since in my experience these are disjoint groups, I never expected to find you (or any LtU regular) in that class anyway.

(Am I the only one who cringes when he reads, or writes, a sentence that conflates "groups" and "classes", both implicitly used to refer to sets..?)

Logical Uncertainty

Darn. If he had done this in Java, I could have replied that Ehud's logic is uncertain where his Sun is concerned.

Oh well.

Not quite what you are looking for

But the following post about inheritance of #defines is also in the category of "fun with the C preprocessor." It has a PL angle and is immediately practical.

Paul and Ray are both on

Paul and Ray are both on Zen, so let's make the Koan test.

You have abstractions and you have none. Tell what it is, quick!

The master showed a C encoded quicksort implementation on a large screen with a beamer. The student looked at it for a while. Then he became enlightened.

C does actually forbid some

C does actually forbid some things. The one that most affects my needs is that it forbids access to the overflow flags, i.e. the status register, so that you have to write overly complicated code to determine if you have overflowed (overflown?) a variable.
Also, performing proper tail recursion isn't really possible either. A goto will handle the simple case of local recursion, but doing mutual recursion can get really messy.

Now I'll admit many higher-level languages don't allow the overflow thing either, but if you use C as intended, i.e. to get down and dirty, it can be a bit of a pain.


Language Safety in C

In a discussion thread elsewhere about language safety, I noted that C is both syntax-safe and comment-safe. There are rules in both places, and all conforming compilers follow them. Then somebody pointed out the preprocessor, and I had to settle for comment-safe and tokenization-safe. Then somebody pointed out comment nesting, which breaks both.

But I continue to hold that conforming C implementations enforce full and complete safety with respect to the character set encoding of the input unit of compilation.

And given that C doesn't claim to do any more than that, I claim that a conforming implementation guards all of the lexical, grammatical, type, and semantic properties required by the standard, and is therefore a safe language.

(shap ducks abruptly)

C Safety More Seriously

While the character-set-safety thing was a joke, it arose from a serious question. There are two ways that we might define the term "safe language":

  • A safe language is one that enforces a particular set of abstractions and properties that we, in our collective herd-think as PL people (or in my case, PL hangers-on) think are important.
  • A safe language is one that specifies a set of abstractions that are to be guarded by a conforming implementation, and specifies the requirements on that implementation in such a way that a conforming implementation does, in fact, enforce them.

The first supposes that there is a language-independent definition of "safe language". The second supposes that there are only languages that are "safe with respect to ...". Which is it?

The first definition is essentially political in nature, and is subject to fads. The second definition is subject to abuse through careful crafting of claims (which is what I was illustrating with character-set-safety for C).

Of the two, I prefer the second. In my view, "type and memory safe language" means something. "Safe language" may or may not mean something according to context.

Safe languages and nerf languages

The idea that some languages are safe is pernicious. It is possible to write bad, buggy code in any language.

When people go too far in an attempt to create a "safe" language (whatever "safe" means to them) they often create a "nerf" language, in which it is impossible to do systems work or access available information that could be used to make simple tasks more efficient.

Sometimes a language is "nerf" because it simply doesn't define low-level operations, which, if defined, could be consistent with its runtime and semantics. Scheme, with its tiny standard, has a lot of this kind of problem. But there are also problems where languages are "nerf" because a principled but unfortunate choice of abstractions, which a language must enforce, puts optimal (or in the worst case all) solutions to some problems out of reach.

Here's an example of an unfortunate abstraction choice interfering with memory-management code. The portable scheme library in SRFI-38 has to identify shared structure in data structures for writing by using the scheme absolute-equality predicate eq?. eq? returns true given mutable arguments if the things referred to by the arguments are identical under mutation, ie, mutating one will cause a change in the value of another.

The usual implementation of eq? is an equality test on pointers, but that's implementation dependent. In principle a garbage collector can move things around, or a scheme runtime could move things around for other reasons. So although the eq'ness of objects guaranteed to be identical under mutation is stable, the binary value of pointers to those objects (and hence ordinal relationships between them) may not be stable over time. Hence ordinal, greater-or-less type comparison of pointers breaks a language abstraction and is rightly semantically meaningless in scheme.

Because it is impossible to directly compare the addresses or identities or even creation-times of objects using numerical operations or ordinal predicates, the operation of finding the object to which another object might be eq?, which could have been implemented as a hash or binary-search otherwise, had to be implemented as a linear search. And this means that the most algorithmically efficient implementation of this operation is simply out of reach in portable scheme.

Problems like this are rare in scheme. As a language that supports most programming paradigms well, its abstractions rarely actually interfere with expressing the best algorithm for something. But basic decisions that people make in designing a "safe" language often result in a nerf language.

C, in being designed and built for systems work, is one of the most completely non-nerf languages in existence. To the extent that its designers found it possible to avoid, it defines no abstractions whatsoever whose enforcement would interfere in any way with accessing and manipulating the lowest level of information in the memory or in the I/O ports of the host machine.

The alternative being...?

Here's an example of an unfortunate abstraction choice

I don't understand, you make it sound as if there was an alternative to this "choice". But the only "alternative" would be to require a non-moving garbage collector, which is absurd. So, unless you meant to imply that GC is "an unfortunate choice" altogether, I don't see what your example is supposed to demonstrate, especially wrt safety.

Not like that.

I didn't mean unfortunate in the sense of bad language design; Scheme is very well-designed. It is, however, unfortunate for all of us that garbage collection is an abstraction which, while making the language safer, makes it difficult or impossible to write the most efficient possible memory-management code in that language.

Automatic memory management is an example of an abstraction - a useful, reasonable abstraction that does make a language safer - which makes it difficult or impossible to write some types of code. If you have too many such abstractions you get a nerf language, where you are very safe, but have to link to routines written in other languages in order to do much of anything efficiently or directly. If you have not enough, you get a language that's dangerously unsafe and you have to build many of your own abstractions or infrastructure for effective use.

C is nearly unique as a programming language in being designed deliberately for an absolute minimum of nerf. Yes, it's dangerous, but no matter what you want to write, C doesn't have abstractions that get in the way. If you want abstractions in C, you have to build them yourself (hence Greenspun's tenth rule).

One of the ways C programs become robust and reliable is when people use it to build solid abstractions of the sort that other languages provide by default, and use those abstractions. The "meat" of the program uses a good set of routines and abstractions like a DSL, with whatever home grown nerf doesn't interfere with the task at hand.

This is a situation where I

This is a situation where I believe C#'s unsafe feature is under-appreciated. You don't give up the ability to access raw memory, pointers, etc. It's opt-in, not encouraged, can be disabled by policy by hosts, but it's there when you need it.

Good idea, weak execution

Having used it some, I agree, but it could have been executed better. There are many types of unsafety, and it isn't clear to the programmer in C# what "unsafe" really means. So it tends to get sprinkled about a lot in places that don't need it. Also, programmers don't understand where they will/won't get an advantage from it. Finally, you can have an unsafe implementation without an unsafe signature, and that leads to all kinds of quagmires.

It's an idea on the right track, but not (unsurprisingly) quite right in its first realization.

C can get in the way

C doesn't have abstractions that get in the way

If you've ever, say, written a compiler that uses C as its target language then you know that this is simply not true. There are quite a few abstractions in C that can get in the way (and unfortunately, without giving the benefit of safety in return). Some of them have already been mentioned earlier in this thread. That's why the C-- project was started, btw.

Abstractions, or insufficiencies?

The only abstractions I can think of that get in the way when using C as a target language are its stack discipline, binary representation choices, and calling-frame structure conventions. It's hard to imagine a meaningful language without those abstractions in some version, so I count that much as minimum-nerf. I can't think of any language (except maybe Forth) that has less nerf.

But you can code around the binary representations directly (defining a counted-string instead of a zero-terminated string is a first step), so there's really very little in the way.

Most problems using C as a target language don't arise from its abstractions getting in the way; they arise from the very few abstractions it has (such as the correspondences between binary representations and numeric data type) being underspecified.

Assembly not meaningful?

Is assembly language not meaningful? ;-)

Abstractions found in C are, among others: the entire notion of a function and the whole stack and calling conventions that come with it, local variables, goto labels (which can only be static and local), arithmetic (which hides interesting status bits of the processor), the type system, basic data types, and so on and so forth.

Standard example: you cannot represent proper tail calls in C.

Having done this...

The subset of C that you can use safely as a code generation target is much smaller than people think. For starters, C's sequence point semantics is terribly underspecified and fairly subtle. There are a lot of other things that are underspecified as well. Truth be told, the only merit of C as a code generation target is quasi-portability and bootstrapping.

Undefined behaviour

The way I interpret Pierce's informal definition in TAPL is that a "safe language" in the general sense is simply one that does not have undefined behaviour (not to be confused with non-deterministic behaviour). That is, the possible set of behaviours - i.e. the semantics - of any program is fully determined by the language definition, and not by random interference with the implementation's environment. I don't see much of a problem with this definition.

Is the percentage of

Is the percentage of reliable C programs higher, or simply the absolute number of them, I wonder...

Well clearly

Let N be the number of C programs, R the number of reliable C programs. If R>0, then,

100*R/N < R iff N > 100

Since there have clearly been more than 100 C programs written, it must be the case that the percentage of reliable C programs is higher than the absolute number of reliable C programs. :^)

Ha ha. The question was

Ha ha. The question was obviously "is the percentage of reliable C programs higher than the percentage of reliable programs written in other languages, or is it just that there are more reliable C programs than reliable programs written in other languages". Do formalize that, if you must.

There ya go

Ladies and Gentlemen,

Is the percentage of reliable C programs higher than the percentage of reliable programs written in other languages, or is it just that there are more reliable C programs than reliable programs written in other languages?

I guess you must.

I guess you must.

I guess I'll do both

I'm torn between apologizing for subjecting you to the corny humor and pointing out that this is affirming the consequent. My apologies.

If only "if" behaved this

If only "if" behaved this way. This is starting to be amusing...


There are a lot of C programs that have had a lot of time to be debugged.

Granny & grandpa have had a lot of water flow under their bridge, you know.

It depends (sorry)

If by "reliable" we mean "type and memory safe", then the answer is certainly no.

If by "reliable" we mean "programs that have run without observed error in the field, unattended, for over 25 years", then the answer is yes, because (a) there exist at least two examples in C (KeyKOS, 5ESS), and (b) on grounds of youth, the only other candidate language would be Ada, and few of the programs written in Ada operate continuously. Certainly I'm not aware of any Ada programs that have gone non-stop for 25 years. Airplanes don't stay powered up that long.

If by "reliable" we mean "production programs that have been formally verified", then the answer for C appears to be one, for Ada appears to be several, and for all other languages combined appears to be zero, so comparing against Ada no, but against all non-Ada and non-C programs combined, yes.

I rule out academic exercises intentionally, based on what I took to be the spirit of your question.

If I read it well... it summarizes to:

"Think, or Core Dump" is not such a bad teacher when writing portable assembly.

It's not as portable as you

It's not as portable as you might think :)

That's definitely part of the point

That's definitely part of the point. However another part of the article (assuming I wrote it clearly enough, which I probably didn't!) puts forward the idea that the way that C libraries (in particular the Unix libraries) are forced to handle errors (e.g. no exceptions; everything's done manually) makes recovering from problems a tractable issue. extsmail, for example, can recover sensibly (modulo bugs) from pretty much any issue which arises except running out of memory. I wouldn't want to do that in any exception-based language that I know of. That insight, while probably not original, surprised me.

Two sides, one coin

puts forward the idea that the way that C libraries (in particular the Unix libraries) are forced to handle errors (e.g. no exceptions; everything's done manually) makes recovering from problems a tractable issue

The virtues you are assigning to C are basically the direct result of it being a very low-level language. And all of its faults are also a direct result of it being a very-low level language.

Being very close to the machine, there aren't a lot of layers of functionality to deal with, so the range of possible problems is smaller. This lets you deal with them exhaustively.

With a higher-level language, there are more layers between the program and the machine, which means that it can sometimes be quite hard to tell at what level a problem has occurred, or what problems might occur. There's always some unpredictable failure mode.

The plus side of the higher-level language is that you don't have to worry about many of those lower layers most of the time, whereas, as you point out, with C you do have to, which is its down-side.

So, in the end, what you are saying is that low-level programming is qualitatively different from high-level programming, which is true.

I think the danger with the way you have framed it, though, is that you are making it sound like these strengths and weaknesses are as result of the moral strengths and weaknesses of the respective languages and their communities, rather than being a function of the circumstances which they were designed for.

The - surprising to me - strength of C

The - surprising to me - strength of C as put forward in the article is that despite its failings (moral or otherwise), communities (virtuous or otherwise) have been able to use them to make a certain, very important, style of programming surprisingly easy. Whether that is the product of deliberate design or happy accident is hard to quantify, and would need a detailed historical study, which is beyond my capabilities (although I can't help stating my bias: my experience, perhaps cynically, would suggest that it's probably happy accident).

Higher Level Code and Failure Modes

With a higher-level language, there are more layers between the program and the machine, which means that it can sometimes be quite hard to tell at what level a problem has occurred, or what problems might occur. There's always some unpredictable failure mode.

Is that really the case? Some counterpoints:

  1. C is only close to two parts of the machine: the CPU and memory. Most failures encountered in C are caused by neither of these. I do not expect the precision of errors in C to be greater than it is in other languages.
  2. While a higher-level language may be a bit lossy about exactly why a problem occurs, that does not prevent the language from controlling where and how partial failures are exposed to your code. That is, failure modes can be made predictable.

Three Exception Monte

I do not expect the precision of errors in C to be greater than it is in other languages.

I was responding within the context of using Unix-style libraries that Laurence's piece referenced. Obviously if you use C in a more complex environment, you might run into the same "higher level" precision problems.

This also ignores C coding errors (off-by-one, pointer arithmetic, etc.)

While a higher-level language may be a bit lossy about exactly why a problem occurs, that does not prevent the language from controlling where and how partial failures are exposed to your code.

Sensible recovery from a failure (should I retry, should I ask for user input, should I crash the program, etc.) typically requires a fair amount of information about where and why the problem occurred. The more information is lost or fuzzy, the harder it is to exhaustively enumerate the possibilities of what happened so that you can choose an appropriate range of responses.

Sure you can have a blanket "swallow the error" or "crash the program" policy (as many programs do...), but that lacks the virtue that Laurence was extolling. ;-)

Sensible handling of failure

Sensible handling of failure must be based on the locally observable symptoms of an error, rather than its cause. Controlling failure-modes, then, means controlling how symptoms of failure exhibit to your clients. Sometimes this means trading one sort of failure for another. Often, controlling how failures exhibit requires escalating the failure: in general, partial failure is much more difficult to recognize and recover from than is a total failure (which is the basis behind transactional and 'do or die' error handling).

Blanket policies for how symptoms exhibit are quite reasonable: "throw an exception" or "return an error code" or "shut-down a sub-program and alert a supervisor" or "call an error-handler object for advice". In a distributed scenario, many errors can exhibit as disruption (link or node failure). [edit]: As Thomas notes, in Unix many errors can exhibit as an externally killed process.

Resilience, or failure recovery, mostly needs information about context. Answering your questions - "should I retry? should I ask for user input? should I crash the program? etc." - first requires answering "am I under a time limit? is a user available? does the architecture support recovery from a crash? from where might I begin a retry?".

If you want to develop code reusable in different contexts, it is necessary to pass some of the context into the code that might exhibit symptoms of failure. This can be done via a pass-a-handler approach, or via dynamic scope, or via resumable exceptions mechanisms. Usefully, once you pass these components into the system, those previously "blanket policies" become specific to particular calls... i.e. a handler is free to ask a user, maintain awareness of time limits, log the error, crash the program, or even attempt 'retry'.
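The pass-a-handler approach described above can be sketched in C. This is a toy example with hypothetical names (not from any library mentioned in this thread): the fallible operation knows only the symptom; the caller-supplied handler carries the context and decides the policy.

```c
#include <stdio.h>

/* Symptom passed to the handler: what went wrong, not why. */
typedef enum { ERR_NOT_FOUND, ERR_EXHAUSTED } err_t;

/* The handler decides the policy: retry, give up, log, ask a user, ... */
typedef enum { ACT_RETRY, ACT_ABORT } action_t;
typedef action_t (*err_handler_t)(err_t symptom, void *ctx);

/* A fallible operation that consults the caller-supplied handler
 * instead of hard-coding a blanket policy. The "work" is a stand-in:
 * negative keys fail, and a retry flips the sign. */
static int fetch_with_handler(int key, int *out,
                              err_handler_t on_err, void *ctx) {
    for (;;) {
        if (key >= 0) { *out = key * 2; return 0; }
        if (on_err(ERR_NOT_FOUND, ctx) == ACT_ABORT)
            return -1;
        key = -key;                    /* the "retry" path */
    }
}

/* One possible policy: retry exactly once, tracked via ctx. */
static action_t retry_once(err_t symptom, void *ctx) {
    int *tries = ctx;
    (void)symptom;                     /* policy here ignores the symptom */
    return (*tries)++ == 0 ? ACT_RETRY : ACT_ABORT;
}
```

A caller with a different context (a time limit, a user to ask) would simply pass a different handler, which is exactly the point: the "blanket policy" becomes specific to particular calls.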

In a more OOP setting (where elements persist across calls) you can also provide context for error recovery during system construction, i.e. via dependency injection, or annotations to support cascading disruption (i.e. assert that object B should become disrupted if A becomes disrupted - permanent disruptions being subject to garbage collection).

In any case, the cause of an error should never be part of failure handling or recovery. You must depend on symptoms and context. If the locally observable symptoms are sufficient to infer a unique cause, that's just a lucky accident.


One corollary to this is that recovery, when possible, is always local. In certain styles of programming, we can make a case for (a) local recovery followed by (b) abandoning a speculative computation, but each is local with respect to the layer of abstraction at which the failure becomes visible.

A second corollary is that declared exceptions at procedures, in addition to being immodular, are silly. The only parties who will handle a traversing exception are very local to the point of raise, and programmers don't need annotations to keep track of such local effects. Conversely, it isn't helpful to have 30 lines of declaration of exceptions that you couldn't handle usefully even if you knew what caused them.

A third is that downward-recovery (e.g. SEH) is just a bad idea. Whatever is true about systemic knowledge at the level of abstraction of the program at the point of raise, going to a lower level of abstraction (and thereby throwing away comprehension) can't help.

Error Recovery

A plain 'exceptions' mechanism is a bit painful because you lose context, and therefore you cannot recover. But you should review resumable exceptions mechanisms, as seen in Lisp, Dylan, Smalltalk. Thread with related discussions and links.

That's a property of the

That's a property of the Unix libraries you're using, not the language. For the C language in general, the options are 1.) handle error condition, 2.) manually bubble error condition up to caller, 3.) silently (and/or accidentally?) swallow error. This is exactly like exception-based languages except that in C option #3 is syntactically more convenient.
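Options 1 and 2 might look like the following sketch (the function names are hypothetical). Option 3 is simply ignoring `read_config`'s return value, which C permits without a word of complaint.

```c
#include <stdio.h>
#include <errno.h>
#include <string.h>

/* Option 2: manually bubble the error up to the caller.
 * This function takes no recovery decisions of its own. */
static int read_config(const char *path, long *size) {
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return errno;                  /* propagate, don't decide here */
    if (fseek(f, 0, SEEK_END) != 0 || (*size = ftell(f)) < 0) {
        int saved = errno;             /* fclose may clobber errno */
        fclose(f);
        return saved;
    }
    fclose(f);
    return 0;
}

/* Option 1: the caller handles the condition and applies a policy
 * (here: log it and fall back to a default). */
static long config_size_or_zero(const char *path) {
    long size = 0;
    int err = read_config(path, &size);
    if (err != 0) {
        fprintf(stderr, "%s: %s\n", path, strerror(err));
        return 0;
    }
    return size;
}
```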


I agree with the 3 options you outline

I agree with the 3 options you outline, but I would say the major difference with exception-based languages is that errors can't be propagated automatically in C - every caller of a function must handle them explicitly (unless one starts fiddling with the stack). So I would argue that in exception-based languages it's very difficult to know what exceptions a function might throw, because there's no need to manually enumerate them as there is in C (and polymorphism can really muddy the waters). Therefore I would suggest that the Unix libraries reflect the constraints imposed by C, rather than there necessarily being something unique about them.

How is that any different

How is that any different from any other solid library? Any decent library has to document its failure modes and preconditions, and it needs to do so in documentation as well as in the source code.

How do you know that your sendmail utility is actually robust--that there are no error conditions being swallowed in the code you call? Did you manually examine the source tree for every POSIX function you called, and every function they called, down to the OS, and make sure that every error code generated was received and propagated or handled? If so, then yes, having manual error mode propagation would make it easier to keep track. My guess is that you didn't do that, that you're willing to take this on faith because you only call POSIX functions, which lots of people have been beating on for a long time, and because you cover all the error conditions which they bubble up to you and trust that any error conditions which aren't bubbled up were handled and not swallowed. I.e. you relied on documentation and the fact that the library works well for many people and has for a long time. The fact that C makes it easier to swallow errors than to propagate them should, if anything, make you more wary that there could be lurking failure modes (integer overflow!) because they just silently corrupt execution state instead of crashing programs and making people deal with them. That's why exceptions were created in C++ in the first place, because they make it harder to ignore failures accidentally.

Ultimately of course the only way to really know what error conditions are lurking is with program analysis tools--which would be easier to create for C than C++, admittedly, because C++ is so complex, although not as easy as for other languages that support introspection (Lisp? C#?).


In a sense, we almost agree

In a sense, we almost agree. Good libraries should document their error conditions. However, exception-based libraries almost never do, because generally it isn't an issue: you're not expected to care about most exceptions. In C, you have to carefully document everything or else you're stuffed. Sometimes the right approach is to swallow errors; extsmail does that often (deliberately). And don't forget that in C, the choices aren't just propagate or swallow: one can also panic and terminate in the face of certain errors (and extsmail does that in the face of out-of-memory conditions).
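The panic-on-out-of-memory policy is a common C idiom; the `xmalloc`/`xstrdup` names below are conventional but hypothetical here, and this is a sketch of the style rather than extsmail's actual code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Treat out-of-memory as unrecoverable: terminate rather than
 * propagate or swallow. Every caller is thereby freed from checking. */
static void *xmalloc(size_t n) {
    void *p = malloc(n);
    if (p == NULL) {
        fprintf(stderr, "fatal: out of memory (%zu bytes)\n", n);
        exit(EXIT_FAILURE);
    }
    return p;
}

static char *xstrdup(const char *s) {
    char *p = xmalloc(strlen(s) + 1);
    strcpy(p, s);
    return p;   /* callers never need to check for NULL */
}
```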

It's not as if I'm anti-exceptions: I designed an exception-based language. But I still find it odd that I can write a truly robust program (of a certain, slightly peculiar, type, perhaps) more easily in C than in any exception-based language I know. All I'm doing is trying to reason about why that might be.

Component Failure

I think a lot of his critique has to do with the fact that exceptions are just handled 'better' in C.

I.e., the model mostly used in, say, Java with explicit exceptions leads to a programming style where an exception is raised, and the exception is handled 'non-locally' at a try/catch block, where it's then the responsibility of the calling 'component' to solve whatever was at hand and put the called 'component' in a correct state.

Since C doesn't raise exceptions explicitly, it leads to a programming style where the exception is handled locally, the called 'component' is put in a correct state, and an error is reported to the calling 'component,' basically stating: "this component failed (because of this and that reason)."

There is a lot to say for the C model of handling exceptions locally; you can do the same in Java, but that needs a trained programmer.

I think a good convention might be to associate exceptions with 'components/libraries/objects/modules,' haven't seen [explicit language support for] that anywhere yet.

[Edit: changed local to non-local, i.e. local refers to the place/instruction where the exception was raised.]

Has anyone considered a limited exception design

C programs can pass references (pointers) to retrieve results while using the return value from a function as an error code.
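That pattern might look like the following (a hypothetical example): the return value is reserved for the error code, and the result travels back through a pointer argument.

```c
#include <stddef.h>

/* Return value carries the error code; the result is delivered
 * via an out-parameter, leaving the two channels separate. */
static int divide(int num, int den, int *quot) {
    if (quot == NULL || den == 0)
        return -1;          /* error: report, don't compute */
    *quot = num / den;
    return 0;               /* success: result is in *quot */
}
```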

Has any language implemented a severely limited exception mechanism:

  • likely for a functional language,
  • with no automatic propagation feature,
  • freeing up the "normal" use of any function's return value for the main program logic,
  • adding an "exception style" error handling syntax for immediate local error handling,
  • likely with mandatory coverage of all exceptions possibly thrown by the function (requiring a modicum of safety).

Expression level exception propagation, a la some kind of re-throw feature, might fit in with such a scheme, but I don't see it as adding much to this style of highly constrained error handling design.

This design only uses exceptions to

  • syntactically/semantically segregate error return codes/message from the "mainline" error free code logic, and
  • to enforce handling of all possible errors (presumably a match/case style expression would do the trick if errs are set up as algebraic data types).
  • optionally, error recovery/resumption feature might fit in well with this scheme (segregated on error pattern matching), but again only in the same highly constrained and local context of a single function's possible error modes and associated reporting to the caller.

So minus propagation, a very limited exception mechanism might still prove quite useful and beneficial.

Any non-local control flow (via continuations or throws or whatever) and unwind-protect/finally language facilities, if present in the language, would be designed completely separately from error handling facilities.

Seems sane to me, so I must be missing something :-)


re: Has anyone considered a limited exception design

You have almost exactly described the exception handling mechanism of the CLU programming language:

1. CLU is not a functional language: it has mutation of variables and values.

2. CLU does automatically propagate one kind of exception (named "failure"). If some other kind of exception is thrown but not immediately handled, then a "failure" exception is implicitly thrown.

3. Signaling an exception immediately terminates the function which does so and returns in a non-standard way to the caller. Thus, exceptions do not compete against "normal" return values. For example, I think this is (at least roughly) valid CLU code:

    // if foo() raises a bar_exception, set a to 0.
    // if foo() raises some other exception,
    //   throw a failure exception
    // normally, set a to 1 + the value of foo()
    a := 1 + foo() EXCEPT WHEN bar_exception: a := 0;

4. The EXCEPT WHEN syntax may decorate simple statements and may also decorate basic blocks (BEGIN ... END EXCEPT WHEN ...) and so it is quite similar to what people are used to these days.

5. Functions must explicitly declare all exceptions they might signal except for the "failure" exception. "Failure" is always an option in CLU. Unhandled exceptions are permitted (for they are converted to "failure" exceptions) but of course a compiler can issue warnings in such cases.

These features of CLU are used, as you say, to segregate error handling code from normal-case program logic and to not have exceptions interfere with normal-case return values.

CLU doesn't enforce handling all possible errors but it makes it statically possible to find them. The rationale for not enforcing handling is that it would be inconvenient. For example, if the function inverse(x: int) returns 1 / x, it should probably be declared as a function that might signal a divide_by_zero error, but it would be tedious to have to write a handler for it at every statement like a := inverse(4);. (This rationale and the one that follows next are taken from a paper for which I'll give a link.)

Finally, resumption might fit with this model but the designers of CLU chose to omit it for what boils down to three reasons: (a) It's complicated. (b) You almost never need it. (c) If you need that, you can instead have the caller pass a function into the callee.

The paper that describes the last two rationales is called "A History of CLU" by B. Liskov; a PDF of it is available online. I think that paper has been discussed in comments on LtU, but I didn't find it as a topic post.

Finally, just to inject my own opinion: I think that C would be vastly improved by the addition of CLU-style exceptions. It would not be a trivial thing to do because this style of exception requires a different calling convention than does ordinary C. It could still be done, I think, because it could be implemented using the same calling conventions needed to implement C++ exceptions.

Thanks - I'm encouraged!

Unhandled exceptions are permitted (for they are converted to "failure" exceptions) but of course a compiler can issue warnings in such cases.... CLU doesn't enforce handling all possible errors but it makes it statically possible to find them. The rationale for not enforcing handling is that it would be inconvenient.

I think both options are valid. One might view unhandled exceptions as syntactic sugar for some final pattern clause "| oerr -> abort (err-msg oerr)" in some hypothetical functional type of language.

OTOH, one could argue that there are few things less convenient than writing belt-and-suspenders industrial code, and that making it more convenient to write other, more casual types of programs is completely unimportant (folks can always write bash shell scripts or whatever for quickies, after all).

I think reserving exceptions for "truly exceptional" situations (out of memory, i/o errors, etc.) - and not for all possible forms of naughty programmer inputs (zeros, bounds errors, null pointer, etc.) that can be avoided or caught in normal program logic by the programmer - would make exhaustive exception handling much, much less onerous.

Thanks for the reference! (And yes, I think this would be a nice "conservative" addition to C as well.)


Agreed. It would be nice to

Agreed. It would be nice to have a clean separation between exceptions for exceptional conditions vs. mechanisms like Design By Contract for programmer errors that violate preconditions or invariants.


Though I agree with you...

I would like to elaborate a bit on your examples, and my proposed exception handling.

My comment was that components should fail, and that failure should be handled locally. Your examples are valid, but do not address that - I guess - in most cases, a library (or an object) is a body of related functions with a certain functionality. What I would like to see is handling of exceptions at the library level.

For example, say I designed a library/object around some persistent storage for some data 'Foo'. Then what are the exceptions you would like to handle at the calling/using site? Well, I think exceptions like 'FooException: Storage failed for [some object] in store [some file],' or 'FooException: Store is out of space!' make a lot more sense than 'IOException: Couldn't open file [some filename]'.

I.e., it makes sense that if you define a library/object named 'Foo,' and you use that, the only exceptions you'd expect are 'FooExceptions,' the exceptions which may be raised by using that component. I think it would make sense for a language to enforce that at the library/object level.

C, by not having exceptions, actually enforces such a style; the strange observation is that Java (especially with unchecked exceptions) does not really (at least, sloppy programmers allow other exceptions to propagate upward).

C modules/exceptions (reply to marco)

You threaded under my message, marco, though I'm not sure you were asking me. I'll reply anyway.

I very much like the notion of modules that manage some bits of state in an encapsulated way and, in that context, have module-scoped exception processing. In old-school unix, taking whole programs as modules, the atexit() facility is a crude approximation.
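A rough sketch of how atexit() approximates module-scoped cleanup for a whole program (the names are hypothetical, and a real module would guard against repeated registration):

```c
#include <stdio.h>
#include <stdlib.h>

/* "Module" state, hidden in this compilation unit - C's nearest
 * analogue to encapsulated module state. */
static FILE *log_file;

/* Module-scoped cleanup, run however the program exits normally. */
static void log_close(void) {
    if (log_file != NULL) {
        fclose(log_file);
        log_file = NULL;
    }
}

static int log_open(const char *path) {
    log_file = fopen(path, "a");
    if (log_file == NULL)
        return -1;
    atexit(log_close);   /* hang the cleanup on process exit */
    return 0;
}
```

As the comment above notes, this is a crude whole-program approximation: the "module" boundary is just a compilation unit, and the cleanup hook is global to the process.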

A problem for C in this regard is that it doesn't really have a particularly strong concept of "module". It has "compilation units" and it has a crude scoping rule where you can make some declarations and definitions local to such a unit - but those are such low-level concepts that they really don't add up to modules in any singularly Right Way. So I'm not sure where you would hang modular exception handling on this framework.

A whole other ball of wax that's indirectly related is the weakness of C's facilities for syntactic abstraction (CPP). If C had much stronger facilities for syntactic abstraction, then perhaps one could carve out a lasting ecosystem of modules and put in modular exception handling at that level. I'm not sure you'd really need much more than just (a bit better than CPP's) syntactic abstraction.
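Even with only the CPP and setjmp/longjmp, one can sketch the bones of such a mechanism. The following is a toy, hypothetical example (single handler, no nesting, no cleanup on unwind) of the exception-like control flow a stronger macro system could package into a reusable module:

```c
#include <setjmp.h>

static jmp_buf handler;                     /* one global "catch" point */

enum { OK = 0, EX_DIVIDE_BY_ZERO };

static int div_or_throw(int n, int d) {
    if (d == 0)
        longjmp(handler, EX_DIVIDE_BY_ZERO);  /* "throw" */
    return n / d;
}

static int safe_div(int n, int d, int fallback) {
    switch (setjmp(handler)) {                /* "try" */
    case OK:
        return div_or_throw(n, d);
    case EX_DIVIDE_BY_ZERO:                   /* "catch" */
    default:
        return fallback;
    }
}
```

A real version would need a stack of jmp_bufs for nesting and some discipline about resources held across the jump, which is precisely where better syntactic abstraction would earn its keep.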


I glanced over your post and wanted to make clearer what I meant; there was no real question.

Exception handling/language support was a thing which came up during the design of my own language. I like Ocaml's approach, but also C's. In Ocaml, exceptions stand alone beside the program; in C, exceptional states are often enumerations within a library. To make my point, it is typical to have one exception per file in Ocaml, or one enumeration per file in C, shared among any number of functions.

Actually, your comment that C "doesn't really have [a] particularly strong concept of 'module'. It has 'compilation units'" and that you're "not sure where you would hang modular exception handling on this framework" is what I noticed too [1], and it led me to the following conclusion.

For a simple language, the correct manner of defining and handling exceptions is up to the discretion of the programmer, since exceptions are bound to 'components' (where I mean modules/libraries/objects which implement a certain functionality through some interface). The mechanism should be there; the rest is up to the programmer.

You could syntactically enforce somehow what a module/component/library is and group a series of exceptions with that. I toyed with the idea of letting them coincide with namespaces. Unfortunately, any idea I came up with I decided against as being too 'restraining.' (Though what I now have, where you can throw anything anywhere, is too liberal. Actually, I think that Ocaml almost got it exactly right. I.e., if a component/module usually coincides with a file, why add extra syntax for that? Just explicitly declare all possible exceptions.)

[1] Of course, compilation units and (sub-)libraries usually, but not always, coincide.

[ As you pointed out: I should have used the term module instead of library. Another sticky for my forehead. ]

The kinds of failure that

The kinds of failure that exceptions are best used for can almost never be handled locally; this is largely their purpose. The benefit of exceptions is that they are a way to automatically unwind a stack of invocations across multiple abstraction layers, cleaning up along the way, while sending back details of the error that caused the aborting of the computation. Errors that are expected to occur and are to be handled locally ideally shouldn't use exceptions. For example, parsing in a compiler: the parser should resync and add the error to a list, but otherwise continue, rather than unwinding. Where a third-party API is involved, ideally the API should provide alternate versions that use some form of error codes rather than exceptions when the user wants to implement a local error handling policy; cf. Int32.Parse vs Int32.TryParse in .NET.
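The Parse/TryParse split can be approximated in C with a pair of hypothetical wrappers over strtol: one flavour for callers with a local error-handling policy, one for callers who consider a bad input unrecoverable at that point.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* The "TryParse" flavour: report failure via the return value,
 * for callers who want to handle the error locally. */
static int try_parse_int(const char *s, int *out) {
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || errno == ERANGE
            || v < INT_MIN || v > INT_MAX)
        return 0;                     /* false: not parsed */
    *out = (int)v;
    return 1;                         /* true: parsed */
}

/* The "Parse" flavour: failure aborts the computation outright. */
static int parse_int_or_die(const char *s) {
    int v;
    if (!try_parse_int(s, &v)) {
        fprintf(stderr, "not an int: %s\n", s);
        exit(EXIT_FAILURE);
    }
    return v;
}
```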

What do you expect to do with such a FooException? All it seems to be is a lossy symbolic renaming, or a wrapper around the underlying I/O exception with a friendlier message. And if I understand you correctly, a logical conclusion of what you advocate as ideal is the automatic wrapping of random low-level exceptions into a single abstraction-defined exception, along with (allegedly) a more comprehensible message.

But you've lost information doing that. 'Store out of space' is a qualitatively worse error message than 'Drive X: is out of space' or 'File system /var/tmp is out of space'. The latter is actionable; the former relies on an understanding of the semantic model of the abstractions built into the program, knowledge of how it's configured, and knowing exactly which store failed and where its data was stored. Depending on how relevant these abstractions are to the user, that may be helpful - or not. I think this kind of policy makes sense if the information markup is useful and sensible to the user, but not as a blanket policy for all exceptions that may penetrate the abstraction boundary.

Java's checked exceptions, on the other hand, are a terrible idea. The whole point of exceptions is that they propagate upwards. Checked exceptions burden the programmer with having to create a tunnel in the type system between the point of the throw and the point of the catch. That's sufficiently onerous that the programmer is strongly incentivized to either swallow the exception or hide it inside a non-checked exception, defeating the mechanism. This isn't "sloppy" programming; it's the language that's sloppy. The language should work with the programmer, leading them into a pit of success - they should have to try harder to do the wrong thing, not vice versa.


(I added another post above.)

I think we mostly agree. It boils down to your comment "the latter is actionable," and responsibility. Who has the responsibility of taking the action? The calling, or the called component?

Actually, to be honest: I DON'T KNOW, but I feel the notion of failing components is the most natural. But again, I have no idea what is best - is there one best way of doing so, or is it up to the programmer to dictate a convention? I guess there are some OO patterns which tell where to delegate responsibility, so that could be a lead... and I don't know.

[ Btw, your (first) parse example is bad, since we agree on that. The question is if the parser should be able to throw IO or DivideByZero exceptions. ]

Good design preferred over convenience?

Yes, that may well be a good exception "taxonomy" against which library users may effectively program. I can well imagine programming against such a "library exception API," more or less, as a seamless (and unavoidable) part of the overall library API.

Such a library level taxonomy can be implemented via a constrained low level exception mechanism, careful API design and simple explicit, manual "rethrows" of exceptions within the library's implementation, minus any other fancy footwork regarding automatic exception propagation.

My emphasis would be on a minimal exception handling mechanism and semantics available for library designers still suitable for constructing more or less "logical" library-level exception handling strategies for the library's users.

Exception "propagation" (fully manual) might (with exercise of good taste) work sometimes at the library implementation level - and all the better when it does. But at the library boundary, the model is still to handle exceptions thrown by the library on a per-library-function basis, i.e., in a strictly local context.

Does this rough philosophy and minimal exception machinery fit the bill, if sometimes at the expense of convenience (vs. other exception mechanisms)?


Call-chains aren't always first-order

I.e., it makes sense that if you define a library/object named 'Foo,' and you use that, the only exceptions you'd expect are 'FooExceptions,' the exceptions which may be raised by using that component. I think it would make sense for a language to enforce that at the library/object level.

The problem with this scheme is that it breaks down in the presence of higher-order functions or objects. If you have a function Foo.f that you pass another function Bar.g to "call back", then you have to expect - and want to see - Bar exceptions as well.

This is the same reason why exception specifications in C++ or Java don't work: they were designed under a naive first-order assumption that does not apply to typical OO code (or generic code, for that matter). If you wanted to do it right, you'd need to be able to abstract over exception specifications, i.e. have a simple form of effect polymorphism.

Edit: This, in turn, is closely related to Neel's observations about OO and formal verification.

Another Good Point

I think you're right. And, as I guesstimated it, abstraction over exception specifications in higher-order languages defeats the purpose of exceptions (since the value and exception are almost equivalent to a union type then), though I have no good examples for that.

[ The other observation being: If exceptions are things which should be associated with a module, then it's natural to expect it to be part of the signature of the module, not one or more specific functions. ]

[ And also: It's just damned tedious for a programmer to decorate types of functions with exceptions. A case for sloppiness? ]

Tedium of enumerating exceptions....

Yes, Java is tedious in this way, and I don't understand why the exceptions thrown by a function aren't simply part of the inferred function type instead.

After all, exceptions are only throwable (I'll presume for the sake of argument) via a special keyword such as "throw" or "raise". The compiler has all the information it needs to infer the exception aspect of any function's type, or again, I must be missing something big.


Inference of Exception Types

You're missing modularity issues, and all the implications of polymorphic dispatch. Interfaces, at the very least, would need to enumerate exceptions.

Exceptions vs. HOF = faulty type systems?

If a language supports exceptions and hof, let's just say map for example, why can't the map function's set of thrown (checked) exceptions be a kind of type parameter bound to the set of ("checked") exceptions thrown by the mapped function used by any given call to map itself?

Am I pipe-dreaming again, or am I detecting a lack of ingenuity or imagination in integrating ("checked" exceptions) exceptions elegantly into the type system?

What am I missing?


No Faults, Design Decisions

Say I curry A.f with B.g in C.h and want to apply it to a list in D.j using map.

Say B.g fails (DivideByZero): what is the inferred type (including exceptions) [of the application] in D.j, and am I allowed to use map [which has no provisions for dealing with exceptions from modules A-D]?

You open up a can of worms, there is no good solution, hence no inference of exceptions.

[ Btw, it's solvable such that everything type checks of course, just as one can use union types to typecheck lisp. Pragmatically, no. ]

Just parameterize every

Just parameterize every higher-order function by an effect variable identifying possibly thrown exceptions. Something like:

val B.g: int -%e-> int
    where %e = DivideByZero
val map: 'a list -> ('a -%e-> 'b) -%e-> 'b list

At the site of application, either an exception is allowed to escape, i.e. the current expression has an external effect variable, or an exception is not allowed to escape, in which case you must handle it locally.

Java's checked exceptions are unusable because of the absence of effect variables, as Andreas pointed out. To recover Java's full flexibility, you also need a way to propagate these effects to the interface level, which would make interface signatures more complicated, but it's not intrinsically impossible.

I agree

But I have the feeling that writing types for hofs would become a living hell (which it sometimes already is ;) )

[ A statement without any proof: Say I do as you said; then my guess is that almost all (higher-order) functions would need these addenda. To me, that would defeat the whole purpose of exceptions. If I need to add this kind of information everywhere, then I might as well not add it, and leave that to the 'dynamics of the language.' I mean, pragmatically, what did I add to the map function except that I now explicitly, and laboriously, write what I already know: it can throw anything one of the argument functions can throw? (And there are probably only a few places - those places where exceptions are handled - where it actually 'adds' information.) ]

Optimize HOF Type Syntax for the Default Case

DDC will infer effects unless you restrict them. I suspect that is the behavior you want: if all you're doing is propagating the effect, there should be no need to specify it in a type signature; when you wish to restrict or control an effect (i.e. at module boundaries), then you do so explicitly.

This brings up a good point:

This brings up a good point: is there a good way to specify effects separately from a function's type signature? Most effect systems integrate the effect variables with the arrow, but if possible, it would be nice to have effect signatures be distinct. Consider something like:

val B.g: int -> int
      -% int -> DivideByZero  (* can be inferred *)

val map: 'a list -> ('a -> 'b) -> 'b list
           -% 'a list -> ('a -> %e) -> %e   (* can be inferred *)

Partial signatures?

Oleg Kiselyov has a cute trick for writing a "partial type signature" in Haskell--something deliberately too polymorphic, with some (but not all) constraints mentioned explicitly, such that the compiler will infer the missing parts. Perhaps something like that could be added directly to a language, subsuming effects along with any other type constraints.

I'd imagine an extra annotation in the signature to indicate that something is or is not "over-generic"; the type checker would then be free to specialize or restrict anything marked as over-generic as part of type inference, while retaining anything that was explicitly specified.

The programmer would then be able to give the type checker a few hints about tricky parts--effects, quantified types, unusual type class constraints, whatever--and leave the rest implicit. Probably easier said than implemented, though.

Don't understand his trick....

Subject: Partial signatures
Date: Fri,  6 Aug 2004 20:29:40 -0700 (PDT)

Malcolm Wallace wrote:
] and since you cannot write a partial signature,

Can't we really?

It seems `partial signature' means one of two things:

	- we wish to add an extra constraint to the type of the function
	  but we don't wish to explicitly write the type of the
	  function and enumerate all of the constraints

	- we wish to specify the type of the function and perhaps some
          of the constraints -- and let the typechecker figure out the
	  rest of the constraints.

Both of the above is easily possible, in Haskell98.

> module PartialSig where
> import Data.Array		-- for the examples below

In the first case, suppose we have a function

> foo x = Just x

and suppose we wish to add an extra constraint (Ord x) but without
specifying the full signature of the function. We just wish to add one

> addOrd:: (Ord x) => x -> a
> addOrd = undefined
> foo' x | False = addOrd x
> foo' x = Just x

Even a not-so-sufficiently smart compiler should be able to eliminate
any traces of the first clause of foo' in the run code. So, the recipe
is to define a function like `addOrd' (or like an identity), give it
the explicit signature with the desired constraint, and then `mention'
that function somewhere in the body of foo. Or, as the example above
shows, prepend a clause of the form
	foo arg ... | False = addOrd arg ...
In that case, the body of the function foo does not have to be changed at all.

For the second case: suppose we wrote a function

> bar a i = a ! i

and wish to give it a type signature

*> bar:: Array i e -> i -> e

But that won't work: we must specify all the constraints:
    Could not deduce (Ix i) from the context ()
      arising from use of `!' at /tmp/d.lhs:207
    Probable fix:
	Add (Ix i) to the type signature(s) for `bar'
    In the definition of `bar': bar a i = a ! i

But what if we wish to specify the type without the constraints (and
let the compiler figure the constraints out)? Again, the same trick

> barSig:: Array i e -> i -> e
> barSig = undefined
> bar' a i | False = barSig a i
> bar' a i = a ! i

Incidentally, barSig plays the role of a Java interface of a
sort. barSig is a bona fide function, and can be exported and
imported. To make sure that some other function baz satisfies the
barSig interface (perhaps with a different set of constraints), all we
need to do is to say
	baz arg1 ... | False = barSig arg1 ...

We can also attach to barSig some constraints. The typechecker will
figure out the rest, for bar' and baz.

Uuuuh, ok?

[ Do understand his trick. Don't agree. ]

The Trick Works

For any somewhat more elaborate HM-style inferencer, that is. Dealing with exceptions is not even very hard for an inferencer: it means you'll end up adding union typing for them to the types. You need to do that anyway if you want to typecheck uses of exceptions - with or without explicit type annotations. (I personally don't, but it's very possible.)

The question is not how to typecheck exceptions, but whether to typecheck them, and whether to do so explicitly.


All proposals for exceptions, as I know them, deal with unions of exception types which can be thrown; I think the most natural manner would be to abstract those sets out of the type signature.

val g: ('a -e0-> 'b) -> ('b -e1-> 'c) -> ('a -e2-> 'c)
where e0 = ...
      e1 = ...
      e2 = e0 + e1

My personal opinion: it's still a mess. (And it becomes an even greater mess if you think about what happens when ('a -e0-> 'b) is abstracted to somefunctiontype, exported, and used. Abstract it to: somefunctiontype throwing e?)

[ And it's a mess since I strongly believe exceptions live at the module interface. Personally, I would prefer module interfaces such as

   type File   
   open_file: text -> File
   read_file: File -> text
   NoFile text
   FileCorrupt text

And not put on the programmer the burden of laboriously spelling out the exact exception set for each (exported) function. ]

Has been proposed already

FWIW, Gafter et al.'s proposal for closures in Java indeed adds "exception type parameters" to scale exception specs to higher order (a property which they refer to as "exception transparency"). The proposal is rather reserved about defining their semantics, though.

Monadic exception handling...

I remember hearing about monadic approaches to just about everything and the kitchen sink. Transactions and Exceptions were certainly among those, even of the resumable sorts.

You might also use tricks such as type classes for implicit configuration to carry the 'dynamic scope' and simplify passing handlers into code without interference from heavyweight 'monads'.

Is in-language and in-OS security dead?

While we're on the topic, I'd like to ask a question I've been thinking about lately:

It seems that we're now at a point where we can say confidently that OS security has failed. For example, Qubes OS by a (I gather) prominent security researcher simply side-steps the whole issue by virtualizing multiple OSes.

The above situation is a direct result of certain architectural design decisions, which include over-complexity of the OS API, insecure GUI design, and, last but not least, the monolithic kernel architecture. It is the opinion of the authors that systems that are based on such insecure architecture simply cannot be made secure. (Qubes OS Architecture)

If we can't secure the OS, there's not much point in developing secure PLs, is there?

If we can't secure the OS

If we can't secure the OS, there's not much point in developing secure PLs, is there?

Methinks you ask the question backwards.

Developing secure PLs is a necessary step towards developing a secure, high-performance operating system (one with low overheads for its process model, IPC, and device control).

The insecurity of the C language is a horrendous performance drain once you account for how it influences the process, concurrency, and inter-process communication overhead costs [include parsing and serialization as overhead]. (As a consequence, proponents of C are careful to avoid accounting for these costs.)

A secure language of 'good' performance could beat C once you account for cheaper process model and IPC with minimal serialization, context change, and parsing overheads. But there is no reason such a language couldn't support considerable optimizations. Indeed, many of the same language features that support security (such as static knowledge of confinement and control over introduction of non-determinism) could be leveraged for optimizations.

The line between operating systems and programming languages is vague and ill-defined in any case. A language that includes models for persistence, concurrency, security, extension, and distribution will have every important OS feature with the exception of a framework and library of code for effectively managing and controlling devices.

Singularity OS

One particularly interesting project in making operating systems both secure and efficient is MS Research's Singularity OS, which runs everything in ring 0 and uses a JIT-compiling VM to enforce its security guarantees. Not only does it enhance security, but the overhead of secure IPC is much reduced. There have been some other OSes with similar ideas, like Inferno and JavaOS.

Most anything would be a step forward from the current practice of writing everything in C and just programming very carefully.

Singularity OS is the sort

Singularity OS is the sort of system I was thinking about. There are many similar projects based on Object Capability Model security.

CPU Design

Another interesting aspect is the impact of the C language on CPU design.
With different languages, the MMU may not need to provide hardware-enforced page protection.

There is also a distinction between the visible assembly language and the actual architecture of the underlying core; for example, the instruction scheduler of a 'dataflow' out-of-order CPU re-creates single-assignment forms of the instruction stream on the fly.

The generalisation of JIT compilers offers new opportunities for better software/hardware optimisation.

Language influence on Hardware

Another interesting aspect is the impact of the C language on CPU design.

I agree. I would love to see a study on this subject. Languages almost certainly influence which features have been optimized (i.e. with regards to vector instructions, support for feedback from other devices, etc.).

With different languages, the MMU may not need to provide hardware-enforced page protection.

Maybe, though I suspect that such protection may still be useful for high-performance concurrent garbage-collection, reactive programming, and possibly transactions... if it were not utilized for security.

The generalisation of JIT compilers offers new opportunities for better software/hardware optimisation.

It is very exciting.

Claimed, but not substantiated

There are many claims in the Singularity work that need to be viewed skeptically, and all of the claims you list are among them. The linear typing stuff proved to be essentially unusable. The argument for JIT-based safety ignores the inability to assure the VM; the fact that the VM has inherently higher complexity than a secure OS microkernel; the fact that certain kinds of security enforcement don't want to be done at the same layer of abstraction as the subject program; and the fact that real memories experience bit flips that cannot be guarded against in a system without guards at multiple levels.

Finally, the claim that Singularity performance was competitive is so far off the mark as to severely discredit the reviewers and the conference committee. No paper with those claims should have ever been accepted without a comparison to some other current or previous generation research systems. While the Singularity v. Singularity times were fairly comparable, Singularity is more than 10x slower than L4, EROS, Coyotos, QNX, or VXWorks -- even on the allegedly fast IPC operations. There is room for an awful lot of experimental error, bad judgment, optimism, and misunderstanding when you are a factor of ten away from the state of the art. What there isn't is any scientifically valid basis on which to stake a claim about performance of any sort.

I agree with your final statement -- thus BitC -- but ironically, the one place where BitC turned out not to be truly motivated was the bottom (microkernel) layer. At that layer, the idiomatic constraints are so tight in a well-structured microkernel that the choice of language could just as well be C without introducing any problems of semantic ambiguity. Safe languages are nice to have at that layer. Where they become essential is at the next layer up.

The seL4 verification is illustrative. They have verified a microkernel, not an OS.

the fact that real memories

the fact that real memories experience bit flips that cannot be guarded against in a system without guards at multiple levels.

I am completely clueless here; what do real world systems do to prevent and/or detect bit flipping? I don't understand how multiple levels by themselves provide any assurance... I *do* understand error correcting codes.

Two answers

ECC memory, of course.

But in some cases it is possible to redundantly store critical data structures in such a way that (a) you can *detect* errors in fail-fast fashion using ECC or checksum, and (b) you can often *correct* them using the redundant copy. In the EROS operating system, for example, about 85% of memory particle hits in critical structures proved recoverably correctible without ECC memory. Note this didn't extend to user data!

In the VM case, you can argue that it's okay to lose type safety so long as the consequences don't escape the VM. But that requires that loss of type safety must not lead to compromise of the auditing agent. I know of no way to enforce that requirement in a system design that does not have hardware-enforced permission layering.

So what is a critical data

So what is a critical data structure? Where is the example code I can dive into, or at least a schematic diagram demonstrating the reliability model?

I think this is exactly the sort of blurriness in system design that needs to be put to an end.

For example, what is type erasure, really? What does it really mean to be tagless? What execution overhead can we ignore? How do you model safe upgrades, with support for recovery and roll-back on failed certification, without such information? Moreover, what if the machine you are upgrading does not have enough memory, core or otherwise, to hold two images of the machine at once AND verify that the image is okay? What if that is too costly? What's the secure interaction design for that? I've never seen papers on those issues! (Edit: What's interesting is that maybe now is the time for researchers to squeeze Microsoft for funding for such issues, seeing as how Microsoft's Fortune x00 clients are now pushing them to release smaller software versions over time rather than huge software packages like Windows 7, allowing clients to build isolated "appliances" using MS technology with client-controlled system administration.)

I'd prefer language and system design to go in the direction of modeling all these various what-if's. Then prototyping new hardware and matching optimal hardware to a design should be much easier.

See my reply below

... to naasking

I know of no way to enforce

I know of no way to enforce that requirement in a system design that does not have hardware-enforced permission layering.

Fault tolerant computation via redundancy. It'll take a significant performance hit, but it's doable.

Fault tolerance

There is some difference between making fault tolerant hardware to mitigate transient errors (by using redundancy and error detection/correction mechanisms) and being tolerant to software bugs (see the Ariane 501 failure).

In EROS, the hardware page

In EROS, the hardware page mapping tables are derived from a persistent tree data structure. The tree nodes are also kept in memory, as main memory is basically a cache for the persistent object store. The added redundancy of this information (in different forms) means you can perform consistency checks at each level to detect errors. This is just one example I can think of off the top of my head.

That's an example

In general, the "key" data structures were the ones that held capabilities. These were all covered by checksum, with the consequence that the window of opportunity for corruption was less than 5 minutes. If we detected corruption, we either rebuilt the state from other sources (e.g. the page tables or the process cache), discarded and reloaded the object (if it wasn't dirty), or declared a restart to avoid persisting known-bad state.

The key to the whole thing is that data with a low update frequency can be managed by updating checksums on store and checking them before consequences happen.
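
That recipe can be sketched in C. This is a hypothetical illustration, not EROS code; the structure layout, the rotate-XOR checksum, and all names here are invented for the example:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: a critical record kept in two copies, each with a
 * checksum updated on every store and verified before the data is used.
 * A single corrupted copy is repaired from the other; two corrupted
 * copies cause a fail-fast error. */
typedef struct {
    uint32_t payload[4];
    uint32_t checksum;
} critical_rec;

static uint32_t checksum_of(const uint32_t *w, size_t n) {
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum = (sum << 1 | sum >> 31) ^ w[i];  /* simple rotate-XOR */
    return sum;
}

/* Update the checksum on store, and keep a redundant shadow copy. */
static void rec_store(critical_rec *primary, critical_rec *shadow,
                      const uint32_t payload[4]) {
    memcpy(primary->payload, payload, sizeof primary->payload);
    primary->checksum = checksum_of(primary->payload, 4);
    *shadow = *primary;
}

/* Check before consequences happen: returns 0 on success, repairing the
 * primary from the shadow if needed; returns -1 if both copies are bad. */
static int rec_load(critical_rec *primary, const critical_rec *shadow,
                    uint32_t out[4]) {
    if (checksum_of(primary->payload, 4) != primary->checksum) {
        if (checksum_of(shadow->payload, 4) != shadow->checksum)
            return -1;          /* fail-fast: no good copy left */
        *primary = *shadow;     /* correct e.g. a memory particle hit */
    }
    memcpy(out, primary->payload, sizeof primary->payload);
    return 0;
}
```

A single flipped bit in the primary copy is then detected by the checksum on load and silently repaired from the shadow, mirroring the detect-then-rebuild behavior described above.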

The EROS code went offline when we stopped the OpenCM work. I'm poking at how to get that state moved into Mercurial so that it can be fetched again.

Supporting data, please?

So why is it that no high-confidence OS to date has been written in such a language, yet several such systems exist?

The original claim is merely that unprincipled OS design cannot prevail on security (or, I would add, reliability). Safe languages cannot rescue unprincipled designs. It is unquestionably possible to express truly horrible programs in any safe, principled programming language you might care to name.

More than this...

I agree: we need more than a safe and secure language. We must also develop the patterns of robust composition, which will necessarily cover all aspects of the software life-cycle (including distribution, integration, maintenance, extension, and upgrade of user and system services and their dependencies). And we need an effective, verifiable service set to support common devices and user requirements. And within those services, it's the same deal, just more specific to the domain... e.g. security requires we control who has authority to take keyboard focus, or to observe the mouse.

A secure language is an important step, perhaps even a 'giant leap', but the path is a long one. There remain plenty of places to err. For example, many Object Capability Model aficionados attempt to emulate an applications distribution model that was developed for ambient authority systems. I consider this a significant 'wrong step', and discuss it here.

That said, I don't believe we've solved the 'safe/secure' language aspect yet.

You refer to existing projects that develop operating systems in secure languages. But which of these languages has safe concurrency models that are ensured against deadlocks and priority inversions? Which effectively support resource conservation and process accounting to protect against denial-of-service? Which support us in achieving timing requirements?

I suspect there are no high-confidence OS's today written in any language. There are, at best, some high-confidence service installations, which rely upon considerable awareness of their usage context and specific application requirements.

I think you misread me

You refer to existing projects that develop operating systems in secure languages.

I think you misread me. I referred to existing projects that developed operating systems in unsafe languages. Indeed, the only high-confidence OS's today are written in unsafe languages. Ada, admittedly, sits in a curious middle position depending on the subset chosen.

I suspect there are no high-confidence OS's today written in any language.

The beauty about suspicion is that it is not impeded by data. There exist six operating systems rated B3 or above under the old Orange Book scheme, any of which would be considered high-confidence. There are at least three at EAL7 under the Common Criteria scheme (perhaps more, I have lost track), which are certainly high confidence. There are a number of FAA Level A certified systems that are formally verified.

But I think you were trying to say something worthwhile here, so would you care to refine? Perhaps your point was that high-confidence systems are high confidence partly because they are not general-purpose?

You'll have trouble with the Orange Book systems if that is your supposition.

High Confidence OS

According to recent data I can find: There are no operating systems at EAL7. There is a specific product at EAL7. There is one operating system at EAL6 (INTEGRITY-178B), whose 'security target' doesn't even begin to cover interaction and composition between untrusted subsidiary applications and critical services. (Also, while INTEGRITY-178B is a real-time OS, its EAL6 security target did not include verification of its real-time properties).

FAA certified systems are, of course, highly focused on a particular operating environment. I've worked at two places on such systems, and I have family involved in that certification industry. (Apparently there are issues with politicking... i.e. if organization X isn't willing to finalize certification on a product, maybe we should be paying organization Y to do it instead? Fortunately, it is tempered a bit by liability.)

The Orange Book effectively defines security in terms of the Unix model. I certainly wouldn't have confidence in it for general purpose roles - i.e. not much in the TCB, and far less the Operating System.

I suppose I should clarify: There is NO point at which the 'Operating System' stops and the 'Applications' start. There is a point at which the TCB stops, and there is a point at which the assurance stops, and you want the former to be as small as possible and the latter to be as large (and as broad-spectrum) as possible.

Services outside the TCB are often critical to the correct behavior of a system and to achieving confidence in it. In my own domain, this includes flight controllers for UAVs, obstacle avoidance on a USV or UGV, and sane link-lost behaviors. And yet all of these services must interact with a plethora of external interests. A flight controller must interact with mission-management payload behaviors (such as tilting to give a camera a better view of a region, or to get a better shot). A UGV with a payload with a drill-bit in the ground (or worse, one with a robotic arm involved in attaching an explosive charge to an IED) cannot come running straight back to mama when the link is lost. A USV drawing a tether line (used for many sea sensors and systems) can't just 'stop' to avoid a collision, because the rotors and line might interact in bad ways (damaging one another or entangling). And all these payloads are pluggable.

In the general case, anyone with authority to interact with a component tied to a device (i.e. a 'capability' in the capability model) must be able to place software on or near that device in order to represent interests (and continue gathering data) during a link loss, disruption, jamming, or even intentional communications silence (for stealth). That is, high-confidence, general-purpose secure systems must not couple authority and transport-layer connectivity. Payloads, operators, coordinating platforms (forward observers, loitering munitions, refueling stations, traffic controllers), and such must be capable of distributing general-purpose untrusted code to one another to represent long-lived interests and communications. One must be able to run this code, have it interact or coordinate with multiple subsystems - including external systems (thus forbidding 'sandboxes') - to which the sender of the code already possesses authority.

Thus, a general-purpose setting requires accepting and running untrusted general-purpose code, at least from anyone who has any authority to interact with any code that is already part of the operating system. A general-purpose high-confidence OS, therefore, must achieve security in this scenario, providing a variety of assurance properties. (A useful assurance property is: code running locally won't have more local authority than it could have obtained running remotely across a perfect network connection. The object capability model can ensure this.) Properties include information assurance; protection of priority and any necessary real-time features; legitimacy, i.e. that policies can be effectively expressed; and responsibility, which requires that any grant of authority be accompanied by the information necessary to utilize that authority safely and by incentive (fiscal, legal, political) to actually do so. Safety (in the physical sense) must sometimes be compromised (risk-reward scenarios... e.g. get the fire truck to the fire faster, at greater risk of accident), but responsibility requires that the risks be clear to whomever is authorizing the breach. And, of course, forensic auditing will be needed for the inevitable politicking when an even partially autonomous system suffers a fault. Finally, performance mustn't be sacrificed arbitrarily; such things as power, energy, heat dissipation, and bandwidth are often at a premium, and both CPU time and memory will quickly become so if people are given the proper communications medium: distribution of general-purpose code.

When you can give me high confidence in that mess, without relying on hacks such as a "trusted" subnet (an impossible precondition in widespread coordination scenario), then I'll say we have a high confidence OS. While the above scenario may seem specific to a domain, it involves end-to-end issues that cross every domain (including UI, data management, information assurance, social interaction, and so on) and thus represents general-purpose interests. Truly, the same sort of code-distribution scenarios apply across every domain that uses communication for anything whatsoever... even such mundane things as browsing the web, taking a test, answering decision-trees to perform one's taxes, controlling temperature in one's home, playing video games, or interacting with a menu through the television. If you fail to provide code distribution features, they will be reinvented... repeatedly, and badly: usually without the performance or security or assurance or multi-system integration properties. This is historically true. Plugins, JavaScript, and even the common 'executable application' distribution model are all instances of this phenomenon.

The Orange Book doesn't even begin to cover a combination of 'high-confidence AND general-purpose'. At a (generous) best, Orange Book might be read as a security target for an operating system whose role is specialized to information management. I spoke of 'many places to err' earlier; for general-purpose use, Orange Book goes terribly wrong with step C1: separation of users and data.

High-confidence must include at least as much validation as verification; it doesn't matter how well you meet specifications if you have bad specifications.

The beauty about suspicion is that it is not impeded by data.

The beauty of any claim about 'high confidence' is that it is inherently subjective. ;-)

Quick response

So first, I agree.

And second, I note that you are indeed looking for high confidence/security in a general-purpose system, in the sense that the application payload isn't predefined. The only system I know of that ever accomplished that in the field was the KeyKOS system.

Reliability through pain

Because software written in C can fail in so many ways, I was much more careful than normal when writing it. In particular, anything involved in manipulating chunks of memory raises the prospect of off-by-one type errors - which are particularly dangerous in C. Whereas in a higher-level language I might be lazy and think hmm, do I need to subtract 1 from this value when I index into the array? Let's run it and find out, in C I thought OK, let's sit down and reason about this.

That suggests that the most reliable languages should come with shock collars.

Some strawman initial settings for the collars:

integer overflow - Zap
index out of bounds - Stun
null pointer dereference - Lethal

I'm sure a PLT researcher could find a few volunteer undergrads to be experimental test subjects and since undergrads aren't generally covered under international human rights laws there shouldn't be much problem on a legal front.
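
Joking aside, the up-front index reasoning the quoted passage describes looks something like this in C. A hypothetical sketch in the spirit of BSD's strlcpy; the function name is invented:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of reasoning a bound out in advance rather than
 * "running it to find out": copy at most dst_len - 1 bytes and always
 * NUL-terminate. An off-by-one here would silently corrupt adjacent
 * memory instead of raising an error. */
static size_t copy_str(char *dst, size_t dst_len, const char *src) {
    size_t n = strlen(src);
    if (dst_len == 0)
        return n;                        /* no room to write anything */
    size_t to_copy = (n < dst_len - 1) ? n : dst_len - 1;
    memcpy(dst, src, to_copy);           /* to_copy <= dst_len - 1, so... */
    dst[to_copy] = '\0';                 /* ...this index is in bounds */
    return n;                            /* n >= dst_len means truncation */
}
```

Returning the full source length lets the caller detect truncation, another condition C will not report on its own.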

Anything can be "Reliable"

...if you don't bother measuring for reliability, or defining what reliability and margin-of-safety is for your project.

Perhaps the author should read Daniel J. Bernstein's Some thoughts on security after 10 years of qmail 1.0. Honestly, ask yourself, why is it there are so many mail programs written in C, but how many reach qmail's quality standards? How many actually had quality standards from the outset, which qmail clearly had? (edit: What about FTP software?)

Finally, the author does not cite J.D. Gannon and Jim Horning's seminal work on reliability studies in programming languages - so it is clear to me he lacks any knowledge of what reliability in programming language design even means (from a scientific Design of Experiments standpoint) [edit: or is just really bad at bringing forth a balanced perspective, since Gannon and Horning definitely argue against the author's perspective].

Edit: One final thought. The language the author uses to describe his mistakes is just too weak for me.

In fact, ignoring logic errors that would have happened in any language, only two C-specific errors have thus far caused any real problem in the Converge VM (please note, I'm sure there are lots of bugs lurking - but I'm happy not to have hit too many of them yet).


But let's be honest - in how many languages can one retrospectively add a garbage collector?

I realize he is a language designer, but these phrasings sound like a famous golfer apologizing after ruining his public image. Do not use weak phrasings of failure. It dramatically influences your audience and is only useful if you are writing a persuasive essay, not a balanced critique of C's reliability. For a model of clarity in phrasing problems in language design - read papers by the Ocaml designers. Those guys are simply blunt.

C Koans

1. A student went to Master Ritchie and asked: "Master, I have learned so much about how to make programming languages safe. And C has almost none of these features. Yet it is so often used safely! How is this possible?"

Master Ritchie said "Don't be stoopid, kid. I'll show you the answer indirectly. I have a different problem for you to worry about." And with that he led the student to the wood shop and handed him a plank of wood. "I need you to cut a perfect 14 inch circle from this plank because I need to build a new stool for my meditation. You should start by using that band-saw over in the corner."

The student was dumbfounded: "I am afraid to try. I do not know how to use a band-saw safely. I'm afraid I'll hurt myself."

Hearing his own words, the student slapped his own forehead and grinned. "I think I understand. Let me sleep on this."

2. The next day, the student went to Master Thompson. He eagerly reported: "Master Ritchie taught me yesterday the secret of good C programmers! Just don't try to write any programs that you can't write safely in C!".

Master Thompson sighed and then chuckled. "You learned nothing. I spoke with Master Ritchie. I know all about it. We're all over you kid. You're clueless. Forget about all that for a minute. I'll fix your confusion indirectly. Follow me."

The Master led the student back down to the wood shop. He handed him a plank of scrap wood. "Take this for practice, so that you don't mess up Master Ritchie's meditation stool. You're right. Don't use the band-saw to cut the circle. Use that router, over there in the corner."

After a few hours of practice the student approached Master Thompson saying "I have failed. The router is safer - I am confident I won't hurt myself. But my hands are not steady enough. I can not cut a good circle. I have trouble following the line with the router - it wiggles this way and that."

The Master said simply, "Now you have two problems," and walked away.

3. The next day the student went, in a distraught state, to Master Kernighan saying "I am lowly. I can not use a dangerous band-saw without fear of hurting myself. I can not use a fairly safe tool like a router because my hands are not steady enough to cut a perfect circle. I can not program a dangerous language like C. And I can't even wield a safer tool effectively. I am lost."

Master Kernighan said "Yes, I've heard all about your dabbling in C and you should know I have nothing to do with C other than I sometimes describe it to others. So now I'll describe it to you. Indirectly, of course - you should be getting used to that by now."

With that Master Kernighan strode out wordlessly into the street and, after a moment of confusion, the student followed. The Master pointed. Across the street a construction crew was erecting some scaffolding for a new building they were making. The Master asked: "What are they doing?" The student responded "Building a new building!" Master said: "No, nothing they are building is part of any new building." He raised an eyebrow at the student, who said, "Well, they are building a structure to help them build the new building." Master Kernighan cleared his throat, swatted the student on the butt with a slender volume with a large, pale blue C on the cover, and went back to his office.

The student sat in a state of satori for 20 minutes before losing his train of thought.

4. The next day the student went back to Master Ritchie. He was burbling with excitement. "I get it! When I can't solve some problem using C safely, I should build some other tool to make it possible, first! For example, I can build a machine that will hold the router steady to cut a circle, even though my hands can't hold it steady. I just need a little rig to hold the router steady and move it in a circle!"

Master Ritchie grimaced and then frowned. He rolled back his eyes in thought. The student stood there, nervously shifting his balance foot to foot. At length, the Master spoke, handing the student the plank from which the stool seat was to be cut. "Close enough. Take this down to the wood shop. Cut a perfect circle of 14 inches from this plank. Take that scrap wood too. You should start by using that band-saw over in the corner."

And with that, the student was enlightened. (Well, close enough, anyway.)

Error checking in C

For all that it is painful and time-consuming, checking for individual errors returned from library functions and OS calls, one at a time, the way you have to do it in C code, often provides better security and reliability than exception mechanisms do.

Using exceptions, you can guarantee that all errors are caught and handled. But using exceptions in that way is imprecise; you don't know exactly where the error occurred. Checking individual returns is the only way to actually find each individual error.

I guess I've never really been a fan of exception systems. When the handler is isolated from the error site, it's too easy not to consider the individual possible errors, what exactly you were trying to do when each one arose, what each means, and most importantly how to fix it. Drilling down to individual error sites by checking individual returns often (usually?) reveals individual bugs that can be fixed, where throwing exceptions only reveals that "there's a bug, or maybe several bugs, somewhere in this code."

Exceptions are easier, but not better; checking individual errors is precise.

C syntax

Now that you mention it, it does seem to me that guarded continuations, my supposedly exception-like facility in Kernel, are about something other than generic error-handling. So far I've used exit-guards for trapping errors only at very local, narrow points, and otherwise the significance of both exit- and entry-guards seems to be retention of some degree of control when calling a procedure (exit-guards), or when exporting a continuation (entry-guards).

It's my subjective impression that, in designing language syntax for error handling, virtually all effort goes into finding ways to make a clean separation between normal-case algorithm and error-provisions. (I have this memory of Gregor Kiczales, once upon a time, mentioning this separation as a particular hope of his in pursuing aspect-oriented programming, though poking around I can't find evidence to back up the memory; perhaps he stopped saying it publicly when it became clear that the aspect-oriented generation of technology wasn't going to deliver?)

If indeed C does well by the diametrically opposite strategy —that is, by entangling error-provisions rather than separating them out— a question this raises for me is, is this due to some characteristic of C syntax? Three possibilities occur to me here. (a) Perhaps the vastly maligned terseness of C syntax makes error-entangling less painful than in more verbose (and therefore more, er, micro-locally lucid) languages. That is, each individual token may be harder to puzzle out, but larger patterns at the scale of a function may be more graspable because you can see more of it at once. (b) Perhaps C benefits from its inability to express much of anything lucidly, so that one isn't tempted into a security-undermining attempt to achieve greater lucidity by separating out the error-provisions: it's all going to be C code anyway. (c) Perhaps there is something else about C syntax that reduces the illucidity of the error-entangling strategy. I don't know what that would be, but I'm motivated to look for it by the fact that (a) makes me sad and (b) is almost a joke. (Cf. "Ha ha only serious" in the Jargon File.)

[edit: changed somewhat bizarrely unhelpful subject title]

C syntax?

I question the assumption that separating handlers from exception sites is really an increase in lucidity. As I see it, it allows code liberty to accumulate an unlimited amount of unresolved context, and the context really has to be resolved before the code is fully understood.

I perceive exceptional conditions as "loose threads" in a computation. If I read code that can throw an exception, it's a loose flow-of-control thread going off somewhere I can't see. I don't consider the code complete, and therefore lucid, until I see the handler and how the threads come back together into a known state. The problem is that I can only remember about a dozen of these things as stacked-up context waiting to be resolved.

So.... I don't buy the idea that separating error provisions results in an increase in lucidity. It does make the code more convenient to express, but leaves both the non-exceptional and exceptional parts needing a lot of non-local context to understand.

That said, the question you raised is still valid. Does C's syntax somehow reduce the pain of expressing errors and error-handling along with the main line of computation?

I would consider your option (b) a non-starter. C code, unless intentionally obscure, is crystal clear to people who are used to it. That leaves (a) terseness and (c) something else. I don't really have a good answer.

I think terseness helps. C's terseness allows you to express some complexity directly in relatively small regions of code, avoiding the pile-up of unresolved context. You can read such a region, and say without further ado, "okay, the code responds to this situation by doing so-and-so." But it also hurts, because those small regions can become boilerplate or cut-and-paste code that would benefit from procedural or syntactic abstraction.

Lucidity consists in some sort of balance between (among other things) immediate resolution of context and proper use of abstraction to avoid duplication. Both C and other languages allow abstraction of pretty much any code you'd want to abstract in an error context; but C allows the balance to be struck in a different place by being terse enough to also favor immediate resolution.

C error handling

As a C practitioner for, um, quite a few years now, I've observed that:

It is very important to distinguish between various kinds of unusual condition. Borrowing some vocabulary from "go" and adding some new vocabulary: It's helpful to distinguish between errors, panics, and aborts. Errors are ordinary things that a program should handle, usually in-line. For example, the user is prompted for the name of a file to read, the file typed in doesn't exist, so handle that case. Aborts are what happens when it is detected that some program invariant is violated and generally the only correct response is to immediately exit the program. A good abort handler will typically write a constant, immutable string to stderr and should probably then call "_exit()" because little else is safe. Panics are when the whole program or some subsystem has to "give up", although no invariant has been broken and the program or subsystem can still "clean up" safely.

Ordinary errors are served pretty well by the customary C habit of indicating an error in a return value and handling it directly at the caller site. One way to judge the quality of a C program is to scan through looking for calls that you know might return an error value (e.g., malloc(3) or open(2)) but can see that the code doesn't check for that error. After years and years of programming C, when I see something like an unchecked return value from open(2) or malloc(3) in supposedly production code, I get a little adrenaline rush of annoyance at the author (even when it is myself).
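The habit described in the paragraph above can be sketched in a few lines. The wrapper name xstrdup and its behavior on failure are invented for illustration; a production program might propagate an error code upward instead of exiting:

```c
/* A minimal sketch of checking a fallible call at the call site.
 * xstrdup is a hypothetical wrapper; the exit() on failure is one
 * possible policy, not the only one. */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *xstrdup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p == NULL) {                /* the check the author scans for */
        fprintf(stderr, "malloc failed: %s\n", strerror(errno));
        exit(EXIT_FAILURE);
    }
    memcpy(p, s, n);
    return p;
}
```

The point is not the wrapper itself but the discipline: every call that can fail is followed immediately by a check.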

Aborts are similarly well-handled by in-line code. Typically, they take the form of assertions and so they aid rather than hinder the reading of that code. Conservative C coding (or higher level coding that invokes C programs) always assumes that aborts are possible and tries to treat them in such a way that it's effectively the same as if the program had been killed externally (e.g., on unix, by sending certain signals to the process). This really focuses the mind on, at least, keeping consistent or recoverable state in external resources.
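A sketch of the assertion style of abort described above; the stack structure and its invariants are invented for illustration. The assertions both document and enforce the invariants, and a failed assertion aborts the program:

```c
#include <assert.h>
#include <stddef.h>

struct stack { int items[16]; size_t top; };

void stack_push(struct stack *s, int v)
{
    assert(s != NULL);
    assert(s->top < 16);        /* invariant: never push past capacity */
    s->items[s->top++] = v;
}

int stack_pop(struct stack *s)
{
    assert(s != NULL);
    assert(s->top > 0);         /* invariant: never pop an empty stack */
    return s->items[--s->top];
}
```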

Where the syntax of C is a bit limiting (but only a bit) is with regard to panics. In these cases, where some internal or external resource is being managed, you really do want to say at some higher-level part of a call tree "and in case of panic, here is how to clean up" and at a lower part of the call tree, where the panic is triggered, you really do want to have to say little more than just "panic()". The readability (lucidity) of the program is better the closer you can get to stating the recovery procedure near the point of resource declaration that needs recovery. But C is a bit weak at letting you do that cleanly. As one example of the problem: consider a function whose return value is designed to be easily used within expressions (like "return x + foo()"). If such a function (like "foo") can cause a panic, either you have to tediously check the return value (e.g., "tmp = foo(); if (tmp == ERR) goto panic_handler; return x + tmp;") or take a performance hit and use setjmp()/longjmp(). It's never not ugly and with some approaches in some circumstances has annoying performance implications.
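The setjmp()/longjmp() variant might look like the following sketch. The names panic, foo, compute, and run are all invented for illustration, and the caveats about ugliness and performance still apply:

```c
#include <setjmp.h>
#include <stdio.h>

static jmp_buf panic_env;       /* established high in the call tree */

static void panic(const char *msg)
{
    fprintf(stderr, "panic: %s\n", msg);
    longjmp(panic_env, 1);      /* unwind directly to the handler */
}

static int foo(int fail)
{
    if (fail)
        panic("foo cannot continue");  /* no return-value plumbing */
    return 41;
}

static int compute(int x, int fail)
{
    return x + foo(fail);       /* foo remains usable in expressions */
}

int run(int fail)
{
    if (setjmp(panic_env) != 0)
        return -1;              /* panic handler: clean up and report */
    return compute(1, fail);
}
```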

For a language like C (as if we could change it now), something like Go's new panic()/recover() features is *plausible* (modulo re-interpreting it for ordinary calls rather than go-routines) - but not necessarily how I'd do it. I'd rather see a kind of CLU solution on steroids with some syntactic aid. Namely:

Allow "panicky" as a declaration qualifier on functions. A function may not call a panicky function unless either it itself is declared to be panicky, or it has a panic handler which invokes no panicky functions and does not itself panic. Panic handlers can be defined within any lexical scope, though they may not use as an rvalue any in-scope variable which has not earlier been initialized, in a trivially, statically observable way, by a non-panicky execution trace. Every panicky function can return one of two ways: the normal C way, or in a way that causes the caller to "goto" the corresponding panic handler. The default panic handler simply panics. That way you can still write "x + foo()" but have a reasonable way of dealing with the fact that "foo()" can panic.

Memory failure and multi-processing

In your scheme outlined in the last paragraph, what happens when a non-panicky function causes stack overflow?

Also, regarding the idea of panic as unwinding the stack, I think this is a reasonable pun in architectures where tasks are single threaded, but additional language support is pretty useful in the increasingly common case where failure subcomponents span multiple threads.

re: memory failure and multi-processing

In your scheme outlined in the last paragraph, what happens when a non-panicky function causes stack overflow?

Unless other additions / changes were made to C, besides just introducing panic, I wouldn't expect stack overflow to be detected.

Supposing that C were modified to detect stack overflow I suppose you could have it so that if a call to a panicky function would overflow, the call is converted to a panic, otherwise to an abort or (assuming some "emergency" stack space) to a signal whose default handler aborts. (Similarly for calls to alloca.)

additional language support is pretty useful in the increasingly common case where failure subcomponents span multiple threads.

Perhaps but with no specific threading model under discussion, I'm not sure much can be said about that.

Sub-topic mirrors the thread

One of the first things novice C programmers are taught is to allocate dynamic arrays of the correctly determined size for handling input data rather than stack-allocate an array on the assumption that "it will probably be big enough". It's kind of ironic that the possibility of stack overflow is due to the language itself essentially doing the same thing as the beginner - allocating a fixed-size stack and assuming that "it will probably be big enough".

So how is reliability possible? My answer is that passing through a lot of testing under conditions that mirror the intended operating conditions of the software is (at the current state of technology) more diagnostic than formal reasoning about programs relative to those expected conditions, though this testing says little or nothing about what might happen outside of that tested operating range. The whole issue of C/C++ and security problems is somewhat related to this - testing that says software works reliably in the presence of a regular user doesn't say much about how it works in the presence of a cunning, malicious adversary, and lots of "reliable" programs are routinely discovered to have atrocious security holes.

This is a very interesting thread.

I thought I was going out on a limb when I praised error-handling locally in C, but I'm seeing that a lot of other people have had the same experience.

When you want to write truly robust programs, handling errors locally seems to work better - for a lot of us, apparently - than propagating exceptions up the call stack.

This is a result that seems to recommend against the primary reliance on exception systems common in modern language designs.

Is there a good way to study this proposition about relative usability and effectiveness of error-handling styles? How would someone set up and conduct a rigorous usability experiment to explore it, assuming access to a few classrooms full of CS undergrads and a smallish research grant?

Another form of condition handling

As Thomas points out, many so-called errors are not abnormal situations. A missing file, running out of disk space, etc. are conditions that are part of normal operation for a program and must be handled in a reasonable manner to maintain the user experience.

Since they are normal program flow, handling them locally, as C requires, makes some sense and my (non-rigorous) look at a selection of C programs shows that much of the "error" handling is in this style.

The problem with handling them locally is that a low level routine does not know what to do about the condition, so usually the best it can do is clean up and return another error code. For example if a file is not found, the low level routine doesn't know if it should print a message, show a string in a status bar, pop up a dialog or just exit.

This ripples up the call chain, resulting in a de facto unwinding of the stack that throws away state created along the way. Eventually the error propagates to a level that knows what to do about it, so now not only does that level have to handle the condition, it must also re-create all the state that was lost. Exceptions only reduce the programmer effort for the same result: they throw away state more simply, but they don't help to handle the condition.

To my mind the Common Lisp condition handling is a better method, where a low level function can tell a high level one "this failed but these are the things I know as alternatives, which would you like" and the high level routine can select whichever option suits its policy. Continuing the file open example, the options might be abort, return error code or try again with a different file name and having asked the user via the file dialog, the top level can tell the low level routine to try again using this file name instead. And no state needs to be re-created.

I notice that a few C programs have achieved somewhat the same thing without language support by using error callbacks or standard error functions that low level functions can call in case of failure, with the error function or callback signalling either a retry or a return with an error code. This does not appear to be common though (again, a non-rigorous survey).
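A sketch of that callback pattern, with all names invented: the low-level routine reports the failure to a caller-supplied handler and lets it decide between retrying (possibly with a different path) and giving up:

```c
#include <stdio.h>

enum err_action { ERR_RETRY, ERR_FAIL };

/* The handler may replace *path before asking for a retry. */
typedef enum err_action (*err_handler)(const char **path);

FILE *open_with_handler(const char *path, err_handler on_error)
{
    for (;;) {
        FILE *f = fopen(path, "r");
        if (f != NULL)
            return f;
        if (on_error(&path) == ERR_FAIL)
            return NULL;        /* the caller's policy says give up */
        /* otherwise loop and retry with whatever *path now holds */
    }
}

/* Example policy: give up immediately. */
enum err_action fail_fast(const char **path)
{
    (void)path;
    return ERR_FAIL;
}
```

This keeps policy (what to tell the user) at the top level while leaving the mechanism (the retry) at the bottom, so no state has to be re-created.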

Exceptions only reduce the

Exceptions only reduce the programmer effort for the same result, they just throw away state more simply, they don't help to handle the condition.

Unless you are using the RAII idiom in C++, exceptions do not throw away state more simply. The finally clause (edit: such as in Java) can easily make understanding programs difficult, and the code is not necessarily any easier to write than a C goto clean-up.
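For comparison, the C goto clean-up referred to above typically looks like this sketch (function and label names invented); every error path funnels through the same finalization code, in reverse order of acquisition:

```c
#include <stdio.h>

/* Open two files, do some work, and release both on every path. */
int process_pair(const char *src, const char *dst)
{
    int ret = -1;
    FILE *in, *out;

    in = fopen(src, "rb");
    if (in == NULL)
        goto out_none;

    out = fopen(dst, "wb");
    if (out == NULL)
        goto out_in;

    /* ... the real work would go here ... */
    ret = 0;

    fclose(out);
out_in:
    fclose(in);
out_none:
    return ret;
}
```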

designing graceful failure modes

This thread reminds me of why programming language design and implementation should be regarded as a basic skill that all programmers should learn:

The problem with handling [exceptional conditions] locally is that a low level routine does not know what to do about the condition, so usually the best it can do is clean up and return another error code. For example if a file is not found, the low level routine doesn't know if it should print a message, show a string in a status bar, pop up a dialog or just exit.

Such issues are one of the touchstones that distinguish good C programs from less good C programs, and that determine much of what it is like to operate a C-based environment like Unix.

By a "good C program" I mean one that has useful and coherent functionality, that handles failures gracefully, that is easy to read and maintain, that contains re-usable code where appropriate, and that "plays nicely" with other programs in the environment.

Many simple programs such as utilities designed to be used in shell command pipelines take a simple approach: (a) print an error message and immediately abort the program; (b) manage external resources so that aborting at any point is always safe. Unix contains features whose design supports this approach. Some examples: A program can create a "temp file" and immediately delete ("unlink") the file while still holding the file open. The temp file still exists but can not be seen in any directory. If the process holding it aborts, the file is automatically cleaned up. Similarly, the default behavior on Unix systems is that when a process dies, any sub-processes it has created are automatically killed, and any processes that attempt to read from or write to a dead process are signaled.
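The temp-file trick described above can be sketched like this; the name anonymous_tempfile and the /tmp template are placeholders:

```c
#include <stdlib.h>
#include <unistd.h>

/* Create a temp file, then unlink it while keeping it open: the file
 * is invisible in every directory, and the kernel reclaims it
 * automatically when the last descriptor closes - even on an abort. */
int anonymous_tempfile(void)
{
    char tmpl[] = "/tmp/scratchXXXXXX";
    int fd = mkstemp(tmpl);     /* create with a unique name */
    if (fd < 0)
        return -1;
    unlink(tmpl);               /* immediately remove the name */
    return fd;                  /* the descriptor remains fully usable */
}
```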

Sometimes those facilities aren't enough and clean-up is absolutely needed beyond what is built-in to the operating system. In these cases, the "rule of thumb" is often that any catastrophic error in a program should be handled the same way the overall system would handle the abrupt termination of the program by external forces (such as the process being manually killed, or the machine being shut off). That is why so many unix programs have external "recovery" tools to clean up after catastrophic program aborts. Note that non-local exception handling within a program will not help here because, under our rule of thumb, programs must assume that such exception handlers are not guaranteed to run.

Sometimes a program is conceptually divided into sub-programs. One example might be an HTTP server in which we want to ensure that a catastrophic failure in the handling of one request does not force the abort of the concurrent handling of other requests. Classically good C style is often to handle this case by using sub-processes. That is why, today, so many programs for which latency is a concern maintain pools of ready-and-waiting sub-processes. Note that each such subprocess can simply abort, as if it were a simple command line program.

When using subprocesses for subsystems isn't a viable solution, and subsystems must all reside in the same process, good C style most often tries to arrange that (a) a subsystem can "internally abort" at nearly any time, returning out of the subsystem with as little exception handling as possible; (b) the parent system can trivially clean up the state of the subsystem. That is one reason why you'll often see complicated C programs give subsystems pools of memory to allocate from rather than letting them use "malloc()" - so that in the event of a subsystem abort, the parent system can trivially free up all memory allocated by the failed subsystem. That is non-local exception handling of a sort but it is a very limited, trivial kind of non-local exception handling. From the perspective of the programmer writing a subsystem, he is writing a program that simply aborts, albeit not in the usual way. (It's also worth noting that CLU's automatic promotion of unhandled error returns to "failure" exceptions is ideal for this kind of situation - that is one reason why it would fit so nicely into C.)
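The pool idea can be sketched as a bump allocator (all names invented; a production pool would grow in chunks rather than being fixed-size): the subsystem allocates only from its pool, and after a subsystem abort the parent reclaims everything with one call:

```c
#include <stddef.h>
#include <stdlib.h>

struct pool {
    char  *base;
    size_t size, used;
};

int pool_init(struct pool *p, size_t size)
{
    p->base = malloc(size);
    p->size = size;
    p->used = 0;
    return p->base != NULL ? 0 : -1;
}

void *pool_alloc(struct pool *p, size_t n)
{
    n = (n + 15) & ~(size_t)15;     /* keep allocations aligned */
    if (p->used + n > p->size)
        return NULL;                /* the subsystem's cue to abort */
    void *mem = p->base + p->used;
    p->used += n;
    return mem;
}

void pool_destroy(struct pool *p)   /* the parent's one-call cleanup */
{
    free(p->base);
    p->base = NULL;
    p->size = p->used = 0;
}
```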

You (Lex) also pondered the questions of information loss, error reporting, and so forth. A traditional exception mechanism (as in C++ or Java) makes elaborate provisions for passing arbitrary information from the point of an error to the point of its handling. Good C programs generally avoid relying on such facilities (because of that rule of thumb: any error can be treated as if the process or subunit was externally killed or the machine's power switched off). As an alternative, many good programs use logging facilities instead.

The problem with handling them locally is that a low level routine does not know what to do about the condition, so usually the best it can do is clean up and return another error code. For example if a file is not found, the low level routine doesn't know if it should print a message, show a string in a status bar, pop up a dialog or just exit.

These kinds of issues are most often handled in good C programs by parameterizing components (as you noted). For example, if a subsystem should request the name of a file to open but should do so differently when run in batch mode, interactively at a terminal, or interactively in a GUI - the subsystem is often best designed to be configurable with the parent system's choice of file prompting routine. (To call this a "callback" is a bit misleading, in my way of speaking, because a callback is a function parameter associated not with a subsystem, but with some instance of a multiply-instantiated data structure.) Non-local exception handling for cases like this is usually overkill. For example, one does not often need to be able to dynamically shadow the handler function posted by some parent on the call stack.

There are some situations where none of the above habits of good C programs help and this brings me to point about why programming language design and implementation should be regarded as a basic skill:

In some kinds of program, the desired kinds of error handling and recovery fit none of these patterns. In some of those kinds of program, there is no clean way to express the desired error handling in C. What then?

Two example problem domains come to mind: error handling in a shift-reduce parser, and error handling in a system supervisory program that must robustly manage a large number of sub-processes that have complicated relations among themselves.

In each case, the traditional solution in unix / C-based environments is the same: to invent a domain specific language. Thus, for parsers there is YACC and for system supervisory tasks, /bin/sh.

Good C programs generally treat the problems of complex error recovery in the manner of the old joke: "Doctor, it hurts when I move my arm like this." "Well, patient, don't move your arm like that." Good C programs mostly just avoid the problem.

When the problem really, really can't be avoided, that's often an indication that the most viable solution is to build a new programming language atop C.

Thank you for this

This is EXACTLY the point I was trying to get across to the posters in the Go-nuts mailing list panic/recover thread, using a Parsing Expression Grammar language as an example of how to re-think the problem. [Edit: It's also the point made by DJB in the comments I made above about what reliability means.]

Sometimes a program is conceptually divided into sub-programs. One example might be an HTTP server in which we want to ensure that a catastrophic failure in the handling of one request does not force the abort of the concurrent handling of other requests. Classically good C style is often to handle this case by using sub-processes. That is why, today, so many programs for which latency is a concern maintain pools of ready-and-waiting sub-processes. Note that each such subprocess can simply abort, as if it were a simple command line program.

It's interesting how the Go people describe their solution to the same problem...

Maybe the advantage is nothing

We appear to be in furious agreement that what is most important is to design a coherent and appropriate (the key word here) error handling approach.

Small Unix utilities can safely use the "roll over and play dead" approach.

Other applications would not consider such behaviour appropriate: the power station controller, the stockbroking system or the compiler that stops at the first error all significantly degrade the user experience by just exiting ;-).

I would propose the theory that good C programs meet reasonable reliability criteria because error handling is a significant part of their design. This is because the programmer recognises that it is part of the normal operation of the program, not somehow separate.

Where C assists is that it doesn't restrict what method (or even combination of methods) can be implemented, and its main mechanism (returning a code) is naturally part of the normal code flow. This minimalism of mechanism also makes the programmer think about the overall implementation, since, as has been observed, testing return values on every call is labour intensive and boring.

Whereas languages with built-in mechanisms may lead programmers to default to that mechanism, even if it isn't the most appropriate. That's something that might be testable with a supply of undergraduates, and it might be good to compare against experienced programmers too.

And C allows hiding that implementation (perhaps by DSL, perhaps by idiom) so that it is easy to use correctly and consistently, which adds considerably to the resulting quality.

On the specific mechanism of callbacks (using the term in the traditional Unix/C meaning of any function pointer parameter) one point that I failed to make is that it is also cheap compared to exceptions and many other non-local mechanisms, just a function call.

why no industrial successor to C?

What surprises me is that even in the large niche occupied by C (embedded software, large server programs, operating system kernels, drivers, implementing garbage collectors and higher-level runtime systems, ...) no popular successor to C has appeared.

Of course, there have been several academic proposals (like BitC, Cyclone, perhaps even C++, and also intermediate languages like LLVM, C-- and others) but none has been really successful.

As a case in point, AFAIK, every hardware driver for Linux or other Unices is still coded in C; many HTTP servers also, etc.

Perhaps the strangest thing is that most C compilers are still coded in C (or perhaps C++, as in LLVM), even if C is not a good language for symbolic processing. (Xavier Leroy's formally verified CompCert is a brilliant academic exception, coded in OCaml and proved with Coq.)

And even though C is quite often used as an intermediate language (it is not very good for that) - I am even using it that way in GCC MELT - no major C code generator library has appeared.

Even more strangely, while C++ claims to be compatible with C - this being its leitmotiv for a long time - little software has evolved incrementally from C to C++ (and coding new routines in C++ for big C systems, like e.g. the Linux kernel, the Xorg window server, or several HTTP servers, is not widely done).

Also, with the major exception of C++, few popular languages have embraced the C legacy by claiming incremental compatibility with it. I find it strange that there have not been many popular C successors. Even more surprisingly, the C preprocessor did not evolve significantly.

On the other hand, current hardware has made possible many language improvements (garbage collection, dynamic languages) that were difficult to use on 1980s-era computers.


C could die with multiprocessing....

I forgot to mention that perhaps C might die when many-core (e.g. hundred-core) processors are routinely available in every desktop or laptop.

But I am not sure of that. Maybe (unfortunately and sadly) C will still be widely used in 2025.

Nobody has really tried

None of the efforts you list, with the notable exception of BitC, seriously set out to replace C. Walter Bright's work on D is also worth mention as a serious attempt, as is Go. Both D and Go have the merit of being simple, and the demerit of not attempting to bring modern PL technology to modern programming. Time will tell if that was a good call on Walter's and Rob's part (or, conversely, on mine).

The projects you name were research efforts whose primary objective was to explore PL design issues. It is possible for such projects to transition to commercial use, but it takes an overwhelming amount of effort, and it takes a commitment from key project participants to do the other 90% of the work. C++ made that transition in part because a large commercial effort was put behind it, and in part because Bjarne had the latitude to put in the other 90% of the effort.

C is very well tuned for the niche that it occupies. Perhaps because none of the other efforts have built production compilers, none of them have shown performance that is really comparable to that of C.

BitC has had an industry transition path from the beginning, and it was always intended that we would build a serious commercial compiler for it. Whether that actually happens remains to be seen; unrelated events have certainly interfered at various times.

Even if successful, BitC won't replace C. There is a large body of C code out there that works just fine as-is, and a larger body that should be abandoned in place (just as there is in every other language). Also, the target has evolved. Where the original goal in 2003 was "safety with the performance of C", the majority of code today is written in Java/C#. This means that safety has been accepted, and pulling the C community into the 20th century [sic] may not be the right focus.

Within the strategizing group, the discussion about BitC has been mainly about how to be callable from C and how to transcode the short list of critical libraries into a safer and more analyzable language. Most of the C code we want to talk to is "mostly safe by idiom". It's probably cheaper to deal with that code in-place or convert it and just leave the rest to die. The real challenge is how to maintain something like OpenSSL in two languages simultaneously. We don't have a good answer for that yet.

In any case, if you want to look for a replacement for C, the candidates today are D, Go, and (perhaps) BitC.


As I am currently working in a Microsoft-financed job verifying parts of the Windows kernel, I think that people interested in reliable C programs might also be interested in VCC.