## Social Processes and Proofs of Theorems and Programs

A paper mentioned in the discussion forum, by Richard A. De Millo, Richard J. Lipton, and Alan J. Perlis, 1979.
It is argued that formal verifications of programs, no matter how obtained, will not play the same key role in the development of computer science and software engineering as proofs do in mathematics. Furthermore, the absence of continuity, the inevitability of change, and the complexity of specification of significantly many real programs make the formal verification process difficult to justify and manage. It is felt that ease of formal verification should not dominate program language design.

### The dialogue has indeed been going on for a while...

"Proofs of programs are too boring for the social process of mathematics to work." — Richard DeMillo, Richard Lipton, and Alan Perlis, 1979

"... So don't rely on social processes for verification." — David Dill, 1999

As quoted in Types and Programming Languages.

### Two major holes in the argument

This is a pretty interesting and entertaining article.

I have much sympathy for criticisms of formal verification, particularly the concern that proofs only match programs to specifications, which means they are only useful if the specifications are easier to write and understand than the programs themselves.

Nonetheless, I see two major holes in the article:

1. "There is no reason to believe that a big verification can be the sum of many small verifications."

I disagree; to disregard this possibility is to ignore modularity and the separation of concerns, which are foundational concepts of programming.

2. "It is nothing but symbol chauvinism that makes computer scientists think that our structures are so much more important than material structures that (a) they should be perfect, and (b) the energy necessary to make them perfect should be expended."

I am opposed to symbol chauvinism; translating a program from source code into Greek letters does not increase its correctness. But there is another reason to want correctness rather than reliability: computer security. It is here that the analogy to engineers building bridges breaks down. Computer programs are subject to randomness (e.g. random errors in input data) just as bridges are subject to weathering, and in such cases probabilistic arguments are useful. But computer programs are also subject to deliberate attack in a way that bridges seldom are. If a terrorist blows up a bridge, no one blames the bridge designer. But because of the ubiquity of computers, the low cost of attack, and the high incentive to attack, we need programs to be impervious to attack.

Indeed, with our ever-increasing dependence on computers there are correspondingly ever-increasing opportunities for financial and political gain from attacking them. Constructing correct programs out of correct components with clearly defined boundaries of responsibility may be our only hope.

### <scooby_do>Hruh?</scooby_do>

ping, quoting "Social Processes and Proofs of Theorems and Programs": "There is no reason to believe that a big verification can be the sum of many small verifications."

All by itself, to me, this makes the entire paper of extremely questionable value. My immediate reaction to it is given in the subject. Upon further reflection, it occurs to me that the only reasonable way that people as intelligent and experienced as the authors could adopt such a position is, in fact, given in the title: "Social Processes..." but the means of combining the small verifications in the post-Curry-DeBruijn-Howard-Isomorphism era isn't "social" in the sense of "informal;" it's "social" in the sense of the Jim Horning quote that I used in the Monad Reader thread: "A Proof is a repeatable experiment in persuasion." In the context of automated theorem proving, we absolutely expect to be able to proceed by ensuring that our "small verifications" are indeed composable into large ones. Thankfully, this is essentially guaranteed by any proof assistant from Automath on, so I am forced to the conclusion that the critique refers exclusively to the mental and paper process of verification... which, in turn, leads to observations like David Dill's.

"It is felt that ease of formal verification should not dominate program language design" is a perfectly reasonable conclusion from a couple of points of view: one, given that "formal verification" means someone slaving over a paper notepad for a very long time; and two, that "ease" of such verification really should be irrelevant to the design (if it's hard work to verify a useful programming language, tough). But if a new programming language is going to solve tough problems of security and concurrency, it seems to me that formal verification by automated means is likely to be not only helpful, but crucial.

### Reliability vs security ?

I'm not quite sure about your distinction between reliability and security; your arguments become metaphorical [1]. From the viewpoint of an isolated system ( e.g. a server ) running software, an attack is also just random input. If hosting programs is an issue, a shared-nothing message-passing approach reduces the problem of security mostly to one of reliability ( including surveillance, logging, recovery etc. ). Of course there must be secure and encrypted transactions and safe key storage. But this also implies secure hardware, and it covers only a very small part of the overall system.

[1] IMO the bridge metaphor fails for a simple reason. A bridge in our material reality is always a kind of infrastructure bottleneck. If a bridge could be split into myriads of parallel independent pathways, a terrorist attack on it would cause only minor damage. Indeed, we would expect self-healing to become a major effort in dealing with both attacks and weathering in that case.

### Reliability, security, privacy, secrecy, etc.

From the viewpoint of an isolated system ( e.g. a server ) running software, an attack is also just random input.

That's not a particularly rich adversary model.

Using a CRC will defend against (most) random input perturbations, but won't do a lick of good against a malicious adversary. I'm sure an example a bit closer to PL and system design can be imagined with just a bit of thought that will show a clear separation between random inputs and malicious attackers.
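The CRC point can be made concrete with a small sketch. The following is my own toy illustration (an additive checksum standing in for a real CRC, and a keyed tag standing in for a real MAC; all names are mine): a random one-character corruption is caught by the checksum, but a malicious adversary simply recomputes the checksum for the forged message, and only a secret key blocks that.

```haskell
import Data.Char (ord)

-- Toy additive checksum standing in for a CRC. (A real CRC is
-- polynomial division, but the security argument is identical:
-- anyone can recompute it.)
checksum :: String -> Int
checksum = sum . map ord

verify :: (String, Int) -> Bool
verify (msg, tag) = checksum msg == tag

-- Toy keyed tag standing in for a MAC: the adversary cannot
-- recompute it without the secret key.
keyedTag :: String -> String -> Int
keyedTag key msg = checksum (key ++ msg)

verifyKeyed :: String -> (String, Int) -> Bool
verifyKeyed key (msg, tag) = keyedTag key msg == tag

main :: IO ()
main = do
  let original = "transfer $10 to Bob"
      tag      = checksum original
  -- Random corruption: one flipped character is detected.
  print (verify ("transfer $10 to Boc", tag))
  -- Malicious adversary: recompute the checksum and the forgery passes.
  print (verify ("transfer $99 to Eve", checksum "transfer $99 to Eve"))
  -- A keyed tag rejects the forgery, since the key is unknown.
  print (verifyKeyed "secret" ("transfer $99 to Eve", tag))
```

The separation is exactly the one between random perturbation and a rich adversary model: the checksum defends against the former, only the keyed construction against the latter.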

Of course all of this leads to a related thought that comes back to the big/small verification argument, and that is the role of crypto, etc. in the building of "secure" programs. Indeed, there is astonishingly little proven about cryptography where the rubber meets the road, and there isn't even that much out there proven even to the extent of "if AES is a secure block cipher, then the following properties hold for this protocol." But even where there is, you still need to rely on your social processes and assembly of small verifications to create any real sense of verifiable security in your product.

### True enough...

...but even here there's been remarkable progress, albeit much of it very recent:

And no doubt many others, but these endeavors seem quite promising and intriguing, e.g. for "Verification of cryptographic algorithms," we find:

We have pursued our work on the formalization of the generic model (GM) and the random oracle model (ROM) in COQ. The aim of this work is to verify the correctness of cryptographic algorithms, without making the perfect cryptography assumption.

In [11], we have shown an upper bound to the probability of a non-interactive adversary (that adheres to the GM) to find a certain secret. We have also extended this result to the case of an interactive adversary (that adheres to the GM and to the ROM), and are currently considering the case of parallel attacks.

### QuickCheck in Haskell Syntax

I've always thought that maybe languages should have proving of individual functions as part of the syntax. I don't see why a function that is provably wrong should compile. By wrong I mean "can break under automated test", where passing can also translate to better security, et al. (Since testing a string func can mean giving it a char * that isn't terminated, for example.)
I also feel IDEs should do more than just tell me that strlen actually, believe it or not, takes a char *. Maybe they should be subjecting functions to torture, to prove them correct, rather than completing "int" for me.

But are these even practical? Or do they exist, and I am just not looking?

```haskell
-- EDITED!!! See reply below.
-- Sample Haskell where the compiler (some stage, that is) taketh care of testing
-- 'Tis as easy as type notation, and maybe could be inferred.
-- It is ugly, because I'm coming up with it on the fly. :o)
{-TEST:: [..0] -> 1, 5 -> 120, [6 ..] -> > 120 -}
fac :: Int -> Int
fac x | x < 1     = 1
      | otherwise = x * fac (x - 1)
-- Of course fac is undefined for < 1, but I am not trying to write
-- a correct fac for the (n + 1)th time. Just to show something.
-- So much for correctness tests ...
```

PS: I don't mind five-hour compiles for "Hello, World!" as long as my a.out won't be wrong. And if the compiler has the --make flag, which considers testing as part of making ... ;o)

Next question: how do you test for a correct return of an "undefined" value?
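In the meantime, the TEST pragma above can be approximated without any compiler support. Here is a minimal sketch (plain Haskell, no QuickCheck; the names specFac and checkFac are mine): each clause of the pragma becomes a boolean property checked over sample inputs.

```haskell
-- fac as posted above.
fac :: Int -> Int
fac x | x < 1     = 1
      | otherwise = x * fac (x - 1)

-- Hand-rolled stand-in for the {-TEST::-} pragma: each clause
-- becomes a property checked over sample inputs.
specFac :: [Bool]
specFac =
  [ all (\n -> fac n == 1)  [-5 .. 0]      -- [..0] -> 1
  , fac 5 == 120                           -- 5 -> 120
  , all (\n -> fac n > 120) [6 .. 10]      -- [6 ..] -> > 120
  ]

checkFac :: Bool
checkFac = and specFac

main :: IO ()
main = print checkFac
```

A build step that runs this and refuses to link on False is a crude version of the "testing as part of making" idea; QuickCheck generalises it by generating the sample inputs for you.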

### You could use the Maybe

You could use the Maybe type, but then, it'd probably be handier to use a more specific type, i.e. an integer x such that x > 0.

Oh, and your function returns 0 for all inputs ;-)

Damn. I am correcting it ...
So much for automated testing. At least the compiler should have told me if I had compiled (the hypothetical one I talked of).
For those of you who missed the typo: I had returned 0 in the case that x < 1.
Funny, I had bragged about how many correct factorial functions I have written. :oD

### factorial

Just so you know, the factorial of 0 is actually 1, so your version is still incorrect.

edit: Whoops, just realized you defined it as < 1, not = 1. Nevermind :)

### There are two...

...languages that do this that I'm aware of. I wrote about Felix 1.1.3 here. It's interesting because you get both Why output that you can feed to Why and any of its supported back ends (including Coq) or feed directly into Ergo, since Ergo accepts Why syntax; and a "check" statement that will automatically generate test cases for the axioms. Pretty cool.

The other is Concoqtion, where MetaOCaml and Coq are married and, at least in theory, you get the best of both worlds. As J Storrs Hall wrote:

This will enable me to fulfill the lifetime dream of writing

Qed.

as a valid statement in my program :^)

### Automated testing integrated into the language

You could always peek in the direction of Eiffel, which has had Hoare triples built into the language from the very outset. You can use the contracts as test oracles for automated random testing via AutoTest, which provides various test data generation strategies (random selection, genetic algorithms, etc.) and tries to build minimal test scripts for test failures (and then folds those back into the set of unit tests for regression testing). Better yet, there are plans afoot to integrate AutoTest into EiffelStudio, which will give you push-button support for torture-testing code directly from the IDE.

The biggest hole in Eiffel's toolkit, to my mind, is the lack of extended static checking that makes use of the contracts. There are projects for that too, in particular ES-Verify as part of ESpec, but they're still in their infancy. What I'm looking for is something on par with ESC/Java2 combined with the ESC/Java Eclipse plugin.
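The contract-as-oracle idea isn't Eiffel-specific; it can be sketched in a few lines of Haskell (my own encoding, not AutoTest's: contract, checkedIsqrt, and oracle are hypothetical names). A function carries its pre- and postcondition, so running it over generated inputs turns the contract into an automated test oracle.

```haskell
-- Eiffel-style contract: precondition and postcondition travel
-- with the function, so any input generator can check them.
contract :: (a -> Bool) -> (a -> b -> Bool) -> (a -> b) -> a -> Either String b
contract pre post f x
  | not (pre x)        = Left "precondition violated"
  | not (post x (f x)) = Left "postcondition violated"
  | otherwise          = Right (f x)

-- Integer square root via floating point (fine for small inputs).
isqrt :: Int -> Int
isqrt n = floor (sqrt (fromIntegral n :: Double))

-- Contract: require n >= 0; ensure r*r <= n < (r+1)*(r+1).
checkedIsqrt :: Int -> Either String Int
checkedIsqrt = contract (>= 0) (\n r -> r * r <= n && n < (r + 1) * (r + 1)) isqrt

-- AutoTest in miniature: run the contracted function over a batch
-- of inputs and collect the contract violations.
oracle :: [Int] -> [(Int, String)]
oracle xs = [ (x, e) | x <- xs, Left e <- [checkedIsqrt x] ]

main :: IO ()
main = print (oracle [0 .. 100])   -- no violations expected
```

Extended static checking, ESC/Java2-style, would then try to discharge the same pre/post pairs at compile time instead of probing them with generated inputs.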

### "Proof as program" analogy & fitting the formal to intuition

The authors make this analogy (fig. 1&2, p. 275) in a manner that I think reveals what is wrong with this common popularisation of the formulae-as-types correspondence.

I think that the right analogy for the authors to have made is that grasping why a program does what it is supposed to is analogous to seeing the truth of a theorem.

The other major error in the paper is a complete failure to see the potential to devise formal frameworks that are a better fit for human intuition than the kind of first-order Hilbert systems they effectively skewer in the paper. There is a great deal of opportunity for improvement in these systems, but I don't know of any theoretical obstacle to making computer assisted formal proof assistants as productive an environment as pen & paper/whiteboard informal rigour. Whether such obstacles exist or not, it seems to me, is the key to the whole question.

### I remember having a nice

I remember having a nice chat with Lipton after he gave a talk on the subject lo these many years ago (when I was a grad student). I made the tongue-in-cheek observation that there wasn't really a better way of specifying the requirements formally than simply to write the desired program in APL. I was a bit surprised when he agreed, perhaps also with tongue in cheek.

It seemed to me then that if one has the technology to write formal specifications at an abstract enough level that you're more likely to get them right than the program itself, you should in principle be able to implement a language at that level and let the specs be the program. I still think there's a lot of unrealized potential to the idea.

The main problem though, and one where I think DeMillo, Lipton, & Perlis still have a valuable message, is that the hard part is getting from intuitive understanding to a formal specification in the first place. The average kid in a math class finds "word problems" harder than long division. It's a phenomenon I call "formalist float" in this book and offer as one major reason AI has had such rough going for the past 50 years.

The problem with trying not to rely on social processes is that that's where the specs come from in the first place. It's worth giving at least a little thought to mechanizing them in a heuristic rather than a formal way.

Josh

### Exactly right!

J Storrs Hall: It seemed to me then that if one has the technology to write formal specifications at an abstract enough level that you're more likely to get them right than the program itself, you should in principle be able to implement a language at that level and let the specs be the program. I still think there's a lot of unrealized potential to the idea.

You've hit the same nail on the head that Curtis W did in this thread. We can perhaps summarize the question as "Why can't we write executable specifications?" Well, why indeed! As I attempted to explain there, thanks to constructive logic and the Curry-DeBruijn-Howard Isomorphism, a "proof assistant" such as Coq really is a programming language. It's just a quite impractical one along a number of dimensions, e.g. it has no I/O capacity of its own. It's also even more austere than Haskell, lacking the "unsafe______" family of operators, the IO monad, etc. Still, Adam Chlipala isn't sure that these observations necessarily disqualify Coq from being practical for dependently-typed programming as-is.

However, as you may recall, Concoqtion marries Coq with MetaOCaml to arrive at a much nicer "blending" of formal specification and "reasonable" programming language. I haven't yet installed it, but perhaps it's time to.

This also seems related to the thread on Total Functional Programming.

### Right, so far as it goes

Though I think a problem with very powerful logics, such as the calculus of inductive constructions, is that the proofs are quite apt to correspond to truly, madly, deeply infeasible programs.

The work on executable specifications coming out of the "proof search as computation" crowd (who mostly have more sympathy for logic programming than for functional programming), I think tells a more compelling story, with frameworks such as LLF (linear LF) and its descendants, and the OBJ/Maude family.

### Good Point

With Coq, I get the impression that you do need to take some care in choosing between relying on "proofs-as-programs"—that is, extraction of a program developed as a proof—and merely "writing code about which desirable things have been proven." I think this supports your point, because, after all, why should such a distinction exist? In fact, conceptually, the only distinction is how much computation vs. how much logic is being done.

It's interesting that you mention OBJ/Maude; Maude is another tool that I keep meaning to investigate, particularly given that the Open Calculus of Constructions has been preliminarily implemented in it.

But for Coq users like me, there's even better news in Building Decision Procedures in the Calculus of Inductive Constructions:

The (intuitionist) logic on which Coq is based is the Calculus of Constructions (CC) of Coquand and Huet [9], an impredicative type theory incorporating polymorphism, dependent types and type constructors. As other logics, CC enjoys a computation mechanism called cut-elimination, which is nothing but the β-reduction rule of the underlying λ-calculus...

The traditional view that computations coincide with β-reductions suffers several drawbacks. A methodological one is that the user must encode other forms of computations as deductions, which is usually done by using appropriate, complex tactics. A practical consequence is that proofs become much larger than necessary, up to a point that they cannot be type-checked anymore...

Our main contribution is the definition and the meta-theoretical investigation of the Calculus of Congruent Inductive Constructions (CCIC), which incorporates arbitrary first-order theories for which entailment is decidable into deductions via an abstract conversion rule of the calculus. A major technical innovation of this work lies in the computation mechanism: goals are sent to the decision procedure together with the set of user hypotheses available from the current context. Our main result shows that this extension of CIC does not compromise its main properties: confluency, strong normalization, coherence and decidability of proof-checking are all preserved. As a second key contribution, we show how a goal to be proved in the Calculus of Inductive Constructions can actually be transformed into a goal in a decidable first-order theory. Based on this transformation, we are currently developing a new version of Coq implementing this calculus.

This sounds pretty interesting!

### OCC - An Open Calculus of Constructions

Maude is another tool that I keep meaning to investigate, particularly given that the Open Calculus of Constructions has been preliminarily implemented in it.

Oh, good call! For me, this is the better of the two pieces of news you provide. Mark-Oliver Stehr did important work in his PhD thesis on reconciling logical frameworks based on LF with Maude, which is based on rewriting logic. Now I see he is tackling another important family, that based on dependently-typed generalisations of system F.

### It's gratifying...

...that you found that post helpful!

I neglected to note a crucial connection that's noteworthy, however. From the Open Calculus of Constructions link:

Also, the idea of overcoming these limitations using some combination of membership equational logic with the calculus of constructions has been suggested as a long-term goal by Jouannaud 1998.

Jouannaud is, of course, the second author of the CCIC paper that I linked to, and 1998 was nine years ago, so I guess "long-term" is 9-10 years. :-) Given this connection, the OCC and CCIC are, in some sense, aware of each other and tackling the same issues. It will be fascinating to see how they relate to each other, both formally and pragmatically.

### Quite!

I neglected to note a crucial connection that's noteworthy, however...

Do I hear the sound of walls that come tumbling down?

### Wholeheartedly agreed...

I believe that we both do, and we're in good company. From Twenty Years Later (PDF):

My own perspective on this question is that the coming years will see a new generation of proof assistants, in which (higher-order) rewriting superseeds the lambda calculus. Isabelle is the first prover of this kind, but has lost many of the important features of Curry-Howard based calculi. I anticipate both approaches to merge in the coming years, and some work has been done already [9, 11]. An other merge is coming as well: dependent types are making their way in programming languages [55], while modules and functors have been successfully added to the calculus of inductive constructions [18] as well as a compiler for reductions [29, 30]. This move towards harmony will make its way through in the coming years. I do not see a good reason in the present dichotomy between programming languages and proof assistants.

It seems to me that meditating on Maude/OCC, Coq/CCIC, and Epigram will greatly inform the likely future of programming and theorem proving.

Update: Unfortunately for me, OCC/Maude appears not to have been updated since 2004. It's based on an experimental binary of Maude 2.0, which is made available for Linux. Whatever the experimental features are, they have apparently not made it into Maude 2.3—I tried loading occ.maude into Maude 2.3 on my Mac OS X PowerPC box and it wasn't pretty. :-) So I continue to believe that we'll see this convergence of the lambda calculus and rewriting... but I believe we'll see it in Coq first.

### Optimization

Well, people have been working on deductive program synthesis since the Manna and Waldinger days. And it's certainly looking like the field is poised to make a significant stride towards usability.

I think, though, that DPS, and PLT generally, are held back by a lack of appreciation for the value of elegant design of useful datatypes -- ones with wide applicability, clean and consistent semantics, and a well-matched set of primitive operations. PLT has generally punted on the issue, tossing it over the fence in the form of objects and other facilities for applications programmers to roll their own.

But your average app. prog. is not a particularly good designer of clean mathematical structures, and as a result most programs are a mass of special cases. This forms a major snag to a synthesis system (as well as making the job of the human programmer a lot harder).

If one wants total functional programming, for example, rather than trying to decide which number to assign as the result of dividing by zero, one could go to a semantics where each operation had a solution set, and that for 1/0 would be {}. (It also gives you a semantics with a nondeterministic flavor.) Like a naive Prolog program, it might not be efficient to execute directly, but it could form the basis of a formal specification that a synthesis system could turn into a working program.
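The solution-set semantics can be sketched with lists as a poor man's set of results (my encoding, not the poster's; divS, sqrtS, and example are names I made up): each operation returns the set of its solutions, division by zero is simply the empty set, and composition behaves like naive Prolog search.

```haskell
-- Solution-set semantics: an operation returns the list of its
-- results. 1/0 has the empty solution set {} rather than some
-- arbitrarily assigned number.
divS :: Int -> Int -> [Int]
divS _ 0 = []                 -- no solutions: division by zero
divS a b = [a `div` b]

-- Nondeterministic integer square root: two solutions, one, or none.
sqrtS :: Int -> [Int]
sqrtS n = [ r | r <- [-abs n .. abs n], r * r == n ]

-- Composition via the list monad behaves like naive Prolog search:
-- an empty solution set anywhere prunes that branch.
example :: [Int]
example = do
  q <- divS 12 4     -- {3}
  sqrtS (q + 1)      -- {-2, 2}

main :: IO ()
main = print example
```

As the comment says, this is inefficient to execute directly, but it is exactly the kind of declarative specification a synthesis system could refine into a deterministic program.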

The key to doing this lies in having the inferencing system think not only about correctness but efficiency -- something I don't think has seen a lot of work yet.

Josh

### Another Good Point...

...but here's a start!

### perpetuum mobile

In comment-33584, J Storrs Hall wrote:

It seemed to me then that if one has the technology to write formal specifications at an abstract enough level that you're more likely to get them right than the program itself, you should in principle be able to implement a language at that level and let the specs be the program. I still think there's a lot of unrealized potential to the idea.

Also sprach Joel:

... this is the software engineering equivalent of a perpetual motion machine. It's one of those things that crackpots keep trying to do, no matter how much you tell them it could never work. If the spec defines precisely what a program will do, with enough detail that it can be used to generate the program itself, this just begs the question: how do you write the spec? Such a complete spec is just as hard to write as the underlying computer program, because just as many details have to be answered by spec writer as the programmer. To use terminology from information theory: the spec needs just as many bits of Shannon entropy as the computer program itself would have. Each bit of entropy is a decision taken by the spec-writer or the programmer.

So, the bottom line is that if there really were a mechanical way to prove things about the correctness of a program, all you'd be able to prove is whether that program is identical to some other program that must contain the same amount of entropy as the first program, otherwise some of the behaviors are going to be undefined, and thus unproven. So now the spec writing is just as hard as writing a program, and all you've done is moved one problem from over here to over there, and accomplished nothing whatsoever.

(Emphasis mine — V.N.)

### Personally, I think you left

Personally, I think you left out the most important part of Joel's essay:

Currently, in the battle between the geeks and the suits, the suits are winning, because they control the budget, and honestly, I don't know if that's such a bad thing. The suits recognize that there are diminishing returns to fixing bugs. Once the software hits a certain level of quality that allows it to solve someone's problem, that person will pay for it and derive benefit out of it.

"Diminishing returns," by definition, takes into account the cost of fixing bugs. The discussions that take place here about the value of static type systems, theorem provers, etc. all take place in the context of such tools becoming vastly cheaper to use over time, and expanding the contexts in which they are used. Indeed, one of the major points of dwelling on the Curry-Howard Isomorphism is precisely to leverage the insight that a static type system is a theorem prover by definition, so why not take advantage of it by implementing type systems that express more powerful theorems while still being tractable (or not—the debate over total vs. partial functions on types in dependent type systems rages on)?

Joel writes compellingly on the amount of work that must be done in order to develop proofs in Dynamic Logic even for trivial theorems, and how error-prone that process is. Thankfully, he also reminds us that that was 16 years ago. The field hasn't stood still. Today we have a wide range of excellent proof assistant tools that dramatically reduce both the effort of developing proofs and the error rate in the proofs themselves. We even have provers whose core calculus has itself been proven correct, making one's confidence in the correctness of its proofs extremely high.

Finally, much work has been done on the automated verification of imperative code; please see the Why platform and its related tools, Caduceus and Krakatoa, for the verification of C and Java code, respectively, using a wide variety of both automated provers and proof assistants. Caduceus and Krakatoa are especially interesting, as they proceed from C or Java source code, respectively, annotated with JML(-inspired, in Caduceus' case), generating Why code which can then be verified with any of the provers supported by Why.

So while there is still quite a bit of progress to be made, the situation is rather obviously not as bleak as Joel paints it based on the state of the art 16 years ago. The gap between what can be assured using a mainstream programming language today and what can be assured by provers at the current state of the art is enormous. The cost of applying such provers decreases every year, and a fairly obvious inflection point occurs when the "prover" is a relatively rich type system in your chosen programming language, or at least relatively expressive annotations added to a mainstream language like C or Java. Particularly when you consider the latter examples, it becomes quite clear that "So now the spec writing is just as hard as writing a program, and all you've done is moved one problem from over here to over there, and accomplished nothing whatsoever" is turned on its head: the program writing is at least as hard (that is, has as much or more entropy) as writing a spec, and all you've done is amortized the cost of reducing the entropy. If you're lucky, you've amortized the cost sufficiently to survive and even make a profit, and if the alternative is months or years of pencil-and-paper proving, then that is indeed not a difficult call to make. If, on the other hand, verifying the Schorr-Waite algorithm in C (PostScript), including:

• No null-pointer dereferencing
• No out-of-bounds array accessing
• No dereferencing of pointers that do not point to a regularly-allocated block
• Everything outside the graph structure is unchanged
• The implementation terminates
• The implementation is correct

takes five weeks, doesn't that seem like it's worth it, relative to attempting to ensure all of that by testing and debugging?

### No Silver Bullet

As I read it, part of your argument is that though the essential complexity cannot be eliminated by choosing a better language (prover/specification tool/etc.), there is a big deal of accidental complexity suspected to exist in the programs written in the current languages of choice (where both complexities are as defined by Fred Brooks in No Silver Bullet).

The other part seems to be empirical evidence for this suspicion.

### Well Put

I think that's exactly it: a formal spec, once you identify an appropriate logic in which to express the spec, is vastly more succinct than the implementation code is likely to be. This has, it seems to me, two implications: one is indeed that an "executable spec" is likely to encounter all kinds of integration and deployment issues in practice, and these are real costs that matter. The other, however, is that it's far less costly to evolve a concise formal specification than it is to evolve (informal) code. The case study of the Schorr-Waite algorithm in C is fascinating precisely because it addresses both issues: the implementation is in C, but annotations in the form of comments express constraints on the implementation that C alone cannot express, and these constraints were proven to hold in five weeks, including some false starts in attempting to use a fully-automated prover and the usual evolution of the spec as lessons were learned. Five weeks does indeed sound like a very practical amount of time relative to the level of testing and debugging that would have been required instead, and the result has the benefits of being a proof: if the implementation fails, it will be due to bugs in the C compiler, runtime, hardware... or, vastly less likely but still not impossible, bugs in Caduceus, Why, or Coq. But one should absolutely look at the C compiler, runtime, and the hardware it's running on first, should issues arise.

### The right tools can

The right tools can potentially allow us to go from executable spec to an efficient implementation that's known to meet the spec by way of consistent refactoring, too. If UI code weren't so tedious to write I'd have a crack at a basic tool for that kind of thing, albeit probably for a much simpler language initially - I've ended up wanting it for tasks like optimising a clear, deeply-embedded implementation of a DSL into a fast, shallowly-embedded one.

### The right tools probably can

The right tools probably can allow us to simplify writing UI code?
Along the lines of writing the program for writing the program...

### Definitely. The main problem

Definitely. The main problem with this one is that I'd need a custom widget for structure editing though - thankfully I've just been offered some code to play with! If everyone feels free to prod me about this once in a while I'll see what I can do; odds are an initial proof of concept'll be for a simple toy language though.

### Agreed

One of the reasons that I think FRP programming for GUI work is compelling is precisely that it makes GUI software engineering amenable to the same algebraic reasoning that we FP types like to apply to other arenas.

### The Silver Bullet - how the werewolf survived

I tend to think the distinction between accidental and essential complexity is purely rhetorical. People love to reduce complexity, and since it can't be eliminated from programs (No Silver Bullet), a new binary distinction is invented that serves to preserve everything as it is (the search for the Silver Bullet) while the attack is directed only at the "accidental complexity" (one side of the distinction).

### Falsified

This observation would seem to me to be trivially falsified by the existence of programming languages that are Turing equivalent, but in one language, you can prove the absence of undesirable properties ahead of execution (by applying a theorem prover, sufficiently-expressive static type system, abstract interpretation...) and in the other you cannot.

More concretely, consider what we can confidently claim about the Schorr-Waite graph-marking implementation in C without the verification process referred to earlier in the thread, and what we can claim having done the verification. Without the verification, I claim that accidental complexity prevents us from doing anything more than ascribing a subjective probability that the implementation has the properties that we want and lacks the properties that we don't want to it. A human being who looks at that code and says "I guarantee that..." and rattles off the properties proven by the verification is either a pathological liar or delusional, and should be dismissed in either case. If they say "I believe that..." and rattle off the properties, then you need to explore why they hold that belief, and the extent to which you trust the code based on your belief in their belief.

With the verification, you have employed a tool, Why, that has a proven-correct verification condition generator; Coq, which satisfies the de Bruijn criterion for trustworthiness of its proof-checking kernel; and Caduceus, which might be said to satisfy the Hoare criterion for correctness. The process has reduced the accidental complexity to the essential complexity through a combination of analyzing the code and extracting only the essential complexity from it, and from analyzing the annotations that, in addition to the code itself, make the intention (or "essence") of the implementation explicit.

The result is that you now have an implementation, in C, of a complex algorithm that would be worthy of placing into the embedded system of an automobile, spacecraft, hospital diagnostic system, etc. vs. one that would not—even though they're the same implementation. If that doesn't clarify the distinction between dealing with accidental and essential complexity, then I don't know what does.

### Maybe a possibly wrong

Maybe a possibly wrong algorithm with a very low error probability is much simpler than an algorithm that is provably correct? Maybe one can build systems just by using Genetic Programming breeders, and in the future every kid will do so? Can you exclude that a priori from your considerations? Or an implementation of, say, some sorting algorithm has rich details and is more complex than a simple and naive quicksort, but has better properties on mildly preordered sets of numbers. What is the guideline to follow?
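The "possibly wrong algorithm with a very low error probability" is not hypothetical; Miller-Rabin primality testing is the classic instance. The sketch below is the standard algorithm in illustrative Python: each round can wrongly pass a composite with probability at most 1/4, so twenty independent rounds push the error probability below 4^-20, with no deterministic correctness proof anywhere in sight.

```python
import random

def probably_prime(n, rounds=20):
    """Miller-Rabin test: never rejects a true prime; may accept a
    composite with probability at most 4**-rounds."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):          # handle small cases exactly
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)       # random witness candidate
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                     # a is a witness: n is composite
    return True
```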

This complexity mumbo jumbo smacks somewhat like ID theorist pseudo-science ( ID theorists use the notion of "irreducible" instead of "essential" complexity to argue that evolution cannot happen ).

On the other hand there is an established concept of "algorithmic complexity" in theoretical computer science, measured as the shortest length of a program P in some fixed language L that can produce a certain output O. While this concept is meaningful and precisely definable, algorithmic complexity is not generally computable and has little practical relevance.

Note that I don't argue against theorem provers, static type systems, constraint satisfaction solvers, and the like. They don't need a defense using vaguely defined and intuitively appealing notions that don't lead to any measurable consequence. When correctness is king you have few choices, even though the solution found might not be the simplest one.

### Rhetorical or Tautological?

As I cannot find anything to disagree with :-) I have to wonder if perhaps a preferable term to "rhetorical" in your original post might not be "tautological," that is, we simply define "accidental complexity" vs. "essential complexity" to be "that which cannot be addressed formally" vs. "that which can be addressed formally." As you say, if lives depend on it, one wants the most ironclad guarantees possible—but these are not necessarily the most practical to obtain, even with excellent provers, checkers, etc. I wholeheartedly agree that probabilistic algorithms, genetic algorithms, etc. are worthwhile and important, and don't intend to give them short shrift. My only objection, such as it is, was to the notion that the concepts of essential vs. accidental complexity are a matter of semantic legerdemain only—a claim that it no longer seems to me that you are making.

### It's perhaps worth

It's perhaps worth remembering that in general we cannot be sure that we've removed all accidental complexity - only that what complexity we have removed can't have been essential?

### Fractal Intuition

Maybe this adds more mumbo-jumbo, but my intuition of complexity is as follows:
the (idealized) problem at hand is like a fractal, but its solution is usually built from a finite number of non-fractal elements. Inevitably, the solution will never exactly match the problem. By choosing this or that alphabet of elements from which to build the solution, and the number of elements used, one is able to reduce this remainder to some extent, but almost never to zero (barring the rare cases when the elements themselves exactly match the problem).
So accidental complexity becomes not a number, but more like a function of the solution's alphabet and the tolerable discrepancy between the problem and the solution.
Essential complexity, on the other hand, can be thought of as a number.

### treatise of human nature

In comment-39047, Paul wrote:

Without the verification, I claim that accidental complexity prevents us from doing anything more than ascribing a subjective probability that the implementation has the properties that we want and lacks the properties that we don't want to it.

I'd hesitate to fault my fellow hackers for not being able to do any better than that. After all, it's human nature if Hume is to be taken seriously:

There is no Algebraist nor Mathematician so expert in his science, as to place entire confidence in any truth immediately upon his discovery of it, or regard it as any thing, but a mere probability. Every time he runs over his proofs, his confidence encreases; but still more by the approbation of his friends; and is rais'd to its utmost perfection by the universal assent and applauses of the learned world.

The much ballyhooed case study of C source verification (by Thierry Hubert and Claude Marché) suffers, as they all do, from Frege's bane, as pointed out by Ádám Darvas and Peter Müller:

The drawback of this approach is the lack of consistency proof for definitions and axioms given on the source or prover level. This might also lead to soundness issues.

But that's beside the point. It's abundantly clear from his perpetual motion machine analogy that Joel chooses to argue his point from first principles. His argument is in no way based on his traumatic encounter with Dynamic Logic. As Anton pointed out, how you interpret Joel's claim depends on how you define your terms. As a f'rinstance, consider the following two specs:

• For any natural N, this program computes a decimal representation of π to the Nth place.
• This program directs a robotic arm to tie Ashley's stopper knot.

Joel holds it to be self-evident that you can't automatically derive programs from these perfectly good specs.

Now, you can try to refine the specs to a point where it becomes possible to derive programs from them automatically. Joel further holds it to be self-evident that such a refinement requires the same amount of skill as writing the program the current unenlightened way. Basically, his point is that programming cannot be deskilled.

Oh, and speaking of the most important parts of Joel's essay. The most important part is this:

... I got to see what most computer programmers do for a living. It's this scary thing called "in house software." It's terrifying.

The significance of this observation becomes clear if you consider, as Perlis et al. pointed out, that

C.A.R. Hoare has been quoted as saying, "In many applications, algorithm plays almost no role, and certainly presents almost no problem." (We wish we could report that he thereupon threw up his hands and abandoned verification, but no such luck.)

In other words,

there is no continuity between the world of [Schorr-Waite] and the world of production software, billing systems that write real bills, scheduling systems that schedule real events, ticketing systems that issue real tickets.

Simple as that.

### there is no continuity

there is no continuity between the world of [Schorr-Waite] and the world of production software, billing systems that write real bills, scheduling systems that schedule real events, ticketing systems that issue real tickets.

I don't see how this can be justified. Every system has properties and invariants we would like maintained. Accounting has its equations, ticketing systems have workflow and access control properties, scheduling systems have deadlines that can be missed and those that cannot, etc. Verification, logics, or type systems in which we can encode such properties and reason about them are thus invaluable.

For any natural N, this program computes a decimal representation of π to the Nth place.

For a system with no a priori notion of π, its definition may very well be the implementation of the spec. I agree that at its core, programming cannot be deskilled, but there's lots of crust before reaching that core. An interesting question is how much of this crust is actually necessary, and how much is simply due to a lack of expressiveness.
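As it happens, the π spec from the list above is also a case where the definition very nearly is the implementation: Gibbons' streaming spigot produces the decimal digits of π one at a time directly from a series definition. The sketch below is that standard unbounded spigot, in illustrative Python:

```python
def pi_digits(n):
    """First n decimal digits of pi via Gibbons' unbounded spigot.
    The state (q, r, t, k, m, x) tracks a linear fractional
    transformation; a digit is emitted only once it is stable."""
    out = []
    q, r, t, k, m, x = 1, 0, 1, 1, 3, 3
    while len(out) < n:
        if 4 * q + r - t < m * t:
            # The next digit m can no longer change: emit it.
            out.append(m)
            q, r, m = 10 * q, 10 * (r - m * t), (10 * (3 * q + r)) // t - 10 * m
        else:
            # Consume another term of the series.
            q, r, t, k, m, x = (q * k, (2 * q + r) * x, t * x, k + 1,
                                (q * (7 * k + 2) + r * x) // (t * x), x + 2)
    return out
```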

### ...except hot bakery items

In comment-39054, naasking wrote:

Every system has properties and invariants we would like maintained. Accounting has it's [sic] equations, ticketing systems have workflow and access control properties, scheduling systems have deadlines that can be missed and those that cannot, etc.

I happen to work on a ticketing system (lots of code, many hundred man-months of effort) and my experience confirms what DeMillo/Lipton/Perlis are saying: There is no continuity between the world of 10KLOC programs and the world of production software.

To lift a quote from Whither Verified Software? by Ramesh Bharadwaj:

Consider a program that is required to compute the sales tax associated with a sale: the correct tax rate varies with the location of the sale (which may not necessarily be the location of the computer on which the program is run), the sales tax to be levied at that location, and all applicable legislation(s) pertaining to the transaction. For example, California law provides for the exemption of sales tax on food products subject to the following restrictions:

Sales of food for human consumption are generally exempt from tax unless sold in a heated condition except hot bakery items or hot beverages, such as coffee, sold for a separate price, served as meals, consumed at or on the seller's facilities, ordinarily sold for consumption on or near the seller's parking facility, or sold for consumption where there is an admission charge.

It is inconceivable that the above conditions and restrictions could be specified precisely in the form of program assertions. For example, how does one formalize notions such as "except hot bakery items" or "near the seller's parking facility"?

So, yes, "accounting has its equations" but "hot bakery items" are a whole different game.

### I happen to work on a

I happen to work on a ticketing system (lots of code, many hundred man-months of effort) and my experience confirms what DeMillo/Lipton/Perlis are saying: There is no continuity between the world of 10KLOC programs and the world of production software.

Out of curiosity, what language was used to develop it? Without an expressive language and type system, I doubt very much that useful properties could be encoded. It would thus be no surprise at all that such properties would be difficult to see, extract, and encode.

So, yes, "accounting has its equations" but "hot bakery items" are a whole different game.

Yes, there must be a return on your investment, but there is much that can be done in the meantime. "Except hot bakery items" seems a perfect fit for qualified types. Spatial data does indeed seem challenging to verify, particularly a vague notion such as "near". If we could be more concrete and say "within a radius of 5 miles", then we could create a datatype indexed by a Peano numeral representing the distance, and use it to ensure the proper codes are used.
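As a toy illustration of the point (an entirely hypothetical encoding, with Python standing in for a richer type system): once each clause of the statute is forced into an explicit predicate over a tagged datatype, the "hot bakery items" exception at least becomes something you can test and reason about, even if the statute's vaguer phrases ("near", "ordinarily") resist formalization.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FoodSale:
    """A sale of food, tagged with the attributes the statute mentions."""
    heated: bool
    bakery_item: bool
    consumed_on_premises: bool

def taxable(sale: FoodSale) -> bool:
    # One possible reading of the exemption rule: food is exempt unless
    # it is consumed on the premises, or sold heated -- except that hot
    # bakery items remain exempt. Each clause is now an explicit,
    # testable condition rather than prose.
    if sale.consumed_on_premises:
        return True
    if sale.heated and not sale.bakery_item:
        return True
    return False
```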

It certainly gets more challenging the more specific the constraints become, and the return on investment of verification versus testing, keeping the TCB as small as possible, must be considered. While "everything" may be beyond reach for the foreseeable future, that does not make these techniques useless!

### Whistling in the Dark

el-vadimo: I'd hesitate to fault my fellow hackers for not being able to do any better than [ascribing a subjective probability that the implementation has the properties that we want and lacks the properties that we don't want to it].

So would I, were it not clear that we can all do much better than that.

el-vadimo, quoting Ádám Darvas and Peter Müller: The drawback of this approach is the lack of consistency proof for definitions and axioms given on the source or prover level. This might also lead to soundness issues.

First of all, I'm extremely distrustful of a quote that begins with a sentence saying "this approach" without its surrounding context. A citation or hyperlink would be quite helpful. Having said that, the concern about consistency proofs is a rather general one. It was, in fact, explicitly addressed in the comment that you replied to: Why's verification condition generator has been proven correct. At least one of Why's supported proof assistants, Coq, satisfies the de Bruijn criterion: its proof checker relies on a kernel that is small and simple enough to itself be proven, and in fact Coq's underlying Calculus of Constructions has been implemented and checked in Coq itself. In a nutshell, there are no credible claims of unsoundness in Coq proofs, and since Why's verification condition generator has been proven correct (with Coq), there are no credible claims of unsoundness in Why's verification conditions. No one—not even your beloved Perlis et al.—is claiming that Hoare Logic is unsound.

What that leaves, then, is the observation that the specification itself must be correct. But that's a tautology; of course the specification must be correct if the implementation is to have any hope of being correct, whether you prove anything formally or not.

el-vadimo: As Anton pointed out, how you interpret Joel's claim depends on how you define your terms.

I'm glad that you provided the link this time, so that anyone who follows it can see that, in fact, Anton's comment doesn't at all support your claims. It's very characteristically Anton, I think: striking a pragmatic middle ground between my admittedly idealistic slant and what I interpret as your dismissive opinion based on Perlis et al.'s 29-year-out-of-date critique and, I have to say, apparent unfamiliarity with the work on automated theorem proving since then.

el-vadimo: As a f'rinstance, consider the following two specs:

• For any natural N, this program computes a decimal representation of π to the Nth place.
• This program directs a robotic arm to tie Ashley's stopper knot.

Joel holds it to be self-evident that you can't automatically derive programs from these perfectly good specs.

But anyone familiar with even rudimentary computer science knows that these are, in fact, not "perfectly good specs." They're very definitely the kinds of specs that people start with at the first stage of whiteboarding an implementation, but formal proof or not, they will both be elaborated on and made precise enough to be implemented. The first can, of course, be done with enough precision to support a formal proof of correctness using tools like Caduceus, Why, and Ergo or Coq in less than an hour for someone familiar with the tools; the second would take weeks or months for expert robotics researchers, again, with or without formal proofs—once again, the question is whether the costs of constructing formal proofs are prohibitive or not, and my thesis is that for someone fluent in the appropriate tools, the costs are not prohibitive for a vastly larger class of endeavors than seems to be apparent to much of the industry, and even many people here on LtU.

In any case, you're arguing against a claim that no one here has made: that one can automatically and scalably derive programs from formal specifications. My only thesis is that it's far cheaper to arrive at high assurance software, loosely defined as "if it fails, someone dies," than is generally appreciated—so much so that it's not obvious to me that "high assurance software" is a legitimately distinct category from software in general for reasons other than ignorance. More specifically, while it is obviously economically acceptable to continue on what I perceive the current path to be, if (as I believe) the additional cost of producing high assurance software in a significant range of domains is ε, why not spend the additional ε and establish yourself as a purveyor of software with at least an order of magnitude higher quality than the competition?

The remainder of your quotes from Joel and Perlis et al. about algorithms playing almost no role, "in-house software," and there being no continuity between FIND and the world of production software are merely comical. Algorithms aren't the only verifiable things for the obvious reason that there's no difference between algorithms and data structures formally. As another poster pointed out, "production software" also has preconditions, postconditions, and invariants that must hold in order to function properly.

It's true that it wouldn't be practical to tackle these issues using pencil and paper, as Hoare did with FIND. So perhaps it's time for you to move on from a 29-year-old paper critiquing that effort and familiarize yourself with the 20th- and 21st-century technology that is being brought to bear on them.

### Regarding essential complexity

What that leaves, then, is the observation that the specification itself must be correct. But that's a tautology; of course the specification must be correct if the implementation is to have any hope of being correct, whether you prove anything formally or not.

I think "essential complexity" in the No Silver Bullet sense often refers to exactly this tautology. In fact, Daniel Berry has made precisely this point in The Inevitable Pain of Software Development.

I am not against any kind of formal verification and of course I believe we can give engineers better, safer and more expressive tools. But I also agree with el-vadimo (channeling Joel) that software development can't be deskilled. Any spec precise enough for verification or automated implementation will be the product of a skilled engineering process which itself has the potential to introduce bugs. I see no way around this fundamental limit, and honestly, I'm thankful for that. :)

### requested citation

In comment-39065, Paul Snively wrote:

I'm extremely distrustful of a quote that begins with a sentence saying "this approach" without its surrounding context.

For the surrounding context, see Faithful Mapping of Model Classes to Mathematical Structures by Ádám Darvas and Peter Müller (available here and elsewhere).

### Excellent

Thanks! I'll read these very soon (within the week).

### Verify, then trust

Basically, his point is that programming cannot be deskilled.

There's a big difference between deskilling programming, and providing programmers with greater leverage to solve problems more easily.

To me, easier generation of programs from specifications is more about the latter than the former. In fact, that's what much of PL theory is about: we write a kind of specification in some high-level language (or even just an assembly language), and a program ultimately, automatically spits out a batch of 1s and 0s that makes a machine do something that corresponds to our specification.

Perhaps someone wants to argue that a spec is something different from a program — in which case I'd love to see the definitions in question — but my point is that this entire field is about translating high-level descriptions into something less tractable, and there's been quite a bit of success at this so far.

Further, one of the benefits of these high-level descriptions is that it's more practical to perform meta-processing, such as proofs, on the high-level descriptions, than it would be on their low-level equivalent. The more such processing we're able to do, the better we tend to understand the problems. Not every problem requires such close attention, but some do benefit from it.

there is no continuity between the world of [Schorr-Waite] and the world of production software, billing systems that write real bills, scheduling systems that schedule real events, ticketing systems that issue real tickets.

I'm not sure what the implications of "continuity" are supposed to be in this context, but I see applicability of formal proof techniques in many cases. One of the more significant uses of computing today is to achieve scale, and when you do things at large scales, small errors can have huge effects. So you might indeed want to prove the calculation at the heart of your billing system correct, particularly if it's non-trivial, as billing systems for complex products often are.
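A concrete instance of small errors at scale (illustrative Python, not drawn from any real billing system): accumulate a one-cent charge a million times in binary floating point and in exact decimal arithmetic. The decimal total is exact; the float total drifts, and in a real billing system that drift is multiplied across every account.

```python
from decimal import Decimal

def bill_float(price: float, n: int) -> float:
    # Naive accumulation in binary floating point: 0.01 has no exact
    # binary representation, and each addition rounds again.
    total = 0.0
    for _ in range(n):
        total += price
    return total

def bill_decimal(price: str, n: int) -> Decimal:
    # The same accumulation in exact decimal arithmetic.
    total, p = Decimal('0'), Decimal(price)
    for _ in range(n):
        total += p
    return total
```

This is the sense in which even `total=subtotal+qty*price` quietly relies on a formal system with provable properties: change the arithmetic underneath and the "obvious" equation stops holding.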

Another example along these lines can be found in insurance, which is unusual in that its products often have essentially unlimited downside risk for the insurer, so mistakes in calculations can have career-destroying, lawsuit-triggering, company-bankrupting consequences.

Of course, there are also the classic examples of mission-criticality, such as aircraft or surgical control systems, where formal techniques are already being used.

I think there's a tendency, in discussions like this, to focus on common examples most familiar to people. But while it may be overkill to formally prove correctness for your average widget store application, where the calculations don't get much more complex than total=subtotal+qty*price, this doesn't automatically generalize to all "in-house" or production software, some of the most interesting examples of which hardly ever get discussed in public.

Besides, even the systems that rely on simple calculations like total=subtotal+qty*price are relying on a formal axiomatic system with provable properties. We also rely on programming languages that can produce a correct program from that equational specification. It's just that we've long since taken the formal system in question for granted.

The Hoare quote summarizes the case well: "In many applications, algorithm plays almost no role". That's true, and it helps explain why I believe that not all programs need to be statically typed, but it also correctly implies that in some applications, algorithms (and other provable things) do play a role.

It's not clear, then, how one gets from that to Perlis' apparent desire to see Hoare "abandon verification". Is this perhaps just another of those battles between artificially drawn binary extremes, where the choice is between verifying everything or verifying nothing, and neither is correct? Or am I missing a more subtle issue?

### Yes

You're absolutely right that this duality, like so many others, has been postulated in part for rhetorical reasons, to direct research attention, funds and prestige toward or away from particular areas of investigation. Any researcher who denies that this happens is either lying or hopelessly naive.

We say "some of the remaining complexity is essential, but we don't know how much", and then of course any complexity that is eliminated is afterward claimed to have naturally been purely accidental. ("But of course you haven't solved the real problem.") From a less cynical perspective, this can be simple expectation-setting ("Pay us for this research, but don't expect much progress..."), while on a more cynical view, it is just plain naysaying.

Incidentally, the structure of this situation reminds me of what we might call the Schoolman's Argument after Scott Aaronson.

In any case I don't think it's bad to acknowledge that software engineering is likely to remain expensive and difficult. But in the case of this particular distinction, I do think it's worth asking who's making it, and why.

### Zero point energy

It may be fairly uncontroversial to be skeptical about perfect and complete generation of programs from specs (although it depends a lot on how you define your terms). However, this is a fairly extreme formulation of the issue which overlooks plenty of useful possibilities, some of which are well within reach today.

To take a less extreme perspective, it isn't essential that a spec has to define "precisely what a program will do" in every respect. For example, I find that writing specs as functional programs can be very quick and effective, but such specs often only need to be precise along certain (important) dimensions. They may, for example, use lists where a production program would use a structure with different complexity. So such a spec doesn't define precisely what performance characteristics the production system should have, but it may define other aspects of the semantics of the program very precisely.

Admittedly, getting from such an executable spec to a production program may not be automated in the general case any time soon. But in particular cases, it already is automated: anytime the "spec" turns out to be good enough to use in production without major rewriting.
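A minimal example of the executable-spec style just described (illustrative Python): the spec is an obviously correct but quadratic insertion sort; the "production" implementation is anything faster; and random testing checks the two against each other along the dimension the spec is precise about, namely the result, while saying nothing about performance.

```python
import random

def sort_spec(xs):
    # Executable spec: insertion sort -- short, obviously correct,
    # quadratic. Precise about the result, silent about performance.
    out = []
    for x in xs:
        i = 0
        while i < len(out) and out[i] <= x:
            i += 1
        out.insert(i, x)
    return out

def sort_impl(xs):
    # Stand-in for an optimized production implementation.
    return sorted(xs)

def check(trials=200):
    # Random testing: the implementation must agree with the spec
    # on every generated input.
    for _ in range(trials):
        xs = [random.randrange(100) for _ in range(random.randrange(30))]
        if sort_impl(xs) != sort_spec(xs):
            return False
    return True
```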

Joel's last quoted paragraph might equally be applied to static type systems (which of course are closely related to specifications). Static type systems provide a "mechanical way to prove things about the correctness of a program". A type scheme for a program provides meta-information which at best otherwise exists only in latent form. That type meta-information is a kind of abstraction (or specification!) of the program which, while it doesn't capture every aspect of the program's semantics, nevertheless captures significant aspects which are useful to be able to precisely specify and check. That information has a natural connection to the program it describes, a connection which is inherently checkable and provable. Even though a program's type scheme is an incomplete specification of the program, it achieves much more than "simply moving the problem around without accomplishing anything". Something similar could apply to other kinds of specification language.

So I agree with Josh that "I still think there's a lot of unrealized potential to the idea." Actually, I think quite a bit of that potential has already been realized, but is currently not being exploited just because of unfamiliarity with the available tools.

### Good point

That's a good point Anton. Much of the power of writing some kind of (formally analyzable) spec comes not from precisely defining in every respect what a program will do, but from applying some kind of abstraction to suppress certain details of the design to produce a model that can actually be understood and/or analyzed.

Writing a list-based functional program to specify program semantics abstracts from the details of performance in order to focus on functional (no pun intended) correctness. Similarly, you can do some really useful analysis of complex concurrent systems by abstracting from the details of the internal sequential computations, and focusing on the patterns of interactions. The result is often a model that is tractable for exhaustive analysis (say via model-checking). That analysis won't guarantee that the final system will be completely free of all bugs, but it will increase the degree of confidence you can have in your selected pattern of interactions being free of things like deadlock. Much more so than a far more time-consuming amount of testing would. And perhaps you could do some other kind of analysis on the sequential parts of the code.

Using abstractions that can be analyzed to increase confidence in a design is exactly how other engineering disciplines do their work. Part of the art of engineering is knowing what abstractions make sense in a particular context, and how much the results can be trusted (e.g. "are we using the system in a regime where it behaves linearly?", "is it valid to assume incompressible flow here?", "do we need to worry about propagation delay in this circuit?").

### program verification: the very idea

Xref for the sake of archival completeness: Program Verification: The Very Idea.

### a modern take on social processes and static analysis

To lift a few quotes from the paper referenced in http://lambda-the-ultimate.org/node/3824

Usefulness, a social construct, trumps soundness, a mathematical construct:

Like the PREfix product, we were also unsound. Our product did not verify the absence of errors but rather tried to find as many of them as possible. Unsoundness let us focus on handling the easiest cases first, scaling up as it proved useful.

A user's trust in a verifier is not based on having a proof available for manual or automatic inspection. It is based on how much faith she observes being put in the tool by her co-workers and other peers:

How to handle cluelessness. You cannot often argue with people who are sufficiently confused about technical matters; they think you are the one who doesn't get it. … What to do? One trick is to try to organize a large meeting so their peers do the work for you. The more people in the room, the more likely there is someone very smart and respected and cares…

It doesn't matter if the verifier is right. It only matters if the user believes it to be right:

… explaining errors is often more difficult than finding them. A misunderstood explanation means the error is ignored or, worse, transmuted into a false positive. The heuristic we follow: Whenever a checker calls a complicated analysis subroutine, we have to explain what that routine did to the user, and the user will then have to (correctly) manually replicate that tricky thing in his/her head.

Social problems call for social solutions. Piling on more technology onto what is essentially a social problem is not an answer:

A vicious cycle starts where low trust causes complex bugs to be labeled false positives, leading to yet lower trust. We have seen this cycle triggered even for true errors. If people don't understand an error, they label it false. And done once, induction makes the (n+1)th time easier. We initially thought false positives could be eliminated through technology. Because of this dynamic we no longer think so.

### Vardi's recent digs at DeMillo, Lipton, and Perlis

Moshe Vardi is CACM's current Editor-in-Chief. It is of historical interest that he brought up "Social Processes" a couple of times in recent years. In CACM: Past, Present, and Future (CACM, Volume 51, Issue 1, Jan 2008, p.44), he wrote:

I came of age as a computer scientist in the late 1970s, during my formative years as a graduate student. I remember being highly influenced by some great research articles published during CACM's "black-and-blue" years. [...] in May 1979, De Millo, Lipton, and Perlis's "Social Processes and Proofs of Theorems and Programs" was instantly controversial. (Indeed, it still makes for interesting reading today, though the tremendous progress in formal-methods research has dulled its edge.)

In More Debate, Please! (CACM, Volume 53, Issue 1, Jan 2010, p. 5), he wrote:

In the May 1979 issue of Communications, a powerfully written article by Richard A. De Millo, Richard J. Lipton, and Alan J. Perlis entitled "Social Processes and Proofs of Theorems and Programs," argued that formal verification of programs is "difficult to justify and manage." The article created the perception, in the minds of many computer scientists, that formal verification is a futile area of computing research.

...

With hindsight of 30 years, it seems that De Millo, Lipton, and Perlis' article has proven to be rather misguided. In fact, it is interesting to read it now and see how arguments that seemed so compelling in 1979 seem so off the mark today.

This prompted a response from DeMillo and Lipton that was published a couple of issues later in Too Much Debate? (CACM, Volume 53, Issue 3, Mar 2010, pp. 6-7). They responded to the perceived tone of Vardi's remarks and promised to "respond to the technical substance of his comments at a later time".

We'll keep an eye out for the forthcoming rebuttal. In the meantime, Gene Spafford chimed in:

In the case in point, I remember reading the original, the followups, and the many discussions that followed — both in my CS classes and in a philosophy seminar! I have yet to see a clear refutation of the conclusions of that first article that convinces me, as an informed reader. Perhaps I missed something published, or I misunderstood the original paper, so I may be "misguided." But without something to sway my thinking, I still accept those results.

### Indeed

They responded to the perceived tone of Vardi's remarks and promised to "respond to the technical substance of his comments at a later time".

We'll keep an eye out for the forthcoming rebuttal.

Yes. Yes, we will.

Quoting Gene Spafford:

I have yet to see a clear refutation of the conclusions of ["Social Processes and Proofs of Theorems and Programs"] that convinces me, as an informed reader. Perhaps I missed something published, or I misunderstood the original paper, so I may be "misguided." But without something to sway my thinking, I still accept those results.

I continue not to know what to say to anyone who doesn't find the 30 years of development in (semi)automated proof technology, the Curry-Howard Isomorphism, and the de Bruijn criterion "a clear refutation of the conclusions of 'Social Processes and Proofs of Theorems and Programs.'" It's getting to the point where all the practitioners actually creating guaranteed-reliable software can do is shrug and let the armchair philosophers' papers drift down from their ivory towers.

### understanding Spafford's lack of enthusiasm

Re: comment-33518 and comment-58391:

I continue not to know what to say to anyone who doesn't find the 30 years of development in (semi)automated proof technology, the Curry-Howard Isomorphism, the de Bruijn criterion [and the triangle inequality] "a clear refutation of the conclusions of 'Social Processes and Proofs of Theorems and Programs.'"

The de Bruijn criterion states, at best, a necessary condition for delivering believable formal verifications (although even that much is debatable). It is not a sufficient condition in the sense that adherence to the criterion does not, in and of itself, make proof construction easy. Furthermore, practice has shown time and again that abstractions and concepts that are well-behaved in the small turn out to be non-composable in the large. A large system cannot be fully specified in Coq or Isabelle. Different formalisms are appropriate — nay, required — for different things and I'd be curious to see examples of fully verified systems that employed a combination of Coq/Isabelle for one part and some other formal-verification tool for another part. For example, as you recall, the real-world example you cited elsewhere in this thread stopped short of verifying the parts coded in assembly (that would've been so much fun!). Had they verified those, it would've been interesting to see how they would bridge their formal semantics of C with the formal semantics of the assembly language for their architecture(s) of choice. And that's just a tiny microkernel we're talking about. Perhaps Spafford can be excused for remaining skeptical of the de Bruijn criterion.

It's not at all clear how the Curry-Howard isomorphism is of much relevance, either. Consider the following snippet of code written in a language known as Simply Typed λ Calculus (where ⓪, ①, and ② are notational shorthand for the corresponding Church numerals):

```scheme
(define (f a b p q c)
  (cond ((= c ⓪) b)
        ((even? c)
         (f a
            b
            (+ (* p p) (* q q))
            (+ (* q q) (* ② p q))
            (/ c ②)))
        (else (f (+ (* q b) (* q a) (* p a))
                 (+ (* p b) (* q a))
                 p
                 q
                 (- c ①)))))
```

While it is true that it corresponds to some natural-deduction proof, it is also true that that proof fails to capture anything of interest about the above function. In particular, it doesn't prove that the function increases monotonically and rather quickly. So, why should I or Gene Spafford care about CH?
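To make that concrete, here is a hypothetical Python transcription of the routine above (it is the logarithmic-time Fibonacci iteration of SICP exercise 1.19; the name `fib` is my addition). Its type is utterly simple, yet the properties we actually care about, monotone and rapid growth, live entirely outside it:

```python
# Hypothetical Python rendering of the λ-calculus snippet above
# (the fast Fibonacci of SICP exercise 1.19). The (p, q) pair is a
# linear transformation that is squared when the counter is even.

def f(a, b, p, q, c):
    if c == 0:
        return b
    if c % 2 == 0:
        return f(a, b,
                 p * p + q * q,      # square the transformation:
                 q * q + 2 * p * q,  # (p, q) -> (p^2 + q^2, q^2 + 2pq)
                 c // 2)
    return f(b * q + a * q + a * p,  # apply the transformation once
             b * p + a * q,
             p, q, c - 1)

def fib(n):
    # fib(n) = f(1, 0, 0, 1, n); e.g. fib(10) == 55
    return f(1, 0, 0, 1, n)
```

None of that monotone, rapid growth is visible in the simple type of `f`, which is exactly the complaint.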

The above is a view from the industrial trenches. If you care for an opinion from a hotbed of research in formal methods, Alastair Donaldson, if I read him correctly, basically says that model checking — rather than theorem proving — is where it's at.

Oh, and before I forget... In another post, Donaldson mentions a wonderful presentation by Vijay Victor D'Silva (Tales from Verification History) that I highly recommend. Not a page turner like Donald MacKenzie's meticulously researched and well-told thriller Mechanizing Proof, but very interesting nonetheless.

### Informal

I doubt model checking even scales that much; as far as I know, it still chokes on things as simple as parity at the moment.

I don't know much about the current state of the art, but I think there's still a lot to be done on the theorem-proving/model-checking combination. Most (good) proofs are relatively short, and theorem proving, as I knew it, was just too laborious.

I wonder whether computer-assisted proving could be done differently, like with a wizard where you can state: "Try a monotonicity proof, then a Noetherian approach." Or: give me a monotonicity lemma on a statement. I doubt that it would scale, but it might be interesting for mathematicians.

### Interactive proof assistants

I wonder whether computer-assisted proving could be done differently, like with a wizard where you can state: "Try a monotonicity proof, then a Noetherian approach." Or: give me a monotonicity lemma on a statement. I doubt that it would scale, but it might be interesting for mathematicians.

Some interactive proof assistants (including Coq and Isabelle) allow for this sort of approach by defining custom tactics, or by suggesting relevant lemmas to the ATP-based proof searches. Finding interesting statements/lemmas from scratch is still unfeasible, though.
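For what it's worth, here is a minimal Lean 4 sketch of the custom-tactic idea (illustrative only; the tactic name is made up): a user-defined tactic that tries a fixed sequence of strategies in order, a crude version of "try strategy A, then strategy B".

```lean
-- A throwaway custom tactic: try `rfl`, then `decide`, then `simp`,
-- in that order. `first` stops at the first strategy that succeeds.
macro "my_auto" : tactic => `(tactic| first | rfl | decide | simp)

example : 2 + 2 = 4 := by my_auto
```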

Whether theorem proving is laborious depends in no small part on what theorems are to be proven. Proving correctness for a program written in ML/Haskell style is relatively straightforward; imperative programming is harder, and real-world embedded programs or system software seem to be hardest.

### No idea where they're at now

Yeah, I worked somewhat with some of them; it mostly comes down to how well the problem is defined. Still, it remains quite hard to prove termination of the Collatz function.

The thing I was thinking about was not tactics, but using the structure of the definition to find the accompanying proof rules. But it was more wishful thinking, I guess.

[ Don't agree with Erdos on Collatz, btw. ]

### Proving correctness for a

Proving correctness for a program written in ML/Haskell style is relatively straightforward; imperative programming is harder, and real-world embedded programs or system software seem to be hardest.

As a side note: the practice of embedded-systems development is one of functional black-box testing. If you want to create whitebox/unit tests, you have to customize the application protocol in use for the purpose of testing, which is system-specific. That's why there aren't off-the-shelf unit-testing frameworks for embedded systems, and there never will be. The situation can be much improved, though, by providing test-runner generators: a user interface from which tests are started and results are collected. I/O is then the only part left open for implementation.

I mention this only because the conflict between testing a concrete API and proving code correct according to a formal model doesn't arise here: not only is the latter impractical, but the former isn't practiced either.

### Misunderstanding CH?

This seems to be a misunderstanding of the CH isomorphism. Under CH, the simply-typed lambda calculus is isomorphic to intuitionistic propositional logic, or a fragment thereof depending on which data type constructions are allowed in the type system. It should not be much of a surprise that such a type system cannot express many interesting statements.

Whether interesting theorems can be directly derived from a function's construction also depends on the form of these theorems: this is generally true for postconditions and invariants, which are ubiquitous in real-world formal verifications.
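As a minimal illustration of the correspondence (in Lean 4 rather than raw λ-calculus notation): the K combinator, a simply typed term, is literally a proof of the propositional tautology A → B → A, and of nothing more.

```lean
-- Curry-Howard in miniature: the term `fun a _ => a` inhabits the
-- type A → B → A, i.e. it is a proof of that (not very interesting)
-- proposition.
example (A B : Prop) : A → B → A :=
  fun a _ => a
```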

### Ally Donaldson's post

Alastair Donaldson, if I read him correctly, basically says that model checking — rather than theorem proving — is where it's at.

Reading Donaldson's post, he makes some very good points, and he does argue in favor of model checking, but his conclusion is hard to support.

The good points he makes are:

1. automated formal methods are more convenient than manual theorem proving
2. weak ("lightweight") assertions are easier to write and understand than full specifications of correctness
3. counterexamples are more socially relevant and useful than failed proofs of correctness

These points do support an emphasis on automated search for counterexamples and refutations, but they do not support model checking as anything more than one technique among many, much less an "entirely different formalism". From a "theorem proving" point of view, a counterexample is simply a constructive proof of a negative statement. This is a strikingly general perspective: for instance, a theorem proving system might consider an assertion of the form (forall xyz. A ∨ B) and find a proof of e.g. (forall x. exist yz. ¬B ∧ ¬A), where y and z may be provided as witnesses and any of (¬B) and (¬A) may be either a constructive refutation or a negative proof.

These facts should be readily apparent to anyone with expertise in constructive logic, and they would seem to belie the assertion that the theorem proving approach is not worth considering because "model checking is where it's at".
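Over a finite domain the point is easy to make executable. The following sketch (my own, not from the post) searches for a counterexample to a universal claim: exhausting the domain constitutes a proof, and a returned witness is precisely a constructive proof of the negated statement.

```python
# A "model checker" for finite domains, read through the theorem-proving
# lens: refutation = constructive proof of the negation, with the
# counterexample serving as the witness.

def check_forall(domain, prop):
    """Return ('proved', None) if prop holds on all of domain,
    else ('refuted', witness) for the first failing element."""
    for x in domain:
        if not prop(x):
            return ('refuted', x)   # the witness *is* the proof object
    return ('proved', None)

# Exhausting the domain proves the universal statement...
print(check_forall(range(100), lambda n: n * n >= n))   # ('proved', None)
# ...while a failure yields a witness for the existential negation.
print(check_forall(range(10), lambda n: n < 5))         # ('refuted', 5)
```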

### breaking provably secure systems

Speaking of Lipton's blog Gödel's Lost Letter and P=NP...

In one of his recent posts, he shares the following anecdote:

In the summer of 1976, Anita Jones and I were part of a RAND summer study group organized by Stockton Gaines on security. Anita and I eventually wrote a fun paper during the summer: a paper with a long, complex, and crazy story; I will share it with you another day.

During the workshop, Gerry Popek visited us one day, and gave a talk on his then current project to prove a certain OS kernel was secure. This was a large project, with lots of funding, lots of programmers, and he said they hoped in two years to have a proof of the OS's correctness. What struck me during his talk was he could write down on the board, a blackboard, a relatively simple formula from set theory. The formula, he stated, captured the notion of data security: if a certain function f had this property, then he would be able to assert his OS could not leak any information. Very neat.

At the end of his talk I asked him if he wanted a proof now that his function f satisfied the formula. He looked at me puzzled, as did everyone else. He pointed out his f was defined by his OS, so how could I possibly prove it satisfied his formula—the f was thousands of lines of code. He added they were working hard on proving this formula, and hoped to have a full proof in the next 24 months.

I asked again would he like a proof today? Eventually, I explained: the formula he claimed captured the notion of security was a theorem of set theory—any function f had the property. Any. He said this was impossible, since his formula meant his system had no leaks. I walked to the board and wrote out a short set theory proof to back up my claim—any f had his property. The proof was not trivial, but was not hard either.

The point of the story is: the formula captured nothing at all about his OS. It was a tautology. What surprised me was his response: I thought he would be shocked. I thought he might be upset, or even embarrassed his formula was meaningless. He was not at all. Gerry just said they would have to find another formula to prove.

Oh well.

Nice!

### He looked at me puzzled, as

He looked at me puzzled, as did everyone else. He pointed out his f was defined by his OS, so how could I possibly prove it satisfied his formula—the f was thousands of lines of code. He added they were working hard on proving this formula, and hoped to have a full proof in the next 24 months.

This nicely shows the futility of trying to separate your formal description of the system from the system itself.

Meanwhile, back in the real world...

### tour de force brute

Re: comment-58382:

Meanwhile, back in the real world... [seL4: Formal Verification of an Operating-System Kernel]

What an amazing piece of work. A tour de force, for sure, but one achieved par force brute. A few quick remarks, if I may.

To place this achievement in its proper historical context, let's compare it to the following earlier result, as presented in Knuth's lecture All Questions Answered (TUGboat, Vol. 23, 2002, No. 3/4):

When I looked at the first published papers on proving correctness of programs, Tony Hoare's paper on proving that the find program was correct, there were two bugs in it. He had proved it correct, but there were two bugs in his original proof.

Two points are worth making. The first point is about discontinuity. Hoare's program (1971) was about 10¹ lines long. At the time, real-world™ programs were ≥ 10⁶ lines long. (SAGE, built in the '50s, was about that big. So was IBM's System/360.) By analogy with the de Bruijn factor, let us introduce the DeMillo-Lipton-Perlis discontinuity factor:

       D = S_rw / S_fv

where S_rw is the size (measured in lines of code) of the largest real-world program and S_fv is that of the largest formally verified one. In 1971, when formal verification of programs was a nascent discipline, D ≫ 10⁴ was a safe lower bound. Fast-forward to 2010. The current numbers seem to be

       S_rw ≳ 10⁸,
       S_fv > 10⁴ (as evidenced by seL4 and other projects),
  and, consequently,
       D ≳ 10⁴.

As I said before, I expect this discontinuity to hold in the future.
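For what it's worth, the arithmetic behind those bounds is a one-liner (using the order-of-magnitude exponents assumed above):

```python
# The discontinuity factor D = S_rw / S_fv at the order-of-magnitude
# figures quoted above (all values in lines of code).
S_rw_1971, S_fv_1971 = 10**6, 10**1   # real-world code vs. Hoare's FIND
S_rw_2010, S_fv_2010 = 10**8, 10**4   # real-world code vs. seL4-scale work

D_1971 = S_rw_1971 // S_fv_1971   # 100000, i.e. 10**5
D_2010 = S_rw_2010 // S_fv_2010   # 10000,  i.e. 10**4
```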

The second point is briefly touched upon in the seL4 paper under the heading The Cost of Change. The really hard thing about software development is not the writing of workable software. It's the rewriting of it. Did the use of formal verification on the seL4 project delay the onset of the Belady-Lehman upswing? (Thanks to Matt Hellige for the reference!) Let's review the evidence:

A well-designed high-performance microkernel, such as the various representatives of the L4 microkernel family, consists of the order of 10,000 lines of code (10 kloc). This radical reduction to a bare minimum comes with a price in complexity. It results in a high degree of interdependency between different parts of the kernel.

There is one class of otherwise frequent code changes that does not occur after the kernel has been verified: implementation bug fixes.

So, what we have here is a very tightly woven ball of C spaghetti for which a solid assurance case can be made that its current bug count is near zero. If that's not a definition of the Belady-Lehman extremum, I don't know what is.

Of course, bugs may still lurk in the portion of the kernel left unverified. There may also be errors in the toolchain or the formal semantics of the chosen subset of C. By the authors' definition, the latter are not considered bugs in their book, but they are in mine. One of the important ways in which Hoare's work remains superior is that it was fully published. The seL4 folks have been threatening to subject the full set of their artifacts to public scrutiny since about June 2007. Maybe they will finally do so for the upcoming republication of their paper, which was apparently slated to appear in the May issue of CACM.

Still, there are reasons to believe that the latent bug count is about as low as can be. The only way from this point on is straight into the B-L upswing.

Which reminds me of the chuckle I had upon reading the parent post... In the Ⓡeal World, seL4 is partially verified but unused, while its cousin OKL4 is used but unverified. The latter can claim some amount of coolness by association, but the kissing-cousin argument grows increasingly untenable with every new release.

Parenthetically speaking, what would be really funny is if Coverity found a couple of bugs in seL4.

### Surprisingly...

...I essentially agree, insofar as I was surprised to find that the seL4 effort used Isabelle vs. Coq, for reasons that essentially boil down to support for proof automation. I absolutely agree that the ratios of proof development to executable result given in table 1 would be prohibitive for all but the most demanding of product development efforts. That's exactly why I encourage people to use Coq vs. Isabelle and, especially, to follow Adam Chlipala's work on certified programming with dependent types and its emphasis on proof automation. I think it would be very helpful to see how the seL4 work would look if, for example, it had been done with tools like Ynot rather than Isabelle.

With that said, I think putting seL4 through Coverity is also an excellent idea.

Update: The previous post, Effective Interactive Proofs for Higher-Order Imperative Programs, about recent Ynot developments is also quite relevant.

### Doron Zeilberger's Opinion #112

… an average paper is read by at most two people (the author and the referee). Most people are so busy writing papers, that they don't have time to read them, and look for mistakes.

That's mostly because, to borrow Morris Kline's phrase, most papers are necessarily (by Sturgeon's Law) nothing but "pea soup and tripe".

### xref

Note to self: Somewhat germane to the topic of proofs and social processes is the subthread rooted in I don't read "formal X" sections in papers over in "References about the importance of formalism in programming language design".