program verification: the very idea

James H. Fetzer's Program Verification: The Very Idea (1988) is one of the two most frequently cited position papers on the subject of program verification. The other one is Social Processes and Proofs by De Millo, Lipton, and Perlis (1979), previously discussed on LtU. Fetzer's paper generated a lot of heated discussion, both in the subsequent issues of CACM and on Usenet.

It's not clear to me what all the fuss is about. Fetzer's main thesis seems pretty uncontroversial:

The notion of program verification appears to trade upon an equivocation. Algorithms, as logical structures, are appropriate subjects for deductive verification. Programs, as causal models of those structures, are not. The success of program verification as a generally applicable and completely reliable method for guaranteeing program performance is not even a theoretical possibility.

(See also part I, part II, and part III.)

part I: relation to previous work

Fetzer devotes about two thirds of his article to discussing De Millo's paper. He concludes:

In maintaining that program verification cannot succeed as a generally applicable and completely reliable method for guaranteeing the performance of a program, De Millo, Lipton and Perlis thus arrived at the right general conclusion for the wrong specific reasons. Still, we are all indebted to them for their efforts to clarify a confusion whose potential consequences — not only for the community of computer science, but for the human race — cannot be overstated and had best be understood.

To assess the argument by which he arrives at this conclusion, you are going to have to work your way through 16 pages of rather dense text that is flawed in the following three ways:

  1. Fetzer's academese is a little difficult to read — all the more so because it appears side by side with quotes from De Millo's extremely well-written article, not to mention such masters of lucid English prose as C.A.R. Hoare. I found myself having to re-read Fetzer's piece twice.
  2. Fetzer is not a computer scientist. At the time of publication, his primary background had been, methinks, in the theory of knowledge and philosophy of science. Although he seems to have been very familiar with the general philosophy of science (think Popper's Conjectures and Refutations and The Logic of Scientific Discovery), I found no evidence that he had also been well-versed in the narrower field of philosophy of mathematics in general and Imre Lakatos's Proofs and Refutations: The Logic of Mathematical Discovery in particular — and a very unfortunate thing that is.

    It's unfortunate because De Millo's argument is unmistakably Lakatosian. Fetzer could have spared himself and his readers a lot of aggravation if he had read Proofs and Refutations before writing his piece. Instead, he spends an inordinate amount of time dissecting and refuting (his mistaken interpretation of) De Millo's arguments.

  3. In the course of doing so, he apparently manages to confuse a number of prominent verificationists with poor reading skills. Anti-Fetzer screeds and "rebuttals" published in subsequent issues of CACM make it clear that some of his readers had failed to separate Fetzer's own beliefs from those he had explicitly ascribed to De Millo et al.

part II: Fetzer's thesis

(see also part I above.)

Fetzer's stated goal is

... to investigate the arguments that DeMillo, Lipton and Perlis have presented in an effort to disentangle several issues that seem to have become intricately intertwined.

Accordingly, he spends a significant portion of the article carefully delineating his points of disagreement with De Millo's piece. The meat of his own argument is found in the last third of the article.

He observes the following distinction:

algorithm

a logical structure of the type function suitable for the derivation of outputs when given inputs;

program

a causal model of an algorithm obtained by implementing that function in a form that is suitable for execution by a machine;

In the section called Abstract Machines versus Target Machines, he further observes that

  1. As an effective decision procedure, an algorithm is more abstract than is a program, insofar as the same algorithm might be implemented in different forms suitable for execution by various machines by using different languages.
  2. Compilers, interpreters, processors and the like are properly characterized as physical things, i.e., as systems in space/time for which causal relations obtain. Abstract machines are properly characterized as abstract entities, i.e., as systems not in space/time for which only logical relations obtain.

(The above line of reasoning may seem familiar to some LtU readers.)

Stated yet another way (emphasis mine — V.N.):

...results in logic and mathematics fall within the domain of deductive methodology and require demonstrations. Lawful and causal claims fall within the domain of empirical inquiries and require inductive warrants. Observational and experimental findings fall within the domain of perceptual investigations and acquire support on the basis of direct sense experience.

Fetzer agrees with De Millo, Lipton, and Perlis on the following point:

... to the extent to which verification has a place in programming practice, it applies to the evaluation of algorithms; and to the extent to which programming practice goes beyond the evaluation of algorithms, it cannot rely upon verification.

Poor terminology

It seems Fetzer uses "program" to refer to what physically happens when you run a program. I don't see any hint of an attempt to argue that a program understood as a string of symbols subject to some language semantics is not a mathematical object amenable to deductive reasoning. Going from the uncontroversial claim that conclusions about physics require inductive justification to the conclusion that algorithms can be proven but programs cannot conflicts rudely with the common usage of the word.

De Millo, Lipton, and Perlis explain themselves: programs are too long and grotty to benefit from anything like the usual process of mathematical verification.
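To make the programs-as-texts point concrete, here is a minimal sketch of my own (OCaml chosen arbitrarily as notation; the toy language and the property it checks are hypothetical): a program text paired with a semantics is an ordinary mathematical object, and one can prove facts about every program in the language by structural induction, with no empirical premises at all.

    (* A toy expression language: a "program" here is just a syntax tree. *)
    type expr =
      | Lit of int
      | Add of expr * expr
      | Mul of expr * expr

    (* A semantics: a total function from syntax to integers.  Because
       [eval] is defined by structural recursion, claims such as
       "eval (Add (e, Lit 0)) = eval e for all e" can be proved by
       induction on [expr]. *)
    let rec eval (e : expr) : int =
      match e with
      | Lit n -> n
      | Add (a, b) -> eval a + eval b
      | Mul (a, b) -> eval a * eval b

    let () =
      let program = Add (Mul (Lit 2, Lit 3), Lit 1) in
      (* Running this on a physical machine is an empirical event; the
         equation [eval program = 7] is a mathematical claim. *)
      assert (eval program = 7);
      print_endline "eval program = 7"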

Fetzer, the strong verificationist

It seems Fetzer uses "program" to refer to what physically happens when you run a program.

Actually, Fetzer's section "Abstract Machines Versus Target Machines" defines four senses of the word "program" and discusses the relationships between them. I've summarized these below, based on Fetzer's definitions and subsequent descriptions:

(i) algorithms
(ii) encodings of algorithms for abstract machines
(iii) encodings of algorithms that can be compiled to machine code for a physical machine
(iv) encodings of algorithms that can be compiled to machine code and (successfully?) executed by a physical machine

Fetzer writes that "programs in senses (i) and (ii) are subject to absolute verification by means of deductive procedures". I'm sure the program verificationists appreciate his support, but I'm left wondering what the fuss is about. :)

part III: resulting firestorm

(See also part II above.)

Fetzer's paper provoked a very lively discussion on the pages of ACM Forum in subsequent issues of CACM. Arguments against Fetzer fall into the following broad categories:

  1. Calls for purging the (ir)responsible CACM editors on ideological grounds:

    ... by publishing the ill-informed, irresponsible, and dangerous article by Fetzer, the editors of Communications have abrogated their responsibility.

    (LtU readers are no strangers to arguments phrased in such terms as pointless, damaging, and deleterious :)

    In his response, Fetzer picked up on the ideological nature of this attack. He refers to Victor Basili and his esteemed comrades as "The Gang of Ten". More to the point, Fetzer points out that the Gang of Ten displays no comprehension at all of the fundamental distinctions between validity and soundness, programs and algorithms, pure and applied mathematics, abstract entities and physical systems... Spot on, Mr. Fetzer.

    As for the call for editorial purges, the attack was so vicious that it warranted a response by none other than Peter J. Denning, then Editor-in-Chief:

    I stand fully behind my editors... I am dismayed by the ad hominem attacks sprinkled through the letter, which detract from the thoughtful criticisms and worthwhile suggestions for more discussions.

    A lot of readers were appalled by the vitriolic anti-Fetzer and anti-ACM screeds penned by some of the academic verificationists. Richard Hill's response sums it up very well:

    Speaking for myself, after eight years of undergraduate and graduate studies in computer science and mathematics and eighteen years of both direct and management experience in writing, designing, testing, implementing, and operating large and small software systems in commercial, academic, and scientific environments, I feel that I should be entitled to make up my own mind about what kinds of research are useless and possibly harmful. I cannot condone the obscurantist attitude of those who would silence criticisms of their pet projects on the grounds that the critic is ignorant. If the criticism is unfounded, it will fall because of its own weakness. Is not the essence of the scientific process to encourage lively debate? What does academic freedom mean, if journals cannot publish controversial articles?

    Having read the vitriolic, unjustified, unreasoned attacks on Fetzer that have been published, I am led to suspect that at least some defenders of program verification can find no real arguments to rebut Fetzer's contentions and resort to meaningless insults in a desperate attempt to defend a position that cannot be logically defended.

    Well said.

  2. Accusations of ignorance:

    Fetzer seems to be unaware of work in program verification in which the aim is to bridge the gap between algorithms and executions. There has been much recent interest in the formal description and verification of hardware and operating systems. A high level description of an algorithm can be formally related to its interpretation with respect to a formal description of hardware. A mathematical description of hardware at the gate level would reduce the algorithm/execution gap to the point where only the relation between the gate description and the physical properties of silicon would be subject to Fetzer's distinction. And whereas it is true that physical gates do not always behave as their mathematical counterparts — for example, the formalism does not account for such contingencies as Luddite antiverificationists with hammers smashing the gates to bits — the semantic gap is sufficiently small to render Fetzer's objections inconsequential. To deny any relation between, say, a physical AND gate and the corresponding boolean function is to deny that there can be any useful mathematical model of physical reality. This is tantamount to asserting the impossibility of physical science.

    To which Fetzer responds:

    Since I certainly do not want to assert the impossibility of physical science — thereby terminating my own profession as a philosopher of science — I better be prepared to dispute the premises that yield this conclusion. But that turns out to be surprisingly effortless, insofar as even "vanishingly small" differences are enough, so long as they are "differences," i.e. do exist. For every argument that I have given for the difference between programs and algorithms obtains alike for physical AND-gates and their abstract counterparts.

  3. Attempts to trivialize Fetzer's argument (a technique not without parallels on LtU :)

    Writes James C. Pleasant:

    The article [...] does a disservice to the cause of the advancement of the science of programming by belaboring the rather obvious fact that programs which are run on real machines cannot be completely reliable, as though advocates of verification thought otherwise.

    Likewise, Lawrence Paulson, Avra Cohn, and Michael Gordon write:

    Fetzer [...] makes one important but elementary observation and takes it to an absurd conclusion. His point is this: a computer system involves physical components that can behave unpredictably, so no deductive argument can guarantee that the system will work perfectly. For example, a verified Strategic Defense Initiative (SDI) system could fail disastrously.

    In his response, Fetzer attempts to clarify his point:

    The principal position under consideration with respect to program verification, no doubt, is that of C.A.R. Hoare and those who share a similar point of view, a matter about which my article is quite explicit. On several occasions, I quote Hoare himself, most significantly as follows: "Computer programming is an exact science in that all the properties of a program and all the consequences of executing it can, in principle, be found out from the text of the program itself by means of purely deductive reasoning." This sentence, which is the most unambiguous formulation of his position that I have yet to discover, reflects a bold conjecture concerning computer programming — a conjecture that I contend is impossible to justify.

Discussion of Fetzer's article stretches across multiple subsequent issues of CACM. In many respects, the discussion is more illuminating than the article itself. Unfortunately, the Net has not yet treated as damage ACM's refusal to publish back issues of CACM free of charge. To access the following archived portions of the discussion of Fetzer's article, you need an ACM subscription:

  • Communications of the ACM 32(3) (March), pp. 287-290. [ACM Forum, Academe's vitriolic response to Fetzer's article by Ardis, Mark; Basili, Victor; Gerhart, Susan; Good, Donald; Gries, David; Kemmerer, Richard; Leveson, Nancy; Musser, David; Neumann, Peter; and von Henke, Friedrich (1989), "Editorial Process Verification" (letter to the editor, with replies by James H. Fetzer and Peter J. Denning)]
  • Communications of the ACM 32(3) (March), pp. 374-377 [Technical Correspondence, Pleasant, James C.; and Paulson, Lawrence; Cohn, Avra; & Gordon, Michael; and Bevier, William R.; Smith, Michael K.; & Young, William D.; and Clune, Thomas R.; and Savitzky, Stephen (1989), "The Very Idea" (5 letters to the editor)]
  • Communications of the ACM 32(4) (April), pp. 420-422 [Viewpoint by Dobson, John; and Randell, Brian (1989), "Program Verification: Public Image and Private Reality"]
  • Communications of the ACM 32(4) (April), pp. 506-512 [Technical Correspondence, Müller, Harald M.; and Holt, Christopher M.; and Watters, Aaron (1989), "More on the Very Idea" (3 letters to the editor, with reply by James H. Fetzer)]
  • Communications of the ACM 32(7) (July), pp. 790-792 [ACM Forum, Hill, Richard; and Conte, Paul T.; and Parsons, Thomas W.; and Nelson, David A. (1989), "More on Verification" (4 letters to the editor)]
  • Communications of the ACM 32, pp. 1130-1131 [Tompkins, Howard E. (1989), "Verifying Feature-Bugs" (letter to the editor), Technical Correspondence — a response to Harald Müller]

Apply program verification techniques

Thanks for sharing this. I think it is clear that program verification has limited (but important) applications, and that a path of progress would be to make these applications broader by applying techniques and principles of program verification to domains such as testing and quality assurance.

I sympathise with Lawrence Paulson, Avra Cohn, and Michael Gordon, whom you quote:

Fetzer [...] makes one important but elementary observation and takes it to an absurd conclusion. His point is this: a computer system involves physical components that can behave unpredictably, so no deductive argument can guarantee that the system will work perfectly. For example, a verified Strategic Defense Initiative (SDI) system could fail disastrously.

Semantic verification

There is another reason that a verified algorithm represented as a program might not work in a real environment. It has to do with semantic mismatch with the intended environment. Program semantics, as opposed to semantics of a programming language, is often overlooked or taken for granted in the design of a program. A semantic error will lead to "execution" problems of a different sort. It is not so much that it doesn't work but that we can't use it, or it doesn't work with some other bit of code, or it is incompatible with business practices, etc.

Really?

I'm surprised that you find Fetzer's thesis uncontroversial. I haven't yet read the whole paper, but I did read his response to his critics in the Technical Correspondence section. It consisted mostly of accusing his critics of bad faith, while basing his entire argument on a drastic over-interpretation of a single quote from Hoare, a quote that even the rest of Hoare's article suggests should not be read the way Fetzer reads it.

The whole thing suggests that Fetzer fails to understand that there are more distinctions to be made than those between an algorithm as described in pseudo-code in a textbook and that which executes at the bit-level on a machine. For example, a program written in SML has a formal semantics, and things can be proved about it. But it is lower-level than a simple description of an algorithm. Further, when models of entire chips are done in ACL2, this also suggests that Fetzer's distinction between algorithms and programs needs to be refined.

Finally, I'm not sure why you felt the need to implicitly inject criticism of other LtU posters into your discussion of the article - that seemed both inappropriate and out of place.

Thanks!

I have to confess that I did find el-vadimo's part III: resulting firestorm, with its link to my Background, in which I characterize the lead story, The irreducible physicality of security properties, as

...pointless at best, and damaging at worst; taken more seriously than it warrants, it could have the same deleterious effect on security research as a subdiscipline of PLT research that Gödel's incompleteness theorems had on number theory research, that the halting problem had on automated theorem proving for some 20 years, that the halting problem had on logic programming after the development of Prolog, etc.

a bit rich, if only for omission of a link to the lead, where we find:

"We should not talk of secure programming languages and secure programs, such talk does mislead..."

I used the word "deleterious" advisedly, as the lead story literally recommends deleting talk of secure programming languages and secure programs from the discourse. My analogy to Gödel's incompleteness theorems setting back research in automated theorem proving remains, in my opinion, apt, despite Charles Stewart's observation that Nicolaas de Bruijn was able to develop AUTOMATH in that timeframe, a singular accomplishment whose advancement of the state of the art is reflected in some people's—myself included—reference to the "Curry-de Bruijn-Howard Isomorphism." I'm not at all persuaded that AUTOMATH's existence contradicts the point in any substantial way.

Incompleteness

I find it hard to see how Gödel's results from 1931 could have 'set back' work that didn't begin until at least 15 years later. Automated theorem proving is hard, and I'm skeptical that it was really 'set back' by prior developments in the foundations of mathematics. Finally, I would be a little worried about suggesting that the Incompleteness theorem (which is true) was responsible for setting anything back - the systems wouldn't have been provably consistent anyway.

Don't take my word for it...

...you can find the assertion cited here. The point is that automated theorem proving was set back in the sense that shortcomings of the early systems, which were not in fact necessary, were for years believed to be necessary, owing to widespread overapplication of the incompleteness theorems. To be sure, all modern theorem provers, SAT solvers, etc. are subject to the incompleteness theorems—they're also strictly more powerful and efficient than the early provers with their (incorrectly) incompleteness-theorem-based limitations.

Simplistic AI culture

The problem was not "overapplication" of the incompleteness theorems. The problem was the gung-ho AI mentality of the day that was eager to philosophise and reluctant to do the mathematics. Or more simply, the problem was a failure to understand the incompleteness theorems and their implications for the work they were doing.

The "chilling effect" of this kind of negative result I think is maybe healthy, separating good from bad work.

SAT solvers

To be sure, all modern theorem provers, SAT solvers, etc. are subject to the incompleteness theorems [...]

As far as I remember, SAT solvers are propositional, so they are not subject to Gödel's incompleteness theorem (which applies to first-order languages with arithmetic axioms).
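A minimal sketch of why propositional satisfiability escapes the incompleteness theorems: with finitely many variables one can simply enumerate all truth assignments. This is hypothetical brute-force code of my own (OCaml), not how any real SAT solver works.

    (* Brute-force SAT for propositional CNF: decidable because there are
       only 2^n truth assignments over n variables.  Nothing analogous
       works for first-order arithmetic, which is where incompleteness
       and undecidability bite. *)
    type literal = { var : int; positive : bool }
    type cnf = literal list list

    let satisfies assignment clause =
      List.exists (fun { var; positive } -> assignment.(var) = positive) clause

    let satisfiable (n_vars : int) (formula : cnf) : bool =
      let rec try_all assignment i =
        if i = n_vars then List.for_all (satisfies assignment) formula
        else begin
          assignment.(i) <- true;
          try_all assignment (i + 1)
          || (assignment.(i) <- false; try_all assignment (i + 1))
        end
      in
      try_all (Array.make n_vars false) 0

    let () =
      (* (x0 or not x1) and (x1 or x2) and (not x0 or not x2) *)
      let f = [ [ { var = 0; positive = true }; { var = 1; positive = false } ];
                [ { var = 1; positive = true }; { var = 2; positive = true } ];
                [ { var = 0; positive = false }; { var = 2; positive = false } ] ] in
      Printf.printf "satisfiable: %b\n" (satisfiable 3 f)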

"Uncontroversial"

"Uncontroversial" seems about as accurate a description of Fetzer's claims as Fetzer's claims seem accurate. How about pedantically correct but inflammatorily phrased?

The paper seems to start with the uncontroversial epistemological point that deductive reasoning cannot absolutely guarantee that the sun will rise tomorrow, and indeed can't say anything about the sun at all without premises based in empirical evidence. The obvious corollaries are that "absolute verification" of programs without any physical premises is impossible, and that "relative verification" with respect to assumptions about hardware does not provide absolute guarantees (even leaving aside known probabilistic hardware faults). Fetzer then conflates "verification" with "absolute verification", reads "completely reliable" in a sense like "fails in no logically possible world", and so arrives at statements like "The success of program verification as a generally applicable and completely reliable method for guaranteeing program performance is not even a theoretical possibility".

Actually, I think he overreaches a bit even in his claims about what can be formally verified, saying that an "encoding of an algorithm" can be verified but that "encodings of algorithms that can be compiled and executed by a machine" cannot, unless maybe the latter is meant to refer literally to a pattern of charges in memory. If I invent a compiler that accepts your pseudocode and actually executes your programs, can that invalidate your proofs?

Here is the Hoare quote that heads the section actually titled "The Very Idea of Program Verification" (emphasis mine):

When the correctness of a program, its compiler, and the hardware of the computer have all been established with mathematical certainty, it will be possible to place great reliance on the results of the program, and predict their properties with a confidence limited only by the reliability of the electronics.

Exactly!

This is the claim of those of us who promote formal methods: that it is, in fact, possible to attain exactly that level of certitude in our software, the question being, as has been discussed here many times, at what cost? Modern microprocessors are already there. Certified compilers are, I would estimate, 2-5 years off. The same technology that will make certified compilers practical will likely make some other kinds of certified programs practical, for those willing to spend ~ the same amount of time in a proof assistant as in developing automated tests. :-) Then, yes, "all" you have to worry about is your TCB, which, barring certified operating systems, etc. is still far too large for comfort, but we'll still be an order of magnitude better off than we are today.

Modern microprocessors are

Modern microprocessors are already there.

Which modern microprocessor has already been proven bug-free? I am curious.

What is a TCB? Acronym Finder says "Trusted Computing Base", which does not tell the whole story.

Transputer

IIRC, the design for the INMOS Transputer T800 was formally verified in the late 1980s.

Postscript. Oh, here it is: May and Shepherd, 1987. Formal verification of the IMS T800 microprocessor.

The references to this paper that I have found only seem to talk about verification of the FPU, so it may be that only that was verified.

Jeremy Gibbons (in Formal methods: Why should I care? the development of the T800 Transputer floating-point unit (.ps.gz), 1993) discusses May & Shepherd's work.

Transputer verification

I don't believe that the full transputer design was verified. My understanding is that INMOS did what a lot of formal methods people tend to advocate: focused their formal efforts on a few key components that were likely to produce the most bang for the buck. One of those was the FPU, which was specified in Z by Geoff Barrett, taken through various transformations in occam by David Shepherd, and eventually ended up as microcode. Another was the scheduler, where the specification work was also done by Geoff Barrett in Z. Later, during the development of the T9000, Geoff Barrett and Vicky Griffiths did a bunch of work using CSP to specify and verify the "virtual channel processor" (this work was the seed from which the FDR2 refinement checker for CSP grew).

Craigen, Gerhart and Ralston, Industrial applications of formal methods to model, design and analyze computer systems: an international survey, Noyes Publications, 1995 has a nice summary of the Transputer work.

Thanks, CG&R

Thanks for the clarification. The CG&R paper is available as a scanned PDF.

I only just briefly read

I only just briefly read over this report, but I feel someone should point out how important this is. There is a clear discussion of all the issues already in 1988, and many practical examples of application. Not only that, but the report seems to be the basis of decisive regulatory action: the origin of the code certification required in avionics software. It should be pointed out that a key element of this process is SPARK, a subset of the Ada programming language.

Edit: And don't forget to

And don't forget to watch for the next episode of Safe and Secure Software: An Introduction to Ada 2005.

Volume 2: Case Studies

Actually, what I'd been looking at was a book of the same title by CG&R, which incorporates both the paper you've linked to, and a second volume with more detailed information on some of the case studies. Fortunately, it appears that "volume 2" is also available as a PDF, via Citeseer (apparently from PS source provided by NIST). The descriptions in volume 2 provide a lot of interesting context, and are well worth the read in my opinion.

Volume 1 and Volume 2

Both Volume 1 and Volume 2 are available from NIST as plain-text and PostScript files (easier to read than the PDF scans: vol. 1 and vol. 2):

  1. An International Survey of Industrial Applications of Formal Methods. Volume 1: Purpose, Approach, Analysis, and Conclusions, by Dan Craigen, Susan Gerhart, and Ted Ralston, March, 1993.

  2. An International Survey of Industrial Applications of Formal Methods. Volume 2: Case Studies, by Dan Craigen, Susan Gerhart, and Ted Ralston, March, 1993.

AAMP7

The VHDL-level specification of the AAMP7 from Rockwell Collins has been fully verified. That case is interesting because they confirmed not only operational correctness but also enforcement of a separation policy.

I do not know how deep into the silicon synthesis (if at all) their efforts have gone.

Intel

My understanding is that after the Pentium I floating-point division bug, Intel has made a huge investment in crafting models of their CPUs and proving them correct: they employ John Harrison, who does the work in HOL Light.

TCB

TCB does indeed stand for "Trusted Computing Base." My definition is taken from Mark Miller (markm here on LtU), who has described the TCB as "the part of the system that you have no choice but to trust." In both the security and the reliability realms (and part of markm's genius is his insight that the two concerns are not separable), a major goal is to make the TCB as small as possible, which is another way of saying, to have explicit assurances of correctness of as much of the system as possible.

My major argument is that, today, the TCB is essentially the entire system: every application, every plug-in, the operating system, every device driver, the CPU, the GPU, etc. It's mad, and it's within our power to change.

Certified compilers are already here

At least, a certified compiler is already here.

The CompCert certified compiler

Indeed

Dr. Leroy's work has been previously storied here and mentioned in several other threads, notably the YNot thread.
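For readers wondering what a certified compiler actually certifies, here is a toy sketch of my own (hypothetical OCaml, in no way representative of CompCert's internals): a compiler from arithmetic expressions to a tiny stack machine, together with the semantic-preservation property that CompCert proves in Coq for every source program and that this sketch merely spot-checks on one example.

    (* Source language and its semantics. *)
    type expr = Lit of int | Add of expr * expr

    let rec eval = function
      | Lit n -> n
      | Add (a, b) -> eval a + eval b

    (* Target language: a stack machine. *)
    type instr = Push of int | IAdd

    (* The compiler: post-order flattening to stack-machine code. *)
    let rec compile = function
      | Lit n -> [ Push n ]
      | Add (a, b) -> compile a @ compile b @ [ IAdd ]

    (* Target semantics. *)
    let run (code : instr list) : int =
      let step stack i =
        match i, stack with
        | Push n, _ -> n :: stack
        | IAdd, x :: y :: rest -> (y + x) :: rest
        | IAdd, _ -> failwith "stack underflow"
      in
      match List.fold_left step [] code with
      | [ result ] -> result
      | _ -> failwith "ill-formed code"

    let () =
      let e = Add (Lit 1, Add (Lit 2, Lit 3)) in
      (* The preservation property: running the compiled code yields the
         same value as evaluating the source.  A certified compiler
         proves this for all [e]; here we merely test one. *)
      assert (run (compile e) = eval e);
      Printf.printf "eval = %d, run (compile e) = %d\n" (eval e) (run (compile e))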

Point of view

How about pedantically correct but inflammatorily phrased?

To people in computer science "verification" applies only to the code, and beyond that there are many shades of gray. To an outsider verification means just that. It sounds like a guarantee. I don't think Fetzer goes too far.

Suppose I verify machine code

What then? The only thing I'm trusting is my logic and the description of the processor's ISA. It's possible, in theory, to verify the ISA against the actual layout diagrams of transistors on chip, at which point I'm down to trusting that the chip was fabbed correctly and that the transistors (as physical electrical components) work reliably.

At a certain point any verification effort has to "bottom out." If nothing else, we have to trust that our logic is consistent. But the point is that we can bottom out on a pretty trustworthy basis if we work hard enough.

The fact that it's not possible, even in principle, to trust nothing is not an interesting observation.

Cultures,

The main point I want to make is that physical scientists and engineers don't often use terms like "proof" or "theorem proving" in connection with their work. They may calculate all the forces in a bridge, but they don't consider that to prove anything; bridges are more complicated than that. It is natural for someone like Fetzer, or myself, to raise their eyebrows on hearing an expression like "proof" or "verification" in connection with an artifact such as software. Artifacts have a theory and a physical incarnation. The two can be very close or quite far apart. An exact match between theory and practice is not essential. Theory is often simplified to get answers, or elaborated if more detail is needed.

Terms like "proof" or "theorem" come from the world of mathematics and theory. We prove things in the realm of theory not in a practical arena. Software is ultimately practical and useful. A "car" becomes a car when you put it on a road. A program becomes a real thing when you run it.

Having said all this, I enthusiastically support the use of formal methods both for software and hardware when appropriate. Formal methods have a special role in software in that they provide semantics. Physical things already have semantics; we don't need to supply it, we only need to understand it. A software artifact is a virtual "thing". We must worry about its semantics. Formal methods are important primarily because they provide semantics. Semantics should provide "safety" and "security" benefits, but won't guarantee them.

"Verification" is a matter of semantics...

It is natural for someone like Fetzer, or myself to raise their eyebrows on hearing an expression like "proof" or "verification" in connection with an artifact such as software.

Actually, engineers (software and otherwise) use the term "verification" quite often*. However, when engineers say "verification", we usually mean something quite different than what the "program verification" folks mean. Verification, in standard engineering parlance, is some kind of demonstration that a design meets its requirements. That demonstration is often by testing, but may also (depending on the kind of requirement) be "by inspection" or "by analysis". The latter comes closest to what the program verification people have in mind. But usually the analysis is focused on specific pieces of the design, instead of trying to verify the whole system (and the models used in that analysis are often required to be later validated against test data). Where formal methods have been used in industrial practice, they've typically been used in much the same way: focused on critical design elements.

* I agree that "proof" is almost never heard in engineering circles.

[Edit: Arguably, the point of "verification" in a traditional engineering setting is to "prove" that a design has certain properties (after all, Roget's gives "verify" as a synonym for "prove"). But I've rarely heard the words "proof" and "prove" bandied about during verification, even when the verification in question was some kind of deductive analysis. I suspect that the word "proof" implies a kind of absolute certainty that most engineers I've worked with are hesitant to lay claim to. Of course, I can't speak for all engineering organizations, everywhere - there may well be engineering organizations where the term "proof" is in common use.]

Engineering also focuses on

Engineering also focuses on conservative approximations with appropriate safety margins. The more precise the science behind the engineering, the slimmer the margins. With computer science, that margin can be made zero, hence the use of "proof".

A marginal note

The more precise the science behind the engineering, the slimmer the margins.

Not necessarily. Margins are as often about uncertainties in the environment as they are about uncertainties in the design itself. I may have a very accurate understanding of the yield strength of the material used to build a structure, yet still design in a large safety factor because I don't know exactly what the loads applied to the structure will be. Software engineers face the same problem of uncertain environments, regardless of how precise (or accurate) the science behind the software is.

Fuzzy logic

Uncertain environments are a good application for fuzzy logic. Is fuzzy logic reducible to a proof?

Obviously any analysis is

Obviously any analysis is limited by precision of the input variables, and as you say, software is no exception. The state of a non-deterministic system is an example of an "imprecise variable" in software. However, totality is one tool one can use to mitigate imprecision with non-deterministic execution, as the language ensures that all possible system states are handled (again, subject to the assumptions of the language and its libraries).

Effectively, totality makes the system state an infinitely precise variable, so it's possible to make the margin infinitely slim. An infinitely slim margin is effectively a proof, thus, total functions are proofs, which we already know. :-)
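A minimal sketch of the totality point, assuming a hypothetical three-state system (OCaml): the compiler's exhaustiveness check is what flags any unhandled state, which is the modest sense in which a total handler doubles as a proof of coverage.

    (* A hypothetical system with a small, closed set of states.  A total
       handler must cover every constructor; delete any branch below and
       the compiler's non-exhaustive-match warning (warning 8) reports
       the forgotten state. *)
    type system_state =
      | Idle
      | Running of int        (* progress counter *)
      | Faulted of string     (* diagnostic message *)

    let describe (s : system_state) : string =
      match s with
      | Idle -> "idle"
      | Running n -> Printf.sprintf "running (step %d)" n
      | Faulted msg -> "faulted: " ^ msg

    let () =
      List.iter
        (fun s -> print_endline (describe s))
        [ Idle; Running 3; Faulted "sensor timeout" ]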

hardware verification

Re: comment-41384: "… models of entire chips are done in ACL2 …" (see also comment-41403).

As neelk would say, hardware is a more or less first-order imperative system. As such, it is, indeed, tractable. Peter Gutmann put it thusly in his Ph.D. thesis The Design and Verification of a Cryptographic Security Architecture (§3.6, Where Formal Methods Are Cost-Effective, p. 117):

Is there any situation in which formal methods are worth the cost and effort involved in using them? There is one situation where they’re definitely cost-effective and that’s for hardware verification. The first of the two reasons for this is that hardware is relatively easy to verify because it has no pointers, no unbounded loops, no recursion, no dynamically created processes, and none of the other complexities which make the verification of software such a joy to perform.

The second reason why hardware verification is more cost-effective is because the cost of manufacturing a single unit of hardware is vastly greater than that of manufacturing (that is, duplicating) a single unit of software, and the cost of replacing hardware is outrageously more so than replacing software. As an example of the typical difference, compare the $400 million which the Pentium FDIV bug cost Intel to the negligible cost to Microsoft of a hotfix and soothing press release for the Windows bug du jour. Possibly inspired by Intel’s troubles, AMD spent a considerable amount of time and money subjecting their FDIV implementation to formal analysis using the Boyer-Moore theorem prover, which confirmed that their algorithm was OK.

Another factor which contributes to the relative success of formal methods for hardware verification is the fact that hardware designers typically use a standardised language, either Verilog or VHDL, and routinely use synthesis tools and simulators, which can be tied into the use of verification tools, as part of the design process. An example of how this might work in practice is that a hardware simulator would be used to explore a counterexample to a design assertion which was revealed by a model checker. In software development this type of standardisation and the use of these types of tools doesn’t occur.

These two factors, the fact that hardware is much more amenable to verification than software and the fact that there’s a much greater financial incentive to do so, is what makes the use of formal methods for hardware verification cost-effective, and the reason why most of the glowing success stories cited for the use of formal methods relate to their use in verifying hardware rather than software [131][132][133][46]. One paper on the use of formal methods for developing high-assurance systems only cites hardware verification in its collection of formal methods successes [134], and another paper concludes with the comment that several of the participants in the formal evaluation of an operating system then went on to find work formally verifying integrated circuits [124].

a mishmash of followups

In comment-41384, Sam Tobin-Hochstadt wrote:

Really?

Really. I disagree that Fetzer's entire argument is based "on a drastic over-interpretation of a single quote from Hoare". My own reading of the relevant (albeit admittedly cherry-picked) verification literature suggests that many prominent verificationists, intentionally or unintentionally, convey the following message:

  • "Besides the notion of productivity, also that of quality control continues to be distorted by the reassuring illusion that what works with other devices works with programs as well. It is now two decades since it was pointed out that program testing may convincingly demonstrate the presence of bugs, but can never demonstrate their absence. After quoting this well-publicized remark devoutly, the software engineer returns to the order of the day and continues to refine his testing strategies, just like the alchemist of yore, who continued to refine his chrysocosmic purifications."

    (The implication being that mathematical methods can demonstrate the absence of bugs, sans doute.)

  • "Without the verification, I claim that accidental complexity prevents us from doing anything more than ascribing a subjective probability that the implementation has the properties that we want and lacks the properties that we don't want to it."

    (The implication being that, with verification, one can transcend the mere ascription of subjective probability and achieve certainty.)

ibid.:

The whole thing suggests that Fetzer fails to understand that there are more distinctions to be made than those between an algorithm as described in pseudo-code in a textbook and that which executes at the bit-level on a machine. For example, a program written in SML has a formal semantics, and things can be proved about it. But it is lower-level than a simple description of an algorithm.

Is it any wonder that you found Fetzer "accusing his critics of bad faith"? Half of his critics didn't even seem to have read his paper. As Anton pointed out in comment-41398, Fetzer most certainly understands that there are many distinctions to be made. In Anton's paraphrase, the case of SML falls under "(ii) encodings of algorithms for abstract machines".

ibid.:

...models of entire chips are done in ACL2

My reading of Fetzer suggests that he would certainly agree that modeling chips in ACL2 was a good thing. That said... I went to Peter's presentation on ACL2s the other day. He mentioned that AMD continued to use ACL2 for verification of their FPU. They do not verify the entire chip. (For that matter, I'd be shocked if Intel verified the entirety of theirs. They probably limit their endeavors to the FPU as well...) What was more interesting to me though is that it took Peter a couple of hours to prove in ACL2 that his function fib (a naive, exponential-time version of Fibonacci) computed the same results as fib3 (a smarter, tail-recursive version of the same). Reminded me of machine-obstructed proof...
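For readers who haven't seen the two shapes of definition, here is roughly what such a pair looks like (a hypothetical OCaml sketch of mine, not Peter's actual ACL2 code): the naive doubly recursive definition and an accumulator-passing tail-recursive one. The lemma that the two agree is exactly the kind of "obvious" fact that still takes real work in a proof assistant.

    (* Naive Fibonacci: follows the mathematical definition directly, but
       takes exponential time because it recomputes subproblems. *)
    let rec fib n =
      if n < 2 then n else fib (n - 1) + fib (n - 2)

    (* Tail-recursive Fibonacci: linear time, constant stack, but its
       correctness is no longer "by definition"; one has to prove the
       invariant that [a] and [b] track consecutive Fibonacci numbers. *)
    let fib3 n =
      let rec go i a b = if i = n then a else go (i + 1) b (a + b) in
      go 0 0 1

    let () =
      (* A finite test, which of course is evidence rather than proof. *)
      for n = 0 to 20 do
        assert (fib n = fib3 n)
      done;
      print_endline "fib and fib3 agree on 0..20"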

Re: "...limited only by the reliability of the electronics"

I almost buy this claim for circuits and programs whose complexity is commensurate with that of a thermostat. For more complicated contraptions, I don't believe this is an attainable goal. A quick mental experiment, if I may. Imagine a brand-new Boeing 797 whose flight controller (all three versions of it) has been designed and built from scratch, with all of its hardware and software formally verified. Verified but never tested. Welcome aboard. Are you coming or not?

As an aside, there is an interesting article by Peter Denning called The Choice Uncertainty Principle. (The same Peter Denning who had to stand up for his editors when the verificationists began calling for Rob Kling's resignation.) The question he asks is, Given that "it is impossible to make an unambiguous choice between near-simultaneous events under a deadline, how can computations choose reliably?" His answer is, they can't so choose:

A half-signal input can cause the flipflop to enter a "metastable state" for an indeterminate time that may exceed the clock interval by a large amount. The flipflop eventually settles into a stable state, equally likely to be 0 or 1... Unambiguous choices between near-simultaneous signals must be made in many parts of computing systems, not just at interrupt flipflops...

We can summarize the preceding analysis as the choice uncertainty principle: "No choice between near-simultaneous events can be made unambiguously within a preset deadline." The source of the uncertainty is the metastable state that can be induced in the chooser by conflicting forces generated when two distinct signals change at the same time.

It is a mistake to think the choice uncertainty principle is limited to hardware. Suppose your software contains a critical section guarded by semaphores. Your proof that the locks choose only one process at a time to enter the critical section implicitly assumes that only one CPU at a time can gain access to the memory location holding the lock value. If that's not so, then occasionally your critical section will fail no matter how careful your proofs. Every level of abstraction at which we prove freedom from synchronization errors always relies on a lower level at which arbitration is solved. But arbitration can never be solved absolutely.
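To make Denning's semaphore remark concrete, here is an entirely hypothetical sketch (OCaml with the threads library): the usual mutual-exclusion argument for code like this treats acquiring the lock as an indivisible choice, an assumption the proof inherits from lower layers (ultimately from hardware arbitration) rather than establishes.

    (* build with: ocamlfind ocamlopt -package threads.posix -linkpkg lock_demo.ml *)

    (* Two threads increment a shared counter inside a mutex-protected
       critical section.  The proof of mutual exclusion reasons about the
       lock as an atomic choice point; that concurrent lock attempts are
       actually arbitrated correctly is assumed, not established here. *)
    let counter = ref 0
    let lock = Mutex.create ()

    let worker iterations () =
      for _ = 1 to iterations do
        Mutex.lock lock;          (* assumed: atomic, correctly arbitrated *)
        counter := !counter + 1;  (* critical section *)
        Mutex.unlock lock
      done

    let () =
      let t1 = Thread.create (worker 10_000) () in
      let t2 = Thread.create (worker 10_000) () in
      Thread.join t1;
      Thread.join t2;
      (* With correct arbitration this prints 20000 every time. *)
      Printf.printf "counter = %d\n" !counter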

Allan McInnes writes in comment-41407:

...INMOS did what a lot of formal methods people tend to advocate: focused their formal efforts on a few key components that were likely to produce the most bang for the buck.

That is very true. All of the formal-methods people who actually do (or did) this stuff for a living (plus some highly responsible members of academia) always stress the benefits of formal specification and disavow what Charlie Martin called Hoare's Folly (Message-ID: <16964@duke.cs.duke.edu>). See, for example:

While it's quite clear that

While it's quite clear that you are critical of at least the claims some people make of formal methods, after having read all of your arguments in this and other threads, I'm still left with a burning question: what is your point?

Are you suggesting that research into formal methods stop? Are you suggesting research into formal methods should decrease?

It seems obvious to me that given two systems, one built with a formally verified implementation, and one without, the verified implementation provides higher confidence in its correctness. The confidence arises because the verification provides some guarantees, not because it can guarantee everything. This degree of confidence is obviously subject to the assumptions made in the verification, assumptions which can be borne out by experimental verification, or further verification at a different level of abstraction.

Ultimately, experimental verification of a sort is required, as the scientific principles upon which all of our knowledge of the world is based are fundamentally phenomenological.

Exploiting verification requires significantly less evidence to achieve high confidence than experimentation on an unverified implementation though. In short, I don't see how any of your arguments reduce the usefulness of formal methods, so I'm still left with the burning question: what is your fundamental point?

Fetzer, the careful epistemologist

A year and a half after the paper was published, Fetzer clarified his position in a Usenet posting (Message-ID: <3241@umn-d-ub.D.UMN.EDU>). Reproduced for posterity:

hal and others have observed that my exchange with Charles Martin has become something less than a model of academic discourse. I want to make an attempt to salvage something worthwhile from this "debate" that might possibly shed some light on the real issues that lie below the surface here. In his article 2961, for example, Martin acknowledges the importance of the distinction between pure and applied mathematics. He also endorses the difference between absolute and relative forms of support (proofs, demonstrations, verifications, what have you). Whether or not he identifies the sources of these distinctions relative to this debate is less important than that these distinctions be acknowledged as relevant — even fundamental — to the questions considered.

Much of the distress in the program verification community seems to me to have arisen from a failure to adequately distinguish between various different positions that are possible here. I freely admit that I was less concerned with drawing distinctions between various positions at the time than I was to assess those that concerned me. Let me therefore attempt to make a modest contribution toward that end by specifying several different positions that no doubt deserve to be separated, where some concern logic, others methodology, and others verification:

Positions of logic:
  • (T1) Formal proofs of program correctness can provide an absolute, conclusive guarantee of program performance;
  • (T2) Formal proofs of program correctness can provide only relative, inconclusive evidence concerning program performance.
Positions of methodology:
  • (T3) Formal proofs of program correctness should be the exclusive methodology for assessing software reliability;
  • (T4) Formal proofs of program correctness should be the principal methodology for assessing software reliability;
  • (T5) Formal proofs of program correctness should be one among various methodologies for assessing software reliability.
Positions on verification:
  • (T6) Program verifications always require formal proofs of correctness;
  • (T7) Program verifications always require proof sketches of correctness;
  • (T8) Program verifications always require the use of deductive reasoning.

My original paper was devoted to the distinction between (T1) and (T2) as matters of logic. I contend that (T1) is false but that (T2) is true. In relation to questions of methodology, I also maintain that (T3) is false, and at the ACM CSC 90 debate, I argued further that (T4) is also false. I have no doubt, however, that (T5) is true. With respect to verification in the broad sense, it is my view that (T6) is clearly false and that (T8) is clearly true, but that the truth-value of (T7) is subject to debate. Much might hinge upon whether these proof sketches had to be written down, could be merely thought-through, or whatever. In the former case, (T7) becomes close to (T6), in the latter case, closer to (T8). However these matters may finally be decided, these seem to be the principal distinctions which need to be drawn to get at the issues that lie beneath the surface here.

Let me conclude with one further gesture of agreement with Charles Martin. He acknowledges that there are important differences between accounting procedures, for example, and life-critical situations. On this we could not more strongly agree. The greater the seriousness of the consequences that would ensue from making a mistake, the greater our obligation to insure that it is not made. This suggests that the role for prototyping and testing increases dramatically as life-critical tasks come into play. We can afford to run a system that has only been formally assessed when the consequences of mistakes are relatively minor (Nintendo games, for example), but not when they are serious (radiation therapy, for example). So it becomes a matter of how confident we need to be that a system will perform as it is intended to perform. We cannot know without testing the system.

Let me be the first...

...to say that I agree with the entire blockquote verbatim.

Having said that, I would want the TCB of the radiation therapy system to be confined to the hardware exclusive of the instruction set of the processor, by which I mean that, ideally, the entire software stack and processor implementation would be proven correct, and testing could be confined to ascertaining, e.g. the mean time between failures of the components, etc. That is, the role of testing in life-critical systems cannot be to prove the absence of defects, since testing cannot prove the absence of defects. But you must perform testing even in systems whose entire software stack is proven correct precisely because of the possibility of exogenous failure states.

ceci n'est pas une pipe

In comment-41458, Paul Snively wrote: "you must perform testing even in systems whose entire software stack is proven correct precisely because of the possibility of exogenous failure states."

If by exogenous you mean:

Testing cannot prove the absence of defects. But neither can proof of correctness.

then we are in agreement.

As Jeannette Wing pointed out, there are two boundaries between the mathematical world and the real world. First, you must go from informal requirements to a formal specification. (Customers cannot articulate the correct behavior but they know the incorrect behavior when they see it.) Wing says: "This mapping from informal to formal is typically achieved through an iterative process not subject to proof." Or, as Anton would say, One can't proceed from the informal to the formal by formal means. Case in point:

It is also worth noting that many problems are inherently complex. For example, what does it mean for floating-point arithmetic to be correct? It took many years to settle on the IEEE floating-point arithmetic standards 754 and 854, and William Kahan was awarded the Turing award for his contributions to this effort.
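(As a small illustration of how slippery "correct floating-point arithmetic" is, here is a hypothetical snippet of my own showing that even associativity of addition fails under IEEE 754 rounding, so the specification has to say exactly which rounding happens where.)

    (* Floating-point addition is not associative under IEEE 754 rounding. *)
    let () =
      let a = 0.1 and b = 0.2 and c = 0.3 in
      Printf.printf "(a +. b) +. c = %.17f\n" ((a +. b) +. c);
      Printf.printf "a +. (b +. c) = %.17f\n" (a +. (b +. c));
      Printf.printf "equal? %b\n" ((a +. b) +. c = a +. (b +. c))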

"The second boundary is crossed in the mapping from the real world to some abstract representation of it. ... The formal specification is only a mathematical approximation of the real world." As Count Korzybski would say, The map is not the territory. Case in point:

Now let me take a more sophisticated example: cryptography on smart cards. This is a field that seems to be natural for verification. Indeed we could hope to prove, for example, that a cryptographic algorithm needed exponential time to break by brute force and that it was correctly implemented on a smart card. So the smart card would be secure, right? Wrong. Along comes Paul Kocher with a watt meter and breaks the key that you have proved is secure. How did he do that? He did it by bypassing the assumptions you made in your proof.

Formal methods do lead to better software. (Although most of the value seems to come from writing the formal spec. It's not clear if program verification adds that much value beyond the benefits of formal specification.) However, formal methods do not save us from moral hazard.

As Yaneer Bar-Yam puts it,

Intuitively, the complexity of a task is the number of wrong choices for every right choice.

Formal methods do not eliminate wrong choices. They just make it possible to make wrong choices in new and exciting ways.

Program verification has been and continues to be oversold — possibly at the expense of more fruitful approaches...

A lot of this looks endogenous rather than exogenous to me, but hey, as long as we agree on the substance, we don't have to agree on the terminology. Heck, we don't even have to agree on the substance.

Points of Agreement

I don't think anyone will disagree (least of all me) that, in software development, formally specifying what you are trying to build is the hard part, whether you intend to do any kind of formal proofs or not. This is why I think that tools that allow you to rapidly evolve some kind of specification, state assumptions about it, and check for counterexamples—I am, of course, thinking of Alloy here—are the most important ones to day-to-day software development.

el-vadimo: Customers cannot articulate the correct behavior but they know the incorrect behavior when they see it.

I would say instead that customers need guidance on how to articulate correct behavior unambiguously.

el-vadimo: "This mapping from informal to formal is typically achieved through an iterative process not subject to proof."

The point is not that the iterative process be subject to proof; the point is that you can prove that you've implemented what each step in the process arrives at, and nothing else.

el-vadimo: Or, as Anton would say, One can't proceed from the informal to the formal by formal means.

Quoting Alan Perlis, of course. :-) Once again, there's nothing to disagree with here, and in fact I would point out that Perlis' point was that one should be formal (which is not the same as complete or exhaustive, per Jackson and Alloy!) from the outset.

el-vadimo: [Good examples of complexity of floating-point arithmetic and hardware-based attacks on smartcards elided for brevity.]

There are countless application domains that can be considered complex—perhaps even more complex, by some consensus definition of "complexity," than defining floating-point arithmetic, but are not definitional in nature. The most obvious example to me would be to fully specify the surface and abstract syntaxes and static and dynamic semantics of a dependently-typed programming language with some sort of imperative features that implements some process calculus at the type system level and provides information flow security features along the lines of Flow Caml. I feel quite confident in saying that this would be worth formalizing and the resulting compiler proving correct—in fact, I feel quite confident in saying that it should be correct by construction, extracted from a proof assistant such as Coq.

Would you disagree with this, and if so, why?

As for the smartcard case, it's a very nice example of the exogenous failure case: the smartcard didn't (can't) fall to any of a vast (known!) array of attacks on its cryptography. This means that it is more secure than it would be otherwise—a positive benefit. However, it fell to something exogenous, that is, outside the system addressed by the proof. No one is saying it's possible to avoid that; there's always something outside the scope of the system definition addressed by the proof, if for no other reason than quantum mechanics, which is one reason I try to focus on programming-language design when thinking about correctness and proof: part of the design process is very much about knowing what's inside and what's outside the scope of the language; the demarcation is quite clear.

el-vadimo: Formal methods do lead to better software. (Although most of the value seems to come from writing the formal spec...

Here I think we find our strongest point of agreement.

el-vadimo: ...It's not clear if program verification adds that much value beyond the benefits of formal specification.)

I would say that this depends heavily upon the domain and costs of failure. There's a lot to glean from that paper; thanks for the link. Some parts that I find noteworthy:

Two observations arise from this discussion. The first observation is that the elimination of implementation errors would mean that any errors detected in the final outer loop must be requirements errors. If a Program Verifier were used, then only one iteration within the inner loop would be required; further there would be a significant reduction in costs due to the elimination of residual implementation errors that otherwise would be detected within the outer loop of requirements testing. The second observation is that the elimination of requirements errors in general has a larger impact than the elimination of implementation errors.

If source code generation and its formal verification can be automated then the problem of evolving requirements is mitigated if the cost of re-developing software and verifying it is significantly reduced. However the most important contribution of the Program Verifier challenge is that it could be scalable in a way that the elimination of requirements errors cannot be.

Consider the following simplistic, but useful, illustration. Suppose the elimination of requirement errors through the use of formal techniques and human expertise resulted in a 50% saving in a development. Now consider the limited expertise available. For a number of development projects totalling £100,000,000,000 (one hundred billion pounds), it would not be unreasonable to consider that only 1% could be addressed by the scarce human expertise available. This means that savings of 50% of one billion pounds could be achieved, a not inconsiderable sum of five hundred million pounds.

The scalability of the result of the Program Verifier challenge means that potentially all the projects totalling one hundred billion pounds could be addressed. Even a 10% saving would result in a saving of ten billion pounds. An experiment conducted by QinetiQ based on the Typhoon development indicates a saving of 30% over the system development lifecycle. An important aspect of the experiment was to compare costs to achieve the same outcome, to pass the same set of acceptance tests. This means that the objective of software verification was not to achieve a failure rate better than that of the conventional development, rather it was to achieve at least the same quantifiable failure rate but at significantly less cost.

In order to achieve the potential benefits of the Program Verifier grand challenge a number of research issues must be addressed, the following constitute a minimum.

  • A validated and rigorous measurement framework for comparing costs and benefits for development processes.
  • The development of Program Verifiers for commercial language subsets with respect to commercially accepted modelling languages, this implies two supporting objectives:
    • language semantics for subsets of commercial languages (such as C and C++) that are mechanically checkable and sympathetic to program verification needs;
    • model language semantics that are mechanically checkable and sympathetic to program verification needs.
  • Automated proof maintenance to support the evolution of the software.

Much work has already been done in these areas, but the products of these areas need to be brought together and put into a coherent framework to support the Program Verifier challenge. A repository to collate results in this area would be an important step. The repository should also be used to make a judgement on the fitness for purpose of products in these areas. For example, there are many semantic models for languages; there are rather fewer that are mechanically verifiable and directly support software verification.

This is rather obviously a call for certified compilers from modeling languages to, e.g., verifiable C. Caduceus can provide the verifiable-C aspect; now we just need a certified modeling-language compiler to C that emits the necessary conditions as comments in the C source that Caduceus uses. :-)
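For concreteness, here is a minimal sketch of what "conditions as comments in the C source" might look like, written in the ACSL-style annotation syntax that Caduceus (and later Frama-C) accepts. The function and its contract are hypothetical; only the shape of the annotations matters, and the exact keyword set varies between Caduceus and its successors.

    /* Hypothetical C emitted by a modeling-language compiler: the proof
       obligations travel with the code as structured comments that a
       Caduceus-style verifier can discharge. */

    /*@ requires lo <= hi;
      @ ensures  lo <= \result <= hi;
      @ ensures  (lo <= x <= hi) ==> \result == x;
      @*/
    int clamp(int x, int lo, int hi)
    {
        if (x < lo) return lo;   /* below the range: pin to the lower bound */
        if (x > hi) return hi;   /* above the range: pin to the upper bound */
        return x;                /* already in range: return unchanged */
    }

A certified modeling-language compiler would generate both the body and the /*@ ... @*/ contract, leaving the C-level verifier to check that the two agree.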

Conclusions

The popular prejudice is that formal methods add cost to add reliability, I challenge this view... A programme of work to drive down the costs of verification and validation for commercial models and languages is required. The trend is towards a set of automatic code generators from commercial modelling languages that lend themselves to automated verification i.e. a set of program verifiers.

Systems are becoming more complex increasing the chances that errors will occur at higher levels, but in order to address this we need secure foundations. Software verification is important because it provides a solid foundation for Formal Methods to be applied in the design and requirements phase to gain further benefits.

Honestly, I fail to understand how one arrives at "It's not clear if program verification adds that much value beyond the benefits of formal specification" from all of this.

el-vadimo: However, formal methods do not save us from moral hazard.

Moral hazard is related to asymmetric information, a situation in which one party in a transaction has more information than another. The party that is insulated from risk generally has more information about its actions and intentions than the party paying for the negative consequences of the risk. More broadly, moral hazard occurs when the party with more information about its actions or intentions has a tendency or incentive to behave inappropriately from the perspective of the party with less information.

While it is indeed unlikely that moral hazard can be entirely eliminated from interactions accomplished by mutually distrustful parties through software, we can, as usual, do vastly better than we do today. I would read Nick Szabo's essays as well as How to Write a Financial Contract and give Flow Caml a look.

el-vadimo: Formal methods do not eliminate wrong choices. They just make it possible to make wrong choices in new and exciting ways.

This is, of course, too vague to be helpful. Formal methods do eliminate (some) wrong choices. They also make it possible to encounter wrong choices that you didn't know were wrong when you proved that you didn't make the wrong choices you knew about. If the wrong choices you knew about would have had a significant cost to make, relative to proving that you didn't make them, you come out ahead. Given the open world assumption, that's as good as it gets.

el-vadimo: Program verification has been and continues to be oversold — possibly at the expense of more fruitful approaches...

From the "has been" link:

Finally, to avoid any misunderstanding as to our own views, let us make it clear that we in no way dispute that program verification can have a significant role to play in the vital task of clarifying specifications and removing design faults. That, however, is not the issue.

OK. Then what is?

From the "continues" link: what exactly is being oversold here? They talk about Microsoft's SDV and go on to say that it would be nice to extend the concept to software in general. Nowhere do they say anything like "do this and your computer will never crash."

The "more fruitful approaches" link is all about being correct by construction as opposed to verified ex post facto. More than anything else, this leads me to believe that I haven't been sufficiently clear in my terminology: the application of proof that I'm most interested in is precisely that which results in a compiler for a new language being extracted from a proof assistant such as Coq: that is, the compiler is correct by construction. I am 100% in agreement with Anthony Hall.

Does this mean that I am in 100% agreement with you?

The only reason that tools and techniques such as Caduceus are interesting to me is because there will always be people who insist that they use C in their mission-critical systems. It's possible to adhere to that insistence while guaranteeing that major classes of defects typically found in C code are not there. That's all that's necessary to make a tool like Caduceus worthwhile. But it's not what I recommend people use, given vastly better alternatives.
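To make "major classes of defects" concrete, here is a hedged, minimal example of the sort of memory-safety obligation such a tool can discharge, again in ACSL-style annotations of the kind Caduceus accepts. The function is hypothetical; the point is that, without the precondition, the write is a textbook off-by-one overflow, and with it the verifier can prove the access stays in bounds.

    /*@ requires len < cap && \valid_range(buf, 0, cap - 1);
      @ assigns  buf[len];
      @*/
    void terminate_record(char *buf, unsigned len, unsigned cap)
    {
        buf[len] = '\0';   /* in bounds precisely because len < cap */
    }

Every caller then inherits the obligation to establish len < cap, which is how the proof that there is no out-of-bounds write composes across the program.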

el-vadimo: A lot of this looks endogenous rather than exogenous to me, but hey, as long as we agree on the substance, we don't have to agree on the terminology. Heck, we don't even have to agree on the substance.

The irony is that I think we actually do agree on much of the substance. That is, my belief is that your belief is that my belief is that correctness proofs are a sufficient guarantor of fitness for purpose of the software subject to those proofs. Assuming that my belief about your belief is true, your belief is false. However, your exploration of the space of "proven correct," particularly as broken down between what apparently many people in the industry and literature call "program verification" as distinct from "correct by construction," suggests to me that we might find significant common ground in appreciating the value of formal specification and software that is correct by construction as opposed to verified ex post facto. The point here is not that this does any better a job at addressing all possible failure cases than software that is verified ex post facto but rather that one can achieve merely as-good-or-better results for less cost, as articulated so well in the paper you linked to and that I quote extensively above.

To summarize: you seem to be attacking the notion that software verified ex post facto can be used literally fearlessly, under any and all circumstances—a claim that no one here has made, and one that falls outside the scope of most of the discussion about proof here, which is about designing new languages whose interpreters or compilers are correct by construction. In particular, your link to Anthony Hall's paper, where he argues forcefully for correctness by construction, leads me to wonder whether we have been talking past each other this entire time. Nevertheless, I wish to thank you for the continued civil discussion that has, as usual, afforded me the opportunity to sharpen my own thinking on the subject.

moral hazard

In comment-41497, Paul Snively wrote:

Moral hazard is related to asymmetric information, a situation in which one party in a transaction has more information than another.

That's not what I meant. I used the term moral hazard in the conventional, van-straatenesque meaning of the phrase: Better brakes don't decrease the risk — they just allow drivers to travel faster (and thus more dangerously).

Butler W. Lampson explains in Software Components: Only The Giants Survive:

What is often overlooked is that the software crisis will always be with us (so it shouldn't be called a "crisis"). There are three reasons for this:

  • As computing hardware quickly becomes more powerful (in line with Moore's law), new applications quickly become feasible, and they require new software. In other branches of engineering the pace of change is much slower.
  • Although it's difficult to handle complexity in software, it's much easier to handle it there than elsewhere in a system. It's therefore good engineering to move as much complexity as possible into software.
  • External forces such as physical laws impose few limits on the application of computers. Usually the only limit is our inability to write the programs. Without a theory of software complexity, the only way to find this limit is trial and error, so we are bound to over-reach fairly often. "A man's reach should exceed his grasp, or what's a heaven for." (Browning)

By the way, Lampson's paper is part of Papers for Roger Needham that I mentioned already.

On the purpose of heaven

"A man's reach should exceed his grasp, or what's a heaven for."

Which is one of the better arguments for formal methods and verification, IMO. Go forth and verify!

allez en avant, et la foi vous viendra

In comment-41524, Anton commanded:

Go forth and verify!

As D'Alembert would say ("go forward, and faith will come to you"), Messrs., knock yourselves out.

Me, I'm going back to re-reading General Systemantics: How Systems Work and Especially How They Fail by John Gall.

It might not be what you meant...

...but it is what you linked to, and keep in mind that "moral hazard" is a formal term of art in economics; you should exercise some caution in adopting a "when I use a word, it means just what I choose it to mean, neither more nor less" perspective in such cases. It is possible in the realm of economics to choose policies that reduce the possibility of moral hazard. It is likewise possible in the realm of software. I'm going to have to respectfully disagree with Dr. Lampson: we have a theory of software complexity—several of them, in fact—and there are better ways to proceed than trial and error, as you yourself have acknowledged previously.

definitional hazards

el-vadimo does seem to be using 'moral hazard' in its classic, economic sense, as defined in the first sentence of the wikipedia page he linked to: "the prospect that a party insulated from risk may behave differently from the way it would behave if it were fully exposed to the risk." In this case, we're presumably talking about the prospect of people so confident in their formal verification that they don't bother to test carefully, etc.

As raised here, that page mentions that moral hazard is "related to" asymmetric information. In our case, the party paying for the consequences of someone's overconfidence in formal verification would be whoever experiences the program's malfunction. I'm not sure how relevant that is here, though. The point is the effect (if any) that the use of formal verification has on the actions of the verification user, which I think is what el-vadimo is getting at.

But I should make it clear that my opinion is that any moral hazard related to formal verification (including "lightweight" methods such as static typing) is inconsequentially small, in theory and practice.

Agreed

If all we are stipulating here is that even verified software has failure modes when actually run, then I'm perfectly willing to stipulate that. If all we're saying is that the end-user might not be in a position to appreciate what can't fail when faced with what did fail, I'm willing to stipulate that, too. I would only say that neither of these stipulations seems at all helpful in assessing the value of verifying software.

have checker, will verify

In comment-41527, Paul Snively wrote:

I'm going to have to respectfully disagree with Dr. Lampson: we have a theory of software complexity — several of them, in fact...

Safe to assume that you disagree with Jean-Raymond Abrial as well? The father of Z and B has this to say[1]:

In our case, however, the situation is a bit different from that of early avionics in that there is no clear theory (yet) related to large computerized systems.

This echoes the words of Robin Milner[2] which, although spoken almost twenty years ago, still ring true today [emphasis mine — V.N.]:

...suddenly almost everyone who builds computing systems is convinced that all system design — hardware or software — needs to be done within a rich conceptual frame, which is articulated by rigorous methodologies, which are in turn assisted by computer-based reasoning tools...

It is excellent that this conviction is now widespread; but there is an attendant danger. Much use has been made recently of the term “formal methods” in relation to system design. The choice of this term (wherever it came from), rather than the term “theory”, suggests that the methodology is paramount; I fear that it also reflects a mistaken assumption — that is, it suggests that the conceptual frame for design already exists and that we only need to animate it by the right formal methodology.

Paul Snively:

there are better ways to proceed than trial and error

Are there? Perhaps there should be, but it doesn't sound like we've found these better ways yet[3]:

The current modelling and verification practice for embedded systems can be characterised by the slogan: “model hacking precedes model checking”. Constructing a model is typically based on the experience, intuition and creativity of an expert. In an initial phase the model is improved in trial and error fashion: first verification runs show errors in the model, rather than errors in the system. Once the model is considered stable and correct, the subsequent verification runs are considered analyses of the system.

Moreover, “the construction of verification models really is a design problem in its own right that may share many characteristics of the design problem of the system itself.”[3] Why is design hard and unformalizable? For one thing, “a large part of the problem is the scale of it; [i]t is the essence of Computer Science that if a methodology doesn’t scale up, then it isn’t a methodology at all.”[2]

Although there are social reasons[4] why truly large specifications (of which there are no known examples) are going to be just as big of a mess as current (working[5]) large systems, let's concentrate just on linguistic issues for now[2]:

... there is good evidence that this approach [of striving for a single specification language, like Z[6] — V.N.] is limited. For all the success of logicians in founding a large part of mathematics upon a single logic, they have never succeeded—and seldom claimed that they would succeed—in getting mathematicians to do their work in this one medium. The different branches of mathematics are just too various, and our linguistic invention is just too fertile, for this kind of uniformity to be sustained. And the subject matter of computation is equally various, so we would expect the same need for plurality; indeed Kim Larsen and I recently did a small experiment in process validation and found that, in the space of three or four pages, we needed three or four distinct formal calculi to express the natural proof succinctly.

When even simple problems[7] require you to use four or five distinct formalisms (Z[8], finite automata, Harel's statecharts[9], regular languages, and first-order logic), what can we expect for systems whose size is in the millions of LOC? How do we formally bridge their semantics? How do we combine partial specifications written by different teams of specifiers[4]? What does a continuous build system for formal specifications look like?

Current problems[10]:

  • Limited scope. The vast majority of techniques are limited to the specification of functional properties, that is, properties about what the target system is expected to do. Non-functional properties are in general left outside any kind of formal treatment.
  • Poor separation of concerns. Most techniques provide no support for making a clear separation between (a) intended properties of the system considered, (b) assumptions about the environment of this system, and (c) properties of the application domain.
  • Low-level ontologies. The concepts in terms of which problems have to be structured and formalized are programming concepts — most often, data and operations.
  • Isolation. Formal specification techniques are isolated from other software products and processes both vertically and horizontally. Vertical isolation: specification techniques generally pay no attention to what upstream products in the software lifecycle the formal specification is coming from (viz. goals, requirements, assumptions) nor what downstream products the formal specification is leading to (viz. architectural components). Horizontal isolation: the techniques generally do not pay attention to what companion products the formal specification should be linked to (e.g., the corresponding informal specification, a documentation of choices, validation data, project management information, etc.)
  • Poor guidance. The main emphasis in the formal specification literature has been on suitable sets of notations and on a posteriori analysis of specifications written using such notations. Constructive methods for building correct specifications for complex systems in a safe, systematic, incremental way are by and large non-existent. Instead of inventing more and more languages, one should put more effort in devising and validating methods for elaboration and modification of good specifications.
  • Cost. Many formal specification techniques require high expertise in formal systems in general (and mathematical logic in particular), in analysis techniques, and in the white-box use of tools. Due to the scarcity of such expertise their use in industrial projects is nowadays still highly limited in spite of the promised benefits.
  • Poor tool feedback. Many analysis tools are effective at pointing out problems, but in general they do a poor job of (a) suggesting causes at the root of such problems, and (b) proposing recovery actions.

The bottom line is, I have yet to see a theory of specification-in-the-large[11]. In particular, I'm interested in a theory that, in the words of Tony Hoare[12], can “fully explain why real big programs work.” (In other words, Crisis? What crisis?[5])

References

  1. Jean-Raymond Abrial, Managing the Construction of Large Computerized Systems.
  2. Robin Milner, Is Computing an Experimental Science?, Jan 17, 1986
  3. Ed Brinksma and Angelika Mader, On Verification Modelling of Embedded Systems
  4. Melvin E. Conway, How Do Committees Invent?, Datamation magazine, April, 1968.
  5. Robert L. Glass, The Standish Report: Does It Really Describe a Software Crisis?, CACM, Aug 2006, vol. 49, Issue 8.
  6. J.M. Spivey, The Z Notation: A Reference Manual.
  7. Pamela Zave and Michael Jackson, Where Do Operations Come From? A Multiparadigm Specification Technique, IEEE Transactions on Software Engineering, July 1996, Vol. 22, No. 7, pp. 508-528.
  8. ISO/IEC 13568:2002, Information Technology – Z Formal Specification Notation – Syntax, Type System and Semantics.
  9. David Harel, On Visual Formalisms, May 1988, Communications of the ACM, Vol. 31 , Issue 5, pp. 514-530
  10. Axel van Lamsweerde, Formal Specification: a Roadmap.
  11. Frank DeRemer and Hans Kron, Programming-in-the-large versus programming-in-the-small.
  12. VSTTE: Topics and Questions, (the above quote is attributed to Hoare by K. Rustan M. Leino)

Jevons paradox

Re: better brakes:

Better brakes don't decrease the risk — they just allow drivers to travel faster (and thus more dangerously).

Note to self: cf. the Jevons paradox:

In economics, the Jevons paradox, sometimes called the Jevons effect, is the proposition that technological progress that increases the efficiency with which a resource is used tends to increase (rather than decrease) the rate of consumption of that resource.

Enzo Ferrari

IIRC, it was Enzo Ferrari who, when told that his cars had poor brakes, replied: "Who needs brakes? Anyone can make a car that goes slow."

Combine that with Jevons and we all should be driving in SUVs with poor brakes.

Hmmm... Yeah... That explains C#, definitely...

The social acceptance of risk

Some of the above comments--indeed the whole debate as to what constitutes an "acceptable" risk--are occasionally inconsistent.

Certain classes of devices--medical instrumentation and implants, avionics, weapons control systems--are held to a very high standard by regulatory authorities.

Software in other devices whose malfunction might lead to the death of the operator or others, however, is not. Here is one of several coding standards promulgated by AUTOSAR, an industry consortium which deals with control architectures (including software) in automobiles--the referenced document specifies how C code shall be written. The majority of the rules seem to be syntactical in nature--things like "don't declare two variables in one statement". Quite a few appear to be rules designed for no other reason than to facilitate brain-dead code analysis tools, such as a rule prohibiting whitespace between an operator and a symbol in certain cases. (!!!)

As far as I'm aware, no government agency regulates automotive control software. Given that the automotive industry has a long track record of preferring to defend itself in lawsuits than design cars to be as safe as possible, this sort of standard really doesn't surprise me.

Going beyond that--in many cases, software is held to a higher standard than hardware.

Going beyond that, in many cases, risk *is* socially acceptable. We don't mind at all that automobile manufacturers design and sell cars capable of travelling at twice the posted speed--an activity which is hazardous even when done by a professional driver on a closed course, and frequently deadly when done by drunken teenagers on the highway. But this is a tradeoff we are willing to make; as someone remarked (who? Google isn't finding the quote for me), if safety were the only concern for transportation, we'd all ride tricycles.

That said, 50,000 people or so a year perish on the roads and highways of the U.S. Many parts of the world are far worse per capita. There are still parts of the world where it is socially unacceptable to wear seatbelts (those that do are mocked as "nerds" or similar), and other basic safety precautions are routinely ignored.

Is it rational, after all, to give Grandpa a pacemaker which has been formally verified and tested every which way to never fail, when he drives around town in a used Honda Civic with bad brakes and occasionally forgets to wear his glasses? In some cases, it seems the risks we tolerate are quite arbitrary (or, being more cynical, calculated to protect those with deep pockets from getting sued).

The mathematical and physical aspects of program verification (and of verifying program instances on particular real-world systems) are, in many ways, much further ahead than the social, political, and economic aspects of the question.

wget --recursive --no-parent http://vstte.ethz.ch/Files/

VSTTE posted about 13MB worth of position papers on software verification. Sitting in the front row is Sir Charles Antony Richard Hoare. The list of authors and the breadth of coverage are mind-boggling:

  1. On Constructing Large Computerized Systems (a position paper), Jean-Raymond Abrial
  2. Integrating Theories and Techniques for Program Modelling, Design and Verification, Bernard K Aichernig, He Jifeng, Zhiming Liu, and Mike Reed
  3. Trends and Challenges in Algorithmic Software Verification, Rajeev Alur
  4. The Role of Invariants in an Automatic Program Verifier, Myla Archer
  5. WYSINWYX: What You See Is Not What You eXecute, G. Balakrishnan, T. Reps, D. Melski, and T. Teitelbaum
  6. The Verified Software Challenge: A Call for a Holistic Approach to Reliability, Thomas Ball
  7. The Spec# Programming System: Challenges and Directions, Mike Barnett, Robert DeLine, Bart Jacobs, Manuel Fähndrich, K. Rustan M. Leino, Wolfram Schulte, and Herman Venter
  8. Dependent Types, Theorem Proving, and Applications for a Verifying Compiler, Yves Bertot and Laurent Théry
  9. Whither Verified Software?, Ramesh Bharadwaj
  10. Linking Content Definition and Analysis to What the Compiler Can Verify, Egon Börger
  11. Scalable Software Model Checking Using Design for Verification, Tevfik Bultan and Aysu Betin-Can
  12. Model-Checking Software Using Precise Abstractions, Marsha Chechik and Arie Gurfinkel
  13. Model Checking: Back and Forth Between Hardware and Software, Edmund Clarke, Anubhav Gupta, Himanshu Jain, and Helmut Veith
  14. A Constructive Approach to Correctness, Exemplified by a Generator for Certified Java Card Applets, Alessandro Coglio and Cordell Green
  15. The Verification Grand Challenge and Abstract Interpretation, Patrick Cousot
  16. Automatic Verification of Strongly Dynamic Software Systems, N. Dor, J. Field, D. Gopan, T. Lev-Ami, A. Loginov, R. Manevich, G. Ramalingam, T. Reps, N. Rinetzky, M. Sagiv, R. Wilhelm, E. Yahav, and G. Yorsh
  17. Toasters, Seat Belts, and Inferring Program Properties, David Evans
  18. Decomposing Verification by Features, Kathi Fisler and Shriram Krishnamurthi
  19. Verifying Design with Proof Scores, Kokichi Futatsugi, Joseph A. Goguen, and Kazuhiro Ogata
  20. On the Formal Development of Safety-Critical Software, Andy Galloway, Frantz Iwu, John McDermid, and Ian Toyn
  21. Verify Your Runs, Klaus Havelund and Allen Goldberg
  22. Software Verification and Software Engineering: A Practitioner's Perspective, Anthony Hall
  23. Specified Blocks, Eric C. R. Hehner
  24. Let's Not Forget Validation, Mats P. E. Heimdahl
  25. Some Verification Issues at NASA Goddard Space Flight Center, Michael G. Hinchey, James L. Rash, and Christopher A. Rouff
  26. Verified software: theories, tools, experiments - Vision of a Grand Challenge project, Tony Hoare and Jay Misra
  27. Reliable Software Systems Design: Defect Prevention, Detection, and Containment, Gerard J. Holzmann and Rajeev Joshi
  28. Performance Validation on Multicore Mobile Devices, Thomas Hubbard, Raimondas Lencevicius, Edu Metz, and Gopal Raghavan
  29. Tool Integration for Reasoned Programming, Andrew Ireland
  30. What can we do (technically) to get "the right specification?", Cliff B. Jones
  31. A Mini Challenge: Build a Verifiable Filesystem, Rajeev Joshi and Gerard J. Holzmann
  32. Integrating Static Checking and Interactive Verification: Supporting Multiple Theories and Provers in Verification, Joseph R. Kiniry, Patrice Chalin, and Clément Hurlin
  33. Decision Procedures for the Grand Challenge, Daniel Kroening
  34. Implications of a Data Structure Consistency Checking System, Viktor Kuncak, Patrick Lam, Karan Zee, and Martin Rinard
  35. Lessons from the JML Project, Gary T. Leavens and Curtis Clifton
  36. Tools for Formal Software Engineering, Zhiming Liu and R. Venkatesh
  37. The Challenge of Hardware-Software Co-Verification, Panagiotis Manolios
  38. From the How to the What, Tiziana Margaria and Bernhard Steffen
  39. Computational Logical Frameworks and Generic Program Analysis Technologies, José Meseguer and Grigore Rosu
  40. A Mechanized Program Verifier, J Strother Moore
  41. Integrating Verification Components, Leonardo de Moura, Sam Owre, Harald Rueß, John Rushby, and Natarajan Shankar
  42. Reasoning about Object Structures Using Ownership, Peter Müller
  43. Modular reasoning in object-oriented programming, David A. Naumann
  44. Where is the value in a Program Verifier?, Colin O'Halloran
  45. Scalable Specification and Reasoning: Technical Challenges for Program Logic, Peter W. O'Hearn
  46. Towards a Worldwide Verification Technology, Wolfgang Paul
  47. It is Time to Mechanize Programming Language Metatheory, Benjamin C. Pierce, Peter Sewell, Stephanie Weirich, and Steve Zdancewic
  48. An Overview of Separation Logic, John C. Reynolds
  49. A Perspective on Program Verification, Willem-Paul de Roever
  50. Automated Test Generation And Verified Software, John Rushby
  51. Meta-Logical Frameworks and Formal Digital Libraries, Carsten Schürmann
  52. Generating Programs plus Proofs by Refinement, Douglas R. Smith
  53. Languages, Ambiguity, and Verification, The SPARK Team
  54. The Importance of Non-theorems and Counterexamples in Program Verification, Graham Steel
  55. Regression Verification - a practical way to verify programs, Ofer Strichman and Benny Godlin
  56. Programming with Proofs: Language-Based Approaches to Totally Correct Software, Aaron Stump
  57. Position Paper: Model-Based Testing, Mark Utting
  58. Toward the Integration of Symbolic and Numerical Static Analysis, Arnaud Venet
  59. Abstraction of Graph Transformation Systems by Temporal Logic and Its Verification, Mitsuharu Yamamoto, Yoshinori Tanabe, Koichi Takahashi, and Masami Hagiya
  60. Program Verification by Using DISCOVERER, Lu Yang, Naijun Zhan, Bican Xia, and Chaochen Zhou
  61. Constraint Solving and Symbolic Execution, Jian Zhang

verifying compiler: the grand challenge

Since it doesn't seem to have been mentioned here before and the Wikipedia is likewise silent on this issue, let me put the VSTTE collection of papers in context.

UKCRC (UK Computing Research Committee) sponsored a workshop on "Grand Challenges for Computing Research" back in 2002. Some fifty researchers attended and contributed proposals. C.A.R. Hoare drafted criteria for maturity of a grand challenge, accompanied by a sample report on the Verifying Compiler. You can find an early version of his draft in a compendium of papers (PDF, 4.7MB, 289 pages) entitled Computer Systems: Papers for Roger Needham. (There is some interesting stuff there, by the way.)

Rechristened as GC6: Dependable System Evolution (Jim Woodcock), the task of building a Verifying Compiler was adopted as one of the grand challenges. Hoare published The Verifying Compiler: A Grand Challenge for Computing Research in the 50th-anniversary issue of JACM, alongside the other grand-challenge papers.

If you don't have an ACM subscription, Google says you can find a copy of Hoare's paper here. Also, a slightly different version of this paper is available here.

That was 2003. Fast-forward to 2005. A group of "the top international experts on systematic methods for specifying, building and verifying high-quality software" gathered at VSTTE — a "conference held as a response to Tony Hoare's Grand Challenge on the 'verifying compiler'." Hence the avalanche of papers.

Physical semantics

When I listen to the current discussion of "formal methods" I hear expressions like "automatic theorem proving", "verified compilers", or "proof-carrying code". What I don't hear is any discussion of "semantics", and I use quotes here because I don't mean programming-language semantics but the physical semantics of the system. We might also call this systems theory.

But few people seem to get the idea that software also has physical semantics. Imagine a communications system, perhaps an encoder. At one time long ago this would have been an electromechanical device; later it was replaced by a discrete electronic device; around 1975 this device was replaced with an 8080 microprocessor and suddenly became a software system. In all this "change" the physical semantic theory is still the same as it always was. As the process goes on we get more and more software, but the physical semantics stays the same; the systems continue to work and are reliable to the extent that the PHYSICAL semantics is correct. The software can be seen as a refinement as defined in certain formal methods.

But where is the physical semantics in this discussion? There is good work in this area. The ASM work is a good example. But there is a problem here. The ASM method and many other formal and semiformal methods are about defining semantics on a high level and moving down into the details in a series of refinements. This is not a "programming language" method. The code itself is not the primary work product but the semantics is. If this agenda is successful a lot of people will have to change their ways.

Getting back to the automatic theorem proving agenda that I started with: how does this method capture high-level system semantics? Does it even address the issue? We still can't seem to define "side effect" semantics in programming languages. When this is done, side effects are not problematic. I need to know a lot more before I can get excited about this agenda. The goals don't really help us define semantics. But heck, who can complain about more tools?

Clarification?

Hank Thediek: But few people seem to get the idea that software also has physical semantics. Imagine a communications system, perhaps an encoder. At one time long ago this would have been an electromechanical device; later it was replaced by a discrete electronic device; around 1975 this device was replaced with an 8080 microprocessor and suddenly became a software system. In all this "change" the physical semantic theory is still the same as it always was. As the process goes on we get more and more software, but the physical semantics stays the same; the systems continue to work and are reliable to the extent that the PHYSICAL semantics is correct.

I must be missing something, because to me, this sounds like a "communication protocol definition." That's not to say that there isn't overlap there with programming language semantics; there certainly is: for example, it's not unusual to have such a protocol defined with the traditional parser-and-lexer-generators used in programming language development.

Hank Thediek: Getting back to the automatic theorem proving agenda that I started with: how does this method capture high-level system semantics? Does it even address the issue? We still can't seem to define "side effect" semantics in programming languages.

Once again, I think I must be missing something, because side-effects most definitely have been treated in formalizing programming language definitions. For example, Standard ML has a formal semantics and supports side-effects. Also, Winskel's Formal Semantics of Programming Languages deals with side-effects in all of small-step operational semantics, big-step operational semantics, denotational semantics, and axiomatic semantics. This semantics survey in Coq is based on about the first 100 pages of Winskel's text and adds abstract interpretation to the mix.

Is this what you meant? Or am I completely off track here?

Research

The problem I find with all the languages I know, and that doesn't include Standard ML, is that they allow you to do things without supplying semantics. Loops and accumulators are good examples. I can program these things, but the compiler or interpreter doesn't know that a loop has a certain semantics or what that semantics is, and therefore doesn't prevent me from making certain mistakes, for instance choosing a step size that is too small or too large. This is true of a great many things that I can program. It is not up to the compiler to get it right; it is up to me. How is this going to change in this new world of automatic proof? How does the programmer supply the missing semantics? How does the compiler understand it?

I see how this works in the "classical" formal methods such as the ASM method for example. But for a similar thing to work in this new agenda the compiler would need to understand any semantics I might need to use. Has this been addressed? How?

Also, as I stated above, it is not "formal semantics of programming languages" but physical semantics in general, and in my example this does include protocols and also what they may mean.

Also, I understand that this is a research agenda, that there are ways to address these issues, that some of this may have been done already, and lastly that until this is finished none of us will know for sure.

Edit: as an example of a language that understands loop and accumulator semantics see System Dynamics.

Edit2: Here are some slides (Universal Computation Model) that go with the ASM Book. The slides relate the ASM method, which I am taking as an example of physical semantics, to a very long list of semantic systems.

Brief, woefully inadequate response

OK, I looked at the slides on universal computation. First, I have to say that they're excellent—extremely thorough and not totally incomprehensible even given that I'm not familiar with all of the models of computation that they touch on.

Hank Thediek: The problem I find with all the languages I know, and that doesn't include Standard ML, is that they allow you to do things without supplying semantics. Loops and accumulators are good examples. I can program these things, but the compiler or interpreter doesn't know that a loop has a certain semantics or what that semantics is, and therefore doesn't prevent me from making certain mistakes, for instance choosing a step size that is too small or too large. This is true of a great many things that I can program. It is not up to the compiler to get it right; it is up to me. How is this going to change in this new world of automatic proof? How does the programmer supply the missing semantics? How does the compiler understand it?

Without having delved into how iteration is expressed in ASMs, it seems like what you're asking for is some means of reasoning about iteration. The first observation I would make is that many languages offer support for iteration that is more rigorous with respect to what's being iterated over, as well as mechanisms for ensuring that only contents meeting some criteria are iterated over. These features are frequently seen in the guise of "map" and "filter," although around here people are more likely to point out that they're all folds of one kind or another. :-) More generally, though, when there isn't an enforced relationship between the thing being iterated over and the iteration, we talk about loop variants and invariants. It's true that most languages don't enforce the provision of these. Specification languages like JML, with tools such as Krakatoa for Java code or Caduceus for C code, allow us to write stronger specifications about the behavior of Java or C code and prove that our code adheres to them. This set of slides does a good job of explicating reasoning about loops in this fashion.
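To make the idea of loop variants and invariants concrete, here is a small, hedged sketch in the ACSL-style annotation syntax that Caduceus and its successors accept (the JML counterpart for Java is analogous). The function is hypothetical; the invariant states what every iteration preserves, and the variant is a non-negative quantity that strictly decreases, which is what lets a tool prove termination.

    /*@ requires n > 0 && \valid_range(a, 0, n - 1);
      @ ensures  0 <= \result < n;
      @ ensures  \forall integer k; 0 <= k < n ==> a[\result] >= a[k];
      @*/
    unsigned index_of_max(const int *a, unsigned n)
    {
        unsigned best = 0;
        /*@ loop invariant 1 <= i <= n && best < i;
          @ loop invariant \forall integer k; 0 <= k < i ==> a[best] >= a[k];
          @ loop variant   n - i;
          @*/
        for (unsigned i = 1; i < n; i++) {
            if (a[i] > a[best])
                best = i;    /* re-establishes the invariant: a[best] dominates a[0..i] */
        }
        return best;         /* the postcondition follows from the invariant at i == n */
    }

Tools in this family turn annotations like these into verification conditions for an automated or interactive prover; without the variant, nothing in the annotations says the loop makes progress toward termination.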

When I talk about using a proof assistant to develop a compiler, however, bear in mind that the extent to which the language the compiler compiles helps you deal with iteration and its correctness is up to the language design, not the proof assistant (that is, it's up to the object language, not the meta-language). The proof assistant is just there to help guarantee that the compiler actually compiles the object language correctly. For example, if, for some reason, you wanted to design a language that was not Turing complete because it forbade general recursion, you would want to use the proof assistant to prove that, for all functions expressible in your language, those functions are total. That leaves out for, while, and do...until loops where termination conditions can perpetually be false, for example. But there's nothing whatsoever stopping you from using Coq (a good example of a language in which all functions must be total) as the meta-language to develop an interpreter or compiler for a language that doesn't have that constraint (the IMP language from Winskel, Leroy's CompCert subset of C, Benton's semantically type-sound compiler for a simple language (reminiscent of IMP), etc).

Does this help at all?

Annotation systems

I find this very helpful. It seems that these are annotation systems similar or related to SPARK. Thanks for the links.

on playing catch-up

Before I forget... A couple of quotes from Talking Model-Checking Technology by Leah Hoffman (a conversation with the 2007 ACM A.M. Turing Award winners).

Joseph Sifakis:

First of all, it can be very challenging to construct faithful mathematical models of complex systems. For hardware, it's relatively easy to extract mathematical models, and we've made a lot of progress. For software, the problem is quite a bit more difficult. It depends on how the software is written, but we can verify a lot of complex software. But for systems consisting of software running on hardware, we don't know how to construct faithful mathematical models for their verification.

Joseph Sifakis:

We know how to verify systems of medium complexity today — it's difficult to say but perhaps a program of around 10,000 lines. But we don't know how to verify very complex systems.

Edmund M. Clarke:

We're always playing a catch-up game; we're always behind. We've developed more powerful techniques, but it's still difficult to keep up with the advance of technology and the complexity of new systems.

A quote from In Search of Dependable Design — another article by Leah Hoffman published in the same issue of CACM:

"Computer science is a very young discipline," explains Joseph Sifakis, research director at CNRS. "We don't have a theory that can guarantee system reliability, that can tell us how to build systems that are correct by construction. We only have some recipes about how to write good programs and how to design good hardware. We're learning by a trial-and-error process."

SECD verification project

Note to self: Hans Hübner, writing on the SECD verification project, had this to say:

In the SECD hardware implementation that I worked on, I used the original microcode of the SECD chip developed at the University of Calgary as part of a verification project. I found two major bugs in this microcode which, ironically, was part of the file set that compromises the proof that the SECD hardware implementation was correct according to its abstract specification. One bug was the garbage collector which did not work at all, and the other was in the RAP instruction which did not work either. I have trouble not concluding something regarding academia in general from these findings.

that.

is. truly. awesome.

Verification methodology

Beyond the fact that it is often cited as a success story for HOL and that it resulted in a book, I don't know much about Brian Graham's verification methodology for his SECD microcode. Without that background it's hard to know what to make of the item above, so I'd be delighted if anyone could give pointers to literature that explains it.