## On the (Alleged) Value of Proof for Assurance

Ehud is about to be annoyed with me. :-)

I am lately suffering from doubts about whether the value of proof is justfied by its cost. Some questions, in no particular order:

• Proof seems to be an all-in or all-out exercise. There does not seem to be a viable strategy for initial, limited adoption followed by progressive improvement. Type systems seem to be more "continuous" in their adoption options. Given the (admittedly incomprehensible) expressive power of something like ATS, what is the real semantic gap between provers and type systems these days? Do type-based property embeddings offer the possibility of more incremental and more cost/benefit-aware adoption of checking techniques?
• The goal of proof isn't to be correct. It is to be confident. This has two parts: (1) confidence that you got something right, and (2) ability to convince a second party that your confidence is justified. Assume that you succeed in your proof. To what degree has use of a prover inherently failed objective [2]? If the final objective is credible correctness, then there is presumably a trade-off between accuracy and comprehensibility. Does proof (in its current form, at least) err too much on the side of accuracy?
• Given the likelihood of (possibly retrospective) failures of total abstraction, to what degree is confidence in proof misplaced? If so, is the cost of discharge justified?

There is no question in my mind that proof processes generate more robust code. Yet the general concensus seems to be that this robustness is much more an emergent consequence of rigorously understanding the problem then a result of the proof discharge. In this view, the primary benefit of proof (in the context of assurance) is largely to keep the specification "honest". If this is in fact the goal, is proof the best way to accomplish that goal? How much of that goal can be undertaken by type systems?

In a private conversation, Gernot Heiser (NICTA, OK Labs) was recently asked how use of proof impacted their QA costs. As I recall it, his answer was that they run between 1.5x and 2x the end-to-end cost of conventional development and testing, but they achieve much higher confidence. My questions:

• Might there be a better trade-point between cost and value (in the form of confidence)?
• To what degree is their confidence well-founded?

In a second private conversation, Eric Rudder observed that one of the costs of proof-based methodology was a loss of ability to rapidly adapt software to new requirements and new demands. It follows that proof is not always appropriate, and that a more continuous cost/time/benefit option space would be desirable.

My own observation is that in the end, systems need to be robust, and they include components that lie well outside our ability to prove them. In many cases, type can be usefully exploited when proof cannot, and there is a training cost to educating people in both domains.

So finally my question:

Once we step away from formal semantics and PL insights (which are certainly good things), what is the proper role of proof in real-world production? And what is the proper role of advanced type systems?

## Comment viewing options

### type can be usefully

type can be usefully exploited when proof

Types provide proofs -- when you keep making the distinction, is it because you mean your use of types is for proving not exactly the property of interest, choosing to save time (or whatever reason) by only proving often simpler base properties like memory safety?*

### Continuity of scope

Good question, and thanks for asking.

Yes, types provide proofs, but we tend to use them in ways that are qualitatively different than our use of theorem provers. Indeed, it often seems that we reach for theorem provers because the expressiveness of types isn't covering what we need. And when the need is sufficient to justify theorem proving, it often turns out that the original language type system gets re-expressed within the prover as part of the semantic model of the language. Somewhere in the process, we seem to step across a chasm of complexity, comprehensibility, and cost.

In traditional type systems, the expressiveness of types tends to exclude value-related and/or alias-related properties. In most cases it can also control-related properties, though effect types cover some of that space. In consequence the discharge of the type-expressable properties is straightforward, the pain is mild, and the gain is clearly worthwhile.

I find the work on dependent type interesting and frustrating. ATS and friends can clearly deal with value-dependent properties. A few such properties are comprehensible, most are not. I'm not sure about alias-dependent properties; at first glance these require a different sort of quantification than seems readily expressed with dependent type. Control-dependencies are expressible, but perhaps not comprehensibly.

I think my concern overall is centered on engineerability. It isn't a question of whether we will undertake proof. As you point out, we do so routinely. It is a matter of whether we can somehow make the path to deeper proof more incremental and less a matter of crossing chasms.

For sufficiently programs, proof seems like a "least acceptable standard of practice". For other programs, the confidence achieved from proof is not cost-justifiable. But it seems to me that there are many programs in the middle, where we would like more confidence than we get from usable type systems, but perhaps not as much confidence as proof requires. At the moment, this seems to be an "excluded middle".

### But it seems to me that

But it seems to me that there are many programs in the middle, where we would like more confidence than we get from usable type systems, but perhaps not as much confidence as proof requires. At the moment, this seems to be an "excluded middle"

Refinement types and contracts are filling this space.

### Pedersen's Law

"Almost all theorems are correct, but almost all proofs have bugs."

### Law

I think the first half of that statement runs afoul of Sturgeon's Law.

### Afoul?

I am not sure you are using this word correctly, having read the two addages representing Sturgeon's Law on wikipedia:

I repeat Sturgeonâ€™s Revelation, which was wrung out of me after twenty years of wearying defense of science fiction against attacks of people who used the worst examples of the field for ammunition, and whose conclusion was that ninety percent of SF is crudâ€.[1] Using the same standards that categorize 90% of science fiction as trash, crud, or crap, it can be argued that 90% of film, literature, consumer goods, etc. are crap. In other words, the claim (or fact) that 90% of science fiction is crap is ultimately uninformative, because science fiction conforms to the same trends of quality as all other artforms do.

Edit: I do think Sturgeon's Law is an amusing rebuttal to Pedersen's Law, though!

### Sturgeon vs. Pedersen

Re: comment-57850:

I do think Sturgeon's Law is an amusing rebuttal to Pedersen's Law, though!

Sturgeon's Law (also known as Sturgeon's Revelation) is orthogonal to Pedersen's Law. What you and the likes of Doron Zeilberger seem to fail to realize is that it takes more for a theorem to rise above the level of crud than merely being correct.

### Not even wrong?

Z-Bo said it was an amusing rebuttal — not a valid rebuttal. In addition to the ninety percent crud including some stuff that's correct, the ten percent non-crud includes some stuff that isn't correct.

### What I fail to realize :)

I am personally a big fan of something I picked up from Gary Kasparov, adapted from his article The Chess Master and the Computer: a refinement to Moravec's paradox I call the Kasparov Principle.

In what Rasskin-Gutman explains as Moravecâ€™s Paradox, in chess, as in so many things, what computers are good at is where humans are weak, and vice versa. This gave me an idea for an experiment. What if instead of human versus machine we played as partners?

[...] Despite access to the â€œbest of both worlds,â€ my games with Topalov were far from perfect. We were playing on the clock and had little time to consult with our silicon assistants. Still, the results were notable. A month earlier I had defeated the Bulgarian in a match of â€œregularâ€ rapid chess 4â€“0. Our advanced chess match ended in a 3â€“3 draw. My advantage in calculating tactics had been nullified by the machine.

This experiment goes unmentioned by Russkin-Gutman, a major omission since it relates so closely to his subject. Even more notable was how the advanced chess experiment continued. In 2005, the online chess-playing site Playchess.com hosted what it called a â€œfreestyleâ€ chess tournament in which anyone could compete in teams with other players or computers. Normally, â€œanti-cheatingâ€ algorithms are employed by online sites to prevent, or at least discourage, players from cheating with computer assistance. (I wonder if these detection algorithms, which employ diagnostic analysis of moves and calculate probabilities, are any less â€œintelligentâ€ than the playing programs they detect.)

Lured by the substantial prize money, several groups of strong grandmasters working with several computers at the same time entered the competition. At first, the results seemed predictable. The teams of human plus machine dominated even the strongest computers. The chess machine Hydra, which is a chess-specific supercomputer like Deep Blue, was no match for a strong human player using a relatively weak laptop. Human strategic guidance combined with the tactical acuity of a computer was overwhelming.

The surprise came at the conclusion of the event. The winner was revealed to be not a grandmaster with a state-of-the-art PC but a pair of amateur American chess players using three computers at the same time. [...] Weak human + machine + better process was superior to a strong computer alone and, more remarkably, superior to a strong human + machine + inferior process.

### more on Sturgeon's revelation

For mathematics I'd reformulate it as "90% of proofs are crud." However, in modern times story writers aren't allowed to fix the bugs in one anothers' stories, whereas in mathematics this is encouraged, so less than 90% of surviving proofs are crud.

### As You MIght Imagine...

...I've been thinking about this a lot for a long time. Of course, on one hand, the Curry-Howard Isomorphism tells us that there's no difference between "proof" and "advanced type system." But that ignores a whole raft of issues, ranging from what, exactly, you want to prove to the pragmatics of designing a language with an "advanced enough" type system that doesn't end up shading into the realm of "theorem provers" rather than programming languages.

Er... unless that's what you want.

Here's what I want: a language in which it's possible to write this or this sort of thing without superhuman effort, ideally with some kind of type inference. It may be that figuring out what heuristics to employ in a sufficiently-advanced type inference system to handle cases like this is very difficult, but I take Ur/Web as inspiration in that it also employs some features of dependent types while providing a form of type inference.

Finally, I would be mindful of Adam Chlipala's comments here, in the section "Why Not a Different Dependently-Typed Language?"

None of the competition has well-developed systems for tactic-based theorem proving. Agda and Epigram are designed and marketed more as programming languages than proof assistants. Dependent types are great, because they often help you prove deep theorems without doing anything that feels like proving. Nonetheless, almost any interesting certified programming project will benefit from some activity that deserves to be called proving, and many interesting projects absolutely require semi-automated proving, if the sanity of the programmer is to be safeguarded. Informally, proving is unavoidable when any correctness proof for a program has a structure that does not mirror the structure of the program itself.

I see proof as critical because I want to write software that is able to make genuinely radical claims of correctness, including security and privacy. I want to build digital financial derivatives that are transparent enough to be audited by any interested party with a computer without giving away the farm. I want to create a digital cash system such that it can't be inflated by the unconstitutional money supply monopoly but otherwise has all of the desirable properties of cash. And so on.

No doubt other people have different motivations. :-)

### "does not mirror the structure of the program itself"

that sounds like a rough road since i would guess it means more work is required to keep the proof and the program corresponding properly.

### Automation

Absolutely; that's a central reason that Adam is so adamant about the importance of proof automation, and a major reason that he, and I, prefer Coq to the alternatives.

### The Composing Contracts papers

Composing Contracts papers by SPJ, Jean-Marc Eber and Julian Seward.

It's been listed many times, and has probably been read by many LtU readers:

Apart from that, I would say that we need more research about program reliability via improved language design. See John Gannon's 1975 Ph.D. thesis Language Design to Enhance Programming Reliability, Weinberg's Psychology of Computer Programming, and the original Peopleware papers.

Finally, Software Estimation: A Rigorous and Practical Approach by Fenton and Pfleeger is called that for a reason - most "software engineers" do not have clear goals when building a system. When they do, and you criticize those goals, they often are upset at you, rather than realize that their goals were set for the very purpose of having independent reviewers audit their design & implementation. Nothing pisses me off more than "look, I skimmed your critique, here is why we did what we did." No, you nitwit, I read your design documents, where you SHOULD have stated why you were doing what you did. - I don't need you to revise history for me and make scientific evaluation of your ideas impossible.

Clarity is emerging, but not through discipline. There are peaks and valleys, and for every one step forward, somebody with more money and more advertising takes two steps backward and massively miseducates thousands of college students. If you look at the advances in Software Configuration Management and Product Data Management, this is the sort of clarity I'm refering to. The problem domain is stabilizing, and the only problem left is configurable modularity, reliability and robustness.

My final remark would be that it is quite clear to me that places like the Software Engineering Institute are in some ways reinventing the wheel in trying to improve things. - But I have a lot of knowledge about the history of things like Computer Science Education (My friend and I delivered an oral report to the whole math/CS faculty my senior year in college asessing/auditing the strengths and weaknesses of the curriculum, and we read 10 years of ACM SIG CSE papers, and conducted surveys, as a way to back up our 'opinions'.)

### Missing the point

Clear goals with rigorous auditing when building a system is frequently counter-productive. You can have aspirational goals when you start out, but along the way you discover things that make it profitable to change course. If you audit a system in view of its original goals, you're missing the point.

Or to put it in military terms: "the plan is useless, but planning is essential" (Eisenhower).

### Baseline Project Plan, Current Project Plan

It's commonly accepted software engineering wisdom that you should have a baseline plan (a Cover Your A-- Plan to show the suits whenever they "discover things that make it profitable to change course") and a current project plan, covering the current scope and vision of the project. You can then use this to defend yourself when you need to argue for a revising of the schedule or slashing of the features to stay within cost, within budget, and meeting the user's real needs now and in the future.

Most people don't actually do this in open source, because there is no direct cost or budget. Furthermore, if you are working on an open source project but sponsored by a company, you might have a manual for your users, and that's it. If you've ever read Dreaming In Code, it is absolutely hilarious/sad how the project founder couldn't even set clear goals for the project - but had no problems setting a clear bet with Ray Kurzweil for when a computer would reach the Turing Test. And the project founder taught Software Engineering undergraduate classes at a premeire US university!

If you don't have clear goals, you need to work on gathering requirements. Or use SCRUM or XP and try to kill the project as soon as possible when it is obvious it is going nowhere. Those agile methodologies are great for projects with ill-defined "build something revolutionary and name it after My Dog" scope.

"You can't control what you can't measure." -- Tom DeMarco

### Perhaps my perspective is

Perhaps my perspective is warped by my position. I speak from the side that is creating new software products to be sold to third parties. There is no bound on requirements that can be gathered, because the set of requirements define the product that you're going to build - but you don't know how easy that will be to build, or whether it turns out to be easier to build something else, for a different set of requirements.

### Update

BTW, I e-mailed Bertrand Meyer a few weeks ago and he told me that an unofficial version of his paper on composing contracts in an object-oriented, design-by-contract fashion is freely available online, as well.

SOFTWARE ARCHITECTURE: OBJECT-ORIENTED VS FUNCTIONAL

One of the arguments for functional programming is better modular design. By analyzing publications advocating this approach, in particular through the example of a framework for financial contracts, we assess is strengths and weaknesses, and compare it with object-oriented design. The overall conclusion is that object-oriented design, especially in a modern form supporting high-level routine objects or â€œagentsâ€, subsumes the functional approach, retaining its benefits while providing higher-level abstractions more supportive of extension and reuse.

I always wanted to read this paper, but could not bring myself to purchase Beautiful Code.

### I always wanted to read this

I always wanted to read this paper, but could not bring myself to purchase Beautiful Code.

Why not? Many of the chapters are a lot of fun. You didn't get turned off by Jonathan Edwards' complaint that the whole notion of "beautiful code" is overrated, did you?

### Two reasons

1) Jonathan and I both had the same complaint, although he had no influence on my view. I wrote my own viewpoints in the comments on his blog. Please see: "No one likes saying what they do for a living is ugly, particularly if they get a lot of enjoyment out of it. There is a psychological term for this: Cognitive Dissonance." -- I really love programming, and am very passionate about it. I just do not think programming will ever be beautiful. I also think most people do not understand what Knuth meant by "art" and Dijkstra meant by "elegance". There is beauty in every project, but it is not in the code. It is in the ideas. Alan Kay recently wrote about this on the FONC mailing list, explaining that its the ideas that matter, not the code.
2) I dislike some of the systems discussed there, and think some of the system designers selected to write about good design absolutely know literally nothing about design criteria, whatsoever

I already have many books on my shelf that illustrate technobabbling: how to communicate poorly and not solve any real world problems. -- You are free to criticize me for my view, and form your own opinion.

Bottom line: Just because a book has an alluring title does not make it a timeless or even timeworthy read.

### Fair enough

You are certainly entitled to your opinion. As for the title and the whole idea of "beauty" in code on a philosophical level, I actually don't care so much. I enjoyed what I've read of the book: lots of cute hacks. Of course it's only natural to strongly disagree with some of the chapters, I'm sure any thoughtful programmer would find plenty to argue with. For me, that's part of the fun!

Anyway you seemed a bit venomous about the book, so I was just curious where that came from.

### There is beauty in every

There is beauty in every project, but it is not in the code.

It is in the code, to the degree that the code accurately and concisely reflects the ideas. The two should not be distinct.

### Third goal

The goal of proof isn't to be correct. It is to be confident. This has two parts: (1) confidence that you got something right, and (2) ability to convince a second party that your confidence is justified. Assume that you succeed in your proof. To what degree has use of a prover inherently failed objective [2]?

Actually, there is a third goal. Knowing that your proof strategy is elegant - phrased the right way so that it is obvious it is right and belongs in The Book. As CAR Hoare put it - "There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult."

There have been studies that show traditional testing sometimes outperforms using heavyweight formality like Z for specifications. - I'll try to dig them up. I read about these in Peter Gutmann's book Cryptographic Security Architecture: Design and Verification There are a couple of problems, namely (a) auditors have to be able to understand Z to use it (b) as many as a 1/3rd of auditors don't simply find some specifications in Z hard to understand, but incomprehensible [meaning Z is the ISO-900x champion] (c) as Gutmann puts it, "In other words, Z is the answer provided you phrase the question very carefully."

Jonathan,

All I can say is that if you want me to reciprocate the proctoscope you gave me in the other thread, read Gutmann.- Most here may only know him from his /. featured articles where he criticizes Microsoft's Windows Vista Security as "the longest suicide note in history" and Linux security.

### Z notation

I don't really understand this critique. Z is a notation for formal specifications based on mathematical notation, and it's about the easiest to understand as far as formal specifications go (though if there are any sticking points I'd be quite interested in knowing about them).

It is unrealistic to expect auditors to understand a formal spec as easily as an informal one, but a Z formal spec is still far more auditable than source code. And the compliance of source code to the Z spec can be checked by automated tools.

Edit: el-vadimo's link seems to clarify this point, at least partially. It appears that Z notation is much too terse to be readily understandable, and that insufficient efforts were undertaken to link the formal Z specification to informal specifications, requirements, criteria and design decisions. I had largely taken for granted that this would be done as a matter of course in any well-engineered system.

I certainly knew about Peter's dissertation and follow-up work, and I'm a bit embarassed that I neglected to acknowledge it above. I agree about the proof strategy needing to be elegant - both for clarity and for evolvability.

I do hope you took my "proctoscoping" title in the spirit of mild fun. That's certainly how I intended it. Reciprocating is certainly welcome and appreciated.

### Gutmann's critique of formal methods

I read [...] in Peter Gutmann's book Cryptographic Security Architecture: Design and Verification: â…“ of auditors don't simply find some specifications in Z hard to understand.

I don't own the book, but Google's been able to index Guttman's homedir which contains, at present, a file titled Chapter 4: Verification Techniques. In it, the following quote can be found:

One study of the comprehensibility of formal specifications found that the most common complaint among users was that the specification was incomprehensible. When asked to provide an absolute comprehensibility rating of a Z specification, subjects rated the specification at either â€œhardâ€ or â€œincomprehensibleâ€ after a week of intensive training in using the language. Another survey, again related to the comprehensibility of Z, found that even subjects trained in discrete mathematics who had completed a course in formal methods found it very difficult to understand any of a 20-line snippet from a Z specification, with nearly a third of the test group being unable to answer any questions relating to the specification, which they found incomprehensible.

### With an error like this...

Another approach which has been suggested is to automatically verify (in the â€œformal proof of correctnessâ€ sense) the specification against the implementation as it is compiled [49]. This has the disadvantage that is canâ€™t be done using any known technology.

I have a very difficult time taking anything the rest of the chapter says seriously—especially since, as is common in these critiques, there are tons of references to outdated tools and work from the 1960s, and no apparent references to the state of the art.

### Peter was actually very

Peter was actually very careful to research the current state of the art, so if you have concrete counter-examples that work in the context of the kinds of systems his dissertation addresses, that would be a very interesting thing to see.

I should add that I'm referring only to the 4th chapter linked to in the parent post—it's entirely possible that the remainder of the book addresses that concern.

However, my primary point—that there's a specific claim that no known technology makes possible the creation of software that is "correct by construction" in the sense of "satisfies a proof of correctness," and that the claim is false—stands. We've spent a lot of time here discussing Coq and Isabelle, two systems that do precisely that. That, coupled with chapter 4's critiques of outdated technology and lack of cross-references to discussion of other technology elsewhere in the book, strongly suggests to me a lack of familiarity with such technology.

I could, of course, be mistaken.

### Two corrections

... there's a specific claim that no known technology makes possible the creation of software that is "correct by construction" in the sense of "satisfies a proof of correctness," and that the claim is false... We've spent a lot of time here discussing Coq and Isabelle, two systems that do precisely that

Much as I like and respect Coq and Isabelle, there was no existence proof at the time of Peter's writing that either system could do a proof of correctness of this sort on a substantial, general-purpose body of software. Even today, I would argue that there is no existence proof for either system on the kinds of software Peter was concerned with. Today, they can do proofs of correctness only under narrowly constrained conditions, and only when (a) the system is very small [as for seL4] or (b) the system is crafted in a language such as O'Caml or Haskell, both of which remain ill-suited to the kinds of production code that Peter was concerned about.

Also, Peter's critique includes as part of the difficulty the problem of correct specification.

### "Production" Code

I'll certainly have to grant that if "production" code means "written in C," then yes, Frama-C still seems to represent the state of the art, with all of the shortcomings that have been discussed here and elsewhere.

It may be that Peter had the difficulty of correct specification in mind. For whatever reason, I didn't glean that from chapter 4, and in any case I'm strongly of the opinion that would should just stipulate that specification is hard: the observation is a truism and has nothing to say, as far as I can tell, about the ease of fulfilling the specification once it exists, either formally or informally.

Although, come to think of it, I do have a level of sympathy for the point of view that suggests that some, or perhaps even most, of the value of formalization derives from the construction of the formal specification—that is, formalization forces you to make your assumptions explicit.

### "automatic" is important

Didn't Peter write "automatically verify"? (emphasis added)

If I understand correctly, Coq and Isabelle don't automatically verify the code against the spec; they often require some human guidance on how to prove the correctness theorem.

I think you should take Peter Gutmann's thesis seriously. I don't claim it is free of errors, but I do claim it is worthy of being taken seriously.

### Phase Distinctions

David Wagner: If I understand correctly, Coq and Isabelle don't automatically verify the code against the spec; they often require some human guidance on how to prove the correctness theorem.

This is a good point. I took the comment to mean "There exists no technology such that, once the development is complete and the task is to build executable code, automatically ensures the consistency of the proofs and code." This is the interpretation that I claim is false, given the code extraction processes in Coq or Isabelle.

However, another reasonable interpretation might be "technology that allows the development of a formal specification and an implementation of that specification separately, but automatically ensures their mutual consistency, does not exist." I can't find anything to argue with about that interpretation, but neither have I ever heard anyone claim the contrary. That is, it's understood by Coq and Isabelle users that the process of developing certified software involves human tasks that can't be said to strongly resemble the traditional process of programming: they're logical, rather than operational. It's also true that those processes can't be completely automated. This is why I believe Adam Chlipala's work on Certified Programming with Dependent Types is important: the point is very much to gain leverage from Coq's Ltac tactic-programming language to significantly improve the level of automation available for developing certified software. Hopefully someday we'll see results that have an even more impressive effort/output ratio than what we see with seL4.

David Wagner: I think you should take Peter Gutmann's thesis seriously. I don't claim it is free of errors, but I do claim it is worthy of being taken seriously.

I take it extremely seriously, otherwise I wouldn't bother critiquing it. :-)

Claims of the lack of value of tools and processes admittedly draw an even more skeptical look from me than others. The reason for this is very simple: that which is wildly impractical today becomes commonplace tomorrow. A good example of this is SAT solving: as these slides point out, "SAT, the quintessential hard problem (Cook 1971): SAT is hard, so reduce SAT to your problem" and "SAT, the universal constraint solver (Kautz, Selman, ... 1990's): SAT is easy, so reduce your problem to SAT."

I'm perfectly SATisfied (sorry) that certified software falls into precisely the same category.

Paul: I think you're stretching badly because you love the promise of proof tools. Let's make this concrete. Do you know of any technology, then or today, that can realistically prove the correctness of a practically usable implementation of OpenSSL? Or one that can get us an end-to-end proof of a login subsystem, including its reliances on the underlying kernel and the database system in which login relations are stored? I do not.

The domains in which current proof methods are strong are largely term-based computation (monads, for this purpose, are term-based). The kinds of programs for which we need proofs in the wild are intensively stateful.

The seL4 work, for all my challenges, is a landmark piece of work. In order to get it to happen, they had to invent/discover an appropriate subset of C, formalize the semantics of two languages (Haskell and the C subset) in Isabelle, and apply (if memory serves) something close to 56 people-years of verification effort on a piece of code that would take three to four people-years to design and develop in the absence of verification concerns. They had to build an enormous amount of prover infrastructure. The end result is certainly not automated, and would not scale to larger systems (because the cost is too high in the presence of evolving specification). The Coyotos research effort contemplated a comparable effort.

Finally, I note that "that which is commonplace tomorrow" does not exist today, and certainly did not exist when Peter was writing.

### seL4's formal semantics of Haskell and C

Re: comment-59736: â€¦ they had to invent/discover an appropriate subset of C, formalize the semantics of two languages (Haskell and the C subset) in Isabelle â€¦

shap, do you know if seL4 folks ever published their formalization of Haskell and C? They sketch in some details of their C-to-C-SIMPL translation with the lightest of pencils in Refinement in the Formal Verification of the seL4 Microkernel and Mind the Gap but my Google-fu is not strong enough to find any references to their description of the actual formal semantics. My mindfulness of this gap has been prompted by the following vignette in Gutmann's thesis (Â§3.5 Credibility of Formal Methods, p.101):

The problems inherent in relying purely on a correctness proof of code may be illustrated by the following example. In 1969, Peter Naur published a paper containing a very simple 25-line text- formatting routine which he informally proved correct [125]. When the paper was reviewed in Computing Reviews, the reviewer pointed out a trivial fault in the code which, had the code been run rather than proven correct would have been quickly detected [126]. Subsequently, three more faults were detected, some of which again would have been quickly noticed if the code had been run on test data [127].

The author of the second paper presented a corrected version of the code and formally proved it correct (Naurâ€™s paper only contained an informal proof). After it had been formally proven correct, three further faults were found which, again, would have been noticed if the code had been run on test data [128].

The saga of Naurâ€™s program didnâ€™t end with the initial set of problems which were found in the proofs. A decade later, another author analysed the last paper which had been published on the topic and found twelve faults in the program specification which was presented therein [129]. Finally (at least as far as the current author is aware, the story may yet unfold further), another author pointed out a problem in that authorâ€™s corrected specification [130].

[129] refers to Bertrand Meyer's excellent paper On Formalism in Specifications in which he discusses "the seven sins of the specifier" and presents Naur's original problem statement:

Given a text consisting of words separated by BLANKS or by NL (new line) characters, convert it to a line-by-line form in accordance with the following rules:

1. line breaks must be made only where the given text has BLANK or NL;
2. each line is filled as far as possible as long as
3. no line will contain more than MAXPOS characters.

This looks like a simpler problem than nailing down the formal semantics of a (subset of) C.

### Can someone explain

How one is supposed to write a compiler without too many bugs if even informal reasoning over text formatting is hard, and takes years?

### grand challenge

K.Â RustanÂ M.Â Leino, in his PowerPoint slides On the subject of the Verification Grand Challenge: Programming methodology (see the end of this post) attributed the very same question to Sir C.A.R. Hoare:

Â Â Â Â Can programming theory yet fully explain why real big programs work?

I presume the implied answer was, not yet.

### One can't

And this is why, ultimately, I have come to question proof in the context of assurance. A (very) high-confidence measure of a low-confidence proposition is, perhaps, not so useful.

But I can't entirely convince myself to stick with a position on this, for three reasons:

1. Automated, comprehensive testing (the economic rationale for proof) is useful whether the specifications are perfect or not. My emphasis here is on the automation and the comprehensiveness of what is examined/proved, rather than on the complete coverage of the proof set.
2. Progress isn't made in all places at the same time. We may hope to learn, over time, how to do better at specification.
3. There is strong evidence that the attempt to prove improves both specification and implementation in measurable and cost-effective ways. This is valuable independent of whether the proof/spec is actually right or not.

Aside: looking for a precise semantics in C is hopeless, since C essentially has no semantics. There is much more hope for languages that actually do have semantics, and the problem of language semantic specification is probably tractible today thanks to the strong emphasis on mathematical foundations that have been generated within the formal computer science community.

Naur's original problem is a subtle problem involving back-tracking. While the PL specification problem is larger, it consists of small, locally-specifiable subparts that are composed. The Naur problem is a qualitatively harder type of thing to specify.

And remember, finally, that we, as a field, have relatively little experience with formal specification. It's not clear today how much experience helps, but I'm struck that today's proofs look a lot like early source code: no comments, obscure tricks (perhaps motivated by the prover) and so forth that obfuscate the specification.

### CompCert

Actually, a formally certified compiler has in fact been written. It targets PowerPC and ARM architectures from Clight, a subset of C, and the development does include formal semantics for Clight. The developers claim that proving the correctness of their compiler has helped them to a great extent in validating their formal semantics for the language.

Mathematical foundations have been important in making formal semantics more feasible, but any such semantics need to be clearly related to the informal semantics and "meaning" programmers and compiler writers commonly rely upon, which usually is not stated mathematically.

It is also fairly clear that automation of traditional "testing" is a powerful rationale for formal proof of weak, "lightweight" specifications. As we have been arguing elsewhere, a secondary emphasis on automated disproof of such specifications would be even more helpful, since the resulting artifacts would point out errors in either the specification or the program.

### Not yet published

My understanding from Gernot is that this is not yet published, but that their intent is to make the entire proof trail publicly available. I've just sent a note to Gernot forwarding the question and encouraging him to respond.

### The crucial bits are out there.

Gernot promptly forwarded to me, so I'll answer:

I'm actually not sure if the whole proof will ever be fully public (it is commercial property of Open Kernel Labs). We will very likely make the C semantics public at a later point, though. We're aware that this currently creates a problem in outside trust of the proof and this is why we are currently working on an official high-grade certification of seL4 where the proofs are checked by an independent third party. And, of course, if you're a customer, I'm sure OK Labs can be persuaded to give you access to the proofs for money.

The interesting bits of the C semantics that seem to be the point of contention here are published already, though. The main reference is the Types, Bytes, and Separation Logic paper at POPL'07, with a number of further papers that are summarised in Tuch's PhD thesis. An early version of the memory model (described in the paper) has been released for quite some time and is available from the NICTA/ERTOS website. We have fine-tuned this a bit, but the basic framework hasn't changed.

The short version is: the standard imperative language part of C is quite boring and has been formalised hundreds of times in different tools. Our approach is to translate it into a generic imperative language framework in Isabelle, called SIMPL. It is open source and available from the Archive of Formal Proof. The somewhat more interesting bits are how to translate C expressions into SIMPL while staying sound, the main problematic point being the treatment of side effects in C expressions and the problems you can run into with those (mainly order of evaluation). For this we just restrict our C subset to what style guides for good C tell you anyway: avoid side effects in expressions. If our parser sees a function call in an expression more complex than a simple assignment, it attempts to automatically prove freeness of side effects. If it is not successful, it rejects the program. There are a few more technical bits like that, but nothing really special and nothing that hasn't been done similarly in other tools that treat other subsets of C.

The really interesting point for C verification (at least in systems code) is on pointers, pointer arithmetic, safe and unsafe pointer casts and the whole memory model that goes with it. This is what the papers above are about, where we go a good deal further than others and what was one of the original research contributions in this project. The basis of the memory model is a fully detailed, brutally plain function of just word32 to byte. We formalise how types are laid out in memory and managed to derive a reasonably nice and high-level proof framework for this (with soundness proof in Isabelle). The basic operational semantics for the C memory are fairly simple. The hard part is to make it manageable in proofs.

So in summary, I'm not worried that we got the memory model wrong or the C semantics. If at all, I'm much more worried about bugs in the compiler, the cache and TLB behaviour we have left out, that our VM invariants may not be strong enough, or even that parts of the top-level spec are less useful than we want them to be. All of these could lead to unexpected behaviour. And for the certification, we're taking a hard look at all of these assumptions again.

Basically, for a research project that tries to push the state of the art in formal verification, our assumptions are more than fine, they are better than anything that has ever been done for this size, complexity, and level of detail of C code. That was the point of L4.verified. For really high-assurance software, they are not good enough yet. We know that. We're working on it. One thing at a time.. ;-)

### Vadim, you tug on the heart strings

I was just about to post a front page story chronicling the history of that series of papers, including Meyer's paper!, and some, including software engineering researcher Stephen R. Schach, still think Meyer's description is somewhat ambiguous; after giving mathematical descriptions, Meyer converts it into English language. Schach argues Meyer's conversion of his mathematical descriptions for the problem into English language is not a 1:1 correspondance and leaves room for error if the programmer relies on the English prose. Bertrand's conversion into English reintroduces the mistake by Goodenough and Gerhart that it is okay to print "WHO" on the first like and "WHAT WHEN" on the second line OR "WHO WHAT" on the first line and "WHEN" on the second line when the input "WHO WHAT WHEN" is greater than 10 characters. I was going to post a story about Ted Kaehler's live specification of a text editor's line formatting. -- I actually emailed Bertrand a few weeks ago to ask him his thoughts.

What you leave out from Gutmann's quote is the fact that Naur, Goodenough and Susan Gerhart, etc. who "proved" these correct were all leading computer scientists of the time! Leavenworth, the Computing Reviews reviewer, was also well-known for proofs. And so was London. We're talking about the brightest people on earth.

### Plenty of Truth

shap: Paul: I think you're stretching badly because you love the promise of proof tools.

I have absolutely no doubt that there's some truth to this. Maybe even quite a bit. With that said, I don't think it's quite as stretchy, or quite as based only on promise as it may seem.

shap: Let's make this concrete. Do you know of any technology, then or today, that can realistically prove the correctness of a practically usable implementation of OpenSSL?

What do you mean by "OpenSSL?" Do you literally mean the OpenSSL codebase? Or could I choose GNU TLS, NSS, yaSSL, or MatrixSSL instead? I feel pretty comfortable asserting that MatrixSSL 3.1.1 ('find . -name '*.c' -print0 | xargs -0 wc -l' => 19479, including comments and blank lines) is amenable to correctness proof using Frama-C and Coq.

shap: Or one that can get us an end-to-end proof of a login subsystem, including its reliances on the underlying kernel and the database system in which login relations are stored? I do not.

Of course not; you haven't finished Coyotos yet. ;-)

With that said, I think we already saw Ynot and its verified relational database system, right? So it seems that what we need is a combination of the seL4 work and the verified relational database work. And that's one of the reasons that I have questions about the seL4 team's choice of Isabelle, both because of the already-underway related work that had been done in Coq, and because, while you can do a certain amount of proof automation LCF style with Isabelle, it seems to me that what Adam Chlipala and others are doing with Ltac in Coq is quite a bit farther along.

shap: The domains in which current proof methods are strong are largely term-based computation (monads, for this purpose, are term-based). The kinds of programs for which we need proofs in the wild are intensively stateful.

Absolutely. That's why all the work done on Floyd-Hoare logic in Frama-C and Ynot, to name just the two tools I'm slightly familiar with, is important.

shap: The seL4 work, for all my challenges, is a landmark piece of work...

Right. The crux of my assertion is that there were, and are, missed opportunities to reduce the proof burden/focus on the the work outstanding from other efforts: there are processor models for x86 and PowerPC available for Coq. Ynot is all about verifying imperative software; in Effective Interactive Proofs for Higher-Order Imperative Programs we find:

The veriï¬cation burden in our new system is reduced by at least an order of magnitude compared to the old system, by replacing manual proof with automation. The core of the automation is a simpliï¬cation procedure for implications in higher-order separation logic, with hooks that allow programmers to add domain-speciï¬c simpliï¬cation rules.

This certainly strikes me as encouraging with respect to recasting the seL4 work, or at least informing how the Isabelle support might evolve.

shap: Finally, I note that "that which is commonplace tomorrow" does not exist today, and certainly did not exist when Peter was writing.

Sure. My comment was only intended to reinforce the extreme risk in declaring something impossible/not to exist, and in making it, I was not limiting myself to verification efforts aimed at, e.g. C code. If we're defining "production code" as "C code"—and I absolutely agree that we have to in the context of systems programming even today—then I absolutely agree. I just didn't read Peter's chapter 4 to include that constraint.

### some came true

Too much of the proving efforts are one-off stuff with immature tooling. Naturally, making reusable components with improved tooling would greatly increase applicability and reduce effort. I expected your predictions to happen based on that. Now, it's late 2016, we get to see what happened. :)

A smaller SSL was verified with a Frama-C based tool against specific issues

A reference implementation of TLS was mathematically verified extracting to FP languages. With seL4, we've already seen that could be converted to C. Gets better in a second though.

http://www.mitls.org/

COGENT language is functional, systems language with certified compilation to C. Already used for ext2 filesystem with labor going down from 1.65 person-months per 100loc in seL4 to 0.69 person-months with COGENT demo. Maybe port miTLS to it for quick boost or C deployments.

https://ts.data61.csiro.au/projects/TS/cogent.pml

CertiKOS team delivered embedded hypervisor by integrating a bunch of DSL's or tailor-made techniques. Many of them will be reusable for other problems. Maybe port miTLS to *that*.

http://flint.cs.yale.edu/certikos/framework.html

The combination of basic specs, contracts, functional programming, tools like QuickCheck/QuickSpec, and FP-to-C like COGENT already can deliver quite the cost/benefit analysis. The full thing seems to be able to handle something like TLS now, all or most of it. I expect the technology of tomorrow to be even more amazing with the amount of practical stuff the CompSci folks are cranking out. Truly, exciting times for people that love high-assurance systems. :)

### Very interesting point:

This is a very interesting point which should be investigated to a greater extent. Since the Z notation is based on standard mathematics, one should expect it to be easy to understand. But following the references in the above paragraph brings up some papers which suggest that the use of mathematical notation hinders rather than aiding usability, and that it would be helpful to adopt a literate programming style, including design rationales and even alternative specifications which were rejected in the final system:

• â€œLiterate Specificationsâ€, C.Johnson, Software Engineering Journal, Vol.11, No.4 (July 1996), p.225. [web]
• â€œMathematical Notation in Formal Specification: Too Difficult for the Masses?â€, Kate Finney, IEEE Transactions on Software Engineering, Vol.22, No.2 (February 1996), p.158.

To what extent do these pragmatic concerns carry over to functional programming languages, which are also heavily influenced by mathematical notation and much terser than the imperative languages which are commonly used in industry? Should literate programming become a recommended practice here as well? Are non-functional approaches to programming simply easier to understand for most developers?

### As far as I understood

Z is highly informal. Though it has been years since I looked at it.

As for your other question, I don't think literate programming is very good for programmers; I see it as a derivative of presentation techniques/ learning. I am not against it, but for programming in the large, whatever you do, people are going to be exposed to APIs. And, as far as I am concerned, the best manner of presenting an API is through a standardized interface such as Javadoc or Haddock. (Possibly with other documentation, which is maybe where the literate programming comes in.)

Put bluntly, programmers are not interested in prose (when they need to get the job done).

### the Z notation

Re: comment-59678:

Since the Z notation is based on standard mathematics, one should expect it to be easy to understand.

Easy examples are easy (for some measure of easiness). Here's a specification of the Eight Queens problem lifted from The Way of Z by Jonathan Jacky:

SIZE == 8
FILE == 1 .. SIZE
RANK == 1 .. SIZE
SQUARE == FILE â¨¯ RANK
DIAGONAL == 1 - SIZE .. 2 * SIZE

â”‚ up, down : SQUARE â†’ DIAGONAL
â”œâ”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€
â”‚ âˆ€ f : FILE, r : RANK â¦
â”‚    up(f,r) = r â”€ f  âˆ§
â”‚    down(f,r) = r + f

â”Œâ”€â”€ Queens â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€
â”‚ squares : FILE â†£ RANK
â”œâ”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€
â”‚ â´ squares â— up, squares â— down âµ âŠ† SQUARE â†£ DIAGONAL
â””â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€


### Literate specification

Since the Z notation is based on standard mathematics, one should expect it to be easy to understand.

Possibly. But Z also uses a lot of fairly obscure notation that many people are not familiar with.

it would be helpful to adopt a literate programming style, including design rationales

My impression is that most Z specifications are written in such a style (in part because the math can be difficult to follow otherwise).

### Probability of Perfection

I listened two days ago to a talk by John Rushby where he evangelized the concept of probability of perfection. This notion arised just from the problem that proof is "yes/no" whereas confidence is measured for example in the aerospace industry by something like "does show catastrophic behavior only in 1 out of 10^9 times". More information can be found here.

### Maybe Worth Noting...

...there is a school of Probability Theory, probably most vigorously expounded by the late E.T. Jaynes, that holds that Probability Theory, properly understood, is neither more nor less than an extension of Aristotelian logic dealing with how to conduct inference given incomplete information. See New Axioms for Rigorous Bayesian Probability for important information and, of course, Probability Theory: The Logic of Science.

### BTW...

I should probably just create a story with this periodically, to remind people...

"A proof is a repeatable experiment in persuasion." — Jim Horning

### I want proofs, not types

These days it has become very fashionable to use advanced type systems to prove properties of programs. The COQ theorem prover is a nice example of this.

The reason for this is the (admittedly intriguing) idea that if you just write down the program in the right way, proof of its main properties will be immediate. The hope with type systems is that you can capture statically all the right ways of writing down a program.

I think this hope is pretty vain and leads you in the wrong direction. Types are good for abstracting from implementation details; for example, when dealing with integers, I really don't care how they were constructed from more basic sets in a set theoretic framework. Types are not good for replacing the way theorems have been written down and proofs have been done in the mathematical community for quite some time now.

If you are proving a program to be correct, you actually don't need your program to have an advanced type system at all (again: simple types are useful just for not needing to deal with raw sets), because you can express everything you want to say about this program in normal mathematical theorems.

So, in short: I really do not like this idea of approximating proofs with ever more complex type systems.

Rather, I would like to acknowledge how high-level programming in the form of algorithms has always been a part of mathematics. Therefore we have to improve the way how to do mathematics on a computer in general, not only for programs. Note that there are only very few mathematicians (or rather, none at all) who use interactive theorem provers for their work. There are mathematicians out there who are definitely interested in interactive theorem proving; Thomas Hales is a great example. But even those mathematicians will not use theorem proving for their normal work until the tools have been much improved.

I understand why people from computer science want to apply this (theorem proving) technology NOW to their problems. But in my opinion, we first (or, at the same time) need to make (modern) mathematicians happy. If we keep this in mind, it will provide a great guide for how to design the next generation of interactive theorem provers.

### And I forgot to mention that

And I forgot to mention that Isabelle is closer in spirit to that next generation than Coq is :-)

### Afoul Language

The above two claims run headlong into the Coq-checked proof of the four-color map theorem and the existence of the C-CoRN repository.

### I am not saying you cannot

I am not saying you cannot do impressive mathematical work in Coq. I am saying, if you do it in Coq and not in Isabelle, you are even more impressive :-)

There has been (not quite so impressive) work on mathematical theorems in Isabelle, too: Flyspeck II

### Still Curious

I'm still curious as to what you think is missing from Coq with respect to doing mathematical proofs. It seems to me that, historically, you could make the case that Isar was a strong mark in Isabelle's favor, but currently I think C-zar answers the competition nicely, and Ssreflect provides some insights, including useful tools, related to how Coq was used to conduct the four-color map theorem proof.

### First: COQ is a great tool.

First: COQ is a great tool. If you want to do mathematics in COQ, go ahead and do it.

I think neither Isabelle nor Coq are there yet. That's why I want a new generation of theorem provers. Ssreflect is nice. Can you use it together with C-zar ?

### Avigad's proof of prime number theorem

is my hobby horse for the use of Isabelle in involved mathematical proofs. They chose to encode a proof by Erdoes and Selberg. This proof had a notoriety of being quite involved in certain circles.

### The reason for this

The reason for this is the (admittedly intriguing) idea that if you just write down the program in the right way, proof of its main properties will be immediate.

This is one way of using Coq and other dependently typed systems, yes. However, you definitely can use Coq to write mathematical theorems the way one would in Isabelle and the like. You may need to introduce special axioms to deal with classical reasoning, extensionality, quotienting, choice principles etc.--things which ordinary mathematicians take for granted. On the other hand, the lack of builtin support for these is what allows Coq users to describe high-level algorithms in a mathematically rigorous way; Isabelle and others tend to resort to ad hoc solutions, if they support algorithms at all.

### Coq has the ad-hoc logic,

Coq has the ad-hoc logic, not Isabelle. Isabelle builts on the same kernel as hol-light, and hol-light has been mechanically verified to be correct. For Coq, I don't think that there is a single person on earth who can faithfully say that the Coq logic is consistent. Also, how can you be more ad-hoc than by "just adding axioms" !?! This is just the reason why "normal" mathematicians prefer Isabelle over Coq. Prove that FFT-algorithm to be correct? Sure, I just need to fix that Coq-kernel by adding an axiom which might make my logic inconsistent or maybe not... Ah, don't bother, just DO IT CONSTRUCTIVELY ;-)

Apart from that, Coq definitely deserves credit for making popular again the idea of proof by computation (although this idea is really due to ACL2, not Coq). But since then, Isabelle has caught up. I would even say that for writing down, proving, and executing high-level algorithms today, Isabelle is the better choice, see this article, which talks a bit about one option of doing computation in Isabelle.

### Not Quite

Steven Obua: For Coq, I don't think that there is a single person on earth who can faithfully say that the Coq logic is consistent.

You might find Coq en Coq interesting. :-)

Steven Obua: Also, how can you be more ad-hoc than by "just adding axioms" !?! This is just the reason why "normal" mathematicians prefer Isabelle over Coq. Prove that FFT-algorithm to be correct? Sure, I just need to fix that Coq-kernel by adding an axiom which might make my logic inconsistent or maybe not...

Are you saying that Isabelle uses no axioms in its mathematical proofs, then? I'm finding your position extremely confusing: Coq and Isabelle are both type-theory-based, both support the development of extracted software that is "correct by construction," both rely extensively on the Curry-Howard Isomorphism, both provide higher-order logic, both support classical as well as constructive logic, both provide mathematician-friendly proof languages, both adhere to the LCF approach to developing theorem provers, and both adhere to the de Bruijn criterion. They're kissing cousins.

Steven Obua: Apart from that, Coq definitely deserves credit for making popular again the idea of proof by computation (although this idea is really due to ACL2, not Coq).

I would say it's due to AUTOMATH. Coq has never claimed originality in that regard.

Steven Obua: I would even say that for writing down, proving, and executing high-level algorithms today, Isabelle is the better choice, see this article, which talks a bit about one option of doing computation in Isabelle.

I don't think anyone here is arguing that Flyspeck II isn't inspirational work, or that Isabelle isn't good for it. I'm certainly not. I only take issue with what seem to me to be some utterly wild claims about Coq's potential unsoundness and unsuitability for doing serious mathematics—especially given that that serious mathematics seems to already exist as Coq libraries.

### AUTOMATH: Actually, I don't

AUTOMATH: Actually, I don't know this system (although I certainly did hear about it). What I mean by proof by computation is to reuse the result of the computation (some call this reflection) within the proof. Did AUTOMATH have that already? Not bad.

I know about Bruno Barras thesis. Actually, I relied heavily on his work on computing in my own thesis. So, where in this thesis do I find a consistency proof? Also note that the actual COQ kernel is far more complex than anything you will find in this thesis.

Again: I am not saying COQ is unsuited for serious mathematics; you are not correctly citing me here. What I say means "I favor Isabelle over the other alternatives" which is about what you said in an earlier post about COQ.

Isabelle and COQ are becoming more and more similar, I do not deny that. Still, COQ is dedicated to constructive mathematics, which is something I don't like. I want full tool support (like SAT solving) built into my theorem prover for classical logic from the beginning, and not to be treated a second class citizen just because I dislike complex type systems and like set theory.

In its main components Isabelle does not make use of the Curry-Howard isomorphism, by the way, Coq does. If you are looking for differences between the systems, there is one.

### Fair Enough

Steven Obua: Isabelle and COQ are becoming more and more similar, I do not deny that. Still, COQ is dedicated to constructive mathematics, which is something I don't like. I want full tool support (like SAT solving) built into my theorem prover for classical logic from the beginning, and not to be treated a second class citizen just because I dislike complex type systems and like set theory.

I personally don't understand why "built in from the beginning" and "an integral part of the standard libraries and/or contributed libraries" should be in conflict, but OK.

Steven Obua: In its main components Isabelle does not make use of the Curry-Howard isomorphism, by the way, Coq does. If you are looking for differences between the systems, there is one.

By "main components," I suppose you mean Isabelle/Pure. That's fair enough, but to a first approximation Isabelle/Pure's only users are people who wish to embed new logics into Isabelle, so it seems odd to me to exclude the remainder of the Isabelle distribution from consideration. Still, if your point is that the kernels differ in this way, it's a fair one.

### No, by its main components I

No, by its main components I mean Isabelle + Isabelle/Isar + pretty much else. But I cannot rule out that somebody out there has written a package for Isabelle that uses Curry-Howard.

The Isabelle extraction system, like all such systems, relies on the Curry-Howard Isomorphism. Still, I do think the observation that it isn't part of Isabelle's defining metalogic is reasonable.

### Isabelle's foundations

Coq and Isabelle are both type-theory-based, both support the development of extracted software that is "correct by construction," both rely extensively on the Curry-Howard Isomorphism, both provide higher-order logic, both support classical as well as constructive logic, both provide mathematician-friendly proof languages, both adhere to the LCF approach to developing theorem provers, and both adhere to the de Bruijn criterion. They're kissing cousins.

This really overstates the similarity between Coq and Isabelle. In particular, Isabelle is based on Church's simple theory of types and has no dependent types, which is not what people usually mean by "type-theory-based", and Isabelle has very little to do with the Curry-Howard isomorphism. In particular, Isabelle proofs are not terms in any lambda calculus. Also, Coq does not follow the LCF approach, which revolves around using the ML type system to enforce that only certain kinds of values can have the 'theorem' type. Instead, Coq constructs proof terms which can be checked via simple typechecker.

Coq and Isabelle are both typed theorem provers based on the lambda calculus, but that's pretty much where the similarity ends.

### Points of Comparison

Sam: In particular, Isabelle is based on Church's simple theory of types and has no dependent types, which is not what people usually mean by "type-theory-based..."

That's the first I've heard that "type-theory-based" doesn't mean "simple theory of types."

Sam: Isabelle has very little to do with the Curry-Howard isomorphism.

Isabelle/Pure might not, but see Program Extraction in Isabelle. To say that Isabelle has very little to do with the Curry-Howard Isomorphism seems frankly bizarre.

Sam: Also, Coq does not follow the LCF approach, which revolves around using the ML type system to enforce that only certain kinds of values can have the 'theorem' type. Instead, Coq constructs proof terms which can be checked via simple typechecker.

Bruno Barras seems almost to agree with you insofar as Ltac tactics' relationship to the LCF approach can be characterized as "awkward." But it's clear that the intention is to adhere to the LCF principle, and in fact Ltac does. Awkwardly. :-) John Harrison, author of HOL Light, seems to think that the relationship is pretty straightforward. That Coq's OCaml, vs. Ltac, tactics adhere to the LCF approach is documented in the Coq Reference Manual.

Sam: Coq and Isabelle are both typed theorem provers based on the lambda calculus, but that's pretty much where the similarity ends.

With all due respect, that's contradicted both by the literature and by experience, and seems like a claim designed with no purpose other than to make false uniqueness claims about Isabelle.

### With all due respect, Sam is

With all due respect, Sam is right, and you are wrong.

### Hmm

Look. I'm not proposing any claims about the uniqueness or superiority of any systems here. I'm just making some factual claims about the vast difference between Isabelle (in general and in the Isabelle/HOL instantiation) and Coq, which the authors of both systems would agree to.

First, the type system of Isabelle/Pure (or Isabelle/HOL) has nothing to do with the the theorem proving process - it's a type system for the *metalanguage*. Whereas in Coq, propositions are types.

Second, proofs in Coq are terms in the Calculus of Co-inductive Constructions. These terms can be typechecked after they are created. This is what building a theorem prover on the Curry-Howard isomorphism means. Proofs in Isabelle are instances of the thm' datatype in this file: http://isabelle.in.tum.de/repos/isabelle/file/7b7ae5aa396d/src/Pure/thm.ML . Note that the implementation of the thm' type is hidden by the signature. That's what the LCF approach is about. In Coq, users can write arbitrary terms in CoC and use them as proofs, if and only if they typecheck. Nothing like that would be possible in Isabelle, since it's not possible to check than an instance of thm' was produced correctly after the fact.

### Thesis

So it seems that your thesis revolves around whether the proof process is based on the Curry-Howard Isomorphism or not. I appreciate that clarification and completely understand and agree that Isabelle's is not while Coq's is.

My contention remains that you and Steven made vastly broader claims than that, which a little bit of reference to materials by Coq developers, Isabelle developers, and developers of other, related systems contradicts. I think Isabelle is a wonderful system. I just dislike the thought of people being discouraged from using Coq by misleading information.

### Type erasure?

Nothing like that would be possible in Isabelle, since it's not possible to check than an instance of thm' was produced correctly after the fact.

Is this just a consequence of type erasure?

### No

As I said in my other reply to you, this is because Isabelle has no proof objects, or even really any proofs as values at all. When you have a proof in Coq, you have a value that you can repeatedly and reliably verify to be a proof of the proposition that is the terms type.

When you have a proof in Isabelle, what you basically have is an ML program that you could run again to print out the proved proposition a second time.

### Deep Curry-Howard

Isabelle has very little to do with the Curry-Howard isomorphism

I'm sensing two different notions of Curry-Howard in this conversation, which might be the source of the disagreement between, on one hand, Sam and Steven, and the other Paul. (For the record, I normally think about it the way Paul does.)

Narrowly conceived, the Curry-Howard Isomorphism was originally a very specific claim about simply-typed lambda calculus and first-order intuitionistic logic.

The next level of generality up from that, which Sam seems to be championing, is to extend the notion of "proofs are programs, types are propositions" to all appropriate lambda calculi and logics. At this level, I can understand Sam and Steven's claim that Isabelle (which, to be clear, I am passingly familiar with, but haven't really used) doesn't "use Curry-Howard". (Though very strict adherence to this might eliminate Coq as well )

But there is a further extension of the concept, that I believe both Paul and I subscribe to, where it can be said that Isabelle is most definitely using Curry-Howard, namely type-checking is being used to ensure the admissibility of particular programs as proofs within particular type-encoded logics. As the tutorial says: "Types are extremely important because they prevent us from writing nonsense."

If we want to make a distinction, it might be that Isabelle is using C-H passively and implicitly, whereas Coq is very explicitly and actively working with it. Which approach is stronger is open to (a possibly quite interesting ;-) ) debate, but for the discussion to be more than just talking past each other, I think a shared starting vocabulary might smooth the way to the real issues.

### Still not correct

This claim, if I understand it correctly, is still false:

where it can be said that Isabelle is most definitely using Curry-Howard, namely type-checking is being used to ensure the admissibility of particular programs as proofs within particular type-encoded logics.

In Isabelle, type-checking does not ensure the admissibility of any programs as proofs. This is because in Isabelle, programs are *not* proofs. In fact, in Isabelle, there is no object in the system that corresponds to a 'proof'. There are theorems, and an ML program might construct a new theorem, and that ML program might be referred to as a proof. But the ML program corresponds to a program written in Gallina in Coq, not a program in the CoC. The lambda term in CoC corresponding to a proof in Coq has no equivalent in Isabelle.

Also, I'm not sure what you intend by the phrase "type-encoded logic", but I doubt that Isabelle/HOL, as commonly used, features any such logics.

### Source of confidence?

There are theorems, and an ML program might construct a new theorem, and that ML program might be referred to as a proof.

Answering this might help me understand: what makes you confident that a theorem produced by Isabelle is correct with respect to whatever mathematical system you are working with, e.g. ZFC? In other words, what properties of the proof/program/Isabelle/whatever make you confident that the theorem you have proven isn't bogus?

### The LCF approach

This question is what the LCF approach is about. The ML program constructs an object of the thm' type. The ML type and module system restricts the ways such objects can be constructed, allowing the trusted core to be restricted to just one module, which can be easily audited. That's the file that I linked to earlier. If you trust that that file implements classical logic correctly, then you can trust that when Isabelle says something is a thm', that it's really provable.

### Re: The LCF approach

The ML type and module system restricts the ways such objects can be constructed, allowing the trusted core to be restricted to just one module, which can be easily audited.

OK, so to paraphrase you, because the program that constructs the theorem type-checks, the theorem is valid. Is that right?

If so, I would still say that Curry-Howard (in the very broadest sense I mentioned) is at the basis of Isabelle, albeit implicitly, because that is exactly why I'm confident that Coq has worked too.

I understand your point that from a user-experience point of view Isabelle is quite different and doesn't call upon Curry-Howard in any way visible to the end user. I can see that this might be an important distinguishing feature for someone choosing a proof assistant.

### Now, that's just stretching

Now, that's just stretching it a little too far. Yes, types are used, but where is the isomorphism? You can do with a single type, called "thm". Have fun mapping it bijectively to all sorts of things ...

### No stretch necessary

Now, that's just stretching it a little too far. Yes, types are used, but where is the isomorphism? You can do with a single type, called "thm". Have fun mapping it bijectively to all sorts of things ...

Presumably the bijection is non-trivial; if everything maps to "thm", it wouldn't really discriminate between truth and junk, so there must be some content to whether something maps to "thm" or "not thm" (however that is conceived).

Also presumably, there is some structure to those things that are deemed to inhabit thm, and I would guess that there is a type system lurking there too that applies to the constituent parts.

Besides, I wouldn't define things so narrowly to require an actual specific isomorphism for C-H to hold, even though that is the traditional descriptor. Any deep semantics-preserving correspondence is good enough for me. ;-)

I think the (ramified) Curry-Howard "concept" is a very deep and universal principle of proofs.

### You can redefine things and

You can redefine things and say that an isomorphism does not need to be an isomorphism to be an isomorphism, but this is not deep, it is just ridiculous.

Actually, you do not need to distinguish between thm and not thm, you just need to know that there is a thm type you cannot tamper with (although in practice, it helps of course to have at least one other type, for example string; that makes 2 types, which still induces one hell of an isomorpishm :-)). The internal structure of thm is of no concern; it is typed in Isabelle, but it could have been written in pure Lisp, there would not be a difference. It could actually just be an untyped string, although this is not advisable because of efficiency concerns.

### Names are not definitions

You can redefine things and say that an isomorphism does not need to be an isomorphism to be an isomorphism, but this is not deep, it is just ridiculous.

Just because, based on one way of thinking about the original STLC case, the common name uses the word "isomorphism", you can't raise that to the level of binding definition.

I would argue that even in the prototype case, the interesting thing about the phenomenon is more complex than isomorphism, since it is actually the properties that the well-type terms have in the untyped calculus that provides the "aha moment" for the discovery.

The internal structure of thm is of no concern; it is typed in Isabelle, but it could have been written in pure Lisp, there would not be a difference.

I would ask again what it is that gives you confidence that, because something is a thm, the "rules" have been followed.

### Is every theorem prover based on Curry-Howard?

Where's the isomorphism here? Type systems and propositions both existed before their connection was understood, and the mere presence of both doesn't imply CH is at play. Using a type system to enforce structural properties of proofs does not require CH. Enforcing structural properties is what type systems do, a priori. That a well formed proof should have certain structural properties does not require CH either. If we're not considering what a type system means as a proposition, or what a proof means as a program, then where's the connection to CH?

### One way or another

the mere presence of both doesn't imply CH is at play

I disagree: CH is a discovery about the nature of things that holds whether you are thinking about it or not.

And furthermore, I would contend that it holds of any situation where one language verifies/proves/constrains the semantic behaviour of the other in any way that can be considered "computational".

Enforcing structural properties is what type systems do, a priori.

I might have agreed with that at one time, but my experience using Coq led me to believe that this is not quite right. In Coq, the type system is ridiculously powerful (you can "tell outrageous lies" with it) and it is the limitations of programs that constrain the validity of the types as propositions. Just because PLs normally do what you say doesn't mean that is the only way it can be.

If we're not considering what a type system means as a proposition, or what a proof means as a program, then where's the connection to CH

Programs ("computational sentences") are always proofs, understood most simply as logical implication between their inputs and their outputs, and any system to computationally constrain inputs and outputs will be equivalent to a (possibly very weird and not tractable) logic/type system.

If you have a set-theoretic conception of types, this should be especially obvious, but I think it works for any computational languages that constrain each other. See here for more info about my views on this.

However, you don't have to buy this "big story" to buy the small one in the Isabelle case, since there is explicit type-checking being used to ensure valid logic and there is explicit creation of programs to generate a proof token. The existence of a thm is the witness to the existence of a program that generated it, and we need that program to have type-checked in accordance with the rules of the logic.

I don't see how in that case you could deny that CH has at least something to do with it, even if you want to quibble about exactly how much it is involved.

### The existence of a thm is

The existence of a thm is the witness to the existence of a program
> that generated it, and we need that program to have type-checked in
> accordance with the rules of the logic.

No. We just need it to type check, not in accordance with the rules of the logic.

### Types and logic

No. We just need it to type check, not in accordance with the rules of the logic.

A central consequence of CH is that type-checking is logical checking. If you tell me that the logic being checked by the type checker is unrelated to the logic your proof is working in, I would suggest that you might have some foundational issues to clarify.

### No

A central consequence of CH is that type-checking is logical checking. If you tell me that the logic being checked by the type checker is unrelated to the logic your proof is working in, I would suggest that you might have some foundational issues to clarify.

First, as Steven (an actual Isabelle developer) pointed out, Isabelle could be written in an untyped language, with some other mechanism for representation hiding (perhaps generative structures). Second, what we are telling you is exactly that "the logic being checked by the type checker is unrelated to the logic your proof is working in" but you keep denying this. What if Isabelle was implemented in first-order ML? Or in Coq? Or in Java? All of these are conceivable, and all of them would result in the same theorem prover.

### Yes/No/Maybe

what we are telling you is exactly that "the logic being checked by the type checker is unrelated to the logic your proof is working in" but you keep denying this.

I'm not "denying" it. I'm basing what I'm saying exactly on what you are telling me is the basis of your confidence that the proofs are correct: that your "proof tokens" inhabit the type thm, and that the programs that generate these proof tokens type-check against a "module" which encodes the logical system within which you are working.

Of course this could be done in a PL without explicit typing (a type-checker is just a program after all), and in that case I would still argue it was the moral equivalent of a type-checker (and hence CH would apply), though I would concede in such a case that the relationship is less direct.

However, in this case the link is more explicit than that if you tell me that the type-checking of programs is in fact part of the basis of your confidence in the correctness of your proofs, whether you keep those programs around as explicit objects in your system or not.

I'm starting to wonder if there is some ideological reason to repudiate CH that I'm not aware of. Or worse, one that I am aware of...

### Goodbye

At this point, your claims have left the realm of reality. If the Curry-Howard correspondence is about anything, it's about the relationship between types (in a typed LC) and propositions (in the logic), terms (in the typed LC) and proofs (in the logic). In Isabelle, propositions do not correspond to types, and terms do not correspond to proofs. Thus, Isabelle is not based on the Curry-Howard isomorphism. The terms that you are talking about in Isabelle are terms in ML, a Turing-complete language. The corresponding logic is unsound. Are you claiming that Isabelle is unsound?

I have no ideological reason to repudiate the Curry-Howard correspondence. It's a beautiful relationship that has been useful for type systems and for theorem provers. But it's not the only way to design a theorem prover.

Given your position, I no longer think argument on this point is useful. I encourage you to learn more about theorem provers, types, and logic, but LtU is not the forum for that.

Given your position, I no longer think argument on this point is useful.

Perhaps we have passed the point of useful discussion, however my purpose was to understand your position as well as to articulate mine.

I encourage you to learn more about theorem provers, types, and logic, but LtU is not the forum for that.

Actually, LtU has been partly a forum for that, for me and others, since its inception, so long as it is supplemented with a considerable amount study.

However, it is often possible for people who understand a lot about the same things to come to different conclusions about what they mean.

### Another take?

Isabelle does rely on the call-by-value semantics of ML to ensure all thm's are really finite terms built from primitives in THM right? Just like Coq does when it typechecks a term. So one should say only [thm] values are valid proofs in Isabelle.
Would it be correct to put this in Curry-Howard correspondence terms like that:
ctyp value = proposition
thm value = proof
metatheorem that THM implements classical logic = verification
metatheorem that classical logic is consistent = cut elimination

I realize this is a bit far-fetched and probably useless :)

### Curry-Howard?

Yes, but is this really a correspondence between proof-checking and type-checking i.e. between logics and type-systems, as Curry-Howard is generally understood? It seems that this analogy is more general than that, in that it could apply to a generic formal system with arbitrary rules of inference.

### Isn't That What Marc Said Up-Thread?

...it could apply to a generic formal system with arbitrary rules of inference.

Programs ("computational sentences") are always proofs, understood most simply as logical implication between their inputs and their outputs, and any system to computationally constrain inputs and outputs will be equivalent to a (possibly very weird and not tractable) logic/type system.

here

It would seem that Marc and I are neo-Platonists: we believe that there's an "out there" out there, and that things like the Curry-Howard Isomorphism describe it. Others, maybe including Steven and Sam, appear to believe that the Curry-Howard Isomorphism doesn't apply unless some designer chooses to apply it on purpose. This feels to me like the most accurate characterization given the conversation thus far, but it obviously might not be.

### When applies applies

Curry Howard always applies, but we don't always apply it. An analogy: there is an isomorphism between regular expressions and FSMs. One day, a programmer implements a regular expression matcher in a way that uses unbounded space. The isomorphism still applies, but the programmer didn't apply it.

### It's not a matter of "feelings"...

...it's a matter of mathematics.

The Curry-Howard correspondence says that types are propositions, that lambda-terms are proofs, and that beta- and eta-equalities correspond to cut-elimination.

This correspondence simply does not hold for Isabelle's logic. Propositions and proofs are merely elements of two different datatypes. The soundness of the inference rules of the logic Isabelle is validated by a model-theoretic (ie, lattice-theoretic/algebraic/NOT-proof-theoretic) soundness theorem, which says of each inference rule that if the premises are valid (ie, equal to the topmost element of the lattice), then the conclusion is valid (ie, equal to the topmost element of the lattice).

There is absolutely no requirement that the inference rules satisfy anything like a cut-elimination theorem -- and as a matter of fact, they don't. In particular, note that (a) we do not derive inference in pairs of introduction and elimination, (b) there are no restrictions on the structure of a sequent, and (c) there is no requirement that inference rules conserve information -- we simply validate arbitrarily complex inferences against the model. (And that model is lattice-theoretic, which means it is not a semantics of proofs, the way a categorical semantics is.)

The fact that the thm type only constructs terms according to these inference rules is simply a matter of type abstraction, like maintaining the balance of binary trees.

The Curry-Howard correspondence is really awesome. But this is because it has actual mathematical content, and is more than a feel-good platitude. In particular, this means there are plenty of logics whose meaning theories are not in accord with it.

### Nothing More Than Feelings

I knew I'd get called on that. :-) I used the term to describe my interpretation of various people's positions, i.e. to allow for the fact that they're subjective and not solely of logical/informational content. I completely understand and accept that Curry-Howard is not a matter of feelings—a fact that I take to mean that it always applies, whether out of intention or not.

### Oh what a feeling!

First of all, Neel, thanks for posting your view. In one post I have a clearer understanding of the position being articulated about Isabelle and CH than I was able to glean previously.

The Curry-Howard correspondence says that types are propositions, that lambda-terms are proofs, and that beta- and eta-equalities correspond to cut-elimination.

As stated earlier in the thread, I find that definition too narrow, though I understand wanting to stick to it, since it is easier to maintain rigor. However, I prefer to substitute "representations of programs" for "lambda-terms", and rather than "beta- and eta-equalities correspond to cut-elimination", I would say "evaluation corresponds to proof validation (i.e. proof normalization)".

Though this is way harder to keep rigorous, it ensures that we are still able to claim that all this stuff has something to tell us about a real-world Turing-complete language like Haskell, or any other PL. Less precise, but I think still true (unless you don't agree?)

There is absolutely no requirement that the inference rules satisfy anything like a cut-elimination theorem

I would guess that you have read The Blind Spot, where Girard makes a snarky commment to the effect that a proof system without cut-elimination is not worthy to be considered a proof system. ;-)

But setting that aside, I'm not sure I see how this is different from modeling a classical or non-standard logic in Coq: essentially add rules axiomatically to "make things work out". I still rely on the existence of an evaluating proof program with the right type to ensure that I didn't go wrong, it's just that some of the types along the way are inhabited "by fiat" rather than by construction.

(And that model is lattice-theoretic, which means it is not a semantics of proofs, the way a categorical semantics is.)

Can you expand on this a bit? Since CPOs can be used to model types, computations, and logics, it isn't clear to me at first glance they couldn't be used to model proofs, especially since proofs are often modeled with tree-like structures. I assume I'm forgetting something?

and is more than a feel-good platitude

I don't think we are trying to start a self-help movement or a church with it if that is what you are worried about. ;-)

### By lattice-theoretic

By lattice- or model-theoretic semantics, I mean propositions are just elements of the lattice (ie, truth values), and the partial ordering on the lattice is logical entailment. In a categorical semantics, propositions are objects, and proofs are morphisms. Any lattice is a degenerate category with at most one morphism between any two objects, and so a lattice-theoretic semantics can be viewed as a categorical semantics which equates all proofs (since if there is a morphism between two propositions, then all proofs of that entailment are mapped to the same morphism).

This is why proofs in an Isabelle/LCF-style system have have no notion of normal form. There is no equational theory of proofs, since proofs are an auxiliary notion lacking any independent status in the semantics of the logic. Propositions get semantics directly, in terms of their truth values -- e.g. for classical logic we can take the set of truth values {true, false}. The role of a proof is solely to establish when a proposition is entailed by the truth value true, and there is no closed set of rules for a given logic. Indeed, the very same logic can have many different proof systems with overlapping sets of axioms, since the semantics of the logic only cares about entailment of propositions.

Adding nearly any axiom at all to Coq will break its proof-theoretic semantics, since axioms generally make the normalization theorem false -- they can't be reduced, and hence introduce new non-canonical normal forms. However, Coq is very carefully engineered so that Prop can be given a simple model-theoretic semantics as well, and this means you can add any consistent axiom you like to Prop (including equating all provable Props) without breaking the realizability interpretation of its computational part (ie, without breaking extraction.)

In the Blind Spot, Girard is advancing a new methodology for giving semantics to logic from the one above, which integrates the proof-theoretic, Curry-Howard style with realizability (the third major semantics for intuitionistic logic). You can't understand what the novelty in it is, unless you can see that Tarskian style of semantics is actually different!

### Thanks!

There's a lot of food for thought here. Would you say that Girard's work here is similar to the work in Justification Logics, which appear to be the follow-up to some material I posted a while back on proof polynomials? One thing that struck me about Sergei Artemov's work was that he claimed that his "Logic of Proofs" included a generalization of the Curry-Howard Isomorphism, which is what leads me to this question.

Regardless, thank you again for the thorough and patient explanations of the principles underlying the position that, e.g. Isabelle's proof process isn't "merely" a "latent" application of Curry-Howard.

### Girard, mon semblable, mon frere

Thanks Neel! Another great post.

In the Blind Spot, Girard is advancing a new methodology for giving semantics to logic from the one above, which integrates the proof-theoretic, Curry-Howard style with realizability (the third major semantics for intuitionistic logic). You can't understand what the novelty in it is, unless you can see that Tarskian style of semantics is actually different!

To be honest, I did find it hard to see the novelty in Ludics, as you may recall, because Girard seemed to be describing how I already saw things! (Minus many of the technical details of Ludics, which I still won't claim to have fully assimilated.)

The Tarskian approach has always struck me as a simplified first approximation of the other styles, which to my eye are different only in technical detail, not substance.

The classic simplified Tarskian model in the form:  "Fred is a circus clown" is true iff Fred is a circus clown 

is a useful, but somewhat obvious first step, in the sense that it claims that we understand the "mere sequence of symbols" on the left because we can process the semantics of the "real sentence" on the right. It pushes the real action "off stage" trusting a human mind to make sense of it. Even the values "true" and "false", which we take to be deep semantic values get their real power from our understanding of what they mean "off stage".

We feel that we've made progress this way, since we trust our intuitions about the semantic domain better somehow, but in another way, we've just pushed the real semantic interpretation off to another level, and declared ourselves done.

If we go the further step, and ask how that next level works, we find that we are back in the same spot: with a "syntactic" description in one language on one side and a "semantic" description in a another language on the other. To prevent an infinite regress, we have to ask ourselves if there is some way to beef up the the translation, so that the very process that makes the translation is the semantics. (This isn't that far from the idea of semantics as a functor, you may note).

If you go down this road, the Tarskian approach is just the "trivial case" with relatively or completely unstructured arrows as your witnesses to implication. The CH and realizability approaches just use richer languages to construct the witnesses to implication.

So to paraphrase how I understand the Isabelle case as an illustration, the actual logic that Isabelle uses is really the implicational fragment of a presumably intuitionistic logic, and the propositions, which are meaningfully structure to the human user, are where the "real logic" is, but these are just empty symbols with no constraints on them to Isabelle. Mapping the implications to arrows in a lattice simply ensures that implication is transitive and that no arrow points a ⊥/false, except the identity arrow for ⊥.

The implications have minimal structure, essentially no more than "if you give me meaningless symbols A, I will give you meaningless symbols B". And there is a second "semantic evaluation" function that is similarly structureless, in that it is just a look-up table that says "if you give me meaningless symbol A, I will give you the token "true"" (or instead the token "false").

Now, it is a very small step from here to a simple CH interpretation. Since our symbols are meaningless anyway, lets call them types, and instead of a "semantic evaluation" function, a type A either has a single inhabitant a or is empty. And instead of mapping our implication arrows to a lattice, we rather build arrow types A → B that are said to be inhabited iff their result type is.

As a final step to ensure that our set up is valid, we perform "cut-elimination" on our inhabited implication types and test that each input type in our implication chains is also inhabited, so that we know that we aren't just "making it all up" using ex falso.

This is pretty darn close to trivial, but it still demonstrates that such an interpretation fits the framework, and you could see how you would get richer and richer systems by expanding the content of the languages involved to make more meaningful semantics.

### Once more with feeling...

I just realized that I forgot to comment on one other statement:

The fact that the thm type only constructs terms according to these inference rules is simply a matter of type abstraction, like maintaining the balance of binary trees.

That is a pretty strong propositional guarantee on the structure of binary trees, don't you think? Would you not consider that a guarantee with both logical and proof-theoretic content?

### Label Anxiety

It would seem that Marc and I are neo-Platonists: we believe that there's an "out there" out there

Given that I don't believe that Plato was a Platonist (a topic for a different community blog), I'm not 100% comfortable with the label. ;-) But I have an analogy that might also answer Matt M's response.

In the old Road Runner cartoons, there was a running gag that Wile E. Coyote would run off a cliff, but he wouldn't fall until he looked down and realized he had nothing beneath him. The joke is that the coyote gets to avoid the law of gravity until he is aware that it applies and "applies" it.

But we all know that falling is just a constraint that arises from the set of circumstances that are referred to by "going over a cliff". In the absence of gravity, would it make sense to call it "going over a cliff"? No agency is required in order to fall.

Similarly, I would contend that the very definition of what it means to be a type, a (mechanized) proof, a proposition and a program puts them into a certain kind of relationship that we happen to label the "Curry-Howard correspondence" (I'm not crazy about using "Isomorphism", for the reasons mentioned earlier, though that is traditional.)

This relationship is a necessary consequence of the structure of semantic systems that fit those definitions.

Specifically to Matt M's point:

One day, a programmer implements a regular expression matcher in a way that uses unbounded space.

I think this proves one thing: a bad programmer is not a sound logical system. ;-)

### > I'm starting to wonder if

> I'm starting to wonder if there is some ideological reason to
>> repudiate CH that I'm not aware of. Or worse, one that I am aware of...

I always wondered how conspiracy theories get started. Now I know.

### Usenet 1995

I always wondered how conspiracy theories get started. Now I know.

No conspiracy theory is being imputed, just that there might be some reason that hasn't been named in the discussion to want to disassociate Isabelle from CH provers, or at least emphasize the differences from them.

Not entirely sure what those reasons might be, but I re-read some of your comments earlier in the thread about classical vs. constructive, and catering to mathematicians, and I started to wonder if there wasn't something non-technical I was excluding from the discussion that would help me better understand it.

### Goodbye

Well, you have not responded to any technical arguments that Sam or I brought up, so this is funny ...

I can understand your desire to unify stuff (like, everything related to types and proofs is CH). After all, this is a great part of the appeal of mathematics. But just as important is the ability to keep apart things that are different (because, why would otherwise unifying be so interesting?).

### Huh?

Well, you have not responded to any technical arguments that Sam or I brought up, so this is funny ...

I was trying to understand your "technical arguments", and I explained exactly how I was hearing your arguments so that we could sort things out.

I can understand your desire to unify stuff (like, everything related to types and proofs is CH). After all, this is a great part of the appeal of mathematics. But just as important is the ability to keep apart things that are different (because, why would otherwise unifying be so interesting?).

Both unifying and distinguishing are important parts of any theory, and they certainly are both important for proofs. What you consider the same or different usually results from your idea of what you are trying to do both technically and socially within the context of your field. People who build proof assistants probably make different distinctions from those who use them or who study proof theory and type theory, and we might have found what those distinctions were through discussion.

You guys could have said: "we don't see things that way; for us factor X is the important distinguishing criterion for this contextual reason" and we could disagree about those factors, but at least we'd understand each other.

But it's pretty hard not to read the response "no, there is no possible way that this has anything whatsoever to do with Curry-Howard" as a case of "doth protest too much".

### Bingo

Marc Hamann: disagree: CH is a discovery about the nature of things that holds whether you are thinking about it or not.

Exactly. The way I've put this in face-to-face discussions is "The Curry-Howard Isomorphism isn't optional any more than the law of supply and demand is." That is, there are people of a certain philosophical bent who will (as I think you and I believe) pretend otherwise, but there will always be existence proofs to the contrary. All that's left is to decide what aspects of either side we find compelling. Steven likes that Isabelle starts from the classical tradition, and nevermind that that Constructive Type Theory and Curry-Howard-based extraction are right there in the standard distribution. I like the Constructive approach, so I appreciate that Coq is based on the Calculus of Inductive Constructions, and nevermind that classical logic and set theory are in the standard and contributed libraries. :-)

I regret that you and I, and Steven and Sam, seem to have ended up talking past each other. I really would be using Isabelle if I weren't using Coq, and I think there's a lot I could learn from their participation here.

### Goodbye to you, too

** Marc Hamann: disagree: CH is a discovery about the nature of things ** that holds whether you are thinking about it or not.

* Exactly.

The triangle inequality is part of the Isabelle standard distribution, too. It also holds, whether you are thinking about it or not. It does not explain how Isabelle works, unfortunately.

### de Bruijn Criterion

If I understand correctly, the answer for both Isabelle and Coq is the de Bruijn criterion: in Isabelle's case, the claim is that the "thm" type and its operations is small and simple enough to be audited by a human mind, and the LCF approach ensures that all proofs in Isabelle inhabit the "thm" type.

In Coq, we have a type/proof checker by the Curry-Howard Isomorphism such that checking the type of a proof term in Gallina is isomorphic to checking the proof, and that type checker is also small and simple enough to be audited by a human mind.

Observations like this are why I find claims of "vast differences" between Isabelle and Coq... well... silly. Culturally, perhaps. But technically, a lot of what's articulated as distinguishing them strikes me as distinctions without a difference.

By the way, thank you for the great follow-up elsewhere in the thread. You managed to say everything I wish I could have said had I only been more patient and thoughtful.

### de Hamann Criterion

the LCF approach ensures that all proofs in Isabelle inhabit the "thm" type

That still sounds pretty Curry-Howard to me. ;-)

You managed to say everything I wish I could have said

I think you did fine.

My secret is that I always try to send my first, "more interesting" response to /dev/null. ;-)

### I think we are talking about

I think we are talking about the same kind of isomorphism here :-)
Paul rightly refers to "Program Extraction in Isabelle" as being based on Curry-Howard. I shared an office for five years with the guy who did that work, so I heard about that work, yes.

Is this kind of program extraction an essential part of Isabelle? No. It is a cool tool inspired by the work that has been done in Coq on program extraction.

I am not interested in spreading misleading information about Coq or Isabelle. But without a doubt Isabelle is closer to a system a classical mathematician would use than Coq. Coq is more appealing to many computer scientists than Isabelle is. There might be exceptions (there always are), but this is the general tendency.

### FWIW

I think this is a fair point: Steven rightly points out that classical logic and set theory are at the core of Isabelle, and I'm quite sure that this informs the development of mathematical proofs in Isabelle in ways that might be supported out of the box in Coq, but are nevertheless culturally de-emphasized, the Coq emphasis instead being on Constructivism, as you would expect for a tool based on the Calculus of Inductive Constructions. :-)

Sam and Steven have both clarified their meaning quite well in my opinion, for which I'm grateful. I certainly don't believe that they intended any misrepresentation, and if I gave the impression that that was their intent, I must apologize.

### Isabelle has proof terms

As an Isabelle implementation guy I feel obliged to clarify the situation:
Isabelle had LF-style proof terms since 2000 (cf.
http://www4.informatik.tu-muenchen.de/~berghofe/papers/TPHOLs2000.pdf), but for performance reasons they are deactivated most of the time and if activated are only logged and stored inside the LCF-style thm datatype. One can then later replay or analyse the proofs.

Regarding the complexity of the calculi, the raw Calculus of Constructions (perhaps even raw pCiC) seems far simpler to me than Isabelle/Pure's underlying calculus with all its extensions (which has not even been officially written up in its newer form). I can't really comment on if that applies to the calculus underlying the full Coq kernel (is e.g. universe polymophism / "algebraic universes" treated in a formal writeup?). Regarding Isabelle, only lately a project was started to reduce Isabelle's extensions to normal HOL as found in HOL4, HOL Light etc (and then translate Isabelle/HOL theories and theorems to Isabelle/ZF; forthcoming paper at itp2010).

Interestingly there are sentiments (at least in the Isabelle Developer Community in Munich) that it would be cool to implement "soft types" outside the kernel to be able to experiment with custom "type" systems, where "types" are merely special terms (not unlike the situation in Coq).
Those "type" systems would then just be theorems about the semantic interpretations of the "types", animated by custom proving procedures.
This is not really a new idea for HOL-like theorem provers, see e.g. Joe Hurd - Predicate Subtyping with Predicate Sets.

Perhaps this concept of "soft types" can best be understood in the context of a set theorists informal usage of "this thing has type ..." (as opposed to the more primitive notion of "this is an element of ...") when talking about functions or other concepts abstracted from the implementation language ZFC.

More formally perhaps "types" are mathematical objects which can naturally be described via their external interactions, i.e. category theory. You should certainly not unfold the definition of "types" too often.

### Our Ugly Code

Very few programmer's code (the actual source text) is appealing to look at or even generally easy to comprehend. My sense, after many years of reading ugly code, is that there are a few reasons for this:

1) Programmers think that there is no limit to the benefit of being able to see more code at one time, regardless of how much code can be seen or how large is the display on which it's viewed. (This is often expressed.)

2) Programmers have a (I hope unconscious) belief that source code that fits in less space on the display is more compact when compiled and more efficient when executing. (This is a pure hunch on my part.)

3) Programmers are not trained in typography or graphic arts, and have few good intuitions about textual information presentation. (I'm pretty sure this is an actual fact.)

So even code that is architecturally and algorithmically good and encoded into an implementation well (artful and / or elegant, if you will) can still be quite ugly to behold on the page or screen.

I don't believe this must be so, but given how vehemently programmers argue for their inscrutable "styles," I think it will be an empirical fact for a long time to come.

### Somewhat guilty

2) Programmers have a (I hope unconscious) belief that source code that fits in less space on the display is more compact when compiled and more efficient when executing. (This is a pure hunch on my part.)

You know, even though I know this not to be the case for compiled languages like C or C++, I have to admit that when writing performance critical code I have at times written code with shorter variable names and fewer lines because it just "feels" faster.

These days, with reflective and interpreted languages, there is at least some cost to writing less dense code with longer variable names. E.g. in Java the class and method names (and in debugging mode, local variable table) are stored in the class files, taking up space. Verification and linking of said code requires parsing, examining, and comparing names. I can't count how many dozens of papers I have seen on reducing the space taken by Java programs by targeting this metadata. In worse dynamic languages, source code has to be parsed before it is executed, dispatching methods by name may end up consulting hashtables, etc.

In the end, I think for Java the cost is relatively small--but still, this cost is nonzero and I can at times be horribly miserly. (Perhaps that is from working for years with a MCU with 4KB of RAM).

### More Guilty

If it looks wrong, it is wrong. If it looks right, you should be able to state invariants. If the invariants look wrong, it is wrong. If the invariants look right, test it. If it passed the tests, maybe you did it right.

From the "I try hard, but cannot write code anyway" league ;)

### There may be a small penalty

There may be a small penalty for longer function and class names in many compiled implementations, as they are often represented directly in the binary. Where the cumulative effect of that is noticeable or not is a different matter. Also, programmers working in a language like C++ often do care about cumulative speed of compilation and linking.

### Ah, you are right.

Ah, good point. Dynamic linking of C/C++ libraries requires symbol tables, though IIRC ELF's linking information includes serialized hashtables to make this faster. Typically the debugging information is in a section of the ELF file that is not mapped into memory and thus shouldn't cost anything, unless it occurs in the file between the program header table and the loadable sections, which would require seek()'ing past it.

Agreed on compile time as well.

### For the dynamic runtime, my

For the dynamic runtime, my guess is that the cumulative effect of longer names on memory usage/throughput when different binary sections are loaded into memory/cache is more important than the longer time to do a hashing or string comparison.

### Feeling faster vs. truly fast runtime reflection

Ben,

Supporting fast runtime reflection has always seemed like a binary format issue. The format needs to be designed with en vivo studying of system characteristics in mind, and in situ updates need to be very fast. Although I have not read papers on Java or other languages, I am curious what "many dozens of papers" try to reduce metadata space and how. I don't think the Java standard allows much leeway in this design. In general, languages with an operational definition of reflection tend to have very sloppy runtime overheads (IMHO); I don't even like Tanter's work in this regard and think the best work on reflection to date is Filinski's work on monadic reflection and Manuel Clavel's work on reflective term rewriting. The reason is that how reflection is used in most distributed systems equates to defining a really large Lookup Table in an associative memory; and having studied many "open" distributed systems, they do not seem to take into account dynamic federation (like an object capability model necessarily would) and assume lookups will never partially fail. They also tend to use Lookup Tables to perform re-assembly, but they define re-assembly operationally and so the code is quite convoluted.

So these VM designs like Java strike me as very poor for large scale computing. In other words, this problem is yet another example of how VM design is best solved by junking something like Java (and Swing, etc.)

### Even in Java configurations

Even in Java configurations without reflection (e.g. CLDC), the metadata in class files is still a major source of space usage. There is a never ending stream of papers on the subject. Just look at the citation list of Pugh's 1999 paper here and do a breadth first search. Dozens might have been an understatement.

Orthogonal to the reflection issue with Java, the constant pools of classes are not shared, even in JAR files (JARs are simply ZIP files of class files and a manifest file), so every time a class A references another class B's field or method, B's full class name including package name, the field name, and the field's type show up in A's constant pool. People have clamoring for shared constant pools in Java since time immemorial. Android's DEX file format finally got this right--it collects classes together much like a JAR but is a custom format with shared constant pools, which is much more dense.

As for the rest of your comments about reflection, my own personal view of reflection is very dim. Metaprogramming, of which reflection is just the tip of the iceberg, can be incredible useful in certain situations. Most of the situations I have encountered that are compelling to me are cases of a universally useful subsystem--such as serialization--that operates on client data structures of any kind. Such a subsystem would be written once, never modified, and then reused. Its reflective nature obviates the need for it to ever be modified. At the other end of the spectrum, there are some use cases where programs do some minimal configuration using reflection; e.g. the user specifies an option to the system in the form of a string, and the system finds its internal components implementing that option by reflection. Then there is a third kind where the program uses reflection pervasively to compensate for lack of other language features, or to avoid lots of boilerplate code in poorly designed OO application.
I can't prove it but I think all of these use cases cry out for not a single reflective mechanism, but other, more principled and more targeted mechanisms. I dislike reflective programming in general and wonder whether there really is such a need out there for it.

### Bracha on Java serialization (and language design)

Re: comment-60313 (emphases mine):

As for the rest of your comments about reflection, my own personal view of reflection is very dim. Metaprogramming, of which reflection is just the tip of the iceberg, can be incredible useful in certain situations. Most of the situations I have encountered that are compelling to me are cases of a universally useful subsystemâ€”such as serializationâ€”that operates on client data structures of any kind. Such a subsystem would be written once, never modified, and then reused. Its reflective nature obviates the need for it to ever be modified.

This reminds me of Gilad Bracha's recent tanget on orthogonality in language design:

The reliance on class names is also an issue. What of anonymous classes? This is a problem in any case, but aggravated due to the reliance on names.

â€¦ the serialization team was, however, perfectly justified in assuming every class had a well defined name. They were working with Java 1.0, before the introduction of inner classes. Likewise, the inner class team was working on a system without serialization. No one saw the conflict until after the release combining the two â€” when it was far too late to do much about it.

### Orthogonal to the reflection

Orthogonal to the reflection issue with Java, the constant pools of classes are not shared, even in JAR files

Ha! I remember when I was a freshman in college and my friend told me how much memory the String class required.

Android's DEX file format finally got this right--it collects classes together much like a JAR but is a custom format with shared constant pools, which is much more dense.

Well, thank you for the ego boost. But I would hope everyone should know enough about linkers and loaders to realize a poorly designed binary format will cause headaches. I also hope people here understand mime-type (DEX, JAR) should be orthogonal to VM design so long as they define the same resource (deploy-time unit of work), even if it isn't.

Most of the situations I have encountered that are compelling to me are cases of a universally useful subsystem--such as serialization--that operates on client data structures of any kind.

I am not sure why you have focused on a particular case for serialization, such as client data structures. Most of the reason for Smalltalk's structure involved features like creating a restore point, trying out a particular idea and then, when rolling it back, having scheduled controllers redraw the environment in the correct order so that the restore point actually works. Newspeak does similar stuff, but it has a thunking layer to protect itself from the underlying environments volatile memory, so that object references do not depend on the operating system's resources directly (e.g., Brazil should not depend on Windows' GDI Objects). This is portability that enables you to save a full environment as a snapshot and migrate it to a new machine and load it there.

At the other end of the spectrum, there are some use cases where programs do some minimal configuration using reflection; e.g. the user specifies an option to the system in the form of a string, and the system finds its internal components implementing that option by reflection.

A design tenet of object-oriented engineering is to isolate invariants through problem domain analysis, and then put variables in external configuration zones. This makes unit testing features like message interaction braindead easy, because variables unrelated to the semantics of the problem domain are not "hidden" during testing but must be supplied. The scope of integration tests therefore cover these random factors that cannot be modeled explicitly through structure. It is important that a subsystem and supersystem do not depend on passing this "configuration" back and forth between each other, because the yo-yo of the Context object will require synchronization between the supersystem and all subsystems that depend on the Context. I also see overly-active Context passing in current graphics architectures, like WPF. Awhile back the Visual Studio product team wrote a post about its lessons on performance tuning WPF, and one of the things that didn't surprise me was that they eliminated fine-grained, incremental loading of resources which required constant rebuilding of the computation graph (causing the page layout synthesis subsystem to thrash on continuously invalidating constraints). To solve the problem, what they effectively did was implement a Context object, and did not allow subsystems to tell the supersystem to change the Context object. Instead, any changes were batched together and the interface for changing the Context object was separate from the Context object itself.

Then there is a third kind where the program uses reflection pervasively to compensate for lack of other language features, or to avoid lots of boilerplate code in poorly designed OO application.

I view this third category as hand-waving by you, and have seen it by others, so you are not alone :). In my experience, it is not enough to say that reflection is used in place of other language features, because it indicates we can just Scrap Your Boilerplate (SCB) or whatever. Languages like Steve Dekorte's Io make the boilerplate invisible, and so does any language with extensible syntax (e.g., SGLR, PEG, Earley, Pratt), so how can it be that the same semantics are boilerplate in one setting and "compensating for lack of other language features", but in another setting the same semantics have no boilerplate? For example, SGLR even eliminates The Lexer Hack from grammar design concerns, so the qualitative change is structural and not just merely related to syntax. Without such a structural change, every lexer must keep track of the flow of tokens from the tokenizer, creating a state machine in the lexer that "guards" against the wrong token stream. This is not modular, and means lexical analysis is not as simple as it could be. When you are trying to analyze something, a rule of thumb is to make your analysis as simple as possible, using the simplest model possible.

I can't prove it but I think all of these use cases cry out for not a single reflective mechanism, but other, more principled and more targeted mechanisms.

Generally speaking, you will want to combine reflection with other useful language mechanisms useful for term rewriting, like pattern matching and unification. Actually, reflection with pattern matching and unification has a synergistic energy to it. Furthermore, you will want an elegant, simple syntax for reflection to be effective. I've grown to liking Maude's syntax conventions, although sometimes I find them too terse and too much the product of mathematicians who like shorthand too much (IMHO).

I dislike reflective programming in general and wonder whether there really is such a need out there for it.

Reflective programming is pretty common in Maude, and a lot of the toolchain is implemented using reflective term rewriting. The use of reflection is actually efficient and allows specialization of term rewriting, such as defining subsorts using constraints on sorts. For example, a number of the performance optimizations done in Object/Relational Mapping frameworks would be best modeled in a tool like Maude, with the help of a meta sublanguage for peephole optimizations (for directing towards a favorable strategy) and rule-base optimizations (for eliminating bad reduction strategies). For example, ORMs like Hibernate use lazy loading to allow the programmer to specify what they want in a unit of work, and behind the scenes the unit of work object tells the engine what the user requested and the end result is the cheapest query that satisfies observational equivalence. What all ORMs do not allow you to do, though, is metalogic substitution and rewrite the query in terms of new demands. So you can't invert the query and transform it with a new meaning function and also convert a query into a query model (similar to how RDBMSes store queries internally to maximize query plan reuse, but instead of defining physical path strategies on indexes, etc., defining logical slots for substitutions); speaking from experience, this can be very useful and can eliminate duplicate code. With caching, it is effectively as-if we did duplicate code. It also allows performing model production just-in-time.

I just want to conclude by encouraging you to blow your own mind every day. Have a look at ASF+SDF Meta Environment and other metaprogramming tools, and go back to the original vision of automatic generation of environments with the idea of attribute grammars. Follow the rabbit trail.

### Down with Reflection! *

* "Down" in the sense of "kill that mofo!", not in the sense of "gettin' jiggy wid it." Just so you know.

In my experience, it is not enough to say that reflection is used in place of other language features, because it indicates we can just Scrap Your Boilerplate (SCB) [sic] or whatever. Languages like Steve Dekorte's Io make the boilerplate invisible, and so does any language with extensible syntax (e.g., SGLR, PEG, Earley, Pratt), so how can it be that the same semantics are boilerplate in one setting and "compensating for lack of other language features", but in another setting the same semantics have no boilerplate?

It's not. SYB is compensating for lack of other language features, for example the fact that Haskell cannot treat isomorphic datatypes generically.

For example, SGLR even eliminates The Lexer Hack from grammar design concerns, so the qualitative change is structural and not just merely related to syntax.

I know what the lexer hack is, but can you explain how SGLR addresses this? (I want to refute the usefulness of reflection here, but I don't know what I am refuting. :)

Maude

I remember looking at Maude's reflection once, and concluding that it did not really fit my definition of reflection. (I should say, "our"; when I worked at Utrecht with the Generic Haskell group, we all looked at it and thought so.) As far as I remember, it just represents Maude programs using abstract syntax plus some syntax extension as a normal datatype and then interprets the code internally using a tower of interpreters. Not much different than camlp4, except not external. Maybe Maude has changed since then, but my idea of reflection is something like "eval", which doesn't involve a tower. I think in Maude there is no interaction between the meta-level and the object level. (Edit: Or rather, no way to affect the meta level from the object level dynamically.)

My objection to (genuine) reflection is that it subverts the abstraction features of the host language. For example, for me a cardinal rule of programming languages is that the semantics of an object should not depend on how you name it. (Don't misinterpret this; you know what I mean.)

With a reflective tower like Maude, you are not doing that, because the Full Maude system is a different application than the "Simple Maude" one; the fact that the syntax of the latter is (syntactically) a sublanguage of the former's is just an accident.

I'm not apologizing for Maude; I'm just saying that what supports is not reflection.

(Edit: Ehud, it would be nice if the allowed tags for comments included ins and del.)

### Re SGLR applied to lexer

Re SGLR applied to lexer hack: GLR parses ambiguous productions simultaneously until all but the correct parse remains (or alternatively you end up with a forest of parse results). With scannerless parsing, the same approach can be applied to the typedef / identifier token ambiguity.

The relationship with reflection is by analogy, but I'm not sure it holds up meaningfully. Riffing on your SYB category, reflection can be used as a way of inefficiently implementing language features. I would define the category is those situations where you write imperative code using reflection that runs at runtime and consumes declarative code written by the programmer, where a proper language feature could consume that code at compile (or at least, parse / analysis) time and produce a more efficient result, holistically. In other words, the bit that does the "language" understanding exists at a different (and later) phase in the path of code to execution, and doesn't require integration with the compiler.

But trying to make the grammar analogy fit in here doesn't work as a refutation of the category, IMHO. I can see grammar definitions being put together in a parser-combinator like fashion, but only for the purposes of declaration, and henceforth processed by reflection to be transformed into the final implementation. Consider e.g. how .NET's LINQ uses reflection (along the way) to turn an expression which evaluates to a query description into an actual query, at runtime; the reflection is used to get metadata (attributes) associated with various fields and classes (=> tables) that appear in the expression, to guide the transformation.

.NET could have done this at compile-time (as historical query-integration languages have often done so), but it wouldn't have been as flexible, both in terms of imperative, dynamic construction of query expressions, and relative openness to alternative back-end implementations. By doing a certain amount at runtime using reflection (though of course static support via extension methods etc. helps), MS avoided having to create a compiler plugin API or similar to let other vendors implement SQLServer/etc. parity support.

### What's Bad In My Books

My objection to (genuine) reflection is that it subverts the abstraction features of the host language. For example, for me a cardinal rule of programming languages is that the semantics of an object should not depend on how you name it. (Don't misinterpret this; you know what I mean.)

What I think is awful is compile-time erasure and run-time reflection on type information (as in Java 5 Generics). The reason is mainly cognitive, having seen many Java 5+ applications written that use run-time type information, and the programmer is not cognizant of this subtlety. I can only guess how many (dangerous) real-world bugs this has caused. Too many Computer Science Ph.D.'s I've encountered cannot even write correct Java reflective code due to this inconsistency, let alone line-of-business Java code-grinders. A good example is the Hamcrest pattern matching library for Java, built in part by Dr. Nat Pryce.

In short, I know exactly what you mean, and agree with you that you should always strive to model problem domain invariants structurally, rather than by name. This is true whether you adopt a functional view or an object-oriented view. I also think you are suggesting something else: that Haskell's reflexivity, in conjunction with its use of a Hindley-Milner type system, is a richer form of generic programming than Maude, presumably since the type inferencing builds out the most general algebraic structure for you, and the parts that cannot be provided through basic language features can be provided reflexively (e.g., "SYB is compensating for lack of other language features, for example the fact that Haskell cannot treat isomorphic datatypes generically."). I am not sure I want to debate this with you, since I've read most if not all of Lammel's papers, including the 100 page one comparing generic programming approaches in Haskell, and found it dizzying to hold in my head. Any conversation about this stuff would kill my brain cells. If you'd like to write an expanded critique of Maude, go ahead, I'm sure it will be interesting to many.

I think in Maude there is no interaction between the meta-level and the object level. (Edit: Or rather, no way to affect the meta level from the object level dynamically.)

Because you should model problem domain invariants structurally, there is little need I've seen to allow the object level to affect the meta level (edit: strictly speaking, by definition, a reflective tower is a simplified meta-object protocol in which the object level cannot dynamically affect the meta-level). If the object level affects the meta level, then it is only via a recursion on sequential values that the meta level can read and react to. I believe this sort of isolation is extremely important, and the raison d'etre of languages: since we cannot rely upon machine language to guard against incorrect operation, we must build language interpreters to protect programmers from invalid machine usages. Isolation is a basic requirement for fault-tolerant program interaction.

My advice here is entirely about method, not subject matter (e.g., eval vs. syntactic reflection vs. declarative metaprogramming, etc.).

I know what the lexer hack is, but can you explain how SGLR addresses this? (I want to refute the usefulness of reflection here, but I don't know what I am refuting. :)

Spoofax (the mother--of-all-SGLR-demos) uses what some call "declarative metaprogramming" and not eval. Barry appears to have understood my analogy was about method, not subject matter.

### Funhouse Mirrors

The concept of 'reflection' is terribly distorted by existing uses and implementations. We'd probably avoid a lot of miscommunication, equivocation, and offense if we were to name a few different classes of 'reflection', but I don't feel up to the task.

My understanding of reflection as-a-language-feature requires that it be an ambient authority in the language. Naturally, this allows that reflection could be achieved without making it a language feature. As an example that I'd probably be happy with: take Gilad Bracha's Mirrors with the restriction that there exists no ambient authority to ObjectMirror - i.e. that authority must be designated within the constraints of the object capability model and very well might not be available to arbitrary programs.

I'd be even happier if this non-ambient reflection was also not part of a 'language standard'. Instead, provide an interpreter or model of the language in itself, and have instantiations of this interpreter provide the reflective authorities. Developers may then target programs to this interpreter when they need reflection - or otherwise wish to perform language experimentation. Reflection never becomes part of the language standard, except in some de-facto sense that this interpreter is readily available to language developers in all languages. If provided with an open or extensible model, the resulting interpreter would also be a powerful basis for experimentation and developing optimizations and new language features, and long term will certainly pay for itself and then some.

And that, as I understand it, is the essential basis for reflective towers. The reflection is not a language-standard feature, but is rather a feature of an implementation of a language modeled within itself.

But all this is a wild divergence from what I'd normally consider 'reflection', which is more along the lines of ambient authority to tweak and observe the meta-layer from within the object layer. The various advantages of relfection and self-modifying code would tend to apply (monkey-patching, AOP, stack analysis, down-casting, supporting ad-hoc language features such as mixins and multi-methods, etc.) but without structure and security required to control dangers and dodge widespread runtime costs.

It is with regards to ambient authority and language standard reflection that I say: Reflection is a design bug. And I'll agree with Frank: !ofom taht lliK

### Most people's dense natural

Most people's dense natural language prose is ugly to look at and read unless it has been edited for improvement. Source code for large programs is so long and gets modified for so many other reasons that no attempt is made to optimize these aspects in most cases.

### proof as certification, etc.

TL;DR: a proof certifies heavy effort, but heavy effort guarantees nothing

Take the context of NP computational complexity. Two analogies:
1. Compare to the establishment of NP-hardness.
You might have a whole bunch of smarties work real hard for a long time on getting a fast algorithm. If they fail, all you can say is "we failed. It's still slow". If, on the other hand, you prove the problem NP-hard, you certify that the problem will be slow, and your approximation efforts can be evaluated on their own merits.
2. Compare to the exhibition of a certificate
NP complexity can be defined based on a certificate: a solution-certificate is P-time to confirm but NP-time to generate. Simply being able to exhibit a supposed solution is not sufficient; one needs to establish a mechanism to verify that a solution satisfies properties in P-time.

In each case the idea that you've "worked hard and produced a decent answer" is unsatisfying. We instate mechanisms so that answers may be strongly, publicly, and mechanically validated.

This certification perspective illuminates the spectrum of verification. A type system takes much less effort than a fuller proof of semantic correctness-- but it's public and certifiable evidence. The end user may not care about the strict and thorough typing, but the programmer can start with some kind of confidence. Instead of reporting to one's boss that "I looked at the code a lot and I think I used the data structures correctly", one can say "it typchecked" and establish the property publicly and confidently.

In my logician-philosopher hat, I'd describe it as an observational fallacy. We, as implementors, notice that most proof efforts boil down in practice to careful and painstaking analysis after which the exercise of formal proof-creation adds little. But we cannot then infer that the analysis would be identical without that final step. The destination can have causative action on the planning and execution. The proof itself may have value that we, as implementors, do not recognize.

For the former exogenous benefit, consider specification. Just as a natural-language specification without code tends to be wishy-washy, analysis efforts without a formal product will tend toward the same. The sheer experience of trying to use a specification to produce code will find bugs not evident on the most thorough perusal. Crashing one's assertions against an unforgiving and unassuming formal system guides analysis towards greater clarity and precision.

For the latter, think of how simpler the battle over the Toyota brake problem would be if they had a proof-certificate that such problems can't happen. Sure, the proof could still be wrong, but the sheer presence of the certificate shows that they put in the requisite effort to establish the property. Saying, especially after the fact, that they did their darndest but have nothing to show for it, means little.

Thusly I find the original statement deceptive and difficult, as it entertains "doubts about whether the value of proof is justfied by its cost". Accepting little value in the proof itself makes it too easy to ignore exogenous changes and benefits that alter its value writ large. I remember some colleagues entering the ACL2 seminar triumphantly after finding a bug in an upcoming commercial processor. The bug had escaped all the other tests, found only after trying to prove its absence. So not only did the process of proof find a bug, but the final establishment of a proof gave a strong assertion that other bugs are most likely absent. The proof gave the company a confidence in the thoroughness and efficacy of the bug-hunting not otherwise available.

### NP-slow

Ignoring context (I understand you meant the NP arguments didactically), you are misstating things. Just because a problem is NP-hard in the general case does not mean it is slow in the average actual case.

Nor does a problem being NP-hard mean that a bunch of smarties can't find a solution - i.e. by tweaking the problem. For an optimization problem, perhaps an epsilon optimized solution will meet user requirements (no more than X% worse than optimal), and will often be much easier to achieve.

Finally, your mention about 'verify the solution in P-time' only applies to NP-complete problems (a particular class of decision problems), not to NP-hard problems in general (as you named on point 1).

### Confirmed

Ah, you're right, I changed lanes from NP-hard in (1) to NP-complete in (2) without signaling.

I think you've confirmed my point, though, and perhaps we are in violent agreement. As you mention, getting out of the "NP-slow" case requires shifting the problem slightly. Approximations are popular, as is restricting the input space.

However, the approximation comes at a different stage of the process than the proof. If you come back to a customer with "we used an approximation", they could object "why not exact? you failed!". If you also have a proof that the original problem is intractable, you now have perfect justification for not solving that problem exactly.

I've actually seen this play out in real life. We were delivering an approximation answer to a customer, and they objected to the imprecision (they were EE-educated, smart but used to the hardware space). We pointed out that a fully accurate solution to the problem was equivalent to solving the Halting Problem. They had issues understanding the ramifications there, but the point stands; without the assurance of a proof, we wouldn't have been able to fully justify our approximation.

### Very fair points. It has

Very fair points.

It has never been my experience that a user requirement, much less a 'problem', is so well specified that a customer is in a position to complain that you solved the wrong problem. Engineers have a lot of leeway in picking a subset of problems to solve - i.e. there are many different justifiable solutions for meeting user requirements, and each solution will involve solving a different set of problems. If you cannot change the problem, you're almost certainly allowing a customer to dictate the solution rather than the requirements.

That said, our limited foresight of these problems often interferes with choosing a realistic solution (much less an ideal one). A proof that a given problem is 'AI-complete' would certainly clarify even to ourselves (not just to the customer) that we've maybe backed ourselves into a corner, and that it's time to stomp across wet paint and accept some redesign and rework.

### Falsifiability

My belief is that the value of proofs is in their falsifiability. Any proof is useful to the extent that it offers you those details which might enable you to disprove it.

For example, The value of a type system is in the errors it blasts at the programmer at compile time (or even write time, in certain IDEs). Each error is a falsification of the argument implicitly made by the programmer's incorrectly-typed code.

OTOH, this is also why I've started to dislike "proofs" that are just a list of baffling theorem prover tactics. Imagine if you were reading a math journal and someone offered the following proof of P=NP:

> Posit: P == NP
OK.
> Auto
Qed.

How would you disprove this? Sure, the computer, at one point, said the theorem was proven. But I could empty a can of diet coke over the motherboard, and the computer would say all kinds of interesting things. The useful part is the falsifiability. I'd bet if I had a falsifiable argument that P = NP, I could actually use that argument to build an algorithm. I don't know how to do that from the above proof (although I believe this could be done automatically with modern Coq, which can express proofs as programs).

Hopefully I'm not starting any flame wars. These are just my thoughts of late.

### Proof Script vs. Proof

In Coq, the list of tactics applied is the "proof script," which, as the name suggests, constructs the proof (or "proof term," if you really want to be accurate). Once a proof has been completed, you can ask Coq to print it. I did this in my 100 Proof: Proof 1 post on my blog, where I used Coq to formalize the proof of a polynomial inequality in the integers. The proof script is very short, thanks to heavy proof automation for this class of proof in Coq ("intros. psatz Z 2. Qed."), but the proof term in the comments is nearly a page long, and completely incomprehensible if you aren't well versed both in Coq'a Gallina language and the library support for the Psatz tactic.

The principle reason that we trust a tool like Coq is the de Bruijn criterion: Coq's proof-checking kernel is small and simple enough to be audited by hand, if you understand the logic that it implements (which is itself small and simple). If you trust that it's sound, then you can trust any proof that it checks, no matter whether the proof term was constructed completely by hand à la Twelf, or by a (semi-)decision procedure like Psatz.

### but where's the understanding?

Surely a human could have written a more readable formal proof that would give a better understanding of precisely why x^2 - 2*y > 5 when x > 3 and y < 2. (For those not following links, this is what Paul proved in his blog post.) If I don't know why a proof works, then I don't know (for example) if the assumptions can be relaxed. Could we prove that x^2 - 2*y > 6? "intros. psatz Z 2. Qed." doesn't really tell us, unless it does so in a way I can't read. On the other hand a proof using monotonicity of (^2) on positive integers, and monotonicity of (2*) on integers shows us that we can prove it > 13.

Also, one other thing about proof scripts has been bugging me for a while: Why don't proof scripts have type signatures? Couldn't psatz have a type that restricts it to only be applied when it will find a proof? Aren't proof scripts just as hard to understand as untyped imperative programs? Well, except that if they do happen to produce a proof, at least you know it's correct, even if you can't understand it.

### Absolutely

A few observations:

An important question to ask before beginning a Coq development is indeed what the purpose is. There are three major ones that I tend to think about (i.e. there could be others):

1. A yes/no answer to "Is this true?"
2. A yes/no answer to "Is this true?" with an explanation that I can understand.
3. An extractable implementation of a type (= theorem by the Curry-Howard Isomorphism).

Different goals will indeed suggest different proof approaches.

Surely a human could have written a more readable formal proof that would give a better understanding of precisely why x^2 - 2*y > 5 when x > 3 and y < 2.

Sure. For simple mathematical proofs like this one, it might have been worthwhile to use Coq's C-zar language to write the script, and to make, e.g. the positivity lemma(s) and sum-of-squares approach to the proof explicit. It would, of course, be a longer script: one of the trade-offs of the "proof with understandable explanation" vs. "yes/no answer" approach.

Why don't proof scripts have type signatures?

IMHO (I'm not a Coq developer) because the Ltac tactic language is Turing-complete, and it would be potentially confusing to have a typed tactic language whose type system was different from that used in the Gallina language, which is not Turing-complete. Keep in mind that the point of Ltac is to make it possible to try anything that might construct a valid proof term, so it has essentially none of the constraints that Gallina does. All a tactic has to do is construct a proof term; it's still going to be checked by Coq's kernel.

Couldn't psatz have a type that restricts it to only be applied when it will find a proof?

Psatz could because it deals with a decidable fragment of the logic, but an arbitrary tactic couldn't, because the Calculus of Inductive Constructions is strongly undecidable.

Aren't proof scripts just as hard to understand as untyped imperative programs?

Potentially, but so what?

Well, except that if they do happen to produce a proof, at least you know it's correct, even if you can't understand it.

Bingo. So my suggestion is: use the usual tactics for yes/no answers, C-zar for when you want to understand the proof you're developing at the likely cost of increased verbosity, and the Program keyword for developing certified software to extract.