## Computer generates verifiable mathematics proof

NewScientist.com:

A computer-assisted proof of a 150-year-old mathematical conjecture can at last be checked by human mathematicians... ...Georges Gonthier, at Microsoft's research laboratory in Cambridge, UK, and Benjamin Werner at INRIA in France have proven the [Four Colour Theorem] in a way that should remove such concerns.

They translated the proof into a language used to represent logical propositions - called Coq - and created logic-checking software to confirm that the steps put forward in the proof make sense.

Georges Gonthier's home page includes links to the paper and the actual proof.

## Comment viewing options

"5. That Microsoft is granted back, without any restrictions or limitations, a non-exclusive, perpetual, irrevocable, royalty-free, assignable and sub-licensable license, to reproduce, publicly perform or display, install, use, modify, distribute, make and have made, sell and transfer your modifications to and/or derivative works of the Software source code or data, for any purpose."

### True

I didn't download the proof.I was so disgusted by the idea that there's a license attached, I simply clicked No, and moved on to other things.

### MS MapColor 1.0

Well, it's a constructive proof and Microsoft wants to defend its position in the map-colouring market. But I agree: it's silly.

### not to worry

Microsoft wants to defend its position in the map-colouring market.

No problem, Google MapColors will be much sexier anyway, 'cause its map colors will have drop shadows.

Although I like the NewScientist article the headline is wrong. The computer did *not* generate the proof. It merely checked it. I think this is a very important distinction to make if these methods are to be accepted by mathematicians. Many mathematicians are afraid that computers will take away the elegance and creativity of mathematics by finding (completely incomprehensible) proofs automatically. But proof assistants like Coq does not try to automate prooving (at least not to any greater extent). They focus on checking proofs which mathematicians should be much more open to.

### The title

Well, rereading it the title is indeed misleading. The original proof was computer generated. The current work is about computer verification of the proof...

### Definition of "computer generated"

If my understanding of the proof of the four-color map theorem is correct, the proof consisted of showing that all possible planar graphs could be mapped on to one of about 2000 different cases; a computer program was then written to show that in each case, a 4-coloring existed.

So... how much can be considered "computer generated"? The computer had the (monotonous for a human) task of showing a 4-coloring existed for each case. Such a thing can, of course, be considered a simple instance of theorem proving, analogous to the sorts of theorem proving carried out by typecheckers, bytecode verifiers, CPS programs, etc.

However, the intellectual heavy lifting still was done by a human mathematician; just as the intellectual heavy lifting in type inference is done by guys like Pierce, Milner, and Reynolds.

We're still a long ways off from being able to specify a proposition in some logic (such as the 4-color theorem), a relevant set of axioms to describe the system, and allowing the computer to synthesize the proof via transformation rules.

The incompleteness theorem holds that many propositions are not decideable; and the limitations of Turing machines further stipulate that even for those theorems for which a proof exists, a deterministic algorithm for constructing such proofs cannot be found. However, despite these limitations, human mathematicians are able to prove highly non-trivial theorems by suitable application of domain knowledge and heuristics.

Were a proof for a theorem as difficult as the 4-color theorem were to be completely synthesized by a computer, that would be an astounding result. The result described on this page is interesting and useful, and I don't mean to diminish their work, but we're a long way off from true computer-generated proofs of arbitrary theorems. And we might not ever get there.

### Minor matter

even for those theorems for which a proof exists, a deterministic algorithm for constructing such proofs cannot be found

If a proof exists then we enumerate all strings until we find one.

### ...a decideable deterministic algorithm. :)

What I meant, of course, is a deterministic algorithm which is decideable on a Turing machine. Enumerating a (countably) infinite number of strings is not.

A good chunk of the skill employed by a trained mathematician is knowing which infinite subset of the strings in the language to avoid; so only a finite number of cases can be checked.

### What mathematicians really do is different

The fact is, apart from a handful of people working in logic, no mathematician ever generates a string that could formally be called a 'proof'. What mathematicians actually do is generate strings (and pictures) designed to convince other mathematicians that they could, in principle, generate proofs. But in actual fact writing a correct formal proof of even the most simple published theorem is pretty hard.

### Assumptions and Shortcuts

All strictly formal proofs are predicated on explicitly laying out the assumptions being made. Without assumptions there can be no proof. What most mathematicians do is that they assume that the reader is already familiar with many of the assumptions and they require no formal proof in the current instance. Otherwise every mathematical proof would have to start from square one, proving the theory of numbers, sets, categories, etc... Most mathematical proofs are not intended for the casual observer. They are intended for those that are steeped in the particular mathematical domain and it's symbols.

And while on the subject of Maths, I find it enlightening to view the field of mathematics as not one singular discipline, but rather a whole variety. There are maths that are geared towards all sorts of things. The type of math that is best suited to the study of programming is not necessarily useful for physics (e.g. lambda calculus...). Anyhow, I think Maths is the more proper term than Math.

### More shortcut than assumption

Even taking into account assumptions mathematicians don't generate proofs in the formal sense. They don't even state their theorems formally. They use a mixture of natural language and semi-formal notation, nothing that a computer could recognise as a theorem, let alone actually derive a proof for. Even the most formal textbooks on mathematics come nowhere near 'formal' as a computer scientist might understand it. And that's one reason why I don't think there's any need to worry about competition from automated theorem provers, or even theorem checkers, for a long time.

### As formal as it gets

You may find http://us.metamath.org/index.html interesting for 'formal' proofs. 7 steps for 1+1=2 baby. Yeah !

### It's a little behind the leading edge...

"A natural number is less than or equal to one iff it is equal to one." Just added today. But I still think it's nice to see some people actually demonstrating that ZF really can do the job. And it's interesting to have a peek to see what manner of beast is usually swept under the carpet with lines like "It is trivial to see that...".

### CPAN, obviously.

This obviously calls for CPAN, the Computer Proof Archive Network!

It would be sort of like CiteSeer where you can automatically {look up,import} references to other proofs. I'm not sure if this is just a joke or would actually be useful.

--Shae Erisson - ScannedInAvian.com

### That would be useful. No joke.

There are entire libraries of proofs for the Isabelle and PVS theorem provers, building from first principles on up. More get added all of the time. This site lists a few such projects for Isabelle. It would presumably be useful to be able to automatically find, reference, and import those libraries, instead of hunt the internet for them.

### ghilbert

ghilbert set out to dome something like this, but it looks like the author has deserted it for nice outlines.

### Generality of lambda calculus

The type of math that is best suited to the study of programming is not necessarily useful for physics (e.g. lambda calculus...).

You have a valid point about the diversity of mathematical fields, but I have to (nit)pick on your example, because I think almost the opposite is true in that particular case. Structure and Interpretation of Classical Mechanics shows the usefulness of lambda calculus, via functional programming in Scheme, when applied to mechanics. Scheme is used here not just to calculate, but to help in understanding. This is described in some detail in the preface.

The informality of mathematical proofs which sigfpe has mentioned is is one of the problems which Sussman and Wisdom use Scheme to address, in the context of physical formulae as opposed to proofs:

Computational algorithms are used to communicate precisely some of the methods used in the analysis of dynamical phenomena. Expressing the methods of variational mechanics in a computer language forces them to be unambiguous and computationally effective. Computation requires us to be precise about the representation of mechanical and geometric notions as computational objects and permits us to represent explicitly the algorithms for manipulating these objects. Also, once formalized as a procedure, a mathematical idea becomes a tool that can be used directly to compute results.
...
Our requirement that our mathematical notations be explicit and precise enough that they can be interpreted automatically, as by a computer, is very effective in uncovering puns and flaws in reasoning.

Notions like mapping, folding, set comprehensions, bounded quantification, and nondeterminism allow one to write code that transparently reflects the underlying mathematical concepts and fosters clean thinking.

Many aspects of this are not unique to lambda calculus, but of course lambda calculus provides a good foundation for these operations. Further, Siskind as well as Sussmann and Wisdom do rely on lambda calculus features, like higher-order functions and curried functions.

Siskind posted a longer article talking about the benefits for research of being able use a "language that supports first-class aggregate data, user-defined higher-order procedures, and lambda expressions with lexical scope". I like his conclusion:

I have come to believe that, in research, it is really beneficial, and even imperative, that corresponding code and documentation have isomorphic abstract syntax. I conjecture that, if people really thought about it and understood the issues, the same would hold true in industrial software development.
Matthias Felleisen and Dan Friedman have also made similar points about the usefulness for understanding of expressing ideas in Scheme (which can be generalized to functional languages, if you're so inclined).

### Structure and Interpretation...

I'm not sure that book shows the usefulness of lambda calculus. What it does demonstrate is how useful it is to talk explicitly about functions. Mathematics texts often talk implicitly about functions by referring to a bunch of variables, say (y_i), that depend on another bunch, say (x_j), without making explicit the fact that y_i=f_i(x_j) for some functions (f_i). This can result in all kinds of ambiguities including the ones mentioned in the introduction you link to. Maybe by "lambda calculus" you mean merely "lots of explicit functions" in which case I suppose I agree.

### SICM & LC

Aside from the most basic features common to most PLs, the features of Scheme which SICM relies on most heavily are lexically scoped variables, first-class and higher-order functions, and currying. That's what leads me to describe it as lambda calculus. For example (from 1.9):

(define (F->C F)
(define (f-bar q-prime)
(define q
(compose F (Gamma q-prime)))
(Gamma q))
(Gamma-bar f-bar))
(show-expression
((F->C p->r)
(->local 't (up 'r 'theta) (up 'rdot 'thetadot))))


Despite the rather strong claim about Scheme in the preface, you could easily write such code in just about any functional language with lambda. However, you couldn't write it as easily in a language without the features I mentioned above. All of these features are characteristic of lambda calculus, and languages derived from lambda calculus. This leads me to conclude that the book demonstrates that LC is a type of math which is well-suited for use in the study of physics.

### Well suited iff...

...you plan to use computers for your proofs. Which I guess is a foregone conclusion these days.

Would the lambda calculus make sense for physics without computers (i.e. prior to the 50's)? The maths that gain dominance are a function of their utility based on the toolsets available. There are many archaic forms of math that were abandoned long ago. Makes reading old math treatises rather difficult as they have notation and specialized math that are no longer widely used.

Anyhow, you do make a convincing case for the usefulness of LC, so I withdraw my lone example. :-)

### Machine proofs keep us honest

Would the lambda calculus make sense for physics without computers (i.e. prior to the 50's)?

An interesting question. I think a case could be made that it would have made sense. One place that could run into trouble, though, is that without the assistance of computers to check the lambda expressions, the level of rigor might often seem pointless, and people would fall back to leaving the obvious stuff out. The machines keep us honest.

Anyhow, you do make a convincing case for the usefulness of LC, so I withdraw my lone example. :-)
Sorry about that. :-) I think the principle is sound, it's just that examples need to use something less general. How about: Riemannian geometry is useful for modeling spacetime and gravity, but not so much for modeling chemical reactions. (Best I could do on short notice.)

### Interesting claim

lexically scoped variables, first-class and higher-order functions, and currying

You'll find all of these things in even the most trivial linear algebra. For example, if g(.,.) is the inner product function, then \y -> (\x -> g(x,y)) (excuse my pseudocode) is the function that maps a vector to its dual. Any page of a pure mathematics book probably has many examples of currying, though few mathematicians would call them that. These are just operations on functions and mathematicians have been doing this for a long time. I don't know whether this is a clear cut argument that we should all be doing lambda calculus, or whether this just demonstrates that what mathematicians already do could be called lambda calculus but that explicitly calling it that adds nothing. Certainly the things that I think of as specifically features of LC, like Church-Rosser, aren't really of interest to most mathematicians.

### Re: Interesting claim

You'll find all of these things in even the most trivial linear algebra.

I didn't say these things weren't present in other formalisms, but SICM uses Scheme, and Scheme gets those features from lambda calculus. I'm not arguing that LC is the only thing that can do the job, but it is appropriate for the job, and SICM demonstrates that.

Sussmann and Wisdom certainly could have used a computer algebra system to achieve similar ends, but they didn't. A side effect of their choice of a functional subset of Scheme, is that they've demonstrated by their own experience and writing that LC is suitable for use in that field.

In fact, the linear algebra example only underscores the broader point which S & W make. Although the features in question may all be present in linear algebra, they're exactly the kind of thing which are assumed, and never laid out very explicitly, and which can lead to ambiguity. The SICM preface footnote gives an example, with the dual meaning of 'f' on either side of an equation.

Any automated system capable of expressing such formulae would help expose these ambiguities, to keep us honest in the sense I've described. However, LC has an additional quality which can be a benefit here, which is that it's also a computing formalism. While it can easily be used to express static equations, it can also be used in more computationally complex contexts, without having to extend the system in ad-hoc ways.

I don't know whether this is a clear cut argument that we should all be doing lambda calculus

To an extent, it is. Lambda calculus is a theory of functions. What theory of functions underlies linear algebra? We're talking about foundations here, mathematical logic. Working mathematicians may ignore those foundations much of the time, but I think the history of mathematics has shown that it pays to have explored the foundations.

or whether this just demonstrates that what mathematicians already do could be called lambda calculus but that explicitly calling it that adds nothing
In the areas we're discussing, what mathematicians already do can be mapped to lambda calculus, using lambda calculus as a theory of the functions the mathematicians are working with. You see something similar with denotational semantics: programming languages that weren't designed with LC in mind are provided with semantic theories expressed in terms of LC. LC is used here as a metalanguage for expressing other languages. However, not surprisingly, it turns out to be quite a useful language in its own right, which raises the question as to why it (or a direct derivative) shouldn't just be used directly. The functional languages, and SICM, answer that question by doing exactly that.
Certainly the things that I think of as specifically features of LC, like Church-Rosser, aren't really of interest to most mathematicians.
Any system is much more than just those features which are most unique about it. In any case, the Church-Rosser property isn't specific to LC — it's a property which can arise from the features of a reduction system. Mathematicians certainly depend on the ability to reduce their equations in different orders and arrive at the same result. Perhaps they should care about Church-Rosser! ;)

### Perhaps they should care...

Well mathematicians work from a different set of axioms from lambda calculators (is that what people who work in LC are called?). The issue isn't whether or not two different derivations get different results but whether or not it is possible to derive A and not A. And as there's a convenient theorem that shows we can't ever hope to guarantee this is impossible, sticking your head in the sand and ignoring the issue is completely justified behaviour :-)

### mechanical proof as crutch?

I guess we shoud expect--perhaps even welcome--the aid of computers in proof verification. Still, the (modern) humanist facet bristles at the intrusion of computers into this most arcane human endeavor.

I worry (in futility, of course) that ready access to mechanical proof verifiers (especially among students) will make the goal of elegance in proof seem hollow, even quaint. It's really about the perennial student question:

"Why do I have to learn [insert topic here] when my [insert tool here] can do it?"

I wonder: Does anyone see mechanical proof as being like code generation? Like code generation, mechanical proof:

• is probably efficient and (I suspect) elaborate
• can be judged to be acceptable even as the inner workings remain a mystery

Well, maybe not a mystery ;-) but certainly daunting. After all, we use code generators quite profitably to get scanners, parsers, etc. when the hand-coded alternative is inexpedient and we don't seem to care (much) for elegance there.

Is there a mechanical proof ROI? Perhaps mechanical proof should be undertaken when:

• the intervening steps of the verifier are provably easier to apprehend than the handworked alternative
• verification of the proof languishes because of human limitation

Of course I'm overlooking the most obvious confound: What if the logic of the logic-checking software is in error?

### Well....

We use type-checkers to prove the theorem that a well-formed typing exists for a given program (or else to prove that it doesn't).

Every time you load a Java applet into your browser, the JVM proves that it is well-behaved according to certain parameters.

Code generation, of course, is a constructive form of proof--it generates a solution that satisfies the initial proposition. Not all proofs are constructive.

All of these things are beneficial, and a good chunk of research in computer science is geared towards further automating the generation of code or verification of constraints. Computer theorem provers are wrong far less often than human theorem provers; when one writes reinterpret_cast in C++, one usually does so after mentally constructing a "proof" that the typecast is in fact valid in all cases. Frequently, human programmers are incorrect on such matters; which is why many hold the view that unchecked forced typecasts are a highly dangerous feature (and not to be used lightly, if at all).

Regarding the perennial student question; I think it is important to distinguish mechanical procedures for arriving at some solution, and the underlying theory. Students in math are still taught about square roots, logarithms, trig, and such. However, in most cases they are not taught rote procedures to compute/approximate these things by hand (or with a slide rule) in elementary mathematics courses. (Students are still taught long division--at least when I was in school). For the elementary math student, I don't think that this is a problem. Pocket calculators are cheap and ubiquitious, and even work when stranded on a desert island. :) There are other topics which could be taught in the limited study time available, which would better expand the student's knowledge of the subject.

Algorithms for these things are still introduced when more advanced topics (like Newton's method, Taylor series, etc.) are covered, but then computation of specific functions are seen as a more general case.

Regarding elegance. When we are doing programming-as-art, elegance is indeed a useful thing. When we are doing programming-as-work (the functionality of the thing being what is important, rather than the aesthetics), elegance is still useful when it serves as an indicator of robustness/correctness. A system which is elegant is more likely to be understandable (and correct) than one which isn't.

However, when doing programming-as-work, it is my belief that automation trumps elegance. Even though hand-coded recursive-descent parsers are more "elegant" in a sense than machine-generated LALR(1) parsers; when I need a parser my usual technique is to fire up yacc or a similar tool (depending on what language I am targeting). Even if I'm not writing any actual code, yacc will happily tell me if my grammar is ambiguous or utter nonsense--having the computer do the formal reasoning about these matters is highly beneficial.

One more point. If you look at the output of most "pre-processing" DSL translators (those which translate a DSL into a general-purpose language; I'm also including language preprocessors and macro systems), the resulting code in the general-purpose language is frequently an incomprehensible mess. Does this mean that we should abandon use of DSLs on aesthetic grounds? (Or high-level languages themselves, for that matter--the assembly language or bytecode output of most compilers is even more "inelegant"). I would think not. Elegance, if it is to be measured, should only be measured on the human-authored source; not on any computer-generated byproduct.

### Re: Well....

Code generation, of course, is a constructive form of proof--it generates a solution that satisfies the initial proposition. Not all proofs are constructive.

I realize that. Thanks.

Regarding the perennial student question; I think it is important to distinguish mechanical procedures for arriving at some solution, and the underlying theory.

As do I. However, I see many more students who learn the theory only to forget it once the procedure has been acquired. The problem with procedures is that order is frequently essential to correctness. I find that if students make an ordering error, they lack the metacognition of proof to get them "back on track".

Computer theorem provers are wrong far less often than human theorem provers...

Maybe so, but the human theorem prover's work is likely checkable by still other (yes, fallible) human theorem provers. I guess I was reacting to runaway subordination of intelligence to process.

If a computer theorem prover is itself incorrect, its flaw may be propagated blindly in a form that isn't amenable to checking by humans. Are we to construct theorem provers to verify yet other theorem provers ad nauseam? This was the context of my humanist comment.

As I understand it, theorem provers range from several thousand to hundreds of thousands of lines of code. Plenty of room for error there in my opinion. I suppose proving the (computer) theorem prover correct is more manageable than verifying the proofs to which one would set it.

There are other topics which could be taught in the limited study time available, which would better expand the student's knowledge of the subject.

A pragmatist! I concur.

Even if I'm not writing any actual code, yacc will happily tell me if my grammar is ambiguous or utter nonsense--having the computer do the formal reasoning about these matters is highly beneficial.

Again, I agree.

There are other topics which could be taught in the limited study time available, which would better expand the student's knowledge of the subject.

Of course not. But elegance need not be sacrificed altogether when crossing the partition from human-authored source to "whatever". BTW: assembler and bytecode are still comprehensible with effort ;-).

It's pat to say that elegance is solely the provice of human-authored source because it conditions us to expect abomination elsewhere. I enjoy functional programming because referential transparency is elegant; it allows reason to penetrate. It would be incongruous to desire elegance there and not elsewhere.

### Reliability of the proof tool

As I understand it, theorem provers range from several thousand to hundreds of thousands of lines of code. Plenty of room for error there in my opinion. I suppose proving the (computer) theorem prover correct is more manageable than verifying the proofs to which one would set it.

As far as I understand, most theorem provers consist of (at least) 2 parts: one for constructing the proof, and one for checking it. Basically, the first is the big intelligent' part that, based on the input, generates the real, internal' proof - which consists of very many very small steps, and is usually invisible to the user.

However, the big step-by-step proof generated by this part is usually checked by a very simple checker.

Basically, there might be bugs in the intelligent' part of the proof tool, but those should be caught by the checker. And the checker is relatively reliable.

### I wonder: Does anyone see m

I wonder: Does anyone see mechanical proof as being like code generation? Like code generation, mechanical proof: (*) is probably efficient and (I suspect) elaborate (*) can be judged to be acceptable even as the inner workings remain a mystery

Hmm, mechanical proving is as laborious as trying to define a compiler with S and K operators only. That they were able to prove this non-trivial theorem is incredible and must have taken them several years, and a lot of -elaborate- hacking. I guess the fact that this is mostly first, maybe second order, reasoning helped a lot, but still, for theorem proving this is an enormous achievement.

Actually, there is a lot to be learned there. As I understand it: it seems that classical reasoning is "easier" than intuitionistic reasoning (they hacked ordinary reasoning into the system by mapping properties to boolean values), computational reasoning is the prefered way of going (if you can express a function as a computation then do so), rewriting is very valuable... Nice paper.

Of course I'm overlooking the most obvious confound: What if the logic of the logic-checking software is in error?

In their case, I guess it's more that you need to believe their formalization of planar graphs (if I understood the paper correctly).

### Interesting

via TAR, we find an interesting discssion by Hud Hudson leading to this conclusion: Do we, then, have a counterexample to the celebrated "four-color conjecture"? Well, that depends on exactly how the conjecture is formulated, and unfortunately, formulations that are widely taken to be equivalent may in fact not be equivalent.

Which goes to show you the one of the subtle implications of using formal reasoning: you have to state formally what it is you are trying to prove...

### Advanced Chess, er... Mathematics

Recently, I stumbled upon a game known as Advanced Chess, introduced (I think) by none other than Kasparov himself.

The idea is simple: pit a human assisted by a computer against another human assisted by another computer. Let the human player come up with [long-term] strategies while the computer takes care of [short-term] tactics, by exhaustive searches, chess libraries and a few heuristics. No, this is not exactly chess. But yes, I believe that this is the future of chess -- and warfare, by the way.

What's the common point ? Well, between playing chess and building a formal proof, there's not much difference, is there ?

We have a number of proof assistants around, which could handle "tactics" -- I'm only familiary with Coq and Twelf, but some others might be quite appropriate. We also have tools for formal manipulation, which could feed goals to the tactics and/or handle "subtactics" -- I'm only familiary with Maple and Mathematica, and both should be reworked to get something along these lines, but I think that the raw material is here already.

Are we ready for prime-time ? In my limited experience, I would say "not yet". However, I believe that all of this represents the future of Mathematics. Oh, and just as Advanced Chess will probably give birth to Advanced Warfare, we might need to think about Advanced Physics and Advanced Biology.

### How to Believe a Machine-Checked Proof

I'd like to point out a wonderful article (first pointed out by Conor
McBride on the Haskell mailing list half a year ago or so):

@INPROCEEDINGS{Pollack:belief,
AUTHOR = {Robert Pollack},
TITLE = {How to Believe a Machine-Checked Proof},
BOOKTITLE = {Twenty Five Years of Constructive Type Theory},
EDITOR = {G. Sambin and J. Smith},
YEAR = 1998,
PUBLISHER = {Oxford Univ. Press},
URL = {http://www.dcs.ed.ac.uk/home/rap/export/believing.ps.gz}
}
`

### A different kind of approach to automatic theorem-proving

There are algorithms for proving theorems in certain specialized areas of mathematics that generate (relatively easily) human-checkable proofs in the form of proof certificates. One of the most famous examples is probably the WZ (Wilf-Zeilberger) algorithm for proving identities involving hypergeometric series, a very large and interesting class of series.

It seems to me that these kinds of approaches are much more likely to be succesful than trying to use a general interference approach. Both from the point of view of actually arriving at proofs in the first place but also in the subsequent phase of human-checking of the generated proofs. It's about using an abstraction well-suited to the problem domain. Another good example of this kind of thing is proving correctness of imperative programs via Floyd's formalism (loop invariants, etc). The hard part is coming up with the right invariants but once they have been discovered they essentially serve as the core of a proof certificate and the purported proof is generally checkable without much fuzz.