## The Experimental Effectiveness of Mathematical Proof

The aim of this paper is twofold. First, it is an attempt to give an answer to the famous essay of Eugene Wigner about the unreasonable effectiveness of mathematics in the natural sciences [25]. We will argue that mathematics are not only reasonably effective, but that they are also objectively effective in a sense that can be given a precise meaning. For that -- and this is the second aim of this paper -- we shall reconsider some aspects of Popper's epistemology [23] in the light of recent advances of proof theory [8, 20], in order to clarify the interaction between pure mathematical reasoning (in the sense of a formal system) and the use of empirical hypotheses (in the sense of the natural sciences).

The technical contribution of this paper is the proof-theoretic analysis of the problem (already evoked in [23]) of the experimental modus tollens, that deals with the combination of a formal proof of the implication U ⇒ V with an experimental falsification of V to get an experimental falsification of U in the case where the formulæ U and V express empirical theories in a sense close to Popper's. We propose a practical solution to this problem based on Krivine's theory of classical realizability [20], and describe a simple procedure to extract from a formal proof of U ⇒ V (formalized in classical second-order arithmetic) and a falsifying instance of V a computer program that performs a finite sequence of tests on the empirical theory U until it finds (in finite time) a falsifying instance of U.

I thought I had already posted this, but apparently not.

Consider this paper the main gauntlet thrown down to those who insist that mathematical logic, the Curry-Howard Isomorphism, etc. might be fine for "algorithmic code" (as if there were any other kind) but is somehow inapplicable the moment a system interacts with the "real" or "outside" world (as if software weren't real).

Update: the author is Alexandre Miquel, and the citation is "Chapitre du livre Anachronismes logiques, à paraître dans la collection Logique, Langage, Sciences, Philosophie, aux Publications de la Sorbonne. Éd.: Myriam Quatrini et Samuel Tronçon, 2010."

### [OT] Please include authors in post

I have a small request: when people post papers here, could those posts please include information about the authors of the paper? (And ideally, where it appeared.)

### [OT] and the year of publication

The year of writing (or publication) is also important as it gives a rough idea of the research context -- plus, it's often harder to find out from the paper alone.

This paper is from Alexandre Miquel, and I could find no date for it on the author's webpage: it appears to be a yet-unpublished manuscript. Its latest bibliographic reference is from 2009, so the paper is quite recent. Finally, it refers to version 8.2 of Coq, which was superseded by Coq 8.3 in October 2010, so I think we can date it to between 2009 and 2010. Yet the paper, being unpublished, may still evolve in the future.

### Some doubt

Dear All,

I see that he uses the following forall rules:

        G |- t : A
    ------------------- x notin G
    G |- t : forall x A

    G |- t : forall x A
    -------------------
    G |- t : A[x/s]


But usually the proof objects are not left untouched; usually we have:

        G |- t : A
    ---------------------- x notin G
    G |- \x.t : forall x A

    G |- t : forall x A
    -------------------
    G |- (t s) : A[x/s]


I wouldn't mind if the paper didn't stipulate processes based on proof objects.

So my speculation is that the author somewhere along the way left PA2, and must effectively have been working in another system.

This can easily happen; it already happens if we directly turn a definition:

    Define I x = x


Into:

    I = lambda x.x


    forall x ((I x) = x)


The introduction of a lambda would require some
comprehension axiom.

This could invalidate Proposition 11 of his paper, which I feel would contradict Gödel's first incompleteness result. That said, I have no doubts about the falsification principle in whatever incarnation it comes.

Bye

P.S.: He already uses the simple inference rules here:
http://perso.ens-lyon.fr/alexandre.miquel/publis/witness.pdf

And he gives an explanation:

For this reason, a (Curry-style) proof-term should not be
confused with the proof (i.e. the derivation) it comes from,
since the latter contains much more information that cannot
be reconstructed from the proof-term. In such a setting,
the proof-term is merely a computational digest of the formal
proof, where some computationally irrelevant parts of the
proof have been already removed.
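That remark about removing computationally irrelevant parts can be made concrete with a toy sketch (the term representation here is invented purely for illustration): an erasure function that maps Church-style terms, which carry explicit type abstractions and type applications, to the Curry-style terms that the simple inference rules manipulate unchanged.

```python
# Toy illustration of "proof-term as computational digest": Church-style
# terms carry explicit type abstractions/applications; erasing them yields
# the Curry-style term. Terms are nested tuples: ("var", x),
# ("lam", x, body), ("app", f, a), and the computationally irrelevant
# ("tlam", body) for type abstraction, ("tapp", body) for type application.

def erase(term):
    """Drop type abstractions and applications, keeping the computation."""
    tag = term[0]
    if tag == "var":
        return term
    if tag == "lam":
        return ("lam", term[1], erase(term[2]))
    if tag == "app":
        return ("app", erase(term[1]), erase(term[2]))
    if tag in ("tlam", "tapp"):  # computationally irrelevant: erase it
        return erase(term[1])
    raise ValueError("unknown term tag: " + tag)

# Church-style polymorphic identity: /\a. \x:a. x
church_id = ("tlam", ("lam", "x", ("var", "x")))
# Its digest is the plain lambda term \x. x, as in the simple rules above.
assert erase(church_id) == ("lam", "x", ("var", "x"))
```

Note the erasure is lossy in exactly the sense the quote describes: you cannot reconstruct the type derivation from the digest alone.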

### Agreed

There may be some leaking into a larger system, but the paper already makes the point that PA2 was only chosen as being representative, and the point holds in considerably stronger systems, in particular some flavors of ZF. More recently, there's been work on including certain forms of the Axiom of Choice. If there's a variant of ZFC that this covers, there's good reason to believe (but no proof!) that it holds for all of mathematics.

I don't lose sleep over ZFC not having been proven to serve as a foundation for all of mathematics, so I don't expect to lose any sleep over what might be a gap here, either, but I do think you raise an interesting point.

### When is a proof-term a term which is a proof?

It is inappropriate to refer to a term as a "proof-term" if it does not constitute a totally recheckable proof. Curry-style proof terms, terms in extensional type theory, and Lord knows what else fail to qualify as proofs because they don't render finitary the advice "Oh, you figure it out!". While we're on the topic, it is inappropriate to refer to a tactic script as a "proof-script" when it does not constitute a totally recheckable proof.

### I suspect that this is a

I suspect that this is a veiled response to the discussion you had with Rich Hickey and subsequently with me on Reddit.

Hickey made a comment about why he thinks tuples complect (intertwine) meaning with order: to the programmer each slot in a tuple has different meaning, but in the source code that is only represented by the position in the tuple instead of as a name as it would be in a record/map:

If the 3 ints [in a tuple] represented my accounts receivable, cash on hand and debt, I care very much what they represent, and the correctness of my program (to the outside world) depends greatly on those semantics, none of which are guaranteed even though my type checker proves I used them as ints successfully. That your program is self-consistent is irrelevant if it is wrong about the world.

Statically typed programs and programmers can certainly be as correct about the outside world as any other program/programmers can, but the type system is irrelevant in that regard. The connection between the outside world and the premises will always be outside the scope of any type system.

You responded that you should model the semantics of "receivable", "cash on hand" and "debt" in the type system or formal model. I made the point that for algorithmic code this often works. A problem like sorting has a relatively simple specification compared to the implementation, so the specification is more likely to be correct. But for non-algorithmic code (a phrase which you find meaningless -- I know), a formal specification will not solve all or even most of your correctness problems. The difficulty and risk is in the step from informal requirements (like "this web app will turn a profit") to formal specification, not from formal specification to implementation. So you might as well do the implementation directly, especially if that is faster than doing full formal specification (which with current tools it definitely is, and will remain so for the foreseeable future).

What in this paper do you find compelling evidence that setting up a full formal model of software requirements in something like Coq and then creating a proof and extracting a program is a better way to create real world software?

### Hard is different from impossible

Jules Jacobs quoting Rich Hickey: ...the type system is irrelevant in that regard. The connection between the outside world and the premises will always be outside the scope of any type system.

...is falsified by:

"We... describe a simple procedure to extract from a formal proof of U â‡’ V (formalized in classical second-order arithmetic) and a falsifying instance of V a computer program that performs a finite sequence of tests on the empirical theory U until it finds (in finite time) a falsifying instance of U."

This dispenses with Rich's claim that such a procedure can't exist.

Jules Jacobs: The difficulty and risk is in the step from informal requirements (like "this web app will turn a profit") to formal specification...

Fair enough. However...

Jules Jacobs: So you might as well do the implementation directly, especially if that is faster than doing full formal specification (which with current tools it definitely is, and will remain so for the foreseeable future).

I already posted counterexamples on reddit. You seem to labor under a few misapprehensions:

1. Taking advantage of the Curry-Howard Isomorphism means extracting code from proof developments in Coq.
2. Taking advantage of techniques like those described in Lightweight static resources has an unacceptable cost:benefit ratio.
3. The tools provided by languages for web development such as Opa, Ocsigen, or Ur/Web are inapplicable to aspects of interest of the domain of your particular application.

My assertion, assuming that I've understood your arguments correctly, is that all three of these claims are false.

### This dispenses with Rich's

This dispenses with Rich's claim that such a procedure can't exist.

I'm not sure why you think it does. Can you explain further? What Rich is saying is what you seem to agree with in your next sentence: "The difficulty and risk is in the step from informal requirements (like "this web app will turn a profit") to formal specification...". He's saying that no type system will tell you that you formalized concepts like "debt" correctly, it can only tell you that your program and whatever formalization you have of debt are consistent with each other. Note that it's not even remotely clear how you'd formalize a human concept like "debt".

My assertion, assuming that I've understood your arguments correctly, is that all three of these claims are false.

You have not. Actually, I disagree with all three statements just like you do. The things I said were quite different:

On 1: It seemed to me that you were confusing the theorems you get out of the Curry-Howard correspondence with the theorems you get out of type soundness. The theorem that CH associates with a type like "int -> int" is *not* the theorem saying that at run-time a particular procedure will return an int when given an int. That's what you get out of type soundness. The theorem you get out of CH is the much more trivial "int => int" where => denotes implication, which is trivially true.

On 2: I didn't say anything about the usefulness of those techniques. I just said that the theorems that come out of CH applied to those types are fairly trivial (although the theorems you get out of type soundness are not!).

On 3: Again, I said no such thing. I merely remarked that you can get most of the static guarantees touted by Ur/Web in modern dynamically typed web frameworks by using them idiomatically. For example, SQL queries you make by using their database APIs idiomatically are well formed by construction. Similarly, template languages like HAML ensure that HTML tags are correctly balanced.
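The "well formed by construction" point can be sketched with a deliberately minimal, hypothetical query-building function (real framework APIs are far more thorough, but the principle is the same): the shape of the SQL is fixed inside the API, so callers cannot produce an unbalanced or unescaped query.

```python
# Hypothetical sketch: callers describe the query; the API alone assembles
# the SQL text, always parameterized, so malformed SQL can't be built.

def select(table, columns, where=None):
    """Build a parameterized SELECT statement and its parameter list."""
    sql = "SELECT {} FROM {}".format(", ".join(columns), table)
    params = []
    if where:
        # where is a list of (column, value) pairs; values become
        # placeholders, never string-concatenated into the SQL text
        clauses = ["{} = ?".format(col) for col, _ in where]
        sql += " WHERE " + " AND ".join(clauses)
        params = [val for _, val in where]
    return sql, params

sql, params = select("accounts", ["id", "debt"], where=[("owner", "alice")])
assert sql == "SELECT id, debt FROM accounts WHERE owner = ?"
assert params == ["alice"]
```

The guarantee here is dynamic-language-friendly: it comes from API discipline rather than a type checker, which is exactly the contrast being discussed.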

I have already responded to all of this in greater detail on Reddit some time ago. I'd suggest returning the discussion to there if you are interested in having it because all the context is there and moving to another web site mid-discussion is probably irritating to members of both web sites.

### Briefly and partially

Jules Jacobs: I'm not sure why you think it does. Can you explain further?

The paper relates falsifiable experimentation in the physical world to formal proof in such a way that falsification of the logical conclusion can be used to derive an experiment in the physical world to falsify the antecedent. I suppose you could argue against this on at least one of two grounds:

1. Find a flaw in their epistemology.
2. Demonstrate that there are meaningful activities in the physical world that can't be described by their epistemology.

Jules Jacobs: He's saying that no type system will tell you that you formalized concepts like "debt" correctly, it can only tell you that your program and whatever formalization you have of debt are consistent with each other.

Right. This paper gives the mechanism for relating the formalization to experiments in the physical world, and debt exists in the physical world.

Jules Jacobs: Note that it's not even remotely clear how you'd formalize a human concept like "debt".

Um... what? Rich was talking about debt as a formal concept as it was ("money owed"). This isn't metaphysics or philosophy. No one is talking about "debts of gratitude" or what have you. :-)

Jules Jacobs: It seemed to me that you were confusing the theorems you get out of the Curry-Howard correspondence with the theorems you get out of type soundness... I didn't say anything about the usefulness of those techniques. I just said that the theorems that come out of CH applied to those types are fairly trivial (although the theorems you get out of type soundness are not!).

It sounds to me like you believe there's a difference when there isn't.

Jules Jacob: I merely remarked that you can get most of the static guarantees touted by Ur/Web in modern dynamically typed web frameworks by using them idiomatically.

Then they're no "guarantee" at all, since they can't be seen to hold until someone (perhaps other than you) runs them. I happen to agree, philosophically, that one should either take the phase distinction seriously or not bother having one—that is, at the other extreme end from the interests I'm expressing here are systems in which everything is first-class and reflective/metacircular, hence my interest in Kernel. But "static" and "guarantee" have a specific, technical meaning that you don't get to just handwave away by claiming you can have them at runtime in dynamically-typed languages. That sort of thing is what I mean when I use the term "incoherent."

Jules Jacobs: I have already responded to all of this in greater detail on Reddit some time ago. I'd suggest returning the discussion to there if you are interested in having it because all the context is there and moving to another web site mid-discussion is probably irritating to members of both web sites.

It's a fair point, but I'll likely add a link from there to here, as I find LtU's editing considerably less annoying than proggit's, and the input here of considerably higher quality.

### The paper relates

The paper relates falsifiable experimentation in the physical world to formal proof in such a way that falsification of the logical conclusion can be used to derive an experiment in the physical world to falsify the antecedent.

The paper provides a way to go from a falsification of a proof to an experiment in the physical world (where "physical world" is a mathematical model of the physical world...). What you'd need to disprove Rich's claim is completely different: you'd need a method that, given an ill-defined human-language requirement, produces a formal model with the property that if a program satisfies the formal model, then it matches the requirements. It's quite obvious that such a method would require godlike omniscience. Consider, for example, the requirement that a plane shouldn't crash.

Um... what? Rich was talking about debt as a formal concept as it was ("money owed").

Since when is "money owed" a formal concept? Last I checked there weren't any mathematical definitions of "money owed" that would reject programs that calculate with "money owed" in a wrong way.

It sounds to me like you believe there's a difference when there isn't.

Could be; I'm not even a dilettante on the matter. I'll repeat what I already said on reddit for context. My naive thinking tells me that "the program p returns an int when given an int" (which you get from the type checker accepting your program typed as int -> int, combined with the soundness proof of the type checker) is quite a different theorem from "int implies int" (where int is a finite or infinite disjunction, depending on your language). The latter theorem is trivially true and doesn't even refer to the program p.

Fortunately, we have many experts on the topic :) Perhaps one of them could clarify the situation?

Then they're no "guarantee" at all, since they can't be seen to hold until someone (perhaps other than you) runs them. I happen to agree, philosophically, that one should either take the phase distinction seriously or not bother having one -- that is, at the other extreme end from the interests I'm expressing here are systems in which everything is first-class and reflective/metacircular, hence my interest in Kernel. But "static" and "guarantee" have a specific, technical meaning that you don't get to just handwave away by claiming you can have them at runtime in dynamically-typed languages. That sort of thing is what I mean when I use the term "incoherent."

I didn't say you get those properties as static guarantees in dynamically typed languages. Obviously not. Like it or not, from a practical perspective the difference between statically guaranteed correct and no guarantees at all is not perfect vs. bad. Correctness isn't black or white. There are different tools, and each tool has a different (unknown) payoff probability distribution that says how likely it is that you end up in a situation you value at X, where X is some number describing how much you like the situation. For example, if you make a stock market trading program, X might be dollars earned. One aspect of this is how probable it is that a mistake will go into production. The purpose of static checking in tools like Ur/Web is to reduce that probability. What I'm saying is that modern dynamically typed web frameworks reduce many of the same classes of mistakes that Ur/Web's static checking does; for example, the danger of malformed SQL can be reduced by using the framework's database APIs instead of writing raw SQL.

From our earlier exchanges it seems quite probable to me that you disagree with this worldview, and that you just define "correct" as "matching the given formal specification at hand" and that's the end of the story (I believe you already said something like that a long time ago but I could be wrong). Then we have to agree to disagree.

### Curry Howard

My naive thinking tells me that "the program p returns an int when given an int" (which you get from the type checker accepting your program typed as int -> int combined with the soundness proof of the type checker) is quite a different theorem than "int implies int" (where int is a finite or infinite disjunction, depending on your language). The latter theorem is trivially true and doesn't even refer to the program p.

Under the usual Curry Howard correspondence, it is equally trivial that p takes an Int and returns an Int, because p is a typed program. The deeper theorem you're talking about comes from splitting a typed program into an untyped program and a theorem about that program. Also, be warned that the Curry-Howard correspondence has mystical significance and everything ultimately follows from it.
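That splitting can be sketched concretely (function names here are invented for illustration): any program of type int -> int is equally a proof of the implication "int => int", so the theorem by itself says nothing about which program you wrote.

```python
# Any term of type int -> int "proves" int => int under Curry-Howard;
# the identity function is the easiest proof.

def identity(n: int) -> int:      # the easiest proof of int -> int
    return n

def factorial(n: int) -> int:     # a different proof of the same theorem
    return 1 if n <= 1 else n * factorial(n - 1)

# Both inhabit int -> int, so both witness the same (trivial) implication,
# even though only one of them computes factorials. The interesting fact
# "this particular p maps ints to ints at run time" is about p, and comes
# from type soundness, not from the bare implication.
assert identity(5) == 5
assert factorial(5) == 120
```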

### Mysticism

Matt M: Also, be warned that the Curry-Howard correspondence has mystical significance and everything ultimately follows from it.

Heh.

I do find it funny when people take the observation that it always holds to somehow mean that all types/theorems must therefore somehow be relevant to a domain. I think Jules' example is a great one. Is "int -> int" likely to be relevant to any domain we can think of off the top of our heads? Of course not. But always hold it does, which means two things, pragmatically:

1. Given a sufficiently rich type system, spelling out a type can completely specify, at least intensionally, the behavior of a function of that type. This gets us code extraction from tools like Coq that's correct by construction. How costly this is depends on the current state of the art in proof automation.
2. Given a much less rich type system, you can still construct types that enforce very useful constraints in a domain, but rely on type inference to enforce them throughout the rest of the code. This gets us techniques like Lightweight static capabilities. How costly this is depends on the richness of the type system and the developer's mastery of it.
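As a hedged illustration of style 2 (not taken from the Lightweight static capabilities paper itself), here is the newtype-plus-smart-constructor pattern in Python's type-hint dialect; the static enforcement would come from a checker such as mypy, since at runtime the NewType is a plain str.

```python
# A NewType with a single smart constructor: the only sanctioned way to
# obtain a SanitizedHtml value is through sanitize(). A static checker
# such as mypy rejects passing a raw str where SanitizedHtml is required.
from html import escape
from typing import NewType

SanitizedHtml = NewType("SanitizedHtml", str)

def sanitize(raw: str) -> SanitizedHtml:
    """Mint a SanitizedHtml value; the one place escaping happens."""
    return SanitizedHtml(escape(raw))

def render(body: SanitizedHtml) -> str:
    # render trusts its argument precisely because the checker has
    # verified that every call site went through sanitize()
    return "<div>" + body + "</div>"

assert render(sanitize("<script>")) == "<div>&lt;script&gt;</div>"
# render("<script>") would be flagged statically by mypy.
```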

My core theses are:

1. Item 2 above is really cheap iff you choose the right language and learn it moderately well.
2. Both items 1 and 2 above stem directly from the Curry-Howard Isomorphism with no mysticism whatsoever, other than its universality, which is no more mystical than, say, universal quantification is.

### Then as far as I can see

Then as far as I can see you, Rich, and I are almost in complete agreement, excluding perhaps our beliefs about the relative costs/benefits of certain type systems.

### Exactly. As programmers

Exactly. As programmers we're dealing with "the program p" as a string of characters in a file. The usefulness of the type system comes from the fact that if it accepts the program, then "the program p returns an int when given an int" is true at run time, and NOT in whether "int implies int" is a theorem.

What was the name for the school of thought that programs have meaning irrespective of their types vs the school of thought that programs that don't type check are meaningless?

Edit: Wikipedia

### Both

Jules Jacobs: The usefulness of the type system comes from the fact that if it accepts the program, then "the program p returns an int when given an int" is true at run time, and NOT in whether "int implies int" is a theorem.

The whole point is that those are two different ways of saying the same thing (in intuitionistic logic). The benefit is that you can reason about the latter without actually having to execute the program. As the types get more sophisticated, the value of that increases pretty dramatically.

Jules Jacobs: What was the name for the school of thought that programs have meaning irrespective of their types vs the school of thought that programs that don't type check are meaningless?

There isn't one, since there is no such school of thought as the latter one you describe. The closest we get to it in computer science is the difference between untyped and typed lambda calculi, both of which were invented by Alonzo Church, and neither Church nor anyone since has suggested that the untyped lambda calculus or its descendants are meaningless—only that today, the cost of taking advantage of types to prevent entire classes of errors is sufficiently low that more segments of industrial software development should do so.

Beyond that, my personal intellectual axe to grind is with those whose rhetoric leads me to believe that they're disavowing the very possibility of a relationship between type theory and "the real world," which is somehow always beyond the reach of mere mathematical logic. This is fatuous intellectually ahistorical nonsense. The economic question is all well and good; suggesting that types have nothing to say about the physical world is simply wrong.

### Jules Jacobs: The usefulness

Jules Jacobs: The usefulness of the type system comes from the fact that if it accepts the program, then "the program p returns an int when given an int" is true at run time, and NOT in whether "int implies int" is a theorem.

The whole point is that those are two different ways of saying the same thing (in intuitionistic logic).

I think Jules' point is that the theorem 'int -> int' corresponds to there being some program -- any program -- that takes ints to ints. The easiest proof corresponds to the identity function. The judgment "p: int -> int" (read p proves int implies int) is what corresponds to the useful property of p.

### Emphasis

This is the key thing that is being missed. The 'p' doesn't disappear when going across the CH bridge, nor does CH state that the formula represented by the type is true. CH states that p : int -> int is equivalent to p is a proof of int => int. CH also states that program transformations, and thus particularly reduction, are equivalent to proof transformations. So things like subject reduction are also covered.

### The whole point is that

The whole point is that those are two different ways of saying the same thing (in intuitionistic logic).

I don't see how that can be true when the former is making a statement about a program as a string of characters (see the sentence before the one you quoted) and the latter makes no reference to said string of characters. Can you explain?

Suppose we are writing a factorial program. Why would we use a type system? Because we are trying to prove the theorem "int -> int" by writing a factorial program and then checking it with the type checker? No. We are using a type system because factorial : int -> int tells us something useful about the behavior of factorial program. The same goes for lightweight static capabilities. We don't write the programs to prove the types, which even in that case are trivial to prove. We use the types + type checker to prove that the capabilities don't leak.

There isn't one, since there is no such school of thought as the latter one you describe.

An intrinsic/Church-style semantics only assigns meaning to well-typed terms, [...]

Note that this isn't at all making value judgements about untyped languages or anything. I was asking for the name of the above school of thought, not about some group of people who hold strong opinions against untyped languages.

Beyond that, my personal intellectual axe to grind is with those whose rhetoric leads me to believe that they're disavowing the very possibility of a relationship between type theory and "the real world," which is somehow always beyond the reach of mere mathematical logic. This is fatuous intellectually ahistorical nonsense.

Perhaps you should find somebody to grind your axe with then. Rich was simply saying that no theorem is going to prove that your program matches real world requirements, like "handles money correctly". You can formalize how to handle money correctly all you want, but then you've just moved the correctness problem to the formalization. We perceive the world through limited sensors, and there are always going to be multiple possible worlds that match with the input we experience. However good our beliefs fit our past experiences, our knowledge of the physical world will always be just beliefs.

### You can formalize how to

You can formalize how to handle money correctly all you want, but then you've just moved the correctness problem to the formalization.

Fortunately, auditing specs is a more manageable problem than auditing whole programs. I don't see how anyone can think that's not a win. Certainly there's a way to go to make auditing specs simpler, but I don't think there's any evidence to suggest we can't make headway on this problem. This domain is almost completely unexplored, so any definitive statements seem like pure speculation.

### Sure, nobody is denying that

Sure, nobody is denying that auditing specs can be easier than auditing implementations. That brings us back full circle: see my comment here, the part after the quote.

No matter how much simpler it may be, your specification isn't going to prove your assumptions about the world. Which is what Rich said:

Statically typed programs and programmers can certainly be as correct about the outside world as any other program/programmers can, but the type system is irrelevant in that regard. The connection between the outside world and the premises will always be outside the scope of any type system.

Also, as I said above, it's not just about correctness. It's about overall effectiveness of a methodology. For the software that I use personally, bugs aren't a significant problem at all. I'd happily trade a few bugs for more usability. Unless formal specification + implementation + proof is going to be cheaper to do than just implementation, you have a trade-off between that and other things money can be usefully spent on. I don't think it's an obvious win at all.

### Statically typed programs

Statically typed programs and programmers can certainly be as correct about the outside world as any other program/programmers can, but the type system is irrelevant in that regard. The connection between the outside world and the premises will always be outside the scope of any type system.

I think the last statement is likely correct, but his last statement does not follow from his first statement. For instance, I can use a spec for the outside world designed by a physicist and an implementation for simulation purposes designed by a programmer that is then verified against this spec. The type system is in no way irrelevant in this scenario, it is in fact key to composing the expertise from two different sources.

How would you get the same level of assurance in a dynamically typed language? The implementation itself must be written by an expert in both domains. We're not at the point where the above is easy, but I'd say we're certainly on our way.

### Asking for the same level of assurance is unfair

... since the dynamic guys generally agree that formal methods do produce higher assurance against certain types of bugs. How would they deal with your scenario? Presumably the physicist would write tests.

### This violates the separation

This violates the separation of concerns though (physicist writing code), unless we have a DSL that easily allows the specification of such propositions. But this is just a model/logic that we'd rather check statically, since random testing won't ensure conformance in all cases.

This is also treading in the territory covered by this paper on the connection between falsification and implication. Interesting discussion!

### You're right. Rich would

You're right. Rich would presumably count the physicist with the programmers though. He worded the statement carefully; he's not saying that the program won't be more correct, just that the assumptions won't be more correct.

### Bring on the zombie horses

There seems to me to be quite a gap between this statement:

no theorem is going to prove that your program matches real world requirements, like "handles money correctly".

...and this:

The connection between the outside world and the premises will always be outside the scope of any type system.

The former seems true in general, whereas the latter requires a big qualifier: that aspects of the connection in question can't be proved by a type system, but that doesn't mean that we can't do anything to improve the static model of that connection, and benefit from the results. In fact, we can do a lot. The above quote relies on a false dichotomy that throws up its hands at the prospect of static modeling of a specification, and pretends it's no use to even try; but that's not true.

For example, it's already been discussed that in the tuple example upthread, debits and credits can be modeled as types, and this allows the type system to prove that the uses of debits and credits in a program follow certain rules.

You propose that this "just" shifts the problem:

You can formalize how to handle money correctly all you want, but then you've just moved the correctness problem to the formalization.

But the "just" here is unjustified. Previously, every use of one of these tuples was potentially at risk of an incorrect ordering of its values; with an appropriate type model, that risk is eliminated, and it can be statically proved that all such tuples have the debits and credits in the correct place. It then becomes impossible to compile an entire class of erroneous programs: those containing incorrect tuple ordering.

Now, it is of course possible that such an implementation of debit and credit types could contain a bug, but this doesn't negate the improvement achieved by eliminating the class of tuple-ordering bugs.

Further, one of those "(un)reasonable effectiveness" situations arises here to help combat bugs, since the type implementation abstracts and centralizes the implementation of debit/credit arithmetic.

Instead of using generic numeric operators throughout a program, which raises the possibility of an incorrectly performed debit/credit calculation at every use site, we've done what good programmers do and not only abstracted and centralized that operation, but set it up so the typechecker can reject incorrect uses of the types and their operations.

If there's a bug in the debit/credit implementation, the bug exists in one place, instead of potentially in an arbitrary number of places scattered throughout the code. Since such a centralized bug potentially affects every use of debit/credit arithmetic, it's much easier to detect and isolate than if we had erroneous uses of tuples or generic arithmetic scattered throughout a program.

But I'm basically describing applications and benefits of the DRY principle here, which only underscores how misleading that original tuple example is. It's a straw man which says "if you don't model a problem well, a type system isn't going to help you."

Even in dynamically typed languages, data structures or objects are often used to do this kind of centralization. You could implement debit and credit objects and centralized operations on them in such a language, and the main difference between the typed and untyped implementations would then be whether errors are detected statically or at runtime.

One reason this often isn't done in dynamic languages is that the benefit isn't as great (no static checks) and it incurs overhead - exactly the kind of overhead that a compiler with a type system can optimize away.

So there's an argument here that type systems make precise modeling of the problem domain more useful and more viable.

It can be difficult to recognize this if one is used to creating imprecise models because the language imposes a high cost and low benefit on doing otherwise. Reduce that cost and increase the benefit of precise models, as a good type system can, and the situation changes quite a bit.

### Reduce that cost and

Reduce that cost and increase the benefit of precise models, as a good type system can, and the situation changes quite a bit.

Right. We have quite a long way to go with reducing the costs and increasing the benefits, but perhaps eventually we will get there. Whether that approach will ever take over is questionable though. Perhaps plain implementation will remain cheap enough, relative to formal specification plus implementation and proof, that for most software the latter's reduced probability of bugs will continue not to outweigh the extra cost.

### As Abelson & Sussman almost wrote...

"Programs, and types, must be written for people to read, and only incidentally for machines to check and execute."

We have quite a long way to go with reducing the costs and increasing the benefits, but perhaps eventually we will get there.

The future is unevenly distributed. A lot of the distance to be covered involves mainstreaming of existing technology.

the reduced probability of bugs

The benefits go beyond that. There are benefits to precise modeling of the problem domain that go beyond reducing bugs or detecting them early. The ability to reason about code is a critical factor in software development, and types help with that.

Think of types as a code comprehension tool that happens to have some side benefits, such as allowing compilers and other tools to also better comprehend programs, with the additional benefits that brings.

...the extra cost for most software.

That seems like an unsupported assumption to me, and raises the question of how and where the cost measurement is being performed.

### Take mergesort in Coq, a

Take mergesort in Coq: a problem that, unlike most software, seems almost made for formal verification.

There is a long way to go, and it's not just mainstreaming of existing technology.

### Related Work

First: it's great to see Anton again, and as always, he makes the points I wish to make more succinctly and patiently than I seem capable of.

Jules Jacobs: There is a long way to go, and it's not just mainstreaming of existing technology.

Strong specification using dependent types is admittedly extreme (specifically, the top-right-rear corner of the lambda cube :-) ) so I agree it's worth focusing on systems that are closer to hand. Elsewhere I pointed to Lightweight static capabilities and lightweight static resources. I'm still interested in a critique of the results, vs. whether you feel what's "proved" by the "theorems" that the types represent is "trivial" or not.

### I'm not sure that I

I'm not sure that I understand your question, or rather, I'm pretty sure that I don't ;) Could you elaborate? For example what do you mean by results?