I am pleased to announce that the latest issue of The Monad.Reader is now available.

The Monad.Reader is a quarterly magazine about functional programming. It is less-formal than journal, but somehow more enduring than a wiki page or blog post. I think it may be of interest to many of the readers of LtU.

Issue 7 consists of the following four articles:

• Matthew Naylor
A Recipe for controlling Lego using Lava
• Yaron Minsky
Caml Trading: Experiences in Functional Programming on Wall Street
• Duncan Coutts
Book Review: “Programming in Haskell” by Graham Hutton
• Dimitry Golubovsky, Neil Mitchell, Matthew Naylor
Yhc.Core – from Haskell to Core

Feedback and comments are very welcome.

## Comment viewing options

I read the Caml Trading article. It would be interesting to hear much more detail about how they exploit the type system to useful ends. That would quiet the little voice in my head saying "it would have been easier to use Lisp" :-)

I'm a bit depressed that the cited problem areas for Ocaml are the same ones that made it unappealing to me more than five years ago when I was taking it for a spin. :-|

### Except...

Luke Gorrie: It would be interesting to hear much more detail about how they exploit the type system to useful ends. That would quiet the little voice in my head saying "it would have been easier to use Lisp" :-)

Except, of course, for the reasons that it would be impossible to do what they do in Lisp, short of writing a statically-typed DSL in Lisp. :-)

More seriously, what's described in the Monad Reader is all stuff that's been discussed here on LtU at length: see Lightweight Static Capabilities, Eliminating Array Bound Checking through Non-dependent types, and Number-Parameterized Types for good launching-off points. More generally, all of the posts on "phantom types" and/or "GADTs" are also worthwhile.

### Indeed

Phantom types (and GADTs one of whose precursors was known as "first class phantom types") are a simple but powerful example of things you cannot do in a dynamically typed language.

### Objects and Generics

I found two critiques from the article interesting, and was wondering if you or anybody else had an opinion on them: generics and objects. The latter is particularly interesting because of the recent discussion about OCaml objects. I'll quote the critiques here:

Generic Operations

One of the biggest annoyances of working in OCaml is the lack of generic printers, i.e. a simple general purpose mechanism for printing out human-readable representations of OCaml values. Interestingly, object-oriented languages are much better at providing this kind of feature. Generic human-readable printers are really just one class of generic operations, of which there are others, such as generic comparators and generic binary serialization algorithms. The only way of writing such generic algorithms in OCaml is through use of the macro system, as we have done with the s-expression library. While that is an adequate solution for us, OCaml could be improved for many users if some generic operations were available by default.

Objects

It has been my experience and the experience of most of the OCaml programmers I’ve known that the object system in OCaml is basically a mistake. The presence of objects in OCaml is perhaps best thought of as an attractive nuisance. Objects in ML should be at best a last resort. Things that other languages do with objects are in ML better achieved using features like parametric polymorphism, union types and functors. Unfortunately, programmers coming in from other languages where objects are the norm tend to use OCaml’s objects as a matter of course, to their detriment. In the hundreds of thousands of lines of OCaml at Jane Street, there are only a handful of uses of objects, and most of those could be eliminated without much pain.

### Generic Operations

Generic Operations

I agree, as do many in the OCaml world; long have they awaited the coming of the GCaml. Alternatively, OCaml could potentially implement intensional type analysis of some form to achieve the same effect.

The presence of objects in OCaml is perhaps best thought of as an attractive nuisance. Objects in ML should be at best a last resort.

Agreed as well.

### A Couple of Thoughts

It's hard to write a generic printer without some kind of representation of types at runtime. Fortunately, there is a dialect of O'Caml that provides such a feature: MetaOCaml, and (who else?) Oleg Kiselyov has written such a generic print function for MetaOCaml.

As for objects, one of the big lessons of O'Caml, IMHO, is that... well... in a language with decent support for abstract data type creation by other means, an object system really isn't that useful. Most things that I've used objects for in the past were cheap simulations of abstract variant types. An actual module system and actual variant types serve my purposes quite a lot more nicely than abusing classes and inheritance does.

### This is my one big complaint

This is my one big complaint about O'Caml, really. I'd love to see a language which is very close to O'Caml is many ways, with type classes as seen in Haskell.

### I was discussing this the

I was discussing this the other day, based on my vaguely contemplating trying to add type classes in in SML in an ad hoc manner (basically just expanding them to record passing in a macro like manner). The general conclusion seems to be that adding them in like that doesn't work very well, and that one is better off trying to get something along the lines of modular type classes.

I'd definitely be intrigued by the results, but it's moved out of the "simple thing I could hack together" category, so I'll probably not do anything to it myself. :-)

### Typeclasses in ML

Actually, implementing simple typeclasses in ML (at least, OCaml) is
relatively straightforward. Even polymorphic-like recursion can be supported,
which often occurs in methods. Here's an example:

http://caml.inria.fr/pub/ml-archives/caml-list/2007/03/b716400996f50b94db2c01d2deabfb58.en.html

Although the implementation is a faithful translation of Haskell code,
the explicit use of dictionary passing is quite an annoyance. The
translation replaces double arrow with a single arrow. Thus we have to
supply the necessary argument, which is a witness of a particular
constraint. In Haskell, these arguments are inferred -- which is very
nice. The constructor classes (classes parameterized on type
constructors) require the use of the module system of OCaml.

Gavin Harrison wrote: I'd love to see a language which is very close
to O'Caml is many ways, with type classes as seen in Haskell.''

So does Greg Morrisett. That language, he called ZML', was the gist
of his invited talk at the ML2005 workshop. It seems Mitchell Wand has
written a transcript:

https://lists.ccs.neu.edu/pipermail/prl/2005q4/001103.html

### Type-indexed values

Actually, implementing simple typeclasses in ML (at least, OCaml) is relatively straightforward.

Indeed, there are quite a few ML papers on the subject of type-indexed values. See TypeIndexedValues in the MLton wiki for a number of references to relevant papers. Some further techniques have also been discussed on the MLton users mailing list: Combining type-indexed values using functors or how to get two for the price of one and A new approach to type-indexed values in SML. As a practical example, the MLton library contains an implementation of a unit testing framework with QuickCheck -style support for randomized testing employing type-indexed values. See unit-test.sml and qc-test-example.sml.

### I love objects! (but not OO)

Personally, I love O'Caml objects. They provide a great way of getting around variance issues, by allowing one to split covariant operations from contravariant operations, while still sharing a single closure. (I forget who, a while ago, pointed out that objects are just closures with multiple entry points.) They also provide a nice dual to polymorphic variants; I find myself using objects instead of records in the same places where I would use polymorphic variants instead of a plain valriants (i.e. places where case sharing, subtyping, and extensibility are important).

That said, I think objects try too much to be like objects :) Everything above could still be accomplished without instance variables, initializers, classes, or inheritance. (Although since classes provide an interesting dual to polymorphic variant abbreviations, I would hate to see them disappear! And to be useful they really require both inheritance and initializers.) Instance variables, though, should be done away with IMHO; their only real value is that they allow inheritance of variables, which is a questionable idea in itself.

I'm curious if anyone else uses objects the same way I do, and if so if they share my same thoughts.

### Objects = closures w. mult. entry points

Talk of Xavier Leroy says that.

But I think that applies only to OCaml objects.

### No

This is pretty much exactly how E's object system works.

### As do they all

More generally, in almost any OO system an object consists of some code packaged together with an environment (i.e. instance variable bindings), which is pretty much the definition of a closure.

### Right!

That's where I read it... good memory! Those are great slides... they really help carve out the boundaries of modules and objects. (They're a little outdated though... Caml's modules have gotten a bit more powerful since then with recursion.)

### The term 'generics operations' might be misleading

Ocaml has parametric polymorphism and functors so you can do a lot of generic stuff.

The real 'problem' is that caml doesn't have any introspection capabilities. In the printing example, in an OO language you just say that everything as a .toString() method and call that. For an ADT, how does the system know what function to call to print out something of type X? You need to have a generic function that dispatches via the type in an extensible manner at runtime. And since you can't write something like 'type(X)' in ocaml, it's a little hard to perform that dispatch.

I don't know about the technical feasibility of adding introspection to ocaml, but I get the feeling if that feature was there, it'd break ocaml's much hyped (and rightfully so) type-safety.

And that carries over into the comments on OO. Objects in ocaml are basically there for stuff where you want to dispatch at runtime, or duck type, or whatever you want to call it, on an unknown type. For example, most of the GUI front ends use ocaml objects because that's the easy way to call 'onClick' or 'repaint' or 'onDoubleClick' for a random unknown widget.

### Unsexy types

Many of the responses to your original question suggested that what makes ML useful is various advanced typing tricks like phantom types and type-indexed values. I can confirm from my experience that these are really useful tricks, and we use them more and more.

But I think people's eagerness to show off the neat stuff leads them to under-emphasize the fact that most of the advantage of types in a language like ML comes from completely vanilla uses of the type system. One of our programming maxims at Jane Street is to "make illegal states unrepresentable". Algebraic datatypes, parametric polymorphism and abstraction can be used to great effect in this regard.

I think that people who are new to ML think of the type system as merely a constraint, that is to say, you write code as you did before, and then you see whether the compiler accepts it. As people get more experienced, they start to use the type system as a tool in its own right, and one of the things that is constantly on one's mind is how to pass off as much of the proof burden as possible to the compiler. Most of this work can and should be done using boring old vanilla ML types.

### I would tend to agree. The

I would tend to agree. The major thing to get across is the power (and fun) of typeful programming.

### Code examples?

I'd be interested to see source code examples of your most useful types, so that I can consider how I would approach the same problems.

Some problems that you can solve with type systems can also be solved in other, better, ways.

EDIT: If it were any other topic I would not add the disclaimer that this comment is not intended to be inflammatory in any way. I've studied both ML and Haskell to some extent and I respect both.

### Examples

It's hard to come up with pithy examples of the utility of types because small examples are always kind of trivial (except for the sexy-types examples, which are cool and fairly compelling, but in reality are not the main story.) I guess if you want one particularly compelling and pervasive use of ML-style types, I would point to the pattern matching support, and in particular the exhaustiveness checking done by the compiler.

The thing I love about ML-style types is that they make programming and reading other people's code pervasively easier. There's just a whole class of errors that you can stop worrying about because they can't happen. Every individual example you point to seems trivial; but the aggregate effect is huge.

I think that the value of types gets bigger as your codebase gets larger and more people hack on it. Types are a great way of encoding, communicating and enforcing invariants, and the more people involved, the more important that is. I don't want to overstate things. ML-style types are hardly a panacea. But for the kind of work that we've been doing, they've been an incredibly good fit.

### Making life simple

Thanks for the well-written reply. I feel the same way about some of my favourite abstractions, e.g. Unix processes instead of threads also eliminate whole classes of error that you can simply stop thinking about. Invariants are good things!

### chocolate and peanut butter?

hm, is Alice ML the closest mix? If only it had a debugger (as Erlang does).

### Type-level hackery

Some problems that you can solve with type systems can also be solved in other, better, ways.

I agree with Luke here. Embedding a Prolog interpreter into Haskell's type system or adding heterogenous lists, or lambda calculus in C++ templates are all nice hacks, in the "How far can we push it?" sense. But there must be better, more direct ways to do this which the author, and more important, fellow programmers, can still understand when coming back to the code after a month. If not, maybe the language needs revising or is not suited for the problem at hand.

Sometimes, there are hidden costs attached to a solution, for example incomprehensible error messages from the compiler. Page-long error messages are familiar with C++ template meta-programmers, and I have seen Haskellers looking puzzled at the output of GHC, too.

I have seen demos where FACT compiling simple functions for ages because of all the evaluation happening at compile time. If we consider a frequent compile-debug-run cycle as promoted in test-driven development (and I believe also for statically typed languages this is good practice), it should be obvious that is not the way to go.

One of my favourite quotes is from Brian Kernighan:

Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.

Myself, I am flipping back and forth between languages with static type systems, and with run-time type systems, whatever is more appropriate. Sum types and DSLs with rigid checks blur the line pretty much.

### Carpentry

There are attached to many modern power tools, flimsy plastic safety guards designed by committees of lawyers. These are not only obnoxious but dangerous since they force the tool user to focus on holding them out of the way instead of on the actual cutting.

On the other hand, there are jigs and pushers (such as the fence on a table saw) that are very helpful and without which it would be virtually impossible to cut a straignt line, or a circle, or edge a small piece on a shaper.

In my limited experience, OCaml's type system does a decent job of being more like the latter than the former.

Josh

### Good Point

One of the things that I feel got lost in the shuffle of objects is a strong notion of the abstract data type: all you can do with them is apply the exported operations of the type on them. So once you've identified those, put them in a module, keep the type abstract, and Bob's your uncle. It's true; you can make a tremendous number of illegal states unrepresentable doing nothing more than that, and that code (to me) ends up having a rather remarkably Pascal-ish flavor to it. It can actually be a relief, in my experience, to be able to stop there, and not have to worry about how this type might be extended, variance issues, etc.

### Illegal states

One of our programming maxims at Jane Street is to "make illegal states unrepresentable". Algebraic datatypes, parametric polymorphism and abstraction can be used to great effect in this regard.

This is indeed a very good technique. I use this in Haskell a lot, for example by avoiding "naked" data structures. I tag them with newtype instead and make sure that operations only accept those.

Contrived example:

newtype Sorted a = Sorted a
sortThings :: [a] -> Sorted [a]
-- ...
operateOnSortedThings :: Sorted [a] -> b


So, in other words, I am using ADTs, and over time I have caught quite some bugs which went unnoticed in my Haskell code because they were obscure corner cases (although we have manually written test cases, and even a random test generator). However, this approach is not limited to languages which are statically checked. Assertions and proper ADT design help a lot. Whether I have to write assertions on the type system level, or as assertions, makes little difference to me.

### Illegal states

The main difference it makes is that assertions in the type system help your prove that your invariants are correct when the code is written, while ordinary assertions tell you when your invariants happen to be violated at runtime. That's a pretty big difference in my book...

### Illegal states

You are certainly right.

I will go and reflect on why I don't feel much of a difference. Perhaps I start testing earlier in a language without static type system, and thus find errors normally caught by a type system while doodling with the code interactively, or via the test harness. Which is not as good as a correctness proof by the type system, of course.

### Proofs considered incidental

I wrote a bit about this in a comment entitled The Unreasonable Effectiveness of Testing.

The summary is that the same properties of programs that type inferencers rely on (type dependencies between terms) are exploited by tests, so it doesn't require a huge number of tests to achieve a high degree of type correctness for a program.

In most cases, the correctness proof itself is not the most important advantage of static checking of a program. Rather, it's the type discipline that must be followed in order to achieve that correctness proof. The final proof is like the certificate you get when you graduate from college: it's not that important by itself, but to have obtained it you must have done a lot of work, at least some of which is important and necessary. You can do the work without getting the certificate, but many people don't, and in the programming case they may pay a price for that in terms of programs with poor type discipline, which can have consequences for reasoning and the "ilities".

Note that there may, of course, be situations where for some reason or other, a type soundness proof is considered essential. But those cases are fairly rare in practice. I wrote a bit more about this in the linked comment.

### Proofs considered essential

I agree with your basic point that there is a lot of overlap between the constraints imposed by a type system and those imposed by a testing regime. But I don't think that this leads to the conclusion that one can therefore do without the type system without paying a price, much as one can't give up testing without paying a price.

I find that a key advantage of type-driven programming is that it make it easier to design and enact the type discipline I want to follow. Having the verification step taken care of automatically frees me up to worry about other issues.

Moreover, writing good tests is difficult and tedious work, and reducing the scope of the things that must be tested is a big win. The reason to prefer type-based proofs when they are available is not just that they are less tedious to accomplish, but also that they are in the end more convincing than tests. Type definitions are typically smaller and easier to read and understand than test suites.

Your wrote the following in the post you referred to:

The "guaranteed type-safe in production" point is just not as big a deal in practice, much of the time, as many of its proponents would like to believe — other than from a marketing perspective, where it's a very effective hook which plays to a desire for perfection. Many people like to believe that their systems are just about as mission-critical as ICBM control software, and require as much scalability as Amazon or Google. In this context, static checking seems like a no-brainer.

The future applications of static checking to other kinds of properties, like concurrency, sound very promising, and could very well provide some much stronger arguments in favor of static checking, but at the moment these seem almost entirely academic.

Our systems aren't ICBM control software and we don't need to scale quite as big as Amazon, but static checking is, in my experience, a huge advantage. Yes, I'd love to see fancy type systems to make it easier to write concurrent systems and do all sorts of other neat things. But even without all that, it is clear to me that the simple, vanilla ML-style types significantly improve our ability to build efficient and correct software quickly.

### Essential tests check types anyway

But I don't think that this leads to the conclusion that one can therefore do without the type system without paying a price

I said something similar in the grandparent comment: "they may pay a price for that in terms of programs with poor type discipline, which can have consequences for reasoning and the "ilities"". So I don't think we're disagreeing on that point.

I was mainly observing that the emphasis on proof as the primary benefit of static checking is, I think, a simplification, which can sometimes lead to the broader set of advantages of type-aware programming being underestimated or misunderstood.

I should also mention that I'm thinking in terms of dynamically-checked languages like Scheme and Erlang — both of which have various kinds of type inference and analysis tools available, which reflects a degree of type-friendliness that isn't found to the same extent in some of the more widely popular dynamically-checked languages.

Having the verification step taken care of automatically frees me up to worry about other issues.

Moreover, writing good tests is difficult and tedious work, and reducing the scope of the things that must be tested is a big win.

It isn't usually necessary to write tests to check types. Write tests to exercise functionality, and type errors will fall out as a matter of course. It's difficult to come up with practical scenarios in which a test can produce the right results while somehow concealing a type error. In practice, a good test suite is a very effective guard against type errors, even if it pays no specific attention to checking for them.

The reason to prefer type-based proofs when they are available is not just that they are less tedious to accomplish, but also that they are in the end more convincing than tests.

In an absolute sense, this is clearly true. But my argument is that if you consider having a good test suite to be a requirement, then the difference in degree of convincingness is not usually very important in practice — tests are "good enough".

Type definitions are typically smaller and easier to read and understand than test suites.

This is an excellent point, and it's why type-aware programming in dynamically-checked languages often involves commenting procedures with their types (obviously not an ideal solution), or more rigorously, using an assertion or contract mechanism (like Contracts for Higher-Order Functions) to encode the types in the program (typically at the module level). These also work well in conjunction with tests.

But even without all that, it is clear to me that the simple, vanilla ML-style types significantly improve our ability to build efficient and correct software quickly.

I don't disagree with that sentence as written, since it could apply to Scheme and Erlang just as well as ML. No doubt you'll now want to qualify the sentence. ;)

### It isn't usually necessary

It isn't usually necessary to write tests to check types. Write tests to exercise functionality, and type errors will fall out as a matter of course. [...] In practice, a good test suite is a very effective guard against type errors, even if it pays no specific attention to checking for them.

Where is the data to support your claims?

It's difficult to come up with practical scenarios in which a test can produce the right results while somehow concealing a type error.

No it isn't. For example, it is a fairly common idiom in untyped (and stupidly typed) languages to return some special value, e.g. #f, false, None, NULL, null, 0, or -1, to indicate failure or non-existence of result. The problem with this is that it is easy to forget to check for such values everywhere. A test with a valid value will produce the right results while conceiling the fact that the program fails spectacularly under some circumstances. Does NullPointerException ring a bell? Or how about "expects argument of type <X>; given #f".

### Perhaps nuanced arguments are doomed

Where is the data to support your claims?

What kind of data would satisfy you? How about the economic success of companies using languages like Python and Erlang?

I'm describing a fairly common experience for people using good testing practices with dynamically-checked languages. Joe Armstrong has made a similar observation about Erlang, backed by experience with post-hoc typechecking, as quoted in this LL1 message.

On the other side, I haven't seen any data that indicates that lack of static typechecking results in significant problems in practice, when good testing practices are used.

A big part of this issue is a practical point about the consequences of a runtime type error in production. For large classes of systems, a small number of such errors when a program is first deployed are quite acceptable, particularly if such errors occur mainly as a result of user input errors.

Again, I'm not saying that there's no benefit to static checking, but that the (limited) perfection of an automated static proof often seems to be overemphasized as a selling point in its own right. Perhaps this is because it's much easier to point to a static proof capability than it is to enumerate all the ways in which using types well can improve a program and its properties. If that's the case, then I think it's worth keeping sight of the fact that there's much more to the issue than "just" the static proof.

A test with a valid value will produce the right results while conceiling the fact that the program fails spectacularly under some circumstances.

That's true, but of course proper testing includes testing corner cases and erroneous inputs. What I was saying was that in a given test, if you get the expected output for a given input, it is very unlikely that a type error actually came into play during the computation in question. Further, by testing with erroneous input and corner cases, which is necessary in any case, the majority of type errors are typically detected. If that were not the case, I think you'd hear about it in the TDD community at least.

I also happen to believe that this applies even more strongly to largely functional programs written in a type-aware way. You mentioned NullPointerExceptions, which of course are ubiquitous in Java, which has a static type system, albeit a weak one. A big reason that NPEs are common in Java is its imperative nature: for example, variables often start out having a null value, and are later mutated to the value they're supposed to have, and may even later be mutated back to null. Reasoning about this is complicated by imperative indirection, i.e. that producers and consumers often communicate indirectly, via mutation. The "proper" solution to this problem with Java is not better static typechecking.

Besides, NPEs can also be largely eliminated during testing, so their existence (as with "given #f" errors) doesn't negate my point.

### PLT not economics

How about the economic success of companies using languages like Python and Erlang?

Sorry, I must have been reading the wrong blog. I thought that LtU was about programming language theory (PLT) rather than economics. Certainly, if someone makes money using programming language X then that definitely proves in PLT ... nothing.

Besides, NPEs can also be largely eliminated during testing, so their existence (as with "given #f" errors) doesn't negate my point.

What the hell is your point? That type errors are caught by ordinary testing of functionality, because there are successful companies using untyped languages. That's the lamest argument I've ever heard at LtU.

Seriously, the problem with testing for proper handling of illegal values is that you should then test it everywhere. Every function is a suspect and should be tested with every possible combination of illegal values. In a sane type system, however, no such testing needs to be done.

What I want is data that shows that ordinary testing of functionality is also effective against the kind of errors caught by type systems.

### Infinite testing takes too long

Sorry, I must have been reading the wrong blog. I thought that LtU was
about programming language theory (PLT) rather than economics.

On the Departments page, you'll notice quite a few categories outside the "Theory" tree, quite a few of which are difficult to reconcile with the PLT-only view. IMO, one of the strengths of LtU has been the interaction between theory and practice. Perhaps in pure mathematics, theory can flourish without reference to practice; the same is not true of PLT in general. As a supporting reference, I'll cite Alan Perlis' foreword to SICP.

In this particular case, tradeoffs do exist in typed vs. untyped systems, and the choices between those tradeoffs can't usually be made purely on formal theoretical grounds.

Perhaps a more important issue for LtU is that existing type systems do not represent the zenith of PL development. There is a significant amount of work on type systems that explores beyond the boundaries of the classic HM-inspired systems. The larger space that is being explored is a subset of the untyped space, and the characteristics of that space are quite relevant. This brings us to the issue at hand:

Seriously, the problem with testing for proper handling of illegal values is that you should then test it everywhere. Every function is a suspect and should be tested with every possible combination of illegal values.

You might think so, but you'd be wrong in practice, if your goal is to achieve working systems without spending infinite resources on testing. Static type checking doesn't eliminate the need for testing, and for realistic systems, it doesn't even change the theoretical requirement for infinite testing into a finite requirement. So we have to compromise in our testing, no matter what. It turns out that in practice, in a large number of cases, it is viable to compromise by not performing static type checks. That's an interesting phenomenon in itself, and certainly for the practice of programming, it's one that's worth having some understanding and appreciation of.

What I want is data that shows that ordinary testing of functionality is also effective against the kind of errors caught by type systems.

As I've said, like it or not, some of the best evidence of this is in the form of companies producing working, sufficiently bug-free programs in dynamically-checked languages. While it's reasonable to decide that you're not interested in such things because you prefer to work with the formality of static type systems, I think it's less reasonable to argue that such experiences should be dismissed or ignored because they don't fit your favored model.

### Still waiting for the data

You still have not provided any data to support your original claims. You said:

Write tests to exercise functionality, and type errors will fall out as a matter of course.

I think that you should write more carefully. Instead of claiming that testing of functionality catches type errors, you should say that some people make money by utilizing untyped languages, which is what you really mean.

I really have no interest to discuss economics here. I'm also not interested in debating such subjective issues as when a program is "sufficiently bug-free".

### Proof doesn't go far enough

The Erlang example is data to support my original claim. You've basically argued that it doesn't count, but the fact remains that type errors are demonstrably caught by testing that doesn't specifically focus on catching type errors.

I really have no interest to discuss economics here. I'm also not interested in debating such subjective issues as when a program is "sufficiently bug-free".

Unfortunately, most non-trivial programs cannot currently be formally proven free of bugs, so "sufficiently bug-free" is the only criterion possible. Your disinterest in discussing this is certainly at the heart of our disagreement. From my perspective, that disinterest is likely to limit the practical relevance of your theoretical observations.

[Edit: it's worth keeping in mind the definition of economics, e.g. "the study of choice and decision-making in a world with limited resources," or "examines the allocation of scarce resources among competing wants." It's pretty difficult to argue that programs have unlimited resources, or lack competing wants, or that these issues can reasonably be ignored in the PLT context.]

### The Erlang example is data

The Erlang example is data to support my original claim. You've basically argued that it doesn't count, but the fact remains that type errors are demonstrably caught by testing that doesn't specifically focus on catching type errors.

I haven't read the entire article with a microscope, but in the part of the paper talking about the type system, I see no mention of any sort of testing (not to say that they didn't do any) and no claims about catching type errors due to functional testing. IOW, your claims are not exactly backed up by the article.

See here for Joe's description of their testing process. That link is down for me right now, but the page is also in Google's cache.

### Joe's description of their software development process

Joe's description of their software development process includes elements that go beyond testing (design for fault tolerance (many separate points, so I guess this is a big thing in their environment) as well as code and design reviews). Frankly, I'm not convinced that testing is the factor that eliminated type errors from the Erlang kernel libraries. Also, Joe repeatedly makes the point that he thinks that catching type errors at compile time would be nice and is one of the techniques needed to make reliable systems.

Joe also makes a somewhat uninsightful analogy. He compares type checking to spell checking. In fact, I think that the analogy is reasonable, but only as far as the language being considered is untyped and the type system is of the post hoc kind without abstract types. The analogy breaks down when you actually consider a typed language with abstract types. Then the type system no longer checks only the kind of errors that a spell checker would check, but also checks for domain specific errors encoded by the programmers. Also, for a proper analogy with a typed language, you would need something like a spelling and grammar aware editor that only lets you write sentences with correct spelling and grammar (thus you will have no such errors by construction).

### If a type appears in a forest that's not statically checked...

Re the Erlang development and testing process, Joe's first six points are fairly standard even outside telecoms applications, or at least aspired to. I agree that the entire process helps to catch errors, that's why people do it, or try to. However, as the Erlang experience shows, static type checks are not an essential part of that process, even if they're desirable. This is particularly true if the only goal is to rule out "runtime type errors" to at least the same degree that other kinds of bugs are usually ruled out.

However, that shouldn't be the only goal. I perhaps need to repeat that I'm not arguing against the use of static type systems. I am saying that the reason that they're desirable has less to do with the static proof they provide, in most realistic cases, and more to do with the ways in which they help reasoning, promote a type-driven style, and raise conscious awareness of the static properties of programs.

This perspective does, however, naturally lead to considering to what extent it is possible to program in a type-driven (or type-aware) style in the absence of automatic static typechecks, and to what extent the benefits of that style are independent of an automatic static typechecker. Automatic tools can provide proofs, but the absence of automatic tools doesn't mean that proofs aren't possible, or that theories about programs, borne out by the empirical evidence of testing and live deployment, are incorrect. Theories about programs may even be supported by partial proofs, or intuitive reasoning. Theories about types in such programs are not necessarily post-hoc. The automated checking of types should not be confused with the presence of types.

Re the points about abstract types vs. checking of representations, I think you may be drawing an inappropriate comparison. Dynamically testing assertions about types requires a tag-like mechanism. The presence of tags at some level in a language or program, where they might not be present in an equivalent statically-checked system, doesn't necessarily mean that the former is dealing with representations while the latter is not. If you disagree, perhaps we should look at a small example.

It also may be worth re-mentioning Contracts for Higher-Order Functions, which demonstrates how checking of higher-order contracts can be performed in a dynamically-checked language. This approach can be used to test assertions about higher-order types, and more.

However, even in the absence of such tools, it is possible for programs to make use of higher-order types and have errors in those types be effectively detected during testing. The reason is simple: even though a higher-order type may not be explicitly expressed in a program, assertions and the usual dynamic tag checks can still catch problems with the components of such types. If none of the components of a higher-order type violate the associated assertions or tag checks, chances are high that the higher-order type is being used correctly. If that's somehow not the case, it's still very likely that a good testing regime will uncover the problem.

Again, I think a long history of practical experience provides evidence of this: higher-order programming in a language like Scheme would be nightmarish if it weren't possible to reason about the types of higher-order functions, and be able to reach a reasonable degree of assurance that those types are being used correctly before deploying the program.

### Crash crash crash

Note: The Erlang systems I work on are littered with type errors and routinely hit them at runtime. However the process abstraction and design principles make the systems exhibit satisfactory behaviour in the presence of these errors. Crashing is no big deal in Erlang.

### Ouch

That'll be a disappointment to Joe. :) Do these systems use type assertions at all? Do you have any sense as to what allows these errors to slip through testing? Does crash-resistance lead to a blasé attitude to errors?

I used Erlang as an example because of Joe's comments, and because his decription sounded similar to my experience with even just reasonably well-tested systems. I didn't realize that the rest of the Erlang industry were such cowboys. ;)

To quote Patrick Logan, "All we have are stories, but I think they are worth telling if for no other reason than to root out contrary stories (which I have not see[n])."

### Oddly...

...this story doesn't bother me, perhaps because other Erlang programmers have done a good job of explaining the failure-oriented philosophy of Erlang, and perhaps also because I can see how message-passing concurrency, and isolating state in mnesia instances, can help isolate invalid states in a fashion similar to what Yaron Minsky observes about O'Caml at compile time. Similarly, I don't worry much about Oz being dynamically typed thanks largely to its single-assignment logic variables. As has been pointed out in this thread, there's more than one way to avoid shooting yourself in the foot with illegal operations/states, and there's a lot of value to taking a reasoned comparative approach to them.

Incidentally, Anton, you may find Felix 1.1.3 interesting. Felix has been touched on, off and on, here, but this release seems especially intriguing: it now boasts Haskell-style typeclasses, user-definable axioms, reductions, and lemmas, and generates Why output as well as C++. Why fronts several interactive theorem provers as well as several decision procedures, once of which is Ergo, which understands Why syntax natively. Felix 1.1.3 + Ergo, generating C++, might be the most pragmatic prove-at-compile-time toolset available today, but I haven't yet tried to do anything substantial with it.

### Not so fast

I'm familiar with the crash-resistant approach, which is comparable to what's used in LAMP web applications, where a crash usually only affects the current request. However, while some kinds of errors might be transient, type errors in those environments usually aren't — give them the same input, you'll get the same error, and this usually translates to a user not getting the desired result until the code is fixed. So you really want to try to avoid them at runtime, in most cases that I'm familiar with.

I don't know what the situation is with the Erlang systems that Luke has described; presumably the type errors in question aren't preventing someone from making a phone call, or whatever (or if they are, I'd be horrified, and I don't think lack of static checking is the issue).

Thanks for the Felix link. I'm not sure that's quite my cup of tea, but I'm glad to see such things aimed at the C++ world (it bills itself as "successor to C++").

### Cold hard truth

Typically the crashes will occur in an obscure code path that hasn't occured during testing. It will affect only those calls/sessions/whatever that happen to follow the same path and are almost by definition infrequent. The developers will look in the log file, see the crash message, instantly diagnose the problem (type errors tend to be no-brainers) and apply a patch directly to the running system to make it disappear. Just like in Erlang: The Movie!

There is some disruption of service for some users and it's certainly an undesirable fault, but it's still not the end of the world. I would certainly not trade flexibility in hot code loading etc to reduce this errors since our entire way of doing business is based on those specific qualities of Erlang.

I'm much more concerned about logic errors that can e.g. cause a million unwanted SMS to be sent or a heap overflow. I don't see type errors at the root of these issues in practice.

I could run Dialyzer right now and see 20 obscure bugs in the system but it isn't clear that my time wouldn't be better spent adding new features instead. Of course if these bugs caused the whole runtime system to terminate with an obscure error message then it would be another story!

See also the discussion of Heisenbugs in Why do computers stop and what can be done about it?

NB: This is my personal opinion and doesn't necessarily reflect the Erlang world view. Each product has to make its own trade-offs on speed of releasing features, probable error rate, etc.

### but...

Today I'm hacking a compiler written in Erlang and for this particular module I would rather like to have a good static type checker. Those are nice for manipulating complex data structures :-)

### Quite

Types (like the other things we worry about) are tools. Reasonable people can argue about tools. Competent professionals know when to use different tools.

### Regarding the LL1 message

I'm not familiar with the Erlang kernel libraries, but I would suspect that having no errors in them is largely due to having almost every program exercise them so they are effectively much more thoroughly tested than what can be achieved with ordinary application code and them being much simpler and more clearly specified than typical application code.

Furthermore, abstract types are one thing that typical experiments with post hoc type systems ignore completely. Type inferencers are rarely smart enough to automatically insert abstract types into an unannotated program. In fact, I've yet to read about such a type inferencer.

The lack of abstract types in typical post hoc type systems basically means that ordinary typed programs have fundamentally stronger invariants proven by their type systems. In a typed language with abstract types you can use type abstraction to enforce (once and for all) that one can not write programs that construct illegal values. IOW, programmers utilize the type system to prove domain specific properties of their programs. On the other hand, a post hoc type system typically only verifies that a specific fragment of program code treats data representations (not abstractions) consistently (e.g. uses only string operations on strings). A post hoc type analysis typically says nothing of the code not included in the analysis (most of which is typically written after the analysis).

### Data

By this reasoning, I can point at the number of companies successfully writting software without either static typing or test suites, and conclude that neither are needed.

### Yes

That's kind of the point, yes. Even companies without test suites do testing of some kind — it's impossible not to for non-trivial programs. And that testing detects enough of the program's bugs for those companies to achieve their goals. To these companies, and to companies which use test suites, a static proof of type soundness is clearly not a very high priority. They're unlikely to be convinced of the benefits of static typechecking by being told that it will give them a static proof of type soundness. That's why, in the first comment which I posted in this thread, I wrote "In most cases, the correctness proof itself is not the most important advantage of static checking of a program."

### Let me just add...

...that I agree that "being told that it will give them a static proof of type soundness" is indeed pretty useless information without a whole lot of background information about the Curry-Howard Isomorphism and the use (or lack!) of typeful programming, vs. mere "I managed to get the code past the type checker" programming—background that obviously has nothing to do with the problem domain per se other than the need for the "typeful programming" to result in types that accurately reflect the constraints imposed by the domain. This clearly relates to the observation that there are frequently good economic reasons why a rigorous approach to static typing in an industrial setting is not undertaken, particularly given the sorry state of the currently-popular statically-typed languages.

### Interpretation

When I see claims of this nature, it reinforces an unfortunate opinion: that the writer only thinks of types as defining how values are represented, so if the tests don't reveal, e.g. that you're trying to treat a string as an integer, then you've accomplished what static typing does. Indeed, this essentially used to be my perspective, and from this perspective, "values rather than variables have types"—that is, the "dynamic typing" way—makes perfect sense.

But once you begin to think of types as "defining the permissible set of operations on a value" and using types to make invalid operations impossible at runtime or, to use Yaron Minsky's take, "make impossible states unrepresentable," you run into typing and testing's dual nature, and you're quite literally wasting your time, e.g. testing if your binary tree insert function maintains order if the type guarantees that it does (this is admittedly said in the context of an unusually expressive type system, but then, that's part of my point). This is good news: it means that you can concentrate your testing on where it counts, i.e. those aspects of your system that cannot be represented statically no matter how rich your type system is. But just as C, C++, and Java are poor representatives of the statically-typed world, the JavaScripts/Pythons/Rubies of this world are poor representatives of the dynamically-typed world: I encounter trivial type issues in them constantly, contrary to the claims of their respective camps, which is why I'm happy that research into gradual typing etc. remains ongoing.

### Not relevant

e.g. testing if your binary tree insert function maintains order if the type guarantees that it does (this is admittedly said in the context of an unusually expressive type system, but then, that's part of my point).

This argument says "just you wait, when these advanced type features trickle down you'll no longer be able to deny the awesome power of static type checking". That may be true, but it's irrelevant to the current discussion.

[Edit: unless you can think of examples of such type system features being used in practical systems today, which I'd certainly be interested to hear about.]

### I haven't read the code

I haven't read the code myself, but I believe darcs does an amount of stuff like this.

### Phantom witnesses

Thanks for the tip. I see that on darcs-devel, David Roundy describes using GADTs:

...to create phantom types that act as witnesses to the context of a patch. The idea is to attach to each patch a "before" and "after" state, and use the type system that keeps patches from being used out of order.

To oversimplify, the (Patch) type becomes (Patch a b), and one can enforce things like

commute :: (Patch a b, Patch b c) -> Maybe (Patch a d, Patch d c)


The catch is that the above type signature doesn't enforce that "d" is a "new" context, and wouldn't work. Instead we have to define lots of GADTs. The benefit is that it becomes very, very hard to write wrong commute-related code.

Nice.

### Existence Proofs...

...don't need to trickle down; they demonstrate that you already can't deny the awesome power of static type checking. Thankfully, I know that's not your argument. :-) Anyway, why is it irrelevant to the current discussion? I understand the current discussion to be precisely about how much milage we can get from basic abstract data types. Sure, I'd like it if Epigram or Cayenne or Concoqtion suddenly became wildly popular, but there's a great deal we can do short of that. Start with ML or Haskell and ADTs and with a little forethought some classes of bugs become impossible. Toss in just a dash of Olegian magic and we have lightweight static capabilities and still nary a GADT, let alone a dependent type, in sight and, e.g. in statically eliminating array bounds checks or empty list checks, there are a couple more classes of bugs that we don't have to bother testing for. GADTs make that set larger; dependent types make it larger still. A complex program will still have plenty left to test! I just like being able to knock as much stuff off of that list in advance as I possibly can given some minor investment of my time. I understand that other folks draw that economic line somewhere else, and that's fine.

As for practical systems, I think Yaron is talking about his practical systems and how, even without such rich type systems, a typeful programming approach has demonstrated advantages over an untyped approach. For some level of richer types having pragmatic benefit, I'd first refer to the liftweb thread. Finally, Philippa's point re: darcs is well-taken.

### Trickle-down to practical

Trickle-down to practical systems is still important because practical issues can reduce the theoretical benefits of some of these techniques. In particular, economic issues can come into play that are rarely considered at the theory level (nor should they be, usually).

My point about relevance is that Yaron started out describing how "most of the advantage of types in a language like ML comes from completely vanilla uses of the type system." One of the main points I'm trying to make is that we shouldn't underestimate how much of that benefit comes from a typeful approach to program design, regardless of whether automated static checking is used. In this comment, michaelw described how detection of illegal states can be done without static checking; this matches my experience, and also seems to match the Erlang experience.

Even hi-tech type systems examples like the "binary tree insert function maintains order" which you mentioned can be tested for, without requiring sophisticated type machinery (with the attendant waiting for the next version of the compiler to arrive and solve your latest problems). Certainly, you don't get proof from such testing, but you do get the ability to disprove, which physicists seem to manage quite well on.

### Perlis vs. Dijkstra

In comment-32242, Anton wrote:

Unfortunately, most non-trivial programs cannot currently be formally proven free of bugs, so "sufficiently bug-free" is the only criterion possible.

Hear, hear.

Compare the notion of sufficiently bug-free to the Lakatosian notion of believable theorems beautifully presented by De Millo, Lipton, and Perlis in Social processes and proofs of theorems and programs:

Many people have argued that computer programming should strive to become more like mathematics. Maybe so, but not in the way they seem to think. The aim of program verification, an attempt to make programming more mathematics-like, is to increase dramatacilly one's confidence in the correct functioning of a piece of software, and the device that verifiers use to achive this goal is a long chain of formal, deductive logic. In mathematics, the aim is to increase one's confidence in the correctness of a theorem, and it's true that one of the devices mathematicians could in theory use to achive this goal is a long chain of formal logic. But in fact they don't. What they use is a proof, a very different animal. Nor does the proof settle the matter; contrary to what its name suggests, a proof is only one step in the direction of confidence. We believe that, in the end, it is a social process that determines whether mathematicians feel confident about a theorem -- and we believe that, because no comparable social process can take place among program verifiers, program verification is bound to fail. We can't see how it's going to be able to affect anyone's confidence about programs.

Dijkstra wrote up a quick and cranky rebuttal of an early version of the above paper. Nothing to see there. (Really. His rebuttal is remarkably devoid of any interesting content.)

I find it very amusing that 35 years past the publication of "Social Processes", the state of the art in formal verification has remained at the same stage that it was at in 1972:

The only really fetching defense ever offered for verification is the scaling-up argument. As best we can reproduce it, here is how it goes:

1. Verification is now in its infancy. At the moment, the largest tasks it can handle are verifications of algorithms like FIND and model programs like GCD. It will in time be able to tackle more and more complicated algorithms and trickier and trickier model programs. These verifications are comparable to mathematical proofs. They are read. They generate the same kinds of interest and excitement that theorems do. They are subject to the ordinary social processes that work on mathematical reasoning, or on reasoning in any other discipline, for that matter.
2. Big production systems are made up of nothing more than algorithms and model programs. Once verified, algorithms and model programs can make up large, workaday production systems, and the (admittedly unreadable) verification of a big system will be the sum of the many small, attractive, interesting verifications of its components.

With (1) we have no quarrel. Actually, algorithms were proved and the proofs read and discussed and assimilated long before the invention of computers -- and with a striking lack of formal machinery. Our guess is that the study of algorithms and model programs will develop like any other mathematical activity, chiefly by informal, social mechanisms, very little if at all by formal mechanisms.

It is with (2) that we have our fundamental disagreement. We argue that there is no continuity between the world of FIND and GCD and the world of production software, billing systems that write real bills, scheduling systems that schedule real events, ticketing systems that issue real tickets. And we argue that the world of production software is itself discontinuous.

This 35-year old debate between Perlis and Dijkstra seems to be mirrored in the above discussion between Anton (cast as Alan Perlis) and his worthy opponents. For example, substitute "static typing" for "algorithm" in the following paragraph:

C.A.R.Hoare has been quoted as saying, "In many applications, algorithm plays almost no role, and certainly presents almost no problem." (We wish we could report that he thereupon threw up his hands and abandoned verification, but no such luck.)

### I find it very amusing that

I find it very amusing that 35 years past the publication of "Social Processes", the state of the art in formal verification has remained at the same stage that it was at in 1972

Hardly. We have numerous working theorem provers now, and a large body of research on how to verify programs in each of these systems, and even extract working programs from them (see proofs of various compilers and abstract machines like OCaml's). With the recent release of Concoqtion, verification has become even more accessible. Was any of this available in 1972? Even ML wasn't available back then.

To say that there is significant research still to be done on how to make verification even easier for an even larger body of programmers (like refinement types for instance), is not to say that our wheels are just spinning in the mud. Lightweight static capabilities (LWC) is an example of a currently accessible verification technique for everyday programmers, that has no equal in any untyped/dynamically typed language.

There is no doubt that software development as a whole is a social process, as open source has adequately demonstrated. Static type systems, and verification by extension, simply enrich the dialogue and ensure these "social interactions" are on a sound, consistent footing in a way that is otherwise not achievable.

This 35-year old debate between Perlis and Dijkstra seems to be mirrored in the above discussion between Anton (cast as Alan Perlis) and his worthy opponents. For example, substitute "static typing" for "algorithm" in the following paragraph:

C.A.R.Hoare has been quoted as saying, "In many applications, algorithm plays almost no role, and certainly presents almost no problem." (We wish we could report that he thereupon threw up his hands and abandoned verification, but no such luck.)

Now that's an interesting statement. :-)

As a developer working on just such "real applications", I can attest that "pure algorithms" play a marginal role. Fortunately, static typing and theorem proving is not about just verifying "pure algorithms", but about structuring your data safely, and consistently so as to maintain invariants and properties that the developer knows to be true about his system; even "real bills" have invariants. Without exploiting static typing as in LWC, a developer has no way to express and verify these properties except via rigourous, tedious, and therefore frequently non-exhaustive testing; even then, absence of evidence is not evidence of absence, which is a critical difference in the confidence achievable via typing vs. testing.

So Anton is right that static typing enforces structuring discipline and consistency, but that's not all it does.

### We choose to formally verify in this decade

So Anton is right that static typing enforces structuring discipline and consistency, but that's not all it does.

I'll use this as an opportunity to emphasize that I'm not saying that's all that static typing does. I do think that it's a more useful property than the proof aspect in many cases, though.

Of course, it's difficult to cleanly separate the proof aspect from the structuring discipline — although soft typing and similar approaches do do that to some extent, giving hints rather than orders to the programmer.

Although I'm enjoying this subthread, I also want to say that I don't see very much connection between the points I was making and the broader question of the viability of stronger formal methods. Sure, it'll be a while before the average developer in, say, corporate IT has to worry about attending an Alloy training course (although I'd enjoy being proved wrong). But the last thing we want is for the state of the art in corporate IT to be the model for the future of the field.

At the very least, work on formal methods serves "to organize and measure the best of our energies and skills", as Kennedy put it in his moon speech. But formal methods have already achieved more concrete things than that.

### My, How Time Flies

el-vadimo: This 35-year old debate between Perlis and Dijkstra seems to be mirrored in the above discussion between Anton (cast as Alan Perlis) and his worthy opponents.

I hope that we've transcended the "opponent" aspect of the debate—primarily by acknowledging two things: first, that economic considerations frequently legitimately drive the static/dynamic choice toward the dynamic side, essentially for reasons of reduced cognitive burden as the interpreter/compiler essentially never complains to you; and second, that various novel technologies including, but not limited to, multi-staged programming, abstract interpretation, etc. make the static/dynamic distinction blurry relative to what is traditionally presumed to be a sharp compile-time/runtime phase distinction.

Having said that, I can't help but notice that if we accept 1972 as the year that this debate reached its zenith, antedates Automath by only five years, Nqthm by only one year, and predates LCF, HOL, Isabelle, ACL2, Mizar, the entire OBJ family, Twelf, and, of course, Coq, to say nothing of many others.

In other words, the proofs they were talking about, and the compexity of what was being proven, were constrained by what could reasonably be proven by hand, and the social aspects being referred to revolved around how much you trusted a specific logician to get the proof right. The proofs of FIND and GCD in the quote above were not automated. The situation today is vastly different, with theorem provers and model checkers being applied in a variety of high-assurance domains. Coq is especially interesting for its ability to extract certified programs from proofs, supporting Scheme, O'Caml, and Haskell as target languages. Also, it's interesting to note that Hoare's FIND proof has been automated in Coq, and as Coq developments go, it's trivial.

Having said all of that, the question of why you should trust an automated verifier any more than a human one is a good one. One reason that I'm prepared to trust proofs in Coq is that Coq's kernel, the Calculus of Constructions, has itself been formalized in Coq.

When it comes to logic, that's the bottom line: mathematical logic has a multi-millenial history that has stood up extremely well to the demands of Popperian scientific philosophy, offering both explanatory and predictive power along with reproducibility. As Anton is quick to point out, programmers are still thinking in terms of types that, in many cases, could be represented statically even when programming in a dynamically-typed language; these languages are generally best thought of as resting on the untyped lambda calculus vs. a typed lambda calculus sitting somewhere on Barendregt's lambda cube, and it's just a question of whether these languages are essentially explicit about that (Lisp and, dramatically more so, Scheme) or not (Python, Ruby, et al.). If the programmer wished to, they could adopt one of the typed lambda calculi, just as Church himself did. The reasons not to are essentially social, rather than proofs lacking value—after all, from the Curry-Howard Isomorphism, we know that anyone working in a statically-typed language is doing theorem proving. They're probably just proving theorems in a not-very-expressive logic, and in the end, the goal of type-theoretic research in programming language theory is precisely to add expressive power to such languages while avoiding the burdens imposed by theorem provers with such expressive logics as those named above.

### Model-checking and model-finding

Having said that, I can't help but notice that if we accept 1972 as the year that this debate reached its zenith, antedates Automath by only five years, Nqthm by only one year, and predates LCF, HOL, Isabelle, ACL2, Mizar, the entire OBJ family, Twelf, and, of course, Coq, to say nothing of many others.

Not to mention that since that time software verification has come to include both model-checkers (such as SMV, FDR, and SPIN), and model-finders (such as Alloy Analyzer), many of which have seen actual use in real industrial projects. Granted, those things aren't strictly "theorem-proving" in the sense that you end up with a readable proof, but they do allow one to prove certain properties of a software system. So it's hardly correct to claim that nothing has changed since 1972.

That said, Anton has raised a very good point: software development, like other engineering activities, is constrained by economics. As in any engineering development, the goal isn't (and can't be) to produce the "perfect" design, but rather one that is "good enough" for the customer and environment. Or, as Anton put it, to create something that is "sufficiently bug-free". In other engineering disciplines, analysis is performed to increase confidence in a design (not to prove the design correct). The quantity of analysis depends on the degree of confidence that is required. Not surprisingly, greater confidence generally comes at greater cost. That's not to say that it isn't useful to develop new tools an techniques that make it easier and more economically feasible to analyze software - the more confidence we can obtain in a given design, the better, in my opinion. Just that I think that a lot of the arguments over dynamic vs. static type systems, and the merits of things like theorem-proving, tend to miss the point that different problems and domains have different needs and different economic constraints.

### Neither tests nor proofs

... and it finds many real bugs in real tested systems.

The work which I often express fondness for, the work of Dawson Engler.

### iceberg looming in the background

In comment-32691, Paul Snively wrote:

... the proofs they were talking about, and the compexity of what was being proven, were constrained by what could reasonably be proven by hand, and the social aspects being referred to revolved around how much you trusted a specific logician to get the proof right.

To lift yet another quote from the paper:

Suppose, however, that an automatic verifier could somehow be built. Suppose further that programmers did somehow come to have faith in its verifications. In the absence of any real-world basis for such belief, it would have to be blind faith, but no matter. Suppose that the philosopher's stone had been found, that lead could be changed to gold, and that programmers were convinced of the merits of feeding their programs into the gaping jaws of a verifier. It seems to us that the scenario envisioned by the proponents of verification goes something like this: The programmer inserts his 300-line input/output package into the verifier. Several hours later, he returns. There is his 20,000-line verification and the message "VERIFIED."

There is a tendency, as we begin to feel that a structure is logically, provably right, to remove from it whatever redundancies we originally built in because of lack of understanding. Taken to its extreme, this tendency brings on the so-called Titanic effect; when failure does occur, it is massive and uncontrolled. To put it another way, the severity with which a system fails is directly proportional to the intensity of the designer's belief that it cannot fail. Programs designed to be clean and tidy merely so that they can be verified will be particularly susceptible to the Titanic effect. Already we see signs of this phenomenon. In their notes on Euclid, a language designed for program verification, several of the foremost verification adherents say, "Because we expect all Euclid programs to be verified, we have not made special provisions for exception handling... Runtime software errors should not occur in verified programs." Errors should not occur? Shades of the ship that shouldn't be sunk.

So, having for the moment suspended all rational disbelief, let us suppose that the programmer gets the message "VERIFIED." And let us suppose further that the message does not result from a failure on the part of the verifying system. What does the programmer know? He knows that his program is formally, logically, provably, certifiably correct. He does not know, however, to what extent it is reliable, dependable, trustworthy, safe; he does not know within what limits it will work; he does not know what happens when it exceeds those limits. And yet he has that mystical stamp of approval: "VERIFIED." We can almost see the iceberg looming in the background over the unsinkable ship.

Luckily, there is little reason to fear such a future. Picture the same programmer returning to find the same 20,000 lines. What message would he really find, supposing that an automatic verifier could really be built? Of course, the message would be "NOT VERIFIED." The programmer would make a change, feed the program in again, return again. "NOT VERIFIED." Again he would make a change, again he would feed the program to the verifier, again "NOT VERIFIED." A program is a human artifact; a real-life program is a complex human artifact; and any human artifact of sufficient size and complexity is imperfect. The message will never read "VERIFIED."

By the way, thanks for the references. Even though you obviously don't read mine, I do read yours :) Oleg's page that you referenced above has a link to How to Believe a Machine-Checked Proof by Robert Pollack.

Thanks also to Derek Elkins for the Dawson Engler link.

### On the Contrary...

...I do read the references, and the lengthy quote that you offer above only reinforces the point that it was written prior to, e.g. an understanding of the Curry-Howard Isomorphism or, more specifically, what type systems implement what logics, and what the limitations of those logics (and therefore type systems) is. To any modern user of any of the systems I've named, it will only occasion a bout of eye-rolling that would not have been deserved in 1972, but is most assuredly deserved today. No one today argues that "runtime software errors should not occur in verified programs;" on the contrary, another Coq contribution is precisely a toolkit to reason about programs raising exceptions.

One of the major reasons that I provided the link to Oleg's comments about believing proof checking was for the link to Pollack's paper, which I had read some time ago. It's an excellent paper, and I'd be very curious to hear Pollack's thoughts on how close we are to having a mechanized metatheory for Standard ML, on the encoding of the Calculus of Constructions in Coq, on the extent to which he feels he can or cannot trust Coq's parsing and pretty-printing, and finally on Coq's maturity and the scope of mathematical logic that has been formalized with it.

So the point remains: theoretical computer science has made real advances since 1972. One such real advance is the identification of the Curry-Howard Isomorphism. This has informed both theorem provers and programming language design in positive ways—one of which has been to clarify the limitations of static type systems and even extremely expressive proof tools such as Coq, but another of which has been to show that such tools are good for more than proving the correctness of GCD or FIND.

Finally, it's early days yet. There isn't yet a certified compiler for any practical programming language. One for Standard ML will likely be the first, but I would expect others to follow shortly on its heels. To the extent that these certified compilers will be developed either with tools that have proofs of their own kernels coupled with a high degree of maturity (Coq), or at least the high degree of maturity (Twelf, unless there's a similar kernel proof that I'm unaware of), we'll be able to legitimately trust that these compilers are correct implementations of languages with whatever formal semantics they claim to have to a greater degree than mankind has experienced to this point.

This seems to me to be that great rarity—an unalloyed good.

### Memory errors

Systems that are "proven correct" are (reasonably) only proven correct with respect to the abstractions of the language. Even at the assembly level these are too high-level. A memory error (which is only going to become more and more common) would violate the correctly verified invariants. Most likely this will lead to data corruption.

Before detractors of "proving" programs swoop in, there are already ways of dealing with this. One is to run triplicate copies and have them vote on the correct output; two out of three wins. But more importantly, robustness against memory errors is another property which we can formally verify of our programs. In fact, that is such a property one would want to verify.

Verifying properties such as that one* looks like a good use of these kind of tools period. However, for proving ones program correct, the real issue lies elsewhere. Here is a quote from a #haskell-related IRC channel just yesterday: "oh, wait, I was solving the wrong problem." This is where most meaningful software problems typically lie. Only insofar as a given formal specification is easier to read and understand as compared to the code, is it of any potential use in this regard.

* other examples: deadlock-freeness, memory safety, maintenance of (certain) invariants, input sanitation

### A memory error (which is

A memory error (which is only going to become more and more common) would violate the correctly verified invariants. Most likely this will lead to data corruption. [...] Before detractors of "proving" programs swoop in, there are already ways of dealing with this. One is to run triplicate copies and have them vote on the correct output; two out of three wins.

Static Typing for a Faulty Lambda Calculus provides an automatic conversion from the lambda calculus to a fault tolerant lambda calculus which is resistant to random bit flips.

### notice of correction

In comment-32691, Paul Snively wrote:

Having said that, I can't help but notice that if we accept 1972 as the year that this debate reached its zenith, antedates Automath by only five years, Nqthm by only one year, and predates LCF, HOL, Isabelle, ACL2, Mizar, the entire OBJ family, Twelf, and, of course, Coq, to say nothing of many others.

Not that anyone cares at this point, but I should set the record straight. As the careful reader must've undoubtedly discovered for themselves, De Millo et al.'s piece was published in 1979 -- not '72. My bad.

As for when "bouts of eyerolling" reached their zenith, it wasn't until about 1993 -- the year of the QED Manifesto (google scholar).

### I'm confident I'll regret this...

el-vadimo: As for when "bouts of eyerolling" reached their zenith, it wasn't until about 1993 -- the year of the QED Manifesto (google scholar).

I give up. What's wrong with the manifesto? It reads like a feature-by-feature description of Coq to me. Let's just concentrate on the critique of existing work and their "root logic" description. Starting with the critique, which is Section 3, we can go point-by-point:

1. Too much code to be trusted. This is far and away the toughest nut to crack. Coq has perhaps the best story in the field here, however, with its formalization of the Calculus of Constructions, which is Coq's "root logic."
2. Too strong a logic. "There have been many good automated reasoning systems that 'wired in' such powerful rules of inference or such powerful axioms that their work is suspect to many of those who might be tempted to contribute to QED - those of an intuitionistic or constructivist bent." Obviously, the constructivists are pretty happy with the Calculus of Constructions. :-)
3. Too limited a logic. The concern is that a constructive approach will make it hard to reason classically. This wasn't an invalid concern, but see Coq's standard library and its support for classical logic and sets, as well as some of the contributions.
4. Too unintelligible a logic. Here is the Wikipedia entry for the Calculus of Constructions. It's extremely simple.
5. Too unnatural a syntax. I would look at Proof General plus X-Symbol in conjunction with coqdoc. Also note that Coq has extensive support for syntax extensions and interpretation scopes.
6. Parochialism. Coq is open-source under the LGPL.
7. Too little extensibility. This doesn't seem to be an issue for Coq, although Adam Chlipala has indicated that extending Coq proper is underdocumented. But I don't think that's what the manifesto is referring to; instead, this seems essentially like a restatement of their concern about, e.g. the difficulty of adding classical (or modal, etc.) support on top of the root logic. From that point of view, Coq is also extremely successful.
8. Too little heuristic search support. Obviously not a problem with tactics like "auto," "tauto," "intuition," "omega," for Pete's sake even "firstorder," and a language for writing whatever tactics you need.
9. Too little care for rigor. If someone wishes to make this argument against Coq, more power to them.
10. Complete absence of inter-operability. An excellent point; see this for some dated (2005) information on how Coq fares here.
11. Too little attention paid to ease of use. Also an excellent point. I like Proof General quite a bit, but other folks might want something easier still.

So those are the critiques of existing work. Let's now turn to Section 5, "The Root Logic—Some Technical Details." It's tempting just to quote the whole thing, but I'll try to resist. But here's a very important chunk:

The lambda-calculus-based "logical frameworks" work in Europe, in the de Bruijn tradition, is perhaps the most well developed potential root logic, with several substantial computer implementations which have already checked significant parts of mathematics. And already, many different logics have been represented in these logical frameworks. As a caution, we note that some may worry there is dangerously too much logical power in some of these versions of the typed lambda calculus. But such logical frameworks give rise to the hope that the root logic might be such that classical logic could simply be viewed as the extension of the root logic by a few higher-order axioms such as (all P) (Or P (Not P)).

One possible argument in favor of adopting a root logic of power PRA is that its inductive power permits the proof of metatheorems, which will enable the QED system to check and then effectively use decision procedures. For example, the deduction theorem for first order logic is a theorem of FS0, something not provable in some logical framework systems, for want of induction.

This is an excellent two-paragraph summary of the Calculus of Inductive Constructions, the basis of Coq. As we've already seen Coq does support the checking and use of a variety of decision and semi-decision procedures.

Update: I neglected to point out that "But such logical frameworks give rise to the hope that the root logic might be such that classical logic could simply be viewed as the extension of the root logic by a few higher-order axioms such as (all P) (Or P (Not P))." is explicitly addressed in Coq.Logic.Classical_Prop, where we find precisely:
 Axiom classic : forall P:Prop, P \/ ~ P. 

as well as Pierce's Theorem, etc.

Simply stated, it appears that complete proofs of certain theorems that involve a lot of computation will require more disk space for their storage than could reasonably be expected to be available for the project. The most attractive solution to such a problem is the development of "reflection" techniques that will permit one to use algorithms that have been rigorously incorporated within QED as part of the QED proof system.

Another excellent point, and Coq does indeed support proof by reflection.

In any case, it is a highly desireable goal that a checker for the root logic can be easily written in common programming languages. The specification should be so unambiguous that many can easily implement it from its specification in a few pages of code, with total comprehension by a single person.

I haven't done the exercise of extracting the CoC formalization as O'Caml code and trying to understand it, but I find it encouraging that the attempt can be made. Conversely, I haven't attempted to write "just a checker" for the Calculus of Constructions (Inductive or otherwise), either. But it might be fun to try sometime just from the CoC Wikipedia page linked above.

So mostly what I see here is "someone should pick a logic powerful enough to express most of mathematics and make it easy enough to use that lots of people do;" some hints that a constructive logic might do, if only one could rigorously lay classical logic atop it; some observations about computational costs and the value of reflection; a desire to support heuristic proof search; and a desire to support decision procedures... all of which exist in Coq today. I'll leave it to others who are familiar with, e.g. Twelf to offer any related responses.

Now, perhaps you could explain, in detail, what is eye-roll-worthy about that?

### death by a thousand Dedekind cuts

In comment-32691, Paul Snively wrote:

What's wrong with the manifesto?

As Perlis would put it, our disagreement is "at best a matter of taste, and at worst, a matter of philosophy."

I'm of two minds about it. Accordingly, there are two parts to my argument: (a) the main thesis and (b) the obligatory CYA caveat that I'll throw in at the end to cover the unlikely case that history will prove my main thesis wrong ;)

At its heart, our disagreement stems from the fact that you prefer to start counting from "3":

Starting with the critique, which is Section 3, we can go point-by-point...

I'm more of a traditionalist who likes to follow royal advice:

The White Rabbit put on his spectacles. 'Where shall I begin, please your Majesty?' he asked.

'Begin at the beginning,' the King said gravely, 'and go on till you come to the end: then stop.'

So then, why don't we start with Section 1, What Is the QED Project and Why Is It Important?

QED is the very tentative title of a project to build a computer system that effectively represents all important mathematical knowledge and techniques. The QED system will conform to the highest standards of mathematical rigor, including the use of strict formality in the internal representation of knowledge and the use of mechanical methods to check proofs of the correctness of all entries in the system.

A cranky old mathematician -- call him Абрам Ильич -- that I met around 1995 once recited me a ditty he had heard years ago from a group of young pioneers who were visiting A.I. and his colleagues at the Institute for Mathematics on some suitably solemn patriotic occasion:

Когда вырастем-то все мы,
Станем делать теоремы,
Чтоб советская страна
Теорем была полна.

When we all grow up / We'll be cranking out theorems / In order to pump the Soviet country / Chock full of them.

A.I. found it richly ironic that a group of children so innocently ignorant of the workings of the mathematical enterprise could be so dead on in their description of it. He liked to refer to it pejoratively as the theorem industry. He basically made the same argument that Perlis et al. make in Social processes...:

... mathematicians publish 200,000 theorems every year. A number of these are subsequently contradicted or otherwise disallowed, others are thrown into doubt, and most are ignored [emphasis mine - V.N.]

To borrow Morris Kline's memorable phrase, the vast majority of published results are "pea soup and tripe.". According to A.I. (and Hardy before him), the only kinds of result that are worth publishing and preserving are non-trivial results. A.I.'s definition of triviality was maddeningly unsatisfactory yet unquestionably true. A mathematician's theorem is trivial if any other mathematician with the same resources (i.e. time and training in the particular domain) could have established the same result.

The QED project aims to mechanize the production of tripe. (I should say "aimed", given its timely demise.) In light of this, it strains credulity to see people like Rudnicki proclaim:

I am interested in certain pragmatic aspects of building a repository of formalized, machine checked mathematics. I take it for granted that the value gained from building such a repository or repositories is self evident.

... It puzzles me why we have so many different proof assistant systems for doing formalized mathematics and yet the body of formalized mathematics organized in a systematic way is so pathetically small.

It is an exception rather than a rule that an actively working mathematician is willing to learn Mizar and even when they do, their interest is typically short lived.

It puzzles me that anyone would be puzzled by the big yawns that projects like Mizar elicit from working mathematicians.

Says Morris Kline:

... rigorous proof is the polish on mathematics. It is the last stage of a development. ... the rigorous proof comes after the discovery and full intuitive understanding of the theorem to be proved. This is true historically of mathematics in the large and is true of the very process of creating individual theorems. Today we are trying to emphasize the gilt and the polish and we are leaving out the substance.

Anton's favorite mathematician Richard Hamming says in The Unreasonable Effectiveness of Mathematics:

The idea that theorems follow from the postulates does not correspond to simple observation. If the Pythagorean theorem were found to not follow from the postulates, we would again search for a way to alter the postulates until it was true. Euclid's postulates came from the Pythagorean theorem, not the other way. For over thirty years I have been making the remark that if you came into my office and showed me a proof that Cauchy's theorem was false I would be very interested, but I believe that in the final analysis we would alter the assumptions until the theorem was true. Thus there are many results in mathematics that are independent of the assumptions and the proof.

Says Hilary Putnam:

Philosophers and logicians have been so busy trying to provide mathematics with a "foundation" in the past half-century that only rarely have a few timid voices dared to voice the suggestion that it does not need one. I wish here to urge with some seriousness the view of the timid voices. I don't think mathematics is unclear; I don't think mathematics has a crisis in its foundations; indeed, I do not believe mathematics either has or needs "foundations". The much touted problems in the philosophy of mathematics seem to me, without exception, to be problems internal to the thought of various system builders. The systems are doubtless interesting as intellectual exercises; debate between the systems and research within the systems will and should continue; but I would like to convince you (of course I won't, but one can always hope) that the various systems of mathematical philosophy, without exception, need not be taken seriously.

Neither -- I hasten to add - should automated theorem provers like Coq be taken seriously. This is not say that Coq, Mizar, and other similar efforts are useless. Misguided, perhaps; but certainly interesting and important. Just not in the way they are claimed to be. Byproducts of this type of research are going to be more significant than its stated goals.

### After all that...

...we find that we are in vehement agreement. The reason I didn't talk about the other sections of the QED manifesto was that they didn't seem to me relevant to whether or not the enterprise was either worth undertaking or (as I attempted to support) had essentially been successfully undertaken. Let me once again simply refer to material that I think supports your point, and with which I wholeheartedly agree:

Hilary Putnam:

I don't think mathematics is unclear; I don't think mathematics has a crisis in its foundations; indeed, I do not believe mathematics either has or needs "foundations". The much touted problems in the philosophy of mathematics seem to me, without exception, to be problems internal to the thought of various system builders. The systems are doubtless interesting as intellectual exercises; debate between the systems and research within the systems will and should continue; but I would like to convince you (of course I won't, but one can always hope) that the various systems of mathematical philosophy, without exception, need not be taken seriously.

Let me assure you that I am sufficiently familiar with the work of Gödel, Kolmogorov, and Chaitin to be in complete agreement with this. I have little to no interest in mathematical philosophy except insofar as these gentlemen have done great work in establishing its limits.

Neither -- I hasten to add - should automated theorem provers like Coq be taken seriously. This is not say that Coq, Mizar, and other similar efforts are useless. Misguided, perhaps; but certainly interesting and important. Just not in the way they are claimed to be. Byproducts of this type of research are going to be more significant than its stated goals.

If you limit your perception of what their "stated goals" are to "formally proving the foundations of mathematics," then we are in 100% agreement, and all I choose to do is to interpret one of the byproducts to be "providing an excellent toolset with which to design programming languages that won't screw their users over." The problem with this is that at least in Coq's case, it's hard to argue that that is a byproduct when it's exlicitly articulated as a feature of the system, up to and including extracting certified software from the Coq development into Scheme, Haskell, or O'Caml. But draw the line where you like; it doesn't matter to me. Coq's usefulness as an interactive theorem prover and certified software development platform is well-established, regardless of how you feel about the program of formalizing the "foundations" of mathematics, whatever that even means, if anything.

"A proof is a repeatable experiment in persuasion." — Jim Horning

I make no claim stronger than that.

### theorem industry redux

Re: comment 3513:

A cranky old mathematician — call him Абрам Ильич — that I met around 1995 once recited me a ditty he had heard years ago from a group of young pioneers who were visiting А.И. and his colleagues at the Institute for Mathematics on some suitably solemn patriotic occasion:

Когда вырастем-то все мы,
Станем делать теоремы,
Чтоб советская страна
Теорем была полна.

When we all grow up / We'll be cranking out theorems / In order to pump the Soviet country / Chock full of them.

А.И. found it richly ironic that a group of children so innocently ignorant of the workings of the mathematical enterprise could be so dead on in their description of it. He liked to refer to it pejoratively as the theorem industry. He basically made the same argument that Perlis et al. make in “Social processes…”:

… mathematicians publish 200,000 theorems every year. A number of these are subsequently contradicted or otherwise disallowed, others are thrown into doubt, and most are ignored.

Note to self: In Not only beyond Journals, not only beyond Papers. Beyond Theorems., felix breuer echoes this sentiment:

The flip side of [increased publication rates and ever-deepening specialization] is of course that the majority of these very particular theorems will simply not be relevant for most mathematicians … It is … likely that mathematicians who, 20 years later, arrive at an equivalent problem in another field will simply solve it again on their own. Which brings us to the other consequence: Mathematical research is becoming more and more redundant.

In the end, both trends, increasing specialization and increasing redundance, have the same effect: Mathematical research is becoming increasingly irrelevant.

[Entire post deleted: see discussion below. I am keeping the thread as is, for the record. -- Ehud]

### Implementable parameterized monads

Have you seen Oleg's note on parametrized monads, which explores a parameterization with separate type parameters for input and output states? :
     class Monadish m where
gret  :: a -> m p p a
gbind :: m p q a -> (a -> m q r b) -> m p r b
`
See also Robert Atkey's MSFP paper Parameterised Notions of Computation (cited on the above page), which has considerably more detail.

### Hmm...

This appears to be a strange and rather subtle kind of spam. The entire comment is copied from here. I cannot find an original source for this user's only other comment, but that comment also has the highly suspicious "Submitted by: [spam link]" footer...

If nicoduca is a real human, perhaps s/he will offer some defense?

### Sneaky

Thanks for pointing that out. It's sort of like being Sokaled! I've removed the links for now, and made the user subject to moderation. I'd nuke the comment but there's a legitimate reply to it... Spamming as cross-pollination?

### Haha

Yeah, really. I want to make a joke about Turing Tests or Chinese Boxes or something, but somehow I can't quite think of what to say... :)