Joshua Bloch: "Nearly All Binary Searches Broken"

This amusing discussion is coming up all over the place.

Notice that a language with slightly different semantics would "solve" this problem, or at least mitigate it: all you need is range constraints on integers, so that numbers cannot overflow unnoticed. This is the standard out-of-the-box behaviour of Ada, for example. So another way to view this is as a question of finding good default behaviour for programming languages (you can suppress the range checks in Ada, if you *really* want to).
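For comparison, a minimal Java sketch (not Ada, but it makes the same point, and the class name is made up): the unchecked midpoint computation wraps around silently, while checked arithmetic, here Java 8's Math.addExact, reports the overflow, which is roughly what Ada's default checks give you automatically.

    public class MidpointOverflow {
        public static void main(String[] args) {
            int low = 1 << 30;              // plausible indexes into an array of more than 2^30 elements
            int high = (1 << 30) + 2;

            int silent = (low + high) / 2;  // low + high wraps to a negative int
            System.out.println(silent);     // prints -1073741823 rather than 2^30 + 1

            try {
                int checked = Math.addExact(low, high) / 2;  // overflow is reported, not ignored
                System.out.println(checked);
            } catch (ArithmeticException e) {
                System.out.println("overflow detected: " + e.getMessage());
            }
        }
    }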


I'm just a tyro but...

Couldn't the problem have been avoided by using Java's BigInteger in the first place? For the search algorithm there is a constant overhead per operation in doing so, but the algorithm would only run about 30 iterations at the array sizes where the programming error becomes a factor.

No matter which preferred remedy -- bounds checking or infinite precision[1] -- is selected, the error is ultimately Joshua Bloch's. It's amusing that he assigns blame to Programming Pearls when that book includes the passage:

"As laborious as our proof of binary search was, it is still unfinished by some standards. How would you prove that the program is free of run-time errors (such as division by zero, word overflow, variables out of declared range, [...])"

It seems that Bloch thought that question was rhetorical rather than something that programmers should seriously consider.

[1] Or the solution outlined in Bloch's blog, which makes me sad for some reason.

Couldn't the problem have

Couldn't the problem have been avoided by using Java's BigInteger in the first place?

Kind of. Arrays in Java are indexed using ints, not anything else. So to avoid the error, you'd have to use BigIntegers for all the intermediate computations, and then convert the BigInteger to an int whenever you are about to dereference it.
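A minimal sketch of what that looks like, assuming a plain int[] and the standard java.math.BigInteger (the class and method names other than the JDK ones are made up for illustration):

    import java.math.BigInteger;

    // Sketch: do the index arithmetic in BigInteger and convert back to an int
    // only at the point of array access. The conversions are safe because the
    // midpoint always lies between two valid int indexes.
    class BigIntegerSearch {
        static int search(int[] a, int key) {
            BigInteger low = BigInteger.ZERO;
            BigInteger high = BigInteger.valueOf(a.length - 1);
            while (low.compareTo(high) <= 0) {
                BigInteger mid = low.add(high).shiftRight(1);  // cannot overflow
                int midVal = a[mid.intValueExact()];           // back to an int index
                if (midVal < key) {
                    low = mid.add(BigInteger.ONE);
                } else if (midVal > key) {
                    high = mid.subtract(BigInteger.ONE);
                } else {
                    return mid.intValueExact();
                }
            }
            return -1;  // not found
        }
    }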

Optimization

I guess that would be optimizing for correctness at the expense of writability. On the other hand it would be rather silly to bring in BigInteger when Java arrays have a size limit of their own.

It still kind of grates on me that Joshua Bloch's blog carefully avoids taking any responsibility for the error in the Java code. For example:

the binary search program ... in Chapter 5 of Programming Pearls contains a bug...

The version of binary search that I wrote for the JDK contained the same bug...

The bug is in this line:

This bug can manifest itself for arrays...

So what's the best way to fix the bug?

And now we know the binary search is bug-free, right?

The binary-search bug...

The general lesson that I take away from this bug...

A bug can exist for half a century despite our best efforts to exterminate it.

Contrast Dijkstra:

We could, for instance, begin with cleaning up our language by no longer calling a bug a bug but by calling it an error. It is much more honest because it squarely puts the blame where it belongs, viz. with the programmer who made the error. The animistic metaphor of the bug that maliciously sneaked in while the programmer was not looking is intellectually dishonest as it disguises that the error is the programmer's own creation. The nice thing of this simple change of vocabulary is that it has such a profound effect: while, before, a program with only one bug used to be "almost correct", afterwards a program with an error is just "wrong" (because in error).

Clarification

I'm sorry if I didn't make this clear in the original posting: The program contained a bug. I wrote it. It's my bug. I take full responsibility for it (in the JDK).

Josh

I think that was clear. It

I think that was clear. It is nice to see people owning up to their mistakes, by the way. Cheers.

When I have to write a

When I have to write a binary search routine, I pretty much assume that it will have a fencepost error or two with 100% probability. So I was going to ask you why you didn't use the standard library binary search, until I realized you were the one writing it. :)

Response on a different issue?

Welcome to LtU, Mr. Bloch! Thanks for "owning up" to the bug: I'll try not to beat you up with Dijkstra quotes :) .

I'm actually more interested in your response on a different issue, namely this remark pointed out by "crux":

It is not sufficient merely to prove a program correct; you have to test it too.

This line also caught my eye in the original post, as it seemed to me unfairly critical of the idea of proofs about programs (sorry, Frank: proofs about algorithms!). Perhaps you just meant to call attention to the difficulties in applying proof techniques to real programs, but I suspect that many people reading the Google blog wouldn't make fine distinctions in this area.

Various people above and below have pegged the problem to specification, language implementation, or types and data structures, but I would focus (as some here have) on the difference between the relatively informal notion of proof Bentley used in Programming Pearls (if I recall correctly, just a human-readable one) and the more rigorous, machine-checked kind of proof that is typical of, e.g., the Coq system. These sorts of problems (relating to the difference between Java ints and mathematical integers) are just the sort of thing that automated proof systems are likely to catch.

I have a vague sense that Coq in particular has some "standard libraries" relating machine-style and mathematical integers which might be used to catch such a bug (though perhaps not in the context of C programs). Can anyone more familiar with Coq verify this? (I no longer have access to the Coq book that may have left me with this impression.)

Anyway, my reaction to the quoted remark was that it's unfair to criticize proofs about programs broadly on the basis of this particular kind of failed human-readable proof. Of course, a good machine-checked proof requires a lot of foundation in specification, etc., as others have pointed out, but I think that the tools to catch this kind of bug are actually in place with current machine-checked technology.

Any response on this? (And thanks again for your participation.)

Response on a different issue?

I'm actually more interested in your response on a different issue, namely this remark pointed out by "crux":

It is not sufficient merely to prove a program correct; you have to test it too.

This line also caught my eye in the original post, as it seemed to me unfairly critical of the idea of proofs about programs (sorry, Frank: proofs about algorithms!). Perhaps you just meant to call attention to the difficulties in applying proof techniques to real programs, but I suspect that many people reading the Google blog wouldn't make fine distinctions in this area.

Indeed I was trying to call attention to the difficulties in applying proof techniques to real programs, but mainly I was tipping my hat to Donald Knuth, who said in 1977 "Beware of bugs in the above code; I have only proved it correct, not tried it" [Personal communication to Peter van Emde Boas].

Like Knuth, I believe that formal proofs are not sufficient to guarantee the "correctness" of a program, because there are no guarantees that the model against which the program was proved mirrors reality. Michael Jackson writes about this in Software Requirements And Specifications [ACM Press, 1995].

Therefore, I do think it's critical that you test programs even if you've proven them correct. Also I think it's critical that you get other smart people to read them. And yes, I believe that you should use languages that eliminate entire classes of bugs. As I'm sure you're aware, Java does this to some degree, even though it doesn't eliminate the possibility of arithmetic overflow.

Simply put, I believe that you should do everything in your power to ensure that your programs are correct. There is no silver bullet. Agilists may say that testing alone is sufficient; it isn't. Formalists may say that proofs alone are sufficient; they aren't. I'm a firm believer in the efficacy of testing, and of formal methods, but I think they work best together.

Josh


Two aspects to this problem

(which turn out to be just one!)

1. You need to write your proof at the "level" of the programming language, otherwise the proof might not reflect the real semantics. This is why the "algebra of programming" approach (discussed here many times) is so interesting and important. Since algebraic manipulation of code is such a desired goal, functional programming raises so much interest (even though not everyone here thinks it is the holy grail).

2. You need proofs that are mechanically verifiable, since proofs can contain errors just as programs can. Indeed, many programmers prefer tests over proofs because they have more faith in tests - since they know how often their "proofs" are no good (they learned how to write bad proofs in school - their grades proved it!). They simply know that a proof written by them on paper can be junk and they won't know it without outside help, while a failed test case is easy to spot. This is why people need more knowledge of writing mechanized proofs, or why such techniques should enter mainstream compilers. Using proof assistants (or theorem provers) is too much of a black art. Of course, if proofs are done by program transformation (see item 1) they are also much easier to verify.

Error!

It contained an error! :-)

As I read my messages again, I realize that they are harsher than I intended. But I do feel that the words we use to talk about programming are important.

I wonder about the points made regarding defensive programming. Many practices of defensive programming -- e.g. try/finally resource handling, copying instead of reference passing, contracts, or data validation -- can be automated in whole or in part by PL semantics, compilers, or language runtimes. What will it take for these PL niceties to gain broader acceptance, similar to how garbage collection is now something expected by most programmers today?
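For one small, concrete example of that kind of automation: Java 7's try-with-resources moves the manual try/finally cleanup idiom into the language itself. A minimal sketch (the class and method names are made up):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    class FirstLine {
        // The compiler generates the equivalent of the old try/finally boilerplate:
        // the reader is closed on every exit path, including exceptional ones.
        static String firstLine(String path) throws IOException {
            try (BufferedReader in = new BufferedReader(new FileReader(path))) {
                return in.readLine();
            }
        }
    }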

....Sad for some reason...

I suspect the following quote will make lots of the more mathematically-inclined sad, as well:

It is not sufficient merely to prove a program correct; you have to test it too.

(especially so when followed up with a discussion of how comprehensive testing is, in fact, impossible.)

Nearly all?

Many languages don't limit the size of their integers arbitrarily: Scheme, Lisp, Python, Erlang, Smalltalk, Icon, Ruby, Dylan, etc.

Even in those which do, the range of an integer is usually of the same order as the size of the address space. Considering the size of an array element (it's extremely rare to need a binary search over a billion bytes), the array will exhaust the address space sooner than the integers used to index it will overflow.

Java limits array sizes to 32 bits, and OCaml on 32-bit platforms to 22 bits. These are more serious problems, and unfixable by adjusting a line of arithmetic.

Er...

Qrczak: OCaml on 32-bit platforms to 22 bits

No; an int on 32-bit platforms is 31 bits in size. However, O'Caml also has int32 for when that's critical, the downside being that int32 is a boxed type. Of course, O'Caml also has arbitrarily-sized nums. But for the binary search problem, you may want to see this post on statically fixed array dimensions in O'Caml using phantom types. Warning: not for the faint of heart. :-)

Actually

It's just array sizes which are limited to 22 bits.

Harrumph.

This is really exactly the wrong attitude to have. It puts the blame in the wrong place, and perpetuates the very thing you criticize. After all, in ten years, maybe it won't be "integers" but rather "rationals" (consider ZUIs) or "XML" or "URLs" or something like that which is not implemented the way you want it. No language is going to provide everything you need, with the performance profile you need, in its standard library, and what's indispensable for one programmer is irrelevant to another. That's why we have programming languages (which let you compose things) rather than just programs.

Furthermore, programming languages don't "have integers". int is a standard C type, and the name suggests it's supposed to represent integers, but they are plainly not integers. You know this. If you need integers, it is your job to code them or pull them in from some library.

Blaming the language is disingenuous, especially when it comes to C or Java, which have a trillion freely available libraries. Even if you can blame the standard libraries for not attending to a ubiquitous need, who is the one who decides to use the standard libraries? You. If you use something which you KNOW is not an integer and then later complain that it causes bugs in your programs because it's NOT an integer, where should the blame go?

Neither is it the case, BTW, that Java and Ocaml only provide arrays of a certain size. What they only provide is special syntax for arrays of a certain size. It might be the case that there is no way to code up larger arrays without a performance hit larger than you would get by building them in, but that is a different matter and goes to expressivity.
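For instance (a sketch only, with made-up names, and ignoring the performance question raised above): a long-indexed array of ints can be coded up in plain Java as an array of fixed-size chunks, at the price of an extra indirection on each access.

    // Sketch: a long-indexed array of ints built from fixed-size chunks.
    // The names (LongArray, CHUNK_BITS) are made up for illustration, and the
    // last chunk is over-allocated to keep the sketch short.
    class LongArray {
        private static final int CHUNK_BITS = 20;               // 2^20 ints per chunk
        private static final int CHUNK_SIZE = 1 << CHUNK_BITS;
        private final int[][] chunks;

        LongArray(long length) {
            int n = (int) ((length + CHUNK_SIZE - 1) >>> CHUNK_BITS);
            chunks = new int[n][CHUNK_SIZE];
        }

        int get(long i)         { return chunks[(int) (i >>> CHUNK_BITS)][(int) (i & (CHUNK_SIZE - 1))]; }
        void set(long i, int v) { chunks[(int) (i >>> CHUNK_BITS)][(int) (i & (CHUNK_SIZE - 1))] = v; }
    }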

Update: BTW, I take this attitude not because I think it's a crime to use ints where integers are required, but rather because the claim that "ints should be integers" is simply not constructive in a context where you can define whatever type you like. If you can level that criticism at int, purely because of how it's named and how other people use it and what they may or may not prescribe, then it becomes admissible to level the same sort of criticism at other elements of the language. But that sort of criticism is not intrinsic to the language at all. The former reflects rather on particular users, and the latter goes against the basic tenet (which languages like Ruby, however, flagrantly and, I think, very wrongly flout) that a thing is not defined by what you name it.

RE: Harrumph

When language designs involve decisions that introduce discontinuity problems, that is something that can reasonably be blamed on the language. When these discontinuity problems penetrate deeply into the 'standard' libraries, that is something that can reasonably be blamed on the language.

As you say, we use programming languages (which let you compose things). Wherever there is a composition 'gotcha' (safety issue or performance issue or security issue) or arbitrary limitation against composition, you have a fair target for criticism.

The choice of fixed-width integers named 'int', and their use throughout the standard libraries, is such a case.

Don't be so quick to blame the user when it is the language-designer who sets precedents, influences culture, and decides the initial libraries and composability profile for the language.

This isn't a bug in the algorithm

Rather, this is a problem with types.

On 32-bit platforms, it's functionally impossible to allocate arrays of 2^30 elements of anything larger than a char or maybe a short, and even chars and shorts are iffy at that size; the more non-trivial the program, the more iffy they are.

The error here is that Java specified *ints* to be array indexes, not *longs*. On 64-bit platforms where it is possible to allocate multi-gigabyte arrays, this starts becoming a serious limitation. Shortly after they discover the bug in binary search at 1<<30, they discover that they can't create an array 1<<31 elements in size.

Just messing around

It's not a problem with types, it's a problem with data structures :)

The fact that these data structures happen to match with types is almost a coincidence. My types are about energy management and memory deallocation.

Sorry for playing purist, I probably spend too much time teaching.

It's nice to keep in mind

It's nice to keep in mind that array indexes and array sizes are not the same thing in all languages. In some languages a[10] means that the indexes are 0..9, but there are languages in which you can define an array a(100000..100001) which contains just two elements, even though the indexes are quite big...
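In Java, where arrays are always indexed from 0, the same effect can be sketched with an explicit offset (the class and names here are made up for illustration):

    // Sketch: an array with a declared index range, Pascal/Ada style, simulated
    // in Java with an offset.
    class RangedArray {
        private final int low;        // first valid index, e.g. 100000
        private final int[] data;

        RangedArray(int low, int high) {          // inclusive bounds
            this.low = low;
            this.data = new int[high - low + 1];  // a(100000..100001) holds 2 elements
        }

        int get(int i)         { return data[i - low]; }
        void set(int i, int v) { data[i - low] = v; }
    }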

Personally I think there are

Personally I think there are two problems. One is that people treat ints as if they were integers, much in the same way that some people (less nowadays, I guess) treat floats and doubles as if they were reals.

The second is that (I suspect) the proof that Bentley did was probably a proof about an imaginary program in an imaginary programming language. The truth is that you cannot strictly prove anything about C programs or Java programs because they don't have an axiomatization!

And even if they did it would be — to borrow the standards groups' jargon — "informative" not "normative". As long as you prioritize the informal spec over the formal one, you are effectively leaving a "loophole" for any possible application of formal methods. The assertion that formal methods cannot eliminate bugs becomes a self-fulfilling prophecy.

(Lest someone call me on this: formal methods cannot eliminate bugs from concrete objects like "real" programs stored on real magnetic disks and running on real silicon, but they can definitively eliminate bugs from the formal object that a program represents. And THAT is valuable because it assigns blame where it is due. If the formal spec is normative and consistent, for example, and your "real" program does follow its own (consistent) spec, then the blame has to fall by definition on the language implementation.)

Right. When I teach about

Right. When I teach about proving correctness I always stress the fact that proving the algorithm correct is not the same as QA of the program.

intended vs actual behavior

Frank Atanassow: ...but they can definitively eliminate bugs from the formal object that a program represents. And THAT is valuable because it assigns blame where it is due.

It helps to make such a distinction when asked to change existing code, from an old before state to a new desired after state, since the before state actually has two parts: 1) what it was supposed to do, and 2) what it actually does. In typical uncommented code, intended behavior must be inferred. Usually there's a bug in 1 or 2, or sometimes both, and conversations are confusing without a standard way to distinguish 1 and 2. (Folks get puzzled when told about a bug in code's intended plan, and that yet another bug stops the code from even achieving the intended plan. :-)

On another sub-topic, aside from the choice of integer representation, it's a coding error to permit overflow when it's unnecessary. Instead of (lo+hi)/2, one can calculate lo+(hi-lo)/2, which can't overflow and works whenever an invariant guarantees that hi is not less than lo and both are non-negative.

In recent years I've gotten lazy on Linux by assuming 64-bits for long long when I want to check a 32-bit addition for overflow. Back when I only had 32-bit arithmetic, I'd check for overflow (of two positive integers added together) by checking whether the sum was less than either addend. Generating errors on overflow was better than ignoring overflow and then corrupting data.
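Both techniques translate directly to the Java at issue in this thread; a sketch, assuming the binary-search invariant 0 <= lo <= hi (class and method names made up):

    // Sketch: an overflow-free midpoint plus an explicit overflow check.
    class Midpoint {
        // hi - lo cannot overflow when both are non-negative, so neither can the sum.
        static int mid(int lo, int hi) {
            return lo + (hi - lo) / 2;
        }

        // The fix described in Bloch's post uses the unsigned shift instead:
        //     int mid = (lo + hi) >>> 1;
        // which reinterprets the wrapped sum as an unsigned 32-bit value.

        // The "sum less than an addend" check from the comment above, for two
        // non-negative ints: if a + b overflows, the wrapped result is negative
        // and therefore less than a.
        static int checkedAdd(int a, int b) {
            int sum = a + b;
            if (sum < a) {
                throw new ArithmeticException("integer overflow: " + a + " + " + b);
            }
            return sum;
        }
    }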

Putting lenses to work

It helps to make such a distinction when asked to change existing code, from an old before state to a new desired after state, since the before state actually has two parts: 1) what it was supposed to do, and 2) what it actually does. In typical uncommented code, intended behavior must be inferred.

So this looks like a wonderful application for Bi-Directional Programming:

  1. One has implementation version X.
  2. From that, one derives specification version X.
  3. Specification is modified, producing specification version X+1.
  4. Specification version X+1 is merged with the implementation X, producing implementation X+1.

Now, if only that would really work (also, deriving specification from implementation sounds unsound).

In C, too...

There are a number of C environments for 64-bit processors that provide 32-bit ints and 64-bit longs and pointers (or “I32LP64” for, uh, short), which could lead to the same class of problem. But this is what size_t and ssize_t are for (well, and intptr_t and uintptr_t; ah, standards), and well-written code will use them for that.

Of course, not all code is well-written.

sliding windows

My favorite version of this particular problem is ye olde sliding window protocol. The sliding window protocol is only correct with unbounded sequence numbers but every implementation that I know about, TCP for example, uses bounded sequence numbers.

You could use bigints to fix that, but then you would be faced with a couple of further questions: Are you really willing to send that much overhead on the wire, in terms of bits of sequence number vs. bits of data? Do you really want your interrupt service code to be dealing with bigints?

Modular arithmetic

Another interesting problem in the same class is that of a wraparound counter, such as the 32-bit quantity on Windows machines that wraps every 42 days. It is sometimes convenient to store and use this quantity to compute small deltas (under 21 days) but it is difficult to prove this sort of code correct.
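A Java sketch of the usual trick, assuming the counter is a free-running unsigned 32-bit value stored in an int (class and method names made up): compute the delta with wrapping subtraction and only then reinterpret it as unsigned.

    // Sketch: elapsed ticks from a free-running 32-bit counter that wraps around.
    class TickDelta {
        // Correct across a wrap as long as the true elapsed time is less than one
        // full counter period: the subtraction wraps the same way the counter does,
        // and the result is then reinterpreted as unsigned.
        static long elapsedTicks(int earlier, int later) {
            return Integer.toUnsignedLong(later - earlier);
        }
    }

The same reasoning is behind the advice in the System.nanoTime documentation to compare timestamps with t1 - t0 < 0 rather than t1 < t0.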

Of course, on a larger scale you have the Y2K and UNIX timestamp problems. It's interesting how people deal with these -- by assuming that people can't live past 100, for example. This is in the same class as the above problem in that the relation between the representation and the intended meaning (time) is one-to-many.

Re: bigints in network protocols -- I think we should learn to reason about upper bounds, not fight them, since their existence gives us nice properties to infer about systems -- like how much $$ to spend on a given project :)

sliding

As I understand it, that's not a problem until the window size starts to become a significant fraction of the space of sequence numbers, since the wraparound is specified to Do The Right Thing. (Like DNS zone serials.)

Window size and sequence numbers

Yes, the problem appears when you have two unacknowledged items with the same sequence number. I think that can only happen when the window size is greater than the sequence number space, but I may be missing something---it's been a while. But with multi-gigabit/second networks and 32-bit sequence numbers, you can wind up in the odd situation where the sequence number space becomes a performance limit. Assuming, of course, that your TCP correctly handles the situation.

The solution? Expand the sequence number space by using a TCP option. Sure, that'll work!

If anyone's interested, one place to start would be IEN-74, Sequence Number Arithmetic, and RFC 1982, Serial Number Arithmetic.
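For a flavour of what those documents specify, here is a sketch of the signed-difference comparison commonly used for 32-bit sequence numbers in Java (names made up; RFC 1982 leaves the case where two serials are exactly 2^31 apart undefined, and this one-liner resolves it arbitrarily):

    // Sketch: "precedes(a, b)" means a comes earlier than b in sequence-number
    // space, allowing for wraparound.
    class SerialNumber {
        static boolean precedes(int a, int b) {
            return a - b < 0;   // wrapping subtraction, then a signed comparison
        }
    }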

Assumptions

Would anyone have assumed that the compiler would have left an integer division by 2 to be executed as a division? The bug isn't in the sort algorithms: it's in the math library.

this bug is irrelevant

Since nobody pointed it out so far: I refuse to lose sleep over this bug, because it will not be triggered under credible circumstances.

First off, we should limit discussion to sane platforms, where int and pointer are about the same size. If an int is not sufficient to index the whole available memory, life becomes too difficult.

On these platforms (assume 32 bits for integers and pointers), the bug only recovers one bit of accuracy. I maintain that this bit doesn't matter: since 2**30 elements is almost the complete addressable memory, you will never be in a situation where an array of 2**31 elements is sufficient but one of 2**30 elements is not. Next week you'll need more elements and you'll be out of luck again. Therefore, either 30 bits is plenty, or 31 bits is not enough either. In the latter case, you'll need a more general solution (BigInteger) anyway.

Regarding the "proof" of correctness: this shows that humans are too unreliable to verify a proof. An automated proof verifier wouldn't have accepted the proof, because it cannot be fooled into thinking that (low <= high) implies that (low + high) does not overflow. Proofs without automated verification are useless.

Ignore this

Never mind. I thought that the current JDK source might still have this bug, but I had misread it.