Using real arithmetic to eliminate bounds checking?

The usual problem with eliminating array bounds checking is that integer arithmetic is undecidable. For example, the compiler can't figure out that you can index into an array of m·n elements using an expression like i·n+j, even if it's known statically that 0≤i<m and 0≤j<n. Or rather, that particular problem is simple, but it's hard to come up with a general algorithm that would work for many different programs. Many people are trying to use dependent types and machine-checkable proofs for that, like in Hongwei Xi's paper.

It just occurred to me that statements like "if i ∈ [0,m-1] and j ∈ [0,n-1], then i·n+j ∈ [0,m·n-1]" (all intervals are closed) can be interpreted and proved in real arithmetic, instead of integer arithmetic. The nice thing is that real arithmetic is decidable. If that approach works, we can have a language with integer-parameterized types like "array of n elements" and "integer between m and n", and have a simple definition of what the type checker can and cannot automatically prove about programs using such types, without having to write any proofs explicitly.
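
To spell out the example above: the upper bound follows from a single chain of inequalities that already holds over the reals (a sketch of what a decision procedure for real closed fields would check, using only i ≤ m−1, j ≤ n−1 and n ≥ 0, with no integrality assumptions):

i·n + j ≤ (m−1)·n + (n−1) = m·n − 1

and the lower bound 0 ≤ i·n+j follows from i, j, n ≥ 0.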

Of course the system will be limited by the fact that it's talking about reals rather than integers. For example, "x > 0 implies x ≥ 1" holds over the integers but not over the reals, and there will be problems with integer-specific operations on type parameters, like division and remainder, but I don't know how often these will be required in practice.

Are there any obvious problems with this idea? Has it been tried?

I have to say I've never

I have to say I've never seen real arithmetic used for bounds checking. Maybe there's some original research to be done ;)

On the subject of dependent types and integer checking, Coq's "omega" tactic does a good job of proving/disproving numeric constraints. It's a decision procedure for Presburger arithmetic, which is known to take doubly-exponential time, but I've never encountered a problem large enough for it to struggle.

I can certainly imagine such procedures being useful for using a dependently typed language to implement a simply/dynamically typed language with automatic bounds checking.

Presburger arithmetic

I think Presburger arithmetic wouldn't be enough to handle the m·n example in the post, because it can't handle multiplication unless one of the multiplicands is a constant (unless I'm missing something about how Coq's "omega" works).
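
To illustrate the boundary (a minimal sketch in SMT-LIB syntax, assuming Z3 or any other solver with linear integer arithmetic): purely linear facts like the one below are discharged easily, but the goal from the post contains the product i·n of two variables, which falls outside the Presburger fragment.

; a linear (Presburger) fact: 0 <= i < m implies i + 1 <= m
; we assert the negation, so "unsat" means the fact always holds
(declare-const i Int)
(declare-const m Int)
(assert (and (<= 0 i) (< i m)))
(assert (not (<= (+ i 1) m)))
(check-sat) ; unsat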

Modulo N

I think integers modulo N are decidable, so limiting int to a machine word seems a usable solution. It would be something I would look at before reals. I plan on using CSP logic programming and goal-oriented answer-set programming (ASP) for this. That is a proof of decidability in some sense: the CSP outputs the ASP problem, and the underlying SAT solver always terminates (eventually).

Integers modulo N

Integers modulo N are trivially decidable because you can check any statement exhaustively, but that's slow, because for k variables you need to check all combinations of k machine integers. The question is more about which theory would give you better shortcuts for realistic use cases. I have a hunch that reals would have a slight edge over integers, because they are simpler to reason about when multiplication is involved, but I'm not confident about that at all...

SAT and SMT

SAT and SMT solvers have made huge advances in the past 20 years. Utilising these seems the best option, and it can already be seen in action in Microsoft Research's F* language, which builds on F# and the Z3 SMT solver. It's a bit different from what I am looking at, but you can experiment with it right now.

Z3

I guess I'd be happier with a mathematical definition of which programs are well-typed, rather than pointing to something like Z3 as a mandatory part of type checking. I wonder if there's a realistic example of bounds checking that can be eliminated by reasoning about integers, but not by reasoning about reals?

Registers

If the machine is using a register to do indexed addressing, then modelling this with integers modulo register size is more accurate than infinite precision integers anyway.
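
To make that concrete, here's a sketch in SMT-LIB's bit-vector theory (8-bit words just to keep the counterexample small): once multiplication can wrap around, the in-bounds property from the original post no longer holds, and the solver finds a model.

; 8-bit "machine integers": does i < m and j < n imply i*n + j < m*n?
; (bvult is unsigned comparison, so the lower bounds 0 <= i, j hold automatically)
(declare-const i (_ BitVec 8))
(declare-const j (_ BitVec 8))
(declare-const m (_ BitVec 8))
(declare-const n (_ BitVec 8))
(assert (bvult i m))
(assert (bvult j n))
(assert (not (bvult (bvadd (bvmul i n) j) (bvmul m n))))
(check-sat)  ; sat: e.g. m = n = 16 makes m*n wrap around to 0
(get-model)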

yeah

Yeah. That pretty much kills my original idea. Thanks :-)

Do you know if there's any decidable subset of arithmetic that would cover cases like m·n in my post?

not really

I have some ideas, but need to do some more work to understand whether there is any advantage to the approach. Really, it's all too vague to describe at the moment.

From the Wikipedia article you cite:

The decision procedures are not necessarily practical. The algorithmic complexities of all known decision procedures for real closed fields are very high, so that practical execution times can be prohibitively high except for very simple problems.

I suppose you could get lucky, but then we mostly get lucky when working with integers, too.

That is a great idea! Experiment using Z3:

This is actually a really great idea! At first I thought it wasn't necessary, as surely SMT solvers could solve such trivial inequalities. However, it turns out they can't! (I tested Z3, CVC4 and Alt-Ergo.)

In particular, Z3 cannot prove this simple inequality using integer arithmetic: if i + 1 ≤ m and n ≥ 0, then n·(i+1) ≤ n·m (the reasons are outlined in this StackOverflow thread).
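
For reference, here's one way that query can be encoded in SMT-LIB (as usual, we assert the negation of the goal; on the setup described above, Z3's default integer reasoning reportedly returns unknown rather than unsat):

(declare-const i Int)
(declare-const m Int)
(declare-const n Int)
(assert (<= (+ i 1) m))
(assert (>= n 0))
(assert (not (<= (* n (+ i 1)) (* n m))))
(check-sat) ; unknown with the default integer tactics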

However, Z3 also includes an advanced non-linear real arithmetic solver called NLSat (not enabled by default), which can also be applied to integer arithmetic. For example, NLSat can prove the original statement in OP's post:


(declare-const i Int)
(declare-const j Int)
(declare-const m Int)
(declare-const n Int)

; the hypotheses: 0 <= i <= m-1, 0 <= j <= n-1, and m, n non-negative
(assert (and (<= 0 i) (<= i (- m 1))))
(assert (and (<= 0 j) (<= j (- n 1))))
(assert (and (<= 0 n) (<= 0 m)))

; assert the negation of the goal, so "unsat" means the bound always holds
(assert (not (<= (+ (* i n) j) (* m n))))

; solve with the NLSat non-linear real arithmetic tactic
(check-sat-using qfnra-nlsat)

However, if we "make a mistake" (i.e. the statement we're trying to prove is false), NLSat will not necessarily find the counterexample: it might find a real counterexample, but not necessarily an integer one, in which case Z3 returns unknown, and we have to fall back to Z3's usual integer arithmetic solver in the hope that it finds the counterexample.

I'm currently writing an experimental type system that uses Z3 to verify function contracts. It's mostly finished, so in the evening I'll push it to the dev branch of my repository.

Cool!

Many thanks for looking into this! BTW, I think the last assert should be:

(assert (not (<= (+ (* i n) j) (- (* m n) 1))))

(If anyone's interested, there's a nice online interface that lets you play with these things without installing Z3.)

As Keean pointed out, using this in an actual programming language might be problematic if it's using machine integers. Though that might be just one more reason to use integers that promote to bigints transparently...

Right now my main test cases for eliminating bounds checking are these: matrix addressing, matrix multiplication (arrays MxN, NxK, MxK), binary search, quicksort, and Knuth-Morris-Pratt. I'm still trying to figure out whether the approach with real arithmetic will be enough for all these examples.
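
For binary search, at least the midpoint obligation seems to stay inside the decidable linear fragment, since the only division is by the constant 2. A sketch of the negated proof obligation in SMT-LIB (assuming the midpoint is computed as (lo + hi) div 2):

; binary search midpoint: 0 <= lo <= hi < n should imply 0 <= mid < n
(declare-const lo Int)
(declare-const hi Int)
(declare-const n Int)
(assert (and (<= 0 lo) (<= lo hi) (< hi n)))
(define-fun mid () Int (div (+ lo hi) 2))
(assert (not (and (<= 0 mid) (< mid n))))
(check-sat) ; unsat: the midpoint access is always in bounds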

For bounds checking, not

For bounds checking, not really; if an array has length l, then as soon as you prove that the array access is in bounds, you know the index fits in a machine word. If you want to be sure that the calculations themselves don't overflow, you can simply check before each multiplication, subtraction and addition that the result fits in a machine integer; for array indexes that shouldn't be a problem.

Anyhow, I added NLSat solving to my prototype, which you can now check out in the dev branch. I don't think NLSat is necessary for quicksort and binary search, though, as they only feature linear integer arithmetic.