Framing: The implementation side

One of the main obstacles for formally verified imperative software is the so called framing problem. A solution to the framing problem has to give a precise answer to the question: "What does a procedure modify and what does in leave unchanged?".

Some weeks ago I have presented the specification view of a new attempt to solve the framing problem (An approach to framing and mutability).

The following paper describes the implementation view of the frame contracts and how the implementation view and the specification view can be kept consistent.

Furthermore it demonstrates how loop invariants can be kept readable by using ghost functions.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Scope

When you say "a new attempt to solve the framing problem" it is a little unclear what your scope covers. The definition that you give is very general. When the solution has to "give a precise answer", how is precision measured and what is the most precise answer? I'm not familiar with Modern Eiffel, but does it not include features that would make the framing problem undecidable in general, e.g. general recursion, unbounded memory allocation etc?

The buffer example is a very thin layer on top of builtin functions. The semantics of a buffer are a small modification on the semantics of a builtin list. How does this approach scale up to more complex examples? In particular the ratio of specification size to actual code size seems to be quite high.

Scope, precise answer, ...

Thanks for the comments. Let me try to give the following answers to your comments/questions.

1. Scope: The presented example is simple, but the approach is general. I.e. the approach should solve the framing problem in general.

2. Precise answer: With "precise answer" I meant an answer which can be derived formally by a machine. The rules presented are written in English, but they are pretty formal. The compiler can derive the dependency graph of all functions. Since no cyclic dependencies are allowed (remember: recursive functions do not introduce cyclic dependencies) there is always a partial order and there are always functions on the bottom of the dependency graph. Having the dependency graph it can be answered formally what affect a routine has by looking at its specification.

3. Undecidability of the framing problem: The derivation of the implicit framing conditions is not undecidable (or uncomputable) because I have given an algorithm on how to compute them. I do not understand what kind of undecidability you have in mind. Could you elaborate more on this concern? Do you mean termination of the involved functions and procedures?

4. Scaling: I admit that the example given is fairly simple and not really challenging for the framing problem since a buffer cannot reach other buffers, i.e. it cannot reach objects of the same type. I have chosen this simple example to get the idea on how to derive the implicit framing conditions across. But the method scales. The most challenging examples of the framing problem (at least to the best of my knowledge) are linked structures like lists (simply or doubly linked), trees and general graphs. I am currently preparing a paper to demonstrate the method on linked structures. It works well. It works well because it uses enough abstraction. Usually the affect of a procedure is described in terms of what it modifies on the heap. The method I have described does not talk about modification of heaps or memory cells. A procedure modifies functions. It modifies the access functions of the datastructure. And it modifies the access functions only for the described arguments. The values of the access functions for other arguments are untouched. You can apply this kind of reasoning to any datastructure.

5. Ratio of specification size to code size: Why do you see the ratio high. The derived framing conditions are implicit. There is no need to write them down, they are derived by the compiler. I consider the explicit part of the specification quite compact. If you write a library you have to give to your users some kind of description. If you write a buffer with a procedure "extend_rear" you have to give at least some description like

Append an element at the end of the buffer

. Compare this verbal description to the specification

content = old (content + e)

. Is the ratio really high?

Undecidability

Let's assume that you analyse a function that contains a conditional access to a data element, something like (ignoring the proper syntax) :

if( pred(X) ) {
  access foo.blah;
}

If I've understood your blog properly then foo.blah would always be in the dependency graph for the function. This answer is not as precise as possible (as only certain values of X trigger the branch with the dependency), but it is a conservative approximation meaning that the problem of calculating the dependency graph / framing problem is decidable.

functions and procedures

In Modern Eiffel I distiguish between functions and procedures. A function does not have any visible side effect. I assume you talk about a function in this sense in your comment.

Functions are either on the bottom of the dependency graph or they depend on others. The function in your example obviously depends on the function "bla" applied to the argument "foo". Furthermore it depends on the predicate (which is a function as well) "pred" applied to the argument "x".

I assume your code snippet above comes from the implementation of the function. But for deriving frame contracts we need the specification. Let us assume your function has the following definition

   func(x:X, foo:FOO): RT
       do   
            ...   -- implementation
       ensure
           Result = if pred(x) then foo.bla else f2(x) end
       end

Then this function depends on "pred", "bla", "f2". The specification is visible to the caller of the function. Therefore the caller has not only the dependency information, but also the definition. It is therefore always clear to the caller what data are accessed by a function. I.e. the contract is as precise as it can be. The implementation of a function can just decide about the algorithm on how to do the actual calculation. The implementation has no freedom in choosing on which visible other functions it accesses. There are no nondeterministic functions.

Now let us look at a procedure which modifies "func".

    some_proc(x:X, foo:FOO)
        ensure
            func(x,foo) = old exp
        end

The only thing which a user can derive from this specification is that "some_proc" must have modified the functions "pred", "bla" and "f2" (or functions on which these functions depend) in a manner that "func(x,foo)" has the specified value. As opposed to functions, procedures have freedom. If e.g. the value of "func(x,foo)" has already the desired value at the entry point of the procedure, the procedure might choose to do nothing or change "pred", "bla" and "f2" in such a manner that the condition is maintained. From this contract it has the freedom to do this. But it can modify "pred" only at the argument "x" and "bla" only at the argument "foo" and "f2" only at the argument "x". It has to leave the functions unchanged for other arguments. And it must not modifiy any functions on which "func" does not depend.

The implementor of "some_proc" can change the implementation and the potential user of "some_proc" has to work with any implementation which respects the contract. That is what contracts are good for. If the specification of a procedure leaves some things open then it cannot be expressed more precisely.

The implementation has no

"The implementation has no freedom in choosing on which visible other functions it accesses."

Given that the caller can only rely upon the contracts, does this mean that the implementation of func has to use exactly pred(x), foo.bla and f2(x) or can it use alternatives that satisfy the same contracts?

I suppose it might be confusing if g(x) is used instead of f2(x) in some but not all instances.

Could I replace f2(x) with g(x) where g(x) is a ghost function with the same contracts as f2(x)?

equals for equals and identicals for identicals

Thanks for the question. My statement has been a little bit too rigorous. Clearly if two functions are equivalent one can substitute one for another. In the implementation you usually use functions which are the most effective and in the contracts functions which are easier to understand are preferable. But the implementation and the specification of a function must be equivalent.

This is not true for procedures. The implementation of a procedure might satisfy a stronger contract than required (e.g. the implementation can be completely deterministic whilst the specification might leave some more options open).

Derivation of implicit contracts

Given that a procedure must specify all of it's side-effects why do the compiler-derived contracts contain some, but not all, of these implicit non-effects (ie for a /~ b)?

As a user I'd prefer to see all or none. Is there a benefit to the compiler I'm overlooking?

In your some_procedure example I'd expect to see one of the following:

some_procedure(b:CURRENT)
    ensure
      not b.is_empty
      b.capacity = old b.capacity
    end

or

some_procedure(b:CURRENT)
    ensure
      not b.is_empty
      all(a:CURRENT) a.capacity = old a.capacity
      all(a:CURRENT,i:NATURAL)
        require
          a /~ b
          i < a.count
        ensure
          a.is_empty = old a.is_empty
          a.count    = old a.count
          a.content  = old a.content  -- needs to be kept
          a.first    = old. a.first
          a.last     = old a.last
          a[i]       = old a[i]
        end
    end

derived frame contracts

In the written code you would usually just see

    some_procedure(b:CURRENT)
        ensure
            not b.is_empty    -- b.capacity = old b.capacity is not necessary, because "is_empty" does not depend on "capacity"
        end

It is not wrong to write some or all of the values which are not modified, but it is not necessary.

The compiler goes down the dependency graph to discover the basic functions which can be modified by this procedure. In this case it discovers that the only basic function involved is "content" for the argument "b", i.e. "b.content" might have a new value and "a.content" for all "a/~b" must be unchanged. Since "is_emtpy", "count", "first", "last" and "[]" all depend on "content" for all "a/~b" these functions have to be unmodified as well as you spelled out correctly in your example. But it is not necessary for the copmiler to derive these conditions explicitly because the compiler "knows" the definitions and can derive these conditions if needed.

As a user you don't "see" what the compiler derives. But as a caller of "some_procedure" you can count on these facts. On all of them, regardless whether the compiler derives them explicitly or implicitly by knowing the definition of the functions.

Therefore from a user's point of view you are right. The compiler infers all these non-effects and you can count on them.

As a user you don't see what

As a user you don't see what the compiler derives.

Interesting. I wasn't sure if I had understood your idea on your language design blog. Seems like you're confirming.

I don't know if you actually have it figured all out, but this looks very promising.

Then, as a user, if possible, I'd also love the cherry on top:

provide a compiler option to generate some form of annotation/metadata allowing to inspect, if so wished, the full set of (compiler-made) derivations along with the rest of these specs (ideally, type or class-grained if separate compilation is available).

Would be great, IMO.

Since the language

Since the language definition phase is not yet concluded, the compiler still has to wait a little bit. But I will consider your option.

Anyhow: Good diagnostics are needed in case that the compiler/verifier cannot verify the assertions in order to give a good hint to the developer on where to refine the specifications and assertions.

I was in fact thinking that

I was in fact thinking that with this kind of information provided "for free", the programmer could *also*, in some cases, actually *learn* about interesting things for himself/herself, from the compiler's derivations over the dependency tree and, therefore, on the full extent of his/her explicit specs + the compiler-inferred ones.

But of course I understand you have other design-level priorities to tackle, first.

I was just anticipating a bit over the provisions you'd have to make, maybe, before or during bootstrap, in order to support this sort of feature (e.g., in some runtime, or library, or builtins or whatnots).