Mutable Structures: Arrays

Nearly all languages with imperative elements have some kind of an array. In C
an array is just a typed pointer to a memory region. You can address the
different elements by giving an offset to the start of the memory region. The
compiler multiplies the offset with the size of the objects contained in the
array to get the actual offset in bytes.

The C view of arrays has certain advantages: It is memory efficient because it
uses just one pointer. Accessing elements is easy because it just needs some
simple address arithmetic.

But the C view has some pitfalls as well. First you might fail to allocate
memory for the array. Then the pointer to the memory region has some undefined
value. Second you might read elements of the array which have not yet been
initialized. Third you might read from or write to elements of the array which
are not within the allocated region.

Modern languages avoid this pitfalls by certain methods which can be executed
at compile time or at runtime. The failure to initialize the pointer to the
memory region is usually handled by initializing all pointers with zero. Any
access to a zero pointer results in a runtime exception. Addressing elements
outside the region is usually handled at runtime as well by storing with the
array its capacity and generating an exception if elements outside the region
are addressed.

All these solutions avoid memory corruption but they have a cost. Additional
code is required to check null pointer accesses and to check out of bound
accesses. The single pointer implementation is no longer sufficient because
the size of the allocated region has to be stored in the runtime object.

A language which allows formal verification can avoid all these pitfalls
without any cost. Without any cost? Well -- without any runtime and memory
footprint cost. But nothing is free. Some assertions must be provable at
compile time. I.e. whenever you want to access some element of the array you
have to be sure that the index is within the array bounds. And it is not
sufficient that you are sure. You have to convince the verifier of this fact.

In the article we show an array structure in our programming
language which allows us to convince the verifier that we make no illegal
accesses.

Since an array is a mutable structure we have to address the framing problem
as well. The framing problem is addressed appropriately if for any modifying
procedure it is clear not only what it does change but also what it leaves
unchanged.

A lot of effort is done currently in the verification community to address the
framing problem. Frame specifications like modify and use clauses for each
procedure, separation logic, different sophisticated heap models etc.

In the article we demonstrate that the framing problem can be solved without
all these complicated structures. It is sufficient to regard access to data
structures as functions and have a sufficient powerful function model. It is
amazing to see that proper abstraction makes some difficult seeming problems a
lot easier.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

An interesting language for

An interesting language for low-level programming with sophisticated types is ATS (Applied Type System). It has dependent types (useful for proving correctness of pre/postconditions and pointer arithmetic) and linear types (useful for tracking resource usage like memory allocation).

One the plus side, and why I mention it here, is its excellent integrateion with C. It can handle C data, pointers and functions natively and allows them to be wrapped with more detailed types.

As a negative, in my opinion, is its syntactic distinction between compile-time and run-time information. There are two kinds of argument and two kinds of return value. This can lead to cluttered and redundant code. For example, if we statically know the value of a numeric argument, we must express this with a compile-time argument N of type Nat and a run-time argument of type {N} (a singleton type containing only the value of N), we can't just use one argument. When compared to languages like Agda or Epigram, which tend to carve up problems at their natural seams, this boilerplate can make ATS code look like a sledgehammer :(

compile time and runtime information

Hi Chris,

in the language I am designing there is basically no distinction between compile time (i.e. verification time) and runtime information.

There is one exception: ghost types. Functions using or returning values of ghost type can be used only within assertions and are used only for reasoning purposes. The cannot flow into any runtime structure. This gives a lot of power to the verifier because you can express functions which are not computable and predicates which are not decidable and use them within the proofs. I.e. you can work like a mathematician and define functions which are convenient for your proofs without taking care whether they are computable or not.

Example: The capacity and the domain (i.e. the set of valid indices) of the array. These two attributes are ghost attributes and used for reasoning purposes only.

C arrays aren't pointers

It's a common misconception.

Were C arrays pointers, char b[3][5]; could't work.

The type of b is char[3][5] and the type of b[0] is char[5].

Where the confusion comes in is that b[i] or *(b + i) operates on a pointer to the first element of b.

It's important to understand that b is not the same as a pointer to its first element, and likewise that b[0] isn't pointer, but an array, since sizeof b[0] needs to be 5 for b[i][j] or *(*(b + i) + j) to work.

C arrays are not pointers but

C arrays are not pointers but regions of memory (either on the stack or on the heap). But the variables which name arrays are implemented as pointers to the corresponding memory region. Sorry if I expressed myself too sloppy in the introduction of my article.

Actually they aren't

Actually they aren't implemented as pointers.

To understand it you might find the following questions useful.

char c[3]; What is sizeof c? What is sizeof (c + 0)?