Contract Verification and Structural Subtyping of Interfaces

The following is a draft of a paper I am working on for OOPSLA. It is my first submission to a conference proceedings (I am doing this without any academic guidance!) any comments, crtiques, or suggestions would be most gratefully acknowedged. I haven't finished the bibliography yet, and marked sections which I plan of referencing using the [xxx] placeholder.

Abstract

Structural subtyping is a straightforward and powerful way of extending the abstraction of Java style interface types, however it is not without its shortcomings. I describe in this paper how, by including the verification of preconditions and postconditions, we can solve the commonly associated problems of naive structural subtyping, leading to more modular and robust software.

Introduction

The eternal question faced by software developers is how to write software which is both more modular and more robust. Interface oriented development has demonstrated itself to be an effective way to develop software which can be more modular than traditional class based OOP development techniques. Interfaces are allow a programmer to decouple the usage of the methods of an object from its actual implementation. In languages like Java there is still exists a coupling, between an interface and a class, in that a class has to declare a-priori its relationship to an interface. Languages like Standard ML [xxx] provide a more powerful and abstract alternative in the form of structurally subtyped interfaces called signatures. Signatures are are arguably too powerful, and can lead to a new class of errors due to structural conformance with incorrect semantics.

Contract verification, specifically the verification pre- and post-conditions, is a well-established and effective way of reducing defects in both procedural and object-oriented programming languages. Contracts also act to document the requirements and obligations of the methods [MeyXX], effectively expressing an object's semantics. One benefit of this, which is the focus of this paper, is the observation that when signatures are used with contracts, accidental structural matchings can be detected as pre- or post-condition violations.

Interfaces

An interface is a set of function signatures. An object or class is said to implement an interface when it provides definitions for functions with signatures corresponding to each of the function signatures in the interface. An interface variable is usually implemented as a reference or pointer to an object which implements as the interface. In many languages (e.g. Java, C#) a class has the responsibility of declaring which interfaces it implements, which is a form of nominal subtyping. An interface is a supertype of the class, and the subtype relationship is explicit. This has the limitation that a class can't be bound to a new interface without rewriting and recompiling the source.

Nominal subtyping of interfaces has another problem, it limits modularity and thus reusability. Consider the case of a class written in Java representing an indexable stack of cats:

    class Cats
    {
      public int size() {
        return m.size();
      }
      public Cat get(int n) {
        return (Cat)m.get(n);
      }
      public void set(int n, Cat x) {
        m.set(n, x);
      }
      public void push(Cat x) {
        m.add(x);
      }
      public Cat pop() {
        Cat ret = (Cat)m.lastElement();
        m.setSize(size() - 1);
        return ret;
      }
      public bool isEmpty() {
        return m.isEmpty();
      }
      private Vector m;
    }
  
The question a programmer needs to ask, is "Which interfaces should this class implement?". The answer is, that depends on how this class is being used. We can easily guess at a couple of likely interfaces such as the following ICatArray or ICatStack interfaces:
    interface ICatArray {
      int size();
      Cat get(int n);
      void set(int n, Cat x);
    }

    interface ICatStack {
      void push(Cat x);
      Cat pop();
      bool isEmpty();
    }
  
Now consider the posssible future case of a producer which only stacks cat and a consumer which only unstacks classes. In the interest of reducing coupling these classes would only accept minimal interfaces such as the following ICatStacker, and ICatUnstacker.
    interface ICatStacker {
      void push(Cat x);
    }

    interface ICatUnstacker {
      Cat pop();
      bool isEmpty();
    }
  
So unless the class explicitly implements these interfaces, the potential for resuability is restricted. Here the programmer faces a dilemma, should they anticipate the possible extra uses of the class, and pre-design numerous interfaces? Anticipating all possible interfaces is impractical since even in this trivial example there are 7! possible interfaces to consider. Even just implementing a handful of extra interfaces in anticipation of such scenarios violates the agile programming principle known as the YAGNI (You Aren't Going to Need It) principle [xxx]. This principle states that a programmer should only code what they need at the current time and no more. Anecdotal experience shows that one of the biggest sinks of programmer time and productivity is writing code for scenarios which don't arrive.

Some languages offer more powerful interface constructs which support structural subtyping, and thus overcome the stated limitations of nominal subtyping. For instance Standard ML provides a signature construct [xxx]. There is also a GCC extension to C++ called a signature[xxx]. In ISO compliant C++ we can emulate structural subtyping using templates [xxx], [xxx].

Semantic Mismatching with Structural Subtyping

Structural subtyping of interfaces introduces a new set of problems. The interface may match different classes with unanticipated semantics.

Consider the following example of a simple last-in, first-out (LIFO) stack interface written using Heron:

  interface IStack<type T>
  {
    public
    {
      def push(T x);
      def pop();
      def top() : T;
      def isEmpty() : bool;
    }
  }

Assuming a pre-existing double ended queue class (CDeque) a simple implementation of a LIFO stack could be as follows:

  class CStack<type T>
  {
    public
    {
      def push(T x) {
        m.push_back(x);
      }
      def pop() {
        result = m.pop_back();
      }
      def isEmpty() : bool {
        result = m.isEmpty();
      }
    }
    fields
    {
      CDeque<T> m;
    }
  }

In Heron an interface variable can refer to an instance of the class even though the class does not explicitly implement the interface:

    CStack<int> stack;
    stack.push(1);
    stack.push(2);
    IStack<int> stack_ref = stack;
    stack_ref.push(3);
    printf("%d\n",stack_ref.top()); // outputs 3
    printf("%d\n",stack.top()); // also outputs 3
  
Notice that in Heron an interface variable binds to the class just like a reference does.

Now consider the following class implementation of a first-in first-out ( FIFO ) stack, commonly known as a queue:

  class CQueue<type T>
  {
    public
    {
      def push(T x) {
        m.push_front(x);
      }
      def pop() {
        result = m.pop_back();
      }
      def isEmpty() : bool {
        result = m.isEmpty();
      }
    }
    fields
    {
      CDeque<T> m;
    }
  }

The problem that arises with structural subtyping is that since the CQueue class also satisfies the IStack interface the programmer can legally bind an IStack variable to an instance of CQueue. This causes problems when a programmer makes natural assumptions about the semantics of the class bound to an interface. For instance consider the innocuous example of a reversal function:

  def reverse(IStack<int> x)
  {
    CQueue<int> q;
    while (!x.isEmpty())
    {
      q.push(x.pop());
    }
    while (!q.isEmpty())
    {
      x.push(q.pop());
    }
  }

  def demo()
  {
    Stack<int> s;
    s.push(1); s.push(2); s.push(3);
    reverse(s);
    // as expected s is now reversed

    CQueue<int> q;
    q.push(1); q.push(2); q.push(3);
    reverse(q);
    // q is not reversed!
  }
  

Solution

One simple and effective solution to the problem of semantic mismatching is to introduce contract verification. The idea of expressing and verifying contracts as part of an interface is not a new idea [xxx], however to the best of the author's knowledge, the proposal of a mechanism for combining contract verification and strucutural subtyping is itself novel.

The approach that was taken with the Heron language is to allow interfaces to be written explicitly, with the help of two reserved keywords: inner and that.

An interface variable can be thought of as syntactic sugar for a class which delegates all the member functions to a single object. Accordingly Heron allows the earlier IStack interface example to be written out equivalently as :

  interface IStack<type T>
  {
    public
    {
      def push(T x) {
        inner;
      }
      def pop() : T {
        result = inner;
      }
      def isEmpty() : bool {
        result = inner;
      }
    }
  }
  
The inner keyword is equivalent to a call to the bound object. Since the bound object is actually a sub-type of the interface, the keyword inner from the BETA programming language lexicon was chosen, to represent the call from super-class to sub-class.

In order to distinguish between the two kinds of stacks, we add assertions to verify the preconditions/postconditions. The pre and post macros are library defined macros which behave like the assert macro in C++ but can be turned on or off independently of each other.

  interface IStack<type T>
  {
    requires
    {
      def push(T x) {
        inner;
        post(!isEmpty());
        post(top() == x);
      }
      def pop() : T {
        pre(!isEmpty());
        result = inner;
      }
      def isEmpty() : bool {
        result = inner;
      }
    }
    extension
    {
      def top() : T {
        pre(!isEmpty());
        result = that.pop();
        that.push(result);
      }
    }
  }
  
The extension section contains function definition which are not required in the implementing class. The keyword that refers to the object referenced by the interface, rather than to the interface reference itself. The importance of the that keyword is that it allows the prevention of infinite recursion (e.g. push calls top which calls push ad infinitum).

With the introduction of the postcondition check in the push function that top() == x any usage of IStack with an instance of a CQueue class will trigger an assertion failure. For example:

  def demo()
  {
    CQueue<int> q;
    IStack< s = q;
    s.push(1); // no problems, yet
    s.push(2); // assertion failure!
  }
  

Implementation

The techniques described in this paper were implemented in the Heron programming language, using a Heron to C++ translation tool called HeronFront [xxx]. The Heron language is closely related to C++ and shares much of the C++ semantics. The more salient differences include:

  • implicit result variables - like Eiffel [xxx], Turbo Pascal [xxx] and other languages, Heron functions have implicit result variables. This makes it easier for the programmer to write post-conditions which examine the result of a function.
  • inner keyword - used in interfaces to call the corresponding function in the bound object.
  • that keyword - used in interfaces as a disambiguator to direct function calls to the bound object instead of the interface function.
  • visibility - class fields are always private, this ensuring that the only manipulation of a class is through its methods, allowing for easier verifications of invariants.
  • runtime polymorphism - the only mechanism for runtime polymorphism in Heron is through the interface dispatch mechanism. Virtual functions are more or less redundant, since every design pattern involving virtual functions can be rewritten using a callback pattern involving interfaces.
  • interface - the Heron interface consists of required functions, and extension functions.
  • guaranteed default construction - all Heron objects support default construction. This is used so that result variables, can be implicitly default generated. This is a design choice which was done more out of convenience than anything else.
  • comparison operators - all Heron objects support comparison using the == and != operator.

Heron Interface Syntax

The grammar for the interfaces in Heron is defined here in relation to the C++ grammar:

    interface ::==
      interface name template_params? { section* }

    section ::==
      inherits_section | requires_section | extension_section

    inherits_section ::==
      inherits { type_expression* }

    requires_section ::==
      requires { (function_def | function_decl)* }

    extension_section ::==
      inherits { function_def* }

    template_params ::==
      < type identifier [ , type identifier ]* >

    function_param ::==
      type_expression identifier

    function_params ::==
      ( function_param* )

    result_type_decl ::==
      : type_expression

    function_sig ::==
      def identifier result_type_decl?

    function_decl ::==
      function_sig ;

    function_def ::==
      function_sig { statement* }
  

Conclusion

The idea of verifying pre- and post-conditions in interfaces is rapidly gaining traction [xxx]. This paper has attempted to point out that not only is it a good idea but one which also works well in conjunction with structural subtyping to make it a much more practical and effective technique.

This work diverges from related work on adding contract verification to interfaces in that it doesn't prescribe how or even whether such a thing should be done, but rather enables it. The goal here is to enable the programmer to implement things as needed, rather than enforce a particular approach. This has a benefit of keeping the language complexity much simpler and maximizes programmer flexibility.

The approach of adding contract verification does not completely solve the problems that have been associated with structural subtyping. However, this is a case of where some medicine is better than none. It is hopefully an inescapable conclusion that the expression of preconditions and postconditions in an interface is a good thing, and when used in conjunction with structural subtyping it becomes particularly powerful.

A logical extension of this work is to use static contract verification techniques, such as the Boogie program verifier [xxx], to perform compile-time checking of contract conditions to avoid the overhead of a having to execute programs to catch incorrect use of structural subtyping.

Acknowledgements

Much thanks to Peter Grogono, Oscar Nierstrasz, Kresimir Cosic, Achilleas Margaritas, Howard Lovatt, Max Lybbert, and Pablo Aguilar for reviewing and commenting on various drafts this paper, and providing useful insights.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

I like structural subtyping.

Hello Chris! Congatulations for your new job at Microsoft.

I like structural subtyping very much. I am going to write a small GUI library in C++ where the programmer does not use classes with virtual functions, but instead all that is needed is structs and functions. When a part of the library needs a vtable, the vtable is built "secretly" for each widget according to what functions are visible at the point of declaration. This allows me to re-use "methods" from classes that are not related but they have a common super-type. For example, a struct Button contains all the data needed by a button, then a check box and a toggle button share toggle behaviour without each "class" knowing about each other.

I am going to use structs and functions and not classes and methods because I want my GUI library to emphasize 'composition' of UIs from simpler structures, just like in functional programming where programs are made from composition of other programs...in other words, it is going to be a functional approach for UIs.

Structural subtyping has many advantages over the classic model, in my opinion:

  • it decreases the coupling of program modules.
  • it decreases the code needed for typing (no more having to type 'virtual', for example).
  • it decreases the time it takes to write a class; there is no need to think upfront if a method needs to be virtual or not (in C++ anyway).

Although I agree with your view that structural subtyping may lead to the problem of semantic mismatch, I have to disagree that it is a problem tied to structural subtyping. This problem exists anywhere where an object (or better said, a value) is used.

In the specific example you mention, the real problem comes from interpreting 'pop' in two different ways: a) pop from the back, b) pop from the front. As I think the term 'pop' generally refers to stacks where the back element is implicitely popped, the problem could be solved by not naming the two functions with the same name. In other words, 'pop' is not a proper name for a queue operation.

A large class of problems can be solved by simply choosing the right names for operations. But there may be a class of problems that are not solvable in that manner, so contract verification is a good solution. As I wrote you in the Unimperative boards, I have some ideas that I would like to discuss with you regarding the static verification of contracts.

Naming versus Contracts

Hi Achilleas,

The solution of better naming discipline only partially fixes the problem, since programmers can still have functions with the same syntax and different semantics (but in that case it would be accidental instead of intentional). I do agree that better naming however does significantly reduce the chance of this error happening.

Your project sounds very interesting. Please tell me more! I actually have thought about something similar as well: widget programming using COM objects. The whole idea has a lot of promise I think.

Also I would like to read more about your thoughts on static verification of contracts. You can email me privately at cdiggins@gmail.com or maybe post to LTU if you want a wider breadth of comments?

Well, I would like to make a

Well, I would like to make a Widget library that uses structural subtyping, like this:

//root class for all widgets
struct Widget {
    //widget properties
};

//generic paint implementation; it must be implemented by subclasses
void paint(const Widget &wgt, DeviceContext &dc);

//a button; it inherits from a widget
struct Button : Widget {
    //button properties
};

//a toggle button; it inherits from Button
struct ToggleButton : Button {
    //toggle button properties
};

//toggle button paint
void paint(const ToggleButton &tb, DeviceContext &dc) {
    //bla bla paint toggle button
}

//a check box; it inherits from Button
struct CheckBox : Button {
    //toggle button properties
};

//checkbox paint
void paint(const CheckBox &tb, DeviceContext &dc) {
    //bla bla paint check box
}

The vtable for each class would be constructed when a widget is inserted into another widget, or the main widget is instantiated for the screen. If C++ does not find an implementation for a type, it will naturally choose an implementation for the base type of that type, and thus have structural inheritance of methods at the point where it is needed.

I will sent you an email about my ideas on static verification of contracts at the email address you provided.