Define it twice -- preemptive bughunting or waste of time?

Suppose I allow multiple definitions of routines/classes in a programming language, with the caveat that anything multiply defined must have identical semantics (identical interface, identical results, identical effects if non-functional) and a different syntactic structure in every definition.

Then have the development environment (but probably not the runtime environment) run multiple copies of the program, synchronizing just before and after each call to the multiply-implemented routine or class. If different results/effects are detected by a comparison of the program states at return time, halt with an error message. Clearly, if both (or all) implementations are supposed to have the same semantics, then when results/effects diverge, it is evidence that at least one has a bug.
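Here is a minimal sketch of what the development-environment check might look like, written in Python rather than as an actual language feature. The `define_twice` helper and the `sum_squares_*` implementations are hypothetical names of my own, and the sketch assumes pure functions whose results compare meaningfully with `==`:

```python
def define_twice(*implementations):
    """Combine several implementations that are supposed to have identical
    semantics: every call runs all of them on the same arguments and raises
    if the results diverge.  Sketch only -- assumes pure functions whose
    results compare meaningfully with ==."""
    def combined(*args, **kwargs):
        results = [impl(*args, **kwargs) for impl in implementations]
        first = results[0]
        for impl, result in zip(implementations[1:], results[1:]):
            if result != first:
                raise AssertionError(
                    f"divergence: {implementations[0].__name__} returned {first!r}, "
                    f"but {impl.__name__} returned {result!r} for args {args!r}")
        return first
    return combined

# Two syntactically different definitions with (supposedly) identical semantics.
def sum_squares_loop(xs):
    total = 0
    for x in xs:
        total += x * x
    return total

def sum_squares_comprehension(xs):
    return sum(x * x for x in xs)

sum_squares = define_twice(sum_squares_loop, sum_squares_comprehension)
print(sum_squares([1, 2, 3]))    # 14 -- or an AssertionError if they ever diverge
```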

It seems unlikely, though possible, that all the implementations would share the same bug, which would lead to rare false negatives: the test suite completes without error despite a lurking defect. False positives, however -- cases where the comparison halts with an error even though the tested code really does have identical semantics -- would be eliminated. Every time the alarm goes up, there is definitely something wrong.

Refactoring would also be easier: when a new implementation is supposed to provide the same services to the rest of the program as the old one, you could run them simultaneously rather than serially through a test suite, and make sure that, in fact, they do. It would also mean you could add test cases spontaneously and run them in parallel, rather than having to reconfigure your program (twice!) to test both implementations against each other on a new test case you add or find after the first round of testing.
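For the refactoring case, the same sketch could be reused to run the old and new implementations side by side; the `normalize_*` names are purely illustrative, and this assumes the hypothetical `define_twice` helper from the sketch above:

```python
# During a refactor, run the old and the new implementation side by side
# (reusing the define_twice sketch above); any call whose results diverge
# halts immediately instead of waiting for a separate test-suite pass.
def normalize_v1(s):                      # original implementation
    return " ".join(s.lower().split())

def normalize_v2(s):                      # refactored replacement
    return " ".join(part.lower() for part in s.split())

normalize = define_twice(normalize_v1, normalize_v2)

# A test case found after the first round of testing is just another call
# through the combined definition -- both versions are exercised at once.
assert normalize("  Hello\tWorld ") == "hello world"
```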

A semi-random thought about development environments. It is more of a programmer convenience, perhaps, since you could do the test and comparison by hand as well, but nevertheless an important complement to static analysis: it would let you check properties well beyond the reach of a typical type analysis, namely that routines are and do, in fact, what you want.

Anyone seen something similar, or is this actually an original idea?

Intersection of Definitions

In general, when we 'define things twice', neither definition is the full definition. For instance, we use a type for one definition and a value for the other, and they must intersect (the value must be an instance of the type).

Constraint programming is basically an ad-hoc intersection of multiple definitions. One could also simultaneously support both allowances (extending the set of possible solutions) and constraints (eliminating possible solutions).
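A rough Python illustration of the two flavours of partial definition described above, with names of my own choosing: first a type and a value that must intersect, then an allowance set narrowed by a constraint.

```python
# Two partial definitions of the same thing: one by type, one by value.
# They "intersect" when the value is an instance of the declared type,
# which is what a type checker (or this runtime assertion) verifies.
declared_type = int
declared_value = 5
assert isinstance(declared_value, declared_type), \
    "the partial definitions fail to intersect"

# Constraint-programming flavour: one definition extends the space of
# possible solutions (an allowance), another eliminates some of them
# (a constraint); what remains is their intersection.
allowed = set(range(1, 20))               # allowance

def satisfies(n):                         # constraint
    return n % 3 == 0

solutions = {n for n in allowed if satisfies(n)}
print(solutions)                          # {3, 6, 9, 12, 15, 18}
```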

Define it twice, OK. Run it twice, not so much.

I suspect you'll get few genuine detections and many false alarms:

I once worked on a project porting an application from one operating system to another. We had an automated test suite. It found a 'divergence' on every screen and every report.
Why? Because every screen and every report carried a date and time stamp.

If you run multiple implementations against the same machine (database), you're going to generate duplicate keys but not-quite-duplicate rows (they'll differ by timestamp). Or perhaps the implementations issue database updates in a different sequence yet arrive at the same set of updates at each commit point. Either way, results diverge even though neither implementation has a bug.

For refactoring, suggest you look at automated tools for regression testing.

Okay, yeah, valid on functions, not procedures.

Fair call. This technique would be valid only for functions. If you have a procedure that isn't a function -- i.e., something that alters or depends on state other than what is passed to it via arguments, such as the time of day -- it's a much less useful technique unless you can abstract the external state as well, as in the sketch below.
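One way to abstract the external state, sketched in Python with illustrative names and reusing the hypothetical `define_twice` helper from earlier: inject the external dependency (here, the current time) as an explicit argument, so both implementations see exactly the same value.

```python
import datetime

# Instead of letting each implementation read the wall clock itself, pass the
# clock in as an argument; both implementations then see the same timestamp,
# so a divergence means a genuine bug rather than a spurious time difference.
# (Reuses the define_twice sketch from above; names are illustrative.)
def expiry_v1(ttl_seconds, now):
    return now + datetime.timedelta(seconds=ttl_seconds)

def expiry_v2(ttl_seconds, now):
    return now + ttl_seconds * datetime.timedelta(seconds=1)

expiry = define_twice(expiry_v1, expiry_v2)

frozen_now = datetime.datetime(2024, 1, 1, 12, 0, 0)
print(expiry(3600, frozen_now))   # 2024-01-01 13:00:00
```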

That's how Airbus does it

The Airbus flight control system has the same functionality implemented by two teams in different locations using different programming languages on two different kinds of CPU. The results are compared constantly during flight.

http://www.cs.st-andrews.ac.uk/~ifs/Resources/CaseStudies/Airbus/Airbus-fcs.pdf

can't find it now

I thought I'd read something similar about a software project back in the 20th century, and that the result was not that exciting: the software had sort of ended up being too similar, because it was humans doing it on both teams. If we could get dolphins instead...

Dolphins might produce

Dolphins might produce much the same. Even if the similarities aren't intrinsic to the natural structure of the problem domain, and aren't intrinsic to the functioning of all general-purpose intelligence, the dolphins would have to be in cultural contact with us before they'd apply themselves to the task, so it's unlikely one could avoid contamination with similar modes of programming thought.

off topic

Off topic: John, your blog won't let me post comments. Could you please email me? jason.grossman@anu.edu.au

Whether your blog would let dolphins post is still untested.

Some paradigms promote similarity.

If people are developing something in any "total" style, they are likely to produce code that is very similar.

Which brings up the question: what do I mean by a "total" style? For example, consider Java. It is an obligate-OO language: every method must be part of a class. You can subvert that by having one big class, but people feel weird about doing so, so it isn't likely. Two separate teams of programmers, if both are using an OO paradigm, are likely to break the problem down into similar components, and will then find that the interfaces between those components require very similar methods, and so on.

Consider Haskell. It is an obligate-functional language. There are effectively no mutation operators. Two separate teams of programmers, if both using a functional paradigm, are very likely to break down the problem into similar high-level functions, which will drive a similarity in low-level functions, etc.

Scheme, on the other hand, is an example of a language that doesn't enforce a particular style; you can solve the problem in an OO way, a functional way, a procedural way, or whatever.

OO and functional are "total" styles, as I think of them, because once you are committed to the programming style, there tends to be "one right answer" to design questions for a given problem, or at most a cluster of closely related "acceptable" answers. Do it any other way without breaking from the total style and, in my experience, you will eventually write at least triple the code. Maybe not all at once or to get the first prototype out the door, but you will.

N-version programming

This is called N-version programming, for N=2. See Wikipedia. Note particularly the study of Knight and Leveson, which revealed serious problems.

Thank you!

I had not seen that. Hmmm. Given the era of the studies, it's a good bet that all the languages in use (unless LISP was involved) were procedural and relatively unstructured. It would be worth revisiting the experiment with languages that encourage conflicting "total" styles, to see whether the results still tend toward excessive similarity.