Behaviour Diffing

Alrighty, another question for which I'm not sure how to start looking for answers. Don't worry, this will probably be the last one for awhile.

The idea for this one is that you have version 1 of a program, and you make a change to it. Now how does version 2 differ from version 1? Or put another way, what inputs/test cases end up returning different results from before?

The purpose of this is to develop a tool that will help to ensure one is not introducing any bugs. I'm aware that there are some computability issues. I'm not asking for anything that solves the halting problem. I think that such a tool could still be useful.

I've been studying static analysis and formal verification. With more study I think I could answer this question for myself, but that's going to take forever :)

And, on a more general note, are there any heuristics for figuring out what to look for when one wants to answer a question like this? Or is it you just keep studying and following leads until you get somewhere? I don't mind the latter, it's just a lot of work that I'd rather avoid if I could :)

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Automated Debugging

Take a look at the literature about automated debugging. There are plenty of (pragmatic) approaches to capture program behavior; however, most of them are dynamic. Relevant (software engineering) conferences are ICSE, ESEC/FSE, ASE, and AADEBUG.

Regression testing work

This question comes up in regression testing, also. One person who has done work on this in regression testing is Gregg Rothermel at U-Nebraska-Lincoln. You can get to his pubs here. Check the older work, '93 through about '98. He has the abstracts up, so I would hope you can get an idea by reading them.


Behavior diffing

For simple programs that are recursive in nature (e.g. compilers), I wrote a hierarchical logging library to generate a log of program activity as an text file with varying levels of indentation, and use a standard diff tool to compare them.

This is an area where development tools could provide a lot of value. Determining when the execution two versions of a program diverge is an extremely labor-intensive task to carry out manually for complex programs.

denotational semantics and bisimulation

Related to this are denotational semantics and bisimulation. The former brings us the idea of specifying the effects of a program as a set of traces, so we can compare the set for both old and new versions to see what changed. The latter show us that we can call two programs equivalent if they are equal up to bisimulation (i.e. there can be foung no state transition that produces different effects). These are the theoretical foundations to Tim's approach.
Now the trick is producing and verifying these program traces. AFAIK the most successful approach is to use abstract interpretation, but the field is too young right now.

operational vs. denotational semantics

I think you mean operational semantics, which is indeed a set of traces - denotational semantics is where you specify the meaning of a program via some meaning function.

operational vs. denotational semantics in process calculi

Although I'm not sure exactly what Daniel had in mind, in the world of process calculus/algebra (whence bisimulation originates), the denotational semantics of a particular calculus is typically defined as a function for mapping process expressions to sets of (among other things) event traces: the "meaning" of a process is the set of sequences of events (inputs and outputs) the process might produce. The operational semantics for a calculus defines the meaning of a process as a transition system.

Better late than never

I didn't read LtU for some time, but Allan answer was exactly what I had in mind (and would answer). Thanks.

Ample for Java

This is an area where development tools could provide a lot of value. Determining when the execution two versions of a program diverge is an extremely labor-intensive task to carry out manually for complex programs.

You might find our Eclipse plugin Ample interesting. It instruments Java code such that for all objects short sequences of incoming method calls are collected in one set per class (in memory). These sets are compared across passing and failing unit tests. Classes that differ the most are suggested as potentially buggy.

One of our unrealised ideas is to use this mechanism as a trigger in a debugger. Observe one run and then halt the debugger every time we see a method sequence that we haven't observed in the previous run.

logical vs statistical

In limited languages, change impact analyses can be very useful (and the changes, verifiable).

Alice Zheng and Ben Liblit employed machine learning techniques in some neat work using data like that from Ample as a basis.

I think it's cool :)

The cites

The paper is available from Alice's homepage along with many other interesting papers in a similar vein. If you're super keen, the Mining Software Engineering Data Bibliography has a bazillion other related papers.

In my opinion approximate and statistical reasoning is the great unexplored frontier of programming lang. theory.


Trailing slash in the link ( ). That page is great, I had only seen a couple from each category before. Thanks!

- Leo