Balancing the tension of dev vs. debug?

Java's enhanced-for loop means you can't easily tell which item from the collection is currently in use when you stop in the debugger, because there isn't an index any more. (You can laboriously look for a matching reference value in the collection, I guess.) Functional approaches such as map() or filter() seem like they must 'suffer' from the same issue. Is it the case that debugging is very different from developing? The former seems to want to be able to reveal any and all information, the latter to hide as much as possible. Are there languages which somehow manage to do a great job of serving both masters?
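For concreteness, the common workaround is to keep a manual counter alongside the enhanced-for loop, purely so the debugger (or a print) has a position to show. A minimal sketch (class and variable names are mine):

```java
import java.util.List;

public class IndexDemo {
    public static void main(String[] args) {
        List<String> items = List.of("alpha", "beta", "gamma");
        int index = 0; // manual counter, visible in a debugger
        for (String item : items) {
            // Breaking here shows both the element and its position
            System.out.println(index + ": " + item);
            index++;
        }
    }
}
```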

Lazy functional languages

Lazy functional languages are notoriously difficult to debug. We had numerous threads on the issues involved and the tools invented to overcome them. Truthfully, I don't think I've heard of a good solution...

(Mind you, I am religious in my disdain for debuggers in general, so I am not the best person to comment on these issues).

GHC's debugger is worth

GHC's debugger is worth playing with, FWIW. Though the last time I tried to do something mildly clever, using ST RealWorld to poke around inside some ST computations piece by piece, something didn't want to type right (but that wasn't with the current version of GHC, either).

Different Techniques

I should start by saying that I tend to avoid debuggers for languages other than C/C++, and then I only use them to map seg-faults onto line numbers. For debugging in any language I've always used the "add layers of prints" approach to find the problem. The simple reason for this is that I find the interface of writing chunks of code to filter the stream of events that a program creates to be easier than any utility that I've tried to do the same thing.

When you debug-by-printf your first problem disappears. Although you don't know the index within a collection you can display what the item is rather than fighting with references.

Debugging declarative languages tends to be very different. One of the nicest (and some would say worst) things about Prolog is that you can abuse the procedural semantics to cause side-effects at any point in the execution. In this setting I tend to use the same set of tricks as when debugging in an imperative language.

When working in a strongly typed language, such as Haskell, it is not possible to litter the code with IO effects without major rewriting of the functions involved. The main technique in that situation tends to be lifting the function out and testing it in a different context, i.e. writing a wrapper for the function that allows IO to occur.

In general the only viable approach that I've found for debugging in Haskell is to keep functions small enough that you can debug them in your head. When you hit problems with higher-order code like filtering / mapping it seems to be easiest to split the function into smaller steps and convert your HO operations to map onto Strings that can be output to check each intermediate step.
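The same "split into smaller steps" tactic applies to higher-order pipelines in other languages too. A sketch in Java (the thread's original language), where each stage of a map/filter pipeline is materialized and printed so every intermediate result can be checked:

```java
import java.util.List;
import java.util.stream.Collectors;

public class StepwiseDemo {
    public static void main(String[] args) {
        List<Integer> input = List.of(1, 2, 3, 4, 5);
        // Instead of one fused pipeline, materialize each stage so it
        // can be printed and inspected on its own.
        List<Integer> squared = input.stream()
                .map(x -> x * x)
                .collect(Collectors.toList());
        System.out.println("after map: " + squared);
        List<Integer> kept = squared.stream()
                .filter(x -> x % 2 == 1)
                .collect(Collectors.toList());
        System.out.println("after filter: " + kept);
    }
}
```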

One general characteristic that I've come across is that weakly typed languages make it easier to debug code, but harder to ensure correctness. Strongly typed languages make it easier to enforce correctness, but more difficult to verify it when something goes wrong. I'm not aware of a functional language that solves this problem. One way would be to have debug statements that act as side-effects on a stream of events: basically, emit statements that are invisible during normal execution of the program but can be switched on to examine how execution proceeded.
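That "switchable emit" idea can be approximated in plain Java with a pass-through helper gated on a flag; this is only a sketch, and the `debug.trace` system property name is my own invention:

```java
import java.util.List;
import java.util.stream.Collectors;

public class TraceDemo {
    // Hypothetical toggle: run with -Ddebug.trace=true to see the events
    static final boolean TRACE = Boolean.getBoolean("debug.trace");

    // Pass-through: returns its argument unchanged, optionally logging it
    static <T> T trace(String label, T value) {
        if (TRACE) System.err.println(label + ": " + value);
        return value;
    }

    public static void main(String[] args) {
        List<Integer> result = List.of(1, 2, 3, 4).stream()
                .map(x -> trace("squared", x * x))
                .filter(x -> trace("kept?", x) > 4)
                .collect(Collectors.toList());
        System.out.println(result);
    }
}
```

With the flag off the emissions are invisible and the program just prints the final result; with it on, every intermediate value appears on stderr.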

For debugging in any

For debugging in any language I've always use the "add layers of prints" approach to find the problem.

This is far more difficult in web applications. I've found the graphical .NET debugger very decent overall actually. It's much faster than inserting print statements generally, though not always.

Yes this is true. In general

Yes this is true. In general inserting print statements breaks down for any kind of interaction. Web apps are a good example, and GUIs can also be difficult to fix without a decent debugger.

One workaround

I noticed this problem and came up with a heavyweight solution for the project I was working on: redesign the project so that it can always be debugged in a form amenable to print-statements. For interactions the solution is to record the input at an abstract level, with an event layer separating the form of input (GUI, Web app, etc.) from the internal state. One ends up with an Observer/MVC-style architecture; after doing that, you can simply create a command line interface, record some test cases with the production interface, and then play them back one step at a time. Lots of benefits result from using this approach.
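A hedged sketch of that record-and-replay event layer in Java (the types and names here are mine, not from the original project): the production interface appends abstract events to a log, and the log can later be replayed against the model one step at a time with prints in between:

```java
import java.util.ArrayList;
import java.util.List;

public class ReplayDemo {
    // Abstract event: independent of whether it came from a GUI or web form
    interface Event { void applyTo(Counter model); }

    record Increment(int by) implements Event {
        public void applyTo(Counter model) { model.value += by; }
    }

    static class Counter { int value = 0; }

    public static void main(String[] args) {
        // The production interface would append events here as the user acts
        List<Event> recorded = new ArrayList<>();
        recorded.add(new Increment(2));
        recorded.add(new Increment(3));

        // Later: replay step by step, inspecting state between events
        Counter model = new Counter();
        for (Event e : recorded) {
            e.applyTo(model);
            System.out.println("state after event: " + model.value);
        }
    }
}
```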

When working in a strongly

When working in a strongly typed language, such as Haskell, it is not possible to litter the code with IO effects

Whilst that's true in general, the Debug.Trace Haskell module provides this function:

trace :: String -> a -> a

This lets you print arbitrary values at arbitrary points in your code without requiring type surgery.

I was waiting...

...to see other comments before chiming in.

First, I tend to agree with your conjecture that debugging is different from designing (in my vocabulary, both are part of the higher activity of developing, but one hopes the latter is a much larger part ;-) Working to conceive an appropriate set of concepts and express them effectively seems to be a different "mental mode" than the detective work of determining why an expression doesn't capture the intended concept. However...

I honestly can't remember the last time I used a debugger. (I admit that it was a normal part of my toolkit in the days when I was writing device drivers for MS-DOS.) And the for-each-in construct has been around for so long (decades at least) that I was just annoyed that Java waited so long to have it.

But since reading your post, I've been pondering the original issue you raised--the interplay between language features and defect diagnosis. I believe that an emphasis on design-for-testability in general (and the use of unit testing specifically) plays a role in the answer. I believe that features of the language that support the creation of "units" of code that can be reasoned about and tested in isolation contribute throughout the development cycle (design, construction, and defect diagnosis/elimination). For example, in Java, it's OO and the ease of developing against interfaces instead of concrete classes. (Of course, quite a few of the patterns common in OO are subsumed by higher-order functions in FP.)

Thinking about my own recent experience, it seems that most defects relating to iterating over a collection have to do with one of the following design errors:

  1. Putting an object in the collection that shouldn't have been there to begin with (based on the intended meaning of the collection).
  2. Conversely, failing to have included an object that should have been present.
  3. Misusing the object obtained from the collection.
  4. Failure to maintain state appropriately between successive members of the collection (e.g. incorrectly modifying an accumulating value).
  5. Making an incorrect assumption/dependency about the ordering of the collection.

For my education, can you give me a use case for needing to know the index when reasoning about a defect?

Re: use case

I don't have anything detailed and very concrete off the top of my head, apologies. Uhm, drat. If I come across something soon I'll post.

A component of it is the desire to be able to get my bearings inside some code; the more data i can inspect to reassure myself what is vs. what isn't working right, the more comfortable i am that i'm on the right track.

Add a counter

I think the best way to see how far you got before the bug happens is to have a counter that tells you how many items of the collection have been treated so far. The nice thing is that it can be extended to the parallel case (computations executed on items in parallel).
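In Java you don't actually need compiler support for this; a thread-safe counter covers even the parallel case. A minimal sketch, assuming a parallel stream stands in for "computations executed on items in parallel":

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class CounterDemo {
    public static void main(String[] args) {
        AtomicInteger processed = new AtomicInteger();
        List<String> items = List.of("a", "b", "c", "d");
        items.parallelStream().forEach(item -> {
            // Thread-safe count; a breakpoint or print here shows
            // how many items have been treated when the bug hits
            int n = processed.incrementAndGet();
            process(item, n);
        });
        System.out.println("total processed: " + processed.get());
    }

    static void process(String item, int n) { /* work on one item */ }
}
```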

If you tell the compiler that you want a debug executable it should be easy to add that automatically; however, in Java I do not think there is any notion of a "debug executable". I have no idea what the bytecode output for an enhanced-for loop looks like and whether it is amenable to that extension.

Let the debugger add the counter

Most debuggers these days will allow you to set a break point that only triggers after it has been reached a certain number of times. The debugger holds the counter, rather than the executable.

Re: not needing a debugger

I've heard that before from some number of people. It has always completely flabbergasted me; I regularly need to use both printf and interactive source debugger style debugging to figure out what the heck is going on / wrong. :-)

It would be interesting (cf. the recent request out of MSR to research people doing FP development, presumably including debugging? or the stuff Brad Myers has done) if there were some long-term ongoing research into how people debug, and what causes them to choose the approaches they do. I suspect there are many factors involved in deciding how to approach debugging:

(a) what you are familiar with in terms of tools. i often see Java folks who are apparently afraid / unwilling to use the debugger i think simply because they've never tried it. the thing about using a Java debugger vs. printf is that you can answer more new questions w/out having to restart the system.

(b) what your code base is like. if it doesn't suck a lot then maybe just reasoning about it will work. but if you have stateful gnarly yukky spaghetti code (even in Java, shock!) then it can be harder vs. just doing empirical study.

(c) what language you are using.

(d) what your system is doing; is it batch, or interactive? is it floods of data, or always just waiting for the user to do something? etc.

(e) how you learn as an individual. if i'm trying to fix a bug, i tend to like to be able to see / browse as much data around where i think the bug is as possible, to get a feel for things. i like to have access to any and all data so i can try to get my bearings both in terms of what code is executing, and e.g. where in the input data the 'cursor' is. this can include stupid stuff that should really be assumed to be working like oh does the iterator really work, or whatever vs. leaving all the inner state of the abstractions hidden. (that really drives me nuts in the case that happens a lot with Java of some kind of reflection or aop or lazyness or whatever e.g. with mocking for unit tests, or with xml parsing libraries, etc.)

(f) yadda yadda yadda.

Visual Studio 2008 SP1 + C#

is supposed to allow fine-grained debugging of LINQ queries, including using LINQ in the visualizer.

Back in time debugging ...

I came across this talk a while back and found it interesting - Debugging backwards in time by Bill Lewis - a tech talk at Google.

Interesting talk. Bill

Interesting talk. Bill Lewis' work can be found here. Similar technology was included in Omnicore CodeGuide a year or two later as well.

Only two "L"s

Later searchers may want to notice that Bil Lewis' first name has only one "L" in it.

A hardware approach

Tom Cargill at Bell Labs once suggested a simple hardware scheme for stepping programs backwards using a cycle counter with an interrupting comparison register. Instead of keeping a log of all updates and incrementally undoing them, you run the program once, snapshotting every once in a while. Imagine a binary tree in time of all possible snapshots. Record just the snapshots along a path to the root. Then running backward just involves running forward from the most recent appropriate snapshot, updating a few (usually) snapshots on the way. This steps the program backward in constant amortized time per step with only a logarithmic space penalty. It works for any program that you can checkpoint/restart and requires no changes to the running software.

Bart Locanthi built an interrupting cycle counter into his Gnot workstation, but Cargill switched jobs before implementing this idea.
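A toy version of the checkpoint-and-replay idea can be sketched in Java. This uses simple periodic snapshots rather than Cargill's binary tree of snapshots in time (which is what brings the space cost down to logarithmic), and an arbitrary deterministic step function stands in for one instruction of a real program:

```java
import java.util.TreeMap;

public class BackstepDemo {
    // One deterministic "instruction": the whole toy program state is an int
    static int step(int state) { return state * 3 + 1; }

    public static void main(String[] args) {
        TreeMap<Integer, Integer> snapshots = new TreeMap<>();
        int state = 1;
        int stateAt57 = 0; // recorded on the forward run, for checking only
        snapshots.put(0, state);
        for (int t = 1; t <= 100; t++) {
            state = step(state);
            if (t == 57) stateAt57 = state;
            if (t % 10 == 0) snapshots.put(t, state); // periodic checkpoint
        }
        // "Step backward" to time 57: restore the nearest earlier
        // checkpoint (t=50) and run forward the remaining 7 steps.
        var nearest = snapshots.floorEntry(57);
        int replayed = nearest.getValue();
        for (int t = nearest.getKey(); t < 57; t++) replayed = step(replayed);
        System.out.println("replay matches forward run: "
                + (replayed == stateAt57));
    }
}
```

Because re-execution is deterministic, the replayed state is identical to the state observed on the original forward run.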

the obvious way

This is probably more or less how UndoDB works. I have not used it but according to its documentation it takes snapshots by forking, and likely logs all data it gets from syscalls.
The OCaml debugger also uses forking, by the way.

There are also simulators able to run backwards. This is obviously easier to accomplish, but the principles do not have to be any different.

Re-execution needs to be deterministic. I'm not sure how UndoDB handles multithreaded programs - perhaps they have their own deterministic scheduler.

Counting instructions rather than cycles seems more useful on modern hardware where the number of cycles to run a piece of code can vary from time to time.

Web Debugging

I always write to a log file and then just do this:
tail -f /log/file/name

and that works for just about any web language on the planet!