Software Development with Code Maps

Robert DeLine, Gina Venolia, and Kael Rowan, "Software Development with Code Maps", Communications of the ACM, Vol. 53 No. 8, Pages 48-54, DOI: 10.1145/1787234.1787250

Getting lost in a large code base is altogether too easy. The code consists of many thousands of symbols, with few visual landmarks to guide the eye. As a developer navigates the code, she follows hyperlinks, such as jumping from a method caller to a callee, with no visual transition to show where the jump landed. ... Better support for code diagrams in the development environment could support code understanding and communication, and could serve as a "map" to help keep developers oriented. ... Our goal is to integrate maps into the development environment such that developers can carry out most tasks within the map.

Although the focus of this article is largely on "Code Map as UI", there are hints of the possibility that we might eventually see "Code Map as Language Element" (for example, the comment that "An important lesson from the Oahu research is that developers assign meaning to the spatial layout of the code. Code Canvas therefore takes a mixed initiative approach to layout. The user is able to place any box on the map through direct manipulation..."). The same ideas will of course be familiar to anyone who has worked with environments like Simulink, which provide a combination of diagrammatic structuring and textual definition of algorithms. But in the past such environments have only really been found in specific application domains -- control systems and signal processing in the case of Simulink -- while the Code Map idea seems targeted at more general-purpose software development. Is the complexity of large software systems pushing us towards a situation in which graphical structures like Code Maps will become a common part of the syntax of general-purpose programming languages?

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

code maps = literate program

A poor man's alternative to this is of course literate programming, where chunk definitions represent the blocks of code and chunk references represent the relationships. Standard graph-layout algorithms and pattern matching on the chunk names can give a very useful clustering/spatial layout, in my experience.
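To make the idea concrete, here is a minimal sketch (not tied to any particular literate programming tool, and the chunk names are made up) of extracting that graph from noweb-style source, where `<<name>>=` opens a chunk definition, `<<name>>` inside a body is a reference, and a lone `@` returns to documentation:

```python
import re

# Chunk definition opener: "<<name>>=" at the start of a line.
DEF = re.compile(r"^<<(.+?)>>=\s*$")
# Chunk reference anywhere inside a chunk body: "<<name>>".
REF = re.compile(r"<<(.+?)>>")

def chunk_graph(source):
    """Map each defined chunk name to the set of chunks it references."""
    graph, current = {}, None
    for line in source.splitlines():
        m = DEF.match(line)
        if m:
            current = m.group(1)
            graph.setdefault(current, set())
        elif line.strip() == "@":
            current = None  # documentation resumes; stop collecting refs
        elif current is not None:
            for ref in REF.findall(line):
                graph[current].add(ref)
    return graph

example = """\
<<main>>=
<<init>>
<<loop>>
@
<<init>>=
x = 0
@
"""
print(chunk_graph(example))  # {'main': {'init', 'loop'}, 'init': set()}
```

Feeding the resulting adjacency map to any off-the-shelf graph-layout library is then straightforward.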

Making it visual and practical, BTW, is a matter of a very simple script (provided that you already have a literate programming environment in place), which has been in use for many years. See, for example, my pet project that uses noweb.

Now, if we only extend the chunking language a bit towards Turing-completeness... ;-)

Org babel

The Babel extension of the Emacs Org mode may be an interesting development for literate programming. For example, it allows mixing languages to a certain extent. Not sure, though, how well it works for large-scale projects, which is what the above research aims for.

Now, if we only extend the

Now, if we only extend the chunking language a bit towards Turing-completeness... ;-)

Why? And what is "a bit"? Do you want Turing completeness or not? And Why?

Well, I guess for code maps

Well, I guess for code maps (or LP) to become a practical replacement for direct source-code line editing in an imperative language (which seems to be the primary target of both code maps and LP), you'd need to provide the ability to build maps as a predictable composition of chunks. That implies the need for abstraction/specialization of some kind ...

Like providing the ability to parameterize chunks (ultimately, leading to chunks becoming lambdas). In web and noweb, chunks are thunky - no parameters are allowed. And they are first-order. Which leads to some serious limitations w.r.t. abstraction capability.

A "bit" because recursion is probably not useful on the chunking level. Unless you want to implement a code generator in it of course;-)
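The thunk-vs-lambda distinction above can be sketched as follows (purely hypothetically: web and noweb offer nothing like the parameterized form, and the `swap` chunk is an invented example):

```python
# Thunk-style chunk, as in web/noweb: a zero-argument body expanded verbatim
# at every reference site.
chunks = {"swap": "tmp = a; a = b; b = tmp"}

def expand_thunk(name):
    """Expand a noweb-style chunk: same text everywhere it is referenced."""
    return chunks[name]

# Parameterized chunk: the body is a function of its expansion arguments,
# so one chunk can be reused over different variables.
param_chunks = {
    "swap": lambda x, y: f"tmp = {x}; {x} = {y}; {y} = tmp",
}

def expand(name, *args):
    """Expand a lambda-style chunk at a particular argument tuple."""
    return param_chunks[name](*args)

print(expand_thunk("swap"))        # tmp = a; a = b; b = tmp
print(expand("swap", "lo", "hi"))  # tmp = lo; lo = hi; hi = tmp
```

The thunk version is stuck with the fixed names `a` and `b`; the parameterized version is the "chunks become lambdas" step.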

But what kind of power are

But what kind of power are you looking for? Do you think you would want to compute the transitive closure of a relationship in an object graph and then comment on it? I mean, what are the use cases you have in mind?

As for "imperative language", I don't think that is a distinction Knuth cared about. Knuth cared about really efficient code and one of the things he has struggled with is figuring out how a compiler could possibly do the micro-optimizations he does in MIX and MMIX. Burstall and Goguen's approach of defining specifications in terms of theories was the first approach that made sense to Knuth, although he attributed the idea to Hoare.

As an aside, some of Knuth's coding style is less than ideal and he can get away with it because (a) he is Knuth (b) he is not dealing with abstract problem domains with detailed problem decompositions, in that the core problem never changes; many problem domains have explicit requirements but they are unstable, and many problems have vague requirements due to insufficient expertise and vision.

It's interesting that the Code Maps article mentions Code Bubbles in the conclusion, implying the two projects are deeply related. Code Bubbles is a Java IDE that uses Eclipse JDT for the back-end but WPF for the front-end. The two communicate via RPC. Code Maps appears to provide a canvas facade over Visual Studio 2010 (the zooms into the source code look like Visual Studio 2010 code, and WPF 4 font rendering).

I recall that one of the

I recall that one of the reasons Knuth created web was to overcome inherent limitations of the language he used at the time - Pascal. So one can, for example, define templates and use them throughout the program. Or use chunking as a poor man's alternative to inlining.

Of course this is much less of a problem in modern languages as varied as C++ and FP ones. And inlining is also solved by the compiler, LTO etc. But even there you can find certain limitations.

If there was a pre-processor (like tangle) that could fill this gap, then a number of shortcomings (e.g., templates having rigid structure, overheads etc.) could be addressed.

And yes, occasionally you even want to compute using the chunk graph (which would require some sort of an introspection). Or rigorously relate together the main chunk thread and a parallel chunk thread related to testing, verification, benchmarking ...

Overall, I don't see Code Maps addressing these issues but rather focusing on a re-design of the old UI paradigm of navigation panes and tabs. Is it really more than just an editable visualization of existing stuff? And if the diagram is a first-class or a higher-order value, how can it be represented visually in a plausible way?

code maps != literate programming

code maps == literate programming

I don't see it. There is no serious amount of "tangling" - pulling together source files from out of an expositional presentation.

Rather, code maps give you a very limited way to break up a source file into multiple edit windows, so long as you break it up on method or class boundaries and keep those windows within the same directory box. Atop that are various window layout, syntax highlighting, arrow drawing, and hiding features.

In TeX for example, fragments of code (say, all the global variable declarations in a pascal program) can be split across many chapters so that each variable is explained alongside the concepts to which it most closely relates.

Code maps (so far, per the article and video) have nothing like that. Yet that is critical to the essence of literate programming: coding and expositional presentation by a "web of concepts" not tied to the physical layout of source files.

= and != both yield type cast errors here

code maps == literate programming
code maps != literate programming

Literate programming is such a fundamentally different topic than what is being discussed here.

Both of these suggest viewing literate programming in terms of concrete details. Donald Knuth had a much bigger vision than what TeX could afford when he created WEB. He wrote a book for us all to read, too.

[Edit: Most people refer to stuff that isn't true evolutionary step towards literate programming as "semi-literate programming".]

I will write a very long reply to this CACM article. Suffice to say I am not impressed by this article, since it seems to jump from topic to topic aimlessly and tries to connect ideas as if they belong together. It is best not to force ideas together. The natural exchange and evolution of ideas is the right way to improve software. Design in isolation - Oahu - is not.

literate and code maps


Literate programming is such a fundamentally different topic than what is being discussed here.

In some important ways I think you are mistaken. Both techniques aim to improve the mechanism - the practice - of writing, maintaining, presenting, and studying programs. Both attempt to do so by offering a human-facing presentation of the program which is, to greater and lesser degrees, divorced from the program as presented to the interpreter. In both cases, the human-facing representation allows for graphical rearrangement of the program and in both the selective hiding of details. Both allow the human presentation to be embellished with discussion and diagrams.

Perhaps most importantly, both code maps and literate programming allow authors to introduce abstract concepts which cut across the physical structure of the interpreted program, but which are vital to organizing and understanding the program.

One of the largest differences between code maps and literate programming is that code maps are, in comparison, vastly more constrained in how those abstract concepts can be presented -- the constraints come because you can only get a little bit away from the actual physical structure of the interpreted program (the underlying "source files"). Literate programming is far more flexible (assuming we're allowed to make comparisons between book form and zoomable hypertext).

The natural exchange and evolution of ideas is the right way to improve software. Design in isolation - Oahu - is not.

I would guess that your comment there would perplex the authors. The Oahu project started from the observation that the natural exchange and evolution of ideas about a program, among people working most closely on the program, was very often in the hand-written form of these architectural studies -- various kinds of block and arrow diagrams, with accompanying notes, pictures, discussion, etc.

There is a kind of folk literature of that form.

What they've attempted to do could be described as building a WYSIWYG editor for those kinds of diagrams, making it a "smart" editor by directly integrating it with the underlying development tools (compilers, debuggers, code indexers, etc.).

I am not impressed by this article

I agree that it's kind of thin. I think they are doing R&D under the hard constraint of innovating on top of existing MSFT IDE foundations, doing little experiments to find what can be put together out of various UI bits and pieces they have handy. In this case they had the graph toolkit to play around with. So it's a bit like "given the IDE foundations and these UI pieces, what do you do?" and, being UI types, they observed the ubiquity of this diagramming and put 2+2 together. A more plainly spoken paper ("We did this. It seems cool. Here's how we did it.") might be a less frustrating read.

A more plainly spoken paper

A more plainly spoken paper ("We did this. It seems cool. Here's how we did it.") might be a less frustrating read.

Agreed. A more plainly spoken paper would have been nicer. But my major argument is that they jumped from strawman to strawman to results. There was no connection between any stage of this article and how they came up with the solution. It talks about Jane's job tasks, Jane's grief, and then tries to tie her grief to a new solution. The best way to manage labor is to eliminate it, remember?

If you'd like, I can e-mail you my work-in-progress reply, which I have to shorten - it is way too long (2013 words and growing) and tries to (a) point out that their complaint is that Jane suffers disorientation through poor navigation, and that they make the immediate leap to "code maps" as a way of solving the problem rather than studying how navigation is supported in IDEs today - Figure 3 is gross and just shows they've never seen a Smalltalk IDE or appreciated how the debugging environment works, (b) deconstruct what modern IDEs do wrong, (c) show how Jane could solve her problems today without a "HOME Canvas", and (d) question what sort of abstractions should be used to visualize code ... etc.

I also love how they don't actually explain how Oahu solves the problems they wrote so arduously in the beginning use case at the start of the essay. Really, this is awful writing. Probably the only reason they don't deconstruct current IDEs is because they don't want people to know how much Visual Studio sucks despite 8 million lines of code and millions of dollars in development budget.

Edit: By Design In Isolation, I meant that their use case is so ridiculously vague and trumped up!!! That is why their re-telling of how the use case works with Oahu is so completely vacuous and meaningless!!

then our tools shape us

building a wysiwig editor for those kinds of diagrams

we've had things sorta like that before, and they've all empirically kinda sucked, should you ask me. (a debatable thing to want to do.) not because the features sucked (well, maybe they did, but that's not my particular chip-on-shoulder) but because the user experience of using a regular old computer sucked.

one thing i hope this upsurge in tablet adoption will bring us is a bit of a renaissance wrt visual languages, documentation, diagramming, manipulation thereof. because it is all very nice for people to want to move from the whiteboard to something more digitally powerful and manipulable, but the interfaces to date have all sucked donkey poo when it comes to letting people really jazz free form riff talk brainstorm about things. the mouse sucks. touch screen vertical monitors (unless they are huge whiteboard kinds) suck.

(somebody hire me to work on it? ;-)

tablets / visual languages

one thing i hope this upsurge in tablet adoption will bring us is a bit of a renaissance wrt visual languages, documentation, diagramming, manipulation thereof.

I think that's a large part of why the authors of the paper are playing with scroll-bar-free, panning, zooming, mostly-tiling UI tools.


The CACM article doesn't mention it but Oahu is a Surface codename project. The 75,000 line c# project they refer to is just one experiment conducted by the Oahu team. Supposedly there are others.

Well, if there is no

Well, if there is no tangling then there must be some "untangling"!
What happens if you suddenly introduce a new class or a method inside one box? Will it automatically cause creation of another box?

LP can perfectly well be used without C and/or TeX. If I remember correctly, this was the point of noweb. You can also embed C in C or Haskell in XML if you want.

Doesn't a code map representation of the code resemble chunk graph representation, where presentation is guided by the design rather than by language limitations (physical layout of source files)?

re: "untangling"

What happens if you suddenly introduce a new class or a method inside one box? Will it automatically cause creation of another box?

When you plunk a text-edit cursor down in a box, it is nothing more or less than at a particular position within a particular underlying source file. If you insert a new class or method, it is (exactly) like inserting that text in the underlying source file.

The IDE atop which they are building -- the one that predates code maps -- keeps an automatically updated index of where each class and method can be found. To that, code maps add an index mapping file locations to particular boxes. Code maps use that for searching and navigating.

From what they describe, the user can explicitly pull a given method or class out to a separate box. It stays the same place in the underlying source file. But now that line of that source file maps to a different box.

There is a tiny bit of "untangling" in the sense that they keep track of how file lines map to boxes, but that's it. Also note that how things are broken out into boxes is somewhat restricted (from what they show).
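That "tiny bit" could be illustrated like this (a purely hypothetical sketch of such an index, not the actual Code Canvas data structures; the file and box names are invented): edits remain edits on the underlying file, while navigation resolves a file position to a box.

```python
from bisect import bisect_right

class BoxIndex:
    """Maps ranges of source-file lines to canvas boxes."""

    def __init__(self):
        # file path -> sorted list of (start_line, end_line, box_id)
        self.ranges = {}

    def assign(self, path, start, end, box_id):
        """Record that lines start..end of a file are shown in box_id."""
        spans = self.ranges.setdefault(path, [])
        spans.append((start, end, box_id))
        spans.sort()

    def box_for(self, path, line):
        """Resolve a file position to the box displaying it, if any."""
        spans = self.ranges.get(path, [])
        i = bisect_right(spans, (line, float("inf"), "")) - 1
        if i >= 0 and spans[i][0] <= line <= spans[i][1]:
            return spans[i][2]
        return None

idx = BoxIndex()
idx.assign("Parser.cs", 1, 40, "box:Parser")
# A method the user pulled out into its own box; the lines themselves
# stay exactly where they were in the source file.
idx.assign("Parser.cs", 41, 60, "box:Parser.Tokenize")
print(idx.box_for("Parser.cs", 50))  # box:Parser.Tokenize
```

Pulling a method out to a separate box is then just a matter of updating the index, which matches the observation that the source file itself is never rearranged.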

Doesn't a code map representation of the code resemble chunk graph representation, where presentation is guided by the design rather than by language limitations (physical layout of source files)?

Only secondarily, is how I would describe it. For example, code map boxes break only at boundaries defined by language limitations. At least in the examples they show, boxes are also grouped by directory. That is one reason why it is not obvious how tangle-like concepts such as chunk appending and macro expansion could be introduced.

Sounds a bit like a graphical editor

Sounds a bit like a graphical editor for Beta language's fragment system.
Here you use slots in place of specific syntactical entities and separately create one or more fragments that can fit in matching slots. A fragment specification describes how to pull together a complete program from various files. This was strictly for physical structuring of a program. Reminiscent of structured schematic editors (like OrCAD, etc., where you can separate the definition of a logic block from its use). Interestingly, most hardware designers have moved away from schematic editors! Anything but the most simple stuff is designed using Verilog/VHDL, not schematics.
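The slot-and-fragment idea can be sketched roughly like this (a hypothetical illustration, not actual Beta syntax; the `Stack` template and fragment names are invented):

```python
import re

# A program template declaring named slots of the form <<category:name>>.
template = "class Stack { <<descriptor:push>> <<descriptor:pop>> }"

# Separately defined fragments, each fitting a matching slot; in Beta the
# fragment specification would say which files these come from.
fragments = {
    "push": "void push(int x) { data[top++] = x; }",
    "pop":  "int pop() { return data[--top]; }",
}

def assemble(template, fragments):
    """Fill each <<category:name>> slot with the fragment bound to name."""
    return re.sub(r"<<\w+:(\w+)>>", lambda m: fragments[m.group(1)], template)

print(assemble(template, fragments))
```

The point is that the binding of fragments to slots lives outside both the template and the fragments, which is what lets the physical structure of the program be specified separately.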

Excellent related reading in "See also" section

Jef Raskin's ACM rant in 2003 on "The Woe of IDEs", written two years before his death. Great quote about psychological effect observed in open source IDE development: "Almost no attention is paid to the user interface of the [open source] IDE because the programmers who participate generally have not studied cognitive science and are unaware of the difficulties their own IDE designs are causing them."

i wish cubicon weren't apparently crack-ware

crack as in too crazy (in a good way) to ever become real. it sounded nice when i saw a presentation of it years ago. there's a web site, but not much else that i've heard of in a while.

it was, i thought, about providing different "weaves" to the user/dev.

Hard person to Google

Found this

The Future Is Now: The Architecture of the Semantic Net, which talks about his views on distributed architecture for collective intelligence

I'm (almost) afraid the future was yesterday actually...

Well, yes, yesterday.

IMHO, the current Code Maps and Code Canvas attempts are pretty interesting ideas.

But as I like to keep myself pragmatic usually, I tend to look first at what we've got already, and honestly, the same thing has never ceased to amaze me for almost a decade now: we already have so much to keep our brains busy with, when you think of it.

So much, or rather, sooo many : languages, of course.

I've been having the feeling that maybe even more than the recent craze for marrying (or, confronting) the "object paradigm" with (or, against) the "functional paradigm", or the "visual" vs. the "textual", etc., what we may need to really acknowledge/accept with the full extent of its implications is ... we are actually experiencing a paradigm shift, but not towards any of these ends, which keep breeding others, btw (e.g., reactive programming is another "new" kid on the block).

As I see it, it's not about us finding, "by luck" or not, the "magic" formula for combining these (functional unifying OO? functional only? etc.); it's more about enabling *the choice* of combination, and for the long run. Because it seems to me that it all comes down to only one object being observed in the end, redesigned, augmented, tinkered with, derived from, ... (you name the labour): the languages being invented or improved, precisely!

Sorry I can't disclose more than these shy screenshots of this and that, for example, but my latest and still current approach is thus to acknowledge the fact that we always end up needing, in our tools and skillset, the best bridges possible (preferably two-way, and in an isomorphic fashion; that's even sweeter when we can) *between* languages. These pictures are just an example in the specific problem space of systems' domain-specific modeling, to have a useful source of truth before codegen; but it's not just about modeling, of course, it's also about provers, verifiers, etc.

So, I wrote "yesterday" in the comment title alluding to the fact that I'm only (now, 2010) about to put the above-linked to real work, while I *could* have, technically, implemented it as early as... three years ago, already. Well, yes: the underlying toolset in that instance -- the MS DSL Tools -- having become stable in Feb. 2007.

So, if we're willing to target modernity, while the research is for sure always something to watch as attentively as regularly, we really have so much (too much?) to do with the tools we've got already. Man.

But maybe it's also time to realize that to satisfy our hunger for power in these tools and languages, we'll have "to pay the price" of looking at the latter as the real new *de facto* first class of building blocks to deal with -- I suspect they are, already, first-class, but it's likely still only in our subconscious (except mine ;) and for sure not acknowledged yet in our tooling and our practice.

My .02

The Future was Yesterday

I thought companies already provide solutions like these, see for example

Sorry for the confusion

Sorry for the confusion, my bad. I was actually alluding to the tooling support for designing and implementing one's new language(s) by reusing the design and implementation of the legacy, and not "just" language-artifact management, no matter how large its scale.

treemap fancy visualization

Shameless link to my own related work

Well, you can...

... to me, that's really neat! I'd love to have the same integrated in the Visual Studio beastie I have to use, anyway, that's for sure.

Thanks for the shameless link, then. Very to the point. :)

Should work on windows too.

Should work on Windows too. It's using cairo + gtk + ocaml, and all 3 are Windows-compatible.

Did you see the See Also section of the article?

There is a link to some work on using heat maps to detect performance issues: Visualizing System Latency by Brendan Gregg

You might also be interested in last year's PLATEAU workshop at OOPSLA (now SPLASH). I've also seen a preview of some of the papers for this years PLATEAU workshop at SPLASH, and some of them provide some good insight into software psychology.

links to those interesting

links to those interesting papers?

they're rolled into one tech

they're rolled into one tech report for PLATEAU 2009.

As for 2010, I surveyed the early PLATEAU 2010 workshop program based on talk titles and e-mailed some of the speakers for a pre-draft. The advantage of seeing pre-drafts is that when you are listening to the talk you don't have to write as much down, and when you are verbally processing information your brain only retains 3 out of 7 words.