Software Cartography and Code Navigation

A recent PhD dissertation by Adrian Kuhn; abstract:

Despite common belief, software engineers do not spend most of their time writing code. It has been shown that approximately 50-90% of development time is spent on code orientation, that is, navigation and understanding of source code. This may include reading local source code and documentation, searching the internet for code examples and tutorials, and seeking help from other developers.

In this dissertation we argue that, in order to support software engineers in code navigation and understanding, we need development tools that provide first-class support for the code orientation clues that developers rely on. We argue further that development tools need to tap unconventional information found in the source code in order to provide developers with code orientation clues that would be out of their reach without tool support.


Among the code orientation strategies used by developers, spatial clues stand out for not having a first-class representation in the ecosystem of source code. Therefore, we introduce Software Cartography, an approach to create spatial onscreen visualization of software systems based on non-spatial properties. Software maps are stable over time, embedded in the development environment, and can be shared among teams. We implement the approach in the CodeMap tool and evaluate it in a qualitative user study. We show that software maps are most helpful to explore search results and call hierarchies.

More Resources

This is the first I've heard of the idea of software cartography, but it certainly has my head spinning with possibilities.

what to measure?

Software visualization is of interest to me. And I certainly spend a lot of time lost in source code. So, to map anything one needs a way to measure distance. So what metric does this author propose?

We propose to use vocabulary as the most natural analogue of physical position
for software artifacts, and to map these positions to a two-dimensional
space as a way to achieve consistent layout for software maps. Distance between
software artifacts then corresponds to distance in their vocabulary. We use a
combination of the Isomap algorithm [179] and Multidimensional Scaling [33] to
project the high-dimensional vector space model onto the two-dimensional
visualization pane. Finally we use cartographic techniques (such as digital elevation,
hill-shading and contour lines) to generate a landscape representing the
frequency of topics. We call our approach Software Cartography, and call a
series of visualizations Software Maps, when they all use the same consistent
layout created by our approach.
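To make "distance in their vocabulary" concrete, here is a minimal sketch of one way it could be computed. It is an illustration under stated assumptions, not the dissertation's actual pipeline: the term splitting, the raw term counts, the cosine distance, and the three tiny sample artifacts are all assumptions made up for this example. The projection onto two dimensions (Isomap and Multidimensional Scaling in the quote above) is not shown.

```python
import math
import re
from collections import Counter


def vocabulary(source: str) -> Counter:
    """Count lowercase terms, splitting identifiers such as parseHttpRequest.

    Assumption: a term is a run of letters, split at camelCase boundaries.
    """
    words = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", source)
    return Counter(word.lower() for word in words)


def cosine_distance(a: Counter, b: Counter) -> float:
    """Return 1 - cosine similarity of two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # an empty vocabulary shares nothing with anything
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical artifacts, invented purely for illustration.
artifacts = {
    "HttpServer": "class HttpServer { void handleRequest(HttpRequest request) "
                  "{ sendResponse(request); } }",
    "HttpClient": "class HttpClient { HttpResponse sendRequest(HttpRequest request) "
                  "{ return connect(request); } }",
    "MatrixMath": "class MatrixMath { double[] multiplyMatrix(double[] matrix, "
                  "double[] vector) { return vector; } }",
}

vocab = {name: vocabulary(text) for name, text in artifacts.items()}

if __name__ == "__main__":
    names = sorted(vocab)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            d = cosine_distance(vocab[first], vocab[second])
            print(f"{first} <-> {second}: {d:.2f}")
```

With these samples the two HTTP classes share most of their terms and so end up much closer to each other than either is to the matrix class, which is the kind of neighbourhood a map layout would then try to preserve.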

Apparently his formulation of vocabulary is API usage. So the notion is that the amount or selection of APIs used in given functions, source files, packages, or projects represents distance. It's an interesting take on the matter and warrants further study, but I am dubious as to the benefits.

I think the constraining

I think the constraining issue is what can be measured. API usage is fairly easy to measure, and the result can be useful when trying to make sense of how to use a library (seeing as the popular APIs are the ones you are likely to use!). They do some qualitative user studies to validate their approach, but I'm not confident that their results are very meaningful. Still, the idea is good and it can help us to start thinking about the topic of understanding libraries.