
Software Cartography and Code Navigation

A recent PhD dissertation by Adrian Kuhn; abstract:

Despite common belief, software engineers do not spend most of their time writing code. It has been shown that approximately 50-90% of development time is spent on code orientation, that is, navigation and understanding of source code. This may include reading local source code and documentation and searching the internet for code examples and tutorials, but also seeking the help of other developers.

In this dissertation we argue that, in order to support software engineers in code navigation and understanding, we need development tools that provide first-class support for the code orientation clues that developers rely on. We argue further that development tools need to tap unconventional information found in the source code in order to provide developers with code orientation clues that would be out of their reach without tool support.

...

Among the code orientation strategies used by developers, spatial clues stand out for not having a first-class representation in the ecosystem of source code. Therefore, we introduce Software Cartography, an approach to create spatial onscreen visualization of software systems based on non-spatial properties. Software maps are stable over time, embedded in the development environment, and can be shared among teams. We implement the approach in the CodeMap tool and evaluate it in a qualitative user study. We show that software maps are most helpful to explore search results and call hierarchies.

Distributed capabilities versus network latency

With distributed capabilities we can send an object from computer X to computer Y. We do not want computer Y to be able to look at the internals of the object, hence we send a distributed reference to the object. If Y wants to do something with the object, it has to ask computer X to do it instead. This has two problems: (1) it causes latency on computer Y because of the network round-trip, and (2) it causes extra load on computer X.
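As a rough sketch of what that looks like (the names and the in-process "transport" below are made up for illustration, not taken from any particular system), Y holds only a proxy, so every operation is a request forwarded to X:

    // Sketch: a distributed reference as seen from Y. The "transport" is
    // just a function standing in for a real network round-trip.
    final class FarRef(send: (String, Seq[Any]) => Any) {
      // Problem (1): every call blocks on a round-trip to X.
      // Problem (2): every call consumes resources on X.
      def call(method: String, args: Any*): Any = send(method, args)
    }

    // On X: the real object, whose internals Y never sees.
    final class Employee(var name: String, var id: Int) {
      def incrementID(): Unit = { id += 1 }
    }

    object FarRefDemo extends App {
      val onX = new Employee("Ada", 41)

      // The transport dispatches incoming requests to the real object on X.
      val farEmployee = new FarRef({
        case ("getName", _)     => onX.name
        case ("incrementID", _) => onX.incrementID()
        case (other, _)         => sys.error(s"unknown method: $other")
      })

      // On Y: even reading the name costs a round-trip and some work on X.
      println(farEmployee.call("getName"))
    }

In a real system the send function would marshal the request and block on the reply, which is exactly where the latency on Y and the extra load on X come from.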

In some cases the security we get from using references is not necessary. The data inside the object might not be secret, or it might be only partially secret. For example, suppose we have an Employee object and send it over to computer Y. The name and ID of the employee are not secret, so we can send them over so that Y can do its work without further contacting X.
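A sketch of that idea (illustrative names only): X sends the non-secret fields by copy, so Y answers such questions locally, with no round-trip and no load on X.

    // Sketch: the non-secret part of the Employee travels by copy.
    // Plain data, no authority attached.
    final case class EmployeeSnapshot(name: String, id: Int)

    object SnapshotDemo extends App {
      // On X: build the snapshot from the real Employee and send it to Y
      // (serialization and transport omitted).
      val snapshot = EmployeeSnapshot(name = "Ada", id = 41)

      // On Y: reads are purely local.
      println(snapshot.name)
      println(snapshot.id)
    }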

Another case is a mutable Employee object. Suppose that its name and ID are a Mutable[String] and a Mutable[Number] respectively, where a Mutable is a mutable reference. When we send the object over to Y, we want to give Y read access but not write access to the name and ID. Hence we should not send distributed references to the Mutable[String] and Mutable[Number] objects, because then Y would be able to invoke the name.set(newname) method of the mutable reference. Instead we have to send over an immutable version of the Mutables.
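One way to picture "an immutable version of the Mutables" (the Mutable and ReadOnly types below are assumptions for the sketch, not a specific library): Y is handed something that can get the value but has no path to set.

    // Sketch (made-up types): a Mutable cell and the two weaker things X
    // could hand to Y instead of the Mutable itself.
    trait ReadOnly[A] { def get: A }

    final class Mutable[A](private var value: A) {
      def get: A = value
      def set(newValue: A): Unit = { value = newValue }

      // An attenuated facet: read access only, no way to reach set.
      def readOnlyView: ReadOnly[A] = new ReadOnly[A] { def get: A = value }

      // A frozen copy of the current value, suitable for sending by copy.
      def freeze: A = value
    }

    // The mutable Employee described above (Int standing in for Number).
    final class Employee(val name: Mutable[String], val id: Mutable[Int])

Sending the read-only facet as a far reference keeps every read on X; sending the frozen copy makes reads local at the cost of possible staleness.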

Suppose the Employee object has a method incrementID that sets its ID to ID+1. When we invoke this on computer Y, we have to send a network request to X because we do not have write access to the ID. In this case it is not possible to eliminate problem (2), but it is still possible to eliminate (1). We can give Y write access to its own mutable ID reference. When Y calls incrementID, it updates its local ID to ID+1 *and* sends a network request invoking incrementID on X. This way Y can continue executing immediately instead of waiting for the network request to complete. A malicious Y can do what it pleases with its own copy of the ID, for example decrement it, but it cannot mutate the ID stored on X other than via incrementID.
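A sketch of that trick (everything below is illustrative; the "network call" is just a Future standing in for a request to X): Y applies the increment to its local copy right away and fires the remote invocation without waiting for the reply.

    import scala.concurrent.{ExecutionContext, Future}
    import ExecutionContext.Implicits.global

    // Sketch: Y's local view holds its own writable copy of the ID plus a
    // handle that invokes incrementID on X. (All names are made up.)
    final class LocalEmployeeView(
        val name: String,                      // by-copy, effectively read-only
        private var localId: Int,              // Y's own writable copy of the ID
        remoteIncrementID: () => Future[Unit]  // stands in for the network call to X
    ) {
      def id: Int = localId

      def incrementID(): Unit = {
        localId += 1          // local update: Y sees the new value immediately
        remoteIncrementID()   // fire off the request so X's copy catches up; do not wait
      }
    }

    object LatencyDemo extends App {
      // On X: the authoritative ID.
      var idOnX = 41
      def incrementOnX(): Future[Unit] = Future { idOnX += 1 }  // pretend round-trip

      // On Y:
      val view = new LocalEmployeeView("Ada", 41, () => incrementOnX())
      view.incrementID()
      println(view.id)  // 42 immediately, without waiting for X's reply
    }

Keeping the two copies consistent when other parties also modify the ID is exactly the synchronization problem raised below.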

Is there a paper that describes primitives for doing this? Something along the lines of marking some objects as not secret, and then automatically providing either latency-reducing operations (such as incrementID) or operations that completely eliminate network usage (such as getName), as appropriate. There might be problems with keeping the local and remote objects in sync, and with avoiding duplicate remote calls to other objects. Any ideas/input is appreciated.

Filtering system calls with a packet filtering language

LtUers will appreciate the new security feature of the Linux kernel that lets you run Berkeley Packet Filter programs over system call arguments. After all, userland already knows its ABI: system call numbers and the desired arguments.
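For the PL angle, here is a toy model of the idea, not the kernel's actual BPF bytecode or seccomp interface: a filter is a small program evaluated against the system call number and its arguments, yielding an allow/deny verdict.

    // Toy model (not the real kernel ABI): a filter is evaluated over the
    // syscall number and its arguments and returns a verdict.
    sealed trait Verdict
    case object Allow extends Verdict
    case object Deny  extends Verdict

    // One rule: match on the syscall number, then inspect its arguments.
    final case class Rule(syscallNr: Int, check: Seq[Long] => Verdict)

    final class SyscallFilter(rules: Seq[Rule], default: Verdict) {
      def evaluate(syscallNr: Int, args: Seq[Long]): Verdict =
        rules.find(_.syscallNr == syscallNr)
             .map(_.check(args))
             .getOrElse(default)
    }

    object FilterDemo extends App {
      // E.g. "allow write only to stdout or stderr"; the syscall number
      // used here is illustrative.
      val filter = new SyscallFilter(
        rules = Seq(Rule(1, args =>
          if (args.headOption.exists(fd => fd == 1L || fd == 2L)) Allow else Deny)),
        default = Deny
      )
      println(filter.evaluate(1, Seq(1L, 0L, 0L)))  // Allow: writing to fd 1
      println(filter.evaluate(2, Seq.empty))        // Deny: not whitelisted
    }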

Adding more interpreter-like features to the kernel has been discussed before in the context of splice, a zero-copy data transfer API:

(Of course, the "kernel buffer" notion does allow for a notion of "kernel filters" too, but then you get to shades of STREAMS, and that just scares the crap out of me, so..)