Crowd Documentation: Exploring the Coverage and the Dynamics of API Discussions on Stack Overflow

A quite interesting and definitely LtU-relevant blog post on "Crowd documentation", which is really just a good sales pitch for the following technical report: Crowd Documentation: Exploring the Coverage and the Dynamics of API Discussions on Stack Overflow

Chris Parnin, Christoph Treude, Lars Grammel and Margaret-Anne Storey (2012)

Traditionally, many types of software documentation, such as API documentation, require a process where a few people write for many potential users. The resulting documentation, when it exists, is often of poor quality and lacks sufficient examples and explanations. In this paper, we report on an empirical study to investigate how Question and Answer (Q&A) websites, such as Stack Overflow, facilitate crowd documentation — knowledge that is written by many and read by many. We examine the crowd documentation for three popular APIs: Android, GWT, and the Java programming language. We collect usage data using Google Code Search, and analyze the coverage, quality, and dynamics of the Stack Overflow documentation for these APIs. We find that the crowd is capable of generating a rich source of content with code examples and discussion that is actively viewed and used by many more developers. For example, over 35,000 developers contributed questions and answers about the Android API, covering 87% of the classes. This content has been viewed over 70 million times to date. However, there are shortcomings with crowd documentation, which we identify. In addition to our empirical study, we present future directions and tools that can be leveraged by other researchers and software designers for performing API analytics and mining of crowd documentation.

Implementing abstract classes automatically?

Here is an interesting construct I've been working with recently. Say we have an abstract class (trait) A with one abstract method, do:

trait A {
  abstract void do();
}

Now say we have two implementations:

trait B : A {
  override void do() { ... }
}
trait C : A {
  override void do() { ... }
}
object myUnderspecifiedA : A { ... }

Given an underspecified object that extends A, the compiler could choose either B or C to "complete" it. Perhaps we have a model expressing that an object extending A more often extends B than C, so why not just select B? Also, the object's other traits might allow only one of the two candidates to implement "do". Consider:

trait D : A {
  abstract void bar();
}
trait E : D {
  override void bar() { ... }
}
trait F : D {
  override void bar() { ... }
}
trait B : E {
  override void do() { ... }
}
trait G : A {
  abstract void foo();
}
trait C : F, G {
  override void do() { ... }
}
trait H : G {
  override void foo() { ... }
}

object myUnderspecifiedAWithF : A, F { ... }

In this case, the compiler could have the object extend C but not B: B inherits E's implementation of bar, which would clash with the implementation the object already gets from F, so bar would be implemented in two different ways. Completing an object with additional traits is also recursive: because the object now extends C, and therefore G, we must find a trait that implements G's abstract foo method, so we also choose H (the final solution includes A, F, C, H). Finding a solution for the object thus involves finding a trait that implements an abstract method, adding that trait to the extends list, and then solving for any abstract methods the new trait brings in. Any choice can dead-end and require backtracking, so the algorithm looks a lot like a Prolog solver.
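
To make the search concrete, here is a minimal Python sketch of the completion procedure described above. It is only an illustration under stated assumptions, not an implementation of any existing class system: the Trait record, the TRAITS table and the complete function are names invented for this example, and "consistency" is simplified to mean that no method is implemented by more than one trait in the inheritance closure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Trait:
    # Hypothetical record type for this sketch; not part of any real language.
    name: str
    parents: tuple = ()
    abstract: frozenset = frozenset()    # methods declared abstract here
    implements: frozenset = frozenset()  # methods given a body here

# The trait hierarchy from the second example, in declaration order.
TRAITS = {t.name: t for t in [
    Trait("A", abstract=frozenset({"do"})),
    Trait("D", ("A",), abstract=frozenset({"bar"})),
    Trait("E", ("D",), implements=frozenset({"bar"})),
    Trait("F", ("D",), implements=frozenset({"bar"})),
    Trait("B", ("E",), implements=frozenset({"do"})),
    Trait("G", ("A",), abstract=frozenset({"foo"})),
    Trait("C", ("F", "G"), implements=frozenset({"do"})),
    Trait("H", ("G",), implements=frozenset({"foo"})),
]}

def closure(names):
    """All traits reachable from `names` through the parent relation."""
    seen, stack = set(), list(names)
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(TRAITS[n].parents)
    return seen

def analyse(names):
    """Return (has_conflict, missing_methods) for an extends list."""
    impls, required = {}, set()
    for n in closure(names):
        t = TRAITS[n]
        required |= t.abstract
        for m in t.implements:
            impls.setdefault(m, set()).add(n)
    # Simplifying assumption: two implementations of one method is a conflict.
    conflict = any(len(who) > 1 for who in impls.values())
    return conflict, required - impls.keys()

def complete(extends):
    """Backtracking search: return a consistent, fully implemented
    extends list that starts with `extends`, or None if none exists."""
    conflict, missing = analyse(extends)
    if conflict:
        return None                      # dead end, caller tries the next trait
    if not missing:
        return list(extends)             # nothing abstract is left
    method = min(missing)                # pick one open method deterministically
    for cand in TRAITS.values():         # candidates in declaration order
        if method in cand.implements and cand.name not in extends:
            result = complete(extends + [cand.name])
            if result is not None:
                return result
    return None

print(complete(["A", "F"]))  # B is tried first, fails on bar, then C and H
```

On the example above the search first tries B, rejects it because bar would get two implementations, backtracks to C, and then recurses to pick H for foo.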

Has anyone heard of a class system like this before? My goal is a language where you can underspecify a program and the compiler fills in the gaps by finding traits that implement the abstract methods while maintaining consistency. A "best" solution could be chosen based on a model that includes "can-be" probabilities.
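
As a rough illustration of the "best solution" idea, the Python sketch below ranks two candidate completions of an object that extends only A: adding B, or adding C and H. Everything here is assumed for the example: the CAN_BE weights are made-up numbers, and scoring a completion by the product of its weights treats the choices as independent, which a real model might not.

```python
import math

# Hypothetical "can-be" weights: how likely an object is to extend each
# trait. These numbers are invented for illustration only.
CAN_BE = {"B": 0.7, "C": 0.3, "H": 0.9}

def score(added):
    """Score a completion as the product of the weights of the traits it
    adds (assumes the choices are independent); unknown traits score 0."""
    return math.prod(CAN_BE.get(t, 0.0) for t in added)

def best(completions):
    """Pick the completion with the highest score."""
    return max(completions, key=score)

# The two consistent completions of an object that extends only A.
candidates = [["B"], ["C", "H"]]
print(best(candidates))
```

With these weights the B completion wins, because the C completion also has to pay for the extra H choice.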