Tracking the Flow of Ideas through the Programming Languages Literature

Michael Greenberg, Kathleen Fisher, and David Walker, "Tracking the Flow of Ideas through the Programming Languages Literature", SNAPL 2015.

How have conferences like ICFP, OOPSLA, PLDI, and POPL evolved over the last 20 years? Did generalizing the Call for Papers for OOPSLA in 2007 or changing the name of the umbrella conference to SPLASH in 2010 have any effect on the kinds of papers published there? How do POPL and PLDI papers compare, topic-wise? Is there related work that I am missing? Have the ideas in O’Hearn’s classic paper on separation logic shifted the kinds of papers that appear in POPL? Does a proposed program committee cover the range of submissions expected for the conference? If we had better tools for analyzing the programming language literature, we might be able to answer these questions and others like them in a data-driven way. In this paper, we explore how topic modeling, a branch of machine learning, might help the programming language community better understand our literature.

The authors have produced some really interesting visualizations of how the topic content of various conferences has evolved over time (it's interesting to note that OOPSLA isn't really about OO software development any more, and that PLDI appears to have seen an increasing emphasis on verification and test generation).

Also of potential interest to LtU readers: there is a prototype tool at http://tmpl.weaselhat.com/ that is based on the work presented in this paper. It allows you to upload a paper PDF, and will return the 10 most closely related papers according to the POPL/PLDI topic model. It could be a handy research tool. But, if nothing else, it's a fun way to see what else is related to a paper you're interested in.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Thoughts? Questions we should answer?

We'd love to hear from LtU readers any questions they think we might be able to answer. We've recently been given access to much larger datasets, and community guidance on what sorts of questions to look at first would be much appreciated.

NB you can see some of the questions we've looked at in the figures page, http://tmpl.weaselhat.com/figures.php.