My colleague Mike Rainey described this paper as one of the nicest he's read in a while.
STABILIZER : Statistically Sound Performance Evaluation
Charlie Curtsinger, Emery D. Berger
2013
Researchers and software developers require effective performance
evaluation. Researchers must evaluate optimizations or measure
overhead. Software developers use automatic performance regression
tests to discover when changes improve or degrade performance.
The standard methodology is to compare execution times before and
after applying changes.
Unfortunately, modern architectural features make this approach
unsound. Statistically sound evaluation requires multiple samples
to test whether one can or cannot (with high confidence) reject the
null hypothesis that results are the same before and after. However,
caches and branch predictors make performance dependent on
machinespecific parameters and the exact layout of code, stack
frames, and heap objects. A single binary constitutes just one sample
from the space of program layouts, regardless of the number of runs.
Since compiler optimizations and code changes also alter layout, it
is currently impossible to distinguish the impact of an optimization
from that of its layout effects.
This paper presents STABILIZER, a system that enables the use of
the powerful statistical techniques required for sound performance
evaluation on modern architectures. STABILIZER forces executions
to sample the space of memory configurations by repeatedly rerandomizing
layouts of code, stack, and heap objects at runtime.
STABILIZER thus makes it possible to control for layout effects.
Rerandomization also ensures that layout effects follow a Gaussian
distribution, enabling the use of statistical tests like ANOVA. We
demonstrate STABILIZER's efficiency (< 7% median overhead) and
its effectiveness by evaluating the impact of LLVM’s optimizations
on the SPEC CPU2006 benchmark suite. We find that, while O2
has a significant impact relative to O1, the performance impact of
O3 over O2 optimizations is indistinguishable from random noise.
One takeaway of the paper is the following technique for validation: they verify, empirically, that their randomization technique results in a gaussian distribution of execution time. This does not guarantee that they found all the source of measurement noise, but it guarantees that the source of noise they handled are properly randomized, and that their effect can be reasoned about rigorously using the usual tools of statisticians. Having a gaussian distribution gives you much more than just "hey, taking the average over these runs makes you resilient to {weird hardward effect blah}", it lets you compute pvalues and in general use statistics.
Interesting survey.
Based on a brief look I am not sure I agree with all the conclusions/rankings. But most seem to make sense and the Notable Libraries and examples in each category are helpful.
Using Commutative Assessments to Compare Conceptual Understanding in Blocksbased and Textbased Programs, David Weintrop, Uri Wilensky. Proceedings of the eleventh annual International Conference on International Computing Education Research. Via Computing Education Blog.
Blocksbased programming environments are becoming increasingly common in introductory programming courses, but to date, little comparative work has been done to understand if and how this approach affects students' emerging understanding of fundamental programming concepts. In an effort to understand how tools like Scratch and Blockly differ from more conventional textbased introductory programming languages with respect to conceptual understanding, we developed a set of "commutative" assessments. Each multiplechoice question on the assessment includes a short program that can be displayed in either a blocks based or textbased form. The set of potential answers for each question includes the correct answer along with choices informed by prior research on novice programming misconceptions. In this paper we introduce the Commutative Assessment, discuss the theoretical and practical motivations for the assessment, and present findings from a study that used the assessment. The study had 90 high school students take the assessment at three points over the course of the first ten weeks of an introduction to programming course, alternating the modality (blocks vs. text) for each question over the course of the three administrations of the assessment. Our analysis reveals differences on performance between blocksbased and textbased questions as well as differences in the frequency of misconceptions based on the modality. Future work, potential implications, and limitations of these findings are also discussed.
This list of classic books is the result of a poll ACM conducted where members named their favorite computer science books.
Good list. Bar the last two, which I have nothing against, the list consists of favorites of mine. It is always nice to see how many classics of CS come from work on programming languages. Not a surprise for anyone here, of course, but not always acknowledged. While we are on the subject of classic books, check out Luke's twitter poll here.
Ceptre: A Language for Modeling Generative Interactive Systems.
Chris Martens
2015
We present a rule specification language called Ceptre,
intended to enable rapid prototyping for experimental
game mechanics, especially in domains that depend on
procedural generation and multiagent simulation.
Ceptre can be viewed as an explication of a new
methodology for understanding games based on linear
logic, a formal logic concerned with resource usage. We
present a correspondence between gameplay and proof
search in linear logic, building on prior work on generating narratives. In Ceptre, we introduce the ability to
add interactivity selectively into a generative model, enabling inspection of intermediate states for debugging
and exploration as well as a means of play.
We claim that this methodology can support game designers and researchers in designing, anaylzing, and debugging the core systems of their work in generative,
multiagent gameplay. To support this claim, we provide two case studies implemented in Ceptre, one from
interactive narrative and one from a strategylike domain.
Some choice quotes from the artice follow.
Simple examples of the rule language:
The meaning of A o B , to a first approximation, is
that whenever the predicates in A are present, they may be
replaced with B . One example of a rule is:
do/compliment:
at C L * at C’ L * likes C C’
o at C L * at C’ L * likes C C’ * likes C’ C.
[...]
Note that because of the replacement semantics of the
rule, we need to reiterate everything on the righthand side
of the o that we don’t want to disappear, such as the character locations and original likes fact. We use the syntactic sugar of prepending $ to anything intended not to be
removed in order to reduce this redundancy:
do/compliment: $at C L * $at C’ L * $likes C C’ o likes C’ C.
A more complex rule describes a murder action, using
the ! operator to indicate a permanent state:
do/murder:
anger C C’ * anger C C’ * anger C C’ * anger C C’
* $at C L * at C’ L * $has C weapon
o !dead C’.
(This rule consumes C ’s location, maintaining a global
invariant that each character is mutually exclusively at a
location or !dead.) Here we see a departure from planning
formalisms: four instances of anger C C’ mean something different from one.
Here we are using an emotion not
just as a precondition but as a resource, where if we have
enough of it, we can exchange it for a drastic consequence.
Whether or not we diffuse the anger, or choose to keep it by
prepending $ to the predicates, is an authorial choice.
Concurrency in narration:
Two rule applications that consume
disjoint sets of resources from the same state can be said to
happen concurrently, or independently. On the other hand,
a rule that produces resources and another that consumes a
subset of them can be said to be in a causal, or dependent,
relationship. Less abstractly, if resources represent facts associated with particular game entities or characters, then independent rule applications represent potentially concurrent
action by multiple such entities, and causally related rule applications represent either sequential action by a single actor,
or synchronized interaction between two entities.
Stages, and a larger example:
We would like to for some
of these rules to run automatically without player intervention. In our next iteration of the program, we will make use
of a Ceptre feature called stages. Stages are a way of structuring a program in terms of independent components. Syntactically, a stage is a curlybracedelimited set of rules with
an associated name. Semantically, a stage is a unit of computation that runs to quiescence, i.e. no more rules are able
to fire, at which point control may be transfered to another
stage.
[...]
Additionally, we can test the design by “scripting” certain player strategies. For instance, we
could augment the two rules in the fight stage to be deterministic, fighting when the monster can’t kill us in one turn
and fleeing otherwise:
stage fight = {
do_fight:
choice * $fight_in_progress * $monster Size * $health HP * Size < HP
o try_fight.
do_flee :
choice * fight_in_progress * $monster Size * $health HP * Size >= HP
o flee_screen.
}
If we remove interactivity from this stage, then we get
automated combat sequences that should never result in the
player’s death.
I saw this work presented at ESOP 2015 by Neil Toronto, and the talk was excellent (slides).
Running Probabilistic Programs Backwards
Neil Toronto, Jay McCarthy, David Van Horn
2015
Many probabilistic programming languages allow programs to be run under constraints in order to carry out Bayesian inference. Running programs under constraints could enable other uses such as rare event simulation and probabilistic verificationexcept that all such probabilistic languages are necessarily limited because they are defined or implemented in terms of an impoverished theory of probability. Measuretheoretic probability provides a more general foundation, but its generality makes finding computational content difficult.
We develop a measuretheoretic semantics for a firstorder probabilistic language with recursion, which interprets programs as functions that compute preimages. Preimage functions are generally uncomputable, so we derive an abstract semantics. We implement the abstract semantics and use the implementation to carry out Bayesian inference, stochastic ray tracing (a rare event simulation), and probabilistic verification of floatingpoint error bounds.
(also on SciRate)
The introduction sells the practical side of the work a bit better than the abstract.
Stochastic ray tracing [30] is one such rareevent simulation task. As illus
trated in Fig. 1, to carry out stochastic ray tracing, a probabilistic program
simulates a light source emitting a single photon in a random direction, which
is reflected or absorbed when it hits a wall. The program outputs the photon’s
path, which is constrained to pass through an aperture. Millions of paths that
meet the constraint are sampled, then projected onto a simulated sensor array.
The program’s main loop is a recursive function with two arguments: path ,
the photon’s path so far as a list of points, and dir , the photon’s current direction.
simulatephoton path dir :=
case (findhit (fst path) dir) of
absorb pt −→ (pt, path)
reflect pt norm −→ simulatephoton (pt, path) (randomhalfdir norm)
Running simulatephoton (pt, ()) dir , where pt is the light source’s location and
dir is a random emission direction, generates a photon path. The fst of the path
(the last collision point) is constrained to be in the aperture. The remainder of
the program is simple vector math that computes rayplane intersections.
In contrast, handcoded stochastic ray tracers, written in generalpurpose
languages, are much more complex and divorced from the physical processes they
simulate, because they must interleave the advanced Monte Carlo algorithms
that ensure the aperture constraint is met.
Unfortunately, while many probabilistic programming languages support random real numbers, none are capable of running a probabilistic program like simulatephoton under constraints to carry out stochastic ray tracing. The reason is not lack of
engineering or weak algorithms, but is theoretical at its core: they are all either
defined or implemented using [density functions]. [...] Programs whose outputs are deterministic
functions of random values and programs with recursion generally cannot denote
density functions. The program simulatephoton exhibits both characteristics.
Measuretheoretic probability is a more powerful alternative to this naive
probability theory based on probability mass and density functions. It not only
subsumes naive probability theory, but is capable of defining any computable
probability distribution, and many uncomputable distributions. But while even
the earliest work [15] on probabilistic languages is measuretheoretic, the theory’s
generality has historically made finding useful computational content difficult.
We show that measuretheoretic probability can be made computational by
 Using measuretheoretic probability to define a compositional, denotational
semantics that gives a valid denotation to every program.
 Deriving an abstract semantics, which allows computing answers to questions
about probabilistic programs to arbitrary accuracy.
 Implementing the abstract semantics and efficiently solving problems.
In fact, our primary implementation, Dr. Bayes, produced Fig. 1b by running a probabilistic program like simulatephoton under an aperture constraint.
Markus Voelter, Bernd Kolb1, Daniel Ratiu, and Bernhard Schaetz, "mbeddr: an Extensible Cbased Programming Language and IDE for Embedded Systems", SplashCON/Wavefront 2012.
Although embedded systems are an increasingly large part of our lives, and despite the fact that embedded software would undoubtedly benefit from the kind safety guarantees provided by more advanced type systems, most embedded software development is still done in C. That's partly a result of toolchain availability, and partly because many more advanced languages typically impose requirements on memory, dynamic memory allocation, and other runtime infrastructure that simply aren't supportable on a lot of resourceconstrained microcontrollers or acceptable in a riskaverse environment. Mbeddr seems to be seeking a middle ground between C, and creating a whole new language. From the paper:
While the C programming language provides very good support for writing efficient, lowlevel code, it does not offer adequate means for defining higherlevel abstractions relevant to embedded software. In this paper we present the mbeddr technology stack that supports extension of C with constructs adequate for embedded systems. In mbeddr, efficient lowlevel programs can be written using the wellknown concepts from C. Higher level domainspecific abstractions can be seamlessly integrated into C by means of modular language extension regarding syntax, type system, semantics and IDE. In the paper we show how language extension can address the challenges of embedded software development and report on our experience in building these extensions. We show that language workbenches deliver on the promise of significantly reducing the effort of language engineering and the construction of corresponding IDEs. mbeddr is built on top of the JetBrains MPS language workbench. Both MPS and mbeddr are open source software
It appears that mbeddr allows multiple DSLs to be built on top of C to provide greater safety and more domainspecific expressions of typical embedded software patterns. Additionally, it provides integration with various analysis tools including modelcheckers. Another paper, "Preliminary Experience of using mbeddr for Developing Embedded Software", provides a look at how all of these things fit together in use.
The mbeddr approach seems similar in concept to Ivory and Tower, although mbeddr uses JetBrains MPS as the platform for creating DSLs instead of building an embedded DSL in Haskell.
Michael Greenberg, Kathleen Fisher, and David Walker, "Tracking the Flow of Ideas through the Programming Languages Literature", SNAPL 2015.
How have conferences like ICFP, OOPSLA, PLDI, and POPL evolved over the last 20 years? Did generalizing the Call for Papers for OOPSLA in 2007 or changing the name of the umbrella conference to SPLASH in 2010 have any effect on the kinds of papers published there? How do POPL and PLDI papers compare, topicwise? Is there related work that I am missing? Have the ideas in O’Hearn’s classic paper on separation logic shifted the kinds of papers that appear in POPL? Does a proposed program committee cover the range of submissions expected for the conference? If we had better tools for analyzing the programming language literature, we might be able to answer these questions and others like them in a datadriven way. In this paper, we explore how topic modeling, a branch of machine learning, might help the programming language community better understand our literature.
The authors have produced some really interesting visualizations of how the topic content of various conferences has evolved over time (it's interesting to note that OOPSLA isn't really about OO software development any more, and that PLDI appears to have seen an increasing emphasis on verification and test generation).
Also of potential interest to LtU readers: there is a prototype tool at http://tmpl.weaselhat.com/ that is based on the work presented in this paper. It allows you to upload a paper PDF, and will return the 10 most closely related papers according to the POPL/PLDI topic model. It could be a handy research tool. But, if nothing else, it's a fun way to see what else is related to a paper you're interested in.
Sergi Valverde and Ricard Solé, "Punctuated equilibrium in the large scale evolution of programming languages", SFI working paper 201409030
Here we study the large scale historical development of programming languages, which have deeply marked social and technological advances in the last half century. We analyse their historical connections using network theory and reconstructed phylogenetic networks. Using both data analysis and network modelling, it is shown that their evolution is highly uneven, marked by innovation events where new languages are created out of improved combinations of different structural components belonging to previous languages. These radiation events occur in a bursty pattern and are tied to novel technological and social niches. The method can be extrapolated to other systems and consistently captures the major classes of languages and the widespread horizontal design exchanges, revealing a punctuated evolutionary path.
The results developed here are perhaps not that surprising to people familiar with the history of programming languages. But it's interesting to see it all formalized and analyzed.
Eugenia Cheng's new popular coscience book is out, in the U.K. under the title Cakes, Custard and Category Theory: Easy recipes for understanding complex maths, and in the U.S. under the title How to Bake Pi: An Edible Exploration of the Mathematics of Mathematics:
Most people imagine maths is something like a slow cooker: very useful, but pretty limited in what it can do. Maths, though, isn't just a tool for solving a specific problem  and it's definitely not something to be afraid of. Whether you're a maths glutton or have forgotten how long division works (or never really knew in the first place), the chances are you've missed what really makes maths exciting. Calling on a baker's dozen of entertaining, puzzling examples and mathematically illuminating culinary analogies  including chocolate brownies, iterated Battenberg cakes, sandwich sandwiches, Yorkshire puddings and Möbius bagels  brilliant young academic and mathematical crusader Eugenia Cheng is here to tell us why we should all love maths.
From simple numeracy to category theory ('the mathematics of mathematics'), Cheng takes us through the joys of the mathematical world. Packed with recipes, puzzles to surprise and delight even the innumerate, Cake, Custard & Category Theory will whet the appetite of maths whizzes and arithmophobes alike. (Not to mention aspiring cooks: did you know you can use that slow cooker to make clotted cream?) This is maths at its absolute tastiest.
Cheng, one of the Catsters, gives a guided tour of mathematical thinking and research activities, and through the core philosophy underlying category theory. This is the kind of book you can give to your grandma and grandpa so they can boast to their friends what her grandchildren are doing (and bake you a nice dessert when you come and visit :) ). A pleasant weekend reading.

Recent comments
1 hour 4 min ago
1 hour 19 min ago
1 hour 22 min ago
2 hours 26 min ago
4 hours 40 min ago
4 hours 58 min ago
7 hours 40 min ago
19 hours 17 min ago
1 day 18 hours ago
2 days 10 hours ago