STABILIZER : Statistically Sound Performance Evaluation

My colleague Mike Rainey described this paper as one of the nicest he's read in a while.

STABILIZER : Statistically Sound Performance Evaluation
Charlie Curtsinger, Emery D. Berger

Researchers and software developers require effective performance evaluation. Researchers must evaluate optimizations or measure overhead. Software developers use automatic performance regression tests to discover when changes improve or degrade performance. The standard methodology is to compare execution times before and after applying changes.

Unfortunately, modern architectural features make this approach unsound. Statistically sound evaluation requires multiple samples to test whether one can or cannot (with high confidence) reject the null hypothesis that results are the same before and after. However, caches and branch predictors make performance dependent on machine-specific parameters and the exact layout of code, stack frames, and heap objects. A single binary constitutes just one sample from the space of program layouts, regardless of the number of runs. Since compiler optimizations and code changes also alter layout, it is currently impossible to distinguish the impact of an optimization from that of its layout effects.

This paper presents STABILIZER, a system that enables the use of the powerful statistical techniques required for sound performance evaluation on modern architectures. STABILIZER forces executions to sample the space of memory configurations by repeatedly re-randomizing layouts of code, stack, and heap objects at runtime. STABILIZER thus makes it possible to control for layout effects. Re-randomization also ensures that layout effects follow a Gaussian distribution, enabling the use of statistical tests like ANOVA. We demonstrate STABILIZER's efficiency (< 7% median overhead) and its effectiveness by evaluating the impact of LLVM’s optimizations on the SPEC CPU2006 benchmark suite. We find that, while -O2 has a significant impact relative to -O1, the performance impact of -O3 over -O2 optimizations is indistinguishable from random noise.

One take-away of the paper is the following technique for validation: they verify, empirically, that their randomization technique results in a gaussian distribution of execution time. This does not guarantee that they found all the source of measurement noise, but it guarantees that the source of noise they handled are properly randomized, and that their effect can be reasoned about rigorously using the usual tools of statisticians. Having a gaussian distribution gives you much more than just "hey, taking the average over these runs makes you resilient to {weird hardward effect blah}", it lets you compute p-values and in general use statistics.

State of the Haskell ecosystem - August 2015

Interesting survey.

Based on a brief look I am not sure I agree with all the conclusions/rankings. But most seem to make sense and the Notable Libraries and examples in each category are helpful.

Using Commutative Assessments to Compare Conceptual Understanding in Blocks-based and Text-based Programs

Using Commutative Assessments to Compare Conceptual Understanding in Blocks-based and Text-based Programs, David Weintrop, Uri Wilensky. Proceedings of the eleventh annual International Conference on International Computing Education Research. Via Computing Education Blog.

Blocks-based programming environments are becoming increasingly common in introductory programming courses, but to date, little comparative work has been done to understand if and how this approach affects students' emerging understanding of fundamental programming concepts. In an effort to understand how tools like Scratch and Blockly differ from more conventional text-based introductory programming languages with respect to conceptual understanding, we developed a set of "commutative" assessments. Each multiple-choice question on the assessment includes a short program that can be displayed in either a blocks- based or text-based form. The set of potential answers for each question includes the correct answer along with choices informed by prior research on novice programming misconceptions. In this paper we introduce the Commutative Assessment, discuss the theoretical and practical motivations for the assessment, and present findings from a study that used the assessment. The study had 90 high school students take the assessment at three points over the course of the first ten weeks of an introduction to programming course, alternating the modality (blocks vs. text) for each question over the course of the three administrations of the assessment. Our analysis reveals differences on performance between blocks-based and text-based questions as well as differences in the frequency of misconceptions based on the modality. Future work, potential implications, and limitations of these findings are also discussed.

ACM Classic Books Series

This list of classic books is the result of a poll ACM conducted where members named their favorite computer science books.

Good list. Bar the last two, which I have nothing against, the list consists of favorites of mine. It is always nice to see how many classics of CS come from work on programming languages. Not a surprise for anyone here, of course, but not always acknowledged. While we are on the subject of classic books, check out Luke's twitter poll here.

Ceptre: A Language for Modeling Generative Interactive Systems.

Ceptre: A Language for Modeling Generative Interactive Systems.
Chris Martens

We present a rule specification language called Ceptre, intended to enable rapid prototyping for experimental game mechanics, especially in domains that depend on procedural generation and multi-agent simulation.

Ceptre can be viewed as an explication of a new methodology for understanding games based on linear logic, a formal logic concerned with resource usage. We present a correspondence between gameplay and proof search in linear logic, building on prior work on generating narratives. In Ceptre, we introduce the ability to add interactivity selectively into a generative model, enabling inspection of intermediate states for debugging and exploration as well as a means of play.

We claim that this methodology can support game designers and researchers in designing, anaylzing, and debugging the core systems of their work in generative, multi-agent gameplay. To support this claim, we provide two case studies implemented in Ceptre, one from interactive narrative and one from a strategy-like domain.

Some choice quotes from the artice follow.

Simple examples of the rule language:

The meaning of A -o B, to a first approximation, is that whenever the predicates in A are present, they may be replaced with B. One example of a rule is:

     at C L * at C’ L * likes C C’
  -o at C L * at C’ L * likes C C’ * likes C’ C.


Note that because of the replacement semantics of the rule, we need to reiterate everything on the right-hand side of the -o that we don’t want to disappear, such as the character locations and original likes fact. We use the syntactic sugar of prepending $ to anything intended not to be removed in order to reduce this redundancy:

do/compliment: $at C L * $at C’ L * $likes C C’ -o likes C’ C.

A more complex rule describes a murder action, using the ! operator to indicate a permanent state:

    anger C C’ * anger C C’ * anger C C’ * anger C C’
    * $at C L * at C’ L * $has C weapon
  -o !dead C’.

(This rule consumes C’s location, maintaining a global invariant that each character is mutually exclusively at a location or !dead.) Here we see a departure from planning formalisms: four instances of anger C C’ mean something different from one. Here we are using an emotion not just as a precondition but as a resource, where if we have enough of it, we can exchange it for a drastic consequence. Whether or not we diffuse the anger, or choose to keep it by prepending $ to the predicates, is an authorial choice.

Concurrency in narration:

Two rule applications that consume disjoint sets of resources from the same state can be said to happen concurrently, or independently. On the other hand, a rule that produces resources and another that consumes a subset of them can be said to be in a causal, or dependent, relationship. Less abstractly, if resources represent facts associated with particular game entities or characters, then independent rule applications represent potentially concurrent action by multiple such entities, and causally related rule applications represent either sequential action by a single actor, or synchronized interaction between two entities.

Stages, and a larger example:

We would like to for some of these rules to run automatically without player intervention. In our next iteration of the program, we will make use of a Ceptre feature called stages. Stages are a way of structuring a program in terms of independent components. Syntactically, a stage is a curly-brace-delimited set of rules with an associated name. Semantically, a stage is a unit of computation that runs to quiescence, i.e. no more rules are able to fire, at which point control may be transfered to another stage.


Additionally, we can test the design by “scripting” certain player strategies. For instance, we could augment the two rules in the fight stage to be deterministic, fighting when the monster can’t kill us in one turn and fleeing otherwise:

stage fight = {
      choice * $fight_in_progress * $monster Size * $health HP * Size < HP
    -o try_fight.
  do_flee :
      choice * fight_in_progress * $monster Size * $health HP * Size >= HP
    -o flee_screen.

If we remove interactivity from this stage, then we get automated combat sequences that should never result in the player’s death.

Running Probabilistic Programs Backwards

I saw this work presented at ESOP 2015 by Neil Toronto, and the talk was excellent (slides).

Running Probabilistic Programs Backwards
Neil Toronto, Jay McCarthy, David Van Horn

Many probabilistic programming languages allow programs to be run under constraints in order to carry out Bayesian inference. Running programs under constraints could enable other uses such as rare event simulation and probabilistic verification---except that all such probabilistic languages are necessarily limited because they are defined or implemented in terms of an impoverished theory of probability. Measure-theoretic probability provides a more general foundation, but its generality makes finding computational content difficult.

We develop a measure-theoretic semantics for a first-order probabilistic language with recursion, which interprets programs as functions that compute preimages. Preimage functions are generally uncomputable, so we derive an abstract semantics. We implement the abstract semantics and use the implementation to carry out Bayesian inference, stochastic ray tracing (a rare event simulation), and probabilistic verification of floating-point error bounds.

(also on SciRate)

The introduction sells the practical side of the work a bit better than the abstract.

Stochastic ray tracing [30] is one such rare-event simulation task. As illus- trated in Fig. 1, to carry out stochastic ray tracing, a probabilistic program simulates a light source emitting a single photon in a random direction, which is reflected or absorbed when it hits a wall. The program outputs the photon’s path, which is constrained to pass through an aperture. Millions of paths that meet the constraint are sampled, then projected onto a simulated sensor array.

The program’s main loop is a recursive function with two arguments: path, the photon’s path so far as a list of points, and dir, the photon’s current direction.

simulate-photon path dir :=
  case (find-hit (fst path) dir) of
    absorb pt −→ (pt, path)
    reflect pt norm −→ simulate-photon (pt, path) (random-half-dir norm)

Running simulate-photon (pt, ()) dir, where pt is the light source’s location and dir is a random emission direction, generates a photon path. The fst of the path (the last collision point) is constrained to be in the aperture. The remainder of the program is simple vector math that computes ray-plane intersections.

In contrast, hand-coded stochastic ray tracers, written in general-purpose languages, are much more complex and divorced from the physical processes they simulate, because they must interleave the advanced Monte Carlo algorithms that ensure the aperture constraint is met.

Unfortunately, while many probabilistic programming languages support random real numbers, none are capable of running a probabilistic program like simulate-photon under constraints to carry out stochastic ray tracing. The reason is not lack of engineering or weak algorithms, but is theoretical at its core: they are all either defined or implemented using [density functions]. [...] Programs whose outputs are deterministic functions of random values and programs with recursion generally cannot denote density functions. The program simulate-photon exhibits both characteristics.

Measure-theoretic probability is a more powerful alternative to this naive probability theory based on probability mass and density functions. It not only subsumes naive probability theory, but is capable of defining any computable probability distribution, and many uncomputable distributions. But while even the earliest work [15] on probabilistic languages is measure-theoretic, the theory’s generality has historically made finding useful computational content difficult. We show that measure-theoretic probability can be made computational by

  1. Using measure-theoretic probability to define a compositional, denotational semantics that gives a valid denotation to every program.
  2. Deriving an abstract semantics, which allows computing answers to questions about probabilistic programs to arbitrary accuracy.
  3. Implementing the abstract semantics and efficiently solving problems.

In fact, our primary implementation, Dr. Bayes, produced Fig. 1b by running a probabilistic program like simulate-photon under an aperture constraint.

mbeddr: an Extensible C-based Programming Language and IDE for Embedded Systems

Markus Voelter, Bernd Kolb1, Daniel Ratiu, and Bernhard Schaetz, "mbeddr: an Extensible C-based Programming Language and IDE for Embedded Systems", SplashCON/Wavefront 2012.

Although embedded systems are an increasingly large part of our lives, and despite the fact that embedded software would undoubtedly benefit from the kind safety guarantees provided by more advanced type systems, most embedded software development is still done in C. That's partly a result of toolchain availability, and partly because many more advanced languages typically impose requirements on memory, dynamic memory allocation, and other runtime infrastructure that simply aren't supportable on a lot of resource-constrained microcontrollers or acceptable in a risk-averse environment. Mbeddr seems to be seeking a middle ground between C, and creating a whole new language. From the paper:

While the C programming language provides very good support for writing efficient, low-level code, it does not offer adequate means for defining higher-level abstractions relevant to embedded software. In this paper we present the mbeddr technology stack that supports extension of C with constructs adequate for embedded systems. In mbeddr, efficient low-level programs can be written using the well-known concepts from C. Higher level domain-specific abstractions can be seamlessly integrated into C by means of modular language extension regarding syntax, type system, semantics and IDE. In the paper we show how language extension can address the challenges of embedded software development and report on our experience in building these extensions. We show that language workbenches deliver on the promise of significantly reducing the effort of language engineering and the construction of corresponding IDEs. mbeddr is built on top of the JetBrains MPS language workbench. Both MPS and mbeddr are open source software

It appears that mbeddr allows multiple DSLs to be built on top of C to provide greater safety and more domain-specific expressions of typical embedded software patterns. Additionally, it provides integration with various analysis tools including model-checkers. Another paper, "Preliminary Experience of using mbeddr for Developing Embedded Software", provides a look at how all of these things fit together in use.

The mbeddr approach seems similar in concept to Ivory and Tower, although mbeddr uses JetBrains MPS as the platform for creating DSLs instead of building an embedded DSL in Haskell.

Tracking the Flow of Ideas through the Programming Languages Literature

Michael Greenberg, Kathleen Fisher, and David Walker, "Tracking the Flow of Ideas through the Programming Languages Literature", SNAPL 2015.

How have conferences like ICFP, OOPSLA, PLDI, and POPL evolved over the last 20 years? Did generalizing the Call for Papers for OOPSLA in 2007 or changing the name of the umbrella conference to SPLASH in 2010 have any effect on the kinds of papers published there? How do POPL and PLDI papers compare, topic-wise? Is there related work that I am missing? Have the ideas in O’Hearn’s classic paper on separation logic shifted the kinds of papers that appear in POPL? Does a proposed program committee cover the range of submissions expected for the conference? If we had better tools for analyzing the programming language literature, we might be able to answer these questions and others like them in a data-driven way. In this paper, we explore how topic modeling, a branch of machine learning, might help the programming language community better understand our literature.

The authors have produced some really interesting visualizations of how the topic content of various conferences has evolved over time (it's interesting to note that OOPSLA isn't really about OO software development any more, and that PLDI appears to have seen an increasing emphasis on verification and test generation).

Also of potential interest to LtU readers: there is a prototype tool at that is based on the work presented in this paper. It allows you to upload a paper PDF, and will return the 10 most closely related papers according to the POPL/PLDI topic model. It could be a handy research tool. But, if nothing else, it's a fun way to see what else is related to a paper you're interested in.

Punctuated equilibrium in the large scale evolution of programming languages

Sergi Valverde and Ricard Solé, "Punctuated equilibrium in the large scale evolution of programming languages", SFI working paper 2014-09-030

Here we study the large scale historical development of programming languages, which have deeply marked social and technological advances in the last half century. We analyse their historical connections using network theory and reconstructed phylogenetic networks. Using both data analysis and network modelling, it is shown that their evolution is highly uneven, marked by innovation events where new languages are created out of improved combinations of different structural components belonging to previous languages. These radiation events occur in a bursty pattern and are tied to novel technological and social niches. The method can be extrapolated to other systems and consistently captures the major classes of languages and the widespread horizontal design exchanges, revealing a punctuated evolutionary path.

The results developed here are perhaps not that surprising to people familiar with the history of programming languages. But it's interesting to see it all formalized and analyzed.

Cakes, Custard, and Category Theory

Eugenia Cheng's new popular coscience book is out, in the U.K. under the title Cakes, Custard and Category Theory: Easy recipes for understanding complex maths, and in the U.S. under the title How to Bake Pi: An Edible Exploration of the Mathematics of Mathematics:

Most people imagine maths is something like a slow cooker: very useful, but pretty limited in what it can do. Maths, though, isn't just a tool for solving a specific problem - and it's definitely not something to be afraid of. Whether you're a maths glutton or have forgotten how long division works (or never really knew in the first place), the chances are you've missed what really makes maths exciting. Calling on a baker's dozen of entertaining, puzzling examples and mathematically illuminating culinary analogies - including chocolate brownies, iterated Battenberg cakes, sandwich sandwiches, Yorkshire puddings and Möbius bagels - brilliant young academic and mathematical crusader Eugenia Cheng is here to tell us why we should all love maths.

From simple numeracy to category theory ('the mathematics of mathematics'), Cheng takes us through the joys of the mathematical world. Packed with recipes, puzzles to surprise and delight even the innumerate, Cake, Custard & Category Theory will whet the appetite of maths whizzes and arithmophobes alike. (Not to mention aspiring cooks: did you know you can use that slow cooker to make clotted cream?) This is maths at its absolute tastiest.

Cheng, one of the Catsters, gives a guided tour of mathematical thinking and research activities, and through the core philosophy underlying category theory. This is the kind of book you can give to your grandma and grandpa so they can boast to their friends what her grandchildren are doing (and bake you a nice dessert when you come and visit :) ). A pleasant weekend reading.