
Using Commutative Assessments to Compare Conceptual Understanding in Blocks-based and Text-based Programs

Using Commutative Assessments to Compare Conceptual Understanding in Blocks-based and Text-based Programs, David Weintrop, Uri Wilensky. Proceedings of the eleventh annual International Conference on International Computing Education Research. Via Computing Education Blog.

Blocks-based programming environments are becoming increasingly common in introductory programming courses, but to date, little comparative work has been done to understand if and how this approach affects students' emerging understanding of fundamental programming concepts. In an effort to understand how tools like Scratch and Blockly differ from more conventional text-based introductory programming languages with respect to conceptual understanding, we developed a set of "commutative" assessments. Each multiple-choice question on the assessment includes a short program that can be displayed in either a blocks-based or text-based form. The set of potential answers for each question includes the correct answer along with choices informed by prior research on novice programming misconceptions. In this paper we introduce the Commutative Assessment, discuss the theoretical and practical motivations for the assessment, and present findings from a study that used the assessment. The study had 90 high school students take the assessment at three points over the course of the first ten weeks of an introduction to programming course, alternating the modality (blocks vs. text) for each question over the course of the three administrations of the assessment. Our analysis reveals differences in performance between blocks-based and text-based questions as well as differences in the frequency of misconceptions based on the modality. Future work, potential implications, and limitations of these findings are also discussed.

State of the Haskell ecosystem - August 2015

Interesting survey.

Based on a brief look, I am not sure I agree with all the conclusions/rankings, but most seem to make sense, and the Notable Libraries and examples in each category are helpful.

Big questions

So, I've been (re)reading Hamming's /The Art of Doing Science and Engineering/, which includes the famous talk "You and Your Research". That's the one where he recommends thinking about the big questions in your field. So here's one that we haven't talked about in a while. It seems clear that more and more things are being automated, machine learning is improving, systems are becoming harder to tinker with, and so on. So for how long are we going to be programming in ways similar to those we are used to, which have been with us essentially since the dawn of computing? Clearly, some people will be programming as long as there are computers. But is the number of people churning out code going to remain significant? In five years - of course. Ten? Most likely. Fifteen - I am not so sure. Twenty? I have no idea.

One thing I am sure of: as long as programming remains something many people do, there will be debates about static type checking.

Update: To put this in perspective - LtU turned fifteen last month. Wow.

Update 2: Take the poll!

How long will we still be programming?

STABILIZER: Statistically Sound Performance Evaluation

My colleague Mike Rainey described this paper as one of the nicest he's read in a while.

STABILIZER: Statistically Sound Performance Evaluation
Charlie Curtsinger, Emery D. Berger
2013

Researchers and software developers require effective performance evaluation. Researchers must evaluate optimizations or measure overhead. Software developers use automatic performance regression tests to discover when changes improve or degrade performance. The standard methodology is to compare execution times before and after applying changes.

Unfortunately, modern architectural features make this approach unsound. Statistically sound evaluation requires multiple samples to test whether one can or cannot (with high confidence) reject the null hypothesis that results are the same before and after. However, caches and branch predictors make performance dependent on machine-specific parameters and the exact layout of code, stack frames, and heap objects. A single binary constitutes just one sample from the space of program layouts, regardless of the number of runs. Since compiler optimizations and code changes also alter layout, it is currently impossible to distinguish the impact of an optimization from that of its layout effects.

This paper presents STABILIZER, a system that enables the use of the powerful statistical techniques required for sound performance evaluation on modern architectures. STABILIZER forces executions to sample the space of memory configurations by repeatedly re-randomizing layouts of code, stack, and heap objects at runtime. STABILIZER thus makes it possible to control for layout effects. Re-randomization also ensures that layout effects follow a Gaussian distribution, enabling the use of statistical tests like ANOVA. We demonstrate STABILIZER's efficiency (< 7% median overhead) and its effectiveness by evaluating the impact of LLVM’s optimizations on the SPEC CPU2006 benchmark suite. We find that, while -O2 has a significant impact relative to -O1, the performance impact of -O3 over -O2 optimizations is indistinguishable from random noise.

One take-away of the paper is the following technique for validation: they verify, empirically, that their randomization technique results in a Gaussian distribution of execution times. This does not guarantee that they found all the sources of measurement noise, but it guarantees that the sources of noise they did handle are properly randomized, and that their effects can be reasoned about rigorously using the usual tools of statisticians. Having a Gaussian distribution gives you much more than just "hey, taking the average over these runs makes you resilient to {weird hardware effect blah}": it lets you compute p-values and, in general, use standard statistical machinery.
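To make that concrete, here is a minimal sketch (not from the paper; the timing numbers are invented for illustration) of what the Gaussian assumption buys you once you have execution-time samples from layout-randomized runs. It uses Python's scipy.stats to check each sample for normality and then test whether two optimization levels actually differ.

    # A sketch only: the timings below are made up, and the sampling step
    # (re-running the benchmark under freshly randomized code/stack/heap
    # layouts) is what STABILIZER automates; here we just assume we already
    # have the resulting samples.
    from scipy import stats

    # Hypothetical per-run execution times (seconds) for one benchmark,
    # compiled at -O2 and -O3, each run under a different randomized layout.
    o2_times = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]
    o3_times = [12.0, 12.2, 11.9, 12.1, 12.3, 11.8, 12.0, 12.2]

    # Step 1: check that each sample looks Gaussian (Shapiro-Wilk test).
    # A large p-value means we cannot reject normality, which is what
    # re-randomization is supposed to deliver.
    for label, xs in (("-O2", o2_times), ("-O3", o3_times)):
        _, p_normal = stats.shapiro(xs)
        print(f"{label}: Shapiro-Wilk p = {p_normal:.3f}")

    # Step 2: test the null hypothesis "same mean runtime" with a two-sample
    # t-test (equivalent to ANOVA restricted to two groups). A large p-value
    # means the -O3 vs -O2 difference is indistinguishable from layout noise.
    t_stat, p_value = stats.ttest_ind(o2_times, o3_times)
    print(f"t = {t_stat:.3f}, p = {p_value:.3f}")

With more than two configurations (say -O1, -O2, -O3), scipy.stats.f_oneway would give the ANOVA the abstract mentions instead of the pairwise t-test.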