Benchmarking and Statistics

I recently posted an excerpt from a Ruby book I'm working on (it's a sidebar from a section on the Benchmark library), but I'm not sure if it's solid enough for publication (my math background isn't up to snuff). I'd love to see/hear what LtUers think about it.

The general idea is that benchmarking as currently provided by the library is insufficient -- a single comparison run between options (even a long one) says nothing about the statistical significance of the variation between the run times of the two (or more) options. So I present a somewhat naive statistical model for making those comparisons (and, in the blog post, encourage someone with better Ruby and math chops to turn the idea into a library).
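To make the point concrete, here is a minimal, hypothetical sketch (not the model from the post) of what a statistically-aware comparison might look like in Ruby: time each option repeatedly and compute Welch's t statistic over the two samples. The helper names, run counts, and example blocks below are all made up for illustration.

require 'benchmark'

# Time a block n times, returning an array of wall-clock durations in seconds.
def sample_times(n, &block)
  Array.new(n) { Benchmark.realtime(&block) }
end

def mean(xs)
  xs.sum / xs.size.to_f
end

# Unbiased sample variance.
def variance(xs)
  m = mean(xs)
  xs.sum { |x| (x - m)**2 } / (xs.size - 1).to_f
end

# Welch's t statistic for two independent samples with unequal variances.
def welch_t(a, b)
  (mean(a) - mean(b)) / Math.sqrt(variance(a) / a.size + variance(b) / b.size)
end

a = sample_times(30) { (1..100_000).to_a.sort }
b = sample_times(30) { (1..100_000).to_a.sort_by { rand } }
t = welch_t(a, b)
puts format("mean A = %.6fs, mean B = %.6fs, t = %.2f", mean(a), mean(b), t)
# A |t| well above ~2 hints the difference is not just noise; a full test
# would also compute the Welch degrees of freedom and look up a p-value.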

You can read the post at: Benchmarking, Lies, and Statistics.


bench.erl

I occasionally use something like this in Erlang: bench.erl. It prints the same simple statistics as the Unix ping program, and it can also generate a simple graph with gnuplot. Example:

11> F = fun() -> lists:sort(lists:seq(1, 100000)) end.
#Fun<erl_eval.20.54682479>
12> bench:dotimes(100, F).
total = 3336.292 ms, min/avg/max/mdev = 24.344/33.363/47.183/4.614 ms
ok
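For readers coming from the Ruby context of the original post, the same ping-style summary line could be sketched roughly like this (a hypothetical helper, with mdev computed as the population standard deviation, which is one common reading of ping's mdev figure):

require 'benchmark'

# Hypothetical Ruby analogue of the summary line printed above.
# times_ms is an array of run times in milliseconds.
def summarize(times_ms)
  n    = times_ms.size
  avg  = times_ms.sum / n.to_f
  mdev = Math.sqrt(times_ms.sum { |t| (t - avg)**2 } / n)
  format("total = %.3f ms, min/avg/max/mdev = %.3f/%.3f/%.3f/%.3f ms",
         times_ms.sum, times_ms.min, avg, times_ms.max, mdev)
end

times = Array.new(100) { Benchmark.realtime { (1..100_000).to_a.sort } * 1000 }
puts summarize(times)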

A glance at the CVS tree at work shows that colleagues have extended the hell out of this, but I haven't had time yet to see in what way.

BTW, looking for bench.erl was the first time I've tried Google Code Search before my own hard disk to find a program that I wrote. But it didn't find it, despite the code living on SourceForge!