Python Optimization Surprises

This weekend, I took another crack at trimming microseconds off the common-case path for generic function execution, and succeeded in dropping the execution time from 13.2 microseconds to just over 9.8. (That's about 9 microseconds of overhead relative to a hand-optimized Python version of the same trivial function.) Along the way, however, I realized a couple of surprising things about Python performance tuning.
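
The post doesn't reproduce its benchmark here, but per-call overhead like this is typically measured with the standard timeit module. Below is a minimal sketch; the make_generic wrapper and its registry are a hypothetical stand-in for the real generic-function machinery, not the post's actual code:

    import timeit

    def plain(x):
        # hand-written version of the trivial function
        return x * 2

    def make_generic(default):
        # Hypothetical stand-in for generic-function machinery (not the
        # post's actual code): one extra call layer plus a type-keyed
        # registry lookup on every invocation.
        registry = {}
        def dispatch(x):
            impl = registry.get(type(x), default)
            return impl(x)
        return dispatch

    generic = make_generic(plain)

    N = 100_000
    for name, fn in (("plain", plain), ("generic", generic)):
        per_call = min(timeit.repeat(lambda: fn(21), number=N)) / N
        print("%-8s %.2f us/call" % (name, per_call * 1e6))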

An amusing story that tells you something about Python's implementation.

The discussion of closures is of particular interest...
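
The closure discussion itself isn't quoted here, but a recurring surprise in this territory is how scope affects lookup speed: a name captured from an enclosing function is read through a closure cell, while a global or builtin costs a dictionary lookup on each call. A rough sketch (relative speeds vary by CPython version):

    import timeit

    def global_lookup(x):
        # len and str are fetched from the global/builtin namespaces
        # on every single call
        return len(str(x))

    def make_closure():
        _len, _str = len, str   # captured once, in closure cells
        def closure_lookup(x):
            return _len(_str(x))
        return closure_lookup

    closure_lookup = make_closure()

    N = 200_000
    for name, fn in (("globals", global_lookup), ("closure", closure_lookup)):
        t = min(timeit.repeat(lambda: fn(12345), number=N)) / N
        print("%-8s %.0f ns/call" % (name, t * 1e9))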


An object lesson

I think this is an interesting object lesson in the pitfalls of trying to optimize below the language level, as we were discussing in another thread.

9.8 microseconds?!

Perhaps I misunderstood what was being said in the blog, but a 9.8-microsecond call, with about 9 microseconds of that being overhead, seems _incredibly_ high to me. (Even 0.8 microseconds sounds high.) In a lot of the code I've been writing lately, I've been concerned about timing down to the nanosecond. (Not uncommon when working with very large datasets, I'd think.)

It also seems strange to me that 9 of the 9.8 microseconds are overhead attributable purely to the generic-function machinery. I guess I'm used to generic functions in statically-typed languages, where genericity is typically resolved at compile time and has little or no run-time cost.
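
For what it's worth, the dispatch really is per-call work in Python. This thread predates it, but functools.singledispatch (added in Python 3.4) illustrates the model: each call looks up an implementation keyed on the type of the first argument, where a statically-typed language would usually have resolved the call at compile time:

    from functools import singledispatch

    @singledispatch
    def describe(value):
        return "object: %r" % (value,)

    @describe.register(int)
    def _(value):
        return "int: %d" % value

    @describe.register(str)
    def _(value):
        return "str: %s" % value

    print(describe(42))    # type lookup at call time -> int handler
    print(describe("hi"))  # -> str handler
    print(describe(3.5))   # no match -> base implementation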

Sounds about right

Performance isn't something you should expect from interpreted, dynamically typed languages. Python is one of the slower ones, and a non-optimized function call, with the arguments packed into a freshly allocated tuple and every object reference counted, isn't going to be fast.
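
To put a rough number on that fixed cost (frame setup, argument packing, reference-count traffic), one can compare the same arithmetic inline and behind a function call. Exact figures depend heavily on the CPython version and machine, and newer interpreters avoid the argument tuple in many cases via vectorcall (PEP 590):

    import timeit

    def double(x):
        return x + x

    N = 1_000_000
    inline = min(timeit.repeat("x + x", setup="x = 1", number=N)) / N
    called = min(timeit.repeat("double(x)",
                               setup="from __main__ import double; x = 1",
                               number=N)) / N

    print("inline: %.0f ns" % (inline * 1e9))
    print("called: %.0f ns" % (called * 1e9))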

Now the other Python (the native-code compiler in CMUCL/SBCL) is an entirely different issue...