We're doing it wrong....

Poul-Henning Kamp makes an interesting case that CS education, in teaching the analysis of algorithm complexity, focuses on a model of computation which is simplified to the point of being stupid and wrong, and that this leads students to real-world failures in writing performant code.

By taking page swapping into account, he has achieved a speedup of one to two orders of magnitude in normal use cases over a "standard, provably optimal" algorithm used throughout computer science. He argues that the model of computer memory as having uniform access time is wrong and misleading, and has been for several decades, to the detriment of students and of standards of performance. He's on about virtual memory and disk swapping in particular, but his points are just as valid in a system with a high-speed memory cache fronting a slower main memory, which is essentially all of them. Here is a link to his article:

You're doing it wrong
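
To make the non-uniform-access point concrete, here is a rough sketch (mine, not code from the article) that counts how many distinct memory pages a single leaf-to-root walk touches in a textbook array-based binary heap. The page size, item size, heap size, and the function name pages_touched are all illustrative assumptions.

```python
def pages_touched(leaf_index, items_per_page):
    """Count distinct pages on the path from a heap node up to the root.

    Assumes the textbook layout: a 1-indexed array where the parent of
    node i is i // 2, and where items_per_page consecutive items share
    one virtual-memory page (both are illustrative assumptions).
    """
    pages = set()
    i = leaf_index
    while i >= 1:
        pages.add(i // items_per_page)  # page holding node i
        i //= 2                         # step up to the parent
    return len(pages)


# Hypothetical sizes: 4 KiB pages, 8-byte items, about a million items.
ITEMS_PER_PAGE = 4096 // 8
LAST_LEAF = 2**20 - 1

# For a 1-indexed heap, the path length equals the bit length of the index.
nodes_on_path = LAST_LEAF.bit_length()
print(nodes_on_path, "nodes visited,",
      pages_touched(LAST_LEAF, ITEMS_PER_PAGE), "distinct pages touched")
```

The uniform-access model charges the same for every node visit. Once some of those pages are swapped out (or simply out of cache), the count of distinct pages is what matters, and a page-aware layout tries to shrink that count without changing the number of comparisons.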

This is of interest to language developers because most people aren't actually writing those fundamental algorithms anymore. They're using the libraries that the language developers provide. So let's not provide stupid and wrong, eh?

On the other hand, I don't see how his described application (a server) requires the ordering that a tree structure provides. It isn't in the business of efficiently providing lots of dynamic ordered lists of the documents it serves; it's in the business of efficiently serving lots of documents. If I were attempting to optimize access to documents, I'd create a hash table of pointers to these documents rather than a tree of any description. A lookup would then never need to access more than one page to find out where the document is kept, or maybe two in the small fraction of cases where a hash bucket has overflowed and a rehash is needed.
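
As a minimal sketch of that hash-table idea: the names document_index, register, and locate, and the (offset, length) record, are hypothetical, and Python's built-in dict stands in for a purpose-built table, so the page-level behavior described above is not something this code demonstrates.

```python
# Hypothetical index: document name -> (offset, length) in a backing store.
document_index = {}


def register(name, offset, length):
    """Record where a document lives; no ordering among names is kept."""
    document_index[name] = (offset, length)


def locate(name):
    """Return (offset, length) for a document, or None if it is unknown.

    A dict lookup hashes the key and probes the table directly, so the
    expected cost does not grow with the number of documents.
    """
    return document_index.get(name)


register("/index.html", 0, 5120)
register("/about.html", 5120, 2048)
print(locate("/about.html"))
print(locate("/missing.html"))
```

What this gives up is exactly what the application may not need: there is no cheap way to ask for "the next document after this one" in any order.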

On the gripping hand, his described tree structure is perfect for collections that have to support a lot of insertion and deletion while remaining ordered, so I can see it being put to good use in implementing, say, database tables.
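
For contrast with a hash table, here is a small sketch of the operations an ordered collection has to support. It is not Kamp's structure: a sorted Python list managed with the standard bisect module stands in for the tree, so insertion and deletion here cost O(n) rather than O(log n), and the class name OrderedKeys is hypothetical.

```python
import bisect


class OrderedKeys:
    """Keys kept in sorted order, supporting insert, delete, and range scans."""

    def __init__(self):
        self._keys = []

    def insert(self, key):
        # insort finds the position by binary search, then shifts the tail.
        bisect.insort(self._keys, key)

    def delete(self, key):
        """Remove one occurrence of key; return True if it was present."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            del self._keys[i]
            return True
        return False

    def range(self, lo, hi):
        """Return all keys k with lo <= k < hi, in ascending order."""
        i = bisect.bisect_left(self._keys, lo)
        j = bisect.bisect_left(self._keys, hi)
        return self._keys[i:j]


table = OrderedKeys()
for key in (5, 1, 3, 9):
    table.insert(key)
print(table.range(2, 9))
```

The range scan is the operation a database table needs and a plain hash table cannot offer without sorting its keys first.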