Iterators Must Go

Andrei Alexandrescu: Iterators Must Go, BoostCon 2009 keynote.

Presents a simple yet far-reaching replacement for iterators, called ranges, and interesting "D" libraries built on it: std.algorithm and std.range.

Ranges pervade D: algorithms, lazy evaluation, random numbers, higher-order functions, foreach statement...

(Related: SERIES, enumerators, SRFI 1, and The Case For D by the same author)

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Back to cdr....

Of course, given that ranges have semantics similar to containers, this moves C++/D closer to the old Lisp way of traversing containers--an operation which gives you a unique element of a (non-empty) container, and an operation which gives you a container containing everything else.

Mutation, of course, is likely the fly in the ointment. One of the good things about C++ style iterators is that it gives you a Big Red Flag that you're likely working on shared data, and if you make changes through an iterator they affect the underlying container. In a side-effect-free world, it matters not, but with side effects, it does.

Which brings up an interesting question: Do ranges suffer from the same invalidation problems that C++ iterators face (increase the size of an STL vector beyond its capacity, and it gets realloced--with all existing iterators to the array silently becoming wild pointers)?

Invalidation...

... is a design choice for a particular implementation of ranges or iterators. C++ iterators could have been designed to inject 'observers' on a collection, for example, such that a resized vector will automatically move any existing iterators if necessary.

Ranges, and iterators, may both go either way on that issue.

I agree that mutation seems problematic for the given design.

Rubies choice is interesting...

Rubies ".each{|o| }" iterators are a major selling point of Ruby. But silently, under the hood for the Ruby container primitives, the design choice that "you can never invalidate an iterator" has been made.

Yes, it means Ruby will be fundamentally slower than languages and libraries that allow iterator invalidation, but hey, it means far fewer gnarly bugs.

Well. Honestly I think I

Well. Honestly I think I might prefer iterator invalidation; mutation during iteration is bad news. Bending over backwards to permit it seems to invite bad behavior from programmers.

Concretely, when Ruby's Hash class was changed in 1.9 and JRuby to do insertion-order iteration, many people complained very loudly -- because they had written loops which kept inserting elements into hash while iterating over it, which they had expected to eventually terminate.

Preference

Iterator invalidation has horrible concurrency semantics, forcing either coarse-grained locking (which invites deadlocks if the operations over the elements also need protection) or range copying (which often turns out to be unnecessary and therefore inefficient) in the presence of concurrency. Add to that your complaint, which regards additive updates in the presence of looping.

If it comes to preferences, what I'd prefer is that iteration occurs over a frozen snapshot of whatever values (or object references) are held within the collection, with the snapshot taken at the time the iterator or range was acquired. If the original collection is updated (or even deleted) concurrently or as part of the iteration process, it will not be seen by the iterator.

Implementation may use versioned pointers or copy-on-write under the hood to avoid copies in the case the collection isn't updated during iteration. (A statically compiled language could choose implementation strategy based on predicted usage patterns.)

And, of course, a viable strategy is to have immutable collections stored to variables. Relational databases manage so quite well, though this has a broader impact on the language used to 'manipulate' a collection (trading it for language to create new collections by composing existing collections).

or back to APL

I think the problem of side effects is one of the remaining advantages of doing things the APL way and describing algorithms in terms of bulk operator primitives. Index sets are still valid even if the vector's location in memory changes. Of course, some operations that truly are serially dependent on single values (ackerman) are awkward when described with bulk operators.

I really wonder if Alexandrescue is familiar with Iverson and his goals. Certainly comparing STL to APL is somewhat apples to oranges, but one can't help but think they're working on overlapping problems when it comes to finding concise language for describing algorithms.

Invalidation

Which brings up an interesting question: Do ranges suffer from the same invalidation problems that C++ iterators face (increase the size of an STL vector beyond its capacity, and it gets realloced--with all existing iterators to the array silently becoming wild pointers)?

Most of the same rules would apply for the std containers. A range referring to a std::vector will be invalidated by capacity-increasing operations on the vector. A range referring to a std::list is more resilient to invalidation (only erasing an element at the beginning/end of the range will invalidate it; inserting elements doesn't invalidate anything.)

Those invalidation problems aren't inherent to the range concept, but I think they would be present for the existing std containers if they were converted to work with ranges. It's entirely possible to design a vector-like container which has a range specified in terms of begin/end offsets and a reference to the container, rather than pointers to the elements themselves. That would not be invalidated by re-allocations, and could check the offsets to detect when the vector has shrunk. That probably wouldn't be chosen as the usual representation for a std::vector's range, due to (you guessed it) performance reasons :-)

_

_

an article form

Yes, slides suck for presenting interesting ideas.

There's a clear and compelling article form of Alexandrescu's ideas in On Iteration. I'd like to see him expand this out to book-length, like Stepanov's recent book.