Question about Lazy-ness and algorithmic runtime analysis

Consider the following Haskell-ish functions: (I'm not actually gonna type them into hugs or anything so forgive any syntactical errors)

naiveMin [x]    = x
naiveMin (x:xs) = if x < y then x else y where y = naiveMin xs

qsort []     = []
qsort (x:xs) = qsort [ a | a <- xs, a < x ] ++ [x] ++ qsort [ b | b <- xs, b >= x ]

lazyMin xs = head (take 1 (qsort xs))

Now what I'm asking is, if we have lazy evaluation semantics, in the lazyMin function, when we call qsort, will the only part that gets evaluated be

qsort [ a | a <- xs, a < x]

or possibly at most:

qsort [ a | a <- xs, a < x] ++ [x]

since we are only taking 1 element from the list. Does this mean its runtime is O(n), like naiveMin? Are we using lazy evaluation to completely change how much code gets run from outside the function? I'm guessing the answer is probably yes, which also kind of turned on the light-bulb as far as the usefulness of lazy evaluation goes. (Oh yeah, you can have an infinite list, but um, so what?) I just want to make sure I'm not missing anything here.


friday

Does this mean its runtime is O(n), like naiveMin?
No, since qsort [ a | a <- xs, a < x ] needs to be evaluated to take the first element, and that evaluation is recursive. If it turns out that x is indeed the smallest element, you'll be done immediately (unlike a strict qsort). However, on average you'll need to sort a list half the size of the original (and similarly for the worst case), which keeps the big-O behavior the same as that of the strict qsort.
Are we using lazy evaluation to completely change how much code gets run from outside the function?
I'm not an expert, but the paper mentioned in this previous discussion suggests it is possible for lazy evaluation to beat strict evaluation in big-O terms.
Oh yeah you can have an infinite list, but um, so what?
I can't say that impresses people often, but have you ever wondered why logical operators in C/C++ are short-circuited? (i.e. if (someVar && func()) { ... } won't call func() if someVar turns out to be false). So the philosophy is: a) Why should something be evaluated if it may not be needed? b) Why can't I have infinite lists?
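
To make point (b) concrete, here is the usual tiny sketch (my example, not from the original post): an infinite list of Fibonacci numbers that only ever gets built as far as a consumer demands.

fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

-- take 10 fibs == [0,1,1,2,3,5,8,13,21,34]; laziness builds only the
-- prefix that take actually demands.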

Aha

Aha, I see, I think. If I have the list [4,7,5,1,2], we get:
[1,2]

after the first call.

It can then go on to the second call:

[] ++ [1] ++ ....

which gives us:

[1] ++ ....

Which touches 8 elements (as opposed to 5 with the regular min), and looks like the regular qsort, but for just half the list, as you said. What would work would be selection sort, but of course that's just n executions of naiveMin anyway.
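
For what it's worth, here is that unfolding spelled out as a sketch (example is a made-up name, and it assumes the qsort definition from the question):

-- Unfolding head (qsort [4,7,5,1,2]) with the qsort defined above:
--   qsort [4,7,5,1,2] = qsort [1,2] ++ [4] ++ qsort [7,5]
--   qsort [1,2]       = qsort []    ++ [1] ++ qsort [2]
-- head only needs the first cons cell, so qsort [7,5] and qsort [2]
-- are never forced:
example :: Int
example = head (qsort [4,7,5,1,2])   -- == 1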

Darn. So much for my deep insight into laziness. I haven't read the paper yet, as I don't have the proper software on this machine, but I'll be sure to.

Why can't I have infinite lists?

Actually, this is a very good question, and one that I've managed to get to know a bit about, so I'll do my usual thing of trying to explain it in the hope of understanding it better myself.

Infinite data structures and partial, in particular non-terminating, functions are related. If you have potential non-termination, it is equivalent to allowing _|_ (bottom) to inhabit every type. Under the Curry-Howard isomorphism between types and propositions, the equivalent logical system no longer models the same propositional calculus that we are used to. You can still force it to, but it turns out to involve significantly more work. In McBride's paper, Why dependent types matter (was on LtU before, can't remember where...), there is an outline of the case for preferring total functions over partial ones.

That, however, is a type theorist's view. The engineering compromise that exists in Haskell is that infinite data types are just too cool to give up. By working in a different category, the so-called CPO, instead of Set, you can make all this clean, after a fashion, and have infinite data types (codata) defined and worked on in the same way as normal data; the same foldr will work just as well over a finite list or an infinite one. So the reason that you "can't have infinite lists" in some languages comes down to an engineering compromise those languages have made, at the most fundamental level, about their inherent type systems and proof properties.
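
As a small sketch of that last point (my example; anyOver is a made-up name that just mirrors the Prelude's any): foldr works on an infinite list as long as the combining function is lazy in its second argument.

anyOver :: (a -> Bool) -> [a] -> Bool
anyOver p = foldr (\x rest -> p x || rest) False

-- anyOver even [1,3,5,6]  == True   (finite list)
-- anyOver (> 1000) [1..]  == True   (infinite list; stops at 1001)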

For some reason this sort of thing fascinates me; this is probably bad. I nearly missed a physics supervision yesterday due to forgetting the time whilst reading Larry Paulson's work :S

Linear-time minimum via quicksort

Unless I'm missing something, head . qsort does yield the minimum in linear time on average. Denote by T(n) the time head . qsort takes on average on a list of length n. Then

  T(1)  = 1
  T(2n) = 2n + T(n)

The solution to these equations is T(n)=2n-1, for n a power of 2. In fact, it's the same recurrence as Hoare's Find.
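
As a quick sanity check of that closed form (my own throwaway code, with made-up names), you can evaluate the recurrence directly and compare it with 2n-1 at powers of two:

t :: Integer -> Integer
t 1 = 1
t n = n + t (n `div` 2)   -- T(2n) = 2n + T(n), i.e. T(n) = n + T(n/2)

check :: Bool
check = and [ t (2^k) == 2 * 2^k - 1 | k <- [0..20] ]   -- == True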

apples to oranges

To me, a comparison of (take n . qsort) to C++'s std::partial_sort is a much better way to see what lazy evaluation gives you. The former is the composition of two nicely separated, re-usable concerns. The latter is a specialized algorithm to avoid unnecessary computation, which had to be designed, coded, and maintained separately from C++'s std::sort.

qsort will never behave asymptotically like naiveMin... naiveMin is solving a much simpler problem. And I'll point out that your naiveMin shows that you can code a faster algorithm for a simpler problem without moving to a strict language. So again, to me, a comparison of qsort and naiveMin really doesn't have much to do with the merit of lazy evaluation.
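
For concreteness, a minimal sketch of that composition (smallestN is a made-up name, and it assumes the qsort from the question):

smallestN :: Ord a => Int -> [a] -> [a]
smallestN n = take n . qsort

-- smallestN 3 [9,1,8,2,7,3] == [1,2,3]; laziness means qsort only does
-- enough work to deliver the first n elements in order.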

sort vs. partial_sort

To me, a comparison of (take n . qsort) to C++'s std::partial_sort is a much better way to see what lazy evaluation gives you. The former is the composition of two nicely separated, re-usable concerns. The latter is a specialized algorithm to avoid unnecessary computation, which had to be designed, coded, and maintained separately from C++'s std::sort.

Now that makes some serious sense. Thanks!

no problem

no problem

Forget Infinite /Lists/

Infinite lists are certainly useful. Consider the various kinds of iterators or generators that produce sequences. Further, not having to require an explicit upper bound often allows more modular programming while increasing generality, simplicity and ease of use. This is covered in the classic Why Functional Programming Matters. The stream style of programming described in, say, SICP can also be extremely natural.

But talking only about infinite lists is nonsensical. Practically any recursive data structure can be infinite. Again from Why Functional Programming Matters, one of its main examples is a game tree as a lazily evaluated infinite tree. Further, we have circular data structures and programs, e.g. the technique of tying the knot. Near the bottom of that page I have a response to a statement of Oleg's in which I implement a one-pass parser that supports forward- and back-links for an extremely simple language. (There are also a variety of other interesting things on that page and the linked pages.)

The thing to note (and this generalizes to much more complicated languages) is that the code for the actual parsing is no different from the code that would support only back-links. Essentially, to write a parser (or a program with similar issues, e.g. a linker) that supports forward-links, all we need to do is write the straightforward back-link-only parser that returns the environment it builds up in addition to the AST, and then simply pass that environment back in as an argument; this produces code like let (ast,env) = parse tokens env, which only makes sense in a lazy language. In a strict language there are essentially two options: use a two-pass parser, or mimic the lazy code by "marking" forward-links during the pass, returning a "fix-up table", and then mutating the marked forward-links using the fix-up table. Of course, only the former is available in a strict pure language (though the impurity is local, so it could be handled a la the ST monad).
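
Here is a minimal sketch of that let (ast,env) = parse tokens env shape (my own toy, with made-up names like Stmt and assemble, not the parser from the linked page): a one-pass "assembler" whose jumps may refer to labels that are only defined later.

import qualified Data.Map as Map

data Stmt = Label String | Jump String

-- One pass over the input: each Jump is resolved against the *final*
-- environment, which is still being built as we go; laziness ties the knot.
assemble :: [Stmt] -> [Int]
assemble stmts = code
  where
    (code, env) = go 0 stmts
    go _   []               = ([], Map.empty)
    go pos (Label l : rest) = let (c, e) = go pos rest
                              in  (c, Map.insert l pos e)
    go pos (Jump  l : rest) = let (c, e) = go (pos + 1) rest
                              in  ((env Map.! l) : c, e)

-- assemble [Jump "end", Label "start", Jump "start", Label "end"] == [2,1]
-- In a strict language the mutual reference between code and env in the
-- where-clause would simply not terminate (or be rejected).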