"Practical" advantages of lazy evaluation

Hi all,
I am new to the functional programming paradigm. One thing that I have doubts about is lazy evaluation: I don't know how useful it is in a practical sense of programming. Is it useful because it allows us to be less careful when writing the program (by not giving syntax errors) and then to crash suddenly during the actual execution of the function? I am not sure. Can you please explain a little on this topic?

regards
chinmay


optimization: unused arguments are not evaluated

Say you have a function which doesn't use some of its arguments, because of a certain condition. Without lazy evaluation these arguments are always evaluated, which might be an expensive operation.
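
A minimal Haskell sketch of the idea (the names are invented for illustration):

-- 'expensive' is only inspected on the False path, so under lazy
-- evaluation the costly sum below is never computed.
cheapOrExpensive :: Bool -> Int -> Int
cheapOrExpensive flag expensive
  | flag      = 0
  | otherwise = expensive

main :: IO ()
main = print (cheapOrExpensive True (sum [1 .. 100000000]))   -- prints 0 immediately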

That's the theory

That's the theory, but it would be good to see non-contrived examples where this is a significant win, examples that balance out the overhead of lazy evaluation.

(Sorry if I sound picky here. I've just been noticing that the performance benefits of lazy evaluation are often cited, but rarely supported with samples from real code projects.)

What, inefficiently

What, inefficiently generating an infinite list of primes is too contrived? ;)

Seriously, I think the supposed performance benefits of laziness are a red herring. In a bit of recent real-world coding (generating DNA datasets), I found that once I adjusted to thinking in terms of streams, laziness made some styles of programming elegant and concise where they would be awkward in an eager language. I guess this succinctness could encourage algorithmic improvements sometimes, but with either strict or lazy evaluation, you have to be doing something weird to end up doing anything but the work you wanted.

Lazy quicksort example: using laziness to invent new algorithms

laziness made some styles of programming elegant and concise

Here's a neat example of that in Oz. Take the standard declarative quicksort algorithm which sorts a list of length n in O(n log n) average time. Now annotate the quicksort as lazy without changing anything else. Presto - this gives an efficient algorithm for finding the k smallest elements out of n, in time O(n + k log n). What's more, k does not have to be known in advance. The algorithm is given here.
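
For readers more comfortable with Haskell than Oz, here is a rough analogue (a sketch, not the Oz code the post refers to): because evaluation is lazy, take k (qsort xs) only performs the work needed to deliver the k smallest elements.

qsort :: Ord a => [a] -> [a]
qsort []     = []
qsort (p:xs) = qsort [x | x <- xs, x < p] ++ (p : qsort [x | x <- xs, x >= p])

kSmallest :: Ord a => Int -> [a] -> [a]
kSmallest k = take k . qsort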

Packrat parsing is another example...

of laziness making things asymptotically faster.

Not entirely

Laziness doesn't make packrat parsing asymptotically faster, it just makes implementing a packrat parser easier. You could say, though, that the not-implicitly-lazy-language implementations end up implementing laziness anyway.

Generating queries and

Generating queries and projections from large graphs or similar structures, when the user's only looking at a small chunk of the data at a time.

The greatest example of a lazy data structure

I think the web is one of the most ignored examples of laziness - potentially infinite data structures, optimization, what not. Imagine if, when you visited a page on a server, it had to generate all the pages that you could possibly view. Even if you just consider a server generating all the pages for the links on the page you navigated to, laziness is what really pays off here - in terms of network performance, computations saved, ... and what the hell, it simply makes hyperlinking feasible in the first place.

Several years back, I wrote a small screen browser for large sized images such as aircraft maintenance diagrams. This implementation matched the laziness inherent in web interactions with corresponding lazy code on the server side and was rather elegant and robust.

The project I mentioned was rather small scale, but I can't help wonder if someone has a framework in a language like haskell that simply uses laziness to present a network of pages and interactions. Shriram Krishnamurthi's continuation-based library probably comes closest to what I mean.

Shriram Krishnamurthi's

The project I mentioned was rather small scale, but I can't help wonder if someone has a framework in a language like haskell that simply uses laziness to present a network of pages and interactions. Shriram Krishnamurthi's continuation-based library probably comes closest to what I mean.

The web-calculus is exactly what you mean. :-)

What type of laziness?

There are two sorts of laziness out there. The first is optional laziness, such as is found in Ocaml. In this case, most of the program is evaluated eagerly, but some computations can be marked as lazy (in Ocaml, the 'lazy' keyword introduces a lazy evaluation, that can be forced with a call to Lazy.force).

This level of laziness is useful in developing purely functional data structures. I did a short introduction here on the usefulness of lazy lists, even in an eager language like Ocaml.

The other "style" of laziness is pervasive laziness, like Haskell. This does bring some advantages- for example, laziness has kept Haskell pure, and made it easier for Haskell to parallelize (unlike, for example, Ocaml). Also, it opens up possibilities for high level optimizations which are either incredibly difficult to do or generally not worth it in a language like Ocaml. For example, it is always true in Haskell that map f (map g lst) is the exact same as map (f . g) lst . This is true for all functions f and g, and all lists, in Haskell- it is not true for all functions and all lists in Ocaml (hint: what happens if f or g has side effects?). And thus Haskell doesn't have to perform tricky and complicated tests to see if the transformation is safe. In addition to eliminating the intermediate data structure, this transformation has also opened up new optimization possibilities (the function (f . g) can be specialized and inlined in many cases).

There are downsides to pure laziness as well. For example, based on results from the Great Computer Language Shootout (where Haskell is performance competitive with Java and C#), laziness costs about as much as executing in a virtual machine. Plus, you have the problem of space leaks.

Purity

All of the advantages you list have to do with purity and not at all with laziness. As discussed in the Haskell retrospective, perhaps the primary benefit of laziness, as you said, is to have kept Haskell pure.

The big issue in practice, so far as I've seen, is not performance exactly*, but simply that the performance characteristics are "radically different". Code must simply be written differently in a lazy-by-default language as opposed to an eager-by-default language; it is not something that you can think about only when you want to. StackOverflow on the (old) Haskell wiki describes one instance (though it isn't explicitly addressing this point).

* I doubt laziness is the "issue" for GHC's performance in the shootout. And space leaks are related to the fact that you must program differently in a lazy language, but I will admit that understanding the space/retention behaviour of a lazy program is objectively harder in general.

I'm not sure

Enforced purity is an advantage, I'll agree. But I'm not sure that deforestation is always correct in an eager language.

Actually, here's an example. Say we have an original list such that g throws exception G on the third element, and f throws exception F on the first element of map g lst. In an eager language, then, the code map f (map g lst) throws exception G - as the first thing it does is evaluate map g lst to completion, at which point g will throw exception G on the third element, and f will never be applied to a single element. On the other hand, the code map (f . g) lst throws exception F - as in an eager language it will evaluate f . g on the first element of the list, at which point f will throw exception F, and g will never be evaluated on the third element.

Note that in a lazily evaluated language, both code samples will always throw F, whether the optimization is applied or not.

In an eager language, it might be possible to either type-check or inspect both f and g and ensure that at least one doesn't throw an exception, and if this is the case, it's possible to apply the optimization anyway. But that's a lot more difficult than the situation Haskell is in, where it can just apply the optimization without having to check anything.

Exceptions are impure

...at least by common definition. And the example you show is one of the reasons.

Purity again

Actually, oftentimes fusion rules have strictness conditions that would not be a problem in a strict language. A recent example is the stream fusion rule eliminating fromStream . toStream.

As for your example, it is, again, a purity issue. Exceptions are effects. Nor is it guaranteed which would be thrown in a non-strict (as opposed to lazy) language, which Haskell is. This is why you can't catch (these) exceptions in Haskell 98 and the semantics for exceptions for GHC is "imprecise".

Finally, laziness does not technically enforce purity, but have fun understanding your code in an impure lazy language.

[Edit: Actually, the problem is the mix of laziness and strictness.]

Deforesting strict languages

But I'm not sure that deforrestation is always correct in an eager language.

If you are talking about full-blown deforestation by Wadler et al., it might introduce termination. Given

f x = y

and applying Wadler's deforestation to "f bottom", it will yield y, which terminates. This can be avoided by changing the deforestation algorithm, but I haven't found anything published on it.

Thank-you so much

Thank-you so much for writing that blog post discussing the usefulness of lazy lists. I think that I actually understand now. Of course, I had heard the phrase "separating generation from selection" before, but that didn't really mean anything to me. If anyone reading this is even slightly fuzzy on why lazy lists are so great, read that post!

PS: First post. I've lurked here for a while, but this is the first thing that's made me feel the overwhelming need to write a comment. Thanks again, bhurt-aw!

One advantage not mentioned

One advantage not mentioned by anyone (including me) so far is that laziness (with purity of course) gives you an unrestricted beta rule.
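
A small illustration (a sketch with invented names) of what the unrestricted beta rule buys you: substituting an argument into a function body is always meaning-preserving, even when the argument is erroneous or divergent.

lhs, rhs :: Bool -> Int
lhs b = (\x -> if b then 0 else x) (error "boom")   -- beta-reduces to...
rhs b = if b then 0 else error "boom"               -- ...this, with no change in meaning
-- lhs True and rhs True are both 0. Under call-by-value the two forms would
-- differ (the first would fail before reaching the conditional), so the
-- rewrite would not be unconditionally valid there.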

Laziness in C

If you already know a bit of C or Java, just look at the evaluation of conditions like this:


if(A && (B || C))
{
// do something
}

It is an example of lazy evaluation, since not every subterm of the condition has to be evaluated before the whole term is decided and the if-block finally gets evaluated. The conditional expression is evaluated left to right, and evaluation stops as soon as the overall result is known. This can be quite crucial when you program defensively:

if( A && A[0] == 1)
{
// do something
}

So laziness is a fact of life even in impure, imperative languages.

Advantages

The benefit of lazy evaluation is that it allows the language runtime to discard subexpressions that don't determine the end result of your program. Often, when you have lots of temporary computations and conditionals, some work can be discarded this way. As a result, lazy evaluation can sometimes reduce the time complexity of an algorithm.

The drawback to lazy evaluation is that it forces the runtime to suspend the evaluation of subexpressions until it is certain that they determine the end result of your program, in order to guarantee that an unevaluated subexpression that diverges doesn't cause the program itself to diverge. This must be done by creating thunks containing the suspended state of a subexpression. In a garbage-collected language, this can sometimes increase the space complexity of an algorithm.

Lazy evaluation also increases expressiveness, as it enables you to access components of data structures out-of-order as you're initializing them, as long as you don't create any circular dependencies.
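
A minimal Haskell sketch of that (names invented for illustration): bindings may be written in any order, because evaluation is driven by demand rather than by textual position.

report :: String
report =
  let total  = sum counts            -- uses 'counts' before it is written down
      counts = [1, 2, 3] :: [Int]
  in "total = " ++ show total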

There are intermediate schemes such as lenient evaluation that bring some of the benefits of lazy evaluation without some of the drawbacks.

Erroneous values

Also, just to add on to this: one thing I enjoy about Haskell is that you can not only discard temporary computations, but you can also pass around erroneous values (such as _|_) in data structures lazily. These too will be discarded without escalating into exceptions (assuming, of course, that your code is correct and doesn't actually evaluate them!)

This tends to remove many checks and hacks related to edge conditions and border cases in algorithms. Say you have a calculation that's meaningless on the leaf nodes of a tree (it ends up attempting a division by zero on those nodes), and you have another algorithm that ignores the leaf nodes anyway. In a lazy language you don't have to worry about checking for this case, inventing some spontaneous data structure to hold a bogus value, or catching any kind of arithmetic exception (since the actual division by zero never occurs).
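
A hedged sketch of that situation in Haskell (the tree type and both functions are made up for illustration):

data Tree = Leaf Int | Node Int Tree Tree

-- 'label' divides by zero on leaves, but 'innerSum' never looks at leaf
-- values, so the bogus thunks are simply discarded by the garbage collector.
label :: Tree -> Tree
label (Leaf n)     = Leaf (n `div` 0)
label (Node n l r) = Node (n * 2) (label l) (label r)

innerSum :: Tree -> Int
innerSum (Leaf _)     = 0
innerSum (Node n l r) = n + innerSum l + innerSum r

-- innerSum (label someTree) runs fine; the division by zero is never forced.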

But is that what you are

But is what you are talking about really a property of lazy evaluation? I remember the time when I used a CAS intensively, and thunked terms were pervasive in symbolic manipulations. An unbound name never led to a failure, and expressions, not only values, were often the end products of computations. However, there was only very rudimentary lazy evaluation (as in my C example) available.

Infinite data structures

There are lots of cool things about being able to work with infinite data structures, which is something lazy evaluation gives you.

For a mathematician who's used to thinking of the natural numbers as a single object ℕ, being able to say "[0..]" in Haskell is priceless. Writing the Sieve of Eratosthenes to get the prime numbers is pretty much just writing down its description.

primes :: [Int] -> [Int]
primes (n:ns) = n : primes (filter (\v -> v `mod` n /= 0) ns)
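-- usage (assumed from the text above): primes [2..] yields 2, 3, 5, 7, ...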

In an imperative language, you have to think hard for a while to figure out how to implement it. You need to think about when the subparts are going to be calculated and how to allocate space for them.

The cleanest implementations usually fake it with iterators, but even so, the added layer of notation and implementation makes it harder to think about. (Exercise for the reader: implement "iterator<int> primes();" in C++.)

Automatic backtracking is another nice feature that naturally falls out of lazy evaluation.

What about space?

I'm mostly clueless about Haskell, but I think I've heard that while some things are easy to write down, they can also lead to - possibly surprising to the lay user - large space requirements. How many thunks get generated, etc., for various things which are easily transliterated into Haskell?

So presumably as with all things a) there are trade-offs to understand and b) the end purpose matters; if you want to kick some ideas around and don't have particular performance bounds in mind vs. writing the next game server / stock trader.

If you haven't read it,

If you haven't read it, check out Why Functional Programming Matters.

Why Functional Programming Matters

Let me reemphasize this. This is one of the best motivating papers for lazy evaluation and higher-order functions even 23 years after its creation. The most significant feature missing from its treatment of laziness is circular programming, a.k.a. tying the knot.

It seems that no one

It seems that no one responded to the mention of syntax errors by the original poster. It may be worth noting that lazy evaluation, in the sense discussed here, has nothing to do with not catching syntax errors or type errors at compile time. Laziness refers to the timing of the evaluation of arguments passed to functions.

Suppose you want to perform a number of...

...steps on a long list of data in a strict language. You could try to write modular code that looks something like this:

list' = step1(list)
list'' = step2(list')
list''' = step3(list'')

But this entails having the entire list sitting in memory at one time so it can be passed in and out of the steps. In order to avoid this overhead you could write your code as

for each element in list:
element' = op1(element)
element'' = op2(element')
element''' = op3(element'')
append element''' to result

But the individual steps are no longer encapsulated. We've had to crack each one open so that the individual operations can be interleaved.

However, if we are working with lazy lists, we can write code that looks like the former, but behaves like the latter, because each element in the list is only created and processed as it is needed. We no longer have to write interleaved code.

This gives a kind of orthogonality between modularity of code and modularity of execution. That the code is expressed in terms of acting on entire lists doesn't entail that it actually processes entire lists. This decoupling of modularities (if I can call it that) can be very useful in writing efficient and yet readable and maintainable code.
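
In Haskell, for example, the pipeline below is written exactly like the first version, yet because lists are lazy each element flows through all three steps before the next one is produced (step1..step3 are placeholder definitions):

step1, step2, step3 :: [Int] -> [Int]
step1 = map (+ 1)
step2 = filter even
step3 = map (* 10)

process :: [Int] -> [Int]
process = step3 . step2 . step1   -- written on whole lists, evaluated element by element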

Two Kinds Of Laziness

It strikes me that the overwhelming majority of the benefits of Lazy Evaluation actually involve large (potentially infinite) data structures. So by making data constructors non-strict (e.g. the list constructor ':') you automatically gain the majority of the benefits (which are of course extremely worthwhile).

The downside to Laziness is large numbers of thunks being created and then immediately evaluated because more often than not function application needs to be Eager.

It is simple enough to force function application to be Eager, but in our new massively multi-core world we are more likely to have spare CPU cores available than not, so I can't help but think that the default for function application should be Eager Evaluation rather than Lazy, along with the option of making it Lazy should it be useful.

Having said that, it is quite likely that I am missing something important, in which case I would love to be enlightened.

(N.B. I know the Haskell standard actually states "non-strict", but I believe every major implementation is Lazy and it is Laziness being discussed).

We should all know by now

We should all know by now that CONS should not Evaluate its Arguments...

Laziness is avoiding work

We recently had a situation at "work" where laziness would have been useful. We were generating lists of email addresses to add as mailto: links on a student information system. Querying the database and generating the text took quite some time to do, and 99% of the time the link is not followed. The obvious optimisation was to only generate the mailing list when requested -- lazy computation. This isn't the first time we've made such an optimisation. No doubt if our language was lazy by default I'd be reporting the reverse!

Parallel Call By Need

programmable control structures for domain specific languages

One of the biggest practical benefits I didn't find mentioned above for lazy programming languages is that you can define your own control structures without extending the language. "if then else" or "while" isn't usually definable directly in a strict language (without abusing some form of quotation operator). In a language like Haskell you can build full fledged (optionally imperative) sublanguages with your own control structures.
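
For instance, in Haskell a conditional is just an ordinary function, because an argument is only evaluated if the function actually demands it (a minimal sketch):

-- A user-defined conditional: only the selected branch is ever evaluated.
myIf :: Bool -> a -> a -> a
myIf True  t _ = t
myIf False _ e = e

safeDiv :: Int -> Int -> Int
safeDiv x y = myIf (y == 0) 0 (x `div` y)   -- the division is never evaluated when y == 0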

Smalltalk

, of course, implements *all* its control structures by abuse of its quotation operator (albeit the block construct is a form of quotation which is well-suited to this purpose). Or some approximation thereof. :)

Why?

Why does lazy evaluation make this possible?

Using strict evaluation only

it's impossible to write a function which has the same semantics as, say, the ternary operator in C/C++

a ? b : c

which evaluates a, then evaluates b if a converts to true (or a nonzero value in older dialects of C which lack a boolean type), OR evaluates c if a evaluates to false/zero. Note that exactly ONE of b or c is evaluated.

You cannot write a function in C/C++ with the same behavior. You can try (in C++-style pseudocode, glossing over a few dirty C++ details such as pass-by-const-ref or pass-by-value):

template <class T>
T choose (bool a, T b, T c);

However, when you try to call such a beast, ALL of a, b, and c are evaluated--and in an order of the compiler's choosing. If either b or c causes the program to terminate or to loop endlessly, is expensive, or has a side effect--this is a Bad Thing. Note that this prevents the use of conditionals to enforce preconditions; the following code

const char *str;
// do something to str;
choose (str != NULL, process_str(str), process_str(""));

will likely crash if the process_str function assumes its argument is non-NULL. On the other hand, this is perfectly safe:

str ? process_str(str) : process_str("");

as the "true" path is only executed if the precondition is satisfied.

Note that C/C++ macros use call-by-name, which can be used to implement call-by-need; so conditionals can be wrapped by macros. But C/C++ macros are dirty nasty things for myriad other reasons. :)

In Smalltalk, which has no lazy operators, one uses quoting to simulate this. Quoting in Smalltalk is done by the block constructor, which creates a lightweight, statically-scoped closure-like-thing that defers evaluation of its body until sent the "eval" message. Thus in Smalltalk one can write

(str = nil) ifTrue: [ ^'' process ] ifFalse: [ ^str process ].

Apologies if I scotched the Smalltalk syntax. For those unfamiliar with Smalltalk, "ifTrue: ifFalse:" is a 2-argument method supported by many objects, including the Boolean objects true and false. True's implementation of this method sends the eval message to its first argument, causing the quoted code which makes up the body of that block to be evaluated; likewise, the implementation provided by false does the same with the second argument. Note that ifTrue: ifFalse:, like all methods in Smalltalk, is strict--both arguments are passed in; but only one gets sent the eval message.

One can do similar things in C/C++ code with function pointers or "functor" objects, but since neither language supports lexical closures you can't really use this technique to implement general-purpose control structures. In particular, the body of any C/C++ "blocks" cannot reference any variables in the enclosing function's activation record. Java's inner classes come closer (being partially lexically scoped), but not close enough--JDK1.7 reportedly will introduce a construct similar to Smalltalk blocks.

(Edited to fix markup)

inlined block execution

Scott Johnson: For those unfamiliar with Smalltalk, "ifTrue: ifFalse:" is a 2-argument method supported by many objects, including the Boolean objects true and false.

Smalltalk compilers typically optimize such messages to inline them. (It's how I did mine back when, and the only way I've seen it done in other compilers.) Basically, a compiler can statically ask about every keyword message: is this a keyword message that looks like those I inline? (E.g., is this message "ifTrue:ifFalse:" and are the two arguments blocks that take zero parameters?)

On a match against this, a compiler can assert the receiver is boolean, or fall back on general dispatch when it's not. When the receiver is boolean, the block code is inlined just like the if/else clauses in a C compiler. So representation as a zero-arg lambda, or some other kind of continuation, isn't necessary.

But the class hierarchy can provide actual implementations in the Boolean class, as well as in subclasses True and False, and this can mislead one into thinking the class code is actually reached at runtime, instead of inlined.

Otherwise I vouch for your description. Blocks represent delayed evaluation, which is why they're arguments to control structures, since otherwise every branch would be evaluated as parameters before dispatch. (I think it's common for folks learning Smalltalk at first to ask "why are these blocks?" because the role of delayed evaluation in control structures is not apparent.)

Thanks for the clarification

I was describing semantics, and didn't include any notes on implementation.

It's probably worth noting, for those unfamiliar with ST, that similar tricks, coupled with recursion, are used in Smalltalk to implement looping constructs. (Smalltalk has no built-in looping construct either.) And any reasonable Smalltalk implementation will optimize the static cases, producing code which looks similar to a looping construct as found in other imperative languages.

whileTrue and intervals

I meant no slight — I'd assumed you already knew what I'm saying. I was aiming at other folks passing through, who might otherwise suppose ST did control structures inefficiently by dispatch.

(Smalltalk has no looping construct either).

Once again, this is just for other folks. Smalltalk has whileTrue:, which is basically like a while statement in C, and since most looping constructs in C map onto while, I don't suppose much will seem missing.

A lot of folks using C and C++ write loops looking like this:

for (int i = 0; i < n; ++i) { /*something*/; }

In Smalltalk this can be written using an interval like this:

(0 to: (n - 1)) do: [:i | "something". ]

because the to: message to an integer creates an interval. [cf]

Pervasive Laziness vs. Optional Laziness

I've been thinking a lot lately about laziness in different languages, and I have a question that I hope an experienced functional programmer might be able to help me with.

As I understand it, in Haskell, lazy evaluation is the default behavior. There are several other languages that have some form of laziness available, such as generators in Python or delay/force in Scheme. As described by the previous posts in this thread, laziness can be extremely helpful at times. However, I don't fully understand why Haskell's pervasive laziness is better than optional laziness.

One person mentioned that pervasive laziness could make parallelization simpler. Also, since Haskell's standard library is mostly lazy, it is more likely that a standard library function can be used as part of a function that you want to be lazy. (With Python, you might have to re-implement some standard library procedure that wasn't originally written as a generator.)

Are there any other advantages to Haskell's pervasive laziness?

As I understand it,

As I understand it, pervasive laziness forces you to make the language purely functional and outlaw unrestricted side effects.

Based on your comment

Based on your comment, I re-read SPJ's Haskell retrospective slides, and one of the slides describes stopping side-effects as the big reason for pervasive laziness. I think I understand now how laziness forces a language to hold fast to the ideal of purity.

Now if only I could figure out the big reasons why purity is so important... :)

Now if only I could figure

Now if only I could figure out the big reasons why purity is so important... :)

Which is exactly why I wrote: On the Importance of Purity. :-)

Purity is important...

...because it enables (or at least makes tractable) lazy evaluation! :)

The great thing about tautologies is they are always true.

Actually, purity is important because it permits referential transparency, which is useful even in a strict language. Referential transparency permits all sorts of analyses which cannot be done if it no longer holds--and any term which is dependent on a mutable value, or which causes mutation--isn't referentially transparent.

Note that for a term to be RT, the transitive closure of its dependencies must be RT. One act of mutation, or dependency on a mutable or volatile thing, and RT is broken.

However, non-RT expressions can include RT components. Sometimes you *want* side effects in real-world programming. Even then, it's nice to segregate the world, so to speak, into those parts which are RT (keeping in mind the transitive closure rule above) and those which aren't. Things like monads are a way of cleanly doing this--of encapsulating the stateful part of your program in such a way that most of it is referentially transparent. Think of it as a clever bit of refactoring for purity's sake.

Don't read the slides!

Read A History of Haskell: Being Lazy with Class by Hudak, Hughes, Peyton Jones and Wadler.

Slides are a terrible way to glean deep insight.

Carelessly, I scanned the document looking for some suggestive areas to read. 3.2 Haskell is pure may be a good place to start:

Once we were committed to a lazy language, a pure one was inescapable. The converse is not true, but it is notable that in practice most pure programming languages are also lazy. Why? Because in a call-by-value language, whether functional or not, the temptation to allow unrestricted side effects inside a “function” is almost irresistible.

I don't have the time to fish up more from this rather long paper. Sorry.

As I understand it....

...one advantage to everything being lazy is that it makes the language more orthogonal. That is, if laziness is optional, you often end up with two sets of functions - one strict and one lazy - depending on the need at hand.

Two sets of functions

No, if you need a strict version of a function in a lazy language, you still need to write two versions. An example would be foldl'.
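
For reference, here is a sketch of how the two versions typically differ (roughly what the foldl/foldl' pair looks like; the real library definitions differ in detail):

foldlLazy :: (b -> a -> b) -> b -> [a] -> b
foldlLazy _ acc []     = acc
foldlLazy f acc (x:xs) = foldlLazy f (f acc x) xs        -- builds a chain of thunks

foldlStrict :: (b -> a -> b) -> b -> [a] -> b
foldlStrict _ acc []     = acc
foldlStrict f acc (x:xs) = let acc' = f acc x
                           in acc' `seq` foldlStrict f acc' xs   -- forces as it goes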

No way to genericize that?

Could a language have a "force" modifier that would convert a lazy something into a non-lazy something? So you don't have to literally write the entire function again?

Not in general

One problem is that non-strict code can be written differently. For example, you can create infinite structures, using code that in a strict language would never terminate. A strictness modifier would fail on such functions, and you'd need a very smart compiler to warn you about such things - e.g. it could possibly notice the lack of a base case in a pattern-matching recursive function, but in general the halting problem would probably get in the way here.
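
A quick sketch of why a blanket "force" modifier can't work:

-- Lazily this is fine: take 5 ones yields [1,1,1,1,1].
ones :: [Int]
ones = 1 : ones
-- Any transformation that tried to evaluate the whole structure up front
-- (e.g. forcing it with something like deepseq) would simply never terminate.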

One can be enough

...if your language provides some form of introspection to discover lazy suspensions (which, technically speaking, is impure, of course). For example, in Alice ML you could write:

fun map f xs if isLazy xs = lazy map f (await xs)
  | map f nil = nil
  | map f (x::xs) = f x :: map f xs

This version of map is as lazy as the argument's spine.

Similarly it can be done at

Similarly it can be done at a type level using relevance types (a substructural type, related to linear types but without the "and only once" part) and the appropriate flavour of polymorphism. Of course, the hope is that the compiler will specialise and you'll have two separate implementations under the hood. Somewhere on my to-do list is knocking up a language and implementation thereof somewhere between a mini ML and Fluet's rgnURAL...

One of the early chapters...

of ATTAPL discusses something very similar--the use of relevance types to optimize away laziness and memoization in a lazy programming language. If the compiler can prove that a given argument to a function (here treating functions as n-ary) will always be needed, then the compiler can "strictify" the argument and optimize away the thunk that would normally be passed to encapsulate the delayed computation.

The same chapter also talks about other substructural types, including linear types, and their uses in PLs.

maybe not for all values of practical, but

from the already converted, and all.

Strictness annotations everywhere, sometimes?

Interesting question ... when looking at Clean or Haskell source code (I'm still in the early phase of learning these languages), I have noticed that some programs are absolutely full of strictness annotations (that disable laziness).

Are there programs for which a strict language would have been inherently better suited? I don't mean switching from Haskell to C++; just think of a Haskell2 that is strict by default, with laziness optional.

Wrong default

Are there programs for which a strict language would have been inherently better suited?

Almost all of them. Personally I think that laziness is the wrong default. I'm sure you can find my arguments with google. I'm not going to waste my time repeating them here.

I look at some of your

I looked at some of your comments on comp.lang.functional. What are the reasons besides performance?

It's hard to reason about

It's hard to reason about resource use when laziness is the default.

Isn't that kind-of a

Isn't that kind of a performance issue? Or are you talking about doing things in the correct order?

Edit: I am wondering if there is something about lazy semantics that makes it less useful than strict semantics. A lot of the critique of laziness seems levelled at the implementation.

Correctness, predictability

What are the reasons besides performance?

What I care about is understanding the dynamic semantics of the programs I write. My objection to laziness as the default isn't really about performance as such. I consider a program that fails because it has an unintentional space leak incorrect --- not merely inefficient.

superset

Outside of performance issues, lazy evaluation is a superset of strict evaluation, is it not? (in a purely functional language, of course)

And the point is?

One could also argue that the set of terms that run to completion without errors in a non-statically typed language are a superset of the terms that run to completion without errors in a corresponding statically typed language. Yet, there are quite a few people who prefer statically typed languages, because a statically typed language typically makes it easier to reason about programs (by ruling out many kinds of errors statically).

meh

I suppose I might as well cut to the chase. What do you find wrong with lazy evaluation?

Prone to leaks

Like I said earlier, I'm not going to waste my time repeating the same arguments here. There is nothing wrong with lazy evaluation per se. It is having lazy evaluation as the default that I think is wrong and I've already hinted as to why on this thread.

oops

Sorry, I hadn't noticed your previous posts. Could you provide an example of a space leak that is "unintended"?

Just google for "Haskell space leak".

Just google for "Haskell space leak".

ok

I can certainly see where you're coming from, but I have one nagging question. What makes space leaks caused by laziness any harder to predict than space leaks caused by strictness? Presumably the latter is easier to deal with because people have more experience with strict languages. Is there something intrinsic to laziness that makes reasoning about space usage harder, regardless of experience?

That has already been

That has already been answered. If you had actually googled for previous arguments, you should already have an idea.

The problem is that, in general, space usage with lazy evaluation depends on the behavior of both producers and consumers. The same cannot be said of (pure) strict evaluation (except in the trivial sense).

Let's consider the map example by Derek Elkins. It is relatively easy to come up with a formula for the space consumption of the strict version (assuming purity). Given a list of length n, the strict map always takes O(n) space for storing evaluation contexts (stack). This does not depend on the caller of map or the consumer that traverses the resulting list.

The same does not hold for the lazy map. In the worst case it also consumes O(n) space for evaluation contexts (unevaluated thunks), but in the best case it takes only O(1) space. This depends on the behavior of the consumer. If the consumer only examines the spine of the list, O(n) space is taken for holding evaluation contexts. Note that the same evaluation contexts also exist in the strict version, but are created and eliminated one-by-one, assuming a left-to-right order of evaluation or a more carefully written variation that makes the evaluation order explicit. If the consumer examines the list element-by-element, only O(1) space is taken (the contexts are created and eliminated one-by-one).

This may look favorable to lazy evaluation (taking as much space as strict evaluation, but only in the worst case), but the result does not generalize as can be seen from the second example provided by Derek Elkins. The strict version always takes O(1) space for evaluation contexts while the lazy version takes either O(n) space or O(1) space (assuming the result is never examined).

I believe that Derek missed the O(n) case for lazy map and the O(1) case for the lazy factorial. Quod erat demonstrandum.

the easiest way

Wouldn't the easiest way to deal with something like that be to:

1) profile your code
2) if something's acting funky, put a strictness annotation on it

?

If the application needs to be tuned more heavily than that, you should already know how lazy evaluation affects space usage, even if you're using a strict-by-default language. Otherwise you could miss out on an optimization opportunity.

That covers the general programmer and expert cases. Did I miss anything?

Easier said than done

2) if something's acting funky, put a strictness annotation on it

That is the difficult part. Even if you profile and notice space leaks, deciding where to put the strictness annotation is not easy. Just consider that to automate it, you'd have to solve the halting problem. (Perfect strictness analysis is undecidable.)

Note that adding a strictness annotation to a program may change it in observable ways (other than just space usage, which isn't considered observable in a pure language). For example, profiling for space might point to the lazy map. But you can't just replace the map function with a strict version. That could break the program.

Note that adding a

Note that adding a strictness annotation to a program may change it in observable ways (other than just space usage, which isn't considered observable in a pure language). For example, profiling for space might point to the lazy map. But you can't just replace the map function with a strict version. That could break the program.

If it does, then one of two things has happened, though:

1) Someone's used laziness to implement a 'time travel' trick
2) You're throwing things that don't terminate around

Both are admittedly easier to spot in a strict language because you have to encode the laziness. But once you've done that, the language isn't going to help you reason about it any.

That is the difficult part.

That is the difficult part. Even if you profile and notice space leaks, deciding where to put the strictness annotation is not easy. Just consider that to automate it, you'd have to solve the halting problem. (Perfect strictness analysis is undecidable.)

Use a binary search?

Note that adding a strictness annotation to a program may change it in observable ways (other than just space usage, which isn't considered observable in a pure language). For example, profiling for space might point to the lazy map. But you can't just replace the map function with a strict version. That could break the program.

True, although incredibly rare if you're dealing with a purely functional language.

Use a binary search? I

Use a binary search?

I guess you slept through your "theory of computation" course. How do you know the resulting program is correct?

irrelevant if you're dealing with a purely functional language without a bottom

Well, if you are dealing with a total language, you might just as well use strict evaluation. I would guess that with a total language, which cannot be Turing complete, the compiler could likely be much more effective at strictness analysis.

I guess you slept through

I guess you slept through your "theory of computation" course. How do you know the resulting program is correct?

I'm not sure what you mean. If you don't know which function to make strict, use a manual binary search until the space usage goes back to normal.

Well, if you are dealing with a total language, you might just well use strict evaluation. I would guess that with a total language, which cannot be Turing complete, the compiler could likely be much more effective at strictness analysis.

Please re-check my post. I changed it.

Re: using binary search to insert strictness annotations

Sorry, but I'm not going to continue this part of the discussion. Either you obviously do not understand the issues, and I'm not interested in investing the time to teach them to you, or you are playing a game. I recommend that you read about the halting problem, about strictness analysis (start at Wikipedia, for example), and about space safety (I posted a link to a relevant article just earlier). You could also go to some active Haskell forum and suggest your solution there.

???

I'm not sure what you mean. You've never debugged something by commenting out specific regions of your code until the error stops?

???

I think he's asking how you know your program still works after you add the strictness annotation that the binary search suggests. Adding a strictness annotation in a language with _|_ can break things - change terminating programs to non-terminating ones.

Work-day Pragmatism vs. Applied Theory?

I'm inferring that Vesa is coming from the "if it ain't proven, it is likely to come back and bite you in the ass some day" perspective and Curtis is coming from the "look, it basically works, ship it" perspective? (Just trying to guess/highlight what might be the different assumptions which lead to loggerheads.) I can really appreciate both sides.

Civility, please

The participants in this discussion are all well-known contributors here who we can assume are acting in good faith. No reason to suggest that one user or another is trolling.

Not necessarily

Some paper (I can't remember which one, sorry) showed that eager and lazy evaluation are duals of each other. In an environment with continuations, eager evaluation may cause a continuation to be executed which may prevent the (lazy) evaluation of a term that fails.

not sure

I'm not sure what you mean. Could you give me an example?

Symmetric Lambda Calculus

That would be Andrzej Filinski's Master's thesis.

On a different but related topic, in some sense, lazy and eager languages have dual space-leaks, and in my opinion, it's important that support for both be handy (and utilized). For example,

(define (map f xs)
    (if (null? xs)
        '()
        (cons (f (car xs)) (map f (cdr xs)))))

This is bad Scheme code. It has a space leak.

map f [] = []
map f (x:xs) = f x : map f xs

This is good Haskell code. It doesn't have a space leak.

(define (fact n acc)
    (if (< n 1)
        acc
        (fact (- n 1) (* acc n))))

This is good Scheme code. It doesn't have a space leak.

fact n acc | n < 1     = acc
           | otherwise = fact (n-1) (acc*n)

This is bad Haskell code. It has a space leak.

The best way to fix the bad Scheme code is with a laziness annotation. The best way to fix the bad Haskell code is with a strictness annotation.
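
For concreteness, here is a hedged sketch of the strictness-annotation fix on the Haskell side (using GHC's BangPatterns; seq works equally well):

{-# LANGUAGE BangPatterns #-}

-- The bang on 'acc' forces the accumulator on each call, so no chain of
-- unevaluated (*) thunks builds up.
fact :: Integer -> Integer -> Integer
fact n !acc
  | n < 1     = acc
  | otherwise = fact (n - 1) (acc * n)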

Simplistic

map f [] = []
map f (x:xs) = f x : map f xs
This is good Haskell code. It doesn't have a space leak.

Your analysis is too simplistic. Space usage with lazy evaluation depends on both consumers and producers. It is a whole program property. The above code snippet gives us neither the complete set of producers nor the complete set of consumers. And, indeed, the above map can cause space leaks. For example, when the length of a list obtained via the above map is taken, it results in O(n) space being used to hold the unevaluated f x thunks. That would be comparable to the Scheme code taking O(n) stack space to construct the result. Also, the space taken by f cannot be reclaimed until the entire list has been forced.

I think it's a bit much

I think it's a bit much complaining when the strict alternative would consume a similar amount of space for the evaluated data!

It is not a complaint

I just pointed out that the lazy map has the potential for a space leak that is comparable to the space leak in the non-tail recursive strict map. Like I've said repeatedly, space consumption with lazy evaluation is a whole program property. Rather than trying to shoot the messenger by appealing to emotion and drawing out red herrings, I suggest you either accept or refute the statement.

Your space leak as presented

Your space leak as presented is behaviour that you're perfectly willing to accept from the eager function, and which is still in a significant sense present in its tail-recursive variant - the only difference being that you get space chewed up by values instead of thunks. Pointing out a double standard is hardly drawing out a red herring.

Now if you'd talked about the space behaviour of length . map f . map g, that would be different - we get thunks referring to earlier thunks instead in place of values, and twice as much heap space consumed. The problem isn't thunks (which in and of themselves are within a constant factor of the values they produce and may well be smaller), it's thunks that depend on other thunks. Incidentally, your statement that "Also, the space taken by f cannot be reclaimed until the entire list has been forced" is also incorrect - it's reclaimable as each individual element is forced, and until that forcing no space is taken up by f as opposed to a thunk for f's application.

I accept your statement. But I shouldn't have to fill in the relevant evidence for you! You've said that you don't wish to repeat your arguments - don't expect to give a half-baked version and not get called on it.

Personally I'm happy to work in a language with non-strict semantics because it allows me to reason about where code is in fact strict and can thus be eagerly evaluated whereas doing the reverse in a strict language is significantly harder. The behaviour you're pointing to is a result of evaluation strategy, not strictness per se. At this point the tools available aren't particularly well-developed, and I have no problem accepting that non-strict semantics and lazy evaluation are both the wrong default for you at present.

Confused

Your space leak as presented is behaviour that you're perfectly willing to accept from the eager function, and which is still in a significant sense present in its tail-recursive variant - the only difference being that you get space chewed up by values instead of thunks. Pointing out a double standard is hardly drawing out a red herring.

Sorry, but you are wrong. Aside from my position that lazy evaluation is the wrong default, I'm not here giving my "acceptance" to behavior --- neither strict nor lazy. I'm merely describing how they behave. But, to the point, a reply that I was preparing concurrently with you, already explained that what you say isn't the case: read here carefully. Both the lazy and strict versions construct pretty much the same evaluation contexts (I'm using the term loosely). What matters is the order in which they are created and eliminated. After all, lazy vs strict is about evaluation order.

If you push a context onto the stack and then pop it off the stack and repeat n times, you use O(1) space. OTOH, if you push n contexts onto the stack and then pop the n contexts off the stack, you use O(n) space. Likewise for thunks constructed on the heap. If you first create a thunk and then force it and repeat n times, you use O(1) space for the thunks. OTOH, if you first create n thunks and then force the n thunks, you use O(n) space.

Now if you'd talked about the space behaviour of length . map f . map g, that would be different - we get thunks referring to earlier thunks instead in place of values, and twice as much heap space consumed. The problem isn't thunks (which in and of themselves are within a constant factor of the values they produce and may well be smaller), it's thunks that depend on other thunks.

What you write above is completely bogus. You are making a category mistake. Even in

let
  l' = map (+ 1) l
  n = length l'
  ... uses of l' ...

the combination of map and length creates n thunks. The thunks take space and that space can be quantified.

Incidentally, your statement that "Also, the space taken by f cannot be reclaimed until the entire list has been forced" is also incorrect - it's reclaimable as each individual element is forced, and until that forcing no space is taken up by f as opposed to a thunk for f's application.

You are confused. Consider the following example:

let
  g x = x `elem` someVeryLongList
  l' = map g l
  ... uses of l', but no uses of someVeryLongList ...

Now, assuming lazy evaluation (as suggested by the Haskellish syntax), the list someVeryLongList cannot be reclaimed until g can be. The same leak does not happen with strict evaluation. With strict evaluation, the list someVeryLongList can be reclaimed as soon as map returns.

I accept your statement. But I shouldn't have to fill in the relevant evidence for you! You've said that you don't wish to repeat your arguments - don't expect to give a half-baked version and not get called on it.

What has happened here is that you are confused, which is part of what I wished to avoid.

Personally I'm happy to work in a language with non-strict semantics because it allows me to reason about where code is in fact strict and can thus be eagerly evaluated whereas doing the reverse in a strict language is significantly harder.

And the evidence for that is in? (I believe you are making another category mistake here, confusing laziness and purity.)

The behaviour you're pointing to is a result of evaluation strategy, not strictness per se.

Excuse me? No it isn't. Lazy vs strict evaluation is essentially about evaluation strategy --- about the order in which redexes are eliminated. This is fundamentally so, because strictness analysis is undecidable. I'm not interested in word games. Lazy, non-strict, call-by-need, call-by-name, I don't really care about the differences; call-by-name is hopelessly inefficient.

Sorry, but you are wrong.

Sorry, but you are wrong. Aside from my position that lazy evaluation is the wrong default, I'm not here giving my "acceptance" to behavior --- neither strict nor lazy. I'm merely describing how they behave.

"Space leak" sounds an awful lot like a value judgement to me. Per below, I'm willing to accept it in specific circumstances - but you're attributing it to the wrong causes.

Now if you'd talked about the space behaviour of length . map f . map g, that would be different - we get thunks referring to earlier thunks instead in place of values, and twice as much heap space consumed. The problem isn't thunks (which in and of themselves are within a constant factor of the values they produce and may well be smaller), it's thunks that depend on other thunks.

What you write above is completely bogus. You are making a category mistake. Even in


let
  l' = map (+ 1) l
  n = length l'
  ... uses of l' ...

the combination of map and length creates n thunks. The thunks take space and that space can be quantified.

Both the eager and lazy code construct the spine of l' and leave something in the list's item slots - the eager code leaves values, the lazy code leaves thunks. The difference in the space consumed is thus the difference between the thunks and the values, which is a constant factor and may well go in the thunks' favour. What does differ is when l can be freed assuming no other references to it - its spine goes as that of l' is constructed, and its items as the thunks referring to them are forced. So okay, the problem's thunks referring to other entities - but it's not the thunks themselves and citing them as the difference in space usage is at best confusing and at worst hand-waving.

Incidentally, your statement that "Also, the space taken by f cannot be reclaimed until the entire list has been forced" is also incorrect - it's reclaimable as each individual element is forced, and until that forcing no space is taken up by f as opposed to a thunk for f's application.

You are confused.

Yep, here I've read "the space taken up by a function" as someone used to first-order languages would - the space taken up by its execution. Mea culpa, you didn't say what I thought you did.

Personally I'm happy to work in a language with non-strict semantics because it allows me to reason about where code is in fact strict and can thus be eagerly evaluated whereas doing the reverse in a strict language is significantly harder.

And the evidence for that is in? (I believe you are making another category mistake here, confusing laziness and purity.)

All I need in a non-strict language to do this is to reason about a value's relevance, per relevant logic, relevant types etc. This is well-studied in the context of compilers for non-strict languages. I am not aware of approaches for strict languages other than attempting to decode the encoded non-strict code and work from there. It's rather difficult to conclude that a function's non-strict when you've assumed they're all strict to start with.

The behaviour you're pointing to is a result of evaluation strategy, not strictness per se.

Excuse me? No it isn't. Lazy vs strict evaluation is essentially about evaluation strategy --- about the order in which redexes are eliminated. This is fundamentally so, because strictness analysis is undecidable. I'm not interested in word games. Lazy, non-strict, call-by-need, call-by-name, I don't really care about the differences; call-by-name is hopelessly inefficient.

There is no single "strict evaluation" strategy - rather, it refers to strategies that are only appropriate for strict functions or environments without _|_. Demonstrating or at least assuming strictness is a prerequisite for using these strategies in a partial language. Non-strict semantics do not preclude the use of eager evaluation in any and all circumstances, and a function is strict whether or not the compiler is aware of this - \x.x is strict however dumb your haskell implementation is. As such, informed discussion has to distinguish the semantic properties of strictness and relevance from the evaluation strategies used. There's no reason that relevance inference can't be done in much the same manner as type inference, complete with annotations and without any unsafe casting or coercing going on. We can adopt much the same approach to eliminating the possibility of _|_ and enabling eager evaluation that way given a sufficiently powerful type system, but relevance inference is IMO closer to being doable in a sufficiently usable programming language - and ties in nicely with using other substructural properties for resource management.

"Space leak" sounds awful

"Space leak" sounds an awful lot like a value judgement to me.

You are kidding, right? Tell that to next person who comes to, say, Haskell-Cafe to ask why a program fails after using all memory and/or uses asymptotically more memory than the programmer predicted. I'm sure everyone will enjoy the joke. I can already see the title in the next Monad.Reader: "There is Value in Every Thunk; or, Why Space Leaks are a Value Judgment" :-)

Both the eager and lazy code construct the spine of l' and leave something in the list's item slots - the eager code leaves values, the lazy code leaves thunks. The difference in the space consumed is thus the difference between the thunks and the values [...]

I believe I can now better understand the confusion. When I say that the lazy version uses O(n) space to hold unevaluated thunks, I'm not making a comparison to strict evaluation. It is a statement of fact about lazy evaluation and quantifies the order of growth of space consumption used to support evaluation. I'm saying that the thunks, which are used to hold enough information to continue computation, are comparable to stack frames, which are used to hold enough information to continue computation, and are both artifacts of the evaluators. I believe Derek already understood what I was talking about.

There is no single "strict evaluation" strategy - rather, it refers to strategies that are only appropriate for strict functions or environments without _|_.

Like I said, I'm not interested in word games. The languages used in Derek Elkins' examples were Scheme and Haskell. Scheme uses strict evaluation and, more specifically, call-by-value evaluation with an unspecified order of evaluation of arguments. Haskell, OTOH, uses non-strict evaluation and, more specifically, is usually considered to use call-by-need.

"Space leak" sounds an awful

"Space leak" sounds an awful lot like a value judgement to
me.

You are kidding, right?

Not in the slightest. You're describing behaviour as leaky in one situation when behaviour that's in a significant sense equivalent in another gets a free pass. "Space leak" implies that you consider the space behaviour unacceptable, that you are making judgements about whether that behaviour has any value for your purposes.

I'm not saying that there's anything wrong in doing so, I'm just pointing out that this is what you are doing.

I believe I can now better understand the confusion. When I say that the lazy version uses O(n) space to hold unevaluated thunks, I'm not making a comparison to strict evaluation. It is a statement of fact about lazy evaluation and quantifies the order of growth of space consumption used to support evaluation. I'm saying that the thunks, which are used to hold enough information to continue computation, are comparable to stack frames, which are used to hold enough information to continue computation, and are both artifacts of the evaluators. I believe Derek already understood what I was talking about.

I understand what you're talking about, but you're hand-waving away a significant amount of heap behaviour. It's not enough to compare stack frames and thunks. This is actually a significant part of what you're complaining about - the continuation behaviour is comparatively obvious syntactically whereas heap behaviour isn't.

There is no single "strict evaluation" strategy - rather, it refers to strategies that are only appropriate for strict functions or environments without _|_.

Like I said, I'm not interested in word games. The languages used in Derek Elkins' examples were Scheme and Haskell.

I'm certainly not looking to play word games here - there is a point to the distinction I'm making, which I'll highlight in my response to your second post. While it's not unreasonable to treat Scheme as one expression of all that can be done with strict semantics until someone introduces some counterexamples, using Haskell as a bound on what's possible in a non-strict-by-default language is going to seriously skew any discussion of whether that can be an acceptable default.

Free pass?

You're describing behaviour as leaky in one situation when behaviour that's in a significant sense equivalent in another gets a free pass.

You lost me here. What is it that I'm giving a "free pass" to and where did I give it? As far as I know, I have not given a free pass to anything in this thread, and, again, I'm not complaining here about anything when I talk about the space usage of map. I'm merely describing it.

Derek Elkins described the Scheme version of map as having a space leak. This, I assume, is due to accumulating O(n) stack frames by not using tail recursion. I have never given a "free pass" to that leak, if that is what you are implying. What I have done is point out that the Haskell version of map, under certain kinds of consumers, results in having O(n) live thunks. This I consider to be comparable to the Scheme version using O(n) stack frames. If the Scheme version is considered to have a space leak, then so should the Haskell version be.

what's possible in a non-strict-by-default language

I'm eagerly waiting for the Turing complete, non-strict language that makes space consumption as easy to predict as with strict evaluation --- while maintaining good time complexity, of course.

CPS transform

I'm saying that the thunks, which are used to hold enough information to continue computation, are comparable to stack frames, which are used to hold enough information to continue computation, and are both artifacts of the evaluators. I believe Derek already understood what I was talking about.

I don't think you're implying this, but to be clear: CbV CPS transforming the Scheme code so that it allocates no stack would not change my analysis.

It wasn't an accident that I mentioned this in the context of Filinski's symmetric lambda calculus where continuations are dual to values. From this perspective, my original post can be viewed as saying, lazy languages leak values and eager languages leak continuations. Conal Elliott prefers the term "value-oriented programming" to "functional programming". In this context (not quite what he was intending...), he's 100% on the ball. Functional languages are horribly asymmetric a la Filinski. (Incidentally, there are a lot of similar dualities between recursion and iteration from other perspectives.)

An actually interesting question, rather than more debate, is: do the issues with lazy evaluation dualize? Is it easier to keep track of when continuations are used in a lazy language than in a strict language? Would there be more involved "continuation leaks" if continuations weren't so second-class (even in languages with "first-class" continuations)?

Re: CPS transform

I don't think you're implying this, but to be clear: CbV CPS transforming the Scheme code so that it allocates no stack would not change my analysis.

Neither would it change mine. The order of growth of the space consumed to support evaluation would still be the same.

Relevance

All I need in a non-strict language to do this is to reason about a value's relevance, per relevant logic, relevant types etc. This is well-studied in the context of compilers for non-strict languages.

Admittedly my knowledge of substructural logics is limited, but, from skimming a few papers and ATTPL, it seems that a relevant type system allows you to determine whether something is used at least once, which means that you might just as well evaluate it eagerly. (This is at least superficially similar to strictness analysis.)

I would assume that there are at least two ways to use such a type system. You could use one internally in a compiler, as part of an optimization pass (as a companion to or a replacement for strictness analysis), or you could make the types a part of the language, requiring programs to respect type annotations with relevant types. In either case, like traditional strictness analysis, such a type-based analysis will necessarily be conservative (at least unless you otherwise restrict the language).

In the former case, where relevant types are completely inferred, I'm not sure that this would really change anything. The current situation, where strictness analysis is just used as an optimization, is that sometimes you (the programmer) get lucky and sometimes you don't. Having the compiler infer relevant types in the dark and use them for optimization wouldn't change this.

Arguably, in the latter case, where relevant types are part of the language, the language would probably make it somewhat easier to reason about resource usage, because an annotation that says that something is relevant (used at least once) would probably effectively mean that the compiler evaluates it eagerly. I don't know whether this is practical or not. The usual problem is that the types quickly get very complicated and you wouldn't want to write them down --- leaving them to inference --- and then you (the programmer) also wouldn't get much support for reasoning about space usage. OTOH, I'm not sure how different this really would be from using ordinary strictness annotations. That would likely depend on how the type system treats any possible type annotations. Assuming a safe inference and type system, this might mean that a programmer-written annotation that is not consistent with the inferred (relevant) types could lead to a program being rejected. I'm not sure whether this is really useful or not. AFAIK, sometimes strictness annotations are knowingly used to change the semantics of the program (the annotations aren't consistent with non-strict semantics in the general case).
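
As a concrete (and standard) Haskell illustration of that last point, a seq annotation is not merely an evaluation hint; it changes which programs terminate:

   -- Without an annotation, f never demands its argument:
   f :: Int -> Int
   f _ = 5              -- f undefined evaluates to 5

   -- With a seq "strictness annotation" the meaning changes:
   g :: Int -> Int
   g x = x `seq` 5      -- g undefined is now undefined (_|_)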

However, all of this is pretty much irrelevant to my analysis of the examples provided by Derek Elkins. In particular, AFAIK, Haskell does not specify any particular strictness analysis or relevant type system to be used for optimization and there is no de facto standard across Haskell implementations for such optimizations. Also, I think that an evaluation strategy guided by a relevant type system should not be called lazy.

I am not aware of approaches for strict languages other than attempting to decode the encoded non-strict code and work from there.

That sounds complicated, but I would rather see a language that is strict by default, but allows one to use lazy evaluation with suitable annotations (rather than via encoding). Approaches like relevant types and strictness analysis could be used to optimize the lazily evaluated parts (even taking advantage of knowing that the rest of the program is strict --- it would likely just improve the accuracy of the analysis).

It's rather difficult to conclude that a function's non-strict when you've assumed they're all strict to start with.

Well, I'm not entirely sure what that would mean, but why should this be more difficult than concluding that a function is strict when you've assumed they're all non-strict to start with? In both cases you should have a pure language (or first prove the absence of side-effects). Consider the following function (written in SML syntax):

fun ignore _ = ()

Why should it be difficult to conclude that the ignore function could be (in some sense, at least) considered non-strict?

Non-strict semantics do not preclude the use of eager evaluation in any and all circumstances

I'm probably not reading you right, but eagerly evaluating a division by zero or eagerly mapping over an infinite list when the result isn't actually needed is not valid for a non-strict language, AFAIK. Something like Eager Haskell has to make extra effort to back out of such situations.
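
For example, a minimal Haskell sketch of what I mean, using only Prelude functions:

   -- Under non-strict semantics both of these are perfectly well-defined:
   a :: Int
   a = const 1 (1 `div` 0 :: Integer)       -- 1; the division is never demanded

   b :: Int
   b = fst (2, map (+ 1) [1 :: Integer ..]) -- 2; the infinite map is never demanded

An implementation that unconditionally evaluated the discarded arguments first would raise an exception or diverge here, which is why a speculative scheme like Eager Haskell needs a way to abandon such evaluations.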

However, dually, there is no reason why a compiler for a language that uses strict evaluation by default could not use program analyses to evaluate some parts of the program lazily as long as it doesn't change the observable behavior of the program.

All I need in a non-strict

All I need in a non-strict language to do this is to reason about a value's relevance, per relevant logic, relevant types etc. This is well-studied in the context of compilers for non-strict languages.

Admittedly my knowledge of substructural logics is limited, but, from skimming a few papers and ATTPL, it seems that a relevant type system allows you to determine whether something is used at least once, which means that you might just as well evaluate it eagerly. (This is at least superficially similar to strictness analysis.)

Rather, that you can eagerly evaluate without altering the (purely) observable behaviour. There are still situations in which lazy evaluation is operationally preferable - an obvious one would be head . map f . map g.
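
A quick sketch of why, assuming f and g are arbitrary pure functions:

   -- Under lazy evaluation only the first element of the list ever goes
   -- through g and then f; eagerly evaluating either map would apply them
   -- to every element (and diverge on an infinite list).
   firstOf :: (b -> c) -> (a -> b) -> [a] -> c
   firstOf f g = head . map f . map g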

Existing strictness analysis is indeed relevance inference and is commonly thought of as such.

Arguably, in the latter case, where relevant types are part of the language, the language would probably make it somewhat easier to reason about resource usage, because an annotation that says that something is relevant (used at least once) would probably effectively mean that the compiler evaluates it eagerly.

The right solution IMO is to explicitly separate proofs of relevance from hints and demands about evaluation strategy - the latter can be verified against the former. We then end up with seq split into two functions - an evaluation demand and an unsafe relevance coercion. The consequences of the latter can, of course, be tracked by the type system.
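
To make the distinction concrete, here is one hypothetical way the split could look in Haskell terms; the names and signatures are invented for illustration and are not an existing API:

   -- A pure evaluation demand: semantically the identity on its second
   -- argument; the idea is that it is only accepted where the relevance of
   -- the first argument has already been established, so it cannot change
   -- the observable meaning of the program.
   evalDemand :: a -> b -> b
   evalDemand _ y = y

   -- An unsafe relevance coercion: asserts relevance without proof and,
   -- like today's seq, can turn a terminating program into _|_; these are
   -- the uses a type system would track.
   unsafeRelevance :: a -> b -> b
   unsafeRelevance x y = x `seq` y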

I don't know whether this is practical or not. The usual problem is that the types quickly get very complicated and you wouldn't want to write them down --- leaving them to inference --- and then you (the programmer) also wouldn't get much support for reasoning about space usage.

Supporting partial type annotations can help a lot here. Allowing "no usage info" to be expressed simply by not saying anything will help too. Ultimately, when it's time to do some serious reasoning it's enough that the code contained enough annotations for the compiler to figure it out - after that you can bang type questions into a REPL or inspect explicitly-typed intermediate code to check what came out of the inferrer.

OTOH, I'm not sure how different this really would be from using ordinary strictness annotations.

Type-level support would let me ask ghci with a simple :t; separating evaluation hints from relevance coercions would let me know when the semantics have been intentionally changed and prevent doing so unintentionally. With existing annotations the best you can do is poke at the compiler's intermediate output and see what you got.

Also, I think that an evaluation strategy guided by a relevant type system should not be called lazy.

I'm inclined to agree, but this is where conflating lazy and non-strict has got us - and to be fair, it's going to be an awful lot lazier than eager evaluation. These days you'll have a hard time convincing people that GHC doesn't produce lazy code.

That sounds complicated, but I would rather see a language that is strict by default, but allows one to use lazy evaluation with suitable annotations (rather than via encoding). Approaches like relevant types and strictness analysis could be used to optimize the lazily evaluated parts (even taking advantage of knowing that the rest of the program is strict --- it would likely just improve the accuracy of the analysis).

I'm somewhat skeptical as to the kind of annotation burden (counting using lazy versions of functions as annotations as well) that would be involved in setting up a lazy pipeline - I know that present-day Haskell is no better if you want to set things up strictly, but type-based specialisation offers the possibility of rendering a large amount of code eager with a handful of well-placed annotations.

It's rather difficult to conclude that a function's non-strict when you've assumed they're all strict to start with.

Well, I'm not entirely sure what that would mean, but why should this be more difficult than concluding that a function is strict when you've assumed they're all non-strict to start with? In both cases you should have a pure language (or first prove the absence of side-effects).

But a non-strict language doesn't assume functions are non-strict to start with - just that they're potentially non-strict until shown otherwise. "Non-strict language" just means "language in which some functions are non-strict".

Consider the following function (written in SML syntax):

fun ignore _ = ()


Why should it be difficult to conclude that the ignore function could be (in some sense, at least) considered non-strict?

It isn't, so long as you start off with a non-strict language! Otherwise you either can't express it at all without an encoding or are effectively inserting relevance coercions everywhere.

However, dually, there is no reason why a compiler for a language that uses strict evaluation by default could not use program analyses to evaluate some parts of the program lazily as long as it doesn't change the observable behavior of the program.

However, this leaves you with the nastier end of the halting problem - you need to check that code doesn't suddenly start terminating when evaluated lazily, having failed to terminate under a strict scheme. When this becomes an issue, you won't necessarily leak space when one implementation doesn't do as good a job as another - you may well just leak time instead, without actually allocating enough to eventually exhaust your storage. This is why, IMO, in the presence of a sufficiently strong type system laziness is the right default - it's a lot easier in most cases to set a boundary for 'too much space'.

Without that type system, it's clearly the wrong default for you whereas I'm comfortable it's the right one for me. But "laziness is currently the wrong default" is a significantly weaker statement.

Head strict lists

Indeed. The best "default" here is likely head-strict lists, which solve this problem, underscoring my point that being able to readily mix eagerness and laziness is extremely important.
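
For instance, a minimal Haskell sketch of one way a head-strict (but spine-lazy) list could look; this particular definition is just an illustration, not something from the examples above:

   -- Head-strict, spine-lazy list: each element is forced when its cons
   -- cell is constructed, but the tail is still produced on demand.
   data HSList a = Nil | Cons !a (HSList a)

   mapHS :: (a -> b) -> HSList a -> HSList b
   mapHS _ Nil         = Nil
   mapHS f (Cons x xs) = Cons (f x) (mapHS f xs)

A consumer that walks the spine of mapHS f xs then never leaves element thunks behind, while still not computing past the point the consumer reaches.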

In a different post you list three "gotchas" about the Scheme code. (3) is uninteresting. I'm not going to clutter the code with irrelevancies due to Scheme (R5RS, at least) lacking a standard proper list type. (2) is somewhat interesting in that it potentially wouldn't be a problem if the example were lazy. (1) is very interesting as it makes a point. Garbage collection makes the space usage of a program non-local. It is undecidable, in general, whether and when something gets garbage collected. Naively applying your logic, garbage collection is the wrong default. (I'll be nice and not even talk about side-effects, which include non-locality in their very definition.) Similarly, using Turing-complete languages at all is the wrong default.

Let's assume, for the sake of argument, that you take neither of those stances. We (and the hypothetical "you" certainly isn't alone) accept this for two reasons: 1) these things afford us some benefit, and 2) with help from a touch of delusion, we believe we can write in a style that curtails most of the potential complexity. (These apply to the side-effect case as well.) These apply to laziness as well. The question comes down to a subjective one: do the benefits outweigh the costs, and do you know how to control the complexity? This will be different for each person.

Finally, you argue against laziness as the default; however, the problems you list aren't related to laziness being the default, but to laziness at all. Whether or not it is the default, you will have exactly the same problems. So to the extent you utilize laziness at all, it will be important to know how to deal with it. Luckily for you, whether or not you use a language that is lazy by default, you benefit from the practical experience of those who do.

Predictability not optimality

Indeed. The best "default" here is likely head-strict lists, which solve this problem, underscoring my point that being able to readily mix eagerness and laziness is extremely important. [...] Finally, you argue against laziness as the default; however, the problems you list aren't related to laziness being the default, but to laziness at all.

You are missing the point. I'm pointing out the inherent difficulty of predicting the space usage of lazy evaluation (compared to strict evaluation). Being able to easily predict space usage is highly important in programming. As I've argued repeatedly, and nobody has refuted the conclusion, laziness makes it more difficult to predict space usage. That difficulty is the reason why laziness is the wrong default.
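
A standard illustration of the kind of surprise I mean (this is the textbook example, not one of the examples discussed above):

   import Data.List (foldl')

   -- Reads like a constant-space loop, but under lazy evaluation foldl
   -- first builds a chain of a million suspended (+) applications:
   leaky :: Integer
   leaky = foldl (+) 0 [1 .. 1000000]

   -- The strict left fold behaves the way strict-evaluation intuition predicts:
   predictable :: Integer
   predictable = foldl' (+) 0 [1 .. 1000000]

Nothing in the source of leaky makes the O(n) heap growth syntactically apparent; that is the difficulty I am pointing at.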

What you are talking about is about achieving optimal space usage and that is something that I'm not arguing about. Being able to easily mix both eager and lazy evaluation is useful in getting there and I have never said that wouldn't be the case. However, that is a misguided goal. What matters is that a programmer can easily and accurately predict space usage and determine whether it is acceptable or not. I regularly use space inefficient algorithms, when I can bound the space usage and determine that it is acceptable.

Garbage collection makes the space usage of a program non-local. It is undecidable, in general, whether and when something gets garbage collected.

That is a completely separate issue and has entirely different properties. If you understand the space safety properties of the language (or compiler), then you can determine whether a specific variable is garbage or not at a specific point of evaluation. Whether or not the space is actually reclaimed precisely at that point, or some time later, almost never matters. What matters is that should the program need more memory, it will be able to reclaim at least the part of the garbage guaranteed by the space safety properties.

This will be different for each person.

So, naively applying your logic, there are no good or bad choices in programming language design. All properties of programming languages are entirely a matter of taste.

Syntactically apparent from the producer

(define (map f xs)
    (if (null? xs)
        '()
        (cons (f (car xs)) (map f (cdr xs)))))

This is bad Scheme code. It has a space leak.

Scheme gotchas: 1) In the above it is unspecified whether (f (car xs)) is evaluated before or after (map f (cdr xs)). This is significant in the sense that it determines whether the input list can be garbage collected before the map finishes. 2) The above goes into an infinite loop given a cyclic list. 3) The above fails when given an improper list.

OTOH, assuming we avoid the gotchas, the map itself takes at most O(n) stack space to process a list of length n. This is syntactically apparent (the call to map is not in tail position) from the code of map alone, and we need not consider the whole program. Furthermore, as every Schemer knows, this is easy to fix. It is explained in the Scheme 101 course. Also, space taken by f is not held onto after the map returns.

Tail-recursive map and declarative agents

The above example of bad Scheme code turns into good tail-recursive Oz code when translated directly into Oz syntax. This gives:

   fun {Map F Xs}
      if Xs==nil then nil
      else {F Xs.1}|{Map F Xs.2} end
   end

This is because Oz has single-assignment variables. To understand the execution, we translate this example into the Oz kernel language (I give just a partial translation for clarity):

   proc {Map F Xs Ys}
      if Xs==nil then Ys=nil
      else local Yr in
         Ys=Xs.1|Yr
         {Map F Xs.2 Yr}
      end end
   end

That is, Map is tail-recursive because Yr is initially unbound. This is not just a clever trick; it is profound because it allows declarative concurrency and declarative multi-agent systems. Single-assignment variables are read using dataflow synchronization. A concurrent agent with an input stream Xs and an output stream Ys can be created using this version of Map, as follows:

   thread Ys={Map F Xs} end

This agent has constant stack space, runs concurrently, and is declarative. To summarize, things are not so cut-and-dried as the simple lazy/eager duality implied by Derek Elkins' examples. See chapter 4 of CTM for more elaboration of declarative concurrency. In my view, thinking of execution in terms of a process calculus (like the Oz kernel language) rather than in terms of a lambda calculus is better because it gives a broader perspective on what is possible.

Example code

Cool, I didn't know this before. I should look at Oz code in kernel syntax more often. However, I think the translated example code should look like this:

   proc {Map F Xs Ys}
      if Xs==nil then Ys=nil
      else local Y Yr in
         Ys=Y|Yr
         {F Xs.1 Y}
         {Map F Xs.2 Yr}
      end end
   end

Kernel language for Map

Yes, you're right, my translation forgot to call F. Sorry about the bug.

A "Real World" example

In C++, even. I had to write my own rational class with lazy evaluation because the boost::rational class normalizes the rationals on every operation. The class I wrote only normalizes completely on comparisons.
It is several orders of magnitude faster than the non-lazy boost::rational.