Practical Laziness

Hi y'all!

I have a language that I've been working on for a while that I intend to be lazy. At this point, the "flavor" of the language is pretty much set, but I'm still working out some of the lower-level details, and I'm wondering about the implementation of laziness. Strict evaluation is certainly easy to implement, since one never bothers to create thunks, but for non-strict evaluation, is there any literature on the practical elements that go into choosing how frequently to turn thunks into values? Do most implementations wait until they hit conditional forms or IO, or do they tend to put upper bounds on how large a chain of thunks can get, or ... ?
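For concreteness, here's a minimal sketch (in Python, purely for illustration — not any particular implementation's scheme) of call-by-need thunks: evaluation is deferred until a value is actually demanded, e.g. by a conditional, and the result is memoized so each thunk is forced at most once.

```python
class Thunk:
    def __init__(self, compute):
        self._compute = compute   # zero-argument function producing the value
        self._forced = False
        self._value = None

    def force(self):
        if not self._forced:
            self._value = self._compute()
            self._compute = None  # drop the closure so it can be collected
            self._forced = True
        return self._value

def lazy_add(a, b):
    # Builds a new thunk over its arguments without evaluating anything yet.
    return Thunk(lambda: a.force() + b.force())

x = Thunk(lambda: 2 * 3)
y = lazy_add(x, Thunk(lambda: 4))
# Nothing has been computed so far; forcing happens only on demand:
if y.force() > 5:          # a conditional demands the value
    print(y.force())       # prints 10; the memoized result is reused
```

In this naive scheme the chain of unforced thunks can grow without bound until something demands a value, which is exactly the space-behavior question raised above.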




Some Keywords

I haven't looked into this in much depth, but there's a good summary of possible implementation techniques in Simon Peyton Jones's paper on GHC's Spineless Tagless G-Machine [1].

Then there is Urban Boquist's PhD thesis, "Code Optimization Techniques for Lazy Functional Languages"[2]. He uses the Graph Reduction Intermediate Language--GRIN--as his backend.

However, AFAIK, he no longer works on it, and one of its limitations was that it was actually an untyped representation. The JHC Haskell compiler uses a typed GRIN. The source code is freely available under the LGPL, so maybe you can take a look at that.

The JHC page [3] also has some other links related to this topic.


Simon Peyton Jones Book

Simon Peyton Jones wrote a book about implementing lazy languages, The Implementation of Functional Programming Languages. It is available on-line.

way we did it once in early 90's

About 13 years ago I worked on the runtime of a lazy, functional, visual, dataflow programming language (for a company in Palo Alto) used to model financial instruments in a graphical environment. We needed to define an answer to the same question: when do you actually evaluate a lazy expression? The answer we used might apply to your situation, or inspire some other idea.

Conceptually, the entire tree of possible values that might exist at runtime was a single unrolled graph. The model supposed you'd have a node in memory for every possible value you might want to compute. But you'd only calculate those displayed as an output in the UI, or those whose outputs flowed into a displayed value.

All previous node values were cached (memoized), so you wouldn't calculate them again unless they became dirty. Nodes could only become dirty if a user pushed a new value into a displayed input, or if the node was downstream from outputs of such a dirtied input node.

Dirtying nodes from changed inputs was a push model: you'd follow outputs and mark nodes dirty if they depended on a node you just dirtied (and continue this recursively if you just changed a node from clean to dirty).

Displaying nodes in outputs was a pull model: a dirty node required recalculating inputs, and if they were also dirty you'd recursively keep going until you grounded in clean nodes.
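A toy sketch of that push-to-dirty / pull-to-recompute scheme (my own reconstruction for illustration, not the original system's code): setting an input pushes dirtiness downstream, stopping at already-dirty nodes, while reading an output pulls recomputation through only the dirty ancestors, re-caching their values.

```python
class Node:
    def __init__(self, fn=None, inputs=()):
        self.fn = fn              # None for input (leaf) nodes
        self.inputs = list(inputs)
        self.outputs = []         # downstream dependents
        self.value = None
        self.dirty = True
        for i in self.inputs:
            i.outputs.append(self)

    def set(self, value):
        # Push model: store the new input, then dirty the downstream cone.
        self.value = value
        for out in self.outputs:
            out._mark_dirty()
        self.dirty = False

    def _mark_dirty(self):
        if not self.dirty:        # ground out at already-dirty nodes
            self.dirty = True
            for out in self.outputs:
                out._mark_dirty()

    def get(self):
        # Pull model: recompute only if dirty, recursing into inputs.
        if self.dirty:
            self.value = self.fn(*(i.get() for i in self.inputs))
            self.dirty = False
        return self.value

price = Node()
qty = Node()
total = Node(lambda p, q: p * q, [price, qty])
price.set(10.0)
qty.set(3)
print(total.get())    # 30.0, computed on demand and cached
price.set(12.0)       # dirties `total`
print(total.get())    # 36.0, recomputed; qty's cached value is reused
```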

In short, all answers follow from analysis of computations as a great big graph. But it sometimes requires difficult thinking when comparing to the dual model of imperative control flow.

From a naive look at that

From a naive look at that algorithm, it looks like you could get into situations where dirtying a dependency tree of nodes takes up a lot of time. Did you find that to be much of a problem in practice, or did things tend to ground out in already-dirty nodes pretty quickly? My second reaction is that you'll only tend to have large fields of clean nodes to dirty if something is actively using and cleaning them, so the cost of dirtying them will mostly be absorbed into the immediate re-cleanings needed to recalculate the changed outputs.

dependency dirtying

Brooks Moses: From a naive look at that algorithm, it looks like you could get into situations where dirtying a dependency tree of nodes took up a lot of time.

In practice that would be a problem if mutating inputs happened often. But in the deployment contexts the product was meant to scale to (they had their eyes on balancing bank books with really complex financial instruments), a pure, immutable-state functional programming model was assumed for performance. The cost of dirtying a dependency tree was expected only when users played "what if" games by manually tweaking things they could see, and then user-perceptible responsiveness was all we wanted.

Because I left to go work on Apple's OpenDoc, I never saw it deployed in contexts so large that we had to alter the naive implementation model to avoid some large scaling costs. For example, while the model pretended all nodes were always in memory, you wouldn't want to do that for very large graphs. But you could still make an external view of the system behave as if all nodes were always present.

Similarly, you could try safe approximations of the dependency-dirtying model, such as dirtying entire classes of nodes by rule without actually touching memory for them. That could get complex if you invented something fancy, but the simple-minded dependency-dirtying description is an easy spec to follow, so you could reason about whether the fancy simulations still did the same thing.

Thinking about the "whole graph" (the entire execution tree, i.e. an unrolled call-stack tree) sometimes helped when deciding whether a technique changed any behavior perceived by an observer. (It always made me relax, and reminded me of the discrete-math technique from school of addressing all parts of a global system at once.) We wanted users to be unable to tell that an imperative approach was not being used, so they could assume eager evaluation if that fit their mental simulation better.

The guys I worked with at first were hardware guys used to building large simulations of chip circuits, which might explain some of the approach used.

Sounds Like Cyclic Lambda Calculi

The approach that you're describing, Rys, sounds to me quite a lot like Lambda Calculus Plus Letrec. With respect to the topic at hand, we find, in Part III:

Since most implementations of non-strict functional languages rely on sharing to avoid repeating computations, we develop a variant of our cyclic lambda calculus that enforces the sharing of computations and show that the two calculi are observationally equivalent. For reasoning about strict languages we develop a call-by-value variant of the sharing calculus. We state the difference between strict and non-strict computations in terms of different garbage collection rules. We relate the call-by-value calculus to Moggi's computational lambda calculus and to Hasegawa's calculus.

Credit where credit is due: Tim Sweeney pointed me to this work some months ago.