## Representing Control in the Presence of First-Class Continuations

Robert Hieb, R. Kent Dybvig, and Carl Bruggeman. Representing Control in the Presence of First-Class Continuations. ACM SIGPLAN 1990 Conference on Programming Language Design and Implementation, June 1990.

Languages such as Scheme and Smalltalk that provide continuations as first-class data objects present a challenge to efficient implementation. Allocating activation records in a heap has proven unsatisfactory because of increased frame linkage costs, increased garbage collection overhead, and decreased locality of reference. However, simply allocating activation records on a stack and copying them when a continuation is created results in unbounded copying overhead. This paper describes a new approach based on stack allocation that does not require the stack to be copied when a continuation is created and that allows us to place a small upper bound on the amount copied when a continuation is reinstated. This new approach is faster than the naive stack allocation approach, and it does not suffer from the problems associated with unbounded copying. For continuation-intensive programs, our approach is at worst a constant factor slower than the heap allocation approach, and for typical programs, it is significantly faster. An important additional benefit is that recovery from stack overflow is handled gracefully and efficiently.

Many Scheme implementations do not implement call/cc particularly well, but it is important to remember that call/cc is not inherently slow. This is the classic paper describing a clever run-time representation that supports call/cc efficiently.


### is this spaghetti stacks?

Is this a minor variation on Stallman's "phantom stacks" from 1980? See Phantom Stacks: If You Look Too Hard, They Aren't There.

Another (different but similar and probably complementary) approach of stack/heap hybrid can be found in Aubrey Jaffer's SCM Scheme implementation. To oversimplify his trick: environment allocation is stack allocated from a small stack and frequently copy collected to a general heap. That is, the collector is not exactly "generational copying" in the usual sense - environments are segregated into their own 0th-generation space because of their unusual but often highly regular allocation and short-lifetime usage pattern. (That description of SCM is oversimplified because SCM has an additional complication that, alas, it must nonetheless copy the C stack when capturing a continuation but that is more a historical artifact of the implementation than anything truly essential.)
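To make that oversimplified description a bit more concrete, here is a toy sketch in C of a small nursery dedicated to environment frames, copy-collected into the general heap when it fills. All of the names (`frame`, `push_frame`, `minor_collect`), the fixed-size value slots, and the single-root assumption are inventions for illustration; SCM's actual collector is considerably more involved.

```c
/* Toy sketch: a small nursery for environment frames, copy-collected
 * to the general heap when full.  Names and layout are invented;
 * this is not SCM's actual implementation. */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct frame {
    struct frame *parent;    /* enclosing environment            */
    int nvals;
    long vals[4];            /* fixed-size slots, for the sketch */
} frame;

enum { NURSERY_FRAMES = 8 }; /* deliberately tiny */
static frame nursery[NURSERY_FRAMES];
static int nursery_top;

static int in_nursery(frame *f) {
    return f >= nursery && f < nursery + NURSERY_FRAMES;
}

/* Evacuate the chain of live frames reachable from *root into the
 * malloc'd heap (newest first), then empty the nursery.  Heap
 * frames are never freed in this sketch. */
static void minor_collect(frame **root) {
    frame *f = *root, **slot = root;
    while (f && in_nursery(f)) {
        frame *copy = malloc(sizeof *copy);
        memcpy(copy, f, sizeof *copy);
        *slot = copy;            /* redirect the reference    */
        slot = &copy->parent;
        f = copy->parent;        /* stop at the first heap frame */
    }
    nursery_top = 0;
}

/* Bump-allocate a frame from the nursery, collecting first if full. */
frame *push_frame(frame **env, int nvals) {
    if (nursery_top == NURSERY_FRAMES)
        minor_collect(env);
    frame *f = &nursery[nursery_top++];
    f->parent = *env;
    f->nvals = nvals;
    *env = f;
    return f;
}
```

The point of the trick is visible in `minor_collect`: in the expected case there is only a handful of easily found references into the nursery, so evacuation is a short pointer walk rather than a full trace.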

-t

### Differences between Stallman and Hieb et al.

I wasn't familiar with Stallman's paper before today. There do seem to be a few similarities, and quite a few substantial differences.

One of the big concerns of Stallman's paper is dealing with the funarg problem, but funargs are an orthogonal concern to Representing Control. Stallman also talks about allocating cons cells and potentially arbitrary values on the stack.

By contrast, Chez closures are heap allocated, along with any mutable variables and cons cells. The only things on the stack are local variables that aren't explicitly mutated, whether they be immediate values or pointers to the heap.

Stallman also seems to suggest that the garbage collector will reorder the stack... however Chez's collector doesn't modify stack segments, other than to find and update pointers to objects on the heap.

Basically, in Representing Control, stack segments are heap allocated, as is the linked list structure that manages the segments. So there is some similarity in that the notion of "stack" versus "heap" is somewhat blurred, but that's where the similarities seem to end.

### Cheney on the MTA is essentially the same trick...

...except that the "small stack" is in fact the C stack. Each Scheme procedure allocates everything on the C stack, and calls its continuation without ever returning; when the C stack overflows, live data is evicted to the currently-live semispace of the heap, and the C stack is cleared via longjmp().

As implemented in Chicken Scheme, overflows are taken when the stack is 64K in size, much smaller than the kernel stack limit, so there is plenty of room to call C procedures. Tail recursion and call/cc are equally efficient, and the only problem arises when C and Scheme invoke each other too many times.

### 64KB is "plenty of room"?

64KB is "plenty of room"? Default thread stack size in Java is 256KB and in .NET on Windows it's 1MB. I wouldn't have expected the disconnect between popular OO VMs and Scheme to be this large.

### 64KB is not so bad - apples/oranges

You're comparing apples and oranges, I think. Aren't the 256KB / 1MB limits in Java and .NET hard limits on the depth of recursion? Not so with Cheney: there, the 64KB limit is just a "cons threshold" that triggers a 0th generation collection; the depth of recursion is limited by the heap, not by the 64KB.

There are a few design decisions at play among the set of ideas before us (Cheney, Phantom Stacks, SCM, Hieb/Dybvig/Bruggeman). All have in common that you have some stack allocation that is more or less a 0th generation, pretty limited in size. Variants are "what gets allocated from that stack" (e.g., Cheney: essentially everything; SCM: environment frames only), how compaction occurs, how "other generations" are collected, and so forth.

Subjectively speaking, I kind of like (intuitively, not in any absolute measure of computational complexity) the SCM hack of a 0th generation short-stack for parts of the environment in Scheme. It's darned subtle to implement but ultimately a small amount of code. It takes clever advantage of the "expected case" liveness patterns of environment frames. It takes clever advantage of the small number of easily found references to (the expected case of) uncaptured environment frames. SCM is by no means a fast Scheme by modern standards for lots of reasons external to this discussion. On the other hand, the introduction of this collection technique to SCM was a (measured) large improvement to performance.

-t

### We need more rigor.

Representing Control isn't about garbage collection or funargs; there is no garbage on the stack in Chez. I think this discussion has so far played much too fast and loose with the differences in the approaches.

In the original paper, when a continuation is captured, a stack segment is split into two new segments: this involves mutation, not copying. One of the new segments is the empty portion of the old segment, while the other is the portion of the segment that had useful data in it.

Handling stack underflow is critical to Chez's continuations. The return address at the bottom of the new segment is set to a stack underflow subroutine, and the original return address is stored in the linked list of stack segments. A normal underflow handler will simply move down the linked list of segments, adjusting the frame pointer accordingly. However, on resumption of a continuation, the handler will copy (part of) the next stack segment.
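To illustrate the capture side of this, here is a hedged C sketch of splitting a stack segment in two when a continuation is captured. The struct layout and names (`segment`, `capture`) are invented for illustration, and `NULL` stands in for the underflow handler's address; the real Chez representation differs in detail.

```c
/* Toy sketch of continuation capture by segment splitting: two
 * descriptors come to share one block of frame memory, and no
 * frame bytes are copied. */
#include <assert.h>
#include <stdlib.h>

typedef struct segment {
    char *base;              /* bottom of this segment's memory       */
    size_t used;             /* bytes of live frames in the segment   */
    struct segment *next;    /* older frames live in the next segment */
    void *saved_ret;         /* return address displaced by the
                                underflow trampoline                  */
} segment;

/* Split the current segment at `split` bytes: frames below the split
 * become the (now shared) continuation segment, while the space above
 * it remains the mutable current stack.  Only a descriptor is
 * allocated. */
segment *capture(segment *cur, size_t split) {
    assert(split <= cur->used);
    segment *k = malloc(sizeof *k);  /* continuation's descriptor */
    k->base = cur->base;
    k->used = split;
    k->next = cur->next;
    k->saved_ret = cur->saved_ret;
    /* The running stack now starts above the split and "returns"
     * into k through the underflow handler (NULL stands in for the
     * handler's address in this sketch). */
    cur->base += split;
    cur->used -= split;
    cur->next = k;
    cur->saved_ret = NULL;
    return k;
}
```

In the common case the split point is the top of the live frames, so the current segment becomes empty and all existing frames belong to the captured continuation, which is exactly why capture is constant-time.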

I fail to see how underflow handling is critical to Cheney on the MTA, not to mention that stack segments are not "small stacks" or "generation 0 heaps". I'll admit I'm not as familiar with Cheney on the MTA as I am with Representing Control, but it seems to me that the differences are not slight.

### re rigour

My only point is that all of the techniques brought up here segment the stack in roughly the same way. They'll all have some broad performance characteristics in common and some minor ones that set them apart. They differ very much in terms of which one is best suited to a particular design context, but they are all minor variations on one another. They are not interchangeable in a given context, but if we are allowed to vary the context, they are interchangeable. If the rest of your implementation suggests any of these variants, don't worry about picking which among them is best, because they are all basically the same by the most interesting measures.

-t

### Plenty of room...

"Plenty of room" refers to the space left on the C stack, after you remove the 64k used as the small stack.

### re: "plenty of room"

So, are you saying that instead of 64KB it should be 16KB or something? The 64KB isn't exactly being "removed", and it isn't hard to imagine a Java or .NET hybrid with Scheme in which the default 64KB threshold is diminished (at the cost of run-time speed) in the event of stack overflow exceptions: 64KB normally but, on overflow, whatever is available below 64KB.

-t

### survey by Clinger, Hartheimer and Ost

Clinger, Hartheimer, and Ost have a pretty thorough survey of implementation strategies, including the Hieb-Dybvig-Bruggeman approach described above.

William D. Clinger, Anne H. Hartheimer, and Eric M. Ost. Implementation Strategies for First-Class Continuations. Higher-Order and Symbolic Computation, 12(1), 1999, pages 7-45.

Larceny still uses the Incremental Stack/Heap strategy described there.

I cannot argue with the critique provided in the Hieb-Dybvig-Bruggeman paper; I'll just say that it is a (relatively) simple implementation technique, and it allows for some cute hacks like making the stack cache part of the nursery.