Using a single stack in the presence of continuations


The use of continuations in a program generally goes together with the use of multiple stacks. That is because several function stack frames can be active at the same time.

This results in obvious memory management issues, such as deciding the location and the size of these stacks. This document explores the idea of a constant size function stack frame, and the conditions in which it allows for a single stack.

I was not sure how to present my ideas in the forum, so I put them in a PDF document.

Does it make sense to think in terms of constant size function stack frame? Is it useful to have a single stack instead of multiple stacks?

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

as usual, it depends

Denis Bredelet-jido: Does it make sense to think in terms of constant size function stack frame? Is it useful to have a single stack instead of multiple stacks?

It depends on what you want to optimize. Performance is what everyone usually aims to optimize. But even that requires assumptions: about what happens often, what operations are useful in an app, and hence what a developer uses heavily.

When async happens rarely, and continuations follow a typical C sync call tree, then monolitic stacks are best for performance: giant, contiguous stacks like those under pthreads.

But if async control flow occurs very often, then monolithic C stacks are a pain, so folks often trampoline async calls to make "stackless" control flow by making the stack effectively heap-based (in callback data structures basically the same as stack frames).

That sort of fine-grained heap-based stack is often called a "spaghetti stack" in literature (check wikipedia for example) but is better called a tree-stack because the shape of continuations is a well-formed tree and not a messy spaghetti-like graph.

If I say tree-stack instead of spaghetti stack, assume those terms are identical.

If stack frames are garbage collected, cost can grow a lot, but might be less expensive than what you do in a monolithic stack when code is async and fine-grained. If you refcount frames -- however else you garbage collect everything else -- you get better immediate resource reclamation, maybe with more complex code but with an option to pool frames of similar size.

If your code is going to scream through a synchronous local calculation, one big C stack touches less memory with less bookkeeping per call frame. In which case heap-based frames sound crazy and unnecessary.

But if your code is an inherently unpredictable lattice of async dependencies (say because everything depends on inputs from both network and async secondary storage i/o), so any one function call might wait unpredictably for an async result, then heap-based frames start to sound good.

For space re-use optimization purposes, it's useful to aim for standard stack frame sizes. You can pool frames of the same size as interchangeable, with less fragmentation when frames can't be all possible sizes. (I'm not going to discriminate internal and external fragmentation here, but you can pretend I did and write those paragraphs for me.)

A standard frame size is like a constant size stack frame. Is that what you mean? One size fits all means a lot of unused space in some cases. Maybe you need more than one constant size, pooled separately.

Multiple stacks are a good idea when you have async continuations. But with tree-stacks, many call tree leaves share the same internal parts of the tree, so it's hard to say how many stacks you have. It's only easy to count the number of leaves still alive; measuring degree of overlap is tedious.

Languages favoring use of full continuations tend to like spaghetti stacks formed in trees of shared sub-trees, while languages like C want just one stack per thread, without any sharing.

When I can always call out to a C runtime, I can easily get the monolithic stack style. So what I really want is support for continuations in tree-stacks in a language, in case that greatly simplifies what I'm doing.

Just quick, an alternative

Just quick, an alternative to pooling stack frames of various sizes (a good idea) is to have the RTS manage a chain (typically 2) of standard size frames - in those cases where needed. This can have some (tradeoffable) advantages such as a singe frame "next->" pointer, nicer cache behavior when effectively pushing/popping std size frames, etc.