RAII and Async Stackless Fibers

I like RAII for exception-safe programming, and as a general model for resource management. In another thread it was mentioned that this does not work well with an asynchronous stackless fibre model (where stackless might mean cactus stacks). As I also like the fibre model for managing concurrency, because you don't need threads waiting on high-latency services, I wanted to continue that discussion, focusing on: is RAII really a problem with this kind of fibre? What is the core problem? How could a new language combine these features in a way that avoids the problem (if the problem is a significant one)?

I don't really see what the problem is, as in simple terms RAII is attaching a resource (be it memory, a file handle, or any other) to a handle that is placed on the stack. The resource is acquired when the handle is created, and freed when the handle is destroyed. The handle is uncopyable, so it can only be transferred using move semantics. Naively this would seem to work fine with cactus stacks.
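
A minimal sketch of the handle described above: acquire in the constructor, release in the destructor, copying disabled so ownership can only move. FileHandle is an illustrative name, not from the discussion.

```cpp
#include <cstdio>
#include <utility>

// Move-only RAII handle: the resource (a FILE*) is acquired on
// construction and released on destruction; copy is deleted, so
// ownership can only be transferred by move.
class FileHandle {
    std::FILE* f_ = nullptr;
public:
    explicit FileHandle(const char* path) : f_(std::fopen(path, "w")) {}
    ~FileHandle() { if (f_) std::fclose(f_); }

    FileHandle(const FileHandle&) = delete;             // uncopyable
    FileHandle& operator=(const FileHandle&) = delete;

    FileHandle(FileHandle&& other) noexcept             // move-only ownership
        : f_(std::exchange(other.f_, nullptr)) {}
    FileHandle& operator=(FileHandle&& other) noexcept {
        if (this != &other) {
            if (f_) std::fclose(f_);
            f_ = std::exchange(other.f_, nullptr);
        }
        return *this;
    }

    bool open() const { return f_ != nullptr; }
};
```

After a move the source handle holds nothing, so exactly one destructor ever closes the file.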

Not a problem with fibers but with Re-invocable continuations

which would be another use of stackless setups.

Re-invocable Continuations

I think cactus stacks solve this problem, as the stack fragments only get freed (releasing their RAII resources) when all the stack fragments with a base reference to that fragment are freed. This would tend to hold onto memory (although less than multi-threading, as the parent stacks are shared). I guess the point is that the continuations themselves cannot be managed by RAII techniques, and you would have to GC the continuations, unless you restrict continuations to lexical scopes?

Are lexically-scoped re-usable continuations a useful compromise?

The problem is semantics not memory management

Using RAII, executing the destructor means that you no longer hold a resource, but if you save a continuation within the scope of the RAII holder then it's possible to continue while still holding it.

You could change RAII to run in finalizers rather than at scope exit, and that would sort of fix the problem, but releasing resources only at garbage collection gives very different behavior than releasing at scope exit. It rather makes the pattern less useful.

An alternative is something like (or exactly?) dynamic-wind, where resuming the continuation would re-acquire the resources, but then that gives you the problem that on some designs running a continuation could fail to get an assumed resource and deadlock or fail. Maybe that's a better answer, but complex.
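
For comparison, here is roughly what a dynamic-wind-style region looks like when forced into C++ (WindGuard and both thunk names are invented): the before/after thunks are where re-acquisition and release would hang, but plain C++ scope only ever gives you one entry/exit pair, which is exactly the limitation under discussion.

```cpp
#include <functional>

// Sketch of a dynamic-wind-like region: `before` runs on every entry
// to the scope, `after` on every exit. In a language with re-invocable
// continuations, resuming into the region would call `before` again;
// in plain C++ the pair fires exactly once.
class WindGuard {
    std::function<void()> before_, after_;
public:
    WindGuard(std::function<void()> before, std::function<void()> after)
        : before_(std::move(before)), after_(std::move(after)) {
        before_();    // acquire (or re-acquire) the resource on entry
    }
    ~WindGuard() {
        after_();     // release the resource on exit
    }
    WindGuard(const WindGuard&) = delete;
    WindGuard& operator=(const WindGuard&) = delete;
};
```

The failure mode mentioned above is visible here: a resumed continuation's `before_` could fail to re-acquire the resource, and nothing in the type forces a plan for that.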

Cactus Stack

I don't see the problem you are describing. Consider we create a resource on the stack with an RAII handle:

[Resource1] <- current_stack


Now we create a continuation that will get its own stack, whose base pointer points to the shared base stack:

[Resource1] <- continuation1_stack
<- current_stack


Now it is fine to continue, and when the current stack returns, the shared resource will not get freed, because the stack frame containing it only gets destroyed (and its destructors called) when all the stacks derived from it are gone.

So the cactus-stack changes the semantics of RAII, because it changes the data-structure used for the stack. But this is expected, and behaves quite nicely.
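A refcounted sketch of that rule, assuming shared_ptr stands in for the base references between stack fragments (Segment and its fields are illustrative):

```cpp
#include <memory>
#include <string>
#include <vector>

// Each stack segment keeps a shared reference to its parent segment,
// so a frame's destructors (its RAII releases) run only when the last
// segment derived from it is gone.
struct Segment {
    std::shared_ptr<Segment> parent;   // base pointer into the shared stack
    std::string resource;              // stands in for an RAII handle
    std::vector<std::string>* log;     // records destruction order

    Segment(std::shared_ptr<Segment> p, std::string r,
            std::vector<std::string>* lg)
        : parent(std::move(p)), resource(std::move(r)), log(lg) {}

    ~Segment() { log->push_back("freed " + resource); }
};
```

Dropping the current stack's reference to Resource1 does not free it while a continuation's segment still points at it; only releasing the last derived segment runs both destructors, child first.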

resource and scope management decouple some under fibers

I like Josh's phrasing (about semantics being the problem) because a dev following a rote C++ formula for success can get a surprise if an escaping continuation keeps an activation frame alive, since the resource-holding scope no longer matches the strict nesting expected. Safe hold-and-release of resources still works, but the shape can be surprising, so a dev must suddenly think again. Locks seem like a big problem: if a handle pins one in mutex-held state, the scope can become unbounded. It can cause a dev to demand that re-invocable continuations be outlawed.

But having facile control over use of continuations is part of the design. It would be a crime to hamstring them just because a dev wants to keep following habits without thinking, leaving you with something exactly like C++ but slower because the stack is not monolithic. In essence, a dev is obligated to perceive and manage lifetime scope. If a continuation might be captured -- and it would be hard to tell for sure that this did not happen, given information-hiding interfaces -- a dev must have a more detailed plan than "nesting takes care of it".

Instead of a resource being held by a part of a call stack with limited scope in one monolithic thread, it's better to view a resource as held by a fiber, where an escaping continuation might be part of a forked fiber also holding that resource. So lifetime can be associated with fiber scope instead of call-tree subset scope. (Predictability is still good, instead of a continuation free-for-all, so you want something a dev can treat as a region for reasoning purposes, so intuition doesn't give garbage answers.) Maybe a dev wants to make fiber spawning illegal during scope holding a certain resource; failure of pre-condition is a fine reason to kill a fiber by assertion. But a dev must answer such questions.

Now I'll try to explain what I meant by the throw-away one-sentence parenthetical comment. For clarity, let's use pin to abstractly mean a construct holding a resource, as both noun and verb, so a pin object pins a resource until it is destroyed or until manual release occurs. (Normally we say some concrete subtype like handle pins a resource, so pin as noun occurs less.) This idea still works fine under async stackless fibers, but lifetime scope can become weird, absent a plan on the dev's part. There's another degree of freedom requiring a dev to plan.

(Anyway, RAII via destructors in C++ doesn't fit well in an async stackless fiber model either, so refusal to be predictable about what happens is just aggravation.)

You would need your own C++ compiler, to rewrite into a fiber version, because throwing a C++ exception would not do what you expect. You'd have to re-map exception use to invoke a continuation mechanism, so control flow stays inside a fiber (or kills the fiber if no handler is in scope when an exception occurs). You might need to replace the entire STL library system with a new fiber-friendly version, or otherwise you would not be able to rewrite code using templates to safely honor fibers. Then you would have a whole new fiber-C++ ecosystem, and an obligation to explain what everything means to a dev, when the original C++ semantics are hard enough to explain already. I expect you would get it, but a lot of devs would be mystified by lurid interactions of fiber continuations and templates. There's a serious too-many-moving-parts problem.

Using a stock C++ infrastructure would cause fights over how runtime control-flow works. But writing your own from scratch is a huge undertaking with a risk few devs will grasp basic nuances of correct usage.

Edit: I decided it was unfaithful to report an idea of killing fibers, when I actually mean killing a lightweight process of which a fiber is a member (ending all member fibers within, assumed co-dependent). A fiber is like one rail in a collection of train rail-lines, with a continuation acting as a rail car executing there. But a train metaphor doesn't seem worth pursuing. Instead I say the collection of all fiber cars in one process is a lane, which runs only one car at a time, so fibers in one lane never run physically in parallel, which makes green monitors and conditions work (and between other lanes in the same host thread too). So a lane has identity, but not individual fibers beyond unparking them in response to async replies. Signalling or messaging a lane unparks a fiber awaiting events of that sort, so a lane can respond concurrently with its executing fibers, which then signal conditions or spawn new fibers as needed. A lane is a multi-fibered actor.

C++ fibres.

Thanks for this, it's useful. You could pin a resource handle to a fibre, if each fibre had a vector of open resources. You would free a resource by simply removing it from the vector (by vector I mean an STL auto-sizing array). I am not sure I like this, as it risks accumulating resources that never get freed if the fibre is long lived.
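
A minimal sketch of that vector-of-open-resources idea (Fibre, Resource, acquire and release are all illustrative names): freeing is just erasing the owning entry, and anything never released lingers for the fibre's whole lifetime.

```cpp
#include <algorithm>
#include <memory>
#include <vector>

struct Resource { int id; };   // stands in for a socket, file, etc.

// Resources pinned to a fibre via a vector of owners; erasing an
// entry destroys (frees) the resource. Anything left in the vector
// when the fibre dies is freed by the vector's destructor -- which is
// also how a long-lived fibre can accumulate forgotten entries.
struct Fibre {
    std::vector<std::unique_ptr<Resource>> open;

    Resource* acquire(int id) {
        open.push_back(std::make_unique<Resource>(Resource{id}));
        return open.back().get();
    }
    void release(Resource* r) {
        open.erase(std::remove_if(open.begin(), open.end(),
                       [r](const std::unique_ptr<Resource>& p) {
                           return p.get() == r;
                       }),
                   open.end());
    }
};
```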

I think I prefer the cactus-stack model above, with the additional observation that we control stack forking carefully. We model this as a thread-fork in the language, so that it is clear resources are shared between threads, and we don't allow explicit continuations. The idea is that the most common fibre operation, parking a fibre whilst I/O takes place, does not fork the stack or cause a problem with resources. Perhaps all fork points have to be function calls?

I think the above could be dealt with in C++, so that the programmer's model of resource holding makes sense, and is reasonably straightforward.

Exceptions seem trickier, as you would have to catch exceptions at every stack-forking point, and fix up the stack so that resources get released only if the exception leaves no fibres remaining that require them to be held, before re-throwing the exception.

It is certainly starting to get messy. However for my purposes, you could imagine a C++-like language that used RAII to manage resources and fibres for concurrency, and handled the stack and exceptions properly as part of the language. So the problem is not that async fibres and RAII don't work together, but that C++ does not really support them together.

Perhaps we could imagine a C++ to C++ compiler that lets you program in fibres as if they were native threads?

careful cleanup preserves a lot of value in cooperation

Below I will say fibre to mean "collection of fibers in a lane", so I can use British spelling to include a latent idea of plurality in fiber groups (cf. fiber-vs-fibre). That way our terms match more closely.

When I kill a fibre, I want to free all resources held, but not run fibre continuations again, now suspect as dangerous. So I want all fibre pins visible to inspection, so I can release them without fibre cooperation, since it will never run again. I have three broad resource categories -- memory, locks, and descriptors -- each handled a different way. All locks held by a fibre are chained in a list, so we just unlock them all (which likely ruins any critical section if the dead fibre was inside when killed). All waits for mutex on locks go in another list. If I refcount for gc, I can hand all fibre continuation cactus-stacks to a utility task, to tear down asynchronously, running destructors on zero refs (so they had better not depend on dead fibre state).

Descriptors amount to resources owned outside a fibre. I want a separate map of all of them held by a fibre, so it's not necessary to trawl through fibre activation frames. Even better, I can ensure a descriptor can only be released once, since it will no longer be inside a descriptor map after first release. I'm inclined to use in-memory b-trees as descriptor maps, to minimize cache lines touched. Whether maps are per-fibre or per host thread is a design choice. I like centralization as a way to review everything held by all fibres. Descriptors come with associated generation numbers, which must match at release time or else the release is ignored.
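
A sketch of that release discipline (DescriptorMap is an invented name, and std::map stands in for the b-tree suggested above): release succeeds at most once, and only if the generation matches.

```cpp
#include <cstdint>
#include <map>

// Per-fibre descriptor map with generation numbers: a stale or
// repeated release is ignored, because the entry is gone after the
// first successful release, and a mismatched generation never matches.
struct DescriptorMap {
    std::map<int, uint32_t> gen;   // descriptor id -> current generation

    bool release(int id, uint32_t g) {
        auto it = gen.find(id);
        if (it == gen.end() || it->second != g)
            return false;          // unknown or stale descriptor: ignore
        gen.erase(it);             // cannot be released a second time
        return true;
    }
};
```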

Something awkward happens if you want one fibre to acquire a descriptor-based resource that becomes owned and released by another, after a hand-off between producer and consumer. Either you need a "held by anyone" flavor that is not fibre-specific, which sounds unsafe, or you need to have consumers dup a descriptor to get their own fibre-scoped pin, which sounds better. Releasing a descriptor amounts to an async one-way message, so this is another thing a background util task can do after a fibre is killed. I thought descriptor map handling would sound like the vector of open resources you noted.

A thread that schedules fibre continuation calls could catch all exceptions and kill the offending fibre, but this would dramatically increase execution cost. It would be like adding a try-catch block to every function call and function return, both of which involve returning to the trampoline. It would work, but probably ruin a lot of the performance value of lightweight threads. I'd be more inclined to catch exceptions only in a thread's main loop, and kill all fibres in the thread if an exception is caught. But this amounts to making hard C++ exceptions illegal, though soft fibre level exceptions would still work.

"That's like the scene in Braveheart," Ned observed, "where they pull the arrow out of the old guy."
"How do you figure?" Wil asked.
"Instead of pulling it out yourself," Ned explained, "you ask the next guy to do it, so he gets punched in the face."
"Okay, I get it," Wil granted. "Then later Mormont gives a valyrian sword named Longclaw to Jon Snow?"
"Don't mess with me, Wil," Ned warned.

Fibres and Exceptions

That all sounds good, a couple of further thoughts:

In C++ you can use move semantics to hand off a descriptor with zero copies (one of the cases where C++ semantics allow better performance than C).

I would not want to catch exceptions in the thread scheduling, as this would likely be hidden in the language runtime. I think the answer is that exceptions need to be handled by a continuation, so that a fibre consists of both its success and failure continuations. catch would push the old failure continuation onto the stack and continue with the new one, throw would jump to the failure continuation instead of the success continuation (which is where return would jump). The failure continuation would wind the stack back to a point saved when the continuation was created (freeing the resources).
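
The success/failure-continuation idea can be sketched like this (Fibre, Cont, catch_with and soft_throw are all invented names): catch pushes a failure continuation, throw pops it and jumps to it, and the scheduler never sees a hard C++ exception.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// "Soft" exceptions as continuations: each catch scope pushes a
// failure continuation; a throw pops the innermost one and invokes it
// (restoring the outer handler), looking to the scheduler like any
// other call. Stack unwinding / resource release on the way to the
// handler is omitted from this sketch.
using Cont = std::function<void(const std::string&)>;

struct Fibre {
    std::vector<Cont> failure;                  // stack of failure continuations

    void catch_with(Cont handler) {             // like entering a catch scope
        failure.push_back(std::move(handler));
    }
    void soft_throw(const std::string& what) {  // jump to failure continuation
        Cont h = std::move(failure.back());
        failure.pop_back();                     // old handler is current again
        h(what);
    }
};
```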

move, exceptions, and stack dimensions

Looks like we're winding down to small matters. (But there's a bike-shedding law about small matters generating more heat than big matters, so I'll break off somewhere.) I am familiar with rvalue references and move constructors and assignment operators added to C++11. I have simulated them in C++ and C for a dozen years, by doing "move" on the callee side, instead of in the caller after return (usually by passing in empty-constructed output params and doing "swap" to exchange resources cheaply). The problem of specializing code to target temp values can also be solved by not having those temp values.

In C++ you can use move semantics to hand off a descriptor with zero copies (one of the cases where C++ semantics allow better performance than C).

The problem is that devs may not realize annotated-descriptors need to have ownership moved, because they look like stateless values (composed of one or more integers, so any copy is as good as an original). The associated map entry for a fibre is basically a capability to permit freeing later. The required move operation applies to the capability, so the map entry changes from source producer to destination consumer. I sometimes have trouble getting devs to understand contracts with a spatial quality, about "where" info is located and its associated scope and lifetime. Maybe it's because the semantics are not local to the code seen now. It's easy to write a C api so capability move is part of the contract, but getting a dev to understand the contract is hard, because the description is too long.

About exceptions, I would rather a scheduler not know about them at all. Instead, if a fibre throws, the next continuation it runs is one from the error handler, wherever declared, and looks to the scheduler like any other function call. This is what I mean by a "soft" exception. I think of the C and C++ stack as being on one dimension, and cactus stacks on another, but my description of the spatial metaphor involved was disagreeable to a very bright coworker some time back.

(Suppose you draw a picture of a calltree on graph paper, with later things occurring more to the right, by convention. Say the graph drawn is in the XY plane, with stack depth represented on the Y axis, and relative time -- for a given Y coordinate -- on the X axis. That's how the native language (C or C++) stack works. Cactus stacks are on another axis, say the Z axis, so when the scheduler invokes a fibre continuation, the Y dimension stops changing while fibres run on the Z axis. For a given scheduler invocation, all Y stack backtraces are the same, while Z stack backtraces vary based on activation frames in cactus stacks. As far as fibres are concerned, the Y stack doesn't exist, and as far as the scheduler is concerned, current Z stack depth of a fibre cannot be observed without inspection. Non-interference of C-stack and heap-based fibre stack can be viewed as involving different dimensions.)

You omit Shared Ptr

RAII works well with various garbage collection strategies too. I don't find it fair to discuss RAII as if it only handles stack allocation strategies. It's orthogonal to that.
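
True: a sketch of RAII riding on shared ownership rather than a stack scope (Conn and open_conn are invented names). The custom deleter is the destructor side of RAII, but it fires when the last owner drops the pointer, wherever that owner lives.

```cpp
#include <memory>

// RAII decoupled from stack scope: the shared_ptr's custom deleter
// releases the resource when the last owner goes away, not when any
// particular lexical scope exits.
struct Conn { bool* closed; };   // stands in for a connection handle

std::shared_ptr<Conn> open_conn(bool* closed_flag) {
    return std::shared_ptr<Conn>(new Conn{closed_flag},
                                 [](Conn* c) {
                                     *c->closed = true;   // the RAII release
                                     delete c;
                                 });
}
```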