Let's kick continuations around for a while...

Recently Rys McCusker and I started talking about continuations, in another node where that was probably off topic. The link if you want to read it is http://lambda-the-ultimate.org/node/5082.

The ability to directly refer to continuations is supposed to be something like an "ultimate" control structure, or "goto with its context intact," which all other control structures are trivially implementable with.

But because it does a lot more than "goto" in order to preserve that context, there can be some serious problems if that extra stuff it does is irrelevant, wasteful, or harmful. And I have experienced that some kinds of continuations are just plain worse than others. But let me explain what I mean by kinds of continuations. Most languages that have them at all have only one kind, so the fact that there are different kinds is something that people who only program in one language tend never to notice.

There are a bunch of different things that can be called continuations. I think of them as being divided along four axes. First, there is delimited versus reified. Second, there is lexical-capturing versus dynamic-capturing (or both, or neither). Third there is winding/unwinding versus raw. And fourth there is mutable vs. immutable capture.

Delimited or "one-shot" continuations are delimited in dynamic extent; You can store them in values or pass them to subroutines or whatever, but having called a function, you can its continuation to return from that function exactly once, and that only if you haven't already returned from the function in some other way first. With delimited continuations you can reorder the sequence in which functions return, making it not equal to the usual standard of being the reverse of sequence in which they were called. You can even "delay" some of the function returns until after the program halts (ie, never) by just never using them, or by having your function just skip the returns of its callees by telling them to call its own continuation directly. These are mostly no problem in my experience.

Reified continuations can do the same kind of things that you can do with delimited continuations in terms of returning from the function or passing one to a called function or reordering function returns, but reified continuations have an additional property: having captured a reified continuation and stored it as a value in some variable that persists beyond the dynamic extent of the function (ie, after the function has already returned) you can call it and simulate that function returning, with the same environment (that may include the environments of other functions that have already returned) AGAIN, even though the function whose return you are simulating has not been called a second time. This sounds like a good idea, but experience has taught me that it is deeply problematic.

Lexical-capture vs. Dynamic-capture is whether your continuations capture the lexical environment (ie, the bindings within the lexical scope of the function whose continuation is taken) or the dynamic environment (ie, the bindings within the dynamic extent of the function whose continuation is taken). Usually, programming languages use scope, which is called lexical scope, or extent, which is called dynamic scope, but rarely both. In those two cases this is a trivial choice because your continuations will capture the same environment whose bindings are extended by your function calls (it could be done the other way but that would be insane). However, if your language has both lexically scoped and dynamically scoped variables, this distinction can become crucially important. Usually continuations that capture the lexical environment will allow the dynamic environment to be inherited from wherever the continuation is called, rather than from where the continuation is taken. You can make powerful arguments on both sides of the question of whether continuations that capture the lexical environment should or should not also capture the dynamic environment. Semantically it's nicer if they do, but in terms of implementation that doubles the branchiness of your environment tree and can become very expensive very fast if people use it a lot. In Common Lisp version 1, before they switched to lexical scope by default, there were (delimited) continuations that captured the dynamic environment but not the lexical environment. I guess there is actually a fourth subcategory here: "goto" is a sort of continuation that captures neither lexical nor dynamic environment.

Winding and Unwinding is another important distinction, and another controversy. If your continuations wind and unwind, it means that you can arrange for code to be called when a given scope (or extent) is entered or left, including entering or leaving that scope by means of a continuation. A classic example would be releasing a resource when you escape from a function that needs that resource, and then locking the resource again when re-entering that function. With raw continuations, it's a lot more like just a jump; you don't get to call set-up and shut-down code. In some ways winding and unwinding continuations are more orderly, but once again the devil is in the implementation; this renders continuations nearly useless for mechanisms like lightweight context switches between green threads.

And finally we get to the last category; is the captured environment mutable or immutable after capture? An implementation of continuations could copy the stack, and then treat the copied value as immutable. If the bindings visible in the stack were then mutated before the continuation were called, the stack from the continuation (including the former values) would be copied back into the working stack, undoing the mutations. This type of continuation is called a "restart" and is very rare because it is highly expensive in memory costs. In the usual case, the captured environment is mutable - ie, the implementation records only pointers to the stack frames where the variables are found, not the variables themselves. So if code after the continuation is captured mutates the values of those variables, the continuation when invoked will not overwrite the mutated values.

And I think that is an exhaustive taxonomy of all the things that have been called "continuations." Doubtless I've missed something.

Continuations are used to implement:
control structures - for example if you want to roll your own do-while loops, no environment capture is needed.
nonlocal flow of control - for example if you want to roll your own try/catch and exception handlers.
variable scoping - using the winding/unwinding variety, you can simulate dynamic variables in lexically scoped languages or vice versa.
green threads - if you want to roll your own cooperative multitasking.
session management - you can represent the session of visitor #1392 to your website (or whatever) as a continuation.
type system extensions - using winding/unwinding continuations can allow you to add typechecking code on entry and exit to functions, even for types the native language could not define. The drawback is that these are runtime rather than compile time typechecks.
and many other things.

Many of these things can't be done within the (other) tools a well-defined language can give you to work with. So, yay.

But many of these are BAD uses for continuations, in that continuations are a BAD way to implement these things. So, boo.

Many of these things cannot coexist in the same program because different uses of continuations would force stack captures that can never be recovered by garbage collection, or otherwise interfere with each other. This is especially true of reified continuations. So, boo some more.

Many of these different "continuations" have very fundamental semantic differences, in terms of composability and tractability for combinations.

And now that I've written half a book here, you guys can tell me what I missed or what it means. ;-)

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Confusing terminology

It's very confusing to use "delimited" the way you do. The normal use is to represent something which is only part of the full continuation, from the current stack frame down to some prespecified stack frame. Delimited continuations can be called and returned from, whereas undelimited continuations can only be called -- there is no way back. This property is orthogonal to being only callable once, which in turn is orthogonal to being only callable from the dynamic scope. Racket escape continuations are fully reified, as with call/cc, but it is an error to call them more than once.

Agreed

The usage of "delimited" in the original post is nonstandard.

Delimited continuations capture a fragment of the stack, essentially, up to some delimiter, and there are many variants of this idea, as you are probably aware. They are generally reusable, although one-shot delimited continuations are a very important special case. This design space has been discussed here many times. (eg., in reverse chronological order, 1, 2, 3, and no, I really can't believe it's been ten years...).

The usage of "reified" is also a little odd. I would describe any first-class continuation as reified, whether it's one-shot or not. If I can get hold of it and pass it around, it's reified. Non-reified continuations are the ones that already implicitly exist in the form of function returns, exceptions, branch targets, etc.

It's a helpful taxonomy of design issues around continuations, though.

Argh, you're right.

Sorry about that.

If I go back and edit the initial post, this whole side thread will be confusing, so I'm going to acknowledge here that I was thinking of one thing and typed another.

You've got it absolutely right that delimited continuations are delimited in terms of stack frames rather than in terms of usability.

The funarg problem

The terminology you seem to be after is the venerable downward funarg versus upward funarg.

(Historical review: If a locally defined function is passed down the stack from where it is defined, where the stack is understood to grow "downward", it's a downward funarg, while if it's passed up the stack, to outside the scope where it was defined, it's an upward funarg. After Lisp broke up into myriad dialects following LISP 1.5, dialects tended to support downward funargs but not upward funargs — because as long as you only have to handle downward funargs, not upward funargs, you can implement your language using a conventional ALGOL-style stack, allocate local procedures in a stack frame, and discard those local procedures with their stack frames when popping from the stack on function return.)

So if a continuation created by call/cc can only be used to terminate the call/cc prematurely, expiring once you exit the call/cc's dynamic scope, that's a downward funarg.

calling downward coordinate system convention

Regardless of which way the stack grows, you can adopt the vertical coordinate system of how call trees are diagrammed. Thus if called routines (callees) are shown beneath parent callers, then nesting is "downward" no matter which way the stack grows in absolute addresses.

       main()
      /      \
    foo()   exit()
    /   \
bar()   baz()
          \
          f()

If funarg is short for function argument, then a downward funarg is a function passed as argument down to a callee (as opposed to up to a caller) when nested routine calls are downward by convention.

Pretend you can define nested functions in C (which you can't, unless you extend it to resemble Scheme with lexical closures). Then it would be possible to define f() locally in main() and pass it to other functions called, like baz().

int baz(int (*fn)(int x), int y) {
    return (*fn)(y);
}
int g_gobal_y = -1;
int bar(void) {
    ++g_gobal_y;
}
int foo(int (*fn)(int x)) {
    bar();
    return baz(fn, g_gobal_y);
}
int main(int argc, const char* const* argv) {
    int base = 1;
    int f(int n) { return base + n; }
    
    if (foo(f)) {
        exit(1);
    }
    return 0;
}

Here f is a downward funarg when passed to baz() because the scope of definition in main() is still on the original stack, where it's still alive without taking pains to keep alive as a closure. An escaping function passed up to callers is an upward funarg.

"funarg" usage ahistoric and not quite right

I'm not absolutely certain but first, I think you are using "funarg problem" ahistorically, John. Historically I think we talk about funargs only because of the "upward funarg" problem. The "upward funarg" problem is in retrospect entirely about what we now recognize as environment representation. Any implementation that supports closures of independent, indefinite extent (such as heap allocated closures) solves the funarg problem. Continuations don't quite enter into this.

The two are not unrelated, of course:

I acknowledge that if an implementation has closures (and supports upward funargs); has tail call optimization; and has or can support conversion to continuation passing style then a translation for call/cc follows trivially. We could say:

CLOSURES + TCO + CPS yields CALL/CC

I also acknowledge that if an implementation has call/cc by any means then, from that, closures for upward funargs can be simulated using an "unspecified whole program transformation":

CALL/CC + UWPT yields CLOSURES

(This latter derivation isn't very interesting. It expresses only that CALL/CC must implicitly closures of a sort and that with a transform we can tease those out into closures of the familiar sort.)

One-shot continuations (call them CATCH/THROW in keeping with lisp tradition) don't seem to fit in so nicely.

CATCH/THROW + _______ yields??? DOWNWARDFUNARGS

I suppose you could fill in the blank with some form of closure conversion but then, having done that, you don't need CATCH/THROW any more. More: if you have closure conversion and CATCH/THROW then another UWPT easily gives you CALL/CC (and TCO).

Similarly, I don't see how to fill in:

DOWNWARDFUNARGS + ________ yields??? CATCH/THROW

In short, both historically and logically, I think one-shot continuations and funargs problems aren't very closely related.

Nope, funarg problem was about context.

The upward funarg problem was about a callee passing references to local variables back to the caller -- in a system that deleted local variables when the called function exited. That worked just about as well as you'd think it might, and forced implementers to be sure to pass *values* back from functions instead of *pointers*, which broke certain invariants that their clean runtimes relied on and produced asymmetries in their language designs.

much later I identified it as being another aspect of a problem that includes variable capture, macro hygiene and alpha-renaming, etc - namely, failure to preserve the link between expression (in this case a variable reference) and the environment/context in which it occurs.

citation for history of continuations

Reynolds, John C. (1993). "The discoveries of continuations". Lisp and Symbolic Computation 6 (3/4): 233–248.

PDF copy from cs.ru.nl

h/t Wikipedia

citation (first?) first class cont. in scheme

Sussman, Gerald Jay and Steele, Guy Lewis Jr. (1975). "MIT AI Lab: AI Memo 340"

I'm not sure but perhaps this is the first documentation of first-class continuations in Scheme (here referred to as "CATCH tags" and "delta expressions").

I notice that here, first class continuations are a kind of hack for implementing "CATCH". CATCH is described informally, only via a return-once example.

CATCH reifies the current continuation as a DELTA expression which *can* be used to return more than once, though the memo does not mention this.

The memo advises that if you don't get CATCH you probably shouldn't use it.

h/t [Reynolds 1993] (citation in a separate comment)

citation: "Advent of the Recursive Procedure"

This is not really canonical to a history of continuations but it does set some context for the early work that [Reynolds 1993] documents.

Daylight, Edgar G. (2010) "Dijkstra's Rallying Cry for Generalization: The Advent of the Recursive Procedure, late 1950s -- early 1960s"

Abstract

According to J.A.N. Lee in 1996, computer scientists “are reaching the stage of development where each new generation of participants is unaware both of their overall technological ancestry and the history of the development of their speciality, and have no past to build upon” [1, p.54]. A technically and historically accurate account, as attempted here, can help us, computer scientists, grasp some of the fundamental ideas underlying our discipline. This paper describes some early contributions of E.W. Dijkstra by elaborating on his involvement in putting forward and implementing the recursive procedure as an ALGOL60 language construct. Particular attention is paid to Dijkstra’s generalizing style of solving problems.

My comment:

In the "early days" it was not obvious or a done deal that mathematical-style functions should play a significant role in programming languages.

It is not clear from the record that trying to making functions a programming language concept ever had a strong, clear, coherent motivation. In retrospect it appears an ad hoc move, perhaps a mostly stylistic attempt to "dignify" early computer science as a serious field. Functions presented some implementation puzzles which took a while to sort out. Performance concerns ran deep (and we can see performance worries still worth mentioning in Steele's 1975 Rabbit thesis).

Per Reynold's, continuations first appear in the field of programming language semantics. There, they are a way to reconcile, for analytic purposes, the decidedly non-functional properties of programming language "functions" with the mathematical concept of a function.

Later, continuation passing style emerges an implementation technique -- mainly meant to reduce a language that seems to need something like a stack to a language that does not.

continuations: ad hoc, incoherent, mystified

My view of history is that there was never any coherent rationale for adding first-class continuations, at least to imperative languages. They arise in theory as a way to analyze the behavior of programs in imperative languages with "functions". They arise in practice as an internal implementation technique to simplify the interpretation or compilation of such languages. (They arise in a related way in functional programming languaages.) Continuations as first-class values appear to arise more or less by accident as a quick and dirty method for implementing CATCH in early experimental versions of Scheme. Later, Scheme hackers noticed you could do weird things using that hack (and picked up the tradition of trying to dignify ad hoc developments in sombre-looking math).

To an implementer, a continuation is nothing more or less than "the registers you have to save when making a 'function' call".

To someone doing semantics, a continuation is nothing more or less than a way to describe the behavior of a program, from some point in its execution, as a function of a value representing the state of a mutable store.

As a result of this history I can't see any reason to expect that first class continuations are a "good" programming language construct.

By extension I think we should also question the primacy of "functions" in imperative languages. The use of continuations in semantics underscores why it is necessary to put scare quotes around the word "functions" in an imperative context. The dream of languages which are mathematically dignified (in that way) yet imperative was, it seems, nothing more ever than a dream.

whither primacy then?

If not functions, what?

"if not functions"

How about state machines? They can compose in many interesting ways. And we already know of some higher-level abstractions for expressing them; I'm thinking of lexical syntaxes and other grammars for example.

machines

A pretty good model for machines[1].

I feel that machines are an excellent foundation for modeling application behavior. But I still like functions. I'd model machines with functions. :)

re: machines

But I still like functions. I'd model machines with functions. :)

I hear a fellow in Iowa once wanted go to the corner store for some milk. He felt most comfortable navigating his way around central Paris so to get to the corner store he first took a train to New York City and then got on a boat.... :-P

What does model mean?

Functions are logically simpler than any process model as they embody simple parameterization. So what we should do is logically model processes as functions, and then physically model (compile) functions as processes.

"logically simpler"?

Functions are logically simpler

I don't know what that is supposed to mean or why it would be important.

Semantics

Formal semantics and reasoning are easier with functions.

re Semantics

"Formal semantics and reasoning are easier with functions."

In the case of imperative languages I'm not sure that's true at all. For example, I think Shutt's example of the incomplete definition of call/cc in Scheme is symptomatic of the way in which "functions" in an imperative context aren't really functions. The very concept of continuations in denotational semantics, as used to describe imperative languages, seems terribly ad hoc and weird.

Functional or imperative: I don't think there is any strong empirical evidence that functions are in any useful way "easier" than the mostly untried alternatives.

Meanwhile, children and lay people using programmable calculators seem to take to state machines like fish to water but often stumble badly at recursive functions. I think state machines might be easier to reason about.

Formal vs. informal

When I (and I believe David) refer to functions, I mean honest functions, not some imperative "function". IMO they are plainly simpler semantically than any concept of process, which will probably include an appeal to functions in its definition. When you talk about people preferring state machines to recursion, I don't necessarily disagree, but that's more about informal reasoning than formal reasoning.

performance

We still don't have the 'sufficiently smart compiler' that can remove all the abstraction overheads (at least not for general purpose languages).

But your analogy suggests a performance difference of about three orders of magnitude in time (e.g. expanding a 20 minute round trip into a two week cruise and vacation). I would guess a more realistic difference between hand-optimized code and what we get from a compiler with moderately performance-conscious code is less than one order of magnitude. (Efficient interpreters tend to have one order-of-magnitude difference [1], and a compiler should do much better.)

OTOH, I suspect that one reason we're having performance difficulty is that we're still trying to interface with low level code written in other languages or ad-hoc low-level models, essentially opaque to analysis. The recent movements towards unikernels and compiling applications into full virtual machine images as the executable format might become a big step for extracting performance from high level languages.

re performance

It's not just a question of performance. For example, the Iowan who goes to the local store via Paris not only takes longer, but he depends on a societal capacity to forge steel and lay railroad tracks (so he relies on a very complicated "sufficiently smart society").

Uncontrolled, complicated towers of dependencies harm robustness and the ability to adapt.

re towers of dependencies

I think you have your point misaligned here. Functions (in the pure, mathematical sense) are certainly a simpler foundation, with fewer external ('societal') dependencies, than whatever it is we're doing today. The modern OS, graphics, and network stacks are complicated, uncontrolled towers of external dependencies and protocols that harm robustness and adaptability.

The modern approach is favored primarily for performance: our computers and compilers don't efficiently implement the really simple ideas (such as arbitrary constraint solving, or modeling physical systems at the atomic level, or purely functional programming). So we favor the more complicated ideas. If the simpler ideas performed just as well, we'd be using them almost as fast as we discovered them. There'd be a stronger community bias to find the simplest concepts, rather than a strong counter-pressure to reject simple ideas whose performance on readily available hardware is not highly competitive.

It really, ultimately, is just a question of performance.

Besides that, the tower of dependencies isn't even a strong distinguishing point in your metaphor. Your Iowan who goes to the corner store by the direct route depends nearly as much on society, e.g. the ability to deliver the milk to the store and the electricity for lighting and cooling and the monetary system. And if he decides to buy something besides milk, it may well have 'Made in China' stamped into plastic, with all the attendant overhead of shipping overseas.

just so stories

The modern OS, graphics, and network stacks are complicated, uncontrolled towers of external dependencies and protocols that harm robustness and adaptability.

The modern approach is favored primarily for performance: our computers and compilers don't efficiently implement the really simple ideas (such as arbitrary constraint solving, or modeling physical systems at the atomic level, or purely functional programming). So we favor the more complicated ideas.

First: why should I believe that "the modern approach is favored primarily for performance"?

For example, an alternative possibility is that the stacks that are very popular these days mostly reflect a long history of accumulating local decisions about labor costs. The cheapest-to-rapidly-develop solution very often wins at each step. Where possible it is very often cheaper to "take" earlier made solutions than to "make" a more performant or more robust one. Along the way many plausible, unrealized ideas are left by the wayside for want of the labor resources to work on them.

Now I think you or someone might sincerely want to say something like: Yes, but if we finally did have the right hardware and the right compilers for very high level languages these would prevail by lowering labor costs.

To me that is at best a hypothesis. Worse: it is a hypothesis that has been around for decades, with many attempts to prove it by demonstration, and for the most part those attempts have failed.

Perhaps the foundations of software engineering including programming language theory should not be mathematical abstractions that we hypothesize might one day finally be revolutionary. Perhaps the foundations ought to be the real conditions under which software is produced by real people.

In that case the kind of self-contradictory mess we've made of "functions" represents a failure of research that results from it having spent a few decades distracted by a quixotic quest for the ultimate mathematical foundations.

The simpler ideas DO have

The simpler ideas DO have lower labor costs. Consider: Brute force search and constraint solver is easy to write. Brute force atomic simulation is easy to write. Either can solve many problems with little effort. They'd just perform unacceptably.

Whether or not a compiler will ever exist to make them fast is a separate issue. And it is true that we got where we are by lots of local decisions. But there is no tradeoff between simplicity and labor costs.

idealism is a lousy foundation

Although you haven't expressed it in an academically formal way I think this statement is a fine example of the dubious idealism that has characterize PLT for decades:

The simpler ideas DO have lower labor costs. Brute force search and constraint solver is easy to write. Brute force atomic simulation is easy to write. Either can solve many problems with little effort. They'd just perform unacceptably.

In actual reality with real programmers working in the real world -- where labor costs are a real and regularly measured thing -- your broad assertion about the "simpler ideas" has never been observed. Decades of trying with at best very marginal progress. (I acknowledge that there have been niches where some of the techniques you list have been useful but that is not evidence of your very broad claim.)

So the "simpler ideas DO have lower labor costs" in some abstract, idealized sense unconstrained by the obvious, empirical, direct facts.

Elsewhere, PLT practitioners and fans express bewilderment at their seemingly minor impact on real-world practice.

And first class continuations in an imperative context are (in theory) both a floor wax and a dessert toping.

Reality

If you don't care about performance, solving Chess and every NP-hard problem becomes simple and trivial, and takes very little effort. If we historically were truly optimizing for 'labor costs' and ignoring 'performance', a wide variety of business problems could be solved trivially. But poor performing solutions don't succeed because humans don't have the patience to wait a few hundred years for an answer; heck, Amazon and E-bay have observed (empirically) that every 100ms latency causes a 1% loss in sales. 1% of us won't even have the patience to wait 100 more milliseconds per page load.

Real programmers, working in the real world, where labor costs are a real measured but poorly estimated thing, are systematically accepting labor and maintenance costs in order to improve performance (which is another real measured thing). This is our empirical reality. Performance, both algorithmic complexity and the coefficient, is frequently a major concern to industry programmers, and even to academic programmers.

The only reason constraint solvers are even starting to see industrial interest is that they've improved performance around four orders of magnitude in the last twenty years, two of those due to software. Immediate mode GUI is very simple, much easier to implement and debug than retained mode, but historically inefficient for slow-changing interfaces; only with React (which simulates immediate mode and actually improves performance compared to stateful DOM) is it making a major comeback. Ray tracing is simpler than rasterization, but won't become popular until someone (maybe Euclideon) makes it fast. Garbage collectors, which are undoubtedly a labor saving technology, still aren't accepted by most game developers. Multi-threading, which greatly increases maintenance and labor costs and promises some moderate benefits to performance, has been widely pursued in many an industry.

What I'm NOT doing is suggesting the current practice is wrong.

Performance matters to me, too, and I'm only willing to sacrifice a fraction of it (maybe 30-50% for the full stack, which is high compared to many programmers) to reduce complexity and labor costs. I don't favor FP because it's simple; IMO, FP is still very mechanical and requires a lot of labor. I favor FP as a sweet-spot compromise between simplicity and performance that has pretty good compiler technology.

your broad assertion about the "simpler ideas" has never been observed

And my hypothesis, as you describe it, accounts for "why not". Labor and simplicity is a flexible tradeoff point for other important features, like performance.

My assertion that simplicity saves labor only conflicts with your own position, that we've been optimizing primarily for labor costs and that unnecessary labor costs thus must arise mostly as an emergent consequence of greedy local labor saving decisions.

Machines

I wonder how these are related to the "Processes" I've talked about? Do you have a better link to the concept than the slides?

Read the Source, Luke

The Haskell code I also linked could offer a pretty solid understanding, assuming you're already most of the way there.

Thanks

They're much more specialized / not too closely related.

worth rewording the problem?

Is there any point in describing the same problem space again in different words?

For example, a thread is a stack of state for routine calls, most recent on top. If routine X calls routine Y, we suspend X until Y returns, after which X resumes at the sequel to the call to Y. Here sequel means continuation: what X continues doing when it resumes from that call. It's clearly the sequel to the call to Y, so that's a good term.

X passes the return address indicating the sequel in the form of an exit device. If Y makes a tail call to routine Z, the same exit can be passed to Z so it returns directly to the sequel in X, skipping followup in Y. We can call this exit-passing-style when the exit is explicit and can be manipulated by optimization or first class value usage.

Since an exit refers to a sequel keeping a thread alive, clearly the cost is stack state if a programmer keeps one alive. Calling an exit replaces the current thread with the sequel the exit denotes. So it's a non-local exit if the sequel is not in the immediate caller of the current routine.

Doesn't this sound less confusing than traditional continuation verbiage?

Note despite usage looking like a call, an exit is not actually a routine because it doesn't come back. It's a nub or button you fire to goto the sequel, swapping the current stack with the sequel stack in a current thread.

jump vs exit/enter and green thread stacks

Every exit to a sequel is also an entrance, since leaving where you were means arriving where you go. (Yes, there's absolutely an arrow metaphor here.) For this reason, when you look at things from the POV of an arriving point, it's weird to call the device an exit when you enter from the destination perspective. So I'm toying with a plan to call it jump instead, tuning it to exit or enter depending on the context assumed by a scenario considered. So you jump to a sequel to exit where you were, and thus enter the sequel context. When you consider going, the verb is to exit. When you consider arriving, the verb is to enter. But either way it's to jump to a sequel. This fits the use of jumps to implement goto, but a little more abstract.

[How you implement green threads can make the way I said it sensible or awkward. In my case, a fiber (green thread) is something that can be scheduled and belongs to a lane (green process), and doesn't really have identity beyond the pointer a lane knows about. In my model, lanes can be killed and thus every fiber within, but not a single fiber since a lane depends on all of them. So when I talked about killing a thread, that's an abstract view that must be tuned to a concrete situation fitting implementation. When I said a thread is a stack, it means a fiber has a stack, and that's most of the fiber's state; so if you replace the stack it changes what a fiber doesn't almost completely (and could break the lane if this wasn't in the realm of what the lane expects is possible). All old stack frames released will end up freeing all held resources, unless they are still reachable (from, for example, sequels still alive via jump/exit references). A fiber is just a shell for hosting a stack, so the green thread represented by the stack can run when scheduled. Unless a lane has attached special significance to a fiber -- like the one awakened to handle signals or messages -- all fibers are interchangeable, and it doesn't matter how many there are. Spawning a new fiber just means allocating the shell, hooking it into a lane's list, and attaching the stack; then the first thing it does is run the sequel. Every time a fiber runs, all it does is execute the sequel to what came before. So whether you replace an old fiber's stack with a non-local exit, or whether you spawn a new fiber instead, is just a matter of organization and preference. Concurrent execution is the default. When conflict can occur, you use synchronization mechanisms.]

nice

yes, Naming is important. There are several things in CS are weird because their names don't work well for newbies, i dare say. Avoiding making it worse is a nice thing to attend to. Still, it will always be a tad subjective.

sequel and jump still work for me so far

I'm slightly more interested in clarity than some. Continuation has the grandeur of lots of syllables. A shorter term is useful if said often. Even more important than a term is its explanation. The phrase "the rest of the computation" is vague, begging one to ask, "In the context of what?" Anything crisp and specific would do: where code resumes in the context where the ref was taken. Avoidance of term code seems odd, as well as context.

It is pretty subjective. But if an average listener doesn't declare "I get it!" most of the time, we can do better. As long as each new elaboration of a story fits and seems clear, it's hard to do much better. Arch explanations aren't helpful. I keep expecting the signature phrase of Kent Mansley in The Iron Giant -- "and all that that implies" -- at the end of opaque tech jargon.

Continuations in Kernel

In designing Kernel, I perceived Scheme's continuations as unifiable with the concept of exceptions. I was thinking of dynamic-wind, a feature I was appalled by because it seemed to highlight gaping holes in its own design — the R5RS actually had the gall to say "The effect of using a captured continuation to enter or exit the dynamic extent of a call to before or after is undefined." In other words, we're making this up as we go along instead of using a conherent conceptual framework, so we have no clue what should happen in that situation. <headdesk>

I ended up with first-class continuations that are not procedures; there's a primitive procedure for constructing a procedure from a continuation, and from that is derived a procedure for applying a continuation. I didn't have a problem with the multi-value-return thing because I'd purged from my design the mishandling of operand lists that led to that (as I'd best describe it) misconcept. But as for call/cc, although I did provide it because I couldn't derive it from the other facilities, I didn't find it natural as the primary constructor of continuations. My unification of continuations with exceptions entailed continuation guards, procedures triggered by entering or leaving the dynamic extent of a continuation — and, since I meant continuations to be useable as exception handlers, I wanted to be able to extend a continuation declaratively, from outside its dynamic extent, without triggering the guards on that extent. Whereas call/cc could only be used to extend a continuation from the inside. (I discussed guarded continuations a while back on my blog; they're in Section 7, pp. 111ff, of the Kernel report (pdf).)

My conclusion was that calls and returns are identical.

In a lambda calculus with variable names and environments, each function is a function of one argument and has one return value. And there's a very deep symmetry in it.

The symmetry is that once you figure out what you have to do to solve hygiene and the funarg problem, the relationship between the argument and the caller environment is exactly the same as the relationship of the return value and the callee environment. So, at the implementation level (AND the semantic level) calls and returns are exactly the same.

The design of Lisp broke that symmetry by allowing complicated lambda lists that did all kinds of things with any number of arguments, while keeping the return to a single value. "You can always return a list" they said, but having an argument be a list implicitly while the return value might or might not be a list, and only explicitly, broke the symmetry.

When I saw multiple-valued returns, I was very skeptical about whether they were good design for a while, but then I remembered my vague unease about the asymmetry in lisp implementations between call and return which didn't exist in lambda calculus -- and a light went on. "Of course!" I said. "This is just an attempt, by people who don't even necessarily realize why it's the right thing to do, to restore that symmetry!"

And that led me to the conclusion that if we're going to have that symmetry fully restored in the design of a lisp allowing multiple-argument calls and multiple-value returns, then call sites needed return argument lambda lists that tell them how to interpret and bind the actual return values, in exactly the same way that callees need call argument lambda lists to tell them how to interpret and bind the actual arguments. And every thing you do with argument lists, is a thing you need to be able to do with return lists.

This makes continuations complicated. Like functions, continuations now have an arity. Call-by-name, if you've got it, implies return-by-name, so continuations have to be sensitive to named return values.

"yield," called with arguments the same way as a function call, becomes the name, bound inside the function, of its own continuation at the point of the call to "yield," and returns control to the caller. "return" becomes the name, bound in the callee, of the caller's continuation from the point where the callee was called. It also returns control to the caller. Either one gives its arguments to the callee as returned values.

And either can be passed to callees (or callers) as an argument (or as a result).

One would think that this complicates implementation a lot, but it doesn't. You just reuse the same stuff you already implemented for calls. And if a function returns "yield" or "return" as one of its return values, or passes them to a subroutine, that's fine.

Solving the symmetry thing gave me a fexpr-based implementation anyway, so function arguments are thunks of expression and reference to environment for expression evaluation. As a result, the same way a function can take a thunk argument and not evaluate it (like the consequent branch in the function "if"), a return site can take a continuation from the caller or recieve its own continuation from the point where the caller was called) and not immediately evaluate it, so returning either continuation does not imply that the call site immediately evaluates it and gets stuck in a loop.

So, for me, call sites look like this:

(function_name actual_arg1 actual_arg2 etc | formal_return1 formal_return2 etc)

making the vertical bar a special symbol dividing actual argument values and formal return symbols for binding.

And it turns out that "yield" was the final piece of the puzzle. when evaluated, returns control to the exact point within the callee where it was called - but, like any function call made by a caller, "yield" can be returned to. "return" is implemented as a call to the caller's continuation from the call point, but semantically, it's just "yield" followed by a jump to the end of the function.

If "yield" is passed in the list of values given back to the caller as result values, the caller (or whoever) can then call the value it got (which won't be bound to that name in its environment) to return to that exact point in the callee. If "return" is passed in the list of values given back to the caller as result values, then the caller has its *own* continuation from where the callee was called, and this can be called as a normal continuation.

The whole thing buttons up. There is no semantic distinction between function call and function return - either can pass its own continuation or any other continuation in its posession, including its caller's continuation from the point where it was called.

So if the programmer is being "excessively clever" (which usually, but not always, means s/he has made some kind of design failure) s/he can write something that is complete spaghetti code or continuation soup.

I don't really know whether to be proud or ashamed.

I still don't know...

I'm still trying to figure out what to do about winding continuations.

If the callee has code it must execute on any entry to its own context or on any exit from its context, then it follows that so must the caller.

Hence, the wind/unwind routines are properly properties of the call frame or environment, rather than properties of the continuation.

The question is whether a caller needs to do its own exit code whenever it makes a call and then perform its entry code when that call returns. That's the symmetric solution, but it's agonizingly slow.

I believe now that it should probably depend on lexical scope. That is, if foo is a function that provides a lexical scope within which bar and baz are contained, then the winding code for foo is executed when foo is called - or if baz or bar get entered from any point outside foo's lexical context. Likewise foo's unwinding code gets called whenever foo exits - or when baz or bar return control to some context outside of foo.

But baz need not call foo's wind/unwind when it is just calling bar, because there is no entry/exit from foo's lexical extent.

With me so far? Okay, the agonizing bit is that baz calls "+" and "-" and "if" and "cond" and so on which are functions (remember it's a fexpr-based dialect) defined outside the extent of "foo". And, those being conceptually primitives, it's going to call them a lot.

So "foo" where the wind/unwind rule lives, now needs to also express that there are things that can be called without winding/unwinding.

But this is crazy. Those are system-defined functions! Nevertheless, a call implies a closure, and a closure requires leaving foo's environment - and then reentering it again.

Wind/unwind code is really painful when consistency is important.

I've only skimmed these

I've only skimmed these threads, but the ability to hide returns within calls allows you to give simple semantics for system calls, like print. When you call 'print', the process you're in is actually returning a request to print to the OS (or parent process, more generally), which does the print and then resumes the process you're in. If you're a purist -- and I'll omit the reasons why you should be :) -- all side effects can actually be handled this way. When you do this, it effectively becomes a requirement that you be able to bypass any wind/unwind mechanism that you have installed when you're returning to / getting called back from the OS. Otherwise, printing something trips the mechanism. When viewed in this light, wind/unwind should not be invoked whenever you enter / leave a process. Rather you wrap a particular set of protocols of a subprocess with the wind/unwind construct, only invoking wind and unwind when transitions occur on those particular protocols.

request/reply and whether winding handlers are needed

When you call 'print', the process you're in is actually returning a request to print to the OS (or parent process, more generally), which does the print and then resumes the process you're in.

Sometimes asymmetry in relationship can restrict who is allowed to initiate an exchange. When this happens, you can let a passive end initiate anyway by letting an open request stand indefinitely until a reply is used to initiate an action. (We did this in an async deduplication system a few years ago, where it was only easy for one side to initiate.)

Where normally a request initiates and a reply finishes, you can invert this, by sending a request meaning "if there's anything you want me to do ...?" as an open invitation. Then a reply can initiate a specific action. The active/master side has to keep issuing more open requests so a passive/slave side can reply when and how it likes. Or with store-and-forward, you can batch pending replies anyway before the next open invitation shows up as a request.

My point is that distinctions can be artificial, reflecting more quirks in local control flow organization than something with deep meaning.

This picture is simplified, in the sense there's even more nesting inside this VM with a green process box containing green thread boxes, but it's enough to discuss queuing:

+----OS---------------+
| +-----process-----+ |
| | +---thread----+ | |
| | | +---VM----+ | | |
| | | | fibers  | | | |
| | | +---------+ | | |
| | +-------------+ | |
| +-----------------+ |
+---------------------+

Anything that happens inside the VM is the result of requests queued into it from a host process, so in effect any outbound messages are replies. Using dependency injection, you can send in a request that means, "Perform these actions for me, and any time you need a system call done, ask me to do it for you while a fiber is blocked."

Here's a slight change in topic, on debugging. If a VM is single-threaded (assuming multiple instances in different threads for parallelism) and does nothing unless a host process lets it run, all state inside the VM is completely frozen between opportunities to run, with no possibility of race conditions. Until you let a VM run again, it's a static core image you can analyze to your heart's content, pulling out as many metrics as you like, including full analysis of what's alive now and where it came from, under which backtraces. The analysis can be run from another VM instance, so you can use the same code model to analyze itself, but in another instance. The analyzing VM instance can be remote in another process or network node, though you'd probably want a local proxy as intermediary, to smooth over all async i/o effects and to execute local memory access reads and writes. Stopping a VM is the trivial part.

This is an extreme example of yielding control. Letting control execute in another VM doesn't mean the quiescent VM needs to execute unwinding code. There's no reason the frozen VM needs to know it was frozen, except it will see when looking at the clock when it runs again. (Unless you lie about the time, which would be easy if the VM cannot make system calls without invoking an api you provide, which can correct clock skew with a single value.) My point is that sometimes running other code is none of your business, and does not imply stack winding control.

special forms with boundary guards

Later I'll comment on the idea of call and return being roughly the same, which matches what I had in mind with both being jumps. In CPS transformed C, you can turn both calls and returns into alter-the-stack-then-jump-by-return-to-trampoline. The method called to alter the stack makes it deeper for a call or shallower for a return. (The entry point for a call looks a lot like formal params were returned from a call to get-my-arguments, except to optimize, you might as well inline them in the activation frame.)

Hence, the wind/unwind routines are properly properties of the call frame or environment, rather than properties of the continuation.

The following pseudo code aims to suggest what I described recently. Assume a Scheme style special form that controls enter and exit of code inside a body, once a head has been executed and a tail not yet executed. None of these symbols are evaluated, because they are just syntax: try, head, on-enter, body, on-exit, tail:

(try (head head-clause)
     (on-enter enter-clause)
     (body body-clause)
     (on-exit exit-clause)
     (tail tail-clause))

After the head runs and before the tail runs, if a non-local jump enters the body or exits the body, we run code for on-enter and on-exit respectively. The fact the head was run but not the tail would be indicated in the frame running the body. So if a continuation jumps upward past this form, we run on-exit code. And if a continuation in the body is called (from outside the body) before the tail is run, we run on-enter code. It's slightly like a semaphore permitting only one control flow inside the body at once (in this thread) without passing through customs in the form of enter and exit handlers.

This formulation might not be practical, but it's a concrete draft one can criticize. It doesn't specify where the handlers go, only that upward or downward jumps passing the head and tail guards must invoke the handlers. You could have a less verbose variant missing handlers that simply kills the lightweight process, thus asserting such boundary violations cannot happen.

To be clear, since

To be clear, since I really didn't get into the multiple-value-return business before.

My view is that the asymmetry in Lisp is mostly an illusion. That is, a Lisp function does receive a single value and return a single value, just as in lambda-calculus — also as in vau-calculus, which is really just lambda-calculus in disguise, and the disguise is related to the illusion. The one value received by a Lisp function is its argument list (or its operand tree, if it's a fexpr). This single value is broken up into pieces by unification with the function's parameter tree. Scheme fails to fully support the symmetry, in a couple of ways. For one, it doesn't provide a way to break up a function-result into pieces by unification; but Kernel does suport this, as one can write something like

($define! (x y z t) (f))

which exepects the result of the call to f will be a list of length four, and locally binds the four elements of that list to symbols x, y, z, and t. This is, of course, syntactically incompatible with Scheme's short-hand for defining a function, a short-hand which I don't miss because I believe it to be partly responsible for newbies failing to learn to think in terms of closures; better to use explicit lambdas consistently. The other way Scheme fails to support the symmetry is that it does not require apply to accept a non-list for its second argument. In Kernel,

(apply list 2)

evalutes to 2, following the library derivation of function list,

($define! list ($lambda x x))

whereas Scheme does not require this behavior; I actually don't recall, off hand, whether Scheme requires (apply list 2) to fail on the grounds that 2 is not a list, but as I recall, most implementations of Scheme would fail (apply list 2) for that reason.

value passing in calls and returns

Calls and returns can be unified by seeing each as value-passing with a jump. If multiple values are involved, you can group them in a tuple as a frame, so it's structured like one frame argument.

The value frames for calls and returns are best handled differently only to reduce coupling. Calling a subroutine might allocate both args (in a value frame) and locals (in a routine frame) as a single allocation combining args and locals in an activation frame. While you could put args in a separate frame hanging off the frame for locals, the same way return value frames are handled, it's just more heap fragmentation. However, logically the entry point of a routine is just like the return continuation of every nested call, in that both receive values and execute with the activation frame as context.

Return values can also be inlined in activation frames, but then you'd be sharing more state than necessary, with coupling that restricts what can occur in nested calls, possibly interfering with TCO (tail call optimization). It's simpler to not mess with state in an original caller until actually returning to execute the sequel after the call, which might unpack the return value into locals as a first step. (If there was only one activation frame ref to a return frame, you'd have to unship returned values right away so you could discard the last when making a new call.)

The signature of a continuation (or sequel) does look just like that of a subroutine call when multiple values can be returned. Both calls and returns can populate a frame for values passed the sequel when it runs. Since the frame for locals did not exist before a call, it's an optimization to conjoin arg values and locals in one activation frame. But other than that it's just like a return. The model looks like frame dataflow in both directions, pushing or popping the stack. If it were not for side effects in mutable state, you'd have a functional tuple stack.

A function call looks like exchanging an argument frame for a return frame, where nesting is just doing this recursively. A single jump is half of a transaction: here's a frame for you to consume. A pair of them in call/return form swaps one frame for another. A fiber is a threaded frame-swapping machine that can have side effects on state outside itself, though hopefully only inside the same lightweight process. Trying to exit from the last frame in a fiber would be just another way to end the fiber as green thread. [That might be unusual when sibling fibers can share a large or small stem of a common call tree, shared at spawn time; forking a fiber amounts to "split me in two and return two different values to new and old fibers" with a cloned frame for a child so it can get a different return value.]

Garbage collection = Tail call optimization

Both calls and returns can populate a frame for values passed
the sequel when it runs. Since the frame for locals did not
exist before a call, it's an optimization to conjoin arg values and locals in one activation frame. But other than that it's just like a return. The model looks like frame dataflow in both directions, pushing or popping the stack. If it were not for side effects in mutable state, you'd have a functional tuple stack.

Except if there is a continuation that can escape back into that environment, or a reference has been stored outside the environment to anything defined within it, you have to keep the callee stack frame until it can be garbage collected. So the callee stack frame really and truly does follow exactly the same rules as the caller stack frame. The caller frame usually stays in memory not because it's a caller, but because of the slightly more precise notion that there is a reference to a continuation that escapes into it. If the callee performs a nonlocal exit, and the callee stack frame gets garbage collected, then there's no more continuation pointing back at the caller and the caller can get collected.

So tail call optimization simply falls out of calls and returns being treated identically, plus normal garbage collection rules.

nodding and agreeing

you have to keep the callee stack frame until it can be garbage collected.

Yes, I didn't mean to imply stack popping meant freeing space; but I see that can mislead by implying past experience (pop means free) that's wrong. Only the tip of an explanation iceberg is short. There's always a long fractal tail of provisos. I find this okay when story doesn't change, and fractal tail add-ons match, just with more detail.

In this case stack purpose is organization (non-interference with sub-tasks) rather than space management. In Lakoff's model of language, terms harken to mini-stories applied like templates with standard slots you can fill. The template I meant was "because this is how we model recursion" as opposed to "because this is how we manage space."

You can always run out of memory by holding too much before a task is done. Space you hold should be visible under a debug tool. Lack of visibility causes poor engineering. Feedback improves plans. Implied memory management in languages with gc and no profile tools can encourge bad practice.

So tail call optimization simply falls out of calls and returns being treated identically, plus normal garbage collection rules.

Yes, that's a take-away someone new to TCO should get from the story. When a nested caller isn't needed any more, and we stop using to it -- passing a higher exit sequel return address instead -- it's unreachable and can be collected. Under refcounting (like my model), that can be right away, so pools of standard frame sizes are important for efficient performance.

Treating them the same means: jump wherever you want and reclaim unused things.

geleranization vs restriction

If I follow rightly, you're describing a scenario where instead of both the input and output of a function each being a single value, each is a frame. The difference being that a frame is not a first-class value, but is instead a second-class list of values. The Smoothness Conjecture naturally favors generalization, first making these frames first-class and then allowing them to be things other than lists; and when you make those generalizations, you're back to the input and output each being a single value. Hence, as I see it, the scenario where the input and output are each a single value is more general than the one where each is a frame. That's the position I took in the Kernel design: that multi-value return is a restriction rather than a generalization.

smooth like surf-rounded stones

I did partly phrase it as I did to match your earlier comments about how a tuple still looks like a value to lambda calculus. One template in my remarks was "you can look at things as if" to present a model that purports to offer sensible reasoning still in accord with old views, but affording a new useful angle. Whether a model is literally true in code depends on how closely you reify it; I was going to reify it quite closely when rewriting async C code.

Whether a frame is first class depends on whether it's visible as a named entity like others, in the layer in which a dev is writing code. You can have them down inside a runtime implementation R without making them visible in high level H. So they'd only be first class in R, and perhaps visible only indirectly in H.

High level things are generally abstract so you can map them to different runtimes. Anything you make first class in high level H is obligated to be first class too in every runtime Ri, for i runtime versions implementing H. Features in high level specs are minimized to avoid constraining every runtime.

You can look at C as if that frame model is true, and you can see it, without reasoning that induces nonsense. But then you can also make it literally true in code organization through source rewriting -- for example, if you want part of the stack heap-based, like all the calls above atomic leaf call trees in C not rewritten this way (because the more numerous leaves are much faster, as before).

The async VM for scheduling C jobs in a trampoline will call jobs that execute the next sequel, using exactly the model I described. The arg to each call is a frame, and the return from each call is a frame, so the model is frame swapping with side effects. (And at the bottom, performance sensitive code runs in the original fast C that is not rewritten. Control flow overhead affects only high nodes in the call tree, where yield must be possible for async scheduling.)

(Below I use job as the concrete mechanism satisfying abstract fiber. You should mentally replace job with fiber each time you see it, if confused.)

There's a frame base class, a struct, which must be the first field in each frame subclass. After that comes the formal arguments. And finally come local variables. When you "call" an async method, that routine allocates and inits an activation frame, copies arguments into the formal parameter section, then points at the previous stack top before replacing the job's stack top. (The frame is efficiently pushed on the stack list by refcount using a handle.) The sequel to run next, ie a continuation, is the next routine run by a job, which occurs when the trampoline next gives a job control. The sequel looks like the continuation to a get-my-args call, which is what this routine just did while providing the activation frame too. Ultimately, this activation frame is swapped for a return frame, though using different slots so they are semantically distinct.

A call operation amounts to "push activation frame and return to trampoline". If the job's lane (lightweight process) just ran out of timeslice, it yields now. Otherwise it keeps going until it parks (green blocks). If parked now, because it just a made an async api call and must wait for async callback to unpark it, some other job in the same lane runs instead, as long as the timeslice is not exhausted. If the other job is a consumer for what the parked job produced, the handoff causes the content to keep being handled while hot in the cache because there's no context switch beyond changing which job is in charge.

You can think of the stack top frames in job fibers as the tips of knitting needles weaving a fabric of computation in the form of object graphs, where the yarn is refcounted nodes one can connect like tinker toys. The moving business end at the head of the machine can generate an arbitrarily large trail of data structure, either on purpose because that was the goal, or accidentally because you leaked by mistake. With a lightweight process limit, allocating too much kills the lane after first being put on notice.

Formally both arglists and returnlists are single values.

Yes, you can map it directly onto single-argument single-return lambda calculus (plus name binding and side effects) where frames are an aggregate data structure and the single value passed back and forth is the frame, and most of the values stored in each frame are thunks. In fact that's a very good description of the VM.

The point of this aspect of my obsessive symmetry-seeking is about language expressiveness/ease of use/consistency of the user's mental model. The user doesn't think about or write code for packing arguments into a frame before calling a function with >1 arity and then unpacking them on the callee side; why should the user have to think about and write code for packing a frame with return values and unpacking it on the caller side when a function with >1 narity returns? Why shouldn't destructuring that return frame be handled in the same way, with the same nice compact lambda-list syntax that the user has already had to learn for destructuring arguments?

The point for usability is that it provides a nice means for applying all the stunts you do with extended lambda lists (optional arguments, named arguments, variable bindings, and unevaluated arguments which enter the called routine as thunk values rather than as the plain old data they evaluate to) to the return frames. The alternative is to require extra effort and clutter up the program with unnecessary code by laboriously packing the frame on the callee side and laboriously unpacking it on the caller side. Manually packing and unpacking is so annoying that 99.9% of the time people won't do it, and 98% of the time they won't even realize that they can, even when it's the right thing to do.

If you're going to allow multiple arguments in a call, it follows that you should allow multiple returns in a recall. Regardless of whether, under the skin, what gets passed is a single frame-flavored value, they need to have the same ease of use as multiple arguments. This helps the user understand that they are the same thing, as well as being convenient. And if you can accomplish that using the same syntax, that's a bonus!

Yes, the smoothness conjecture means that if you have frames, a frame should be a first-class value. And, yes, a callee can get its args as a single first class value (of type sequence) and, as a bonus, the caller can get its returns the same way.

Specifically, with the same syntax as in traditional lisps; using a single symbol to get the sequence of values and thunks, rather than a lambda list to destructure them and bind local variables.

scope, winding, and resources

A common resource management pattern looks like 1) acquire resource, 2) do stuff, then 3) release resource, where doing stuff includes calling routines that might invoke non-local exits, potentially skipping the necessary resource release in (3). A scary resource to acquire and not release is a lock, since system deadlock can occur afterwards if enough things pile up. Other standard things to worry about are open files and allocated memory. But we can call them all resources without loss of correctness when ignoring semantic detail.

(I'm not going to summarize RAII in C++ or dynamic-wind in Lisp systems, or equivalent mechanisms. Folks needing clues for further research might start with c2 wiki pages about resource-acquisition-is-initialization, dynamic-scoping, and call-with-current-continuation. Finding more links should not be hard, and the problem should be familiar. So the only question should be: how do you make non-local exits safe without a crazy or complex interface?)

In the absence of green threads explaining when activities might resume, when interrupted by non-local gotos, it's hard to reason about when resource cleanup might occur in single-threaded systems with complex control flow. And it's hard to make sets of definitions that don't conflict. With threads however, the thread will eventually reach 3 just once unless you (a) kill it, (b) jump past it, or (c) re-enter that sensitive section again by firing an exit going to a sequel after 1 but before 3. The last seems like the perverse one of course.

Let's restrict ourselves to locks, and assume other leaks (memory or file descriptors, etc) are handled as easily as locks when coping with non-local exits like exceptions. Instead of critical section, we might call the 1-2-3 sequence in the pattern by another name that also covers non-mutual exclusion issues. Maybe restricted or vital section depending on whether you want to emphasize access control or resource importance and liveness.

Causing another routine (say a coroutine) to run in another thread is not a problem while we hold a lock: We'll eventually run again and release the lock once done with section. Firing a non-local exit during 2 that remains inside the scope of 2 is not a problem because we will still reach 3. Killing the thread during 2 is also not a problem, if thread cleanup releases all resources held, which amounts to reaching the cleanup done by 3 anyway. So our only clear exposure to trouble is a non-local exit from 2 that leaves the section, so 3 can only be reached later by resource cleanup (if it occurs because no leak is present). If we re-enter 1 again already holding the lock, we can deadlock if the mutex is not recursive. (Don't make locks recursive; it's better to pursue exact control.)

So code performing steps 1-2-3 of the restricted section wants to know when this scope will be violated by any exit to a stack not including the sequel heading to 3, or by any entrance that skips over 1. Perhaps it would like to forbid this, perhaps by killing the thread if it happens. But that's just a special case of calling a handler when it happens, since the handler can kill the thread. Invoking an exit can check a thread's list of handlers (if any). No handler means do it now. Any handler whose stack frame would be skipped (because it's not in the stack of the sequel the exit targets) must cause the handler to get run before the exit dispatches.

To control access via entrance by non-local exit to a sequel inside the section, you only need to see whether there are such sections with handlers in the sequel. Then you can either forbid or handle.

Forbidding and killing (instead of tolerating and handling) when a restricted section is violated seems like the rational and sane thing to do, because asserting "this will not happen" leads to simpler invariants. Actively encouraging more complex results is self-defeating, because the technical debt of analyzing impact on bugs later can follow you forever. It's easier to say no, and it's technically easy and efficient to say no, when the activation frames are clear in the cases involved.

Edit: note under threads the problem with opening and closing files is no longer a problem, because there's no reason to close a file if the thread keeps running. To do something else while a file is open, you have another green thread do it. It's only weird jumping-around-in-scopes-with-one-thread that causes conflict, because that was really just a lousy poor-man's-concurrency tactic.

Actually reified continuations are WONDERFUL

For instance I (and I'm hardly the only person to do this) have very neatly embedded an extended Prolog in Scheme using reified continuations.

If you're slightly clever, you can also implement all of the other kinds of contiutions using them, such as delimited composable continuations including such behavior as not capturing what you don't want and not messing up the garbage collection (tricks include starting an empty continuation at startup with a command loop in it and saving to run functions passed to it, and using call/cc to call a continuation.

I have Prolog queries reified, that's useful, being able to embed backtracking inside of other control structures that aren't affected.

I'm about to write a reified optimizing search, that's one that allows you to save the state of partial searches and continue them... Ideally you'd be able to control the trade-off between saving state and re-creating it as needed, keeping track of memory usage and switching between saved state and iterative deepening where you recreate it.

Note that for saved state, continuations DON'T SAVE ENOUGH in a language with mutable state - so my continuations have to save more, and you even have mark which values are to be exempt from saving...

If you think that reified continuations are useless then you've never thought about search, never thought about logic languages, never thought about constraint programming, never thought about narrowing.

Search has a lot of uses

Don't forget parsing and code generation.

Is there a good reason why

Is there a good reason why you think you can't use delimited continuations to implement Amb?

Also, can you take code that uses undelimited continuations to implement Amb, and other code that uses undelimited continuations to do anything else, and use them together in the same program? Ever tried?

You need at least reified continuations, delimited or not

"Reified continuations can do the same kind of things that you can do with delimited continuations in terms of returning from the function or passing one to a called function or reordering function returns, but reified continuations have an additional property: having captured a reified continuation and stored it as a value in some variable that persists beyond the dynamic extent of the function (ie, after the function has already returned) you can call it and simulate that function returning, with the same environment (that may include the environments of other functions that have already returned) AGAIN, even though the function whose return you are simulating has not been called a second time. This sounds like a good idea, but experience has taught me that it is deeply problematic."

If you can't continue inside of a function that has already returned then you don't have amb.

Sorry, I used the wrong word.

I really screwed up when I wrote the definitions above; I put the wrong words in front of several of them. To start with, you're right. None of these tricks work with implicit-only continuations.

"Reified" in the sense of being data that can be stored in variables and structures, returned from functions, etc, is not what I should have said. That is actually the axis of another split, reified vs. implicit. All of the continuations I'm talking about are "reified" in that sense.

I should have used the word "undelimited" to express what I meant. The "Wrong abstraction" that I'm talking about are continuations that, when called, never return. These are the kind we get in languages like, eg, Scheme. They are powerful in that you can use them to implement a lot of neat things, but you can't usually have more than one neat thing implemented that way per program.

There are several other mechanisms that have more expressive power and CAN be used to implement several different neat things per program, so you can have, eg, amb and green threads at the same time. Delimited continuations where control bounces right back to the point after the continuation call when it reaches a particular point elsewhere (usually but not always in the caller or its caller or etc) are popular (if it's a caller or its caller etc, this is the "exception" pattern).

Another mechanism, which I happen to prefer, is "yield", where jumping out of a flow-of-control always creates a way to jump right back into it at the same point. "Yield" is especially useful when it can send multiple return values back to the caller (its own continuation plus other things) and then whoever uses the continuation (usually but not always the caller or its caller etc) can call it with arguments that are seen at the continuation point as the values that "yield" returned.

"yield" and delimited continuations are equally powerful, in that you can use either one to implement the other, or for that matter either one to implement undelimited continuations. But you can't implement either of them in terms of undelimited continuations.

I think you're wrong that undelimited ones aren't enough

With enough glue you can implement delimited continuations using undelimited ones, you just have to save a clean continuation with a command processing loop in it, and push your function into it at the start of the prompt or whatever other limit you have.

If you want composable continuations the trick is to use call/cc to pass the current continuation to the one you're starting so it can send a value back.

You're free to think so.

Think whatever you like, but try to write a program in terms of call/cc that uses both amb and green threads. If you succeed, you will still discover that

(A) It will bear little resemblance to any code you wrote when implementing programs containing only one of these features, and

(B) It will allocate memory that cannot be garbage collected, until it crashes.

I don't see a problem

But my implentation of amb uses thread local cells, so it's already working on top of Racket's threads which are not green.

It's true that amb would be rather confused if it had to coexist with another amb-like construct, but it's not confused by threads, it simply assumes that every thread has a separate non-determinacy universe and keeps a separate root for each thread in a thread local cell.

Perhaps the principle is:
Every multithreaded program has to be designed from the ground up to be thread aware.

And perhaps "every non-deterministic program has to be designed from the ground up to be non-determinism aware."

There's another case

When you reify non-determinism, then you have to be able to separate it from the current thread's chain on return and reintroduce the chain of choice points when you re-activate it. That goes on behind the scenes though.

Seriously, just write the code. You'll find the problem.

If you're not using green threads implemented in terms of call/cc, you're avoiding the problem. Pick some other "clever" usage of call/cc instead and try to mix that with amb.

If you use call/cc to implement both green threads and amb, then each capture of a continuation used for amb will contain a reference to a stack containing a continuation to be used for green threading (or vice versa).

Each of these continuations prevents the other from being garbage collected, and at least one of them is always "live" in that the program can still reach it. Whenever one of the "live" continuations is used, either one or two new continuations are captured and added to the cyclic structure.

Because the GC doesn't know *which* continuation your program will use, it can't throw either away until your program actually uses one - but at that point the program has captured at least one *new* continuation that also refers to the one your GC couldn't throw away. And the one you couldn't throw away contains a reference to the one you used, which means you still can't throw that one away.

Repeat ad nauseam, and you run out of memory.

Your suggested "solution" creates a sandbox within which you're simulating delimited continuations. That's a good insight, and in fact there is *NO* use for call/cc that is not better served by delimited continuations. So why not use them in the first place?

Further, your sandbox still leaks memory because the outer level at which you're delimiting continuations is a live frame containing references to two different continuations - each of which captures a reference to the previous value of the other. No matter which one you use and replace, the stack gets deeper because the other still refers to it.

I misstated

I actually only save ONE continuation at start then restart it and let it mutate to become each new search.

Since it's saved at the start of the program, I don't think it contains anything but globals, and it doesn't prevent anything from being gc'd.

And since a search only lives in one thread I don't think there's a problem. And if you implement thread local cells, I think you have enough tools to avoid problems. but we've been going around and around too long on this.

I think it can be finessed

In a language with mutable state, you can erase references even without leaving functions they were passed to. If you're aware of the garbage collector and the stacks and so on, you can code so that things work.

As for gc

if each green thread starts with an empty continuation saved at startup, then green threading won't mess up gc even with non-delimited continuations, because each saved continuation doesn't include the others - and amb isn't relevant if it has a thread-local root.

Did you hear yourself?

Bearing in mind that "starting with an empty continuation saved at startup" and "thread-local root" are just ways of saying you want to use delimited continuations?

agreed, in the case of the saved continuation,

but it does mean that one can be implemented in the other.

scatterings of techniques vs language features

Yes, there are a variety of techniques that can be conveniently expressed using continuations. If there were not then, by now, no language would have them. The subject problems arise, though, when we start trying to combine these techniques in programs and libraries.

But isn't the recommendation to throw away

all of the techniques and all research into variations on these techniques when one makes blanket statements like "reified contituations are bad and shouldn't be implemented".

You either have the necessary tools to do the work or you don't.

re "but isn't...."

But isn't the recommendation to throw away all of the techniques

No.

As I said elsewhere -- wrong word.

I messed up when I wrote the OP. I should have called the Wrong Abstraction I was talking about "undelimited" rather than reified.

But I said this already when I responded to your other post, and explained more fully there.

do you have a good example in mind?

While I'm not getting Josh's point, I am interested in problems you mentioned. If you know a problem different than one I note next below, I'd enjoy hearing it.

The subject problems arise, though, when we start trying to combine these techniques in programs and libraries.

Here's the problem I know about: in the presence of state mutation, most code will act sanely only if a continuation is called just once, because it won't be in the "right state" a second time. Viewed as an FSM (finite state machine), a local piece of code expects to resume in a specific state to handle expected events. When neither state nor events match expectations, graceful failure would be nice, but in C (for example) a SEGV is pretty likely.

Code would normally need to be designed so it can reasonably cope with continuations called more than once, so consequences of potential interference and entanglement have been considered and handled by a developer. So by default, normally you want an unplanned second call of a continuation to (minimally) trace in debug stats, log errors, and possibly raise an exception.

We might annotate calls to say what is expected about a continuation corresponding to the return. A (too?) simple scheme might be traffic light colors: green for okay, yellow for warn, and red for stop (via failure). A sensible default would be red. Yellow is indecisive, while green says a dev intended this to happen so its a normal thing. Whatever the color, stats can record what actually happens, or else a dev who is debugging cannot easily see what happened. (Obviously call/cc is an explicit green light.)

It doesn't take much non-determinism to make a system very mysterious and thus mostly immune to analysis when debugging; we need breadcrumb trails recording enough evidence. I'm assuming the problems that come up (when combining techniques in programs and libraries) are related to instability and mystifying non-determinism. Did you have another issue in mind?

Processes

I've written about my process abstraction several times on LtU (the only place that comes up in search for me now is here). I believe they are similar to algebraic effects (which also handle this kind of thing very nicely and so you might want to check out) but with a more concrete feel. Particularly, this problem of continuations not packaging up state, too, is nicely handled by the abstraction.

still over my head -- I like the pool's shallow end

While I read everything you write, I drop all packets early when I get a match on abstract math, Haskell syntax or examples, or anything with the word monad in it (which has never improved a message from my perspective). I love the idea of process, and like that you're doing something related, but I have no idea what it is because I use explanation in plain language as a filter predicting consequence on regular developers I want to support. However, if you merely make people like the word process, that's useful to me. :-)

You're the author of several of the funniest things I've seen on LtU. (The Aha! remark about confusion with the band was really first rate.) I'm okay with sarcasm when it comes with terrific wit. So feel free to beat me up if it drums up business for your project. If you have a recipe to make most developers comfortable with Haskell, that would be awesome.

Processes

From a programming perspective, the main ideas of processes are:

1) All "side-effects" happen by signalling out from your local process to handlers that handle those signals. So when you write something as simple as this:

var x = 10
...
x = 20
...

We can model that with processes by having the part following the variable declaration be a nested process that signals 'get' and 'set' and having the var declaration install a Process that handles those signals and maintains the cell state.

And of course this scales up to more complicated signals, like e.g. network communication. Rather than calling global OS level functions to deal with sockets, your application could be a process supporting the Socket protocol. It is then straight-forward to instantiate the application in a sandbox, handling those socket requests yourself.

2) Because processes don't have hidden links to "the heap" (objects living in the nether and owned by no-one), it's easy to package up processes, saving their entire state. Further, you can do this in response to signals. So, for example, you could handle a request to send a packet by capturing that process as a value and doing something with it later (possibly multiple times). This gives you the power of delimited multi-shot continuations but in a nicer way (because there are clear lines of what state is being captured).

I haven't put out a careful write-up of any of this yet and there are many details I'm not addressing, but I'm just trying to give a big picture view here to convince you that it might be interesting. It's really not that mathy.

PS. I'm glad to hear that you appreciate my attempts at humor, but sad to hear you think I'd want to "beat you up" over these comments. I hope I haven't come across as a bully in the past (and I apologize if I have). I'm usually easy going.

any more elaboration on process signals?

I like the first signal handling part, but the second part is still too slippery for me to grip, where a process as value gets applied like a bundle of context-free behaviors (although that sounds a lot like agent said that way: an object that's a process). I'm used to thinking of signals moving around in a computation graph, but it sounds like you want to move a process around the graph too. Whenever folks asked me about mobile code around 15 years ago, I said I liked the idea of freeze-drying code's todo list and giving that to new code instances instead. It might depend on what you consider to have identity, since you can always remount new things under old names, so identity can be transferred; that probably gives a lot of duals in how one can organize code and data. But I tend to want to see values more like data than code.

by signalling out from your local process to handlers that handle those signals.

I especially like a degree of process isolation that moves focus to signal traffic, so a process can be viewed as a black box easily versioned, replaced, proxied, chained, restarted, load-balanced, and finessed in other ways, based on the idea entanglement is low enough that problems are merely negotiated instead of forcing failure cascade as when lockstep agreement is assumed. One expects a process that dies will not take out every other process in an app composed of a suite of processes. (Sloppy coding can still cause that, but hard dependency only requires local restart, not global reset.) Plug-and-play is the point, so a dead process should only be a temporary red light on a dashboard.

Rather than calling global OS level functions to deal with sockets, your application could be a process supporting the Socket protocol.

Yes, an abstract protocol interface is much better. You can map it directly to OS level functions, at the cost of a mere function call on top of a system call, but now you have portable and embeddable code since you can stick it into anything handling that protocol. (You realize our talking about this will cause someone to want to spam the topic to drown message in noise. The ease with which code can be made portable is not a favored topic to all.)

Lately I've been doing something that induces a mild cannot-see-the-forest-for-trees experience, except I'm wrestling enough that I'm dreaming about it, so I must be learning for a change. I started drafting an env (environment) object that abstracts the entire world outside a VM, partly so I can first do this first entirely with conventional threads so I start with a baseline to compare with fibers. The env has a posix object inside, with a function pointer for every Posix call, with all the types abstracted. (For example, to hide sizes you ask an env to act as factory allocating things like threads, with everything refcounted so lifetime management is abstracted too.) The api can be wired up directly to system calls, or to a simulation, so getting a sandbox is fairly easy. Part of my abstraction of types (file descriptors for example) is adding extra handshaking like generation numbers, so closing someone else's descriptors by mistake will usually be caught; so a lot of unpleasant bugs should be stopped.

The weird part of the experience occurs when I think about the file system, and I can tell I don't have a global overview as I swim in detail. I think I'm going to aim for a simplified 9P protocol from Plan9 -- maybe I'll call it 6P for 6fs file system because it's only part way there -- that can also implement the posix api too. It seems like I'll be able to merge the OS file system namespace into an atlas that also mounts file systems serving this protocol in the local process, as well as other processes so I get a distributed ... something. It wasn't at all a goal to get that, but it seems to fall out by accident just from using a protocol, so the positive feedback implies a winning tactic of some kind. It's distracting to have the temporary scaffolding be exciting. But just making things protocol-focused affords a chance to mediate all signals the way you want. Even without a fiber engine, the thread-based stuff might be interesting already, just because you can play virtual OS in tools under your control.

It is then straight-forward to instantiate the application in a sandbox, handling those socket requests yourself.

Async signals are profoundly flexible in terms of what you can stack using protocols, as long as your code acts like a process operating on async traffic.

I hope I haven't come across as a bully in the past

No need to apologize. You're just the victim of Bayesian profiling, after observing most smart folks with a quick wit have a sharp tongue too. The same goes for expecting competitive hostility. (A hostile disposition is so much the norm, it's like expecting every dog bites with no provocation. Social mores generally don't forbid hostility; not even religions do, though toning it down toward members of the flock may be encouraged.)

Single threaded

One thing I should have mentioned right away is that everything about my Processes abstraction is semantically single threaded. That's a confusing aspect of the choice of name, I guess. I imagine that an implementation could be multi-threaded but I haven't worked through the details and it seems a little complicated.

Semantic simplicity is a big goal. For example, I don't have a global concept of identity and there is no action-at-a-distance. Semantically each signal follows the local wiring to decide which route to take. There is no way to directly send a signal to a far away process, even if the elimination of trivial signal forwarding is an important optimization in an implementation.

it sounds like you want to move a process around the graph too

Yes, that's about right. It's somewhat a special case, but you should be able to freeze processes and treat them as first class values. It might require some care, though, to ensure that the frozen value has a reasonable type and is not overly entangled with its context. If it is, you may find that the only sensible thing to do with it is restart it in an identical context.

Lately I've been doing something that induces a mild cannot-see-the-forest-for-trees [...] It seems like I'll be able to merge the OS file system namespace into an atlas that also mounts file systems serving this protocol in the local process, as well as other processes so I get a distributed ... something.

Sounds like a fun project for the holidays. It's been a while since I've hacked on a nice system like that. I mostly do engineering coding lately and haven't worked on my language implementation in a while. I'm hoping to spend some time working on the write-up of my logic over the holidays, though. I'm hopeful that I'll have something to share soon. Packet loss may be a problem.

threads are somewhat over rated

Threads are a little tedious, and preemptive scheduling is really obnoxious. So it's not a good model to embrace by choice. They need to exist in a single-threaded fiber runtime, just to stop fibers from blocking each other via "scheduling fragmentation", by letting a worker pool of threads do blocking system calls. But I'm not enthused about them, just resigned to their presence in standard kit, which isn't hard to use with a little care. (A "regular" dev usually doesn't exercise enough care and this affects everyone on the same team.) A higher level model that can compile to suitable thread use seems much more practical as a robust tactic.

everything about my Processes abstraction is semantically single threaded.

Makes sense to me, as long as each process is concurrent with others, absent deliberate synchronization to rendezvous on something. You can use processes themselves to simulate whatever threaded semantics you need. In a green thread system, there's not much role to a green process beyond associating related green threads. A shared mailbox when a green process has identity might be an exception; the significance of identity is partly induced by mutable state, because the only reason to name a process is because you know something about its history that changes its behavior (maybe it owes you a service debt). If processes are not interchangeable, you need to find the right one to continue an old transaction.

I mostly do engineering coding lately and haven't worked on my language implementation in a while.

My day job is almost painfully boring, with a focus on doing what must be done, with little scope for inventive thinking. The main reason I don't look for another job is because I'm not aware anyone is doing something interesting, product-wise, anywhere. (Everything web related is as exciting as double entry bookkeeping, which I thought was fun at age 18, but not any more.) It's all just stuff that needs doing, often in very complex ways not actually necessary. I keep wanting to ask whether a Rube Goldberg machine was really necessary here.

So it is a fun holiday project. :-) I hope to work up enough steam to keep at it a couple years as a regular hobby project. Maybe I'll understand some of the plain language summaries of your writeup. I'd be surprised if there was such a summary, though, because discussion often stays as hard core as original texts, because folks treat highly detailed as meaning "real", while general description is hand waving. The other versions with different details are real too, though, but are hard to discuss using terms bound tightly to one formulation. You could always try to hint at other ways of doing things that might have worked too, so folks are not quite so narrow in their feedback. Don't you hate when people echo back exactly what you say?

interesting Jason Hickey paper on async Plan9 file i/o

I found an interesting 2004 thesis by Jason Hickey, Providing Asynchronous File I/O for the Plan 9 Operating System, in partial fulfillment of two different degrees in different departments, which I won't summarize. As the abstract below makes clear, it has sections about async resource access and user-threaded web server performance, so it targets fine user space scheduling, of interest to folks adding that to a language. In my case it's likely a plausible match for my plans with fibers using a 9P style atlas namespace.

This is somewhat off-topic, but it doesn't really belong in a discussion of it's own, since it has no direct bearing on programming languages, unless for some reason you're interested in presenting system interfaces at a language level. You can read about Plan9's file system protocol a long time before running across this paper, but it's one you would want to see early if you had any interest in practical ID management issues. I was looking specifically for an explanation of exactly what is meant by qid in 9P server semantics, and while I found out, I still have no hint what q abbreviates. (Conjecture: unique.) The 32-bit fid values chosen by a client and 64-bit qid.path values chosen by a server would seem, from first principles, to have collision issues when multiplexing clients or servers who don't know about each other's extant ID choices. Big parts of the thesis are about addressing such collision issues specifically, which is what I wanted to read about. :-)

This thesis proposes a mux abstraction that multiplexes messages of a network file protocol to provide asynchronous access to all system resources on the Plan 9 operating system. The mux provides an easy-to-program asynchronous interface alleviating the need to manage multiple connections with different servers. A modified version of the Plan 9 Web server demonstrates that the mux can be used to implement a high performance server with user-level threads without having to use a kernel thread for each user-level thread.

Scalability tests demonstrate that the mux implementation scales well with hundreds of clients and hundreds of servers. Furthermore, the user-threaded version of the web server performs comparably with the kernel-threaded implementation on disk bound workloads and exhibits an 18% decrease in performance on in memory workloads. These results suggest that the mux could provide performance benefits for more intricate applications that can exploit the fine-grained control of user-level scheduling.

Like, for example, a fiber VM with cooperating green processes. (Regarding the last sentence's conjecture.)

Edit: If someone wonders "what do folks sometimes put into a qid, anyway?", at least one interesting answer appears in golang's copy of Russ Cox's code for src/lib9/_p9dir.c which does this:

   110			d->qid.path = ((uvlong)st->st_dev<<32) | st->st_ino;
   111	#ifdef _HAVESTGEN
   112			d->qid.vers = st->st_gen;
   113	#endif
   114			if(d->qid.vers == 0)
   115				d->qid.vers = (ulong)(st->st_mtime + st->st_ctime);

That's also interesting because of how an inode generation number is used as the version number, about which Maciej Rozycki says:

As you may know, there are serious problems with creating unique file handles in the userland NFS server. They exist because the inode generation number, which allows to determine if an inode was deleted and recreated, is currently only available to the kernel -- it's by no means exported to user programs. I've been working to remove this limitation recently and here I am presenting the results.

The qid version number is supposed to help with cache coherency, to tell when remote content changes. My adding this material aims to get folks asking themselves, "Hmm, how do I manage cache coherency issues in distributed data systems?" Programming languages definitely need to deal with cache coherency here and there.

State mutation can be somewhat handled with enough work

It's inefficient, but my embedded backtracking and prolog had some solutions for that. My implementation of amb wasn't the obvious simple one, there was a list keeping track of choice points and other related details.

So there were some new let forms that created variables that interacted with amb. One sort was the familiar logic variables, but another were mutable state variables that know about choice points and revert values back on failure. There was also a generalize run-on-backtracking form that is useful for implementing data structures that revert mutations ... there was also a tagging system to enable a few optimizations that, sadly, aren't complete because scheme's optimizer doesn't know about amb or these speical forms.

With some work, one can write programs that interact with amb in some graceful ways...

I really like the the ability convert between non-determinism and iteration or generators as well as the ability to convert in the other direction. Ie to reify non-determinism.

I used a unification

I used a unification algorithm & linear-equation solver at one point that sort of worked like that: State a set of constraints and relationships that had to apply, and then let the system try to find sets of values that would satisfy those requirements. I wasn't using a lispy syntax on that project, but I can see how your alternate let-form would work. Something like this?

(let_solve (x y z)
   (and (=? (+ x y) z)
        (=? (* y 2) z)
        (=? (+ 1 y) z))
  (write x y z))
==> 1 1 2

What about AMB?

Among the versions of continuations that he mentioned, only reified continuations are powerful enough to implement the amb operator, am I right?

Amb is powerful enough to implement most of what I mentioned above, thought not some of the optimizations I use.