Strongly Typed Coroutines (minor question)

I'm considering "asymmetric full coroutines" and their operators as defined in this before LTU described paper Revisiting Coroutines. At least recent versions of Racket support these style of coroutines I'm told (documentation lags), but in a latently typed language.

In this model, a coroutine supports operators:

  • c = create p
  • r = resume c s
  • s = yield r

Furthermore, when the coroutine function exits "normally", it is then "dead".

Thus coroutines are as much, or more, "pipes", with values "s" fed by resume and consumed by yield, as they are "generators" just returning values "r" from yield (as in Python, as I understand its limited generator mechanism).

Just as important, as a "stackful" construct, the yield call may occur anywhere on the call stack, in any called routine - a desirable property.

Thus, my inquiry is how to type the yield call (r -> a) in a manner expected by the coroutines' creator and caller (hence the "asymmetric" description of these coroutines).

One possibility is to wrap the coroutine in some kind of parametric class type:

  class Coro s r is ... yield: r -> s; ... end 

The coroutine object might be passed down the call chain eventually to any routine wishing to yield, now via:

  s = co.yield r;

This seems quite burdensome, requiring design of routines explicitly for coroutine use. An attraction of coroutines is a flexible, more-or-less ad hoc inversion of control mechanism.

Another, more complicated option, is to percolate yield types from routines up through callers, in the manner of Java checked exceptions, restricting the universe of routines callable from a coroutine, eliminating potential calls to yield that break type constraints.

This latter method only affects routines that are known to call yield in their subroutine calls, either directly or via a passed HOF (objects with methods calling yield, etc.).

A second question is how to distinguish values returned via yield from the final regular coroutine return indicating the death of the coroutine. Is it customary to wrap every yield'ed value in Some r and then return None upon coroutine death? The same question might be asked regarding any indication of the exhaustion of values sent to the coroutine via resume c s.

Or is some potentially more memory efficient, if stylized, mechanism typically employed? FWIW, the use of exception handling language facilities for this kind of "out of band" control flow offends my sensibilities. Dynamic control via continuations, CL's lexical block label dynamic exit special form, or even Icon's fail mechanism all present themselves more appropriate.

Yes, a rather narrow pair of question for LTU, but I hope it will interest some folks here, and as always, any and all help greatly appreciated.

- Scott

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

I'm trying to understand

I'm trying to understand what property you want to type. E.g., there's a continuation monad in Haskell, and thus you can encode weaker forms. Part of it, like making sure an argument 'c' being passed to 'resume' is a continuation, seems basic. Or are you wondering how to do this without the manual taint (e.g., using inference)?

Or, is it that you want to prevent dead continuations from being called? That actually sounds pretty neat! At first, I was thinking of a solution using linear types or type state, but then I got worried about composition, and didn't reason through it all the way. It'd be pretty useful to have, even beyond basic safety: consider better resource handling in continuation-based web servers.

Edit: rereading the post, it seems that your first question is about the former, and second about the latter. Using 'maybe' types for the latter seems to break your goal of transparency, however, where the type system does the work.

Coroutine interface handles

The cleanest solution I know is replacing yield and resume by put and get operations. These operate not on coroutines themselves but on typed handles (sources and sinks) that reflect the coroutine interface. A producer coroutine applies put to a sink with a value, a consumer gets values from a source. Then all you need is a pipe operation that gives you a bound-together sink/source pair.

One example of this design can be found in OmniMark. For a more generic and low-level approach, see my streaming component combinators EDSL implemented in Haskell.

pipes sound good

Mario B.: A producer coroutine applies put to a sink with a value, a consumer gets values from a source.

That's what I'm doing again right now, revising a lightweight process model under trampolines. Typing pipes seems simpler than typing coroutines. I prefer doing dataflow with pipes, so scheduling is driven as a side effect; a put need not force a context switch until bounded buffer limits kick in. (I'm doing this in naked low level C, so it's not a language I can describe, at the moment anyway.)

If coroutines need to be typed per values used in yield and resume, this seems to require coroutines have pipe-ness built into their semantics. Every time I bind together two concepts this way, when they might be independent, it causes surprising unintended complexity somewhere downstream.

Maybe that's how coroutines are defined (passing dataflow values between one another), as opposed to merely being independent fibers that happen to use a pipe. But I'd rather have fibers that happen to use pipes in order to effect coroutine behavior. You can always force a rendezvous style context switch on every put with a pipe permitting only one write in the bounded buffer semantics.

If you keep the two ideas separate, you can have fibers return a typed value on exit, which put and get typed values on pipes, without those being the same type.

Typing coroutines via pipes

Pipes might be easier to type if the I and O of the coroutine are separable, but would seem to greatly complicate using types to capture: When a coroutine signals "GetX", it's now in a state where it's waiting for "Return".

Or am I missing something?

expanding context a little

Without specifying a developer visible model of time with respect to coroutine behavior, it's hard to talk about them being in one state and then another. I was tempted to ask Scott about his time model, but decided my remarks could just assume my own model. :-) I think a potentially concurrent model is implied by talking about fibers.

If I assume fibers (lightweight green threads), then on get a coroutine either keeps going because the next X is there, or it blocks until a producer signals X is in the pipe to awaken the consumer. So a consumer only indicates interest in getting an X because you can see it blocked on whatever primitive it expects to signal (a lightweight condition variable in my case). Here a pipe is like an on-demand future for a stream of X's, and looks like a generator or memoizing cache depending on whether the buffer is empty.

I can also imagine a coroutine definition with no real time element except as metaphor: it looks like two indendent continuations handing off data, but it's actually just a formal model in the language, implemented with unspecified time behavior. Here you'd ask a developer to imagine time behavior to grasp the model, then you'd say you weren't actually doing that and refuse to pin down details.

I'm particularly interested in semantics of starting and stopping, especially stopping that expresses failure. Will coroutine types imply what happens in producer or consumer failure? Or does the model define failure as impossible? (I know this is funny, but folks do that.)

I know this doesn't help analyze whether typing coroutines is simpler or not. But in the absence of a progress guarantee, or a way to examine coroutine status to make runtime decisions, I'm not sure what to make of a coroutine type in a contract -- except it means when you do get a value, it will be of the sort specified. I guess I'm vaguely curious whether types ever encompass time semantics, or whether they tend to address only timeless qualities.

I've been assuming Scott wants a producer coroutine to awaken a consumer as cheaply as possible so there's no performance criticism, and that strong typing helps them contract whether data put and gotten is acceptable to both. But that he doesn't want to specify time and progress behavior, which I'd like, or how either end disengages (like closing a pipe). Out-of-band behavior like "close" can be hard to union with type of content in the channel.

Coroutine death is tricky business

I'm particularly interested in semantics of starting and stopping, especially stopping that expresses failure. Will coroutine types imply what happens in producer or consumer failure? Or does the model define failure as impossible? (I know this is funny, but folks do that.)

I know of no truly satisfactory answer to this. I can offer two designs I've tried personally. They both turned out to work well in practice. You may consider this as good news, if you're practically inclined, or otherwise as an affront that demands a third solution to encompass both.

The low-level EDSL I linked above relies on Haskell's Maybe (in case of get) and Bool (for put) types to communicate if the peer coroutine has died. I was at first worried that checking the result all the time would be too cumbersome, but as it turns out that was mostly delegated to higher-level operations like putList.

OmniMark uses a different approach. The consumer always runs first, and the producer waits until it's asked for data. If the producer dies first, the consumer continues running normally; it can see that its source is at EOF. If the consumer dies first, the producer gets killed; only its always blocks will be run, and if they try to write anything into the sink a run-time error is raised. This story had a few refinements before we got it right.

Expose rather than complicate

I wouldn't call things any more complicated, really. It is quite possible for an untyped coroutine to be in a state where it expects something done before it resumes. As one example, Lua's untyped coroutines raise an exception in this situation.

Don't get me wrong. Your objection is valid, you've just overstated it a bit. It's similar to saying that typed value-returning functions are more complicated than untyped procedures because a call to a function expects a value.

You need at least three coroutines to cause the problem. You may have, for example, two producers P1 and P2 feeding one consumer C. When C suspends to P1, if P1 may resume P2 rather than yield a value to C, you hit the problem: the values P2 yields may be of a different type.

There are ways of resolving this issue. The one I've settled on is to simply disallow P1 from switching directly to P2, because it has no handle to it. The above configuration can be represented like this:

pipe
  (\sink-> P1 sink)
  (\source1->
    pipe
      (\sink-> P2 sink)
      (\source2-> C source1 source2))

Note that the only handles P1 and P2 are given are one sink each, both of them leading to C. The consumer picks from which source to get the next value, which resumes the appropriate producer.

+1

I put some thought into this exact issue a while back (I was hacking together a little coroutine library in C#). This was indeed the best solution I could think of. Instead of allowing an open-ended top-level yield construct, make it a method on some typed object that knows the yielded and resumed types. If you don't have that object, you can't yield:

void DoStuff(Coro<int, string> coro) {
    // do stuff...
    string result = coro.Yield(1234);
    // more stuff...
}

I believe this problem is similar to supporting non-local returns in closures. (I.e. if a return inside a call to a closure should cause the outer calling method to return, how do you ensure it returns the right type?)

Good Question

Yes, a rather narrow pair of question for LTU, but I hope it will interest some folks here, and as always, any and all help greatly appreciated.

For what it's worth, I think this is a great question, and I don't know of any better place to be able to discuss things like this.

Thanks all

With regard to typing yield, using either a typed pipe as a (pair of) typed handle or an object handle with a typed yield method would do the trick for my particular needs.

Termination detection might well be a function of the pipe or coroutine object or whatever. I'll spend and hour or two doing stream of consciousness pseudo-code to see what coro utilizing control flow seems most natural.

Some of the interesting problematic cases might be avoided if the thread of execution that creates the coroutine must uniquely enjoy the call-response relationship with that coroutine. One can pass coroutine "sinks" to other I believe this is the model discussed in the above linked paper, but I'll have to double check. In any case, in general, I'm looking for high bang-for-the-buck but more or less trivially managed control abstractions.

One interesting case is if the coro termination returns a value. Here something akin to a Haskell Either case might be put to good use, although it seems a bit cumbersome upon first reflection. Again, a little reflection and pseudo code should make things clear.

FWIW, one example I came across (one of the Modula's? I do not really remember) required that the type of yield'ed values and the type of the value returned upon termination be the same. Nicely consistent, but I can't see how that aids in the practical use of coroutines in general. I can just as well imagine a system where coroutines must be of Unit type upon termination (normal routine exit). Hmmmm.

Thanks again!

Scott