## Project Loom: adding fibers and continuations to Java

Just saw this on Hacker News -- Project Loom: Fibers and Continuations for the Java Virtual Machine with the following overview:

Project Loom's mission is to make it easier to write, debug, profile and maintain concurrent applications meeting today's requirements. Threads, provided by Java from its first day, are a natural and convenient concurrency construct (putting aside the separate question of communication among threads) which is being supplanted by less convenient abstractions because their current implementation as OS kernel threads is insufficient for meeting modern demands, and wasteful in computing resources that are particularly valuable in the cloud. Project Loom will introduce fibers as lightweight, efficient threads managed by the Java Virtual Machine, that let developers use the same simple abstraction but with better performance and lower footprint. We want to make concurrency simple(r) again! A fiber is made of two components — a continuation and a scheduler. As Java already has an excellent scheduler in the form of ForkJoinPool, fibers will be implemented by adding continuations to the JVM.

I'm a fan of fibers, and this has quite a bit of interesting material in it for like-minded folks.

## Comment viewing options

### sure (park vs block terms)

Home page works, if you want to move it. Since I also want to discuss, I did not want to seek attention. Rather than dump my impressions, I'd rather chime in, sporadically. I can add a first comment though:

I thought using verb to block for both threads and fibers causes confusion, since this makes it hard to discuss non-blocking styles ported to a fiber runtime. You end up describing fibers as both blocking and non-blocking, so it's a mess. When applied to fibers, I like to park better, which means to suspend without blocking the host thread. You can use suspend as the generic idea, with block and park specialized to threads and fibers.

Edit: I think the proposal is being actively revised; I didn't notice the park/unpark operations in my first reading. I'm cool with this, but wish I had saved a copy of an earlier draft.

### Done. I thought the whole

Done. I thought the whole point of putting on home page is to discuss. I tend to put my thoughts into the first comment. A subtle form of clickbait.

### Misc thoughts

The article seems to claim that JVM's today do not have "green threads" which existed in Java many years ago. https://en.wikipedia.org/wiki/Green_threads Is that true? If so, what happened to them in the interim? Presumably, the answer is quite relevant to the topic.

For improving the popularity of threaded programming in general, in languages where backward compatibility is not an issue, I favor the approach of the D-language which makes static and global variable storage thread-local by default, and shareable by request.

Java does have non-blocking I/O libraries. It doesn't have good primitives to periodically check for completion in the middle of unrelated execution in the current thread. Are fibers a/the best solution to that? I can imagine a simple kind of "Do this when I/O is ready" construct which would not be a thread at all, but simply a code generation strategy to check for completion at some interval and then call a function when the flag is set, without bothering the programmer.

### control flow origami

I've been thinking about a post describing the "why" of fibers, with the problem of saying little about the Project Loom proposal in Java. While on topic, it seems too much about what I find interesting; making it short seems hard too. Also, the "why" part varies depending on one's objective. Every time someone asks how to achieve an effect, you want to ask, "What are you trying to do?" The answer is quite different when I consider things I do at work, versus things I would do for fun. Both involve "doing a lot of things at the same time", but a work context version is about a thousand times worse in scale. There's little point unless there are not enough native threads to scale to a set of activities you want to organize as concurrent, while still viewing them as call graphs for "separate" programs (for clarity in debugging).

The article seems to claim that JVM's today do not have "green threads" which existed in Java many years ago. https://en.wikipedia.org/wiki/Green_threads Is that true? If so, what happened to them in the interim? Presumably, the answer is quite relevant to the topic.

That page on Wikipedia says green threads were used early on, but were replaced with native threads, except in Squawk. I get the impression Java has a model of threads that could be native or green, but not easily both. I guess usually just one is picked. Another useful page on the topic might be https://stackoverflow.com/questions/5713142/green-threads-vs-non-green-threads. (Some comments there suffer from ambiguity when not distinguishing thread from fiber.)

For improving the popularity of threaded programming in general, in languages where backward compatibility is not an issue, I favor the approach of the D-language which makes static and global variable storage thread-local by default, and shareable by request.

As little shared mutable state as possible seems a good idea. Exceptions should be motivated by a story, and an argument explaining safety and correctness in the face of concurrently updated mutable state. (Single writers are a good idea, with one actor as the update bottleneck. When this is not feasible, a good justification is needed with clear explanation.)

Java does have non-blocking I/O libraries. It doesn't have good primitives to periodically check for completion in the middle of unrelated execution in the current thread. Are fibers a/the best solution to that? I can imagine a simple kind of "Do this when I/O is ready" construct which would not be a thread at all, but simply a code generation strategy to check for completion at some interval and then call a function when the flag is set, without bothering the programmer.

Checking for completion is a sort-of single-threaded perspective. You would only care if you intended a control flow join, where one thread needs to coordinate with some other effect that happened concurrently. I want to discuss that topic as a separate post, about the idea of "organizing cooperating strands which divide a single program into several concurrent parts". In effect, I want to run a program as a lightweight process composed of multiple strands in a group which know about each other, and probably signal each other with condition variables. Note here granularity is a motivation for fibers, because instead of one thread representing a task, I want N strands in a group, and I want them as cheap as possible. Suddenly having too many threads is a scaling failure, when you cannot scale by several orders of magnitude whenever needed.

Coordination of a non-blocking i/o API with fibers is simple when a completed i/o can unpark a fiber waiting on that. In this sense fibers are a good solution when the clarity is striking. The completion callback amounts to "awaken fiber". This only gets complex in the context of cancellation. What if that waiting fiber was killed? This is why you want the notification to go to a strand group instead, by process group id, so it awakens whatever fiber was waiting on that, if any. If the whole lightweight process is gone, the id is no longer valid and notification does nothing. (Unless, I suppose, you register surrogates to note messages to processes that are gone, for debug auditing purposes.)

A "do this when i/o is ready" callback mechanism is a typical way async interfaces are structured. I see a lot of this in C. (I do a lot of IPsec stack debugging, basically full time, involving too many native OS processes.) This sort of organization often results in what people call "callback hell" because it is difficult to see how you got someplace, any time something goes wrong. The absence of monolithic call graph for the activity as a whole makes it hard to construct a synoptic view when the past is erased except for breadcrumbs of data here and there. While it totally works, it is daunting to be smart enough to debug, when the author was just barely smart enough to wire all the necessary bits together. You can think of fibers as the same thing, but folded up differently, with a cheap stack providing context in a way analogous to normal single threaded programs, so you can debug more easily.

### The article seems to

The article seems to claim that JVM's today do not have "green threads" which existed in Java many years ago. https://en.wikipedia.org/wiki/Green_threads Is that true? If so, what happened to them in the interim? Presumably, the answer is quite relevant to the topic.

That page on Wikipedia says green threads were used early on, but were replaced with native threads, except in Squawk. I get the impression Java has a model of threads that could be native or green, but not easily both. I guess usually just one is picked. Another useful page on the topic might be https://stackoverflow.com/questions/5713142/green-threads-vs-non-green-threads. (Some comments there suffer from ambiguity when not distinguishing thread from fiber.)

I recall some early VMs which claimed the ability to use both green threads and native threads. Your stack overflow link hints at one reason why green threads became less popular: when multi-core CPUs became the norm, they would not take advantage of the performance boost available to native threads, which would get scheduled on other cores.

These historical discussions suggest that green threads were dropped as a capability after being implemented suggests an overall lack of interest in the use cases where the number of simultaneous threads exceeds practical OS/memory resources. while at the same time, the single CPU/box is still adequate for overall computational throughput - as opposed to needing a rack/server farm/compute cloud. That might still be true...

Java does have non-blocking I/O libraries. It doesn't have good primitives to periodically check for completion in the middle of unrelated execution in the current thread. Are fibers a/the best solution to that? I can imagine a simple kind of "Do this when I/O is ready" construct which would not be a thread at all, but simply a code generation strategy to check for completion at some interval and then call a function when the flag is set, without bothering the programmer.

Checking for completion is a sort-of single-threaded perspective. You would only care if you intended a control flow join, where one thread needs to coordinate with some other effect that happened concurrently. I want to discuss that topic as a separate post, about the idea of "organizing cooperating strands which divide a single program into several concurrent parts". In effect, I want to run a program as a lightweight process composed of multiple strands in a group which know about each other, and probably signal each other with condition variables. Note here granularity is a motivation for fibers, because instead of one thread representing a task, I want N strands in a group, and I want them as cheap as possible. Suddenly having too many threads is a scaling failure, when you cannot scale by several orders of magnitude whenever needed.

Coordination of a non-blocking i/o API with fibers is simple when a completed i/o can unpark a fiber waiting on that. In this sense fibers are a good solution when the clarity is striking. The completion callback amounts to "awaken fiber". This only gets complex in the context of cancellation. What if that waiting fiber was killed? This is why you want the notification to go to a strand group instead, by process group id, so it awakens whatever fiber was waiting on that, if any. If the whole lightweight process is gone, the id is no longer valid and notification does nothing. (Unless, I suppose, you register surrogates to note messages to processes that are gone, for debug auditing purposes.)

This Wikipedia article on Futures and Promises is relevant, but poorly written. Lazy, non-blocking use of futures seems like a good feature, though the article makes it difficult to say where it is available and in exactly what form.

### Fibers to replace ad hoc state machines with vanilla code style

Yes, people note cores are underutilized when pursuing green threads alone. You need multiple threads and/or OS processes to better spread load over cores. Performance has more than one dimension; it depends on what you measure. If you do one thing, and you measure wall clock latency, devoting as many cores to it as possible tends to reduce latency, unless you screw up via lousy locality. If you do many things, and are taking pains to spread load over cores already, you want to get as many things done as possible during a given period of time, and this measure is more like transaction throughput. Ideally, you want as many free cycles left over as possible, after servicing some load, so you can do even more.

Marketing is always going to want to advertise a bigger number of simultaneous peers, maximum number of tunnels, and tunnel setups per second. There is essentially no limit to the demand for number of devices connected at once. But once you've spent all the resources, you're done. All you can do is spend less per thing. Doing this too cleverly tends to introduce bugs. So you want simple structure that scales with little cost, where you can also tell what the heck happened. Threads become a scarce resource in high end commercial stuff, but likely not in personal projects.

Absent fibers, async conversations are typically built around finite state machines of varying informality. The next event looks up an entity involved so it performs the next step in negotiation before waiting on other events. At the same time, the system may alter due to either configuration changes or network errors that abort ongoing connections. The complexity of timing in edge conditions can be really irritating. In my environment I want fibers in C, since more of the processes I deal with are written in C. (Numerous daemons are written in several languages including C++ and Java as well. The Java processes are not as highly loaded as those in C.) I have no strong opinion about Java; fibers just seem a good idea in addition to threads.

However, I think code must be written so it knows (the coder knows) about fibers and threads, as opposed to being ignorant of them, in hypothetical pursuit of strand transparency. Locality is quite important, so you need to know if you have scheduled a task to be performed "close by" for low context switch cost. You also need to know which things must be thread-safe, and which don't. I'm not at all in favor of programmers having no idea what they are doing. So a semantic model must be visible in either language or library interfaces, for a programmer to manipulate. Adding fibers to a language yields something like a new variant of the language, where code written for it would not generally work in the stock base language.

### Fibres

Almost every understanding of fibres is plain wrong.

Fibres are a sequential programming construction, they have nothing, on the face of it, to do with threads or asynchronous behaviour. So comments about Java Green threads completely miss the mark.

Lightweight threads are still pre-emptive. Fibres exchange control (yes, by a form of continuation passing) under program control.

If you want to see fibres and how they should be defined and used, you should look at Felix because AFAIK there are no other systems which correctly implements them.

Fibres, on the surface of it, look a bit like threads because they use indeterminate activation of suspensions instead of concurrent activation. But the difference in the first instance is profound. Fibres can access share memory, they can perform mutations, and they can do so just as freely --- and with the same issues -- as ordinary procedures.

The model of fibration resembles CSP but it is much more constrained. There are 4 operations, basically.

* spawn a new fibre
* read or write data on a synchronous channel
* suicide
* create a new scheduler

The rules of fibration say that that at a synchronisation point, the active status of a fibre is changed, and in particular the running fibre is no longer running. Then, the scheduler randomly picks an active fibre to give running status to and resumes it.

Spawn creates a new fibre from a coroutine, the spawner and spawnee have active status. Scheduler picks new running fibre now. Could be either one or even some other fibre.

Read and write cause running fibre to become inactive. Then, if there is a read and write on that channel, both are made active. Scheduler picks running fibre now.

Suicide kills the active fibre. Scheduler picks running fibre now.

Scheduler construction is a *subroutine*. A new scheduler is created. A coroutine is then made active on that scheduler. The scheduler is then called and the caller suspended. The callee scheduler picks an active fibre to run (initially there's only one so this operation is deterministic).

A scheduler returns control when there are no active fibres. If it is the top level scheduler, then, the program terminates.

Suspended fibres are continuations.

At any one point, *exactly one* fibre is running. This is sequential programming. There is no concurrency. The behaviour of the system *except for the random choice of fibre to run* is entirely deterministic.

A *model* of a fibrated system is one in which the random choice of fibre to run is replaced by a deterministic rule. In Felix, spawn causes the callee to run immediately, a read/write synchronisation causes the reader to run first. The programmer, however, should not depend on such implementation details -- at least in principle.

Felix also has asynchronous event handling, which is used to provide timers and async socket I/O. This is an *add on* feature in which, when a scheduler runs out of active fibres, it checks if there are any waiting on events. If so, it hangs until the event driver releases them back to active status, having serviced their requirements (for a delay, or for socket I/O).

You don't need promises with this system. When you do a socket or timer request, the fibre making the request appears to block. The *other* fibres keep running! In fact in Felix, we do NOT use channels to do this blocking, although we could (it would be more general than the current system).

With this design, the goal is to *optimise* the behaviour so that we can sometimes replace *indeterminate* with *concurrent*. We want to make the stuff run faster but preserve the sequential semantics.

With this design, the programmer faces new challenges. One has to learn how to deal with indeterminacy. This is where concepts like purity come in. A coroutine is pure if it depends ONLY on its input and output channels. Its the same concept as for functions, but coroutines are imperative. So if your coroutine cooperates with others by modifying shared state via pointers, it isn't pure.

If all the coroutines in a system are pure, they can all run concurrently. The problem is to deal with essential cases where this cannot be so. If all your data transmissions use purely functional immutable data structures, then concurrency works. But we need mutable data. At present in Felix I am adding uniqueness typing for exactly the same reason as in functional programming: if we can send *ownership* down a synchronous channel, so the reader exclusively owns the data, then the reader is free to modify it. It no longer matters *when* the reader runs, before or after the writer, or any other fibre.

The primary difference between fibration and functional programming is that fibres are cofunctions. That is, they're functions turned inside out. However, plain ordinary functions are very weak. Functions are subroutines which work by calling other subroutines. The send an argument to the subroutines they call and suspend waiting for an answer. Throw in products and we get the usual tree processed top down.

Cofunctions work backwards. There are many languages with cofunctions, the most important is C++, the whole library is based on them. C++ people call them iterators. That is precisely what a cofunction is. Its an inside out function. With an iterator, you construct a closure with the argument and then demand a result. It is the iterator which suspends between demands, not the caller. Its a tree, but its a bottom up tree.

In Felix, we go much further. Cofunctions are weak, just as functions are. In the past, people tried to wrap data into objects with methods so reduce the problem with conventional imperative programming where a procedure could write stuff anywhere, and there was no codomain type to constrain it. OO doesn't work of course.

Coroutines similarly are not a complete solution but they're objects turned inside out. Iterators are weak, cause whilst you can invoke any number of iterators in your iterator, you can only yield a single value. You have any number of reads, but only one write. Functions are weak for the dual reason, you can pass any number of inputs but you are stuck with a single output.

With general coroutines, you can have any number of distinct input and output channels, just as objects can have any number of methods. With objects the methods are organised around spatial data. Dually, with coroutines, channels are organised around temporal processes: one views a fibre as an *active* process. Critically, coroutines have local data on a stack.

Ultimately there is a well known ways to ensure concurrency in the presence of shared mutable data. Its called a mutex. Locks create all sorts of problems so this is a very bad method, its a last resort. The key idea is to be able to reason about *sequential* code and treat concurrency as an optimisation, in much the same way in functional programming one ignores the fact that persistence would be impractical without a garbage collector. GC is an optimisation. In fact construction is an optimisation (persistent data is not created, it was already there, you just located it .. actual construction is an optimisation).

The critical thing, in my view, is not to throw out functional programming for cofunctional programming but to use both together. Pure FP is a disaster. So is pure imperative. All current attempts to mix both are failures. ISWIM languages like ML are a disaster because referential transparency is lost. Algol like languages are a disaster because purity is lost. Haskell is just a plain disaster. The *right* way to proceed demands *symmetric* treatment of a categorical model and its dual, as far as possible, and this is why continuation passing is king: it unifies imperative and function concepts.

### concurrent vs parallel

(Note I see no problem I favor American spelling of fiber, while you favor British fibre. As long as we're consistent it doesn't seem weird. I'm not at all pushing my convention. Similarly, I hope nothing I say sounds like criticism. My reactions seem to me more exploration of alternate word definitions, where your definitions are not wrong, just different.)

Fibres are a sequential programming construction, they have nothing, on the face of it, to do with threads or asynchronous behaviour.

It sounds like you're saying a fiber does not resemble a thread, in the sense of having a stack, being able to suspend and resume, with partial order in runtime with respect to other fibers. Or maybe you mean both resemble strands, with those behaviors, but a fiber is not at all a native thread, which sounds right.

My definition of synchronous is that when you make a call, to perform an operation handled by someone else (a function or system call, or whatever), you don't run again until the result of that operation is returned to you. When routine A calls subroutine B, A does nothing until B is done and returns a result to A. Asynchronous means not synchronous, so you don't wait for the result: you keep running (modulo scheduling) while the callee might or might not perform that operation during a timespan interleaved with your own execution.

If fiber A asks that something be done, which is performed by fiber B, that's an asynchronous operation when A and B can have interleaved execution after A makes the request but before B is done generating a result. In this sense fibers have something to do with asynchronous behavior, if fibers cooperate and have interleaved execution. So I'm not sure what you mean.

Lightweight threads are still pre-emptive. Fibres exchange control (yes, by a form of continuation passing) under program control.

By "lightweight thread" you mean fiber? Fibers can be pre-emptive, but don't have to be, since an implementation can have cooperative scheduling, where context switch happens only at known points, rather than any arbitrary point in fiber execution. (Making data shared among fibers safe against fiber context switch is simpler when you know it cannot happen some places.)

Fibers exchange control when a fiber scheduler switches them, which might be provoked when fibers deliberately interact with a synchronization primitive like a condition variable (fiber flavor rather than thread flavor) or a mutex (again fiber flavor). I would implement a channel using those, so performing i/o uses condition variables. Fibers might also alternate control when a scheduler decides the current fiber has run long enough.

In my model a fiber might also swap which continuation it was using, so that one fiber was both sides of a producer/consumer coroutine setup. But since fibers are incredibly cheap, there would be almost no difference between using two different fibers for that, except complexity. (You would be inviting programmers to mess directly with continuations, when they could loosely ignore this when using fibers in style mimicking threads.)

Suspended fibres are continuations.

Yes. And a stack in addition to the continuation, plus a presence in scheduler mechanisms (i.e. a fiber has identity as an object apart from the continuation is runs next). In my model a fiber is also a continuation when it is not suspended; every return address is a continuation, as well as each call address when winding the stack instead of unwinding. So the scheduler is mostly a continuation switcher in a trampoline that also decides who runs next.

Suspended threads are also a continuation plus a heavyweight stack, plus identity in a thread scheduler. But that might not be a first class continuation, letting you change it easily.

At any one point, *exactly one* fibre is running. This is sequential programming. There is no concurrency.

Yes, within a thread only one fiber is running. In other threads more fibers might run in parallel on different cores, if there is more than one core.

So there is no parallelism of fibers within one thread, when they cannot run simultaneously. But you can still say they are concurrent though, if you subscribe to a definition of concurrency like the one on Wikipedia (https://en.wikipedia.org/wiki/Concurrency_(computer_science)) characterizing concurrency as decomposition of programs in to partially ordered sub-parts.

When lifetimes of fibers are overlapping, with interleaved execution, you get classic concurrency problems even when they cannot run simultaneously. Suppose one fiber with identity X wants to perform three operations A, B, and C together as an atomic unit, but each of those operations has a way to yield control to other fibers. To prevent other fibers from seeing that atomic operation only partially performed, fiber X must protect the whole with a mutex at least. However, with no mutable state shared with other fibers, no one can perceive A+B+C unless X sends that result when X is good and ready. So there is only a problem when A, B, and C involve mutable state other fibers can see too.

Sometimes folks distinguish between concurrent and parallel to address the difference between logically overlapping lifetimes and potential physical simultaneity. Here a typical discussion on Stackoverflow: concurrent-vs-parallel.

Ultimately there is a well known ways to ensure concurrency in the presence of shared mutable data. Its called a mutex. Locks create all sorts of problems so this is a very bad method, its a last resort.

Badness is relative to contention, mutex implementation, exposure to pathlogical faults in wait-for graphs, and lack of planning on a programmer's part. Mutex can be very cheap, especially without contention and without gratuitous yield on lock (say in pursuit of fairness). With fibers, a mutex can cost nothing but recording identity of a lock's holder, when no other fiber tries to also lock while it is held.

If fibers try to hold more than one lock at once, they are exposed to deadlock unless a canonical lock order is used to prevent this. If a low priority fiber locks a mutex and makes a high priority fiber wait, you can get priority inversion, unless you plan a way to defuse this. On the whole, aiming for as few locks as possible seems a good idea.

If there's something crazy you can do with threads, you can probably do it with fibers too, unless an interface is stripped of options letting you do those things (for the programmer's own good). With fibers, though, it's probably easier to write tools that walk graphs of relationships and analyze what went wrong, because race conditions are less horrible absent pre-emption.

### programs as cooperating strands in a group

About two weeks ago I mentioned a separate post on this topic, so here it is. (Note strand is a generalization of process, thread, fiber, or coroutine: a thread-like thing representing a sequential control flow, but scheduled concurrently with other strands.) Before I explain, consider a question for context. In your favorite concurrent runtime, how would manage a group of strands? Suppose you use a language with threads, or coroutines, or lightweight processes -- anything that gets you logically asynchronous concurrent scheduling of multiple control flows. How do you represent a program that requires a group of related strands?

I'm using "program" in a very vague and general sense, as opposed to "executable binary", so it doesn't mean the piece of software you link and execute in your operating system. It means one activity in your application, that might have several concurrent parts. For example, in a network app, when you service a connection, you might perform a bunch of asynchronous operations all on behalf of that connection, so that one thread or one strand cannot represent it, if you need one strand per async piece. If the connection aborts, or if you kill the program, you want all those strands in the group to gracefully terminate and cleanup, without disturbing the rest of the runtime.