managing closed worlds of symbols via alpha-renaming in loosely coupled concurrent apps

I enjoyed the discussion in Why do we need modules at all? (about Joe Armstrong's post on the same topic) more than anything else on LtU in the last few years. Several comments reminded me of something I've been thinking about for several years now. But describing it doesn't seem on topic in a reply (or I lack the objectivity to tell). The modules thread is already long, and I don't want to pollute it with junk.

This is mainly about transpiling, where you compile one source to generate another, with a goal of automated rewrite, where both syntax and complexity in the input and output are similar: compiling one language to itself, but re-organized. For example, you can use continuation passing style (CPS) to create fibers (lightweight threads) in a language without them natively, like C. Obviously this is language agnostic. You can add cooperative fibers to any language this way, from assembler to Lisp to Python, as long as you call only code similarly transformed. You're welcome to free associate about this or anything else mentioned, though. For the last year or so, almost all my thinking on the topic has been about environment structure, lightweight process management, and scripts effecting loose coupling relationships via mechanisms similar to Unix, such as piping streams, etc. Once you work out how a core kernel mechanism works, all the interesting parts are in the surrounding context, which grow in weirdly organic ways as you address sub-problems in ad hoc ways. Thematically, it's like a melange of crap sintered together, as opposed to being unified, or even vaguely determined by constraints stronger than personal taste.

Note I'm not working on this, thus far anyway, so treat anything I say as an armchair design. There's no code under way. But I've probably put a couple thousand hours of thinking into it over the last seven years, and parts may have a realistic tone to them, in so far as they'd work. It started out as a train of thought beginning with "how do I make C behave like Erlang?" and grew from there. I've been trying to talk myself into wanting to code parts of it, sometime.

Several folks mentioned ideas in the other thread I want to talk about below, in the context of details in the armchair design problem. (I'll try to keep names of components to a minimum, but it gets confusing to talk about things with no names.) Most of the issues I want to touch deal with names, dependencies, modules, visibility, communication, and the like, in concurrent code where you want to keep insane levels of complexity in control if possible.


the abstract ad hoc

I think this bit is deep and fairly lucidly stated but easy to overlook:

Once you work out how a core kernel mechanism works, all the interesting parts are in the surrounding context, which grow in weirdly organic ways as you address sub-problems in ad hoc ways. Thematically, it's like a melange of crap sintered together, as opposed to being unified, or even vaguely determined by constraints stronger than personal taste.

The concrete ad hoc

Unix is a concrete example of an ad hoc approach to system design.

It has a small, unifying kernel mechanism but all the interesting parts are in the surrounding context. Those surrounding parts grew in ad hoc, weirdly organic ways. The assemblage of parts is patterned and navigable and clearly reflects the problems it was meant to address ... but the pattern is unprincipled. The parts grew in response to the problems confronted and nothing more.

The concrete structured

Smalltalk-80 is a concrete example of a highly structured approach to system design. Nearly every problem encountered as it grew was taken as representative of an abstract class of related problems -- a hierarchy of such classes, in fact. Much of the system design attention was spent on trying to find a unifying taxonomy of abstract, highly generalized problems and solutions.

The abstract structured

I think PLT as practiced has tended to implicitly assume that structured, rather than ad hoc, system architectures are preferred. There is a constant push for programming language mechanisms that support ever more abstract generalizations of problems, and mechanisms to organize those abstractions into vast, hypothetical libraries of prior work.

What Smalltalk-80 exemplifies concretely -- design as a quest for simple order -- PLT has come to focus on abstractly.

Contemporary PLT is (mostly) a study of "the abstract structured" in computing system design.

The abstract ad hoc

I do not think there is lately much PLT thought spent on the abstract ad hoc: language theory and design in support of the ad hoc nature that describes unix.

Now we are talking about that, though.


design patterns, traits, auto-wiring, constraints

The 'abstract ad hoc' reminds me of OO design patterns, object capability model design patterns, etc.. Each pattern addresses a particular problem, but there is no common principle, no generic way to compose them, just a bunch of forces and contexts.

Also, traits or mixins might fill that role to some degree. While there is some structure in how traits are composed externally, the effects of a trait are much more ad hoc. Traits themselves are abstractions of ad hoc properties, rather than compositions of structured models.

OO in general is rather ad-hoc with regards to composition, but also tends to be rather concrete. Potentially we can abstract the composition aspects, e.g. with auto-wiring of objects, or automatic discovery of ambient resources. By doing so, we may at least abstract a relatively ad-hoc behavior.

Constraint models and the structures they produce tend to be very ad-hoc, organic, purpose specific, but also abstract.

abstract ad-hoc to abstract structured.

Design patterns being abstract ad-hoc, can be formally expressed as Concepts (type-classes combined with associated-types or type-families, and type constraints). Concepts become structured when arranged like Stepanov's Elements of Programming.

maybe chaotic good alignment favors abstract ad hoc

I'm trying to cut things I want to talk about into bite-sized pieces. I hope I don't get lazy and just respond to specific parts of your comments in the modules thread — several of which impressed me, like the ephemeral nature of arranging communication linkage in scripts. I feel guilty about wanting to create more code via copy and rename, though, since Dillinger wants to end up with less instead of more. But I think there's scope for managing how disjoint or shared code gets, as part of a deliberate examination and rewrite.

I do not think there is lately much PLT thought spent on the abstract ad hoc: language theory and design in support of the ad hoc nature that describes unix.

Yes, that's a fruitful area lacking attention. :-) I want to harness a few Unix style benefits in user space process management, partly as a means of organizing what happens so it's possible to debug what occurs in one process with thousands of independent flows of control. An ad hoc approach appeals to me for multiple reasons. I usually want to shed rules to get degrees of freedom back, to restore options in design choice. But it may also be from high cognitive disinhibition, which happens to be often useful when you need answers to novel problems, and making something up out of thin air is productive. It looks crazy to people until it suddenly yields a high grade result. Anyway, I can describe another flavor of concrete solution benefiting from a style of doing things one finds in unix.

One thing that is useful

One thing that is useful about "Unix style" is that there is a universe of useful mashups people can create - relatively easily - on the basis of the catenative use of command line utilities via pipes, streams, etc.

By agreeing on a general interface (in this case streams of text) as a boundary between modules, they managed a much more definite and stable abstraction boundary between parts of the system than we usually manage to enforce with, eg, OO programming. Process level separation allows complete replacement using bits that have absolutely no runtime in common, and don't even necessarily use the same ABI.

When I envision the same kind of approach to general programming, I wind up thinking of every module being represented by its own separate executable, each having a different process space and communicating with others only by explicitly sending information over ports and sockets. Given what we've learnt about threading difficulties and shared memory/state in multiprocessing systems, not to mention physically dispersed systems, why are we still even thinking about making these huge monolithic programs that do that? It turned out to be a bad idea.

A huge side benefit of this is that some things simply don't need to be repeated. A "browser" has become a universal interface for webpages, so if your program just starts and communicates with a browser, that can BE its whole UI. Your program has more on-task things to do, am I right? So there really only need to be one or two or three "browser" executables on the whole system; that's code we don't need repeated in every application, and if it's widely used it'll get more concentrated effort and debugging than UI code tied into each of hundreds of individual executables.

re: Unix style

Something important to remember about the "Unix style" of pipes and such is that the vast majority of software in Unix doesn't work like that. Instead, a lot of communication is very ad-hoc, through shared files and filesystems, databases, shared memory, sockets. Indeed, when Thomas Lord speaks of the 'concrete ad hoc' and Unix, it doesn't bring to my mind the concrete structured use of pipes and file streams. Rather, it reminds me of the organic mess created in their wake, applications that are very frustrating to mash together.

and then we give it a HyperCard ui, yeah, that's the ticket

useful mashups people can create - relatively easily - on the basis of the catenative use of command line utilities via pipes, streams, etc.

Concatenative streaming is also especially effective at coordinating concurrent pieces, so control flow joins occur fluidly as enough content accumulates in stages to fire additional handling. Traditional top-down call tree organization is particularly bad at waiting for stuff to show up, and dealing with spontaneous out-of-band messages. Stream piping is a good divide-and-conquer strategy for some things. When that strategy fits a problem, it's awkward there's no easy way to do it, normally, without using separate processes, which is heavyweight (when scaling that is; when not scaling, not much matters anyway).

By agreeing on a general interface (in this case streams of text) as a boundary between modules,

A lot of value comes from "agreeing on an interface", and even having an idea of interface as abstraction manipulated as part of design and engineering. More often I see developers proceed from a position of assuming "global pervasive truth" which crashes and burns when sharp edges don't actually line up perfectly, partly because the idea of edges in conditions was presumed easy to remove. There's an awkward survivorship bias in the self-selection of people good at programming, because they can follow rules accurately, and they start education in a nicely broken down jigsaw version of reality where everything has a correct answer. The reward system is all about ego-stroking and ranking people for coming up with the same cookie-cutter answers expected by teachers, who can lack imagination in lower educational ranks. (Sometimes this is why I think smart people often suppose everyone old is an idiot, from age, because that was their experience with nearly every teacher. Finding a smart older person to interact with is not easy for children and young adults, so experiencing the right givens for a valid Bayesian conclusion is hard.)

You can also stream message sequences, not just undifferentiated sequences of bytes, like piles of sugar or sand running down the bottleneck of an hourglass. This model matches what Erlang does: message-in maps to message-out, with associated type metainfo. Byte streaming is good for low-level data-plane content switching, while message streaming is better for high-level control-plane policy switching. You can make either one do for everything when the fit is less natural, and automation removes the sting of conversion.

Process level separation allows complete replacement using bits that have absolutely no runtime in common, and don't even necessarily use the same ABI.

Process separation is pretty awesome. That's one reason I don't use process to describe a lightweight (green) process, because they don't have address space separation, and can interfere quite nicely if you let them. If an OS process is like a swimming pool, chopped up into lanes scheduled cooperatively, then waves can totally interfere with other swimmers in the same process. (Here I'm pretending there aren't native threads too. A lane is really a named sub-group of fibers in a thread, all doing some cooperative activity concurrently.) If one lightweight process really doesn't interfere with another, and they only interact via immutable byte or message streams, you can migrate peers into different processes, or even different network nodes, and tests should still pass unchanged. That's how you prove it experimentally. But for debugging purposes it can be nice to do coarse grained debugging in one process, freezing them all at once under gdb, say.

so if your program just starts and communicates with a browser, that can BE its whole UI.

I like the idea of using a browser as an ad hoc inspector a lot. It can be cheap to support extremely fine-grained state getters, because you really only need to send back a little text at minimum, instead of turning every little thing into another part of a monolithic gui that has to have a consistent user model.

On the other hand, you can also use a native widget library to wing an interface that might be hard to do in a browser. One of the things I really miss about Apple's MPW (Macintosh Programmer's Workshop) is worksheets in the shell, letting you select and execute any piece of text you want. So instead of dealing with unix command histories, you just keep piles of script around and execute them as soon as you see them. And you can write meta-scripts to go find, select, and execute parts of other scripts, including those just generated by yet another script (written by tools that are coroutines running in the shell process). Part of the story I told is also indistinguishable from a plan to clone MPW, but I didn't let the character Ty go off on a tangent about YPW.

Actually, it's not that hard to do anything, if not for wrestling with monolithic tool sets and stacks that have high barriers to entry, flaky usage requirements, painful failure modes, agonizing debug disciplines, and tech zealots who rank everyone they see on the are-you-awesome-with-my-tool scale. This is one of the reasons programming gets boring eventually, because everything is a grind, and all you get is something you knew was going to work anyway, welcomed with boos from folks browsing Facebook. (I'm not bitter, that's just rhetoric. I can turn emotional tone in text every which way. My dad was a marketer, coincidentally specializing in that.)

abstract ad hoc name resolution in ephemeral envs

Among features discouraging sprawl, this part on ephemeral environments caught my eye:

It would seem that programming systems for high-level, loosely coupled composition must assume some form of ephemeral environment that is responsible for resolving the symbolic names of loosely coupled components and actually effecting their creation, destruction, and connection.

In shell scripts, the ephemeral environment is implemented by the run-time system in /bin/sh along with the unix kernel. Certainly there are other possible ways.

That ephemeral environment, generally speaking, has to be stateful. The high-level component calls upon the ephemeral environment to create new loosely coupled components, connect them, later destroy them, and so on.

As far as I know, a lot of loosely coupled systems find parties to interact with via naming systems, in some form of pub-sub or managed namespace, which would include a file system. (Blackboard style tuple spaces are another way to connect components based on dataflow bindings.) The mid-90's OpenDoc system needed a naming system to find part editors and readers registered at runtime.

As soon as I started thinking about shell scripts running in a fiber VM, I realized a file system -- even if just locally visible -- would help a lot in coordinating agents trying to avoid tight coupling. To avoid dependence on a particular system's organization of file systems, you'd have to simulate one in a standard way, then let parts of an OS file system be mounted in the virtual system for convenient interaction with the host environment.

dialog about C rewriter for fiber VM and associated shell

[Here's a story intended to introduce the armchair design I don't want to really describe, unless there are questions. Use of dialog format makes it possible to compress Q&A content, and to show nature of design as argument, compromise, division of labor, and pragmatic cooperation. Each character has a focus I won't explain, and some may not be used (Crash is out of town). All of them work together now, or did in the past, and everyone but Ivy is fine with a current C focus in tools. Wil is the primary fiber VM implementor as well as the guy working on tools rewriting C to coroutine form. Ty is the environment and shell guy, Ivy is the only one interested in C++ integration, and Ned is a critic/worry-wart with good points. Dex wants to be right. Al loves Linux. Sam is a regular guy BSD dev who jockeys ad hoc concurrent systems for network stacks in user space. In this scene, they explain the significance of components to Sam, as well as Wil's private jargon for runtime mechanisms, as it would affect Sam's stewardship of Acme's flagship product with the lamentable name of Shazzam, using Sam's Z virtual machine named zam. This story is basically about Ty's shell, in so far as it provides Unix features Wil wanted enough to bring Ty on board. Wil self-servingly edited the dialog, making everyone seem to avoid repetition and roughly stay on track.][Edit: In an old tv show named Gomer Pyle USMC, the title character likes to exclaim "Shazam", and thus the joke.]

     "Why do I want to rewrite dedup running now atop zam?" Sam asked. "It's fine, fast enough, and pretending nothing is wrong works for me."
     "It's hypothetical, Sam," Ivy reminded. "Imagine being able to tell what's happening. Not just stats."
     "Who said you could use codename thorn?" Sam asked Wil. "Someone is using that already."
     "Not when I started using thorn ten years ago," Wil observed. "Anyway, glyph þ was sometimes replaced with lowercase y in old English, so I use y and thorn interchangeably. Convenient for use as both vowel or consonant. Our VM, the thorn async machine, is called yam like the food, and an associated runtime including environment is yurt."
     "Like that hut dwelling for nomads out on the steppes," Sam rolled his eyes. "Is that a runtime-as-home metaphor? I don't care about the names."
     "Me either," Wil nodded along, "but things nevertheless need names to forestall confusion. What did you name your shell again?"
     "Gomer," Sam replied. "Which talks to zam over Pyle ipc."
     "Surprise, surprise, surprise," Ned offered, doing an impression of Jim Nabors. Sam ignored him.
     "Gomer's syntax is pretty weird," Ty said, making Sam shrug. "And you can't run Gomer inside zam."
     "Why would I want to do that?" Sam asked. "I only use Gomer to configure Shazzam network parameters, as a Linux command line tool. We don't need shell scripts inside Shazzam. I can just system() a command line whenever I want."
     "If you don't mind Linux process launch latency," Ty marveled. "Spawning a green thread takes, like, a microsecond. You can't launch OS processes cheaply, like for every new connection. You can only afford to do it rarely, when it won't matter. But that's not really the point. Wil likes using scripts for remote testing. That's what got me involved."
     "Can you speed this up? I have a meeting soon," Sam warned.
     Ivy and Wil exchanged a look, then Ivy turned to Sam and made boxes in the air with her hands, obviously visualizing some kind of layer architecture reflexively, before explaining.
     "Today, for dedup," Ivy began, "zam talks to Wil's low-calorie async job system called jam, written for Acme to do all the async i/o involved in the codec's caching subsystem, running in another process and using yet another layer of async communication architecture. Each layer has its own stats, its own unique async api, its own logging and failure modes. You can't even exercise some of the code paths unless a specific rare condition actually happens in live execution. Wil, did it drive you nuts that you can't test all the states jam fibers get into under dedup?"
     "In the worst way," Wil agreed wholeheartedly.
     "If you replaced jam with yam, from thorn," Sam wondered, "wouldn't it be exactly the same?"
     "Almost," Wil nodded. "But yam is more completely uniform, not specific to dedup the way jam is, and makes it easy to spin up arbitrary lightweight processes to do anything, like listen for connections, or handle events delivered from some other party who manages all the connections, like the zam runtime does inside Shazzam. So you could drive a yam-based dedup like a black box from a remote process that models what is supposed to happen, methodically exploring all the states. You can write generic server and agent setups with yam."
     "Interesting," Sam allowed. "You would even be able to test it outside Shazzam, too."
     "Yeeesss," Wil smiled. "The dedup black box setup based on yam fibers doesn't need to know where it's living. So I could stick it in another environment, like a daemon Ty uses to host his shell runtime, and drive it there, from the same daemon process or from another one. Then after embedding the whole shebang inside Shazzam, again, we can still drive it from shell scripts in Ty's daemon. That might require some debug-only infrastructure, in the form of yam agents waiting to receive instructions about what to do."
     Ned interpreted Sam's pause as confusion. "The yam core is a stripped down kernel that doesn't do anything but schedule fibers, unless you extend it with a yurt runtime," he explained. "It's a closed symbol system that can't even talk to the outside world unless you subclass the yenv interface to process calls implemented outside, in the environment, which is required when a call might block because yam jobs can't block, they only park when waiting for an async continuation."
     "Job?" Sam prompted.
     "Oh, I use fiber as the abstract term, and job as the concrete term," Wil explained. "The general idea is fiber, but the specific implementation is a job. Some words are reserved to refer to native OS entities, like process and thread. So when we talk about a process, it's always an OS process; and when we talk about threads, it's always a native thread. The lightweight entity corresponding to a green process is a lane, which is a collection of one or more jobs, plus infrastructure resembling a process like an id, so you can kill it, or send it signals or messages, which awaken whatever job is parked until such an event occurs."
     "I wish you wouldn't use job to mean that," Sam said sourly. "As far as I'm concerned, a job is a command line artifact, where all the processes launched for one command in a pipeline are identified as a job. How can it mean fiber?"
     Ty smiled and nodded. "When my shell executes a command line, it spawns a job specifically for that command. So there is a one-to-one correspondence between command and job, but it's a shell job: the fiber associated with the command's pipeline."
     "Oh," Sam said. "I guess that makes sense. Lane is a weird term, though. Does that make a job a car?"
     "Like a mining car that runs on rails, you can think of a job as a fiber car housing the continuation of that green thread," Wil suggested. "All jobs belonging to one lane run only when that lane is scheduled. Both thread and fiber are metaphors based on linear continuity, and so is a lane. I actually like how hard it is to confuse the term lane with anything else, as it's virtually unused in computing."
     "No, I've seen the term lane used before," Ned disagreed, but Wil waved him off.
     "Two more questions, then I need to go," Sam said. "Why does there need to be a shell? And is it hard to write code for a job fiber? No wait, four questions. Is the thorn runtime multi-threaded, and does it use garbage collection?"
     "In reverse order," Wil replied, "thorn refcounts, which is a weak form of GC, and a single yurt scope is single-threaded, so for multi-core you use multiple instances and load-balance with message passing to coordinate when necessary. This means we only need synchronization mechanisms that are cooperative for fibers in one runtime instance. And then just thread-safe message queues between them with copy-only semantics."
     "To help write job fibers, Wil has a compiler," Ivy grabbed a chance to jump in. "Or rather a transpiler, that converts conventional looking input C into kinda weird-looking output C after a continuation passing rewrite. The CPS conversion makes each job cooperatively async and non-blocking -- a style sometimes called stackless, except it uses cactus stacks allocated on the heap. In Wil's model, the abstract idea of heap is satisfied concretely by a vat which allocates all refcounted objects. Stack frames are refcounted, and tail call optimization is directly supported as needed. I can write a Lisp compiler that outputs plain C, which passes through Wil's compiler, then run zillions of non-blocking Lisp fibers on yam. Should be fun."
     "Wil's compiler is named..." Sam prompted.
     "It understands a subset of C we call Cwil, using Wil's name as a subscript, pronounced quill," Ty enthused. "It was my idea."
     "Or swill if we use a soft C," Ned reminded, and Wil grimaced.
     "Why is a shell necessary?" Sam repeated.
     "It isn't, it's just convenient," Wil shook his head. "You can hand-write spawning code, stuffing input arguments into stack frames. But you lose things that way. First, suddenly it's hard to log what happened. The more you spawn lanes as programs using a command line, the easier it is to maintain a history summarizing activity. Second, with a shell you suddenly have easy remote access, because you can just open a shell connection, or send one-off command messages. If you thought it was creepy to have Ty's shell embedded with yam in Shazzam production builds, you could strip down the virtual file system visible inside, so not much could be invoked."
     Sam hung his head suddenly in a mock slump. "Virtual file system," he mumbled. "Next time. I gotta go."

containing sprawl by partitioning into disjoint sub-programs

The dialog above provides a little context to remark on ideas like that of sprawl discussed in the modules thread. (This quote comes from Thomas Lord's What do you mean "anti-democratic"? post:)

I mean that if you have a big messy sprawl program -- eye-crossingly inter-dependent code and a huge lines-of-code count -- it requires something like a persistent bureaucracy to monitor and manage.

Even if a single process is huge, with a lot of lines of code in total, you might arrange that the sub-programs inside it are small and can be understood in isolation -- if a tool can show you the scope of code that doesn't make calls outside itself in the same process, other than to a well-defined set of environment calls.

For compiling sub-programs as coroutines, so they can be invoked by command line in a virtual file system, I had in mind a tool that does whole-program analysis for a sub-program. It does require a lot of annotation of source, to show how it's compartmentalized, unless it can be automatically inferred by rules like naming conventions.


Superficially sounds a lot like Docker. Shell, internal filesystem, separate processes communicating with Unix pipes. Just build a bunch of Unix programs and ship in Docker.

not personally familiar with Docker

Superficially sounds a lot like Docker.

Probably, and Thomas Lord's observation about the abstract ad hoc advantages of unix is consistent with similar tactics usually being a good idea.

The idea of the yam and yurt mechanisms in the dialog is mainly the idea of embedding one async system inside another, as a device the host system gives async requests to, which get handled locally but behave as if submitted to a remote server. It composes, in that the nested async service can issue other async calls to more services, both local and remote, and it all folds up into one reply to the host. So the indirection is transparent, except to the extent monitoring and debugging obligate you to know what really happened.

I know the async embedding works, because I've done it (the fictional jam interface in the dialog), but in a bletcherous ad hoc way. Using Docker is off limits to Sam and Wil in the dialog though, because Shazzam is a monster process consuming and scheduling all local resources, except when delegating to embedded pieces.

Docker is deployment method not VM

Docker is not a VM, it is a deployment method. Of course it's not exactly what you are describing, because it only runs on Linux, so you need a VM to run it on other OSs.

Docker uses Linux containers, which are lightweight containerisation, like Solaris had. So the objection regarding monster processes is not accurate, at least when hosting on Linux.

mm okay

Didn't say Docker is a VM. I must not have made the point about a monster process clear: there are no resources left for Docker when one real time process per CPU is hogging it all. Why is it you think I should care about Docker? (For all topics X, a suggestion to "go read about X" is a non-starter without powerful incentive; references are never relevant in my experience.)

Why can't you use docker?

I am not a fan of the 'narrative' explanation, and I find it hard to understand what you want. If you could explain why Docker is not suitable (as it seems a nearby point in the design space) it would help me understand.

I am not sure I get your resource issue either. The containerisation is part of the Kernel, so real time processes can still be run, and can still use all available resources inside a container.

it seems slightly rude to ask like that ...

There's what I want, then there are use cases -- not the same thing -- though they overlap. Narrative is far shorter, partly because it stays sparse when only suggestive. So far the narrative version was a bad investment of my time, but it's always a gamble whether other folks will say something interesting instead of just that-reminds-me-of-xyz.

The story described an actual situation at work, which I'm not at liberty to describe in non-narrative form. (And it would be much longer, and boring, and violate norms here even worse while giving slim odds of reward.) There's a piece of production technology with a very large number of man-years in it, that cannot be abandoned, but can be changed by extending it according to its own rules. Using Docker would amount to a completely different architecture, and would be starting over. A person can start over on a personal project, but a company cannot start over on a thousand-plus man-years of accreted software.

I could write another story, about the same length, showing a realistic reaction to proposing just one piece be changed to Docker. It would sound like, "Hi, I want to ignore your plugin system, crowd aside the whole architecture, and use this new container approach because it would be fun and make me look cool. Can I have a substantial fraction of storage and cycles?" It would be seen as lunacy, and evidence of having lost it, degrading future reputation as a crank. In reality not even a meeting would occur; people would exchange looks and talk to you less, but not fire you, yet.

(A coworker of mine wanted me to talk folks into using Erlang, probably because that would be resume enhancing. That would have had a slim chance of succeeding, at the cost of implying we could not cut working in C. The main resistance would be to using a new language mixed in with what's there now. They won't permit C++, but will permit Lua, but nothing else as a major language in a main process they are paranoid about. You are familiar with the way folks don't like using new languages, right? Getting everything into one process is their version of efficiency, shared state danger be damned. Anyway, would embedded Erlang have high quality with other parts of the process stepping on its runtime? No, it would fail and have to be reverted in retreat. Using Erlang in another process would work, but ignores the plugin system and adds latency for queueing into another process above what already happens, on the hot path to getting traffic on the wire. To get the resume enhancement, you would have to bamboozle folks into believing performance would go up when it wouldn't, and I have an honest streak I can't seem to shake.)

The use case here (later we can look at others) is a process managing thousands of network flows in a load balancer in one process, because the whole network stack is inside this process (with a fantastic number of engineering years of inertia, and very profitable). When dedup is enabled, which only makes sense if the pipe is too slow to stuff in bytes as fast as they arrive, you can burn CPU to compress it by removing duplication, if you can get this done before the pipeline is clear enough to take more output. In effect, you want to take an i/o bound system and make it CPU bound, or come as close to balancing them as you can. Thus CPU is bandwidth, and when you waste CPU, less gets on the wire in the same amount of time, and you lose. Nothing is free.

For each flow, after cutting it into pieces, you can deduplicate each piece separately in an embarrassingly parallel algorithm. (Choice of where to cut makes a dramatic difference.) So for each flow, you want N parallel job fibers deduping asynchronously before they join to write in the correct order. Storage is another process, with an async api, so the job system must marry one async api to another, everything async every which way. For brevity let's call this jam, for job async machine. It is required to look like a server embedded in the current process; otherwise it subverts the plugin system. Except it shows current network flow state transparently to the host load-balancing process, which wants to make policy decisions like "should I xoff now?" in the tcp part of the stack for this connection, without any information hiding in the layers -- like being able to see all the plumbing and wires in your walls at home. Cancelling jobs, when a network connection aborts, is pretty interesting, but I won't describe it except to note I'm glossing over details that exist despite your not knowing about them; there's a lot of hand-rolled synchronization amounting to cooperative mutexes and condition variables.

The load-balancing process is limited, in part, by how much state is used per flow, because there's only so much RAM. The smaller jobs get, the more dedup is feasible, so we want job fibers as small as possible; we'd use a quarter million if we could and the platform had enough resources. I'm sure I don't need to tell you these cannot be Linux processes, even if Docker is used.

What I want, which matches this use case by the way, is tiny stacks in as many lightweight processes as I can get, in the form of fibers, which can run in a real time process by virtue of never blocking (a worker thread queue does blocking stuff) and staying inside a timeslice before yielding to the host. A single stack activation frame should be enough, but often several frames must be held to track enough state. Using setjmp/longjmp is stupid because that just copies something before using it, instead of using it right where it is now. (Copying does not increase speed.) As many fibers as possible, which cannot block. They also need to support streaming just like pipes, when what is being done is async network flow dissection, disassembly, and reassembly. This isn't hard to do by hand when you don't mind having no idea what is happening at runtime. (When someone overwrites a job with a memory stomper, you have to prove jam didn't do it, that someone else did; guilty until proven innocent.) But if things looked like Linux processes, with command lines and a proc file system, you'd be able to tell what was going on, using tools that make sense. In fact, why can't jobs be written as if they were just ordinary threads, and be compiled into coroutines? Life would be so much easier.

Getting there (hopefully)?

Sorry, I didn't mean to sound rude, and I can see from this that you are talking about something completely different. Are you looking for something that can compile a multi-threaded single-process network program with blocking IO into an event (async IO) model (still single process/multiple thread) by way of a program (source to source?) transformation, for the purpose of improving legacy code?

Is this adding thread pooling to a program that currently launches one thread per connection (like a stateful TCP connection)? You want to have a fixed-size thread pool, and assign incoming packets to any thread, getting the state back from memory, effectively requiring event-based async IO and multiple threads?

Then you want to view the state of the 'fibres' referring to the 'event' states overlaid on the underlying threads from a REPL loop where you have various commands to manipulate the state.

You can encapsulate the event pattern in an error/continuation monad, which gives the appearance of linear control flow. That would solve the programming problem as a DSL in Haskell. Of course you could write this as a source-to-source transform on a restricted C subset. So jobs could be written like ordinary threads (except you need to make sure all calls are non-blocking). The source-to-source translator could easily include a REPL loop, and some standard diagnostics, even hooks for adding new commands.

depends on host env that embeds fiber runtime

Note this is not at all productive from my point of view, since I'm not learning anything. I already knew it was hard to get folks to understand anything said about threads, fibers, blocking, and scheduling, etc. So this isn't different.

Is this adding thread pooling to a program that currently launches one thread per connection (like a stateful TCP connection)? You want to have a fixed-size thread pool, and assign incoming packets to any thread, getting the state back from memory, effectively requiring event-based async IO and multiple threads?

Good god no. That's an emphatic no, like I'm not sure you read what I wrote. (I'm really tired of talking to you and I'm done after this, no hard feelings. I've been dieting a good while now, but I'm only 25 pounds lighter and still have 25 pounds to go, and being hungry all the time makes me cranky as hell. I suspect it skews my idea of what it's okay to say out loud. I may tune this tomorrow, if I decide I'm being a jerk.)

Nothing in what I said implies I want or need a thread pool. In exactly one spot, where I said fibers must not block, I noted that a thread pool could be used to avoid blocking. I'll say this in more detail so you can imagine cases in sufficient detail. First I'll describe the work production environment again, slightly, with respect to thread constraints. Call the work context Z (after the fictional zam runtime, which is not its name, of course).

Plugins in context Z cannot block. (There's a watchdog that kills Z if a heartbeat in shared memory does not change fast enough. During init, some blocking things happen, managing not to kill Z if they finish fast enough.) More threads in Z are forbidden. There cannot be a thread pool. Fibers used in plugins must yield suitably, and cannot call anything but an async api. If yam from thorn were used to write fibers called in the Z context, none of the fibers could call a blocking api, or require use of any code that would block, and there could not be a thread pool. However, Z manages scaling numbers of network connections asynchronously, without blocking api under threads. The Z plugin scheduler for flows does this. So it would be possible to write a server in yam, like an http inspector, without threads or a thread pool, without any blocking api used. Maybe that sounds weird, but it's true. So you could write a shell in yam executing commands under Z where no blocking ever happens. Async agent-based messages can be passed between yam instances, inside Z or somewhere else, and no blocking would happen under Z. Fibers park but do not block, which is cooperative, unlike pre-emptive blocking with native threads.

Now let's look at another context named Y for the thorn runtime, written portably to not understand how a host environment works. There's an environment object E, containing a vtable among other things (a struct of function pointers to you), where things the env knows how to do go, that are outside the closed world known by yam under thorn. The quill compiler rewrites code intended to run as yam coroutines, and any function called that is not explicitly white-listed is a compile-time error. Functions in the thorn library are white-listed, as well as env functions explicitly listed in the environment E provided when a runtime instance of yam is created. (An evil dev named Bo may recompile thorn with more white-listed symbols treated as if part of the thorn library; works fine, but it's Bo's problem, not Wil's.) Env functions fall into two classes: inside and outside thorn (Iy and Oy for brevity). Each Iy function marked inside is understood as fast and non-blocking, and can be called in situ wherever it occurs, indirecting through the env vtable; so the env implements Iy calls, but they can be called locally while the yam scheduler is running. Each Oy function marked outside is understood to be potentially slow or blocking, and thus uncallable while the yam scheduler is running. To "call" them, the compiler emits code to queue the job in an env list for outbound Oy requests, to be serviced by the host environment, but only after the yam scheduler returns (yields) to the host environment. The job hangs forever if the env does not service it. The outbound Oy queue tells the env which running jobs are now blocked waiting to have outside calls performed. After an env has serviced them, it's unnecessary to queue them back into yam, because they are already parked in a wait-state queue, and merely need to be made runnable to "return" from an outside call.

Under Z, outside calls handled by the env would be handled by async messaging, and the zam scheduler, not threads. Outside calls would never block as long as Z forbids such blocking calls; it's not complex at all. If for some reason a call would block, and Z knew this and objected, it could kill the job's lane, implementing a fault terminating the lightweight process. But there's no reason why fibers running under Z would link against api that could block, since Z has to whitelist everything quill compiles into Z's jobs running under yam.

Now let's look at another context named X, or perhaps T, for Ty's command shell running in Ty's standalone daemon, which has another copy of thorn and perhaps a completely different set of yam jobs. Wil gives Ty the job of writing a daemon with a shell language, so Wil can write tests. Because this isn't under Z, there's no async i/o architecture and no network stack, so Ty is going to use epoll for connections and a thread pool to service any blocking calls made in fibers Ty runs in his daemon's local yam runtime. Wil has a shell prototype named ysh in thorn, but Ty has extended it, adding stuff specific to his daemon, and Ty's executable is named shy, but usually Ty uses shy as a verb, to express sending inter-process shell commands when talking to a different shell (ysh) running inside a Z context. As much as possible, Ty uses non-blocking interfaces, so jobs and lanes don't park waiting for async replies to outside env calls. If Ty's thread pool contains N threads, and each services one blocking job call at once, this means at most N jobs can be getting a blocking call serviced concurrently. Any more than N blocking calls at once sit in the thread-safe queue of jobs waiting for worker-thread service. For this reason, Ty's daemon can only do a few dozen things at once where Z might be able to do thousands, because Z doesn't use threads.

In short, context Y doesn't give a damn how the env works, or how outside calls get serviced. Context Z allows no outside call using blocking api. Context T in Ty's daemon is the wild wild west where anything goes, and both threads and blocking api are fine. But if Ty relies too much on blocking interfaces, Wil won't be able to stress test code in Z as well, because Ty's shell in his daemon won't be able to generate as much load as Z can handle. (Ty just says: for more load, just run more daemons, on more ports, from more machines.)

However, it is absolutely not the case that Z needs thread pools. And the design of yam has nothing to do with thread pools per se. A thread pool is a trivial undertaking -- the work of an afternoon, including debugging. Anyone who doesn't get thread pools is a junior dev in the backend C and C++ world. Suggesting a senior dev has a problem that could be solved if they only knew how to get a thread pool in play is the same as calling them a moron. Though irritating, instead of getting insulted, a proper reaction is to think: is this trolling? Was I that unclear? Does this person think everyone is a dolt? A little bit of all of the above? Another reasonable reaction is to plan on writing less documentation, because no good deed will go unpunished, so the more you say, the worse tech support questions are likely to get someday.

Single Threaded?

I'm just trying to be helpful, but I don't know much about you, so it's difficult to know what you do or do not know, so I tend to err on the side of clarity. I could just assume you know everything, but then there would be no point in your posting a question? I am also not sure I agree that threading is as easy a problem as you suggest, but that's probably another discussion.

It's hard for me to understand the problem the way you're writing about Zs and Ys and yams, jams, and zams, but that's probably my problem, not yours. In any case, first I need to understand what it is you are explaining.

So now it just sounds like Z is a simple event model (one thread, only async non-blocking IO), just like JavaScript? Is that right? The problem being that each async operation needs to be passed a continuation function, which makes the code difficult to read? The error/continuation monad is definitely applicable here, and I have implemented one in JavaScript, but you can't do that in C, so it would need a source-to-source transformation. It sounds like the host env is free to use threads or processes to hand off the blocking calls.

I think you misunderstood my suggestion about a thread pool. Normally thread pools are used to hide blocking calls, but that is not necessarily the case. A thread pool can be used with non-blocking IO (which is what I was thinking about), so that the event-based system can make use of multiple CPU cores to reduce latency. Each thread never blocks, but there is still more than one thread running, where the pool size is tuned to the CPU cores in use (so on a quad-core with hyperthreading you might use 8 threads). Different non-blocking modules might be running in parallel on different threads. In fact the thread could change after any non-blocking call.

Let's say you have a fibre like this:

- read packet from network
- when read has completed do:
- filter packet
- write packet to disk
- write packet to network
- when both writes have completed free packet

The thread that continues from the 'when' does not need to be the same one that called the read or write.

The nub of McCusker's idea

Seems to me, the nub of McCusker's idea is contained in his response titled "containing sprawl by partitioning into disjoint sub-programs". As best as I can tell, he is looking for (or rather, thinking about) a tool that will reorganize a large interdependent program into a more modular, more manageable (and perhaps more parallelizable?) one. If this is what he wants, one suggestion is to manually decorate the code to help along any automated tool. The idea is to automate what is easy or highly repetitive and manually do what is hard. But it is possible I don't understand what he is talking about either (the narrative style doesn't work for me).


Rewriting the above description like this might help:

read-packet-from-network | filter-packet | tee >(write-packet-to-disk) | write-packet-to-network

But this may make no sense if 'read' always waits for a full buffer to be read before passing it on. In the above example the 'when' waiting for the read could be just waiting for the DMA on a single packet to complete.

I am not sure which model is preferable: one like the above, where you consider a buffer at a time, through what looks like a thread (but is actually event based, with the remaining statements passed as a continuation to each statement), or a pipe model, where each 'process' would contain a loop.

Personally I think the 'virtual threading' looks better, where the code is written as if it were running in a thread but it is actually event/continuation based.

local areas of single-threaded, nested in a lot of layers

I still think we're done for this topic, but improving clarity hurts little. You should assume I know everything unless I ask you to explain something. In some cases I don't know, but also don't care (like if you say anything about Haskell), or feel a direction is not productive so knowing is irrelevant. The only questions I have asked were rhetorical. This isn't Stackoverflow. I only posted to have discussion, albeit about something probably not worthwhile to attempt because of complexity, requiring walls of text to get anywhere.

Z as mentioned was written by a large number of C programmers over a lot of years, and reading it is a roller coaster of interacting library calls verging on spaghetti code -- so large it's infeasible to pull all the code outside a full build system. Thus an independent index is needed to go find definitions of symbols you see (and often you find hundreds to thousands of references across dozens of different directories in separate sub-projects, many folded into the same process). You can hear something exists and be unable to find it, unless a native guide takes you there and points at it; that's the norm rather than the exception. It started out as one Z process per CPU. Then a tiger team judged N threads per CPU was the best use of multi-core, and moved all the globals (lots of them) into thread-local storage, so each Z "process" is actually just one of N threads in the same address space. Each Z instance is single-threaded. Provisioning accounts for all the resources, but with slop factors to permit dynamic allocation despite mostly static allocation. The coding style guide is to write all new code as if Z is still one single-threaded process. No addition of threads can pass review. (And if it did, it would fight with and disrupt what exists, which is why it won't pass.) Z's complexity is hard to summarize, because it does everything. If there's a way to communicate or organize control flow, it does -- all of them, as far as I know.

Here and there an outside daemon exists for control-plane stuff that doesn't suck up enough resources to interfere, so need not follow Z rules, using every other way to organize control flow and communicate that Z skips. The concept of limited palette seems alien. You can find Z control-plane dataflow that's impossible to grasp without following a chain of events, RPC, messages, object-hierarchies, finite state machines, timer-based callbacks, and format transformations suiting each receiving layer. (In a daemon you can find what look like exercises to use all known C++ features in dispatching a single piece of information as it wends through interfaces.) They pay me to deal with it, not to like it. You have to go where there's system programming to do, if you want to stay out of ridiculous startups.

There's a lot of ways to do things. Nothing I have suggested doing changes the result, only organization. I came up with another way of doing fibers, where one process can run large numbers of green threads and green processes in one thread of a single-threaded system, when it's necessary to embed a server inside another without interfering, under a device api (call to give cycles, queue in requests and queue out responses). There's nothing new about it, really, and it's pretty complex. But it's complex in a way that helps limit how complex something else can get, so instead of spending two or four years writing and debugging something, it might take only six months, using a rational system with a limited palette but excellent introspection, and the ability to lace in new monitoring and debug code at compile time. But what I really wanted to talk about was the Unix way of doing things in small pieces, and about loose coupling with namespaces and virtual file systems, and using nice programming language technology to generate stuff as needed. But instead I'm having a ridiculous exchange about threads like a question board for neophyte programmers. So I should recognize my mistake and move on.

Framing the discussion.

Perhaps the discussion could have been framed better? I find the following very helpful:

"What I really wanted to talk about was the Unix way of doing things in small pieces, and about loose coupling with namespaces and virtual file systems, and using nice programming language technology to generate stuff as needed."

The problem I find with long chunks of text is that it makes it seem as if any topic mentioned is open for discussion, even things that might have been meant as an aside.

I think managing complexity in large projects is a very important topic, but it's a little hard to understand the restrictions placed (I am sure with very good reason). In the end it could be that very few people actually have the same problems? Maybe other similar software moved to threads long ago, or perhaps are free to adopt other solutions, like Ericsson's use of Erlang for its routers?

Loose coupling also seems interesting. Isn't that what TCL is for? I have seen plenty of (older) C unix programs that included a TCL shell. It used to be a common pattern, back when I worked on systems running SunOS, to write C code that extended the TCL interpreter with all the application's functions, and you would then be able to script the application in TCL.

maybe we could frame this by talking about the unix way

I should adopt the common strategy these days of framing discussion so everyone else sounds as dumb as possible until they prove otherwise. But it's not a polite mid-western upbringing that stops me; rather it's believing cut-throat zero-sum goals are actually destructive, and that treating others as equals is inherently better in results for both individuals and groups.

I did say they use every way to communicate, didn't I? There are TCL scripts (and Gomer scripts), as well as Python, Perl, and Ruby scripts -- just not in the Z process. I had a long thing to say about running scripts in Z here, which you haven't earned by being constructive, polite, and gracious in assuming others have heard of things like TCL, though not yet mentioned.

[Edit: I did say I'm done. But we can exchange useless fire if you want, if I needn't exert myself.]

I agree.

I agree with your attitude towards zero sum goals, so I am not really sure what the issue is?

I didn't say you hadn't heard of TCL, I just recalled some personal experience with it. You could have said "Yes I remember TCL too... etc..". I find it hard to believe that we have no common ground in this discussion.

we can look for common ground

By preference I like to start conversations abstractly, by constructing a vision of what might be possible, then exploring details and grounding to reality only as necessary. But this never works. I'm always asked to give a concrete example illustrating what I mean for context. Then the whole conversation becomes about the example, where the crazy part occurs when part of the chain of example givens are denied as non-existent. (Suppose you want to talk about improving things, and you are forced to provide an example -- say, building a better windshield wiper. Something weird will come up: they deny car windshields exist, or say cars are evil, or that machinery is destroying man's humanity. You can't talk about windshield wipers, even as arbitrary example, because saying "car" already blew up the conversation. You can't get anywhere near the point without a miracle.)

Here's the conversation I want to have:

There's something interesting going on with simple unix utils you can compose in pleasant ways. Why can't parts of a single program in one process be like that too? Why aren't sub-routines structured more like cooperating concurrent processes, and less like giant tree-shaped arithmetic expressions with self-modifying state?

But if you say that, an expected response is: what do you mean? Can you give me an example? Under what circumstances would this matter? I wrote a 100 line program, and this issue of function call tree being complex does not seem like a problem, so maybe the problem is with you (what a loser)?

If you present a good example of something awful, where another approach would sound better, then the broken part was the example: you should work someplace where every problem can be solved with 100 line programs.

However, I want to do the new thing even when things are not as broken. I want everything mixed together, instead of reserving one way of doing things when you do OS command line stuff, and a completely different way when writing a process. If that's what you want to talk about too, we have common ground.

I see in you a strong taste for curating best of breed solutions, so you can cite the closest match based on pattern matching. This is very boring to me. And it doesn't reply to the vision: imagine this thing and explore the good part of it.


What do you think about the Powershell approach (doing everything the same way but making the shell more like C#, instead of making the language more like the OS/Shell)?

I agree with the idea that we can improve both the command line and process programmer models. I need to think a bit about how it could work.

Are you interested in discussing how it might be implemented in 'C', or are you more interested in discussing what the environment might be like to use? Do you think that programs for this new environment would still be written in C?

glad you're thinking about it

My answer borrows from Thomas Lord's coupling typology in features discouraging sprawl:

For example, we could decide that function application always describes a tight coupling. Or if two components share memory, that is always a tight coupling.

On the other hand, pipelining outputs and inputs is a loose coupling.

Armed with that classification of composition techniques, we can meaningfully talk about the transitive closures of tightly coupled components. If A is tightly coupled to B which is tightly coupled to C, together they are a tightly coupled assembly of components A, B, and C.

Tighter coupling is stronger dependency, while looser coupling is weaker dependency, meaning it's easier to replace things with loose coupling in general. Tight coupling favors monolithic systems, appealing to there-can-be-only-one taste in tech: majestic when viewed favorably, but perhaps despotic when viewed critically since it typically has the goal of captive audience. (Where are you going? It's perfect here.)

A quick read on Wikipedia about Powershell gives me a shallow snap judgment:

What do you think about the Powershell approach (doing everything the same way but making the shell more like C#, instead of making the language more like the OS/Shell)?

I like the idea of making the line between shell and app more fluid, but Powershell seems to move the line in the wrong direction, if you prefer loose coupling. To make a shell more like an imperative language, used in apps, is tighter coupling. I would rather make parts of apps more like a shell, with looser coupling.

What's more, I don't want the shell language used in the app to be required (at all) to match the one in a host environment. For example, if you required an embedded scripting language to do everything one normally does on a Linux command line, all you did was require embedded Linux with tight coupling to the host env. It's not portable, unless you have a monolithic host env to install.

(Every vendor of monolithic solutions wants to frame other monolithic solutions as the only alternative, so a user has only a choice of evils, not actual freedom. Note I have avoided characterizing anyone as monolithic or evil here. I'm more interested in the framing than the evil, since evil is everywhere, like smog, so what can you do. My slant is always going to be: more alternatives is better.)

I agree with the idea that we can improve both the command line and process programmer models. I need to think a bit about how it could work.

That's a good result from my view, since I really only aim to nudge folks to think when I post. When I say how to do ABC, I don't want others to do that; instead I want them to think, "That makes me realize how to XYZ, which seems interesting." Whatever comes to mind may be useful to discuss, especially if it has loose ends or seems strange.

Are you interested in discussing how it might be implemented in 'C', or are you more interested in discussing what the environment might be like to use? Do you think that programs for this new environment would still be written in C?

I've given a lot of thought to C implementation, because I had assumed I was going to do it; but I've been stuck a long time in the state of "yeah that would work, but whatever". Talking about it could be interesting, but agonizing is a common result too. I was surprised when I thought about the resulting environment, because it had potential to be fairly strange. (When I first started thinking about a shell, it was just boring as a need. But then it started being as interesting by itself as anything else, maybe even more. Not sure what to make of that.)

It doesn't need to be in C. For myself, stuck working in C, I need something written almost exclusively in C if I want to get to use it in something like Z. But so far any language seems like it would work if one uses trampolines and cactus stacks (also known as spaghetti stacks). Whatever your environment considers the first class language would work as the substrate. Tools are easier to write in a high level language, so I'd be inclined to write a Lisp or Smalltalk right away if working in C at the bottom, because they are very low cost from an individual dev's perspective to get rolling. (If you wanted to appeal to masses, something more like HyperTalk would appeal better, but the ambiguity irritates me as a dev.)

VHDL, Prolog, and IDLs

A quick thought that occurred to me this morning, is that there is some similarity to VHDL. You write 'processes' interconnected with 'signals'. Inside a process you can write procedural code, outside is just wiring.

Another thought is this has a similar architecture to my language project (although I am not looking at multiple languages for each role). A low level imperative language for writing components, with a Prolog-like logic language to bind them together, where values in the Prolog language represent types in the component language. Here the components are tightly coupled internally, but could be loosely coupled externally. The Prolog database has similarities to a file system (in this case flat, but hierarchical naming would be easy to add with a path separator). An interesting difference is that arguments (which represent the types of the component arguments) are included in the name lookup, and there's unification of course.

It's clear that the 'shell' should use dynamic binding, whilst the components use static. I am not so sure that the Unix command line as a text string, and character pipes, is the best interface though. What about typed arguments and typed streams (maybe protocols)? You could then have a language independent IDL, like OMG's CORBA IDL or Mozilla's XPIDL, to define the bindings. This seems to be moving away from simple Unix, and more towards the PowerShell approach. I don't like plain text argument parsing, though, so I am not sure how else it could be done.

manual or auto-wiring loosely coupled components

In your brief project mention, I don't yet see a cooperating concurrent process part resembling unix tools. Unless, maybe, you want to publish values in a db acting as a tuple space, for other components to consume in producing other values. That would be loosely coupled, but I'm not picturing command lines with pipes yet. Tools connected anonymously, because a shell told them to connect, have looser coupling; tools that find each other themselves (say by name) have tighter coupling, though of a dynamic rather than static sort. I only worked with one guy, twenty some years ago, who talked about Prolog much.

Text command lines and character pipes aren't required for the model; they just seem like loosest coupling. You can stream messages instead of characters, which is definitely better for control-plane behavior. (Orders arriving by wire aren't clear until complete; even unix tools get the entire command line at once, instead of starting before a command line is done.)

Your last paragraph reads like a gentle slippery slope argument that a unix component way is bad by implication, without your saying it. (As a rhetorical tactic it looks like, "I want to agree, but I can't seem to make your model work.") You can stack related things with different entry points, so a text-based entry point funnels into a chunky typed-object entry point. It's cheap to do, especially if it requires only another dynamic activation frame allocated. Basically, you can have a continuum of binding mechanisms, varying from loose to tight, then invoke the right one using more than one technique. If you statically know the right flavor, use a different path name; if you don't then use types or negotiation.

hasn't this been studied?

You mentioned tuple spaces and messaging so it seems you already know about stuff. LabView. Dataflow. GPARS. Erlang. OTP. Anything from Esterel. FRP. RDP. Etc. Might you be willing & able to say which of the things that already exist seem most related to your armchair architecture? Just off the cuff, to see if it helps orient me/others at all. Not to have it preclude or mislead or wall off any avenues.

does this help you relate ideas here to existing work?

[While I know about some stuff, my exposure is accidental unless I go looking. I've never been in the racket of knowing stuff as a role. (I like a low stress role of over-qualified plumber and don't expect to change.) Languages and storage have been an interest since about 1990, so I pay attention to tech stories describing ways of organizing control and data flow, and how bytes move around in different models. In particular, I'm especially interested in how scope, namespaces, and binding work, and how this interacts with local process storage: the runtime of how an app works, or the infrastructure it uses. Around 1992 I interviewed one of the guys who worked on the Linda runtime and we talked slightly about tuple spaces. And I worked on a graphical dataflow language around 1993. It's too long a story to say everything I ever did related to language runtime models. I wrote a small Lisp and a Smalltalk in the early 90's (and a hybrid of both models), which is why I have a handle on them. The only hat I like to wear is: I'm just a developer, but one who deals with some edgy hard stuff because it's fun. Is that enough provisos? I'm not an academic or a guru, just find some topics more interesting than others. I'm not sure how it evolves into thinking about a design for years -- partly nothing better to do. I have no interest in type theory whatsoever.]

Since I'm neither academic nor researcher, evaluating past work isn't part of my process. Never heard of GPARS or Esterel, but I see enough about them in a search now that I might see what you want compared. Okay, I'll try to give a bird's eye view of features relevant to things you listed. Keep in mind I've been planning to use C, which is a crazy dangerous tool to use in something fairly complex, unless you pay a good tax in watching for runtime problems to quash them. I'd expect my model to appeal mainly to folks already doing something that's crazy to do in C, but they are anyway, despite the pain. A similar model would work in other languages, but I think people should keep doing whatever is working.

A main idea is putting as many independently scheduled concurrent things in one address space as possible. This part sounds like Erlang, but Erlang has a nice immutable data model, and gets much of its robustness and stability from that. Doing something similar in C shouldn't work, unless one or more counter-balancing forces of order help quell the chaos: heavy dynamic tag use and acute intolerance of corruption via asserts and other runtime checks. Erlang calls each independent code fragment a process, and each process has a pid identity, so you can send them messages via pid. Handling a message in Erlang is a bit like ML-style pattern matching: a big cond switching on cases by type. (At least that's my mental image, which could be wrong.)

I call each executable entity a job, not terribly different from a continuation in Scheme. One field in a job is the continuation that runs when a job is next scheduled. Each job has a stack, in the form of a linked list of activation frames, where all references are done by refcounting, and where all refcounts are managed solely by handles. (Refcount is defined to equal the number of handles holding a ref, and no other refs can exist. Handles support auditing: another force to quell chaos.) At minimum, one frame is enough stack, so the smallest job can be pretty small. A job has sufficient identity to awaken by ID when it is parked waiting for an async reply; every ID has a generation number folded into it, including those of file descriptors, so you can tell whether a stale ID is used accidentally (another anti-chaos force). The main role of a job is to act like a thread, but actually each job is one fiber in a single-threaded runtime instance. So if you used a tool to port a Unix app to thorn, you would map a thread to a job, because that's the intended role in a code/data graph: green thread.

The entity imitating a process is a lane (treat this name as arbitrary), and instead of a pid it has a lid, the lane identity, which is normally going to be a counter plus generation number. It's like a native OS process in that it's a collection of threads in the form of jobs, and its identity lets you do things to it you'd expect to do to a Linux process or an Erlang process: send signals or messages. If you ported one Unix app to thorn, it would become a lane, and each thread could become a job in that lane. Each lane belongs to a group, like a process group, sharing resource accounting among other things. (Use too much memory and every lane in the group experiences first a soft limit, then a hard limit, whose effects vary depending on policy, escalating to killing lanes as a last resort.) Killing a lane cleans up all resources by freeing references, incrementally via async processing so there can't be a high latency stall, and this aims to be done perfectly because the quill compiler generates runtime struct introspection meta-data mapping where everything is, including handles holding refs. Tests will aim to establish that tear-down works perfectly, because killing things is how you follow an Erlang crash-only model.

Each lane has a mailbox, and a signal port and mask. One lane job is designated as the one that must be awakened when a signal or message arrives. So if you signal a lane, it awakens the parked job listening for out-of-band inputs. You could turn a signal into a message and put it in the mailbox, to have exactly one such mechanism, but why bother? Each lane has a place to store a command line in argc/argv form, if there was a command line. And there's a standard input and standard output stream, which use immutable iovec fragment vectors with copy-on-write for editing (so many lanes can share and edit the same stream without ever interfering). Zero-copy is the default, unless you go to the trouble of cloning content.

When you want something that looks like a unix tool in a unix command line, the standard input and output streams come into play, and a pipeline puts every lane within one command in the same group (so handing off refcounted buffers from one to another causes zero net change in memory used, with respect to accounting limits). Credit-based flow control must occur to prevent awkward amounts of buffer accumulation in pipeline chains, and this can be coordinated with distributed credit-based flow control involving other lanes in the same process, or remote lanes, or completely foreign entities that merely negotiate.

When you want something that looks like an agent-based system (see I finally got back to GPars and Esterel), you use the mailbox for messages instead, or in addition to standard i/o streams. If for some reason a lane wanted more than one mailbox, it can put as many message queues into an activation frame used by the main job as it wants, then move inbox messages into the proper secondary box for handling, awakening any jobs parked on that queue's condition variable.

All synchronization mechanisms are simple, single-threaded mechanisms of the most obvious sort -- a condition variable has a queue of jobs parked on it -- and anything you normally do in a thread-based system will be present in a form working on jobs. Few thread-safe data structures are needed: mainly thread-safe queues for transmitting information between runtime instances, as well as back and forth to worker thread pools, if the host environment happens to service outside calls from parked jobs that way, when they need to perform blocking operations.

The quill compiler is just a C rewriter, and you use any normal C compiler you like afterwards, to generate object code and link everything together. A first version of quill would be a toy with respect to some scaling issues involving organization of code trees, because I will want every scrap of code present in a tarball used as a database. Because the C pre-processor is also a C rewriter, quill doesn't like it, and I had to think a lot about what to do in conflict scenarios. The idea is that you will still keep the same macros in the source, so they will operate when the final C compiler runs. So this means when quill wants to rewrite something inside the output of a macro, there's a problem -- it's a compile error unless you tell quill to inline it first and then rewrite. Basically quill will have a full pre-processor itself, and will keep track of exactly where everything came from, down to the exact byte offset of each character pulled from headers and elsewhere in the tarball database. This means a token created by joining two parts via ## will know where every byte came from: that token format will be an iovec array. One goal is to have every error show you exactly where any offending piece of source code originates. (Reports in the format of syntax-colored markup would be easy to generate, perhaps browse-able via inspector putting compile results into a virtual file system location related to inputs.)

As this is long already, I should save remarks about virtual file system for later. There would be one at compile time, when quill does fiber code rewriting. And optionally there's another one at runtime, as part of inter-lane communication and staging of intermediary results. In both cases parts of the host environment file system could be mounted on directory paths to effect user desired policy for content export. They would only become related to each other directly if the compiler ran inside as a lane and the compile time and runtime overlapped, at least during part of a staged evolution of code deployment.

excuse me, my brain is full

cool! that was a lot of good stuff.

half-serious: how about you write it in something that isn't C but interfaces with C? that way everybody wins more ;-). ATS, baby.

something, something, something, dark side

(Does ATS mean at-the-store?) As noted in a post far below, only fast and limited leaf calls into C would work. There isn't any way to interface in general with C code that hasn't been CPS-transformed, while integrating with green threads that forbid blocking and/or high latency. I suspect you mean any sort of C, which is impossible while hitting objectives.

You might instead be suggesting a language used on top by devs need not be C. Yes, that's right, as long as it compiles to C that has been transformed to continuation passing style. My preference is a Scheme/Smalltalk hybrid, so that's what I'd write next myself. Most of the focus on C so far is caused by big piles of legacy code in C, causing one to ask: but what do I do with all this? The answer: run it through the rewriter. Then it's still C, roughly isomorphic in features, but now runs under an async engine (and can have profiling, analysis, and reporting code laced into it at the same time). Porting would still involve headaches, but manageable ones.

But I suspect you just mean few folks like C, and something similar should be done in a popular language. (Would the reward be approval? Who cares what people think? :-) A lot of popular languages are ones easy to use, requiring less knowledge, making it easier to blow your foot off if something really complex happens. Adding concurrent green threads is just more rope to hang yourself, if you were hanging yourself regularly before. So it should go boom, unless wrapped up in something removing most degrees of freedom in the direction of safety.

What I'm suggesting is only an out-and-out improvement when folks are already doing something far more scary and confusing than a bunch of cooperating green threads. (Something that makes a reasonable dev say, "Oh my dear god," not that a reasonable dev would be religious. You may think there are no such things, but you'd be wrong.)

Loosely coupled, strongly typed.

Let's leave my project aside; I thought I saw some similarities, but I haven't really thought enough about the parallelism side of things.

I think I would want typed streams, and a typed command line, so that the API for a program could simply be a function. Let's leave streams/pipes to one side for the moment.

I could have:

int add(int a, int b) { ... }

and on the command line type:

add 1 2

Doesn't this come back to having a database of all functions? The functions could be in a hierarchical namespace, like a filesystem:

/math/add 1 2

Data files could be like structs (and be typed in the metadata). You could then do:

/data/c = /math/add /data/a /data/b

as a shell command. This seems fairly loosely coupled to me, even if it is typed?

like the filesystem examples

I enjoy this reply of yours quite a bit, in case it doesn't show otherwise.

I think I would want typed streams, and a typed command line

Once you have any command line, you can launch another, typed the way you want. Or you can change modes in a current one. But unless that pays off another way, you just made one tool gratuitously more complex. Being able to customize your leverage is good. Types are not bad, I just don't care about theory more complex than dynamic-language-style strong types and proper contravariance and covariance handling.

Doesn't this come back to having a database of all functions?

Yes, though scope rules don't require global visibility. A typical shell has search paths, but you could require explicitly qualified paths instead, like relative or absolute URLs.

/math/add 1 2

I like that example better.

Data files could be like structs (and be typed in the metadata).

If you have a virtual file system, the difference between "file" and something in memory is slightly arbitrary.

I started thinking about a virtual file system in terms of supporting URLs for an embedded http inspector revealing daemon state, where accessing some paths would be understood as executing code to populate that part of the virtual file system. (Gosh, I'm going to need an abbreviation. I don't like VFS much, but it's not horrible.) For example, you could cause a compilation unit to be compiled by accessing a specific path, and if it was already compiled and no dependencies had been changed, you might just get a cached copy from last time (with indication to this effect).

You could go to your secure shell connection, type in a command line defining new behavior in the VFS as a script, perhaps compiled, then when you went to a browser to get a relevant URL, it would fire the new code you just arranged. I had a long digressing train of thought about "calling" a path with another path supplied as argument specifying what you wanted to happen. It seems like it would be pretty easy to setup event driven responsive functional programming behavior.


I think I understand now the difficulty you've been making reference to, because it is the same difficulty I encounter myself.
Several months ago I wanted to talk to a programmer I met about how stacks were interesting: you could hand one to a component expecting only a stack of n elements, the component could modify the stack by pushing and popping things, and the whole stack would be modified appropriately. (Basically it's a bidirectional view of the stack.) And did he know of any other structures that allowed a bidirectional view so trivially?

He responded that I should learn python because python was so flexible that you could do lots of things with it.

He responded that I should

He responded that I should learn python because python was so flexible that you could do lots of things with it.

We have a long way to go on education.

try again with someone else

I thought that might sound familiar to a lot of folks. You describe something to get feedback, but the description doesn't work, because it's so dominated by a listener's view that they seem to hear another, almost unrecognizable idea. Most upsetting is when an early proviso is denied. You want to say: you know how when A is the situation and B occurs, and you try C but outcome D occurs a lot that still echoes B, and you think maybe E would help ... And they just say, "No, B is false. That never happens. You're mad." If you try to explain that C and D seem to confirm B is actually the case, it makes you mister psycho in their eyes.

People don't mean harm by not listening to you. The only good recipe I know for good conversations is to talk to a lot of smart people, and sometimes it will gel, though usually not. Might as well take what life gives you, since frustration doesn't help.

But I've seen people who

But I've seen people who don't seem to have this trouble. For example Guy Steele seemed to have very little difficulty in communicating why static scope is cool. I think it might be because he gave an example where dynamic scope caused a problem, and then gave reasons why all the other possible solutions were inadequate in that example.
Once there was at least one case that required readers to acknowledge the concept, he was able to bring up a number of cases where static scope was merely convenient, not mandatory. (Michael Faraday did it similarly in his lecture "The Chemical History of a Candle".)

So I think when people ask for an example, they are hoping for a case where this mechanism is clearly the best answer, so that they can hold onto the idea as something different than all the other things they know.

A lot of the time I can't give this example, because I want to talk about similarities between several techniques and what principles lie behind their differences, if we can find them.

"I can't give this example

because nobody cares."

There's sorta 3 categories of reactions I see:

(1) Totally not agreeing that it is a problem of design or anything, and insisting that the developer simply has to avoid making the mistake. (null pointers, security, causing deadlock, misconstruing specs, whatever.) To me this is either being thick/dense, or being like la la la I'm not listening.

(2) Seeing my point and even seeing the standard solutions (e.g. using Option/Maybe in the case of the possible NPE) but then nobody really getting onto the bandwagon because it would require boiling the ocean that is the codebase. I mean, who *really* wants to start putting in all those pattern matching statements? And deadlock is so easy to fix (once it happens) from the Java log output. And since we've never used such witchcraft before who knows what the unknown unknown consequences will be. Better to just live & die by the NPE.

(3) The code is already written from the vantage point / style where all these things have been addressed. So they are already using Option/Maybe or non-Nullable types with a static/dynamic model checker, or dataflow to avoid deadlock, etc.

I have not really had the (a) self-promotional abilities (which you can just read more accurately as: smarts) and (b) dumb good luck to ever work in a type 3 environment much at all ever.

but... but... it *is*!

machinery is destroying man's humanity


small price for the singularity

Sometimes objections are plausible, so I included one. :-) Then you say, "Let's pretend machines only constrict humanity, so we can talk about windshield wipers hypothetically, without destroying anyone's humanity." Not many folks like thinking about counterfactuals though, so it's kicking and screaming the whole way.

Yesterday's Blog post seems relevant.

TL;DR: mad science about my ongoing and rather quixotic quest to create a near-universal translation-target for programming languages. It seems relevant because if there's ever going to be a universal repository of code, all mutually linkable and interoperable, that we can meaningfully search and use and index and curate, it's going to have to exist in a superset of all languages of deliberately constrained semantics.

metadata? that I can roundtrip back and forth between the bytecode and the lisp dialect without loss of meaning.

The problem I’m trying to solve is that when a compiler lowers the level of abstraction of a language in order to express it in terms of machine-level representation, it necessarily makes arbitrary choices which hide the abstractions it’s translating.

I'd like a generic solution to this for all translators, not just the one you are working on. So I think a standard for metadata that compilers could emit and then could be kept around. Just like debug symbols are.


Me too.

Boy, I'd like a generic solution to that for all translators too, but I seem to only have one lifetime to work on it. :-)

just the format

Sorry, yeah.

I know that until something gets used "n" times one doesn't know whether the abstraction is right, but it could be nice to try to avoid as much as possible "hard coded" aspects that are just for your tool chain / context, or maybe blog about what seems to you to be specific vs. generalizable?

Like, you are forging ahead with this, maybe in your copious spare time docs / thoughts would be good to post for others to leverage off of.


If it's language agnostic then it makes debugging impossible

Consider the transform to CPS style - you get a program so mangled that it's no longer human-readable.

As a hobby, I'm working on a language that I plan to have better support for metaprogramming than existing languages. It's still pie-in-the-sky though.

Clearly if you want to be able to debug a program that's gone through transform phases, you need a debugger that can map back to the original source, not just the transformed source.

save everything, granular spans (not just lines), and map

Clearly if you want to be able to debug a program that's gone through transform phases, you need a debugger that can map back to the original source, not just the transformed source.

Yes, so you would want to keep all the original source, plus all the transformed result, along with a clear mapping that's browsable, preferably using an HTTP server embedded in whatever manages the store holding all that. And you'd want to be able to attach scripts to URLs, to generate new functional transformations on demand. Maintaining the data isn't so hard (to me, since I'm a backend specialist) while a nice UI in a frontend is hard (to me, because I don't do that).

However, you could get a lot of mileage out of having a headless daemon model lots of stuff, with a client UI in another process that browses and communicates with the daemon, both running locally so i/o between them is high bandwidth and low latency. Then the UI couldn't crash the backend in another process (unless a sloppy mistake entangled them).

Debugging through a web server?

I've never used a system like that... All of the debugging I've done was either inside a fancy IDE, or some simpler breakpoint and disassembler type debugger. Most of both of those supported remote debugging though.

While I suppose it would be nice to be able to leverage and not have to write a whole IDE, I'm a bit sceptical about running a debugger as a web service, it sounds resource-expensive and slow.

I don't know what a headless daemon model is.

I better go to bed.

I'm tired and not reading well.

I guess all that talk about low latency high bandwidth daemon means that you keep the debugger in a separate local process. Good idea, though that's usually separable into remote debugging too.

I don't understand the rest of your design, or what HTML is doing in it, other than giving you a display format.

through your own hand-written web server

I don't know what a headless daemon model is.

There's a casual generic meaning of daemon and a very specific technical one. The latter means a process that has disconnected so many ways of interacting with it that hurting it, on purpose or accidentally, is hard: generally closing all unused file descriptors, disabling signals, and escaping the influence of the parent process. Folks I know rarely go that far; it's actually convenient to run daemons in the foreground when debugging, for example. The casual generic meaning is that normal interaction is through connections to ports, and that there is no UI, or even a command line (except when debugging, if it seems a good idea). The casual sense is the idea of keeping state in a backend "daemon" process, then running other parts of an app in another process.

The headless part just means "doesn't have a UI" and is redundant with daemon.

While I suppose it would be nice to be able to leverage and not have to write a whole IDE, I'm a bit sceptical about running a debugger as a web service, it sounds resource-expensive and slow.

I probably have a funny take on web services, not liking them much, and never considering use of anyone else's pre-packaged tools. The idea I meant was that nothing stops any process from listening on a port and serving HTTP on it. The handling and processing of HTTP requests can be done as efficiently as anything else in your carefully honed process -- unless you commit to dumping in a big pile of sloppy crap other people write, over which one has no control.

If part of your runtime is an engine whose purpose in life is to efficiently handle lots of async i/o, then supporting a handful of local clients on the same machine -- from the developer running the local server as just another process -- won't have much impact at all, because there's almost no load. (Unless the request initiates a high cost script, spending arbitrary amounts of cycles.)

Suppose you write an awesome and efficient IDE in one process, with a gorgeous responsive UI. Now cut it in two and put one part in another process. If both halves have an engine that serves high volume low latency i/o, as a user you won't be able to tell it's two processes, except the one with no UI will be harder to kill, etc. The number of processes is an independent variable, it doesn't determine basic architecture much.

Expressiveness is both the friend and enemy of Generality.

That's all about programming language expressiveness as defined by Matthias Felleisen (whose name I hope I have spelled correctly).

His point about expressiveness was that if you have two languages A and B, and there exist programs in A that require nonlocal transformation to express in B, but there do not exist programs in B that require nonlocal transformations to express in A, then A is a more expressive language than B.

Continuation Passing Style is a global transformation, for example. So if you have a language in which some semantics are not expressible without going through a full CPS transformation and operating on the transformed code (the way you have to do to get implicit cooperative multitasking or certain other kinds of stack and thread guarantees in C) you're looking at semantics that could be supported more directly by some more expressive language. Your language that supports metaprogramming, for example, is going to have a lot of stuff in it that can't be expressed in C without global transformations, for exactly that same reason.

Having a debugger that can relate such a transformed program back to the source code in a comprehensible way is not clearly impossible, but it is also not clearly possible and, to the extent that it can be done at all, may depend on the users being "MUCH smarter than the average bear" as the saying goes.

Anyway, that's one reason I'm working on a translation-target dialect; the idea being that if I can come up with a language capable of expressing most things that other languages can express using strictly local transformations (my standard here is "using an isomorphic Abstract Syntax Tree") then it should be able to translate programs from other languages directly into it without loss of structure - IOW, with the abstractions from the source language intact rather than having been destroyed or obscured by necessary whole-program transformations.

So the program as expressed in the translation-target dialect becomes the (hopefully human-comprehensible if abstractions are successfully preserved) object that the debugger works on, and the translation-target runtime becomes a common runtime that allows code originally written in different languages to interoperate. The downside is that if people then edit the code based on the semantics available in the translation-target dialect, those changes are likely NOT to be expressible without whole-program transformations that render the program incomprehensible and its abstractions opaque when going back to the original source languages.

So... if I am able to successfully roundtrip between the debugger and the target dialect, that accomplishes something - but I don't believe in my ability to extend this capability of universally preserving semantics in a way that allows reconstructing a comprehensible program back into other languages where the semantics are more restricted.

It's sort of paradoxical; the more general semantics is easier to preserve because it is expressed by a language that doesn't require any further whole-program transformations to preserve it.

It's also sort of paradoxical (or maybe it's the same paradox) that given the power to cleanly express nearly any semantics, you can also express a complete semantic mess. The translation-target dialect can cheerfully express semantics that were mistakes in the original languages, like reified continuations, or that absolutely don't work together, like dynamic scope and fexprs, and can mash together routines from programs written in languages with wildly incompatible semantics - and it's up to the programmer to sort that mess out.

Geez, three hours and still compiling, on a 64Gbyte system with 8 CPUs. I get time to post on LtU, but still, I gotta make this system faster.

Actually, it's static scope and Fexpr that are a mess together

The original meaning of fexprs was just passing a quoted list. In a language with dynamically scoped variables, that meant you could just call eval and get the meaning, since there was only one environment, with the one exception of the parameters holding expressions - and the evaluator only needed two paths if it wasn't sure whether the function was a fexpr.

It's in statically scoped languages that fexprs are a mess, where you have to pass the environment as well as the expression. Languages like Kernel go so far that they evaluate nothing automatically and leave the creation and maintenance of environments up to the user - making compilation so hard that no one has managed even to make a fast interpreter.

Actually it's closures that make dynamic scope and fexprs a mess

First of all, thanks for getting me to reexamine something I thought I had fully understood. Progress in this project is often marked by noticing that self-evident "fully understood" truths are the result of unexamined assumptions.

As long as closures are transient or stateless, you're right that static (lexical) scope presents a more complex model than dynamic scope. This is surprising to me because I just haven't been thinking about models with transient or stateless closures (i.e., where the closures are always destroyed, stack-like, when the procedure returns).

Closures are the fundamental mathematical model underlying all abstractions involving memory allocation, including both stack or "auto" allocation and dynamic or "heap" memory management. Thinking about a language with transient-only closures means no heap allocation. That means no allocation beyond extending the current environment with transient stack frames, any of which can have zero or one child frame at a time while it exists and each of which is destroyed before its parent is extended with a different child frame. While limited, that isn't a useless model at all; it's very clean and quite capable, and suits such things as FORTH, and straightforward value-based pure-functional mathematics evaluation just fine. It's very easy to prove correct, and can be implemented to be blazing fast. And, as you point out, it means fexprs are no problem whatsoever with dynamic scope.

But if I'm going to include OO languages and languages that use heap allocation among my translation sources, I'm going to have to have persistent closures. Further, these persistent closures must have their own state. And that means that the environment at the moment the closure is created is the one that they extend, and because in dynamic environments the structure of that environment will differ depending on the call site, that means that fexprs in a dynamic-environment system create objects of differing structure when the procedures containing them (or the fexprs containing them) are called from different sites.

Further, because the objects created contain executable code that depends on bindings visible in the environment at the call site, and the calls are made at different times, the semantics of stateful procedures defined in the object's environment can depend on values of "globals" which vary at definition time. That gives you either semantics of procedures defined in your objects changing after the objects are created as the wider-scoped bindings they depend on change (if your bindings are resolved by reference), or objects created by the same call containing procedures with different semantics depending on the state of global (or parent) bindings at the moment of the call (if your bindings are resolved by value).

Anyway, modeling all allocation as instances of closures means that the interaction of closures with your scope and environment rules very rapidly sprouts special cases depending on the desired semantics of variables including but not limited to standard "auto" functions of the sort we usually think of as "stack" variables and the semantics of persistent functions and objects that we usually think of as "heap" variables.

I find lexically scoped bindings to be less confusing in general, and as an engineering matter I'm convinced that they scale better; it's less easy to anticipate varying conditions at call sites over a vast amount of code than it is to see what bindings have static scope at the definition site. I am also a supporter of alpha-renaming (or whatever equivalent discipline for retaining "hygiene") as a means of keeping binding changes at call sites from altering the behavior of macros and fexprs in unintended ways.

But my preferences are hardly relevant for purposes of the translation-target dialect; because there are languages that follow different scoping disciplines I need to allow for the creation of closures (and the evaluation of fexprs) which follow those rules. And because there are languages that don't have any alpha-renaming or hygiene discipline, I must respect that some semantics require its absence.

There is also this..

You don't need CPS to do fibers in C++ if you're willing to use a library that depends on non-standard parts of the implementation.

Boost has a library called "Coroutine" that is portable in a practical sense, because it's been ported to so many platforms. I think it depends on allocating new stacks on the heap, and messing with non-standard setjmp/longjmp records to do the switching.

Someone expanded that into a full cooperative threading library with fiber-safe mutexes and condition variables and the like. It hasn't been accepted into the official Boost libraries, but you can get it here: it's called Boost Fiber.

I'm thinking of forking that into something appropriate for running something with qualities similar to .NET or the JVM in pure C++ - allowing the illusion of normal threading to coexist with a parallel garbage collector, the point of the fibers being to keep the number of hardware threads down to the number of logical cores so that synchronization with the GC doesn't get too slow when the number of threads is huge. A design like that requires things that are usually hidden - not CPS, but safe points and special code for possibly-blocking calls.

and in C without C++ you get nothing from Boost

You are not following the non-blocking async part at all.


The original article didn't mention non-blocking async, it mentioned fibers implemented through continuation passing style.

You can simulate non-blocking async from fibers no matter how they're implemented, by putting in yields, which I mentioned as part of GC safe-points.

What I said about Boost was that the Coroutine library gives you the primitives you need to implement fibers WITHOUT transforming the program to continuation passing style. Then I linked to a library that implements that, which has not been accepted into the full Boost library.


I misread your title.

Yes, Boost is for C++ rather than C.

But since C is a subset of C++ you don't LOSE anything by writing your C programs in C++.

maybe someone will feel like talking about Boost, if not me

I'm pretty familiar with how easy it is to use C inside C++, since I used C++ professionally almost full time from 1990 to 2010. Two things stop me from using it much now. First is law written in stone at work, that "thou shall not use a C++ compiler in this particular production process", over which I have no control. Second is recent changes in C++, ensuring I can't ever know exactly what has happened to the runtime, when I don't want anything to happen I don't know about. I have no reason to dissuade folks from using it though.

(If there's a feature I really like in C++, I can parse a new extension to C in a compiler variant, that compiles to async C running in fibers. Code requiring the moral equivalent of exceptions can be accommodated, but not the behavior of destructors for RAII without adding implicit destructor on unwind support.)

I understand.

I do have a level of comfort with using pure C: you'll have to roll your own for a lot of things, but you'll always know what's going on.

The more I use modern C++ with its dependence on unreadable template tricks, the more I cringe. You can do almost anything in a C++ library, but a library that does amazing things takes a week or a month in Scheme, while in C++ it takes a room full of experts a year or two, and it won't do as much. And it's more likely to expose bugs in the three remaining C++ compilers...

And with something as primitive as C, the pressure for a preprocessor like the one you're contemplating is higher.

You could still copy the Boost Coroutine library into pure C, you know. setjmp/longjmp are C library calls, not C++.

C with templates

I think C with templates would be interesting.

C++ without objects?

The great thing about templates is that you can use them to define things like object systems that generate efficient code, so if you took objects out of C++ some clever programmer would put them back in ... and the result would show you why template metaprogramming is a bad idea: the library will be unreadable and produce error messages that make no sense; the template system will require that code using it have horrible syntax and odd limitations; and code using the system will compile slowly and exercise so many corners of the language that it exposes bugs in all of the compilers.

imitating templates with macros in C

During long walks this last week, I found myself designing C preprocessor macros imitating template features from C++, mostly because I want minimal generics support in C-only libraries just to reduce boilerplate; otherwise the same kind of collection requires a lot of boilerplate for each different type. I use a lot of hashmaps, vectors, and lists in C, and it would be a shame to write them more than once, or to make them unnecessarily type-unsafe. I can emit boilerplate from macros - either at normal compile time, or in an earlier pass generating the versions specialized to specific types. (I expect folks will note this provides no compile-time metaprogramming like C++ templates. Yes, that's not a goal ... just re-use of the same way of generating nearly identical code, but using different symbols with names specialized to new types.)

Generally I find C macro usage creepy when done via the C preprocessor, especially if singly-typed void* style macros are used: lack of type-checking lets bugs fester as you pile up code. But I don't see why we can't generate very strongly typed code instead, though it requires a dev to write a lot of C-based methods when they are used by name in code generated from macros. In C++ you would have to write constructors, assignment operators, comparison operators, etc. too. So it really isn't any different in C when it comes to "but I have to write every method the api tries to use". Yeah, so?

A common awkward problem with preprocessor macros in C is difficulty in discovering where symbols are actually declared and defined, because they might be constructed by joining several tokens into one. So without a convention for names, one can't decide what to type in global source code searches.

Where in C++ one might write map<KeyType, ValType>, it seems not so bad to write mapT(KeyType, ValType) in C, basically replacing the angle brackets with parens. If we prefix a cy namespace on every global C symbol, the macro definition might look like this:

#define cy_mapT(Key, Val) cyMap ## Of ## Key ## By ## Val ##

And that joins all those parts into one token cyMapOfKeyByVal, plus the next token that follows too. If struct names are all lowercase with underscores (aka snake case) and derived symbols like function prefixes are camel case, then the following would declare a templatized typedef for the struct type:

typedef struct cy_map_of_key_by_val cy_mapT(Key, Val)_t; /* cyMapOfKeyByVal_t */

The declaration of a map type specialized to key and value in a header might just be this:

#include "cyMapT.h"
cy_mapT(Key, Val)_decl(Key, Val, cy_map_of_key_by_val); /* header interface */

And definition elsewhere would be just another macro ending in defn instead of decl:

cy_mapT(Key, Val)_defn(Key, Val, cy_map_of_key_by_val); /* implementation */

The interface would declare a lot of symbols using macro cy_mapT(Key, Val) as the pseudo-class namespace prefix, followed by an underscore and the method name. In effect, anywhere you would see std::map<Key, Val>::foo in C++, here you might see cy_mapT(Key, Val)_foo in C instead. It seems like it would be easy to search globally for that string, or for cyMapOfKeyByVal_foo, so finding things in the original source or in the post processed source would be easy enough.

Because we can't get intelligent results for native and pointer types, every "type" used to specialize a macro template would need to correspond to a struct, with standard operators named consistently using a similar prefix style expected by macros. So instead of cy_mapT(int, int), you would stick int inside a struct and use it as key and value type instead. You would awkwardly need to define (for example) copy constructors and comparison operators for native types wrapped in structs, but only once.

Edit: I would probably add a multi-line compiler directive to quill's preprocessor, so long macro-style templates would not need annoying line continuations with a backslash. (And then a handy but simple tool can be written to convert back and forth, so you can just invoke it from the command line as easily as a name demangler. The rewriter would also support directive rewriting, but little stand-alone things are a good idea too.)


I use the term operator to refer to any function whose purpose corresponds to what is normally thought of as an operator. So comparing two things for equality is an operator, no matter what the function is named. So I don't mean operator overloading in C at all - only that a struct Key would be expected to define Key_eq() to compare two keys. Absence of such a function would be a compile error when a macro generates usage of it.

int Key_eq(const Key* a, const Key* b); /* operator==() nonzero return implies true */

Macros might assume the existence of methods named eq, ne, lt, gt, le, ge, etc., though it's rarely acceptable, I think, for names to differ by just one character. (Hard to avoid with two-character names.)

## token pasting illegal at macro end

Well that idea didn't survive intact after casually trying it out.

Turns out ending a macro with ## is illegal, so the suffix must be passed as another macro argument (as below), which is both good and bad. While it can't look as much like template syntax, at least it avoids weirdness of extending an identifier after the macro close paren.

#define cy_mapT(K, V, suffix) cyMapOf ## K ## By ## V ## suffix

/* cyMapOfKeyByVal_t */
typedef struct cy_map_of_key_by_val cy_mapT(Key,Val,_t);

/* cyMapOfKeyByVal_ctor(cyMapOfKeyByVal_t* self, int size) */
extern int cy_mapT(Key,Val,_ctor)(cy_mapT(Key,Val,_t)* self, int size);

void foo_stub( ... ) {

    cy_mapT(Key,Val,_t) map;
    if (cy_mapT(Key,Val,_ctor)(&map, 3)) {

Declaration, definition, and usage all look the same, because the macro is just an alternate way of writing cyMapOfKeyByVal_ctor, for example. It does look a little weird because of all the parens, inducing slight cognitive fatigue from repetition. So I would want a rewriter to substitute the macro inline as part of a transpilation pass.

preprocessor token-pasting not universally supported

I should warn anyone trying this tactic that support is not universal: the C preprocessor feature used here isn't portable.

(After screwing around on my Mac Pro, I discovered that under Apple's Darwin neither gcc nor cpp understands the # or ## operators for stringizing and token-pasting. Not only does cpp leave them in place, gcc afterward chokes on them. Obviously there are workarounds, using different tools, and a library can come bootstrapped with already-preprocessed headers upon which the code base depends. Any generated preprocessor can also understand the operators.)

Edit: Unsurprisingly, building a recent GNU release of gcc does yield a cpp that understands token-pasting (although it doesn't understand minus '-' as the name of stdout on the command line - here it hangs unless the output is left unspecified). I'd have been unable to do it without following steps outlined elsewhere, because building every library that gcc uses first is slightly complex.

The nice thing about transforming the code

is that you CAN make your transformer add stuff in, like unwind support or invisible yield calls.

But since C is a subset of

But since C is a subset of C++ you don't LOSE anything by writing your C programs in C++.

C hasn't been a subset of C++ since before C89. You can do portable stack switching in C via setjmp/longjmp without depending on the C runtime internals, by inferring the stack slot in the jmp_buf. As long as the jmp_buf isn't swizzled in some way, which glibc started doing a few years ago.

A slower continuation-based backend doesn't even need to interpret the jmp_buf at all, so it's even more portable.

Nested Functions

Another difference is that you can have nested functions in C (as a GNU extension), but not in C++.

I know nothing of this but it might be useful

Tools for making custom versions of GCC

Free association

This is a really impressive discussion which involves more software knowledge than I will ever know, but somewhere I read the words "free associate", so here we go.

There seems to be a parallel in all this to model theory. I am thinking about the concept of existential closure. For instance, the existentially closed algebraic fields are consistent. Thus we have a wholeness property of consistency and the atomic properties of the field. There are examples of extending fields in most algebra books, and a somewhat more general approach is used in model theory. What is interesting is that the field is extended; they don't try to prove everything from scratch. Software is much more complex, but there do seem to be wholeness properties and atomic properties in large software systems that actually work :).

Edit: Extension is a matter of using the rules and elements of the field to generate new elements. Closure amounts to generating all possible extensions.

so is there constructive-math to that?

Despite having no more than slight comprehension of that, I'd like to hear more, if I'm correct in getting a theme of a system growing to fill a space. That would remind me of extending software, but still following some established pattern of organization. (But that might be jejune - sterile with respect to new leverage, because the output matches the input. Say, if you pour iron filings into a circular petri dish, observing that the resulting shape is round is not new information.)

I do like a bit of free association. If the discussion is impressive, though, it might as well be arcane if there's no way to get traction through application. I should add a little bit more about alpha-renaming, or stack organization, or both. Can you explain model theory briefly in less mathematical terms?


1. Models are consistent by definition, so yes, model theory is constructive. But the theory is based on theorems; they usually use the theorems, not constructive logic - except possibly to prove the theorems.
2. Existential closure is about building systems, mathematically a system is usually a group like structure or an algebra.
3. Closure here doesn't mean small or finite. The intended meaning is more like "all".
4. This thread and several other current threads are about large systems. One point I wanted to make is that systems not only have particulars or atomic properties but also wholeness properties. Both are required to have a system. There is a duality between system and states.
5. I am not a leading authority on model theory but I do think about it in relation to computing. If I am not getting it quite right let us know.

Goguen, Cardelli, etc.

cf, eg, Goguen — part of his data structure signatures indicate whether equality of the whole is simply equality of the parts, or whether new relations are introduced (example: (Z,Z)->Q). Cardelli also makes a distinction between "this representation should be abstracted; pay no attention to the man behind the curtain" and "you should use the representation's structure when interpreting the represented value", but I never read him closely enough to know if he makes this distinction on model theoretic[0] grounds, or simply to save typing.

(NB. This information is academic, as in practice, no –to first order– software is developed in this fashion; any resulting wholeness properties are due solely to the good judgement[1] of its writers.)

[0] are 'pert' and 'buxom' structures really technical terms in model theory? Or are they part of Simmons' idiolect?
[1] good judgement, they say, comes from experience. In practice, the most profitable experiences are those which have been suffered by others.

The man behind the curtain

In the philosophy of C.I. Lewis we choose our fundamental conceptual schemes. In Computer Science there appear to be two: logical and space. But by choosing logical we don't get rid of space, it is off stage somewhere. In choosing model theory, space is on stage and logic is off stage. These are wholeness properties and we must be careful how we mix them up, if we dare to do so.

choice and consequence

I'm sorry, I don't follow at all. If it would help your explanation, my fundamental divide in Computer Science is between space and time*, where space deals with the representation of states and time deals with the transitions between states. My confusion is this: the former is discrete and amenable to commutative boolean calculations, while the latter is usually neither as discrete nor as commutative, so to my way of thinking logic naturally coheres with space, while time requires either a more sophisticated apparatus or a coarser analysis. What am I missing?

* those who require "cash value" in their reading may find the common genre of "pretty-printing" papers more worthwhile when considered, not as treating a horizontal vs vertical spatial problem, but as treating a parallel vs sequential scheduling problem.

You aren't missing anything.

You aren't missing anything. That is a great a priori, analytic analysis in terms of Lewis's system. Lewis is a pragmatist and he regards our conceptual schemes as pragmatic choices. Once made they are hard to change, so Lewis refers to them as "true no matter what". But like other pragmatic choice they are situational. We have schemes for all the areas we deal with. I understand the logic scheme and use it when I deal with logic, but model theory is a very different scheme, and so on.

Having different schemes for everything is normal but unsatisfying, and this is what I was getting at in my post and the reason for the "man behind the curtain" metaphor. As I see it the two choices of logic and space/time don't go together "no matter what", and that is a problem.

Putting your analysis and mine together is interesting. Indeed state does abstract away time, unless we think about a space/time manifold. In that case every point is a state, so maybe we agree after all. Second, you don't really mention "Logic" at all unless you consider Boolean algebra to be logic. "Boolean logic" is a misnomer, and part of my problem, but it is normal to talk about the logic of states. So what is logic really?

Logic is not about states it is about symbols and ontology. For instance a logic of time is not really about time itself but symbols about time. Logic is categorical. It occupies one half of the human brain. The only other category on this scale is space, which occupies the other half of the brain. So it seems that no matter what position we take the other is always "the man behind the curtain" as in the Wizard of Oz.

A little more about logic and space. Logic satisfies the three laws of Aristotle: Identity, Mutual exclusion, and Jointly exhaustive. Of course Identity is the bad actor in this, but there doesn't seem to be any alternative to Identity in a symbol system. If we take state to be a space concept, logic is a lot different. We still have three laws: Non-identity, Mutual exclusion, and Jointly exhaustive. Non-identity is roughly Hilbert's "undefinable". States get their definition from the system they are in. From a mathematical and formal perspective this is a lot better than "Identity" and leads to algebraic logic. We lose nothing in this transition except continuous haggling about inconsistency. (IMHO)

Edit: We can handle particulars in an algebraic system such as logic programming with the CWA. With a little more work we can provide a situation for every purely logical deduction which essentially solves the inconsistency problem.

Edit: In a logic of symbols such as Russell's, negation means changing the truth value of the symbol. In algebra and space it means complement, as in topology. This is a big difference. See for example Murray Shanahan's book "Solving the Frame Problem".

alpha-renaming and include file transform bricolage features

There's a bit of bricolage to parts of design mentioned, which contributes to the ad hoc theme Thomas Lord mentioned. Saying you need a shell and adding one is just random: not related to fibers and green process organization. Maybe it follows suit, after leading with process organization. (How do I usually manage processes? Well, there's a shell...)

Alpha-renaming in a transpiler, rewriting source while staying in one language, affords features matching what occurs often when writing software: take some old software and re-arrange it for a new purpose, without the old version interfering. For example, one can decide for stylistic reasons that all old names looking like X should now look like Y. But when you do this in a hundred thousand lines of source (and I did this a couple times in the 90's) it can take a painfully long time done by hand. So if you're going to rewrite source, to get CPS or some other kind of transformation, you might as well allow renaming everything, so edges and vertices in the code graph can have as many names mapped as you want, as long as the resulting graph is isomorphic to the original.

A renaming transformation is functional. If you have all the old source here, and transpile to output there, the output is a function of the input, with specifications provided in sufficient detail in another place. The transform specifications consist of a bunch of rules about what should happen - which is of course the function - but it can have a lot of parameters, including all the name-swapping rules. Another thing you want to specify is bindings where otherwise you'd have free variables.

For example, because C has include-file-based interface management, there's a binding step that must occur during compilation to find any given header intended by a preprocessor include directive. Generally folks resolve this with frenetic makefile hacking in C build systems, to get the right directories searched, but there's still a bit of include-file hell involved. So among the transformation rules, I want rules about how include files get handled, including explicit overrides on a file-by-file basis if desired. Then you can say, "When compiling file f.c, when you include header f.h, I want you to map that to this path to a header file in the virtual file system." Then you can get exactly what you want. If the transpiler has all the source in a tarball database (possibly in parts but virtually joined), there's not as much ambiguity as present in the host file system.
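A transform-spec fragment along those lines might look like the following - the syntax is entirely invented here (no such format exists yet), just to make the shape of the rules concrete:

```
# hypothetical transform-spec fragment; every name and keyword is illustrative
rename  symbol  old=X      new=Y        # stylistic rename across the graph
rename  prefix  old=cy_    new=zz_      # alpha-rename a whole namespace

# include resolution overrides, binding headers explicitly per compiled file:
include when=f.c   header="f.h"   map=/vfs/project/v2/include/f.h
include default    header="os.h"  map=/vfs/ports/darwin/os.h
```

The point is only that every rule is data parameterizing one pure function from input tarball to output tarball, so the same source and the same spec always yield the same rewrite.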

CPS cactus stacks vs C stack and leaf calls

This is about C integration, characterizing what fibers can call after going through a CPS (continuation passing style) rewrite. If the same thing is done in another language, something similar applies, though details would vary of course. A fiber cannot call arbitrary C code that hasn't gone through analysis for CPS purposes - not if there's a constraint on call latency.

(One debug mode in quill compiler rewrite output can put time measurement calls around foreign function calls that have been marked "fast and synchronous"; then an assert can check whether too much time has elapsed, killing the whole process if the time contract is broken. More than N milliseconds, for some N, is too long.)

C has a synchronous execution model: functions don't return until they are done. To interrupt a C function before it's done, you block the thread or process that's running it, then later resume where you were once unblocked, with the C stack as it was before, resuming at the execution continuation. (The used part of a C stack is usually immutable while a thread is blocked, but code running an interrupt might use don't-care space beyond the stack's end for scratch processing; so saying the stack is frozen when blocked is going too far.)

A fiber model running in a VM that gives real-time guarantees to a host env doesn't want any C function it calls to ever take too long before yielding to the host env. In effect, every call made by a fiber VM wants control to return to the VM a) before any blocking, and b) before N milliseconds have elapsed - possibly even less than a millisecond. So you can call strnlen() because there's a maximum number of bytes you might read, but not strlen() if there's no guarantee of a null octet within a maximum range.

To call arbitrary C code basically subverts the whole model. You might as well not be using a fiber model, because it's then already broken despite having made everything more complex without getting the intended benefit: async execution within fuzzy real-time limits.

Let's define a proper leaf call in a C-based fiber model. A normal C thread executes a call tree, starting from an entry point we can call a root, then calling nested children functions recursively. A leaf call doesn't call another, forming a recursion base case. Folks learning to program often diagram call trees as a graph, with child nodes underneath parent nodes. (Partly for this reason, we're going to use a convention that stacks logically grow downward, even if they don't in your runtime's address space, because we show calls going downward in diagrams.)

In a thorn (fiber) model, a leaf call is one that enters a local root, makes one or more nested calls, then returns from the root in bounded time without blocking. And for the duration, calls use the C stack for argument passing and local variables. In other words, a leaf call is a normal C function that hasn't been rewritten for CPS. Nothing interrupts the orderly process of enter-recurse-unwind-exit. Anything else is strange, can't be a leaf call, and must be handled asynchronously via CPS rewrite. For example, if you wanted to throw an exception, or perform any other kind of non-local exit, that's not a leaf call and must be handled as async. (But leaf calls are fast, with very low overhead, and that's why we want them used where most of our cycles go at runtime. They just need to be framed inside async dispatch.) Or, if you wanted to call a blocking api down in the call tree, that has to get deferred to a host env as an outside call, and must be async because the call tree parks and waits at that point. Anything (throw or block) that interrupts a continuous flow of enter-recurse-unwind-exit must be async.

Non-leaf calls use heap-allocated activation frames that are not (repeat not) on the C stack. But nevertheless, execution is still a call tree, just in a different plane if you diagram it accurately. To get a proper backtrace requires not only the local C-stack call tree, but also the chain of function names above it in the þ-stack call tree. A leaf call is a right angle turn into another plane using the C-stack. When a leaf call exits, it leaves that plane and goes back to the other managed by the scheduler's trampoline. So during execution, a little C-stack hangs off every leaf call in the execution tree, then it completely unwinds on return to the trampoline. But there's no C-stack any time a job fiber is parked. (The C-stack from the host env down into the VM and the trampoline scheduler is not considered part of the backtrace of an executing thorn green thread, though that's what you would see in gdb if you looked while a fiber was running.)

#define cy_async /* quill keyword ignored by other compilers */

We can use a new keyword to annotate non-leaf functions. (A macro defined to nothing can be used in normal C environments to ignore the keyword.) Say we use cy_async to mark any function async that doesn't have the fast enter-recurse-unwind-exit uninterrupted nature. (Prefix cy just means C thorn, but can be replaced with anything in alpha-renaming. Names are arbitrary.) Note if we miss one, and forget to annotate a function with cy_async, the compiler infers it anyway if any code called inside is marked async. You are async if marked thus, or if any path exists in code you call to an async function, which does mean we want whole-program analysis of code reachable by one fiber's call tree to ensure we miss no async status down in the call tree's nether regions. Every async function must be rewritten via CPS, and you can't call them directly from C, because the trampoline has to do it, so it can be interrupted and parked when the job's lane's timeslice is exhausted (or anything else happens to cause a pause or non-local change in control flow). Note there can be two versions of a function, one async and one leaf, if there's a way to statically know when both apply in use. (There's a metaphor about the similarity of async and const semantics I'm not going to make explicit.)
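Here's a minimal sketch of how the annotation and the inference might look in practice. Everything here is invented for illustration (`stream_t`, `read_line`, `handle_request`, the hashing leaf), assuming only the `cy_async` macro convention described above:

```c
#include <stddef.h>

#define cy_async /* quill keyword; expands to nothing for plain C compilers */

typedef struct { const char *data; } stream_t;  /* hypothetical stand-in */

/* Leaf: bounded work, no async calls, runs on the ordinary C stack. */
static unsigned hash_bytes(const char *p, size_t n) {
    unsigned h = 5381;
    for (size_t i = 0; i < n; i++) h = h * 33u + (unsigned char)p[i];
    return h;
}

/* Marked async: in the real system this could park the calling fiber. */
cy_async static size_t read_line(stream_t *in, char *buf, size_t cap) {
    size_t n = 0;
    while (n + 1 < cap && in->data[n] && in->data[n] != '\n') {
        buf[n] = in->data[n];
        n++;
    }
    buf[n] = '\0';
    return n;
}

/* Async by inference, even if the keyword were forgotten: it calls
   read_line, which is async, so the whole-program analysis marks it. */
cy_async static unsigned handle_request(stream_t *in) {
    char buf[256];
    size_t n = read_line(in, buf, sizeof buf);
    return hash_bytes(buf, n);  /* leaf call, framed inside async code */
}
```

Under a plain C compiler the macro vanishes and this is ordinary synchronous code; only the rewriter gives the keyword teeth.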

During a CPS rewrite, any function making an async call gets cut into pieces, put into separate functions where only the entry point is called directly, while others are reached only via continuation. To call an async method, you allocate a frame matching the call signature, populate it with arguments, and return to the trampoline saying "call this other function with this activation frame", or something very similar. (Thus a C return statement can make the þ-stack deeper, or shallower, depending on whether you are calling or actually returning. While it's readable enough that gdb debugging is okay, you don't want to think like this and do it by hand, because it's pretty distracting.) The code immediately afterward, if any, goes into a separate continuation function. Because all locals go into an activation frame, they will still be where they were before when the continuation is called. Or if there is no code afterward, what you have is a tail call, so it's not necessary to come back, and you can just apply TCO (tail call optimization) to leave the caller out of the stack.
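The cut-into-pieces shape can be shown with a toy. This is a deliberately simplified sketch, not the rewriter's actual output: one generic frame struct, a one-slot "scheduler", and invented names throughout. The original function is `int f(int x) { int a = g(x); return a + 1; }` where `g` is async:

```c
#include <stdlib.h>

typedef struct frame frame;
typedef void (*kont)(frame *);

struct frame {
    kont   next;    /* code to run when this frame is dispatched */
    frame *caller;  /* caller's frame: the heap-allocated þ-stack */
    int    arg;     /* formal parameter slot */
    int    ret;     /* return value slot (simplified: lives in the frame) */
};

static frame *runq;          /* one runnable slot, standing in for a scheduler */
static int    final_result;  /* filled in by the root continuation */

static frame *frame_new(kont k, frame *caller, int arg) {
    frame *f = calloc(1, sizeof *f);
    f->next = k;
    f->caller = caller;
    f->arg = arg;
    return f;
}

/* g was: int g(int x) { return x * 2; } -- rewritten, it writes its result
   into the caller's frame and reschedules the caller's continuation. */
static void g_body(frame *self) {
    self->caller->ret = self->arg * 2;
    runq = self->caller;     /* "return" = resume the caller via the trampoline */
    free(self);
}

/* Second half of f: runs only after g's result is available. */
static void f_after_g(frame *self) {
    int a = self->ret;       /* the local a, restored from the frame */
    self->caller->ret = a + 1;
    runq = self->caller;
    free(self);
}

/* First half of f, up to the async call to g. */
static void f_enter(frame *self) {
    self->next = f_after_g;                     /* where g should "return" to */
    runq = frame_new(g_body, self, self->arg);  /* "call" g via the trampoline */
}

/* Root continuation: copies the result out and stops the trampoline. */
static void root_done(frame *self) {
    final_result = self->ret;
    runq = NULL;
}

/* The trampoline: dispatch whatever is runnable until nothing is. */
static int run_f(int x) {
    frame *root = frame_new(root_done, NULL, 0);
    runq = frame_new(f_enter, root, x);
    while (runq) {
        frame *f = runq;
        runq = NULL;
        f->next(f);
    }
    free(root);
    return final_result;
}
```

Note how a C `return` from `f_enter` makes the þ-stack deeper (it scheduled a call), while the `return` from `g_body` makes it shallower, matching the point above.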

Every unique call signature needs a different activation frame struct definition, if you want strong type checking when the next C compiler gets to the code, which is definitely a good idea. If you wanted to pre-allocate a called function's local frame up-front along with the formal parameters, you'd have to use function-specific frame structs and it would be more tightly coupled. So it's likely better to allocate a frame for formal parameters alone, then let a called function allocate a frame for locals separately. That way every function with the same call signature can re-use the same formal argument activation frame struct definition (using a standard name mangling scheme), and this would greatly reduce symbol table cost on later compilers, linkers, and debug environments.
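A signature-shared frame struct might look something like this; the mangling scheme and all names are invented for illustration:

```c
#include <stddef.h>

struct cy_frame;                              /* generic frame header */
typedef void (*cy_kont)(struct cy_frame *);

/* One argument-frame struct per call signature; the mangled name
   (hypothetically "i_pcz" = returns int, takes (const char*, size_t))
   lets every async function with that signature reuse one definition. */
typedef struct cy_args_i_pcz {
    cy_kont          next;    /* continuation to dispatch */
    struct cy_frame *caller;  /* link to the caller's frame */
    const char      *a0;      /* first formal */
    size_t           a1;      /* second formal */
    int              ret;     /* return slot */
} cy_args_i_pcz;

/* Both of these would share cy_args_i_pcz rather than per-function structs:
     cy_async int count_spaces(const char *p, size_t n);
     cy_async int checksum(const char *p, size_t n);
   each allocating its locals in a separate, private frame. */
```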

If the only thing a quill compilation pass does is CPS transform, then it's mainly rewriting async function calls ... so they can no longer appear inside C preprocessor macros because those presume a synchronous C execution model where substitution fits nicely. So it's a compile-time error to have an async call inside a macro, unless part of the specifications for rewrite include permission to inline that macro and then rewrite async calls within.

inline annotation vs separate but associated rules

Indirectly, managing lots of symbols while programming in the large is partly about maintaining relationships among scattered bits of information. As you scale up, things get farther apart and harder to associate. Tracking and managing code changes -- such as in source code control -- can push back against small changes you want to make in the interest of refining detail. Fixing or adjusting anything incurs the overhead of cascading review, tracking, and integration afterward. Cost of any small change increases in proportion to the size of everything else.

This post is about inline versus out-of-line annotation. In a PL (programming language), inline annotation is the norm: keywords go next to things modified as part of the grammar. I suppose this is partly an artifact of code represented as text, in a file, with a linear and local focus. There's a cost angle too that I won't explore, related to not carrying knowledge of annotation farther than immediate local use, which would have been a concern decades ago in PL tools. Not now though.

Above I mentioned the idea of inserting an async keyword next to functions we want handled specially, by rewriting its callers to understand it might park the calling fiber. (Though not required to, async api permits parking, which is important when drafting an interface subclass-able by async code, even if it's not async now.) However, inserting a keyword has management costs -- downstream change tracking -- and makes it harder to share the same code in different places with the keyword both present and absent for different viewers. Inline annotation is centralized info forced on all consumers, including tools tracking change.

What happens if we put annotations somewhere else? The first time you do something, there's an activation threshold to get over, justifying the cost of doing a new weird thing (because now you have to live with it forever). The easy default rule is "no new weird things", since this stops you from being choked to death by every random idea the next dev dreams up; cost is cumulative, so you need to pursue a spartan angle to get anywhere at a run instead of crawling.

A rewriter, though, already wants to put meta-information somewhere else out-of-line (new names replacing old ones), so that new weird thing is already bought and paid for. So you might as well support general out-of-line annotations: a system of attaching associated tuning parameters that explicitly adjust semantic policies, like whether code is async. Note async and sync code should not use the same names, of course, so semantics and naming can interact. Thus, when everything else is a text file too, you want text-based policy invoices and manifests containing rules to direct what happens when a tool like the rewriter consumes the source code. (This is a good spot for a joke dialog prosecuting a holy war over whether s-expressions or XML is the right way to do this, with a long digression into character sets and encoding, with a few hundred words about utf8 imperialism. But in short, use what matches the language; when C is ascii, all the meta-info is ascii too.)

We are used to saying, in a makefile or project description, what we want to happen with each file in source code. But we can also say what we want to happen to each function too, as well as struct formats and member fields. Anything you can declare inline, you might instead declare somewhere else, at the cost of making it harder to find (which is an important rather than trivial consideration, since it's very frustrating not to be able to find essential definition info). So async status can be declared somewhere else, instead of using keywords. It's easy enough to review by looking at an output listing of what a compiler believed was true. And you can write assertions about what rules a compiler enforces, so you get a build error from any surprise.

Rules can have scopes instead of being global only. Rule A is followed everywhere, except in this file where we do B. Rule C is followed everywhere, except in methods of this class where we do D. It's illegal to read or write this field, except in this one function. Etc. When rules can be enforced by a compiler, it saves each dev time in development, because there are fewer places something can happen. It shouldn't be necessary to read all the code, no matter how big it is, to find out what causes something.
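Such an out-of-line rules file might look like this. The syntax is entirely invented (s-expression flavored, since the meta-info can be plain ascii like the C it describes), and every name in it is hypothetical:

```
; hypothetical out-of-line rules manifest; all forms and names invented
(rules
  (scope (file "src/net/reader.c")
    (async read_line handle_request)       ; instead of inline cy_async
    (tail-call-opt handle_request))        ; enable TCO only here
  (scope (struct lane_state)
    (field limit_bytes (write-only-in lane_set_limit))))
```

A compiler consuming this could emit a listing of what it believed was true, and assertions against that listing would turn any surprise into a build error, as described above.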

Meta-info stored out-of-line is a kind of shim, relating stuff in one space to things in another space. When the two spaces are domain and range of a functional transformation, it's clear meta-info is part of the function, and needn't be semantically intermixed with either space (especially the domain).

only a head prologue need be public, so one frame will do

So it's likely better to allocate a frame for formal parameters alone, then let a called function allocate a frame for locals separately.

I thought of a refinement in the async calling convention that reduces the number of entities, consolidates local frames, and improves strong typing in later compilation by a normal C compiler. It's probably obvious once heard, as other choices are sub-optimal, which is why I kept going over it. Here's a rhetorical question for context: how many entry points does a caller need to see in order to invoke an async function? We want the answer to be one. I kept assuming the api for the entry point needed to be public, but it doesn't. We only need a public api to make a call frame, so the size of both formals and locals can be folded into one refcounted frame allocation. The continuation corresponding to the entry point need not be public when it goes in the frame. After allocating a call frame, we need only yield to the trampoline (our caller) to let it move forward.

The following explanation goes into more detail than you need. As soon as you get it, there's no need to keep reading. A long tail of follow-through is there only for folks who find the business really alien sounding. My hope is that folks who don't write compilers can see what happens with async control flow here. [I was going to add a concrete code example, but it looks clear to me without it. I can add one on demand.]

Consider old-school calling convention in stack-based machines, where everything goes on the stack instead of registers, with a single monolithic stack which stages a transition from one function to another (the caller and the callee). Parameters and return address (the continuation) get pushed on the stack, then we jump to a callee who allocates more space for locals during a prologue, which is where an activation (stack) frame finishes setup, before the function body starts. Sometimes a function has more than one entry point, to present different ways of entering, perhaps with alternate arguments and/or prologues. In any case, the address of the prologue (let's call it "head" as shorter name) is different from the body, but the body is reached by way of the head as entry point.

A function body doesn't need a public name. Only the head needs to link to a body address, so only the head need be publicly visible.

When an async function call is transformed by the rewriter, imagine it appends "_head" (or some other standard suffix) to the function name, representing a function that performs a prologue and then yields, but with the body already set up in the call frame, so that's where a continuation goes next. (Yes, appending a suffix to the non-public body makes more sense, especially if scope is local to a compilation unit.) A rewriter might also generate source code for the head as a thunk, which allocates the refcounted frame (from a vat) and gets everything ready to go. The net result is that a caller dispatches the head, but yields after the head returns, so the trampoline can schedule when the body gets run -- typically immediately for locality-based performance. So the dividing point in control flow occurs after the prologue has run, instead of before. The head must be publicly visible, and it knows the size of locals. The caller need never see the name of a body.
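A sketch of the head/body split, with every name invented (the real rewriter's frame layout and allocator would differ; `calloc` stands in for vat allocation):

```c
#include <stdlib.h>

typedef struct frame frame;
typedef void (*kont)(frame *);

struct frame {
    int    refs;    /* refcounted; from a vat in the real thing */
    kont   next;    /* body continuation -- never publicly named */
    frame *caller;
    int    arg;     /* formals... */
    int    tmp;     /* ...and locals, folded into one allocation */
    int    ret;     /* return slot */
};

/* File-local body: the caller never sees this symbol. */
static void square_plus_one_body(frame *self) {
    self->tmp = self->arg * self->arg;  /* a local, living in the frame */
    self->ret = self->tmp + 1;
}

/* Public head: prologue only. It allocates the combined frame, installs
   the body as the continuation, and returns; the caller then yields so
   the trampoline can dispatch f->next(f) when it chooses. */
frame *square_plus_one_head(frame *caller, int arg) {
    frame *f = calloc(1, sizeof *f);    /* stand-in for vat allocation */
    f->refs = 1;
    f->next = square_plus_one_body;
    f->caller = caller;
    f->arg = arg;
    return f;
}
```

The head is the only linker-visible symbol; the dividing point in control flow sits after the prologue, exactly as described above.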

A caller may have a handle to hold the frame of an async function called, because when the callee returns, the return value will be in the frame until the caller releases it. (And conveniently, this lets a callee frame live as long as needed, so arbitrary return value sizes can be handled without extra copying.)

Correction: the return value cannot be in the callee frame because under TCO (tail call optimization) this would cause a lot of tight-coupling induced complexity. So the caller must provide a local for the return, which gets copied or moved (from a callee frame) when the last nested call returns. All that can be managed by the rewriter, and should be, so it can be changed. Might want to support a keyword hinting whether move or copy is better for the caller (but the callee gets the last word).

refcounting vs TCO and C signature rewrite

This is aimed at anyone reading parts of the above, who then says, "I should do that." Take warning about subtle refcounting issues, if you do tail call optimization. By the end I get a little redundant, because parts of the same idea come up more than once. It can be done correctly, but it seems to require slightly complex compiler behavior in a rewriter for async fibers.

A naive rewrite of async methods might aim to pass nearly the same arguments, modulo an extra calling context parameter or two. But if you implement TCO (tail call optimization), this can fail when the space for arguments is allocated in a stack frame ditched by TCO: the caller frame is released when no longer reachable, but the callee keeps using arguments from the now-deallocated frame. This wouldn't happen if C was garbage collected based on reachability from (for example) stack frames that are still alive.

It's common for C devs to pass space allocated in local variables to functions called, often as performance optimization, and therefore important to keep as such. Since execution is synchronous, with a callee finishing before a caller resumes, a caller's locals can't go away while a callee is using them... Unless you implement tail call optimization, and the caller returns the value returned by the callee. Here, if a local's address is passed as an argument, a compiler must realize the local is a reference to the caller's frame, so it stays alive.

The issue really only comes up if a rewriter adds refcounting semantics, when no refcounting was in the original C source written by a dev expecting normal synchronous C execution. (How can the original dev pass handles as arguments when they don't exist yet?) Supporting TCO, and refcounting, causes a problem if this conflict isn't noticed. You could disable TCO if locals are passed to a tail call by address, but that kind of defeats TCO as a feature when it cannot be relied upon. So perhaps better is calling a head/prologue entry point that also passes associated handles keeping each local alive, so a callee frame can hold the same refs known to pin the arguments safely during use.

The ugly part here is the temptation to have two generated entry points: one used for tail calls and one for non-tail calls, when local addresses are passed. (For non-tail calls, the caller frame stays alive anyway, and there's no need to keep locals alive via special effort.) The first time either variant is used, tail or non-tail, you can generate the associated head prologue, if not already done. The one supporting TCO for tail calls would need to pass either a handle or a rod (short for RefcOunteD) that can keep a local alive, via a handle in the callee. This causes each original arg to be passed as a pair when the caller passes the address of a local, where the first is the local and the second is a ref to whatever keeps it alive.
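The duo idea can be sketched concretely. All names here are invented, and the rod is a crude stub with an embedded buffer standing in for the caller's refcounted frame:

```c
#include <stdlib.h>
#include <string.h>

/* A rod: refcounted storage that can pin a "local" alive. */
typedef struct rod { int refs; char buf[32]; } rod;

static rod *rod_retain(rod *r) { if (r) r->refs++; return r; }
static void rod_release(rod *r) { if (r && --r->refs == 0) free(r); }

/* The duo: the pointer as the original code passed it, plus a ref to
   whatever keeps that storage alive across the tail call. */
typedef struct { const char *ptr; rod *owner; } cy_ptr_duo;

/* The tail-called function: free to use ptr, then it gets the last
   word on lifetime by dropping the pinning ref. */
static size_t tail_len(cy_ptr_duo arg) {
    size_t n = strlen(arg.ptr);
    rod_release(arg.owner);
    return n;
}

/* The caller: TCO ditches its frame (the rod) before the tail call,
   but the extra ref inside the duo keeps the buffer alive anyway. */
static size_t caller(void) {
    rod *my_frame = calloc(1, sizeof *my_frame);
    my_frame->refs = 1;
    strcpy(my_frame->buf, "hello");
    cy_ptr_duo duo = { my_frame->buf, rod_retain(my_frame) };
    rod_release(my_frame);   /* where TCO releases the caller frame */
    return tail_len(duo);    /* safe: duo.owner still pins the storage */
}
```

Without the `rod_retain` in the duo, the `rod_release` before the tail call would free the buffer while the callee still reads it, which is exactly the hazard described above.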

Edit: If a local's address is passed to non-tail call which looks safe, but its address gets returned from the callee in a new form aliasing the pointer, this might next get passed to a tail call with status as local variable scrubbed away by anonymity. And then you have the same problem. So it might be necessary to pass all addresses paired with a ref known to keep it alive.

Are you allocating return frames?

By the time I got TCO sorted out, in a system with persistent closures, multiple-value returns, optional named arguments and returns, and unevaluated thunks for both arguments and returns, I discovered that there is essentially no distinction between function calling and function returning.

You're right that you can't put return values in the call frame if your calling discipline has TCO and closures; fundamentally this is because returns are live (or eligible for GC depending on your POV) under different conditions from arguments. So I started allocating return frames, and eventually discovered that they have all the same properties and requirements as call frames.

Both calls and returns require a frame to be allocated to transmit values, thunks, and/or references; both require the frame to be garbage collected (in the general case) rather than treated as a stack entry; type dispatch can happen on exit as well as entry (returning to a different point depending on what handles the number and types returned) - and tail call elimination can affect return frames as easily as it does call frames.

The effect is the logical conclusion of CPS transformation.

yes, return values come back in frames too

Yes, in the sense everything is heap allocated, except the small amount of C stack used only in leaf calls that are fast and cooperatively atomic. (It's up to an environment to ensure leaf calls are not more preemptively interruptible than other real-time code, if that's the idea.) So when a return value comes back to a caller, it's in some flavor of frame. It might be the frame of the last function called in a chain, or it might be a frame allocated just for the return. (Maybe the last frame was really big, and too much fragmentation would result if the return value kept it all alive.)

essentially no distinction between function calling and function returning.

Very nearly the same, so it takes a lot of words to evoke the essence of any subtle difference (and it's moot to most people who wouldn't care). You're speaking from the expert POV though, which alienates beginners. :-) I've been trying to write to a reader, Stu, who wants to know whether the output of a rewriter will make any sense when seen in a debugger.

Perhaps every long learning experience goes through phases from a neophyte start POV to an expert end POV, with middle intermediary steps often oriented toward explaining ideas in terms sensible to the start viewpoint. You could personify them as Stu for start, Ivy for intermediary, and Eli for end, and then write dialogs that make sense because focus of each persona is clear. (As an acronym, SIE recalls German for you/they, emphasizing experts pass through each phase.) Eli is happy with a soup of continuations -- a plasma of code and data -- but it sounds crazy to Stu, who has learned crazy must be shunned in programming because it's usually wrong (as demonstrated endlessly by beginners).

Suppose you want to port a bunch of old C code to run as coroutines in a fiber runtime environment, so you get the moral equivalent of green processes behaving nicely in dataflow pipelines. In effect, all the code was written by Stu, a reasonably good C dev who knows nothing about continuations. To write an argument that it's still pretty much the same code, just re-arranged by rewriting for continuation-passing-style (CPS), you need to explain in terms Stu understands, which might be argument passing and stack activation frames (because Stu once debugged in an assembler only view, and grasps that much). Therefore Ivy explains a reason for each specific complex thing added during rewrite, but mostly in terms that Stu already gets. Turning Stu into a guru so he becomes Eli isn't the goal. Instead the idea is Stu can now explain to Sam, whose C code is also going to pass through the rewriter, using only start POV terms. Stu is going to walk Sam through his code under gdb, looking at a rewriter's output, and Sam is going to cancel the project if it freaks him out.

So if anything really goofy is going to appear after rewrite, there should be a good reason that Stu understands. (Having Eli explain doesn't help.)

If TCO is a default behavior, and Eli uses refcounting for activation frames, the bizarre impact of exploding arguments passed as duos (so refs are paired with args for safety) is hard for Stu to explain, so Sam sees chaos.

However, C doesn't have tail-call-optimization normally, so suddenly turning it on may not be a good default. It wasn't needed before, so why have it now? Maybe it should only be used when a developer intentionally relies on it, so stack depth doesn't explode, by using a tail keyword like cy_tail between the return keyword and the called function. (And recalling an earlier post above, maybe a separate file containing meta-info enables TCO for tail calls in that particular function, using out-of-line annotation.)
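The opt-in marker might be spelled like this; `cy_tail` is hypothetical, and under a plain C compiler both macros vanish, leaving ordinary recursion:

```c
#define cy_async /* expands to nothing for plain C compilers */
#define cy_tail  /* hypothetical opt-in: rewriter applies TCO only here */

/* Without the rewriter this is plain recursion; with it, cy_tail tells
   the rewriter this tail call may reuse (not stack) the frame, so stack
   depth stays bounded no matter how large n gets. */
cy_async static long count_down(long n, long acc) {
    if (n == 0)
        return acc;
    return cy_tail count_down(n - 1, acc + n);
}
```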

One advantage is that suddenly every rewritten function no longer needs to defend against possible TCO on tail calls, which might lose local arguments unless protected. So Stu's rewritten code looks less goofy under gdb, except in rare cases when Stu turned on TCO on purpose, and expects more complexity there.

Both calls and returns require a frame to be allocated to transmit values, thunks, and/or references; both require the frame to be garbage collected (in the general case) rather than treated as a stack entry; type dispatch can happen on exit as well as entry (returning to a different point depending on what handles the number and types returned) - and tail call elimination can affect return frames as easily as it does call frames.

I'm agreeing with you completely, if that doesn't otherwise come through. In my coroutine fiber model, most state lives in heap-allocated activation frames, strung together in lists forming cactus stacks (acyclic frame trees, if you consider many continuations at once, that might all be extant with shared subsets of stack frames). That includes a return value, which must be in some frame, until a caller can make a copy of it in the caller's frame. So some frame from the callee is still alive temporarily during the hand-off, which strikes Stu as a little weird but acceptable: the stack doesn't deallocate instantly after it quiesces, only after it is no longer referenced.

If the caller's type needed and the callee's return type don't match exactly, but are compatible, then there's a conversion step (which can be either static or dynamic as the rewriter generates it) corresponding to the dispatch you mentioned.

Now the interesting part is teaching Stu what happens on out-of-memory, so he can explain to Sam. At least two kinds of OOM can occur: the vat really can't allocate anything, or the green process (a lane) hit the upper bound of a memory limit. The former necessarily kills the lane, because there might be no place even to hold the args needed to restart after handling it as a recoverable error. (Getting here should be hard, if the runtime responds to low memory earlier as a matter of course.) But the latter can still allocate a frame before parking the job and putting the lane on notice that a memory limit was reached. In effect it signals a condition that can be handled.

When an async entry point is rewritten to have a head prologue, which allocates the call frame, it needs extra context above and beyond the original C function args. Whether it's a tuple or a single arg combining everything is up to the rewriter. But it must reveal the env, the vat, and the lane, because the lane's group is where total memory used is tracked, as well as memory limits. While lanes offer cooperative process isolation (by sharing only immutable state with other lanes) resource accounting is shared in a group. Refcounted immutable state received from another group is counted against the receiving group's resource allocation when new refs are added, so we under-commit memory usage instead of over-commit. That way refcounting cannot make things wedge in a deadlock.
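The extra context might be bundled like this; all three struct names are invented, and whether it travels as one pointer or several parameters is up to the rewriter:

```c
/* Hypothetical context a rewritten head needs beyond the original C
   arguments: where outside calls go, where frames come from, and whose
   budget the allocation is charged against. */
struct env;   /* host environment: outside calls, system facilities */
struct vat;   /* allocation arena the frame comes from */
struct lane;  /* lightweight process; its group tracks memory use/limits */

typedef struct cy_ctx {
    struct env  *env;
    struct vat  *vat;
    struct lane *lane;
} cy_ctx;
```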

Edit: going back over shared resource accounting for lanes in the same process group, now it isn't clear to me why I thought zero net change was so easy to see in space hand-off from one member of a group to another. A count of refs can say nothing about who holds refs, so there's no way to tell if you are adding another ref to something for which you already have a ref. If you were going to count the first ref as space cost against your budget, you must count all of them, because you don't know how many you hold already (just by looking at the refcounted object). For the same reason, lanes in a green process group can't tell whether another member holds a ref either. So zero net change can only be done with move semantics, when both source and destination lanes are known to be in the same group. It's not nearly as nice as I had imagined. In general, adding a ref increases official space footprint by size of the object, even if shared, when you wanted a strict accounting of resource usage.

Needs a compacting gc but...

You might check out Chicken Scheme. It translates Scheme to C in an odd CPS manner that uses the stack as the garbage collector's nursery.

When the stack is full it does a minor gc and a longjmp to clear the stack.

On the benchmarks I saw, it was consistently the fastest scheme-to-c implementation.
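The trick described here is known as "Cheney on the MTA" (after Baker's paper), and its skeleton fits in a few lines of C. This is a minimal sketch, not Chicken's actual output: real generated code would evacuate heap-allocated closures during the minor gc, whereas here the only live state is two longs, so "collection" is a trivial copy:

```c
#include <setjmp.h>
#include <stdint.h>

static jmp_buf restart;
static struct { long n, acc; } live;   /* evacuated live state */
static char *stack_base;

#define STACK_LIMIT (16 * 1024)

static void sum_cps(long n, long acc) {
    char probe;
    /* Measure stack consumption; direction-agnostic for portability. */
    uintptr_t depth = (uintptr_t)stack_base > (uintptr_t)&probe
                    ? (uintptr_t)stack_base - (uintptr_t)&probe
                    : (uintptr_t)&probe - (uintptr_t)stack_base;
    if (depth > STACK_LIMIT) {          /* stack "nursery" is full */
        live.n = n; live.acc = acc;     /* minor gc: evacuate live data */
        longjmp(restart, 1);            /* clear the stack */
    }
    if (n == 0) {
        live.n = 0; live.acc = acc;
        longjmp(restart, 2);            /* done: deliver the result */
    }
    sum_cps(n - 1, acc + n);            /* CPS-style onward call */
}

long sum_to(long n) {
    char base;
    stack_base = &base;
    live.n = n;
    live.acc = 0;
    if (setjmp(restart) != 2)           /* 0: first entry, 1: after a reset */
        sum_cps(live.n, live.acc);      /* (re)start from evacuated state */
    return live.acc;
}
```

Each longjmp discards the accumulated C frames, so arbitrarily deep CPS "recursion" runs in bounded stack, which is why the technique doubles as free tail calls.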

better chunking in one's current tools

Good response to the part of the original post saying it's mainly about transpiling, but less so to the rest of the same sentence, saying the goal is automated rewrite of a language to itself: staying in the same language (with symbol renaming and fiber support for concurrency).

If folks primarily interested in Scheme have not already heard of Chicken Scheme years ago, they should check it out. Scheme is an interesting language in several ways, one being the odd inclusion of first-class continuations, which are a real bummer to implement for beginning language implementors. (Most of Scheme is really simple, except for this enormous pill in the mix.) However, you can write fibers with continuations, so you can illustrate operating system algorithms in Scheme (just not suitable for actually controlling real-time hardware) because scheduling engines can be expressed natively.

When a language or runtime is powerful enough, you can get almost anything to occur if you just write it, even if not already presently supported. So in that sense it's probably possible to get Chicken Scheme to write C in a way making it an embeddable C server with a million-ish green processes where none make blocking system calls. But it's not that way now, and everything not now focused on that goal may interfere, as distraction. However, I'd enjoy someone's presentation showing how it does everything I describe above.

(Most of my posts are about rewriting one's current language to itself. Since in my case that's C, my points are usually about rewriting C, and Chicken Scheme doesn't help in any way. I did mention that I would write a Scheme/Smalltalk hybrid compiling to the variant of async C the rewriter generates. So you get I like Scheme. But Chicken Scheme wouldn't help with that at all: it doesn't generate the same async C style my rewriter would emit. While not useless, it's completely irrelevant to me personally. A single point in the C tool space doesn't help develop every other point. One of my coworkers once suggested I start by tinkering with gcc, as a way of speeding things up; I won't describe why that's funny.)

In an earlier post I said:

There's something interesting going on with simple unix utils you can compose in pleasant ways. Why can't parts of a single program in one process be like that too? Why aren't sub-routines structured more like cooperating concurrent processes, and less like giant tree-shaped arithmetic expressions with self-modifying state?

You can directly infer several things from that. In any language L, whatever causes you to organize code in giant tree-shaped computations can be transformed (with dev guidance), by rewriting L to itself, into many cooperating concurrent green processes, to achieve the same sort of thing using pieces that are individually less complex when analyzed as coherent local clusters of related functions. A large program (created by N programmers for large N), can be made easier to understand when parts are amenable to analysis in isolation, and when this can be done by tools too, though it's nice to confuse a dev less as well. As a side effect, if the result decreases coupling enough, you may be able to move parts around in a distributed system, including ways letting you throw parallel resources at data sets large enough to benefit from partitioning despite extra communication overhead. However, small problems may already be simple enough the way they started.

As computing resources have scaled up by orders of magnitude, for the same price, we have not seen tools also scale up features that support cognitive chunking. (Insert thousands of words here from cognitive psychology; if you manage to say something about it I don't already know, I'll say so.) Decades ago we organized code into subroutines, then functions following rules, then object systems arranging functions in groups, etc. etc. But then the tools didn't scale up to the equivalent of combining a hundred or thousand of the old systems in one operating system process. Human beings need the chunking to organize large numbers of interacting things into few enough groups to track mentally. And the mental model needs to be correct, leading to valid reasoning and inferences, rather than forcing new bugs by telling lies.

A vendor approach to solving this problem is to provide platform-specific tools that scale, with built-in architecture lock-in. But if you work with green processes, you can do this portably on every platform, provided you avoid making interfaces bind specifically to one operating system.

One advantage to writing giant tree-shaped computations in one OS process is ability to clearly see all the code, because you wrote it yourself. (Third party libraries weaken this of course.) When you use pre-packaged process offerings in the environment, a good number of things become opaque and out of your control. So despite having better chunking with processes, now you might no longer be able to control what happens, because you really only have a vague idea about internal details that end up mattering, and can't easily determine them without investing in platform kernel developers who are going to move on to greener pastures soon anyway. (And since they aren't going to write any docs, they may as well never have existed once they're gone.) Erlang has a big advantage in making process dynamics transparent enough to avoid ruining benefits provided by chunking features.


atlas name trees for virtual file systems

Suppose we add a virtual file system to a specific application process (or in general to a library), with no relation to operating system required. It's a view with name tree structure, imposed by an app (or a library it uses) by fiat because no one but the app dictates the model, though parts can be negotiated. The model is explicitly virtual, the same way a planetarium projects a view of stars in the sky without pretending to be the actual sky. It's a map with name-path keys and whatever associated values it wants, though values consisting of octet sequences may imitate a file system well. The term atlas makes a good name for this idea, because it clearly means maps of a bunch of stuff, gathered in a cohesive collection. (If determined to maintain a space idea somewhere, we can use zone to mean a specific path in the name tree, or something.)

The Plan9 model called this VFS entity a namespace, and it had strong relations to the OS file system, because a primary goal was to represent everything as a file, for both data and control purposes. (Writing certain files amounts to sending a control message.) However, when we look to Plan9 for useful ideas, a predictable consequence is priming folks to think "operating system" when that is not the case, because a library is not an OS, and an in-memory name-tree atlas is not a file system. Confusion seems doubly likely when the idea is to introduce OS-like behaviors to the runtime of a language or application, because folks will draw on experience from OS-related work. Thus analogy to OS features may cause some readers to assume an OS goal, instead of an app simulation entirely for purposes of internal organization.

Take an app running in a single OS process, then draw a circle (or a sphere, in a 3D metaphor) around a subset of that app consisting of code from a library that runs an async fiber engine, hosted in the app by an environment layer connecting the engine to the outside world in whatever way pleases the app's developer most. Inside the delineated circle is a closed world that interacts with the host only via an explicit env interface. Platform-dependent things go in the env, like system calls. The app injects runnable code in the form of fiber jobs in lightweight process lanes, via code that has been CPS (continuation passing style) transformed to permit running in the closed world. This is where dependencies on the outside env come from, injected by the app. Jobs only service sockets for connections opened by the app, because the app spawned a lane tree (of lightweight processes) running such code, making env calls that park jobs until an async reply awakens them to resume.

This closed world is a kind of async flatland, running in the surface of a sphere where the trampoline lives and the C stack has zero depth until a leaf call occurs, leaving the surface briefly until the C stack unwinds to the trampoline again. The host app's env implementation is a kind of atmosphere around the flatland world. The world's view of everything is an atlas in name-tree format, imitating a file system that mounts whatever needs to be visible. The atlas can map everything: within the world, outside in the app, and remotely in other local OS processes or distributed network nodes. As long as something is willing to implement a suitable async file system protocol, it can be mounted at some path in the atlas name tree. (Since the world is composed of concurrent lightweight processes, it's easy to write embedded servers acting as proxies for remote systems.)

And imitating Plan9, while remembering we are not doing an operating system, we can have per-lane views of the atlas, the same way Plan9 processes can rearrange the local namespace any way they want. Trees support copy-on-write fairly easily, so space overhead for a lane need only be proportional to changes made to an immutable atlas inherited from a parent. (Changing a tree leaf in the atlas name tree must CoW the path from root to leaf, but everything else can remain immutable and shared.)

But why? Why do we want a file system style namespace in the form of an atlas name tree? Because uniformity enables abstraction with low coupling: you can change the implementation completely while leaving the path structure and names the same. Looking things up by name is late binding with low coupling. Accessing things directly and synchronously is early binding with high coupling. (The more memory you share, the tighter the coupling.) Whenever code is compiled to use statically known offsets in binary data structures, the result is fragile from direct dependency, since any change in format requires rework. If an interface changes, a dev must fix it personally. So loose coupling via names can often mean less dev work associated with integrating changes.

A library for an async fiber runtime doesn't need an atlas name tree to imitate a file system. An app can spawn lanes to execute jobs for tasks having nothing to do with services mapped to name-based paths. So a minimal VM may not use a pseudo file system based on an atlas. A completely separate library might define atlas interfaces and tools; you would only need to link an atlas library when running jobs that make atlas api calls, directly or indirectly. If code footprint is no object, you want everything; when you write an embedded shell, you want the atlas name tree library too, or else there's no file system simulation. Several sorts of things can go in separate libraries: unit tests, load simulations, debug tools, reflection support, even floating point utils. (Some apps compile with options that trash floating point registers, so any floating point code will fail if interleaved with routines compiled that way. Therefore floating point code should be unreachable in such executables, even if present in a library.)

Unlike Plan9, which wanted everything to be a file, we only want it possible for things to look like files, when this would reduce coupling. For example, suppose runtime data structures for each lane put command line args (and/or environment variables) in specific places. We can walk data structures directly ourselves, or we can read from a path in the atlas name tree, which turns into code walking the data structures for us. If the runtime implementation changes, and the data structures with it, all code taking the direct-access route must be updated when tightly coupled. But indirect access via an atlas path may only change if names change. The more likely something is to change, the bigger a problem this is. Tentative and/or distantly related components (developed by independent parties) will break you unless access occurs only through an indirect bottleneck, automatically updated for you when those components change.

Consider an extreme case. A customer has an old binary with a problem, and you want to debug it live, even though all your code base has moved on to newer and better things. Perhaps debug tools you want to use only exist in recent builds. If that customer's binary linked with the reflection library, you might mount a reflection info path from their remote atlas, at some point in your local atlas, then use that to drive local debug tools instead of assuming a binary layout match in each system. You can avoid assuming, when you can go fetch explicit description from a known location advertised in a name system. It's the assumptions that kill you when wrong.

When the same information already exists in some other form, it need not be reified in an atlas name tree: you can synthesize it on demand when needed at runtime, in the laziest manner that gets the right answer. For any structure you want reflected in an atlas name tree, you only need code willing to render it for the atlas system when asked. Change a data structure? Then change how it gets mapped to the atlas, and you're done. (Failing to do so might often cause a compile error, if typing is static enough.) This resembles runtime data structure interpretation: converting one binary format to another uniform format on demand for non-fragile indexing.

An atlas is like a registry that advertises what you want to know. (If you squint, an atlas is a tuple space with hierarchy.) In some cases, if an atlas is done for you, it might reveal only what you are allowed to know, for the good of app organization. However, inside one OS process, there's no way to protect yourself from other code running in the same process, since anything in mutable memory can be corrupted at will. This strikingly differs from objectives when writing an operating system, where protection is a key feature. Inside one app process, protection is cooperative, because it can be subverted by walking data structures directly instead. So the value is organization, not safety.

In an agent-based system, sending messages might get most things done, so writing to pseudo-files in an atlas name tree would be an awkward way to get similar control effects. But an atlas affords access to information an agent would prefer not to receive as messages: the history of the application, for example. (If you compiled an app without an atlas library, an agent would have to send a message requesting the info desired. But even with an atlas api, reading the name tree might park a job and send a message to a remote node in a distributed system to get information not cached locally. So the location of messaging layers is pretty fluid and fungible. An atlas name tree merely has a consistent contract that's easy to trace in human-readable form.)

For debugging purposes, you sometimes want to log atlas name tree reads and writes, since otherwise you might be unable to learn why a lane (lightweight process) failed or misbehaved, if the proximate cause was a datum from the atlas. (Yes, this is exactly like tracing system calls in Linux to discover explanations for events observed. To reason about effects we need to see the causes.)

Well, I'm sure I forgot to mention something, but I can always append a note. Edit: oh yeah, a couple more ideas are only tangentially related to atlas name trees. I'll note a simple one here, then post separately on file and socket similarity. The simple idea is about where reflection info goes in a tree, given that it is often static in nature, and thus needn't be inside an executable's binary. Meta-information and meta-meta info eventually ground themselves in a fixed point, when a given entity describes itself, the same way that the class of Metaclass is itself in Smalltalk. So while a tree of meta-info might seem to entail infinite regress, in practice it doesn't, when you make haste toward a self-describing base case. A separate file can enumerate the static parts of a reflection sub-tree, acting like a little file system that gets mounted at the right point. You can either load it in memory, or fetch pieces from the file each time needed, with a cache to suppress back-to-back i/o for the same thing. When this amounts to help info, it's nice that you can upgrade it separately without recompiling a binary.

valves, spouts, and flows

Sockets, files, pipes, and other things that can act as sources and sinks for i/o streams have something in common that might be given a common interface, for which I want a separate term, because prose is less confusing when different words are used for abstract and concrete instances of the same idea. In this case I settled on the abstract name valve, whose interface can be specialized for files, pipes, sockets, etc. So they are all valves. A valve represents the nature of a chokepoint dealing with a stream handled only in discrete blocks of data, which presumably live in buffers on the app side of a valve. (Immutable refcounted buffers, in my plan.)

The business end that job fibers deal with, defined in app-specific entities, is named a spout, which hosts refcounted data fragments among other things, acting as the front end of a valve. A spout is something like a FILE in C, while a valve is something like a descriptor plus other state and behavior to specialize stream type. The abstract name of content in the data stream itself is a flow, which is not manifested by a concrete object (unless the environment does that, which it might).

Ah, and streams with both source and destination living entirely inside a single runtime are called ... something else I haven't picked yet; can't live with thorn domain sockets. Channel is the best socket synonym, but that invites endless Go comparison. Outlet? Vent? (Jetty is a synonym for quay, but emphasizes sticking out in water. No one will keep the two straight, so that sucks compared to channel.) In analogy to port numbers, lightweight connections can use quay numbers, so you can bind, listen, and accept thorn domain socket connections involving quays. You won't be able to tell the difference between these and the other stream types, unless you go digging.

Edit: after mulling it over, the quay metaphor stinks, because the meaning of port is closer to opening there. So gate is better. And for socket, slot is closer too, when going for a near match. Because of airports, the phrase "gate number" doesn't sound like a weird replacement for port number. And slot is easy to explain as socket replacement. When it sounds ambiguous, saying gate slot is clear, meaning a slot associated with that gate. (More than one slot can use the same local gate for obvious reasons that mimic sockets.)


Sorry to be falling behind on the parsing. How does it differ from other dataflow / flow-based / orchestration systems, succinctly? Thanks.

universal watchmaker toolkit

Succinctly, though not my plan, you can write all those.

How about this?

in-the-large via effective composition of in-the-small

The image I had in mind was small gears.

Metaphor and morphism

I tend to think in terms of Boolean algebra, but the gear/mechanical picture is always there and works well with C. The gear picture is also less abstract. Composition is easily understood as adding more gears, as long as they mesh with the ones already there. This is not unlike the field-extension idea that I mentioned above.

prefer story based metaphors about physical systems

While I liked your earlier comments, math interests me little, unless there's something for me to imagine as a spatial system. I don't think I ever had a math grade less than an A, through calculus and discrete math, but I didn't find it fun -- just easy, because I understood things the first time and remembered it all. I don't find math a useful source of metaphor and intuition, but terms can shorten some descriptions. In code I do something like visualize changing graphs over time, and verbal logic doesn't help at all.

Like all analogies, the gears metaphor has failings, like possibly implying lockstep clockwork, when that's not a good fit if visualizing a system of thousands of independent watches, loosely interacting. A better fit is something like a book analogy, where each page depicts a clockwork system, but no page need interact with gears on another. Where in math is a good source of metaphor about that?

Richard Feynman

xkcd 895 (as usual, the mouseover is the uber-punchline).

Domain theory

A better fit is something like a book analogy, where each page depicts a clockwork system, but no page need interact with gears on another. Where in math is a good source of metaphor about that?

This is not one of my strong areas, but aren't you referring to denotational semantics and domain theory?

surprising suggestion

Not intentionally. That I'm aware, no practicing developer pays attention to either. After skimming now, I see no reason to pursue.

apples and oranges

How does it differ from other dataflow / flow-based / orchestration systems, succinctly?

I have time for a longer response now (skipping the holiday party). That's a what question in response to my how description, so it's basically a category mistake. You can do all those objectives in different ways, using unrelated programming languages for example. Can you do dataflow and orchestration in Ruby, C, C#, Lisp, Smalltalk, Erlang, assembler, Basic, Fortran, etc? Yes, because how you do something only changes the amount of trouble it is, though some tools don't get along well and that can be limiting. Was that too long? It was more than ten words. What's your attention budget?

I sorta hate xyz-oriented labels like object-oriented. You might call it process-oriented programming, when an app uses multiple green processes, aka lanes, so you might call it lane-oriented. But other related words in a cloud of semantic similarity include: async, portable, fiber, process, distributed, agent, CSP, coroutine, concurrent, dataflow, scripting, embedding, scaling, nesting, messaging, streaming, and maybe a few more. I don't see a reason to emphasize one. I think focus should be a model that makes sense, and not disembodied semantic aspects. One thing several aspects have in common is that they are irritating to do, which is one of the reasons I group them under a thorn umbrella, as a generic irritant.

But you know, it's just a model. I'll put it up somewhere like Github, if I code it, under a permissive license like BSD, because that would please me. I don't care about the free software community at all, beyond vaguely approving of constructive behavior oriented toward social good, which ought to be the norm and isn't. However, I expect just describing it this way directs attention toward similar efforts (I see signs to this effect) and that's good enough for me. It'd be nice to get feedback on LtU, but that doesn't really happen. I generally look to HN for reactions to anything interesting I post, in the form of semantically related stories that coincidentally have ideas in common.

helping others help you

my (hidden?) point is that if you can help orient other people quickly, that's i think a good thing for the chances of people wanting to get involved. (well, or if they understand and then disagree, maybe not, of course.)

orientation in complex low level use cases

I guess I could write a lot about orienting people with respect to specific use cases. But I was expecting docs to make sense only to folks with several years experience developing in low level imperative languages aswarm with pointers, manual memory management, and hand-rolled oo in C via function-pointer based vtables. (In other words, folks used to walking on hot coals, with a different idea of what is actual cause for whining, not just "owie, I got hurt when a thorn poked my finger.") A safe language can be written on top though; you could write a hypercard clone that made it hard to do anything destructive in scripts. But that's a different tier of dev than core hacking.

Inviting help enables both good and bad things, and as Sturgeon said, ninety percent of everything is crap. :-) And of the non-crap remainder, much of it is beset with agenda, though Sturgeon didn't tell us how much. So after wading through noise, part of remaining signal aims to steer you off a cliff. In docs I expect this can be illustrated by example, using characters with different agendas. Ivy Trellis who loves C++, for example, wants to make everything depend on everything else in a huge monolithic raft of best practice libraries, impossible for one person to grasp as a whole or to protect themselves against stupidity and malice, or simply baggage. (To use Ivy's code successfully, you might need to defeat her library's seven evil exes, ie baggage.)

Part of the focus on source rewriting is explicit support for fork-only libraries, or fork-by-default anyway, by using non-overlapping symbol sets. I could maintain two forks: one that takes bug fixes only, and another integrating random crap people want to add until it dies after the baggage metastasizes. But actually I don't think I'll have time to talk to anyone other than exceptional collaborators. For me to provide support would require a salary (and I'm old, so I'm not cheap), probably putting me in an evil bastard category pretty quickly for a lot of folks. A readme can give plenty of warning.

A coworker of mine agrees with me it's now easy for us to get anything to happen that we have in mind. But hard parts are two-fold: what is the old code doing, and what actually happened after the new code was added? In other words, visibility is the problem. Large, complex systems can do crazy things, and they'll keep doing them -- just emergent behavior -- after you add new code, no matter how good it is. So in a design where the idea is to put lots of concurrent things running in one address space (like I've been doing for years now), the problem is observing actual runtime behavior, so you can form intuition about cause and effect. Part of the purpose of the atlas name tree (namespace) described is to reflect state in a way easy to export in URLs for browsing, and to enable scripts reaching in and tweaking details with digital tweezers, for in situ experiments and live system examination. That's above and beyond the divide-and-conquer role of cooperating loosely bound pieces.

I've been thinking lately about starting with structure of tests for initial synchronous library code, like C-macro-based collection templates, and tarball based storage mounted in the atlas for a quill rewriter of C sources. A command line executable would take as one argument a path in the atlas identifying a nested program you want run, exactly like the path to an OS executable, but just one fragment of all the linked code. Everything a binary could do would appear in the atlas name tree, along with meta-info, including help docs, some of which might actually be text in a separate file but mounted in the atlas. A test binary might link everything in the world, but reachable on different paths. Of course tests would appear in a directory where /test/ was part of the path. (And in non-test binaries, nothing would be under that path.)

Apple had a development tool named MPW I used several years, and executables running as coroutines had help info you could display on demand, often pulled from text-based help files. This would be similar, but file system oriented, so an html version could also be browsed just as easily. With some ux provided by front end widgets, a graphical shell might be easy to set up too, with menus and buttons etc simply running scripts in the atlas namespace, extended by letting a dev mount new things wherever the store lives (such as, hypothetically, a tarball for one of the dumb versions).

If you can run thousands of concurrent lightweight processes in one OS process, the hard part is telling what's happening -- where are my cycles and storage going, and why did this tool hang and should I kill it? A lot of generic unix tool clones would be helpful. But I wouldn't include anything in a fork I was maintaining except parts completely documented by the dev involved, without pointing at external specs. (No more "just go read those ten thousand pages of committee report summaries, and you'll understand what this does.") I'd rather have too little as long as it's simple and I can go over every bit of it.

i once was the human C++ compiler

Believe me, i've been there, too. (Edit: although I didn't have the massive single-process lots-of-threads insanity to nearly the degree you describe.)



critical sections

Today I have a couple related topics involving mutexes in critical sections and assignment of fibers to threads (when this can vary in a runtime version of a language or system). Maybe I shouldn't assume everyone knows what a critical section is; I'll pretend, but summarize the idea anyway. As you know, a critical section is code in a scope where you want to serialize concurrent access to a bunch of related state, since otherwise invariants might fail during a data race, when related changes do not finish atomically. When code is naturally atomic, and cannot be interleaved with concurrent behavior in another fiber or thread, it's already safe, without a mutex. But what if it stops being atomic? What if a section of code requires atomicity, but you didn't declare it when already true, then later code changes make it stop being atomic? Hmm. No tool will tell you, because you didn't assert the requirement of atomicity for a critical section.

For example, suppose leaf calls in a fiber runtime are atomic, because a local call tree makes no async calls, and therefore cannot be interrupted under cooperative multi-tasking. A dev can see it's okay to make several related changes with a call to foo() in the middle, because a yield cannot occur until they finish atomically. However, suppose later someone decides foo() should become async, and declares it thus in new meta-info annotation. The next compile will rewrite calls to foo() as async, and the previously atomic leaf call is no longer a leaf call -- it can yield whenever foo() is called. Then if a yield does occur, another fiber can come in and see a partial update, as yet incomplete, before pending state changes finish. Here making a function async breaks a place where atomicity was needed, because a dev didn't declare it, because it was atomic at the moment of first writing.

When a dev needs an atomic critical section involving multiple calls to functions currently safely atomic, it should be declared in some way, so if later any of the called methods change so they can yield, you get a compile error ... or better yet, safe results because a mutex is applied to the critical section. A cooperative mutex might be best, which a compiler can optimize away when it sees all code called in a critical section is atomic already and cannot be interrupted anyway. (Or it can change the mutex into a "mock mutex" reporting an error if, impossibly, other code tries to lock it while the atomic critical section is running: a good reason to abort.)

The following section introduces a half-baked idea motivated by testing the situation above during runs in a prototype using native OS threads instead of fibers. That it's half-baked means there's a puzzle for you to work out: whether the idea becomes cooked when fleshed out sufficiently, or whether it can never work. I'm trying something unusual in describing it before I know the answer. (Of course, I expect this may draw a stupid comment from someone new to thread pools; I may simply ignore remarks that seem irrelevant.) When the goal is to eventually have lightweight processes streaming data with consumers and producers in a pipeline, they don't know where an adjacent pipeline leg runs: another fiber, another thread, another process, etc. It should still be correct if it's another thread or another OS process. So a prototype of fiber designs can test under threads, to get a baseline that should be the same once fibers are used instead. (Code originally written as a thread can be rewritten, via continuation passing style, so it works as a fiber instead. But to verify fiber and thread versions act the same, you can actually run it as a thread and compare, if you want empirical evidence and not just a nice-sounding argument.)

job thread pools

Suppose a job is an entity that normally maps to a fiber (ie green thread), but can map to a native thread instead, especially for differential testing purposes. Given a thread pool of more than N threads, clearly running a test with N jobs is easy: just map them one to one. But what if you wanted to test J jobs for J much bigger than N threads? Can you bind a job to a thread from the pool only when it needs to run and make blocking system calls? If the answer was yes, you might be able to get a system performing almost as well as a fiber runtime, if a huge number of jobs can park while not currently bound to a thread, while waiting for an i/o event.

When jobs are servicing i/o events from streams or message inboxes, you might only need a job bound to a thread when servicing an event. When a job needs to wait for another event, it might disengage from its thread and park until the next event. (In a fiber runtime, a thread is only needed when making a blocking system call; the rest of the time a job simply runs and parks in the main thread of the fiber runtime.) In a more general sense, a job may wait on conditions, of which inbound events and messages are just a subset. Other conditions might be signaled by job peers using lightweight green conditions (not native pthread conditions).

Non-native green conditions are not thread-safe: they would need to be administered by a single-threaded conductor, like a traffic cop manually signalling which motorist goes next in a street intersection. Suppose the job cop runs in a dedicated OS native thread. Under fibers, this is where the scheduler would run. But if we're just simulating fibers, we can use a thread pool to actually run code concurrently (maybe not in parallel, though, if threads are not serviced by multiple CPUs). In effect, everything would synchronize through the job cop thread, which might seem like a bottleneck, except the job cop might not be doing anything except directing traffic, thus lightly loaded. The main bad part would be extra context switches, as part of the cost of having many more jobs than threads. Latency to context switch for a job to unpark would look like activation threshold cost -- high compared to activating a fiber, but maybe not high compared to the i/o generating the event awakening the job. You would only get throttled if you tried to consume more inbound events in throughput than you could context switch in binding jobs to threads while also doing some work afterwards. If you had idle CPU, context switches might only increase per-response latency.

Undergraduate computer science operating system classes like to show how several sorts of synchronization mechanism can be written in terms of each other. Given semaphores, you can write monitors and message boxes, while given message boxes you can write semaphores and monitors, etc. In this case, the green conditions used by jobs in a thread pool might send messages to the job cop thread, so all the synchronization primitives are built from messaging. A job parked on a green condition this way probably blocks the thread running the job, so the thread isn't actually free when a job parks on a condition. We might only unbind jobs from a thread when handling an event is done, and the handler returns. (Unless you actually have fibers, trying to park with a nested stack only looks like a thread when it is one and blocks with the C stack in that state. We can only unbind a job from a thread when the stack unwinds from an original binding entry point.)

That's all hand-waving though. Something might not work. Seems like it should, but details matter. Anyway, it might be possible to debug concurrent unix style tools that are written in direct/threaded style by actually running them as threads, then copying data for address space simulation in the streaming behavior, with a distinguished job cop thread acting as the switchboard with extra context switches while having many more jobs than threads (a good idea only when a majority of jobs are quiescent while waiting on i/o events). A daemon written this way should be able to interact directly -- via messaging -- with a daemon written completely differently using fibers. As black boxes, daemons might look just like each other in terms of behavior, when ignoring costs like latency from context switching. Part of a test suite might verify a test does not change results when a tool runs as a fiber versus running as a thread. It's likely easier to debug the thread-style version in a debugger.

There you go, that was for people who think threads and related designs are interesting. I didn't ask a question. But I would find any analysis interesting when it argues this testing design cannot possibly work because of a situation spelled out clearly. (Saying "people don't do that" is not an argument, because some people are sheep without imaginations, so there are many things they don't do, that work anyway.)

Edit: If a natural way to write fibers is via blocking calls (green-parking a job instead of native-blocking), then the same code run as threads will look the same, instead of looking event driven. So one-to-one (jobs to threads) is the natural match when threads want to closely imitate fibers. The many-jobs-per-thread model seems like something one would do mainly when not intending fibers done in continuation passing style. And as long as you decided to go that way, there's no reason to use blocking conditions either, since anything a job needed to know about could be turned into an event: hey, the condition signaled, etc. Given a high level FSM (finite state machine) description, you could probably compile this to a nice job definition that suits temporarily binding a job to a thread for event handling. But it wouldn't closely match jobs written to look like a plain standalone process. So while it's sorta interesting that it seems feasible to support a lot of async tasks with fewer threads, the testing regime for comparison to fibers probably needs to bind a job to a thread for the job's lifetime.

avoiding locks with immutability?

Some folks read about critical sections and think "that sounds hard" and dream of simplifications based on immutable data, sounding something like, "If state never changes, there's no need for mutex, because no races are possible!" (High fives all around.)

Are you guessing I'm going to say that's wrong?

Suppose your immutable data comes in clusters of related bits that must make sense and be consistent with one another. It's not enough to replace old immutable A with new immutable A'; you also need to replace old B with B' and old C with new C'. So tuple { A, B, C } needs to become { A', B', C' } atomically, without mixing old and new in the tuple seen by a reader.

If new versions show up concurrently and non-deterministically, while concurrent readers try to access the tuple as new values show up, how do you ensure a tuple seen has a consistent version set? You can lock it, or replace it atomically with a new version of the tuple.
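To make the second option concrete, here's a minimal Python sketch (all names hypothetical) of replacing the whole tuple atomically through a single reference, so readers snapshot either the old version set or the new one, never a mixture:

```python
import threading

class World:
    def __init__(self):
        # one reference to an immutable tuple; readers see the old
        # tuple or the new one, never a mix of generations
        self.state = ("A0", "B0", "C0")

    def publish(self, a, b, c):
        self.state = (a, b, c)  # a single reference swap (atomic in CPython)

w = World()
seen_mixed = []

def reader():
    for _ in range(10000):
        a, b, c = w.state  # grab the reference once, then unpack
        # generation numbers must all match if the snapshot is consistent
        if not (a[1:] == b[1:] == c[1:]):
            seen_mixed.append((a, b, c))

def writer():
    for v in range(1, 1000):
        w.publish(f"A{v}", f"B{v}", f"C{v}")

threads = [threading.Thread(target=reader) for _ in range(4)]
threads.append(threading.Thread(target=writer))
for t in threads: t.start()
for t in threads: t.join()
assert not seen_mixed
```

The key point is that readers read `w.state` exactly once per access; the reference swap is the atomic unit, so no lock guards the tuple's contents.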

But what if the tuple itself must be consistent with changes in other tuples? The "replace atomically with a new world root" trick only works if you have static global view of everything that has to be consistent with everything else, which implies high coupling. If you want decentralized low coupling, it might be hard to avoid locking when replacing published versions of related immutable data sets. (Functional languages are obligated to fudge immutability by hiding change in name binding systems.)

You normally create a new

You normally create a new world W' and don't make that world visible until it is completely constructed, so readers either see W or W', but not some mixture of both. But ya, this doesn't work well in practice, which is what Glitch is designed to avoid (simply replay things until everyone has seen the final W'').

what about partial internal dependency during construction?

Your optimistic algorithm to resolve after conflict is interesting. I assume the event stream causing change is human-driven and therefore low frequency? How do you calculate whether it always converges? Does there exist a data set with dependency chains long enough that a person could invalidate so much downstream work wholesale that resolution cannot keep up until the person stops and waits?

Calculating the new world W' generally requires internal locking if done by concurrent entities like threads or fibers. Lazy code is far easier done synchronously, because concurrent control flow joins need locks for barriers when resolution is pessimistic. If you depend on another value, and wait for it to resolve, the mechanism (a barrier based on a condition) notifying all waiters the value is available will contain a collection of waiters that's hard to make thread-safe. So mutex is needed in the time-dependent mechanisms themselves.

Convergence is ensured by

Convergence is ensured by checking for and disallowing cycles in the dependency graph, but there are other kinds of cycles that require "backing off" when an effect goes on and off (which can occur in very large examples). I don't have any formal reasoning about the termination case, just a lot of fuzz testing results.

Events can come in as fast as they want: Glitch processes work from earlier times to later ones, so processing would just fall behind while still making progress.

Writes (including for reader dependency lists) need to be protected by locks. Only reads are unlocked, which makes it impossible for deadlocks to occur.

Avoiding locks with copy-on-write

In my hobby lisp I use a threading model that avoids locks and mutexes.

The way it works, a process can spawn two (or fifty) child threads simultaneously, but is itself suspended for as long as any child is running. Each child thread has read access to the state of the suspended parent, but no write access to it. Whenever a child thread performs a "write" it gets its own copy of the written variable. So no thread sees any value change due to a write by any other thread; the children don't see writes by the parent because the parent is suspended, and don't see writes by each other because of the copy-on-write mechanism. The parent doesn't see writes by its children because its state is immutable (and it is suspended) for as long as it has children running.

Children communicate with the parent by returning values, exactly like called functions except that there can be many of them running during the same interval. For purposes of most algorithms, the children do not communicate with each other at all. When it is absolutely necessary that they do so, they use sockets, like any other interprocess communication.

It works tolerably well. It is of course slow if the child threads write a lot of variables, but if people program in a "mostly-functional" style it's quite serviceable.

It exists mostly because I think the standard threading model where every thread has write access to the same environment is insane, hard to debug, and cannot scale beyond a single memory bus.
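A hypothetical Python sketch of that model (CowEnv and spawn_children are invented names): the parent's environment is treated as frozen while children run, and each child's writes land in a private overlay, so siblings and parent never observe them:

```python
import threading

class CowEnv:
    """Copy-on-write view of a frozen parent environment: reads fall
    through to the parent, writes copy into child-private storage."""
    def __init__(self, parent):
        self._parent = parent
        self._local = {}

    def __getitem__(self, key):
        if key in self._local:
            return self._local[key]
        return self._parent[key]

    def __setitem__(self, key, value):
        self._local[key] = value  # never touches the parent

def spawn_children(parent_env, jobs):
    """Run jobs concurrently; the 'parent' resumes only after all join,
    and children communicate back solely by returned values."""
    results = [None] * len(jobs)
    def run(i, job):
        results[i] = job(CowEnv(parent_env))
    threads = [threading.Thread(target=run, args=(i, j))
               for i, j in enumerate(jobs)]
    for t in threads: t.start()
    for t in threads: t.join()
    return results

def writer_child(env):
    env["x"] = 10        # private copy; invisible to siblings and parent
    return env["x"]

parent = {"x": 1}
results = spawn_children(parent, [writer_child, lambda env: env["x"]])
assert results == [10, 1] and parent["x"] == 1
```

This copies at variable granularity on write, which matches the "slow if children write a lot" caveat: a mostly-functional workload pays almost nothing.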

So you basically reinvented

So you basically reinvented MPI? :) NUMA definitely has its issues.

that's a cool way to avoid parent/child locks

That's really clever. I like the way contention is minimized by agreement, avoiding interference without resorting to locks that force context switches, since they won't be necessary. If done with OS threads, I think you can get away with very little synchronization. A barrier to resuming the parent might be just an atomic refcount of children, so a zero count means the parent must awaken. (Actually awakening the parent seems to require some mechanism under mutex internally, if this is native threads and not green threads.)
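The refcount-barrier idea might look like this sketch (ChildLatch is a made-up name): the decrement happens under the condition's lock, and the last child to finish wakes the suspended parent:

```python
import threading

class ChildLatch:
    """Countdown latch: parent blocks until the child count reaches zero.
    The final child to decrement signals the parent -- this is the one
    place a mutex hides, inside the condition variable."""
    def __init__(self, count):
        self._count = count
        self._cond = threading.Condition()

    def child_done(self):
        with self._cond:
            self._count -= 1
            if self._count == 0:
                self._cond.notify_all()   # wake the suspended parent

    def wait_for_children(self):
        with self._cond:
            while self._count > 0:
                self._cond.wait()

latch = ChildLatch(3)
done = []

def child(i):
    done.append(i)       # do the child's work, then report completion
    latch.child_done()

for i in range(3):
    threading.Thread(target=child, args=(i,)).start()
latch.wait_for_children()   # parent resumes only at count zero
assert sorted(done) == [0, 1, 2]
```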

The goal is to prevent interference in critical sections: state visible concurrently that must maintain an invariant if it changes. Keeping as much immutable as possible is a good way to sidestep the issue, when there's no exposure.

If children share a heap, it must use mutex to prevent corruption under concurrent access, unless both alloc and free are atomic (which really can't be done easily with native threads). If they don't share a heap, and work in independent spaces, then a handoff must occur when returning values to the parent. That might require mutex somewhere, if only at free time.

Though done voluntarily without a mutex api, the suspended parent is basically read-locked for N children readers. And space used by each child is basically write-locked for one writer, though maybe by convention only. Doing this by rule instead of executable code is elegant and efficient, because it doesn't waste resources letting threads wrangle with one another.

Shared state change spanning potential context switch is the problem. In green threads we can isolate a critical section inside a block immune to pre-emptive yield, so a lock isn't necessary. But if a yield must occur somewhere in the middle, there's a problem even if another OS thread cannot see state involved. It only takes another independently scheduled fiber to see a partial update. Under concurrent fibers, critical sections must be made fiber-safe even if they are already thread-safe due to lack of visibility across threads.
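A minimal sketch of that point with generator-based fibers (all names invented): the transfer's two writes sit between yields, so no other fiber can observe a partial update, and no lock is needed -- the critical section is atomic by construction:

```python
def scheduler(fibers):
    """Round-robin cooperative scheduler: a fiber loses control only
    at a yield, so code between yields is a critical section for free."""
    while fibers:
        fiber = fibers.pop(0)
        try:
            next(fiber)
            fibers.append(fiber)
        except StopIteration:
            pass

account = {"a": 100, "b": 0}

def transfer():
    # both writes happen with no yield between them, so no other
    # fiber can see the money in flight
    account["a"] -= 50
    account["b"] += 50
    yield

def auditor(snapshots):
    snapshots.append(account["a"] + account["b"])  # invariant: total is 100
    yield

snaps = []
scheduler([transfer(), auditor(snaps)])
assert all(s == 100 for s in snaps)
```

Moving either write past a yield would let the auditor fiber see a partial update, even though everything runs on one OS thread -- which is exactly why fiber-safety is needed on top of thread-safety.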

But the best plan is to do what you did: think, then arrange that conflict doesn't occur, so no explicit locking is needed, since that performs better. Immutability for scopes in which conflict can occur is great. You would need to communicate this model to other devs though, lest they break it under maintenance due to lack of understanding.

When simulating OS processes in a fiber runtime, immutable streams between green processes are a good way to prevent seeing shared state change, when it doesn't change. Inside one green process though, cooperating green threads operating on shared mutable state need to avoid exposing partial updates to one another, even if the whole green process is single threaded. You're forbidding them from having shared mutable state, but I can't when I want to port any random code someone has running already.

Edit: sockets are a good way for threads and fibers to send copy-only messages needing no mutex when sharing is absent. (Using copy-on-write immutable messages within fibers living in the same thread is similar.) Since this good coding style will be used in threads ported to a fiber runtime, it's why I want green sockets subclassing an abstract valve api. That way fibers can talk to anyone with sockets, locally or remotely, and it should behave like it does in a standalone process.

It's built into the API...

I wouldn't need to warn another developer about this to prevent him from breaking it; in fact if he wanted to break it I'd probably have to explain how. Normal writes to variables will simply fail to be visible to other child threads and parent threads will be suspended automatically when child threads are launched. Nobody can break it by accident.

In order to do anything differently, you have to use bindings that are not defined unless you #define INSANITY >1 and #include SHARED_MEM_THREADS.

#define INSANITY takes an integer value; 1 for problematic constructs which can nevertheless be used in a disciplined way like threads with access to shared memory, 2 for "binary-image casts" or anything dependent on accidents of binary representation, 3 for deeply problematic constructs like reified continuations which cannot under any circumstances be composed with anything else that also uses them. It's basically a way of having the programmer say "I have been warned."


It's good when how you use something enforces a desired quality. But if hidden by transparency, semantics may be invisible. I don't expect devs to fail to see how threads work, though, so failure to create a good mental model seems unlikely.

On composing continuations: as if they were functions? I think of them as control-flow mechanisms, like special forms (e.g. if and and) causing jumps, with a continuation much like goto joining any previously reified call tree location. Since values are passed, I guess you could compose dataflow streams, but control-flow composition is almost out of the question unless the idea is to abstract goto behavior (and then nearly no one would understand it). Pipes compose as dataflow, and you can compile them to something using continuations, but it only looks like composition in the form expressing "what happens" in dataflow terms.

When syntax is uniform like Lisp pair tree ASTs, everything might look uni-typed when different things have little or no relation. It's the names and not the form that designates semantics in Lisp syntax, so one has to treat names like abstract syntax via tagging.

I thought reified continuations were neat....

until I tried to combine code from different projects one of which had used continuations to implement green threads and one of which had used continuations to implement dynamic variables.

It turned out that there was absolutely no way to have both of these functionalities exist in the same program and have them both implemented in terms of continuations. At least, not without creating a program that leaked unrecoverable memory every time a call was made.

More generally, it turns out that if you are using continuations to serve any two or more semantically distinct needs, you cannot combine or mix code that depends on both of those needs being met. In some cases you can completely rewrite all of it to get code that does both things, but, first, this is rarely possible in practice and provably impossible in some cases, and second, in the relatively few cases where it's possible the structure of the new code has no apparent relation to the structures of the things you were trying to "compose".

And the more I thought about them, the more I realized that reified continuations turn out to be a Wrong abstraction. Correct abstractions create things that can be combined into other things. Wrong abstractions create things that may work but which cannot be combined or composed.

cost amplification in leaky abstractions

I'm totally onboard with your story, and won't say a thing in contradiction. But it's a good excuse to analyze a kind of common problem so younger system programmers can think about it and plan accordingly. Phrases that engineers might use in this context include: leaky abstraction, impedance mismatch, cost amplification, unclean definition, or (generically) interference. I'd suspect the implementation of dynamic variables was at fault here, by depending on too much, violating a principle that amounts to "remember the least amount that lets you work correctly". If they pinned a stack in memory because of lazy behavior, that sounds like more than required.

Because threads need something like a stack, using continuations is a close fit, because cost is not a lot more than the stack needed to run when control flow goes there again. There could still be accidental cost amplification, but it's hard to see it being large. But a variable is something one thinks of being small -- just one part of a closure in one part of a stack -- so holding a whole stack via continuation seems like gross disregard of cost budgeting.
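As a rough illustration (not anyone's actual implementation), a trampolined CPS sketch in Python shows why a continuation costs about as much as the stack it stands in for: each one is just a closure holding the frames still needed to resume:

```python
# Trampolined continuation-passing style: instead of growing the native
# stack, each step returns either a finished value or a thunk (the
# continuation) that the driver loop invokes to resume control flow.
def countdown(n, k):
    if n == 0:
        return ("done", k)                      # deliver the saved value
    return ("more", lambda: countdown(n - 1, k))  # continuation as closure

def trampoline(step):
    tag, nxt = step
    while tag == "more":
        tag, nxt = nxt()   # bounce: run one step, get the next continuation
    return nxt

# 100000 steps without blowing the native stack, because the "stack"
# lives in the chain of closures rather than in call frames
result = trampoline(countdown(100000, "finished"))
assert result == "finished"
```

A dynamic-variable implementation that captures a whole continuation to hold one small binding is paying for this entire closure chain to remember a single cell -- the cost amplification described above.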

I can see how first class continuations in Scheme contribute to the problem, because they are black box rather than white box (so you can't tweak their internal behavior and costs), and imply memory usage of unknown size, because the Scheme model favors ignoring space cost under automatic memory management. Fragmentation is a really easy cost to amplify when a device has implicit associated heap allocation. Memory management semantics, specifically, is one of the biggest causes I know of impedance mismatch ("using this api indirectly subverts what I want to do because its assumptions don't match mine"). Sync vs async is another, but comes up less often when async api is basically unusual ... except that maybe a lot of network related code is far twistier than it needs to be, when it constantly marries interfaces that fit poorly here.

You talk about continuations in Scheme -- a specific instance -- as if they characterize the general idea abstractly, implying every specific concrete version might be very similar. Maybe they would need to stay pretty black box in a spec that wants to specify as little as possible, so every possible way of implementing Scheme can do them differently. But the consequence is inability to alter unexposed details, when there are no tuning knobs. I almost always want to distinguish between a general idea and a specific local version, but usually the same word is used, making them hard to distinguish without using different typefaces and subscripts.

(Every time I work on a new library, the existing code is clearly a mix of different abstract plans X, Y, Z etc by the same person, or different successive engineers, each realizable as different concrete versions x1, x2, x3, ..., y1, y2, y3, ... z1, z2, z3 ..., and some are incompatible with each other. Some of the plans are broken and can never work, such as X contradicting itself internally, so if you see x1 you already know it's wrong. And sometimes instead of seeing y2 used consistently everywhere, you see x1, y2, y3, z2, etc that step on each other because they don't even follow a consistent plan. And I usually find a piece of code that matches nothing, and is utter nonsense, but a comment would have helped me see z3 was intended but was horribly botched. When people say code is self-documenting and needs no comments, I find this most bizarre, because it's always clearly full of bugs, but of unclear nature without the intended tactic noted.)

If enough low level details are exposed, you can see how they make high level features like first class continuations. In C you can simply have function pointers and lists of stack frames, and with auditing you can see where everything was allocated in a profile. If you leak memory, the remedy is just "stop doing that", which of course would be hard if a mix of scrambled tactics made a coherent plan hard to follow, but still.

The cross product of blurry abstractions is usually a mess that amplifies some or all of the fringe costs. Someone might write a utility function that could have run in 10 nanoseconds, but instead runs in 10 milliseconds because the writer never needed to use it more than once a second. But as soon as usage with something else needs ten thousand per second, it's dead in the water because of amplification. While some things are fixable, it's common for the problematic part to be pinned in place by a network of assumptions about that particular detail, making it necessary to rewrite lots of things insufficiently compartmentalized.

Hmmm, this probably belongs in a different thread.

I have a lot more to say about reified continuations (and about the possible forms of other constructions that could replace them). But it's fairly tangential to this topic, so I may start another one.

I should also go search through old topics before I do, because I'm fairly sure that particular football has been kicked around on LtU before.

looking back and breaking away

Google returns LtU reference Implementor's guide/tutorial to delimited continuations? on the first page when searching for "shift reset continuations", as well as links to other pages characterizing delimited continuations as compose-able, which fits your earlier remark a little.

I think a new discussion would be better though, to explore new ground instead of following the spoor of others. (That has a rude sound to it, doesn't it? My intention is humor rather than disrespect.) In particular, anything old with "done by magic" aspects to it will poison any attempt to vary from the indivisible platonic ideal of magic. I'm all for showing respect, but never at the cost of asking permission to think independently, which is a pretty feudal perspective for academics to favor.

I find it really tedious that one can't discuss continuations, in a general sense, without paying lip service to work done in the Scheme world, as if the idea is now owned by academic gods who got their mark on it first in that context. (Nit: reset and shift are bad names; would catch and throw have been so horrible?) Generalizing that complaint, I don't see any reason to pay special attention to anything ever said by anyone when starting from scratch, unless it moves things along faster to recap an old thing retained and endorsed exactly as before. But then you're hardly starting from scratch. Better to draw up a new plan for terrain, then show how it maps to another plan, keeping all the maps and territories distinct for discussion.

I think fibers are a better place to put bizarre control flow, so reasoning is process oriented, instead of the weird timeless, costless, high level abstraction of single-threaded exotic control flow mechanisms. It can be just as cheap with better organization, though perhaps slower on micro-benchmarks when magic versions can ride on special purpose compiler optimizations. My reasoning goes something like this:

With a lot of code, sometimes really complex things need to occur. As a divide-and-conquer strategy, chop things into fiber based processes from the get-go so they can be analyzed and reasoned about independently. Then each process can be made relatively simple. Really weird control and data flow can be expressed by messaging. The fact that fibers can be scheduled independently makes every weird thing you wanted possible, without making any single process's behavior too hard for undergraduates to grasp when learning basic programming.

Okay, can't resist one more...

In fairness, Scheme is approximately the only language in real use with reified continuations. So it does seem natural when discussing them to do so in the context of Scheme.

Although even if you think reified continuations are a good idea, call/cc is silly as a "primitive" that does too much. You just want a way to refer to the continuation of the current function, full stop. It's easier to understand, handier than 'values' to call directly to effect function return, and you can call some other function with it as argument directly if you want to.

Delimited or escaping continuations are much less problematic.

well I'm appreciating your continuation remarks

I don't mind you talking about them. :-) No one else seems to enjoy continuations much, so discussions tend to go flat. I first learned about them around 1993 when I read Appel's Compiling with Continuations, which I got at the bookstore of course, since this was pre-Amazon. I had been struggling with weird impedance matches caused by implementing Dylan in C++, and Appel's book set me to walking around blinking a lot as I mulled it over. I mentioned it to former coworkers at Taligent, asking the debugger guy (Kevin McEntee) questions about how they'd affect debugging runtimes, but I came off sounding like a major weirdo, because why would the world ever use anything besides C++? I was interested in the obvious uses related to getting flexibility.

It's not surprising to me few languages besides Scheme have reified continuations, because they're a little weird in the absence of other features that would leverage them effectively, like a standard green thread system. Shortly afterward, both Java and the web, then Javascript, sucked the life out of a lot of interesting options not as clearly related to getting rich right this damn minute. Non-contiguous stacks are religiously very unpopular with folks interested in getting merit badges for performance related work, and anyone who likes them is heavily pressured to give them up. Since Scheme is not especially popular anyway, a threat of "you won't be popular any more" has almost no effect, since it can be met with an ironic sneer from folks who have more important things to worry about.

Anyway, Appel's book gave all examples in ML and I have trouble thinking of Scheme as the center of the continuation universe, though I suppose that is where the continuation capital is now. I'm not really interested in Scheme's continuations, but I'm really interested in the general idea, especially since they are virtually the only way you'll get Erlang style huge populations of green processes running in C with just a little help from tools that don't require a big engineering budget (ie a single person can do it).

... call/cc is silly as a "primitive" that does too much. You just want a way to refer to the continuation of the current function, full stop.

I agree, I just want access to the current continuation, too, and call/cc always seemed oddly complex. First you introduce the idea of continuations to a developer by reasoning in analogy to return address passed as implicit arguments to functions, and how neat it is you can pass them along to do simple things like tail-call-optimization (TCO). And then you try to explain this primitive with really funky usage requirements, not obviously related to that idea of treating return addresses as first class. Shrug. I suspect the appeal in this was actually upsetting and repelling developers, as a badge of honor like wearing a hair shirt, or something.

Anyway, in what I had in mind doing, every function that can yield would expose a reified continuation. That is, if it's possible to use one, there is one, always. And you can do whatever you want with it. But since a lot of folks have trouble using pointers, it would make sense to provide some standard mechanisms to manipulate continuations, where the spec sounds less dangerous, for folks who don't get it. I'd expect it to be considered bad style to use continuations for a trick when not necessary, but nothing would stop you except the punishment of chaos and difficulty of maintaining the result. I suppose I'd have to add a compiler feature that warns, if you enable it, when you do anything interesting that isn't one of the usual mechanisms, just to save the sanity of innocent bystanders.

Because of type erasure in the runtime representation, it'd be hard to do reasonably good typing on continuations without leveraging reflection code that can see whether values you want to return actually match what the continuation's next usage expects. I wasn't thinking about it though. I assumed folks would usually just want to have fibers, but then it would also be easy to add exceptions and similar things. If you implemented a Scheme or Smalltalk on top, you could expose the C layer continuation as a reified continuation in the language, though it might be dangerous to call one that didn't expect a value from the language running now.

name conventions and intrinsic vs extrinsic

There's a tension between low and high level needs in library definition. A systems programmer wants as few features as possible, imposing no constraints while requiring more knob twiddling. And an app programmer wants to whip together prototypes with as little effort as possible. The former says, "I already allocated everything, so don't allocate anything for me." And the latter says, "Don't make me do memory management, just let me spawn collections in high level syntax." Intrinsic links are better for systems code: chain together existing things through a field, while extrinsic links are better for app code: allocate the linkage used to host copies of anything stuffed in the collection. The C++ standard template library favors an extrinsic link approach, with all members copied. (You can use pointers for members by reference, but the net effect is that pointers are boxed in space allocated by a collection.)
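The intrinsic/extrinsic split can be sketched in Python (Task and ExtrinsicList are illustrative names): the systems flavor threads a link field the caller already owns, while the app flavor allocates a box per stored value:

```python
class Task:
    """Systems style: the link lives inside the object (intrinsic),
    so chaining allocates nothing beyond the objects themselves."""
    def __init__(self, name):
        self.name = name
        self.next = None   # intrinsic link field, caller-owned

def chain(tasks):
    """Link pre-existing objects through their own fields; no allocation."""
    for a, b in zip(tasks, tasks[1:]):
        a.next = b
    return tasks[0] if tasks else None

class ExtrinsicList:
    """App style: the collection allocates a box per member (extrinsic),
    so any value can be stored without carrying a link field."""
    class _Node:
        def __init__(self, value, nxt):
            self.value, self.next = value, nxt
    def __init__(self):
        self.head = None
    def push(self, value):
        self.head = self._Node(value, self.head)  # callee allocates the box

a, b = Task("a"), Task("b")
head = chain([a, b])
assert head is a and head.next is b      # no boxes were allocated

xs = ExtrinsicList()
xs.push(1); xs.push(2)
assert (xs.head.value, xs.head.next.value) == (2, 1)  # boxes were allocated
```

In C the intrinsic flavor is the classic embedded list-head field; the extrinsic flavor is what STL-style containers do when they copy members into nodes they own.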

If I wanted to make this distinction clear, when both flavors are present, should I emphasize "by value" or "by boxing"? The problem with value is that someone can say "everything is a value" and therefore fail to understand, because it reveals no distinction.

I was going to put them in separate libraries, purportedly written by two characters: Wil Gowpen with a systems programming focus, and Ivy Trellis with an app prototyping focus. (This is a simulation of Conway's law with design focus personification. Instead of long explanations of focus and priority in abstract terms, we can just say it's what those characters like.) Wil likes intrinsic links, refusing to impose allocation assumptions. Ivy likes extrinsic links, assuming no one wants to manage memory. Ivy's library depends on Wil's, but Wil rudely refuses to make the dependency cyclic. Tests in Ivy's part of the code tree may run comparisons between C++ templates and wrappers Ivy wrote around Wil's code. (Wil's reaction: "That's nice, but I don't care.")

Code interfaces use prefixes and suffixes to indicate important distinctions. The default ns (namespace) is cy, but it's subject to change because complete renaming under forks is assumed the norm. A leftmost prefix is most important, meaning code may not appear in a library at all because that entire prefix was left out. Prefix t for test goes in a separate library linked into debug builds. So an object named cyFoo might be tested by tcyFoo, and you can see by its name you want nothing to do with it. Other prefixes include r for reflection (which Wil might live without) and f for floating point (which won't function in Wil's work environment). The format of cyFoo is described by rcyFoo, which Wil aims to generate during compile, but might already exist if bootstrapped.

In contrast, suffixes just mean something subtle but important about code organization, like whether it's an abstract interface, or asynchronous. The name cyFoo implies this is concrete, but cyiFoo implies this is an abstract interface you can't use without manually subclassing with C-based vtables full of function pointers (probably using suffix c for class in cycFoo to hold the function pointers). Suffix i implies virtual function dispatching. (Duck typing can work, so inheritance is not mandatory, but multiple interface conformance might require a fancy mechanism for inter-interface type conversion.)

Suffix a means the quill transpiler rewrote cyFoo to a new interface cyaFoo with one or more async methods, so using cyaFoo will often mean voluntary yields are reachable under cooperative multi-tasking when using it. The original cyFoo synchronous version can still be used, though it won't be able to call async api, but will be usable in leaf methods that must not yield because (for example) they are in critical sections.

Wil can define a cyList interface (using macro-based templates) using intrinsic links. Then if Ivy wraps that in a high level collection version, suffix v for value (or Ivy) can imply value-based collection with automatic memory management. Here the extrinsic links Ivy added to the values would be passed as intrinsic links in Wil's api. (Intrinsic basically means "caller defined" so links are your problem, while extrinsic basically means "callee defined" so Ivy adds them for you; but they dovetail so the interfaces compose.) The behavior of a cyList interface would be very different from cyvList because the former copies nothing while the latter copies everything. But would suffix b for box be better? Yes, I'm actually asking for feedback.

I guess that would make an organic test case for explaining how renaming works in quill, when all suffixes can be swapped, back and forth between b and v as the end developer likes. Ivy might use v and respond amusingly when someone changes them all to b for box.
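A toy version of that suffix swap (swap_suffix and the cy identifiers here are illustrative; a real quill rename would work on parsed names, not regexes):

```python
import re

def swap_suffix(source, old, new, ns="cy"):
    """Rewrite every ns+old+CamelCase identifier to use the new suffix,
    e.g. cyvList -> cybList, leaving unsuffixed names like cyList alone."""
    pat = re.compile(rf"\b{ns}{old}([A-Z]\w*)")
    return pat.sub(rf"{ns}{new}\1", source)

code = "cyvList xs; cyvListInit(&xs); cyList raw;"
out = swap_suffix(code, "v", "b")   # v-for-value becomes b-for-box
assert out == "cybList xs; cybListInit(&xs); cyList raw;"
```

Swapping back with `swap_suffix(out, "b", "v")` round-trips, which is the property an end developer flipping between b and v would rely on.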

[I was going to make up a character named Humphry Dumphry, nicknamed Hud, who makes up terms to suit current whims. The idea was to say, "No that's not hungarian notation, it's Humphry Dumphry notation." But to my surprise, the name is already in use: he has a Twitter account among other things. Bald as an egg, with a surprised look. Guess I need to skip that joke.]

vfs unique IDs and scope erasure

In another discussion I linked a paper by Jason Hickey about multiplexing file i/o for Plan9, because I plan a protocol-oriented tactic for building a virtual fs (file system) used by portable code to represent its environment (with distributed parts) in a unified namespace atlas (noted earlier above) for consistency despite heterogeneous contexts and async i/o. This post aims mainly to touch base with anyone following a few ideas above, but without going into detail. Instead let's look briefly at language runtime angles implicit in the problems.

How do we handle self-interfering design goals? In pursuit of simplicity, we want to both centralize and decentralize. Doesn't that sound contradictory? Unifying things makes a simpler model. Separating things enables simpler divide-and-conquer tactics. In programming languages this comes up in module handling and namespace importing (or not). We want our modules separate, except when mixing them together. (Okay, I'll try to be less funny.) Everything uses ID systems, and whether IDs are unique or not is a big deal. Publicly visible symbols in language names are just part of the overall ID system. And to the extent apps can see what a host operating system environment knows, interfaces in local code impact global OS names and vice versa, to an app anyway.

So we want to union IDs generated by diverse naming authorities. Some kind of disambiguation scheme is needed. (Unresolved collisions cause chaos if names don't identify as intended.) The problem can go fractal when you add persistence, if at first you disambiguate using ephemeral runtime namespaces coined dynamically in a current session. Oh, you want that name to be valid in a later session too? Future proofing is hard. Entropy is a grind. What if one of your remote nodes disappears? Or changes all its names, because that was allowed in the divide-and-conquer plan? If you were trying to hide the difference between local and remote, this won't hold up over time -- remote doesn't behave the same as local. I'm being provocative, to loosen up your thinking. (At this point if I was a huckster, I'd pitch the solution, but I doubt there is one better than awareness and care.)

In a design I want to preserve and enable ongoing disambiguation. When a boundary exists, it should be preserved enough to keep seeing it, as needed. If you union two namespaces, where they touch should be visible to code that cares; both caring and not caring should be an option. A body composed of lots of cells from different remote partitions should make each cell's host perceivable, to support practical concerns like routing dataflow and switching handlers to suit content domain. Preserving hierarchy seems enough; path erasure is the problem, so don't erase. Every unique thing should have an ID, and if hosted in another space, a unique ID of each host on the path should be present. For example, if we union file systems, a file system ID (fsid) should be known for each ID inside. Thus if a qid is the unique id of a file or directory, then fsid+qid is the proper way to identify an fs entity. A dynamically assigned fsid might need remapping when persisted to secondary storage. We might reserve zero for none, and one for self (my local fs).

Part of the unix way leads to seeing that "sharing is coupling" and "separation is de-coupling". So whenever you share to optimize, you want a nothing-shared version for comparison; otherwise you don't have much evidence that a shared-but-immutable version safely imitates an unshared version.

That leads to an irritating conclusion when writing a native kernel-threaded prototype of green processes: each green process should be a thread, and they should share nothing but address space, with state copied between threads instead of relying on immutable data. (Exceptions: state in mutexed critical sections during copy via inbound queues. Inside one thread, outside critical sections, use copy-on-write as much as you want, with non-atomic refcounts that are safe anyway, enabling a ref-is-one-means-unshared rule so CoW is not mandatory.) Then you can compare an optimization using immutable shared state, which had better behave the same to be considered correct. But I'd be inclined to only use an immutable-shared-state optimization for fibers living inside one thread, while keeping the copy-only version for between-thread dataflow. Among other things, when you actually move one thread into a separate process and pipe-or-socket the dataflow, the marshalling and unmarshalling will be very nearly the same. Flow between threads can still be efficient, despite being copied, when done in one address space with the source pinned by convention during the copy; so one would expect this to perform faster than lots of separate processes. Some threads can function as dedicated servers in the same address space; religiously copying means corruption requires wild-pointer stomping and not just sloppy locking.

Does any of this imply something about programming languages? Well, a PL should assume divide-and-conquer is likely a needed strategy, and that some executing code operates independently, via concurrency, from other parts; but you should be able to see those parts anyway in some sense, or else cooperation is hard. Any boundaries should be visible, if desired, instead of hiding location in the name of transparency. Hiding is okay, but making a boundary impossible to see is bad, because it prevents intelligent action in code a dev sees and understands, and that's what burns you. Some kind of customs-and-passport metaphor might be a good idea, to make distance and boundary crossing visible and subject to resource management (and maybe access control). Scope erasure induces dumb code that scales poorly when boundary costs both fragment and amplify.

threads as programs, i/o pipelines, interpreter trampolines

I had a couple entertaining ideas yesterday, about throw-away loosely coupled tools in a threaded host environment. The obligatory PL part of this post is about a Lisp interpreter in that context, which I'm used to thinking of as annoyingly fragile and limited, so getting more than toy usage out of them is hard and one doesn't bother. I thought of a toy usage that doesn't suck, and also provides a baseline for testing any later fiber-oriented replacements.

(After you know a lot, deciding what to think about is hard, because the set of options is enormous taken in combinations. Finding what would be fun to consider, if it only occurred to you, becomes more difficult when novel ore veins get sparse as your programming mine gets played out. Just making a suitable memory palace to explore what you haven't looked at yet gets complex. It's like trying to turn a cave into a mine when the cave doesn't exist yet, until you think of it, while distracted by old stuff.)

All the following assumes an OS process that runs programs as (native OS kernel) threads, one thread per program, until such time that (one day) a thread hosts a fiber engine that runs N programs multiplexed, sending blocking calls out to be serviced in a thread pool. Although it can offer an ongoing REPL, a normal configuration would be a background daemon that keeps running while servicing requests to run new programs, sent to it piecemeal as command lines. Perhaps a specific port number corresponds to a session associated with one thread acting as shell in the daemon. Until that shell stops listening, you can send it "execute this command line" requests from the command line you're using. Each program, including a shell thread, has its own namespace in the form of an atlas vfs (virtual file system), probably cloned by copy-on-write from a prototype set up when the daemon started, using a config file that mounts things in the file system this way or that: internal trees, OS level file paths, and tarballs mounted as logical file systems. The shell starts programs by spawning threads, after setting up stdin and stdout streams as needed for pipelines. With an abstract api, the standard streams might be bounded buffer queues or else OS pipes, or both when comparing them; maybe which is used is modal, depending on an environment variable in that shell thread. When using pipes, obviously we turn off signals and check for EPIPE ourselves explicitly after writing, because we want program threads to exit gracefully without killing the whole daemon.

Each C level "program" is a shell builtin, with the moral equivalent of a main(), except that globals are not normal, so standard i/o is passed in a suitable way, perhaps via an abstract api struct cyi_main, along with the env abstracting OS interfaces. As structured, each builtin program can also be run as a standalone app outside the daemon, so it's possible to execute the "same" program in a pipeline either as a thread in the daemon or as a separate OS process, after the shell thread jumps through matching hoops to hook up the standard i/o as needed. Each node in the vfs tree knows what it represents in the daemon's runtime data structures. If mounted in another daemon's atlas namespace, and accessed by protocol, each node can know how to act like "a file". (Reading an object might pretty-print its state, for example.) But one daemon cannot execute another daemon's builtins, since they only work in the original host process, even though visible at a path in the atlas. Shell scripts, however, can be run anywhere, though what happens might depend on what is mounted where. Obviously anything text-based can be run as a script, if an interpreter program exists as a daemon builtin. If you port a simple version of netcat to a daemon builtin, you can create spectacularly circuitous i/o pipelines, if that's your thing.

So, one of the entertaining ideas was how easy it would be, as above, to wire up i/o any way you want in a test, so you can compare local and remote versions of the same code. The next idea came up while thinking about several related things: serving synthesized proc file system pages from a thread program acting as http server, and lamenting inability to multiplex programs inside one thread when blocking api usage would be the norm in a basic thread program. Then I remembered trampolines are easier to do in an interpreter than in C, so multiplexing Lisp program fibers in one thread would be relatively easy, and didn't require writing a production quality fiber engine. You would need to farm out blocking system calls to a thread pool so other Lisp fibers keep running, but that's why you have a trampoline, to make yielding trivial without stack unwinding.

The amusing part was that I'd be able to prototype throw-away experimental programs in Lisp, that run in threads, participating in pipelines interchangeably with builtins written in C. If I reified vfs (virtual file system) interfaces in Lisp primitives, I'd be able to inject scripts interactively that serve http requests, for browser-based inspection. So I could make up what I wanted to see on the fly, and I could draw upon any C builtin in the daemon, or upon any OS level command line tool, since I could make pipelines using them as one leg in an i/o chain. All this is just screwing-around scaffolding, without having to break a sweat in careful fiber engineering to get something to see. Presumably fibers written in C would just be more efficient.

Spawning new Lisp threads in this model could be made quite snappy, if a prototype Lisp thread booted a runtime and then made all the memory chunks (interned symbol tables, for example) immutable by convention, so all other Lisp threads can share them without copying. A new Lisp thread would only need to allocate a new mutable main entry point sharing the immutable state of the prototype. Then copy-on-write would keep threads from interfering (and no thread would need to know whether others exist). I thought for a toy that was not too shabby, and I could get some actual work done with it, without compromising everything else in architecture because process isolation made it a black box effect. I share this only because I expect someone will think it fun to consider further. Good veins are hard to find.

During the holidays I also spent a lot of time looking at UI widget libraries and sighing. I don't have a PL angle to relate there yet. Still, it seems like it ought to be possible to describe constraints on UX (user experience) in an abstract way, using something like style sheets in a fairly abstract space, that can be compiled to the display system that will actually do it, throwing away anything that doesn't translate. In the awful case where the end display system is a statically compiled system, you could generate code using the other model's scheme of definition.

JavaScript and Nodejs

Why not use JavaScript, as it already uses a single threaded, event based model, which is identical to the fibre/green thread model you describe? Nodejs would function as your daemon, and as you can reflect on the lexical context in JavaScript listing the properties of the 'this' object would function as your VFS. The 'ls' command would simply be a function that returns its result over HTTP. Use a web-browser as your UI and use CSS for styling. Nodejs also includes full Unix socket functionality for writing pipeline components. You could write the command-line in client side javascript, so that programs could return JSON or HTTP directly, so the results could be formatted tables, images, SVG graphs or MathML formulas.

we could rewrite all systems software in javascript I guess

Your suggestion is not terrible, especially if my last post is the only one a person reads. Not all green threads are the same. A lot of ground is covered by "thread-like with code in normal direct form, and yet not using pre-emptive native threads." Everything above describing threads and a daemon is about a specific environment to host a fiber engine, so there's something that can function as host when fibers are not embedded in something else. Suppose we use the term jyp (pronounced gyp) to mean job thorn processor, where a job is a green thread, aka fiber. It's like a VM, except there's no virtual machine in the sense of having an instruction set. It's just C, with a trampoline and a funky continuation passing architecture, so green threads run in spaghetti stacks with cooperative async multitasking. I need to be able to embed jyp in other C code, where only C will be permitted ... at least, if I ever expect to use it where I work now, someday, or at the next place I work if only C can be used. To make C run on a jyp engine would require either painful manual CPS or else automated rewrite; the latter makes more sense of course. But such a mechanism wouldn't do anything at all, unless driven from the outside.

My entire model revolves around the idea of transpiling C from what we normally write to something that runs as fibers under jyp. However, as an abstract idea, if anyone were to follow along at home and do the same thing for their own favorite language, it turns out the same plan has meaning for other languages too. You can CPS transform anything to get another sort of runtime with async green threads that scale because space footprint per fiber is low and context switch is cheap (especially if you try to keep scheduling related interacting green threads for a whole time slice to get good cache behavior). For me personally your suggestion is a miss since I need to use C, and my whole project is about transpiling C, and being able to compare behaviors before and after rewriting. But maybe it will help other folks see they should under no circumstances imitate my plan, and use JavaScript instead. I actually don't care whether what I'm doing is what other folks will also want to use too, though it's possible. I talk about it here because I'm interested in the thinking involved, and having conversation about the general idea. Your reaction to the general idea seems to be: it reminds you of what other folks do that sounds conceptually similar. Are you suggesting a browser only model? That would be exceedingly odd.

You might as well make the same suggestion to everyone who discusses every other language: tell them to use JavaScript instead. I'm not sure why you thought my remarks were a closer match to JS than anything else. For example, maybe folks who write databases, network stacks, memory managers, compilers, and operating system modules should use JavaScript, because it's probably got what is needed for those too. I'm inclined to suppose you meant to casually troll, to see if I would go berserk or something. I have trouble believing you mean your remarks, but it doesn't bother me much. I actually like getting a little feedback, even if it wasn't what I had in mind. Why is talking about JavaScript interesting? And why would you suppose someone writing high performance embedded systems software would consider using JavaScript as primary infrastructure? Isn't that a little weird?

It would indeed make sense to use JavaScript as one of the front ends emitted for UI run in the browser. But I have no interest at all in fibers running in the browser. I only care about browsers as thin client viewers, not as backend model execution devices. I don't work on user interfaces at all, and don't care about them much, beyond a view into what happens elsewhere. To the extent I am, I'd rather play around with new UIs than bother with browsers more than a minimum. Among other things, I'm interested in stripping away as many dependencies as possible, so extraneous factors are removed from consideration when debugging.

The jyp engine I have in mind won't have any dependencies at all beyond ones injected by its host environment. It's a main feature. Depending on everything in JavaScript is a big fail in terms of having no dependencies.

JavaScript has no blocking IO

I never 'troll', and I do not understand what you gain by supposing that I was.

I was suggesting JS as a replacement for Lisp (both being impure functional languages). You do not currently use Lisp as far as I can tell, so your own musings about Lisp are equally relevant to JS, as you would need to write the new Lisp bits in any case, so I do not see the relevance of rewriting everything. I could guess that you were intending to directly call C fibres from inside your own Lisp interpreter, but it's not clear from the above.

The reason for suggesting JavaScript is its use of a green thread model. It is surprisingly hard to enforce this on a language that does not require it. For example, consider event based IO in Python (Twisted): this is difficult to use because if you include the wrong library it can block. The key thing with JavaScript is there is no blocking IO at all; everything is event based. Therefore all the existing JavaScript libraries and applications (like nodejs) would fit your green thread model. So you can use nodejs and all its modules and any JavaScript library safely within your green-threaded environment.

You can write pipe components hosted in Node that use the unix socket API to connect to pipes from components written in 'C' and know that all the JavaScript components run in a single thread.

If nothing else it models your system, so that you could experiment with what it feels like to program, without having to do all the implementation work.

As for fibres running in the browser, it is already an event based single-threaded environment, but I was referring to writing a simple 'shell' in the browser that sends the commands to node on the backend to run, and then shows the results before displaying the next command prompt. Kind of like Matlab, where shell commands can have graphical output if you want them to. Every PC has a web-browser these days so it's not really a difficult dependency, and it runs on the client machine, so it's realistic even if the backend is some embedded network device.

If you really want to embed in existing C code, and want to be able to call C code from JavaScript then duktape looks nice:

It's designed for embedding, and shows how to bind C functions in on the front page. It appears you can use a JS based or C based event loop, so it could even integrate with your existing fibre engine. It also does not appear to have any dependencies beyond some system libraries, and tries to minimise even this for portability.

yes non-blocking i/o is one of many requirements

Okay, then without rolling my eyes I suspend disbelief and apologize for implying anything less than good faith on your part. Please forgive me.

You do not currently use Lisp as far as I can tell ...

Long ago I used it a lot, but my usage of Lisp isn't very interesting, so there's not much for me to say about it. I wrote my first toy version in the 80's. During the 90's I interactively drove a lot of C++ with it, a lot like the way folks would use swig with Python, only using my tools of course. As this is something I'm highly familiar with, and enjoy immensely, it would be hard to replace with an alternative. (I'll refrain from invoking jargon from economics, where they talk about this a lot.) Suffice to say I'm not going to write a JavaScript.

I could guess that you were intending to directly call C fibres from inside your own Lisp interpreter, but its not clear from the above.

Suppose we use prefix a to mean async via CPS transformed green thread fibers, and use prefix t to mean thread or test for code not rewritten this way, but tuned to run in that sort of daemon. Normal C code found in the wild can be denoted with prefix n for normal, typically suited for neither.

Code in C then comes in three flavors: nC, tC, and aC, and we have transforms nC->tC, tC->aC, and nC->aC. Since nC and tC might be really close, the transformation is done by hand (stop breaking tC rules, like using non-mutexed mutable globals). Transforms to aC don't exist until a jyp engine exists, and you either do it by hand or by rewrite (where quill does tC->aC as well as nC->nC with symbol renaming).

Given some flavor of Lisp nL implemented in nC, you can run it in the daemon using either transform nC->tC or nC->aC to get new flavors of implementation tL on tC and aL on aC. One might also compile nL to nC, then convert that to aC.

In the long run I'm interested in async Lisp and Smalltalk, aL and aS, running on fibers in aC. But I don't see why anyone else would care. It's just itch scratching. In the short run, a scripting approach described earlier using a toy interpreter is tL on tC. But instead of using aC fibers, we can roll our own trampoline for an interpreted version of tL green threads, with no relation whatsoever to aC jobs so there's no attempt to think about modeling them. The point of doing that would be cooperative multitasking in one tL thread, so throw-away scripted servers can be written that do things concurrently when making system calls that block tL fibers.

See, this is confusing because there's a lot of names, and they are only slightly different from each other so it's easy to confuse them. I don't plan to give them separate code names. Instead I'm going to assign a different virtual persona (fictional character) to each domain: Wil does async stuff like jyp while Tyler does thread and daemon stuff like the post above. In a work environment, we don't give codenames to every single thing an engineer writes. It's just "Wil's stuff" and "Tyler's stuff". Wil is kinda straitlaced compared to Tyler's wild man (because his last name is Durden).

So you can use nodejs and all its modules and any JavaScript library safely within your green-threaded environment.

This is untrue for several reasons. They would all need to compile to C in little enough code that other engineers can completely review all the code while understanding it. No system calls could be made without approval of the embedding environment. And no globals are allowed, or use of any means of communication other than those provided. In most cases those JS tools would fight virulently with other mechanisms trying to do the same things. Even the presence of garbage collection without a soft real-time guarantee of yielding to a single-threaded host environment would kill the value. I don't want to say you are confused, or don't understand. The most charitable interpretation I had was that you were being difficult on purpose, but you said you're not.

Many Requirements

I'm fine with being confused or not understanding, there are many things I don't understand, and learning more about them is interesting. The biggest problem I have is allocating my limited time to all the interesting things.

duktape can disable mark-sweep garbage collection, and you can have a C-based event loop, so it should be possible to integrate with your fibre runtime. There would only be two issues based on the requirements you have made explicit so far: I don't see a way of restricting syscalls, and it's 42k lines of 'C' code for the interpreter, so the 'C' programmers are going to have a hard time understanding it all if they cannot treat it like a black-box.

For me rather than paragraphs of prose, a bullet point list of key requirements would help me keep them in mind.

could you clarify something here?

You use the words "thread" and "fiber" differently in your posts, and I just wanted to clarify exactly what you mean by these words.

By "threads" you mean, essentially, OS-level multithreading, the sort of thing that usually results in non-synchronized OS-level processes sharing a single memory arena?

And am I right in that when you use the word "fiber" you're talking about a system that manages multiple flows of control within a single thread, the sort of thing that is otherwise called "green threading?"

Fibers in that case absolutely require nonblocking I/O because otherwise a single blocking fiber would bring many fibers in addition to the one attempting I/O to a halt. But in the model where threads are used to manage a bunch of fibers, threads presumably do not use I/O at all. OTOH, it's entirely reasonable to interleave threads that manage a bunch of "fibers" with OS-level processes that those threads (fibers) start, and these all have their own conventions about I/O and when/whether it should suspend.

I personally think that blocking I/O (and modal dialogs, and threads sharing memory, etc) has always been a mistake, and that multiprocessing is only starting to force us to face a fact that has been true from the beginning.

thread=native/kernel vs fiber=green/user

Unfortunately I'm using thread to mean both an abstract and concrete meaning, and this conflation definitely causes a problem when you want to be precise. (It's like giving an abstract superclass and a concrete subclass the same name: terrible idea.) Temporarily, for this thread, we might pretend another word means thread in the vague general sense. When we mean "like a thread", we might use synonym hair to mean this abstract sense. (Unlike cord, no one wants to use hair to mean anything, so it's safe for temporary use. :-) We could also use cf for control flow to mean the same thing as hair, a thread-like thing; that sounds better to me.

Concretely, I always use thread to mean "pre-emptive OS-level native hair", but "green thread" means fiber or "cooperative user-space app hair". And fiber is abstract until satisfied by a specific kind of user-space mechanism. I use job to mean fiber in thorn designs.

When I seem to discuss thread-like mechanisms I mean hair, the abstract idea. In a concrete scenario involving a code plan with details, thread means native OS thread alone and is pre-emptive. Fiber is always a user-space mechanism that multiplexes a native OS thread cooperatively (and therefore matches "green thread", because the adjective green means user-space as opposed to OS-kernel). Using a typology with uppercase nodes and edges inside parens:

          HAIR
         /    \
      (os)  (green)
       /        \
  (posix)    (thorn)
     /           \
PTHREAD          JOB
By "threads" you mean, essentially, OS-level multithreading, the sort of thing that usually results in non-synchronized OS-level processes sharing a single memory arena?

Yes. If a process is a single address space, a thread is whatever the operating system provides for multithreading in that address space. Usually they are pre-emptive. (Only old or odd systems have non-preemptive threads.)

And am I right in that when you use the word "fiber" you're talking about a system that manages multiple flows of control within a single thread, the sort of thing that is otherwise called "green threading?"

Yes. A fiber is a non-OS-kernel mechanism in user space to multiplex N fibers inside one thread, cooperatively, so they all block if anything in the host thread blocks. Only one fiber runs in a thread at a time, and this can happen only when the thread is scheduled.

A fiber is parked when unrunnable even when a thread is scheduled, where parked basically means green-blocked. Usually only threads block, unless we speak sloppily and generically. For example, if a fiber makes an async call, it parks until async reply, after which it can run again after unparking (when the host thread runs again, which can only happen if the thread is not blocked).

Fibers in that case absolutely require nonblocking I/O because otherwise a single blocking fiber would bring many fibers in addition to the one attempting I/O to a halt.

Here we have a long drawn out "yes" (the way Morpheus says it to Neo on the phone, as agents approach in Neo's office). A fiber can park, but must not block. A call that would block (eg a system call that does) should a) park the fiber, b) queue a request to do the call, c) get handled by a thread pool that blocks for the call, d) reply via queue to the original thread where the parked fiber lives, e) unpark the fiber with this reply. Step (c) can be replaced with whatever the host environment does to delay until a blocking effect is done. A jyp api queues the request for the host env to handle once the fiber scheduler yields. (It's not okay to put it into a thread-safe queue instantly, as some thread implementations context switch eagerly.)

But in the model where threads are used to manage a bunch of fibers, threads presumably do not use I/O at all.

Now I'm having trouble parsing. A bunch of tC code representing one cf (control-flow, like-a-thread) can hog a whole OS thread to itself, and blocking is okay, because it's just one cf and we want it to block when making calls.

If you instead pack N fibers into one OS thread running tC code, then that thread should never block, unless it's on a condition that means "waiting for one of the replies to my child fibers, because every one of them is parked on an async request the host env has not replied to yet". That thread must queue blocking operations to another thread (and a pool of them is normal, while just one helper thread is weird and undesirable).

OTOH, it's entirely reasonable to interleave threads that manage a bunch of "fibers" with OS-level processes that those threads (fibers) start, and these all have their own conventions about I/O and when/whether it should suspend.

I can't untangle this, but I bet it's what I just said.

I personally think that blocking I/O (and modal dialogs, and threads sharing memory, etc) has always been a mistake, and that multiprocessing is only starting to force us to face a fact that has been true from the beginning.

Well, yeah. I keep expecting people to recognize the fiber/green-thread approach as the sane way to live in a legacy pre-emptive blocking OS thread world. But I act (disingenuously) as if it's a tentative experiment when it's not, because the proprietary track record looks good so far.

Event Model

I tend to agree that async IO, using events, is nicer. A couple of thoughts regarding the above: parking a fiber is storing a continuation and jumping back to the main event loop. The main event loop takes the next incoming event from the queue, and jumps to its associated continuation. The thread naturally blocks when there are no events in the queue, until something is put in the queue. So you still need blocking when reading from the event queue. I don't think there is a way of getting rid of blocking entirely unless you get rid of threads.

I could imagine a model where this is managed in the OS kernel, so that the kernel 'lends' a thread to a user space program, and any blocking or waiting returns the thread to the kernel with a continuation. Inside the kernel CPU sleep states and interrupts can be used. I don't know of any system implemented like this.

unless all the fibers are parked

It looks like you missed my paragraph with words "never block, unless" followed by a description of a condition. Here's a rephrasing: if all fibers in a thread are parked, that thread should block until a fiber awakens. This is equivalent to what I said, and also to your version. My version only emphasizes waiting specifically on a condition variable that amounts to "nonzero unparked fibers" (or nonzero queued async replies that are expected to awaken at least one fiber).

My model doesn't have a main event loop per se, because it doesn't start from a story about dispatching events. (Event loops remind me of 80s and 90s era GUI architectures. While it's not bad, the focus is slightly off for some things.) However, you'll be able to squint and say it's all equivalent. But in my model, the UI is in another process, because anything with a UI presents risk of memory corruption, specifically from shoddy engineering, yes. So UI events don't awaken fibers directly from a main event loop. However, you can say dataflow reaching the host env process for fibers is "sending events" when messages arrive and sockets have nonzero readable bytes available. But to me it's messages and streams, and saying event is misdirection.

If plugged into the network stack where I work, a jyp engine would not receive direct communication at all while acting as an embedded server, except when processing control tunnel messages from load balancing peers. (It currently uses a proto green thread engine I wrote a few years ago, which is very ad hoc, whose behavior I could sometimes only analyze by thinking of how problems get solved in the more ideal thorn model I characterize a little here. A description would not sound recognizably similar though.)

If plugged into something like the daemon model noted above with several flavors of threads, fibers in a jyp engine would awaken as a side effect of available i/o, and from host env replies to explicit requests (typically expected to be blocking system calls). Then they would run when host threads get scheduled.

So you still need blocking when reading from the event queue.

Your perspective is exceedingly weird to me. Using libevent or perhaps epoll directly on Linux, a lot of non-blocking i/o will awaken fibers when polling says a fiber can do i/o without blocking. File system i/o would generally entail queued messages to a host env thread pool for service. When accessing part of a namespace atlas that is an FS mounted from elsewhere, the thread pool call will end up waiting on as many protocol exchanges as needed, which in turn will get serviced by a different set of fibers handling i/o over an atlas protocol connection. I have trouble viewing this as an event queue, but the concept is similar.

I could imagine a model where this is managed in the OS kernel, ...

There's probably a lot of ways to implement the env hosting single-threaded jyp fiber engines. I would do just one. Since I'll release under a BSD license, perhaps someone will try a few different things, and one might resemble your sketch. I very much approve of speculative thinking, about different ways one might do things. It's a good way to pick a better sounding prospect from among several options, instead of focusing on the first one coming to mind.

I started out just connecting dots, designing something simple that could host jyp fibers in a simple way. But the framework of that host environment got more interesting over time, until I thought it worthwhile in itself, even if there were no fiber engine. That's just a pleasing bonus. Wherever a jyp fiber engine is stuck into a host process, if those fibers service i/o on sockets, they would be testable via loads generated by the daemon described. And it could be used for remote debugging, if at least one fiber embedded in another runtime was willing to serve a protocol describing what happens in the fiber engine when asked.

I should actually work on coding some basic library stuff, instead of writing about this too much. So I'm not expecting to talk about this much unless new ideas pop up. First I'll write some lists, hashmaps, and rings (circular queues), to go along with trees, which will be btrees related in some ways to those I wrote in libraries in the early 90's, at least to the extent of copy-on-write support so atlas namespaces can yield per-process namespace views in different threads. Losing the big picture is easy though, when you write code in head-down mode for weeks or months, so thinking broadly now and then helps keep global perspective.

Edit: I forgot to reply to this part, which is slightly wrong:

parking a thread is storing a continuation and jumping back to the main event loop.

No, parking a fiber whose stack is a linked list of frames will store nothing but the sequel continuation to run when a reply arrives, or i/o is available, etc. The thread will not jump back anywhere, because the thread's main loop alternates between giving the fiber scheduler time, servicing requests and replies, and handling control messages like "shut it down and cleanup" when someone decides the thread can stop. The message inbox queue is a bit like an event system, yes. But the thread never leaves its main loop, and a fiber's continuation is already stored. When all fibers are parked, a thread can block on the message queue. A message should arrive after one or more fibers awaken, or a message might contain the set of fibers to awaken for socket descriptors, or anything similar.

Events and Continuations

My comment regarding still needing some blocking was directed at Ray's notion that blocking IO was a mistake. Somewhere we need to wait when there are no events. That's why I went on to suggest a specialised kernel, so that there could be no blocking IO available to user processes.

I think its fine to use the term 'event' generically, and I have never seen it as just a UI specific term. It seems appropriate to use it with network IO, file IO and interprocess communication etc.

Referring to handling events with jumps: Let's say I want to read a block from a stream, I call 'read', but I don't want read to return until it has completed, so that I can write readable linear code. I can use setjmp to create a continuation inside the read function, and then longjmp to jump to the event loop where I store the continuation in a hash. When I receive the correct message, I retrieve the continuation from the hash, and jump to it, so control flow returns to the read function, complete with its stack context.

The alternative, where you can queue multiple events and then manually return to the event loop makes coding difficult as you can't easily have local variables with lifetimes over multiple async calls. I guess you could pass some kind of dictionary to every event handler, but I can see why a source to source transform would be appealing if this is what you have.

I prefer the setjmp/longjmp method of doing this: it seems a simpler solution, with no drawbacks I can think of, and it reduces the chance of programmer error because the whole mechanism can be encapsulated in an apparently blocking function call.

disagree strongly with your way of doing fibers but won't argue

Let's say I want to read a block from a stream, I call 'read', but I don't want read to return until it has completed, so that I can write readable linear code.

Yes, under fibers that parks a fiber, but doesn't block a thread, so it doesn't involve blocking on events ... unless you meant park instead of block. The difference is especially important when the topic of discussion is exactly when threads block. Using setjmp and longjmp is a terrible way to implement fibers efficiently when it involves copying stacks, at least if the metric is how many cache lines get touched at context switch time; cache lines touched are the problem with memory bandwidth limits. You're welcome to do things your way, which I'm not obligated to analyze.

(By "won't argue" I mean no amount of additional detail you add warrants discussing it further, on my part, as long as it consists of you presenting your view as if I'm going to change my plan. I'll start a new discussion tomorrow to discuss activation frame stack architecture, if you want a spec with enough detail to construct a benchmark that's meaningful for stacks reaching several-to-many kilobytes in size.)


I meant a block of data from the stream not blocking or parking. A block as in one disk block (often 512 bytes). The read jumps back to the event loop, so it will continue immediately with the next event in the queue, and only blocks if there are no events in the queue. I thought the behaviour of an event queue was obvious and didn't need explaining.

If you want to claim something is an awful way to do something, you should at least justify it. Have you done any comparative benchmarks? I don't really pay much attention to speculation when it comes to performance, people are almost always wrong.

How do you handle variables in your system? For example say I wanted to do:

void test(interface* i, interface* j) {
  packet* x = read_packet(i);
  packet* y = read_packet(i);
  packet* z = merge_packets(x, y);
  write_packet(j, z);
  /* it's a pity this is not C++ where we could use RAII to free
   * the memory for the packets when they go out of scope */
}
For the jumps, the stack would be a few words. As we do not access the memory immediately afterwards, the copy can be deferred by the cache hardware. We can reasonably assume the IO syscall will take longer than the deferred copy to complete, so it may not have any measurable performance impact.

As you know, keeping data contiguous in memory is very good for caches. To the extent that dereferencing the three variables from a hash is likely to cause more work for the cache than keeping them in a contiguous block and copying.

The other point is how much performance are you willing to sacrifice for obvious correctness through simplicity? Without benchmarks it's hard to make this call.

Edit: If you are using Linux, or a unix which has setcontext/getcontext, it's probably much better to use these as you can set the stack for each context directly, avoiding any stack copying at all.


Could you explain how you handle variables across async calls that return to the event loop, so that I can benchmark this against the setjmp method?

Edit: discussing your use case might make an interesting example, but I am more interested in how you store state across event handlers. You obviously can't use local variables because the function returns to the event-loop. You must have some kind of environment collection which is indexed by fibre-id? This means that every store or read from this environment is much more costly than accessing a local variable. You would end up with something like:

typedef struct environment {
   interface* read_interface;
   interface* write_interface;
   packet* x;
   packet* y;
   packet* z;
} environment;

void test1(environment* env) {
   interface* i = env->read_interface;
   read_packet(i, test2);
}

void test2(environment* env, void* res) {
   env->x = (packet*)res;
   interface* i = env->read_interface;
   read_packet(i, test3);
}

void test3(environment* env, void *res) {
   env->y = (packet*)res;
   merge_packets(env->x, env->y, test4);
}

void test4(environment *env, void *res) {
   env->z = (packet*)res;
   interface* j = env->write_interface;
   write_packet(j, env->z, test5);
}

void test5(environment *env, void *res) {}

Edit: This is not really as bad as I thought it would be, especially if it's generated from a source-to-source transform.

optimization: fibers reduce resource usage

There's an implicit premise in my pursuit of multitasking with fibers: it's faster and uses less memory. But I don't think it's a good idea to make claims like that without showing it empirically with code others can run themselves. However, that's where I'm coming from. Usually my role is to make things go faster and use fewer resources, while doing it correctly (with more stability than before). Here's what I said above:

I keep expecting people to recognize the fiber/green-thread approach as the sane way to live in a legacy pre-emptive blocking OS thread world.

By sane I really meant significantly more efficient because I'm an optimization wonk. That's why I think it's okay to make async dispatch slower, because it's paid for already, from avoiding nonsense and limits that would have obtained without using fibers. However, merely saying that is unsubstantiated, so you can only treat it as a premise, though one I think is true.

Because tradeoffs are always necessary, getting smaller and faster means something else gets worse. The easiest thing to worsen is complexity (which also means risk of bugs is higher too). If you hand-write CPS transformed code, you take the punishment up front, and maybe end up with code an average dev can't understand; been there, done that. But you can back up even further and rewrite at pre-compile time. This moves the complexity to the rewriter, so an average dev never needs to see it. You can review code before rewrite, which is no more complex than usual, as well as the transformed output after, which has a regular consistent pattern despite getting a finer grain.

From a user's point of view, a side effect of using fibers pervasively is removal of irregular jerkiness in performance, because latency smoothing is part of the overall plan. It's why you can get some real-time quality of service on the server side, because parking a fiber that cannot make instant progress now is how you keep making forward progress on other things ready to go. To get stuck so latency increases, you need to run out of a resource, like memory or iops to storage like disks. A UI should always be responsive if it has dedicated resources, and you don't wait on backend actions.


I agree with the efficiency of green-threads. What do you think of the idea of sending the control flow with the message in a message-passing / event-based system. If the problem (not necessarily the one you are trying to solve) requires low-latency, and you have two fibres communicating, if fibre A sends a message to fibre B, you can park process A's fibre, and if 'B' is blocking on empty message queue have the native-thread continue execution with the continuation in fibre 'B'.

I think nodejs has done a good job of showing how efficient the event based model is, and is capable of very good throughput with a small memory footprint. I think some benchmarks I looked at showed a highly multi-threaded server won on outright performance, but also needed lots of memory for all the thread stacks.

sounds like a tech support question

I expect nodejs is excellent when using JavaScript, based on reputation, and would be surprised to learn otherwise; it seems very safe to assume. I won't find out personally unless I use it, which seems unlikely unless I ever work with JavaScript full time, which I predict is never. (Picking me to optimize a JS project sounds crazy.)

Posix threads preallocate address space up front, defaulting to a megabyte stack unless you specify otherwise at creation time. (A quarter meg should still be overkill.) This chews up a lot of address space, so fragmentation is big and chunky. It's an issue on 32-bit systems but not 64-bit systems. I know folks who already pretend 32-bit systems no longer exist. Fibers need not allocate more stack than currently in use, and in much smaller pieces, so total footprint is much less and future re-use is much better. So memory story is good.

Before I tackle your first paragraph, let's standardize on terms so it's clear whether we mean process/thread/fiber or block/park when those words occur. If we use OS level names process and thread as both abstract/generic and concrete/specific, this table shows nouns and verbs in user-space green terms. Thorn is my specific variant of green mechanisms. There's no one-word name for green-process unless we coin one; here I use lwp for light-weight-process generically, and lane in thorn specifically.

         process     thread      suspend 
OS    |  process  |  thread   |  block    |
green |    lwp?   |   fiber   |  park     |
thorn |    lane   |   job     |  park     |

Any time we use fibers, there will also be a process and at least one thread, so it's very confusing to say process to mean anything other than OS process. Otherwise, every time you say the words, it's necessary to ask what flavor: OS or green. As long as you're talking to me it's simplest if you say lane to mean green process.

What do you think of the idea of sending the control flow with the message in a message-passing / event-based system.

Without specifying whether an OS process or thread boundary is crossed, there's no way to evaluate the idea in context. Also, phrase control flow (cf) is undefined unless we specify its relation to process, thread, and fiber. I assume you use it in the same sense I defined earlier, meaning the general idea of a thread-like thing, of which thread and fiber are OS and green subcases. If by cf you mean fiber, and it doesn't cross a thread boundary, then it didn't go anywhere and all you do is awaken the same fiber when a message is received. See the problem?

If the problem (not necessarily the one you are trying to solve) requires low-latency, and you have two fibres communicating, if fibre A sends a message to fibre B, you can park process A's fibre, and if 'B' is blocking on empty message queue have the native-thread continue execution with the continuation in fibre 'B'.

I can't parse that without clarification. Communication is usually presented as looking like a data structure or data channel: something that looks data-like rather than code-like. So sending a message goes into a queue, flow, or mailbox etc handled by another executable entity like a process, thread, or fiber. Send is associated with write and receive is associated with read, and a mutex prevents garbling despite concurrency. (Under fibers, when no other thread is involved, send can be an atomic queue update without need of locking, so going without a mutex is the default when no context switch is possible.)

Sending a message to an executable entity like a process or lane is treating them as agents with control flows inside that can respond to receiving inputs. That's one of the reasons the idea of lane is different from job: so there's an identity one can target with a message. Otherwise, sending a message to a fiber or thread amounts to adding data as a writer to a data structure, such as pushing into a queue, which awakens the consumer (if parked or blocked) to read, typically because a condition variable signals.

There's a basic organizational difference between things messaged by name, using their identity, and things activated as a side effect of pushing new data that awakens an anonymous waiter. You get spaghetti code that's usually garbage if this distinction isn't maintained somehow. Talk of sending message from fiber to fiber gives me a crawling-skin reaction, because it violates an abstraction. In my model that would only happen as a side effect of manipulating data and signalling condition variables, so the fibers aren't coupled. If you wanted coroutines, you can do this with a single fiber that alternates between one stack and another.

My guess is you should have this conversation with a JS developer instead of me. My general answer to a question like that is a condition variable should signal a reader to consume whatever a writer sends. Bounded buffers and crossing thread and process boundaries are complications.

Message Passing

The message passing question has nothing to do with JS, but I guess it was confusing of me to mention node in the same post.

Obviously the message is going from some lwp to another lwp, or the question would make no sense. Both lwps contain fibres, so a fibre is making the request and that fibre gets parked. Presumably other fibres in the lwp can continue to run if there are any ready.

I think your abstractions are missing a concept to let me describe this properly. Considering CPUs are sequential, what do you call the actual instruction sequence executed by the processor? This single "thread of control" masquerades as different user processes because the scheduler interrupts processes and changes the 'identity' of the currently running process by saving and restoring the register set (the process context). If we can agree on some terminology for this I can rephrase my question.

longish job execution story in daemon cmd line scenario

what do you call the actual instruction sequence executed by the processor?

I think I get it now. So far I haven't said much about thorn lanes and jobs. (Basically, I assume no one cares, because the interesting idea is green threads generally, not my specific versions, until I cough up some source code.) I'll say enough to answer your question. But to make alternatives clear, suppose there's a competing green thread implementation that does everything differently. Say another system is done by Ned Chary, a coworker of Wil Gowpen. Ned's role is to be negative, critical, worried, skeptical, cautious, reserved, and is in most ways not very gung-ho about Wil's stuff. But he's quite reasonable and agreeable, with as much experience as Wil and just as smart, but he dwells on downsides a lot: a competent but glum guy.

Ned has a system named zebra, akin to Wil's thorn, but Ned uses z everywhere in names where Wil uses y for thorn. Ned has his own lwp and fiber design, completely different from Wil's. Say Ned likes Smalltalk, and his code is exclusively bytecode from the Smalltalk-80 bluebook, which he calls pcode. Ned's fibers interpret pcode, so Ned's answer to your question would be "the instruction sequence is pcode" but each fiber calls a pcode bluebook VM to run it, since it's not machine code for Intel, or whatever he's running on now. A continuation in zebra has pcode and associated environment info.

Wil's thorn system (matching my design) has fibers called jobs, and they run C code that's been compiled to native machine code. A continuation in thorn is called a sequel, because Wil likes shorter names, and it's primarily a (refcounted) handle to a frame pointer, plus a pointer to a C function to call. (Maybe the C function pointer is also in the frame too, so the frame itself is the sequel.) When the scheduler's trampoline runs that fiber, it calls that C function with the frame to use as the stack. This might resume an earlier function interrupted because it made an async request; in this case the continuation is in the middle of the original function's source code, corresponding to where the return value is handled. The frame itself contains locals as well as formal arguments, and a pointer to the previous frame, which is the parent caller. (If there is one; maybe this frame is the main event loop of a fiber that never returns.)

Tyler has a daemon with a bunch of threads in it. One thread runs Wil's thorn system, while another runs Ned's zebra system, and each hosts a lot of lwp's with a lot of fibers. Wil uses lane and job for lwp and fiber, while Ned uses zane and zob (because he loves the letter z). Oddly enough, the two systems can still talk to each other over byte streams like pipes, because as black boxes neither can tell how the other was written as long as they speak a common stream format protocol.

If Wil wanted to, he could run one of Ned's Smalltalk VMs to run Ned's pcode in a thorn job, but he doesn't. Wil likes Lisp better, and he uses an AST interpreter to execute trees of Scheme style pairs; so in that flavor of fiber the instruction sequence is a tree of pairs understood to be an abstract syntax tree. Some of Wil's fibers run plain C as native machine code, while others interpret Lisp trees.

Obviously the message is going from some lwp to another lwp, or the question would no make sense. Both lwps contain fibres, so a fibre is making the request and that fibre gets parked.

Wil types a command line in bash that looks like:

% shy '/y/bin/cat /tmp/foo.txt | /y/bin/grep canary'

That runs a little command line tool that sends the command line inside the string to a shell lane Tyler has running in Wil's thorn thread, which is listening on a port whose default port number is known by shy. (Another cmd line tool named shz sends to Ned's zebra thread.) The shell is a lane whose main job accepts such requests and spawns a new job to run it. In turn that cmd line job spawns two new lanes: one to execute cat and the other to execute grep, after looking up each at the specified path in the vfs. Each is a little descriptor of the executable, featuring a pointer to a C function whose signature is similar to main(), but passing additional args needed by the lane when it runs. The shell sets up a pipe so stdout from cat is stdin for grep. When cat tries to open the file at path /tmp/foo.txt, this might require sending a vfs protocol request to a lane acting as the file system server. So here's an example of one fiber sending a request to another, which is what you asked about. All the rest of this scenario description is to make the whole thing more concrete, starting with something Wil did as a user at the Linux command line, to interact with the daemon running an embedded thorn engine (called a jyp engine elsewhere in this discussion).

The cat program simply tries to open the file, but the interface is defined on the environment, because no system calls can be made in a fiber. The C function called in this case might be named ecyEnv_open() where the first argument has type ecyEnv, which is the environment Tyler's daemon uses inside this thread's jyp engine to service host environment calls, which are often blocking, and therefore park the job instead after queueing the request for service. The call dispatches whatever function pointer appears in the env's vtable for that operation. It suspends the job which becomes parked, and maybe sticks the fiber itself into the request, or else just its jid job ID along with the arguments. If directory /tmp/ is the OS file system directory /tmp/ mounted there, a normal open() system call will occur when a thread pool handles the request. Otherwise if that directory is part of a mounted foreign file system, we send a request to the lane acting as vfs server, to perform an open request in the file protocol. This sticks the request in the lane's inbox; the inbox is serviced by one or more jobs, which poll the inbox and park when it's empty. A job parked on an empty inbox awakens when a message arrives, because a condition variable broadcasts to jobs waiting.

After successful open or failure, the request goes back in reply, to awaken the original fiber that made the request, which unparks and resumes at the sequel (continuation) corresponding to the status return value from the open call.

Presumably other fibres in the lwp can continue to run if there are any ready.

Every lane spawned by that command line is in the same lane group (like a Unix process group), so for the duration of that group's timeslice, the scheduler tries to run fibers within lanes inside that group. (This implies a list for O(1) access time; scanning all lanes for group membership doesn't scale.) If one job parks, another runs, unless every job of every lane in that group is parked. If the timeslice is exhausted, either we go to the next group with unparked jobs, or we yield to the trampoline's caller so it can do something else, like hand-off queued environment requests, such as those bound for a thread pool.

Edit: if the OS has no pipes, or if OS pipes are merely avoided, we use refcounted iovec buffer streams instead with associated condition variables to bound max buffer capacity. In this mode, all bytes passed through pipes are zero-copy in-memory buffers with copy-on-write behavior, so while they can be mutated, no mutation propagates to another consumer of the same stream.

How does that answer his question?

Rys, I think the suggestion was that when a "fiber" "parks" by making a request to another "fiber" within its own "lane", the "lane" within which the requesting "fiber" has just been "parked", should allocate its next timeslice to start running instead the "fiber" to which the request was made. This would be a good strategy as I understand it because it would keep the number of inter-"fiber" context switches to a minimum thereby reducing the overhead of the "lane."

That's assuming that Keean's understanding of your terminology is the same as mine. And may not make sense to you unless your understanding of your own terminology is the same as mine, which is by no means assured. But as I see it Keean was trying to figure out what you call an OS-level thread (I'm guessing "lane") so that he could rephrase his suggestion about how to allocate the CPU power of an OS-level thread, and the above example, while interesting, appears to offer absolutely no direct help in achieving that goal.

You're using very peculiar vocabulary. It's making it hard for people to understand you and it looks like it's making it hard for you to understand others. In the last couple of posts you've added to it - I'll be reading a dozen more before I know what you mean by "thorn system" although I think you mean what most people would call "multithreading." Or, if you don't, I haven't yet seen what the salient difference that requires new terminology might be. By the time I've become certain of that, I fear you'll have introduced something else that will similarly obscure attempts to talk to you and make it difficult to ask you to clarify your meaning.

There is nothing shameful or sinful about using the established terminology for a given technology to talk about particular implementations of it. Honest, nobody will get angry if you do.

By adding detail because the question didn't make sense

Thanks, I value finding out when I've been wasting my time. A context switch should not occur unless forced. Sending a message doesn't suspend sender unless it parks.

An OS-level thread is called "thread" and I never said otherwise. If it seems odd this is the same word, you understand perfectly. Lane means green process -- see table. Process means container of threads. Each OS process has at least one main thread.

A scheduler decides how to allocate CPU power. Fibers get no say. But locality improves performance, so for efficiency a scheduler should let a fiber keep running; or another fiber in the parent container if a fiber parks; or another fiber in the same group barring that. The idea is to reduce cache line misses by staying with code still touching a related set of data.

You're using very peculiar vocabulary.

You're saying I'm not allowed to use a word that means fiber container, holding a set of related fibers. What's the established terminology? Nothing?

Edit: Everywhere you see thorn you can replace it with "Wil's stuff"; it's a codename including anything else Wil writes even if it has nothing to do with multithreading, such as all data structures involved with no explicit effect on fibers or synchronization mechanisms. In the story, it's directly opposed to Ned's stuff, using a different codename. If you want to describe what kind of code is hosted inside one thread, it would make sense to use a short word to talk about it. Personally I think it's stupid to restate the phrase "code from the library Wil is writing" instead of using the codename.

shorter second try

As long as it looks like I'm wrapping up, here's a second briefer reply. You can ignore the last if it doesn't help.

The message passing question has nothing to do with JS, but I guess it was confusing of me to mention node in the same post.

Then I wonder what language is assumed as context. My posts say I'm using C, and yet you didn't assume the instruction sequence was native code compiled from C. Your post doesn't give means to guess what you assume if JS is irrelevant.

Because fibers can be implemented in different ways, not a single description below is required. Each detail characterizes only one way to do fibers. And in this sense, the proper term to use is job instead of fiber, because this is how the job version works. In the abstract, fibers don't need to do this, but in the concrete, jobs in one formulation actually do this. (It's okay if you get this wrong; I expect it.)

Obviously the message is going from some lwp to another lwp, or the question would no make sense. Both lwps contain fibres, so a fibre is making the request and that fibre gets parked. Presumably other fibres in the lwp can continue to run if there are any ready.

Messages can be sent in multiple ways. Using a slash to separate lwp/fiber, suppose the sender is S/s and the receiver is R/r. (So s and r denote fibers, in respective lwp containers S and R. In effect, S and R are programs with one or more flows of control inside, including s and r.) The following are all scenarios that result in a message send:

  • s sends msg to R's id, landing the msg in an inbox, so r finds it in the inbox later when it polls. If r is parked on the inbox now, it awakens, because the inbox has a green condition variable with a list of parked fibers on it, including r. The reply-to part of the request might awaken s directly, or push a message in S's inbox. [Awaken does not mean a context switch, it only means become unparked so it can run later, not now.]
  • s has a handle to a refcounted fiber-safe queue that r reads, and pushes a new request in the queue directly without using the inbox of R. This implies interesting edge conditions about the lifetime of the shared queue, but the ref from s keeps it alive if R and/or r go away. The edge conditions cause coupling: what happens if s pushes a request no one will ever read?
  • Program R consumes one or more byte streams and r reads from R's stdin, while stdout from S goes to R's stdin, and s writes a message on the stream. If each "msg" begins with a message-length, then r can see when the whole thing has been parsed once entirely buffered. Reply works any way they like, including a reply to some other lwp.
  • A circuitous asynchronous chain of streams and inboxes eventually routes some output from fiber s to fiber r. The reply either goes back directly, or it unwinds each inbound leg one at a time.

I think your abstractions are missing a concept to let me describe this properly.

The abstractions are missing because they are not required until a fiber design assumes them. I gather from Ray Dillinger's reaction that he detested the way I offered abstractions describing how my job fibers work. I'm curious whether it worked poorly for you too.

Considering CPUs are sequential, what do you call the actual instruction sequence executed by the processor?

Native machine code as usual. I'm awfully curious what you assumed.

This single "thread of control" masquerades as different user processes

That's already wrong. A user process has one or more threads inside, so a single thread cannot masquerade as a user process, because it's only one thread. In a fiber runtime, simulates is better than masquerades, because the actual OS never treats fibers as processes.

... because the scheduler interrupts processes and changes the 'identity' of the currently running process

A fiber scheduler inside a thread inside a process changes only the identity of the currently running fiber in a context switch. It never interrupts fibers, because yield is cooperative; the scheduler only gets control when a fiber returns. But a scheduler does suspend a fiber if the timeslice ran out. (That timeslice is not a fiber timeslice; it's the timeslice of something larger, and my talking about it may confuse matters, because I try to name the scope despite not having an industry standard name for it.)

... by saving and restoring the register set (the process context).

No, this is completely wrong for a fiber scheduler. When a fiber yields, it's a return from a C routine to the scheduler, and the same things happen to the register set as returning from any other C function. That does not cause the thread to context switch, because thread context switch occurs pre-emptively at any time, and that would save and restore the register set whether or not the scheduler happened to be in the middle of a fiber routine. (However, thread context switch is highly likely to occur when a scheduler provokes it, by either pushing something into a thread-safe output queue, or explicitly blocking on a mutexed input queue.)

If we can agree some terminology for this I can rephrase my question.

I put a lot of effort into the last longer reply. If you want to rephrase and continue, I want feedback on that one too. But if we're done it's okay.

Edit: Suppose I try to guess what you're thinking. Here's Dex simulating you, Wil simulating me:

     "But when I use setjmp/longjmp, this saves and restores the register set," Dex objected.
     "I'm not using setjmp/longjmp for jobs," Wil countered. "Why in the world would I implement fibers your way? I already explained I use cactus stacks in frame lists. It's like you don't listen, or else instantly reset your mind to your original assumption, and ignore what I say."
     "Yes," Dex granted, "but I didn't like what you said. And it was long, like over ten words."
     "There's a certain simplicity to ignoring all ideas not your own," Wil allowed.
     "I thought so," Dex agreed.

Both replies were okay with me

I was just thinking about what my goals are. But let's come back to that. You wanted some assumptions, so here they are: raw hardware, a 32/64 bit CPU with an MMU and a single core. Writing in C, with no access to library functions. No operating system, so no thread or process abstractions, just a single register set and interrupts (you can have a timer based interrupt though).

When I was referring to the scheduler I was referring to a scheduler implemented directly on the CPU, hence changing the process context by saving and restoring the actual CPU registers.

The question was about how this could be implemented using cactus stacks, with message passing as an example. I don't think I have mentioned setjmp/longjmp once since you explained cactus stacks :-)

I think your previous two answers provide enough detail for me to fill in the blanks. I think I will change tack and try to explain my goals better.

What I am interested in is how to present parallelism within the context of a programming language so that it is safe and efficient, so obviously some of your requirements are not relevant to me. However I have found languages that encourage a lot of thread use have problems with stack space, the classic example being Java on the server chewing through gigabytes with thousands of threads. This is something I want to avoid. On the other hand, having a single fibre lock up a whole process seems bad; even with nonblocking syscalls there is nothing stopping an infinite loop. I can't see any reason why you can't preempt fibres? You can set an interrupt timer, and on timeout save the fibre stack and context like you would on any async call, only to resume it later. Then the question becomes: what is actually the difference between a fibre and a thread? Presumably threads could be implemented with cactus stacks, resulting in a best of both worlds solution that only preempts if the thread holds the CPU for longer than allowed?

hopefully last long comment on threads and fibers

Dillinger's reaction told me I wasn't getting across anywhere near as effectively as I hoped, so I assume my project cannot be obvious, and may require some futile conversations for folks who want to convince others to let them use it when ready. Despite disappointment, even negative results have value. However an associated principle is to cut one's losses when a negative result occurs. This part of the conversation is the epilogue, following a decision to talk less, as long as payoff is feeble. I'm in trailing off mode, but willing to continue while progress has traction.

You haven't said how you feel about using more terms yet. A conversation where one word has several crucially ambiguous meanings is not worth having. In my last post I started using fiber everywhere I meant job, because Ray complained I was using non-standard terms. Though I warned you about this, you still asked why a fiber could not be pre-emptive, when I was talking about the definition of job. We're going to need more words if you want clarity. Now I'll reply piecemeal to specific remarks.

raw hardware, a 32/64 bit CPU with an MMU and a single core.

So nothing can run truly in parallel, given just one core? Or are unspecified GPUs or multiple systems present?

No operating system

In the sense of being ignored while it's there? Or in the sense that one does not exist in your context? If one doesn't exist, you're writing one. In which case, if anything simulates a process or thread, you're implementing the system level process and thread abstractions. (To deny this makes odd conversation. When riding a bus, if you ask the driver where the bus is going and the reply is "you should ask the bus driver, not me", that's okay as humor but poor as serious talk.)

so no thread or process abstractions

Wouldn't it make sense to call what you write a thread then, instead of fiber?

referring to the scheduler I was referring to a scheduler implemented directly on the CPU

In my posts I never once talked about an operating system scheduler, only one that schedules fibers. But I had a feeling scheduler might be taken (anyway) as the only thing most folks ever mean by it, as if it can never refer to anything other than "official scheduler written by the gods, over which we have no control". In writing, one standard shortcut is to introduce something with an indefinite article and extra adjective -- "an xyz foo" -- and then later refer to it with just definite article and trailing noun -- "the foo" -- and these are understood to mean the same thing via back reference. Once I talk about a fiber scheduler, after that "the scheduler" means only fiber scheduler as long as I never talk about another sort of scheduler. This is lexical scoping, as it's done in plain language, making it possible to talk about things without using the one true global vocabulary shared by everyone. Once we talk about two different schedulers, we never use the base word by itself without ambiguity. We might refer to "fiber scheduler" instead as "fiber engine" (and I did use engine some places) and then shorten to just engine. How's that?

(On occasion I have had conversations with people who end up using the most vague possible terms, on purpose, so there's wiggle room to claim they meant a variety of different things afterward. In this case vagueness seems an attempt to avoid falsifiable remarks. If it's always possible to extract a new angle out of old comments, and the goal is to prevent anything being pinned down, I don't talk to those people. Evasion subverts clarity.)

Across multiple posts in a forum, though, it's easy to assume terms drift back to standard meanings, even if locally constrained to a narrower domain earlier:

     "You said scheduler," Stu explained, "and everyone says that to mean the OS scheduler."
     "But earlier I said fiber scheduler, and I was referring to that," Wil objected.
     Stu looked incredulous. "You expect me to read earlier posts?"
     "Sort of, yeah." Wil nodded. "Do you post without reading anything?"
     "Of course not," Stu evaded. "But you should never say just scheduler because the gods have ordained that only means an OS entity. You must always say fiber scheduler every single time, no matter how many times you narrow context to fibers."
     "Absurd," Wil scoffed. "What if I say engine? Can you remember that means fiber engine?"
     Stu put fingertips together and tapped judiciously. "I don't like using new words. Someone might accuse me of saying something wrong. Being mocked crushes me. Basically I think we should talk only about old stuff, using words everyone agrees upon."
     "You mean, have pointless conversation," Wil said, aghast, "because nothing new can be said. Just rehash alone?"
     "Sure," Stu nodded. "We only talk to posture, not to communicate, right?"

I promise not to beat that horse to death again. But my point is that language must evolve, and there are ages-old conventions for introducing new local nuance in terms. If we don't use those conventions any more, in part it's due to social evolution that ignores the past, reads nothing from before today, and assumes global uniformity enforced by petty social negative feedback.

What I am interested in is how to present parallelism within the context of a programming language so that it is safe and efficient,

I expected concurrency instead of parallelism there. You said one core, and didn't mention another system. So that means hardware that can physically run only one control flow at once. Parallel means simultaneous on hardware running more than one control flow. (While sometimes parallel and concurrent are casually treated as alternative synonyms, that works poorly when talking about how much hardware executes code at the same time; in that context they need different meanings.)

If there are thread or process abstractions, it seems like you need to assure some level of safety that's going to take away from efficiency. Pre-emptive scheduling necessarily impedes efficiency, because it forces pessimistic instead of optimistic algorithms to resolve conflict and ensure it converges immediately, under guarantee.

However I have found languages that encourage a lot [of] thread use have problems with stack space,

Some sort of fragmentation is always present, and some kind of scale is assumed by a given design, and typically threads are assumed few in number so stacks are contiguous and as fast as possible when execution focuses inside one thread for a long time. It's just a tradeoff. But in the tech world, most people are judged on the most trivial sort of benchmark that can be put together. To get a merit badge in thread optimization, you need to win a test with just a few threads. If that forces a big stack as default, that's what the benchmark gives you. (If someone writes a benchmark with lots of threads, you can just say "you're doing it wrong" and roll your eyes at their idiocy because they don't know how multithreading is done.)

On the other hand having a single fibre lock up a whole process seems bad, even with nonblocking syscalls there is nothing stopping an infinite loop.

Though perhaps not in this discussion, I have several times emphasized executable entities should operate under limits in space and time, so violations cause bad actors to be killed. Even then, if you decide to permit effectively unbounded cycles, there should be something that watches, to gauge progress, so wild code can be stopped. You can structure it to be utilitarian, in a contract that says, "Per X cycles, you will provide Y units of value, or you will be cancelled", and then warn of delinquency before killing them.

Under cooperative scheduling, any long loop should be segmented, with yields to ensure a fiber scheduler gets control again, so an infinite loop cannot stop other code from running that performs in a quality-monitoring role. For example, a routine that walks a linked list of arbitrary length to see how big it is must be made async with voluntary yields to a continuation every N iterations, for some reasonable value N that can still be big for efficient execution. (You can also write a non-async version that simply fails with an error if length is too big; but async need not be forced when small lists are expected. When I want to know how long a list is, what I usually want to know is "are you longer than X?", which doesn't require looking at more than X+1 members. An api for this should be used instead because it scales better.)

I can't see any reason why you can't preempt fibres?

We're back to general versus specific. Jobs specifically cannot preempt, because I defined that version of fibers that way. I stopped using the word job because Dillinger objected I was using non-standard terms. You can define fibers that preempt.

However, if you do that, I'd call them threads instead, especially if otherwise there aren't any threads in the system. You'd effectively be blurring the distinction between thread and fiber so they were synonyms, amounting to natural language level denial-of-service by fuzzing everyone else's tech discussions.

If you had a system with threads, preemptive fibers, and non-preemptive fibers, then I'd give a different name to each. Maybe pfiber for preemptive. I suspect a system cannot have both threads and pfibers though, because they would fight with each other if mixed in one process. I guess you could use different flavors in different processes. But it would make sense to call them variants of threads.

You can set an interrupt timer, and on timeout save the fibre stack and context like you would on any async call, only to resume it later.

The problem with preemptively scheduling fibers is that none of the optimizations that rely on mutual exclusion holding until an explicit yield still work. You end up with threads.

I haven't tried to evaluate what happens when you try to implement a thread with cactus stacks. (Completely experimental trains of thought have a low rate of gaining traction in getting empirical evidence to evaluate results.) Implementing threads is not something I have any desire to think about, because I prefer a space one step closer to application contexts when writing programs. I want to treat threads as a commodity.

Then the question becomes what is actually the difference between a fibre and a thread? Presumably threads could be implemented with cactus stacks, resulting in a best of both worlds solution that only preempts if the thread holds the CPU for longer than allowed?

As far as I know, no difference when you preempt. The whole point of a fiber is to multiplex a thread, and part of the point of that is that fiber code is single-threaded when a fiber operates solely inside one thread. Resulting code is easier to understand and far less subject to timing nuances. (Synchronization is still needed for critical sections spread across fiber yields, but you can write all the utils with obvious-looking single-threaded code, because things like green condition variables cannot be preempted.)

While I've been enjoying this, I have other stuff to do too, so future posts must be short.


I like the use of precise terms, and I have specifically been using lwp, fibre and park because I am referring to these in general, not lanes and jobs. My only criticism is that to claim an unqualified word for your project (thorn) seems a bit much, but I can get used to it eventually. I keep thinking of bus-lanes and shell-jobs. I agree the distinction between fibre and thread is a useful one, although I am still not quite clear what happens if the OS provides fibres directly; I guess it's still a thread? There are some corners that need better definition:

|                |   OS   | User Space |
| Preemptive     | Thread |     ?      |
| Non-Preemptive |    ?   |   Fibre    |

I think the third dimension of the table (linear or cactus stack) is fairly clearly irrelevant to the definitions.

I would rather see "thorn fibre" and "thorn lwp" every time you want to refer to your specific implementation. In the end it should not matter whose implementation of a fibre you are using otherwise you are breaking the abstraction. We should be able to have a conversation about how thorn implements fibres, and then forget about that and just discuss fibres, not caring about how thorn implements them, hence the specific terms lane and job seem unnecessary.

However just because I like precise terms does not mean I always use them. If you let me get away with vagueness, I will carry on, not because of some desire to be retrospectively correct, but because of an implicit assumption that the difference is meaningless. If I use the words thread and fibre interchangeably and nobody picks me up on it, I will assume the difference is not relevant in this context. A word's meaning is highly contextual; just because the difference is significant in the context you use the words does not mean it is in mine. Also, because the words you are using have mostly been introduced by character narratives, there is no clear definition, so I have to guess what they mean. I then try using the words with the meaning I inferred, and wait for corrections. Gradually, through trying to use the words, your corrections make the definitions clear in a way the narratives cannot.

The other point that seems relevant is the difference between a conversation and posturing. Over several posts I can see that we are having a conversation, but when you say things like "I'm not going to answer unless you stop trying to convince me to change my plans" it comes across as posturing. In a conversation both people must be prepared to change their position based on the information exchanged. If one side is not, it becomes teaching or lecturing. If both sides are not, it's just posturing. Of course I may fail to say anything you haven't already considered, but to presume that before I have said it is shutting down the conversation.

In the case where I am not developing for thorn, and you are not interested in discussing the reasons behind your choices (with the notion that things may change based on the discussion) there is no point in discussing anything. This is why I shifted the discussion to my project, because I think you have interesting things to say about threads and fibres in a demanding environment, and I am prepared to change my plans if I encounter something I haven't already considered. If my plan is to develop a language that could replace 'C', then you are potentially a user of my language and therefore your input is valuable.

Shifting topic: when I said parallelism instead of concurrency, it was because I was thinking in terms of exploiting parallelism (instruction level parallelism and parallel algorithms). There is no point in parallelising an algorithm on a single-core CPU, as it will be slower, not faster. Parallelism is about increasing throughput, concurrency about reducing latency; I am interested in both, I just happened to be thinking about parallelism at that point.

I can use terms you prefer, discussing your plans

My only criticism is that to claim an unqualified word for your project (thorn) seems a bit much

I was divorced around 2003, and had almost no control over my life. It was pretty bad, but as you can see, I didn't die. However, it was that bad. One of the few things I could control was my name, and I changed it, to nickname Rys, though I still answer to David. At the same time, I picked thorn as a concept characterizing stuff I was interested in developing. If you are familiar with Game of Thrones then, metaphorically, the emblem of my house is now thorn. So that's a codename you'll see attached to my stuff. I'm sorry it annoys, but it's one of the few expressions of independence I indulge and prefer to keep. It would be weirder to attach my name to stuff; I like the codename better.

but when you say things like "I'm not going to answer unless you stop trying to convince me to change my plans"

I have a plan for a project I've been homing in on for a few years, that also gathers and incorporates a few things I've been doing twenty years. The amount of inertia in my plan is hard to convey, but a few conversations are not enough to change my personal trajectory much. I don't mind discussing very different plans on a lark.

If my plan is to develop a language that could replace 'C', then you are potentially a user of my language

Yes, that sounds likely. I'm also a potential user of Go. But I'm not a potential user of Rust. I started a discussion here some time back advocating that simple is better than complex. There are two sorts of simplicity I like: 1) less syntax, like Lisp, and 2) an exact statement of the semantic nature of each feature, without fuzzy parts or complex hidden magic. Anything that sounds like "you don't need to understand this part, we'll take care of it" is enough to kill my interest.

Thorn is fine

I have no problem with Thorn as a concept, so that's not it. It was more the part about abstraction: if, as a user, I can see the difference between a Thorn fibre (job) and a generic fibre, there is something wrong. Therefore I only expect to see the word 'job' or Thorn fibre when discussing how Thorn implements fibres internally. I prefer Thorn fibre to job, but I understand that's just my personal preference. When discussing the user view of Thorn I think we should be talking about fibres and LWPs, because it doesn't matter to me (a Thorn user) how they are implemented, and my intuitions about fibres from other fibre systems should apply without modification. In other words we are really discussing generic fibres and LWPs, and the fact that the platform is Thorn is irrelevant unless discussing the internals of Thorn.

Massive inertia is different from infinite inertia. Discussion is only pointless if there is no room for change at all. I accept that you have been looking at a particular set of problems for a lot longer than I have, but sometimes an outside perspective can be useful, or shine light on things in a different and unexpected way. I don't ask that you actually change your plans, only that you approach discussion with an open mind that you might discover something that makes you do so, however unlikely.

I agree on your thoughts about languages in general, although I am not a fan of Lisp. I like the 'idea' of Lisp, and writing a Lisp interpreter is fun, but I don't actually like programming in Lisp. I prefer my functional languages pure, and I find object mutation and looping constructs easier to read. I quite like programming in Ada, apart from its generics (which are module-like rather than C++'s type-class like templates) and its treatment of pointers. I find myself agreeing with a lot of things you write, and I think our philosophy of programming is similar in a lot of ways.

>There's something interesting going on with simple unix utils you can compose in pleasant ways. Why can't parts of a single program in one process be like that too? Why aren't sub-routines structured more like cooperating concurrent processes, and less giant tree-shaped arithmetic expressions with self-modifying state?

Thanks for reminding me; it's easy for me to get lost in side-tracks and tangents, so I'm glad you re-posted that. I would like to discuss this too. I think Erlang handles this in a very elegant way. Shared memory (and locks) are problematic, and parallel (concurrent?) processes communicating by message passing seem a better model to me; however I am concerned that it may be less efficient than the thread model. However there is a large semantic overlap. If we consider the 'lock' on an area of shared memory to be a token that only one thread can possess at a time, you could see that this could be an access handle that you pass from thread to thread in a message. This is the kind of concurrency/parallelism model I am thinking of.

Of course this only models a simple mutex; there are other possibilities to consider, like a shared read lock and exclusive write lock. This seems to generalise the single lock nicely, but has the problem of how to restrict the shared read handles you may have already given to other processes when you want to write via the write handle. Software transactional memory might be a good option, but I am not sure; I would probably have to try it out. If I had to give an answer to the question directly, it would be performance, or rather the lack of proof that the more structured approaches perform as well as the threaded/shared-memory approach.

Not angry, just frustrated.

I wasn't angry with you, and I didn't mean to denigrate you. I'm very sorry if I seemed disrespectful. But, yes, I'm speaking in some frustration here.

I guess my issue is that this terminology has been introduced in a casual, gradual, and ultimately somewhat roundabout way, which leaves much to be inferred, interpreted, or guessed from clues scattered across many paragraphs. Wrong guesses or incomplete inferences are unnecessary complexities to add to an already complex topic. I'm not asking that you avoid discussing anything new, because as you correctly point out discussion in the absence of new ideas is pointless. But I would really appreciate direct and precise definitions of new terms before launching into further discussion couched primarily in them.

Depending on what level of abstraction one takes these characters to be speaking at, and what fundamental assumptions one takes to underlie their various shorthands and uses of vocabulary, it's way too easy for me to see many ways in which each of the things they're talking about could be understood. The result is that, to me, the introduced terminology has remained quite vague. I've been trying, possibly with success but I can't be sure, to work out what exactly distinguishes each introduced abstraction from different more familiar ones. Ignoring for the moment all the terminology I'm afraid I may be getting wrong except for the two terms "fiber" and "lane" ....

I understand you to be talking about a userspace job control system, including a scheduler and inter-job communications, possibly running on several different OS-level threads simultaneously. Each "fiber" represents a single flow of control. (Note that some use the word job to mean a single potentially multithreaded program and others use the same word for each individual thread, so at least one attempt at clarification hasn't worked out all that well).

The sequence of instructions run by a particular single-tasking CPU core within the context of a single OS time slice devoted to this system may be distributed across one or more "fibers", and some overhead processing for the userspace scheduling etc. But that sequence of instructions will be within a process devoted to a single "lane" because the "lane" is a container for fibers and for several reasons I expect an individual lane to be managed by a single-threaded process at the OS level. Or, maybe I have that wrong and a single "lane" may be managed by a process running in more than one OS-level thread. I'm still not sure. But it seems to me that devoting more than one OS-level thread to running a single "lane" would be a redundant complication because within each "lane" the multitasking has been described as purely cooperative. So there's no motivation AFAICG to saddle lanes additionally with the mutex/lock overhead necessary for the pre-emptive multiplexing used by OS-level threads.

A given program written as a client of this userspace scheduler is not necessarily single-tasking, and therefore may run multiple fibers. I am not sure whether this is intended to mean single-threaded cooperative multitasking with a single program confined to one lane yet, but as I read your dialogs I have begun to suspect that it is not. I think that you intend for it to be possible that different fibers doing work for the same client program can run simultaneously on different CPU cores. If, for the reasons in the above paragraph, I infer that we wish to avoid having the operation of single "lanes" implemented using multiple concurrent OS-level threads, it follows that a single client program may simultaneously be running "fibers" which are contained in two or more different "lanes." But that's starting to be a long-ish chain of inference, and I may have gone astray already.

At any given clock cycle, it is clear that zero or one lane may be running a given fiber, and that zero or one CPU core may be running a process that manages a given lane. It is believed that no more than one CPU core may be devoted to the same lane at the same time, though not absolutely certain. I am not entirely sure whether or not a "fiber" can change lanes during its lifetime, whether lanes can be run on more than one CPU core during their lifetime, or whether multiple lanes can be managed by the same OS-level process. Most of what I know about OS threading facilities and CPU architecture would make the first two hard and probably wasteful of resources on most machines and the third pointless, so I doubt all three of these things. But I haven't been able to rule any of them entirely out based on these character dialogs.

These points of confusion are things that could have been cleared up in a dozen introductory sentences that say exactly what the introduced terminology is supposed to refer to. Despite reading many pages of dialogs and going back-and-forth on all the ways that various figures of english syntax could be taken, debating internally how various characters might be using terms that different people use differently and trying to infer what constellations of common meanings would cause those characters to read as self-consistent or self-contradictory, I still don't know for sure what the parameters of these abstractions are.

sums it up for me too

Thanks for the long reply, which I'll think about and respond to further soon. Now I'll aim for context only. I won't use any of my new terms, but I'll give a reason for wanting the terms. When I started, I didn't have in mind covering my model in fine detail, but that might have been better. Below I provide model motivation, after summarizing first intentions: I was interested in what Thomas Lord posted first, then later I said:

There's something interesting going on with simple unix utils you can compose in pleasant ways. Why can't parts of a single program in one process be like that too? Why aren't sub-routines structured more like cooperating concurrent processes, and less like giant tree-shaped arithmetic expressions with self-modifying state?

In effect, I wanted to embed something like unix inside one user process corresponding to my program. But all the parts would be something I wrote from scratch, simplifying everything to a scope one person can handle, while also writing docs one person can read. (This means it can't actually be unix, which is large, only a simpler pidgin.) The over-all theme is cutting things up into manageable parts, which cooperate, using (for example) pipes, sockets, signals, files, but all simulated. The specific model of cutting things up, as well as the cooperation, would mimic unix processes. Inside one program address space, I want to embed everything: servers, clients, file systems, etc, where sometimes those embedded things were wrappers for things outside my process. (For example, mounting part of the OS file system inside my vfs simulation is good.)
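For a concrete flavor of what "simulated pipes inside one process" might look like, here is a minimal Python sketch (all tool names invented for illustration): two unix-style tools written as cooperative generators, composed like a shell pipeline, with no OS processes or real pipes involved.

```python
# Hypothetical sketch: unix-style composition inside one process.
# Each "tool" is a generator (a cooperative fiber); the "pipe" is just
# generator chaining, so `cat | grep alpha | wc -l` becomes:

def cat(lines):
    # source tool: emit each line downstream
    for line in lines:
        yield line

def grep(pattern, stream):
    # filter tool: pass through matching lines only
    for line in stream:
        if pattern in line:
            yield line

def wc_l(stream):
    # sink tool: count the lines it consumes
    return sum(1 for _ in stream)

text = ["alpha", "beta", "alphabet", "gamma"]
count = wc_l(grep("alpha", cat(text)))
print(count)  # 2
```

Because generators are lazy, each "tool" runs only when its downstream consumer pulls, which is roughly the cooperative scheduling a pipeline of unix processes gets from bounded pipe buffers.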

So under unix, every isolated program is a process when executed. Good style is to focus on just one thing in a tool. But sometimes a tool like a server has a lot of internal complexity, such as having a lot of threads and serving a lot of concurrent connections. No matter how complex the tool, I wanted to be able to port it, embedding it in a nested space while retaining everything in some isomorphic setup. If an original tool on a Linux command line has many threads and reads many files, uses many sockets, and deals with signals, I wanted all of it retained in a nested simulation.

The unix environment I want to simulate has a lot of processes, and each process has at least one thread, and potentially many threads. Suppose uppercase P denotes a unix process, and T denotes a posix thread in a unix process. Inside a nested environment, inside one program, I want to simulate every P and every T, where each P is a collection of T's. If there's a productive workflow I can execute on a Linux command line, and it's relevant to what I want my app to do, I want to simulate that workflow inside my app's process, by running all those tools as embedded simulations. For example, if I embed my simulation inside a high-end load-balancing product, I want to be able to specify how a connection gets processed in every detail by a shell script ... but the shell, the script, the pipes, and the tools invoked would all just be embedded entry points in the simulation. And then I want to inspect what happens by viewing a simulated proc file system. And I want to browse the proc file system from a browser, where the http server is embedded in the same nested environment.

Say I'm simulating a lot of processes (at least several per connection, for thousands of connections) and each simulated process has several to many simulated threads. I want each simulated process and each simulated thread to act extremely similarly to actual unix processes and threads, but simplified whenever this makes model and docs smaller (at least small enough for one person to do by themselves). This means each P process out in the real Linux world has a lowercase p simulation, and each T thread out in a real P process has a lowercase t simulation. The lowercase p and t models should resemble P and T models, but simplified when necessary. The p and t model simulations need new names; they cannot be called process and thread, because the simulation will also involve actual P and T processes and threads, so it would be confusing. As user space simulations, p is actually a green process and t is actually a green thread. If you call them green process and green thread, though, people will leave off the "green" when speaking quickly, and it will all just be a soup of processes and threads without being able to hear the uppercase and lowercase telling you whether an item is real or simulation.

I always coin new names for things in projects I write, and the names are always very short, and don't sound like other words used in architecture discussions where stories must be told about those things. Everyone always hates the new names at first, to the point of anger. And yet, in design discussion, my peers find they can say complex things very quickly when many ideas have a name with one syllable (instead of seven, which otherwise would be quite common). After a while, the new names are loved by folks who talk about them a lot, because every sentence is clear, and short, so new ideas fit entirely inside a listener's short term memory. This is what my work life tends to be like: hatred of terms at first, then strong approval. While this may not succeed outside work, if I write a personal project by myself, I don't need to consider what anyone else wants.

(I've used most of the terms mentioned before, and I like them, so in my fork of a project they won't change, and all my docs will use them exclusively. If they make others reject and ignore my project, that's fine with me. I don't do anything for a goal of garnering approval.)

The simulation I want to run is single-threaded, so p and t are single-threaded simulations. But since an actual app runs in a process, and often has multiple threads, the single-threaded simulation has to run inside of one specific thread. There can be multiple instances, though. And some parts of the simulation might perform best when hosted by the environment outside the single-threaded part. For example, just to get proper non-blocking execution, it really is necessary to use a thread pool to service blocking system calls, because there are not async versions of them all offered by the OS.
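A minimal sketch of that last point, assuming Python and a hypothetical polling scheduler: the single-threaded simulation hands blocking calls to a worker pool and parks the calling fiber until the result arrives, so the simulation thread itself never blocks.

```python
# Hypothetical sketch: a single-threaded fiber scheduler that offloads
# blocking calls to a worker thread pool, so the simulation never blocks.
from concurrent.futures import ThreadPoolExecutor
import time

pool = ThreadPoolExecutor(max_workers=4)

def blocking_read(name):
    time.sleep(0.01)                 # stands in for a blocking syscall
    return "contents of " + name

def fiber(name, results):
    # Yielding a future means "park me until this completes".
    data = yield pool.submit(blocking_read, name)
    results.append((name, data))

def run(fibers):
    # Naive scheduler: start each fiber, park it on the future it
    # yields, and resume it with the result once the future completes.
    parked = [(f, f.send(None)) for f in fibers]
    while parked:
        still = []
        for f, fut in parked:
            if fut.done():
                try:
                    f.send(fut.result())
                except StopIteration:
                    pass
            else:
                still.append((f, fut))
        parked = still
        time.sleep(0.001)            # polite polling, for the sketch only

results = []
run([fiber("a", results), fiber("b", results)])
pool.shutdown()
print(sorted(results))
```

A real scheduler would wake on completion callbacks rather than polling, but the division of labor is the point: all fibers stay in one thread, and only the pool threads ever block.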

I have written a production green thread system before, as well as green locks and green condition variables, and a lot of other things that work in this sort of environment. So there's nothing very tentative or exploratory about the nature of the simulation. The new part is that particular simulation; the general business of doing a simulation is something I already understand fine. My talking about it is partly market analysis, and partly getting a feel for what things sound like to others, because otherwise I'll write the docs wrong, and I'll write features no one will like or understand (thinking they are obvious when not).

     "You want to simulate the whole Linux world in one process?" Dex asked.
     "No, just those processes I would have had to write as separate executables," Wil corrected.
     "So you're not going to simulate systemd in your whatcha-call-it?" Dex wondered.
     "No," Wil shook his head.
     "Because doing systemd sounds cool, so maybe you should do it," Dex encouraged.
     "No!" Wil barked.
     "No need to get pissy," Dex muttered. "You're so rigid, maybe you should loosen up and take a suggestion now and then. Why don't you just use real processes?"
     "You want me to spawn over ten thousand Linux processes?" Wil marveled. "Any part of that sound like a problem to you? Why do you think we're already using green threads? It's just we can't tell what they're doing, or make process supervisors, or alter and debug the existing behavior."
     "Okay, okay," Dex held up his hands. "I knew all that."


Reminding me of the motivation (shell-like facilities implemented in userspace to enable catenative programming) helps a lot. It at least gives me a clear idea what problems systems like this need to solve and immediately clarifies in my mind a straightforward design for it.

If I were prototyping this, I would implement a single-threaded process to manage a "lane", simulate a filesystem as data within that process, and confine communication between fibers to fibers running within the same lane.

I would enable no inter-fiber communications other than the simulated filesystem, because all forms of inter-fiber communications would be simulated on the same substrate and therefore have the same latency and throughput limits. So the vfs should be as good as anything else we could write, and the only interesting question is about providing additional interfaces to the vfs. At the very least the vfs should implement queues and pipes.

The process that manages a lane would do locking of shared external system resources (such as stdout, etc) on behalf of fibers in that lane requesting access to such resources.

If people found the prototype useful, I would expect the most frequent request for a "production quality" version to be sharing the vfs across multiple lanes, which would reintroduce the locks and mutexes the exclusion of which made the system worthwhile in the first place. I would implement it while leaving non-shared vfs functionality intact and biasing the scheduler to keep communicating fibers within the same lane, thereby keeping the lock and mutex overhead from degrading performance too much.

If people found the prototype useful, I would expect the second most frequent request for a "production quality" version to be allowing fibers to share access to their memory arena as do multithreaded OS processes. This additional inter-fiber communication would still be redundant to (or should be implemented in terms of) the vfs, so all "shared" memory structures should be simply vfs files, using the locks and mutexes introduced in the paragraph above for managing access to shared-vfs in the event that fibers desiring a "shared memory arena" were scheduled in separate lanes.

Does this speculative design match fairly closely what you've been considering here?

prototype fiber simulation details

(Keean, I'll reply to you tonight -- can't clear the backlog all at once, but it's nice to have things in the queue to talk about, if you don't mind waiting.)

First I'll respond directly to your comments, then maybe more about implementing simulations. [I had in mind two stages, starting with threads only, then a second stage with fibers. Given uppercase means native OS thing, and lowercase means user-space simulation, we can say P and T are OS process and thread, while p, t, and f are green process, green thread, and fiber. (A t simulation of thread might be a different kind of fiber than f.) Stage one: use T to simulate P, so threads within it are just more T instances. Stage two: inside one T, use p and t to simulate P and T, with t as some kind of fiber and p as mainly a bookkeeping fiction (but an essential one). During stage one, where one T simulates no more than one process P, maybe use another kind of fiber f to simulate T without much rigor because we aren't trying to be as classy as p and t, but we still want a thread simulation.]

As one more notation, we might use uppercase A, for app, to denote the OS process hosting the simulation. That way we can talk about that process without confusing it with P simulated inside. (By convention we never try to host the simulation nested inside itself, so A is never among the P simulations.)

If I were prototyping this, I would implement a single-threaded process to manage a "lane", simulate a filesystem as data within that process, and confine communication between fibers to fibers running within the same lane.

I think you're describing stage one, where T simulates P, so p is implemented as T (with p called lane as shorthand for green process). It's not a pure simulation, because we're using a native OS thing (thread T) as a process, so we get coupled to T semantics, and the p model is not actually single-threaded as planned, because it has to use thread-safe api to interact with anything. However, given a standard api, a rewriter could transform code written for stage one so it runs in stage two, where it is a pure simulation.

(Normally I think of lane as only a single-threaded pure simulation p green process. I hadn't thought about stage one codenames, only stage two codenames. If we stick with uppercase indicating something essentially native OS level is involved, then uppercase L might mean a Lane implemented via T-as-P; since threads inside are just T with no simulation, it doesn't need any notation, but to be consistent we might say Job J is done via T-as-T.)

My main goal for a vfs was communication between lanes, rather than only inside, between fibers. Since fibers simulate threads, which already have an easy shared-mem communication model, they're already implicitly coupled. The boundary that means "I don't really know what's happening for sure anymore" is the lane, so the internals of other lanes are assumed abstract, and process isolation is modeled (despite having multiple simulations inside one A app).

My original vfs motivation was modeling state management in scripts, where data can live in the file system, which lives longer than individual processes. If dataflow between normal Linux tools is via file system, I need a way to model that isomorphically, or else I have to rewrite every ported cmd line tool first thing. But the longer I thought about the vfs, the more entertaining the options became; now it would take a lot of words to sketch what I'm thinking. In short, I want app A to host a shared thread-safe vfs that can be patched for local namespace views in each simulated process. I think of it as part of the environment, rather than the machinery specific to p and t simulations. Basically I want the file system to be a server, though one embedded in the local A app process, that can mount other file systems including parts of the OS as well as other (local or remote) app A processes supporting the same vfs connection protocol. In effect, all i/o is asynchronous; whenever it would not be async, a thread or fiber blocks or parks as needed, so it does look async to a simulated endpoint.
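One way the per-process namespace patching could look, as a hedged Python sketch (Vfs and LaneView are invented names): a shared tree plus a copy-on-write overlay per simulated process, so local writes never disturb the shared view.

```python
# Hypothetical sketch: a shared vfs plus per-lane namespace patches.
# Reads fall through to the shared tree; writes stay in a private
# copy-on-write overlay, so each simulated process gets a local view.

class Vfs:
    def __init__(self, files=None):
        self.files = dict(files or {})

    def read(self, path):
        return self.files[path]

    def write(self, path, data):
        self.files[path] = data

class LaneView:
    def __init__(self, shared):
        self.shared = shared
        self.overlay = {}                  # private patch for this lane

    def read(self, path):
        if path in self.overlay:           # local patch wins
            return self.overlay[path]
        return self.shared.read(path)      # otherwise fall through

    def write(self, path, data):
        self.overlay[path] = data          # never touches the shared tree

shared = Vfs({"/etc/motd": "hello"})
lane_a, lane_b = LaneView(shared), LaneView(shared)
lane_a.write("/etc/motd", "patched")
print(lane_a.read("/etc/motd"))  # patched
print(lane_b.read("/etc/motd"))  # hello
```

Mounting external file systems or remote vfs servers would slot in behind the same read/write interface, which is what lets a crippled or elaborate vfs be swapped via dependency injection.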

I would enable no inter-fiber communications other than the simulated filesystem, because all forms of inter-fiber communications would be simulated on the same substrate and therefore have the same latency and throughput limits.

Inside one simulated process, I want to allow a free-for-all among simulated threads within. No matter how crazy an existing unix tool is before I port it, I'd be able to port it to run in the new simulation context (and this can be mostly automated with a rewriter, once enough api is identified and mapped). So narrowing to using fs alone to communicate is a problem for me. But if this is a feature, I could support a lint mode that constrains a fiber this way, for folks who want to end up with code obeying that rule.

So the vfs should be as good as anything else we could write, and the only interesting question is about providing additional interfaces to the vfs. At the very least the vfs should implement queues and pipes.

In my work environment, a vfs cannot be very elaborate; for me to port the simulation to use in a product I'd like to see use it, a pretty narrowly constrained file system would be required to avoid violating sovereignty of the actual top level host environment in app A, which is a (very) monolithic load-balancing process. So the design I have in mind makes the vfs a plugin in the environment interface, passed into the p-and-t simulation via dependency injection.

However, that's not really limiting. The environment can route all vfs requests to another p-and-t simulation instance, which is as elaborate as you like. :-) And efficient too, if this is just a thread context switch. I agree with you it's important the vfs be as good as it can be made, because it has a pretty big impact on how simulated processes behave.

But I also need to make it possible to use a crippled vfs, when dictated by the deployment target. It's an odd way to think, to ensure both very good and very bad versions of something can work using the same api. I probably need to introduce more notation: E for the environment that knows how the OS context actually works, and maps local p-and-t simulation requests to something handled outside by a) the OS, b) other local simulation instances, or c) remote services by socket connection, perhaps to other app A instances hosting p-and-t simulations.

From a simulation's viewpoint, everything related to i/o is handled by env E, from thread-safe queues to pipes to vfs service. Though this isn't required by implementation, I plan to copy anything that crosses an OS thread boundary, so shared-mem semantics is only a within-thread feature. It's pretty important that nothing inside a pure p-and-t single-threaded simulation be able to touch a thread-safe data structure, since that induces blocking (suspending everything in the simulation while blocked). We ask an env E to do anything that might block, because it knows how to do it without blocking the simulation.

The process that manages a lane would do locking of shared external system resources (such as stdout, etc) on behalf of fibers in that lane requesting access to such resources.

This sounds really similar to what I just said. If I reword it slightly longer: the env E inside app A process does all locking of shared resources, so all access to external resources by fibers is mediated by env E (which is under the control of app A, as opposed to being standardized by every app process).

If people found the prototype useful, I would expect the most frequent request for a "production quality" version to be sharing the vfs across multiple lanes, which would reintroduce the locks and mutexes the exclusion of which made the system worthwhile in the first place.

I was going to start there as the prototype. So it would be worthwhile as a standalone thing right away. But the part of the library that can be embedded as a single-thread p-and-t simulation would not be worthwhile right away, until fully implemented and debugged in the prototype. And then once you could embed the core simulation elsewhere, using a different env E, you'd be able to use the prototype to interact with embedded instances. (Among other things, you'd be able to do very expensive things you can't afford in some embedded versions. So debugging services can be outsourced to prototypes via connections.)

Edit: with the ability to make vfs changes on a per-process basis, via copy-on-write patching, we can arrange for part of the shared local vfs to use (green) fiber locks only, and no (OS) thread locks. So vfs synchronization can be efficient in sub-trees local to a single-threaded runtime. Scripts arranging local tool pipelines can both assume and assert this is true. Any transition to a global context with OS level locks is something the vfs aims to hide, so it's possible to operate with a uniform view in fibers, even when interacting with remote content. For example, the env hides when a transition occurs between local in-memory data structure and remote file system via protocol.

I would implement it while leaving non-shared vfs functionality intact and biasing the scheduler to keep communicating fibers within the same lane, thereby keeping the lock and mutex overhead from degrading performance too much.

Hmm, you're picturing something slightly different from my image, where there's no OS lock and mutex overhead unless you talk to the outside world through env E. However, there would be some green lock overhead from talking to fibers inside one simulation, but really cheap overhead because it would be single-threaded and have no effect on cache line synchronization, and would only rarely hit actual contention causing fiber context switch. Most fiber context switches would be triggered by hitting full and empty boundary conditions on bounded buffer capacity.
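To make that last sentence concrete, here is a toy Python sketch (all names invented) where the only fiber context switches are those forced by hitting the full and empty boundaries of a bounded green pipe; no OS locks are involved anywhere.

```python
# Hypothetical sketch: two fibers joined by a bounded green pipe, where
# the only context switches are those forced by full/empty boundaries.
from collections import deque

class GreenPipe:
    def __init__(self, capacity):
        self.buf = deque()
        self.capacity = capacity

    def full(self):
        return len(self.buf) >= self.capacity

    def empty(self):
        return not self.buf

def producer(pipe, items):
    for item in items:
        while pipe.full():
            yield                # park: buffer full, let the consumer run
        pipe.buf.append(item)

def consumer(pipe, out, expect):
    while len(out) < expect:
        while pipe.empty():
            yield                # park: buffer empty, let the producer run
        out.append(pipe.buf.popleft())

out = []
pipe = GreenPipe(capacity=2)
fibers = [producer(pipe, range(5)), consumer(pipe, out, 5)]
# Naive round-robin scheduler: step each fiber until all have finished.
while fibers:
    fibers = [f for f in fibers if next(f, StopIteration) is not StopIteration]
print(out)  # [0, 1, 2, 3, 4]
```

Everything here runs in one thread, so the "locking" is just a capacity check; contention is impossible by construction, which is the cheapness being claimed above.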

If people found the prototype useful, I would expect the second most frequent request for a "production quality" version to be allowing fibers to share access to their memory arena as do multithreaded OS processes.

That will already be there in my first prototype. I would not want shared-mem across threads, except in carefully curated circumstances like queues provided for such interaction. But it would be hard to stop people, if added by hand to fibers. (A rewrite tool would refuse to do it, but hacking that would be easy too, and also unstoppable.)

This additional inter-fiber communication would still be redundant to (or should be implemented in terms of) the vfs, so all "shared" memory structures should be simply vfs files, using the locks and mutexes introduced in the paragraph above for managing access to shared-vfs in the event that fibers desiring a "shared memory arena" were scheduled in separate lanes.

One advantage to going through vfs is potential for monitoring instrumentation for debugging. The more fibers communicate directly, the less you can observe with tools. (Except a rewriter can be told to go find things and add and/or remove small changes in monitoring api usage.) Going through the env E interface will always provide tracing access of a sort one wants with standard monitoring tools.

Since you said the words "shared memory arena", I'll tell you another irritating word about this in simulation designs: vat. Heap functionality comes from env E in a vat V interface, so different sorts of heap management can be plugged in.

Does this speculative design match fairly closely what you've been considering here?

I tried to note similarities above. I can start a new sub-thread in the other near-empty discussion, to talk about vfs stuff specifically, which will tend to run to long posts. But is this related to programming languages? This is supposed to be about programming languages. (I'm hearing Arlo Guthrie's Alice's Restaurant refrain here.) If adding a vfs to a programming language environment isn't sufficiently on topic for a PL focus, maybe we should show restraint.

On topic, I think

At this point the vfs is an implementation strategy for a stream-based programming paradigm, with facilities permitting a shared-memory-based programming paradigm as well. Neither of these things is supported very well by existing systems. There certainly are systems that do shared memory (in fact it's the default for a lot of threading models), but implementing it as shared files in a vfs would enable introspection and debugging tools that none of the existing systems can support.

Likewise command line shells with pipes and redirection, etc, can be considered to implement a decent stream-based programming paradigm, including facilities for examination and debugging - but with very high overhead in terms of process separation.

If you can do both things better, then what we're discussing is a species of runtime for a programming environment.

It sounds as though you'd be able to implement this quite rapidly if you reuse a lot of code originally written for Plan 9, an OS that looked for a while as though it might become popular in the mid-1990s.

detailed followup on lane usage questions

But I would really appreciate direct and precise definitions of new terms before launching into further discussion couched primarily in them.

Okay. But it looks like there is no further interest. I'm done if there aren't questions. Note in my experience people don't understand precise definitions, because context is missing. If you already know a definition, a short one is fine; but a new odd definition makes no sense without examples, which themselves aren't clear without context. New topics aren't suitable for discussion on LtU, given a focus on reacting to long-lived items posted elsewhere.

I've been trying, possibly with success but I can't be sure, to work out what exactly distinguishes each introduced abstraction from different more familiar ones.

This is useful to me, to predict what precondition is needed for some things to make sense. Sometimes: "When a thing is defined solely in terms of old standard things." So I should not expect an idea to make sense when presented as a chunk of relationships.

In particular, it should make docs easier to write, if I start from a presumption it's hopeless and that all I'm doing is leaving sparse clues. It may never be good enough when standard definitions are not a starting point. Next to the source code I can dump an appendix labeled: "incomplete notes of mad scientist, good luck."

I understand you to be talking about a userspace job control system, including a scheduler and inter-job communications, possibly running on several different OS-level threads simultaneously.

Yes, that's perfect, though missing the element of N fibers per thread.

Each "fiber" represents a single flow of control.

Yes, a single-threaded flow of control, in that interacting with another fiber has no race condition unless a yield can occur.

(Note that some use the word job to mean a single potentially multithreaded program and others use the same word for each individual thread, so at least one attempt at clarification hasn't worked out all that well).

I take it to mean a live run of something. The perspective is from whoever invoked a program run. I don't think any word has a global meaning; everything is context sensitive, and overloading is the norm. Some words just depend on what you're talking about.

The sequence of instructions run by a particular single-tasking CPU core within the context of a single OS time slice devoted to this system may be distributed across one or more "fibers",

Yes, that's an OS timeslice. But I have never talked about OS timeslices. In the context of describing only fiber systems, each time I said the word "timeslice" I meant something about the behavior of a fiber scheduler, when it sub-allocates time.

Usually when I use a word, it's a free variable subject only to constraints implied by context. Until context implies OS explicitly, an operating system definition cannot be assumed more than hypothesis. But if this causes consistent miscommunication, this is interesting, and probably means I seldom get across to folks who believe in the one-true-global-context. (However, I already know this.)

and some overhead processing for the userspace scheduling etc. But that sequence of instructions will be within a process devoted to a single "lane" because the "lane" is a container for fibers and for several reasons I expect an individual lane to be managed by a single-threaded process at the OS level.

A lane is a lightweight green process simulation with one or more green threads within. It's better than LWP because it's one syllable and is based on a linear metaphor along the same lines as thread and fiber as metaphors. It also suggests other things run within, and that those things can park when suspended. It doesn't sound like other words, or even parts of other words. The name also conveys identity, like streets are named and mean particular instances of ability to run inside. As a term, it has few faults beyond being weird, which I don't mind. It's actually an advantage, when it means something very specific once it becomes familiar.

Or, maybe I have that wrong and a single "lane" may be managed by a process running in more than one OS-level thread.

Fibers running in different threads would interact very poorly when the goal was to have single-threaded interaction of jobs running in one lightweight process. The main features of a lane are a) identity (so you can kill or message one), b) one or more fibers (so it actually does something as a process simulation), c) anything that simulates an effect from an OS process used by an app (so unix signals have meaning, and Erlang inbox messages have meaning, etc). A lane should act like a program with control flows within, coordinating with other programs in the usual way, using pipes, signals, messages, etc. Killing a lane should clean up all resources, since it has to simulate the process isolation cleanup of an OS process.
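A minimal Python sketch of those three lane features (all names hypothetical): identity you can address, one or more child fibers, and kill-time cleanup that simulates the process-isolation teardown of an OS process.

```python
# Hypothetical sketch of a lane: an identity you can address, one or more
# child fibers, and a kill() that simulates process-isolation cleanup.
import itertools

_lane_ids = itertools.count(1)

class Lane:
    def __init__(self):
        self.id = next(_lane_ids)    # identity: kill or message this lane
        self.fibers = []             # flows of control running inside
        self.resources = []          # cleanup callbacks (fds, pipes, ...)
        self.alive = True

    def spawn(self, gen):
        self.fibers.append(gen)

    def kill(self):
        # Simulate OS-process teardown: close every fiber and release
        # every resource, regardless of what the fibers were doing.
        for f in self.fibers:
            f.close()
        for release in self.resources:
            release()
        self.fibers.clear()
        self.alive = False

def worker():
    while True:
        yield                        # a fiber that never finishes on its own

lane = Lane()
lane.spawn(worker())
closed = []
lane.resources.append(lambda: closed.append("fd"))
lane.kill()
print(lane.alive, closed)  # False ['fd']
```

Signal and message delivery would hang off the same identity, but the cleanup discipline is the essential part: killing a lane must reclaim everything it owned, the way killing a process does.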

I'm still not sure. But it seems to me that devoting more than one OS-level thread to running a single "lane" would be a redundant complication because within each "lane" the multitasking has been described as purely cooperative.

Yes, more than one OS thread in a lane is destructively backwards when the idea is to arrange computation like nested Russian dolls; you should not be able to contain something you were supposed to be wholly inside.

So there's no motivation AFAICG to saddle lanes additionally with the mutex/lock overhead necessary for the pre-emptive multiplexing used by OS-level threads.

Only green locks when dealing with other lanes inside the same host thread.

A given program written as a client of this userspace scheduler is not necessarily single-tasking, and therefore may run multiple fibers.

Phrase "as a client" throws me a little. If you mean a program running inside the scheduler, then it's a green process simulation with green threads inside (a lane with 1 to N child fibers). It would be organized like a normal OS program running as an OS process.

I am not sure whether this is intended to mean single-threaded cooperative multitasking with a single program confined to one lane yet, but as I read your dialogs I have begun to suspect that it is not.

I usually assume multitasking is a buzzword that means "concurrency is involved" and nothing more specific. When a lane is a green process living entirely inside the scope of one OS thread, then it is single-threaded, though it is multi-fibered.

You can interact with a lane in another thread (or another OS process, or another network endpoint entirely), but only by invoking an api bottlenecked by the env E, which handles all OS level interactions, including anything of a thread or system call nature. For example, sockets won't tell you where the other end lives. You can write a lane server handling connections to other threads, other processes, or whatever. If a lane reads from its stdin, and the other side is written by an entity from another thread (or process), the env E provided by the host app hides (but transparently when you care) any non-local effect, but latency is observable.

I think that you intend for it to be possible that different fibers doing work for the same client program can run simultaneously on different CPU cores.

Depends on what client program means. If Jim launches a new OS process to get something done from the bash shell command line, the work can be done using many CPU cores by multiple threads, but any single lane will be inside just one thread. Presumably whoever wrote the algorithms involved figured out a good way to chop the tasks involved into pieces, so specializing one activity to one thread was a good idea. It might amount to running a shell inside each thread, then having a well-known shell divvy up sub-tasks so they are handled by different shells, which spawn lanes to get specific things done. It would look something like each thread simulated a unix box; you would be able to simulate an IPv6 network that actually routed things like that.

Unfortunately, the nesting model allows you to go completely nuts if you want. Restraint is a good idea. The latent flexibility is good for testing, in that you should be able to move things around and still have things work. If a lane corrupts memory, move its thread to another OS process, and maybe only run that one lane there. It should behave the same, just slower.

If, for the reasons in the above paragraph, I infer that we wish to avoid having the operation of single "lanes" implemented using multiple concurrent OS-level threads, it follows that a single client program may simultaneously be running "fibers" which are contained in two or more different "lanes." But that's starting to be a long-ish chain of inference, and I may have gone astray already.

Though hard to tease apart, that looks astray.

At any given clock cycle, it is clear that zero or one lane may be running a given fiber, and that zero or one CPU core may be running a process that manages a given lane.

Suppose the vfs has at least part of the directory tree devoted to showing thread containers as separate (instead of merged in a union view). Then it should be possible to send a command line to execute inside a daemon, such that a shell script x hosted in thread X outputs a stream piped to a shell script y running in a different thread Y. This would have all the lanes in script x running inside thread X, while all lanes in script y run in thread Y, distributing the load over more than one thread. One responsibility of the shells is to make-it-work when it comes to crossing execution contexts.
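As a rough sketch of that thread X / thread Y pipeline, assuming Python (script names invented): each script runs entirely inside its own thread, and only the pipe crossing the boundary is a thread-safe bounded queue, mirroring how the env would mediate cross-thread streams.

```python
# Hypothetical sketch: script x in thread X pipes into script y in
# thread Y; only the boundary object is thread-safe, and everything
# inside each thread stays single-threaded.
import queue
import threading

def script_x(out_q):
    # Runs entirely in thread X; its "stdout" is the cross-thread queue.
    for n in range(3):
        out_q.put(n * 2)
    out_q.put(None)                 # EOF marker

def script_y(in_q, results):
    # Runs entirely in thread Y; its "stdin" is the same queue.
    while True:
        item = in_q.get()
        if item is None:
            break
        results.append(item + 1)

pipe = queue.Queue(maxsize=2)       # bounded, like a real pipe buffer
results = []
tx = threading.Thread(target=script_x, args=(pipe,))
ty = threading.Thread(target=script_y, args=(pipe, results))
tx.start(); ty.start(); tx.join(); ty.join()
print(results)  # [1, 3, 5]
```

The bounded queue also gives the same backpressure a real pipe does: if script y falls behind, script x parks at put() until there is room.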

It is believed, though not absolutely certain, that no more than one CPU core may be devoted to the same lane at the same time.

One lane should not be monolithic so it runs on multiple threads. But you can have one lane act as conductor for an orchestra of lanes running all over the place, in multiple threads. You'd probably want a tool that specialized in arranging things like that, instead of assuming it was a magic feature of the runtime.

I am not entirely sure whether or not a "fiber" can change lanes during its lifetime,

Doesn't make sense. Just send the data to another fiber. Lanes are intended to have identity, but not fibers, so moving a fiber would not be observable. Lanes could move logically by having a broker indirection so things are less coupled.

whether lanes can be run on more than one CPU core during their lifetime, or whether multiple lanes can be managed by the same OS-level process.

A million lanes in one OS process should work, given enough memory and modest per-lane needs. It makes more sense to move who is responsible for work than to move a lane. If you start a new lane you can advertise, "Hey I changed my ID from x to y," and also support forwarding.

I keep hearing people say that!

The obligatory PL part of this post is about a Lisp interpreter in that context, which I'm used to thinking of as annoyingly fragile and limited, so getting more than toy usage out of them is hard and one doesn't bother. I thought of a toy usage that doesn't suck, and also provides a baseline for testing any later fiber-oriented replacements.

I just wanted to ask here, why you think of Lisp systems as annoyingly fragile and limited? Especially in the presence of Lisp systems like SBCL, that can produce fully compiled executables which can be used, if you compile standalone, on bare metal to implement operating systems?

This is a perception I keep running across, and while I understand that there are a lot of toy lisps out there because it's an easy thing to implement, I don't get the perception that anything with lispy syntax must therefore be a toy. Especially when held by someone well-informed, who has probably seen industrial-strength lisp systems.

As somebody who works on a lisp system, I guess I want to ask where we've failed. What does lisp not do for you that it ought to? What can I do to fix it? How ought a system communicate to people that it is not an annoyingly fragile and limited toy?

my bad, I was talking about a toy interpreter

It's an interpreter I'm used to being annoyingly fragile and limited, not Lisp per se, which I like quite a bit. While an interpreter doesn't have to be limited and fragile, it's consistent with motivations to cut corners when doing a simple version of something that is (choose one or several) quick, experimental, lightweight, casual, incomplete, unsafe, untuned, memory greedy, etc. None of those are inherent qualities in a language after one takes a lot of care to polish carefully and apply a lot of engineering rigor. A fast spit-and-bubblegum version of C would be like that too, or Forth or Smalltalk (which I also like and would not want to denigrate either).

So I'm really sorry it read that way. I cut corners when writing. (I write so many words it may not seem like it, but I don't mention one thing for long and nothing gets fleshed out unless a conversation pursues a detail.) In effect I was talking about a tautology that if you use a toy, it has the quirks of a toy. Except that I was trying to say in this particular case, any toy quality seems interestingly hidden by the black box nature of a thread-as-process, because scope for the toy to punish you is greatly restricted. I only didn't say toy Smalltalk in the example I mentioned because I'd actually re-do an old Lisp of mine instead of the one Smalltalk, because it would be faster.

What I meant to convey was: in that way of organizing threads acting as filters, it's easy to do quick and dirty experimental versions of things when trying to work out technique and algorithms in data or protocols, etc, without getting into an unresolvable bind -- specifically because loose coupling and separation stops the rough work from interfering with carefully done parts (assuming no memory stompers). Some high level language works well there, provided it doesn't act like it owns the OS process, or add 100 times the complexity and dependencies of careful parts. To live nicely in one thread without using mutable globals or system interfaces, you'd practically have to write one specific to the purpose. Because Lisp is beautiful and easy to implement, it's a good one to try first, even when a toy is intended. If C were just as easy to implement, there'd be a lot of toy C's around too. It's weird that people do hand-counting of implementations they can see to judge the nature of something, isn't it?

What does lisp not do for you that it ought to? What can I do to fix it?

I don't have any complaints, there's nothing for you to fix. But I can do some community service to help correct misperception in future description. I can have one dialog character suggest using a toy Smalltalk, before another says, no, use a real language like Lisp.

Even so, why is this such a widespread thing?

Unfortunately, even if you know that there are industrial-strength systems out there, I keep hearing the perception - from many directions, not just yours - that lisp systems (and smalltalk systems, and forth systems, and .... ) are "toys"

I know that some of this is just the kind of disparagement that we see from anyone talking about languages s/he doesn't use and may find vaguely threatening, but it's just so darn pervasive that I feel there has to be more to it than that.

Any ideas about what contributes to the negative perceptions?

need more steady sources of positive firsthand anecdote

I turned into a pessimist right on schedule a few years ago.

"The man who is a pessimist before 48 knows too much; if he is an optimist after it, he knows too little." - Mark Twain

It takes a long time to get a good handle on what happens in the minds of others, when you are not all that similar. Sometimes the principle of charity leads one to expect more rational data processing in others than actually occurs.

but it's just so darn pervasive that I feel there has to be more to it than that.

I think it's hearsay, which is awful evidence, but a lot of folks have trouble suspending judgment until facts show themselves. Though lazy and frightening, it's easy to simply automatically accept what is said often, without any evidence at all. In fact, what people say is considered evidence by many. Direct personal experience is poor competition when questions are not asked, and life can be lived as a dramatization of what appears in books, movies, and gossip among one's tribe. (Corollary: if what people said wasn't true, there would be headlines!) Kinda pessimistic, right?

Any ideas about what contributes to the negative perceptions?

There needs to be a free Lisp on hand from the command line, so direct evidence can be gathered firsthand when following a recipe found in a blog or on Stackoverflow. Absence of this prevents a steady stream of counterbalancing good bias anecdotes. Best would be 1) free Lisp already installed, 2) good recipes to try oneself, and 3) someone who gets paid a lot says Lisp is part of their success story. The last is incentive to actually go collect the firsthand data.

A demonstration of something would be good too, provided it does something that looks surreal to an audience who doesn't think it's possible. Changing the behavior of a website by sending a new Lisp function somewhere might do it, especially if what happens requires something expensive to occur, but new web pages came back right away. (You'd have to prove you weren't cheating, since that hypothesis would come up right away.)

A scripting lisp?

So if, for example, there were a lisp that was there specifically to use as an interpreter for scripts with a shebang line, and it came with a nice set of bindings to routines for handling text, file, and socket I/O in arbitrarily complicated ways, and for fetching URLs etc, and somebody released a whole lot of the sort of things that used to be done with Perl scripts for it, that might help?


Interesting. And not even too far off from what could become possible fairly soon....

I was thinking of an industrial grade free offering

Maybe someone will do a nice lisp for me, running in the same environment, since my first will be a toy. I'll be up to my eyeballs in C rewriting stuff in my copious spare time, and for a while I might be most interested in showing what happens there, in something viewing before and after code trees with cross-referencing, so a global perspective can be viewed with a browser. (Show me everywhere this field is used ... oh that's interesting; now run these scripts on anything that looks like this, and apply a filter to remove that.) I expect I'll need to do a few odd things, like preserve C comments and reformat CPS transformed code so it looks like a person wrote it, with sensible whitespace indent; which means I may go through code-style hell before it becomes clear I won't do every single thing I'm asked once I think it's good enough. (It won't be that hard to alter for another style.)

If anyone liked parts of the mix enough to use, I'd expect unintended consequences, like someone scaling an atlas namespace 1000x bigger; as soon as it looks like an app has its own file system, I'm afraid that's going to get used. :-) And a similar issue might come up with an app having its own process architecture, without needing OS permission. When you remove a natural boundary, it causes some people to spend all the new degrees of freedom, and then more, so you wish the boundary was still there. Someone focusing on the scripting language angle would be able to out-run me easily. I'd be tempted to experiment with Lisp to C compilation just because I haven't before. (And a HyperCard relative would be fun, with either browser or widget library frontend.)

Some folks may not like my toy if it varies from tradition much. An old version named ygg in the early 90s had a Smalltalk object system in addition to Scheme syntax, conventions, and operators; I liked that a lot, so a new version would be similar. Every value is an instance of some type, and every type has a class, and every class has a metaclass, straight out of the Smalltalk canon. So in addition to special forms, macros, and procedures, another sort of form would exist: message passing. When evaluating a list (a pair actually), if what appears in first position is not a special form or callable, then it's assumed a message send if the second list member is a symbol, which gets used as the selector in method lookup in the object's class. (Yes, I'm aware this resembles other things.) This organization makes it especially easy to wrap foreign functions that are method invocations on objects. But it's somewhat heretical to Schemers; a little hate from that quarter would not be surprising.
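That dispatch rule can be made concrete with a small sketch, here in Python for illustration only (the class table, `eval_form`, and the selector names are all hypothetical stand-ins, not the actual design): if the head of a form is neither a special form nor callable, and the second element is a symbol, fall through to method lookup in the receiver's class.

```python
# Toy sketch of the message-send fallback described above. Everything
# here (class table, selectors, helper names) is invented illustration.

SPECIAL_FORMS = {"quote", "if", "define", "lambda"}  # elided in sketch

# Toy "class" table: per-type dictionaries mapping selectors to methods.
CLASS_TABLE = {
    int: {"add": lambda self, other: self + other,
          "negate": lambda self: -self},
    str: {"length": lambda self: len(self)},
}

def is_symbol(x):
    return isinstance(x, str)

def eval_form(form, env):
    if not isinstance(form, list):
        # Atoms: symbols look themselves up, other values self-evaluate.
        return env.get(form, form) if is_symbol(form) else form
    head = form[0]
    if is_symbol(head) and head in SPECIAL_FORMS:
        raise NotImplementedError("special forms elided in this sketch")
    receiver = eval_form(head, env)
    if callable(receiver):
        # Ordinary application: (f a b ...)
        return receiver(*[eval_form(a, env) for a in form[1:]])
    if len(form) >= 2 and is_symbol(form[1]):
        # Message send: (receiver selector args ...)
        method = CLASS_TABLE[type(receiver)][form[1]]
        return method(receiver, *[eval_form(a, env) for a in form[2:]])
    raise TypeError("head is neither callable nor a message receiver")

assert eval_form([3, "add", 4], {}) == 7        # (3 add 4)
assert eval_form(["s", "length"], {"s": "hi"}) == 2
```

The fallback is what makes wrapping foreign method invocations easy: any value whose type appears in the table answers messages without a procedure wrapper.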

Edit: In the context of other discussion above, a language used to run a script in one daemon thread (simulating a process) would require an implementation that doesn't 1) assume it owns the process, 2) assume it owns descriptor management and polling, or 3) assume it directly manipulates the OS file system, instead of using an abstract vfs whose structure is controlled by user config that mounts directory trees suiting the user. So Josh is right below that nearly every interpreter can be used to write shebang scripts in Unix. I interpreted Ray's comment to be more about a nice set of bindings to i/o primitives already designed to cooperate as suites of related green processes. In the context of a host daemon, invoking a script doesn't launch a new OS process, so latency to run a script is how long it takes to find a builtin and spawn a fiber with an activation frame to start running its main entry point, perhaps with the script already buffered. Maybe Josh's point is that since Gnu has lisp and smalltalk dialects, and no one uses them a lot, those languages must suck; I don't think that follows though. PS: depending on script size and vfs path convention, if a script has not changed (perhaps based on a big checksum), the last invocation might cache the AST from last time so it need not be parsed again; a similar trick works with every interpreter that caches something faster to reload in the OS file system.
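The checksum-cache trick at the end might be sketched like this, with SHA-256 standing in for the "big checksum" and a hypothetical placeholder `parse` for the real reader:

```python
# Sketch of invocation caching: key the parsed AST on a strong checksum
# of the script text, so an unchanged script skips re-parsing entirely.
import hashlib

_ast_cache = {}

def parse(text):
    # Hypothetical placeholder parser: just tokenizes on whitespace.
    return text.split()

def load_script(text):
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    ast = _ast_cache.get(key)
    if ast is None:
        ast = parse(text)
        _ast_cache[key] = ast
    return ast

first = load_script("(display (+ 1 2))")
second = load_script("(display (+ 1 2))")
assert first is second   # unchanged text: same cached AST object
```

The same shape works for any interpreter that caches a faster-to-reload artifact keyed on source identity, whether the cache lives in a process or in the OS file system.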

If you're talking about languages that work for scripting,

almost every language works for scripting automatically in Unix, and just as many will work for scripting Windows if you register their file-name extensions to run with their interpreters.

So pretty much every lisp and scheme already has a command-line version that works. It's just that when people write scripts in something other than Bash, they're more likely to be Perl or Python or Ruby (which counts as a Lisp without macros) than lisp.

But GNU Guile is a Scheme that was meant as a scripting language. Gnu also has command line Smalltalk, and probably prolog and everything else.


A few thoughts: Lisp machines were once a widely used computing platform, so I don't think it's through lack of exposure. I think people just prefer other languages - for example, see the thread I started on readability and understandability.

I think it is a mistake to assume everyone's brains understand programming the same way. Some people excel at symbol manipulation, others at geospatial problems etc. The popularity of one language to another is probably due to the relative number of people that find the associated mental model easier.

There is also a perception that anything interpreted is a toy, and although there are Lisp compilers they are less well known.

Changing the opinion of the world is a Herculean task, and is probably just tilting at windmills. If you like Lisp then use it; if you don't, then use something else. If you want to write a popular language, start with what is already popular and solve the problems that affect the largest number of people.

Lisp is becoming popular feature by feature

but not the language itself.

Eventually there will be a popular programming language that has every single feature that Lisp has, but it won't be Lisp. For one thing it will have a readable syntax, the one feature that Lisp will never have.

Readable syntax

Readable syntax is still a distant target. I suspect Lisp will have that before other languages, too. So it can be included in your main prediction.

:) Hi John

I've commented on other blogs about Kernel, so I've seen you around.

Well, while there are problems with using straight ASCII text, I don't really know why readable syntax should be so hard. It shouldn't be distant. How much work is it to write a parser?

We know how to write parsers. Prolog even has a parser that lets you add in new operators as you go. Polish that up and you can have a program with a nice syntax for every occasion.

I never understood why the idea of a meta-lisp wasn't pursued. Have an s-expression based language target so that code can be generated and walked, and a nice human readable grammar that generates s-expressions.

Anyway, I'm working on a meta-language system now. Of course it will be a meta-lisp, I'd sort of forgotten that that makes it a member of the Lisp family.

The difficulty with readable

The difficulty with readable syntax isn't parsing... or at least, not directly. The difficulty is figuring out what readable means and what sort of syntax would satisfy it. I've been struggling with readability, and (more educationally) watching others struggle with it, for decades. The number of solid insights available is pretty small. Perhaps I should start a blog post (in fact, I'm pretty sure I will start one, now that my thoughts have been stirred up). The most solid results I've got myself are still in pdf form, and look rather recent (2008) only because the paper doesn't directly show the twenty years I spent writing it (here). Perhaps ironically, it's centrally concerned with an abstractly defined class of parsers — but this doesn't violate my opening remark, because the treatment is about behavior rather than implementation.

I'm too tired to be deep

but I'd say that on a syntactic level what makes program readable is:

1) Terseness. If no more is on the page than you need to understand the algorithm, then there's nothing to confuse the eye. It's better to have specialized contexts or even dialects, so that each sort of expression is as short as possible, than to have everything be overly spelled out, because too much superfluous information is confusing to the eye.

2) local lack of ambiguity in symbols and grouping. You shouldn't have to move your eyes back and forth to figure out what sort of object you're looking at, and you shouldn't have to group and count that way. If two different sorts of things look the same, you shouldn't have to move your eyes to distinguish them. For instance, the fact that parens can mean grouping OR function application in most languages doesn't confuse people, because you can figure out which meaning is used by looking at the character to the left of the opening paren: "+(" would be a grouping paren while "f(" would be an application.

3) you want lots of variation in punctuation to create local visual context. Lisp's uniformity is good for walking code; it's horrible for visual absorption.

4) As I said in 1) you need a powerful language that allows you to create appropriate context as needed. You have to be able to avoid boilerplate code that separates the meaningful code. You need to be able to avoid the awkward constructs. So macros or local grammars are a good idea. You need a language powerful enough that you can avoid whole program transformations - no one can read continuation passing style.

I suppose that monoids can be used to avoid boilerplate

and manual code transformations too.

radically different syntax

But when I take my principles and the various trade offs seriously even a little, I can end up with languages which look so unfamiliar that I'd never get anyone to try them... That's one of the reasons that instead of writing a language I'm writing a system to write languages in, so maybe I can encourage people to loosen up.

Implied Structures

See the topic I started on function readability and understandability. My thoughts are that intermediate types that require bottom up construction following the AST make programs hard to understand. Mutating a single object or collection of known type/structure is preferable to generating unknown intermediate types. Have a look at the two versions of a Cartesian product function in the readability and understandability topic, and see what you think.

I haven't looked yet but I think I understand.

Needing bottom-up construction means that the concrete syntax tree depends on things AFTER the current symbol.

However, in some cases the problem is that people who write about parsers forget that there is no reason that the abstract syntax tree should be the same as the concrete syntax tree.

So even though you can't construct left associative operators top down in the concrete syntax tree, there's no problem having a top down parser build an abstract syntax tree from left associative operators. It's a trivial one-liner.
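That one-liner can be sketched as follows (token shapes and the tuple node format are invented for illustration): the parser descends top-down, but folds each newly parsed operand into the tree on the left as it loops, so the abstract tree comes out left-associative even though the parse itself never recurses leftward.

```python
# Sketch of building a left-associative AST from a top-down parse: the
# loop folds each (operator, operand) pair around the accumulated node.

def parse_left_assoc(tokens):
    """Parse ['a','-','b','-','c'] into ('-', ('-', 'a', 'b'), 'c')."""
    it = iter(tokens)
    node = next(it)              # first operand
    for op in it:
        rhs = next(it)           # operand following the operator
        node = (op, node, rhs)   # the fold: new node wraps the old one
    return node

assert parse_left_assoc(['a', '-', 'b', '-', 'c']) == \
    ('-', ('-', 'a', 'b'), 'c')
```

The concrete parse proceeds left to right; only the abstract tree it emits is left-leaning, which is exactly the decoupling of concrete from abstract syntax trees described above.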

As for an extendable parser

From within a language, I think that being able to add operators a-la Prolog gets you most of the way there.

But for a full parser I'm partial to using definite clause grammars augmented with operators. DCGs allow mixing semantic processing with parsing maximally, at the cost that you will have to throw some of your work away on backtracking.

Is JavaScript Lisp?

Is JavaScript Lisp with a different syntax? Lisp is not for me, as I prefer languages with type systems and control over side effects. I like the look of PureScript, which is like a strict Haskell, complete with kind-polymorphism, that compiles to JavaScript.

Is JavaScript Lisp with a

Is JavaScript Lisp with a different syntax?

No, but JavaScript does, for all its faults, have an interesting deep property in common with Lisp. Both languages support facile manipulation of a first-class representation of source code, which can then be interpreted. Lisp has the luxury of choosing as its source-representation a typed(!) data structure, the S-expression. Seems to me anyone who thinks Lisp is untyped hasn't tried to program a wiki using templates; that's far more "untyped" than Lisp, as everything is converted into a text string in order to be passed into or out of a template. JavaScript necessarily uses text strings as its source-representation, because that's the source-representation of the web — which I admit I can't bring myself to condemn, as purely text-based media are inherently open, witness UNIX, TeX, wikis, html. Alas, JavaScript has an insanely complicated and exception-ridden program model (by Lisp standards), and even the means it uses to facilitate manipulation of the source-representation (based on so-called "regular expressions") is baroque. But still in all, I find the parallels with Lisp fascinating.


JSON is JavaScript's s-expression, and it can represent source code so the AST can be manipulated, although most interpreters don't support that.

JavaScript's weird bits are certainly not nice, so I was just ignoring them :-) Is an idealised JavaScript Lisp? Or, to put it another way, could Lisp m-expressions look like JavaScript?

(not? (equal? (idealize JavaScript) Lisp))

No, an idealized JavaScript isn't Lisp. Lisp has this inherent congruence between control structure and data structure, while JavaScript has an inherent impedance mismatch between the two. This is related, again, to the distinction between dynamic typing, which is often mistaken by static-typing advocates for a mere lack of eager type enforcement, and the profound lack of typing in, say, wiki markup. JavaScript control expressions have a whole lot of structure for which the source-representation (string) has no natural affinity. JSON strikes me as superficial in that respect (nothing against JSON, the deep Lisp-thing just isn't in its repertoire).


But what about m-expressions, they were to compile to s-expressions so if you can define a transformation from JS to Lisp it could be a valid m-expression syntax. JavaScript has all the mechanisms of Lisp, garbage-collection, first class functions, anonymous functions, closures, mutable variables, lexical scoping, lists (arrays), tuples (objects), and a literal syntax for both.

Yes the relationship between JS syntax and JSON is superficial, but so too would be the relationship between m-expressions and s-expressions. Didn't early m-expression proposals resemble ALGOL or Fortran?

M-expression Lisp didn't

M-expression Lisp isn't the way things went, and the character of Lisp as it did happen is deeply characterized by the consequences of its S-expression-based self-interpretation. When I wrote on my blog "Fexprs are at the heart of the rhyming scheme of Lisp", that's true of S-expression Lisp, and I don't think it's unfair to say that Lisp, unqualified, is S-expression Lisp, and "M-expression Lisp" is a fundamentally different (and less distinctively interesting) language.

As I recall, M-expression syntax didn't resemble ALGOL/FORTRAN all that closely, but it certainly didn't resemble S-expressions too much either. Then again, early S-expression syntax didn't much resemble modern S-expression syntax, either (iirc, symbols used upper-case letters, lower-case being reserved for M-expressions; symbols could contain spaces; and list elements were separated by a mid-dot, "·").

A meta lisp could be written for fortran

Really it just means having a reversible parser between a more traditional notation and s-expressions.

You certainly hope that the underlying language has the power of lisp, but the meta part is just a simple two way transform.

As I pointed out underneath,

in my sense, Prolog is a metalisp due to =..

It's a shame that none of the examples includes an

operator. =.. works fine on operator expressions, though it only explodes the top level of anything. A recursive =.. would be nicer, though I suppose there's an ambiguity problem where you need a way to distinguish form lists from other lists.

Yeah.... readable syntax.

I don't really know what to say to that. It's true, lots of people find lisp syntax hard to wrap their brains around. I don't really understand why because I'm not one of them.

The most common complaint I hear about the syntax is about the level of nesting and knowing instantly how the bits you "can see" are nested relative to each other (a complaint implying that people "can't see" the parens themselves, and don't have an auto-indenting editor handy).

I think that traditional lisp syntax does probably use more parens than it needs. One could reserve parens for actual function applications, and use other syntax separators to delimit syntactic subgroupings, and that might be clearer to many because it would reduce the amount of visually apparent nesting. So instead of writing

(let ((a 1)(b 2)(c 3))(+ a b c))

we could write instead something like

(let a 1 . b 2 . c 3 : (+ a b c))

and maybe people would find that more readable? Either way, the only defined forms you're invoking are + and let. The second style would be using the dot and colon for a new class of "syntax separators", and reserving parens just for invocations. Would that help?

The change in syntax would make macrology harder. Then again, the dialect I'm working on requires no macros because it has first-order functions. "let", for example, is a function, not a macro or syntax. Sigh, but now that I think about it, it would make writing first-order functions harder too.
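Assuming the separators are plain tokens that can never be used as variable names, the splitting behind that syntax might look like this sketch (the `split_on` helper and the token shapes are hypothetical): `(let a 1 . b 2 . c 3 : (+ a b c))` splits its tail on `:` into bindings and body, then splits the bindings on `.` into pairs.

```python
# Sketch of splitting an argument list on "syntax separator" tokens,
# as in (let a 1 . b 2 . c 3 : (+ a b c)). Helper names are invented.

def split_on(items, sep):
    """Split a flat list into groups wherever the separator appears."""
    groups, current = [], []
    for item in items:
        if item == sep:
            groups.append(current)
            current = []
        else:
            current.append(item)
    groups.append(current)
    return groups

tail = ['a', 1, '.', 'b', 2, '.', 'c', 3, ':', ['+', 'a', 'b', 'c']]
bindings_part, body_part = split_on(tail, ':')
bindings = split_on(bindings_part, '.')
assert bindings == [['a', 1], ['b', 2], ['c', 3]]
assert body_part == [['+', 'a', 'b', 'c']]
```

The paren count drops because only the two invocations (`let` and `+`) need parens; the subgroupings that would otherwise each need their own parens are recovered by the split.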

Macros in a metalisp are no different from macros in lisp

A meta-lisp just has a parser that turns well known constructs into s-expressions. You could pattern match on s-expression syntax or on the pre-transform syntax, it would be just as easy.

a+b would translate to (+ a b) so it doesn't matter which one your macro form did a syntax case on, as long as it was clear which you were using.

Prolog actually does pattern matching on structures including user defined operators, and it's easier to read than scheme macro patterns and (for other reasons) much more powerful than scheme's hygienic macro system. But it also lets you decompose structures into lists and back with the =.. operator.

So I would suggest allowing operators to be added like in prolog, but that's orthogonal to the presence of macro forms. You could even allow more complex parsing, but once again, that doesn't have to make macros any more complex.

True but not what I was talking about.

First-order functions don't require any phased compilation or transformation of code from one form to another. That's what I meant by there being no need for macrology at all. As far as I can tell so far, no need for 'quote' and 'unquote' either.

They do however, require the syntactic analysis of combinations, and that would be ... different ... with syntax separators. Then again, with the right "split" operators, it might be easier to understand than parens, although more complicated in semantics.

Let me think.... where lambda* takes its args unevaluated.....

(define let 
   (lambda* (bindings : forms)
      (define menv (newenv))
      (define blist (split bindings .))
      (map (lambda rgs (bind! (first rgs) (last rgs) menv)) blist)
      (return (last (map (lambda frm (eval frm menv)) forms)))))

should work...

I'll look at this when I get back...

But that doesn't look like Kernel. What has first class fexprs but isn't Kernel?

That said, Kernel like languages make everything simple at the cost of making optimization impossible.

Ie, you WANT there to be

a compilation phase where you specify, at the very least, what environment each symbol refers to in a text. That's less than the link I pointed to asked for, but it's really a minimum.

I updated that.

You don't want to know what object each symbol refers to, but you need to know, at compile time, where you'll look it up. Except in rare cases (code that handles equations, or code that changes code), by run time a symbol should be an index into a specific environment or a specific object, not just a string or interned string.

try editing instead of replying to yourself with short updates?

Hey Josh, would you mind editing a post you just made a minute ago to make it longer, instead of posting again? The discussion will become unusable sometime after post count hits 200 and multiple pages become the default. What you're doing is pushing the discussion head-long into extinction unnecessarily. But I'm sure it's not intentional.

It's a hobby lisp...

I've named the dialect, but I'll put off announcing the name until I'm ready to release it and claim the domain name. It's an appropriately metaphoric word from an entirely made-up spoken language. I'm confident no one else will take it first. :-)

The above snippet is pretty vague; I'm still thinking about the practicality, semantics, and potential syntax of splitting the argument list on any of a small set of syntax separators. I'm not even sure I got the above self-consistent. It's absolutely certain that if I have syntax separators they will have to be things that cannot be used as variable names ever.

I've been working on a fexpr-enabled Lisp off and on for a long time. John and I compared notes here a while ago. Its basic mission is to be a more-or-less universal translation target - a language into which programs written in a wide variety of other languages can be translated using, as far as possible, strictly local, syntax-level transformations (and a library that defines language semantics for the source language). The idea is to leave the structure of the program as unchanged as possible (leaving it human-comprehensible and its abstractions intact) while translating it.

Kernel has a different mission but also uses first-class fexprs.

The "Unevaluated" arguments are actually thunks, which means both the expression and its evaluation environment are referenced by the bound value. Normal operations on them will refer to the environment they appeared in.

You can "evaluate" meaning do the calculation and return that result (loop bodies, etc), "bind!" meaning create a binding for the expression (assignments, etc) in its own environment, or "force!" meaning replace the thunk with the result of evaluating it (lazy arguments, etc). Or a couple of other things but only if you've imported #insanity.

#insanity is a library of constructs normally to be avoided but sometimes required to achieve "misfeature compatibility" when acting as a translation target. It includes such gems as "BREAK" which gets you the expression and its environment separately so you can make random mutations in the environment, pick the expression apart to see what the syntax at the call site looked like, or (most common) evaluate the expression in a different environment. All of these operations break referential transparency. BREAK is, in fact, what you would use to demonstrate the problems that fexprs provoked in the early dynamic-environment lisps with no alpha renaming. Or, unfortunately, in a translation of useful programs written in those languages.
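A rough Python analogue of those thunk operations, using dicts as environments and Python's built-in `eval` as a stand-in for the real evaluator (all names here are invented for illustration): the key property is that each thunk carries the environment it appeared in, and "force" memoizes while plain "evaluate" does not.

```python
# Sketch of a thunk pairing an unevaluated expression with the
# environment where it appeared. "evaluate" recomputes each time;
# "force" replaces the thunk's value once and for all.

class Thunk:
    def __init__(self, expr, env):
        self.expr = expr        # unevaluated source expression
        self.env = env          # environment where it appeared
        self._forced = False
        self._value = None

    def evaluate(self):
        # Compute in the thunk's OWN environment (not the caller's).
        if self._forced:
            return self._value
        return eval(self.expr, {}, self.env)

    def force(self):
        # Memoize: later env mutations are no longer observed.
        if not self._forced:
            self._value = eval(self.expr, {}, self.env)
            self._forced = True
        return self._value

env = {"x": 1}
t = Thunk("x + 1", env)
assert t.evaluate() == 2
env["x"] = 10
assert t.evaluate() == 11   # still tied to its own (mutated) environment
t.force()
env["x"] = 100
assert t.evaluate() == 11   # forced: the binding is now a fixed value
```

A "bind!" along the lines described would just install the thunk under a name in its own environment; the evaluate/force distinction above is the part that matters for loop bodies versus lazy arguments.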

I really like the idea of metaprogramming

One limited form of metaprogramming that I'm excited about will be in a language that I will eventually write with a construct that is less than a fexpr but more than a macro.

What I'd like to play with is what I think of as a combination of closure and quote, which is to say that you capture an expression as a list/tree but also capture the environment. Ie, the symbols don't become mere atoms, they retain their link to the bindings and environments - but the expressions are not code, they can be matched on, taken apart, changed, walked etc. And of course they can also be compiled.

This isn't really anything new; it's needed by all symbolic algebra packages and constraint programming packages. But those tools are geared more toward fiddling with equations than toward programming, so their symbols live in dynamic environments, aren't real variables, and don't support metaprogramming.

Beyond that, I'm trying to write something that I hope will have the same sort of abilities that your language needs, but for a different purpose: I want to write a system for writing languages. My purpose is to have a well-documented, easy-to-use system that supports all sorts of advanced features out of the box, so that if people want to play with writing complete new languages or gluing languages together, they don't have any pain bringing up the basics; they can just focus on their problem.

Of course that may be biting off more than I can chew, but as a hobby I do have the luxury of changing my mind and building basic tools for a few years till I'm ready to sprint.

I'd be fascinated to see and play with your Lisp, you might already have solved a lot of the problems I have in my queue.

If that's your goal you'll need to know at least this.

I will confirm immediately that fexprs completely won't play nice unless you keep each argument and return expression associated with the environment where it appeared. There is no shortcut; you can't just pass the environment from the call site and use it for evaluating all arguments, because when one function passes one of its own arguments on to another, along with some expression that is native to the caller, the called function now has different arguments that appeared in different environments, and no single environment is the right environment to evaluate them all in.
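A small Python sketch of that failure mode (all names invented): two argument thunks arrive at the callee from different environments, and only per-thunk environments give the right answer.

```python
# An argument thunk is modeled as (expression, environment-it-appeared-in),
# with the expression a callable on its environment.
def make_thunk(expr, env):
    return (expr, env)

def eval_thunk(thunk):
    expr, env = thunk
    return expr(env)            # each argument uses its *own* environment

outer_env = {'x': 1}
inner_env = {'x': 100}

def callee(arg_a, arg_b):
    # Correct: evaluate each argument in the environment it appeared in.
    return eval_thunk(arg_a) + eval_thunk(arg_b)

def caller(passed_along):
    # Passes its own argument onward, together with an expression native
    # to the caller; the two thunks now carry different environments.
    native = make_thunk(lambda e: e['x'], inner_env)
    return callee(passed_along, native)

print(caller(make_thunk(lambda e: e['x'], outer_env)))  # 101
# Evaluating both arguments in any single environment would yield 2 or 200;
# no one environment is right for both.
```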

So, yes, represent actual arguments (and actual returned values) as thunks for argument and return passing, as you would in a "hygienic lazy" dialect. The primary additional requirement after that is that the called function has to get choices about how to treat (when and whether to evaluate) its arguments and the continuation has to get the same set of choices about how to treat (when and whether to evaluate) the function returns.

I'll also be entirely clear that "BREAK" really and truly is insane, semantically. Fexprs were originally built on the ability to traverse the syntax of argument expressions, but by doing so gave expressions a variable context-sensitive semantics which meant you could not deduce semantics merely by traversing the syntax of expressions. QED, legitimate uses disappear in a puff of logic until you redesign enough to use argument thunks (which capture semantics) rather than syntax as the basis of your fexprs.

Literally every possible use of BREAK is semantically wrong in a fexpr-based Lisp. It has no legitimate purpose beyond emulating the semantics of languages where it was a misfeature to begin with.

Also QUOTE/UNQUOTE are far more "interesting" than they ought to be and should normally be avoided. Fortunately, in working with a fexpr-based semantics, I have not yet found any real need for them outside the #insanity module.

I knew that but I hadn't followed the consequences

far enough to realize that you could mix variables from multiple transparent closures, so thank you for pointing that out.

The original fexpr's variable->atom->variable transformation almost made sense in Lisp's original shallow-bound dynamic environment, since at least within the fexpr, the environment would be the same as the caller, with the exception of its own parameters and locals. But it just occurred to me that it limited fexprs' usefulness because you couldn't safely keep an unevaluated list around, as it would leak away all of its meaning in other environments.

I would guess that Mathematica deals with this by just having a separate dynamic environment for equations - equation variables are not program variables.

Prolog is known for constraint programming and its case is complex
1) Attributed variables (which aren't in the ISO standard) are used for constraint systems. They have unlimited numbers of extra fields, can change their values (unlike logical variables), and can cause threads to block waiting for them to find a value consistent with their constraints - or trigger backtracking if other variables have to change value to reach a consistent state.
2) unifying unset variables can be equivalent to passing references to a variable. It's not a name, it keeps its identity.
3) Variables that have values are treated AS their values, as in Haskell - but those variables can lose their values on backtracking.

Is there an explanation of your Lisp that I can read?

Kernel-like languages make

Kernel-like languages make everything simple at the cost of making optimization impossible.

No, they don't. I'd agree they make it more challenging. Various measures in the Kernel design were taken specifically to provide invariants. In particular, it must be possible to make the scope from which an environment is visible wider than the scope from which it is mutable. This is why Kernel's $define!, unlike Scheme's define, only mutates the current environment, even if the symbol it binds is bound in an ancestor environment; and, given an environment value, it isn't possible in general to extract from it the identities of its ancestors.


If you save a reference to the parent environment, can you change it then?

Any environment you can specify

Any environment you can specify, you can mutate (assuming, of course, you also have access to the standard operatives and applicatives of the language). The hard way to do that (i.e., using primitives) is to construct an expression calling $define!, and evaluate that expression in the specified environment; when $define! modifies its current environment, its current environment is the one you've specified. The easy way to do it is to use $set!, which takes the environment to be mutated as an explicit argument; of course, $set! is a library operative whose derivation works by doing it the hard way.
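A rough Python analogy for the distinction (the class and helper names are mine, not Kernel's): `define` mutates only the environment it is handed, even when the symbol is also bound in an ancestor, and `set_in` is just `define` with the target environment made explicit.

```python
class Env(dict):
    """First-class environment: local bindings plus a parent chain for lookup."""
    def __init__(self, parent=None):
        super().__init__()
        self.parent = parent

    def lookup(self, name):
        env = self
        while env is not None:
            if name in env:
                return env[name]
            env = env.parent
        raise NameError(name)

def define(env, name, value):
    # Analogous to Kernel's $define!: mutates only the current environment,
    # even if `name` is also bound in an ancestor.
    env[name] = value

def set_in(target_env, name, value):
    # Analogous to $set!: takes the environment to be mutated as an explicit
    # argument. (Kernel derives it by evaluating a $define! expression in the
    # target environment -- "the hard way" -- but the effect is the same.)
    define(target_env, name, value)

parent = Env()
define(parent, 'x', 1)
child = Env(parent)
define(child, 'x', 2)        # shadows in child; parent's binding untouched
print(parent.lookup('x'), child.lookup('x'))  # 1 2
```

Holding a reference to `parent` is what confers the ability to mutate it; code holding only `child` can shadow bindings locally but cannot reach up and rewrite the ancestor, which is the capability reading described in the next paragraph.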

In essence, a first-class environment object is a capability; it confers ability to write local bindings, and read inherited bindings unless locally shadowed.

I just realized that I skipped over

the fact that you wrote "first order functions" because, in context, I thought you meant fexprs...

But none of this makes sense now. First order means no higher-order functions, no functions as parameters. And that doesn't let you define lambda.

And only Kernel manages without a compilation phase devoted to defining environments at compile time, at a cost of something like multiple orders of magnitude of slowness. So I don't think I understood what you meant.

First-order and first-class, at the same time.

For a while I struggled to make a language with first-class macros, so I could handle translation from things like Blender scripts and R.

When I finally figured that out, I had a language with "first-class macros" but then every application needed to know whether it was a function or a macro before it could proceed. Then I realized that the same passing discipline that enabled these "first-class macros", could also be used for any function. So I altered the function prelude, continuation, return handling, and argument passing discipline to all use the same kind of call frame. Then application etc didn't have to care whether something was a macro or a function, it could just invoke it.

By the time special cases got worked out and things got sorted, there was no longer any semantic distinction between functions and macros, and most of the 'traverse and transform' functionality of macros had become manifestly unneeded (though still possible) as there were now more straightforward ways to achieve the desired semantics.

Because they permit the definition of syntactic forms like let or loop, which deconstruct their argument expressions (WITHOUT attempting to deduce semantics from applications contained within those argument expressions), they are first-order, like macro code in most lisps. Because they exist at runtime, can be passed as arguments or returned as results or stored in and retrieved from structures, etc., they are also first-class, like functions in most lisps.

So while there are not macros as such, there are higher-order functions, the same way there are in most lisps; they are ordinary functions that exist at runtime. They accept functions as arguments and/or return functions as return values. And nothing stops them from constructing first-order functions that define or specialize syntactic forms.

There is no need for staging whatsoever. No macroexpansion time distinct from runtime, and no macros as such. Just functions that can do the things normally associated with first-order (or higher-order) code.
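A toy Python evaluator illustrating the general shape (invented names throughout, not any real system's API): a combiner flagged as "operative" receives its argument expressions unevaluated along with the caller's environment, so a syntactic form like a short-circuiting $if is an ordinary runtime value, with no macro-expansion stage anywhere.

```python
def evaluate(expr, env):
    """Tiny evaluator: strings are symbols, tuples are combinations."""
    if isinstance(expr, str):
        return env[expr]
    if isinstance(expr, tuple):
        combiner = evaluate(expr[0], env)
        if getattr(combiner, 'operative', False):
            # Operatives get raw expressions plus the caller's environment.
            return combiner(env, *expr[1:])
        # Applicatives get their arguments evaluated first, as usual.
        return combiner(*(evaluate(a, env) for a in expr[1:]))
    return expr                              # literals are self-evaluating

def dollar_if(env, test, then, alt):
    # Deconstructs its argument expressions; evaluates only the branch taken.
    return evaluate(then, env) if evaluate(test, env) else evaluate(alt, env)
dollar_if.operative = True                   # mark as an operative

env = {'$if': dollar_if, 'flag': True}
# $if is a first-class value living in the environment at runtime; the
# unchosen branch -- here a reference to an unbound symbol -- is never touched.
print(evaluate(('$if', 'flag', 1, ('undefined',)), env))  # 1
```

Since `dollar_if` is just a value, it can be passed to or returned from other functions, which is the first-order-and-first-class combination described above; a real system would of course use environment-carrying thunks rather than this bare (expression, environment) treatment.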

Is it slow? Yes. :-/ I want to do serious optimizing but haven't got there yet, and I can already tell that much of it will be 'optimistic' or 'falsifiable', meaning the unoptimized versions will have to be there as a backup in case the assumptions or constraints on which the optimized versions rely are observed to be violated during runtime. The semantics make formal analysis "rich in learning opportunities" as one of my profs used to say snarkily of extremely hairy problems.

I was confused because my understanding is that "first order"

is meant as a contrast to "higher order" when it modifies "function" or "logic".

In the case of functions it means "you can't pass a function to it, nor does it return a function".

First order logic is similarly distinguished from higher order logic, though there are even more limited versions of logic below it.

Ah. I understand.

I was using "first-order" in the sense it was used by Brown etc, meaning 'Code that manipulates or transforms other code.' As you correctly point out, that is not the meaning it has in mathematics, and I probably should not have used it that way.

Weird how these things come around, isn't it? Someone started off by describing macros as first-order specifically because they weren't around at runtime and therefore could not interact with runtime functions - but once the usage was in place it came to mean the kind of processing that macros do, or transformation/manipulation of other code. That usage spread over a fairly substantive subset of lisp implementors, and while it was *consistent* with the mathematical usage, it was no longer being used to *mean* the same thing as the mathematical usage.

And then after breaking the runtime separation that had hitherto kept it *consistent* with mathematical usage, I kept using the term because I *meant* by it 'code that manipulates or transforms other code.' Sorry, my fault. I hadn't even realized I'd done that.

So, damn. What proper epithet, if not 'first-order' ought to be used of code that manipulates or transforms other code?