managing closed worlds of symbols via alpha-renaming in loosely coupled concurrent apps

I enjoyed the discussion in Why do we need modules at all? (about Joe Armstrong's post on the same topic) more than anything else on LtU in the last few years. Several comments reminded me of something I've been thinking about for several years now. But describing it doesn't seem on topic in reply (or, I lack the objectivity to tell). The modules thread is already long, and I don't want to pollute it with junk.

This is mainly about transpiling, where you compile one source to generate another, with a goal of automated rewrite, where both syntax and complexity in the input and output are similar: compiling one language to itself, but re-organized. For example, you can use continuation passing style (CPS) to create fibers (lightweight threads) in a language without them natively, like C. Obviously this is language agnostic. You can add cooperative fibers to any language this way, from assembler to Lisp to Python, as long as you call only code similarly transformed. You're welcome to free associate about this or anything else mentioned, though. For the last year or so, almost all my thinking on the topic has been about environment structure, lightweight process management, and scripts effecting loose coupling relationships via mechanisms similar to Unix, such as piping streams, etc. Once you work out how a core kernel mechanism works, all the interesting parts are in the surrounding context, which grows in weirdly organic ways as you address sub-problems in ad hoc ways. Thematically, it's like a melange of crap sintered together, as opposed to being unified, or even vaguely determined by constraints stronger than personal taste.
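To make the CPS idea concrete, here is a rough sketch in Python (purely illustrative; the names `Scheduler`, `spawn`, and `counter` are made up for this example and not part of any real design). Each fiber is represented by its next continuation, a zero-argument function that does a little work and returns whatever should run next. A trampoline loop interleaves the steps of several fibers on one thread.

```python
from collections import deque

class Scheduler:
    """Round-robin trampoline: runs one continuation step per fiber per turn."""

    def __init__(self):
        self.ready = deque()

    def spawn(self, step):
        # A "fiber" here is just its next continuation: a zero-argument callable.
        self.ready.append(step)

    def run(self):
        while self.ready:
            step = self.ready.popleft()
            nxt = step()            # run until the fiber gives up control
            if nxt is not None:     # None means the fiber has finished
                self.ready.append(nxt)

def counter(name, n, log, i=0):
    """A loop rewritten in CPS: each step does one iteration, returns the rest."""
    def step():
        if i >= n:
            return None
        log.append((name, i))
        return counter(name, n, log, i + 1)
    return step

log = []
sched = Scheduler()
sched.spawn(counter("a", 2, log))
sched.spawn(counter("b", 3, log))
sched.run()
```

A transpiler would generate the `step` functions from ordinary-looking loops; writing them by hand, as here, only shows the shape of the output.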

Note I'm not working on this, thus far anyway, so treat anything I say as an armchair design. There's no code under way. But I probably put a couple thousand hours of thinking into it over the last seven years, and parts may have a realistic tone to them, in so far as they'd work. It started out as a train of thought beginning with "how do I make C behave like Erlang?" and grew from there. I've been trying to talk myself into wanting to code parts of it, sometime.

Several folks mentioned ideas in the other thread I want to talk about below, in the context of details in the armchair design problem. (I'll try to keep names of components to a minimum, but it gets confusing to talk about things with no names.) Most of the issues I want to touch deal with names, dependencies, modules, visibility, communication, and the like, in concurrent code where you want to keep insane levels of complexity in control if possible.




VFS vs Shared Memory

The BSD way of creating shared memory is to mmap a file in shared mode. If you give it the tmp flag the inode appears in /dev/shm as a file, and accesses memory like a file, as opposed to non tmp shared memory where the inode appears relative to the current working directory and takes up real (sparse) disk space. You can read both kinds as a file from any process that has the necessary file permissions, or mmap if you prefer. Most Unixes including Linux support this way of sharing memory. How is the VFS idea for interprocess communication different to this?
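A minimal sketch of the "mmap a file in shared mode" approach, in Python. The function name `shared_file_roundtrip` is invented for this example, and for brevity both mappings live in one process; two separate processes mapping the same file in shared mode would be expected to see the same thing. It writes through one mapping, then reads the bytes back through a second mapping and through an ordinary file read.

```python
import mmap
import os
import tempfile

def shared_file_roundtrip(payload: bytes):
    """Write through one shared mapping of a file; read it back two ways."""
    size = mmap.PAGESIZE
    if len(payload) > size:
        raise ValueError("payload larger than the mapped region")
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, size)            # a file needs a length before mapping
        writer = mmap.mmap(fd, size, access=mmap.ACCESS_WRITE)
        reader = mmap.mmap(fd, size, access=mmap.ACCESS_READ)
        try:
            writer[:len(payload)] = payload
            via_mapping = bytes(reader[:len(payload)])
            writer.flush()                # push dirty pages out to the file
            with open(path, "rb") as f:   # plain file read, like `cat`
                via_read = f.read(len(payload))
        finally:
            reader.close()
            writer.close()
    finally:
        os.close(fd)
        os.unlink(path)
    return via_mapping, via_read
```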

/dev/shm is a Linux thing.

/dev/shm is a Linux thing, not a BSD one.

But as best as I can tell, Rys seems to want a concurrent PL that allows controlled sharing and is looking at VFS/Plan9 way of doing things for inspiration. Or perhaps he wants a glue language that allows more sharing than /bin/sh or rc. Or perhaps he is designing a system with the same goal in mind. I can't quite tell but my takeaway is this: The components I see are a glueable namespace - you can only access objects you can name (or are given) but you can modify this space, object handles - file descriptors or capabilities or pointers, and object specific API or protocols, and threads of control. I won't quibble about coroutines or kernel threads or processes or even what machine they reside on. This to me is more a system design issue than PL.


I didn't realise /dev/shm was unique to Linux, but in any case you still pass a filepath to shm_open for temporary mapped shared memory, so the VFS still exists, it's just you don't seem to be able to mount it so you can use tools like 'ls' and 'cat' on it in BSD systems. /dev/shm seems to be useful for observing the state of shared memory from outside the program.

shm_open() uses a path but...

shm_open() uses a path but there is no guarantee you can use normal open(), read() or write() on it (and at least on FreeBSD read/write will fail). You're supposed to use only shm_open() on such a path. The main use is for two unrelated processes to share memory using mmap() without first having to create a backing file on the FS. And you can pass such a descriptor to another process using unix domain sendmsg/recvmsg (or dup(), to your child process). [Edit: this is not true: IIRC, it may have been implemented in 4.3BSD to deal with the old sys V IPC facility.] Plan9 cleaned up a lot of this mess.

non-blocking i/o only for fibers simulating green processes

The discussion is only relevant when green threads can be used in a PL and performance is a goal. If neither performance nor green threads are a focus, it doesn't matter.

How is the VFS idea for interprocess communication different to this?

My vfs idea is about IFC (inter fiber comm) not IPC (inter process comm), and is primarily about i/o (all i/o, not some) being async from a fiber's view. While i/o can park a fiber, I have a requirement that it not block its host thread. If you're not doing green threads as fibers, my idea is completely irrelevant. Maybe a vfs is still useful somewhere else, but for fibers it must be non-blocking.

An idea of running programs as green processes involves using fibers that only park a fiber but never block the thread containing that fiber. (Blocking another thread is okay; just not one hosting other fibers ready to run now.)
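Parking without blocking can be sketched with Python generators standing in for fibers. This is only an illustration under assumptions of my own: `Runtime`, `Mailbox`, and the convention that a fiber yields the mailbox it wants to receive from (or `None` to just give up its turn) are all invented here. When a mailbox is empty, the fiber is parked on it and the loop moves on to the next runnable fiber; the host thread never waits.

```python
from collections import deque

class Mailbox:
    def __init__(self):
        self.items = deque()      # messages not yet received
        self.waiting = deque()    # fibers parked on this mailbox

class Runtime:
    def __init__(self):
        self.ready = deque()      # (fiber, value to resume it with)

    def spawn(self, fiber):
        self.ready.append((fiber, None))

    def send(self, box, msg):
        if box.waiting:
            # Unpark one receiver and hand it the message directly.
            self.ready.append((box.waiting.popleft(), msg))
        else:
            box.items.append(msg)

    def run(self):
        while self.ready:
            fiber, value = self.ready.popleft()
            try:
                box = fiber.send(value)    # run until the fiber yields
            except StopIteration:
                continue                   # fiber finished
            if box is None:
                self.ready.append((fiber, None))      # plain cooperative yield
            elif box.items:
                self.ready.append((fiber, box.items.popleft()))
            else:
                box.waiting.append(fiber)  # park; the host thread keeps going

def consumer(box, out):
    for _ in range(2):
        msg = yield box          # parks if nothing has arrived yet
        out.append(msg)

def producer(rt, box):
    for msg in ("x", "y"):
        rt.send(box, msg)
        yield None               # stay runnable, let others run
```

Usage: spawn `consumer(box, out)` and `producer(rt, box)` on one `Runtime` and call `run()`; the consumer parks first, then gets unparked by each send.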

From a fiber, you can't block from touching memory, if non-blocking is a requirement. The memory involved can't be paged; it must be locked in memory. You can't use a memory mapped file api, if blocking can occur. If you do memory map a file, and lock it all in memory so it can't be paged, you still cannot synchronize with anyone else writing that memory, because any way to coordinate blocks a fiber's thread, which is forbidden. The sole thing you can do safely is have fibers in one thread be writers to memory shared by memory mapping, that's locked so it can't be paged, so other readers can look at the written content. Since read content is unpredictably volatile, it would only be useful for optimistic uses that can fail, like caching hints, where a fallback exists when a hint is garbled by simultaneous read and write of something spanning more than one cache line.

Since a thread hosting fibers would have a goal of avoiding cache line contention with other threads, a writing fiber would normally use code that doesn't sync cache lines, so readers would see stale cache lines. All in all it doesn't sound like a good idea, except to folks who declare victory because memory mapped files are used.

Sharing memory causes coupling. For low coupling you must send messages instead, whether as datagrams or streams, or perhaps specialized notifications. (Simulated signals are just an arriving number captured in a mask, that also unparks a fiber whose job is to poll signals and messages.) Shared access to dataflow in streams is completely different from shared memory; for instance, you can distribute if it seems a good idea.
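The simulated signal idea can be sketched in a few lines of Python. The class name `SignalMask` and the `wake` callback are assumptions for this example; `wake` stands in for whatever unparks the polling fiber. Posting a signal just sets a bit, and the poller later takes all pending numbers at once.

```python
class SignalMask:
    """Pending simulated signals, kept as bits in one integer."""

    def __init__(self, wake=lambda: None):
        self.pending = 0
        self._wake = wake          # hypothetical hook: unpark the polling fiber

    def post(self, signo):
        was_empty = self.pending == 0
        self.pending |= 1 << signo     # repeated posts of one signal collapse
        if was_empty:
            self._wake()               # only wake on the empty -> non-empty edge

    def take(self):
        """Return pending signal numbers in ascending order and clear the mask."""
        bits, self.pending = self.pending, 0
        return [n for n in range(bits.bit_length()) if (bits >> n) & 1]
```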

If you want to design a loosely coupled system around small components that stream (or message) data to one another, it would make sense to imitate one that already exists, so invention is minimized, especially if the existing system is familiar to huge numbers of available devs, who perhaps use it daily in normal development work. As soon as you consider an idea of making a green process platform pursue unix features, you notice a lot of code already exists that could be ported to green processes using non-blocking fibers. If the code you want to port works primarily with streams and messages, those can be mapped into a fiber runtime. But not if they want to use shared memory or memory mapped files. (You could do it, but it would be hard. It would take treating calls as blocking that can reach code that touches memory that blocks when touched.)

I think you know all this already. So I assume you just disagree.

Locking Memory

To be honest I hadn't considered swapping as blocking. I haven't really used swap-space for the past few years. Once machines have more than 4GB of memory it doesn't really seem useful. Much better to specifically mmap a huge file if you want to have a data-structure bigger than available memory.

Personally I much prefer message passing for parallel/concurrent programming, so I agree with you there.

What I find interesting is that if you limit yourself to message passing, you cannot tell the difference between threads, fibres and processes; they all look the same from the programmer's perspective.

So does that mean you plan on using named pipes on the VFS with some kind of tool to inspect the queues? I am not aware of any unix tools that can do that, although it does sound useful.

vfs features likely open-ended in development scope

I haven't really used swap-space for the past few years.

It's good to tell folks what happens if they use swap (while saying not to do that :-). The api of a fiber runtime can take an env object representing the environment, with a vat representing heap allocation (usually refcounted in my case), with preference to large aligned blocks sub-allocated in a fiber runtime. To the extent memory allocated will be used by fibers, an api contract should say "all mem fibers see is locked and cannot swap." Violating the contract is self-punishing, since you get a lot of runnable fibers that cannot make progress when a host thread waits on swap from memory access. To finish an async operation, you must first start it, so we want every running fiber to get as far as it can before parking, just to get async operations started to minimize latency.

Once machines have more than 4GB of memory it doesn't really seem useful. Much better to specifically mmap a huge file if you want to have a data-structure bigger than available memory.

A limited space problem comes back if a fiber runtime runs in a smallish partition of the address space (whatever the vat permits), then operates on data structures bigger than a fiber runtime's total footprint. User space paging is not hard to write. The blocking aspect can occur in threads devoted to blocking calls, while threads hosting fibers get each fiber to run as far as it can until parking. If fibers want to initiate far more blocking operations than there are threads to block (even after consolidation), the requests end up in queues, but all fibers can still run until they park. An unimpeded fiber should be able to keep going, no matter how many other fibers are parked. Only fibers throttled by an i/o bandwidth bottleneck should make slower progress.
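Here is a hedged sketch of that split, using Python generators as fibers and a small thread pool as the threads devoted to blocking calls. The function name `run_fibers` and the convention that a fiber yields a zero-argument callable for each blocking operation are assumptions of this example, and it assumes the blocking calls do not raise. The hosting thread only waits when no fiber is runnable; extra requests beyond `max_blockers` simply queue inside the pool.

```python
import queue
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor

def run_fibers(fibers, max_blockers=2):
    ready = deque((f, None) for f in fibers)
    done = queue.Queue()     # completions posted by the blocking threads
    parked = 0               # fibers waiting on a blocking call
    with ThreadPoolExecutor(max_workers=max_blockers) as pool:
        while ready or parked:
            # Move finished calls back to the run queue. Wait here only
            # when there is nothing else this thread could be running.
            while parked:
                try:
                    item = done.get(block=not ready)
                except queue.Empty:
                    break
                ready.append(item)
                parked -= 1
            fiber, value = ready.popleft()
            try:
                blocking_call = fiber.send(value)   # run until it parks
            except StopIteration:
                continue
            parked += 1
            future = pool.submit(blocking_call)
            # Assumes the call succeeds; a real runtime would carry errors too.
            future.add_done_callback(
                lambda fu, fb=fiber: done.put((fb, fu.result())))

def sleeper(name, seconds, log):
    log.append(name + " start")
    yield lambda: time.sleep(seconds)   # parks this fiber, not the host thread
    log.append(name + " done")
```

Running `run_fibers([sleeper("slow", 0.05, log), sleeper("fast", 0.0, log)])` starts both fibers before either sleep finishes, which is the "get every async operation started" property.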

What I find interesting is that if you limit yourself to message passing, you cannot tell the difference between threads, fibres and processes; they all look the same from the programmer's perspective.

Yes, such uniformity is a primary benefit. Leverage always seems to come from making something generic, so a common part can be applied to different things. Adopting local perspective of one fiber, it looks like any other party is also a fiber, but it could be anything. A developer in charge of arranging global dataflow relationships ought to aim for locality for performance, but the pieces should keep working even when messages are not local. (So it can be used as a diagnostic tool: vary things that ought to be the same, to see if it reveals a surprise.) Ease in making changes should be a main engineering reward.

So does that mean you plan on using named pipes on the VFS with some kind of tool to inspect the queues? I am not aware of any unix tools that can do that, although it does sound useful.

The short answer is imitating any reasonable-sounding semantics is feasible. Anything hand-rolled is easy to inspect, while parts delegated to a third party (like the OS) are only as easy to inspect as tools given allow. Using actual OS named pipes would work; but the same api could be used for simulations via local fibers or threads. It might depend on the vfs path. If you try to create a named pipe in a sub-tree inside an OS file system mount, maybe it's an OS named pipe. Otherwise it could be a named pipe simulation when the path is within an in-memory part of the directory tree.
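Choosing the pipe implementation by vfs path could look something like the following Python sketch. Everything named here (`MountTable`, the backend labels "os" and "memory") is hypothetical. The lookup is a longest-prefix match on whole path components, so a path under an OS mount gets the OS backend and everything else falls through to the in-memory default at the root.

```python
import posixpath

class MountTable:
    def __init__(self):
        self.mounts = {"/": "memory"}    # vfs path prefix -> backend label

    @staticmethod
    def _clean(path):
        if not path.startswith("/"):
            raise ValueError("vfs paths must be absolute")
        # Collapse repeated leading slashes, "." and ".." components.
        return posixpath.normpath("/" + path.lstrip("/"))

    def mount(self, prefix, backend):
        self.mounts[self._clean(prefix)] = backend

    def backend_for(self, path):
        """Longest-prefix match over mount points, by whole components."""
        path = self._clean(path)
        while path not in self.mounts:
            path = posixpath.dirname(path)   # ends at "/", always mounted
        return self.mounts[path]

def named_pipe_kind(table, path):
    """Say which kind of named pipe a create request at `path` would get."""
    kinds = {"os": "OS named pipe", "memory": "in-memory pipe simulation"}
    return kinds[table.backend_for(path)]
```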

I think about the vfs a lot lately. So far the best inspiration is in Plan9 docs, with many inventive vfs tactics under the same strategy umbrella of uniformity. Two different sorts of ideas are available for imitation (if and when it seems useful). One class of idea is how protocol is organized, so you can mount the root of a "file system", which is any server using the protocol. To a fiber, a server is just the other end of an async connection: could be a fiber, thread, or process, etc. Another class of idea is what happens when you write and read certain paths in the namespace; this second area is somewhat stranger and more inventive in Plan9. Instead of having a single request/reply transaction, some Plan9 conventions have the odd organization of writing a request, then reading multiple replies, sometimes as alternative choices. (For example, write a request "how do I connect to x?" with separate reads returning "a) connect using a", "b) connect using b", ... etc, for as many alternatives as exist.) That implies a strange state management scheme I'm not sure is worth imitating; but experiments can always be isolated to longer meaning-narrowed namespace paths. I might prefer adding a wholly new write-read operation that's a stateless request/reply transaction, available on some paths; it would be easier to implement.

For a moment let's ignore the idea of vfs as server via protocol. In one prototype, the environment can implement the plugin for vfs as a primarily in-memory data structure, except for nodes that say "I mounted this path in the OS file system here", or else "I mounted this remote server root here with this connection". Here the vfs is just a data structure. But it can be served under the common vfs protocol, if a thread or fiber listens on a port serving the vfs protocol. That vfs server would just turn protocol traffic into data structure operations. The hairy parts would just be maintaining session state, like client-to-server-ID bindings not yet released. If the vfs is a model, then a server session is a kind of view, so a basic vfs model needn't know what kinds of complex things a server has to do when keeping promises to clients. (If the server manages to over-commit itself and get killed, this needn't make a vfs model fail locally; it might just disconnect clients. Service can come back after server restart. I guess session state can be replayed if journaled.)
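The model/view split might be sketched like this in Python. The message shapes (`op`, `id`, `path`) are invented for the example and are not the real Plan9 protocol; they only echo its general flavor of binding a client-chosen id to a node and releasing it later. The model is a plain tree, and all per-client state lives in the session, so dropping a session leaves the model untouched.

```python
class VfsModel:
    """In-memory tree: directories are dicts, files are bytes."""

    def __init__(self, root=None):
        self.root = {} if root is None else root

    def lookup(self, parts):
        node = self.root
        for name in parts:
            node = node[name]      # KeyError if the path does not exist
        return node

class Session:
    """Per-client view: holds id-to-node bindings until they are released."""

    def __init__(self, model):
        self.model = model
        self.bound = {}

    def handle(self, msg):
        # Turn one protocol message into one data structure operation.
        op = msg["op"]
        try:
            if op == "walk":
                self.bound[msg["id"]] = self.model.lookup(msg["path"])
                return {"ok": True}
            if op == "read":
                return {"ok": True, "data": self.bound[msg["id"]]}
            if op == "release":
                del self.bound[msg["id"]]
                return {"ok": True}
        except KeyError:
            return {"ok": False, "error": "no such path or id"}
        return {"ok": False, "error": "unknown op"}
```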

I'd be inclined to implement features only as needed. If I port a tool and it uses named pipes, then I need named pipes if I don't rewrite. Since I'm not really interested in adding features I can't debug, ability to inspect is something I'd want up front. It should be easy to answer questions like, "At this moment in time, what do you know about this particular thing?" Adding pub/sub support has a kind of enterprisey production flavor -- not much fun as itch scratching.

When even considering the OS to access a filesystem....

If you go through the Operating System, AT ALL, for filesystem calls, you have to consider security and file permissions.

You knew that, of course... Just a reminder.

must prevent acting as remote proxy for low privilege users

A PL angle is likely in here somewhere. :-) Security and permission seldom seem a focus in languages, but maybe they should be. So far I treat it as an environment issue, under the control of an app hosting the various bits of machinery. But any mistake would add a fiber runtime to the vector of vulnerabilities. It used to worry me a lot, when I realized my plan enabled perfectly horrific distributed worms, if a user could be persuaded to turn off certain safety mechanisms. I can't stop someone from altering free software to build a peer-to-peer tool with a trivial means of tweaking config to permit very unsafe things.

I told that to a coworker a couple years ago, explaining it really bothered me. He said it wasn't my problem: safety was the responsibility of a dev extending any tool. That was interesting, because he had once occupied a security-centric role, and had relevant background for an informed opinion. I still worry about enabling evil tools though.

Current in-house (work) use of green fibers occurs only along a dataflow path unreachable by the outside world, other than within processing completely under app control. Connections go to the network stack; it defers processing to an embedded async engine. No side channels exist. If I ever used a new engine with greatly expanded flexibility, even debug connections would go through channels heavily controlled. Even so, debug ports would not be used unless root access commands turned on such features from console.

If you go through the Operating System, AT ALL, for filesystem calls, you have to consider security and file permissions.

Yes, that will be a source of great complexity as a toy grows into a production tool. I can arrange that config info constrains what is allowed, and how many resources are allowed, but stopping devs from removing limits on purpose does not seem feasible. If someone made a consumer grade app, there would probably be an evil script a user could be talked into running, that enabled very bad ideas for config. However, if runtime state is reified and inspectable, maybe another script can be run to go audit, then report (and perhaps fix) anything that looks out of line. Seems like an ongoing war scenario (arms race).

The same coworker wanted me to implement a unix user and permissions model, then also enforce it. If all fibers in one process can write on each other, how can I enforce permissions? I could do it cooperatively, to catch mistakes, but not deliberate violations. Anyway, with a vfs I can probably follow rules correctly, cooperatively, assuming good faith. Edit: by cooperative I mean "tell me if I broke my own rules", as near as the system can tell, with false negatives (no report) when mistakes are missed.

My operating model is that one app is a single security domain, inside one process address space, so nominal sub-domains within are convenient window-dressing only, and of low priority in an early prototype. A C dev who starts out with one monolithic soup of processing in one address space will still have that situation after using a fiber system instead, but it will be better organized, and so easier to understand. If I simulate users it would be advisory, the way declaring variables private in C++ is advisory.
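An advisory check in the "tell me if I broke my own rules" sense could be as small as the Python sketch below. The names `mode_allows` and `Auditor` are made up, and it deliberately ignores special cases such as a root user. It records violations instead of preventing them, since nothing inside one address space can actually stop a fiber from going around it.

```python
import stat

_BITS = {
    "r": (stat.S_IRUSR, stat.S_IRGRP, stat.S_IROTH),
    "w": (stat.S_IWUSR, stat.S_IWGRP, stat.S_IWOTH),
    "x": (stat.S_IXUSR, stat.S_IXGRP, stat.S_IXOTH),
}

def mode_allows(mode, owner_uid, owner_gid, uid, gids, want):
    """Unix-style check: the first matching class (user, group, other) decides."""
    user_bit, group_bit, other_bit = _BITS[want]
    if uid == owner_uid:
        return bool(mode & user_bit)
    if owner_gid in gids:
        return bool(mode & group_bit)
    return bool(mode & other_bit)

class Auditor:
    """Advisory only: notes rule breaks, never blocks the operation."""

    def __init__(self):
        self.violations = []

    def note_access(self, path, mode, owner_uid, owner_gid, uid, gids, want):
        ok = mode_allows(mode, owner_uid, owner_gid, uid, gids, want)
        if not ok:
            self.violations.append((path, uid, want))
        return ok
```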

First versions of things should minimize features. But a very early vfs tactic will involve mounting a tarball as a file system with version-numbered files (like VAX/VMS), and tar format captures user and permission information, so I can't ignore it without risking inconsistency bugs. My early focus would only be on keeping information straight.

If you embed a library in your process, you're obligated to understand everything it does, since it will do things in your name. The model of fiber runtime I have in mind interacts with an OS only via an env object explicitly provided by the host app. A developer building an executable is responsible for auditing what passes through that api. And if anything risky can occur, any green processes permitted to run inside the engine must be audited for acceptable behavior. (Having them run scripts passed in from connections is not a good idea, without authentication basically equivalent to granting root access.) Sandboxing is feasible, both through separate sandbox process address spaces and limited vfs namespaces that show only safe programs. You'd still need to audit for messages seeking better access, to see if privilege escalation is possible. (I'm not allowed to do this, but maybe you'll do it for me?)

Problems with mere unix permissions? Try ADEQUATE permissions.

The same coworker wanted me to implement a unix user and permissions model, then also enforce it. If all fibers in one process can write on each other, how can I enforce permissions? I could do it cooperatively, to catch mistakes, but not deliberate violations. Anyway, with a vfs I can probably follow rules correctly, cooperatively, assuming good faith.

In the first place, enforcing permissions is NEVER something that can be done in a purely cooperative model. Access controls must be mandatory, or they effectively do not exist. If any program, under any circumstances, can choose not to consult the permissions and just directly read or write something, the system fails with no further review required.

My own conclusion recently is that even the unix-style permissions control and ACL is fundamentally inadequate. It protects against a security threat model that is no longer the primary security problem.

The Unix access control model assumes that what must be controlled for security is users - to simplify matters, it seeks to deny most users the privileges they would need to subvert the Operating System. Therefore the permissions model in Unix is heavily based on privileges being associated with the user account.

But that's only half the problem. The primary risk to the system these days is not due to users deliberately trying to subvert the system; it's from software downloaded from unknown sources, doing things the users don't know it's doing, using whatever privileges the user running it happens to have.

While the unix permissions model protects the Operating System from random users fooled into running evil software, it doesn't protect against root being fooled into trusting evil software and it doesn't protect the assets of the users from whatever evil software they accidentally trust.

So, I think the system security model should be extended to include not just world/usergroup/user but also allprogs/proggroup/program permissions, so that someone could make his "accounting" files entirely offlimits to all programs that he has not made members of the "accounting" group in his own account, or even offlimits to everything except gnucash if he only runs one accounting program. Or a user could make everything in his "bitcoin" directory belong to the "bitcoin" proggroup and that way some MMO game whose developer wanted to get rich couldn't do it by stealing his bitcoin wallet.
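As a sketch of how such a rule might be evaluated (Python; the `Rule` shape and the function name `allowed` are assumptions, not a description of any existing system), access requires both the user side and the program side to match. This only shows the decision; it says nothing about how the decision would be made mandatory.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    users: frozenset       # user names or user-group names allowed
    programs: frozenset    # program names or program-group names allowed

def allowed(rule, user, user_groups, program, program_groups):
    """Both the user check and the program check must pass."""
    user_ok = user in rule.users or bool(rule.users & set(user_groups))
    prog_ok = (program in rule.programs
               or bool(rule.programs & set(program_groups)))
    return user_ok and prog_ok

# Hypothetical rule for a wallet directory: one user, one program group.
wallet = Rule(users=frozenset({"alice"}), programs=frozenset({"bitcoin"}))
```

With this rule, a program in the "bitcoin" proggroup run by alice gets in, while a game run by the same user does not, which is the case the user-only model misses.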

And, yes, permissions should extend to the internet as well, mounted as a virtual subdirectory. Programs don't get write access to it unless you *want* them to be able to send information out over the Internet, and they don't get read privileges unless you *want* them to be able to hear the answer. And if you have, say, a secondlife client, it doesn't get access to anything except the subdomain under linden labs, because really why should it be talking to google analytics?

I mention this because you're talking about implementing a vfs. I've been thinking about implementing filesystems lately too, but mostly because existing file systems have a permissions model I no longer believe in.

Have you looked at capabilities?

Have you looked at capabilities or OSes that implemented them, such as the Amoeba OS, or KeyKOS and its follow-ons? FreeBSD now has something called `capsicum' which implements "practical capabilities". You may wish to check out Ivan Godard's "security" talk on (he is the main architect of the as yet unrealized Mill processor). I think you have to push a security "kernel" mechanism down into the hardware.

There are various solutions that mitigate against root or for that matter any user running "evil software" but of course it is an arms race.

is there a model for adequate permissions?

I reply here to encourage, because I thought that was interesting, but my points are little more than free association. Your theme is hard for me to address. Let's see, if there's another security model you think works well in a vfs, I might add that one too (in addition to unix permissions, as opposed to instead of them). I think of security as saying "no" according to rules intended to maintain domain separation guarantees. In addition to saying no for failing unix permissions, I can say no for other reasons too. I would want a simple model though.

Per-file tar header blocks don't have much extra room, so round-tripping to a tarball would lose something, unless I put meta-info somewhere else, like associated meta-files. (Yes, tar files are a fairly tangential subject, so this paragraph is lightweight and easily ignored.) I could also use an alternative binary header packing things smaller; tar headers are text-oriented with numbers in ascii. This would allow me to add other permissions, checksums, and the like. When a new header block format is exactly the same size as an old tar file header block, I can convert on the fly to tar format without changing file size, or offsets of existing content. Maybe some part of this inspires a new minor idea.
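To show how text-oriented the header is, this Python sketch builds a ustar header block with the standard `tarfile` module and slices the numeric fields back out. The helper names `ustar_header` and `text_fields` are mine; the field offsets are the usual ustar layout (mode at 100, uid at 108, gid at 116, size at 124). Each small number occupies 8 or 12 bytes of ascii octal, which is the room a binary packing could reclaim.

```python
import tarfile

def ustar_header(name, mode, uid, gid, size):
    """Build one 512-byte ustar header block for a regular file."""
    info = tarfile.TarInfo(name)
    info.mode, info.uid, info.gid, info.size = mode, uid, gid, size
    return info.tobuf(format=tarfile.USTAR_FORMAT)

def text_fields(header):
    """Parse the ascii-octal numeric fields out of a header block."""
    def octal(lo, hi):
        return int(header[lo:hi].rstrip(b"\0 ").decode("ascii"), 8)
    return {
        "mode": octal(100, 108),
        "uid": octal(108, 116),
        "gid": octal(116, 124),
        "size": octal(124, 136),
    }
```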

If you do a vfs as an in-memory data structure, you might consider btree format (but not with large blocks suitable for disk i/o) just because copy-on-write for trees is easy and efficient, if you want to support per process directory revisions, without large space costs. I worked on btrees a bit in the 90s; I could write a file system in a file format that's pretty efficient (porting an old one), but I probably won't because my time is over-subscribed. (And if you make a storage system, competitors are quite aggressive and generally pull the full-on jerk routine, which is a time sink.) The CoW part I can describe in a couple paragraphs or so, if it interests.
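The copy-on-write part can be illustrated with path copying on a plain tree. In this Python sketch nested dicts stand in for btree nodes (an assumption made for brevity; a real btree would copy one node per level instead of one dict per directory), and `cow_set` is a name invented here. Only the nodes on the path to the change are copied, so an old root and a new root share every untouched subtree.

```python
def cow_set(root, parts, value):
    """Return a new root with `parts` bound to `value`; `root` is not modified.

    Assumes every intermediate node on the path is a dict (a directory).
    """
    if not parts:
        return value
    head, rest = parts[0], parts[1:]
    new_root = dict(root)                       # shallow copy of this node only
    new_root[head] = cow_set(root.get(head, {}), rest, value)
    return new_root

v1 = {"bin": {"ls": b"1"}, "etc": {"motd": b"a"}}
v2 = cow_set(v1, ("etc", "motd"), b"b")         # a per-process revision
```

Here `v2` sees the new `motd` while `v1` still sees the old one, and both share the same `bin` subtree object, which is why the space cost of a revision is small.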

This is mainly for Bakul. I loved capabilities in the late 80s and thought it would be a great way to manage some game systems then. I hear wishful thinking when some folks talk about capabilities. It's hard to prevent stealing or forging them without pervasively correct security. And it seems to require some cryptography if you want strength in depth. I like to add a very weak form of capabilities to interfaces, without many bits, so little more than detecting casual mistakes is supported. Basically, it's easy to demand a number associated with an entity if you want to manipulate it, when that number should already be known. It's too easy to forge when not many bits are involved, but it doesn't take many bits to catch accidental cross domain references with high probability, so the root cause can be found and removed. This paragraph is more to make conversation, since I don't have a big point.
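A very weak capability of that kind might look like this Python sketch. `HandleTable` and the 8-bit tag width are assumptions for illustration. A handle is an index plus a small random tag; resolving it fails when the tag does not match, which catches a stray handle from another table with high probability but would not stop anyone who tries all 256 tags.

```python
import random

class HandleTable:
    TAG_BITS = 8    # few bits: catches accidents, not deliberate forgery

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._slots = {}     # index -> (tag, object)
        self._next = 0

    def register(self, obj):
        index, self._next = self._next, self._next + 1
        tag = self._rng.getrandbits(self.TAG_BITS)
        self._slots[index] = (tag, obj)
        return (index, tag)

    def resolve(self, handle):
        index, tag = handle
        slot = self._slots.get(index)
        if slot is None or slot[0] != tag:
            raise LookupError("stale or foreign handle")
        return slot[1]
```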

I'm not sure what to do about making users safe from executing code they find lying around the internet. (If you open an archive in the low transcend, you might find a self-booting evil, if you're not careful.) The main problem is they're willing to run it. I feel like an idiot when I download, say, gnu software and build it. What if autoconf rapes and pillages my machine? Nah, that would never happen. See, that's why we do it: I won't be burned this time. Maybe non-technical folks have no fear at all. If you have a specific scheme for improving things slightly, via means that are enforceable (because software can't ignore your neighborhood watch) I'd like to hear more to think it over.

If any program, under any

If any program, under any circumstances, can choose not to consult the permissions and just directly read or write something

This is only true when the "something" you are reading or writing is not itself an authorization token of some sort, like a capability.

For instance, suppose programs could not amplify strings into file descriptors, and thus the only way to obtain a file descriptor was to be given one. Consulting the permissions of the file is now irrelevant, and directly reading and writing the file descriptor is explicitly authorized.

So, I think the system security model should be extended to include not just world/usergroup/user but also allprogs/proggroup/program permissions, so that someone could make his "accounting" files entirely offlimits to all programs that he has not made members of the "accounting" group in his own account, or even offlimits to everything except gnucash if he only runs one accounting program.

Security ought to be considerably more flexible and simpler than you describe. If you want a permissions-based system to think about this, I suggest reading up on Polaris by the capability security folks. They turned ordinary Windows into a virus-safe platform with very few changes.

ring model hierarchy from polarizing data tainting?

Page five of the Polaris paper clearly states how the plan works, by running code in a user account restricted to the least authority needed. A sidebar (p.8) defines permission as written ACL rule policy, and defines authority as the set of actions a process can cause to happen. The design aims for a principle of least privilege, but (according to the paper) privilege is ambiguous, while permission and authority are more specific. Authority results from possible behavior of parties operating under respective permissions.

The idea is to taint code and data with the polarity of its source, under a polarized light metaphor, so propagation is constrained by polarized filters, to stop cross domain access when not allowed by permissions. A very low permission user account should exist for untrusted sources, so launching any OS process runs only under that restricted user account, with files created only in directories owned by that restricted user. So the OS enforces limits on authority when OS resources are involved, like processes or files.

Early parts of the paper are about how this looks in GUI apps, so an app user can see what happens and assign pet names to virtual users with less permission, like browsers that should have less access. We can ignore that, and consider what effect this sort of organization has on vfs design and/or tactics.

In an OS file system mount in a vfs, the path for a low permission restricted user should probably be in actual directories owned by that user, so the OS view looks right when a vfs is not present.

If messages come from restricted user account code or data, we should probably taint such messages with the right polarity, so operations only occur using least privilege. Maybe this means you find which part of several things has least privilege, then use the vfs view for the user account with least privilege.

Suppose we want to simulate green processes for a restricted user account in the same address space where we also run code for root (maximum permissions). Can we show a restricted user a world view that's as stark and barren as if run in an OS process by itself, with nothing visible except what is permitted?

Instead of one global vfs, a runtime can have one per class of user, then use the right one matching both code (from green processes) and data (from messages and vfs files). This might mean that completely different server processes handle low privilege requests. You wouldn't have to figure out how a high privilege server copes with low privilege data, if it could not even receive it. Maybe the runtime partitions everything in privilege rings, and there's no way to reach a better ring. (You could run low privilege fibers in a low privilege thread, and give it worse quality of service when this seems a good idea.)

Hmm, you wouldn't want to stop a high permission admin from processing part of a system with low permission. So low permission data itself cannot taint permission as low. Maybe polarization taint rules for code and data are different. The big problem occurs when high ring code executes a command where part of code or args comes from low ring. (What file should I delete? Here's a path from low ring buffer.) It doesn't seem possible to ensure low ring taint is preserved, though, especially if a high ring copies low ring buffers to high ring buffers; maybe that can be forbidden. Maybe the runtime refuses low to high copies; if low and high use different vats, with large super-block aligned addresses, you'd be able to test ring membership with a single hashmap lookup on a masked pointer address. It sounds do-able, but a total pain, and it makes everything slower to perform checks.
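The masked-pointer test can be sketched in Python with integers standing in for addresses. The 1 MiB super-block size, the `RingMap` name, and the convention that a larger ring number means more privilege are all assumptions of this example. Each vat registers its aligned super-blocks with a ring, and a copy is refused when the source ring is lower than the destination ring or when either address is unknown.

```python
SUPERBLOCK_BITS = 20                          # assumed: 1 MiB aligned super-blocks
OFFSET_MASK = (1 << SUPERBLOCK_BITS) - 1      # low bits: offset inside a block

class RingMap:
    def __init__(self):
        self._ring_of_block = {}              # super-block base address -> ring

    def add_superblock(self, base, ring):
        if base & OFFSET_MASK:
            raise ValueError("super-block base is not aligned")
        self._ring_of_block[base] = ring

    def ring_of(self, addr):
        # One mask plus one hash lookup, as described above.
        return self._ring_of_block.get(addr & ~OFFSET_MASK)

    def may_copy(self, src_addr, dst_addr):
        """Allow a copy only if it does not move low ring data upward."""
        src, dst = self.ring_of(src_addr), self.ring_of(dst_addr)
        if src is None or dst is None:
            return False                      # unknown memory: refuse
        return src >= dst
```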

That would be a good reason to use different OS processes: one for high-ring-only and another for a mixture of permission rings, which runs more slowly because a high ring has to protect itself from low ring taint effects to be safe. Maybe the worst part would be having to talk about it in public forums with people who want to argue about scenarios.

Maybe the low ring OS process is not a mixture, and is all low, and merely executes high commands from the high ring process. A lower ring does what a higher ring says, but not the reverse, and there's no mixing of low ring data moving upward to a high ring, because it doesn't happen. Then each process is equally fast; you just drop lower ring messages. Okay, I'm done. That's all the brainstorming I'll give it.