archives

emulation for Erlang style processes

In The Problem With Threads Peter Van Roy wrote: The real problem is not threads as such; it is threads plus shared mutable state. To solve this problem, it's not necessary to throw away threads. It is sufficient to disallow mutable state shared between threads

The last few weeks I've been thinking about avoiding shared mutable state in new language dialects, while retaining convenient thread styles. Recently I had a fun idea which -- as an unplanned side effect -- seems to solve shared mutable state as a problem well enough for my purposes. Suddenly I told myself, hey, I'm going to get Erlang style processes out of this. Then when I googled a bit for references, I hit the LtU thread I cited above, which I'd read recently.

Note I'm unsure how to present the whole idea yet, which I'm still groveling over in a zillion pieces, like a car that's been totally disassembled. (It's hard for me to point at this pile and say, "This would be a nice car, if assembled." To you it might seem merely a pile of junk.) Note I'm not trying to build this up; I'm apologizing in advance for how lateral this likely sounds.

The disturbing part is I seem to have veered off the language path into OS territory without meaning to do so. I'm not sure. Maybe that's always what happens when you plan to embed VM systems in host operating systems. The VM world starts to sound an awful lot like an operating system by the time you start getting processes (with no shared mutable state between threads). Try to ignore this transgression into OS territory as a separate issue.

The reason I'm describing this here is because I think a lot of folks would like to get Erlang's safe and efficient use of many thousands of threads into other languages. So this is a tactic you might consider when brainstorming on the topic.

I was thinking about instruction sets for virtual machines, because recently I was going to post a story to the history department about the Smalltalk blue book, with a minor note of interest in the Smalltalk-80 VM instruction set. Then I got derailed thinking about instruction sets, and now I'm posting about Erlang processes instead.

The missing part in the middle, which is still spinning off distracting derivative ideas for me, was a notion of using Motorala 68K family 16-bit instruction sets as a VM instruction set, instead of something like the 8-bit Smalltalk-80 byte codes. One of several reasons for this idea is a rule of thumb: "don't invent things." If you consider a virtual assembler for a VM, don't invent an assembler (or, not until you succumb to exotic extensions). Instead, get a free ride on lots of existing books and tools for some existing, well-known, easy-to-understand instruction set. I happen to like m68k.

Of course, if you don't run on an actual m68k chip, this means emulation of some kind. It's the emulation that leads to a wonderful game making the Erlang process style easy. The emulation of a chip's machine code adds an indirection causing some extra layer cost, but suddenly making many other complex things easier and perhaps cheaper to do. My original motive was something like this: debug all the compilation issues under a simulation, and later worry about real machine code.

Then I noticed that sandboxing was going to be trivial. An emulator can manually memory map all space seen by the code run on the VM, so it's impossible for VM code to either read or write space unless the emulator wants to let it happen. So VM code simply can't touch other pages in the emulator's host process except those the emulator chooses to let appear in the virtual address space.

Then, it's easy to think about sharing readonly pages among separate VM processes, simply by letting the VM's emulator map pages into multiple address spaces, with suitable refcounting. And while you're at it, you can make inter process messaging cheap too, because it's not necessary to actually copy any bytes around in memory to deliver a message in the same host process; you can simply map parts of the host address space into a virtual address space of a VM process.

Also, when sychronization primitives for concurrency are done in the VM engine, then multitasking starts to look rather cooperative (read: cheap) in each VM process. And high level operations performed by a VM engine can be atomic from the perspective of each VM process.

As an example, suppose you mimic m68k machine code and utilize A-line traps so instructions beginning with hex $A are not defined by the emulated chip, and instead cause (emulated) trap exceptions to system defined high level routines. This is how the original Macintosh architecture used 68000 traps to implement the Mac toolbox in ROM very cheaply, since each A-line instruction was only 16-bits in size. Effectively you get to dispatch to an arbitrary late-binding method using only one 16-bit instruction.

On a real 68K chip, A-line traps are not cheap because because they cause exceptions with a lot of register maintenance costs as part of the dispatch to other 68K code somewhere else.

But when a 68K engine is emulated, traps are even faster than a normal dispatch in VM code, because it can be done directly in native code in the VM engine, without emulation. Complex system operations can run native in a VM engine, and atomically from the view of a VM process. If you wanted to implement a Scheme system with, say, around 250 primitive methods, each primitive could be a separate 16-bt A-line trap instruction running native in the VM engine. Since the leading $A consumes only 4 bits of an instruction, the other 12 bits are good for 4096 different traps, without needing to start a new VM engine. So you could implement a very large high level library as atomic VM engine operations this way, without needing any emulation for code in the library.

The last paragraph has little to do with Erlang processes per se. But it helps to illustrate the idea that a VM engine for a hosted language is basically the operating system from the point of the hosted language. The VM engine can use and manage space invisible to a VM process, and perform magic like high level atomic operations without a VM process knowing how or why.

None of this requires any particular language to work. It works with any high level language you want to implement on top of emulated machine code. So it seems to be a runtime idea only, and not a programming language specific idea, unless you count the emulated assembler as the language. Weirdly, you can even use different high level languages in different VM processes, without them being able to interfere with one another.

Here I'll stop suddenly because I mentioned all my main points, and I haven't finished thinking about all the details, and so lack any conclusion as yet.

Nested commits for mobile calculi: extending Join

Nested commits for mobile calculi: extending Join
by Roberto Bruni, Hernan Melgratti, Ugo Montanari

In global computing applications the availability of a mechanism for some form of committed choice can be useful, and sometimes necessary. It can conveniently handle, e.g., contract stipulation, distributed agreements, and negotiations with nested choice points to be carried out concurrently. We propose a linguistic extension of the Join calculus for programming nested commits, called Committed Join (cJoin).
It provides primitives for explicit abort, programmable compensations and interactions between ongoing negotiations. We give the operational semantics of cJoin in the reflexive CHAM style. Then we discuss its expressiveness on the basis of a few examples and of the cJoin encoding of other paradigms with similar aims but designed in different contexts, namely AKL and Zero-Safe nets.

To me the main interest lies in section 4.2, which shows an encoding of a subset of AKL in cJoin.
More references to relationships between logic/constraint and "pi-family" approaches to concurrency are welcome!

Scrap your Nameplate

Scrap your Nameplate
by James Cheney
Recent research has shown how boilerplate code, or repetitive code for traversing datatypes, can be eliminated using generic programming techniques already available within some implementations of Haskell. One particularly intractable kind of boilerplate is nameplate, or code having to do with names, name-binding, and fresh name generation. One reason for the difficulty is that operations on data structures involving names, as usually implemented, are not regular instances of standard map, fold , or zip operations. However, in nominal abstract syntax, an alternative treatment of names and binding based on swapping, operations such as alpha-equivalence, capture-avoiding substitution, and free variable set functions are much better-behaved. In this paper, we show how nominal abstract syntax techniques similar to those of FreshML can be provided as a Haskell library called FreshLib. In addition, we show how existing generic programming techniques can be used to reduce the amount of nameplate code that needs to be written for new datatypes involving names and binding to almost nothing—in short, how to scrap your nameplate.

Email/news interface

Ok, Lambda has gotten very active of late and it's getting to be a pain to keep up via the web interface. Is there any hope of getting an email or Usenet like interface for LtU, as eloquently described by Matt Hellige?

If it involves modifying Drupal source or adding a Drupal module, I'd be happy to volunteer.