Swarm now based on Scala 2.8, uses delimited continuations

A year ago there was a forum topic about Swarm, a "distributed stack based programming language".

The basic idea is that data can be distributed across multiple computers, and when a thread needs to access some data on a remote computer, it is captured as a continuation and migrated to that computer, where it is resumed. It is the ultimate expression of the maxim "it's cheaper to move the computation than to move the data".

Then, the idea is to use a "supervisor" process to monitor when the computation must jump between computers, and to adaptively rearrange the data, using some kind of clustering algorithm, so that this happens as rarely as possible.

A year ago Swarm was a custom stack-based programming language, implemented in Scala. Of course, the fact that we had to design a completely new programming language was a serious problem, but at the time it was the only way we could get portable continuations.

Swarm has come a long way since then, and after a complete redesign you can now write Swarm code in (more or less) plain Scala. This is possible due to the delimited continuations Scala compiler plugin that will be bundled with Scala 2.8.
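
For readers who haven't seen the plugin, here is a minimal sketch of the shift/reset primitives it provides (this is the plugin's generic API, not Swarm-specific code, and must be compiled with the continuations plugin enabled):

    // Minimal sketch of the Scala 2.8 delimited-continuations API (not Swarm code).
    // Requires the continuations compiler plugin, e.g. scalac -P:continuations:enable.
    import scala.util.continuations._

    object ShiftResetDemo {
      def main(args: Array[String]): Unit = {
        val result = reset {
          // Everything from the shift up to the end of this reset block is
          // reified as the function k (here: x => x + 1).
          shift { (k: Int => Int) => k(k(10)) } + 1
        }
        println(result) // 12: k(10) = 11, then k(11) = 12
      }
    }

Swarm builds on exactly this kind of captured continuation: instead of invoking it locally, it can be serialized and resumed on another machine.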

Swarm's source code is available under the LGPL, and we are actively looking for people to help us solve the many remaining problems with it.

You can find the source code, and a video explaining the project, at our Google Code page (sorry, no paper yet - the video is the best explanation).

i have not read the paper so sue me but

is there ever an issue of having to serialize big object graphs to move the computation? i get the impression there is a trade-off: when moving data, it can be easier to predict what would move, whereas with mobile (OO at least) code it can be harder, especially with dynamic reconfiguring of object graphs. thanks.

edit: ah, the original post makes it sound like the code must be the same on both machines, so the code is very static. got it.

Objects are Data...

Object graphs in Swarm are not part of the 'computation'. Basically, whenever the thread is about to send a message to an object hosted by a remote machine, Swarm packages up the thread and forwards it to the remote machine, where it can be continued.

The continuation itself must contain stack data (which will likely still contain references to the prior host, along with primitive values like integers) plus the Java bytecode for statically referenced code required by that stack. Code for object access does not need to be stored, since it will be available wherever the continuation happens to be traveling.

Use of portable, delimited continuations in Scala allows this feature to be achieved more readily as an API/library rather than a language extension: the edges of a delimited continuation allow packaging up a minimal amount of Java bytecode in combination with the necessary fragment of the stack. Without the shift/reset demarcation (and in the absence of language extensions), one would need to package up the entire stack along with all the *statically referenced* code that might be touched by that stack.
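
To make that concrete, here is a hypothetical sketch (invented names, not Swarm's actual API) of how a computation could suspend itself at the point where remote data is needed and hand the reified rest-of-the-computation to a transport function; in Swarm that closure would be serialized and resumed on the remote JVM:

    // Hypothetical sketch, not Swarm's API: suspend at a remote-access point
    // and hand the captured rest-of-the-computation to a transport function.
    import scala.util.continuations._

    object MigrationSketch {
      // Capture everything up to the enclosing reset and give it to ship.
      def migrateVia(ship: (Unit => Unit) => Unit): Unit @cps[Unit] =
        shift { (k: Unit => Unit) => ship(k) }

      def main(args: Array[String]): Unit = {
        reset {
          println("about to touch data owned by another host")
          migrateVia { k =>
            // Stand-in for: serialize k, send it over the wire, resume remotely.
            // Here we simply invoke it locally to show what the remote host runs.
            k(())
          }
          println("resumed: conceptually now running on the data's host")
        }
      }
    }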

That, at least, is my understanding. I may be off a bit... I've never used Swarm or Scala, so I can only base this on what I've read about them.

Swarm would probably do a bit better to carefully select and package up certain amounts of data along with the thread/continuation. In particular, I'd want to package up any objects that are 'owned' by the continuation (with 'ownership' being 'if I'm garbage-collected, things I own will also be garbage-collected'). This would allow continuations to usefully carry arrays, stacks, etc. without bouncing back to the original host.

But that sort of feature can be added later, without modifying Swarm code. Indeed, it might already exist, taking advantage of those techniques that inline objects into stacks and such, or be transparently part of the Scala delimited continuations. I haven't looked at Swarm much since its start, and haven't looked too much into the finer details of Scala's new continuations plugin.

Not the bytecode

I think you're wrong about the bytecode. The Swarm video makes it quite clear that no code is currently shipped around. It's currently the programmer's responsibility to make sure that all relevant classes are in the classpath of the remote JVM. Obviously this is something of a shortcoming, but it's probably manageable in reasonably sized clusters.

The Swarm video makes it

The Swarm video makes it quite clear that no code is currently shipped around. It's currently the programmer's responsibility to make sure that all relevant classes are in the classpath of the remote JVM. Obviously this is something of a shortcoming, but it's probably manageable in reasonably sized clusters.

That is pretty severe.

I guess I assumed the bytecode would be extracted. After all, a big part of the Scala 2.8 plugin for delimited continuations is modifying code compilation to suit the purpose (CPS transformation), which should also make it quite simple to snip and forward. My own design for a similar feature certainly calls for shipping both the code and its continuation/data.

Thanks for the correction, Matt.

Is it really that severe,

Is it really that severe, given the target domain? What makes you say that? I'd be surprised if that were true of, say, Erlang, but for standard web apps or more flexible Hadoop jobs...? Anyway, it doesn't seem like a fundamental challenge (and it's necessary if you want dynamic code). He made it sound more like a feature (wrt performance), but didn't substantiate that.

I might have missed it, but was there any discussion of more than one thread of execution? At the end there was speculation about TM, but it's unclear what they actually do and plan to do. They already have the continuations, so spawning a bunch of executions doesn't sound like a big stretch.

Synchronizing distributed

Synchronizing distributed code repositories won't be fun, though I suppose one could use another Java system for that purpose. The problem with using another synchronization mechanism is that it adds red tape and administrative burden to integrating the Swarm system. It will not be readily feasible to use Swarm across different security domains, in a multi-organization cloud, or between client and server, despite the object-capability-model features achievable on the JVM.

Of course, as noted this isn't a fundamental problem. I suspect in the future that either the delimited-continuations extension or the Swarm-framework itself will prove able to package up and transport the necessary code. It might be packaged directly with the continuation, or may provide a reference to acquire larger (cache-worthy) chunks of code on-demand from the remote system. Use of the JVM class-loader from a common repository is also reasonable, but less than optimal.
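
As a rough illustration of the class-loader option, standard JVM machinery already lets a node pull classes on demand from a shared location; a minimal sketch, with a made-up repository URL and class name:

    // Minimal sketch of on-demand class loading from a common repository,
    // using only standard JVM machinery. The URL and class name are invented.
    import java.net.{URL, URLClassLoader}

    object RepoClassLoading {
      def main(args: Array[String]): Unit = {
        // A shared, network-visible directory (or jar) of compiled classes.
        val repo = new URL("http://repo.example.com/swarm-classes/")
        val loader = new URLClassLoader(Array(repo), getClass.getClassLoader)

        // Resolve a class that is not on the local classpath.
        val clazz = loader.loadClass("com.example.SomeContinuationFrame")
        println("loaded " + clazz.getName + " from the shared repository")
      }
    }

In practice one would want caching and versioning of such a repository, which is where the synchronization headaches mentioned above come back in.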

...

Of course, Swarm has plenty more to do. Use of ambient authority to print to or query whatever is the most 'local' console is sort of disgusting.

The use of a cloud service

The use of a cloud service already removes most of the 'ambiance': running swarm for your EC2 job already restricts your set of machines, and EC2 VMs give you CPU isolation (the reality of memory isolation is unclear). There's still the problem that it's unclear how a swarm program would restrict access of a function/library to particular regions, but I'm not familiar enough with Scala's abilities here to say whether that's true. I'm also not sure what privileges *should* be separated to begin with; swarm makes manual scheduling and resource allocation look less insane, so these resources do need to be available to the scheduler/allocator.

Ambient Authority for Distributed Computation

It's unclear how a swarm program would restrict access of a function/library to particular regions.

I've spent a lot of time thinking about this problem for Awelon, which aims at similar features, albeit with a much greater focus on security and disruption tolerance.

My approach is to break objects into four categories, the first three of which involve some form of ambient authority:

  1. application - volatile capabilities, unique to an application-instance; these allow proxies from the language runtime back to application objects, data, and event-sources. (This level is used when the runtime is operating similarly to JavaScript or Lua - as a mechanism for extending an application.)
  2. local - conceptually persistent capabilities unique to a particular machine, local network, or 'persistent' facets of the application, such as output or prompt-access to a particular console, display on a viewport, access to events from a given keyboard, or the six-axis state of a given joystick, etc.
  3. global - capabilities common among hosts, such as random-number generators, access to estimated Greenwich Mean Time, request for a 10Hz event, access to HTTP requests on non-local IP addresses.
  4. 'awelon' or language-level - fully object-capability secure objects, which may be distributed to any host that meets the certification requirement for both itself and the objects it references

Local & global objects are provided by factories supplied by runtime plugins. This allows plugins to hook into the OS and hardware. It doesn't extend the language, though, since only the application receives the initial abstract-factory cap necessary to access the plugins. The factory in my own implementation follows an unum/singleton pattern (that is: independent requests for the same identifier, such as "mouse" or "timer:10Hz" or "file://Whatever.txt", return references that ultimately access or allow observation of the same in-memory object).
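
A hypothetical Scala sketch of that unum/singleton factory (names invented for illustration): the factory memoizes on the identifier, so independent requests for "mouse" return references to the same in-memory object.

    // Hypothetical sketch of the unum/singleton factory pattern: repeated
    // requests for the same identifier yield the same in-memory object.
    import scala.collection.mutable

    trait Capability
    final class DeviceCap(val id: String) extends Capability

    class UnumFactory {
      private val instances = mutable.Map[String, Capability]()
      def apply(id: String): Capability = synchronized {
        instances.getOrElseUpdate(id, new DeviceCap(id))
      }
    }

    object UnumFactoryDemo {
      def main(args: Array[String]): Unit = {
        val factory = new UnumFactory
        // Two requests for "mouse" return the same reference.
        println(factory("mouse") eq factory("mouse")) // true
      }
    }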

For security, objects at each tier may only 'create' caps at the same or a higher tier. Since creation at tiers 2 and 3 follows the unum pattern, it is the same as obtaining existing capabilities, which makes sense because local and global are ultimately about providing ambient authority through a capability.

Besides object-capability, distribution of caps themselves is also restricted (and audited) through platform-level certificates (composable, e.g. you could say "I trust anyone Google trusts" as a cert). This prevents accidental leakage of caps to eavesdroppers, and limits where 'private' data gets stored. The auditing also allows revocation of capabilities from a compromised platform (e.g. if an enemy captures an unmanned vehicle whose memory contains capabilities to access mission plans).

The certificate-based security domains and distribution-limits can also be applied to the problem of identifying "regions" to which ambient authority should be distributed - that is, regions larger than one host, but smaller than global distribution. A name for a 'global' ambient cap could be constructed with a certificate-based restriction on distribution to, say, machines on a particular network. Uncertified hosts wouldn't ever see the capability (nor its constructor), and objects containing a direct reference to the cap would be bound live within the security domain. One might use this to provide proprietary features and extensions.

I also categorize objects themselves: actors (normal react-to-message objects), bind-able data, observable event-sources, and promises (one-time tasks that may take a while).

First-class support for bind-able data and event-sources effectively allows subscriptions at the 'global' level (e.g. a subscription to a 10 Hz signal) to migrate intelligently, since the language runtime is aware of how to split the 'list of subscribers' among hosts. It also helps with disruption tolerance and determining when to switch to backup sources within a functional-reactive subprogram.

Swarm essentially has two of these: actors and promises. The video discusses supporting other things, like Swarm-aware lists and maps. I wouldn't be at all surprised if the Swarm libs eventually support a Swarm-aware observer pattern, since that sort of migrating-the-subscription is quite useful for both performance and disruption-tolerance.

Data's still being moved, just in the other direction

A continuation contains the local data needed by a computation. So by migrating the continuation, you're sending the local data to the machine with the remote data. Data's still being moved, just in the reverse direction from how it's usually done.

I'm trying to think of the difference between the two. I suppose there could be evidence that the data in a continuation is normally already "processed" or "reduced", so there's less of it (compared to data that you're going to read in the future). On the other hand, the data in a continuation typically includes everything that might be accessed in the future, which may be more than what is actually accessed.

Then again, maybe I'm misunderstanding something? (Sorry, I can't watch the video right now.)

[Edit: fix mistake caught by Andris (below).]

Possible typo


...which may be less than what is actually accessed...

You probably meant "more"?

Code migration with delimited continuations

I believe the technique is not that different from Persistent delimited continuations for CGI programming with nested transactions:
http://okmij.org/ftp/Computation/Continuations.html#shift-cgi

Because we are dealing with CGI, the saved continuation is certainly reinstated in a different process, since the original process will be dead by that time. The continuation could just as well be reinstated on a different processor, provided that the code is compatible. The code is serialized by reference: all code pointers in captured continuations are serialized as offsets from the beginning of the code segment. That is the standard OCaml functionality. I suppose one can do something fancier -- dynamic unlinking -- and produce an ELF object.

It is true that the amount of serialized data can be large: after all, we have to serialize all the data reachable from the captured continuation. Fortunately, at least delimcc on OCaml lets us serialize some data by reference rather than by value. Thus, all the `global' data or data that can be easily recreated on the other end can be serialized by reference. That notably decreases the size of the stored continuations. In my first demo CGI script, the size of the uncompressed stored continuation is about 400 bytes.
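
A rough JVM/Scala analogue of that "serialize by reference" trick (hypothetical names, not delimcc's mechanism): write only a small key into the stream and re-resolve it against a local registry when the continuation is reinstated on the other end.

    // Hypothetical sketch of serializing data "by reference" on the JVM:
    // only a key is written to the stream; on deserialization, readResolve
    // swaps the stub for whatever the local registry holds under that key.
    import java.io.ObjectStreamException
    import scala.collection.mutable

    object GlobalRegistry {
      private val table = mutable.Map[String, AnyRef]()
      def register(key: String, value: AnyRef): Unit = synchronized { table(key) = value }
      def lookup(key: String): AnyRef = synchronized { table(key) }
    }

    class ByReference(val key: String) extends java.io.Serializable {
      @throws(classOf[ObjectStreamException])
      private def readResolve(): AnyRef = GlobalRegistry.lookup(key)
    }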

The PGAS abstraction is also

The PGAS abstraction is also known; I don't think they're claiming novelty (continuations and scheduling go waaaay back, as noted in your work). The surprise to me is how well these abstractions seem to pair up for the primary concerns in programming over clusters, forming the new programming model: you can say which machine to continue on and which heap to allocate to. A lot more domain specialization makes sense, such as getting your data everywhere before a data-processing job, but simple DSLs/libraries with this as the backbone sound very intuitive.

Missing from this set of abstractions is one of the key benefits of MapReduce and Erlang: when dealing with 100+ machines, you'll get weird network effects. E.g., as the numbers go up, rampant machine failure occurs (see the OSDI stragglers paper from last year and a lot of recent machine-learning-based cloud research). Having a (partitioned) global heap abstraction when partitions aren't reliable loses much of the benefit enabled by MapReduce or message passing: you need some sort of support for recovering from node failure, which those alternatives can bake in. You mentioned transaction support in your message here, but I didn't see it in the slides -- transactions wrt the environment are easy with continuations, but what about the heap? Scale in size of heap and # of CPUs matters here.