How to ensure safety when millions of users inject scripts into a running system?

Although my motivation is financial systems, I use the example of computer games to make the discussion appeal to a wider audience.

Assume there is an online video game which has tens of millions of users. The game is some sort of strategy game with armies, battles, shootouts, etc. Although most players control everything manually, the game provides a way for players to script their actions. For example, a script could tell three soldiers to spread out in a triangle whenever they see a certain monster. Another script could have the workers automatically produce new soldiers as old ones die out.

In this scenario, it makes sense to provide a proprietary, domain-specific scripting language that lets players cause all kinds of mayhem with virtual guns, but doesn't allow formatting the local disk drive.

As the game becomes more popular, some players demand more features in the scripting language. They wish to run Monte Carlo simulations to figure out the possible outcomes of an attack. They wish to buy third-party scripts and integrate them into their own scripts, etc.

The game writers decide that they should create a general purpose language. They write their game engine in this general purpose language and accept scripts in the same general purpose language. This allows a community to grow around the language, with nice tools, libraries, compiler/interpreter optimizations, etc. However, this language must have the ability to control what these 'injected' scripts can do. Obviously these scripts can't modify the game engine itself, they can't access the local disk, they can't create data structures that eat up all the memory, they can't hog all the CPU, etc.

What kind of language is this? What real-world examples are there that do this or what papers describe such systems?

To partially answer this: the most obvious example is Java and its applet infrastructure. Applet writers embed their programs in web pages, and don't generally have the ability to access random parts of the local disk, nor do they have the ability to open arbitrary TCP/IP ports. Applets, however, don't describe how they can be embedded into a general purpose server--their interaction is strictly defined for a browser and nothing else (as far as I know).

Lisp/Scheme systems provide the ability to change running code. The Clojure folks have an amazing (to me) example where a programmer telnets into his server and changes a servlet as it is serving pages to users. Other languages provide the ability to interact with the server: Erlang and Java (via OSGi) come to mind. However, these interactions don't necessarily happen in a safe enough manner. They are certainly not designed to host little programs from millions of users whose competence and ethics may vary.

Haskell provides a way to keep pure code separate from code with side effects. Perhaps the server only accepts pure Haskell code, thereby ensuring users don't do any I/O. However, Haskell also doesn't have a good way of 'dynamically loading' code (hs-plugins is not very pleasant to use). I may wish to stop users from doing disk I/O, but they shouldn't have to jump through hoops to generate random numbers or mutate local state.

It seems to me that the right language to allow such a system would have the ability to control effects, but do so at a very granular level: permissions for non-deterministic code, mutable state, I/O, etc. need to be controlled separately. How would one ensure that CPU time or memory is used fairly? How could the server writers make sure user scripts actually terminate (no infinite loops or recursion, beyond the global event loop which exists as long as the game continues)?
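To make the granularity idea concrete, here is a toy Python sketch of granting permissions for randomness and output separately from a default of "no authority." All names are invented, and CPython's exec is emphatically not a real security boundary -- this only illustrates the shape of per-effect permissions:

```python
import random

def run_script(source, allow_random=False, allow_io=False):
    """Run an untrusted script with only explicitly granted capabilities.

    Toy sketch: the host hands each script a namespace containing exactly
    the operations it is permitted to use. Anything else (open, __import__,
    etc.) is simply absent and raises NameError.
    """
    env = {"__builtins__": {"range": range, "len": len}}  # pure-only default
    if allow_random:
        env["random"] = random.random   # permission for non-determinism
    if allow_io:
        env["write"] = print            # permission for (one kind of) I/O
    exec(source, env)
    return env  # final bindings, so the host can read results back
```

For example, a script granted nothing can still compute (`run_script("total = 0\nfor i in range(5):\n    total = total + i")`), but a call to `open` raises NameError. A real system would need the language itself to make such authority unforgeable, which is where the capability-model answers below come in.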

Can type/effect systems solve this? Is this better handled by virtual machines or run-time checks?

Other examples where such a system can be useful:
- Financial exchanges which let people submit little programs, rather than single commands, to buy/sell securities.
- Car computer systems which let drivers embed their own programs to enhance the dashboard.
- General programming languages which let users provide their own implementations of interfaces (such as Comparable in Java), but ensure that these implementations don't have access to disk/network, etc.


Object Capability Model

Consider looking into the motivations around the E language. The object capability model achieves security without a 'sandbox', and is suitable for general purpose use.

The work on the Object Capability Model is sufficient for many purposes, minus confinement (for purchased scripts, to keep them from phoning home, leaking your strategies and other secrets). Confinement is achieved by construction (as per the KeyKOS factory) or by analysis (check to see if any objects are named within the script).

A complete security solution also needs rights amplification mechanisms such as sealers/unsealers, along with blinding, and certificates for offline and transferable rights.

Resource management and process accounting are more difficult problems to deal with. Even a total function (guaranteed termination) can be told to number-crunch the Ackermann function, so you can't simply depend on such termination. It would be wise to design the language such that partial failures are part of the language model, which would allow you to kill processes and already have well-defined behaviors for the other processes that were interacting with them. Adding resource management features to the language is also possible, though I've not yet found a satisfactory approach in my own work.

I'm working on a language intended for distributed data fusion, command and control, and creating large persistent peer-to-peer virtual worlds with the same level of interconnectedness as the Internet. Its design is already sufficient to meet the various goals you've described, minus full control over resources, and plus a few design criteria for distributed development and operation. It's still a few years from completion, though, since I can't dedicate much time to its implementation.

it seems to me that the right language to allow such a system would have the ability to control effects, but do so at a very granular level

For the purposes you named above, controlling effects isn't all that useful. It's useful for other purposes, but I'll get to that below.

For the purposes in the OP, controlling effects was just a limited way to control capability. Languages designed to support capability models can do so far more readily, at a finer granularity, adding new capabilities on-the-fly as required.

Password capability and object capability models are the most promising for this purpose. Password capability is based on certificates (i.e. I can do X because I have this certificate signed by someone you respect that says Dave can do X and I can prove I am Dave if challenged). Object capability is based on encapsulation of unforgeable object identifiers (i.e. I've already proven I can do X because, if I couldn't, then the message telling you to do X wouldn't have reached you). These approaches can be used together effectively: these two approaches have disparate strengths and performance when it comes to offline use and exclusive or transferable rights (such as e-scrip).

Controlling effects (including IO, delay, divergence or non-termination, observable concurrency and race conditions, deadlock, and non-atomic failure) has many advantages for language design. Code guaranteed to be without a given effect is usable in a wider variety of contexts, may be subject to certain optimizations, and so on. If guaranteed by construction, these properties don't require sophisticated analysis. Controlling effects is worth doing to support reasoning about black-box composition. Since black-box composition is exactly what is happening when adding user code to servers, I would recommend controlling effects even if it isn't much help for security or resistance to denial-of-service attacks.

Google App Engine

What kind of language is this? What real-world examples are there that do this or what papers describe such systems?

I think Google App Engine's Python support falls in the area you mention. The analog of your virtual world is the web itself. They've also recently announced a Java runtime as well. Both runtimes are sandboxed. For example, file system access routines in Python don't work, but they provide a "data store" that scales on Google's server infrastructure.

Note: I don't work for Google.

Second Life and Mono

The Second Life virtual environment runs user-created scripts on the Second Life servers. They started out with a custom-built scripting engine, but they've been switching over to using Mono.

They demoed Mono support several years ago, showing how much faster it was than their custom engine (JITed vs. interpreted). But I bet their custom engine was designed from the start to support running a massive number of scripts in isolation, whereas Mono was not. Maybe this is why Mono support is still in beta -- retrofitting lightweight isolation is not easy.

See LtU posts #1 and #2.


Lang.NET 2009

Interesting, just saw a Lang.NET 2009 talk on Second Life with the title "Running 10M Untrusted, User Generated Applications..."

Sounds like they inject code into assemblies. Doesn't sound like it could be a general purpose solution.

Maybe this is why Mono

Maybe this is why Mono support is still in beta -- retrofitting lightweight isolation is not easy.

Depending on what you mean by "lightweight isolation", it's not that hard at all. In fact, you can do it entirely at the IL level with a pass written using Mono.Cecil to enforce capability security. See Joe-E for a Java example of this approach.


I'm not sure Tcl does all of what you want, but its sandbox is a pretty good one, and well worth checking out.

For instance, see 'resource limits'.

Furthermore, it's a mature, well-tested implementation.


I would like to second this opinion. Tcl has the ability to control all the external effects that a script can have. Its model is based on slave interpreters, where each slave is independent of the others and a master interpreter can control the commands executable by a slave. So a script running on a slave interpreter is restricted in its ability to cause harm to the system, but is independent of the others. (See Safe interpreters and hidden commands in the previous message's link.)

I do not think that the memory footprint of each interp can be controlled with the interp command, but you can accomplish this easily in Tcl by adding your own interp command (perhaps wrapping the original interp).

Tcl also has tiny interpreters (~60k for TinyTcl), and there is a somewhat new Tcl engine being developed called Jim, which might be easier to modify if you wish to.

See also the resource limits section, with which you can restrict the number of commands (or statements, in other languages) that can be run by a user. This can be used as a substitute for termination detection.
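The command-count idea generalizes beyond Tcl. Here is a rough Python analogue (my own sketch, not a hardened sandbox) that uses a trace hook as the step counter: the budget bounds execution regardless of whether the script would ever terminate on its own.

```python
import sys

class StepBudgetExceeded(Exception):
    pass

def run_with_budget(source, max_steps=10_000):
    """Execute untrusted source, aborting after max_steps traced lines.

    A Python sketch of Tcl-style command-count limits: we never try to
    prove termination, we just refuse to run more than max_steps steps.
    """
    steps = 0

    def tracer(frame, event, arg):
        nonlocal steps
        if event == "line":
            steps += 1
            if steps > max_steps:
                raise StepBudgetExceeded(f"budget of {max_steps} steps spent")
        return tracer

    env = {}
    sys.settrace(tracer)
    try:
        exec(source, env)     # traced, so every line decrements the budget
    finally:
        sys.settrace(None)    # always restore the host's tracing state
    return env
```

A well-behaved script runs to completion and returns its bindings; `while True: pass` raises StepBudgetExceeded instead of hanging the server. (Tracing is slow, so a production system would count at the VM or OS level instead.)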

MOO, ColdMUD & derivatives

I highly recommend looking at two systems/languages from the early 90s that attempted to address exactly the scenario you're talking about, though on a smaller scale.


Both were network-available, user-programmable multiuser systems, meant to run MUDs, though there was really no limitation in that direction. They were both based on classless object-oriented (prototype-oriented) programming languages roughly in the C & Algol syntax tradition. The initial developers of the systems were well versed in Lisp & Smalltalk, and it showed. The latter was an offshoot of the former.

To support multiuser access, there were three things at work:

* A custom language & VM, which had its own (primitive) timeslicing. All user programs were time-boxed to run a maximum amount of time, preventing them from hogging system resources.
* Prototype-inheritance object model to allow for more flexible re-use models. The "world" was full of "generics" which people could clone or inherit-from.
* Programming model which had security concepts built in. They had either Unix-style ACL security (MOO) or heavy encapsulation (law of demeter style) (ColdMUD). In fact, developers wrote capability-oriented security in the latter.

IMHO these systems were a path not taken: user-programmable systems which mixed both synchronous ("chat") and asynchronous ("email") modalities, and encouraged users to author in the world. They were popular pre-WWW (1990-1994) and essentially lost their momentum when the web boom happened. Apart from their text-only nature, they make contemporary "social computing" a la Twitter, Facebook, etc. look rather Mickey Mouse.

Worth a look.


It might be worth asking yourself why you want a general purpose language for your users. Not all DSLs try to be Turing-complete, and there are advantages to restricting yourself to a non-Turing-complete core language. In particular, termination is tricky to prove in general purpose languages (for your CPU usage guarantees). So there are two ways to attack the problem:

1. Go for a restricted DSL; one common choice is to restrict loop bounds and data sizes to constants. Each program then trivially terminates, although computing tight bounds can still be tricky (but doable).

2. Give the user a general purpose language, but use a Termination Analysis. If the analyser can prove termination then allow the program, otherwise reject it back to the user. The problem with this approach (from experience) is that you need to be an expert in writing termination analysers as well as in the domain that you are working within.
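A toy sketch of door number one, with an invented two-instruction game DSL: because every repeat count is an integer constant, termination is guaranteed by construction and a worst-case step bound is computable before the script ever runs.

```python
# Programs are lists of (op, arg) pairs:
#   ("train", n)              -- produce n soldiers
#   ("repeat", (count, body)) -- run sub-program body exactly count times,
#                                where count must be a literal constant.

def max_steps(program):
    """Static worst-case step count -- possible only because bounds are constants."""
    total = 0
    for op, arg in program:
        if op == "repeat":
            count, body = arg
            total += count * max_steps(body)
        else:
            total += 1
    return total

def run(program, state=None):
    """Evaluate a program; trivially terminates since all loops are bounded."""
    state = state if state is not None else {"troops": 0}
    for op, arg in program:
        if op == "repeat":
            count, body = arg
            for _ in range(count):
                run(body, state)
        elif op == "train":
            state["troops"] += arg
    return state
```

The server can check `max_steps(script)` against a quota and reject oversized programs before execution, which is exactly the per-user CPU guarantee that is hard to get from door number two.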

If your target audience can be satisfied by door number one, then it is a much easier way to go. Thinking through your problem domain (numerical analysis on trading patterns), I can't think of any sample program that requires arbitrary data sizes or runtime. Users will probably want to operate on fixed-size windows of data, rather than on some arbitrary amount.

Door number two is more difficult, but you may want to consider Proof-Carrying Code as a solution. The client generates a proof that the code lies within some bounds for memory and/or runtime. The server checks the proof against the code. One system that does this for real is CaioCC; slides for a paper on the subject can be found here.

In general a lot of identical issues crop up in Mobile Programs. The problem that you've described is in fact a Mobile Program problem, but with a single fixed network topology. Issues of cpu/memory bounds on mobile code and designing DSLs to enforce them have been looked at quite extensively in that area, although the results may not show up when you search for these issues in general. Here is a good starting point that combines PCC and Mobile Code.