Quote Safe unquote JVM language?

The problem: running user scripts on your server. Supposing the Java runtime is in use, you'd like to guarantee that a user's code cannot:

  1. blow up the heap and crash every other user's scripts
  2. view or alter other users' code and data using reflection
  3. blow out its thread with recursion abuse

Java can be made pretty 'safe' using its built-in sandboxing features (SecurityManager, AccessController, and ClassLoader), but almost all JVM languages out there now (JRuby, Jython, Groovy, ...) are dynamic in nature and nearly impossible to 'secure' in the sense of the three items above:

  1. Heap: users can blow up the heap with simple string concatenation.
  2. Code/variable visibility: these languages generally don't respect private access modifiers, and from there you can own the system by accessing whatever classloading architecture the given JVM language implements. They also depend on this access to achieve their dynamic nature.
  3. Recursion: methods call methods; not much to be done about this.

So you might suppose that a language that disallows reflection and heap allocation would be a good thing in such an environment. Suppose 'new' were not a keyword and a convention such as 'declaration is instantiation' were adopted; then you could generate bytecode that simulates stack allocation, thus protecting the heap. Recursion could be disallowed by embedding some code that examines the call stack for the current method.
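A minimal sketch of the call-stack check described above, in plain Java. The names `RecursionGuard`, `check`, `square` and `factorial` are made up for illustration, and the explicit `check()` calls stand in for code that a compiler for such a language would embed at method entry. It matches frames by class and method name only, so overloads are treated as one method.

```java
class RecursionGuard {
    /** Throws if the method that called check() is already active further up the stack. */
    static void check() {
        // Frame 0 is check() itself; frame 1 is the method being guarded.
        StackTraceElement[] stack = new Throwable().getStackTrace();
        StackTraceElement caller = stack[1];
        for (int i = 2; i < stack.length; i++) {
            // Matching on class and method name only, so overloads count as one method.
            boolean sameMethod = stack[i].getClassName().equals(caller.getClassName())
                    && stack[i].getMethodName().equals(caller.getMethodName());
            if (sameMethod) {
                throw new IllegalStateException("recursion in " + caller.getMethodName());
            }
        }
    }

    // Hypothetical user methods with the check embedded at entry.
    static int square(int n) {
        check();
        return n * n;
    }

    static int factorial(int n) {
        check();
        return n <= 1 ? 1 : n * factorial(n - 1);
    }

    public static void main(String[] args) {
        System.out.println(square(4));
        try {
            factorial(3);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Walking the whole stack on every call is expensive, and indirect recursion (A calls B calls A) is caught only because A shows up twice. A real implementation would more likely reject recursion statically at compile time.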

Has anyone else considered this use case? Am I talking about Ada here?

Joe-E

Check out Joe-E. It pitches itself as an ocap-secure (object-capability secure) subset of Java, but it seems more like an ocap-secure JVML, which is even better.

Edit: ocap-secure JVM, not JVML

Allocation is still an ambient privilege in Joe-E...

...and most other ocap languages. You can call new and invoke recursive procedures in Joe-E, both of which use memory. These seem to be the main things markt is worried about.

I've worked on languages where allocation is statically typechecked, but if you just want to prevent user scripts from going out of control, the easiest thing is to modify the interpreter's allocator and eval loop to limit how much memory and CPU scripts are allowed to use, ulimit-style.
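To make the ulimit-style idea concrete, here is a rough sketch of a toy stack-machine interpreter whose eval loop charges one step per instruction and whose allocator charges cells before allocating. The instruction set (`PUSH`, `ADD`, `JMP`, `ALLOC`, `HALT`) and the class name `BudgetedInterpreter` are invented for the example; it is not taken from any real interpreter.

```java
import java.util.ArrayList;
import java.util.List;

class BudgetedInterpreter {
    static final int PUSH = 0, ADD = 1, JMP = 2, ALLOC = 3, HALT = 4;

    static class BudgetExceeded extends RuntimeException {
        BudgetExceeded(String resource) { super(resource + " budget exceeded"); }
    }

    /** Runs (opcode, operand) pairs and returns the top of the stack at HALT. */
    static int run(int[] code, long maxSteps, long maxCells) {
        int[] stack = new int[64];               // fixed-size operand stack
        List<int[]> objects = new ArrayList<>(); // the script-visible "heap"
        long steps = 0, cells = 0;
        int sp = 0, pc = 0;
        while (true) {
            // Eval-loop hook: every instruction costs one step.
            if (++steps > maxSteps) throw new BudgetExceeded("CPU");
            int op = code[pc], arg = code[pc + 1];
            pc += 2;
            switch (op) {
                case PUSH: stack[sp++] = arg; break;
                case ADD: sp--; stack[sp - 1] += stack[sp]; break;
                case JMP: pc = arg; break;
                case ALLOC:
                    // Allocator hook: charge the request before touching the real heap.
                    if (arg < 0 || cells + arg > maxCells) throw new BudgetExceeded("memory");
                    cells += arg;
                    objects.add(new int[arg]);
                    stack[sp++] = objects.size() - 1; // handle to the new object
                    break;
                case HALT: return stack[sp - 1];
                default: throw new IllegalArgumentException("bad opcode " + op);
            }
        }
    }
}
```

For example, `run(new int[] {JMP, 0}, 1000, 100)` is an infinite loop that gets cut off once the step budget is used up. Nothing is ever freed here, so the memory budget is a cap on total allocation rather than on live data.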

Also ECMAScript 5

Have a look at the "Strict Mode" of ECMAScript 5. JavaScript is now very fast, so it may be that committing yourself to Java isn't the best path for your goals.

JavaScript is not as fast as the Oracle JVM

End of story. Our internal benchmarks demonstrate the slowness, and we hand-write complex memoization code extensively to work around it.

If the requirement isn't a

If the requirement isn't a JVM language running in the same VM, virtualization approaches like Solaris containers probably suffice. The questions shift more to concerns like startup time and delegating resources, and there are many tricks for that.

As for JavaScript speed: you can write the performance-critical parts in a performance-oriented language and compile them to JavaScript to beat Java speeds. For order-of-magnitude speedups, however, parallelism is where it's at, not bytecode performance.

missing link

What was the URL?

Fixed: asm.js vs. Java

Fixed: Box2D is already running faster in JS than in Java by going C --(LLVM/Emscripten)--> asm.js

I had not heard of Joe-E or

I had not heard of Joe-E or ocap-secure, thanks!

'Server' in this context was ambiguous, sorry for that. I meant specifically a J2EE server (Tomcat, Jetty, JBoss, etc.) or some other Java server (e.g. Minecraft). So the hypothetical 'you' would really want to stick with the JVM. These are some options I had considered:

  1. Run the user scripts (Groovy, JRuby, Jython, ...) in an external process. JVM startup time would be a killer here, so you'd have to run a daemon of some sort.
  2. Implement a Java interpreter in Java so you could control the heap and other aspects, and run your scripts in that (again, Groovy, JRuby, Jython, ...).
  3. Implement a new JVM language that's statically (and loosely?) typed. But then your users have to learn a new language, and all the cool kids are doing dynamic languages with functional features nowadays, so uptake would most likely be an issue. Unless you have a killer app like Minecraft that everyone wants in on.

1 is the simplest way, use 1.

I'm not sure why you went to the trouble of even considering 2 and 3: managing a pool of pre-started interpreter processes and using the OS's resource management features seems so much easier.

The only thing to take care of is that a process is only ever re-used by the same user; otherwise state left behind by one user could be exposed to another.
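A small sketch of that re-use rule, assuming a hypothetical `PerUserPool` class. The worker type `W` is left generic; in practice it might wrap a `java.lang.Process` launched via `ProcessBuilder` under OS resource limits, but that part is not shown. Never-used workers may go to anyone, while a released worker only ever returns to the user who released it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

class PerUserPool<W> {
    private final Supplier<W> starter;                 // starts one new worker
    private final Deque<W> fresh = new ArrayDeque<>(); // pre-started, never used by anyone
    private final Map<String, Deque<W>> idleByUser = new HashMap<>();

    PerUserPool(Supplier<W> starter, int prestart) {
        this.starter = starter;
        for (int i = 0; i < prestart; i++) fresh.addLast(starter.get());
    }

    /** Prefers the user's own idle worker, then a fresh one, then starts a new one. */
    synchronized W acquire(String user) {
        Deque<W> mine = idleByUser.get(user);
        if (mine != null && !mine.isEmpty()) return mine.pollFirst();
        W unused = fresh.pollFirst();
        return unused != null ? unused : starter.get();
    }

    /** A used worker is parked under its user and never handed to anyone else. */
    synchronized void release(String user, W worker) {
        idleByUser.computeIfAbsent(user, u -> new ArrayDeque<>()).addFirst(worker);
    }
}
```

The sketch never retires idle workers or tops the fresh pool back up; a real pool would need both, plus a cap on workers per user.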

Very true, 3. is what is

Very true. Option 3 is what is relevant to this site: what language features would allow 'safe' execution of multiple user programs in an environment with a shared heap (i.e. the JVM)?

I qualify 'safe' execution because that's a loaded term that could mean a lot of different things. In my sense it's this idea of running a program in a shared environment that provides isolation and protection from other programs, and protects other programs from yours.

Oh, I just realized these are operating system issues as well. You guys are a good sounding board, thanks!

-----BEGIN PGP SIGNED

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

context: in response to forum post "Quote Safe unquote JVM language?" (By markt at 2013-07-10 14:05) (http://lambda-the-ultimate.org/node/4777) on lambda-the-ultimate (The Programming Language Weblog)

""""""""""""""""""""""""""""""""""""""""""""""""
blow out its thread with recursion abuse
[...]
Java can be made pretty 'safe' using its built-in sandboxing features [...]
""""""""""""""""""""""""""""""""""""""""""""""""

Under a SecurityManager in Java, there is still nothing preventing you from doing this. SecurityManager does not bound the space/time of untrusted code. I don't find this that bad, since I'm not concerned with availability as much as I am security.

There are, however, deeper issues with the Java security model. For one, all your objects (including classes and their static fields) which are accessible by untrusted code need to be thread safe. Thread safety is very hard in Java (harder than most Java programmers are aware of). Here is the tip of the iceberg, in the form of questions by Dog (user 2213023) on StackOverflow:

* Why does this simple threaded program get stuck? (http://stackoverflow.com/questions/15753290)
* Uninitialized object leaked to another thread despite no code explicitly leaking it? (http://stackoverflow.com/questions/16178020)
* blocked in sending to unbounded LinkedBlockingQueue unless there is no receiver? (http://stackoverflow.com/questions/17408097)
* Assigning the value of an array variable to itself? (http://stackoverflow.com/questions/17327336)

The main issue I have with the Java security model is that it's not recursive. All untrusted code has to trust each other. If you have a music player, a codec, and an audio output stream, you can't have the three mutually suspicious of each other. If only the audio output stream is trusted, then the codec can violate the music player. The object capability model naturally doesn't have this issue, and it removes tons of other concerns that arise in the Java security model.
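As a rough illustration of the object-capability style in the music player example, here is a sketch in plain Java. All names (`SampleSink`, `Speaker`, `Codec`, `Player`, `writeOnlyFacet`) are invented for the example. The player holds the speaker but hands the codec only a narrow, write-only facet with a sample quota, so the codec has no reference through which it could, say, change the volume. Plain Java does not enforce this discipline on its own: reflection and mutable static state can get around it.

```java
interface SampleSink {
    void write(int sample);
}

/** Trusted audio output. Holding a Speaker reference means full authority over it. */
class Speaker implements SampleSink {
    private final StringBuilder played = new StringBuilder();
    private int volume = 1;

    public void write(int sample) { played.append(sample * volume).append(' '); }
    void setVolume(int v) { volume = v; }
    String played() { return played.toString().trim(); }
}

/** Untrusted codec: it can act only on the references it is handed. */
class Codec {
    static void decode(byte[] encoded, SampleSink out) {
        for (byte b : encoded) out.write(b);
    }
}

/** Untrusted player: it holds the Speaker but passes the codec a narrower facet. */
class Player {
    static SampleSink writeOnlyFacet(Speaker speaker, int maxSamples) {
        int[] remaining = {maxSamples};
        // The lambda is not a Speaker, so the codec cannot downcast it to reach setVolume.
        return sample -> {
            if (remaining[0] > 0) {
                remaining[0]--;
                speaker.write(sample);
            }
        };
    }

    static void play(byte[] track, Speaker speaker) {
        Codec.decode(track, writeOnlyFacet(speaker, track.length));
    }
}
```

The same wrapping can be applied in the other direction (the speaker handing the player a restricted facet), which is what lets each party limit how much it trusts the others.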

An example of the object capability model is the E programming language. If you know Python, Ruby, etc., it should be pretty easy to learn. Code written in E (http://erights.org) is considerably easier to secure than Java code, and it runs on the JVM. The basis is that encapsulation is pretty much equivalent to security, and doing OOP properly includes encapsulating code behind objects. There are no thread safety issues because it uses an alternate concurrency model.

This may not all be 100% accurate because I haven't touched the Java security model for years. Many years ago, when I first learned about programming, I was appalled to see that in the so-called secure OS that is Linux (and other *nix), all code trusts each other. I immediately tried to move everything into the Java sandbox, mainly by taming existing Java code to be compliant with it. After months of struggle, I realized this is futile, and nobody really supports the Java security model. There are simply way too many edge cases because of things like concurrency, field/method accessibility, classloaders, etc. that interact in all kinds of different ways. It seemed to me that the creators hadn't put much thought into the Java security model. I wouldn't use Java as a trusted code base because it's one of those "anything goes" languages; it's filled with all kinds of concepts that were just copied from other programming languages (Unicode, shared-state concurrency, unsound serialization model, unsound metaprogramming model, inheritance, etc.) despite their huge security implications. I never supported E because the main implementation is in Java, so it's not really verifiable, and I think it's still not ideal, but I digress.

Regarding dynamic languages, E for example is dynamic, and I consider it much more secure than Java. JavaScript and Perl have been made into capability languages (Caja, Caperl). You can implement this type of security in both static and dynamic languages.

""""""""""""""""""""""""""""""""""""""""""""""""
Has anyone else considered this use case? Am I talking about Ada here?
""""""""""""""""""""""""""""""""""""""""""""""""

Ada isn't memory safe as far as I know, so probably all you could do is run each component in a separate process and use IPC or something.

E is one of the best examples I've seen of running untrusted code, but if you care about securing space/time, I don't know any easy solution other than marking all critical processes so that the runtime kills only the non-critical ones.

I still await a small unified OS/language that is actually capability secure and memory safe.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.20 (GNU/Linux)

iEYEARECAAYFAlHnYQAACgkQ3PGpByoQpZE+FQCeLBB2izjCyRfKxxOmxFEZUy83
WAQAn1OSdPrQCQK3s4rjWQHi/RLhJ7v/
=jBTi
-----END PGP SIGNATURE-----

If you're willing to give up all ghosts of performance...

The JVM is not secure for running untrusted code; you can tell because when you google "run untrusted java code", not one result tells you with confidence "this is how you secure code". You get a whole bunch of:

Yeah, if you do this and this, and maybe that, and maybe those things, it'll be secure-ish, but I wouldn't trust it. - The Internet

If nobody on the internet has figured out a way to lock down untrusted code in the JVM, I'd bet it isn't possible directly.

Any "secure" language is going to have to implement its own security box (e.g. memory and CPU limits) on top of what the JVM gives you, since the JVM's security model is so full of holes as to be worthless. Any language that is itself "secure", but doesn't protect against Java std lib code allocating memory, is just asking for an attacker to use std lib calls to OOM you and kill/corrupt your process.

One possible approach, which I have explored, is using a bytecode instrumenter to rewrite Java bytecode as it gets loaded in order to provide resource limit guarantees that the plain JVM cannot. I'm pretty confident it can be made secure (this is the same technique that VMware and friends use to securely run operating systems), and there are several papers describing security via bytecode instrumentation/rewriting (google it), but I'm not aware of any currently-maintained JVM-locker-downer that you can drop in and start using. My own implementation demonstrates fine-grained memory and bytecode-usage limits (including memory allocated and bytecodes executed in std lib code) that are impossible to achieve normally, but it's definitely not secure overall.
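To give a feel for what such rewriting inserts, here is a hand-instrumented sketch at the Java source level. It is not the poster's implementation: `ResourceMeter`, `onAllocate`, `onBlock` and the per-block costs are all assumptions made up for illustration, and a real instrumenter would inject the equivalent calls into the bytecode at class-load time rather than into source.

```java
class ResourceMeter {
    static class LimitExceeded extends RuntimeException {
        LimitExceeded(String resource) { super(resource + " limit exceeded"); }
    }

    // remaining[0] = bytes left, remaining[1] = bytecodes left, tracked per thread
    private static final ThreadLocal<long[]> remaining =
            ThreadLocal.withInitial(() -> new long[] {Long.MAX_VALUE, Long.MAX_VALUE});

    static void limit(long bytes, long bytecodes) {
        remaining.set(new long[] {bytes, bytecodes});
    }

    /** Hook placed before every allocation site. */
    static void onAllocate(long bytes) {
        long[] r = remaining.get();
        if ((r[0] -= bytes) < 0) throw new LimitExceeded("memory");
    }

    /** Hook placed at the start of every basic block, charged with that block's size. */
    static void onBlock(int bytecodes) {
        long[] r = remaining.get();
        if ((r[1] -= bytecodes) < 0) throw new LimitExceeded("bytecode");
    }
}

class InstrumentedScript {
    // Source as the user wrote it:
    //   String r = ""; for (int i = 0; i < n; i++) r += s; return r;
    // Below: roughly how it would behave after rewriting (block costs are invented).
    static String repeat(String s, int n) {
        ResourceMeter.onBlock(3);
        String r = "";
        for (int i = 0; i < n; i++) {
            ResourceMeter.onBlock(8);
            ResourceMeter.onAllocate(2L * (r.length() + s.length())); // rough size estimate
            r += s;
        }
        return r;
    }
}
```

With `ResourceMeter.limit(100, 1000)` in effect, `repeat("abcdefghij", 100)` is stopped on the third iteration instead of growing the heap. Making this robust also means stopping untrusted code from catching `LimitExceeded` or calling `limit` itself, which the sketch ignores.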

If you're willing to give up all ghosts of performance, you can always write your own JVM and interpret the damn bytecodes yourself. Doppio is one implemented in a couple thousand lines of CoffeeScript, and I'm working on my Metascala VM implemented in ~3k lines of Scala.

It's surprisingly easy to write your own JVM to a decent level of conformance: Doppio can run Rhino, javac and a bunch of other programs, and Metascala is complete enough that it's metacircular and can interpret itself interpreting another program. It'll take forever to get all the details like the SecurityManager, Threads, JMM, etc. working, and performance may be 100x slower than executing bytecode directly, but for user scripts this may be an acceptable tradeoff. By interpreting the bytecodes yourself, you have to intentionally give your interpreter the ability to affect the outside world, and if you haven't, it's almost completely secure to begin with. The only exception is CPU and memory hogging, which can be dealt with easily since the untrusted code can only use CPU or allocate memory through your (trusted) interpreter.

Cool, thanks for the links!

Cool, thanks for the links! I spent a couple of days researching Java-in-Java JVMs (Jikes RVM, Maxine, Squawk, Joeq, and a few others). I agree, interpreting bytecodes seems like the easy part; the hard parts are as you stated. Reflection is critical for the dynamic JVM languages (Jython, Groovy, etc.).

What I'm looking into now is Lua and its VMs. Apparently they all use the host system's memory management (malloc in C, and objects allocated on the Java heap in LuaJ, Kahlua, and Mochalua). An interpreter backed by a big ole byte array would be the way to go for securing the heap at least. But then you're implementing memory mgmt and a GC.

This IS quite a hard problem :)
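A bare-bones sketch of the "big ole byte array" idea: one fixed buffer per script, with a bump allocator handing out integer addresses. `ArenaHeap` and its methods are hypothetical names, not part of any of the Lua VMs mentioned. The only allocation the JVM ever sees is the buffer itself, so a script cannot use more than its capacity.

```java
import java.nio.ByteBuffer;

class ArenaHeap {
    private final ByteBuffer memory; // the script's entire heap
    private int top = 0;             // first unused byte

    ArenaHeap(int capacityBytes) {
        memory = ByteBuffer.allocate(capacityBytes);
    }

    /** Reserves size bytes and returns their address within the arena. */
    int allocate(int size) {
        if (size < 0 || size > memory.capacity() - top) {
            throw new IllegalStateException("script heap exhausted");
        }
        int address = top;
        top += size;
        return address;
    }

    void writeInt(int address, int value) {
        checkRange(address, Integer.BYTES);
        memory.putInt(address, value);
    }

    int readInt(int address) {
        checkRange(address, Integer.BYTES);
        return memory.getInt(address);
    }

    int used() {
        return top;
    }

    // Scripts may only touch bytes that have actually been handed out.
    private void checkRange(int address, int length) {
        if (address < 0 || address > top - length) {
            throw new IndexOutOfBoundsException("address " + address + " is outside allocated memory");
        }
    }
}
```

Nothing is ever reclaimed here, which is exactly the gap that the memory management and GC work has to fill.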

Reflection is critical for

Reflection is critical for the dynamic jvm languages (jython, groovy, etc).

Reflection is actually pretty easy, assuming your interpreter's language has hashtables (most do, except C, I guess). It's more the endless undocumented stuff in sun.misc.Reflection and sun.misc.VM and sun.misc.Unsafe that you have to track down and reimplement, and there's no way of tracking stuff down except running big programs and seeing NotImplementedErrors (or whatever you call them) pop out of your VM.

But then you're implementing memory mgmt and a GC

Yeah, you are. It's surprisingly easy to get a simple copying GC working: Metascala's is about 100 lines and mostly works (again, missing handling for all the native stuff). It took me 2 days to implement by copying Coursera lecture notes, so don't let that scare you off. Of course, a performance-tuned generational GC is another matter!
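For a sense of scale, here is a minimal semispace (Cheney-style) copying collector over an `int[]` heap. This is not Metascala's collector, just a sketch under simplifying assumptions: every object is a header cell holding its field count followed by that many reference fields, `-1` is null, and the interpreter keeps every live reference in the `roots` array, since addresses change when objects move. The name `CopyingHeap` is made up.

```java
import java.util.Arrays;

class CopyingHeap {
    static final int NULL = -1;
    final int[] roots;       // references held by the interpreter itself
    private int[] from, to;  // the two semispaces
    private int free = 0;    // next free cell in from-space
    private int toFree;      // next free cell in to-space during a collection

    CopyingHeap(int cellsPerSpace, int rootCount) {
        from = new int[cellsPerSpace];
        to = new int[cellsPerSpace];
        roots = new int[rootCount];
        Arrays.fill(roots, NULL);
    }

    /** Allocates an object with the given number of null reference fields. */
    int allocate(int fields) {
        if (fields < 0) throw new IllegalArgumentException("negative field count");
        int size = 1 + fields;
        if (size > from.length - free) {
            collect();
            if (size > from.length - free) throw new IllegalStateException("script heap exhausted");
        }
        int address = free;
        from[address] = fields;
        Arrays.fill(from, address + 1, address + size, NULL);
        free += size;
        return address;
    }

    int getField(int obj, int i) { return from[obj + 1 + i]; }
    void setField(int obj, int i, int ref) { from[obj + 1 + i] = ref; }
    int used() { return free; }

    void collect() {
        toFree = 0;
        for (int i = 0; i < roots.length; i++) roots[i] = forward(roots[i]);
        // Cheney scan: objects between scan and toFree are copied but not yet fixed up.
        for (int scan = 0; scan < toFree; scan += 1 + to[scan]) {
            for (int i = 1; i <= to[scan]; i++) to[scan + i] = forward(to[scan + i]);
        }
        int[] tmp = from; from = to; to = tmp;
        free = toFree;
    }

    /** Copies an object to to-space on first visit; returns its new address. */
    private int forward(int ref) {
        if (ref == NULL) return NULL;
        if (from[ref] < 0) return -from[ref] - 1; // header already holds a forwarding address
        int size = 1 + from[ref];
        int newAddress = toFree;
        System.arraycopy(from, ref, to, newAddress, size);
        toFree += size;
        from[ref] = -newAddress - 1;              // leave a forwarding marker behind
        return newAddress;
    }
}
```

Any address returned by `allocate` but not stored in `roots` or in a reachable field is stale after the next collection, so callers should treat raw addresses as short-lived.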