Joe-E: A Security-Oriented Subset of Java

Joe-E: A Security-Oriented Subset of Java. Adrian Mettler, David Wagner, and Tyler Close. To appear at ISOC NDSS 2010.

We present Joe-E, a language designed to support the development of secure software systems. Joe-E is a subset of Java that makes it easier to architect and implement programs with strong security properties that can be checked during a security review. It enables programmers to apply the principle of least privilege to their programs; implement application-specific reference monitors that cannot be bypassed; introduce and use domain-specific security abstractions; safely execute and interact with untrusted code; and build secure, extensible systems. Joe-E demonstrates how it is possible to achieve the strong security properties of an object-capability language while retaining the features and feel of a mainstream object-oriented language...

Section 5.2 discusses how Joe-E leverages Java's static typing. Joe-E is implemented as a source-code verifier, not a bytecode verifier; Section 6 of the paper explains this design choice.

Joe-E was mentioned on LtU in the past and is available here.


Joe-E is implemented as a

Joe-E is implemented as a source-code verifier not a bytecode verifier. Section 6 of the paper explains this design choice.

It does not sound like much of a 'design choice' to me:

We perform checks on the Java source code rather than on Java class files since the Java runtime subjects bytecode to only a limited set of validation checks, allowing bytecode to do a number of things that Java programs cannot.

I am also not clear on the follow-up comment:

The expanded semantics afforded to bytecode but not specified by the Java language are unfamiliar and not clearly defined, and thus much harder for a programmer or auditor to reason about.

The authors should be referred to Java and the Java Virtual Machine: Definition, Verification, Validation which discusses a project that aimed to clarify both the Java programming language and JVM spec, using AsmGofer for its abstract state machine interpretation.

It may in fact be possible that a valid Joe-E program is a valid Java program, but not a runnable JVM program...

I guess I would just like to hear more about the Joe-E approach in this regard... I've mostly just skimmed Joe-E stuff in the past.

This design choice was

This design choice was discussed on cap-talk a while ago, when I brought up isolation on the CLR. The problem is one of "full abstraction": there is a correspondence failure between source- and VM-level abstractions, so a programmer operating at the source level cannot protect his program from maliciously crafted VM bytecodes.

For instance, one failure of full abstraction on the CLR was that its bytecode was not constrained to throw objects inheriting from System.Exception, but the C# compiler forced you to do so. As a result, a C# programmer could not catch such exceptions thrown by malicious bytecode, and such bytecode could use this to violate a program's invariants. This has since been addressed on the CLR, along with a few other abstraction failures (discussed previously on LtU). The number of identified abstraction failures on the JVM is actually astounding, and it seems the CLR managed to avoid many of these problems due to its typed IL.

A more comprehensive overview of Java's abstraction failures is in the Joe-E spec. The Joe-E developers basically argued that any JVM bytecode could be properly verified by first translating to Java, then compiling to JVM bytecode using a trusted compiler. That this design approach led to higher assurance than bytecode verification should give you a sense of how profound the JVM's problems are here.

I get it now

The key is that they built a trustworthy Java-to-JVM-bytecode compiler.

Thanks!

Sorry, that was a

Sorry, that was a mischaracterization on my part: Joe-E is a Java source code verifier (as an Eclipse plugin), a library, and a taming database. They assume you are using a suitably trusted Java compiler for the time being, and they seek only to protect you from other Java code in a capability-secure way.

So Joe-E as it stands is one step on the path to capability-secure Java programs, but I think they'll need a decompiler and a trusted compiler to get all the way there.

I don't understand -- goal confusion?

but I think they'll need a decompiler and a trusted compiler to get all the way there.

They would need these elements to get to a capability-secure JVM platform. But for achieving only a capability-secure Java platform, why would they need a decompiler?

Decompiler

The decompiler is for verifying untrusted code that just happens to be provided to you in bytecode format.

[EDIT] I suspect it would also be quite useful for verifying untrusted code that just happens to be provided to you in the form of Pizza, Groovy, Scala, et al.

So Joe-E as it stands is one

So Joe-E as it stands is one step on the path to capability-secure Java programs, but I think they'll need a decompiler and a trusted compiler to get all the way there.

Hmm. We may have different visions of what "all the way there" would mean. Joe-E does support construction of capability-secure Java programs. We don't need a decompiler for that.

Joe-E doesn't support untrusted bytecode. That's not how you build systems in Joe-E; you write code in the source language, not the bytecode, and if you want to use code produced by someone else, you ask them for the source, not the bytecodes. I'm not sure why this would be insufficient.

Keep in mind that Joe-E makes no claim about legacy compatibility; it's for building new systems. Given that goal, it seems reasonable to me to basically ignore the fact that the bytecode language even exists, for all user-visible purposes, and do everything in terms of the Java source language. Am I missing something? I hope I'm not being too defensive.

We may have different

We may have different visions of what "all the way there" would mean. Joe-E does support construction of capability-secure Java programs. We don't need a decompiler for that.

By "all the way there", I meant only to minimize the TCB to the greatest extent possible while preserving modularity/separate compilation. Presumably Joe-E is still dependent on Java's collection libraries, which the taming database merely labels "safe". On each platform these tend to be available as bytecode, though certainly if you enforce "it's only safe with whole-program compilation, class libraries included", that's a safe choice as well.

JVM bytecodes are full of pitfalls

Hi Z-Bo, thanks for your comments! Naasking has nailed it. We originally had planned on defining Joe-E as a subset of the JVM bytecode language, but it turns out the JVM bytecode language has too many pitfalls. Here's my favorite:

public static boolean foo(byte b) {
    return b <= 127 && b >= -128;
}

Pop quiz: What can foo() return, if we know nothing about what arguments its caller might pass it?

Answer: It's a trick question. If we know that the caller is Java code, then foo() will always return true, no matter what you pass it. But if we know that the caller is JVM bytecode, then foo() can return false, surprisingly. This is so even if we know that the JVM bytecode has passed the bytecode verifier.
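To make the Java-source half of that answer concrete, here is a self-contained sketch (class name hypothetical): at the source level, every byte value lands in [-128, 127], even one produced by narrowing an "out of range" literal, so every call from Java prints true.

```java
public class ByteRangeDemo {
    public static boolean foo(byte b) {
        return b <= 127 && b >= -128;
    }

    public static void main(String[] args) {
        // Narrowing (byte) 200 yields -56, which is still in range.
        System.out.println(foo((byte) 200));
        System.out.println(foo(Byte.MIN_VALUE));
        System.out.println(foo(Byte.MAX_VALUE));
        // A bytecode caller, by contrast, can load an untruncated int
        // (say, 300) into the parameter slot and make foo return false;
        // no Java source program can express that call.
    }
}
```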

This is a surprising hazard that makes it unreasonably difficult to write secure code that can safely interact with untrusted JVM bytecode. What's worse is that this is not the only such hazard: there are a handful of these, as documented in the appendix of the Joe-E language spec. See also my email to cap-talk, where I elaborated on this issue. I also second the recommendation to check out Andrew Kennedy's papers on failures of full abstraction in C#; that is wonderful work.

As a result, Joe-E is defined as a subset of the Java source programming language: every Joe-E program is a valid Java program (if it's not valid in the Java source language, it's not valid Joe-E). Joe-E is not defined in terms of the bytecode language (it could have been, but that would have made it unnecessarily difficult to write secure code, as discussed above).

We write a verifier that parses a putative Joe-E program -- i.e., a bit of Java source that is alleged to follow Joe-E's additional rules -- and checks to see that the code does indeed follow all of Joe-E's rules. Thus, from an implementation perspective, code has to both pass the Joe-E verifier and also be accepted by the Java compiler before we'll accept it as a valid Joe-E program.

Incidentally, in retrospect I consider Java's decision to transport untrusted code using bytecode as a design flaw. Java ships untrusted code across the wire as bytecode rather than source code, and the bytecode language does not provide full abstraction. I would bet that there exists at least one flaw in the Java libraries due to this failure of full abstraction: i.e., at least one security hole that can be exploited by malicious bytecode (say, a malicious Java applet), but that could not have been exploited by malicious Java code. But I've never had time to go trawling for such an example to try to confirm my speculation.

Incidentally, in retrospect

Incidentally, in retrospect I consider Java's decision to transport untrusted code using bytecode as a design flaw.

I agree, but not as a universal condemnation of bytecode. I suspect there are no such flaws in a VM where there is a formal correspondence between the source language and the abstract machine instructions, e.g. OCaml's bytecode VM. However, this then leads me to wonder whether there could ever be a truly "universal VM" the way the CLR has positioned itself. Even a low-level typed assembly language would have problems with inter-language abstraction violations.

This would seem to imply that any language implementation will invariably be tied to a specific VM. Perhaps a universal VM could provide a notion of "semantic environment" within which a language is assured a certain well-formed environment and execution semantics. The VM partitions the execution of bytecode according to environment and provides a well-defined IPC/FFI interface for communication between environments. Is there a better way?

Even bytecode should be a capability

System.Reflection.Emit shouldn't allow you to recognize bytecodes your language cannot provide full abstraction for.

So the issue of a well-formed environment and execution semantics must include more than just execution in the usual sense.

Thanks but... some questions

This is a surprising hazard that makes it unreasonably difficult to write secure code that can safely interact with untrusted JVM bytecode. What's worse is that this is not the only such hazard: there are a handful of these, as documented in the appendix of the Joe-E language spec. See also my email to cap-talk, where I elaborated on this issue. I also second the recommendation to check out Andrew Kennedy's papers on failures of full abstraction in C#; that is wonderful work.

What about the JBook project? That seems more relevant than a project for the CLR (Andrew Kennedy's work). One of the members of the JBook project, Wolfram Schulte, was not listed as an author because the problem of validating and verifying the design of Java and the JVM took so long, and Schulte ultimately dedicated his attention to joining Yuri Gurevich's Foundations of Software Engineering group at Microsoft, where Yuri's primary interest is in evolving algebras (ASMs). My original comments above were prompted by the curious claim that bytecode semantics are "unfamiliar".

Come to that, I think this is the bottom line:

Incidentally, in retrospect I consider Java's decision to transport untrusted code using bytecode as a design flaw.

What you have in effect done by using Java as your "assembly language" is push the abstraction boundary up, removing stack-based execution from the picture.

I guess my next big question is, How hard would it be to build on the work done by the JBook project? Why is an independent Joe-E verifier better?

I am legitimately interested in understanding the rationale for this as a language and system designer. I've been pondering full abstraction for a term rewriting language for a while now, which is why I bought the Abstract State Machine books. I'm looking for the best approach that will also allow others to extend whatever work I do, modularly and efficiently.

For example, I think Sandro has some good ideas in pointing out that CIL defines unsafe bytecode operations, and that the CIL verifier can check the security context of an operation to decide whether an unsafe operation is permitted. But unsafety here is mainly about memory safety, e.g. system stability: an unsafe operation could crash the environment hosting the CLR, which is a separate concern from a CLR Application Domain crashing. In the LtU thread Sandro Magi mentioned, it seemed like everyone was arguing that "full abstraction" is impossible in truly multi-language VM environments. I think that is ridiculous; all that probably needs to be done is to "wrap" assemblies so that the supersystem (the language) depends on a valid subsystem (a VM "profile" -- an assembly verified against a language profile). The performance of this approach is a separate matter, as is whether it is the ideal design. The assembly-wrapping approach is already seen in the Common Language Specification compliance checker via System.CLSCompliantAttribute; this attribute simply marks one possible executable specification.

A similar approach is seen in UML 2.0 via Executable UML Profiles, which are necessary due to the unverifiability of the semantics of statechart diagrams and other modeling diagrams in UML.

With regard to serialization over wire protocols, this means that a client and server must negotiate a trust level before deciding whether content can be serialized. For example, the client may not know how to decode C#. Using David Ryan's Argot protocol, which is designed for things such as automated code distribution, the parser might be distributed to the client.

This represents my current thinking on things, and is partly thinking out loud and mostly just intuition. I reserve the right to change my mind when somebody debunks my intuition.

The system's gotta be usable for Java programmers

What about the JBook project?

It's important for Joe-E programmers to be able to write secure code without having to understand highly obscure features of the JVML bytecode language. It's reasonable to expect them to be familiar with the Java source language, but not with all the obscure corner cases of the JVML bytecode language.

We've listed perhaps as many as 9 hazards in the appendix of the Joe-E spec. I bet that not 1 in 100 Java programmers is familiar with all of these. (Honestly, I'd be amazed if the number is anywhere near 1 in 100.) I also suspect that the list of hazards in the Joe-E spec is incomplete.

If programmers have to understand all of these hazards to have a chance of writing secure code, then we've stacked the deck against them: we've made it unreasonably difficult to write secure code. Defining Joe-E as a simple subset of the JVML bytecode would mean that programmers would have to understand all of these hazards, before they could have any confidence in the security of their code. That's an unreasonably high hurdle to impose.

By insisting that all programs must be valid Java programs, before they can qualify as valid Joe-E programs, we eliminate the need for programmers to understand the hazards associated with malicious bytecode. We hope this will help programmers reason about the possible behaviors of untrusted code they interact with: untrusted code can only behave in ways that are expressible in the source language, and since programmers are writing code in that source language all the time, hopefully they will have a good basis for reasoning about what behaviors can be expressed in the source language.

To put it another way: Java programmers are hopefully familiar with what you can do in the Java language, because they write Java code every day. Probably most Java programmers aren't nearly as familiar with the JVML bytecode language, since they rarely use it. Joe-E enables programmers to rely on their intuition and knowledge about what Java code can do, and avoids the need for them to have any familiarity with what JVML bytecode can do (and avoids the need for programmers to have any familiarity with obscure things that are expressible in JVML bytecode but are not expressible in Java source code).

(In principle, sure, we could presumably define a subset of the JVML bytecode that is no larger than the set of bytecodes that could be produced by compiling valid Java code -- but this has severe problems in practice. I'd be happy to elaborate if my reasoning is not obvious.)

I'm not sure if I'm making any sense. If not, I apologize in advance.

It makes sense

I'm not sure if I'm making any sense. If not, I apologize in advance.

It makes sense.

I am not questioning your choice of targeting Java as an intermediate representation.

I am simply curious if it makes more sense for researchers to extend the work of JBook, and go from Joe-E Compiler -> Java Compiler ASM -> JVM Interpreter ASM. Your responsibility as a security researcher would therefore simply be an algebra from Joe-E to Java. This makes sense because it matches the needs of your Maximal Subset principle. You are defining a subset, so refining an evolving algebra seems like an ideal way to do this.

Then there is the additional curveball I throw in... the idea of 'profiles', which does not allow for your maximal subsetting principle. Instead, it forms a pointwise relation between your language and its fully abstracted environment.

I don't see any flaws in my reasoning. Are my flaws evident by the following comment?

(In principle, sure, we could presumably define a subset of the JVML bytecode that is no larger than the set of bytecodes that could be produced by compiling valid Java code -- but this has severe problems in practice. I'd be happy to elaborate if my reasoning is not obvious.)

Also, come to that:

I also suspect that the list of hazards in the Joe-E spec is incomplete.

...which probably re-affirms the idea of everyone sticking to one formal approach -- possibly ASMs -- for validation and verification of full abstraction.

Taming the Libraries

Joe-E covers the basic requirement of capability security: heavily restricting ambient authorities other than memory allocation and CPU expenditure.

But it is effectively the design of all the other components for fine-grained distribution of "least authority" that makes capability security practical and tolerable. This includes GUI, network, sound, sensors, actuators, mutable composites, evaluators and linkers, persistence, etc.

Discovering the best security patterns for these components is non-trivial even before concurrency concerns are introduced, and many of the common Java libraries (such as Swing) need considerable taming even after you eliminate ambient authority.
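As a toy illustration of what taming-by-attenuation looks like (this is not Joe-E's actual API; the class name here is hypothetical), one can wrap an ambient-authority class behind a narrow facade so that holding the capability grants only one operation:

```java
import java.io.File;

// Hypothetical attenuation pattern: the facade holds the File authority
// privately and re-exports only a read-only listing operation.
public class ReadOnlyDir {
    private final File dir; // underlying authority, never exposed

    public ReadOnlyDir(File dir) { this.dir = dir; }

    // Listing only: no write, delete, open, or parent traversal.
    public String[] list() {
        String[] names = dir.list();
        return names == null ? new String[0] : names;
    }

    public static void main(String[] args) {
        ReadOnlyDir cap = new ReadOnlyDir(new File(System.getProperty("java.io.tmpdir")));
        // Holding 'cap' lets a client enumerate this one directory and nothing more.
        System.out.println(cap.list().length >= 0);
    }
}
```

Multiplied across GUI, network, and the rest of the platform libraries, deciding which facades to offer is exactly the design problem described above.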

Does Joe-E or a related project do any work on taming the libraries?

Joe-E doesn't have an extensive taming database

Thanks for your enthusiasm. Unfortunately, you're not going to be happy with Joe-E. As things stand today, Joe-E has an extremely minimalistic, bare-bones taming database: significantly less extensive than E's taming database, if you're familiar with that. I agree that this limitation is significant for would-be users of Joe-E, and that widespread practical use of Joe-E would require a more extensive taming of Java libraries. And I agree with you that taming libraries takes some non-trivial thought and effort. I wish I had good news for you, but I don't: we're forced to leave this to someone else.

If you have any insights into how one could plausibly advance the state of taming databases, as part of an academic research project, I'd welcome your thoughts. Personally, I don't see how to do it. I suspect any program committee I'd send this work to would classify it as engineering, not research, and I don't know how to craft a paper on this subject that a program committee would likely be willing to publish. And burning a year of grad student time to tame a bunch of APIs, without getting any publications out of that effort, is pretty much a non-starter. Similarly, I don't know how to find a funder who'd want to fund that effort, even if there were some smart student who wanted to work on it. So I've reluctantly chalked this up as one of several pieces of engineering that are "important in practice, but beyond the scope of what I know how to do in an academic research environment".

This is just one instance of a systematic issue in academic research: there are some kinds of engineering that are hard to do, and hard to justify doing, in an academic research environment, even though they might be important to end users. Sorry to be so pessimistic. I'd welcome any creative solutions you might have.

If you have any insights

If you have any insights into how one could plausibly advance the state of taming databases, as part of an academic research project, I'd welcome your thoughts. Personally, I don't see how to do it.

When I was considering how to do this using a .NET bytecode analysis, it seemed feasible for an ocap verifier to apply a sort of effect inference on instruction bodies to determine whether object-capability principles are respected. Here are the CIL opcodes. We conservatively mark the following functions unsafe:

  1. Any functions that access transitively mutable static fields, i.e. any static fields that are mutated, or that hold objects that are themselves mutable (for more precision, identify whether these objects are ever actually mutated after construction, i.e. whether a mutable static object ref is mutated within a function's scope or escapes a function's scope by being stored elsewhere)
  2. Any functions that attempt to pass static state as a by-ref param (for more precision, a flow analysis could sometimes determine whether the by-ref param is ever set)
  3. Any functions that invoke the FFI (FFI calls are identified in the CIL metadata)
  4. Any functions that use some types of reflection, e.g. calls to SetField, which may violate the immutable-static-state constraint from above; a more precise analysis would try to determine whether the BindingFlags used pose any danger of this; alternatively, force the use of a "safe" reflection library, at least for the more dangerous calls
  5. Any functions that use "unverifiable IL", i.e. unsafe pointer operations and the calli opcode (the CLR is distinct from the JVM in that it supports unsafe operations via "unverifiable" instructions); the CLR comes with a CIL verifier that checks this condition, so the ocap verifier just needs to ensure the above conditions and that the IL is verifiable using this tool

As a slight relaxation, we can allow functions from the system's class libraries to use unverifiable IL, since these are likely performance optimizations or primitives we must depend on, but we still forbid FFI calls and mutable static state by default (these will need manual inspection).

I think this could actually produce a fairly precise result, but correct me if I'm missing something! Analyzing reflection looks the trickiest. I'm not sure how feasible this approach is with JVM bytecodes.
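A rough sketch of check #1, using plain reflection over field declarations rather than a real bytecode analysis (the class and method names here are hypothetical, and this is only an approximation of the proposed effect inference):

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical, simplified version of check #1: flag classes that expose
// transitively mutable static state. A real ocap verifier would analyze
// instruction bodies; inspecting field declarations only approximates it.
public class StaticStateCheck {
    // Conservative whitelist of types treated as deeply immutable.
    static boolean isImmutableType(Class<?> t) {
        return t.isPrimitive() || t == String.class;
    }

    static boolean hasMutableStaticState(Class<?> c) {
        for (Field f : c.getDeclaredFields()) {
            int m = f.getModifiers();
            if (!Modifier.isStatic(m)) continue;
            if (!Modifier.isFinal(m)) return true;          // reassignable
            if (!isImmutableType(f.getType())) return true; // shared mutable object
        }
        return false;
    }

    static final int SAFE_CONSTANT = 42;     // final primitive: fine
    static int[] sharedBuffer = new int[4];  // non-final static: flagged

    public static void main(String[] args) {
        System.out.println(hasMutableStaticState(StaticStateCheck.class));
    }
}
```

The more precise refinements listed above (escape analysis, flow analysis for by-ref params) would replace this declaration-level check with one over actual instruction bodies.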

another approach: hack the vm