OoO Java

We typically hear about optimistic transactional memory, so it's refreshing to see analysis-driven work on the pessimistic side in OoOJava: Software Out-of-Order Execution. Of note, they guarantee determinism:

Developing parallel software using current tools can be challenging. Even experts find it difficult to reason about the use of locks and often accidentally introduce race conditions and deadlocks into parallel software. We present OoOJava, a new approach to parallel programming inspired by out-of-order processors. Out-of-order processors have long extracted unstructured parallelism from sequential instruction streams. In our approach, a developer annotates code blocks as tasks to decouple these blocks from the parent execution thread. OoOJava extracts all data dependences through static analysis to generate an executable that is guaranteed to preserve the behavior of the original sequential program.

We have implemented OoOJava and achieved an average speedup of 15.3× on our nine benchmarks. The combination of a simple parallelism model, compiler feedback, and speedups are indications that out-of-order execution-based programming models can become mainstream.

(James C. Jenista, Yong hun Eom, and Brian Demsky)
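Not their system, but a minimal sketch of the model the abstract describes, written with plain java.util.concurrent futures: in OoOJava you would only annotate the blocks as tasks, and the compiler would discover the dependence that is spelled out by hand below (cost is a made-up stand-in workload).

    import java.util.concurrent.CompletableFuture;

    public class OoOSketch {
        // Stand-in workload (hypothetical; any pure computation works).
        static int cost(String s) { return s.length() * 1000; }

        public static void main(String[] args) throws Exception {
            // Two blocks with no mutual data dependence: OoOJava's analysis
            // would let these run out of order with the parent thread.
            CompletableFuture<Integer> left =
                CompletableFuture.supplyAsync(() -> cost("alpha"));
            CompletableFuture<Integer> right =
                CompletableFuture.supplyAsync(() -> cost("beta"));
            // This block reads both results, so it must wait for both; the
            // dependence OoOJava infers statically is written out here by hand.
            int total = left.thenCombine(right, Integer::sum).get();
            System.out.println(total);
        }
    }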

In discussions with the authors, it seems that the evaluation benchmarks were embarrassingly parallel programs with no library dependencies, so there are many questions left to answer about the pragmatics involved.

Perhaps another interesting paper in this vein is Deterministic Parallel Java, which takes a type-specification-based approach. A software engineering question arises: when writing maintainable parallel programs with potential algorithmic interference, which invariants should we specify? If just basic task boundaries, OoOJava is almost like program analysis (instead of type inference) for DPJ; if non-interference guarantees, we start to get a very different picture.
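For contrast, a from-memory sketch of the DPJ flavor (details of real DPJ syntax may differ): fields live in named heap regions, methods declare effect summaries, and a type checker, rather than a whole-program analysis, proves that the branches of a cobegin do not interfere.

    class Pair {
        region Fst, Snd;                    // named heap regions
        int fst in Fst;                     // field placed in region Fst
        int snd in Snd;
        void setFst(int v) writes Fst { fst = v; }
        void setSnd(int v) writes Snd { snd = v; }
        void update(int a, int b) writes Fst, Snd {
            cobegin {                       // parallel, checked non-interfering
                setFst(a);                  // effect: writes Fst
                setSnd(b);                  // effect: writes Snd -- disjoint, so OK
            }
        }
    }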


Sounds like Silk

At least I think that's how 'Cilk' is pronounced. OoOJava provides extra annotations for task parallelism, then does a few dependency analyses to discover more.

A major challenge for automated parallelism in highly imperative and modular languages such as Java is dealing with I/O. If I use abstract objects and dependency injection, to what extent can I still benefit from static analysis and out-of-order execution?
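A hypothetical illustration of the tension: once a Store implementation is injected behind an interface, a points-to analysis must merge the effects of every implementation it cannot rule out, and a single I/O-performing one is enough to force serialization.

    import java.util.HashMap;
    import java.util.Map;

    interface Store { void put(String k, String v); }

    class MemStore implements Store {            // purely local effects
        private final Map<String, String> m = new HashMap<>();
        public void put(String k, String v) { m.put(k, v); }
    }

    class LoggingStore implements Store {        // performs I/O on every call
        public void put(String k, String v) { System.out.println(k + "=" + v); }
    }

    class Client {
        private final Store store;               // injected; concrete type unknown
        Client(Store store) { this.store = store; }

        void run() {
            store.put("x", "1");  // independent tasks? Only if the injected
            store.put("y", "2");  // Store has no shared effect -- which the
        }                         // static analysis must conservatively deny.
    }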

I'd like to see these sorts of features better integrated with major language implementations. I often feel as though there's this barrier preventing me from actually using such innovations until they become 'official'.

At least I think that's how

At least I think that's how 'Cilk' is pronounced. OoOJava provides extra annotations for task parallelism, then does a few dependency analyses to discover more.

While I am a big fan of Cilk, TBB, etc., they don't provide any property as strong as determinism.

Edit: also, their scheduling is naive at this point, and I don't think OoOJava pulls out more parallelism than specified: they work at keeping parallelism unobservable, not at finding additional opportunities. However, with all their machinery in place, I think there are additional opportunities not traditionally exploited -- most language designers would shy away from using points-to and similar analyses.
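For anyone wondering what "as strong as determinism" buys: here is a plain-Java analogue (hypothetical example) of a Cilk/TBB-style spawn whose output legitimately varies from run to run; OoOJava's guarantee rules this behavior out by making the tasks appear to run in their sequential order.

    import java.util.concurrent.*;

    public class RaceDemo {
        static int counter = 0; // shared, unsynchronized

        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(2);
            // Two spawned tasks racing on the same counter: both schedules
            // are legal, so different runs may print different numbers.
            Future<?> a = pool.submit(() -> { for (int i = 0; i < 100_000; i++) counter++; });
            Future<?> b = pool.submit(() -> { for (int i = 0; i < 100_000; i++) counter++; });
            a.get(); b.get();
            System.out.println(counter); // often < 200000: lost updates
            pool.shutdown();
        }
    }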

In discussions with the

In discussions with the authors, it seems that the evaluation benchmarks were embarrassingly parallel programs with no library dependencies, so there are many questions left to answer about the pragmatics involved.

In other words, most enterprise software stacks have unthinkable amounts of dependency injection and thus approximate a data parallelism problem, not a task coordination problem.

From the abstract:

OoOJava extracts all data dependences through static analysis to generate an executable that is guaranteed to preserve the behavior of the original sequential program.

And how can static analysis determine all the data dependencies in an app using OSGi?

I do think their conclusion makes sense, though.
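A miniature version of that problem, using only standard Java (ServiceLoader here stands in, loosely, for OSGi's much richer service registry; hypothetical example): the call targets below are chosen at run time, so no ahead-of-time analysis can enumerate their reads and writes.

    import java.util.ServiceLoader;

    public class DynamicDeps {
        public static void main(String[] args) throws Exception {
            // The class name arrives at run time: its effects are anyone's guess.
            Class<?> c = Class.forName(args[0]);
            Runnable r = (Runnable) c.getDeclaredConstructor().newInstance();
            r.run();
            // ServiceLoader discovers providers from the classpath at run
            // time too, so the set of implementations is unknown statically.
            for (Runnable plugin : ServiceLoader.load(Runnable.class)) {
                plugin.run();
            }
        }
    }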

In other words, most

In other words, most enterprise software stacks have unthinkable amounts of dependency injection and thus approximate a data parallelism problem, not a task coordination problem.

Data parallelism is about where the parallelism is: partitioning roughly equivalent functional tasks across a data structure, when the structure's parts approximate workloads and dependencies between them are few or simple (e.g., wavefront). Recursive tree traversals, pixel shaders, etc. I don't see how that is especially relevant here. For example, I can imagine software pipelining distinct tasks (and I believe one of their examples is exactly that; see the sketch below). Arguably this is data-partitioning some queue data structure, but at that point I suspect you can call most parallel problems data parallel. OoO Java can work on data parallel and non-data parallel programs... and fail to trigger on both:

The concern here is achieving OoO Java's goal of only concurrently running tasks that linearize with the sequential version while, at the same time, asking for only minimal task annotations from the developer to guide the search space. When evaluating their ability to do so, they used a stripped-down version of Java that seems pretty far from the language we know and love (?); no standard libraries, class loaders, etc. Like you said (and this actually challenges the ocap community as well), there are many dependencies in typical Java programs that would surface in a static analysis for separation. Any such detected dependence would, for OoO Java, prevent parallelism.
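A minimal sketch of the pipelining aside above (hypothetical, not one of their benchmarks): two distinct stages coupled by a bounded queue, which reads more like task coordination than classic data partitioning, even if one squints and calls the queue the partitioned data.

    import java.util.concurrent.*;

    public class PipelineSketch {
        public static void main(String[] args) throws Exception {
            BlockingQueue<String> q = new ArrayBlockingQueue<>(16);
            ExecutorService pool = Executors.newFixedThreadPool(2);
            pool.submit(() -> {                          // stage 1: produce
                for (int i = 0; i < 5; i++) q.put("item" + i);
                q.put("EOF");                            // end-of-stream marker
                return null;
            });
            pool.submit(() -> {                          // stage 2: consume
                for (String s = q.take(); !s.equals("EOF"); s = q.take())
                    System.out.println(s.toUpperCase());
                return null;
            });
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
    }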

The idea of OoO Java in itself isn't so novel -- the real question is how it would work given the state and requirements of modern languages, programs, and analyses. Unfortunately, that part of the paper was gutted. I don't think all hope is lost, however; it's just that the interesting part of the paper -- the evaluation of what is needed -- wasn't very revealing. For example, the Joe-E project to make a capability-safe version of Java is, for analogous desirable safety properties, doing a significant amount of the grunt work I suspect is useful for making OoO Java's analysis applicable.

The other 80% of Joe-E

I get the impression that there's a lot more to do if one really wanted to be able to hand Joe-E to a regular Java person in an attempt to convert them. Not that what has been done is worthless or anything, just that it is limited -- presumably because taming stuff that is already written is hard, since it wasn't written to be tamed!

An opposite approach to sequential transactional memory

Just saw a great talk by Emery Berger on Grace:

Grace is a software-only runtime system that eliminates concurrency errors for a class of multithreaded programs: those based on fork-join parallelism. By turning threads into processes, leveraging virtual memory protection, and imposing a sequential commit protocol, Grace provides programmers with the appearance of deterministic, sequential execution, while taking advantage of available processing cores to run code concurrently and efficiently. Experimental results demonstrate Grace's effectiveness: with modest code changes across a suite of computationally-intensive benchmarks (1–16 lines), Grace can achieve high scalability and performance while preventing concurrency errors.

Like the above project, the authors want sequential semantics to be the default for parallel programs. Their approach to ensuring this is almost diametrically opposed: optimistically fork each task as an mmap'd process that copies on write, and dynamically check that, under the fork/join task (transaction) order, process effects are properly ordered before accepting them. The interesting thing here seems to be a compelling runtime (deviating from DB-like and PL approaches): any logging, for the most part, is done through basic coloring, so the real expense is the initial copy-on-write while a task gets rolling. Granularity is handled by taking a page out of Cilk/TBB: just run sequentially after a certain point.
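A toy model of the commit side (in Java only for consistency with the rest of the thread; Grace itself is a native runtime built on fork and page protection): tasks would run speculatively against copy-on-write snapshots while recording read and write sets, and their effects are accepted in sequential task order, with conflicting tasks re-executed.

    import java.util.*;

    public class GraceSketch {
        // A speculatively-executed task's observed reads and buffered writes.
        record Result(Set<String> reads, Map<String, Integer> writes) {}

        static final Map<String, Integer> heap = new HashMap<>();

        // Commit results in sequential (fork) order; a task whose read set
        // overlaps earlier commits' writes must be re-executed (elided here).
        static void commitInOrder(List<Result> speculations) {
            Set<String> dirty = new HashSet<>();
            for (Result r : speculations) {
                if (!Collections.disjoint(r.reads(), dirty)) {
                    continue;  // conflict: Grace would roll back and re-run
                }
                heap.putAll(r.writes());            // accept this task's effects
                dirty.addAll(r.writes().keySet());
            }
        }
    }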