Oxymoronic? "Safety-critical development guidelines for real-time Java"

Came across the PERC Raven Java PDF advertising material while looking for other real-time GC implementations. I'm not sure I want Java near my nuclear reactors, but food for thought nevertheless.

(Scroll down to the second page of the PDF to see the guidelines.)


There's been work on RT java

There's been work on RT Java for quite a while now. Some links were posted here in the past (for example). Some of the work is actually quite interesting.

Personally, I find java terrifying even away from nuclear reactors, but that's just me I guess...

becomes a chance to plug Scala

One could avoid distasteful Java and write Scala source code - my dream is that, plus running it on game hardware via something like their related "PERC Pico" small-and-fast Java implementation. (As an aside, PERC just seems like a bad name - it reminds me either of the old, old computers, or nasty hazardous dry-cleaning chemicals.)

Excuse me for saying so, but

Excuse me for saying so, but isn't that an inflammatory statement? I thought we weren't supposed to do that, especially on LtU.

Anyway, is there any real reason why Java is generally terrifying?

Is there a magic way that makes programs in other languages (Haskell, ML, etc) at nuclear reactors safe? I don't think so.

Real time

Is there a magic way that makes programs in other languages (Haskell, ML, etc) at nuclear reactors safe?

Not sure why you connect these FP languages with what Ehud is saying? He's probably more likely to be talking about a language whose name is a palindrome.

Keeping in Mind That Safety is Relative...

Achilleas Margaritis: Anyway, is there any real reason why Java is generally terrifying?

Absolutely. Far too often (but less with Java 5), it's necessary to write unsafe code using instanceof, downcasts, etc. to get real work done. The language doesn't have a formal semantics, which might not be a "real reason" it's terrifying, but the sheer size and complexity of the current language makes that scary. And, as I've noted before, I manage to crash the JVM relatively frequently—I'd say, on average, once a month. Hardly the kind of reliability rate you'd want for a nuclear reactor.

Achilleas Margaritis: Is there a magic way that makes programs in other languages (Haskell, ML, etc) at nuclear reactors safe? I don't think so.

Magic, no. The difference between a sound type system and an unsound one, yes.

Assuming that you stay away from the various Unsafe operations (which are named so, by the way) in Haskell, the Obj module in O'Caml, and so on, those alternatives really are vastly safer even than Java, which is vastly safer than C/C++. Better still, for something like nuclear plant operation, a language like Haskell, Standard ML, or O'Caml is at least amenable to being proven trustworthy by formal methods, e.g. the use of a theorem prover in conjunction with a formal semantics of (at least a reasonable subset of) those languages (see Djinn, Coq, Twelf, MetaPRL, HOL Light...). Nuclear plant operation is very definitely an application where you want to pay attention to even epsilons of difference in reliability, and as a professional Java programmer for about the past decade or so, I can tell you, you really do not want Java running a nuclear plant. I'd be happy with O'Caml running a plant. I'd be even happier with the subset of O'Caml that's covered by MetaPRL running the plant, assuming that the code had been proven correct with MetaPRL.

Different aspects of safety

At the low level, response times can be a critical factor, so you want a PL that has an abstraction that is close to the metal. This is not to say that Haskell & OCaml are slow (though the perceptions can be skewed on this subject), but most any language still requires crafting around real time response. Of course, limiting the language to a subset, as you mentioned here and others mentioned in another recent thread, is one way to limit your exposure.

The other factor that gets involved is the wealth of libraries for your particular domain. If all the vendors are writing their stuff in PL-X, then you can draw on their resources in terms of specifications and tests. So even if PL-X might not be ideal, the abundance of tried and tested off-the-shelf solutions can be the deciding point. Again, this is not to say that Haskell & OCaml don't have a wealth of libraries.

From my personal experience, I've written software in the past that excited sources to measure hydrocarbons via a pulsed neutron tool - it put out a fairly high dose if you got anywhere near the cloud, though the half-life was subsecond. I've also dealt with software that attempted to make setting off explosives a bit safer (as an aside, a person was killed, prior to my joining the project, when hand signals were mixed up - no software involved, as it was all analog switches at the time). My opinion on the subject was that software should be the second line of defense, with the hardware being the primary. So safety started with the EEs. This meant that the software had to map fairly close to the hardware and work its way out from there.

Of course, the problem with my theory is that EEs are getting to be more and more like software engineers, as the hardware becomes programmable.

Circuits vs programming?

Of course, the problem with my theory is that EEs are getting to be more and more like software engineers, as the hardware becomes programmable.

I've run across this sentiment a lot, the idea that electrical engineers know how to "engineer" in a way that leads to reliability and reuse, that doesn't lead to a "tarpit" or something similar to the "software crisis".

When I took a course in circuit design in college, it reminded me a lot of programming. I'm having a hard time trying to understand why circuit design is often considered tractable, and programming applications/systems is not.

Does anybody know of a paper or online summary that explains the issues succinctly?

Do software engineers exist?

I'm having a hard time trying to understand why circuit design is often considered tractable, and programming applications/systems is not.

Not to get too OT -- but it really comes down to culture, perception, and behavior. "Tractability" really refers to how well the behavior of the finished product is understood, and this has to do with the complexity of the system. You can bet that a commodity PC motherboard has just as many dark corners as a complex piece of software (because it is composed of many cooperating abstract components, just like software).

You might browse the links at the bottom of the Wikipedia software engineering article for more punditry. Also see Alan Kay's Andy Warhol theory. Basically, there are still some who believe software people should be "developers", not "engineers" -- this will probably continue as long as there are software developers that do not act like engineers, and managers that do not treat software development like engineering.

Instanceof unsafe?

Far too often (but less with Java 5), it's necessary to write unsafe code using instanceof, downcasts, etc. to get real work done.

I'm sorry, I can understand downcasts, but why do you think instanceof is unsafe? My only beef with it would be that it doesn't introduce an implicit downcast as in Nice - is that it?

Briefly...

...you can think of instanceof as being a bit like a typecase, but without the rest of the switch-like syntax and semantics. That is, its use is like:

if (x instanceof SomeClass)
{
    // Do this
}
else if (x instanceof OtherClass)
{
    // Do that
}
else ...

The danger is, as usual, in the ellipsis. The compiler can't help you know whether you've done exhaustive case analysis or not. Of course, in such a construct, you're going to want to have a final "else" that probably is going to throw an exception, but that's the point: that exception could be seen by your user instead of you.

Of course, it should also be noted that these kinds of structures are bad code anyway: on one hand, they're a poor man's variant type, so you might want to consider using a language that actually supports variant types. But even if you're stuck with Java, this sort of thing is what the Visitor pattern, a somewhat less poor poor man's variant type, is for.
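
To make that concrete, here's a minimal sketch of the Visitor version, with made-up Expr/Literal/Add names (nothing from any particular codebase). Adding a new case to the visitor interface forces every implementation to handle it, which is exactly the exhaustiveness check the instanceof chain can't give you.

// A tiny expression hierarchy encoded with the Visitor pattern.
interface ExprVisitor<R> {
    R visitLiteral(Literal e);
    R visitAdd(Add e);
}

abstract class Expr {
    abstract <R> R accept(ExprVisitor<R> v);
}

class Literal extends Expr {
    final int value;
    Literal(int value) { this.value = value; }
    <R> R accept(ExprVisitor<R> v) { return v.visitLiteral(this); }
}

class Add extends Expr {
    final Expr left, right;
    Add(Expr left, Expr right) { this.left = left; this.right = right; }
    <R> R accept(ExprVisitor<R> v) { return v.visitAdd(this); }
}

// An evaluator: if ExprVisitor grows a new case, this class no longer
// compiles until the new case is handled - no silent fall-through.
class Eval implements ExprVisitor<Integer> {
    public Integer visitLiteral(Literal e) { return e.value; }
    public Integer visitAdd(Add e) {
        return e.left.accept(this) + e.right.accept(this);
    }
}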

Either that, or use

Either that, or use polymorphism.

EDIT: Nevermind, just realized Paul mentioned the visitor (anti)pattern, which leads me to believe they're talking about double dispatch :) You don't necessarily need variant types, just a language which has dynamic dispatch based on function arguments.
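
For the plain single-dispatch version of "use polymorphism", the sketch is even shorter (reusing the made-up SomeClass/OtherClass names from above, plus a hypothetical Base class): push the behaviour into the classes and the instanceof ladder collapses into one virtual call.

abstract class Base {
    abstract void doWork();   // each subclass carries its own "case"
}

class SomeClass extends Base {
    void doWork() { /* Do this */ }
}

class OtherClass extends Base {
    void doWork() { /* Do that */ }
}

// The whole if/else-if chain becomes:
//     x.doWork();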

Other recent "safety-critical" story

Just to tie this story to the other recent story we've had on "safety-critical" coding guidelines, note that NASA/JPL (with which Gerard Holzmann, author of the other article, is affiliated) has been looking at using RT Java in spacecraft flight software for several years now. Of course, C is still the dominant language for flight software at JPL (and many other spacecraft builders), which is why Holzmann's article primarily addressed C.

It's also interesting to note that Ada was a popular language for flight software development, but has fallen out of favor in the last decade or so. My (admittedly limited) understanding is that the fall from grace of Ada was partly a result of difficulties in hiring programmers trained in Ada, and partly due to a decision by the US Department of Defense to abandon military standards in favor of letting contractors do what they thought was appropriate.

Since I'm discussing languages for flight software here, I might as well also provide a pointer to Erann Gat's interesting account of the rise and fall of Lisp at JPL.

More safety-critical Java

This presentation by David Bacon, one of the guys behind IBM's Metronome RT collector, includes some info related to these issues.

I'll summarize a bit: it points out that the traditional methodologies for realtime systems relied on "highly restricted programming models with verifiable properties, and/or low-level languages for explicit control," combined with "ad-hoc low-level methods with validation by simulation and prototyping." But none of this scales well, particularly to more complex "integrated multi-level networked systems".

Java becomes a viable solution if some of its weaknesses can be addressed, particularly garbage collection performance.

The seventh slide starts out with this: "Surprise early adopter: defense industry. Slow, painful death of Ada [due to] lack of programmers". (I hope Ehud won't shoot the messenger!) It goes on to mention other issues driving defense industry adoption of Java for real-time systems, including the desire to use COTS systems, and the "longevity of systems: aircraft carrier, air traffic control." At the end of that slide, it mentions "IBM Real Time Java selected by Raytheon for DD(X)" [a U.S. Navy program covering several classes of advanced technology ships].

To follow up on that last point, see Java Becomes Entrenched as Language of Choice [for new military software development projects]. It mentions DD(X) and the "thousands" of Java developers working on it, as well as Java's use in UAV systems, some of which include "intelligent weapon systems".

Finally, just to drive home the point (think of it as carpet-bombing with links), Java for Safety Critical Applications gives a link to another paper on the subject, which was presented at the Space 2005 conference last August, by a Raytheon software architect.

So if safety-critical Java is an oxymoron, there are going to be a lot of embarrassed project managers at some point. Seriously, though, other than garbage collection, which seems to be well under control, it's hard to see what fundamental issues would prevent Java from being suitable in these cases. The reliability of Java is directly related to the reliability of the JVM, and these projects aren't downloading free JVMs from the net, they're using systems that have had a lot of resources plowed into optimizing and validating them. This approach has many advantages compared to what's gone before — does "safety-critical real-time C" really give you a comfortable, safe feeling by comparison?

Re: oxymoron

I don't have any experience with really safe systems, but on the face of it the most impressive stuff was the Praxis SPARKAda work. What little I'd heard about anything else in the world led me to figure that C, Java, Haskell - whatever you pick - wouldn't really be as safe as one might wish. [Edit: of course, a lot of why Praxis is successful is methodology rather than just having some magic-bullet runtime, so in theory that methodology would be applicable to other languages.]

Well you can apply the same

Well, you can apply the same sort of methodology as Praxis does and use Java: there's JML and related tools, which provide capabilities for formal annotation and extended static checking of Java similar to what SPARK provides for Ada. Of course, JML and the tools built to make use of it aren't quite as rigorous as what SPARK provides. In particular, SPARKAda uses only a subset of Ada to ensure that all static checks can be properly resolved, whereas I'm aware of no specification for a subset of Java that would provide similar guarantees of soundness and completeness for ESC/Java2 using JML.

Still, the point is that there do exist tools to help make your Java code safer, and to provide some support for the sort of methodology that Praxis uses.
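
For anyone who hasn't seen JML: the annotations live in specially-marked comments, so a plain compiler ignores them while ESC/Java2 can check them statically. A made-up example (hypothetical names, not taken from the JML or Praxis material):

public class Valve {
    private /*@ spec_public @*/ int position;   // percent open

    //@ public invariant 0 <= position && position <= 100;

    //@ requires 0 <= target && target <= 100;
    //@ ensures position == target;
    public void setPosition(int target) {
        position = target;
    }
}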

And more applications of Real-Time Java

I recently saw a presentation on the use of RTJ on the ScanEagle UAV project, presented at RTAS 06. See Real-Time Java: An Experience Report for some details; it references the RTAS paper and other documents.

It gives details of implementing a JVM to meet the Real-Time Specification for Java and how they coped with memory management issues, etc.
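
For those who haven't looked at the RTSJ, its memory-management story revolves around scoped and immortal memory areas rather than the garbage-collected heap. A rough sketch of the idea (scheduling and release parameters omitted; the names come from my reading of the javax.realtime API, so treat the details as approximate):

import javax.realtime.LTMemory;
import javax.realtime.RealtimeThread;

public class ControlLoop {
    public static void main(String[] args) {
        // A scoped memory area with linear-time allocation: objects
        // allocated inside enter() are reclaimed when the scope is
        // exited, so the loop below never waits on the garbage collector.
        final LTMemory scope = new LTMemory(64 * 1024, 64 * 1024);

        new RealtimeThread() {
            public void run() {
                for (;;) {
                    scope.enter(new Runnable() {
                        public void run() {
                            // read sensors, compute, command actuators;
                            // temporaries vanish when enter() returns
                        }
                    });
                }
            }
        }.start();
    }
}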

I would be surprised to hear about Real-Time Java being used in something like a civil aviation project at the moment, but I believe projects such as those mentioned above mean it will be in years to come. The number of Java programmers far outweighs the number of Ada programmers, and Java provides more modern programming constructs than C. (Whether these features (i.e. OO) are useful in real-time embedded applications is an open debate, but I believe that C++ was used on the JSF, so there seems to be a demand.)

Debuggability

I wonder if the remote debugging properties of an interpreted language are any different from those of a compiled language. OTOH, you have access to the stack frames, runtime information, exceptions, etc. - but how do you handle SEUs? I recall several heroic stories of engineers having to patch their code to work around a memory cell that was the victim of a cosmic ray hit. Wouldn't this kind of bare-metal diagnosis be more difficult if you also have to debug the interpreter, JIT, etc.?

[edit]I was finally able to DL the paper -- seems that most spaceflight software recovers from SEUs by rebooting, the faster the better :) I still wonder how corruption of persisted data is handled.

SEUs

I recall several heroic stories of engineers having to patch their code to work around a memory cell that was the victim of a cosmic ray hit.

I am skeptical of this claim. SEUs are, by their nature, transient. Writing a new value to an upset memory cell will restore functionality. In fact, there's been some interesting research by Surrey Satellites in the UK that shows that the failure rate of processors susceptible to SEUs depends on the type of software running on the processor, and specifically on whether the software is likely to read erroneous values from upset registers before those values are overwritten.
...seems that most spaceflight software recovers from SEUs by rebooting...

It depends on the type of system you're running on, and how the error is detected. Some systems run three processors in lock-step (so-called "triple-modular redundancy") and majority-vote on what to do. Faulty processors are restarted. Other systems simply use one processor to watchdog another, and switch control to the backup processor when an error is detected (the faulty processor is restarted and becomes the new watchdog). Some processors handle SEU detection and recovery at the hardware level (via self-checking logic, and careful choice of fabrication processes).

I still wonder how corruption of persisted data is handled.

That depends on what kind of data is being persisted. Flight software images which aren't likely to change are often carried in redundant PROMs (although that's less to deal with SEUs than with other problems). Mission data may be stored in SEU-immune memory of some kind (e.g. disks or tapes). Data in SEU-susceptible memory is typically protected either by error-correcting codes (either in hardware or software), or by majority voting on redundant memory cells (which is effectively a brute-force error-correcting code). It's not uncommon for spacecraft to perform regular "memory scrubbing", in which the entire memory is read and rewritten (correcting errors as it goes). This scrubbing has to be performed frequently enough that the accumulated number of upset-induced errors doesn't overwhelm the error-correcting capability of whatever code is being used.

Finally, to bring this back in the direction of programming languages, there's been some interesting work at Stanford on "Software-Implemented Hardware Fault Tolerance". This involves transparently inserting various fault-detection and recovery measures (such as duplicate code, and signatures on control branches) into a program during the compilation process. Preliminary on-orbit tests on the ARGOS project seem to show that SIHFT software running on a COTS processor provides a level of reliability almost as good as that provided by expensive radiation-hardened processors. An overview of the SIHFT work can be found here. A slightly different approach suggested by spacecraft manufacturer AeroAstro is to create objects that include error-correction code, and use operator overloading to make these error-correcting objects transparently available to programmers.
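
Java obviously lacks the operator overloading mentioned above, so there the idea would look more like a wrapper class. A rough sketch of the general technique (hypothetical, not AeroAstro's actual design): keep redundant copies, majority-vote on every read, and rewrite the copies when they disagree.

// Keeps three redundant copies of a value, majority-votes on read,
// and repairs (a miniature "scrub") any copy that disagrees.
public class VotedInt {
    private int a, b, c;

    public VotedInt(int value) { set(value); }

    public void set(int value) {
        a = value; b = value; c = value;
    }

    public int get() {
        int winner = (a == b || a == c) ? a : b;   // 2-of-3 vote
        if (a != winner || b != winner || c != winner) {
            set(winner);                           // rewrite the upset copy
        }
        return winner;
    }
}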

Ada

We're doing air-traffic control software in C/C++. Due to a flash of responsibility and the upcoming discussion about Ada, I thought I'd give it a look - and what struck me immediately was the COBOL-esque verbosity. Yes, the C family of languages definitely has its quirks, but I've come to like its compactness. I sometimes feel that I'm not alone...

Cobol is probably the wrong point of comparison

Algol and Pascal would probably be a better point of comparison.

If I were going to write a

If I were going to write a safety-critical system, I am sure the compactness of the syntax would be my top priority. Why, I ask you, should one bother about clear semantics, a sound (and expressive) type system, tons of best practices and tools, when you can have terse syntax instead?

In the end the clarity

In the end, the clarity provided by slightly more verbose but well-thought-out syntax is actually a significant plus. As was mentioned here earlier, there was a great talk by Robert Dewar pointing out how surprisingly significant an impact little things like a focus on readability can have. It's also a lot less work than you might think once you get used to it.