Effects on stability/exception handling of massively parallel programs

I was wondering if any research had been done on the effects of massively parallel programs on reliability and/or exception handling. Intuitively, it seems that a massively parallel program is automatically more fault-tolerant, simply because errors are necessarily localized and would therefore cause resource starvation rather than a fault that crashes the whole system.

Anyway, I wasn't sure if anyone had done any papers on it, if my intuitions were correct, or if they were way off base.

Intuitively

Intuitively, I'd say that massively parallel programs are automatically more fragile, because every single node has a chance of failing for physical reasons (power outages, someone tripping over the power cable, overheating, a bad hard drive, etc.).

Of course, for well-chosen applications and using well-chosen technologies (I'm thinking of JoCaml), it is easy to write distributed programs which survive this kind of failure.

Not everybody agrees

Perusing the Clean lists, I came across some disagreement with Dr. Armstrong's paper.

Wow, what unrealistic criticism!

I don't know the author (Erik Z.) of those comments, but that's the kind of over-the-top criticism that makes people not want to deal with so-called academics. I was going to pull out some quotes to show how ridiculous they are, but it's too easy.

don't know the author

Do you know whether or not he is an academic?
I can see that Joe Armstrong has a homepage at the Swedish Institute of Computer Science - does that make him an academic? :-)

Don't know for sure

Anyone who argues so strongly for purity and provable program correctness--to the point of dismissing Erlang for lacking in those areas--can't possibly have done any kind of software engineering :)

I mean, sure, we all want those things, but reality steps in and so we use Python and Lisp and C and so on, even though they aren't perfect languages. And quite often those languages have benefits that make them suitable for different application areas: C for problems that map directly to existing hardware, Python for the rich libraries, etc.

make up your own mind

I'm a little puzzled by the comment about Erlang being imperative, but apparently the information was gathered at a beer-bust ;-)

The thesis is available on-line - there's no reason to rely on stale third-hand information.

Yes.

That surprised me too. Erlang looks a lot more like Scheme or ML than C or Algol. Let’s see: no loops, no destructive assignments, pattern matching, first class functions; it looks more functional than imperative. As for more OO than functional, I don't know where that came from either. I always thought that OO implied objects and mutable state. (You could view processes communicating as objects communicating, but then again you can view closures as objects too).

I agree with your sentiments regarding the thesis. It's right there in front of you, so there's no need to let somebody else do your thinking: read it and make your own decisions.

With the caveat...

that Erlang does have lots and lots of shared mutable state. It's just hidden in process statuses and mailboxes, where it can only be accessed in some reasonably-but-not-completely safe ways. You can see where an FP purist would find that objectionable, as the imperative stuff forms the top-level architecture that the functional stuff is plugged into.

(I would also say that any strict language that so strongly encourages tail-recursive-and-never-returning "functions" basically has loops in all but the name, but that could just be me.)
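To make the "loops in all but the name" point concrete, here is a rough sketch in Python rather than Erlang. Both function names are made up for illustration, and a "stop" message is assumed so that the example terminates, whereas a typical Erlang server loop would just keep calling itself. Note also that Python does not eliminate tail calls, so the recursive form only works for short message lists.

```python
def serve_recursive(messages, total=0):
    """Handle one message, then call itself with the new state (tail call)."""
    if not messages:
        return total
    head, *rest = messages
    if head == "stop":
        return total
    # The "new state" is passed as an argument; nothing is reassigned.
    return serve_recursive(rest, total + head)


def serve_loop(messages, total=0):
    """The same behaviour written as an ordinary loop with a mutable variable."""
    for msg in messages:
        if msg == "stop":
            break
        total += msg
    return total


msgs = [1, 2, 3, "stop", 10]
print(serve_recursive(msgs), serve_loop(msgs))
```

The two functions compute the same thing for any message list; the only difference is whether the running state lives in an argument or in a reassigned variable.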

Weird

That was meant to be a comment to Benjamin's note, but the wiki seems to have lost that fact.

The dirty secret of FP is...

...that recursion and threads can be used to encapsulate state. So FP purists who argue against such aren't really arguing against state per se. Rather, they are arguing against threads and messaging (I doubt they'd be arguing against recursion).
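A minimal sketch of what I mean, in Python instead of Erlang. The names `counter`, `spawn_counter` and `ask` are hypothetical, and the message format (a tag plus an optional reply queue) is just an assumption for the example. No variable is ever reassigned, yet the thread behaves like an object with mutable state. Since Python has no tail-call elimination, this only works for a modest number of messages.

```python
import queue
import threading


def counter(mailbox, count=0):
    """Hypothetical counter 'process': its state lives only in the `count` argument."""
    msg, reply_to = mailbox.get()  # block until a message arrives
    if msg == "incr":
        return counter(mailbox, count + 1)  # recurse with the new state
    if msg == "get":
        reply_to.put(count)
        return counter(mailbox, count)  # recurse with unchanged state
    return None  # any other message (e.g. "stop") ends the process


def spawn_counter():
    """Start the counter in its own thread and return its mailbox."""
    mailbox = queue.Queue()
    threading.Thread(target=counter, args=(mailbox,), daemon=True).start()
    return mailbox


def ask(mailbox):
    """Send a 'get' request and wait for the reply."""
    reply = queue.Queue()
    mailbox.put(("get", reply))
    return reply.get(timeout=5)


mailbox = spawn_counter()
for _ in range(3):
    mailbox.put(("incr", None))
print(ask(mailbox))  # 3
mailbox.put(("stop", None))
```

From the outside, the mailbox is the only handle on the counter, and its observable value changes over time, which is exactly the kind of state the purists would object to.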