Effects on stability/exception handling of massively parallel programs

I was wondering if any research has been done on the effects of massive parallelism on reliability and/or exception handling. Intuitively, it seems that a massively parallel program is automatically more fault-tolerant, simply because errors are necessarily localized: a failure in one part would cause resource starvation rather than crashing the whole system.

Anyway, I wasn't sure whether anyone had published papers on this, whether my intuition is correct, or whether it's way off base.



Intuitively, I'd say that massively parallel programs are automatically more fragile, because every single node has a chance of failing for physical reasons (power outages, someone tripping over a power cable, overheating, a bad hard drive, etc.).

Of course, for well-chosen applications and using well-chosen technologies (I'm thinking of JoCaml), it is easy to write distributed programs which survive this kind of failure.

Not everybody agrees

Perusing the Clean lists, I came across some disagreement with Dr. Armstrong's paper.

Wow, what unrealistic criticism!

I don't know the author (Erik Z.) of those comments, but that's the kind of over-the-top criticism that makes people not want to deal with so-called academics. I was going to pull out some quotes to show how ridiculous they are, but it's too easy.

don't know the author

Do you know whether or not he is an academic?
I can see that Joe Armstrong has a homepage at the Swedish Institute of Computer Science - does that make him an academic :-)

Don't know for sure

Anyone who argues so strongly for purity and provable program correctness--to the point of dismissing Erlang for lacking in those areas--can't possibly have done any kind of software engineering :)

I mean, sure, we all want those things, but reality steps in and so we use Python and Lisp and C and so on, even though they aren't perfect languages. And quite often those languages have benefits that make them suitable for different application areas: C for problems that map directly to existing hardware, Python for the rich libraries, etc.

make up your own mind

I'm a little puzzled by the comment about Erlang being imperative but apparently the information was gathered at a beer-bust ;-)

The thesis is available on-line - there's no reason to rely on stale third-hand information.


That surprised me too. Erlang looks a lot more like Scheme or ML than C or Algol. Let's see: no loops, no destructive assignment, pattern matching, first-class functions; it looks more functional than imperative. As for being more OO than functional, I don't know where that came from either. I always thought that OO implied objects and mutable state. (You could view communicating processes as communicating objects, but then again you can view closures as objects too.)
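The "closures as objects" point can be made concrete with a short sketch (Python here purely for illustration; `make_counter` is a made-up name, not anything from Erlang or the thesis):

```python
# A closure capturing mutable state behaves much like a one-method object:
# the captured variable plays the role of a private instance field.
def make_counter():
    count = 0
    def increment():
        nonlocal count
        count += 1
        return count
    return increment

counter = make_counter()
print(counter())  # 1
print(counter())  # 2
```

Squint a little and `counter` is an object with one method and one private field, which is why the object/closure distinction blurs so easily.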

I agree with your sentiments regarding the thesis. It's right there in front of you; no need to let somebody else do your thinking. Read it and make your own decisions.

With the caveat...

that Erlang does have lots and lots of shared mutable state. It's just hidden in process state and mailboxes, where it can only be accessed in reasonably-but-not-completely safe ways. You can see where an FP purist would find that objectionable, as the imperative stuff forms the top-level architecture that the functional stuff is plugged into.

(I would also say that any strict language that so strongly encourages tail-recursive-and-never-returning "functions" basically has loops in all but the name, but that could just be me.)
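The "loops in all but name" point can be sketched like so (Python for illustration only; unlike Erlang, Python does not eliminate tail calls, so the recursive form would blow the stack on a long-running server, which is exactly why the equivalence matters):

```python
# An Erlang-style server "loop": a function that tail-calls itself
# with updated state on every message.
def loop_recursive(state, messages):
    if not messages:
        return state
    msg, rest = messages[0], messages[1:]
    return loop_recursive(state + msg, rest)  # tail call = next iteration

# The same control flow written as the loop it effectively is:
# the tail call's arguments become the loop variables' next values.
def loop_iterative(state, messages):
    for msg in messages:
        state = state + msg
    return state

print(loop_recursive(0, [1, 2, 3]))  # 6
print(loop_iterative(0, [1, 2, 3]))  # 6
```

With tail-call elimination, the two compile to essentially the same machine code, which is the sense in which a never-returning tail-recursive function is a loop wearing different syntax.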


that was meant to be a comment to Benjamin's note, but the wiki seems to have lost that fact.

The dirty secret of FP is...

...that recursion and threads can be used to encapsulate state. So FP purists who argue against such aren't really arguing against state per se. Rather, they are arguing against threads and messaging (I doubt they'd be arguing against recursion).
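That encapsulation can be sketched in actor style, with a thread owning its state and a queue serving as its mailbox (a Python illustration under assumed names like `counter_process`; Erlang processes are far cheaper than OS threads, but the state-hiding structure is the same):

```python
import queue
import threading

def counter_process(mailbox, replies):
    # The count lives in a local variable of the receive loop.
    # Nothing outside this thread can read or write it except
    # by sending a message -- state encapsulated via messaging.
    count = 0
    while True:
        msg = mailbox.get()
        if msg == "incr":
            count += 1
        elif msg == "get":
            replies.put(count)
        elif msg == "stop":
            break

mailbox, replies = queue.Queue(), queue.Queue()
t = threading.Thread(target=counter_process, args=(mailbox, replies))
t.start()
mailbox.put("incr")
mailbox.put("incr")
mailbox.put("get")
print(replies.get())  # 2
mailbox.put("stop")
t.join()
```

The state is every bit as mutable as an object's field; it's just that the only way to touch it is through the mailbox protocol, which is the "reasonably safe" access the parent comment describes.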