For Erlang history, you can't go past Bjarne Däcker's thesis on the development of Erlang (from the beginning up to 2000): http://www.ericsson.com/cslab/publications/bjarnelic.ps
Bjarne has been in charge of Ericsson's CSLab during the whole thing.
There's also a nice paper by Joe Armstrong, something like a "guerrilla guide to getting your stuff adopted in a large company", but I can't find the reference or remember the name. One of the most interesting bits of the Erlang history is how they managed to get their stuff actually used in products.
I also just found a lovely little quote from the first Erlang paper (see http://www.ericsson.com/cslab/publications.shtml):
In programming large systems, many small programming errors
will be made - we view this as inevitable. Formal systems and
exhaustive test procedures are currently not capable of ensuring fault
free software for systems of the size and complexity of modern
telecomms applications. Given that errors will be made, we are
interested in the problem of detecting and handling those errors in
such a manner that the system as a whole exhibits satisfactory
behaviour in the presence of errors.
This seems like a fairly novel attitude to me. Accept that any part of your program _could_ fail, use processes to isolate failures (Erlang's processes do this very well), and set up a general recovery process to handle arbitrary errors.
Of course not all errors manifest themselves as crashes, so you're never 100% safe.
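To make the isolate-and-recover idea concrete, here's a rough sketch in Python rather than Erlang, using OS processes as a stand-in for Erlang's lightweight processes. Everything in it is made up for illustration: the `supervise` function, the `max_attempts` parameter, and the worker script that fails twice before succeeding. It's not how Erlang's own supervision works, just the same shape: the supervisor doesn't care why the worker died, it only notices that it did and starts a fresh one.

```python
import subprocess
import sys

# Hypothetical worker, run in its own OS process so that a failure cannot
# corrupt the supervisor's state. It fails on attempts 1 and 2, then succeeds.
WORKER_SOURCE = """
import sys
attempt = int(sys.argv[1])
if attempt < 3:
    raise RuntimeError(f"simulated fault on attempt {attempt}")
print(f"ok on attempt {attempt}")
"""


def supervise(max_attempts=5):
    """Run the worker in a separate process, restarting it whenever it dies.

    Returns (attempt_number, worker_output) for the first successful run.
    """
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(
            [sys.executable, "-c", WORKER_SOURCE, str(attempt)],
            capture_output=True,
            text=True,
        )
        if result.returncode == 0:
            return attempt, result.stdout.strip()
        # Generic recovery: the cause of the crash is not inspected.
        # Any non-zero exit is handled the same way, by starting over.
    raise RuntimeError(f"worker still failing after {max_attempts} attempts")


if __name__ == "__main__":
    print(supervise())  # prints (3, 'ok on attempt 3')
```

Note that the sketch has exactly the blind spot mentioned above: a worker that exits cleanly with a wrong answer would sail straight through, because the supervisor only reacts to crashes.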