Lambda the Ultimate

Distributed programming made easy
started 10/21/2003; 5:36:49 AM - last post 10/23/2003; 1:58:36 PM

Peter Van Roy - Distributed programming made easy

10/21/2003; 5:36:49 AM (reads: 791, responses: 7)

One of the major problems in programming language research is making it easy to write practical distributed applications. That is, applications that run well on distributed systems, even with all the vicissitudes of partial failure, security, naming and openness (finding the other side and connecting to it).

This is much more difficult than writing an application that runs on one machine (a "centralized" application, in our terminology). To find a true, long-term solution to this problem, the language should be part of the solution and not part of the problem. I.e., we have to design a new language. No long-term solution is possible with mainstream languages (Java, C#, ...) since their basic assumptions are wrong: they are stateful by default, they make it hard to program without state, and they use shared-state concurrency by default. State is hard because it implies a global view, and distributed systems don't have a global view by default.

A long-term solution should make distributed programming as close as possible to centralized programming, i.e., it should make the network as transparent as possible. This is nothing new; it is the standard approach for solving problems in CS! E.g., virtual memory means that memory size can be considered infinite to a first approximation, floating point means that numbers can be considered to be true reals to a first approximation, etc. The trick is handling the cases where the non-transparent bits peek through the abstraction boundaries.

To realize this solution, the first step is to design a language that makes network transparency easy and efficient. At the very least, the language should distinguish between stateless and stateful data and make stateless data the default. Stateless data are preferable because no coherence protocols are needed for them: they can be copied across the network. (We find that adding a third kind of data, single-assignment, is a good idea because it provides some of the abilities of state while keeping the language declarative and the coherence protocol simple.)
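A rough Python analogue may make the three kinds of data more concrete. This is only a single-process sketch under stated assumptions, not Oz or Mozart code: `Point`, `SingleAssignment`, and `Cell` are hypothetical names, and nothing here actually crosses a network. What it shows is which kinds could be copied freely between sites (immutable values, and single-assignment variables once bound) and which would need a coherence protocol (the mutable cell).

```python
import threading
from dataclasses import dataclass, FrozenInstanceError


# Stateless data: an immutable value. It can never change, so a copy
# sent to another machine is as good as the original.
@dataclass(frozen=True)
class Point:
    x: int
    y: int


class SingleAssignment:
    """Hypothetical single-assignment (dataflow) variable: it starts
    unbound, readers block until it is bound, and once bound it never
    changes, so from then on it can be copied like a value."""

    def __init__(self):
        self._bound = threading.Event()
        self._lock = threading.Lock()
        self._value = None

    def bind(self, value):
        with self._lock:
            if self._bound.is_set():
                # Loosely following Oz: binding again to an equal value is
                # accepted, binding to a different value is an error.
                if self._value != value:
                    raise ValueError("variable already bound")
                return
            self._value = value
            self._bound.set()

    def get(self, timeout=None):
        # Block until some thread binds the variable (or time out).
        if not self._bound.wait(timeout):
            raise TimeoutError("variable still unbound")
        return self._value


# Stateful data: a mutable cell. Copies on different machines would
# drift apart, so a distributed version needs a coherence protocol.
class Cell:
    def __init__(self, value):
        self.value = value


p = Point(1, 2)
try:
    p.x = 5
except FrozenInstanceError:
    print("Point is immutable")

v = SingleAssignment()
threading.Thread(target=v.bind, args=(42,)).start()
print(v.get(timeout=5))  # 42
```

The reader of `v` never sees a "before" and "after" value, only "not yet" and "forever 42", which is why a single-assignment variable needs a much simpler protocol than a cell.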

When we realized in 1995 that Oz has these three kinds of data, we decided to plunge into distributed programming research. (See "Programming Languages for Distributed Applications", available on the Mozart website, or see chapter 12 of CTM.) We are now at the second step: finding the right abstractions so that the non-transparent bits peek out in a simple way. For example, see the GlobalStore: it provides active fault tolerance and uses an optimistic transaction protocol to overcome network delays.
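The post doesn't spell out the GlobalStore's protocol, so the sketch below is not it. It only illustrates the generic idea of an optimistic transaction (optimistic concurrency control): read without locking, record the version of everything read, and at commit time check that those versions are unchanged, aborting and retrying otherwise. `VersionedStore`, `try_commit`, and `transact` are hypothetical names, and the store is a single in-memory dictionary rather than a replicated one.

```python
import threading


class VersionedStore:
    """Hypothetical in-memory store keeping a version number per key."""

    def __init__(self):
        self._data = {}  # key -> (version, value)
        self._lock = threading.Lock()

    def read(self, key):
        with self._lock:
            return self._data.get(key, (0, None))

    def try_commit(self, read_set, writes):
        """Apply `writes` only if every key in `read_set` still has the
        version the transaction saw; otherwise abort (return False)."""
        with self._lock:
            for key, seen_version in read_set.items():
                current_version, _ = self._data.get(key, (0, None))
                if current_version != seen_version:
                    return False
            for key, value in writes.items():
                version, _ = self._data.get(key, (0, None))
                self._data[key] = (version + 1, value)
            return True


def transact(store, update, max_retries=10):
    """Run `update` optimistically: read without holding locks, compute
    the writes, then validate at commit time and retry on conflict."""
    for _ in range(max_retries):
        read_set = {}

        def read(key):
            version, value = store.read(key)
            # Remember the first version seen, so a change between two
            # reads of the same key also causes validation to fail.
            read_set.setdefault(key, version)
            return value

        writes = update(read)
        if store.try_commit(read_set, writes):
            return writes
    raise RuntimeError("too many conflicts")


store = VersionedStore()


def deposit(read):
    balance = read("acct") or 0
    return {"acct": balance + 10}


transact(store, deposit)
print(store.read("acct"))  # (1, 10)
```

The appeal on a high-latency network is that nothing is locked while the transaction computes; the cost is wasted work when conflicts are frequent.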

Ehud Lamm - Re: Distributed programming made easy

10/21/2003; 7:01:32 AM (reads: 780, responses: 0)

Seems to me that from a programmer's point of view, stateless data is almost an oxymoron. Making it the default sounds very confusing...

Peter Van Roy - Re: Distributed programming made easy

10/21/2003; 8:50:17 AM (reads: 731, responses: 1)

Stateless data (a.k.a. "values") is the bread and butter of functional programming. There's really no problem in making it the default.

Ehud Lamm - Re: Distributed programming made easy

10/21/2003; 10:12:33 AM (reads: 696, responses: 0)

I thought it was one of the reasons people have trouble with functional programming...

Anyway, I simply misunderstood what you mean by stateless data. Immutable is a better word, isn't it?

Ehud Lamm - Re: Distributed programming made easy

10/21/2003; 2:29:15 PM (reads: 597, responses: 0)

Two more questions.

1) Do you have any evidence that the approach you advocate really makes complex large-scale distributed systems easier to produce? Aside from Erlang, of course...

2) Are you familiar with the Ada model (both for concurrency, and for distribution, the so-called Annex E of the Ada95 Reference Manual)? What's your take?

Ada tasks are not lightweight, and thus massively concurrent systems require some thought (thread pools and the like). Still, simply making objects active, or concurrency-aware (guarded forms, for example), is very easy and natural.
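For readers unfamiliar with guarded forms: an Ada protected entry carries a barrier (e.g. `when Count > 0`), and a caller waits until the barrier is true. The sketch below is a rough Python analogue using `threading.Condition.wait_for`, not Ada; `BoundedBuffer` is a hypothetical example, and Ada's rules for when barriers are re-evaluated and queued callers are served differ in detail.

```python
import threading
from collections import deque


class BoundedBuffer:
    """Rough analogue of a protected object with guarded entries:
    each operation waits until its guard (barrier) holds."""

    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity
        self._cond = threading.Condition()

    def put(self, item):
        with self._cond:
            # Guard: wait while the buffer is full.
            self._cond.wait_for(lambda: len(self._items) < self._capacity)
            self._items.append(item)
            self._cond.notify_all()  # wake callers whose guard may now hold

    def get(self):
        with self._cond:
            # Guard: wait while the buffer is empty (cf. "when Count > 0").
            self._cond.wait_for(lambda: len(self._items) > 0)
            item = self._items.popleft()
            self._cond.notify_all()
            return item


buf = BoundedBuffer(capacity=2)
consumer = threading.Thread(target=lambda: print([buf.get() for _ in range(5)]))
consumer.start()
for i in range(5):
    buf.put(i)  # blocks whenever the buffer already holds 2 items
consumer.join()
# prints [0, 1, 2, 3, 4]
```

The caller of `get` or `put` never checks the condition itself; the guard lives with the object, which is what makes the style feel natural.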

Peter Van Roy - Re: Distributed programming made easy

10/23/2003; 1:20:10 AM (reads: 391, responses: 1)

1) Do you have any evidence that the approach you advocate really makes complex large-scale distributed systems easier to produce? Aside from Erlang, of course...

Why isn't the Erlang evidence good enough? Take the Ericsson AXD 301 (>1 million lines of Erlang, market leader in its niche, availability of 99.9999999% according to Ericsson). Joe Armstrong is finishing up a Ph.D. thesis now (yes, after many years in industry, it is possible!). I believe it will be available from KTH in December 2003. He talks about the AXD 301 and a few other products done with Erlang.

Here's a little nugget of evidence: a comparison of a simple distributed producer/consumer in Java and Mozart.
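The linked comparison isn't reproduced here, but a Mozart producer/consumer is commonly written with a stream: a list whose tail is an unbound dataflow variable that the producer binds one element at a time while the consumer reads it. Below is a single-process Python sketch of that idea using threads; `DataflowVar`, `produce`, and `consume` are hypothetical names, and no distribution is involved.

```python
import threading


class DataflowVar:
    """Minimal hypothetical single-assignment variable: readers block
    until it is bound, and it can be bound at most once."""

    def __init__(self):
        self._ready = threading.Event()
        self._lock = threading.Lock()
        self._value = None

    def bind(self, value):
        with self._lock:
            if self._ready.is_set():
                raise ValueError("already bound")
            self._value = value
            self._ready.set()

    def get(self):
        self._ready.wait()
        return self._value


def produce(n, stream):
    """Bind the stream incrementally: each cell holds (item, rest), and
    None marks the end, roughly like an Oz list 0|1|...|nil."""
    for i in range(n):
        rest = DataflowVar()
        stream.bind((i, rest))
        stream = rest
    stream.bind(None)


def consume(stream):
    """Walk the stream, blocking on each unbound tail until the producer
    binds it; returns the sum of the items."""
    total = 0
    cell = stream.get()
    while cell is not None:
        item, rest = cell
        total += item
        cell = rest.get()
    return total


head = DataflowVar()
threading.Thread(target=produce, args=(5, head)).start()
print(consume(head))  # 10
```

No locks or queues appear in the producer/consumer logic itself; synchronization comes entirely from reading variables that are not yet bound.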

2) Are you familiar with the Ada model (both for concurrency, and for distribution, the so-called Annex E of the Ada95 Reference Manual)? What's your take?

I'll take a look and get back to you on this.

Ehud Lamm - Re: Distributed programming made easy

10/23/2003; 2:31:40 AM (reads: 398, responses: 0)

The Ada distributed model is fairly naive, but the concurrency features (tasks and protected objects) are quite expressive and are in widespread use.

The Ada real-time working group works hard to enhance these features, making them even more expressive while keeping them as efficient as possible for real-time purposes.

Chris Rathman - Re: Distributed programming made easy

10/23/2003; 1:58:36 PM (reads: 339, responses: 0)

Two other languages I've looked at that try to make distributed programming a bit more accessible are Obliq and E. IIRC, Obliq uses a modified mutex scheme to achieve concurrency, while E makes heavy use of events.