Distributed Functional Programming

First – a few words to describe the problem context.

Looking at Croquet architecture that proclaims as its ultimate goal to provide “truly large scale distributed computing platform, consisting of heterogeneous computing devices distributed throughout a planet-scale communications network” brought to my mind an old idea of Virtual Worlds on Internet. [see also my abstract of distributed objects in Croquet here]

It happened so that back then (about 10 years ago) I was taking part in creating such VRML (Virtual Reality Modeling Language) worlds at Paragraph when the idea of collaborative 3D Internet was very hot and seemed to be a real next “big thing”. Being all my programming life an “object-oriented” kind, I spent a lot of time on solving the problem of object state synchronization across REAL internet. By real network I mean network with unpredictable failures, delivery times and congestions conditions. Those days this and other problems / advantages of distributed objects were actively discusses at Distributed Coalition at Caltech, that I was a member of then and that today doesn’t exist any more, which is really sad.

Anyway, back to the old problem and its possible solution in realm of functional programming that I am looking for.

In distributed object systems similar to Croquet (virtual worlds on the net), one and the same real world object must be represented on several networked machines. This representation should contain both consistent object state and object visual image correctly reflecting this state. A simple example is an avatar – a “shadow” of the real world user. Avatars can form a work group that stands for real humans physically “distributed” on the net. When object state changes (a user start pacing the room) its visual image must reflect these changes consistently and “simultaneously” at ALL network nodes “AWARE” about this object.To create network-wide “awareness” some systems, including Croquet, create object “replicas” that exist on all nodes in one ore more groups. “Original” object controls all its replicas sending them messages. These messages may contain state notification events, state-change commands or “active objects” implementing some behavior.

To ensure consistent object state across all replicas Croquet uses two-phase commit protocol. In short object updates its state through the following steps:

1) object state is going to change
2) object informs all its replicas about proposed change
3) replicas reply indicating their decision to accept the change or reject it
4) object takes final decision to change the state or not
5) object sends its decision (change state or undo change) to all replicas
6) replicas receive final decision (instruction from “master” object)
7) replicas act according to “master” instruction

There are several problems with this protocol on real internet:

1) Scalability – system doesn’t scale well when number of replicas grows over hundred. As
result, the following parameters grow immensely:
a) network congestion
b) roundtrip time off all messages involved in two-phase commit protocol
c) as a result of the above - time delays between final state changes, which in turn results in unacceptable frequency of visual frame rendering

2) Reliability – some replicas may not receive final decision from muster object at all (see step 6 above). In this case on the node where replica failed to receive final instruction this object state becomes inconsistent with the rest of the world. Also local world on this node as a whole becomes inconsistent as well. To repair such node a sophisticated re-synchronization with the rest of the world is required. Croquet architecture document does not say anything about re-synchronization mechanisms.

To conclude: two-phase commit protocol is good enough for banking transactions on highly reliable networks but does not work for real internet !

Now back to my quest. I am new to functional programming with just some knowledge of
Scheme, that I like a lot and try to use for the real world problems similar to the one
described above. It is easy to mimic objects and replicas in Scheme with constructed
functional object (including master object environment at the moment of construction).
Something like that:

(define (mk-replica master-env)
  (lambda (remote-env) 
    (display master-env)
    (newline)
    (display remote-env)))

(define replica (mk-replica "master-object-env"))
(define (remote-test)
  (replica "local-replica-env")) 

Such “functional object replicas” can be sent as messages to remote nodes. Master object can then in turn send messages or other function-objects to these replicas to realize, for example, two-phase commit protocol.

Still the problem of state synchronization remains unsolved.

It looks like Glassgow Distributed Haskell
(GdH)
targets the same goal as Croquet with non-strict functional language approach.
In contrast to Croquet (living in Squeak environment) where distributed objects
communicate with messages, GdH uses “shared memory” abstraction to distribute
computations. I am not sure that GdH model fits well for the particular case of distributed, interactive virtual worlds described here. Nevertheless GdH I may be good for other forms of parallel computations.

As long as functional programming allows immutable, persistent, “pure functional” data structures it would be interesting to extend these for distributed systems similar to Croquet and other virtual worlds on the Internet (like described here).

Any ideas how to do Distributed Functional Programming with “pure functional” data
structures?

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Erlang

Do you know Erlang? It is the most used distributed functional programming language.

Concepts, Techniques and Models of Computer Programming

Concepts, Techniques and Models of Computer Programming is good in discussing both concurrency and distributed programming.

Re: CTM

wrwills: Concepts, Techniques and Models of Computer Programming is good in discussing both concurrency and distributed programming.

And imperative programming, and functional programming, and object-oriented programming, and logic programming, and constraint programming, and event-driven programming, and security, and modules, and...

Seriously, this is a mind-blowing book, the way "The Little LISPer" was when I first read it in high school, or SICP was when I first encountered it in college. But it covers many orders of magnitude more ground than its predecessors. This should be on literally everyone's shelves.

Distributed functional programming

Any ideas how to do Distributed Functional Programming with “pure functional” data structures?

Chapter 4 of CTM gives a good start. The idea is to extend
a functional language with dataflow variables. This allows you to
organize your program as concurrent entities (we call them "stream
objects") that send each other messages, while still keeping the
pure functional semantics (e.g., no race conditions). It works with
both eager and lazy evaluation, and works just as well in a
centralized or distributed setting (see chapter 11).

But after reading the rest of your message, my impression is that
you want something else, which is not necessarily purely
functional: a way to keep stateful objects as transparent as
possible and as distributed as necessary. There are many ways
to do this. You can find a good overview of the issues in the
talk I gave at a Belgian Aspect-Oriented Programming
workshop. Some of the techniques given in that talk are
implemented in the Mozart system.

But after reading the rest of

But after reading the rest of your message, my impression is that you want something else, which is not necessarily purely functional: a way to keep stateful objects as transparent aspossible and as distributed as necessary.

Yes I want something else, not just general-purpose frameworks and languages for parallel progrmming, like OZ, Erlang and Linda tuple space (that (Linda) I have some experienxce with and which certanly may help to get close to the goal of distributing state easily and efficiently).

Your phrase "a way to keep stateful objects as transparent as possible" is a good illustration of what I want, especialy I like the part "as possible"

This is a good understanding of the fact that in real life, real internet, in fact there is no such thing as "transperent stateful object". Stateful opbjects are not transperent at all! One may only want to find good enough mechanism to control this transperency in such a way that takes away some load of problems distributing state on the net from an application programmer, not usually an expert in this filed.

Nevertheless, this application programmer must be aware of the fact that application he is creating is in fact a distributed appliication with all implied limitations and consequences of this model.

And my questions here are:

1) What is an appropriate paradigm for distributed programming, better then stateful objects? I mean an abstraction that will allow application programmer to control distributed state (not neccessarily objects!) efficiently and define the scope of transperency as needed.

2)Can functional programming provide such an absrtaction and its efficient implementation?

3)Does it make sense to extend the abstarction of "pure functional data structures" (immutable, persitent data with history) to the network case?

So in fact I am looking more not for specific languages and frameworks here but to models and mechanisms that could be efficiently and convinently implemented as abstractions in functional languages.

Virtual Synchrony (VS)from Horus Project could be one of such mechanisms. I am not sure if VS was already explored in the context of some functional abstraction, though.

You also wrote:

The idea is to extend a functional language with dataflow variables. This allows you to organize your program as concurrent entities (we call them "stream objects") that send each other messages, while still keeping the pure functional semantics (e.g., no race conditions).

Sounds intresting. Do you have any papers on the "stream objects" available online? Or this is described in this book only?

Stream objects

Sounds intresting. Do you have any papers on the "stream objects" available online? Or this is described in this book only?

Well, it started in the book (Chapter 4 started as a "hole" -- a gap
in the computation models we knew, that we found while writing
the book) but you can look at the following article:

Declarative Laziness in a Concurrent Constraint
Language
, 2nd International Workshop on Multiparadigm
Constraint Programming Languages (MultiCPL 2003),
September 2003.

This article explains the theory underlying the general model
(which does both eager and lazy execution) but it doesn't
really say much about the programming techniques. For those,
you might want to check out the programming course given
by Seif Haridi and Christian Schulte based on the book.

I still recommend you see how it's explained in the book, though.

Syntax

Since you mentioned CTM...

Did you notice that all these great books rely on languages that have problematic syntax. Personally, I prefer the non-syntax of Scheme over CTM, but I guess this is just beacuse I am mor used to it.

(I hope I can be forgiven this little troll...)

What kind of problems?

Apart from being unusual to this C/Python programmer (me), I can't see why Oz's syntax is problematic. I'm in the inverse position: I like Oz over scheme because I'm not used to any of the two. And parentheses make me uneasy.

It's a matter of taste

I don't really want to argue this point, because its nothing more than a personal feeling.

I understand why people are uncomfortable with Scheme's parentheses. I was to. But I got used to them rather quickly.

I am not using Oz daily, so maybe I am mistkaen, but I seem to remember being put off by the syntax of guards.

More on syntax

A nippet of code from PvR's paper:

fun {Append Xs Ys}
  case Xs of X|Xr then
    X|{Append Xr Ys}
  else Ys end
end

Now, I know several programming languages, and this isn't the first time I see Oz code, and yet...

The curlies around the function signature is strange (even though I know Scheme...). Given the nature of the syntax, which resembles imperative languages, I'd expect only the parameters to be parenthesized.

The case statement introduces a binding, which is also a bit confusing. The order of reading isn't the natural order for me (X and Xr are the "subject" of this sentence, not Xs).

"end" is used both for "end fun" and "end case". I prefer to distinguish between different kinds of scope delimiters (unless you go for minimal syntax, like () in Scheme, {} in C, or whitespace in Python/Haskell - of which I am no great fan, but I can live with).

And oh, I really hate the keyword "fun". It's an English word, with quite a different meaning, and it distracts my reading.

All thse refelct my personal taste. So don't take this to be a flame against Oz. It's not.

De gustibus non disputandem est

*sigh*

Let me just give a few personal remarks in reaction to your detailed comments. The syntax of a language is always a delicate mix of principles, trade-offs, and practical issues. For me personally, syntax is not an issue unless it gets in the way of understanding the semantics. If it stays out of the way, like the Oz syntax does, then it is transparent. (It's like reading a book in translation: for me, reading a Dutch translation of The Hobbit is just a thin veneer over Tolkien's crisp English.) For Oz, we have put a lot of thought and practical experimentation in the syntax (as well as in the rest of the language). Upward compatibility with an existing syntax was not a design goal. On the contrary, it would mostly likely have given people the wrong intuition regarding semantics. Having a well-factored, simple, compositional syntax was much more important, especially considering how expressive the language is (see how short the syntax definition of appendix C is). For me, the Append example you give is very beautiful. Let me just this once reply to your specific remarks.

The curlies around the function signature is strange: Since parentheses were already taken for term syntax (tuples and records), curly brackets are a natural choice here.

The order of reading isn't the natural order for me: Well, the scope of the newly introduced X and Xr ranges over the 'then' clause, so the order is actually quite natural: match Xs against X|Xr. Doing it the other way would introduce no end of confusion (non-contiguous scopes of X and Xr).

"end" is used both for "end fun" and "end case": We experimented with other possibilities, and an 'end' at the end turned out to be simple, readable, and pleasing (for us, at least!).

I really hate the keyword "fun": See the subject line!

All these reflect my personal taste: This is your right. If everybody had the same taste, the world would be very boring.

Let me finish with a quote from Borges: "There is no good text that does not seem definitive if one turns to it a sufficient number of times." I would say that this applies to syntax as well!

OK

For me personally, syntax is not an issue unless it gets in the way of understanding the semantics.

I agree with this in general.

Since parentheses were already taken for term syntax (tuples and records), curly brackets are a natural choice here.

I wasn't complaining about the fact that you use curly barckets. I was referring to the fact that it's {Append Xs Ys Zs} and not Append{Xs Ys Zs}.

I didn't mean to make you *sigh*, sorry...

And for the sake of discussion..

I could point out at Norvig's publishing of Python and Java source code in addition to Lisp for his AI book.