archives

Distributed Functional Programming

First – a few words to describe the problem context.

Looking at Croquet architecture that proclaims as its ultimate goal to provide “truly large scale distributed computing platform, consisting of heterogeneous computing devices distributed throughout a planet-scale communications network” brought to my mind an old idea of Virtual Worlds on Internet. [see also my abstract of distributed objects in Croquet here]

It happened so that back then (about 10 years ago) I was taking part in creating such VRML (Virtual Reality Modeling Language) worlds at Paragraph when the idea of collaborative 3D Internet was very hot and seemed to be a real next “big thing”. Being all my programming life an “object-oriented” kind, I spent a lot of time on solving the problem of object state synchronization across REAL internet. By real network I mean network with unpredictable failures, delivery times and congestions conditions. Those days this and other problems / advantages of distributed objects were actively discusses at Distributed Coalition at Caltech, that I was a member of then and that today doesn’t exist any more, which is really sad.

Anyway, back to the old problem and its possible solution in realm of functional programming that I am looking for.

In distributed object systems similar to Croquet (virtual worlds on the net), one and the same real world object must be represented on several networked machines. This representation should contain both consistent object state and object visual image correctly reflecting this state. A simple example is an avatar – a “shadow” of the real world user. Avatars can form a work group that stands for real humans physically “distributed” on the net. When object state changes (a user start pacing the room) its visual image must reflect these changes consistently and “simultaneously” at ALL network nodes “AWARE” about this object.To create network-wide “awareness” some systems, including Croquet, create object “replicas” that exist on all nodes in one ore more groups. “Original” object controls all its replicas sending them messages. These messages may contain state notification events, state-change commands or “active objects” implementing some behavior.

To ensure consistent object state across all replicas Croquet uses two-phase commit protocol. In short object updates its state through the following steps:

1) object state is going to change
2) object informs all its replicas about proposed change
3) replicas reply indicating their decision to accept the change or reject it
4) object takes final decision to change the state or not
5) object sends its decision (change state or undo change) to all replicas
6) replicas receive final decision (instruction from “master” object)
7) replicas act according to “master” instruction

There are several problems with this protocol on real internet:

1) Scalability – system doesn’t scale well when number of replicas grows over hundred. As
result, the following parameters grow immensely:
a) network congestion
b) roundtrip time off all messages involved in two-phase commit protocol
c) as a result of the above - time delays between final state changes, which in turn results in unacceptable frequency of visual frame rendering

2) Reliability – some replicas may not receive final decision from muster object at all (see step 6 above). In this case on the node where replica failed to receive final instruction this object state becomes inconsistent with the rest of the world. Also local world on this node as a whole becomes inconsistent as well. To repair such node a sophisticated re-synchronization with the rest of the world is required. Croquet architecture document does not say anything about re-synchronization mechanisms.

To conclude: two-phase commit protocol is good enough for banking transactions on highly reliable networks but does not work for real internet !

Now back to my quest. I am new to functional programming with just some knowledge of
Scheme, that I like a lot and try to use for the real world problems similar to the one
described above. It is easy to mimic objects and replicas in Scheme with constructed
functional object (including master object environment at the moment of construction).
Something like that:

(define (mk-replica master-env)
  (lambda (remote-env) 
    (display master-env)
    (newline)
    (display remote-env)))

(define replica (mk-replica "master-object-env"))
(define (remote-test)
  (replica "local-replica-env")) 

Such “functional object replicas” can be sent as messages to remote nodes. Master object can then in turn send messages or other function-objects to these replicas to realize, for example, two-phase commit protocol.

Still the problem of state synchronization remains unsolved.

It looks like Glassgow Distributed Haskell
(GdH)
targets the same goal as Croquet with non-strict functional language approach.
In contrast to Croquet (living in Squeak environment) where distributed objects
communicate with messages, GdH uses “shared memory” abstraction to distribute
computations. I am not sure that GdH model fits well for the particular case of distributed, interactive virtual worlds described here. Nevertheless GdH I may be good for other forms of parallel computations.

As long as functional programming allows immutable, persistent, “pure functional” data structures it would be interesting to extend these for distributed systems similar to Croquet and other virtual worlds on the Internet (like described here).

Any ideas how to do Distributed Functional Programming with “pure functional” data
structures?