archives

Distributed/Parallel language semantics

I am trying to work out the distributed or parallel semantics in a new language, and not making as much progress as I would like. I will present the main ideas that I would like to implement and request some feedback or relevant pointers.

1. Abstract state
Although that is not necessarily related to parallelism, I believe that may be important when coding in that context. The main structure of the language is the object, and an object can declare various abstract states. For example the built-in list and dictionary both have declared states "fullAccess" and "readOnly". The transition from fullAccess to readOnly corresponds to the method "const()". There is no reverse transition in that case.

Fields and methods of the object may pertain to a specific abstract state. Methods can also have a different body depending on the current abstract state. For example, there is no setter method in the readOnly state.

New states declared in subclasses can refine an inherited state, or they can be top-level (in which case the inherited state and the new state form the combined abstract state).

The reason I believe this is relevant for parallelism is that the transition from one state to the other can be used as a trigger for an action, in other words the action "waits" for the transition-- a form of lock.

2. Services
Services wait for requests and perform the requested actions. This is a multi-tiered system which has roughly the following components:

A. Capability
The capability is issued by the service and held by the client process/task. It is a special identifier that is recognised by the service when it is part of a request. The capability gives rights to do a specific set of requests, for example it could be the capability to open a file with read access.

B. Enqueuer
There is one enqueuer per process/task that maintains a queue of requests. The requests are queued locally until the queue is full. Once a request is enqueued, the enqueuer returns a handle to the client that gives access to the result of the request. The service picks requests from the queue as fast as it can process them.

C. Service
This is the main component. It collects requests from various clients according to an algorithm of its choice. Then it processes each request and sends back the result to the client.

D. Resource
Each service may manage any number of resources, which often correspond to objects of the OS. A typical resource is a file. A capability typically gives access to a set of methods on a specific resource. A resource can be directly manipulated by a service only.

3. Transactions
A transaction is a piece of code that is guaranteed to execute atomically, at least from the point of view of the shared variables it uses. There are two kinds of shared variables:

A. Simple shared variable
A simple shared variable can be modified by a single process/task at once, or read by multiple processes/tasks at once. That can be implemented by a "shared" service that manages the shared memory at the appropriate level (module, application, OS, network).

B. Versioned variable
A versioned variable can exist as multiple copies in different processes/tasks at once. Each copy point to a single, simple shared variable. At the beginning and/or the end of a transaction the versioned variable copies are synchronised with the shared copy.

In case of conflict between different versions of the variable a conflict handler is called. The conflict handler can implement the following strategies:

a) Fail
The changes of the transaction are rolled back: no change to a versioned variable is committed, and an exception handler is in charge of reverting other changes (simple shared variables, files etc).

b) Overwrite
A single version of the variable is selected and all others are discarded.

c) Merge
The handler attempts to merge all the disparate versions of the variable, attribute by attribute. A custom handler can be written to perform the merge provided a list of conflicting attributes.

Modifiers to the handler are "retries" and "timeout" (in case of conflict retry up to "retries" times after "timeout" is elapsed), "usenew" and "useold" (use the last modified of first modified version of the variable or attribute)

4. Parallel execution
Various procedures are launched in parallel, with the execution plan based on a dynamic dependency graph. This should be based on the work related to Athacaspan.

Many ideas, but I stumble on both theorical and implementation details (notably wrt versioned variables and abstract state). I would be grateful for some solid advice and material to look at.