Crystal Scheme: A Language for Massively Parallel Machines

Crystal Scheme: A Language for Massively Parallel Machines
Massively parallel computers are built out of thousands conventional but powerful processors with independent memories. Very simple topologies mainly based on physical neighbourhood link these processors. The paper discusses extensions to the Scheme language in order to master such machines. Allowing arguments of functions to be concurrently evaluated introduces parallelism. Migration across the different processors is achieved through a remote evaluation mechanism. First-class continuations offering some semantical problems with respect to concurrency, we propose a neat semantics for them and then show how to build, in the language itself, advanced concurrent constructs such as futures. Eventually we comment some simulations, with various topologies and migration policies, which enables to appreciate our previous linguistical choices and confirms the viability of the model.
Note the year of publication -1991. The bibliography goes back to 1980. My question might be naive, but what are major inventions in the area of hardware parallelism since then? What ideas comprising, e.g., Cell architecture were not known at 1991? Is Cell just an engineering achievement (assuming it is an achievement) or a scientific as well?

And, returning to PLT, can Crystal Scheme be useful for Cell?

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Related Paper

http://portal.acm.org/citation.cfm?id=319870

Connection Machine Lisp: fine-grained parallel symbolic processing

by Guy Steele and Daniel Hillis

topology

the Cell machine is really very similar to an architecture contemporary with this article, the CM-5, which consisted nodes containing 4 deeply piplined vector units with associated memories, a sparc control processor, and a network interface (except i haven't heard much about the network)

the parallel world has not changed very much in 15 years, except that the topologically sensitive distribution models are somewhat less important. clos networks (again used on the cm-5) have fairly flat access times across effectively arbitrarily large machines. the state of hardware has also advanced such that delays are close to being constrained by speed of signal propagation, and are within an order of magnitude of a local dram access.

the 'systolic' model (iWarp), which was in fashion then, set the tone for this paper if not the real content. it seems thoroughly dead. it required, or perhaps permitted, the compiler or programmer to map register transfers between ajacent nodes directly onto the processor mesh.

whats really happening is that the large/cheap systems are using very loosely connected processors with terrible message passing libraries. the programmer explicitly manages problem decomposition, intermediate information exchange, and collection of the final result.

but you can also get commodity SMP style machines that are quite a bit larger and more capable than 1991. given that architectures seem to have settled on the thread model for future evolution, i think this is where the interesting compiler/language work can be done. lots of homogenous threads with maybe some locality optimization depending on the cache architecture.

its also interesting to note that Steele's Fortress effort (and a parallel one called Chapel at Cray) are attempts to revive the notion that language can help manage the complexity of parallel programming, and that the compiler and runtime are in a good position to provide substantial performance gains underneath the language abstraction. the parallel languages available in 1991 weren't fantastic, but they were far more productive environments than the state of the art practiced today.