Parallel/Distributed

The Transactional Memory / Garbage Collection Analogy

Courtesy of my shiny new commute, I have been listening to various podcasts, including Software Engineering Radio. A while back, they had an interview with Dan Grossman on his OOPSLA 2007 paper, which I have not seen discussed here.

The Transactional Memory / Garbage Collection Analogy is an essay comparing transactional memory with garbage collection based on the analogy:

Transactional memory (TM) is to shared-memory concurrency
as
garbage collection (GC) is to memory management.

Grossman presents the analogy as a word-for-word transliteration between a discussion of one technology and a discussion of the other. (Hence the "fun" category.)

(As an aside, Grossman does not address message-passing, but says, "Assuming that shared memory is one model we will continue to use for the foreseeable future, it is worth improving," which is probably a correct assumption.)

One point that he does make is that

[This essay] will lead us to the balanced and obvious-once-you-say-it conclusion that transactions make it easy to define critical sections (which is a huge help in writing and maintaining shared-memory programs) but provide no help in identifying where a critical section should begin or end (which remains an enormous challenge).

The one serious weakness of the analogy, to me, is that GC does not require (much) programmer input to work, while TM does.

Although some parts of the analogy are strained, there are some interesting correspondences.
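
To make the critical-section point concrete, here is a minimal sketch using Haskell's STM as a stand-in for TM in general; the example and names are mine, not Grossman's. The transaction makes the two-account update atomic for free, but nothing tells us whether this is the right boundary, e.g. whether an overdraft check or a whole batch of transfers belongs inside it:

    import Control.Concurrent.STM

    -- Move money between two shared accounts inside one transaction.
    transfer :: TVar Int -> TVar Int -> Int -> STM ()
    transfer from to amount = do
      fromBal <- readTVar from
      writeTVar from (fromBal - amount)
      toBal <- readTVar to
      writeTVar to (toBal + amount)

    main :: IO ()
    main = do
      a <- newTVarIO 100
      b <- newTVarIO 0
      -- Atomicity is guaranteed; choosing the transaction boundary is still on us.
      atomically (transfer a b 30)
      mapM_ (\t -> readTVarIO t >>= print) [a, b]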

Twilight of the GPU

This interview with Tim Sweeney discusses his prediction that graphics rendering will move from special-purpose GPUs back to the CPU:

I expect that in the next generation we'll write 100 percent of our rendering code in a real programming language—not DirectX, not OpenGL, but a language like C++ or CUDA. A real programming language unconstrained by weird API restrictions. Whether that runs on NVIDIA hardware, Intel hardware or ATI hardware is really an independent question. You could potentially run it on any hardware that's capable of running general-purpose code efficiently.

This is driven by the development of cheap multi-core CPUs. Consumers might still buy graphics boards, but they'll be fully programmable multi-core devices:

Intel's forthcoming Larrabee product will be sold as a discrete GPU, but it is essentially a many-core processor, and there's little doubt that forthcoming Larrabee competitors from NVIDIA and ATI will be similarly programmable, even if their individual cores are simpler and more specialized.

How are we going to program these devices? NVIDIA showed the data-parallel paradigm was practical with CUDA (LtU discussion). Now, Tim asks:

...can we take CUDA's restricted feature set—it doesn't support recursion or function pointers—and can we make that into a full C++ compatible language?... From my point of view, the ideal software layer is just to have a vectorizing C++ compiler for every architecture.

The FP lover in me says that data-parallel + C++ = the horror! However, I can appreciate that it is a practical solution to a pressing problem. What do you think? Is Tim right? Is a data-parallel extension to C++ the right way to go? Or is the next version of Unreal going to be written in FRP?

A Framework for Comparing Models of Computation

A Framework for Comparing Models of Computation by Edward A. Lee and Alberto Sangiovanni-Vincentelli, 1998.

We give a denotational framework (a “meta model”) within which certain properties of models of computation can be compared. It describes concurrent processes in general terms as sets of possible behaviors. A process is determinate if, given the constraints imposed by the inputs, there are exactly one or exactly zero behaviors. Compositions of processes are processes with behaviors in the intersection of the behaviors of the component processes. The interaction between processes is through signals, which are collections of events. Each event is a value-tag pair, where the tags can come from a partially ordered or totally ordered set. Timed models are where the set of tags is totally ordered. Synchronous events share the same tag, and synchronous signals contain events with the same set of tags. Synchronous processes have only synchronous signals as behaviors. Strict causality (in timed tag systems) and continuity (in untimed tag systems) ensure determinacy under certain technical conditions. The framework is used to compare certain essential features of various models of computation, including Kahn process networks, dataflow, sequential processes, concurrent sequential processes with rendezvous, Petri nets, and discrete-event systems.

The generality of the approach looks very impressive. Can anyone share first-hand experience with this framework? Would be great to see E compared to Oz!
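
The abstract packs in a lot of definitions. To fix the vocabulary, here is a toy Haskell rendering of the tagged-signal setup; the type names are mine, and lists stand in for the paper's sets:

    -- Each event is a value-tag pair; tags come from an ordered set.
    data Event tag val = Event tag val deriving (Eq, Show)

    type Signal   tag val = [Event tag val]     -- a signal is a collection of events
    type Behavior tag val = [Signal tag val]    -- one signal per port
    type Process  tag val = [Behavior tag val]  -- a process is the set of behaviors it allows

    -- Composition keeps only the behaviors allowed by every component process.
    compose :: (Eq tag, Eq val) => Process tag val -> Process tag val -> Process tag val
    compose p q = filter (`elem` q) p

    -- Determinacy: once the inputs have constrained the process,
    -- exactly one or exactly zero behaviors remain.
    determinate :: Process tag val -> Bool
    determinate p = length p <= 1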

Clojure's Approach to Identity and State

Clojure has been discussed here before. It fits in the Lisp family with S-expressions, macros, and functions as values. Like most Lisps, it's dynamically typed and impure. But its target focus is concurrency, so symbols are defined immutably by default; its standard library's collection structures are immutable and persistent; and its various mutable constructs are thread-safe except, of course, for the back doors implied by I/O and JVM library interoperability. See vars, refs, and agents.

What I wanted to highlight is a position paper of sorts that Rich Hickey has posted on Clojure's Approach to Identity and State. An excerpt:

While some programs are merely large functions, e.g. compilers or theorem provers, many others are not - they are more like working models, and as such need to support what I'll refer to in this discussion as identity. By identity I mean a stable logical entity associated with a series of different values over time. Models need identity for the same reasons humans need identity - to represent the world.
...
In Clojure's model, value calculation is purely functional. Values never change. New values are functions of old, not mutations. But logical identity is well supported, via atomic references to values (Refs and Agents). Changes to references are controlled/coordinated by the system - i.e. cooperation is not optional and not manual.
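
To pin down the shape of that model, identities as managed references to a succession of immutable values, here is a rough analogue in Haskell's STM. This is only a sketch of the idea, not Clojure's API (Clojure's refs live in its own transaction system, and agents are asynchronous):

    import Control.Concurrent.STM

    -- The values are immutable; the identity is the reference that points at them.
    type History = [Int]                  -- e.g. a history of deposits

    deposit :: TVar History -> Int -> STM ()
    deposit ref amount = do
      old <- readTVar ref
      writeTVar ref (amount : old)        -- a new value is a function of the old one

    main :: IO ()
    main = do
      acct <- newTVarIO []                -- create the identity
      atomically (deposit acct 100)       -- changes are coordinated by the system
      atomically (deposit acct 50)
      readTVarIO acct >>= print           -- => [50,100]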

Hickey also addresses the choice to not follow Erlang's model.

There are other ways to model identity and state, one of the more popular of which is the message-passing actor model, best exemplified by the quite impressive Erlang. ... It is important to understand that the actor model was designed to address the problems of distributed programs. And the problems of distributed programs are much harder ... Clojure may eventually support the actor model for distributed programming, paying the price only when distribution is required, but I think it is quite cumbersome for same-process programming. YMMV of course.

The essay is worth a read on a couple of levels of interest to LtU. At an abstract level, it's a good example of a well-articulated design justification. Agree or not, it's clear that Hickey gave thought to his decisions. Too many language designers fall into the trap of blindly inheriting semantics from a favorite language and end up putting new lipstick on the same pig. Any language designer would do well to write an essay or two like this before jumping into their venture.

At the specific level, the core approach is certainly worthy of discussion, as are alternative designs. Is mutable state really the best way to deal with "working models"? Are there things that the pure functional camp can do to make "working models" more convenient, e.g. do Clean's uniqueness types fit the bill, at least for sequential programming, or are they too restrictive? Are there things that can make Erlang-style actors less cumbersome to use, especially when distribution isn't necessary?

Communicating Scala Objects

I wouldn't normally think a library is LtU material, but since this one lives at the intersection of embedded DSLs, process calculi, and a spotlight language, I think it fits: Communicating Scala Objects, Bernard Sufrin, Communicating Process Architectures 2008.

In this paper we introduce the core features of CSO (Communicating Scala Objects) – a notationally convenient embedding of the essence of occam in a modern, generically typed, object-oriented programming language that is compiled to Java Virtual Machine (JVM) code.

If you would like to play with it, the library can be downloaded here.
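
For readers who haven't met the occam/CSP style that CSO embeds, the essence is independent processes communicating over typed channels. Here is a rough sketch of that idea in Haskell; CSO's actual Scala notation is, of course, different:

    import Control.Concurrent (forkIO)
    import Control.Concurrent.Chan
    import Control.Monad (forM_, replicateM_)

    main :: IO ()
    main = do
      c <- newChan :: IO (Chan Int)
      -- one process produces on the channel...
      _ <- forkIO (forM_ [1..5] (writeChan c))
      -- ...while another consumes from it
      replicateM_ 5 (readChan c >>= print)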

Partial vectorisation of Haskell programs

Partial vectorisation of Haskell programs. Manuel M. T. Chakravarty, Roman Leshchinskiy, Simon Peyton Jones, and Gabriele Keller, Proc ACM Workshop on Declarative Aspects of Multicore Programming, San Francisco, Jan 2008.

Vectorisation for functional programs, also called the flattening transformation, relies on drastically reordering computations and restructuring the representation of data types. As a result, it only applies to the purely functional core of a fully-fledged functional language, such as Haskell or ML. A concrete implementation needs to apply vectorisation selectively and integrate vectorised with unvectorised code. This is challenging, as vectorisation alters the data representation, which must be suitably converted between vectorised and unvectorised code. In this paper, we present an approach to partial vectorisation that selectively vectorises sub-expressions and data types, and also, enables linking vectorised with unvectorised modules.

The idea is fairly simple, and utilizes conversion between vectorized and unvectorized representations of the datatypes. A formal translation scheme is provided.
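
As a toy illustration of the kind of representation change involved, here is the classic nested-to-flat (segmented) conversion in plain Haskell; the names and details are mine, not the paper's:

    -- Vectorised code works on flat data plus a segment descriptor;
    -- unvectorised code sees ordinary nested structures. Partial vectorisation
    -- needs conversions like these at the boundary between the two worlds.
    data Flat a = Flat
      { segLens  :: [Int]  -- length of each inner list
      , flatData :: [a]    -- all the elements, concatenated
      } deriving Show

    toFlat :: [[a]] -> Flat a
    toFlat xss = Flat (map length xss) (concat xss)

    fromFlat :: Flat a -> [[a]]
    fromFlat (Flat lens xs) = go lens xs
      where
        go []     _  = []
        go (n:ns) ys = let (h, t) = splitAt n ys in h : go ns t

    main :: IO ()
    main = print (fromFlat (toFlat [[1,2],[3],[4,5,6 :: Int]]))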

Data Parallel Haskell papers are here.

Why Multi-Core is Easy and Internet is Hard

I was recently invited to be on the panel Reinventing Audio and Music Computation for Many-Core Processors at the International Computer Music Conference (ICMC 2008), Belfast, Ireland, Aug. 2008. In my position statement, The Challenges and Opportunities of Multiple Processors: Why Multi-Core Processors are Easy and Internet is Hard, I explain why programming multi-core processors is basically a sociological problem (the technical problems were solved long ago) and why programming loosely coupled systems (like the Internet) still has a lot of technical challenges. I am curious to hear what the LtU community thinks about this.

LASER Summerschool on Concurrency

Bertrand Meyer yet again has lined up a fantastic set of speakers for this year's LASER summerschool that targets some very relevant and timely topics in our field:
  • Tryggve Fossum, Intel Fellow and Director of Microarchitecture Development, Intel Corporation on Chip level Multi Processors
  • Maurice Herlihy, Brown University on The Art of Multiprocessor Programming
  • Bertrand Meyer, ETH Zurich and Eiffel Software on Correct Contract-Covered Concurrent Computation
  • Robin Milner, Cambridge University on Bigraphs: a space for mobile interaction
  • Peter O'Hearn, Queen Mary University of London on Proof Techniques for Concurrent Programs
  • Daniel A. Reed, Microsoft Research and UNC Chapel Hill on Living in the Multicore Computing Clouds

Map-reduce-merge: simplified relational data processing on large clusters

Map-reduce-merge: simplified relational data processing on large clusters (freely-accessible slides). Hung-chih Yang, Ali Dasdan, Ruey-Lung Hsiao, D. Stott Parker. 2007 ACM SIGMOD conference.

Map-Reduce is a programming model that enables easy development of scalable parallel applications to process a vast amount of data on large clusters of commodity machines. Through a simple interface with two functions, map and reduce, this model facilitates parallel implementation of many real-world tasks such as data processing jobs for search engines and machine learning.

However, this model does not directly support processing multiple related heterogeneous datasets. While processing relational data is a common need, this limitation causes difficulties and/or inefficiency when Map-Reduce is applied on relational operations like joins.

We improve Map-Reduce into a new model called Map-Reduce-Merge. It adds to Map-Reduce a Merge phase that can efficiently merge data already partitioned and sorted (or hashed) by map and reduce modules. We also demonstrate that this new model can express relational algebra operators as well as implement several join algorithms.

They seem to add a third phase – merge: ((k1, [v1]), (k2, [v2])) → (k3, [v3]) – which combines the outputs of two separate, parallel MapReduce tasks.

This makes it possible to do things like joins and build cartesian products.
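
As a toy, in-memory illustration of what the merge phase buys, here is an equi-join written against that shape in Haskell; lists stand in for distributed collections, and the names are mine, not the paper's API:

    -- The output of a reduce phase: each key with its list of values.
    type Reduced k v = [(k, [v])]

    -- Merge two reduce outputs by pairing up entries that share a key.
    mergeJoin :: Eq k => Reduced k a -> Reduced k b -> [(k, [(a, b)])]
    mergeJoin left right =
      [ (k, [ (a, b) | a <- as, b <- bs ])
      | (k, as)  <- left
      , (k', bs) <- right
      , k == k'
      ]

    main :: IO ()
    main = print (mergeJoin [(1, ["a", "b"]), (2, ["c"])]
                            [(1, [10]), (3, [30 :: Int])])
    -- => [(1,[("a",10),("b",10)])]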

DryadLINQ

The goal of DryadLINQ is to make distributed computing on large compute clusters simple enough for ordinary programmers. DryadLINQ combines two important pieces of Microsoft technology: the Dryad distributed execution engine and the .NET Language Integrated Query (LINQ).

I don't think this project was mentioned here before.
