Google's MapReduce Programming Model -- Revisited. Ralf Lämmel.
MapReduce is a programming model (and an associated implementation) used at Google for processing large amounts of input data (such as millions of documents) and generating keyed or indexed query results (such as the indexes used for Google's web search). The programming model is stunningly simple, and it allows the programmer to largely abstract from the parallel and distributed execution of the data-processing computations. Parallel execution is promoted by the assumed skeleton of MapReduce computations. Load balancing, network performance and fault tolerance are taken care of by the MapReduce implementation.
We revisit the MapReduce programming model in an attempt to provide a rigorous description of the model. We focus on the key abstraction for MapReduce computations; this abstraction is parameterized by the problem-specific ingredients for data extraction and reduction. We use Haskell as a lightweight specification language to capture the essence of MapReduce computations in a succinct, executable and strongly typed manner. Our study substantiates that clarity, generality and correctness of designs (or the presentations thereof) are easily improved, if modest functional programming skills are put to work.
This is a fun example of type-directed exploration and prototyping, and pokes some gentle fun of Google's (ab)use of standard terminology. It's neat to get a glimpse into the programming process of a very experienced Haskell programmer.
(From the forums.)