MapReduce - functional programming in the REAL World

Tired of talking about Erlang and Telephone Switches? Next time someone asks for an example of functional programming techniques in the "real" world, suggest Google Labs to them.

Functional?

There's nothing functional about map and reduce -- the two are in fact the basic building blocks of parallel programming, and they turn up wherever some sort of parallelism is involved.

This is just Google hype and advertising -- they've reinvented the wheel and are now touting this as some sort of special magical "google innovation" or something.

Advocacy

That's not really fair. To quote from the paper: "Our abstraction is inspired by the map and reduce primitives present in Lisp and many other functional languages." They are not claiming to have invented anything new, just to have been inspired by Functional Programming to produce something useful.

The point of my post is that when advocates of functional/declarative programming are told that it is of "no use in the Real World", and just "Ivory Tower", this provides more ammunition; what is more, it comes from a company everybody has heard of and whose search tool almost everybody uses and admires.
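For anyone who hasn't seen the analogy spelled out, here is a minimal, purely sequential sketch of the model in Haskell. The names and types are illustrative, not Google's actual C++ API, and the real system of course distributes the map, shuffle, and reduce phases across thousands of machines:

    import Data.Function (on)
    import Data.List (foldl', groupBy, sortOn)

    -- The user supplies two pure functions, as in the paper's model:
    -- a mapper from an input record to key/value pairs, and a reducer
    -- that folds all the values sharing a key.
    mapReduce :: Ord k => (a -> [(k, v)]) -> (k -> [v] -> r) -> [a] -> [(k, r)]
    mapReduce m r inputs =
      [ (fst (head grp), r (fst (head grp)) (map snd grp))
      | grp <- groupBy ((==) `on` fst) (sortOn fst (concatMap m inputs))
      ]

    -- The canonical example from the paper: word count.
    wordCount :: [String] -> [(String, Int)]
    wordCount = mapReduce (\doc -> [(w, 1) | w <- words doc])
                          (\_ ones -> foldl' (+) 0 ones)

    main :: IO ()
    main = print (wordCount ["to be or not to be", "to do is to be"])

The sort-and-group step in the middle is exactly the shuffle by intermediate key that the runtime parallelises.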

repeated iterations

I enjoyed the paper. The idea of intermediate keys on map output being efficiently partitioned into parallel reduce steps was nice, and the implementation (especially with respect to fault tolerance and locality) was interesting.

The paper mentioned application to machine learning. My question is: how does the framework handle multiple passes over the same data (say, for expectation-maximization)? Presumably, you could save yourself from reprocessing the input and simply reuse the same plan repeatedly (modulo failures); the only thing that changes, and thus the only subsequent communication the map nodes would need, is the previous iteration's parameters (the reduce output).

Do you have the abstraction of a loop with reduce output becoming (part of) the next iteration's input, and an optimal dataflow for it? Of course, even being able to quickly perform one iteration would be nice.
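The paper doesn't spell out such an iteration abstraction, so purely to make the question concrete, here is a speculative sequential sketch of the loop, using 1-D k-means (a close relative of EM). Ordinary Haskell lists stand in for the distributed phases, and the only state carried between passes is the reduce output (the centres):

    import Data.List (foldl', minimumBy)
    import Data.Ord (comparing)

    -- One "MapReduce pass" of 1-D k-means, written sequentially: the
    -- map phase tags each point with its nearest centre (the key), and
    -- the reduce phase averages each group to produce the new centres.
    -- (A centre that captures no points would need handling in real code.)
    step :: [Double] -> [Double] -> [Double]
    step centres points =
      [ mean [p | p <- points, nearest p == c] | c <- centres ]
      where
        nearest p = minimumBy (comparing (\c -> abs (p - c))) centres
        mean xs = foldl' (+) 0 xs / fromIntegral (length xs)

    -- The driver reruns the same plan; only the previous reduce output
    -- flows into the next iteration.
    converge :: [Double] -> [Double] -> [Double]
    converge centres points
      | next == centres = centres
      | otherwise       = converge next points
      where next = step centres points

    main :: IO ()
    main = print (converge [0, 10] [1, 2, 3, 9, 10, 11])

In a distributed setting the input points would stay cached on the map workers, and only the centres would need to be rebroadcast each pass.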

Intel Connection Machine?

Sounds almost like they've built a Connection Machine out of PCs. Does this mean that Data Parallel programming is back on the rise? That would be unspeakably cool!

Best of all is that I'm completely unfamiliar with the copious prior work they cite.

(And will C++ be the new posterchild for practical functional programming? That would be a riot! I can hear John Hughes now: "I told you so!")

OpenGL: CM-in-one-chip

Data-parallel programming is definitely back on the rise, at least in the (admittedly narrow) area of general computation using GPUs. If you enjoy seeing an architecture being turned inside out and used for a completely different purpose, you should check some of these papers out. I suggest the SIGGRAPH 03 and Graphics Hardware 03 papers (with a nice index available online). Look for the Computation on GPU section in SIGGRAPH and the Simulation and Computation section in Graphics Hardware. With these consistent reports of significant speedups (usually you get 5-10x faster on a GPU, compared to similarly priced CPUs), I'd say that this is not something that is about to go away anytime soon. My Navier-Stokes solver can be 20x faster than a single-CPU solution </plug> :)

And to stick the obligatory language design link here, check out Brook. They call it stream programming, but it's just another name for data-parallel programming. In one of Pat Hanrahan's talks, he describes programming on GPUs as having to relearn most of the stuff the folks on the Connection Machines used. Pretty cool stuff if you ask me.
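For the flavour of the model (this is not Brook's actual syntax): a stream kernel is just a pure per-element function that the runtime applies across whole streams at once. Here is the idea in Haskell, where zipWith plays the role of stream application; on a GPU, each element would be computed by its own shader unit in parallel:

    -- saxpy as a stream "kernel": pure and per-element, with no
    -- cross-element dependencies, which is what lets the hardware
    -- run every element in parallel.
    saxpy :: Float -> [Float] -> [Float] -> [Float]
    saxpy a = zipWith (\x y -> a * x + y)

    main :: IO ()
    main = print (saxpy 2 [1, 2, 3] [10, 20, 30])  -- [12.0,24.0,36.0]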

Graphics card support

I'm pretty ignorant about graphics cards. Do you happen to know if any of these compilers work with an ATI Radeon Mobility M6 LY (as found in Thinkpads)?

Graphics card support

Luke Gorrie: "I'm pretty ignorant about graphics cards. Do you happen to know if any of these compilers work with an ATI Radeon Mobility M6 LY (as found in Thinkpads)?"

Unfortunately, no. Pixel and vertex shader support only shows up in the Mobility Radeon line starting with the 9000 series. Time to get a PowerBook: they all support pixel and vertex shaders, with nVidia's GeForce FX Go5200 on the 12" models and ATI's Mobility Radeon 9700 on the others. :-)

Or even better...

...get some GPU with Shader Model 3 support (giving you dynamic branches and fewer restrictions on shader code than SM1.1 and SM2), such as an NV4x ;)

data parallel programming

Data-parallel programming is also alive at Ab Initio Software Corp., though unfortunately there is little public detail about their proprietary programming tools.

Nested Data Parallelism in Haskell

Manuel Chakravarty and Gabriele Keller are working on the Nepal Project at the University of New South Wales.

Nepal takes elements from Nesl and Sisal and centers around array fusion.
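The canonical nested-data-parallel example (going back to Nesl) is sparse matrix times vector: each row is itself a data-parallel computation of irregular length, nested inside a parallel loop over all the rows. Here is the shape of it with ordinary Haskell lists standing in for parallel arrays; Nepal-style flattening and fusion aim to compile exactly this kind of nesting into flat, vectorisable operations:

    -- Sparse matrix * vector, the classic Nesl example. The inner
    -- comprehension over each row and the outer one over all rows are
    -- both data-parallel, and the rows have irregular lengths; that
    -- irregular nesting is what flattening compiles away.
    type SparseRow = [(Int, Double)]  -- (column index, value) pairs

    smvm :: [SparseRow] -> [Double] -> [Double]
    smvm rows vec = [ sum [ v * (vec !! i) | (i, v) <- row ] | row <- rows ]

    main :: IO ()
    main = print (smvm [[(0, 2), (2, 1)], [(1, 3)]] [1, 2, 3])  -- [5.0,6.0]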



Personally, I think that data parallelism will be much more important now that multi-core CPUs are in the pipeline.

Shae Erisson - www.ScannedInAvian.org

But why get tired of Erlang?

But why get tired of Erlang? Only last week, a manager sought me out to suggest that Erlang be used instead of C. (Admittedly at a telecoms operator.)

That apart, there is always dear old Common Lisp for real world examples.

"...Please don't assume Lisp is only useful for Animation and Graphics, AI, Bioinformatics, B2B and E-Commerce, Data Mining, EDA/Semiconductor applications, Expert Systems, Finance, Intelligent Agents, Knowledge Management, Mechanical CAD, Modeling and Simulation, Natural Language, Optimization, Research, Risk Analysis, Scheduling, Telecom, and Web Authoring just because these are the only things they happened to list." - Kent Pitman

The Erlang brand

I find it interesting that companies building Erlang-based products tend to talk about the fact in product literature. "Our system is based on Erlang/OTP, the advanced platform for distributed systems. This is why our clustering is so good." And customers seem to like this.

The Lispers seem to have had a rougher trot by and large, with a bit of concealing, apologising, and dressing-up-in-XML. Hopefully that's changed in the past year or two though.

(Nice to see you here by the way, Thomas! See you Thursday?)

The Erlang brand

Good point. It seems far easier to build mindshare when you focus on a narrow topic, such as telecoms for Erlang. The downside is when that topic goes sour, e.g., the AI winter for Lisp ...

(And, sorry, no, I've managed to overschedule. No EUC for me this year.)

Telecoms brand is stronger than the AI brand

So far...

  • More people have been touched by telephones than AI programs.
  • Nobody had to reboot their phone (not yet, anyway; I haven't tried Windows Smartphone)
  • My telephone service is more reliable than my power service

I think it's easier to convince a customer to use a technology used by telecoms over a Microsoft technology. It's not MS's fault of course, but it's the perception.

Rebooting phones

Oh, I've had to reboot (turn on & off) a cellphone now and again--plus hardware problems, of course.

It's not MS's fault of course

It's not MS's fault of course

One should be very careful when attributing intention to MS. They only care about market share and profits, not about reliability, usability, security, or any technical issue. It is an amusing exercise to deconstruct all remarks by Bill Gates, Steve Ballmer, and so forth, regarding their effect on market share. It's rather obvious that no statement from MS should ever be taken at face value as meaning what it seems to mean. An encouraging corollary is that MS is actually a company that is very consistent in its behavior and easy to understand and predict, much more so than companies that are more technically oriented. Almost nobody predicted Apple's coup in the legal Internet music business. Almost everybody predicts correctly that Microsoft instantly created a long-term goal to monopolize that particular area.

From planet-slashdot?

Ummm. OK. Mod up (Insightful).

<wink>!

Actually, it's that every successful AI program breaks brand

Plenty of people are touched every day by AI programs:
- Search engines (via Naive Bayes Classifiers)
- OCR (Neural nets)

That's two. Then there are constraint-processing engines, like those that handle automatic airline-ticket reservation. Also, those new speech-recognition automated telephone menus. Oh, how I hate those things, but that's AI: language models.
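Since the first of those may sound abstract, here is a toy of the idea; everything in it (the training snippets, the per-class add-one smoothing) is invented for illustration and bears no relation to how any real search engine is built:

    import Data.List (nub)

    -- A toy Naive Bayes text classifier: score a query against one
    -- class's corpus as the product over query words of smoothed
    -- P(word | class).
    likelihood :: [[String]] -> [String] -> Double
    likelihood docs query = product (map wordProb query)
      where
        corpus     = concat docs
        total      = fromIntegral (length corpus)
        vocabSize  = fromIntegral (length (nub corpus))
        count w    = fromIntegral (length (filter (== w) corpus))
        wordProb w = (count w + 1) / (total + vocabSize)  -- add-one smoothing

    main :: IO ()
    main = do
      let spam  = map words ["free money now", "win a free prize"]
          ham   = map words ["the mapreduce paper", "functional programming rocks"]
          query = words "free money"
      -- With equal class priors, comparing likelihoods decides the class.
      putStrLn (if likelihood spam query > likelihood ham query
                  then "looks spammy" else "looks fine")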

It's not that AI isn't out there; it's that no one wants to claim the AI label. It reeks of "big brother" and the End Of The World, because of all the bad press from the buzzword-loving mainstream media. Films like Terminator 1-2-3, Colossus, 2001, and The Matrix all claim that AI will destroy the world.

As for more people having been touched by telephones than by AI -- probably true, but not as drastically as such a broad statement would lead one to believe.

Not how people think

We are talking about customer perception here. On an everyday level, people tend to think in terms of objects that they can touch, not processes, which are more abstract.

For instance, people say "best thing since sliced bread", and not "best thing since bread slicers", or "best thing since bread slicing machines", or "best thing since bread slicing".

AI is the slicing, while telephone is the sliced bread.