
Parallel frameworks for graph processing

Given successes in parallel frameworks for linear algebra (BLAS) and log processing (MapReduce), the current trend in large-scale data parallelism seems to be toward more general graph processing. An early big contender, discussed here before, was Google's Pregel.

Two fun ones have been making the rounds. First, and fairly related, is GraphLab:

Designing and implementing efficient, provably correct parallel machine learning (ML) algorithms is challenging. Existing high-level parallel abstractions like MapReduce are insufficiently expressive while low-level tools like MPI and Pthreads leave ML experts repeatedly solving the same design challenges. By targeting common patterns in ML, we developed GraphLab, which improves upon abstractions like MapReduce by compactly expressing asynchronous iterative algorithms with sparse computational dependencies while ensuring data consistency and achieving a high degree of parallel performance. We demonstrate the expressiveness of the GraphLab framework by designing and implementing parallel versions of belief propagation, Gibbs sampling, Co-EM, Lasso and Compressed Sensing. We show that using GraphLab we can achieve excellent parallel performance on large scale real-world problems.

There are obvious connections to systems like Cilk/TBB and, more focused on graphs, Pingali's Amorphous Data Parallelism work.
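To make the abstraction concrete, here is a minimal sketch of GraphLab's model: an algorithm is expressed as a per-vertex update function over a sparse dependency graph, and a dynamic scheduler re-runs only vertices whose inputs changed. Everything below (the `update` function, the queue-based scheduler, the toy graph) is our illustrative stand-in, not GraphLab's actual API; PageRank is just a convenient example of an asynchronous iterative computation.

```python
# Sketch of a GraphLab-style vertex program (illustrative, not the real API):
# PageRank as an asynchronous update function plus a dynamic scheduler.
from collections import deque

# Tiny directed graph: vertex -> list of out-neighbors.
out_edges = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
in_edges = {v: [u for u in out_edges if v in out_edges[u]] for v in out_edges}

DAMPING, TOL = 0.85, 1e-6
rank = {v: 1.0 for v in out_edges}

def update(v):
    """Recompute v's rank from its in-neighbors (its sparse dependency set).
    Returns the vertices to reschedule if v's value changed materially."""
    new = (1 - DAMPING) + DAMPING * sum(
        rank[u] / len(out_edges[u]) for u in in_edges[v])
    changed = abs(new - rank[v]) > TOL
    rank[v] = new
    return out_edges[v] if changed else []

# Dynamic scheduling: start with every vertex queued; thereafter only
# vertices whose inputs changed get re-run. This is the asynchronous,
# work-efficient behavior the abstract contrasts with MapReduce's
# synchronous, whole-dataset iterations.
queue, queued = deque(out_edges), set(out_edges)
while queue:
    v = queue.popleft()
    queued.discard(v)
    for w in update(v):
        if w not in queued:
            queue.append(w)
            queued.add(w)
```

In the real system the scheduler runs updates in parallel, and GraphLab's consistency models decide which neighboring updates may overlap; the sequential queue above only captures the scheduling idea.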

A radically different approach is John Gilbert's Parallel Combinatorial BLAS: A Toolbox for High-Performance Graph Computation (papers, slides):

This paper presents a scalable high-performance software library to be used for graph analysis and data mining. Large combinatorial graphs appear in many applications of high-performance computing, including computational biology, informatics, analytics, web search, dynamical systems, and sparse matrix methods. Graph computations are difficult to parallelize using traditional approaches due to their irregular nature and low operational intensity. Many graph computations, however, contain sufficient coarse grained parallelism for thousands of processors, which can be uncovered by using the right primitives.

We describe the Parallel Combinatorial BLAS, which consists of a small but powerful set of linear algebra primitives specifically targeting graph and data mining applications. We provide an extendible library interface and some guiding principles for future development. The library is evaluated using two important graph algorithms, in terms of both performance and ease-of-use. The scalability and raw performance of the example applications, using the combinatorial BLAS, are unprecedented on distributed memory clusters.

The two take fairly different approaches to performance. Intriguing to me are the choices of frontend language: given the constrained domains, I suspect much higher-level languages are possible (hint: synthesis).