ABSTRACT
Many practical computing problems concern large graphs. Standard examples include the Web graph and various social networks. The scale of these graphs - in some cases billions of vertices, trillions of edges - poses challenges to their efficient processing. In this paper we present a computational model suitable for this task. Programs are expressed as a sequence of iterations, in each of which a vertex can receive messages sent in the previous iteration, send messages to other vertices, and modify its own state and that of its outgoing edges or mutate graph topology. This vertex-centric approach is flexible enough to express a broad set of algorithms. The model has been designed for efficient, scalable and fault-tolerant implementation on clusters of thousands of commodity computers, and its implied synchronicity makes reasoning about programs easier. Distribution-related details are hidden behind an abstract API. The result is a framework for processing large graphs that is expressive and easy to program.
- Thomas Anderson, Susan Owicki, James Saxe, and Charles Thacker, High-Speed Switch Scheduling for Local-Area Networks. ACM Trans. Comp. Syst. 11(4), 1993, 319--352. Google ScholarDigital Library
- David A. Bader and Kamesh Madduri, Designing multithreaded algorithms for breadth-first search and st-connectivity on the Cray MTA-2, in Proc. 35th Intl. Conf. on Parallel Processing (ICPP'06), Columbus, OH, August 2006, 523--530. Google ScholarDigital Library
- Luiz Barroso, Jeffrey Dean, and Urs Hoelzle, Web search for a planet: The Google Cluster Architecture. IEEE Micro 23(2), 2003, 22--28. Google ScholarDigital Library
- Mohsen Bayati, Devavrat Shah, and Mayank Sharma, Maximum Weight Matching via Max-Product Belief Propagation. in Proc. IEEE Intl. Symp. on Information Theory, 2005, 1763--1767.Google Scholar
- Richard Bellman, On a routing problem. Quarterly of Applied Mathematics 16(1), 1958, 87--90.Google ScholarCross Ref
- Olaf Bonorden, Ben H. H. Juurlink, Ingo von Otte, and Ingo Rieping, The Paderborn University BSP (PUB) Library. Parallel Computing 29(2), 2003, 187--207. Google ScholarDigital Library
- Sergey Brin and Lawrence Page, The Anatomy of a Large-Scale Hypertextual Web Search Engine. in Proc. 7th Intl. Conf. on the World Wide Web, 1998, 107--117. Google ScholarDigital Library
- Albert Chan and Frank Dehne, CGMGRAPH/CGMLIB: Implementing and Testing CGM Graph Algorithms on PC Clusters and Shared Memory Machines. Intl. J. of High Performance Computing Applications 19(1), 2005, 81--97.Google ScholarDigital Library
- Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber, Bigtable: A Distributed Storage System for Structured Data. ACM Trans. Comp. Syst. 26(2), Art. 4, 2008. Google ScholarDigital Library
- Boris V. Cherkassky, Andrew V. Goldberg, and Tomasz Radzik, Shortest paths algorithms: Theory and experimental evaluation. Mathematical Programming 73, 1996, 129--174. Google ScholarDigital Library
- Jonathan Cohen, Graph Twiddling in a MapReduce World. Comp. in Science & Engineering, July/August 2009, 29--41. Google ScholarDigital Library
- Joseph R. Crobak, Jonathan W. Berry, Kamesh Madduri, and David A. Bader, Advanced Shortest Paths Algorithms on a Massively-Multithreaded Architecture. in Proc. First Workshop on Multithreaded Architectures and Applications, 2007, 1--8.Google Scholar
- John T. Daly, A higher order estimate of the optimum checkpoint interval for restart dumps. Future Generation Computer Systems 22, 2006, 303--312. Google ScholarDigital Library
- Jeffrey Dean and Sanjay Ghemawat, MapReduce: Simplified Data Processing on Large Clusters. in Proc. 6th USENIX Symp. on Operating Syst. Design and Impl., 2004, 137--150. Google ScholarDigital Library
- Edsger W. Dijkstra, A Note on Two Problems in Connexion with Graphs. Numerische Mathematik 1, 1959, 269--271.Google ScholarDigital Library
- Martin Erwig, Inductive Graphs and Functional Graph Algorithms. J. Functional Programming 1(5), 2001, 467--492. Google ScholarDigital Library
- Lester R. Ford, L. R. and Delbert R. Fulkerson, Flows in Networks. Princeton University Press, 1962.Google Scholar
- Ian Foster and Carl Kesselman (Eds), The Grid 2: Blueprint for a New Computing Infrastructure (2nd edition). Morgan Kaufmann, 2003. Google ScholarDigital Library
- Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, The Google File System. in Proc. 19th ACM Symp. on Operating Syst. Principles, 2003, 29--43. Google ScholarDigital Library
- Michael T. Goodrich and Roberto Tamassia, Data Structures and Algorithms in JAVA. (second edition). John Wiley and Sons, Inc., 2001. Google ScholarDigital Library
- Mark W. Goudreau, Kevin Lang, Satish B. Rao, Torsten Suel, and Thanasis Tsantilas, Portable and Efficient Parallel Computing Using the BSP Model. IEEE Trans. Comp. 48(7), 1999, 670--689. Google ScholarDigital Library
- Douglas Gregor and Andrew Lumsdaine, The Parallel BGL: A Generic Library for Distributed Graph Computations. Proc. of Parallel Object-Oriented Scientific Computing (POOSC), July 2005.Google Scholar
- Douglas Gregor and Andrew Lumsdaine, Lifting Sequential Graph Algorithms for Distributed-Memory Parallel Computation. in Proc. 2005 ACM SIGPLAN Conf. on Object-Oriented Prog., Syst., Lang., and Applications (OOPSLA'05), October 2005, 423--437. Google ScholarDigital Library
- Jonathan L. Gross and Jay Yellen, Graph Theory and Its Applications. (2nd Edition). Chapman and Hall/CRC, 2005. Google ScholarDigital Library
- Aric A. Hagberg, Daniel A. Schult, and Pieter J. Swart, Exploring network structure, dynamics, and function using NetworkX. in Proc. 7th Python in Science Conf., 2008, 11--15.Google Scholar
- Jonathan Hill, Bill McColl, Dan Stefanescu, Mark Goudreau, Kevin Lang, Satish Rao, Torsten Suel, Thanasis Tsantilas, and Rob Bisseling, BSPlib: The BSP Programming Library. Parallel Computing 24, 1998, 1947--1980. Google ScholarDigital Library
- Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly, Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks. in Proc. European Conf. on Computer Syst., 2007, 59--72. Google ScholarDigital Library
- Paris C. Kanellakis and Alexander A. Shvartsman, Fault-Tolerant Parallel Computation. Kluwer Academic Publishers, 1997. Google ScholarDigital Library
- Donald E. Knuth, Stanford GraphBase: A Platform for Combinatorial Computing. ACM Press, 1994. Google Scholar
- U. Kung, Charalampos E. Tsourakakis, and Christos Faloutsos, Pegasus: A Peta-Scale Graph Mining System - Implementation and Observations. Proc. Intl. Conf. Data Mining, 2009, 229--238. Google ScholarDigital Library
- Andrew Lumsdaine, Douglas Gregor, Bruce Hendrickson, and Jonathan W. Berry, Challenges in Parallel Graph Processing. Parallel Processing Letters 17, 2007, 5--20.Google ScholarCross Ref
- Kamesh Madduri, David A. Bader, Jonathan W. Berry, and Joseph R. Crobak, Parallel Shortest Path Algorithms for Solving Large-Scale Graph Instances. DIMACS Implementation Challenge - The Shortest Path Problem, 2006.Google Scholar
- Kamesh Madduri, David Ediger, Karl Jiang, David A. Bader, and Daniel Chavarria-Miranda, A Faster Parallel Algorithm and Efficient Multithreaded Implementation for Evaluating Betweenness Centrality on Massive Datasets, in Proc. 3rd Workshop on Multithreaded Architectures and Applications (MTAAP'09), Rome, Italy, May 2009.Google Scholar
- Grzegorz Malewicz, A Work-Optimal Deterministic Algorithm for the Certified Write-All Problem with a Nontrivial Number of Asynchronous Processors. SIAM J. Comput. 34(4), 2005, 993--1024. Google ScholarDigital Library
- Kurt Mehlhorn and Stefan Näher, The LEDA Platform of Combinatorial and Geometric Computing. Cambridge University Press, 1999. Google ScholarDigital Library
- Ulrich Meyer and Vitaly Osipov, Design and Implementation of a Practical I/O-efficient Shortest Paths Algorithm. in Proc. 3rd Workshop on Multithreaded Architectures and Applications (MTAAP'09), Rome, Italy, May 2009.Google ScholarCross Ref
- Ulrich Meyer and Peter Sanders, Δ-stepping: A Parallelizable Shortest Path Algorithm. J. Algorithms 49(1), 2003, 114--152. Google ScholarDigital Library
- Richard Miller, A Library for Bulk-Synchronous Parallel Programming. in Proc. British Computer Society Parallel Processing Specialist Group Workshop on General Purpose Parallel Computing, 1993.Google Scholar
- Kameshwar Munagala and Abhiram Ranade, I/O-complexity of graph algorithms. in Proc. 10th Annual ACM-SIAM Symp. on Discrete Algorithms, 1999, 687--694. Google ScholarDigital Library
- Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, and Andrew Tomkins, Pig Latin: A Not-So-Foreign Language for Data Processing. in Proc. ACM SIGMOD Intl. Conf. on Management of Data, 2008, 1099--1110. Google ScholarDigital Library
- Rob Pike, Sean Dorward, Robert Griesemer, and Sean Quinlan, Interpreting the Data: Parallel Analysis with Sawzall. Scientific Programming Journal 13(4), Special Issue on Grids and Worldwide Computing Programming Models and Infrastructure, 2005, 227--298. Google ScholarDigital Library
- Protocol Buffers-Google's data interchange format. http://code.google.com/p/protobuf/ 2009.Google Scholar
- Jeremy G. Siek, Lie-Quan Lee, and Andrew Lumsdaine, The Boost Graph Library: User Guide and Reference Manual. Addison Wesley, 2002. Google ScholarDigital Library
- Mikkel Thorup, Undirected Single-Source Shortest Paths with Positive Integer Weights in Linear Time. J. ACM 46(3), May 1999, 362--394. Google ScholarDigital Library
- Leslie G. Valiant, A Bridging Model for Parallel Computation. Comm. ACM 33(8), 1990, 103--111. Google ScholarDigital Library
- Andy Yoo, Edmond Chow, Keith Henderson, William McLendon, Bruce Hendrickson, and Umit Catalyurek, A Scalable Distributed Parallel Breadth-First Search Algorithm on BlueGene/L, in Proc. 2005 ACM/IEEE Conf. on Supercomputing (SC'05), 2005, 25--43. Google ScholarDigital Library
- Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Ulfar Erlingsson, Pradeep Kumar Gunda, and Jon Currey, DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language. in Proc. 8th USENIX Symp. on Operating Syst. Design and Implementation, 2008, 10--14. Google ScholarDigital Library
Index Terms
- Pregel: a system for large-scale graph processing
Recommendations
Towards highly scalable pregel-based graph processing platform with x10
WWW '13 Companion: Proceedings of the 22nd International Conference on World Wide WebMany practical computing problems concern large graph. Standard problems include web graph analysis and social networks analysis like Facebook, Twitter. The scale of these graph poses challenge to their efficient processing. To efficiently process large-...
Conditional Edge-Fault Hamiltonicity of Cartesian Product Graphs
A graph $(G)$ is conditional $(k)$-edge-fault Hamiltonian if it remains Hamiltonian after deleting at most $(k)$ edges and each vertex incident to at least two nonfaulty edges. A graph $(G)$ is $(k)$-edge-fault Hamiltonian-connected if it remains ...
A dynamic distributed approach to representing proper interval graphs
First studied by Brodal and Fagerberg [G.S. Brodal, R. Fagerberg, Dynamic representation of sparse graphs, in: Algorithms and Data Structures, Proceedings of the 6th International Workshop, Vancouver, Canada, in: Lecture Notes in Computer Science, vol. ...
Comments