Transcript PPT
PAGE: A Partition Aware Graph Computation Engine
Yingxia Shao, Junjie Yao, Bin Cui, Lin Ma
EECS, Peking University, China

Agenda
• Background
• Design of PAGE
• Experiment result
• Conclusion
2/19

Background
• Prevalent large scale graphs
  – Social networks
  – Web graph
  – …
• Graph computing systems
  – Pregel (Google)
  – Giraph (Apache)
  – GPS (Stanford)
  – GraphLab (CMU)
  – …
3/19

Background
• Graph Partitioning
  – Offline approach
    • METIS (Karypis Lab)
  – Online approach
    • Streaming partitioning
    • Linear Deterministic Greedy (LDG) algorithm (I. Stanton)
Problem: The existing graph computation systems cannot efficiently integrate high-quality graph partitioning.
4/19

Inefficient partition integrating
High-quality graph partitioning leads to worse overall performance.
The graph partitioning quality improves from left to right.
[Figure: bar chart of average time (s/iteration) per partition scheme; series: overall cost, sync remote comm. cost, local comm. cost]
Running PageRank on Giraph with six different graph partition qualities.
5/19

Motivation of the PAGE
[Diagram labels: Low-Quality Graph Partition; High-Quality Graph Partition; A Novel Graph Computation Engine]
Call for a novel graph computation engine to efficiently integrate graph partitioning with various qualities.
6/19

Agenda
• Background
• Design of PAGE
• Experiment result
• Conclusion
7/19

Message processor
[Diagram: a Message Processor containing several Message Process Units, and a Message Block consisting of a Header followed by msg. entries]
8/19

Inefficient partition integrating
[Figure: bar chart of average time (s/iteration) per partition scheme; series: overall cost, sync remote comm. cost, local comm. cost]
The local message processing cost dominates the overall cost.
The existing systems cannot provide enough local message processors.
Running PageRank on Giraph with six different graph partition qualities.
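To make the link between partition quality and the local versus remote message split concrete, here is a minimal Python sketch. It is an illustration, not code from PAGE or Giraph. It assumes that one message travels along each edge per iteration and that a message is local when both endpoints sit in the same partition, so the local share is one minus the edge-cut ratio. The greedy rule follows the published LDG scoring (neighbours already in a partition, weighted by the remaining capacity); the function names, the toy graph, and the tie-break toward the least loaded partition are assumptions made for this example.

```python
def edge_cut_ratio(edges, part):
    """Fraction of edges whose two endpoints lie in different partitions."""
    cut = sum(1 for u, v in edges if part[u] != part[v])
    return cut / len(edges)


def ldg_partition(adj, k, capacity):
    """Streaming Linear Deterministic Greedy placement (illustrative sketch).

    Each arriving vertex goes to the partition i that maximises
    (number of its neighbours already in i) * (1 - size_i / capacity).
    Ties go to the least loaded partition (an assumption of this sketch).
    """
    part = {}
    sizes = [0] * k
    for v in adj:  # dict order stands in for the stream order
        counts = [0] * k
        for u in adj[v]:
            if u in part:
                counts[part[u]] += 1
        best = max(
            range(k),
            key=lambda i: (counts[i] * (1 - sizes[i] / capacity), -sizes[i]),
        )
        part[v] = best
        sizes[best] += 1
    return part


# Toy graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = {v: [] for v in range(6)}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

ldg = ldg_partition(adj, k=2, capacity=3)
hashed = {v: v % 2 for v in adj}  # stand-in for a low-quality placement

for name, part in (("hashed", hashed), ("LDG", ldg)):
    cut = edge_cut_ratio(edges, part)
    print(f"{name}: edge cut {cut:.2%}, local message share {1 - cut:.2%}")
```

On this toy graph the greedy placement keeps each triangle together, so only one of the seven edges is cut, while the modulo placement cuts five of them. Under the stated assumption, a lower edge cut shifts most of the message workload to the local side of each worker.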
9/19

Overview of the PAGE
[Diagram: PAGE worker1, PAGE worker2, and PAGE worker3, each with Computation, Partition Aware, and Comm. modules, over a Distributed In-Memory Partitioned Graph]
PAGE applies an adaptive tuning mechanism and new cooperation methods.
10/19

Newly Designed PAGE Worker
[Diagram labels: Computation; Partition Aware (Monitor, DCCM); Communication (Sender, Receiver); Dual Concurrent MP (Local MP, Remote MP)]
11/19

Dual Concurrent Message Processor
• First type of concurrency
  – A remote MP and a local MP are embedded
• Second type of concurrency
  – Each message processor contains a set of message process units
• The concurrency is automatically determined by the system itself.
[Diagram labels: Dual Concurrent MP; Local MP; Remote MP]
12/19

Dynamic Concurrency Control Model
• The DCCM determines the proper parameters, such as nmp, nmpl, nmpr.
• The DCCM is built on top of two heuristic rules.
  – Ability Lower-bound.
  – Workload Balance Ratio.
• Monitor
  – Tracks the necessary metrics
[Diagram labels: Partition Aware; Monitor; DCCM]
13/19

Agenda
• Background
• Design of PAGE
• Experiment result
• Conclusion
14/19

Environment & Datasets
• Experiment Environment
  – a 24-node cluster
• Dataset: uk-2007-05-u
  – Undirected
  – Vertex #: 105,153,952
  – Edge #: 6,603,753,128
• Benchmark: PageRank

Partition qualities
Scheme    Edge Cut
Random    98.52%
LDG1      82.88%
LDG2      75.69%
LDG3      66.37%
LDG4      56.34%
METIS     3.48%
Balance factor: < 1%.
15/19

Partition Awareness in PAGE
[Figure: two bar charts, one for PAGE and one for Giraph, of average time (s/iteration) per partition scheme; series: overall cost, sync remote comm. cost, sync local comm. cost]
16/19

Compare with the naive solution
[Figure: bar chart of average time (s/iteration) per partition scheme; series: Giraph, Giraph-GPSop, PAGE]
* The Giraph-GPSop is the naive solution.
17/19

Contribution & Conclusion
• We identify the problem of partition unaware inefficiency.
• We set up a new partition aware graph computation engine, PAGE.
• We design a Dynamic Concurrency Control Model based on several heuristic rules to better profile the characteristics of the graph partition.
• Finally, we demonstrate PAGE’s robustness and efficiency on different graph partition qualities.
18/19

Email: [email protected]
19/19
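
The slides name the two DCCM rules (Ability Lower-bound and Workload Balance Ratio) without giving their formulas. The Python sketch below shows one plausible reading only and should not be taken as the authors' model. It assumes the lower bound means "enough message process units that their combined processing rate covers the incoming message rate", and that the balance ratio means "split those units between the local and remote message processors in proportion to their workloads". The function name, its parameters, and the message counts are all hypothetical; the counts are simply scaled from the METIS and Random edge-cut percentages, treating the edge cut as the remote message share.

```python
import math


def dccm_sketch(msgs_local, msgs_remote, gen_rate, unit_rate):
    """Hypothetical concurrency choice; NOT the formulas used in PAGE.

    msgs_local / msgs_remote: message counts observed by a monitor.
    gen_rate:  messages arriving per second (assumed metric).
    unit_rate: messages one process unit handles per second (assumed metric).
    Returns (total units, local units, remote units).
    """
    # Assumed form of the ability lower bound: total processing rate must
    # be at least the arrival rate. Keep at least one unit per processor.
    n_total = max(2, math.ceil(gen_rate / unit_rate))

    # Assumed form of the workload balance ratio: split units in
    # proportion to the local and remote message counts.
    total_msgs = msgs_local + msgs_remote
    if total_msgs == 0:
        n_local = n_total // 2
    else:
        n_local = round(n_total * msgs_local / total_msgs)
    n_local = min(max(n_local, 1), n_total - 1)
    return n_total, n_local, n_total - n_local


# Illustrative inputs: 10,000 messages split by edge cut (3.48% and 98.52%).
print(dccm_sketch(9652, 348, gen_rate=8000, unit_rate=1000))  # METIS-like
print(dccm_sketch(148, 9852, gen_rate=8000, unit_rate=1000))  # Random-like
```

Under these assumptions, a high-quality partition gets most of its units on the local side and a low-quality one gets most on the remote side, which is the kind of adaptation the Dual Concurrent MP slide describes.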