Transcript PPT

PAGE: A Partition Aware Graph Computation Engine
Yingxia Shao, Junjie Yao, Bin Cui, Lin Ma
EECS, Peking University, China
Agenda
• Background
• Design of PAGE
• Experimental results
• Conclusion
2/19
Background
• Prevalent large-scale graphs
– Social networks
– Web graph
–…
• Graph computing systems
– Pregel (Google)
– Giraph (Apache)
– GPS (Stanford)
– GraphLab (CMU)
– …
3/19
Background
• Graph Partitioning
– Offline approach
• METIS (Karypis Lab)
– Online approach
• Streaming partitioning
• Linear Deterministic Greedy (LDG) algorithm (I. Stanton)
Problem:
Existing graph computation systems cannot efficiently integrate high-quality graph partitioning.
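The LDG heuristic named above is a streaming partitioner: each arriving vertex goes to the partition that already holds most of its neighbors, discounted by how full that partition is. The sketch below follows the usual statement of LDG (score for partition i = neighbors already in P_i × (1 - |P_i| / C), with C the capacity); the function name, the (vertex, neighbors) stream format and the least-loaded tie-break are illustrative choices, not code from PAGE.

```python
def ldg_partition(stream, k, capacity):
    """Assign vertices arriving in a stream to k partitions with the LDG heuristic.

    stream: iterable of (vertex, neighbors) pairs in arrival order.
    capacity: maximum number of vertices a partition may hold.
    """
    assignment = {}          # vertex -> partition id
    sizes = [0] * k          # current number of vertices per partition
    for v, neighbors in stream:
        best, best_score = None, None
        for i in range(k):
            if sizes[i] >= capacity:
                continue  # partition i is full
            # neighbors of v that were already placed in partition i
            placed = sum(1 for u in neighbors if assignment.get(u) == i)
            # linear penalty that shrinks the score as partition i fills up
            score = placed * (1.0 - sizes[i] / capacity)
            # keep the highest score; break ties toward the least loaded partition
            if best is None or score > best_score or (
                score == best_score and sizes[i] < sizes[best]
            ):
                best, best_score = i, score
        if best is None:
            raise ValueError("all partitions are full")
        assignment[v] = best
        sizes[best] += 1
    return assignment


# two triangles {0, 1, 2} and {3, 4, 5} arriving one vertex at a time
stream = [(0, [1, 2]), (1, [0, 2]), (2, [0, 1]), (3, [4, 5]), (4, [3, 5]), (5, [3, 4])]
print(ldg_partition(stream, k=2, capacity=3))  # {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
```

Each vertex is placed once and never moved, which is what makes LDG cheap enough to run online, in contrast to an offline partitioner such as METIS.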
4/19
Inefficient partition integration
[Figure: average time (s/iteration) for each partition scheme, broken down into overall cost, sync remote comm. cost and local comm. cost.]
Running PageRank on Giraph with six different graph partition qualities.
• Graph partitioning quality improves from left to right.
• Higher-quality graph partitioning leads to worse overall performance.
5/19
Motivation for PAGE
[Diagram: both a low-quality and a high-quality graph partition feed into a novel graph computation engine.]
This calls for a novel graph computation engine that efficiently integrates graph partitions of varying quality.
6/19
Agenda
• Background
• Design of PAGE
• Experimental results
• Conclusion
7/19
Message processor
[Diagram: a message processor contains several message process units. Messages (msg.) are grouped into message blocks, each consisting of a header followed by a sequence of messages.]
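A minimal sketch of the structure in the diagram, assuming each message is a (target vertex, value) pair, that a message process unit handles whole blocks, and that messages to the same vertex are merged with a combiner. The class names, the header contents and the combiner are hypothetical. CPython threads do not run CPU-bound Python code in parallel, so this only illustrates how the pieces are organised, not PAGE's performance.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class MessageBlock:
    """A block of messages: a header followed by (target_vertex, value) messages."""
    header: dict     # hypothetical header contents, e.g. {"src_worker": 1}
    messages: list   # list of (target_vertex, value) pairs


def process_block(block, combine):
    """Work done by one message process unit: fold a block into a partial inbox."""
    inbox = {}
    for target, value in block.messages:
        inbox[target] = combine(inbox[target], value) if target in inbox else value
    return inbox


class MessageProcessor:
    """A message processor that owns `num_units` message process units (threads)."""

    def __init__(self, num_units):
        self.num_units = num_units

    def process(self, blocks, combine):
        merged = {}
        with ThreadPoolExecutor(max_workers=self.num_units) as pool:
            # each unit handles whole blocks, so no two units share a block
            for partial in pool.map(lambda b: process_block(b, combine), blocks):
                for target, value in partial.items():
                    merged[target] = combine(merged[target], value) if target in merged else value
        return merged


blocks = [
    MessageBlock({"src_worker": 1}, [(0, 0.5), (1, 0.25)]),
    MessageBlock({"src_worker": 2}, [(0, 0.25)]),
]
inbox = MessageProcessor(num_units=2).process(blocks, combine=lambda a, b: a + b)
print(inbox)  # {0: 0.75, 1: 0.25}
```

Shipping messages in blocks lets one header describe many messages and gives each unit a coarse, independent piece of work.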
8/19
Inefficient partition integration
[Figure: average time (s/iteration) for each partition scheme, broken down into overall cost, sync remote comm. cost and local comm. cost.]
Running PageRank on Giraph with six different graph partition qualities.
• The local message processing cost dominates the overall cost.
• Existing systems do not provide enough local message processing capacity.
9/19
Overview of PAGE
[Diagram: PAGE workers 1, 2 and 3, each consisting of a Computation module, a Communication (Comm.) module and a Partition Aware module, operate on a distributed in-memory partitioned graph.]
PAGE applies an adaptive tuning mechanism and new cooperation methods.
10/19
Newly Designed PAGE Worker
[Diagram: a PAGE worker consists of three modules:
– Computation
– Partition Aware: Monitor and DCCM
– Communication: Sender, Receiver and Dual Concurrent MP (Local MP and Remote MP)]
11/19
Dual Concurrent Message Processor
• First type of concurrency
– A remote MP and a local MP are embedded in the Dual Concurrent MP.
• Second type of concurrency
– Each message processor contains a set of message process units.
[Diagram: Dual Concurrent MP, containing a Local MP and a Remote MP.]
• The degree of concurrency is determined automatically by the system.
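The two types of concurrency can be pictured with nested thread pools: an outer pool runs the local MP and the remote MP side by side, and each MP runs its own pool of message process units. This is a hypothetical sketch, not PAGE's implementation: it assumes the local MP handles messages generated on the same worker and the remote MP handles messages received from other workers, and the function names, chunked input and combiner are illustrative. The unit counts n_mpl and n_mpr are passed in by hand here, standing in for the values the system would choose automatically.

```python
from concurrent.futures import ThreadPoolExecutor


def fold(chunk, combine):
    """One message process unit: fold a chunk of (target, value) messages."""
    inbox = {}
    for target, value in chunk:
        inbox[target] = combine(inbox[target], value) if target in inbox else value
    return inbox


def run_mp(chunks, n_units, combine):
    """One message processor with n_units message process units (second type)."""
    with ThreadPoolExecutor(max_workers=n_units) as pool:
        return list(pool.map(lambda c: fold(c, combine), chunks))


def dual_concurrent_mp(local_chunks, remote_chunks, n_mpl, n_mpr, combine):
    """Run the local MP and the remote MP side by side (first type)."""
    with ThreadPoolExecutor(max_workers=2) as outer:
        local_future = outer.submit(run_mp, local_chunks, n_mpl, combine)
        remote_future = outer.submit(run_mp, remote_chunks, n_mpr, combine)
        partials = local_future.result() + remote_future.result()
    # merge the partial inboxes produced by every unit of both MPs
    inbox = {}
    for partial in partials:
        for target, value in partial.items():
            inbox[target] = combine(inbox[target], value) if target in inbox else value
    return inbox


# assumed split: messages produced on this worker vs. messages from other workers
local_chunks = [[(0, 1), (1, 2)]]
remote_chunks = [[(0, 3)], [(2, 4)]]
print(dual_concurrent_mp(local_chunks, remote_chunks, n_mpl=1, n_mpr=2,
                         combine=lambda a, b: a + b))  # {0: 4, 1: 2, 2: 4}
```

Keeping the two MPs separate means a flood of local messages (typical under a high-quality partition) cannot starve the processing of remote messages, and vice versa.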
12/19
Dynamic Concurrency Control Model
• The DCCM determines the proper parameters, such as n_mp, n_mpl and n_mpr.
• The DCCM is built on top of two heuristic rules:
– Ability Lower-Bound
– Workload Balance Ratio
[Diagram: the Partition Aware module, consisting of the Monitor and the DCCM.]
• Monitor
– Tracks the necessary metrics.
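The slide names the two heuristic rules but not their formulas, so the sketch below is one hypothetical reading, not PAGE's actual model. It assumes that n_mp is the total number of message process units, that n_mpl and n_mpr are the units given to the local and remote MP, that the Ability Lower-Bound means "enough units to keep up with the observed message rate", and that the Workload Balance Ratio means "split units in proportion to local vs. remote message load". The rate arguments stand in for metrics the Monitor might track.

```python
import math


def dccm(local_rate, remote_rate, unit_rate, min_units=1):
    """Hypothetical DCCM sketch returning (n_mp, n_mpl, n_mpr).

    local_rate, remote_rate: messages/s observed for the local and remote MP
    (assumed Monitor metrics); unit_rate: messages/s one unit can process.
    """
    total = local_rate + remote_rate
    # Assumed "Ability Lower-Bound": enough units to keep up with the incoming
    # message rate, and at least min_units for each of the two MPs.
    n_mp = max(2 * min_units, math.ceil(total / unit_rate))
    # Assumed "Workload Balance Ratio": n_mpl / n_mpr tracks local_rate / remote_rate.
    n_mpl = n_mp // 2 if total == 0 else round(n_mp * local_rate / total)
    n_mpl = min(max(n_mpl, min_units), n_mp - min_units)
    return n_mp, n_mpl, n_mp - n_mpl


print(dccm(local_rate=900, remote_rate=100, unit_rate=100))  # (10, 9, 1)
```

Under these assumptions a high-quality partition (mostly local messages) shifts units toward the local MP, while a random partition shifts them toward the remote MP.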
13/19
Agenda
• Background
• Design of PAGE
• Experimental results
• Conclusion
14/19
Environment & Datasets
• Experiment environment
– A 24-node cluster
• Dataset: uk-2007-05-u
– Undirected
– Vertices: 105,153,952
– Edges: 6,603,753,128
• Benchmark: PageRank
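Why partition quality matters for this benchmark: in a vertex-centric (Pregel-style) PageRank superstep, every vertex sends one message along each out-edge, and a message stays local only if both endpoints sit in the same partition. The sketch below is a minimal illustration under that assumption; the function name is hypothetical and dangling vertices (no out-edges) are simply skipped. It counts local and remote messages alongside the standard rank update (1 - d)/N + d × (sum of incoming shares).

```python
def pagerank_superstep(ranks, out_edges, assignment, damping=0.85):
    """One synchronous vertex-centric PageRank superstep.

    ranks: vertex -> rank; out_edges: vertex -> list of targets;
    assignment: vertex -> partition id. Dangling vertices are ignored here.
    Returns (new_ranks, local_messages, remote_messages).
    """
    n = len(ranks)
    incoming = {v: 0.0 for v in ranks}
    local_msgs = remote_msgs = 0
    for v, targets in out_edges.items():
        if not targets:
            continue
        share = ranks[v] / len(targets)
        for t in targets:
            incoming[t] += share  # the message v -> t
            if assignment[v] == assignment[t]:
                local_msgs += 1
            else:
                remote_msgs += 1
    new_ranks = {v: (1 - damping) / n + damping * incoming[v] for v in ranks}
    return new_ranks, local_msgs, remote_msgs


ranks = {v: 0.25 for v in range(4)}
out_edges = {0: [1], 1: [2], 2: [3], 3: [0]}  # a directed 4-cycle
new_ranks, local, remote = pagerank_superstep(ranks, out_edges, {0: 0, 1: 0, 2: 1, 3: 1})
print(local, remote)  # 2 2
```

The fewer edges a partition cuts, the larger the share of messages that become local, which is exactly the workload the local message processor has to absorb.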
Partition qualities (balance factor < 1%)
Scheme    Edge cut
Random    98.52%
LDG1      82.88%
LDG2      75.69%
LDG3      66.37%
LDG4      56.34%
METIS     3.48%
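The edge-cut percentages above are the fraction of edges whose two endpoints fall in different partitions. A minimal sketch for computing them, assuming the balance factor is defined as the largest partition size divided by the average partition size, minus one (the slides do not give the exact definition); the function name is hypothetical.

```python
from collections import Counter


def partition_quality(edges, assignment, k):
    """Return (edge_cut_ratio, balance_factor) of a k-way vertex partition.

    edges: list of undirected edges (u, v); assignment: vertex -> partition id.
    """
    cut = sum(1 for u, v in edges if assignment[u] != assignment[v])
    edge_cut_ratio = cut / len(edges)
    sizes = Counter(assignment.values())
    average = len(assignment) / k
    # assumed definition: largest partition relative to the average, minus one
    balance_factor = max(sizes[i] for i in range(k)) / average - 1
    return edge_cut_ratio, balance_factor


edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(partition_quality(edges, {0: 0, 1: 0, 2: 1, 3: 1}, k=2))  # (0.5, 0.0)
```

For a uniformly random assignment to k partitions, the expected edge-cut ratio is 1 - 1/k, which is why the Random scheme cuts almost every edge.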
15/19
Partition Awareness in PAGE
[Figure, two panels (PAGE and Giraph): average time (s/iteration) for each partition scheme, broken down into overall cost, sync remote comm. cost and sync local comm. cost.]
16/19
Comparison with the naive solution
[Figure: average time (s/iteration) for each partition scheme, comparing Giraph, Giraph-GPSop and PAGE.]
* Giraph-GPSop is the naive solution.
17/19
Contribution & Conclusion
• We identify the problem of partition-unaware inefficiency.
• We build a new partition-aware graph computation engine, PAGE.
• We design a Dynamic Concurrency Control Model (DCCM) based on several heuristic rules to better profile the characteristics of graph partitions.
• Finally, we demonstrate PAGE’s robustness and efficiency across different graph partition qualities.
18/19
Email:
[email protected]
19/19