TDD: Research Topics in Distributed Databases


QSX: Querying Social Graphs
Graph algorithms in MapReduce
• MapReduce: an introduction
• BFS for distance queries
• PageRank
• Keyword search
• Subgraph isomorphism
Motivation for studying parallel algorithms
Graph queries are costly:
• BFS for reachability: linear time, O(|V| + |E|)
• kNN join: quadratic time
• Graph pattern matching by graph simulation: quadratic time
Worse still:
• Regular path queries (finding simple paths): NP-complete
• Graph pattern matching via subgraph isomorphism: NP-complete
Furthermore, real-life graphs are typically big. Facebook: 1.38 billion nodes and 140 billion links.
Can we efficiently evaluate graph queries?
The impact of the sheer volume of big data
Using an SSD with a scan rate of 6 GB/s, a linear scan of a data set D would take:
• 1.9 days when D is 1 PB (10^15 bytes): 10^15 / (6 × 10^9) ≈ 1.7 × 10^5 seconds
• 5.28 years when D is 1 EB (10^18 bytes)
A departure from classical computational complexity theory. Traditional computational complexity theory of almost 50 years:
• The good: polynomial-time computable (PTIME)
• The bad: NP-hard (intractable)
• The ugly: PSPACE-hard, EXPTIME-hard, undecidable…
Is it feasible to query real-life big graphs?
Parallel query answering
We can do better, provided more resources.
Using 10,000 SSDs of 6 GB/s each, a linear scan of D might take:
• 1.9 days / 10,000 ≈ 16 seconds when D is 1 PB (10^15 bytes)
• 5.28 years / 10,000 ≈ 4.6 hours when D is 1 EB (10^18 bytes)
Only ideally!
[Figure: a shared-nothing architecture: 10,000 processors (P), each with its own memory (M) and database partition (DB), connected by an interconnection network.]
How to capitalize on the resources and reduce response time?
Pipelined parallelism
• Pipelining: a sequence of operations is applied to each data item, each operation conducted by a different processor.
[Figure: data flows through processors P running op1, op2, …, opn as a pipeline; each processor has its own memory (M) and database partition (DB), connected by an interconnection network.]
What are the limitations?
Parallel query answering
Given a big graph G and n processors S1, …, Sn:
• G is partitioned into fragments (G1, …, Gn)
• G is distributed to the n processors: Gi is stored at Si
Each processor Si processes the operations of a query Q on its local fragment Gi, in parallel; Q(G) is assembled from the partial answers Q(G1), Q(G2), …, Q(Gn).
Dividing a big G into small fragments of manageable size. A minimal sketch of the idea follows.
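For concreteness, a minimal single-machine sketch of this scheme in Python (not part of the original slides). The fragmentation strategy, the process pool, and the example query (an out-degree filter, which is purely local to each node so its partial answers can simply be unioned) are all illustrative assumptions.

from concurrent.futures import ProcessPoolExecutor

def evaluate_on_fragment(args):
    # Q(Gi): evaluate the query on one fragment; here Q selects nodes with out-degree >= threshold
    fragment, threshold = args
    return {n for n, neighbors in fragment.items() if len(neighbors) >= threshold}

def partition(graph, n_fragments):
    # Naive fragmentation: deal nodes (with their adjacency lists) out round-robin
    fragments = [dict() for _ in range(n_fragments)]
    for i, (node, neighbors) in enumerate(graph.items()):
        fragments[i % n_fragments][node] = neighbors
    return fragments

def parallel_query(graph, n_fragments=4, threshold=2):
    fragments = partition(graph, n_fragments)
    with ProcessPoolExecutor(max_workers=n_fragments) as pool:   # one "processor" Si per fragment Gi
        partial = pool.map(evaluate_on_fragment, [(f, threshold) for f in fragments])
    answer = set()
    for p in partial:                                            # assemble Q(G) from Q(G1), ..., Q(Gn)
        answer |= p
    return answer

if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a", "b", "d"], "d": []}
    print(parallel_query(graph))   # nodes with out-degree >= 2: {'a', 'c'}

For queries whose answers span fragments (e.g., reachability), the partial answers cannot simply be unioned; coordinating such computations is exactly what frameworks like MapReduce address.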
MapReduce
MapReduce
• A programming model (from Google) with two primitive functions:
  – Map: <k1, v1> → list(<k2, v2>)
  – Reduce: <k2, list(v2)> → list(<k3, v3>)
• Input: a list of key-value pairs <k1, v1>
• Map: applied to each pair, computes intermediate key-value pairs <k2, v2>
  – the intermediate key-value pairs are hash-partitioned on k2; each partition (k2, list(v2)) is sent to a reducer
• Reduce: takes a partition as input, and computes key-value pairs <k3, v3>
The process may reiterate: multiple map/reduce steps.
How does it work?
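For illustration, a self-contained Python simulation of the model on a single machine (the function names and the in-memory shuffle are my own scaffolding, not Hadoop's API), using the classic word-count example:

from collections import defaultdict

# Map: <k1, v1> -> list(<k2, v2>); here k1 is a line number and v1 a line of text
def map_fn(k1, v1):
    return [(word, 1) for word in v1.split()]

# Reduce: <k2, list(v2)> -> list(<k3, v3>); here it sums the counts per word
def reduce_fn(k2, values):
    return [(k2, sum(values))]

def mapreduce(records, map_fn, reduce_fn):
    # "Shuffle": group intermediate pairs by k2 (hash-partitioning in a real system)
    groups = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in map_fn(k1, v1):
            groups[k2].append(v2)
    # Each partition is reduced independently
    out = []
    for k2, values in groups.items():
        out.extend(reduce_fn(k2, values))
    return out

lines = enumerate(["big graphs need parallel algorithms", "big data big graphs"])
print(mapreduce(list(lines), map_fn, reduce_fn))   # e.g. ('big', 3), ('graphs', 2), ...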
Architecture (Hadoop)
[Figure: the Hadoop data flow. The input is stored in a distributed file system (DFS) and partitioned into blocks (64 MB); each block of <k1, v1> pairs goes to one mapper (a map task). Each mapper writes <k2, v2> pairs to its local store; the pairs are hash-partitioned on k2 and shipped to reducers, which emit <k3, v3> pairs and aggregate the results; the process may run in multiple steps.]
No need to worry about how the data is stored and sent.
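The routing of intermediate pairs can be pictured as a hash partition on k2. The sketch below is my own illustration of the idea (Hadoop's default partitioner behaves similarly, but this is not Hadoop code):

def partition(k2, num_reducers):
    # Route an intermediate key to a reducer: hash(k2) mod (#reducers).
    # Python's string hash is randomized across runs but stable within one run,
    # which is all that matters here: equal keys always land in the same bucket.
    return hash(k2) % num_reducers

pairs = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
num_reducers = 2
buckets = {r: [] for r in range(num_reducers)}
for k2, v2 in pairs:
    buckets[partition(k2, num_reducers)].append((k2, v2))
print(buckets)   # all pairs sharing a key k2 are in the same bucket

Because the bucket is a function of k2 alone, all values for the same intermediate key reach the same reducer, which is what makes the grouping in Reduce possible.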
Parallelism
[Figure: the mappers consume <k1, v1> pairs and produce <k2, v2> pairs as one parallel computation; the reducers consume the grouped pairs and produce <k3, v3> pairs as another parallel computation.]
What parallelism? Data-partitioned parallelism.
Popular in industry
• Apache Hadoop, used by Facebook, Yahoo, …
  – Hive, Facebook, HiveQL (SQL)
  – Pig, Yahoo, Pig Latin (SQL-like)
  – SCOPE, Microsoft, SQL
  – Cassandra, Facebook, CQL (no join)
  – HBase, distributed open-source counterpart of Google's BigTable
  – MongoDB, document-oriented (NoSQL)
• Scalability
  – Yahoo!: 10,000 cores, for Web search queries (2008)
  – Facebook: 100 PB, growing by about half a PB per day
  – Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3); The New York Times used 100 EC2 instances to process 4 TB of image data for $240
Study Spark: https://spark.apache.org/
Advantages of MapReduce
• Simple: one only needs to define two functions; no need to worry about how the data is stored and distributed, or how the operations are scheduled
• Scalability: a large number of low-end machines
  – scale out (scale horizontally): add new machines to the distributed system, using low-cost "commodity" hardware
  – scale up (scale vertically): upgrade, i.e., add (costly) resources to a single node
• Independence: it can work with various storage layers
• Flexibility: independent of data models or schemas
Fault tolerance: why?
Fault tolerance
[Figure: the same data flow as before, but each input block is triplicated in the DFS; mappers and reducers run as before.]
• Detecting failures and reassigning the tasks of failed nodes to healthy nodes
• Redundancy checking to achieve load balancing
Able to handle an average of 1.2 failures per analysis job.
MapReduce algorithms
Input: query Q and graph G
Output: the answers Q(G) to Q in G

map(key: node, value: (adjacency-list, …))
{ computation;
  emit(mkey, mvalue)
}

reduce(key: mkey, value: list[mvalue])
{ …
  emit(rkey, rvalue)
}

Compatibility: the reducer's input must match the (mkey, mvalue) pairs emitted by the mappers; when multiple iterations of MapReduce are needed, the (rkey, rvalue) pairs emitted by the reducers must in turn match the mappers' input of the next round.
Control flow
• Copy files from the input directory to staging dir 1; preprocessing
• Iterations of MapReduce:
  while (termination condition is not satisfied) do {
    a) map from staging dir 1;
    b) reduce into staging dir 2;
    c) move files from staging dir 2 to staging dir 1
  }
• Postprocessing; move files from staging dir 2 to the output directory
• Termination: decided by a non-MapReduce driver program
Functional programming: no global data structures accessible and mutable by all. A minimal driver-loop sketch follows.
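A minimal sketch of such a driver in Python, assuming a run_mapreduce(src, dst) helper that executes one map/reduce round over files (hypothetical; a real driver would submit a Hadoop job and test a counter or flag rather than diffing directories):

import os, shutil

def read_all(d):
    # Concatenate the sorted contents of all files in a directory (toy comparison)
    return sorted(open(os.path.join(d, name)).read() for name in sorted(os.listdir(d)))

def converged(dir_a, dir_b):
    return read_all(dir_a) == read_all(dir_b)

def drive(run_mapreduce, input_dir, staging1="stage1", staging2="stage2", max_rounds=50):
    shutil.copytree(input_dir, staging1)               # preprocessing: copy input to staging dir 1
    for _ in range(max_rounds):
        run_mapreduce(src=staging1, dst=staging2)      # one map + reduce round
        done = converged(staging1, staging2)           # termination test, outside MapReduce
        shutil.rmtree(staging1)
        shutil.move(staging2, staging1)                # the output becomes the next round's input
        if done:
            break
    return staging1                                    # postprocessing would move this to the output dir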
BFS for distance queries
Dijkstra's algorithm for distance queries
Distance: the single-source shortest-path problem
• Input: a directed weighted graph G, and a node s in G
• Output: the lengths of the shortest paths from s to all nodes in G

Dijkstra(G, s, w):
(Use a priority queue Que; w(u, v) is the weight of edge (u, v); d[u] is the distance from s to u.)
1. for all nodes v in V do
   a. d[v] ← ∞;
2. d[s] ← 0; Que ← V;
3. while Que is nonempty do
   a. u ← ExtractMin(Que);   (extract the node with the minimum d[u])
   b. for all nodes v in adj(u) do
      a) if d[v] > d[u] + w(u, v) then d[v] ← d[u] + w(u, v);

Complexity: O(|V| log |V| + |E|). MapReduce?
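For reference, a runnable Python version of the algorithm (a lazy-deletion variant using heapq instead of an explicit decrease-key priority queue; the example graph is an assumption):

import heapq

def dijkstra(adj, s):
    # adj: {u: [(v, w(u, v)), ...]}; returns {u: d[u]} for nodes reachable from s
    dist = {s: 0}
    heap = [(0, s)]                        # priority queue of (tentative distance, node)
    while heap:
        d_u, u = heapq.heappop(heap)       # extract a node with minimum tentative distance
        if d_u > dist.get(u, float("inf")):
            continue                       # stale entry: this node was already settled with a shorter path
        for v, w in adj.get(u, []):
            if dist.get(v, float("inf")) > d_u + w:
                dist[v] = d_u + w          # relax edge (u, v)
                heapq.heappush(heap, (dist[v], v))
    return dist

adj = {"s": [("a", 10), ("c", 5)], "c": [("a", 3), ("d", 2)],
       "a": [("b", 1)], "d": [("b", 6)], "b": []}
print(dijkstra(adj, "s"))   # distances: s=0, a=8, b=9, c=5, d=7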
A MapReduce algorithm
Input: graph G, represented by adjacency lists
• Node N:
  – Node id: nid n
  – N.distance: the distance from the start node s to N found so far
  – N.AdjList: [(m, w(n, m))], the id of each neighbor m together with the weight of edge (n, m)
Key-value pairs (note the two different value structures):
• Key: node id n
• Value: either a distance from s to n found so far, or the node N itself (id, AdjList, etc.)
Mapper
Map(nid n, node N)
• d ← N.distance;
• emit(nid n, N);
• for each (m, w) in N.AdjList do
  – emit(nid m, d + w(n, m));   (revise the distance of m via n)
Parallel processing:
• all nodes are processed in parallel, each by a mapper
• for each node m adjacent to n, emit a revised distance to m via n
• emit(nid n, N): why? to preserve the graph structure for iterative processing
Data-partitioned parallelism
Reducer
Reduce(nid m, list)
(Grouped by node id; each d in the list is either a distance to m from a predecessor, or node m itself.)
• dmin ← ∞;
• for all d in list do
  – if IsNode(d) then M ← d;   (always there; why? the mapper emits every node)
  – else if d < dmin then dmin ← d;   (minimum distance so far)
• M.distance ← dmin;   (update M.distance for this round)
• emit(nid m, node M);

The list for m contains the distances from all predecessors so far, plus node M itself, which must exist (it was emitted by the mapper). The updated M.distance is the minimum over all predecessors. A runnable sketch of the whole algorithm follows.
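Putting the mapper and reducer together, a single-machine simulation in Python (the Node class, the in-memory shuffle, and the driver loop are my own scaffolding; one deliberate tweak over the slide's pseudocode is noted in a comment):

from collections import defaultdict

INF = float("inf")

class Node:
    def __init__(self, nid, adj, distance=INF):
        self.nid, self.adj, self.distance = nid, adj, distance   # adj: [(m, w(n, m))]

def map_fn(n, N):
    yield (n, N)                                   # preserve the graph structure
    if N.distance < INF:
        for m, w in N.adj:
            yield (m, N.distance + w)              # revised distance to m via n

def reduce_fn(m, values):
    dmin, M = INF, None
    for d in values:
        if isinstance(d, Node):
            M = d                                  # the node itself, always emitted by the mapper
        elif d < dmin:
            dmin = d                               # minimum distance from any predecessor
    # Tweak over the slide's pseudocode: also keep M's current distance in the minimum,
    # so the start node (which has no predecessors) is not reset to infinity.
    M.distance = min(dmin, M.distance)
    return (m, M)

def sssp(nodes, source):
    nodes[source].distance = 0
    while True:
        groups = defaultdict(list)
        for n, N in nodes.items():                 # "map phase"
            for k, v in map_fn(n, N):
                groups[k].append(v)
        old = {n: N.distance for n, N in nodes.items()}
        nodes = dict(reduce_fn(m, vs) for m, vs in groups.items())   # "reduce phase"
        if all(nodes[n].distance == old[n] for n in nodes):          # driver-side termination
            return {n: N.distance for n, N in nodes.items()}

graph = {"s": [("a", 10), ("c", 5)], "a": [("b", 1), ("c", 2)],
         "c": [("a", 3), ("d", 2)], "d": [("b", 6)], "b": []}
nodes = {n: Node(n, adj) for n, adj in graph.items()}
print(sssp(nodes, "s"))   # e.g. s=0, a=8, b=9, c=5, d=7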
Iterations and termination
Each MapReduce iteration advances the "known frontier" by one hop.
• Subsequent iterations include more and more reachable nodes as the frontier expands
• Multiple iterations are needed to explore the entire graph
Termination: when the intermediate result no longer changes
• i.e., no node n has N.distance changed in the last round
• controlled by a non-MapReduce driver
• use a flag, inspected by the non-MapReduce driver
Termination control
Iteration 0: Base case
mapper: (a,<s,10>) (c,<s,5>)   [the edges out of s]
reducer: (a,<10, …>) (c,<5, …>)
[Figure: the example weighted graph; initially s = 0 and a, b, c, d are all ∞; the "wave" starts at s.]
Iteration 1
mapper: (a,<s,10>) (c,<s,5>) (a,<c,8>) (c,<a,12>) (b,<a,11>) (b,<c,14>) (d,<c,7>)   [edges]
reducer: (a,<8, …>) (c,<5, …>) (b,<11, …>) (d,<7, …>)   [e.g., group (a,<s,10>) and (a,<c,8>), keep the minimum]
[Figure: entering this round, a = 10 and c = 5 (from the base case); b and d are still ∞; the "wave" advances.]
Iteration 2
mapper: (a,<s,10>) (c,<s,5>) (a,<c,8>) (c,<a,10>) (b,<a,9>) (b,<c,14>) (d,<c,7>) (b,<d,13>) (d,<b,15>)   [edges; a = 8 now, so its out-edges yield 10 and 9]
reducer: (a,<8>) (c,<5>) (b,<9>) (d,<7>)
[Figure: after this round, a = 8, b = 9, c = 5, d = 7; the "wave" covers the whole graph.]
Iteration 3
mapper: (a,<s,10>) (c,<s,5>) (a,<c,8>) (c,<a,10>) (b,<a,9>) (b,<c,14>) (d,<c,7>) (b,<d,13>) (d,<b,13>)   [edges]
reducer: (a,<8>) (c,<5>) (b,<9>) (d,<7>)
No change! Convergence!
[Figure: the final distances are s = 0, a = 8, b = 9, c = 5, d = 7.]
Efficiency?
MapReduce explores all paths in parallel.
Each MapReduce iteration advances the "known frontier" by one hop.
• Redundant work, since useful work is only done at the "frontier"
Dijkstra's algorithm is more efficient:
• at any step it only pursues edges from the minimum-cost path inside the frontier
Any other sources of inefficiency? (e.g., data skew)
A closer look
Data-partitioned parallelism:
• Local computation at each node in the mapper, in parallel: attributes of the node, adjacent edges and local link structures
• Propagating computations: traversing the graph; this may involve iterative MapReduce
Tips:
• Represent the graph by adjacency lists
• Do local computation in the mapper
• Pass partial results along outlinks, keyed by destination node
• Perform aggregation in the reducer over the inlinks of a node
• Iterate until convergence, controlled by an external "driver"
• Pass graph structures between iterations
Need a way to test for convergence.
PageRank
PageRank
The likelihood that page v is visited by a random walk:
P(v) = ε · (1/|V|) + (1 - ε) · Σ_{u ∈ L(v)} P(u)/C(u)
• the first term: a random jump
• the second term: following a link from another page (L(v): the pages that link to v; C(u): the out-degree of u)
Recursive computation: for each page v in G, compute P(v) by using P(u) for all u ∈ L(v), until
• convergence: no changes to any P(v), or
• a fixed number of iterations has been reached
How to speed it up?
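As a baseline before the MapReduce version, a direct iterative computation of this formula in Python (the ε value and the example graph, the same five-node graph used later in these slides, are placeholders):

def pagerank(out_links, eps=0.15, iters=50):
    # out_links: {u: [v, ...]}; C(u) = len(out_links[u]); L(v) = {u : v in out_links[u]}
    V = list(out_links)
    P = {v: 1.0 / len(V) for v in V}                      # start from the uniform distribution
    for _ in range(iters):                                 # fixed number of iterations
        new = {}
        for v in V:
            incoming = sum(P[u] / len(out_links[u]) for u in V if v in out_links[u])
            new[v] = eps / len(V) + (1 - eps) * incoming   # random jump + following links
        P = new
    return P

out_links = {"n1": ["n2", "n4"], "n2": ["n3", "n5"], "n3": ["n4"],
             "n4": ["n5"], "n5": ["n1", "n2", "n3"]}
print(pagerank(out_links))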
A MapReduce algorithm
Input: graph G, represented by adjacency lists
• Node N:
  – Node id: nid n
  – N.rank: the current rank of n
  – N.AdjList: [m], the ids of the nodes that n links to
• Key: node id n
• Value: either a rank contribution, or the node N itself (id, AdjList, etc.)
• Simplified version: P(v) = Σ_{u ∈ L(v)} P(u)/C(u), i.e., assume that ε = 0
Mapper
Map(nid n, node N)
• p ← N.rank / |N.AdjList|;   (P(u)/C(u))
• emit(nid n, N);
• for each m in N.AdjList do
  – emit(nid m, p);   (pass rank to neighbors)
Parallel processing:
• all nodes are processed in parallel, each by a mapper
• the PageRank mass at n is passed to the successors of n
• emit(nid n, N): preserve the graph structure for iterative processing
Local computation in the mapper
Reducer
Reduce(nid m, list)
• s ← 0;
• for all p in list do
  – if IsNode(p) then M ← p;   (recover the graph structure)
  – else s ← s + p;   (sum up the incoming contributions)
• M.rank ← s;   (updated M.rank for this round)
• emit(nid m, node M);

The list for m contains the contributions P(u)/C(u) from all predecessors u of m; at the end of the round, M.rank = Σ_{u ∈ L(m)} P(u)/C(u).
Aggregation in the reducer
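A single-machine simulation of these two functions in Python, implementing the simplified ε = 0 version (the Node class, the in-memory shuffle, and the fixed iteration count are my own scaffolding):

from collections import defaultdict

class Node:
    def __init__(self, nid, adj, rank):
        self.nid, self.adj, self.rank = nid, adj, rank

def map_fn(n, N):
    yield (n, N)                               # preserve the graph structure
    if N.adj:                                  # assumes dangling nodes are rare; skip them here
        p = N.rank / len(N.adj)                # P(u)/C(u)
        for m in N.adj:
            yield (m, p)                       # pass rank to each successor

def reduce_fn(m, values):
    s, M = 0.0, None
    for p in values:
        if isinstance(p, Node):
            M = p                              # the node itself
        else:
            s += p                             # sum the incoming contributions
    M.rank = s
    return (m, M)

def pagerank_mr(adj, iters=20):
    nodes = {n: Node(n, a, 1.0 / len(adj)) for n, a in adj.items()}
    for _ in range(iters):                     # termination: a fixed number of iterations
        groups = defaultdict(list)
        for n, N in nodes.items():
            for k, v in map_fn(n, N):
                groups[k].append(v)
        nodes = dict(reduce_fn(m, vs) for m, vs in groups.items())
    return {n: N.rank for n, N in nodes.items()}

adj = {"n1": ["n2", "n4"], "n2": ["n3", "n5"], "n3": ["n4"],
       "n4": ["n5"], "n5": ["n1", "n2", "n3"]}
print(pagerank_mr(adj))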
PageRank in MapReduce
[Figure: one round of PageRank in MapReduce on the example graph with adjacency lists n1 [n2, n4], n2 [n3, n5], n3 [n4], n4 [n5], n5 [n1, n2, n3]. The Map phase emits each node together with its rank contributions to its successors; the Reduce phase groups the pairs by destination node and emits the updated nodes.]
Termination control: external driver.
Acknowledgments: some animation slides are borrowed from www.cs.kent.edu/~jin/Cloud12Spring/GraphAlgorithms.pptx
Keyword search
Distinct-root trees
Input: a list Q = (k1, …, km) of keywords, a directed graph G, and a positive integer k
Output: distinct-root trees that match Q, bounded by k
Match: a subtree T = (r, (k1, p1, d1(r, p1)), …, (km, pm, dm(r, pm))) of G such that
• each keyword ki in Q is contained in a leaf pi of T
• pi is closest to r among all nodes that contain ki
• the distance from the root r of T to each leaf pi does not exceed k
A simplified version: dj(r, pj) ≤ k for all j, giving k iterations (the termination condition).
A MapReduce algorithm
Input: graph G, represented by adjacency lists
• Node N:
  – Node id: nid n
  – N.((K1, P1, D1), …, (Km, Pm, Dm)): representing (n, (k1, p1, d1(n, p1)), …, (km, pm, dm(n, pm)))
  – N.AdjList: [m], node id
• Key: node id n
• Preprocessing of N.((K1, P1, D1), …, (Km, Pm, Dm)): for each i,
  – Pi = ∅ and Di = ∞ if N does not contain ki
  – Pi = n and Di = 0 otherwise
Preprocessing can be done in MapReduce itself.
Mapper
Map(nid n, node N)
• emit(nid n, N);
• for each m in N.AdjList do   (m is the node id of node M)
  – emit(nid n, (M.(K1, P1, D1+1), …, M.(Km, Pm, Dm+1)));
Local computation:
• shortcut one node
• one hop forward
Contrast this to, e.g., PageRank: here information is passed from the successors.
Reducer
Reduce(nid n, list)
(N: the node represented by n; it must be in the list.)
• for i from 1 to m do
  – pi ← N.Pi; di ← N.Di;
• for i from 1 to m do   (group by keyword ki)
  – Si ← the set of all M.(Ki, Pi, Di) in the list
  – di ← the smallest M.Di in Si; pi ← the corresponding M.Pi   (pick the one with the shortest distance to n)
• for i from 1 to m do
  – N.Pi ← pi; N.Di ← di;
• emit(nid n, node N);

Invariant: in iteration j, N.((K1, P1, D1), …, (Km, Pm, Dm)) represents (n, (k1, p1, d1(n, p1)), …, (km, pm, dm(n, pm))), i.e., the shortest distances within j hops.
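A compact single-machine sketch of this propagation in Python. It is my own simplification: each node keeps, per keyword, the closest keyword-bearing node found within the hops explored so far, and one loop round plays the role of one MapReduce iteration.

INF = float("inf")

def keyword_search(adj, contains, keywords, k):
    # adj: {n: [successors]}; contains: {n: set of keywords at n}; k: distance bound / #iterations
    # state[n][kw] = (closest node p containing kw within the explored radius, distance d(n, p))
    state = {n: {kw: (n, 0) if kw in contains[n] else (None, INF) for kw in keywords}
             for n in adj}
    for _ in range(k):                                   # k MapReduce-style rounds
        new = {}
        for n in adj:
            entry = {}
            for kw in keywords:
                best = state[n][kw]                      # node n's own current entry
                for m in adj[n]:                         # look one hop forward: "shortcut" node m
                    p, d = state[m][kw]
                    if d + 1 < best[1]:
                        best = (p, d + 1)
                entry[kw] = best
            new[n] = entry
        state = new
    # post-processing: n roots a valid match if every keyword is reachable within k hops
    return {n: e for n, e in state.items() if all(d <= k for _, d in e.values())}

adj = {"r": ["x", "y"], "x": ["z"], "y": [], "z": []}
contains = {"r": set(), "x": {"db"}, "y": {"graph"}, "z": {"query"}}
print(keyword_search(adj, contains, ["db", "graph", "query"], k=2))   # only "r" qualifies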
Termination and post-processing
Termination: after k iterations, for the given positive integer k.
Post-processing: upon termination, for each node n with N.((K1, P1, D1), …, (Km, Pm, Dm)):
• if no Pi = ∅ for i from 1 to m, then N.((K1, P1, D1), …, (Km, Pm, Dm)) represents a valid match (n, (k1, p1, d1(n, p1)), …, (km, pm, dm(n, pm)))
A different way of passing information during traversal.
Graph pattern matching by subgraph isomorphism
Graph pattern matching by subgraph isomorphism
• Input: a pattern query Q and a data graph G
• Output: all the matches of Q in G, i.e., all subgraphs of G that are isomorphic to Q
A match is a bijective function f on nodes such that (u, u') is an edge of Q iff (f(u), f(u')) is an edge of G.
NP-complete. MapReduce?
A MapReduce algorithm
Input: graph G, represented by adjacency lists
• Node N:
  – Node id: nid n
  – N.Gd: the subgraph of G rooted at n, consisting of the nodes within d hops of n, where d is the radius of Q
  – N.AdjList: [m], node id
• Key: node id n
• Preprocessing: for each node n, compute N.Gd
  – a MapReduce algorithm of d iterations
  – adjacency lists are only used in the preprocessing step
Two MapReduce steps: preprocessing, and computation.
Algorithm
Map(nid n, node N)
• compute all matches S of Q in N.Gd   (invoke any algorithm for subgraph isomorphism, e.g., VF2 or Ullmann's)
• emit(1, S);

Reduce(1, list)   (not strictly necessary; it just eliminates duplicates)
• M ← the union of all sets in the list
• emit(M, 1);

• Correctness? All and only isomorphic mappings? Yes, by data locality: every match of Q falls within N.Gd for some node n. Note, however, the large amount of redundant computation.
• Parallel scalability? The more processors, the faster? Yes, as long as the number of processors does not exceed the number of nodes of G (just a conceptual-level evaluation).
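A conceptual single-machine sketch in Python, with a naive brute-force matcher standing in for VF2/Ullmann and hand-built fragments standing in for the N.Gd neighborhoods:

from itertools import permutations

def matches(Q_nodes, Q_edges, G_nodes, G_edges):
    # Naive subgraph isomorphism: try every injective mapping of Q's nodes into G's nodes,
    # using the slide's condition: (u, v) in Q iff (f(u), f(v)) in G.
    G_edges = set(G_edges)
    found = set()
    for image in permutations(G_nodes, len(Q_nodes)):
        f = dict(zip(Q_nodes, image))
        if all(((f[u], f[v]) in G_edges) == ((u, v) in Q_edges)
               for u in Q_nodes for v in Q_nodes if u != v):
            found.add(frozenset(f.items()))
    return found

def map_fn(fragment):
    # fragment: (nodes, edges) of some N.Gd; emit all local matches under the single key 1
    nodes, edges = fragment
    return (1, matches(Q_NODES, Q_EDGES, nodes, edges))

def reduce_fn(values):
    out = set()
    for s in values:
        out |= s            # union eliminates duplicates found in overlapping fragments
    return out

Q_NODES, Q_EDGES = ["x", "y"], {("x", "y")}
fragments = [(["a", "b", "c"], [("a", "b"), ("b", "c")]),
             (["b", "c", "d"], [("b", "c"), ("c", "d")])]
print(reduce_fn([map_fn(f)[1] for f in fragments]))   # three distinct edge matches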
Summing up
Summary and review
• Why do we need parallel algorithms for querying graphs?
• What is the MapReduce framework?
• How to develop graph algorithms in MapReduce?
  – Graph representation
  – Local computation in the mapper
  – Aggregation in the reducer
  – Termination
• Graph algorithms in MapReduce may not be efficient. Why?
• Develop your own graph algorithms in MapReduce. Give a correctness proof, complexity analysis and performance guarantees for your algorithms.
Project (1)
Recall strongly connected components (Lecture 2).
• Implement a MapReduce algorithm that, given a graph G, computes all (maximal) strongly connected components of G
• Develop optimization strategies
• Experimentally evaluate your algorithm, especially its scalability with the size of G
• Write a survey on parallel algorithms for computing strongly connected components, as part of the related work
A development project
Project (2)
Recall kNN joins (Lecture 2).
• Implement a MapReduce algorithm for evaluating kNN join queries
• Develop optimization strategies
• Experimentally evaluate your algorithm
• Write a survey on parallel algorithms for kNN queries and kNN join queries, as part of the related work
A development project
Project (3)
Recall keyword search with Steiner-tree semantics (Lecture 2).
• Implement a MapReduce algorithm for keyword search with Steiner-tree semantics
• Develop optimization strategies
• Experimentally evaluate your algorithm
• Write a survey on parallel algorithms for keyword search, as part of the related work
A development project
Papers for you to review
• W. Fan, F. Geerts, and F. Neven. Making Queries Tractable on Big Data with Preprocessing. VLDB 2013.
• Y. Tao, W. Lin, and X. Xiao. Minimal MapReduce Algorithms. SIGMOD 2013. http://www.cse.cuhk.edu.hk/~taoyf/paper/sigmod13-mr.pdf
• L. Qin, J. Yu, L. Chang, H. Cheng, C. Zhang, and X. Lin. Scalable Big Graph Processing in MapReduce. SIGMOD 2014. http://www1.se.cuhk.edu.hk/~hcheng/paper/SIGMOD2014qin.pdf
• W. Lu, Y. Shen, S. Chen, and B. Ooi. Efficient Processing of k Nearest Neighbor Joins using MapReduce. PVLDB 2012. http://arxiv.org/pdf/1207.0141.pdf
• V. Rastogi, A. Machanavajjhala, L. Chitnis, and A. Das Sarma. Finding Connected Components in MapReduce in Logarithmic Rounds. ICDE 2013. http://arxiv.org/pdf/1203.5387.pdf