Transcript Slides - Yubao Wu
Fast and Unified Local Search for Random Walk Based K-Nearest Neighbor Query in Large Graphs
Yubao Wu (1), Ruoming Jin (2), Xiang Zhang (1)
(1) Case Western Reserve University, (2) Kent State University
Speaker: Yubao Wu
K-Nearest Neighbor Query in Graphs
Which nodes are most similar to the query node?
[Figure: example graph with the query node labeled]
Yubao Wu, Ruoming Jin, Xiang Zhang. Fast and Unified Local Search for Random Walk Based K-Nearest Neighbor Query in Large Graphs. SIGMOD, 2014.
K-Nearest Neighbor Query: Challenges
1) How to design proximity measures that effectively capture the similarity between nodes?
2) How to efficiently identify the top 𝑘 nodes for a given measure?
Proximity Measures
a) Shortest path distance
b) Network flow
c) Katz score
d) Random walk based:
   1) Hitting time
   2) Random walk with restart
   3) Commute time
   Variants:
   • Discounted hitting time
   • Truncated hitting time
   • Penalized hitting probability
   • Degree normalized RWR
Computational Methods for KNN Query
Methods | Key idea | Pre-computation? | Applicability
Global iteration (GI) | Iterative method | No | Wide
Castanet [1] | Improved GI | No | RWR
Matrix based [2] | Matrix decomposition | Yes | RWR
Graph embedding [3] | Graph embedding | Yes | HT / RWR / CT
Disadvantages:
• Iterating over the entire graph
• Pre-computing step is expensive
[1] Y. Fujiwara, et al. SIGMOD’13
[2] Tong, ICDM’06; Fujiwara, KDD’12; Fujiwara, VLDB’12
[3] X. Zhao, et al. VLDB’13
K-Nearest Neighbor Query: Challenge
Challenge: An efficient local search method?
• Guarantees exactness
• Applies to different measures
Our Method: FLoS (Fast Local Search)
Contributions:
1) Exact top 𝑘 nodes
2) General method (a variety of proximity measures)
3) Simple local search strategy
   • no preprocessing
   • no global iteration
No Local Maximum Property
[Figure: two 20 × 20 grid graphs, each with the query node marked. Panels: no local maximum; with local maximum (the local maximum is marked).]
Measures With and Without Local Maximum
Abbr. | Proximity measure | Local maximum?
HT | Hitting time | No
DHT | Discounted hitting time | No
THT | Truncated hitting time | No
PHP | Penalized hitting probability | No
EI | Effective importance (degree normalized RWR) | No
RWR | Random walk with restart | Yes
CT | Commute time | Yes
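The table can be read against a concrete check. The sketch below assumes one working definition that the slides do not spell out: a non-query node is a local maximum if none of its neighbors has a strictly higher proximity to the query (higher meaning closer). The function name, the path graph, and the score values are hypothetical and hand-picked for illustration; they are not results from the paper.

```python
def local_maxima(graph, scores, query):
    """Return the non-query nodes whose score is >= the score of every neighbor.

    graph:  dict mapping node -> list of neighbor nodes (undirected)
    scores: dict mapping node -> proximity to the query (higher = closer)
    """
    return sorted(
        v
        for v, nbrs in graph.items()
        if v != query and nbrs and all(scores[v] >= scores[u] for u in nbrs)
    )


# Hypothetical path graph 0 - 1 - 2 - 3 with node 0 as the query.
PATH = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

# Scores that fall off with distance from the query: no local maximum.
decreasing = {0: 1.0, 1: 0.5, 2: 0.25, 3: 0.1}

# Scores where node 2 beats both of its neighbors: a local maximum.
bumpy = {0: 1.0, 1: 0.3, 2: 0.5, 3: 0.1}

print(local_maxima(PATH, decreasing, query=0))
print(local_maxima(PATH, bumpy, query=0))
```

For measures where a smaller value means closer (for example hitting time), the comparison would be flipped.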
Local Search Process
[Figure: local search on an example graph. Legend: query node, visited node, boundary node, unvisited node.]
Bounding the Unvisited Nodes
[Figure: 20 × 20 grid graphs with the query node marked and a boundary separating visited from unvisited nodes. Panels: no local maximum; with local maximum (the local maximum is marked).]
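One way to read this slide: if a measure has no local maximum, the best unvisited node must have a strictly closer neighbor, and that neighbor has to be a visited node on the boundary. So the largest proximity on the boundary caps every unvisited node. The sketch below illustrates that reading; the function names, the toy graph, and the bound values are assumptions for illustration, and it assumes higher scores mean closer.

```python
def boundary_nodes(graph, visited):
    """Visited nodes that have at least one unvisited neighbor."""
    return {v for v in visited if any(u not in visited for u in graph[v])}


def unvisited_upper_bound(graph, visited, upper):
    """Cap on the proximity of every unvisited node.

    upper: dict mapping visited node -> upper bound on its proximity.
    Valid only for measures without local maxima (assumption stated above).
    Returns 0.0 when there is no boundary, i.e. nothing is left to visit.
    """
    boundary = boundary_nodes(graph, visited)
    return max(upper[v] for v in boundary) if boundary else 0.0


# Hypothetical undirected toy graph; node 0 is the query.
GRAPH = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1, 4], 4: [3]}
visited = {0, 1, 2}
upper = {0: 1.0, 1: 0.4, 2: 0.45}  # made-up upper bounds for visited nodes

print(boundary_nodes(GRAPH, visited))
print(unvisited_upper_bound(GRAPH, visited, upper))
```

Here only node 1 touches an unvisited node, so its upper bound is the cap, even though node 2 has a larger bound.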
Bounding the Visited Nodes
[Figure: upper bound, exact proximity value, and lower bound for the visited nodes. Legend: query, visited node, unvisited node.]
Bounding the Visited Nodes: Monotonicity
[Figure: upper bound, exact proximity value, and lower bound for the visited nodes. Legend: query, visited node, unvisited node.]
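With lower and upper bounds on every visited node and a cap on all unvisited nodes, a natural stopping test is: the 𝑘 best lower bounds must be at least as large as every rival's upper bound. The slides do not state the stopping rule, so the sketch below is an assumed version of it; the function name and the numbers are hypothetical, and higher scores are taken to mean closer.

```python
def top_k_is_certain(lower, upper, unvisited_upper, k):
    """Check whether the top-k visited nodes can no longer change.

    lower, upper:    dicts mapping visited node -> bound (query excluded)
    unvisited_upper: single cap on the proximity of all unvisited nodes
    Returns (certain, top), where top lists the k nodes with the best lower bounds.
    """
    if k < 1 or len(lower) < k:
        return False, []
    ranked = sorted(lower, key=lambda v: lower[v], reverse=True)
    top, rest = ranked[:k], ranked[k:]
    # Weakest guaranteed score inside the candidate top-k.
    threshold = min(lower[v] for v in top)
    # Best score any rival (visited or unvisited) could still reach.
    best_rival = max([upper[v] for v in rest] + [unvisited_upper])
    return threshold >= best_rival, top


# Made-up bounds for three visited nodes.
lower = {2: 0.40, 3: 0.35, 4: 0.10}
upper = {2: 0.45, 3: 0.38, 4: 0.20}

print(top_k_is_certain(lower, upper, unvisited_upper=0.15, k=2))
print(top_k_is_certain(lower, upper, unvisited_upper=0.36, k=2))
```

In the second call an unvisited node could still reach 0.36, which beats node 3's guaranteed 0.35, so the search would have to continue. If the bounds tighten monotonically as more nodes are visited, this test can only move from false to true.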
Running Example
Iteration | Newly visited nodes
1 | {2,3}
2 | {4}
3 | {5}
4 | {6,7}
5 | {8}
[Figure: toy graph with the query node; top-2 nodes; trend of the bounds.]
Relationships Among Proximity Measures
• Penalized hitting probability (PHP)
• Effective importance (EI)
• Discounted hitting time (DHT)
Theorem: PHP, EI, and DHT give the same ranking results.
• Random walk with restart (RWR)
Theorem: RWR(𝑖) ∝ degree(𝑖) ∙ PHP(𝑖)
Note: RWR has local maximum.
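The proportionality can be checked numerically on a toy graph. The sketch below assumes common textbook forms of the two measures, neither of which is spelled out on the slides: PHP fixes the query at 1 and sets every other node to a decay factor times the average over its neighbors, and RWR follows r = (1 - restart) · Pᵀr + restart · e_q on an undirected, unweighted graph. It also assumes decay = 1 - restart. The graph and all function names are hypothetical, and plain fixed-point iteration stands in for whatever solver the authors use.

```python
# Hypothetical undirected toy graph as an adjacency list; node 0 is the query.
GRAPH = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1, 4], 4: [3]}


def php(graph, query, decay, iters=500):
    """Penalized hitting probability by fixed-point iteration (assumed definition)."""
    h = {v: 0.0 for v in graph}
    h[query] = 1.0
    for _ in range(iters):
        new = {}
        for v, nbrs in graph.items():
            if v == query:
                new[v] = 1.0
            else:
                new[v] = decay * sum(h[u] for u in nbrs) / len(nbrs)
        h = new
    return h


def rwr(graph, query, restart, iters=500):
    """Random walk with restart by power iteration (assumed definition)."""
    r = {v: 0.0 for v in graph}
    r[query] = 1.0
    for _ in range(iters):
        new = {v: 0.0 for v in graph}
        for u, nbrs in graph.items():
            # Node u spreads its non-restarting mass evenly over its neighbors.
            share = (1.0 - restart) * r[u] / len(nbrs)
            for v in nbrs:
                new[v] += share
        new[query] += restart
        r = new
    return r


def proportionality_ratios(graph, query, restart):
    """RWR(i) / (degree(i) * PHP(i)) for every node; should be one constant."""
    h = php(graph, query, decay=1.0 - restart)
    r = rwr(graph, query, restart)
    return {v: r[v] / (len(graph[v]) * h[v]) for v in graph}


ratios = proportionality_ratios(GRAPH, query=0, restart=0.5)
print(ratios)
```

Under these assumptions every ratio comes out equal up to floating-point error. Dividing RWR by the degree therefore gives a ranking identical to PHP, which fits the table listing EI as degree normalized RWR.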
Experiments: Datasets
Real datasets:
Dataset | Abbr. | #nodes | #edges
Amazon | AZ | 334,863 | 925,872
DBLP | DP | 317,080 | 1,049,866
Youtube | YT | 1,134,890 | 2,987,624
LiveJournal | LJ | 3,997,962 | 34,681,189
Synthetic datasets:
• In-memory: varying size, varying density
• Disk-resident: varying size
Experiments: State-of-the-art Methods
Our method (exact) | Abbr. | Key idea | Ref. | Exactness
FLoS_PHP | GI_PHP | Global iteration | - | Exact
FLoS_PHP | DNE | Local search | CIKM’12 | Approx.
FLoS_PHP | NN_EI | Local search | CIKM’13 | Exact
FLoS_PHP | LS_EI | Local search | KDD’10 | Approx.
FLoS_RWR | GI_RWR | Global iteration | - | Exact
FLoS_RWR | Castanet | Improved GI | SIGMOD’13 | Exact
FLoS_RWR | K-dash | Matrix inversion | VLDB’12 | Exact
FLoS_RWR | GE_RWR | Graph embedding | VLDB’13 | Approx.
FLoS_RWR | LS_RWR | Local search | KDD’10 | Approx.
Experiments: PHP, Real Graphs
[Figures: running time (AZ); visited nodes]
• 1-3 orders of magnitude faster
• Only a small portion of the nodes is visited
Experiments: RWR, Real Graphs
[Figures: running time (AZ), with the plot annotation "Have long precomputing time"; visited nodes]
• Fast
• Only a small portion of the nodes is visited
Experiments: PHP/RWR, Disk-Resident Syn. Graphs
[Figures: running time; visited nodes]
• Processes disk-resident graphs in seconds
Conclusions
FLoS (fast local search) algorithm
1) Exact top 𝑘 nodes
2) General method (a variety of proximity measures)
3) Simple local search strategy (efficient)
   • no preprocessing
   • no global iteration
Thank You!
Questions?
Backup Slides: Bounding the Visited Nodes
Lower bound: deleting all transition probabilities incident to unvisited nodes
Upper bound: adding one dummy node
[Figure: original graph; transition graph; transition graph (lower bound); transition graph (upper bound). Nodes 1, 2, 3, 4 are visited; nodes 5, 6, 7, 8 are unvisited.]
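Both constructions can be sketched with one routine for PHP. The sketch assumes the same PHP form as a fixed-point equation (query fixed at 1, every other node equal to a decay factor times the average over its neighbors) and replaces every unvisited neighbor by a single dummy node with a fixed proximity. A dummy value of 0 amounts to deleting the transitions into unvisited nodes and gives a lower bound. Any dummy value at least as large as the true proximity of every unvisited node gives an upper bound. How the authors choose the dummy value is not stated on the slide, so the value 1.0 used here is only a safe, loose assumption, and the graph and function name are hypothetical.

```python
# Hypothetical undirected toy graph; node 0 is the query.
GRAPH = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1, 4], 4: [3]}


def php_bound(graph, visited, query, decay, dummy_value, iters=500):
    """PHP solved on the visited nodes only.

    Every unvisited neighbor is treated as one dummy node whose proximity
    is fixed at dummy_value; original degrees are kept, so transition
    probabilities inside the visited set are unchanged.
    """
    h = {v: 0.0 for v in visited}
    h[query] = 1.0
    for _ in range(iters):
        new = {}
        for v in visited:
            if v == query:
                new[v] = 1.0
                continue
            total = sum(h[u] if u in visited else dummy_value for u in graph[v])
            new[v] = decay * total / len(graph[v])
        h = new
    return h


visited = {0, 1, 2}
lower = php_bound(GRAPH, visited, query=0, decay=0.5, dummy_value=0.0)
upper = php_bound(GRAPH, visited, query=0, decay=0.5, dummy_value=1.0)
# With every node visited there is no dummy node, so this is the exact PHP.
exact = php_bound(GRAPH, set(GRAPH), query=0, decay=0.5, dummy_value=0.0)

for v in sorted(visited):
    print(v, lower[v], exact[v], upper[v])
```

On this toy graph the exact value of each visited node falls between its two bounds. A tighter upper bound would use a smaller dummy value, for example a cap derived from the boundary nodes.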