Slides - Yubao Wu


Fast and Unified Local Search for Random Walk Based K-Nearest Neighbor Query in Large Graphs

Yubao Wu (1), Ruoming Jin (2), Xiang Zhang (1)
(1) Case Western Reserve University, (2) Kent State University
Speaker: Yubao Wu

K-Nearest Neighbor Query in Graphs
Which nodes are most similar to the query node?

[Figure: example graph with the query node labeled "Query".]
Yubao Wu, Ruoming Jin, Xiang Zhang. Fast and Unified Local Search for Random Walk Based K-Nearest Neighbor Query in Large Graphs. SIGMOD, 2014.

K-Nearest Neighbor Query: Challenges
1) How to design proximity measures that can effectively capture the similarity between nodes?
2) How to efficiently identify the top 𝑘 nodes for a given measure?

Proximity Measures
a) Shortest path distance
b) Network flow
c) Katz score
d) Random walk based:
   1) Hitting time
   2) Random walk with restart
   3) Commute time
   • Discounted hitting time
   • Truncated hitting time
   • Penalized hitting probability
   • Degree normalized RWR

Computational Methods for KNN Query

Methods                Key Idea               Pre-computation?   Applicability
Global iteration (GI)  Iterative method       No                 Wide
Castanet [1]           Improved GI            No                 RWR
Matrix based [2]       Matrix decomposition   Yes                RWR
Graph embedding [3]    Graph embedding        Yes                HT / RWR / CT

Disadvantages:
• Iterating over the entire graph
• Pre-computing step is expensive

[1] Y. Fujiwara, et al. SIGMOD'13
[2] Tong, ICDM'06; Fujiwara, KDD'12; Fujiwara, VLDB'12
[3] X. Zhao, et al. VLDB'13
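The global iteration (GI) row of the table can be illustrated for RWR with a plain power iteration. This is a minimal sketch rather than the authors' code: it uses the standard RWR update r ← (1 − c)·Pᵀr + c·e_q, and the function names, the adjacency-dict input, and the restart probability c = 0.15 are assumptions made for illustration. Every pass touches every edge of the graph, which is the first disadvantage listed above.

```python
def rwr_global_iteration(adj, q, c=0.15, tol=1e-12, max_iter=10_000):
    """Random walk with restart from query node q by global power iteration.

    adj: dict mapping each node to a non-empty list of neighbors
         (hypothetical input format, unweighted graph).
    c:   restart probability (assumed value).
    """
    nodes = list(adj)
    r = {v: 0.0 for v in nodes}
    r[q] = 1.0
    for _ in range(max_iter):
        nxt = {v: 0.0 for v in nodes}
        # Each node spreads (1 - c) of its mass evenly over its neighbors.
        for u in nodes:
            share = (1.0 - c) * r[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        # The walker restarts at the query node with probability c.
        nxt[q] += c
        diff = sum(abs(nxt[v] - r[v]) for v in nodes)
        r = nxt
        if diff < tol:
            break
    return r


def top_k(scores, q, k):
    """Return the k highest-scoring nodes, excluding the query node."""
    ranked = sorted((v for v in scores if v != q), key=lambda v: -scores[v])
    return ranked[:k]
```

On a four-node path graph `{0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}` with query node 0, `top_k(rwr_global_iteration(adj, 0), 0, 2)` ranks nodes 1 and 2 first, but only after iterating over the whole graph.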

K-Nearest Neighbor Query: Challenge
Challenge: An efficient local search method?
• Guarantees the exactness
• Applies to different measures

Our Method: FLoS (Fast Local Search)

Contributions:

1) Exact top 𝑘 nodes
2) General method (a variety of proximity measures)
3) Simple local search strategy
   • no preprocessing
   • no global iteration

No Local Maximum Property
[Figure: proximity values on a 20 × 20 grid graph around the query node, in two panels: "No local maximum" and "With local maximum" (the local maximum is marked).]

Measures With and Without Local Maximum

Abbr.  Proximity measure                              Local maximum?
HT     Hitting time                                   No
DHT    Discounted hitting time                        No
THT    Truncated hitting time                         No
PHP    Penalized hitting probability                  No
EI     Effective importance (degree normalized RWR)   No
RWR    Random walk with restart                       Yes
CT     Commute time                                   Yes
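Whether a given score vector has a local maximum can be checked directly. The sketch below takes "local maximum" to mean a non-query node with no neighbor of strictly higher score, with scores oriented so that higher means closer to the query. That reading, the function name, and the input format are assumptions; the paper's formal definition may handle ties differently.

```python
def local_maxima(adj, score, q):
    """Return the non-query nodes that have no strictly better neighbor.

    adj:   dict mapping each node to a list of neighbors (hypothetical format).
    score: dict mapping each node to its proximity (higher = closer to q).
    """
    return [v for v in adj
            if v != q and all(score[u] <= score[v] for u in adj[v])]
```

An empty result means the scores have the no-local-maximum property on this graph: from any node, repeatedly moving to a better neighbor eventually reaches the query node.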

Local Search Process
[Figure: local search expanding outward from the query node. Legend: query node, visited node, boundary node, unvisited node.]

Bounding the Unvisited Nodes
[Figure: 20 × 20 grid graph with the query node, showing the visited region, its boundary, and the unvisited region, in two panels: "No local maximum" and "With local maximum" (the local maximum is marked).]
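One way to read this slide: when a measure has no local maximum, a chain of strictly improving neighbors leads from any unvisited node back to the query node, and that chain must enter the visited set through a boundary node. The best boundary score therefore bounds every unvisited score. The sketch below illustrates that reading only; the function names, the input format, and the convention that higher scores mean closer are assumptions.

```python
def boundary_nodes(adj, visited):
    """Visited nodes that still have at least one unvisited neighbor."""
    visited = set(visited)
    return {v for v in visited if any(u not in visited for u in adj[v])}


def unvisited_upper_bound(adj, visited, upper):
    """Upper bound on the proximity of every unvisited node.

    upper: dict mapping each visited node to an upper bound on its proximity.
    Valid only for measures without a local maximum (higher = closer).
    Returns 0.0 when no boundary node exists (nothing reachable is unvisited).
    """
    return max((upper[v] for v in boundary_nodes(adj, visited)), default=0.0)
```

For measures with a local maximum, such as RWR, this bound does not hold, because an unvisited node can score higher than every boundary node.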

Bounding the Visited Nodes
[Figure: upper bound, exact proximity value, and lower bound for the visited nodes. Legend: query, visited node, unvisited node.]

Bounding the Visited Nodes: Monotonicity
[Figure: upper bound, exact proximity value, and lower bound for the visited nodes. Legend: query, visited node, unvisited node.]

Running Example
[Figure: toy graph with the query node; panels "Top-2 nodes" and "Trend of the bounds".]

Iteration   Newly visited nodes
1           {2,3}
2           {4}
3           {5}
4           {6,7}
5           {8}

Relationships Among Proximity Measures

• Penalized hitting probability
• Effective importance
• Discounted hitting time

Theorem: PHP, EI, and DHT give the same ranking results.

Random walk with restart

Theorem: RWR(𝑖) ∝ degree(𝑖) ∙ PHP(𝑖)
Note: RWR has local maximum.
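The proportionality can be checked numerically on a small graph. The slides do not define PHP, so the sketch assumes the recurrence PHP(q) = 1 and PHP(i) = d · Σ_j p(i,j) · PHP(j) for i ≠ q, with decay factor d. Under that assumption, substituting RWR(i) = degree(i) · s(i) into the standard RWR recurrence on an undirected graph gives s(i) = (1 − c) · Σ_j p(i,j) · s(j) for i ≠ q, which is the PHP recurrence with d = 1 − c. All function names and parameter values are illustrative.

```python
def rwr(adj, q, c, iters=500):
    """RWR with restart probability c, by fixed-point iteration."""
    r = {v: 0.0 for v in adj}
    r[q] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        for u in adj:
            for v in adj[u]:
                nxt[v] += (1.0 - c) * r[u] / len(adj[u])
        nxt[q] += c
        r = nxt
    return r


def php(adj, q, decay, iters=500):
    """PHP under the assumed recurrence: 1 at q, decayed neighbor average elsewhere."""
    r = {v: 0.0 for v in adj}
    r[q] = 1.0
    for _ in range(iters):
        r = {v: (1.0 if v == q
                 else decay * sum(r[u] for u in adj[v]) / len(adj[v]))
             for v in adj}
    return r


def proportionality_ratios(adj, q, c=0.15):
    """RWR(i) / (degree(i) * PHP(i)) for every node; should be one constant."""
    r, p = rwr(adj, q, c), php(adj, q, 1.0 - c)
    return {v: r[v] / (len(adj[v]) * p[v]) for v in adj}
```

On a connected undirected graph, every entry of `proportionality_ratios(adj, q)` agrees up to floating-point error. The degree factor is also why RWR can have a local maximum while PHP does not: a high-degree node far from the query can outscore all of its neighbors.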


Experiments: Datasets

Type       Dataset        Abbr.  #nodes      #edges
Real       Amazon         AZ     334,863     925,872
Real       DBLP           DP     317,080     1,049,866
Real       Youtube        YT     1,134,890   2,987,624
Real       LiveJournal    LJ     3,997,962   34,681,189
Synthetic  In-memory      -      Varying size
Synthetic  In-memory      -      Varying density
Synthetic  Disk-resident  -      Varying size

Experiments: State-of-the-art Methods

Our methods (exact)  Abbr.     Key idea          Ref.       Exactness
FLoS_PHP             GI_PHP    Global iteration  -          Exact
FLoS_PHP             DNE       Local search      CIKM'12    Approx.
FLoS_PHP             NN_EI     Local search      CIKM'13    Exact
FLoS_PHP             LS_EI     Local search      KDD'10     Approx.
FLoS_RWR             GI_RWR    Global iteration  -          Exact
FLoS_RWR             Castanet  Improved GI       SIGMOD'13  Exact
FLoS_RWR             K-dash    Matrix inversion  VLDB'12    Exact
FLoS_RWR             GE_RWR    Graph embedding   VLDB'13    Approx.
FLoS_RWR             LS_RWR    Local search      KDD'10     Approx.

Experiments: PHP, Real Graphs
[Figures: running time (AZ); visited nodes.]
• 1-3 orders of magnitude faster
• A small portion of the nodes are visited

Experiments: RWR, Real Graphs
[Figures: running time (AZ), with the annotation "Have long precomputing time"; visited nodes.]
• Fast
• A small portion of the nodes are visited

Experiments: PHP/RWR, Disk-Resident Syn. Graphs
[Figures: running time; visited nodes.]
• Process disk-resident graph in seconds

Conclusions

FLoS (fast local search) algorithm

1) Exact top 𝑘 nodes
2) General method (a variety of proximity measures)
3) Simple local search strategy (efficient)
   • no preprocessing
   • no global iteration

Thank You!

Questions?


Backup Slides: Bounding the Visited Nodes
• Lower bound: delete all transition probabilities incident to unvisited nodes
• Upper bound: add one dummy node
[Figure: original graph; transition graph; transition graph (lower bound); transition graph (upper bound). Nodes 1, 2, 3, 4 are visited; nodes 5, 6, 7, 8 are unvisited.]
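The two constructions can be sketched for PHP. Several things here are assumptions, not taken from the slides: the PHP recurrence (PHP(q) = 1, PHP(i) = c · Σ_j p(i,j) · PHP(j) for i ≠ q), the decay factor c = 0.5, and the value carried by the dummy node, which is fixed at 1.0 because PHP never exceeds 1 under that recurrence. The paper may use a tighter dummy value. Dropping transitions to unvisited nodes treats their proximity as 0, giving a lower bound; redirecting them to the dummy node treats their proximity as 1.0, giving an upper bound.

```python
def php_bounds(adj, q, visited, c=0.5, dummy_value=1.0, iters=200):
    """Lower and upper bounds on PHP for the visited nodes only.

    adj:         dict mapping each node to a list of neighbors (hypothetical format).
    visited:     iterable of visited nodes; must contain q.
    c:           PHP decay factor (assumed value).
    dummy_value: proximity assigned to the dummy node; any valid upper bound
                 on the PHP of unvisited nodes works (1.0 is an assumption).
    """
    visited = set(visited)
    lower = {v: 0.0 for v in visited}
    upper = {v: dummy_value for v in visited}
    lower[q] = upper[q] = 1.0
    for _ in range(iters):
        new_lower, new_upper = {q: 1.0}, {q: 1.0}
        for i in visited:
            if i == q:
                continue
            p = 1.0 / len(adj[i])          # uniform transition probability
            lo = hi = 0.0
            for j in adj[i]:
                if j in visited:
                    lo += p * lower[j]
                    hi += p * upper[j]
                else:
                    # Lower bound: this transition is deleted (adds nothing).
                    # Upper bound: this transition goes to the dummy node.
                    hi += p * dummy_value
            new_lower[i] = c * lo
            new_upper[i] = c * hi
        lower, upper = new_lower, new_upper
    return lower, upper
```

Both computations touch only the visited nodes and their incident edges. When every node is visited, the two bounds coincide with the exact PHP values, and as the visited set grows the lower bounds can only rise and the upper bounds can only fall.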
