LLNL Graph X-Ray: Fast Best-Effort Pattern Matching in Large Attributed Graphs Hanghang Tong, Brian Gallagher, Christos Faloutsos, Tina Eliassi-Rad 8/13/2007 KDD 2007, San Jose.

Input / Output
• Input: Query Graph (nodes: CEO, SEC, Accountant, Manager) and Attributed Data Graph
• Output: Matching Subgraph

Terminology: "Conform"
• The Matching Subgraph conforms to the Query Graph
[Figure: matching subgraph next to the query graph]

Terminology: "Interception"
• Path 12-13-4 is an interception
[Figure: matching subgraph and query graph, with four matching nodes and one intermediate node labeled]

Terminology: "Instantiate"
• Matching Subgraph: Ht; Query Graph: Hq
• Node 11 instantiates the SEC node
• Ht instantiates Hq

Roadmap
• Introduction
  – Problem Definition
  – Motivations
• How to: Graph X-Ray
• Experimental Results
• Conclusion

Motivation: Why Not SQL?
• Case 1: Exact match does not exist
  – Q: How to find an approximate answer?
• Case 2: Too many exact matches
  – Q: How to rank them?

Motivation: Why Not SQL? (Cont.)
• Case 3: Exact match might not be the best answer
  – "Find a CEO who has heavy contact with an Accountant"
• Q: How to find the right one?
[Figure: an exact match with 1 direct connection vs. an inexact match with many indirect connections]

Motivation: Efficiency
• Why Not Subgraph Isomorphism?
  – Polynomial only for a fixed query pattern size
• Q1: How to scale up linearly?
• Q2: … and with a small slope?

Wish List
• Effectiveness
  – Both exact and inexact matches
  – Ranking among multiple results
  – "Best" answer (proximity-based)
• Efficiency
  – Scale linearly
  – Scale with a small slope
G-Ray meets all!

Roadmap
• Introduction
  – Problem Definition
  – Motivations
• How to: Graph X-Ray
• Experimental Results
• Conclusion

Preliminary: Center-Piece Subgraph [Tong+]
• CePS(A, B, C): the center-piece subgraph for query nodes A, B, C
• CePS is a meta-operation in G-Ray!
[Figure: original graph (left) and the resulting CePS (right); black nodes are query nodes]

Preliminary: Augmented Graph
• Data nodes
  – 1,…13
• Attribute nodes
• Footnote: the augmented graph is crucial for computation!
[Figure: augmented graph with data nodes 1 to 13 and attribute nodes]
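
The augmented graph above can be illustrated with a minimal sketch. It assumes the simplest construction consistent with the slide: one extra node per attribute value, linked to every data node carrying that value. The function name `build_augmented_graph`, the `("attr", value)` key convention, and the toy node ids are illustrative assumptions, not taken from the talk.

```python
from collections import defaultdict


def build_augmented_graph(edges, attributes):
    """Return an undirected adjacency map over data nodes plus attribute nodes.

    edges: iterable of (u, v) pairs between data nodes.
    attributes: dict mapping each data node to its attribute value.
    Attribute nodes are keyed as ("attr", value) so they cannot collide
    with data node ids (an assumption made for this sketch).
    """
    adj = defaultdict(set)
    # Keep the original data-graph edges.
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Link every data node to the node representing its attribute value.
    for node, value in attributes.items():
        attr_node = ("attr", value)
        adj[node].add(attr_node)
        adj[attr_node].add(node)
    return dict(adj)


# Toy data, made up for illustration (not the graph from the slides).
toy_edges = [(1, 2), (2, 3), (3, 4)]
toy_attributes = {1: "CEO", 2: "SEC", 3: "Accountant", 4: "CEO"}
augmented = build_augmented_graph(toy_edges, toy_attributes)
```

With this layout, a proximity computation started from an attribute node such as `("attr", "CEO")` reaches all data nodes with that attribute in one hop, which is one plausible reading of why the slide calls the augmented graph crucial for computation.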

G-Ray: quick overview (for a loop query)
• Step 1: SF
• Step 2: NE
• Step 3: BR
• Step 4: NE
• Step 5: BR
• Step 6: NE
• Step 7: BR
• Step 8: BR
SF: Seed-Finder; NE: Neighborhood-Expander; BR: Bridge

Seed-Finder
• Q: How to instantiate the SEC node?
• A: 11 = CePS( ) [arguments shown graphically on the slide]
• Footnote: node 11 is close to some unknown data nodes for "CEO", "Accountant" and "Manager"
[Figure: data graph with nodes 1 to 13]

Neighborhood-Expander
• Q: How to instantiate the CEO node?
  – Step 1 → Step 2?
• A: 12 = CePS( ) [arguments shown graphically on the slide]
• Footnote:
  – Step 3 → Step 4? 7 = CePS( 11 ) [remaining arguments shown graphically]
  – Step 5 → Step 6? 4 = CePS( 7, 12 ) [remaining arguments shown graphically]
[Figure: data graph with nodes 1 to 13]

Bridge
• Q: Step 6 (NE) → Step 7 (BR)?
• A: Prim-like algorithm
  – To maximize [objective shown graphically on the slide]
  – Should block nodes 11 and 7
• Footnote
  – Connection subgraph, or one single path?

Roadmap
• Introduction
  – Problem Definition
  – Motivation
• How to: Graph X-Ray
• Experimental Results
• Conclusion

Experimental Results
• Dataset: DBLP
  – Node: author (315k)
  – Edge: co-authorship (1,800k)
  – Attribute: conference & year (13k), e.g., KDD-2001, SIGMOD…

Effectiveness: star-query
[Figure: query and result]

Effectiveness: line-query
[Figure: query and result]

Effectiveness: loop-query
[Figure: query and result]

Efficiency
• Scales linearly
• Small slope
• 3-5 seconds
[Figure: Response Time. Average response time (seconds, 0 to 80) vs. # of edges (0 to 2 x 10^6, ~2 M edges); legend: Fast FSGM, Iterative method]

Roadmap
• Introduction
  – Problem Definition
  – Motivation
• How to: Graph X-Ray
• Experimental Results
• Conclusion

Conclusion
• Graph X-Ray (G-Ray)
  – Best-effort pattern matching in large attributed graphs
  – Scales linearly with a small slope
• More details in the Poster Session
  – Monday (tonight)
  – Board number 8

[Figure: data graph and the matching subgraph found by G-Ray]
Thank you!
www.cs.cmu.edu/~htong

Backup slides

Proximity on Graph
• a.k.a. relevance, closeness
• Multi-faceted
• Punish long paths
• Edge weight
• How to: random walk with restart
[Figure: example graph with nodes 1 to 12]

Random walk with restart
• Ranking vector r4 (random walk restarting at Node 4):

  Node     Score
  Node 1   0.13
  Node 2   0.10
  Node 3   0.13
  Node 4   0.22
  Node 5   0.13
  Node 6   0.05
  Node 7   0.05
  Node 8   0.08
  Node 9   0.04
  Node 10  0.03
  Node 11  0.04
  Node 12  0.02

• Nearby nodes, higher scores
• More red, more relevant
[Figure: example graph with each node colored and labeled by its score]

How to rank the results
• Our goodness function
  – Measure the proximity, in both directions, between any two matching nodes that are required to be connected
  – Multiply them together
• In G-Ray, we approximately optimize this goodness function
• If we have multiple matching subgraphs, we can rank them according to this goodness function

How to rank the results
[Figure: matching subgraph with four matching nodes: 12, 4, 7, 11]
Goodness = Prox (12, 4) x Prox (4, 12)
         x Prox (7, 4) x Prox (4, 7)
         x Prox (11, 7) x Prox (7, 11)
         x Prox (12, 11) x Prox (11, 12)
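
The proximity and goodness ideas from the backup slides can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes an unweighted, undirected graph given as a dense adjacency matrix, computes random walk with restart by plain power iteration, and reads Prox(i, j) as the score of node j in the ranking vector of node i. The function names `rwr_scores` and `goodness`, the restart probability of 0.15, and the 4-node ring are all assumptions chosen for the example.

```python
import numpy as np


def rwr_scores(adjacency, seed, restart=0.15, iters=200):
    """Ranking vector of a random walk with restart starting at `seed`."""
    A = np.asarray(adjacency, dtype=float)
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # isolated nodes keep an all-zero column
    W = A / col_sums               # column-normalized transition matrix
    e = np.zeros(A.shape[0])
    e[seed] = 1.0                  # restart distribution: all mass on the seed
    r = e.copy()
    for _ in range(iters):
        # Follow an edge with probability (1 - restart), else jump back to seed.
        r = (1.0 - restart) * (W @ r) + restart * e
    return r


def goodness(adjacency, required_pairs, restart=0.15):
    """Product of two-way proximities over node pairs that must be connected."""
    cache = {}

    def prox(i, j):
        # Prox(i, j): score of node j in the ranking vector of node i.
        if i not in cache:
            cache[i] = rwr_scores(adjacency, i, restart)
        return cache[i][j]

    score = 1.0
    for i, j in required_pairs:
        score *= prox(i, j) * prox(j, i)
    return score


# Toy 4-node ring 0-1-2-3-0 (made up for illustration).
ring = np.array([
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
])
r0 = rwr_scores(ring, seed=0)
adjacent_score = goodness(ring, [(0, 1)])
opposite_score = goodness(ring, [(0, 2)])
```

On the ring, the seed node receives the highest score and the adjacent pair scores higher than the opposite pair, matching the "nearby nodes, higher scores" remark. Ranking several candidate matching subgraphs then amounts to sorting them by their `goodness` value.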