Transcript Slide 1
Graph Drawing Heuristics for Path Finding in Large Dimensionless Graphs
Tim Weninger, Rodney R. Howell and William H. Hsu
Department of Computing and Information Sciences
Kansas State University, Manhattan KS
2009 International Conference on Artificial Intelligence
Las Vegas, NV USA
ICAI Conference July 13-16, 2009 Computing and Information Sciences Kansas State University

Outline
• Introduction
  › Motivation
  › Definition
  › Example
• Graph Drawing
  › Spring Embedding
  › High-Dimensional Embedding
• Methodology
  › Heuristics
  › Evaluation Metrics
• Results
  › Comparison
• Conclusions and Future Work

Introduction – Motivation
• Problem:
  › There exist really big graphs
• Goal:
  › We want to find paths between arbitrary vertices
  › Preferably optimal paths
Taken from The Opte Project (www.opte.org) on 6/30/2009

Introduction [2] – Motivation
• Problem:
  › Dijkstra’s algorithm is the best we can do in most cases
  › O(m+n), i.e. O(|E|+|V|), on unweighted graphs
  › We want to do better
• Insight:
  › Humans can search through graphs very quickly
  › This is done by internally visualizing the graph
• Therefore:
  › Computers should search through graphs like humans
Taken from Google Maps (www.google.com/maps) on 6/30/2009

Introduction – Definition
• Problem:
  › Most graphs (that I care about) are “dimensionless”.
• Dimensionless Graphs:
  › “Graphs whose vertices do not reference points in a Euclidean space”
• Example: Distance between some major US cities
  › Miles above the diagonal, kilometers below

                      ATL   CHI   DEN   HOU    KC    LA    MN   MIA    NY    SF   SEA
Atlanta, GA             -   715  1405   800   805  2185  1135   665   865  2495  2785
Chicago, IL          1150     -  1000  1085   525  2020   410  1380   795  2135  2070
Denver, CO           2260  1615     -  1120   600  1025   915  2065  1780  1270  1335
Houston, TX          1285  1750  1805     -   795  1550  1230  1190  1635  1930  2450
Kansas City, MO      1295   850   965  1280     -  1625   440  1470  1195  1865  1900
Los Angeles, CA      3515  3250  1650  2495  2610     -  1935  2740  2800   385  1140
Minneapolis, MN      1825   665  1470  1980   680  3110     -  1795  1200  2010  2015
Miami, FL            1070  2220  3320  1915  2365  4405  2885     -  1280  3115  3365
New York, NY         1390  1275  2865  2630  1925  4505  1935  2060     -  3055  2860
San Francisco, CA    4015  3435  2040  3105  3000   615  3240  5015  4915     -   810
Seattle, WA          4485  3330  2140  3940  3060  1835  2675  5415  4600  1305     -

Taken from HM USA Travel Guide (www.hm-usa.com) on 6/30/2009

Introduction – Definition [2]
• Solution:
  › Give the graph dimensions
  › Give each vertex a meaningful position
• Example:
  › Consider latitude and longitude

City                 Latitude   Longitude
Atlanta, GA          33° 45’    84° 23’
Chicago, IL          41° 50’    87° 37’
Denver, CO           39° 45’    105° 0’
Houston, TX          29° 45’    95° 21’
Kansas City, MO      39° 6’     94° 35’
Los Angeles, CA      34° 3’     118° 15’
Minneapolis, MN      44° 59’    93° 14’
Miami, FL            25° 46’    80° 12’
New York, NY         40° 47’    73° 58’
San Francisco, CA    37° 47’    122° 20’
Seattle, WA          47° 37’    122° 20’

Data from infoplease (www.infoplease.com) on 6/30/2009

Introduction – Example
• Travelling Salesman Problem:
  › (Approximation)
Taken from http://xkcd.com/399/ on 7/7/2009

Graph Drawing [2]
• Spring Embedding
Taken from http://www.youtube.com/watch?v=_Oidv5M-fuw on 7/7/2009

Graph Drawing
• High Dimension Embedding
Images taken from Harel & Koren, Journal of Graph Algorithms and Applications vol. 8, no. 2, pp. 195–214 (2004)

Methodology
• Steps:
  › Find/synthesize large dimensionless graphs
  › Compute Euclidean embedding
    › Force Based
    › High Dimensional
  › Search
    › With embedding heuristics
      › A*-search
    › Without embedding
      › Dijkstra’s Algorithm
  › Compare
    › Speed
    › Accuracy

Methodology [2] – Graph Corpus

Graph          Vertices     Edges
LiveJournal    770,595      2,992,607
Grid¹          1,000,000    1,998,000
DBLP²          654,628      3,573,312
Wikipedia³     400,000      3,241,997

¹ Synthetic Graph (1000x1000 checkerboard)
² Courtesy Jiawei Han, Yizhou Sun and Yintao Yu
³ Courtesy Henry Haselgrove

Methodology [3] – Searching
• Dijkstra’s Algorithm
  › All edges are unweighted
  › Therefore we have Breadth First Search
  › O(m+n)
• A*-Search
  › Heuristics
    › Euclidean Distance (Pythagorean Equation)
      › Pick candidate vertex closest to goal
    › Angle Deviation (Cosine)
      › Draw imaginary line between current vertex and goal
      › Pick candidate vertex with smallest cosine deviation

Methodology [4] – Scalability Test
• First test – How does this approach scale?
  › Use only LiveJournal graph
  › Cut the LiveJournal graph into 100, 200, 400, 800 … 102,400-vertex graphs
  › Embed each graph
    › Force Based
    › High Dimensional
  › Search
    › BFS
    › A*-search
      › Distance
      › Angle
  › 1000 repetitions
    › Random start and end vertices
  › Compare

Methodology [5] – Accuracy Test
• Second test – How accurate are the paths?
  › Use full graphs
  › Embed each graph
    › Force Based
      › Complexity Restrictions
    › High Dimensional
  › Search
    › BFS
    › A*-search
      › Distance
      › Angle
  › 1000 repetitions
    › Random start and end vertices
  › Compare

Methodology [6] – Metrics
• Speed
  › Mean number of relaxations
  › Not CPU time
• Accuracy
  › Percent Correct
    › Number of times the A*-search returns the correct path
  › Shortest Path Length
    › The actual shortest path length computed by Dijkstra’s Algorithm
    › Averaged over all repetitions
  › Mean Path Length Error
    › Error rate
    › Mean difference between Shortest Path Length and A*-search result

Results – Scalability
• How does this approach scale?
  › Dijkstra’s Algorithm on different sizes of LiveJournal Graph
  › No embedding
  › 1,000 repetitions

Size       Mean Path Length   Mean Relaxations
100        1.92               98.66
1,600      3.41               818.70
102,400    4.31               9,886.14

Results [2] – Scalability
• How does this approach scale?
  › A*-search on different sizes of LiveJournal Graph
  › Straight line distance heuristic only
  › 1,000 repetitions
  › Compared to Dijkstra’s Algorithm

Results [3] – Scalability
• How does this approach scale?
  › Comparison of differently sized LiveJournal graphs
  › Both Distance and Angle Heuristics
  › SE: Spring Embedding; HDE: High Dimension Embedding

Embedding  Heuristic  Size     SPL     Relax     Error   % Correct
SE         Distance   100      1.94    4.75      0.02    98.6%
SE         Distance   1600     4.62    70.85     1.21    40.4%
SE         Distance   102400   -       -         -       -
SE         Angle      100      1.93    4.08      0.01    99.1%
SE         Angle      1600     4.08    39.93     0.67    58.1%
SE         Angle      102400   -       -         -       -
HDE        Distance   100      1.99    5.35      0.07    97.1%
HDE        Distance   1600     6.18    453.13    2.767   26.5%
HDE        Distance   102400   9.70    34429     5.62    3.3%
HDE        Angle      100      2.00    7.09      0.09    93.4%
HDE        Angle      1600     6.38    388.51    2.97    27.6%
HDE        Angle      102400   11.34   33979     7.23    5.6%

Results [4] – Accuracy
• How accurate is this approach?
  › Graph comparisons
  › Dijkstra’s Algorithm on full graph sizes
  › 1000 repetitions
  › Baseline (Control):

Graph          Mean Path Length   Mean Relaxations
LiveJournal    5.27               20,026.13
Grid           611.28             500,459.94
DBLP           9.55               172,107.69
Wikipedia      9.45               112,987.68

Results [5] – Accuracy
• How accurate is this approach?
  › High Dimension Embedding
  › Force-Based does not work
  › 1000 repetitions
  › Results:

Heuristic  Graph         SPL      Relax        Error   % Correct
Distance   LiveJournal   7.65     5866.41      2.39    12.3%
Distance   Grid          661.28   236732.77    0.0     100.0%
Distance   DBLP          11.11    56637.80     1.57    40.1%
Distance   Wikipedia     20.12    97747.48     10.67   14.3%
Angle      LiveJournal   13.60    4833.80      8.34    4.4%
Angle      Grid          661.37   253017.65    0.09    96.7%
Angle      DBLP          10.96    59749.61     1.42    49.1%
Angle      Wikipedia     20.64    132307.91    11.19   17.5%

Conclusions
• Final Comparison
  › High Dimension Embedding only
  › Force-Based does not work
  › 1000 repetitions
  › Results:
    › Error: The A*-search path error rate compared to the actual shortest path
    › Savings: In terms of relaxations performed by A*-search compared to Dijkstra’s Algorithm

Heuristic  Graph         Error      Savings
Distance   LiveJournal   45.16%     70.71%
Distance   Grid          0.00%      52.69%
Distance   DBLP          16.34%     67.09%
Distance   Wikipedia     112.91%    13.49%
Angle      LiveJournal   158.06%    75.86%
Angle      Grid          0.01%      49.44%
Angle      DBLP          14.76%     65.44%
Angle      Wikipedia     118.41%    -17.10%
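The Error and Savings columns above appear to follow from the earlier result tables. The sketch below is a reconstruction, not the authors' code: it assumes Error is the relative excess of the A*-search path length over the Dijkstra baseline, and Savings is the relative reduction in relaxations. The function names are hypothetical; the figures are copied from the Results [4] and Results [5] tables for three of the graphs (distance heuristic).

```python
# Baseline (Dijkstra) figures from Results [4]: (mean path length, mean relaxations).
baseline = {
    "LiveJournal": (5.27, 20026.13),
    "DBLP": (9.55, 172107.69),
    "Wikipedia": (9.45, 112987.68),
}

# A*-search with the distance heuristic on the HDE embedding, from Results [5].
astar_distance = {
    "LiveJournal": (7.65, 5866.41),
    "DBLP": (11.11, 56637.80),
    "Wikipedia": (20.12, 97747.48),
}


def error_pct(astar_len, shortest_len):
    """Assumed definition: extra path length relative to the true shortest path."""
    return 100.0 * (astar_len - shortest_len) / shortest_len


def savings_pct(astar_relax, dijkstra_relax):
    """Assumed definition: fraction of relaxations avoided relative to Dijkstra."""
    return 100.0 * (1.0 - astar_relax / dijkstra_relax)


def summarize():
    """Return {graph: (error %, savings %)} rounded to two decimals."""
    rows = {}
    for graph, (base_len, base_relax) in baseline.items():
        a_len, a_relax = astar_distance[graph]
        rows[graph] = (
            round(error_pct(a_len, base_len), 2),
            round(savings_pct(a_relax, base_relax), 2),
        )
    return rows


if __name__ == "__main__":
    for graph, (err, sav) in summarize().items():
        print(f"{graph}: error {err}%, savings {sav}%")
```

Under these assumed definitions, a negative Savings value (as in the Wikipedia row for the angle heuristic) means A*-search performed more relaxations than the baseline.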
Conclusions [2]
• Graph drawing techniques may be useful in approximating shortest paths in large graphs
  › Particularly in applications requiring many different paths from the same graph
• However, the speed/accuracy tradeoff might not be worth the effort
• Observations:
  › Other graph drawing techniques might work better
  › Useful only when performing several path lookups
    › e.g. Social network analysis
  › Embeddings can be done in more than 2 dimensions
  › Heuristics computation can be easily changed
  › Other heuristics can be considered
    › e.g. Manhattan Distance

Questions?
Special thanks:
Defense Intelligence Agency
Dr. Jiawei Han, Yizhou Sun, Yintao Yu
Henry Haselgrove
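The observation that the heuristic computation can be easily changed can be illustrated with a small sketch. This is not the authors' implementation. It assumes a 2-D embedding stored as a dictionary of coordinates, unweighted edges, a standard A*-search priority of path length so far plus heuristic, and one relaxation counted per examined edge. All names (`a_star`, `grid_graph`, `angle_deviation`, and so on) are hypothetical. The angle function only scores a candidate vertex; how the slides combine that score with the search is not specified, so it is shown on its own.

```python
import heapq
import math


def euclidean(p, q):
    """Straight-line (Pythagorean) distance between two embedded points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def manhattan(p, q):
    """Alternative heuristic mentioned in the conclusions."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def angle_deviation(current, candidate, goal):
    """1 - cos(theta) between current->candidate and current->goal.

    0.0 means the candidate lies exactly in the direction of the goal.
    """
    ax, ay = candidate[0] - current[0], candidate[1] - current[1]
    bx, by = goal[0] - current[0], goal[1] - current[1]
    norm = math.hypot(ax, ay) * math.hypot(bx, by)
    if norm == 0:
        return 0.0
    return 1.0 - (ax * bx + ay * by) / norm


def a_star(adj, pos, start, goal, h=euclidean):
    """Return (path length in edges, relaxations); length is None if unreachable."""
    dist = {start: 0}
    frontier = [(h(pos[start], pos[goal]), start)]
    closed = set()
    relaxations = 0
    while frontier:
        _, u = heapq.heappop(frontier)
        if u == goal:
            return dist[u], relaxations
        if u in closed:
            continue  # stale queue entry
        closed.add(u)
        for v in adj[u]:
            relaxations += 1
            if v in closed:
                continue
            nd = dist[u] + 1  # every edge is unweighted
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                heapq.heappush(frontier, (nd + h(pos[v], pos[goal]), v))
    return None, relaxations


def grid_graph(n):
    """Hypothetical helper: n x n 4-connected grid embedded at its own coordinates."""
    adj = {(x, y): [] for x in range(n) for y in range(n)}
    for (x, y) in adj:
        for nx, ny in ((x + 1, y), (x, y + 1)):
            if (nx, ny) in adj:
                adj[(x, y)].append((nx, ny))
                adj[(nx, ny)].append((x, y))
    pos = {v: v for v in adj}
    return adj, pos
```

Calling `a_star(adj, pos, start, goal, h=manhattan)` swaps the heuristic without touching the search loop. On a real embedding the heuristic is generally not guaranteed to underestimate the remaining hop count, so the returned path may be longer than the true shortest path, which is the speed/accuracy tradeoff the slides report.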