Transcript Slide 1

Graph Drawing Heuristics for Path
Finding in Large Dimensionless Graphs
Tim Weninger, Rodney R. Howell and William H. Hsu
Department of Computing and Information Sciences
Kansas State University, Manhattan KS
2009 International Conference on Artificial Intelligence
Las Vegas, NV USA
ICAI Conference
July 13-16, 2009
Computing and Information Sciences
Kansas State University
Outline
• Introduction
› Motivation
› Definition
› Example
• Graph Drawing
› Spring Embedding
› High-Dimensional Embedding
• Methodology
› Heuristics
› Evaluation Metrics
• Results
› Comparison
• Conclusions and Future Work
Introduction – Motivation
• Problem:
› There exist really big graphs
• Goal:
› We want to find paths between arbitrary vertices
› Preferably optimal paths
Taken from The Opte Project (www.opte.org) on 6/30/2009
Introduction [2] – Motivation
• Problem:
› Dijkstra’s algorithm is the best we can do in most cases
› O(m+n), i.e. O(|E|+|V|), on unweighted graphs
› We want to do better
• Insight:
› Humans can search through graphs very quickly
› This is done by internally visualizing the graph
• Therefore:
› Computers should search through graphs like humans
Taken from Google Maps (www.google.com/maps) on 6/30/2009
Introduction – Definition
• Problem:
› Most graphs (that I care about) are “dimensionless”.
• Dimensionless Graphs:
› “Graphs whose vertices do not reference points in a Euclidean space”
• Example:
Distance between some major US cities
(miles above the diagonal, kilometers below)
                    ATL   CHI   DEN   HOU    KC    LA    MN   MIA    NY    SF   SEA
Atlanta, GA           -   715  1405   800   805  2185  1135   665   865  2495  2785
Chicago, IL        1150     -  1000  1085   525  2020   410  1380   795  2135  2070
Denver, CO         2260  1615     -  1120   600  1025   915  2065  1780  1270  1335
Houston, TX        1285  1750  1805     -   795  1550  1230  1190  1635  1930  2450
Kansas City, MO    1295   850   965  1280     -  1625   440  1470  1195  1865  1900
Los Angeles, CA    3515  3250  1650  2495  2610     -  1935  2740  2800   385  1140
Minneapolis, MN    1825   665  1470  1980   680  3110     -  1795  1200  2010  2015
Miami, FL          1070  2220  3320  1915  2365  4405  2885     -  1280  3115  3365
New York, NY       1390  1275  2865  2630  1925  4505  1935  2060     -  3055  2860
San Francisco, CA  4015  3435  2040  3105  3000   615  3240  5015  4915     -   810
Seattle, WA        4485  3330  2140  3940  3060  1835  2675  5415  4600  1305     -
Taken from HM USA Travel Guide (www.hm-usa.com) on 6/30/2009
Introduction – Definition [2]
• Solution:
› Give the graph dimensions
› Give each vertex a meaningful position
• Example:
› Consider latitude and longitude
City               Latitude  Longitude
Atlanta, GA        33° 45’   84° 23’
Chicago, IL        41° 50’   87° 37’
Denver, CO         39° 45’   105° 0’
Houston, TX        29° 45’   95° 21’
Kansas City, MO    39° 6’    94° 35’
Los Angeles, CA    34° 3’    118° 15’
Minneapolis, MN    44° 59’   93° 14’
Miami, FL          25° 46’   80° 12’
New York, NY       40° 47’   73° 58’
San Francisco, CA  37° 47’   122° 20’
Seattle, WA        47° 37’   122° 20’
Data from infoplease (www.infoplease.com) on 6/30/2009
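A minimal sketch of why positions help, assuming the latitude/longitude pairs on this slide are north latitude and west longitude. The helper names (`dm_to_degrees`, `great_circle_miles`) and the Earth radius constant are illustrative choices, not part of the talk. The straight-line estimate between two positioned vertices can serve as a cheap guess at the tabulated travel distance.

```python
import math

def dm_to_degrees(degrees, minutes):
    """Convert degrees and arc-minutes to decimal degrees."""
    return degrees + minutes / 60.0

def great_circle_miles(lat1, lon1, lat2, lon2, radius_miles=3959.0):
    """Haversine distance between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_miles * math.asin(math.sqrt(a))

# Positions from the slide (both cities use north latitude, west longitude,
# so only the differences matter).
atlanta = (dm_to_degrees(33, 45), dm_to_degrees(84, 23))
chicago = (dm_to_degrees(41, 50), dm_to_degrees(87, 37))

estimate = great_circle_miles(*atlanta, *chicago)
road_miles = 715  # Atlanta-Chicago entry from the distance table
print(f"straight-line estimate: {estimate:.0f} mi, table distance: {road_miles} mi")
```

For this pair the straight-line figure stays below the tabulated distance, which is the property a search heuristic wants from a position-based estimate.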
Introduction – Example
• Travelling Salesman Problem:
› (Approximation)
Taken from http://xkcd.com/399/ on 7/7/2009
Graph Drawing [2]
• Spring Embedding
Taken from http://www.youtube.com/watch?v=_Oidv5M-fuw on 7/7/2009
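The slide only shows a video, so the following is a generic force-directed sketch in the spirit of spring embedding, not necessarily the variant used in the talk. The constants (`iterations`, the starting `temperature`, the 0.98 cooling factor) are assumptions chosen for illustration. Vertices repel each other, edges pull their endpoints together, and the step size shrinks over time.

```python
import math
import random

def spring_embed(adj, iterations=200, seed=0):
    """Return {vertex: (x, y)} from a simple force-directed layout."""
    rng = random.Random(seed)
    nodes = list(adj)
    pos = {v: [rng.random(), rng.random()] for v in nodes}
    k = math.sqrt(1.0 / len(nodes))   # ideal edge length for a unit square
    temperature = 0.1                 # maximum step per iteration
    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsion between every pair of vertices: O(n^2) per iteration.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = k * k / dist
                disp[u][0] += dx / dist * force
                disp[u][1] += dy / dist * force
                disp[v][0] -= dx / dist * force
                disp[v][1] -= dy / dist * force
        # Attraction: each vertex is pulled toward each of its neighbors.
        for u in nodes:
            for v in adj[u]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = dist * dist / k
                disp[u][0] -= dx / dist * force
                disp[u][1] -= dy / dist * force
        # Move each vertex, capping the step by the current temperature.
        for v in nodes:
            length = math.hypot(disp[v][0], disp[v][1]) or 1e-9
            step = min(length, temperature)
            pos[v][0] += disp[v][0] / length * step
            pos[v][1] += disp[v][1] / length * step
        temperature *= 0.98
    return {v: tuple(p) for v, p in pos.items()}

# Usage: a 4-vertex path given as undirected adjacency lists.
path_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
layout = spring_embed(path_graph)
```

The all-pairs repulsion loop is quadratic in the number of vertices per iteration, which makes this style of layout expensive on graphs with hundreds of thousands of vertices.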
Graph Drawing
• High Dimension Embedding
Images taken from Harel & Koren, Journal of Graph Algorithms and
Applications vol. 8, no. 2, pp. 195–214 (2004)
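A rough sketch of the core idea in Harel & Koren's high-dimensional embedding, under simplifying assumptions: the graph is connected and unweighted, and the final projection step of their method (principal component analysis down to two or three dimensions) is omitted. Function and parameter names are illustrative. Each vertex is placed at the vector of its hop distances from a small set of pivot vertices.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop counts from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def high_dimensional_embedding(adj, num_pivots, first_pivot):
    """Map each vertex to its vector of hop distances from the pivots.

    Assumes a connected graph. Pivots are picked farthest-first: each new
    pivot is the vertex farthest from all pivots chosen so far.
    """
    coords = {v: [] for v in adj}
    nearest = {v: float("inf") for v in adj}  # distance to the closest pivot
    pivot = first_pivot
    for _ in range(num_pivots):
        dist = bfs_distances(adj, pivot)
        for v in adj:
            coords[v].append(dist[v])
            nearest[v] = min(nearest[v], dist[v])
        pivot = max(adj, key=lambda v: nearest[v])
    return {v: tuple(c) for v, c in coords.items()}

# Usage: a 5-vertex path, two pivots, starting from vertex 0.
path_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
embedding = high_dimensional_embedding(path_graph, num_pivots=2, first_pivot=0)
```

Each pivot costs one breadth-first search, so the embedding work grows with the number of pivots times the graph size rather than with the square of the vertex count.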
Methodology
• Steps:
› Find/synthesize large dimensionless graphs
› Compute Euclidean embedding
  › Force Based
  › High Dimensional
› Search
  › With embedding heuristics
    › A*-search
  › Without embedding
    › Dijkstra’s Algorithm
› Compare
  › Speed
  › Accuracy
Methodology [2] – Graph Corpus
Graph          Vertices   Edges
LiveJournal    770,595    2,992,607
Grid (1)       1,000,000  1,998,000
DBLP (2)       654,628    3,573,312
Wikipedia (3)  400,000    3,241,997
(1) Synthetic graph (1000x1000 checkerboard)
(2) Courtesy Henry Haselgrove
(3) Courtesy Jiawei Han, Yizhou Sun and Yintao Yu
Methodology [3] – Searching
• Dijkstra’s Algorithm
› All edges are unweighted
› Therefore we have Breadth First Search
› O(m+n)
• A*-Search
› Heuristics
  › Euclidean Distance (Pythagorean Equation)
    › Pick candidate vertex closest to goal
  › Angle Deviation (Cosine)
    › Draw imaginary line between current vertex and goal
    › Pick candidate vertex with smallest cosine deviation
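One possible reading of the two heuristics, sketched on an unweighted graph whose vertices already have embedding coordinates. The function names, the `(pos, current, candidate, goal)` signature, and the choice of `1 - cos(angle)` as the "cosine deviation" are assumptions made for illustration; the slides do not spell out the exact formulas. The search counts relaxations (edge examinations) so it can be compared with the baseline.

```python
import heapq
import math

def distance_heuristic(pos, current, candidate, goal):
    """Straight-line distance from the candidate to the goal."""
    return math.dist(pos[candidate], pos[goal])

def angle_heuristic(pos, current, candidate, goal):
    """1 - cos(angle) between current->goal and current->candidate."""
    to_goal = [g - c for g, c in zip(pos[goal], pos[current])]
    to_cand = [n - c for n, c in zip(pos[candidate], pos[current])]
    norm = math.hypot(*to_goal) * math.hypot(*to_cand)
    if norm == 0.0:
        return 0.0
    dot = sum(a * b for a, b in zip(to_goal, to_cand))
    return 1.0 - dot / norm

def a_star(adj, pos, start, goal, heuristic):
    """Return (path, relaxations) on an unweighted graph, or (None, relaxations)."""
    best = {start: 0}            # fewest hops found so far to each vertex
    parent = {start: None}
    frontier = [(0.0, start)]    # (hops so far + heuristic, vertex)
    relaxations = 0
    while frontier:
        _, u = heapq.heappop(frontier)
        if u == goal:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1], relaxations
        for v in adj[u]:
            relaxations += 1
            g = best[u] + 1
            if v not in best or g < best[v]:
                best[v] = g
                parent[v] = u
                heapq.heappush(frontier, (g + heuristic(pos, u, v, goal), v))
    return None, relaxations
```

Because embedding coordinates are not guaranteed to underestimate hop counts, neither heuristic is admissible in general, so the returned path can be longer than the true shortest path.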
Methodology [4] – Scalability Test
• First test – How does this approach scale?
› Use only LiveJournal graph
› Cut the LiveJournal graph into 100, 200, 400, 800 … 102,400-vertex graphs
› Embed each graph
  › Force Based
  › High Dimensional
› Search
  › BFS
  › A*-search
    › Distance
    › Angle
  › 1000 repetitions
  › Random start and end vertices
› Compare
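The test procedure above can be sketched as a small harness. This is an illustration only: how the LiveJournal graph was actually cut into subgraphs is not stated, so the sketch covers just the size schedule, the BFS baseline with a relaxation counter, and the repeated random start/end trials. All names are hypothetical.

```python
import random
from collections import deque

def bfs_search(adj, start, goal):
    """Return (hops, relaxations); hops is None if the goal is unreachable."""
    dist = {start: 0}
    queue = deque([start])
    relaxations = 0
    while queue:
        u = queue.popleft()
        if u == goal:
            return dist[u], relaxations
        for v in adj[u]:
            relaxations += 1          # one edge examined
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None, relaxations

def run_trials(adj, search, repetitions=1000, seed=0):
    """Run `search` on random (start, goal) pairs and collect its results."""
    rng = random.Random(seed)
    vertices = list(adj)
    results = []
    for _ in range(repetitions):
        start, goal = rng.sample(vertices, 2)   # two distinct vertices
        results.append(search(adj, start, goal))
    return results

# Subgraph sizes double from 100 up to 102,400 vertices.
sizes = [100 * 2 ** i for i in range(11)]

# Usage on a toy 10-vertex ring instead of a LiveJournal subgraph.
ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
trial_results = run_trials(ring, bfs_search, repetitions=50)
```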
Methodology [5] – Accuracy Test
• Second test – How accurate are the paths?
› Use full graphs
› Embed each graph
  › Force Based
    › Complexity Restrictions
  › High Dimensional
› Search
  › BFS
  › A*-search
    › Distance
    › Angle
  › 1000 repetitions
  › Random start and end vertices
› Compare
Methodology [6] – Metrics
• Speed
› Mean number of relaxations
› Not CPU time
• Accuracy
› Percent Correct
  › Number of times the A*-search returns the correct path
› Shortest Path Length
  › The actual shortest path length computed by Dijkstra’s Algorithm
  › Averaged over all repetitions
› Mean Path Length Error
  › Error rate
  › Mean Difference between Shortest Path Length and A*-search result
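As a sketch, the metrics above could be computed from per-repetition records like this. It assumes each record holds the shortest path length, the A*-search path length, and the A*-search relaxation count, and it treats a result as "correct" when the two lengths match. Both the record layout and that interpretation are assumptions.

```python
def summarize(trials):
    """trials: list of (shortest_len, astar_len, relaxations), one per repetition."""
    n = len(trials)
    correct = sum(1 for s, a, _ in trials if a == s)
    return {
        "mean_relaxations": sum(r for _, _, r in trials) / n,
        "percent_correct": 100.0 * correct / n,
        "shortest_path_length": sum(s for s, _, _ in trials) / n,
        "mean_path_length_error": sum(a - s for s, a, _ in trials) / n,
    }

# Usage with four made-up repetitions (illustrative numbers, not results from the talk).
example = summarize([(3, 3, 10), (4, 6, 20), (2, 2, 6), (5, 5, 12)])
```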
Results - Scalability
• How does this approach scale?
› Dijkstra’s Algorithm on different sizes of LiveJournal graph
› No embedding
› 1,000 repetitions
Size     Mean Path Length  Mean Relaxations
100      1.92              98.66
1,600    3.41              818.70
102,400  4.31              9,886.14
Results [2] – Scalability
• How does this approach scale?
› A*-search on different sizes of LiveJournal Graph
› Straight line distance heuristic only
› 1,000 repetitions
› Compared to Dijkstra’s Algorithm
Results [3] – Scalability
• How does this approach scale?
› Comparison of differently sized LiveJournal graphs
› Both Distance and Angle Heuristics
(SE: Spring Embedding; HDE: High-Dimensional Embedding)
Embedding  Heuristic  Size    SPL    Relax   Error  % Correct
SE         Distance   100     1.94   4.75    .02    98.6%
SE         Distance   1600    4.62   70.85   1.21   40.4%
SE         Distance   102400  -      -       -      -
SE         Angle      100     1.93   4.08    .01    99.1%
SE         Angle      1600    4.08   39.93   .67    58.1%
SE         Angle      102400  -      -       -      -
HDE        Distance   100     1.99   5.35    .07    97.1%
HDE        Distance   1600    6.18   453.13  2.767  26.5%
HDE        Distance   102400  9.70   34429   5.62   3.3%
HDE        Angle      100     2.00   7.09    .09    93.4%
HDE        Angle      1600    6.38   388.51  2.97   27.6%
HDE        Angle      102400  11.34  33979   7.23   5.6%
Results [4] – Accuracy
• How accurate is this approach?
› Graph comparisons
› Dijkstra’s Algorithm on full graph sizes
› 1000 repetitions
› Baseline (Control):
Graph        Mean Path Length  Mean Relaxations
LiveJournal  5.27              20,026.13
Grid         611.28            500,459.94
DBLP         9.55              172,107.69
Wikipedia    9.45              112,987.68
Results [5] – Accuracy
• How accurate is this approach?
› High Dimension Embedding
› Force-Based does not work
› 1000 repetitions
› Results:
Heuristic  Graph        SPL     Relax      Error  % Correct
Distance   LiveJournal  7.65    5866.41    2.39   12.3%
Distance   Grid         661.28  236732.77  0.0    100.0%
Distance   DBLP         11.11   56637.80   1.57   40.1%
Distance   Wikipedia    20.12   97747.48   10.67  14.3%
Angle      LiveJournal  13.60   4833.80    8.34   4.4%
Angle      Grid         661.37  253017.65  0.09   96.7%
Angle      DBLP         10.96   59749.61   1.42   49.1%
Angle      Wikipedia    20.64   132307.91  11.19  17.5%
Conclusions
• Final Comparison
› High Dimension Embedding only
› Force-Based does not work
› 1000 repetitions
› Results:
  › Error: The A*-search path error rate compared to the actual shortest path
  › Savings: In terms of relaxations performed by A*-search compared to Dijkstra’s Algorithm
Heuristic  Graph        Error    Savings
Dist       LiveJournal  45.16%   70.71%
Dist       Grid         0.00%    52.69%
Dist       DBLP         16.34%   67.09%
Dist       Wikipedia    112.91%  13.49%
Angle      LiveJournal  158.06%  75.86%
Angle      Grid         0.01%    49.44%
Angle      DBLP         14.76%   65.44%
Angle      Wikipedia    118.41%  -17.10%
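The Error and Savings columns appear to follow from the earlier baseline and Distance-heuristic results. The sketch below assumes the formulas error = (A* path length - shortest path length) / shortest path length and savings = 1 - (A* relaxations / Dijkstra relaxations). These are inferred from the numbers rather than stated on the slides, and they reproduce the reported Distance figures for three of the graphs to within rounding.

```python
# Values transcribed from the baseline and Distance-heuristic results in this deck.
baseline = {  # graph: (mean path length, mean relaxations) for Dijkstra's Algorithm
    "LiveJournal": (5.27, 20026.13),
    "DBLP": (9.55, 172107.69),
    "Wikipedia": (9.45, 112987.68),
}
astar_distance = {  # graph: (path length, relaxations) for A*-search
    "LiveJournal": (7.65, 5866.41),
    "DBLP": (11.11, 56637.80),
    "Wikipedia": (20.12, 97747.48),
}

def error_percent(astar_len, shortest_len):
    """Relative path length error of A*-search, in percent (assumed formula)."""
    return 100.0 * (astar_len - shortest_len) / shortest_len

def savings_percent(astar_relax, dijkstra_relax):
    """Relaxations saved by A*-search relative to Dijkstra, in percent (assumed formula)."""
    return 100.0 * (1.0 - astar_relax / dijkstra_relax)

for graph, (spl, relax) in baseline.items():
    a_len, a_relax = astar_distance[graph]
    print(f"{graph}: error {error_percent(a_len, spl):.2f}%, "
          f"savings {savings_percent(a_relax, relax):.2f}%")
```

A negative savings value, as in the Wikipedia row for the Angle heuristic, means A*-search performed more relaxations than the baseline.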
Conclusions [2]
• Graph drawing techniques may be useful in approximating shortest paths in large graphs
› Particularly in applications requiring many different paths from the same graph
• However, the speed/accuracy tradeoff might not be worth the effort
• Observations:
› Other graph drawing techniques might work better
› Useful only when performing several path lookups
  › e.g. Social network analysis
› Embeddings can be done in more than 2 dimensions
› Heuristics computation can be easily changed
› Other heuristics can be considered
  › e.g. Manhattan Distance
Questions?
Special thanks:
Defense Intelligence Agency
Dr. Jiawei Han, Yizhou Sun, Yintao Yu
Henry Haselgrove