Distance Indexing on Road Networks

VLDB 2006
2006-09-15
Haibo Hu (Hong Kong Baptist University, Hong Kong)
Dik Lun Lee (Hong Kong University of Science and Technology, Hong Kong)
Victor Lee (City University of Hong Kong, Hong Kong)
Modeling Road Networks
Network -> Undirected weighted graph
Road junction -> Vertex (node)
Road segment -> Edge
Distance -> Edge weight
Data object and query point -> On node only
[Figure: road network with a highway, gas stations (objects) and a query point, with edge weights in km, contrasting the object nearest in Euclidean space with the actual nearest object by network distance.]
Query Processing on Road Networks
Queries:
Window query
kNN, continuous kNN
Processing methods:
Network Expansion [Papadias VLDB03]
Use Euclidean distance for preliminary pruning
Index the objects with a spatial index
Precomputed Index [Kolahdouzan VLDB04]
Voronoi Network Nearest Neighbor (VN3)
NN list: precompute and store the kNNs for some large-degree nodes
Problems and Disadvantages
Distance computation is still tough
By Dijkstra's single-source shortest path algorithm:
Maintain nodes whose distances are not finalized
Pick the node with the shortest distance and finalize it
Relax all not-yet-finalized distances
Repeat until all distances are finalized
Limitations:
Must visit nodes in the ascending order of distances
Running time O(E lg V) with a binary heap (E edges, V nodes)
Precomputed indexes cannot suit all queries
Return k nearest neighbor
Return the actual shortest path
Precomputed indexes are costly to store and update
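The Dijkstra steps listed on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the adjacency-list layout, the function name `dijkstra`, and the toy network are all assumptions made for the example.

```python
import heapq

def dijkstra(adj, source):
    """Single-source shortest-path distances on an undirected weighted graph.

    adj maps each node to a list of (neighbor, edge_weight) pairs
    (an assumed layout, chosen for this sketch).
    """
    dist = {source: 0.0}
    finalized = set()
    heap = [(0.0, source)]  # candidates whose distances are not finalized
    while heap:
        d, u = heapq.heappop(heap)  # node with the shortest tentative distance
        if u in finalized:
            continue  # stale heap entry, already finalized with a shorter distance
        finalized.add(u)
        for v, w in adj[u]:
            # Relax the distance of every not-yet-finalized neighbor.
            if v not in finalized and d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

# Hypothetical toy network, not taken from the slides.
toy_adj = {
    "a": [("b", 2), ("c", 5)],
    "b": [("a", 2), ("c", 1), ("d", 4)],
    "c": [("a", 5), ("b", 1), ("d", 1)],
    "d": [("b", 4), ("c", 1)],
}
toy_dist = dijkstra(toy_adj, "a")
```

Nodes leave the heap in ascending order of distance, which is the limitation noted above: a far object cannot be resolved before every closer node has been visited.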
Our Solution at a Glance
Distance signature: the first general-purpose index on road networks that
Categorizes the distances of a node to all objects
Supports both rough and exact distance computation
Accelerates processing of common query types
Reduces the storage and maintenance cost
Is orthogonal to other query optimization techniques
Roadmap
Background
Distance Signature Overview
Operations on Signatures
Query Processing on Signatures
Smart Choice of Distance Categories
Construction and Maintenance
Experimental Results
Conclusion
Distance Signature
Basic Idea:
Precomputing distances is a good trade-off between having no
indexing and solution space indexing
Maintain the approximate distance between objects and nodes
How rough is the approximation?
Apply rough approximation to faraway objects
Queries are usually interested in local objects
Faraway objects outnumber local objects
We use an exponential sequence of categories
In the form of [0, T), [T, cT), [cT, c^2T), [c^2T, c^3T), ...
T and c are constant parameters
E.g., T = 3, c = 2, then [0, 3), [3,6), [6,12), [12,24), ...
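The exponential categories can be sketched in a few lines. The function names `category` and `category_range` are hypothetical, and the sketch assumes T > 0 and c > 1.

```python
def category(dist, T, c):
    """Index i of the range [0, T), [T, cT), [cT, c^2 T), ... that contains dist."""
    if dist < 0:
        raise ValueError("distance must be non-negative")
    i, upper = 0, T
    while dist >= upper:
        upper *= c  # move to the next, c times wider, boundary
        i += 1
    return i

def category_range(i, T, c):
    """Half-open range (lower, upper) covered by category i."""
    lower = 0 if i == 0 else T * c ** (i - 1)
    return lower, T * c ** i

# With the slide's parameters T = 3, c = 2:
cats = [category(d, 3, 2) for d in (0, 2.9, 3, 5, 6, 11, 12, 23)]
```

With T = 3 and c = 2 this reproduces the example ranges [0, 3), [3, 6), [6, 12), [12, 24).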
[Figure: distance axis split at 3, 6, 12 and 24 into Cat 0, Cat 1, Cat 2 and Cat 3.]
Distance Signature (Cont'd)
For each node n, signature component s(n)[i] denotes the category of dist(n, i), the network distance from n to object i
s(n)[i].link denotes the next node after n on the shortest path to i
Signature s(n) is the whole set of components s(n)[i]
[Figure: example road network with nodes n1 to n7 and weighted edges, showing the signatures s(n2) and s(n4) (one distance category per object) with their link entries, and the adjacency lists of n1 and n3.]
Distance Operations on Signatures
Principle: trace back the link until the distance range is accurate enough
Retrieval (distance between node and object)
Exact: trace back through the links from node to object
Approximate: terminate once the distance range does not partially overlap with the input range
Comparison (distances from node n to objects a and b)
Exact: trace back until the two distance ranges don't overlap
Sorting
Exact: first apply approximate sorting, then apply bubble sort using exact comparison
Approximate: quicksort using approximate comparison
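The two retrieval operations can be sketched as below. This is an interpretation under stated assumptions, not the authors' implementation: signatures are held in plain dictionaries `cat[node][obj]` and `link[node][obj]`, edge weights in `weight[(u, v)]`, each object is identified with the node it sits on, and the "input" of approximate retrieval is taken to be a query range [q_lo, q_hi). All names and the toy data are hypothetical.

```python
def category_range(i, T, c):
    """Half-open distance range (lower, upper) of category i."""
    lower = 0 if i == 0 else T * c ** (i - 1)
    return lower, T * c ** i

def exact_distance(node, obj, link, weight):
    """Follow the links from node to obj, summing the edge weights."""
    total = 0
    while node != obj:
        nxt = link[node][obj]
        total += weight[(node, nxt)]
        node = nxt
    return total

def approximate_distance(node, obj, q_lo, q_hi, cat, link, weight, T, c):
    """Refine the range of dist(node, obj) until it no longer partially
    overlaps [q_lo, q_hi), i.e. it lies fully inside or fully outside."""
    acc = 0  # length of the path already traced
    while True:
        if node == obj:
            return acc, acc  # reached the object: the distance is exact
        lo, hi = category_range(cat[node][obj], T, c)
        lo, hi = acc + lo, acc + hi
        intersects = lo < q_hi and q_lo < hi
        contained = q_lo <= lo and hi <= q_hi
        if not intersects or contained:
            return lo, hi
        nxt = link[node][obj]
        acc += weight[(node, nxt)]
        node = nxt

# Hypothetical path a - b - c with weights 4 and 5, one object at c, T = 3, c = 2.
weight = {("a", "b"): 4, ("b", "a"): 4, ("b", "c"): 5, ("c", "b"): 5}
link = {"a": {"c": "b"}, "b": {"c": "c"}, "c": {"c": None}}
cat = {"a": {"c": 2}, "b": {"c": 1}, "c": {"c": 0}}

d_exact = exact_distance("a", "c", link, weight)
r_inside = approximate_distance("a", "c", 0, 10, cat, link, weight, 3, 2)
r_outside = approximate_distance("a", "c", 20, 30, cat, link, weight, 3, 2)
```

In the toy data, the query range [0, 10) needs one link hop before the range [7, 10) fits inside it, while [20, 30) is answered from s(a) alone because [6, 12) is already disjoint from it.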
Approximate Distance Comparison
What and Why?
Compare the distances of two objects based on one signature
Avoid accessing the signatures of other nodes
Used to get a rough result of distance sorting
How?
Example: compare dist(n4,n2) with dist(n4,n6)
Select an observer n3
Embed objects n2, n3, n6 into Euclidean space
n3 tells if n2 or n6 is closer to n4
If n4 is on the perpendicular bisector, is it possible for n3 to find n4 within distance range s(n4)[n3]?
Let multiple observers vote
[Figure: n2, n3 and n6 embedded in Euclidean space with their pairwise distances; segment p1p2 marks the possible positions of n4.]
kNN Search on Signatures
Procedures
Read signature s(q) of query node q
Categories give the approximate distances between q and the objects
Get the k closest objects according to their category values
If neither the distances nor the order are needed, return the objects based on category ranges
To find the ordering:
Sort objects within each category
To find exact distances:
Perform exact distance retrieval for each of the k nearest neighbors
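A minimal sketch of the procedure above, assuming the signature of the query node is available as a dictionary from object to category. The function name `knn_by_category` and the optional `exact_dist` callback are hypothetical; the callback stands in for the signature-based sorting and exact retrieval operations.

```python
from collections import defaultdict

def knn_by_category(sig_q, k, exact_dist=None):
    """Pick k objects for query node q from its signature.

    sig_q maps object -> distance category. Categories are scanned in
    ascending order until k objects are collected. If exact_dist is given,
    objects inside each category are ordered by it; otherwise the order
    (and the choice inside the last category scanned) is arbitrary.
    """
    buckets = defaultdict(list)
    for obj, cat in sig_q.items():
        buckets[cat].append(obj)
    result = []
    for cat in sorted(buckets):
        bucket = buckets[cat]
        if exact_dist is not None:
            bucket = sorted(bucket, key=exact_dist)
        result.extend(bucket)
        if len(result) >= k:
            break
    return result[:k]

# Hypothetical signature and exact distances, consistent with T = 3, c = 2.
sig_q = {"o1": 2, "o2": 0, "o3": 1, "o4": 1}
exact = {"o1": 7, "o2": 1, "o3": 5, "o4": 3}
top2 = knn_by_category(sig_q, 2, exact.get)
top3_unordered = knn_by_category(sig_q, 3)
```

Only the objects in the categories actually scanned are ever sorted, so faraway objects cost nothing beyond reading their category.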
Smart Choice of Distance Categories
Exponential categories [0, T), [T, cT), [cT, c^2T), ...
How to determine c and T?
Factors:
Dataset density, distribution
Query type, load (metric: spreading)
Storage availability
Simplifications
The road network is a uniform grid
Spreading is uniformly distributed in [0, SP]
Unlimited disk storage
O(n^2)
Theorem
The optimal c = e, T = (SP/e)^0.5
Signature Construction
Basic procedures
Allocate storage for signatures
Build shortest path spanning tree for each object (Dijkstra)
Fill in s(n)[i] when the tree of object i is spanned to node n
Variable length encoding
Observation
The number of objects is not even across categories
# of objects 1 unit, 2 units, 3 units, ... away: 4, 8, 12, ...
Use fewer bits for the categories that hold more objects (the farther ones)
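The basic construction procedure can be sketched as one Dijkstra expansion per object that fills in s(n)[i] and its link when the tree reaches node n. The dictionary layout, the name `build_signatures`, and the toy network are assumptions of this sketch; it also assumes each object sits on a node and that node identifiers are mutually comparable.

```python
import heapq

def category(dist, T, c):
    """Index of the exponential range [0, T), [T, cT), ... containing dist."""
    i, upper = 0, T
    while dist >= upper:
        upper *= c
        i += 1
    return i

def build_signatures(adj, objects, T, c):
    """Return (cat, link) with cat[n][obj] the distance category and
    link[n][obj] the next node after n on the shortest path to obj."""
    cat = {n: {} for n in adj}
    link = {n: {} for n in adj}
    for obj in objects:
        dist = {obj: 0}
        done = set()
        heap = [(0, obj, None)]  # (distance, node, next node toward obj)
        while heap:
            d, n, toward = heapq.heappop(heap)
            if n in done:
                continue
            done.add(n)
            # The tree of obj has now spanned to n: fill in s(n)[obj].
            cat[n][obj] = category(d, T, c)
            link[n][obj] = toward
            for m, w in adj[n]:
                if m not in done and d + w < dist.get(m, float("inf")):
                    dist[m] = d + w
                    heapq.heappush(heap, (d + w, m, n))
    return cat, link

# Hypothetical path a - b - c with weights 4 and 5, objects on a and c.
adj = {"a": [("b", 4)], "b": [("a", 4), ("c", 5)], "c": [("b", 5)]}
cat, link = build_signatures(adj, ["a", "c"], T=3, c=2)
```

Because the graph is undirected, the parent of n in the tree rooted at the object is exactly the next hop from n toward that object, which is why the link falls out of the expansion for free.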
Variable Length Encoding
Reverse zero coding
Based on Huffman encoding scheme
Under the assumptions of exponential partition, grid topology, uniform distance range of queries, and c > 1.5, this coding scheme is optimal
Category        [0, T)  [T, cT)  [cT, c^2T)  [c^2T, c^3T)  [c^3T, ∞)
Reverse coding  0000    0001     001         01            1
Fixed coding    000     001      010         011           100
Average code length is approximately: c^2/(c^2 − 1) ≈ 1.2
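A sketch of a prefix code that matches the five reverse code words on this slide, assuming they are assigned so that the farthest category gets the shortest word (1, then 01, 001, and so on, with the two nearest categories sharing the longest length). That assignment, and the names `encode` and `decode`, are assumptions of the sketch; it requires at least two categories.

```python
def encode(cat, m):
    """Code word for category cat (0 = nearest) out of m >= 2 categories."""
    if not 0 <= cat < m:
        raise ValueError("category out of range")
    if cat == 0:
        return "0" * (m - 1)
    return "0" * (m - 1 - cat) + "1"

def decode(bits, m):
    """Decode a concatenation of code words back into a list of categories."""
    cats, zeros = [], 0
    for b in bits:
        if b == "1":
            cats.append(m - 1 - zeros)  # a 1 ends the word after `zeros` zeros
            zeros = 0
        else:
            zeros += 1
            if zeros == m - 1:  # m - 1 zeros in a row: the nearest category
                cats.append(0)
                zeros = 0
    return cats

codes = [encode(i, 5) for i in range(5)]
sample = [4, 0, 2, 3, 1, 4]
packed = "".join(encode(i, 5) for i in sample)
```

No code word is a prefix of another, so a signature can be stored as one bit string without separators and still be decoded in a single left-to-right pass.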
Signature Compression
Idea:
Many objects share the same link
Not compressed in memory
If s(n)[u] + s(u)[v] = s(n)[v], then s(n)[v] can be replaced by a 1-bit flag
[Figure: node n and objects u and v.]
Signature Update
Requirement
The shortest path spanning trees of all objects
A reverse index that maps each edge to the spanning trees containing it
This limits the number of trees affected when the edge changes
How (suppose edge (a,b) is updated) :
Find those affected spanning trees
For each affected tree of object c, check s(a)[c] or s(b)[c] (whichever is smaller)
Propagate to adjacent nodes until no more updates
Experiment Settings
Statistics
183K nodes
351K edges
Random edge weights from 1 to 10
Page size: 4K bytes
kNN Competitors
Signature indexing
Full indexing (NN list for all nodes)
Network Voronoi Diagram (NVD) from VN3
Tuning parameters
p: object density
T, c, k
Comparison metrics: page access (I/O cost), CPU time
Index Construction Cost
Good for medium and sparse datasets
kNN Search Performance
Moderate performance over various k
Robustness
The choice of parameters does not make a large difference
Conclusion
Our Contributions
The first index for distance computation on road networks
Speeds up general query processing
Optimal choice of distance categories and category encoding
Future work
Cross-node signature compression
The signatures of nearby nodes are similar
Derivation of optimal distance categories for a wider range of network topologies and object distributions