A K-Main Routes Approach to Spatial Network Activity

A K-Main Routes Approach to
Spatial Network Activity Summarization
Authors:
Dev Oliver
Shashi Shekhar
James M. Kang
Renee Bousselaire
Abdussalam Bannur
Outline
 Motivation
 Problem Statement
 Contributions
 Validation
   Analytical
   Experimental
   Case Studies
 Summary and Future Work
Motivation: Crime Analysis (application domain)
 Crime hotspot
Street
Place
 Area of concentrated crime
Neighborhood
“Most clustering algorithms will show areas of concentration even when a line is the most appropriate dimension.” – National Institute of Justice**
Star Tribune, January 26, 2011
**J. E. Eck et al., Mapping Crime: Understanding Hot Spots. US National Inst. of Justice (http://www.ncjrs.gov/pdffiles1/nij/209393.pdf), 2005.
Examples of Linear Patterns
Linear patterns resulting from deforestation in Brazil
http://en.wikipedia.org/wiki/Deforestation_in_Brazil
Linear patterns of crime in a major US city
Motivation: Environmental Criminology (scientific domain)
 Spatial theories in Environmental Criminology
 Routine Activity Theory [1]
 Crime location related to criminal’s frequently visited areas
 Crime Pattern Theory [2]
 Based on spatial model
 Nodes (e.g., home, work, entertainment)
 Paths (e.g., routes between nodes)
 Edges
 Crime locations close to edges
 Near criminal’s activity boundaries where residents may not recognize him/her
Source: Rossmo, Kim (2000). Geographic Profiling. Boca Raton, FL: CRC Press.
http://www.popcenter.org/learning/60steps/index.cfm?stepNum=16
 Network based summarization adds value to Environmental Criminology
 Assist with large scale verification of real-world data matching theories
 Opportunities to develop hypotheses for new theory formulation
[1] L. E. Cohen et al., Social change and crime rate trends: A routine activity approach, American Sociological Review, 1979.
[2] P. L. Brantingham et al., Environmental Criminology, Waveland Press, 1990.
Other Domains
Disaster Relief
Accident Analysis and Prevention
Key Concepts
 Activity
 Object of interest located at node or edge
 Summary path
 A path chosen by KMR to summarize activities
 Activity coverage
 Total number of activities of a path or set of paths
 Active node
 A node having n ≥ 1 activities or joined by an edge having n ≥ 1 activities, e.g., A, B, C, D, E
 Inactive node
 A node having n = 0 activities and joined by edges all having n = 0 activities, e.g., F
 Active node ratio
 Total # active nodes/Total # nodes
 e.g., 5/6
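The definitions above can be sketched in a few lines of Python. This is only an illustration: the six-node network, its activity counts, and the helper names (`edge`, `active_nodes`, `active_node_ratio`, `activity_coverage`) are assumptions chosen so that the result matches the 5/6 example, not the network drawn on the slide.

```python
from typing import Dict, FrozenSet, List, Set

Edge = FrozenSet[str]


def edge(u: str, v: str) -> Edge:
    """Key for an undirected edge."""
    return frozenset((u, v))


# Hypothetical network: activity counts per node and per edge (illustrative values).
node_activities: Dict[str, int] = {n: 0 for n in "ABCDEF"}
edge_activities: Dict[Edge, int] = {
    edge("A", "B"): 2,
    edge("B", "C"): 1,
    edge("C", "D"): 0,
    edge("D", "E"): 3,
    edge("E", "F"): 0,
}


def active_nodes(nodes: Dict[str, int], edges: Dict[Edge, int]) -> Set[str]:
    """A node is active if it has an activity or touches an edge that has one."""
    active = {n for n, count in nodes.items() if count >= 1}
    for e, count in edges.items():
        if count >= 1:
            active |= e
    return active


def active_node_ratio(nodes: Dict[str, int], edges: Dict[Edge, int]) -> float:
    """Total number of active nodes divided by total number of nodes."""
    return len(active_nodes(nodes, edges)) / len(nodes)


def activity_coverage(path: List[str]) -> int:
    """Total number of activities on the nodes and edges of one path."""
    on_nodes = sum(node_activities[n] for n in path)
    on_edges = sum(edge_activities.get(edge(u, v), 0) for u, v in zip(path, path[1:]))
    return on_nodes + on_edges


active = active_nodes(node_activities, edge_activities)
ratio = active_node_ratio(node_activities, edge_activities)
print(sorted(active), ratio, activity_coverage(["A", "B", "C"]))
```

In this toy network F is the only inactive node, since neither F nor its single edge carries an activity.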
Motivation
Problem
Contributions
Validation
Summary
Each edge has a weight of 1
Problem Statement
 Given
 A spatial network G = (N, E)
 A set of activities, A, and their locations (e.g., a node or edge)
 A set of paths, P
 k (number of routes)
 Edge weights
 Find
 A cardinality-k subset P′ of P, i.e., a subset P′ ⊆ P with |P′| = k
 Objective
 Maximize the activity coverage (AC) by P′
 Constraints
 1 ≤ k ≤ |P|
Example: P = the set of shortest paths, k = 2, edge weights are 1
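To make the objective concrete, the following is a brute-force sketch that enumerates every cardinality-k subset of P and keeps the one with the highest activity coverage. It is not the KMR algorithm. The path names, the activity IDs, and the choice to count an activity only once when several chosen paths share it are all assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def best_k_subset(path_activities: Dict[str, Set[int]], k: int) -> Tuple[Tuple[str, ...], int]:
    """Exhaustively search all k-subsets of P for maximum activity coverage.

    path_activities maps a (hypothetical) path name to the IDs of the
    activities located on that path. Shared activities are counted once.
    """
    if not 1 <= k <= len(path_activities):
        raise ValueError("k must satisfy 1 <= k <= |P|")
    best_subset: Tuple[str, ...] = ()
    best_coverage = -1
    for subset in combinations(sorted(path_activities), k):
        covered = set().union(*(path_activities[p] for p in subset))
        if len(covered) > best_coverage:
            best_subset, best_coverage = subset, len(covered)
    return best_subset, best_coverage


# Illustrative input: four candidate paths and the activities on each.
P = {
    "A-B-C": {1, 2, 3},
    "C-D": {3, 4},
    "D-E-F": {5, 6, 7, 8},
    "A-F": {1, 8},
}
chosen, coverage = best_k_subset(P, k=2)
print(chosen, coverage)
```

The loop visits every k-subset, so this approach is only practical for very small P.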
Challenges
 Measures of interestingness
 Activity coverage, average distance, etc
 Computational Complexity
 Choose(N,2) paths, given N nodes
 Exponential number of k subsets of paths
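The growth described above can be checked with a short calculation. This sketch assumes one candidate path per unordered node pair, as in Choose(N,2); the function names are hypothetical.

```python
from math import comb


def num_candidate_paths(n_nodes: int) -> int:
    """One candidate path per unordered pair of nodes: Choose(N, 2)."""
    return comb(n_nodes, 2)


def num_k_subsets(n_nodes: int, k: int) -> int:
    """Number of ways to pick k summary paths from the candidate paths."""
    return comb(num_candidate_paths(n_nodes), k)


for n in (6, 100, 1000):
    print(n, num_candidate_paths(n), num_k_subsets(n, 2), num_k_subsets(n, 5))
```

Even for k = 2 the number of subsets grows roughly with the fourth power of N, which is why exhaustive search does not scale.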
Related Work
Network Summarization by Grouping/Clustering
 Zero or one routes
 Clumping (Okabe), e.g., NT-VCM (Shiode)
 Max. subgraph, e.g., path, tree (Buchin)
 Multiple routes
 Our work
Contributions
 K-Main Routes (KMR) algorithm
 Finds a set of k routes to group activities
 New design decisions added
 Network Voronoi Activity assignment
 Divide and Conquer Summary path recomputation
 Spatial network activity summarization is shown to be NP-complete.
 Analytically demonstrate correctness of design decisions and show cost analysis
 Experimental evaluation of the various algorithms
 Performance evaluated using synthetic and real world datasets
 Case study comparing KMR with geometry based summarization
K-Main Routes (KMR) Algorithm
 K-Main Routes Algorithm
 Select k paths as initial summary paths
 Repeat
1. Form k clusters by assigning each activity to its closest summary path
2. Recompute summary path of each cluster
 Until summary paths do not change
 Design Decisions
 Inactive node pruning
 Network Voronoi activity assignment
 Divide and conquer summary path recomputation
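The assign/recompute loop above can be sketched as a small driver function. The driver takes the two steps as callables, so it says nothing about how the authors implement them. To keep the example runnable, it is exercised on a toy line network in which a path is an interval `(first node, last node)`; that toy, the `max_iterations` guard, and every function name here are assumptions.

```python
from typing import Callable, List, Tuple

Path = Tuple[int, int]  # toy path on a line network: (first node, last node)


def k_main_routes(
    initial_paths: List[Path],
    assign: Callable[[List[Path]], List[List[int]]],
    recompute: Callable[[List[int], Path], Path],
    max_iterations: int = 100,
) -> List[Path]:
    """Repeat assignment and recomputation until the summary paths stop changing."""
    summary = list(initial_paths)
    for _ in range(max_iterations):
        clusters = assign(summary)  # step 1: activity -> closest summary path
        updated = [recompute(c, s) for c, s in zip(clusters, summary)]  # step 2
        if updated == summary:
            break
        summary = updated
    return summary


def distance(activity: int, path: Path) -> int:
    """Distance from an activity position to an interval path on the line."""
    lo, hi = path
    return max(lo - activity, 0, activity - hi)


def make_assign(activities: List[int]) -> Callable[[List[Path]], List[List[int]]]:
    def assign(summary: List[Path]) -> List[List[int]]:
        clusters: List[List[int]] = [[] for _ in summary]
        for a in activities:
            nearest = min(range(len(summary)), key=lambda i: distance(a, summary[i]))
            clusters[nearest].append(a)
        return clusters
    return assign


def recompute(cluster: List[int], current: Path) -> Path:
    """Toy rule: the path spanning all activities of the cluster covers the most."""
    return (min(cluster), max(cluster)) if cluster else current


activities = [0, 1, 2, 7, 8, 9]
result = k_main_routes([(0, 1), (2, 3)], make_assign(activities), recompute)
print(result)
```

On this input the loop converges after the second pass, when recomputation returns the same two paths.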
P = the set of Shortest Paths, K=2
Design Decision: Inactive Node Pruning
 Only consider paths between active nodes
 The optimal solution will still be in this set
 Example: given the set of shortest paths, 20 shortest paths are calculated and stored instead of 30
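One reading that reproduces the 20 versus 30 figures is to count ordered node pairs on a network with six nodes, five of them active (the Key Concepts example). That reading is an assumption; the slide does not state how the paths are counted.

```python
from itertools import permutations

# Node labels follow the Key Concepts example: F is the only inactive node.
nodes = ["A", "B", "C", "D", "E", "F"]
active = ["A", "B", "C", "D", "E"]

# Assumption: one stored shortest path per ordered (source, destination) pair.
all_pairs = list(permutations(nodes, 2))
pruned_pairs = list(permutations(active, 2))

print(len(all_pairs), len(pruned_pairs))
```

Under this reading, pruning one inactive node out of six removes a third of the stored paths.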
Design Decision: Network Voronoi (NV) Activity Assignment
 Goals
 Form k clusters by assigning each activity to its closest summary path
 Improve execution time of current assignment strategy
 Example (execution trace) Next
K-Main Routes Algorithm
Select k shortest paths as initial summary paths
Repeat
1. Network Voronoi Activity Assignment (form k clusters by assigning each activity to its closest summary path)
2. Recompute summary path of each cluster
Until summary paths do not change
Design Decision: Network Voronoi (NV) Activity Assignment
[Figure sequence: execution trace of NV activity assignment on an example network with nodes A to H, activities 1 to 10, a virtual node X, and summary paths AE and DH. Legend: activity, active node, inactive node, virtual node, summary path, closed node, edge weight = 1, edge weight = 0. A table tracks each activity's distance from AE and DH. Checks such as "1 < 0?" and "2 < 1?" mark candidate distance updates.]
Step 1: Open: X A E D H; Closed: X
Step 2: Open: A E D H B; Closed: X A
Step 3: Open: E D H B F; Closed: X A E
Step 4: Open: D H B F C; Closed: X A E D
Step 5: Open: H B F C G; Closed: X A E D H
Step 6: Open: B F C G; Closed: X A E D H B
Step 7: Open: F C G; Closed: X A E D H B F
Step 8: Open: C G; Closed: X A E D H B F C
Design Decision: Network Voronoi (NV) Activity Assignment
 Network Voronoi Activity Assignment algorithm
Input: Graph G = (N, E), a set of activities A, a set of k summary paths S
Output: A set of k clusters formed by assigning all ai ∈ A to one si ∈ S, where dist(ai, si) ≤ dist(ai, sj) and sj ∈ S and sj ≠ si
Open ← all nodes ∈ S, Closed ← Ø
Tnodes ← all nodes ∈ S
Tactivities ← activities on si ∈ S
repeat
    nc ← next node ∈ Open
    remove nc from Open
    Closed ← nc
    X ← neighbors of nc
    foreach xi ∈ X
        if xi ∉ Tnodes and xi ∉ Closed
            Tnodes ← xi
            xi.prev ← nc
            xi.dist ← dist(xi, nc) + nc.dist
            xi.sp ← nc.sp
        else if xi ∈ Tnodes
            update xi if new dist < xi.dist
        if xi ∉ Open
            Open ← xi
        Y ← activities on edge {nc, xi}
        foreach yi ∈ Y
            if yi ∉ Tactivities
                Tactivities ← yi
                yi.prev ← nc
                yi.dist ← xi.dist
                yi.sp ← xi.sp
            else
                update yi if new dist < yi.dist
until all active nodes ∈ Closed
return currentClusters
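The idea behind the pseudocode above, expanding outward from all summary paths at once so that each node and activity is claimed by its nearest summary path, can be sketched as a multi-source Dijkstra search. This is a simplified stand-in, not the authors' implementation. Seeding every summary-path node at distance 0 plays the role of the virtual node with zero-weight edges. The 2x4 grid, the activity placement, the rule that an edge activity follows its nearer endpoint, and all function names are assumptions.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, float]]


def build_graph(edges: List[Tuple[str, str]], weight: float = 1.0) -> Graph:
    """Undirected graph with a uniform edge weight."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, {})[v] = weight
        graph.setdefault(v, {})[u] = weight
    return graph


def nearest_summary_path(graph: Graph, summary_paths: List[List[str]]):
    """Label each node with its distance to, and the index of, the nearest summary path."""
    dist = {n: float("inf") for n in graph}
    owner = {n: -1 for n in graph}
    heap: List[Tuple[float, str]] = []
    for idx, path in enumerate(summary_paths):
        for n in path:
            if dist[n] > 0.0:  # first summary path to claim a shared node keeps it
                dist[n], owner[n] = 0.0, idx
                heapq.heappush(heap, (0.0, n))
    while heap:
        d, n = heapq.heappop(heap)
        if d > dist[n]:
            continue  # stale heap entry
        for nb, w in graph[n].items():
            if d + w < dist[nb]:
                dist[nb], owner[nb] = d + w, owner[n]
                heapq.heappush(heap, (d + w, nb))
    return dist, owner


def assign_activities(graph: Graph, summary_paths: List[List[str]],
                      activities: Dict[int, Tuple[str, str]]) -> List[List[int]]:
    """activities maps an activity ID to the edge (u, v) it lies on."""
    dist, owner = nearest_summary_path(graph, summary_paths)
    clusters: List[List[int]] = [[] for _ in summary_paths]
    for act, (u, v) in sorted(activities.items()):
        nearer = u if dist[u] <= dist[v] else v  # assumption: follow the nearer endpoint
        clusters[owner[nearer]].append(act)
    return clusters


# Hypothetical 2x4 grid: top row A-B-C-D, bottom row E-F-G-H, unit edge weights.
graph = build_graph([("A", "B"), ("B", "C"), ("C", "D"), ("E", "F"), ("F", "G"),
                     ("G", "H"), ("A", "E"), ("B", "F"), ("C", "G"), ("D", "H")])
activities = {1: ("A", "B"), 2: ("E", "F"), 3: ("B", "F"),
              4: ("C", "G"), 5: ("C", "D"), 6: ("G", "H")}
dist, owner = nearest_summary_path(graph, [["A", "E"], ["D", "H"]])
clusters = assign_activities(graph, [["A", "E"], ["D", "H"]], activities)
print(clusters)
```

A single expansion labels every node, so the cost does not depend on computing a separate distance for each activity and summary path pair.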
Design Decision: Divide and Conquer Summary Path Recomputation
 Goals
 Recompute the summary path of each cluster
 Improve execution time of current recomputation strategy
 Example (execution trace) Next
K-Main Routes Algorithm
Select k shortest paths as initial summary paths
Repeat
1. Network Voronoi Activity Assignment
2. Divide and Conquer Summary Path Recomputation (recompute summary path of each cluster)
Until summary paths do not change
Design Decision: Divide and Conquer Summary Path Recomputation
 Summary Path Recomputation Algorithm
Input: Graph G = (N, E), a set of clusters C
Output: A set of summary paths S, where si ∈ S has max coverage for ci ∈ C and si ∈ ci
nextClusters ← Ø
foreach ci ∈ C
    X ← active nodes of ci
    maxP ← Ø
    foreach xi ∈ X
        foreach xj ∈ X
            if (i ≠ j)
                cP ← getSP(xi, xj)
                if (maxP = Ø)
                    maxP ← cP
                if (maxP.activities < cP.activities)
                    maxP ← cP
    if (maxP ≠ ci.summaryPath)
        nextClusters ← maxP
    else
        nextClusters ← ci.summaryPath
return nextClusters
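The per-cluster recomputation can be sketched as follows: only the active nodes of one cluster are paired, and the shortest path between a pair that covers the most of that cluster's activities is kept. This is a simplified sketch under several assumptions: unit edge weights (so breadth-first search stands in for `getSP`), activities located on edges only, a search over the whole graph rather than the cluster's subgraph, and the current summary path kept unless a candidate is strictly better. The grid and all names are hypothetical.

```python
from collections import deque
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, Dict[str, float]]


def build_graph(edges: List[Tuple[str, str]]) -> Graph:
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, {})[v] = 1.0
        graph.setdefault(v, {})[u] = 1.0
    return graph


def shortest_path(graph: Graph, src: str, dst: str) -> Optional[List[str]]:
    """Breadth-first shortest path; valid because all edge weights are 1 here."""
    prev: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        n = queue.popleft()
        if n == dst:
            break
        for nb in graph[n]:
            if nb not in prev:
                prev[nb] = n
                queue.append(nb)
    if dst not in prev:
        return None
    path: List[str] = []
    node: Optional[str] = dst
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


def coverage(path: List[str], cluster: Dict[int, Tuple[str, str]]) -> int:
    """Number of the cluster's activities lying on edges of the path."""
    path_edges = {frozenset(e) for e in zip(path, path[1:])}
    return sum(1 for u, v in cluster.values() if frozenset((u, v)) in path_edges)


def recompute_summary_path(graph: Graph, cluster: Dict[int, Tuple[str, str]],
                           current: List[str]) -> List[str]:
    """Best-covering shortest path between active nodes of one cluster."""
    active = sorted({n for e in cluster.values() for n in e})
    best, best_cov = current, coverage(current, cluster)
    for u, v in combinations(active, 2):
        candidate = shortest_path(graph, u, v)
        if candidate is not None and coverage(candidate, cluster) > best_cov:
            best, best_cov = candidate, coverage(candidate, cluster)
    return best


# Hypothetical 2x4 grid and one cluster of three edge activities.
graph = build_graph([("A", "B"), ("B", "C"), ("C", "D"), ("E", "F"), ("F", "G"),
                     ("G", "H"), ("A", "E"), ("B", "F"), ("C", "G"), ("D", "H")])
cluster = {1: ("A", "B"), 2: ("E", "F"), 3: ("B", "F")}
new_path = recompute_summary_path(graph, cluster, ["A", "E"])
print(new_path, coverage(new_path, cluster))
```

Because each cluster only pairs its own active nodes, the number of candidate paths per cluster shrinks as the network is split across more clusters.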
[Figure: example network with nodes A to H and activities 1 to 10. Legend: activity, active node, inactive node, summary path, cluster. Edge weights are 1.]
Validation
 Analytical
 Cost analysis explaining computational savings
 Experimental
 Comparative analysis of KMR with various design decisions
 Performed on real and synthetic data
 Network Voronoi activity assignment and divide and conquer summary path recomputation save computational costs
 Savings increase with number of nodes, routes, activities and active node ratio
 Case studies
 Qualitatively show the usefulness of network based summarization on crime data
Analytical Evaluation: Computational Analysis
 KMR execution time = number of iterations × (activity assignment cost + summary path recomputation cost)
 $T_{KMR} = I \times ([K \times |A| \times cost(a_i, c_i)] + [K \times d_c \times |N|^2])$
 $T_{KMR\_I} = I \times ([K \times |A| \times cost(a_i, c_i)] + [K \times d_c \times (|N| \times r)^2])$
 $T_{KMR\_IAS} = I \times ([|E| + |N| \times \log |N|] + [K \times d_c \times (|N|/K \times r)^2])$
$I$ = number of iterations
$K$ = number of clusters
$A$ = set of activities
$cost(a_i, c_i)$ = cost of calculating the distance between activity $a_i$ and cluster $c_i$
$d_c$ = cost of looking up a path
$N$ = set of nodes
$E$ = set of edges
$r$ = active node ratio, $0 \le r \le 1$
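To get a feel for the three cost expressions, they can be evaluated side by side. The values |N| = 1000, |A| = 1200, r = 0.2 and K = 2 mirror the settings used in the experiments; the single iteration, the unit costs for cost(ai, ci) and dc, the edge count of 2000, and the base-2 logarithm are assumptions, so only the relative sizes are meaningful.

```python
import math


def t_kmr(i: int, k: int, n_act: int, n_nodes: int,
          assign_cost: float = 1.0, d_c: float = 1.0) -> float:
    """Basic KMR: per-activity assignment, recomputation over all node pairs."""
    return i * (k * n_act * assign_cost + k * d_c * n_nodes ** 2)


def t_kmr_i(i: int, k: int, n_act: int, n_nodes: int, r: float,
            assign_cost: float = 1.0, d_c: float = 1.0) -> float:
    """With inactive node pruning: only active nodes (|N| x r) are paired."""
    return i * (k * n_act * assign_cost + k * d_c * (n_nodes * r) ** 2)


def t_kmr_ias(i: int, k: int, n_nodes: int, n_edges: int, r: float,
              d_c: float = 1.0) -> float:
    """With all design decisions: one network expansion plus per-cluster pairing."""
    assignment = n_edges + n_nodes * math.log2(n_nodes)  # log base is an assumption
    recomputation = k * d_c * (n_nodes / k * r) ** 2
    return i * (assignment + recomputation)


base = t_kmr(i=1, k=2, n_act=1200, n_nodes=1000)
pruned = t_kmr_i(i=1, k=2, n_act=1200, n_nodes=1000, r=0.2)
full = t_kmr_ias(i=1, k=2, n_nodes=1000, n_edges=2000, r=0.2)
print(base, pruned, full)
```

With these inputs the quadratic recomputation term dominates the basic cost, which is the term that pruning and per-cluster recomputation reduce.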
Experimental Evaluation
[Figure: experiment design. Variables (#Nodes, #Routes, #Activities, Active Node Ratio) and the synthetic and real datasets feed a Java-based simulator that runs the candidates KMR_I, KMR_IV, KMR_ID and KMR_IVD; the resulting measures feed the analysis.]
• Goal: Comparative analysis
• Candidates: KMR with various design decisions
• KMR_I – KMR with inactive node pruning
• KMR_IV – KMR with inactive node pruning and Network Voronoi activity assignment
• KMR_ID – KMR with divide and conquer summary path recomputation
• KMR_IVD – KMR with all three design decisions
• Measure: CPU time (Unix time command)
• Platform: Mac Pro, 2 x Xeon Quad Core 2.26 GHz, 16 GB RAM
• Variables: #Nodes, #Routes, #Activities, Active Node Ratio
• Fixed Parameters: unit edge length
• Datasets: Synthetic and Real (Haiti Earthquake)
Data Description and Characteristics
 Synthetic Data
 2010 Census TIGER/Line® Shapefiles used for road network
 Activities randomly assigned to each edge
 Real-world data: Haiti Data Set
 Geospatial and Temporal Dataset describing recent events post-disaster
 Dataset collected from Jan 12, 2010 to March 23, 2010
 1,677 records
 Characteristics
 Attributes
• Incident Title (e.g., “Food, Water, Tents needed…”)
• Incident Date and Time
• Location (City, port name)
• Category (numeric category)
• Latitude/Longitude
 Sources
 Crisis Map of Haiti - http://haiti.ushahidi.com/
 OpenStreetMap - http://www.openstreetmap.org/
Effect of Number of Nodes
Synthetic Data Set
Number of Activities = 1200
Active Node Ratio = 0.2
K=2
Real Data Set
Number of Activities = 1206
Active Node Ratio = 0.1998
K=2
Trends:
 Network Voronoi activity assignment and divide and conquer summary path recomputation save computational costs
 Savings increase with number of nodes
Effect of Number of Routes, K
Synthetic Data Set
Number of Nodes = 1000
Number of Activities = 1200
Active Node Ratio = 0.2
Real Data Set
Number of Nodes = 1000
Number of Activities = 202
Active Node Ratio = 0.219
Trends:
 Network Voronoi activity assignment and divide and conquer summary path recomputation save computational costs
 Savings increase with number of routes
Effect of Number of Activities
Synthetic Data Set
Number of Nodes = 1000
Active Node Ratio = 0.2
K=2
Trends:
 Network Voronoi activity assignment and divide and conquer summary path recomputation save computational costs
 Savings increase with number of activities
Effect of Active Node Ratio
Synthetic Data Set
Number of Nodes = 1000
Number of Activities = 1200
K=2
Trends:
 Network Voronoi activity assignment and divide and conquer summary path recomputation save computational costs
 Savings increase with active node ratio
Case Study: Crime Analysis
Input (a set of crime incidents, k=5)
Crimestat K-Means (Euclidean distance)
KMR Output
Crimestat K-Means (Network distance)
Summary
 Spatial network activity summarization was shown to be NP-complete.
 K-Main Routes (KMR) algorithm and its design decisions described
 Inactive node pruning
 Network Voronoi Activity assignment
 Divide and Conquer Summary path recomputation
 Analytically demonstrated correctness of design decisions and showed cost analysis
 Experimental evaluation
 Performance evaluated using synthetic and real world datasets
 Case study comparing KMR with geometry based summarization
Acknowledgements
 Members of the Spatial Database and Spatial Data Mining Research Group, University of Minnesota, Twin Cities.
 This work was supported by grants from USARMY and USDOD.
 Thank you for your time! Any questions or comments?