
Efficient Spatial Sampling of Large Geographical Tables
Anish Das Sarma, Hongrae Lee, Hector Gonzalez, Jayant Madhavan, Alon Halevy
Google Research
7/26/2016
Anish Das Sarma
Japan Tsunami, March 2011
England riot suspects vs. poverty
• We study visualization of geographic data on a map
• Example commercial tools:
– Google Fusion Tables
– Google/Yahoo/Bing Maps
• We address efficiency challenges in visualizing
large datasets: The Thinning Problem
• Geographical data:
– Set of records
– Each record associated with:
• geographic feature (sub-region of the world): points, lines,
polygons
• structured data: record-id, attributes
• Map visualization:
– Input: region of the world (“tiles”) at a specific zoom
level
– Output: display features in the region
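The slides do not say how the world is divided into tiles. Most web maps use the Web Mercator "slippy map" scheme, in which zoom level z splits the world into 2^z by 2^z square tiles. The minimal sketch below assumes that scheme (the function name `tile_for_point` is hypothetical) and maps a geocoded point to the tile that contains it at a given zoom level.

```python
import math


def tile_for_point(lat: float, lng: float, zoom: int) -> tuple[int, int]:
    """Return the (x, y) index of the tile containing (lat, lng) at `zoom`.

    Assumes the Web Mercator "slippy map" tiling: at zoom z the world is
    split into 2**z by 2**z square tiles, x growing eastward and y southward.
    """
    n = 2 ** zoom
    lat_rad = math.radians(lat)
    # Fraction of the map width/height at which the point lies.
    fx = (lng + 180.0) / 360.0
    fy = (1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0
    x, y = math.floor(fx * n), math.floor(fy * n)
    # Clamp points on the east/south edges (and extreme latitudes) into range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


# San Francisco at zoom level 5.
print(tile_for_point(37.77, -122.42, 5))
```

In this scheme the tile (x, y) at zoom z + 1 lies inside tile (x // 2, y // 2) at zoom z, so the tiles across zoom levels form a quadtree.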
Datasets can be very large (e.g., 60M US parcels)
• Dataset of 300K (geocoded) restaurants in the US
• When the map is zoomed in to San Francisco (say zoom level z=5), we cannot show all restaurants
– Can show only a sample of records on the map
• Our Goal: Determine (a priori) the set of records to be shown for a given tile and zoom level.
• Thinning problem:
– Formal definitions
– Challenges
• General Framework for solving Thinning
• Specific solutions
Figure: tiles at zoom levels z=1, z=2, z=3, with record R1
• Visibility Bound: Each tile can show a
maximum of K (~500) features
• Adjacency: Features spanning multiple
adjacent tiles must be shown entirely
• Zooming: If a feature is shown at a zoom level,
it should be preserved by zooming in further
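The three constraints can be checked mechanically for a candidate solution. The sketch below assumes a hypothetical representation: `spans[f][z]` is the set of tiles (x, y) that feature f overlaps at zoom z, and a solution is a set of (feature, (z, x, y)) pairs that are drawn. `check_thinning` is an illustrative name, not part of the original work.

```python
from collections import Counter


def check_thinning(spans, shown, K, max_zoom):
    """Return a list of constraint violations for a candidate solution.

    spans[f][z]: set of (x, y) tiles that feature f overlaps at zoom z.
    shown: set of (f, (z, x, y)) pairs, i.e. feature f drawn in tile (z, x, y).
    """
    violations = []

    # Visibility bound: at most K features drawn in any tile.
    per_tile = Counter(tile for _, tile in shown)
    for tile, count in per_tile.items():
        if count > K:
            violations.append(("visibility", tile))

    # Zoom levels at which each feature is drawn somewhere.
    levels = {}
    for f, (z, _, _) in shown:
        levels.setdefault(f, set()).add(z)

    for f, zs in levels.items():
        for z in sorted(zs):
            # Adjacency: if drawn at zoom z, it must be drawn in every tile it spans.
            if any((f, (z, x, y)) not in shown for (x, y) in spans[f][z]):
                violations.append(("adjacency", f, z))
            # Zooming: if drawn at zoom z, it must also be drawn at zoom z + 1.
            if z < max_zoom and z + 1 not in zs:
                violations.append(("zooming", f, z))
    return violations


# Feature "a" covers one tile at zoom 0 and two adjacent tiles at zoom 1.
spans = {"a": {0: {(0, 0)}, 1: {(0, 0), (1, 0)}}}
ok = {("a", (0, 0, 0)), ("a", (1, 0, 0)), ("a", (1, 1, 0))}
print(check_thinning(spans, ok, K=1, max_zoom=1))  # []
```

Checking only zoom z + 1 is enough for the zooming constraint, since applying it level by level covers every deeper zoom.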
Violation of Zooming Constraint
Correct zoom in
The Thinning Problem
• Scale: Thinning needs to be performed very
efficiently
• Constraints: Several constraints need to be
met by any thinning solution
• Objectives: There may be multiple solutions,
and the best one needs to be picked based on
certain desirable criteria
• General Framework based on linear programming (LP)
• Specific solutions:
– (Strong & Weak) Maximality, DFS algorithm
– Point Dataset: Efficient randomized algorithm
Example: features f1 through f6
- K = 2 (max 2 features per tile)
- 1 zoom level
• Possible solutions:
– (f1, f2) - (f5, f6)
– (f1 or f2) - (f3 or f4) - (f5 or f6)
– (f3, f4)
– Subsets of the above
Which one is the best solution?
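For an example this small, the candidate solutions can be enumerated by brute force. The sketch assumes a tile layout that is consistent with the solutions listed above but is not stated on the slides: f1 and f2 lie only in tile T1, f5 and f6 only in tile T2, and f3 and f4 span both tiles, so each of them uses a slot in both (adjacency). A selection is treated as maximal if no further feature can be added without exceeding K in some tile; the names `TILES`, `is_valid` and `maximal_selections` are hypothetical.

```python
from itertools import combinations

# Assumed layout: f3 and f4 straddle the boundary between T1 and T2.
TILES = {"T1": {"f1", "f2", "f3", "f4"}, "T2": {"f3", "f4", "f5", "f6"}}
FEATURES = sorted(set().union(*TILES.values()))
K = 2


def is_valid(selection):
    # A spanning feature counts against every tile it touches.
    return all(len(selection & members) <= K for members in TILES.values())


def valid_selections():
    return [
        set(c)
        for r in range(len(FEATURES) + 1)
        for c in combinations(FEATURES, r)
        if is_valid(set(c))
    ]


def maximal_selections():
    # Valid, and no remaining feature can be added without violating K.
    return [
        s
        for s in valid_selections()
        if not any(is_valid(s | {f}) for f in FEATURES if f not in s)
    ]


for s in maximal_selections():
    print(sorted(s))
```

Under these assumptions the maximal selections are exactly the families listed above: {f1, f2, f5, f6}, {f3, f4}, and the eight ways of picking one feature from each of the pairs (f1, f2), (f3, f4), (f5, f6).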
Example as a linear program (K = 2, tiles T1 and T2):
P1: features that lie only in T1
P2: features that lie in both T1 & T2 (each one shown counts against both tiles' limits)
P3: features that lie only in T2
x: # features to show from P1
y: # features to show from P2
z: # features to show from P3
maximize: x + y + z
(show as many features as possible)
subject to:
0 <= x + y <= 2 (constraint from T1)
0 <= y + z <= 2 (constraint from T2)
0 <= x <= 2 (constraint from P1)
0 <= y <= 2 (constraint from P2)
0 <= z <= 2 (constraint from P3)
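The LP above groups features into partitions by the set of tiles they fall in, with one variable per partition and one constraint per tile. The minimal sketch below builds those partitions from a feature-to-tile mapping and, because the instance is tiny, solves the integer version by exhaustive enumeration instead of calling an LP solver. That substitution, the per-partition upper bound min(K, |P|), the example layout, and the function names are assumptions for illustration; a real implementation would hand the same constraints to an LP/ILP solver.

```python
from itertools import product


def partitions(feature_tiles):
    """Group features by the exact set of tiles they fall in."""
    groups = {}
    for f, tiles in feature_tiles.items():
        groups.setdefault(frozenset(tiles), []).append(f)
    return groups


def solve_small_thinning(feature_tiles, K):
    """Maximize the number of shown features, one count n_P per partition P,
    subject to: for every tile T, the sum of n_P over partitions touching T
    is <= K, and 0 <= n_P <= min(K, |P|). Exhaustive, so tiny inputs only."""
    groups = partitions(feature_tiles)
    keys = list(groups)
    all_tiles = set().union(*keys)
    best, best_counts = -1, None
    ranges = [range(min(K, len(groups[k])) + 1) for k in keys]
    for counts in product(*ranges):
        feasible = all(
            sum(c for k, c in zip(keys, counts) if t in k) <= K for t in all_tiles
        )
        if feasible and sum(counts) > best:
            best, best_counts = sum(counts), dict(zip(keys, counts))
    return best, best_counts


# Running example under an assumed layout (f3, f4 span both tiles).
example = {
    "f1": {"T1"}, "f2": {"T1"},
    "f3": {"T1", "T2"}, "f4": {"T1", "T2"},
    "f5": {"T2"}, "f6": {"T2"},
}
best, counts = solve_small_thinning(example, K=2)
print(best, {tuple(sorted(k)): c for k, c in counts.items()})
```

For this layout the optimum is x = 2, y = 0, z = 2, i.e. showing (f1, f2) and (f5, f6), which is the only way to reach x + y + z = 4 given x + y <= 2 and y + z <= 2.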
Objectives for choosing among solutions to the LP above:
1) Maximality: Show as many points as possible
• Max. distinct records
• Max. total records
2) Fairness: Every point needs to have a fair chance of being shown
• Min. L2 norm of #records visible from each partition
3) Importance: We may want to favor features based on importance (e.g., star rating)
• Max. weighted sum of records
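The objectives can be compared on the running example's partition counts (x, y, z). The sketch below scores candidates separately on distinct records shown, on the L2 norm of the per-partition counts, and on a weighted sum; how the objectives are combined is not specified here, the star ratings are made-up values, and all function names are hypothetical.

```python
import math


def is_feasible(counts, K=2, cap=2):
    """Constraints of the two-tile example LP: x + y <= K, y + z <= K."""
    x, y, z = counts
    return min(counts) >= 0 and max(counts) <= cap and x + y <= K and y + z <= K


def maximality(counts):
    """Number of distinct records shown."""
    return sum(counts)


def fairness_l2(counts):
    """L2 norm of per-partition counts; for a fixed total, smaller is more even."""
    return math.sqrt(sum(c * c for c in counts))


def importance(weights_by_partition, counts):
    """Weighted sum, assuming each partition shows its highest-weighted records."""
    return sum(
        sum(sorted(ws, reverse=True)[:c])
        for ws, c in zip(weights_by_partition, counts)
    )


# Hypothetical star ratings for the two records in each of P1, P2, P3.
STARS = ([5, 3], [4, 2], [2, 4])

for cand in [(2, 0, 2), (2, 0, 1), (1, 1, 1), (0, 2, 0)]:
    assert is_feasible(cand)
    print(cand, maximality(cand), round(fairness_l2(cand), 2), importance(STARS, cand))
```

Maximality alone picks (2, 0, 2), which never shows anything from P2. Among the solutions that show three records, (1, 1, 1) has the smallest L2 norm (√3 versus √5 for (2, 0, 1)), which is the kind of balance the fairness objective rewards.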
A solution considering only maximality
Maximality + fairness
The Thinning Problem: Maximality
• An algorithm that samples features while traversing the spatial tree
• Naïve:
– Breadth-first traversal
– Exponential space complexity
• Traverse the spatial index tree top-down, in depth-first fashion
• Until all nodes have been traversed in DFS:
– Compute the number L of features that can still be shown at node N (K at the root), based on how many are already visible
– Sample L features at node N and set their zoom level
– Assign each feature to child nodes
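The DFS procedure above can be sketched for point features. Assumptions for illustration: points lie in the unit square [0, 1) x [0, 1); the spatial index is the regular quadtree in which a tile at zoom z has four children at zoom z + 1; sampling is uniform at random; and the result records, for each point, the zoom level at which it first becomes visible, so the zooming constraint holds by construction. `dfs_thin` is a hypothetical name.

```python
import random


def dfs_thin(points, K, max_zoom, seed=0):
    """Return {point index: first zoom level at which it is shown}.

    points: list of (x, y) in [0, 1) x [0, 1). Points never shown are absent.
    """
    rng = random.Random(seed)
    min_zoom = {}

    def tile_of(i, z):
        # (x, y) index of the zoom-z tile containing point i.
        n = 2 ** z
        return int(points[i][0] * n), int(points[i][1] * n)

    def visit(z, candidates, visible):
        # L = how many more features this tile may show (K at the root).
        L = K - len(visible)
        chosen = rng.sample(candidates, min(L, len(candidates)))
        for i in chosen:
            min_zoom[i] = z  # visible from zoom z onward
        if z == max_zoom:
            return
        chosen_set = set(chosen)
        # Assign remaining candidates and all visible features to child tiles.
        children = {}
        for i in candidates:
            if i not in chosen_set:
                children.setdefault(tile_of(i, z + 1), ([], []))[0].append(i)
        for i in visible + chosen:
            children.setdefault(tile_of(i, z + 1), ([], []))[1].append(i)
        # Depth-first: finish one child subtree before starting the next.
        for _, (cand, vis) in sorted(children.items()):
            if cand:  # a child with nothing left to sample needs no visit
                visit(z + 1, cand, vis)

    visit(0, list(range(len(points))), [])
    return min_zoom
```

Each tile ends up showing min(K, number of points in the tile) features, and the recursion keeps only the tiles along the current root-to-leaf path (plus their pending siblings) rather than a whole level of the tree.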
• Assign a random number r(fi) to each feature
• For each tile T:
– Show the top-500 features that fall within T, ranked by their random numbers
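A minimal sketch of the randomized algorithm for points, assuming points in the unit square [0, 1) x [0, 1) and a regular quadtree of tiles (4^z tiles at zoom z). Here "top" is taken to mean the K smallest random numbers; any fixed order works. Because a tile at zoom z + 1 contains a subset of its parent's points, a point in the top K of its tile stays in the top K after zooming in, so the zooming constraint holds automatically. `random_rank_thin` is a hypothetical name, and K is a parameter rather than fixed at 500.

```python
import random
from collections import defaultdict


def random_rank_thin(points, K, max_zoom, seed=0):
    """Return {point index: first zoom level at which it is shown}.

    points: list of (x, y) in [0, 1) x [0, 1).
    """
    rng = random.Random(seed)
    r = [rng.random() for _ in points]  # r(f_i): one random number per feature
    min_zoom = {}
    for z in range(max_zoom + 1):
        n = 2 ** z
        by_tile = defaultdict(list)
        for i, (x, y) in enumerate(points):
            by_tile[(int(x * n), int(y * n))].append(i)
        for members in by_tile.values():
            # Show the K features with the smallest random numbers in this tile.
            for i in sorted(members, key=r.__getitem__)[:K]:
                min_zoom.setdefault(i, z)  # keep the first (lowest) zoom level
    return min_zoom
```

Each tile's choice depends only on its own points and their random numbers, so tiles can be processed independently (for example, in parallel).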
• Cartographic Generalization
– domain-specific transformation of features
– our work is complementary: (a) help filter
features, (b) decide which transformed features to
render
• Spatial Databases, Top-K, sampling, …