Transcript slides
Efficient Spatial Sampling of Large Geographical Tables
Anish Das Sarma, Hongrae Lee, Hector Gonzalez, Jayant Madhavan, Alon Halevy
Google Research
7/26/2016 Anish Das Sarma

Japan Tsunami, March 2011

England riot suspects vs. poverty

• Study visualization of geographic data on a map
• Example commercial tools:
  – Google Fusion Tables
  – Google/Yahoo/Bing Maps
• We address efficiency challenges in visualizing large datasets: The Thinning Problem

• Geographical data:
  – Set of records
  – Each record associated with:
    • geographic feature (sub-region of the world): points, lines, polygons
    • structured data: record-id, attributes
• Map visualization:
  – Input: region of the world (“tiles”) at a specific zoom level
  – Output: display features in the region

• Datasets can be very large (e.g., 60M US parcels)
  – Can show only a sample of records on the map
• Dataset of (geocoded) restaurants: 300K in US
• When the map is zoomed in to San Francisco (say zoom level z=5), cannot show all restaurants
• Our Goal: Determine (a priori) the set of records to be shown for a given tile and zoom level.

• Thinning problem:
  – Formal definitions
  – Challenges
• General Framework for solving Thinning
• Specific solutions

[Figure: zoom levels z=1, z=2, z=3; label R1]

• Visibility Bound: Each tile can show a maximum of K (~500) features
• Adjacency: Features spanning multiple adjacent tiles must be shown entirely
• Zooming: If a feature is shown at a zoom level, it should be preserved by zooming in further

[Figure: Violation of Zooming Constraint vs. Correct zoom in]

The Thinning Problem

• Scale: Thinning needs to be performed very efficiently
• Constraints: Several constraints need to be met by any thinning solution
• Objectives: There may be multiple solutions, and the best one needs to be picked based on certain desirable criteria

• General Framework based on LP
• Specific solutions:
  – (Strong & Weak) Maximality, DFS algorithm
  – Point Dataset: Efficient randomized algorithm

[Figure: features f1, f2, f3, f4, f5, f6]
- K = 2 (max 2 features per tile)
- 1 zoom level
• Possible solutions:
  – (f1, f2) - (f5, f6)
  – (f1 or f2) - (f3 or f4) - (f5 or f6)
  – (f3, f4)
  – Subsets of the above
Which one is the best solution?

[Figure: tiles T1 and T2 divided into partitions P1, P2, P3, with features f1 to f6 and counts x, y, z]
P1: features shown only in T1
P2: features shown in both T1 & T2
P3: features shown only in T2
x: # features to show from P1
y: # features to show from P2
z: # features to show from P3
0 <= x + y <= 2 (constraint from T1)
0 <= y + z <= 2 (constraint from T2)
0 <= x <= 2 (constraint from P1)
0 <= y <= 2 (constraint from P2)
0 <= z <= 2 (constraint from P3)
maximize: x + y + z (show as many features as possible)

1) Maximality: Show as many points as possible
  • Max. distinct records
  • Max. total records
2) Fairness: Every point needs to have a fair chance of being shown
  • Min. L2 norm of #records visible from each partition
3) Importance: We may want to favor features based on importance (e.g., star rating)
  • Max. weighted sum of records

[Figure: A solution considering only maximality vs. Maximality + fairness]

The Thinning Problem: Maximality

• Algorithm that samples features while traversing spatial tree
• Naïve:
  – Breadth-first traversal
  – Exponential space complexity

• Traverse spatial index tree in top-down depth-first fashion
• Until all nodes traversed in DFS:
  – Check number L of available features (K at root) based on how many are already visible
  – Sample L features at node N and set their zoom level
  – Assign each feature to child nodes

• Assign a random number r(fi) for each feature
• For each tile T:
  – Show top-500 visible features based on random number

• Cartographic Generalization
  – domain-specific transformation of features
  – our work is complementary: (a) help filter features, (b) decide which transformed features to render
• Spatial Databases, Top-K, sampling, …
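
The randomized step for point datasets (one random number r(fi) per feature, then the top-K features per tile) can be sketched in a few lines. This is only an illustration under assumptions not stated in the slides: points live in the unit square, zoom level z splits it into 2^z x 2^z nested tiles, and the names `tile_of` and `thin_points` are hypothetical. K is a parameter here; the slides use 500.

```python
import random
from collections import defaultdict

def tile_of(x, y, zoom):
    """Hypothetical tiling: zoom z splits the unit square into 2**z x 2**z tiles."""
    n = 2 ** zoom
    return (zoom, min(int(x * n), n - 1), min(int(y * n), n - 1))

def thin_points(points, max_zoom, k, seed=0):
    """points: dict feature_id -> (x, y). Returns dict tile -> visible feature ids."""
    rng = random.Random(seed)
    # r(f_i): drawn once per feature and reused at every zoom level.
    r = {fid: rng.random() for fid in points}
    visible = {}
    for zoom in range(max_zoom + 1):
        # Group the features by the tile that contains them at this zoom level.
        by_tile = defaultdict(list)
        for fid, (x, y) in points.items():
            by_tile[tile_of(x, y, zoom)].append(fid)
        # Each tile keeps the k features with the largest random numbers.
        for tile, fids in by_tile.items():
            visible[tile] = sorted(fids, key=lambda f: r[f], reverse=True)[:k]
    return visible
```

Because the same r(fi) is reused at every zoom level and each child tile holds a subset of its parent's features, a feature that ranks in the top K of a tile also ranks in the top K of the child tile containing it, so the zooming constraint holds in this sketch. The adjacency constraint is trivial for points, since each point falls in exactly one tile per zoom level.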
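
The two-tile example with K = 2 is small enough to check by hand or by enumeration. The sketch below does not use an LP solver; it simply brute-forces integer values of x, y, z under the tile constraints from the slides (restricting to integers is an assumption made here for illustration), and the helper names are hypothetical.

```python
from itertools import product
from math import sqrt

K = 2  # per-tile visibility bound from the example

def feasible(x, y, z, k=K):
    """Tile T1 shows x + y features and tile T2 shows y + z; both must stay within k."""
    return x + y <= k and y + z <= k

# Each partition holds 2 features in the example, so each count ranges over 0..2.
solutions = [s for s in product(range(3), repeat=3) if feasible(*s)]

# Maximality objective from the slides: maximize x + y + z.
best = max(solutions, key=sum)

def l2_norm(s):
    """L2 norm of the per-partition counts, the quantity the fairness objective minimizes."""
    return sqrt(sum(v * v for v in s))
```

Under these constraints `best` is x = 2, y = 0, z = 2, which corresponds to the (f1, f2) - (f5, f6) solution listed earlier. The `l2_norm` helper can be used to compare candidates under the fairness objective; for instance, (1, 1, 1) has a smaller norm than (2, 0, 2).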