National GIS Project


Spatial Databases
- Indexing
Spring, 2015
Ki-Joune Li
PNU
STEM
What is Indexing ?


Indexing : Fight against TIME
Example
  Suppose that you have a copy of Hamlet, and you want to know the name of Hamlet’s father.
  Without Index : Full (Sequential) Scan of the book
  With Index : Direct Access to the Page
Some Constraints

Modern Database
  Very Huge Volume : e.g. several petabytes
  Storage on Disk
    Inevitable
    But slow (cf. main memory) : milliseconds vs. nanoseconds
    Even in Main Memory Database System
What should we do ?
  Minimize the number of Disk Accesses
The Objective of Indexing
[Figure: Indexing : Query Condition -> Index -> Disk Address (Block Number) -> Database in Disk]
Classification of Indexing

According to the type of query and data
  Alphanumeric query
  Image
  Spatial
Example of a spatial predicate : What is the nearest post office to the Louvre Museum ?
[Figure: Spatial Query -> Spatial Index -> Disk Address (Block Number) -> Database in Disk]
Spatial Query

Sophisticated
Types of Spatial Query
  One Scan Query
    Region Query : Containment, Intersection
    K-Nearest Neighbor Query
  Multi-Scan Query : Join
    Spatial Join
    Distance Join
Spatial Query Processing
  Tightly coupled with Spatial Indexing Method
Spatial Processing Strategy

Filtering and Refinement Strategy
  Filtering : Spatial Query -> Index on Simplification of Geometry -> Candidates
  Refinement : Candidates -> Verification of Geometry on Complete Data -> Result
  1. Lighter Index : e.g. < 1 MB
  2. Remove Unnecessary Disk Accesses
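The two steps of the filtering and refinement strategy can be sketched as follows. This is only a minimal illustration: the exact geometries are assumed to be circles, their simplification is the bounding rectangle, and all names (`database`, `light_index`, `filter_and_refine`) are hypothetical. The index is a plain dictionary scan here, standing in for a real spatial index.

```python
import math
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]   # (xmin, ymin, xmax, ymax)
Circle = Tuple[float, float, float]        # (cx, cy, radius): assumed exact geometry

def mbr_of(c: Circle) -> Rect:
    """Simplification of geometry: bounding rectangle of a circle."""
    cx, cy, r = c
    return (cx - r, cy - r, cx + r, cy + r)

def rects_intersect(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def circle_hits_rect(c: Circle, w: Rect) -> bool:
    """Verification of geometry: exact circle/rectangle intersection test."""
    cx, cy, r = c
    nx = min(max(cx, w[0]), w[2])   # point of the window closest to the centre
    ny = min(max(cy, w[1]), w[3])
    return math.hypot(cx - nx, cy - ny) <= r

def filter_and_refine(db: Dict[str, Circle], index: Dict[str, Rect], window: Rect):
    # Filtering: only the light index (simplified geometry) is consulted.
    candidates = [oid for oid, box in index.items() if rects_intersect(box, window)]
    # Refinement: the complete data is read for the candidates only.
    result = [oid for oid in candidates if circle_hits_rect(db[oid], window)]
    return candidates, result

database = {"a": (5.0, 5.0, 2.0), "b": (12.0, 12.0, 2.0), "c": (30.0, 30.0, 1.0)}
light_index = {oid: mbr_of(c) for oid, c in database.items()}
candidates, result = filter_and_refine(database, light_index, (0.0, 0.0, 10.5, 10.5))
print(candidates, result)
```

Object "b" passes the filter because its bounding rectangle touches the window, but it is dropped by the refinement step; object "c" is never read from the complete data.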
Classification of Spatial Indexing Methods

Hashing and Indexing
  Index (in wide sense) : Hashing, Indexing (in narrow sense)
Space Decomposition vs. MBR
  Decomposition of a space : Whole Space
  Bounding Rectangle : Only Interesting Area
Dimensionality
  No Transformation
  To Higher Dimension
  To Lower Dimension : Linearization
Indexing vs. Hashing

Hashing
  1. b = h(r.key)
  2. Store(r, b)
  Block number is determined by hashing function or mechanism
  Only for primary index
  Search by a hashing function
Indexing (in narrow sense)
  1. b = Store(r)
  2. Insert(B, (r.key, b))
  Block number is independent of the indexing mechanism
  For primary or secondary index
  Search by a data structure called index
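The difference between the two insertion procedures can be sketched as below. The hash function `h`, the block capacity and the sorted list standing in for the index structure B are all assumptions made for illustration; bucket overflow is ignored.

```python
import bisect

NUM_BLOCKS = 4        # hypothetical number of hash buckets
BLOCK_CAPACITY = 2    # hypothetical number of records per block

def h(key):
    """Assumed hashing function: key modulo the number of blocks."""
    return key % NUM_BLOCKS

def hash_insert(blocks, record):
    b = h(record["key"])        # 1. b = h(r.key)
    blocks[b].append(record)    # 2. Store(r, b); overflow handling omitted
    return b

def hash_search(blocks, key):
    return [r for r in blocks[h(key)] if r["key"] == key]

def store(blocks, record):
    """Append to the last block with free room; placement ignores the key."""
    if not blocks or len(blocks[-1]) >= BLOCK_CAPACITY:
        blocks.append([])
    blocks[-1].append(record)
    return len(blocks) - 1

def index_insert(blocks, index, record):
    b = store(blocks, record)                   # 1. b = Store(r)
    bisect.insort(index, (record["key"], b))    # 2. Insert(B, (r.key, b))
    return b

def index_search(blocks, index, key):
    i = bisect.bisect_left(index, (key, -1))    # block numbers are never negative
    if i == len(index) or index[i][0] != key:
        return []
    return [r for r in blocks[index[i][1]] if r["key"] == key]

records = [{"key": k, "name": n} for k, n in [(17, "x"), (4, "y"), (9, "z")]]
hash_blocks = [[] for _ in range(NUM_BLOCKS)]
hash_placement = [hash_insert(hash_blocks, r) for r in records]
heap_blocks, index = [], []
index_placement = [index_insert(heap_blocks, index, r) for r in records]
print(hash_placement, index_placement)
```

With hashing the key decides the block; with indexing the block is chosen first and the index only records where each key went, which is why a second index on another attribute remains possible.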
Decomposition vs. Bounding Region
[Figure: the same objects organized by Decomposition and by Bounding Region]
Decomposition Methods

Grid File : An Extension of Hashing to 2-D
  Variations
    Fixed Grid
    Grid File
    Multi-Level Grid File
Hierarchical Data Structure
  KD-tree
  Quadtree
  skd-tree
  etc.
Fixed Grid

Simplest Method
  Minimum Data for Hashing
  1 Disk Page per grid cell
Query Processing with a Query Window
  1. Find intersecting grid cells
  2. Find corresponding blocks
  3. Read objects from the blocks
  4. Refinement
[Figure: fixed grid over x = 0 to 50 and y = 0 to 40, with a Query Window]
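The four query steps can be sketched for point objects as follows. The cell size of 10 is an assumption read off the axis ticks of the figure, and the dictionary keyed by cell coordinates stands in for the disk pages.

```python
from collections import defaultdict

CELL = 10.0   # assumed cell width and height

def cell_of(x, y):
    return (int(x // CELL), int(y // CELL))

def build(points):
    """One list per grid cell plays the role of one disk page."""
    grid = defaultdict(list)
    for p in points:
        grid[cell_of(*p)].append(p)
    return grid

def window_query(grid, xmin, ymin, xmax, ymax):
    cx0, cy0 = cell_of(xmin, ymin)
    cx1, cy1 = cell_of(xmax, ymax)
    result, blocks_read = [], 0
    for cx in range(cx0, cx1 + 1):          # 1. find intersecting grid cells
        for cy in range(cy0, cy1 + 1):
            block = grid.get((cx, cy))      # 2. find the corresponding block
            if block is None:
                continue                    # empty cell: no page to read
            blocks_read += 1                # 3. read objects from the block
            result += [p for p in block     # 4. refinement against the window
                       if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax]
    return result, blocks_read

grid = build([(3, 4), (12, 18), (14, 11), (27, 33), (45, 5)])
hits, blocks_read = window_query(grid, 8, 8, 16, 20)
print(hits, blocks_read)
```

The page of cell (0, 0) is read although none of its points lies in the window, which is the kind of unnecessary access the following slides discuss.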
Problems of Fixed Grid

Only for Point Objects
  Object with measure (non-point object) : duplicated storage
  Degrades performance
Large Dead Space
  Causes Unnecessary Disk Accesses
Not very Flexible
  On Distribution
[Figure: fixed grid over x = 0 to 50 and y = 0 to 40, with a Query Window]
Grid File

To overcome problems of Fixed Grid
  Reduce Dead Space within a cell
  Increase Blocking Factor
Directory
  Grid   Boundary           Block#
  A      (0,0),(15,20)      Page 0
  B      (15,0),(30,20)     Page 1
  ...    ...                ...
  I      (30,28),(50,40)    Page 15
[Figure: grid with non-uniform cell boundaries and a Query Window; x axis ticks 0, 15, 20, 30, 50; y axis ticks 0, 20, 28, 40]
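A directory lookup on non-uniform boundaries might look like the sketch below. The scales are inferred from the three directory rows shown on the slide, and the row-major lettering of the cells C to H is an assumption, since the slide elides them; only the page numbers actually listed are included.

```python
import bisect

X_SCALE = [15, 30]   # interior cell boundaries on x, inferred from the directory
Y_SCALE = [20, 28]   # interior cell boundaries on y, inferred from the directory
CELLS = "ABCDEFGHI"  # assumed row-major lettering, bottom row first
KNOWN_PAGES = {"A": 0, "B": 1, "I": 15}   # only the rows shown on the slide

def cell_name(x, y):
    col = bisect.bisect_right(X_SCALE, x)
    row = bisect.bisect_right(Y_SCALE, y)
    return CELLS[row * (len(X_SCALE) + 1) + col]

def cells_for_window(xmin, ymin, xmax, ymax):
    c0, c1 = bisect.bisect_right(X_SCALE, xmin), bisect.bisect_right(X_SCALE, xmax)
    r0, r1 = bisect.bisect_right(Y_SCALE, ymin), bisect.bisect_right(Y_SCALE, ymax)
    return [CELLS[r * (len(X_SCALE) + 1) + c]
            for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]

print(cell_name(5, 5), cell_name(20, 10), cell_name(40, 35))
print(cells_for_window(10, 10, 20, 25))
```

Because the boundaries are stored as sorted scales, locating a cell is a binary search per axis rather than a division by a fixed cell size.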
Blocking Factor

A Key Factor in performance
Number of Objects in a Disk Block
  $B_f = \frac{N_{TotalObjects}}{N_{blocks}}$
Number of Disk Accesses
  $DA = \frac{N_{Selected}}{B_f}$
How to increase Bf ?
  Increase Block Size : not always possible
  Packing
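A small numeric illustration of the two formulas, with made-up figures. Rounding the number of accesses up to whole blocks is an addition not on the slide, and the estimate assumes the selected objects are stored together.

```python
import math

def blocking_factor(n_total_objects, n_blocks):
    """Bf = N_TotalObjects / N_blocks"""
    return n_total_objects / n_blocks

def disk_accesses(n_selected, bf):
    """DA = N_Selected / Bf, rounded up to whole blocks (assumption)."""
    return math.ceil(n_selected / bf)

# Hypothetical database: 10,000 objects, a query selecting 200 of them.
loose = blocking_factor(10_000, 500)    # sparsely filled blocks
packed = blocking_factor(10_000, 250)   # same data packed into half the blocks
print(loose, disk_accesses(200, loose))
print(packed, disk_accesses(200, packed))
```

Doubling the blocking factor by packing halves the estimated number of disk accesses for the same query.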
Problems of Grid File
Only for Point Objects
Still Large Dead Space
Large Size of Directory
  One directory entry (Grid, Boundary, Block#) per cell, e.g. A : (0,0),(15,20) : Page 0
Hierarchical Decomposition

To overcome the size of directory in Grid File
  Hierarchical Structure of Directory
  Acceleration of Search
KD-tree : Index


Extension of Binary Tree to K-Dimension (K=2 for us)
Example : suppose Bf = 3
[Figure: space split by the lines x=20, y=10, y=20 and x=30 into regions A to E, and the corresponding directory tree with branches labelled < and =<]
Each leaf node points to the disk page
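A minimal sketch of a KD-tree directory with Bf = 3 follows. Splitting at the median and alternating x and y by depth are assumptions; the slide does not say how its split lines were chosen. Nodes are plain tuples and each leaf list stands for one disk page.

```python
def build(points, bf=3, depth=0):
    """Return ("leaf", points) or ("node", axis, split, left, right)."""
    if len(points) <= bf:
        return ("leaf", list(points))
    axis = depth % 2                                  # 0: split on x, 1: split on y
    ordered = sorted(points, key=lambda p: p[axis])
    split = ordered[len(ordered) // 2][axis]          # median as split value
    left = [p for p in ordered if p[axis] < split]
    right = [p for p in ordered if p[axis] >= split]
    if not left:                                      # all values tie on this axis
        return ("leaf", ordered)                      # keep as an overflowing page
    return ("node", axis, split,
            build(left, bf, depth + 1), build(right, bf, depth + 1))

def find_leaf(tree, p):
    """Search: follow '<' to the left child, '>=' to the right child."""
    while tree[0] == "node":
        _, axis, split, left, right = tree
        tree = left if p[axis] < split else right
    return tree[1]

def leaves(tree):
    if tree[0] == "leaf":
        return [tree[1]]
    return leaves(tree[3]) + leaves(tree[4])

points = [(5, 5), (12, 18), (18, 3), (22, 8), (25, 25), (33, 12), (38, 30), (45, 20)]
tree = build(points)
print(tree[1], tree[2], [len(page) for page in leaves(tree)])
print(find_leaf(tree, (33, 12)))
```

A search touches one directory node per level and then a single leaf page.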
KD-tree : Search
[Figure: search path through the same KD-tree (split lines x=20, y=10, y=20, x=30; leaf pages A to E)]
Weak Points of KD-tree



Only for Point Objects
Dead Space
How to Store Tree Structure on Disk Space
  Blocking Problem
  Widely used for main memory index
  Rarely used for disk resident index
Unbalanced Tree
  Zipf’s Law (or 80/20 law)
    Most events are concentrated
    Leads to a highly skewed tree
Quadtree

Extension of KD-tree
  KD-tree : binary split
  Quadtree : 4-way equi-split instead
Example : Bf = 3
[Figure: space recursively split into four equal quadrants holding objects A to J, and the corresponding quadtree]
Each leaf node points to the disk page
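The 4-way equi-split with Bf = 3 can be sketched as below for point objects. The class name, the bounding box and the depth guard are assumptions added for the illustration.

```python
class Quadtree:
    MAX_DEPTH = 16   # guard against endless splitting on duplicate points

    def __init__(self, x0, y0, x1, y1, bf=3, depth=0):
        self.box, self.bf, self.depth = (x0, y0, x1, y1), bf, depth
        self.points, self.children = [], None

    def _child_for(self, p):
        x0, y0, x1, y1 = self.box
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        return self.children[(1 if p[0] >= mx else 0) + (2 if p[1] >= my else 0)]

    def insert(self, p):
        if self.children is not None:
            self._child_for(p).insert(p)
            return
        self.points.append(p)
        if len(self.points) > self.bf and self.depth < self.MAX_DEPTH:
            self._split()

    def _split(self):
        """Equi-split: always cut at the centre of the cell."""
        x0, y0, x1, y1 = self.box
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        d, bf = self.depth + 1, self.bf
        self.children = [Quadtree(x0, y0, mx, my, bf, d), Quadtree(mx, y0, x1, my, bf, d),
                         Quadtree(x0, my, mx, y1, bf, d), Quadtree(mx, my, x1, y1, bf, d)]
        pending, self.points = self.points, []
        for p in pending:
            self._child_for(p).insert(p)

    def leaves(self):
        if self.children is None:
            return [self]
        return [leaf for c in self.children for leaf in c.leaves()]

qt = Quadtree(0, 0, 40, 40)
for pt in [(1, 1), (2, 2), (3, 3), (4, 4), (30, 30),
           (10, 35), (35, 10), (5, 1), (1, 5), (6, 6)]:
    qt.insert(pt)
leaf_depths = sorted({leaf.depth for leaf in qt.leaves()})
print(leaf_depths, max(len(leaf.points) for leaf in qt.leaves()))
```

The points clustered near the origin force four levels of splitting while the rest of the space stays at depth 1, so the tree is skewed and most leaves are empty.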
Weak Points of Quadtree

Same Problems as KD-tree, in addition to the lack of flexibility
  Only for Point Objects
  Dead Space
How to Store Tree Structure on Disk Space
  Blocking Problem
  Widely used for main memory index
  Rarely used for disk resident index
Unbalanced Tree
  Zipf’s Law (or 80/20 law)
    Most events are concentrated
    Leads to a highly skewed tree
Point Quadtree

A Simple Variation of Quadtree
  Specification of Partition Point instead of equi-split
  More Adaptive to the distribution of objects
  Less Skewed
[Figure: space partitioned at the points (10,20), (5,25) and (35,10) holding objects A to J, and the corresponding point quadtree]
Linear Quadtree : Space-Filling Curve


Quadtree, but in another representation
Linearization by Space-Filling Curve
[Figure: three space-filling curves : N-order, Hilbert, Column-wise]
Linearize points (or cells) by their Peano key
Linear Quadtree

Example : N-order curve
Computation of Peano Key : Bit-Interleaving
  1. Binary representation of coordinates : (x, y) = (10, 01)
  2. Bit-Interleaving : x = 1 0, y = 0 1 -> Peano key = 1 0 0 1 = 9
[Figure: 4 x 4 grid with both axes labelled 00, 01, 10, 11]
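The bit-interleaving step can be written as a short function. The function name and the `bits` parameter are hypothetical; the x bit is placed before the y bit at each level, as in the slide's example.

```python
def peano_key(x, y, bits):
    """Interleave the bits of x and y, most significant first (x bit, then y bit)."""
    key = 0
    for i in reversed(range(bits)):
        key = (key << 1) | ((x >> i) & 1)
        key = (key << 1) | ((y >> i) & 1)
    return key

# The slide's example: x = 10 (binary), y = 01 (binary).
example_key = peano_key(0b10, 0b01, bits=2)
print(format(example_key, "04b"), example_key)

# Visiting the cells of a 2 x 2 grid in key order traces the N shape.
n_order = sorted(((x, y) for x in range(2) for y in range(2)),
                 key=lambda c: peano_key(c[0], c[1], bits=1))
print(n_order)
```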
MBR Methods

MBR (Minimum Bounding Rectangle)
  Two dimensional geometric simplification of objects
  Not the whole space, only the region occupied by objects
  Defined by two corners : (X1min, X2min) and (X1max, X2max)
R-tree and its variants
R-tree
[Figure: 2-D Objects A to K grouped into nested MBRs, and the corresponding R-tree]
Leaf node points to the disk page
Construction of R-tree : Sequence of Insertion
  Upward Split
Splitting in R-tree

Split MBR in the case of overflow
  Line sweeping : Compare Cost-X and Cost-Y
  Cost Measure : Area, Perimeter, Overlapping Area
[Figure: a Splitting Line dividing an overflowed node into New MBRs]
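One way to read the line-sweeping split is sketched below: the entries are sorted along each axis, every split position is tried, and the cheaper of the best x split and the best y split wins. Using the sum of the two MBR areas as the cost and sorting by the lower coordinate are assumptions; perimeter or overlapping area could be substituted in `cost`.

```python
def mbr(rects):
    """MBR of a list of rectangles (xmin, ymin, xmax, ymax)."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def cost(left, right):
    return area(mbr(left)) + area(mbr(right))   # assumed cost measure: total area

def sweep(rects, axis, min_fill=1):
    """Best split along one axis: (cost, left group, right group)."""
    order = sorted(rects, key=lambda r: r[axis])
    splits = [(order[:k], order[k:]) for k in range(min_fill, len(order) - min_fill + 1)]
    left, right = min(splits, key=lambda s: cost(*s))
    return cost(left, right), left, right

def split_node(rects):
    cost_x, cost_y = sweep(rects, 0), sweep(rects, 1)
    axis = 0 if cost_x[0] <= cost_y[0] else 1
    return (axis,) + (cost_x if axis == 0 else cost_y)

overflowed = [(0, 0, 1, 1), (1, 3, 2, 4), (10, 0, 11, 1), (11, 3, 12, 4)]
axis, best_cost, left, right = split_node(overflowed)
print(axis, best_cost, mbr(left), mbr(right))
```

For this node the best vertical splitting line costs 16 against 22 for the best horizontal one, so the node is split on x.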
R-tree : Query Processing
[Figure: R-tree over objects A to K with Query Region W]
Candidate : entry found by descending the R-tree with Query Region W
Refinement : Read its exact geometry from database
Sample : http://www.dbnet.ece.ntua.gr/~mario/rtree/
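Region query processing on an R-tree can be sketched as a recursive descent that only enters nodes whose MBR intersects W. The tree below is built by hand as nested dictionaries for illustration (no insertion or split logic), and the object names and coordinates are invented.

```python
def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def search(node, window, visited):
    """Return candidate object ids; 'visited' records every node that is read."""
    visited.append(node["name"])
    if "entries" in node:                      # leaf: test the object MBRs
        return [oid for oid, box in node["entries"] if intersects(box, window)]
    hits = []
    for child in node["children"]:             # internal node: test child MBRs
        if intersects(child["mbr"], window):
            hits += search(child, window, visited)
    return hits

leaf1 = {"name": "leaf1", "mbr": (0, 0, 10, 10),
         "entries": [("a", (1, 1, 3, 3)), ("b", (6, 6, 9, 9))]}
leaf2 = {"name": "leaf2", "mbr": (9, 8, 20, 20),
         "entries": [("c", (9, 12, 12, 18)), ("d", (15, 8, 20, 20))]}
leaf3 = {"name": "leaf3", "mbr": (30, 0, 40, 10),
         "entries": [("e", (30, 0, 35, 5)), ("f", (36, 6, 40, 10))]}
root = {"name": "root", "mbr": (0, 0, 40, 20), "children": [leaf1, leaf2, leaf3]}

visited = []
found = search(root, (7, 7, 9.5, 9.5), visited)
print(found, visited)
```

The returned ids are only candidates; the exact geometry still has to be read for refinement. Note that leaf2 is visited although it contributes nothing, because its MBR overlaps the query region.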
Strength of R-tree





For point and non-point Objects
Good for non-uniform distribution
Paged Tree
Hierarchical Structure but Balanced
Less Dead Space than Decomposition Methods
Weak Points of R-tree : Overlapping Area

Overlapping : False Matching
[Figure: R-tree over objects A to M whose node MBRs overlap, with a Query Region]
False Matching : Visit unnecessary nodes
Performance Degradation
Weak Points of R-tree : Dead Space
[Figure: Query Region falling into the dead space of the MBR of node K, among objects A to M]
At least one visit to this node (K) is needed even though there is nothing in the Query Region
Weak Points of R-tree : Bad Split

50:50 Split
[Figure: Good Split vs. Bad Split]
1. Make them as COMPACT as possible
2. Preserve spatial proximity as much as possible
Improvement of R-tree

Minimize
  Overlapping area
  Dead Space
  Or Make it more COMPACT
  Preserve Spatial Proximity
Two approaches
  Packing (or Bulk Loading)
  Good Split or Insertion Strategies
R*-tree : An Improvement of R-tree

Re-Insertion Strategy on Overflow
  When a Newly Inserted Object causes an Overflow, delete an entry of the node and Re-Insert it
  After the Re-Inserted Object is placed, the MBRs are More Compact
R*-tree : An Improvement of R-tree

R*-tree
  Compact
    Small Overlapping Area
    Small Sum of MBR area or perimeters
    Small Dead Space
  Stable : Not very affected by the order of insertions
  The most widely used spatial indexing method
Packing R-tree : Improvement of R-tree

Preprocessing for making R-tree more compact
  Instead of Sequential Insertions
  Hilbert R-tree
  STR (Sort-Tile Recursive)
  Uniformization
Hilbert Packing

Hilbert Curve
  A Space Filling Curve
[Figure: three space-filling curves : N-order, Hilbert, Column-wise]
Linearize spatial objects by their Peano key
Sort objects by Hilbert key
Pack them into pages in a round-robin way
Maximize storage utilization
Minimum Dead Space and Sum of MBR area
Example : Bf = 3
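Hilbert packing can be sketched as follows, under two assumptions: the key is computed with the commonly used iterative conversion from cell coordinates to curve position on an n x n grid (n a power of two), and the packing step is taken to mean filling one page after another in key order. Objects are points here; for extended objects the centre of the MBR would typically be used.

```python
def hilbert_key(n, x, y):
    """Position of cell (x, y) along the Hilbert curve on an n x n grid."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                      # rotate the quadrant
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def hilbert_pack(points, n, bf=3):
    """Sort by Hilbert key, then cut the sorted list into pages of Bf objects."""
    ordered = sorted(points, key=lambda p: hilbert_key(n, p[0], p[1]))
    return [ordered[i:i + bf] for i in range(0, len(ordered), bf)]

pages = hilbert_pack([(3, 0), (0, 1), (1, 1), (3, 3), (0, 0), (0, 2), (1, 0)], n=4)
print(pages)
```

Every page except possibly the last is full, and consecutive keys are neighbouring cells, so the objects of one page tend to be close together.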
STR (Sort-Tile Recursive)

Basic idea : “tile” the data space using about $\sqrt{r/n}$ vertical slices
  r : number of rectangles
  n : blocking factor
  P (number of leaf node pages) = $\lceil r / n \rceil$
Example
  Suppose r = 25, n = 3
  nTile = 9, nV = 3, nH = 3
  Slicing starts with the vertical slices
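The tiling for one level can be sketched as below, reproducing the example r = 25, n = 3. Points stand in for rectangle centres, and the slice count is taken as the ceiling of the square root of P, which is an assumption about the rounding.

```python
import math

def str_pack(points, n):
    """Sort-Tile: cut into vertical slices by x, then tile each slice by y."""
    r = len(points)
    p = math.ceil(r / n)               # P : number of leaf node pages
    s = math.ceil(math.sqrt(p))        # number of vertical slices (assumed rounding)
    by_x = sorted(points, key=lambda q: q[0])
    slice_size = s * n                 # each slice holds enough objects for s pages
    pages = []
    for i in range(0, r, slice_size):
        vslice = sorted(by_x[i:i + slice_size], key=lambda q: q[1])
        for j in range(0, len(vslice), n):
            pages.append(vslice[j:j + n])
    return pages

points = [(x, y) for x in range(5) for y in range(5)]   # r = 25
pages = str_pack(points, n=3)
print(len(pages), [len(page) for page in pages])
```

With 25 objects and n = 3 this yields 9 tiles arranged as 3 vertical slices of 3 tiles each; only the last tile is partly filled.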
Comparison : Hilbert Packing vs. STR
[Figure: results of Hilbert Packing (HP) and STR, for Large Objects and for Points]
Uniformization

Non-Uniform Distribution
  Negative Effect on the performance
  But in real applications : Non-Uniform
Uniformization Technique
  Step 1 : Transform Non-Uniform data to Uniform by STR
  Step 2 : Apply R-tree (or Fixed Grid)
  Step 3 : Transform Query Region
Strength
  High Storage Utilization
  Very Simple and Good Performance
Uniformization
[Figure: Non Equi-Width cells and Equi-Width cells]
1. Area of each cell : identical
2. Number of objects within each cell : almost identical
Uniformization : Example
[Figure: three panels : Original, By Delaunay Triangulation, By STR]
[Figure: charts of the object distribution, Original vs. By STR]
Query Processing by R-tree : Nearest Neighbor
[Figure: Query Point, Searching Space, and the Minimum and 2nd Distances in 2-D]
[Figure: traversal of the R-tree by Branching and Pruning with the Minimum distance]
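Branching and pruning with the minimum distance can be sketched as a best-first search over a priority queue. This is one common formulation and not necessarily the traversal drawn on the slides; the hand-built tree, the point objects and all names are assumptions.

```python
import heapq
import itertools
import math

def mindist(q, box):
    """Minimum distance from point q to rectangle (xmin, ymin, xmax, ymax)."""
    dx = max(box[0] - q[0], 0, q[0] - box[2])
    dy = max(box[1] - q[1], 0, q[1] - box[3])
    return math.hypot(dx, dy)

def nearest(root, q):
    tie = itertools.count()              # tie-breaker so dicts are never compared
    heap = [(mindist(q, root["mbr"]), next(tie), root)]
    visited = []
    while heap:
        d, _, item = heapq.heappop(heap)
        if "oid" in item:                # closest remaining item is an object
            return item["oid"], d, visited
        visited.append(item["name"])     # branching: expand the closest node
        for child in item.get("children", []) + item.get("entries", []):
            heapq.heappush(heap, (mindist(q, child["mbr"]), next(tie), child))
    return None, math.inf, visited

def point(oid, x, y):
    return {"oid": oid, "mbr": (x, y, x, y)}

leaf1 = {"name": "leaf1", "mbr": (1, 1, 6, 6), "entries": [point("a", 1, 1), point("b", 6, 6)]}
leaf2 = {"name": "leaf2", "mbr": (9, 8, 15, 12), "entries": [point("c", 9, 12), point("d", 15, 8)]}
leaf3 = {"name": "leaf3", "mbr": (30, 0, 36, 6), "entries": [point("e", 30, 0), point("f", 36, 6)]}
root = {"name": "root", "mbr": (1, 0, 36, 12), "children": [leaf1, leaf2, leaf3]}

oid, dist, visited = nearest(root, (8, 9))
print(oid, round(dist, 3), visited)
```

leaf1 and leaf3 are pruned: their minimum distances exceed the distance to "c", so they are never read. For non-point objects the MBR distance is only a lower bound and the exact geometry would still have to be checked.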
Transformation to Higher Space

Transformation to Higher Dimension
  Transform non-point object to point object
  Reuse spatial indexing methods applicable only to point objects (e.g. Grid File) for non-point objects
Example
[Figure: 1-D intervals A, B and C, each mapped to a point such as (Amin, Amax) in the (Min, Max) plane]
Corner Transformation

From 2-D to 4-D
  1. Simplification by MBR : ((Xmin, Ymin), (Xmax, Ymax))
  2. MBR ((Xmin, Ymin), (Xmax, Ymax)) to Point (Xmin, Ymin, Xmax, Ymax)
Query Processing for Corner Transformation : 1-D Example
Query : Find Contained Objects
[Figure: (Min, Max) plane divided into Regions I to VI around the query interval W, with object A plotted at (Amin, Amax)]
Region I   : Wmax < Amin
Region II  : W ⊂ A
Region III : Amax < Wmin
Region IV  : Amin < Wmin, Amax < Wmax
Region V   : Wmin < Amin, Wmax < Amax
Region VI  : A ⊂ W
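The corner transformation and the containment query can be sketched in 2-D as below. After the transformation, "A is contained in W" becomes a range condition on the 4-D point, which is what lets a point index answer it; the function names and sample MBRs are hypothetical.

```python
def corner_transform(mbr):
    """MBR ((Xmin, Ymin), (Xmax, Ymax)) -> point (Xmin, Ymin, Xmax, Ymax)."""
    (xmin, ymin), (xmax, ymax) = mbr
    return (xmin, ymin, xmax, ymax)

def contained_in(point4, window):
    """Containment of A in W as a range test on the transformed point."""
    xmin, ymin, xmax, ymax = point4
    wxmin, wymin, wxmax, wymax = corner_transform(window)
    return wxmin <= xmin and wymin <= ymin and xmax <= wxmax and ymax <= wymax

objects = {
    "A": ((2, 2), (4, 5)),     # inside the window
    "B": ((8, 1), (12, 3)),    # sticks out on the right
    "C": ((20, 20), (22, 22))  # disjoint from the window
}
points4 = {name: corner_transform(m) for name, m in objects.items()}
window = ((0, 0), (10, 10))
contained = [name for name, p in points4.items() if contained_in(p, window)]
print(points4["A"], contained)
```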
Transformation to Lower Dimension : Linear Quadtree
1. Simplification of Geometry
2. Compute Peano Key with lower-left corner
3. If necessary, divide it and give a Peano key to each piece
4. Define the size of each piece according to the number of quadrants
5. Insert them into B-tree
6. Query Processing by B-tree
[Figure: pieces of an object labelled (22, 0), (23, 0), (28, 1) and (0, 2)]