Efficient Processing of Spatial Joins Using R


The R+-Tree

A Dynamic Index for Multi-Dimensional Objects

Timos K. Sellis et al.

VLDB 1987

Jae-hoon Kim


PNU

Introduction

STEM

DBMSs store one-dimensional data

  Integers
  Real numbers
  Strings

DBMSs do not handle multi-dimensional data adequately

  Boxes
  Polygons
  Points in multi-dimensional space


Method for Multi-dimensional Data


The most common case of multi-dimensional data is points

The main idea is to divide the whole space into disjoint sub-regions

Each sub-region contains no more than C points

C is the capacity of a disk page

Inserting new points may force a region to be partitioned further (split)
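The capacity rule and the split on overflow can be sketched as follows. This is an illustrative sketch only: the `Region` class, the median cut, and the value of `C` are assumptions chosen for the example, not details taken from the slides.

```python
from dataclasses import dataclass, field

C = 4  # assumed page capacity (points per sub-region); illustrative value


@dataclass
class Region:
    """An axis-aligned 2-d sub-region holding at most C points."""
    lo: tuple  # (x_min, y_min)
    hi: tuple  # (x_max, y_max)
    points: list = field(default_factory=list)


def split(region, axis):
    """Split a region at the median coordinate of its points along `axis`."""
    coords = sorted(p[axis] for p in region.points)
    cut = coords[len(coords) // 2]
    left_hi = list(region.hi)
    left_hi[axis] = cut
    right_lo = list(region.lo)
    right_lo[axis] = cut
    left = Region(region.lo, tuple(left_hi))
    right = Region(tuple(right_lo), region.hi)
    for p in region.points:
        (left if p[axis] < cut else right).points.append(p)
    return left, right


def insert(regions, point, axis=0):
    """Insert a point; split the owning region if it now exceeds C points."""
    for i, r in enumerate(regions):
        if all(r.lo[d] <= point[d] < r.hi[d] for d in range(2)):
            r.points.append(point)
            if len(r.points) > C:
                regions[i:i + 1] = split(r, axis)
            return
    raise ValueError("point outside the indexed space")


regions = [Region((0, 0), (10, 10))]
for p in [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]:
    insert(regions, p)  # the fifth insert overflows the region and splits it
```

The cut here is adaptable (the data determine its position) and 1-d (a single hyperplane). The sketch does not handle many points sharing the same coordinate.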


Classification of known methods


  

Position

  Fixed: the position of the splitting hyperplane is predetermined (grid file)
  Adaptable: the data points determine the position of the hyperplane (k-d tree)

Dimensionality

  1-d cut: the split uses a single hyperplane (k-d tree)
  k-d cut: the split uses k hyperplanes (quad-tree, oct-tree)

Locality

  Grid: the hyperplane splits not only the affected region but also all the regions in its direction
  Brickwall: the splitting hyperplane is restricted to extend solely inside the region being split

Method            Position    Dimensionality  Locality
point quad-tree   adaptable   k-d             brickwall
k-d tree          adaptable   1-d             brickwall
grid file         fixed       1-d             grid
K-D-B-tree        adaptable   1-d             brickwall


Methods for Rectangles

Transform the rectangles into points in a higher-dimensional space

  A 2-d rectangle becomes a point in 4-d space
  The points can then be indexed with k-d trees, or with a grid file after a rotation of the axes

Use a space-filling curve

  Map the k-d space to a 1-d space
  Transform a k-dimensional object into line segments (z-transform)

Divide the original space into sub-regions

  Disjoint sub-regions: the point methods mentioned before can be used; a rectangle that crosses a split is cut in two pieces, and the pieces are tagged
  Overlapping sub-regions: first proposed with the R-tree
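The first two approaches can be illustrated with a short sketch. The function names are hypothetical, and the z-order code assumes non-negative integer grid coordinates. Which dimension takes the even bit positions is a convention chosen here, not something the slides specify.

```python
def rect_to_point(x_low, x_high, y_low, y_high):
    """Represent a 2-d rectangle as a single point in 4-d space."""
    return (x_low, x_high, y_low, y_high)


def z_value(x, y, bits=16):
    """Interleave the bits of two non-negative integers (z-order value)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # x bits go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
    return z


point_4d = rect_to_point(1, 3, 2, 5)   # one rectangle, one 4-d point
cell_key = z_value(3, 5)               # one grid cell, one 1-d key
```

Cells that are close on the curve tend to be close in space, which is what lets a one-dimensional index store the keys. An extended object generally covers several runs of consecutive keys rather than a single one.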


R-Tree

Extension of the B-tree

Height-balanced tree

Nodes consist of minimum bounding rectangles (MBRs)

Guarantees that space utilization is at least 50%


R-Tree Split

Requirements of a “good” split

  Minimize the total area covered by the resulting nodes
  Minimize the overlap between them
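The two criteria can be made concrete by scoring a candidate split. The sketch below assumes rectangles are tuples `(x_low, x_high, y_low, y_high)`. The helper names are illustrative, and it only evaluates a given split rather than reproducing any particular R-tree split algorithm.

```python
def mbr(rects):
    """Minimum bounding rectangle of a group of rectangles."""
    return (min(r[0] for r in rects), max(r[1] for r in rects),
            min(r[2] for r in rects), max(r[3] for r in rects))


def area(r):
    return (r[1] - r[0]) * (r[3] - r[2])


def overlap_area(a, b):
    """Area shared by two rectangles (0 if they do not overlap)."""
    w = min(a[1], b[1]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[2], b[2])
    return w * h if w > 0 and h > 0 else 0


def split_cost(group_a, group_b):
    """Return (total MBR area, overlap between the two MBRs) for a split."""
    ra, rb = mbr(group_a), mbr(group_b)
    return area(ra) + area(rb), overlap_area(ra, rb)


a, b, c, d = (0, 1, 0, 1), (1, 2, 0, 1), (4, 5, 0, 1), (5, 6, 0, 1)
good = split_cost([a, b], [c, d])   # neighbours kept together
bad = split_cost([a, c], [b, d])    # distant rectangles grouped together
```

Grouping neighbours gives a smaller total area and no overlap, so a search is less likely to visit both nodes.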


R-Tree Insert & Split

[Figure: data rectangles 1-8 grouped under R-tree nodes A and B]


Bad Search in R-Tree

[Figure: data rectangles 1-8 grouped under R-tree nodes A and B]


R+-Tree


Variant of the R-tree

Avoids overlap among internal nodes by inserting an object into multiple leaves when necessary

Leaf node entry: (oid, RECT), where oid is an object identifier

RECT: (x_low, x_high, y_low, y_high)

Intermediate node entry: (p, RECT), where p is a pointer to a lower-level node
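One possible in-memory layout for these entries is sketched below. The class names and the `is_leaf` flag are assumptions made for illustration; only the `(oid, RECT)` and `(p, RECT)` pairs come from the slides.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rect:
    x_low: float
    x_high: float
    y_low: float
    y_high: float


@dataclass
class LeafEntry:
    oid: int      # object identifier
    rect: Rect    # rectangle of the data object


@dataclass
class Node:
    is_leaf: bool
    entries: list = field(default_factory=list)  # LeafEntry or InternalEntry items


@dataclass
class InternalEntry:
    p: Node       # pointer to a lower-level node
    rect: Rect    # region covered by the child node


leaf = Node(is_leaf=True, entries=[LeafEntry(oid=1, rect=Rect(0, 1, 0, 1))])
root = Node(is_leaf=False, entries=[InternalEntry(p=leaf, rect=Rect(0, 2, 0, 2))])
```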


Properties of R+-Tree


Properties

   

  For each entry (p, RECT) of an intermediate node, the subtree rooted at the node pointed to by p contains a rectangle R if and only if R is covered by RECT; the only exception is when R is a rectangle at a leaf node, in which case R need only overlap RECT
  For any two entries (p1, RECT1) and (p2, RECT2) of an intermediate node, the overlap between RECT1 and RECT2 is zero
  The root has at least two children unless it is a leaf
  All leaves are at the same level
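The zero-overlap property of sibling entries is easy to check mechanically. The sketch below assumes rectangles are tuples `(x_low, x_high, y_low, y_high)` and treats rectangles that merely share an edge as non-overlapping. The function names are illustrative.

```python
from itertools import combinations


def interiors_overlap(a, b):
    """True if two rectangles share more than a boundary."""
    return a[0] < b[1] and b[0] < a[1] and a[2] < b[3] and b[2] < a[3]


def siblings_disjoint(rects):
    """Check that no two RECTs of one intermediate node overlap."""
    return not any(interiors_overlap(a, b) for a, b in combinations(rects, 2))


ok = siblings_disjoint([(0, 2, 0, 2), (2, 4, 0, 2)])       # touching edges only
broken = siblings_disjoint([(0, 3, 0, 2), (2, 4, 0, 2)])   # x-ranges overlap
```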


R+-Tree

[Figure: data rectangles 1-8 under R+-tree nodes A, B, and C; rectangle 4 appears in two leaves]


Operations to maintain the R+-tree


Searching operation

  First decompose the search space into disjoint sub-regions
  For each sub-region, descend the tree until the actual data objects are found in the leaves
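The descent can be sketched as a short recursive function. This sketch assumes nodes are plain dicts with a `leaf` flag and a list of `(rect, target)` entries, with rectangles as `(x_low, x_high, y_low, y_high)` tuples. That representation is made up for the example. Following only the children whose RECT intersects the query window plays the role of decomposing the search space.

```python
def intersects(a, b):
    """True if two rectangles share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1] and a[2] <= b[3] and b[2] <= a[3]


def search(node, window, found=None):
    """Collect the oids of all data rectangles that intersect `window`."""
    if found is None:
        found = set()   # a set, because one object may sit in several leaves
    for rect, target in node["entries"]:
        if intersects(rect, window):
            if node["leaf"]:
                found.add(target)               # target is an oid
            else:
                search(target, window, found)   # target is a child node
    return found


# Two disjoint leaf regions; object 2 straddles the boundary at x = 5.
leaf_a = {"leaf": True, "entries": [((1, 2, 1, 2), 1), ((4, 6, 1, 2), 2)]}
leaf_b = {"leaf": True, "entries": [((4, 6, 1, 2), 2), ((8, 9, 1, 2), 3)]}
root = {"leaf": False,
        "entries": [((0, 5, 0, 10), leaf_a), ((5, 10, 0, 10), leaf_b)]}

window_hits = search(root, (4.5, 5.5, 0, 3))
point_hits = search(root, (1.5, 1.5, 1.5, 1.5))   # point query as a degenerate window
```

Because sibling regions do not overlap, a point query that falls strictly inside one region follows a single path from the root to a leaf.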

Insertion operation

  Search the tree and add the rectangle to the leaf nodes
  Difference from the R-tree: the rectangle may be added to more than one leaf node
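A minimal sketch of the multi-leaf insertion follows, using a hypothetical dict-based node layout with `(rect, target)` entries and rectangles as `(x_low, x_high, y_low, y_high)` tuples. It deliberately leaves out node overflow and splitting, as well as the case where the new rectangle falls outside every existing region.

```python
def overlaps(a, b):
    """True if the interiors of two rectangles intersect."""
    return a[0] < b[1] and b[0] < a[1] and a[2] < b[3] and b[2] < a[3]


def insert(node, oid, rect):
    """Add (rect, oid) to every leaf whose region overlaps rect.

    Returns the number of leaves that received a copy.
    """
    if node["leaf"]:
        node["entries"].append((rect, oid))
        return 1
    return sum(insert(child, oid, rect)
               for region, child in node["entries"] if overlaps(region, rect))


leaf_a = {"leaf": True, "entries": []}
leaf_b = {"leaf": True, "entries": []}
root = {"leaf": False,
        "entries": [((0, 5, 0, 10), leaf_a), ((5, 10, 0, 10), leaf_b)]}

copies_straddling = insert(root, 7, (4, 6, 1, 2))   # crosses the boundary at x = 5
copies_inside = insert(root, 8, (1, 2, 1, 2))       # lies entirely inside region A
```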

Deletion operation

  Locate the rectangle that must be deleted and remove it from the leaf node(s) that contain it

Node splitting operation

  The two sub-nodes cover disjoint areas
  Contrary to the R-tree, a split may also have to be propagated downward


Packing Algorithm


Packing algorithm

 

  Pack attempts to set up an R+-tree with good search performance
  It relies on three routines: Partition, Sweep, and Pack

Criteria for choosing an x-cut or a y-cut in Partition

  Nearest neighbors
  Minimal total x- and y-displacement
  Minimal total space coverage accrued by the two sub-regions
  Minimal number of rectangle splits

  The first three criteria reduce the coverage of “dead space”; the fourth limits the height expansion of the R+-tree


Operations to build the R+-tree


Partition operation

  Decomposes the total space into locally optimal sub-regions (in terms of search performance)
  Uses the Sweep routine, which sweeps parallel to the x- or y-axis

Sweep operation

  Scans the rectangles and identifies points where space partitioning is possible
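A simplified sketch of a sweep along one axis is shown below. It assumes rectangles are `(x_low, x_high, y_low, y_high)` tuples, and it uses the number of rectangles the cut line would split as the cost, which is only one of the possible criteria. The function name and signature are illustrative and not taken from the paper.

```python
def sweep(rects, axis, ff):
    """Sweep along `axis` (0 = x, 1 = y) until ff rectangles have been met.

    Returns (cut, cost): the coordinate of the cut line and the number of
    rectangles that the line would split (assumed cost measure).
    """
    lo, hi = 2 * axis, 2 * axis + 1           # indices of the low/high coordinate
    ordered = sorted(rects, key=lambda r: r[lo])
    if len(ordered) <= ff:
        return None, 0                         # everything fits; no cut needed
    cut = ordered[ff][lo]                      # line at the start of the next rectangle
    cost = sum(1 for r in rects if r[lo] < cut < r[hi])
    return cut, cost


rects = [(0, 2, 0, 1), (1, 4, 0, 1), (3, 5, 0, 1), (6, 7, 0, 1)]
x_cut, x_cost = sweep(rects, axis=0, ff=2)
```

A Partition step could call such a routine once per axis and keep the cheaper cut. The sketch ignores ties between rectangles that start at the same coordinate.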

Pack operation

  Pack organizes an R+-tree from a set S of rectangles and the fill-factor ff of the tree
  It recursively packs the entries of each level of the tree, from the bottom up
  At each level, when non-leaf nodes are partitioned and some rectangles are split by the chosen partition, the split is propagated downward recursively and, if necessary, the changes are propagated upward as well


Analysis

[Figure: disk accesses for two-size segments, point query]

[Figure: disk accesses for two-size segments, segment query]


Summary

Advantages of the R+-tree

  Improved search performance, especially for point queries
  More than 50% savings in disk accesses compared with the R-tree

Disadvantages of the R+-tree

  The tree can be taller than the corresponding R-tree
  Uses more space (rectangles duplicated across leaves)
