Spatial Access Method (SAM)

Trees for spatial indexing
Part 2: SAMs
SAMs
R-Tree
R*-Tree
X-Tree
TV-Tree
Answering questions
• The kd-trie is similar to the kd-tree; in the article, the name was used for a kd-tree.
• The split is not at the middle of the axis: the split point is chosen as the median point.
• Because we work with points (which have no extent), we have no problem separating the elements: each point falls on exactly one side of a split.
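The median split described above can be sketched as a small kd-tree builder. This is a minimal sketch, not the article's code: it assumes the split axis cycles through the dimensions with depth, and the names `KdNode` and `build_kd_tree` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]

@dataclass
class KdNode:
    point: Point                       # the median point stored at this node
    axis: int                          # split axis (assumed to cycle with depth)
    left: Optional["KdNode"] = None    # points sorted before the median on `axis`
    right: Optional["KdNode"] = None   # points sorted after the median on `axis`

def build_kd_tree(points: Sequence[Point], depth: int = 0) -> Optional[KdNode]:
    """Build a kd-tree whose split point is the median, not the middle of the axis."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2  # index of the median point along `axis`
    return KdNode(
        point=ordered[mid],
        axis=axis,
        left=build_kd_tree(ordered[:mid], depth + 1),
        right=build_kd_tree(ordered[mid + 1:], depth + 1),
    )
```

Because points have no extent, every point ends up in exactly one subtree; no element has to be duplicated or clipped at a split.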
UB-Tree range queries
• The algorithm is:
• Find all regions that intersect q:
– If a region is a page, all objects in it that intersect q are in the answer.
– After that, we search for the last subcube in this region and look at its sibling (brother); if the sibling intersects q, we repeat the same loop on it.
– After that, we look at the parent (father) of B and search again.
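A UB-tree stores objects in a B+-tree keyed by their Z-address (the bit-interleaved Morton code). The sketch below is a deliberately simplified stand-in, not the algorithm above: it scans every key between Z(lo) and Z(hi) and filters, instead of jumping to the next region that intersects q via siblings and parents. `BITS`, `z_address`, and `ZOrderIndex` are hypothetical names, and coordinates are assumed to be small non-negative integers.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence, Tuple

BITS = 8  # assumed: coordinates are non-negative integers below 2**BITS

def z_address(point: Sequence[int]) -> int:
    """Morton (Z-order) code: interleave the bits of all coordinates."""
    dims = len(point)
    z = 0
    for bit in range(BITS):
        for d, coord in enumerate(point):
            z |= ((coord >> bit) & 1) << (bit * dims + d)
    return z

class ZOrderIndex:
    """Points kept sorted by Z-address, standing in for the UB-tree's B+-tree."""

    def __init__(self, points: Sequence[Tuple[int, ...]]):
        self.entries = sorted((z_address(p), tuple(p)) for p in points)
        self.keys = [z for z, _ in self.entries]

    def range_query(self, lo, hi) -> List[Tuple[int, ...]]:
        # Every point inside the box [lo, hi] has a Z-address between
        # z_address(lo) and z_address(hi): the code is monotone in each coordinate.
        start = bisect_left(self.keys, z_address(lo))
        end = bisect_right(self.keys, z_address(hi))
        # The scanned interval can also contain points outside the box, so filter them.
        return [p for _, p in self.entries[start:end]
                if all(l <= c <= h for c, l, h in zip(p, lo, hi))]
```

The filtering step is exactly what the real UB-tree algorithm avoids: by skipping Z-regions that do not intersect q, it reads only the pages that can contribute to the answer.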
R-Tree
• A special B+-tree for spatial indexing.
• The performance of the R*-tree decreases as the dimensionality increases.
• The R-tree access method is prohibitively slow for more than 5 dimensions.
Problems of (R-Tree-based) Index Structures
• It has been shown that overlap increases with the dimensionality.
• Intuitively, overlap means that some point queries have multiple paths to search.
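To see why overlap means multiple paths, here is a minimal R-tree point query sketch. It assumes 2-D rectangles and a prebuilt tree (insertion and splitting are omitted); `Leaf`, `Inner`, and `point_query` are hypothetical names. The `stats` counter records how many nodes the query has to read.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

@dataclass
class Leaf:
    entries: List[Tuple[Rect, str]]       # (MBR of a data object, object id)

@dataclass
class Inner:
    entries: List[Tuple[Rect, "Node"]]    # (MBR of a child, child node)

Node = Union[Leaf, Inner]

def contains(r: Rect, p: Tuple[float, float]) -> bool:
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]

def point_query(node: Node, p: Tuple[float, float], stats: dict) -> List[str]:
    """Return ids of objects whose MBR contains p; count visited nodes in stats."""
    stats["visited"] += 1
    if isinstance(node, Leaf):
        return [obj for mbr, obj in node.entries if contains(mbr, p)]
    result: List[str] = []
    for mbr, child in node.entries:
        # Every child whose MBR contains p must be searched: with overlapping
        # MBRs, one query descends into several subtrees.
        if contains(mbr, p):
            result.extend(point_query(child, p, stats))
    return result
```

A point that lies in the overlap of two sibling MBRs forces the search into both subtrees, which is the extra cost the following slides try to measure.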
Definition of overlap
• Intuitively, overlap is the percentage of the volume that is covered by more than one directory hyperrectangle.
• This intuitive definition of overlap is directly correlated with query performance, because a query point in an overlapping volume implies searching multiple paths.
Definition of overlap (2)
• Overlap = $\left\| \bigcup_{i,j,\; i \neq j} (R_i \cap R_j) \right\| \Big/ \left\| \bigcup_i R_i \right\|$
• We take the volume of the union of all pairwise intersections of the MBRs and divide it by the volume of the union of all the MBRs.
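• But overlap in highly populated areas is much more critical than overlap in sparsely populated areas.
• WeightedOverlap = $\left| \{\, p \mid p \in \bigcup_{i,j,\; i \neq j} (R_i \cap R_j) \,\} \right| \Big/ \left| \{\, p \mid p \in \bigcup_i R_i \,\} \right|$, where p ranges over the data points.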
[Figure: overlap example]
Overlap = (1/4) / 2 = 1/8 = 12.5 %
WeightedOverlap = 2 / 6 = 1/3 ≈ 33 %
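Both measures can be computed directly from their definitions. This is a minimal sketch assuming 2-D rectangles (rather than d-dimensional hyperrectangles); the helper names are hypothetical. The volume of the union of pairwise intersections equals the area covered by at least two MBRs, which the code computes exactly on the grid formed by all rectangle edges.

```python
from typing import Iterator, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def _grid_cells(rects: Sequence[Rect]) -> Iterator[Tuple[float, int]]:
    """Yield (area, cover_count) for every cell of the grid formed by all rectangle edges."""
    xs = sorted({v for r in rects for v in (r[0], r[2])})
    ys = sorted({v for r in rects for v in (r[1], r[3])})
    for x0, x1 in zip(xs, xs[1:]):
        for y0, y1 in zip(ys, ys[1:]):
            # A grid cell lies either completely inside or completely outside each rectangle.
            count = sum(1 for r in rects
                        if r[0] <= x0 and x1 <= r[2] and r[1] <= y0 and y1 <= r[3])
            yield (x1 - x0) * (y1 - y0), count

def overlap(rects: Sequence[Rect]) -> float:
    """Area covered by >= 2 MBRs divided by the area covered by >= 1 MBR."""
    union = multi = 0.0
    for cell_area, count in _grid_cells(rects):
        if count >= 1:
            union += cell_area
        if count >= 2:
            multi += cell_area
    return multi / union if union else 0.0

def weighted_overlap(rects: Sequence[Rect], points) -> float:
    """Data points in >= 2 MBRs divided by data points in >= 1 MBR (closed rectangles)."""
    def cover(p):
        return sum(1 for r in rects if r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3])
    counts = [cover(p) for p in points]
    in_union = sum(1 for c in counts if c >= 1)
    in_overlap = sum(1 for c in counts if c >= 2)
    return in_overlap / in_union if in_union else 0.0
```

For two 2×2 squares offset by 1 in each direction, `overlap` is 1/7; if one of three covered data points sits in the shared unit square, `weighted_overlap` is 1/3, so the clustered point makes the weighted measure much larger.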
Overlap / WeightedOverlap
• Depending on the kind of data, the appropriate measure can differ.
• For uniformly distributed data points, we can use the overlap measure.
• Real data is often clustered, so WeightedOverlap is more accurate there.
X-Tree
• Goal: avoid overlap in the directory.
• The X-tree is a hybrid of a linear, array-like directory and a hierarchical, R-tree-like directory.
• In low dimensions, the most efficient organization of the directory is hierarchical.
• For high dimensionality, a linear organization is more efficient.
X-Tree
• The X-tree has 3 types of nodes: data nodes, normal directory nodes, and supernodes.
• Supernodes avoid splits in the directory (and the overlap such splits would create), so searching is faster.
• This is not the same as an R*-tree with larger blocks, because the X-tree creates larger blocks only where necessary.
X-Tree
[Figure: X-tree structure with supernodes, normal directory nodes, and data nodes]
Creation of supernodes
• They are only created if there is no other
possibility to avoid overlap during
insertion.
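The supernode rule can be sketched as a decision taken when a directory node overflows. This is a minimal sketch under loud assumptions: 2-D rectangles, a hypothetical naive split (sort by lower bound along each axis and cut in half) standing in for the X-tree's real split algorithm, and an assumed threshold `max_overlap=0.2`. The overlap of a candidate split is measured as in the overlap definition: intersection area over union area of the two resulting MBRs.

```python
from typing import Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def mbr(rects: Sequence[Rect]) -> Rect:
    """Minimum bounding rectangle of a group of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def area(r: Rect) -> float:
    # Clamped at 0 so that an empty intersection has zero area.
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])

def overlap_ratio(a: Rect, b: Rect) -> float:
    """Overlap of two MBRs: intersection area divided by union area."""
    inter = area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def handle_overflow(entries: Sequence[Rect], max_overlap: float = 0.2):
    """Split an overflowing directory node (>= 2 entries), or make it a supernode."""
    best = None
    for axis in (0, 1):
        # Hypothetical naive split: sort by lower bound on this axis, cut in half.
        ordered = sorted(entries, key=lambda r: r[axis])
        half = len(ordered) // 2
        g1, g2 = ordered[:half], ordered[half:]
        ratio = overlap_ratio(mbr(g1), mbr(g2))
        if best is None or ratio < best[0]:
            best = (ratio, g1, g2)
    ratio, g1, g2 = best
    if ratio <= max_overlap:
        return "split", [g1, g2]
    # No split keeps overlap acceptable: keep all entries in one enlarged node.
    return "supernode", [list(entries)]
```

Well-separated entries produce an ordinary split, while entries that overlap heavily on every axis are kept together in a supernode, which is scanned linearly instead of being searched along several overlapping paths.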
TV-Tree (Telescopic-Vector tree)
• The basic idea of the TV-tree is to use feature vectors that contract and extend dynamically (as in classification).
TV-Tree
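• An m-contraction of x is $A_m x$, where $A_m$ is an $m \times n$ contraction matrix.
• A natural choice of $A_m$ keeps the first m coordinates of x:
$$A_m = \begin{pmatrix} 1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & & \ddots & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 & \cdots & 0 \end{pmatrix} = \begin{pmatrix} I_m & 0 \end{pmatrix}$$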
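With the natural contraction matrix, $A_m x$ is simply the first m coordinates of x, and each contraction is a prefix of the next one (the "telescoping" property). A minimal sketch in plain Python; the function names are hypothetical.

```python
from typing import List, Sequence

def contraction_matrix(m: int, n: int) -> List[List[int]]:
    """The m x n matrix (I_m | 0): row i has a single 1, in column i."""
    if not 0 < m <= n:
        raise ValueError("need 0 < m <= n")
    return [[1 if j == i else 0 for j in range(n)] for i in range(m)]

def mat_vec(a: List[List[int]], x: Sequence[float]) -> List[float]:
    """Plain matrix-vector product."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def m_contraction(x: Sequence[float], m: int) -> List[float]:
    """A_m x for the natural contraction matrix A_m."""
    return mat_vec(contraction_matrix(m, len(x)), x)

x = [4.0, 2.0, 7.0, 1.0]
print(m_contraction(x, 2))  # [4.0, 2.0]
```

In practice one would just slice `x[:m]`; the explicit matrix is only there to mirror the definition.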
Multiple shapes
• For example, we can use a sphere, because it is described by only a center and a radius r: it represents the set of points within Euclidean distance ≤ r of the center.
• The Euclidean distance is a special case of the Lp metrics, with p = 2.
• For the L1 metric (Manhattan distance), the "sphere" is a diamond shape.
• The TV-tree works with any Lp-sphere.
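An Lp-sphere is just the set of points within Lp distance r of a center, so one membership test covers the Euclidean sphere (p = 2), the L1 diamond (p = 1), and, as a limiting case, the L∞ square. A minimal sketch; the function names are hypothetical.

```python
import math
from typing import Sequence

def lp_distance(a: Sequence[float], b: Sequence[float], p: float) -> float:
    """L_p distance; p=1 is Manhattan, p=2 is Euclidean, p=math.inf is the max metric."""
    if math.isinf(p):
        return max(abs(x - y) for x, y in zip(a, b))
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def in_lp_sphere(point, center, radius: float, p: float) -> bool:
    """True if point lies in the L_p 'sphere' (ball) with the given center and radius."""
    return lp_distance(point, center, p) <= radius

# The same point is inside the Euclidean sphere but outside the L1 diamond.
print(in_lp_sphere((1, 1), (0, 0), 1.5, p=2))  # True  (distance is about 1.414)
print(in_lp_sphere((1, 1), (0, 0), 1.5, p=1))  # False (distance is 2)
```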
TV-Tree principle
• The TV-tree therefore treats the attributes asymmetrically, favoring the first few features over the rest.
• The TV-tree can use any type of MBR (minimum bounding region): rectangle, cube, sphere, etc.
• The TV-tree can use any Lp-sphere.
TV-Tree node structure
• Each node represents the MBR of all its descendants (say, an Lp-sphere).
• Each region is represented by a center, which is a telescopic vector, and a radius.
• Such a region is called a TMBR (telescopic minimum bounding region).
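A TMBR can be sketched as a center, a radius, and a count of active dimensions. This is one simplified reading of the slides, not the TV-tree's published definition: it assumes the leading center coordinates that are not active hold values shared by every descendant (so a point must match them exactly), that the Lp distance is measured only on the active coordinates, and that point dimensions beyond the center's length are ignored. `TMBR` and `contains` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

@dataclass
class TMBR:
    center: Tuple[float, ...]  # telescopic center; point dims beyond len(center) are ignored
    n_active: int              # the last n_active center coordinates are the active dims
    radius: float
    p: float = 2.0             # L_p metric applied to the active dimensions

    def contains(self, point: Sequence[float]) -> bool:
        k = len(self.center) - self.n_active  # number of leading inactive dims
        # Assumption: inactive dims hold values shared by every descendant,
        # so a point must match them exactly.
        if tuple(point[:k]) != tuple(self.center[:k]):
            return False
        active_x = point[k:len(self.center)]
        active_c = self.center[k:]
        dist = sum(abs(x - c) ** self.p for x, c in zip(active_x, active_c)) ** (1.0 / self.p)
        return dist <= self.radius
```

With two active dimensions, only two coordinates enter the distance computation, however long the feature vectors are; this is what keeps the TV-tree's directory entries small in high dimensions.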
TV-1-Tree example
TV-2-Tree example
[Figures: example TMBRs, each labeled with its active dimensions (Act. Dim: x, y, z, x,y, or x,z)]
What is the best number of active dimensions?
• In the authors' experiments, the best number of active dimensions was two.
TV-Tree conclusion
• Overlap is accepted, so there can also be multiple paths to search.
• The branch chosen for a new point is selected with the following criteria: