Transcript Slide 6
COSC 6114
Prof. Andy Mirzaian
References:
• [M. de Berg et al.] chapter 5
Applications:
• Databases
• GIS, Graphics: crop-&-zoom, windowing
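Orthogonal Range Search: Database Query
[Figure: records plotted with date of birth on the horizontal axis (1980,00,00 to 1989,99,99) and salary on the vertical axis (13,000 to 14,000). Example record inside the query rectangle: Mr. G. O. Meter, born Nov. 6, 1988, salary $13,600.]
2D Query Rectangle: [1980,00,00 : 1989,99,99] × [13,000 : 14,000]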
[Figure: the same plot extended with a third axis, on which the query is bounded between 2 and 4.]
3D Query Orthogonal Range: [1980,00,00 : 1989,99,99] × [13,000 : 14,000] × [2 : 4]
1D-Tree: 1-Dimensional Range Searching
[Figure: query interval [x : x’] marked on the x axis.]
Static: Binary Search in a sorted array.
Dynamic: Store data points in some balanced Binary Search Tree T.
Let the data points be P = { p1, p2 , …, pn } .
T is a balanced BST where the data appear at its leaves sorted left to right.
The internal nodes are used to split left & right subtrees.
Assume x(v) = max x(L), where L is any leaf in the left subtree of internal node v.
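[Figure: balanced BST T with root[T ] holding split value 23 and leaves, left to right, 2, 5, 6, 9, 13, 17, 23, 31, 37, 41, 49, 62, 73, 85, 91. Each internal node stores the maximum value in its left subtree. Query Range: [7 : 49].]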
Query Range [x : x’]: Call 1DRangeQuery(root[T ], x, x’)
ALGORITHM 1DRangeQuery (v, x, x’)
  if v is a leaf then if x ≤ x(v) ≤ x’ then report data stored at v
  else do
    if x ≤ x(v) then 1DRangeQuery ( leftchild(v) , x, x’ )
    if x(v) < x’ then 1DRangeQuery ( rightchild(v), x, x’ )
  od
end
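The 1DRangeQuery procedure above can be sketched in Python as follows. This is a minimal illustration, not the slides' own code: the names `Node`, `build`, and `range_query_1d` are assumptions made here, and the tree is built from an already sorted list of distinct keys.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    # Leaf: key is the stored data value.
    # Internal node: key is x(v), the maximum value in the left subtree.
    key: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def build(sorted_keys: List[float]) -> Node:
    """Build a balanced leaf-oriented BST from a non-empty sorted list (hypothetical helper)."""
    if len(sorted_keys) == 1:
        return Node(sorted_keys[0])
    mid = (len(sorted_keys) + 1) // 2            # left half gets the extra key
    left, right = build(sorted_keys[:mid]), build(sorted_keys[mid:])
    return Node(sorted_keys[mid - 1], left, right)


def range_query_1d(v: Node, lo: float, hi: float, out: List[float]) -> None:
    """Append to `out` every leaf value in [lo, hi], mirroring the pseudocode."""
    if v.is_leaf:
        if lo <= v.key <= hi:
            out.append(v.key)
        return
    if lo <= v.key:                              # left subtree may hold values >= lo
        range_query_1d(v.left, lo, hi, out)
    if v.key < hi:                               # right subtree may hold values <= hi
        range_query_1d(v.right, lo, hi, out)


leaves = [2, 5, 6, 9, 13, 17, 23, 31, 37, 41, 49, 62, 73, 85, 91]
result: List[float] = []
range_query_1d(build(leaves), 7, 49, result)
print(result)  # [9, 13, 17, 23, 31, 37, 41, 49]
```

The tree shape produced by `build` need not match the slide's figure; only the reported set depends on the keys and the query range.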
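Complexities:
Query Time: O(K + log n)
Construction Time: O(n log n)
Space: O(n)
[These are optimal]
[Figure: the point set P is stored in T; a query (T, [x, x’]) follows two search paths from root[T ] that diverge at vsplit, and the K leaves between them are reported as output.]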
2D-Tree
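Consider dimension d=2:
point p = ( x(p) , y(p) ) , range R = [x1 : x2] × [y1 : y2]
p ∈ R ⟺ x(p) ∈ [x1 : x2] and y(p) ∈ [y1 : y2] .
[Figure: point p inside rectangle R, with x(p) between x1 and x2 on the x axis and y(p) between y1 and y2 on the y axis.]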
2D-tree
L = vertical/horizontal median split of P into Pleft and Pright.
Alternate between vertical & horizontal splitting at even and odd depths.
(Assume: no 2 points have equal x or y coordinates.)
[Figure: a vertical line L separating Pleft from Pright, OR a horizontal line L separating Pleft from Pright.]
Constructing 2D-Tree
Input: P = { p1, p2 , …, pn } ⊆ ℝ², off-line.
Output: 2D-tree storing P.
Step 1: Pre-sort P on x & on y, i.e., 2 sorted lists Û = (Xsorted(P), Ysorted(P)).
Step 2: root[T ] ← Build2DTree ( Û , 0)
end
Procedure Build2DTree ( Û , depth )
  if Û contains one point then return a leaf storing this point
  else do
    if depth is even
      then x-median split Û, i.e., split data points in half by a vertical line L
           through x-median of Û and reconfigure Ûleft and Ûright .
      else y-median split Û, … by a horizontal line L,
           and reconfigure Ûleft and Ûright .
    v ← a newly created node storing line L
    leftchild(v) ← Build2DTree ( Ûleft , 1+depth)
    rightchild(v) ← Build2DTree ( Ûright , 1+depth)
    return v
end
T(n) = 2 T(n/2) + O(n) = O(n log n) time.
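A possible Python rendering of Build2DTree is sketched below. It assumes, as the slides do, that no two points share an x or a y coordinate. The names `KDNode`, `build_2d_tree`, `_build`, and `leaves` are illustrative choices made here. Both pre-sorted lists are split by a linear filter at each level, which is one way to realize the "reconfigure Ûleft and Ûright" step in O(n) time.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]
SortedLists = Tuple[List[Point], List[Point]]    # (sorted by x, sorted by y)


@dataclass
class KDNode:
    point: Optional[Point] = None    # set only at leaves
    axis: int = 0                    # 0: vertical split line, 1: horizontal split line
    value: float = 0.0               # coordinate of the split line L
    left: Optional["KDNode"] = None
    right: Optional["KDNode"] = None


def build_2d_tree(points: List[Point]) -> KDNode:
    """Step 1 (pre-sort on x and on y) followed by Step 2 (recursive build)."""
    by_x = sorted(points, key=lambda p: p[0])
    by_y = sorted(points, key=lambda p: p[1])
    return _build((by_x, by_y), 0)


def _build(lists: SortedLists, depth: int) -> KDNode:
    axis = depth % 2                              # even depth: x-split, odd depth: y-split
    primary = lists[axis]
    if len(primary) == 1:
        return KDNode(point=primary[0])
    median = primary[(len(primary) - 1) // 2][axis]
    # Filtering keeps each list sorted, so no re-sorting is needed below this level.
    left = tuple([p for p in lst if p[axis] <= median] for lst in lists)
    right = tuple([p for p in lst if p[axis] > median] for lst in lists)
    return KDNode(axis=axis, value=median,
                  left=_build(left, depth + 1), right=_build(right, depth + 1))


def leaves(v: KDNode) -> List[Point]:
    """Collect leaf points left to right (helper for inspection)."""
    if v.point is not None:
        return [v.point]
    return leaves(v.left) + leaves(v.right)
```

For example, `build_2d_tree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])` splits first on the vertical line x = 5, then splits the left half on the horizontal line y = 4.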
2D-Tree Example
[Figure: ten points p1 … p10 in the plane partitioned by splitting lines L1 … L9, shown next to the corresponding 2D-tree with root L1, internal nodes L2 … L9, and leaves p1 … p10.]
Query Point Search in 2D-Tree
[Figure: the same ten points, splitting lines L1 … L9, and 2D-tree as in the example above, with a query point q added to the plane; the search for q follows a single root-to-leaf path in the tree.]
2D-Tree node regions
region(v) = rectangular region (possibly unbounded) covered by the subtree rooted at v.
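region (root[T ]) = (-∞ : +∞) × (-∞ : +∞)
Suppose region(v) = ⟨x1 : x2⟩ × ⟨y1 : y2⟩, where each ⟨ or ⟩ stands for an endpoint that may be open or closed.
What are region(leftchild(v)) and region(rightchild(v))?
With x-split (vertical line L):
region(lc(v)) = ⟨x1 : x(L) ] × ⟨y1 : y2⟩
region(rc(v)) = ( x(L) : x2⟩ × ⟨y1 : y2⟩
With y-split (horizontal line L):
region(lc(v)) = ⟨x1 : x2⟩ × ⟨y1 : y(L) ]
region(rc(v)) = ⟨x1 : x2⟩ × ( y(L) : y2⟩
[Figure: region(v) cut by L into region(lc(v)) and region(rc(v)), once for a vertical L and once for a horizontal L.]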
2D-Tree Range Search
For range R = [x1 : x2] × [y1 : y2]
call Search2DTree (root[T ] , R )
ALGORITHM Search2DTree ( v , R )
1. if v is a leaf then if p(v) ∈ R then report p(v)
2. else if region(lc(v)) ⊆ R
3.         then ReportSubtree (lc(v))
4.         else if region(lc(v)) ∩ R ≠ ∅
5.                 then Search2DTree ( lc(v) , R )
6.      if region(rc(v)) ⊆ R
7.         then ReportSubtree (rc(v))
8.         else if region(rc(v)) ∩ R ≠ ∅
9.                 then Search2DTree ( rc(v) , R )
end
region(v) can either be passed as input parameter, or explicitly stored at node v, ∀ v ∈ T.
ReportSubtree(v) is a simple linear-time in-order traversal that reports every leaf descendant of node v.
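The search can be sketched in Python with region(v) passed as a parameter, one of the two options mentioned above. Several things here are assumptions for illustration: the names `Node`, `build`, `child_regions`, and `search`; the tuple layout `(x1, x2, y1, y2)` for rectangles; and a `build` helper that re-sorts at every level for brevity rather than splitting pre-sorted lists. Child regions are treated as closed on both sides, which is a conservative simplification: it can cause an extra recursive call but never a wrong answer, because leaves are always tested against R directly.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]          # (x1, x2, y1, y2)
PLANE: Rect = (-math.inf, math.inf, -math.inf, math.inf)


class Node:
    def __init__(self, point: Optional[Point] = None, axis: int = 0,
                 value: float = 0.0, left=None, right=None):
        self.point, self.axis, self.value = point, axis, value
        self.left, self.right = left, right


def build(points: List[Point], depth: int = 0) -> Node:
    # Simplified builder: sorts at each level instead of using pre-sorted lists.
    if len(points) == 1:
        return Node(point=points[0])
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = (len(pts) + 1) // 2
    return Node(axis=axis, value=pts[mid - 1][axis],
                left=build(pts[:mid], depth + 1), right=build(pts[mid:], depth + 1))


def child_regions(v: Node, region: Rect) -> Tuple[Rect, Rect]:
    x1, x2, y1, y2 = region
    if v.axis == 0:                                # x-split
        return (x1, v.value, y1, y2), (v.value, x2, y1, y2)
    return (x1, x2, y1, v.value), (x1, x2, v.value, y2)   # y-split


def contains(R: Rect, S: Rect) -> bool:            # S is a subset of R
    return R[0] <= S[0] and S[1] <= R[1] and R[2] <= S[2] and S[3] <= R[3]


def intersects(R: Rect, S: Rect) -> bool:
    return S[0] <= R[1] and R[0] <= S[1] and S[2] <= R[3] and R[2] <= S[3]


def report_subtree(v: Node, out: List[Point]) -> None:
    if v.point is not None:
        out.append(v.point)
    else:
        report_subtree(v.left, out)
        report_subtree(v.right, out)


def search(v: Node, R: Rect, out: List[Point], region: Rect = PLANE) -> None:
    if v.point is not None:                        # line 1
        x, y = v.point
        if R[0] <= x <= R[1] and R[2] <= y <= R[3]:
            out.append(v.point)
        return
    for child, reg in zip((v.left, v.right), child_regions(v, region)):
        if contains(R, reg):                       # lines 2-3 and 6-7
            report_subtree(child, out)
        elif intersects(R, reg):                   # lines 4-5 and 8-9
            search(child, R, out, reg)
```

Calling `search(build(points), (4, 8, 1, 5), out)` fills `out` with the points whose x lies in [4 : 8] and whose y lies in [1 : 5].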
Running Time of Search2DTree
K = # of points reported.
Lines 3 & 7 take O(K) time over all recursive calls.
Total # nodes visited (reported or not) is proportional to # times conditions of lines 4 & 8 are true.
region(v) ⊄ R & region(v) ∩ R ≠ ∅ ⟹ a bounding edge e of R intersects region(v).
R has 4 bounding edges. Let e (assume vertical) be one of them.
Define H(n) (resp. V(n)) = worst-case number of nodes v whose region intersects e for a 2D-tree of n leaves, assuming root corresponds to an x-split (resp. y-split).
[Figure: with an x-split at the root, e lies on one side of the vertical line L and meets only one child region (case H(n)); with a y-split at the root, e can cross the horizontal line L and meet both child regions (case V(n)).]
$H(n) = V(n/2) + 1$, $V(n) = 2H(n/2) + 1$, $H(1) = V(1) = 1$
$\Longrightarrow\; H(n) = 2H(n/4) + 2$, $V(n) = 2V(n/4) + 3$
$\Longrightarrow\; H(n) = 3\sqrt{n} - 2$, $V(n) = 4\sqrt{n} - 3$ (for n a power of 4)
Running Time: $O(K + \sqrt{n})$.
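As a quick numeric sanity check of the recurrences for H(n) and V(n) with H(1) = V(1) = 1, the sketch below evaluates the mutual recurrence directly and compares it with the closed forms 3√n − 2 and 4√n − 3 when n is a power of 4. The function names are illustrative, and the check is not a proof.

```python
from math import isqrt
from typing import List, Tuple


def H(n: int) -> int:
    """Nodes met by a vertical edge when the root is an x-split."""
    return 1 if n == 1 else V(n // 2) + 1


def V(n: int) -> int:
    """Nodes met by a vertical edge when the root is a y-split."""
    return 1 if n == 1 else 2 * H(n // 2) + 1


def check(max_k: int) -> List[Tuple[int, int, int]]:
    """Return (n, H(n), V(n)) for n = 4**k, asserting the closed forms along the way."""
    rows = []
    for k in range(max_k + 1):
        n = 4 ** k
        root = isqrt(n)                  # exact, since n is a perfect square
        assert H(n) == 3 * root - 2
        assert V(n) == 4 * root - 3
        rows.append((n, H(n), V(n)))
    return rows


print(check(3))  # [(1, 1, 1), (4, 4, 5), (16, 10, 13), (64, 22, 29)]
```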
dD-Tree Complexities
2D-Tree
Query Time: O(K + √n) worst-case, O(K + log n) average
Construction Time: O(n log n)
Storage Space: O(n)
dD-Tree (d dimensions)
Use round-robin splitting at successive levels on the d dimensions x1 , x2 , … , xd .
Query Time: O(dK + d n^(1–1/d))
Construction Time: O(d n log n)
Space: O(dn)
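Round-robin splitting in d dimensions can be sketched as below. This is a simplified illustration with assumed names (`build_kd`, `range_search`) and a dictionary-based node layout chosen here for brevity. It assumes distinct coordinate values on every axis, sorts at each level rather than pre-sorting, and prunes only by comparing the split value with the query range on the split axis (it omits the whole-subtree reporting shortcut).

```python
from typing import Dict, List, Sequence, Tuple

PointD = Tuple[float, ...]


def build_kd(points: List[PointD], depth: int = 0) -> Dict:
    """Build a dD-tree; the split axis cycles through 0, 1, ..., d-1."""
    if len(points) == 1:
        return {"point": points[0]}
    d = len(points[0])
    axis = depth % d                                   # round-robin choice of axis
    pts = sorted(points, key=lambda p: p[axis])
    mid = (len(pts) + 1) // 2
    return {"axis": axis, "value": pts[mid - 1][axis],
            "left": build_kd(pts[:mid], depth + 1),
            "right": build_kd(pts[mid:], depth + 1)}


def range_search(v: Dict, lo: Sequence[float], hi: Sequence[float],
                 out: List[PointD]) -> None:
    """Report points p with lo[i] <= p[i] <= hi[i] on every axis i."""
    if "point" in v:
        if all(l <= c <= h for l, c, h in zip(lo, v["point"], hi)):
            out.append(v["point"])
        return
    a = v["axis"]
    if lo[a] <= v["value"]:            # left side holds coordinates <= split value
        range_search(v["left"], lo, hi, out)
    if v["value"] < hi[a]:             # right side holds coordinates > split value
        range_search(v["right"], lo, hi, out)
```

With d = 2 this reduces to the alternating vertical and horizontal splits of the 2D-tree.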
How can we improve the query time?
Range Trees
2D Range Tree
Query Time: O(K + log^2 n), improved to O(K + log n) by Fractional Cascading
Construction Time: O(n log n)
Space: O(n log n)
Range R = [x : x’] × [y : y’]
1D Range Tree on x-coordinates:
[Figure: the two search paths of length O(log n) to x and x’ in the tree; the subtrees hanging between the two paths are the O(log n) canonical sub-trees.]
Each x-range [x : x’] can be expressed as the disjoint union of O(log n) canonical x-ranges.
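The decomposition into canonical ranges can be illustrated on an implicit balanced tree over a sorted array, where every subtree corresponds to a contiguous index interval. The function name `canonical_ranges` and the half-open `(start, end)` index convention are assumptions made for this sketch.

```python
from typing import List, Sequence, Tuple


def canonical_ranges(xs: Sequence[float], lo: float, hi: float) -> List[Tuple[int, int]]:
    """Return half-open index intervals (s, e), each one a subtree of the implicit
    balanced BST over sorted xs, whose disjoint union is {i : lo <= xs[i] <= hi}."""
    out: List[Tuple[int, int]] = []

    def visit(s: int, e: int) -> None:           # subtree covering xs[s:e]
        if xs[e - 1] < lo or xs[s] > hi:         # entirely outside the query
            return
        if lo <= xs[s] and xs[e - 1] <= hi:      # entirely inside: canonical subtree
            out.append((s, e))
            return
        mid = (s + e + 1) // 2                   # left child gets the extra leaf
        visit(s, mid)
        visit(mid, e)

    if xs:
        visit(0, len(xs))
    return out


xs = [2, 5, 6, 9, 13, 17, 23, 31, 37, 41, 49, 62, 73, 85, 91]
print(canonical_ranges(xs, 7, 49))  # [(3, 4), (4, 8), (8, 10), (10, 11)]
```

In this run the eight values 9 through 49 are covered by four canonical subtrees rather than eight individual leaves.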
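2-level data structure:
Primary Level: BST on x-coordinates, rooted at root[T ].
Secondary level: each primary node v stores Tassoc(v), a BST on y-coord. of P(v).
P(v) = the points stored at the leaves of the subtree rooted at v; their x-coordinates span [min(v) : max(v)].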
Range Tree Construction
ALGORITHM Build 2D Range Tree (P)
Input: P = { p1, p2 , …, pn } ⊆ ℝ², P = (Px , Py)
       represented by pre-sorted list on x (named Px) and on y (named Py).
Output: pointer to the root of 2D range tree for P.
  Construct Tassoc , bottom up, based on Py ,
      but store in each leaf the points, not just their y-coordinates.
  v ← a newly created node
  if |P| > 1
  then do
      Pleft ← { p ∈ P | px ≤ xmed of P }     (* both lists Px and Py should split *)
      Pright ← { p ∈ P | px > xmed of P }
      lc(v) ← Build 2D Range Tree (Pleft )
      rc(v) ← Build 2D Range Tree (Pright )
  od
  min(v) ← min (Px ); max(v) ← max(Px )
  Tassoc(v) ← Tassoc
  return v
end
T(n) = 2 T(n/2) + O(n) = O(n log n) time.
This includes time for pre-sorting.
2D Range Query
ALGORITHM 2DRangeQuery ( v, [x : x’] × [y : y’] )
1. if x ≤ min(v) & max(v) ≤ x’
2.    then 1DRangeQuery (Tassoc(v) , [y : y’] )
3.    else if v is not a leaf do
4.        if x ≤ max(lc(v))
5.           then 2DRangeQuery ( lc(v), [x : x’] × [y : y’] )
6.        if min(rc(v)) ≤ x’
7.           then 2DRangeQuery ( rc(v), [x : x’] × [y : y’] )
8.    od
end
[Figure: primary tree T with the search paths to x and x’ (blue shoulder paths) and the canonical sub-trees hanging between them (red).]
• Line 2 called at roots of red canonical sub-trees, a total of O(log n) times.
  Each call takes O(Kv + log | Tassoc(v) | ) = O(Kv + log n) time.
• Lines 5 & 7 called at blue shoulder paths. Total cost O(log n).
• Total Query Time = O(log n + Σv (Kv + log n)) = O(Σv Kv + log^2 n) = O(K + log^2 n).
Query Time: O(K + log^2 n), will be improved to O(K + log n) by Fractional Cascading
Construction Time: O(n log n)
Space: O(n log n)
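A compact Python sketch of the 2D range tree, covering both construction and query, is given below. It deliberately swaps one detail: the associated structure Tassoc(v) is represented by a list of P(v) sorted on y and searched with `bisect`, instead of a balanced BST. The names `RangeNode`, `build_range_tree`, and `query_2d` are illustrative, and distinct x-coordinates are assumed.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional, Tuple

Point = Tuple[float, float]


class RangeNode:
    def __init__(self, by_x: List[Point], by_y: List[Point]):
        self.min_x, self.max_x = by_x[0][0], by_x[-1][0]   # min(v), max(v)
        self.assoc = by_y                                   # stand-in for Tassoc(v)
        self.assoc_y = [p[1] for p in by_y]                 # y keys for binary search
        self.left: Optional["RangeNode"] = None
        self.right: Optional["RangeNode"] = None
        if len(by_x) > 1:
            mid = (len(by_x) + 1) // 2
            x_med = by_x[mid - 1][0]
            # Split both pre-sorted lists around x_med; filtering preserves y order.
            self.left = RangeNode(by_x[:mid], [p for p in by_y if p[0] <= x_med])
            self.right = RangeNode(by_x[mid:], [p for p in by_y if p[0] > x_med])


def build_range_tree(points: List[Point]) -> RangeNode:
    return RangeNode(sorted(points, key=lambda p: p[0]),
                     sorted(points, key=lambda p: p[1]))


def query_2d(v: RangeNode, x1: float, x2: float, y1: float, y2: float,
             out: List[Point]) -> None:
    if x1 <= v.min_x and v.max_x <= x2:                     # line 1: canonical subtree
        i = bisect_left(v.assoc_y, y1)                      # line 2: 1D query on y
        j = bisect_right(v.assoc_y, y2)
        out.extend(v.assoc[i:j])
    elif v.left is not None:                                # line 3
        if x1 <= v.left.max_x:                              # lines 4-5
            query_2d(v.left, x1, x2, y1, y2, out)
        if v.right.min_x <= x2:                             # lines 6-7
            query_2d(v.right, x1, x2, y1, y2, out)
```

Each point appears in the associated list of every ancestor of its leaf, which is where the O(n log n) space bound comes from.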
Higher Dimensional Range Trees
P = { p1, p2 , …, pn } ⊆ ℝ^d,
pi = (xi1 , xi2 , … , xid ) , i=1..n.
d-level data structure:
Primary Level: BST on the 1st coordinate, rooted at root[T ].
Each primary node v stores Tassoc(v), a (d-1)-dimensional Range Tree on coord’s 2..d of the points P(v).
Query Time: $Q_d(n) = O(K + \log^d n)$, improved to $O(K + \log^{d-1} n)$ by Frac. Casc.
Construction Time: $T_d(n) = O(n \log^{d-1} n)$
Space: $S_d(n) = O(n \log^{d-1} n)$
$T_d(n) = 2\,T_d(n/2) + T_{d-1}(n) + O(n)$, $T_2(n) = O(n \log n)$ $\;\Longrightarrow\;$ $T_d(n) = O(n \log^{d-1} n)$
$S_d(n) = 2\,S_d(n/2) + S_{d-1}(n) + O(1)$, $S_2(n) = O(n \log n)$ $\;\Longrightarrow\;$ $S_d(n) = O(n \log^{d-1} n)$
$Q_d(n) = O(K) + \hat{Q}_d(n)$
$\hat{Q}_d(n) = O(\log n) + O(\log n)\cdot\hat{Q}_{d-1}(n)$, $\hat{Q}_2(n) = O(\log^2 n)$ $\;\Longrightarrow\;$ $\hat{Q}_d(n) = O(\log^d n)$
$\Longrightarrow\; Q_d(n) = O(K + \log^d n)$
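The step from each recurrence to its closed form can be filled in by induction on d. The following is a sketch of that argument for constant d; the induction hypotheses are stated as assumptions in the comments.

```latex
% Induction hypothesis (assumed): T_{d-1}(n) = O(n \log^{d-2} n); base case T_2(n) = O(n \log n).
\begin{align*}
T_d(n) &= 2\,T_d(n/2) + T_{d-1}(n) + O(n) \\
       &= 2\,T_d(n/2) + O(n \log^{d-2} n) \\
       &= O(n \log^{d-1} n)
          && \text{(each of the $O(\log n)$ recursion levels costs $O(n \log^{d-2} n)$)}
\end{align*}
% The same unrolling applies to S_d(n), whose additive term O(1) is dominated by S_{d-1}(n).

% Induction hypothesis (assumed): \hat{Q}_{d-1}(n) = O(\log^{d-1} n); base case \hat{Q}_2(n) = O(\log^2 n).
\begin{align*}
\hat{Q}_d(n) &= O(\log n) + O(\log n)\cdot\hat{Q}_{d-1}(n) \\
             &= O(\log n) + O(\log n)\cdot O(\log^{d-1} n) \\
             &= O(\log^{d} n)
\end{align*}
```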
General Sets of Points
What if 2 points have the same coordinate value at some coordinate axis?
Composite Number Space: (lexicographic order)
Write the pair (a, b) as the composite number (a | b).
(a | b) < (a’ | b’) ⟺ a < a’ or (a = a’ & b < b’)
p = (px , py ) → p’ = ( (px | py ) , (py | px ) )
R = [x : x’] × [y : y’] → R’ = [ (x | -∞) : (x’ | +∞) ] × [ (y | -∞) : (y’ | +∞) ]
p ∈ R ⟺ p’ ∈ R’ :
x ≤ px ≤ x’ & y ≤ py ≤ y’ ⟺ (x | -∞) ≤ (px | py ) ≤ (x’ | +∞) & (y | -∞) ≤ (py | px ) ≤ (y’ | +∞)
Note: no two points in the composite space have the same
value at any coordinate (unless they are identical points).
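Python tuples compare lexicographically, so they can model composite numbers directly. The sketch below uses illustrative names (`to_composite`, `composite_range`, `in_range`, `in_composite_range`) and shows that membership in R and in R’ agree even when points share coordinate values.

```python
from math import inf
from typing import Tuple

Point = Tuple[float, float]
Composite = Tuple[float, float]            # (a | b) modelled as the tuple (a, b)


def to_composite(p: Point) -> Tuple[Composite, Composite]:
    """p = (px, py)  ->  p' = ((px | py), (py | px))."""
    px, py = p
    return (px, py), (py, px)


def composite_range(x1: float, x2: float, y1: float, y2: float):
    """R = [x1:x2] x [y1:y2]  ->  R' with -inf / +inf as second components."""
    return ((x1, -inf), (x2, inf)), ((y1, -inf), (y2, inf))


def in_range(p: Point, x1: float, x2: float, y1: float, y2: float) -> bool:
    return x1 <= p[0] <= x2 and y1 <= p[1] <= y2


def in_composite_range(pc, rc) -> bool:
    (cx, cy), ((lx, hx), (ly, hy)) = pc, rc
    return lx <= cx <= hx and ly <= cy <= hy   # tuple comparison is lexicographic


points = [(3, 1), (3, 5), (4, 5), (4, 1)]      # repeated x and y values
r = (3, 3, 1, 5)
rc = composite_range(*r)
for p in points:
    assert in_range(p, *r) == in_composite_range(to_composite(p), rc)
```

In the composite space the four sample points have four distinct first coordinates and four distinct second coordinates, which is the property stated in the note above.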
Fractional Cascading
IDEA: Save repeated cost of binary search in many sorted lists for the same
range [y : y’] if the list contents for one are a subset of the other.
A2 ⊆ A1
Binary search for y in A1 to get to A1[i].
Follow pointer to A2 to get to A2[j].
Now walk to the right in each list.
[Figure: sorted lists A1 and A2; each entry of A1 has a pointer to the smallest entry of A2 that is greater than or equal to it, or nil if there is none.]
[Figure: sorted list A1 with blue and red pointers from its entries into two sorted sub-lists A2 and A3; a pointer is nil where no entry of the sub-list is large enough.]
A2 ⊆ A1 , A3 ⊆ A1 .
No binary search in A2 and A3 is needed.
Do binary search in A1.
Follow blue and red pointers from there to A2 and A3.
Now we have the starting point in each sorted list. Walk to the right & report.
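The pointer idea can be sketched for two lists as follows. The list contents are illustrative values chosen here, not a transcription of the figure, and the names `build_pointers`, `walk`, and `cascaded_query` are assumptions. The index `len(a2)` plays the role of the nil pointer.

```python
from bisect import bisect_left
from typing import List, Tuple


def build_pointers(a1: List[int], a2: List[int]) -> List[int]:
    """ptr[i] = index of the smallest element of a2 that is >= a1[i].
    One extra slot at the end serves queries that fall past the end of a1."""
    ptr, j = [], 0
    for value in a1:
        while j < len(a2) and a2[j] < value:
            j += 1
        ptr.append(j)
    ptr.append(len(a2))
    return ptr


def walk(a: List[int], start: int, hi: int) -> List[int]:
    """Walk to the right from `start`, reporting entries that are <= hi."""
    out = []
    while start < len(a) and a[start] <= hi:
        out.append(a[start])
        start += 1
    return out


def cascaded_query(a1: List[int], a2: List[int], ptr: List[int],
                   lo: int, hi: int) -> Tuple[List[int], List[int]]:
    i = bisect_left(a1, lo)        # the only binary search
    j = ptr[i]                     # follow the pointer into a2, no second search
    return walk(a1, i, hi), walk(a2, j, hi)


a1 = [1, 3, 5, 7, 9, 13, 15, 19, 23, 26, 31, 36]
a2 = [5, 13, 19, 26, 36]           # a2 is a subset of a1
ptr = build_pointers(a1, a2)
print(cascaded_query(a1, a2, ptr, 8, 27))
# ([9, 13, 15, 19, 23, 26], [13, 19, 26])
```

The pointer lookup is valid because a2 ⊆ a1: the first entry of a2 that is at least `lo` is also the first entry of a2 that is at least a1[i].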
Layered 2D Range Tree
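[Figure: primary tree T with node v and its children lc(v), rc(v); Tassoc(v) stores P(v), and its entries point into Tassoc(lc(v)) and Tassoc(rc(v)), which store P(lc(v)) and P(rc(v)).]
P(lc(v)) ⊆ P(v)
P(rc(v)) ⊆ P(v)
Associated structures at the secondary level are linked by Fractional Cascading.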
Layered 2D Range Tree (by Fractional Cascading)
Query Time:
Q2(n) = O(log n + Σv (Kv + log n)) = O(Σv Kv + log^2 n) = O(K + log^2 n)
improves to:
Q2(n) = O(log n + Σv (Kv + 1)) = O(Σv Kv + log n) = O(K + log n).
For d-dimensional range tree query time improves to:
$Q_d(n) = O(K) + \hat{Q}_d(n)$
$\hat{Q}_d(n) = O(\log n) + O(\log n)\cdot\hat{Q}_{d-1}(n)$, $\hat{Q}_2(n) = O(\log n)$ $\;\Longrightarrow\;$ $\hat{Q}_d(n) = O(\log^{d-1} n)$
$\Longrightarrow\; Q_d(n) = O(K + \log^{d-1} n)$
Exercises
1. Show the following implication on the worst-case query time on 2D-Tree:
   $H(n) = 2H(n/4) + 2$ and $V(n) = 2V(n/4) + 3$ $\;\Longrightarrow\;$ $\hat{Q}(n) = O(\sqrt{n})$.
2. Describe algorithms to insert and delete points from a 2D-Tree. You don’t need to
   take care of rebalancing the structure.
3. dD-Trees can also be used for partial match queries. A 2D partial match query
   specifies one of the coordinates and asks for all points that have the specified
   coordinate value. In higher dimensions we specify values for a subset of the
   coordinates. Here we allow multiple points to have equal values for coordinates.
   (a) Show that 2D-Trees can answer partial match queries in O(K + √n) time, where
       K is the number of reported answers.
   (b) Describe a data structure that uses O(n) storage and answers 2D partial match
       queries in O(K + log n) time.
   (c) Show that a dD-Tree can solve a partial match query in O(K + n^(1-s/d)) time,
       where s is the number of specified coordinates.
   (d) Show that, when we allow for O(d 2^d n) storage, dD partial match queries can
       be answered in O(K + d log n) time.
4. Describe algorithms to insert and delete points from a Range Tree. You don’t need
   to take care of rebalancing the structure.
5. One can use 2D-Trees and Range Trees to search for a particular point (a,b) by
   performing a range query with the range [a:a] × [b:b].
   (a) Prove this takes O(log n) time on 2D-Trees.
   (b) Derive the time bound on Range Trees.
6. In many applications one wants to do range searching among objects other than
   points.
   (a) Let P be a set of n axis-parallel rectangles in the plane. We want to be able
       to report all rectangles in P that are completely contained in a query rectangle
       [x : x’] × [y : y’]. Describe a data structure for this problem that uses O(n log^3 n)
       storage and has O(K + log^4 n) query time, where K is the number of reported
       answers. [Hint: Transform the problem to an orthogonal range searching
       problem in some higher dimensional space.]
   (b) Let P now consist of a set of n simple polygons in the plane. Describe a data
       structure that uses O(n log^3 n) space (excluding space needed to externally
       store the polygons) and has O(K + log^4 n) query time, where K is the number
       of polygons completely contained in the query rectangle that are reported.
   (c) Improve the query time to O(K + log^3 n).
7. For this problem, assume for simplicity that n is a power of 2. Consider a 3D-Tree
   for a set of n points in 3D. Consider a line that is parallel to the x-axis. What is the
   maximum number of leaf cells that can be intersected by such a line?
8. We showed that a 2D-tree could answer orthogonal range queries for a set of n
   points in the plane in O(n^(1/2) + K) time, where K was the number of points reported.
   Generalize this to show that in dimension 3, the query time is O(n^(2/3) + K).
END