Succinct Orthogonal Range
Search Structures on a Grid with
Applications to Text Indexing
Prosenjit Bose, Carleton University
Meng He, University of Waterloo
Anil Maheshwari and Pat Morin,
Carleton University
2D Orthogonal Range Search
A fundamental geometric query problem
Data sets: A set, N, of n points in the plane
Query: Given an orthogonal query
rectangle R, return information about the
points in N∩R
Orthogonal range counting queries
Orthogonal range reporting queries
k: size of the output
Example (figure not reproduced)
Range counting query: 5
Range reporting query: lists the points in N∩R
Classic Solutions
Data structure   Space (words)   Time (counting)   Time (reporting)
R-trees          O(n)            O(n)              O(n)
kd-trees         O(n)            O(n^{1/2})        O(n^{1/2} + k)
Chazelle 1988    O(n)            O(lg n)           O(lg n + k lg^ε n)
Range trees      O(n lg n)       O(lg n)           O(lg n + k)
Chazelle 1988    O(n lg^ε n)     O(lg n)           O(lg n + k)
Range Search on an n×n Grid
A special case: point coordinates are from [1..n]×[1..n] (rank space)
The general problem can be reduced to this special case using a standard approach (Alstrup et al. 2000)
Orthogonal range search structures in the rank
space and succinct data structures
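The slides cite the reduction to rank space without details. A minimal sketch of the usual idea, assuming distinct x- and distinct y-coordinates: replace every coordinate by its rank, and map each query rectangle to ranks by binary search. Binary search costs O(lg n) per query here; the reduction in the literature may rely on faster predecessor structures. The function names are our own.

```python
from bisect import bisect_left, bisect_right


def to_rank_space(points):
    """Replace each coordinate by its rank (1..n) among all points.

    Assumes distinct x-coordinates and distinct y-coordinates; ties would
    need an extra tie-breaking rule.
    """
    xs = sorted(p[0] for p in points)
    ys = sorted(p[1] for p in points)
    ranked = [(bisect_left(xs, x) + 1, bisect_left(ys, y) + 1) for x, y in points]
    return ranked, xs, ys


def rect_to_rank_space(xs, ys, x1, x2, y1, y2):
    """Map the query rectangle [x1, x2] x [y1, y2] to a rectangle of ranks.

    Ranks in [lo, hi] are exactly the coordinates inside the original range;
    lo > hi means the rectangle contains no points.
    """
    return (bisect_left(xs, x1) + 1, bisect_right(xs, x2),
            bisect_left(ys, y1) + 1, bisect_right(ys, y2))
```

Any rank-space structure can then answer the mapped query on the ranked points.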
Background: Succinct Data Structures
What are succinct data structures? (Jacobson 1989)
Representing data structures using ideally
information-theoretic minimum space
Supporting efficient navigational operations
Why succinct data structures?
Large data sets in modern applications:
textual, genomic, spatial or geometric
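A typical "navigational operation" is rank on a bit vector. A simplified sketch (not succinct, class name our own): store the number of 1s before every `block`-th position, so a query adds one stored count to a count over at most `block` bits. Jacobson's structure uses a two-level directory plus a lookup table to reach O(1) time with o(n) extra bits.

```python
class RankBitVector:
    """Bit vector with a one-level rank directory (simplified sketch)."""

    def __init__(self, bits, block=64):
        self.bits, self.block = list(bits), block
        # counts[b] = number of 1 bits in bits[0 : b * block]
        self.counts = [0]
        for start in range(0, len(self.bits), block):
            self.counts.append(self.counts[-1] + sum(self.bits[start:start + block]))

    def rank1(self, i):
        """Number of 1 bits in bits[0:i]."""
        b = i // self.block
        return self.counts[b] + sum(self.bits[b * self.block:i])

    def rank0(self, i):
        """Number of 0 bits in bits[0:i]."""
        return i - self.rank1(i)
```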
Succinct Orthogonal Range Search
Structures in rank space
Wavelet Trees (Grossi et al. 2003)
Space: n lg n + o(n lg n) bits
Query time for orthogonal range search (Makinen and Navarro 2006):
Restriction: no points have the same x or y coordinates
Counting: O(lg n)
Reporting: O(k lg n)
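A minimal sketch of how a wavelet tree answers a counting query, assuming points in rank space stored as a sequence S with S[x] = y. Each node keeps plain prefix counts of how many of its values go to the left child; a real wavelet tree stores a bit vector with o(n)-bit rank support instead, which is where the n lg n + o(n lg n) bound comes from. Positions are 0-based and half-open; the class name is our own.

```python
class WaveletTree:
    """Pointer-based wavelet tree over ints in [lo, hi] (not succinct)."""

    def __init__(self, seq, lo, hi):
        self.lo, self.hi = lo, hi
        self.left = self.right = None
        if lo == hi or not seq:
            return  # leaf: one value, or nothing stored
        mid = (lo + hi) // 2
        # prefix[i] = how many of seq[:i] go to the left child (value <= mid)
        self.prefix = [0]
        for v in seq:
            self.prefix.append(self.prefix[-1] + (v <= mid))
        self.left = WaveletTree([v for v in seq if v <= mid], lo, mid)
        self.right = WaveletTree([v for v in seq if v > mid], mid + 1, hi)

    def count(self, i, j, a, b):
        """Number of entries in seq[i:j] with value in [a, b]."""
        if i >= j or b < self.lo or a > self.hi:
            return 0
        if a <= self.lo and self.hi <= b:
            return j - i  # node's value range fully inside [a, b]
        # Partial overlap: map [i, j) into each child and recurse.
        li, lj = self.prefix[i], self.prefix[j]
        return (self.left.count(li, lj, a, b)
                + self.right.count(i - li, j - lj, a, b))
```

For the points (1,3), (2,1), (3,4), (4,2), (5,5), i.e. S = [3, 1, 4, 2, 5], the rectangle [2..4]×[2..5] is `WaveletTree(S, 1, 5).count(1, 4, 2, 5)`, which counts (3,4) and (4,2).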
Applications
Space-efficient text indexes: Makinen and Navarro
2006, Chien et al. 2008
Supporting Counting: an Overview
Reduce orthogonal range counting to dominance counting
Design a succinct data structure supporting dominance counting on a narrow grid, i.e., an n×t grid where t = O(lg^ε n) (0 < ε < 1). We also assume that each point has a distinct x-coordinate
Recursively divide the n×n grid into narrow grids and use the above structure at each level
Remove the restriction that each point has a distinct x-coordinate
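The first step rests on inclusion-exclusion: with integer coordinates, a rectangle count is a signed sum of four dominance counts. A minimal sketch, assuming a point (x, y) dominates nothing and is counted by query (qx, qy) when x ≤ qx and y ≤ qy; the brute-force `dominance_count` is a hypothetical stand-in for the narrow-grid structure.

```python
def dominance_count(points, qx, qy):
    """Number of points (x, y) with x <= qx and y <= qy (brute force)."""
    return sum(1 for x, y in points if x <= qx and y <= qy)


def range_count(points, x1, x2, y1, y2):
    """Count points in [x1, x2] x [y1, y2] using four dominance queries."""
    def d(qx, qy):
        return dominance_count(points, qx, qy)

    return d(x2, y2) - d(x1 - 1, y2) - d(x2, y1 - 1) + d(x1 - 1, y1 - 1)
```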
Range Counting on a Narrow Grid
S = 2 3 4 4 1 3 1 1 3 2 4 2 3… (S[x] is the y-coordinate of the point in column x)
Divide the grid into blocks of size lg^2 n × t
A 2D array A: A[i,j] stores the result of dominance counting when (i·lg^2 n + 1, j) is given as the query point
Divide each block into subblocks of size lg^λ n × t (0 < λ < ε)
A 2D array B: B[i,j] stores, when (i·lg^λ n + 1, j) is given as a query point, the result of dominance counting inside the block containing this point
A table C that stores, for each possible set of lg^λ n points on a lg^λ n × t grid and each query point in that grid, the result of dominance counting
Space: n lg t + o(n) bits
Time: O(1)
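A sketch of the three-part lookup, with hypothetical block sizes `b1` and `b2` standing in for lg^2 n and lg^λ n, and a memoized function standing in for table C (the real table is precomputed over all possible subblock contents). We assume the convention count(x, y) = number of points with column < x and row ≤ y, so that A[i][j] corresponds to the query point (i·b1 + 1, j) as on the slide. Plain Python lists are used, so the n lg t + o(n) bit bound is not reflected.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def _table_c(sub, offset, y):
    # Stand-in for table C: keyed by (subblock contents, query offset, row).
    return sum(v <= y for v in sub[:offset])


class NarrowGridDominance:
    """Dominance counting on an n x t grid; S[c] is the row of column c + 1."""

    def __init__(self, S, t, b1, b2):
        assert b1 % b2 == 0, "subblocks must tile blocks exactly"
        self.S, self.b1, self.b2 = list(S), b1, b2
        n = len(self.S)
        # A[i][j]: points in the first i blocks with row <= j
        self.A = [[0] * (t + 1)]
        for i in range(1, n // b1 + 1):
            block = self.S[(i - 1) * b1:i * b1]
            prev = self.A[-1]
            self.A.append([prev[j] + sum(v <= j for v in block) for j in range(t + 1)])
        # B[k][j]: points with row <= j from the start of the enclosing block
        # up to (not including) subblock k
        self.B = []
        for k in range(n // b2 + 1):
            start = (k * b2 // b1) * b1
            part = self.S[start:k * b2]
            self.B.append([sum(v <= j for v in part) for j in range(t + 1)])

    def count(self, x, y):
        """Points with column < x and row <= y, for 1 <= x <= n + 1, 0 <= y <= t."""
        c = x - 1  # number of columns strictly left of x
        i, k = c // self.b1, c // self.b2
        sub = tuple(self.S[k * self.b2:(k + 1) * self.b2])
        return self.A[i][y] + self.B[k][y] + _table_c(sub, c - k * self.b2, y)
```

Each query reads one entry of A, one of B and one of C, which is why the time is constant regardless of n.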
Range Counting on an n×n Grid
Transform the original grid into a narrow grid by
grouping y-coordinates into ranges of size n/t
Construct orthogonal range search structures
for this narrow grid and recurse
Number of levels: log_t n
Space: n lg n + o(n lg n) bits
Time: O(log_t n)
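The slides describe the recursion only at this level of detail. A minimal sketch of one way to realise it, assuming points in rank space stored as S with S[x] = y: a t-ary wavelet tree whose node labels are the narrow-grid view of that level. Here the fully covered y-ranges are counted by scanning labels, whereas on the slides a narrow-grid structure answers that part in O(1) per level; the class name is our own.

```python
class TaryWaveletTree:
    """t-ary wavelet tree for range counting (plain lists, not succinct).

    count(i, j, a, b) = number of entries of seq[i:j] with value in [a, b].
    """

    def __init__(self, seq, lo, hi, t):
        assert t >= 2, "t = 1 would never shrink the value range"
        self.lo, self.hi = lo, hi
        self.labels, self.bounds, self.children = [], [], []
        if lo == hi or not seq:
            return  # leaf
        width = -(-(hi - lo + 1) // t)  # ceil((hi - lo + 1) / t)
        # Narrow-grid view: each value becomes the index of its y-range.
        self.labels = [(v - lo) // width for v in seq]
        for c in range(-(-(hi - lo + 1) // width)):
            clo, chi = lo + c * width, min(hi, lo + (c + 1) * width - 1)
            self.bounds.append((clo, chi))
            self.children.append(
                TaryWaveletTree([v for v in seq if clo <= v <= chi], clo, chi, t))

    def _rank(self, c, i):
        # Occurrences of label c in labels[:i] (stand-in for the narrow grid).
        return sum(1 for label in self.labels[:i] if label == c)

    def count(self, i, j, a, b):
        if i >= j or b < self.lo or a > self.hi:
            return 0
        if a <= self.lo and self.hi <= b:
            return j - i
        total = 0
        for c, ((clo, chi), child) in enumerate(zip(self.bounds, self.children)):
            if b < clo or a > chi:
                continue
            ci, cj = self._rank(c, i), self._rank(c, j)
            if a <= clo and chi <= b:
                total += cj - ci  # whole y-range covered: answered at this level
            else:
                total += child.count(ci, cj, a, b)  # at most two ranges recurse
        return total
```

The rectangle [x1..x2]×[y1..y2] is `TaryWaveletTree(S, 1, n, t).count(x1 - 1, x2, y1, y2)`; the tree has about log_t n levels.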
More results
The restriction that each point has a distinct x-coordinate can be removed using 2n + o(n) extra bits
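The slides do not say how the 2n + o(n) extra bits are used. One standard encoding that fits this budget (an assumption, not necessarily the authors' construction) ranks the points by x-coordinate, breaking ties arbitrarily, and writes, for each column, one 1 per point followed by a 0. A column range then maps to a range of distinct x-ranks with two select0 queries. select0 is computed by a scan here, and the function names are our own.

```python
def build_column_bits(xs, n):
    """Unary column encoding: for each column 1..n, one 1 per point, then a 0.

    xs: x-coordinates in [1..n], duplicates allowed; length is len(xs) + n.
    """
    counts = [0] * (n + 1)
    for x in xs:
        counts[x] += 1
    bits = []
    for col in range(1, n + 1):
        bits.extend([1] * counts[col])
        bits.append(0)
    return bits


def column_range_to_ranks(bits, x1, x2):
    """Map columns [x1, x2] to the range (r1, r2) of 1-based x-ranks."""
    zeros = [i for i, b in enumerate(bits) if b == 0]  # zeros[c - 1] = select0(c)

    def points_in_columns_up_to(c):
        # 1s before the c-th 0 = points in columns 1..c
        return 0 if c == 0 else zeros[c - 1] - (c - 1)

    return points_in_columns_up_to(x1 - 1) + 1, points_in_columns_up_to(x2)
```

With n points on the n×n grid the bit vector has exactly 2n bits; r1 > r2 signals an empty column range.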
The support for range reporting is based on
similar ideas but is more complicated
Our main result
Space: n lg n + o(n lg n) bits
Query time for orthogonal range search:
Counting: O(lg n / lg lg n)
Reporting: O(k lg n / lg lg n)
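Assuming the counting bound comes from the recursion on the earlier slides with the narrow-grid width chosen as t = lg^ε n for a constant 0 < ε < 1, the number of levels works out as:

```latex
\log_t n = \frac{\lg n}{\lg t}
         = \frac{\lg n}{\lg\left(\lg^{\varepsilon} n\right)}
         = \frac{\lg n}{\varepsilon \lg\lg n}
         = O\!\left(\frac{\lg n}{\lg\lg n}\right)
```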
Applications: Substring Search
Notation:
T: text, n: text size, σ: alphabet size
P: pattern, m: pattern length
occ: number of occurrences
Query: report the occurrences of P in T
Chien et al. 2008: O(n lg σ) bits, O(m + lg n × (log_σ n + occ lg n)) time
Our results: O(n lg σ) bits, O(m + lg n × (log_σ n + occ lg n) / lg lg n) time
Applications: Position-Restricted
Substring Search
Query:
Given a pattern P and a range [i, j],
how many times does P occur in T[i, j]?
Makinen and Navarro 2006
Space: 3n lg n + o(n lg n) bits
Time: O(m + occ lg n)
Our results:
Space: 3n lg n + o(n lg n) bits
Time: O(m + occ lg n / lg lg n)
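The slides do not show how the query is answered. One common way to view position-restricted search as 2D range search: place a point (r, SA[r]) for every suffix-array rank r; P occurs at SA[r] exactly when r lies in P's suffix-array interval, so the occurrences inside T[i..j] are the points in [sp, ep) × [i, j − m + 1]. A minimal sketch, assuming 0-based inclusive [i, j], a naive suffix array, and a plain scan in place of the range-reporting structure; the function names are our own. The count asked for on the slide is the length of the returned list.

```python
from bisect import bisect_left, bisect_right


def suffix_array(T):
    """Naive suffix array: starting positions sorted by suffix."""
    return sorted(range(len(T)), key=lambda k: T[k:])


def position_restricted_occurrences(T, SA, P, i, j):
    """Sorted start positions of occurrences of P lying inside T[i..j]."""
    m = len(P)
    prefixes = [T[k:k + m] for k in SA]  # sorted, because SA is sorted
    sp, ep = bisect_left(prefixes, P), bisect_right(prefixes, P)
    # Rows [sp, ep) x columns [i, j - m + 1]; a range-reporting structure
    # on the points (r, SA[r]) would answer this step instead of a scan.
    return sorted(SA[r] for r in range(sp, ep) if i <= SA[r] <= j - m + 1)
```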
Applications: Representing Small Integers
Data: A sequence S of n numbers in [1..s], where s = polylog(n)
Ferragina et al. 2007
Space: nH_0(S) + o(n) bits
Operations: rank/select in O(1) time
Our result:
New operation: Given a range of positions [p1..p2] and a range of values [v1..v2], retrieve the entries in S[p1..p2] whose values are in [v1..v2]
Time: O(1) for counting, O(1) per entry for reporting
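To pin down what the operations mean, a naive sketch using one sorted position list per value (class and method names are our own). It is neither compressed nor constant-time: rank takes O(lg n) and the range operations loop over all values in [v1..v2], whereas the structures on the slide achieve the stated O(1) bounds.

```python
from bisect import bisect_left, bisect_right


class SmallIntSequence:
    """S[1..n] over [1..s] with rank, select and 2D range operations."""

    def __init__(self, S, s):
        self.pos = {v: [] for v in range(1, s + 1)}
        for p, v in enumerate(S, start=1):
            self.pos[v].append(p)  # increasing 1-based positions

    def rank(self, v, p):
        """Occurrences of value v in S[1..p]."""
        return bisect_right(self.pos[v], p)

    def select(self, v, j):
        """1-based position of the j-th occurrence of v."""
        return self.pos[v][j - 1]

    def range_count(self, p1, p2, v1, v2):
        """Number of entries of S[p1..p2] with value in [v1..v2]."""
        return sum(self.rank(v, p2) - self.rank(v, p1 - 1) for v in range(v1, v2 + 1))

    def range_report(self, p1, p2, v1, v2):
        """Sorted positions of the entries counted by range_count."""
        return sorted(p for v in range(v1, v2 + 1)
                      for p in self.pos[v][bisect_left(self.pos[v], p1):
                                           bisect_right(self.pos[v], p2)])
```

For the sequence S shown on the narrow-grid slide, `range_report(3, 8, 1, 2)` returns the positions 5, 7 and 8, the three 1s in S[3..8].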
Applications: A Restricted Version of Range Search
Restriction: the query rectangle is defined by two
points in the given point set
Notation:
c: the number of bits required to encode the
coordinates of a point
Space: cn + n lg n + o(n lg n) bits
Time:
Counting: O(lg n / lg lg n)
Reporting: O(k lg n / lg lg n)
Conclusions
We designed a succinct data structure for
orthogonal range search on an n×n grid that
provides more efficient support for both counting
and reporting queries
This structure can be used to improve and
extend previous results on succinct data
structures, such as succinct text indexes and
sequence representation.
Thank you!