Efficient Computation of the Skyline Cube

Yidong Yuan
School of Computer Science & Engineering
The University of New South Wales & NICTA
Sydney, Australia
Joint Work: Xuemin Lin (UNSW), Qing Liu (UNSW),
Wei Wang (UNSW), Jeffrey Xu Yu (CUHK),
Qing Zhang (UNSW & CSIRO)
Outline




Introduction
Skycube Computation Techniques
Experiments
Summary
VLDB 2005
Yidong Yuan @DBG.UNSW
Skyline Query
(x1, x2, …, xd) dominates (y1, y2, …, yd) if ∀i, xi ≤ yi and ∃k, xk < yk

A real estate example
Properties and values:

      price (100K)   dist   age   …
P1    3              3      5     …
P2    5              1      1     …
P3    1              4      4     …
P4    4              5      2     …
P5    2              2      3     …


skyline returns data points not dominated by others
[Figure: skyline on price & dist; skyline on price & age]
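The dominance test and the two skylines named above can be reproduced with a minimal Python sketch. The names `HOUSES`, `dominates`, and `skyline` are illustrative and do not come from the talk; smaller values are assumed to be better on every dimension, and the quadratic scan is only meant to make the definition concrete.

```python
from typing import Dict, List, Sequence, Tuple

# Real estate example: (price in 100K, dist, age); smaller is assumed better.
HOUSES: Dict[str, Tuple[int, int, int]] = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}


def dominates(x: Sequence[int], y: Sequence[int], dims: Sequence[int]) -> bool:
    """x dominates y on dims: no worse on every dimension, better on at least one."""
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def skyline(points: Dict[str, Sequence[int]], dims: Sequence[int]) -> List[str]:
    """Names of the points that no other point dominates on dims (naive scan)."""
    return sorted(
        name
        for name, p in points.items()
        if not any(dominates(q, p, dims) for q in points.values())
    )


print(skyline(HOUSES, (0, 1)))  # price & dist -> ['P2', 'P3', 'P5']
print(skyline(HOUSES, (0, 2)))  # price & age  -> ['P2', 'P3', 'P4', 'P5']
```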
Skyline Cube

Skycube: a union of skyline results of all the non-empty subsets of d-dimensional set (2^d − 1 subsets)
  Skyline on price & dist & age
  Skyline on price & dist
  Skyline on price & age
  ……

Skycube Example

Dataset (A = price, B = dist, C = age):

      A   B   C
P1    3   3   5
P2    5   1   1
P3    1   4   4
P4    4   5   2
P5    2   2   3

Lattice Structure of a Skycube:

ABC: {P2, P3, P4, P5}
AB: {P2, P3, P5}    AC: {P2, P3, P4, P5}    BC: {P2}
A: {P3}    B: {P2}    C: {P2}
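As a baseline for the slides that follow, the Skycube of the example dataset can be obtained by running one independent skyline query per cuboid. This is a sketch of the naive approach that the talk argues against, not of the proposed algorithms; `DATA`, `skyline`, and `naive_skycube` are illustrative names.

```python
from itertools import combinations

# Example dataset: (A, B, C) = (price, dist, age); smaller is assumed better.
DATA = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}


def dominates(x, y, dims):
    """x dominates y on dims: no worse everywhere, strictly better somewhere."""
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def skyline(points, dims):
    """Naive skyline: keep the points that no other point dominates on dims."""
    return sorted(
        name
        for name, p in points.items()
        if not any(dominates(q, p, dims) for q in points.values())
    )


def naive_skycube(points, names="ABC"):
    """One independent skyline query per non-empty subset: 2^d - 1 queries."""
    d = len(names)
    cube = {}
    for k in range(1, d + 1):
        for dims in combinations(range(d), k):
            label = "".join(names[i] for i in dims)
            cube[label] = skyline(points, dims)
    return cube


for label, result in naive_skycube(DATA).items():
    print(label, result)
```

Nothing is shared between the 7 queries here, which is the inefficiency the Motivation slides point at.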
Motivation

How to compute Skycube efficiently?


existing skyline techniques are applicable
no shared computation → Not efficient!
Motivation (cont.)

nested-loop-based alg.

BNL [ICDE 01]
[Figure: points P1 to P5 on A and B, with the candidate list and comparisons for the skyline on A (P1(A) vs. P2(A), ……) and for the skyline on A and B (P1(A) vs. P2(A), P1(B) vs. P2(B), ……)]
redundant comparison → Not efficient!

SFS [ICDE 03]: presort the dataset → keep the candidate list minimum
repeated sorting → Not efficient!
Motivation (cont.)

divide-and-conquer-based alg. (DC [ICDE 01])
[Figure: divide step of the skyline on A and B, and divide step of the skyline on A, B, and C; both split points P1 to P5 on A at m’A, mA, m’’A]
repeat same divide/merge steps → Not efficient!
Outline


Introduction
Skycube Computation Techniques




Bottom-Up Skycube Algorithm (BUS)
Top-Down Skycube Algorithm (TDS)
Experiments
Summary
Property of Skycube

Distinct Value Condition
  no two data points have the same value on the same dimension
  SKY_U(S): skyline on sub-dimension set U
  SKY_U(S) ⊆ SKY_V(S) for U ⊆ V

General Case
  Keep track of the “bad guys”
Basic Idea


compute the Skycube in a level-wise and bottom-up manner
each skyline is computed by a nested-loop-based algorithm
[Figure: lattice of cuboids, ABC at the top; AB, AC, BC; A, B, C at the bottom]
Sharing Strategies

share-results: SKY_U(S) ⊆ SKY_V(S)
  reduce the size of the input
  reduce the # of dominance tests
  [Figure: cuboid AB with its children A and B]

share-sorting: sort the dataset on each dimension
  keep the candidate list minimum
  reduce the # of sorts from 2^d − 1 to d
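The share-results idea can be sketched as follows, assuming the Distinct Value Condition holds (it does for the example dataset). Points already found in the skyline of a child cuboid are carried into the parent without any dominance test, so only the remaining points are examined. This is a simplified illustration, not the BUS algorithm from the talk: it omits share-sorting and the filter, and the names `bus_share_results` and `tested` are made up for the example.

```python
from itertools import combinations

# Example dataset: (A, B, C); all values on a dimension are distinct.
DATA = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}


def dominates(x, y, dims):
    """x dominates y on dims: no worse everywhere, strictly better somewhere."""
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def bus_share_results(points, names="ABC"):
    """Level-wise, bottom-up Skycube that reuses child skylines.

    Assumption: distinct values per dimension, so every child skyline
    is contained in the parent skyline.
    """
    d = len(names)
    cube = {}    # tuple of dimension indexes -> set of point names
    tested = 0   # how many points needed a dominance scan
    for k in range(1, d + 1):
        for dims in combinations(range(d), k):
            known = set()
            for child in combinations(dims, k - 1):
                if child:  # level 1 has no non-empty children
                    known |= cube[child]
            found = set()
            for name, p in points.items():
                if name in known:
                    continue  # share-results: already a skyline point
                tested += 1
                if not any(dominates(q, p, dims) for q in points.values()):
                    found.add(name)
            cube[dims] = known | found
    labelled = {
        "".join(names[i] for i in dims): sorted(result)
        for dims, result in cube.items()
    }
    return labelled, tested


cube, tested = bus_share_results(DATA)
print(cube["ABC"], tested)
```

On this dataset only one point (P1) still needs to be tested for the cuboid ABC, because the other four are inherited from AB, AC, and BC.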
Filtering

Effective Dominance Test
  filter function: f(p) = sum of p’s coordinates
  no false negative: f(p) ≤ f(q) ⇒ q does not dominate p

Skyline on A and B, sorted on B:

            P2   P5   P1   P3   P4
f_AB(p)     6    4    6    5    9

Comparison (without filter): P2(A) vs. P5(A), P2(B) vs. P5(B)
Comparison (with filter): f_AB(P2) vs. f_AB(P5)

maintain the candidate list in a non-decreasing order of filtering values (e.g. AVL tree)
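A hedged sketch of the filter on this slide: points are read in the order of one skyline dimension, candidates are kept in non-decreasing order of their coordinate sum, and a full dominance test is run only against candidates whose sum is strictly smaller. A sorted Python list with `bisect.insort` stands in for the AVL tree mentioned above; `filtered_skyline` and `full_tests` are illustrative names, and distinct values on the sort dimension are assumed.

```python
import bisect

# Example dataset: (A, B, C); smaller is assumed better.
DATA = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}


def dominates(x, y, dims):
    """x dominates y on dims: no worse everywhere, strictly better somewhere."""
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def filtered_skyline(points, dims, sort_dim):
    """Skyline on dims over data presorted on sort_dim (one of dims).

    Assumption: values on sort_dim are distinct, so a later point can
    never dominate an earlier one and candidates are never evicted.
    """
    def f(p):
        return sum(p[i] for i in dims)  # filter value of a point

    order = sorted(points, key=lambda name: points[name][sort_dim])
    candidates = []   # (filter value, name), non-decreasing in filter value
    full_tests = 0
    for name in order:
        p = points[name]
        fp = f(p)
        dominated = False
        for fq, other in candidates:
            if fq >= fp:
                break  # f(p) <= f(q): q and all later candidates cannot dominate p
            full_tests += 1
            if dominates(points[other], p, dims):
                dominated = True
                break
        if not dominated:
            bisect.insort(candidates, (fp, name))
    return sorted(name for _, name in candidates), full_tests


result, full_tests = filtered_skyline(DATA, dims=(0, 1), sort_dim=1)
print(result, full_tests)  # ['P2', 'P3', 'P5'] 3
```

For P5 the comparison with P2 is settled by the filter values alone (4 vs. 6), which mirrors the "with filter" comparison on the slide.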
DC Algorithm
[Figure: divide step, splitting points P1 to P5 on A at mA into S1 and S2; merge step, splitting further on B at mB into S11, S12, S21, S22]
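The divide and merge steps can be illustrated with a simplified recursive sketch. It splits on the first skyline dimension at the median and merges by dropping the points of the upper half that are dominated by the lower half's skyline. The merge step in the figure is more refined (it partitions again on B at mB); this version only shows the overall shape, assumes distinct values on the split dimension, and uses the made-up name `dc_skyline`.

```python
# Example dataset: (A, B, C); smaller is assumed better.
DATA = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}


def dominates(x, y, dims):
    """x dominates y on dims: no worse everywhere, strictly better somewhere."""
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def dc_skyline(names, points, dims):
    """Divide-and-conquer skyline of the named points on dims.

    Assumption: values on dims[0] are distinct, so no point of the upper
    half S2 can dominate a point of the lower half S1.
    """
    if len(names) <= 1:
        return list(names)
    # Divide step: split at the median of the first dimension.
    ordered = sorted(names, key=lambda name: points[name][dims[0]])
    mid = len(ordered) // 2
    s1 = dc_skyline(ordered[:mid], points, dims)
    s2 = dc_skyline(ordered[mid:], points, dims)
    # Merge step (simplified): keep S2 points not dominated by S1's skyline.
    kept = [
        name
        for name in s2
        if not any(dominates(points[other], points[name], dims) for other in s1)
    ]
    return s1 + kept


print(sorted(dc_skyline(list(DATA), DATA, (0, 1))))     # ['P2', 'P3', 'P5']
print(sorted(dc_skyline(list(DATA), DATA, (0, 1, 2))))  # ['P2', 'P3', 'P4', 'P5']
```

Both calls split the data on A in exactly the same way, which is the repeated work that share-partitioning is meant to avoid.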
Sharing Opportunities

share-partitioning
[Figure: the skyline on A and B and the skyline on A, B, and C are partitioned in the same way: points P1 to P5 are split on A at mA (among split points … mi …, … mj …) into S1 and S2]
Sharing Opportunities (cont.)
share-merging: decompose the merge step
[Figure: S1 and S2 split on A at mA. Skyline on A and B: {P3, P5} from S1 is merged with {P1, P2} from S2 on B. Skyline on A, B, and C: {P3, P5} is merged with {P1, P2, P4} on BC; {P1, P2, P4} is decomposed into {{P1, P2}, {P4}}, so the merge of {P3, P5} with {P1, P2} can start from the above result on B and continue on C, while {P3, P5} and {P4} are merged on BC]
TDS Algorithm

Basic Idea
  compute skylines on a path simultaneously
  find a minimal set of paths
  share-parent: using parent’s skyline result as the input
[Figure: lattice ABC; AB, AC, BC; A, B, C, with a path such as ABC → AB → A; SKY_ABC(S) is computed from S and then used as the input for the cuboids below ABC]
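The share-parent strategy can be sketched for a single path of the lattice. Under the Distinct Value Condition, SKY_U(S) ⊆ SKY_V(S) for U ⊆ V, so the skyline of a child cuboid can be computed from the parent's skyline instead of the full dataset. The sketch below walks the path ABC → AB → A one cuboid at a time; the talk computes the skylines on a path simultaneously, which this illustration does not attempt. `top_down_path` is a made-up name.

```python
# Example dataset: (A, B, C); all values on a dimension are distinct.
DATA = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}


def dominates(x, y, dims):
    """x dominates y on dims: no worse everywhere, strictly better somewhere."""
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def skyline(points, dims):
    """Naive skyline of the given points on dims."""
    return sorted(
        name
        for name, p in points.items()
        if not any(dominates(q, p, dims) for q in points.values())
    )


def top_down_path(points, path):
    """Compute the skylines along a path, from the largest cuboid down.

    Assumption: distinct values per dimension, so each child skyline is
    contained in its parent's skyline and the parent result is a valid input.
    """
    result = {}
    current = dict(points)
    for dims in path:
        names = skyline(current, dims)
        result[dims] = names
        # share-parent: the next (smaller) cuboid only sees these points.
        current = {name: current[name] for name in names}
    return result


path = [(0, 1, 2), (0, 1), (0,)]  # ABC -> AB -> A
for dims, names in top_down_path(DATA, path).items():
    print(dims, names)
```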
Outline




Introduction
Skycube Computation Techniques
Experiments
Summary
Experiment Setting
Algorithms
  BNLS: BNL-Skycube algorithm *
  SFSS: SFS-Skycube algorithm *
  DCS: DC-Skycube algorithm *
  BUS: Bottom-Up Skycube algorithm
  TDS: Top-Down Skycube algorithm
  (* our sharing strategies applied)
Dataset: correlated, independent, anti-correlated
Dimensionality: d ∈ [4, 10]
Cardinality: n ∈ [100k, 500k]
Effect of Dimensionality
[Figure: independent data; x-axis: Dimensionality (n = 500k)]
Effect of Dimensionality (cont.)
[Figures: correlated data and anti-correlated data; x-axis: Dimensionality (n = 500k)]
Effect of Cardinality
[Figure: anti-correlated data; x-axis: Cardinality (×100K), d = 8]
Effect of Duplicate Values
[Figure: independent data (d = 8)]
Outline




Introduction
Skycube Computation Techniques
Experiments
Summary
Summary


A novel concept: Skycube
Skycube computation techniques
  Bottom-Up Skycube algorithm: share-results, share-sorting
  Top-Down Skycube algorithm: share-partition-and-merging, share-parent
Future Work
  I/O based techniques
  multiple skyline queries
Q&A
Thank you.
Preliminaries

Existing Skyline Computation Algorithms
  nested-loop-based
    Block-Nested-Loop (BNL) algorithm [BKS, ICDE 01]
    Sort-Filter-Skyline (SFS) algorithm [CGG+, ICDE 03]
  divide-and-conquer-based
    Divide-and-Conquer (DC) algorithm [BKS, ICDE 01]
  index-based
    Bitmap, Index-Method [TEO, VLDB 01]
    R-tree Index Based [KRR, VLDB 02; PTF+, SIGMOD 03]
Preliminaries: BNL and SFS Algorithms

BNL algorithm
[Figure: points P1 to P5 on A and B]

Current   Cand. List     Results
P1        ∅              P1
P2        P1             P1, P2
P3        P1, P2         P1, P2, P3
P4        P1, P2, P3     P1, P2, P3
P5        P1, P2, P3     P2, P3, P5

SFS algorithm
  entropy value (indicator of the dominance power)
  pre-sort the dataset (e.g., {P5, P2, P3, P1, P4})
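The BNL trace above can be replayed with a small in-memory sketch, assuming the points are the (A, B) values of the example dataset. Real BNL works block by block with a bounded window and temporary files; that part is left out here, and `bnl` and `evictions` are illustrative names. Passing the presorted order from the SFS example shows the effect of sorting: no candidate ever has to be removed.

```python
# (A, B) values of the example dataset; smaller is assumed better.
DATA = {
    "P1": (3, 3),
    "P2": (5, 1),
    "P3": (1, 4),
    "P4": (4, 5),
    "P5": (2, 2),
}


def dominates(x, y, dims):
    """x dominates y on dims: no worse everywhere, strictly better somewhere."""
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def bnl(points, dims, order):
    """Nested-loop skyline with an unbounded in-memory candidate list."""
    window = []      # current candidate list
    evictions = 0    # candidates removed by a later point
    for name in order:
        p = points[name]
        if any(dominates(points[c], p, dims) for c in window):
            continue  # the current point is dominated: drop it
        survivors = [c for c in window if not dominates(p, points[c], dims)]
        evictions += len(window) - len(survivors)
        window = survivors + [name]
    return window, evictions


# Input order of the BNL trace: P5 arrives last and evicts P1.
print(bnl(DATA, (0, 1), ["P1", "P2", "P3", "P4", "P5"]))  # (['P2', 'P3', 'P5'], 1)

# Presorted order from the SFS example: the candidate list only grows.
print(bnl(DATA, (0, 1), ["P5", "P2", "P3", "P1", "P4"]))  # (['P5', 'P2', 'P3'], 0)
```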
Preliminaries: DC Algorithm
[Figure: divide step, splitting points P1 to P5 on A at mA into S1 and S2; merge step, splitting further on B at mB into S11, S12, S21, S22]
General Case

Issue: SKY_U(S) ⊆ SKY_V(S) does not necessarily hold
  [Figure: points P1 to P5 on A and B]
  SKY_B(S) = {P3, P4, P5}
  SKY_AB(S) = {P3}

Solution
  share-results: re-examine SKY_U(S) on V
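The issue can be reproduced with a small sketch. The coordinates below are hypothetical: the slide gives only the two skylines, so the values were chosen so that P3, P4, and P5 tie on B. The re-examination step rests on one observation: a point of SKY_U(S) can be dominated on V ⊇ U only by a point that equals it on U, and such a point is itself in SKY_U(S), so it is enough to compare the members of SKY_U(S) with each other on V. Whether the talk's algorithm performs the re-examination exactly this way is not stated on the slide.

```python
# Hypothetical (A, B) coordinates: P3, P4 and P5 share the smallest B value.
TIES = {
    "P1": (2, 3),
    "P2": (2, 2),
    "P3": (1, 1),
    "P4": (4, 1),
    "P5": (3, 1),
}


def dominates(x, y, dims):
    """x dominates y on dims: no worse everywhere, strictly better somewhere."""
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def skyline(points, dims):
    """Naive skyline of the given points on dims."""
    return sorted(
        name
        for name, p in points.items()
        if not any(dominates(q, p, dims) for q in points.values())
    )


sky_b = skyline(TIES, (1,))      # skyline on B
sky_ab = skyline(TIES, (0, 1))   # skyline on A and B
print(sky_b, sky_ab)             # ['P3', 'P4', 'P5'] ['P3']

# With duplicate values the child skyline is not contained in the parent's.
print(set(sky_b) <= set(sky_ab))  # False

# share-results in the general case: re-examine SKY_B(S) on AB, comparing
# its members only with each other.
reexamined = skyline({name: TIES[name] for name in sky_b}, (0, 1))
print(reexamined)                 # ['P3']
```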
Motivation (cont.)

other techniques
  Index method [VLDB 01]
  R-tree based index [VLDB 02; SIGMOD 03]
  repeat pre-computation (e.g. index); pre-computation is not reusable → Not efficient!

Goal
  Maximizing sharing computation!