Flexible Data Cube for Range-Sum Queries in Dynamic OLAP Data Cubes
Authors: C.-I Lee and Y.-C. Li
Speaker: Y.-C. Li
Date: Dec. 19, 2002

Outline
Introduction
Related works
Analysis of the average query and update costs
Flexible data cube
Performance analysis
Conclusions

Introduction
Data cubes are frequently adopted to implement OLAP and provide aggregate information.
Data cube: also known as a multidimensional database (MDDB).
Measure attributes: attributes chosen as the metrics of interest.
Functional attributes (dimensions): the other attributes of the records.
Cells: store the measure attribute values.
Range-sum query: adds up all cells in the query region.

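As a baseline, a range-sum query on the original array (the method labelled A in the later slides) reads every cell in the query region, while an update writes a single cell. A minimal Python sketch, assuming a dense 2-D cube stored as a list of rows; the cube values and the names `range_sum_naive` and `update_naive` are hypothetical:

```python
# Hypothetical dense 2-D data cube (rows x columns), stored as a list of rows.
CUBE = [
    [3, 5, 1, 2],
    [2, 4, 6, 3],
    [7, 3, 2, 1],
    [2, 6, 4, 5],
]


def range_sum_naive(cube, lo, hi):
    """Sum cube[r][c] for lo[0] <= r <= hi[0] and lo[1] <= c <= hi[1].

    Returns (total, cells_accessed): every cell in the region is read once.
    """
    total = 0
    accessed = 0
    for r in range(lo[0], hi[0] + 1):
        for c in range(lo[1], hi[1] + 1):
            total += cube[r][c]
            accessed += 1
    return total, accessed


def update_naive(cube, cell, delta):
    """Add delta to one cell; on the original array only that cell is written."""
    r, c = cell
    cube[r][c] += delta
    return 1  # cells written


print(range_sum_naive(CUBE, (1, 1), (2, 3)))  # (19, 6)
```

The query cost grows with the size of the region, which is exactly what the pre-aggregation methods discussed below try to reduce.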
Car-sales example
Measure attribute → Sale_Volume
Dimensions → Year and Age of customers
[Figure: car-sales data cube; the example range-sum query adds 20 + 255 + 1430]

Several previous approaches accelerate the query response time,
but they slow down updates and require extra space overhead.
This study considers both query and update costs when constructing data cubes:
No extra space overhead
Choose the best cube for any query or update ratio
We also present an FDC method:
No extra space overhead (for dense data cubes)
Select or integrate pre-aggregation techniques for each dimension

Related works
The history of pre-aggregation for range-sum queries:
1997: Prefix Sum (PS) [Ho et al., 1997]
1999: Relative Prefix Sum (RPS) [Geffner et al., 1999a]; Dynamic Data Cube (DDC) [Geffner et al., 1999b]; Hierarchical Cube (HC) [Chan & Ioannidis, 1999]
2000: Double RPS [Liang et al., 2000]; Space-Efficient Data Cube (SEDC) [Riedewald et al., 2000]
2001: Iterative Data Cube (IDC) [Riedewald et al., 2001]

Prefix Sum (PS) (Ho et al., 1997)
3+5+1+2+7+3+2+6+2+4+2+3 = 40
Query on the original array A: 2+3+3+3+1+5+3+5+1+3+3+4 = 36 (12 cells accessed)
Same query on the prefix-sum array P: 103 - 50 - 35 + 18 = 36 (4 cells accessed)

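The prefix-sum example above (103 - 50 - 35 + 18) answers a 2-D range sum from at most four cells of P by inclusion-exclusion, at the price of updates that may rewrite many cells. A minimal Python sketch on a hypothetical 4 x 4 cube (not the slide's data); the function names are illustrative only:

```python
# Hypothetical 4 x 4 data cube (not the data from the slide).
CUBE = [
    [3, 5, 1, 2],
    [2, 4, 6, 3],
    [7, 3, 2, 1],
    [2, 6, 4, 5],
]


def build_prefix_sum(cube):
    """P[r][c] = sum of cube[i][j] over all i <= r and j <= c."""
    rows, cols = len(cube), len(cube[0])
    P = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        running = 0  # prefix sum of row r up to column c
        for c in range(cols):
            running += cube[r][c]
            P[r][c] = running + (P[r - 1][c] if r > 0 else 0)
    return P


def range_sum_ps(P, lo, hi):
    """Inclusion-exclusion on at most 4 cells of P; returns (total, cells_accessed)."""
    (r0, c0), (r1, c1) = lo, hi
    total = P[r1][c1]
    accessed = 1
    if r0 > 0:
        total -= P[r0 - 1][c1]
        accessed += 1
    if c0 > 0:
        total -= P[r1][c0 - 1]
        accessed += 1
    if r0 > 0 and c0 > 0:
        total += P[r0 - 1][c0 - 1]  # subtracted twice above, add it back
        accessed += 1
    return total, accessed


def update_ps(P, cell, delta):
    """A change to cube[r][c] affects every P[i][j] with i >= r and j >= c."""
    r, c = cell
    written = 0
    for i in range(r, len(P)):
        for j in range(c, len(P[0])):
            P[i][j] += delta
            written += 1
    return written


P = build_prefix_sum(CUBE)
print(range_sum_ps(P, (1, 1), (2, 3)))  # (19, 4): 39 - 11 - 12 + 3
print(update_ps(P, (0, 0), 1))          # 16: every cell of P changes
```

The same query that reads 6 cells on the original array reads only 4 cells here, but an update near the origin rewrites the whole array, which is the query/update trade-off the talk addresses.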
Other methods
RPS (Geffner et al., 1999a): two levels (local PS and overlay boxes), but extra space overhead
HC (Chan & Ioannidis, 1999): hierarchical method
Double RPS (Liang et al., 2000): three levels, but needs extra space overhead
SEDC (Riedewald et al., 2000): RPS and DDC without extra space overhead (SRPS and SDDC)
DDC (Geffner et al., 1999b): hierarchical method, but needs extra space overhead
IDC (Riedewald et al., 2001): no extra space overhead (a different method in each dimension)


Our work focuses mainly on methods that do not require any extra space overhead for dense data cubes.

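RPS keeps local prefix sums in its first level, and the FDC slides list LPS as one per-dimension option. Assuming LPS means prefix sums that restart at every block of k cells and are stored in place of the original values (so no extra space is needed), a one-dimensional sketch could look like this. The block size k, the sample array, and all function names are hypothetical, and this is not claimed to be the paper's exact definition:

```python
# Assumed meaning of LPS: prefix sums restarting at each block of k cells
# (hypothetical sketch; L has the same size as the original array).


def build_lps(a, k):
    """L[i] = a[s] + ... + a[i], where s is the first index of i's block."""
    L = []
    for i, x in enumerate(a):
        L.append(x if i % k == 0 else L[-1] + x)
    return L


def prefix_lps(L, k, i):
    """Sum of a[0..i] as (total, cells_read); i == -1 is the empty prefix."""
    if i < 0:
        return 0, 0
    blk = i // k
    # The last cell of each complete block before blk holds that block's total.
    total = sum(L[(b + 1) * k - 1] for b in range(blk))
    return total + L[i], blk + 1


def range_sum_lps(L, k, i, j):
    """Sum of a[i..j] computed as prefix(j) - prefix(i - 1)."""
    s_j, c_j = prefix_lps(L, k, j)
    s_i, c_i = prefix_lps(L, k, i - 1)
    return s_j - s_i, c_j + c_i


def update_lps(L, k, i, delta):
    """Adding delta to a[i] only rewrites L[i..end of i's block]."""
    end = min((i // k + 1) * k, len(L))
    for t in range(i, end):
        L[t] += delta
    return end - i  # cells written


a = [3, 5, 1, 2, 7, 3, 2, 6]
L = build_lps(a, 4)
print(L)                          # [3, 8, 9, 11, 7, 10, 12, 18]
print(range_sum_lps(L, 4, 2, 6))  # (15, 3)
```

Under this assumption a prefix reads about n/k cells and an update writes at most k cells, so the block size trades query speed against update speed.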
Analysis of the average query and update costs
Assume query ratio + update ratio = 100% (q + u = 1)
Average query cost:
Average update cost: $C_u(n)/n$

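To make the averaging concrete, the following sketch computes average query and update costs for a one-dimensional array of size n under an assumed uniform model: every range [i, j] is equally likely to be queried and every cell equally likely to be updated, so the average update cost is the total update cost over the n cells divided by n. Costs are counted in accessed cells for the original array (A) and the prefix-sum array (PS); the function names are hypothetical and the paper's own cost formulas are not reproduced here:

```python
from fractions import Fraction

# Assumed uniform cost model on a 1-D array of size n (hypothetical sketch).


def query_cost(method, i, j):
    """Cells read to answer sum(a[i..j])."""
    if method == "A":
        return j - i + 1           # read every cell in the range
    if method == "PS":
        return 1 if i == 0 else 2  # P[j], or P[j] - P[i-1]
    raise ValueError(method)


def update_cost(method, n, i):
    """Cells written when a[i] changes."""
    if method == "A":
        return 1
    if method == "PS":
        return n - i               # P[i..n-1] all include a[i]
    raise ValueError(method)


def average_costs(method, n):
    """(average query cost over all ranges, average update cost over all cells)."""
    ranges = [(i, j) for i in range(n) for j in range(i, n)]
    caq = Fraction(sum(query_cost(method, i, j) for i, j in ranges), len(ranges))
    cau = Fraction(sum(update_cost(method, n, i) for i in range(n)), n)
    return caq, cau


def average_cost(method, n, q):
    """Mixed workload with query ratio q and update ratio u = 1 - q."""
    caq, cau = average_costs(method, n)
    return q * caq + (1 - q) * cau


for q in (Fraction(1), Fraction(1, 2), Fraction(0)):
    print(q, average_cost("A", 16, q), average_cost("PS", 16, q))
```

For n = 16 this model gives A an average of 6 cells per query and 1 per update, and PS 32/17 cells per query and 17/2 per update, so PS is cheaper when queries dominate and A when updates dominate.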
Flexible Data Cube (FDC)
Finding the optimal pre-aggregated data cube requires exponential time.
We propose the FDC method, a heuristic that selects or integrates any two pre-aggregation techniques for each dimension.

The FDC Method
[Figure: candidate FDC structures for k' = 0 to 7, where each dimension uses A, LPS, or PS, or integrates two of these techniques]

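In a given situation (cube size n and query ratio q), the best structure is the FDC candidate with the minimum average cost:
$FDC_{opt} = \min\{\, q \times C_{aq}^{FDC} + u \times C_{au}^{FDC} \,\}$, taken over all FDC candidates
Time complexity: O(9n) = O(n)
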
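The selection rule above can be illustrated with a small exhaustive search over per-dimension choices of A and PS. This is a sketch under stated assumptions, not the FDC heuristic itself: it assumes costs multiply across dimensions, that all dimensions share the same size n = 16, and it uses uniform one-dimensional average costs noted in the comments; `PER_DIM` and `best_structure` are hypothetical names, and LPS is omitted for brevity:

```python
from fractions import Fraction
from itertools import combinations_with_replacement
from math import prod

# Assumed 1-D average (query, update) costs for n = 16 under a uniform model:
# A reads (n + 2) / 3 cells per query and writes 1 per update;
# PS reads 32/17 cells per query and writes (n + 1) / 2 per update.
PER_DIM = {
    "A": (Fraction(6), Fraction(1)),
    "PS": (Fraction(32, 17), Fraction(17, 2)),
}


def best_structure(per_dim, d, q):
    """Choose one technique per dimension minimising q*Caq + u*Cau, u = 1 - q.

    Assumes query and update costs multiply across dimensions (uniform,
    independent ranges and updated cells). Because all dimensions have the
    same size, only the multiset of techniques matters. Exhaustive search,
    not the paper's linear-time heuristic.
    """
    best = None
    for combo in combinations_with_replacement(sorted(per_dim), d):
        caq = prod(per_dim[t][0] for t in combo)
        cau = prod(per_dim[t][1] for t in combo)
        cost = q * caq + (1 - q) * cau
        if best is None or cost < best[0]:
            best = (cost, combo)
    return best


for q in (Fraction(1), Fraction(1, 2), Fraction(0)):
    print(q, best_structure(PER_DIM, 2, q))
```

For d = 2 this toy model picks PS in both dimensions when q = 1, A in both when q = 0, and one dimension of each at q = 1/2, which illustrates why the best structure depends on the query ratio.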
Performance analysis
[Figure: average cost at different query ratios; average cost (accessed cells) versus query ratio q, from 1 to 0, for A, LPS, PS, and FDC; d = 2, with panels for n = 16 and n = 64]

[Figure: average cost for different dimension sizes, d = 4, with panels for q = 1 and q = 0.9; average cost (accessed cells, log scale) versus size n = 2 to 256 for A, LPS, PS, and FDC]

[Figure: average cost for different dimension sizes, d = 4, with panels for q = 0.1 and q = 0; average cost (accessed cells, log scale) versus size n = 2 to 256 for A, LPS, PS, and FDC]

Conclusions
Take both the query and update costs into consideration to select a suitable data cube.
Propose the FDC method, which:
selects or integrates pre-aggregation techniques for each dimension;
outperforms other methods for any query (or update) ratio;
determines the best FDC structure in linear time.
Future work: develop new techniques to support sparse data sets.
Thank You