Selecting Stars: The k Most Representative Skyline Operator
Xuemin Lin, Yidong Yuan, Qing Zhang, Ying Zhang
ICDE 2007
Introduction
Preliminary
Method
◦ Two-dimensional Space
Dynamic Programming Based Algorithm
◦ Multi-dimensional Space
Greedy algorithm
FM-based Algorithm
Experiment
Conclusion
Top-k representative skyline points (Top-k RSP)
Given a set P of points and an integer k, compute a set S of k skyline points such that |D(S)|, the number of points in P dominated by at least one point of S, is maximized.
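The definition above can be checked directly on small inputs. The following is a minimal brute-force sketch, not the authors' algorithm: it assumes smaller values are better in every dimension, and the helper names (`dominates`, `skyline`, `dominated_set`, `top_k_rsp_bruteforce`) and the sample data are hypothetical.

```python
from itertools import combinations

def dominates(p, q):
    """True if p dominates q (assumption: smaller is better in every dimension)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points):
    """Points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def dominated_set(S, points):
    """D(S): the points dominated by at least one member of S."""
    return {q for q in points if any(dominates(s, q) for s in S)}

def top_k_rsp_bruteforce(points, k):
    """Try every k-subset of the skyline and keep one that maximizes |D(S)|."""
    sky = skyline(points)
    k = min(k, len(sky))
    return max(combinations(sky, k), key=lambda S: len(dominated_set(S, points)))

# Hypothetical sample data, not taken from the slides.
points = [(1, 9), (2, 5), (6, 1), (3, 7), (4, 6), (7, 3), (8, 8)]
print(top_k_rsp_bruteforce(points, 2))  # ((2, 5), (6, 1))
```

Enumerating all k-subsets is exponential in k, so this is only useful as a reference for validating faster methods.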
Top-1 RSP: p6 dominates {p3, p5, p7}
Top-2 RSP candidates:
{p4, p6} dominate {p3, p5, p7}
{p2, p6} dominate {p1, p3, p5, p7}
mindist = x + y
N7 = {N3, N4, N5}
N3 = {i, g, h}
N6 = {N1, N2}
N1 = {a, b, c}
N4 = {l, k}

Action | Heap contents | Skyline points
Access root | <N7,4> <N6,6> | none
Expand N7 | <N3,5> <N6,6> <N5,8> <N4,10> | none
Expand N3 | <i,5> <N6,6> <h,7> <N5,8> <N4,10> <g,11> | i
Expand N6 | <h,7> <N5,8> <N1,9> <N4,10> <g,11> | i
Expand N1 | <a,10> <N4,10> <g,11> <b,12> <c,12> | i, a
Expand N4 | <k,10> <g,11> <b,12> <c,12> <l,14> | i, a, k
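The trace above follows a best-first traversal: entries are popped in ascending mindist order, dominated entries are discarded, nodes are expanded, and surviving points join the skyline. A minimal sketch of that loop is below. It assumes smaller values are better, replaces the R-tree with a hand-built nested list (internal nodes are lists, points are tuples), and uses hypothetical coordinates, since the slides do not give the coordinates of a, b, c and the other points.

```python
import heapq
from itertools import count

def dominates(p, q):
    """True if p dominates q (assumption: smaller is better in every dimension)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def is_point(entry):
    # Points are coordinate tuples; internal nodes are lists of child entries.
    return isinstance(entry, tuple)

def lower_corner(entry):
    """Lower-left corner of the entry's minimum bounding rectangle."""
    if is_point(entry):
        return entry
    corners = [lower_corner(child) for child in entry]
    return tuple(min(c[d] for c in corners) for d in range(len(corners[0])))

def mindist(entry):
    """Sum of the lower-left corner's coordinates (x + y in 2D)."""
    return sum(lower_corner(entry))

def bbs_skyline(root):
    tie = count()  # tie-breaker so the heap never compares entries directly
    heap = [(mindist(root), next(tie), root)]
    sky = []
    while heap:
        _, _, entry = heapq.heappop(heap)
        if any(dominates(s, lower_corner(entry)) for s in sky):
            continue  # everything inside this entry is dominated: prune it
        if is_point(entry):
            sky.append(entry)
        else:
            for child in entry:
                if not any(dominates(s, lower_corner(child)) for s in sky):
                    heapq.heappush(heap, (mindist(child), next(tie), child))
    return sky

# Hypothetical three-leaf tree, not the tree from the slide.
root = [[(1, 9), (3, 7)], [(2, 5), (4, 6)], [(6, 1), (7, 3), (8, 8)]]
print(bbs_skyline(root))  # [(2, 5), (6, 1), (1, 9)]
```

Unlike the slide, this sketch pushes the root itself onto the heap before expanding it; the resulting skyline is the same.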
Δ(si, sj) denotes the set of data points that
are dominated by si but not dominated by sj
[Equation lost in extraction: a dynamic-programming recurrence taken over indices j < i]
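One way the Δ(si, sj) definition can drive a dynamic program in 2D is sketched below. This is a reconstruction under stated assumptions, not the slide's exact recurrence: skyline points are sorted by x, smaller values are better, points are distinct, and `best[t][i]` (a hypothetical name) holds the largest |D(S)| over sets S of t skyline points whose last member is s_i. The step `best[t][i] = max over j < i of best[t-1][j] + |Δ(s_i, s_j)|` relies on a 2D property: anything s_i shares with an earlier skyline point is also dominated by the nearest chosen predecessor s_j.

```python
def dominates(p, q):
    """True if p dominates q (assumption: smaller is better in every dimension)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def top_k_rsp_2d(points, k):
    """Return (chosen skyline points, |D(S)|) for distinct 2D points."""
    sky = sorted(p for p in points if not any(dominates(q, p) for q in points))
    m = len(sky)
    k = min(k, m)
    if k == 0:
        return [], 0
    dom = [{q for q in points if dominates(s, q)} for s in sky]
    best = [[-1] * m for _ in range(k + 1)]
    prev = [[None] * m for _ in range(k + 1)]
    for i in range(m):
        best[1][i] = len(dom[i])
    for t in range(2, k + 1):
        for i in range(t - 1, m):
            for j in range(t - 2, i):
                # len(dom[i] - dom[j]) is |Δ(s_i, s_j)|
                cand = best[t - 1][j] + len(dom[i] - dom[j])
                if cand > best[t][i]:
                    best[t][i], prev[t][i] = cand, j
    i = max(range(k - 1, m), key=lambda idx: best[k][idx])
    covered = best[k][i]
    chosen = []
    for t in range(k, 0, -1):  # walk the predecessor links back to t = 1
        chosen.append(sky[i])
        i = prev[t][i]
    return chosen[::-1], covered

# Hypothetical sample data, not taken from the slides.
points = [(1, 9), (2, 5), (6, 1), (3, 7), (4, 6), (7, 3), (8, 8)]
print(top_k_rsp_2d(points, 2))  # ([(2, 5), (6, 1)], 4)
```

The dominated sets are computed naively here for clarity; a practical implementation would count them with an index rather than by pairwise comparison.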
Top-2 RSP = {S1, S2}
Greedy Algorithm
FM-based Algorithm
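The slides name the greedy algorithm without spelling it out. A common greedy heuristic for this kind of coverage problem picks, at each step, the skyline point that dominates the most points not yet dominated by earlier picks. The sketch below illustrates that idea; whether it matches the authors' exact procedure is an assumption, as are the function names, the sample data, and the convention that smaller values are better.

```python
def dominates(p, q):
    """True if p dominates q (assumption: smaller is better in every dimension)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def greedy_rsp(points, k):
    """Return (chosen skyline points, number of points they dominate)."""
    sky = [p for p in points if not any(dominates(q, p) for q in points)]
    # Exact dominated sets; a sketch-based variant would estimate these instead.
    dom = {s: {q for q in points if dominates(s, q)} for s in sky}
    covered, chosen = set(), []
    for _ in range(min(k, len(dom))):
        # Pick the candidate with the largest marginal gain in coverage.
        best = max((s for s in dom if s not in chosen),
                   key=lambda s: len(dom[s] - covered))
        chosen.append(best)
        covered |= dom[best]
    return chosen, len(covered)

# Hypothetical sample data, not taken from the slides.
points = [(1, 9), (2, 5), (6, 1), (3, 7), (4, 6), (7, 3), (8, 8)]
print(greedy_rsp(points, 2))  # ([(2, 5), (6, 1)], 4)
```

Greedy selection is a heuristic: it is not guaranteed to return an optimal set in general, although on this small example it agrees with exhaustive search.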
BBS computes the skyline points.
FM sketches estimate |D({Sp})| for every skyline point Sp.
The FM algorithm is a bitmap-based method that efficiently estimates the number of distinct elements (data points).
h() is a randomly generated hash function that maps each element ID to an integer, which selects a bit position in the bitmap.
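Given a bitmap B of length L, with bit positions [0 … L-1]:
L = 8, B = 00000000 (bitmap)
h(p) = 3 = 00000011 (binary)
Keep only the least significant 1-bit: 00000001
p.fm = 10000000
h(q) = 6 = 00000110 (binary); keeping only the least significant 1-bit gives 00000010
q.fm = 01000000
S = {p, q}
S.fm = 10000000 ∨ 01000000 = 11000000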
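The worked example above can be reproduced with a few lines of code. This is a minimal sketch: the function names are hypothetical, bit position 0 is written leftmost to match the slide, and a hash value of 0 (which has no 1-bit) is assumed to leave the bitmap empty.

```python
def fm_bitmap(hash_value, L=8):
    """Bitmap of length L with one bit set at the position of the
    least significant 1-bit of hash_value (position 0 is leftmost)."""
    bits = [0] * L
    if hash_value > 0:
        # hash_value & -hash_value isolates the least significant 1-bit.
        pos = (hash_value & -hash_value).bit_length() - 1
        if pos < L:
            bits[pos] = 1
    return bits

def fm_union(a, b):
    """Bitwise OR of two bitmaps: the sketch of the union of two sets."""
    return [x | y for x, y in zip(a, b)]

def show(bits):
    return "".join(str(b) for b in bits)

p_fm = fm_bitmap(3)  # h(p) = 3
q_fm = fm_bitmap(6)  # h(q) = 6
print(show(p_fm), show(q_fm), show(fm_union(p_fm, q_fm)))
# 10000000 01000000 11000000
```

Because the union is a plain OR, the sketch of D(S) for a set S of skyline points can be combined from per-point sketches without revisiting the data.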
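10000000: hash values 1, 3, 5, … (binary 1, 11, 101, …)
01000000: hash values 2, 6, 10, … (binary 10, 110, 1010, …)
00100000: hash values 4, 12, 20, … (binary 100, 1100, 10100, …)
Combined bitmap: B = 11100000
min(B): the position of the leftmost 0 bit; here min(B) = 3
Estimate: 2^min(B) / 0.7735 = 8 / 0.7735 = 10.34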
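The estimation step above is a one-line computation once min(B) is known. A minimal sketch follows, assuming the bitmap is given as a string with position 0 leftmost and using the constant 0.7735 from the slide; the function name is hypothetical, and treating an all-ones bitmap as min(B) = L is an assumption.

```python
PHI = 0.7735  # correction constant as written on the slide

def fm_estimate(bitmap):
    """Estimate the number of distinct elements from an FM bitmap string."""
    r = bitmap.find("0")  # min(B): position of the leftmost 0 bit
    if r == -1:
        r = len(bitmap)   # assumption: all bits set means min(B) = L
    return 2 ** r / PHI

print(round(fm_estimate("11100000"), 2))  # 10.34
```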
(symbol lost in extraction): number of hash functions
Action | Heap | Skyline
Access root | <e6,5> <e7,7> | none
Expand e6 | <e1,5> <e7,7> <e2,8> <e3,13> | none
Expand e1 | <S2,6> <e7,7> <e2,8> <S1,8> <P1,9.5> <e3,13> | S2
Expand e7 | <e2,8> <S1,8> <e4,8> ….. | S2
Parameters: L = 4; a second parameter (symbol lost in extraction) = 1
(S2.fm ∨ e2.fm)
Skyline points: S2, S1, S3, S4, S5
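Add a second heap, H2.
Skyline points: S2, S1, S3, S4, S5
To keep H2 from growing too large, delete an entry e from H2 when (condition lost in extraction).
e2 ∈ H2: maxdist = 11.5
S3 ∈ H1: mindist = 8.5
S4 ∈ H1: mindist = 12
11.5 < 12, so e2 is removed from H2.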
An efficient exact algorithm based on dynamic programming is presented for 2D space.
For multi-dimensional space, an efficient, scalable, index-based randomized algorithm is developed by applying the FM probabilistic counting technique.