Top-k Skyline: Un Enfoque Unificado para Consultas basadas

Universidad Simón Bolívar
Reaching the Top-k of the Skyline: An Efficient Indexed Algorithm for Top-k Skyline Queries
Marlene Goncalves and María-Esther Vidal
Universidad Simón Bolívar, Caracas, Venezuela
{mgoncalves,mvidal}@usb.ve
Motivating Example
«There are two Open Faculty Positions»
«Candidates will be evaluated in terms of: Degree, Publications, Experience»
«Criteria to select the best Candidates: higher academic degree, maximum number of publications and maximum years of experience»
«Ties will be broken by using the GPA»
Solutions: Skyline and Top-k
Motivation
Id  Degree   Publications  Experience  GPA
1   Post Dr  9             2           3.75
2   Post Dr  10            1           4
3   PhD      12            2           3.75
4   MsC      13            4           3.6
5   BEng     7             3           4.5
6   BEng     6             2           3.5
7   BEng     5             1           4
Query: Candidates with the best academic degree, number of publications and experience
Answer: None of the candidates is better in all criteria simultaneously.
Skyline
Query: Select the candidates with better degree, number of publications and experience
User Criteria (Equally Important!)
• Degree: Maximum
• Publications: Maximum
• Experience: Maximum
Multicriteria Function
Id  Degree   Publications  Experience  GPA
1   Post Dr  9             2           3.75
2   Post Dr  10            1           4
3   PhD      12            2           3.75
4   MsC      13            4           3.6
5   BEng     7             3           4.5
6   BEng     6             3           3.5
7   BEng     5             1           4
Skyline selects candidates 1, 2, 3 and 4,
i.e., multi-criteria induce a partial order, and ties need to be broken
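The dominance test behind the Skyline can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the numeric degree ranking (BEng < MsC < PhD < Post Dr) and the helper names `criteria`, `dominates` and `skyline` are assumptions made for the example.

```python
# Candidate table from the slide: id -> (degree, publications, experience, GPA).
# The numeric degree ranking below is an assumption made for this sketch.
DEGREE_RANK = {"BEng": 0, "MsC": 1, "PhD": 2, "Post Dr": 3}

CANDIDATES = {
    1: ("Post Dr", 9, 2, 3.75),
    2: ("Post Dr", 10, 1, 4.0),
    3: ("PhD", 12, 2, 3.75),
    4: ("MsC", 13, 4, 3.6),
    5: ("BEng", 7, 3, 4.5),
    6: ("BEng", 6, 3, 3.5),
    7: ("BEng", 5, 1, 4.0),
}


def criteria(cid):
    """Return the three maximized criteria of a candidate as a numeric tuple."""
    degree, publications, experience, _gpa = CANDIDATES[cid]
    return (DEGREE_RANK[degree], publications, experience)


def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(ids):
    """Keep the candidates that no other candidate dominates."""
    return [
        c for c in ids
        if not any(dominates(criteria(o), criteria(c)) for o in ids if o != c)
    ]


print(skyline(sorted(CANDIDATES)))  # [1, 2, 3, 4]
```

Candidates 5, 6 and 7 drop out because candidate 4 is at least as good in every criterion; candidates 1 to 4 are pairwise incomparable, which is why a tie-breaker such as the GPA is needed.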
Top-k
Query: Select two candidates with the best GPA
User Criteria (Score Function!)
• GPA: Maximum
Id  Degree   Publications  Experience  GPA
5   BEng     7             3           4.5
2   Post Dr  10            1           4
7   BEng     5             1           4
1   Post Dr  9             2           3.75
3   PhD      12            2           3.75
4   MsC      13            4           3.6
6   BEng     6             3           3.5
Top-k identifies candidates 5 and 2, but these candidates do not necessarily have the best academic merit
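A Top-k query over a single score function reduces to a sort. The sketch below is only illustrative: the function name `top_k` and the tie-breaking rule (smaller id first) are assumptions, since the slide does not say how the tie between candidates 2 and 7 is resolved.

```python
# GPA per candidate id, taken from the slide.
GPA = {1: 3.75, 2: 4.0, 3: 3.75, 4: 3.6, 5: 4.5, 6: 3.5, 7: 4.0}


def top_k(scores, k):
    """Return the k ids with the highest score.

    Ties are broken by the smaller id, which is an assumption of this sketch.
    """
    return sorted(scores, key=lambda cid: (-scores[cid], cid))[:k]


print(top_k(GPA, 2))  # [5, 2]
```

The degree, publications and experience columns play no role here, which is why the answer can include a candidate such as 5 that the multicriteria function would discard.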
Preference based Queries
• Select two candidates with higher GPA between the candidates with better degree, number of publications and experience.
– Skyline selects four candidates in equality of conditions
– Top-k selects two candidates with good GPA
Cases:
• Skyline produces the candidates with better degree, number of publications and experience
– Skyline may be very large and a post-processing over the Skyline is required to select k.
• Top-k identifies the two candidates with better GPA
– False answers
– Loss of results
So… A combined approach is required!!
Top-k Skyline
Query: Select two candidates with higher GPA between the candidates that have better degree, number of publications and experience
Answer: The two candidates with the highest value in score function between the candidates preselected in terms of multicriteria function
Id  Degree   Publications  Experience  GPA
1   Post Dr  9             2           3.75
2   Post Dr  10            1           4
3   PhD      12            2           3.75
4   MsC      13            4           3.6
5   BEng     7             3           4.5
6   BEng     6             3           3.5
7   BEng     5             1           4
Top-k Skyline selects candidates 1 and 2 with the highest GPAs among the ones with similar academic records
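Read literally, the query composes the two operators: first the Skyline under the multicriteria function, then the k best by score function. The Python sketch below follows that naive two-step reading (full Skyline first, ranking second); it is not one of the algorithms discussed later, and the degree ranking, the function names and the tie-break on the smaller id are assumptions.

```python
# id -> (degree rank, publications, experience, GPA); the degree ranking
# BEng=0 < MsC=1 < PhD=2 < Post Dr=3 is an assumption of this sketch.
ROWS = {
    1: (3, 9, 2, 3.75),
    2: (3, 10, 1, 4.0),
    3: (2, 12, 2, 3.75),
    4: (1, 13, 4, 3.6),
    5: (0, 7, 3, 4.5),
    6: (0, 6, 3, 3.5),
    7: (0, 5, 1, 4.0),
}


def dominates(a, b):
    """Compare only the three multicriteria columns (GPA is excluded)."""
    a, b = a[:3], b[:3]
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline():
    """Step 1: candidates not dominated by any other candidate."""
    return [
        c for c in ROWS
        if not any(dominates(ROWS[o], ROWS[c]) for o in ROWS if o != c)
    ]


def top_k_skyline(k):
    """Step 2: rank the Skyline by GPA and keep the first k (ties: smaller id)."""
    return sorted(skyline(), key=lambda cid: (-ROWS[cid][3], cid))[:k]


print(top_k_skyline(2))  # [2, 1]
```

Candidates 1 and 3 share the GPA 3.75, so which of them accompanies candidate 2 depends on the tie-breaking rule; with the rule assumed here the result agrees with the slide.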
Outline
• Related Work
• Our Approach
• Top-k Skyline Evaluation
• Experimental Study
• Conclusions and Future Work
Related Work
• Multi-criteria-based approaches (Skyline): BNL, SFS, LESS. Poor ranking capabilities. Answers can be huge!
• Score-based approaches (Top-k): MPro, Upper, TA, FA, NRA. High ranking capabilities. Answers may be incomplete.
• Combined approaches (Top-k Skyline): BMORTKS, BDTKS. Metrics: Skyline Frequency.
Neither Skyline nor Top-k provides high expressivity and high ranking capabilities.
Existing Techniques of Top-k Skyline completely build the Skyline.
Techniques to efficiently evaluate ranking approaches are required.
Our Challenge
• Efficient Implementation of Top-k Skyline operator: Build the Top-k Skyline set minimizing the non-necessary probes.
• A probe p of functions m or f is necessary if and only if p is evaluated on an object o that belongs to the Top-k Skyline.
Id  Degree   Publications  Experience  GPA
1   Post Dr  9             2           3.75
2   Post Dr  10            1           4
3   PhD      12            2           3.75
4   MsC      13            4           3.6
7   BEng     5             1           4
5   BEng     7             3           4.5
6   BEng     6             3           3.5
Goal: Only identify the elements of the Skyline that belong to the answer
Non-Necessary Probes (Evaluations of multi-criteria or score function)!
Top-k Skyline Evaluation
Indexed Solutions
– BDTKS (Basic Distributed Top-k Skyline)
– BMORTKS (Basic Multi-Objective Retrieval for Top-k Skyline)
– TKSI (Top-K SkyIndex)
Top-k Skyline Evaluation
BDTKS
Query: Select two candidates with higher GPA between the candidates that have better degree, number of publications and experience.
Index 1        Index 2            Index 3
Id  Degree     Id  Publications   Id  Experience
1   Post Dr    4   13             4   4
2   Post Dr    3   12             5   3
3   PhD        2   10             6   3
4   MsC        1   9              3   2
5   BEng       5   7              1   2
6   BEng       6   6              2   1
7   BEng       7   5              7   1
Final Object!
Top-k Skyline Evaluation
BDTKS
Query: Select two candidates with higher GPA between the candidates that have better degree, number of publications and experience
Id  Degree   Publications  Experience  GPA
2   Post Dr  10            1           4
1   Post Dr  9             2           3.75
3   PhD      12            2           3.75
4   MsC      13            4           3.6
Partial scanning of database (the final object is found)
But, BDTKS completely builds the Skyline.
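The scanning phase shown for BDTKS can be approximated as a round-robin sorted access over the three indexes that stops at the first object seen in all of them. The Python sketch below is a simplified illustration under stated assumptions, not the authors' algorithm: the round-robin access order, the degree ranking and the function names are assumed, and the handling of objects that tie with the final object inside an index is omitted.

```python
# id -> (degree rank, publications, experience); degree ranking is assumed:
# BEng=0 < MsC=1 < PhD=2 < Post Dr=3.
ROWS = {
    1: (3, 9, 2), 2: (3, 10, 1), 3: (2, 12, 2), 4: (1, 13, 4),
    5: (0, 7, 3), 6: (0, 6, 3), 7: (0, 5, 1),
}

# Ids in the order in which each index lists them on the slide (best first).
INDEXES = [
    [1, 2, 3, 4, 5, 6, 7],  # Index 1: Degree
    [4, 3, 2, 1, 5, 6, 7],  # Index 2: Publications
    [4, 5, 6, 3, 1, 2, 7],  # Index 3: Experience
]


def scan_until_final_object(indexes):
    """Round-robin sorted access; stop at the first id seen in every index."""
    seen_in = {}  # id -> set of index numbers in which the id has appeared
    for depth in range(len(indexes[0])):
        for i, index in enumerate(indexes):
            cid = index[depth]
            seen_in.setdefault(cid, set()).add(i)
            if len(seen_in[cid]) == len(indexes):
                return cid, set(seen_in)
    return None, set(seen_in)


def dominates(a, b):
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline_of(ids):
    """Skyline restricted to the objects that were seen during the scan."""
    return sorted(
        c for c in ids
        if not any(dominates(ROWS[o], ROWS[c]) for o in ids if o != c)
    )


final, seen = scan_until_final_object(INDEXES)
print(final, sorted(seen))  # 4 [1, 2, 3, 4, 5, 6]
print(skyline_of(seen))     # [1, 2, 3, 4]
```

Candidate 7 is never read, yet the whole Skyline (1, 2, 3, 4) is still computed before the two best GPAs can be chosen, which is the cost the slide points out.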
Top-k Skyline Evaluation
BMORTKS
Query: Select two candidates with higher GPA between the candidates that have better degree, number of publications and experience.
Index 1        Index 2            Index 3
Id  Degree     Id  Publications   Id  Experience
1   Post Dr    4   13             4   4
2   Post Dr    3   12             5   3
3   PhD        2   10             6   3
4   MsC        1   9              3   2
5   BEng       5   7              1   2
6   BEng       6   6              2   1
7   BEng       7   5              7   1
Virtual (Last score seen):
PostDr,?,?
PostDr,13,?
PostDr,13,4
PostDr,12,4
PostDr,12,3
PhD,12,3
PhD,10,3
MsC,10,3
MsC,9,3
Top-k Skyline Evaluation
BMORTKS
Query: Select the two candidates with higher GPA between the candidates that have better degree, number of publications and experience
Id  Degree   Publications  Experience  GPA
2   Post Dr  10            1           4
1   Post Dr  9             2           3.75
3   PhD      12            2           3.75
4   MsC      13            4           3.6
Partial scanning of database (until a seen object dominates the final object)
But, BMORTKS also completely builds the Skyline
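The idea of a virtual object built from the last score seen in each index can be sketched as follows. This is a hedged Python illustration rather than BMORTKS itself: the round-robin access order, the choice to test the stop condition after every single access, the degree ranking and the function names are all assumptions, so the point at which this sketch stops need not coincide with the last virtual object listed on the slide.

```python
# id -> (degree rank, publications, experience); degree ranking is assumed:
# BEng=0 < MsC=1 < PhD=2 < Post Dr=3.
ROWS = {
    1: (3, 9, 2), 2: (3, 10, 1), 3: (2, 12, 2), 4: (1, 13, 4),
    5: (0, 7, 3), 6: (0, 6, 3), 7: (0, 5, 1),
}

# Ids in the order in which each index lists them on the slide (best first).
INDEXES = [
    [1, 2, 3, 4, 5, 6, 7],  # Index 1: Degree
    [4, 3, 2, 1, 5, 6, 7],  # Index 2: Publications
    [4, 5, 6, 3, 1, 2, 7],  # Index 3: Experience
]


def dominates(a, b):
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def scan_with_virtual_object(indexes, rows):
    """Scan until some seen object dominates the virtual object.

    The virtual object holds the last value read from each index; None marks
    an index that has not been read yet.
    """
    virtual = [None] * len(indexes)
    seen, trace = set(), []
    for depth in range(len(indexes[0])):
        for i, index in enumerate(indexes):
            cid = index[depth]
            seen.add(cid)
            virtual[i] = rows[cid][i]
            snapshot = tuple(virtual)
            if not trace or trace[-1] != snapshot:
                trace.append(snapshot)  # record each distinct virtual object
            if None not in snapshot and any(
                dominates(rows[o], snapshot) for o in seen
            ):
                return seen, trace
    return seen, trace


seen, trace = scan_with_virtual_object(INDEXES, ROWS)
print(sorted(seen))  # [1, 2, 3, 4, 5, 6]
print(trace[-1])     # (1, 10, 3), i.e. (MsC, 10, 3)
```

Once a seen object dominates the virtual object, no unseen object can belong to the Skyline, because every unseen object is at most as good as the virtual object in each index. Each comparison against the virtual object is still an extra probe of the multicriteria function.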
Top-k Skyline Evaluation
TKSI (Top-K SkyIndex)
Index 1        Index 2            Index 3           Index 4
Id  Degree     Id  Publications   Id  Experience    Id  GPA
1   Post Dr    4   13             4   4             5   4.5
2   Post Dr    3   12             5   3             2   4
3   PhD        2   10             6   3             7   4
4   MsC        1   9              3   2             1   3.75
5   BEng       5   7              1   2             3   3.75
6   BEng       6   6              2   1             4   3.6
7   BEng       7   5              7   1             6   3.5
Partial scanning of database (until k incomparable objects are found)
TKSI partially builds the Skyline, and minimizes the non-necessary probes
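One way to read the partial scan is to walk the GPA index from the best score downwards and stop as soon as k non-dominated objects have been found. The Python sketch below illustrates that early-termination idea only; it is written in the spirit of TKSI, not as the authors' algorithm. In particular, it checks dominance naively against the whole table instead of using Indexes 1 to 3, and the degree ranking, the probe counter and the function names are assumptions.

```python
# id -> (degree rank, publications, experience); degree ranking is assumed:
# BEng=0 < MsC=1 < PhD=2 < Post Dr=3.
ROWS = {
    1: (3, 9, 2), 2: (3, 10, 1), 3: (2, 12, 2), 4: (1, 13, 4),
    5: (0, 7, 3), 6: (0, 6, 3), 7: (0, 5, 1),
}

# Index 4 from the slide: ids ordered by GPA, best first.
GPA_INDEX = [5, 2, 7, 1, 3, 4, 6]


def dominates(a, b):
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def top_k_skyline_by_score_scan(k):
    """Visit objects in descending GPA; keep the first k that nobody dominates.

    Returns the answer and the number of dominance probes performed.
    """
    answer, probes = [], 0
    for cid in GPA_INDEX:
        dominated = False
        for other in ROWS:
            if other == cid:
                continue
            probes += 1
            if dominates(ROWS[other], ROWS[cid]):
                dominated = True
                break
        if not dominated:
            answer.append(cid)
            if len(answer) == k:
                break  # early termination: the rest of the index is not read
    return answer, probes


answer, probes = top_k_skyline_by_score_scan(2)
print(answer)  # [2, 1]
```

With k = 2 the scan reads only candidates 5, 2, 7 and 1 from the GPA index, so candidates 3, 4 and 6 are never tested for Skyline membership, even though 3 and 4 belong to the Skyline.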
Experimental Study
Dataset and Queries
– 100,000 random data:
• Value Domain: Float between 0 and 1
• Data Distribution: Uniform, Gaussian and Mixed
– Sixty random queries. Multi-criteria dimensions range between 2 and 6.
Platform
– SunFire V440, OS SunOS 5.10, two Sparcv9 processors of 1,281 MHz, 16 GB of RAM and four Ultra320 SCSI disks of 73 GB.
– Java 1.5 and Oracle 9i.
Experimental Study
Average Skyline Size & Probes
Data Distribution  Average Skyline Size (60 queries)
Uniform            2405
Gaussian           2477
Mixed              2539
Skyline size can be up to 2.6% of the input data!
Probes of multi-criteria function
BDTKS              23,749,796
BMORTKS            27,201,877
Probes on virtual object increase the number of probes!
Experimental Study
BDTKS executes fewer probes and requires less evaluation time than BMORTKS.
BDTKS and TKSI
[Figure: four charts comparing BDTKS with TKSI for k = 1, 10, 50, 100, 500 and 1000; y-axes: Log(#Probes), Log(#Access), Log(#Seen Objects) and Log(Time (sec))]
For small k, TKSI outperforms BDTKS!
Conclusions and Future Work
• TKSI builds the Skyline until it has calculated the k objects.
• Our experimental results show that TKSI executed fewer probes and consumed less evaluation time.
• In the future, we plan to extend TKSI over Web data sources, and incorporate TKSI into an existing DBMS.
Q&A
Thanks!