A Similarity Skyline Approach for Handling Graph Queries - A Preliminary Report
Katia Abbaci†, Allel Hadjali†, Ludovic Liétard‡, Daniel Rocacher†
†IRISA/ENSSAT, University of Rennes1
{Katia.Abbaci, Allel.Hadjali, Daniel.Rocacher}@enssat.fr
‡IRISA/IUT, University of Rennes1
[email protected]
Outline
• Introduction
• Background:
  ◦ Skyline Query
  ◦ Graph Query
  ◦ Graph Similarity Measures
• Graph Similarity Skyline
• Refinement of Graph Similarity Skyline
• Summary and Outlook
GDM 2011
07/11/2015
Introduction (1/3)
Context:
• Graphs: modeling of structured and complex data
• Application domains:
  ◦ Medicine, Web, Chemistry, Imaging, XML documents, Bioinformatics, ...
[Figure panels: Medicine, Web, Chemistry, Imaging]
Introduction (2/3)
Main problem:
• Searching for graphs that are similar to a query graph
• Existing approaches rely on a single similarity measure
• Several methods exist for measuring the similarity between two graphs:
  ◦ Each method is limited to one class of applications
  ◦ No method fits all
Introduction (3/3)
Motivations:
• A model suited to different classes of applications
• A model incorporating multiple features
Contributions:
1. A Graph Similarity Skyline for answering a graph query: optimality in the Pareto sense
2. A method for refining the skyline, based on a diversity criterion among graphs
Skyline Query
• Identification of interesting objects in a multi-dimensional dataset
• p = (p1, …, pm), q = (q1, …, qm): multidimensional objects
p Pareto dominates q, denoted p ≺ q, iff:
i. on each dimension i, 1 ≤ i ≤ m, pi ≤ qi
ii. on at least one dimension j, pj < qj
Sample Skyline Query
• Find a cheap hotel that is as close as possible to downtown:

Hotel   Price (€)   Distance (m)
H1      54.0        150
H2      33.0        110
H3      42.5        240
H4      32.0        180
H5      31.7        270
H6      21.0        195
H7      21.2        210
Tab. 1 – Sample of hotels

Skyline = {H2, H4, H6}
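The hotel example can be checked mechanically. The following is a minimal Python sketch, assuming that smaller values are better on both dimensions (price and distance); the names `HOTELS`, `dominates`, and `skyline` are illustrative and do not come from the slides.

```python
from typing import Dict, List, Sequence

# Hotels from Tab. 1 as (price in EUR, distance to downtown in m).
HOTELS: Dict[str, Sequence[float]] = {
    "H1": (54.0, 150),
    "H2": (33.0, 110),
    "H3": (42.5, 240),
    "H4": (32.0, 180),
    "H5": (31.7, 270),
    "H6": (21.0, 195),
    "H7": (21.2, 210),
}


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """True if p Pareto dominates q: no worse everywhere, strictly better somewhere."""
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def skyline(points: Dict[str, Sequence[float]]) -> List[str]:
    """Names of all points that are not dominated by any other point."""
    return [
        name
        for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    ]


print(skyline(HOTELS))  # ['H2', 'H4', 'H6']
```

This naive pairwise comparison is quadratic in the number of objects, which is enough to reproduce the small example.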
Graph Query
• Two categories of graph queries (q: a query, D = {g1, …, gn}: a GDB):
1. Graph containment search:
   i. Subgraph containment search
      ◦ Retrieve all graphs gi of D such that q ⊆ gi
   ii. Supergraph containment search
      ◦ Retrieve all graphs gi of D such that q ⊇ gi
2. Graph similarity search:
   Retrieve the graphs that are structurally similar to the query graph
Graph Similarity Measures
• Several methods for measuring graph similarity:
  ◦ Edit distance (DistEd)
  ◦ Maximum common subgraph-based distance (DistMcs)
  ◦ Graph union-based distance (DistGu)
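Graph Similarity Measures
Edit distance:
  Distance between g and g':   $\mathrm{DistEd}(g, g') = \min_{s \in E\_op} c(s)$, with $c(s) = \sum_{i=1}^{n} c(e\_op_i)$
  Similarity between g and g': $\mathrm{SimEd}(g, g') = \frac{1}{1 + \mathrm{DistEd}(g, g')}$
Mcs-based distance:
  Distance between g and g':   $\mathrm{DistMcs}(g, g') = 1 - \mathrm{SimMcs}(g, g')$
  Similarity between g and g': $\mathrm{SimMcs}(g, g') = \frac{|Mcs(g, g')|}{\mathrm{Max}(|g|, |g'|)}$
Gu-based distance:
  Distance between g and g':   $\mathrm{DistGu}(g, g') = 1 - \mathrm{SimGu}(g, g')$
  Similarity between g and g': $\mathrm{SimGu}(g, g') = \frac{|Mcs(g, g')|}{|g| + |g'| - |Mcs(g, g')|}$
Tab. 2 – Similarity Measures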
Edit Distance: example
Fig. 3 – Example of labeled graphs g and g'
• Transformation of g into g':
1. deletion of the edge (d, e),
2. re-labeling of the edge (a, d) from 1 to 4,
3. re-labeling of the node d with e,
4. insertion of the edge (a, f) with the label 1.
• With a uniform cost of 1 per edit operation: DistEd(g, g') = 4
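As a small illustration, the cost of the edit script listed above can be summed in Python. This is only a sketch: the operation names in `EDIT_SCRIPT` are hypothetical labels, the uniform cost of 1 per operation is the assumption stated on the slide, and the sum is the cost of this one script (the slide states that it is also the minimum, which the code does not verify). The similarity uses SimEd = 1 / (1 + DistEd) as read from Tab. 2.

```python
# The four edit operations that transform g into g' (labels are illustrative).
EDIT_SCRIPT = [
    ("delete_edge", ("d", "e")),
    ("relabel_edge", ("a", "d")),   # label 1 -> 4
    ("relabel_node", "d"),          # d -> e
    ("insert_edge", ("a", "f")),    # with label 1
]


def script_cost(script, cost=lambda op: 1):
    """Total cost c(s) of an edit script; every operation costs 1 by default."""
    return sum(cost(op) for op in script)


def sim_ed(dist_ed):
    """Edit-distance-based similarity, SimEd = 1 / (1 + DistEd)."""
    return 1 / (1 + dist_ed)


dist = script_cost(EDIT_SCRIPT)
print(dist, sim_ed(dist))  # 4 0.2
```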
Distances based on Mcs and Gu: example
Fig. 4 – Example of labeled graphs g and g'
• Size of the maximum common subgraph: $|Mcs(g, g')| = 4$
• Computation of the Mcs-based distance:
$\mathrm{DistMcs}(g, g') = 1 - \frac{|Mcs(g, g')|}{\mathrm{Max}(|g|, |g'|)} = 0.33$
• Computation of the Gu-based distance:
$\mathrm{DistGu}(g, g') = 1 - \frac{|Mcs(g, g')|}{|g| + |g'| - |Mcs(g, g')|} = 0.50$
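The two distances can be reproduced with a few lines of Python. In this sketch, |Mcs(g, g')| = 4 is taken from the slide, while the graph sizes |g| = |g'| = 6 are an assumption: they are not stated explicitly, but they are the sizes that yield the reported values 0.33 and 0.50. The function names are illustrative.

```python
def dist_mcs(mcs_size: int, size_g: int, size_h: int) -> float:
    """Mcs-based distance: 1 - |Mcs| / max(|g|, |g'|)."""
    return 1 - mcs_size / max(size_g, size_h)


def dist_gu(mcs_size: int, size_g: int, size_h: int) -> float:
    """Graph-union-based distance: 1 - |Mcs| / (|g| + |g'| - |Mcs|)."""
    return 1 - mcs_size / (size_g + size_h - mcs_size)


# Assumed sizes |g| = |g'| = 6 (not given on the slide), |Mcs(g, g')| = 4.
print(round(dist_mcs(4, 6, 6), 2), round(dist_gu(4, 6, 6), 2))  # 0.33 0.5
```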
Graph Similarity Skyline (1/2)
• Graph compound similarity between two graphs: a vector of local distance measures
$GCS(g, g') = (Dist_1(g, g'), Dist_2(g, g'), \ldots, Dist_d(g, g'))$
$GCS(g, g') = (\mathrm{DistEd}(g, g'), \mathrm{DistMcs}(g, g'), \mathrm{DistGu}(g, g'))$
Graph Similarity Skyline (2/2)
• q: a query, D = {g1, …, gn}: a GDB
• For i = 1 to n, compute:
$GCS(g_i, q) = (Dist_1(g_i, q), Dist_2(g_i, q), \ldots, Dist_d(g_i, q))$
• Compare the vectors GCS(gi, q)
• Extract the Graph Similarity Skyline (GSS):
  ◦ Similarity-dominance relation: g dominates g' with respect to q, denoted $g \prec_q g'$, iff
    i. ∀ i ∈ {1, ..., d}, Disti(g, q) ≤ Disti(g', q),
    ii. ∃ k ∈ {1, ..., d}, Distk(g, q) < Distk(g', q).
$GSS(D, q) = \{ g \in D \mid \nexists\, g' \in D,\ g' \prec_q g \}$
Illustrative Example (1/2)
[Fig. 6 – Graph database D = {g1, …, g7} and graph query q]

Pair       |Mcs(gi, q)|
(g1, q)    4
(g2, q)    4
(g3, q)    4
(g4, q)    3
(g5, q)    5
(g6, q)    5
(g7, q)    6
Tab. 3 – Information about |Mcs(gi, q)|
Illustrative Example (2/2)
• Computation of GCS(gi, q), for i = 1 to 7:

Pair       DistEd(gi,q)   DistMcs(gi,q)   DistGu(gi,q)   Dominated by
(g1, q)    4              0.33            0.50
(g2, q)    4              0.43            0.56           g1
(g3, q)    3              0.43            0.56           g5
(g4, q)    2              0.50            0.67
(g5, q)    3              0.38            0.44
(g6, q)    4              0.44            0.50           g1
(g7, q)    4              0.40            0.40
Tab. 4 – Distance Measures

GSS(D, q) = {g1, g4, g5, g7}
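The skyline of this example can be recomputed from the distance vectors of Tab. 4. The sketch below is a naive pairwise check of the similarity-dominance relation in Python; the names `GCS`, `sim_dominates`, and `graph_similarity_skyline` are illustrative, and the code is not claimed to be the authors' implementation.

```python
from typing import Dict, List, Sequence

# GCS(gi, q) = (DistEd, DistMcs, DistGu), copied from Tab. 4.
GCS: Dict[str, Sequence[float]] = {
    "g1": (4, 0.33, 0.50),
    "g2": (4, 0.43, 0.56),
    "g3": (3, 0.43, 0.56),
    "g4": (2, 0.50, 0.67),
    "g5": (3, 0.38, 0.44),
    "g6": (4, 0.44, 0.50),
    "g7": (4, 0.40, 0.40),
}


def sim_dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """True if distance vector u is <= v on every dimension and < on at least one."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def graph_similarity_skyline(gcs: Dict[str, Sequence[float]]) -> List[str]:
    """Graphs whose distance vector is not dominated by that of any other graph."""
    return [
        g
        for g, vec in gcs.items()
        if not any(sim_dominates(other, vec) for h, other in gcs.items() if h != g)
    ]


print(graph_similarity_skyline(GCS))  # ['g1', 'g4', 'g5', 'g7']
```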
Refinement of Graph Similarity Skyline (1/3)
• Problem: the skyline may be large
• Need: k dissimilar answers
• Solution: a diversity criterion
  ◦ Extract a subset S of size k with maximal diversity
  ◦ Provide the user with a global picture of the whole set GSS
Refinement of Graph Similarity Skyline (2/3)
• The diversity of a subset S of size k is:
$Div(S) = (v_1, v_2, v_3)$
$v_i = \min \{ Dist_i(g, g') \mid g, g' \in S,\ g \neq g' \}$
• $v_i$: diversity of the subset S in the i-th dimension, s.t.:
$v_1$: $Dist_1 = \mathrm{DistEd} / (1 + \mathrm{DistEd})$ (edit distance normalized to [0, 1))
$v_2$: $Dist_2 = \mathrm{DistMcs}$
$v_3$: $Dist_3 = \mathrm{DistGu}$
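A minimal Python sketch of the diversity vector follows, assuming that the minimum is taken over pairs of distinct graphs in S. The function names, the graph names "a", "b", "c", and the pairwise distances in `TOY` are made-up values for illustration only; they do not come from the slides.

```python
from itertools import combinations


def normalized_ed(dist_ed: float) -> float:
    """Normalized edit distance used for the first dimension: DistEd / (1 + DistEd)."""
    return dist_ed / (1 + dist_ed)


def diversity(subset, pair_dist):
    """Div(S): per-dimension minimum of the pairwise distance vectors in S.

    pair_dist(g, h) must return the vector (Dist1, ..., Distd) for graphs g and h.
    """
    vectors = [pair_dist(g, h) for g, h in combinations(subset, 2)]
    return tuple(min(column) for column in zip(*vectors))


# Hypothetical pairwise distances (normalized DistEd, DistMcs, DistGu) for three toy graphs.
TOY = {
    frozenset(("a", "b")): (normalized_ed(3), 0.5, 0.6),
    frozenset(("a", "c")): (normalized_ed(1), 0.7, 0.4),
    frozenset(("b", "c")): (normalized_ed(4), 0.2, 0.9),
}


def toy_dist(g, h):
    return TOY[frozenset((g, h))]


print(diversity(["a", "b", "c"], toy_dist))  # (0.5, 0.2, 0.4)
```

Taking the minimum means that a subset is only as diverse as its two closest members on each dimension.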
Refinement of Graph Similarity Skyline (3/3)
• Refinement algorithm:
1. For j = 1 to $C^{k}_{|GSS|}$, enumerate the subsets $S_j \subseteq GSS$ with $|S_j| = k$
2. For i = 1 to d, rank all $S_j$ in decreasing order of their diversity $v_i$
   Let $r_i(S_j)$ be the rank of $S_j$ w.r.t. the i-th dimension:
   ◦ $r_i(S_j) = 1$: the best diversity value
   ◦ $r_i(S_j) = M$: the worst diversity value
3. Evaluate $S_j$ by: $val(S_j) = \sum_{i=1}^{d} r_i(S_j)$
4. Extract $S^*$ such that $val(S^*) = \min_{S_j} val(S_j)$
Illustrative Example
• Return the 2 best graphs:
[Fig. 8 – The skyline GSS: g1, g4, g5, g7]
Subset           v1     r1   v2     r2   v3     r3   Val(Si)
S1 = {g1, g4}    0.86   2    0.67   2    0.80   1    5
S2 = {g1, g5}    0.83   3    0.50   5    0.60   6    14
S3 = {g1, g7}    0.87   1    0.60   4    0.67   4    9
S4 = {g4, g5}    0.80   4    0.62   3    0.73   3    10
S5 = {g4, g7}    0.83   3    0.70   1    0.77   2    6
S6 = {g5, g7}    0.75   5    0.50   5    0.61   5    15
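The ranking step can be replayed on the diversity values of the six candidate subsets. The Python sketch below assumes dense ranking, in which equal diversity values share a rank and the next distinct value gets the next rank; this is inferred from the ranks shown for the example (the two subsets with v1 = 0.83 both have rank 3 and 0.80 has rank 4) and is not stated explicitly on the slides. The names `DIVERSITY`, `dense_ranks`, and `evaluate` are illustrative.

```python
# Diversity vectors (v1, v2, v3) of the candidate subsets of size 2.
DIVERSITY = {
    "S1": (0.86, 0.67, 0.80),  # {g1, g4}
    "S2": (0.83, 0.50, 0.60),  # {g1, g5}
    "S3": (0.87, 0.60, 0.67),  # {g1, g7}
    "S4": (0.80, 0.62, 0.73),  # {g4, g5}
    "S5": (0.83, 0.70, 0.77),  # {g4, g7}
    "S6": (0.75, 0.50, 0.61),  # {g5, g7}
}


def dense_ranks(values):
    """Rank 1 for the largest value; equal values share a rank (dense ranking)."""
    order = sorted(set(values), reverse=True)
    return [order.index(v) + 1 for v in values]


def evaluate(div):
    """val(S_j): sum over all dimensions of the rank of S_j in that dimension."""
    names = list(div)
    rank_columns = [dense_ranks(list(col)) for col in zip(*div.values())]
    return {name: sum(ranks[j] for ranks in rank_columns) for j, name in enumerate(names)}


val = evaluate(DIVERSITY)
best = min(val, key=val.get)
print(val)   # {'S1': 5, 'S2': 14, 'S3': 9, 'S4': 10, 'S5': 6, 'S6': 15}
print(best)  # S1
```

Under this assumption the subset with the smallest value is S1 = {g1, g4}, with Val = 5.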
Summary and Outlook
• Skyline approach for searching graphs by similarity
  ◦ Extraction of all DB graphs that are not dominated by any other graph
  ◦ Preservation of the information about the similarity on different features
• Selection of the subset of graphs with maximal diversity from the skyline
• Implementation: a step towards demonstrating the effectiveness of the approach on a real database
• Investigation of other similarity measures
Thank you
Questions?