oMAP: An Implemented Framework for Automatically Aligning OWL Ontologies Raphaël Troncy, Umberto Straccia ISTI-CNR [email protected] SWAP, December, 2005

Outline
• Motivations
• oMAP
• A formal framework
• The different classifiers used
• Evaluation
• Conclusion
Motivations
• Heterogeneity of information systems
• Ontologies as a solution to data heterogeneity on
the Web
• Ontologies are themselves heterogeneous:
• knowledge representation language
• degree of formalization
• Semantic Web
• More and more OWL/RDF ontologies on the Web
• Need for comparing/reusing/merging ontologies
• partially covering the same domain
• different versions of the same ontology
Motivations (cont.)
• Distributed Information Retrieval
• Resource selection: The agent has to select a subset of some
relevant resources
• Query reformulation: For every selected resource, the agent has
to re-formulate its information need accordingly
• Data fusion & rank aggregation: The results from the selected
resources must finally be merged.
Aligning Ontologies
• A matching operator:
• Input: a set of discrete entities (tables, XML elements,
classes, properties…)
• Output:
• relationship holding between the entities (subsumption,
equivalence, disjointness…)
• a confidence measure v ∈ [0, 1]
• Automatic vs manual techniques
• Numerous works from various communities
• schema matching, machine learning, data integration
Example
(Figure: example mappings illustrating equivalence, subsumption, and disjointness)
oMAP: A Formal Framework
• Inspirations:
• Formal work in data exchange [Fagin et al., 2003]
• GLUE: combining several specialized components for
finding the best set of mappings [Doan et al., 2003]
• Notations:
• A mapping is a tuple: M = (T, S, ∑)
• S and T are the source and target ontologies
• Si is an OWL entity (class, datatype property, object property)
of the ontology
• ∑ is a set of mapping rules: αij Tj ← Si
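The notation above can be mirrored as a small data structure. This is a minimal sketch only: the class names `MappingRule` and `Mapping`, the field names, and the example URIs are illustrative assumptions, not part of oMAP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingRule:
    """One weighted rule alpha_ij T_j <- S_i (hypothetical representation)."""
    source: str    # S_i, an entity of the source ontology
    target: str    # T_j, an entity of the target ontology
    weight: float  # alpha_ij, a confidence in [0, 1]

    def __post_init__(self):
        if not 0.0 <= self.weight <= 1.0:
            raise ValueError("weight must lie in [0, 1]")


@dataclass(frozen=True)
class Mapping:
    """The tuple M = (T, S, Sigma)."""
    target_entities: frozenset
    source_entities: frozenset
    rules: frozenset  # Sigma, a set of MappingRule


# Illustrative entity names, not taken from any real ontology.
sigma = frozenset({MappingRule("src:Conference", "tgt:Meeting", 0.8)})
m = Mapping(frozenset({"tgt:Meeting"}), frozenset({"src:Conference"}), sigma)
```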
oMAP: Overall Strategy
• A three step process:
• Form possible ∑ sets and estimate the quality of each one based
on the quality measures for its mapping rules
• For each mapping rule Tj ← Si, estimate its
confidence αij which also depends on the ∑ it
belongs to
• Use heuristics to iteratively build the final set of
mappings
oMAP: Combining Classifiers
• Weight of a mapping rule:
• αij = w (Si,Tj, ∑)
• Using different classifiers:
• w (Si,Tj,CLk) is the classifier's approximation of the
rule Tj ← Si
• Combining the approximations:
• Use of a priority list: CL1 ≺ CL2 ≺ … ≺ CLn
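One possible reading of the priority list is sketched here. The slide does not say how the list is applied, so the rule used below (take the weight of the first classifier, in priority order, that returns a non-zero value) is an assumption, as are the function names and the two toy classifiers.

```python
def combine(source, target, classifiers):
    """Return the weight of the first classifier giving a non-zero weight.

    `classifiers` is ordered by priority; this selection rule is an
    assumption made for illustration.
    """
    for classify in classifiers:
        weight = classify(source, target)
        if weight > 0.0:
            return weight
    return 0.0


def same_name(s, t):
    # Toy classifier: exact name match.
    return 1.0 if s == t else 0.0


def same_name_ignoring_case(s, t):
    # Toy lower-priority classifier with an arbitrary weight of 0.5.
    return 0.5 if s.lower() == t.lower() else 0.0


priority_list = [same_name, same_name_ignoring_case]
```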
Terminological Classifiers
• Same entity names (or URI)
$$w(S_i, T_j, CL_N) = \begin{cases} 1 & \text{if } S_i, T_j \text{ have the same name,} \\ 0 & \text{otherwise} \end{cases}$$
• Same entity name stems
$$w(S_i, T_j, CL_S) = \begin{cases} 1 & \text{if } S_i, T_j \text{ have the same stem,} \\ 0 & \text{otherwise} \end{cases}$$
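The two binary classifiers above can be sketched as follows. The slide does not name a stemming algorithm, so `naive_stem` is a crude, hypothetical stand-in for a real stemmer.

```python
def naive_stem(name):
    """Crude stand-in for a stemmer: lowercase and strip a few suffixes."""
    word = name.lower()
    for suffix in ("ing", "es", "s"):
        # Only strip when a stem of at least three characters remains.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def w_name(s, t):
    """CL_N: 1 if both entities have the same name, 0 otherwise."""
    return 1.0 if s == t else 0.0


def w_stem(s, t):
    """CL_S: 1 if both entity names share the same stem, 0 otherwise."""
    return 1.0 if naive_stem(s) == naive_stem(t) else 0.0
```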
Terminological Classifiers
• String distance name
$$w(S_i, T_j, CL_{LD}) = \frac{dist_{Levenshtein}(S_i, T_j)}{\max(length(S_i), length(T_j))}$$
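A minimal sketch of this ratio, using the standard dynamic-programming edit distance. Function names are illustrative. Computed exactly as written on the slide, the value is 0 for identical names and grows with dissimilarity.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def w_levenshtein(s, t):
    """Levenshtein distance divided by the length of the longer name."""
    longest = max(len(s), len(t))
    if longest == 0:
        return 0.0  # two empty names: nothing to compare
    return levenshtein(s, t) / longest
```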
• WordNet distance name
$$w(S_i, T_j, CL_{WN}) = \begin{cases} 1 & \text{if } S_i, T_j \text{ are synonyms,} \\ \max\left(sim, \frac{2 \cdot lcs}{length(S_i) + length(T_j)}\right) & \text{otherwise} \end{cases}$$
• lcs is the longest common substring between Si and Tj
• sim = $$\frac{|synonym(S_i) \cap synonym(T_j)|}{|synonym(S_i) \cup synonym(T_j)|}$$
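The WordNet-based weight can be sketched without WordNet itself. In this illustration the `SYNONYMS` table is hypothetical toy data standing in for a WordNet lookup, lcs is taken as the length of the longest common substring, and sim is computed as the overlap ratio of the two synonym sets.

```python
from difflib import SequenceMatcher

# Hypothetical toy synonym sets; a real system would query WordNet.
SYNONYMS = {
    "author": {"author", "writer"},
    "writer": {"writer", "author"},
    "paper": {"paper", "article"},
    "report": {"report", "article"},
}


def synonyms(name):
    """Synonym set of a name; unknown names are only synonyms of themselves."""
    key = name.lower()
    return SYNONYMS.get(key, {key})


def lcs_length(a, b):
    """Length of the longest common substring of a and b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return matcher.find_longest_match(0, len(a), 0, len(b)).size


def w_wordnet(s, t):
    syn_s, syn_t = synonyms(s), synonyms(t)
    if t.lower() in syn_s or s.lower() in syn_t:
        return 1.0
    sim = len(syn_s & syn_t) / len(syn_s | syn_t)
    overlap = 2 * lcs_length(s.lower(), t.lower()) / (len(s) + len(t))
    return max(sim, overlap)
```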
Machine Learning-Based Classifiers
• Collecting individuals:
• label for the named individuals
• data value for the datatype properties
• type for the anonymous individuals and the
range of object properties
• Recursion on the OWL definition:
• depth parameter
Machine Learning-Based Classifiers
• Example
Individual (x1 type (Conference)
value (label "Int. Conf. on WISE") value (location x2) )
Individual (x2 type (Address)
value (city "New York city") value (country "USA") )
u1 = ("Int. Conf. on WISE", "Address")
u2 = ("Address", "New York City", "USA")
• Naïve Bayes text classifier
$$w(S_i, T_j, CL_{NB}) = \Pr(S_i) \cdot \prod_{(x,u) \in T_j} \prod_{m \in u} \Pr(m \mid S_i)$$
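A minimal sketch of the Naïve Bayes weight, computed in log space to avoid underflow. The slide does not describe tokenization or smoothing, so splitting values into lowercase tokens, the add-one smoothing, and all names below are assumptions. Every source entity is assumed to have at least one training tuple.

```python
import math
from collections import Counter


def train(source_docs):
    """source_docs maps each source entity S_i to a list of token lists u."""
    total_docs = sum(len(docs) for docs in source_docs.values())
    vocab = {m for docs in source_docs.values() for u in docs for m in u}
    model = {}
    for entity, docs in source_docs.items():
        counts = Counter(m for u in docs for m in u)
        prior = len(docs) / total_docs  # estimate of Pr(S_i)
        model[entity] = (prior, counts, sum(counts.values()))
    return model, vocab


def log_weight(model, vocab, entity, target_docs):
    """log of Pr(S_i) times the product over (x, u) in T_j, m in u of Pr(m | S_i)."""
    prior, counts, total = model[entity]
    score = math.log(prior)
    for u in target_docs:
        for m in u:
            # Add-one smoothing (an assumption) keeps unseen tokens non-zero.
            score += math.log((counts[m] + 1) / (total + len(vocab) + 1))
    return score


source = {
    "Conference": [["int", "conf", "wise"], ["conf", "www"]],
    "Address": [["new", "york", "usa"]],
}
model, vocab = train(source)
target = [["conf", "wise"]]  # tokens collected for one target entity T_j
```

With this toy data, `log_weight(model, vocab, "Conference", target)` is larger than the score for `"Address"`, so the rule from `Conference` would receive the higher weight.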
Structural and Semantics-Based Classifier
• If Si and Tj are property names:
$$w(S_i, T_j, \Sigma) = \begin{cases} 0 & \text{if } T_j \leftarrow S_i \notin \Sigma \\ w'(S_i, T_j, \Sigma) & \text{otherwise} \end{cases}$$
• If Si and Tj are concept names¹:
$$w(S_i, T_j, \Sigma) = \begin{cases} 0 & \text{if } T_j \leftarrow S_i \notin \Sigma \\ w'(S_i, T_j, \Sigma) & \text{if } |D| = 0 \text{ and } T_j \leftarrow S_i \in \Sigma \\ \frac{1}{|Set| + 1} \left( w'(S_i, T_j, \Sigma) + \max_{Set} \sum_{(C_i, D_j) \in Set} w(C_i, D_j, \Sigma) \right) & \text{otherwise} \end{cases}$$
¹ Where D = D(Si) × D(Tj); D(Si) represents the set of direct parent concepts of Si
Structural and Semantics-Based Classifier
• Let CS = (Q R.C) and DT = (Q' R'.D), then¹:
$$w(C_S, D_T, \Sigma) = w_Q(Q, Q') \cdot w(R, R', \Sigma) \cdot w(C, D, \Sigma)$$
• Let CS = (op C1 … Cm) and DT = (op' D1 … Dn), then²:
$$w(C_S, D_T, \Sigma) = w_{op}(op, op') \cdot \frac{\max_{Set} \sum_{(C_i, D_j) \in Set} w(C_i, D_j, \Sigma)}{\min(m, n)}$$
¹ Where Q, Q' are quantifiers, R, R' are property names and C, D are concept expressions
² Where op, op' are concept constructors and n, m ≥ 1
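The second rule on this slide, for concept constructors, can be sketched as follows. The slide does not define Set, so treating it as a one-to-one pairing between the operands of the two expressions is an assumption. The brute-force search over pairings is only practical for a handful of operands, and all names are illustrative.

```python
from itertools import permutations


def best_pairing_sum(cs, ds, w):
    """Max over one-to-one pairings Set of the sum of w(C_i, D_j)."""
    if len(cs) > len(ds):
        # Keep the shorter operand list first; swap the argument order of w.
        return best_pairing_sum(ds, cs, lambda d, c: w(c, d))
    return max(
        sum(w(c, d) for c, d in zip(cs, chosen))
        for chosen in permutations(ds, len(cs))
    )


def w_constructor(op_weight, cs, ds, w):
    """w_op(op, op') * max_Set sum w(C_i, D_j) / min(m, n), with m, n >= 1."""
    return op_weight * best_pairing_sum(cs, ds, w) / min(len(cs), len(ds))


def exact(c, d):
    # Toy operand weight: 1 for identical concept names, 0 otherwise.
    return 1.0 if c == d else 0.0
```

For example, `w_constructor(1.0, ["A", "B"], ["B", "A", "C"], exact)` pairs A with A and B with B, giving 2 / min(2, 3) = 1.0.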
Structural and Semantics-Based Classifier
• Possible values for wop and wQ weights

wop   ⊓     ⊔     ¬
⊓     1     1/4   0
⊔           1     0
¬                 1

wQ    ∃     ∀
∃     1     1/4
∀           1

wQ    ≤n    ≥n
≤m    1     1/3
≥m          1
Evaluation
• More and more techniques / tools for aligning
ontologies
• difficult to compare all the approaches theoretically
• pragmatism: evaluation campaign and contest
• I3CON: based on the NIST Text Retrieval Conference model
• EON: systematic benchmark tests on all OWL constructs
• OAEI: http://oaei.inrialpes.fr
• Alignment API [Euzenat, ISWC 2004]
• common format for representing / exchanging the
alignments found
• tools and metrics for evaluating these alignments
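Evaluation of this kind usually compares a found alignment with a reference alignment using precision, recall, and F-measure. The sketch below is not the Alignment API; it represents an alignment as a plain set of (source, target) pairs, which is an assumption, and ignores confidence values.

```python
def evaluate(found, reference):
    """Precision, recall and F-measure of a found alignment."""
    found, reference = set(found), set(reference)
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Illustrative pairs only.
found = {("a", "x"), ("b", "y"), ("c", "z")}
reference = {("a", "x"), ("b", "y"), ("d", "w"), ("e", "v")}
precision, recall, f_measure = evaluate(found, reference)
```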
• 3 series of tests on bibliographic ontologies:
• simple tests: identity, specialization/generalization of the
language
• systematic tests: some features of the initial ontology
are progressively discarded
• complex tests: aligning 4 real ontologies available on
the Web
• The directory real-world case consists of
aligning web site directories using a large
dataset
Conclusion
• oMAP: a formal framework for automatically
aligning OWL ontologies
• Combining several specific classifiers
• terminological classifiers
• machine learning-based classifiers
• structural and semantics-based classifier
• Empirical evaluation on benchmark tests
• using traditional information retrieval metrics
• machine resources, memory, computation time…
not yet considered
Future Work
• Alignment:
• Using additional classifiers:
• kNN, KL-distance, WordNet or other terminological
resources
• straightforward theoretically but practically difficult
• Finding complex alignments
• name = firstName + lastName
• Distributed Information Retrieval
• Automated relevant resource selection
Useful Links
• oMAP: http://homepages.cwi.nl/~troncy/oMAP/
• Tutorial: Schema and Ontology Matching @ ESWC
http://dit.unitn.it/~accord/Presentations/ESWC'05-MatchingHandOuts.pdf
• Alignment API: http://co4.inrialpes.fr/align/align.html
• OAEI: http://oaei.inrialpes.fr/
• State of the Art:
• P. Shvaiko and J. Euzenat: A Survey of Schema-based Matching
Approaches. Journal on Data Semantics (JoDS), 2005
• KW Consortium: State of the Art on Ontology Alignment. Knowledge
Web D2.2.3, 2004