oMAP: An Implemented Framework for Automatically Aligning OWL Ontologies
Raphaël Troncy, Umberto Straccia
ISTI-CNR
[email protected]
SWAP, December 2005
Outline
• Motivations
• oMAP
• A formal framework
• The different classifiers used
• Evaluation
• Conclusion

Motivations
• Heterogeneity of information systems
• Ontologies as a solution to data heterogeneity on the Web
• Ontologies are themselves heterogeneous:
  • knowledge representation language
  • degree of formalization
• Semantic Web
• More and more OWL/RDF ontologies on the Web
• Need for comparing/reusing/merging ontologies
  • partially covering the same domain
  • different versions of the same ontology

Motivations (cont.)
• Distributed Information Retrieval
  • Resource selection: the agent has to select a subset of relevant resources
  • Query reformulation: for every selected resource, the agent has to reformulate its information need accordingly
  • Data fusion & rank aggregation: the results from the selected resources finally have to be merged together
Aligning Ontologies
• A matching operator:
  • Input: a set of discrete entities (tables, XML elements, classes, properties…)
  • Output:
    • relationship holding between the entities (subsumption, equivalence, disjointness…)
    • a confidence measure v ∈ [0, 1]
• Automatic vs manual techniques
• Numerous works from various communities
  • schema matching, machine learning, data integration

Example
[Figure: example alignment illustrating equivalence, subsumption and disjointness relations]

oMAP: A Formal Framework
• Inspirations:
  • Formal work in data exchange [Fagin et al., 2003]
  • GLUE: combining several specialized components for finding the best set of mappings [Doan et al., 2003]
• Notations:
  • A mapping is a tuple: M = (T, S, Σ)
  • S and T are the source and target ontologies
  • Si is an OWL entity (class, datatype property, object property) of the ontology
  • Σ is a set of mapping rules: αij Tj ← Si

oMAP: Overall Strategy
• A three-step process:
  • Form possible Σ sets and estimate their quality based on the quality measures for their mapping rules
  • For each mapping rule Tj ← Si, estimate its confidence αij, which also depends on the Σ it belongs to
  • Use heuristics to iteratively build the final set of mappings

oMAP: Combining Classifiers
• Weight of a mapping rule:
  • αij = w(Si, Tj, Σ)
• Using different classifiers:
  • w(Si, Tj, CLk) is the classifier's approximation of the rule Tj ← Si
• Combining the approximations:
  • Use of a priority list: CL1, CL2, …, CLn

Terminological Classifiers
• Same entity names (or URI)
\[ w(S_i, T_j, CL_N) = \begin{cases} 1 & \text{if } S_i, T_j \text{ have the same name} \\ 0 & \text{otherwise} \end{cases} \]
• Same entity name stems
\[ w(S_i, T_j, CL_S) = \begin{cases} 1 & \text{if } S_i, T_j \text{ have the same stem} \\ 0 & \text{otherwise} \end{cases} \]

Terminological Classifiers
• String distance name
\[ w(S_i, T_j, CL_{LD}) = \frac{dist_{Levenshtein}(S_i, T_j)}{\max(length(S_i), length(T_j))} \]
• WordNet distance name
\[ w(S_i, T_j, CL_{WN}) = \begin{cases} 1 & \text{if } S_i, T_j \text{ are synonyms} \\ \max\left(sim, \frac{2 \cdot |lcs|}{length(S_i) + length(T_j)}\right) & \text{otherwise} \end{cases} \]
• lcs is the longest common substring between Si and Tj
• sim is the overlap between the synonym sets:
\[ sim = \frac{|synonym(S_i) \cap synonym(T_j)|}{|synonym(S_i) \cup synonym(T_j)|} \]

Machine Learning-Based Classifiers
• Collecting individuals:
  • label for the named individuals
  • data value for the datatype properties
  • type for the anonymous individuals and the range of object properties
• Recursion on the OWL definition:
  • depth parameter

Machine Learning-Based Classifiers
• Example
  Individual (x1 type (Conference)
    value (label "Int. Conf. on WISE")
    value (location x2) )
  Individual (x2 type (Address)
    value (city "New York City")
    value (country "USA") )
  u1 = ("Int. Conf. on WISE", "Address")
  u2 = ("Address", "New York City", "USA")
• Naïve Bayes text classifier
\[ w(S_i, T_j, CL_{NB}) = \Pr(S_i) \cdot \prod_{(x,u) \in T_j,\; m \in u} \Pr(m \mid S_i) \]

Structural and Semantics-Based Classifier
• If Si and Tj are property names:
\[ w(S_i, T_j, \Sigma) = \begin{cases} 0 & \text{if } T_j \leftarrow S_i \notin \Sigma \\ w'(S_i, T_j, \Sigma) & \text{otherwise} \end{cases} \]
• If Si and Tj are concept names¹:
\[ w(S_i, T_j, \Sigma) = \begin{cases} 0 & \text{if } T_j \leftarrow S_i \notin \Sigma \\ w'(S_i, T_j, \Sigma) & \text{if } D = \emptyset \text{ and } T_j \leftarrow S_i \in \Sigma \\ \frac{1}{|Set| + 1} \left( w'(S_i, T_j, \Sigma) + \max_{Set} \sum_{(C_i, D_j) \in Set} w(C_i, D_j, \Sigma) \right) & \text{otherwise} \end{cases} \]
¹ Where D = D(Si) × D(Tj); D(Si) represents the set of concepts that are direct parents of Si

Structural and Semantics-Based Classifier
• Let CS = (Q R.C) and DT = (Q' R'.D), then¹:
\[ w(C_S, D_T, \Sigma) = w_Q(Q, Q') \cdot w(R, R', \Sigma) \cdot w(C, D, \Sigma) \]
• Let CS = (op C1…Cm) and DT = (op' D1…Dn), then²:
\[ w(C_S, D_T, \Sigma) = w_{op}(op, op') \cdot \frac{\max_{Set} \sum_{(C_i, D_j) \in Set} w(C_i, D_j, \Sigma)}{\min(m, n)} \]
¹ Where Q, Q' are quantifiers, R, R' are property names and C, D are concept expressions
² Where op, op' are concept constructors and n, m ≥ 1

Structural and Semantics-Based Classifier
• Possible values for wop and wQ weights

  wop   ⊓     ⊔     ¬
  ⊓     1     1/4   0
  ⊔           1     0
  ¬                 1

  wQ: [table garbled in the transcript; surviving entries: n, n, 1, 1/4, m, 1, m, 1, 1/3, 1]

Evaluation
• More and more techniques / tools for aligning ontologies
  • difficult to compare all the approaches theoretically
  • pragmatism: evaluation campaign and contest
    • I3CON: based on the NIST Text Retrieval Conference model
    • EON: systematic benchmark tests on all OWL constructs
    • OAEI: http://oaei.inrialpes.fr
• Alignment API [Euzenat, ISWC 2004]
  • common format for
representing / exchanging the alignments found
  • tools and metrics for evaluating these alignments

• 3 series of tests on bibliographic ontologies:
  • simple tests: identity, specialization/generalization of the language
  • systematic tests: some features of the initial ontology are progressively discarded
  • complex tests: aligning 4 real ontologies available on the Web
• The directory real-world case consists of aligning web site directories using a large dataset

Conclusion
• oMAP: a formal framework for automatically aligning OWL ontologies
• Combining several specific classifiers
  • terminological classifiers
  • machine learning-based classifiers
  • structural and semantics-based classifier
• Empirical evaluation on benchmark tests
  • using traditional information retrieval metrics
  • machine resources, memory, computation time… not yet considered

Future Work
• Alignment:
  • Using additional classifiers:
    • kNN, KL-distance, WordNet or other terminological resources
    • straightforward theoretically but practically difficult
  • Finding complex alignments
    • name = firstName + lastName
• Distributed Information Retrieval
  • Automated relevant resource selection

Useful Links
• oMAP: http://homepages.cwi.nl/~troncy/oMAP/
• Tutorial: Schema and Ontology Matching @ ESWC http://dit.unitn.it/~accord/Presentations/ESWC'05-MatchingHandOuts.pdf
• Alignment API: http://co4.inrialpes.fr/align/align.html
• OAEI: http://oaei.inrialpes.fr/
• State of the Art:
  • P. Shvaiko and J. Euzenat: A Survey of Schema-based Matching Approaches. Journal on Data Semantics (JoDS), 2005
  • KW Consortium: State of the Art on Ontology Alignment. Knowledge Web D2.2.3, 2004
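
Appendix sketch: terminological classifiers
The terminological classifiers presented earlier in the talk (same name, Levenshtein string distance, and the non-synonym branch of the WordNet classifier) can be illustrated with a minimal Python sketch. This is not the oMAP implementation. All function names are hypothetical, the synonym sets are passed in by the caller as a stand-in for a real WordNet lookup, and the test for "are synonyms" (one name appearing in the other's synonym set) is an assumption. The Levenshtein weight is computed exactly as the formula on the slide is written, that is, as a normalized distance.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common (contiguous) substring of a and b."""
    best = 0
    previous = [0] * (len(b) + 1)
    for ca in a:
        current = [0]
        for j, cb in enumerate(b, start=1):
            run = previous[j - 1] + 1 if ca == cb else 0
            current.append(run)
            best = max(best, run)
        previous = current
    return best


def w_same_name(s: str, t: str) -> float:
    """CL_N: 1 if both entity names are identical, 0 otherwise."""
    return 1.0 if s == t else 0.0


def w_levenshtein(s: str, t: str) -> float:
    """CL_LD as written on the slide: edit distance over the longer length."""
    longest = max(len(s), len(t))
    if longest == 0:
        return 0.0  # assumption: two empty names give a ratio of 0
    return levenshtein(s, t) / longest


def w_wordnet(s: str, t: str, syn_s: set, syn_t: set) -> float:
    """CL_WN with caller-supplied synonym sets (hypothetical WordNet stand-in)."""
    if t in syn_s or s in syn_t:  # assumption: this is what "synonyms" means
        return 1.0
    union = syn_s | syn_t
    sim = len(syn_s & syn_t) / len(union) if union else 0.0
    total = len(s) + len(t)
    overlap = 2 * lcs_length(s, t) / total if total else 0.0
    return max(sim, overlap)
```

For example, w_levenshtein("kitten", "sitting") evaluates to 3/7, and w_wordnet("firstName", "lastName", set(), set()) falls back to the substring ratio 2 * 6 / (9 + 8), since the two names share the substring "stName".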