Transcript Slide 1

The MOMIS-STASIS approach for
Ontology-Based Data Integration
D. Beneventano, M. Orsini, L. Po, S. Sorrentino
DII, University of Modena and Reggio Emilia
A project co-funded by the Project Partners and the European Commission
Outline
Introduction
Ontology Based Data Integration
MOMIS & STASIS
The goal of the MOMIS-STASIS approach
Semantic Link Generation
Global Schema Generation
The MOMIS-STASIS architecture
An Application Example
Future Work
Conclusion
STASIS
Ontology Based Data Integration
Data integration: to combine data residing at distributed heterogeneous
sources
Integration System: wrapper/mediator architecture based on a Global
Virtual Schema (Global Virtual View - GVV) and a set of data sources
The data sources contain the real data
The GVV provides a reconciled, integrated, and virtual view of the
underlying sources
Mappings between the sources and the GVV
Ontologies can be used in the integration task to describe the semantics of
the information sources
Ontology-Based Data Integration: use of ontologies to effectively combine
data coming from multiple heterogeneous sources
MOMIS & STASIS
MOMIS (Mediator EnvirOnment for Multiple Information Sources) is a Data
Integration System which performs information extraction and integration
from both structured and semi-structured data sources
single ontology approach: the lexical ontology WordNet (WN) is used as
a shared vocabulary for:
• the specification of the semantics of data sources
• the identification and association of semantically corresponding
information concepts
STASIS is a comprehensive application suite which allows enterprises to
simplify the mapping process between data schemas based on semantics
Ontology-driven Semantic Mapping: identification of mappings between
concepts of different schemas based on the annotation of the schemas with
respect to a set of ontologies (multiple ontology approach)
The goal of the MOMIS-STASIS approach
To combine the MOMIS and STASIS frameworks to obtain an effective
approach for Ontology-Based Data Integration
extension of the MOMIS system by using the Ontology-driven Semantic
Mapping framework of STASIS:
• enabling the MOMIS system to employ generic OWL ontologies,
removing the limitation of using only the WordNet lexical
ontology
• developing a new method to compute semantic mappings among
source schemas in the MOMIS system.
Macro-steps of the MOMIS-STASIS approach:
Semantic Link Generation (STASIS)
Global Schema Generation (MOMIS)
Semantic Link Generation
Easy-to-use GUI allowing users
to identify semantic elements in an easy way
to create mappings by considering the meaning of elements rather than
their syntactical structure
Distributed registry and repository network:
intelligent mapping suggestions by reusing mapping information from
earlier semantic links
Ontology-driven Semantic Mapping definition
mappings between entities of different schemas based on annotations
linking the entities with concepts of an ontology
Semantic Link Generation
The Semantic Link Generation process is composed of 3 main steps:
1- obtaining a neutral schema representation
2- local source annotation
3- semantic mapping discovery
Semantic Link Generation
Step 1. Obtaining a neutral schema representation
Local schemas are described by a unified data model called Logical Data
Model (LDM). It allows the representation of the following semantic entities:
classes (or concepts), relationships (or object properties), and attributes (or
data-type properties); classes are organized in an is-a hierarchy
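As an illustration only, the three kinds of LDM semantic entities and the is-a hierarchy could be sketched as plain data structures. The class names `LdmClass` and `LdmRelationship`, their fields, and the schema fragment below are assumptions made for this sketch; they are not the actual STASIS data model.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LdmClass:
    """A class (concept) of a local schema, e.g. a relational table."""
    name: str
    parent: Optional["LdmClass"] = None                   # is-a link to the superclass
    attributes: list[str] = field(default_factory=list)   # data-type properties

    def ancestors(self) -> list[str]:
        """Walk the is-a hierarchy upwards and return the superclass names."""
        names, current = [], self.parent
        while current is not None:
            names.append(current.name)
            current = current.parent
        return names


@dataclass
class LdmRelationship:
    """A relationship (object property) between two classes."""
    name: str
    source: LdmClass
    target: LdmClass


# Hypothetical schema fragment, not taken from the slides.
address = LdmClass("ADDRESS", attributes=["street", "city"])
billing = LdmClass("BILLING_ADDRESS", parent=address)
order = LdmClass("PURCHASE_ORDER", attributes=["order_id", "date"])
billed_to = LdmRelationship("billed_to", order, billing)
```

Keeping the is-a link as a single `parent` reference assumes single inheritance; a real model may need to allow several superclasses.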
Step 2. Local Source Annotation
Semantics of the data expressed by semantic correspondences between
the schema and ontologies. Semantic entities need to be annotated with
respect to one or more ontologies.
An annotation element is a tuple < SE, R, C>
SE is a semantic entity of the schema
C is a concept of the ontology
R is the semantic relationship between SE and C:
equivalence (AR_EQUIV)
more general (AR_SUP)
less general (AR_SUB)
disjointness (AR_DISJ)
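A minimal sketch of an annotation element as a typed tuple follows. The type names, the use of plain strings for SE and C, and the example entity and concept identifiers are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class AnnotationRelation(Enum):
    """The four relationships R listed on the slide."""
    AR_EQUIV = "equivalence"
    AR_SUP = "more general"
    AR_SUB = "less general"
    AR_DISJ = "disjointness"


@dataclass(frozen=True)
class Annotation:
    """An annotation element <SE, R, C>."""
    semantic_entity: str          # SE: entity of the local schema
    relation: AnnotationRelation  # R: how SE relates to C
    concept: str                  # C: concept of the ontology


# Hypothetical entity and concept names, for illustration only.
a = Annotation("L1.PURCHASE_ORDER", AnnotationRelation.AR_EQUIV, "po:PurchaseOrder")
```

Making the tuple immutable (`frozen=True`) lets annotations be stored in sets or used as dictionary keys, which is convenient when the same entity is annotated against several ontologies.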
Semantic Link Generation
Step 3. Semantic Mapping Discovery
Based on the annotation made with respect to the ontologies and on the
logic relationships identified between these aligned ontologies, reasoning
can identify correspondences among the semantic entities.
Semantic mappings between entities of two source schemas (called
semantic links, SL):
equivalence (EQUIV)
more general (SUP)
less general (SUB)
disjointness (DISJ)
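To make the idea of deriving a semantic link from annotations concrete, here is a deliberately simplified sketch. It covers only the case where both entities are annotated against the same concept C and applies a fixed composition table. STASIS itself relies on reasoning over the aligned ontologies, which this sketch does not attempt. The function name `derive_link` and the reading "SE is R with respect to C" are assumptions.

```python
from typing import Optional

# (relation of SE1 to C, relation of SE2 to C) -> relation of SE1 to SE2.
# Pairs that do not appear here allow no conclusion.
_COMPOSE = {
    ("EQUIV", "EQUIV"): "EQUIV",
    ("EQUIV", "SUP"): "SUB",    # SE1 = C and SE2 is broader than C
    ("EQUIV", "SUB"): "SUP",    # SE1 = C and SE2 is narrower than C
    ("EQUIV", "DISJ"): "DISJ",
    ("SUP", "EQUIV"): "SUP",
    ("SUB", "EQUIV"): "SUB",
    ("DISJ", "EQUIV"): "DISJ",
    ("SUB", "SUP"): "SUB",      # SE1 within C, C within SE2
    ("SUP", "SUB"): "SUP",      # SE2 within C, C within SE1
    ("SUB", "DISJ"): "DISJ",    # SE1 within C, SE2 disjoint from C
    ("DISJ", "SUB"): "DISJ",
}


def derive_link(r1: str, c1: str, r2: str, c2: str) -> Optional[str]:
    """Return the semantic link from SE1 to SE2, or None if none follows."""
    if c1 != c2:
        return None  # different concepts would need ontology reasoning
    return _COMPOSE.get((r1, r2))
```

For example, two entities that are both less general than C yield no link, because nothing is known about how they relate to each other.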
Global Schema Generation
Global-As-View (GAV) approach where each global class of the Global
Schema is characterized in terms of a view over the local sources
INPUT: Common Thesaurus SLs generated by the STASIS framework
METHOD: clustering techniques
Given a set of data sources MOMIS synthesizes in a semi-automatic way a
Global Schema (Global Virtual View - GVV):
a global class G=(L,GA) is generated for each cluster C where L are the
local classes of the cluster C and GA are the global attributes of G
• Union of the local attributes
• Fusion of “similar attributes” (by using the Common Thesaurus)
a Mapping Table (MT) is generated for each global class, which contains
the mappings to connect the global attributes with the local sources
attributes. MT is a table GA × L: an element MT[GA][L] represents the
attributes of the local class L mapped into the global attribute GA.
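The generation step can be sketched as follows. The slides do not say which clustering technique MOMIS applies, so this sketch substitutes the simplest possible one: connected components over class-level semantic links, computed with union-find. The function names, the `same_attribute` lookup that stands in for the Common Thesaurus, and all class and attribute names are hypothetical.

```python
from collections import defaultdict


def cluster_classes(classes, links):
    """Group local classes connected by class-level semantic links."""
    parent = {c: c for c in classes}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for a, b in links:
        parent[find(a)] = find(b)

    clusters = defaultdict(list)
    for c in classes:
        clusters[find(c)].append(c)
    return sorted(sorted(group) for group in clusters.values())


def mapping_table(cluster, attributes, same_attribute):
    """Build MT[GA][L]: the attributes of local class L mapped into GA."""
    table = defaultdict(lambda: defaultdict(list))
    for local_class in cluster:
        for attr in attributes[local_class]:
            # "similar attributes" are fused under one global attribute name;
            # every other attribute becomes a global attribute of its own
            global_attr = same_attribute.get((local_class, attr), attr)
            table[global_attr][local_class].append(attr)
    return {ga: dict(row) for ga, row in table.items()}


# Hypothetical classes, links and attributes, for illustration only.
classes = ["S1.PURCHASE_ORDER", "S2.ORDER", "S2.CUSTOMER"]
links = [("S1.PURCHASE_ORDER", "S2.ORDER")]
clusters = cluster_classes(classes, links)

attributes = {"S1.PURCHASE_ORDER": ["order_id", "date"], "S2.ORDER": ["number", "date"]}
fused = {("S2.ORDER", "number"): "order_id"}
mt = mapping_table(clusters[0], attributes, fused)
```

Here `mt["order_id"]` holds one entry per local class of the cluster, which is the GA × L layout described above; a local class with no matching attribute simply has no entry in that row.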
The MOMIS-STASIS Architecture
An Application Example
As a simple example let us consider two relational local sources L1 and L2,
where each schema contains a relation describing purchase orders:
We will describe step by step the application of the MOMIS-STASIS
approach:
Step 1. Obtaining a neutral schema representation
Local sources L1 and L2 are translated into the neutral representation, i.e.
they are represented in the LDM data model. L1.PURCHASE_ORDER,
L1.BILLING_ADDRESS and L1.DELIVERY_ADDRESS are represented as semantic
entities.
Step 2. Local Source Annotation
We consider the annotation of schemas and the derivation of mappings
w.r.t. a single common ontology: the Purchase Order Ontology.
An Application Example
Some examples of simple annotations discovered
by applying the automatic “name-based” technique:
An example of a complex annotation is
which can be considered as a designer refinement
of the above simple annotations to state that the
address in the PURCHASE_ORDER table is the
“address of the Shipping in a Purchase Order”.
An Application Example
Step 3. Semantic Mapping Discovery
From the previous annotations, for example, the following semantic link is derived:
In contrast, no semantic link between CUSTOMER_LOCATION and BILLING_ADDRESS
is generated.
Step 4. Global Schema Generation
Given the set of semantic links described above and collected in the
Common Thesaurus, the GVV is automatically generated and the classes
describing the same or semantically related concepts in different sources
are identified and clustered into the same global class. Moreover, the
Mapping Table is automatically generated.
Future Work
An advantage of the proposed approach: an accurate schema annotation.
Problem: this annotation is performed manually by the integration designer.
We propose a preliminary idea to overcome this problem which can be
summed up in three steps:
1- Annotation w.r.t. WordNet (WN): both the ontologies and the local sources are
annotated, w.r.t. WN, by using Automatic Lexical Annotation techniques based on
Word Sense Disambiguation
2- WN semantic relationship discovery: starting from the previous annotations, a set
of WN semantic relationships (synonym (equivalence), hypernym (more general)
etc.) is discovered among semantic entities and ontology concepts
3- Local source annotation for Ontology-driven Semantic Mapping: starting from the
set of WN semantic relationships, a corresponding set of annotations for
Ontology-driven Semantic Mapping can be discovered
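The third step could look roughly like the sketch below, which turns already-discovered WN relationships into annotation tuples. It does not perform Word Sense Disambiguation or query WordNet. The slides name only synonym and hypernym; the hyponym entry, the reading "the entity's sense is a <relationship> of the concept's sense", and all names in the example are assumptions.

```python
# Assumed mapping from a WN relationship to an annotation relation.
WN_TO_ANNOTATION = {
    "synonym": "AR_EQUIV",   # same meaning
    "hypernym": "AR_SUP",    # entity is more general than the concept
    "hyponym": "AR_SUB",     # entity is less general (assumed counterpart)
}


def annotations_from_wn(wn_relationships):
    """Turn (entity, wn_relationship, concept) triples into <SE, R, C> tuples."""
    result = []
    for entity, wn_rel, concept in wn_relationships:
        relation = WN_TO_ANNOTATION.get(wn_rel)
        if relation is not None:  # skip relationships with no counterpart
            result.append((entity, relation, concept))
    return result


# Hypothetical input, for illustration only.
found = annotations_from_wn([
    ("S1.PURCHASE_ORDER", "synonym", "PurchaseOrder"),
    ("S1.ADDRESS", "hypernym", "ShippingAddress"),
    ("S1.ITEM", "meronym", "PurchaseOrder"),
])
```

Relationships without an obvious annotation counterpart, such as the meronym in the example, are dropped here; a designer would still need to review them.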
Conclusions
We described the early effort to obtain an effective Global Schema
Generation approach for Ontology-Based Data Integration, by combining
the techniques provided by the MOMIS and STASIS frameworks
Extension of the MOMIS system to perform Ontology-driven Semantic
Mapping discovery: the annotation of data source elements w.r.t. generic
ontologies (expressed in OWL)
Extension of the MOMIS system to overcome the limitation of using only the
lexical ontology WN by introducing a multiple ontology approach in place of the
previous single ontology approach
Even though this work needs further investigation, it represents a
fundamental starting point towards a fully automatic Ontology-Based Data
Integration System