Slide 1 - DataBase Group

WISDOM (Web Intelligent Search based on DOMain ontologies): Demo
Sonia Bergamaschi [1], Paolo Bouquet [2], Paolo Ciaccia [3], and Paolo Merialdo [4]
[1] Università degli Studi di Modena e Reggio Emilia
[2] Università degli Studi di Trento
[3] Università degli Studi di Bologna
[4] Università degli Studi Roma Tre
http://www.dbgroup.unimo.it/wisdom
6 December 2006
WISDOM D0.P1 – Integrated System Prototype
Overview
The WISDOM project aims to study, develop and experiment with methods and techniques for searching and querying data sources available on the Web.
The goal of the project:
Definition of a software framework that allows computer applications to leverage the huge amount of information content offered by Web sources (typically, as Web sites)
The context:
• the number of sources of interest might be extremely large
• sources are independent of, and autonomous from, one another
These factors raise significant issues, in particular because such an information space implies heterogeneities at different levels of abstraction (format, logical, semantic). Providing effective and efficient methods for answering queries in such a scenario is the challenging task of the project.
Overview
In WISDOM, super-peers are built, each containing data from Web sources relating to the same domain. Super-peers are connected by semantic mappings in a Super-peer Network.
The end user formulates a query against a specific super-peer. The answer includes data extracted from all the super-peers relevant to the query.
From a functional point of view, the WISDOM project may be divided into two parts:
A) Building a super-peer Network, where:
• Web sources are grouped into super-peers;
• Each super-peer exports a Semantic Peer Ontology synthesizing the knowledge of the involved sources;
• The Semantic Peer Ontologies are related by means of simple semantic mappings.
B) Querying a super-peer Network, where:
• A graphical interface allows the user to formulate a query according to a Semantic Peer Ontology;
• The query is rewritten for each super-peer relevant to the answer;
• The query is reformulated inside each super-peer according to the involved sources;
• The query is executed locally and the results are provided to the user.
Functional Flow-Diagram
1) Data from Web sites are extracted by means of wrappers
2) Data are annotated according to a lexical reference
3) A semantic peer ontology is created for each semantic peer
4) The semantic peer ontologies are related by means of mappings
http://dbgroup.unimo.it/wisdom/prototipi/D1.P4.html
http://dbgroup.unimo.it/wisdom/prototipi/D1.P5.html
http://dbgroup.unimo.it/wisdom/prototipi/D1.P1.html
http://dbgroup.unimo.it/wisdom/prototipi/D2.P1.html
Building a Super-peer Network: the local sources
We tested the system by using MOMIS (http://dbgroup.unimo.it/Momis), Road Runner (http://www.dia.uniroma3.it/db/roadRunner) and MELIS (http://dbgroup.unimo.it/wisdom-unimo/melis) to create a Super-peer Network composed of three peers, each one integrating two or three tourism Web sites:
…
Peer 1
• bbitaly: http://www.bbitalia.it/default_eng.asp
• touring: http://www.touring.it
Peer 2
• guidacampeggi: http://www.guidacampeggi.com
• saperviaggiare: http://www.touringclub.com/ITA/viaggiatori/dove_mangiare
• venere.com: http://www.venere.com
Peer 3
• bedandbreakfast: http://www.bed-and-breakfast.it
• booking: http://www.booking.com
• opificidigitali: http://www.opificidigitali.it
Demonstration Scenario
Peer 1 abstracts into the global classes: hotels, restaurants
the local classes: hotels (bbitaly), restaurants (touring)
Demonstration Scenario
Peer 2 abstracts into the global classes: hotels, campings, facilities
the local classes: hotels (venere), hotels (saperviaggiare), maps (venere), campings (guidacampeggi), facilities (guidacampeggi), facilities (venere)
Demonstration Scenario
Peer 3 abstracts into the global classes: hotels, restaurants, features
the local classes: hotels (booking), judgement_hotel (booking), bedandbreakfast (bedandbreakfast), features (bedandbreakfast), restaurants (opificidigitali), features_bb (bedandbreakfast), conditions_hotel (booking)
Wrapping Web Sources
Each super-peer is created by extracting data with Road Runner:
1. Identify the sources (Web sites) to be wrapped
2. For each source: infer a site schema and collect pages containing information of interest
3. From a set of sample pages, infer a wrapper library
4. Apply the wrapper library to the set(s) of pages collected in step 2
[Figure: wrapping workflow. 1. Identify sample pages; 2. Indesit infers the site schema and pages similar in structure to those of the sample set are collected; 3. The wrapper generator generates a wrapper library; 4. The wrapper library is applied to extract the output data from the pages.]
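The wrapper-inference step (step 3) can be illustrated with a toy sketch. This is not the RoadRunner algorithm: it only aligns two sample pages token by token with Python's difflib and treats every mismatching region as a data field. The sample pages and the names `infer_wrapper` and `extract` are assumptions made for illustration.

```python
import difflib
import re

# Split a page into tags and text runs.
TOKEN = re.compile(r"<[^>]+>|[^<]+")

def infer_wrapper(sample_a, sample_b):
    """Build a regex from two sample pages: shared tokens become literal
    text, differing regions become capture groups (the data fields)."""
    a, b = TOKEN.findall(sample_a), TOKEN.findall(sample_b)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    parts = []
    for op, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if op == "equal":
            parts.append(re.escape("".join(a[i1:i2])))
        else:
            parts.append("(.*?)")
    return re.compile("".join(parts), re.DOTALL)

def extract(wrapper, page):
    """Apply the wrapper to a page; return the field values or None."""
    match = wrapper.fullmatch(page)
    return match.groups() if match else None

page_1 = "<html><h1>Hotel Roma</h1><p>Rimini</p></html>"
page_2 = "<html><h1>Hotel Milano</h1><p>Bologna</p></html>"
wrapper = infer_wrapper(page_1, page_2)
fields = extract(wrapper, "<html><h1>Hotel Napoli</h1><p>Napoli</p></html>")
```

Pages that do not follow the inferred template are rejected (`extract` returns `None`), which is why step 2 collects pages similar in structure to the sample set.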
Wrapping Web Sources: the Demo
The demonstration:
• 11 Web sites, delivering information about hotels, campings, B&Bs and restaurants
• The Web site schema inference module (Indesit) was configured (when possible) to collect pages of interest from these sites
• Indesit generated a Web schema for each Web site: the output description was used to collect about 8000 pages
• 16 wrappers were inferred by means of the wrapper generation module RoadRunner
• The extracted data were stored in 11 relational databases (one per source)
• The Indesit Web schemas can be used to refresh data
Annotating Data Sources with respect to a lexical reference
MELIS: Meaning Elicitation and Lexical Integration System
[Figure: MELIS workflow. Input: a partially annotated data source, a Reference Ontology, and a Lexical Reference + Extensions (WNEditor). Processing: MELIS (CtxMatch 2.0), with user validation and further annotations. Output: the source annotated with respect to the Reference Ontology.]
Annotating Data Sources with respect to a lexical reference
Meaning Elicitation Process
For each element (class or property) in the Input Ontology, MELIS extracts all candidate senses from WordNet. It then filters the candidate senses by using Domain Ontologies and a collection of heuristic rules.
[Figure: an Input Ontology (Building, Address, Hotel, Name, City, Restaurant, Home page) compared with Domain Ontologies whose elements carry sense numbers: Building #1, Edifice #2, Address #3, Hotel #1, Motel #1, B&B #1, Name #2, City #1, Restaurant #2, Home page #1.]
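The two phases described above (collect candidate senses, then filter them against domain ontologies) can be sketched as follows. This is a minimal illustration, not the MELIS implementation: the sense inventory stands in for WordNet, the sense numbers are made up, and the fallback rule is an assumed heuristic.

```python
# Hypothetical sense inventory standing in for WordNet: term -> sense numbers.
CANDIDATE_SENSES = {
    "hotel": [1],
    "name": [1, 2, 3],
    "address": [1, 2, 3, 4],
    "city": [1, 2],
}

# Hypothetical domain ontology: (term, sense) pairs already validated
# for the tourism domain.
DOMAIN_SENSES = {("hotel", 1), ("name", 2), ("city", 1)}

def elicit_senses(term):
    """Return the candidate senses of a term, filtered by the domain ontology.

    Assumed fallback heuristic: if the domain ontology says nothing about
    the term, keep every candidate and leave the choice to user validation.
    """
    key = term.lower()
    candidates = CANDIDATE_SENSES.get(key, [])
    filtered = [sense for sense in candidates if (key, sense) in DOMAIN_SENSES]
    return filtered or candidates

annotations = {t: elicit_senses(t) for t in ["Hotel", "Name", "Address", "City"]}
```

In this sketch "Address" keeps all four candidate senses because the toy domain ontology does not mention it, which mirrors the role of the user validation step.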
Annotating Data Sources with respect to a lexical reference
OUTPUT Language: WISDOM-OWL (OWL DL with DB annotations and lexical annotations)
<owl:Class rdf:ID="BB.bed_and_breakfast">
  <rdfs:label>bed_and_breakfast</rdfs:label>
  <!-- DB annotations -->
  <db:PrimaryKey>BB.bed_and_breakfast.url</db:PrimaryKey>
  <!-- Lexical annotations -->
  <lex:wnAnnotation rdf:parseType="Literal">
    <lex:lemmaValue>bed_and_breakfast</lex:lemmaValue>
    <lex:lemmaSyntacticCategory>1</lex:lemmaSyntacticCategory>
    <lex:lemmaSenseNumber>1</lex:lemmaSenseNumber>
  </lex:wnAnnotation>
</owl:Class>
Building a Super-Peer Ontology
Super-peer Ontologies were built by means of the MOMIS system, extended for the specific purposes of the project. In particular, techniques were introduced for adding sources to, and removing sources from, an existing ontology without restarting the process from scratch.
The MOMIS process for building a domain ontology is based on the following steps:
Building Peer 2
Peer 2 was created by integrating the local sources venere (493 hotels), saperviaggiare (977 hotels) and guidacampeggi (183 campings)
Source venere local classes:
• hotels
• maps
• facilities
Building Peer 2
Source guidacampeggi local classes:
• campings
• facilities
Source saperviaggiare local class:
• hotels
Peer 2 Ontology
Creating inter-peer mappings
For each node of a peer that is compatible with elements of other peers, we create a Mapping Element that describes the relationship between them.
[Figure: the hotel nodes of the three peers linked by mappings. PEER 1 (source BBItaly), PEER 2 (source Venere) and PEER 3 (source Booking) each expose a Hotel #1 (Hotels #1) node with sense-annotated properties such as Name #2, Address #3, City #1, URL #1, Phone Number #1, e-mail #1, Logo #1 and Price #4.]
Creating inter-peer mappings
OUTPUT Mappings:
<MappingElement>
  <MappingElementID>mappingElement#3</MappingElementID>
  <MappingType>Datatype2Datatype</MappingType>
  <SourceElement>
  </SourceElement>
  <TargetElement>
  </TargetElement>
  <Relations>
    <SemanticRelation>
      <RelationType>equivalent</RelationType>
      <RelationGrade>1</RelationGrade>
    </SemanticRelation>
    <RelationMeasure>
    </RelationMeasure>
  </Relations>
</MappingElement>
Detail of the source element:
<SourceElement>
  <SourceElementID>Peer1.hotels.address</SourceElementID>
  <SourceElementLabel>address</SourceElementLabel>
  <AtomicMeaning>address#9#0#C</AtomicMeaning>
  <ContextualMeaning>address#9#0#C</ContextualMeaning>
  <DictionaryID>wordnet21</DictionaryID>
  <Senses>
    <Sense>107938889</Sense>
  </Senses>
</SourceElement>
Detail of the relation measure:
<RelationalMeasure>
  <LessGeneralThan>0.0</LessGeneralThan>
  <MoreGeneralThan>0.0</MoreGeneralThan>
  <Equivalent>
    <SameGranularity>1.0</SameGranularity>
    <LowerGranularityThan>0.0</LowerGranularityThan>
    <HigherGranularityThan>0.0</HigherGranularityThan>
  </Equivalent>
  <Disjoint>0.0</Disjoint>
  <Overlapping>0.0</Overlapping>
</RelationalMeasure>
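A consumer of these mapping files needs to read the element identifier, the source element and the semantic relation. The sketch below shows one way to do that with Python's standard xml.etree.ElementTree. The XML literal is abridged from the example above (the target element is left empty, as on the slide), and the function name `read_mapping` and the returned dictionary layout are assumptions.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the mapping element shown above.
MAPPING_XML = """
<MappingElement>
  <MappingElementID>mappingElement#3</MappingElementID>
  <MappingType>Datatype2Datatype</MappingType>
  <SourceElement>
    <SourceElementID>Peer1.hotels.address</SourceElementID>
    <SourceElementLabel>address</SourceElementLabel>
  </SourceElement>
  <TargetElement></TargetElement>
  <Relations>
    <SemanticRelation>
      <RelationType>equivalent</RelationType>
      <RelationGrade>1</RelationGrade>
    </SemanticRelation>
  </Relations>
</MappingElement>
"""

def read_mapping(xml_text):
    """Extract the fields a query rewriter would need from one mapping."""
    root = ET.fromstring(xml_text)
    relation = root.find("Relations/SemanticRelation")
    return {
        "id": root.findtext("MappingElementID"),
        "type": root.findtext("MappingType"),
        "source": root.findtext("SourceElement/SourceElementID", default="").strip(),
        "relation": relation.findtext("RelationType"),
        "grade": float(relation.findtext("RelationGrade")),
    }

mapping = read_mapping(MAPPING_XML)
```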
Querying a Super-Peer Network
Querying the Super-Peer Network involves:
• Formulating the query at a peer on the ontology local to that peer
• Rewriting the query according to the neighboring peers’ ontologies using the semantic mappings
• Selecting the peers that are most relevant to the query (using content summaries and semantic information about the rewritings)
• Sending the rewritten query to the relevant neighboring peers
• Translating the query to execute it on the local sources
  - This involves relaxing preference expressions that are not directly manageable by the underlying query executor
• Executing the query locally (by translating it using the local mappings) and sending the results to the local query processor
• Collecting the results from both the local sources and the neighboring peers
• Building the final result
  - This may involve performing additional computations to enforce the original preference relation that was relaxed for local execution
• Presenting the result to the user
Functional Flow-Diagram
At peer p0, the user formulates a request using the M-FIRE interface, which produces a query Q in a SPARQL-like syntax.
The semantic parser translates the query into an internal format that can be easily manipulated by the query processor (QP).
The QP rewrites the query with respect to the local ontology (to send the query to the MOMIS Query Manager) or to a remote ontology (to send it to the QP of a neighboring peer).
[Figure: flow diagram with the labels GUI (M-FIRE), L0, Q, Semantic Parser, q, Query Processor, q0 (MOMIS Query Manager), GVV0, and qj (QP of peer pj).]
Query Formulation with M-FIRE
The user interface prototype
The query formulation prototype implements the M-FIRE framework and includes two components: a client component for visual rendering and user interaction handling, and a server component implementing the M-FIRE representation and navigation engines.
Metaphors
M-FIRE makes it possible to define declaratively how a given RDF document is represented graphically, by supplying metaphors as parameters to a generic representation engine. Metaphors also determine how the user’s actions on the delivered representation are translated into queries over the underlying knowledge base. Our prototype includes two metaphors, featuring:
• Two alternative ways of representing the ontology schema
• One single way of formulating conjunctive queries on the ontology schema
• One single table-like view of the query results
Query Formulation with M-FIRE
Ontology schema representation
After selecting a knowledge base to explore and a metaphor for its presentation, the user is provided with a representation of the ontology schema. This is how the two alternative metaphors represent a schema:
First metaphor:
• Classes are rendered as tables, where the left pane shows an intuitive icon for the class and the right pane lists the properties that apply to that class
• Each datatype property has a light yellow background and a black font
• Each object property has a yellow background and a dark red font; moreover, on the left of the property name, an icon is shown for each class in the range of the property
Second metaphor:
• Classes are rendered as tables, where the upper pane shows an intuitive icon for the class on the left of the class name, and the lower pane lists the properties that apply to that class
• Each datatype property has a cyan background, is written in italics and has a black font; on the right side, the name of each class in the property range is shown
• Each object property has a light cyan background and a blue font; moreover, on the right side of the property name, the name of each class in the property range is shown
Query Formulation with M-FIRE
Query formulation
Conjunctive queries are formulated by left-clicking or right-clicking on the properties to be included in the result (projection) or for which filters are to be specified (selection):
• A left click on a datatype property selects that property for output (object properties cannot be selected for output)
• A left click on an object property means that the clicked property will be used to perform a join between two classes (each class can participate in only one join with each other class)
• A right click on a datatype property lets the user express a filter on that property (the comparison operator and the target value are specified through a dialog box)
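The click semantics above amount to building a conjunctive query incrementally. The following sketch models that state in Python. It is not the M-FIRE implementation: the class name `ConjunctiveQuery`, the `kind` argument and the property names are hypothetical, and rejecting filters on object properties is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class ConjunctiveQuery:
    projections: list = field(default_factory=list)  # left click, datatype property
    joins: list = field(default_factory=list)        # left click, object property
    filters: list = field(default_factory=list)      # right click, datatype property

    def left_click(self, prop, kind):
        """Project a datatype property, or join through an object property."""
        if kind == "datatype":
            self.projections.append(prop)
        elif kind == "object":
            self.joins.append(prop)
        else:
            raise ValueError(f"unknown property kind: {kind}")

    def right_click(self, prop, kind, operator, value):
        """Record a filter; assumed to apply to datatype properties only."""
        if kind != "datatype":
            raise ValueError("filters apply to datatype properties only")
        self.filters.append((prop, operator, value))

q = ConjunctiveQuery()
q.left_click("Peer2.hotels.name", "datatype")
q.left_click("Peer2.hotels.services", "object")   # hypothetical object property
q.right_click("Peer2.hotels.price", "datatype", "<", 80)
```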
Input to the Query Processor
Queries with joins are formulated in the same way as queries without joins, by specifying the correct path between join properties through mouse clicks.
The query is finally passed to the local query processor.
Query Processor Components
[Figure: Query Processor components. Components: Semantic Parser, Rewriter, Ranker, Optimizer. Inputs and intermediate artifacts: Ont0, Q (SPARQL-like), q (internal form), Semantic Mappings, rewritten queries rq1 … rqM, Content Summaries, rqk, Plan Tree j, Peers’ Metadata, and the output queries qj,0 … qj,i … qj,N. The example ontology Ont0 is drawn as an RDF graph (rdfs:Class, rdf:Property, rdfs:Datatype, with type, dom and ran edges) over the classes Restaurant and Rating, the properties name, phone, price, cuisine, closing-day, has-smoking-area, rates, guides, guide and value, and the datatypes xsd:string, xsd:integer, xsd:boolean, weekDay and cuisineType.]
Internally, the QP ranks the rewritten queries in order to execute only the “best” ones.
Then, the optimizer produces the actual queries that will be sent to the local executor (e.g., by “relaxing” some preferences) or to neighboring peers.
Inter-Peer Query Rewriting
Mapping extension with scores
RelationGrade: measures the similarity between the corresponding elements
Inter-Peer Query Rewriting
Query reformulation example
Source: Superpeer2
BASE <http://www.wisdom.net/ontology#>
SELECT ?N ?A ?C ?P ?S
FROM Peer2
WHERE { Peer2.hotels Peer2.hotels.name ?N ;
          Peer2.hotels.address ?A ;
          Peer2.hotels.city ?C ;
          Peer2.hotels.price ?P ;
          Peer2.hotels.url ?HURL .
        Peer2.services Peer2.services.facility ?S ;
          Peer2.services.url ?SURL .
        FILTER ((?HURL=?SURL) && (?HURL='Rimini') &&
                (?P>50) && (?P<80) && (?S = 'air conditioning')). }
PREFERRING min(?P)
LIMIT 50
Rewriter
Target: Superpeer3
BASE <http://www.wisdom.net/ontology#>
SELECT ?N ?P ?S
FROM Peer3
WHERE { Peer3.hotels Peer3.hotels.name ?N ;
          Peer3.hotels.price_single ?P ;
          Peer3.hotels.url ?HURL .
        Peer3.features Peer2.features.url ?SURL .
        FILTER ((?HURL=?SURL) && (?HURL='Rimini') &&
                (?P>50) && (?P<80)). }
PREFERRING min(?P)
LIMIT 50
UNION
BASE <http://www.wisdom.net/ontology#>
SELECT ?N ?P ?S
FROM Peer3
WHERE { Peer3.hotels Peer3.hotels.name ?N ;
          Peer3.hotels.price_double ?P ;
          Peer3.hotels.url ?HURL .
        Peer3.features Peer2.features.url ?SURL .
        FILTER ((?HURL=?SURL) && (?HURL='Rimini') &&
                (?P>50) && (?P<80)). }
PREFERRING min(?P)
LIMIT 50
UNION
BASE <http://www.wisdom.net/ontology#>
SELECT ?N ?P ?S
FROM Peer3
WHERE { Peer3.hotels Peer3.hotels.name ?N ;
          Peer3.hotels.price_triple ?P ;
          Peer3.hotels.url ?HURL .
        Peer3.features Peer2.features.url ?SURL .
        FILTER ((?HURL=?SURL) && (?HURL='Rimini') &&
                (?P>50) && (?P<80)). }
PREFERRING min(?P)
LIMIT 50
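In the example, the single source property for the price corresponds to three target properties, so the rewriter emits a union of three queries, and terms with no counterpart are dropped. A minimal sketch of that expansion follows. It is an illustration only, not the WISDOM rewriter: the mapping table is hypothetical and loosely follows the example, and a rewriting is reduced to a list of target properties.

```python
from itertools import product

# Hypothetical Peer2 -> Peer3 mappings: a source property may map to
# several target properties, or to none.
MAPPINGS = {
    "Peer2.hotels.name": ["Peer3.hotels.name"],
    "Peer2.hotels.price": [
        "Peer3.hotels.price_single",
        "Peer3.hotels.price_double",
        "Peer3.hotels.price_triple",
    ],
    "Peer2.hotels.address": [],  # no counterpart: dropped from the rewriting
}

def rewrite(query_terms, mappings):
    """Return one rewriting per combination of alternatives.

    The caller is expected to treat the returned list as a UNION.
    """
    alternatives = [mappings.get(term, []) for term in query_terms]
    alternatives = [alts for alts in alternatives if alts]  # drop unmappable terms
    return [list(combo) for combo in product(*alternatives)]

rewritings = rewrite(
    ["Peer2.hotels.name", "Peer2.hotels.address", "Peer2.hotels.price"],
    MAPPINGS,
)
```

With several multi-valued mappings the number of rewritings grows multiplicatively, which is one reason the rewritings are ranked before execution.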
The WISDOM Project
Query reformulation: details of target output
[Screenshot callouts: target schema; global union score; 1st rewritten query; 1st query score and percentages; 1st query terms rewriting details.]
Inter-Peer Query Rewriting
The Ranker component of the query processor ranks all the available rewritings (according to user preferences) in order to execute only the “best” ones (e.g., to maximize the number of retrieved results, or the semantic similarity of the rewritten query with respect to the original one).
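Ranking by semantic similarity can be sketched as a top-k selection over scored rewritings. The scoring rule used here (the product of the RelationGrade values of the mappings a rewriting uses) is an assumption chosen for illustration, not necessarily the rule used by the WISDOM Ranker, and the candidate data are made up.

```python
import heapq
from math import prod

def similarity(grades):
    """Assumed score: product of the RelationGrade values used by a rewriting."""
    return prod(grades)

def best_rewritings(candidates, k):
    """candidates: list of (name, grades). Return the k highest-scoring names."""
    ranked = heapq.nlargest(k, candidates, key=lambda c: similarity(c[1]))
    return [name for name, _grades in ranked]

candidates = [
    ("rq1", [1.0, 0.9]),
    ("rq2", [1.0, 0.5]),
    ("rq3", [0.8, 0.8]),
]
top = best_rewritings(candidates, k=2)
```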
Inter-Peer Query Forwarding
The QP of the neighboring peer receives the
query, solves it locally and (possibly) forwards it
to its neighbors (care is taken to ensure that
each peer only receives a query once).
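One common way to guarantee that a peer handles a query only once is to tag each query with an identifier and let every peer remember the identifiers it has already seen. The sketch below simulates this on a small hypothetical topology. The slides do not say which mechanism WISDOM uses, so this is an assumption.

```python
# Hypothetical topology: peer -> neighbors.
NEIGHBORS = {
    "p0": ["p1", "p2"],
    "p1": ["p0", "p2"],
    "p2": ["p0", "p1"],
}

def forward(query_id, start, neighbors):
    """Propagate a query through the network.

    Each peer keeps the set of query ids it has seen and ignores repeats.
    Returns the peers in the order in which they processed the query.
    """
    seen = {peer: set() for peer in neighbors}
    processed = []
    pending = [start]
    while pending:
        peer = pending.pop(0)
        if query_id in seen[peer]:
            continue  # duplicate delivery: drop it
        seen[peer].add(query_id)
        processed.append(peer)
        pending.extend(neighbors[peer])  # forward to all neighbors
    return processed

order = forward("q-42", "p0", NEIGHBORS)
```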
Query Reformulation for Local Execution
The Optimizer component of the QP translates the rewritten query into SQL (e.g., preference expressions are relaxed into ORDER BY clauses).
Finally, the Executor sends the query to the local query executor (MOMIS Query Manager) and waits for the results.
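The relaxation of a preference expression into an ORDER BY clause can be sketched as a small translation function. It handles only the `min(?X)` / `max(?X)` forms seen in the example query. The function name and the variable-to-column binding are assumptions.

```python
import re

PREFERENCE = re.compile(r"^(min|max)\(\?(\w+)\)$")

def relax_preference(expression, column_of):
    """Translate a 'min(?P)' style preference into an SQL ORDER BY clause.

    column_of maps query variables to SQL columns (hypothetical binding).
    """
    match = PREFERENCE.match(expression.strip())
    if match is None:
        raise ValueError(f"unsupported preference: {expression!r}")
    direction = "ASC" if match.group(1) == "min" else "DESC"
    return f"ORDER BY {column_of[match.group(2)]} {direction}"

clause = relax_preference("min(?P)", {"P": "H.price"})
```

Ordering is weaker than the original preference, so the final result still has to be checked against the preference once all answers have been collected.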
Local Query Execution
The MOMIS Query Manager reformulates the query taking into account the intra-peer mappings defined in a semantic peer between the local classes and the global classes of the GVV (Global Virtual View).
The mappings are defined by using a GAV (Global as View) approach: each global class of the GVV is expressed by means of the full-disjunction operator over the local classes.
• Query rewriting
  - GAV approach: the query is processed by means of unfolding
• Fusion and Reconciliation of the local answers into the global answer
  - Object Identification: join conditions among local classes
  - Inconsistencies: resolution functions to deal with conflicts
Query rewriting and execution
query q0 = scqG1 ⋈ scqG2
[Figure: the query q0 on the Global Virtual View (GVV) is split into the single class queries scqG1 (global class Hotels) and scqG2 (global class Services). scqG1 is unfolded into L1scqG1, L2scqG1 and L3scqG1 over the local classes hotels and map_hotels; scqG2 is unfolded into L1scqG2 and L2scqG2 over the local classes facilities. The local queries are executed on the local sources SAPERVIAGGIARE, VENERE and GUIDACAMPEGGI.]
Query rewriting
q0 = SELECT H.name, H.address, H.city, H.price, S.facility, S.structure_name, S.structure_city
     FROM hotels as H, services as S
     WHERE H.city = S.structure_city and H.name = S.structure_name
       and H.city = 'rimini' and H.price > 50 and H.price < 80 and S.facility = 'air conditioning'
     order by H.price
SINGLE CLASS QUERIES
scqG1 = SELECT H.name, H.address, H.city, H.price
        FROM hotels as H
        WHERE (H.city = 'rimini') and (H.price > 50) and (H.price < 80)
scqG2 = SELECT S.facility, S.structure_name, S.structure_city
        FROM services as S
        WHERE (S.facility = 'air conditioning')
UNFOLDING of scqG1
L1scqG1 = SELECT hotels.name, hotels.address, hotels.city
          FROM hotels
          WHERE (city) = ('rimini')
L2scqG1 = SELECT maps_hotels.hotels_name2, maps_hotels.hotels_city
          FROM maps_hotels
          WHERE (hotels_city) = ('rimini')
L3scqG1 = SELECT hotels.name2, hotels.address, hotels.price, hotels.city
          FROM hotels
          WHERE ((city) = ('rimini') and ((price) > (50) and (price) < (80)))
UNFOLDING of scqG2
L1scqG2 = SELECT facilities_hotels.hotel_name2, facilities_hotels.hotels_city, facilities_hotels.facility
          FROM facilities_hotels
          WHERE (facility) = ('air conditioning')
L2scqG2 = SELECT facilities_campings.campings_name, facilities_campings.campings_city, facilities_campings.name
          FROM facilities_campings
          WHERE (name) = ('air conditioning')
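The unfolding of scqG1 above can be reproduced with a small sketch: each local class contributes only the attributes it actually has, and a condition is pushed down to a source only if that source stores the attribute (so the price filter reaches only the source with a price column). The mapping table and the qualified table names are hypothetical. This is an illustration of GAV unfolding, not the MOMIS Query Manager.

```python
# Hypothetical GAV mapping for the global class "hotels": for each local
# class, global attribute -> local column (attributes may be missing).
HOTELS_MAPPING = {
    "saperviaggiare.hotels": {"name": "name", "address": "address", "city": "city"},
    "venere.maps_hotels": {"name": "hotels_name2", "city": "hotels_city"},
    "venere.hotels": {"name": "name2", "address": "address",
                      "city": "city", "price": "price"},
}

def unfold(select, conditions, mapping):
    """Rewrite a single class query into one SQL query per local class.

    conditions: list of (global_attribute, operator, SQL literal).
    """
    local_queries = {}
    for table, columns in mapping.items():
        cols = [columns[a] for a in select if a in columns]
        where = [f"{columns[a]} {op} {value}"
                 for a, op, value in conditions if a in columns]
        sql = f"SELECT {', '.join(cols)} FROM {table}"
        if where:
            sql += " WHERE " + " AND ".join(where)
        local_queries[table] = sql
    return local_queries

queries = unfold(
    ["name", "address", "city", "price"],
    [("city", "=", "'rimini'"), ("price", ">", "50"), ("price", "<", "80")],
    HOTELS_MAPPING,
)
```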
Fusion and Reconciliation
[Figure: fusion tree. The local sources SAPERVIAGGIARE, VENERE and GUIDACAMPEGGI return the partial results (the L1scqG1, L2scqG1, L3scqG1, L1scqG2 and L2scqG2 result sets). scqG1 result set = L1scqG1 full join L2scqG1 full join L3scqG1; scqG2 result set = L1scqG2 full join L2scqG2; q0 result set = scqG1 join scqG2.]
Fusion and Reconciliation
scqG1 result set = L1scqG1 full join L2scqG1 full join L3scqG1
  saperviaggiare.hotels full outer join venereEn.hotels on (
    ((venereEn.hotels.name2) = (saperviaggiare.hotels.name)
     AND (venereEn.hotels.city) = (saperviaggiare.hotels.city)))
  full outer join venereEn.maps_hotels on (
    ((venereEn.maps_hotels.hotels_name2) = (saperviaggiare.hotels.name)
     AND (venereEn.maps_hotels.hotels_city) = (saperviaggiare.hotels.city))
    OR ((venereEn.maps_hotels.hotels_name2) = (venereEn.hotels.name2)
     AND (venereEn.maps_hotels.hotels_city) = (venereEn.hotels.city)))
scqG2 result set = L1scqG2 full join L2scqG2
  guidacampeggi.facilities full outer join venere.facilities on (
    (venere.facilities.facility) = (guidacampeggi.facilities.name)
    AND (venere.facilities.hotels_city) = (guidacampeggi.facilities.campings_city)
    AND (venere.facilities.hotel_name2) = (guidacampeggi.facilities.campings_name))
q0 = scqG1 result set join scqG2 result set
  SELECT H.name, H.address, H.city, H.price, S.facility, S.structure_name, S.structure_city
  FROM hotels as H, facilities as S
  WHERE (H.city = S.structure_city) AND (H.name = S.structure_name)
  ORDER BY H.price ASC
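The fusion step combines a full outer join (object identification through join conditions) with resolution functions for conflicting values. The sketch below shows both on in-memory rows. It assumes the local columns have already been renamed to the global attribute names and that join keys are unique within each source. The "prefer one source" resolution rule and the sample rows are made up for illustration.

```python
def full_outer_join(left, right, key):
    """Full outer join of two lists of dicts on the given key attributes."""
    index = {}
    for side, rows in (("left", left), ("right", right)):
        for row in rows:
            index.setdefault(tuple(row[k] for k in key), {})[side] = row
    return list(index.values())

def reconcile(pair, prefer="left"):
    """Assumed resolution function: on conflict keep the preferred source's
    value; otherwise keep whichever value is present."""
    first, second = pair.get("left", {}), pair.get("right", {})
    if prefer == "right":
        first, second = second, first
    merged = dict(second)
    merged.update({k: v for k, v in first.items() if v is not None})
    return merged

saperviaggiare = [{"name": "Hotel Roma", "city": "rimini", "address": "Via A 1"}]
venere = [
    {"name": "Hotel Roma", "city": "rimini", "address": "Via A, 1", "price": 70},
    {"name": "Hotel Mare", "city": "rimini", "address": "Via B 2", "price": 60},
]

joined = full_outer_join(saperviaggiare, venere, key=("name", "city"))
fused = [reconcile(pair) for pair in joined]
```

"Hotel Roma" appears in both sources, so its address conflict is resolved in favor of the left source while the price is taken from the only source that has it. "Hotel Mare" survives the join even though only one source knows it.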
Local Query Execution
The MOMIS Query Manager at work
Building the Final Result
Local results are forwarded by MOMIS to the query processor.
The Executor component also retrieves results from neighboring peers, computes the overall result by taking into account the original user preferences, and forwards it to the M-FIRE interface.
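Computing the overall result means re-applying the original preference to the union of local and remote answers, since each peer only enforced a relaxed version of it. A minimal sketch, assuming a "minimize one attribute" preference and a result limit as in the example query. The function name and the sample rows are hypothetical.

```python
def build_final_result(local_rows, remote_rows, key, limit):
    """Merge local and remote answers, then enforce the original preference
    (minimize `key`) and keep the `limit` best solutions."""
    merged = local_rows + remote_rows
    return sorted(merged, key=lambda row: row[key])[:limit]

local_rows = [{"name": "A", "price": 70}, {"name": "B", "price": 55}]
remote_rows = [{"name": "C", "price": 60}]
final = build_final_result(local_rows, remote_rows, key="price", limit=2)
```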
Showing Results in M-FIRE
Results are finally shown in M-FIRE using a table-based form:
• Solutions are listed vertically, each one with its own table
• For each solution, variable bindings are listed vertically
• For each binding a row is provided, where the property name corresponding to the binding variable is shown on the right side, and the (literal) value is shown on the left side