Linked Data: Principles and State of the Art - uni

3rd Asian Semantic Web Conference (ASWC 2008)
DIST Workshop, Bangkok, Thailand
8 December 2008
Fusing the Web of Data
Christian Bizer, Freie Universität Berlin
Christian Bizer: Fusing the Web of Data (12/08/2008)
Overview
1. The Web of Data
   • Linked Data Principles
   • Linked Data Deployment
   • Applications that consume Linked Data
2. Linked Data Fusion
   1. The Linking Process
   2. Inconsistency Resolution
   3. Provenance Tracking and Explanations
The Classic Web
Single global information space
1. URLs as
   - globally unique IDs
   - retrieval mechanism
2. HTML as shared content format
3. Hyperlinks
Shortcomings
• Content is not well structured
• You cannot ask expressive queries
• You cannot process content within applications
[Figure: search engines and Web browsers on top of HTML documents A, B and C connected by hyperlinks]
Linked Data
Use Semantic Web technologies to
1. publish structured data on the Web,
2. set links between data from one data source
to data within other data sources.
[Figure: things in data sources A to E connected by typed links]
Linked Data Principles
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful RDF
information.
4. Include RDF statements that link to other URIs so that
they can discover related things.
Tim Berners-Lee 2007
http://www.w3.org/DesignIssues/LinkedData.html
The RDF Data Model
[Figure: RDF graph with the following statements]
pd:cygri rdf:type foaf:Person .
pd:cygri foaf:name "Richard Cyganiak" .
pd:cygri foaf:based_near dbpedia:Berlin .
Data objects are identified with HTTP URIs
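[Figure: the same RDF graph, with each node and property identified by an HTTP URI]
pd:cygri rdf:type foaf:Person .
pd:cygri foaf:name "Richard Cyganiak" .
pd:cygri foaf:based_near dbpedia:Berlin .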
pd:cygri = http://richard.cyganiak.de/foaf.rdf#cygri
dbpedia:Berlin = http://dbpedia.org/resource/Berlin
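The prefixed names on this slide are shorthand for full HTTP URIs. A minimal Python sketch of that expansion follows. The `pd` and `dbpedia` namespaces are derived from the two URIs given above, `foaf` and `rdf` are the standard namespace URIs, and the `expand` helper is a hypothetical name used only for illustration.

```python
# Namespace table: pd and dbpedia follow from the URIs on the slide,
# foaf and rdf are the standard namespace URIs.
PREFIXES = {
    "pd": "http://richard.cyganiak.de/foaf.rdf#",
    "dbpedia": "http://dbpedia.org/resource/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(name: str) -> str:
    """Expand a prefixed name such as 'pd:cygri' into a full URI (hypothetical helper)."""
    prefix, _, local = name.partition(":")
    return PREFIXES[prefix] + local


# The three statements from the slide as (subject, predicate, object) tuples.
triples = [
    ("pd:cygri", "rdf:type", "foaf:Person"),
    ("pd:cygri", "foaf:name", '"Richard Cyganiak"'),  # a literal, not a URI
    ("pd:cygri", "foaf:based_near", "dbpedia:Berlin"),
]

for s, p, o in triples:
    # Literals are kept as they are; everything else is expanded.
    obj = o if o.startswith('"') else expand(o)
    print(expand(s), expand(p), obj)
```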
Dereferencing URIs over the Web
[Figure: looking up dbpedia:Berlin and then dp:Cities_in_Germany extends the graph step by step]
pd:cygri rdf:type foaf:Person .
pd:cygri foaf:name "Richard Cyganiak" .
pd:cygri foaf:based_near dbpedia:Berlin .
dbpedia:Berlin dp:population "3.405.259" .
dbpedia:Berlin skos:subject dp:Cities_in_Germany .
dbpedia:Hamburg skos:subject dp:Cities_in_Germany .
dbpedia:Muenchen skos:subject dp:Cities_in_Germany .
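Dereferencing means issuing an ordinary HTTP GET for a URI and asking for RDF. The sketch below only prepares such a request with Python's standard library and does not send it. It assumes the server supports content negotiation for RDF/XML; `build_lookup` is a hypothetical helper name.

```python
from urllib.parse import urldefrag
from urllib.request import Request


def build_lookup(uri: str) -> Request:
    """Prepare (but do not send) an HTTP GET that asks for RDF (hypothetical helper)."""
    # A hash URI is looked up by fetching the document part before the '#'.
    document_url, _fragment = urldefrag(uri)
    # Assumption: the server understands content negotiation and can
    # return RDF/XML when asked via the Accept header.
    return Request(document_url, headers={"Accept": "application/rdf+xml"})


request = build_lookup("http://richard.cyganiak.de/foaf.rdf#cygri")
print(request.full_url)              # the document that describes pd:cygri
print(request.get_header("Accept"))  # the requested representation

# Sending it would be: urllib.request.urlopen(request).read()
```

The returned RDF can then be parsed and any URIs found in it looked up in the same way, which is how a client moves from one data source to the next.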
The Disco – Hyperdata Browser
2. Linked Data Deployment on the Web
• Is this real?
[Figure: things in data sources A to E connected by typed links]
W3C Linking Open Data Project
• Community effort to
  - publish existing open license datasets as Linked Data on the Web
  - interlink things between different data sources
LOD Datasets on the Web: May 2007
• Over 500 million RDF triples
• Around 120,000 RDF links between data sources
Example RDF Links
• RDF links from DBpedia to other data sources
<http://dbpedia.org/resource/Berlin> owl:sameAs
    <http://sws.geonames.org/2950159> .
<http://dbpedia.org/resource/Tim_Berners-Lee> owl:sameAs
    <http://www4.wiwiss.fu-berlin.de/dblp/resource/person/100007> .
• RDF link from a FOAF profile to DBpedia
<http://richard.cyganiak.de/foaf.rdf#cygri> foaf:topic_interest
    <http://dbpedia.org/resource/Semantic_Web> .
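What makes these statements RDF links rather than ordinary triples is that subject and object belong to different data sources. A small sketch of that check follows, using the three links above with their prefixes expanded. Treating the host name as the data-source boundary is a simplifying assumption, and `is_rdf_link` is a hypothetical helper.

```python
from urllib.parse import urlparse

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
FOAF_TOPIC_INTEREST = "http://xmlns.com/foaf/0.1/topic_interest"

# The three links from the slide with prefixes expanded.
links = [
    ("http://dbpedia.org/resource/Berlin", OWL_SAME_AS,
     "http://sws.geonames.org/2950159"),
    ("http://dbpedia.org/resource/Tim_Berners-Lee", OWL_SAME_AS,
     "http://www4.wiwiss.fu-berlin.de/dblp/resource/person/100007"),
    ("http://richard.cyganiak.de/foaf.rdf#cygri", FOAF_TOPIC_INTEREST,
     "http://dbpedia.org/resource/Semantic_Web"),
]


def is_rdf_link(subject: str, obj: str) -> bool:
    """True when subject and object live on different hosts.

    Assumption: one host corresponds to one data source.
    """
    return urlparse(subject).netloc != urlparse(obj).netloc


for s, p, o in links:
    print(urlparse(s).netloc, "->", urlparse(o).netloc, "via", p)
```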
LOD Datasets on the Web: February 2008
LOD Datasets on the Web: September 2008
• More than 2 billion RDF triples
• More than 6 million RDF links
The Bio2RDF Project
• Goals
  1. Make bioinformatics data available in RDF format on the Web.
  2. Promote the linked data vision within the bioinformatics community.
  3. Answer questions which were not possible or practical to ask before.
• Participants
  - Université Laval, Canada
  - Queensland University of Technology, Australia
The Bio2RDF Cloud
• 27 data sources
• 260 million records
• 2.7 billion RDF triples
3. Applications
What can I do with this?
[Figure: Linked Data browsers, Linked Data mashups and search engines on top of things in data sources A to E connected by typed links]
Linked Data Browsers
• Tabulator Browser (MIT, USA)
• Disco Hyperdata Browser (FU Berlin, DE)
• OpenLink RDF Browser (OpenLink, UK)
• Zitgist RDF Browser (Zitgist, USA)
• Humboldt (HP Labs, UK)
• Fenfire (DERI, Ireland)
• Marbles (FU Berlin, DE)
Linked Data Mashups
• Domain-specific applications using Linked Data from the Web
DBtune Slashfacet
• Visualizes music-related Linked Data
• Uses LastFM, MySpace, and BBC data
DBpedia Mobile
• Geospatial entry point into the Web of Data
• Starts with DBpedia, Revyu and Flickr data
DERI Semantic Web Pipes
Web of Data Search Engines
• Falcons (IWS, China)
• Sindice (DERI, Ireland)
• MicroSearch (Yahoo, Spain)
• Watson (Open University, UK)
• SWSE (DERI, Ireland)
• Swoogle (UMBC, USA)
Falcons
Is this good enough?
No.
2. Linked Data Fusion
Users want an integrated view on all data that is available about a real-world entity!
[Figure: an application builds an integrated view over data objects 1 to 6 from data sources A, B and C, which are connected by owl:sameAs links]
Linked Data Fusion - Requirements
1. Map data into a single schema
   so that data can be rendered and queried properly.
2. Smush data from all sources about a single real-world entity
   while keeping track of information provenance.
3. Resolve inconsistencies in the data
   by applying different data fusion heuristics.
4. Be able to explain the fusion process
   (Tim Berners-Lee's "Oh, yeah?" button).
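Requirement 2 can be sketched in a few lines of Python: URIs connected by owl:sameAs are grouped with a union-find structure, and every value keeps the name of the source it came from. This is an illustrative sketch, not a description of any existing tool. The `smush` function, the plain `population` property name and the GeoNames population figure are made up for the example.

```python
from collections import defaultdict


def smush(quads, same_as):
    """Merge statements about co-referent URIs, keeping the source of each value.

    quads:   iterable of (subject, predicate, value, source)
    same_as: iterable of (uri_a, uri_b) pairs taken from owl:sameAs links
    """
    parent = {}

    def find(uri):
        # Union-find lookup with path halving.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]
            uri = parent[uri]
        return uri

    for a, b in same_as:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the lexicographically smaller root so the canonical
            # URI of a group is deterministic.
            keep, drop = sorted((root_a, root_b))
            parent[drop] = keep

    merged = defaultdict(lambda: defaultdict(list))
    for subject, predicate, value, source in quads:
        merged[find(subject)][predicate].append((value, source))
    return merged


# Hypothetical input: two sources describe Berlin under different URIs.
quads = [
    ("http://dbpedia.org/resource/Berlin", "population", 3405259, "dbpedia"),
    ("http://sws.geonames.org/2950159", "population", 3426354, "geonames"),  # made-up value
]
same_as = [("http://dbpedia.org/resource/Berlin", "http://sws.geonames.org/2950159")]

view = smush(quads, same_as)
for entity, properties in view.items():
    print(entity, dict(properties))
```

Because each value is stored together with its source, the conflicting population figures stay visible and can be resolved or explained in a later step.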
Roles in the Linked Data Scenario
• Data Publisher
  1. Publish data itself
  2. Set RDF links to other data items describing the same real-world entity.
  3. Reuse terms from existing vocabularies or set links to related schemata.
  4. Publish metadata about
     - provenance
     - timeliness
     - data license
• Client Application
  1. Map data into single schema.
  2. Smush data from different sources about real-world entity.
  3. Resolve inconsistencies in the data.
  4. Keep track of information provenance and lineage.
  5. Explain fusion process.
2.1 Setting RDF Links
• Today:
  - Simple pattern- and graph-matching based techniques are used to generate links.
  - Usually proprietary code.
• There is a lot of existing work on identity resolution in the database and knowledge representation communities that can be built on:
  - Rule-based approaches
  - Distance-based techniques
  - Probabilistic matching
  - Supervised and unsupervised learning
  - Use of a wide range of distance metrics
see: Elmagarmid et al.: Duplicate Record Detection: A Survey. IEEE TKDE, 2007.
Linking Frameworks
• Goal: (Semi-)automatically generate RDF links based on declarative rules.
• Ongoing work
  - Oktie Hassanzadeh (University of Toronto): ODDLinker
  - Andriy Nikolov et al. (Open University): KnoFuss
  - Julius Volz (Freie Universität Berlin): XXXX
CREATE LINKS owl:sameAs
BETWEEN a FROM dbpedia AND b FROM factbook
RESTRICT a TO { ?a rdf:type dbpedia-owl:Country }
METRIC { STRING_SIMILARITY(a/rdfs:label, b/rdfs:label),
NUM_SIMILARITY(a/p:populationEstimate,
b/factbook:population_total),
NUM_SIMILARITY(a/p:areaKm, b/factbook:area_total) }
THRESHOLDS MATCH 0.9 VERIFY 0.7;
seeAlso: http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/EquivalenceMining
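The rule above compares labels, population and area, and classifies a pair of resources by two thresholds. The Python sketch below mimics that logic for illustration only. The slide does not say how the three metrics are combined or how each similarity is defined, so the plain average, the difflib-based string similarity and the relative-difference numeric similarity are all assumptions, as are the function names and the sample records.

```python
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] based on difflib's matching-block ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def num_similarity(a: float, b: float) -> float:
    """1.0 for equal numbers, falling towards 0.0 as the relative difference grows."""
    if a == b:
        return 1.0
    return max(0.0, 1.0 - abs(a - b) / max(abs(a), abs(b)))


def classify(a: dict, b: dict, match: float = 0.9, verify: float = 0.7) -> str:
    """Return 'match', 'verify' or 'no link' for a pair of country records."""
    scores = [
        string_similarity(a["label"], b["label"]),
        num_similarity(a["population"], b["population"]),
        num_similarity(a["area_km"], b["area_km"]),
    ]
    score = sum(scores) / len(scores)  # assumption: plain average of the metrics
    if score >= match:
        return "match"   # confident enough to emit owl:sameAs
    if score >= verify:
        return "verify"  # candidate link that needs manual checking
    return "no link"


# Made-up records standing in for a DBpedia and a Factbook description.
dbpedia_country = {"label": "Thailand", "population": 65000000, "area_km": 513000}
factbook_country = {"label": "Thailand", "population": 65500000, "area_km": 514000}
other_country = {"label": "Finland", "population": 5300000, "area_km": 338000}

print(classify(dbpedia_country, factbook_country))
print(classify(dbpedia_country, other_country))
```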
Schema Level RDF Links
• Today: Simple mappings:
  - owl:equivalentClass
  - owl:equivalentProperty
  - rdfs:subClassOf
  - rdfs:subPropertyOf
• UMBEL effort:
• Lots of existing work on schema/ontology matching to build on.
• Missing: Agreed-upon way to publish more expressive mapping rules on the Web.
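A client can use such simple mappings to translate data into a single target schema by rewriting predicates. The sketch below shows the idea with a hand-written mapping table. Which properties are actually declared equivalent on the Web is not stated in the slides, so the table entries, the `factbook:Thailand` identifier and the `to_target_schema` helper are assumptions made for illustration.

```python
# Hypothetical mapping table: source property -> property in the target schema.
# Each entry stands for an owl:equivalentProperty or rdfs:subPropertyOf link.
PROPERTY_MAP = {
    "factbook:population_total": "p:populationEstimate",
    "factbook:area_total": "p:areaKm",
}


def to_target_schema(triples, property_map=PROPERTY_MAP):
    """Rewrite predicates into the target schema; unmapped predicates pass through."""
    return [(s, property_map.get(p, p), o) for s, p, o in triples]


# Made-up source data.
source = [
    ("factbook:Thailand", "factbook:population_total", 65000000),
    ("factbook:Thailand", "rdfs:label", "Thailand"),
]

for triple in to_target_schema(source):
    print(triple)
```

A lookup table like this covers only one-to-one correspondences. Anything that needs unit conversion or value transformation falls under the more expressive mapping rules that the slide lists as missing.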
2.2 Publish Metadata
• Document Metadata
  - Dublin Core, Semantic Web Publishing Vocabulary
• Licensing Metadata
  - Creative Commons Licensing Framework
  - Open Data Commons Public Domain Dedication & Licence (PDDL)
# Metadata and Licensing Information
<http://dbpedia.org/data/Alec_Empire> rdf:type foaf:Document ;
    dc:publisher <http://dbpedia.org/resource/DBpedia> ;
    dc:date "2007-07-13"^^xsd:date ;
    dc:rights <http://en.wikipedia.org/wiki/WP:GFDL> .
# The Document Content
<http://dbpedia.org/resource/Alec_Empire> rdf:type foaf:Person ;
    foaf:name "Empire, Alec" ;
    dbpedia-owl:associatedBand dbpedia:Atari_Teenage_Riot .
2.3. Provenance and Lineage Tracking
• Named Graphs data model
  - part of W3C SPARQL Recommendation
  - implemented by an increasing number of RDF stores
# TriG Representation of three Named Graphs
:G1 { :Monica ex:name "Monica Murphy" .
      :Monica ex:homepage <http://www.monicamurphy.org> .
      :Monica ex:email <mailto:[email protected]> . }
:G2 { :Monica rdf:type ex:Person .
      :Monica ex:hasSkill ex:Programming }
:G3 { :G1 swp:assertedBy _:w1 .
      _:w1 swp:authority :Chris .
      _:w1 dc:date "2003-10-02"^^xsd:date .
      :G2 swp:quotedBy _:w2 .
      _:w2 swp:authority :Chris .
      _:w2 dc:date "2003-09-03"^^xsd:date . }
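Named graphs turn triples into quads, which lets a client ask where a statement came from. The Python sketch below stores the example as (graph, subject, predicate, object) tuples and looks up the warrant attached to the graph that contains a given statement. The `provenance` function is a hypothetical helper, and the e-mail statement is left out because its address is not legible in the transcript.

```python
# Quads: (graph, subject, predicate, object), following the TriG example.
quads = [
    (":G1", ":Monica", "ex:name", "Monica Murphy"),
    (":G1", ":Monica", "ex:homepage", "http://www.monicamurphy.org"),
    (":G2", ":Monica", "rdf:type", "ex:Person"),
    (":G2", ":Monica", "ex:hasSkill", "ex:Programming"),
    (":G3", ":G1", "swp:assertedBy", "_:w1"),
    (":G3", "_:w1", "swp:authority", ":Chris"),
    (":G3", "_:w1", "dc:date", "2003-10-02"),
    (":G3", ":G2", "swp:quotedBy", "_:w2"),
    (":G3", "_:w2", "swp:authority", ":Chris"),
    (":G3", "_:w2", "dc:date", "2003-09-03"),
]


def provenance(statement, quads):
    """Return (graph, relation, authority, date) for each graph containing the statement."""
    results = []
    containing = [g for g, s, p, o in quads if (s, p, o) == statement]
    for graph in containing:
        for _, subject, relation, warrant in quads:
            if subject == graph and relation in ("swp:assertedBy", "swp:quotedBy"):
                # Collect what is said about the warrant node.
                facts = {p: o for _, ws, p, o in quads if ws == warrant}
                results.append(
                    (graph, relation, facts.get("swp:authority"), facts.get("dc:date"))
                )
    return results


print(provenance((":Monica", "ex:hasSkill", "ex:Programming"), quads))
```

The distinction between `swp:assertedBy` and `swp:quotedBy` is preserved in the result, so a client can treat asserted and merely quoted information differently.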
2.4. Inconsistency Resolution
• There is lots of overlap between LOD datasets
  - Places: DBpedia, Geonames, Riese, …
  - People: Freebase, LinkedMDB, DBLP, …
  - Music: DBpedia, Musicbrainz, Jamendo, …
• There are naturally lots of inconsistencies
  - DBpedia: Person born at date X.
  - Freebase: Person born at date Y.
  - DBpedia: Band album X.
  - Musicbrainz: Band album Y.
  - Geonames: City has geo-coordinates
  - Freebase: City has geo-coordinates
Inconsistency Resolution Strategies
• Pass it on
  - Pass conflicting values to the user and let them decide.
• Take the information
  - If a value is missing in dataset 1, use the value from dataset 2.
• Trust your friends
  - Prefer information from certain sources.
• Cry with the wolves
  - Choose the most common value.
• Meet in the middle
  - Take the average of all values.
• Keep up to date
  - Use the newest value.
See also: Bleiholder and Naumann: Conflict Handling Strategies in an Integrated Information System. WWW2006.
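Each of these strategies is a small function over the conflicting values. The Python sketch below is one possible reading of them, not the definitions from the cited paper. It assumes that every candidate carries a value, a source name and an ISO date, and the sample population figures and dates are made up.

```python
from collections import Counter
from statistics import mean

# Each candidate is (value, source, date); ISO dates compare correctly as strings.


def pass_it_on(candidates):
    """Hand all conflicting values to the user."""
    return [value for value, _, _ in candidates]


def take_the_information(primary, fallback):
    """Use the fallback value only when the primary one is missing."""
    return primary if primary is not None else fallback


def trust_your_friends(candidates, preferred_sources):
    """Return the value from the first preferred source that has one."""
    for source in preferred_sources:
        for value, candidate_source, _ in candidates:
            if candidate_source == source:
                return value
    return None


def cry_with_the_wolves(candidates):
    """Choose the most common value."""
    return Counter(value for value, _, _ in candidates).most_common(1)[0][0]


def meet_in_the_middle(candidates):
    """Take the average of all (numeric) values."""
    return mean(value for value, _, _ in candidates)


def keep_up_to_date(candidates):
    """Use the value with the newest date."""
    return max(candidates, key=lambda c: c[2])[0]


# Made-up conflicting population values for one city.
candidates = [
    (3405259, "dbpedia", "2008-06-01"),
    (3426354, "geonames", "2008-09-15"),
    (3405259, "freebase", "2007-12-31"),
]

print(cry_with_the_wolves(candidates))
print(keep_up_to_date(candidates))
print(trust_your_friends(candidates, ["geonames", "dbpedia"]))
```

Which strategy is appropriate depends on the property: averaging can make sense for measurements, while it is meaningless for names or dates of birth.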
2.5. Explain Data Provenance and Fusion Steps
• Tim Berners-Lee's "Oh, yeah?" button.
• Existing work:
  - Deborah McGuinness et al.: Inference Web: Portable Explanations for the Web.
  - Chris Bizer: Web Information Quality Assessment Framework (WIQA)
Example WIQA Explanations
Outlook
• Lots of exciting open issues to solve!
• DIST-related technologies will be one of the hot topics in the coming years (see for instance WWW2009)
• Important for LOD
  - Progress with publishing schema mappings on the Web
  - Progress with data fusion
  - Linked Data client applications that address all issues mentioned
• Please submit such solutions and client applications to
  - Semantic Web Challenge 2009
  - Linked Data on the Web (LDOW2009) workshop at WWW2009
  - IJSWIS Special Issue on Linked Data
Thanks!
References
• Linking Open Data Project Wiki
  http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
• Tutorial on How to Publish Linked Data on the Web
  http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/