Ferdowsi University of Mashhad
Web Technology Lab. (WTLab), www.wtlab.um.ac.ir
Linked Data Group (LDG)
Linked Data at present
Using Linked Data
Mahboubeh Dadkhah
May 11, 2011
You may know the Linked Data History
• Linked Data Design Issues by TimBL, July 2006
• Linked Open Data Project, WWW2007
• First LOD Cloud, May 2007
• 1st Linked Data on the Web Workshop, WWW2008
• 1st Triplification Challenge, 2008
• How to Publish Linked Data Tutorial, ISWC2008
• BBC publishes Linked Data, 2008
• 2nd Linked Data on the Web Workshop, WWW2009
• NY Times announcement, SemTech2009 - ISWC09
• 1st Linked Data-a-thon, ISWC2009
• 1st How to Consume Linked Data Tutorial, ISWC2009
• Data.gov.uk publishes Linked Data, 2010
• 2nd How to Consume Linked Data Tutorial, WWW2010
• 1st International Workshop on Consuming Linked Data, COLD2010
• …
May 2007
Cloud statistics
Now that the Linked Data is here
What to do next?
Let’s Make Use of It
Linked Data
• Before using it, we should be sure that we
understand what it means.
• What was the problem:
Searching and Finding
Search for
Football Players who went to the University of Texas at
Austin, played for the Dallas Cowboys as Cornerback
Why can’t we find it?
Current Web = internet + links + docs
So, what to do?
• Make it easy for computers/software to find
THINGS
Publish Thing
• As data
• In a standardized way: RDF
• RDF data is serialized in different ways:
– RDF/XML, RDFa, N3, Turtle, JSON
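To make the triple model behind these serializations concrete, here is a minimal Java sketch that prints two statements as N-Triples lines (a line-based subset of Turtle). The `TripleDemo` class, the `example.org` URIs and the `hasReview` property URI are illustrative assumptions, not part of the talk; a real application would use an RDF library such as Jena.

```java
import java.util.List;

/** Minimal sketch: RDF statements as subject-predicate-object triples. */
final class TripleDemo {

    /** One RDF statement; the object is either a URI or a plain literal. */
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;
        final boolean objectIsLiteral;

        Triple(String subject, String predicate, String object, boolean objectIsLiteral) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
            this.objectIsLiteral = objectIsLiteral;
        }

        /** Renders the triple as one N-Triples line (also valid Turtle). */
        String toNTriples() {
            String obj = objectIsLiteral
                    ? "\"" + object.replace("\\", "\\\\").replace("\"", "\\\"") + "\""
                    : "<" + object + ">";
            return "<" + subject + "> <" + predicate + "> " + obj + " .";
        }
    }

    /** Hypothetical book description; all example.org URIs are placeholders. */
    static List<Triple> sampleBook() {
        String book = "http://example.org/book/isbn978";
        return List.of(
                new Triple(book, "http://purl.org/dc/terms/title",
                        "Programming the Semantic Web", true),
                new Triple(book, "http://example.org/vocab/hasReview",
                        "http://example.org/review1", false));
    }

    public static void main(String[] args) {
        sampleBook().forEach(t -> System.out.println(t.toNTriples()));
    }
}
```

The same two statements could be written in any of the serializations listed above; only the syntax changes, not the triples.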
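[Figure: example RDF graph. A book (http://…/isbn978; title "Programming the Semantic Web", author "Toby Segaran", isbn "978-0-596-15381-6", publisher http://…/publisher1 with name "O’Reilly") is linked via hasReview to a review (http://…/review1, description "Awesome Book"), whose reviewer (http://…/reviewer, name "Juan Sequeda") is sameAs http://juansequeda.com/id (name "Juan Sequeda", livesIn http://dbpedia.org/Austin). Edge labels: hasReview, hasReviewer, description, title, author, isbn, publisher, name, sameAs, livesIn.]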
2009’s Top 10 Linked Data Research Issues
• Data Linking and Fusion
– linking algorithms and heuristics, identity resolution
– Web data integration and data fusion
– evaluating quality and trustworthiness of Linked Data
• Linked Data Application Architectures
– crawling, caching and querying Linked Data on the Web; optimizations,
performance
– Linked Data browsers, search engines
– applications that exploit distributed Web datasets
• Data Publishing
– tools for publishing large data sources as Linked Data on the Web (e.g. relational
databases, XML repositories)
– embedding data into classic Web documents (e.g. GRDDL, RDFa, Microformats)
– licensing and provenance tracking issues in Linked Data publishing
– business models for Linked Data publishing and consumption
2010’s Top 10 Linked Data Research Issues
• Linked Data Application Architectures
– crawling, caching and querying Linked Data
– dataset dynamics and synchronization
– Linked Data mining
• Data Linking and Data Fusion
– linking algorithms and heuristics, identity resolution
– Web data integration and data fusion
– link maintenance
– performance of linking infrastructures/algorithms on Web data
• Quality, Trust and Provenance in Linked Data
– tracking provenance and usage of Linked Data
– evaluating quality and trustworthiness of Linked Data
– profiling of Linked Data sources
• User Interfaces for the Web of Data
– approaches to visualizing and interacting with distributed Web data
– Linked Data browsers and search engines
• Data Publishing
– tools for publishing large data sources as Linked Data on the Web (e.g. relational databases, XML repositories)
– embedding data into classic Web documents (e.g. RDFa, Microformats)
– describing data on the Web (e.g. voiD, semantic site maps)
– licensing issues in Linked Data publishing
2011’s Top 10 Linked Data Research Issues
• Foundations of Linked Data
– Web architecture and dataspace theory
– dataset dynamics and synchronisation
– analyzing and profiling the Web of Data
• Data Linking and Fusion
– entity consolidation and linking algorithms
– Web-based data integration and data fusion
– performance and scalability of integration architectures
• Data Usage
– tracking provenance of Linked Data
– evaluating quality and trustworthiness of Linked Data
– licensing issues in Linked Data publishing
– distributed query of Linked Data
– RDF-to-X, turning RDF to legacy data
• Data Publishing
– publishing legacy data sources as Linked Data on the Web
– cost-benefits of the 5 star LOD plan
• Write-enabling the Web of Data
– access authentication mechanisms for Linked Datasets (WebID, etc.)
– authorisation mechanisms for Linked Datasets (WebACL, etc.)
– enabling write-access to legacy data sources (Google APIs, Flickr API, etc.)
• Interacting with the Web of Data
– approaches to visualising Linked Data
– interacting with distributed Web data
– Linked Data browsers, indexers and search engines
Linked Data makes the web appear
as
ONE
GIANT
HUGE
GLOBAL
DATABASE!
Do you remember
Search and Find
?
Query Linked Data with
SPARQL Endpoints
• Linked Data sources usually provide a SPARQL
endpoint for their dataset(s)
• SPARQL endpoint: SPARQL query processing
service that supports the SPARQL protocol*
• Send your SPARQL query, receive the result
* http://www.w3.org/TR/rdf-sparql-protocol/
http://www.w3.org/wiki/SparqlEndpoints
http://labs.mondeca.com/sparqlEndpointsStatus/
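As a minimal sketch of "send your SPARQL query, receive the result", the Java code below builds the HTTP GET request described by the SPARQL protocol: the query text goes into the `query` URL parameter, and the `Accept` header asks for JSON results. It uses only the JDK's `java.net.http` client (Java 11+). The class name `SparqlRequest`, the choice of the DBpedia endpoint and the sample query are assumptions for illustration.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

final class SparqlRequest {

    /** Builds a SPARQL protocol GET request: endpoint?query=<url-encoded query>. */
    static HttpRequest build(String endpoint, String sparql) {
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(endpoint + "?query=" + encoded))
                .header("Accept", "application/sparql-results+json")
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Illustrative query; any SELECT query the endpoint understands would do.
        String query = "SELECT ?s WHERE { ?s a <http://dbpedia.org/ontology/Company> } LIMIT 5";
        HttpRequest request = build("http://dbpedia.org/sparql", query);
        // Network call: only works while the endpoint is reachable.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```

Libraries such as Jena wrap exactly this exchange, so application code rarely builds the request by hand.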
SPARQL queries over multiple datasets
How to do this?
1. Issue follow-up queries to different endpoints
2. Querying a central collection of datasets
3. Build store with copies of relevant datasets
4. Use query federation system
1- Follow-up Queries
• Idea: issue follow-up queries over other
datasets based on results from previous
queries
• Substituting placeholders in query templates
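// Jena ARQ classes: org.apache.jena.query in current releases
// (com.hp.hpl.jena.query in 2011-era releases)
import org.apache.jena.query.*;

String s1 = "http://cb.semsol.org/sparql";
String s2 = "http://dbpedia.org/sparql";
// Template for the follow-up query; %s is replaced by a URI from the first result set
String qTmpl = "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> "
             + "SELECT ?c WHERE { <%s> rdfs:comment ?c }";
// q1: find a list of companies, filtered by some criteria, and return their DBpedia URIs
String q1 = "SELECT ?s WHERE { ...";
QueryExecution e1 = QueryExecutionFactory.sparqlService(s1, q1);
ResultSet results1 = e1.execSelect();
while ( results1.hasNext() ) {
    QuerySolution sol = results1.nextSolution();
    // Fill the template with the URI bound to ?s and ask the second endpoint
    String q2 = String.format( qTmpl, sol.getResource("s").getURI() );
    QueryExecution e2 = QueryExecutionFactory.sparqlService(s2, q2);
    ResultSet results2 = e2.execSelect();
    while ( results2.hasNext() ) {
        // ...
    }
    e2.close();
}
e1.close();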
1- Follow-up Queries
Advantage
– Queried data is up-to-date
× Drawbacks
– Requires the existence of a SPARQL endpoint for each
dataset
– Requires program logic
– Very inefficient: one remote request per intermediate result
2- Querying a Collection of Datasets
• Idea: Use an existing SPARQL endpoint that
provides access to a set of copies of relevant
datasets
• Example:
– SPARQL endpoint over a majority of datasets from
the LOD cloud at:
http://uberblic.org
http://lod.openlinksw.com/sparql
(Linked) Data Marketplaces
• FactForge
– Integrates some of the most central LOD datasets
– General-purpose information (not specific to a
domain)
– 1.2 billion explicit and 1 billion inferred statements
– The largest upper-level knowledge base
– http://www.FactForge.net
• LinkedLifeData
– 25 of the most popular life-science datasets
– 2.7 billion explicit and 1.4 billion inferred statements
– http://www.LinkedLifeData.com
2- Querying a Collection of Datasets
Advantage
– No need for specific program logic
× Drawbacks
– Queried data might be out of date
– Not all relevant datasets in the collection
3- Own Store of Dataset Copies
• Idea: Build your own store with copies of relevant
datasets and query it
• Possible stores:
– Jena TDB http://jena.hpl.hp.com/wiki/TDB
– Sesame http://www.openrdf.org/
– OpenLink Virtuoso http://virtuoso.openlinksw.com/
– 4store http://4store.org/
– AllegroGraph http://www.franz.com/agraph/
– etc.
3- Own Store of Dataset Copies
Advantages
– No need for specific program logic
– Can include all datasets
– Independent of the existence, availability, and
efficiency of SPARQL endpoints
× Drawbacks
– Requires effort to set up and to operate the store
– Ideally, data sources provide RDF dumps; if not?
– How to keep the copies in sync with the originals?
– Queried data might be out of date
4- Federated Query Processing
• Idea: Querying a mediator which distributes
sub-queries to relevant sources and integrates
the results
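The mediator idea can be sketched in a few lines of Java. This is a toy model under loud assumptions: each "source" is an in-memory list of triples rather than a remote SPARQL endpoint, every triple pattern is sent to every source (no source selection), and results are combined with a simple nested-loop join. The class `MediatorSketch` and the prefixed names are hypothetical; this is not how DARQ or SemWIQ are implemented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MediatorSketch {

    /** Matches one triple pattern against one source; terms starting with '?' are variables. */
    static List<Map<String, String>> askSource(List<String[]> source, String[] pattern) {
        List<Map<String, String>> out = new ArrayList<>();
        for (String[] triple : source) {
            Map<String, String> binding = new HashMap<>();
            boolean ok = true;
            for (int i = 0; i < 3 && ok; i++) {
                if (pattern[i].startsWith("?")) {
                    String prev = binding.putIfAbsent(pattern[i], triple[i]);
                    ok = prev == null || prev.equals(triple[i]);
                } else {
                    ok = pattern[i].equals(triple[i]);
                }
            }
            if (ok) out.add(binding);
        }
        return out;
    }

    /** Two solutions are compatible if they agree on every shared variable. */
    static boolean compatible(Map<String, String> a, Map<String, String> b) {
        for (Map.Entry<String, String> e : a.entrySet()) {
            String other = b.get(e.getKey());
            if (other != null && !other.equals(e.getValue())) return false;
        }
        return true;
    }

    /** Sends each pattern (sub-query) to every source and joins the partial results. */
    static List<Map<String, String>> query(List<List<String[]>> sources, List<String[]> patterns) {
        List<Map<String, String>> solutions = new ArrayList<>();
        solutions.add(new HashMap<>()); // start with one empty solution
        for (String[] pattern : patterns) {
            List<Map<String, String>> partial = new ArrayList<>();
            for (List<String[]> source : sources) partial.addAll(askSource(source, pattern));
            List<Map<String, String>> joined = new ArrayList<>();
            for (Map<String, String> left : solutions) {
                for (Map<String, String> right : partial) {
                    if (compatible(left, right)) {
                        Map<String, String> merged = new HashMap<>(left);
                        merged.putAll(right);
                        joined.add(merged);
                    }
                }
            }
            solutions = joined;
        }
        return solutions;
    }

    /** Two hypothetical sources: a company dataset and a DBpedia-like dataset. */
    static List<Map<String, String>> demo() {
        List<String[]> companies = List.<String[]>of(new String[] {"ex:acme", "owl:sameAs", "db:Acme"});
        List<String[]> dbpedia = List.<String[]>of(new String[] {"db:Acme", "rdfs:comment", "A company"});
        List<String[]> patterns = List.<String[]>of(
                new String[] {"?c", "owl:sameAs", "?d"},
                new String[] {"?d", "rdfs:comment", "?text"});
        return query(List.of(companies, dbpedia), patterns);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The caller writes one query and never names a source; which source answers which pattern is the mediator's concern.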
4- Federated Query Processing
• DARQ (Distributed ARQ)
– http://darq.sourceforge.net/
– Query engine for federated SPARQL queries
– Extension of ARQ (query engine for Jena)
– Last update: June 28, 2006
• Semantic Web Integrator and Query Engine (SemWIQ)
– http://semwiq.sourceforge.net/
– Actively maintained!
4- Federated Query Processing
Advantages
– No need for specific program logic
– Queried data is up to date
× Drawbacks
– Requires the existence of a SPARQL endpoint for each
dataset
– Requires effort to set up and configure the mediator
In any case
• You have to know the relevant data sources
– When developing the app using follow-up queries
– When selecting an existing SPARQL endpoint over a collection of
dataset copies
– When setting up your own store with a collection of dataset
copies
– When configuring your query federation system
• You restrict yourself to the selected sources
Automated Link Traversal
Idea: Discover further data by looking up relevant URIs in your
application
Can be combined with the previous approaches
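"Looking up" a URI means an HTTP GET that asks for an RDF representation via content negotiation. The following Java sketch (JDK `java.net.http`, Java 11+) shows one way to build such a request; the class name `UriLookup`, the `Accept` value and the DBpedia resource URI in `main` are assumptions chosen for illustration.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class UriLookup {

    /** Builds a GET request that asks for an RDF representation of the given URI. */
    static HttpRequest rdfRequest(String uri) {
        return HttpRequest.newBuilder(URI.create(uri))
                .header("Accept", "text/turtle, application/rdf+xml;q=0.9")
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Linked Data servers often answer with a 303 redirect to the data document,
        // so the client is configured to follow redirects.
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        // Network call: only works while the server is reachable.
        HttpResponse<String> response = client.send(
                rdfRequest("http://dbpedia.org/resource/Austin,_Texas"),
                HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

The returned document typically mentions further URIs, which the application can look up in turn.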
Link Traversal Based Query Execution
• Applies the idea of automated link traversal to the
execution of SPARQL queries
• Idea:
– Intertwine query evaluation with traversal of RDF links
– Discover data that might contribute to query results during
query execution
• Alternately:
– Evaluate parts of the query
– Look up URIs in intermediate solutions
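The alternation of evaluating and looking up can be sketched as follows. This is a simplified model, not the algorithm of any of the implementations named in this talk: the Web is replaced by an in-memory map from URI to the triples its document would return, patterns are evaluated strictly in the given order, and the class `LinkTraversalSketch` and all prefixed names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class LinkTraversalSketch {

    /** Stand-in for the Web: dereferencing a URI yields the triples of its document. */
    final Map<String, List<String[]>> web = new HashMap<>();
    /** Triples discovered so far (the query-local dataset). */
    final List<String[]> local = new ArrayList<>();
    final Set<String> visited = new HashSet<>();

    /** "Looks up" a URI at most once; unknown URIs and literals simply yield no data. */
    void lookUp(String uri) {
        if (visited.add(uri)) local.addAll(web.getOrDefault(uri, List.of()));
    }

    /** Returns s extended by matching pattern against triple, or null if they do not match. */
    static Map<String, String> extend(Map<String, String> s, String[] pattern, String[] triple) {
        Map<String, String> out = new HashMap<>(s);
        for (int i = 0; i < 3; i++) {
            if (pattern[i].startsWith("?")) {
                String prev = out.putIfAbsent(pattern[i], triple[i]);
                if (prev != null && !prev.equals(triple[i])) return null;
            } else if (!pattern[i].equals(triple[i])) {
                return null;
            }
        }
        return out;
    }

    /** Evaluates the patterns in order, looking up URIs between evaluation steps. */
    List<Map<String, String>> execute(List<String[]> patterns) {
        // Seed the local dataset with the URIs mentioned in the query itself.
        for (String[] p : patterns)
            for (String term : p)
                if (!term.startsWith("?")) lookUp(term);
        List<Map<String, String>> solutions = new ArrayList<>();
        solutions.add(new HashMap<>());
        for (String[] pattern : patterns) {
            // Step 1: look up URIs bound in the intermediate solutions.
            for (Map<String, String> s : solutions)
                for (String value : s.values()) lookUp(value);
            // Step 2: evaluate the next pattern over everything discovered so far.
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> s : solutions) {
                for (String[] triple : local) {
                    Map<String, String> ext = extend(s, pattern, triple);
                    if (ext != null) next.add(ext);
                }
            }
            solutions = next;
        }
        return solutions;
    }

    /** Two hypothetical documents; the second is only found by following a link. */
    static List<Map<String, String>> demo() {
        LinkTraversalSketch engine = new LinkTraversalSketch();
        engine.web.put("ex:acme", List.<String[]>of(new String[] {"ex:acme", "owl:sameAs", "db:Acme"}));
        engine.web.put("db:Acme", List.<String[]>of(new String[] {"db:Acme", "rdfs:comment", "A company"}));
        return engine.execute(List.<String[]>of(
                new String[] {"ex:acme", "owl:sameAs", "?d"},
                new String[] {"?d", "rdfs:comment", "?c"}));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In `demo()`, the document for `db:Acme` is never named in the query; it is discovered because `?d` was bound to that URI while evaluating the first pattern.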
Link Traversal Based Query Execution
Advantages
– No need to know all data sources in advance
– No need for specific programming logic
– Queried data is up to date
– Does not depend on the existence of SPARQL endpoints provided by the data sources
× Drawbacks
– Not as fast as a centralized collection of copies
– Unsuitable for some queries
– Results might be incomplete (do we care?)
Implementations
• Semantic Web Client library (SWClLib) for Java
http://www4.wiwiss.fu-berlin.de/bizer/ng4j/semwebclient/
• SWIC for Prolog
http://moustaki.org/swic/
• SQUIN http://squin.org
– Provides SWClLib functionality as a Web service
– Accessible like a SPARQL endpoint
Real World Example
What is a Linked Data application?
Software system that makes use of data on the
web from multiple datasets and that benefits
from links between the datasets
Characteristics of Linked Data
Applications
• Consume data that is published on the web following the
Linked Data principles: an application should be able to
request, retrieve and process the accessed data.
• Discover further information by following the links between
different data sources: the fourth Linked Data principle
(include links to other URIs) enables this.
• Combine the consumed Linked Data with data from other sources
(not necessarily Linked Data).
• Expose the combined data back to the web following the
Linked Data principles.
• Offer value to end-users.
1st Linked Data-a-thon
• Co-located with the 8th International Semantic Web Conference (ISWC 2009)
• The overall goal of this event was to
– Create a Linked Data application that shows innovative and new functionality
– Show that a "quick and dirty" Linked Data application can be developed in 3
days
Winners
• United States Linked Data Overlay
– Use Linked Data about geographical locations and display it on Google Earth.
• www.diversity-search.info
– Web and Image search engine augmented with Linked Data
– Pictures of David Beckham playing football in the different clubs he has played for
• Find traditional Chinese medicine as an alternative to western drugs
• iGoogr: Imagine Google was using Good Relations vocabulary for e‐commerce
Self-Service Development of Linked Data Applications
• Semantic cloud computing
• The Information Workbench as a self-service platform for the fast
development of domain-specific Linked Data solutions
• Designed with the goal to leverage Linked Data deployment in the
enterprise
• Implements concepts and features for data integration, interactive
visualization, exploration and analytics, as well as the collaborative
acquisition and authoring of Linked Data
• Data sources can be dynamically integrated at the click of a button
• The user interface can be flexibly customized based on a large, extensible
collection of widgets supporting data visualization, exploration, and
collaboration
Self-Service Development of Linked Data Applications
• Platform for Linked Data Application Development
– Base functionality to build applications without any programming
– SDK for easy extensions
– Available as open source at http://iwb.fluidops.com/
• Covering the entire lifecycle of interacting with Linked Data
– Discovery of data sources
– Integration of data sources
– Visualization
– Search and Exploration
– Collaborative generation of data
• Targeted at
– Linked Open Data, Linked Government Data
– Linked Enterprise Data
– Combinations thereof
Still remember
Search and Find
?
Challenges
• Discovering relevant data sources
• Discovering useful vocabularies
• The query
• Data Quality
• Finding more/useful Links
Challenges (COLD 2010)
• Web scale data management (indexing, crawling, etc.)
• Query processing over multiple linked datasets
• Search in the Web of Data
• Auto-discovery
– of URIs,
– of additional data that is not from the authoritative source of a URI,
– of relevant linked datasets in general
• Caching and replication
• Dataset dynamics
– processing change notifications,
– keeping consistency,
– temporal tracking of linked datasets
• Reasoning on Linked Data from multiple sources
• Knowledge discovery deriving insights from the Web of Data
• Information quality of Linked Data
– information quality assessment,
– trustworthiness,
– provenance
• User interface research for the interaction with the Web of Data
– user interaction and usability,
– visualizing Linked Data,
– natural language interfaces
Thank You
Any opinions?
Any questions?