Storing and Accessing Semantic Data
Semantic
Data
Access
Semantic CMS Community
Lecturer
Organization
Date of presentation
Co-funded by the European Union
1
Copyright IKS Consortium
Page:
Part I: Foundations
(1) Introduction of Content Management
(2) Foundations of Semantic Web Technologies
Part II: Semantic Content Management
(3) Knowledge Interaction and Presentation
(4) Knowledge Representation and Reasoning
(5) Semantic Lifting
(6) Storing and Accessing Semantic Data
Part III: Methodologies
(7) Requirements Engineering for Semantic CMS
(8) Designing Semantic CMS
(9) Semantifying your CMS
(10) Designing Interactive Ubiquitous IS
www.iks-project.eu
Page: 3
What is this Lecture about?
We have learned ...
  ... which languages can be used to model knowledge.
  ... how to extract knowledge from content in an automatic way (semantic lifting).
We need a way ...
  ... to store the extracted knowledge technically in an accessible way.
Part II: Semantic Content Management
(3) Knowledge Interaction and Presentation
(4) Knowledge Representation and Reasoning
(5) Semantic Lifting
(6) Storing and Accessing Semantic Data
Page: 4
Outline
Semantic Data
  Semantic Web
  RDF
Semantic Data Storage
  Triple Stores
Semantic Data Access
  SPARQL
  RQL
  API Calls
Page: 5
Semantic Data
Stands for machine-understandable information
Allows computers to interpret the data without user intervention
Allows computers to act intelligently without being programmed for each task
Page: 6
Semantic Data
Applications find out subsequent information based on previous relations (e.g. Eiffel Tower -> Paris -> France)
Provides infrastructure to get practical results
Allows reasoning capabilities
  Enables extraction of related information that is not directly linked
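The Eiffel Tower example can be sketched in a few lines of Java. This is only an illustration of following relations step by step. The class name `LocationChain` and the single "located in" relation are assumptions made for this sketch, not part of any semantic framework.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: each key is "located in" its value.
class LocationChain {
    static final Map<String, String> LOCATED_IN = new LinkedHashMap<>();
    static {
        LOCATED_IN.put("Eiffel Tower", "Paris");
        LOCATED_IN.put("Paris", "France");
    }

    // Follows "located in" links from the start until no further link exists.
    static List<String> follow(String start) {
        List<String> path = new ArrayList<>();
        String current = start;
        // The contains() check stops the walk if the data ever has a cycle.
        while (current != null && !path.contains(current)) {
            path.add(current);
            current = LOCATED_IN.get(current);
        }
        return path;
    }

    public static void main(String[] args) {
        // France is never linked to the Eiffel Tower directly,
        // yet it is reached by following two relations.
        System.out.println(follow("Eiffel Tower"));
    }
}
```

The point of the sketch is that "France" is derived from the chain of relations, not stored next to "Eiffel Tower".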
Page: 7
Semantic Web
A classical generic description: "Web of data"
Extends the World Wide Web by encouraging:
  A common language for representing data, transformable to/from disparate sources such as relational databases, XML, etc. (RDF)
  Common reusable data models to represent data from different domains in common terms (RDFS, OWL, etc.)
  Rules that enable applications to reason over the information (SWRL)
Page: 8
Semantic Web Layer Cake
Semantic Web Layer Cake, Image source: http://www.w3.org/2007/03/layerCake.svg
Page: 9
Semantic Web
Many organizations publish their data in different domains
  Media
  Geographic
  Government
  …
The whole set contains approximately 30 billion triples
One of the largest collections is DBpedia
  Semantified version of Wikipedia
Example: Obtain the cities of China that have a population over 20 million
Needs efficient storage and querying of semantic data
Page: 10
Representation of Semantic Data
RDF
The common data format
An abstract model with several serialization formats
Consists of statements, referred to as triples, of the form (subject, predicate, object), where
  Subject: any resource identifier
  Predicate: a resource identifier of any property
  Object: either a resource identifier or a literal value
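The triple shape described above could be modelled as follows. This is a minimal sketch and not the API of any RDF library. The type names (`RdfNode`, `RdfResource`, `RdfLiteral`, `RdfTriple`) and the `http://example.org/` namespace are assumptions chosen for illustration.

```java
// A node is either a resource (identified by a URI) or a literal value.
interface RdfNode {}

record RdfResource(String uri) implements RdfNode {}

record RdfLiteral(String value) implements RdfNode {}

// Subject and predicate must be resources; the object may be either kind of node.
record RdfTriple(RdfResource subject, RdfResource predicate, RdfNode object) {}

class TripleDemo {
    static final String EX = "http://example.org/"; // hypothetical namespace

    // A triple whose object is another resource.
    static RdfTriple located() {
        return new RdfTriple(
                new RdfResource(EX + "EiffelTower"),
                new RdfResource(EX + "locatedIn"),
                new RdfResource(EX + "Paris"));
    }

    // A triple whose object is a literal value.
    static RdfTriple named() {
        return new RdfTriple(
                new RdfResource(EX + "EiffelTower"),
                new RdfResource(EX + "name"),
                new RdfLiteral("Eiffel Tower"));
    }

    public static void main(String[] args) {
        System.out.println(located());
        System.out.println(named());
    }
}
```

Encoding the resource/literal distinction in the types means a literal can never be used as a subject or predicate by mistake.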
Page: 11
Storing Semantic Data
Need for specialized designs for triple collections
Two modalities:
  Relational databases
  Triple stores
    Mostly used for storage
    Lots of implementations
    They can also be RDB-based
Page: 12
Triple Store
A purpose-built database for the storage and retrieval of RDF data
An optimized place to add, remove and query triples
Each triple in the triple store complies with the form (subject, predicate, object)
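The three operations named above (add, remove, query) can be sketched with a toy in-memory store. This is an assumption-laden illustration, not how any real triple store is implemented: `TinyTripleStore` and `Stmt` are hypothetical names, terms are plain strings, and matching is a linear scan with no indexes.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// One (subject, predicate, object) statement; all terms are plain strings here.
record Stmt(String subject, String predicate, String object) {}

class TinyTripleStore {
    // A set, so adding the same triple twice has no effect.
    private final Set<Stmt> triples = new LinkedHashSet<>();

    boolean add(String s, String p, String o) {
        return triples.add(new Stmt(s, p, o));
    }

    boolean remove(String s, String p, String o) {
        return triples.remove(new Stmt(s, p, o));
    }

    // Pattern query: a null argument acts as a wildcard for that position.
    List<Stmt> match(String s, String p, String o) {
        return triples.stream()
                .filter(t -> (s == null || s.equals(t.subject()))
                        && (p == null || p.equals(t.predicate()))
                        && (o == null || o.equals(t.object())))
                .collect(Collectors.toList());
    }

    int size() {
        return triples.size();
    }

    public static void main(String[] args) {
        TinyTripleStore store = new TinyTripleStore();
        store.add("ex:EiffelTower", "ex:locatedIn", "ex:Paris");
        store.add("ex:Paris", "ex:locatedIn", "ex:France");
        // All statements that use the ex:locatedIn predicate.
        System.out.println(store.match(null, "ex:locatedIn", null));
    }
}
```

A production store would replace the linear scan with indexes over the subject, predicate and object positions.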
Page: 13
Considering XML Databases
XML databases are existing storage systems for semi-structured data
Idea: Transform RDF to XML and store it in XML databases
Yet, the XML data model is not exactly the same as the model of semantic data
  The XML data model is a tree-like structure
  RDF data is represented as a graph without a hierarchy
Page: 14
Considering XML Databases
XML databases are not suitable for storing and querying RDF
Only simple manipulations can be handled through XML query languages
RDF Schema processing and inference are not possible
Standard RDF/XML mapping is unsuitable
Page: 15
Monolithic approach for DB-Based Triple Stores
Generic representation for all RDF schemas
Only two tables are used
  Resources table
  Triples table
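The two-table layout can be sketched in memory as follows. This is a simplified illustration under stated assumptions: `TwoTableStore` is a hypothetical class, the prefixed names used in `main` are placeholders, and literal object values are left out so that every object is a resource id.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TwoTableStore {
    // Resources table: id -> uri (ids start at 1), plus a reverse lookup.
    private final List<String> uriById = new ArrayList<>();
    private final Map<String, Integer> idByUri = new HashMap<>();
    // Triples table: each row is {predicateId, subjectId, objectId}.
    private final List<int[]> triples = new ArrayList<>();

    // Returns the id of a URI, inserting it into the resources table if needed.
    int idOf(String uri) {
        Integer id = idByUri.get(uri);
        if (id == null) {
            uriById.add(uri);
            id = uriById.size();
            idByUri.put(uri, id);
        }
        return id;
    }

    void add(String subject, String predicate, String object) {
        triples.add(new int[] { idOf(predicate), idOf(subject), idOf(object) });
    }

    // Joins the triples table with the resources table for one subject.
    List<String> describe(String subjectUri) {
        List<String> out = new ArrayList<>();
        Integer subjectId = idByUri.get(subjectUri);
        if (subjectId == null) {
            return out;
        }
        for (int[] row : triples) {
            if (row[1] == subjectId) {
                out.add(uriById.get(row[0] - 1) + " " + uriById.get(row[2] - 1));
            }
        }
        return out;
    }

    int resourceCount() {
        return uriById.size();
    }

    public static void main(String[] args) {
        TwoTableStore store = new TwoTableStore();
        store.add("ex:HotelDirections", "rdfs:subClassOf", "ex:Hotel");
        store.add("ex:HotelDirections", "rdf:type", "rdfs:Class");
        System.out.println(store.describe("ex:HotelDirections"));
    }
}
```

Each URI is stored once and referenced by id, so the triples table stays narrow no matter which RDF schema is loaded. The cost is a join back to the resources table on every read.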
Page: 16
Monolithic approach for DB-Based Triple Stores
Triples table:
predid  subid  objid  objvalue
6       2      1
5       3      7
5       1      8
5       9      2
3       9             Sunscale
Resources table:
id  uri
1   http://www.iks.og/topics.rdfs#Hotel
2   http://www.iks.og/topics.rdfs#HotelDirections
3   http://www.oclc.org/dublincore.rdfs#title
4   http://www.iks.og/schema.rdf#Ext.Resource
5   http://www.w3.org/1999/02/22-rdf-syntax-ns#type
6   http://www.w3.org/2000/01/rdf-schema#subClassOf
7   http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
8   http://www.w3.org/2000/01/rdf-schema#Class
9   rl
Page: 17
Triple Stores
Can be categorized into 3 categories:
  In-memory triple stores
    Used for certain operations like benchmarking, caching, etc.
  Native triple stores
    Provide their own storage implementations (Virtuoso, Mulgara, AllegroGraph, …)
  Non-memory, non-native triple stores
    Are built on third-party databases (Jena SDB, Kaon, …)
Page: 18
Functionalities provided by Triple Stores
RDBMS support
General RDF model access
Query language support in the store, such as RQL and SPARQL
Some stores provide:
  Provenance: tracking of who-said-what
  APIs for accessing the triple store over the network
Very few stores provide:
  Full-text search
  Inference and rule languages
Page: 19
Example Triple Store implementations
RDF Suite
  Sofia Alexaki, Vassilis Christophides, Gregory Karvounarakis, Dimitris Plexousakis, Karsten Tolle. The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases, SemWeb, 2001
  Based on an ORDBMS model
Sesame
  http://www.openrdf.org/
  Relational databases (MySQL, PostgreSQL, Oracle)
Jena
  http://www.hpl.hp.com/semweb/jena2.htm
  Relational databases (MySQL, PostgreSQL, Oracle)
Virtuoso
  http://virtuoso.openlinksw.com/
  Native RDF Quad Storage (Physical Quads)
Page: 20
RDFSuite (ICS-Forth)*
* IST-1999-13479 C-Web, IST-2000-26074 Mesmuses
Page: 21
How triples are stored and accessed in RDF Suite
Separate tables are created to store resources
  Properties, subClasses, subProperties and instances
Indices on attributes like URI, source and target
Querying is possible through RQL
Page: 22
How triples are stored and accessed in RDF Suite
[Figure from *]
*Sofia Alexaki, Vassilis Christophides, Gregory Karvounarakis, Dimitris Plexousakis, Karsten Tolle. The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases , SemWeb, 2001
Page: 23
Sesame Architecture
DBMS-independent API for accessing triple repositories
SAIL API
  A set of Java interfaces between other modules and the repository
  Abstracts from the actual storage mechanism
Query Module
  RQL support
Different ways to communicate with clients
  Through protocol handlers
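The idea of putting a set of Java interfaces between the modules and the repository can be sketched as below. All names here (`StorageLayer`, `InMemoryLayer`, `QueryModule`) are hypothetical and chosen for this illustration. They are not Sesame's actual SAIL interfaces, which are not reproduced in these slides.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical storage abstraction; NOT the real SAIL API.
interface StorageLayer {
    void addStatement(String subject, String predicate, String object);

    // A null argument matches anything in that position.
    List<String[]> getStatements(String subject, String predicate, String object);
}

// One possible backend: a plain in-memory list. A database-backed class
// could implement the same interface without any change to QueryModule.
class InMemoryLayer implements StorageLayer {
    private final List<String[]> rows = new ArrayList<>();

    public void addStatement(String s, String p, String o) {
        rows.add(new String[] { s, p, o });
    }

    public List<String[]> getStatements(String s, String p, String o) {
        List<String[]> result = new ArrayList<>();
        for (String[] r : rows) {
            if ((s == null || s.equals(r[0]))
                    && (p == null || p.equals(r[1]))
                    && (o == null || o.equals(r[2]))) {
                result.add(r);
            }
        }
        return result;
    }
}

// A module that only knows the interface, never the concrete storage.
class QueryModule {
    private final StorageLayer storage;

    QueryModule(StorageLayer storage) {
        this.storage = storage;
    }

    int countByPredicate(String predicate) {
        return storage.getStatements(null, predicate, null).size();
    }
}

class StorageLayerDemo {
    public static void main(String[] args) {
        StorageLayer layer = new InMemoryLayer();
        layer.addStatement("ex:a", "rdf:type", "ex:Hotel");
        layer.addStatement("ex:a", "ex:name", "Sunscale");
        System.out.println(new QueryModule(layer).countByPredicate("rdf:type"));
    }
}
```

Because `QueryModule` depends only on the interface, swapping the storage mechanism is a one-line change where the layer is constructed.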
*Jeen Broekstra and Arjohn Kampman and Frank van Harmelen, Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema, Proceedings of the First International
Semantic Web Conference, 2002
Page: 24
SAIL API over PostgreSQL
PostgreSQL
  Object-relational DBMS
  Supports sub-table relations between its tables for providing RDF Schema class and property subsumption
Individuals are represented under separate tables created for resources
Difficult to add table
Page: 25
SAIL API over MySQL
MySQL
  The database schema does not change when the RDFS changes
  Has an advantage where the RDFS is unstable
Page: 26
Jena2 Architecture
Page: 27
Jena2 Architecture
*Kevin Wilkinson, Craig Sayers, Harumi A. Kuno, Dave Reynolds: Efficient RDF Storage and Retrieval in Jena2, Proceedings of SWDB'03, The first International Workshop on
Semantic Web and Databases
Page: 28
Jena2
Denormalized schema
  Avoids unnecessary joins by merging URIs and literals into the statements table
Multiple statement tables
  Better locality and caching
Property Tables
Page: 29
Normalized vs Denormalized Tables
Page: 30
Property Tables
Triple Store Only
Subject   Property    Object
person1   name        Alice
person1   age         32
person1   twinOf      person2
person1   faxPhone    x1234
person1   adminPh     x5678
person2   name        Bob
person2   age         35
person2   adopteeOf   person6
person2   friendOf    person8
person2   gender      male
Person Property Table
ID   name    age   gender
p1   Alice   32    -
p2   Bob     35    male
Triple Store
Subject   Property    Object
person1   twinOf      person2
person1   faxPhone    x1234
person1   adminPh     x5678
person2   adopteeOf   person6
person2   friendOf    person8
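The split between a property table and the remaining triples can be sketched as follows. This is an illustration only and does not reproduce Jena2's implementation. `PropertyTableDemo` is a hypothetical class, and the sketch assumes that `name`, `age` and `gender` are single-valued, so each fits into one column.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PropertyTableDemo {
    // Properties that get their own column in the person property table.
    static final Set<String> COLUMNS = Set.of("name", "age", "gender");

    // Person property table: one row per subject, one entry per column.
    final Map<String, Map<String, String>> personTable = new LinkedHashMap<>();
    // Everything else stays in a generic (subject, property, object) table.
    final List<String[]> remainingTriples = new ArrayList<>();

    void add(String subject, String property, String object) {
        if (COLUMNS.contains(property)) {
            personTable.computeIfAbsent(subject, k -> new LinkedHashMap<>())
                    .put(property, object);
        } else {
            remainingTriples.add(new String[] { subject, property, object });
        }
    }

    // Loads the ten example triples shown in the tables above.
    static PropertyTableDemo sample() {
        PropertyTableDemo d = new PropertyTableDemo();
        d.add("person1", "name", "Alice");
        d.add("person1", "age", "32");
        d.add("person1", "twinOf", "person2");
        d.add("person1", "faxPhone", "x1234");
        d.add("person1", "adminPh", "x5678");
        d.add("person2", "name", "Bob");
        d.add("person2", "age", "35");
        d.add("person2", "adopteeOf", "person6");
        d.add("person2", "friendOf", "person8");
        d.add("person2", "gender", "male");
        return d;
    }

    public static void main(String[] args) {
        PropertyTableDemo d = sample();
        System.out.println(d.personTable);
        System.out.println(d.remainingTriples.size() + " triples remain in the generic table");
    }
}
```

Reading a person's name, age and gender then touches one row of the property table, where the triple-only layout needs one lookup per property.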
Page: 31
Jena Persistence Options
SDB
Scalable storage and query for RDF
Specifically designed for SPARQL support
Supports: MySQL, PostgreSQL, Oracle 11g, Microsoft SQL Server and IBM DB2
Scales to graphs of 100 million triples
Page: 32
Jena Persistence Options
TDB
Provides large-scale storage and querying of RDF datasets using a pure Java engine
Supports SPARQL
A non-transactional, faster database solution for use by a single system
It scales well beyond SDB and is simpler to set up
Page: 33
Virtuoso
General-purpose RDBMS with extensive RDF adaptations
RDF data is stored as RDF quads, i.e. (graph, subject, predicate, object) tuples, so it supports RDF with named graphs
  The columns are G for graph, S for subject, P for predicate and O for object
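A quad is a triple with one more column naming the graph it belongs to. The sketch below only illustrates that shape and is not Virtuoso's storage code. The `Quad` record, the `QuadDemo` class and the `ex:` names are assumptions for this example.

```java
import java.util.List;
import java.util.stream.Collectors;

// G, S, P, O: the graph column says which named graph a statement belongs to.
record Quad(String graph, String subject, String predicate, String object) {}

class QuadDemo {
    // Selects the statements of one named graph.
    static List<Quad> inGraph(List<Quad> quads, String graph) {
        return quads.stream()
                .filter(q -> q.graph().equals(graph))
                .collect(Collectors.toList());
    }

    // Hypothetical sample data spread over two named graphs.
    static List<Quad> sample() {
        return List.of(
                new Quad("ex:graphA", "ex:EiffelTower", "ex:locatedIn", "ex:Paris"),
                new Quad("ex:graphA", "ex:Paris", "ex:locatedIn", "ex:France"),
                new Quad("ex:graphB", "ex:Paris", "ex:name", "Paris"));
    }

    public static void main(String[] args) {
        System.out.println(inGraph(sample(), "ex:graphA"));
    }
}
```

Keeping the graph in every row lets a single table hold many named graphs while still allowing a query to be restricted to one of them.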
Page: 34
Querying Semantic Data
Semantic data can be queried from triple stores by
  Various query languages
    SPARQL
      Different endpoints provided
    RQL
    RDQL
    SeRQL
    …
  API calls
    Through proprietary APIs of different projects
  Linked Data
Page: 35
SPARQL
Is an RDF query language
Standardized by the W3C
Similar in concept to SQL for databases
  Syntactically resembles SQL
  Queries RDF graphs instead of databases
Page: 36
SPARQL Endpoints
Provide functionality to query the knowledge base via the SPARQL language
Accept queries and return results over the HTTP protocol
Query results can be in different formats such as
RDF
XML
HTML
JSON
CSV
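Sending a query to an endpoint over HTTP might look like the sketch below, which uses the JDK's `java.net.http` client. The query revisits the "cities of China" example. The DBpedia terms in it (`dbo:City`, `dbo:country`, `dbo:populationTotal`, `dbr:China`) are assumptions about how DBpedia models this data and may not match the live dataset. The class name `SparqlEndpointDemo` is hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class SparqlEndpointDemo {
    static final String ENDPOINT = "http://dbpedia.org/sparql";

    // Assumed vocabulary; adjust to the terms the endpoint actually uses.
    static final String QUERY =
            "PREFIX dbo: <http://dbpedia.org/ontology/> "
            + "PREFIX dbr: <http://dbpedia.org/resource/> "
            + "SELECT ?city ?population WHERE { "
            + "?city a dbo:City ; dbo:country dbr:China ; dbo:populationTotal ?population . "
            + "FILTER(?population > 20000000) }";

    // Builds a GET request that carries the query in the "query" parameter
    // and asks for JSON results through the Accept header.
    static HttpRequest buildRequest(String endpoint, String sparql) {
        String url = endpoint + "?query=" + URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/sparql-results+json")
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpResponse<String> response =
                client.send(buildRequest(ENDPOINT, QUERY), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

Changing the `Accept` header is how a client would ask for one of the other result formats, provided the endpoint supports it.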
Page: 37
Semantic Data Access With API Calls
Open source projects provide APIs to manipulate RDF data
Jena
Apache Clerezza
Sesame
JRDF
Page: 38
Jena
Jena provides a rich API to manipulate the RDF stored in the underlying triple store.
  Model to represent graphs
  CRUD methods for triples
  Querying methods for existing resources
See the next slide for the code snippet…
Page: 39
Jena Code Snippet
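// Package names are those of Apache Jena 3 and later; older releases use com.hp.hpl.jena.*
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.vocabulary.VCARD;

public class JenaSnippet {
    public static void main(String[] args) {
        String personURI = "http://somewhere/JohnSmith";
        String givenName = "John";
        String familyName = "Smith";
        String fullName = givenName + " " + familyName;
        // create an empty Model which represents an RDF graph
        Model model = ModelFactory.createDefaultModel();
        // create the resource which will produce the triples in the next slide;
        // the inner createResource() call without a URI creates a blank node
        Resource johnSmith
            = model.createResource(personURI)
                   .addProperty(VCARD.FN, fullName)
                   .addProperty(VCARD.N,
                                model.createResource()
                                     .addProperty(VCARD.Given, givenName)
                                     .addProperty(VCARD.Family, familyName));
    }
}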
Page: 40
Jena
Triples created with the code snippet in the previous slide:
(<http://somewhere/JohnSmith>, VCARD.FN, "John Smith")
(<http://somewhere/JohnSmith>, VCARD.N, _)
(_, VCARD.Given, "John")
(_, VCARD.Family, "Smith")
• Note that the _ symbol represents a blank node
Page: 41
Apache Clerezza
Provides an API that is independent of the different triple stores it supports
Its API provides a model to represent RDF graphs and manipulate those graphs
Also provides a SPARQL endpoint to query the stored knowledge
Page: 42
Apache Clerezza Code Snippet
Simple code snippet adding two triples to the graph:
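import org.apache.clerezza.rdf.core.LiteralFactory;
import org.apache.clerezza.rdf.core.MGraph;
import org.apache.clerezza.rdf.core.UriRef;
import org.apache.clerezza.rdf.core.impl.SimpleMGraph;
import org.apache.clerezza.rdf.core.impl.TripleImpl;

// The statements below belong inside a method.
// Vocabulary terms are written as full URIs so that no constant classes are assumed.
String base = "http://www.example.org#";
MGraph g = new SimpleMGraph();
g.add(new TripleImpl(
    new UriRef(base + "JohnSmith"),
    new UriRef("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"),
    new UriRef("http://xmlns.com/foaf/0.1/Person")));
g.add(new TripleImpl(
    new UriRef(base + "JohnSmith"),
    new UriRef("http://www.w3.org/2001/vcard-rdf/3.0#FN"),
    LiteralFactory.getInstance().createTypedLiteral("John")));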
Page: 43
Linked Data
Interrelated datasets on the Web that computers can explore
Has a standard format for access and management
Enables integration and reasoning over a huge amount of data on the Web
Page: 44
Linked Data
The four well-known principles of Linked Data, presented by Tim Berners-Lee:
  Use URIs as names of things
  Use HTTP URIs so that people can look up (dereference) those names
  When a URI is dereferenced, provide useful information using the standards (RDF, SPARQL)
  Provide links to other URIs to make discovery of related data possible
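Dereferencing an HTTP URI with a request for RDF can be sketched with the JDK's HTTP client. The resource URI `http://dbpedia.org/resource/Paris` and the class name `LinkedDataDeref` are assumptions for this example. Whether a given server answers with RDF/XML depends on its content negotiation support.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class LinkedDataDeref {
    // Builds a GET request for the thing's URI and asks for an RDF
    // serialization instead of an HTML page.
    static HttpRequest buildRequest(String resourceUri) {
        return HttpRequest.newBuilder(URI.create(resourceUri))
                .header("Accept", "application/rdf+xml")
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Linked Data servers often redirect from the URI of the thing
        // to a document describing it, so redirects are followed.
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpResponse<String> response = client.send(
                buildRequest("http://dbpedia.org/resource/Paris"),
                HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.headers().firstValue("Content-Type").orElse("(none)"));
    }
}
```

The returned description typically contains further HTTP URIs, which a client can dereference in the same way to discover related data.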
Page: 45
Linked Data
Page: 46
Linking Open Data Project
Is a W3C SWEO project
Aims to make data freely available to everyone
Aims to publish open data sets as RDF and set semantic relationships between them
  Serves information in a machine-readable format
  Enriches content
  Reduces duplication
The number of linked datasets is increasing rapidly
  A large number of datasets are already linked
Page: 47
Linked Datasets As of October 2008
Page: 48
Linked Datasets As of September 2010
Page: 49
2011
Page: 50
Access Data In The Cloud
Follow the RDF links representing the "things"
SPARQL endpoints
Ready-to-use software to discover linked data (see the next slide)
Page: 51
Linked Data Applications
Lots of applications on top of linked data
Just google
  RDF Browsers
    Tabulator
    Marbles
    Openlink RDF Browser
    …
  RDF Crawlers
Also see the following link containing a number of linked data applications:
http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData/Applications
Page: 52
Available SPARQL Endpoints
http://dbpedia.org/sparql
http://www4.wiwiss.fu-berlin.de/dblp/
To find SPARQL endpoints that provide a certain URI, see
http://void.rkbexplorer.com/endpoint-search/
See also a list of live SPARQL endpoints
http://www.w3.org/wiki/SparqlEndpoints
Page: 53
References
http://www.w3.org/TR/rdf-sparql-query
http://jena.sourceforge.net/tutorial/RDF_API/index.html
http://www.slideshare.net/ldodds/sparql-tutorial
http://www.slideshare.net/shamod/a-hands-on-overview-of-the-semanticweb?src=related_normal&rel=1702851
http://www.cambridgesemantics.com/2008/09/sparql-by-example
http://linkeddata-specs.info/
http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
http://www.bioontology.org/wiki/images/6/6a/Triple_Stores.pdf
Sofia Alexaki, Vassilis Christophides, Gregory Karvounarakis, Dimitris Plexousakis, Karsten Tolle. The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases, SemWeb, 2001
Jeen Broekstra, Arjohn Kampman and Frank van Harmelen. Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema, Proceedings of the First International Semantic Web Conference, 2002
Kevin Wilkinson, Craig Sayers, Harumi A. Kuno, Dave Reynolds. Efficient RDF Storage and Retrieval in Jena2, Proceedings of SWDB'03, The First International Workshop on Semantic Web and Databases
http://jena.sourceforge.net/DB/index.html
http://virtuoso.openlinksw.com/