RDF languages and storages
part 1 - expressiveness
Maciej Janik
Conrad Ibanez
CSCI 8350, Fall 2004
Outline





Comparison of RDF languages
RQL
Sesame implementation
SquishQL - basis for RDQL
Redland store
Sesame


Web-based architecture
Persistent RDF store





use of traditional DBMS
use of dedicated RDF triple storage
Database independent
Scalable architecture
Query engine that implements RQL
Sesame - architecture


Written in Java
Modules:






HTTP/SOAP handler
Admin module
Query module
Export module
Repository
Abstraction Layer
Use of PostgreSQL
Sesame - modules

Admin module



incrementally add RDF/RDFS
clearing repository
schema operations




recognise ‘type’, ‘subClassOf’, ‘subPropertyOf’
consistency checking
adding inferred facts to repository
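
A minimal sketch of what "adding inferred facts" could look like, assuming triples are plain Python tuples and only the three schema properties named above are handled. The function name `infer` and the shortened property names are illustrative assumptions, not Sesame's API.

```python
def infer(triples):
    """Return the input triples plus facts entailed by a small subset of RDFS rules."""
    facts = set(triples)
    changed = True
    while changed:  # repeat until no rule produces a new fact (fixpoint)
        changed = False
        new = set()
        for s, p, o in facts:
            for s2, p2, o2 in facts:
                # subClassOf and subPropertyOf are transitive
                if p == p2 and p in ("subClassOf", "subPropertyOf") and o == s2:
                    new.add((s, p, o2))
                # an instance of a class is also an instance of its superclasses
                if p == "type" and p2 == "subClassOf" and o == s2:
                    new.add((s, "type", o2))
                # a statement using a property also holds for its superproperties
                if p2 == "subPropertyOf" and p == s2:
                    new.add((s, o2, o))
        if not new <= facts:
            facts |= new
            changed = True
    return facts


data = {
    ("Dog", "subClassOf", "Mammal"),
    ("Mammal", "subClassOf", "Animal"),
    ("rex", "type", "Dog"),
    ("hasMother", "subPropertyOf", "hasParent"),
    ("rex", "hasMother", "bella"),
}
closed = infer(data)
```

The nested loop is quadratic per pass and is only meant to show the idea; a real store would drive these rules from indexes or from the database.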
RDF Export module

export RDF to standard XML-serialized format
Sesame - modules

Query module





query plan and optimizer similar to those of established DB solutions
query is translated into a set of simple RAL calls
each leaf of the query plan can ‘evaluate itself’ and pull data from the RAL
data are returned as streams
lack of optimization at the storage level
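
The pull-based evaluation described above can be sketched with Python generators. All class and method names here (`ListRAL`, `PatternLeaf`, `JoinOnSubject`, `get_statements`) are hypothetical and only illustrate the idea of leaves that evaluate themselves against an abstraction layer and return streams.

```python
class ListRAL:
    """Hypothetical stand-in for a repository abstraction layer over a list."""

    def __init__(self, triples):
        self._triples = list(triples)

    def get_statements(self, s=None, p=None, o=None):
        # None acts as a wildcard; matches are yielded one at a time (a stream)
        for t in self._triples:
            if all(want is None or want == got for want, got in zip((s, p, o), t)):
                yield t


class PatternLeaf:
    """Leaf of a query plan: one triple pattern that evaluates itself via the RAL."""

    def __init__(self, ral, s=None, p=None, o=None):
        self.ral = ral
        self.pattern = (s, p, o)

    def evaluate(self):
        return self.ral.get_statements(*self.pattern)


class JoinOnSubject:
    """Inner node: keeps left results whose subject also occurs in the right stream."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def evaluate(self):
        right_subjects = {s for s, _, _ in self.right.evaluate()}
        for t in self.left.evaluate():
            if t[0] in right_subjects:
                yield t


ral = ListRAL([
    ("alice", "type", "Person"),
    ("alice", "name", "Alice"),
    ("acme", "type", "Company"),
    ("acme", "name", "ACME"),
])
# names of all resources that are of type Person
plan = JoinOnSubject(PatternLeaf(ral, p="name"), PatternLeaf(ral, p="type", o="Person"))
names = list(plan.evaluate())
```

Because every leaf only sees the generic `get_statements` call, nothing in the plan can exploit storage-specific features, which matches the last bullet above.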
Sesame - modules

RAL - Repository Abstraction Layer







makes Sesame storage independent
API supports RDF Schema semantics (e.g. subsumption reasoning)
can be stacked one on top of another
interface oriented toward persistent storage (DBMS, Object-Relational DB)
data returned as streams
can even use net-based RDF services (!)
Due to poor performance, a cache is implemented as one of the RALs
cache mainly for RDFS, as it needs code support in reasoning (subClassOf, ...)
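
A minimal sketch of stacking, assuming every layer exposes the same single method. `MemoryRAL`, `CachingRAL` and `get_statements` are hypothetical names chosen for illustration, not the actual Sesame interfaces.

```python
class MemoryRAL:
    """Hypothetical bottom layer: stands in for a DBMS-backed store."""

    def __init__(self, triples):
        self._triples = list(triples)
        self.calls = 0  # counts how often the 'slow' store is actually read

    def get_statements(self, s=None, p=None, o=None):
        self.calls += 1
        for t in self._triples:
            if all(want is None or want == got for want, got in zip((s, p, o), t)):
                yield t


class CachingRAL:
    """Layer with the same interface, stacked on top of any other RAL."""

    def __init__(self, below):
        self._below = below
        self._cache = {}

    def get_statements(self, s=None, p=None, o=None):
        key = (s, p, o)
        if key not in self._cache:
            # materialise the stream once so that it can be replayed later
            self._cache[key] = list(self._below.get_statements(s, p, o))
        return iter(self._cache[key])


store = MemoryRAL([("Dog", "subClassOf", "Mammal"), ("rex", "type", "Dog")])
ral = CachingRAL(store)
first = list(ral.get_statements(p="subClassOf"))
second = list(ral.get_statements(p="subClassOf"))  # served from the cache
```

The sketch ignores invalidation: a real caching layer would have to drop or update entries whenever statements are added or removed.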
Sesame - issues



Due to portability (RAL), cannot optimize for the underlying data storage
Incremental (schema) uploads are slow due to table rebuilding in PostgreSQL
Scaled up to 400,000 statements (RDF from WordNet)
very loosely connected graph
took 94 minutes (71 statements per second)
Slow upload of new data due to the many required database operations
Queries run slowly for the same reasons
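
The quoted upload rate follows directly from the two figures above; the variable names below are illustrative.

```python
statements = 400_000      # size of the WordNet upload
minutes = 94              # reported upload time
rate = statements / (minutes * 60)  # statements per second, roughly 70.9
```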
Redland, Rasqal, Raptor





Storage for RDF triples - does not implement any query language by itself
This is the main module to include in an RDF manipulation system
Implemented in pure C for portability
Rich API makes it possible to build modules on top of it
Rasqal - RDF query module



RDQL
SPARQL
Raptor - a fast RDF parser
Redland



Triple: Subject - Predicate - Object
API enables retrieval of triples
Highly optimized for performance

Indexes





SP2O - get target
PO2S - get source
SO2P - get relations between nodes
P2SO - get nodes in relation
S2P - get relations for subject
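
The five indexes can be sketched as hash maps keyed by the known parts of a triple. This is an illustration in Python with hypothetical names (`IndexedTriples` and its methods), not Redland's C implementation.

```python
from collections import defaultdict


class IndexedTriples:
    """Toy triple store that maintains the five lookup tables listed above."""

    def __init__(self):
        self.sp2o = defaultdict(set)  # (subject, predicate) -> objects
        self.po2s = defaultdict(set)  # (predicate, object)  -> subjects
        self.so2p = defaultdict(set)  # (subject, object)    -> predicates
        self.p2so = defaultdict(set)  # predicate            -> (subject, object) pairs
        self.s2p = defaultdict(set)   # subject              -> predicates

    def add(self, s, p, o):
        # every triple is written into all five indexes
        self.sp2o[(s, p)].add(o)
        self.po2s[(p, o)].add(s)
        self.so2p[(s, o)].add(p)
        self.p2so[p].add((s, o))
        self.s2p[s].add(p)

    def targets(self, s, p):
        return set(self.sp2o.get((s, p), ()))

    def sources(self, p, o):
        return set(self.po2s.get((p, o), ()))

    def relations_between(self, s, o):
        return set(self.so2p.get((s, o), ()))

    def nodes_in_relation(self, p):
        return set(self.p2so.get(p, ()))

    def relations_for_subject(self, s):
        return set(self.s2p.get(s, ()))


store = IndexedTriples()
store.add("rex", "type", "Dog")
store.add("rex", "owner", "alice")
store.add("fido", "type", "Dog")
```

Each lookup is a single hash access, at the price of storing every triple five times.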
Redland - RDF Model stores

Memory based
memory - double-linked list, small models, basic indexes on triples
hashes - bdb memory - native storage with BDB hashes, no persistence
hashes - memory
Persistent
hashes with BDB - BDB hashes on disk, native storage, scales to low millions of tuples
3store - triplestore from AKT project, not well supported
mysql - uses MySQL DB
Redland - class diagram

Efficient implementation of triple in memory
use of pointers
URI value separated
Strict memory management - no leaks
Abstraction of model to support different storages
Fast parser / serializer
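
One way to read "use of pointers" and "URI value separated" is that each URI string is stored once and triples only hold references to it. The sketch below illustrates that reading in Python; `Node` and `NodePool` are hypothetical names and do not mirror Redland's actual classes.

```python
class Node:
    """Holds one URI value; triples refer to Node objects, not to copies of the string."""

    __slots__ = ("uri",)

    def __init__(self, uri):
        self.uri = uri


class NodePool:
    """Hands out exactly one shared Node per distinct URI."""

    def __init__(self):
        self._nodes = {}

    def get(self, uri):
        node = self._nodes.get(uri)
        if node is None:
            node = self._nodes[uri] = Node(uri)
        return node

    def __len__(self):
        return len(self._nodes)


pool = NodePool()
t1 = (pool.get("ex:rex"), pool.get("rdf:type"), pool.get("ex:Dog"))
t2 = (pool.get("ex:fido"), pool.get("rdf:type"), pool.get("ex:Dog"))
```

Both triples share the same `rdf:type` and `ex:Dog` objects, so the pool holds four nodes for six triple positions.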
Redland

API available in different languages
C, C#, Java, Perl, Python, PHP, Ruby, Tcl
API for manipulating
triples, URI/literals, graphs
Portable - can be built on most OSes
Scalable to handle millions of triples
when using persistent storage
but indexing is very space-consuming
Support for context and hierarchy of models
RDF languages and storages
part 2 - indexing semi-structured data
Maciej Janik
Conrad Ibanez
CSCI 8350, Fall 2004