Web semantization


Semantic Web infrastructure Trisolda
current state and perspectives
Filip Zavoral, Jiří Dokulil
SemWex - KSI MFF UK
http://www.ksi.mff.cuni.cz/semwex/

10. Mixer 26.11.2008

Semantic web vs. semantization

- Semantic web vision
  - Tim Berners-Lee: “The Semantic Web,” Scientific American, 2001
  - semantic research generously funded
  - 'hardly one has ever seen ...'
- New buzzwords
  - Web 2.0, Web 3.0, Social web, Web of data, Mashups, …
- Semantic web died?
  - no, not yet born
- Web Semantization

Semantic technologies

Browser HTML HTTP TCP/IP

Technical details

Semantic web services

Trisolda

- Motto: 'hardly one has ever seen ...' the semantic web
- Data from real life
  - incomplete, duplicated, inaccurate, more than 20 million triples
- Jena
  - very slow load; over 1 million triples → crash
- Sesame
  - unable to load more than 200 000 triples
  - exponential complexity for loading
- Where is a working platform for semantic web research?

- Technology background
  - Repository: data integration
  - DataPile

Trisolda

- Trisolda Architecture
  - Import interfaces
  - Repository
  - Querying & Executors

Repository

- Trisolda Repository
  - Stores incoming data
  - Retrieves results for queries
  - Stores used ontology
  - DataPile structure
    - holds data in any format
- Application server
  - Not all data and knowledge available when imported
    - the knowledge is not accurate
  - Background worker
    - inferencing
    - data unification
    - reasoner
  - Framework for plug-ins

Import

- Direct import
  - data in data sources
  - converters to the used ontology
- Crawling the wild Web
  - Egothor
    - web crawler
  - AgentMat
  - parsed pages stored
  - deductors deduce data and ontology
  - real-life data
    - incomplete, duplicated, inaccurate
- Import modes
  - batch insert
  - immediate insert

Querying

- Query API
  - Based on simple graph matching
    - query: a set of RDF triples with variables
    - result: a multiset of possible variable mappings, i.e. a relation
  - Not another SQL-like language
    - a set of C++ classes and operators
  - Query evaluation
    - levels of support by query engines
- Query environments
  - present outputs
  - examples: repository browser, RDF visualizer
  - semantic executors
    - service composition: conductors
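The matching idea above can be sketched in a few lines of Python. This is only an illustration, not Trisolda's C++ query API: the function `match`, the `?`-prefix convention for variables, and the node names `ex:p1` and `ex:p2` are assumptions made for the example. A query is a list of triple patterns, and the result is a list (multiset) of variable mappings.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    """Variables are written with a leading '?', as on the slides."""
    return term.startswith("?")


def match(data: List[Triple], query: List[Triple]) -> List[Binding]:
    """Return every variable mapping under which all query triples occur in data."""
    results: List[Binding] = [{}]  # start with a single empty mapping
    for pattern in query:
        extended: List[Binding] = []
        for binding in results:
            for triple in data:
                candidate = dict(binding)
                ok = True
                for p_term, d_term in zip(pattern, triple):
                    if is_var(p_term):
                        # bind the variable, or check it against an earlier binding
                        if candidate.setdefault(p_term, d_term) != d_term:
                            ok = False
                            break
                    elif p_term != d_term:
                        ok = False
                        break
                if ok:
                    extended.append(candidate)
        results = extended
    return results


# Hypothetical sample data
data = [
    ("ex:p1", "ex:firstname", "John"),
    ("ex:p1", "ex:lastname", "Smith"),
    ("ex:p2", "ex:firstname", "Jane"),
    ("ex:p2", "ex:lastname", "Doe"),
]
query = [
    ("?person", "ex:firstname", "?firstname"),
    ("?person", "ex:lastname", "?lastname"),
]
rows = match(data, query)
for row in rows:
    print(row)
```

Each returned mapping is one row of the resulting relation, with the variables acting as column names.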

AgentMat - data semantization framework

AgentMat - data extraction

Future work

- Conclusions
  - working infrastructure
    - currently not working: re-deployment, AgentMat & TriQ integration
  - gathering, storing and querying of semantic data
  - platform for research and experiments
- Future work & long-term goals
  - specialized semantic data storage
  - semantic acquisition, data semantization
  - interface-based loosely coupled network of Semantic Web repositories
  - semantic computing, services, composition, executors ...

Selected Publications

- Beňo, Míšek, Zavoral: AgentMat: Framework for Data Scraping and Semantization, 3rd International Conference on Research Challenges in Information Science, IEEE, 2009
- Dokulil, Yaghob, Zavoral: Trisolda: The Environment for Semantic Data Processing, International Journal On Advances in Software, IARIA, 2009
- Podzimek, Dokulil, Yaghob, Zavoral: Mám hlad: pomůže mi Sémantický web? [I'm hungry: will the Semantic Web help me?], Informačné technológie - Aplikácia a Teória, ITAT 2008
- Dokulil, Tykal, Yaghob, Zavoral: Semantic Web Repository And Interfaces, International Conference on Advances in Semantic Processing, SEMAPRO 2007, IEEE Computer Society Press (Best Paper Award)
- Dokulil, Tykal, Yaghob, Zavoral: Semantic Web Infrastructure, IEEE International Conference on Semantic Computing ICSC, IEEE Computer Society Press, 2007
- Yaghob, Zavoral: Semantic Web Infrastructure using DataPile, Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, Hong Kong, IEEE Computer Society Press, 2006

Web (3TB of *.cz) Crawler Galamboš

Dědek Zavoral, Eckhardt Dokulil Mlýnková, Nečaský

http://ksi.mff.cuni.cz/semwex

PART II
Tables in RDF querying: do we really need them?

SPARQL

- syntax
  - SQL-like at first look
  - “simple language” but complex grammar
    - {?x ?y ?z . OPTIONAL { ?a ?b ?c . } . ?k ?l ?m . }
    - {?x ?y ?z OPTIONAL { ?a ?b ?c } ?k ?l ?m }

SPARQL

- semantics
  - a lot of changes, now stable
  - based on algebra
    - works with sets of variable mappings, i.e. tables
  - very different from SQL
  - “closed”
    - no compositionality

SPARQL

- RDF is a graph
- SPARQL provides pattern (subgraph) matching, no other graph handling
  - SPARQL handles only fixed-size graphs
  - RDFS supports arbitrary hierarchy of classes
- SPARQL has no aggregate functions, no “group by”
- no constructors
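To illustrate the fixed-size limitation described on this slide: a pattern made of k triples can follow at most k `rdfs:subClassOf` edges, while a class hierarchy may be arbitrarily deep. The Python sketch below walks the hierarchy iteratively instead. The class names and the helper `superclasses` are invented for the example.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def superclasses(data: List[Triple], cls: str) -> Set[str]:
    """All direct and indirect superclasses of cls, at any depth."""
    found: Set[str] = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in data:
            # follow one more subClassOf edge; 'found' also guards against cycles
            if s == current and p == "rdfs:subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


# Hypothetical three-level hierarchy
hierarchy = [
    ("ex:Student", "rdfs:subClassOf", "ex:Person"),
    ("ex:Person", "rdfs:subClassOf", "ex:Agent"),
    ("ex:Agent", "rdfs:subClassOf", "ex:Thing"),
]
print(sorted(superclasses(hierarchy, "ex:Student")))
```

The loop runs until no new class is reached, so the depth of the hierarchy does not have to be known when the query is written.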

Seasoned SQL developer


Idea… ?

- make the language SQL-like inside, not just outside
  - joins, selection, projection, grouping, aggregation
- relational algebra works with relations, i.e. sets of tuples; the database is made of relations
- RDF data is made of… RDF graphs
  - maybe we should work with RDF graphs

Tables – Graphs

John  Smith
John  Doe
Jane  Doe
Bill  Jackson

Basic pattern

?person ex:firstname ?firstname
?person ex:lastname ?lastname

- variables -> “columns”
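The correspondence between the table and the graph can be sketched in Python as a round trip: each table row becomes a node with an `ex:firstname` and an `ex:lastname` edge, and evaluating the basic pattern turns the variables back into columns. The node identifiers `ex:p0`, `ex:p1`, … and both function names are assumptions for illustration, and the sketch assumes each property has a single value per node.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

table = [("John", "Smith"), ("John", "Doe"), ("Jane", "Doe"), ("Bill", "Jackson")]


def table_to_graph(rows: List[Tuple[str, str]]) -> List[Triple]:
    """One node per row, one labeled edge per column value."""
    triples: List[Triple] = []
    for i, (first, last) in enumerate(rows):
        node = f"ex:p{i}"  # hypothetical node identifier
        triples.append((node, "ex:firstname", first))
        triples.append((node, "ex:lastname", last))
    return triples


def graph_to_table(triples: List[Triple]) -> List[Tuple[str, str]]:
    """Evaluate the basic pattern: ?firstname and ?lastname become columns."""
    first: Dict[str, str] = {}
    last: Dict[str, str] = {}
    for s, p, o in triples:
        if p == "ex:firstname":
            first[s] = o
        elif p == "ex:lastname":
            last[s] = o
    # join on ?person; keep only nodes that have both properties
    return [(first[n], last[n]) for n in first if n in last]


graph = table_to_graph(table)
print(graph_to_table(graph) == table)
```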

Further operations

- selection, joins, aggregation, projection
- group by

Multiple values

ex:john ex:mail [email protected]
ex:john ex:mail [email protected]

Local and global aggregations

- more values in one “column”
  - maximal number of mails
  - total count of mails
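One possible reading of the two aggregations, sketched in Python: a local aggregation counts the values inside each node's multi-valued “column”, and the global aggregations then work over all nodes. The addresses and the node `ex:jane` are placeholders invented for the example, not data from the slides.

```python
from collections import Counter
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical data: placeholder addresses only
data: List[Triple] = [
    ("ex:john", "ex:mail", "mailto:john@example.org"),
    ("ex:john", "ex:mail", "mailto:j.smith@example.org"),
    ("ex:jane", "ex:mail", "mailto:jane@example.org"),
]

# Local aggregation: number of ex:mail values attached to each node
mails_per_person = Counter(s for s, p, _ in data if p == "ex:mail")

# Global aggregations over all nodes
max_mails = max(mails_per_person.values(), default=0)  # maximal number of mails
total_mails = sum(mails_per_person.values())           # total count of mails

print(mails_per_person["ex:john"], max_mails, total_mails)
```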

What’s more?

- optional parts of the graph
- regular expressions
- textual representation (language)

Conclusion

- current state is bad
- try something different
- ?

PART III
Let’s have a look: RDF visualizer

RDF

- subject: the thing we are describing
- predicate: the property of the thing
- object: the value of the property
- a graph (directed, labeled)
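A small Python sketch of why a set of triples is a directed, labeled graph: every triple is one edge from the subject to the object, labeled with the predicate. The predicate `ex:knows` and the sample values are invented for the example.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical sample triples
triples: List[Triple] = [
    ("ex:john", "ex:firstname", "John"),
    ("ex:john", "ex:lastname", "Smith"),
    ("ex:john", "ex:knows", "ex:jane"),
]

# Adjacency list: subject -> list of (edge label, target node)
graph: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)
for subject, predicate, obj in triples:
    graph[subject].append((predicate, obj))

for label, target in graph["ex:john"]:
    print(f"ex:john --{label}--> {target}")
```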

Visualization

- triangle layout
  - layered drawing for trees
- node merging
  - more information for a node
- navigation
  - the way to handle huge data

Let’s have a look

A picture is worth a thousand words…