Web semantization
Semantic Web infrastructure Trisolda
current state and perspectives
Filip Zavoral, Jiří Dokulil
SemWex - KSI MFF UK
http://www.ksi.mff.cuni.cz/semwex/
10. Mixer 26.11.2008
Semantic web vs. semantization
Semantic web vision
Tim Berners-Lee, “The Semantic Web,” Scientific American, 2001
semantic research generously funded
'hardly one has ever seen ...'
New buzzwords: Web 2.0, Web 3.0, Social web, Web of data, Mashups, …
Semantic web died? No, not yet born.
Web Semantization
Semantic technologies
Browser HTML HTTP TCP/IP
Technical details
Semantic web services
Trisolda
Motto: 'hardly one has ever seen ...' the semantic web
data from real life: incomplete, duplicated, inaccurate, >20 million triples
Jena: very slow load; more than 1 million triples → crash
Sesame: unable to load more than 200 000 triples
exponential complexity for loading
where is a working platform for semantic web research?
Technology background Repository – data integration DataPile
Trisolda
Trisolda Architecture Import interfaces Repository Querying & Executors
Repository
Trisolda Repository
Stores incoming data
Retrieves results for queries
Stores the used ontology
DataPile structure holds data in any format
Application server
Not all data and knowledge are available at import time; the knowledge is not accurate
Background worker: inferencing, data unification, reasoner
Framework for plug-ins
Import
Direct import: data in data sources, converters to the used ontology
Crawling the wild Web: Egothor web crawler, AgentMat
parsed pages stored
deductors deduce data and ontology
real-life data: incomplete, duplicated, inaccurate
Import modes: batch insert, immediate insert
Querying
Query API
Based on simple graph matching
query: set of RDF triples with variables
result: multiset of possible variable mappings – a relation
Not another SQL-like language: a set of C++ classes and operators
Query evaluation: levels of support by query engines
Query environments: present outputs
examples: repository browser, RDF visualizer, semantic executors, service composition - conductors
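The matching described on this slide can be illustrated with a minimal sketch. It is written in Python for brevity and is not the Trisolda C++ API; the function names, the `?` prefix for variables and the sample triples are all assumptions made for illustration. A query is a list of triple patterns, and the result is a list (a multiset) of variable mappings.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    # Assumption: variables are written with a leading "?".
    return term.startswith("?")


def match_triple(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return extended


def evaluate(query: List[Triple], data: List[Triple]) -> List[Binding]:
    """Return every variable mapping under which all patterns occur in `data`."""
    results: List[Binding] = [{}]
    for pattern in query:
        next_results: List[Binding] = []
        for binding in results:
            for triple in data:
                extended = match_triple(pattern, triple, binding)
                if extended is not None:
                    next_results.append(extended)
        results = next_results
    return results


# Hypothetical sample data and query.
data = [
    ("ex:p1", "ex:firstname", "John"),
    ("ex:p1", "ex:lastname", "Smith"),
    ("ex:p2", "ex:firstname", "John"),
    ("ex:p2", "ex:lastname", "Doe"),
]
query = [
    ("?person", "ex:firstname", "?firstname"),
    ("?person", "ex:lastname", "?lastname"),
]
rows = evaluate(query, data)
firstnames = [row["?firstname"] for row in rows]
```

Each mapping in `rows` is one row of a relation whose columns are the variables. Keeping only `?firstname` yields "John" twice, which is why the result is a multiset rather than a set.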
AgentMat - data semantization framework
AgentMat - data extraction
Future work
Conclusions
working infrastructure
currently not working: re-deployment, AgentMat & TriQ integration
gathering, storing and querying of semantic data
platform for research and experiments
Future work & long-term goals
specialized semantic data storage
semantic acquisition, data semantization
interface-based loosely coupled network of Semantic Web repositories
semantic computing, services, composition, executors ...
Selected Publications
Beňo, Míšek, Zavoral: AgentMat: Framework for Data Scraping and Semantization, 3rd International Conference on Research Challenges in Information Science, IEEE, 2009
Dokulil, Yaghob, Zavoral: Trisolda: The Environment for Semantic Data Processing, International Journal On Advances in Software, IARIA, 2009
Podzimek, Dokulil, Yaghob, Zavoral: Mám hlad: pomůže mi Sémantický web? [I Am Hungry: Will the Semantic Web Help Me?], Informačné technológie - Aplikácia a Teória, ITAT 2008
Dokulil, Tykal, Yaghob, Zavoral: Semantic Web Repository And Interfaces, International Conference on Advances in Semantic Processing, SEMAPRO 2007, IEEE Computer Society Press - Best Paper Award
Dokulil, Tykal, Yaghob, Zavoral: Semantic Web Infrastructure, IEEE International Conference on Semantic Computing ICSC, IEEE Computer Society Press 2007
Yaghob, Zavoral: Semantic Web Infrastructure using DataPile, Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, Hong Kong, IEEE Computer Society Press 2006
Web (3TB of *.cz) Crawler Galamboš
Dědek Zavoral, Eckhardt Dokulil Mlýnková, Nečaský
http://ksi.mff.cuni.cz/semwex
PART II: Tables in RDF querying – do we really need them?
SPARQL
syntax
SQL-like – at first look
“simple language” but complex grammar
{ ?x ?y ?z . OPTIONAL { ?a ?b ?c . } . ?k ?l ?m . }
{ ?x ?y ?z OPTIONAL { ?a ?b ?c } ?k ?l ?m }
SPARQL
semantics
lot of changes – now stable
based on algebra
works with sets of variable mappings – i.e. tables
very different from SQL
“closed”
no compositionality
SPARQL
RDF is a graph
SPARQL provides pattern (subgraph) matching – no other graph handling
SPARQL handles only fixed-size graphs; RDFS supports an arbitrary hierarchy of classes
SPARQL has no aggregate functions, no “group by”
no constructors
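The fixed-size limitation can be made concrete with a small sketch. A pattern with a fixed number of triples follows a fixed number of `rdfs:subClassOf` edges, while the depth of a class hierarchy is not known in advance, so collecting all superclasses needs a traversal. The class names and the helper `superclasses` below are hypothetical, chosen only for illustration.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def superclasses(cls: str, data: List[Triple]) -> Set[str]:
    """Collect all direct and indirect superclasses of `cls`."""
    found: Set[str] = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in data:
            # Follow one more subClassOf edge from the current class.
            if s == current and p == "rdfs:subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


# Hypothetical three-level hierarchy.
hierarchy = [
    ("ex:Student", "rdfs:subClassOf", "ex:Person"),
    ("ex:Person", "rdfs:subClassOf", "ex:Agent"),
    ("ex:Agent", "rdfs:subClassOf", "ex:Thing"),
]
all_supers = superclasses("ex:Student", hierarchy)
```

A two-triple pattern would reach `ex:Agent` at most; the loop keeps going until no new class appears, however deep the hierarchy is.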
Seasoned SQL developer
Idea… ?
make the language SQL-like inside, not just outside
joins, selection, projection, grouping, aggregation
relational algebra works with relations, i.e. sets of tuples; the database is made of relations
RDF data is made of… RDF graphs
maybe we should work with RDF graphs
Tables – Graphs
Table:
John  Smith
John  Doe
Jane  Doe
Bill  Jackson
Graph (node labels): John, John, Smith, Doe, Jane, Doe, Bill, Jackson
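Reading the names above as four (first name, last name) rows, which is an assumption about the slide layout, the step from a table to a graph can be sketched as follows. The predicates `ex:firstname` and `ex:lastname` are taken from the pattern slide; the node identifiers `ex:person1` and so on are hypothetical.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# The four rows of the table, as read from the slide.
table_rows = [
    ("John", "Smith"),
    ("John", "Doe"),
    ("Jane", "Doe"),
    ("Bill", "Jackson"),
]


def rows_to_triples(rows: List[Tuple[str, str]]) -> List[Triple]:
    """Turn each table row into one node with two outgoing labeled edges."""
    triples: List[Triple] = []
    for i, (first, last) in enumerate(rows, start=1):
        person = f"ex:person{i}"  # hypothetical node identifier
        triples.append((person, "ex:firstname", first))
        triples.append((person, "ex:lastname", last))
    return triples


graph_triples = rows_to_triples(table_rows)
```

Every row becomes a small subgraph, so a table of four rows corresponds to a graph with four person nodes and eight edges.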
Basic pattern
?person ex:firstname ?firstname
?person ex:lastname ?lastname
variables -> “columns”
Further operations
selection, joins, aggregation, projection, group by
ex:john
Multiple values
ex:mail ex:mail [email protected]
Local and global aggregations
more values in one “column”
maximal number of mails
total count of mails
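One possible reading of the two kinds of aggregation is sketched below: a local aggregation works inside one result (counting the mails attached to one person), a global aggregation works across all results (the maximal number and the total count). The addresses and the helper name are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical data: ex:john has two mails, ex:jane has one.
mail_triples = [
    ("ex:john", "ex:mail", "john@example.org"),
    ("ex:john", "ex:mail", "j.doe@example.org"),
    ("ex:jane", "ex:mail", "jane@example.org"),
]


def mails_by_person(triples: List[Triple]) -> Dict[str, List[str]]:
    """Group ex:mail values by subject: several values in one "column"."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for s, p, o in triples:
        if p == "ex:mail":
            grouped[s].append(o)
    return dict(grouped)


grouped = mails_by_person(mail_triples)

# Local aggregation: one number per person.
mail_counts = {person: len(mails) for person, mails in grouped.items()}

# Global aggregations: one number for the whole result.
max_mails = max(mail_counts.values())
total_mails = sum(mail_counts.values())
```

Here `mail_counts` keeps one row per person, while `max_mails` and `total_mails` collapse all rows into a single value.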
What’s more?
optional parts of the graph
regular expressions
textual representation (language)
Conclusion
current state is bad
try something different?
PART III: Let’s have a look – RDF visualizer
RDF
subject – the thing we are describing
predicate – the property of the thing
object – the value of the property
a graph (directed, labeled)
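As a small illustration of why a set of triples is a directed, labeled graph: each triple is an edge from the subject to the object, labeled with the predicate. The `Triple` type, the `to_graph` helper and the sample values are assumptions for this sketch, not part of any RDF library.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Triple(NamedTuple):
    subject: str    # the thing we are describing
    predicate: str  # the property of the thing
    object: str     # the value of the property


def to_graph(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Adjacency list: subject -> list of (edge label, target node)."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for t in triples:
        graph[t.subject].append((t.predicate, t.object))
    return dict(graph)


# Hypothetical sample triples.
sample = [
    Triple("ex:john", "ex:firstname", "John"),
    Triple("ex:john", "ex:lastname", "Smith"),
]
adjacency = to_graph(sample)
```

The adjacency list is the kind of structure a visualizer would walk when drawing a node together with its outgoing edges.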
Visualization
triangle layout
layered drawing for trees
node merging
more information for a node
navigation
the way to handle huge data
Let’s have a look
A picture is worth a thousand words…