KYOTO (ICT-211423)
Knowledge Yielding Ontologies for Transition-Based Organization
FP7: Intelligent Content and Semantics
http://www.kyoto-project.eu/
Piek Vossen, VU University Amsterdam
FP7, Information Day Call 5, Luxembourg, May 11-12, 2009

Project goals
• Open platform for knowledge sharing across languages and cultures
  – Wiki environment that allows people in the field to maintain their knowledge and agree on meaning without knowledge engineering skills
  – Bootstrap this knowledge through open text mining & concept learning
  – Enables knowledge transition and information search across different target groups, crossing linguistic, cultural and geographic boundaries
  – Enables deep semantic search for facts and knowledge
• Free, open source license (GPL)

Scope
• Languages:
  – English, Dutch, Italian, Spanish, Basque, Chinese, Japanese
• Domain:
  – Environmental domain, BUT usable in any domain
• Global:
  – Both European and non-European languages
• Available:
  – Free: as open source system and data (GPL)
• Future perspective:
  – Content standardization that supports worldwide communication

KYOTO (ICT-211423)
• Funded:
  – 7th Framework Programme (ICT) of the European Union: Intelligent Content and Semantics
  – Taiwan and Japan funded by national grants
• STREP project: research & development
• Duration:
  – March 2008 – March 2011
• Effort:
  – 364 person months of work

Consortium
1. Vrije Universiteit Amsterdam (Amsterdam, The Netherlands)
2. Consiglio Nazionale delle Ricerche (Pisa, Italy)
3. Berlin-Brandenburg Academy of Sciences and Humanities (Berlin, Germany)
4. Euskal Herriko Unibertsitatea (San Sebastian, Spain)
5. Academia Sinica (Taipei, Taiwan)
6. National Institute of Information and Communications Technology (Kyoto, Japan)
7. Irion Technologies (Delft, The Netherlands)
8. Synthema (Rome, Italy)
9. European Centre for Nature Conservation (Tilburg, The Netherlands)
• Subcontractors:
  – World Wide Fund for Nature (Zeist, The Netherlands)
  – Masaryk University (Brno, Czech Republic)

Current situation in the environment domain
• Vast amount of information in all kinds of formats and structures: websites, documents, databases, experts, community networks
• Scattered over the world: different regions, languages and cultures
• Highly dynamic and developing
• Increasing time and information pressure
• Technology gap: people use the first Google results
• Critical knowledge dependency

KYOTO cycle
[Figure]

KYOTO's Solution
• Text mining:
  – Massive and accurate indexing of facts from vast amounts of text
  – In any language/culture, from scattered sources
  – Again and again, to detect trends and changes
  – Direct relation between knowledge modeling effort and text mining
• Knowledge modeling:
  – Automatic learning of terms and concepts from text in any language
  – Formalization of knowledge in computer-usable format -> wordnets & ontologies
• Community software:
  – For experts in the field and not knowledge engineers
  – Continuous and collaborative effort:
    • adapt to the changing domain;
    • consensus in the field;
    • consensus across languages and cultures
  – Produce interoperable, formal, standardized knowledge structures
  – Relate knowledge structure to expressions in languages

[Diagram: KYOTO system overview with six numbered steps. Environmental organizations, citizens, governments and companies produce distributed, diverse & dynamic data and maintain terms & concepts through Wikyoto. Captured text is processed by Tybot (term yielding robot) and Kybot (knowledge yielding robot), using wordnets and an ontology with Top, Middle and Domain layers (node labels: Abstract, Physical, Process, Substance, H2O, CO2, Greenhouse Gas, Pollution, Emission, CO2 emission). The extracted facts go into a Text & Fact Index that supports Semantic Search.]
Capture text: "Sudden increase of CO2 emissions in 2008 in Europe"
Index facts:
  Process: Emission
  Involves: CO2
  Property: increase, sudden
  When: 2008
  Where: Europe

Achievements after 1st year
• First version of all system components
  – Wordnets in 7 languages in uniform database formats
  – Standard representation for output of linguistic processing for 7 languages, based on ISO proposals
  – Tybot (term extraction), Kybot (fact extraction) and Wikyoto (user editor)
  – Semantic search
• Extensive definition of user requirements
• Integration of system components

Potential impact

[Diagram: Kyoto Knowledge Base. Domain ontologies at the centre, linked to domain wordnets labelled WnJP, WnIT, WnNL, WnES, WnEN, WnEU and WnCH.]

[Diagram: Linking Open Data dataset cloud (http://richard.cyganiak.de/2007/10/lod/), shown together with wordnets (environment, legal, medical and sailing terms), ontologies (environment, legal, medical and sailing concepts) and fact sets (environment, legal and medical facts).]

Project characteristics

Why a STREP project?
• Major technical challenges
• Cross-cultural and cross-lingual
• Small consortium for intense collaboration and discussion
• Bridge the gap between users and technology: two-directional process
• Roll-out needs to follow from technical achievements

How to keep focus?
• Use existing state of the art technology
• Start from current practice as baseline
• Develop robust platform that adds to baseline, with baseline as fall back
• Gradually add richer data, more precision and new functionalities
• Allow end-users to control the process, driven by textual examples
• Open standardized architecture that can be developed further

Thank you for your attention
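
Illustration (not part of the original slides): the "Index facts" example in the system overview, where the sentence "Sudden increase of CO2 emissions in 2008 in Europe" is indexed as Process / Involves / Property / When / Where, can be pictured as a small record type plus a slot-based lookup. The sketch below is only one possible reading of that slide. The names `Fact`, `FACTS` and `search` are hypothetical, the fact is entered by hand rather than extracted from text, and nothing here reflects how Kybot or the KYOTO fact index are actually implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One indexed fact; the slot names follow the slide's example (hypothetical type)."""
    process: str        # Process slot, e.g. "Emission"
    involves: str       # Involves slot, e.g. "CO2"
    properties: tuple   # Property slot, e.g. ("increase", "sudden")
    when: str           # When slot, e.g. "2008"
    where: str          # Where slot, e.g. "Europe"
    source_text: str    # the sentence the fact is taken from


# The single example from the slide, entered by hand (no extraction happens here).
FACTS = [
    Fact(
        process="Emission",
        involves="CO2",
        properties=("increase", "sudden"),
        when="2008",
        where="Europe",
        source_text="Sudden increase of CO2 emissions in 2008 in Europe",
    )
]


def search(facts, **slots):
    """Return the facts whose slots equal every given value, e.g. where="Europe"."""
    return [
        fact
        for fact in facts
        if all(getattr(fact, name) == value for name, value in slots.items())
    ]


if __name__ == "__main__":
    for hit in search(FACTS, process="Emission", where="Europe"):
        print(hit.source_text)
```

Querying by slot rather than by keyword is what distinguishes such a lookup from plain text search: `search(FACTS, where="Europe", when="2008")` matches on the role each value plays in the fact, not just on its presence in the sentence.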