KYOTO (ICT-211423)
Knowledge Yielding Ontologies for Transition-Based Organization
FP7: Intelligent Content and Semantics
http://www.kyoto-project.eu/
Piek Vossen, VU University Amsterdam
FP7, Information Day Call 5, Luxembourg, May 11-12, 2009
Project goals
• Open platform for knowledge sharing across languages
and cultures
– Wiki environment that allows people in the field to maintain their
knowledge and agree on meaning without knowledge engineering
skills
– Bootstrap this knowledge through open text mining & concept
learning
– Enables knowledge transition and information search across
different target groups, transgressing linguistic, cultural and
geographic boundaries.
– Enables deep semantic search for facts and knowledge
• Free, open source license (GPL)
Scope
• Languages:
– English, Dutch, Italian, Spanish, Basque, Chinese, Japanese
• Domain:
– Environmental domain, BUT usable in any domain
• Global:
– Both European and non-European languages
• Available:
– Free: as open source system and data (GPL)
• Future perspective:
– Content standardization that supports world wide communication
KYOTO (ICT-211423)
• Funded:
– 7th Framework Program-ICT of the European Union:
Intelligent Content and Semantics
– Taiwan and Japan funded by national grants
• STREP project: research & development
• Duration:
– March 2008 – March 2011
• Effort:
– 364 person months of work.
Consortium
1. Vrije Universiteit Amsterdam (Amsterdam, The Netherlands),
2. Consiglio Nazionale delle Ricerche (Pisa, Italy),
3. Berlin-Brandenburg Academy of Sciences and Humanities (Berlin,
Germany),
4. Euskal Herriko Unibertsitatea (San Sebastian, Spain),
5. Academia Sinica (Taipei, Taiwan),
6. National Institute of Information and Communications Technology
(Kyoto, Japan),
7. Irion Technologies (Delft, The Netherlands),
8. Synthema (Rome, Italy),
9. European Centre for Nature Conservation (Tilburg, The Netherlands),
• Subcontractors:
– World Wide Fund for Nature (Zeist, The Netherlands),
– Masaryk University (Brno, Czech Republic)
Current situation in the environmental domain
• Vast amount of information in all kinds of formats
and structures: websites, documents, databases,
experts, community networks
• Scattered over the world: different regions,
languages and cultures
• Highly dynamic and developing
• Increasing time and information pressure
• Technology gap: users rely on the first Google results
• Critical knowledge dependency
KYOTO cycle
KYOTO's Solution
• Text mining:
– Massive and accurate indexing of facts from vast amounts of text;
– In any language/culture from scattered sources;
– Again and again to detect trends and changes;
– Direct relation between knowledge modeling effort and text mining
• Knowledge modeling:
– automatic learning of terms and concepts from text in any language;
– formalization of knowledge in computer usable format -> wordnets &
ontologies
• Community software:
– For experts in the field and not knowledge engineers
– Continuous and collaborative effort:
• adapt to the changing domain;
• consensus in the field;
• consensus across languages and cultures
– Produce interoperable, formal, standardized knowledge structures;
– Relate knowledge structure to expressions in languages
[Figure: KYOTO system overview, steps 1 to 6]
• Sources: environmental organizations, citizens, governments, companies
(distributed, diverse & dynamic data)
• Capture text: "Sudden increase of CO2 emissions in 2008 in Europe"
• Tybot: term yielding robot
• Wikyoto: maintain terms & concepts
• Wordnets and ontology
– Ontology levels: Top, Middle, Domain
– Concepts shown: Abstract, Physical, Process, Substance, H2O, CO2,
CO2 emission, Greenhouse gas, Pollution, Emission
• Kybot: knowledge yielding robot
• Index facts:
– Process: Emission
– Involves: CO2
– Property: increase, sudden
– When: 2008
– Where: Europe
• Text & fact index, semantic search
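The indexed fact in the figure (Process: Emission, Involves: CO2, Property: increase, sudden, When: 2008, Where: Europe) can be pictured as a small record that a search component filters on. The sketch below is only an illustration of that idea: the names Fact, FACTS and search are hypothetical and do not reflect KYOTO's actual data format or the Kybot output.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Fact:
    """Hypothetical record mirroring the fields named on the slide."""
    process: str
    involves: str
    properties: Tuple[str, ...]
    when: str
    where: str


# The single example fact from the slide, derived from the sentence
# "Sudden increase of CO2 emissions in 2008 in Europe".
FACTS: List[Fact] = [
    Fact(
        process="Emission",
        involves="CO2",
        properties=("increase", "sudden"),
        when="2008",
        where="Europe",
    )
]


def search(facts: List[Fact], **criteria: str) -> List[Fact]:
    """Return the facts whose fields match every criterion, ignoring case.

    A criterion on a tuple-valued field (such as properties) matches
    when the wanted value equals any element of the tuple.
    """
    results = []
    for fact in facts:
        matches = True
        for name, wanted in criteria.items():
            value = getattr(fact, name)
            values = value if isinstance(value, tuple) else (value,)
            if wanted.lower() not in [v.lower() for v in values]:
                matches = False
                break
        if matches:
            results.append(fact)
    return results
```

Under these assumptions, search(FACTS, process="emission", where="Europe") returns the example fact, while search(FACTS, when="2009") returns an empty list. A real fact index would also resolve terms through the wordnets and the ontology rather than compare strings.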
Achievements after 1st year
• First version of all system components
– Wordnets in 7 languages in uniform database formats
– Standard representation for output of linguistic
processing for 7 languages, based on ISO proposals
– Tybot (term extraction), Kybot (fact extraction) and
Wikyoto (user editor)
– Semantic search
• Extensive definition of user requirements
• Integration of system components
Potential impact
[Figure: Kyoto Knowledge Base, showing domain ontologies together with
domain wordnets WnEN, WnNL, WnIT, WnES, WnEU, WnCH and WnJP]
Linking Open Data dataset cloud
http://richard.cyganiak.de/2007/10/lod/
[Figure: Linking Open Data cloud labelled with wordnet terms, ontology
concepts and facts for the environment, legal, medical and sailing domains]
Project characteristics
Why STREP project?
• Major technical challenges
• Cross-cultural and cross-lingual
• Small consortium for intense collaboration
and discussion
• Bridge the gap between users and
technology: two-directional process
• Roll-out needs to follow from technical
achievements
How to keep focus?
• Use existing state of the art technology
• Start from current practice as baseline
• Develop robust platform that adds to baseline,
with baseline as fall back
• Gradually add richer data, more precision and new
functionalities
• Allow end-users to control the process, driven by
textual examples
• Open standardized architecture that can be
developed further
Thank you for your attention