IWCS-7 : Harnessing the Semantic Web for NLP


NLP in a Data-Driven Approach to
the Ontology Life-Cycle
Paul Buitelaar
Competence Center Semantic Web &
Language Technology Lab
DFKI GmbH
Saarbrücken, Germany
© Paul Buitelaar: TALN07 – Toulouse, June 2007
Overview
Part I Ontologies and the Semantic Web

Why Ontologies?

The Semantic Web
Part II The Ontology Life-Cycle

Ontology Search – OntoSelect

Ontology Population – SOBA offline

Knowledge Retrieval – SOBA online

Ontology Learning – OntoLT (RelExt, ISOLDE)
Part III Ontologies and the Lexicon

A Lexicon Model for Multilingual Ontologies – LingInfo
Part I
Ontologies and the Semantic Web
Ontologies – An example
[Figure: example ontology (F-Logic). Concepts: Geographical Entity (GE), Inhabited GE, Natural GE, city, country, mountain, river. Relations: is-a, instance_of, located_in, flow_through, capital_of, similar. Instances: Germany, Berlin, Stuttgart, Neckar (length 367 km), Zugspitze (height 2962 m).]
Design: Philipp Cimiano
Why Ontologies?

Provide explicit and formal context for:

Interpretation

Integration

Sharing
Why Ontologies? - Interpretation
Why Ontologies? - Integration
Why Ontologies? - Sharing
Ontologies and the Semantic Web
Brief intro to the Semantic Web
Web Consists of Uninterpreted Data
[Figure labels: Web, Text, Images, Tables, DBs]
Interpretation through Markup - Categories
[Figure labels: Markup, Web]
Interpretation through Markup – User Tags
[Figure labels: Markup, Web2.0]
Formal Interpretation - Knowledge Markup
[Figure labels: Knowledge Markup, Ontologies, Semantic Web (Web3.0)]
…turns the Web into a Knowledge Base
[Figure labels: Knowledge Markup, Ontologies]
…
<rdf:Description rdf:about="KB_100308_Individual_16">
<rdf:type rdf:resource="http://www.lehigh.edu/univ-bench.owl#Director"/>
<ub:title>PhD</ub:title>
<ub:age>51</ub:age>
<ub:headOf>KB_100308_Individual_19</ub:headOf>
</rdf:Description>
<rdf:Description rdf:about="KB_100308_Individual_19">
<rdf:type rdf:resource="http://www.lehigh.edu/univ-bench.owl#Program"/>
</rdf:Description>
…
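As a rough illustration of what a Web-scale knowledge base allows, the two descriptions in the fragment can be read as subject-predicate-object triples and queried. The sketch below is plain Python rather than an RDF toolkit; the tuple layout and the helper name `objects` are assumptions made for illustration, not part of the original slides.

```python
# Triples transcribed from the RDF/XML fragment on the slide.
# The (subject, predicate, object) tuple layout is an illustrative assumption.
UB = "http://www.lehigh.edu/univ-bench.owl#"

triples = [
    ("KB_100308_Individual_16", "rdf:type", UB + "Director"),
    ("KB_100308_Individual_16", "ub:title", "PhD"),
    ("KB_100308_Individual_16", "ub:age", "51"),
    ("KB_100308_Individual_16", "ub:headOf", "KB_100308_Individual_19"),
    ("KB_100308_Individual_19", "rdf:type", UB + "Program"),
]


def objects(subject, predicate):
    """Return all objects for a subject/predicate pair (hypothetical helper)."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# What kind of thing does individual 16 head?
headed = objects("KB_100308_Individual_16", "ub:headOf")
headed_types = [t for h in headed for t in objects(h, "rdf:type")]
print(headed_types)
```

The query follows one link (`ub:headOf`) and then looks up the type of the target, which is the kind of lookup that plain, uninterpreted text does not support directly.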
… that enables Semantic Web Services
[Figure labels: Semantic Web Services, Knowledge Markup, Ontologies]
… and Intelligent Man-Machine Interfaces
[Figure labels: Intelligent Man-Machine Interface, Semantic Web Services, Knowledge Markup, Ontologies]
Part II
The Ontology Life-Cycle
Ontology Life Cycle
Populate: Knowledge Base Generation
Validate: Consistency Checks
Create/Select: Development and/or Selection
Evolve: Extension, Modification
Deploy: Knowledge Retrieval
Maintain: Usability Tests
NLP in the Ontology Life Cycle
Text-Driven Ontology Search
Ontology Population from Text
Ontology Learning from Text
NL Interaction with KBs
OntoSelect

Ontology Library and Ontology Search Service
http://olp.dfki.de/OntoSelect






OntoSelect monitors the web for ontologies (automatic updates and indexing)
Ontology browse and search (by keyword, by document, by topic)
Class, property and (multilingual) label browse and search
Ontology publishing (submit your ontology)
Statistics on formats (mostly OWL), languages (mostly EN), frequently used labels, ontology publishing
Selected ontologies may be used in:
  Knowledge extraction/markup in Semantic Web applications
  Semantic tagging in Natural Language Processing
Paul Buitelaar, Thomas Eigner, Thierry Declerck. OntoSelect: A Dynamic Ontology Library with Support for Ontology Selection. In: Proc. of the Demo Session at the International Semantic Web Conference, Hiroshima, Japan, Nov. 2004.
Multilinguality
Distribution of languages in the 170 ontologies with multilingual labels, out of 1420 ontologies collected as of June 2007 (~12%)
[Figure labels: English, Dutch, Hindi, German, Catalan, Japanese, French, Italian, Korean, Spanish, Latin, Turkish, Portuguese, Greek, Chinese]
Ontology Search with OntoSelect
“Find the background knowledge that fits your NLP task …”

Keyword, topic, document-specific ontology search

Relevance criteria address ontology content and structure:

Coverage - Term Matching
  How many of the terms in a text collection are covered by labels for classes and properties?

Structure - Properties Relative to Classes
  How detailed is the knowledge structure that the ontology represents?

Connectedness - Number of Included Ontologies
  Is the ontology connected to other ontologies and how well established are these?
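The three relevance criteria can be sketched as simple measures over an ontology's labels and counts. This is a minimal sketch under stated assumptions: the slides do not give OntoSelect's actual scoring formula, so the function names, the case-insensitive exact matching, and the toy data below are all hypothetical.

```python
def coverage(corpus_terms, ontology_labels):
    """Coverage: fraction of corpus terms matched by a class or property label.

    Assumption: case-insensitive exact matching stands in for term matching.
    """
    terms = {t.lower() for t in corpus_terms}
    labels = {label.lower() for label in ontology_labels}
    return len(terms & labels) / len(terms) if terms else 0.0


def structure(num_properties, num_classes):
    """Structure: number of properties relative to the number of classes."""
    return num_properties / num_classes if num_classes else 0.0


def connectedness(included_ontologies):
    """Connectedness: number of ontologies this ontology includes."""
    return len(included_ontologies)


# Toy data (hypothetical): terms from a text collection and one candidate ontology.
corpus_terms = ["player", "goal", "referee", "stadium"]
candidate = {
    "class_labels": ["Player", "Referee", "Team"],
    "property_labels": ["plays_for"],
    "includes": ["http://example.org/sport"],  # placeholder URI
}

labels = candidate["class_labels"] + candidate["property_labels"]
scores = {
    "coverage": coverage(corpus_terms, labels),
    "structure": structure(len(candidate["property_labels"]),
                           len(candidate["class_labels"])),
    "connectedness": connectedness(candidate["includes"]),
}
print(scores)
```

How the three values are weighted against each other is left open here, since the slide only names the criteria.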
Related Services

Manually maintained ontology libraries
  Protege Ontology Library (Stanford Univ., USA)
  DAML Ontology Library (DAML.org): no longer maintained
  SchemaWeb Directory (schemaweb.info): no longer active?

Semantic Web search engines
  SWOOGLE (Univ. of Maryland, USA)
  Watson (Knowledge Media Institute, UK)
Ontology Population
SmartWeb Ontology-based Annotation (SOBA)

Generate a Knowledge Base from Web documents on the Soccer World Cup, to be used for knowledge-based QA in the SmartWeb mobile dialogue system
http://www.smartweb-projekt.de/

Ontology-based wrapping of HTML tables

Ontology-based Information Extraction
  Named-Entity Recognition/Classification, Event Extraction

Ontology-based Information Extraction on image captions
  NE/Event extraction for linking images to ontology classes

Ontology Population
  Instantiation of classes with extracted NEs and events
Paul Buitelaar, Philipp Cimiano, Anette Frank, Stefania Racioppa. SOBA: SmartWeb Ontology-based Annotation. In: Proc. of the Demo Session at the International Semantic Web Conference, Athens GA, USA, Nov. 2006.
SOBA - offline
Linguistic Annotation & IE
[Figure: SOBA offline architecture. Labels: Web, Crawler, Monitor, Web Interface; Textual Data (Text, Image Captions), Semi-Struct. Data (Tables), Images; SmartWeb corpus; Linguistic Annotation & Information Extraction; Results; Class Instantiation; SWIntO & OntoBroker; KB. Format labels: HTML, XML, F-Logic, RDF.]
KB Generation
semistruct#Uruguay_vs_Bolivien_29_Maerz_2000_19:30
[
sportevent#matchEvents -> soba#ID11].
soba#ID11:sportevent#Ban
[
sportevent#commitedBy -> semistruct#Uruguay_vs_Bolivivien_(…)_Luis_CRISTALDO].
KB
semistruct#Uruguay_vs_Bolivien_29_Maerz_2000_19:30:sportevent#LeagueFootballMatch
[ externalRepresentation@(de) ->> "Uruguay vs. Bolivien (29. Maerz 2000 19:30)";
dolce#"HAPPENS-AT" -> semistruct#"29. Maerz 2000 19:30_interval";
sportevent#heldIn -> semistruct#"Montevideo_Centenario_29Maerz_2000_19_30_Stadium";
sportevent#team1Result -> 1;
sportevent#team2Result -> 0;
sportevent#attendance ->49811;
sportevent#team1 -> semistruct#"Uruguay_vs_Bolivien_29Maerz_2000_19:30_Uruguay_MatchTeam";
sportevent#team2 -> semistruct#"Uruguay_vs_Bolivien_29Maerz_2000_19:30_Bolivien_MatchTeam";
…
semistruct#Uruguay_vs_Bolivien_29_Maerz_2000_19:30
[ sportevent#matchEvents -> soba#ID25].
soba#ID25:sportevent#Foul
[ sportevent#commitedBy -> semistruct#Uruguay_vs_Bolivien_Luis_CRISTALDO].
mediainst#ID67:media#Picture
[ media#URL -> "http://fifaworldcup.yahoo.com/06/de/photos/124155.jpg";
media#shows -> ID25].
OBIE Results
Results over 50 manually annotated match reports - 65 classes

SWIntO class + attribute   manually annotated   extracted   extracted OK   Precision   Recall   F-measure

Sample of good results (F-measure > 0.6)
ShowingYellowRedCard                       23          15             13       0.867    0.565       0.684
  penalized_player                         22           2              1       0.500    0.045       0.083
Block                                      26          13             12       0.923    0.462       0.615
  committed_by                             17           1              1       1.000    0.059       0.111
  committed_on                              8           3              3       1.000    0.375       0.545
FreeKick                                   69          36             32       0.889    0.464       0.610
  committed_by                             52          11             11       1.000    0.212       0.349

Sample of average results (0.3 < F-measure < 0.6)
CornerKick                                 53          24             20       0.833    0.377       0.519
  committed_by                             14           4              4       1.000    0.286       0.444
Header                                     58          34             23       0.676    0.397       0.500
  committed_by                             55           7              6       0.857    0.109       0.194
  committed_on                             11           3              3       1.000    0.273       0.429
Cross                                      95          31             25       0.806    0.263       0.397
  committed_by                             77           5              5       1.000    0.065       0.122
  committed_on                             26           2              2       1.000    0.077       0.143

Sample of bad results (F-measure < 0.3)
BallDeflection                             35           5              4       0.800    0.114       0.200
  committed_by                             34           2              2       1.000    0.059       0.111

MACRO AVERAGE
on types: P 0.51 / R 0.23 / F 0.31
on attributes: P 0.38 / R 0.06 / F 0.11
MICRO AVERAGE
on types: P 0.72 / R 0.26 / F 0.38
on attributes: P 0.88 / R 0.06 / F 0.12
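The precision, recall and F-measure columns follow from the three count columns, and the macro and micro averages differ only in whether scores or counts are averaged. The sketch below recomputes the first row of the results (23 manually annotated, 15 extracted, 13 extracted OK) and contrasts the two averaging schemes on three rows taken from the results. The function name `prf` is an assumption, and because only three of the 65 classes are used, the averages computed here are not expected to match the figures on the slide.

```python
def prf(annotated, extracted, correct):
    """Precision, recall and balanced F-measure from raw counts."""
    precision = correct / extracted if extracted else 0.0
    recall = correct / annotated if annotated else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# (manually annotated, extracted, extracted OK) for three rows of the results.
rows = [(23, 15, 13), (26, 13, 12), (35, 5, 4)]

per_class = [prf(*row) for row in rows]
first_row = tuple(round(x, 3) for x in per_class[0])
print(first_row)  # (0.867, 0.565, 0.684), as in the first row of the results

# Macro average: average the per-class scores, so every class counts equally.
macro_f = sum(f for _, _, f in per_class) / len(per_class)

# Micro average: pool the counts first, so frequent classes weigh more.
totals = [sum(column) for column in zip(*rows)]
micro_p, micro_r, micro_f = prf(*totals)
print(round(macro_f, 3), round(micro_f, 3))
```

Pooling counts lets frequent classes dominate the micro average, so micro precision can sit well above macro precision when the frequent classes are extracted more reliably.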
SOBA – online
Ontology Learning (from Text)
Axioms: disjoint(PLAYER, REFEREE)
Relation Taxonomy: CROSS ≤_R ASSIST
Relations: CROSS(domain:PLAYER, range:PLAYER)
Class Taxonomy: GOALKEEPER ≤_C PLAYER, PLAYER ≤_C PERSON
Concept Formation (Classes): c := GOALKEEPER := i(c), |c|, Refc(c)
(Multilingual) Synonyms: {Torwart, doelman, goalkeeper, ...}
Terms: referee, trainer, goalkeeper, ...
Paul Buitelaar, Philipp Cimiano, Bernardo Magnini. Ontology Learning from Text: An Overview. In: Paul Buitelaar, Philipp Cimiano, Bernardo Magnini (eds.) Ontology Learning from Text: Methods, Evaluation and Applications. Frontiers in Artificial Intelligence and Applications Series, Vol. 123, IOS Press, July 2005.
Philipp Cimiano. Ontology Learning and Population from Text: Algorithms, Evaluation and Applications. Springer, 2006.
OntoLT – NLP-based Protege PlugIn
http://olp.dfki.de/OntoLT/OntoLT.htm
[Figure: OntoLT workflow. Corpus -> Linguistic Annotation -> Annotated Corpus (XML) -> OntoLT (Mappings, Extraction) -> Extracted Ontology -> Protégé (Edit Extracted Ontology). Definition of Mappings: XML (Linguistic Structure) <=> Protégé (Classes, Slots).]
Paul Buitelaar, Daniel Olejnik, Michael Sintek. A Protégé Plug-In for Ontology Extraction from Text Based on Linguistic Analysis. In: Proceedings of the 1st European Semantic Web Symposium (ESWS), Heraklion, Greece, May 2004.
Linguistic Patterns + Statistical Relevance
Class Candidate Extraction
Generate Ontology Fragments
RelExt - Relation Extraction

Extend SmartWeb Ontology with ‘Event Relations’
“Ballack shoots the ball in the net.” > Relation:Shoot(Domain:FootballPlayer, Range:BallObject)
Examples of extracted predicate-argument pairs:
...
flanken (to cross)        SUBJ:FOOTBALLPLAYER “Klasnic”
flanken (to cross)        DOBJ:FOOTBALLPLAYER “Klose”
...
beschimpfen (to insult)   SUBJ:FOOTBALLPLAYER “Klasnic”
...

Predicates ranked by χ²:
Rank   χ²          Predicate                Frequency
1      27167.41    flanken (to cross)       1373
2      22045.39    klaeren (to clear)       1435
3      21908.37    schiessen (to shoot)     1503
4      20439.09    koepfen (to head)        1033
5      16342.99    lassen (to let)           826
6       9563.41    ziehen (to pull)         1548
7       9468.57    passen (to pass)          814
8       7752.84    spielen (to play)        1559
9       7653.68    lenken (to steer)         537

Concept labels ranked by χ²:
Rank   χ²          Concept Label            Frequency
1      565510.9    FOOTBALLPLAYER           28494
2      162137.8    GOALOBJECT                8188
3      143528.9    BALLOBJECT                7249
4      138535.4    GOALKEEPER                6887
5      70814.86    SHOT                      3578
6      49018.16    TEAM                      2477
7      45474.96    PENALTYAREA               2298
8      34668.11    FREEKICK                  1752
9      29324.54    WING                      1482

Relation: flanken(Domain:FootballPlayer,Range:FootballPlayer)
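The rankings on this slide are ordered by a χ² score. As a hedged sketch of how such a score can be obtained, the code below computes Pearson's χ² statistic on a 2x2 contingency table that compares a predicate's frequency in a domain corpus with its frequency in a reference corpus. The comparison against a reference corpus, the corpus sizes and the reference counts are assumptions for illustration; only the two domain frequencies (1373 for flanken, 1559 for spielen) come from the slide, and the resulting scores are not expected to reproduce the slide's values.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denominator if denominator else 0.0


def relevance(freq_domain, size_domain, freq_reference, size_reference):
    """Score how strongly a word is associated with the domain corpus.

    Rows: domain corpus vs. reference corpus.
    Columns: occurrences of the word vs. all other tokens.
    """
    return chi_square_2x2(
        freq_domain, size_domain - freq_domain,
        freq_reference, size_reference - freq_reference,
    )


# Hypothetical corpus sizes and reference-corpus counts.
SIZE_DOMAIN = 1_000_000
SIZE_REFERENCE = 1_000_000
counts = {
    "flanken": (1373, 20),   # (domain frequency from the slide, assumed reference frequency)
    "spielen": (1559, 900),
}

ranking = sorted(
    ((word, relevance(fd, SIZE_DOMAIN, fr, SIZE_REFERENCE))
     for word, (fd, fr) in counts.items()),
    key=lambda item: item[1],
    reverse=True,
)
for word, score in ranking:
    print(word, round(score, 1))
```

Under these assumed counts the domain-specific verb flanken outranks the more general spielen even though spielen is more frequent, which is the effect a relevance measure of this kind is meant to capture.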
Alexander Schutz, Paul Buitelaar. RelExt: A Tool for Relation Extraction in Ontology Extension. In: Proc. of the 4th International Semantic Web Conference, Galway, Ireland, Nov. 2005.
ISOLDE - Web-based Taxonomy Extraction
Extracting Taxonomies from Wikipedia & Web Dictionaries
<searchWord="SPIELER"/>
<DWDS>
  <kontext>
    <entry>jmd., der an einem sportlichen Spiel teilnimmt</entry>
    <entry>jmd., der dem Gluecksspiel verfallen ist</entry>
  </kontext>
  <synonym>
    …..
  </synonym>
  <hyponym>
    <entry>Auswahlspieler</entry>
    <entry>Gegner</entry>
    <entry>
      <class resource="TORSCHUETZE" />
    </entry>
    <entry>
      <class resource="TORWART" />
    </entry>
    <entry>
      <class resource="VERTEIDIGER" />
    </entry>
  </hyponym>
  <hyperonym>
    <entry>jemand</entry>
  </hyperonym>
</DWDS>
<WIKI>
  <WIKIPedia>…..</WIKIPedia>
  <WIKTionary>
    <subClass>
      <entry>
        <class resource="ABWEHRSPIELER" />
      </entry>
      <entry>
        <class resource="MITTELFELDSPIELER" />
      </entry>
      <entry>
        <class resource="STUERMER" />
      </entry>
      <entry>
        <class resource="TORWART" />
      </entry>
    </subClass>
    <superClass>
      <entry>Person</entry>
    </superClass>
    <kontext>Mitglied einer Mannschaft, jmd. der an einem Spiel teilnimmt</kontext>
    <Herkuft></Herkuft>
  </WIKTionary>
</WIKI>
[Slide callouts: Class Candidate, Lexical Relation, Related Candidate Class, Web Resource, Taxonomy, Class Equivalence]
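To make the step from such a record to taxonomy candidates concrete, the sketch below reads the subClass and superClass entries of the WIKTionary part and turns them into (subclass, superclass) pairs for the candidate class SPIELER. It is an illustration only, not ISOLDE's implementation: the function name `taxonomy_pairs` is hypothetical, and the fragment is a cut-down, well-formed copy of the record shown on the slide.

```python
import xml.etree.ElementTree as ET

# Cut-down, well-formed copy of the WIKTionary part of the record.
FRAGMENT = """
<WIKTionary>
  <subClass>
    <entry><class resource="ABWEHRSPIELER" /></entry>
    <entry><class resource="MITTELFELDSPIELER" /></entry>
    <entry><class resource="STUERMER" /></entry>
    <entry><class resource="TORWART" /></entry>
  </subClass>
  <superClass>
    <entry>Person</entry>
  </superClass>
</WIKTionary>
"""


def taxonomy_pairs(xml_text, candidate):
    """Return (subclass, superclass) pairs for one class candidate."""
    root = ET.fromstring(xml_text.strip())
    pairs = []
    # Every class referenced under <subClass> is a subclass of the candidate.
    for cls in root.findall("./subClass/entry/class"):
        pairs.append((cls.get("resource"), candidate))
    # Every textual entry under <superClass> is a superclass of the candidate.
    for entry in root.findall("./superClass/entry"):
        if entry.text and entry.text.strip():
            pairs.append((candidate, entry.text.strip()))
    return pairs


pairs = taxonomy_pairs(FRAGMENT, "SPIELER")
for sub, sup in pairs:
    print(sub, "is-a", sup)
```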
Nicolas Weber, Paul Buitelaar. Web-based Ontology Learning with ISOLDE. In: Proc. of the Workshop on Web Content Mining with Human Language at the International Semantic Web Conference, Athens GA, USA, Nov. 2006.
Part III
Ontologies and the Lexicon
NLP in the Ontology Life Cycle
Ontology Search
Ontology Population
(Multilingual) Lexicon
Ontology Learning
KB Retrieval
Multilinguality in Ontologies
[Figure: ontology fragment with classes Student, University, Campus, School, Staff and relations studies_at, located_at, works_at, is_part_of. Language-specific term properties: has_German_term: Fakultät; has_Dutch_term: Faculteit; has_US-English_term: School.]
Towards Lexicalized Ontologies
[Figure: ontology fragment with classes Student, University, Campus, School, Staff and relations studies_at, located_at, works_at, is_part_of, plus a has_term property pointing to the class Term. Instances of Term: Fakultät (language: DE), faculteit (language: NL), school (language: EN-US).]
LingInfo - Lexicon Model for Ontologies
[Figure: DomainClass hasLingInfo LingInfo. Term-1 instanceOf LingInfo, with hasOrthographicForm: term, hasLang: XX, hasMorphSynInfo: WordForm-1.]
Paul Buitelaar, Michael Sintek, Malte Kiesel. A Lexicon Model for Multilingual/Multimedia Ontologies. In: Proceedings of the 3rd European Semantic Web Conference, Budva, Montenegro, June 2006.
LingInfo - Lexicon Model for Ontologies
Multilingual Terms
[Figure: SCHOOL hasLingInfo LingInfo. Term-1 instanceOf LingInfo, with hasOrthographicForm: fakulteitsgebouw (“department building”, “school”), hasLang: NL.]
LingInfo - Lexicon Model for Ontologies
Morpho-Syntactic Info
[Figure: SCHOOL hasLingInfo LingInfo. Term-1 instanceOf LingInfo, with hasOrthographicForm: fakulteitsgebouw (“department building”, “school”), hasLang: NL, hasMorphSynInfo: WordForm-1. WordForm-1 hasPoS: N.]
LingInfo - Lexicon Model for Ontologies
Decomposition
[Figure: SCHOOL hasLingInfo LingInfo. Term-1 instanceOf LingInfo, with hasOrthographicForm: fakulteitsgebouw (“department building”, “school”), hasLang: NL, hasMorphSynInfo: WordForm-1. WordForm-1 hasPoS: N, hasStem: Term-2, Term-3. Term-2 hasOrthographicForm: fakulteit (“department”, “school”). Term-3 hasOrthographicForm: gebouw (“building”).]
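The LingInfo structure built up over the last slides can be written down as a small data model. The sketch below is an assumed Python rendering, not the published LingInfo schema: the class and field names are chosen to mirror the property names on the slides (hasLingInfo, hasOrthographicForm, hasLang, hasMorphSynInfo, hasPoS, hasStem), and the helper `stem_forms` is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WordForm:
    pos: str                                            # hasPoS
    stems: List["Term"] = field(default_factory=list)   # hasStem


@dataclass
class Term:
    orthographic_form: str                       # hasOrthographicForm
    lang: Optional[str] = None                   # hasLang
    morph_syn_info: Optional[WordForm] = None    # hasMorphSynInfo


@dataclass
class DomainClass:
    name: str
    ling_info: List[Term] = field(default_factory=list)  # hasLingInfo


def stem_forms(term):
    """Orthographic forms of the stems a term decomposes into (hypothetical helper)."""
    if term.morph_syn_info is None:
        return []
    return [stem.orthographic_form for stem in term.morph_syn_info.stems]


# The Dutch compound from the Decomposition slide.
term_2 = Term("fakulteit")   # glossed on the slide as "department", "school"
term_3 = Term("gebouw")      # glossed on the slide as "building"
term_1 = Term("fakulteitsgebouw", lang="NL",
              morph_syn_info=WordForm(pos="N", stems=[term_2, term_3]))
school = DomainClass("SCHOOL", ling_info=[term_1])

print(school.name, stem_forms(school.ling_info[0]))
```

Keeping the stems as full Term objects rather than strings is what lets each stem later be linked to its own domain class, as the Mapping Lexical to Semantic Structure slides do for gebouw and BUILDING.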
Mapping Lexical to Semantic Structure
Decomposition
[Figure: the decomposition graph for SCHOOL (Term-1 fakulteitsgebouw, hasLang: NL; WordForm-1 with hasPoS: N and hasStem: Term-2 fakulteit, Term-3 gebouw), with a second instanceOf link added.]
Mapping Lexical to Semantic Structure
Decomposition
[Figure: the decomposition graph for SCHOOL (Term-1 fakulteitsgebouw, hasLang: NL; WordForm-1 with hasPoS: N and hasStem: Term-2 fakulteit, Term-3 gebouw), with a second domain class BUILDING that has its own hasLingInfo link to a LingInfo, and three instanceOf links in total.]
Mapping Lexical to Semantic Structure
Decomposition
[Figure: the decomposition graph for SCHOOL (Term-1 fakulteitsgebouw, hasLang: NL; WordForm-1 with hasPoS: N and hasStem: Term-2 fakulteit, Term-3 gebouw) and the second domain class BUILDING with its own LingInfo, plus an isLocatedAt relation between SCHOOL and BUILDING.]
Mapping Lexical to Semantic Structure
Pred-Arg Structure
[Figure: classes PERSON and CALL, linked by hasAgent. CALL hasLingInfo LingInfo. Term-1 instanceOf LingInfo, with hasOrthographicForm: call, hasLang: EN, hasMorphSynInfo: WordForm-1. WordForm-1 hasPoS: V, hasArg: Arg-1, Arg-2. Argument properties shown: hasGramFunc: SUBJ, hasPhraseType: NP, mapsTo: hasAgent-1.]
Mapping Lexical to Semantic Structure
Pred-Arg Structure
[Figure: classes PERSON and CALL, linked by hasAgent; additional labels ORGANIZATION, SCHOOL, worksFor, isa. CALL hasLingInfo LingInfo. Term-1 instanceOf LingInfo, with hasOrthographicForm: call, hasLang: EN, hasMorphSynInfo: WordForm-1. WordForm-1 hasPoS: V, hasArg: Arg-1, Arg-2. Argument properties shown: hasGramFunc: SUBJ, hasPhraseType: NP, mapsTo: hasAgent-1.]
Coercion/Bridging:
“The Heller school called. They wanted to know ...”
Acknowledgements
OntoSelect: Thomas Eigner, Michael Velten (DFKI)
SOBA: Anette Frank (DFKI, now at Univ. Heidelberg), Stefania Racioppa (DFKI), Philipp Cimiano (AIFB) and others ...
OntoLT: Michael Sintek (DFKI), Daniel Olejnik (now at IDS Scheer) and others ...
RelExt: Alexander Schutz (now at DERI Galway)
ISOLDE: Nicolas Weber (now at KnowCenter, Graz)
LingInfo: Michael Sintek, Massimo Romanelli (DFKI), Vanessa Micelli (European Media Lab, Germany) and others ...