Transcript AP 5.1 – Question Answering
Lexical Semantics and Ontologies
Tutorial at the ACL/HCSnet 2006 Advanced Program in Natural Language Processing
Paul Buitelaar
Language Technology Lab & Competence Center Semantic Web DFKI GmbH Saarbrücken, Germany © Paul Buitelaar: Lexical Semantics and Ontologies Tutorial at ACL/HCSnet, July 2006, Melbourne, Australia
Overview
Day 1: Words and Meanings
Human language as a system
How do words relate to each other?
Day 2: Words and Object Descriptions
Human language as a means of representation
How do words represent objects in the/a world?
Day 1 - Introduction
Words and Meanings
Synsets and Senses: Lexical Semantics in WordNet
Related Senses: Generative Lexicon and CoreLex
Domains and Senses: Tuning WordNet to a Domain
Words and Meanings
Lexical Semantics in WordNet
Generative Lexicon and CoreLex
Tuning WordNet to a Domain
WordNet
Lexical Semantic Resource
Semantic Lexicon
Maps words to meanings (senses)
Lexical Database
Machine readable (has a formal structure)
Freely available
http://wordnet.princeton.edu/
WordNet - Origins
In 1985 a group of psychologists and linguists at Princeton University undertook to develop a lexical database …
The initial idea was to provide an aid to use in searching dictionaries conceptually, rather than merely alphabetically …
WordNet … instantiates hypotheses based on results of psycholinguistic research …
… expose such hypotheses to the full range of the common vocabulary
In anomic aphasia, there is a specific inability to name objects. When confronted with an apple, say, patients may be unable to utter "apple," even though they will reject such suggestions as shoe or banana, and will recognize that apple is correct when it is provided. (Caramazza/Berndt 1978)
Miller, George A., Richard Beckwith, Christiane Fellbaum, Derek Gross and Katherine J. Miller. "Introduction to WordNet: an on-line lexical database." In: International Journal of Lexicography 3 (4), 1990, pp. 235-244.
Synsets
WordNet is organized around word meaning (not word forms as with traditional lexicons)
Word meaning is represented by “synsets”
A synset is a “Set of Synonyms”
Example
{board, plank}
Piece of lumber
{board, committee}
Group of people
Synset Hierarchy
Synsets are organized in hierarchies
Defines: generalization (hypernymy), specialization (hyponymy)
Example (from general to specific)
{entity} > … > {whole, unit} > {building material} > {lumber, timber} > {board, plank}
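The hierarchy can be sketched as a small lookup structure. This is only an illustration in plain Python, not the WordNet database format: the dictionary `HYPERNYM` and both function names are hypothetical, and the step between {entity} and {whole, unit} that the slide elides is left out here as well.

```python
# Toy hypernym links copied from the slide's example chain.
# Each synset (a tuple of synonyms) points to its more general synset.
HYPERNYM = {
    ("board", "plank"): ("lumber", "timber"),
    ("lumber", "timber"): ("building material",),
    ("building material",): ("whole", "unit"),
}


def hypernym_chain(synset):
    """Return the synset followed by all of its (transitive) hypernyms."""
    chain = [synset]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def is_hyponym_of(specific, general):
    """True if `general` is reachable from `specific` by hypernym links."""
    return general in hypernym_chain(specific)[1:]


for level in hypernym_chain(("board", "plank")):
    print("{" + ", ".join(level) + "}")
```

Hyponymy is simply the same relation read in the opposite direction, which is why a single table of hypernym links is enough for this sketch.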
Hierarchies (WordNet 1.7)
Hierarchy Example (WordNet 2.1)
Synsets and Senses
Synsets represent word meaning
Words that occur in several synsets have a corresponding number of meanings (senses)
Example
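A minimal sketch of this counting, assuming synsets are stored as plain Python sets; the two synsets are the {board, plank} and {board, committee} examples from the Synsets slide, and `senses_by_word` is a hypothetical helper name.

```python
from collections import defaultdict

# The two example synsets for "board" from the Synsets slide.
SYNSETS = [
    frozenset({"board", "plank"}),      # piece of lumber
    frozenset({"board", "committee"}),  # group of people
]


def senses_by_word(synsets):
    """Map each word to the list of synsets (senses) it occurs in."""
    index = defaultdict(list)
    for synset in synsets:
        for word in sorted(synset):
            index[word].append(synset)
    return dict(index)


senses = senses_by_word(SYNSETS)
for word in sorted(senses):
    print(word, len(senses[word]))
```

In this toy inventory "board" has two senses, while "plank" and "committee" have one each.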
WordNet 2.1
(Other) WordNet Relations
Synonymy
Similar in meaning
Hypernymy/Hyponymy
Generalization and Specialization
Meronymy
Part-of
e.g. study, bathroom, ... are meronyms of house
Antonymy
Opposite in meaning
e.g. warm is an antonym of cold
Words and Meanings
Lexical Semantics in WordNet
Generative Lexicon and CoreLex
Tuning WordNet to a Domain
Systematic Polysemy
Homonymy
bank: embankment, institution
We walked along the bank of the Charles river.
Did he have an account at the HBU bank?
Systematic Polysemy
school: group (of people), (learning) process, organization, building
The school went for an outing.
School starts at 8.30
The school was founded in 1910.
The school has a new roof.
Semantic or Pragmatic?
[Figure: two panels, "Semantic Analysis" and "Pragmatic Analysis", each relating the lexical item "school" (Lexical Items of the Language) to Obj1, Obj2, Obj3 and Obj4 (Objects in the World)]
Underspecified Discourse Referents
Anaphora Resolution
[A long book heavily weighted with military technicalities] (NP: event-physical_object-content), in this edition it is neither so long (event) nor so technical (content) as it was originally.
Metonymy
The Boston office called.
office > person (person part-of office)
Bridging
Peter bought a car. The engine runs well.
engine part-of car
The Boston office called. They asked for a new price.
office > person
Generative Lexicon Theory: Type Coercion
I began the book
book > event
event ‘has-relation-with’ book
read is-a event
multifaceted representation of lexical semantics
reflecting systematic / regular / logical polysemy
Generative Lexicon Theory: Qualia Structure (Pustejovsky 1995)
Formal: formal inheritance (is-a / hyponymy), e.g. book: artifact, communication, …
Constitutive: modification (part-of / meronymy), e.g. book: section, …
Telic: purpose („what is the object used for“), e.g. book: read, …
Agentive: causality („how did the object come about“), e.g. book: write, …
CoreLex (Buitelaar 1998)
Automatic Qualia Structure Acquisition
CoreLex is an attempt to automatically acquire underspecified lexical semantic representations that reflect systematic polysemy
These representations can be viewed as shallow Qualia Structures
Sense Distribution in WordNet
Systematic polysemy can be empirically studied in WordNet by observing sense distributions
>> If more than two words share the same sense distribution (i.e. have the same set of senses), then this may indicate a pattern of systematic polysemy
(adapted from Apresjan 1973)
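The "more than two words share the same sense distribution" criterion can be sketched as a grouping step. This is a minimal illustration, not the CoreLex implementation: the function name, the `min_members` parameter and the toy input are assumptions, with class members borrowed from the CoreLex example slides ("stone" is a hypothetical single-type noun added for contrast).

```python
from collections import defaultdict


def systematic_classes(basic_types_by_noun, min_members=3):
    """Group nouns by their set of basic types.

    A set of types shared by more than two nouns (min_members=3) is
    reported as a candidate systematic polysemous class.
    """
    groups = defaultdict(list)
    for noun, types in basic_types_by_noun.items():
        if len(types) > 1:  # nouns with one basic type are not polysemous here
            groups[frozenset(types)].append(noun)
    return {
        " ".join(sorted(types)): sorted(nouns)
        for types, nouns in groups.items()
        if len(nouns) >= min_members
    }


# Toy input: noun -> set of basic types of its senses.
TOY_NOUNS = {
    "book": {"artifact", "communication"},
    "journal": {"artifact", "communication"},
    "novel": {"artifact", "communication"},
    "alligator": {"animal", "natural_object"},
    "leopard": {"animal", "natural_object"},
    "ermine": {"animal", "natural_object"},
    "stone": {"natural_object"},  # hypothetical, single basic type
}

for name, members in sorted(systematic_classes(TOY_NOUNS).items()):
    print(name, members)
```

Using a frozenset as the grouping key makes the order of senses irrelevant, which matches the slide's definition of a sense distribution as a set of senses.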
Systematic Polysemous Classes
book
1. {publication} => artifact
2. {product, production} => artifact
3. {fact} => communication
4. {dramatic_composition, dramatic_work} => communication
5. {record} => communication
6. {section, subdivision} => communication
7. {journal} => artifact
Systematic Polysemous Class
“artifact communication”
amulet annals armband arrow article ballad bauble beacon bible birdcall blank blinker boilerplate book bunk cachet canto catalog catalogue chart chevron clout compact compendium convertible copperplate copy cordon corker ... guillotine homophony horoscope indicator journal laurels lay ledger loophole marker memorial nonsense novel obbligato obelisk obligato overture pamphlet pastoral paternoster pedal pennant phrase platform portrait prescription print puzzle radiogram rasp recap riddle rondeau … statement stave stripe talisman taw text tocsin token transcription trophy trumpery wand well whistle wire wrapper yardstick
From WordNet to CoreLex
[Figure: nouns (Noun 1 … Noun n) are mapped to basic types, which are combined into systematic polysemous classes (Systematic Polysemous Class 1 … Systematic Polysemous Class n)]
Other Examples
“animal natural_object”
alligator broadtail chamois ermine lapin leopard muskrat ...
“natural_object plant”
algarroba almond anise baneberry butternut candlenut cardamon ...
“action artifact group_social”
artillery assembly band church concourse dance gathering institution ...
“action attribute event psychological”
appearance concentration decision deviation difference impulse outrage …
“possession quantity_definite”
cent centime dividend gross penny real shilling
CoreLex vs. WordNet
Representation and Interpretation
„Dotted Types“ (Pustejovsky)
Lexical types are either simple (human, artifact, ...) or complex (information AND physical_object)
Complex types can be represented with a „dotted type“, e.g. information • physical_object
In (Cooper 2005) interpreted as a record type (example: a delicious lunch can take forever)
Related Work
Apresjan 1973. Regular Polysemy.
Nunberg & Zaenen 1992. Systematic polysemy in lexicology and lexicography.
Bill Dolan 1994. Word Sense Ambiguation: Clustering Related Senses.
Copestake & Briscoe 1996. Semi-productive polysemy and sense extension.
Peters, Peters & Vossen 1998. Automatic Sense Clustering in EuroWordNet.
Tomuro 1998. Semi-Automatic Induction of Systematic Polysemy from WordNet.
Words and Meanings
Lexical Semantics in WordNet
Generative Lexicon and CoreLex
Tuning WordNet to a Domain
Reducing Ambiguity
WordNet has too many senses …
Reduce Ambiguity
Cluster related senses (CoreLex)
Tune WordNet to an application domain
Domains and Senses
Domains determine Sense Selection, e.g.
English: cell
  prison cell in the Politics/Law domain
  living cell in the Biomedical domain
English: tissue
  living tissue in the Biomedical domain
  cloth in the Fashion domain
German: Probe
  test in the Biomedical domain
  rehearsal in the Theater domain
>> Compute Domain-Specific Sense
Approaches
Subject Codes
Domain codes are in the dictionary
Topic Signatures
Compute (domain-specific) context models from dictionary definitions, domain corpora, web resources
Tuning of WordNet to a domain
Top Down: Cucchiarelli & Velardi, 1998
Bottom Up: Buitelaar & Sacaleanu, 2001
Related recent work: McCarthy et al, 2004; Chan & Ng, 2005; Mohammad & Hirst, 2006
Subject Codes
Subject Codes (as used in LDOCE) indicate a domain in which a word is used in a particular sense
Examples (2600 codes)
  SN (sounds), DG (drugs), ML (meteorology)
Sub-Field Codes
  MDZP (Medicine:Physiology)
Code Combinations
  MLCO (Meteorology+Building), e.g. lightning conductor
  MLUF (Meteorology+Europe+France), e.g. Mistral high
Adding Subject Codes to WordNet
Grouping Synsets together across POS
  MEDICINE
    Nouns: doctor#1, hospital#1
    Verbs: operate#7
Grouping Synsets together across Sub-Hierarchies
  SPORT
    life_form#1: athlete#1
    physical_object#1: game_equipment#1
    act#2: sport#1
    location#1: playing_field#1
Magnini B. & Cavaglià G. Integrating Subject Field Codes into WordNet. In: Proceedings LREC 2000
WordNet DOMAINS
Sense | WordNet synset and gloss | Domains
1 | Depository, financial institution, bank, banking concern, banking company (a financial institution) | Economy
2 | Bank (sloping land) | Geography, Geology
3 | Bank (a supply or stock held in reserve) | Economy
4 | Bank, bank building (a building) | Architecture, Economy
5 | Bank (an arrangement of similar objects) | Factotum
6 | Savings bank, coin bank, money box, bank (a container) | Economy
7 | Bank (a long ridge or pile) | Geography, Geology
8 | Bank (the funds held by a gambling house) | Economy, Play
9 | Bank, cant, camber (a slope in the turn of a road) | Architecture
10 | Bank (a flight maneuver) | Transport
Bernardo Magnini, Carlo Strapparava, Giovanni Pezzuli, and Alfio Gliozzo. Using domain information for word sense disambiguation. In: Proceedings of the SENSEVAL2 workshop 2001.
WSD with Subject Codes
Match between set of words in the context of the ambiguous word and the set of words (“neighborhoods”) in the definitions + sample sentences of all senses that share a Subject Code
bank : Economics
write account take keep paper safe person money pay draw sum put order supply cheque
bank : Medicine and Biology
medicine origin treatment use organ product hold place human blood store hospital comb
Guthrie J. A. & Guthrie I. & Wilks Y. & Aidinejad H. Subject Dependent Co-Occurrence and Word Sense Disambiguation. In: Proceedings of ACL 1991.
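The matching step can be sketched as a set overlap between the context of the ambiguous word and each subject-code neighborhood. This is a simplified illustration rather than the procedure of Guthrie et al.: the two word lists are the "bank" neighborhoods from this slide (their split into two lists is an assumption), and the function name and both test contexts are hypothetical.

```python
# Neighborhoods for "bank", one per subject code, taken from the slide.
NEIGHBORHOODS = {
    "Economics": {
        "write", "account", "take", "keep", "paper", "safe", "person",
        "money", "pay", "draw", "sum", "put", "order", "supply", "cheque",
    },
    "Medicine and Biology": {
        "medicine", "origin", "treatment", "use", "organ", "product",
        "hold", "place", "human", "blood", "store", "hospital", "comb",
    },
}


def best_subject_code(context_words, neighborhoods):
    """Score each subject code by its word overlap with the context."""
    context = set(context_words)
    scores = {code: len(context & words) for code, words in neighborhoods.items()}
    # On a tie, max() keeps the first code in dictionary order.
    return max(scores, key=scores.get), scores


context = "he went to the bank to pay money into his account".split()
code, scores = best_subject_code(context, NEIGHBORHOODS)
print(code, scores)
```

The sketch counts exact word forms only; a realistic version would at least lemmatize the context so that, for example, "keeps" matches "keep".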
Topic Signatures from the Web
Construct Topic Signatures for WordNet synsets/senses
Retrieve document collections from the web and use queries constructed for each WordNet sense, e.g.
( boy AND ( altar boy OR ball boy OR … OR male person ) AND NOT (man OR … OR broth of a boy OR son OR … OR mama’s boy OR black ) )
Agirre E. & Ansa O. & Hovy E. & Martinez D. Enriching very large ontologies using the WWW. In: Proc. of the Ontology Learning Workshop ECAI 2000
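The shape of such a query can be sketched as simple string assembly: cue words for the target sense are OR-ed together, and cue words of the other senses are excluded. The function name and its arguments are hypothetical, and the cue lists below contain only terms visible in the example above; the elided terms (…) are not filled in.

```python
def build_sense_query(word, sense_cues, other_sense_cues):
    """Build a boolean web query for one sense of `word`.

    sense_cues: cue words for the target sense.
    other_sense_cues: cue words of the remaining senses, to be excluded.
    """
    positive = " OR ".join(sense_cues)
    negative = " OR ".join(other_sense_cues)
    return f"( {word} AND ( {positive} ) AND NOT ( {negative} ) )"


query = build_sense_query(
    "boy",
    ["altar boy", "ball boy", "male person"],
    ["man", "broth of a boy", "son", "mama's boy", "black"],
)
print(query)
```

How multi-word cues are quoted and which boolean operators are available depends on the search engine used, so the exact syntax here is an assumption.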
Top Down Tuning – Cucchiarelli & Velardi
Automatically find the best set of (WordNet) senses that:
“… represent at best the semantics of the domain”
“[has the] … ‘right’ level of abstraction, so as to mediate between over ambiguity and generality”
“… [is] balanced …, i.e. words should be evenly distributed among categories”
Alessandro Cucchiarelli, Paola Velardi. Finding a domain-appropriate sense inventory for semantically tagging a corpus. Natural Language Engineering 4/4, p.325-344, Dec. 1998.
Methods Used
Create alternative sets of balanced categories by use of an adapted version of the Hearst/Schütze algorithm
Apply a scoring function to find the best set, with parameters:
  Generality: Highest possible level of generalization with a small number of categories is preferred
  Discrimination Power: Different senses lead to different categories
  (Domain) Coverage: Words in the domain corpus that are represented by the selected categories
  Average Ambiguity: Ambiguity reduction is measured by the inverse of the average ambiguity of all words
Balanced Categories Hearst/Schütze
Reduce WordNet noun hierarchy to a set of 726 disjoint categories, each consisting of a relatively large number of synsets and of an average size, with as small a variance as possible
Group categories together into a set of 106 super-categories according to mutual co-occurrence in a training corpus
Measure the frequency of categories on domain corpora
United States Constitution: legal_system, ... (12.200); government, ... (11.782); politics, ... (7.859)
Genesis: religion, ... (26.459); breads, ... (25.062); mythology, ... (24.356)
Hearst M. & Schütze H. Customizing a Lexicon to Better Suit a Computational Task. In: Proceedings ACL SIGLEX Workshop 1993
Generality
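Generality of Category Set $C_i$: $1 / DM(C_i)$
$DM(C_i)$ is the average distance between the categories of $C_i$ and the topmost synsets:
$$DM(C_i) = \frac{1}{n} \sum_{j=1}^{n} dm(c_{ij})$$
Example: $C_i = \{c_{i1}, c_{i2}\}$ with $dm(c_{i1}) = (4 + 3)/2 = 3.5$ and $dm(c_{i2}) = 3/1 = 3$, so $DM(C_i) = (3.5 + 3)/2 = 3.25$
[Figure: the categories $c_{i1}$ and $c_{i2}$ of $C_i$ shown between the topmost synsets and the general synsets]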
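The generality score can be reproduced with a few lines of arithmetic. This is a sketch under the reading that dm of a category is the mean length of its paths to the topmost synsets; the function names and the dictionary layout are assumptions, and the path lengths (4 and 3 for the first category, 3 for the second) are the ones in the slide's example.

```python
def mean_distance(path_lengths):
    """dm(c_ij): average path length from a category to the topmost synsets."""
    return sum(path_lengths) / len(path_lengths)


def average_distance(paths_by_category):
    """DM(C_i): average of dm(c_ij) over all categories in the set."""
    distances = [mean_distance(paths) for paths in paths_by_category.values()]
    return sum(distances) / len(distances)


def generality(paths_by_category):
    """Generality of a category set: 1 / DM(C_i)."""
    return 1.0 / average_distance(paths_by_category)


# Path lengths from the slide's example.
example = {"c_i1": [4, 3], "c_i2": [3]}
print(average_distance(example))  # 3.25, as on the slide
print(generality(example))
```

A category set closer to the top of the hierarchy has a smaller DM and therefore a higher generality score.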
Discrimination Power
Discrimination Power of Category Set $C_i$:
$$\frac{N_c(C_i) - N_{pc}(C_i)}{N_c(C_i)}$$
where $N_c(C_i)$ is the number of words that reach at least one category of $C_i$ and $N_{pc}(C_i)$ is the number of words that have at least two senses that reach the same category $c_{ij}$ of $C_i$
[Figure: domain words $w_1$, $w_2$, $w_3$ whose senses reach, via general synsets, the categories of $C_i = \{c_{i1}, c_{i2}, c_{i3}, c_{i4}\}$]
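A small sketch of the discrimination power score, assuming each word is given as a list with one entry per sense: the category that sense reaches, or `None` if it reaches none. The data layout, the function name and the toy words are hypothetical.

```python
def discrimination_power(categories_by_word):
    """(N_c - N_pc) / N_c for one category set.

    categories_by_word maps a word to a list with one entry per sense:
    the category reached by that sense, or None if it reaches no category.
    """
    reached = {
        word: [cat for cat in cats if cat is not None]
        for word, cats in categories_by_word.items()
    }
    reached = {word: cats for word, cats in reached.items() if cats}
    n_c = len(reached)  # words reaching at least one category
    # words with at least two senses that end up in the same category
    n_pc = sum(1 for cats in reached.values() if len(cats) != len(set(cats)))
    return (n_c - n_pc) / n_c


toy = {
    "w1": ["c1", "c2"],  # two senses, two different categories
    "w2": ["c3", "c3"],  # two senses collapse into one category
    "w3": ["c4", None],  # one sense reaches no category
}
print(discrimination_power(toy))
```

In the toy data only `w2` has two senses in the same category, so two of the three covered words are still discriminated.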
Coverage & Average Ambiguity
Coverage of Category Set $C_i$: $N_c(C_i) / W$
where $N_c(C_i)$ is the number of words that reach at least one category in $C_i$
Inverse of Average Ambiguity of Category Set $C_i$: $1 / A(C_i)$
$$A(C_i) = \frac{1}{N_c(C_i)} \sum_{j=1}^{N_c(C_i)} Cw_j(C_i)$$
where $N_c(C_i)$ is the number of words that reach at least one category in $C_i$, and for each word $w_j$ in this set, $Cw_j(C_i)$ is the number of categories in $C_i$ reached by $w_j$
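Both scores can be computed from a mapping of words to the categories they reach. This is a sketch with hypothetical names and toy data; the slide does not define W, so it is treated here as the total number of words considered, which is an assumption.

```python
def coverage(categories_by_word, total_words):
    """N_c(C_i) / W, with W assumed to be the total number of words considered."""
    n_c = sum(1 for cats in categories_by_word.values() if cats)
    return n_c / total_words


def inverse_average_ambiguity(categories_by_word):
    """1 / A(C_i), where A averages the number of categories reached per covered word."""
    covered = [cats for cats in categories_by_word.values() if cats]
    average = sum(len(cats) for cats in covered) / len(covered)
    return 1.0 / average


# Toy data: word -> set of categories of C_i reached by any of its senses.
toy = {
    "w1": {"c1", "c2"},
    "w2": {"c3"},
    "w3": {"c4"},
    "w4": set(),  # reaches no category, so it is not covered
}
print(coverage(toy, total_words=len(toy)))
print(inverse_average_ambiguity(toy))
```

With this toy input, three of four words are covered, and the covered words reach 4/3 categories on average, so both scores come out at 0.75.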
Best Category Set (WSJ)
Category | Higher-level synset
C1 | person, individual, someone, mortal, human, soul
C2 | instrumentality, instrumentation
C3 | written communication, written language
C4 | message, content, subject matter, substance
C5 | measure, quantity, amount, quantum
C6 | action
C7 | activity
C8 | group action
C9 | organization
C10 | psychological feature
C11 | possession
C12 | state
C13 | location
Top Down categories for the financial domain, based on the Wall Street Journal
Sense Selection with WSJ Set
Senses for stock kept by domain tuning on the Wall Street Journal
Sense | Synset hierarchy for sense | Top synset for sense
1 | capital > asset | possession (C11)
2 | support > device | instrumentality (C2)
4 | document > writing | written communication (C3)
5 | accumulation > asset | possession (C11)
6 | ancestor > relative | person (C1)
Senses for stock discarded by domain tuning on the Wall Street Journal
Sense | Synset hierarchy for sense
3 | stock, inventory > merchandise, wares > …
7 | broth, stock > soup > …
8 | stock, caudex > stalk, stem > …
9 | stock > plant part > …
10 | stock, gillyflower > flower > …
11 | malcolm stock, stock > flower > …
12 | lineage, line of descent > … > genealogy > …
14 | lumber, timber > …
Bottom Up Tuning – Buitelaar & Sacaleanu
Ranking of WordNet synsets according to a domain-specific corpus
Compute term relevance against reference corpus
Compute synset relevance according to term relevance (where term = synonym in synset)
Ranking can be used in WSD (similar to usage of ‘most frequent heuristic’)
Paul Buitelaar, Bogdan Sacaleanu. Ranking and Selecting Synsets by Domain Relevance. In: Proceedings of WordNet and Other Lexical Resources: Applications, Extensions and Customizations, NAACL 2001 Workshop, June 3/4 2001
TFIDF
$$tfidf(w) = tf(w) \cdot \log\left(\frac{N}{df(w)}\right)$$
The word is more important if it appears several times in a target document
The word is more important if it appears in fewer documents
tf(w): term frequency (number of word occurrences in a document)
df(w): document frequency (number of documents containing the word)
N: number of all documents
tfidf(w): relative importance of the word in the document
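A direct transcription of the formula into Python, as a minimal sketch: documents are assumed to be lists of tokens, the three-document corpus is invented for illustration, and returning 0 for a word that never occurs is a convention chosen here to avoid a division by zero.

```python
import math


def tfidf(word, document, corpus):
    """tf(w) * log(N / df(w)) for `word` in `document`, relative to `corpus`."""
    tf = document.count(word)                      # occurrences in this document
    df = sum(1 for doc in corpus if word in doc)   # documents containing the word
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)


# Hypothetical toy corpus of tokenized documents.
corpus = [
    ["the", "cell", "tissue", "cell"],
    ["the", "stock", "market"],
    ["the", "prison", "cell"],
]
for word in ("the", "cell", "tissue"):
    print(word, round(tfidf(word, corpus[0], corpus), 3))
```

A word such as "the" that occurs in every document gets a score of 0, because log(N / df) is log(1).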
Term and Synset Relevance
Term Relevance
Relevance Score of Synset Members
$$rlv(t|d) = \log(tf_{t,d}) \cdot \log\left(\frac{N}{df_t}\right)$$
where t represents the term, d the domain, N is the total number of domains
Synset Relevance
Cumulated Relevance Score for a Synset
$$rlv(c|d) = \sum_{t \in c} rlv(t|d)$$
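The two scores can be sketched as follows. The function names, the counts and the choice of three domains are hypothetical; the synsets are the two "Gewebe" (tissue) concepts from the medical experiment. Terms that do not occur in the domain are scored 0 here, which is an assumption made to keep the logarithm defined.

```python
import math


def term_relevance(tf_td, df_t, n_domains):
    """rlv(t|d) = log(tf_{t,d}) * log(N / df_t); 0 if the term is absent from d."""
    if tf_td == 0:
        return 0.0
    return math.log(tf_td) * math.log(n_domains / df_t)


def synset_relevance(synset, tf_in_domain, df, n_domains):
    """rlv(c|d): sum of the relevance scores of the synset's terms."""
    return sum(
        term_relevance(tf_in_domain.get(term, 0), df.get(term, 0), n_domains)
        for term in synset
    )


# Hypothetical counts for a medical domain corpus, with N = 3 domains.
TF_MEDICAL = {"Gewebe": 120, "Körpergewebe": 15, "Stoff": 4}
DF = {"Gewebe": 2, "Körpergewebe": 1, "Stoff": 3}  # domains containing the term

body_tissue = ["Gewebe", "Körpergewebe"]
cloth = ["Gewebe", "Kleiderstoff", "Stoff", "Textilstoff"]
print(synset_relevance(body_tissue, TF_MEDICAL, DF, 3))
print(synset_relevance(cloth, TF_MEDICAL, DF, 3))
```

With these counts the body-tissue synset ranks above the cloth synset, since "Körpergewebe" is frequent in the medical corpus and specific to it, while "Stoff" occurs in every domain and contributes nothing.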
Extended Synset Relevance
Lexical Coverage
Take Length of the Synset Into Account
[Gefängniszelle, Zelle] ("prison cell")
[Zelle] ("living cell")
$$rlv(c|d) = \frac{T}{|c|} \sum_{t \in c} rlv(t|d)$$
Hyponyms
Take Hyponyms Into Account
[Zelle, Gefängniszelle, Todeszelle]
[Zelle, Körperzelle, Pflanzenzelle]
$$rlv(c|d) = \frac{T}{|c|} \sum_{t \in c} rlv(t|d)$$
with the terms of the hyponyms of c included in c
all all all yes all yes yes all all yes all yes yes yes yes
Experiment – Medical Domain
Ranked Terms (with English translations) and Ranked Concepts
Eingriff (operation, intervention)
  1. [Eingriff:c, Operation:c, Abtreibung, Biopsie, ...]
  2. [Eingreifen:c, Eingriff:c, Intervention:c]
Infektion (infection)
  1. [Entzündung:c, Infektion:c, Infektionskrankheit:c, ...]
  2. [Ansteckung:c, Infektion:c, Übertragung:c]
Studie (study, report)
  1. [Experiment:c, Studie:c, Test:c, Versuch:c, ...]
  2. [Abhandlung:c, Studie:c]
Prophylaxe (prophylaxis)
  1. [Prophylaxe:c, Empfängnisverhütung, Impfung, Verhütung]
  2. [Prophylaxe:c, Vorbeugung:c, Vorsorge:c, ...]
Gewebe (tissue)
  1. [Gewebe:c, Körpergewebe:c, Bindegewebe, Tumor, ...]
  2. [Gewebe:c, Kleiderstoff:c, Stoff:c, Textilstoff:c, ...]
Medizin (medicine)
  1. [Medizin:c, Chirurgie, Frauenheilkunde, Gynäkologie, ...]
  2. [Arznei:c, Arzneimittel:c, Heilmittel:c, Medikament:c, ...]
Gefäß (vascular, container)
  1. [Gefäß:c, Blutgefäß, Haargefäß, Herzkranzgefäß, Lymphgefäß]
  2. [Gefäß:c, Container, Form, Pokal, Schale, Schüssel, Tonne, ...]
Zelle (cell)
  1. [Zelle:c, Körperzelle, Pflanzenzelle]
  2. [Gefängniszelle:c, Zelle:c, Todeszelle]
Einschränkung (constraint, restriction)
  1. [Beschränkung:c, Einschränkung:c, Vorbehalt:c]
  2. [Beschränkung:c, Degression:c, Drosselung:c, Einschränkung:c]
Aufnahme (intake, reception)
  1. [Aufnahme:c, Aufzeichnung:c, Mitschnitt:c, Protokoll, ...]
  2. [Aufnahme:c, Beherbergung:c, Unterbringung:c, Notaufnahme, ...]
Sektion (section)
  1. [Autopsie:c, Leichenöffnung:c, Obduktion:c, Sektion:c]
  2. [Amtsbereich:c, Dezernat:c, Geschäftsbereich:c, Sektion:c, ...]
Ausdehnung (spread, dimensions)
  1. [Ausdehnung:c, Rauminhalt:c, Volumen:c]
  2. [Ausdehnung:c, Ausweitung:c, Dehnung:c, Erweiterung:c, ...]
Geburt (birth, rebirth)
  1. [Geburt:c, Fehlgeburt, Frühgeburt]
  2. [Geburt:c, Wiedergeburt]
Abweichung (abnormality, divergence)
  1. [Abweichung:c, Differenz:c, Abnormität, Anomalie, ...]
  2. [Abweichung:c, Differenz:c, Meinungsverschiedenheit]
Probe (test, rehearsal)
  1. [Probe:c, Blutprobe, Gesteinsprobe, Urinprobe, Wasserprobe]
  2. [Bühnenprobe:c, Probe:c, Chorprobe, Generalprobe]
Related Recent Work
Diana McCarthy, Rob Koeling, Julie Weeds, and John Carroll. Finding predominant senses in untagged text. In Proc. of ACL 2004.
Chan, Yee Seng and Ng, Hwee Tou. Word Sense Disambiguation with Distribution Estimation. Proc. of IJCAI 2005.
Mohammad, Saif and Hirst, Graeme. Determining word sense dominance using a thesaurus. Proc. of EACL 2006.
Day 2 - Introduction
Words and Object Descriptions
Semantics on the Semantic Web: Semantic Web, Ontologies and Natural Language Processing
The Lexical Semantic Web: Knowledge Representation as Word Meaning
A Lexicon Model for Ontologies: Enriching Ontologies with Linguistic Information
Words and Object Descriptions
Semantics on the Semantic Web
The “Lexical Semantic Web”
A Lexicon Model for Ontologies
Web Consists of Non-Interpreted Data
[Figure: the Web as non-interpreted data: Text, Images, Tables, DBs]
Interpretation through Markup - Categories
Markup Web
Interpretation through Markup – User Tags
Markup
“Web 2.0”
Formal Interpretation - Knowledge Markup
Knowledge Markup Semantic Web Ontologies
Turns the Web into a Knowledge Base
Knowledge Markup Ontologies
Enables Semantic Web Services …
Semantic Web Services Knowledge Markup Ontologies
… and Intelligent Man-Machine Interface
Semantic Web Services Knowledge Markup Ontologies Intelligent Man-Machine Interface
Semantic Web Layer cake
Resource Description Framework (RDF)
[Figure: RDF graph in which node1 is linked to "DFKI GmbH", "Kaiserslautern" and, via www, to http://www.dfki.de]
RDF : XML-based Representation
<rdf:RDF xmlns:rdf="… rdf-syntax-ns#" xmlns:rdfs="… rdf-schema#" xmlns="http://example.org">
<rdf:Description rdf:nodeID="node1">
DFKI GmbH
Kaiserslautern
</rdf:Description>
</rdf:RDF>
RDF Schema (RDFS): Representation of classes and properties
[Figure: class diagram with Person, Student, Teacher, Course and rdf:Literal, connected by is-a links and properties]
RDFS : XML-based Representation
Web Ontology Language (OWL)
OWL adds further modelling vocabulary on top of RDFS, e.g.
Class equivalence
Property types (data vs. object property)
Based on Description Logics, three versions: OWL Lite, OWL DL, OWL Full
OWL: Extended knowledge representation
[Figure: class diagram with Person, Student, Teacher, Course and rdf:Literal, connected by is-a links, with an additional disjoint link]
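What the added "disjoint" axiom buys can be sketched in plain Python. This is a toy consistency check, not an OWL reasoner, and it assumes a reading of the diagram in which Student and Teacher are both subclasses of Person and are declared disjoint with each other; all names in the code are illustrative.

```python
from itertools import combinations

# Assumed reading of the diagram: is-a links and one disjointness axiom.
SUBCLASS_OF = {"Student": "Person", "Teacher": "Person"}
DISJOINT = {frozenset({"Student", "Teacher"})}


def superclasses(cls):
    """All classes reachable from `cls` by following is-a links."""
    result = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result


def all_types(asserted):
    """Asserted classes of an individual plus everything inherited via is-a."""
    types = set(asserted)
    for cls in asserted:
        types.update(superclasses(cls))
    return types


def is_consistent(asserted):
    """False if the individual belongs to two classes declared disjoint."""
    types = all_types(asserted)
    return not any(
        frozenset(pair) in DISJOINT for pair in combinations(sorted(types), 2)
    )


print(all_types({"Student"}))
print(is_consistent({"Student"}))
print(is_consistent({"Student", "Teacher"}))
```

With is-a links alone (the RDFS level), an individual typed as both Student and Teacher would simply be a Person twice over; the disjointness axiom is what lets a checker flag it as a contradiction.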
OWL : XML-based Representation
XML – RDF – RDFS - OWL
Syntax
XML XML Schema Data Types
Semantics
Namespaces
Interpretation Context
RDF Schema
Formalization:
Class Definition, Properties
RDF OWL
Formalization:
extended Class Definition, Properties, Property Types
Ontologies – What they are
Ontology refers to an engineering artifact
a specific vocabulary used to describe a certain reality
a set of explicit assumptions regarding the intended meaning of the vocabulary
An Ontology is
an explicit specification of a conceptualization [Gruber 93]
a shared understanding of a domain of interest [Uschold/Gruninger 96]
Ontologies – Why you need them
Make domain assumptions explicit
  Easier to exchange domain assumptions
  Easier to understand and update legacy data
Separate domain knowledge from operational knowledge
  Re-use domain and operational knowledge separately
A community reference for applications
Shared understanding of what particular information means
Applications of Ontologies
NLP
Information Extraction, e.g. Buitelaar et al. 06, Mädche, Staab & Neumann 00, Nedellec, Rebholz
Information Retrieval (Semantic Search), e.g. WebKB (Martin et al. 00), OntoSeek (Guarino et al. 99), Ontobroker (Decker et al. 99)
Question Answering, e.g. Harabagiu, Schlobach & de Rijke, Aqualog (Lopez and Motta 04)
Machine Translation, e.g. Nirenburg et al. 04, Beale et al. 95, Hovy, Knight
Other
Business Process Modeling, e.g. Uschold et al. 98
Digital Libraries, e.g. Amann & Fundulaki 99
Information Integration, e.g. Kashyap 99; Wiederhold 92
Knowledge Management (incl. Semantic Web), e.g. Fensel 01, Staab & Schnurr 00; Sure et al. 00, Abecker et al. 97
Software Agents, e.g. Gluschko et al. 99; Smith & Poulter 99
User Interfaces, e.g. Kesseler 96
Ontologies and Their Relatives
Catalogs Thesauri Formal isa General logical constraints Glossaries & Terminologies Semantic Networks Formal Instance Axioms: Disjoint/Inverse…
Thesauri – Examples : EuroVoc
EuroVoc covers terminology in all of the official EU languages for all fields (27) that concern the EU institutions, e.g. politics, trade, law, science, energy, agriculture
MT: 3606 natural and applied sciences
UF: gene pool, genetic resource, genetic stock, genotype, heredity
BT1: biology
BT2: life sciences
NT1: DNA
NT1: eugenics
RT: genetic engineering (6411)
Thesauri – Examples : MeSH
MeSH (Medical Subject Headings)
organized by terms (~ 250,000) that correspond to medical subjects
for each term, syntactic, morphological or semantic variants are given
MeSH Heading: Databases, Genetic
Entry Term: Genetic Databases
Entry Term: Genetic Sequence Databases
Entry Term: OMIM
Entry Term: Online Mendelian Inheritance in Man
Entry Term: Genetic Data Banks
Entry Term: Genetic Data Bases
Entry Term: Genetic Databanks
Entry Term: Genetic Information Databases
See Also: Genetic Screening
Semantic Networks - Examples : UMLS
Unified Medical Language System
integrates linguistic, terminological and semantic information Semantic Network consists of 134 semantic types and 54 relations between types Pharmacologic Substance affects Pharmacologic Substance causes Pharmacologic Substance complicates Pharmacologic Substance diagnoses Pharmacologic Substance prevents Pharmacologic Substance treats Pathologic Function Pathologic Function Pathologic Function Pathologic Function Pathologic Function Pathologic Function
Semantic Networks - Examples : GO
GO (Gene Ontology)
Aligns descriptions of gene products in different databases, including plant, animal and microbial genomes
Organizing principles are molecular function, biological process and cellular component
Accession: GO:0009292
Ontology: biological process
Synonyms: broad: genetic exchange
Definition: In the absence of a sexual life cycle, the processes involved in the introduction of genetic information to create a genetically different individual.
Term Lineage:
all : all (164142)
GO:0008150 : biological process (115947)
GO:0007275 : development (11892)
GO:0009292 : genetic transfer (69)
Ontologies – Example I
Ontologies – Example II
[Figure: ontology graph, annotated "F-Logic similar Ontology"]
Concepts: Geographical Entity (GE), Natural GE, Inhabited GE, mountain, river, country, city
Relations: is-a, instance_of, flow_through, located_in, capital_of
Instances: Zugspitze (height (m): 2962), Neckar (length (km): 367), Germany, Stuttgart, Berlin
Design: Philipp Cimiano
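A minimal sketch of how such an ontology could be held in plain data structures and queried. The is-a and instance_of assignments follow the concept names in the figure. The three relation facts are assumptions based on general geography, since the exact edges cannot be recovered from the slide text, and all Python names are invented for illustration.

```python
# Concept hierarchy (is-a), read off the concept names in the figure.
IS_A = {
    "Natural GE": "Geographical Entity",
    "Inhabited GE": "Geographical Entity",
    "mountain": "Natural GE",
    "river": "Natural GE",
    "country": "Inhabited GE",
    "city": "Inhabited GE",
}

# Instances and their most specific concept.
INSTANCE_OF = {
    "Zugspitze": "mountain",
    "Neckar": "river",
    "Germany": "country",
    "Stuttgart": "city",
    "Berlin": "city",
}

# Attribute values as given on the slide.
ATTRIBUTES = {
    ("Zugspitze", "height (m)"): 2962,
    ("Neckar", "length (km)"): 367,
}

# Assumed relation facts (the edges of the original figure are not recoverable).
FACTS = {
    ("Neckar", "flow_through", "Stuttgart"),
    ("Stuttgart", "located_in", "Germany"),
    ("Berlin", "capital_of", "Germany"),
}


def superclasses(concept):
    """Follow is-a links upward and return all ancestors, nearest first."""
    chain = []
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


def is_instance_of(instance, concept):
    """True if the instance belongs to the concept directly or via is-a."""
    direct = INSTANCE_OF.get(instance)
    return direct == concept or concept in superclasses(direct)


print(superclasses("river"))
print(is_instance_of("Neckar", "Natural GE"))
```

The point of the is-a chain is that class membership is inherited: Neckar is stored only as a river, yet it also counts as a Natural GE and a Geographical Entity.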
Ontologies for NLP
Information Retrieval
Query Expansion
Machine Translation
Interlingua
Information Extraction
Template Definition
Semantic Integration
Question Answering
Question Analysis
Answer Selection
Information Extraction
Class-based Template Definition
Allows for Reasoning over Extracted Templates with Respect to the Ontology (see e.g. [Nedellec and Nazarenko 2005] for discussion)
Semantic Integration
Extraction from Heterogeneous Sources (Text, Tables and other Semi-Structured Data, Image Captions) – SmartWeb [Buitelaar et al. 06]
Multi-Document Information Extraction – ArtEquAKT [Alani et al. 2003]
Question Answering
Question Analysis
Ontology/WordNet-based Semantic Question Interpretation (e.g. [Pasca and Harabagiu 01])
Answer Selection
Ontology/WordNet-based Reasoning for Answer Type-Checking: Ontology of Events [Sinha and Narayanan 05]; Geographical Ontology, WordNet [Schlobach & de Rijke 04]; WordNet [Pasca and Harabagiu 01]
Ontology-based Question Answering
Derive Answers from a Knowledge Base (e.g. Aqualog [Lopez & Motta 04])
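Answer type-checking can be pictured with a toy hypernym hierarchy: question analysis yields an expected answer type, and a candidate is kept only if its type falls under that expected type. The sketch below is an assumption-laden illustration, not the method of any of the cited systems. The hierarchy, the candidate typing and the question-word mapping are all invented for this example.

```python
# Hypothetical mini-hierarchy standing in for WordNet or an ontology.
HYPERNYM = {
    "city": "location",
    "country": "location",
    "river": "location",
    "location": "entity",
    "person": "entity",
}

# Hypothetical typing of candidate answers.
ANSWER_TYPE_OF = {"Berlin": "city", "Neckar": "river", "Goethe": "person"}


def is_a(concept, target):
    """True if target is the concept itself or one of its hypernyms."""
    while concept is not None:
        if concept == target:
            return True
        concept = HYPERNYM.get(concept)
    return False


def expected_type(question):
    """Crude question analysis: map the question word to an answer type."""
    words = question.lower().split()
    first = words[0] if words else ""
    return {"where": "location", "who": "person"}.get(first, "entity")


def select_answers(question, candidates):
    """Keep the candidates whose type is compatible with the question."""
    wanted = expected_type(question)
    return [c for c in candidates if is_a(ANSWER_TYPE_OF.get(c), wanted)]


print(select_answers("Where is the Reichstag?", ["Goethe", "Berlin"]))
```

Candidates with no known type are dropped here; a real system would more likely back off to a weaker check than discard them outright.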
Ontology Life Cycle
Populate: Knowledge Base Generation
Validate: Consistency Checks
Create/Select: Development and/or Selection
Deploy: Knowledge Retrieval
Maintain: Usability Tests
Evolve: Extension, Modification
NLP in the Ontology Life Cycle
Ontology Population
Information Extraction
KB Retrieval
Question Answering
Ontology Learning
Text Mining
Ontology Learning
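General Axioms: $\forall x\,(\mathit{country}(x) \rightarrow \exists y\; \mathit{capital\_of}(y, x) \wedge \forall z\,(\mathit{capital\_of}(z, x) \rightarrow y = z))$
Axiom Schemata: $\mathit{disjoint}(\mathit{river}, \mathit{mountain})$
Relation Hierarchy: $\mathit{capital\_of} \leq_R \mathit{located\_in}$
Relations: $\mathit{flow\_through}(\mathrm{dom}: \mathit{river}, \mathrm{range}: \mathit{GE})$
Concept Hierarchy: $\mathit{capital} \leq_C \mathit{city}$, $\mathit{city} \leq_C \mathit{Inhabited\ GE}$
Concept Formation: $c := \mathit{country} := \langle i(c), \|c\|, \mathit{Ref}_C(c) \rangle$
(Multilingual) Synonyms: {country, nation, Land}
Terms: river, country, nation, city, capital, ...
Design: Philipp Cimiano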
Words and Object Descriptions
Semantics on the Semantic Web
The “Lexical Semantic Web”
A Lexicon Model for Ontologies
Dictionary: Words and Senses
Represent interpretations of words through senses, very much like classes that are assigned to a word, e.g.
article
1. An individual thing or element of a class…
2. A particular section or item of a series in a written document…
3. A non-fictional literary composition that forms an independent part of a publication…
4. The part of speech used to indicate nouns and to specify their application
5. A particular part or subject; a specific matter or point
(as provided by http://dictionary.reference.com/ )
Ontology: Classes and Labels - I
Ontologies assign labels (i.e. words) to a given class
In the COMMA ontology on document management, the class article corresponds to sense 2 (‘section of a written document’): http://pauillac.inria.fr/cdrom/ftp/ocomma/comma.rdfs
Ontology: Classes and Labels - II
In the GOLD ontology on linguistics, the class label article corresponds to sense 4 (‘part of speech’): http://emeld.org/gold
The Meaning of Director - I
The Semantic Web can be viewed as a large, distributed dictionary (or rather a semantic lexicon) in which we can look up the meaning of words, e.g. director
… as a ‘role’ (AgentCities ontology)
http://www-agentcities.doc.ic.ac.uk/ontology/shows.daml
The Meaning of Director - II
… as ‘head of a program’ (University Benchmark ontology)
http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl
Exploring the Lexical Semantic Web
Collect ontologies
OntoSelect
Analyse the use of class/property labels
Treat class/property labels as lexical entries
Normalize
Organize by language
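The steps above (treat labels as lexical entries, normalize, organize by language) might look roughly like the following sketch. The normalization rules, the sample label spellings and the "en" language tags are assumptions made for illustration; this is not OntoSelect code.

```python
import re
from collections import defaultdict


def normalize(label):
    """Split camelCase, replace underscores and hyphens, lowercase the result."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", label)
    spaced = re.sub(r"[_\-]+", " ", spaced)
    return " ".join(spaced.lower().split())


def build_lexicon(labels):
    """Group (ontology, label, language) tuples as language -> entry -> ontologies."""
    lexicon = defaultdict(lambda: defaultdict(set))
    for ontology, label, language in labels:
        lexicon[language][normalize(label)].add(ontology)
    return lexicon


# Hypothetical sample: label spellings and language tags are assumed.
SAMPLE = [
    ("AgentCities", "Director", "en"),
    ("univ-bench", "Director", "en"),
    ("COMMA", "Article", "en"),
    ("GOLD", "article", "en"),
    ("example", "capital_of", "en"),
]

lexicon = build_lexicon(SAMPLE)
# Entries used by more than one ontology are candidates for ambiguity.
shared = sorted(e for e, onts in lexicon["en"].items() if len(onts) > 1)
print(shared)
```

An entry shared by several ontologies is only a candidate for ambiguity. Whether the classes behind it really differ in meaning, as with director and article on the preceding slides, still has to be checked against the ontologies themselves.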
Ontology Collection
OntoSelect
Web Monitor on DAML, RDFS, OWL Files
Download, Analyze and Store Included Information and Metadata: Class and Property Labels; Multilingual Information; Included Ontologies
Ontology Ranking and Selection Functionalities
http://olp.dfki.de/OntoSelect
OntoSelect
Multilinguality on the Semantic Web
Multilingual Labels
“Lexical Semantic Ambiguity”
Words and Object Descriptions
Semantics on the Semantic Web The “Lexical Semantic Web”
A Lexicon Model for Ontologies
Ontologies – Example III
Ontologies – Example III (continued)
[Figure: ontology graph]
Classes: Student, Staff, University, Campus, “Fakultät”
Relations: studies_at, works_at, located_at, is_part_of
Ontologies – Example III (continued)
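[Figure: ontology graph, extended with language-specific term relations]
Classes: Student, Staff, University, Campus, “Fakultät”
Relations: studies_at, works_at, located_at, is_part_of
Terms for “Fakultät”: has_German_term Fakultät; has_Dutch_term Faculteit; has_US_English_term School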
Ontologies – Example III (continued)
[Figure: ontology graph, with terms modelled as instances of a class Term]
Classes: University, “Fakultät” (is_part_of), Term
Relations: is_part_of, has_term, instance_of, language
Term instances: Fakultät (language: DE), faculteit (language: NL), school (language: EN-US)
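The last version of Example III replaces one relation per language (has_German_term, has_Dutch_term, ...) with a single has_term relation to Term instances that carry their own language. A small sketch of that modelling choice follows. The Python class and method names are invented for illustration and do not come from the tutorial.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Term:
    """A term instance: the word form plus the language it belongs to."""
    text: str
    language: str


@dataclass
class OntologyClass:
    """An ontology class that can be linked to any number of terms."""
    name: str
    terms: list = field(default_factory=list)

    def has_term(self, text, language):
        """Attach a term in the given language to this class."""
        self.terms.append(Term(text, language))

    def terms_for(self, language):
        """Return the terms of this class in one language."""
        return [t.text for t in self.terms if t.language == language]


fakultaet = OntologyClass("Fakultät")
fakultaet.has_term("Fakultät", "DE")
fakultaet.has_term("faculteit", "NL")
fakultaet.has_term("school", "EN-US")

print(fakultaet.terms_for("NL"))
```

With this design, adding a language needs no new relation in the ontology, only a new Term instance with a different language value.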
Semiotic Triangle
Ogden & Richards, 1923
based on Structural Linguistics studies (de Saussure, 1916)
adopted in Knowledge Representation (e.g. Sowa, 1984)
LingInfo Model – Simplified
[Figure: RDFS diagram in three layers (meta-classes, classes, instances), with rdf:type and rdfs:subClassOf links]
Meta-classes: rdfs:Class; feat:ClassWithFeats (rdfs:subClassOf rdfs:Class), with properties feat:lingFeat and feat:imgFeat
Classes: o:FootballPlayer, o:Defender, o:Midfielder (each a feat:ClassWithFeats, related by rdfs:subClassOf); lf:LingFeat and if:ImgFeat (each an rdfs:Class)
Instances: lf:LingFeat (lf:lang “de”, lf:term “Abwehrspieler”); lf:LingFeat (lf:lang “de”, lf:term “Mittelfeldspieler”); if:ImgFeat (if:color “#111111”, lf:texture “&keypatchSet_223”)
Design: Michael Sintek
LingInfo Model
LingInfo Instances - Example
Fußballspielers (“of the football player”)
[Figure: instance graph for the morpho-syntactic decomposition of the term; recoverable content listed below]
inst0 : LingInfo (lang: de; term: Fußballspielers; morphSynDecomp)
inst1 : InflectedWordForm (case: genitive; gender: male; number: singular; orthographicForm: Fußballspielers; partOfSpeech: Noun; wordForm)
inst2 : Stem (case: nominative; gender: male; number: singular; orthographicForm: Fußballspieler; partOfSpeech: Noun; isComposedOf)
inst3 : Stem (analysisIndex: 1; orthographicForm: Fußball; function: modifier; isComposedOf; root; semantics)
inst8 : Stem (analysisIndex: 2; orthographicForm: Spieler; root)
inst1 : Root (orthographicForm: Spieler)
inst5 : Stem (Fuß); inst6 : Root (Fuß); inst7 : Stem (Ball); inst4 : Root (Ball)
Referenced ontology class: o:BallObject
LingInfo Predicate-Arg Structure
Design: Anette Frank
Conclusions
WordNet: Appropriate Use may include
Introduction of underspecified senses (sense grouping)
Tuning to a domain
The “Lexical Semantic Web”
The Semantic Web (and Web 2.0) is a potentially rich resource for (formal) lexical semantics
Mining such resources for lexical semantics (i.e. compilation of a distributed semantic lexicon) has only just started
Ontologies are to be extended with linguistic/lexical information