AP 5.1 – Question Answering


Lexical Semantics and Ontologies

Tutorial at the ACL/HCSnet 2006 Advanced Program in Natural Language Processing

Paul Buitelaar

Language Technology Lab & Competence Center Semantic Web
DFKI GmbH, Saarbrücken, Germany

© Paul Buitelaar: Lexical Semantics and Ontologies Tutorial at ACL/HCSnet, July 2006, Melbourne, Australia

Overview

Day 1: Words and Meanings

• Human language as a system
• How do words relate to each other

Day 2: Words and Object Descriptions

• Human language as a means of representation
• How do words represent objects in the/a world

Day 1 - Introduction

Words and Meanings

• Synsets and Senses
• Lexical Semantics in WordNet
• Related Senses
• Generative Lexicon and CoreLex
• Domains and Senses
• Tuning WordNet to a Domain

Words and Meanings

Lexical Semantics in WordNet

Generative Lexicon and CoreLex

Tuning WordNet to a Domain

WordNet

Lexical Semantic Resource

Semantic Lexicon

 Maps words to meanings (senses) 

Lexical Database

 Machine readable (has a formal structure) 

Freely available

http://wordnet.princeton.edu/


WordNet - Origins

In 1985 a group of psychologists and linguists at Princeton University undertook to develop a lexical database … The initial idea was to provide an aid to use in searching dictionaries conceptually, rather than merely alphabetically … WordNet … instantiates hypotheses based on results of psycholinguistic research … expose such hypotheses to the full range of the common vocabulary

In anomic aphasia, there is a specific inability to name objects. When confronted with an apple, say, patients may be unable to utter “apple,” even though they will reject such suggestions as shoe or banana, and will recognize that apple is correct when it is provided. (Caramazza/Berndt 1978)

Miller, George A., Richard Beckwith, Christiane Fellbaum, Derek Gross and Katherine J. Miller. “Introduction to WordNet: an on-line lexical database.” In: International Journal of Lexicography 3 (4), 1990, pp. 235-244.

Synsets

WordNet is organized around word meaning (not word forms as with traditional lexicons)

• Word meaning is represented by “synsets”
• A synset is a “Set of Synonyms”

Example

{board, plank}: piece of lumber
{board, committee}: group of people

Synset Hierarchy

• Synsets are organized in hierarchies
• Defines:
  • generalization (hypernymy)
  • specialization (hyponymy)

Example (hypernymy runs upward, hyponymy downward)

{entity}
  …
    {whole, unit}
      {building material}
        {lumber, timber}
          {board, plank}
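The hierarchy example can be sketched as a chain of child-to-parent links. This is only an illustration: the dictionary below is hand-written from the slide, the function names are hypothetical, and the link from {whole, unit} to {entity} skips the levels the slide elides with "…".

```python
# Child -> parent (hypernym) links for the example chain on the slide.
# Assumption: the levels elided by "…" are skipped, so {whole, unit}
# points directly at {entity}.
HYPERNYM = {
    "board, plank": "lumber, timber",
    "lumber, timber": "building material",
    "building material": "whole, unit",
    "whole, unit": "entity",
}


def hypernym_chain(synset, links=HYPERNYM):
    """Follow hypernym links upward until a synset without a parent is reached."""
    chain = [synset]
    while chain[-1] in links:
        chain.append(links[chain[-1]])
    return chain


def is_hyponym_of(specific, general, links=HYPERNYM):
    """True if `general` appears above `specific` in the hypernym chain."""
    return general in hypernym_chain(specific, links)[1:]


chain = hypernym_chain("board, plank")
```

Walking upward generalizes (hypernymy); reading the same chain from the end back to the start specializes (hyponymy).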

Hierarchies (WordNet 1.7)


Hierarchy Example (WordNet 2.1)


Synsets and Senses

• Synsets represent word meaning
• Words that occur in several synsets have a corresponding number of meanings (senses)

Example


WordNet 2.1

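The relation between synsets and senses can be made concrete with a toy inventory. The sketch below is not WordNet's API: the two synsets are the {board, plank} and {board, committee} examples from the Synsets slide, and `senses` and `synonyms` are hypothetical helper names.

```python
# Toy synset inventory built from the two "board" synsets on the slides.
SYNSETS = [
    {"words": {"board", "plank"}, "gloss": "piece of lumber"},
    {"words": {"board", "committee"}, "gloss": "group of people"},
]


def senses(word, synsets=SYNSETS):
    """A word has one sense per synset it occurs in; return their glosses."""
    return [s["gloss"] for s in synsets if word in s["words"]]


def synonyms(word, synsets=SYNSETS):
    """All words that share at least one synset with `word`."""
    found = set()
    for s in synsets:
        if word in s["words"]:
            found |= s["words"] - {word}
    return found


board_senses = senses("board")
```

In this inventory `board` occurs in two synsets and therefore has two senses, while `plank` has one.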

(Other) WordNet Relations

• Synonymy: similar in meaning
• Hypernymy/Hyponymy: generalization and specialization
• Meronymy: part-of, e.g. study, bathroom, ... are meronyms of house
• Antonymy: opposite in meaning, e.g. warm is an antonym of cold

Words and Meanings

Lexical Semantics in WordNet

Generative Lexicon and CoreLex

Tuning WordNet to a Domain


Systematic Polysemy

Homonymy

bank
• embankment: We walked along the bank of the Charles river.
• institution: Did he have an account at the HBU bank?

Systematic Polysemy

school
• group (of people): The school went for an outing.
• (learning) process: School starts at 8.30
• organization: The school was founded in 1910.
• building: The school has a new roof.

Semantic or Pragmatic?

[Figure: two diagrams, labelled Semantic Analysis and Pragmatic Analysis, each relating the lexical item "school" (Lexical Items of the Language) to Obj1, Obj2, Obj3 and Obj4 (Objects in the World)]

Underspecified Discourse Referents

• Anaphora Resolution
  [A long book heavily weighted with military technicalities] (NP: event, physical_object, content), in this edition it is neither so long (event) nor so technical (content) as it was originally.

• Metonymy
  The Boston office called
  office > person
  person part-of office

• Bridging
  Peter bought a car. The engine runs well.
  engine part-of car

  The Boston office called. They asked for a new price.
  office > person

Generative Lexicon Theory Type Coercion

I began the book

book > event
event ‘has-relation-with’ book
read is-a event

• multifaceted representation of lexical semantics
• reflecting systematic / regular / logical polysemy

Generative Lexicon Theory Qualia Structure (Pustejovsky 1995)

• Formal: inheritance (is-a / hyponymy). book, formal: artifact, communication, …
• Constitutive: modification (part-of / meronymy). book, constitutive: section, …
• Telic: purpose („what is the object used for“). book, telic: read, …
• Agentive: causality („how did the object come about“). book, agentive: write, …
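As a rough illustration of how a qualia structure could drive type coercion, the sketch below stores the four roles for "book" and lists the events a verb such as "begin" might coerce the noun into. The class and function names are hypothetical, and drawing candidate events from both the telic and the agentive role is an assumption made for the example; the slides only show "read".

```python
from dataclasses import dataclass, field


@dataclass
class Qualia:
    """Four qualia roles; the values for 'book' below are taken from the slide."""
    formal: list = field(default_factory=list)        # is-a / hyponymy
    constitutive: list = field(default_factory=list)  # part-of / meronymy
    telic: list = field(default_factory=list)         # what the object is used for
    agentive: list = field(default_factory=list)      # how the object came about


BOOK = Qualia(
    formal=["artifact", "communication"],
    constitutive=["section"],
    telic=["read"],
    agentive=["write"],
)


def coerce_to_event(verb, qualia):
    """Toy coercion: an event-selecting verb picks up the events stored in the
    telic and agentive roles of its object (an assumption for illustration)."""
    return [f"{verb} to {event}" for event in qualia.telic + qualia.agentive]


readings = coerce_to_event("begin", BOOK)
```

For "I began the book" this yields the readings "begin to read" and "begin to write".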

CoreLex (Buitelaar 1998)

• Automatic Qualia Structure Acquisition
  • CoreLex is an attempt to automatically acquire underspecified lexical semantic representations that reflect systematic polysemy
  • These representations can be viewed as shallow Qualia Structures
• Sense Distribution in WordNet
  • Systematic polysemy can be empirically studied in WordNet by observing sense distributions

>> If more than two words share the same sense distribution (i.e. have the same set of senses), then this may indicate a pattern of systematic polysemy (adapted from Apresjan 1973)

Systematic Polysemous Classes

book

1. {publication} => artifact
2. {product, production} => artifact
3. {fact} => communication
4. {dramatic_composition, dramatic_work} => communication
5. {record} => communication
6. {section, subdivision} => communication
7. {journal} => artifact

Systematic Polysemous Class “artifact communication”

amulet annals armband arrow article ballad bauble beacon bible birdcall blank blinker boilerplate book bunk cachet canto catalog catalogue chart chevron clout compact compendium convertible copperplate copy cordon corker ... guillotine homophony horoscope indicator journal laurels lay ledger loophole marker memorial nonsense novel obbligato obelisk obligato overture pamphlet pastoral paternoster pedal pennant phrase platform portrait prescription print puzzle radiogram rasp recap riddle rondeau … statement stave stripe talisman taw text tocsin token transcription trophy trumpery wand well whistle wire wrapper yardstick

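The grouping idea behind these classes can be sketched in a few lines: reduce each noun to the set of basic types of its senses, then keep every type set shared by more than two nouns. This is a simplified illustration, not the CoreLex implementation. Only the basic types for "book" come from the slide; the sense lists for the other nouns are made up, and `polysemous_classes` is a hypothetical name.

```python
from collections import defaultdict

# Noun -> basic types of its senses. "book" follows the slide; the other
# entries are hypothetical toy data.
BASIC_TYPES = {
    "book": ["artifact", "artifact", "communication", "communication",
             "communication", "communication", "artifact"],
    "journal": ["artifact", "communication"],
    "novel": ["communication", "artifact"],
    "leopard": ["animal", "natural_object"],
    "ermine": ["natural_object", "animal"],
    "penny": ["possession", "quantity_definite"],
}


def polysemous_classes(basic_types, min_words=3):
    """Group nouns by their set of basic types; keep groups shared by
    more than two nouns (min_words=3)."""
    groups = defaultdict(list)
    for noun, types in basic_types.items():
        groups[" ".join(sorted(set(types)))].append(noun)
    return {cls: sorted(nouns) for cls, nouns in groups.items()
            if len(nouns) >= min_words}


classes = polysemous_classes(BASIC_TYPES)
```

With this toy input only "artifact communication" survives the threshold; lowering `min_words` to 2 would also admit "animal natural_object".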

From WordNet to CoreLex

[Figure: nouns (Noun 1 … Noun n) mapped to basic types and to systematic polysemous classes (Systematic Polysemous Class 1 … Systematic Polysemous Class n)]

Other Examples

“animal natural_object”

alligator broadtail chamois ermine lapin leopard muskrat ...

“natural_object plant”

algarroba almond anise baneberry butternut candlenut cardamon ...

“action artifact group_social”

artillery assembly band church concourse dance gathering institution ...

“action attribute event psychological”

appearance concentration decision deviation difference impulse outrage …

“possession quantity_definite”

cent centime dividend gross penny real shilling


CoreLex vs. WordNet


Representation and Interpretation

„Dotted Types“ (Pustejovsky)

• Lexical types are either simple (human, artifact, ...) or complex (information AND physical_object)
• Can be represented with a „dotted type“, e.g. information • physical_object
• In (Cooper 2005) interpreted as a record type (a delicious lunch can take forever):

Related Work

• Apresjan 1973. Regular Polysemy.
• Nunberg & Zaenen 1992. Systematic polysemy in lexicology and lexicography.
• Bill Dolan 1994. Word Sense Ambiguation: Clustering Related Senses.
• Copestake & Briscoe 1996. Semi-productive polysemy and sense extension.
• Peters, Peters & Vossen 1998. Automatic Sense Clustering in EuroWordNet.
• Tomuro 1998. Semi-Automatic Induction of Systematic Polysemy from WordNet.

Words and Meanings

Lexical Semantics in WordNet

Generative Lexicon and CoreLex

Tuning WordNet to a Domain

Reducing Ambiguity

WordNet has too many senses …

Reduce Ambiguity

• Cluster related senses (CoreLex)
• Tune WordNet to an application domain

Domains and Senses

Domains determine Sense Selection, e.g.

• English: cell
  • prison cell in the Politics/Law domain
  • living cell in the Biomedical domain
• English: tissue
  • living tissue in the Biomedical domain
  • cloth in the Fashion domain
• German: Probe
  • test in the Biomedical domain
  • rehearsal in the Theater domain

>> Compute Domain-Specific Sense

Approaches

Subject Codes

• Domain codes are in the dictionary

Topic Signatures

• Compute (domain-specific) context models from dictionary definitions, domain corpora, web resources

Tuning of WordNet to a domain

• Top Down: Cucchiarelli & Velardi, 1998
• Bottom Up: Buitelaar & Sacaleanu, 2001
• Related recent work: McCarthy et al, 2004; Chan & Ng, 2005; Mohammad & Hirst, 2006

Subject Codes

• Subject Codes (as used in LDOCE) indicate a domain in which a word is used in a particular sense
• Examples (2600 codes)
  • SN (sounds)
  • DG (drugs)
  • ML (meteorology)
• Sub-Field Codes
  • MDZP (Medicine:Physiology)
• Code Combinations
  • MLCO (Meteorology+Building), e.g. lightning conductor
  • MLUF (Meteorology+Europe+France), e.g. Mistral high

Adding Subject Codes to WordNet

• Grouping Synsets together across POS
  MEDICINE
    Nouns: doctor#1, hospital#1
    Verbs: operate#7
• Grouping Synsets together across Sub-Hierarchies
  SPORT
    life_form#1: athlete#1
    physical_object#1: game_equipment#1
    act#2: sport#1
    location#1: playing_field#1

Magnini B. & Cavaglià G. Integrating Subject Field Codes into WordNet. In: Proceedings LREC 2000

WordNet DOMAINS

Sense | WordNet synset and gloss | Domains
1 | Depository, financial institution, bank, banking concern, banking company (a financial institution) | Economy
2 | Bank (sloping land) | Geography, Geology
3 | Bank (a supply or stock held in reserve) | Economy
4 | Bank, bank building (a building) | Architecture, Economy
5 | Bank (an arrangement of similar objects) | Factotum
6 | Savings bank, coin bank, money box, bank (a container) | Economy
7 | Bank (a long ridge or pile) | Geography, Geology
8 | Bank (the funds held by a gambling house) | Economy, Play
9 | Bank, cant, camber (a slope in the turn of a road) | Architecture
10 | Bank (a flight maneuver) | Transport

Bernardo Magnini, Carlo Strapparava, Giovanni Pezzuli, and Alfio Gliozzo. Using domain information for word sense disambiguation. In: Proceedings of the SENSEVAL2 workshop 2001.
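One way to use such domain labels is to filter a word's senses by the domain of the text at hand. The sketch below does this for "bank"; it assumes the domain labels pair with the ten senses in slide order, and the data structure and function name are hypothetical rather than part of WordNet DOMAINS.

```python
# Sense number -> (gloss, domain labels) for "bank".
# Assumption: the domain column lines up with the senses in slide order.
BANK_SENSES = {
    1: ("a financial institution", {"Economy"}),
    2: ("sloping land", {"Geography", "Geology"}),
    3: ("a supply or stock held in reserve", {"Economy"}),
    4: ("a building", {"Architecture", "Economy"}),
    5: ("an arrangement of similar objects", {"Factotum"}),
    6: ("a container", {"Economy"}),
    7: ("a long ridge or pile", {"Geography", "Geology"}),
    8: ("the funds held by a gambling house", {"Economy", "Play"}),
    9: ("a slope in the turn of a road", {"Architecture"}),
    10: ("a flight maneuver", {"Transport"}),
}


def senses_in_domain(domain, senses=BANK_SENSES):
    """Return the sense numbers whose domain labels include `domain`."""
    return [n for n, (_, domains) in sorted(senses.items()) if domain in domains]


economy_senses = senses_in_domain("Economy")
```

Restricting to a single domain reduces the ten senses to a handful, which is the ambiguity reduction the domain labels are meant to provide.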

WSD with Subject Codes

• Match between the set of words in the context of the ambiguous word and the sets of words (“neighborhoods”) in the definitions + sample sentences of all senses that share a Subject Code

bank: Economics
write account take keep paper safe person money pay draw sum put order supply cheque

bank: Medicine and Biology
medicine origin treatment use organ product hold place human blood store hospital comb

Guthrie J. A. & Guthrie I. & Wilks Y. & Aidinejad H. Subject Dependent Co-Occurrence and Word Sense Disambiguation. In: Proceedings of ACL 1991.
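The matching step can be sketched as a plain set overlap between the context and each neighborhood. This is a simplification for illustration, not the method of Guthrie et al.: it assumes the slide's word list splits into an Economics and a Medicine and Biology neighborhood at "cheque", it leaves out the final item "comb", it does no stemming, and the function name is hypothetical.

```python
# Neighborhoods for "bank", split by subject code.
# Assumption: the Economics list ends at "cheque"; "comb" is omitted.
NEIGHBORHOODS = {
    "Economics": {
        "write", "account", "take", "keep", "paper", "safe", "person",
        "money", "pay", "draw", "sum", "put", "order", "supply", "cheque",
    },
    "Medicine and Biology": {
        "medicine", "origin", "treatment", "use", "organ", "product",
        "hold", "place", "human", "blood", "store", "hospital",
    },
}


def disambiguate(context_words, neighborhoods=NEIGHBORHOODS):
    """Pick the subject code whose neighborhood shares most words with the context."""
    context = {w.lower() for w in context_words}
    overlaps = {code: len(context & words) for code, words in neighborhoods.items()}
    best = max(overlaps, key=overlaps.get)
    return best, overlaps


best, overlaps = disambiguate("he will pay the money into his account at the bank".split())
```

Ties are resolved in favour of the first subject code listed, which a real system would want to handle more carefully.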

Topic Signatures from the Web

• Construct Topic Signatures for WordNet synsets/senses
• Retrieve document collections from the web and use queries constructed for each WordNet sense, e.g.

( boy AND ( altar boy OR ball boy OR … OR male person ) AND NOT (man OR … OR broth of a boy OR son OR … OR mama’s boy OR black ) )

Agirre E. & Ansa O. & Hovy E. & Martinez D. Enriching very large ontologies using the WWW. In: Proc. of the Ontology Learning Workshop ECAI 2000
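A query of this shape can be assembled mechanically from the cue words of the target sense and the cue words of the competing senses. The helper below is a hypothetical sketch of that string construction only; how Agirre et al. select the cue words is not shown on the slide, and the shortened cue lists are illustrative.

```python
def sense_query(word, sense_cues, other_sense_cues):
    """Build a boolean query: the word, at least one cue of the target sense,
    and none of the cues of the other senses."""
    positive = " OR ".join(sense_cues)
    negative = " OR ".join(other_sense_cues)
    return f"( {word} AND ( {positive} ) AND NOT ( {negative} ) )"


query = sense_query(
    "boy",
    ["altar boy", "ball boy", "male person"],  # cues for the target sense
    ["man", "son", "mama's boy"],              # cues for competing senses
)
```

Depending on the search engine, multi-word cues would likely need to be quoted as phrases.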

Top Down Tuning – Cucchiarelli & Velardi

Automatically find the best set of (WordNet) senses that:

• “… represent at best the semantics of the domain”
• “[has the] … ‘right’ level of abstraction, so as to mediate between over ambiguity and generality”
• “… [is] balanced …, i.e. words should be evenly distributed among categories”

Alessandro Cucchiarelli, Paola Velardi. Finding a domain-appropriate sense inventory for semantically tagging a corpus. Natural Language Engineering 4/4, p.325-344, Dec. 1998.

Methods Used

• Create alternative sets of balanced categories by use of an adapted version of the Hearst/Schütze algorithm
• Apply a scoring function to find the best set, with parameters:
  • Generality: the highest possible level of generalization with a small number of categories is preferred
  • Discrimination Power: different senses lead to different categories
  • (Domain) Coverage: words in the domain corpus that are represented by the selected categories
  • Average Ambiguity: ambiguity reduction is measured by the inverse of the average ambiguity of all words

Balanced Categories Hearst/Schütze

• Reduce WordNet noun hierarchy to a set of 726 disjoint categories, each consisting of a relatively large number of synsets and of an average size, with as small a variance as possible
• Group categories together into a set of 106 super-categories according to mutual co-occurrence in a training corpus
• Measure the frequency of categories on domain corpora

United States Constitution
12.200  legal_system, ...
11.782  government, ...
7.859   politics, ...

Genesis
26.459  religion, ...
25.062  breads, ...
24.356  mythology, ...

Hearst M. & Schütze H. Customizing a Lexicon to Better Suit a Computational Task. In: Proceedings ACL SIGLEX Workshop 1993

Generality

Generality of Category Set $C_i$: $1/DM(C_i)$

$DM(C_i)$ is the average distance between the categories of $C_i$ and the topmost synsets:

$$DM(C_i) = \frac{1}{n} \sum_{j=1}^{n} dm(c_{ij})$$

Example: $C_i = \{c_{i1}, c_{i2}\}$ with $dm(c_{i1}) = (4 + 3)/2 = 3.5$ and $dm(c_{i2}) = 3/1 = 3$, so $DM(C_i) = (3.5 + 3)/2 = 3.25$

[Figure: hierarchy marking the topmost synsets and the general synsets $c_{i1}$ and $c_{i2}$]
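The worked example on this slide can be checked with a few lines of code. The sketch assumes that each category comes with the list of its path lengths to the topmost synsets (4 and 3 for the first category, 3 for the second, as read from the figure residue); the function names are hypothetical.

```python
def dm(path_lengths):
    """Average distance from one category to the topmost synsets it reaches."""
    return sum(path_lengths) / len(path_lengths)


def generality(category_paths):
    """Generality = 1 / DM, where DM is the mean of dm over all categories."""
    big_dm = sum(dm(paths) for paths in category_paths.values()) / len(category_paths)
    return 1 / big_dm


# Path lengths assumed from the slide's example: (4 + 3) / 2 and 3 / 1.
paths = {"c_i1": [4, 3], "c_i2": [3]}
score = generality(paths)
```

Here DM is 3.25, so the generality score is 1/3.25; category sets closer to the top of the hierarchy get a smaller DM and therefore a higher score.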

Discrimination Power

Discrimination Power of Category Set $C_i$:

$$\frac{N_c(C_i) - N_{pc}(C_i)}{N_c(C_i)}$$

where $N_c(C_i)$ is the number of words that reach at least one category of $C_i$ and $N_{pc}(C_i)$ is the number of words that have at least two senses that reach the same category $c_{ij}$ of $C_i$

[Figure: domain words w1, w2 and w3 linked through their senses to the general synsets of $C_i = \{c_{i1}, c_{i2}, c_{i3}, c_{i4}\}$]
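A small sketch of the discrimination power score follows. The input format is an assumption made for the example: each word maps to a list with one entry per sense, holding the category of the set that the sense reaches, or `None` if it reaches none. The toy words do not reproduce the slide's figure.

```python
def discrimination_power(word_senses):
    """(N_c - N_pc) / N_c for a mapping word -> [category reached per sense]."""
    # Keep only senses that reach a category, then only words with such senses.
    reaching = {w: [c for c in cats if c is not None] for w, cats in word_senses.items()}
    reaching = {w: cats for w, cats in reaching.items() if cats}
    n_c = len(reaching)
    # A word counts towards N_pc if two of its senses land in the same category.
    n_pc = sum(1 for cats in reaching.values() if len(cats) != len(set(cats)))
    return (n_c - n_pc) / n_c


toy = {
    "w1": ["c1", "c2"],   # two senses, two different categories
    "w2": ["c3", "c3"],   # two senses collapse into one category
    "w3": ["c4", None],   # one sense reaches no category
}
power = discrimination_power(toy)
```

The function divides by N_c, so it assumes at least one word reaches the category set.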

Coverage & Average Ambiguity

Coverage of Category Set $C_i$: $N_c(C_i)/W$

where $N_c(C_i)$ is the number of words that reach at least one category in $C_i$

Inverse of Average Ambiguity of Category Set $C_i$: $1/A(C_i)$

$$A(C_i) = \frac{1}{N_c(C_i)} \sum_{j=1}^{N_c(C_i)} Cw_j(C_i)$$

where $N_c(C_i)$ is the number of words that reach at least one category in $C_i$, and for each word $w_j$ in this set, $Cw_j(C_i)$ is the number of categories in $C_i$ reached
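Both scores can be computed from a mapping of words to the categories they reach. In the sketch below, W is taken to be the total number of words considered, which is an assumption since the slide does not define it; the toy words and categories are hypothetical.

```python
def coverage(word_categories, total_words):
    """N_c / W: share of words that reach at least one category.
    Assumption: W is the total number of words considered."""
    n_c = sum(1 for cats in word_categories.values() if cats)
    return n_c / total_words


def inverse_average_ambiguity(word_categories):
    """1 / A, where A is the mean number of categories reached per covered word."""
    counts = [len(cats) for cats in word_categories.values() if cats]
    average = sum(counts) / len(counts)
    return 1 / average


toy = {
    "stock": {"c1", "c2", "c3", "c11"},
    "bond": {"c11"},
    "share": {"c6", "c11"},
    "xylophone": set(),  # reaches no category
}
cov = coverage(toy, total_words=len(toy))
inv_amb = inverse_average_ambiguity(toy)
```

In this toy case three of four words are covered, and the covered words reach 7/3 categories on average, so the inverse average ambiguity is 3/7.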

Best Category Set (WSJ)

Category | Higher-level synset
C1 | person, individual, someone, mortal, human, soul
C2 | instrumentality, instrumentation
C3 | written communication, written language
C4 | message, content, subject matter, substance
C5 | measure, quantity, amount, quantum
C6 | action
C7 | activity
C8 | group action
C9 | organization
C10 | psychological feature
C11 | possession
C12 | state
C13 | location

Top Down categories for the financial domain, based on the Wall Street Journal

Sense Selection with WSJ Set

Senses for stock - kept by domain tuning on the Wall Street Journal

Sense | Synset hierarchy for sense | Top synset for sense
1 | capital > asset | possession (C11)
2 | support > device | instrumentality (C2)
4 | document > writing | written communication (C3)
5 | accumulation > asset | possession (C11)
6 | ancestor > relative | person (C1)

Senses for stock - discarded by domain tuning on the Wall Street Journal

Sense | Synset hierarchy for sense
3 | stock, inventory > merchandise, wares > …
7 | broth, stock > soup > …
8 | stock, caudex > stalk, stem > …
9 | stock > plant part > …
10 | stock, gillyflower > flower > …
11 | malcolm stock, stock > flower
12 | lineage, line of descent > … > genealogy > …
14 | lumber, timber > …
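The effect of the tuned category set on sense selection can be sketched as a filter: a sense is kept if its hypernym path reaches one of the selected categories. The paths below are abbreviated as on the slide, senses are keyed by their first synset label rather than by sense number, and it is assumed that the elided ancestors of the discarded senses reach no category. Names are hypothetical.

```python
# First labels of the 13 higher-level synsets selected for the WSJ domain.
CATEGORY_SET = {
    "person", "instrumentality", "written communication", "message",
    "measure", "action", "activity", "group action", "organization",
    "psychological feature", "possession", "state", "location",
}

# Hypernym paths for some senses of "stock", abbreviated as on the slide.
# Assumption: elided ancestors of the last three senses reach no category.
STOCK_PATHS = {
    "capital": ["asset", "possession"],
    "support": ["device", "instrumentality"],
    "document": ["writing", "written communication"],
    "accumulation": ["asset", "possession"],
    "ancestor": ["relative", "person"],
    "broth, stock": ["soup"],
    "stock, caudex": ["stalk, stem"],
    "lumber, timber": [],
}


def kept_senses(sense_paths, categories):
    """Keep the senses whose hypernym path contains a selected category."""
    return sorted(s for s, path in sense_paths.items()
                  if any(node in categories for node in path))


kept = kept_senses(STOCK_PATHS, CATEGORY_SET)
```

The five financial-domain readings survive, while the soup, plant and timber readings are dropped.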

Bottom Up Tuning – Buitelaar & Sacaleanu

Ranking of WordNet synsets according to a domain-specific corpus

• Compute term relevance against reference corpus
• Compute synset relevance according to term relevance (where term = synonym in synset)
• Ranking can be used in WSD (similar to usage of ‘most frequent heuristic’)

Paul Buitelaar, Bogdan Sacaleanu. Ranking and Selecting Synsets by Domain Relevance. In: Proceedings of WordNet and Other Lexical Resources: Applications, Extensions and Customizations, NAACL 2001 Workshop, June 3/4 2001

TFIDF

$$tfidf(w) = tf(w) \cdot \log\frac{N}{df(w)}$$

• The word is more important if it appears several times in a target document
• The word is more important if it appears in fewer documents

tf(w): term frequency (number of word occurrences in a document)
df(w): document frequency (number of documents containing the word)
N: number of all documents
tfidf(w): relative importance of the word in the document
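A direct transcription of the TFIDF weight into code might look as follows. The natural logarithm is an assumption, since the slide does not state the base, and the counts in the example are made up.

```python
import math


def tfidf(tf, df, n_docs):
    """tf * log(N / df); natural log is assumed, the slide gives no base."""
    return tf * math.log(n_docs / df)


# Hypothetical counts: a word occurring 3 times in the target document.
rare = tfidf(tf=3, df=10, n_docs=1000)        # appears in few documents
everywhere = tfidf(tf=3, df=1000, n_docs=1000)  # appears in every document
```

A word that occurs in every document gets weight 0 regardless of its term frequency, which is the "fewer documents" effect described on the slide.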

Term and Synset Relevance

Term Relevance

• Relevance Score of Synset Members

$$rlv(t|d) = \log(tf_{t,d}) \, \log\frac{N}{df_t}$$

where t represents the term, d the domain, N is the total number of domains

Synset Relevance

• Cumulated Relevance Score for a Synset

$$rlv(c|d) = \sum_{t \in c} rlv(t|d)$$
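The two relevance scores can be sketched as below, using the two "Gewebe" synsets from the medical experiment. All counts are hypothetical, the natural logarithm is an assumption, and skipping synset members that do not occur in the domain corpus is a choice made for this sketch to avoid log(0).

```python
import math


def term_relevance(tf_td, df_t, n_domains):
    """rlv(t|d) = log(tf_{t,d}) * log(N / df_t); natural log assumed."""
    return math.log(tf_td) * math.log(n_domains / df_t)


def synset_relevance(synset, tf_in_domain, df, n_domains):
    """rlv(c|d): sum of term relevance over the synset members.
    Members absent from the domain corpus are skipped (sketch-level choice)."""
    return sum(
        term_relevance(tf_in_domain[t], df[t], n_domains)
        for t in synset
        if tf_in_domain.get(t, 0) > 0
    )


# Hypothetical counts for a medical domain corpus, with N = 5 domains.
TF_MEDICAL = {"Gewebe": 120, "Körpergewebe": 15, "Stoff": 4}
DF = {"Gewebe": 2, "Körpergewebe": 1, "Stoff": 5}

body_tissue = ["Gewebe", "Körpergewebe"]
cloth = ["Gewebe", "Kleiderstoff", "Stoff", "Textilstoff"]

score_tissue = synset_relevance(body_tissue, TF_MEDICAL, DF, n_domains=5)
score_cloth = synset_relevance(cloth, TF_MEDICAL, DF, n_domains=5)
```

With these toy counts the body-tissue synset outranks the cloth synset, because "Stoff" occurs in every domain and contributes nothing.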

Extended Synset Relevance

Lexical Coverage

• Take Length of the Synset Into Account

[Gefängniszelle, Zelle] ("prison cell")
[Zelle] ("living cell")

$$rlv(c|d) = \frac{T}{|c|} \sum_{t \in c} rlv(t|d)$$

Hyponyms

• Take Hyponyms Into Account

[Zelle, Gefängniszelle, Todeszelle]
[Zelle, Körperzelle, Pflanzenzelle]

$$rlv(c^{+}|d) = \frac{T}{|c^{+}|} \sum_{t \in c^{+}} rlv(t|d)$$

where $c^{+}$ is the synset c extended with its hyponyms

Experiment – Medical Domain

Ranked terms (with English translations), each followed by its ranked concepts

Eingriff (operation, intervention)
  1. [Eingriff:c, Operation:c, Abtreibung, Biopsie, ...]
  2. [Eingreifen:c, Eingriff:c, Intervention:c]
Infektion (infection)
  1. [Entzündung:c, Infektion:c, Infektionskrankheit:c, ...]
  2. [Ansteckung:c, Infektion:c, Übertragung:c]
Studie (study, report)
  1. [Experiment:c, Studie:c, Test:c, Versuch:c,...]
  2. [Abhandlung:c, Studie:c]
Prophylaxe (prophylaxis)
  1. [Prophylaxe:c, Empfängnisverhütung, Impfung, Verhütung]
  2. [Prophylaxe:c, Vorbeugung:c, Vorsorge:c, ...]
Gewebe (tissue)
  1. [Gewebe:c, Körpergewebe:c, Bindegewebe, Tumor, ...]
  2. [Gewebe:c, Kleiderstoff:c, Stoff:c, Textilstoff:c, ...]
Medizin (medicine)
  1. [Medizin:c, Chirurgie, Frauenheilkunde, Gynäkologie, ...]
  2. [Arznei:c, Arzneimittel:c, Heilmittel:c, Medikament:c, ...]
Gefäß (vascular, container)
  1. [Gefäß:c, Blutgefäß, Haargefäß, Herzkranzgefäß, Lymphgefäß]
  2. [Gefäß:c, Container, Form, Pokal, Schale, Schüssel, Tonne, ...]
Zelle (cell)
  1. [Zelle:c, Körperzelle, Pflanzenzelle]
  2. [Gefängniszelle:c, Zelle:c, Todeszelle]
Einschränkung (constraint, restriction)
  1. [Beschränkung:c, Einschränkung:c, Vorbehalt:c]
  2. [Beschränkung:c, Degression:c, Drosselung:c, Einschränkung:c]
Aufnahme (intake, reception)
  1. [Aufnahme:c, Aufzeichnung:c, Mitschnitt:c, Protokoll, ...]
  2. [Aufnahme:c, Beherbergung:c, Unterbringung:c, Notaufnahme, ...]
Sektion (section)
  1. [Autopsie:c, Leichenöffnung:c, Obduktion:c, Sektion:c]
  2. [Amtsbereich:c, Dezernat:c, Geschäftsbereich:c, Sektion:c, ...]
Ausdehnung (spread, dimensions)
  1. [Ausdehnung:c, Rauminhalt:c, Volumen:c]
  2. [Ausdehnung:c, Ausweitung:c, Dehnung:c, Erweiterung:c, ...]
Geburt (birth, rebirth)
  1. [Geburt:c, Fehlgeburt, Frühgeburt]
  2. [Geburt:c, Wiedergeburt]
Abweichung (abnormality, divergence)
  1. [Abweichung:c, Differenz:c, Abnormität, Anomalie, ...]
  2. [Abweichung:c, Differenz:c, Meinungsverschiedenheit]
Probe (test, rehearsal)
  1. [Probe:c, Blutprobe, Gesteinsprobe, Urinprobe, Wasserprobe]
  2. [Bühnenprobe:c, Probe:c, Chorprobe, Generalprobe]

Unlabeled column, one value per term in the order above: all, all, all, yes, all, yes, yes, all, all, yes, all, yes, yes, yes, yes

Related Recent Work

• Diana McCarthy, Rob Koeling, Julie Weeds, and John Carroll. Finding predominant senses in untagged text. In Proc. of ACL 2004.
• Chan, Yee Seng and Ng, Hwee Tou (2005). Word Sense Disambiguation with Distribution Estimation. Proc. of IJCAI 2005.
• Mohammad, Saif and Hirst, Graeme. Determining word sense dominance using a thesaurus. Proc. of EACL 2006.

Day 2 - Introduction

Words and Object Descriptions

• Semantics on the Semantic Web
• Semantic Web, Ontologies and Natural Language Processing
• The Lexical Semantic Web
• Knowledge Representation as Word Meaning
• A Lexicon Model for Ontologies
• Enriching Ontologies with Linguistic Information

Words and Object Descriptions

Semantics on the Semantic Web

The “Lexical Semantic Web”

A Lexicon Model for Ontologies

Web Consists of Non-Interpreted Data

[Figure: the Web as a collection of non-interpreted data: Text, Images, Tables, DBs]

Interpretation through Markup - Categories

Markup Web


Interpretation through Markup – User Tags

Markup

“Web 2.0”

Formal Interpretation - Knowledge Markup

Knowledge Markup Semantic Web Ontologies


Turns the Web into a Knowledge Base

Knowledge Markup Ontologies


Enables Semantic Web Services …

Semantic Web Services Knowledge Markup Ontologies


… and Intelligent Man-Machine Interface

Semantic Web Services Knowledge Markup Ontologies Intelligent Man-Machine Interface


Semantic Web Layer cake


Resource Description Framework (RDF)

[Figure: RDF graph with the labels node1, DFKI GmbH, www, http://www.dfki.de and Kaiserslautern]

RDF : XML-based Representation

<rdf:RDF xmlns:rdf="… rdf-syntax-ns#"
         xmlns:rdfs="… rdf-schema#"
         xmlns="http://example.org">
  <rdf:Description rdf:nodeID="node1">
    DFKI GmbH
    Kaiserslautern
  </rdf:Description>
</rdf:RDF>
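A description of this kind can be produced with Python's standard library alone. The sketch below is illustrative: the property names `name` and `location` and the `ex` prefix are hypothetical, since the property elements did not survive on the slide, and the RDF namespace URI is the standard one rather than text taken from the slide.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"  # standard RDF namespace
EX = "http://example.org/"                           # example namespace (assumed)
ET.register_namespace("rdf", RDF)
ET.register_namespace("ex", EX)


def describe(node_id, properties):
    """Serialize one blank node with literal-valued properties as RDF/XML."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}nodeID": node_id})
    for prop, value in properties.items():
        ET.SubElement(desc, f"{{{EX}}}{prop}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical property names; only the literal values come from the slide.
xml_text = describe("node1", {"name": "DFKI GmbH", "location": "Kaiserslautern"})
```

A dedicated RDF library would normally be preferable for anything beyond a toy example, since it also handles resources, datatypes and other serializations.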

RDF Schema (RDFS) Representation of classes and properties

[Figure: class diagram with the labels Person, Student, Teacher, Course, rdf:Literal and is-a]

RDFS : XML-based Representation


Web Ontology Language (OWL)

• OWL adds further modelling vocabulary on top of RDFS, e.g.
  • Class equivalence
  • Property types (data vs. object property)
• Based on Description Logics, three versions
  • OWL Lite
  • OWL DL
  • OWL Full

OWL Extended knowledge representation

[Figure: class diagram with the labels Person, Student, Teacher, Course, rdf:Literal, is-a and disjoint]

OWL : XML-based Representation


XML – RDF – RDFS - OWL

Syntax

XML XML Schema Data Types

Semantics

Namespaces

Interpretation Context

RDF Schema

Formalization:

Class Definition, Properties

RDF OWL

Formalization:

extended Class Definition, Properties, Property Types


Ontologies – What they are

Ontology refers to an engineering artifact

• a specific vocabulary used to describe a certain reality
• a set of explicit assumptions regarding the intended meaning of the vocabulary

An Ontology is

• an explicit specification of a conceptualization [Gruber 93]
• a shared understanding of a domain of interest [Uschold/Gruninger 96]

Ontologies – Why you need them

• Make domain assumptions explicit
  • Easier to exchange domain assumptions
  • Easier to understand and update legacy data
• Separate domain knowledge from operational knowledge
  • Re-use domain and operational knowledge separately
• A community reference for applications
• Shared understanding of what particular information means

Applications of Ontologies

• NLP
  • Information Extraction, e.g. Buitelaar et al. 06, Mädche, Staab & Neumann 00, Nedellec, Rebholz
  • Information Retrieval (Semantic Search), e.g. WebKB (Martin et al. 00), OntoSeek (Guarino et al. 99), Ontobroker (Decker et al. 99)
  • Question Answering, e.g. Harabagiu, Schlobach & de Rijke, Aqualog (Lopez and Motta 04)
  • Machine Translation, e.g. Nirenburg et al. 04, Beale et al. 95, Hovy, Knight
• Other
  • Business Process Modeling, e.g. Uschold et al. 98
  • Digital Libraries, e.g. Amann & Fundulaki 99
  • Information Integration, e.g. Kashyap 99; Wiederhold 92
  • Knowledge Management (incl. Semantic Web), e.g. Fensel 01, Staab & Schnurr 00; Sure et al. 00, Abecker et al. 97
  • Software Agents, e.g. Gluschko et al. 99; Smith & Poulter 99
  • User Interfaces, e.g. Kesseler 96

Ontologies and Their Relatives

Catalogs Thesauri Formal isa General logical constraints Glossaries & Terminologies Semantic Networks Formal Instance Axioms: Disjoint/Inverse…


Thesauri – Examples : EuroVoc

EuroVoc

• covers terminology in all of the official EU languages
• for all fields (27) that concern the EU institutions, e.g. politics, trade, law, science, energy, agriculture

MT   3606 natural and applied sciences
UF   gene pool, genetic resource, genetic stock, genotype, heredity
BT1  biology
BT2  life sciences
NT1  DNA
NT1  eugenics
RT   genetic engineering (6411)

Thesauri – Examples : MeSH

MeSH (Medical Subject Headings)

• organized by terms (~ 250,000) that correspond to medical subjects
• for each term, syntactic, morphological or semantic variants are given

Example entry:

MeSH Heading   Databases, Genetic
Entry Term     Genetic Databases
Entry Term     Genetic Sequence Databases
Entry Term     OMIM
Entry Term     Online Mendelian Inheritance in Man
Entry Term     Genetic Data Banks
Entry Term     Genetic Data Bases
Entry Term     Genetic Databanks
Entry Term     Genetic Information Databases
See Also       Genetic Screening

Semantic Networks - Examples : UMLS

Unified Medical Language System

• integrates linguistic, terminological and semantic information
• Semantic Network consists of 134 semantic types and 54 relations between types

Example relations between two semantic types:

Pharmacologic Substance   affects       Pathologic Function
Pharmacologic Substance   causes        Pathologic Function
Pharmacologic Substance   complicates   Pathologic Function
Pharmacologic Substance   diagnoses     Pathologic Function
Pharmacologic Substance   prevents      Pathologic Function
Pharmacologic Substance   treats        Pathologic Function
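As a rough illustration of how such a semantic network can be used, the sketch below stores the six type-level relations from the UMLS example as triples and checks whether a proposed relation between two semantic types is licensed. The set-of-tuples representation and the function names `is_licensed` and `relations_between` are assumptions made for this example; they are not part of UMLS.

```python
# Type-level relations from the slide: (subject type, relation, object type).
SEMANTIC_NETWORK = {
    ("Pharmacologic Substance", rel, "Pathologic Function")
    for rel in ("affects", "causes", "complicates", "diagnoses", "prevents", "treats")
}


def is_licensed(subject_type, relation, object_type):
    """Return True if the network contains this relation between the two types."""
    return (subject_type, relation, object_type) in SEMANTIC_NETWORK


def relations_between(subject_type, object_type):
    """List all relations the network allows from subject_type to object_type."""
    return sorted(
        rel for (s, rel, o) in SEMANTIC_NETWORK
        if s == subject_type and o == object_type
    )
```

A relation extracted from text (say, a drug "treats" a disease) could be kept or discarded depending on whether `is_licensed` accepts the types of its arguments.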

Semantic Networks - Examples : GO

GO (Gene Ontology)

• Aligns descriptions of gene products in different databases, including plant, animal and microbial genomes
• Organizing principles are molecular function, biological process and cellular component

Example term:

Accession:    GO:0009292
Ontology:     biological process
Synonyms:     broad: genetic exchange
Definition:   In the absence of a sexual life cycle, the processes involved in the introduction of genetic information to create a genetically different individual.

Term Lineage:

all : all (164142)
  GO:0008150 : biological process (115947)
    GO:0007275 : development (11892)
      GO:0009292 : genetic transfer (69)
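The term lineage in the GO example can be read as a chain of parent links. A minimal sketch, assuming each term has a single parent as in this one example (GO terms can in general have several parents); the dictionary names and the `lineage` function are illustrative, not a GO API:

```python
# Parent links read off the lineage on the slide (child -> parent).
PARENT = {
    "GO:0009292": "GO:0007275",  # genetic transfer -> development
    "GO:0007275": "GO:0008150",  # development -> biological process
    "GO:0008150": "all",         # biological process -> root
}

# Names and counts as shown on the slide.
TERMS = {
    "all": ("all", 164142),
    "GO:0008150": ("biological process", 115947),
    "GO:0007275": ("development", 11892),
    "GO:0009292": ("genetic transfer", 69),
}


def lineage(term_id):
    """Return the path from the root down to term_id."""
    path = [term_id]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return list(reversed(path))


def format_lineage(term_id):
    """Render the lineage as indented lines, one level per ancestor."""
    return "\n".join(
        "  " * depth + f"{tid} : {TERMS[tid][0]} ({TERMS[tid][1]})"
        for depth, tid in enumerate(lineage(term_id))
    )
```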

Ontologies – Example I


Ontologies – Example II

[Figure: example geographical ontology, in a notation similar to F-Logic. Design: Philipp Cimiano]

• Classes: Geographical Entity (GE), Natural GE, Inhabited GE, mountain, river, country, city
• Relations: is-a, instance_of, flow_through, located_in, capital_of
• Instances: Zugspitze (height (m): 2962), Neckar (length (km): 367), Germany, Stuttgart, Berlin
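One possible reading of the figure above, written out as subject-relation-object triples in Python. The exact arrows are an interpretation of the flattened figure (for example, that the Neckar flows through Stuttgart and that Stuttgart is located in Germany), and the helper names `objects`, `superclasses` and `is_instance_of` are illustrative only.

```python
# Assumed wiring of the figure; each triple is (subject, relation, object).
TRIPLES = [
    ("Natural GE", "is-a", "Geographical Entity"),
    ("Inhabited GE", "is-a", "Geographical Entity"),
    ("mountain", "is-a", "Natural GE"),
    ("river", "is-a", "Natural GE"),
    ("country", "is-a", "Inhabited GE"),
    ("city", "is-a", "Inhabited GE"),
    ("Zugspitze", "instance_of", "mountain"),
    ("Neckar", "instance_of", "river"),
    ("Germany", "instance_of", "country"),
    ("Stuttgart", "instance_of", "city"),
    ("Berlin", "instance_of", "city"),
    ("Zugspitze", "located_in", "Germany"),
    ("Neckar", "flow_through", "Germany"),
    ("Neckar", "flow_through", "Stuttgart"),
    ("Stuttgart", "located_in", "Germany"),
    ("Berlin", "capital_of", "Germany"),
]

# Attribute values shown in the figure.
ATTRIBUTES = {("Zugspitze", "height (m)"): 2962, ("Neckar", "length (km)"): 367}


def objects(subject, relation):
    """All objects o such that (subject, relation, o) is asserted."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]


def superclasses(cls):
    """All classes reachable from cls by following is-a links."""
    found = []
    frontier = objects(cls, "is-a")
    while frontier:
        current = frontier.pop()
        if current not in found:
            found.append(current)
            frontier.extend(objects(current, "is-a"))
    return found


def is_instance_of(individual, cls):
    """True if individual belongs to cls directly or through a subclass."""
    direct = objects(individual, "instance_of")
    return any(d == cls or cls in superclasses(d) for d in direct)
```

The class hierarchy is what lets a query for all Natural GE instances return both Zugspitze and Neckar, although neither is asserted to be a Natural GE directly.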

Ontologies for NLP

Information Retrieval
• Query Expansion

Machine Translation
• Interlingua

Information Extraction
• Template Definition
• Semantic Integration

Question Answering
• Question Analysis
• Answer Selection

Information Extraction

Class-based Template Definition
• Allows for Reasoning over Extracted Templates with Respect to the Ontology (see e.g. [Nedellec and Nazarenko 2005] for discussion)

Semantic Integration
• Extraction from Heterogeneous Sources (Text, Tables and other Semi-Structured Data, Image Captions): SmartWeb [Buitelaar et al. 06]
• Multi-Document Information Extraction: ArtEquAKT [Alani et al. 2003]

Question Answering

• Question Analysis
    • Ontology/WordNet-based Semantic Question Interpretation (e.g. [Pasca and Harabagiu 01])
• Answer Selection
    • Ontology/WordNet-based Reasoning for Answer Type-Checking
        • Ontology of Events [Sinha and Narayanan 05]
        • Geographical Ontology, WordNet [Schlobach & de Rijke 04]
        • WordNet [Pasca and Harabagiu 01]
• Ontology-based Question Answering
    • Derive Answers from a Knowledge Base (e.g. Aqualog [Lopez & Motta 04])
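Answer type-checking can be pictured as a subsumption test: a candidate answer is kept only if its class falls under the answer type expected by the question. The sketch below is a toy version of that idea, not the method of any of the cited systems; the hierarchy, the instance table and the function names are all made up for illustration.

```python
# Toy is-a hierarchy (child -> parent) and instance assertions; illustrative only.
IS_A = {
    "city": "Inhabited GE",
    "country": "Inhabited GE",
    "river": "Natural GE",
    "Inhabited GE": "Geographical Entity",
    "Natural GE": "Geographical Entity",
}
INSTANCE_OF = {
    "Berlin": "city",
    "Stuttgart": "city",
    "Germany": "country",
    "Neckar": "river",
}


def subsumed_by(cls, ancestor):
    """True if cls equals ancestor or lies below it in the hierarchy."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = IS_A.get(cls)
    return False


def type_check(candidates, expected_type):
    """Keep only candidates whose class is subsumed by the expected answer type."""
    return [
        c for c in candidates
        if c in INSTANCE_OF and subsumed_by(INSTANCE_OF[c], expected_type)
    ]
```

For a question such as "Which city is the capital of Germany?", the expected type would be `city`, so candidates like a river or an unknown string are filtered out before ranking.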

Ontology Life Cycle

[Figure: ontology life cycle, with the following stages]

• Create/Select: Development and/or Selection
• Populate: Knowledge Base Generation
• Validate: Consistency Checks
• Evolve: Extension, Modification
• Maintain: Usability Tests
• Deploy: Knowledge Retrieval

NLP in the Ontology Life Cycle

• Ontology Population: Information Extraction
• KB Retrieval: Question Answering
• Ontology Learning: Text Mining

Ontology Learning

[Figure: ontology learning layers, each with an example. Design: Philipp Cimiano]

• General Axioms: $\forall x\, (\mathrm{country}(x) \rightarrow \exists y\; \mathrm{capital\_of}(y, x) \wedge \forall z\, (\mathrm{capital\_of}(z, x) \rightarrow y = z))$
• Axiom Schemata: $\mathrm{disjoint}(\mathrm{river}, \mathrm{mountain})$
• Relation Hierarchy: $\mathrm{capital\_of} \leq_R \mathrm{located\_in}$
• Relations: $\mathrm{flow\_through}(\mathrm{dom}: \mathrm{river}, \mathrm{range}: \mathrm{GE})$
• Concept Hierarchy: $\mathrm{capital} \leq_C \mathrm{city}$, $\mathrm{city} \leq_C \mathrm{Inhabited\ GE}$
• Concept Formation: $c := \mathrm{country} := \langle i(c), \|c\|, \mathrm{Ref}_C(c) \rangle$
• (Multilingual) Synonyms: {country, nation, Land}
• Terms: river, country, nation, city, capital, ...
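To make the upper layers more concrete, the sketch below checks three of the example statements (each country has exactly one capital, capital_of is a sub-relation of located_in, river and mountain are disjoint) against a tiny hand-written knowledge base. The facts and function names are assumptions for illustration; this shows what the axioms say, not how an ontology learning system would acquire them.

```python
# Tiny illustrative knowledge base; relation pairs are (subject, object).
COUNTRIES = {"Germany"}
RIVERS = {"Neckar"}
MOUNTAINS = {"Zugspitze"}
CAPITAL_OF = {("Berlin", "Germany")}
LOCATED_IN = {("Berlin", "Germany"), ("Stuttgart", "Germany")}


def has_exactly_one_capital(countries, capital_of):
    """General axiom: every country has one and only one capital."""
    return all(
        len({capital for (capital, country) in capital_of if country == c}) == 1
        for c in countries
    )


def is_subrelation(sub, sup):
    """Relation hierarchy: every pair in sub must also be in sup."""
    return sub <= sup


def are_disjoint(class_a, class_b):
    """Axiom schema: the two classes share no instances."""
    return class_a.isdisjoint(class_b)
```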

Words and Object Descriptions

Semantics on the Semantic Web

The “Lexical Semantic Web”

A Lexicon Model for Ontologies


Dictionary: Words and Senses

• Represent interpretations of words through senses, very much like classes that are assigned to a word, e.g.

article

1. An individual thing or element of a class…
2. A particular section or item of a series in a written document…
3. A non-fictional literary composition that forms an independent part of a publication…
4. The part of speech used to indicate nouns and to specify their application
5. A particular part or subject; a specific matter or point

(as provided by http://dictionary.reference.com/ )

Ontology: Classes and Labels - I

• Ontologies assign labels (i.e. words) to a given class
• In the COMMA ontology on document management, the class article corresponds to sense 2 (‘section of a written document’): http://pauillac.inria.fr/cdrom/ftp/ocomma/comma.rdfs

Ontology: Classes and Labels - II

• In the GOLD ontology on linguistics, the class label article corresponds to sense 4 (‘part of speech’): http://emeld.org/gold

The Meaning of Director - I

The Semantic Web can be viewed as a large, distributed dictionary (or rather a semantic lexicon) in which we can look up the meaning of words, e.g.

director … as a ‘role’ (AgentCities ontology)

http://www-agentcities.doc.ic.ac.uk/ontology/shows.daml

The Meaning of Director - II

… as ‘head of a program’ (University Benchmark ontology)

http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl

Exploring the Lexical Semantic Web

Collect ontologies
• OntoSelect

Analyse the use of class/property labels

Treat class/property labels as lexical entries
• Normalize
• Organize by language
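The last two steps (normalize labels, organize by language) might look roughly as follows. This is a minimal sketch under simple assumptions: labels arrive as (label, language) pairs, normalization only splits identifier-style names and lowercases them, and the function names are invented for the example. Lowercasing is a simplification that discards, for instance, German noun capitalization.

```python
import re
from collections import defaultdict


def normalize_label(label):
    """Turn an identifier-style label into a lowercase, space-separated entry."""
    label = re.sub(r"[_\-]+", " ", label)                  # capital_of -> capital of
    label = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", label)  # FootballPlayer -> Football Player
    return " ".join(label.lower().split())


def organize_by_language(labels):
    """Group normalized labels by language code, dropping duplicates."""
    lexicon = defaultdict(set)
    for label, lang in labels:
        lexicon[lang].add(normalize_label(label))
    return {lang: sorted(entries) for lang, entries in lexicon.items()}
```

Different ontologies often spell the same label differently (`FootballPlayer`, `football_player`); after normalization they collapse into one lexical entry.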

Ontology Collection

OntoSelect

• Web Monitor on DAML, RDFS, OWL Files
• Download, Analyze and Store Included Information and Metadata
    • Class and Property Labels
    • Multilingual Information
    • Included Ontologies
• Ontology Ranking and Selection Functionalities

http://olp.dfki.de/OntoSelect

OntoSelect

Multilinguality on the Semantic Web

Multilingual Labels

“Lexical Semantic Ambiguity”

Words and Object Descriptions

Semantics on the Semantic Web

The “Lexical Semantic Web”

A Lexicon Model for Ontologies

Ontologies – Example III


Ontologies – Example III (continued)

[Figure: university ontology. Classes: Student, Staff, University, Campus, “Fakultät”. Relations: studies_at, works_at, located_at, is_part_of.]

Ontologies – Example III (continued)

[Figure: the same university ontology (Student, Staff, University, Campus, “Fakultät”; studies_at, works_at, located_at, is_part_of), with one term property per language on the class “Fakultät”: has_German_term Fakultät, has_Dutch_term Faculteit, has_US_English_term School.]

Ontologies – Example III (continued)

[Figure: the class “Fakultät” (is_part_of University) linked by has_term to instances of a class Term, each carrying a language value: Fakultät (DE), faculteit (NL), school (EN-US).]
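The difference between the last two figures is a modelling choice: instead of one property per language (has_German_term, has_Dutch_term, ...), terms become objects of their own with a language attribute. A minimal sketch of that second design, with class and method names chosen for this example only:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Term:
    """A lexical term with an explicit language code, e.g. "DE", "NL", "EN-US"."""
    text: str
    language: str


@dataclass
class OntologyClass:
    """An ontology class with any number of attached terms (has_term)."""
    name: str
    terms: list = field(default_factory=list)

    def terms_in(self, language):
        """Return the term strings available for the given language."""
        return [t.text for t in self.terms if t.language == language]


fakultaet = OntologyClass(
    "Fakultät",
    [Term("Fakultät", "DE"), Term("faculteit", "NL"), Term("school", "EN-US")],
)
```

Adding a further language then only requires another `Term` instance, with no change to the schema.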

Semiotic Triangle

• Ogden & Richards, 1923
• based on Structural Linguistics studies (de Saussure, 1916)
• adopted in Knowledge Representation (e.g. Sowa, 1984)

LingInfo Model – Simplified

[Figure: simplified LingInfo model in three layers. Legend: rdf:type, URI, property. Design: Michael Sintek]

meta-classes
• feat:ClassWithFeats (rdfs:subClassOf rdfs:Class), with properties feat:lingFeat and feat:imgFeat

classes
• o:FootballPlayer, o:Defender, o:Midfielder (feat:ClassWithFeats, related by rdfs:subClassOf)
• lf:LingFeat (rdfs:Class)
• if:ImgFeat (rdfs:Class)

instances
• lf:LingFeat: lf:lang “de”, lf:term “Abwehrspieler”, …
• lf:LingFeat: lf:lang “de”, lf:term “Mittelfeldspieler”, …
• if:ImgFeat: if:color “#111111”, lf:texture “&keypatchSet_223, …

LingInfo Model


LingInfo Instances - Example

Fußballspielers („of the football player“)

[Figure: LingInfo instance graph for this word form]

• inst0 : LingInfo: lang de; term Fußballspielers; morphSynDecomp
• inst1 : InflectedWordForm: orthographicForm Fußballspielers; partOfSpeech Noun; case genitive; gender male; number singular; wordForm
• inst2 : Stem: orthographicForm Fußballspieler; partOfSpeech Noun; case nominative; gender male; number singular; isComposedOf
• inst3 : Stem: analysisIndex 1; orthographicForm Fußball; function modifier; isComposedOf; root; semantics
• inst8 : Stem: analysisIndex 2; orthographicForm Spieler; root
• inst1 : Root (Spieler)
• inst5 : Stem (Fuß), inst6 : Root (Fuß)
• inst7 : Stem (Ball), inst4 : Root (Ball)
• o:BallObject

LingInfo Predicate-Arg Structure

Design: Anette Frank


Conclusions

WordNet: Appropriate Use may include
• Introduction of underspecified senses (sense grouping)
• Tuning to a domain

The “Lexical Semantic Web”
• The Semantic Web (and Web 2.0) is a potentially rich resource for (formal) lexical semantics
• Mining such resources for lexical semantics (i.e. compilation of a distributed semantic lexicon) has only just started
• Ontologies are to be extended with linguistic/lexical information