WordNet: An Overview

Anubhav Madan
[email protected]
7/8/2015
- WordNet - Anubhav Madan
Today’s Discussion





WordNet: A Lexical Database
WordNet::Similarity
Some More Applications
Limitations
Tutorial
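WordNet: A Lexical Database
Started in 1985
Basic unit of organization: the synset (a set of synonymous word forms)
Synsets are arranged hierarchically with respect to their definitions
Contains compounds, phrasal verbs, collocations, and idiomatic phrases
Example: {offender} @ {bad person}; {libertine} @ {bad person} (@ is the pointer from a synset to its hypernym)
Establishes a rich, dense network of relations that supports text coherence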
WordNet: The Facts
A word or phrase (a word form) is the basic lexical unit
Words are organized into synsets: groups of word forms that share the same sense
A gloss is a textual definition of the synset
Synsets are organized into hierarchies:
  hypernym/hyponym: {concept} IS-A {concept}
  meronym/holonym: {concept} HAS-PART {concept}
Types: Nouns, Verbs, Adjectives, Adverbs
80,000 nouns organized into 60,000 concepts
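The synset and IS-A hierarchy described above can be pictured with a small data structure. This is only an illustrative sketch, not WordNet's storage format: the `Synset` class and `hypernym_chain` helper are hypothetical names, and the glosses marked "toy gloss" are placeholders rather than WordNet text.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A set of word forms sharing one sense, with IS-A pointers (hypothetical model)."""
    words: tuple
    gloss: str
    hypernyms: list = field(default_factory=list)


def hypernym_chain(synset):
    """Follow the first IS-A pointer repeatedly until a synset has no hypernym."""
    chain = [synset]
    while chain[-1].hypernyms:
        chain.append(chain[-1].hypernyms[0])
    return chain


# Toy hierarchy: apple IS-A edible_fruit IS-A fruit.
fruit = Synset(("fruit",), "toy gloss for fruit")
edible_fruit = Synset(("edible_fruit",), "toy gloss for edible fruit", [fruit])
apple = Synset(
    ("apple",),
    "fruit with red or yellow or green skin and crisp whitish flesh",
    [edible_fruit],
)

print([s.words[0] for s in hypernym_chain(apple)])
```

Walking the chain from `apple` visits each more general concept in turn, which is the traversal the path-based and depth-based measures rely on.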
WordNet: Architecture
[Figure: architecture diagram. Labels: Lexicographers, Lexical Source Files, Grinder, The WordNet Database, X-Windows, Application 1 to Application N]
WordNet: Architecture
Word/synset pairs are stored in the WordNet DB.
Entry format: {word or list of word forms, pointer to lexical file, frames (for verbs), list of elements, (optional gloss), adjective cluster}
Example: {apple, edible_fruit,@ (fruit with red or yellow or green skin and crisp whitish flesh) }
Indexes: senses are ordered
  Index of Familiarity: how well known the word is
  Index and Data Files
  Sense Index
The Grinder as a converter: takes the Lexical Source Files written by lexicographers and converts them into a database format that WordNet can read and update.
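To make the Grinder's role concrete, the sketch below parses the apple entry shown above into a structured record. It is a deliberately simplified, hypothetical parser (`parse_entry` is not part of WordNet), and it assumes only the pieces visible in the example: comma-separated tokens, an `@` that marks the preceding token as a hypernym target, and a parenthesized gloss.

```python
import re


def parse_entry(entry):
    """Split a simplified '{word, target,@ (gloss)}' entry into its parts."""
    body = entry.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("entry must be wrapped in braces")
    body = body[1:-1]

    # The gloss is the parenthesized text; normalize its whitespace.
    gloss = None
    match = re.search(r"\((.*)\)", body, flags=re.DOTALL)
    if match:
        gloss = " ".join(match.group(1).split())
        body = body[:match.start()]

    # A bare '@' turns the token before it into a hypernym pointer.
    words, pointers = [], []
    for token in (t.strip() for t in body.split(",")):
        if not token:
            continue
        if token == "@":
            pointers.append(("hypernym", words.pop()))
        else:
            words.append(token)
    return {"words": words, "pointers": pointers, "gloss": gloss}


entry = "{apple, edible_fruit,@ (fruit with red or yellow or green skin and crisp whitish flesh) }"
record = parse_entry(entry)
print(record)
```

The real lexicographer files carry more fields (verb frames, adjective clusters, lexical file pointers), which this sketch ignores.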
WordNet::Similarity


An application that measures the “closeness” of concepts using WordNet's structure and definitions
Main categories of measures:
  Path based
  Depth based
  Information Content based
  Gloss based
WordNet: Similarity Measures
Path Finder
  Path: inverse of the length of the shortest path between the two concepts
Depth Finder
  Wup (Wu and Palmer): depth of the least common subsumer (LCS), scaled by the sum of the depths of the two concepts
  Lch (Leacock and Chodorow): negative log of the shortest path length, scaled by twice the maximum depth D of the taxonomy
Information Content Finder
  Resnik: information content (IC) of the LCS of the two concepts
  Jcn (Jiang and Conrath): inverse of the difference between the summed IC of the two concepts and twice the IC of their LCS
  Lin: IC of the LCS, scaled by the summed IC of the two concepts
Gloss Finder
  Lesk (Banerjee and Pedersen): finds and scores overlaps between glosses
  Vector (Patwardhan): builds a co-occurrence matrix from the glosses and represents each gloss as a vector
Hso (Hirst and St-Onge): classifies relations by direction and scores the path between the words
Demo
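LCH
[Figure: concept tree rooted at Root with maximum depth D = 5; other labels in the figure: 2, 1, 1, 1, 1, 1]
Lch Related (Money-Credit) = -log10 (shortest path length / (2 * D)) = -log10 (2/10) = 0.70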
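The Lch arithmetic can be checked with a few lines of Python. This sketch assumes a base-10 logarithm, since that is what reproduces the 0.70 on the slide; implementations may use a different log base or count path length in nodes rather than edges. The function name `lch` is illustrative.

```python
import math


def lch(path_length, max_depth):
    """Leacock-Chodorow score: -log10(path_length / (2 * max_depth))."""
    if path_length <= 0 or max_depth <= 0:
        raise ValueError("path_length and max_depth must be positive")
    return -math.log10(path_length / (2 * max_depth))


# Values from the Money-Credit example: path length 2, maximum depth D = 5.
score = lch(path_length=2, max_depth=5)
print(round(score, 2))
```

Because the path length is divided by a constant, shorter paths always give higher scores within one taxonomy.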
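WUP
[Figure: concept tree rooted at Root with maximum depth D = 5; other labels in the figure: 2, 1, 1, 1, 1, 1]
Wup ConSim (Money-Credit) = 2 * depth(LCS) / (depth(Money) + depth(Credit)) = 4/6 = 0.67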
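A minimal sketch of the Wu and Palmer computation on a toy taxonomy follows. The tree is an assumption chosen so that the result matches the 4/6 above (LCS at depth 2, both concepts at depth 3, depth counted in edges from the root); the node names `x` and `lcs` are hypothetical placeholders, not WordNet synsets.

```python
# Toy taxonomy as child -> parent links (hypothetical nodes).
PARENT = {"root": None, "x": "root", "lcs": "x", "money": "lcs", "credit": "lcs"}


def depth(node, parent=PARENT):
    """Number of edges between the node and the root."""
    d = 0
    while parent[node] is not None:
        node = parent[node]
        d += 1
    return d


def least_common_subsumer(a, b, parent=PARENT):
    """Deepest node that is an ancestor of (or equal to) both a and b."""
    seen = set()
    while a is not None:
        seen.add(a)
        a = parent[a]
    while b not in seen:
        b = parent[b]
    return b


def wup(a, b, parent=PARENT):
    """2 * depth(LCS) / (depth(a) + depth(b)); undefined when both are the root."""
    lcs = least_common_subsumer(a, b, parent)
    return 2 * depth(lcs, parent) / (depth(a, parent) + depth(b, parent))


print(round(wup("money", "credit"), 2))
```

Concepts that share a deep LCS score close to 1, while concepts whose only common ancestor is the root score 0 under this edge-counting convention.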
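Path
[Figure: concept tree rooted at Root with maximum depth D = 5; other labels in the figure: 2, 1, 1, 1, 1, 1]
Inverse of the shortest path length
Path (Money-Credit) = 1 / (shortest path length) = 1/2 = 0.5, using the same path length of 2 as the Lch example. The score cannot exceed 1, because the shortest path length is at least 1.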
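The Path measure can be sketched on a toy taxonomy as the inverse of the shortest path length. Two conventions are in use: counting edges, which fits the path length of 2 used in the Lch example, and counting nodes, which is what WordNet::Similarity does and which gives identical concepts a score of 1. The tree and node names below are hypothetical.

```python
# Toy taxonomy as child -> parent links (hypothetical nodes).
PARENT = {"root": None, "x": "root", "lcs": "x", "money": "lcs", "credit": "lcs"}


def ancestors(node, parent=PARENT):
    """The node followed by each of its ancestors up to the root."""
    chain = [node]
    while parent[chain[-1]] is not None:
        chain.append(parent[chain[-1]])
    return chain


def shortest_path_edges(a, b, parent=PARENT):
    """Edges on the path a -> least common subsumer -> b in a tree."""
    up_a, up_b = ancestors(a, parent), ancestors(b, parent)
    common = next(n for n in up_a if n in up_b)
    return up_a.index(common) + up_b.index(common)


def path_similarity(a, b, count_nodes=True, parent=PARENT):
    """Inverse path length, counting nodes (default) or edges."""
    edges = shortest_path_edges(a, b, parent)
    length = edges + 1 if count_nodes else edges
    if length == 0:
        raise ValueError("edge-counted path length is zero for identical concepts")
    return 1 / length


print(path_similarity("money", "credit", count_nodes=False))
print(round(path_similarity("money", "credit"), 2))
```

Under either convention the score stays in the range 0 to 1, so a value above 1 signals a calculation error.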
Resnik
[Figure: concept tree annotated with concept probabilities 6/6 (root), 3/6, 2/6, 2/6, 1/6, 1/6]
Resnik Sim (Money-Credit) = IC(LCS) = -log10 (3/6) = 0.30
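The Resnik value is the information content of the most informative concept that subsumes both words. The sketch below assumes base-10 logs (which reproduce the 0.30 above) and a hypothetical table of frequency counts for the common subsumers; only the counts 6 and 3 out of 6 come from the figure.

```python
import math


def information_content(count, total):
    """IC(c) = -log10 p(c), with p(c) estimated as count / total."""
    return -math.log10(count / total)


def resnik(common_subsumer_counts, total):
    """Largest IC among the concepts that subsume both words."""
    return max(information_content(c, total) for c in common_subsumer_counts.values())


# Hypothetical common subsumers of Money and Credit with their counts out of 6.
common = {"root": 6, "lcs": 3}
score = resnik(common, total=6)
print(round(score, 2))
```

The root always has probability 1 and therefore an IC of 0, so it contributes nothing unless it is the only common subsumer.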
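Lin
[Figure: concept tree annotated with concept probabilities 6/6 (root), 3/6, 2/6, 2/6, 1/6, 1/6]
Lin Sim (Money-Credit) = 2 * IC(LCS) / (IC(Money) + IC(Credit)), where IC(c) = -log10 p(c) and IC(LCS) = -log10 (3/6) = 0.30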
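Lin's measure divides twice the IC of the LCS by the summed IC of the two concepts. In the sketch below the LCS probability 3/6 comes from the slide, while the probabilities 1/6 for Money and Credit are assumptions made only so the example can run; the extracted figure does not say which node carries which value.

```python
import math


def information_content(probability):
    """IC(c) = -log10 p(c)."""
    return -math.log10(probability)


def lin(p_a, p_b, p_lcs):
    """2 * IC(LCS) / (IC(a) + IC(b)); undefined when both concepts have IC 0."""
    return 2 * information_content(p_lcs) / (
        information_content(p_a) + information_content(p_b)
    )


# p_lcs = 3/6 is from the slide; p_a = p_b = 1/6 are assumed values.
score = lin(p_a=1 / 6, p_b=1 / 6, p_lcs=3 / 6)
print(round(score, 3))
```

Unlike Resnik, the result is normalized: it equals 1 when the LCS is as specific as the concepts themselves and falls toward 0 as the LCS becomes more general.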
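JCN
[Figure: concept tree annotated with concept probabilities 6/6 (root), 3/6, 2/6, 2/6, 1/6, 1/6]
Jcn Dist (Money-Coin) = IC(Money) + IC(Coin) - 2 * IC(LCS)
= -log10 (3/6) - log10 (2/6) + 2 * log10 (6/6)
= 0.301 + 0.477 - 0 = 0.778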
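The Jiang and Conrath distance for the Money-Coin example can be recomputed as follows, assuming base-10 logs and the probabilities 3/6, 2/6, and 6/6 given on the slide. The similarity form is the inverse of the distance; the function names are illustrative.

```python
import math


def information_content(probability):
    """IC(c) = -log10 p(c)."""
    return -math.log10(probability)


def jcn_distance(p_a, p_b, p_lcs):
    """IC(a) + IC(b) - 2 * IC(LCS)."""
    return (
        information_content(p_a)
        + information_content(p_b)
        - 2 * information_content(p_lcs)
    )


def jcn_similarity(p_a, p_b, p_lcs):
    """Inverse of the distance; undefined when the distance is 0."""
    return 1 / jcn_distance(p_a, p_b, p_lcs)


distance = jcn_distance(p_a=3 / 6, p_b=2 / 6, p_lcs=6 / 6)
print(round(distance, 3), round(jcn_similarity(3 / 6, 2 / 6, 6 / 6), 3))
```

Because the LCS here is the root, its IC is 0 and the distance reduces to the sum of the two concepts' IC values.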
Lesk
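As a rough illustration of gloss overlap scoring, the sketch below counts the content words two glosses share. This is a simplified stand-in, not the extended gloss overlap of Banerjee and Pedersen, which also scores multi-word phrases and the glosses of related synsets. The glosses and the stopword list are toy assumptions, not WordNet text.

```python
# Small, hypothetical stopword list for the toy example.
STOPWORDS = {"a", "an", "as", "for", "of", "the", "to"}


def content_words(gloss):
    """Lowercased gloss tokens with stopwords removed."""
    return {w for w in gloss.lower().split() if w not in STOPWORDS}


def gloss_overlap(gloss_a, gloss_b):
    """Number of content words the two glosses have in common."""
    return len(content_words(gloss_a) & content_words(gloss_b))


# Toy glosses written for this example.
gloss_money = "official currency used as a medium of exchange"
gloss_credit = "arrangement for deferred payment of a medium of exchange"

print(gloss_overlap(gloss_money, gloss_credit))
```

A higher count suggests more closely related senses, which is why the measure depends heavily on how long and how detailed the glosses are.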
Vector
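The idea behind the Vector measure can be sketched in three steps: build a word co-occurrence matrix from glosses, represent each gloss as the sum of its words' co-occurrence vectors, and compare gloss vectors by cosine. The three-word "glosses" below are toy data, and the helper names are hypothetical; Patwardhan's measure is built from a much larger gloss corpus.

```python
import math
from collections import Counter, defaultdict


def cooccurrence(glosses):
    """Map each word to a Counter of the words it shares a gloss with."""
    matrix = defaultdict(Counter)
    for gloss in glosses:
        words = gloss.split()
        for w in words:
            for other in words:
                if other != w:
                    matrix[w][other] += 1
    return matrix


def gloss_vector(gloss, matrix):
    """Sum of the co-occurrence vectors of the words in the gloss."""
    vec = Counter()
    for w in gloss.split():
        vec.update(matrix[w])
    return vec


def cosine(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(u[k] * v[k] for k in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


# Toy glosses written for this example.
glosses = ["coin metal money", "credit money loan", "tree plant wood"]
matrix = cooccurrence(glosses)
coin, credit, tree = (gloss_vector(g, matrix) for g in glosses)

print(round(cosine(coin, credit), 3), cosine(coin, tree))
```

Glosses that share no words can still score above zero when their words co-occur with the same neighbors, which is the advantage over counting direct overlaps.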
HSO
Classifies the relations in WordNet as having directions.
The IS-A relations are upwards; the HAS-PART relations are horizontal.
Establishes a relationship between words through a path that is neither too long nor changes direction too often.
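One way to picture the "not too long, not too many turns" rule is the weighting C - path length - k * (number of direction changes) from Hirst and St-Onge. The sketch below assumes C = 8 and k = 1 and clamps negative scores to 0; these constants and the clamping are assumptions for illustration, and the sketch omits the rules about which sequences of directions are allowed at all.

```python
def hso_score(directions, c=8, k=1):
    """Score a path given as link directions: 'up', 'down', or 'horizontal'.

    The constants c and k and the clamp at 0 are assumptions of this sketch.
    """
    if not directions:
        raise ValueError("path must contain at least one link")
    # A turn is any point where consecutive links differ in direction.
    turns = sum(1 for prev, cur in zip(directions, directions[1:]) if prev != cur)
    return max(c - len(directions) - k * turns, 0)


print(hso_score(["up", "down"]))
print(hso_score(["up", "down", "up", "down", "up"]))
```

Short paths that keep one direction score highest, while long zigzag paths drop to 0 and are treated as unrelated.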
Demo
Applications




Building Semantic Concordances
Performance and Confidence in a Semantic Annotation
Resnik Similarity Measure in Class Based Probabilities
Lch WordNet Similarity Measure in Word Sense Identification
Text Retrieval using WordNet
Applications




Lexical Chains as Representations of Context for the Detection and Correction of Malapropisms
Temporal Indexing through Lexical Chaining
COLOR-X
Knowledge Processing on an Extended WordNet
Further Speculation




Sense Disambiguation
Information Retrieval
Semantic Relations and Textual Coherence
Knowledge engineering
The Limitations





Relation IS-NOT or NOT-A-KIND-OF is inexpressible
Relation IS-USED-AS-A-KIND-OF is also inexpressible
No explicit distinction between proper and common nouns (it was too difficult to include this information)
Does not attempt to identify “basic-level” or “generic” categories. Concepts in the middle of the lexical hierarchy can have many listed features that distinguish one word from another; WordNet does not record them.
Not enough semantic relations in WordNet.
Tutorial





What is WordNet?
Why is WordNet unique?
What is the difference between WordNet and WordNet::Similarity?
What are some of the limiting features?
Give an example of a human scenario where WordNet would be instrumental.
Tutorial
Which similarity measure would you use if you had only the following information:
Path
[linkages between words in an ontology]
Information Content of the Words
Gloss of the Words
An ontology with direction
References










Overview: Pedersen, T., Patwardhan, S., and Michelizzi, J. 2004. WordNet::Similarity: Measuring the relatedness of concepts. In Proceedings of the Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-04), 38–41. Boston, May 2004.
Lch: Leacock, C., and Chodorow, M. 1998. Combining local context and WordNet similarity for word sense identification. In Fellbaum, C., ed., WordNet: An electronic lexical database. MIT Press. 265–283.
Wup: Wu, Z., and Palmer, M. 1994. Verb semantics and lexical selection. In 32nd Annual Meeting of the Association for Computational Linguistics, 133–138.
Res: Resnik, P. 1995. Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, 448–453.
Lin: Lin, D. 1998. An information-theoretic definition of similarity. In Proceedings of the International Conference on Machine Learning.
Jcn: Jiang, J., and Conrath, D. 1997. Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of the International Conference on Research in Computational Linguistics, 19–33.
Hso: Hirst, G., and St-Onge, D. 1998. Lexical chains as representations of context for the detection and correction of malapropisms. In Fellbaum, C., ed., WordNet: An electronic lexical database. MIT Press. 305–332.
Lesk: Banerjee, S., and Pedersen, T. 2003. Extended gloss overlaps as a measure of semantic relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, 805–810.
Vector: Patwardhan, S. 2003. Incorporating dictionary and corpus information into a context vector measure of semantic relatedness. Master’s thesis, Univ. of Minnesota, Duluth.
Links available at: http://www.comp.nus.edu.sg/~anubhavm/reading.htm
Thank You
Anubhav Madan
[email protected]