Transcript Document

IJCNLP 2008, Jan 10, 2008

Gloss-based Semantic Similarity Metrics for Predominant Sense Acquisition

Ryu Iida
Nara Institute of Science and Technology

Diana McCarthy and Rob Koeling
University of Sussex

Word Sense Disambiguation

- Predominant sense acquisition has been exploited as a powerful back-off strategy for word sense disambiguation.
- McCarthy et al. (2004): achieved 64% precision on the Senseval-2 all-words task.
  - Strongly relies on linguistic resources such as WordNet for calculating the semantic similarity.
  - Difficulty: porting it to other languages.

Focus

- How to calculate the semantic similarity score without semantic relations such as hyponymy.
- Explore the potential use of word definitions (glosses) instead of WordNet-style resources for porting McCarthy et al.'s method to other languages.

Table of contents

1. Task
2. Related work: McCarthy et al. (2004)
3. Gloss-based semantic similarity metrics
4. Experiments
   - WSD on the two datasets: EDR and the Japanese Senseval-2 task
5. Conclusion and future directions

Word Sense Disambiguation (WSD) task

- Select the correct sense of a word appearing in context.

  Example: "I ate fried chicken last Sunday."

  sense id   gloss
  1          a common farm bird that is kept for its meat and eggs
  2          the meat from this bird eaten as food
  3          (informal) someone who is not at all brave
  4          a game in which children must do something dangerous to show that they are brave

- Supervised approaches that learn the sense from the context have mainly been applied.

Word Sense Disambiguation (WSD) task (Cont'd)

- Estimate the predominant sense of a word regardless of its context.
- English coarse-grained all-words task (2007):
  - Choosing most frequent senses: 78.9%
  - Best performing system: 82.5%
- Systems using a first sense heuristic have relied on sense-tagged data.
  - However, sense-tagged data is expensive.

McCarthy et al. (2004)'s unsupervised approach

- Extract the top N neighbour words of the target word according to their distributional similarity score (simds).
- Calculate the prevalence score of each sense:
  - weight each neighbour's simds by the semantic similarity score (simss) between the neighbour and the sense;
  - sum the weighted simds over the top N neighbours;
  - the semantic similarity is estimated from linguistic resources (e.g. WordNet).
- Output the sense which has the maximum prevalence score (a minimal sketch follows below).
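
A minimal sketch of this scoring in Python (not the authors' code; neighbours, sim_ss and the other names are illustrative). Note that the original paper additionally normalises simss over the target word's senses; that step is omitted here to match the worked example on the next slides.

```python
def prevalence(sense, neighbours, sim_ss):
    """Prevalence score of one sense: sum over the top-N neighbours of the
    distributional similarity simds, weighted by the semantic similarity
    simss between the neighbour and the candidate sense.

    neighbours: list of (word, sim_ds) pairs
    sim_ss:     function (word, sense) -> semantic similarity score
    """
    return sum(sim_ds * sim_ss(word, sense) for word, sim_ds in neighbours)


def predominant_sense(senses, neighbours, sim_ss):
    """Output the sense with the maximum prevalence score."""
    return max(senses, key=lambda s: prevalence(s, neighbours, sim_ss))
```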

McCarthy et al. (2004)'s approach: An example

chicken
  sense2: the meat from this bird eaten as food.
  sense3: informal someone who is not at all brave.

neighbour   simds    simss(word, sense2)   weighted simds
turkey      0.1805   0.15                  0.0271
meat        0.1781   0.20                  0.0365
...         ...      ...                   ...
tomato      0.1573   0.10                  0.0157

(simds: distributional similarity score; simss: semantic similarity score, from WordNet)

prevalence(sense2) = 0.0271 + 0.0365 + ... + 0.0157 = 0.152

McCarthy et al. (2004)'s approach: An example (Cont'd)

chicken
  sense2: the meat from this bird eaten as food.
  sense3: informal someone who is not at all brave.

neighbour   simds    simss(word, sense3)   weighted simds
turkey      0.1805   0.01                  0.0018
meat        0.1781   0.02                  0.0037
...         ...      ...                   ...
tomato      0.1573   0.01                  0.0016

prevalence(sense2) = 0.152
prevalence(sense3) = 0.0018 + 0.0037 + ... + 0.0016 = 0.023

prevalence(sense2) > prevalence(sense3)
→ predominant sense: sense2

Problem

- While McCarthy et al.'s method works well for English, other inventories do not always have WordNet-style resources to tie the nearest neighbours to the sense inventory.
- While traditional dictionaries do not organise senses into synsets, they do typically have sense definitions (glosses) associated with the senses.

Gloss-based similarity

- Calculate the similarity between two glosses in a dictionary as the semantic similarity.
- simlesk: simply counts the overlap of the content words in the glosses of the two word senses.
- simDSlesk: uses distributional similarity as an approximation of the semantic distance between the words in the two glosses.

lesk: Example

word      gloss
chicken   the meat from this bird eaten as food
turkey    the meat from a turkey eaten as food

simlesk(chicken, turkey) = 2
("meat" and "food" overlap in the two glosses)

lesk: Example (Cont'd)

word      gloss
chicken   the meat from this bird eaten as food
tomato    a round soft red fruit eaten raw or cooked as a vegetable

simlesk(chicken, tomato) = 0
(no overlap between the two glosses; a sketch of this overlap count follows below)
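
A minimal sketch of the overlap count, assuming each gloss has already been reduced to its content nouns (the examples above count only shared nouns); the inputs are illustrative, not the authors' code.

```python
def sim_lesk(nouns_a, nouns_b):
    """Number of noun types shared by the two glosses."""
    return len(set(nouns_a) & set(nouns_b))


sim_lesk(["meat", "bird", "food"], ["meat", "turkey", "food"])  # -> 2
sim_lesk(["meat", "bird", "food"], ["fruit", "vegetable"])      # -> 0
```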

DSlesk

- Calculate distributional similarity scores for all pairs of nouns in the two glosses:

  simds(meat, fruit) = 0.1625   simds(meat, vegetable) = 0.1843
  simds(bird, fruit) = 0.1001   simds(bird, vegetable) = 0.0717
  simds(food, fruit) = 0.1857   simds(food, vegetable) = 0.1772

- Output the average, over all the nouns in the target word's gloss, of each noun's maximum distributional similarity:

  simDSlesk(chicken, tomato) = 1/3 (0.1843 + 0.1001 + 0.1857) = 0.1557

DSlesk (Cont'd)

The gloss-based similarity between a word sense $ws_i$ and a neighbour word $n$ is the best gloss match over the senses of $n$:

\[
\mathrm{sim}_{\mathrm{DSlesk}}(ws_i, n) = \max_{ws_j \in WS(n)} \mathrm{sim}(ws_i, ws_j) = \max_{ws_j \in WS(n)} \mathrm{sim}(g_i, g_j)
\]

\[
\mathrm{sim}(g_i, g_j) = \frac{1}{|\{a \in g_i\}|} \sum_{a \in g_i} \max_{b \in g_j} \mathrm{sim}_{ds}(a, b)
\]

where $g_i$ is the gloss of word sense $ws_i$, $WS(n)$ is the set of word senses of $n$, and $a$ ($b$) ranges over the nouns appearing in $g_i$ ($g_j$).
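
A direct sketch of the two equations above, assuming each gloss is given as the list of its nouns and sim_ds is a distributional similarity lookup; the function names are illustrative.

```python
def sim_gloss(nouns_i, nouns_j, sim_ds):
    """sim(g_i, g_j): average over nouns a in g_i of max_b simds(a, b)."""
    if not nouns_i or not nouns_j:
        return 0.0
    return sum(max(sim_ds(a, b) for b in nouns_j) for a in nouns_i) / len(nouns_i)


def sim_dslesk(gloss_i_nouns, glosses_of_n, sim_ds):
    """sim_DSlesk(ws_i, n): best gloss match over the senses of neighbour n."""
    return max(sim_gloss(gloss_i_nouns, g_j, sim_ds) for g_j in glosses_of_n)
```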

Apply gloss-based similarity to McCarthy et al.'s approach

chicken
  sense2: the meat from this bird eaten as food.
  sense3: informal someone who is not at all brave.

neighbour   simds    simDSlesk(word, sense2)   weighted simds
turkey      0.1805   0.3453                    0.0623
meat        0.1781   0.2323                    0.0414
...         ...      ...                       ...
tomato      0.1573   0.1557                    0.0245

prevalence(sense2) = 0.0623 + 0.0414 + ... + 0.0245 = 0.2387
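
Combining the sketches above, simDSlesk slots into prevalence() as its semantic similarity; gloss_nouns and senses_of are assumed dictionary-lookup helpers, not part of the original method.

```python
def make_sim_ss(sim_ds, gloss_nouns, senses_of):
    """Adapt sim_dslesk to the (word, sense) signature that prevalence() expects."""
    def sim_ss(word, sense):
        return sim_dslesk(gloss_nouns(sense),
                          [gloss_nouns(s) for s in senses_of(word)],
                          sim_ds)
    return sim_ss
```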

Table of contents

1. Task
2. Related work: McCarthy et al. (2004)
3. Gloss-based semantic similarity metrics
4. Experiments
   - WSD on the two datasets: EDR and the Japanese Senseval-2 task
5. Conclusion and future directions

Experiment 1: EDR

- Dataset: EDR corpus
  - 3,836 polysemous nouns (183,502 instances)
- Adopt the similarity score proposed by Lin (1998) as the distributional similarity score.
  - 9 years of Mainichi newspaper articles and 10 years of Nikkei newspaper articles
  - Japanese dependency parser CaboCha (Kudo and Matsumoto, 2002)
- Use 50 nearest neighbours, in line with McCarthy et al. (2004).

Methods

- Baseline: select one word sense at random for each word token and average the precision over 100 trials.
- Unsupervised: McCarthy et al. (2004), with the semantic similarity computed by Jiang and Conrath (1997) (jcn), lesk, or DSlesk.
- Supervised (Majority): use hand-labelled training data to obtain the predominant sense of the test words.

Results: EDR

method        recall   precision
baseline      0.402    0.402
jcn           0.495    0.495
lesk          0.474    0.488
DSlesk        0.495    0.495
upper-bound   0.745    0.745
supervised    0.731    0.731

DSlesk is comparable to jcn, without the requirement for semantic relations such as hyponymy.

Results: EDR (Cont'd)

method        all      freq ≤ 10   freq ≤ 5
baseline      0.402    0.405       0.402
jcn           0.495    0.445       0.431
lesk          0.474    0.448       0.426
DSlesk        0.495    0.453       0.433
upper-bound   0.745    0.674       0.639
supervised    0.731    0.519       0.367

All methods for finding a predominant sense outperform the supervised one for items with little data (freq ≤ 5), indicating that these methods work robustly even for low-frequency data, where hand-tagged data is unreliable.

Experiment 2 and Results: Senseval-2 in Japanese

- 50 nouns (5,000 instances)
- 4 methods: lesk, DSlesk, baseline, supervised

precision = recall:

method        fine-grained   coarse-grained
baseline      0.282          0.399
lesk          0.344          0.501
DSlesk        0.386          0.747
upper-bound   0.593          0.834
supervised    0.742          0.842

sense-id: 105-0-0-2-0 (the full id is the fine-grained sense; coarse-grained evaluation uses only its initial part)

Conclusion

- We examined different measures of semantic similarity for automatically finding a first sense heuristic for WSD in Japanese.
- We defined a new gloss-based similarity (DSlesk) and evaluated its performance on two Japanese WSD datasets (EDR and Senseval-2), outperforming lesk and achieving performance comparable to the jcn method, which relies on hyponym links that are not always available.

Future directions

- Explore other information in the glosses, such as words of other parts of speech and predicate-argument relations.
- Group fine-grained word senses into clusters, making the task suitable for NLP applications (Ide and Wilks, 2006).
- Use the results of predominant sense acquisition as prior knowledge for other approaches, e.g. graph-based approaches (Mihalcea 2005, Nastase 2008).