Dimension of Meaning
Author: Hinrich Schutze
Presenter: Marian Olteanu
Introduction
Represent contexts as vectors
Dimensions of the space – words
Initial vectors – determined by word co-occurrence
This paper – reduce dimensionality by singular value decomposition (SVD)
Applications
Word sense disambiguation (WSD)
Thesaurus induction
Introduction
Classic scheme in IR
Documents are represented as vectors of words in term space
Extension – represent contexts as vectors of words within a fixed window
Disadvantage – the same content can be expressed with different words that are close in meaning
This approach
Represent words as term vectors that reflect their pattern of usage in a large corpus
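A minimal sketch of how such usage-based term vectors could be built, assuming simple symmetric-window co-occurrence counts. The function name `cooccurrence_vectors`, the window size of 2, and the toy sentence are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter, defaultdict

def cooccurrence_vectors(tokens, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not counted as its own neighbour
                vectors[word][tokens[j]] += 1
    return vectors

# Toy corpus (hypothetical); a real word space would use a large corpus.
tokens = "the bank raised its interest rate and the bank approved the loan".split()
vectors = cooccurrence_vectors(tokens, window=2)
print(vectors["bank"].most_common(3))
```

Each word's Counter plays the role of its term vector: the keys are the dimensions (words) and the values are the co-occurrence counts.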
Introduction
Dimensions in this space: cash, sport
Similarity measure: cosine of the angle between vectors
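The cosine measure can be sketched as follows. The two-dimensional coordinates on the (cash, sport) axes are made-up values chosen only to illustrate the idea; they do not come from the paper.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical coordinates on the (cash, sport) axes.
money = [9.0, 1.0]
bank = [8.0, 2.0]
soccer = [1.0, 9.0]

print(cosine(money, bank))    # close to 1: similar usage
print(cosine(money, soccer))  # much smaller: dissimilar usage
```

Because the cosine depends only on direction, a frequent word and a rare word with the same usage pattern still come out as similar.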
Introduction
Compute a representation of context that is more robust than bag-of-words
Centroid (normalized average) of the vectors of the words in a context
Practical applications
Thousands of dimensions (words)
Co-occurrence matrix with only 10% zeros
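The centroid representation of a context can be sketched like this, assuming every word already has a vector of the same length. The `word_vectors` table and its values are hypothetical.

```python
import math

def centroid(vectors):
    """Normalized average of a list of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[k] for v in vectors) / n for k in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    # Leave an all-zero mean unchanged instead of dividing by zero.
    return [x / norm for x in mean] if norm > 0 else mean

# Hypothetical word vectors on two dimensions.
word_vectors = {
    "interest": [8.0, 1.0],
    "loan": [9.0, 0.0],
    "goal": [0.0, 7.0],
}
context = ["interest", "loan", "goal"]
context_vector = centroid([word_vectors[w] for w in context])
print(context_vector)
```

The context vector points mostly along the first dimension because two of the three context words do, which is the sense in which the centroid summarizes the context.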
Application
WSD
Done by clustering the contexts
Clustering algorithms: AutoClass, Buckshot
Assign a sense to each cluster
Word space
Window size, dimension sets
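The clustering step for WSD can be illustrated with a small sketch. This is a plain k-means-style procedure using cosine similarity, swapped in for illustration only; it is not AutoClass or Buckshot. The context vectors, the seed indices, and the sense names are hypothetical.

```python
import math

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cluster_contexts(contexts, seeds, iterations=10):
    """K-means-style clustering by cosine similarity; returns one label per context."""
    points = [normalize(c) for c in contexts]
    centers = [points[s] for s in seeds]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each context to the most similar center
        # (the dot product of unit vectors is their cosine).
        labels = [
            max(range(len(centers)),
                key=lambda k: sum(a * b for a, b in zip(p, centers[k])))
            for p in points
        ]
        # Recompute each center as the normalized mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centers[k] = normalize(mean)
    return labels

# Hypothetical context vectors for occurrences of one ambiguous word.
contexts = [[9.0, 1.0], [8.0, 2.0], [1.0, 9.0], [2.0, 8.0]]
labels = cluster_contexts(contexts, seeds=[0, 2])

# Senses are assigned to clusters by hand (names are illustrative).
sense_names = {labels[0]: "financial sense", labels[2]: "other sense"}
print([sense_names[lab] for lab in labels])
```

A new occurrence of the word would then be disambiguated by computing its context vector and picking the sense of the nearest cluster.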
Discussion
Resembles latent semantic indexing (LSI)
Uses SVD
Purpose of space reduction
LSI – improve the quality of the representation (because of null values)
This paper
Reduce the computation
Detect term dependencies (similar terms)
SVD does not affect the accuracy of WSD
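The dimensionality reduction can be sketched with NumPy's `numpy.linalg.svd`, assuming a small dense co-occurrence matrix. The helper name `reduce_dimensions`, the choice of k = 2, and the counts are illustrative assumptions; the paper's actual matrix sizes and number of retained dimensions are not given in these slides.

```python
import numpy as np

def reduce_dimensions(matrix, k):
    """Project each row of `matrix` onto its top-k singular directions."""
    u, s, _vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]  # one k-dimensional vector per row (word)

# Hypothetical co-occurrence counts: 4 words (rows) x 5 dimensions (columns).
counts = np.array([
    [4.0, 3.0, 0.0, 1.0, 0.0],
    [3.0, 4.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 4.0, 3.0, 2.0],
    [1.0, 0.0, 3.0, 4.0, 2.0],
])
reduced = reduce_dimensions(counts, k=2)
print(reduced.shape)
```

Later similarity computations then work on k numbers per word instead of one number per original dimension, which is where the saving in computation comes from. In this toy matrix, words with similar rows also end up close together in the reduced space.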
Discussion
Small number of parameters (thousands) compared to other statistical approaches (e.g., trigrams)