Dimension of Meaning Author: Hinrich Schutze Presenter: Marian Olteanu


Dimension of Meaning
Author: Hinrich Schutze
Presenter: Marian Olteanu
Introduction




Represent context as vectors
Dimensions of the space: words
Initial vectors: determined by word occurrence
This paper: reduce dimensionality by singular value decomposition
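
For reference, the rank-k truncation behind this reduction can be written as below. The symbols are assumptions, not taken from the slides: A is the m-by-n matrix of initial word vectors, and k is the number of dimensions kept.

```latex
A = U \Sigma V^{\top}, \qquad
A \approx A_k = U_k \Sigma_k V_k^{\top}, \qquad
\Sigma_k = \operatorname{diag}(\sigma_1, \dots, \sigma_k), \quad
\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_k \ge 0
```

Under this notation, row i of U_k \Sigma_k serves as the reduced k-dimensional vector for word i.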

Applications


Word sense disambiguation (WSD)
Thesaurus induction
Introduction

Classic scheme in IR: documents are represented as vectors of words in term space
Disadvantage: the same content can be expressed with different words that are close in meaning
Extension: represent contexts as vectors of the words within a fixed window
This approach: represent words as term vectors that reflect their pattern of usage in a large corpus
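
A minimal sketch of how such term vectors might be collected, assuming plain co-occurrence counts inside a symmetric window. The function name, the window size, and the toy sentence are hypothetical and not from the paper.

```python
from collections import Counter, defaultdict

def cooccurrence_vectors(tokens, window=2):
    """Map each word to counts of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not counted as its own neighbour
                vectors[word][tokens[j]] += 1
    return vectors

# Toy corpus (hypothetical); a real corpus would be far larger.
tokens = "the bank approved the loan and the bank raised cash".split()
vectors = cooccurrence_vectors(tokens, window=2)
```

Each Counter is a sparse term vector: its keys are the dimensions (words) and its values are the counts along them.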
Introduction

Dimensions in this space: cash, sport
Measure: cosine of the angle between vectors
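
The cosine measure can be sketched as follows. The three vectors are hypothetical counts along the (cash, sport) dimensions, chosen only for illustration.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)

# Hypothetical (cash, sport) counts for three words.
bank = [8.0, 1.0]
loan = [6.0, 0.0]
team = [0.0, 5.0]

sim_bank_loan = cosine(bank, loan)  # close to 1: similar usage
sim_bank_team = cosine(bank, team)  # close to 0: dissimilar usage
```

Because the cosine depends only on direction, frequent and rare words with the same usage pattern come out as similar.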
Introduction

Compute a representation of context more robust than bag-of-words: the centroid (normalized average) of the vectors of the words in a context
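
A minimal sketch of the centroid computation, assuming "normalized" means scaling the average to unit length. The word vectors and the context are hypothetical.

```python
import math

def centroid(vectors):
    """Component-wise average of the vectors, scaled to unit length."""
    n = len(vectors)
    dims = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dims)]
    length = math.sqrt(sum(x * x for x in mean))
    return [x / length for x in mean] if length > 0 else mean

# Hypothetical two-dimensional word vectors.
word_vectors = {
    "interest": [3.0, 0.0],
    "goal": [0.0, 4.0],
}

context = ["interest", "goal"]
context_vector = centroid([word_vectors[w] for w in context])
```

The resulting context vector lives in the same space as the word vectors, so the cosine measure applies to it unchanged.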
Practical applications


Thousands of dimensions (words)
Co-occurrence matrix with only 10% zeros
Application

WSD: done by clustering the contexts
Clustering algorithms: AutoClass, Buckshot
Assign a sense to each cluster
Word space
Window size, dimension sets
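
The cluster-then-label idea can be sketched as below. This is not AutoClass or Buckshot: a simple k-means-style loop with cosine similarity stands in for them. The context vectors, the seeds, and the sense names are all hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def nearest(point, centers):
    """Index of the center most similar to `point`."""
    return max(range(len(centers)), key=lambda k: cosine(point, centers[k]))

def cluster_contexts(points, seeds, iterations=10):
    """k-means-style stand-in: assign by cosine, then re-average each cluster."""
    centers = [list(s) for s in seeds]
    labels = []
    for _ in range(iterations):
        labels = [nearest(p, centers) for p in points]
        for k in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical context vectors for occurrences of one ambiguous word.
contexts = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels, centers = cluster_contexts(contexts, seeds=[contexts[0], contexts[2]])

# A sense is assigned to each cluster by hand (hypothetical names).
sense_of_cluster = {0: "sense A", 1: "sense B"}

# A new occurrence gets the sense of its nearest cluster.
new_context = [0.7, 0.3]
predicted = sense_of_cluster[nearest(new_context, centers)]
```

Only the step that names each cluster needs human input; the clustering itself is unsupervised.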
Discussion

Resembles LSI: uses SVD
Purpose of space reduction
LSI: improve the quality of the representation (because of null values)
This paper
Reducing the computation
Detection of term dependencies (similar terms)
SVD doesn’t influence the accuracy of WSD
Discussion

Small number of parameters (thousands) compared to other statistical approaches (e.g., trigrams)