+ Manaal Faruqui and Chris Dyer Language Technologies Institute Carnegie Mellon University Improving Vector Space Word Representations Using Multilingual Correlation.
+
Manaal Faruqui and Chris Dyer
Language Technologies Institute
Carnegie Mellon University
Improving Vector Space
Word Representations
Using Multilingual
Correlation
+
Distributional Semantics
“You shall know a word by the company it keeps”
(Harris, 1954; Firth, 1957)
…I will take what is mine with fire and blood…
…the end battle would be between fire and ice…
…My dragons are large and can breathe fire now…
…flame is the visible portion of a fire…
…take place whereby fires can sustain their own heat…
+
Translational Semantics
What other Information?
That plane can seat more than 300 people
तीन सौ से अधिक लोगों को बैठाने वाला वायुयान …
रूसी वायुयान बहुत बड़े हैं
Russian airplanes are huge
plane ≅ airplane
Multilingual Information!
(Bannard & Callison-Burch, 2005)
+
Outline
Distributional Semantics: monolingual context
Translational Semantics: multilingual context
Better Semantic Representations
Using Distributional + Translational semantics
+
Word Vector Representations
How to encode such co-occurrences?
Co-occurrence counts (rows: words, columns: contexts)
          day   night   …   cold
sleep       0      10          2
winter      3       3         50
…
the        10      12          9
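The counting behind such a table can be sketched as follows. This is only an illustration: the function name `cooccurrence_counts`, the symmetric window of 2 tokens, and the toy sentences are assumptions, since the slide does not state how contexts were defined.

```python
from collections import Counter, defaultdict


def cooccurrence_counts(sentences, window=2):
    """Count how often each context word occurs within `window` tokens of each word.

    `window` is an assumed parameter; the slides do not give the context size.
    """
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a word is not its own context
                    counts[word][tokens[j]] += 1
    return counts


# Toy corpus (hypothetical, for illustration only)
sentences = [
    "the winter night is cold".split(),
    "we sleep at night".split(),
]
counts = cooccurrence_counts(sentences, window=2)
```

Each row `counts[word]` then plays the role of one row of the words-by-contexts table.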
+
Word Vector Representation
Latent Semantic Analysis
(Deerwester et al., 1990)
[Diagram: Singular Value Decomposition reduces the words × context matrix to a lower-dimensional matrix of word vectors]
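A minimal sketch of the SVD step, assuming NumPy and a small dense count matrix. The helper name `lsa_vectors`, the toy counts, and the choice to scale the left singular vectors by the singular values are assumptions for illustration; the slides do not say how the counts were weighted or scaled.

```python
import numpy as np


def lsa_vectors(cooc, k):
    """Return k-dimensional word vectors from a words x contexts matrix."""
    # Thin SVD: cooc = U @ diag(s) @ Vt, singular values in descending order
    U, s, Vt = np.linalg.svd(cooc, full_matrices=False)
    # Keep the top-k directions; scaling by s is one common convention
    return U[:, :k] * s[:k]


# Toy counts for three words over three contexts (illustrative values)
cooc = np.array([
    [0.0, 10.0, 2.0],    # sleep
    [3.0, 3.0, 50.0],    # winter
    [10.0, 12.0, 9.0],   # the
])
vectors = lsa_vectors(cooc, k=2)  # one 2-dimensional vector per word
```

On a real vocabulary the matrix is large and sparse, so a sparse truncated solver would be the more practical choice than a dense SVD.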
+
Multilingual Information
English: dragon
German: Drache
French: dragon
Spanish: dragón
Append the vectors of the translations? Problem?
+
Multilingual Information
Disadvantages of Vector Concatenation
Vector size increases
Idiosyncratic information
What if word is OOV?
+
Multilingual Information
So, what can we do?
…I will take what is mine with fire and blood…
…the end battle would be between fire and ice…
…My dragons are large and can breathe fire now…
… Das Ende der Schlacht würde zwischen Feuer und Eis …
… gesehen ist Feuer eine Oxidationsreaktion mit …
… Das Licht des Feuers ist eine physikalische Erscheinung …
Two Views: Canonical Correlation Analysis !
+
Canonical Correlation Analysis
(CCA)
Project two sets of vectors (of equal cardinality) into a space where they are maximally correlated
[Diagram: CCA projects Ω and Θ into a shared space where Ω ≅ Θ]
Convex Optimization Problem with Exact Solution !
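For reference, the quantity being maximized can be written out for a single pair of projection directions. The symbols here are assumptions introduced for illustration: w and v are projection directions for the two views, Σ_xx and Σ_yy are the covariance matrices of the vectors in Ω and Θ, and Σ_xy is their cross-covariance.

```latex
\rho(w, v) =
  \frac{w^{\top} \Sigma_{xy} v}
       {\sqrt{w^{\top} \Sigma_{xx} w}\,\sqrt{v^{\top} \Sigma_{yy} v}},
\qquad
(w^{*}, v^{*}) = \arg\max_{w, v} \rho(w, v)
```

Further pairs of directions are found the same way, subject to being uncorrelated with the pairs already chosen.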
+
Canonical Correlation Analysis
(CCA)
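W, V = CCA(Ω, Θ)
$X'' = X W$, with $X \in \mathbb{R}^{n_1 \times d_1}$, $W \in \mathbb{R}^{d_1 \times k}$, $X'' \in \mathbb{R}^{n_1 \times k}$
$Y'' = Y V$, with $Y \in \mathbb{R}^{n_2 \times d_2}$, $V \in \mathbb{R}^{d_2 \times k}$, $Y'' \in \mathbb{R}^{n_2 \times k}$
$k = \min(r(\Omega), r(\Theta))$
X'' and Y'' are now maximally correlated!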
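The projection step can be sketched in NumPy as below. This is not the tool used in the talk; it is a textbook formulation of CCA (whitening each view, then an SVD of the whitened cross-covariance). The function name `cca`, the small ridge term `reg`, and the returned correlations are assumptions added for illustration.

```python
import numpy as np


def cca(X, Y, k, reg=1e-8):
    """Return W (d1 x k), V (d2 x k) and the top-k canonical correlations.

    X and Y must have the same number of rows (paired observations).
    `reg` is an assumed ridge term that keeps the covariances invertible.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(C):
        # Inverse matrix square root of a symmetric positive-definite matrix
        vals, vecs = np.linalg.eigh(C)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the correlations
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    W = Kx @ U[:, :k]
    V = Ky @ Vt.T[:, :k]
    return W, V, s[:k]
```

Once W and V are estimated from the paired rows, any centered vector from the first vocabulary can be projected with W, including words that had no aligned counterpart.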
+
Canonical Correlation Analysis
(CCA)
Problems Addressed?
Vector size increases: doesn’t increase
Idiosyncratic information: lets you choose!
What if word is OOV?: projection vectors for everyone!
+
Canonical Correlation Analysis
(CCA)
Ok, but equal-cardinality sets Ω & Θ?
The vocabularies can’t be of equal size!
Get word alignments from a parallel corpus
Preserve only words in the original vocabulary
For every word in English, select the best foreign word
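The pairing steps above can be sketched as follows, assuming that "best" means the foreign word most frequently aligned to the English word. The function name `best_translations` and the input format (a flat list of aligned word pairs, such as could be read off a word aligner's output) are hypothetical.

```python
from collections import Counter, defaultdict


def best_translations(aligned_pairs, en_vocab, foreign_vocab):
    """Map each English word to its most frequently aligned foreign word.

    Pairs with a word outside either vocabulary are dropped, so every
    selected pair has a vector on both sides.
    """
    counts = defaultdict(Counter)
    for en, foreign in aligned_pairs:
        if en in en_vocab and foreign in foreign_vocab:
            counts[en][foreign] += 1
    return {en: c.most_common(1)[0][0] for en, c in counts.items()}


# Toy alignment links (hypothetical)
pairs = [
    ("fire", "Feuer"), ("fire", "Feuer"), ("fire", "Brand"),
    ("ice", "Eis"), ("dragon", "Drache"), ("blood", "Blut"),
]
selected = best_translations(
    pairs,
    en_vocab={"fire", "ice", "dragon"},
    foreign_vocab={"Feuer", "Brand", "Eis", "Drache", "Blut"},
)
```

The resulting one-to-one pairs give two vector sets with the same number of rows, which is what CCA requires.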
+
Experimental Setup
LSA Word Vector Learning
Monolingual Data   English       German        French        Spanish
News Corpus        WMT-2011      WMT-2011      WMT 2011-12   WMT-2011
Tokens             360,000,000   290,000,000   263,000,000   164,000,000
Types              180,000       294,000       137,000       145,000
Tokenizer and Lowercasing: WMT scripts
+
Experimental Setup
LSA Word Vector Learning
Parallel Data          De-En         Fr-En         Es-En
News Comm + Europarl   WMT           WMT           WMT
Tokens                 128,000,000   138,000,000   134,000,000
Word pairs             37,000        38,000        38,000
Word Alignment Tool: fast_align (Dyer et al, 2013)
+
Experimental Setup
LSA Word Vector Learning
Corpus Preprocessing
...hello… …hello… …hello… …hello… …hello…
Context :
23.45, 21st, 10-20-2014, 0.5e10 → NUM
anchfgugsjh, wekjfbg, bhguyq → UNK
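A minimal sketch of this normalization. The rules are assumptions: any token containing a digit becomes NUM, and any token seen fewer than `min_count` times becomes UNK. The slide gives only examples, not the exact criteria or threshold.

```python
import re
from collections import Counter

HAS_DIGIT = re.compile(r"\d")


def normalize(tokens, min_count=2):
    """Replace numeric tokens with NUM and rare tokens with UNK.

    `min_count` is a hypothetical threshold, not a value from the talk.
    """
    mapped = ["NUM" if HAS_DIGIT.search(t) else t for t in tokens]
    freq = Counter(mapped)
    return [t if t == "NUM" or freq[t] >= min_count else "UNK" for t in mapped]


tokens = ["hello", "23.45", "hello", "21st", "anchfgugsjh", "0.5e10"]
normalized = normalize(tokens)
```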
+
Experimental Setup
Evaluation Benchmarks
Word Similarity Evaluation
WS-353 (Finkelstein et al., 2001)
WS-353-SIM (Agirre et al., 2009)
WS-353-REL (Agirre et al., 2009)
RG-65 (Rubenstein and Goodenough, 1965)
MC-30 (Miller and Charles, 1991)
MTurk-287 (Radinsky et al., 2011)
Word Relation Evaluation
Semantic Relations (Mikolov et al., 2013)
Syntactic Relations (Mikolov et al., 2013)
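Word similarity benchmarks of this kind are commonly scored by ranking word pairs by cosine similarity and comparing that ranking with the human ratings using Spearman's correlation. The sketch below uses only the standard library; the function names, the toy vectors, and the choice to skip pairs with a missing word are assumptions.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman(xs, ys):
    # Spearman's correlation is Pearson's correlation of the ranks
    return pearson(average_ranks(xs), average_ranks(ys))


def evaluate_similarity(vectors, gold_pairs):
    """Spearman's correlation between model and human similarity scores."""
    model, gold = [], []
    for w1, w2, score in gold_pairs:
        if w1 in vectors and w2 in vectors:  # assumed: skip missing words
            model.append(cosine(vectors[w1], vectors[w2]))
            gold.append(score)
    return spearman(model, gold)


# Toy vectors and ratings (hypothetical)
vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
gold_pairs = [("a", "b", 9.0), ("a", "c", 1.0), ("b", "c", 2.0)]
score = evaluate_similarity(vectors, gold_pairs)
```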
+
Experimental Setup
Multilingual Vector Learning
Monolingual Vector, Length: 80
Multilingual Vector, Length: ?
The length in projected space can be chosen: ‘k’
Choose the best value of ‘k’ for WS-353
k ∈ {0.1, 0.2, …, 1.0}, as a fraction of the original vector length
+
Experimental Setup
Multilingual Vector Learning
[Plot: Spearman’s correlation (y-axis) vs. Dimensions (x-axis). Performance on WS-353; k = 0.6]
+
Experimental Setup
Multilingual Vector Learning
[Bar chart: Spearman’s correlation (axis 0 to 70), Monolingual vs. Multilingual vectors, on WS-353, RG-65, and MTurk-287]
+
Experimental Setup
Multilingual Vector Learning
[Bar chart: Accuracy (axis 0 to 35), Monolingual vs. Multilingual vectors, on Semantic and Syntactic relations]
+
Experimental Setup
Multilingual Vectors: Neural Networks
RNNLM
(Mikolov et al, 2011)
Predict next word given the history
Neural language model
Recurrent hidden layer connections
Skip-Gram, word2vec
(Mikolov et al, 2013)
Predict context given the word
Removes hidden layer
Vocabulary represented in Huffman coding
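To illustrate the Huffman-coded vocabulary, the sketch below builds binary codes from word frequencies so that frequent words receive shorter codes. It is a generic Huffman construction, not the word2vec implementation; the function name `huffman_codes` and the toy frequencies are assumptions.

```python
import heapq
from itertools import count


def huffman_codes(freqs):
    """Return {word: binary code string} for a non-empty frequency table."""
    tiebreak = count()  # unique counter so equal frequencies never compare dicts
    heap = [(f, next(tiebreak), {w: ""}) for w, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {w: "0" + c for w, c in left.items()}
        merged.update({w: "1" + c for w, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


# Toy frequencies (hypothetical)
codes = huffman_codes({"the": 50, "fire": 10, "dragon": 3, "ice": 2})
```

With codes like these, predicting a word becomes a short sequence of binary decisions, which is cheaper for frequent words.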
+
Experimental Setup
Multilingual Vector Learning
[Bar charts, two panels (axes 0 to 50 and 0 to 80): Mono vs. Multi vectors for RNNLM and Skip-Gram]
+
Experimental Setup
Multilingual Vectors: Scaling
Spearman’s correlation on WS-353
+
Experimental Setup
Multilingual Vectors: Qualitative Analysis
Antonyms and Synonyms of “Beautiful”: Monolingual Setting
t-SNE tool (van der Maaten and Hinton, 2008)
+
Experimental Setup
Multilingual Vectors: Qualitative Analysis
Antonyms and Synonyms of “Beautiful”: Multilingual Setting
t-SNE tool (van der Maaten and Hinton, 2008)
+
Conclusion
CCA: Easy-to-use tool in MATLAB
Take vectors from two languages and improve them.
Multilingual Information is Important
Even if the problems are inherently monolingual.
More Effective for Distributional Vectors
Semantics generalizes better than Syntax.
Vectors available at: http://cs.cmu.edu/~mfaruqui
+
Related Work
Document representation: Vinokourov et al., 2002; Platt et al., 2010
Bilingual word vectors: Klementiev et al., 2012; Zou et al., 2013
Synonymy and Paraphrasing: Bannard and Callison-Burch, 2005; Ganitkevitch et al., 2013
Bilingual lexicon induction: Haghighi et al., 2008; Vulić and Moens, 2013
Translation Models: Kalchbrenner & Blunsom, 2013
Compositional Semantics: Hermann & Blunsom, 2014
+
Thanks!
Visit us at ACL-demo:
wordvectors.org