
Linguistic Regularities in Sparse and Explicit Word Representations

Omer Levy and Yoav Goldberg, Bar-Ilan University, Israel

[Pie chart: Papers in ACL 2014 by topic* : "Neural Networks & Word Embeddings" vs. "Other Topics". *Sampling error: +/- 100%]

Neural Embeddings β€’ Dense vectors β€’ Each dimension is a latent feature β€’ Common software package: word2vec πΌπ‘‘π‘Žπ‘™π‘¦: (βˆ’7.35, 9.42, 0.88, … ) ∈ ℝ 100 β€’

β€œMagic”

king βˆ’ man + woman = queen

(analogies)

Representing words as vectors is not new!

Explicit Representations (Distributional)
• Sparse vectors
• Each dimension is an explicit context
• Common association metric: PMI / PPMI
• Italy: {Rome: 17, pasta: 5, Fiat: 2, …} ∈ ℝ^|Vocab|, |Vocab| ≈ 100,000
• Does the same "magic" work for explicit representations too?
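To make the contrast with dense embeddings concrete, here is a minimal sketch (not the authors' code) of how a PPMI-weighted explicit vector can be built from co-occurrence counts; the toy `pairs` data and the function name are illustrative assumptions.

```python
# Minimal sketch: sparse, explicit PPMI vectors from (word, context) co-occurrence pairs.
from collections import Counter
from math import log

def ppmi_vectors(pairs):
    """pairs: iterable of (word, context) tuples, e.g. from a window-based extractor."""
    wc = Counter(pairs)                      # joint counts #(w, c)
    w_counts = Counter(w for w, _ in pairs)  # marginal counts #(w)
    c_counts = Counter(c for _, c in pairs)  # marginal counts #(c)
    total = sum(wc.values())

    vectors = {}
    for (w, c), n in wc.items():
        pmi = log((n * total) / (w_counts[w] * c_counts[c]))
        if pmi > 0:                          # PPMI: keep only positive associations
            vectors.setdefault(w, {})[c] = pmi
    return vectors

# Toy usage: each word maps to a sparse dict {context: PPMI weight}.
pairs = [("Italy", "Rome"), ("Italy", "pasta"), ("Italy", "Fiat"),
         ("France", "Paris"), ("France", "wine")]
print(ppmi_vectors(pairs)["Italy"])
```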

β€’ Baroni et al. (2014) showed that embeddings outperform explicit, but…

Questions β€’

Are analogies unique to neural embeddings?

Compare neural embeddings with explicit representations β€’

Why does vector arithmetic reveal analogies?

Unravel the mystery behind neural embeddings and their β€œmagic”

Background

Mikolov et al. (2013a,b,c) β€’ Neural embeddings have interesting geometries

Mikolov et al. (2013a,b,c) β€’ Neural embeddings have interesting geometries β€’ These patterns capture β€œrelational similarities” β€’ Can be used to solve analogies: man is to woman as king is to queen

Mikolov et al. (2013a,b,c) β€’ Neural embeddings have interesting geometries β€’ These patterns capture β€œrelational similarities” β€’ Can be used to solve analogies: π‘Ž is to π‘Ž βˆ— as 𝑏 is to 𝑏 βˆ— β€’ Can be recovered by β€œsimple” vector arithmetic: π‘Ž βˆ’ π‘Ž βˆ— = 𝑏 βˆ’ 𝑏 βˆ—

Mikolov et al. (2013a,b,c) β€’ Neural embeddings have interesting geometries β€’ These patterns capture β€œrelational similarities” β€’ Can be used to solve analogies: π‘Ž is to π‘Ž βˆ— as 𝑏 is to 𝑏 βˆ— β€’ With simple vector arithmetic: π‘Ž βˆ’ π‘Ž βˆ— = 𝑏 βˆ’ 𝑏 βˆ—

Mikolov et al. (2013a,b,c) π‘Ž βˆ’ π‘Ž βˆ— = 𝑏 βˆ’ 𝑏 βˆ—

Mikolov et al. (2013a,b,c) 𝑏 βˆ’ π‘Ž + π‘Ž βˆ— = 𝑏 βˆ—

Mikolov et al. (2013a,b,c) 𝑏 king βˆ’ π‘Ž man + π‘Ž βˆ— woman = 𝑏 βˆ— queen

Mikolov et al. (2013a,b,c) 𝑏 Tokyo βˆ’ π‘Ž Japan + π‘Ž βˆ— France = 𝑏 βˆ— Paris

Mikolov et al. (2013a,b,c) 𝑏 best βˆ’ π‘Ž good + π‘Ž βˆ— strong = 𝑏 βˆ— strongest

Mikolov et al. (2013a,b,c) 𝑏 best βˆ’ π‘Ž good + π‘Ž βˆ— strong = 𝑏 βˆ— strongest vectors in ℝ 𝑛

Are analogies unique to neural embeddings?
• Experiment: compare embeddings to explicit representations
• Learn both representations from the same corpus
• Evaluate with the same recovery method: arg max_{b* ∈ V} cos(b*, b − a + a*)

Analogy Datasets β€’ 4 words per analogy: π‘Ž is to π‘Ž βˆ— as 𝑏 is to 𝑏 βˆ— β€’ Given 3 words: π‘Ž is to π‘Ž βˆ— as 𝑏 is to ?

β€’ Guess the best suiting β€’ 𝑏 βˆ— from the entire vocabulary 𝑉 Excluding the question words π‘Ž , π‘Ž βˆ— , 𝑏 β€’ β€’

MSR: Google:

~ 8000 syntactic analogies ~ 19,000 syntactic and semantic analogies

Embedding vs Explicit (Round 1)
[Bar chart: analogy accuracy]
• MSR: Embedding 54%, Explicit 29%
• Google: Embedding 63%, Explicit 45%
Many analogies are recovered by the explicit representation, but many more by the embedding.

Why does vector arithmetic reveal analogies?
• We wish to find the b* closest to b − a + a*
• This is done with cosine similarity:
  arg max_{b* ∈ V} cos(b*, b − a + a*) = arg max_{b* ∈ V} [ cos(b*, b) − cos(b*, a) + cos(b*, a*) ]
  (for length-normalized vectors the two objectives rank candidates identically, since ‖b − a + a*‖ is constant over b*)

vector arithmetic = similarity arithmetic
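A quick numeric sanity check of this equivalence, with random unit vectors standing in for word vectors (purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 1000, 50
X = rng.normal(size=(V, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)          # unit-length "word" vectors
a, a_star, b = X[0], X[1], X[2]

combined = X @ (b - a + a_star)                         # ∝ cos(x, b − a + a*) for unit-norm rows
decomposed = X @ b - X @ a + X @ a_star                 # cos(x, b) − cos(x, a) + cos(x, a*)

# Same ranking, hence the same argmax (the two differ only by the constant ‖b − a + a*‖ factor).
print(np.argmax(combined) == np.argmax(decomposed))     # True
```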

Why does vector arithmetic reveal analogies?
• We wish to find the x closest to king − man + woman
• With cosine similarity:
  arg max_x cos(x, king − man + woman) = arg max_x [ cos(x, king) − cos(x, man) + cos(x, woman) ]
• Each similarity term asks about a different aspect: cos(x, king) → royal? … cos(x, woman) → female?

vector arithmetic = similarity arithmetic

What does each similarity term mean?

β€’ Observe the joint features with explicit representations!

𝒒𝒖𝒆𝒆𝒏 ∩ π’Œπ’Šπ’π’ˆ uncrowned majesty second … 𝒒𝒖𝒆𝒆𝒏 ∩ π’˜π’π’Žπ’‚π’ Elizabeth Katherine impregnate …

Can we do better?

Let’s look at some mistakes…

Let’s look at some mistakes… England βˆ’ London + Baghdad = ?

Let’s look at some mistakes… England βˆ’ London + Baghdad = Iraq

Let’s look at some mistakes… England βˆ’ London + Baghdad = Mosul?

The Additive Objective

cos(Iraq, England) − cos(Iraq, London) + cos(Iraq, Baghdad) = 0.15 − 0.13 + 0.63 = 0.65
cos(Mosul, England) − cos(Mosul, London) + cos(Mosul, Baghdad) = 0.13 − 0.14 + 0.75 = 0.74

• Problem: one similarity might dominate the rest (here cos(·, Baghdad) drowns out the other terms, so Mosul beats Iraq)
• Much more prevalent in the explicit representation
• Might explain why explicit underperformed

How can we do better?
• Instead of adding similarities, multiply them!

arg max_{b*} [ cos(b*, b) · cos(b*, a*) ] / cos(b*, a)

Embedding vs Explicit (Round 2)
Multiplication > Addition
[Bar chart: analogy accuracy per objective]
• Embedding: MSR Add 54%, Mul 59%; Google Add 63%, Mul 67%
• Explicit: MSR Add 29%, Mul 57%; Google Add 45%, Mul 68%

Explicit is on par with Embedding
[Bar chart: analogy accuracy with the multiplicative objective]
• MSR: Embedding 59%, Explicit 57%
• Google: Embedding 67%, Explicit 68%

Explicit is on-par with Embedding β€’ Embeddings are not β€œmagical” β€’ Embedding-based similarities have a more uniform distribution β€’ The additive objective performs better on smoother distributions β€’ The multiplicative objective overcomes this issue

Conclusion β€’ Are analogies unique to neural embeddings?

No!

They occur in sparse and explicit representations as well.

β€’ Why does vector arithmetic reveal analogies?

Because vector arithmetic is equivalent to

similarity arithmetic

.

β€’ Can we do better?

Yes!

The multiplicative objective is significantly better.

More Results and Analyses (in the paper)
• Evaluation on closed-vocabulary analogy questions (SemEval 2012)
• Experiments with a third objective function (PairDirection)
• Do different representations reveal the same analogies?
• Error analysis
• A feature-level interpretation of how word similarity reveals analogies

Thanks βˆ’ for + listening = )