Dimensionality Reduction
Dimensionality Reduction
• High-dimensional == many features
• Find concepts/topics/genres:
– Documents:
• Features: Thousands of words, millions of word pairs
– Surveys – Netflix: 480k users x 17.7k movies
Slides by Jure Leskovec
Dimensionality Reduction
• Compress / reduce dimensionality:
– 10^6 rows; 10^3 columns; no updates
– random access to any cell(s); small error: OK
Dimensionality Reduction
• Assumption: Data lies on or near a low
d-dimensional subspace
• Axes of this subspace are an effective
representation of the data
Why Reduce Dimensions?
• Discover hidden correlations/topics
– Words that occur commonly together
• Remove redundant and noisy features
– Not all words are useful
• Interpretation and visualization
• Easier storage and processing of the data
SVD - Definition
A[m x n] = U[m x r] Σ[r x r] (V[n x r])^T
• A: Input data matrix
– m x n matrix (e.g., m documents, n terms)
• U: Left singular vectors
– m x r matrix (m documents, r concepts)
• Σ: Singular values
– r x r diagonal matrix (strength of each ‘concept’)
(r : rank of the matrix A)
• V: Right singular vectors
– n x r matrix (n terms, r concepts)
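A minimal NumPy sketch of the definition above, assuming the 7 x 5 users-to-movies ratings matrix that this deck uses as its running example (variable names are illustrative, not from the slides). `np.linalg.svd` returns min(m, n) singular triplets, so the sketch keeps only the first r = rank(A) of them to match the m x r, r x r and n x r shapes:

```python
import numpy as np

# Users-to-movies ratings (rows: users; columns: Matrix, Alien,
# Serenity, Casablanca, Amelie), copied from the running example.
A = np.array([
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
], dtype=float)

# Thin SVD: U is m x min(m, n), s holds the singular values,
# Vt is min(m, n) x n.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the r = rank(A) non-zero singular triplets.
r = np.linalg.matrix_rank(A)
U_r, s_r, V_r = U[:, :r], s[:r], Vt[:r, :].T

print(U_r.shape, s_r.shape, V_r.shape)              # (7, 2) (2,) (5, 2)
print(np.round(s_r, 2))                             # [9.64 5.29]
print(np.allclose(A, U_r @ np.diag(s_r) @ V_r.T))   # True
```

The signs of the columns of `U_r` and `V_r` may differ from the slides, since flipping a matching pair of singular vectors leaves the product unchanged.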
SVD
[Figure: block diagram of A (m x n) = U Σ V^T]
SVD
[Figure: A (m x n) ≈ σ1 u1 v1^T + σ2 u2 v2^T + ...]
σi … scalar
ui … vector
vi … vector
SVD - Properties
It is always possible to decompose a real
matrix A into A = U Σ V^T, where
• Σ: unique; U, V: unique up to sign when the singular values are distinct
• U, V: column orthonormal:
– U^T U = I; V^T V = I (I: identity matrix)
– (Cols. are orthogonal unit vectors)
• Σ: diagonal
– Entries (singular values) are non-negative,
and sorted in decreasing order (σ1 ≥ σ2 ≥ ... ≥ 0)
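The properties listed here can be checked numerically. The sketch below assumes an arbitrary random 6 x 4 test matrix (not from the slides) and verifies column orthonormality and the ordering of the singular values. It also shows that flipping the sign of a matching pair (u_i, v_i) leaves the product unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)      # fixed seed; the matrix itself is arbitrary
A = rng.normal(size=(6, 4))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T

# Column orthonormality: U^T U = I and V^T V = I.
cols_orthonormal = (np.allclose(U.T @ U, np.eye(4))
                    and np.allclose(V.T @ V, np.eye(4)))

# Singular values are non-negative and sorted in decreasing order.
sorted_nonneg = bool(np.all(s >= 0) and np.all(np.diff(s) <= 0))

# Flipping the sign of a matching pair (u_1, v_1) gives another valid
# factorization with the same Sigma.
U2, V2 = U.copy(), V.copy()
U2[:, 0] *= -1
V2[:, 0] *= -1
same_product = np.allclose(U2 @ np.diag(s) @ V2.T, A)

print(cols_orthonormal, sorted_nonneg, same_product)   # True True True
```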
SVD – Example: Users-to-Movies
• A = U Σ V^T - example (rows: users; columns: movies):

           Matrix  Alien  Serenity  Casablanca  Amelie
SciFi        1       1       1          0         0
             2       2       2          0         0
             1       1       1          0         0
             5       5       5          0         0
Romance      0       0       0          2         2
             0       0       0          3         3
             0       0       0          1         1

=

  0.18  0
  0.36  0
  0.18  0
  0.90  0
  0     0.53
  0     0.80
  0     0.27

x

  9.64  0
  0     5.29

x

  0.58  0.58  0.58  0     0
  0     0     0     0.71  0.71
SVD – Example: Users-to-Movies
• A = U Σ V^T - example: same users-to-movies decomposition as on the previous slide, with the two concepts labeled:
– first concept (first column of U, σ1 = 9.64, first row of V^T): SciFi-concept
– second concept (second column of U, σ2 = 5.29, second row of V^T): Romance-concept
SVD - Example
• A = U Σ V^T - example (same users-to-movies decomposition as above):
– U is the “user-to-concept” similarity matrix: its first column is the SciFi-concept, its second column the Romance-concept
SVD - Example
• A = U Σ V^T - example (same users-to-movies decomposition as above):
– σ1 = 9.64 is the ‘strength’ of the SciFi-concept
SVD - Example
• A = U Σ V^T - example (same users-to-movies decomposition as above):
– V is the “movie-to-concept” similarity matrix: the first row of V^T, (0.58 0.58 0.58 0 0), is the SciFi-concept
SVD - Interpretation #1
‘movies’, ‘users’ and ‘concepts’:
• U: user-to-concept similarity matrix
• V: movie-to-concept sim. matrix
• Σ: its diagonal elements:
‘strength’ of each concept
SVD - Interpretation #2
SVD gives the best axis to project on:
• ‘best’ = minimum sum of squares of projection errors
• i.e., minimum reconstruction error
[Figure: scatter plot of users; x-axis: Movie 1 rating, y-axis: Movie 2 rating; the first right singular vector v1 is drawn as the best-fit axis]
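The ‘best axis’ claim can be illustrated with a small experiment. This is a sketch on hypothetical two-movie ratings generated at random (not data from the slides), and it assumes axes through the origin, since the SVD is applied without mean-centering. It compares the sum of squared projection errors along v1 with that along a grid of other directions.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical ratings of two movies by 200 users, roughly along one line.
t = rng.uniform(0, 5, size=200)
X = np.column_stack([t, 0.8 * t + rng.normal(scale=0.3, size=200)])

def projection_error(X, axis):
    """Sum of squared distances from the rows of X to the line spanned by axis."""
    axis = axis / np.linalg.norm(axis)
    projected = np.outer(X @ axis, axis)
    return float(np.sum((X - projected) ** 2))

# First right singular vector of the (uncentered) data matrix.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
v1 = Vt[0]

best = projection_error(X, v1)
# Compare against a grid of other unit directions through the origin.
angles = np.linspace(0, np.pi, 181)
others = [projection_error(X, np.array([np.cos(a), np.sin(a)])) for a in angles]

print(best <= min(others) + 1e-9)   # True
```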
SVD - Interpretation #2
• A = U Σ V^T - example (same users-to-movies decomposition as above):
– the first row of V^T, (0.58 0.58 0.58 0 0), is the first right singular vector v1
[Figure: scatter plot with axes Movie 1 rating and Movie 2 rating; the first right singular vector v1 is drawn as the best-fit axis]
SVD - Interpretation #2
• A = U Σ V^T - example (same users-to-movies decomposition as above):
– σ1 = 9.64 measures the variance (‘spread’) on the v1 axis
SVD - Interpretation #2
More details
• Q: How exactly is dim. reduction done?
A = U Σ V^T, with Σ = diag(9.64, 5.29) (same users-to-movies decomposition as in the example above)
SVD - Interpretation #2
More details
• Q: How exactly is dim. reduction done?
• A: Set the smallest singular values to zero
A = U x diag(9.64, 5.29) x V^T (same users-to-movies decomposition as in the example above)
SVD - Interpretation #2
More details
• Q: How exactly is dim. reduction done?
• A: Set the smallest singular values to zero
A ≈ U x diag(9.64, 0) x V^T (same U and V^T as in the example above; the smaller singular value 5.29 has been set to zero)
SVD - Interpretation #2
More details
• Q: How exactly is dim. reduction done?
• A: Set the smallest singular values to zero:
A ≈ u1 x σ1 x v1^T, with
u1 = (0.18 0.36 0.18 0.90 0 0 0)^T
σ1 = 9.64
v1^T = (0.58 0.58 0.58 0 0)
SVD - Interpretation #2
More details
• Q: How exactly is dim. reduction done?
• A: Set the smallest singular values to zero
A =                ~     B =
  1 1 1 0 0                1 1 1 0 0
  2 2 2 0 0                2 2 2 0 0
  1 1 1 0 0                1 1 1 0 0
  5 5 5 0 0                5 5 5 0 0
  0 0 0 2 2                0 0 0 0 0
  0 0 0 3 3                0 0 0 0 0
  0 0 0 1 1                0 0 0 0 0
Frobenius norm:
$\|M\|_F = \sqrt{\sum_{ij} M_{ij}^2}$
$\|A-B\|_F = \sqrt{\sum_{ij} (A_{ij}-B_{ij})^2}$
is “small”
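A minimal sketch of the truncation step, assuming NumPy and the users-to-movies matrix from the running example: the smallest singular values are zeroed (here only the largest, k = 1, is kept), B is rebuilt, and the Frobenius norm of A - B is measured.

```python
import numpy as np

A = np.array([
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 1
s_trunc = s.copy()
s_trunc[k:] = 0.0                    # zero all but the k largest singular values
B = U @ np.diag(s_trunc) @ Vt        # rank-k approximation of A

frob_error = float(np.linalg.norm(A - B, ord="fro"))
print(round(frob_error, 2))          # 5.29
```

For this matrix the error equals the discarded singular value σ2, because only the three ‘Romance’ rows are lost.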
[Figure: block diagrams of A = U Σ V^T and of B = U Σ V^T with a truncated Σ; B is approx. A]
SVD – Best Low Rank Approx.
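• Theorem: Let $A = U \Sigma V^T$ ($\sigma_1 \ge \sigma_2 \ge \dots$, $\mathrm{rank}(A) = r$),
then $B = U S V^T$
– $S$ = diagonal $r \times r$ matrix where $s_i = \sigma_i$ ($i = 1 \dots k$) else $s_i = 0$
is a best rank-k approximation to A:
– B is a solution to $\min_B \|A-B\|_F$ where $\mathrm{rank}(B) = k$
• $\Sigma = \mathrm{diag}(\sigma_{11}, \dots, \sigma_{rr})$
• We will need 2 facts:
– $\|M\|_F^2 = \sum_i (q_{ii})^2$ where M = P Q R is the SVD of M
– $U \Sigma V^T - U S V^T = U (\Sigma - S) V^T$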
SVD – Best Low Rank Approx.
• We will need 2 facts:
– $\|M\|_F^2 = \sum_k (q_{kk})^2$ where M = P Q R is the SVD of M
We apply:
-- P column orthonormal
-- R row orthonormal
-- Q is diagonal
– $U \Sigma V^T - U S V^T = U (\Sigma - S) V^T$
SVD – Best Low Rank Approx.
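• $A = U \Sigma V^T$, $B = U S V^T$ ($\sigma_1 \ge \sigma_2 \ge \dots \ge 0$, $\mathrm{rank}(A) = r$)
– $S$ = diagonal $r \times r$ matrix where $s_i = \sigma_i$ ($i = 1 \dots k$) else $s_i = 0$
then B is a solution to $\min_B \|A-B\|_F$, $\mathrm{rank}(B) = k$
• Why?
$$\min_{B,\ \mathrm{rank}(B)=k} \|A-B\|_F^2 = \min_{S} \|\Sigma - S\|_F^2 = \min_{s_i} \sum_{i=1}^{r} (\sigma_i - s_i)^2$$
using $U \Sigma V^T - U S V^T = U (\Sigma - S) V^T$
• We want to choose $s_i$ to minimize $\sum_i (\sigma_i - s_i)^2$: we set $s_i = \sigma_i$ ($i = 1 \dots k$), else $s_i = 0$:
$$\min_{s_i} \sum_{i=1}^{r} (\sigma_i - s_i)^2 = \min_{s_i} \Big[ \sum_{i=1}^{k} (\sigma_i - s_i)^2 + \sum_{i=k+1}^{r} \sigma_i^2 \Big] = \sum_{i=k+1}^{r} \sigma_i^2$$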
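The closing identity, that the squared Frobenius error of the rank-k truncation equals the sum of the discarded squared singular values, can be checked numerically. The sketch below assumes an arbitrary random 8 x 5 matrix and a hypothetical helper `truncate`. As a sanity check rather than a proof, it also compares against one other rank-2 matrix, built from the two smallest singular triplets.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(8, 5))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

def truncate(k):
    """B = U S V^T with s_i = sigma_i for the k largest values, 0 otherwise."""
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

# ||A - B||_F^2 should equal the sum of the discarded sigma_i^2 for every k.
for k in range(len(s) + 1):
    lhs = np.linalg.norm(A - truncate(k), ord="fro") ** 2
    rhs = np.sum(s[k:] ** 2)
    assert np.isclose(lhs, rhs)

# Another rank-2 matrix (from the two smallest triplets) should not beat
# the truncated SVD in Frobenius error.
k = 2
other = (U[:, -k:] * s[-k:]) @ Vt[-k:, :]
truncation_wins = bool(np.linalg.norm(A - truncate(k)) <= np.linalg.norm(A - other))
print(truncation_wins)   # True
```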
SVD - Interpretation #2
Equivalent:
‘spectral decomposition’ of the matrix:
A = [u1 u2] x diag(σ1, σ2) x [v1 v2]^T (A: the users-to-movies example matrix from above)
SVD - Interpretation #2
Equivalent:
‘spectral decomposition’ of the matrix
A (m x n) = σ1 u1 v1^T + σ2 u2 v2^T + ... (k terms; each ui is m x 1, each vi^T is 1 x n)
Assume: σ1 ≥ σ2 ≥ σ3 ≥ ... ≥ 0
Why is setting small σs to zero the thing to do?
Vectors ui and vi are unit length, so σi scales them.
So, zeroing small σs introduces less error.
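The sum-of-rank-one-terms view can be sketched as follows, assuming NumPy and the users-to-movies matrix from the running example. Terms σi ui vi^T are added one at a time and the remaining Frobenius error is recorded after each.

```python
import numpy as np

A = np.array([
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Accumulate sigma_i * u_i * v_i^T term by term and track the error.
partial = np.zeros_like(A)
errors = []
for i in range(len(s)):
    partial += s[i] * np.outer(U[:, i], Vt[i, :])
    errors.append(float(np.linalg.norm(A - partial, ord="fro")))

# After one term the error equals sigma_2; after two terms (the rank of A)
# it is numerically zero.
print(np.isclose(errors[0], s[1]), errors[1] < 1e-9)   # True True
```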
SVD - Interpretation #2
Q: How many σs to keep?
A: Rule of thumb:
keep 80-90% of the ‘energy’ (= Σi σi²)
A = σ1 u1 v1^T + σ2 u2 v2^T + ... (assume: σ1 ≥ σ2 ≥ σ3 ≥ ...)
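One way to apply the energy rule of thumb is sketched below. The helper `rank_for_energy` is hypothetical (not part of any library), and the singular values are the rounded ones from the running example.

```python
import numpy as np

def rank_for_energy(singular_values, fraction=0.9):
    """Smallest k whose top-k singular values retain `fraction` of the energy."""
    energy = np.asarray(singular_values, dtype=float) ** 2
    cumulative = np.cumsum(energy) / np.sum(energy)
    # First position where the retained share reaches the target (with a
    # tiny tolerance for round-off), converted from an index to a count.
    k = int(np.searchsorted(cumulative, fraction - 1e-12)) + 1
    return min(k, len(energy))

s = np.array([9.64, 5.29])          # singular values of the running example
print(rank_for_energy(s, 0.75))     # 1
print(rank_for_energy(s, 0.90))     # 2
```

In the running example the first singular value alone retains about 77% of the energy, so an 80-90% target keeps both concepts.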
SVD - Complexity
• To compute SVD:
– O(nm²) or O(n²m) (whichever is less)
• But:
– Less work, if we just want singular values
– or if we want first k singular vectors
– or if the matrix is sparse
• Implemented in linear algebra packages like
– LINPACK, Matlab, SPlus, Mathematica ...
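As an illustration in NumPy (not one of the packages listed above), the sketch below requests only the singular values, then separately keeps the first k singular vectors of a thin SVD. The matrix is an arbitrary random one. Note that slicing a full SVD does not save work by itself. For large sparse matrices, iterative routines such as SciPy's `scipy.sparse.linalg.svds` compute only the leading k triplets.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(200, 50))

# Singular values only: U and V are not returned.
s_only = np.linalg.svd(A, compute_uv=False)

# Thin SVD for comparison, then keep the first k singular vectors.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 5
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

print(np.allclose(s_only, s))               # True
print(U_k.shape, s_k.shape, Vt_k.shape)     # (200, 5) (5,) (5, 50)
```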
SVD - Conclusions so far
• SVD: A = U Σ V^T: unique
– U: user-to-concept similarities
– V: movie-to-concept similarities
– Σ: strength of each concept
• Dimensionality reduction:
– keep the few largest singular values
(80-90% of ‘energy’)
– SVD: picks up linear correlations
Case study: How to query?
Q: Find users that like ‘Matrix’ and ‘Alien’
(Users-to-movies example A = U Σ V^T from above: columns Matrix, Alien, Serenity, Casablanca, Amelie; rows grouped into SciFi and Romance users; Σ = diag(9.64, 5.29))
Case study: How to query?
Q: Find users that like ‘Matrix’
A: Map query into a ‘concept space’ – how?
(Users-to-movies example A = U Σ V^T from above: columns Matrix, Alien, Serenity, Casablanca, Amelie; Σ = diag(9.64, 5.29))
Case study: How to query?
Q: Find users that like ‘Matrix’
A: map query vectors into ‘concept space’ – how?
q = [5 0 0 0 0] (columns: Matrix, Alien, Serenity, Casablanca, Amelie)
Project into concept space:
Inner product with each ‘concept’ vector vi
[Figure: query vector q and concept axes v1, v2, plotted on axes labeled Matrix and Alien]
Case study: How to query?
Q: Find users that like ‘Matrix’
A: map the vector into ‘concept space’ – how?
q = [5 0 0 0 0] (columns: Matrix, Alien, Serenity, Casablanca, Amelie)
Project into concept space:
Inner product with each ‘concept’ vector vi
[Figure: query vector q and concept axes v1, v2, plotted on axes labeled Matrix and Alien; the projection q*v1 of q onto v1 is marked]
Case study: How to query?
Compactly, we have:
qconcept = q V
E.g.:
q = [5 0 0 0 0] (columns: Matrix, Alien, Serenity, Casablanca, Amelie)

V (movie-to-concept similarities; first column: SciFi-concept) =
  0.58  0
  0.58  0
  0.58  0
  0     0.71
  0     0.71

qconcept = q V = [2.9 0]
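The mapping qconcept = q V is a single matrix-vector product. The sketch below assumes NumPy and uses the rounded V from this slide:

```python
import numpy as np

# Movie-to-concept matrix V (rows: Matrix, Alien, Serenity, Casablanca,
# Amelie; columns: SciFi-concept, Romance-concept), rounded as on the slide.
V = np.array([
    [0.58, 0.00],
    [0.58, 0.00],
    [0.58, 0.00],
    [0.00, 0.71],
    [0.00, 0.71],
])

q = np.array([5, 0, 0, 0, 0], dtype=float)   # query "user" who rated only 'Matrix'

# Inner product of q with each concept vector (each column of V).
q_concept = q @ V
print(np.round(q_concept, 2).tolist())       # [2.9, 0.0]
```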
Case study: How to query?
How would the user d that rated
(‘Alien’, ‘Serenity’) be handled?
dconcept = d V
E.g.:
d = [0 4 5 0 0] (columns: Matrix, Alien, Serenity, Casablanca, Amelie)

V (movie-to-concept similarities; first column: SciFi-concept) =
  0.58  0
  0.58  0
  0.58  0
  0     0.71
  0     0.71

dconcept = d V = [5.22 0]
Case study: How to query?
Observation: User d that rated (‘Alien’,
‘Serenity’) will be similar to query “user” q
that rated (‘Matrix’), although d did not rate
‘Matrix’!

Movie space (Matrix, Alien, Serenity, Casablanca, Amelie):
d = [0 4 5 0 0]
q = [5 0 0 0 0]
Similarity = 0

Concept space (SciFi-concept, Romance-concept):
dconcept = [5.22 0]
qconcept = [2.9 0]
Similarity ≠ 0
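The observation can be reproduced with a short sketch. The slide does not say which similarity measure it uses, so cosine similarity is an assumption here. V is the rounded movie-to-concept matrix from the example, and `cosine` is a hypothetical helper.

```python
import numpy as np

V = np.array([
    [0.58, 0.00],
    [0.58, 0.00],
    [0.58, 0.00],
    [0.00, 0.71],
    [0.00, 0.71],
])
q = np.array([5, 0, 0, 0, 0], dtype=float)   # rated only 'Matrix'
d = np.array([0, 4, 5, 0, 0], dtype=float)   # rated 'Alien' and 'Serenity'

def cosine(a, b):
    """Cosine similarity; defined as 0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

sim_movie_space = cosine(q, d)             # no movie rated by both users
sim_concept_space = cosine(q @ V, d @ V)   # both load on the SciFi-concept

print(sim_movie_space)                     # 0.0
print(sim_concept_space > 0.99)            # True
```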
SVD: Drawbacks
+ Optimal low-rank approximation:
• in Frobenius norm
- Interpretability problem:
– A singular vector specifies a linear
combination of all input columns or rows
- Lack of sparsity:
– Singular vectors are dense!
[Figure: block diagram of A = U Σ V^T with dense factors U and V^T]