Chapter 28 – Part II
Matrix Operations








- Gaussian elimination
- LU factorization
- Gaussian elimination with partial pivoting
- LUP factorization
- Error analysis
- Complexity of matrix multiplication & inversion
- SVD and PCA
- SVD and PCA applications
- Project 2 – image compression using SVD
Eigenvalues & Eigenvectors

Eigenvectors (for a square m×m matrix S):
Sv = λv, where v ≠ 0 is a (right) eigenvector and λ is the corresponding eigenvalue.
How many eigenvalues are there at most?
Sv = λv  ⟺  (S − λI)v = 0, which only has a non-zero solution if |S − λI| = 0;
this is an m-th order equation in λ which can have at most m distinct
solutions (roots of the characteristic polynomial) – they can be
complex even though S is real.
Some of these slides are adapted from notes of Stanford CS276 & CMU CS385
Matrix-vector multiplication

S = [3 0 0]
    [0 2 0]
    [0 0 0]

has eigenvalues 3, 2, 0 with corresponding eigenvectors

v1 = (1, 0, 0)^T,  v2 = (0, 1, 0)^T,  v3 = (0, 0, 1)^T.

On each eigenvector, S acts as a multiple of the identity matrix: but as a different
multiple on each.
Any vector (say x = (2, 4, 6)^T) can be viewed as a combination of the eigenvectors:
x = 2v1 + 4v2 + 6v3
Matrix-vector multiplication

Thus a matrix-vector multiplication such as Sx can be rewritten in terms of the
eigenvalues/vectors:
Sx = S(2v1 + 4v2 + 6v3)
Sx = 2Sv1 + 4Sv2 + 6Sv3 = 2λ1v1 + 4λ2v2 + 6λ3v3
Even though x is an arbitrary vector, the action of S on x is determined by the
eigenvalues/vectors.
Suggestion: the effect of "small" eigenvalues is small.
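To make this concrete, here is a minimal numpy sketch (added for this transcript, not from the original slides) that builds S = diag(3, 2, 0), expands x = (2, 4, 6)^T in the eigenvector basis, and checks that Sx equals 2λ1v1 + 4λ2v2 + 6λ3v3:

```python
import numpy as np

S = np.diag([3.0, 2.0, 0.0])        # eigenvalues 3, 2, 0 on the diagonal
v1, v2, v3 = np.eye(3)              # the eigenvectors are the standard basis vectors
x = np.array([2.0, 4.0, 6.0])       # x = 2*v1 + 4*v2 + 6*v3

lhs = S @ x                                   # ordinary matrix-vector product
rhs = 2 * 3 * v1 + 4 * 2 * v2 + 6 * 0 * v3    # 2*lam1*v1 + 4*lam2*v2 + 6*lam3*v3
print(lhs, rhs)                               # both are [6. 8. 0.]
```

Note how the zero eigenvalue simply wipes out the v3 component of x, which is the sense in which the effect of "small" eigenvalues is small.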
Eigenvalues & Eigenvectors
- For symmetric matrices, eigenvectors for distinct eigenvalues are orthogonal:
  Sv1 = λ1v1, Sv2 = λ2v2, and λ1 ≠ λ2  ⇒  v1 · v2 = 0
- All eigenvalues of a real symmetric matrix are real:
  if |S − λI| = 0 and S = S^T, then λ ∈ R
- All eigenvalues of a positive semidefinite matrix are non-negative:
  if w^T S w ≥ 0 for all w ∈ R^n, then Sv = λv  ⇒  λ ≥ 0
Example

S = [2 1]
    [1 2]

Real, symmetric.

Let |S − λI| = |2−λ   1 | = (2 − λ)^2 − 1 = 0.
              | 1   2−λ|

Then the eigenvalues are 1 and 3 (non-negative, real).
The eigenvectors are orthogonal (and real):
(1, −1)^T and (1, 1)^T
Eigen/diagonal Decomposition

Let S be a square matrix with m linearly independent eigenvectors.
Theorem: there exists an eigen decomposition S = UΛU^-1.
Columns of U are eigenvectors of S.
Diagonal elements of Λ are eigenvalues of S.

Diagonal decomposition (2)

Let U have the eigenvectors as columns:
U = [v1 ... vm]
Then SU can be written
SU = S[v1 ... vm] = [λ1v1 ... λmvm] = [v1 ... vm] diag(λ1, ..., λm)
Thus SU = UΛ, or U^-1 S U = Λ,
and S = UΛU^-1.
Diagonal decomposition - example

Recall S = [2 1];  λ1 = 1, λ2 = 3.
           [1 2]

The eigenvectors (1, −1)^T and (1, 1)^T form

U = [ 1  1]
    [-1  1]

Inverting U:

U^-1 = [1/2  -1/2]
       [1/2   1/2]

Then S = UΛU^-1 = [ 1  1] [1 0] [1/2  -1/2]
                  [-1  1] [0 3] [1/2   1/2]
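The factorization above can be checked numerically; this sketch (added here, not part of the slides) rebuilds S from U, Λ, and U^-1:

```python
import numpy as np

U = np.array([[ 1.0, 1.0],
              [-1.0, 1.0]])          # eigenvectors (1, -1) and (1, 1) as columns
Lam = np.diag([1.0, 3.0])            # matching eigenvalues on the diagonal
U_inv = np.linalg.inv(U)             # [[0.5, -0.5], [0.5, 0.5]]

print(U @ Lam @ U_inv)               # [[2. 1.] [1. 2.]] -- recovers S
```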
Example continued

Let's divide U (and multiply U^-1) by √2. Then S = QΛQ^T:

S = [ 1/√2  1/√2] [1 0] [1/√2  -1/√2]
    [-1/√2  1/√2] [0 3] [1/√2   1/√2]

(the first factor is Q; Q^-1 = Q^T)
Symmetric Eigen Decomposition

If S is a symmetric matrix:
Theorem: there exists a (unique) eigen decomposition S = QΛQ^T
where Q is orthogonal:
Q^-1 = Q^T
Columns of Q are normalized eigenvectors.
Columns are orthogonal.
(everything is real)
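A hedged numpy sketch of this theorem (illustration only, using a randomly generated symmetric matrix rather than anything from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
S = A + A.T                                        # an arbitrary symmetric matrix

evals, Q = np.linalg.eigh(S)                       # columns of Q: normalized eigenvectors
print(np.allclose(Q @ np.diag(evals) @ Q.T, S))    # True: S = Q Lambda Q^T
print(np.allclose(Q.T @ Q, np.eye(4)))             # True: Q is orthogonal, Q^-1 = Q^T
print(np.isrealobj(evals), np.isrealobj(Q))        # True True: everything is real
```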
Exercise

Examine the symmetric eigen decomposition, if any, for each of the following matrices:

[ 0  1]    [0  1]    [ 1  2]    [2  2]
[-1  0]    [1  0]    [-2  3]    [2  4]
Singular Value Decomposition

For an m×n matrix A of rank r there exists a factorization
(Singular Value Decomposition = SVD) as follows:
A = UΣV^T
where U is m×m, Σ is m×n, and V is n×n.
The columns of U are orthogonal eigenvectors of AA^T.
The columns of V are orthogonal eigenvectors of A^TA.
The eigenvalues λ1 ... λr of AA^T are the eigenvalues of A^TA.
σi = √λi
Σ = diag(σ1 ... σr); the σi are the singular values.
Singular Value Decomposition

[Figure: illustration of SVD dimensions and sparseness.]
SVD example

Let A = [1 -1]
        [0  1]
        [1  0]

Thus m = 3, n = 2. Its SVD, A = UΣV^T, is

U = [  0     2/√6    1/√3]
    [ 1/√2  -1/√6    1/√3]
    [ 1/√2   1/√6   -1/√3]

Σ = [1   0]
    [0  √3]
    [0   0]

V^T = [1/√2   1/√2]
      [1/√2  -1/√2]

Typically, the singular values are arranged in decreasing order.
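The example can be reproduced with numpy (a sketch added for this transcript; numpy returns the singular values in decreasing order, and the signs of the singular vectors may differ from the factorization written above):

```python
import numpy as np

A = np.array([[1.0, -1.0],
              [0.0,  1.0],
              [1.0,  0.0]])

U, s, Vt = np.linalg.svd(A)            # full SVD: U is 3x3, Vt is 2x2
print(s)                               # [1.732..., 1.0], i.e. sqrt(3) and 1
print(np.linalg.eigvalsh(A.T @ A))     # [1., 3.]: the squared singular values of A

Sigma = np.zeros_like(A)               # rebuild the 3x2 Sigma
Sigma[:2, :2] = np.diag(s)
print(np.allclose(U @ Sigma @ Vt, A))  # True: A = U Sigma V^T
```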
Low-rank Approximation

SVD can be used to compute optimal low-rank approximations.
Approximation problem: find A_k of rank k such that
A_k = argmin_{X : rank(X) = k} ||A − X||_F
A_k and X are both m×n matrices.
Typically, we want k << r.
Frobenius norm: ||A||_F = (Σ_i Σ_j a_ij^2)^(1/2), i.e. the 2-norm of A viewed as a vector of its entries.
Low-rank Approximation

Solution via SVD:
A_k = U diag(σ1, ..., σk, 0, ..., 0) V^T    (set the smallest r − k singular values to zero)
A_k = Σ_{i=1}^{k} σi ui vi^T    (column notation: a sum of rank-1 matrices)
Approximation error

How good (bad) is this approximation?
It is the best possible, as measured by the Frobenius norm of the error:
min_{X : rank(X) = k} ||A − X||_F = ||A − A_k||_F = (σ_{k+1}^2 + ... + σ_r^2)^(1/2),
where the σi are ordered such that σi ≥ σ_{i+1}.
(It can also be shown that ||A − A_k||_2 = σ_{k+1}.)
This suggests why the Frobenius error drops as k is increased.
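A minimal sketch (added here, not from the slides) of the rank-k construction and both error formulas, using a random matrix:

```python
import numpy as np

def low_rank_approx(A, k):
    """Best rank-k approximation of A (truncated SVD)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_trunc = s.copy()
    s_trunc[k:] = 0.0                       # zero out the smallest r - k singular values
    return U @ np.diag(s_trunc) @ Vt        # = sum of the k leading rank-1 terms

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 5))
k = 2
Ak = low_rank_approx(A, k)

s = np.linalg.svd(A, compute_uv=False)                           # singular values only
print(np.linalg.norm(A - Ak, 'fro'), np.sqrt(np.sum(s[k:]**2)))  # equal: Frobenius error
print(np.linalg.norm(A - Ak, 2), s[k])                           # equal: 2-norm error = sigma_{k+1}
```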
SVD Application

Image compression
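As a pointer toward the image-compression project, here is a hedged sketch (the random stand-in image and the use of matplotlib are assumptions, not from the slides) that keeps only the top k singular triplets of a grayscale image:

```python
import numpy as np
import matplotlib.pyplot as plt        # assumed available, used only for display

# img: a 2-D array of grayscale intensities. A random stand-in is used here;
# replace it with a real image, e.g. the mean over the color channels of plt.imread(...).
rng = np.random.default_rng(2)
img = rng.random((256, 256))

U, s, Vt = np.linalg.svd(img, full_matrices=False)

k = 20                                              # keep the 20 largest singular values
img_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]       # rank-k approximation of the image

m, n = img.shape                                    # storage: k*(m+n+1) numbers vs m*n pixels
print("compression ratio:", (m * n) / (k * (m + n + 1)))

plt.subplot(1, 2, 1); plt.imshow(img, cmap="gray"); plt.title("original")
plt.subplot(1, 2, 2); plt.imshow(img_k, cmap="gray"); plt.title(f"rank {k}")
plt.show()
```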
SVD example

Eigenvalues of A^TA? Eigenvectors? Matrix U?
[Figure: action of A on a unit circle, decomposed as apply V^T (rotation), then Σ (scaling), then U (rotation).]
PCA

Principal Components Analysis
Data Presentation

Example: 53 blood and urine measurements (wet chemistry) from 65 people (33 alcoholics, 32 non-alcoholics).

Matrix format (9 subjects, selected measurements):

      H-WBC   H-RBC   H-Hgb   H-Hct   H-MCV   H-MCH   H-MCHC
A1     8.0    4.82    14.1     41      85      29       34
A2     7.3    5.02    14.7     43      86      29       34
A3     4.3    4.48    14.1     41      91      32       35
A4     7.5    4.47    14.9     45     101      33       33
A5     7.3    5.52    15.4     46      84      28       33
A6     6.9    4.86    16.0     47      97      33       34
A7     7.8    4.68    14.7     43      92      31       34
A8     8.6    4.82    15.8     42      88      33       37
A9     5.1    4.71    14.0     43      92      30       32

Spectral format:
[Figure: each subject's measurements plotted as value vs. measurement index.]
Data Presentation

[Figure: univariate, bivariate, and trivariate views of the measurements (H-Bands, C-LDH, C-Triglycerides, M-EPI), plotted per person and against each other.]
Data Presentation

Is there a better presentation than the original coordinate axes?
Do we need a 53-dimensional space to view the data?
How to find the 'best' low-dimensional space that conveys the maximum useful information?
One answer: find the "Principal Components".
Principal Components

The first PC is the direction of maximum variance from the origin.
Subsequent PCs are orthogonal to the 1st PC and describe maximum residual variance.

[Figure: two scatter plots in the (Wavelength 1, Wavelength 2) plane, showing PC 1 along the direction of maximum variance and PC 2 orthogonal to it.]
The Goal

We wish to explain/summarize the underlying variance-covariance structure of a large set of variables through a few linear combinations of these variables.
Applications

Uses:
- Data Visualization
- Data Reduction
- Data Classification
- Trend Analysis
- Factor Analysis
- Noise Reduction

Examples:
- How many unique "sub-sets" are in the sample?
- How are they similar / different?
- What are the underlying factors that influence the samples?
- Which time / temporal trends are (anti)correlated?
- Which measurements are needed to differentiate?
- How best to present what is "interesting"?
- To which "sub-set" does this new sample rightfully belong?
Trick: Rotate Coordinate Axes

Suppose we have a population measured on p random variables X1, ..., Xp. Note that these random variables represent the p axes of the Cartesian coordinate system in which the population resides. Our goal is to develop a new set of p axes (linear combinations of the original p axes) in the directions of greatest variability:

[Figure: data cloud in the (X1, X2) plane with new axes rotated into the directions of greatest variability.]

This is accomplished by rotating the axes.
PCA: General
From k original variables: x1,x2,...,xk:
Produce k new variables: y1,y2,...,yk:
y1 = a11x1 + a12x2 + ... + a1kxk
y2 = a21x1 + a22x2 + ... + a2kxk
...
yk = ak1x1 + ak2x2 + ... + akkxk
such that:
- the yk's are uncorrelated (orthogonal)
- y1 explains as much as possible of the original variance in the data set
- y2 explains as much as possible of the remaining variance
- etc.
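As a sketch of this recipe (added for this transcript), the coefficient vectors (a_i1, ..., a_ik) can be taken as eigenvectors of the data's covariance matrix; the resulting y's are then uncorrelated and their variances are the eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 3)) @ np.array([[2.0, 0.3, 0.0],
                                              [0.0, 1.0, 0.5],
                                              [0.0, 0.0, 0.2]])   # correlated toy data (n x k)

B = X - X.mean(axis=0)                        # center each variable
S = np.cov(B, rowvar=False)                   # k x k covariance matrix
evals, A = np.linalg.eigh(S)                  # columns of A hold the coefficients a_ij

Y = B @ A                                     # the new variables y_1 ... y_k
print(np.round(np.cov(Y, rowvar=False), 6))   # ~diagonal: the y's are uncorrelated
print(evals)                                  # their variances, smallest to largest
```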
[Figure: data scatter with the 1st Principal Component (y1) and the 2nd Principal Component (y2) drawn as rotated axes through the data.]
PCA Scores

[Figure: the same scatter; each point's coordinates (yi,1, yi,2) along the principal-component axes are its scores.]
PCA Eigenvalues

[Figure: the same scatter; the eigenvalues λ1 and λ2 measure the variance of the data along PC 1 and PC 2.]
PCA: Another Explanation

The new variables y1, ..., yk defined above are the Principal Components: they are uncorrelated (orthogonal), y1 explains as much as possible of the original variance in the data set, y2 as much as possible of the remaining variance, and so on.
Example

Data (rows = students, columns = brands):

            Apple   Samsung   Nokia
student1      1        2        1
student2      4        2       13
student3      7        8        1
student4      8        4        5
Mean          5        4        5

Subtracting each column's mean gives the centered matrix B:

            Apple   Samsung   Nokia
student1     -4       -2       -4
student2     -1       -2        8
student3      2        4       -4
student4      3        0        0
Example (continued)

Sample covariance matrix: S = B^T B / (n − 1) =

[10   6    0]
[ 6   8   -8]
[ 0  -8   32]

PCA of B → find the eigenvalues and eigenvectors of S: S = VDV^T

V = [ 0.5686   0.8193  -0.0740]
    [-0.7955   0.5247  -0.3030]
    [-0.2094   0.2312   0.9501]

D = [1.6057    0         0     ]
    [0        13.843     0     ]
    [0         0        34.5513]

1st principal component: [-0.0740, -0.3030, 0.9501]^T,  y1 = -0.0740·A - 0.3030·S + 0.9501·N
2nd principal component: [ 0.8193,  0.5247,  0.2312]^T,  y2 =  0.8193·A + 0.5247·S + 0.2312·N
3rd principal component: [ 0.5686, -0.7955, -0.2094]^T,  y3 =  0.5686·A - 0.7955·S - 0.2094·N
Total variance = tr(D) = 50 = tr(S)
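These numbers can be reproduced with a short numpy sketch (added here; np.linalg.eigh returns eigenvalues in ascending order, and eigenvectors are determined only up to sign):

```python
import numpy as np

# Rows = students, columns = (Apple, Samsung, Nokia), as in the table above.
X = np.array([[1.0, 2.0,  1.0],
              [4.0, 2.0, 13.0],
              [7.0, 8.0,  1.0],
              [8.0, 4.0,  5.0]])

B = X - X.mean(axis=0)                  # mean-centered data
S = B.T @ B / (X.shape[0] - 1)          # sample covariance: [[10, 6, 0], [6, 8, -8], [0, -8, 32]]

evals, V = np.linalg.eigh(S)            # S = V D V^T with D = diag(evals)
print(evals)                            # ~[1.6057, 13.8430, 34.5513], summing to 50
print(V[:, -1])                         # 1st principal component, up to sign [-0.0740, -0.3030, 0.9501]

C = B @ V                               # scores: the data expressed in the PC basis
print(np.round(C, 4))
```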
New view of the data

y1 = -0.0740·A - 0.3030·S + 0.9501·N, etc.

C = BV =

            y3        y2        y1
student1   0.1542   -5.2514   -2.8984
student2  -0.6528   -0.0191    8.2808
student3  -1.2072    2.8126   -5.1604
student4   1.7058    2.4579   -0.2220

[Figure: the original coordinate axes (x1, x2) versus the rotated principal-component axes (y1, y2), overlaid on the data.]
Another look at the Data

Rescale the data ("Var^-1 * B") to make all variances = 1, then repeat the PCA on the rescaled data: S = VDV^T, with eigenvalues

lambda = [0.5873; 1.4916; 2.2370]

and scores C:

            y3        y2        y1
student1   0.8888    0.9583   -0.8043
student2   0.2289   -0.5312    2.1001
student3  -0.9428    1.0480   -0.0100
student4  -0.1749   -1.4751   -1.2858

[The slide also shows the rescaled data matrix (columns A, S, N) alongside the original data, and a scatter plot of the scores in the (y1, y2) plane.]
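For comparison, a common way to make all variances equal to 1 is to divide each centered column by its sample standard deviation (correlation-based PCA). The sketch below is an assumption about the intent of this slide: the eigenvalues it produces sum to the number of variables (3), so they differ somewhat from the lambda values reported above.

```python
import numpy as np

X = np.array([[1.0, 2.0,  1.0],
              [4.0, 2.0, 13.0],
              [7.0, 8.0,  1.0],
              [8.0, 4.0,  5.0]])        # same (Apple, Samsung, Nokia) data

B = X - X.mean(axis=0)
Bs = B / B.std(axis=0, ddof=1)          # rescale each variable to unit sample variance

S = np.cov(Bs, rowvar=False)            # the correlation matrix of the original data
evals, V = np.linalg.eigh(S)
print(evals, evals.sum())               # the eigenvalues sum to 3 for this rescaling

C = Bs @ V                              # scores in the rescaled coordinate system
print(np.round(C, 4))
```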