Principal Components Analysis


VI. Principal Components Analysis
A. The Basic Principle
We wish to explain/summarize the underlying variance-covariance structure of a large set of variables through a
few linear combinations of these variables. The
objectives of principal components analysis are
- data reduction
- interpretation
The results of principal components analysis are often
used as inputs to
- regression analysis
- cluster analysis
B. Population Principal Components
Suppose we have a population measured on p random
variables X1,…,Xp. Note that these random variables
represent the p-axes of the Cartesian coordinate system in
which the population resides. Our goal is to develop a
new set of p axes (linear combinations of the original p
axes) in the directions of greatest variability:
[Figure: scatter of the population in the (X1, X2) plane with new axes drawn in the directions of greatest variability]
This is accomplished by rotating the axes.
Consider our random vector
X = (X1, X2, …, Xp)'
with covariance matrix Σ and eigenvalues λ1 ≥ λ2 ≥ … ≥ λp.
We can construct p linear combinations
Y1 = a'1X = a11X1 + a12X2 + … + a1pXp
Y2 = a'2X = a21X1 + a22X2 + … + a2pXp
⋮
Yp = a'pX = ap1X1 + ap2X2 + … + appXp
It is easy to show that
Var(Yi) = a'iΣai, i = 1,…,p
Cov(Yi, Yk) = a'iΣak, i, k = 1,…,p
The principal components are those uncorrelated linear
combinations Y1,…,Yp whose variances are as large as
possible.
Thus the first principal component is the linear
combination of maximum variance, i.e., we wish to solve
the nonlinear optimization problem
max a'1Σa1 over a1   (the quadratic objective is the source of the nonlinearity)
s.t. a'1a1 = 1   (restricts attention to coefficient vectors of unit length)
The second principal component is the linear
combination of maximum variance that is uncorrelated
with the first principal component, i.e., we wish to solve
the nonlinear optimization problem
max a'2Σa2 over a2
s.t. a'2a2 = 1
     a'1Σa2 = 0   (restricts the covariance with the first component to zero)
The third principal component is the solution to the
nonlinear optimization problem
max a'3Σa3 over a3
s.t. a'3a3 = 1
     a'1Σa3 = 0
     a'2Σa3 = 0   (restricts the covariances with the first two components to zero)
Generally, the ith principal component is the linear
combination of maximum variance that is uncorrelated
with all previous principal components, i.e., we wish to
solve the nonlinear optimization problem
max a'iΣai over ai
s.t. a'iai = 1
     a'kΣai = 0, k < i
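The handout does not show the solution step, but it is the standard Lagrange-multiplier argument (a sketch, stated here for the first component): the Lagrangian
L(a1, λ) = a'1Σa1 - λ(a'1a1 - 1)
has gradient 2Σa1 - 2λa1 with respect to a1, and setting it to zero gives Σa1 = λa1. So any stationary a1 is an eigenvector of Σ, with objective value a'1Σa1 = λa'1a1 = λ; the maximum is attained at the eigenvector associated with the largest eigenvalue, and the constrained problems for the later components are solved by the subsequent eigenvectors.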
We can show that, for random vector X with covariance matrix Σ and eigenvalues λ1 ≥ λ2 ≥ … ≥ λp ≥ 0, the ith principal component is given by
Yi = e'iX = ei1X1 + ei2X2 + … + eipXp, i = 1,…,p
Note that the principal components are not unique if some eigenvalues are equal.
We can also show for random vector X with covariance matrix Σ and eigenvalue-eigenvector pairs (λ1, e1), …, (λp, ep) where λ1 ≥ λ2 ≥ … ≥ λp, that
σ11 + … + σpp = Var(X1) + … + Var(Xp) = λ1 + … + λp = Var(Y1) + … + Var(Yp)
so we can assess how well a subset of the principal
components Yi summarizes the original random variables
Xi – one common method of doing so is
λk / (λ1 + … + λp) = proportion of total population variance due to the kth principal component
If a large proportion of the total population variance can
be attributed to relatively few principal components, we
can replace the original p variables with these principal
components without loss of much information!
We can also easily find the correlations between the original random variables Xk and the principal components Yi:
ρYi,Xk = eik√λi / √σkk
These values are often used in interpreting the principal components Yi.
Example: Suppose we have the following population of four observations made on three random variables X1, X2, and X3:

X1    X2    X3
1.0   6.0   9.0
4.0  12.0  10.0
3.0  12.0  15.0
4.0  10.0  12.0

Find the three population principal components Y1, Y2, and Y3:
First we need the covariance matrix Σ:

     1.50  2.50  1.00
Σ =  2.50  6.00  3.50
     1.00  3.50  5.25

and the corresponding eigenvalue-eigenvector pairs:

λ1 = 9.9145474, e1 = ( 0.2910381,  0.7342493,  0.6133309)'
λ2 = 2.5344988, e2 = ( 0.4150386,  0.4807165, -0.7724340)'
λ3 = 0.3009542, e3 = ( 0.8619976, -0.4793640,  0.1648350)'
so the principal components are:
Y1 = e'1X = 0.2910381X1 + 0.7342493X2 + 0.6133309X3
Y2 = e'2X = 0.4150386X1 + 0.4807165X2 - 0.7724340X3
Y3 = e'3X = 0.8619976X1 - 0.4793640X2 + 0.1648350X3
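A minimal SAS/IML sketch (illustrative, not part of the original handout) that reproduces these eigenvalue-eigenvector pairs from the raw data; note that IML may return any eigenvector multiplied by -1, since eigenvectors are determined only up to sign:

proc iml;
   /* the four observations on X1, X2, X3 */
   x = {1.0  6.0  9.0,
        4.0 12.0 10.0,
        3.0 12.0 15.0,
        4.0 10.0 12.0};
   n = nrow(x);
   xc = x - j(n,1,1)*x[:,];        /* mean-centered data                        */
   sigma = xc`*xc/n;               /* population covariance (divisor n)         */
   call eigen(lambda, e, sigma);   /* eigenvalues (descending) and eigenvectors */
   prop = lambda/sum(lambda);      /* proportion of total variance              */
   print sigma, lambda prop, e;
quit;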
Note that
σ11 + σ22 + σ33 = 1.50 + 6.00 + 5.25 = 12.75
= 9.9145474 + 2.5344988 + 0.3009542 = λ1 + λ2 + λ3
and the proportion of total population variance due to each principal component is
λ1/(λ1 + λ2 + λ3) = 9.9145474/12.75 = 0.777611529
λ2/(λ1 + λ2 + λ3) = 2.5344988/12.75 = 0.198784220
λ3/(λ1 + λ2 + λ3) = 0.3009542/12.75 = 0.023604251
Note that the third principal component is relatively
irrelevant!
Next we obtain the correlations between the original random variables Xk and the principal components Yi:
ρY1,X1 = e11√λ1/√σ11 = 0.2910381√9.9145474/√1.50 = 0.748241
ρY1,X2 = e21√λ1/√σ22 = 0.7342493√9.9145474/√6.00 = 0.943853
ρY1,X3 = e31√λ1/√σ33 = 0.6133309√9.9145474/√5.25 = 0.842853
ρY2,X1 = e12√λ2/√σ11 = 0.4150386√2.5344988/√1.50 = 0.539497
ρY2,X2 = e22√λ2/√σ22 = 0.4807165√2.5344988/√6.00 = 0.312435
ρY2,X3 = e32√λ2/√σ33 = -0.7724340√2.5344988/√5.25 = -0.536696
ρY3,X1 = e13√λ3/√σ11 = 0.8619976√0.3009542/√1.50 = 0.386111
ρY3,X2 = e23√λ3/√σ22 = -0.4793640√0.3009542/√6.00 = -0.107359
ρY3,X3 = e33√λ3/√σ33 = 0.1648350√0.3009542/√5.25 = 0.039466
We can display these results in a correlation matrix:

        X1         X2         X3
Y1   0.748241   0.943853   0.842853
Y2   0.539497   0.312435  -0.536696
Y3   0.386111  -0.107359   0.039466
Here we can easily see that
- the first principal component (Y1) is a mixture of all three random variables (X1, X2, and X3)
- the second principal component (Y2) is a trade-off between X1 and X3
- the third principal component (Y3) primarily reflects X1
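A short SAS/IML continuation (again illustrative rather than part of the handout) that computes this entire matrix of correlations at once from the formula ρYi,Xk = eik√λi/√σkk:

proc iml;
   sigma = {1.50 2.50 1.00,
            2.50 6.00 3.50,
            1.00 3.50 5.25};
   call eigen(lambda, e, sigma);
   rho = j(3,3,0);
   do i = 1 to 3;                 /* principal components (rows of the table above) */
      do k = 1 to 3;              /* original variables                             */
         rho[i,k] = e[k,i]*sqrt(lambda[i])/sqrt(sigma[k,k]);
      end;
   end;
   print rho;                     /* row i = correlations of Yi with X1, X2, X3     */
quit;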
When the principal components are derived from an X ~ Np(μ, Σ) distributed population, the density of X is constant on the μ-centered ellipsoids
(x - μ)'Σ⁻¹(x - μ) = c²
which have axes
±c√λi ei, i = 1,…,p
where (λi, ei) are the eigenvalue-eigenvector pairs of Σ.
We can set μ = 0 without loss of generality. Since Σ⁻¹ has the spectral decomposition Σ⁻¹ = (1/λ1)e1e'1 + … + (1/λp)epe'p, we can then write
c² = x'Σ⁻¹x = (1/λ1)(e'1x)² + … + (1/λp)(e'px)²
where the e'ix are the principal components of x.
Setting yi = e'ix and substituting into the previous expression yields
c² = (1/λ1)y1² + … + (1/λp)yp²
which defines an ellipsoid (note that λi > 0 for all i) in a coordinate system with axes y1,…,yp lying in the directions of e1,…,ep, respectively.
The major axis lies in the direction determined by the eigenvector e1 associated with the largest eigenvalue λ1; the remaining minor axes lie in the directions determined by the other eigenvectors.
Example: For the principal components derived from the following population of four observations made on three random variables X1, X2, and X3:

X1    X2    X3
1.0   6.0   9.0
4.0  12.0  10.0
3.0  12.0  15.0
4.0  10.0  12.0

plot the major and minor axes.
We will need the centroid μ:
μ = (3.0, 10.0, 11.5)'
The direction of the major axis is given by
e'1X = 0.2910381X 1 + 0.7342493X 2 + 0.6133309X 3
while the directions of the two minor axes are given by
e'2X = 0.4150386X 1 + 0.4807165X 2 - 0.7724340X 3
e'3 X = 0.8619976X 1 - 0.4793640X 2 + 0.1648350X 3
We first graph the centroid (3.0, 10.0, 11.5):
[Figure: the centroid plotted in the (X1, X2, X3) coordinate system]
…then use the first eigenvector to find a second point on the first principal axis. The line connecting these two points is the Y1 axis.
[Figure: the Y1 axis added]
…then do the same thing with the second eigenvector. The line connecting these two points is the Y2 axis.
[Figure: the Y2 axis added]
…and do the same thing with the third eigenvector. The line connecting these two points is the Y3 axis.
[Figure: the Y3 axis added]
What we have done is a rotation and a translation in p = 3 dimensions. Note that the rotated axes remain orthogonal!
[Figure: the rotated and translated (Y1, Y2, Y3) axes superimposed on the original (X1, X2, X3) axes]
Note that we can also construct principal components for the standardized variables Zi:
Zi = (Xi - μi)/√σii, i = 1,…,p
which in matrix notation is
Z = (V^(1/2))^(-1)(X - μ)
where V^(1/2) is the diagonal standard deviation matrix.
Obviously
E(Z) = 0
Cov(Z) = (V^(1/2))^(-1) Σ (V^(1/2))^(-1) = ρ
This suggests that the principal components for the standardized variables Zi may be obtained from the eigenvectors of the correlation matrix ρ! The operations are analogous to those used in conjunction with the covariance matrix.
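A small SAS/IML sketch (illustrative only) verifying numerically that the covariance matrix of the standardized variables is the correlation matrix ρ:

proc iml;
   x = {1.0  6.0  9.0,
        4.0 12.0 10.0,
        3.0 12.0 15.0,
        4.0 10.0 12.0};
   n = nrow(x);
   xc = x - j(n,1,1)*x[:,];
   sigma = xc`*xc/n;                      /* population covariance  */
   vinv = diag(1/sqrt(vecdiag(sigma)));   /* (V^(1/2))^(-1)         */
   rho = vinv*sigma*vinv;                 /* = the correlation matrix */
   print rho;
quit;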
We can show that, for random vector Z of standardized variables with covariance matrix ρ and eigenvalues λ1 ≥ λ2 ≥ … ≥ λp ≥ 0, the ith principal component is given by
Yi = e'iZ = e'i(V^(1/2))^(-1)(X - μ), i = 1,…,p
Note again that the principal components are not unique if some eigenvalues are equal.
We can also show for random vector Z with covariance matrix ρ and eigenvalue-eigenvector pairs (λ1, e1), …, (λp, ep) where λ1 ≥ λ2 ≥ … ≥ λp, that
Var(Z1) + … + Var(Zp) = λ1 + … + λp = Var(Y1) + … + Var(Yp) = p
and we can again assess how well a subset of the principal components Yi summarizes the original random variables Xi by using
λk/p = proportion of total population variance due to the kth principal component
If a large proportion of the total population variance can
be attributed to relatively few principal components, we
can replace the original p variables with these principal
components without loss of much information!
Example: Suppose we have the following population of four observations made on three random variables X1, X2, and X3:

X1    X2    X3
1.0   6.0   9.0
4.0  12.0  10.0
3.0  12.0  15.0
4.0  10.0  12.0

Find the three population principal components Y1, Y2, and Y3 for the standardized random variables Z1, Z2, and Z3:
We could standardize the variables X1, X2, and X3, then work with the resulting covariance matrix, but it is much easier to proceed directly with the correlation matrix ρ:

     1.000  0.833  0.356
ρ =  0.833  1.000  0.624
     0.356  0.624  1.000
and the corresponding eigenvalue-eigenvector pairs:

λ1 = 2.2149347, e1 = ( 0.58437383,  0.63457754,  0.50578527)'
λ2 = 0.6226418, e2 = (-0.5449250,  -0.1549791,   0.8240377)'
λ3 = 0.1624235, e3 = ( 0.6013018,  -0.7571610,   0.2552315)'

(Note that these results differ from the covariance-based principal components!)
so the principal components are:
Y1 = e'1Z = 0.5843738Z1 + 0.6345775Z2 + 0.5057853Z3
Y2 = e'2Z = -0.5449250Z1 - 0.1549791Z2 + 0.8240377Z3
Y3 = e'3Z = 0.6013018Z1 - 0.7571610Z2 + 0.2552315Z3
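An illustrative SAS/IML sketch paralleling the earlier one, now working from the correlation matrix. Because it computes ρ to full precision rather than from the rounded entries above, its eigenvalues (about 2.2295, 0.6621, 0.1084) differ slightly from the hand values, and match the PROC PRINCOMP output later in this section:

proc iml;
   x = {1.0  6.0  9.0,
        4.0 12.0 10.0,
        3.0 12.0 15.0,
        4.0 10.0 12.0};
   n = nrow(x);
   xc = x - j(n,1,1)*x[:,];
   sigma = xc`*xc/n;
   d = diag(1/sqrt(vecdiag(sigma)));
   rho = d*sigma*d;                /* full-precision correlation matrix */
   call eigen(lambda, e, rho);
   print rho, lambda, e;
quit;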
Note that
σ11 + σ22 + σ33 = 1.0 + 1.0 + 1.0 = 3.0
= 2.2149347 + 0.6226418 + 0.1624235 = λ1 + λ2 + λ3
and the proportion of total population variance due to each principal component is
λ1/p = 2.2149347/3.0 = 0.738311567
λ2/p = 0.6226418/3.0 = 0.207547267
λ3/p = 0.1624235/3.0 = 0.054141167
Note that the third principal component is again
relatively irrelevant!
Next we obtain the correlations between the standardized variables Zk and the principal components Yi (since σkk = 1 for standardized variables, the divisor √σkk drops out):
ρY1,Z1 = e11√λ1 = 0.58437383√2.2149347 = 0.869703464
ρY1,Z2 = e21√λ1 = 0.6345775√2.2149347 = 0.944419907
ρY1,Z3 = e31√λ1 = 0.5057853√2.2149347 = 0.752742749
ρY2,Z1 = e12√λ2 = -0.5449250√0.6226418 = -0.429987538
ρY2,Z2 = e22√λ2 = -0.1549791√0.6226418 = -0.122290294
ρY2,Z3 = e32√λ2 = 0.8240377√0.6226418 = 0.650228824
ρY3,Z1 = e13√λ3 = 0.6013018√0.1624235 = 0.242335443
ρY3,Z2 = e23√λ3 = -0.7571610√0.1624235 = -0.305149504
ρY3,Z3 = e33√λ3 = 0.2552315√0.1624235 = 0.102862886
We can display these results in a correlation matrix:

        Z1          Z2          Z3
Y1   0.8697035   0.9444199   0.7527427
Y2  -0.4299875  -0.1222903   0.6502288
Y3   0.2423354  -0.3051495   0.1028629
Here we can easily see that
- the first principal component (Y1) is a mixture of all three standardized variables (Z1, Z2, and Z3)
- the second principal component (Y2) is a trade-off between Z1 and Z3
- the third principal component (Y3) is a trade-off between Z1 and Z2
SAS code for Principal Components Analysis:
OPTIONS LINESIZE=72 NODATE PAGENO=1;
DATA stuff;
INPUT x1 x2 x3;
LABEL x1='Random Variable 1'
x2='Random Variable 2'
x3='Random Variable 3';
CARDS;
1.0 6.0 9.0
4.0 12.0 10.0
3.0 12.0 15.0
4.0 10.0 12.0
;
PROC PRINCOMP DATA=stuff OUT=pcstuff N=3;
VAR x1 x2 x3;
RUN;
PROC CORR DATA=pcstuff;
VAR x1 x2 x3;
WITH prin1 prin2 prin3;
RUN;
PROC FACTOR DATA=stuff SCREE;
VAR x1 x2 x3;
RUN;
Note that the SAS default is to use the correlation matrix
to perform this analysis!
SAS output for Principal Components Analysis:

The PRINCOMP Procedure

Observations  4
Variables     3

               Simple Statistics
                x1            x2            x3
Mean   3.000000000   10.00000000   11.50000000
StD    1.414213562    2.82842712    2.64575131

               Correlation Matrix
                            x1      x2      x3
x1  Random Variable 1   1.0000  0.8333  0.3563
x2  Random Variable 2   0.8333  1.0000  0.6236
x3  Random Variable 3   0.3563  0.6236  1.0000

       Eigenvalues of the Correlation Matrix
    Eigenvalue    Difference    Proportion    Cumulative
1   2.22945702    1.56733894        0.7432        0.7432
2   0.66211808    0.55369318        0.2207        0.9639
3   0.10842490                      0.0361        1.0000

                  Eigenvectors
                           Prin1      Prin2      Prin3
x1  Random Variable 1   0.581128  -0.562643   0.587982
x2  Random Variable 2   0.645363  -0.121542  -0.754145
x3  Random Variable 3   0.495779   0.817717   0.292477
SAS output for Correlation Matrix – Original Random Variables vs. Principal Components:

The CORR Procedure

3 With Variables:  Prin1 Prin2 Prin3
3      Variables:  x1 x2 x3

                      Simple Statistics
Variable   N      Mean    Std Dev        Sum    Minimum    Maximum
Prin1      4         0    1.49314          0   -2.20299    1.11219
Prin2      4         0    0.81371          0   -0.94739    0.99579
Prin3      4         0    0.32928          0   -0.28331    0.47104
x1         4   3.00000    1.41421   12.00000    1.00000    4.00000
x2         4  10.00000    2.82843   40.00000    6.00000   12.00000
x3         4  11.50000    2.64575   46.00000    9.00000   15.00000

   Pearson Correlation Coefficients, N = 4
       Prob > |r| under H0: Rho=0
              x1         x2         x3
Prin1    0.86770    0.96362    0.74027
          0.1323     0.0364     0.2597
Prin2   -0.45783   -0.09890    0.66538
          0.5422     0.9011     0.3346
Prin3    0.19361   -0.24832    0.09631
          0.8064     0.7517     0.9037
SAS output for Factor Analysis:

PRINCIPAL COMPONENTS ANALYSIS FOR QA 610
SPRING QUARTER 2001
Using PROC FACTOR to obtain a Scree Plot for Principal Components Analysis

The FACTOR Procedure
Initial Factor Method: Principal Components

Prior Communality Estimates: ONE

Eigenvalues of the Correlation Matrix: Total = 3  Average = 1
    Eigenvalue    Difference    Proportion    Cumulative
1   2.22945702    1.56733894        0.7432        0.7432
2   0.66211808    0.55369318        0.2207        0.9639
3   0.10842490                      0.0361        1.0000

1 factor will be retained by the MINEIGEN criterion. (Note that this is consistent with the results from the PCA.)
SAS output for Factor Analysis:

The FACTOR Procedure
Initial Factor Method: Principal Components

[Scree plot of eigenvalues vs. component number: eigenvalue 2.229 at component 1, 0.662 at component 2, and 0.108 at component 3, dropping sharply after the first component]
SAS output for Factor Analysis:

The FACTOR Procedure
Initial Factor Method: Principal Components

        Factor Pattern
                          Factor1
x1  Random Variable 1     0.86770
x2  Random Variable 2     0.96362
x3  Random Variable 3     0.74027

(These are the Pearson correlation coefficients for the first principal component with the three original variables X1, X2, and X3.)

Variance Explained by Each Factor
Factor1
2.2294570   (the first eigenvalue λ1)

Final Communality Estimates: Total = 2.229457
        x1            x2            x3
0.75291032    0.92855392    0.54799278
SAS code for Principal Components Analysis:
OPTIONS LINESIZE=72 NODATE PAGENO=1;
DATA stuff;
INPUT x1 x2 x3;
LABEL x1='Random Variable 1'
x2='Random Variable 2'
x3='Random Variable 3';
CARDS;
1.0 6.0 9.0
4.0 12.0 10.0
3.0 12.0 15.0
4.0 10.0 12.0
;
PROC PRINCOMP DATA=stuff OUT=pcstuff N=3 COV;
VAR x1 x2 x3;
RUN;
PROC CORR DATA=pcstuff;
VAR x1 x2 x3;
WITH prin1 prin2 prin3;
RUN;
PROC FACTOR DATA=stuff SCREE COV;
VAR x1 x2 x3;
RUN;
Note that here we use SAS to derive the covariance
matrix based principal components!
SAS output for Principal Components Analysis:

The PRINCOMP Procedure

Observations  4
Variables     3

               Simple Statistics
                x1            x2            x3
Mean   3.000000000   10.00000000   11.50000000
StD    1.414213562    2.82842712    2.64575131

               Covariance Matrix
                                 x1            x2            x3
x1  Random Variable 1   2.000000000   3.333333333   1.333333333
x2  Random Variable 2   3.333333333   8.000000000   4.666666667
x3  Random Variable 3   1.333333333   4.666666667   7.000000000

Total Variance  17

       Eigenvalues of the Covariance Matrix
    Eigenvalue    Difference    Proportion    Cumulative
1   13.2193960     9.8400643        0.7776        0.7776
2    3.3793317     2.9780594        0.1988        0.9764
3    0.4012723                      0.0236        1.0000

                  Eigenvectors
                           Prin1      Prin2      Prin3
x1  Random Variable 1   0.291038   0.415039   0.861998
x2  Random Variable 2   0.734249   0.480716  -0.479364
x3  Random Variable 3   0.613331  -0.772434   0.164835
SAS output for Correlation Matrix – Original Random Variables vs. Principal Components:

The CORR Procedure

3 With Variables:  Prin1 Prin2 Prin3
3      Variables:  x1 x2 x3

                      Simple Statistics
Variable   N      Mean    Std Dev        Sum    Minimum    Maximum
Prin1      4         0    3.63585          0   -5.05240    3.61516
Prin2      4         0    1.83830          0   -1.74209    2.53512
Prin3      4         0    0.63346          0   -0.38181    0.94442
x1         4   3.00000    1.41421   12.00000    1.00000    4.00000
x2         4  10.00000    2.82843   40.00000    6.00000   12.00000
x3         4  11.50000    2.64575   46.00000    9.00000   15.00000

   Pearson Correlation Coefficients, N = 4
       Prob > |r| under H0: Rho=0
              x1         x2         x3
Prin1    0.74824    0.94385    0.84285
          0.2518     0.0561     0.1571
Prin2    0.53950    0.31243   -0.53670
          0.4605     0.6876     0.4633
Prin3    0.38611   -0.10736    0.03947
          0.6139     0.8926     0.9605
SAS output for Factor Analysis:

PRINCIPAL COMPONENTS ANALYSIS FOR QA 610
SPRING QUARTER 2001
Using PROC FACTOR to obtain a Scree Plot for Principal Components Analysis

The FACTOR Procedure
Initial Factor Method: Principal Components

Prior Communality Estimates: ONE

Eigenvalues of the Covariance Matrix: Total = 17  Average = 5.66666667
    Eigenvalue    Difference    Proportion    Cumulative
1   13.2193960     9.8400643        0.7776        0.7776
2    3.3793317     2.9780594        0.1988        0.9764
3    0.4012723                      0.0236        1.0000

1 factor will be retained by the MINEIGEN criterion. (Note that this is consistent with the results from the PCA.)
SAS output for Factor Analysis:

The FACTOR Procedure
Initial Factor Method: Principal Components

[Scree plot of eigenvalues vs. component number: eigenvalue 13.22 at component 1, 3.38 at component 2, and 0.40 at component 3, again dropping sharply after the first component]
SAS output for Factor Analysis:

The FACTOR Procedure
Initial Factor Method: Principal Components

        Factor Pattern
                          Factor1
x1  Random Variable 1     0.74824
x2  Random Variable 2     0.94385
x3  Random Variable 3     0.84285

(These are the Pearson correlation coefficients for the first principal component with the three original variables X1, X2, and X3.)

Variance Explained by Each Factor
Factor      Weighted    Unweighted
Factor1   13.2193960    2.16112149   (the weighted value is the first eigenvalue λ1)

Final Communality Estimates and Variable Weights
Total Communality: Weighted = 13.219396  Unweighted = 2.161121
Variable    Communality        Weight
x1           0.55986257    2.00000000
x2           0.89085847    8.00000000
x3           0.71040045    7.00000000
Covariance matrices with special structures yield particularly interesting principal components:
- Diagonal covariance matrices – suppose Σ is the diagonal matrix

     σ11    0   …    0
      0   σ22   …    0
Σ =   ⋮     ⋮   ⋱    ⋮
      0     0   …  σpp

Since the eigenvector ei has a value of 1 in the ith position and 0 in all other positions, we have Σei = σiiei, so (σii, ei) is the ith eigenvalue-eigenvector pair. The linear combination
Yi = e'iX = Xi
demonstrates that the set of principal components and the original set of (uncorrelated) random variables are the same! (A small numerical illustration follows this list.)
Note that this result is also true if we work with the correlation matrix.
- Constant variances and covariances – suppose Σ is the patterned matrix

      σ²   ρσ²   …  ρσ²
     ρσ²    σ²   …  ρσ²
Σ =    ⋮     ⋮   ⋱    ⋮
     ρσ²   ρσ²   …   σ²

Here the resulting correlation matrix

      1   ρ   …   ρ
      ρ   1   …   ρ
ρ =   ⋮   ⋮   ⋱   ⋮
      ρ   ρ   …   1

is also the covariance matrix of the standardized variables Zi.
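Two tiny SAS/IML illustrations of these special structures (with hypothetical values: variances 4, 2, 1 for the diagonal case; σ² = 4, ρ = 0.5, p = 3 for the patterned case):

proc iml;
   /* diagonal case: eigenvectors are the coordinate axes,   */
   /* so the principal components are the original variables */
   sigma1 = diag({4 2 1});
   call eigen(lambda1, e1, sigma1);  /* lambda1 = (4,2,1)', e1 = identity columns (up to sign/order) */

   /* patterned case: standardizing recovers the equicorrelation matrix */
   p = 3; sig2 = 4; r = 0.5;                /* hypothetical values          */
   sigma2 = sig2#((1-r)#I(p) + r#j(p,p,1)); /* sig2 on diagonal, r*sig2 off */
   d = diag(1/sqrt(vecdiag(sigma2)));
   rho = d*sigma2*d;                        /* 1 on diagonal, r elsewhere   */
   print lambda1 e1, sigma2 rho;
quit;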
C. Using Principal Components to Summarize Sample Variation
Suppose the data x1,…,xn represent n independent observations from a p-dimensional population with some mean vector μ and covariance matrix Σ – these data yield a sample mean vector x̄, sample covariance matrix S, and sample correlation matrix R.
As in the population case, our goal is to develop a new set of p axes (linear combinations of the original p axes) in the directions of greatest variability:
y1 = a'1x = a11x1 + a12x2 + … + a1pxp
y2 = a'2x = a21x1 + a22x2 + … + a2pxp
⋮
yp = a'px = ap1x1 + ap2x2 + … + appxp
Again it is easy to show that the linear combinations
a'ixj = ai1xj1 + ai2xj2 + … + aipxjp
have sample means a'ix̄ and
Var(a'ix) = a'iSai, i = 1,…,p
Cov(a'ix, a'kx) = a'iSak, i, k = 1,…,p
The principal components are those uncorrelated linear combinations ŷ1,…,ŷp whose variances are as large as possible.
Thus the first principal component is the linear
combination of maximum sample variance, i.e., we wish
to solve the nonlinear optimization problem
max a'1Sa1 over a1   (the quadratic objective is the source of the nonlinearity)
s.t. a'1a1 = 1   (restricts attention to coefficient vectors of unit length)
The second principal component is the linear
combination of maximum sample variance that is
uncorrelated with the first principal component, i.e., we
wish to solve the nonlinear optimization problem
max a'2Sa2 over a2
s.t. a'2a2 = 1
     a'1Sa2 = 0   (restricts the covariance with the first component to zero)
The third principal component is the solution to the
nonlinear optimization problem
max a'3Sa3 over a3
s.t. a'3a3 = 1
     a'1Sa3 = 0
     a'2Sa3 = 0   (restricts the covariances with the first two components to zero)
Generally, the ith principal component is the linear
combination of maximum sample variance that is
uncorrelated with all previous principal components, i.e.,
we wish to solve the nonlinear optimization problem
max a'iSai over ai
s.t. a'iai = 1
     a'kSai = 0, k < i
We can show that, for a random sample with sample covariance matrix S and eigenvalues λ̂1 ≥ λ̂2 ≥ … ≥ λ̂p ≥ 0, the ith sample principal component is given by
ŷi = ê'ix = êi1x1 + êi2x2 + … + êipxp, i = 1,…,p
Note that the principal components are not unique if some eigenvalues are equal.
We can also show for a random sample with sample covariance matrix S and eigenvalue-eigenvector pairs (λ̂1, ê1), …, (λ̂p, êp) where λ̂1 ≥ λ̂2 ≥ … ≥ λ̂p, that
s11 + … + spp = λ̂1 + … + λ̂p = Var(ŷ1) + … + Var(ŷp)
so we can assess how well a subset of the principal components ŷi summarizes the original sample – one common method of doing so is
λ̂k / (λ̂1 + … + λ̂p) = proportion of total sample variance due to the kth principal component
If a large proportion of the total sample variance can be
attributed to relatively few principal components, we can
replace the original p variables with these principal
components without loss of much information!
We can also easily find the correlations between the original variables xk and the sample principal components ŷi:
ryi,xk = êik√λ̂i / √skk
These values are often used in interpreting the principal components ŷi.
Note that
- the approach for standardized data (i.e., principal components derived from the sample correlation matrix R) is analogous to the population approach
- when principal components are derived from sample data, the sample data are frequently centered, xj - x̄, which has no effect on the sample covariance matrix S and yields the derived principal components
ŷi = ê'i(xj - x̄)
Under these circumstances, the mean value of the ith principal component over all n observations in the data set is
ȳi = (1/n) Σⱼ ê'i(xj - x̄) = ê'i [(1/n) Σⱼ (xj - x̄)] = ê'i 0 = 0
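A brief SAS/IML check (illustrative, not from the handout) that the component scores of the centered sample data do have mean zero:

proc iml;
   x = {1.0  6.0  9.0,
        4.0 12.0 10.0,
        3.0 12.0 15.0,
        4.0 10.0 12.0};
   n = nrow(x);
   xc = x - j(n,1,1)*x[:,];      /* centered data                          */
   s = xc`*xc/(n-1);             /* sample covariance (divisor n-1)        */
   call eigen(lambda, e, s);
   y = xc*e;                     /* sample principal component scores      */
   ybar = y[:,];                 /* column means: all zero up to rounding  */
   print ybar;
quit;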
Example: Suppose we have the following sample of four observations made on three random variables X1, X2, and X3:

X1    X2    X3
1.0   6.0   9.0
4.0  12.0  10.0
3.0  12.0  15.0
4.0  10.0  12.0

Find the three sample principal components ŷ1, ŷ2, and ŷ3 based on the sample covariance matrix S:
First we need the sample covariance matrix S:

     2.00  3.33  1.33
S =  3.33  8.00  4.67
     1.33  4.67  7.00
and the corresponding eigenvalue-eigenvector pairs:

λ̂1 = 13.21944, ê1 = ( 0.291000,  0.734253,  0.613345)'
λ̂2 =  3.37916, ê2 = ( 0.415126,  0.480690, -0.772403)'
λ̂3 =  0.40140, ê3 = ( 0.861968, -0.479385,  0.164927)'
so the principal components are:
ŷ1 = ê'1x = 0.291000x1 + 0.734253x2 + 0.613345x3
ŷ2 = ê'2x = 0.415126x1 + 0.480690x2 - 0.772403x3
ŷ3 = ê'3x = 0.861968x1 - 0.479385x2 + 0.164927x3
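As another illustrative SAS/IML check, the sample variances of these component scores equal the eigenvalues λ̂i; equivalently, their standard deviations are the √λ̂i = 3.63585, 1.83830, 0.63346 that appear in the PROC CORR output later in this section:

proc iml;
   x = {1.0  6.0  9.0,
        4.0 12.0 10.0,
        3.0 12.0 15.0,
        4.0 10.0 12.0};
   n = nrow(x);
   xc = x - j(n,1,1)*x[:,];
   s = xc`*xc/(n-1);
   call eigen(lambda, e, s);
   y = xc*e;                     /* component scores (already centered)    */
   vary = y[##,]/(n-1);          /* sample variances of the scores         */
   print (vary`) lambda;         /* the two columns agree                  */
quit;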
Note that
s11 + s22 + s33 = 2.0 + 8.0 + 7.0 = 17.0
= 13.21944 + 3.37916 + 0.40140 = λ̂1 + λ̂2 + λ̂3
and the proportion of total sample variance due to each principal component is
λ̂1/(λ̂1 + λ̂2 + λ̂3) = 13.21944/17.0 = 0.777614
λ̂2/(λ̂1 + λ̂2 + λ̂3) = 3.37916/17.0 = 0.198774
λ̂3/(λ̂1 + λ̂2 + λ̂3) = 0.40140/17.0 = 0.023612
Note that the third principal component is relatively
irrelevant!
Next we obtain the correlations between the original variables xk and the sample principal components ŷi:
ry1,x1 = ê11√λ̂1/√s11 = 0.291000√13.21944/√2.00 = 0.748143
ry1,x2 = ê21√λ̂1/√s22 = 0.734253√13.21944/√8.00 = 0.943886
ry1,x3 = ê31√λ̂1/√s33 = 0.613345√13.21944/√7.00 = 0.842893
ry2,x1 = ê12√λ̂2/√s11 = 0.415126√3.37916/√2.00 = 0.539597
ry2,x2 = ê22√λ̂2/√s22 = 0.480690√3.37916/√8.00 = 0.312445
ry2,x3 = ê32√λ̂2/√s33 = -0.772403√3.37916/√7.00 = -0.536668
ry3,x1 = ê13√λ̂3/√s11 = 0.861968√0.40140/√2.00 = 0.386156
ry3,x2 = ê23√λ̂3/√s22 = -0.479385√0.40140/√8.00 = -0.107381
ry3,x3 = ê33√λ̂3/√s33 = 0.164927√0.40140/√7.00 = 0.039494
(these agree with the PROC CORR output below)
We can display these results in a correlation matrix:

        x1         x2         x3
ŷ1   0.748143   0.943886   0.842893
ŷ2   0.539597   0.312445  -0.536668
ŷ3   0.386156  -0.107381   0.039494
How would we interpret these results?
Note that results based on the sample correlation matrix R will not differ from results based on the population correlation matrix ρ (why?).
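The answer to the "why": the divisor n versus n-1 cancels when covariances are scaled to correlations, so R and ρ are identical here. An illustrative SAS/IML check:

proc iml;
   x = {1.0  6.0  9.0,
        4.0 12.0 10.0,
        3.0 12.0 15.0,
        4.0 10.0 12.0};
   n = nrow(x);
   xc = x - j(n,1,1)*x[:,];
   sigma = xc`*xc/n;                      /* population covariance (divisor n) */
   s     = xc`*xc/(n-1);                  /* sample covariance (divisor n-1)   */
   dp = diag(1/sqrt(vecdiag(sigma)));
   ds = diag(1/sqrt(vecdiag(s)));
   print (dp*sigma*dp) (ds*s*ds);         /* identical matrices                */
quit;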
SAS code for Principal Components Analysis:
OPTIONS LINESIZE=72 NODATE PAGENO=1;
DATA stuff;
INPUT x1 x2 x3;
LABEL x1='Random Variable 1'
x2='Random Variable 2'
x3='Random Variable 3';
CARDS;
1.0 6.0 9.0
4.0 12.0 10.0
3.0 12.0 15.0
4.0 10.0 12.0
;
PROC PRINCOMP DATA=stuff COV OUT=pcstuff;
VAR x1 x2 x3;
TITLE4 'Using PROC PRINCOMP for Principal Components Analysis';
RUN;
PROC CORR DATA=pcstuff;
VAR x1 x2 x3;
WITH prin1 prin2 prin3;
RUN;

The COV option is used to instruct SAS to perform the principal components analysis on the sample covariance matrix rather than the default (correlation matrix)!
SAS output for Principal Components Analysis:

The PRINCOMP Procedure

Observations  4
Variables     3

               Simple Statistics
                x1            x2            x3
Mean   3.000000000   10.00000000   11.50000000
StD    1.414213562    2.82842712    2.64575131

               Covariance Matrix
                                 x1            x2            x3
x1  Random Variable 1   2.000000000   3.333333333   1.333333333
x2  Random Variable 2   3.333333333   8.000000000   4.666666667
x3  Random Variable 3   1.333333333   4.666666667   7.000000000

Total Variance  17

       Eigenvalues of the Covariance Matrix
    Eigenvalue    Difference    Proportion    Cumulative
1   13.2193960     9.8400643        0.7776        0.7776
2    3.3793317     2.9780594        0.1988        0.9764
3    0.4012723                      0.0236        1.0000

                  Eigenvectors
                           Prin1      Prin2      Prin3
x1  Random Variable 1   0.291038   0.415039   0.861998
x2  Random Variable 2   0.734249   0.480716  -0.479364
x3  Random Variable 3   0.613331  -0.772434   0.164835
SAS output for Correlation Matrix – Original Random Variables vs. Principal Components:

The CORR Procedure

3 With Variables:  Prin1 Prin2 Prin3
3      Variables:  x1 x2 x3

                      Simple Statistics
Variable   N      Mean    Std Dev        Sum    Minimum    Maximum
Prin1      4         0    3.63585          0   -5.05240    3.61516
Prin2      4         0    1.83830          0   -1.74209    2.53512
Prin3      4         0    0.63346          0   -0.38181    0.94442
x1         4   3.00000    1.41421   12.00000    1.00000    4.00000
x2         4  10.00000    2.82843   40.00000    6.00000   12.00000
x3         4  11.50000    2.64575   46.00000    9.00000   15.00000

   Pearson Correlation Coefficients, N = 4
       Prob > |r| under H0: Rho=0
              x1         x2         x3
Prin1    0.74824    0.94385    0.84285
          0.2518     0.0561     0.1571
Prin2    0.53950    0.31243   -0.53670
          0.4605     0.6876     0.4633
Prin3    0.38611   -0.10736    0.03947
          0.6139     0.8926     0.9605