Factor Analysis - Learn Via Web .com


Factor Analysis and Principal Components

Factor analysis with principal components is presented here as a subset of factor analysis techniques, which it is.
Principal Components (PC)

Principal components analysis is about explaining the variance-covariance structure, $\Sigma$, of a set of variables through a few linear combinations of these variables.
In general, PC is used for either:
1. Data reduction, or
2. Interpretation.
If you have $p$ variables $\mathbf{x} = (x_1, \dots, x_p)'$, you need $p$ components to capture all the variability, but often a smaller number, $k$, of principal components can capture most of the variability. So the original data set of $n$ measurements on $p$ variables can be reduced to a data set of $n$ measurements on $k$ principal components. PC tends to be a means to an end rather than the end itself; that is, PC is often not the final step. The principal components may then be used for multiple regression, cluster analysis, etc.
Let $\mathbf{x} = (x_1, \dots, x_p)'$ have covariance matrix $\Sigma$, and consider the linear combinations
$$Y_1 = \mathbf{a}_1'\mathbf{x} = a_{11}x_1 + a_{12}x_2 + \cdots + a_{1p}x_p$$
$$Y_2 = \mathbf{a}_2'\mathbf{x} = \sum_{i=1}^{p} a_{2i}x_i$$
$$\vdots$$
$$Y_p = \mathbf{a}_p'\mathbf{x} = \sum_{i=1}^{p} a_{pi}x_i$$
Then
$$\mathrm{Var}(Y_i) = \mathbf{a}_i'\Sigma\mathbf{a}_i, \quad i = 1, 2, \dots, p$$
$$\mathrm{Cov}(Y_i, Y_k) = \mathbf{a}_i'\Sigma\mathbf{a}_k, \quad i, k = 1, 2, \dots, p$$
1. First principal component = the linear combination $\mathbf{a}_1'\mathbf{x}$ that maximizes $\mathrm{Var}(\mathbf{a}_1'\mathbf{x})$ subject to $\mathbf{a}_1'\mathbf{a}_1 = 1$.
2. Second principal component = the linear combination $\mathbf{a}_2'\mathbf{x}$ that maximizes $\mathrm{Var}(\mathbf{a}_2'\mathbf{x})$ subject to $\mathbf{a}_2'\mathbf{a}_2 = 1$ and $\mathrm{Cov}(\mathbf{a}_1'\mathbf{x}, \mathbf{a}_2'\mathbf{x}) = 0$.
$i$th principal component = the linear combination $\mathbf{a}_i'\mathbf{x}$ that maximizes $\mathrm{Var}(\mathbf{a}_i'\mathbf{x})$ subject to $\mathbf{a}_i'\mathbf{a}_i = 1$ and $\mathrm{Cov}(\mathbf{a}_i'\mathbf{x}, \mathbf{a}_k'\mathbf{x}) = 0$ for $k < i$.
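To make the construction concrete, here is a minimal numpy sketch (not part of the original notes) that extracts the principal components of a covariance matrix by eigendecomposition; the matrix Sigma below is an invented example used only for illustration.

import numpy as np

# Hypothetical covariance matrix, for illustration only.
Sigma = np.array([[4.0, 2.0, 0.5],
                  [2.0, 3.0, 1.0],
                  [0.5, 1.0, 2.0]])

# eigh returns eigenvalues in ascending order for symmetric matrices;
# reverse so that lambda_1 >= lambda_2 >= ... >= lambda_p.
eigvals, eigvecs = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]
lam = eigvals[order]               # Var(Y_i) = lambda_i
A = eigvecs[:, order]              # column i holds the coefficient vector a_i

for i in range(len(lam)):
    print(f"Y{i+1}: coefficients {A[:, i]}, variance {lam[i]:.4f}")

# Each a_i has unit length and the resulting Y_i are uncorrelated:
print(np.allclose(A.T @ A, np.eye(3)))              # a_i'a_i = 1, a_i'a_k = 0
print(np.allclose(A.T @ Sigma @ A, np.diag(lam)))   # Cov(Y) is diagonal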
Find the principal components and the proportion of the total population variance explained by each when the covariance matrix is
$$\Sigma = \begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{pmatrix}$$
To solve this you will have to go through your notes, but you can do this even though you were not given the formula.
Hint: Recall the maximization of quadratic forms for points on the unit sphere.
Let $B$ be a $p \times p$ positive definite matrix with eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$ and associated normalized eigenvectors $\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_p$. Then
$$\max_{\mathbf{x} \ne \mathbf{0}} \frac{\mathbf{x}'B\mathbf{x}}{\mathbf{x}'\mathbf{x}} = \lambda_1 \quad (\text{attained when } \mathbf{x} = \mathbf{e}_1)$$
$$\min_{\mathbf{x} \ne \mathbf{0}} \frac{\mathbf{x}'B\mathbf{x}}{\mathbf{x}'\mathbf{x}} = \lambda_p \quad (\text{attained when } \mathbf{x} = \mathbf{e}_p)$$
Moreover,
$$\max_{\mathbf{x} \perp \mathbf{e}_1, \dots, \mathbf{e}_k} \frac{\mathbf{x}'B\mathbf{x}}{\mathbf{x}'\mathbf{x}} = \lambda_{k+1} \quad (\text{attained when } \mathbf{x} = \mathbf{e}_{k+1}), \quad k = 1, 2, \dots, p-1$$
Answer:
$$|\Sigma - \lambda I| = \begin{vmatrix} 2-\lambda & -1 & 0 \\ -1 & 2-\lambda & -1 \\ 0 & -1 & 2-\lambda \end{vmatrix} = (2-\lambda)\left[(2-\lambda)^2 - 2\right] = (2-\lambda)(\lambda^2 - 4\lambda + 2) = 0$$
so $\lambda = 2$ or $\lambda = 2 \pm \sqrt{2} = 2\left(1 \pm \tfrac{1}{\sqrt{2}}\right)$.
Ordering the eigenvalues from largest to smallest, the eigenvalue-eigenvector pairs are
$$\lambda_1 = 2 + \sqrt{2}, \quad \mathbf{e}_1 = \left(\tfrac{1}{2},\ -\tfrac{1}{\sqrt{2}},\ \tfrac{1}{2}\right)'$$
$$\lambda_2 = 2, \quad \mathbf{e}_2 = \left(\tfrac{1}{\sqrt{2}},\ 0,\ -\tfrac{1}{\sqrt{2}}\right)'$$
$$\lambda_3 = 2 - \sqrt{2}, \quad \mathbf{e}_3 = \left(\tfrac{1}{2},\ \tfrac{1}{\sqrt{2}},\ \tfrac{1}{2}\right)'$$
Principal components, their variances, and the proportion of the total variance ($\mathrm{tr}\,\Sigma = 6$) explained by each:
$$Y_1 = \tfrac{1}{2}X_1 - \tfrac{1}{\sqrt{2}}X_2 + \tfrac{1}{2}X_3, \quad \mathrm{Var}(Y_1) = 2 + \sqrt{2}, \quad \text{proportion} = \tfrac{2+\sqrt{2}}{6} \approx 0.569$$
$$Y_2 = \tfrac{1}{\sqrt{2}}X_1 - \tfrac{1}{\sqrt{2}}X_3, \quad \mathrm{Var}(Y_2) = 2, \quad \text{proportion} = \tfrac{2}{6} = \tfrac{1}{3}$$
$$Y_3 = \tfrac{1}{2}X_1 + \tfrac{1}{\sqrt{2}}X_2 + \tfrac{1}{2}X_3, \quad \mathrm{Var}(Y_3) = 2 - \sqrt{2}, \quad \text{proportion} = \tfrac{2-\sqrt{2}}{6} \approx 0.098$$
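A quick numerical check of the answer (added here as an illustration, not part of the original transcript), using the covariance matrix as reconstructed above; numpy may return some eigenvectors with the opposite sign, which does not change the components.

import numpy as np

Sigma = np.array([[ 2., -1.,  0.],
                  [-1.,  2., -1.],
                  [ 0., -1.,  2.]])

lam, E = np.linalg.eigh(Sigma)      # ascending order
lam, E = lam[::-1], E[:, ::-1]      # descending: 2+sqrt(2), 2, 2-sqrt(2)

print(lam)                          # [3.4142  2.      0.5858]
print(lam / lam.sum())              # proportions: 0.569, 0.333, 0.098
print(E)                            # columns are e_1, e_2, e_3 (up to sign)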
Let $\Sigma$ be the covariance matrix associated with the random vector $\mathbf{x} = (x_1, x_2, \dots, x_p)'$, and let $\Sigma$ have the eigenvalue-eigenvector pairs $(\lambda_1, \mathbf{e}_1), (\lambda_2, \mathbf{e}_2), \dots, (\lambda_p, \mathbf{e}_p)$ where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$. Then the $i$th principal component is given by
$$Y_i = \mathbf{e}_i'\mathbf{x} = e_{i1}x_1 + e_{i2}x_2 + \cdots + e_{ip}x_p, \quad i = 1, 2, \dots, p$$
with
$$\mathrm{Var}(Y_i) = \mathbf{e}_i'\Sigma\mathbf{e}_i = \lambda_i, \quad i = 1, 2, \dots, p$$
$$\mathrm{Cov}(Y_i, Y_k) = \mathbf{e}_i'\Sigma\mathbf{e}_k = 0, \quad i \ne k$$
1. Also,
$$\sigma_{11} + \sigma_{22} + \cdots + \sigma_{pp} = \sum_{i=1}^{p} \mathrm{Var}(x_i) = \lambda_1 + \lambda_2 + \cdots + \lambda_p = \sum_{i=1}^{p} \mathrm{Var}(Y_i)$$
Proof:
By definition, $\mathrm{tr}(\Sigma) = \sigma_{11} + \sigma_{22} + \cdots + \sigma_{pp}$. We can write $\Sigma = P\Lambda P'$, where $\Lambda$ is the diagonal matrix of eigenvalues and $P = [\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_p]$, so that $PP' = P'P = I$. Then
$$\mathrm{tr}(\Sigma) = \mathrm{tr}(P\Lambda P') = \mathrm{tr}(\Lambda P'P) = \mathrm{tr}(\Lambda) = \sum_{i=1}^{p} \lambda_i$$
$$\sum_{i=1}^{p} \mathrm{Var}(x_i) = \mathrm{tr}(\Sigma) = \mathrm{tr}(\Lambda) = \sum_{i=1}^{p} \mathrm{Var}(Y_i)$$
Thus the total population variance is $\sum_{i=1}^{p} \sigma_{ii} = \sum_{i=1}^{p} \lambda_i$, and the proportion of total population variance due to the $k$th principal component is
$$\frac{\lambda_k}{\sum_{i=1}^{p} \lambda_i}, \quad k = 1, 2, \dots, p.$$
If $Y_1 = \mathbf{e}_1'\mathbf{x},\ Y_2 = \mathbf{e}_2'\mathbf{x},\ \dots,\ Y_p = \mathbf{e}_p'\mathbf{x}$ are the principal components obtained from the covariance matrix $\Sigma$, then
$$\rho_{Y_i, x_k} = \frac{e_{ik}\sqrt{\lambda_i}}{\sqrt{\sigma_{kk}}}, \quad i, k = 1, 2, \dots, p$$
are the correlation coefficients between $Y_i$ and the variables $x_k$. Here $(\lambda_1, \mathbf{e}_1), (\lambda_2, \mathbf{e}_2), \dots, (\lambda_p, \mathbf{e}_p)$ are the eigenvalue-eigenvector pairs for $\Sigma$.

Show $\rho_{Y_i, x_k} = e_{ik}\sqrt{\lambda_i}/\sqrt{\sigma_{kk}}$.
Proof: Set $\mathbf{a}_k' = (0, 0, \dots, 1, \dots, 0)$, with the 1 in the $k$th position, so that $x_k = \mathbf{a}_k'\mathbf{x}$. Then
$$\mathrm{Cov}(x_k, Y_i) = \mathrm{Cov}(\mathbf{a}_k'\mathbf{x}, \mathbf{e}_i'\mathbf{x}) = \mathbf{a}_k'\Sigma\mathbf{e}_i$$
Since $\Sigma\mathbf{e}_i = \lambda_i\mathbf{e}_i$,
$$\mathrm{Cov}(x_k, Y_i) = \mathbf{a}_k'\lambda_i\mathbf{e}_i = \lambda_i e_{ik}$$
Also $\mathrm{Var}(Y_i) = \lambda_i$ (shown earlier) and $\mathrm{Var}(x_k) = \sigma_{kk}$, so
$$\rho_{Y_i, x_k} = \frac{\mathrm{Cov}(Y_i, x_k)}{\sqrt{\mathrm{Var}(Y_i)}\sqrt{\mathrm{Var}(x_k)}} = \frac{\lambda_i e_{ik}}{\sqrt{\lambda_i}\sqrt{\sigma_{kk}}} = \frac{e_{ik}\sqrt{\lambda_i}}{\sqrt{\sigma_{kk}}}, \quad i, k = 1, 2, \dots, p$$
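The identity above is easy to check numerically. The sketch below (an illustration added here, not from the original notes) compares $e_{ik}\sqrt{\lambda_i}/\sqrt{\sigma_{kk}}$ with correlations estimated from simulated data drawn with the covariance matrix of the earlier exercise.

import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[ 2., -1.,  0.],
                  [-1.,  2., -1.],
                  [ 0., -1.,  2.]])

lam, E = np.linalg.eigh(Sigma)
lam, E = lam[::-1], E[:, ::-1]            # (lambda_i, e_i) in descending order

# Theoretical correlation between Y_i and x_k: row k, column i below.
theory = E * np.sqrt(lam) / np.sqrt(np.diag(Sigma))[:, None]

# Empirical check from a large simulated sample.
X = rng.multivariate_normal(np.zeros(3), Sigma, size=200_000)
Y = X @ E                                 # principal component scores
empirical = np.array([[np.corrcoef(Y[:, i], X[:, k])[0, 1]
                       for i in range(3)] for k in range(3)])

print(np.round(theory, 3))
print(np.round(empirical, 3))             # should agree to about 2 decimals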
Principal Components from Standardized Variables
$$Z_1 = \frac{X_1 - \mu_1}{\sqrt{\sigma_{11}}}, \quad Z_2 = \frac{X_2 - \mu_2}{\sqrt{\sigma_{22}}}, \quad \dots, \quad Z_p = \frac{X_p - \mu_p}{\sqrt{\sigma_{pp}}}$$
In matrix notation, $\mathbf{Z} = (V^{1/2})^{-1}(\mathbf{X} - \boldsymbol{\mu})$, where
$$V^{1/2} = \begin{pmatrix} \sqrt{\sigma_{11}} & 0 & \cdots & 0 \\ 0 & \sqrt{\sigma_{22}} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sqrt{\sigma_{pp}} \end{pmatrix}$$
Then
$$E(\mathbf{Z}) = \mathbf{0}, \quad \mathrm{Cov}(\mathbf{Z}) = (V^{1/2})^{-1}\,\Sigma\,(V^{1/2})^{-1} = \boldsymbol{\rho}$$
The principal components of $\mathbf{Z}$ can be obtained from the eigenvectors of the correlation matrix $\boldsymbol{\rho}$ of $\mathbf{X}$. For notation we shall continue to use $Y_i$ to refer to the $i$th principal component and $(\lambda_i, \mathbf{e}_i)$ for the eigenvalue-eigenvector pair from either $\Sigma$ or $\boldsymbol{\rho}$. However, the $(\lambda_i, \mathbf{e}_i)$ derived from $\Sigma$ are, in general, not the same as the ones derived from $\boldsymbol{\rho}$.
The $i$th principal component of the standardized variables $\mathbf{Z} = (Z_1, Z_2, \dots, Z_p)'$ with $\mathrm{Cov}(\mathbf{Z}) = \boldsymbol{\rho}$ is given by
$$Y_i = \mathbf{e}_i'\mathbf{Z} = \mathbf{e}_i'(V^{1/2})^{-1}(\mathbf{X} - \boldsymbol{\mu}), \quad i = 1, \dots, p$$
$$\sum_{i=1}^{p} \mathrm{Var}(Y_i) = \sum_{i=1}^{p} \mathrm{Var}(Z_i) = p \quad (\text{the number of random variables, not } \rho)$$
$$\rho_{Y_i, Z_k} = e_{ik}\sqrt{\lambda_i}, \quad i, k = 1, 2, \dots, p$$
where $(\lambda_i, \mathbf{e}_i)$ are the eigenvalue-eigenvector pairs for $\boldsymbol{\rho}$, with $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$.
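A small numerical sketch (illustrative only, with made-up data) of the standardized-variable version: the eigenvalues of the correlation matrix sum to p, and when variables are on very different scales the covariance-based components are dominated by the large-variance variable.

import numpy as np

rng = np.random.default_rng(1)
X = rng.multivariate_normal([0, 0, 0],
                            [[1, .6, .3], [.6, 1, .4], [.3, .4, 1]], size=500)
X = X * np.array([1.0, 10.0, 100.0])    # put the variables on different scales

S = np.cov(X, rowvar=False)             # sample covariance matrix
R = np.corrcoef(X, rowvar=False)        # sample correlation matrix

lam_S = np.linalg.eigvalsh(S)[::-1]
lam_R = np.linalg.eigvalsh(R)[::-1]

print(lam_S / lam_S.sum())              # dominated by the large-variance variable
print(lam_R, lam_R.sum())               # eigenvalues of rho; they sum to p = 3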
If $S = \{s_{ik}\}$ is the $p \times p$ sample covariance matrix with eigenvalue-eigenvector pairs $(\hat{\lambda}_1, \hat{\mathbf{e}}_1), (\hat{\lambda}_2, \hat{\mathbf{e}}_2), \dots, (\hat{\lambda}_p, \hat{\mathbf{e}}_p)$, the $i$th sample principal component is given by
$$\hat{y}_i = \hat{\mathbf{e}}_i'\mathbf{x} = \hat{e}_{i1}x_1 + \hat{e}_{i2}x_2 + \cdots + \hat{e}_{ip}x_p, \quad i = 1, 2, \dots, p$$
where $\hat{\lambda}_1 \ge \hat{\lambda}_2 \ge \cdots \ge \hat{\lambda}_p \ge 0$ and $\mathbf{x}$ is any observation on the variables $X_1, X_2, \dots, X_p$. Also:
sample variance of $\hat{y}_k = \hat{\lambda}_k$, $\quad k = 1, 2, \dots, p$
sample covariance of $(\hat{y}_i, \hat{y}_k) = 0$, $\quad i \ne k$
$$r_{\hat{y}_i, x_k} = \frac{\hat{e}_{ik}\sqrt{\hat{\lambda}_i}}{\sqrt{s_{kk}}}, \quad i, k = 1, 2, \dots, p$$
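A sketch of the sample principal components computed directly from a data matrix (the data below are simulated and purely hypothetical); the variances of the scores reproduce the eigenvalues of S and the scores are uncorrelated.

import numpy as np

rng = np.random.default_rng(2)
X = rng.multivariate_normal(mean=[5, 10, 20],
                            cov=[[4, 2, 1], [2, 3, 1], [1, 1, 2]],
                            size=100)                      # hypothetical sample

S = np.cov(X, rowvar=False)                                # sample covariance S
lam_hat, E_hat = np.linalg.eigh(S)
lam_hat, E_hat = lam_hat[::-1], E_hat[:, ::-1]             # descending order

scores = (X - X.mean(axis=0)) @ E_hat                      # sample PC scores y-hat

print(np.round(np.var(scores, axis=0, ddof=1), 4))         # equals lambda-hat
print(np.round(lam_hat, 4))
print(np.round(np.corrcoef(scores, rowvar=False), 4))      # off-diagonals ~ 0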
Factor Analysis
The main purpose of factor analysis is to try to describe the covariance relationships among many variables in terms of a few underlying, but unobservable, random quantities called factors.

The Orthogonal Factor Model:
The observable random vector $\mathbf{X}$, with $p$ components, has mean $\boldsymbol{\mu}$ and covariance matrix $\Sigma$. The factor model proposes that $\mathbf{X}$ is linearly dependent upon a few unobservable random variables $F_1, F_2, \dots, F_m$, called common factors, and $p$ additional sources of variation $\varepsilon_1, \varepsilon_2, \dots, \varepsilon_p$, called errors, or specific factors.
$$X_1 - \mu_1 = l_{11}F_1 + l_{12}F_2 + \cdots + l_{1m}F_m + \varepsilon_1$$
$$X_2 - \mu_2 = l_{21}F_1 + l_{22}F_2 + \cdots + l_{2m}F_m + \varepsilon_2$$
$$\vdots$$
$$X_p - \mu_p = l_{p1}F_1 + l_{p2}F_2 + \cdots + l_{pm}F_m + \varepsilon_p$$
or, in matrix notation,
$$\underset{(p \times 1)}{\mathbf{X} - \boldsymbol{\mu}} = \underset{(p \times m)}{L}\ \underset{(m \times 1)}{\mathbf{F}} + \underset{(p \times 1)}{\boldsymbol{\varepsilon}}$$
$l_{ij}$ is called the loading of the $i$th variable on the $j$th factor, and $L$ is called the matrix of factor loadings. The $p$ deviations $X_1 - \mu_1, X_2 - \mu_2, \dots, X_p - \mu_p$ are expressed in terms of $F_1, \dots, F_m, \varepsilon_1, \dots, \varepsilon_p$, that is, $p + m$ random variables.

Assumptions:
$$E(\mathbf{F}) = \underset{(m \times 1)}{\mathbf{0}}, \quad \mathrm{Cov}(\mathbf{F}) = E(\mathbf{F}\mathbf{F}') = \underset{(m \times m)}{I}$$
$$E(\boldsymbol{\varepsilon}) = \underset{(p \times 1)}{\mathbf{0}}, \quad \mathrm{Cov}(\boldsymbol{\varepsilon}) = E(\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}') = \underset{(p \times p)}{\Psi} = \begin{pmatrix} \psi_1 & 0 & \cdots & 0 \\ 0 & \psi_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \psi_p \end{pmatrix}$$
$\mathbf{F}$ and $\boldsymbol{\varepsilon}$ are independent, thus $\mathrm{Cov}(\boldsymbol{\varepsilon}, \mathbf{F}) = E(\boldsymbol{\varepsilon}\mathbf{F}') = \underset{(p \times m)}{\mathbf{0}}$.

Solve for $\Sigma$ in terms of $L$ and $\Psi$:
$$\Sigma = \mathrm{Cov}(\mathbf{X}) = E(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})'$$
$$(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})' = (L\mathbf{F} + \boldsymbol{\varepsilon})(L\mathbf{F} + \boldsymbol{\varepsilon})' = L\mathbf{F}\mathbf{F}'L' + \boldsymbol{\varepsilon}\mathbf{F}'L' + L\mathbf{F}\boldsymbol{\varepsilon}' + \boldsymbol{\varepsilon}\boldsymbol{\varepsilon}'$$
$$E(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})' = L\,E(\mathbf{F}\mathbf{F}')\,L' + E(\boldsymbol{\varepsilon}\mathbf{F}')L' + L\,E(\mathbf{F}\boldsymbol{\varepsilon}') + E(\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}')$$
By independence, $E(\boldsymbol{\varepsilon}\mathbf{F}') = \mathbf{0}$ and $E(\mathbf{F}\boldsymbol{\varepsilon}') = \mathbf{0}$, so
$$\Sigma = LIL' + \Psi = LL' + \Psi$$
Also,
$$(\mathbf{X} - \boldsymbol{\mu})\mathbf{F}' = (L\mathbf{F} + \boldsymbol{\varepsilon})\mathbf{F}' = L\mathbf{F}\mathbf{F}' + \boldsymbol{\varepsilon}\mathbf{F}'$$
so
$$\mathrm{Cov}(\mathbf{X}, \mathbf{F}) = E(\mathbf{X} - \boldsymbol{\mu})\mathbf{F}' = L\,E(\mathbf{F}\mathbf{F}') + E(\boldsymbol{\varepsilon}\mathbf{F}') = L.$$
So
$$\mathrm{Var}(X_i) = l_{i1}^2 + \cdots + l_{im}^2 + \psi_i$$
$$\mathrm{Cov}(X_i, X_k) = l_{i1}l_{k1} + \cdots + l_{im}l_{km}, \quad i \ne k$$
$$\mathrm{Cov}(X_i, F_j) = l_{ij}$$
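To see the covariance structure $\Sigma = LL' + \Psi$ in numbers, here is a small sketch with an invented loading matrix L and specific variances (purely illustrative, not the original example).

import numpy as np

# Hypothetical loadings for p = 4 variables on m = 2 common factors.
L = np.array([[0.9, 0.1],
              [0.8, 0.2],
              [0.2, 0.7],
              [0.1, 0.8]])
psi = np.array([0.18, 0.32, 0.47, 0.35])       # specific variances

Sigma = L @ L.T + np.diag(psi)                 # Sigma = LL' + Psi

# Var(X_i) = sum_j l_ij^2 + psi_i  (communality plus specific variance)
print(np.diag(Sigma))
print((L**2).sum(axis=1) + psi)

# Cov(X_i, F_j) = l_ij : simulate the model and check empirically.
rng = np.random.default_rng(3)
n = 200_000
F = rng.normal(size=(n, 2))                    # Cov(F) = I
eps = rng.normal(scale=np.sqrt(psi), size=(n, 4))
X = F @ L.T + eps                              # X - mu = LF + eps (mu = 0 here)
print(np.round(np.cov(X.T, F.T)[:4, 4:], 3))   # approximately equals L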
The Principal Component (and Principal Factor) Method
(one method for factor analysis)

Recall the spectral decomposition: let $\Sigma$ (a covariance matrix) have eigenvalue-eigenvector pairs $(\lambda_i, \mathbf{e}_i)$ with $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$. Then
$$\Sigma = \lambda_1\mathbf{e}_1\mathbf{e}_1' + \lambda_2\mathbf{e}_2\mathbf{e}_2' + \cdots + \lambda_p\mathbf{e}_p\mathbf{e}_p' = \left[\sqrt{\lambda_1}\mathbf{e}_1 \mid \sqrt{\lambda_2}\mathbf{e}_2 \mid \cdots \mid \sqrt{\lambda_p}\mathbf{e}_p\right]\begin{pmatrix} \sqrt{\lambda_1}\mathbf{e}_1' \\ \sqrt{\lambda_2}\mathbf{e}_2' \\ \vdots \\ \sqrt{\lambda_p}\mathbf{e}_p' \end{pmatrix}$$
Thus, for a factor analysis where $m = p$ (as many factors as variables) and $\psi_i = 0$ for all $i$,
$$\underset{(p \times p)}{\Sigma} = \underset{(p \times p)}{L}\,\underset{(p \times p)}{L'} + \underset{(p \times p)}{\mathbf{0}} = LL'$$
Since we almost always want fewer factors than original variables ($m < p$), one approach when the last $p - m$ eigenvalues are small is to neglect the contribution of $\lambda_{m+1}\mathbf{e}_{m+1}\mathbf{e}_{m+1}' + \cdots + \lambda_p\mathbf{e}_p\mathbf{e}_p'$ to $\Sigma$. We then use the approximation
$$\underset{(p \times p)}{\Sigma} \approx \underset{(p \times m)}{L}\,\underset{(m \times p)}{L'}$$
removing the last $p - m$ components. This assumes the specific-factor (error) terms can still be ignored. If we wish to allow for $\Psi$ to be included,
$$\Sigma \approx LL' + \Psi = \left[\sqrt{\lambda_1}\mathbf{e}_1 \mid \cdots \mid \sqrt{\lambda_m}\mathbf{e}_m\right]\begin{pmatrix} \sqrt{\lambda_1}\mathbf{e}_1' \\ \vdots \\ \sqrt{\lambda_m}\mathbf{e}_m' \end{pmatrix} + \begin{pmatrix} \psi_1 & 0 & \cdots & 0 \\ 0 & \psi_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \psi_p \end{pmatrix}$$
where
$$\psi_i = \sigma_{ii} - \sum_{j=1}^{m} l_{ij}^2 \quad \text{for } i = 1, \dots, p.$$
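A minimal sketch of the principal component (principal factor) extraction just described, keeping m factors and putting the leftover variance into the diagonal of Psi; the correlation matrix R below is invented for illustration.

import numpy as np

def principal_factor(Sigma, m):
    """Principal component solution of the factor model: Sigma ~ LL' + Psi."""
    lam, E = np.linalg.eigh(Sigma)
    lam, E = lam[::-1], E[:, ::-1]                   # descending eigenvalues
    L = E[:, :m] * np.sqrt(lam[:m])                  # columns sqrt(lambda_j) e_j
    psi = np.diag(Sigma) - (L**2).sum(axis=1)        # psi_i = sigma_ii - sum_j l_ij^2
    return L, psi

# Illustrative correlation matrix (hypothetical numbers).
R = np.array([[1.00, 0.63, 0.45],
              [0.63, 1.00, 0.35],
              [0.45, 0.35, 1.00]])

L, psi = principal_factor(R, m=1)
print(np.round(L, 3))
print(np.round(psi, 3))
print(np.round(L @ L.T + np.diag(psi), 3))           # reproduces R approximately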
When applying this approach it is typical to center the observations (subtract $\bar{\mathbf{x}}$). In the case where the units of the variables are not the same (e.g. height in cm and weight in kg), it is usually desirable to work with the standardized variables
$$\mathbf{z}_j = \begin{pmatrix} (x_{j1} - \bar{x}_1)/\sqrt{s_{11}} \\ (x_{j2} - \bar{x}_2)/\sqrt{s_{22}} \\ \vdots \\ (x_{jp} - \bar{x}_p)/\sqrt{s_{pp}} \end{pmatrix}, \quad j = 1, 2, \dots, n$$
Maximum Likelihood Method for Factor Analysis
If the common factors $\mathbf{F}$ and the specific factors $\boldsymbol{\varepsilon}$ can be assumed to be normally distributed, then maximum likelihood estimates of the factor loadings and specific variances may be obtained. When $\mathbf{F}_j$ and $\boldsymbol{\varepsilon}_j$ are jointly normal, the observations $\mathbf{X}_j = \boldsymbol{\mu} + L\mathbf{F}_j + \boldsymbol{\varepsilon}_j$ are then normal. The likelihood is
$$L(\boldsymbol{\mu}, \Sigma) = (2\pi)^{-np/2}\,|\Sigma|^{-n/2}\exp\left\{-\tfrac{1}{2}\mathrm{tr}\left[\Sigma^{-1}\left(\sum_{j=1}^{n}(\mathbf{x}_j - \bar{\mathbf{x}})(\mathbf{x}_j - \bar{\mathbf{x}})' + n(\bar{\mathbf{x}} - \boldsymbol{\mu})(\bar{\mathbf{x}} - \boldsymbol{\mu})'\right)\right]\right\}$$
$$= (2\pi)^{-(n-1)p/2}\,|\Sigma|^{-(n-1)/2}\exp\left\{-\tfrac{1}{2}\mathrm{tr}\left[\Sigma^{-1}\sum_{j=1}^{n}(\mathbf{x}_j - \bar{\mathbf{x}})(\mathbf{x}_j - \bar{\mathbf{x}})'\right]\right\} \times (2\pi)^{-p/2}\,|\Sigma|^{-1/2}\exp\left\{-\tfrac{n}{2}(\bar{\mathbf{x}} - \boldsymbol{\mu})'\Sigma^{-1}(\bar{\mathbf{x}} - \boldsymbol{\mu})\right\}$$
which depends on $L$ and $\Psi$ through $\Sigma = LL' + \Psi$. To make $L$ well defined (a unique solution), impose the condition that $L'\Psi^{-1}L$ is a diagonal matrix.
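In practice the ML solution is found numerically. As one possibility (added here, not part of the original notes), scikit-learn's FactorAnalysis fits the loadings and specific variances by maximum likelihood; the loading matrix and data below are simulated only to have something to fit, and the estimates are recovered only up to rotation and sign.

import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(4)
# Simulate data from a 2-factor model with hypothetical loadings.
L_true = np.array([[0.9, 0.0], [0.8, 0.1], [0.1, 0.8], [0.0, 0.7], [0.5, 0.5]])
psi_true = np.array([0.2, 0.3, 0.3, 0.4, 0.3])
n = 5000
F = rng.normal(size=(n, 2))
X = F @ L_true.T + rng.normal(scale=np.sqrt(psi_true), size=(n, 5))

fa = FactorAnalysis(n_components=2).fit(X)
L_hat = fa.components_.T           # estimated loadings (p x m), up to rotation
psi_hat = fa.noise_variance_       # estimated specific variances

print(np.round(L_hat, 2))
print(np.round(psi_hat, 2))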
Proportion of total sample variance due to the $j$th factor:
$$\frac{\hat{l}_{1j}^2 + \hat{l}_{2j}^2 + \cdots + \hat{l}_{pj}^2}{s_{11} + s_{22} + \cdots + s_{pp}}$$
Factor Rotation
All factor loadings obtained from the initial loadings by an orthogonal transformation have the same ability to reproduce the covariance (or correlation) matrix:
$$\hat{L}^* = \hat{L}T, \quad \text{where } TT' = T'T = I$$
so
$$\hat{L}\hat{L}' + \hat{\Psi} = \hat{L}TT'\hat{L}' + \hat{\Psi} = \hat{L}^*\hat{L}^{*\prime} + \hat{\Psi}$$
The $\hat{\Psi}$ remains unchanged as well.
Imagine there were only $m = 2$ factors:
$$\underset{(p \times 2)}{\hat{L}^*} = \underset{(p \times 2)}{\hat{L}}\ \underset{(2 \times 2)}{T}$$
$$\text{where } T = \begin{pmatrix} \cos\phi & \sin\phi \\ -\sin\phi & \cos\phi \end{pmatrix} \ (\text{clockwise rotation}) \quad \text{or} \quad T = \begin{pmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{pmatrix} \ (\text{counterclockwise rotation})$$
$\phi$ is the angle through which the factor loadings will be rotated.
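The invariance claim is easy to verify numerically; here is a sketch with an arbitrary angle and an invented loading matrix (illustration only).

import numpy as np

L_hat = np.array([[0.7, 0.3],
                  [0.8, 0.2],
                  [0.2, 0.9],
                  [0.3, 0.8]])                  # hypothetical estimated loadings (m = 2)

phi = np.deg2rad(30)
T = np.array([[np.cos(phi), -np.sin(phi)],
              [np.sin(phi),  np.cos(phi)]])     # counterclockwise rotation

L_star = L_hat @ T

print(np.allclose(T @ T.T, np.eye(2)))                   # T is orthogonal
print(np.allclose(L_hat @ L_hat.T, L_star @ L_star.T))   # LL' is unchanged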
A Reference:
The following 13 slides come from: Multivariate Data Analysis Using SPSS, by John Zhang, ARL, IUP.
Factor Analysis-1
The main goal of factor analysis is data reduction. A typical use of factor analysis is in survey research, where a researcher wishes to represent a number of questions with a smaller number of factors.
Two questions in factor analysis:
- How many factors are there?
- What do they represent (interpretation)?
Two technical aids:
- Eigenvalues
- Percentage of variance accounted for
Factor Analysis-2
Two types of factor analysis:
- Exploratory: introduced here
- Confirmatory: SPSS AMOS
Theoretical basis:
- Correlations among variables are explained by underlying factors
- An example of a mathematical one-factor model for two variables:
  V1 = L1*F1 + E1
  V2 = L2*F1 + E2
Factor Analysis-3
- Each variable is composed of a common factor (F1) multiplied by a loading coefficient (L1, L2 - the lambdas or factor loadings) plus a random component
- V1 and V2 correlate because of the common factor, and the correlation should relate to the factor loadings; thus, the factor loadings can be estimated from the correlations
- A given set of correlations can produce different factor loadings (i.e. the solutions are not unique)
- One should pick the simplest solution
Factor Analysis-4
A factor solution needs to be confirmed:
- By a different factoring method
- By a different sample
That is, the findings should not differ by the methodology of analysis nor by the sample.
More terminology:
- Factor loading: interpreted as the Pearson correlation between the variable and the factor
- Communality: the proportion of variability for a given variable that is explained by the factors
- Extraction: the process by which the factors are determined from a large set of variables
Factor Analysis-5 (Principal components)
Principal components: one of the extraction methods.
- A principal component is a linear combination of observed variables that is independent (orthogonal) of other components
- The first component accounts for the largest amount of variance in the input data; the second component accounts for the largest amount of the remaining variance, and so on
- Saying the components are orthogonal means they are uncorrelated
Factor Analysis-6 (Principal components)
Possible application of principal components:
- E.g. in survey research, it is common to have many questions addressing one issue (e.g. customer service). It is likely that these questions are highly correlated, and it is problematic to use such variables in some statistical procedures (e.g. regression). One can instead use factor scores, computed from the factor loadings on each orthogonal component.
Factor Analysis-7 (Principal components)
Principal components vs. other extraction methods:
- Principal components focus on accounting for the maximum amount of variance (the diagonal of a correlation matrix)
- Other extraction methods (e.g. principal axis factoring) focus more on accounting for the correlations between variables (the off-diagonal correlations)
- Principal components can be defined as a unique combination of variables, but the other factor methods cannot
- Principal components are used for data reduction but are more difficult to interpret
Factor Analysis-8
Number of factors:
- Eigenvalues are often used to determine how many factors to retain (see the sketch after this list)
  - Take as many factors as there are eigenvalues greater than 1
  - An eigenvalue represents the amount of standardized variance in the variables accounted for by a factor
  - The amount of standardized variance in a single variable is 1
  - The sum of the retained eigenvalues, relative to the number of variables, gives the proportion of variance accounted for
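A sketch of the eigenvalue-greater-than-1 rule with numpy (the data are made-up survey items, for illustration only):

import numpy as np

rng = np.random.default_rng(5)
X = rng.multivariate_normal(np.zeros(4),
                            [[1, .7, .2, .1],
                             [.7, 1, .2, .1],
                             [.2, .2, 1, .6],
                             [.1, .1, .6, 1]], size=300)   # hypothetical items

R = np.corrcoef(X, rowvar=False)
eigvals = np.linalg.eigvalsh(R)[::-1]

print(np.round(eigvals, 3))
print("factors to retain (eigenvalue > 1):", int((eigvals > 1).sum()))
print("variance accounted for:",
      round(eigvals[eigvals > 1].sum() / len(eigvals), 3))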
Factor Analysis-9
Rotation
- Objective: to facilitate interpretation
- Orthogonal rotation: done when data reduction is the objective and the factors need to be orthogonal
  - Varimax: attempts to simplify interpretation by maximizing the variances of the variable loadings on each factor (see the sketch after this list)
  - Quartimax: simplifies the solution by finding a rotation that produces high and low loadings across factors for each variable
- Oblique rotation: used when there is reason to allow the factors to be correlated
  - Oblimin and Promax (Promax runs fast)
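For reference, here is one common numpy implementation of the varimax criterion (a sketch only; SPSS's own routine also applies Kaiser normalization, which is omitted here).

import numpy as np

def varimax(L, gamma=1.0, max_iter=100, tol=1e-6):
    """Rotate a loading matrix L (p x k) to maximize the varimax criterion."""
    p, k = L.shape
    R = np.eye(k)          # accumulated rotation
    d = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        u, s, vt = np.linalg.svd(
            L.T @ (Lr**3 - (gamma / p) * Lr @ np.diag((Lr**2).sum(axis=0))))
        R = u @ vt
        d_old, d = d, s.sum()
        if d_old != 0 and d / d_old < 1 + tol:
            break
    return L @ R           # rotated loadings L* = LT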
Factor Analysis-10
Factor scores: if you are satisfied with a factor solution,
- You can request that a new set of variables be created that represents the scores of each observation on the factors (more difficult to interpret)
- You can use the lambda coefficients to judge which variables are highly related to a factor, then compute the sum or the mean of those variables for further analysis (easier to interpret)
Factor Analysis-11
Sample size: the sample size should be about 10 to 15 times the number of variables (as in other multivariate procedures).
Number of methods: there are 8 factoring methods, including principal components:
- Principal axis: accounts for the correlations between the variables
- Unweighted least squares: minimizes the residuals between the observed and the reproduced correlation matrix
Factor Analysis-12
- Generalized least squares: similar to unweighted least squares but gives more weight to the variables with stronger correlations
- Maximum likelihood: generates the solution that is most likely to have produced the correlation matrix
- Alpha factoring: considers the variables as a sample; does not use factor loadings
- Image factoring: decomposes the variables into a common part and a unique part, then works with the common part
Factor Analysis-13
Recommendations:
- Principal components and principal axis are the most commonly used methods
- When there is multicollinearity, use principal components
- Rotations are often done; try Varimax first
Reference
Factor Analysis from SPSS. Much of the wording comes from the SPSS help and tutorial.
Factor Analysis
Factor Analysis is primarily used for data reduction or structure detection.
- The purpose of data reduction is to remove redundant (highly correlated) variables from the data file, perhaps replacing the entire data file with a smaller number of uncorrelated variables.
- The purpose of structure detection is to examine the underlying (or latent) relationships between the variables.
Factor Analysis
The Factor Analysis procedure has several extraction methods for constructing a solution.
For Data Reduction. The principal components method of extraction
begins by finding a linear combination of variables (a component) that
accounts for as much variation in the original variables as possible. It
then finds another component that accounts for as much of the
remaining variation as possible and is uncorrelated with the previous
component, continuing in this way until there are as many components
as original variables. Usually, a few components will account for most
of the variation, and these components can be used to replace the
original variables. This method is most often used to reduce the
number of variables in the data file.
For Structure Detection. Other Factor Analysis extraction methods go
one step further by adding the assumption that some of the variability
in the data cannot be explained by the components (usually called
factors in other extraction methods). As a result, the total variance
explained by the solution is smaller; however, the addition of this
structure to the factor model makes these methods ideal for
examining relationships between the variables.
With any extraction method, the two questions that a good solution
should try to answer are "How many components (factors) are needed
to represent the variables?" and "What do these components
represent?"
Factor Analysis: Data Reduction
An industry analyst would like to predict automobile sales from a set of predictors. However, many of the predictors are correlated, and the analyst fears that this might adversely affect her results. This information is contained in the file car_sales.sav. Use Factor Analysis with principal components extraction to focus the analysis on a manageable subset of the predictors.

Factor Analysis: Structure Detection
A telecommunications provider wants to better understand service usage patterns in its customer database. If services can be clustered by usage, the company can offer more attractive packages to its customers. A random sample from the customer database is contained in telco.sav. Use Factor Analysis to determine the underlying structure in service usage. Use: Principal Axis Factoring.
Example of Factor Analysis: Structure Detection
A telecommunications provider wants to better understand service usage patterns in its customer database. (Screenshot: selecting the service offerings as the analysis variables.)
Example of Factor Analysis: Descriptives
Click Descriptives: recommend checking Initial Solution (the default). In addition, check "Anti-image" and "KMO and ...".
Example of Factor Analysis: Extraction
Click Extraction: select the Method "Principal axis factoring". Recommend keeping the defaults but also checking "Scree plot".
Example of Factor Analysis: Rotation
Click Rotation: select "Varimax" and "Loading plot(s)".
Understanding the Output
The Kaiser-Meyer-Olkin Measure of Sampling Adequacy is a statistic that indicates the proportion of variance in your variables that might be caused by underlying factors. If it is below 0.5, factor analysis may not be useful.

KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy:   .888
Bartlett's Test of Sphericity    Approx. Chi-Square   6230.901
                                 df                   91
                                 Sig.                 .000

Bartlett's test of sphericity tests the hypothesis that your correlation matrix is an identity matrix, which would indicate that your variables are unrelated and therefore unsuitable for structure detection. If Sig. < 0.05, then factor analysis may be helpful.
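For reference, Bartlett's statistic can be computed from a correlation matrix directly; below is a sketch of the standard formula (which may differ in minor computational details from SPSS's implementation).

import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(R, n):
    """Bartlett's test of sphericity for a p x p correlation matrix R, sample size n."""
    p = R.shape[0]
    statistic = -(n - 1 - (2 * p + 5) / 6.0) * np.log(np.linalg.det(R))
    df = p * (p - 1) // 2
    return statistic, df, chi2.sf(statistic, df)

# Example call (R and n would come from your own data):
# stat, df, p_value = bartlett_sphericity(np.corrcoef(X, rowvar=False), n=len(X))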
Understanding the Output
Communalities                      Initial   Extraction
Long distance last month            .297      .748
Toll free last month                .510      .564
Equipment last month                .579      .697
Calling card last month             .266      .307
Wireless last month                 .660      .708
Multiple lines                      .276      .340
Voice mail                          .471      .501
Paging service                      .527      .541
Internet                            .455      .525
Caller ID                           .552      .623
Call waiting                        .545      .610
Call forwarding                     .532      .596
3-way calling                       .506      .561
Electronic billing                  .416      .488
Extraction Method: Principal Axis Factoring.

Extraction communalities are estimates of the variance in each variable accounted for by the factors in the factor solution. Small values indicate variables that do not fit well with the factor solution and should possibly be dropped from the analysis. The lower values of Multiple lines and Calling card show that they don't fit as well as the others.
Understanding the Output
Before rotation: only three factors in the initial solution have eigenvalues greater than 1. Together, they account for almost 65% of the variability in the original variables. This suggests that three latent influences are associated with service usage, but there remains room for a lot of unexplained variation.
Understanding the Output
After rotation: approximately 56% of the variation is now explained, about a 10% loss in explained variation relative to the initial solution. In general, there are a lot of services that have correlations greater than 0.2 with multiple factors, which muddies the picture. The rotated factor matrix should clear this up.
Understanding the Output
Before rotation:
Factor Matrix(a)                  Factor 1   Factor 2   Factor 3
Long distance last month            .146      -.254       .814
Toll free last month                .652      -.373       .020
Equipment last month                .494       .671       .054
Calling card last month             .364      -.243       .339
Wireless last month                 .799       .261       .037
Multiple lines                      .257       .280       .442
Voice mail                          .669       .228      -.038
Paging service                      .692       .246      -.050
Internet                            .323       .648      -.014
Caller ID                           .689      -.345      -.172
Call waiting                        .678      -.366      -.126
Call forwarding                     .684      -.336      -.128
3-way calling                       .662      -.338      -.093
Electronic billing                  .250       .652      -.035
Extraction Method: Principal Axis Factoring.
a. Attempted to extract 3 factors. More than 25 iterations required. (Convergence = .002.) Extraction was terminated.

The relationships in the unrotated factor matrix are somewhat clear. The third factor is associated with Long distance last month. The second corresponds most strongly to Equipment last month, Internet, and Electronic billing. The first factor is associated with Toll free last month, Wireless last month, Voice mail, Paging service, Caller ID, Call waiting, Call forwarding, and 3-way calling.
Understanding the Output
After rotation:
Rotated Factor Matrix(a)          Factor 1   Factor 2   Factor 3
Long distance last month            .062      -.121       .854
Toll free last month                .726       .018       .191
Equipment last month                .067       .831       .049
Calling card last month             .348      -.012       .431
Wireless last month                 .530       .637       .146
Multiple lines                     -.025       .384       .438
Voice mail                          .455       .539       .054
Paging service                      .468       .566       .044
Internet                           -.049       .722      -.045
Caller ID                           .787       .056       .008
Call waiting                        .779       .033       .054
Call forwarding                     .768       .062       .048
3-way calling                       .743       .050       .078
Electronic billing                 -.107       .686      -.080
Extraction Method: Principal Axis Factoring.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 4 iterations.

The first rotated factor is most highly correlated with Toll free last month, Caller ID, Call waiting, Call forwarding, and 3-way calling. These variables are not particularly correlated with the other two factors. The second factor is most highly correlated with Equipment last month, Internet, and Electronic billing. The third factor is largely unaffected by the rotation.
Thus, there are three major groupings of services, as defined by the services that are most highly correlated with the three factors. Given these groupings, you can make the following observations about the remaining services:
Understanding the Output
(Rotated Factor Matrix repeated; see the table above.)
Because of their moderately large
correlations with both the first and
second factors, Wireless last
month, Voice mail, and Paging
service bridge the "Extras" and
"Tech" groups. Calling card last
month is moderately correlated
with the first and third factors,
thus it bridges the "Extras" and
"Long Distance" groups. Multiple
lines is moderately correlated with
the second and third factors, thus
it bridges the "Tech" and "Long
Distance" groups. This suggests
avenues for cross-selling. For
example, customers who
subscribe to extra services may be
more predisposed to accepting
special offers on wireless services
than Internet services.
Summary: What Was Learned
Using a principal axis factoring extraction, you have uncovered three latent factors that describe relationships between your variables. These factors suggest various patterns of service usage, which you can use to more efficiently increase cross-sales.
Using Principal Components
Principal components can aid in clustering.
What are principal components?
Principal components analysis is a statistical technique that creates new variables that are linear functions of the old variables. The main goal of principal components is to reduce the number of variables needed for analysis.
Principal Components
Analysis (PCA)
What it is and when it
should be used.
Introduction to PCA
What does principal components analysis do?
- It takes a set of correlated variables and creates a smaller set of uncorrelated variables.
- These newly created variables are called principal components.
There are two main objectives for using PCA:
1. Reduce the dimensionality of the data.
   - In simple English: turn p variables into fewer than p variables.
   - While reducing the number of variables, we attempt to keep as much information of the original variables as possible.
   - Thus we try to reduce the number of variables without loss of information.
2. Identify new meaningful underlying variables.
   - This is often not possible.
   - The principal components created are linear combinations of the original variables and often don't lend themselves to any meaning beyond that.
There are several reasons why and situations where PCA is useful.
Introduction to PCA
There are several reasons why PCA is useful.
1. PCA is helpful in discovering whether abnormalities exist in a multivariate dataset.
2. Clustering (which will be covered later):
   - PCA is helpful when it is desirable to classify units into groups with similar attributes. For example: in marketing you may want to classify your customers into groups (or clusters) with similar attributes for marketing purposes.
   - It can also be helpful for verifying the clusters created when clustering.
3. Discriminant analysis:
   - In some cases there may be more response variables than independent variables; it is not possible to use discriminant analysis in this case.
   - Principal components can help reduce the number of response variables to a number less than that of the independent variables.
4. Regression:
   - It can help address the issue of multicollinearity in the independent variables.
Introduction to PCA
Formation of principal components:
1. They are uncorrelated.
2. The 1st principal component accounts for as much of the variability in the data as possible.
3. The 2nd principal component accounts for as much of the remaining variability as possible.
4. The 3rd ...
5. Etc.
Principal Components and Least Squares
Think of the least squares model
$$Y = XB + E$$
Y is an $n \times p$ matrix of the centered observed variables.
X is an $n \times j$ matrix of the scores on the 1st $j$ principal components.
B is a $j \times p$ matrix of the eigenvectors.
E is an $n \times p$ matrix of the residuals.
- Eigenvector <mathematics>: a vector which, when acted on by a particular linear transformation, produces a scalar multiple of the original vector. The scalar in question is called the eigenvalue corresponding to this eigenvector. (www.dictionary.com)
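A sketch of this decomposition with numpy (hypothetical data): the scores on the first j components times the eigenvector matrix, plus the residual, reconstruct the centered data, and the residual shrinks as j grows.

import numpy as np

rng = np.random.default_rng(6)
Y = rng.multivariate_normal(np.zeros(4),
                            [[4, 2, 1, 0.5],
                             [2, 3, 1, 0.5],
                             [1, 1, 2, 0.5],
                             [0.5, 0.5, 0.5, 1]], size=200)
Yc = Y - Y.mean(axis=0)                     # centered observed variables (n x p)

lam, V = np.linalg.eigh(np.cov(Yc, rowvar=False))
V = V[:, ::-1]                              # eigenvectors, largest variance first

j = 2
X = Yc @ V[:, :j]                           # scores on the first j components (n x j)
B = V[:, :j].T                              # eigenvector matrix (j x p)
E = Yc - X @ B                              # residuals (n x p)

print(np.round((E**2).sum() / (Yc**2).sum(), 3))   # fraction of variation left over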
Calculation of the PCA
There are two options:
1. Correlation matrix.
2. Covariance matrix.
Using the covariance matrix will cause variables with large variances to be more strongly associated with components with large eigenvalues, and the opposite is true of variables with small variances. For this reason you should use the correlation matrix unless the variables are comparable or have been standardized.
Limitations to Principal Components
PCA converts a set of correlated variables into a smaller set of uncorrelated variables.
- If the variables are already uncorrelated, then PCA has nothing to add.
- Often it is difficult or impossible to interpret a principal component; that is, principal components often do not lend themselves to any meaning.
SAS Example of PCA
We will analyze data on crime: CRIME RATES PER 100,000 POPULATION BY STATE.
The variables are:
1. MURDER
2. RAPE
3. ROBBERY
4. ASSAULT
5. BURGLARY
6. LARCENY
7. AUTO
SAS command for PCA
SAS CODE:
PROC PRINCOMP DATA=CRIME OUT=CRIMCOMP;
RUN;
The dataset is CRIME and the results will be saved to CRIMCOMP.
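For readers without SAS, here is a rough numpy/pandas equivalent of PROC PRINCOMP (which works from the correlation matrix unless told otherwise); the file name crime.csv and its layout are assumptions made for illustration, not part of the original example.

import numpy as np
import pandas as pd

# Hypothetical file standing in for the CRIME data set (numeric crime-rate columns).
crime = pd.read_csv("crime.csv")            # columns: MURDER, RAPE, ROBBERY, ...
X = crime.to_numpy(dtype=float)

R = np.corrcoef(X, rowvar=False)            # correlation matrix of the variables
eigvals, eigvecs = np.linalg.eigh(R)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

print(np.round(eigvals, 4))                           # eigenvalue column
print(np.round(eigvals / eigvals.sum(), 4))           # proportion column
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)      # standardized variables
scores = Z @ eigvecs                                  # Prin1, Prin2, ... scores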
SAS Output Of Crime Example
Observations: 50    Variables: 7

Simple Statistics
            MURDER         RAPE       ROBBERY       ASSAULT      BURGLARY       LARCENY          AUTO
Mean   7.444000000  25.73400000   124.0920000   211.3000000   1291.904000   2671.288000   377.5260000
StD    3.866768941  10.75962995    88.3485672   100.2530492    432.455711    725.908707   193.3944175
Correlation Matrix
           MURDER    RAPE  ROBBERY  ASSAULT  BURGLARY  LARCENY    AUTO
MURDER     1.0000  0.6012   0.4837   0.6486    0.3858   0.1019  0.0688
RAPE       0.6012  1.0000   0.5919   0.7403    0.7121   0.6140  0.3489
ROBBERY    0.4837  0.5919   1.0000   0.5571    0.6372   0.4467  0.5907
ASSAULT    0.6486  0.7403   0.5571   1.0000    0.6229   0.4044  0.2758
BURGLARY   0.3858  0.7121   0.6372   0.6229    1.0000   0.7921  0.5580
LARCENY    0.1019  0.6140   0.4467   0.4044    0.7921   1.0000  0.4442
AUTO       0.0688  0.3489   0.5907   0.2758    0.5580   0.4442  1.0000
More SAS Output Of Crime Example

Eigenvalues of the Correlation Matrix
     Eigenvalue   Difference   Proportion   Cumulative
1    4.11495951   2.87623768       0.5879       0.5879
2    1.23872183   0.51290521       0.1770       0.7648
3    0.72581663   0.40938458       0.1037       0.8685
4    0.31643205   0.05845759       0.0452       0.9137
5    0.25797446   0.03593499       0.0369       0.9506
6    0.22203947   0.09798342       0.0317       0.9823
7    0.12405606                    0.0177       1.0000

The Difference column is the gap between successive eigenvalues; for example, 0.09798342 = 0.22203947 - 0.12405606 (up to rounding).
The Proportion column is the proportion of variability explained by each principal component individually; it equals the eigenvalue divided by the sum of the eigenvalues.
The first two principal components capture 76.48% of the variation. If you include 6 of the 7 principal components, you capture 98.23% of the variability; the 7th component only captures 1.77%.
More SAS Output Of Crime Example

Eigenvectors
              Prin1      Prin2      Prin3      Prin4      Prin5      Prin6      Prin7
MURDER     0.300279  -.629174   0.178245  -.232114   0.538123   0.259117   0.267593
RAPE       0.431759  -.169435  -.244198   0.062216   0.188471  -.773271  -.296485
ROBBERY    0.396875   0.042247   0.495861  -.557989  -.519977  -.114385  -.003903
ASSAULT    0.396652  -.343528  -.069510   0.629804  -.506651   0.172363   0.191745
BURGLARY   0.440157   0.203341  -.209895  -.057555   0.101033   0.535987  -.648117
LARCENY    0.357360   0.402319  -.539231  -.234890   0.030099   0.039406   0.601690
AUTO       0.295177   0.502421   0.568384   0.419238   0.369753  -.057298   0.147046

Prin1 has all positive values; this variable can be used as a proxy for the overall crime rate.
Prin2 has positive and negative values. Murder, Rape, and Assault are all negative (violent crimes); Robbery, Burglary, Larceny, and Auto are all positive (property crimes). This variable can be used for an understanding of property vs. violent crime.
CRIME RATES PER 100,000 POPULATION BY STATE: states listed in order of overall crime rate as determined by the first principal component (the lowest 10 states and then the top 10 states).
CRIME RATES PER 100,000 POPULATION BY STATE: states listed in order of property vs. violent crime as determined by the second principal component (the lowest 10 states and then the top 10 states).
Correlation From SAS: First the Descriptive Statistics (a part of the output from the correlation procedure)
Correlation Matrix: Just the Variables
Note that there is correlation among the crime rates.
Correlation Matrix: Just the Principal Components
Note that there is no correlation among the principal components.
Correlation Matrix: The Variables with the Principal Components
Note the higher/very high correlations with the 1st few principal components; the correlations decrease toward the last principal component.
What If We Told SAS to Produce Only 2 Principal Components?

Eigenvalues of the Correlation Matrix
     Eigenvalue   Difference   Proportion   Cumulative
1    4.11495951   2.87623768       0.5879       0.5879
2    1.23872183                    0.1770       0.7648

Eigenvectors
              Prin1      Prin2
MURDER     0.300279  -.629174
RAPE       0.431759  -.169435
ROBBERY    0.396875   0.042247
ASSAULT    0.396652  -.343528
BURGLARY   0.440157   0.203341
LARCENY    0.357360   0.402319
AUTO       0.295177   0.502421

The 2 principal components produced when SAS is asked for only 2 are exactly the same as when it produced all 7.