Factor Analysis
• Purpose of Factor Analysis
• Maximum likelihood Factor Analysis
• Least-squares
• Factor rotation techniques
• R commands for factor analysis
• References
Purpose of Factor Analysis
Factor analysis is one of the techniques used to reduce the dimension of the observed
variables. Suppose that we have a p-dimensional continuous random vector x =
(x1, x2, ..., xp). We can observe these variables, but they may not be the real underlying
independent variables. Factor analysis seeks to find real underlying variables that
are not observable. That is, we want to find an m<p dimensional vector
y = (y1, y2, ..., ym) of independent variables satisfying:

x = μ + Γy + e

where e is a normal random vector with 0 mean and constant dispersion. It is assumed
that the elements of e are independent of each other and of y. Moreover, it is assumed
that the elements of y are independent of each other and are standard normal
variables. We can write:

e ~ N(0, Ψ), y ~ N(0, I)

where Ψ is a diagonal p×p matrix. The elements of this matrix are called specific or
unique variances. The weights Γ are the factor loadings. The elements of y are called
common variables and the elements of e are called unique or specific
variables. Without loss of generality we will assume that the mean of x is 0, i.e. μ=0.
Note that in the case of PCA we wanted to find linear combinations of the observable
variables. In the case of factor analysis we want to find independent variables whose
linear combinations are the observable variables.
As in many situations, the assumption of a normal distribution makes the treatment
easy, although the results are applicable to a wider range of problems.
Factor analysis model
The model defined by the linear equation given above cannot be solved directly. We can
use the relation between the covariance matrix, the factor loadings and the specific
variances. It has the form:

Σ = Γ Γ^T + Ψ

The objective of factor analysis is to determine m (the length of the vector y), Γ and Ψ
using the observed sample estimate S of the covariance matrix.
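To make the model concrete, here is a small simulation sketch (not part of the original notes; the loadings and specific variances are chosen arbitrarily) that generates data from the model and checks that the sample covariance matrix is close to ΓΓ^T + Ψ:

# simulate x = Gamma y + e with p = 4 observed variables and m = 2 factors
set.seed(1)
n <- 10000
Gamma <- matrix(c(0.9, 0.8, 0.1, 0.2,
                  0.1, 0.3, 0.8, 0.7), ncol = 2)   # 4 x 2 loading matrix (arbitrary)
Psi <- diag(c(0.2, 0.3, 0.25, 0.4))                # diagonal matrix of specific variances
y <- matrix(rnorm(n * 2), n, 2)                    # common variables, y ~ N(0, I)
e <- matrix(rnorm(n * 4), n, 4) %*% sqrt(Psi)      # specific variables, e ~ N(0, Psi)
x <- y %*% t(Gamma) + e                            # observed variables (mean 0)
round(cov(x) - (Gamma %*% t(Gamma) + Psi), 2)      # should be close to the zero matrix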
It should be noted that if we have an m×m orthogonal matrix M (M^T M = I) then for z = My
we can write:

x = Γ M^T z + e   and   Σ = Γ M^T M Γ^T + Ψ = Γ Γ^T + Ψ

i.e. the solution to the problem is not unique. Solutions are indeterminate up to an
orthogonal transformation. The only thing we can do is to estimate the factor space.
To be able to find a unique solution we need to add a new condition. This
condition is:

Γ^T Γ = Λ   or   Γ^T Ψ^{-1} Γ = Λ   or   Γ^T D^{-1} Γ = Λ

where Λ and D are diagonal matrices. If we can identify the factor space using these
constraints then we can use any rotation matrix and define other factors.
Moreover, we can even use any non-singular matrix to redefine new
factors. When we use an orthogonal transformation, independent variables go to
independent variables. When we use a non-orthogonal transformation, independent
variables may go to dependent variables.
Note that if an element of Ψ is 0 then Ψ^{-1} does not exist and the second condition
cannot be used. This is called a Heywood case.
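A quick numerical illustration of this indeterminacy (a sketch with an arbitrary loading matrix and a random orthogonal M; none of these values come from the notes):

# rotating the loadings by an orthogonal M leaves Gamma Gamma^T, and hence Sigma, unchanged
Gamma  <- matrix(c(0.9, 0.8, 0.1, 0.2,
                   0.1, 0.3, 0.8, 0.7), ncol = 2)      # arbitrary 4 x 2 loadings
M      <- qr.Q(qr(matrix(rnorm(4), 2, 2)))             # random 2 x 2 orthogonal matrix, M^T M = I
Gamma2 <- Gamma %*% t(M)                               # rotated loadings
max(abs(Gamma %*% t(Gamma) - Gamma2 %*% t(Gamma2)))    # essentially zero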
Variance of variables and communalities
We can write the relation between the covariances of the original variables and the
loadings and unique variances:

σ_ij = sum_{k=1}^{m} γ_ik γ_jk              for i ≠ j
σ_ii = sum_{k=1}^{m} γ_ik^2 + ψ_ii          for i = j
The term

h_i^2 = sum_{k=1}^{m} γ_ik^2 = σ_ii − ψ_ii

is called the communality. It is the part of the variance of the original variable that is
shared with the others via the common variables, while ψ_ii is the unique variance,
which is a property of the variable x_i only.
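In R the communalities and uniquenesses can be read directly from a fitted model (a small sketch using factanal and the built-in swiss data, which also appear later in these notes):

data(swiss)
fan <- factanal(swiss, factors = 2)
h2  <- rowSums(fan$loadings^2)                          # communalities h_i^2
cbind(communality = h2, uniqueness = fan$uniquenesses)  # the two columns sum to 1 (correlation scale)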
Maximum number of factors
The number of distinct elements in the covariance matrix of p variables is ½p(p+1) (the
elements of S). The number of elements of the loadings is pm and the number of specific
variances is p. Thus we want to identify p(m+1) elements. The number of constraints is
½m(m-1). Taking the constraints into account, we want to identify p(m+1) − ½m(m-1)
elements using ½p(p+1) elements. Then we can write the relation for the maximum
number of identifiable elements:
½p(p+1) − (p(m+1) − ½m(m-1)) = ½((p−m)^2 − (p+m)) ≥ 0

(p−m)^2 ≥ p + m   or   m ≤ (2p + 1 − sqrt(8p+1)) / 2
For example, if we have 6 original variables we cannot define more than 3 factor
variables. If we have 15 original variables we cannot define more than 10 new
variables. In practice it is hoped that one can find a much smaller number of factors
describing the whole system.
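The bound is easy to tabulate with a small helper function (written for these notes, not a standard R command):

# maximum number of identifiable factors for p observed variables
max.factors <- function(p) floor((2 * p + 1 - sqrt(8 * p + 1)) / 2)
max.factors(6)    # 3
max.factors(15)   # 10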
Factor Analysis using Maximum likelihood
If we use the assumption that the n observations x_i = (x_i1, ..., x_ip) are distributed
normally, then we can write for the likelihood function (assuming that the mean of x is 0):

L(x | Γ, Ψ) = (2π)^{-np/2} |Σ|^{-n/2} exp(−½ sum_{i=1}^{n} x_i^T Σ^{-1} x_i)
            = (2π)^{-np/2} |Σ|^{-n/2} exp(−(n/2) tr(Σ^{-1} S)),

where Σ = Γ Γ^T + Ψ.
We can write for the log-likelihood function:
l(x | Γ, Ψ) = −(np/2) log(2π) − (n/2) log|Σ| − (n/2) tr(Σ^{-1} S)
The derivatives with respect to the factor loadings and the specific variances become:

∂l/∂Γ = −(n/2) (2 Σ^{-1}(Σ−S)Σ^{-1} − diag(Σ^{-1}(Σ−S)Σ^{-1})) Γ = 0

diag(∂l/∂Ψ) = −(n/2) diag(Σ^{-1}(Σ−S)Σ^{-1}) = 0
Here we used matrix notation for the derivatives, some facts from matrix algebra and
the fact that the covariance matrix is symmetric:

∂ log|Σ| / ∂Σ = 2Σ^{-1} − diag(Σ^{-1}),   ∂ tr(Σ^{-1}S) / ∂Σ = −2Σ^{-1}SΣ^{-1} + diag(Σ^{-1}SΣ^{-1})
Factor analysis using ML
The maximum likelihood equations are usually solved iteratively. Care should be taken
in the implementation of these equations, as convergence can be slow and some
elements of the specific variances can become negative. These equations are
usually solved using the Newton-Raphson (NR) second-order method or the scoring
method (the scoring method uses the Fisher information matrix instead of the second
derivative matrix; it can be slower than NR but has the attractive property that the
initial values of the parameters can be far from the optimum). Numerical optimisation
should also ensure that ψ_i > 0, so optimisation is usually done under these
constraints.
Maximum likelihood estimation can be performed in the following way: find initial values
for the ψ_i, then estimate values for Γ, and then find new values for the ψ_i.
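As an illustration only, the log-likelihood can also be maximised directly with a general-purpose optimiser (a minimal sketch, assuming a log parameterisation of the specific variances to keep them positive; it ignores the uniqueness constraint, so the returned loadings are determined only up to rotation, and the specialised algorithm in factanal should be preferred in practice):

# fit the factor model by maximising the log-likelihood, i.e. minimising log|Sigma| + tr(Sigma^-1 S)
fa.ml <- function(S, m) {
  p <- nrow(S)
  nll <- function(par) {
    Gamma <- matrix(par[1:(p * m)], p, m)
    Psi   <- diag(exp(par[(p * m + 1):(p * m + p)]))  # exp() keeps psi_i > 0
    Sigma <- Gamma %*% t(Gamma) + Psi
    log(det(Sigma)) + sum(diag(solve(Sigma, S)))      # log|Sigma| + tr(Sigma^-1 S)
  }
  start <- c(rep(0.1, p * m), log(diag(S) / 2))       # crude initial values
  fit <- optim(start, nll, method = "BFGS", control = list(maxit = 1000))
  list(Gamma = matrix(fit$par[1:(p * m)], p, m),
       Psi   = exp(fit$par[(p * m + 1):(p * m + p)]))
}
fit <- fa.ml(cov(swiss), m = 2)   # compare with factanal(swiss, 2) after rescaling to correlations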
One of the problems in factor analysis is a common problem in multivariate analysis:
it is not guaranteed that all measurements are on the same scale. For that reason it is
common to use correlation matrices instead of covariance matrices. If factor
analysis is done using maximum likelihood, then loadings based on the correlation matrix
can easily be derived from those based on the covariance matrix. In general, maximum
likelihood estimation is invariant under transformations with non-zero Jacobians. Since
the transformation from the covariance matrix to the correlation matrix (and the
corresponding transformation of the loadings and unique variances) has a non-zero
Jacobian, having found the parameters using one of them we can derive the other: if D
is the diagonal matrix of sample variances, the loadings and unique variances for the
correlation matrix are D^{-1/2}Γ and D^{-1/2}ΨD^{-1/2}.
Least-squares for Factor analysis
Another widely used technique for factor analysis is the least-squares technique. Its
simplicity makes it attractive. It is done by minimisation of:

tr[(S − Σ)^2] = min

The covariance matrix has the same form as before. If we take the derivatives and equate
them to 0 we can derive the following equations:

(S − Ψ)Γ = Γ(Γ^T Γ)
Ψ = diag(S − Γ Γ^T)
First an initial value for Ψ is taken and, using the first equation, Γ is found; for this,
eigenvalue analysis is used. Then, using the second equation, Ψ is updated. This
technique is called principal factor analysis. It should not be confused with
principal component analysis. If the values of Ψ are 0 then the first equation is very
similar to principal component analysis. That is the reason why some statistical
packages contain PCA as a special case of factor analysis.
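A sketch of this alternating scheme (illustrative only; it uses the correlation matrix, a crude starting value for Ψ and a fixed number of sweeps instead of a proper convergence test):

# principal factor iteration: eigen-decompose S - Psi, rebuild Gamma, then update Psi
principal.factor <- function(S, m, sweeps = 50) {
  Psi <- diag(diag(S)) / 2                               # crude initial specific variances
  for (k in 1:sweeps) {
    ed    <- eigen(S - Psi, symmetric = TRUE)
    Gamma <- ed$vectors[, 1:m, drop = FALSE] %*%
             diag(sqrt(pmax(ed$values[1:m], 0)), m)      # loadings from the m leading eigenpairs
    Psi   <- diag(diag(S - Gamma %*% t(Gamma)))          # Psi = diag(S - Gamma Gamma^T)
  }
  list(Gamma = Gamma, Psi = diag(Psi))
}
pf <- principal.factor(cor(swiss), m = 2)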
Two points should be noted. Least-squares estimates are usually used to find initial
estimates for ML. If the correlation matrix is used, then the results derived using least
squares will be different: results obtained using covariance and correlation matrices
cannot be converted into each other by simple scaling, as was the case for
maximum likelihood estimation.
Significance test and model selection
If the normality assumption holds then we can use a likelihood ratio test for the factor
model with dimension m. If the null hypothesis is:

H_0: Σ = Γ Γ^T + Ψ

and the alternative is that the covariance matrix is unconstrained (i.e. the null hypothesis
is not true), then the likelihood ratio test statistic reduces to:

λ = n (tr(Σ^{-1}S) − log|Σ^{-1}S| − p)

Its distribution is approximated by a chi-squared distribution with ½((p−m)^2 − (p+m))
degrees of freedom. This enables us to carry out the significance test for the null
hypothesis. If the maximum number of identifiable parameters is reached, we can
conclude that it is not straightforward to extract some structure from the given data.
Usually n is replaced by n' = n − 1 − (2p+5)/6 − 2m/3, in which case the chi-squared
approximation is more accurate. This test is called a goodness-of-fit test.
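A sketch of this test written out explicitly (the inputs are the estimated loadings, the estimated specific variances as a vector, the sample covariance or correlation matrix S and the sample size n; factanal reports essentially the same statistic and p-value itself):

# likelihood ratio goodness-of-fit test for an m-factor model, with the correction factor n'
fa.test <- function(Gamma, Psi, S, n) {
  p     <- nrow(S)
  m     <- ncol(Gamma)
  Sigma <- Gamma %*% t(Gamma) + diag(Psi)             # Psi is the vector of specific variances
  nprim <- n - 1 - (2 * p + 5) / 6 - 2 * m / 3
  stat  <- nprim * (sum(diag(solve(Sigma, S))) - log(det(solve(Sigma, S))) - p)
  dof   <- ((p - m)^2 - (p + m)) / 2
  c(statistic = stat, df = dof, p.value = pchisq(stat, dof, lower.tail = FALSE))
}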
For model selection the usual approach is: first carry out principal component
analysis and, using one of the recommended techniques (scree plot, proportion of
variance, etc.), select the number of factors. Then do factor analysis starting from this
value. The likelihood ratio test can be carried out to test the significance of the number of
factors, but it should be applied with care: the likelihood ratio test does not make any
adjustment for sequential application of the test.
Determining the number of factors is a trade-off between the number of parameters
(we want to have as few as possible) and the goodness of fit.
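In R the sequential check can be done directly, since factanal reports the goodness-of-fit test for each trial number of factors (a small sketch with the swiss data; with p = 6 the test has positive degrees of freedom only for m = 1 and 2):

for (m in 1:2) {
  fan <- factanal(swiss, factors = m)
  cat("m =", m, " p-value =", fan$PVAL, "\n")   # a small p-value suggests m factors are not enough
}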
Factor rotations
Factor analysis does not give a unique solution. As we noted above, using an
orthonormal rotation we can derive factors that fit the model with exactly the
same accuracy. It is usual to rotate factors after the analysis. There are several
techniques for doing that. They all attempt to minimise some loadings and
maximise others so that interpretation of the results becomes easier. Two widely used
techniques to derive rotations are varimax and quartimax. Varimax maximises:

R_1 = sum_{j=1}^{m} sum_{i=1}^{p} (d_ij^2 − d̄_j)^2 = max,

where d_ij = γ'_ij / (sum_{k=1}^{m} γ'_ik^2)^{1/2},   d̄_j = (1/p) sum_{i=1}^{p} d_ij^2,

and the γ' are the loadings after the rotation.
Quartimax maximises:

R_2 = sum_{j=1}^{m} sum_{i=1}^{p} γ'_ij^4 − (1/(pm)) (sum_{j=1}^{m} sum_{i=1}^{p} γ'_ij^2)^2 = max
Many statistical packages can find rotation matrices using these techniques; of these, R
provides varimax only. Sometimes it is useful to find non-orthogonal rotation matrices.
One such technique, promax, is available in R.
One technique for factor rotation maximises the non-normality of the unobserved
(common) variables. This is a technique in its own right and is called
Independent Component Analysis (ICA).
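For completeness, a sketch of ICA in R using the contributed fastICA package (it is not part of the notes above and must be installed separately):

library(fastICA)                   # contributed package; install.packages("fastICA") if needed
ica <- fastICA(scale(swiss), n.comp = 2)
head(ica$S)                        # estimated independent components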
Factor scorings
There are also techniques to find factor scores. One technique, due to Bartlett, uses
least squares:

(x_i − Γ̂ y_i)^T Ψ̂^{-1} (x_i − Γ̂ y_i) = min
If we take the derivative of this with respect to y_i and equate it to zero we get:

ŷ_i = (Γ̂^T Ψ̂^{-1} Γ̂)^{-1} Γ̂^T Ψ̂^{-1} x_i
Another technique (due to Thomson) uses the normality assumption and finds the
conditional expected value of y given x. It turns out to be:

ŷ_i = E(y | x_i) = (I + Γ̂^T Ψ̂^{-1} Γ̂)^{-1} Γ̂^T Ψ̂^{-1} x_i

Here we have assumed that the mean values of the x-s are 0.
Both techniques give the scores as a linear combination of the initial variables:

y_i = A x_i

A is sometimes called the factor score estimation matrix in computer package output.
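Both scoring formulas can be written out by hand from a factanal fit (a sketch; factanal works with standardised variables, so the data are scaled first, and its "regression" scores correspond to the Thomson method):

fan <- factanal(swiss, factors = 2, scores = "Bartlett")
G   <- fan$loadings                                       # estimated loadings (p x m)
Psi <- diag(fan$uniquenesses)                             # estimated specific variances
X   <- scale(swiss)                                       # centred, standardised data (mean 0)
A.bartlett <- solve(t(G) %*% solve(Psi) %*% G) %*% t(G) %*% solve(Psi)            # Bartlett weights
A.thomson  <- solve(diag(2) + t(G) %*% solve(Psi) %*% G) %*% t(G) %*% solve(Psi)  # Thomson weights
scores.bartlett <- X %*% t(A.bartlett)                    # y_i = A x_i for each observation
max(abs(scores.bartlett - fan$scores))                    # should agree up to small numerical differences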
R commands for factor analysis
First decide what data we have and prepare the data matrix. The necessary commands for factor
analysis are in the package called mva (in recent versions of R these functions are part of the
standard stats package, which is loaded automatically). This package contains many functions for
multivariate analysis. First load this package using
library(mva) – loads the library mva
Now we can analyse the data using factor analysis
data(swiss) – loads the data
fan <- factanal(swiss,2) – does the actual calculations. The second argument is the number of factors
desired. Have a look at the help for this command; there are options for rotation and other
things
fan <- factanal(swiss,2,scores="Bartlett") – will do factor analysis and calculate scores
varimax(fan$loadings) – performs varimax rotation
promax(fan$loadings) – performs promax rotation
fan – prints out the result of factor analysis
If a covariance matrix has been calculated by some means then it can be used for factor analysis:
data(Harman23.cor)
fan <- factanal(covmat=Harman23.cor,factors=3)
This performs factor analysis using the correlation matrix. Obviously scores cannot be calculated.
References
1) Krzanowski WJ and Marriott FHC (1994) Multivariate Analysis, Vol 2. Kendall's Library of Statistics
2) Morrison DF (1990) Multivariate Statistical Methods
3) Mardia KV, Kent JT and Bibby JM (2003) Multivariate Analysis