Factor Analysis

• Purpose of Factor Analysis
• Maximum likelihood Factor Analysis
• Least-squares
• Factor rotation techniques
• R commands for factor analysis
• References
Purpose of Factor Analysis
Factor analysis is one of the techniques used to reduce the dimension of observed variables. Suppose that we have a $p$-dimensional continuous random vector $x = (x_1, x_2, \dots, x_p)$. We want to describe the correlations between its elements with an $m$-dimensional vector $y = (y_1, y_2, \dots, y_m)$, where $m < p$.
Suppose that the joint probability distribution of these two variables is normal and equal to $f(x, y)$. The conditional probability distribution of $x$ given $y$, $g(x|y)$, is then also normal; the conditional mean of $x$ is linear in $y$ and the conditional covariance matrix does not depend on $y$. Then we can write:

$$x = \Lambda y + e,$$
where $e$ is a normal random vector with zero mean and constant dispersion. It is additionally assumed that the elements of $e$ are independent of each other and of $y$. Moreover, it is assumed that the elements of $y$ are independent of each other and are standard normal variables.
We can write:

$$e \sim N(0, \Psi), \qquad y \sim N(0, I),$$

where $\Psi$ is a diagonal $p \times p$ matrix and $I$ is the $m \times m$ identity matrix. In this model the elements of $e$ are the specific variables and the weights $\Lambda$ are the factor loadings. Without loss of generality we will assume that the mean of $x$ is 0, i.e. $\mu = 0$.

The vector $x$ is what we can observe and the vector $y$ is what we think is the vector of independent variables. We want to deduce the independent variables from the observations.
Factor analysis model
The model defined by the linear equation given above cannot be solved directly. Instead we can use the relation between the covariance matrix, the factor loadings and the specific variables. It has the form:

$$\Sigma = \Lambda \Lambda^T + \Psi$$
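This relation is easy to check numerically. Below is a minimal R sketch (all numbers are illustrative, not taken from any real data set) that simulates observations from the model $x = \Lambda y + e$ and compares the sample covariance matrix with $\Lambda\Lambda^T + \Psi$:

set.seed(1)
p <- 5; m <- 2; n <- 10000
Lambda <- matrix(c(0.9, 0.8, 0.7, 0.0, 0.0,
                   0.0, 0.0, 0.3, 0.8, 0.9), nrow = p, ncol = m)
psi <- c(0.2, 0.3, 0.4, 0.3, 0.2)       # diagonal of Psi (specific variances)
y <- matrix(rnorm(n * m), n, m)         # y ~ N(0, I)
e <- matrix(rnorm(n * p), n, p) %*% diag(sqrt(psi))   # e ~ N(0, Psi)
x <- y %*% t(Lambda) + e                # rows are observations of x
max(abs(cov(x) - (Lambda %*% t(Lambda) + diag(psi))))  # small sampling error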
The objective of factor analysis is to determine $m$ (the length of the vector $y$), $\Lambda$ and $\Psi$ using the observed sample estimate of the covariance matrix, $S$. It should be noted that if we have an $m \times m$ orthogonal matrix $M$ ($M^T M = I$) then we can write $z = M y$ and

$$x = \Lambda M^T z + e, \qquad (\Lambda M^T)(\Lambda M^T)^T = \Lambda M^T M \Lambda^T = \Lambda \Lambda^T,$$

i.e. the solution to the problem is not unique: solutions are indeterminate up to an orthogonal transformation. The only thing we can do is estimate the factor space. To be able to find a unique solution we need to add a new condition. This condition is:
$$\Lambda^T \Psi^{-1} \Lambda = \Delta,$$

where $\Delta$ is a diagonal matrix with diagonal elements $\delta_i$, $i = 1, \dots, m$.
If we can identify the factor space using these constraints then we can use any rotation matrix to define other factors. Moreover, we can even use any non-singular matrix to redefine new factors. When we use an orthogonal transformation, independent variables go to independent variables; when we use a non-orthogonal transformation, independent variables may go to dependent variables.
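This rotational indeterminacy is easy to demonstrate numerically. The sketch below reuses the illustrative Lambda from the earlier simulation:

theta <- 0.7                            # arbitrary rotation angle
M <- matrix(c(cos(theta), -sin(theta),
              sin(theta),  cos(theta)), 2, 2)   # 2 x 2 orthogonal matrix
Lambda2 <- Lambda %*% t(M)              # rotated loadings
max(abs(Lambda2 %*% t(Lambda2) - Lambda %*% t(Lambda)))  # effectively 0

Since $\Lambda\Lambda^T$ is unchanged, the rotated loadings reproduce the covariance matrix, and hence fit the data, exactly as well.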
Maximum number of factors
The number of distinct elements in the covariance matrix of $p$ variables is $\tfrac{1}{2}p(p+1)$ (the elements of $S$). The number of elements of the loadings $\Lambda$ is $pm$, and the number of specific variances is $p$. Thus we want to identify $p(m+1)$ parameters. The number of constraints is $\tfrac{1}{2}m(m-1)$. Taking the constraints into account, we want to identify $p(m+1) - \tfrac{1}{2}m(m-1)$ free parameters using $\tfrac{1}{2}p(p+1)$ elements. Then we can write the relation for the maximum number of identifiable parameters:

$$\tfrac{1}{2}p(p+1) \ge p(m+1) - \tfrac{1}{2}m(m-1) \quad\Longleftrightarrow\quad (p-m)^2 \ge p+m,$$

or

$$m \le \frac{2p + 1 - \sqrt{8p+1}}{2}.$$

For example, if we have 6 original variables we cannot define more than 3 factor variables; if we have 15 original variables we cannot define more than 10 new variables. In practice it is hoped that one can find a much smaller number of factors describing the whole system.
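An illustrative helper (not from any package) that evaluates this bound:

max_factors <- function(p) floor((2 * p + 1 - sqrt(8 * p + 1)) / 2)
max_factors(6)    # 3
max_factors(15)   # 10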
Factor Analysis using Maximum likelihood
If we use the assumption that the variables $x = (x_1, \dots, x_p)$ are distributed normally, then we can write for the likelihood function (we assume that we have $n$ observation vectors):

$$L(x \mid \mu, \Sigma) = (2\pi)^{-np/2}\,|\Sigma|^{-n/2} \exp\Big(-\tfrac{1}{2}\sum_{i=1}^{n}(x_i-\mu)^T \Sigma^{-1} (x_i-\mu)\Big), \qquad \text{where } \Sigma = \Lambda\Lambda^T + \Psi.$$

If we maximise the likelihood with respect to the mean values, we can see that their estimates do not depend on the covariance matrix, so we can deal with them separately. Using the form of the sample covariance matrix we can write for the log-likelihood function:

$$l(x \mid \Lambda, \Psi) = -\tfrac{np}{2}\log(2\pi) - \tfrac{n}{2}\log|\Sigma| - \tfrac{n}{2}\,\mathrm{tr}(\Sigma^{-1} S).$$

The derivatives with respect to the factor loadings and the specific variables become:
$$\frac{\partial l}{\partial \Lambda} = n\big(\Sigma^{-1} S\, \Sigma^{-1} - \Sigma^{-1}\big)\Lambda, \qquad \frac{\partial l}{\partial \Psi} = \frac{n}{2}\,\mathrm{diag}\big(\Sigma^{-1} S\, \Sigma^{-1} - \Sigma^{-1}\big).$$

Here we use matrix notation for the derivatives, and we have used several facts about matrix derivatives (since the matrices we are dealing with are symmetric):

$$\frac{\partial \log|\Sigma|}{\partial \Sigma} = 2\Sigma^{-1} - \mathrm{diag}(\Sigma^{-1}), \qquad \frac{\partial\, \mathrm{tr}(\Sigma^{-1} S)}{\partial \Sigma} = -2\,\Sigma^{-1} S\, \Sigma^{-1} + \mathrm{diag}(\Sigma^{-1} S\, \Sigma^{-1}).$$
Factor analysis using ML
After some manipulations, using the dependence of the covariance matrix on the loadings and the specific variables, the maximum likelihood equations can be written in the form:

$$\Psi^{-1/2} S\, \Psi^{-1/2} (\Psi^{-1/2}\Lambda) = (\Psi^{-1/2}\Lambda)(I + \Delta), \qquad \Psi = \mathrm{diag}(S - \Lambda\Lambda^T),$$

where $\Delta = \Lambda^T \Psi^{-1} \Lambda$. These are the maximum likelihood equations for factor analysis, and they are usually solved iteratively. Care should be taken in implementing them, as convergence can be slow and some elements of the specific variances can become negative. These equations are usually solved using Newton-Raphson second-order methods or the scoring method (the scoring method uses the Fisher information matrix instead of the second-derivative matrix; it can be slow but has the attractive property that the initial values of the parameters can be far from optimal). Numerical optimisation should also ensure that $\psi_i > 0$; optimisations are usually done using these constraints.
Maximum likelihood estimation can be performed in the following way: find initial values for the $\psi_i$, then estimate values for $\Lambda$, and then find new values for the $\psi_i$; this is repeated until convergence.
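A minimal sketch of this alternating scheme is given below, assuming S is a sample covariance (or correlation) matrix. Given $\Psi$, the first ML equation is solved by an eigen-decomposition of $\Psi^{-1/2}S\Psi^{-1/2}$, and the second equation then updates $\Psi$. This is illustrative only; real implementations (such as R's factanal) use more careful numerical optimisation.

ml_factor <- function(S, m, n_iter = 200) {
  p <- nrow(S)
  psi <- diag(S) / 2                    # crude starting values for psi_i
  for (it in 1:n_iter) {
    Sstar <- diag(1 / sqrt(psi)) %*% S %*% diag(1 / sqrt(psi))
    ev <- eigen(Sstar, symmetric = TRUE)
    # Columns of Psi^{-1/2} Lambda are the leading eigenvectors of S*,
    # scaled by sqrt(theta_j - 1) so that S* L = L (I + Delta) holds.
    Lstar <- ev$vectors[, 1:m, drop = FALSE] %*%
      diag(sqrt(pmax(ev$values[1:m] - 1, 0)), m)
    Lambda <- diag(sqrt(psi)) %*% Lstar # back to the original scale
    psi <- pmax(diag(S - Lambda %*% t(Lambda)), 1e-6)  # enforce psi_i > 0
  }
  list(loadings = Lambda, psi = psi)
}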
One of the problems in factor analysis is a problem common to all of multivariate analysis: it is not guaranteed that all measurements are on the same scale. For that reason it is common to use correlation matrices instead of covariance matrices. If factor analysis is done using maximum likelihood, then the loadings based on the correlation matrix $R$ can easily be derived from those based on the covariance matrix $S$ using the following relations:

$$\lambda_{ir}(R) = \lambda_{ir}(S)/\sqrt{s_{ii}}, \qquad \psi_i(R) = \psi_i(S)/s_{ii},$$

where the $s_{ii}$ are the diagonal elements of $S$.
Least-squares for Factor analysis
Another widely used technique for factor analysis is the least-squares technique. It is done by minimisation of:

$$\mathrm{tr}\big[(S - \Sigma)^2\big] \to \min,$$

where the covariance matrix $\Sigma = \Lambda\Lambda^T + \Psi$ satisfies the same conditions as before. If we take derivatives and equate them to 0 we can derive the following equations:

$$(S - \Psi)\Lambda = \Lambda(\Lambda^T\Lambda), \qquad \Psi = \mathrm{diag}(S - \Lambda\Lambda^T).$$

First an initial value for $\Psi$ is taken and, using the first equation, eigenvalue analysis is used to find $\Lambda$. Then, using the second equation, $\Psi$ is updated. This technique is called principal factor analysis. It should not be confused with principal component analysis; however, if the values of $\Psi$ are 0 then the first equation is very similar to principal component analysis, which is the reason why some statistical packages contain PCA as a special case of factor analysis.
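The iteration just described can be sketched in a few lines of R (illustrative only; S is a covariance or correlation matrix):

principal_factor <- function(S, m, n_iter = 100) {
  psi <- diag(S) / 2                    # starting values for psi_i
  for (it in 1:n_iter) {
    ev <- eigen(S - diag(psi), symmetric = TRUE)
    # Columns of Lambda are leading eigenvectors of S - Psi, scaled by
    # sqrt(eigenvalue), so that (S - Psi) Lambda = Lambda (Lambda^T Lambda).
    Lambda <- ev$vectors[, 1:m, drop = FALSE] %*%
      diag(sqrt(pmax(ev$values[1:m], 0)), m)
    psi <- pmax(diag(S - Lambda %*% t(Lambda)), 0)
  }
  list(loadings = Lambda, psi = psi)
}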
Two points should be noted. First, least-squares estimates are usually used to find initial estimates for ML. Second, if the correlation matrix is used then the results will be different: results obtained using covariance and correlation matrices cannot be converted into each other by simple scaling, as was the case for maximum likelihood estimation. In general it should be remembered that maximum likelihood estimation is invariant under transformations with non-zero Jacobians.
Significance test and model selection
If the normality assumption holds then we can use the likelihood ratio test for a factor model of dimension $m$. If the null hypothesis is

$$H_0: \Sigma = \Lambda\Lambda^T + \Psi$$

and the alternative is that the covariance matrix is unconstrained (i.e. the null hypothesis is not true), then the likelihood ratio test statistic reduces to:

$$n\big(\mathrm{tr}(\hat\Sigma^{-1} S) - \log|\hat\Sigma^{-1} S| - p\big).$$

The distribution of this statistic is approximated by a chi-squared distribution with $\tfrac{1}{2}\big((p-m)^2 - (p+m)\big)$ degrees of freedom. This enables us to carry out a significance test for the null hypothesis. If the maximum number of identifiable parameters is reached, we can conclude that it is not straightforward to extract structure from the given data.

For model selection the usual techniques are: first carry out principal component analysis and, using one of the recommended techniques (scree plot, proportion of variance, etc.), select the number of factors; then do factor analysis starting from this value. The likelihood ratio test can be carried out to test the significance of the number of factors, but it should be applied with care, since it does not make any adjustment for sequential application of the test. The number of factors $m$ can also be considered as a random variable.
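As a sketch, the statistic and its degrees of freedom can be computed as follows, assuming Lambda and psi come from a fitted model such as the sketches above (note that R's factanal reports a version of this test automatically):

lrt_factor <- function(S, Lambda, psi, n) {
  p <- nrow(S); m <- ncol(Lambda)
  Sigma <- Lambda %*% t(Lambda) + diag(psi)     # fitted covariance matrix
  A <- solve(Sigma, S)                          # Sigma^{-1} S
  stat <- n * (sum(diag(A)) - log(det(A)) - p)  # likelihood ratio statistic
  df <- ((p - m)^2 - (p + m)) / 2
  c(statistic = stat, df = df,
    p.value = pchisq(stat, df, lower.tail = FALSE))
}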
Determining the number of factors is a trade-off between the number of parameters (we want to have as few as possible) and goodness-of-fit.
Factor rotations
Factor analysis does not give a unique solution. As we noted above, using an orthogonal rotation we can derive factors that fit the model with exactly the same accuracy. It is usual to rotate factors after the analysis. There are several techniques for doing this; they all attempt to minimise some loadings and maximise others, so that each factor loads strongly on only a few variables. Two widely used techniques for deriving rotations are varimax and quartimax. Varimax maximises:
$$R_1 = \frac{1}{p^2} \sum_{j=1}^{m} \left[\, p \sum_{i=1}^{p} \lambda_{ij}^{*4} - \Big( \sum_{i=1}^{p} \lambda_{ij}^{*2} \Big)^{2} \right], \qquad \lambda_{ij}^{*} = \lambda_{ij} \Big/ \Big( \sum_{j=1}^{m} \lambda_{ij}^{2} \Big)^{1/2}.$$

Quartimax maximises:
$$R_2 = \sum_{j=1}^{m} \sum_{i=1}^{p} \lambda_{ij}^{4} - \frac{1}{pm} \Big( \sum_{j=1}^{m} \sum_{i=1}^{p} \lambda_{ij}^{2} \Big)^{2}.$$

Many statistical packages can find rotation matrices using these techniques; R's factanal uses varimax by default. Sometimes it is useful to find non-orthogonal (oblique) rotation matrices. Available techniques include promax and oblimin; R can use promax to find the rotation matrix.
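As a small illustration, the quartimax criterion above is straightforward to evaluate for a given loading matrix (this only measures the criterion; it is not a rotation algorithm):

quartimax_crit <- function(L) {
  # R2 = sum of lambda^4 minus (1/pm) * (sum of lambda^2)^2
  sum(L^4) - sum(L^2)^2 / (nrow(L) * ncol(L))
}
# e.g. compare unrotated and varimax-rotated loadings of a fitted model:
# quartimax_crit(fan$loadings); quartimax_crit(varimax(fan$loadings)$loadings)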
Factor scores
There are also techniques to find factor scores. One technique, due to Bartlett, uses weighted least squares:

$$(x_i - \Lambda y_i)^T \Psi^{-1} (x_i - \Lambda y_i) \to \min.$$

If we take the derivative of this with respect to $y_i$ and equate it to zero we get:

$$\hat y_i = (\hat\Lambda^T \hat\Psi^{-1} \hat\Lambda)^{-1} \hat\Lambda^T \hat\Psi^{-1} x_i.$$
Another technique, due to Thomson, uses the normality assumption and finds the conditional expected value of $y$ given $x$. It turns out to be:

$$\hat y_i = E(y \mid x_i) = \hat\Lambda^T (\hat\Lambda \hat\Lambda^T + \hat\Psi)^{-1} x_i.$$

Here we have assumed that the mean values of the $x$-s are 0. Scores are usually standardised before use.
Both techniques give the scores as a linear combination of the initial variables:

$$\hat y_i = A x_i,$$

where $A$ is sometimes called the factor score estimation matrix in computer package output.
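A sketch of the two score estimation matrices, assuming hypothetical objects Lambda (estimated loadings), psi (specific variances) and a centred n x p data matrix X:

Pinv <- diag(1 / psi)                                  # Psi^{-1}
A_bartlett <- solve(t(Lambda) %*% Pinv %*% Lambda) %*% t(Lambda) %*% Pinv
A_thomson  <- t(Lambda) %*% solve(Lambda %*% t(Lambda) + diag(psi))
scores_bartlett <- X %*% t(A_bartlett)   # row i is the Bartlett score y_i
scores_thomson  <- X %*% t(A_thomson)    # row i is the Thomson score y_i

In R, factanal() can return either type of score directly through its scores argument ("Bartlett" or "regression", the latter being the Thomson estimator).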
R commands for factor analysis
First decide what data we have and prepare the data matrix. The necessary commands for factor analysis are in the package called mva, which contains many functions for multivariate analysis. First load this package:

library(mva) – loads the library mva

Now we can analyse the data:

data(swiss) – loads the data
fan <- factanal(swiss,2) – does the actual calculations; the second argument is the number of factors desired. Have a look at the help for this command: there are options for rotation and other things
varimax(fan$loadings) – performs varimax rotation
promax(fan$loadings) – performs promax rotation
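A complete session along these lines might look as follows (in recent versions of R the mva functions have been absorbed into the standard stats package, so the library call may be unnecessary):

data(swiss)                          # 47 observations, 6 variables
fan <- factanal(swiss, factors = 2,
                rotation = "varimax", scores = "regression")
print(fan)             # loadings, uniquenesses and the chi-squared fit test
promax(loadings(fan))  # oblique rotation of the fitted loadings
head(fan$scores)       # regression (Thomson) factor scores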
References
1) Krzanowski WJ and Marriott FHC (1994). Multivariate Analysis, Vol. 2. Kendall's Library of Statistics.
2) Rencher AC (1995). Methods of Multivariate Analysis.
3) Morrison DF (1990). Multivariate Statistical Methods.
Exercises: Factor analysis

Data will be available from http://www.ysbl.york.ac.uk/~garib/mres_course/exercise_factor. It is required to use principal component analysis and then factor analysis in an attempt to find the number of factors and the factors themselves. Use varimax rotation.