Transcript Document

Ninness, C., Lauter, J., Coffee, M., Clary, L., Kelly, E., Rumph, M., Rumph, R., Kyle, R., & Ninness, S. (2012). Behavioral and Biological N
EPS 651 Multivariate Analysis
Factor Analysis,
Principal Components Analysis,
and
Neural Network Analysis
(Self-Organizing Maps)
For next week:
Continue with T&F Chapter 13
and please read the study below posted on our webpage:
T&F Chapter 13 --> 13.5.3 page 642
Several slides are based on material from the
UCLA SPSS Academic Technology Services
http://www.ats.ucla.edu/stat/spss/output/factor1.htm
Principal components analysis (PCA) and Factor Analysis are methods
of data reduction:
Suppose that you have a dozen variables that are correlated. You might use principal
components analysis to reduce your 12 measures to a few principal components. For
example, you may be most interested in obtaining the component scores (which are
variables that are added to your data set) and/or to look at the dimensionality of the
data. For example, if two components are extracted and those two components
accounted for 68% of the total variance, then we would say that two dimensions in the
component space account for 68% of the variance. Unlike factor analysis, principal
components analysis is not usually used to identify underlying latent variables.
[direct quote from below].
http://www.ats.ucla.edu/stat/spss/output/factor1.htm
FA and PCA: Data reduction methods
If raw data are used, the procedure will create the original correlation matrix or
covariance matrix, as specified by the user. If the correlation matrix is used, the
variables are standardized and the total variance will equal the number of variables
used in the analysis (because each standardized variable has a variance equal to
1). If the “covariance matrix” is used, the variables will remain in their original
metric. However, one must take care to use variables whose variances and scales
are similar. Unlike factor analysis, which analyzes the common variance, the
original matrix in a principal components analysis analyzes the total
variance. Also, principal components analysis assumes that each original measure
is collected without measurement error [direct quote].
http://www.ats.ucla.edu/stat/spss/output/factor1.htm
Spin Control
Factor analysis is also a method of data reduction – more forgiving relative to PCA
Factor Analysis seeks to find underlying unobservable (latent) variables that are reflected in
the observed variables (manifest variables). There are many different methods that can be
used to conduct a factor analysis (such as principal axis factor, maximum likelihood,
generalized least squares, unweighted least squares). There are also many different types of
rotations that can be done after the initial extraction of factors, including orthogonal
rotations, such as varimax and equimax, which impose the restriction that the factors cannot
be correlated, and oblique rotations, such as promax, which allow the factors to be correlated
with one another. You also need to determine the number of factors that you want to
extract. Given the number of factor analytic techniques and options, it is not surprising that
different analysts could reach very different results analyzing the same data set. However, all
analysts are looking for a simple structure. A simple structure is a pattern of results such that
each variable loads highly onto one and only one factor.
[direct quote]
http://www.ats.ucla.edu/stat/spss/output/factor1.htm
FA vs. PCA conceptually
FA produces factors
PCA produces components
[Diagram: two path models with indicators I1, I2, I3 – one for FA (latent factor reflected in the indicators), one for PCA (component formed from the indicators)]
Kinds of Research Questions re PCA and FA
What does each factor mean? Interpretation? Your call
What is the percentage of variance in the data accounted for by the
factors? SPSS & psyNet will show you
Which factors account for the most variance? SPSS & psyNet
How well does the factor structure fit a given theory? Your call
What would each subject’s score be if they could be measured directly on the
factors? Excellent question!
Before you can even start to answer these questions using FA:
The KMO measure should be > .6, and Bartlett's test significance should be < .05.
Kaiser-Meyer-Olkin Measure of Sampling Adequacy - This measure varies between 0
and 1, and values closer to 1 are better. A value of .6 is a suggested minimum.
It answers the question: Is there enough data relative to the number of variables?
Bartlett's Test of Sphericity - This tests the null hypothesis that the correlation matrix is an identity
matrix. An identity matrix is a matrix in which all of the diagonal elements
are 1 and all off diagonal elements are 0.
Ostensibly, you want to reject this null hypothesis.
This, of course, is psychobabble.
Taken together, these two tests provide a minimum standard which should be passed before a factor
analysis (or a principal components analysis) should be conducted.
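A minimal sketch in Python of these two pre-checks, assuming the third-party factor_analyzer package and a pandas DataFrame df holding your variables (both df and the file name are placeholders):

```python
# Bartlett's test of sphericity and the KMO measure as pre-checks for FA/PCA.
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo
import pandas as pd

df = pd.read_csv("your_data.csv")          # placeholder file name

chi_square, p_value = calculate_bartlett_sphericity(df)   # want p < .05
kmo_per_variable, kmo_overall = calculate_kmo(df)          # want overall KMO > .6
print(p_value, kmo_overall)
```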
What is a Common Factor?
It is an abstraction, a “hypothetical construct,” that ties at least two of our
measurement variables together into a factor
In FA, psychometricians / statisticians try to estimate the common factors
that contribute to the variance in a set of variables.
Is this an act of logical conclusion, a creation, or a figment of a
psychometrician’s imagination? Depends on who you ask
What is a Unique Factor?
It is a factor that contributes to the variance in only one variable.
There is one unique factor for each variable.
The unique factors are unrelated to one another and unrelated to the
common factors.
We want to exclude these unique factors from our solution.
Seems reasonable … right?
Assumptions
Factor analysis needs large samples, and this is one of its few drawbacks
• The more reliable the correlations are, the smaller the number of subjects needed
• Need enough subjects for stable estimates. How many is enough?
Assumptions
Take-home hint:
• 50 very poor, 100 poor, 200 fair, 300 good, 500 very good, and 1000+ excellent
• Shoot for a minimum of 300, usually
• More highly correlated markers mean fewer subjects are needed
Assumptions
No outliers – their obvious influence on correlations would bias results
Multicollinearity:
In PCA it is not a problem; no matrix inversion is required
In FA, if det(R) or any eigenvalue approaches 0, multicollinearity is likely
The above Assumptions at Work:
Note that the metric for all these variables is the same (since they employed a
rating scale). So do we run the FA on the correlation matrix or the covariance
matrix, and does it matter?
Sample Data Set From Chapter 13 (p. 617)
Tabachnick and Fidell
Principal Components and Factor Analysis
Skiers   Cost   Lift   Depth   Powder
S1        32     64     65      67
S2        61     37     62      65
S3        59     40     45      43
S4        36     62     34      35
S5        62     46     43      40
Keep in mind, multivariate normality is assumed when
statistical inference is used to determine the number of factors.
The above dataset is far too small to satisfy the normality assumption. However,
even large datasets frequently violate this assumption, which compromises the analysis.
Multivariate normality also implies that relationships among pairs of variables are linear. The analysis is
degraded when linearity fails, because correlation measures linear relationships and does not reflect
nonlinear ones.
Linearity among variables is assessed through visual inspection of scatterplots.
Equations – Extractions - Components
Correlation matrix with 1s on the diagonal:

          Cost        Lift        Depth       Powder
Cost       1         -0.952990   -0.055276   -0.129999
Lift      -0.952990   1          -0.091107   -0.036248
Depth     -0.055276  -0.091107    1           0.990174
Powder    -0.129999  -0.036248    0.990174    1
There is a large correlation between Cost and Lift, and another between Depth and Powder
Looks like two possible factors – why?
Are you sure about this?
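A minimal sketch in Python (pandas assumed) that reproduces this correlation matrix from the ski data in the table above:

```python
# Build the Chapter 13 ski data and compute its Pearson correlation matrix.
import pandas as pd

ski = pd.DataFrame(
    {"Cost":   [32, 61, 59, 36, 62],
     "Lift":   [64, 37, 40, 62, 46],
     "Depth":  [65, 62, 45, 34, 43],
     "Powder": [67, 65, 43, 35, 40]},
    index=["S1", "S2", "S3", "S4", "S5"])

R = ski.corr()          # correlation matrix, 1s on the diagonal
print(R.round(6))
```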
L = V′RV
(EigenvalueMatrix = transposed EigenvectorMatrix × CorrelationMatrix × EigenvectorMatrix)
We are reducing to a few factors which duplicate the matrix?
Does this seem reasonable?
Equations – Extraction - Obtaining components
In a two-by-two matrix we derive eigenvalues
with two eigenvectors each containing two elements
In a four-by-four matrix we derive eigenvalues
with eigenvectors each containing four elements
L = V′RV. It is important to know how L is constructed,
where L is the eigenvalue matrix and V is the eigenvector matrix.
This diagonalizes the R matrix and reorganizes the variance into eigenvalues.
A 4 x 4 matrix can be summarized by 4 numbers instead of 16.
Remember this? For the 2 × 2 matrix

    | 5  4 |
    | 1  2 |

set the determinant of the matrix minus λ on the diagonal to zero:

    | 5-λ    4  |
    | 1    2-λ  |  =>  (5-λ)(2-λ) - (4)(1) = λ² - 5λ - 2λ + (5·2) - (4·1) = λ² - 7λ + 6 = 0
With a two-by-two matrix we derive eigenvalues
with two eigenvectors each containing two elements
With a four-by-four matrix we derive eigenvalues
with eigenvectors each containing four elements
it simply becomes a longer polynomial
Setting the determinant to zero again for the same 2 × 2 matrix:

    | 5-λ    4  |
    | 1    2-λ  |  =>  (5-λ)(2-λ) - (1)(4) = λ² - 7λ + 6 = 0
Where a = 1, b = -7, and c = 6, the quadratic formula gives the two roots:

    λ = [ -b ± √(b² - 4ac) ] / (2a)

    λ₁ = [ -(-7) + √((-7)² - 4(1)(6)) ] / (2(1)) = (7 + 5) / 2 = 6
    λ₂ = [ -(-7) - √((-7)² - 4(1)(6)) ] / (2(1)) = (7 - 5) / 2 = 1

An equation of the second degree with two roots [the eigenvalues].
From Eigenvalues to Eigenvectors
R=VLV’
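A minimal sketch in Python (numpy assumed) that checks the hand calculation above and the R = VLV′ identity on the ski correlation matrix:

```python
# Verify the 2x2 eigenvalue example and the eigendecomposition of R.
import numpy as np

A = np.array([[5.0, 4.0],
              [1.0, 2.0]])
print(np.linalg.eigvals(A))        # the two roots 6 and 1 (order may vary)

R = np.array([[ 1.0,      -0.952990, -0.055276, -0.129999],
              [-0.952990,  1.0,      -0.091107, -0.036248],
              [-0.055276, -0.091107,  1.0,       0.990174],
              [-0.129999, -0.036248,  0.990174,  1.0]])

eigvals, V = np.linalg.eigh(R)     # eigh: R is symmetric
L = np.diag(eigvals)
print(np.allclose(V @ L @ V.T, R)) # True: R = V L V'
```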
Equations – Extractions – Obtaining components
• SPSS matrix output
Skiers   Cost   Lift   Depth   Powder
S1        32     64     65      67
S2        61     37     62      65
S3        59     40     45      43
S4        36     62     34      35
S5        62     46     43      40
Careful here. 1.91 is correct,
but it appears as a “2” in the text
Obtaining L, the eigenvalue matrix: L = V′RV, where R is our original correlation matrix and V is the eigenvector matrix.
Equations – Extraction – Obtaining Components
Other than the magic “2” below – this is a decent example
1.91 is the correct value (the “magic 2” in the text).
We have “extracted” two factors from four variables using a small data set.
Following SPSS Extraction and Rotation
and all that jazz… in this case, not much difference [other data sets show big changes]
          Factor 1   Factor 2
Cost       -0.401      0.907
Lift        0.251     -0.954
Depth       0.933      0.351
Powder      0.957      0.288
Here we see that Factor 1 is mostly Depth and Powder (Snow Condition Factor)
Factor 2 is mostly Cost and Lift, which is a Resort Factor
Both factors have complex loadings
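A minimal sketch in Python, assuming the third-party factor_analyzer package (the course uses SPSS, so this is only an alternative route to similar loadings; values may differ slightly from the table above):

```python
# Two-factor principal-axis extraction with varimax rotation on the ski data.
import pandas as pd
from factor_analyzer import FactorAnalyzer

ski = pd.DataFrame(
    {"Cost":   [32, 61, 59, 36, 62],
     "Lift":   [64, 37, 40, 62, 46],
     "Depth":  [65, 62, 45, 34, 43],
     "Powder": [67, 65, 43, 35, 40]})

fa = FactorAnalyzer(n_factors=2, rotation="varimax", method="principal")
fa.fit(ski)
print(pd.DataFrame(fa.loadings_, index=ski.columns,
                   columns=["Factor 1", "Factor 2"]))
```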
Using SPSS 12, SPSS 20 and psyNet.SOM
This is a variation on your homework.
Just use your own numbers and replicate the process.
(we may use this hypothetical data as part of a study)
Skiers   Cost   Lift   Depth   Powder
S1        32     64     65      67
S2        61     37     62      65
S3        59     40     45      43
S4        36     62     34      35
S5        62     46     43      40
Here is an easier way than doing it by hand:
Arrange data in Excel Format as below: SPSS 20
Select Data Reduction: SPSS 12
Select Data Reduction: SPSS 20
Select Variables Descriptives: SPSS 12
Select Variables and Descriptives: SPSS 20
Start with a basic run using Principal Components: SPSS 12
Eigenvalues over 1
Start with a basic run using Principal Components: SPSS 12
Fixed number of factors
Select Varimax: SPSS 12
Select Varimax: SPSS 20
Under Options, select “exclude cases listwise” and “sorted by size”: SPSS 12
Under Options, select “exclude cases listwise” and “sorted by size”: SPSS 20
Under Scores, select “save variables” and “display matrix”: SPSS 20
Watch what pops out of your oven
A real time saver
Matching psyNet PCA correlation matrix with SPSS FA
This part is the same but the rest of PCA
goes in an entirely different direction
Kaiser's measure of sampling adequacy: Values of .6 and above are required for a good FA.
Remember
these guys?
An MSA of .9 is marvelous, .4 is not too impressive – Hey it was a small sample
Normally, variables with small MSAs should be deleted
Looks like two factors can be isolated/extracted –
which ones? And what shall we call them?
Here they are again // they have eigenvalues > 1
We are reducing to a few factors which duplicate the matrix?
Fairly close
Rotations – Nice hints here
SPSS will provide an Orthogonal Rotation
without your help – look at the iterations
Extraction, Rotation, and Meaning of Factors
Orthogonal Rotation [assumes no correlation among the factors]
Loading Matrix – correlation between each variable and the factor
Oblique Rotation [assumes possible correlations among the factors]
Factor Correlation Matrix – correlation between the factors
Structure Matrix – correlation between factors and variables
Oblique Rotations – Fun but not today
Factor extraction is usually followed by rotation in order to maximize
large correlations and minimize small correlations.
Rotation usually increases simple structure and interpretability.
The most commonly used is the Varimax variance-maximizing procedure,
which maximizes the variance of the factor loadings.
Rotating your axis “orthogonally” ~ sounds painfully chiropractic
Where are your components located on these graphs? What are the upper and lower limits on each of these axes?
Cost and Lift may be a factor, but they are polar opposites.
Abbreviated Equations
The factor weight matrix [B] is found by multiplying the inverse of the correlation
matrix [R⁻¹] by the loading matrix [A].
See matrix output

    B = R⁻¹A

Factor scores [F] are found by multiplying the standardized scores [Z] for each
individual by the factor weight matrix [B] and adding them up.

    F = ZB
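A minimal sketch in Python (numpy assumed) of these abbreviated equations, using the rotated loadings shown earlier; the numbers are illustrative rather than an exact reproduction of the SPSS output:

```python
# B = R^-1 A gives the factor score coefficients; F = Z B gives factor scores.
import numpy as np

X = np.array([[32, 64, 65, 67],      # rows: skiers S1-S5
              [61, 37, 62, 65],      # cols: Cost, Lift, Depth, Powder
              [59, 40, 45, 43],
              [36, 62, 34, 35],
              [62, 46, 43, 40]], dtype=float)

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardized scores
R = np.corrcoef(X, rowvar=False)                   # correlation matrix

A = np.array([[-0.401,  0.907],                    # loading matrix (Factor 1, Factor 2)
              [ 0.251, -0.954],
              [ 0.933,  0.351],
              [ 0.957,  0.288]])

B = np.linalg.solve(R, A)   # B = R^-1 A, the factor weight matrix
F = Z @ B                   # F = Z B, factor scores for each skier
print(F.round(3))
```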
Abbreviated Equations
The specific goals of PCA or FA are to summarize patterns of correlations among
observed variables, to reduce a large number of observed variables to a smaller
number of factors, to provide an operational definition (a regression equation)
for an underlying process by using observed variables, and/or to test a theory
about the nature of underlying processes.
    Z = FA′
You can also estimate what each subject
would score on the “standardized variables.”
This is a revealing procedure—often overlooked.
Standardized variables as factors
Predictions based on Factor analysis: Standard-Scores

Skier   Cost   Lift   Depth   Powder   Standard score
1        32     64     65      67        1.1447
2        61     37     62      65        0.96637
3        59     40     45      43       -0.41852
4        36     62     34      35       -1.11855
5        62     46     43      40       -0.574
Predictions based on Factor analysis: Standard-Scores

Skier   Cost   Lift   Depth   Powder   Standard score
1        32     64     65      67        1.18534
2        61     37     62      65       -0.90355
3        59     40     45      43       -0.70694
4        36     62     34      35        0.98342
5        62     46     43      40       -0.55827

Interesting stuff… what about cost?
Predictions based on Factor analysis: Standard-Scores
Scores for the five skiers: 0.39393, -0.59481, -0.73794, -0.64991, 1.58873
And this is supposed to represent?
SOM Classification of Ski Data
The ski data and its transpose (variables as rows, skiers as columns) used for the SOM:

Skiers   Cost   Lift   Depth   Powder
S1        32     64     65      67
S2        61     37     62      65
S3        59     40     45      43
S4        36     62     34      35
S5        62     46     43      40

          S1    S2    S3    S4    S5
Cost      32    61    59    36    62
Lift      64    37    40    62    46
Depth     65    62    45    34    43
Powder    67    65    43    35    40
Transpose the data before saving as a CSV file, so it can be analyzed by class/factors:
4 rows (variables) by 5 columns (skiers) in CSV format.
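A minimal sketch in Python (pandas assumed; the file name is a placeholder) of this transpose-and-save step:

```python
# Transpose the ski data so each variable becomes a row, then write the CSV
# used for the SOM run.
import pandas as pd

ski = pd.DataFrame(
    {"Cost":   [32, 61, 59, 36, 62],
     "Lift":   [64, 37, 40, 62, 46],
     "Depth":  [65, 62, 45, 34, 43],
     "Powder": [67, 65, 43, 35, 40]},
    index=["S1", "S2", "S3", "S4", "S5"])

ski.T.to_csv("ski_transposed.csv")   # 4 rows (variables) x 5 columns (skiers)
```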
SOM Classification of Ski Data
SOM classification 1: Depth and Powder across 5 SS – a nice match with FA 1
Standardized SOM input values (variables as rows, skiers 1–5 as columns):

          1          2          3          4          5
Cost     -1.36772    0.835832   0.683862  -1.06379    0.911816
Lift      1.27029   -1.14505   -0.87668    1.091376  -0.33994
Depth     1.285737   1.031973  -0.40602   -1.33649   -0.5752
Powder    1.275638   1.125563  -0.52526   -1.12556   -0.75038

[Raw ski data table shown alongside the SOM output]

Cost – Class/Factor ??
SOM classification 2: Cost across 5 SS
SOM classification 3: Lift across 5 SS
Lift – Class/Factor: a near match with FA 2
Raw and standardized values for each skier (columns: Cost, Lift, Depth, Powder):

Skier   Cost  Lift  Depth  Powder     zCost      zLift      zDepth     zPowder
1        32    64    65     67       -1.36772    1.27029    1.285737   1.275638
2        61    37    62     65        0.835832  -1.14505    1.031973   1.125563
3        59    40    45     43        0.683862  -0.87668   -0.40602   -0.52526
4        36    62    34     35       -1.06379    1.091376  -1.33649   -1.12556
5        62    46    43     40        0.911816  -0.33994   -0.5752    -0.75038
Factor 1: appears to address Depth and Powder.
SOM classification 1: Depth and Powder across 5 SS – a nice match with FA 1.
This could be placed into a logistic regression and predict with reasonable accuracy.
Factor 2: appears to address Lift.
SOM classification 3: Lift across 5 SS.
Predictions based on Factor analysis: Standard-Scores
Factor Analysis Factor 3: ?? SOM classification 2: Cost across 5 SS.
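A minimal sketch in Python of this kind of SOM classification, assuming the third-party minisom package (psyNet.SOM is the tool used in the course, so this is only an approximation of the same idea; grid size, learning rate, and iteration count are illustrative choices):

```python
# Standardize the ski data, transpose it so each variable is one input vector
# of length 5 (one value per skier), and let a small SOM group the variables.
import numpy as np
from minisom import MiniSom

X = np.array([[32, 64, 65, 67],
              [61, 37, 62, 65],
              [59, 40, 45, 43],
              [36, 62, 34, 35],
              [62, 46, 43, 40]], dtype=float)
Z = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize (population SD)
data = Z.T                                    # 4 variables x 5 skiers
names = ["Cost", "Lift", "Depth", "Powder"]

som = MiniSom(2, 2, input_len=5, sigma=0.5, learning_rate=0.5, random_seed=1)
som.train_random(data, 500)

for name, row in zip(names, data):
    print(name, "-> node", som.winner(row))   # Depth and Powder tend to share a node
```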
Center for Machine Learning and Intelligent Systems
Iris Setosa
Iris Versicolour
Iris Virginica
This dataset has served as a foundational benchmark in
multivariate statistics and machine learning
Transpose the data before saving as a CSV file, so it can be analyzed by class/factors:
4 rows (variables) by 150 columns (observations) in CSV format.
[Charts comparing Factor Analysis factors and SOM neural network classes for the iris variables: sepal length, sepal width, petal length, and petal width (in cm). Factor Analysis Factors 1 and 2 are shown alongside SOM Classes 1 and 2.]
This could be placed into a logistic regression and predict with near perfect accuracy
Really?? Look at the original.
Everybody but psychologists seems to understand this.
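A minimal sketch in Python (scikit-learn assumed) of the logistic-regression claim above; scikit-learn's FactorAnalysis stands in for the SPSS factor scores, so the results are only indicative:

```python
# Reduce the four iris measurements to two factor scores, then classify
# species with logistic regression and report cross-validated accuracy.
from sklearn.datasets import load_iris
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = load_iris()
model = make_pipeline(StandardScaler(),
                      FactorAnalysis(n_components=2, random_state=0),
                      LogisticRegression(max_iter=1000))
scores = cross_val_score(model, iris.data, iris.target, cv=5)
print(scores.mean())   # typically high, consistent with "near perfect"
```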