Factor Analysis with SPSS Karl L. Wuensch Dept. of Psychology East Carolina University.

Factor Analysis with SPSS
Karl L. Wuensch
Dept. of Psychology
East Carolina University
What is a Common Factor?
• It is an abstraction, a hypothetical
construct that relates to at least two of our
measurement variables.
• We want to estimate the common factors
that contribute to the variance in our
variables.
• Is this an act of discovery or an act of
invention?
What is a Unique Factor?
• It is a factor that contributes to the
variance in only one variable.
• There is one unique factor for each
variable.
• The unique factors are unrelated to one
another and unrelated to the common
factors.
• We want to exclude these unique factors
from our solution.
Iterated Principal Factors Analysis
• The most common type of FA.
• Also known as principal axis FA.
• We eliminate the unique variance by
replacing, on the main diagonal of the
correlation matrix, 1’s with estimates of
communalities.
• Initial estimate of communality = R²
between one variable and all others.
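The initial communality estimates can be computed directly from the correlation matrix. This is a minimal sketch, assuming NumPy is available; the function name smc and the small example matrix are hypothetical (they are not the beer data). It relies on the standard identity that the R² of variable j with all the others is 1 minus the reciprocal of the j-th diagonal element of the inverse correlation matrix.

```python
import numpy as np

def smc(R):
    """Squared multiple correlation of each variable with all the others."""
    R = np.asarray(R, dtype=float)
    inv_diag = np.diag(np.linalg.inv(R))  # diagonal of the inverse of R
    return 1.0 - 1.0 / inv_diag

# Hypothetical 3-variable correlation matrix (not the beer data).
R_demo = np.array([[1.0, 0.6, 0.5],
                   [0.6, 1.0, 0.4],
                   [0.5, 0.4, 1.0]])
initial_communalities = smc(R_demo)
print(np.round(initial_communalities, 3))
```

With only two variables the result reduces to the squared correlation between them, which is a quick sanity check.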
Let’s Do It
• Using the beer data, change the extraction
method to principal axis.
Look at the Initial Communalities
• They were all 1’s for our PCA.
• They sum to 5.675.
• We have eliminated 7 – 5.675 = 1.325
units of unique variance.
Communalities
            Initial    Extraction
COST         .738        .745
SIZE         .912        .914
ALCOHOL      .866        .866
REPUTAT      .499        .385
COLOR        .922        .892
AROMA        .857        .896
TASTE        .881        .902
Extraction Method: Principal Axis Factoring.
Iterate!
• Using the estimated communalities, obtain
a solution.
• Take the communalities from the first
solution and insert them into the main
diagonal of the correlation matrix.
• Solve again.
• Take communalities from this second
solution and insert into correlation matrix.
• Solve again.
• Repeat this, over and over, until the
changes in communalities from one
iteration to the next are trivial.
• Our final communalities sum to 5.6.
• After excluding 1.4 units of unique
variance, we have extracted 5.6 units of
common variance.
• That is 5.6 / 7 = 80% of the total variance
in our seven variables.
• We have packaged those 5.6 units of
common variance into two factors:
Total Variance Explained
          Extraction Sums of Squared Loadings      Rotation Sums of Squared Loadings
Factor    Total    % of Variance   Cumulative %    Total    % of Variance   Cumulative %
1         3.123       44.620          44.620       2.879       41.131          41.131
2         2.478       35.396          80.016       2.722       38.885          80.016
Extraction Method: Principal Axis Factoring.
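The iteration described above (insert communalities on the diagonal, solve, repeat until the communalities stop changing) can be sketched as follows. This is an illustrative sketch, not SPSS's own routine: the function name principal_axis, the tolerance, and the iteration cap are assumptions, and the one-factor example matrix is hypothetical rather than the beer data. It assumes NumPy.

```python
import numpy as np

def principal_axis(R, n_factors, tol=1e-6, max_iter=500):
    """Iterated principal axis factoring on a correlation matrix R.

    Returns (unrotated loadings, final communalities)."""
    R = np.asarray(R, dtype=float)
    # Initial communalities: R-squared of each variable with all the others.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)            # replace the 1's on the diagonal
        vals, vecs = np.linalg.eigh(reduced)     # eigenvalues in ascending order
        top = np.argsort(vals)[::-1][:n_factors]
        lam = np.clip(vals[top], 0.0, None)      # guard against negative eigenvalues
        loadings = vecs[:, top] * np.sqrt(lam)
        new_h2 = (loadings ** 2).sum(axis=1)     # communalities from this solution
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:                            # change from last iteration is trivial
            break
    return loadings, h2

# Hypothetical one-factor example: build R from known loadings.
true_loadings = np.array([0.8, 0.7, 0.6, 0.5])
R_demo = np.outer(true_loadings, true_loadings)
np.fill_diagonal(R_demo, 1.0)
loadings, h2 = principal_axis(R_demo, n_factors=1)
print(np.round(h2, 3), round(float(h2.sum()), 3))
```

The sum of the final communalities is the amount of common variance extracted, the quantity reported as 5.6 for the beer data. The sign of each column of loadings is arbitrary.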
Our Rotated Factor Loadings
• Not much different from those for the PCA.
Rotated Factor Matrix (a)
             Factor 1      Factor 2
TASTE         .950         -2.17E-02
AROMA         .946         2.106E-02
COLOR         .942         6.771E-02
SIZE          7.337E-02    .953
ALCOHOL       2.974E-02    .930
COST          -4.64E-02    .862
REPUTAT       -.431        -.447
Extraction Method: Principal Axis Factoring.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.
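The rotated loadings tie back to the earlier tables: squaring and summing across a row gives that variable's communality, and squaring and summing down a column gives that factor's rotation sum of squared loadings. A small sketch in plain Python, with the loadings typed in from the Rotated Factor Matrix (the variable names in the code are illustrative):

```python
# Rotated loadings (Factor 1, Factor 2) copied from the Rotated Factor Matrix.
rotated = {
    "TASTE":   (0.950, -0.0217),
    "AROMA":   (0.946, 0.02106),
    "COLOR":   (0.942, 0.06771),
    "SIZE":    (0.07337, 0.953),
    "ALCOHOL": (0.02974, 0.930),
    "COST":    (-0.0464, 0.862),
    "REPUTAT": (-0.431, -0.447),
}

# Communality of each variable: row sum of squared loadings.
communality = {name: a * a + b * b for name, (a, b) in rotated.items()}

# Sum of squared loadings for each factor: column sum of squared loadings.
ssl = [sum(pair[k] ** 2 for pair in rotated.values()) for k in range(2)]

# Percent of total variance: each standardized variable contributes 1 unit.
pct = [100.0 * s / len(rotated) for s in ssl]

print({name: round(h, 3) for name, h in communality.items()})
print([round(s, 3) for s in ssl], [round(p, 2) for p in pct])
```

Because the loadings are rounded, the results should agree with the SPSS tables only to about two or three decimal places.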
Reproduced and Residual
Correlation Matrices
• Correlations between variables result from
their sharing common underlying factors.
• Try to reproduce the original correlation
matrix from the correlations between
factors and variables (the loadings).
• The difference between the reproduced
correlation matrix and the original
correlation matrix is the residual matrix.
• We want these residuals to be small.
• Check “Reproduced” under “Descriptive”
in the Factor Analysis dialogue box, to get
both of these matrices:
Reproduced Correlations
Reproduced Correlation
           COST        SIZE        ALCOHOL     REPUTAT     COLOR       AROMA       TASTE
COST       .745b       .818        .800        -.365       1.467E-02   -2.57E-02   -6.28E-02
SIZE       .818        .914b       .889        -.458       .134        8.950E-02   4.899E-02
ALCOHOL    .800        .889        .866b       -.428       9.100E-02   4.773E-02   8.064E-03
REPUTAT    -.365       -.458       -.428       .385b       -.436       -.417       -.399
COLOR      1.467E-02   .134        9.100E-02   -.436       .892b       .893        .893
AROMA      -2.57E-02   8.950E-02   4.773E-02   -.417       .893        .896b       .898
TASTE      -6.28E-02   4.899E-02   8.064E-03   -.399       .893        .898        .902b
Residual (a)
           COST        SIZE        ALCOHOL     REPUTAT     COLOR       AROMA       TASTE
COST                   1.350E-02   -3.295E-02  -4.02E-02   3.328E-03   -2.05E-02   -1.16E-03
SIZE       1.350E-02               1.495E-02   6.527E-02   4.528E-02   8.097E-03   -2.32E-02
ALCOHOL    -3.29E-02   1.495E-02               -3.47E-02   -1.88E-02   -3.54E-03   3.726E-03
REPUTAT    -4.02E-02   6.527E-02   -3.471E-02              6.415E-02   -2.59E-02   -4.38E-02
COLOR      3.328E-03   4.528E-02   -1.884E-02  6.415E-02               1.557E-02   1.003E-02
AROMA      -2.05E-02   8.097E-03   -3.545E-03  -2.59E-02   1.557E-02               -2.81E-02
TASTE      -1.16E-03   -2.32E-02   3.726E-03   -4.38E-02   1.003E-02   -2.81E-02
Extraction Method: Principal Axis Factoring.
a. Residuals are computed between observed and reproduced correlations. There are 2 (9.0%) nonredundant residuals with
absolute values greater than 0.05.
b. Reproduced communalities
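For an orthogonal solution, the reproduced correlation matrix is the loading matrix multiplied by its own transpose, and the residual matrix is the observed matrix minus the reproduced one. A minimal sketch, assuming NumPy; the loadings are typed in from the Rotated Factor Matrix, and the helper name residuals is hypothetical. The observed correlation matrix is not listed on these slides, so the helper is shown but not applied to it.

```python
import numpy as np

names = ["COST", "SIZE", "ALCOHOL", "REPUTAT", "COLOR", "AROMA", "TASTE"]

# Varimax-rotated loadings (Factor 1, Factor 2), in the order of `names`.
L = np.array([
    [-0.0464, 0.862],
    [0.07337, 0.953],
    [0.02974, 0.930],
    [-0.431, -0.447],
    [0.942, 0.06771],
    [0.946, 0.02106],
    [0.950, -0.0217],
])

# Reproduced correlations; the main diagonal holds the reproduced communalities.
reproduced = L @ L.T

def residuals(observed, reproduced):
    """Observed minus reproduced correlations, with the diagonal set to zero."""
    res = np.asarray(observed, dtype=float) - reproduced
    np.fill_diagonal(res, 0.0)
    return res

print(np.round(reproduced, 3))
```

Because the loadings are rounded, the product should match the SPSS table only to about three decimal places.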
Nonorthogonal (Oblique) Rotation
• The axes will not be perpendicular; the
factors will be correlated with one another.
• The factor loadings (in the pattern matrix)
will no longer be equal to the correlation
between each factor and each variable.
• They will still equal the beta weights, the
A’s in
$X_j = A_{1j}F_1 + A_{2j}F_2 + \dots + A_{mj}F_m + U_j$
• Promax rotation is available in SAS.
• First a Varimax rotation is performed.
• Then the axes are rotated obliquely.
• Here are the beta weights, in the “Pattern
Matrix,” the correlations in the “Structure
Matrix,” and the correlations between
factors:
Pattern Matrix (a) (Beta Weights)
             Factor 1      Factor 2
TASTE         .955         -7.14E-02
AROMA         .949         -2.83E-02
COLOR         .943         1.877E-02
SIZE          2.200E-02    .953
ALCOHOL       -2.05E-02    .932
COST          -9.33E-02    .868
REPUTAT       -.408        -.426
Extraction Method: Principal Axis Factoring.
Rotation Method: Promax with Kaiser Normalization.
a. Rotation converged in 3 iterations.
Structure Matrix (Correlations)
             Factor 1      Factor 2
TASTE         .947         .030
AROMA         .946         .072
COLOR         .945         .118
SIZE          .123         .956
ALCOHOL       .078         .930
COST          -.002        .858
REPUTAT       -.453        -.469
Extraction Method: Principal Axis Factoring.
Rotation Method: Promax with Kaiser Normalization.
Factor Correlation Matrix
Factor        1            2
1             1.000        .106
2             .106         1.000
Extraction Method: Principal Axis Factoring.
Rotation Method: Promax with Kaiser Normalization.
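The three oblique-rotation matrices are linked: the structure matrix equals the pattern matrix post-multiplied by the factor correlation matrix. A short check, assuming NumPy, with the values typed in from the SPSS output above (the array names are illustrative):

```python
import numpy as np

# Pattern matrix (beta weights). Row order: TASTE, AROMA, COLOR, SIZE,
# ALCOHOL, COST, REPUTAT. Columns: Factor 1, Factor 2.
pattern = np.array([
    [0.955, -0.0714],
    [0.949, -0.0283],
    [0.943, 0.01877],
    [0.0220, 0.953],
    [-0.0205, 0.932],
    [-0.0933, 0.868],
    [-0.408, -0.426],
])

# Factor correlation matrix from the Promax solution.
phi = np.array([[1.0, 0.106],
                [0.106, 1.0]])

# Structure matrix: correlations between each variable and each factor.
structure = pattern @ phi
print(np.round(structure, 3))
```

When the factors are uncorrelated, phi is the identity matrix and the pattern and structure matrices coincide, which is why the distinction did not arise with Varimax.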
Exact Factor Scores
• You can compute, for each subject,
estimated factor scores.
• Multiply each standardized variable score
by the corresponding standardized scoring
coefficient.
• For our first subject,
Factor 1 = (-.294)(.41) + (.955)(.40) + (-.036)(.22)
+ (1.057)(-.07) + (.712)(.04) + (1.219)(.03)
+ (-1.14)(.01) = 0.23.
• SPSS will not only give you the scoring
coefficients, but also compute the
estimated factor scores for you.
• In the Factor Analysis window, click
Scores and select Save As Variables,
Regression, Display Factor Score
Coefficient Matrix.
• Here are the scoring coefficients:
Factor Score Coefficient Matrix
             Factor 1    Factor 2
COST          .026        .157
SIZE          -.066       .610
ALCOHOL       .036        .251
REPUTAT       .011        -.042
COLOR         .225        -.201
AROMA         .398        .026
TASTE         .409        .110
Extraction Method: Principal Axis Factoring.
Rotation Method: Varimax with Kaiser Normalization.
Factor Scores Method: Regression.
• Look back at the data sheet and you will
see the estimated factor scores.
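The hand calculation for the first subject is a weighted sum, which can be sketched in plain Python. The pairing of each z score with a variable is an assumption: it is inferred by matching the rounded coefficients in the hand calculation to the Factor Score Coefficient Matrix. The dictionary names are illustrative.

```python
# Standardized scores for the first subject, as used in the hand calculation.
# The variable labels are inferred from the matching scoring coefficients.
z_scores = {
    "TASTE": -0.294, "AROMA": 0.955, "COLOR": -0.036, "SIZE": 1.057,
    "ALCOHOL": 0.712, "COST": 1.219, "REPUTAT": -1.14,
}

# Factor 1 scoring coefficients from the Factor Score Coefficient Matrix.
factor1_coefficients = {
    "TASTE": 0.409, "AROMA": 0.398, "COLOR": 0.225, "SIZE": -0.066,
    "ALCOHOL": 0.036, "COST": 0.026, "REPUTAT": 0.011,
}

# Estimated factor score: sum of (z score times scoring coefficient).
factor1_score = sum(z_scores[v] * factor1_coefficients[v] for v in z_scores)
print(round(factor1_score, 2))
```

Using the three-decimal coefficients instead of the two-decimal ones in the hand calculation changes the result only slightly.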
R² of the Variables With Each Factor
• These are treated as indicators of the internal
consistency of the solution.
• .70 and above is good.
• They are in the main diagonal of this matrix:
Factor Score Covariance Matrix
Factor      1        2
1          .966     .003
2          .003     .953
• These squared multiple correlation
coefficients are equal to the variance of
the factor scores.
Use the Factor Scores
• Let us see how the factor scores are
related to the SES and Group variables.
• Use multiple regression to predict SES
from the factor scores.
Model Summary
Model    R        R Square    Adjusted R Square    Std. Error of the Estimate
1        .988a    .976        .976                 .385
a. Predictors: (Constant), FAC2_1, FAC1_1
ANOVA (b)
Model 1       Sum of Squares    df     Mean Square    F           Sig.
Regression    1320.821          2      660.410        4453.479    .000a
Residual      32.179            217    .148
Total         1353.000          219
a. Predictors: (Constant), FAC2_1, FAC1_1
b. Dependent Variable: SES
Coefficients (a)
              Standardized
              Coefficients                         Correlations
Model 1       Beta            t           Sig.     Zero-order    Part
(Constant)                    134.810     .000
FAC1_1        .681            65.027      .000     .679          .681
FAC2_1        -.718           -68.581     .000     -.716         -.718
a. Dependent Variable: SES
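The Model Summary values follow from the sums of squares and degrees of freedom in the ANOVA table. A small arithmetic check in plain Python (the variable names are illustrative; the numbers are copied from the output above):

```python
from math import sqrt

# Values copied from the ANOVA table.
ss_regression, df_regression = 1320.821, 2
ss_residual, df_residual = 32.179, 217
ss_total, df_total = 1353.000, 219

# R square: proportion of the variance in SES explained by the two factor scores.
r_square = ss_regression / ss_total

# Adjusted R square corrects for the number of predictors.
adjusted_r_square = 1 - (1 - r_square) * df_total / df_residual

# F is the ratio of the regression and residual mean squares.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_ratio = ms_regression / ms_residual

# Standard error of the estimate: square root of the residual mean square.
std_error = sqrt(ms_residual)

print(round(r_square, 3), round(adjusted_r_square, 3), round(std_error, 3))
print(round(f_ratio, 1))
```

The F ratio computed this way should differ from the printed one only by rounding in the tabled sums of squares.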
• Also, use independent t to compare
groups on mean factor scores.
Group Statistics
          GROUP    N      Mean         Std. Deviation    Std. Error Mean
FAC1_1    1        121    -.4198775    .97383364         .08853033
          2        99     .5131836     .71714232         .07207552
FAC2_1    1        121    .5620465     .88340921         .08030993
          2        99     -.6869457    .55529938         .05580969
Independent Samples Test
                                       Levene's Test for                                            95% Confidence Interval
                                       Equality of Variances    t-test for Equality of Means        of the Difference
                                       F         Sig.           t         df         Sig. (2-tailed)    Lower       Upper
FAC1_1    Equal variances assumed      19.264    .000           -7.933    218        .000               -1.16487    -.701253
          Equal variances not assumed                           -8.173    215.738    .000               -1.15807    -.708049
FAC2_1    Equal variances assumed      25.883    .000           12.227    218        .000               1.047657    1.450327
          Equal variances not assumed                           12.771    205.269    .000               1.056175    1.441809
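Both versions of the t test can be recomputed from the Group Statistics table alone. A sketch in plain Python, using the usual pooled-variance and Welch-Satterthwaite formulas; the function name t_from_summary is hypothetical.

```python
from math import sqrt

def t_from_summary(m1, s1, n1, m2, s2, n2):
    """Return (pooled t, Welch t, Welch df) from group means, SDs, and sizes."""
    diff = m1 - m2
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2   # squared standard errors of the means
    # Equal variances assumed: pool the two variances, df = n1 + n2 - 2.
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    t_pooled = diff / sqrt(pooled_var * (1 / n1 + 1 / n2))
    # Equal variances not assumed: separate variances, Satterthwaite df.
    t_welch = diff / sqrt(v1 + v2)
    df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_pooled, t_welch, df_welch

# Group 1 versus Group 2 on each factor score (values from Group Statistics).
fac1 = t_from_summary(-0.4198775, 0.97383364, 121, 0.5131836, 0.71714232, 99)
fac2 = t_from_summary(0.5620465, 0.88340921, 121, -0.6869457, 0.55529938, 99)
print([round(x, 3) for x in fac1])
print([round(x, 3) for x in fac2])
```

Since Levene's test is significant for both factors, the "equal variances not assumed" rows are the ones to read, although here both versions lead to the same conclusion.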
Unit-Weighted Factor Scores
• Define subscale 1 as simple sum or mean
of scores on all items loading well (> .4) on
Factor 1.
• Likewise for Factor 2, etc.
• Suzie Cue’s answers are
• Color, Taste, Aroma, Size, Alcohol, Cost, Reputation
• 80, 100, 40, 30, 75, 60, 10
• Aesthetic Quality = 80+100+40-10 = 210
• Cheap Drunk = 30+75+60-10 = 155
• It may be better to use factor scoring
coefficients (rather than loadings) to
determine unit weights.
• Grice (2001) evaluated several techniques
and found the best to be assigning a unit
weight of 1 to each variable that has a
scoring coefficient at least 1/3 as large as
the largest for that factor.
• Using this rule, we would not include
Reputation on either subscale and would
drop Cost from the second subscale.
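Both ways of choosing unit weights can be sketched in plain Python. The first part reproduces Suzie Cue's subscale scores; the second applies the one-third rule to the scoring coefficients. The function name grice_unit_weights is hypothetical, and comparing absolute values while keeping the sign of the coefficient as the weight is an assumption about how the rule is applied.

```python
# Suzie Cue's answers.
answers = {"COLOR": 80, "TASTE": 100, "AROMA": 40, "SIZE": 30,
           "ALCOHOL": 75, "COST": 60, "REPUTAT": 10}

# Unit weights based on loadings: +1 for positive loaders, -1 for Reputation.
aesthetic_quality = (answers["COLOR"] + answers["TASTE"] + answers["AROMA"]
                     - answers["REPUTAT"])
cheap_drunk = (answers["SIZE"] + answers["ALCOHOL"] + answers["COST"]
               - answers["REPUTAT"])

# Scoring coefficients from the Factor Score Coefficient Matrix.
coefficients = {
    1: {"COST": 0.026, "SIZE": -0.066, "ALCOHOL": 0.036, "REPUTAT": 0.011,
        "COLOR": 0.225, "AROMA": 0.398, "TASTE": 0.409},
    2: {"COST": 0.157, "SIZE": 0.610, "ALCOHOL": 0.251, "REPUTAT": -0.042,
        "COLOR": -0.201, "AROMA": 0.026, "TASTE": 0.110},
}

def grice_unit_weights(coefs):
    """Keep variables whose coefficient is at least 1/3 of the largest one."""
    largest = max(abs(c) for c in coefs.values())
    return {name: (1 if c > 0 else -1)
            for name, c in coefs.items() if abs(c) >= largest / 3}

subscale_1 = grice_unit_weights(coefficients[1])
subscale_2 = grice_unit_weights(coefficients[2])
print(aesthetic_quality, cheap_drunk)
print(sorted(subscale_1), sorted(subscale_2))
```

Under this rule Color narrowly misses the cutoff on the second factor, so small changes in the coefficients could change which items are kept.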
Item Analysis
and Cronbach’s Alpha
• Are our subscales reliable?
• Test-Retest reliability
• Cronbach’s Alpha – internal consistency
– Mean split-half reliability
– With correction for attenuation
– Is a conservative estimate of reliability
• AQ = Color + Taste + Aroma – Reputation
• Must negatively weight Reputation prior to
item analysis.
• Transform, Compute,
NegRep = -1Reputat.
• Analyze, Scale, Reliability Analysis
• Statistics
• Scale if item deleted.
• Continue, OK
• Shoot for an alpha of at least .70 for
research instruments.
• Note that deletion of the Reputation item
would increase alpha to .96.
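For readers who want to see what the Reliability Analysis procedure computes, here is a minimal sketch of Cronbach's alpha in plain Python, using the usual formula based on item variances and the variance of the total score. The function name cronbach_alpha and the tiny data set are hypothetical (they are not the beer data). The sketch also shows why Reputation must be negated first: alpha assumes all items are keyed in the same direction.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of scores per item, each with one entry per respondent."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]   # each respondent's total
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

# Hypothetical scores for four respondents on three items.
color = [1, 2, 3, 4]
taste = [2, 1, 4, 3]
reputation = [4, 3, 2, 1]                 # keyed in the opposite direction

neg_rep = [-x for x in reputation]        # same idea as NegRep = -1*Reputat
alpha_two_items = cronbach_alpha([color, taste])
alpha_with_neg_rep = cronbach_alpha([color, taste, neg_rep])
print(round(alpha_two_items, 3), round(alpha_with_neg_rep, 3))
```

The "Scale if item deleted" output corresponds to calling the function again with one item left out at a time.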
Comparing Two Groups’ Factor
Structure
• Eyeball Test
– Same number of well defined factors in both
groups?
– Same variables load well on same factors in
both groups?
• Pearson r
– Just correlate the loadings for one factor in
one group with those for the corresponding
factor in the other group.
– If there are many small loadings, r may be
large due to the factors being similar on small
loadings despite lack of similarity on the larger
loadings.
• CC, Tucker’s coefficient of congruence
– Follow the instructions in the document
Comparing Two Groups’ Factor Structures:
Pearson r and the Coefficient of Congruence
– A CC of .85 to .94 indicates similar
factors, and a CC of .95 to 1 indicates
essentially identical factors.
• Cross-Scoring
– Obtain scoring coefficients for each group.
– For each group, compute factor scores using
coefficients obtained from the analysis for that
same group (SG) and using coefficients
obtained from the analysis for the other group
(OG).
– Correlate SG factor scores with OG factor
scores.
• Cattell’s Salient Similarity Index
– Factors (one from one group, one from the
other group) are compared in terms of
similarity of loadings.
– Cattell’s Salient Similarity Index, s, can be
transformed to a p value testing the null that
the factors are not related to one another.
– See my document Cattell’s s for details.
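Two of the indices above are simple enough to sketch in plain Python. Tucker's coefficient of congruence is the sum of cross-products of the loadings divided by the square root of the product of the two sums of squares; unlike Pearson r, it does not center the loadings at their means. The function names and the two sets of loadings are hypothetical.

```python
from math import sqrt

def congruence(a, b):
    """Tucker's coefficient of congruence between two columns of loadings."""
    cross = sum(x * y for x, y in zip(a, b))
    return cross / sqrt(sum(x * x for x in a) * sum(y * y for y in b))

def pearson_r(a, b):
    """Pearson correlation between two columns of loadings."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    dev_a = [x - mean_a for x in a]
    dev_b = [y - mean_b for y in b]
    cross = sum(x * y for x, y in zip(dev_a, dev_b))
    return cross / sqrt(sum(x * x for x in dev_a) * sum(y * y for y in dev_b))

# Hypothetical loadings on corresponding factors in two groups.
group_a = [0.90, 0.80, 0.70, 0.10, 0.00]
group_b = [0.80, 0.90, 0.60, 0.20, 0.10]
print(round(congruence(group_a, group_b), 3),
      round(pearson_r(group_a, group_b), 3))
```

Both indices ignore an overall rescaling of the loadings, so a factor whose loadings are all half as large in one group would still be scored as identical in shape.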
Required Number of Subjects and
Variables
• Rules of Thumb (not very useful)
– 100 or more subjects.
– at least 10 times as many subjects as you
have variables.
– as many subjects as you can, the more the
better.
• It depends – see the references in the
handout.
• Start out with at least 6 variables per
expected factor.
• Each factor should have at least 3
variables that load well.
• If loadings are low, need at least 10
variables per factor.
• Need at least as many subjects as
variables. The more of each, the better.
• When there are overlapping factors
(variables loading well on more than one
factor), need more subjects than when
structure is simple.
• If communalities are low, need more
subjects.
• If communalities are high (> .6), you can
get by with fewer than 100 subjects.
• With moderate communalities (.5), need
100-200 subjects.
• With low communalities and only 3-4 high
loadings per factor, need over 300
subjects.
• With low communalities and poorly defined
factors, need over 500 subjects.
What I Have Not Covered Today
• LOTS.
• For a brief introduction to reliability,
validity, and scaling, see Document or
Slideshow.
• For an SAS version of this workshop, see
Document or Slideshow.
Practice Exercises
• Animal Rights, Ethical Ideology, and
Misanthropy
• Rating Characteristics of Criminal
Defendants