Determining the # Of PCs


PC Decisions: # PCs, Rotation & Interpretation
• Remembering the process
• Some cautionary comments
• Statistical approaches
• Mathematical approaches
• “Nontrivial factors” approaches
• Simple Structure & Factor Rotation
• Major Kinds of Factor Rotation
• Factor Interpretation
How the process really works…
Here’s the series of steps we talked about earlier.
• # factors decision
• Rotate the factors
• Interpret the factors
• Compute factor scores
These “steps” aren’t made independently and done in this order!
Considering the interpretations of the factors can aid the # factors decision!
Considering how the factor scores (representing the factors) relate to each other and to variables external to the factoring can aid both the # factors decision and interpretation.
Statistical Procedures
• PCs are extracted from a correlation matrix
• PCs should only be extracted if there is “systematic
covariation” in the correlation matrix
• This is known as the “sphericity question”
• Note: the test asks whether the next PC should be extracted
• There are two different sphericity tests
• Whether there is any systematic covariation in the original R
• Whether there is any systematic covariation left in the partial
R, after a given number of factors has been extracted
• Both tests are called “Bartlett’s Sphericity Test”
Statistical Procedures, cont.
• Applying Bartlett’s Sphericity Tests
• Retaining H0: means “don’t extract another factor”
• Rejecting H0: means “extract the next factor”
• Significance tests provide a p-value, and so a known probability
that the next factor is “1 too many” (a type I error)
• Like all significance tests, these are influenced by “N”
• larger N = more power = more likely to reject H0: = more
likely to “keep the next factor” (& make a Type I error)
• Quandary?!?
• Samples large enough to have a stable R are likely to have
“excessive power” and lead to “over factoring”
• Be sure to consider % variance, replication & interpretability
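For concreteness, here is a minimal sketch of Bartlett’s test on the original R (the partial-R version applies the same logic to the residual correlation matrix); the function name and the simulated data are illustrative assumptions:

import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(data):
    n, k = data.shape
    R = np.corrcoef(data, rowvar=False)
    # test statistic: -(n - 1 - (2k + 5)/6) * ln|R|
    statistic = -(n - 1 - (2 * k + 5) / 6) * np.log(np.linalg.det(R))
    df = k * (k - 1) / 2
    return statistic, chi2.sf(statistic, df)

# rejecting H0 (a small p) says "extract the next factor"; note that a
# large N inflates the statistic -- the "excessive power" quandary above
stat, p = bartlett_sphericity(np.random.default_rng(0).normal(size=(300, 10)))
print(f"chi2 = {stat:.1f}, p = {p:.3f}")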
Mathematical Procedures
• The most commonly applied decision rule (and the default in most stats packages -- chicken & egg?) is the λ > 1.00 rule … here’s the logic
Part 1
• Imagine a spherical R (of k variables)
• each variable is independent and carries unique
information
• so, each variable has 1/kth of the information in R
• For a “normal” R (of k variables)
• each variable, on average, has 1/kth of the information
in R
Mathematical Procedure, cont.
Part 2
• The “trace” of a matrix is the sum of its diagonal
• So, the trace of R (with 1’s in the diag) = k (# vars)
• λ (the eigenvalue) tells the amount of variance in R accounted for by each extracted PC
• for a full PC solution, Σλ = k (accounts for all variance)
Part 3
• PC is about data reduction and parsimony
• “trading” many simpler things (the original variables) for fewer, more-complex things (PCs -- linear combinations of variables)
Mathematical Procedure, cont.
Putting it all together (hold on tight !)
• Any PC with λ > 1.00 accounts for more variance than the average variable in that R
• That PC “has parsimony” -- the more complex composite has more information than the average variable
• Any PC with λ < 1.00 accounts for less variance than the average variable in that R
• That PC “doesn’t have parsimony” -- the more complex composite has no more information than the average variable
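Putting that logic into code, a minimal sketch of the λ > 1.00 rule, assuming R is a k x k correlation matrix (1’s on the diagonal):

import numpy as np

def kaiser_rule(R):
    lambdas = np.linalg.eigvalsh(R)[::-1]   # eigenvalues, largest first
    # trace(R) = k, so the eigenvalues sum to k (a full solution
    # accounts for all the variance)
    assert np.isclose(lambdas.sum(), R.shape[0])
    # keep each PC that carries more information than the average variable
    return int((lambdas > 1.0).sum()), lambdas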
Mathematical Procedure, cont.
There have been examinations of the accuracy of this criterion
• The usual procedure is to generate a set of variables from a known number of factors (v_k = b_1k*PC_1 + … + b_fk*PC_f, etc.) -- while varying N, # factors, # PCs & communalities
• Then factor those variables and see if λ > 1.00 leads to the correct number of factors
Results -- the rule “works pretty well on the average”, which really means that it gets the # factors right sometimes, underestimates sometimes and overestimates sometimes
• No one has generated an accurate rule for assessing which of these will occur
• But the rule is most accurate with k < 40, f between k/5 and
k/3 and N > 300
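A hedged sketch of that kind of accuracy check -- variables are built from a known number of factors, then the rule’s answer is compared with the truth; the loadings, N, k and error size below are all illustrative assumptions:

import numpy as np

rng = np.random.default_rng(1)
N, k, f = 300, 12, 3                      # cases, variables, true # factors
B = np.zeros((k, f))
for j in range(f):                        # each factor drives 4 variables
    B[4 * j: 4 * (j + 1), j] = rng.uniform(0.6, 0.8, size=4)
X = rng.normal(size=(N, f)) @ B.T + rng.normal(scale=0.6, size=(N, k))

R = np.corrcoef(X, rowvar=False)
n_kept = int((np.linalg.eigvalsh(R) > 1.0).sum())
print(f"true # factors: {f}, lambda > 1.00 keeps: {n_kept}")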
Nontrivial Factors Procedures
These “common sense” approaches became increasingly common as…
• the limitations of statistical and mathematical procedures
became better known
• the distinction between exploratory and confirmatory
factoring developed and the crucial role of “successful
exploring” became better known
These procedures are more like “judgement calls” and require
greater application of content knowledge and “persuasion”, but
are often the basis of good factorings !!
Nontrivial factors Procedures, cont.
Scree -- the “junk” that piles up at the foot of a glacier
a “diminishing returns” approach
• plot the λ for each factor and look for the “elbow”
• “Old rule” -- # factors = elbow (1966; 3 below)
• “New rule” -- # factors = elbow - 1 (1967; 2 below)
[Scree plot: λ (y-axis, 0-4) by # PC (x-axis, 1-6), with the elbow at PC 3]
• Sometimes there isn’t a clear
elbow -- try another “rule”
• This approach seems to
work best when combined
with attention to
interpretability !!
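A minimal sketch of drawing a scree plot, assuming R is already a correlation matrix; the dashed λ = 1.00 reference line lets you check the Kaiser rule at the same time:

import numpy as np
import matplotlib.pyplot as plt

lambdas = np.sort(np.linalg.eigvalsh(R))[::-1]   # eigenvalues, largest first
plt.plot(np.arange(1, len(lambdas) + 1), lambdas, "o-")
plt.axhline(1.0, linestyle="--")                 # lambda = 1.00 reference
plt.xlabel("# PC")
plt.ylabel("lambda")
plt.show()                                       # now look for the elbow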
An Example…
A buddy in graduate school wanted to build a measure of “contemporary morality”. He started with the “10 Commandments” and the “7 Deadly Sins” and created a 56-item scale with 8 subscales. His scree plot looked like…
[Scree plot: λ (y-axis, 0-20) by item number (x-axis, 1-56), with a large elbow at 2 and a smaller elbow at 8]
How many factors?
1? – big elbow at 2, so the ’67 rule suggests a single factor, which clearly accounts for the biggest portion of variance
7? – smaller elbow at 8, so the ’67 rule suggests 7
8? – smaller elbow at 8, the ’66 rule gives the 8 he was looking for
– also the 8th had λ > 1.0 and the 9th had λ < 1.0
Remember that these are subscales of a central construct, so..
• items will have substantial correlations both within and between subscales
• to maximize the variance accounted for, the first factor is likely to pull in all
these inter-correlated variables, leading to a large λ for the first (general) factor
and much smaller λs for subsequent factors
This is a common scree configuration when factoring items from a multi-subscale scale!
Kinds of well-defined factors
• There is a trade-off between “parsimony” and “specificity”
whenever we are factoring
• This trade-off influences both the #-of-factors and cutoff
decisions, both of which, in turn, influence factor interpretation
• general and “larger” group factors include more variables,
account for more variance -- are more parsimonious
• unique and “smaller” group factors include fewer variables & may be more focused -- are often more specific
• Preference really depends upon ...
• what you are expecting
• what you are trying to accomplish with the factoring
Kinds of ill-defined factors
Unique factors
• hard to know what construct is represented by a 1-variable factor
• especially if that variable is multi-vocal
• then the factor is defined by “part” of that single
variable -- but maybe not the part defined by its
name
Group factors can be ill-defined
• “odd combinations” can be hard to interpret -- especially later factors comprised of multi-vocal variables (knowledge of variables & population is very important!)
Simple Structure
• The idea of simple structure is very appealing ...
• Each factor of any solution should have an unambiguous interpretation, because the variable loadings for each factor should be simple and clear.
• There have been several different characterizations of this idea, and varying degrees of success with translating those characterizations into mathematical operations and objective procedures; here are some of the most common
Components of Simple Structure
Each factor should have several variables with strong
loadings
• admonition for well-defined factors
• remember that “strong” loadings can be “+” or “-”
Each variable should have a strong loading for only one
factor
• admonition against multi-vocal items
• admonition for conceptually separable factors
• admonition that each variable should “belong” to some
factor
Each variable should have a large communality
• implying that its membership “accounts” for its variance
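These admonitions are easy to screen for numerically; a minimal sketch, assuming loadings is a (k variables x f factors) matrix from an orthogonal solution and using an illustrative |loading| >= .40 cutoff:

import numpy as np

loadings = np.array([[ .7, -.1],
                     [ .7,  .1],
                     [ .1,  .5],
                     [ .2,  .6]])
strong = np.abs(loadings) >= 0.40
per_factor = strong.sum(axis=0)    # several strong loadings per factor?
per_variable = strong.sum(axis=1)  # exactly one strong loading per variable?
h2 = (loadings ** 2).sum(axis=1)   # a large communality for each variable?
print(per_factor, per_variable, h2)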
The benefit of “simple structure” ?
• Remember that …
• we’re usually factoring to find “groups of variables”
• But, the extraction process is trying to “reproduce variance”
• the factor plot often looks simpler than the structure matrix
Structure matrix:
      PC1    PC2
V1    .7     .5
V2    .6     .6
V3    .6    -.5
V4    .7    -.6
[Factor plot: V1-V4 in the PC1-PC2 plane; V1 & V2 form one cluster, V3 & V4 another]
• True, this gets more complicated with more variables and
factors, but “simple structure” is basically about “seeing” in the
structure matrix what is apparent in the plot
How rotation relates to “Simple Structure”
Factor Rotations -- changing the “viewing angle” of the factor space -- have been the major approach to providing simple structure
• structure is “simplified” if the factor vectors “spear” the variable clusters
Unrotated:
      PC1    PC2
V1    .7     .5
V2    .6     .6
V3    .6    -.5
V4    .7    -.6
Rotated:
      PC1’   PC2’
V1    .7    -.1
V2    .7     .1
V3    .1     .5
V4    .2     .6
[Factor plots: before rotation, PC1 and PC2 pass between the variable clusters; after rotation, PC1’ spears the V1-V2 cluster and PC2’ spears the V3-V4 cluster]
Major Types of Rotation
Remember -- extracted factors are orthogonal (uncorrelated)
• Orthogonal Rotation -- resulting factors are uncorrelated
• more parsimonious & efficient, but less “natural”
• Oblique Rotation -- resulting factors are correlated
• more “natural” & better “spearing”, but more complicated
[Figure: Orthogonal Rotation -- the angle between PC1’ and PC2’ stays 90°, each factor spearing one variable cluster; Oblique Rotation -- the angle between PC1’ and PC2’ is less than 90°, letting each factor spear its cluster more directly]
Major Types of Orthogonal Rotation &
their “tendencies”
Varimax -- most commonly used and common default
• “simplifies factors” by maximizing the variance of the loadings of the variables on a factor (minimizes the # of vars with high loadings)
• tends to produce group factors
Quartimax
• “simplifies variables” by maximizing the variance of the loadings of a variable across factors (minimizes the # of factors a var loads on)
• tends to “move” vars from extraction less than varimax
• tends to produce a general factor & small group factors
Equimax
• designed to “balance” varimax and quartimax tendencies
• didn’t work very well -- can’t do both simultaneously; whichever is done first dominates the final structure
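All three are members of the same “orthomax” family, so one minimal numpy sketch covers them; by convention gamma = 1 gives varimax, gamma = 0 gives quartimax, and gamma = f/2 gives equimax (the loading matrix L is whatever your extraction produced):

import numpy as np

def orthomax(L, gamma=1.0, max_iter=100, tol=1e-6):
    # L is a (k variables x f factors) loading matrix
    k, f = L.shape
    T = np.eye(f)                  # accumulated orthogonal rotation
    d = 0.0
    for _ in range(max_iter):
        Lr = L @ T                 # current rotated loadings
        u, s, vt = np.linalg.svd(
            L.T @ (Lr ** 3 - (gamma / k) * Lr @ np.diag((Lr ** 2).sum(axis=0))))
        T = u @ vt
        d_old, d = d, s.sum()
        if d_old != 0 and d / d_old < 1 + tol:
            break                  # criterion stopped improving
    return L @ T                   # rotated loadings (factors stay orthogonal)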
Major Types of Oblique Rotation & their
“tendencies”
Promax
• computes the best orthogonal solution and then “relaxes” the orthogonality constraints to better “spear” the variable clusters with the factor vectors (giving simpler structure)
Direct Oblimin
• spears the variable clusters as well as possible to produce the lowest occurrence of multi-vocality
All oblique rotations have a parameter (e.g., δ or κ) that sets the maximum correlation allowed between the rotated factors
• changing this parameter can “importantly” change the resulting rotation and interpretation
• try at least a couple of values & look for consistency
Some things that are different (or not) when you use an Oblique Rotation
Different things:
• There will be a Φ (phi) matrix that holds the factor intercorrelations
• The λ-values and variances accounted for by the rotated factors will be different than those of the extracted factors
• compute λ for each factor by summing the squared structure loadings for that factor
• compute the variance accounted for as the newly computed λ/k
Same things:
• the communality of each variable will be the same -- but it can’t be computed by summing the squared structure loadings for each variable (since the factors are correlated)
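A minimal sketch of that bookkeeping, assuming pattern is the (k variables x f factors) pattern matrix and phi the (f x f) factor-correlation matrix from your oblique rotation (the numbers are illustrative):

import numpy as np

pattern = np.array([[ .72, -.05],
                    [ .68,  .10],
                    [ .05,  .55],
                    [ .12,  .63]])
phi = np.array([[1.0, 0.3],
                [0.3, 1.0]])            # factor intercorrelations (Phi)

structure = pattern @ phi               # structure loadings
lam = (structure ** 2).sum(axis=0)      # lambda for each rotated factor
var_accounted = lam / pattern.shape[0]  # newly computed lambda / k
# communality is NOT a row sum of squared structure loadings here:
h2 = np.diag(pattern @ phi @ pattern.T)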
Interpretation & Cut-offs
• Interpretation is the process of naming factors based on the
variables that “load on” them
• Which variables “load” is decided based on a “cutoff”
• cutoffs usually range from .3 to .4 ( + or - )
• Higher cutoffs limit the # of loading variables
• factors may be ill-defined, some variables may not load
• Lower cutoffs increase the # of loading variables
• variables are more likely to be multi-vocal
• Worry & make a careful decision when your interpretation
depends upon the cutoff that is chosen !!
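A minimal sketch of applying (and stress-testing) a cutoff, assuming loadings is the rotated structure matrix; the variable names and values are illustrative:

import numpy as np

loadings = np.array([[ .70, -.10],
                     [ .70,  .10],
                     [ .10,  .50],
                     [ .20,  .60]])
names = ["V1", "V2", "V3", "V4"]

for cutoff in (0.30, 0.40):             # worry if the story changes!
    print(f"cutoff = {cutoff}:")
    for j in range(loadings.shape[1]):
        members = [n for n, ld in zip(names, loadings[:, j])
                   if abs(ld) >= cutoff]
        print(f"  factor {j + 1} is defined by {members}")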
Combining #-factors & Rotation to Select “the best
Factor Solution”
To specify “the solution” you must pick the # factors, type of rotation & cutoff!
• Apply the different rules to arrive at an initial “best
guess” of the #-factors
• Obtain orthogonal and oblique rotations for that many
factors, for one fewer and for one more
• Compare the solutions to find “your favorite” – remember this is exploratory factoring, so explore!
• parsimony vs. specificity
• different cutoffs (.3 - .4)
• rotational survival
• simple structure
• conceptual sense
• interesting surprises (about factors and/or variables)