Factor Analysis
Advanced Experimental Methods & Statistics
PSYC 4310 / COGS 6310
Michael Kalsher
Department of Cognitive Science
© 2012, Michael Kalsher
Outline
• What Are Factors?
• Representing Factors
– Graphs and Equations
• Extracting factors
– Methods and Criteria
• Interpreting Factor Structures
– Factor Rotation
• Reliability
– Cronbach’s alpha
• Writing Results
© 2009, Michael Kalsher and James Watt
When to Use Factor Analysis?
• Data reduction
• Identification of underlying latent structures
– Clusters of correlated variables are termed factors.
– Example: factor analysis could be used to identify which characteristics (out of a large number of candidates) make a person popular.
Candidate characteristics: level of social skills, selfishness, how interesting a person is to others, the amount of time they spend talking about the other person (Talk 1) versus about themselves (Talk 2), and their propensity to lie about themselves.
The R-Matrix
Meaningful clusters of large correlation coefficients between subsets of variables suggest that these variables are measuring aspects of the same underlying dimension.
Factor 1: The better your social skills, the more interesting and talkative you tend to be.
Factor 2: Selfish people are likely to lie and talk about themselves.
What is a Factor?
• Factors can be viewed as classification axes
along which the individual variables can be
plotted.
• The greater the loading of variables on a
factor, the more the factor explains
relationships among those variables.
• Ideally, variables should be strongly related to
(or load on) only one factor.
Graphical Representation of a Factor Plot
Note that each variable loads primarily on only one factor.
Factor loadings tell us about the relative contribution that a variable makes to a factor.
Mathematical Representation of a Factor Plot
• The equation describing a linear model can be applied to the description of a factor.
• The b's in the equation represent the factor loadings observed in the factor plot.
$$Y_i = b_1X_{1i} + b_2X_{2i} + \dots + b_nX_{ni} + \varepsilon_i$$
$$\text{Factor}_i = b_1\,\text{Variable}_{1i} + b_2\,\text{Variable}_{2i} + \dots + b_n\,\text{Variable}_{ni} + \varepsilon_i$$
Note: there is no intercept in the equation because the lines intersect at zero, so the intercept is also zero.
Mathematical Representation of a Factor Plot
There are two factors underlying the popularity construct: general sociability and consideration.
We can construct equations that describe each factor in terms of the variables that have been measured.
$$\text{Sociability}_i = b_1\,\text{Talk 1}_i + b_2\,\text{Social Skills}_i + b_3\,\text{Interest}_i + b_4\,\text{Talk 2}_i + b_5\,\text{Selfish}_i + b_6\,\text{Liar}_i + \varepsilon_i$$
$$\text{Consideration}_i = b_1\,\text{Talk 1}_i + b_2\,\text{Social Skills}_i + b_3\,\text{Interest}_i + b_4\,\text{Talk 2}_i + b_5\,\text{Selfish}_i + b_6\,\text{Liar}_i + \varepsilon_i$$
Mathematical Representation of a Factor Plot
The values of the b's in the two equations differ, depending on the relative importance of each variable to a particular factor.
$$\text{Sociability}_i = 0.87\,\text{Talk 1}_i + 0.96\,\text{Social Skills}_i + 0.92\,\text{Interest}_i + 0.00\,\text{Talk 2}_i - 0.10\,\text{Selfish}_i + 0.09\,\text{Liar}_i + \varepsilon_i$$
$$\text{Consideration}_i = 0.01\,\text{Talk 1}_i - 0.03\,\text{Social Skills}_i + 0.04\,\text{Interest}_i + 0.82\,\text{Talk 2}_i + 0.75\,\text{Selfish}_i + 0.70\,\text{Liar}_i + \varepsilon_i$$
Each value of b is replaced with the coordinate of that variable on the factor plot.
Ideally, variables should have very high b-values for one factor and very low b-values for all other factors.
Factor Loadings
Variables      Sociability  Consideration
Talk 1                0.87           0.01
Social Skills         0.96          -0.03
Interest              0.92           0.04
Talk 2                0.00           0.82
Selfish              -0.10           0.75
Liar                  0.09           0.70
• The b values represent the weights of a variable on a factor and are termed factor loadings.
• These values are stored in a factor pattern matrix (A).
• Columns display the factors (underlying constructs) and rows display how each variable loads onto each factor.
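The ideal described earlier (a high loading on one factor, low loadings on all others) can be checked mechanically. The sketch below assigns each variable to the factor with its largest absolute loading, using the values from this table. The function name is hypothetical, and the 0.4 cutoff for flagging a variable that loads weakly everywhere is an assumed convention, not something the slide specifies.

```python
# Factor loadings from the table (variable -> {factor: loading}).
LOADINGS = {
    "Talk 1":        {"Sociability": 0.87,  "Consideration": 0.01},
    "Social Skills": {"Sociability": 0.96,  "Consideration": -0.03},
    "Interest":      {"Sociability": 0.92,  "Consideration": 0.04},
    "Talk 2":        {"Sociability": 0.00,  "Consideration": 0.82},
    "Selfish":       {"Sociability": -0.10, "Consideration": 0.75},
    "Liar":          {"Sociability": 0.09,  "Consideration": 0.70},
}

def primary_factor(loadings, cutoff=0.4):
    """Map each variable to the factor with its largest absolute loading.

    Variables whose largest |loading| is below `cutoff` map to None
    (the 0.4 default is an assumption, not a fixed rule).
    """
    result = {}
    for var, row in loadings.items():
        factor = max(row, key=lambda f: abs(row[f]))
        result[var] = factor if abs(row[factor]) >= cutoff else None
    return result

for var, factor in primary_factor(LOADINGS).items():
    print(f"{var:>13} -> {factor}")
```

Using absolute values means a strong negative loading still counts as belonging to that factor.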
Factor Scores
• Once factors are derived, we can estimate each
person’s Factor Scores (based on their scores for each
factor’s constituent variables).
• Potential uses for factor scores:
- Estimate a person's score on one or more factors.
- Answer questions of scientific or practical interest (e.g., are females more sociable than males? This can be answered using the factor scores for sociability).
• Methods of Determining Factor Scores
- Weighted Average (simplest, but scale dependent)
- Regression Method (easiest to understand; most typically used)
- Bartlett Method (produces scores that are unbiased and correlate only with their
own factor).
- Anderson-Rubin Method (produces scores that are uncorrelated and
standardized)
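The simplest method listed above combines a person's raw scores using the factor loadings as weights. The minimal sketch below multiplies each raw score by the variable's loading and sums the products, ignoring the error term; the person's scores are hypothetical, and the function name is illustrative. Because raw scores are used directly, the result depends on the measurement scale of each variable, which is the scale dependence noted above.

```python
# Factor loadings from the factor loading table (variable -> {factor: loading}).
LOADINGS = {
    "Talk 1":        {"Sociability": 0.87,  "Consideration": 0.01},
    "Social Skills": {"Sociability": 0.96,  "Consideration": -0.03},
    "Interest":      {"Sociability": 0.92,  "Consideration": 0.04},
    "Talk 2":        {"Sociability": 0.00,  "Consideration": 0.82},
    "Selfish":       {"Sociability": -0.10, "Consideration": 0.75},
    "Liar":          {"Sociability": 0.09,  "Consideration": 0.70},
}

def weighted_factor_score(loadings, person, factor):
    """Sum of loading x raw score over all variables (error term ignored)."""
    return sum(row[factor] * person[var] for var, row in loadings.items())

# Hypothetical raw scores for one person (invented for illustration).
person = {"Talk 1": 4, "Social Skills": 9, "Interest": 8,
          "Talk 2": 6, "Selfish": 8, "Liar": 6}

for factor in ("Sociability", "Consideration"):
    print(factor, round(weighted_factor_score(LOADINGS, person, factor), 2))
# Sociability 19.22
# Consideration 15.21
```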
Approaches to Factor Analysis
• Exploratory (EFA)
– Reduce a number of measurements to a smaller number of indices or factors (e.g., Principal Components Analysis or PCA).
– Goal: identify factors based on the data and maximize the amount of variance explained.
• Confirmatory (CFA)
– Test hypothesized relationships between measures and more abstract constructs.
– Goal: test a model in which the researcher hypothesizes, in advance, the number of factors, whether or not these factors are correlated, and which items load onto and reflect particular factors. In contrast to EFA, where all loadings are free to vary, CFA allows certain loadings to be explicitly constrained to zero.
Communality
• Understanding variance in an R-matrix
– Total variance for a particular variable has two
components:
• Common Variance – variance shared with other variables.
• Unique Variance – variance specific to that variable
(including error or random variance).
• Communality
– The proportion of common (or shared) variance present in
a variable is known as the communality.
– A variable that has no unique variance has a communality
of 1; one that shares none of its variance with any other
variable has a communality of 0.
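When factors are orthogonal (uncorrelated), a variable's communality on the retained factors is the sum of its squared loadings. The sketch below applies this to the six popularity variables using the factor loadings given earlier. Treating these two factors as orthogonal is an assumption, and reading 1 minus the communality as unique variance assumes standardized variables (total variance of 1 per variable).

```python
# Factor loadings from the factor loading table (variable -> {factor: loading}).
LOADINGS = {
    "Talk 1":        {"Sociability": 0.87,  "Consideration": 0.01},
    "Social Skills": {"Sociability": 0.96,  "Consideration": -0.03},
    "Interest":      {"Sociability": 0.92,  "Consideration": 0.04},
    "Talk 2":        {"Sociability": 0.00,  "Consideration": 0.82},
    "Selfish":       {"Sociability": -0.10, "Consideration": 0.75},
    "Liar":          {"Sociability": 0.09,  "Consideration": 0.70},
}

def communalities(loadings):
    """h^2 = sum of squared loadings across factors (orthogonal factors assumed)."""
    return {var: sum(l ** 2 for l in row.values()) for var, row in loadings.items()}

for var, h2 in communalities(LOADINGS).items():
    # For standardized variables, 1 - h^2 is the unique variance.
    print(f"{var:>13}: communality {h2:.3f}, unique variance {1 - h2:.3f}")
```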
Factor Extraction: PCA vs. Factor Analysis
– Principal Component Analysis. A data reduction technique that
represents a set of variables by a smaller number of variables called
principal components. They are uncorrelated, and therefore, measure
different, unrelated aspects or dimensions of the data.
– Principal Components are chosen such that the first one accounts for as
much of the variation in the data as possible, the second one for as much
of the remaining variance as possible, and so on.
– Useful for combining many variables into a smaller number of subsets.
– Factor Analysis. Derives a mathematical model from which factors are
estimated.
– Factors are linear combinations that maximize the shared portion of the
variance underlying latent constructs.
– May be used to identify the structure underlying such variables and to
estimate scores to measure latent factors themselves.
Factor Extraction: Eigenvalues & Scree Plot
• Eigenvalues
– Measure the amount of variation accounted for by each factor.
– Number of principal components is less than or equal to the number
of original variables. The first principal component accounts for as
much of the variability in the data as possible. Each succeeding
component has the highest variance possible under the constraint
that it be orthogonal to (i.e., uncorrelated with) the preceding
components.
• Scree Plots
– Plots each eigenvalue (Y-axis) against the factor with which it is associated (X-axis).
– By graphing the eigenvalues, the relative importance of each
factor becomes apparent.
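The eigenvalues are computed from the R-matrix. In practice a library routine would be used (for example NumPy's `numpy.linalg.eigvalsh`); the sketch below instead uses the classical cyclic Jacobi method so that it runs with the standard library only. The R-matrix is hypothetical, with two clusters of correlated variables, and the bar chart is only a crude text stand-in for a scree plot.

```python
import math

def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=50):
    """Eigenvalues of a symmetric matrix by the cyclic Jacobi method, largest first."""
    n = len(a)
    m = [row[:] for row in a]                      # work on a copy
    for _ in range(max_sweeps):
        off = sum(m[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:                              # essentially diagonal: done
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if m[p][q] == 0.0:
                    continue
                # Angle that zeroes m[p][q]: tan(2*theta) = 2*m[p][q] / (m[q][q] - m[p][p])
                theta = 0.5 * math.atan2(2 * m[p][q], m[q][q] - m[p][p])
                c, s = math.cos(theta), math.sin(theta)
                # m <- J^T m J, where J is the identity except for the (p, q) rotation block
                for k in range(n):                 # update columns p and q (m J)
                    mkp, mkq = m[k][p], m[k][q]
                    m[k][p] = c * mkp - s * mkq
                    m[k][q] = s * mkp + c * mkq
                for k in range(n):                 # update rows p and q (J^T ...)
                    mpk, mqk = m[p][k], m[q][k]
                    m[p][k] = c * mpk - s * mqk
                    m[q][k] = s * mpk + c * mqk
    return sorted((m[i][i] for i in range(n)), reverse=True)

# Hypothetical R-matrix: variables 1-2 and 3-4 form two correlated clusters.
R = [
    [1.0, 0.6, 0.1, 0.1],
    [0.6, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.6],
    [0.1, 0.1, 0.6, 1.0],
]
eigs = jacobi_eigenvalues(R)
print([round(v, 3) for v in eigs])   # [1.8, 1.4, 0.4, 0.4]
# Text "scree plot": one row per factor, bar length proportional to the eigenvalue.
for k, ev in enumerate(eigs, start=1):
    print(f"Factor {k}: {'#' * round(ev * 10):<20} {ev:.2f}")
```

Because the diagonal of a correlation matrix is all ones, the eigenvalues sum to the number of variables (4 here); the two eigenvalues well above the rest correspond to the two clusters.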
Factor Retention Based on Scree Plots
Factor Retention: Kaiser’s Criterion
Kaiser (1960) recommends retaining all factors with
eigenvalues greater than 1.
- Based on the idea that eigenvalues represent the amount
of variance explained by a factor and that an eigenvalue
of 1 represents a substantial amount of variation.
- Kaiser’s criterion tends to overestimate the number of
factors to be retained.
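Once the eigenvalues are known, Kaiser's criterion is a simple count. For a correlation matrix the eigenvalues sum to the number of variables, so dividing each eigenvalue by that number gives the proportion of variance it explains (the quantity behind the percentage reported for the retained SAQ components). The eigenvalues below are hypothetical values for a four-variable correlation matrix, and the function names are illustrative.

```python
def kaiser_retain(eigenvalues, threshold=1.0):
    """Count factors whose eigenvalue exceeds the threshold (Kaiser's criterion)."""
    return sum(1 for ev in eigenvalues if ev > threshold)

def variance_explained(eigenvalues, n_variables):
    """Proportion of total variance per factor for a correlation-matrix analysis."""
    return [ev / n_variables for ev in eigenvalues]

# Hypothetical eigenvalues for a 4-variable correlation matrix (toy values).
eigs = [1.8, 1.4, 0.4, 0.4]
k = kaiser_retain(eigs)
print(k)                                                  # 2
print(round(sum(variance_explained(eigs, 4)[:k]), 3))     # 0.8
```

Given that Kaiser's criterion tends to overestimate the number of factors, a count like this is best read alongside the scree plot rather than on its own.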
Doing Factor Analysis: An Example
• Students often become stressed about statistics and about using computers and/or SPSS to analyze data.
• Suppose we develop a questionnaire to measure this propensity: the SPSS Anxiety Questionnaire (SAQ). Sample items appear on the following slides; the data can be found in SAQ.sav.
• Does the questionnaire measure a single construct?
Or is it possible that there are multiple aspects
comprising students’ anxiety toward SPSS?
Doing Factor Analysis: Some Considerations
• Sample size is important! A sample of 300 or more will likely provide a stable factor solution, but the required size depends on the number of variables and factors identified.
• Factors that have four or more loadings greater than
0.6 are likely to be reliable regardless of sample
size.
• Correlations among the items should not be too low
(less than .3) or too high (greater than .8), but the
pattern is what is important.
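The .3/.8 rule of thumb can be applied to an item correlation matrix as a first pass. In the hypothetical sketch below (function name and toy matrix are illustrative), an item is flagged as weak if all of its correlations with other items are below .3, and as possibly redundant if at least one correlation exceeds .8. Absolute values are used so that negative correlations count too. As the slide notes, the overall pattern matters more than any single flag.

```python
def screen_items(r, low=0.3, high=0.8):
    """Apply the .3/.8 rule of thumb to an item correlation matrix r.

    Returns (weak, strong): indices of items whose correlations with every
    other item are below `low`, and indices of items with at least one
    correlation above `high` (possible redundancy).
    """
    n = len(r)
    weak, strong = [], []
    for i in range(n):
        others = [abs(r[i][j]) for j in range(n) if j != i]
        if max(others) < low:
            weak.append(i)
        if max(others) > high:
            strong.append(i)
    return weak, strong

# Hypothetical 4-item correlation matrix (toy values).
R = [
    [1.00, 0.45, 0.10, 0.85],
    [0.45, 1.00, 0.20, 0.40],
    [0.10, 0.20, 1.00, 0.15],
    [0.85, 0.40, 0.15, 1.00],
]
print(screen_items(R))   # ([2], [0, 3])
```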
Factor Extraction
[Table: Total Variance Explained for the 23 SAQ items, showing initial eigenvalues, extraction sums of squared loadings, and rotation sums of squared loadings. Four components have eigenvalues greater than 1; together they explain 50.32% of the variance.]
Scree Plot for the SAQ Data
Table of Communalities Before and After Extraction
Component Matrix Before Rotation (loadings of each variable onto each factor)
[Tables: communalities of the 23 SAQ items and the unrotated component matrix]
Note: Loadings less than 0.4 have been omitted.
Factor Rotation
• To aid interpretation it is possible to maximize
the loading of a variable on one factor while
minimizing its loading on all other factors.
• This is known as Factor Rotation.
• Two types:
– Orthogonal (factors are uncorrelated)
– Oblique (factors intercorrelate)
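Orthogonal rotation turns the factor axes rigidly, so it redistributes each variable's loadings across the factors without changing the variable's communality (its sum of squared loadings). The minimal sketch below rotates the two-factor popularity loadings by an arbitrary 30 degrees, an illustrative angle rather than a recommended rotation, and prints each variable's communality before and after. The function names are hypothetical.

```python
import math

def rotate(loadings, theta):
    """Orthogonally rotate a p x 2 loading matrix by theta radians.

    Each row [a, b] is multiplied by [[cos, sin], [-sin, cos]], so the
    two factor axes stay perpendicular.
    """
    c, s = math.cos(theta), math.sin(theta)
    return [[a * c - b * s, a * s + b * c] for a, b in loadings]

def communality(row):
    """Sum of squared loadings for one variable."""
    return sum(x * x for x in row)

# (Sociability, Consideration) loadings from the factor loading table.
A = [[0.87, 0.01], [0.96, -0.03], [0.92, 0.04],
     [0.00, 0.82], [-0.10, 0.75], [0.09, 0.70]]
B = rotate(A, math.radians(30))
for before, after in zip(A, B):
    print([round(x, 2) for x in after],
          round(communality(before), 3), round(communality(after), 3))
```

The individual loadings change substantially, but each pair of communalities printed on a row is identical; only how the explained variance is shared between the factors changes.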
Orthogonal Rotation
Oblique Rotation
Orthogonal Rotation (Varimax)
Note: Varimax rotation is the most commonly used rotation. Its goal is to minimize the complexity of the components by making the large loadings larger and the small loadings smaller within each component. Quartimax rotation makes large loadings larger and small loadings smaller within each variable. Equamax rotation is a compromise that attempts to simplify both components and variables. These are all orthogonal rotations; that is, the axes remain perpendicular, so the components are not correlated.
Rotated Component Matrix(a)
Component 1 (Fear of Computers)
  .800  I have little experience of computers
  .684  SPSS always crashes when I try to use it
  .647  I worry that I will cause irreparable damage because of my incompetence with computers
  .638  All computers hate me
  .579  Computers have minds of their own and deliberately go wrong whenever I use them
  .550  Computers are useful only for playing games
  .459  Computers are out to get me
Component 2 (Fear of Statistics)
  .677  I can't sleep for thoughts of eigenvectors
  .661  I wake up under my duvet thinking that I am trapped under a normal distribution
 -.567  Standard deviations excite me
  .523  People try to tell you that SPSS makes statistics easier to understand but it doesn't (also loads .473 on Component 1)
  .516  I dream that Pearson is attacking me with correlation coefficients
  .514  I weep openly at the mention of central tendency
  .496  Statistics makes me cry
  .429  I don't understand statistics
Component 3 (Fear of Math)
  .833  I have never been good at mathematics
  .747  I slip into a coma whenever I see an equation
  .747  I did badly at mathematics at school
Component 4 (Peer Evaluation)
  .648  My friends are better at statistics than me
  .645  My friends are better at SPSS than I am
  .586  If I'm good at statistics my friends will think I'm a nerd
  .543  My friends will think I'm stupid for not being able to cope with SPSS
  .427  Everybody looks at me when I use SPSS
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 9 iterations.
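The varimax idea, making large loadings larger and small loadings smaller within each component, can be expressed as maximizing the variance of the squared loadings in each column. Statistical packages use an iterative algorithm over all factor pairs, and SPSS also applies Kaiser normalization, as the output above states. The simplified sketch below handles only two factors, omits Kaiser normalization, and finds the best angle by brute-force grid search. The loadings are hypothetical: a perfectly simple structure deliberately rotated away by 0.4 radians to mimic a muddled unrotated solution.

```python
import math

def rotate(loadings, theta):
    """Orthogonally rotate a p x 2 loading matrix by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[a * c - b * s, a * s + b * c] for a, b in loadings]

def varimax_criterion(loadings):
    """Raw varimax: sum over factors of the variance of the squared loadings."""
    p = len(loadings)
    total = 0.0
    for j in range(len(loadings[0])):
        sq = [row[j] ** 2 for row in loadings]
        mean = sum(sq) / p
        total += sum((v - mean) ** 2 for v in sq) / p
    return total

def varimax_2d(loadings, steps=3600):
    """Grid-search the angle in [-pi/4, pi/4) that maximizes the criterion.

    For two factors the criterion repeats every pi/2 radians, so this
    window covers every distinct solution.
    """
    angles = (k * (math.pi / 2) / steps - math.pi / 4 for k in range(steps))
    best = max(angles, key=lambda t: varimax_criterion(rotate(loadings, t)))
    return best, rotate(loadings, best)

# Hypothetical simple-structure loadings, rotated away by 0.4 rad.
simple = [[0.8, 0.0], [0.7, 0.0], [0.0, 0.8], [0.0, 0.7]]
unrotated = rotate(simple, 0.4)
angle, rotated = varimax_2d(unrotated)
print(round(angle, 3))                                  # close to -0.4
print([[round(x, 2) for x in row] for row in rotated])
```

The search recovers a rotation of about -0.4 radians, which restores the simple structure: each variable again loads on one factor only.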
Oblique Rotation: Pattern Matrix(a)
Fear of Statistics
  I can't sleep for thoughts of eigenvectors
  I wake up under my duvet thinking that I am trapped under a normal distribution
  Standard deviations excite me
  I dream that Pearson is attacking me with correlation coefficients
  I weep openly at the mention of central tendency
  Statistics makes me cry
  I don't understand statistics
  Loadings shown for this block, in order: .706, .591, -.511, .405, .400
Peer Evaluation
  My friends are better at SPSS than I am
  My friends are better at statistics than me
  If I'm good at statistics my friends will think I'm a nerd
  My friends will think I'm stupid for not being able to cope with SPSS
  Everybody looks at me when I use SPSS
  Loadings shown for this block, in order: .643, .621, .615, .507
Fear of Computers
  I have little experience of computers
  SPSS always crashes when I try to use it
  All computers hate me
  I worry that I will cause irreparable damage because of my incompetence with computers
  Computers have minds of their own and deliberately go wrong whenever I use them
  Computers are useful only for playing games
  People try to tell you that SPSS makes statistics easier to understand but it doesn't
  Computers are out to get me
  Loadings shown for this block, in order: .885, .713, .653, .650, .588, .585, .412, .462, .411
Fear of Math
  I have never been good at mathematics
  I slip into a coma whenever I see an equation
  I did badly at mathematics at school
  Loadings shown for this block, in order: -.902, -.774, -.774
Extraction Method: Principal Component Analysis.
Rotation Method: Oblimin with Kaiser Normalization.
a. Rotation converged in 29 iterations.
Reliability:
A measure should consistently reflect the construct it is measuring
• Test-Retest Method
– What about practice effects and mood states?
• Alternate Form Method
– Expensive and impractical
• Split-Half Method
– Splits the questionnaire into two random halves, calculates a score for each half, and correlates the two sets of scores.
• Cronbach's Alpha
– Conceptually, splits the questionnaire (or subscales of a questionnaire) into all possible halves, computes a split-half reliability estimate for each split, and averages these estimates.
– Ranges from 0 (no reliability) to 1 (complete reliability)
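In practice alpha is computed from item and total-score variances rather than by enumerating splits. The minimal sketch below uses the standard formula alpha = k / (k - 1) x (1 - sum of item variances / variance of total scores), with sample variances from Python's `statistics.variance`. The function name and the three-item, six-respondent ratings are hypothetical.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha from k items, each a list of n respondents' scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(items)
    totals = [sum(person) for person in zip(*items)]    # each respondent's scale total
    item_var = sum(variance(item) for item in items)    # sample variances (n - 1)
    return k / (k - 1) * (1 - item_var / variance(totals))

# Hypothetical 1-5 ratings from six respondents on a three-item subscale.
items = [
    [4, 5, 3, 2, 4, 5],
    [4, 4, 3, 1, 5, 5],
    [3, 5, 2, 2, 4, 4],
]
print(round(cronbach_alpha(items), 2))
```

Items that load negatively on their factor (for example, "Standard deviations excite me" on the fear of statistics component) would need to be reverse-scored before computing alpha; otherwise alpha is deflated. Alpha is computed separately for each subscale.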
Reliability: Fear of Computers Subscale
Reliability: Fear of Statistics Subscale
Reliability: Fear of Math Subscale
Reliability: Peer Evaluation Subscale
Reporting the Results
A principal component analysis (PCA) was conducted on the 23 items with orthogonal rotation (varimax). Bartlett's test of sphericity, χ²(253) = 19334.49, p < .001, indicated that correlations between items were sufficiently large for PCA. An initial analysis was run to obtain eigenvalues for each component in the data. Four components had eigenvalues over Kaiser's criterion of 1 and in combination explained 50.32% of the variance. The scree plot was slightly ambiguous and showed inflexions that would justify retaining either two or four components.
Given the large sample size, and the convergence of the scree plot and Kaiser's criterion on four components, four components were retained in the final analysis. Component 1 represents a fear of computers, component 2 a fear of statistics, component 3 a fear of math, and component 4 peer evaluation concerns.
The fear of computers, fear of statistics, and fear of math subscales of the SAQ all had high reliabilities, all Cronbach's α = .82. However, the fear of negative peer evaluation subscale had a relatively low reliability, Cronbach's α = .57.
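Bartlett's test of sphericity checks whether the R-matrix differs from an identity matrix (all correlations zero). A minimal sketch of the usual statistic, χ² = -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom for p variables and n cases, follows. The helper names are hypothetical, and the p-value step (for example with SciPy's `scipy.stats.chi2.sf`) is left out so the sketch needs only the standard library.

```python
import math

def log_det(a):
    """log|det(a)| via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in a]
    n = len(m)
    total = 0.0
    for i in range(n):
        pivot = max(range(i, n), key=lambda k: abs(m[k][i]))
        if m[pivot][i] == 0.0:
            raise ValueError("matrix is singular")
        m[i], m[pivot] = m[pivot], m[i]    # a row swap only flips the sign of det
        total += math.log(abs(m[i][i]))
        for r in range(i + 1, n):
            f = m[r][i] / m[i][i]
            for c in range(i, n):
                m[r][c] -= f * m[i][c]
    return total

def bartlett_sphericity(r, n):
    """Return (chi2, df) for a p x p correlation matrix r estimated from n cases."""
    p = len(r)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * log_det(r)
    df = p * (p - 1) // 2
    return chi2, df

# Usage: df for 23 items (n = 200 is an arbitrary, hypothetical sample size).
identity = [[1.0 if i == j else 0.0 for j in range(23)] for i in range(23)]
print(bartlett_sphericity(identity, n=200)[1])   # 253
R = [[1.0, 0.5], [0.5, 1.0]]                     # hypothetical two-item R-matrix
print(round(bartlett_sphericity(R, n=100)[0], 2))
```

With p = 23 items, df = 23 × 22 / 2 = 253, which matches the degrees of freedom in the write-up. An identity R-matrix has determinant 1, so the statistic is 0; larger correlations shrink the determinant and inflate χ².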
Step 1: Select Factor Analysis
Step 2: Add all variables to be included
Step 3: Get descriptive statistics & correlations
Step 4: Ask for Scree Plot and set extraction options
Step 5: Handle missing values and sort coefficients by size
Step 6: Select rotation type and set rotation iterations
Step 7: Save Factor Scores
Communalities
Variance Explained
Scree Plot
Rotated Component Matrix: Component 1
Rotated Component Matrix: Component 2
Component 1: Factor Score
Component (Factor): Score Values
Rename Components According to Interpretation