Factor Analysis - Home : School of Psychology:Trinity

Factor Analysis
Introduction to concept
Reading: Individual differences by Colin Cooper
Common Terms Used
Factor Analysis (FA)
• Exploratory Factor Analysis
– Principal Components Analysis
– Factor analysis (principal axis factoring)
• Confirmatory Factor Analysis
– Path analysis
– Latent variable analysis
Factor Analysis - Intro
• Data reduction: identifies parts of a data set which potentially measure the same thing.
– Commonly encountered in the identification of personality dimensions.
• Hundreds of questions relating to components of personality are compiled, e.g.:
– Do you enjoy socialising with different people at parties?
– Do you worry a lot?
– Do you enjoy trying out new things?
– Do you get upset very easily?
• Questions that different respondents consistently answer in a similar manner supposedly address the same underlying construct or ‘Common Factor’, e.g., Extraversion-Introversion or Neuroticism.
Factor Analysis - Intro
• Most data generated from responses can be subjected to FA, so it is not limited to questionnaires; e.g., a series of physical tests may have one or two core skills as their essence.
• FA is arguably the most abused statistical technique in use. It generates much controversy, and the treatment here is very simplified.
• There are many different types; the simplest form is described here, and others are identified in due course.
Identifying the number of factors: inspection of responses
Question Sample
Answer 1 for strongly agree and 5 for strongly disagree.
Q1 I enjoy socialising                  1  2  3  4  5
Q2 I often act on impulse               1  2  3  4  5
Q3 I am a cheerful sort of person       1  2  3  4  5
Q4 I often feel depressed               1  2  3  4  5
Q5 I have difficulty getting to sleep   1  2  3  4  5
Q6 Large crowds make me feel anxious    1  2  3  4  5

Response Sample
Respondent   Q1  Q2  Q3  Q4  Q5  Q6
Stephen       5   5   4   1   1   2
Ann           1   2   1   1   1   2
Paul          3   4   3   4   5   4
Janette       4   4   3   1   2   1
Michael       3   3   4   1   2   2
Christine     3   3   3   5   4   5
Tentative inferences
• Responses to Q1-3 were very similar to one another, as were responses to Q4-6. This suggests that each group of questions is addressing the same common factor.
• This was made easy because:
– related question items were positioned side by side
– there were very few participants. In normal situations, spotting such patterns by eye would be impossible.
• Usually a correlation matrix is required to identify which items are related to each other.
Correlation matrix
        Q1      Q2      Q3      Q4      Q5      Q6
Q1      1
Q2      .933    1
Q3      .824    .696    1
Q4     -.096   -.052    0       1
Q5     -.046    .058    .111    .896    1
Q6     -.167   -.127    0       .965    .808    1
Correlation matrix showing the correlations between the six items, computed from the responses previously documented. The Q4-Q6 correlation (.965) is revisited two slides later.
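As a check, the correlations can be recomputed directly from the response sample. This is a minimal sketch, assuming NumPy is available. `np.corrcoef` returns Pearson correlations, and `rowvar=False` tells it to treat each column (Q1..Q6) as a variable.

```python
import numpy as np

# Response sample from the slide: rows = respondents, columns = Q1..Q6
# (1 = strongly agree, 5 = strongly disagree).
responses = np.array([
    [5, 5, 4, 1, 1, 2],  # Stephen
    [1, 2, 1, 1, 1, 2],  # Ann
    [3, 4, 3, 4, 5, 4],  # Paul
    [4, 4, 3, 1, 2, 1],  # Janette
    [3, 3, 4, 1, 2, 2],  # Michael
    [3, 3, 3, 5, 4, 5],  # Christine
])

# Pearson correlation between every pair of items;
# rowvar=False means each column (item) is one variable.
r = np.corrcoef(responses, rowvar=False)

print(np.round(r, 3))
```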
Interpretation
• Q1-3 correlate strongly with each other and hardly at all with Q4-6, indicating 2 common factors.
• This pattern would not be typical.
– Correlations here are artificially large. In real life they would rarely exceed .5 and would typically be between .2 and .3, which would make it very difficult to establish a pattern by eye.
– With more items there would be a greater number of correlations to observe.
• 6 items produced 15 correlations; 40 items would produce 780 correlations, N(N-1)/2.
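The number of distinct correlations grows quadratically with the number of items. The sketch below is a one-function Python version of N(N-1)/2. The name `n_correlations` is a hypothetical helper.

```python
def n_correlations(n_items: int) -> int:
    """Distinct item pairs, N(N-1)/2: the cells in one triangle of the correlation matrix."""
    return n_items * (n_items - 1) // 2


for n in (6, 40):
    print(n, "items ->", n_correlations(n), "correlations")
```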
Representing FA through geometry
• Items or factors can be represented by straight lines of equal length.
[Figure: items Q4 and Q6 drawn as two lines separated by an angle]
• Lines are positioned such that the correlation between the items = cosine of the angle between them.
– In a right-angled triangle, cosine of angle = adjacent / hypotenuse
• For Q4 and Q6: correlation = .97, cosine = .97, angle ≈ 15°
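Inverting the cosine gives the angle for a given correlation. This is a minimal Python sketch using only the standard library, and `angle_from_correlation` is a hypothetical helper name. The unrounded Q4-Q6 correlation of .965 gives about 15.2°, which matches the slide's 15°. The rounded value .97 gives about 14.1°.

```python
import math


def angle_from_correlation(r: float) -> float:
    """Angle in degrees between two equal-length item lines whose cosine equals r."""
    return math.degrees(math.acos(r))


print(round(angle_from_correlation(0.965), 1))  # Q4 vs Q6, unrounded correlation
print(round(angle_from_correlation(0.97), 1))   # same pair, rounded correlation
```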
Interpreting angles
• Factors/items less than 90° from F1 (above the horizontal in the diagram) are positively correlated with F1
• Factors/items at right angles to F1 have zero correlation with it
• Factors/items at 180° to F1 have a perfect negative correlation with it
[Figure: F1 with factors F2 to F5 drawn at the angles listed below]
• F1-F2 = 15°, r = .97
• F1-F3 = 105°, r = -.26
• F1-F4 = 165°, r = -.97
• F1-F5 = 285°, r = .26
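Going the other way, each correlation in the list is the cosine of its angle. This is a minimal Python sketch using only the standard library, and `correlation_from_angle` is a hypothetical helper name.

```python
import math


def correlation_from_angle(degrees: float) -> float:
    """Correlation implied by the angle between two lines: r = cos(angle)."""
    return math.cos(math.radians(degrees))


# Angles between F1 and F2..F5 as listed on the slide.
for label, angle in [("F2", 15), ("F3", 105), ("F4", 165), ("F5", 285)]:
    print(f"F1-{label}: angle = {angle} deg, r = {correlation_from_angle(angle):.2f}")
```

Because cos(285°) = cos(75°), F5 correlates +.26 with F1 even though the angle is measured the long way round.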
Combining Factors and Items
[Figure: factors F1 and F2 with items I1-I6; I1-I3 lie close to F1 and I4-I6 lie close to F2]
• A roughly orthogonal solution for the items described previously.
Possible relationships between Common Factors
• Orthogonal solution: the two common factors extracted are not themselves correlated, i.e., they are at right angles to each other. This is preferable, since common factors that are not correlated truly represent independent factors.
• Oblique solution: the common factors extracted may themselves be correlated.
Essential FA output & associated statistical concepts
• Factor (structure) matrix: a table showing the correlations between all the items and the factors. By convention, factors are shown as columns.
Item   Factor 1   Factor 2
I1      .9         .1
I2      .98        .0
I3      .9        -.1
I4      .1         .85
I5      .0         .98
I6     -.1         .85
– Factor loading: the correlation between an item and a factor. NB this is different from the correlation matrix.
Factor matrix shows 3 things /1 & 2
• Which items make up which common factor
– Convention dictates that an item only contributes to a factor if the absolute value of the correlation is greater than .3 (i.e., above +.3 or below -.3)
• Reveals the amount of overlap between each item and all the factors
– The square of a correlation indicates the common variance between item and factor. The sum of these squared correlations = the communality of the item
• For I1: .9² + .1² = .81 + .01 = .82
– The communality for an item may be low because the item
• measures something conceptually different from all the other items
• has excessive measurement error
• shows few individual differences in the way it is responded to, e.g., it may be very easy or very difficult
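Both of these uses can be sketched in plain Python with the illustrative loadings from the factor matrix. `salient_factors` applies the ±.3 convention to decide which factor(s) an item belongs to, and `communality` sums the squared loadings. Both function names are hypothetical helpers for illustration.

```python
# Illustrative factor (structure) matrix from the slide: item -> (Factor 1, Factor 2).
loadings = {
    "I1": (0.9, 0.1),
    "I2": (0.98, 0.0),
    "I3": (0.9, -0.1),
    "I4": (0.1, 0.85),
    "I5": (0.0, 0.98),
    "I6": (-0.1, 0.85),
}

THRESHOLD = 0.3  # conventional cut-off: |loading| must exceed .3


def communality(item_loadings):
    """Sum of squared loadings = variance the item shares with all the factors."""
    return sum(l ** 2 for l in item_loadings)


def salient_factors(item_loadings, threshold=THRESHOLD):
    """1-based numbers of the factors an item contributes to (|loading| > threshold)."""
    return [k + 1 for k, l in enumerate(item_loadings) if abs(l) > threshold]


for item, ls in loadings.items():
    print(item, "factors:", salient_factors(ls), "communality:", round(communality(ls), 2))
```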
Factor matrix shows 3 things /3
• Indicates the relative importance of each common factor, i.e., a factor that explains, for example, 40% of the overlap between the items will be more important than one that only explains 25%.
– Calculated through an eigenvalue.
• Square the factor loadings for a single factor and add them up = the eigenvalue.
• Divide the eigenvalue by the number of items = the proportion of variance explained by that factor.
• Calculating for Factor 1 (loadings from the factor matrix above)
– eigenvalue = .9² + .98² + .9² + .1² + 0² + (-.1)² = 2.6
– Variance explained by Factor 1 = 2.6/6 = 43%
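The same two steps can be applied to both columns of the illustrative factor matrix in plain Python with no libraries. The sketch follows the slide's definition of the eigenvalue as the sum of a factor's squared loadings.

```python
# Loadings from the factor matrix on the slide (rows = I1..I6, columns = Factor 1, Factor 2).
loadings = [
    (0.9, 0.1),
    (0.98, 0.0),
    (0.9, -0.1),
    (0.1, 0.85),
    (0.0, 0.98),
    (-0.1, 0.85),
]
n_items = len(loadings)
n_factors = len(loadings[0])

# Eigenvalue of a factor = sum of its squared loadings down the column.
eigenvalues = [sum(row[k] ** 2 for row in loadings) for k in range(n_factors)]

# Proportion of the total item variance explained = eigenvalue / number of items.
proportions = [ev / n_items for ev in eigenvalues]

for k, (ev, p) in enumerate(zip(eigenvalues, proportions), start=1):
    print(f"Factor {k}: eigenvalue = {ev:.2f}, variance explained = {p:.0%}")
print(f"Both factors together: {sum(proportions):.0%}")
```

Factor 2 comes out at about 2.43 (40%), so together the two factors account for about 84% of the item variance. The remaining 16% or so is variance the factors leave unexplained.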
Additional observation
• The factor matrix also indicates the possibility that some of the variance may be unexplained by the factors. Possible explanations:
– Factors are an approximation: some of the original information is sacrificed during this process. The 2 different methods of EFA make different assumptions about the possibility of unexplained variance.
Principal Components Analysis vs. Principal Axis Factoring
• Both are examples of Exploratory FA but are distinguished by their assumptions regarding the possibility of unexplained variance
• PCA: all item variance can be explained by the factors. All items will have a communality of 1, and the factors will, between them, account for 100% of the variation among the items.
– Total variance = common factor variance + measurement error
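This minimal PCA sketch assumes NumPy and uses the six-item response sample. It eigen-decomposes the correlation matrix and keeps all six components. Loadings are the eigenvectors scaled by the square root of their eigenvalues. With every component retained, each item's communality comes out as 1, and the eigenvalues sum to 6, i.e. 100% of the variance. With only six respondents the correlation matrix has rank 5, so the smallest eigenvalue is essentially zero. `np.clip` guards against tiny negative rounding errors in that value.

```python
import numpy as np

# Response sample from the earlier slide (rows = respondents, columns = Q1..Q6).
responses = np.array([
    [5, 5, 4, 1, 1, 2],
    [1, 2, 1, 1, 1, 2],
    [3, 4, 3, 4, 5, 4],
    [4, 4, 3, 1, 2, 1],
    [3, 3, 4, 1, 2, 2],
    [3, 3, 3, 5, 4, 5],
])
R = np.corrcoef(responses, rowvar=False)

# PCA: eigen-decomposition of the symmetric correlation matrix.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]            # largest component first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Component loadings = eigenvector * sqrt(eigenvalue).
loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))

# With all components kept, row sums of squared loadings (communalities) are 1.
communalities = (loadings ** 2).sum(axis=1)

print("eigenvalues:", np.round(eigvals, 3))
print("proportion of variance:", np.round(eigvals / len(eigvals), 3))
print("communalities:", np.round(communalities, 3))
```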
• PAF: items may have ‘unique variance’, i.e., variance which cannot be explained by the factors
– Suppose there are two test items: What is the capital of Italy? What is the capital of Spain?
• Let's assume that they are of the same level of difficulty and therefore test the underlying factor (geographical knowledge) to the same degree. Will these items always be answered in exactly the same way?
– Someone may have poor geographical knowledge but just happen to know the capital of Spain. It is therefore not possible to consider the two items as being completely equivalent.
• A correct response depends on knowledge relating to
– the common factor (geographical knowledge)
– something unique to the individual item: ‘specific variance’, which cannot be predicted from the common factors
– Total variance = common factor variance + specific item variance + measurement error
• PAF is more complicated because it must determine how much of the variance relating to an item is ‘common-factor’ variance and how much is ‘specific variance’.
– PCA does not allow for the possibility of item-specific variance
PCA or PAF?
• The two seem to produce very similar results, so much so that some researchers do not state which one they are carrying out.
• Since PAF allows for specific variance, an item's communality is necessarily going to be less than one.
– Factor loadings for items are therefore going to appear less impressive with PAF than with PCA.
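This is a minimal sketch of iterated principal axis factoring, one common textbook variant; statistics packages differ in details such as starting values and stopping rules. It assumes NumPy. So that the true communalities are known, it uses a hypothetical correlation matrix built from the illustrative loadings in the factor matrix, with 1s on the diagonal. PAF replaces those 1s with communality estimates and re-estimates them until they settle, which is the extra work of separating common from specific variance. `principal_axis` and `top_k_loadings` are hypothetical helper names, not a library API.

```python
import numpy as np

# Hypothetical correlation matrix implied by the illustrative loadings
# (rows I1..I6, columns Factor 1, Factor 2).
lam = np.array([
    [0.9, 0.1], [0.98, 0.0], [0.9, -0.1],
    [0.1, 0.85], [0.0, 0.98], [-0.1, 0.85],
])
R = lam @ lam.T
np.fill_diagonal(R, 1.0)  # total item variance 1; 1 - communality is unique variance


def top_k_loadings(matrix, k):
    """Loadings on the k largest eigenvalues: eigenvector * sqrt(eigenvalue)."""
    vals, vecs = np.linalg.eigh(matrix)
    idx = np.argsort(vals)[::-1][:k]
    return vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0.0, None))


def principal_axis(R, k, max_iter=1000, tol=1e-10):
    """Iterated PAF: put communality estimates on the diagonal until they settle."""
    # Starting communalities: squared multiple correlation of each item with the rest.
    h = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h)   # keep only common variance on the diagonal
        L = top_k_loadings(reduced, k)
        h_new = (L ** 2).sum(axis=1)   # communality = row sum of squared loadings
        converged = np.max(np.abs(h_new - h)) < tol
        h = h_new
        if converged:
            break
    return L, h


pca_L = top_k_loadings(R, 2)           # PCA: 1s stay on the diagonal
paf_L, paf_h = principal_axis(R, 2)

print("PCA communalities:", np.round((pca_L ** 2).sum(axis=1), 3))
print("PAF communalities:", np.round(paf_h, 3))
```

The PAF communalities settle near the squared-loading sums of the factor matrix (approximately .82, .96, .82, .73, .96 and .73), all below 1. The two-component PCA solution attributes more variance to the factors in total, which is why its loadings look more impressive.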
Used for 4 basic purposes /1
• Shows how many distinct common factors are measured by a set of test items
– Are the supposedly different constructs neuroticism, anxiety, hysteria, ego strength, self-actualisation and locus of control 6 independent entities, or would they be better described as only 2 factors?
– neuroticism, anxiety, hysteria: ‘Elements of Pathology’
– ego strength, self-actualisation, locus of control: ‘Healthy mechanisms’
Used for 4 basic purposes /2&3
• Shows which items relate to which common factors
– In the previous example, neuroticism belonged to the factor ‘Elements of Pathology’
• Determines whether tests that purportedly measure the same thing in fact do so
– e.g., if 3 tests claim to measure anxiety, FA may produce more than one factor, indicating that something in addition to anxiety is being measured
Used for 4 basic purposes /4
• Checks the psychometric properties of a questionnaire: do the same factors materialise with a different sample?
– Would a different population, made up of Native American Indians, yield the constructs of extraversion-introversion and neuroticism which have been found in European cultures?