IMPS2001, July 15-19, 2001
Osaka, Japan
All about variable selection in
factor analysis and structural
equation modeling
Yutaka Kano
Osaka University
School of Human Sciences
1
2
Today’s talk
Motivation for variable selection
How SEFA (and SCoFA) works
Derivation of the statistics
Theoretical property
What does variable selection with model
fit mean?
Summary
3
Needs for variable selection
Variable selection in EFA is an important
but time-consuming process
Composite scale construction
Reliability analysis
Variable selection in SEM should be less
important but …
Indicator selection
Improvement of model fit
4
Recent literature
Little et al. (1999). On selecting indicators
for multivariate measurement and modeling
with latent variables. Psychological Methods,
4, 192-211.
Fabrigar et al. (1999). Evaluating the use of
EFA in psychological research. Psychological
Methods, 4, 272-299.
Kano et al. (in press, 2000, 1994).
Procedures for variable
selection in EFA
Usual procedure
Magnitude of communalities
Interpretability
Towards simple structures
Our approach
Model fit
5
6
Programs for variable selection
in factor analysis
Exploratory analysis
SEFA(Stepwise variable selection in EFA)
http://koko15.hus.osakau.ac.jp/~harada/sefa2001/stepwise/
Confirmatory analysis
SCoFA(Stepwise Confirmatory FA)
http://koko16.hus.osakau.ac.jp/~harada/scofa/input.html
7
Example_1
A questionnaire on perceptions of
physical exercise
n=653, p=15, one-factor model
Data was collected by Dr Oka
(Waseda U.)
Conclusion
Remove X2, X9, X13, X14
8
Example_2
9
Example_3
10
Example_4
11
Example_5
12
Example_6
13
SCoFA: 24 psychological variables
Original model (p=24)
$\chi^2_{231}(0.05) = 267.45$
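The figures on this slide can be read as the upper 5% point of a chi-square distribution with 231 degrees of freedom; that reading is an interpretation of the slide, not something it states. A minimal check using the Wilson-Hilferty approximation (a swapped-in approximation, not the method behind the SCoFA output; the function name is hypothetical):

```python
from statistics import NormalDist


def chi2_upper_point(df: int, alpha: float) -> float:
    """Approximate upper-alpha point of a chi-square distribution.

    Uses the Wilson-Hilferty cube-root normal approximation, which is
    accurate for large degrees of freedom.
    """
    z = NormalDist().inv_cdf(1.0 - alpha)  # standard normal quantile
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * c ** 0.5) ** 3


# Assumed reading of the slide: df = 231, significance level 0.05.
critical = chi2_upper_point(231, 0.05)
print(round(critical, 2))
```

A model chi-square above this value would be rejected at the 5% level.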
14
15
Theory of SEFA and SCoFA
Obtain estimates for a current model
Construct predicted chi-square for each
one-variable-deleted model using the
estimates, without tedious iterations
Take a sort of LM approach
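The steps above can be sketched as a backward selection loop. This is only an illustration under assumptions: the stopping rule, the function names, and the toy chi-square values are all hypothetical and are not taken from SEFA or SCoFA. In the actual programs the current model is fitted iteratively and only the one-variable-deleted chi-squares are predicted without refitting.

```python
from typing import Callable, List, Sequence, Tuple


def backward_select(
    variables: Sequence[str],
    chisq: Callable[[List[str]], float],
    critical: Callable[[List[str]], float],
    min_vars: int = 3,
) -> Tuple[List[str], List[str]]:
    """Drop one variable at a time until the model fit is acceptable.

    chisq(kept) returns the (predicted) chi-square for the model on `kept`;
    critical(kept) returns the rejection threshold for that model.
    """
    kept = list(variables)
    removed: List[str] = []
    while len(kept) > min_vars and chisq(kept) > critical(kept):
        # Predicted chi-square of every one-variable-deleted model.
        candidates = [(chisq([v for v in kept if v != d]), d) for d in kept]
        _, drop = min(candidates)  # deletion that gives the best fit
        kept.remove(drop)
        removed.append(drop)
    return kept, removed


# Toy stand-in: each variable contributes a fixed amount of misfit.
misfit = {"X1": 1.0, "X2": 25.0, "X3": 2.0, "X4": 18.0, "X5": 1.5}
kept, removed = backward_select(
    list(misfit),
    chisq=lambda ks: sum(misfit[k] for k in ks),
    critical=lambda ks: 10.0,  # hypothetical constant threshold
)
print(kept, removed)
```

The loop only ranks deletions; whether a flagged variable should really be removed is a substantive decision for the analyst.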
16
Known quantities and goal_1
Current model and statistics
MODEL: $V(X) = \Sigma(\theta)$
MLE: $\hat{\theta}$
$\chi^2$ STATISTIC:
$T_0$: $H_0: V(X) = \Sigma(\theta)$
vs
$A$: $V(X)$ is saturated
17
Known quantities and goal_2
What we want is
$T_2$: $H_2: V(\mathbf{X}_2) = \Sigma_{22}(\theta)$
vs
$A$: $V(\mathbf{X}_2)$ is saturated
where
$X = [X_1, X_2, \ldots, X_p]'$: observed vector (in a current model)
$\mathbf{X}_2 = [X_2, \ldots, X_p]'$
$X_1$: possibly inconsistent variable to be examined
18
Basic idea
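New test statistics to be used:
$T_0$: $H_0: V(X) = \Sigma(\theta)$ vs $A$: $V(X)$ is saturated
$T_2$: $H_2: V(\mathbf{X}_2) = \Sigma_{22}(\theta)$ vs $A$: $V(\mathbf{X}_2)$ is saturated
$T_{2'}$: $H_{2'}: V(X) = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \Sigma_{22}(\theta) \end{bmatrix}$ vs $A$: $V(X)$ is saturated
$T_{02'}$: $H_0: V(X) = \Sigma(\theta)$ vs $H_{2'}: V(X) = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \Sigma_{22}(\theta) \end{bmatrix}$
$$T_2 = T_{2'} = T_0 - (T_0 - T_{2'}) \approx T_0 - T_{02'}$$
We construct $T_{02'}$ as an LM test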
19
Final formula for T2
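$$T_2 = T_0 - T_{02'}$$
$$= n\, v\big(S_{22} - \Sigma_{22}(\hat{\theta})\big)' \Big[ \Sigma_{2N}(\hat{\theta})^{-1} - \Sigma_{2N}(\hat{\theta})^{-1} \Delta_2(\hat{\theta}) \big\{ \Delta_2(\hat{\theta})' \Sigma_{2N}(\hat{\theta})^{-1} \Delta_2(\hat{\theta}) \big\}^{-1} \Delta_2(\hat{\theta})' \Sigma_{2N}(\hat{\theta})^{-1} \Big]\, v\big(S_{22} - \Sigma_{22}(\hat{\theta})\big)$$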
Note: This is Browne's (1982)
goodness-of-fit statistic using
general estimates
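The statistic can be computed directly from estimates of the current model, with no refitting. The sketch below does this for a one-factor model. It rests on assumptions that the slide does not spell out: the weight matrix is taken to be the normal-theory covariance of the non-duplicated elements of S22, the Jacobian is taken with respect to only those parameters that appear in Sigma22, and population values stand in for the MLE. All function names are hypothetical.

```python
import numpy as np


def vech(a):
    """Lower-triangular elements of a symmetric matrix, diagonal included."""
    rows, cols = np.tril_indices(a.shape[0])
    return a[rows, cols]


def normal_acov(sigma):
    """Normal-theory covariance of sqrt(n) * vech(S):
    cov(s_ij, s_kl) = sigma_ik * sigma_jl + sigma_il * sigma_jk."""
    rows, cols = np.tril_indices(sigma.shape[0])
    i, j = rows[:, None], cols[:, None]
    k, l = rows[None, :], cols[None, :]
    return sigma[i, k] * sigma[j, l] + sigma[i, l] * sigma[j, k]


def one_factor_jacobian(lam):
    """d vech(lam lam' + diag(psi)) / d (lam, psi)' for a one-factor model."""
    p = lam.size
    rows, cols = np.tril_indices(p)
    jac = np.zeros((rows.size, 2 * p))
    for r, (i, j) in enumerate(zip(rows, cols)):
        jac[r, i] += lam[j]       # derivative with respect to lam_i
        jac[r, j] += lam[i]       # derivative with respect to lam_j
        if i == j:
            jac[r, p + i] = 1.0   # derivative with respect to psi_i
    return jac


def browne_statistic(s, sigma_hat, delta, n):
    """n e' [V^-1 - V^-1 D (D' V^-1 D)^-1 D' V^-1] e with e = vech(S - Sigma)."""
    e = vech(s - sigma_hat)
    v_inv = np.linalg.inv(normal_acov(sigma_hat))
    a = v_inv @ delta
    w = v_inv - a @ np.linalg.solve(delta.T @ a, a.T)
    return float(n * (e @ w @ e))


def predicted_chisq_without(s, lam, psi, drop, n):
    """Predicted chi-square of the model with variable `drop` deleted."""
    keep = np.delete(np.arange(lam.size), drop)
    lam2, psi2 = lam[keep], psi[keep]
    sigma22 = np.outer(lam2, lam2) + np.diag(psi2)
    return browne_statistic(s[np.ix_(keep, keep)], sigma22,
                            one_factor_jacobian(lam2), n)


# Toy data: a one-factor structure plus an extra covariance between X1 and X2.
lam = np.array([0.8, 0.7, 0.6, 0.5, 0.4])
psi = 1.0 - lam ** 2
s = np.outer(lam, lam) + np.diag(psi)
s[0, 1] += 0.2
s[1, 0] += 0.2

t_drop_x1 = predicted_chisq_without(s, lam, psi, drop=0, n=653)
t_drop_x5 = predicted_chisq_without(s, lam, psi, drop=4, n=653)
print(t_drop_x1, t_drop_x5)
```

In this toy setting, deleting X1 removes the extra covariance, so the predicted chi-square vanishes, while deleting X5 leaves the misfit in place.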
20
Properties_1
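$$V\begin{pmatrix} X_1 \\ \mathbf{X}_2 \end{pmatrix} = \Sigma(\theta) + \frac{1}{\sqrt{n}} \begin{bmatrix} 0 & 0 \\ 0 & D_{22} \end{bmatrix}$$
$T_2 \xrightarrow{L} \chi^2_{df_2}(\delta = 0)$ if $D_{22} = O$
$T_2 \xrightarrow{L} \chi^2_{df_2}(\delta > 0)$ if $D_{22} \neq O$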
21
Question 1
Can T2 work even if X1 is inconsistent?
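The estimate of $\theta$ is biased.
$$V\begin{pmatrix} X_1 \\ \mathbf{X}_2 \end{pmatrix} = \Sigma(\theta) + \frac{1}{\sqrt{n}} \begin{bmatrix} 0 & 0 \\ 0 & D_{22} \end{bmatrix}$$
$T_2 \xrightarrow{L} \chi^2_{df_2}(\delta = 0)$ if $D_{22} = O$
$T_2 \xrightarrow{L} \chi^2_{df_2}(\delta > 0)$ if $D_{22} \neq O$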
22
Properties_2
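$$V\begin{pmatrix} X_1 \\ \mathbf{X}_2 \end{pmatrix} = \Sigma(\theta) + \frac{1}{\sqrt{n}} \begin{bmatrix} d_{11} & d_{12} \\ d_{21} & D_{22} \end{bmatrix}$$
Whether or not $d_{12} = 0$, we can prove
$T_2 \xrightarrow{L} \chi^2_{df_2}(\delta = 0)$ if $D_{22} = O$
$T_2 \xrightarrow{L} \chi^2_{df_2}(\delta > 0)$ if $D_{22} \neq O$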
23
Question 2
Can SEFA identify an uncorrelated
variable?
Unfortunately, no
We have developed a way of testing
zero communality in SEFA
(see Harada-Kano, IMPS)
24
Question 3
What is the actual meaning of variable
selection with model fit?
The following shows an illustrative
example:
25
Answer 3_1: Example again
X2, X9, X13, X14 are to be removed
26
Answer 3_2: Example again
Best fitted model with correlated errors
SEFA conclusion: X2, X9, X13, X14 are to be
removed
27
Answer 3_3: Example again
Variables to be deleted are identified so as to
break up the correlated errors
Correlated errors may cause
Different interpretation of FA results
Common factors considered are not enough to explain
correlations between observed variables
Such variables are not good indicators (e.g., in SEM)
Inaccurate reliability estimates
Green-Hershberger (2000), Raykov (2001)
Kano-Azuma (2001, IMPS)
28
Question 4
What should one do if SEFA or SCoFA
identifies a variable with a large factor
loading estimate as inconsistent?
29
Answer 4_1: Reliability
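If one employs the alpha coefficient or
$$\rho = \frac{\left(\sum_i \lambda_i\right)^2}{\left(\sum_i \lambda_i\right)^2 + \sum_i \psi_i}$$
(s)he has to delete the variable to obtain a
well-fitting model.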
30
Answer 4_2: Reliability
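If one employs
$$\rho' = \frac{\left(\sum_i \lambda_i\right)^2}{\left(\sum_i \lambda_i\right)^2 + \sum_{i,j} \psi_{ij}}$$
(s)he can retain the variable and compare
reliability between models.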
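A small numerical sketch of the three coefficients may help. It assumes the usual forms: rho divides the squared sum of loadings by itself plus the sum of unique variances, rho' replaces that sum with the sum of all elements of the error covariance matrix, and alpha is the standard Cronbach formula. The loadings and the error covariance below are made-up values, not the data from the talk.

```python
import numpy as np


def rho(lam, psi_diag):
    """Model-based reliability that ignores error covariances."""
    num = lam.sum() ** 2
    return num / (num + psi_diag.sum())


def rho_prime(lam, psi_full):
    """Model-based reliability using the full error covariance matrix."""
    num = lam.sum() ** 2
    return num / (num + psi_full.sum())


def cronbach_alpha(cov):
    """Cronbach's alpha computed from a covariance matrix."""
    p = cov.shape[0]
    return p / (p - 1) * (1.0 - np.trace(cov) / cov.sum())


# Hypothetical one-factor model with one pair of correlated errors.
lam = np.array([0.7, 0.7, 0.7, 0.7])
psi_full = np.diag(np.full(4, 0.51))
psi_full[0, 1] = psi_full[1, 0] = 0.2   # error covariance between X1 and X2
cov = np.outer(lam, lam) + psi_full     # implied covariance matrix

r = rho(lam, np.diag(psi_full))
r_prime = rho_prime(lam, psi_full)
a = cronbach_alpha(cov)
print(round(r, 3), round(r_prime, 3), round(a, 3))
```

With a positive error covariance, both rho and alpha come out larger than rho' in this toy case.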
31
Answer 4_3: Example
ρ' = 0.64
α = 0.74
ρ based on the poorly fitting one-factor model = 0.76
32
Answer 4_4: Example
ρ'    α
0.64  0.74
0.63  0.63
33
Answer 4_5: Example
ρ'    α
0.60  0.78
0.63  0.63
34
Summary_1
A new option for variable selection was
introduced, which is based on model fit.
You can easily access the programs on the
internet
SEFA(Stepwise variable selection in EFA)
http://koko15.hus.osakau.ac.jp/~harada/sefa2001/stepwise/
SCoFA(Stepwise Confirmatory FA)
http://koko16.hus.osakau.ac.jp/~harada/scofa/input.html
35
Summary_2
It has desirable theoretical
properties
Testing null communality is important
Uncorrelated variables cannot be identified
Variable selection with model fit can
detect error correlations
Traditional reliability coefficients based
on a poor-fit model have serious bias
36
Summary_3
High communality variables can be
inconsistent
Whether such variables should be
removed depends on the reliability
coefficient employed
Reliability has to be computed using
a nonstandard factor model
37
References
Harada, A. and Kano, Y. (2001).
Variable selection and test of communality in EFA.
IMPS2001, Osaka.
Kano, Y. (in press).
Variable selection for structural models. Journal of
Statistical Planning and Inference.
Kano, Y. and Harada, A. (2000).
Stepwise variable selection in factor analysis.
Psychometrika, 65, 7-22.
Kano, Y. and Ihara, M. (1994).
Identification of inconsistent variates in factor
analysis. Psychometrika, 59, 5-20.
Thank you for coming to
Osaka and being at my talk
38
TakoYaki
performance will
start soon
You can understand
how octopus relates
to Osaka, if you see
and taste it