
IMPS2001, July 15-19, 2001
Osaka, Japan
All about variable selection in factor analysis and structural equation modeling
Yutaka Kano
Osaka University
School of Human Sciences

Today's talk
Motivation for variable selection
How SEFA (and SCoFA) works
Derivation of the statistics
Theoretical properties
What does variable selection with model fit mean?
Summary

Needs for variable selection
Variable selection in EFA is an important but time-consuming process
  Composite scale construction
  Reliability analysis
Variable selection in SEM should be less important, but …
  Indicator selection
  Improvement of model fit

Recent literature
Little et al. (1999). On selecting indicators for multivariate measurement and modeling with latent variables. Psychological Methods, 4, 192-211.
Fabrigar et al. (1999). Evaluating the use of EFA in psychological research. Psychological Methods, 4, 272-299.
Kano et al. (in press, 2000, 1994).

Procedures for variable selection in EFA
Usual procedure
  Magnitude of communalities
  Interpretability
  Towards simple structures
Our approach
  Model fit
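As a rough illustration of the "magnitude of communalities" screen in the usual procedure, the sketch below computes communalities from a loading matrix and flags small ones. It assumes orthogonal factors, so that $h_i^2 = \sum_k \lambda_{ik}^2$; the function name, the loadings and the 0.2 cut-off are hypothetical, and this is not part of SEFA.

```python
import numpy as np


def communalities(loadings):
    """h_i^2 = sum_k lambda_ik^2 for a p x m loading matrix with orthogonal factors."""
    L = np.asarray(loadings, dtype=float)
    if L.ndim == 1:          # one-factor model: a plain vector of loadings
        L = L[:, None]
    return (L ** 2).sum(axis=1)


# Hypothetical loadings: p = 5 variables, m = 2 orthogonal factors.
loadings = np.array([[0.80, 0.10],
                     [0.70, 0.20],
                     [0.10, 0.75],
                     [0.20, 0.60],
                     [0.25, 0.15]])
h2 = communalities(loadings)
threshold = 0.2              # hypothetical rule-of-thumb cut-off
low = np.flatnonzero(h2 < threshold)
print(np.round(h2, 3), low)  # only the fifth variable falls below the cut-off
```

This screen looks only at the size of each communality; it says nothing about whether the model fits the data.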
Programs for variable selection in factor analysis
Exploratory analysis
  SEFA (Stepwise variable selection in EFA)
  http://koko15.hus.osakau.ac.jp/~harada/sefa2001/stepwise/
Confirmatory analysis
  SCoFA (Stepwise Confirmatory FA)
  http://koko16.hus.osakau.ac.jp/~harada/scofa/input.html

Example_1
A questionnaire on perceptions of physical exercise
n = 653, p = 15, one-factor model
Data were collected by Dr. Oka (Waseda U.)
Conclusion
  Remove X2, X9, X13, X14

Example_2
Example_3

Example_4

Example_5

Example_6

SCoFA: 24 psychological variables
Original model (p = 24)
$\chi^2_{231}(0.05) = 267.45$ (upper 5% point of the chi-square distribution with 231 degrees of freedom)

Theory of SEFA and SCoFA
Obtain estimates for a current model
Construct a predicted chi-square for each one-variable-deleted model from those estimates, without tedious iterations
Take a Lagrange multiplier (LM) type approach

Known quantities and goal_1
Current model and statistics
MODEL: $V(X) = \Sigma(\theta)$
MLE: $\hat\theta$
$\chi^2$ statistic: $T_0$ for $H_0: V(X) = \Sigma(\theta)$ vs $A$: $V(X)$ is saturated
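For reference, $T_0$ can be sketched as $n$ times the normal-theory ML discrepancy between $S$ and $\Sigma(\hat\theta)$, together with the degrees of freedom of a one-factor model. This assumes $\Sigma(\hat\theta)$ has already been produced by a fitting program; whether $n$ is the sample size or the sample size minus one depends on that program's convention. The function names are hypothetical.

```python
import numpy as np


def ml_discrepancy(S, Sigma):
    """Normal-theory ML discrepancy F = log|Sigma| - log|S| + tr(S Sigma^{-1}) - p."""
    p = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    return logdet_sigma - logdet_s + np.trace(np.linalg.solve(Sigma, S)) - p


def t0_statistic(S, Sigma_hat, n):
    """T0 = n * F(S, Sigma(theta_hat)), referred to a chi-square with the model df."""
    return n * ml_discrepancy(S, Sigma_hat)


def one_factor_df(p):
    """df of a one-factor model: p(p+1)/2 variances/covariances minus 2p parameters."""
    return p * (p + 1) // 2 - 2 * p


# Hypothetical fitted one-factor matrix; T0 vanishes when S reproduces it exactly.
lam = np.array([0.8, 0.7, 0.6, 0.5])
Sigma_hat = np.outer(lam, lam) + np.diag(1.0 - lam ** 2)
print(t0_statistic(Sigma_hat, Sigma_hat, n=100))  # 0 up to rounding error
print(one_factor_df(15))                          # 90, e.g. p = 15 as in Example_1
```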
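Known quantities and goal_2
What we want is
$T_2$: $H_2: V(\boldsymbol{X}_2) = \Sigma_{22}(\theta)$ vs $A$: $V(\boldsymbol{X}_2)$ is saturated
where
$X = [X_1, X_2, \ldots, X_p]'$: observed vector (in a current model), partitioned as $X = [X_1, \boldsymbol{X}_2']'$ with $\boldsymbol{X}_2 = [X_2, \ldots, X_p]'$
$X_1$: possibly inconsistent variable to be examined

Basic idea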
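New test statistics to be used:
$T_0$: $H_0: V(X) = \Sigma(\theta)$ vs $A$: $V(X)$ is saturated
$T_2$: $H_2: V(\boldsymbol{X}_2) = \Sigma_{22}(\theta)$ vs $A$: $V(\boldsymbol{X}_2)$ is saturated
$T_{2'}$: $H_{2'}: V(X) = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \Sigma_{22}(\theta) \end{pmatrix}$ ($\sigma_{11}$, $\sigma_{12}$ unrestricted) vs $A$: $V(X)$ is saturated
$T_{02'}$: $H_0: V(X) = \Sigma(\theta)$ vs $H_{2'}: V(X) = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \Sigma_{22}(\theta) \end{pmatrix}$
$T_2 = T_{2'} = T_0 - (T_0 - T_{2'})$; we construct $T_{02'}$ as an LM test in place of the bracketed term $T_0 - T_{2'}$, so that $T_2 = T_0 - T_{02'}$.

Final formula for $T_2$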
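$$T_2 = T_0 - T_{02'} = n\, v\{S_{22} - \Sigma_{22}(\hat\theta)\}' \Big[\Gamma_{2N}(\hat\theta)^{-1} - \Gamma_{2N}(\hat\theta)^{-1}\Delta_2(\hat\theta)\{\Delta_2(\hat\theta)'\Gamma_{2N}(\hat\theta)^{-1}\Delta_2(\hat\theta)\}^{-1}\Delta_2(\hat\theta)'\Gamma_{2N}(\hat\theta)^{-1}\Big] v\{S_{22} - \Sigma_{22}(\hat\theta)\}$$
where $\Delta_2(\theta) = \partial v(\Sigma_{22}(\theta))/\partial\theta'$ and $\Gamma_{2N}(\theta)$ is the normal-theory asymptotic covariance matrix of $\sqrt{n}\, v(S_{22})$.
Note: This is Browne's (1982) goodness-of-fit statistic evaluated at general estimates, here the estimate $\hat\theta$ from the current model rather than one re-fitted without $X_1$.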
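The formula can be sketched numerically for the simplest case, a one-factor model $\Sigma(\theta) = \lambda\lambda' + \mathrm{diag}(\psi)$. The sketch assumes that $v(\cdot)$ is the half-vectorization (vech) and writes $\Gamma_{2N}^{-1}$ in its normal-theory closed form $\tfrac12 D'(\Sigma_{22}^{-1}\otimes\Sigma_{22}^{-1})D$, with $D$ the duplication matrix. It then evaluates the statistic for every one-variable-deleted model by reusing a single set of estimates, in the spirit of SEFA. It is an illustration, not the SEFA program: all function names are hypothetical, and the toy data use the true parameter values in place of $\hat\theta$.

```python
import numpy as np


def duplication_matrix(p):
    """D with vec(A) = D @ vech(A) for a symmetric p x p matrix A (column-major vec)."""
    D = np.zeros((p * p, p * (p + 1) // 2))
    k = 0
    for j in range(p):
        for i in range(j, p):
            D[i + j * p, k] = 1.0  # position of A[i, j] in vec(A)
            D[j + i * p, k] = 1.0  # position of A[j, i] (same entry when i == j)
            k += 1
    return D


def vech(A):
    """Lower triangle of A stacked column by column (same order as duplication_matrix)."""
    p = A.shape[0]
    return np.array([A[i, j] for j in range(p) for i in range(j, p)])


def one_factor_sigma(lam, psi):
    """Sigma(theta) = lam lam' + diag(psi)."""
    return np.outer(lam, lam) + np.diag(psi)


def one_factor_jacobian(lam):
    """Delta = d vech(Sigma) / d(lam', psi')' for the one-factor model."""
    q = len(lam)
    rows = []
    for j in range(q):
        for i in range(j, q):
            d = np.zeros(2 * q)
            d[i] += lam[j]      # d sigma_ij / d lam_i
            d[j] += lam[i]      # d sigma_ij / d lam_j (gives 2 * lam_i when i == j)
            if i == j:
                d[q + i] = 1.0  # d sigma_ii / d psi_i
            rows.append(d)
    return np.array(rows)


def browne_statistic(S, Sigma, Delta, n):
    """n e'[W - W Delta (Delta' W Delta)^{-1} Delta' W] e with e = vech(S) - vech(Sigma).

    W = 0.5 * D'(Sigma^{-1} kron Sigma^{-1}) D is the inverse of the normal-theory
    asymptotic covariance matrix of sqrt(n) vech(S).
    """
    D = duplication_matrix(S.shape[0])
    Sinv = np.linalg.inv(Sigma)
    W = 0.5 * D.T @ np.kron(Sinv, Sinv) @ D
    e = vech(S) - vech(Sigma)
    WDelta = W @ Delta
    M = W - WDelta @ np.linalg.solve(Delta.T @ WDelta, WDelta.T)
    return float(n * e @ M @ e)


def deletion_statistics(S, lam_hat, psi_hat, n):
    """Statistic for every one-variable-deleted model, reusing the current estimates."""
    p = S.shape[0]
    out = []
    for drop in range(p):
        keep = [v for v in range(p) if v != drop]
        lam2, psi2 = lam_hat[keep], psi_hat[keep]  # no re-fitting
        out.append(browne_statistic(S[np.ix_(keep, keep)],
                                    one_factor_sigma(lam2, psi2),
                                    one_factor_jacobian(lam2), n))
    return np.array(out)


# Toy data: X1 is made inconsistent by distorting only its covariances.
lam0 = np.array([0.8, 0.7, 0.6, 0.5, 0.7, 0.6])
psi0 = 1.0 - lam0 ** 2
S = one_factor_sigma(lam0, psi0)
for j, shift in [(1, 0.10), (2, -0.10), (3, 0.10)]:
    S[0, j] += shift
    S[j, 0] += shift
stats = deletion_statistics(S, lam0, psi0, n=500)
print(np.round(stats, 2))  # zero only when X1 is the deleted variable
```

Because the bracketed matrix annihilates the columns of $\Delta_2$, first-order estimation error in $\hat\theta$ drops out, which is why a single set of estimates can be reused for every candidate deletion.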
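Properties_1
$$V\begin{pmatrix} X_1 \\ \boldsymbol{X}_2 \end{pmatrix} = \Sigma(\theta) + \frac{1}{\sqrt{n}}\begin{pmatrix} 0 & 0 \\ 0 & D_{22} \end{pmatrix}$$
$T_2 \xrightarrow{L} \chi^2(\delta = 0)$ if $D_{22} = O$
$T_2 \xrightarrow{L} \chi^2(\delta > 0)$ if $D_{22} \neq O$
($\delta$: noncentrality parameter)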
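A small Monte Carlo sketch of the first case ($D_{22} = O$): data are drawn from a one-factor model for five variables, which play the role of $\boldsymbol{X}_2$, and Browne's residual statistic is computed with $\theta$ held at its true value (standing in for a consistent estimate). The simulated mean and variance should then be close to those of a central chi-square with $15 - 10 = 5$ degrees of freedom. As assumptions, $v(\cdot)$ is taken to be vech, $\Gamma_{2N}^{-1}$ is written in closed form, and the loadings, sample size and seed are arbitrary.

```python
import numpy as np


def duplication_matrix(p):
    """D with vec(A) = D @ vech(A) for a symmetric p x p matrix A (column-major vec)."""
    D = np.zeros((p * p, p * (p + 1) // 2))
    k = 0
    for j in range(p):
        for i in range(j, p):
            D[i + j * p, k] = D[j + i * p, k] = 1.0
            k += 1
    return D


def vech(A):
    """Lower triangle of A stacked column by column."""
    p = A.shape[0]
    return np.array([A[i, j] for j in range(p) for i in range(j, p)])


def one_factor_jacobian(lam):
    """Delta = d vech(lam lam' + diag(psi)) / d(lam', psi')'."""
    q = len(lam)
    rows = []
    for j in range(q):
        for i in range(j, q):
            d = np.zeros(2 * q)
            d[i] += lam[j]
            d[j] += lam[i]
            if i == j:
                d[q + i] = 1.0
            rows.append(d)
    return np.array(rows)


# One-factor model for X2 (hypothetical values); no misspecification, i.e. D22 = O.
lam = np.array([0.8, 0.7, 0.6, 0.5, 0.7])
Sigma = np.outer(lam, lam) + np.diag(1.0 - lam ** 2)
q = len(lam)
df = q * (q + 1) // 2 - 2 * q  # 15 - 10 = 5

# With theta fixed, the weight matrix M of Browne's statistic is computed once.
D = duplication_matrix(q)
Sinv = np.linalg.inv(Sigma)
W = 0.5 * D.T @ np.kron(Sinv, Sinv) @ D
Delta = one_factor_jacobian(lam)
WDelta = W @ Delta
M = W - WDelta @ np.linalg.solve(Delta.T @ WDelta, WDelta.T)

rng = np.random.default_rng(2001)
N, reps = 300, 2000
L = np.linalg.cholesky(Sigma)
stats = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal((N, q)) @ L.T  # rows ~ N(0, Sigma)
    e = vech(np.cov(X, rowvar=False)) - vech(Sigma)
    stats[r] = (N - 1) * e @ M @ e         # n = N - 1 with the unbiased S
print(df, stats.mean().round(2), stats.var().round(2))  # roughly 5 and 10
```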
Question 1
Can $T_2$ work even if $X_1$ is inconsistent?
In that case the estimate of $\theta$ is biased.
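Properties_2
$$V\begin{pmatrix} X_1 \\ \boldsymbol{X}_2 \end{pmatrix} = \Sigma(\theta) + \frac{1}{\sqrt{n}}\begin{pmatrix} d_{11} & d_{12} \\ d_{21} & D_{22} \end{pmatrix}$$
Whether $d_{12} = 0$ or not, we can prove
$T_2 \xrightarrow{L} \chi^2(\delta = 0)$ if $D_{22} = O$
$T_2 \xrightarrow{L} \chi^2(\delta > 0)$ if $D_{22} \neq O$

Question 2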
Can SEFA identify an uncorrelated variable?
Unfortunately, no.
We have developed a way of testing zero communality in SEFA (see Harada and Kano, 2001, IMPS2001).

Question 3
What is the actual meaning of variable selection with model fit?
The following shows an illustrative example:

Answer 3_1: Example again
X2, X9, X13, X14 are to be removed.

Answer 3_2: Example again
Best-fitting model with correlated errors
SEFA conclusion: X2, X9, X13, X14 are to be removed.

Answer 3_3: Example again
Variables to be deleted are identified so as to break up the correlated errors.
Correlated errors may cause
  Different interpretations of FA results
    The common factors considered are not enough to explain the correlations between observed variables
    Such variables are not good indicators (e.g., in SEM)
  Inaccurate reliability estimates
    Green and Hershberger (2000), Raykov (2001)
    Kano and Azuma (2001, IMPS)
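Why deleting a variable can break up a correlated error can be seen at the population level with a simpler device than SEFA's statistic: in a one-factor model every tetrad difference $\sigma_{ij}\sigma_{kl} - \sigma_{ik}\sigma_{jl}$ among four distinct variables is zero. The sketch adds a hypothetical error covariance between X1 and X2 and reports the largest tetrad difference after dropping each variable in turn. The tetrad check is a substitute chosen for illustration, not the method SEFA uses; the loadings and the error covariance 0.15 are made up.

```python
import itertools

import numpy as np


def max_abs_tetrad(Sigma, idx):
    """Largest |sigma_ij*sigma_kl - sigma_ik*sigma_jl| over distinct i, j, k, l in idx."""
    worst = 0.0
    for i, j, k, l in itertools.permutations(idx, 4):
        worst = max(worst, abs(Sigma[i, j] * Sigma[k, l] - Sigma[i, k] * Sigma[j, l]))
    return worst


# Hypothetical one-factor model with a correlated error between X1 and X2.
lam = np.array([0.8, 0.7, 0.6, 0.7, 0.5])
Psi = np.diag(1.0 - lam ** 2)
Psi[0, 1] = Psi[1, 0] = 0.15
Sigma = np.outer(lam, lam) + Psi

p = len(lam)
all_vars = max_abs_tetrad(Sigma, range(p))
tetrad_by_drop = {
    drop + 1: max_abs_tetrad(Sigma, [v for v in range(p) if v != drop])
    for drop in range(p)
}
print(all_vars)
for var, value in tetrad_by_drop.items():
    print(f"drop X{var}: {value:.2e}")  # near zero only when X1 or X2 is dropped
```

Either member of the pair removes the violation, so a fit-based search points at the pair of variables sharing the correlated error.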
Question 4
What should one do if SEFA or SCoFA identifies a variable with a large factor loading estimate as inconsistent?

Answer 4_1: Reliability
If one employs the alpha coefficient or
$$\rho = \frac{(\sum_i \lambda_i)^2}{(\sum_i \lambda_i)^2 + \sum_i \psi_i},$$
(s)he has to delete it to obtain a well-fitting model.

Answer 4_2: Reliability
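If one employs
$$\rho' = \frac{(\sum_i \lambda_i)^2}{(\sum_i \lambda_i)^2 + \sum_{i,j} \psi_{ij}},$$
where $\psi_{ij}$ ($i \neq j$) are the error covariances and $\psi_{ii} = \psi_i$, (s)he can retain it and compare reliability between models.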
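The three coefficients can be compared on a small hypothetical example in which two errors are correlated. The sketch computes α from the implied covariance matrix, ρ from the loadings and only the diagonal of the error covariance matrix, and ρ' from the full error covariance matrix, so that $\sum_{i,j}\psi_{ij}$ includes the off-diagonal terms. The function names, loadings and the error covariance 0.1 are made up for illustration.

```python
import numpy as np


def cronbach_alpha(Sigma):
    """alpha = p/(p-1) * (1 - tr(Sigma) / 1'Sigma 1) for the unit-weighted composite."""
    p = Sigma.shape[0]
    return p / (p - 1) * (1.0 - np.trace(Sigma) / Sigma.sum())


def rho(lam, psi_diag):
    """Model-based reliability that assumes uncorrelated errors."""
    a = lam.sum() ** 2
    return a / (a + psi_diag.sum())


def rho_prime(lam, Psi):
    """Reliability that keeps the error covariances: denominator uses sum_{i,j} psi_ij."""
    a = lam.sum() ** 2
    return a / (a + Psi.sum())


lam = np.array([0.8, 0.7, 0.6])
Psi = np.diag(1.0 - lam ** 2)
Psi[0, 1] = Psi[1, 0] = 0.1  # hypothetical correlated error
Sigma = np.outer(lam, lam) + Psi

print(round(cronbach_alpha(Sigma), 3),   # 0.765
      round(rho(lam, np.diag(Psi)), 3),  # 0.745
      round(rho_prime(lam, Psi), 3))     # 0.721
```

Here α exceeds ρ' because the positive error covariance enters $1'\Sigma 1$ exactly as true-score covariance would, so α counts it as reliable variance.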
Answer 4_3: Example
ρ' = 0.64, α = 0.74
Bad-fitted one-factor model: model-based ρ = 0.76

Answer 4_4: Example
ρ'     α
0.64   0.74
0.63   0.63

Answer 4_5: Example
ρ'     α
0.60   0.78
0.63   0.63
Summary_1
A new option for variable selection, based on model fit, was introduced.
You can easily access the programs on the internet:
  SEFA (Stepwise variable selection in EFA)
  http://koko15.hus.osakau.ac.jp/~harada/sefa2001/stepwise/
  SCoFA (Stepwise Confirmatory FA)
  http://koko16.hus.osakau.ac.jp/~harada/scofa/input.html
Summary_2
It enjoys desirable theoretical properties.
Testing null communality is important:
  Uncorrelated variables cannot be identified
Variable selection with model fit can detect error correlations.
Traditional reliability coefficients based on a poorly fitting model have serious bias.
Summary_3
High-communality variables can be inconsistent.
Whether such variables should be removed depends on how reliability is assessed:
  Reliability has to be computed using a nonstandard factor model

References
Harada, A. and Kano, Y. (2001). Variable selection and test of communality in EFA. IMPS2001, Osaka.
Kano, Y. (in press). Variable selection for structural models. Journal of Statistical Planning and Inference.
Kano, Y. and Harada, A. (2000). Stepwise variable selection in factor analysis. Psychometrika, 65, 7-22.
Kano, Y. and Ihara, M. (1994). Identification of inconsistent variates in factor analysis. Psychometrika, 59, 5-20.
Thank you for coming to Osaka and for attending my talk.

Takoyaki performance will start soon.
You can understand how octopus relates to Osaka if you see and taste it.