#### Transcript: All about variable selection in factor analysis and structural equation modeling

Slide 1: All about variable selection in factor analysis and structural equation modeling
- IMPS2001, July 15-19, 2001, Osaka, Japan
- Yutaka Kano, Osaka University, School of Human Sciences

Slide 2: Today's talk
- Motivation for variable selection
- How SEFA (and SCoFA) works
- Derivation of the statistics
- Theoretical property
- What does variable selection with model fit mean?
- Summary

Slide 3: Needs for variable selection
- Variable selection in EFA is an important but time-consuming process
  - Composite scale construction
  - Reliability analysis
- Variable selection in SEM should be less important but …
  - Indicator selection
  - Improvement of model fit

Slide 4: Recent literature
- Little et al. (1999). On selecting indicators for multivariate measurement and modeling with latent variables. Psychological Methods, 4, 192-211.
- Fabrigar et al. (1999). Evaluating the use of EFA in psychological research. Psychological Methods, 4, 272-299.
- Kano et al. (in press, 2000, 1994).

Slide 5: Procedures for variable selection in EFA
- Usual procedure
  - Magnitude of communalities
  - Interpretability
  - Towards simple structures
- Our approach
  - Model fit

Slide 6: Programs for variable selection in factor analysis
- Exploratory analysis: SEFA (Stepwise variable selection in EFA), http://koko15.hus.osakau.ac.jp/~harada/sefa2001/stepwise/
- Confirmatory analysis: SCoFA (Stepwise Confirmatory FA), http://koko16.hus.osakau.ac.jp/~harada/scofa/input.html

Slide 7: Example_1
- A questionnaire on perception of physical exercise
- n = 653, p = 15, one-factor model
- Data were collected by Dr. Oka (Waseda U.)
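- Conclusion: remove X2, X9, X13, X14

Slide 8: Example_2

Slide 9: Example_3

Slide 10: Example_4

Slide 11: Example_5

Slide 12: Example_6

Slide 13: SCoFA: 24 psychological variables
- Original model (p = 24): $\chi^2_{231}(0.05) = 267.45$

Slide 14

Slide 15: Theory of SEFA and SCoFA
- Obtain estimates for a current model
- Construct a predicted chi-square for each one-variable-deleted model using the estimates, without tedious iterations
- Take a sort of LM approach

Slide 16: Known quantities and goal_1
- Current model and statistics
- Model: $V(X) = \Sigma(\theta)$, MLE: $\hat\theta$
- Statistic: $T_0$ ($\chi^2$) for $H_0: V(X) = \Sigma(\theta)$ vs. $A$: $V(X)$ is saturated

Slide 17: Known quantities and goal_2
- What we want is $T_2$ for $H_2: V(\mathbf{X}_2) = \Sigma_{22}(\theta)$ vs. $A$: $V(\mathbf{X}_2)$ is saturated
- $X = [X_1, X_2, \ldots, X_p]'$: observed vector (in a current model)
- $X = [X_1, \mathbf{X}_2']'$, where $X_1$ is the possibly inconsistent variable to be examined

Slide 18: Basic idea
- New test statistics to be used:

$$
\begin{aligned}
T_0 &:\ H_0: V(X) = \Sigma(\theta) \ \text{vs.}\ A: V(X) \text{ is saturated} \\
T_2 &:\ H_2: V(\mathbf{X}_2) = \Sigma_{22}(\theta) \ \text{vs.}\ A: V(\mathbf{X}_2) \text{ is saturated} \\
T_{2'} &:\ H_{2'}: V(X) = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \Sigma_{22}(\theta) \end{bmatrix} \ \text{vs.}\ A: V(X) \text{ is saturated} \\
T_{02'} &:\ H_0: V(X) = \Sigma(\theta) \ \text{vs.}\ H_{2'}: V(X) = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \Sigma_{22}(\theta) \end{bmatrix}
\end{aligned}
$$

$$
T_2 = T_{2'} = T_0 - (T_0 - T_{2'}) \overset{a}{=} T_0 - T_{02'}
$$

- We construct $T_{02'}$ as an LM test

Slide 19: Final formula for $T_2$

$$
T_2 = T_0 - T_{02'} = n\, v\big(S_{22} - \Sigma_{22}(\hat\theta)\big)' \Big[ \Gamma_{2N}(\hat\theta)^{-1} - \Gamma_{2N}(\hat\theta)^{-1} \Delta_2(\hat\theta) \big( \Delta_2(\hat\theta)' \Gamma_{2N}(\hat\theta)^{-1} \Delta_2(\hat\theta) \big)^{-1} \Delta_2(\hat\theta)' \Gamma_{2N}(\hat\theta)^{-1} \Big] v\big(S_{22} - \Sigma_{22}(\hat\theta)\big)
$$

- Here $\Gamma_{2N}(\theta)$ is the normal-theory asymptotic covariance matrix of $v(S_{22})$ and $\Delta_2(\theta)$ is the Jacobian of $v(\Sigma_{22}(\theta))$ with respect to $\theta$
- Note: this is Browne's (1982) goodness-of-fit statistic using general estimates

Slide 20: Properties_1

$$
V\begin{bmatrix} X_1 \\ \mathbf{X}_2 \end{bmatrix} = \Sigma(\theta) + \frac{1}{\sqrt{n}} \begin{bmatrix} 0 & 0 \\ 0 & D_{22} \end{bmatrix}
$$

$$
T_2 \xrightarrow{L} \chi^2_{df_2}(\delta = 0) \ \text{if } D_{22} = O, \qquad
T_2 \xrightarrow{L} \chi^2_{df_2}(\delta \neq 0) \ \text{if } D_{22} \neq O
$$

- $df_2$ is the degrees of freedom of $H_2$ and $\delta$ is the noncentrality parameter

Slide 21: Question 1
- Can $T_2$ work even if $X_1$ is inconsistent?
- The estimate for $\theta$ is biased.

$$
V\begin{bmatrix} X_1 \\ \mathbf{X}_2 \end{bmatrix} = \Sigma(\theta) + \frac{1}{\sqrt{n}} \begin{bmatrix} 0 & 0 \\ 0 & D_{22} \end{bmatrix}
$$

$$
T_2 \xrightarrow{L} \chi^2_{df_2}(\delta = 0) \ \text{if } D_{22} = O, \qquad
T_2 \xrightarrow{L} \chi^2_{df_2}(\delta \neq 0) \ \text{if } D_{22} \neq O
$$

Slide 22: Properties_2

$$
V\begin{bmatrix} X_1 \\ \mathbf{X}_2 \end{bmatrix} = \Sigma(\theta) + \frac{1}{\sqrt{n}} \begin{bmatrix} d_{11} & d_{12} \\ d_{21} & D_{22} \end{bmatrix}
$$

- Whether or not $d_{12} = 0$, we can prove

$$
T_2 \xrightarrow{L} \chi^2_{df_2}(\delta = 0) \ \text{if } D_{22} = O, \qquad
T_2 \xrightarrow{L} \chi^2_{df_2}(\delta \neq 0) \ \text{if } D_{22} \neq O
$$

Slide 23: Question 2
- Can SEFA identify an uncorrelated variable?
- Unfortunately, no
- We have developed a way of testing zero communality in SEFA (see Harada-Kano, IMPS)

Slide 24: Question 3
- What is the actual meaning of variable selection with model fit?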
- The following shows an illustrative example:

Slide 25: Answer 3_1: Example again
- X2, X9, X13, X14 are to be removed

Slide 26: Answer 3_2: Example again
- Best fitted model with correlated errors
- SEFA conclusion: X2, X9, X13, X14 are to be removed

Slide 27: Answer 3_3: Example again
- Variables to be deleted are identified so as to break up the correlated errors
- Correlated errors may cause
  - Different interpretation of FA results
    - The common factors considered are not enough to explain correlations between observed variables
    - Such variables are not good indicators (e.g., in SEM)
  - Inaccurate reliability estimates
    - Green-Hershberger (2000), Raykov (2001)
    - Kano-Azuma (2001, IMPS)

Slide 28: Question 4
- What should one do if SEFA or SCoFA identifies a variable with a large factor loading estimate as inconsistent?

Slide 29: Answer 4_1: Reliability
- If one employs the alpha coefficient or

$$
\rho = \frac{\left(\sum_i \lambda_i\right)^2}{\left(\sum_i \lambda_i\right)^2 + \sum_i \psi_i}
$$

  (s)he has to delete the variable to obtain a good-fit model.

Slide 30: Answer 4_2: Reliability
- If one employs

$$
\rho' = \frac{\left(\sum_i \lambda_i\right)^2}{\left(\sum_i \lambda_i\right)^2 + \sum_{i,j} \psi_{ij}}
$$

  (s)he can retain the variable and compare reliability between models.

Slide 31: Answer 4_3: Example
- $\rho' = 0.64$, $\alpha = 0.74$
- $\rho = 0.76$, based on the bad-fitted one-factor model

Slide 32: Answer 4_4: Example
- $\rho'$ and $\alpha$: 0.64, 0.74; 0.63, 0.63

Slide 33: Answer 4_5: Example
- $\rho'$ and $\alpha$: 0.60, 0.78; 0.63, 0.63

Slide 34: Summary_1
- A new option for variable selection, based on model fit, was introduced.
- You can easily access the programs on the internet:
  - SEFA (Stepwise variable selection in EFA): http://koko15.hus.osakau.ac.jp/~harada/sefa2001/stepwise/
  - SCoFA (Stepwise Confirmatory FA): http://koko16.hus.osakau.ac.jp/~harada/scofa/input.html

Slide 35: Summary_2
- It enjoys preferable theoretical properties
- Testing null communality is important
  - Uncorrelated variables cannot be identified
- Variable selection with model fit can find error correlations
- Traditional reliability coefficients based on a poor-fit model have serious bias

Slide 36: Summary_3
- High-communality variables can be inconsistent
- Whether such variables should be removed depends on the reliability coefficient employed
- Reliability has to be figured out using a nonstandard factor model

Slide 37: References
- Harada, A. and Kano, Y. (2001). Variable selection and test of communality in EFA. IMPS2001, Osaka.
- Kano, Y. (in press). Variable selection for structural models. Journal of Statistical Planning and Inference.
- Kano, Y. and Harada, A. (2000). Stepwise variable selection in factor analysis. Psychometrika, 65, 7-22.
- Kano, Y. and Ihara, M. (1994). Identification of inconsistent variates in factor analysis. Psychometrika, 59, 5-20.

Slide 38: Thank you for coming to Osaka and being at my talk
- TakoYaki performance will start soon
- You can understand how octopus relates to Osaka if you see and taste it
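
The reliability coefficients compared in Answer 4_1 and Answer 4_2 can be illustrated with a minimal sketch. It is not part of SEFA or SCoFA. The function names, the four equal loadings of 0.7, the unique variances of 0.51, and the error covariance of 0.2 are all hypothetical values chosen for illustration. The coefficients are computed from assumed population parameters of a one-factor model with unit factor variance, not from estimates of a fitted model, so the ρ value below only shows the effect of dropping the error covariance from the denominator.

```python
def cronbach_alpha(cov):
    """Cronbach's alpha computed from an item covariance matrix."""
    p = len(cov)
    total = sum(sum(row) for row in cov)      # variance of the sum score
    trace = sum(cov[i][i] for i in range(p))  # sum of the item variances
    return p / (p - 1) * (1.0 - trace / total)


def rho(loadings, error_cov):
    """rho: the denominator uses only the unique variances psi_i."""
    common = sum(loadings) ** 2
    unique = sum(error_cov[i][i] for i in range(len(loadings)))
    return common / (common + unique)


def rho_prime(loadings, error_cov):
    """rho': the denominator sums every element psi_ij, covariances included."""
    common = sum(loadings) ** 2
    return common / (common + sum(sum(row) for row in error_cov))


def implied_cov(loadings, error_cov):
    """Covariance matrix implied by a one-factor model with unit factor variance."""
    p = len(loadings)
    return [[loadings[i] * loadings[j] + error_cov[i][j] for j in range(p)]
            for i in range(p)]


# Hypothetical parameters: four items with equal loadings.
loadings = [0.7, 0.7, 0.7, 0.7]
independent = [[0.51 if i == j else 0.0 for j in range(4)] for i in range(4)]
correlated = [row[:] for row in independent]
correlated[0][1] = correlated[1][0] = 0.2  # assumed error covariance of items 1 and 2

for label, psi in (("independent errors", independent),
                   ("correlated errors", correlated)):
    cov = implied_cov(loadings, psi)
    print(label,
          "alpha=%.3f" % cronbach_alpha(cov),
          "rho=%.3f" % rho(loadings, psi),
          "rho'=%.3f" % rho_prime(loadings, psi))
```

With independent errors the three coefficients coincide (about 0.794 for these values). Once the error covariance is introduced, alpha rises to about 0.815 while ρ' falls to about 0.763, which is in line with the point of Summary_2 that coefficients ignoring error correlations can be biased.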