Designing Monte Carlo Simulation Studies

Xitao Fan, Ph.D.
Chair Professor & Dean
Faculty of Education
University of Macau
Getting Involved in Monte Carlo Simulation
Fan, X., Felsovalyi, A., Sivo, S. A., & Keenan, S. (2002). SAS for Monte
Carlo studies: A guide for quantitative researchers. Cary,
NC: SAS Institute, Inc.
Fan, X. (2012). Designing simulation studies. In H.
Cooper (Ed.), Handbook of Research Methods in
Psychology, Vol. 2 (pp. 427-444). Washington, DC:
American Psychological Association.
Peugh, J., & Fan, X. (In press). Enumeration index performance in generalized growth
mixture models: A Monte Carlo test of Muthén’s (2003) hypothesis. Structural
Equation Modeling.
Peugh, J., & Fan, X. (In press). Modeling unobserved heterogeneity using latent profile
analysis: A Monte Carlo simulation. Structural Equation Modeling.
Peugh, J., & Fan, X. (2012). How well does growth mixture modeling identify
heterogeneous growth trajectories? A simulation study examining GMM’s
performance characteristics. Structural Equation Modeling, 19, 204-226.
Fan, X., & Sivo, S. A. (2009). Using goodness-of-fit indices in assessing mean
structure invariance. Structural Equation Modeling, 16, 1-16.
Fan, X. & Sivo, S. (2007). Sensitivity of fit indices to model misspecification and model
types. Multivariate Behavioral Research, 42, 509-529.
Sivo, S. A., Fan, X., Witta, E. L., & Willse, J. T. (2006). The search for "optimal" cutoff
properties: Fit index criteria in structural equation modeling. Journal of Experimental
Education, 74, 267-288.
Fan, Xitao, & Fan, Xiaotao. (2005). Power of latent growth modeling for detecting linear
growth: Number of measurements and comparison with other analytic approaches.
Journal of Experimental Education, 73, 121-139.
Fan, X., & Sivo, S. A. (2005). Sensitivity of fit indices to misspecified structural or
measurement model components: Rationale of two-index strategy revisited. Structural
Equation Modeling, 12, 343-367.
Fan, Xitao, & Fan, Xiaotao. (2005). Using SAS for Monte Carlo simulation research in
structural equation modeling. Structural Equation Modeling, 12, 299-333.
Sivo, S., Fan, X., & Witta, L. (2005). The biasing effects of unmodeled ARMA time series
processes on latent growth curve model estimates. Structural Equation Modeling, 12,
215-231.
Fan, X. (2003). Two approaches for correcting correlation attenuation caused by
measurement error: Implications for research practice. Educational and Psychological
Measurement, 63(6), 915-930.
Fan, X. (2003). Power of latent growth modeling for detecting group differences in linear
growth trajectory parameters. Structural Equation Modeling, 10, 380-400.
Yin, P., & Fan, X. (2001). Estimating R² shrinkage in multiple regression: A comparison of
different analytical methods. Journal of Experimental Education, 69, 203-224.
Fan, X., & Wang, L. (1999). Comparing logistic regression with linear discriminant analysis in
their classification accuracy. Journal of Experimental Education, 67, 265-286.
Fan, X., Thompson, B., & Wang, L. (1999). The effects of sample size, estimation methods,
and model specification on SEM fit indices. Structural Equation Modeling: A
Multidisciplinary Journal, 6, 56-83.
Fan, X., & Wang, L. (1998). Effects of potential confounding factors on fit indices and
parameter estimates for true and misspecified SEM models. Educational and
Psychological Measurement, 58, 699-733.
Fan, X. & Wang, L. (1996). Comparability of jackknife and bootstrap results: An investigation
for a case of canonical analysis. Journal of Experimental Education, 64, 173-189.
What Is a Monte Carlo Simulation Study?

“the use of random sampling techniques and often the use of computer
simulation to obtain approximate solutions to mathematical or physical
problems especially in terms of a range of values each of which has a
calculated probability of being the solution” (Merriam-Webster OnLine).

An empirical alternative to a theoretical approach (i.e., a solution based
on statistical/mathematical theory)

Increasingly possible because of the advances in computing technology
Situations Where Simulation Is Useful

Consequences of Assumption Violations
Statistical theory stipulates the conditions that should hold, but does not say
what happens when the data fail to satisfy those conditions

Understanding a Sample Statistic That May Not Have a Theoretical
Distribution

Many Other Situations

Retaining the optimal number of factors in EFA

Evaluating the performance of mixture modeling in identifying the latent
groups

Assessing the consequences of failure to model correlated error structure in
latent growth modeling
Basic Steps in a Simulation Study

Asking Questions Suitable for a Simulation Study
* Questions for which no (or no trustworthy) analytical/theoretical solutions exist
Simulation Study Design (Example)
* Include and manipulate the major factors that potentially affect the outcome
Data Generation
* Sample data generation & transformation
Analysis (Model Fitting) for Sample Data
Accumulation and Analysis of the Statistic(s) of Interest
Presentation and Drawing Conclusions
* Conclusions limited to the design conditions
An Example: Independent t-test (group variance homogeneity)
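The design for this example can be sketched in Python roughly as follows. This is an illustration only, under assumptions that are not taken from the slides: the sample sizes, standard deviations, number of replications, seed, and function names are hypothetical, and the critical value 2.024 (two-tailed .05, df = 38) is hard-coded so that only the standard library is needed.

```python
import math
import random

T_CRIT_DF38 = 2.024  # two-tailed alpha = .05 critical t for df = n1 + n2 - 2 = 38


def mean_and_variance(x):
    """Sample mean and unbiased sample variance (divisor n - 1)."""
    m = sum(x) / len(x)
    return m, sum((v - m) ** 2 for v in x) / (len(x) - 1)


def pooled_t(x, y):
    """Independent-samples t statistic based on the pooled variance."""
    n1, n2 = len(x), len(y)
    m1, v1 = mean_and_variance(x)
    m2, v2 = mean_and_variance(y)
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


def type1_error_rate(n1, n2, sd1, sd2, reps=2000, seed=1):
    """Proportion of replications rejecting H0 when the population means are equal."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, sd1) for _ in range(n1)]
        y = [rng.gauss(0.0, sd2) for _ in range(n2)]
        if abs(pooled_t(x, y)) > T_CRIT_DF38:
            rejections += 1
    return rejections / reps


# Fully crossed design: 2 sample-size patterns x 3 variance patterns (df = 38 in all cells).
design = [(n1, n2, sd1, sd2)
          for n1, n2 in [(20, 20), (10, 30)]
          for sd1, sd2 in [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]]
results = {cond: type1_error_rate(*cond) for cond in design}

for (n1, n2, sd1, sd2), rate in results.items():
    print(f"n=({n1},{n2}) sd=({sd1},{sd2}) empirical Type I error = {rate:.3f}")
```

With equal variances the empirical rate should stay close to the nominal .05. When the smaller group has the larger variance the pooled test tends to reject too often, and when the larger group has the larger variance it tends to reject too rarely.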
Data Generation in a Simulation Study
 Common Random Number Generators
* binomial, Cauchy, exponential, gamma, Poisson, normal, uniform, etc.
* Variates from all of these distributions are generated from uniform random numbers
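As a rough illustration of how other distributions can be built from uniform random numbers, the sketch below uses two textbook transformations: the inverse-transform method for the exponential distribution and the Box-Muller transform for the normal distribution. The function names, rate, seed, and sample size are hypothetical, and statistical packages may well use different algorithms internally.

```python
import math
import random

rng = random.Random(2024)


def exponential_from_uniform(rate):
    """Inverse-transform method: X = -ln(1 - U) / rate, with U uniform on [0, 1)."""
    u = rng.random()
    return -math.log(1.0 - u) / rate


def normal_from_uniform():
    """Box-Muller transform: two independent uniforms give one standard normal."""
    u1 = 1.0 - rng.random()  # in (0, 1], so log(u1) is always defined
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)


n = 20000
expo = [exponential_from_uniform(rate=2.0) for _ in range(n)]
norm = [normal_from_uniform() for _ in range(n)]

expo_mean = sum(expo) / n   # population value: 1 / rate = 0.5
norm_mean = sum(norm) / n   # population value: 0
norm_sd = math.sqrt(sum((z - norm_mean) ** 2 for z in norm) / (n - 1))  # population value: 1

print(f"exponential mean = {expo_mean:.3f}, normal mean = {norm_mean:.3f}, normal SD = {norm_sd:.3f}")
```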

Simulating Univariate Sample Data
* Normally-Distributed Sample Data: $X \sim N(\mu, \sigma^2)$
* Non-Normal Distribution: Fleishman (1978): $Y = a + bZ + cZ^2 + dZ^3$
a, b, c, d: coefficients needed for transforming the unit normal variate $Z$ to a non-normal variable $Y$ with specified degrees of population skewness and kurtosis.
Fleishman, A. I. (1978). A method for simulating non-normal distributions. Psychometrika, 43,
521-531.
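A minimal sketch of the Fleishman power transformation follows. The coefficient values are approximate values for skewness 2 and (excess) kurtosis 6 and are an assumption of this example, not something given on the slide; for that reason the code also evaluates Fleishman's three moment equations so the coefficients can be checked rather than taken on trust. Function names, seed, and sample size are hypothetical.

```python
import random

# Assumed approximate coefficients for skewness = 2, excess kurtosis = 6; a = -c.
B, C, D = 0.8263, 0.3137, 0.0227
A = -C


def fleishman_residuals(b, c, d, skew, kurt):
    """Residuals of Fleishman's moment equations (all zero at an exact solution)."""
    eq_var = b**2 + 6*b*d + 2*c**2 + 15*d**2 - 1
    eq_skew = 2*c*(b**2 + 24*b*d + 105*d**2 + 2) - skew
    eq_kurt = 24*(b*d + c**2*(1 + b**2 + 28*b*d)
                  + d**2*(12 + 48*b*d + 141*c**2 + 225*d**2)) - kurt
    return eq_var, eq_skew, eq_kurt


def fleishman_variate(rng):
    """Transform one unit normal variate Z into Y = a + bZ + cZ^2 + dZ^3."""
    z = rng.gauss(0.0, 1.0)
    return A + B*z + C*z**2 + D*z**3


rng = random.Random(7)
sample = [fleishman_variate(rng) for _ in range(50000)]

residuals = fleishman_residuals(B, C, D, skew=2.0, kurt=6.0)
sample_mean = sum(sample) / len(sample)                                      # target: 0
sample_var = sum((y - sample_mean) ** 2 for y in sample) / (len(sample) - 1)  # target: 1

print("equation residuals:", [round(r, 4) for r in residuals])
print(f"sample mean = {sample_mean:.3f}, sample variance = {sample_var:.3f}")
```

In practice the coefficients for an arbitrary skewness and kurtosis would come from a numerical solver applied to the same three equations.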
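Sample Data from a Multivariate Normal Distribution
* Matrix decomposition procedure (Kaiser & Dickman, 1962): $Z_{(k \times N)} = F_{(k \times k)} X_{(k \times N)}$
X: k × N matrix of uncorrelated standard normal variates; Z: the resulting k × N matrix of correlated variates
F: k × k matrix containing principal component factor pattern coefficients obtained by
applying principal component factorization to the given population inter-correlation
matrix R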
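The matrix decomposition idea can be sketched for the simplest case of two variables, where the principal component factor pattern of R has a closed form and no linear algebra library is needed. The target correlation, seed, sample size, and function names are hypothetical; for k > 2 the factor pattern would normally be obtained with an eigen-decomposition routine.

```python
import math
import random


def pc_factor_pattern_2x2(r):
    """Principal component factor pattern F for R = [[1, r], [r, 1]].

    The eigenvalues of R are 1 + r and 1 - r, with eigenvectors
    (1, 1)/sqrt(2) and (1, -1)/sqrt(2); each column of F is an
    eigenvector scaled by the square root of its eigenvalue.
    """
    p, q = math.sqrt((1 + r) / 2), math.sqrt((1 - r) / 2)
    return [[p, q], [p, -q]]


def correlated_normal_pairs(r, n, seed=11):
    """Generate n pairs of standard normal variates with population correlation r."""
    rng = random.Random(seed)
    f = pc_factor_pattern_2x2(r)
    pairs = []
    for _ in range(n):
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)  # uncorrelated variates
        z1 = f[0][0] * x1 + f[0][1] * x2           # Z = F X, first row
        z2 = f[1][0] * x1 + f[1][1] * x2           # Z = F X, second row
        pairs.append((z1, z2))
    return pairs


def pearson_r(pairs):
    """Sample Pearson correlation of a list of (a, b) pairs."""
    n = len(pairs)
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    s12 = sum((a - m1) * (b - m2) for a, b in pairs)
    s11 = sum((a - m1) ** 2 for a, _ in pairs)
    s22 = sum((b - m2) ** 2 for _, b in pairs)
    return s12 / math.sqrt(s11 * s22)


F = pc_factor_pattern_2x2(0.6)
sample_r = pearson_r(correlated_normal_pairs(0.6, n=20000))
print(f"target r = 0.60, sample r = {sample_r:.3f}")
```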
Sample Data from a Multivariate Non-Normal Distribution
* Interaction between non-normality and inter-variable correlations
* Intermediate correlations using Fleishman coefficients (Vale & Maurelli, 1983)
* Matrix decomposition procedure applied to intermediate correlation matrix
Kaiser, H. F., & Dickman, K. (1962). Sample and population score matrices and sample correlation matrices from
an arbitrary population correlation matrix. Psychometrika, 27, 179-182.
Vale, C. D., & Maurelli, V. A. (1983). Simulating multivariate nonnormal distributions. Psychometrika, 48, 465-471.
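The role of the intermediate correlation can be sketched for two variables. Because the Fleishman transformation changes the correlation between variables, the normal variates are first generated with an intermediate correlation chosen so that the transformed variables end up at the target. The sketch assumes approximate Fleishman coefficients for skewness 2 and kurtosis 6, a target correlation of .50, and a simple bisection search; all names and values are hypothetical.

```python
import math
import random

B, C, D = 0.8263, 0.3137, 0.0227  # assumed Fleishman coefficients (skewness 2, kurtosis 6)
A = -C


def implied_correlation(rho, coefs1, coefs2):
    """Correlation of two Fleishman variables whose normal precursors
    correlate rho (Vale & Maurelli, 1983)."""
    (b1, c1, d1), (b2, c2, d2) = coefs1, coefs2
    return (rho * (b1*b2 + 3*b1*d2 + 3*d1*b2 + 9*d1*d2)
            + rho**2 * (2*c1*c2)
            + rho**3 * (6*d1*d2))


def intermediate_correlation(target, coefs1, coefs2, tol=1e-10):
    """Bisection on [-1, 1]; assumes the implied correlation increases with rho
    on that interval, which holds for the coefficients used here."""
    lo, hi = -1.0, 1.0
    if not implied_correlation(lo, coefs1, coefs2) <= target <= implied_correlation(hi, coefs1, coefs2):
        raise ValueError("target correlation is not attainable with these coefficients")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if implied_correlation(mid, coefs1, coefs2) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def fleishman(z):
    return A + B*z + C*z**2 + D*z**3


coefs = (B, C, D)
rho_star = intermediate_correlation(0.5, coefs, coefs)

# Matrix decomposition (2 x 2 closed form) applied to the intermediate correlation.
rng = random.Random(3)
p, q = math.sqrt((1 + rho_star) / 2), math.sqrt((1 - rho_star) / 2)
pairs = []
for _ in range(20000):
    x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
    z1, z2 = p*x1 + q*x2, p*x1 - q*x2          # normals correlated at rho_star
    pairs.append((fleishman(z1), fleishman(z2)))

n = len(pairs)
m1 = sum(a for a, _ in pairs) / n
m2 = sum(b for _, b in pairs) / n
sample_r = (sum((a - m1) * (b - m2) for a, b in pairs)
            / math.sqrt(sum((a - m1) ** 2 for a, _ in pairs)
                        * sum((b - m2) ** 2 for _, b in pairs)))
print(f"intermediate r = {rho_star:.4f}, sample r of non-normal data = {sample_r:.3f}")
```

Note that the intermediate correlation is larger than the target here, and that some targets (for example strongly negative correlations between two positively skewed variables) cannot be reached at all, which is why the search checks the attainable range first.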
Checking the Validity of Data Generation Procedures
 Example: Multivariate non-normal sample data (three correlated
variables)
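One simple way to run such a check is to generate a very large sample and compare its descriptive statistics with the intended population values. The sketch below is a simplified univariate version with hypothetical names; an exponential(1) generator from the Python standard library stands in for the procedure being checked because its population moments are known. For multivariate data the same comparison would be repeated for each variable and for each pairwise correlation.

```python
import random


def sample_moments(xs):
    """Return mean, variance, skewness, and excess kurtosis of a sample."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return {"mean": mean, "variance": m2,
            "skewness": m3 / m2 ** 1.5, "kurtosis": m4 / m2 ** 2 - 3.0}


# Stand-in for the generator under test: exponential(1) has population
# mean 1, variance 1, skewness 2, and excess kurtosis 6.
rng = random.Random(5)
big_sample = [rng.expovariate(1.0) for _ in range(200000)]

targets = {"mean": 1.0, "variance": 1.0, "skewness": 2.0, "kurtosis": 6.0}
observed = sample_moments(big_sample)
for name, target in targets.items():
    print(f"{name:9s} target = {target:5.2f}  observed = {observed[name]:6.3f}")
```

Higher-order moments such as kurtosis fluctuate considerably even in large samples, so small discrepancies there are less alarming than discrepancies in the mean or variance.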
From Simulation Design to Population Data Parameters

It may take much effort to obtain population parameters – t-test example
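As a small illustration of this translation step, the sketch below turns two design factors for a two-group t-test (a standardized mean difference and a variance ratio) into population means and standard deviations. The convention used is an assumption made for this example and is not taken from the slides: group 2 is the reference group, and the effect size is standardized by the square root of the average of the two population variances.

```python
import math


def population_parameters(effect_size_d, variance_ratio):
    """Translate design factors into population means and SDs.

    Assumed convention (hypothetical): group 2 has mean 0 and SD 1,
    variance_ratio = var1 / var2, and d is standardized by the square
    root of the average of the two population variances.
    """
    sd2 = 1.0
    sd1 = math.sqrt(variance_ratio) * sd2
    average_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return {"mu1": effect_size_d * average_sd, "sd1": sd1, "mu2": 0.0, "sd2": sd2}


params = population_parameters(effect_size_d=0.5, variance_ratio=4.0)
print(params)
```

A different standardizer (for example the SD of the reference group) would give different population means for the same nominal effect size, so the convention should be stated explicitly in the design.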
Latent growth model example
Accumulation and Analysis of the Statistic(s) of Interest
 Accumulation: Straightforward or Complicated
* Typically, not an automated process
* Statistical software used
* Analytical techniques involved
* Type of statistic(s) of interest, etc.
 Analysis
* Follow-up data analysis may be simple or complicated
* Not different from many other data analysis situations
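A minimal sketch of the accumulation and analysis steps follows, using a deliberately simple statistic of interest: two estimators of the population variance compared across two sample sizes. One record is stored per replication, and the records are then summarized by condition in terms of bias and RMSE. The design, number of replications, seed, and names are hypothetical.

```python
import random
import statistics
from collections import defaultdict

POP_VARIANCE = 1.0
rng = random.Random(42)

# Accumulation: one record per replication, tagged with its design condition.
records = []
for n in (10, 50):                      # design factor: sample size
    for rep in range(2000):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        records.append({"n": n, "rep": rep,
                        "var_unbiased": statistics.variance(x),   # divisor n - 1
                        "var_ml": statistics.pvariance(x)})       # divisor n

# Analysis: bias and RMSE of each estimator within each condition.
by_condition = defaultdict(list)
for row in records:
    by_condition[row["n"]].append(row)

summary = {}
for n, rows in by_condition.items():
    for name in ("var_unbiased", "var_ml"):
        errors = [row[name] - POP_VARIANCE for row in rows]
        summary[(n, name)] = {
            "bias": statistics.fmean(errors),
            "rmse": statistics.fmean(e * e for e in errors) ** 0.5,
        }

for (n, name), stats in sorted(summary.items()):
    print(f"n={n:2d} {name:12s} bias={stats['bias']:+.3f} rmse={stats['rmse']:.3f}")
```

Keeping the replication-level records, rather than only running totals, makes it possible to analyze the accumulated statistics later like any other data set.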
Presentation and Drawing Conclusions
 Presentation
* Representativeness & Exceptions
* Graphic Presentations
* Typical: table after table of results – No one has the time to read the tables!
 Drawing Conclusions
* Validity & generalizability depend on the adequacy & appropriateness of
simulation design
* Conclusions must be limited by the design conditions and levels.