Structural Equation Modeling With Mplus-BYU
Structural Equation Modeling
Using Mplus
Chongming Yang
Research Support Center
FHSS College
Structural?
Structuralism
Components
Relations
Objectives
Introduction to SEM
The model
Parameters
Estimation
Model evaluation
Applications
Estimate simple models with Mplus
Continuous Dependent
Variables
Session I
Information of Variable
Mean
Variance
Skewness
Kurtosis
Variance & Covariance
$V = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
$Cov = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
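As a quick worked example of the variance and covariance formulas (the five data points are made up for illustration), take x = 1, 2, 3, 4, 5 and y = 5, 4, 3, 2, 1, so that both means equal 3:

```latex
\begin{aligned}
V_x &= \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}
     = \frac{4+1+0+1+4}{5-1} = 2.5 \\
Cov_{xy} &= \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{n-1}
     = \frac{-4-1+0-1-4}{5-1} = -2.5
\end{aligned}
```

Because the covariance equals minus the common variance, the correlation in this toy example is -1.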
Covariance Matrix
(S)
        x1      x2      x3
x1      V1
x2      Cov21   V2
x3      Cov31   Cov32   V3
Statistical Model
Probabilistic statement about Relations of
variables
Imperfect but useful representation of
reality
Structural Equation Modeling
A system of regression equations for
latent variables to estimate and test direct
and indirect effects without the influence
of measurement errors.
To estimate and test theories about
interrelations among observed and latent
variables.
Latent Variable
(Construct / Factor / Trait)
A hypothetical variable
cannot be measured directly
No objective measurement unit
inferred from observable manifestations
Multiple manifestations (indicators)
Normally distributed interval dimension
How is Depression
Distributed in?
BYU students
Patients for Therapy
Normal Distributions
Levels of Analyses
Observed
Latent
Test Theories
Classical True Score Theory:
Observed Score = True score + Error
Item Response Theory
Generalizability (Raykov & Marcoulides, 2006)
Graphic Symbols of SEM
Rectangle – observed variable
Oval -- latent variable or error
Single-headed arrow -- causal relation
Double-headed arrow -- correlation
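Graphic Measurement Model
of Latent $\xi$
[Figure: path diagram in which the latent variable $\xi$ points to the indicators X1, X2, X3 with loadings $\lambda_1$, $\lambda_2$, $\lambda_3$; each indicator has an error term $\delta_1$, $\delta_2$, $\delta_3$.]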
Equations
Specific equations
$X_1 = \lambda_1 \xi + \delta_1$
$X_2 = \lambda_2 \xi + \delta_2$
$X_3 = \lambda_3 \xi + \delta_3$
Matrix Symbols
$X = \Lambda \xi + \delta$
True Score Theory?
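Relations of Variances
$V_{X_1} = \lambda_1^2 \phi + \theta_1$
$V_{X_2} = \lambda_2^2 \phi + \theta_2$
$V_{X_3} = \lambda_3^2 \phi + \theta_3$
$\phi$ = variance of $\xi$
$\theta$ = variance of $\delta$ (measurement error / uniqueness)
Unknown Parameters
$V_{X_1} = \lambda_1^2 \phi + \theta_1$
$V_{X_2} = \lambda_2^2 \phi + \theta_2$
$V_{X_3} = \lambda_3^2 \phi + \theta_3$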
Sample Covariance Matrix
(S)
        x1      x2      x3
x1      V1
x2      Cov21   V2
x3      Cov31   Cov32   V3
Variance of $\xi$
Variance of $\xi$ = common covariance of X1, X2 and X3
Variance of $\delta$
$\theta_1$
0           $\theta_2$
0           0           $\theta_3$
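Unstandardized Parameterization
(scaling)
$\lambda_1 = 1$
(fix the loading of X1 at 1 so that $\xi$ takes the metric of X1; X1 is called the reference indicator)
Variance of $\xi$ ($\phi$) = common variance of X1, X2 and X3
$\lambda^2 \phi$ = explained variance of X ($R^2$ when expressed as a proportion of the total variance)
Variance of $\delta$ = unexplained variance (error)
Total variance = $\lambda^2 \phi$ + variance of $\delta$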
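Just Identified Model
[Figure: the same one-factor path diagram, with indicators X1, X2, X3, loadings $\lambda_1$, $\lambda_2$, $\lambda_3$ and errors $\delta_1$, $\delta_2$, $\delta_3$.]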
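Counting the available information against the free parameters shows why a one-factor model with three indicators is just identified. This is a sketch that assumes the loading of X1 is fixed at 1 and no mean structure is modeled ($\lambda$ = loadings, $\phi$ = variance of the latent variable, $\theta$ = error variances):

```latex
\begin{aligned}
\text{information} &= \frac{p(p+1)}{2} = \frac{3 \cdot 4}{2} = 6 \\
\text{free parameters} &= \underbrace{2}_{\lambda_2,\ \lambda_3}
  + \underbrace{1}_{\phi}
  + \underbrace{3}_{\theta_1,\ \theta_2,\ \theta_3} = 6 \\
df &= 6 - 6 = 0
\end{aligned}
```

With zero degrees of freedom the model reproduces the sample covariance matrix exactly, so its fit cannot be tested.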
Reference Indicator
(marker)
Choose conceptually the best
Small variance → non-convergence
Different markers → different parameter estimates and standard errors
Affects measurement invariance tests
Does not affect standardized estimates
Standardized Parameterizations
(scaling)
Variance of $\xi$ = 1 = common variance of X1, X2 and X3
$\lambda^2$ = explained variance of X ($R^2$)
Variance of $\delta$ = $1 - \lambda^2$
Mean of $\xi$ = 0
Mean of $\delta$ = 0
Two Kinds of Parameters
Fixed at 0, 1, or other values
Freely estimated
[Figure: example structural equation model. Labels in the diagram: General Intelligence, Emotional Intelligence, Personality, Social Relations, Marital Satisfaction, Job Satisfaction, Analytic, Reasoning, Verbal, Self Control, Recognize/Assess, Perceived Benefit, Perceived Cost, Being Appreciated, Agreeableness, Openness; error terms d1-d7 and e1-e4; disturbances z1 and z2.]
Structural Equation Model
in Matrix Symbols
$X = \Lambda_x \xi + \delta$ (exogenous)
$Y = \Lambda_y \eta + \varepsilon$ (endogenous)
$\eta = B \eta + \Gamma \xi + \zeta$ (structural model)
Note: Measurement model reflects the true score
theory
Structural Equation Model
in Matrix Symbols
$X = \tau_x + \Lambda_x \xi + \delta$ (measurement)
$Y = \tau_y + \Lambda_y \eta + \varepsilon$ (measurement)
$\eta = \alpha + B \eta + \Gamma \xi + \zeta$ (structural)
Note: SEM with mean structure.
Model Implied Covariance Matrix
(Σ)
Note: This covariance matrix contains unknown parameters in the equations.
(I-B) = non-singular
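The slide's own equation did not survive in this transcript. For reference, the standard LISREL-type expression for the model-implied covariance matrix of (Y, X) is sketched below. The symbol names are the conventional ones and are an assumption here: $\Lambda_y$, $\Lambda_x$ are loading matrices, $B$ and $\Gamma$ hold the structural coefficients, $\Phi$ is the covariance matrix of $\xi$, $\Psi$ that of $\zeta$, and $\Theta_\varepsilon$, $\Theta_\delta$ are the error covariance matrices.

```latex
\Sigma(\theta) =
\begin{bmatrix}
\Lambda_y (I-B)^{-1} (\Gamma \Phi \Gamma' + \Psi) (I-B)^{-1\prime} \Lambda_y' + \Theta_\varepsilon
  & \Lambda_y (I-B)^{-1} \Gamma \Phi \Lambda_x' \\
\Lambda_x \Phi \Gamma' (I-B)^{-1\prime} \Lambda_y'
  & \Lambda_x \Phi \Lambda_x' + \Theta_\delta
\end{bmatrix}
```

The inverse $(I-B)^{-1}$ appears in three of the four blocks, which is why $(I-B)$ has to be non-singular.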
Estimations/Fit Functions
Hypothesis: $\Sigma = S$ or $\Sigma - S = 0$
Maximum Likelihood
$F = \log|\Sigma| + \mathrm{trace}(S \Sigma^{-1}) - \log|S| - (p + q)$
Convergence -- Reaching Limit
Minimize F while adjusting the unknown
parameters through an iterative process
Convergence value: F difference between
last two iterations
Default convergence = .0001
Increase to help convergence (0.001 or 0.01)
e.g.
Analysis: convergence = .01;
No Convergence
No unique parameter estimates
Lack of degrees of freedom (under-identification)
Variance of reference indicator too small
Parameters that should be fixed are left to be freely
estimated
Misspecified model
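Absolute Fit Index
$\chi^2 = F(N - 1)$ (N = sample size)
df = p(p+1)/2 – q
p = number of observed variables, so p(p+1)/2 = number of variances and covariances (means are added when a mean structure is modeled)
q = number of unknown parameters to be estimated
prob = ? (A nonsignificant $\chi^2$ indicates good fit. Why?)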
Sample Information
        x1      x2      x3      x4      …
x1      v1
x2      cov21   v2
x3      cov31   cov32   v3
x4      cov41   cov42   cov43   v4
…
Means   Mean1   Mean2   Mean3   Mean4   …
Total info = P(P+1)/2 + Means
Absolute Fit -- SRMR
Standardized Root Mean Square Residual
SRMR = Difference between observed and
implied covariances in standardized metric
Desirable when < .08, but no consensus
Relative Fit:
Relative to Baseline (Null) Model
All unknown parameters are fixed at 0
Variables not related (all relation parameters = 0)
Model implied covariance = 0
Fit to sample covariance matrix S
Obtain $\chi^2$, df, prob = .0000
Relative Fit Indices
$CFI = 1 - (\chi^2 - df) / (\chi^2_b - df_b)$
b = baseline model
(Comparative Fit Index, desirable >= .95; 95% better than the baseline model)
$TLI = (\chi^2_b / df_b - \chi^2 / df) / (\chi^2_b / df_b - 1)$
(Tucker-Lewis Index, desirable >= .90)
$RMSEA = \sqrt{(\chi^2 - df) / (n \cdot df)}$
(Root Mean Square Error of Approximation, desirable <= .06;
penalizes a large model with more unknown parameters)
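A worked example may make the three indices concrete. All numbers are hypothetical: a target model with $\chi^2 = 36$ and $df = 24$, a baseline model with $\chi^2_b = 520$ and $df_b = 36$, and $n = 300$.

```latex
\begin{aligned}
CFI &= 1 - \frac{\chi^2 - df}{\chi^2_b - df_b}
     = 1 - \frac{36 - 24}{520 - 36} = 1 - \frac{12}{484} \approx .975 \\
TLI &= \frac{\chi^2_b/df_b - \chi^2/df}{\chi^2_b/df_b - 1}
     = \frac{14.444 - 1.5}{14.444 - 1} \approx .963 \\
RMSEA &= \sqrt{\frac{\chi^2 - df}{n \cdot df}}
     = \sqrt{\frac{12}{300 \cdot 24}} \approx .041
\end{aligned}
```

Under the cutoffs listed on the slide, this hypothetical model would count as fitting well on all three indices.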
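Special Case A
[Figure: path diagram with the observed variable Sex and the latent variables Verbal Aggression and Physical Aggression; indicators t4a3, t4a93, t4a94, t4a37, t4a57, t4a90 with errors e1-e6; disturbances d1 and d2.]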
Special Cases A
Assumption: $x = \xi$
$y = \tau_y + \Lambda_y \eta + \varepsilon$
$\eta = B \eta + \Gamma x + \zeta$
Special Case B
[Figure: path diagram with the latent variables Verbal Aggression and Physical Aggression (indicators x1-x6 with errors e1-e6) and the observed variable Peer Status with disturbance d.]
Special Cases B
Assumption: $y = \eta$
$x = \tau_x + \Lambda_x \xi + \delta$
$y = B \eta + \Gamma \xi + \zeta$
Other Special Cases of SEM
Confirmatory Factor Analysis (measurement model only)
Multiple & Multivariate Regression
ANOVA / MANOVA (multigroup CFA)
ANCOVA
Path Analysis Model (no latent variables)
Simultaneous Econometric Equations…
Growth Curve Modeling
…
EFA vs. CFA
[Figure: two panels, Exploratory Factor Analysis and Confirmatory Factor Analysis. Each panel shows indicators x1-x6 with errors e1-e6 and two factors, Factor 1 and Factor 2.]
Multiple Regression
[Figure: path diagram with predictors x1, x2, x3, outcome Y and error e.]
ANCOVA
[Figure: path diagram with Group, Pretest1, Pretest2, Posttest1, Posttest2 and errors e1 and e2.]
Multivariate Normality Assumption
Observed data summed up perfectly by
covariance matrix S (+ means M), S thus
is an estimator of the population
covariance
Consequences of Violation
Inflated $\chi^2$ & deflated CFI and TLI → reject plausible models
Inflated standard errors → attenuate factor loadings and relations of latent
variables (structural parameters)
(Cause: Sample covariances were underestimated)
Accommodating Strategies
Correcting Fit
Correcting standard errors
Bootstrapping
Transforming Nonnormal variables
Satorra-Bentler Scaled $\chi^2$ & Standard Errors
(estimator = mlm; in Mplus)
Transforming into new normal indicators
(undesirable)
SEM with Categorical Variables
Satorra-Bentler Scaled
S-B 2 = d-1(ML-based 2)
that incorporates kurtosis)
2
& SE
(d= Scaling factor
Effect: performs well with continuous data
in terms of 2, CFI, TLI, RMSEA,
parameter estimates and standard errors.
also works with certain-categorical
variables (See next slide)
Analysis: estimator = MLM;
Workable Categorical Data
[Figure: distribution plot of a workable categorical variable; only axis tick values survive.]
Nonworkable Categorical Data
[Figure: distribution plot of a nonworkable categorical variable; only axis tick values survive.]
Bootstrapping
(resampling of data)
Original    btstrp1    btstrp2 …
x y         x y        x y
1 5         5 3        1 3
2 4         1 1        5 4
3 3         3 2        4 1
4 2         4 5        2 2
5 1         2 4        3 5
. .         . .        . .
Limitation of Bootstrapping
Assumption: Sample = Population
Useful Diagnostic Tool
Does not compensate for
small or unrepresentative samples
severely non-normal data
absence of independent samples for cross-validation
Analysis: Bootstrap = 500 (standard/residual);
Output: stand cinterval;
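A sketch of how the two bootstrap statements fit into a complete input file. The file name, variable names and the two-factor model are hypothetical; only the BOOTSTRAP and CINTERVAL settings come from the slide.

```mplus
TITLE:    Bootstrap sketch (hypothetical file and variable names)
DATA:     FILE = example.dat;
VARIABLE: NAMES = x1-x3 y1-y3;
ANALYSIS: BOOTSTRAP = 500;            ! 500 bootstrap draws
MODEL:    f1 BY x1-x3;
          f2 BY y1-y3;
          f2 ON f1;
OUTPUT:   STANDARDIZED
          CINTERVAL (BOOTSTRAP);      ! bootstrap confidence intervals
```

CINTERVAL (BCBOOTSTRAP) can be requested instead when bias-corrected intervals are preferred.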
Mplus
www.statmodel.com
Multiple Programs Integrated
SEM of both continuous and categorical
variables
Multilevel modeling
Mixture modeling (identify hidden groups)
Complex survey data modeling
(stratification, clustering, weights)
Modern missing data treatment
Monte Carlo Simulations
Types of Mplus Files
Data (*.dat, *.txt)
Input (specify a model, <=80
columns/line)
Output (automatically produced)
Plot (automatically produced)
Data File Format
Free
Delimited by tab, space, or comma
All missing values must be flagged with
special numbers / symbols
Default in Mplus
Computationally slow with large data set
Fixed
Format = 3F3, 5F3.2, F5.1;
Mplus Input
DATA: File = ?
VARIABLE: Names=?; Usevar=?; Categ=?;
ANALYSIS: Type = ?
MODEL: (BY, ON, WITH)
OUTPUT: Stand;
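Filled in, the five commands might look like the sketch below. The file name, variable names and missing-value flag are hypothetical placeholders, not values from the workshop data.

```mplus
TITLE:    Minimal one-factor CFA (hypothetical names)
DATA:     FILE = example.dat;        ! free-format data file
VARIABLE: NAMES = id x1-x4;          ! all variables in the file, in order
          USEVARIABLES = x1-x4;      ! variables used in the model
          MISSING = ALL (-999);      ! flag used for missing values
ANALYSIS: TYPE = GENERAL;            ! default analysis type
MODEL:    f BY x1-x4;                ! f is measured by x1 to x4
OUTPUT:   STANDARDIZED;
```

Every statement ends with a semicolon, and comments start with an exclamation mark.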
Model Specification in Mplus
BY      Measured by (F by x1 x2 x3 x4)
ON      Regressed on (y on x)
WITH    Correlated with (x with y)
XWITH   Interact with (inter | F1 xwith F2)
PON     Pair ON (y1 y2 pon x1 x2 = y1 on x1; y2 on x2)
PWITH   Pair WITH (x1 x2 pwith y1 y2 = x1 with y1; x2 with y2)
Default Specification
Error or residual (disturbance)
Covariance of exogenous variables in CFA
Certain covariances of residuals (e.g., z1 with z2)
Graphic Model
[Figure: path diagram with five factors F1-F5, indicators y1-y15 and disturbances d3, d4, d5.]
Model Specification
Model:
f1 by y1-y3;
f2 by y4-y6;
f3 by y7-y9;
f4 by y10-y12;
f5 by y13-y15;
f3 on f1 f2;
f4 on f2;
f5 on f2 f3 f4 ;
Measurement errors are estimated automatically (default)
Practice
Prepare two data files for Mplus
Mediation.sav
Aggress.sav
Model Specification
Single Group CFA
Examine Mediation Effects in a Full SEM
Run a MIMIC model of aggressions
Multigroup CFA to examine measurement
invariance
SPSS Data
Missing Values?
Leave as blank to use fixed format
Recode into special number to use free format
Save as & choose file type
Fixed ASCII
Free *.dat (with or without variable names?)
Copy & paste variable names into Mplus
input file
Mplus Interface
Activate Mplus Program
Language Generator
Manually Create An Input File
Four Separate Files
(Mplus)
Data: best prepared with other programs
Input: need to manually specify a model
Output: automatic output window
Graph: automatic graph file
Data File
Individual Case Data (*.dat or *.txt)
Free Format (default)
Variable separated by tab, comma, or space
All missing values must be flagged with special
symbols or numbers
Fixed Format
Variable takes fixed space, e.g. 2F2, 4F6, 5F6.3
Missing values can be left blank
Summary Data
Variance-Covariance matrix, means
Correlation matrix, standard deviation, means
SPSS → Mplus
Open “Antisocial.sav” with SPSS
Work in Variable Window
Option 1: Fixed Format
Change Format to Simplify
Save as ? (Type=Fixed ASCII)
Option 2: Free Format
Recode missing values
Save as ? (Tab-delimited)
Fixed Format
F3 4F3.2 25F1
F3      One variable that takes 3 columns
4F3.2   4 variables, each taking 3 columns, with 2 decimal places
25F1    25 variables, each using one column
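In an input file, such a format goes into the DATA command. The fragment below is a sketch with a hypothetical file name and hypothetical variable names; the 30 names match the 1 + 4 + 25 variables described by the format.

```mplus
DATA:     FILE = example.dat;           ! hypothetical file name
          FORMAT = F3, 4F3.2, 25F1;     ! 1 + 4 + 25 = 30 variables
VARIABLE: NAMES = id s1-s4 i1-i25;      ! hypothetical names, 30 in total
```

The number of names has to equal the number of fields the format describes.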
Copy SPSS Variable Names
into Mplus
Menu: Utilities
Variables
Highlight to select variables
Paste
Go to Syntax Window
Select & Copy
Paste under Names Are in Mplus input
file
Practice now
SAS → Mplus
Assign flags to missing values (use Array
code for many variables)
Proc Export Data = Data File
Outfile = “Mplus input file folder\*.dat”
DBMS = dlm Replace;
Run;
Practice
Fixed Format Out of SAS
Open with SPSS
Save as Fixed Format
Practice
Stata2mplus
Converting a stata data file to *.dat
Find out:
http://www.ats.ucla.edu/stat/stata/faq/stata2mplus.htm
Modification Indices
Lower-bound estimate of the expected chi-square
decrease when a parameter fixed at 0
is freely estimated
MPlus Output: stand Mod(10);
Start with least important parameters
(covariance of errors)
Caution: justification?
Indirect (Mediation) Effect
A*B
Mplus specification:
Model Indirect: DV IND Mediator IV;
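A minimal sketch of the A*B specification for three observed variables. The names x (IV), m (mediator) and y (DV) are hypothetical, and the labels a and b are added only to mark the two paths.

```mplus
MODEL:    m ON x (a);        ! path A: IV -> mediator
          y ON m (b);        ! path B: mediator -> DV
          y ON x;            ! direct effect of the IV on the DV
MODEL INDIRECT:
          y IND m x;         ! indirect effect of x on y through m (A*B)
```

The IND statement follows the order given on the slide: DV first, then the mediator, then the IV.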
Model Comparison
Model:
Probabilistic statement about the relations of
variables
Imperfect but useful
Models Differ:
Different Variables and Different Relations
Same Variables but Different Relations
Nested Model
A Nested Model (b) comes from general
Model (a) by
Removing a parameter (e.g. a path)
Fixing a parameter at a value (e.g. 0)
Constraining parameter to be equal to another
Both models have the same variables
Test If A=B
[Figure: the five-factor path diagram (F1-F5, indicators y1-y15, disturbances d3-d5) with two paths labeled A and B.]
Model Comparison via $\chi^2$ Difference
$\chi^2$ = ___   df = ___   (Nested model)
$\chi^2$ = ___   df = ___   (Default model)
___________________________________
$\chi^2_{dif}$ = ___   $df_{dif}$ = ___
p = ? (a single tail)
Find p value at the following website:
http://www.tutor-homework.com/statistics_tables/statistics_tables.html
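Conclusion:
If p > .05, the nested model does not fit significantly worse than the default
model; that is, the hypothesis that the constrained parameters are equal
is not rejected. If p < .05, the constraint significantly worsens fit and the equality hypothesis is rejected.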
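A worked example with hypothetical values: a nested model with $\chi^2 = 58.3$ on 25 df is compared with a default model with $\chi^2 = 52.1$ on 24 df.

```latex
\begin{aligned}
\chi^2_{dif} &= \chi^2_{nested} - \chi^2_{default} = 58.3 - 52.1 = 6.2 \\
df_{dif} &= df_{nested} - df_{default} = 25 - 24 = 1 \\
\chi^2_{dif} &= 6.2 > 3.84 = \chi^2_{crit}(df = 1,\ \alpha = .05)
  \;\Rightarrow\; p < .05
\end{aligned}
```

In this hypothetical case the constraint significantly worsens fit. Simple subtraction applies to ordinary ML $\chi^2$ values; scaled statistics such as those from MLM require a scaled difference test instead.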
Practice
Test if effect A=B
Equality Constraints in Mplus
Parameter Labels:
Numbers
Letters
Combination of numbers and letters
Constraint (B=A)
F3 on F1 (A);
F3 on F2 (A);
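An alternative to rerunning the model with the equality constraint is a Wald test of the same hypothesis. This is a different technique from the $\chi^2$ difference test used on the slides, shown here only as a sketch; the labels a and b are hypothetical, and the fragment assumes the measurement part of the model is specified as before.

```mplus
MODEL:      f3 ON f1 (a);     ! effect A, freely estimated
            f3 ON f2 (b);     ! effect B, freely estimated
MODEL TEST: 0 = a - b;        ! Wald test of the hypothesis A = B
```

Because the two paths carry different labels, both are estimated freely and the test is reported in the same run.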
Run CFA with Real Data
[Figure: two-factor CFA diagram with the latent variables Verbal Aggression and Physical Aggression, indicators a3, a93, a94, a37, a57, a90 and errors e1-e6.]
Multigroup Analysis
VARIABLE:
USEVAR = X1 X2 X3 X4;
Grouping IS sex (0=F 1=M);
ANALYSIS: TYPE = MISSING H1;
MODEL:
F1 BY X1 - X4;
MODEL M:
F1 BY X2 - X4;
Note: sex is grouping variable
and is not used in the model.
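Why Measurement Invariance
Matters
$X_{g1} = \tau_{g1} + \Lambda_{g1} \xi_{g1} + \delta_{g1}$
$X_{g2} = \tau_{g2} + \Lambda_{g2} \xi_{g2} + \delta_{g2}$
$X_{g1} - X_{g2} = (\tau_{g1} - \tau_{g2}) + (\Lambda_{g1} \xi_{g1} - \Lambda_{g2} \xi_{g2}) + (\delta_{g1} - \delta_{g2})$
With invariant intercepts and loadings:
$X_{g1} - X_{g2} = \Lambda (\xi_{g1} - \xi_{g2}) + (\delta_{g1} - \delta_{g2})$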
Test Measurement Invariance
Default Model
Model:
F1 By a3
a93(1)
a94 (2);
F2 By a37
a57 (3)
a90 (4);
Model M:
F1 By
a93
a94;
F2 By
a57
a90;
Output: stand;
Note: Reference indicators in
the second group are omitted.
Test Measurement Invariance
Constrained Model
Model:
F1 By a3
a93(1)
a94 (2);
F2 By a37
a57 (3)
a90 (4);
Model M:
F1 By
a93 (1)
a94 (2);
F2 By
a57 (3)
a90 (4);
Output: stand;
Note: Reference indicators in
the second group are omitted.
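Estimate with Real Data
[Figure: path diagram with the observed variables Sex, Race1 and Race2 and the latent variables Verbal Aggression and Physical Aggression; indicators a3, a93, a94, a37, a57, a90 with errors e1-e6; disturbances d1 and d2.]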
SEM with Categorical
Indicators
Session II
Problems of Ordinal Scales
Not truly interval measure of a latent
dimension, having measurement errors
Limited range, biased against extreme
scores
Items are equally weighted (implicitly by
1) when summed up or averaged, losing
item sensitivity
Criticisms on Using Ordinal Scales
as Measures of Latent Constructs
Stevens (1951): …means should be avoided because
its meaning could be easily interpreted beyond ranks.
Merbitz (1989): Ordinal scales and foundations of
misinference
Muthen (1983): Pearson product moment correlations
of ordinal scales will produce distorted results in
structural equation modeling.
Wright (1998): “…misuses nonlinear raw scores or
Likert scales as though they were linear measures will
produce systematically distorted results. …It’s not only
unfair, it is immoral.”
Assumption of Categorical
Indicators
A categorical indicator is a coarse
categorization of a normally distributed
underlying dimension
Latent (Polychoric) Correlation
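Categorization of Latent Dimension
& Threshold
[Figure: a latent dimension Y divided by thresholds into response categories, for example No/Yes or Never/Sometimes/Often; categories are numbered 1, 2, …, m-1, m.]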
Threshold
The values of a latent dimension at which
respondents have 50% probability of
responding to two adjacent categories
Number of thresholds = response
categories – 1. e.g. a binary variable has
one threshold.
Mplus specification [x$1] [y$2];
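Under the assumption that the underlying dimension is standard normal, thresholds translate directly into category probabilities. A sketch for a three-category item with two hypothetical thresholds, $\tau_1 = -0.5$ and $\tau_2 = 1.0$ ($\Phi$ is the standard normal cumulative distribution):

```latex
\begin{aligned}
P(y = 1) &= \Phi(\tau_1) = \Phi(-0.5) \approx .309 \\
P(y = 2) &= \Phi(\tau_2) - \Phi(\tau_1) \approx .841 - .309 = .533 \\
P(y = 3) &= 1 - \Phi(\tau_2) \approx 1 - .841 = .159
\end{aligned}
```

Three categories need two thresholds, in line with the rule that the number of thresholds equals the number of response categories minus one.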
Normal Cumulative Distributions
Measurement Models of Categorical
Indicators (2P IRT)
Probit: $P(y = 1 | \eta) = \Phi[(-\tau + \lambda \eta)\,\theta^{-1/2}]$
(Estimation = Weighted Least Squares with df adjusted for
Means and Variances)
Logistic: $P(y = 1 | \eta) = 1 / (1 + e^{-(-\tau + \lambda \eta)})$
(Maximum Likelihood Estimation)
Converting CFA to IRT
Parameters
Probit Conversion
$a = \lambda \theta^{-1/2}$
$b = \tau / \lambda$
Logit Conversion
$a = \lambda / D$
$b = \tau / \lambda$
(D = 1.7)
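A worked probit conversion with hypothetical estimates: a loading $\lambda = 0.8$, a residual variance $\theta = 0.36$ and a threshold $\tau = 0.4$, assuming the conversion $a = \lambda \theta^{-1/2}$, $b = \tau / \lambda$ and a factor with mean 0 and variance 1.

```latex
\begin{aligned}
a &= \lambda\,\theta^{-1/2} = \frac{0.8}{\sqrt{0.36}} = \frac{0.8}{0.6} \approx 1.33 \\
b &= \frac{\tau}{\lambda} = \frac{0.4}{0.8} = 0.5
\end{aligned}
```

Here $a$ is the item discrimination and $b$ the item difficulty on the latent scale.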
One Parameter
Item Response Theory Model
Analysis: Estimator = ML;
Model:
F by x1@1
x2@1
…
xk@1;
Sample Information
Latent Correlation Matrix
equivalent to covariance matrix of
continuous indicators
Threshold matrix Δ
equivalent to means of continuous
indicators
Stages of Estimation
Sample information:
Correlations/threshold/intercepts
(Maximum Likelihood)
Correlation structure (Weighted Least Squares)
$F = \sum_{g=1}^{G} (s^{(g)} - \sigma^{(g)})' W^{(g)-1} (s^{(g)} - \sigma^{(g)})$
$W^{-1}$ matrix
Elements:
S1 intercepts or/and thresholds
S2 slopes
S3 residual variances and correlations
$W^{-1}$: divided by sample size
Estimation
WLSMV:
Weighted Least Squares estimation with a $\chi^2$ whose
degrees of freedom are adjusted for Means
and Variances of latent and observed
variables
Baseline Model
Estimated thresholds of all the categorical
indicators
df = $p^2 - 3p$
(p = number of polychoric correlations)
Data Preparation Tip
Categorical indicators are required to have
consistent response categories across
groups
Run Crosstab to identify zero cells
Recode variables to collapse certain
categories to eliminate zero cells
Inconsistent Categories
          1     2     3     4     5
Male      60    80    43    4     0
Female    57    86    32    16    2

After collapsing categories 4 and 5:
          1     2     3     4
Male      60    80    43    4
Female    57    86    32    18
Specify Dependent Variables
as Categorical
Variable:
Categ = x1-x3;
Categ = all;
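In context, the categorical declaration is usually paired with a suitable estimator. The fragment below is a sketch with hypothetical variable names and a hypothetical two-factor model; the DATA command is omitted.

```mplus
VARIABLE: NAMES = x1-x6;             ! hypothetical variable names
          CATEGORICAL = x1-x6;       ! declare binary/ordinal indicators
ANALYSIS: ESTIMATOR = WLSMV;         ! weighted least squares, probit link
MODEL:    f1 BY x1-x3;
          f2 BY x4-x6;
OUTPUT:   STANDARDIZED;
```

Only dependent variables (here the factor indicators) belong on the CATEGORICAL list.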
Reporting Results
Guidelines:
Conceptual Model
Software + Version
Data (continuous or categorical?)
Treatment of Missing Values
Estimation method
Model fit indices ($\chi^2$(df), p, CFI, TLI, RMSEA)
Measurement properties (factor loadings + reliability)
Structural parameter estimates (estimate,
significance, 95% confidence intervals)
($\beta$ = .23*, CI = .18~.28)
Reliability of Categorical Indicators
(variance approach)
$\omega = (\sum \lambda_i)^2 / [(\sum \lambda_i)^2 + \sum \psi_i^2]$,
where
$(\sum \lambda_i)^2$ = square (sum of standardized factor loadings)
$\sum \psi_i^2$ = sum of residual variances
i = items or indicators
$\psi_i^2 = 1 - \lambda_i^2$
McDonald, R. P. (1999). Test theory: A unified treatment (p.89) Mahwah,
New Jersey: Lawrence Erlbaum Associates.
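A worked example of the variance approach with three hypothetical standardized loadings, $\lambda = .7, .8, .6$, where each residual variance is taken as $1 - \lambda_i^2$:

```latex
\begin{aligned}
\Big(\sum \lambda_i\Big)^2 &= (.7 + .8 + .6)^2 = 2.1^2 = 4.41 \\
\sum (1 - \lambda_i^2) &= .51 + .36 + .64 = 1.51 \\
\text{reliability} &= \frac{4.41}{4.41 + 1.51} = \frac{4.41}{5.92} \approx .745
\end{aligned}
```

With only three moderately loading indicators, the composite reliability in this example is about .75.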
Calculator of Reliability
(Categorical Indicators)
SPSS reliability data
SPSS reliability syntax
Trouble Shooting Strategy
Start with one part of a big model
Ensure every part works
Estimate all parts simultaneously
Important Resources
Mplus Website:
www.statmodel.com
Papers:
http://www.statmodel.com/papers.shtml
Mplus discussions:
http://www.statmodel.com/cgi-bin/discus/discus.cgi