Transcript SEM

Structural Equation Models
Asma Alfadhel
Sarah Asio
Jimmy(Yuanshan) Cheng
10/10/2013
Outline:
• Part I: CFA
• Part II: SEM
  - SEM Plots
• Part III: Goodness of Fit
Part I (CFA) - Outline:
• Confirmatory Factor Analysis
• Available R Packages.
• The Lavaan Package.
• Model Description.
• Apply CFA and Interpret Results
Confirmatory Factor Analysis vs. EFA
• EFA: exploratory
  - All loadings are free to vary ("L" has no zeros)
  - Assumption: Cov(F) = I
• CFA: driven by theory
  - The number of factors
  - Correlations between factors, Cov(F) = Φ
  - Which items load onto which factors
  - CFA allows certain loadings to be constrained to zero
Diagram
Confirmatory Factor Analysis vs. SEM
• SEM: specifies the causality between factors
  - Directed arrows between latent variables
  - Called the structural model
• CFA: no directed arrows between latent factors
  - Called the measurement model
• CFA is frequently used as a first step to assess the proposed measurement model in a structural equation model. (Wikipedia)
Objective of CFA:
$\mathrm{Cov}(Y) = L\,\mathrm{Cov}(F)\,L^{T} + \Psi$
• Factors are uncorrelated with error terms, and error terms are uncorrelated with each other
• Cov(Y): the covariance of the observed variables
• Cov(F) = Φ, the covariance of the factors
$\mathrm{Cov}(Y) = L\,\Phi\,L^{T} + \Psi$
$\Sigma = \Sigma(\theta)$, where Σ is the observed covariance and Σ(θ) is the implied covariance
• Try to match the implied covariance with the observed covariance
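The matching objective can be illustrated with a small numerical sketch. The loadings, factor covariance, and error variances below are made-up values chosen for illustration (they are not estimates from the case study); the code only shows how the implied covariance Σ(θ) = L Φ L^T + Ψ is assembled from its pieces.

```r
# Hypothetical two-factor model with four indicators (illustrative values only)
L <- matrix(c(1.0, 0.0,
              0.8, 0.0,
              0.0, 1.0,
              0.0, 0.7), nrow = 4, byrow = TRUE)  # loadings; the zeros are the CFA constraints
Phi <- matrix(c(1.0, 0.3,
                0.3, 1.0), nrow = 2)              # Cov(F), the factor covariance
Psi <- diag(c(0.5, 0.6, 0.4, 0.7))                # uncorrelated error variances

# Implied covariance matrix of the observed variables
Sigma_implied <- L %*% Phi %*% t(L) + Psi
print(round(Sigma_implied, 3))
```

Estimation then amounts to choosing the free entries of L, Φ and Ψ so that this implied matrix is as close as possible to the observed covariance matrix.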
R Packages for SEM
• "sem" package: developed by John Fox; for a long time it was the only option in R.
• "OpenMx" package: developed by Steven Boker.
• "lavaan" package: developed by Yves Rosseel from Ghent University in Belgium.
The “lavaan” Package:
lavaan is an R package for latent variable analysis: *
 confirmatory factor analysis: function cfa()
 structural equation modeling: function sem()
 latent curve analysis / growth modeling: function growth()
 general mean/covariance structure modeling: function lavaan()
 (item response theory (IRT) models)
 (latent class + mixture models)
 (multilevel models)
More information:
Lavaan website.
lavaan: an R package for structural equation modeling.
Journal of Statistical Software
*http://users.ugent.be/~yrosseel/lavaan/lavaan1.pdf
“cfa” function:
Description: Fit a Confirmatory Factor Analysis (CFA) model.
Usage
cfa(model = NULL, data = NULL, meanstructure = "default", fixed.x = "default",
    orthogonal = FALSE, std.lv = FALSE, std.ov = FALSE, missing = "default",
    ordered = NULL, sample.cov = NULL, sample.cov.rescale = "default",
    sample.mean = NULL, sample.nobs = NULL, ridge = 1e-05, group = NULL,
    group.label = NULL, group.equal = "", group.partial = "", cluster = NULL,
    constraints = '', estimator = "default", likelihood = "default",
    information = "default", se = "default", test = "default",
    bootstrap = 1000L, mimic = "default", representation = "default",
    do.fit = TRUE, control = list(), WLS.V = NULL, NACOV = NULL,
    start = "default", verbose = FALSE, warn = TRUE, debug = FALSE)
Arguments
• model: A description of the user-specified model.
• data: An optional data frame containing the observed variables used in the model.
• std.lv: If TRUE, the metric of each latent variable is determined by fixing its variance to 1.0. If FALSE, the metric of each latent variable is determined by fixing the factor loading of its first indicator to 1.0.
• std.ov: If TRUE, all observed variables are standardized before entering the analysis.
• missing: If the data contain missing values, the default behavior is listwise deletion. If the missingness mechanism is MCAR (missing completely at random) or MAR (missing at random), the lavaan package provides case-wise (or 'full information') maximum likelihood estimation; set missing = "ML" to use it.
Model Description
• The dataset was collected by Sarah Asio.
• The original model consists of 12 factors and 42 observed indicators.
• For simplification, a sub-model was used; it consists of 4 factors and 23 observed variables.
• The dataset contains a sample of 381 responses from students.
• The items range in value from 1 to 6.
• The four factors: Team Innovation, Team Communication, Team Effort, Team Learning
Specifying the model: (Symbols)
• =~ "latent variable definition"
• latent variable =~ indicator1 + indicator2 + indicator3
• It defines how the latent variables are 'manifested by' a set of observed variables.
• The model syntax is this short because the function takes care of several things:
  - First, by default, the factor loading of the first indicator of a latent variable is fixed to 1, thereby fixing the scale of the latent variable.
  - Second, residual variances are added automatically.
  - Third, all exogenous latent variables are correlated by default.
http://lavaan.ugent.be/tutorial/cfa.html
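These defaults can be checked on a fitted model. The sketch below does not use the case-study data; as an assumption for illustration it uses the HolzingerSwineford1939 dataset that ships with lavaan and the three-factor model from the lavaan tutorial. The object names HS.model, fit_hs and pe are placeholders.

```r
library(lavaan)

# Three-factor example on the HolzingerSwineford1939 data shipped with lavaan
HS.model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit_hs <- cfa(HS.model, data = HolzingerSwineford1939)

pe <- parameterEstimates(fit_hs)
# The first loading of each factor is fixed to 1 (so its standard error is 0)
print(pe[pe$op == "=~", c("lhs", "op", "rhs", "est", "se")])
# Residual variances and factor (co)variances were added automatically
print(pe[pe$op == "~~", c("lhs", "op", "rhs", "est")])
```

None of the "~~" rows were written in the model string; they appear because cfa() adds them by default.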
Specifying the model: (Symbols)
• ~~ "Correlation" --- Correlated with
  - Residual variance
  - Covariance of each latent variable.
• ~ "Regression" --- Regressed on
  - This is used in specifying the SEM model.
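To see the three operators side by side, a model string can be parsed without any data. The sketch below uses lavaan's lavaanify() function; the variable names (F1, F2, y1 to y6) and the object names demo.model and partable are hypothetical placeholders, not part of the case study.

```r
library(lavaan)

# Hypothetical model string using all three operators
demo.model <- '
  F1 =~ y1 + y2 + y3     # latent variable definition
  F2 =~ y4 + y5 + y6
  F2 ~ F1                # regression: F2 regressed on F1
  y1 ~~ y2               # residual covariance between two indicators
'
# lavaanify() parses the syntax into a parameter table; no data are needed
partable <- lavaanify(demo.model)
print(partable[, c("lhs", "op", "rhs")])
```

Each row of the parameter table corresponds to one parameter, and the op column records which operator produced it.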
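Specifying the model:
# Load lavaan and specify the measurement model
library(lavaan)
Our.model <- 'CMM =~ CM9 + CM10 + CM11 + CM12 + CM13
EFF =~ EF14 + EF15 + EF16 + EF17
LN =~ LN18 + LN19 + LN20 + LN21 + LN22 + LN23 + LN24
INN =~ IN36 + IN37 + IN38 + IN39 + IN40 + IN41 + IN42'
# Fit the CFA model (MyData holds the survey responses)
fit <- cfa(Our.model, data = MyData)
# Print the estimates together with the fit measures
summary(fit, fit.measures = TRUE)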
Missing values, Standardization, & R2
# Standardize latent and observed variables; use full information ML for missing values
fit <- cfa(Our.model, data = MyData, std.lv = TRUE, std.ov = TRUE, missing = "ML")
summary(fit, fit.measures = TRUE, rsquare = TRUE)
OR
inspect(fit, "rsquare")  # R2 values without rounding
# Alternative: keep the default metric and request the standardized solution
fit <- cfa(Our.model, data = MyData, missing = "ML")
summary(fit, standardized = TRUE, rsquare = TRUE)
2nd Output
fit <- cfa(Our.model, data = MyData, std.lv = TRUE, std.ov = TRUE, missing = "ML")
inspect(fit, "rsquare")
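The effect of missing = "ML" can be seen by comparing how many cases each approach uses. This is a sketch under stated assumptions: it uses the HolzingerSwineford1939 data from lavaan rather than the case-study data, and the missing values are deleted artificially for illustration. The names dat, fit_lw and fit_fiml are placeholders.

```r
library(lavaan)

set.seed(1)
dat <- HolzingerSwineford1939[, paste0("x", 1:9)]
dat$x1[sample(nrow(dat), 30)] <- NA   # artificially delete 30 values (illustration only)

HS.model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit_lw   <- cfa(HS.model, data = dat)                  # default: listwise deletion
fit_fiml <- cfa(HS.model, data = dat, missing = "ML")  # full information ML

# Number of cases used by each fit
print(lavInspect(fit_lw, "nobs"))
print(lavInspect(fit_fiml, "nobs"))
```

Listwise deletion drops every respondent with a missing item, while full information ML keeps the partially observed cases.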
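CFA Syntax in "lavaan" vs "sem"
# Install semPlot once, then load it
install.packages("semPlot")
library(semPlot)
# Print the fitted model in lavaan syntax and in sem syntax
Lavaan.model <- semSyntax(fit, "lavaan")
Sem.model <- semSyntax(fit, "sem")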
 Output:
Part II Outline
• SEM process - Overview
• SEM Measurement models
• SEM Path diagram - Overview
• R-Code for:
  - SEM model specification
  - SEM model fitting
  - SEM Path Diagram
• Outputs for SEM model and path diagram
Structural Equation Modeling (SEM) process
1. Specify the model
2. Select measures of the theoretical model and collect data
3. Determine whether the model is identified
4. Analyze the model
5. Analyze the model fit
6. Finish, or re-specify the model and repeat the process
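For the identification step, a quick necessary (but not sufficient) check is the counting rule: the number of free parameters must not exceed the number of distinct variances and covariances of the observed variables. The sketch below applies it to the four-factor, 23-indicator sub-model from Part I, assuming the lavaan defaults described there (first loading of each factor fixed to 1, all factors allowed to covary). The variable names are illustrative.

```r
p <- 23                        # observed indicators in the sub-model
k <- 4                         # latent factors
n_moments <- p * (p + 1) / 2   # distinct variances and covariances

free_loadings   <- p - k       # the first loading of each factor is fixed to 1
resid_variances <- p           # one residual variance per indicator
factor_var_cov  <- k * (k + 1) / 2
n_free <- free_loadings + resid_variances + factor_var_cov

df <- n_moments - n_free
cat("moments:", n_moments, " free parameters:", n_free, " df:", df, "\n")
```

A non-negative df means the counting rule is satisfied; it does not by itself guarantee that every parameter is identified.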
“Factor Analysis, Path Analysis, and Structural Equations Modeling”, Book extract, Jones and Bartlett publishers.
http://www.jblearning.com/samples/0763755486/55485_CH14_Walker.pdf
SEM Measurement models
Exogenous measurement model: $X = B_x U + e_x$
Here:
• X is an (nx x 1) matrix of exogenous indicators,
• Bx is an (nx x p) matrix of coefficients from the exogenous variables to exogenous indicators,
• U is a (p x 1) matrix of exogenous latent variable(s),
• ex is an (nx x 1) matrix for error associated with the exogenous indicators.
Endogenous measurement model: $Y = B_y Z + e_y$
Here:
• Y is an (ny x 1) matrix of endogenous indicators,
• By is an (ny x q) matrix of coefficients from the endogenous variable to endogenous indicators,
• Z is a (q x 1) matrix of endogenous latent variable(s),
• ey is an (ny x 1) matrix for error associated with the endogenous indicators.
“Factor Analysis, Path Analysis, and Structural Equations Modeling”, Book extract, Jones and Bartlett publishers.
http://www.jblearning.com/samples/0763755486/55485_CH14_Walker.pdf
Overall SEM Measurement & Structural models
SEM model for the case study: $Z = B_z U + e_z$
Here:
• Z is the endogenous variable (Innovation),
• U is a (3 x 1) matrix of exogenous latent variable(s) (Communication, Effort, Learning),
• Bz is a (1 x 3) matrix of coefficients of exogenous variables,
• ez is the error associated with the endogenous variable.
“Factor Analysis, Path Analysis, and Structural Equations Modeling”, Book extract, Jones and Bartlett publishers.
http://www.jblearning.com/samples/0763755486/55485_CH14_Walker.pdf
Matrix representation for SEM measurement models
$X = B_x U + e_x$
$Y = B_y Z + e_y$
$Z = B_z U + e_z$
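The dimensions in these equations can be made concrete by simulating from them. The sketch below is illustrative only: the coefficient values, error standard deviations, number of respondents, and the choice of three endogenous indicators are all made-up assumptions, and each column of a matrix holds one simulated respondent.

```r
set.seed(42)
n <- 500                                    # simulated respondents (hypothetical)

U  <- matrix(rnorm(3 * n), nrow = 3)        # 3 exogenous latent variables (3 x n)
Bz <- matrix(c(0.5, 0.3, 0.2), nrow = 1)    # 1 x 3 structural coefficients (made up)
Z  <- Bz %*% U + rnorm(n, sd = 0.5)         # structural model: Z = Bz U + ez (1 x n)

By <- matrix(c(1.0, 0.8, 0.9), ncol = 1)    # 3 x 1 loadings of three endogenous indicators (made up)
Y  <- By %*% Z + matrix(rnorm(3 * n, sd = 0.4), nrow = 3)  # measurement model: Y = By Z + ey

print(dim(Z))
print(dim(Y))
```

With q = 1 endogenous latent variable and p = 3 exogenous latent variables, this matches the (1 x 3) shape of Bz stated for the case study.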
SEM Path diagram - Overview
• A path diagram is a graphical representation of the hypothesized relationships between the variables.
• Exogenous: emits arrows (analogous to independent variables).
  - communication, effort and learning
• Endogenous: receives arrows (analogous to dependent variables).
  - innovation and measures
• Other variables are error terms, which account for random or measurement error for endogenous variables.
http://en.wikipedia.org/wiki/Structural_equation_modeling
Path Diagram Node representations
http://people.ucsc.edu/~zurbrigg/psy214b/09SEM3a.pdf
R-Code for SEM model specification
#Install (once) and load the lavaan package
install.packages("lavaan")
library("lavaan")
#Specify the model
Our.model <- '
CMM =~ CM9 + CM10 + CM11 + CM12 + CM13
EFF =~ EF14 + EF15 + EF16 + EF17
LN =~ LN18 + LN19 + LN20 + LN21 + LN22 + LN23 + LN24
INN =~ IN36 + IN37 + IN38 + IN39 + IN40 + IN41 + IN42
INN ~ CMM + EFF + LN'
R-Code for SEM model fitting
# Fit SEM model using standardized data
fit <- lavaan::sem(Our.model, data = SEMdata, std.lv = TRUE, std.ov = TRUE, missing = "ML")
summary(fit, standardized = TRUE, fit.measures = TRUE, rsquare = TRUE)
Syntax definitions:
std.lv: If TRUE, the metric of each latent variable is determined by fixing its variance to 1.0. If FALSE, the metric of each latent variable is determined by fixing the factor loading of its first indicator to 1.0.
std.ov: If TRUE, all observed variables are standardized before entering the analysis.
missing: If "listwise", cases with missing values are removed listwise from the data frame before analysis. If "direct", "ml" or "fiml" and the estimator is maximum likelihood, Full Information Maximum Likelihood (FIML) estimation is applied to all available data in the data frame.
http://cran.r-project.org/web/packages/lavaan/lavaan.pdf
R-Code for SEM Path Diagram
#Install (once) and load the semPlot package
install.packages("semPlot")
library("semPlot")
# Plot input path diagram
semPaths(fit, title = FALSE, curvePivot = TRUE, exoVar = FALSE, exoCov = FALSE)
# Plot output path diagram with standardized parameters
semPaths(fit, "std", curvePivot = TRUE, exoVar = FALSE, exoCov = FALSE)
For more options and Syntax definitions, refer to:
http://cran.r-project.org/web/packages/semPlot/semPlot.pdf
Input Path diagram
Output Path Diagram
Part III (Goodness of fit) - Outline
• Introduction to fit indices
• Using R to show these indices
• Modification indices
Goodness of fit
• Model fit: "how the model that best represents the data reflects underlying theory"
• The population covariance matrix (Σ) should match the implied covariance matrix (Σ(θ))
• There is not yet agreement on
  - Which indices to use
  - Cut-offs for the various indices
Hooper et al. (2008)
Overview of Indices
Types of index | Description | Examples
Absolute fit indices | How well an a priori model fits the sample data (McDonald and Ho, 2002) | Chi-Square, RMSEA, GFI, AGFI, RMR, SRMR
Incremental fit indices | AKA comparative (Miles and Shevlin, 2007) or relative fit indices (McDonald and Ho, 2002) | NFI, NNFI, CFI
Parsimony fit indices | Overcome the problem of high fit for a less rigorous theoretical model; penalize for model complexity (Hooper et al., 2008) | AIC, BIC
Hooper et al. (2008)
Benchmarks Summary
Indices | Acceptable threshold levels | Comments
Chi-Square χ² | p value (p > 0.05) | Sensitive to sample size
RMSEA | Less than 0.07 (Steiger, 2007) | Has a known distribution. Favors parsimony.
SRMR | Less than 0.08 (Hu and Bentler, 1999) | Standardized root mean square residual
CFI | Greater than 0.95 |
TLI | Greater than 0.95 |
AIC; BIC | Smaller is better | Performs well in simulation studies (Sharma et al., 2005)
Hooper et al. (2008)
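These indices can be pulled from a fitted lavaan model with fitMeasures(). The sketch below is an illustration under assumptions: it uses the HolzingerSwineford1939 example data from lavaan rather than the case-study data, and it compares the result with the thresholds in the table above. The names fit_hs, idx and thresholds_met are placeholders.

```r
library(lavaan)

HS.model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit_hs <- cfa(HS.model, data = HolzingerSwineford1939)

# Request only the indices listed in the benchmark table
idx <- fitMeasures(fit_hs, c("chisq", "df", "pvalue", "rmsea", "srmr",
                             "cfi", "tli", "aic", "bic"))
print(round(idx, 3))

# Compare against the thresholds from the table (AIC and BIC have no cut-off)
thresholds_met <- c(
  rmsea = as.numeric(idx["rmsea"]) < 0.07,
  srmr  = as.numeric(idx["srmr"])  < 0.08,
  cfi   = as.numeric(idx["cfi"])   > 0.95,
  tli   = as.numeric(idx["tli"])   > 0.95
)
print(thresholds_met)
```

AIC and BIC are only meaningful when comparing two or more models fitted to the same data, so they are reported rather than checked.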
Reporting Strategy
• It is not necessary to report all indices
• Do not report only the indices that look good
• CFI, GFI, NFI, and NNFI are the most commonly reported (McDonald and Ho, 2002)
Hooper et al. (2008)
Reporting Strategy
• Hooper et al. (2008)
  - Chi-Square, df, p-value
  - RMSEA, SRMR, CFI and one parsimony fit index
• Two-index presentation strategy (Hu and Bentler, 1999)
  - TLI and SRMR
  - RMSEA and SRMR
  - CFI and SRMR
Modification indices
• Used to improve the model fit by freeing fixed parameters
  - CFA is structured by theory
  - Each factor measures only certain, not all, observable measures
  - The remaining parameters are assumed to be zero
  - Error correlations are assumed to be zero
  - Just a practical standard (Westfall et al., 2012)
Wikipedia
Freeing fixed parameters
[Diagram: factors F1 and F2, indicators X1 to X4, error terms e1 to e4]
Modification Indices
• Don't allow modification indices to drive the process
• Any modification should make theoretical sense
• It is good practice to assess the fit
Hooper et al. (2008)
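In lavaan, modification indices are available through modificationIndices(). The sketch below is an assumption-laden illustration on the HolzingerSwineford1939 example data (not the case-study data); the cut-off of 10 passed to minimum.value is an arbitrary choice for display, not a recommended threshold.

```r
library(lavaan)

HS.model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit_hs <- cfa(HS.model, data = HolzingerSwineford1939)

# List the fixed parameters whose release would reduce chi-square the most
mi <- modificationIndices(fit_hs, sort. = TRUE, minimum.value = 10)
print(head(mi[, c("lhs", "op", "rhs", "mi", "epc")], 5))
```

Rows with op "=~" suggest cross-loadings and rows with op "~~" suggest correlated errors; each candidate should be freed only if it makes theoretical sense.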