Structural Equation Modeling - Appalachian State University


Structural Equation Modeling
Intro to SEM
Other Names
• SEM – Structural Equation Modeling
• CSA – Covariance Structure Analysis
• Causal Models
• Simultaneous Equation Modeling
• Path Analysis (with Latent Variables)
• Confirmatory Factor Analysis
SEM in a nutshell
• Combination of factor analysis and regression
– Continuous and discrete predictors and outcomes
– Relationships among measured or latent variables
• Direct link between path diagrams, equations, and fit statistics
• Models contain both measurement and path models
An Example of a Path Diagram
[Path diagram: latent factors Negative Parental Influence and Depression; measured variables Gender, Dep parent, Insecure Attachment, Neglect, BDI, CES-D, ZDRS; E = measurement errors, D = disturbance]
Vocabulary
• Measured variable
– Observed variables, indicators, or manifest variables in an SEM design
– Predictors and outcomes in path analysis
– Squares in the diagram
• Latent Variable
– Unobservable variable in the model; factor, construct
– Construct driving measured variables in the measurement model
– Circles in the diagram
More Vocabulary
• Error or E
– Variance left over after prediction of a measured variable
• Disturbance or D
– Variance left over after prediction of a factor
• Exogenous Variable
– Variable that predicts other variables
• Endogenous Variable
– A variable that is predicted by another variable
– A predicted variable is endogenous even if it in turn predicts another variable
Still more Vocabulary
• Measurement Model
– The part of the model that relates indicators to latent factors
– The measurement model is the factor analytic part of SEM
• Path Model
– The part of the model that relates variables or factors to one another (prediction)
– If no factors are in the model, then only a path model exists between indicators
Even more Vocabulary
• Direct Effect
– Regression coefficients of direct prediction
• Indirect Effect
– Mediating effect of x1 on y through x2
• Confirmatory Factor Analysis
• Covariance Structure
– Relationships based on variances and covariances
• Mean Structure
– Includes means (intercepts) in the model
Back to Path Diagrams
• Single-headed arrow →
– This is prediction
– Regression coefficient or factor loading
• Double-headed arrow ↔
– This is correlation
• Missing Paths
– Hypothesized absence of relationship
– Can also set a path to zero
The Previous Example
[Path diagram repeated from above: latent factors Negative Parental Influence and Depression; measured variables Gender, Dep parent, Insecure Attachment, Neglect, BDI, CES-D, ZDRS; E = measurement errors, D = disturbance]
Types of SEM questions
• Does the model produce an estimated population covariance matrix that "fits" the sample data?
– SEM calculates many indices of fit: close fit, absolute fit, etc.
• Which model best fits the data?
• What is the percent of variance in the variables explained by the factors?
• What is the reliability of the indicators?
• What are the parameter estimates from the model?
SEM questions
• Are there any indirect or mediating effects in the model?
• Are there group differences?
– Multi-group models
• Can change in the variance (or mean) be tracked over time?
– Growth Curve or Latent Growth Curve Analysis
SEM questions
• Can a model be estimated with individual and group level components?
– Multilevel Models
• Can latent categorical variables be estimated?
– Mixture models
• Can latent group membership be estimated from continuous and discrete variables?
– Latent Class Analysis
SEM questions
• Can we predict the rate at which people will drop out of a study or end treatment?
– Discrete-time survival mixture analysis
• Can these techniques be combined into a huge mess?
– Multiple group multilevel growth curve latent class analysis???????
SEM limitations
• SEM is a confirmatory approach
– You need an established theory about the relationships
– It cannot be used to explore possible relationships when you have more than a handful of variables
– Exploratory methods (e.g., model modification) can be used on top of the original theory
– SEM does not establish causality; only experimental design does
SEM limitations
• SEM is often thought of as strictly correlational, but it can be used (like regression) with experimental data if you know how to use it.
• SEM is a sophisticated technique, but it does not make up for a bad experiment, and the results can only be generalized to the population at hand.
SEM limitations
• The biggest limitation is sample size
– It needs to be large to get stable estimates of the covariances/correlations
– 200 subjects for a small to medium sized model
– A minimum of 10 subjects per estimated parameter
– Also affected by effect size and required power
SEM limitations
• Missing data
– Can be dealt with in the typical ways (e.g., regression, EM algorithm) through SPSS and data screening
– Most SEM programs will estimate missing data and run the model simultaneously
• Multivariate normality and no outliers
– Screen for univariate and multivariate outliers
– SEM programs have tests for multivariate normality
– SEM programs have corrected estimators for use when there is a violation
SEM limitations
• Linearity
• No multicollinearity/singularity
• Residual covariances (R minus reproduced R)
– Should be small
– Centered around zero
– Symmetric distribution of errors
– If asymmetric, then some covariances are being estimated better than others
Technical Stuff Follows
Basic Structure
Simple regression: y = βx + ζ

Implied covariance matrix:

$$\Sigma(\theta) = \begin{bmatrix} \sigma_{yy} & \sigma_{yx} \\ \sigma_{xy} & \sigma_{xx} \end{bmatrix} = \begin{bmatrix} \beta^2\sigma_{xx} + \mathrm{Var}(\zeta) & \beta\sigma_{xx} \\ \beta\sigma_{xx} & \sigma_{xx} \end{bmatrix}$$

where σ_xy = Cov(y, x) = βσ_xx, and Σ(θ) is the implied covariance matrix.
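As a quick check of the implied covariance matrix above, here is a minimal numpy sketch that plugs in illustrative (assumed) values for β, Var(x), and Var(ζ) and compares Σ(θ) with the sample covariance of simulated data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) parameter values for y = beta*x + zeta
beta, var_x, var_zeta = 0.8, 2.0, 1.0

# Implied covariance matrix Sigma(theta) for (y, x)
sigma = np.array([
    [beta**2 * var_x + var_zeta, beta * var_x],
    [beta * var_x,               var_x       ],
])

# Simulate data from the model and compare the sample covariance to Sigma(theta)
n = 100_000
x = rng.normal(0.0, np.sqrt(var_x), n)
y = beta * x + rng.normal(0.0, np.sqrt(var_zeta), n)
S = np.cov(np.vstack([y, x]))

print("Implied Sigma(theta):\n", sigma)
print("Sample covariance S:\n", S.round(3))
```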
The univariate consequences of measurement error

x = True Score + Error = ξ + δ

⇒ Var(x) = Var(ξ) + Var(δ) = φ + θ_δ

Thus, Var(x) overestimates the variance of the true score.
The bivariate consequences of measurement error

A simple regression model with measurement error:

x = ξ + δ_x
y = γξ + δ_y

The observed regression slope is

$$\beta^* = \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(x)} = \frac{\gamma\,\mathrm{Var}(\xi)}{\mathrm{Var}(x)} = \gamma\,\rho_{xx}$$

so that y = β*x + ζ, where ρ_xx is the measurement reliability of x.
The bivariate consequences of measurement error: impact on goodness-of-fit

$$\mathrm{Cor}(x, y) = \frac{\mathrm{Cov}(x, y)}{\sqrt{\sigma_{xx}\sigma_{yy}}} = \frac{\mathrm{Cov}(\xi, \eta)}{\sqrt{\mathrm{Var}(\xi)\,\mathrm{Var}(\eta)}}\,\sqrt{\rho_{xx}\rho_{yy}}$$

so the observed correlation is attenuated by the square root of the product of the reliabilities.

What's the impact on sample inference? Generally, the distortions are not as systematic for multiple regression and simultaneous equation models.
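The attenuation results can be illustrated numerically. The following sketch uses assumed values for γ and the variances, simulates the model above, and checks that the observed slope is roughly γρ_xx and the observed correlation is roughly the true correlation times √(ρ_xx ρ_yy).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Assumed true-score model: eta = gamma*xi + zeta, observed with error
gamma, var_xi, var_zeta = 0.6, 1.0, 0.5
var_dx, var_dy = 0.5, 1.0                     # measurement error variances

xi = rng.normal(0, np.sqrt(var_xi), n)
eta = gamma * xi + rng.normal(0, np.sqrt(var_zeta), n)
x = xi + rng.normal(0, np.sqrt(var_dx), n)    # x = xi + delta_x
y = eta + rng.normal(0, np.sqrt(var_dy), n)   # y = eta + delta_y

rho_xx = var_xi / (var_xi + var_dx)                 # reliability of x
rho_yy = np.var(eta) / (np.var(eta) + var_dy)       # reliability of y

# Slope attenuation: observed slope ~= gamma * rho_xx
b_obs = np.cov(x, y)[0, 1] / np.var(x)
print("observed slope:", round(b_obs, 3), " gamma*rho_xx:", round(gamma * rho_xx, 3))

# Correlation attenuation: observed r ~= true r * sqrt(rho_xx * rho_yy)
r_true = np.corrcoef(xi, eta)[0, 1]
r_obs = np.corrcoef(x, y)[0, 1]
print("observed r:", round(r_obs, 3),
      " true r * sqrt(rho_xx*rho_yy):", round(r_true * np.sqrt(rho_xx * rho_yy), 3))
```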
Confirmatory Factor Analysis Model
x = Λξ + δ

Where:
x = (q × 1) vector of indicator/manifest variables
ξ = (n × 1) vector of latent constructs (factors)
δ = (q × 1) vector of errors of measurement
Λ = (q × n) matrix of factor loadings
Confirmatory Factor Analysis: Example

Measures for positive emotions (ξ1):
– x1 = Happiness, x2 = Pride
Measures for negative emotions (ξ2):
– x3 = Sadness, x4 = Fear

Model:
x1 = λ11ξ1 + δ1
x2 = λ21ξ1 + δ2
x3 = λ32ξ2 + δ3
x4 = λ42ξ2 + δ4
Confirmatory Factor Analysis: Example
 x 1    11
 x  
 2    21
x 3   0
  
x 4   0
x 
0 
 1 



0    1   2 



 32   2   3 

 
 42 
 4 

 
Confirmatory Factor Analysis
Graphical Representation
[Path diagram: ξ1 → x1 (λ11) and x2 (λ21); ξ2 → x3 (λ32) and x4 (λ42); each indicator has an error term δ1–δ4; ξ1 and ξ2 are correlated]
Confirmatory Factor Analysis
Model Assumptions
E() = 0
E() = 0
 E( x )  0
Var() = 
Var() = 
Cov(, ) = 0
 Var (x)    
Implied Mean Vector
Implied Covariance Matrix
Confirmatory Factor Analysis
Example
Error covariance matrix (diagonal) and factor covariance matrix:

$$\Theta_\delta = \begin{bmatrix} \mathrm{var}(\delta_1) & 0 & 0 & 0 \\ 0 & \mathrm{var}(\delta_2) & 0 & 0 \\ 0 & 0 & \mathrm{var}(\delta_3) & 0 \\ 0 & 0 & 0 & \mathrm{var}(\delta_4) \end{bmatrix}
\qquad
\Phi = \begin{bmatrix} \phi_{11} & \phi_{12} \\ \phi_{12} & \phi_{22} \end{bmatrix}$$

Implied covariance matrix:

$$\Sigma(\theta) = \Lambda\Phi\Lambda' + \Theta_\delta =
\begin{bmatrix}
\lambda_{11}^2\phi_{11} + \theta_{11} & & & \\
\lambda_{21}\lambda_{11}\phi_{11} & \lambda_{21}^2\phi_{11} + \theta_{22} & & \\
\lambda_{32}\lambda_{11}\phi_{12} & \lambda_{32}\lambda_{21}\phi_{12} & \lambda_{32}^2\phi_{22} + \theta_{33} & \\
\lambda_{42}\lambda_{11}\phi_{12} & \lambda_{42}\lambda_{21}\phi_{12} & \lambda_{42}\lambda_{32}\phi_{22} & \lambda_{42}^2\phi_{22} + \theta_{44}
\end{bmatrix}$$

(lower triangle shown; the matrix is symmetric)
Confirmatory Factor Analysis
Model Identification
Definition:
The set of parameters θ = {Λ, Φ, Θ_δ} is not identified if there exist θ1 ≠ θ2 such that Σ(θ1) = Σ(θ2).
Confirmatory Factor Analysis
Is the one-factor, two-indicator model identified?
• Example: measures for temperature ξ: x1 = Celsius, x2 = Fahrenheit
• Measurement model:

x1 = τ1 + λ11ξ + δ1
x2 = τ2 + λ21ξ + δ2

where τ1 and τ2 are measurement intercepts.

[Path diagram: ξ → x1 (λ11) and x2 (λ21), with errors δ1 and δ2]
Confirmatory Factor Analysis
Scale indeterminacy
Recall the measurement model:

x1 = τ1 + λ11ξ + δ1
x2 = τ2 + λ21ξ + δ2

Origin indeterminacy ⇒ set E(ξ) = 0
Scale (unit) indeterminacy ⇒ set λ11 = 1 or Var(ξ) = 1

How should single-indicator factors be handled?
Confirmatory Factor Analysis
The one-factor, two-indicator model is under-identified.

Population covariance matrix:

$$\Sigma = \begin{bmatrix} 10 & 5 \\ 5 & 10 \end{bmatrix}$$

Implied covariance matrix:

$$\Sigma(\theta) = \begin{bmatrix} \lambda_{11}^2\phi_{11} + \theta_{11} & \lambda_{21}\lambda_{11}\phi_{11} \\ \lambda_{21}\lambda_{11}\phi_{11} & \lambda_{21}^2\phi_{11} + \theta_{22} \end{bmatrix}$$

Solution 1 (λ11 = 1): φ11 = 5, λ21 = 1, θ11 = 5, θ22 = 5
Solution 2 (λ11 = 1): φ11 = 2.5, λ21 = 2.0, θ11 = 7.5, θ22 = 0

Both parameter sets reproduce Σ exactly, so the parameters are not identified.
Confirmatory Factor Analysis
Is the one-factor three-indicator model identified?
11  11
( )   2111
 3111
 11
   21
 31
22111   22
312111
 22
 32



 33 



2
3111   33 
1
1
21
31
x1
x2
x3
1
2
3
Confirmatory Factor Analysis
The one-factor three-indicator model is exactly identified
31   32  21
21   32  21
11   21 31  32
11   11  11
2
 22   22  2111
2
 33   33  3111
Confirmatory Factor Analysis
Identification Rules
• Number of free parameters ≤ ½ q(q + 1)
• Three-Indicator Rule (n ≥ 1)
– one non-zero element per row of Λ
– three or more indicators per factor
– Θ_δ diagonal
• Two-Indicator Rule (n > 1)
– φij ≠ 0 for at least one pair i, j, i ≠ j
– one non-zero element per row of Λ
– two or more indicators per factor
– Θ_δ diagonal
Confirmatory Factor Analysis
Maximum Likelihood Estimation
$$x_i \sim \text{i.i.d. } \mathrm{MVN}_q(0, \Sigma(\theta)), \qquad i = 1, \ldots, N$$

$$L = \prod_{i=1}^{N} (2\pi)^{-q/2}\,|\Sigma(\theta)|^{-1/2} \exp\!\left(-\tfrac{1}{2}\, x_i'\Sigma^{-1}(\theta)x_i\right)
  = (2\pi)^{-Nq/2}\,|\Sigma(\theta)|^{-N/2} \exp\!\left(-\tfrac{1}{2}\sum_{i=1}^{N} x_i'\Sigma^{-1}(\theta)x_i\right)$$

$$\log L = \text{Constant} - \frac{N}{2}\log|\Sigma(\theta)| - \frac{1}{2}\sum_{i=1}^{N} x_i'\Sigma^{-1}(\theta)x_i
        = \text{Constant} - \frac{N}{2}\left(\log|\Sigma(\theta)| + \mathrm{tr}\!\left(S\,\Sigma^{-1}(\theta)\right)\right)$$
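Dedicated SEM software maximizes this likelihood directly, but the same idea can be sketched with a general-purpose optimizer by minimizing the ML fit function log|Σ(θ)| + tr(SΣ⁻¹(θ)) − log|S| − q. The sketch below uses the hypothetical S from the earlier example and the one-factor, three-indicator model with λ11 fixed to 1; for this exactly identified model the minimum is essentially zero and the estimates should match the closed-form solution given earlier.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical sample covariance matrix and sample size
S = np.array([[6.0, 4.0, 3.0],
              [4.0, 9.0, 6.0],
              [3.0, 6.0, 8.0]])
N, q = 300, S.shape[0]

def implied(theta):
    """One-factor model, lambda_11 fixed to 1: theta = (lam21, lam31, phi11, th11, th22, th33)."""
    lam21, lam31, phi11, th11, th22, th33 = theta
    lam = np.array([[1.0], [lam21], [lam31]])
    return phi11 * (lam @ lam.T) + np.diag([th11, th22, th33])

def f_ml(theta):
    """ML fit function: log|Sigma| + tr(S Sigma^-1) - log|S| - q."""
    sigma = implied(theta)
    sign, logdet = np.linalg.slogdet(sigma)
    if sign <= 0:
        return 1e10                       # keep the search in the admissible region
    return logdet + np.trace(S @ np.linalg.inv(sigma)) - np.linalg.slogdet(S)[1] - q

start = np.ones(6)
res = minimize(f_ml, start, method="Nelder-Mead",
               options={"maxiter": 5000, "xatol": 1e-10, "fatol": 1e-12})
print("estimates:", res.x.round(3))       # should be close to (2, 1.5, 2, 4, 1, 3.5) for this S
print("chi-square:", round(N * res.fun, 3))   # N * F_ML is the likelihood-ratio statistic
```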
Confirmatory Factor Analysis
Other Estimation Methods
• Unweighted Least Squares:

$$\min_\theta\; F_{ULS} = \tfrac{1}{2}\,\mathrm{tr}\!\left[\left(S - \Sigma(\theta)\right)^2\right]$$

• Generalized Least Squares:

$$\min_\theta\; F_{GLS} = \tfrac{1}{2}\,\mathrm{tr}\!\left[\left(I - \Sigma(\theta)S^{-1}\right)^2\right]$$
Confirmatory Factor Analysis
The Asymptotic Covariance Matrix
$$H = (N-1)\,\mathrm{E}\!\left[\left(\frac{\partial \log L}{\partial \theta}\right)\left(\frac{\partial \log L}{\partial \theta}\right)'\right] = \text{Information Matrix}$$

$$\mathrm{Asy\,Cov}(\hat{\theta}) = H^{-1}$$
Confirmatory Factor Analysis
Goodness-of-fit measures
Root Mean-Square Residual (of the covariance residuals):

$$RMR = \left[\frac{\sum_{i=1}^{q}\sum_{j=1}^{i}(s_{ij} - \hat{\sigma}_{ij})^2}{q(q+1)/2}\right]^{1/2}$$

Correlation residuals: $r_{ij} - \hat{r}_{ij}$

Goodness-of-Fit Index:

$$GFI = 1 - \frac{\mathrm{tr}\!\left[(\hat{\Sigma}^{-1}S - I)^2\right]}{\mathrm{tr}\!\left[(\hat{\Sigma}^{-1}S)^2\right]}$$

Communalities/Reliabilities:

$$R^2_{x_i} = 1 - \frac{\widehat{\mathrm{var}}(\delta_i)}{\hat{\sigma}_{ii}}$$

Coefficient of Determination:

$$R^2 = 1 - \frac{|\hat{\Theta}_\delta|}{|\hat{\Sigma}|}$$
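A minimal numpy sketch of RMR and GFI as defined above; the S and Σ̂ values below are hypothetical.

```python
import numpy as np

def rmr(S, sigma_hat):
    """Root mean-square residual over the lower triangle (including the diagonal)."""
    q = S.shape[0]
    idx = np.tril_indices(q)
    resid = (S - sigma_hat)[idx]
    return np.sqrt(np.sum(resid**2) / (q * (q + 1) / 2))

def gfi(S, sigma_hat):
    """Goodness-of-fit index: 1 - tr[(Sigma^-1 S - I)^2] / tr[(Sigma^-1 S)^2]."""
    A = np.linalg.inv(sigma_hat) @ S
    I = np.eye(S.shape[0])
    return 1 - np.trace((A - I) @ (A - I)) / np.trace(A @ A)

# Hypothetical S and a slightly misfitting Sigma_hat
S = np.array([[6.0, 4.0, 3.0],
              [4.0, 9.0, 6.0],
              [3.0, 6.0, 8.0]])
sigma_hat = np.array([[6.0, 3.8, 3.1],
                      [3.8, 9.0, 6.1],
                      [3.1, 6.1, 8.0]])
print("RMR:", round(rmr(S, sigma_hat), 4), " GFI:", round(gfi(S, sigma_hat), 4))
```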
Confirmatory Factor Analysis
Goodness-of-fit measures
$$H_0: \Sigma = \Sigma(\theta) \quad \text{vs} \quad H_1: \Sigma \text{ unrestricted}$$

Likelihood-ratio test:

$$F = -2\log\!\left(\frac{L_0}{L_1}\right) \sim \chi^2_{\frac{q(q+1)}{2} - t}$$

where t is the number of free parameters. Do not reject $H_0$ if $F \le \chi^2_{\frac{q(q+1)}{2} - t,\;0.05}$.

$$\log L_0 = -\frac{N}{2}\left(\log|\hat{\Sigma}| + \mathrm{tr}(\hat{\Sigma}^{-1}S)\right)$$

$$\log L_1 = -\frac{N}{2}\left(\log|S| + \mathrm{tr}(I)\right) = -\frac{N}{2}\left(\log|S| + q\right)$$

$$F = N\left(\log|\hat{\Sigma}| + \mathrm{tr}(\hat{\Sigma}^{-1}S) - \log|S| - q\right) = N \times \text{Fit Function}$$
Confirmatory Factor Analysis
Other Goodness-of-fit indices
• Root Mean Square Error of Approximation:

$$RMSEA = \sqrt{\frac{F - df}{(N-1)\,df}}$$

where df = q(q+1)/2 − t (degrees of freedom).

• RMSEA ≤ 0.05 ⇒ close fit
• 0.05 < RMSEA ≤ 0.08 ⇒ reasonable fit
• RMSEA > 0.1 ⇒ poor fit
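The chi-square statistic F = N × fit function and RMSEA can be computed directly from S and Σ̂, following the formulas on these slides. A sketch using the same hypothetical inputs as before (N, t, and both matrices are assumed values):

```python
import numpy as np
from scipy.stats import chi2

def lrt_and_rmsea(S, sigma_hat, N, t):
    """Likelihood-ratio chi-square F = N*F_ML and RMSEA, per the formulas on the slides."""
    q = S.shape[0]
    f_ml = (np.linalg.slogdet(sigma_hat)[1]
            + np.trace(S @ np.linalg.inv(sigma_hat))
            - np.linalg.slogdet(S)[1] - q)
    F = N * f_ml
    df = q * (q + 1) // 2 - t                 # df = q(q+1)/2 - number of free parameters
    p = chi2.sf(F, df)
    rmsea = np.sqrt(max((F - df) / ((N - 1) * df), 0.0))
    return F, df, p, rmsea

# Hypothetical example: q = 3 indicators, N = 300, t = 5 free parameters
S = np.array([[6.0, 4.0, 3.0],
              [4.0, 9.0, 6.0],
              [3.0, 6.0, 8.0]])
sigma_hat = np.array([[6.0, 3.8, 3.1],
                      [3.8, 9.0, 6.1],
                      [3.1, 6.1, 8.0]])
F, df, p, rmsea = lrt_and_rmsea(S, sigma_hat, N=300, t=5)
print(f"chi2({df}) = {F:.2f}, p = {p:.3f}, RMSEA = {rmsea:.3f}")
```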
Confirmatory Factor Analysis:
Multitrait-Multimethod Example
[Two path diagrams: multitrait–multimethod models for indicators x1–x4, grouped by Method 1 (x1, x2) and Method 2 (x3, x4); the first diagram shows correlations among the indicator pairs (x1x2, x3x1, x4x1, x3x2, x4x2, x4x3), and the second shows trait factors loading on x1–x4 with errors δ1–δ4]
Brand Halos and Brand Evaluations
Lynd Bacon (1999)
[Path diagram: Performance factor with indicators Pd1, Pt1, Pd2, Pt2; Quality factor with indicators Qd1, Qt1, Qd2, Qt2; brand factors DirtyScooter and TrailBomber]
Brand Halos and Brand Evaluations
Sources of Variance
[Table (layout not fully recoverable): proportions of variance attributed to Brand vs. Attribute sources for the DirtyScooter indicators Pd1, Pd2 and the TrailBomber indicators Pt1, Pt2; reported values are 0.71, 0.74, 0.04, 0.02, 0.40, 0.41, 0.39, 0.30]
Convergent and Discriminant Validity
Bagozzi and Yi (1993)
• Attitude towards coupons (ξ1), with three semantic differential measures:
– x1 = pleasant/unpleasant
– x2 = good/bad
– x3 = favorable/unfavorable
• Subjective norms (ξ2), with two measures:
– x4 = "Most people who are important to me think I definitely should use coupons for shopping in the supermarket."
– x5 = "Most people who are important to me probably consider my use of coupons to be wise."
Convergent and Discriminant Validity
Bagozzi and Yi (1993)
[Path diagram with standardized estimates: ξ1 → x1 (.75), x2 (.82), x3 (.73); ξ2 → x4 (.69), x5 (.90); correlation between ξ1 and ξ2 = .86; error variances δ1 = .43, δ2 = .33, δ3 = .47, δ4 = .52, δ5 = .19]
Convergent and Discriminant Validity
Bagozzi and Yi (1993)
• Convergent validity:
– Goodness-of-fit: χ²(4) = 4.68, p = .32
– All loadings are high and significant
• Discriminant validity: H0: φ = 1 is rejected
• Measurement reliability: x1 = .56, x2 = .67, x3 = .53, x4 = .48, x5 = .81
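With standardized indicators, the reliability of each measure is the squared standardized loading. A small sketch, assuming the loadings shown in the diagram above, reproduces the reported reliabilities and checks the p-value implied by χ²(4) = 4.68:

```python
import numpy as np
from scipy.stats import chi2

# Standardized loadings for x1..x5 as shown in the path diagram (assumed values)
loadings = np.array([0.75, 0.82, 0.73, 0.69, 0.90])

# With standardized indicators, reliability = lambda^2 and error variance = 1 - lambda^2
print("reliabilities:", np.round(loadings**2, 2))        # ~ .56, .67, .53, .48, .81
print("error variances:", np.round(1 - loadings**2, 2))

# The reported chi-square of 4.68 on 4 df corresponds to p ~= .32
print("p-value:", round(chi2.sf(4.68, 4), 2))
```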
The Full Structural Equation Model
Measurement Model
x = Λx ξ + δ
y = Λy η + ε

Where:
x = (q × 1) vector of exogenous indicator/manifest variables
y = (p × 1) vector of endogenous indicator/manifest variables
ξ = (n × 1) vector of exogenous latent constructs with mean 0 and variance Φ
η = (m × 1) vector of endogenous latent constructs
δ = (q × 1) vector of errors of measurement with mean 0 and variance Θδ
ε = (p × 1) vector of errors of measurement with mean 0 and variance Θε
Λx = (q × n) matrix of factor loadings
Λy = (p × m) matrix of factor loadings
The Full Structural Equation Model
Structural Model
η = Bη + Γξ + ζ

where
B = (m × m) coefficient matrix for the effect of η on η
Γ = (m × n) coefficient matrix for the effect of ξ on η
ζ = (m × 1) vector of errors, with E(ζ) = 0, Cov(ζ, ζ′) = Ψ, Cov(ξ, ζ′) = 0
The Full Structural Equation Model
The Implied Covariance Matrix
$$\Sigma(\theta) = \begin{bmatrix}
\Lambda_y (I-B)^{-1}\left[\Gamma\Phi\Gamma' + \Psi\right](I-B)^{-1\prime}\Lambda_y' + \Theta_\epsilon &
\Lambda_y (I-B)^{-1}\Gamma\Phi\Lambda_x' \\
\Lambda_x \Phi\Gamma'(I-B)^{-1\prime}\Lambda_y' &
\Lambda_x \Phi\Lambda_x' + \Theta_\delta
\end{bmatrix}$$
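A minimal numpy sketch that builds Σ(θ) block by block from this formula, using small assumed matrices (one ξ, one η, and two indicators each):

```python
import numpy as np

# Small illustrative (assumed) system: m = 1 eta, n = 1 xi, p = q = 2 indicators each
Ly = np.array([[1.0], [0.8]])          # Lambda_y (p x m)
Lx = np.array([[1.0], [0.7]])          # Lambda_x (q x n)
B  = np.array([[0.0]])                 # B (m x m)
G  = np.array([[0.5]])                 # Gamma (m x n)
Phi = np.array([[2.0]])                # Var(xi)
Psi = np.array([[1.0]])                # Var(zeta)
Te = np.diag([0.3, 0.4])               # Theta_epsilon
Td = np.diag([0.5, 0.6])               # Theta_delta

IB = np.linalg.inv(np.eye(B.shape[0]) - B)
cov_eta = IB @ (G @ Phi @ G.T + Psi) @ IB.T      # Var(eta)
Syy = Ly @ cov_eta @ Ly.T + Te                   # y-block
Syx = Ly @ IB @ G @ Phi @ Lx.T                   # y,x-block
Sxx = Lx @ Phi @ Lx.T + Td                       # x-block

sigma = np.block([[Syy, Syx],
                  [Syx.T, Sxx]])
print(sigma.round(3))
```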
The Full Structural Equation Model
Identification
• Number of parameters ≤ (p + q)(p + q + 1)/2
• Two-Step Rule
– Measurement Model Identification
– Structural Model Identification
The Full Structural Equation Model
Structural Model Identification
• Null B Rule (B = 0)
• Recursive Rule
– B triangular
– Ψ diagonal
• Order Condition
– The ith equation is identified if the number of variables excluded from the ith equation is ≥ m − 1
• Rank Condition
– Form C = [I − B | −Γ]
– The ith equation is identified if the rank of Ci = m − 1, where Ci is formed from those columns of C that have a 0 in the ith row
The Full Structural Equation Model
Structural Model Identification Example
1  12 2   111  1
2   211   22  2   2
 1   0
   
 2   21


1
2
12   1    11




0   2   0
B



1
 11
 22
0   1    1 
 



 22   2   2 
1
 21
12
2
2
The Full Structural Equation Model
Structural Model Identification Example
• Form

$$C = [\,I - B \mid -\Gamma\,] = \begin{bmatrix} 1 & -\beta_{12} & -\gamma_{11} & 0 \\ -\beta_{21} & 1 & 0 & -\gamma_{22} \end{bmatrix}$$

• $C_1 = \begin{bmatrix} 0 \\ -\gamma_{22} \end{bmatrix}$, and the rank of C1 is m − 1 = 2 − 1 = 1

• $C_2 = \begin{bmatrix} -\gamma_{11} \\ 0 \end{bmatrix}$, and the rank of C2 is m − 1 = 2 − 1 = 1

• Both equations are identified
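The rank condition is easy to check numerically once placeholder (non-zero) values are substituted for the free coefficients. A sketch:

```python
import numpy as np

def rank_condition(B, G):
    """Check the rank condition for each equation: rank(C_i) == m - 1,
    where C = [I - B | -Gamma] and C_i keeps the columns of C with a 0 in row i."""
    m = B.shape[0]
    C = np.hstack([np.eye(m) - B, -G])
    results = []
    for i in range(m):
        cols = np.isclose(C[i], 0.0)               # columns excluded from equation i
        Ci = C[:, cols]
        results.append(np.linalg.matrix_rank(Ci) == m - 1)
    return results

# Placeholder numerical values standing in for the free coefficients (assumed non-zero)
B = np.array([[0.0, 0.4],
              [0.3, 0.0]])          # beta_12, beta_21
G = np.array([[0.5, 0.0],
              [0.0, 0.6]])          # gamma_11, gamma_22
print(rank_condition(B, G))         # [True, True] -> both equations identified
```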
Construct Validation by Use of Panel Model
Bagozzi and Yi (1993)
• 31 and 42 capture temporal stability
• 21 and 43 reflect discriminant validity
• Convergent validity is assessed by overall
model fit and by the magnitude and
significance of the factor loadings
• The covariance between two serially
correlated errors is a measure of specific
variance
Construct Validation by Use of Panel Model
Bagozzi and Yi (1993)
• Convergent validity:
– Goodness-of-fit: χ²(26) = 22.16, p = .68
– All loadings are high and significant
– Factorial invariance holds
• Discriminant validity: H0: φ21 = 1 and H0: φ43 = 1 are rejected
• Temporal stability: standardized β31 = 0.81 and β42 = 0.91