Data Analysis in Longitudinal Experiments and

Longitudinal Data Analysis:
Why and How to Do it With
Multi-Level Modeling (MLM)?
Oi-man Kwok
Texas A & M University
Road Map
• Why do we want to analyze longitudinal data
under multilevel modeling (MLM) framework?
– Dependency issue
– Advantages of using MLM over traditional Methods
(e.g., Univariate ANOVA, Multivariate ANOVA)
– Review of important parameters in MLM
• How can we do it under SPSS?
• Regression Model:
e.g.
Stati = β0 + β1 GRE_Mi + ei
DV: Test Scores of 1st Year Grad-Level Statistics
IV: GRE_M (GRE Math Test Score)
150 Students (i = 1,…,150)
One of the important Assumptions for OLS
regression?
(Observations are independent from each other)
Ignoring the clustered structure (or dependency
between observations) in the analyses can
result in:
• Bias in the standard errors
• Bias in the tests of significance and confidence
intervals
(Inflated Type I error rate: e.g., nominal
α = .05 but actual α = .10)
→ Non-replicable results
Advantages of MLM over the traditional
Methods on analyzing longitudinal data
• Univariate ANOVA—Restriction on the error structure:
Compound Symmetry (CS) type error structure (higher
statistical power but not likely to be met in longitudinal
data)
• Multivariate ANOVA—No restriction on the error
structure: Unstructured (UN) type error structure (often
too conservative, lower statistical power); can only
handle completely balanced data (Listwise deletion)
• More…
Analyzing Longitudinal Data:
• Example
• (Based on Actual Data—variable names changed for
ease of presentation):
Compare two different teaching methods on
Achievement over time
• Teaching Methods:
78 students are randomly assigned to either:
A. Lecture (Control group; 39 students) or
B. Computer (Treatment group; 39 students)
• 4 Achievement (Ach) scores (right after the course, 1
year after, 2 years after, & 3 years after) were collected
from each student after treatment (i.e., the statistics course)
[Figure: Achievement (y-axis) by Time in years (x-axis, 0 to 3) for the Computer and Lecture groups. Time = 0 is the immediate posttest measure.]
Multi-Level Model (MLM)
A simple regression (growth) model for ONE student
• Note: Start with a simple model (student 36)
• Introduce treatment in the example at the end
Acht = β0 + β1 Timet + et
(t = 0, 1, 2, 3)
[Figure: Student 36's scores (Acht) plotted against Timet = 0, 1, 2, 3, with the fitted line (intercept β0, slope β1) and residuals e0, e1, e2, e3.]
et: Captures variation of individual achievement
scores from the fitted regression model
WITHIN student 36
V(et) = σ2
Acht = β0 + β1 Timet + et
Compare to
Achti = β0i + β1i Timeti + eti
(i = 1, 2, 3, …, 78)
(Micro Level Model)
[Figure: Achti plotted against Timeti (0 to 3) with separate fitted lines for Students 27, 36, and 52, each with its own intercept (β0) and slope (β1).]
Student ID   β0i    β1i
12           13.5   1.25
15           10.5   2.75
23           12.6   .23
27           15.6   .28
28           22.3   1.64
33           36.4   3.27
37           25.2   1.22
Mean         γ00    γ10   (Grand Intercept; Grand Slope)
Variance     τ00    τ11   (Variance of the intercepts; Variance of the slopes)
G = | τ00   0  |
    |  0   τ11 |
• τ00: Captures the deviations of the 78 intercepts from the grand intercept γ00
• τ11: Captures the deviations of the 78 slopes from the grand slope γ10
[Figure: Ach by Time for Students 27, 36, and 52 around the Overall Model.]

No variation among the 78 intercepts:
G = |  0    0  |
    |  0   τ11 |
[Figure: Ach by Time. Students 27, 36, and 52 share the Overall Model intercept γ00 but differ in slope.]

No variation among the 78 slopes:
G = | τ00   0  |
    |  0    0  |
[Figure: Ach by Time. Students 27, 36, and 52 share the Overall Model slope γ10 but differ in intercept.]

Intercepts and slopes allowed to covary (τ01 = τ10 ≠ 0):
G = | τ00  τ01 |
    | τ10  τ11 |
[Figure: Ach by Time around the Overall Model.]
Summary
γ00: Grand Intercept
τ00: Variance of the Intercepts
γ10: Grand Slope
τ11: Variance of the Slopes
τ01: Covariance between Intercepts and Slopes
• G: Captures between-student differences
G = | τ00  τ01 |
    | τ10  τ11 |
• R: Captures within-student random errors
V(eti) = σ2
MACRO vs. MICRO
• UNITS:
                     MACRO          MICRO
Educational study    School/Class   Student
Family study         Family         Family member
Longitudinal study   Individual     Repeated observations
MACRO vs. MICRO (Cont.)
• MODELS:
MICRO level model:
regression model fits the observations
within each MACRO unit
MACRO level model:
model captures the differences between
the overall model and individual regression
models from different macro units
• Dependent Variable:
Math Achievement (Achieve; Repeated measures /
Micro Level)
• Predictors:
• Repeated measure (MICRO) Level Predictor:
Time (& any time varying covariates)
• Student (MACRO) Level Predictor:
Computer (Different teaching methods) (& any
time-invariant variables such as gender)
Data format under MANOVA approaches (SPSS Data Format):
Student  Treat  T0  T1  T2  T3
S1       0      5   3   2   3
S2       1      5   25  --  33
S3       1      --  19  17  26
• S1 has responses at all time points
• S2 has a missing response at time 2 (indicated by "--")
• S3 has a missing response at time 0
• MANOVA: only retains S1 in the analysis
Data format for MANOVA
Student  Treat  T0  T1  T2  T3
S1       0      5   3   2   3
S2       1      5   25  --  33
S3       1      --  19  17  26
Data format for Multilevel Model
(All 3 students are included in the analyses)
Student  Treat  Time  DV
S1       0      0     5
S1       0      1     3
S1       0      2     2
S1       0      3     3
S2       1      0     5
S2       1      1     25
S2       1      3     33
S3       1      1     19
S3       1      2     17
S3       1      3     26
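The link between the two layouts can be sketched in a few lines of Python. This is only an illustration of the reshaping idea, not part of the SPSS workflow in these slides; the function name `long_to_wide` and the use of `None` for a missing cell are assumptions made for the example.

```python
# Long-format records (Student, Treat, Time, DV) copied from the slide.
long_rows = [
    ("S1", 0, 0, 5), ("S1", 0, 1, 3), ("S1", 0, 2, 2), ("S1", 0, 3, 3),
    ("S2", 1, 0, 5), ("S2", 1, 1, 25), ("S2", 1, 3, 33),
    ("S3", 1, 1, 19), ("S3", 1, 2, 17), ("S3", 1, 3, 26),
]


def long_to_wide(rows, times=(0, 1, 2, 3)):
    """Return {student: {"Treat": ..., "T0": ..., ...}}; missing cells are None.

    Hypothetical helper written for this illustration only.
    """
    wide = {}
    for student, treat, time, dv in rows:
        record = wide.setdefault(
            student, {"Treat": treat, **{f"T{t}": None for t in times}}
        )
        record[f"T{time}"] = dv
    return wide


wide = long_to_wide(long_rows)

# Listwise deletion, as in the MANOVA approach described on the slide:
# keep only students with a score at every time point.
complete = [s for s, rec in wide.items() if None not in rec.values()]

for student, rec in wide.items():
    print(student, rec)
print("Complete cases:", complete)
```

In the wide layout only S1 survives listwise deletion, whereas the long layout keeps all ten observed scores.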
Student  Treat  Time  DV
S1       0      0     5
S1       0      7     3
S1       0      12    2
S1       0      13    3
S2       1      1     5
S2       1      3     9
S2       1      4     5
S2       1      6     25
S3       1      3     18
S3       1      15    19
S3       1      28    17
S3       1      31    26
Can you transform this dataset back into multivariate format?
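One way to explore the question above is to count how many columns a multivariate (wide) layout would need for this unequally spaced dataset. The sketch below is plain Python written for illustration; the variable names are assumptions, and the data are the twelve rows from the slide.

```python
# Unequally spaced long-format records (Student, Treat, Time, DV) from the slide.
rows = [
    ("S1", 0, 0, 5), ("S1", 0, 7, 3), ("S1", 0, 12, 2), ("S1", 0, 13, 3),
    ("S2", 1, 1, 5), ("S2", 1, 3, 9), ("S2", 1, 4, 5), ("S2", 1, 6, 25),
    ("S3", 1, 3, 18), ("S3", 1, 15, 19), ("S3", 1, 28, 17), ("S3", 1, 31, 26),
]

# A wide layout needs one DV column per distinct measurement time.
distinct_times = sorted({time for _, _, time, _ in rows})
students = sorted({student for student, _, _, _ in rows})

n_cells = len(students) * len(distinct_times)
n_filled = len(rows)
n_empty = n_cells - n_filled

print("Distinct times:", distinct_times)
print("Wide-format cells:", n_cells, "filled:", n_filled, "empty:", n_empty)
```

Because almost no two students share a measurement time, every student has missing cells in the wide layout, so listwise deletion would leave no cases at all. The long layout simply uses Time as a predictor and is unaffected.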
Questions
• 1. On average, is there any trend of the
math achievement over time?
• 2. Are there any differences between
students on the trend of math achievement
over time? (Do all students have the same
trend of math achievement over time?)
Micro Level (Level 1):
Mathachti = β0i + β1i Timeti + eti
V(eti) = σ2 (Within-Student Errors)
Macro Level (Level 2):
β0i = γ00 + U0i   (γ00: Grand Intercept; V(U0i) = τ00)
β1i = γ10 + U1i   (γ10: Grand Slope; V(U1i) = τ11)
(U0i, U1i: Between-Student Differences)
Combined Model:
Mathachti = (γ00 + U0i) + (γ10 + U1i) Timeti + eti
          = γ00 + γ10 Timeti + U0i + U1i Timeti + eti
[Figure: ACH (y-axis) by TIME (x-axis) for selected students, identified by SUBID. Red: Computer; Blue: Lecture.]
MAti = γ00 + γ10 Timeti + U0i + U1i Timeti + eti
SPSS MIXED Syntax:
1 MIXED mathach with Time
2 /METHOD = REML
3 /Fixed = intercept Time
4 /Random = intercept Time | Subject(Subid) COVTYPE(UN)
5 /PRINT = G SOLUTION TESTCOV.
Execute.
Notes on each line:
• Line 1: MIXED DV with Continuous IV by Categorical IV
• Line 2: Estimation method. Default: REML (Restricted Maximum Likelihood); other option: ML (Maximum Likelihood)
• Line 3: Fixed effects: capture the overall model
• Line 4: Specify random effects: capture the between-student differences
– Subject(): identity variable for Macro level units (e.g., Subid)
– COVTYPE(UN): structure of the G matrix (Unstructured)
G = | τ00  τ01 |
    | τ10  τ11 |
• Line 5: PRINT options
– G: print the G matrix
– SOLUTION: requests the regression coefficients
– TESTCOV: produce asymptotic standard errors and Wald Z-tests for the covariance parameter estimates
SPSS Output
Basic Information
Model Dimension (b)
                                        Number of   Covariance     Number of    Subject
                                        Levels      Structure      Parameters   Variables
Fixed Effects    Intercept              1                          1
                 time                   1                          1
Random Effects   Intercept + time (a)   2           Unstructured   3            subid
Residual                                                           1
Total                                   4                          6
a. As of version 11.5, the syntax rules for the RANDOM subcommand have changed. Your
command syntax may yield results that differ from those produced by prior versions. If
you are using SPSS 11 syntax, please consult the current syntax reference guide for
more information.
b. Dependent Variable: Achieve.
Information Criteria (a)
-2 Restricted Log Likelihood            2509.873
Akaike's Information Criterion (AIC)    2517.873
Hurvich and Tsai's Criterion (AICC)     2518.004
Bozdogan's Criterion (CAIC)             2536.819
Schwarz's Bayesian Criterion (BIC)      2532.819
The information criteria are displayed in smaller-is-better forms.
a. Dependent Variable: Achieve.
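The four criteria in this table can be recovered from the -2 Restricted Log Likelihood. The sketch below uses the usual textbook definitions of AIC, AICC, CAIC, and BIC. Two inputs are assumptions rather than values printed on the slides: that under REML only the 4 covariance parameters are counted, and that the effective sample size is 78 students × 4 waves minus the 2 fixed effects. They are used here because they reproduce the printed values.

```python
import math

neg2_rll = 2509.873   # -2 Restricted Log Likelihood, from the table
k = 4                 # ASSUMPTION: covariance parameters only (sigma^2, tau00, tau01, tau11)
n_obs = 78 * 4        # ASSUMPTION: 78 students x 4 waves, no missing scores
n_fixed = 2           # fixed effects: intercept and time
n = n_obs - n_fixed   # ASSUMPTION: effective N under REML

aic = neg2_rll + 2 * k
aicc = neg2_rll + 2 * k * n / (n - k - 1)
caic = neg2_rll + k * (math.log(n) + 1)
bic = neg2_rll + k * math.log(n)

for name, value in [("AIC", aic), ("AICC", aicc), ("CAIC", caic), ("BIC", bic)]:
    print(f"{name}: {value:.3f}")
```

Since every criterion is the same -2 log likelihood plus a penalty for the number of parameters, smaller values indicate a better trade-off between fit and complexity.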
Type III Tests of Fixed Effects (a)
Source      Numerator df   Denominator df   F         Sig.
Intercept   1              77               871.772   .000
time        1              77               13.701    .000
a. Dependent Variable: Achieve.
Estimates of Fixed Effects (a)
(Requested by the “SOLUTION” keyword in the PRINT statement, Line 5)
                                                              95% Confidence Interval
Parameter   Estimate    Std. Error   df   t        Sig.   Lower Bound   Upper Bound
Intercept   54.25609    1.8375833    77   29.526   .000   50.5969939    57.9151856
time        2.3760897   .6419278     77   3.701    .000   1.0978482     3.6543313
a. Dependent Variable: Achieve.
• Intercept (γ00): Average MA score at Time = 0
• time (γ10): Average Trend of the MA score
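The test statistics in the two fixed-effects tables are tied together by simple arithmetic: each t is the estimate divided by its standard error, and with one numerator df the Type III F equals t squared. A short check in Python, using only the numbers printed above (the dictionary layout is just an illustration):

```python
# (estimate, standard error) pairs copied from "Estimates of Fixed Effects".
estimates = {
    "Intercept": (54.25609, 1.8375833),
    "time": (2.3760897, 0.6419278),
}

results = {}
for name, (est, se) in estimates.items():
    t = est / se   # t statistic for H0: parameter = 0
    f = t ** 2     # with 1 numerator df, F = t squared
    results[name] = {"t": t, "F": f}
    print(f"{name}: t = {t:.3f}, F = {f:.3f}")
```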
Estimates of Covariance Parameters (a)
(Requested by the “TESTCOV” keyword in the PRINT statement, Line 5: asymptotic standard errors and Wald Z-tests)
                                                                                         95% Confidence Interval
Parameter                                          Estimate    Std. Error   Wald Z   Sig.   Lower Bound    Upper Bound
Residual (σ2)                                      87.75982    9.9368430    8.832    .000   70.2936565     109.5658788
Intercept + time [subject = subid]
  UN (1,1) (τ00)                                   201.9517    43.01424     4.695    .000   133.0294456    306.5824032
  UN (2,1) (τ01 = τ10)                             -.1513755   11.31083     -.013    .989   -22.3201972    22.0174463
  UN (2,2) (τ11)                                   14.58960    5.5482320    2.630    .009   6.9237677      30.7428445
a. Dependent Variable: Achieve.
Random Effect Covariance Structure (G) (a)
(Requested by the “G” keyword in the PRINT statement, Line 5)
                    Intercept | subid    time | subid
Intercept | subid   201.9517 (τ00)       -.1513755 (τ01)
time | subid        -.1513755 (τ10)      14.5895961 (τ11)
Unstructured
a. Dependent Variable: Achieve.
G = | τ00  τ01 |
    | τ10  τ11 |
Can I have a simpler G matrix (i.e., τ01 = τ10 = 0)?
• Compare
G = | τ00   0  |
    |  0   τ11 |     -2LL: ?
with
G = | τ00  τ01 |
    | τ10  τ11 |     -2LL: 2509.873
Likelihood Ratio Test!
Syntax for fitting the simpler G
G = | τ00   0  |
    |  0   τ11 |
SPSS syntax:
/random = intercept Time | subject(Subid) COVTYPE(Diag)
(Model with τ01 = τ10 = 0)
-2 Res Log Likelihood (or Deviance): 2509.873   ← Choose This
(Model with τ01 = τ10 ≠ 0)
-2 Res Log Likelihood (or Deviance): 2509.873
χ2(1) = .000, p = 1.00
Compare to the model with τ11 = 0
G = | τ00   0  |
    |  0    0  |
SPSS syntax:
/random = intercept | subject(Subid) COVTYPE(Diag)
(Model with τ01 = τ10 = 0, τ11 ≠ 0)
G = | τ00   0  |
    |  0   τ11 |
-2 Res Log Likelihood: 2509.873   ← Choose This
(Model with τ11 = τ01 = τ10 = 0)
G = | τ00   0  |
    |  0    0  |
-2 Res Log Likelihood: 2524.387
χ2(1) = 14.51, p < .001 (Halved P-value)
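Both likelihood ratio tests can be reproduced from the deviances on the slides. The sketch below is plain Python rather than SPSS output; the helper names `chi2_sf_df1` and `lrt` are made up for this example. It uses the identity that a chi-square variable with 1 df is a squared standard normal, so its upper-tail probability is erfc(sqrt(x/2)).

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


def lrt(neg2ll_reduced, neg2ll_full, halve=False):
    """Deviance difference and its 1-df p-value.

    halve=True applies the halved p-value used on the slide when the
    parameter being tested is a variance.
    """
    stat = neg2ll_reduced - neg2ll_full
    p = chi2_sf_df1(max(stat, 0.0))
    return stat, (p / 2.0 if halve else p)


# Test 1: is the intercept-slope covariance needed? (Diag vs. UN)
cov_stat, cov_p = lrt(2509.873, 2509.873)

# Test 2: is the slope variance needed? (tau11 = 0 vs. tau11 free)
slope_stat, slope_p = lrt(2524.387, 2509.873, halve=True)

print(f"Covariance: chi2(1) = {cov_stat:.3f}, p = {cov_p:.2f}")
print(f"Slope variance: chi2(1) = {slope_stat:.3f}, halved p = {slope_p:.6f}")
```

The first test gives no reason to keep the covariance, and the second rejects τ11 = 0, which matches the choice of the diagonal G with both variances.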
Result of the final Model
Estimates of Covariance Parameters (a)
                                                                                  95% Confidence Interval
Parameter                                   Estimate     Std. Error   Wald Z   Sig.   Lower Bound   Upper Bound
Residual (σ2)                               87.794973    9.591118     9.154    .000   70.872958     108.757380
Intercept + time [subject = subid]
  Var: Intercept (τ00)                      201.7136     39.133631    5.154    .000   137.910425    295.034860
  Var: time (τ11)                           14.556515    4.964819     2.932    .003   7.459959      28.403928
a. Dependent Variable: Achieve.
Random Effect Covariance Structure (G) (a)
                    Intercept | subid    time | subid
Intercept | subid   201.7136             0
time | subid        0                    14.556515
Diagonal
a. Dependent Variable: Achieve.
Estimates of Fixed Effects (a)
                                                                  95% Confidence Interval
Parameter         Estimate    Std. Error   df       t        Sig.   Lower Bound   Upper Bound
Intercept (γ00)   54.256090   1.836838     89.672   29.538   .000   50.606708     57.905472
time (γ10)        2.376090    .641668      89.672   3.703    .000   1.101242      3.650938
a. Dependent Variable: Achieve.
• 1. On average, is there any trend of the math
achievement over time?
Predicted Mathachti = 54.26 + 2.38 Timeti
• 2. Are there any differences between students
on the trend of math achievement over time?
(Or, do all students have the same trend of math
achievement over time?)
τ00 = 201.71, τ11 = 14.56
• Q3. If Yes to Q2, what causes the differences?
• Micro Level (Level 1):
MAti = β0i + β1i Timeti + eti
(Variance of eti = σ2)
• Macro Level (Level 2):
β0i = γ00 + γ01 Compi + U0i
β1i = γ10 + γ11 Compi + U1i
(Variance of U0i = τ00; Variance of U1i = τ11)
• Combined Model:
MAti = γ00 + γ01 Compi + γ10 Timeti + γ11 Timeti*Compi
+ U0i + U1i Timeti + eti
Null Hypothesis:
Different teaching methods have the SAME effect on
achievement over time
(H0: γ11 = 0)
MAti = γ00 + γ01 Compi + γ10 Timeti + γ11 Timeti*Compi +
U0i + U1i Timeti + eti
• SPSS MIXED Syntax:
MIXED mathach with Time Comp
/METHOD = REML
/Fixed = intercept Comp Time Time*Comp
/Random = intercept Time | Subject(Subid) COVTYPE(Diag)
/PRINT = G SOLUTION TESTCOV.
Execute.
With Comp in the Macro models
Random Effect Covariance Structure (G) (a)
                    Intercept | subid    time | subid
Intercept | subid   176.1636             0
time | subid        0                    9.813461
Diagonal
a. Dependent Variable: Achieve.
Without Comp in the Macro models
Random Effect Covariance Structure (G) (a)
                    Intercept | subid    time | subid
Intercept | subid   201.7136             0
time | subid        0                    14.556515
Diagonal
a. Dependent Variable: Achieve.
(WITHOUT “Comp” in the model)
G = | 201.71    0    |
    |   0     14.56  |
(WITH “Comp” in the model)
G = | 176.16    0    |
    |   0      9.81  |
Proportion of variance in the intercept (τ00) explained by
“Comp” = (201.71 - 176.16)/201.71 = .13 (or 13%)
Proportion of variance in the slope (τ11) explained by
“Comp” = (14.56 - 9.81)/14.56 = .33 (or 33%)
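The two proportions are the same calculation applied to each diagonal element of G: the drop in a variance component after adding Comp, divided by its value without Comp. A small Python check (the function name `proportion_explained` is only for illustration):

```python
def proportion_explained(var_without, var_with):
    """Proportional reduction in a variance component after adding a predictor."""
    return (var_without - var_with) / var_without


# Variance components from the two G matrices (without vs. with Comp).
intercept_prop = proportion_explained(201.71, 176.16)
slope_prop = proportion_explained(14.56, 9.81)

print(f"Intercept variance explained by Comp: {intercept_prop:.2f}")
print(f"Slope variance explained by Comp: {slope_prop:.2f}")
```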
Solution for Fixed Effects
Effect      Estimate   Standard Error   DF    t Value   Pr > |t|
Intercept   50.3769    2.4764           76    20.34     <.0001
time        0.5756     0.8445           232   0.68      0.4962
computer    7.7583     3.5021           76    2.22      0.0297
time*comp   3.6009     1.1943           232   3.02      0.0029
Predicted Achti = 50.38 + 7.76*Compi + .58*Timeti + 3.60*Compi*Timeti
Predicted Achti = 50.38 + 7.76*Compi + .58*Timeti + 3.60*Compi*Timeti
Overall Model for students in the Lecture method group (Comp = 0):
Predicted Mathachti = 50.38 + .58 Timeti
Overall Model for students in the Computer method group (Comp = 1):
Predicted Mathachti = 58.14 + 4.18 Timeti
Random Effect
G = | 176.16    0    |
    |   0      9.81  |
V(eti) = σ2 = 90.00
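The two group-specific models follow from substituting Comp = 0 and Comp = 1 into the fitted equation. The sketch below uses the rounded coefficients from the slide; the function name `predicted_ach` is an assumption made for the example.

```python
def predicted_ach(time, comp):
    """Fixed-effects prediction from the fitted model (rounded coefficients)."""
    return 50.38 + 7.76 * comp + 0.58 * time + 3.60 * comp * time


for label, comp in [("Lecture", 0), ("Computer", 1)]:
    intercept = predicted_ach(0, comp)
    slope = predicted_ach(1, comp) - predicted_ach(0, comp)
    trajectory = [round(predicted_ach(t, comp), 2) for t in range(4)]
    print(f"{label}: intercept = {intercept:.2f}, slope = {slope:.2f}, "
          f"predicted scores at Time 0-3 = {trajectory}")
```

The Computer group starts 7.76 points higher and gains 3.60 more points per year, so the gap between the groups widens over time.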
[Figure: Achievement (y-axis) by Time in years (x-axis) for the Computer and Lecture groups. Time = 0 is the immediate posttest measure.]
Conclusion
• Advantages of using MLM over traditional ANOVA
approaches for analyzing longitudinal data:
– 1. Can flexibly model the variance function
– 2. Retain meaning of the random effects (τ00, τ11)
– 3. Explore factors which predict individual differences in
change over time (e.g., Treatment effect)
– 4. Take both unequal spacing and missing data into
account
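As an illustration of the first point, a random intercept and slope model implies that the total variance of the scores changes with time: Var(Achti) = τ00 + 2·Timeti·τ01 + Timeti²·τ11 + σ2. This is a standard property of the model rather than something printed in the SPSS output. The sketch below plugs in the final-model estimates reported earlier (τ01 fixed at 0); the function name is hypothetical.

```python
# Final-model estimates reported earlier in the slides.
TAU00 = 201.7136    # variance of the intercepts
TAU11 = 14.556515   # variance of the slopes
TAU01 = 0.0         # covariance fixed at 0 (diagonal G)
SIGMA2 = 87.794973  # within-student residual variance


def implied_variance(time):
    """Model-implied total variance of the outcome at a given time."""
    return TAU00 + 2 * time * TAU01 + time ** 2 * TAU11 + SIGMA2


variances = {t: implied_variance(t) for t in range(4)}
for t, v in variances.items():
    print(f"Time = {t}: implied variance = {v:.2f}")
```

The implied variance grows from Time 0 to Time 3, a pattern that a compound symmetry error structure, with its constant variance, cannot represent.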
Take Home Exercise
A clinical psychologist wants to examine the
impact of the stress level of each family member
(STRESS) on his/her level of symptomatology
(SYMPTOM). There are 100 families, and
families vary in size from three to eight
members. The total number of participants is
400.
a) Can you write out the model? (Hint: What is in
the micro model? What is in the macro model?)
b) Can you write out the syntax (SPSS) to
analyze this model?
c) In designing the study, what possible macro
predictors do you think the clinical psychologist
should include in her study? (e.g. family size?)
d) In designing the study, what possible micro
predictors do you think the clinical psychologist
should include in her study? (e.g. participant’s
neuroticism?)
e) Can you write out the model? (Hint: What is in
the micro model? What is in the macro model)
f) Can you write out the syntax (SPSS) to
analyze this model?
a)
Micro-level model:
SYMPTOMij = β0j + β1j STRESSij + eij
Macro-level model:
β0j = γ00 + U0j
β1j = γ10 + U1j
Combined model:
SYMPTOMij = γ00 + γ10 STRESSij
+ U0j + U1j STRESSij + eij
b) SYMPTOMij = γ00 + γ10 STRESSij
+ U0j + U1j STRESSij + eij
SPSS Syntax:
MIXED Symptom with Stress
/fixed = intercept Stress
/random = intercept Stress | subject(Family) COVTYPE(UN)
/PRINT = G SOLUTION TESTCOV.
execute.
THE END!
THANK YOU!