REI Summer Fellowship Biostatistics


Biostatistics Case Studies 2008
Session 5:
Choices for Longitudinal Data Analysis
Peter D. Christenson
Biostatistician
http://gcrc.labiomed.org/biostat
Case Study
Study Goal - General
Specific Primary Aim
The “ANCOVA” would be a t-test, if we ignored the
baseline values and the different centers.
The outcome is change in HAM-A. The groups are
drug and placebo. The signal:noise ratio is ……
Comparison of Change Means with t-test
Strength of treatment effect, as a signal:noise ratio:
$$t = \frac{\text{Observed } \Delta}{SD_\Delta \sqrt{1/N_1 + 1/N_2}}$$
$\Delta$ = Drug - Placebo difference in mean (Final - Base) HAM-A changes
$SD_\Delta$ = Std Dev of within group HAM-A changes
$N_1 = N_2$ = Group size
| t | > ~1.96 ↔ p<0.05
Comparison of Change Means with t-test
(Actually adjusted for baseline and center)
Strength of treatment effect, as a signal:noise ratio:
$$t = \frac{\text{Observed } \Delta}{SD_\Delta \sqrt{1/N_1 + 1/N_2}} = \frac{-11.8 - (-10.2)}{(?)\,\sqrt{1/134 + 1/132}} = -1.10 \;\rightarrow\; p = 0.27$$
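The signal:noise calculation on these slides can be sketched in a few lines of Python. This is only an illustration of the unadjusted t ratio, not the baseline- and center-adjusted analysis the slide reports. The function name `signal_to_noise_t` is made up for this sketch, and because the slide does not give the standard deviation, the value `hypothetical_sd` is an assumption chosen purely to make the code run.

```python
import math


def signal_to_noise_t(delta, sd, n1, n2):
    """Two-sample t ratio: observed difference divided by its standard error."""
    standard_error = sd * math.sqrt(1.0 / n1 + 1.0 / n2)
    return delta / standard_error


# Mean changes and group sizes are the ones shown on the slide.
delta = -11.8 - (-10.2)  # difference between the two mean HAM-A changes
n1, n2 = 134, 132

# ASSUMPTION: the slide leaves the SD as "(?)"; 10.0 is a made-up value.
hypothetical_sd = 10.0

t = signal_to_noise_t(delta, hypothetical_sd, n1, n2)
significant = abs(t) > 1.96  # approximate two-sided 5% threshold
print(f"t = {t:.2f}, |t| > 1.96: {significant}")
```

Because the SD here is invented, the printed t will not reproduce the slide's -1.10; the point is only how the difference, the SD, and the group sizes combine.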
More than Two Visits
How can we get one signal:noise ratio incorporating all visits?
Perhaps we want to detect treatment effect at any visit.
Suppose Only Three Visits - Weeks 0, 4, 8
Two treatment differences in changes
(D and P label the drug and placebo group changes on the slide's plot):
$\Delta_1 = D_1 - P_1$
$\Delta_2 = D_2 - P_2$
Total effect: $\Delta_1^2 + \Delta_2^2$
Comparison of Change Means with ANOVA
Strength of treatment effect, as a signal:noise ratio:
$$F = \frac{\text{Observed } (\Delta_1^2 + \Delta_2^2)}{\sqrt{V}}$$
V involves $SD_{\Delta_1}$ and $SD_{\Delta_2}$ and the 1/Ns.
Large F ↔ $\Delta_1^2 + \Delta_2^2$ too large to be random ↔ p<0.05
Repeated Measures ANOVA
• The previous slide is “classical” repeated measures
ANOVA.
• Could have many groups and many time points.
• If the overall “total” effect is significant, then we
would examine which Δs are the cause.
• Same conclusions if changes from baseline, rather than
sequential changes, were used.
Since the signal or effect $\Delta_1^2 + \Delta_2^2$ weights the
two Δs equally, we must know all changes for a subject. If we
do not (missing data), then that subject is completely
removed from the analysis.
Mixed Models for Repeated Measures (MMRM)
• “Classical” repeated measures ANOVA uses only
subjects with no missing visits.
• MMRM overcomes that limitation by making a
signal:noise ratio as the weighted average of
signals or effects from sets of subjects with the
same missing visit pattern.
• MMRM still provides the overall ratio, as in the
classical ANOVA that cannot handle missing
visits.
Mixed Models for Repeated Measures (MMRM)
The next four slides use a simpler example to
give the idea of how the weighting is done in
MMRM.
These four slides can be skipped to get to the
bigger picture of longitudinal analyses.
MMRM Example*
Consider a crossover (paired) study with 6 subjects.
Subject 5 missed treatment A and subject 6 missed B.

Subject   LOCF Difference (A - B)
1          8
2          2
3         -1
4          8
5          0
6          0
Mean       2.83

Completer analysis would use IDs 1-4; trt diff=4.25.
Strict LOCF analysis would impute 22,17; trt diff=2.83.
*Brown, Applied Mixed Models in Medicine, Wiley 1999.
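The two treatment differences quoted on this slide can be checked with a short calculation. The sketch below uses only the differences that appear on the slide (8, 2, -1, 8 for the completers, and 0 for each of the two subjects whose missing value is imputed by LOCF); the variable names are just for illustration.

```python
from statistics import mean

# A - B differences for the four completers (subjects 1-4), from the slide.
completer_diffs = [8, 2, -1, 8]

# Under strict LOCF, subjects 5 and 6 get their one observed value carried
# over to the missed treatment, so each contributes a difference of 0.
locf_diffs = completer_diffs + [0, 0]

completer_estimate = mean(completer_diffs)
locf_estimate = mean(locf_diffs)

print(completer_estimate)        # 4.25
print(round(locf_estimate, 2))   # 2.83
```

The LOCF estimate is pulled toward zero because the two imputed differences are zero by construction, not because anything was observed.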
MMRM Example Cont’d
Paired: ΔW = 4.25. Unpaired: ΔB = 5.
Mixed model gets the better* estimate of the A-B
difference from the 4 completers' paired mean ΔW = 4.25.
It gets a poorer unpaired estimate from the other 2
subjects, ΔB = 22 - 17 = 5.
How are these two “sub-studies” combined?
*Why better?
MMRM Example Cont’d
Paired: ΔW = 4.25. Unpaired: ΔB = 5.
The overall estimated Δ is a weighted average of the
separate Δs, inversely weighting by their variances:
$$\Delta = \frac{\Delta_W / SE^2(\Delta_W) + \Delta_B / SE^2(\Delta_B)}{K} = \frac{4.25/4.45 + 5.0/43.1}{1/4.45 + 1/43.1} = 4.32$$
where $K = 1/SE^2(\Delta_W) + 1/SE^2(\Delta_B)$.
The 4.45 and 43.1 incorporate the Ns and whether
data is paired or unpaired.
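The inverse-variance weighting on this slide can be reproduced with a short sketch. It takes the two estimates (4.25 and 5.0) and their squared standard errors (4.45 and 43.1) directly from the slide; the function name `inverse_variance_average` is hypothetical, and this is only the final combining step, not a full mixed-model fit.

```python
def inverse_variance_average(estimates, variances):
    """Combine estimates, weighting each by 1 / variance.

    Returns the combined estimate and each estimate's share of the weight.
    """
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)  # this is K on the slide
    combined = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    shares = [w / total_weight for w in weights]
    return combined, shares


# Paired (within-subject) and unpaired (between-subject) estimates from the slide.
combined, shares = inverse_variance_average([4.25, 5.0], [4.45, 43.1])

print(round(combined, 2))               # 4.32
print([round(s, 2) for s in shares])    # weight shares: paired, unpaired
```

The paired estimate carries roughly nine tenths of the weight, which is why the combined value stays close to 4.25 rather than moving far toward 5.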
MMRM - More General I
The example was “balanced” in missing data, with
information from both treatments A and B in the
unpaired data.
What if all missing data are for A, and none for B?
The unpaired A mean is compared with the combined
A and B mean, giving an estimate of half of the A - B
difference. It is appropriately weighted with the paired
A - B estimate.
Competing Conclusions
The next three slides show differences
obtained by using different repeated measures
approaches.
These three slides can be skipped to get to
other approaches for longitudinal analyses.
Competing Conclusions
Imputation with LOCF
• Ignores potential progression; conservative; usually
attenuates likely changes and increases standard deviations.
• No correction for using unobserved data as if real.
[Figure: HAM-A Score (0 to 30) by Week (0 to 8) for individual subjects, labeled "Completer" and "Individual Subjects". A marker denotes imputed values: N=63/260. Annotation: use all 260 values as if observed here.]
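As a minimal sketch of what LOCF imputation does to one subject's record, the Python below carries the last observed score forward into later missed visits. The function name `locf`, the visit schedule, and the scores are all hypothetical and chosen only for illustration; they are not data from the study.

```python
def locf(scores):
    """Fill each missing visit (None) with the most recent observed score."""
    filled = []
    last_seen = None
    for score in scores:
        if score is not None:
            last_seen = score
        filled.append(last_seen)
    return filled


# HYPOTHETICAL subject: HAM-A at weeks 0, 1, 2, 3, 4, 6, 8, dropping out
# after week 3. None marks a missed visit.
observed = [25, 22, 20, 18, None, None, None]

completed = locf(observed)
final_change = completed[-1] - completed[0]

print(completed)      # [25, 22, 20, 18, 18, 18, 18]
print(final_change)   # -7
```

The week 8 value is then analyzed as if it had been observed, which is the issue the second bullet raises: any change the subject might have had after week 3 is ignored.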
Completer vs. LOCF vs. MMRM Analysis
LOCF Analysis: Δ b/w groups = 1.8; N=260 (197 actual, 63 imputed)
Completer Analysis: Δ b/w groups = 2.5; N=197 (197 actual)
(Week 8 or earlier)
MMRM uses all available visits for all 260. No imputation.
MMRM vs. Classical: Why Distinguish?
Isn't the distinction between MMRM and classical repeated
measures ANOVA just a minor technical point about weighting?
Why make such a big deal?
MMRM is not available in many basic software packages.
It is not obvious how to perform it in software that
does have it.
So, it is not user-friendly yet.
If you have missing data, ask a statistician to set it up
in software correctly.
Other Approaches to Longitudinal Data
So far, we have considered all sequential
changes or changes from baseline.
What other outcomes could be of interest?
Some Other Goals with Longitudinal Data
Use one visit at a time:
• Compare treatments at each time separately
- doesn’t look at changes in individuals.
• Compare treatments at end of study.
Create summary over time:
• Compare average over time - trends unimportant.
• Specific pattern features, as in pharmacokinetic
studies of AUC, peak, half-life, etc.
• Compare treatments on rate of change over time.
Average over Time - Trends Unimportant
Area Under the Curve (AUC), divided by total length of time, is an
average outcome, weighted for time.
Larger weights are given to the larger time intervals, since AUC is
just a sum of trapezoids.
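The time-weighted average described here can be sketched as a sum of trapezoids divided by the total time span. The function name `time_averaged_auc` and the example visit weeks and scores are hypothetical, chosen so that the visit spacing is unequal.

```python
def time_averaged_auc(times, values):
    """Trapezoidal area under the curve, divided by the total time span."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two matching time/value pairs")
    auc = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        # Each trapezoid: interval width times the mean of its two endpoints.
        auc += width * (values[i] + values[i - 1]) / 2.0
    return auc / (times[-1] - times[0])


# HYPOTHETICAL subject with unequally spaced visits.
weeks = [0, 2, 8]
scores = [24.0, 16.0, 12.0]

average = time_averaged_auc(weeks, scores)
print(average)                     # 15.5
print(sum(scores) / len(scores))   # simple mean of visits, for comparison
```

The simple mean of the three visits treats them equally, while the AUC-based average gives the six-week interval three times the weight of the two-week interval.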
“Growth” Curves
Parabola or line or equation based on theory describes time trend.
The idea is to compare treatment groups on a parameter describing the
pattern, e.g., slope.
“Growth” Curves
The logic is to compare treatment groups by finding means over subjects
in each group for a parameter describing the pattern, e.g., slope.
Next slide for correct method.
“Growth” Curves
The idea is to compare treatment groups on a
parameter describing the pattern, e.g., slope.
Conceptually, we could just fit a separate regression
line for each subject, get the slopes, and compare
mean slopes between groups with a t-test.
But subjects may have different numbers of visits,
and the slope might be correlated with the intercept
(e.g., start off higher → smaller slope).
So, another form of “mixed models” is more accurate:
“random coefficient” models. They give slopes also.
Like MMRM, they are not very user-friendly in
software, so ask a statistician to set them up.
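The conceptual two-stage approach mentioned above (a separate regression line per subject, then a t-test on the slopes) can be sketched as follows. This is not a random coefficient model; it is the simpler method that such models improve on. All data are hypothetical, the function names are made up for this sketch, and the t ratio uses a pooled standard deviation.

```python
import math
from statistics import mean, stdev


def ols_slope(times, values):
    """Least-squares slope of values on times for one subject."""
    t_bar, y_bar = mean(times), mean(values)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    sxx = sum((t - t_bar) ** 2 for t in times)
    return sxy / sxx


def pooled_t(group1, group2):
    """Two-sample t ratio with a pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * stdev(group1) ** 2
                  + (n2 - 1) * stdev(group2) ** 2) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


# HYPOTHETICAL subjects: (weeks, HAM-A scores). Some have only two visits.
drug = [([0, 4, 8], [24, 16, 8]), ([0, 4, 8], [26, 20, 14]), ([0, 4], [22, 12])]
placebo = [([0, 4, 8], [25, 21, 17]), ([0, 4, 8], [23, 21, 19]), ([0, 4], [24, 18])]

drug_slopes = [ols_slope(w, s) for w, s in drug]
placebo_slopes = [ols_slope(w, s) for w, s in placebo]
t = pooled_t(drug_slopes, placebo_slopes)

print(drug_slopes, placebo_slopes)
print(round(t, 2))
```

Note that a slope from a subject with two visits counts exactly as much as one from a subject with three, and any correlation between slope and intercept is ignored. Those are the weaknesses the slide gives as reasons to prefer random coefficient models.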
Summary on Mixed Models Repeated Measures
• Currently one of the preferred methods for missing
data.
• Does not resolve bias if missingness is related to
treatment.
• Requires more model specifications than is typical.
• Mild deviations from assumed covariance pattern do
not usually have a large influence.
• May be difficult to apply objectively in clinical trials
where the primary analysis needs to be detailed
a priori.
• Can be intimidating; need experience with modeling;
software has many options to be general and
flexible.