Transcript Slide 1

Type I error control using law of iterated
logarithm in cumulative meta-analysis
Mingxiu Hu, Ph.D.
Head of Biostatistics
Millennium Pharmaceuticals/The Takeda Oncology Company
Collaborators: Gordon Lan and Joseph Cappelleri
Midwest Biopharmaceutical Statistics Workshop
May 19, 2009
© 2008 Millennium Pharmaceuticals Inc., The Takeda Oncology Company
Outline
▐ Meta-analysis vs. cumulative meta-analysis
▐ Key challenge with conventional methods
▐ Law of Iterated Logarithm
▐ Simulation scope and results
▐ Summary
Meta-Analysis
• Statistical analysis of data from multiple studies
• Synthesize and summarize results, especially useful
for rare events in safety analyses
• Integrated safety analyses
• Integrated efficacy analyses
• Quantify sources of possible heterogeneity & bias
Meta-Analysis vs. Mega-Analysis
• Meta-analysis: Obtain one estimate from each study
and then combine the estimates to obtain an overall
estimate via weighted average
• Mega-analysis: lump all data from different studies
together to obtain one estimate. Treat patients from
different studies as if they were from the same study:
• Ignore between-study variation
• Different studies may have different effects
A Reason for Not Simply Lumping Results
Fictional results of two randomized controlled trials
              Treatment Group             Control Group
Trial         Deaths   Number   Risk      Deaths   Number   Risk      Risk Ratio
A             20       100      20%       40       100      40%       20/40 = 0.50
B             50       500      10%       20       100      20%       10/20 = 0.50
Total         70       600      11.7%     60       200      30%       11.7/30 = 0.39

Meta-analysis: RR = 0.50
Mega-analysis: RR = 0.39 (11.7/30 ≠ 0.50)
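As a quick illustration (a minimal sketch, not part of the original slides; the helper names and the large-sample variance of the log risk ratio are my assumptions), the two pooling approaches can be reproduced for the fictional trials above in a few lines of Python:

```python
# Minimal sketch (hypothetical helper, not from the slides): pooling the
# fictional trials A and B by meta-analysis vs. mega-analysis.
import math

# (deaths_trt, n_trt, deaths_ctl, n_ctl) for trials A and B
trials = [(20, 100, 40, 100), (50, 500, 20, 100)]

# Mega-analysis: lump all counts together, then take one risk ratio.
d1 = sum(t[0] for t in trials); n1 = sum(t[1] for t in trials)
d2 = sum(t[2] for t in trials); n2 = sum(t[3] for t in trials)
rr_mega = (d1 / n1) / (d2 / n2)

# Meta-analysis: log risk ratio per trial, inverse-variance weighted average.
log_rr, weights = [], []
for d1j, n1j, d2j, n2j in trials:
    log_rr.append(math.log((d1j / n1j) / (d2j / n2j)))
    var_j = (1 - d1j / n1j) / d1j + (1 - d2j / n2j) / d2j  # large-sample var of log RR
    weights.append(1.0 / var_j)
rr_meta = math.exp(sum(w * x for w, x in zip(weights, log_rr)) / sum(weights))

print(f"meta-analysis RR = {rr_meta:.2f}, mega-analysis RR = {rr_mega:.2f}")
# -> meta-analysis RR = 0.50, mega-analysis RR = 0.39
```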
Statistical Models of Pooling in Meta-Analysis
• Fixed-Effects Model
  – assumes a common treatment effect
  – weights studies by the inverse of the within-study (sampling) variance
• Random-Effects Model
  – allows for different treatment effects
  – weights studies by the inverse of the total variance (sum of within-study variation and between-study variation)
  – tends to be more conservative (gives broader confidence intervals)
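A minimal sketch of the two weighting schemes (function names and example numbers are illustrative assumptions; tau2 stands for the between-study variance whose estimation is discussed later in the talk):

```python
# Minimal sketch (assumed inputs): inverse-variance pooling under the
# fixed-effects and random-effects models.
import numpy as np

def pool_fixed(theta, v):
    """Fixed effects: weight each study by 1 / within-study variance."""
    w = 1.0 / np.asarray(v)
    est = np.sum(w * theta) / np.sum(w)
    se = np.sqrt(1.0 / np.sum(w))
    return est, se

def pool_random(theta, v, tau2):
    """Random effects: weight by 1 / (within-study + between-study variance)."""
    w = 1.0 / (np.asarray(v) + tau2)
    est = np.sum(w * theta) / np.sum(w)
    se = np.sqrt(1.0 / np.sum(w))   # broader CI whenever tau2 > 0
    return est, se

# Example: three studies with estimates and within-study variances.
theta = np.array([0.30, 0.10, 0.45])
v = np.array([0.02, 0.05, 0.04])
print(pool_fixed(theta, v))
print(pool_random(theta, v, tau2=0.03))
```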
Famous Meta-Analysis
• Nissen and Wolski (NEJM, 2007) on GSK’s Avandia:
Methods We conducted searches of the published literature, the Web site of the Food and Drug
Administration, and a clinical-trials registry maintained by the drug manufacturer
(GlaxoSmithKline). Criteria for inclusion in our meta-analysis included a study duration of more
than 24 weeks, the use of a randomized control group not receiving rosiglitazone, and the
availability of outcome data for myocardial infarction and death from cardiovascular causes. Of 116
potentially relevant studies, 42 trials met the inclusion criteria. We tabulated all occurrences of
myocardial infarction and death from cardiovascular causes.
Results Data were combined by means of a fixed-effects model. In the 42 trials, the mean age of the
subjects was approximately 56 years, and the mean baseline glycated hemoglobin level was
approximately 8.2%. In the rosiglitazone group, as compared with the control group, the odds ratio
for myocardial infarction was 1.43 (95% confidence interval [CI], 1.03 to 1.98; P=0.03), and the
odds ratio for death from cardiovascular causes was 1.64 (95% CI, 0.98 to 2.74; P=0.06).
Conclusions Rosiglitazone was associated with a significant increase in the risk of myocardial
infarction and with an increase in the risk of death from cardiovascular causes that had borderline
significance. Our study was limited by a lack of access to original source data, which would have
enabled time-to-event analysis. Despite these limitations, patients and providers should consider the
potential for serious adverse cardiovascular effects of treatment with rosiglitazone for type 2
diabetes.
Famous Meta-Analysis
Summary on Nissen & Wolski example:
▐ Drug: Avandia (generic name: Rosiglitazone)
▐ Data source: FDA website
▐ Number of trials: 42
▐ Method: meta-analysis
▐ Statistical model: fixed-effects
▐ Primary endpoints: Myocardial infarction (MI), death from
cardiovascular causes
▐ Results:
  ▌ Odds ratio for MI = 1.43, p = 0.03
  ▌ Odds ratio for death = 1.64, p = 0.06
▐ Conclusion: Avandia associated with increased cardiovascular events
▐ Impact: multi-billion dollar drop in market cap
▐ Statistical controversy: probably not significant if random-effects models are used
Famous Meta-Analysis
• Juni et al. (Lancet, 2004) on Merck’s Vioxx:
Famous Meta-Analysis
Summary on Juni’s example:
▐ Drug: Vioxx (generic name: rofecoxib)
▐ Data source: FDA
▐ Number of trials: 18 RCTs and 11 observational studies
▐ Method: Cumulative meta-analysis
▐ Objective: When did Vioxx’s CV risk become evident
▐ Statistical model: random-effects
▐ Primary endpoints: Myocardial infarction (MI)
▐ Results:
  ▌ RR = 2.30 with p = 0.010 at the end of 2000
  ▌ RR = 2.24 with p = 0.007 a year later
▐ Conclusion: Vioxx should have been withdrawn several years earlier
▐ Impact: legal implications
▐ Statistical controversy: no adjustment for repeated testing
Cumulative Meta-Analysis
(chronologically ordered RCTs)
▐ Conduct a new statistical pooling every time a new trial or a set of new trials becomes available
▐ Performed retrospectively, to identify the year when sufficient evidence had accumulated to show a treatment was effective or toxic
▐ Performed prospectively, so that an effective treatment or a toxicity may be identified at the earliest possible moment
▐ Reveals a (temporal) trend towards superiority of the treatment or the control, or indifference
Basic Method of Cumulative Meta-Analysis
Studies 1, 2, 3, …, n ordered chronologically or by covariates:
  Pool Studies 1 to 2    →  Cumulative M-A 1
  Pool Studies 1 to 3    →  Cumulative M-A 2
  Pool Studies 1 to 4    →  Cumulative M-A 3
  …
  Pool Studies 1 to n-1  →  Cumulative M-A n-2
  Pool Studies 1 to n    →  Cumulative M-A n-1
Example of Cumulative Meta-Analysis
(Lau et al. NEJM 1992)
What are the key challenges
How to control overall type I error for repeated testing?
▐ The problem does not fit into the conventional group sequential framework due to heterogeneity between studies
  ▌ Studies may spread out over a long period of time
  ▌ Patient populations may not be identical
  ▌ Medical technology changes over time
▐ We do not know in advance how many tests will be performed, which makes multiple comparison methods hard to apply (also because of the complexity of the correlations between tests)
▐ Between-study variance estimation is unreliable, especially at the beginning of the testing process when only a small number of studies is available
What are the key challenges
Overall type I error rates of conventional methods and group sequential methods (nominal α = 0.025; between-study SD 0.4314; within-study variance 1), based on 100,000 simulation replications; two-sample continuous case.

Average       Maximum      Traditional   O'Brien-Fleming   Pocock     LIL
Sample Size   Number of    CMA Method    Boundary          Boundary   Method
              Tests
500           5            0.2156        0.1474            0.1864     0.0171
500           15           0.2810        0.1065            0.1906     0.0235
500           25           0.3016        0.0926            0.1860     0.0248
Law of Iterated Logarithm: Motivation
▐ X_i ~ iid N(µ, σ² = 1)
▐ H0: µ = 0 vs. H1: µ > 0
▐ S_n = X_1 + … + X_n ~ N(0, n) under H0
  ▌ standardized test statistic Z_n = S_n / √n ~ N(0, 1) under H0
  ▌ each Z_n is "practically bounded," i.e., P(Z_n > z_α | H0) = α
▐ The infinite sequence {Z_1, …, Z_n, …} is not "practically bounded"
  ▌ P(Z_n > C for some n | H0) = 1 for any C, no matter how large
▐ Implication for CMA: under the null, repeated testing will eventually reject the null with probability 1
Law of Iterated Logarithm: Motivation
▐ The LIL states that
  $P\left(\limsup_{n\to\infty} \frac{S_n}{\sqrt{2n\,\ln(\ln n)}} \le 1\right) = 1$
• Modified test statistic Z*_n = S_n / √(2n · ln[ln(n)])
• The sequence {Z*_n} is bounded in probability
• Therefore, for any given α (no matter how small), we can find a constant C such that Pr(Z*_n > C for some n) ≤ α
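A small simulation sketch (illustrative, not from the slides) makes the contrast concrete: under H0 the naive Z_n crosses a fixed 1.96 boundary far more often than 2.5% when every n is tested, while the LIL-scaled Z*_n stays bounded:

```python
# Minimal simulation sketch: repeated testing under H0.
# Z_n = S_n/sqrt(n) is compared with a fixed boundary at every n, while
# Z*_n = S_n/sqrt(2 n lnln(n)) remains bounded in probability.
import numpy as np

rng = np.random.default_rng(0)
reps, n_max = 2000, 10_000
n = np.arange(3, n_max + 1)              # start at 3 so that lnln(n) > 0

cross_naive = 0
max_zstar = []
for _ in range(reps):
    s = np.cumsum(rng.standard_normal(n_max))[2:]   # S_3, ..., S_{n_max}
    z = s / np.sqrt(n)
    z_star = s / np.sqrt(2 * n * np.log(np.log(n)))
    cross_naive += (z > 1.96).any()
    max_zstar.append(z_star.max())

print("P(naive Z_n > 1.96 for some n <= 10000) ~", cross_naive / reps)
print("95th percentile of max_n Z*_n           ~", np.quantile(max_zstar, 0.95).round(3))
```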
CMA Tests Based on LIL: One-Sample Continuous Case
▐ In order to make C = z_α, i.e., Pr(Z*(n) > z_α for some n) = α, we replace the "2" in the denominator of Z*_n by an adjusting factor λ to obtain the test statistic for cumulative meta-analysis:
  $Z^*(n) = \frac{S_n}{\sqrt{\lambda \, n \, \ln[\ln(n)]}}$
• When we conduct group sequential analyses, at the k-th inspection we use the test statistic
  $Z^*(k) = \frac{S(k)}{\sqrt{\lambda \, n_{ck} \, \ln[\ln(n_{ck})]}}$
• n_ck = cumulative number of patients at the k-th inspection
• S(k) = cumulative sum up to the k-th inspection
• λ = correction factor to be determined by simulations
CMA Tests Based on LIL: Extension to General Case
▐ In general, the LIL-based cumulative standardized test statistic at the k-th inspection (replace n_ck by I_ck):
  $Z^*(k) = \frac{S(k)}{\sqrt{\lambda \, I_{ck} \, \ln[\ln(I_{ck})]}}$
• λ = correction factor to control the alpha level
• $S(k) = \sum_{j=1}^{k} I_j \hat{\theta}_j$ = weighted sum of treatment effects
• $I_{ck} = \sum_{j=1}^{k} I_j$ = cumulative information up to the k-th inspection
• $I_j = 1 / \mathrm{var}(\hat{\theta}_j)$
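A minimal sketch of the general statistic (argument names are my assumptions; lam is the correction factor λ introduced on the following slides):

```python
# Minimal sketch: LIL-based cumulative test statistic
# Z*(k) = S(k) / sqrt(lam * I_ck * lnln(I_ck)) at each inspection k.
import numpy as np

def lil_cma_statistics(theta_hat, info, lam=2.0):
    """theta_hat[j], info[j] = estimate and information (1/var) of study j+1."""
    theta_hat, info = np.asarray(theta_hat, float), np.asarray(info, float)
    s = np.cumsum(info * theta_hat)        # S(k): information-weighted sum
    i_c = np.cumsum(info)                  # I_ck: cumulative information
    with np.errstate(invalid="ignore"):    # lnln() is undefined for I_ck <= e
        z_star = s / np.sqrt(lam * i_c * np.log(np.log(i_c)))
    return z_star                          # compare each entry with z_alpha

# Example: reject at inspection k if z_star[k] > 1.96 (one-sided alpha = 0.025).
print(lil_cma_statistics(theta_hat=[0.4, 0.2, 0.5], info=[10.0, 8.0, 12.0]))
```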
CMA Tests Based on LIL: Information
▐ Two sample mean difference (continuous):
  $\hat{\theta}_j = \bar{Y}_{1j} - \bar{Y}_{2j}, \quad I_j = \left(\sigma_{1j}^2 / n_{1j} + \sigma_{2j}^2 / n_{2j} + \tau^2\right)^{-1}$
  τ² = 0 for the fixed-effects model
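A corresponding helper for the continuous case might look like this (a sketch with assumed argument names; tau2 = 0 reproduces the fixed-effects information):

```python
# Minimal sketch: effect estimate and information for study j,
# two-sample continuous endpoint, as defined above.
def continuous_effect_info(ybar1, ybar2, s1_sq, s2_sq, n1, n2, tau2=0.0):
    theta_hat = ybar1 - ybar2                      # mean difference
    info = 1.0 / (s1_sq / n1 + s2_sq / n2 + tau2)  # I_j = 1 / var(theta_hat)
    return theta_hat, info

print(continuous_effect_info(ybar1=1.2, ybar2=0.8, s1_sq=1.0, s2_sq=1.1,
                             n1=50, n2=60, tau2=0.05))
```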
CMA Tests Based on LIL: Information
▐ Two sample odds ratio (binary):
  $\hat{\theta}_j = \ln\!\left(\hat{p}_{1j}\hat{q}_{2j} / (\hat{p}_{2j}\hat{q}_{1j})\right), \quad I_j = \left(\frac{1}{n_{1j}p_{1j}} + \frac{1}{n_{1j}q_{1j}} + \frac{1}{n_{2j}p_{2j}} + \frac{1}{n_{2j}q_{2j}} + \tau^2\right)^{-1}$
▐ Two sample relative risk (binary):
  $\hat{\theta}_j = \ln(\hat{p}_{1j}/\hat{p}_{2j}), \quad I_j = \left(\frac{q_{1j}}{n_{1j}p_{1j}} + \frac{q_{2j}}{n_{2j}p_{2j}} + \tau^2\right)^{-1}$
▐ Two sample risk difference (binary):
  $\hat{\theta}_j = \hat{p}_{1j} - \hat{p}_{2j}, \quad I_j = \left(\frac{p_{1j}q_{1j}}{n_{1j}} + \frac{p_{2j}q_{2j}}{n_{2j}} + \tau^2\right)^{-1}$
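The binary-endpoint versions translate directly (a sketch with assumed function names, following the formulas above; p1, p2 are the event proportions and q = 1 - p):

```python
# Minimal sketch: effect estimates and information for a binary endpoint.
# tau2 is the between-study variance (0 for fixed effects).
import math

def odds_ratio_effect_info(p1, n1, p2, n2, tau2=0.0):
    q1, q2 = 1 - p1, 1 - p2
    theta_hat = math.log((p1 * q2) / (p2 * q1))              # log odds ratio
    info = 1.0 / (1/(n1*p1) + 1/(n1*q1) + 1/(n2*p2) + 1/(n2*q2) + tau2)
    return theta_hat, info

def relative_risk_effect_info(p1, n1, p2, n2, tau2=0.0):
    theta_hat = math.log(p1 / p2)                            # log relative risk
    info = 1.0 / ((1-p1)/(n1*p1) + (1-p2)/(n2*p2) + tau2)
    return theta_hat, info

def risk_difference_effect_info(p1, n1, p2, n2, tau2=0.0):
    theta_hat = p1 - p2                                      # risk difference
    info = 1.0 / (p1*(1-p1)/n1 + p2*(1-p2)/n2 + tau2)
    return theta_hat, info

print(odds_ratio_effect_info(p1=0.20, n1=100, p2=0.40, n2=100))
```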
CMA Tests Based on LIL: Between-Study Variance
▐ Traditional between-study variance estimate:
  $\hat{\tau}_k^2 = \frac{Q_k - (k-1)}{\sum_{j=1}^{k} \hat{I}_j - \left(\sum_{j=1}^{k} \hat{I}_j^2\right) / \left(\sum_{j=1}^{k} \hat{I}_j\right)}, \quad Q_k = \sum_{j=1}^{k} \hat{I}_j \left(\hat{\theta}_j - \bar{\theta}_w^{(k)}\right)^2$
▐ This can be negative and unstable when the number of studies is small. For the first 5 studies, use a conservative estimate:
  $\hat{\tau}_k^2 = \frac{1}{k-1} \sum_{j=1}^{k} \left(\hat{\theta}_j - \bar{\theta}^{(k)}\right)^2$
CMA Tests Based on LIL: Correction Factor λ
▐ Two sample mean difference (continuous): λ = 2
▐ Two sample odds ratio (binary): λ = 2
▐ Two sample relative risk (binary): λ = 2
▐ Two sample risk difference (binary): λ = 1.5
▐ These λ values work for α = 0.025 (one-sided) or 0.05 (two-sided)
▐ For smaller type I error rates such as α = 0.01, the adjusting factor needs to be increased slightly (our paper provides a formula that gives a rough but practical estimate; it can also be obtained through simulation)
Simulation Scope: What Has Been Evaluated
▐ Maximum number of inspections: 5, 10, 15, 20, 25, 50, 100
▐ Average number of subjects per study: 40-4000
▐ Ratio of the two groups' within-study variances: varied between [1/4, 4], with one of them simulated from χ²(5)/5
▐ Ratio of between-study SD to within-study SD: 0.1 to 10
▐ In the discrete case, P ~ N(P0, τ²) with P0 varying from 0.05 to 0.9
▐ Sample size allocation to the two groups: 40% to 60%
▐ Number of studies per inspection: single or multiple (Poisson(1.5))
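To show how such numbers can be produced, here is a deliberately simplified simulation skeleton (illustrative parameters only; it fixes the study size and information and omits between-study variation, unlike the full simulation scope listed above):

```python
# Simplified simulation skeleton: estimate the overall type I error of the
# LIL-based CMA test for a two-sample continuous endpoint under H0,
# using fixed-effects weights for brevity.
import numpy as np

rng = np.random.default_rng(1)
reps, k_max, n_per_arm, lam, z_alpha = 20_000, 25, 250, 2.0, 1.96

rejections = 0
for _ in range(reps):
    # H0 true: per-study mean differences with variance 2*sigma^2/n, sigma^2 = 1
    theta_hat = rng.normal(0.0, np.sqrt(2.0 / n_per_arm), size=k_max)
    info = np.full(k_max, n_per_arm / 2.0)        # I_j = 1 / var(theta_hat)
    s = np.cumsum(info * theta_hat)
    i_c = np.cumsum(info)
    z_star = s / np.sqrt(lam * i_c * np.log(np.log(i_c)))
    rejections += (z_star > z_alpha).any()        # reject at any of the k_max looks

print("estimated overall type I error:", rejections / reps)
```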
Simulation Results: A Taste of the Flavor - Alpha
Comparison of overall type I error rates (nominal α = 0.025; between-study SD 0.4314; within-study variance ratio in [0.2, 5]; mean within-study variance 1), based on 100,000 simulation replications; two-sample continuous case.

Sample   Maximum     Traditional   O'Brien-Fleming   Pocock     LIL
Size     Number of   CMA Method    Boundary          Boundary   Method
         Tests
500      5           0.2156        0.1474            0.1864     0.0171
500      15          0.2810        0.1065            0.1906     0.0235
500      25          0.3016        0.0926            0.1860     0.0248

• For the LIL-based method, the type I error rates are not appreciably different as the number of inspections increases further to 50 and 100 (the ln ln(n) term kicks in as n gets large).
Simulation Results: A Taste of the Flavor - Power
Comparison of power (α = 0.025; between-study SD 0.4314; within-study variance ratio in [0.2, 5]; mean within-study variance 1), based on 100,000 simulation replications.

Treatment    Average      Number of      Power*
Difference   Study Size   Inspections    LIL Method   Traditional Method
0.4          100          15             0.7004       0.9906
                          25             0.9099       0.9997
             500          15             0.7511       0.9956
                          25             0.9388       0.9999
0.6          100          15             0.9683       0.9999
                          25             0.9989       1.0000
             500          15             0.9788       1.0000
                          25             0.9994       1.0000

* Not directly comparable because the testing sizes (type I error rates) of the two methods differ.
Example
Random Effects Cumulative Meta-Analysis for Stroke Example (Single Study per Inspection): Standardized Test Statistics

Study (yr)    Extra Penalty   Without Extra Penalty   No Correction
1  (1980)     --              --                      --
2  (1982)     -0.08           -0.31                   -0.44
3  (1984)     -0.07           -0.28                   -0.35
4  (1984)     -0.10           -0.40                   -0.61
5  (1984)     -0.11           -0.34                   -0.58
6  (1985)     0.005           -0.31                   -0.43
7  (1985)     -0.12           -0.70                   -0.97
8  (1993)     -0.22           -0.64                   -0.89
9  (1993)     -0.40           -0.76                   -1.06
10 (1993)     -0.58           -1.30                   -1.83
11 (1993)     -1.41           -1.88                   -2.65
12 (1993)     -1.70           -1.70                   -2.38
13 (1995)     -1.73           -1.73                   -2.44
14 (1996)     -1.42           -1.42                   -1.99
15 (1996)     -1.25           -1.25                   -1.76
16 (1997)     -1.25           -1.25                   -1.76
17 (1997)     -1.60           -1.60                   -2.26
18 (1998)     -1.97           -1.97                   -2.80
Summary and Discussion
▐ Meta-analysis is naturally useful for safety analysis, especially for rare AEs, which require extremely large sample sizes to reach conclusions
▐ Type I error cannot be controlled by traditional group sequential methods because of between-study variation and unknown maximum information
▐ General multiple comparison methods do not apply because we do not know in advance how many tests will be conducted
Summary and Discussion
▐ The LIL-based method, with a tailor-made between-study variance estimate at the beginning of the testing process, controls type I error over a broad range of practical situations
▐ A universal adjusting factor λ may be conservative when the maximum number of inspections is small. If the maximum number of inspections is known in advance, or in retrospective meta-analyses, λ may be reduced via simulation to increase power (if necessary)
▐ In meta-analyses, power is usually less of a concern
References
▐ Hu, Cappelleri, and Lan. Clinical Trials 2007; 329-340.
▐ Lan, Hu, and Cappelleri. Statistica Sinica 2003; 1135-1145.
▐ Berkey et al. Controlled Clinical Trials 1996; 17:357-371.
▐ Pogue and Yusuf. Controlled Clinical Trials 1997; 18:580-593.
▐ Whitehead. Statistics in Medicine 1997; 16:2901-2913.