Design of Clinical Research Protocols
Study Design and Hypothesis Testing in Clinical Research
Jonathan J. Shuster, Ph.D., Research Professor of Biostatistics, Univ. of Florida, College of Medicine
Take-home Messages
• Rely on evidence-based medicine. Conventional wisdom can easily lead us astray.
• The objective of statistics is to make informed inferences about a population based on a sample. It is imperative to quantify the uncertainty.
• The P-value lets us judge whether the data are evidence that a (null) hypothesis is false.
• Non-significant results are inconclusive.
• Randomization and intent-to-treat are vital components of sound clinical research.
Topics
1. Motivating Evidence-Based Clinical Studies
2. Objective of Statistics
3. Hypothesis Testing and P-Values
4. Real Examples and Their Lessons
1. Motivating Evidence-Based Medicine
• A coin is “loaded,” with a 70% chance of landing heads. One player picks a three-outcome sequence (e.g., HTH); then the other player picks a different sequence. The coin is flipped repeatedly, and whoever’s sequence comes up first wins.
• Do you want to choose first, and if so, what sequence do you select?
Evidence-Based Medicine
• So you decided to go first and picked HHH, right?
• OK, I pick THH.
• HHH can come up before THH only if it occurs on the first three flips. If the first HHH occurs later (say on flips 6, 7, 8), then flip 5 must be T, so flips 5, 6, 7 form THH and I have already won. (Your first two flips are my last two, so I tend to stay ahead.)
• Your chance of winning = 0.7³ = 0.343 (34.3%).
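A quick Monte Carlo check of the 0.343 figure can make the paradox concrete. This is a minimal sketch: the function names, the 50,000-game count and the fixed seed are illustrative choices, not part of the original talk.

```python
import random


def play(seq_a: str, seq_b: str, p_heads: float, rng: random.Random) -> str:
    """Flip the loaded coin until one player's 3-flip sequence appears; return the winner."""
    last3 = ""
    while True:
        # Keep only the three most recent flips
        last3 = (last3 + ("H" if rng.random() < p_heads else "T"))[-3:]
        if last3 == seq_a:
            return "A"
        if last3 == seq_b:
            return "B"


def win_rate(seq_a: str, seq_b: str, p_heads: float = 0.7,
             games: int = 50_000, seed: int = 1) -> float:
    """Estimate the probability that player A (who chose first) wins."""
    rng = random.Random(seed)
    wins = sum(play(seq_a, seq_b, p_heads, rng) == "A" for _ in range(games))
    return wins / games


exact = 0.7 ** 3                     # HHH must occupy flips 1-3
simulated = win_rate("HHH", "THH")
print(f"exact {exact:.3f}, simulated {simulated:.3f}")
```

Reversing the roles (first player on THH, second on HHH) should give a win rate near 1 - 0.343 = 0.657 for the first player.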
Evidence-Based Medicine
• Lesson from this example.
• Things are not always what they seem. You need to be a healthy skeptic.
• Reference: Shuster, J. A two-player coin game paradox in the classroom. American Statistician, 2006 (Feb), vol. 60, pp. 68-70.
2. Objective of Statistics
• To make an inference about a defined target population from a representative sample.
• That is, for us: start from a medical hypothesis about a medical condition, help design a study that collects data to test it, and draw conclusions. Quantifying the uncertainty about the inference is a key part.
2. Comment on This
• Should we compare treatment groups statistically in a randomized study with respect to baseline parameters (e.g., age, gender, ethnicity, blood pressure)?
2. Provenzano: Clin J Am Soc Nephrol 4, 386-93, 2009
• “Baseline characteristics were similar except for more men in the oral iron group compared with the ferumoxytol group (62.9% versus 50.0%, P = 0.04). Mean baseline laboratory measures were similar between the two treatment groups.”
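Because randomization makes the arms differ only by chance, the null hypothesis of any baseline comparison is true by construction, so roughly 5% of such tests will come out “significant” anyway. The sketch below simulates this under stated assumptions: the arm size (200), the 57% share of men and the seed are hypothetical values, not Provenzano’s data, and the pooled two-proportion z-test is one common choice of test.

```python
import math
import random


def two_proportion_p(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided P-value of the pooled two-proportion z-test."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = (x1 / n1 - x2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))


def baseline_false_alarm_rate(n_per_arm: int = 200, p_male: float = 0.57,
                              trials: int = 4000, seed: int = 7) -> float:
    """Randomize patients into two arms, test sex balance, return the share with P < 0.05."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        men_a = sum(rng.random() < p_male for _ in range(n_per_arm))
        men_b = sum(rng.random() < p_male for _ in range(n_per_arm))
        if two_proportion_p(men_a, n_per_arm, men_b, n_per_arm) < 0.05:
            hits += 1
    return hits / trials


rate = baseline_false_alarm_rate()
print(f"share of 'significant' baseline imbalances: {rate:.3f}")
# With 10 independent baseline variables, P(at least one P < 0.05) = 1 - 0.95**10
print(f"chance of at least one of 10: {1 - 0.95 ** 10:.2f}")
```

The second line shows why a table of many baseline tests almost guarantees an occasional “significant” imbalance even when randomization worked.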
2. Comment on This
• For hypothesis-driven research, should we test for normality before using a t-test and, if the test rejects normality, try to transform the data?
Nissen Article
• Comparison of Pioglitazone vs Glimepiride on Progression of Coronary Atherosclerosis in Patients With Type 2 Diabetes. JAMA. 2008;299(13):1561-1573.
• ‘For continuous variables with a normal distribution, the mean and 95% confidence intervals (CIs) are reported. For variables not normally distributed, median and interquartile ranges are reported and 95% CIs around median changes were computed using bootstrap resampling.’ (N = 273 vs. 270 in the two groups)
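The quoted methods mention bootstrap CIs around median changes. A percentile bootstrap is one simple variant and is sketched below; the paper may have used a different bootstrap method, and the data, replicate count and seed here are hypothetical.

```python
import random
import statistics


def bootstrap_median_ci(data, reps=2000, level=0.95, seed=11):
    """Percentile bootstrap confidence interval for the median."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement, take each resample's median, and sort them
    medians = sorted(statistics.median(rng.choices(data, k=n)) for _ in range(reps))
    lo = medians[round((1 - level) / 2 * reps)]
    hi = medians[round((1 + level) / 2 * reps) - 1]
    return lo, hi


# Hypothetical within-patient changes (not data from the Nissen trial)
changes = [-2.1, -0.4, 0.3, 0.8, -1.5, 0.0, 1.2, -0.9, 2.5, -0.2, 0.6, -3.0, 0.4, 1.9, -0.7]
low, high = bootstrap_median_ci(changes)
print(f"median {statistics.median(changes)}, 95% CI ({low}, {high})")
```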
2. Testing Assumptions
[Figure: diagnostic test of assumptions, Passes / Fails]
3. Testing a Hypothesis (P-Value)
• Put a statement on trial: the “null hypothesis.”
• ISIS #2 (Second International Study of Infarct Survival): the five-week mortality rates for streptokinase and placebo are equal in patients with recent MIs.
• Results: streptokinase 791/8592 = 9.2% vs. placebo 1029/8595 = 12.0%.
3. P-Value
• P = 3.8 × 10⁻⁹
• If you replicated the experiment in a population where the null hypothesis were true, there would be a 3.8-in-a-billion chance of seeing a difference at least as extreme in either direction (two-sided).
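The slide does not say which test produced P = 3.8 × 10⁻⁹. The pooled two-proportion z-test without continuity correction is one standard choice, and applied to the ISIS #2 counts it gives z ≈ 5.9 and a two-sided P close to that value. A minimal sketch (the function name is illustrative):

```python
import math


def two_proportion_z_test(d1: int, n1: int, d2: int, n2: int):
    """Pooled two-proportion z-test; returns (z, two-sided P-value)."""
    p1, p2 = d1 / n1, d2 / n2
    pooled = (d1 + d2) / (n1 + n2)          # common death rate under the null
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # erfc(|z| / sqrt(2)) equals 2 * P(Z > |z|) for a standard normal Z
    return z, math.erfc(abs(z) / math.sqrt(2))


# ISIS #2 five-week deaths: streptokinase vs. placebo
z, p = two_proportion_z_test(791, 8592, 1029, 8595)
print(f"z = {z:.2f}, two-sided P = {p:.1e}")
```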
3. ISIS #2 Reference
• ISIS #2 Collaborative Group. (1988) Randomised trial of intravenous streptokinase, oral aspirin, both, or neither among 17,187 cases of acute myocardial infarction: ISIS 2. Lancet 2: 349-360.
3. P-Value and Proof by Contradiction
• What is the probability that, if you replicated your experiment in a target population where your null hypothesis is true, you would see differences at least as extreme as what you actually observed? This probability is the P-value; if it is small, it is evidence against the null hypothesis.
• The analogy is “beyond a reasonable doubt.” In most cases, science arbitrarily uses 5% as “reasonable” doubt.
3. Was This Overkill in Terms of Sample Size?
• Suppose the results were 79/859 vs. 103/860 (the same percentages, 9.2% vs. 12.0%, but with one tenth the sample size).
• Now P = 0.071 (7.1%), which would not be statistically significant. Would we be using this clot buster today? It was the biostatistician Sir Richard Peto who determined this sample size.
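The P-value definition (“replicate the experiment where the null hypothesis is true”) can be read directly as a simulation recipe. The sketch below is a parametric simulation in which both arms share the pooled death rate. It is a swapped-in method, since the slide does not say how P = 0.071 was computed, so expect a value in the same neighbourhood rather than an exact match. The replication count and seed are arbitrary.

```python
import random


def simulated_p_value(d1, n1, d2, n2, reps=4000, seed=2024):
    """Estimate a two-sided P-value by replaying the trial under the null hypothesis.

    Both arms are simulated with the pooled death rate; we count how often the
    simulated difference in death rates is at least as large as the observed one.
    """
    rng = random.Random(seed)
    pooled = (d1 + d2) / (n1 + n2)
    observed = abs(d2 / n2 - d1 / n1)
    extreme = 0
    for _ in range(reps):
        x1 = sum(rng.random() < pooled for _ in range(n1))
        x2 = sum(rng.random() < pooled for _ in range(n2))
        if abs(x2 / n2 - x1 / n1) >= observed - 1e-12:  # tolerance for float ties
            extreme += 1
    return extreme / reps


# One-tenth-size version of ISIS #2
p_small = simulated_p_value(79, 859, 103, 860)
print(f"simulated two-sided P = {p_small:.3f}")
```

With the full ISIS #2 counts, a difference that large would essentially never appear in any feasible number of replications, which is consistent with a P-value in the billionths.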
3. ISIS #2:
• Any other questions about the study?
3. ISIS #2 Issues
• Who was watching the store? Accrual took 3.5 years, and the outcome was known for each patient within five weeks.
• Always report a sample size justification in your papers (Provenzano, cited above, did not).
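A sample size justification for a two-arm comparison of proportions can be as short as the sketch below, which uses the standard normal-approximation formula. The planning values (12.0% vs. 9.2% mortality, two-sided alpha 0.05, 90% or 80% power) are hypothetical and chosen only to echo the observed ISIS #2 rates. This is not a reconstruction of the actual ISIS #2 calculation.

```python
import math
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.90) -> int:
    """Per-arm sample size for comparing two proportions (two-sided normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


# Hypothetical planning values: 12.0% control vs. 9.2% active five-week mortality
print(n_per_group(0.120, 0.092))              # 90% power
print(n_per_group(0.120, 0.092, power=0.80))  # 80% power
```

Both answers are in the low thousands per arm, far more than the one-tenth-size trial on the previous slide.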
4. Real Example
• Coronary Drug Project
The Coronary Drug Project Research Group (1980)
• Influence of adherence to treatment and response of cholesterol on mortality in the Coronary Drug Project. NEJM 303: 1038-1041.
• Double-blind randomized study of clofibrate vs. placebo in men who had a prior MI.
Compliers vs. Not on Drug
[Figure: Coronary Drug Project mortality (%), compliers (C_Drug) vs. non-compliers (NC_Drug)]
Compliers vs. Not
Drug vs. Placebo
Coronary Drug Project Take-home Message
What can this study teach us about clinical studies?
Intent-to-Treat
• The gold standard for analyzing randomized clinical trials is intent-to-treat: patients are analyzed in the groups to which they were assigned, irrespective of what they actually received.
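Intent-to-treat is, at bottom, a grouping rule. The sketch below contrasts it with an “as-treated” grouping on a hypothetical eight-patient trial in which the drug has no effect but the drug-arm patients who die had stopped taking it. The `Patient` record and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    assigned: str    # arm given by randomization
    took_drug: bool  # what the patient actually received
    died: bool


def rate(group):
    return sum(p.died for p in group) / len(group)


def intent_to_treat(patients):
    """Analyze everyone in the arm they were randomized to."""
    arms = {}
    for p in patients:
        arms.setdefault(p.assigned, []).append(p)
    return {arm: rate(group) for arm, group in arms.items()}


def as_treated(patients):
    """Group by treatment actually received (NOT protected by randomization)."""
    took = [p for p in patients if p.took_drug]
    did_not = [p for p in patients if not p.took_drug]
    return {"took drug": rate(took), "no drug": rate(did_not)}


# Hypothetical trial: no drug effect, but the sickest drug-arm patients stop taking it
patients = [
    Patient("drug", True, False), Patient("drug", True, False),
    Patient("drug", False, True), Patient("drug", False, True),
    Patient("placebo", False, False), Patient("placebo", False, True),
    Patient("placebo", False, False), Patient("placebo", False, True),
]
print(intent_to_treat(patients))  # {'drug': 0.5, 'placebo': 0.5}
print(as_treated(patients))
```

The as-treated split shows 0% vs. about 67% mortality, an apparent benefit created entirely by who stopped taking the drug.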
4. Real UF Example:
• Effectiveness of Nesiritide on Dialysis or All-Cause Mortality in Patients Undergoing Cardiothoracic Surgery. Clinical Cardiology. 2006 Jan;29(1):18-24. With T. Beaver et al.
• Motivation: the impression at Shands was that it was harmful and costly.
4. Nesiritide Example
• Study null hypothesis: the 20-day death/dialysis rate in patients receiving nesiritide within two days of surgery is the same as in “similar” patients not receiving it.
• Design suggestions?
4. Possible Designs (+/-)
• Observational, historical control: compare the period before the drug was introduced with the period after it began to be given to a sizable fraction of patients (leaving a gap while its use ramped up). Must include all comers and use electronic chart review.
• Observational: compare concurrent patients getting the drug with those not getting it.
• Randomized controlled prospective trial.
4. Sources of Variation
• Within treatments, why might we not get the same result for every patient?
• Historical Control?
• Comparing concurrent nesiritide vs. not?
• Randomized prospective trial?
4. Sources of Bias (Confounders)
• Why might we see differences that are totally unrelated to the treatment (nesiritide vs. not)?
• Historical Control?
• Comparing concurrent nesiritide vs. not?
• Randomized prospective trial?
4. Nesiritide: Propensity Scoring
• Actual design: compared nesiritide vs. not by propensity score matching.
• Using 12 key covariates, we estimated the probability that a patient would receive nesiritide given these covariates (the propensity score). We then matched nesiritide patients to non-nesiritide patients on the propensity score and did a matched analysis.
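A bare-bones sketch of the two steps follows. First, fit a logistic model for P(treatment | covariates). Second, greedily match each treated patient 1:1 to the nearest unused control within a caliper. Everything here is a hypothetical stand-in for the Shands analysis: the simulated cohort, the two covariates (the study used 12), the caliper of 0.05, the greedy matching order and the plain gradient-ascent fit.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, iters=500):
    """Fit logistic regression (with intercept) by plain gradient ascent."""
    k = len(X[0]) + 1
    beta = [0.0] * k
    n = len(X)
    for _ in range(iters):
        grad = [0.0] * k
        for xi, yi in zip(X, y):
            row = [1.0] + xi
            resid = yi - sigmoid(sum(b * v for b, v in zip(beta, row)))
            for j in range(k):
                grad[j] += resid * row[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def greedy_match(ps, treated, caliper=0.05):
    """1:1 nearest-neighbour matching on the propensity score, without replacement."""
    controls = {j for j, t in enumerate(treated) if not t}
    treated_idx = [j for j, t in enumerate(treated) if t]
    pairs = []
    for i in treated_idx:
        best = min(controls, key=lambda j: abs(ps[j] - ps[i]), default=None)
        if best is not None and abs(ps[best] - ps[i]) <= caliper:
            pairs.append((i, best))
            controls.remove(best)
    return pairs


# Hypothetical cohort with two standardized covariates
rng = random.Random(3)
X, treated = [], []
for _ in range(200):
    age, creat = rng.gauss(0, 1), rng.gauss(0, 1)
    treated.append(rng.random() < sigmoid(-1.0 + 0.8 * age + 0.6 * creat))
    X.append([age, creat])

beta = fit_logistic(X, [float(t) for t in treated])
ps = [sigmoid(beta[0] + beta[1] * a + beta[2] * c) for a, c in X]
pairs = greedy_match(ps, treated)
print(f"{sum(treated)} treated, {len(pairs)} matched pairs")
```

A matched analysis would then compare outcomes within pairs, for example with McNemar's test for a binary outcome such as death/dialysis.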
4. Conclusions
• Nesiritide showed no significant difference (inconclusive) within CABG patients.
• Nesiritide showed promise in aneurysm subjects with elevated baseline serum creatinine (SCr), but was inconclusive in other such patients.
• Run a future randomized double-blind trial in aneurysm patients with elevated SCr (just completed and close to being in press, with an inconclusive result).
4. Conclusion (continued)
• Note that the Shands study data were very important in designing the randomized follow-up study, in terms of the number of subjects needed (power analysis).
Design One Together
• Medical question: Does caffeine withdrawal cause headaches?
Eligibility
Design
• What are the sources of variation besides caffeine consumption?
• How do we control caffeine consumption?
• Should we use deception (hide the purpose of the study)? Is this ethical?
Design
• Pre-post?
• Double-blind parallel study?
• Double-blind crossover study?
Forensics for Irregularity
Phenylephrine
Phenylephrine Crossover Studies
Phenylephrine (Baseline NAR), 10 mg vs. Placebo
Study     N    Std Dev   CV = 100·SD/Mean
1 (EB)    16   2.0       15.3%
2 (EB)    10   0.9       6.7%
3         16   7.8       36.3%
4         15   9.5       35.6%
5         16   6.2       29.3%
6         16   9.8       40.4%
7         14   9.4       35.3%
How do we test for Data Irregularities?
• Background: baseline NAR (nasal airway resistance) measures are typically recorded as xx.x (e.g., 20.2) and are always based on the mean of 10 observations (5 from each nostril).
• What null hypothesis can we test to find potential irregularities? What P-value might we use to declare significance?
[Figure: frequency of the baseline last digit (3rd significant digit)]
Study 1 (count by last digit): 0: 2, 1: 4, 2: 2, 3: 6, 4: 2, 5: 23, 6: 8, 7: 9, 8: 3, 9: 5
Study 2 (counts): 9, 4, 7, 5, 5, 2, 1, 10, 3, 4
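One null hypothesis the question invites is “the last recorded digit is equally likely to be any of 0-9,” which a chi-square goodness-of-fit test can check. The sketch below applies it to the Study 1 counts above. The choice of this null, the function names, and the hand-rolled chi-square tail function (used in place of a statistics library such as SciPy, to stay self-contained) are assumptions. Whatever significance threshold one adopts for forensic screening, compare the resulting P-value against it.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square(df) variable, i.e. Q(df/2, x/2)."""
    a, x = df / 2.0, x / 2.0
    if x <= 0:
        return 1.0
    log_prefix = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1:  # series for the lower regularized gamma P(a, x)
        term = total = 1.0 / a
        ap = a
        for _ in range(1000):
            ap += 1
            term *= x / ap
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefix)
    # Continued fraction (modified Lentz) for the upper regularized gamma Q(a, x)
    tiny = 1e-300
    b = x + 1 - a
    c, d = 1 / tiny, 1 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < 1e-15:
            break
    return math.exp(log_prefix) * h


def digit_uniformity_test(counts):
    """Chi-square goodness-of-fit test that all last digits are equally likely."""
    n = sum(counts)
    expected = n / len(counts)
    stat = sum((o - expected) ** 2 / expected for o in counts)
    return stat, chi2_sf(stat, len(counts) - 1)


study1 = [2, 4, 2, 6, 2, 23, 8, 9, 3, 5]  # counts for digits 0-9, Study 1
stat, p = digit_uniformity_test(study1)
print(f"chi-square = {stat:.2f} on 9 df, P = {p:.1e}")
```

The excess of 5s (23 of 64 values, where about 6.4 would be expected) drives almost all of the statistic.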
• Thank You!!
Coronary Drug Project Data
Five-Year Mortality (Clofibrate)
• Compliers: 15.0% (15.7%) (N=708)
• Non-compliers: 24.6% (22.5%) (N=357)
• Compliers took >80% of their medication until death or 5 years, whichever came first.
• In parentheses: 5-year mortality adjusted for prognostic factors.
Coronary Drug Project
Five-Year Mortality (Placebo)
• Compliers: 15.1% (16.4%) (N=1813)
• Non-compliers: 28.2% (25.8%) (N=882)
• Compliers took >80% of their medication until death or 5 years, whichever came first.
• In parentheses: 5-year mortality adjusted for prognostic factors.
Coronary Drug Project
Five-Year Mortality (As Randomized)
• Clofibrate: 20.0% (N=1103)
• Placebo: 20.9% (N=2789)
• NB: Compliance could not be assessed in a small number of patients.
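The contrast at the heart of the Coronary Drug Project can be checked from the rates above. Because only rounded percentages and group sizes are given, the sketch reconstructs approximate, unadjusted pooled z-tests. The function name is illustrative, and the adjusted figures in parentheses are not used.

```python
import math


def z_test_from_rates(r1: float, n1: int, r2: float, n2: int):
    """Pooled two-proportion z-test from (rounded) rates and group sizes."""
    pooled = (r1 * n1 + r2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (r1 - r2) / se
    return z, math.erfc(abs(z) / math.sqrt(2))


# Placebo arm only: compliers vs. non-compliers (unadjusted rates)
z_placebo, p_placebo = z_test_from_rates(0.151, 1813, 0.282, 882)
# As randomized (intent-to-treat): clofibrate vs. placebo
z_itt, p_itt = z_test_from_rates(0.200, 1103, 0.209, 2789)
print(f"placebo compliers vs. non-compliers: z = {z_placebo:.1f}, P = {p_placebo:.1e}")
print(f"clofibrate vs. placebo (ITT): z = {z_itt:.2f}, P = {p_itt:.2f}")
```

Adherence to a placebo “predicts” survival with an enormous z-statistic, while the randomized comparison of the drug shows no significant difference. Comparing compliers with non-compliers therefore says nothing about whether the drug works.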