Critical Appraisal Skills
How to Interpret
Research Evidence
EBM Workshop
September 2007
Aaron Tejani
[email protected]
Declaration
Paid by:
Fraser Health 80%
Therapeutics Initiative, UBC 20%
No perceived or actual conflict of interest with
the pharmaceutical industry in the last 4 years
What do these guys have in
common?
Questions?
The wise man doesn’t give the right answers,
he poses the right questions
- Claude Levi-Strauss
Identifying Misleading Claims
Cautionary Tales in the Clinical Interpretation
of Therapeutic Trial Reports.
Cautionary tales in the interpretation of systematic
reviews of therapy trials
Scott et al. IntMedJ2005;35:611-21
Scott et al. IntMedJ 2006;36:587–599
Users’ Guide to Detecting Misleading Claims in
Clinical Trial Reports.
Montori VM, Jaeschke R et al. BMJ2004;329:1093-96
Evidence Based Medicine
Definition
The integration of best research evidence with clinical
expertise and patient values.
When these three elements are integrated, clinicians and patients
form a diagnostic and therapeutic alliance with optimized clinical
outcomes and quality of life.
David Sackett and colleagues
EBM is NOT purely academic or financial exercise
Its implementation has major clinical implications that can
save lives and prevent harm
Critical Appraisal
Definition
A method of assessing and interpreting the
evidence by systematically considering its
Validity
Results
Relevance
Essential part of evidence-based clinical
practice
Required to determine BEST evidence
Different Forms of Evidence
Systematic review
A rigorous, systematic process to identify, synthesize and
evaluate the available literature
Can be used to change practice by implementing the best
available literature
Meta-analysis
It is an extension of a well done systematic review, which
provides a quantitative estimate of the net benefit
aggregated over the included studies
Different Forms of Evidence
Clinical Trials
Randomized controlled trial (RCT)
Minimize bias
Cohort
Useful for topics with known health risks
Harms of smoking
Effects of drugs in pregnancy
Different Forms of Evidence
Case series or case control
Useful for identifying areas that require further
investigation
Help identify adverse effects
Expert opinion
When other forms of data are not available
Hierarchies of Evidence
I-1: Systematic review of several double-blind randomized controlled trials
I-2: One or more large double-blind randomized controlled trials
II-1: One or more well-conducted cohort studies
II-2: One or more well-conducted case-control studies
III: Expert committee sitting in review, peer leader opinion
IV: Personal experience
**There are many different types of grading systems
The Numbers
TI letter #16 1996
Real Clinical Scenarios
TI letter #16 1996
Absolute Risk Reduction (ARR)
Absolute Risk Reduction illustrates
The actual decrease from control to treatment in
terms of effect
The absolute change
Example
Absolute risk reduction for the use of beta blockers
post MI
3.9%
Absolute Risk Reduction
Calculating Absolute Risk Reduction
ARR = X - Y
X = control event rate
Y = treatment event rate
Example:
Treatment cancer rate=4%
Control cancer rate=8%
ARR = 8% - 4% = 4% absolute reduction in cancer with treatment compared to
control
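A minimal Python sketch of the ARR calculation, using the cancer example from this slide. The function name is illustrative only and not part of the original workshop.

```python
def absolute_risk_reduction(control_rate_pct, treatment_rate_pct):
    """ARR = X - Y, where X is the control event rate and Y the treatment event rate (both in %)."""
    return control_rate_pct - treatment_rate_pct


# Slide example: control cancer rate 8%, treatment cancer rate 4%
arr = absolute_risk_reduction(8.0, 4.0)
print(f"ARR = {arr:.0f}%")  # ARR = 4%
```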
Number Needed to Treat (NNT)
Tool to place results into humanistic terms
Number needed to treat (NNT)
Benefit of therapy
Number needed to harm (NNH)
Harm of therapy
Number Needed to Treat
Calculation of Number Needed to Treat (NNT)
NNT= 100/ARR(%)
Previous example
ARR=4%, NNT=100/4=25
Treat 25 people to prevent one cancer
Calculation of Number Needed To Harm
NNH= 100/ARI(%)
NNH
Example
Treatment GI bleed rate=10%
Control GI bleed rate=5%
ARI=10-5=5% increased risk with treatment for a
GI bleed
NNH=100/5=20
1 GI bleed will occur for every 20 people treated
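The NNT and NNH formulas can be sketched in a few lines of Python. This is only an illustration using the two examples on these slides; the function name is an assumption. In practice the result is usually rounded up to a whole person.

```python
def number_needed(absolute_difference_pct):
    """100 divided by the absolute risk difference in percent (ARR for NNT, ARI for NNH)."""
    if absolute_difference_pct <= 0:
        raise ValueError("absolute difference must be greater than 0%")
    return 100.0 / absolute_difference_pct


nnt = number_needed(8.0 - 4.0)    # cancer example: ARR = 4%
nnh = number_needed(10.0 - 5.0)   # GI bleed example: ARI = 5%
print(f"NNT = {nnt:.0f}, NNH = {nnh:.0f}")  # NNT = 25, NNH = 20
```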
Important
Only calculate ARR/ARI/NNT/NNH if the
result is statistically significant!!
Relative Risk Reduction (RRR)
Definition
Difference between the control and treatment
usually in terms of “reducing the chances”
Relative Risk Reduction
Calculating Relative risk (RR)
X is control group
Y is treatment group
RR = Y/X
Calculating Relative Risk Reduction (RRR)
X is control
Y is treatment
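RRR = (X - Y)/X = (CER - EER)/CER
CER = control event rate (X), EER = experimental (treatment) event rate (Y)
Previous example = (8 - 4)/8 = 50% relative reduction in cancer
with treatment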
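A short Python sketch of relative risk and relative risk reduction, using the same cancer rates (control 8%, treatment 4%). Note that the denominator is the control event rate. Function names are illustrative assumptions.

```python
def relative_risk(cer, eer):
    """RR = treatment (experimental) event rate divided by control event rate."""
    return eer / cer


def relative_risk_reduction(cer, eer):
    """RRR = (CER - EER) / CER."""
    return (cer - eer) / cer


rr = relative_risk(8.0, 4.0)
rrr = relative_risk_reduction(8.0, 4.0)
print(f"RR = {rr:.2f}, RRR = {rrr:.0%}")  # RR = 0.50, RRR = 50%
```

The same 50% relative reduction corresponds to only a 4% absolute reduction here, which is why both numbers need to be reported.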
Types of Outcomes
Dichotomous/categorical
Yes or No
Possible to calculate absolute risk and NNT
Continuous
Blood pressure
Rating scales
In order to calculate AR and NNT/NNH, a
clinically relevant change must be clearly defined a
priori
Confidence Intervals
If the trial were repeated many times, 95% of the confidence
intervals calculated this way would contain the true effect
Roughly: 95% confident that the true difference is within the stated
range
This applies to a 95% CI, i.e. alpha=0.05
Helps to determine how precise the results are
Narrow versus wide confidence interval
Point estimate
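To make "narrow versus wide" concrete, here is a hedged Python sketch that computes a 95% CI for an absolute risk difference with the simple normal approximation. The trial sizes (1000 and 100 per group) are hypothetical numbers chosen for illustration, and the normal approximation is an assumption, not necessarily the method a given trial used.

```python
import math
from statistics import NormalDist


def risk_difference_ci(control_events, control_n, treat_events, treat_n, level=0.95):
    """Return (point estimate, lower, upper) for control rate minus treatment rate."""
    cer = control_events / control_n
    eer = treat_events / treat_n
    arr = cer - eer
    # Standard error of a difference between two independent proportions
    se = math.sqrt(cer * (1 - cer) / control_n + eer * (1 - eer) / treat_n)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return arr, arr - z * se, arr + z * se


# Same 8% vs 4% event rates, hypothetical large and small trials
large = risk_difference_ci(80, 1000, 40, 1000)
small = risk_difference_ci(8, 100, 4, 100)
for label, (arr, lo, hi) in (("n=1000/group", large), ("n=100/group", small)):
    print(f"{label}: ARR {arr:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

The point estimate is the same in both cases, but the small trial's interval is much wider and crosses zero.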
CI Humour Break
No, you can’t leave the room
P-Values
Helps determine whether results could be due to
chance alone (Type 1 error, alpha)
P value of 0.05 means that, if there were truly no difference, a result at
least this extreme would be seen 5% of the time by chance
P-value of 0.01 means that, if there were truly no difference, a result at
least this extreme would be seen 1% of the time by chance
Two tail or one tail tests
Specifies direction of difference
i.e. 2-sided, can see differences in positive or negative direction
P value needs to be adjusted for multiple comparisons
Power
Power
Need to have enough people in the study to have
enough power to determine if a difference actually
occurred
If no difference seen, then need to consider the
sample size/power calculation
If a difference is seen, power is not an issue
Alpha
Choosing your Alpha
What you are willing to accept
5% or 1% probability that difference is due to chance
Once you pick your alpha you CANNOT change it post
hoc
Alpha relates to p-value and is used to calculate
sample size
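A rough Python sketch of how alpha and power feed into a sample size calculation for comparing two event rates. It uses the common normal-approximation formula for two proportions, which is an assumption here (the slides do not specify a formula). The 8% vs 4% rates are borrowed from the earlier cancer example.

```python
import math
from statistics import NormalDist


def sample_size_per_group(control_rate, treatment_rate, alpha=0.05, power=0.80):
    """Approximate patients needed per group for a two-sided test of two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided alpha
    z_beta = NormalDist().inv_cdf(power)            # power = 1 - beta
    variance = (control_rate * (1 - control_rate)
                + treatment_rate * (1 - treatment_rate))
    difference = control_rate - treatment_rate
    return math.ceil((z_alpha + z_beta) ** 2 * variance / difference ** 2)


n_default = sample_size_per_group(0.08, 0.04)              # alpha 5%, power 80%
n_strict = sample_size_per_group(0.08, 0.04, alpha=0.01)   # stricter alpha
print(n_default, n_strict)
```

A stricter alpha (or a smaller expected difference) pushes the required sample size up, which is why alpha has to be fixed before the trial starts.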
Risk, Odds and Hazards
Risk Ratio (relative risk)
If 24 skiers
6 fall
The risk of falling is 25% (6/24)
Odds
If 24 skiers
6 fall
18 do not fall
Odds is 6/18 or 1/3
The chances of falling were 3 to 1 against
3 times more likely not to fall than to fall
Odds Ratio
Odds of one treatment versus odds of the other
treatment
If 24 skiers
6 fall
If 24 snowboarders
12 fall
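Odds of falling: skiers 6/18 = 0.33, snowboarders 12/12 = 1.0
Odds ratio 0.33/1.0 = 0.33
The odds of falling while skiing are one third of the odds while
snowboarding
(The risk ratio is 0.25/0.5 = 0.5: half the risk of falling if you are
skiing as compared to snowboarding)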
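A small Python sketch of the skier and snowboarder example, keeping risk and odds separate. Dividing 0.25 by 0.5 gives the ratio of risks; the odds ratio uses the odds (6/18 and 12/12) instead. Function names are illustrative.

```python
def risk(events, total):
    """Events divided by everyone at risk."""
    return events / total


def odds(events, total):
    """Events divided by non-events."""
    return events / (total - events)


ski_risk, ski_odds = risk(6, 24), odds(6, 24)          # 6 of 24 skiers fall
board_risk, board_odds = risk(12, 24), odds(12, 24)    # 12 of 24 snowboarders fall

risk_ratio = ski_risk / board_risk
odds_ratio = ski_odds / board_odds
print(f"Risk ratio = {risk_ratio:.2f}, odds ratio = {odds_ratio:.2f}")
```

The two measures diverge here because falling is a common event; they are close only when events are rare.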
Continuous Outcomes
E.g. blood pressure, rating scales, etc
Important points
Make sure scale is valid
Need to know what a “clinically relevant” change
in the scale is
Need to understand what the scale is measuring
i.e. linear, non-linear, Likert-type scale
Don’t get hung up with STATS!!
The appropriate statistical test is not as
important as methods and outcomes
DO NOT PANIC
More important aspects of critical appraisal
Clearly defined methods
Reasonable and clinically significant outcomes and
measurements
Randomization
Define Randomization
A method based on chance alone by which
study participants are assigned to a treatment
group. Randomization minimizes the
differences among groups by equally
distributing people with particular
characteristics among all the trial arms.
www.medterms.com/script/main/art.asp?articlekey=38700
What Characteristics Are Equally
Distributed?
The Benefits of Randomization
Everyone has an equal chance of getting
assigned treatment or control
MOST IMPORTANT:
Groups are divided equally for known and
unknown characteristics
This ensures…
Differences in outcome are likely only due to
differences in assigned treatment
If randomization was deemed successful
User’s Guides’ Statement
“The beauty of randomization is that it assures,
if sample size is sufficiently large, that both
known and unknown determinants of outcome
are evenly distributed between treatment and
control groups.”
http://www.cche.net/usersguides/therapy.asp
Was Randomization Effective?
Look at Baseline characteristics to see if they were
balanced
NOTE: This does not account for unknown characteristics
but if numbers are large and known’s are balanced
Can assume unknown’s are as well
p-values when comparing baseline characteristics
Don’t worry about them
If you think the differences may have an effect then they
might
Doesn’t matter if they are “statistically significant differences”
User’s Guides’ Statement
“The issue here is not whether there are
statistically significant differences in known
prognostic factors between treatment groups
(in a randomized trial one knows in advance
that any differences that did occur happened
by chance) but rather the magnitude of these
differences.”
http://www.cche.net/usersguides/therapy.asp
NAC for Prevention of RF in
Cardiac Surgery
If baseline characteristics are
different…
All is not lost
Authors can do analyses adjusting for
differences
Should clearly state that this was done and how
If adjusted and unadjusted analyses show
similar results
You can be more confident of the findings
Not Randomized…So What?
Non-randomised studies overestimate
treatment effect by 41% with inadequate
method, 30% with unclear method
JAMA 1995 273: 408-12.
Completely different result between
randomized and non-randomized studies
BritJAnaesth1996; 77: 798-803.
Red and Yellow Lollipops!!
Red
Male
Female
Born Jan-June
Watched the
news last night
Plays an
instrument
Has a pet
Yellow
Allocation
Concealment (AA)
Example:
GP in his office with 2 elderly gentlemen in his
waiting room
He knows one guy well and this man has had
terrible luck with his health
He is enrolling men into a clinical trial
He knows that the next person is going to get Box
A and the one after that gets Box B
Last month he enrolled a man who got Box A and
has not improved; rather, he has gotten worse
What is AA?
Is it blinding?
Can it always be done?
Can blinding always be done?
What is selection bias?
Example…
What is AA?
Definition:
“…shields those involved in a trial from knowing
upcoming assignments. Without this protection,
investigators and patients have been known to
change who gets the next assignment, making the
comparison groups less equivalent”
Purpose?
To reduce “selection bias”
Evid Based Med 2000;5:36-38
What is AA?
If AA is not successful what design feature is
compromised and why?
“…if the investigator or clinician (or the patient) is
able to identify the impending treatment allocation
and is able to influence the enrolment (or selection) of
participating patients, the value of randomisation is
compromised.”
May lead to imbalances in prognostic factors between
groups
MJA 2005;182(2):87-89
What is good AA?
Opaque-sealed envelopes
Pharmacy-controlled allocations
Coded identical medication containers
Telephone or web-based central randomization
Who Cares if AA Wasn’t Done?
Those trials that report inadequate methods of
AA
Report ~40% larger effect sizes compared to trials
that use sound methods of AA
Those trials that do not mention AA (unclear method)
Report ~30% larger effect sizes compared to trials
that reported adequate AA
JAMA 1995;273:408–12.
What do you do if AA isn’t done
well?
If you meta-analyze trials…
Do a sensitivity analysis of trials with good AA versus no AA
E.g. Atypical coverage in patients with CAP
What do you do if AA isn’t done?
If there is no meta-analysis…
Assume the effect size may be exaggerated and base
decisions on conservative estimate of effect size
i.e. look at the conservative end of the confidence interval
Mortality reduction 6% (95% CI 1% to 8%)
May conclude reduction is probably between 1% and 5%
CAUTION: This is not scientific! This is my attempt at
common sense…
Reporting
Do not assume that AA was not done if not
reported
Can ask the authors if they did or not
Only 9-15% of trials adequately report these
methods
Bad reporting
Good reporting
MJA 2005;182(2):87-89
Blinding (Masking)
Definitions
Single blind
Either clinician OR patient is unaware of assigned
treatment
Double blind
Both clinician and patient are unaware of assigned
treatment
Triple blind
Clinician, patient, and people who adjudicate outcomes are
unaware of treatment assignment
A question
What is “double-dummy”?
The purpose of blinding
Attempts to minimize
Reporting bias
Assessment bias
E.g.
E.g.
Concomitant treatment bias
E.g.
How can you tell if blinding is
broken?
If the authors test for success of blinding
Blinding may be broken when
One treatment has a particular side effect that
would give it away
E.g. infusion site reactions
Look at ADR table and see if this may be occurring
Testing the success of blinding
Some would argue that success of blinding testing is
not reliable
(Sackett Int J Epi 2007;36:665-666)
These tests only tell us about the “hunches” people have
It is better to measure the effects of lost blinding
Co-intervention (get the study drug by other means)
Contamination (controls get open label treatment with another
drug)
Reporting bias (e.g. people on the study drug downplay symptoms)
What is the consequence of broken
blinding?
Studies with poor/absent blinding tend to
overestimate treatment effects by ~17%
JAMA 1995;273:408-12.
What to do if blinding is broken?
Assume the effect size is an over-estimate
Look to the conservative end of the confidence
interval
Mortality reduction 6% (95% CI 1% to 8%)
May conclude reduction is probably between 1%
and 5%
CAUTION: This is not scientific! This is my
attempt at common sense…
How do you interpret open-label
trials?
If there is a meta-analysis
Do a sensitivity analysis on blinded versus unblinded trials and see if the effect size changes
If there is no meta-analysis
Assess whether blinding was possible
Interpret findings carefully knowing they could be
an overestimation of an effect
Especially if blinding was possible
MJA 2005;182(2):87-89
Intention-to-treat Analysis
Very Important!!
Definition:
Analyze participants into the groups to which they
were randomized
Even if they did not take assigned treatment, dropped
out early, or did not follow protocol exactly
Intention-to-treat Analysis
Loss to Follow Up
Each trial should report the number of people
lost and how many per group
Usual assumption
General rule:
Nothing bad happened to people that were lost
>20% of randomized population lost…analysis
becomes unreliable
Consider the worst-case scenario and see if this
changes the result
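One way to sketch the worst-case check in Python: assume every patient lost from the treatment group had the event and no patient lost from the control group did, then see what is left of the benefit. All numbers below are hypothetical, and the function is an illustration rather than a standard method from the cited sources.

```python
def worst_case_check(control_events, control_lost, control_n,
                     treat_events, treat_lost, treat_n):
    """Compare ARR (%) under the usual assumption with a worst-case assumption."""
    control_rate = control_events / control_n
    # Usual assumption: nothing bad happened to the people who were lost
    usual_arr = control_rate - treat_events / treat_n
    # Worst case: all lost treatment patients had the event, no lost control patients did
    worst_arr = control_rate - (treat_events + treat_lost) / treat_n
    lost_pct = 100 * (control_lost + treat_lost) / (control_n + treat_n)
    return {"usual_arr_pct": 100 * usual_arr,
            "worst_arr_pct": 100 * worst_arr,
            "lost_pct": lost_pct}


# Hypothetical trial: 1000 per group, 30 lost per group
result = worst_case_check(control_events=80, control_lost=30, control_n=1000,
                          treat_events=40, treat_lost=30, treat_n=1000)
print(result)
```

In this made-up example only 3% of patients are lost, yet the ARR could shrink from about 4% to about 1%, so the check is worth doing even well below the 20% rule of thumb.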
A Primer on the
Interpretation of
Subgroup Analyses (SA)
in Clinical Trials
George Bernard Shaw
“Beware of false knowledge, it is more
dangerous than ignorance.”
Different?
What is a Subgroup Analysis?
Drug A versus Drug B are equal with respect
to mortality in patients with coronary artery
disease
You want to see if this finding is the same in men
versus women
Or do Diabetics have a mortality benefit as
compared to non-Diabetics?
Why is SA So Common in Trials?
Basically to see if different types of patients
respond differently to the same treatment
If there is an overall benefit of treatment
See if there is more or less benefit in a subgroup
If there is no overall effect
See if there is an effect in at least a certain type of
patient
What Questions Should SA Attempt
to Answer?
At what stage of disease is treatment most
effective?
How are the risks and benefits of treatment
related to co-morbidity?
What time after an event is treatment most
effective?
Example: CAPRIE Trial
Lancet 1996; 348: 1329–39
Clopidogrel versus ASA for reducing ischemic
events in patients at risk
Patients with MI, Stroke, or PAD history
DBRCT
Clopidogrel 75 mg po daily n>30,000
ASA 325 mg po daily
Duration 1-3 years
Primary: ischemic stroke, MI, vascular death
Example: CAPRIE Trial
Lancet 1996; 348: 1329–39
Problems with SA
1. Multiple testing
The more tests you do, the greater the probability of finding a
difference (that is really due to chance)
Need to correct P value for multiple comparisons
Crude way
0.05 / # tests
Cook DI et al. MJA 2004;180(3):289-91
Adjusting for Multiplicity
k = number of independent tests performed
p* is the smallest p-value calculated
Corrected p = 1-(1-p*)^k
The adjusted p-value can then be compared to
the traditional p=0.05 as being statistically
significant or not
Adjusting for Multiplicity
Example: CHARISMA
Clopidogrel+ASA vs ASA alone in patients
with CVD
NOTE: No benefit in overall population
Symptomatic vs asymptomatic
N Engl J Med 2006;354:1706-17.
Adjusting for Multiplicity
Example: CHARISMA
12 subgroups
Unadjusted p=0.046
Adjusted p-value for symp vs asymp=
1-(1-0.046)^12=0.43
NEJM 2006;354:1667-1669
NEJM 2006;354:1706-17
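The multiplicity adjustments on these slides can be sketched in Python. The correction 1-(1-p*)^k, with k as an exponent, is commonly called the Šidák correction, and the crude 0.05 / # tests rule is the Bonferroni threshold. Function names are illustrative; the inputs are the numbers quoted on the slides.

```python
def adjusted_p(smallest_p, k):
    """Corrected p = 1 - (1 - p*)^k for k independent tests."""
    return 1 - (1 - smallest_p) ** k


def crude_threshold(alpha, k):
    """Crude (Bonferroni) significance threshold: alpha divided by number of tests."""
    return alpha / k


def chance_of_false_positive(alpha, k):
    """Probability of at least one false positive across k independent tests."""
    return 1 - (1 - alpha) ** k


charisma = adjusted_p(0.046, 12)               # 12 subgroups, unadjusted p = 0.046
caprie_threshold = crude_threshold(0.05, 3)    # 3 subgroups
four_subgroups = chance_of_false_positive(0.05, 4)
print(f"{charisma:.2f} {caprie_threshold:.4f} {four_subgroups:.1%}")
```

The CHARISMA subgroup result that looked significant at p=0.046 is nowhere near significant once the 12 comparisons are accounted for.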
Probability of False Positives
e.g. 4 subgroups analyzed
Approximately 15-20% chance of a false positive…instead
of a 5% chance
(i.e. p=0.05)
NEJM 2006;354:1667-1669
Lancet 1996; 348: 1329–39
Problems with SA
1. Multiple testing
Using the CAPRIE example
3 subgroups ~ 3 tests for the primary outcome
0.05 / 3 = 0.0167 is the adjusted significance threshold
E.g. any of the subgroups would have to have p<0.0167 to
be SS, or adjusted p=0.008 from 1-(1-p)^k
Problems with SA
2. Lack of statistical power for SA
Most studies powered for the whole population
only
Studies usually do not have power to detect
differences in subgroup
If a difference is seen, there is enough power
but it may be a false positive
If no difference is seen in a subgroup it may be
due to actual lack of power
Caution
When there is no overall effect
Subgroups will show effects that aren’t real 7-21% of
the time
When there is an overall effect
Subgroups won’t show a difference 41-66% of the
time
Health Technol Assess 2001 5: 1–56.
Tests for Interaction
A test for “heterogeneity of treatment effect”
The appropriate statistical test
Does not test the magnitude of difference between
subgroups and the overall population
Does test to see if the subgroup effect is different
from the overall effect but says nothing of by how
much
Tests for Interaction
E.g. CAPRIE
Lancet 1996; 348: 1329–39
Tests for Interaction
E.g. CHARISMA
N Engl J Med 2006;354:1706-17.
Problems with SA
3. Subgroups are not truly “randomized”
Randomization works to balance known and
unknown factors in the overall population
Subgroups are NOT truly randomized unless
randomization was:
Stratified
This allows SA to be based on pre-randomization
characteristics
Problems with SA
3. Subgroups are not truly “randomized”
E.g. TNT trial
NEJM 2005;352(March 08):
Problems with SA
3. Subgroups are not truly “randomized”
E.g. TNT trial
Achieving a certain LDL level is a post-randomization
phenomenon
No tests for interaction were done/reported on the effect in
patients who achieved lower vs higher LDL levels
Should have randomized to titrating to LDL<2.5 and
LDL<2.0
This trial was a high vs low dose
NEJM 2005;352(March 08):
Problems with SA
4. The play of chance
Even properly done SA can yield significant tests
of interaction by chance alone
Especially when the subgroup effect is not plausible or
it is unanticipated
CHARISMA
N Engl J Med 2006;354:1706-17.
Replication is the Solution
Due to lack of power and high probability of
false-positives
Subgroup differences should be “hypothesis
generating”
The findings should be tested in a trial designed for this
purpose
Checklist for “Good” SAs
1. Subgroups should be:
Based on pre-randomization characteristics and
stratified accordingly
YES
Based on intent-to-treat population
To maintain the benefits of randomization
YES
Checklist for “Good” SAs
2. Must be pre-defined
And have a plausible biological reason for
choosing the subgroup
YES
Should justify the direction of the expected
difference in the subgroup
NO
Checklist for “Good” SAs
3. Reporting
All numerators and denominators should be
reported
YES
# of planned subgroups
YES
Address the issue of multiple testing
NO
Checklist for “Good” SAs
4. Statistical Analysis
Need to do tests for interaction (heterogeneity of
effect between overall population and subgroup)
A significant interaction test tells you
The subgroup effect is different than overall effect
DOES NOT tell you anything about the magnitude of
difference
Checklist for “Good” SAs
5. Interpretation of findings
The overall effect should be stressed
Due to the high risk of false-positive findings in subgroups
Not unusual to find a SS difference in a subgroup when NSS
overall finding
YES, the overall effect was stressed
In General…
Focus on overall results
Use results in subgroups only if
Significant interaction test (heterogeneity)
You still want to explore possible reasons for
interaction
i.e. make sure there is a biological reason for the
subgroup difference
Has this difference been found in other studies?
Practical Application
•So for CAPRIE…
•Could use the results to treat PAD patients with clopidogrel but not stroke or MI patients
•Might want to study the effect of clopidogrel in MI patients
Subgroup Analyses References
Lagakos SW. NEJM 2006;354(16):1667-9
Cook DI et al. MJA 2004;180(3):289-91
Simes RJ et al. MJA 2004;180(5):467-9
Rothwell P. Lancet 2005;365:176–86
Demographics
Composite Outcome
(CO) Interpretation
What is a Composite Outcome?
Example #1
TIME Trial (Lancet 2001; 358: 951–957)
Invasive therapy versus medical management for
symptomatic coronary artery disease
Death, non-fatal MI, admission for ACS
“The primary endpoint was analysed by
intention to treat as a composite endpoint, and
all components separately as secondary
endpoints.”
What is a Composite Outcome?
Example #2:
CAPRIE Trial (Lancet 1996; 348: 1329–39)
Clopidogrel versus ASA and ischemic events in patients
at risk
The first occurrence of ischaemic stroke, myocardial
infarction, or vascular death.
No stated/planned assessment of individual components of the
composite
What is a Composite Outcome?
Example #3:
ValHeFT Trial (N Engl J Med 2001;345:1667-75.)
Added Valsartan versus standard therapy for CHF
Mortality and the combined end point of mortality and
morbidity:
defined as cardiac arrest with resuscitation, hospitalization for
heart failure, or administration of intravenous inotropic or
vasodilator drugs for four hours or more without hospitalization.
Benefit of Composite Outcomes
Statistical Efficiency
Improved medical care has led to low event rates
(e.g. fewer MIs, strokes, etc…)
Need large trials with long follow up to
demonstrate differences between treatments
Not “feasible” for many researchers
COs allow researchers to show differences in smaller
trials with shorter durations
What do you think?
Death / MI/ Admission for ACS??
Death / MI / Stroke??
Cardiac arrest with resuscitation,
hospitalization for heart failure, or IV inotropic
or vasodilator drugs for four hours or more
without hospitalization??
The Primary Question About CO
Can I use the analysis of the composite
outcome comparison between treatment and
control as the basis for a decision?
E.g. If the CO is lower for Intervention X
compared to control would I prescribe X?
OR
If the CO is lower for X do I need to look at the
analysis of the components before I make a
decision?
Checklist for COs [1,2]
The benefit of using a CO is realized if:
1. □ Individual components are of equal importance
2. □ Effects of intervention on components will be similar
(i.e. occur at similar frequency)
3. □ The more important components should not be
negatively affected by the intervention
4. □ Counting rules are appropriate and individual
component data is presented
1. Individual Components are of Equal Importance
a. Example #1: TIME Trial
– Death, non-fatal MI, admission for ACS
[Table: Invasive vs Med]
Could argue that death and MI are more important to patients
than admission for ACS
It then becomes important to know:
–
Which part is driving the reduction in the CO?? If it is
admission due to ACS then invasive treatment may not be
worth it for a patient
2. Components Occur at Similar Frequency
a. Example #1: TIME Trial
– Death, non-fatal MI, admission for ACS
•Easy to see that Death and MI occur much less than Admission for ACS
•In this case using the CO result when making decisions could be problematic
•The total CO event rate is primarily driven by admissions and not the
outcomes that “may” hold greater importance to a patient
3. Important Components are Not Negatively
Affected
a. Example #1: TIME Trial
– Death, non-fatal MI, admission for ACS
•The MOST important outcome… numerically occurs more frequently in the
Invasive treatment group but this is NSS
4. Counting Rules are Appropriate and Component
Data is presented
a. Example #1: TIME Trial
– Death, non-fatal MI, admission for ACS
•YES…Component data is presented…but was the counting proper?
•They don’t tell us if it is “first occurrence of” or if individual component rates are
for entire study
•What if you had an MI and then a month later you were admitted for ACS?
•Were both events accounted for?
TIME Trial Conclusion
Do not base decisions on analysis of CO rates
alone
Need to look at the details of the components
Example #2 CAPRIE Trial
Ischemic stroke/MI/Vascular Death
1. Components are of equal importance
Yes, most would agree that they are all clinically important
outcomes
In that case looking at the overall result could be used to make a
decision
Be careful
All cause death is not part of the composite
Assumes clopidogrel won’t affect non-vascular death negatively
Need to consider the rest of the checklist
Example #2 CAPRIE Trial
Ischemic stroke/MI/Vascular Death
2. Occur at similar frequencies
???
Non-fatal MIs (NFMI) occur less frequently than vascular deaths and strokes
Example #2 CAPRIE Trial
Ischemic stroke/MI/Vascular Death
3. Important outcomes are not negatively affected
None of the components were negatively affected
Example #2 CAPRIE Trial
Ischemic stroke/MI/Vascular Death
4. Counting rules appropriate and component data presented
Component data presented BUT…
Not sure if counting rules were OK
The main table reports “first occurrence of” rates
So it is possible that a patient died at some point after a stroke and this
death was not counted
CAPRIE Trial Conclusion
Could use CO analysis as the basis for a
decision
The only issue is the lower MI event rates versus
other components but this is debatable
Example#3 ValHeFT Trial
Focus on counting rules only
Appropriate for DEATH
E.g. the “total deaths” are greater than “death (as first event)”
Use the CO Analysis if…
1.□ Individual components are of equal importance
2.□ Effects of intervention on components will be
similar (i.e. occur at similar frequency)
3.□ The more important components should not be
negatively affected by the intervention
4.□ Counting rules are appropriate and individual
component data is presented
Composite Outcome References
Kleist P. Applied Clinical Trials 2006 (May
Issue)
Montori VM et al. BMJ 2005;330;594-596
Superiority, Noninferiority, and
Equivalence Trials.
Aaron Tejani
April 18, 2007
The Main Purpose of Each
Superiority
Is one treatment better than another?
Non-inferiority
Is one agent no worse than a standard therapy (based on a
pre-defined “no worse” margin)?
Equivalence
Is one agent no worse or no better than a standard therapy
(based on pre-defined limits of no worse or better)?
Superiority Trials
E.g. GUSTO III
Designed to show that RPA would lower mortality
more than TPA in MI patients
RPA 7.47% vs TPA 7.24%
Risk difference 0.23% (2-sided 95% CI, -0.66% to 1.10%)
Incorrect conclusion: No difference in mortality
Correct conclusion: not sure what the difference in
mortality is
Non-inferiority Trials
E.g. St. Johns Wort vs Paroxetine
Designed to show that SJW was no worse than
paroxetine at decreasing HamD scores
Non-inferiority Trials
E.g. St. Johns Wort vs Paroxetine
Key points
Defined the “no worse” margin a priori
Basically they were saying that
E.g. if paroxetine reduced HamD by 15 points, then
the worst-case end of the confidence interval for the
difference between SJW and paroxetine would have to be
no more than 2.5 points worse than paroxetine
Non-inferiority Trials
E.g. St. Johns Wort vs Paroxetine
Paroxetine HamD decrease 11.4
SJW HamD decrease 14.4
The difference is 3 points more with SJW
The range of the difference is 1.5 to 4.0
The worst case is still a 1.5 point difference in favour of SJW
This worst case is better than a -2.5 difference (the
defined margin)
Hence non-inferiority is proven
Non-inferiority Trials
E.g. St. Johns Wort vs Paroxetine
Hypothetically non-inferiority would not have been proven
if
Paroxetine decrease was 15
SJW decrease was 13
The difference was -2 (range -4 to -1)
The worst case for the difference is -4 points
This is lower than the margin of -2.5 points
Hence SJW would be considered not non-inferior BUT
need to look at per protocol analysis
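The non-inferiority decision rule from these two slides can be sketched in Python. The difference is expressed as SJW minus paroxetine (positive favours SJW), and the margin is the pre-defined 2.5 points. The function name and sign convention are assumptions made for illustration.

```python
def is_non_inferior(ci_worst_case, margin):
    """Non-inferior if the worst-case end of the CI stays above minus the margin.

    ci_worst_case: lower end of the CI for (new minus standard), positive favours new.
    margin: pre-defined "no worse" margin, given as a positive number.
    """
    return ci_worst_case > -margin


MARGIN = 2.5  # HamD points, defined a priori

actual = is_non_inferior(ci_worst_case=1.5, margin=MARGIN)         # range 1.5 to 4.0
hypothetical = is_non_inferior(ci_worst_case=-4.0, margin=MARGIN)  # range -4 to -1
print(actual, hypothetical)  # True False
```

Only the worst-case end of the interval matters for the claim; the point estimate alone (3 points, or -2 points) does not decide it.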
Non-inferiority Trials
Per protocol and intention-to-treat analyses should both be done for non-inferiority trials
Per protocol
Randomization benefit lost hence many differences between treatment
groups
Analyzing only those that follow protocol so some randomized people are
censored
As a result it becomes harder to prove one agent is no worse than another
because there are now more confounding variables
Authors should always see similar findings in both analyses to
support a non-inferiority claim
E.g. St. Johns Wort vs Paroxetine
The worst case was a 0.7 point difference hence non-inferiority was
proven
Equivalence Trials
3 month vs 6 month follow up of BP patients
by GPs
Designed to show that the difference between 3
and 6 month visit would be less than 10%, either
better or worse (plus/minus 5mmHg)
Equivalence Trials
3 month vs 6 month follow up of BP patients
by GPs
Equivalence was proven for 6 month visits vs 3
month visits
Is Sample Size Different?
Sample size needs to be larger for a non-inferiority trial
Harder to show differences within small range
Need more people to be that precise
I.e. expecting small differences
Small differences will only surface with large numbers
of patients
Can You Switch?
If you prove non-inferiority can you then
conclude superiority?
Yes, as the sample size needed for superiority
would be met by a non-inferiority trial
As long as it is a pre-specified analysis
P-value needs to be re-calculated
Clinical relevance of superiority needs to be
thought of
Can You Switch?
If you prove non-inferiority can you then
conclude superiority?
E.g. SJW vs paroxetine
SJW decreased HamD more than paroxetine
They pre-specified that they would do this
BUT they did not calculate a new p-value
Non-inferiority Margin
This is the most important thing to look at
Needs to be chosen ahead of time
Needs to be based on statistical and clinical
reasoning
Should be derived from the benefit seen with
standard therapy over placebo
Non-inferiority Margin
Should be derived from the benefit seen with standard
therapy over placebo
E.g Drug A is standard
Reduces MIs by 2% vs placebo (95%CI 1-4)
Drug B is new and is being studied vs Drug A in a non-inferiority trial
The margin should be set as no worse than a 1% difference
in MIs with B vs A
1% comes from the worst case of the 95% CI versus placebo of
standard therapy
Should not be a margin of 2% (this doesn’t take into account the
uncertainty in the benefit of Drug A vs placebo)
Checklist
Is this equivalence or non-inferiority?
Is there a margin pre-specified?
Is the margin appropriately justified by authors
or is it arbitrary?
Did they do a per protocol and an ITT
analysis?
Did they say they would look at superiority
ahead of time and was a p-value re-calculated?
Remember…
If no statistically significant difference seen in a
superiority trial
DO NOT conclude “absence of a difference”
Conclude there is “absence of evidence of a
difference”
Checklist cont’d
If they claimed non-inferiority after they
couldn’t show superiority did they check to see
if they had enough power to do so?
They would need more people to conclude non-inferiority (hence probably under-powered) or
have enrolled more than needed for superiority so
the second analysis would be OK?
Surrogate Outcomes
Outcomes that are substitutes for measures of
how a person functions, feels, or if they
survive
Which are surrogates?
BP
LDL cholesterol
HbA1C
Stroke
What is a Serious Adverse Event?
An adverse event is any undesirable experience associated with the use of a medical
product in a patient. The event is SERIOUS and should be reported when the
patient outcome is:
Death
Report if the patient's death is suspected as being a direct outcome of the adverse
event.
Life-Threatening
Report if the patient was at substantial risk of dying at the time of the adverse event
or it is suspected that the use or continued use of the product would result in the
patient's death.
Examples: Pacemaker failure; gastrointestinal hemorrhage; bone marrow
suppression; infusion pump failure which permits uncontrolled free flow resulting in
excessive drug dosing.
Hospitalization (initial or prolonged)
Report if admission to the hospital or prolongation of a hospital stay results because
of the adverse event.
Examples: Anaphylaxis; pseudomembranous colitis; or bleeding causing or
prolonging hospitalization.
What is a Serious Adverse Event?
Disability
Report if the adverse event resulted in a significant, persistent, or permanent change,
impairment, damage or disruption in the patient's body function/structure, physical activities
or quality of life.
Examples: Cerebrovascular accident due to drug-induced hypercoagulability; toxicity;
peripheral neuropathy.
Congenital Anomaly
Report if there are suspicions that exposure to a medical product prior to conception or during
pregnancy resulted in an adverse outcome in the child.
Examples: Vaginal cancer in female offspring from diethylstilbestrol during pregnancy;
malformation in the offspring caused by thalidomide.
Requires Intervention to Prevent Permanent Impairment or Damage
Report if you suspect that the use of a medical product may result in a condition which
required medical or surgical intervention to preclude permanent impairment or damage to a
patient.
Examples: Acetaminophen overdose-induced hepatotoxicity requiring treatment with
acetylcysteine to prevent permanent damage; burns from radiation equipment requiring drug
therapy; breakage of a screw requiring replacement of hardware to prevent malunion of a
fractured long bone.
SAE Reporting
Required in all clinical trials
Problem…
Not sure which outcomes are included in SAE
totals
Not always reported in publications
Example: Finasteride for BPH
TI Letter #58 Jan-Mar 2006
e.g. Lumiracoxib (TARGET Trial)
Lancet 2004; 364: 665–74
e.g. Lumiracoxib (TARGET Trial)
GI Adverse Events
Lancet 2004; 364: 665–74
e.g. Lumiracoxib (TARGET Trial)
CV Adverse Events
Lancet 2004; 364: 665–74
e.g. Lumiracoxib (TARGET Trial)
SAE Data Not Reported in Published
Trial
Total number of patients with SAEs both substudies
pooled:
Lumiracoxib 588 (6%), NSAIDs 566 (6%)
Total number of patients with SAEs ibuprofen substudy:
Lumiracoxib 297 (7%), Ibuprofen 272 (6%)
Total number of patients with SAEs naproxen substudy:
Lumiracoxib 291 (6%), Naproxen 294 (6%)
Regardless of drug attribution
Data received via personal communication with Dr. Hawkey, March 2007
TNT SAE Data Request
From: John C LaRosa [mailto:[email protected]]
Sent: Tuesday, August 14, 2007 1:53 PM
To: Tejani, Aaron
Subject: Re: TNT Serious adverse event data request
David
When you return, can you have someone provide me with answers?
Thanks
John
"Tejani, Aaron" <[email protected]> 08/14/2007 04:27 PM
To: <[email protected]>
cc:
Subject: TNT Serious adverse event data request
Dr. La Rosa,
We are reviewing the TNT trial with our pharmacy students and were wondering if you were
able to answer 2 questions:
1. Could you provide us with the serious adverse event (SAE) rates in both groups?
2. Were the components of the composite outcome considered and counted as SAEs? E.g. were MIs
included in the SAE totals?
Many thanks,
Aaron
Sample Size and Study Power
What it takes to calculate sample size
Relation of sample size to the primary
outcome
Issues related to secondary outcomes, subgroup
differences, etc
Components of a Sample Size
Calculation
Power
The ability to detect a difference that truly exists
Type II error (beta): missing a difference that
exists, i.e. insufficient power
E.g. 80% power means a 20% chance of missing
a true difference
Power= 1 - beta
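The relation between power, beta and sample size can be explored with a normal approximation. This is a rough sketch, assuming a two-sided z-test for two proportions and equal arm sizes; the function name and the event rates are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p_control, p_treatment, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions.

    Ignores the negligible chance of rejecting in the wrong direction.
    """
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)
    se = sqrt(
        p_control * (1 - p_control) / n_per_arm
        + p_treatment * (1 - p_treatment) / n_per_arm
    )
    z_effect = abs(p_control - p_treatment) / se
    return norm.cdf(z_effect - z_alpha)


# Hypothetical trial: 10% event rate on control, 5% on treatment.
for n in (100, 200, 432):
    power = power_two_proportions(0.10, 0.05, n)
    beta = 1 - power  # Type II error rate
    print(f"n per arm = {n}: power = {power:.2f}, beta = {beta:.2f}")
```

With these assumed rates, roughly 430 people per arm are needed before power reaches about 80%; the smaller trials have a much higher chance of missing the same true difference.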
Components of a Sample Size
Calculation
Level of significance
An alpha level must be chosen
Alpha relates to Type I error
Type I error= detecting an effect when none exists
Chosen alpha becomes the threshold for your p-value
E.g. alpha=0.05 then p of less than 0.05 is significant
What does a p-value (= 0.0X) below the alpha we chose tell us?
If there were truly no effect, a difference at least this large would occur by chance X% of the time
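As an illustration of how a p-value is compared with alpha, here is a minimal two-proportion z-test. The function name and the event counts are hypothetical, and the pooled normal approximation is only one of several possible tests (it also assumes at least one event and one non-event overall).

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


alpha = 0.05  # chosen before the trial

# Hypothetical results: 30/200 events on treatment vs 50/200 on control.
p = two_proportion_p_value(30, 200, 50, 200)
print(f"p = {p:.4f}, significant at alpha {alpha}: {p < alpha}")
```

The p-value answers "how often would a difference this large appear if the treatments were truly equal?", which is why alpha has to be fixed in advance rather than chosen after seeing the result.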
Components of a Sample Size
Calculation
Underlying population event rate
Look at previous studies to get this
E.g TARGET Trial
Lumiracoxib versus naproxen
What is the expected rate of GI complications with
Naproxen
Components of a Sample Size
Calculation
Size of treatment effect
This should be the minimal clinically important
difference
This should be justified
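The four components covered so far (power, alpha, event rate, treatment effect) combine into a sample size. The sketch below uses a common normal-approximation formula for comparing two proportions; the function name and the 10% versus 5% event rates are hypothetical, and real trials may use other formulas.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(control_rate, treatment_rate, alpha=0.05, power=0.80):
    """Approximate number of people per arm for a two-sided comparison
    of two proportions (normal approximation, equal arm sizes)."""
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)   # level of significance
    z_beta = norm.inv_cdf(power)            # power = 1 - beta
    variance = (
        control_rate * (1 - control_rate)
        + treatment_rate * (1 - treatment_rate)
    )
    effect = control_rate - treatment_rate  # minimal important difference
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical inputs: 10% event rate on control, 5% on treatment.
print(sample_size_per_arm(0.10, 0.05))               # 80% power
print(sample_size_per_arm(0.10, 0.05, power=0.90))   # more power, more people
print(sample_size_per_arm(0.10, 0.08))               # smaller effect, many more
```

Halving the difference to be detected roughly quadruples the sample size, which is why the choice of treatment effect needs to be justified.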
Components of a Sample Size
Calculation
Adjusted sample size requirement for lack of
compliance
Achievable treatment effect (which is a
component of the sample size calculation) is
dependent on compliance to treatment
E.g. a trial needs 100 people per arm with 100%
compliance…
If only 80% compliance, need about 280 per arm
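The jump from 100 to about 280 per arm is consistent with a commonly used adjustment that divides the sample size by (c1 + c2 - 1) squared, where c1 and c2 are the compliance rates in each arm. That formula is an assumption here (the slide does not state which adjustment was used), and the function name is hypothetical.

```python
from math import ceil


def adjust_for_compliance(n_per_arm, compliance_arm1, compliance_arm2):
    """Inflate a per-arm sample size for non-compliance.

    Assumed adjustment: n / (c1 + c2 - 1)^2, with c1 and c2 the
    proportion of people in each arm who take their assigned treatment.
    """
    dilution = compliance_arm1 + compliance_arm2 - 1
    if dilution <= 0:
        raise ValueError("combined compliance too low to detect any effect")
    return ceil(n_per_arm / dilution ** 2)


print(adjust_for_compliance(100, 1.0, 1.0))  # 100
print(adjust_for_compliance(100, 0.8, 0.8))  # 278, i.e. about 280 per arm
```

The inflation is steep because non-compliance shrinks the achievable treatment effect, and sample size grows with the inverse square of that effect.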
Checklist
MJA 2002;177:256-7.
General Comments
In calculating a sample size a balance is
required between risk of a Type I error versus a
Type II error
Sample size and all assumptions apply to only
the primary endpoint
Likely that Type I and II errors will occur for
anything other than the primary endpoint, even
with subgroups
i.e. be aware that false negatives and false positives
likely to occur
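The risk of false positives outside the primary endpoint can be quantified with a simple calculation. This sketch assumes the extra comparisons are independent, which is rarely exactly true for secondary outcomes and subgroups, so treat the numbers as a rough guide; the function name is hypothetical.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across n independent tests
    when every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests


# One primary endpoint versus growing numbers of secondary and
# subgroup comparisons, each tested at alpha = 0.05.
for n_tests in (1, 5, 10, 20):
    rate = familywise_error_rate(n_tests)
    print(f"{n_tests} tests: {rate:.0%} chance of at least one false positive")
```

With ten such comparisons the chance of at least one spurious "significant" finding is around 40%, which is why results beyond the primary endpoint are best read as hypothesis-generating.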
Remember…
“Torture numbers and they will confess to
anything.”
Gregg Easterbrook
Definition of PICO
Four components of PICO
Used to formulate a clinical question before you read the trial
Patient or problem
Description of patient or target disorder
Intervention
Could include exposure, diagnostic test, prognostic factor, therapy or
patient’s perception
Comparison intervention
Relevant most often when looking at therapy questions
Outcome
Clinical outcome of interest to you and your patient
DOES NOT HAVE TO BE WHAT THE AUTHORS MEASURED!
Ann Intern Med 2001;134:657-62
CASP RCT Checklist
User’s Guide
Appraising Systematic Reviews and
Meta-analyses
First 2 Screening Questions
1. Did the review ask a focused question?
PICO?
Recommend thinking of your PICO first before
reading the review
2. Did the review include the right type of
study?
If it was therapy did they look at RCTs only?
Finding Studies to Include
Should search at least these databases:
Medline/Pubmed
EMBASE
Cochrane’s CENTRAL database of RCTs
No language restrictions
Should also search
Reference lists, experts, conference proceedings,
unpublished studies
Assessing Quality of Studies
What are quality indicators of clinical trials?
If you measure quality need to do something
with the information
Do sensitivity analyses
Extracting data
Need at least 2 people extracting data
Need a standard data extraction form
Why?
To avoid transcription errors
Was it appropriate to Meta-analyze?
Does it make sense to combine data from
included trials?
Was heterogeneity assessed?
If found were reasons for this explored?
If found was random effects model used to analyze
data?
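To make the heterogeneity questions concrete, the sketch below pools trial results with inverse-variance weights and reports Cochran's Q, I-squared and a DerSimonian-Laird random effects estimate. The function name and the three trial effects are hypothetical (think of them as log risk ratios with their variances), and DerSimonian-Laird is just one of several random effects methods.

```python
from math import sqrt


def pool_trials(effects, variances):
    """Fixed and random effects pooling with heterogeneity statistics."""
    weights = [1 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0

    # DerSimonian-Laird estimate of between-trial variance (tau^2).
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau_squared = max(0.0, (q - df) / c) if c > 0 else 0.0

    re_weights = [1 / (v + tau_squared) for v in variances]
    sum_re = sum(re_weights)
    random = sum(w * y for w, y in zip(re_weights, effects)) / sum_re

    return {
        "fixed": fixed, "fixed_se": sqrt(1 / sum_w),
        "random": random, "random_se": sqrt(1 / sum_re),
        "q": q, "i_squared": i_squared, "tau_squared": tau_squared,
    }


# Hypothetical trials that disagree with each other.
result = pool_trials([-0.5, -0.1, 0.3], [0.04, 0.04, 0.04])
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

When the trials disagree, the random effects standard error is larger than the fixed effect one, so the pooled confidence interval widens; a model choice does not replace exploring why the trials differ.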
Forest Plots
Final Notes on Systematic Reviews
Systematic, reproducible, defendable
Only as good as the trials included
Main questions to ask:
Did they attempt to get all the trials?
Did they compile the important information and
critically appraise each trial that was included?
Should be the first place you go to answer
clinical questions…
CASP SR Checklist
User’s Guide for Overviews
Questions or Comments
In dwelling, live close to the ground.
In thinking, keep to the simple.
In conflict, be fair and generous.
In governing, don't try to control.
In work, do what you enjoy.
In family life, be completely present.
When you are content to be simply yourself
and don't compare or compete, everybody will respect you.
Tao Te Ching
Verse 8
Acknowledgements
Therapeutic Initiative Group
CASP
Critical Appraisal Skills Program (CASP)
JAMA Users’ Guides to Evidence-based
practice
Bandolier
Fraser Health Research
Susan and Rosa!