Critical Appraisal Skills


How to Interpret
Research Evidence
EBM Workshop
September 2007
Aaron Tejani
[email protected]
Declaration

Paid by:
Fraser Health 80%
 Therapeutics Initiative, UBC 20%


No perceived or actual conflict of interest with
the pharmaceutical industry in the last 4 years
What do these guys have in
common?
Questions?

The wise man doesn’t give the right answers,
he poses the right questions
- Claude Levi-Strauss
Identifying Misleading Claims

Cautionary Tales in the Clinical Interpretation
of Therapeutic Trial Reports.


Cautionary tales in the interpretation of systematic
reviews of therapy trials


Scott et al. IntMedJ2005;35:611-21
Scott et al. IntMedJ 2006;36:587–599
Users’ Guide to Detecting Misleading Claims in
Clinical Trial Reports.

Montori VM, Jaeschke R et al. BMJ2004;329:1093-96
Evidence Based Medicine

Definition

The integration of best research evidence with clinical
expertise and patient values.

When these three elements are integrated, clinicians and patients
form a diagnostic and therapeutic alliance with optimized clinical
outcomes and quality of life.
David Sackett and colleagues

EBM is NOT purely academic or financial exercise

Its implementation has major clinical implications that can
save lives and prevent harm
Critical Appraisal

Definition

A method of assessing and interpreting the
evidence by systematically considering its
Validity
 Results
 Relevance


Essential part of evidence-based clinical
practice

Required to determine BEST evidence
Different Forms of Evidence

Systematic review



A rigorous, systematic process to identify, synthesize, and
evaluate the available literature
Can be used to change practice by implementing the best
available literature
Meta-analysis

It is an extension of a well done systematic review, which
provides a quantitative estimate of the net benefit
aggregated over the included studies
Different Forms of Evidence

Clinical Trials

Randomized controlled trial (RCT)


Minimize bias
Cohort

Useful for topics with known health risks


Harms of smoking
Effects of drugs in pregnancy
Different Forms of Evidence

Case series or case control
Useful for identifying areas that require further
investigation
 Help identify adverse effects


Expert opinion

When other forms of data are not available
Hierarchies of Evidence
I-1: Systematic review of several double-blind randomized controlled trials
I-2: One or more large double-blind randomized controlled trials
II-1: One or more well-conducted cohort studies
II-2: One or more well-conducted case-control studies
III: Expert committee sitting in review, peer leader opinion
IV: Personal experience
**There are many different types of grading systems
The Numbers
TI letter #16 1996
Real Clinical Scenarios
TI letter #16 1996
Absolute Risk Reduction (ARR)

Absolute Risk Reduction illustrates
The actual decrease from control to treatment in
terms of effect
 The absolute change


Example

Absolute risk reduction for the use of beta blockers
post MI
3.9%
Absolute Risk Reduction

Calculating Absolute Risk Reduction
ARR = X - Y
X = control event rate
Y = treatment event rate

Example:

Treatment cancer rate = 4%
Control cancer rate = 8%
ARR = 8% - 4% = 4% absolute reduction in cancer with treatment
compared to control
Number Needed to Treat (NNT)


Tool to place results into humanistic terms
Number needed to treat (NNT)


Benefit of therapy
Number needed to harm (NNH)

Harm of therapy
Number Needed to Treat

Calculation of Number Needed to Treat (NNT)
NNT= 100/ARR(%)
Previous example
ARR=4%, NNT=100/4=25
Treat 25 people to prevent one cancer

Calculation of Number Needed To Harm
NNH= 100/ARI(%)
NNH

Example
Treatment GI bleed rate=10%
 Control GI bleed rate=5%
 ARI=10-5=5% increased risk with treatment for a
GI bleed
 NNH=100/5=20
 1 GI bleed will occur for every 20 people treated
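
The ARR/NNT and ARI/NNH arithmetic above can be sketched in a few lines of Python. This is only an illustration using the event rates from the two slide examples; the function names are made up for this sketch.

```python
def absolute_difference(control_rate_pct, treatment_rate_pct):
    """Control event rate minus treatment event rate, in percentage points.

    Positive means fewer events with treatment (an ARR);
    negative means more events with treatment (an ARI).
    """
    return control_rate_pct - treatment_rate_pct


def number_needed(absolute_difference_pct):
    """NNT (for an ARR) or NNH (for an ARI): 100 / absolute difference in %."""
    if absolute_difference_pct == 0:
        raise ValueError("no absolute difference, so NNT/NNH is undefined")
    return 100 / abs(absolute_difference_pct)


# Cancer example from the slides: control 8%, treatment 4%
arr = absolute_difference(8, 4)
nnt = number_needed(arr)

# GI bleed example from the slides: control 5%, treatment 10%
ari = -absolute_difference(5, 10)
nnh = number_needed(ari)

print(f"ARR = {arr}%, NNT = {nnt:.0f}")
print(f"ARI = {ari}%, NNH = {nnh:.0f}")
```

In practice an NNT that is not a whole number is usually rounded up, since you cannot treat a fraction of a person.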

Important

Only calculate ARR/ARI/NNT/NNH if the
result is statistically significant!!
Relative Risk Reduction (RRR)

Definition

Difference between the control and treatment
usually in terms of “reducing the chances”
Relative Risk Reduction

Calculating Relative risk (RR)



X is control group
Y is treatment group
RR = Y/X
Calculating Relative Risk Reduction (RRR)
X is the control event rate (CER)
Y is the treatment (experimental) event rate (EER)
RRR = (CER - EER)/CER
Previous example = (8 - 4)/8 = 50% relative reduction in cancer
with treatment
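
As a quick check of the relative measures, here is a minimal Python sketch using the cancer example (control 8%, treatment 4%). The function names are illustrative only.

```python
def relative_risk(control_rate, treatment_rate):
    """RR = Y / X: treatment event rate over control event rate."""
    return treatment_rate / control_rate


def relative_risk_reduction(control_rate, treatment_rate):
    """RRR = (CER - EER) / CER, which is the same as 1 - RR."""
    return (control_rate - treatment_rate) / control_rate


rr = relative_risk(8, 4)
rrr = relative_risk_reduction(8, 4)
print(f"RR = {rr:.2f}, RRR = {rrr:.0%}")
```

Note that the divisor is the control event rate, so the 50% relative reduction here sits alongside an absolute reduction of only 4 percentage points.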

Types of Outcomes

Dichotomous/categorical
Yes or No
 Possible to calculate absolute risk and NNT


Continuous
Blood pressure
 Rating scales
 In order to calculate AR and NNT/NNH, a
clinically relevant change must be clearly defined a
priori

Confidence Intervals

If the trial were repeated many times, about 95% of the confidence
intervals calculated this way would contain the true difference

(for a 95% CI, i.e., alpha = 0.05)

Loosely: 95% confident that the true difference lies within the
stated range

(if the threshold is p=0.05)
Helps to determine how precise the results are


Narrow versus wide confidence interval
Point estimate
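
To make "narrow versus wide" concrete, the sketch below computes a 95% confidence interval for a risk difference. The trial sizes are hypothetical (they are not from the slides), and the simple normal-approximation (Wald) interval is an assumption chosen for brevity; published trials may use other methods.

```python
from math import sqrt
from statistics import NormalDist


def risk_difference_ci(events_control, n_control,
                       events_treatment, n_treatment, level=0.95):
    """Point estimate and Wald confidence interval for control minus treatment risk."""
    p_control = events_control / n_control
    p_treatment = events_treatment / n_treatment
    difference = p_control - p_treatment
    standard_error = sqrt(p_control * (1 - p_control) / n_control
                          + p_treatment * (1 - p_treatment) / n_treatment)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    return difference, difference - z * standard_error, difference + z * standard_error


# Hypothetical trials with the same 8% vs 4% event rates but different sizes
small_trial = risk_difference_ci(8, 100, 4, 100)
large_trial = risk_difference_ci(80, 1000, 40, 1000)

for label, (difference, low, high) in (("100 per group", small_trial),
                                       ("1000 per group", large_trial)):
    print(f"{label}: difference {difference:.1%} (95% CI {low:.1%} to {high:.1%})")
```

Both trials share the same point estimate (4%), but only the larger one gives an interval narrow enough to exclude zero.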
CI Humour Break

No, you can’t leave the room
P-Values

Tells you how likely it is that the results could be due to
chance alone (Type 1 error, alpha)

A p-value of 0.05 means that, if there were truly no difference, a result
at least this large would occur by chance 5% of the time
A p-value of 0.01 means that, if there were truly no difference, a result
at least this large would occur by chance 1% of the time
Two tail or one tail tests



One-tailed: specifies the direction of the difference
Two-tailed (2-sided): can see differences in the positive or negative direction
P value needs to be adjusted for multiple comparisons
Power

Power
Need to have enough people in the study to have
enough power to determine if a difference actually
occurred
 If no difference seen, then need to consider the
sample size/power calculation
 If a difference is seen, power is not an issue

Alpha

Choosing your Alpha

What you are willing to accept
5% or 1% probability that difference is due to chance
 Once you pick your alpha you CANNOT change it post
hoc


Alpha relates to p-value and is used to calculate
sample size
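
As an illustration of how alpha and power feed into sample size, here is a rough sketch using a standard normal-approximation formula for comparing two proportions. The formula choice and the 8% vs 4% event rates are assumptions for the example, not something specified in the slides.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(control_rate, treatment_rate, alpha=0.05, power=0.80):
    """Approximate number of people per group for a two-sided comparison of two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # from the chosen alpha
    z_beta = NormalDist().inv_cdf(power)           # from the desired power
    variance = (control_rate * (1 - control_rate)
                + treatment_rate * (1 - treatment_rate))
    effect = control_rate - treatment_rate
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


n_alpha_05 = sample_size_per_group(0.08, 0.04, alpha=0.05)
n_alpha_01 = sample_size_per_group(0.08, 0.04, alpha=0.01)
print(f"alpha 0.05: about {n_alpha_05} per group")
print(f"alpha 0.01: about {n_alpha_01} per group")
```

A stricter alpha (or higher power) pushes the required sample size up, which is why alpha has to be fixed before the trial and not changed post hoc.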
Risk, Odds and Hazards
Risk Ratio (relative risk)

If 24 skiers
6 fall
 The risk of falling is 25% (6/24)

Odds

If 24 skiers
6 fall
 18 do not fall
 Odds is 6/18 or 1/3
 The chances of falling were 3 to 1 against
 3 times more likely not to fall than to fall

Odds Ratio

Odds of one treatment versus odds of the other
treatment





If 24 skiers
6 fall
If 24 snowboarders
12 fall
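Risk ratio = 0.25/0.5 = 0.5 (50%)
Half as likely to fall if you are skiing as compared to
snowboarding
Odds ratio = (6/18)/(12/12) = 0.33
The odds of falling when skiing are one third of the odds when
snowboarding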
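
The skier and snowboarder numbers can be worked through in Python to keep risk and odds apart. This is a small sketch with illustrative function names; it uses exact fractions so the ratios print cleanly. Note that 0.25/0.5 is a ratio of risks, while the odds ratio is built from the odds (6/18 and 12/12).

```python
from fractions import Fraction


def risk(events, total):
    """Risk = events / everyone."""
    return Fraction(events, total)


def odds(events, total):
    """Odds = events / non-events."""
    return Fraction(events, total - events)


ski_risk, ski_odds = risk(6, 24), odds(6, 24)          # 1/4 and 1/3
board_risk, board_odds = risk(12, 24), odds(12, 24)    # 1/2 and 1

risk_ratio = ski_risk / board_risk   # 1/2
odds_ratio = ski_odds / board_odds   # 1/3

print(f"Risk ratio (skiers vs snowboarders): {risk_ratio}")
print(f"Odds ratio (skiers vs snowboarders): {odds_ratio}")
```

The two ratios differ noticeably here because falling is a common event; they are only close to each other when the event is rare.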
Continuous Outcomes


E.g. blood pressure, rating scales, etc
Important points
Make sure scale is valid
 Need to know what a “clinically relevant” change
in the scale is
 Need to understand what the scale is measuring


i.e., linear, non-linear, Likert-type scale
Don’t get hung up with STATS!!



The appropriate statistical test is not as
important as methods and outcomes
DO NOT PANIC
More important aspects of critical appraisal
Clearly defined methods
 Reasonable and clinically significant outcomes and
measurements

Randomization
Define Randomization

A method based on chance alone by which
study participants are assigned to a treatment
group. Randomization minimizes the
differences among groups by equally
distributing people with particular
characteristics among all the trial arms.

www.medterms.com/script/main/art.asp?articlekey=38700
What Characteristics Are Equally
Distributed?
The Benefits of Randomization


Everyone has an equal chance of getting
assigned treatment or control
MOST IMPORTANT:


Groups are divided equally for known and
unknown characteristics
This ensures…

Differences in outcome are likely only due to
differences in assigned treatment

If randomization was deemed successful
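
A small simulation can illustrate why chance allocation balances both known and unknown characteristics when numbers are large. Everything in this sketch is hypothetical: the population, the 30% and 10% prevalences, the characteristic names, and the use of simple coin-flip randomization.

```python
import random


def randomize(participants, seed=2007):
    """Assign each participant to an arm by chance alone (simple randomization)."""
    rng = random.Random(seed)
    arms = {"treatment": [], "control": []}
    for person in participants:
        arms[rng.choice(("treatment", "control"))].append(person)
    return arms


def proportion(group, characteristic):
    """Share of a group that has the given characteristic."""
    return sum(person[characteristic] for person in group) / len(group)


# Hypothetical population: one measured and one unmeasured characteristic
population_rng = random.Random(1)
participants = [
    {"smoker": population_rng.random() < 0.30,
     "unmeasured_factor": population_rng.random() < 0.10}
    for _ in range(10_000)
]

arms = randomize(participants)
for characteristic in ("smoker", "unmeasured_factor"):
    in_treatment = proportion(arms["treatment"], characteristic)
    in_control = proportion(arms["control"], characteristic)
    print(f"{characteristic}: treatment {in_treatment:.1%}, control {in_control:.1%}")
```

The allocation never looks at either characteristic, yet both end up similarly distributed across arms. With a much smaller trial the same code can produce visible imbalances.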
User’s Guides’ Statement

“The beauty of randomization is that it assures,
if sample size is sufficiently large, that both
known and unknown determinants of outcome
are evenly distributed between treatment and
control groups.”
http://www.cche.net/usersguides/therapy.asp
Was Randomization Effective?

Look at Baseline characteristics to see if they were
balanced

NOTE: This does not account for unknown characteristics,
but if numbers are large and the knowns are balanced


Can assume the unknowns are as well
p-values when comparing baseline characteristics


Don’t worry about them
If you think the differences may have an effect then they
might

Doesn’t matter if they are “statistically significant differences”
User’s Guides’ Statement

“The issue here is not whether there are
statistically significant differences in known
prognostic factors between treatment groups
(in a randomized trial one knows in advance
that any differences that did occur happened
by chance) but rather the magnitude of these
differences.”
http://www.cche.net/usersguides/therapy.asp
NAC for Prevention of RF in
Cardiac Surgery
If baseline characteristics are
different…


All is not lost
Authors can do analyses adjusting for
differences


Should clearly state that this was done and how
If adjusted and unadjusted analyses show
similar results

You can be more confident of the findings
Not Randomized…So What?

Non-randomised studies overestimate
treatment effect by 41% with inadequate
method, 30% with unclear method


JAMA 1995 273: 408-12.
Completely different result between
randomized and non-randomized studies

BritJAnaesth1996; 77: 798-803.
Red and Yellow Lollipops!!
Columns: Red, Yellow
Rows:
Male
Female
Born Jan-June
Watched the news last night
Plays an instrument
Has a pet
Allocation
Concealment (AA)
Example:

GP in his office with 2 elderly gentlemen in his
waiting room
He knows one guy well and this man has had
terrible luck with his health
 He is enrolling men into a clinical trial
 He knows that the next person is going to get Box
A and the one after that gets Box B
 Last month he enrolled a man who got Box A and
has not improved; rather, he has gotten worse

What is AA?





Is it blinding?
Can it always be done?
Can blinding always be done?
What is selection bias?
Example…
What is AA?

Definition:


“…shields those involved in a trial from knowing
upcoming assignments. Without this protection,
investigators and patients have been known to
change who gets the next assignment, making the
comparison groups less equivalent”
Purpose?

To reduce “selection bias”
Evid.Based Med. 2000;5;36-38
What is AA?

If AA is not successful what design feature is
compromised and why?

“…if the investigator or clinician (or the patient) is
able to identify the impending treatment allocation
and is able to influence the enrolment (or selection) of
participating patients, the value of randomisation is
compromised.”
May lead to imbalances in prognostic factors between
groups

MJA 2005;182(2):87-89
What is good AA?




Opaque-sealed envelopes
Pharmacy-controlled allocations
Coded identical medication containers
Telephone or web-based central randomization
Who Cares if AA Wasn’t Done?

Trials that report inadequate methods of
AA


Report ~40% larger effect sizes compared to trials
that use sound methods of AA
Trials with unclear AA (e.g., AA not mentioned)

Report ~30% larger effect sizes compared to trials
that reported adequate AA
JAMA 1995;273:408–12.
What do you do if AA isn’t done
well?

If you meta-analyze trials…
Do a sensitivity analysis of trials with good AA versus no AA
 E.g., atypical coverage in patients with CAP

What do you do if AA isn’t done?


If there is no meta-analysis…
Assume the effect size may be exaggerated and base
decisions on conservative estimate of effect size
i.e., look at the conservative end of the confidence interval
Mortality reduction 6% (95% CI 1% to 8%)
 May conclude reduction is probably between 1% and 5%
 CAUTION: This is not scientific! This is my attempt at
common sense…

Reporting

Do not assume that AA was not done if not
reported


Can ask the authors if they did or not
Only 9-15% of trials adequately report these
methods
Bad reporting
Good reporting
MJA 2005;182(2):87-89
Blinding (Masking)
Definitions

Single blind

Either clinician OR patient is unaware of assigned
treatment
Double blind

Both clinician and patient are unaware of assigned
treatment
Triple blind

Clinician, patient, and people who adjudicate outcomes are
unaware of treatment assignment
A question

What is “double-dummy”?
The purpose of blinding

Attempts to minimize

Reporting bias


Assessment bias


E.g.
E.g.
Concomitant treatment bias

E.g.
How can you tell if blinding is
broken?


If the authors test for success of blinding
Blinding may be broken when

One treatment has a particular side effect that
would give it away
E.g. infusion site reactions
 Look at ADR table and see if this may be occurring

Testing the success of blinding

Some would argue that success of blinding testing is
not reliable



(Sackett Int J Epi 2007;36:665-666)
These tests only tell us about the “hunches” people have
It is better to measure the effects of lost blinding



Co-intervention (get the study drug by other means)
Contamination (controls get open label treatment with another
drug)
Reporting bias (e.g., people on the study drug downplay symptoms)
What is the consequence of broken
blinding?

Studies with poor/absent blinding tend to overestimate
treatment effects by ~17%

JAMA 1995;273:408-12.
What to do if blinding is broken?


Assume the effect size is an over-estimate
Look to the conservative end of the confidence
interval
Mortality reduction 6% (95% CI 1% to 8%)
 May conclude reduction is probably between 1%
and 5%
 CAUTION: This is not scientific! This is my
attempt at common sense…
How do you interpret open-label
trials?

If there is a meta-analysis


Do a sensitivity analysis on blinded versus unblinded trials and see if the effect size changes
If there is no meta-analysis
Assess whether blinding was possible
 Interpret findings carefully knowing they could be
an overestimation of an effect


Especially if blinding was possible
MJA 2005;182(2):87-89
Intention-to-treat Analysis


Very Important!!
Definition:

Analyze participants into the groups to which they
were randomized

Even if they did not take assigned treatment, dropped
out early, or did not follow protocol exactly
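
The difference between analyzing "as randomized" and analyzing only those who completed the protocol can be shown with a toy data set. The records below are entirely hypothetical and the function names are illustrative; the point is only which people end up in the denominator.

```python
def make_records(assigned, completed, event, count):
    """Build `count` identical hypothetical participant records."""
    return [{"assigned": assigned, "completed": completed, "event": event}
            for _ in range(count)]


# Hypothetical trial: 10 people randomized to each of Drug A and Drug B.
# Four people in arm A dropped out early, and three of them had the event.
records = (
    make_records("A", True, True, 1) + make_records("A", True, False, 5)
    + make_records("A", False, True, 3) + make_records("A", False, False, 1)
    + make_records("B", True, True, 4) + make_records("B", True, False, 6)
)


def event_rate(group):
    return sum(record["event"] for record in group) / len(group)


def intention_to_treat_rate(records, arm):
    """Everyone randomized to the arm is analyzed in that arm."""
    return event_rate([r for r in records if r["assigned"] == arm])


def per_protocol_rate(records, arm):
    """Only those who completed the assigned treatment are analyzed."""
    return event_rate([r for r in records if r["assigned"] == arm and r["completed"]])


for arm in ("A", "B"):
    print(f"Arm {arm}: ITT {intention_to_treat_rate(records, arm):.0%}, "
          f"per protocol {per_protocol_rate(records, arm):.0%}")
```

In this made-up example the two arms look identical under intention-to-treat, while dropping the early drop-outs makes Drug A appear better than it is.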
Intention-to-treat Analysis
Loss to Follow Up


Each trial should report the number of people
lost and how many per group
Usual assumption

Nothing bad happened to people that were lost
General rule:

>20% of randomized population lost…analysis
becomes unreliable
Consider the worst-case scenario and see if this
changes the result
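
One way to run a worst-case check is sketched below. The trial numbers are hypothetical, and the particular worst case used (everyone lost from the treatment arm had the event, nobody lost from the control arm did) is one common convention that is assumed here.

```python
def fraction_lost(lost_treatment, lost_control, randomized_total):
    """Share of the randomized population lost to follow-up."""
    return (lost_treatment + lost_control) / randomized_total


def usual_assumption_rate(events, randomized):
    """Usual assumption: nothing bad happened to the people who were lost."""
    return events / randomized


def worst_case_rate(events, lost, randomized, arm):
    """Worst case for the treatment: all of its lost patients had the event,
    while none of the lost control patients did."""
    if arm == "treatment":
        return (events + lost) / randomized
    return events / randomized


# Hypothetical trial, 100 randomized per arm
treatment = {"randomized": 100, "events": 10, "lost": 10}
control = {"randomized": 100, "events": 20, "lost": 8}

lost_share = fraction_lost(treatment["lost"], control["lost"], 200)
usual = (usual_assumption_rate(treatment["events"], treatment["randomized"]),
         usual_assumption_rate(control["events"], control["randomized"]))
worst = (worst_case_rate(treatment["events"], treatment["lost"],
                         treatment["randomized"], "treatment"),
         worst_case_rate(control["events"], control["lost"],
                         control["randomized"], "control"))

print(f"Lost to follow-up: {lost_share:.0%}")
print(f"Usual assumption: treatment {usual[0]:.0%} vs control {usual[1]:.0%}")
print(f"Worst case:       treatment {worst[0]:.0%} vs control {worst[1]:.0%}")
```

Here only 9% were lost, well under the 20% rule of thumb, yet the apparent benefit disappears in the worst case, so the result should be read with caution.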
A Primer on the
Interpretation of
Subgroup Analyses (SA)
in Clinical Trials
George Bernard Shaw
“Beware of false knowledge, it is more
dangerous than ignorance.”
Different?
What is a Subgroup Analysis?

Drug A versus Drug B are equal with respect
to mortality in patients with coronary artery
disease
You want to see if this finding is the same in men
versus women
 Or do Diabetics have a mortality benefit as
compared to non-Diabetics?

Why is SA So Common in Trials?

Basically to see if different types of patients
respond differently to the same treatment

If there is an overall benefit of treatment


See if there is more or less benefit in a subgroup
If there is no overall effect

See if there is an effect in at least a certain type of
patient
What Questions Should SA Attempt
to Answer?



At what stage of disease is treatment most
effective?
How are the risks and benefits of treatment
related to co-morbidity?
What time after an event is treatment most
effective?
Example: CAPRIE Trial
Lancet 1996; 348: 1329–39

Clopidogrel versus ASA for reducing ischemic
events in patients at risk
Patients with MI, Stroke, or PAD history
 DBRCT
 Clopidogrel 75mg po daily n>30,000
 ASA 325 mg po daily
 Duration 1-3 years
 Primary: ischemic stroke, MI, vascular death

Problems with SA

1. Multiple testing



The more tests you do,
the greater the
probability of finding a
difference (that is really
due to chance)
Need to correct P value
for multiple comparisons
Crude way

0.05/ # tests
Cook DI et al. MJA 2004;180(3):289-91
Adjusting for Multiplicity




k = number of independent tests performed
p* is the smallest p-value calculated
Corrected p = 1 - (1 - p*)^k
The adjusted p-value can then be compared to
the traditional p=0.05 as being statistically
significant or not
Adjusting for Multiplicity




Example: CHARISMA
Clopidogrel+ASA vs ASA alone in patients
with CVD
NOTE: No benefit in overall population
Symptomatic vs asymptomatic
N Engl J Med 2006;354:1706-17.
Adjusting for Multiplicity

Example: CHARISMA
12 subgroups
 Unadjusted p=0.046
 Adjusted p-value for symptomatic vs asymptomatic =

1 - (1 - 0.046)^12 = 0.43
NEJM 2006;354:1667-1669
NEJM 2006;354:1706-17
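
The CHARISMA adjustment can be reproduced in a couple of lines. This sketch uses the numbers from the slide (12 subgroups, unadjusted p=0.046) and includes the crude "0.05 / number of tests" threshold for comparison; the function names are illustrative.

```python
def bonferroni_threshold(alpha, number_of_tests):
    """Crude way: divide the significance threshold by the number of tests."""
    return alpha / number_of_tests


def adjusted_p_value(smallest_p, number_of_tests):
    """Corrected p = 1 - (1 - p*)^k for k independent tests."""
    return 1 - (1 - smallest_p) ** number_of_tests


adjusted = adjusted_p_value(0.046, 12)
threshold = bonferroni_threshold(0.05, 12)

print(f"Adjusted p-value: {adjusted:.2f}")
print(f"Crude per-test threshold: {threshold:.4f}")
```

Either way the conclusion is the same: the unadjusted p=0.046 does not survive correction for 12 subgroup tests.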
Probability of False Positives
e.g., 4 subgroups
analyzed
Approximately 15-20% chance of a
false positive…instead
of a 5% chance
(i.e., p=0.05)
NEJM 2006;354:1667-1669
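
The growth in false-positive risk with the number of subgroups can be tabulated with a short sketch. It assumes the subgroup tests are independent and each uses a threshold of 0.05, which is a simplification.

```python
def chance_of_false_positive(number_of_tests, alpha=0.05):
    """Probability of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** number_of_tests


for number_of_tests in (1, 4, 12):
    chance = chance_of_false_positive(number_of_tests)
    print(f"{number_of_tests:>2} subgroup test(s): {chance:.1%} chance of a false positive")
```

With 4 subgroups the chance is a little under 19%, in line with the 15-20% figure on the slide.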
Lancet 1996; 348: 1329–39
Problems with SA

1. Multiple testing

Using the CAPRIE example



3 subgroups ~ 3 tests for the primary outcome
0.05 / 3 = 0.0167 is the adjusted significance threshold
E.g., any of the subgroups would have to have p<0.0167 to
be SS, or adjusted p=0.008 = 1 - (1 - p)^k
Problems with SA

2. Lack of statistical power for SA
 Most studies powered for the whole population
only
 Studies usually do not have power to detect
differences in subgroup
 If a difference is seen, there is enough power
but it may be a false positive
 If no difference is seen in a subgroup it may be
due to actual lack of power
Caution

When there is no overall effect


Subgroups will show an effect 7-21% of the time
that aren’t real
When there is an overall effect

Subgroups won’t show a difference 41-66% of the
time
Health Technol Assess 2001 5: 1–56.
Tests for Interaction


A test for “heterogeneity of treatment effect”
The appropriate statistical test
Does not test the magnitude of difference between
subgroups and the overall population
 Does test to see if the subgroup effect is different
from the overall effect but says nothing of by how
much

Tests for Interaction

E.g. CAPRIE
Lancet 1996; 348: 1329–39
Tests for Interaction

E.g. CHARISMA
N Engl J Med 2006;354:1706-17.
Problems with SA

3. Subgroups are not truly “randomized”
Randomization works to balance known and
unknown factors in the overall population
 Subgroups are NOT truly randomized unless
randomization was:

Stratified
 This allows SA to be based on pre-randomization
characteristics

Problems with SA

3. Subgroups are not truly “randomized”

E.g. TNT trial
NEJM 2005;352(March 08):
Problems with SA

3. Subgroups are not truly “randomized”





E.g. TNT trial
Achieving a certain LDL level is a post-randomization
phenomenon
No tests for interaction were done/reported on the effect in
patients who achieved lower vs higher LDL levels
Should have randomized to titrating to LDL<2.5 and
LDL<2.0
This trial was a high- vs low-dose comparison
NEJM 2005;352(March 08):
Problems with SA

4. The play of chance

Even properly done SA can yield significant tests
of interaction by chance alone

Especially when the subgroup effect is not plausible or
it is unanticipated
CHARISMA
N Engl J Med 2006;354:1706-17.
Replication is the Solution

Due to lack of power and high probability of
false-positives

Subgroup differences should be “hypothesis
generating”

The findings should be tested in a trial designed for this
purpose
Checklist for “Good” SAs

1. Subgroups should be:

Based on pre-randomization characteristics and
stratified accordingly


YES
Based on intent-to-treat population
To maintain the benefits of randomization
 YES

Checklist for “Good” SAs

2. Must be pre-defined

And have a plausible biological reason for
choosing the subgroup


YES
Should justify the direction of the expected
difference in the subgroup

NO
Checklist for “Good” SAs

3. Reporting

All numerators and denominators should be
reported


# of planned subgroups


YES
YES
Address the issue of multiple testing

NO
Checklist for “Good” SAs

4. Statistical Analysis

Need to do tests for interaction (heterogeneity of
effect between overall population and subgroup)

A significant interaction test tells you


The subgroup effect is different than overall effect
DOES NOT tell you anything about the magnitude of
difference
Checklist for “Good” SAs

5. Interpretation of findings

The overall effect should be stressed

Due to the high risk of false-positive findings in subgroups


Not unusual to find a SS difference in a subgroup when NSS
overall finding
YES, the overall effect was stressed
In General…


Focus on overall results
Use results in subgroups only if
Significant interaction test (heterogeneity)
 You still want to explore possible reasons for
interaction

i.e., make sure there is a biological reason for the
subgroup difference
 Has this difference been found in other studies?

Practical Application
•So for CAPRIE…
•Could use the results to
treat PAD patients with
Clopidogrel but not stroke
or MI patients
•Might want to study the
effect of clopidogrel in MI
patients
Subgroup Analyses References




Lagakos SW. NEJM 2006;354(16):1667-9
Cook DI et al. MJA 2004;180(3):289-91
Simes RJ et al. MJA 2004;180(5):467-9
Rothwell P. Lancet 2005;365:176–86
Demographics
Composite Outcome
(CO) Interpretation
What is a Composite Outcome?

Example #1

TIME Trial (Lancet 2001; 358: 951–957)

Invasive therapy versus medical management for
symptomatic coronary artery disease
 Death, non-fatal MI, admission for ACS
 “The primary endpoint was analysed by
intention to treat as a composite endpoint, and
all components separately as secondary
endpoints.”
What is a Composite Outcome?

Example #2:

CAPRIE Trial (Lancet 1996; 348: 1329–39)
Clopidogrel versus ASA and ischemic events in patients
at risk
 The first occurrence of ischaemic stroke, myocardial
infarction, or vascular death.


No stated/planned assessment of individual components of the
composite
What is a Composite Outcome?

Example #3:

ValHeFT Trial (N Engl J Med 2001;345:1667-75.)
Added Valsartan versus standard therapy for CHF
 Mortality and the combined end point of mortality and
morbidity:


defined as cardiac arrest with resuscitation, hospitalization for
heart failure, or administration of intravenous inotropic or
vasodilator drugs for four hours or more without hospitalization.
Benefit of Composite Outcomes

Statistical Efficiency
Improved medical care has led to low event rates
(e.g. fewer MIs, strokes, etc…)
 Need large trials with long follow up to
demonstrate differences between treatments
 Not “feasible” for many researchers


COs allow researchers to show differences in smaller
trials with shorter durations
What do you think?

Death / MI/ Admission for ACS??

Death / MI / Stroke??

Cardiac arrest with resuscitation,
hospitalization for heart failure, or IV inotropic
or vasodilator drugs for four hours or more
without hospitalization??
The Primary Question About CO

Can I use the analysis of the composite
outcome comparison between treatment and
control as the basis for a decision?


E.g. If the CO is lower for Intervention X
compared to control would I prescribe X?
OR

If the CO is lower for X do I need to look at the
analysis of the components before I make a
decision?
Checklist for COs (refs. 1, 2)

The benefit of using a CO is realized if:
1. □ Individual components are of equal importance
2. □ Effects of intervention on components will be similar
(i.e., occur at similar frequency)
3. □ The more important components should not be
negatively affected by the intervention
4. □ Counting rules are appropriate and individual
component data is presented
1. Individual Components are of Equal Importance
a. Example #1: TIME Trial
– Death, non-fatal MI, admission for ACS
[Table: component event rates, Invasive vs Med]
Could argue that death and MI are more important to patients
than admission for ACS
It then becomes important to know:
–
Which part is driving the reduction in the CO?? If it is
admission due to ACS then invasive treatment may not be
worth it for a patient
2. Components Occur at Similar Frequency
a. Example #1: TIME Trial
– Death, non-fatal MI, admission for ACS
•Easy to see that Death and MI occur much less often than Admission for ACS
•In this case, using the CO result when making decisions could be problematic
•The total CO event rate is primarily driven by admissions and not the
outcomes that “may” hold greater importance to a patient
3. Important Components are Not Negatively
Affected
a. Example #1: TIME Trial
– Death, non-fatal MI, admission for ACS
•The MOST important outcome… numerically occurs more frequently in the
Invasive treatment group but this is NSS
4. Counting Rules are Appropriate and Component
Data is presented
a. Example #1: TIME Trial
– Death, non-fatal MI, admission for ACS
•YES…Component data is presented…but was the counting proper?
•They don’t tell us if it is “first occurrence of” or if individual component rates are
for entire study
•What if you had an MI and then a month later you were admitted for ACS?
•Were both events accounted for?
TIME Trial Conclusion

Do not base decisions on analysis of CO rates
alone

Need to look at the details of the components
Example #2 CAPRIE Trial
Ischemic stroke/MI/Vascular Death

1. Components are of equal importance

Yes, most would agree that they are all clinically important
outcomes

In that case looking at the overall result could be used to make a
decision

Be careful


All cause death is not part of the composite
 Assumes clopidogrel won’t affect non-vascular death negatively
Need to consider the rest of the checklist
Example #2 CAPRIE Trial
Ischemic stroke/MI/Vascular Death

2. Occur at similar frequencies


???
NFMI occur less frequently than vascular deaths and strokes
Example #2 CAPRIE Trial
Ischemic stroke/MI/Vascular Death

3. Important outcomes are not negatively affected

None of the components were negatively affected
Example #2 CAPRIE Trial
Ischemic stroke/MI/Vascular Death

4. Counting rules appropriate and component data presented

Component data presented BUT…
Not sure if counting rules were OK
The main table reports “first occurrence of” rates

So it is possible that a patient died at some point after a stroke and this
death was not counted
CAPRIE Trial Conclusion

Could use CO analysis as the basis for a
decision

The only issue is the lower MI event rates versus
other components but this is debatable
Example#3 ValHeFT Trial

Focus on counting rules only

Appropriate for DEATH

E.g. the “total deaths” are greater than “death (as first event)”
Use the CO Analysis if…
1.□
Individual components are of equal importance
2.□ Effects of intervention on components will be
similar (i.e occur at similar frequency)
3.□ The more important components should not be
negatively affected by the intervention
4.□ Counting rules are appropriate and individual
component data is presented
Composite Outcome References


Kleist P. Applied Clinical Trials 2006 (May
Issue)
Montori VM et al. BMJ 2005;330;594-596
Superiority, Noninferiority, and
Equivalence Trials.
Aaron Tejani
April 18. 2007
The Main Purpose of Each

Superiority


Non-inferiority


Is one treatment better than another?
Is one agent no worse than a standard therapy (based on a
pre-defined “no worse” margin)?
Equivalence

Is one agent no worse or no better than a standard therapy
(based on pre-defined limits of no worse or better)?
Superiority Trials

E.g. GUSTO III
Designed to show that RPA would lower mortality
more than TPA in MI patients
 RPA 7.47% vs TPA 7.24%


Risk difference 0.23% (2-sided 95% CI, -0.66% to 1.10%).
Incorrect conclusion: No difference in mortality
 Correct conclusion: not sure what the difference in
mortality is

Non-inferiority Trials

E.g. St. John’s Wort vs Paroxetine

Designed to show that SJW was no worse than
paroxetine at decreasing HamD scores
Non-inferiority Trials

E.g. St. John’s Wort vs Paroxetine

Key points
Defined the “no worse” margin a priori
 Basically they were saying that


E.g. If paroxetine reduced HamD by 15 points
 Then the worst-case end of the confidence interval for the
difference between SJW and paroxetine would have to be
no more than 2.5 points worse
Non-inferiority Trials

E.g. St. John’s Wort vs Paroxetine
Paroxetine HamD decrease 11.4
 SJW HamD decrease 14.4
 The difference is 3 points more with SJW

The range of the difference is 1.5 to 4.0
 The worst case is only a 1.5 difference
 This worst case is better than a -2.5 difference (the
defined margin)


Hence non-inferiority is proven
Non-inferiority Trials

E.g. St. John’s Wort vs Paroxetine

Hypothetically non-inferiority would not have been proven
if

Paroxetine decrease was 15
SJW decrease was 13
The difference was -2 (range -4 to -1)
The worst case for the difference is -4 points
This is lower than the margin of -2.5 points
Hence SJW would be considered not non-inferior BUT
need to look at per protocol analysis
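
The non-inferiority decision rule in the two St. John's Wort examples comes down to comparing the worst-case end of the confidence interval with the margin. A minimal sketch, with an illustrative function name and the sign convention assumed to be "SJW minus paroxetine, positive means SJW did better":

```python
def is_non_inferior(ci_lower, margin):
    """True when the worst case of the difference stays above the pre-defined margin.

    ci_lower: lower end of the confidence interval for (new minus standard),
              where positive values favour the new treatment.
    margin:   the pre-defined "no worse" margin, expressed as a negative number.
    """
    return ci_lower > margin


MARGIN = -2.5  # HamD points, as defined a priori in the example

reported_result = is_non_inferior(ci_lower=1.5, margin=MARGIN)       # range 1.5 to 4.0
hypothetical_result = is_non_inferior(ci_lower=-4.0, margin=MARGIN)  # range -4 to -1

print(f"Reported example: non-inferior = {reported_result}")
print(f"Hypothetical example: non-inferior = {hypothetical_result}")
```

Only the lower end of the interval matters for this check; the point estimate alone cannot establish non-inferiority.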
Non-inferiority Trials

Per protocol and intention-to-treat analyses should both be done for non-inferiority trials

Per protocol

Analyzing only those that follow protocol so some randomized people are
censored
Randomization benefit lost hence many differences between treatment
groups
As a result it becomes harder to prove one agent is no worse than another
because there are now more confounding variables
Authors should always see similar findings in both analyses to
support a non-inferiority claim
E.g. St. John’s Wort vs Paroxetine

The worst case was a 0.7 point difference hence non-inferiority was
proven
Equivalence Trials

3 month vs 6 month follow up of BP patients
by GPs

Designed to show that the difference between 3
and 6 month visit would be less than 10%, either
better or worse (plus/minus 5mmHg)
Equivalence Trials

3 month vs 6 month follow up of BP patients
by GPs

Equivalence was proven for 6 month visits vs 3
month visits
Is Sample Size Different?

Sample size needs to be larger for a non-inferiority trial
Harder to show differences within small range
 Need more people to be that precise
 I.e. expecting small differences


Small differences will only surface with large numbers
of patients
Can You Switch?

If you prove non-inferiority can you then
conclude superiority?
Yes, as the sample size needed for superiority
would be met by a non-inferiority trial
 As long as it is a pre-specified analysis
 P-value needs to be re-calculated
 Clinical relevance of superiority needs to be
thought of

Can You Switch?

If you prove non-inferiority can you then
conclude superiority?

E.g. SJW vs paroxetine
SJW decreased HamD more than paroxetine
 They pre-specified that they would do this
 BUT they did not calculate a new p-value

Non-inferiority Margin



This is the most important thing to look at
Needs to be chosen ahead of time
Needs to be based on statistical and clinical
reasoning

Should be derived from the benefit seen with
standard therapy over placebo
Non-inferiority Margin

Should be derived from the benefit seen with standard
therapy over placebo

E.g Drug A is standard



Reduces MIs by 2% vs placebo (95%CI 1-4)
Drug B is new and is being studied vs Drug A in a non-inferiority trial
The margin should be set as no worse than a 1% difference in MIs
with B vs A


1% comes from the worst case of the 95%CI versus placebo of
standard therapy
Should not be a margin of 2% (this doesn’t take into account the
uncertainty in the benefit of Drug A vs placebo)
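The margin rule on this slide can be sketched as follows. The Drug A figures (2% reduction, 95% CI 1 to 4) are from the slide; the Drug B upper bounds are hypothetical, and the function names are illustrative only.

```python
def margin_from_ci(ci_low, ci_high):
    """Smallest benefit of standard therapy over placebo that the CI supports."""
    return min(ci_low, ci_high)


def is_noninferior(worst_case_excess, margin):
    """Non-inferior if the worst-case excess of events with the new drug is below the margin."""
    return worst_case_excess < margin


# From the slide: Drug A reduces MIs by 2% vs placebo (95% CI 1 to 4)
margin = margin_from_ci(1.0, 4.0)
print(margin)

# Hypothetical upper 95% CI bounds for the excess MI rate with Drug B vs Drug A (%)
print(is_noninferior(0.6, margin))
print(is_noninferior(1.5, margin))
```

Using the point estimate (2%) as the margin would allow Drug B to give up all of the benefit that Drug A might really have, which is why the conservative end of the interval is used.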
Checklist





Is this equivalence or non-inferiority?
Is there a margin pre-specified?
Is the margin appropriately justified by authors
or is it arbitrary?
Did they do a per protocol and an ITT
analysis?
Did they say they would look at superiority
ahead of time and was a p-value re-calculated?
Remember…

If no statistically significant difference seen in a
superiority trial
DO NOT conclude “absence of a difference”
 Conclude there is “absence of evidence of a
difference”

Checklist cont’d

If they claimed non-inferiority after they
couldn’t show superiority did they check to see
if they had enough power to do so?

They would need more people to conclude non-inferiority (hence probably under-powered), or
have they enrolled more than needed for superiority so
the second analysis would be OK?
Surrogate Outcomes


Outcomes that are substitutes for measures of
how a person functions, feels, or if they
survive
Which are surrogates?
BP
 LDL cholesterol
 HbA1C
 Stroke

What is a Serious Adverse Event?









An adverse event is any undesirable experience associated with the use of a medical
product in a patient. The event is SERIOUS and should be reported when the
patient outcome is:
Death
Report if the patient's death is suspected as being a direct outcome of the adverse
event.
Life-Threatening
Report if the patient was at substantial risk of dying at the time of the adverse event
or it is suspected that the use or continued use of the product would result in the
patient's death.
Examples: Pacemaker failure; gastrointestinal hemorrhage; bone marrow
suppression; infusion pump failure which permits uncontrolled free flow resulting in
excessive drug dosing.
Hospitalization (initial or prolonged)
Report if admission to the hospital or prolongation of a hospital stay results because
of the adverse event.
Examples: Anaphylaxis; pseudomembranous colitis; or bleeding causing or
prolonging hospitalization.
What is a Serious Adverse Event?








Disability
Report if the adverse event resulted in a significant, persistent, or permanent change,
impairment, damage or disruption in the patient's body function/structure, physical activities
or quality of life.
Examples: Cerebrovascular accident due to drug-induced hypercoagulability; toxicity;
peripheral neuropathy.
Congenital Anomaly
Report if there are suspicions that exposure to a medical product prior to conception or during
pregnancy resulted in an adverse outcome in the child.
Examples: Vaginal cancer in female offspring from diethylstilbestrol during pregnancy;
malformation in the offspring caused by thalidomide.
Requires Intervention to Prevent Permanent Impairment or Damage
Report if you suspect that the use of a medical product may result in a condition which
required medical or surgical intervention to preclude permanent impairment or damage to a
patient.

Examples: Acetaminophen overdose-induced hepatotoxicity requiring treatment with
acetylcysteine to prevent permanent damage; burns from radiation equipment requiring drug
therapy; breakage of a screw requiring replacement of hardware to prevent malunion of a
fractured long bone.
SAE Reporting


Required in all clinical trials
Problem…
Not sure which outcomes are included in SAE
totals
 Not always reported in publications

Example: Finasteride for BPH
TI Letter #58 Jan-Mar 2006
e.g. Lumiracoxib (TARGET Trial)
Lancet 2004; 364: 665–74
e.g. Lumiracoxib (TARGET Trial)
GI Adverse Events
Lancet 2004; 364: 665–74
e.g. Lumiracoxib (TARGET Trial)
CV Adverse Events
Lancet 2004; 364: 665–74
e.g. Lumiracoxib (TARGET Trial)
SAE Data Not Reported in Published
Trial

Total number of patients with SAEs, both substudies pooled:
Lumiracoxib 588 (6%), NSAIDs 566 (6%)

Total number of patients with SAEs, ibuprofen substudy:
Lumiracoxib 297 (7%), Ibuprofen 272 (6%)

Total number of patients with SAEs, naproxen substudy:
Lumiracoxib 291 (6%), Naproxen 294 (6%)
Regardless of drug attribution
Data received via personal communication with Dr. Hawkey, March 2007
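A quick arithmetic check of the SAE counts above, assuming the 588 vs 566 pair is the pooled total and the other two pairs belong to the ibuprofen and naproxen substudies. The counts are those shown on the slide; the variable names are illustrative.

```python
# SAE patient counts per arm as shown on the slide
ibuprofen_substudy = {"lumiracoxib": 297, "comparator": 272}
naproxen_substudy = {"lumiracoxib": 291, "comparator": 294}
pooled_reported = {"lumiracoxib": 588, "comparator": 566}

# The two substudies should add up to the pooled totals
pooled_computed = {
    arm: ibuprofen_substudy[arm] + naproxen_substudy[arm]
    for arm in pooled_reported
}
print(pooled_computed == pooled_reported)
```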
TNT SAE Data Request

From: John C LaRosa [mailto:[email protected]]
Sent: Tuesday, August 14, 2007 1:53 PM
To: Tejani, Aaron
Subject: Re: TNT Serious adverse event data request
David
When you return, can you have someone provide me with answers?
Thanks
John
"Tejani, Aaron" <[email protected]> 08/14/2007 04:27 PM
To: <[email protected]>
cc:
Subject: TNT Serious adverse event data request
Dr. La Rosa,
We are reviewing the TNT trial with our pharmacy students and were wondering if you were
able to answer 2 questions:
1. Could you provide us with the serious adverse event (SAE) rates in both groups?
2. Were the components of the composite outcome considered and counted as SAEs? E.g. were MIs
included in the SAE totals?
Many thanks,
Aaron
Sample Size and Study Power


What it takes to calculate sample size
Relation of sample size to the primary
outcome

Issues related to secondary outcomes, subgroup
differences, etc
Components of a Sample Size
Calculation

Power




The ability to detect a difference that truly exists
Type II error (beta): missing a difference that
exists, i.e. insufficient power
E.g. 80% power means a 20% chance of missing
a true difference
Power= 1 - beta
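A sketch of how power depends on sample size, using a standard normal approximation for comparing two event rates. The 10% and 5% event rates and the sample sizes are hypothetical, chosen only to show the relationship.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p_control, p_treatment, n_per_arm, alpha=0.05):
    """Approximate power to detect a difference between two event rates."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(
        p_control * (1 - p_control) / n_per_arm
        + p_treatment * (1 - p_treatment) / n_per_arm
    )
    z_effect = abs(p_control - p_treatment) / se
    return NormalDist().cdf(z_effect - z_alpha)


# Hypothetical event rates: 10% on control, 5% on treatment
for n in (100, 435):
    power = power_two_proportions(0.10, 0.05, n)
    print(n, round(power, 2), "beta =", round(1 - power, 2))
```

With these assumed rates, about 435 patients per arm gives roughly 80% power, while 100 per arm leaves a large chance of missing a true difference.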
Components of a Sample Size
Calculation

Level of significance






An alpha level must be chosen
Alpha relates to Type I error
Type I error= detecting an effect when none exists
Chosen alpha becomes your threshold for the p-value
E.g. alpha=0.05 then p of less than 0.05 is significant
What does a p-value (= 0.0X) less than what we choose tell us?

If there were truly no treatment effect, a result at least this large would occur by chance X % of the time
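A small simulation of what alpha means in practice. Both arms are drawn from the same distribution, so there is no true effect and every "significant" result is a Type I error. The trial size, number of simulated trials, and seed are arbitrary choices for illustration, and a z-test with known SD of 1 is assumed.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def simulate_false_positive_rate(n_trials=2000, n_per_arm=50, alpha=0.05, seed=1):
    """Fraction of null trials (no true effect) that reach p < alpha."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_trials):
        arm_a = [rng.gauss(0, 1) for _ in range(n_per_arm)]
        arm_b = [rng.gauss(0, 1) for _ in range(n_per_arm)]
        # z-test for the difference in means with known SD = 1 in both arms
        z = (fmean(arm_a) - fmean(arm_b)) / sqrt(2 / n_per_arm)
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))
        if p_value < alpha:
            false_positives += 1
    return false_positives / n_trials


print(simulate_false_positive_rate())
```

The printed fraction should sit close to the chosen alpha of 0.05, which is the sense in which alpha controls the Type I error rate.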
Components of a Sample Size
Calculation

Underlying population event rate


Look at previous studies to get this
E.g TARGET Trial


Lumiracoxib versus naproxen
What is the expected rate of GI complications with
Naproxen
Components of a Sample Size
Calculation

Size of treatment effect


This should be the minimal clinically important
difference
This should be justified
Components of a Sample Size
Calculation

Adjusted sample size requirement for lack of
compliance


Achievable treatment effect (which is a
component of the sample size calculation) is
dependent on compliance to treatment
E.g trial with 100 people per arm if 100%
compliance…

If only 80% compliance, need 280 per arm
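The slide does not say how the 280 is obtained. One commonly used adjustment, assumed here, divides the original sample size by (c1 + c2 - 1) squared, where c1 and c2 are the compliance rates in each arm. With 80% compliance in both arms this gives a figure close to the one on the slide.

```python
from math import ceil


def adjust_for_compliance(n_per_arm, compliance_arm1, compliance_arm2):
    """Inflate the per-arm sample size for non-compliance (assumed formula)."""
    dilution = compliance_arm1 + compliance_arm2 - 1
    if dilution <= 0:
        raise ValueError("compliance too low for any treatment contrast to remain")
    return ceil(n_per_arm / dilution ** 2)


print(adjust_for_compliance(100, 1.0, 1.0))  # full compliance: no inflation
print(adjust_for_compliance(100, 0.8, 0.8))  # 80% compliance in both arms
```

The second call returns 278, which rounds to the 280 per arm quoted above.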
Checklist
MJA 2002;177:256-7.
General Comments


In calculating a sample size a balance is
required between risk of a Type I error versus a
Type II error
Sample size and all assumptions apply to only
the primary endpoint

Likely that Type I and II errors will occur for
anything other than the primary endpoint, even
with subgroups

i.e. be aware that false negatives and false positives
are likely to occur
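To put a number on the false positive risk from looking at many secondary outcomes or subgroups, here is a sketch that assumes each comparison is independent and tested at alpha = 0.05. Real outcomes are usually correlated, so treat the figures as a rough guide only.

```python
def chance_of_any_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n independent null tests has p < alpha."""
    return 1 - (1 - alpha) ** n_tests


for n_tests in (1, 5, 10, 20):
    print(n_tests, round(chance_of_any_false_positive(n_tests), 2))
```

Under these assumptions, ten comparisons carry about a 40% chance of at least one spurious "significant" finding.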
Remember…

“Torture numbers and they will confess to
anything.”

Gregg Easterbrook
Definition of PICO

Four components of PICO

Used to formulate a clinical question before you read the trial
Patient or problem

Description of patient or target disorder
Intervention

Could include exposure, diagnostic test, prognostic factor, therapy or
patient’s perception
Comparison intervention

Relevant most often when looking at therapy questions
Outcome


Clinical outcome of interest to you and your patient
DOES NOT HAVE TO BE WHAT THE AUTHORS MEASURED!
Arch Int Med 2001;134:657-62
CASP RCT Checklist
User’s Guide
Appraising Systematic Reviews and
Meta-analyses
First 2 Screening Questions

1. Did the review ask a focused question?
PICO?
 Recommend thinking of your PICO first before
reading the review


2. Did the review include the right type of
study?

If it was therapy did they look at RCTs only?
Finding Studies to Include

Should search at least these databases:
Medline/Pubmed
 EMBASE
 Cochrane’s CENTRAL database of RCTs



No language restrictions
Should also search

Reference lists, experts, conference proceedings,
unpublished studies
Assessing Quality of Studies


What are quality indicators of clinical trials?
If you measure quality need to do something
with the information

Do sensitivity analyses
Extracting data



Need at least 2 people extracting data
Need a standard data extraction form
Why?

To avoid transcription errors
Was it appropriate to Meta-analyze?


Does it make sense to combine data from
included trials?
Was heterogeneity assessed?
If found were reasons for this explored?
 If found was random effects model used to analyze
data?
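A sketch of one common way heterogeneity is assessed: Cochran's Q and the I-squared statistic, calculated from each trial's effect estimate and its variance using inverse-variance weights. The trial values below are hypothetical (for example, log risk ratios) and do not come from any review discussed here.

```python
def pooled_fixed_effect(effects, variances):
    """Inverse-variance weighted average of the trial effects."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    return pooled, weights


def heterogeneity(effects, variances):
    """Return Cochran's Q and I-squared (as a percentage)."""
    pooled, weights = pooled_fixed_effect(effects, variances)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i_squared


# Hypothetical effect estimates and variances from four trials
effects = [-0.30, -0.10, -0.45, 0.05]
variances = [0.02, 0.03, 0.05, 0.04]
q, i_squared = heterogeneity(effects, variances)
print(round(q, 2), round(i_squared, 1))
```

A high I-squared suggests the trials disagree more than chance would explain, which is the cue to look for reasons and to consider a random effects model.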

Forest Plots
Final Notes on Systematic Reviews



Systematic, reproducible, defendable
Only as good as the trials included
Main questions to ask:
Did they attempt to get all the trials?
 Did they compile the important information and
critically appraise each trial that was included?


Should be the first place you go to answer
clinical questions…
CASP SR Checklist
User’s Guide for Overviews
Questions or Comments
In dwelling, live close to the ground.
In thinking, keep to the simple.
In conflict, be fair and generous.
In governing, don't try to control.
In work, do what you enjoy.
In family life, be completely present.
When you are content to be simply yourself
and don't compare or compete, everybody will respect you.
Tao Te Ching
Verse 8
Acknowledgements


Therapeutic Initiative Group
CASP





Critical Appraisal Skills Program (CASP)
JAMA Users’ Guides to Evidence-based
practice
Bandolier
Fraser Health Research
Susan and Rosa!
