Appraising the evidence


Critical appraisal:
Randomized-controlled trials for
Drug Therapy
Nancy J. Lee, PharmD, BCPS
Research fellow, Drug Effectiveness Review Project
Oregon Evidence-based Practice Center
Oregon Health and Science University
To receive 1.0 AMA PRA Category 1 Credit™ you must review this
section and answer CME questions at the end.
Release date: January 2009
Expiration date: January 2012
Attachments
• The attachments tab in the upper right hand
corner contains documents that supplement
the presentation
• Handouts of slides and a glossary of terms can
be found under this tab and are available to
print out for your use
• URLs to online resources are also available
Program funding
This work was made possible by a grant from the
state Attorney General Consumer and Prescriber
Education Program, which is funded by the multistate settlement of consumer fraud claims regarding
the marketing of the prescription drug Neurontin®.
Continuing education sponsors:
The following activity is jointly sponsored by:
The University of Texas Southwestern Medical Center
and the Federation of State Medical Boards’ Research
and Education Foundation.
CME information
Program Speaker/Author: Nancy J. Lee, PharmD, BCPS
Research fellow, Oregon Health and Science University, Oregon Evidence-based
Practice Center, Drug Effectiveness Review Project
Course Director:
Barbara S. Schneidman, MD, MPH
Federation of State Medical Boards Research and Education Foundation, Secretary
Federation of State Medical Boards, Interim President and Chief Executive Officer
Program Directors:
David Pass, MD
Director, Health Resources Commission, Oregon Office for Health Policy and Research
Dean Haxby, PharmD
Associate Professor of Pharmacy Practice, Oregon State University College of Pharmacy
Daniel Hartung, PharmD, MPH
Assistant Professor of Pharmacy Practice, Oregon State University College of Pharmacy
Target Audience: This educational activity is intended for members of committees involved with
medication-use policies and for health care professionals involved in medication prescribing.
Educational Objectives: Upon completion of this activity, participants should be able to: recognize differences
between critical appraisal and quality assessment/internal validity; review the general steps involved in critical
appraisal; discuss various components of internal validity; review statistical concepts important in
critical appraisal; recognize the importance of clinical insight and experience in critical appraisal.
CME policies
Accreditation: This activity has been planned and implemented in accordance with the Essential Areas & Policies
of the Accreditation Council for Continuing Medical Education through the joint sponsorship of The University of
Texas Southwestern Medical Center and the Federation of State Medical Boards Research and Education
Foundation. The University of Texas Southwestern Medical Center is accredited by the ACCME to provide
continuing medical education for physicians.
Credit Designation: The University of Texas Southwestern Medical Center designates this educational activity for
a maximum of 1.0 AMA PRA Category 1 Credit™. Physicians should only claim credit commensurate with the
extent of their participation in the activity.
Conflict of Interest: It is the policy of UT Southwestern Medical Center that participants in CME activities should
be made aware of any affiliation or financial interest that may affect the author’s presentation. Each author has
completed and signed a conflict of interest statement. The faculty members’ relationships will be disclosed in the
course material.
Discussion of Off-Label Use: Because this course is meant to educate physicians about what is currently in use and
what may be available in the future, “off-label” use may be discussed. Authors have been requested to inform
the audience when off-label use is discussed.
DISCLOSURE TO PARTICIPANTS
It is the policy of the CME Office at The University of Texas Southwestern Medical Center to ensure balance,
independence, objectivity, and scientific rigor in all directly or jointly sponsored educational activities.
Program directors and speakers have completed and signed a conflict of interest statement disclosing a financial
or other relationship with a commercial interest related directly or indirectly to the program.
Information and opinion offered by the speakers represent their viewpoints. Conclusions drawn by the audience
should be derived from careful consideration of all available scientific information. Products may be discussed for
treatment uses outside current approved labeling.
FINANCIAL RELATIONSHIP DISCLOSURE
Faculty                           Type of Relationship/Name of Commercial Interest(s)
David Pass, MD                    None
Dean Haxby, PharmD                Employment/CareOregon
Daniel Hartung, PharmD, MPH       None
Nancy Lee, PharmD, BCPS           None
Barbara S. Schneidman, MD, MPH    None
Learning objectives
• Recognize the difference between overall critical appraisal
of evidence and quality assessment/internal validity
– What is it and why is it necessary?
• Review general steps involved in critical appraisal
• Discuss various components of internal validity
• Review statistical concepts important in
critical appraisal
• Recognize the importance of clinical insight and experience
in critical appraisal
What is critical appraisal?
• Process of examining research
evidence to evaluate its validity,
results, and relevance before making
an informed decision
• It’s not an exact science and it won’t
give us the “right” answers
• Foundation for practicing thoughtful
“evidence-based” or “evidence-informed” medicine
Hill, et al. Bandolier Volume 3 (2). http://www.evidence-based-medicine.co.uk
Why is it necessary?
• Not all publications are equally convincing or reliable
(even if published in a reputable journal):
– Incorrect interpretation
– Fraud and misrepresentation of results
– Data dredging
– Data dumping
• Sometimes clinical experience and theory based on
pathophysiology can be misleading.
• Systematically examining the literature increases our
confidence in the strengths of the evidence and sheds light
on areas of weakness.
Turner, et al. NEJM 2008;358:252-60
Benefits and challenges
• Benefits
– Encourages objective
assessment of the
literature
– Recognize breadth and
depth of evidence base
in a particular topic area
• Challenges
– Time intensive at first
– Generates more
questions than answers
– May highlight a
lack of good evidence,
making decisions
challenging
Critical appraisal requires that you
ask yourself:
1. Is this relevant?
2. Is this valid?
3. Is this reliable?
4. Is this important and meaningful?
5. Is this applicable or generalizable?
1. Is this article even relevant?
• Should I read it?
– Title
– Abstract
– Introduction
• Does this study identify a gap in the evidence?
• Were the objectives clear and focused?
• Should I continue?
– Methods
• Were inclusion and exclusion criteria clearly stated?
• Are the outcomes patient-oriented or surrogate markers for
long-term health outcomes?
• How long was the study? (study duration)
2. Is this valid?
• INTERNAL validity or study quality
– Were the design, methods, and conduct of the study
likely to have prevented or minimized bias in such
a way that I can trust the findings?
• With the information provided, could I reproduce this
study and observe similar findings?
“Trust no one unless you have eaten much salt with him.”
-Cicero
“Skepticism is the chastity of the intellect…”
-George Santayana
A few words about “quality”
• Means different things to different people
• In the context of this module, “quality” refers
to methodologic quality or…
– Study quality = quality assessment = internal
validity
– NOT the same as quality of reporting
• Remember: “not reported” ≠ “not performed”
• Sometimes difficult to differentiate between the two
• Subjective process—may use dual review
Threats to INTERNAL validity
Selection bias, Performance bias, Detection bias, Attrition bias
A. Randomization
B. Allocation concealment
C. Blinding
D. Attrition
E. Statistical analysis
F. Other
   i. Post-randomization exclusions
   ii. Crossovers
Study Quality
Is this valid?: INTERNAL validity
A. Randomization:
– Adequate (unbiased): computerized random number
generator, random number table
– Inadequate (biased): by hospital number, date of birth,
alternate assignment
B. Allocation concealment
– Adequate (unbiased): interactive voice response
system, sealed, opaque envelopes that are coded and
handled by a third party (centralized or pharmacy-controlled)
– Inadequate (biased): serially numbered envelopes
(even sealed opaque envelopes can be subject to
manipulation), open lists
Were both treatment groups fairly balanced?
Example
Primary outcome: cardiovascular events or deaths
Characteristic (% of subjects)   Tolbutamide (N=204)   Placebo (N=205)
Age >55                          50                    41
Digitalis use                    8                     5
Angina                           7                     5
ECG abnormality                  4                     3
Total chol >300 mg/dL            15                    9
Fasting glucose >110 mg/dL       72                    64
Hypertension                     30                    37
Adapted from Elwood. Critical appraisal of Epi studies and Clin trials 1998
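To make "fairly balanced" more concrete, the minimal Python sketch below computes a standardized difference for each baseline characteristic in the table. The standardized-difference technique and the 0.1 cutoff are assumptions drawn from common practice, not from the original slide, and the percentages are used without denominators, so treat the output as descriptive only.

```python
import math

# Baseline percentages from the tolbutamide example (tolbutamide %, placebo %).
BASELINE = {
    "Age >55": (50, 41),
    "Digitalis use": (8, 5),
    "Angina": (7, 5),
    "ECG abnormality": (4, 3),
    "Total chol >300 mg/dL": (15, 9),
    "Fasting glucose >110 mg/dL": (72, 64),
    "Hypertension": (30, 37),
}


def standardized_difference(pct_a: float, pct_b: float) -> float:
    """Standardized difference for a binary characteristic given two percentages."""
    p_a, p_b = pct_a / 100, pct_b / 100
    pooled_sd = math.sqrt((p_a * (1 - p_a) + p_b * (1 - p_b)) / 2)
    return (p_a - p_b) / pooled_sd


def flag_imbalances(table: dict, threshold: float = 0.1) -> dict:
    """Return characteristics whose |standardized difference| exceeds threshold
    (0.1 is an assumed rule of thumb)."""
    flagged = {}
    for name, (pct_a, pct_b) in table.items():
        d = standardized_difference(pct_a, pct_b)
        if abs(d) > threshold:
            flagged[name] = round(d, 2)
    return flagged


if __name__ == "__main__":
    for name, d in flag_imbalances(BASELINE).items():
        print(f"{name}: standardized difference {d:+.2f}")
```

With these percentages the sketch flags age, digitalis use, cholesterol, fasting glucose, and hypertension; all but hypertension are more common in the tolbutamide arm, which would tend to load that arm with higher-risk patients.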
Is this valid?: INTERNAL validity
C. Blinding
– Single-blind, double-blind, triple-blind, open-label,
double-dummy
• Who was blinded?
• Was blinding maintained?
– Is blinding essential or possible in every situation?
• Important when outcome measures involve some
subjectivity
• May be less important when outcome measure is death
Therapy administered in the comparator arm
should be as nearly “identical” as possible to the therapy
administered in the treatment arm
Examples
• Aspirin 1 gram vs. placebo post-MI
– Double-blind
– Risk of bleeding?
• Ascorbic acid 1 gram vs. placebo for the common
cold
– Double-blind
– Taste difference?
• Esomeprazole vs. Omeprazole for erosive
esophagitis
– Double-dummy
– Appearance?
Is this valid?: INTERNAL validity
D. Attrition
– Was the total number of participants who
withdrew reported for each group?
– Were reasons for withdrawal provided?
• Includes: adverse events, lost to follow-up, protocol
violation, or lack of efficacy
Is this valid?: INTERNAL validity
• Commonly reported methods of analysis
– Intention-to-treat (ITT)
• Always verify the numbers yourself
• Practical issue: depending on the reason behind the
missing data, a difference of up to about 3-5% between the
numbers randomized at baseline and the numbers included
in the ITT analysis may be acceptable.
– Other popular approaches: last observation
carried forward (LOCF), as-treated or per-protocol analyses,
analyses without imputation of missing data, mixed
modeling, etc.
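The advice to verify the numbers yourself, and the LOCF approach named above, can be sketched in Python as follows. The 5% default tolerance is an assumption taken from the upper end of the 3-5% rule of thumb above, and the function names and counts in the usage lines are hypothetical.

```python
from typing import List, Optional


def analyzed_fraction(randomized: int, analyzed: int) -> float:
    """Fraction of randomized participants who appear in an analysis."""
    if randomized <= 0:
        raise ValueError("randomized must be a positive count")
    return analyzed / randomized


def check_itt(randomized: int, analyzed: int, tolerance: float = 0.05) -> bool:
    """True if no more than `tolerance` (as a fraction) of randomized
    participants are missing from the analysis."""
    return 1 - analyzed_fraction(randomized, analyzed) <= tolerance


def locf(visits: List[Optional[float]]) -> List[Optional[float]]:
    """Last observation carried forward: each missing value (None) is replaced
    by the most recent observed value; gaps before the first observation stay None."""
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for value in visits:
        if value is not None:
            last = value
        filled.append(last)
    return filled


if __name__ == "__main__":
    # Hypothetical arm: 204 randomized, 198 analyzed (about 2.9% missing).
    print(check_itt(204, 198))                      # True
    # Hypothetical A1c values (%) at successive visits; None marks a missed visit.
    print(locf([7.9, 7.5, None, None, 7.1, None]))  # [7.9, 7.5, 7.5, 7.5, 7.1, 7.1]
```

LOCF assumes a participant's outcome stays frozen after dropout, which can bias results when the outcome tends to change over time.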
Is this valid?: INTERNAL validity
E. Statistical analysis
– Was the method appropriate?
– What is the potential for type I error (false positive)
or type II error (false negative)?
• Was the study adequately powered?
– Selective analysis of data can lead to
selective reporting
• Example: reporting a statistically
significant p-value for A1c at week
26 instead of week 52
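Power is easier to judge with a number in hand. The Python sketch below uses the standard normal-approximation formula for comparing two proportions to estimate how many participants per arm a trial needs. The formula choice, the defaults (α = 0.05, 80% power), and the 25% vs 18.75% event rates (borrowed from the fictional MiHaart example later in this module) are assumptions for illustration.

```python
import math
from statistics import NormalDist


def n_per_group(p_control: float, p_treatment: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants needed per arm to detect a difference between
    two event proportions with a two-sided test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # limits type I (false positive) error
    z_beta = NormalDist().inv_cdf(power)           # limits type II (false negative) error
    p_bar = (p_control + p_treatment) / 2
    root = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
            + z_beta * math.sqrt(p_control * (1 - p_control)
                                 + p_treatment * (1 - p_treatment)))
    return math.ceil(root ** 2 / (p_control - p_treatment) ** 2)


if __name__ == "__main__":
    # Hypothetical planning values: 25% vs 18.75% event rates.
    print(n_per_group(0.25, 0.1875))  # 686 under these assumptions
```

A trial much smaller than such an estimate is at real risk of a type II error, so a non-significant result from it would be hard to interpret.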
Is this valid?: INTERNAL validity
F. Other
– Post-randomization exclusions, crossovers,
contamination?
• Were any groups of participants excluded during the
course of the study?
• Why? Was this significant?
Is this valid? For Harms
• Apply similar concepts and also ask:
– How were harms monitored?
• Active or passive methods?
– Who assessed the harms?
• Study investigator or third party?
– When and how often were the assessments
conducted?
• Face-to-face or over the phone?
• Various terms used
– Safety = term being phased out (except by the FDA)
– Adverse effect= undesirable outcome with
reasonable causal association
– Adverse event= undesirable outcome with
unknown causal association
– Tolerability= ability or willingness to tolerate
unpleasant drug-related events without serious or
permanent sequelae
Chou, et al. J Clin Epi 2008. Sept 25 (Epub ahead of print, in press)
Tools for assessing internal validity
• There are > 25 different scales and tools for
assessing the internal quality of a trial
– Jadad scale
– Chalmers scale
– Cochrane Risk of Bias tool
– DERP method
• Adapted from the US Preventive Services Task Force (USPSTF) and
National Health Service Centre for Reviews and
Dissemination (UK)
Example: Jadad scale
Item (Score)
1. Was the study described as randomized? (0 or 1)
2. Was the method used to generate the sequence of
randomization described and was it appropriate (ie, computer
generated, table of random numbers)? (0 or 1)
3. Was the study described as double-blind? (0 or 1)
4. Was the method of double-blinding described and was it
appropriate? (0 or 1)
5. Was there a description of withdrawals and dropouts? (0 or 1)
Deduct 1 point if the method used to generate the sequence of
randomization was described but inappropriate (ie, allocated
alternately or according to date of birth or hospital number). (0 or -1)
Deduct 1 point if the study was described as double-blind but the
method of blinding was inappropriate (ie, comparison of tablet vs.
injection without a double dummy). (0 or -1)
Scoring range: 0-5
Poor quality: <3
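The Jadad items above translate directly into a scoring function. This Python sketch is a hypothetical helper, not an official implementation; it assumes the randomization deduction applies only when the study is described as randomized and the blinding deduction only when it is described as double-blind, as the item wording suggests.

```python
from dataclasses import dataclass

APPROPRIATE = "appropriate"
INAPPROPRIATE = "inappropriate"
NOT_DESCRIBED = "not described"

# Points for a method: +1 described and appropriate, -1 (deduction)
# described but inappropriate, 0 not described.
METHOD_POINTS = {APPROPRIATE: 1, INAPPROPRIATE: -1, NOT_DESCRIBED: 0}


@dataclass
class TrialReport:
    described_randomized: bool
    randomization_method: str  # APPROPRIATE, INAPPROPRIATE, or NOT_DESCRIBED
    described_double_blind: bool
    blinding_method: str       # APPROPRIATE, INAPPROPRIATE, or NOT_DESCRIBED
    withdrawals_described: bool


def jadad_score(report: TrialReport) -> int:
    score = int(report.described_randomized)                 # item 1
    if report.described_randomized:
        score += METHOD_POINTS[report.randomization_method]  # item 2 or deduction
    score += int(report.described_double_blind)              # item 3
    if report.described_double_blind:
        score += METHOD_POINTS[report.blinding_method]       # item 4 or deduction
    score += int(report.withdrawals_described)               # item 5
    return max(0, min(5, score))                             # scoring range 0-5


def is_poor_quality(score: int) -> bool:
    return score < 3


if __name__ == "__main__":
    # Hypothetical trial: alternate allocation, open-label, dropouts described.
    trial = TrialReport(True, INAPPROPRIATE, False, NOT_DESCRIBED, True)
    s = jadad_score(trial)
    print(s, is_poor_quality(s))  # 1 True
```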
Example: Cochrane “Risk of bias” tool
http://www.ohg.cochrane.org/forms/Risk%20of%20bias%20assessment%20tool.pdf
Example: DERP method
The Center for Evidence-based Medicine. Oxford. http://www.cebm.net
1. Is this relevant?
2. Is this valid?
3. Is this reliable?
4. Is this important and meaningful?
5. Is this applicable or generalizable?
3. Are the results reliable?
• Were all the results reported?
– Was there evidence of selective outcome reporting?
– How were the results reported, and are they easy to
read and interpret?
• How large is the treatment effect?
– Relative risk, relative risk reduction, odds ratio, absolute
risk reduction, number needed to treat
• How precise is the estimate of the effect?
– How narrow or wide is the confidence interval?
– Where does the point estimate fall?
“Do not put your faith in what statistics say until you
have carefully considered what they do not say.”
-William Watt
• Brief overview of:
– Relative measures: relative risk, odds ratio
– Absolute measures: absolute risk, number needed to treat
– P-value
– Confidence intervals
Is this reliable?: Interpreting data
• Relative risk (RR) = risk ratio (event rate ratio)
$$RR = \frac{\text{risk in treatment arm}}{\text{risk in control arm}}$$
– RR = 1 (no difference)
– RR < 1 (intervention lowers the risk of the outcome)
– RR > 1 (intervention increases the risk of the outcome)
• Relative risk reduction (RRR)
$$RRR = (1 - RR) \times 100\%$$
Limitations of risk ratios
• Study 1 (outcome): death from any cause
– Treatment: 1%
– Placebo: 2%
– RR= 0.50 and RRR= 50%
• Study 2 (outcome): death from any cause
– Treatment: 25%
– Placebo: 50%
– RR= 0.50 and RRR= 50%
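The two studies above can be checked in a few lines. This Python sketch computes RR, RRR, ARR, and NNT from the event rates; the function name is hypothetical, and rounding NNT up to a whole number is a common convention assumed here rather than stated on the slide.

```python
import math


def risk_summary(risk_treatment: float, risk_control: float) -> dict:
    """Relative and absolute measures from two event rates (proportions)."""
    rr = risk_treatment / risk_control
    arr = risk_control - risk_treatment
    return {
        "RR": rr,
        "RRR_pct": (1 - rr) * 100,
        "ARR_pct": arr * 100,
        # NNT = 1/ARR rounded up; round() first strips floating-point noise
        "NNT": math.ceil(round(1 / arr, 6)),
    }


if __name__ == "__main__":
    print(risk_summary(0.01, 0.02))  # Study 1
    print(risk_summary(0.25, 0.50))  # Study 2
```

Both studies give RR 0.50 and RRR 50%, but the ARR is 1 percentage point versus 25, and the NNT is 100 versus 4.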
Is this reliable?: Interpreting data
• Odds of an event within a single group
$$\text{Odds} = \frac{\text{probability of the event occurring}}{\text{probability of the event NOT occurring}} = \frac{p}{1-p}$$
• Odds ratio (OR) compares the odds across groups
$$OR = \frac{\text{odds of the event in group A}}{\text{odds of the event in group B}} = \frac{p/(1-p)}{q/(1-q)}$$
where p and q are the probabilities of the event in groups A and B
– OR = 1 (no difference)
– OR < 1 (lowers the odds of experiencing the outcome)
– OR > 1 (increases the odds of experiencing the outcome)
Zhang J, et al. JAMA 1998; 280 (19):1690-1. Figure shown in this slide is from this article.
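The Python sketch below computes odds and the odds ratio as defined above and applies them to the two studies from the previous slide. The OR-to-RR conversion uses the correction published by Zhang and Yu in the JAMA article cited above, RR = OR / [(1 − P0) + P0 × OR] with P0 the control-group risk; the function names are hypothetical.

```python
def odds(p: float) -> float:
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1 - p)


def odds_ratio(p: float, q: float) -> float:
    """Odds in group A (event probability p) over odds in group B (probability q)."""
    return odds(p) / odds(q)


def or_to_rr(or_value: float, control_risk: float) -> float:
    """Approximate RR from an OR and the control-group risk
    (Zhang and Yu correction, JAMA 1998)."""
    return or_value / ((1 - control_risk) + control_risk * or_value)


if __name__ == "__main__":
    rare = odds_ratio(0.01, 0.02)    # Study 1: 1% vs 2%
    common = odds_ratio(0.25, 0.50)  # Study 2: 25% vs 50%
    print(round(rare, 3), round(common, 3), round(or_to_rr(common, 0.50), 3))  # 0.495 0.333 0.5
```

When the event is rare (Study 1) the OR nearly equals the RR of 0.50; when it is common (Study 2) the OR of 0.33 makes the effect look larger than the RR of 0.50 does.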
Is this reliable?: Interpreting data
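• Absolute risk reduction (ARR) or risk difference
$$ARR = \text{risk on control} - \text{risk on treatment}$$
• Number needed to treat to benefit or harm
$$NNT = \frac{1}{ARR} \quad \left(\text{or } \frac{100}{ARR} \text{ when ARR is expressed as a percentage}\right)$$
– Should be reported with the duration of follow-up and the
control group event rate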
Example
New Antiplatelet for Patients with Acute Myocardial Infarction
Conclusion: The new antiplatelet medication (MiHaart®) for acute
myocardial infarction is more effective than placebo with a 25%
reduction in mortality after 60 days of treatment.
Mortality (# events)   Placebo (N=1000)   MiHaart® (N=1000)
Composite              250                187.5
High risk patients     200                150
Low risk patients      50                 37.5
This example is based on a fictional medication (MiHaart).
Crunching the numbers
Risk
MiHaart = 187.5/1000 = 0.1875
Placebo = 250/1000 = 0.250
Absolute risk (rate)
MiHaart = 18.75%
Placebo = 25.0%
Relative risk
$$RR = \frac{\text{MiHaart}}{\text{Placebo}} = \frac{0.1875}{0.250} = 0.75$$
Absolute risk reduction
25.0% - 18.75% = 6.25%
Relative risk reduction
(1 - 0.75) x 100 = 25%
Crunching the numbers
                     RR     RRR    ARR     NNT
Composite            0.75   25%    6.25%   16
High risk patients   0.75   25%    5%      20
Low risk patients    0.75   25%    1.25%   80
RR and RRR tend to make the magnitude of effect look larger
than the ARR does, but…
to determine whether any of these values is a good
point estimate of the effect on mortality, we must evaluate
the confidence interval around it.
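The table above can be reproduced from the event counts with a short Python sketch. Using N = 1000 as the denominator in every stratum mirrors the slide's own arithmetic (for example, 5% = (200 - 150)/1000); the function name is hypothetical and rounding NNT up is an assumed convention.

```python
import math

# Event counts from the fictional MiHaart example (N=1000 per arm).
N = 1000
EVENTS = {  # stratum: (placebo events, MiHaart events)
    "Composite": (250, 187.5),
    "High risk patients": (200, 150),
    "Low risk patients": (50, 37.5),
}


def summarize(control_events: float, treatment_events: float, n: int) -> dict:
    risk_c, risk_t = control_events / n, treatment_events / n
    rr = risk_t / risk_c
    arr = risk_c - risk_t
    return {
        "RR": round(rr, 2),
        "RRR_pct": round((1 - rr) * 100, 2),
        "ARR_pct": round(arr * 100, 2),
        # NNT = 1/ARR rounded up; round() first strips floating-point noise
        "NNT": math.ceil(round(1 / arr, 6)),
    }


if __name__ == "__main__":
    for name, (placebo, mihaart) in EVENTS.items():
        print(name, summarize(placebo, mihaart, N))
```

The identical RR and RRR across strata, alongside a fourfold spread in ARR and NNT, is the point the slide makes.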
Is this reliable?: Interpreting data
• How can we determine if the
“point estimate” is a good
reflection of the “true” value?
– The utility of the P-value
– Role of the confidence interval (CI)
The P-value
• Used to measure statistical significance in
epidemiology.
• By convention, the significance level (α) is typically set at 0.05:
a difference that would occur less than 1 time in 20 by random
chance alone, if there were truly no difference, is considered
statistically significant.
• The smaller the p-value, the less likely it is that a difference
as large as the “point estimate” arose by random chance alone.
• Does not provide the range of plausible values for the true
difference around the “point estimate”
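As an illustration of what a p-value does and does not convey, the Python sketch below runs a pooled two-proportion z-test on the fictional MiHaart counts. The choice of test is an assumption; the slides do not say how significance was assessed.

```python
import math
from statistics import NormalDist


def two_proportion_p_value(events_a: float, n_a: int,
                           events_b: float, n_b: int) -> float:
    """Two-sided p-value from a pooled two-proportion z-test (normal approximation)."""
    p_a, p_b = events_a / n_a, events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    # Fictional MiHaart example: placebo vs MiHaart, N=1000 per arm.
    print(f"Composite: p = {two_proportion_p_value(250, 1000, 187.5, 1000):.4f}")
    print(f"Low risk:  p = {two_proportion_p_value(50, 1000, 37.5, 1000):.2f}")
```

The composite result comes out well below 0.05 (about 0.0007), while the low-risk stratum gives roughly 0.17; neither number says how large the benefit is, which is where the ARR and its confidence interval come in.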
The problem with P-values
• P-values give no indication of:
– Treatment effect size
– Precision of the estimate
– Direction of effect
• They reduce results to dichotomous
judgments
– Significant (P<0.05)
– Not significant (P≥0.05; P=NS)
• They do not protect against Type I or Type II errors
– A non-significant P-value is often misread as a “negative trial”
Absence of evidence is NOT evidence of absence
The confidence interval
• More useful than the P-value in evaluating results
• Provides a range of plausible values for the “true”
treatment effect
– The width of a CI depends largely on sample size
(larger samples give narrower intervals)
• Can be calculated for means, medians,
proportions, odds ratios, relative risks, and NNT
– 95% is the most commonly calculated
– http://www.openepi.com can help you
calculate confidence intervals for treatment measures
Same example
New Antiplatelet for Patients with Acute Myocardial Infarction
Conclusion: The new antiplatelet medication (MiHaart®) for acute
myocardial infarction is more effective than placebo with a 25%
reduction in mortality after 60 days of treatment.
Mortality            ARR     95% CI        NNT            95% CI
Composite            6.25%   2.6 to 9.9    16 (benefit)   10 to 38.5
High risk patients   5%      1.7 to 8.3    20 (benefit)   12 to 59
Low risk patients    1.25%   -0.5 to 3.0   80             33 (benefit) to ∞ and 200 to ∞ (harm)
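A simple Wald (normal-approximation) interval for the risk difference reproduces the ARR intervals above to one decimal place, so the table may have been computed this way; that is an inference, not something the slide states. The NNT conversion below follows the approach usually attributed to Altman: invert the ARR limits, and when the ARR interval crosses zero, report an NNT(benefit) range and an NNT(harm) range that each extend to infinity. Function names are hypothetical.

```python
import math
from statistics import NormalDist


def arr_wald_ci(events_control: float, events_treatment: float, n: int,
                level: float = 0.95):
    """ARR (control risk minus treatment risk) with a Wald CI; equal arm sizes."""
    r_c, r_t = events_control / n, events_treatment / n
    arr = r_c - r_t
    se = math.sqrt(r_c * (1 - r_c) / n + r_t * (1 - r_t) / n)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return arr, arr - z * se, arr + z * se


def _inv(x: float) -> float:
    """1/|x|, treating a zero limit as an unbounded NNT."""
    return math.inf if x == 0 else 1 / abs(x)


def nnt_ci(lower: float, upper: float) -> dict:
    """Convert an ARR CI into NNT terms; math.inf marks an unbounded limit."""
    if lower >= 0:   # whole CI shows benefit
        return {"benefit": (_inv(upper), _inv(lower)), "harm": None}
    if upper <= 0:   # whole CI shows harm
        return {"benefit": None, "harm": (_inv(lower), _inv(upper))}
    # CI crosses 0: NNT(benefit) 1/upper to infinity, NNT(harm) 1/|lower| to infinity
    return {"benefit": (_inv(upper), math.inf), "harm": (_inv(lower), math.inf)}


if __name__ == "__main__":
    strata = {"Composite": (250, 187.5), "High risk patients": (200, 150),
              "Low risk patients": (50, 37.5)}
    for name, (placebo, mihaart) in strata.items():
        arr, low, high = arr_wald_ci(placebo, mihaart, 1000)
        print(f"{name}: ARR {arr:.2%} (95% CI {low:.1%} to {high:.1%}); NNT {nnt_ci(low, high)}")
```

The unrounded NNT limits differ slightly from the table (for example, about 37.9 rather than 38.5, and about 185 rather than 200), which suggests the slide inverted the rounded ARR limits.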
Confidence intervals and
trials that appear “negative”
• Swedish Cooperative Stroke Study (N=505)
– Aspirin= 9% nonfatal stroke
– Placebo= 7% nonfatal stroke
– Risk difference= -2%
– 95% CI (-7% to 3%)
Guyatt G, et al. CMAJ 1995; 152 (2):169-73.
Trials that appear “negative”
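Statistically: no difference
Clinically: ?
[Figure: the 95% CI for the risk difference (-7% to 3%) drawn on an axis through 0, with PLACEBO on one side and ASPIRIN on the other; reference value 5%; regions labeled "Definitely negative" and "Inadequate sample size"]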
Confidence intervals and
trials that appear “positive”
• Enalapril in LV Dysfunction, SOLVD (N=1285)
– Enalapril= 47.7% died or worsening HF
– Placebo= 57.3% died or worsening HF
– Risk difference= 9.6% (~10%)
– 95% CI (6% to 14%)
Guyatt G. CMAJ 1995; 152(2): 169-73.
Trials that appear “positive”
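Statistically: there is a difference
Clinically: ?
Let’s say the smallest clinically relevant difference is 7%: the lower
bound of the CI (6%) falls below it, so the sample size was inadequate
to show a definitely relevant benefit.
[Figure: the 95% CI for the risk difference (6% to 14%) drawn on an axis through 0, with PLACEBO on one side and ENALAPRIL on the other; region labeled "Definitely relevant"]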
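The reasoning on the last two slides compares the confidence interval with zero and with a smallest clinically important difference. The Python sketch below encodes that reasoning as a hypothetical helper. The 7% threshold for SOLVD comes from the slide; the 5% and 2% thresholds for the Swedish study are hypothetical values chosen to show how the verdict depends on them. Differences are expressed so that positive values favor the active treatment (placebo rate minus active rate).

```python
def interpret_ci(lower: float, upper: float, mcid: float) -> str:
    """Classify a risk-difference CI (positive values favor the intervention)
    against 0 and a minimal clinically important difference, mcid > 0."""
    if upper < 0:
        return "statistically significant harm"
    if lower > 0:
        if lower >= mcid:
            return "definitely relevant: even the lower bound is clinically important"
        return "statistically significant, but sample size inadequate to show clinical relevance"
    if upper < mcid:
        return "definitely negative: a clinically important benefit is excluded"
    return "inconclusive: sample size inadequate"


if __name__ == "__main__":
    print(interpret_ci(6, 14, mcid=7))  # SOLVD, threshold from the slide
    print(interpret_ci(-7, 3, mcid=5))  # Swedish study, hypothetical threshold
    print(interpret_ci(-7, 3, mcid=2))  # Swedish study, another hypothetical threshold
```

The same interval can be "definitely negative" or "inconclusive" depending on the threshold, which is why the clinical judgment behind that threshold matters as much as the statistics.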
1. Is this relevant?
2. Is this valid?
3. Is this reliable?
4. Is this important and meaningful?
5. Is this applicable or generalizable?
4. Making sense of it all:
Is this important and meaningful?
• Do the results make sense?
• Do the results provide anything new?
– Do the results confirm a prior conclusion?
• Will this change my practice?
• Do the concluding remarks match the results?
5. Is this applicable or generalizable?
• EXTERNAL validity = term being phased out
– New terms: APPLICABILITY or GENERALIZABILITY
– Was enough information regarding population
(eligibility criteria), interventions, outcomes,
study design, and setting reported such that I can
apply the results to my patients or generalize the
findings to a broader population?
5. Is this applicable or generalizable?
• Population
– Recruitment methods?
– Disease severity or duration of illness?
– Run-in periods?
• Interventions
– Study medication naïve?
– Dose, duration, other allowed interventions, adherence?
– Level of training for those who assessed intervention?
• Outcomes
– Long term health outcomes relevant to patients?
– Intermediate (or surrogate) markers used?
• Setting
– Specialty setting or general setting (inpatient or outpatient)?
– Country?
Summary
1. Is this relevant?
2. Is this valid?
3. Is this reliable?
4. Is this important and meaningful?
5. Is this applicable or generalizable?
• Developing a critical appraisal skill set is
important for providing quality care
– Brings awareness of current medical practices and
highlights areas where more research is needed
Acknowledgements
• Attorney General Consumer and Prescriber
Education Program
• Members of the technical advisory committee of
this grant
• Office for Oregon Health Policy and Research
• The University of Texas Southwestern Medical
Center
• The Federation of State Medical Boards’ Research
and Education Foundation
CME instructions
• Please complete the survey, CME questions,
and program evaluation after this slide
• Don’t forget to click the finish button at the
end of the CME questions
• You should be directly linked to a CME form
which you will need to fill out and fax, email,
or mail in order to receive credit hours