Development of Clinical Practice Guidelines and the GRADE


RATING THE EVIDENCE:
USING GRADE TO DEVELOP
CLINICAL PRACTICE GUIDELINES
AHRQ Annual Meeting 2009:
"Research to Reform: Achieving Health System Change"
September 14, 2009
Yngve Falck-Ytter, M.D.
Case Western Reserve University, Cleveland, Ohio
Holger Schünemann, M.D., Ph.D.
Chair, Department of Clinical Epidemiology & Biostatistics
Michael Gent Chair in Healthcare Research
McMaster University, Hamilton, Canada
Disclosure
In the past 5 years, Dr. Falck-Ytter received no personal
payments for services from industry. His research group
received research grants from Three Rivers, Valeant and Roche
that were deposited into non-profit research accounts. He is a
member of the GRADE working group which has received
funding from various governmental entities in the US and
Europe. Some of the GRADE work he has done is supported in
part by grant # 1 R13 HS016880-01 from the Agency for
Healthcare Research and Quality (AHRQ).
Content
Part 1
• Introduction
Part 2
• Why revisiting guideline methodology?
Part 3
• The GRADE approach
• Quality of evidence
Part 4
• The GRADE approach
• Strength of recommendations
Q to audience
• Involved in giving recommendations?
• Using any form of grading system?
• Familiarity with GRADE:
  • Heard about GRADE before this conference?
  • Read a GRADE article published by the GRADE working group?
  • Attended a GRADE presentation?
  • Attended a hands-on GRADE workshop?
Reassessment of clinical practice guidelines
• Editorial by Shaneyfelt and Centor (JAMA 2009)
• “Too many current guidelines have become marketing and opinion-based pieces…”
• “AHA CPG: 48% of recommendations are based on level C = expert opinion…”
• “…clinicians do not use CPG […] greater concern […] some CPG are turned into performance measures…”
• “Time has come for CPG development to again be centralized, e.g., AHRQ…”
Evidence-based clinical decisions
Clinical state and
circumstances
Patient values and
preferences
Expertise
Research evidence
Equal for all
Haynes et al. 2002
Before GRADE
Level of evidence | Source of evidence   | Grade of recommendation
I                 | SR, RCTs             | A
II                | Cohort studies       | B
III               | Case-control studies | B
IV                | Case series          | C
V                 | Expert opinion       | D
Oxford Centre of Evidence Based Medicine; http://www.cebm.net
Where GRADE fits in
Prioritize problems, establish panel
Systematic review
Searches, selection of studies, data collection and analysis
Prepare evidence profile:
Quality of evidence for each outcome and summary of findings
Assess overall quality of evidence
Decide direction and strength of recommendation
Draft guideline
Consult with stakeholders and / or external peer reviewer
Disseminate guideline
Implement the guideline and evaluate
GRADE
Assess the relative importance of outcomes
GRADE uptake
GRADE –
WHY REVISITING GUIDELINE METHODOLOGY?
Disclosure
Dr. Schünemann receives no personal payments for
service from the pharmaceutical industry. The research
group he belongs to received research grants from the
industry that are deposited into research accounts.
Institutions or organizations that he is affiliated with likely
receive funding from for-profit sponsors that are
supporting infrastructure and research that may serve his
work.
He is documents editor for the American Thoracic Society
and co-chair of the GRADE Working Group.
Content
• Why grading
• Confidence in information and recommendations
Intro to:
• Quality of evidence
• Strength of recommendations
Please discuss the difference between consensus statements and guidelines.
Be prepared to discuss your answer
There are no RCTs!
• Do you think that users of recommendations would like to be informed about the basis (explanation) for a recommendation or coverage decision if they were asked (by their patients)?
• I suspect the answer is “yes”
• If we need to provide the basis for recommendations, we need to say whether the evidence is good or not so good; in other words perhaps “no RCTs”
Hierarchy of evidence
STUDY DESIGN

Randomized Controlled
Trials

Cohort Studies and Case
Control Studies

Case Reports and Case
Series, Non-systematic
observations
BIAS
Expert Opinion
Confidence in evidence
• There always is evidence
  • “When there is a question there is evidence”
• Better research → greater confidence in the evidence and decisions
Who can explain the following?
• Concealment of randomization
• Bias, confounding and effect modification
• Blinding (who is blinded in a double blinded trial?)
• Intention to treat analysis and its correct application
• Why trials stopped early for benefit overestimate treatment effects?
• P-values and confidence intervals
Hierarchy of evidence

Randomized Controlled
Trials

Cohort Studies and Case
Control Studies

Case Reports and Case
Series, Non-systematic
observations
Expert Opinion
BIAS
Expert Opinion
Expert Opinion
STUDY DESIGN
Reasons for grading evidence?
Appraisal of evidence has become complex and daunting
• People draw conclusions about the
  • quality of evidence and strength of recommendations
• Systematic and explicit approaches can help
  • protect against errors, resolve disagreements
  • communicate information and fulfil needs
  • change practitioner behavior
• However, wide variation in approaches
GRADE working group. BMJ. 2004 & 2008
Which grading system?
Recommendation for use of oral anticoagulation in patients with atrial fibrillation and rheumatic mitral valve disease
Organization | Evidence | Recommendation
AHA          | B        | Class I
ACCP         | A        | 1
SIGN         | IV       | C
What to do?
Recommendations vs
statements!
Limitations of older systems & approaches
• confuse quality of evidence with strength of recommendations
Levels of evidence
Recommendations
Limitations of older systems & approaches
• confuse quality of evidence with strength of recommendations
• lack well-articulated conceptual framework
• criteria not comprehensive or transparent
• focus on single outcomes
GRADE Quality of Evidence
In the context of a systematic review
• The quality of evidence reflects the extent to which we are confident that an estimate of effect is correct.
In the context of making recommendations
• The quality of evidence reflects the extent to which our confidence in an estimate of the effect is adequate to support a particular recommendation.
What makes you confident in health care decisions?
Confident in the evidence?
A meta-analysis of observational studies showed that bicycle helmets reduce the risk of head injuries in cyclists.
OR: 0.31, 95% CI: 0.26 to 0.37
A meta-analysis of observational studies showed that warfarin prophylaxis reduces the risk of thromboembolism in patients with cardiac valve replacement.
RR: 0.17, 95% CI: 0.13 to 0.24
GRADE: Quality of evidence
The extent to which our confidence in an estimate of the treatment effect is adequate to support a particular recommendation.
GRADE defines 4 categories of quality:
• High
• Moderate
• Low
• Very low
Quality of evidence across
studies
Outcome #1
Outcome #2
Outcome #3
Quality: High
Quality: Moderate
Quality: Low
III
V
II
IB
Determinants of quality
What is the study design?
• RCTs start high
• Observational studies start low
Determinants of quality
What lowers quality of evidence? 5 factors:
• Methodological limitations
• Inconsistency of results
• Indirectness of evidence
• Imprecision of results
• Publication bias
Methodological limitations
Assessment of detailed design and execution (risk of bias)
For RCTs:
• Lack of allocation concealment
• No true intention to treat principle
• Inadequate blinding
• Loss to follow-up
• Early stopping for benefit
Allocation concealment
250 RCTs out of 33 meta-analyses
Allocation concealment | Effect (ratio of OR)
adequate               | 1.00 (Ref.)
unclear                | 0.67 [0.60 – 0.75] *
not adequate           | 0.59 [0.48 – 0.73] *
* significant
Schulz KF et al. JAMA 1995
5 vs 4 chemo-Rx cycles for AML
Studies stopped early because of benefit
What about scoring tools?
Example: Jadad score
Item                                     | Points
Was the study described as randomized?   | 1
Adequate description of randomization?   | 1
Double blind?                            | 1
Method of double blinding described?     | 1
Description of withdrawals and dropouts? | 1
Max 5 points for quality
Jadad AR et al. Control Clin Trials 1996
Cochrane Risk of bias graph in RevMan 5
Inconsistency of results
• Look for explanation for inconsistency
  • patients, intervention, comparator, outcome, methods
• Judgment
  • variation in size of effect
  • overlap in confidence intervals
  • statistical significance of heterogeneity
  • I²
Heterogeneity
Neurological or vascular complications or death within 30 days of endovascular treatment
(stent, balloon angioplasty) vs. surgical carotid endarterectomy (CEA)
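The I² statistic listed among the heterogeneity judgments can be sketched in a few lines. This is a minimal illustration assuming inverse-variance weighting and Cochran's Q; the function name `i_squared` and the input values are hypothetical and are not taken from the carotid trials shown on the slide.

```python
def i_squared(effects, variances):
    """Return I^2 (in percent) for study effects with known variances.

    effects   -- per-study effect estimates on an additive scale (e.g. log odds ratios)
    variances -- per-study variances of those estimates
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    if q <= df:
        return 0.0  # no heterogeneity beyond what chance would explain
    return (q - df) / q * 100.0


# Hypothetical log odds ratios and variances, for illustration only
log_ors = [0.0, 1.0]
variances = [0.25, 0.25]
print(round(i_squared(log_ors, variances), 1))  # 50.0
```

I² is only one input to the judgment about inconsistency; the slide lists it alongside the size of the variation in effect and the overlap of confidence intervals.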
Indirectness of evidence
• Indirect comparisons
  • Interested in head-to-head comparison
  • Drug A versus drug B
  • Tenofovir versus entecavir in hepatitis B treatment
• Differences in
  • patients (early cirrhosis vs end-stage cirrhosis)
  • interventions (CRC screening: flex. sig. vs colonoscopy)
  • comparator (e.g., differences in dose)
  • outcomes (non-steroidal safety: ulcer on endoscopy vs symptomatic ulcer complications)
Imprecision of results
Small sample size
• small number of events
• wide confidence intervals
• uncertainty about magnitude of effect
Imprecision
Any stroke (or death) within 30 days of endovascular treatment (stent, balloon angioplasty)
vs. surgical carotid endarterectomy (CEA)
Publication bias
• Reporting of studies
  • publication bias
  • number of small studies
All phase II and III licensing trials for antidepressant drugs between 1987 and 2004 (74 trials, of which 23 were not published)
Quality assessment criteria
Study design and starting quality: Randomized trial = High; Observational study = Low
Quality of evidence: High, Moderate, Low, Very low
Lower if…: Study limitations (design and execution), Inconsistency, Indirectness, Imprecision, Publication bias
Higher if…: What can raise the quality of evidence?
Quality assessment criteria
Study design and starting quality: Randomized trial = High; Observational study = Low
Quality of evidence: High, Moderate, Low, Very low
Lower if…: Study limitations, Inconsistency, Indirectness, Imprecision, Publication bias
Higher if…: Large effect (e.g., RR 0.5); Very large effect (e.g., RR 0.2); Evidence of dose-response gradient; All plausible confounding would reduce a demonstrated effect
Conceptualizing quality
High: Further research is very unlikely to change our confidence in the estimate of effect
Moderate: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate
Low: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate
Very low: Any estimate of effect is very uncertain
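The quality assessment criteria can be read as a simple bookkeeping rule: start from the study design, move down for each serious limitation, move up for factors such as a large effect. The sketch below is a rough illustration of that bookkeeping only. GRADE ratings are judgments rather than arithmetic, and the function name `rate_quality` and its parameters are hypothetical.

```python
LEVELS = ["very low", "low", "moderate", "high"]


def rate_quality(design, downgrades=0, upgrades=0):
    """Return a GRADE-style quality level.

    design     -- "rct" (starts high) or "observational" (starts low)
    downgrades -- total levels subtracted (limitations, inconsistency,
                  indirectness, imprecision, publication bias)
    upgrades   -- total levels added (large effect, dose-response,
                  plausible confounding that would reduce the effect)
    """
    if design == "rct":
        start = 3
    elif design == "observational":
        start = 1
    else:
        raise ValueError("design must be 'rct' or 'observational'")
    score = start - downgrades + upgrades
    return LEVELS[max(0, min(len(LEVELS) - 1, score))]  # clamp to the 4 categories


# Randomised trial evidence, downgraded by 2 levels for one factor and 1 for another
print(rate_quality("rct", downgrades=3))               # very low
# Observational evidence with a very large effect (assumed here to add 2 levels)
print(rate_quality("observational", upgrades=2))       # high
```

The clamp reflects that there are only four categories; a panel would still need to record the reasons for each step in an evidence profile.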
GRADE evidence profile
• PICO question; each outcome rated as critical, important, or not important
• Grade down or up
• Overall quality of evidence
Formulate recommendations:
• For or against (direction)
• Strong or weak (strength)
By considering:
• Quality of evidence
• Balance benefits/harms
• Values and preferences
Revise if necessary by considering:
• Resource use (cost)
GRADE - FROM EVIDENCE TO DECISIONS
Strength of recommendations
Desirable effects
• health benefits
• less burden
• savings
Undesirable effects
• harms
• more burden
• costs
Developing recommendations
Strength of recommendation
• “The strength of a recommendation reflects the extent to which we can, across the range of patients for whom the recommendations are intended, be confident that desirable effects of a management strategy outweigh undesirable effects.”
• Strong or weak/conditional
Quality of evidence & strength of recommendation
• GRADE separates quality of evidence from strength of recommendation
  • Linked but no automatism
• Other factors beyond the quality of evidence influence our confidence that adherence to a recommendation causes more benefit than harm
What makes Guidelines Evidence-Based in 2009?
Standardized Reporting of Clinical Practice Guidelines: A Proposal from the Conference on Guideline Standardization
Checklist for reporting: 18 items
14. Recommendations and rationale - state the recommended action precisely. Indicate the quality of evidence and the recommendation strength.
16. Patient preferences - describe the role of patient preferences when a recommendation involves a substantial element of personal choice or values.
Ann Intern Med. 2003
A COPD guideline – do you want your review used
like this?
Another COPD guideline
And another COPD guideline
What to do?
Current state of recommendations
• Reviewed 7527 recommendations
  • 1275 randomly selected
• Inconsistency across/within
• 31.6% did not state recommendations clearly
  • Most of them not written as executable actions
• 52.7% did not indicate strength
Yale Guideline Corpus
1. Identify the critical recommendations in guideline text using semantic indicators
2. Use consistent semantic and formatting indicators throughout the publication
3. Group recommendations together in a summary section
4. Do not use assertions of fact as recommendations.
5. Clearly and consistently assign evidence quality and recommendation strength in proximity
   • distinguish between the distinct concepts of quality of evidence and strength of recommendation.
Challenges in wording recommendations
• Need to express (two) levels
• Need to express direction
• Differences across languages
• Need codes (letters, symbols, numbers)
          | Strong recommendation for | Weak recommendation for     | Weak recommendation against    | Strong recommendation against
Wording 1 | We recommend…             | We suggest…                 | We suggest…not                 | We recommend…not
Wording 2 | Clinicians should…        | Clinicians might…           | Clinicians might not…          | Clinicians should not…
Wording 3 | We recommend…             | We conditionally recommend… | We conditionally recommend…not | We recommend…not
Categories of recommendations
Although the degree of confidence is a continuum, we suggest using two categories: strong and weak/conditional.
• Strong recommendation (“Recommend”): the panel is confident that the desirable effects of adherence to a recommendation outweigh the undesirable effects.
• Weak recommendation (“Suggest”): the panel concludes that the desirable effects of adherence to a recommendation probably outweigh the undesirable effects, but is not confident.
Implications of a strong recommendation
• Patients: Most people in your situation would want the recommended course of action and only a small proportion would not
• Clinicians: Most patients should receive the recommended course of action
• Policy makers: The recommendation can be adopted as a policy in most situations
Implications of a weak/conditional recommendation
• Patients: The majority of people in your situation would want the recommended course of action, but many would not
• Clinicians: Be prepared to help patients to make a decision that is consistent with their own values
• Policy makers: There is a need for substantial debate and involvement of stakeholders
Case scenario
A 13-year-old girl who lives in rural Indonesia presented with flu symptoms and developed severe respiratory distress over the course of the last 2 days. She required intubation. The history reveals that she shares her living quarters with her parents and her three siblings. At night the family’s chickens share this room too, and several chickens had died unexpectedly a few days before the girl fell sick.
Interventions: antivirals, such as the neuraminidase inhibitors oseltamivir and zanamivir
Relevant healthcare question?
Clinical question:
Population: Avian Flu/influenza A (H5N1) patients
Intervention: Oseltamivir (or Zanamivir)
Comparison: No pharmacological intervention
Outcomes: Mortality, hospitalizations, resource use, adverse outcomes, antimicrobial resistance
WHO Avian Influenza GL. Schunemann et al., The Lancet ID, 2007
How would you make decisions?
Judgements about the strength of a recommendation
• No precise threshold for going from a strong to a weak recommendation
• The presence of important concerns about one or more of these factors makes a weak recommendation more likely.
• Panels should consider all of these factors and make the reasons for their judgements explicit.
• Recommendations should specify the perspective that is taken (e.g. individual patient, health system) and which outcomes were considered (including which costs, if any).
Evidence Profile
Oseltamivir for treatment of H5N1 infection
Columns: quality assessment (no. of studies (ref), design, limitations, consistency, directness, other considerations); summary of findings (no. of patients: oseltamivir, placebo; effect: relative (95% CI), absolute (95% CI)); quality; importance (row values not recoverable)
Healthy adults:
Mortality: 0 studies
Hospitalisation (hospitalisations from influenza, influenza cases only): 5 studies (TJ 06); randomised trial; no limitations; one trial only; directness: major uncertainty (-2); imprecise or sparse data (-1); OR 0.22 (0.02 to 2.16); quality: very low
Duration of hospitalization: 0 studies
LRTI (pneumonia, influenza cases only): 5 studies (TJ 06); randomised trial; no limitations; one trial only; directness: major uncertainty (-2); imprecise or sparse data (-1); 2/982 (0.2%) with oseltamivir vs 9/662 (1.4%) with placebo; RR 0.149 (0.03 to 0.69); quality: very low
Duration of disease (time to alleviation of symptoms/median time to resolution of symptoms, influenza cases only): 5 studies (TJ 06) (DT 03); randomised trials; no limitations; important inconsistency (-1); directness: major uncertainty (-2); HR 1.30 (1.13 to 1.50); quality: very low
Viral shedding (mean nasal titre of excreted virus at 24h): 2 studies (TJ 06); randomised trials; no limitations; directness: major uncertainty (-2); other considerations: none; WMD -0.73 (-0.99 to -0.47); quality: low
Outbreak control: 0 studies
Resistance: 0 studies
Serious adverse effects (mention of significant or serious adverse effects): 0 studies
Minor adverse effects (number and seriousness of adverse effects): 3 studies (TJ 06); randomised trials; no limitations; directness: some uncertainty (-1); imprecise or sparse data (-1); OR range 0.56 to 1.80; quality: low
Cost of drugs: 0 studies
Oseltamivir for Girl with Avian Flu
Summary of findings:
• No clinical trial of oseltamivir for treatment of H5N1 patients.
• 4 systematic reviews and health technology assessments (HTA) reporting on 5 studies of oseltamivir in seasonal influenza.
  • Hospitalization: OR 0.22 (0.02 – 2.16)
  • Pneumonia: OR 0.15 (0.03 – 0.69)
• 3 published case series.
• Many in vitro and animal studies.
• No alternative that is more promising at present.
• Cost: ~ $45 per treatment course
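The pneumonia estimate can be checked from the event counts given in the evidence profile (2/982 with oseltamivir versus 9/662 with placebo). The sketch below assumes the standard log-scale (Wald) confidence interval for a relative risk; the slides do not state which method the reviewers used, and the function name `relative_risk` is hypothetical.

```python
import math


def relative_risk(events_t, n_t, events_c, n_c, z=1.96):
    """Relative risk with an approximate 95% CI computed on the log scale."""
    rr = (events_t / n_t) / (events_c / n_c)
    # Standard error of log(RR) for two independent binomial samples
    se_log = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


rr, lower, upper = relative_risk(2, 982, 9, 662)
print(f"RR {rr:.2f} ({lower:.2f} to {upper:.2f})")  # RR 0.15 (0.03 to 0.69)
```

With only 11 events in total, the interval spans more than a 20-fold range, which is the kind of imprecision that led to downgrading for sparse data.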
What are the factors that
determine your decisions?
GRADE: Factors influencing decisions and recommendations
• Quality of Evidence
• Balance of desirable and undesirable consequences
• Values and preferences
• Cost
Determinants of the strength of recommendation
Factors that can strengthen a recommendation, with comment:
Quality of the evidence: The higher the quality of evidence, the more likely is a strong recommendation.
Balance between desirable and undesirable effects: The larger the difference between the desirable and undesirable consequences, the more likely a strong recommendation is warranted. The smaller the net benefit and the lower the certainty for that benefit, the more likely a weak recommendation is warranted.
Values and preferences: The greater the variability in values and preferences, or uncertainty in values and preferences, the more likely a weak recommendation is warranted.
Costs (resource allocation): The higher the costs of an intervention (that is, the more resources consumed), the less likely a strong recommendation is warranted.
Determinants of the strength of recommendation
Table. Decisions about the strength of a recommendation
Factors that can weaken the strength of a recommendation. Example: | Decision   | Explanation
Lower quality evidence                                              | □ Yes □ No |
Uncertainty about the balance of benefits versus harms and burdens  | □ Yes □ No |
Uncertainty or differences in values                                | □ Yes □ No |
Uncertainty about whether the net benefits are worth the costs      | □ Yes □ No |
Frequent “yes” answers will increase the likelihood of a weak recommendation
Oseltamivir – Avian Influenza
Factors that can weaken the strength of a recommendation. Example: treatment of H5N1 patients with oseltamivir
Lower quality evidence
  Decision: ☑ Yes □ No
  Explanation: The quality of evidence is very low
Uncertainty about the balance of benefits versus harms and burdens
  Decision: ☑ Yes □ No
  Explanation: The benefits are uncertain because several important or critical outcomes were not measured. However, the potential benefit is very large despite potentially small relative risk reductions.
Uncertainty or differences in values
  Decision: □ Yes ☑ No
  Explanation: All patients and care providers would accept treatment for H5N1 disease
Uncertainty about whether the net benefits are worth the costs
  Decision: □ Yes ☑ No
  Explanation: For treatment of sporadic patients the price is not high ($45).
Frequent “yes” answers will increase the likelihood of a weak recommendation
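The checklist for the oseltamivir example can be captured as plain data, which makes the panel's reasoning easy to report alongside the recommendation. The sketch below only tallies the "yes" answers; it does not compute a strength, because the slides state there is no precise threshold between strong and weak. The class and function names (`Factor`, `concerns`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Factor:
    question: str      # factor that can weaken a recommendation
    answer_yes: bool   # the panel's decision
    explanation: str   # the panel's stated reason


# Answers as given in the oseltamivir / H5N1 example
oseltamivir_h5n1 = [
    Factor("Lower quality evidence", True,
           "The quality of evidence is very low"),
    Factor("Uncertainty about the balance of benefits versus harms and burdens", True,
           "Several important or critical outcomes were not measured"),
    Factor("Uncertainty or differences in values", False,
           "All patients and care providers would accept treatment"),
    Factor("Uncertainty about whether the net benefits are worth the costs", False,
           "For sporadic patients the price is not high ($45)"),
]


def concerns(factors):
    """Return the factors answered 'yes', i.e. those pushing toward a weak recommendation."""
    return [f.question for f in factors if f.answer_yes]


for question in concerns(oseltamivir_h5n1):
    print("Concern:", question)
```

Two of four factors are answered "yes" here, so a tally alone would not settle the strength; the panel's explanations carry the decision.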
Example: Oseltamivir for Avian Flu
Recommendation: In patients with confirmed or strongly suspected infection with avian influenza A (H5N1) virus, clinicians should administer oseltamivir treatment as soon as possible (????? recommendation, very low quality evidence).
Schunemann et al. The Lancet ID, 2007
Are values important?
Should resources be considered?
Example: Oseltamivir for Avian Flu
Recommendation: In patients with confirmed or strongly suspected infection with avian influenza A (H5N1) virus, clinicians should administer oseltamivir treatment as soon as possible (strong recommendation, very low quality evidence).
Values and Preferences
Remarks: This recommendation places a high value on the prevention of death in an illness with a high case fatality. It places relatively low values on adverse reactions, the development of resistance and costs of treatment.
Schunemann et al. The Lancet ID, 2007
Other explanations
Remarks: Despite the lack of controlled treatment data for H5N1, this is a strong recommendation, in part, because there is a lack of known effective alternative pharmacological interventions at this time.
The panel voted on whether this recommendation should be strong or weak and there was one abstention and one dissenting vote.
Guideline development
PICO question
Outcomes: Critical, Critical, Important, Not important
Systematic review
Summary of findings & estimate of effect for each outcome
Quality of evidence for each outcome: High, Moderate, Low, Very low
• RCT start high, obs. data start low
• Grade down: 1. Risk of bias; 2. Inconsistency; 3. Indirectness; 4. Imprecision; 5. Publication bias
• Grade up: 1. Large effect; 2. Dose response; 3. Confounders
Rate overall quality of evidence across outcomes based on lowest quality of critical outcomes
Formulate recommendations:
• For or against (direction)
• Strong or weak (strength)
By considering:
• Quality of evidence
• Balance benefits/harms
• Values and preferences
Revise if necessary by considering:
• Resource use (cost)
Wording:
• “We recommend using…”
• “We suggest using…”
• “We recommend against using…”
• “We suggest against using…”
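The rule for the overall rating (lowest quality among the critical outcomes) is simple enough to sketch directly. This is an illustration under assumptions: the function name `overall_quality`, the tuple layout, and the example outcomes are hypothetical and not taken from a real evidence profile.

```python
ORDER = {"very low": 0, "low": 1, "moderate": 2, "high": 3}


def overall_quality(outcomes):
    """Return the lowest quality rating among outcomes rated critical.

    outcomes -- list of (name, quality, importance) tuples, where importance
                is "critical", "important" or "not important"
    """
    critical = [quality for _, quality, importance in outcomes
                if importance == "critical"]
    if not critical:
        raise ValueError("at least one critical outcome is required")
    return min(critical, key=ORDER.__getitem__)


# Hypothetical outcomes, for illustration only
example = [
    ("Outcome 1", "high", "critical"),
    ("Outcome 2", "low", "critical"),
    ("Outcome 3", "very low", "important"),
    ("Outcome 4", "moderate", "not important"),
]
print(overall_quality(example))  # low
```

Outcome 3 is rated very low but is not critical, so it does not pull the overall rating down.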