Transcript Slide 1

THE GRADE SYSTEM
Consulting on evidence-based recommendations in public health
The European Centre for Disease Prevention and Control
Conference and workshop
23 and 24 Apr 2009
Yngve Falck-Ytter, M.D.
Jan Brozek, M.D.
Disclosure
In the past 5 years, Dr. Falck-Ytter received no personal
payments for services from industry. His research group
received research grants from Valeant and Roche that were
deposited into non-profit research accounts. He is a
member of the GRADE working group which has received
funding from various governmental entities in the US and
Europe. Some of the GRADE work he has done is supported
in part by grant # 1 R13 HS016880-01 from the Agency for
Healthcare Research and Quality (AHRQ).
Content
- Background and rationale for revisiting guideline methodology
- GRADE approach
- Quality of evidence
- Strength of recommendations
Confidence in evidence
- There always is evidence
- “When there is a question there is evidence”
- Evidence alone is never sufficient to make a clinical decision
- Better research → greater confidence in the evidence and decisions
Hierarchy of evidence (pyramid: bias increases as the study design moves down the hierarchy)
- Randomized controlled trials
- Cohort studies and case-control studies
- Case reports and case series, non-systematic observations
- Expert opinion
Reasons for grading evidence?
- People draw conclusions about the quality of evidence and strength of recommendations
- Systematic and explicit approaches can help to
  - protect against errors, resolve disagreements
  - communicate information and fulfill needs
  - be transparent about the process
  - change practitioner behavior
- However, wide variation in approaches
GRADE working group. BMJ. 2004 & 2008
What would you do?
- How would you rate an RCT that has only a few patients in each arm, did not use allocation concealment, and had additional flaws?
- What should you do when RCTs show conflicting evidence?
- You have several well-done observational studies which show consistent and large effects – how would you rate them?
Which grading system?
Recommendation for the use of oral anticoagulation in patients with atrial fibrillation and rheumatic mitral valve disease:

Organization   Evidence   Recommendation
AHA            B          Class I
ACCP           A          1
SIGN           IV         C
Grading used in GI CPGs

AASLD:
  I     RCTs
  II-1  Controlled trials (no randomization)
  II-2  Cohort or case-control analytical studies
  II-3  Multiple time series, dramatic uncontrolled experiments
  III   Opinion of respected authorities, descriptive epidemiology

AGA:
  I     Systematic review of RCTs
  II    One or more properly designed RCTs, adequate n, clinical setting
  III   Published, well-designed trials: pre-post, cohort, time series, case-control studies
  IV    Non-experimental studies from more than one center/group, opinion of respected authorities, clinical evidence, descriptive studies, expert consensus committees

ACG:
  I     RCTs, well designed, n sufficient for statistical power
  II    One large well-designed clinical trial (with or without randomization), cohort or case-control studies, or well-designed meta-analysis
  III   Clinical experience, descriptive studies, expert committee reports
  IV    Not rated

ASGE:
  A     Prospective controlled trials
  B     Observational studies
  C     Expert opinion
Committee of Ministers of the Council of Europe. Oct 2001.
What to do?
Limitations of existing systems
- Confuse quality of evidence with strength of recommendations
- Lack well-articulated conceptual framework
- Criteria not comprehensive or transparent
- GRADE unique:
  - breadth, intensity of development process
  - wide endorsement and use
  - conceptual framework
  - comprehensive, transparent criteria
- Focus on all important outcomes related to a specific question and overall quality
GRADE Working Group
David Atkins, chief medical officera
Dana Best, assistant professorb
Martin Eccles, professord
Francoise Cluzeau, lecturerx
Yngve Falck-Ytter, associate directore
Signe Flottorp, researcherf
Gordon H Guyatt, professorg
Robin T Harbour, quality and information director h
Margaret C Haugh, methodologisti
David Henry, professorj
Suzanne Hill, senior lecturerj
Roman Jaeschke, clinical professork
Regina Kunz, associate professor
Gillian Leng, guidelines programme directorl
Alessandro Liberati, professorm
Nicola Magrini, directorn
James Mason, professord
Philippa Middleton, honorary research fellowo
Jacek Mrukowicz, executive directorp
Dianne O’Connell, senior epidemiologistq
Andrew D Oxman, directorf
Bob Phillips, associate fellowr
Holger J Schünemann, professorg,s
Tessa Tan-Torres Edejer, medical officert
David Tovey, editory
Jane Thomas, Lecturer, UK
Helena Varonen, associate editoru
Gunn E Vist, researcherf
John W Williams Jr, professorv
Stephanie Zaza, project directorw
a) Agency for Healthcare Research and Quality, USA
b) Children's National Medical Center, USA
c) Centers for Disease Control and Prevention, USA
d) University of Newcastle upon Tyne, UK
e) German Cochrane Centre, Germany
f) Norwegian Centre for Health Services, Norway
g) McMaster University, Canada
h) Scottish Intercollegiate Guidelines Network, UK
i) Fédération Nationale des Centres de Lutte Contre le Cancer, France
j) University of Newcastle, Australia
k) McMaster University, Canada
l) National Institute for Clinical Excellence, UK
m) Università di Modena e Reggio Emilia, Italy
n) Centro per la Valutazione della Efficacia della Assistenza Sanitaria, Italy
o) Australasian Cochrane Centre, Australia
p) Polish Institute for Evidence Based Medicine, Poland
q) The Cancer Council, Australia
r) Centre for Evidence-based Medicine, UK
s) National Cancer Institute, Italy
t) World Health Organisation, Switzerland
u) Finnish Medical Society Duodecim, Finland
v) Duke University Medical Center, USA
w) Centers for Disease Control and Prevention, USA
x) University of London, UK
y) BMJ Clinical Evidence, UK
GRADE uptake
Where GRADE fits in
1. Prioritize problems, establish panel
2. Systematic review: searches, selection of studies, data collection and analysis
3. Assess the relative importance of outcomes
4. Prepare evidence profile: quality of evidence for each outcome and summary of findings
5. Assess overall quality of evidence
6. Decide direction and strength of recommendation
7. Draft guideline
8. Consult with stakeholders and/or external peer reviewers
9. Disseminate guideline
10. Implement the guideline and evaluate
GRADE covers assessing the relative importance of outcomes through deciding the direction and strength of the recommendation (steps 3–6).
GRADE: Quality of evidence
The extent to which our confidence in an estimate of the treatment effect is adequate to support a particular recommendation.
Although the degree of confidence is a continuum, we suggest using four categories:
- High
- Moderate
- Low
- Very low
Quality of evidence across studies (rated separately for each outcome)
- Outcome #1: Quality High
- Outcome #2: Quality Moderate
- Outcome #3: Quality Low
Determinants of quality
- RCTs start high
- Observational studies start low
- What lowers quality of evidence? 5 factors:
  - Limitations in detailed design and execution (risk of bias)
  - Inconsistency of results
  - Indirectness of evidence
  - Imprecision
  - Publication bias
What is the study design?
Types of studies
Did the investigator assign the exposure?
- Yes → experimental study
  - Random allocation? Yes → RCT; No → CCT
- No → observational study
  - Comparison group? No → case series
  - Comparison group? Yes → analytical study. Direction?
    - Exposure → outcome: cohort study
    - Outcome → exposure: case-control study
    - Exposure and outcome at the same time: cross-sectional study
- Related designs: before-and-after study; variations: controlled before-after study (cBAS), interrupted time series (ITS)
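The decision tree above can be expressed as a small classifier. The sketch below is a minimal illustration, assuming yes/no answers to the same questions used in the flowchart; the function name, parameters, and the "direction" labels are hypothetical, not part of the slide.

```python
def classify_study_design(investigator_assigned_exposure: bool,
                          random_allocation: bool = False,
                          comparison_group: bool = False,
                          direction: str = "exposure_to_outcome") -> str:
    """Illustrative sketch of the study-design decision tree (not an official GRADE tool).

    direction: "exposure_to_outcome" (cohort), "outcome_to_exposure" (case-control),
               or "same_time" (cross-sectional).
    """
    if investigator_assigned_exposure:
        # Experimental study: randomization separates RCTs from CCTs.
        return "RCT" if random_allocation else "CCT (non-randomized controlled trial)"
    # Observational study: a comparison group makes it analytical.
    if not comparison_group:
        return "Case series / descriptive study"
    if direction == "exposure_to_outcome":
        return "Cohort study"
    if direction == "outcome_to_exposure":
        return "Case-control study"
    return "Cross-sectional study"


# Example: the investigator assigned the exposure and allocated it at random.
print(classify_study_design(investigator_assigned_exposure=True, random_allocation=True))  # RCT
```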
1. Design and execution
Study limitations (risk of bias)
For RCTs:
- Lack of allocation concealment
- No true intention-to-treat principle
- Inadequate blinding
- Loss to follow-up
- Early stopping for benefit
For observational studies:
- Selection
- Comparability
- Exposure/outcome
Tools: scales and checklists
Example: Jadad score
- Was the study described as randomized? (1 point)
- Adequate description of randomization? (1 point)
- Double blind? (1 point)
- Method of double blinding described? (1 point)
- Description of withdrawals and dropouts? (1 point)
Maximum 5 points for quality
Jadad AR et al. Control Clin Trials 1996
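To make the scoring concrete, here is a minimal sketch of the five one-point items listed above (maximum 5 points). The deduction items of the full Jadad scale are omitted, and the function name and parameters are illustrative.

```python
def jadad_score(described_as_randomized: bool,
                randomization_method_adequate: bool,
                double_blind: bool,
                blinding_method_described: bool,
                withdrawals_described: bool) -> int:
    """Sum the five one-point Jadad items shown on the slide (maximum 5)."""
    items = [described_as_randomized, randomization_method_adequate,
             double_blind, blinding_method_described, withdrawals_described]
    return sum(items)


# Example: randomized and double-blind, but methods not described and dropouts not reported.
print(jadad_score(True, False, True, False, False))  # 2
```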
Cochrane Risk of Bias graph in RevMan 5
2. Consistency of results
- Look for explanations for inconsistency:
  - patients, intervention, comparator, outcome, methods
- Judgment:
  - variation in size of effect
  - overlap in confidence intervals
  - statistical significance of heterogeneity
  - I² (see the sketch below)
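As a concrete illustration of the I² judgment, the sketch below computes Cochran's Q with fixed-effect inverse-variance weights and then I² = max(0, (Q − df)/Q) × 100%. The example data are made up.

```python
def i_squared(effects, standard_errors):
    """Compute Cochran's Q and the I^2 statistic (%) for a set of study effect estimates.

    effects: per-study effect estimates (e.g., log relative risks)
    standard_errors: corresponding standard errors
    """
    weights = [1.0 / se ** 2 for se in standard_errors]               # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))  # Cochran's Q
    df = len(effects) - 1
    return max(0.0, (q - df) / q) * 100 if q > 0 else 0.0


# Example: three studies with very similar effects -> I^2 of 0% (little heterogeneity).
print(round(i_squared([-0.5, -0.4, -0.6], [0.2, 0.25, 0.3]), 1))
```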
Heterogeneity
Pagliaro L et al. Ann Intern Med 1992;117:59-70
3. Directness of evidence
- Indirect comparisons
  - Interested in head-to-head comparison: drug A versus drug B
  - Example: tenofovir versus entecavir in hepatitis B treatment
- Differences in
  - population (HPV vaccination in 9 to 26 y/o vs. 27 to 45 y/o; differences in baseline risk / burden of disease)
  - interventions (e.g., different route of vaccine administration)
  - comparator (differences in formulations/dose: risk vs. benefit)
  - outcomes (use of surrogates [e.g., immunogenicity]; long-term vs. short-term; etc.)
4. Imprecision
- Small sample size
  - small number of events
  - wide confidence intervals
  - uncertainty about magnitude of effect
Imprecision (figure): relative risk scale with thresholds for appreciable benefit (RR 0.75) and appreciable harm (RR 1.25); a precise confidence interval stays within a single region, an imprecise one crosses the thresholds.
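One way to illustrate the imprecision judgment is to check whether the 95% confidence interval for the relative risk crosses the thresholds of appreciable benefit (RR 0.75) and appreciable harm (RR 1.25) shown in the figure. The sketch below is a simplified illustration, not a fixed GRADE rule; the function name and the thresholds-as-defaults are assumptions.

```python
def is_imprecise(rr_ci_low: float, rr_ci_high: float,
                 benefit_threshold: float = 0.75,
                 harm_threshold: float = 1.25) -> bool:
    """Flag imprecision if the 95% CI for the relative risk crosses either threshold.

    A 'precise' estimate sits entirely on one side of the thresholds
    (clear appreciable benefit, clear appreciable harm, or clearly neither).
    """
    crosses_benefit = rr_ci_low < benefit_threshold < rr_ci_high
    crosses_harm = rr_ci_low < harm_threshold < rr_ci_high
    return crosses_benefit or crosses_harm


print(is_imprecise(0.60, 0.72))  # False: CI lies entirely in the appreciable-benefit region
print(is_imprecise(0.55, 1.40))  # True: CI spans both thresholds
```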
5. Reporting bias (publication bias)
- Reporting of studies
  - publication bias
  - number of small studies
- Reporting of outcomes

English/non-English language bias
Egger et al. Lancet 1997;350:326-29
Quality assessment criteria

Quality of evidence: High (4), Moderate (3), Low (2), Very low (1)
Study design: randomized trial (starts high), observational study (starts low)
Lower if…: study limitations (design and execution), inconsistency, indirectness, imprecision, publication bias
Higher if…: what can raise the quality of evidence?
BMJ 2003;327:1459–61
Quality assessment criteria

Quality of evidence: High (4), Moderate (3), Low (2), Very low (1)
Study design: randomized trial (starts high), observational study (starts low)
Lower if…: study limitations, inconsistency, indirectness, imprecision, publication bias
Higher if…: large effect (e.g., RR 0.5), very large effect (e.g., RR 0.2), evidence of a dose-response gradient, all plausible confounding would reduce a demonstrated effect
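Read together, the two criteria tables suggest simple arithmetic: start at 4 for randomized trials or 2 for observational studies, subtract a level for each "lower if" factor, add for "higher if" factors, and clamp to the 1–4 scale. The sketch below is an illustrative simplification (GRADE allows moving one or two levels per factor and always involves judgment); the function and argument names are hypothetical.

```python
LABELS = {4: "High", 3: "Moderate", 2: "Low", 1: "Very low"}

def rate_quality(randomized: bool, downgrades: int = 0, upgrades: int = 0) -> str:
    """Illustrative GRADE-style arithmetic for a single outcome.

    downgrades: levels subtracted for study limitations, inconsistency,
                indirectness, imprecision, and publication bias.
    upgrades:   levels added for a large effect, a dose-response gradient,
                or plausible confounding that would reduce a demonstrated effect.
    """
    start = 4 if randomized else 2          # RCTs start high, observational studies start low
    level = max(1, min(4, start - downgrades + upgrades))
    return LABELS[level]


print(rate_quality(randomized=True, downgrades=2))   # Low
print(rate_quality(randomized=False, upgrades=1))    # Moderate (e.g., large effect)
```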
Categories of quality
- High: Further research is very unlikely to change our confidence in the estimate of effect
- Moderate: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate
- Low: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate
- Very low: Any estimate of effect is very uncertain
Judgments about the overall quality of evidence
- Most systems not explicit
- Options:
  - benefits
  - primary outcome
  - highest
  - lowest
- Beyond the scope of a systematic review
- GRADE: based on the lowest quality of all the critical outcomes
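A minimal sketch of the rule above: the overall quality is the lowest rating among the outcomes judged critical. The outcome names and ratings are invented for illustration.

```python
# Per-outcome quality on the 4..1 scale, plus which outcomes the panel rated critical.
outcome_quality = {"mortality": 4, "serious bleeding": 3, "minor bruising": 2}
critical_outcomes = {"mortality", "serious bleeding"}

# Overall quality = lowest rating among the critical outcomes (minor bruising is ignored).
overall = min(outcome_quality[o] for o in critical_outcomes)
print(overall)  # 3 -> Moderate
```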
GRADE evidence profile
Going from evidence to recommendations
- Deliberate separation of quality of evidence from strength of recommendation
- No automatic one-to-one connection as in other grading systems
- Example: What if there is high-quality evidence, but the benefits and risks are finely balanced?
Strength of recommendation
“The strength of a recommendation reflects the extent to which we can, across the range of patients for whom the recommendations are intended, be confident that desirable effects of a management strategy outweigh undesirable effects.”
Although the strength of a recommendation is a continuum, we suggest using two categories: “strong” and “weak”.
Desirable and undesirable effects
- Desirable effects
  - Mortality reduction
  - Improvement in quality of life, fewer hospitalizations/infections
  - Reduction in the burden of treatment
  - Reduced resource expenditure
- Undesirable effects
  - Deleterious impact on morbidity, mortality or quality of life, increased resource expenditure
4 determinants of the strength of recommendation

Factors that can weaken the strength of a recommendation (with explanation):
- Lower quality evidence: the higher the quality of evidence, the more likely a strong recommendation is warranted.
- Uncertainty about the balance of benefits versus harms and burdens: the larger the difference between the desirable and undesirable consequences, the more likely a strong recommendation is warranted; the smaller the net benefit and the lower the certainty for that benefit, the more likely a weak recommendation is warranted.
- Uncertainty or differences in values: the greater the variability or uncertainty in values and preferences, the more likely a weak recommendation is warranted.
- Uncertainty about whether the net benefits are worth the costs: the higher the costs of an intervention (that is, the more resources consumed), the less likely a strong recommendation is warranted.
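The four determinants can be illustrated with a toy decision rule: a recommendation is more likely to be strong when evidence quality is higher, the benefit-harm balance is clear, values and preferences vary little, and costs are acceptable. This is a hedged illustration of the table above, not the GRADE procedure (which rests on panel judgment); the function name, arguments, and the any-factor-weakens rule are assumptions.

```python
def recommendation_strength(quality: str,
                            clear_benefit_harm_balance: bool,
                            low_variability_in_values: bool,
                            acceptable_cost: bool) -> str:
    """Toy heuristic mirroring the four determinants listed on the slide."""
    weakening_factors = [
        quality in ("Low", "Very low"),        # lower quality evidence
        not clear_benefit_harm_balance,        # uncertain balance of benefits vs harms/burdens
        not low_variability_in_values,         # uncertainty or differences in values
        not acceptable_cost,                   # net benefits may not be worth the costs
    ]
    return "Weak" if any(weakening_factors) else "Strong"


print(recommendation_strength("High", True, True, True))    # Strong
print(recommendation_strength("High", False, True, True))   # Weak (finely balanced benefits/harms)
```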
Developing recommendations
Implications of a strong recommendation
- Patients: Most people in this situation would want the recommended course of action and only a small proportion would not
- Clinicians: Most patients should receive the recommended course of action
- Policy makers: The recommendation can be adapted as a policy in most situations
Implications of a weak recommendation
- Patients: The majority of people in this situation would want the recommended course of action, but many would not
- Clinicians: Be prepared to help patients make a decision that is consistent with their own values; decision aids and shared decision making can help
- Policy makers: There is a need for substantial debate and involvement of stakeholders
6 main misconceptions
1. Isn’t GRADE expensive to realize?
2. Isn’t GRADE more complicated, and doesn’t it take longer and require more resources?
3. Isn’t GRADE eliminating the expert?
4. But what about prevalence/burden of disease, diagnosis, cost?
5. But GRADE does not have an “insufficient evidence to make a recommendation” category (or an “optional” category), does it?
6. But we only “recommend” – we can’t possibly give weak recommendations!
Summary (figure):
- P I C O question with outcomes 1–5, each rated as critical, important, or not important
- Systematic review: summary of findings and estimate of effect for each outcome
- Quality rated per outcome as High, Moderate, Low, or Very low; RCTs start high, observational data start low
- Grade down for: risk of bias, inconsistency, indirectness, imprecision, publication bias
- Grade up for: 1. large effect, 2. dose response, 3. confounders
Guideline development (summary figure, continued):
- Rate the overall quality of evidence across outcomes based on the lowest quality of the critical outcomes
- Formulate recommendations:
  - For or against (direction)
  - Strong or weak (strength)
  by considering quality of evidence, balance of benefits/harms, and values and preferences; revise if necessary by considering resource use (cost)
- Wording:
  - “We recommend using…”
  - “We suggest using…”
  - “We recommend against using…”
  - “We suggest against using…”
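A small sketch mapping direction and strength onto the four standard wordings listed above; the lookup-table representation is illustrative.

```python
# Standard GRADE-style wording keyed by (direction, strength).
WORDING = {
    ("for", "strong"):     "We recommend using…",
    ("for", "weak"):       "We suggest using…",
    ("against", "strong"): "We recommend against using…",
    ("against", "weak"):   "We suggest against using…",
}

print(WORDING[("for", "weak")])  # We suggest using…
```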
Conclusions
1. GRADE is gaining acceptance as an international standard
2. GRADE has criteria for evidence assessment across questions (e.g., public health interventions) and outcomes
3. Criteria for moving from evidence to recommendations
4. Simple, transparent, systematic
5. Balance between simplicity and methodological rigor