Transcript Slide 1

CREATING A COMMON AND TRANSPARENT APPROACH TO GRADING THE QUALITY OF EVIDENCE IN SYSTEMATIC REVIEWS – THE GRADE APPROACH

Committee on Standards for Systematic Reviews of Clinical Effectiveness Research
Institute of Medicine of the National Academies
October 26, 2009

Yngve Falck-Ytter, M.D.
Assistant Professor of Medicine, Case Western Reserve University

Disclosure

In the past 5 years, Dr. Falck-Ytter received no personal payments for services from industry. His research group received research grants from Three Rivers, Valeant and Roche that were deposited into non-profit research accounts. He is a member of the GRADE working group which has received funding from various governmental entities in the US and Europe. Some of the GRADE work he has done is supported in part by grant # 1 R13 HS016880-01 from the Agency for Healthcare Research and Quality (AHRQ).

Content

• Rating evidence: challenges
• The GRADE approach in 5 min
  - Quality of evidence
  - Strength of recommendations
• The value of GRADE
  - Perspective: user of systematic reviews (SR)
• Strengths and limitations of GRADE
• GRADE and future developments

Before GRADE

Level of evidence    Source of evidence
Ia                   Systematic reviews (SR)
Ib                   RCTs
II                   Cohort studies
III                  Case-control studies
IV                   Case series
V                    Expert opinion

Grades of recommendation: A, B, C, D

Oxford Centre of Evidence Based Medicine; http://www.cebm.net

Level of evidence in GI CPGs

(mid 2009)

AASLD, AGA, ACG, ASGE

A  Multiple RCTs or meta-analysis
B  Single randomized trial, or non-randomized studies
C  Only consensus opinion of experts, case studies, or standard-of-care

Good  Well-designed, well-conducted studies […]
Fair  […] Consistent, limited by the number, quality or consistency of individual studies
Poor  … important flaws, gaps in chain of evidence …

1. Multiple published, well-controlled (?) randomized trials or a well designed systemic (?) meta analysis
2. One quality published (?) RCT, published well designed cohort/case-control studies
3. Consensus of authoritative (?) expert opinions based on clinical evidence or from well designed, but uncontrolled or non-rand. clin. trials

A. RCTs
B. RCT with important limitations
C. Observational studies
D. Expert opinion

Limitations of existing systems

• Confuse quality of evidence with strength of recommendations
• Lack well-articulated conceptual framework
• Criteria not comprehensive or transparent
• Most steps in the grading process were implicit
• Focus on primary benefit and not all important outcomes related to a specific question

SR are undervalued

End users of systematic reviews (e.g., health care providers) underutilize SRs because:
• SRs tend to be very long
• Perceived as complex
• Difficulty understanding the effect size
• Difficulty understanding imprecision
• Difficulties in assessing the confidence in the estimate of effect
• Difficulties in translating relative effects to absolute effects to be expected for their patients

Pagliaro et al 2009
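The last difficulty above, translating a relative effect into the absolute effect a particular patient group can expect, comes down to simple arithmetic. The sketch below is illustrative only: the function name, the baseline risks, and the relative risk of 0.5 are hypothetical and do not come from the talk.

```python
def absolute_effect(baseline_risk, relative_risk, per=1000):
    """Translate a relative risk (RR) into absolute numbers for one baseline risk.

    baseline_risk: risk of the event without the intervention (proportion, 0..1)
    relative_risk: RR reported by the systematic review
    per:           denominator used for reporting (events per `per` patients)
    """
    if not 0.0 <= baseline_risk <= 1.0:
        raise ValueError("baseline_risk must be a proportion between 0 and 1")
    if relative_risk < 0:
        raise ValueError("relative_risk must be non-negative")

    # Risk with the intervention, capped at 1 because it is a probability.
    risk_with = min(1.0, baseline_risk * relative_risk)
    return {
        "events_without": round(baseline_risk * per),
        "events_with": round(risk_with * per),
        "difference": round((risk_with - baseline_risk) * per),
    }


# Same relative effect, two hypothetical patient groups.
high_risk = absolute_effect(baseline_risk=0.20, relative_risk=0.5)
low_risk = absolute_effect(baseline_risk=0.02, relative_risk=0.5)
print(high_risk)
print(low_risk)
```

With the same RR of 0.5, the hypothetical high-risk group avoids 100 events per 1000 patients and the low-risk group only 10. This is why the baseline risk has to be reported alongside the relative effect.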

Reporting of SR: e.g., blinding (Item: #12, #19)
Quality of SR: Scientific quality? (Item: #7)
Presentation & Interpretation: Risk of bias: threshold crossed? Inconsistency? Indirectness? Imprecision?

Confidence in the estimate of effect = Quality of evidence

GRADE: Quality of evidence

The extent to which one can be confident that an estimate of effect or association is correct.

Quality of evidence across studies

Outcome #1: Quality: High
Outcome #2: Quality: Moderate
Outcome #3: Quality: Low

[Figure labels: Old system, GRADE]

Components determining quality

• RCTs start high
• Observational studies start low

What lowers quality of evidence? 5 factors:
• Methodological limitations
• Inconsistency of results
• Indirectness of evidence
• Imprecision of results
• Publication bias

Methodological limitations

Assessment of detailed design and execution (risk of bias)

For RCTs:
• Lack of allocation concealment
• No true intention to treat principle
• Inadequate blinding
• Loss to follow-up
• Early stopping for benefit

Cochrane Risk of bias graph in RevMan 5

Inconsistency of results

• Judgment
  - variation in size of effect
  - overlap in confidence intervals
  - statistical significance of heterogeneity
  - I² (or χ²)
• Look for explanation for inconsistency
  - patients, intervention, comparator, outcome, methods
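As a rough illustration of the statistics named above, the sketch below computes Cochran's Q (the chi-squared heterogeneity statistic) and I² from study-level effects using the standard fixed-effect, inverse-variance formulas. The function name and the input numbers are hypothetical; in practice the effects would be on a suitable scale, such as log relative risks, taken from the review's forest plot.

```python
def heterogeneity(effects, standard_errors):
    """Return (Q, degrees of freedom, I² in percent) for study-level effects.

    effects:         one effect estimate per study (e.g., log relative risk)
    standard_errors: standard error of each estimate, in the same order
    """
    if len(effects) != len(standard_errors) or len(effects) < 2:
        raise ValueError("need at least two studies with matching standard errors")

    # Inverse-variance weights and the fixed-effect pooled estimate.
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1

    # I²: share of total variation attributed to heterogeneity, floored at 0.
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i_squared


# Two hypothetical studies with clearly different effects.
q, df, i_squared = heterogeneity([0.0, 1.0], [0.5, 0.5])
print(q, df, i_squared)
```

The numbers only inform the judgment listed above. Whether the inconsistency is serious enough to lower the rating still depends on the size of the differences and on whether an explanation can be found.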

Heterogeneity

Neurological or vascular complications or death within 30 days of endovascular treatment (stent, balloon angioplasty) vs. surgical carotid endarterectomy (CEA)

Indirectness of evidence

• Indirect comparisons
  - Interested in head-to-head comparison
  - Drug A versus drug B – but what if not studied?
  - Tenofovir versus entecavir in hepatitis B treatment
• Differences in
  - patients (early cirrhosis vs end-stage cirrhosis)
  - interventions (CRC screening: flex. sig. vs colonoscopy)
  - comparator (e.g., differences in dose)
  - outcomes (non-steroidal safety: ulcer on endoscopy vs symptomatic ulcer complications)

Imprecision of results

Any stroke (or death) within 30 days of endovascular treatment (stent, balloon angioplasty) vs. surgical carotid endarterectomy (CEA)

Publication bias

All phase II and III licensing trials for antidepressant drugs between 1987 and 2004. 74 trials – 23 were not published.

Quality assessment - summary

Study design          Quality of evidence   Lower if…           Higher if…
Randomized trial      High                  Study limitations   Large effect (e.g., RR 0.5)
                      Moderate              Inconsistency       Very large effect (e.g., RR 0.2)
Observational study   Low                   Indirectness        Evidence of dose-response gradient
                      Very low              Imprecision         All plausible confounding would reduce a demonstrated effect
                                            Publication bias
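The summary above can be read as a bookkeeping rule: start from the study design, move down for each serious concern, and move up for the listed strengthening factors. The sketch below is one hypothetical way to record that bookkeeping. It assumes each factor moves the rating by one or two levels, which the slide does not state. The names `Quality` and `rate_quality` are invented for illustration. How many levels to move is a judgment made by the reviewers, not something the code decides.

```python
from enum import IntEnum


class Quality(IntEnum):
    VERY_LOW = 1
    LOW = 2
    MODERATE = 3
    HIGH = 4


# Factor names taken from the summary; the dictionary keys are illustrative.
DOWNGRADE_FACTORS = ("study_limitations", "inconsistency", "indirectness",
                     "imprecision", "publication_bias")
UPGRADE_FACTORS = ("large_effect", "dose_response", "plausible_confounding")


def rate_quality(study_design, downgrades=None, upgrades=None):
    """Combine reviewer judgments into one quality level for a single outcome.

    study_design: "randomized" (starts high) or "observational" (starts low)
    downgrades:   {factor: levels}, levels assumed to be 0, 1 or 2
    upgrades:     {factor: levels}, levels assumed to be 0, 1 or 2
    """
    start = {"randomized": Quality.HIGH, "observational": Quality.LOW}[study_design]
    downgrades = downgrades or {}
    upgrades = upgrades or {}

    for name, levels in downgrades.items():
        if name not in DOWNGRADE_FACTORS or levels not in (0, 1, 2):
            raise ValueError(f"invalid downgrade: {name}={levels}")
    for name, levels in upgrades.items():
        if name not in UPGRADE_FACTORS or levels not in (0, 1, 2):
            raise ValueError(f"invalid upgrade: {name}={levels}")

    level = int(start) - sum(downgrades.values()) + sum(upgrades.values())
    # The scale has four levels, so clamp the result to that range.
    return Quality(max(int(Quality.VERY_LOW), min(int(Quality.HIGH), level)))


# RCT evidence judged to have serious inconsistency and serious imprecision.
print(rate_quality("randomized", {"inconsistency": 1, "imprecision": 1}).name)
# Observational evidence with a very large effect.
print(rate_quality("observational", upgrades={"large_effect": 2}).name)
```

The function only adds up decisions that have already been made. It does not turn GRADE into a scoring system.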

Conceptualizing quality

High: We are very confident that the true effect lies close to that of the estimate of the effect.

Moderate: We are moderately confident in the estimate of effect: The true effect is likely to be close to the estimate of effect, but there is a possibility that it is substantially different.

Low: Our confidence in the effect is limited: The true effect may be substantially different from the estimate of the effect.

Very low: We have very little confidence in the effect estimate: The true effect is likely to be substantially different from the estimate of effect.

Key benefit: The GRADE evidence profile

Another view: Summary of findings table

Systematic review

• PICO
• Outcomes by importance: Critical, Critical, Important, Low
• Summary of findings & estimate of effect for each outcome
• Quality of evidence: High, Moderate, Low, Very low
  - RCT start high, obs. data start low
  - Lower for: 1. Risk of bias 2. Inconsistency 3. Indirectness 4. Imprecision 5. Publication bias
  - Higher for: 1. Large effect 2. Dose response 3. Confounders

Guideline development

• Rate overall quality of evidence across outcomes based on lowest quality of critical outcomes
• Formulate recommendations:
  - For or against (direction)
  - Strong or weak (strength)
• By considering:
  - Quality of evidence
  - Balance benefits/harms
  - Values and preferences
• Revise if necessary by considering:
  - Resource use (cost)
• “We recommend using…”
• “We suggest using…”
• “We recommend against using…”
• “We suggest against using…”
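Two steps in this flow are mechanical enough to sketch: taking the overall quality from the lowest-rated critical outcome, and turning direction and strength into the four standard phrases. The sketch below assumes the usual GRADE convention that strong recommendations use "recommend" and weak ones use "suggest". The function names and example outcomes are hypothetical.

```python
QUALITY_ORDER = ["Very low", "Low", "Moderate", "High"]


def overall_quality(outcomes):
    """Overall quality = lowest quality among the critical outcomes.

    outcomes: list of (name, importance, quality) tuples, where quality is
              one of the labels in QUALITY_ORDER.
    """
    critical = [quality for _, importance, quality in outcomes
                if importance == "Critical"]
    if not critical:
        raise ValueError("at least one critical outcome is required")
    return min(critical, key=QUALITY_ORDER.index)


def recommendation_phrase(direction, strength):
    """Build the wording from direction ("for"/"against") and strength ("strong"/"weak")."""
    verb = {"strong": "recommend", "weak": "suggest"}[strength]
    stance = {"for": "using", "against": "against using"}[direction]
    return f"We {verb} {stance}…"


# Hypothetical outcomes for one clinical question.
outcomes = [
    ("Mortality", "Critical", "High"),
    ("Stroke", "Critical", "Low"),
    ("Nausea", "Important", "Very low"),
]
print(overall_quality(outcomes))
print(recommendation_phrase("for", "weak"))
```

In this example the overall quality is Low. The Very low rating for nausea does not count because that outcome is not critical. The balance of benefits and harms, values and preferences, and resource use still have to be weighed by the guideline panel.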

An ideal world…

Moving towards standards

SR: PRISMA, AMSTAR
RCTs: CONSORT
Obs.: STROBE
CPG: Harms/safety? Baseline risk? Values/prefs?

Key benefits of GRADE in SR

• For health care professionals/consumers
  - Rely on standardized quality of evidence ratings
  - Better understand the effect measures
  - Likely to increase SR value and use
  - Better information improves shared decision making
• For guideline authors
  - Utilize full GRADE evidence profiles
• For policy makers
  - Informs the process of determining net benefits (which may include resource use considerations, values and preferences)

GRADE’s limitations

• Evidence rating for alternative management strategies, not risk or prognosis per se
• Does not eliminate disagreements in interpreting the evidence – judgments on thresholds continue to be necessary
• Requires some training in methodology to be applied optimally

What GRADE isn’t

• Not another “risk of bias” tool
• Not a quantitative system (no scoring required)
• Does not eliminate COI, but is able to minimize it
• Not “expensive”
  - Builds on well established principles of EBM
  - Some degree of training is needed for any system
  - Proportionally adds minimal amount of extra time to a systematic review

Conclusion

Gaining acceptance as international standard because GRADE adds value:

1. GRADE has criteria for evidence assessment across a range of questions and outcomes
2. Sensible, transparent, systematic
3. Balance between simplicity and methodological rigor