Insert Long Worksheet Title


Critical Appraisal of the Evidence
Peter Morley
What’s out there? 2 reviews already
GRADE
Grades of Recommendation, Assessment,
Development and Evaluation
(international working group)
Reviewed existing techniques
Created sequential assessment of
evidence through to final judgment about
strength of recommendation (Strong or
Weak)
Allows strong recommendation with
weak evidence
Levels of evidence (Grade other)
“Systematic reviews should not be
included in a hierarchy of evidence (i.e.
as a level or category of evidence). The
availability of a well-done systematic
review does not correspond to high
quality evidence, since a well-done
review might include anything from no
studies to poor quality studies with
inconsistent results to high quality
studies with consistent results.”
Levels of evidence (Grade other)
“Indirect” evidence from other
populations can be used, but results in
downgrading of level of evidence
Humans
?? Animal/bench/modelling
GRADE
Allows downgrading of the quality estimate if
the evidence is “indirect”
Population groups
? Animals etc!
Prognostic or aetiologic studies only
useful if their use modifies outcomes!
Very good approach to making
recommendations
SIGN
Scottish Intercollegiate Guidelines Network
To develop evidence based guidelines
Revised methodology from 2000
Widely used in UK
SIGN LOEs
From 1 to 4 (expert opinion)
Subgrades from ++ to + to -
Includes meta-analyses (including those of non-RCTs)
No retrospective control subset
Nil below case series (eg animals etc)
Not for non-therapy studies
SIGN
Studies assessed according to checklists
Grades of recommendations (A to D)
linked to LOE
(A)NH&MRC
Australian National Health and Medical
Research Council
Revised in 2005-6
(A)NH&MRC
LOEs from I to IV
Separates study types (intervention,
diagnosis, prognosis, aetiology,
screening)
Definitions and quality assessment tools
for each
Input from McMaster, Oxford and GRADE
(A)NH&MRC
Complex conversion of the body of evidence,
using a matrix, into 4 final grades of
recommendation (linked to LOE)
(O)CEBM
Oxford Centre for Evidence Based Medicine
Widely known, comprehensive system
10 levels of evidence (with definitions)
Covers therapy/aetiology, Prognosis,
Diagnosis, Economic analysis
Simplistic compression into 4 grades of
recommendation (1=A, 2-3=B, 4=C, 5=D)
(O)CEBM May 2001
ACCP
American College of Chest Physicians
Limited to 3 LOEs (A-C)
A=RCTs, B=RCTs of lesser quality,
C+=extrapolated from RCTs,
C=observational
Limited to 2 Grades of recommendation
based on LOE and clarity of risk/benefit
Table 1: Current Approach to Grades of Recommendations (ACCP)
Grade of recommendation | Clarity of risk/benefit | Methodologic strength of supporting evidence | Implications | Example of recommendation
1A | Risk/benefit clear | Randomized controlled trials (RCTs) without important limitations | Strong recommendation, can apply to most patients in most circumstances without reservation | We recommend warfarin therapy in patients with atrial fibrillation at high risk for stroke
1B | Risk/benefit clear | RCTs with important limitations (inconsistent results, methodological flaws*) | Strong recommendations, likely to apply to most patients | We recommend that pentoxifylline should not be used in patients with intermittent claudication
1C+ | Risk/benefit clear | No RCTs but RCT results can be unequivocally extrapolated, or overwhelming evidence from observational studies | Strong recommendation, can apply to most patients in most circumstances | We recommend long-term warfarin therapy for patients with atrial fibrillation and rheumatic mitral valve disease
1C | Risk/benefit clear | Observational studies | Intermediate strength recommendation; may change when stronger evidence available | We recommend surveillance and postpartum anticoagulation for pregnant patients with prior venous thromboembolism associated with a transient risk factor
2A | Risk/benefit unclear | RCTs without important limitations | Intermediate strength recommendation, best action may differ depending on circumstances or patients' or societal values | We do not recommend aspirin as sole therapy in patients after hip fracture surgery to prevent venous thromboembolism
2B | Risk/benefit unclear | RCTs with important limitations (inconsistent results, methodological flaws) | Weak recommendation, alternative approaches likely to be better for some patients under some circumstances | We recommend early anticoagulation in patients with acute cardioembolic large-artery ischemic stroke who are ineligible for thrombolysis
2C | Risk/benefit unclear | Observational studies | Very weak recommendations; other alternatives may be equally reasonable | We recommend long term aspirin therapy in patients with bioprosthetic heart valves who are in normal sinus rhythm
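The ACCP grade is a simple combination of two inputs: clarity of the risk/benefit balance (1 = clear, 2 = unclear) and methodological strength of the evidence (A, B, C+, C). A minimal Python sketch of that lookup follows; the function name `accp_grade` and the string codes for evidence strength are hypothetical, and the implication text is copied from the table above.

```python
# Hypothetical helper illustrating the ACCP lookup; the name accp_grade and the
# evidence codes are assumptions, not part of the ACCP documents.
IMPLICATIONS = {
    "1A": "Strong recommendation, can apply to most patients in most circumstances without reservation",
    "1B": "Strong recommendation, likely to apply to most patients",
    "1C+": "Strong recommendation, can apply to most patients in most circumstances",
    "1C": "Intermediate strength recommendation; may change when stronger evidence available",
    "2A": "Intermediate strength recommendation, best action may differ depending on circumstances or patients' or societal values",
    "2B": "Weak recommendation, alternative approaches likely to be better for some patients under some circumstances",
    "2C": "Very weak recommendation; other alternatives may be equally reasonable",
}
VALID_EVIDENCE = ("A", "B", "C+", "C")


def accp_grade(risk_benefit_clear: bool, evidence: str) -> tuple[str, str]:
    """Combine risk/benefit clarity with evidence strength into (grade, implication)."""
    if evidence not in VALID_EVIDENCE:
        raise ValueError(f"unknown evidence strength: {evidence!r}")
    grade = ("1" if risk_benefit_clear else "2") + evidence
    if grade not in IMPLICATIONS:
        # The table only defines C+ for a clear risk/benefit balance (grade 1C+)
        raise ValueError(f"grade {grade} is not defined in the table")
    return grade, IMPLICATIONS[grade]


grade, implication = accp_grade(True, "B")
print(grade, "-", implication)
```

Raising an error for "2C+" mirrors the table, which only lists C+ under a clear risk/benefit balance.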
Others (less applicable)
US Preventive Services Task Force
US Task Force on Community
Preventive Services
Comparisons and proposal
Study type/Approach | C2010 | C2005 | GRADE | SIGN | NH&MRC | OCEBM
Meta-analyses | 1 | 1 or 2 | n/a | 1++ to 2++ | I | 1a or 2a
RCTs | 1 | 1 or 2 | High/Mod/Low | 1-/1+/1++ | II | 1b or 2b
Concurrent controls | 2 | 3 | High/Mod/Low | 2- to 2++ | III-1 or III-2 | 2b to 4
Retrospective controls | 3 | 4 | Low | n/a | III-3 | ?1c!
No controls | 4 | 5 | Very low | 3 | IV | 4
Animal/Mechanical/Model | 5 | 6 | ? As for extrap | n/a | n/a | n/a
Extrapolations | 5 | 7 | Downgraded | n/a | n/a | n/a
Covers "non-therapy" studies | YES | ? 1 to 7 | Irrelevant! | No | Detailed | Detailed
GRADE approach closest to our needs
Not numerical
Needs additional “non-therapy” LOEs
OCEBM or NH&MRC
Levels of Evidence for Therapeutic Interventions
LOE 1: Randomised Controlled Trials (or meta-analyses of RCTs)
LOE 2: Studies using concurrent controls without true randomisation (eg. “pseudo”-randomised)
LOE 3: Studies using retrospective controls
LOE 4: Studies without a control group (eg. case series)
LOE 5: Studies not directly related to the specific patient/population (eg. different
patient/population, animal models, mechanical models etc.)
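The five therapeutic LOEs amount to a decision on the type of control group, with indirectness overriding design. A minimal sketch, assuming a hypothetical `Controls` encoding and assuming (as the LOE 5 wording suggests) that an animal or mechanical-model study is LOE 5 whatever its design:

```python
from enum import Enum


class Controls(Enum):
    """Hypothetical encoding of the comparison group in a therapy study."""
    RANDOMISED = "randomised"        # true randomisation (RCT, or meta-analysis of RCTs)
    CONCURRENT = "concurrent"        # concurrent controls without true randomisation
    RETROSPECTIVE = "retrospective"  # retrospective controls
    NONE = "none"                    # no control group, e.g. case series


LOE_BY_CONTROLS = {
    Controls.RANDOMISED: 1,
    Controls.CONCURRENT: 2,
    Controls.RETROSPECTIVE: 3,
    Controls.NONE: 4,
}


def therapy_loe(controls: Controls, directly_relevant: bool = True) -> int:
    """Return the C2010 therapeutic LOE (1-5).

    Assumption: a study not directly related to the patient/population
    (different population, animal or mechanical model) is LOE 5 whatever its design.
    """
    return LOE_BY_CONTROLS[controls] if directly_relevant else 5


print(therapy_loe(Controls.CONCURRENT))                           # pseudo-randomised human study: 2
print(therapy_loe(Controls.RANDOMISED, directly_relevant=False))  # randomised animal study: 5
```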
Assessing quality of individual
studies
Quality for RCTs
Many checklists available
None found perfect, some conflict
Demonstrated to influence outcome
True random allocation
Allocation concealment
Blinding
Funding or sponsorship
Summaries of techniques
AHRQ: key domains
Study population
Randomization
Blinding
Interventions
Outcomes
Statistical analysis
Funding or sponsorship
Quality for “R”CTs (examples)
OCEBM
Detailed checklist
ACCP
Lack of blinding, subjective assessments, large loss to follow up
GRADE/ATS
No intention-to-treat analysis, subjective assessments, large loss
to follow up
NH&MRC
Concealed allocation, blinding, equal at baseline, large loss to
follow up
US Preventive Services Task Force
Fatal flaws include: no I-t-T, non-comparable groups
Long checklists
Useful adjuncts for assessing individual
studies
Items should be prioritized to facilitate
allocation of a quality rating
Quality (summary)
There is no uniformly agreed way of
defining methodological quality
A number of numerical systems have
been proposed, but all have their
limitations.
Instead of a strict criterion-based
assessment, we ask the worksheet
reviewer to rate the quality of each
study as Good, Fair or Poor.
Quality items for RCTs
Was the assignment of patients to treatment randomised?
Was the randomisation list concealed?
Were all patients who entered the trial accounted for at its
conclusion?
Were the patients analysed in the groups to which they were
randomised?
Were patients and clinicians "blinded" to which treatment was
being received?
Aside from the experimental treatment, were the groups
treated equally?
Were the groups similar at the start of the trial?
Suggested quality assessment
Good studies = most/all of quality items
Fair studies = some of the quality items
Poor studies = few of the quality items
but of sufficient value to include for further
review.
Suggest that, at this stage, sponsorship of the
article be noted during assessment but
not factored into the quality allocation
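The Good/Fair/Poor allocation can be made reproducible by counting quality items met. A minimal sketch follows; the item labels paraphrase the RCT checklist, while the cut-offs (at least 70% of items for Good, at least 30% for Fair) are hypothetical, since the worksheet only says "most/all", "some" and "few".

```python
# Hypothetical helper: the fraction cut-offs are assumptions; the worksheet
# itself only says "most/all", "some" and "few" of the quality items.
RCT_QUALITY_ITEMS = (
    "randomised",
    "randomisation list concealed",
    "all patients accounted for at conclusion",
    "analysed in groups to which randomised",
    "patients and clinicians blinded",
    "groups treated equally apart from experimental treatment",
    "groups similar at start of trial",
)


def quality_rating(items_met: set[str], items: tuple[str, ...] = RCT_QUALITY_ITEMS,
                   good_from: float = 0.7, fair_from: float = 0.3) -> str:
    """Rate a study Good, Fair or Poor from the share of quality items it meets.

    Sponsorship is deliberately not an input: it is noted separately and is
    not part of the allocation.
    """
    unknown = items_met - set(items)
    if unknown:
        raise ValueError(f"unrecognised quality items: {sorted(unknown)}")
    share = len(items_met) / len(items)
    if share >= good_from:
        return "Good"
    if share >= fair_from:
        return "Fair"
    return "Poor"


print(quality_rating(set(RCT_QUALITY_ITEMS[:5])))  # 5 of 7 items met
```

Keeping the thresholds as parameters makes it easy for a review group to agree its own cut-offs rather than relying on these illustrative ones.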
Induced hypothermia
Quality items for RCTs
Was the assignment of patients to treatment
randomised?
Was the randomisation list concealed?
Were the groups similar at the start of the trial?
Was the follow up of patients sufficiently long and
complete?
Were the patients analysed in the groups to which they
were randomised?
Were patients and clinicians "blinded" to which treatment
was being received?
Aside from the experimental treatment, were the groups
treated equally?
Randomised?
Concealed?
Groups similar?
Follow up long and complete?
Intention to treat?
”Blinded”?
Outcome assessment (by specialist "unaware" of treatment group)
Treated equally?
Is the valid evidence about
therapy important?
What is the magnitude of the treatment
effect?
How precise is the estimate of treatment
effect?
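Outcomes
More patients discharged home or to rehabilitation with
hypothermia
26% (normothermia) vs 49% (hypothermia) (ARR = 23%)
NNT 4.5 [95% CI 2.3 - 76]
Chi-square P=0.046; Fisher's exact test P=0.061
OR 2.7 [95% CI 1.0-7.0]
95% CIs for relative risk and relative risk reduction cross the null (no effect)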
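These effect measures can all be recomputed from a 2x2 table. A minimal sketch follows, assuming Wald intervals for the risk difference, log-scale (Woolf) intervals for the relative risk and odds ratio, and an uncorrected Pearson chi-square; the counts used (21 of 43 hypothermia vs 9 of 34 normothermia patients with a good outcome) are not on the slide and are assumed here because they reproduce its 49% and 26%.

```python
import math


def two_by_two(events_tx: int, n_tx: int, events_ctl: int, n_ctl: int, z: float = 1.96) -> dict:
    """Effect measures for a beneficial outcome (e.g. good neurological outcome) in a two-arm trial."""
    a, b = events_tx, n_tx - events_tx      # treatment arm: outcome yes / no
    c, d = events_ctl, n_ctl - events_ctl   # control arm: outcome yes / no
    p_tx, p_ctl = a / n_tx, c / n_ctl
    diff = p_tx - p_ctl                      # absolute difference (the slide's "ARR")
    se_diff = math.sqrt(p_tx * (1 - p_tx) / n_tx + p_ctl * (1 - p_ctl) / n_ctl)
    diff_ci = (diff - z * se_diff, diff + z * se_diff)   # Wald 95% CI
    # NNT = 1 / difference; its CI is the reciprocal of the difference CI,
    # reported here only when that CI lies entirely above zero
    nnt_ci = (1 / diff_ci[1], 1 / diff_ci[0]) if diff_ci[0] > 0 else None
    rr = p_tx / p_ctl
    se_log_rr = math.sqrt(1 / a - 1 / n_tx + 1 / c - 1 / n_ctl)
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # Woolf method

    def log_ci(estimate, se):
        return (math.exp(math.log(estimate) - z * se), math.exp(math.log(estimate) + z * se))

    # Pearson chi-square without continuity correction (1 df): p = erfc(sqrt(chi2 / 2))
    chi2 = (n_tx + n_ctl) * (a * d - b * c) ** 2 / (n_tx * n_ctl * (a + c) * (b + d))
    return {
        "diff": diff, "diff_ci": diff_ci, "nnt": 1 / diff, "nnt_ci": nnt_ci,
        "rr": rr, "rr_ci": log_ci(rr, se_log_rr),
        "or": odds_ratio, "or_ci": log_ci(odds_ratio, se_log_or),
        "chi2_p": math.erfc(math.sqrt(chi2 / 2)),
    }


# Assumed counts (not on the slide): 21/43 hypothermia vs 9/34 normothermia
result = two_by_two(21, 43, 9, 34)
print(f"NNT {result['nnt']:.1f}, 95% CI {result['nnt_ci'][0]:.1f} to {result['nnt_ci'][1]:.0f}")
print(f"OR {result['or']:.1f}, 95% CI {result['or_ci'][0]:.1f} to {result['or_ci'][1]:.1f}")
print(f"Chi-square P = {result['chi2_p']:.3f}")
```

With these assumed counts the sketch reproduces the slide's NNT, odds ratio and chi-square P, and its relative-risk interval includes 1. Fisher's exact test is not implemented here.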
Can we apply this valid, important
evidence about therapy in caring
for our patient?
Is our patient so different from those in the
study that its results cannot apply?
Is the treatment feasible in our setting?
What are our patient’s potential benefits and
harms from therapy?
What are our patient’s values and
expectations for both the outcome we are
trying to prevent and the treatment we are
offering?
Patients
Multicentre study of out-of-hospital
cardiac arrest in Melbourne, Australia.
Included: patients in VF on arrival of the ambulance,
with ROSC and persistent coma.
Excluded: age < 18 (men) or < 50 (women, as possibly
pregnant), hypotension (SBP < 90 mmHg
despite epinephrine infusion), or other
causes of coma.
Treatment
Standard management included midazolam and
vecuronium, temperature-corrected PaCO2 of 40 mmHg,
MAP 90-100 mmHg (with epinephrine or GTN), lignocaine
infusion and glucose < 10 mmol/L.
Normothermia group: passively rewarmed to a target of
37°C, sedated and paralysed as needed.
Hypothermia group: clothing removed and
ice packs applied to head and torso (by paramedics), then
sedated and paralysed as needed to prevent
shivering; target temperature 33°C for 12 hours
after hospital arrival, then actively rewarmed over
6 hours.
Quality items for meta-analyses
Were specific objectives of the review stated (based on a
specific clinical question defining the patient, intervention,
comparator and outcome (PICO))?
Was study design defined?
Were selection criteria stated for studies to be included
(based on trial design and methodological quality)?
Were inclusive searches undertaken (using appropriately
crafted search strategies)?
Were characteristics and methodological quality of each trial
identified?
Were selection criteria applied and a log of excluded studies
with reasons for exclusion reported?
Quality items for non-RCTs
Were comparison groups clearly defined?
Were outcomes measured in the same
(preferably blinded), objective way in both
groups?
Were known confounders identified and
appropriately controlled for?
Was follow-up of patients sufficiently long and
complete?
Quality items for studies
without controls
Were outcomes measured in an objective
way?
Were known confounders identified and
appropriately controlled for?
Was follow-up of patients sufficiently long and
complete?
Studies of diagnostic tests
“Test” = examination finding/investigation
Gives “result”
Starting point is initial “test” on patients
Compare “test” result with known outcome
(“gold standard”)
Develop threshold result (to alter Mx)
= Clinical Decision Rule (CDR)
Better = confirm result in multiple centers
Studies related to prognosis
All “prognosis” questions share 3 elements
a qualitative aspect (which outcomes could
happen?)
a quantitative aspect (how likely are they
to happen?), and
a temporal aspect (over what time period?)
Eg. % (±95%CI) survival at certain time
(EBM Sackett 2000)
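The quantitative aspect is usually reported as a proportion with its 95% CI. A minimal sketch, assuming the Wilson score interval (one common choice; the slide does not name a method) and hypothetical counts:

```python
import math


def proportion_ci(events: int, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Proportion with a Wilson score 95% CI (method chosen here, not specified on the slide)."""
    if n <= 0 or not 0 <= events <= n:
        raise ValueError("need n > 0 and 0 <= events <= n")
    p = events / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes
    return p, max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical example: 30 of 100 patients alive at 1 year
p, lo, hi = proportion_ci(30, 100)
print(f"{p:.0%} survival at 1 year (95% CI {lo:.0%} to {hi:.0%})")
```

The Wilson interval is used instead of the simple "p ± 1.96 SE" because it stays within 0 to 100% and behaves better with the small samples typical of cardiac arrest cohorts.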
As opposed to most
investigations, cardiac arrest
time frames are short
So prognosis and diagnosis studies
may appear to overlap!
Exposure and outcome
measured at same time
= cross sectional study
(otherwise = cohort or case-control studies)
Case-control study
A study which involves identifying
patients who have the outcome of
interest (cases) and patients without the
same outcome (controls), and looking
back to see if they had the exposure of
interest.
http://www.cebm.utoronto.ca/glossary/
C2010 LOEs for Diagnostic Studies
LOE D1: Validating cohort studies (or meta-analyses), or
validation of Clinical Decision Rule (CDR)
LOE D2: Exploratory cohort study (or meta-analyses), or
derivation of CDR, or split-sample validation only
LOE D3: Diagnostic case control study
LOE D4: Study of diagnostic yield (no reference standard)
LOE D5: Studies not directly related to the specific
patient/population (eg. different patient/population, animal
models, mechanical models etc.)
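The diagnostic LOEs can be read as a short sequence of questions about a study. A minimal sketch, assuming hypothetical inputs (`directly_relevant`, `reference_standard`, `design`, `validated`) that are not part of the C2010 wording; the usage lines encode the VF waveform examples that follow.

```python
def diagnostic_loe(directly_relevant: bool, reference_standard: bool,
                   design: str = "cohort", validated: bool = False) -> str:
    """Map features of a diagnostic study to a C2010 LOE (D1-D5).

    Hypothetical encoding:
      design: "cohort" (series of patients, test accuracy; includes
              meta-analyses of such cohorts) or "case-control"
      validated: True for a validating cohort or validation of a CDR;
                 False for an exploratory cohort, CDR derivation or
                 split-sample validation only
    """
    if not directly_relevant:
        return "D5"   # different population, animal or mechanical model
    if not reference_standard:
        return "D4"   # diagnostic yield only
    if design == "case-control":
        return "D3"
    if design == "cohort":
        return "D1" if validated else "D2"
    raise ValueError(f"unknown design: {design!r}")


print(diagnostic_loe(False, True))                         # pig model -> D5
print(diagnostic_loe(True, False))                         # yield in OOHCA, no outcome data -> D4
print(diagnostic_loe(True, True, "case-control"))          # ROSC vs no ROSC comparison -> D3
print(diagnostic_loe(True, True, "cohort"))                # split-sample only -> D2
print(diagnostic_loe(True, True, "cohort", validated=True))  # multi-setting validation -> D1
```

Checking indirectness first reflects the D5 wording: a study in a different population drops to D5 regardless of its design.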
An example: VF waveforms
In cardiac arrest patients due to VF
(P), does the use of VF waveforms (I)
allow the diagnosis of a successfully
defibrillatable rhythm (ROSC)?
Diagnosis outcomes
Could even be phrased as a
prognostic question:
“predict ROSC with single shock”!
Outcome = % (±95%CI)
LOE D5: Studies not directly related to
the specific patient/population (eg.
different patient/population, animal
models, mechanical models etc.)
VF waveform analysis diagnoses a successfully
shockable rhythm in pigs, and in induced VF
LOE D4: Study of diagnostic yield (no
reference standard)
VF waveform analysis diagnoses a successfully
shockable rhythm in x% of patients with out-of-hospital
cardiac arrest (OOHCA), but there are no outcome data
“Case-control” design
(Diagram: HF patients and controls, index test, blinded cross-classification)
Systematic Review of Diagnostic Studies, Rafael Perera
http://www.cebm.net/?o=1021
LOE D3: Diagnostic case control study
In a collection of patients, those with ROSC after a
shock had a distinctly different appearance of the VF
waveform compared with patients without
ROSC after a shock
Diagnostic Accuracy Study:
Basic Design
Series of patients
Index test
Reference standard
Blinded cross-classification
LOE D2: A study of test accuracy
(exploratory cohort study) (or meta-analyses), or derivation of Clinical
Decision Rule (CDR), or split-sample
validation only
In a group of non-consecutive patients with VF, a
specific cut-off point could be determined that
predicted an increased likelihood (+LR = 12) of ROSC
after a shock. The cut-off was derived in 50% of patients
and validated in the other 50%.
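Split-sample validation can be sketched as: shuffle, derive the cut-off in one half, then re-estimate its +LR in the other half. In the sketch below, choosing the cut-off by Youden's J (sensitivity + specificity - 1) is an assumption, since the slide does not say how the cut-off was chosen, and the waveform values are synthetic, so they will not reproduce the slide's +LR of 12.

```python
import random


def counts_at(records, cutoff):
    """2x2 counts when the test is called positive for waveform values >= cutoff.

    records: (waveform_value, achieved_rosc) pairs.
    """
    tp = sum(1 for v, rosc in records if rosc and v >= cutoff)
    fn = sum(1 for v, rosc in records if rosc and v < cutoff)
    fp = sum(1 for v, rosc in records if not rosc and v >= cutoff)
    tn = sum(1 for v, rosc in records if not rosc and v < cutoff)
    return tp, fn, fp, tn


def positive_lr(records, cutoff):
    """+LR = sensitivity / (1 - specificity)."""
    tp, fn, fp, tn = counts_at(records, cutoff)
    sensitivity = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)
    return float("inf") if false_positive_rate == 0 else sensitivity / false_positive_rate


def youden_j(records, cutoff):
    """Youden's J = sensitivity + specificity - 1."""
    tp, fn, fp, tn = counts_at(records, cutoff)
    return tp / (tp + fn) + tn / (tn + fp) - 1


def split_sample_validation(records, seed=0):
    """Derive a cut-off in a random half, then re-estimate its +LR in the other half."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    derivation, validation = shuffled[:half], shuffled[half:]
    cutoff = max(sorted({v for v, _ in derivation}), key=lambda c: youden_j(derivation, c))
    return cutoff, positive_lr(derivation, cutoff), positive_lr(validation, cutoff)


# Synthetic data only: ROSC after shock is given higher waveform values on average
rng = random.Random(42)
patients = []
for _ in range(200):
    rosc = rng.random() < 0.4
    patients.append((rng.gauss(2.0 if rosc else 0.0, 1.0), rosc))

cutoff, lr_derivation, lr_validation = split_sample_validation(patients)
print(f"cut-off {cutoff:.2f}: +LR {lr_derivation:.1f} (derivation) vs {lr_validation:.1f} (validation)")
```

Both halves come from the same, non-consecutive sample, which is why this design sits at D2 rather than D1.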
LOE D1: A study of test accuracy
(validating cohort study) (or meta-analyses), or validation of Clinical
Decision Rule (CDR)
In a group of consecutive patients with VF from
multiple settings, a previously determined Clinical
Decision Rule was confirmed to predict increased
likelihood (+LR = 12) of ROSC after shock.
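A +LR only becomes a probability once it is combined with the pre-test probability, using Bayes' theorem in odds form (post-test odds = pre-test odds x LR). A minimal sketch with hypothetical inputs: sensitivity 60% and specificity 95%, which give +LR = 12, and an assumed 20% chance of ROSC after a shock before the test result is known.

```python
def lr_from_sens_spec(sensitivity: float, specificity: float) -> float:
    """+LR = sensitivity / (1 - specificity)."""
    return sensitivity / (1 - specificity)


def post_test_probability(pre_test_prob: float, likelihood_ratio: float) -> float:
    """Bayes' theorem in odds form: post-test odds = pre-test odds x LR."""
    if not 0 < pre_test_prob < 1:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Hypothetical inputs, not taken from any study
lr = lr_from_sens_spec(0.60, 0.95)
print(f"+LR = {lr:.0f}, post-test probability = {post_test_probability(0.20, lr):.0%}")
```

Here a 20% pre-test probability rises to 75% after a positive result, which shows why the same +LR matters more in some populations than in others.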
Ideally!!
In a PRCT, patients in VF who had the
Clinical Decision Rule applied were more likely to
achieve ROSC and hospital discharge than those
who did not have the Clinical Decision Rule applied.
Quality for diagnostic studies
Consistent aim = minimize bias
Suggested quality assessment
Good studies = most/all of quality items
Fair studies = some of the quality items
Poor studies = few of the quality items
but of sufficient value to include for further
review.
Suggest that, at this stage, sponsorship of the
article be noted during assessment but
not factored into the quality allocation
Diagnostic cohort (studies of test
accuracy) or case-control studies (D1,
D2, or D3)
Was the diagnostic test evaluated in an
appropriate spectrum of patients (like those
in whom it would be used in practice; spectrum
bias)?
Was there an independent, blind comparison
(review bias) with a reference ("gold")
standard of diagnosis?
Was the reference standard applied regardless
of the test result (verification bias)?
Can I trust the accuracy data
from the study?
RAMMbo
Recruitment: Was an appropriate spectrum of patients
included?
(Spectrum Bias)
Maintenance: Were all patients subjected to a Gold Standard?
(Verification Bias)
Measurements: Was there an independent,
blind or
objective comparison with a Gold Standard?
(Observer Bias; Differential Reference Bias)
Rapid Critical Appraisal of Diagnostic Accuracy Studies,
Paul Glasziou http://www.cebm.net/?o=1021
Spectrum Bias
Selected Patients
Index test
Reference standard
Blinded cross-classification
Verification Bias
Series of patients
Index test
Reference standard
Blinded cross-classification
Differential Reference Bias
Series of patients
Index test
Ref. Std A
Ref. Std. B
Blinded cross-classification
Observer Bias
Series of patients
Index test
Reference standard
Unblinded cross-classification
Diagnostic studies without
reference standard (LOE D4)
(As for other LOEs without controls)
Were outcomes measured in an objective
way?
Were known confounders identified and
appropriately controlled for?
Was follow-up of patients sufficiently
long and complete?
Studies related to prognosis
Starting point is assessing factor on patients
Compare relation of presence or absence
of factor to outcome
(Develop Clinical Decision Rule (CDR) eg.
combination of multiple factors)
Best = confirm result in multiple centers
C2010 LOEs for Prognostic Studies
LOE P1: Inception (prospective) cohort studies (or meta-analyses of inception cohort studies), or validation of Clinical
Decision Rule (CDR)
LOE P2: Follow up of untreated control groups in RCTs (or
meta-analyses of follow-up studies), or derivation of CDR, or
validated on split-sample only
LOE P3: Retrospective cohort studies
LOE P4: Case series
LOE P5: Studies not directly related to the specific
patient/population (eg. different patient/population, animal
models, mechanical models etc.)
An example: prediction “intact”
In cardiac arrest patients (P), does the
use of the PETER algorithm (I) allow
the accurate prediction of intact
neurological survival (O)?
PETER: Pain Eyes Temperature
‘Emodynamics Relatives
LOE P5: Studies not directly related to
the specific patient/population (eg.
different patient/population, animal
models, mechanical models etc.)
PETER algorithm used in pig model on day 3
successfully predicts intact neurological survival at
1 month
LOE P4: Case series
PETER algorithm observed in 25 cases on day 3
to be associated with intact neurological
survival at 1 month
LOE P3: Retrospective cohort studies
Retrospective identification of 20 cases after
cardiac arrest where PETER algorithm was
highly positive, and 20 cases where PETER
algorithm was strongly negative, and more
cases in the strongly positive group survived
neurologically intact.
LOE P2: Follow up of untreated control
groups in RCTs (or meta-analyses of
follow-up studies), or derivation of CDR,
or validated on split-sample only
PETER algorithm was used in 100 consecutive
patients after cardiac arrest, and a score of
>10 was associated with >90% survival
neurologically intact at 3 months, and a score
of <3 was associated with no such survivors.
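Deriving the PETER cut-offs means grouping patients by score band and reporting intact survival in each band. PETER is the talk's own illustrative algorithm, so the minimal sketch below, assuming hypothetical helper names and invented scores, only shows the bookkeeping behind statements like ">10 was associated with >90% survival".

```python
from collections import defaultdict


def peter_category(score: float, high_cutoff: float = 10, low_cutoff: float = 3) -> str:
    """Apply the example cut-offs: score > 10 is 'high', score < 3 is 'low'."""
    if score > high_cutoff:
        return "high"
    if score < low_cutoff:
        return "low"
    return "indeterminate"


def intact_survival_by_category(patients):
    """Proportion surviving neurologically intact within each score category.

    patients: (peter_score, intact_at_3_months) pairs.
    """
    tallies = defaultdict(lambda: [0, 0])   # category -> [intact survivors, total]
    for score, intact in patients:
        tally = tallies[peter_category(score)]
        tally[0] += int(intact)
        tally[1] += 1
    return {category: survivors / total for category, (survivors, total) in tallies.items()}


# Invented scores and outcomes, for illustration only
cohort = [(12, True), (15, True), (11, False), (7, True), (2, False), (1, False)]
print(intact_survival_by_category(cohort))
```

In a P1 validation study the cut-offs would be fixed in advance, as the default arguments are here, and only the proportions re-estimated in the new cohort.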
LOE P1: Inception (prospective) cohort
studies (or meta-analyses of inception
cohort studies), or validation of Clinical
Decision Rule (CDR)
The previously proposed cut off of >10 and <3
for the PETER algorithm was tested in 1000
consecutive patients after cardiac arrest in 5
centres (throughout the world), and a score of
>10 was confirmed to be associated with
>95% survival neurologically intact at 3
months, and a score of <3 was associated with
no such survivors.
Ideally!!!
In a multicentre PRCT of the use of the PETER
algorithm in patients supported in an ICU for
up to 30 days before withdrawal, the use of the
algorithm saved 9.5 ICU days per patient who
eventually died, and saved over US$1 million.
Quality for prognostic studies
Consistent aim = minimize bias
Suggested quality assessment
Good studies = most/all of quality items
Fair studies = some of the quality items
Poor studies = few of the quality items
but of sufficient value to include for further
review.
Suggest that, at this stage, sponsorship of the
article be noted during assessment but
not factored into the quality allocation
“Prognostic” studies using controls
(LOE P1, P2 or P3)
Were comparison groups clearly defined?
Were outcomes measured in the same
(preferably blinded), objective way in
both groups?
Were known confounders identified and
appropriately controlled for?
Was follow-up of patients sufficiently
long and complete (eg. >80%)?
“Prognostic” studies without
controls (LOE P4)
(As for other LOEs without controls)
Were outcomes measured in an objective
way?
Were known confounders identified and
appropriately controlled for?
Was follow-up of patients sufficiently
long and complete (eg. >80%)?
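The completeness part of this item is a simple proportion check. A minimal sketch, assuming a hypothetical helper name and treating the slide's ">80%" as a default that a reviewer can change:

```python
def follow_up_adequate(n_enrolled: int, n_with_outcome: int,
                       threshold: float = 0.80) -> tuple[float, bool]:
    """Return the share of enrolled patients with outcome data and whether it exceeds the threshold."""
    if n_enrolled <= 0 or not 0 <= n_with_outcome <= n_enrolled:
        raise ValueError("need n_enrolled > 0 and 0 <= n_with_outcome <= n_enrolled")
    share = n_with_outcome / n_enrolled
    return share, share > threshold


print(follow_up_adequate(1000, 850))   # hypothetical counts
print(follow_up_adequate(100, 80))     # exactly 80% does not exceed ">80%"
```

Whether follow-up was long enough still needs clinical judgement; the check only covers completeness.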