Critically Evaluating the Evidence: diagnosis, prognosis, and screening Elizabeth Crabtree, MPH, PhD (c) Director of Evidence-Based Practice, Quality Management Assistant Professor, Library and Informatics.


• 2/3 of legal claims against GPs in the UK
• 40,000–80,000 US hospital deaths from misdiagnosis per year
• Adverse events, negligence cases, and serious disability are more likely to be related to misdiagnosis than to drug errors
• Diagnosis uses <5% of hospital costs, but influences 60% of decision making
What are tests used for?
• Increase certainty about presence/absence of disease
• Assess disease severity (triage)
• Monitor clinical course
• Assess prognosis – risk/stage within diagnosis
• Plan treatment
• Screening
Appraising articles on diagnosis in 3 easy steps:
1. Are the results valid?
   – Appropriate spectrum of patients?
   – Does everyone get the gold standard?
   – Is there an independent, blind, or objective comparison with the gold standard?
2. Are the results important?
3. Can the results be applied to my patient?
Appropriate spectrum of patients?
An example: Prospective Validation of the Pediatric Appendicitis Score, Goldman et al., 2008
Who are the patients being screened with the PAS? (Abstract, Methods pg. 279)
– The study should include the full spectrum of manifestation of the illness (e.g., early and late)
– The study should include patients with illnesses commonly included in the differential
Reference standard applied (does everyone get the gold standard)?
An example: Prospective Validation of the Pediatric Appendicitis Score, Goldman et al., 2008
Did all patients receive CT? (Methods Section, pg. 279)
Investigators often forgo the reference standard when the diagnostic test is negative (this may require a period of follow-up, with criteria for need for treatment)
Is there an independent, blind, or objective comparison to the gold standard?
An example: Screening for Urinary Tract Infections in Infants in the Emergency Department: Which Test Is Best, Shaw et al., 1998
What is the diagnostic test and what is the reference (gold) standard? (Abstract objectives, Methods, and Table 1)
– Subjects should have both the diagnostic test in question and the reference standard
– Be vigilant about the reference standard
  • The results of one test should not be known to (or bias the interpretation of) those reading the other
Appraising articles on diagnosis in 3 easy steps:
Are the results important?
– Sensitivity, Specificity
– Predictive Values
– ROC Curves
– Likelihood Ratios
Can the results be applied to my patient?
Sensitivity and Specificity
• Sensitivity is the proportion of true positives that are correctly identified by a test or measure (e.g., the percent of sick people correctly identified as having the condition)
  – Ex: If 100 patients known to have a disease were tested, and 43 test positive, then the test has 43% sensitivity.
• Specificity is the proportion of true negatives that are correctly identified by the test (e.g., the percent of healthy people correctly identified as not having the condition)
  – Ex: If 100 patients with no disease are tested and 96 return a negative result, then the test has 96% specificity.
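The two definitions above can be sketched in a few lines of Python (not part of the original slides; the counts are the slide's own worked examples, and the 57/4 splits are the implied remainders of each group of 100):

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Compute sensitivity and specificity from 2x2 table counts."""
    sensitivity = tp / (tp + fn)  # true positives / all with disease
    specificity = tn / (tn + fp)  # true negatives / all without disease
    return sensitivity, specificity

# Slide examples: 43 of 100 diseased test positive; 96 of 100 healthy test negative
sens, spec = sensitivity_specificity(tp=43, fn=57, tn=96, fp=4)
print(sens, spec)  # 0.43 0.96
```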
ROC Curves
– Show the tradeoff between sensitivity and specificity
– The closer the curve follows the left-hand border and then the top border of the ROC space, the more accurate the test
– The closer the curve comes to the 45-degree diagonal of the ROC space, the less accurate the test
– The area under the curve is a measure of test accuracy
Area Under ROC Curve (AUC)
– An overall measure of test performance
– Comparisons between two tests are based on differences between (estimated) AUCs
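As an illustration of how an AUC is computed (not from the slides; the curves here are hypothetical two-point and three-point ROC curves), the area under a piecewise-linear ROC curve is just the trapezoidal rule applied to (FPR, TPR) points:

```python
def auc_trapezoid(fpr, tpr):
    """Area under a piecewise-linear ROC curve (points sorted by FPR)."""
    area = 0.0
    for i in range(1, len(fpr)):
        # trapezoid: width in FPR times average TPR height
        area += (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
    return area

perfect = auc_trapezoid([0.0, 0.0, 1.0], [0.0, 1.0, 1.0])  # 1.0 (best test)
useless = auc_trapezoid([0.0, 1.0], [0.0, 1.0])            # 0.5 (45-degree diagonal)
```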
ROC Curve Extremes
[Figure: two ROC plots of True Positive Rate vs. False Positive Rate. Best test: the diseased and healthy distributions don't overlap at all, so the curve hugs the top-left corner. Worst test: the distributions overlap completely, so the curve follows the 45-degree diagonal.]
AUC for ROC Curves
[Figure: four ROC plots of True Positive Rate vs. False Positive Rate, with AUC = 100%, 90%, 65%, and 50%.]
An example: Prospective Validation of the Pediatric Appendicitis Score, Goldman et al., 2008
What is the total area under the ROC curve for the PAS? (Results pg. 279, Figure 1)
Patients and clinicians have a different question…
Positive and Negative Predictive Values
• Positive predictive value is the probability that a patient with a positive test result really does have the condition for which the test was conducted.
• Negative predictive value is the probability that a patient with a negative test result really is free of the condition for which the test was conducted.
• Predictive values give a direct assessment of the usefulness of the test in practice
  – They are influenced by the prevalence of disease in the population being tested
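The prevalence dependence is easy to demonstrate numerically. This sketch is not from the slides; the 90%/95% test characteristics and the two prevalence settings are hypothetical, chosen only to show how the same test's PPV collapses in a low-prevalence population:

```python
def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV via Bayes' theorem, given disease prevalence."""
    tp = sensitivity * prevalence              # true positive fraction
    fp = (1 - specificity) * (1 - prevalence)  # false positive fraction
    tn = specificity * (1 - prevalence)        # true negative fraction
    fn = (1 - sensitivity) * prevalence        # false negative fraction
    return tp / (tp + fp), tn / (tn + fn)

# Same test (90% sensitive, 95% specific) in two populations:
ppv_high, _ = predictive_values(0.90, 0.95, prevalence=0.20)  # ~0.82
ppv_low, _ = predictive_values(0.90, 0.95, prevalence=0.01)   # ~0.15
```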
Pre- and Post-Test Probability
A solution to the deficiencies of sensitivity/specificity and predictive values?
Positive Likelihood Ratio
The probability of an individual with the condition having a positive test, divided by the probability of an individual without the condition having a positive test.
A helpful test will have a large LR+:
LR(positive) = sensitivity / (1 − specificity)
Negative Likelihood Ratio
The probability of an individual with the condition having a negative test, divided by the probability of an individual without the condition having a negative test.
A helpful test will have a small LR−:
LR(negative) = (1 − sensitivity) / specificity
LR = 1: no diagnostic value
LR < 0.1: strong negative test result
LR > 10: strong positive test result
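Both ratios follow directly from sensitivity and specificity. A minimal sketch (not from the slides; the 90%/80% test characteristics are hypothetical values for illustration):

```python
def likelihood_ratios(sensitivity, specificity):
    """LR+ = sens / (1 - spec); LR- = (1 - sens) / spec."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg

# Hypothetical test: 90% sensitive, 80% specific
lr_pos, lr_neg = likelihood_ratios(0.90, 0.80)
# lr_pos = 4.5 (moderately useful positive result)
# lr_neg = 0.125 (fairly strong negative result)
```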
Likelihood Nomogram
? Appendicitis: McBurney tenderness, LR+ = 3.4
Pre-test probability 5% → post-test probability ~20%
From: J Gen Intern Med. 2002 August; 17(8): 647–650. doi: 10.1046/j.1525-1497.2002.10750.x
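The nomogram is a graphical shortcut for the underlying odds arithmetic: convert probability to odds, multiply by the LR, convert back. A sketch (not from the slides), using the slide's own appendicitis example; note the exact arithmetic gives about 15%, while a nomogram is read approximately:

```python
def post_test_probability(pre_test_prob, lr):
    """Probability -> odds, multiply by the likelihood ratio, odds -> probability."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

# Slide example: 5% pre-test probability, McBurney tenderness LR+ = 3.4
p = post_test_probability(0.05, 3.4)  # ~0.15 by exact calculation
```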
Values between 0 and 1 decrease the probability of disease; values greater than 1 increase it.

  Likelihood Ratio   Approximate Change in Probability (%)*
  0.1                −45
  0.2                −30
  0.3                −25
  0.4                −20
  0.5                −15
  1                    0
  2                  +15
  3                  +20
  4                  +25
  5                  +30
  6                  +35
  8                  +40
  10                 +45
Appraising articles on diagnosis in 3 easy steps:
Are the results important?
Can the results be applied to my patient?
– Can I do the test in my setting?
– Do the results apply to the mix of patients I see?
– Will the result change my management?
– What are the costs to the patient/health service?
Why is understanding prognosis important?
Appraising articles on prognosis in 3 easy steps:
1. Are the results valid?
   – Was a sample of patients assembled at a common point in the course of their disease?
   – Was follow-up sufficient and complete?
   – Were the outcome criteria objective?
   – Was there adjustment for important prognostic factors?
2. Are the results important?
3. Can the results be applied to my patient?
Sample defined at a common point?
An example: Risk of epilepsy after febrile convulsions: a national cohort study, Verity & Golding, 1991
What kind of study is this? Was a defined, representative sample of patients assembled at a common point in the course of their disease? (Abstract)
Cohort studies are the best design: they follow patients with the disease over time.
Case-control studies are of limited value because of their weaker strength of inference.
Follow-up sufficient and complete?
An example: Risk of epilepsy after febrile convulsions: a national cohort study, Verity & Golding, 1991
How long were patients followed? Was this long enough to determine the outcome in question? (Abstract)
Outcome criteria objective?
An example: Risk of epilepsy after febrile convulsions: a national cohort study, Verity & Golding, 1991
What were the criteria applied for ascertaining the outcome? (Abstract)
Adjustment for important prognostic factors?
An example: Risk of epilepsy after febrile convulsions: a national cohort study, Verity & Golding, 1991
Were there subgroups to follow? (Abstract & Results)
Appraising articles on prognosis in 3 easy steps:
Are the results important?
– What is the risk of the outcome over time? (Odds Ratios, Relative Risk, Survival Curves)
– How precise are the estimates?
Can the results be applied to my patient?
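Odds ratios and relative risks both come from a 2x2 table of exposure against outcome. A minimal sketch (not from the slides; the 20/100 vs. 10/100 counts are hypothetical, chosen to make the two measures easy to compare):

```python
def risk_measures(a, b, c, d):
    """Relative risk and odds ratio from a 2x2 table:
         a = exposed with outcome,    b = exposed without outcome
         c = unexposed with outcome,  d = unexposed without outcome
    """
    rr = (a / (a + b)) / (c / (c + d))  # ratio of risks
    odds_ratio = (a * d) / (b * c)      # cross-product ratio of odds
    return rr, odds_ratio

# Hypothetical cohort: 20/100 exposed vs. 10/100 unexposed develop the outcome
rr, odds_ratio = risk_measures(20, 80, 10, 90)  # RR = 2.0, OR = 2.25
```

Note that the OR overstates the RR here; the two converge only when the outcome is rare.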
Survival Curves
[Figure: Post-transplant survival for patients with CF at two time periods. Liou et al. Am J Respir Crit Care Med 2005]
How precise are the estimates?
Confidence Intervals
• Around a rate
  – Gives the reader a sense of precision
  – Represents the range the test statistic would be expected to fall in if the study were repeated many times
  • Ex: A 95% CI means that 95 out of 100 times the statistic would fall within that range
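A simple way to see what a CI around a rate looks like (not from the slides; the 30-events-in-200-patients figure is hypothetical, and this uses the normal/Wald approximation rather than an exact method):

```python
import math

def proportion_ci(events, n, z=1.96):
    """Approximate 95% CI for a rate, using the normal (Wald) approximation."""
    p = events / n
    se = math.sqrt(p * (1 - p) / n)  # standard error of the proportion
    return p - z * se, p + z * se

# Hypothetical: 30 events in 200 patients -> rate 15%, CI roughly 10% to 20%
lo, hi = proportion_ci(30, 200)
```

A narrower interval (larger n) means a more precise estimate.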
Appraising articles on prognosis in 3 easy steps:
Are the results important?
Can the results be applied to my patient?
– Is my patient so different from those in the study that the results cannot apply?
– Will this evidence make a clinically important impact on my conclusions about what to offer my patients?
Screening
Basically, the process of deciding whether to screen follows this format:
1) Is the prevalence of the disease high enough in the target population to warrant the time and expense of screening?
2) Does a therapy exist which will significantly reduce the risk of disease?
3) If not, will early screening affect the duration/severity of the disease?
4) Is the screening test itself sufficiently sensitive to catch the disease so that treatment may progress?
When appraising articles, always consider validity, importance, and applicability.
Feeling confident in making the diagnosis and understanding prognosis helps determine whether to proceed with therapy.
Knowledge translation in this setting is the interpretation and integration of appraised and accepted evidence into clinical practice recommendations.