Chapter 2 - measures of disease


Statistical Methods for Analysis
of Diagnostic Accuracy Studies
Jon Deeks
University of Birmingham
with acknowledgement to Hans Reitsma
Measures of diagnostic accuracy
• Positive and negative predictive values
• Sensitivity and specificity
• Likelihood ratios
• Area under the ROC curve
• Diagnostic odds ratio
Diagnostic accuracy studies
• Results from the index test are compared with the results obtained with the reference standard on the same subjects
• Accuracy refers to the degree of agreement between the results of the index test and those from the reference standard
Basic Design
Series of patients → index test → reference standard → cross-classification
Clinical problem
• Diagnostic value of B-type natriuretic peptide (BNP) measurement
• Does BNP measurement distinguish between those with and without left ventricular dysfunction in the elderly?
• Smith et al. BMJ 2000; 320: 906.
Anatomy of a diagnostic study
• Target population: unscreened elderly
• Index test: BNP
• Target condition: left ventricular systolic dysfunction (LVSD)
• Final diagnosis (reference standard): echocardiography, with global and regional assessment of ventricular function including measurement of LV ejection fraction
Our example
Elderly patients → BNP measurement → echocardiography for LVSD → cross-classification
Results of BNP study
                LVSD present   LVSD absent   Total
BNP >=18.7      11 (TP)        50 (FP)        61
BNP <18.7        1 (FN)        93 (TN)        94
Total           12            143            155
Measures of test performance
• Sensitivity = Pr(T+|D+) = 11 / 12 = 92%
• Specificity = Pr(T-|D-) = 93 / 143 = 65%
Measures of test performance
• Positive predictive value = Pr(D+|T+) = 11 / 61 = 18%
• Negative predictive value = Pr(D-|T-) = 93 / 94 = 99%
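As a minimal sketch (Python; the class and method names are illustrative, not from any library), the four proportions can be computed directly from the cross-classification counts of the BNP study at the 18.7 cut-off:

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Cross-classification of index test result against reference standard."""
    tp: int  # test positive, target condition present
    fp: int  # test positive, target condition absent
    fn: int  # test negative, target condition present
    tn: int  # test negative, target condition absent

    def sensitivity(self) -> float:
        # Pr(T+|D+): share of diseased patients with a positive test
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Pr(T-|D-): share of non-diseased patients with a negative test
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        # Pr(D+|T+): share of test positives who have the condition
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        # Pr(D-|T-): share of test negatives who do not have the condition
        return self.tn / (self.tn + self.fn)


# BNP study, cut-off 18.7
bnp = TwoByTwo(tp=11, fp=50, fn=1, tn=93)
for name in ("sensitivity", "specificity", "ppv", "npv"):
    print(f"{name}: {getattr(bnp, name)():.0%}")
```

Sensitivity and specificity are computed within the columns (diseased, non-diseased), predictive values within the rows (test positive, test negative); that difference is what makes only the latter depend on prevalence.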
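Sensitivity and specificity are not directly affected by prevalence
(Hypothetical table: LVSD prevalence raised to 50% by enlarging the diseased group, with the same split of test results)
• Sensitivity = 131 / 143 = 92%
• Specificity = 93 / 143 = 65%
                LVSD present   LVSD absent   Total
BNP >=18.7      131             50            181
BNP <18.7        12             93            105
Total           143            143            286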
Predictive values are directly affected by prevalence
• Positive predictive value = 131 / 181 = 72%
• Negative predictive value = 93 / 105 = 89%
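A short sketch that recomputes the four measures for the observed table and for the hypothetical 50%-prevalence table; the helper name `accuracy` is illustrative. Because the second table only enlarges the diseased column while keeping its split of test results, sensitivity and specificity stay the same while both predictive values move:

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity, PPV and NPV for a 2x2 table."""
    return {
        "sens": tp / (tp + fn),  # column-based
        "spec": tn / (tn + fp),  # column-based
        "ppv": tp / (tp + fp),   # row-based, depends on prevalence
        "npv": tn / (tn + fn),   # row-based, depends on prevalence
    }


original = accuracy(tp=11, fp=50, fn=1, tn=93)    # 12 of 155 with LVSD (about 8%)
enriched = accuracy(tp=131, fp=50, fn=12, tn=93)  # 143 of 286 with LVSD (50%)

for key in ("sens", "spec", "ppv", "npv"):
    print(f"{key}: {original[key]:.0%} -> {enriched[key]:.0%}")
```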
Do sensitivity and specificity vary with prevalence?
• Test performance is sometimes observed to be different in different settings, patient groups, etc.
• Occasionally attributed to differences in disease prevalence, but:
– diseased and non-diseased spectrums differ as well.
• e.g. using a test in primary care and secondary care referrals
– the diseased group are different (cases more difficult)
– the non-diseased group are different (conditions more similar)
– sensitivity may decrease, specificity certainly decreases
Likelihood ratios
• Why likelihood ratios?
• Applicable in situations with more than 2 test outcomes
• Direct link from pre-test probabilities to post-test probabilities
Likelihood ratios
• Information value of a test result expressed as a likelihood ratio
$$LR^{+} = \frac{\Pr(T^{+} \mid D^{+})}{\Pr(T^{+} \mid D^{-})} = \frac{11/12}{50/143} = 2.6$$
Likelihood ratio of a positive test
• How much more often a positive test result occurs in persons with the target condition compared with those without it
$$LR^{+} = \frac{\Pr(T^{+} \mid D^{+})}{\Pr(T^{+} \mid D^{-})}$$
Likelihood ratios
• Likelihood ratio of a negative test result
$$LR^{-} = \frac{\Pr(T^{-} \mid D^{+})}{\Pr(T^{-} \mid D^{-})}$$
• How much less likely a negative test result is in persons with the target condition compared with those without the target condition
Likelihood ratios
$$LR^{-} = \frac{\Pr(T^{-} \mid D^{+})}{\Pr(T^{-} \mid D^{-})} = \frac{1/12}{93/143} = 0.13$$
Calculate likelihood ratios from column percentages
                LVSD present   LVSD absent   LR
BNP >=18.7      91.67%         34.97%        2.62
BNP <18.7        8.33%         65.03%        0.13
Total           100%           100%
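The same arithmetic in code, as a sketch assuming a plain 2x2 table (`likelihood_ratios` is an illustrative name). Dividing sensitivity by 1 - specificity is the same as dividing the two column percentages in the positive row:

```python
def likelihood_ratios(tp: int, fp: int, fn: int, tn: int):
    """Return (LR+, LR-) for a dichotomised test."""
    sens = tp / (tp + fn)         # Pr(T+|D+)
    spec = tn / (tn + fp)         # Pr(T-|D-)
    lr_pos = sens / (1 - spec)    # Pr(T+|D+) / Pr(T+|D-)
    lr_neg = (1 - sens) / spec    # Pr(T-|D+) / Pr(T-|D-)
    return lr_pos, lr_neg


lr_pos, lr_neg = likelihood_ratios(tp=11, fp=50, fn=1, tn=93)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```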
Interpreting likelihood ratios
• An LR of 1 indicates no diagnostic value
• An LR+ above 10 is usually regarded as a ‘strong’ positive test result
• An LR- below 0.1 is usually regarded as a strong negative test result
• But it depends on what change in probability is needed to make a diagnosis
[Figure: change from pre-test probabilities of 10% and 50% after a test result with LR+ = 10]
Advantages of likelihood ratios
• Still useful when there are more than 2 test outcomes
BNP is a continuous measurement
• Dichotomisation of BNP (high vs. low) means loss of information
• Higher values of BNP are more indicative of LVSD
Results of BNP study
                LVSD present   LVSD absent   Total
BNP >=26.7        9              28            37
BNP 18.7-26.7     2              22            24
BNP <18.7         1              93            94
Total            12             143           155
Likelihood ratios
• Stratum-specific likelihood ratios in case of more than 2 test results
$$LR(T = x) = \frac{\Pr(T = x \mid D^{+})}{\Pr(T = x \mid D^{-})}$$
Compute LR from column percentages
                LVSD present   LVSD absent   LR
BNP >=26.7        75%            20%          3.83
BNP 18.7-26.7     17%            15%          1.08
BNP <18.7          8%            65%          0.13
Total            100%           100%
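Stratum-specific likelihood ratios follow the same pattern, one ratio of column percentages per BNP band. A sketch using the counts from the three-band BNP table (the dictionary layout is an assumption for illustration):

```python
# Counts per BNP band: (with LVSD, without LVSD)
strata = {
    ">=26.7": (9, 28),
    "18.7-26.7": (2, 22),
    "<18.7": (1, 93),
}
n_dis = sum(d for d, _ in strata.values())       # 12 with LVSD
n_nondis = sum(nd for _, nd in strata.values())  # 143 without LVSD

stratum_lr = {}
for band, (d, nd) in strata.items():
    # LR(T = x) = Pr(T = x | D+) / Pr(T = x | D-): ratio of column percentages
    stratum_lr[band] = (d / n_dis) / (nd / n_nondis)
    print(f"{band:>10}: LR = {stratum_lr[band]:.2f}")
```

The middle band has an LR close to 1, so a result there barely changes the probability of LVSD; dichotomising at 18.7 would merge it with the informative top band.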
Bayes’ rule
Post-test odds for disease = pre-test odds for disease x likelihood ratio
Bayes’ rule
• Pre-test odds
– chance of disease expressed as odds
– example: if 2 out of 5 persons have the disease, the probability is 2/5 and the odds are 2/3
Bayes’ rule
• odds = probability / (1 – probability)
$$\text{Odds}(D^{+}) = \frac{\Pr(D^{+})}{1 - \Pr(D^{+})}$$
• probability = odds / (1 + odds)
$$\Pr(D^{+}) = \frac{\text{Odds}(D^{+})}{1 + \text{Odds}(D^{+})}$$
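The two conversions are inverses of each other. A minimal sketch (function names hypothetical), checked against the 2-out-of-5 example:

```python
def prob_to_odds(p: float) -> float:
    """Odds(D+) = Pr(D+) / (1 - Pr(D+))"""
    return p / (1 - p)


def odds_to_prob(odds: float) -> float:
    """Pr(D+) = Odds(D+) / (1 + Odds(D+))"""
    return odds / (1 + odds)


pre_odds = prob_to_odds(2 / 5)   # 2 out of 5 with the disease
print(pre_odds)                  # odds of 2/3, about 0.667
print(odds_to_prob(pre_odds))    # back to a probability of about 0.4
```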
Bayes’ rule
patient with BNP >=26.7
• Pre-test probability = 0.5
• Pre-test odds = 0.5 / (1 - 0.5) = 1
• LR(BNP >=26.7) = 3.83
• Post-test odds = 1 x 3.83 = 3.83
• Post-test probability = 3.83 / (1 + 3.83) = 0.79
Bayes’ rule
patient with BNP lower than 18.7
• Pre-test probability = 0.5
• Pre-test odds = 0.5 / (1 - 0.5) = 1
• LR(BNP <18.7) = 0.13
• Post-test odds = 1 x 0.13 = 0.13
• Post-test probability = 0.13 / (1 + 0.13) = 0.12
Probability for LVSD after BNP
BNP           LR      Pre-test prob.   Post-test prob.
>=26.7        3.83    50%              79%
18.7-26.7     1.08    50%              52%
<18.7         0.13    50%              12%
[Figure: pre-test and post-test probabilities of LVSD shown on a probability scale, for pre-test probabilities of 50% and 5%]
Probability for LVSD after BNP
                      Post-test prob. for a pre-test prob. of
BNP           LR      5%        50%
>=26.7        3.83    17%       79%
18.7-26.7     1.08     5%       52%
<18.7         0.13     1%       12%
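Bayes' rule on the odds scale reproduces both columns of this table. A sketch, assuming the rounded stratum LRs from the slides as input (`post_test_probability` is an illustrative name):

```python
def post_test_probability(pre_test_prob: float, lr: float) -> float:
    pre_odds = pre_test_prob / (1 - pre_test_prob)  # probability -> odds
    post_odds = pre_odds * lr                       # Bayes' rule on the odds scale
    return post_odds / (1 + post_odds)              # odds -> probability


# Rounded stratum-specific LRs for BNP
stratum_lr = {">=26.7": 3.83, "18.7-26.7": 1.08, "<18.7": 0.13}

table = {
    pre: {band: post_test_probability(pre, lr) for band, lr in stratum_lr.items()}
    for pre in (0.05, 0.50)
}
for pre, row in table.items():
    cells = ", ".join(f"{band}: {prob:.0%}" for band, prob in row.items())
    print(f"pre-test {pre:.0%} -> {cells}")
```

The same LR of 3.83 lifts a 50% pre-test probability to 79% but a 5% one only to 17%, which is why the interpretation of an LR depends on the starting probability.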
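Confidence intervals
• Sampling uncertainty should be described for all statistics, using confidence intervals
$$95\%\ \text{CI} = \hat{\theta} \pm z_{\alpha/2} \times \text{se}(\hat{\theta})$$
where $\hat{\theta}$ is the estimate of effect, $z_{\alpha/2}$ is the normal deviate (1.96 for a 95% CI) and $\text{se}(\hat{\theta})$ is the standard error of the estimate; + gives the upper limit and - gives the lower limit.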
Confidence Intervals for Proportions
• Sensitivity, specificity, positive and negative predictive values, and overall accuracy are all proportions
$$\hat{p} = \frac{r}{n}, \qquad \text{se}(\hat{p}) = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
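A minimal sketch of the asymptotic (Wald) interval built from these two formulas; `wald_ci` is a hypothetical helper. Applied to the sensitivity of 11/12 it shows the weakness of the approximation near 100%: the upper limit goes past 1.

```python
import math


def wald_ci(r: int, n: int, z: float = 1.96):
    """Asymptotic CI: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = r / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


for r, n in [(11, 12), (1, 20)]:
    lo, hi = wald_ci(r, n)
    print(f"{r}/{n}: {r / n:.0%} ({lo:.0%} to {hi:.0%})")
```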
Exact or Asymptotic CI?
• Asymptotic CIs are approximations
• Inappropriate when
– the proportion is near 0% or near 100%
– sample sizes are small
(exact confidence intervals are not symmetric in these cases)
• Preferable to use exact binomial methods
– can be computed in many statistics packages
– or refer to tables
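One way to get exact binomial (Clopper-Pearson) limits without a statistics package is to solve the two binomial tail equations numerically. The sketch below uses bisection on the binomial CDF from Python's standard library; the helper names are hypothetical, and summing terms with `math.comb` is adequate for moderate n such as the ones in this study but not for very large n. For r = 0 this two-sided version gives an upper limit of about 17% for 0/20; the 14% in the comparison table appears to correspond to the one-sided bound 1 - 0.05^(1/20), a convention some sources use when r = 0.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """Pr(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def solve_increasing(g, target: float, iters: int = 60) -> float:
    """Bisection on [0, 1] for p with g(p) = target, assuming g increases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid  # root lies above mid
        else:
            hi = mid  # root lies at or below mid
    return (lo + hi) / 2


def clopper_pearson(r: int, n: int, alpha: float = 0.05):
    """Two-sided exact (Clopper-Pearson) CI for a binomial proportion r/n."""
    # Lower limit solves Pr(X >= r | p) = alpha/2; set to 0 when r == 0
    lower = 0.0 if r == 0 else solve_increasing(
        lambda p: 1 - binom_cdf(r - 1, n, p), alpha / 2)
    # Upper limit solves Pr(X <= r | p) = alpha/2; set to 1 when r == n
    upper = 1.0 if r == n else solve_increasing(
        lambda p: 1 - binom_cdf(r, n, p), 1 - alpha / 2)
    return lower, upper


for r, n in [(0, 20), (1, 20), (10, 20), (11, 12)]:
    lo, hi = clopper_pearson(r, n)
    print(f"{r}/{n}: ({lo:.1%} to {hi:.1%})")
```

Unlike the Wald interval, these limits always stay within 0% to 100% and are asymmetric around the estimate when it is near either boundary.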
Comparison of Asymptotic and Exact Methods
                  95% confidence intervals
r/n      p        Asymptotic        Exact
0/20     0%       not calculable    (0% to 14%)
1/20     5%       (-5% to 15%)      (0% to 25%)
2/20     10%      (-3% to 23%)      (1% to 32%)
3/20     15%      (-1% to 31%)      (3% to 38%)
4/20     20%      (2% to 38%)       (6% to 44%)
5/20     25%      (6% to 44%)       (9% to 49%)
6/20     30%      (10% to 50%)      (12% to 54%)
7/20     35%      (14% to 56%)      (15% to 59%)
8/20     40%      (19% to 61%)      (19% to 64%)
9/20     45%      (23% to 67%)      (23% to 68%)
10/20    50%      (28% to 72%)      (27% to 73%)
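Confidence Intervals for Ratios of Probabilities and Odds
• Likelihood ratios are ratios of probabilities
$$RR = \frac{r_2/n_2}{r_1/n_1}, \qquad \text{se}(\ln RR) = \sqrt{\frac{1}{r_1} + \frac{1}{r_2} - \frac{1}{n_1} - \frac{1}{n_2}}$$
• Odds ratios are ratios of odds
$$OR = \frac{r_2/(n_2 - r_2)}{r_1/(n_1 - r_1)}, \qquad \text{se}(\ln OR) = \sqrt{\frac{1}{r_1} + \frac{1}{r_2} + \frac{1}{n_1 - r_1} + \frac{1}{n_2 - r_2}}$$
• The CI is computed on the log scale and back-transformed, e.g. $\exp\big(\ln RR \pm z_{\alpha/2} \times \text{se}(\ln RR)\big)$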
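A sketch of both log-scale intervals (helper names hypothetical). With the BNP stratum counts it reproduces the LR intervals reported for the study. As an additional illustration, not reported in the slides, it also gives the diagnostic odds ratio at the 18.7 cut-off, treating the diseased group as group 2 (r2 = TP, n2 - r2 = FN) and the non-diseased group as group 1 (r1 = FP, n1 - r1 = TN):

```python
import math


def ratio_of_proportions_ci(r2, n2, r1, n1, z=1.96):
    """RR = (r2/n2) / (r1/n1) with a CI computed on the log scale."""
    rr = (r2 / n2) / (r1 / n1)
    se_log = math.sqrt(1 / r1 + 1 / r2 - 1 / n1 - 1 / n2)
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)


def odds_ratio_ci(r2, n2, r1, n1, z=1.96):
    """OR = [r2/(n2-r2)] / [r1/(n1-r1)] with a CI computed on the log scale."""
    or_ = (r2 / (n2 - r2)) / (r1 / (n1 - r1))
    se_log = math.sqrt(1 / r1 + 1 / r2 + 1 / (n1 - r1) + 1 / (n2 - r2))
    return or_, or_ * math.exp(-z * se_log), or_ * math.exp(z * se_log)


# Stratum LRs: (with LVSD in band, 12) versus (without LVSD in band, 143)
for band, d, nd in [(">=26.7", 9, 28), ("18.7-26.7", 2, 22), ("<18.7", 1, 93)]:
    print("LR(%s) = %.2f (%.2f, %.2f)" % (band, *ratio_of_proportions_ci(d, 12, nd, 143)))

# Diagnostic odds ratio at cut-off 18.7: TP=11, FN=1; FP=50, TN=93
print("DOR = %.1f (%.1f, %.1f)" % odds_ratio_ci(11, 12, 50, 143))
```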
CIs for study
• Sensitivity = 92% (62%, 100%)
• Specificity = 65% (57%, 73%)
• PPV = 18% (9%, 30%)
• NPV = 99% (94%, 100%)
• LR(>=26.7) = 3.8 (2.4, 6.1)
• LR(18.7-26.7) = 1.1 (0.3, 4.1)
• LR(<18.7) = 0.13 (0.02, 0.84)
ROC-curve
• ROC stands for Receiver Operating Characteristic
• The ROC curve shows the pairs of sensitivity and specificity that correspond to various cut-off points for the continuous test result
Continuous diagnostic test results
[Figure: distributions of test results in the non-diseased and diseased groups; a diagnostic threshold divides them into TN, FN, FP and TP. Sensitivity = 94%, specificity = 94%]
Heterogeneity in Threshold
[Figure: same distributions with the diagnostic threshold moved. Sensitivity = 71%, specificity = 99%]
[Figure: sensitivity = 86%, specificity = 97%]
[Figure: sensitivity = 94%, specificity = 94%]
[Figure: sensitivity = 86%, specificity = 97%]
[Figure: sensitivity = 99%, specificity = 71%]
Threshold effects
[Figure: fetal fibronectin for predicting spontaneous birth; sensitivity plotted against specificity, both from 0 to 1]
• Decreasing the threshold increases sensitivity but decreases specificity
• Increasing the threshold increases specificity but decreases sensitivity
Change in cut-off value and effect on sens & spec
Cut-off    Sensitivity    Specificity
9999          0%            100%
26.7         75%             80%
19.8         83%             70%
18.7         92%             65%
0           100%              0%
ROC-curve BNP
[Figure: ROC curve for BNP, sensitivity against 1-specificity (0% to 100%), with the cut-offs 18.7, 19.8 and 26.7 marked]
ROC curve
• Shows the effect of different cut-off values on sensitivity and specificity
• Better tests have curves that lie closer to the upper left corner
• The area under the ROC curve is a single measure of test performance (higher is better)
• Shape
– RAW continuous data gives steps
– GROUPED data gives straight sloping lines
– FITTED ROC curves are smoothed
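A rough sketch of the area under the ROC curve using the trapezoidal rule, which corresponds to joining grouped points with straight lines. It uses only the five rounded (sensitivity, specificity) pairs from the BNP cut-off table, so the result (about 0.82) is an approximation for illustration; the slides do not report an AUC for BNP.

```python
# (cut-off, sensitivity, specificity) from the BNP cut-off table (rounded)
points = [
    (9999, 0.00, 1.00),
    (26.7, 0.75, 0.80),
    (19.8, 0.83, 0.70),
    (18.7, 0.92, 0.65),
    (0, 1.00, 0.00),
]

# ROC coordinates: x = 1 - specificity, y = sensitivity, ordered along the x-axis
roc = sorted((1 - spec, sens) for _, sens, spec in points)

# Trapezoidal rule: area under straight lines joining consecutive points
auc = sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(roc, roc[1:]))
print(f"AUC ~= {auc:.2f}")
```

With more cut-offs (or the raw continuous values) the stepped or smoothed curve would give a more precise area; a test with no diagnostic value would lie on the diagonal, with an area of 0.5.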
Variation in diagnostic threshold
• At what level is a test result categorised as +ve, and how should the threshold be selected?
• The threshold affects the performance of the test, as described by ROC curves and likelihood ratios
• The choice of threshold depends on
– disease prevalence (affects +ve and -ve predictive values)
– relative costs of false positive and false negative misdiagnoses
– relative benefits of true positive and true negative diagnoses
Workshop exercise – erratum
• Q16 page 8
Compute post-test probabilities for a
high risk patient, pre-test prob=50%
Q19 page 10
LVSD
MI or BNP
+ve
-ve
+ve
-ve
36
63
4
40
23
86