Statistical Aspects of Diagnostic Tests
R. M. Pandey
Appearances to the mind are of four kinds:
1. Things either are what they appear to be (TP);
2. Or they neither are, nor appear to be (TN);
3. Or they are, and do not appear to be (FN);
4. Or they are not, yet appear to be (FP).
Rightly to aim in all these cases is the wise man's task.
– Epictetus, 2nd century AD
• Diagnostic tests: to predict the disease (condition)
• Prognostic tests: to predict the outcome of a disease (condition)
SIMPLIFYING DATA:
• Clinical measurements: nominal, ordinal, interval scales
THE ACCURACY OF A TEST RESULT:
• Establishing a diagnosis is an imperfect process: it yields a probability rather than a certainty
The “Gold Standard”:
• What is a gold standard?
• Tissue diagnosis, radiological contrast procedures, prolonged follow-up, autopsies
• Almost always more costly and less feasible
• Lack of objective standards for some diseases (e.g. angina pectoris: the gold standard is careful history taking)
• Consequences of imperfect standards
Diagnostic Characteristics
Not a hypothesis-testing situation, BUT:
– How well does the test identify patients with a disease?
– How well does the test identify patients without a disease?
Evaluation of the Diagnostic Test
• Give a group of people (with and without the disease) both tests (the candidate test and the “gold standard” test), then cross-classify the results and report the diagnostic characteristics of the test.
• Spectrum of patients
• Bias (e.g. X-rays)
• Chance (compute an adequate sample size, as in prevalence studies, separately for cases and non-cases):
n = 4 p (100 - p) / d²
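A minimal sketch of this sample-size calculation (assuming p is the expected proportion in percent, e.g. the anticipated sensitivity for cases or specificity for non-cases, d is the desired absolute precision in percentage points, and the factor 4 approximates z² at the 95% confidence level):

```python
import math

def sample_size(p_percent: float, d_percent: float) -> int:
    """n = 4 p (100 - p) / d^2, with p and d in percent; 4 ~ z^2 for 95% confidence."""
    n = 4 * p_percent * (100 - p_percent) / d_percent ** 2
    return math.ceil(n)

# Hypothetical example: anticipated sensitivity 90%, precision +/- 5 percentage points
print(sample_size(90, 5))   # 144 cases needed
# Repeat separately for non-cases using the anticipated specificity.
```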
PREDICTIVE VALUE
• Definitions
• Determinants of predictive value
                          Truth or Gold Standard
                              +              -
Candidate Test    +        a (TP)         b (FP)
                  -        c (FN)         d (TN)
Probability
• The concept: If a trial (or experiment) is
independently repeated a large number of
times (N) and an outcome (A) occurs n
times, then:
– P(A) = n/N
– Interpretation: if the trial is repeated again in
the future, the likelihood that A will be the
outcome again is n/N.
Diagnostic Characteristics
Sensitivity: The probability that a diseased
individual will be identified as such by the test
= P(T+ / D+) = a/(a+c)
Specificity: The probability that an individual
without the disease will be identified as
such by the test
= P(T- / D-) = d/(b+d)
• A perfect test would have b and c equal to 0.
Diagnostic Characteristics
• False positive rate = P(T+ / D-)
= b/(b+d)
= 1 – Specificity
• False negative rate = P(T- / D+)
= c/(a+c)
= 1 – Sensitivity
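A minimal sketch (using the cell labels a, b, c, d from the 2 x 2 table above) of how these four characteristics are computed:

```python
def diagnostic_characteristics(a: int, b: int, c: int, d: int) -> dict:
    """a = TP, b = FP, c = FN, d = TN (candidate test cross-classified against the gold standard)."""
    sensitivity = a / (a + c)   # P(T+ | D+)
    specificity = d / (b + d)   # P(T- | D-)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false positive rate": 1 - specificity,   # b / (b + d)
        "false negative rate": 1 - sensitivity,   # c / (a + c)
    }

# Counts from the saliva pregnancy example further below:
print(diagnostic_characteristics(a=95, b=15, c=5, d=85))
```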
Predictive Values of Diagnostic Tests
• More informative from the patient's or physician's perspective
• Special applications of Bayes' theorem
Predictive Values of Diagnostic Tests
• Positive Predictive Value
= P(D+ / T+) = a/(a+b)
if the prevalence of disease in the general
population is the same as the prevalence of
disease in the study
Predictive Values of Diagnostic Tests
• Negative Predictive Value
= P(D- / T-) = d/(c+d)
if the prevalence of disease in the general
population is the same as the prevalence of
disease in the study
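A minimal sketch (using the 2 x 2 cell labels a, b, c, d defined earlier, and assuming the study prevalence matches that of the target population) of both predictive values computed directly from the table counts:

```python
def predictive_values_from_counts(a: int, b: int, c: int, d: int):
    """a = TP, b = FP, c = FN, d = TN."""
    ppv = a / (a + b)   # P(D+ | T+)
    npv = d / (c + d)   # P(D- | T-)
    return ppv, npv

# Counts from the Group 1 table further below:
print(predictive_values_from_counts(a=380, b=90, c=20, d=510))  # (~0.81, ~0.96)
```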
Example:
A researcher develops a new saliva pregnancy test. She collects samples from 100 women known to be pregnant by blood test (the gold standard) and 100 women known not to be pregnant, also based on the same blood test.
The saliva test is “positive” in 95 of the pregnant women. It is also “positive” in 15 of the 100 non-pregnant women. What are the sensitivity and specificity?
                         Gold standard
Test            Pregnant    Non-pregnant    Totals
Saliva +           95            15           110
Saliva -            5            85            90
Totals            100           100           200

Sensitivity = TP/(TP+FN) = 95/100 = 95%
Specificity = TN/(TN+FP) = 85/100 = 85%
Is it more important that a test be sensitive or specific?
• It depends on its purpose. A cheap mass screening test should be sensitive (few cases missed). A test designed to confirm the presence of disease should be specific (few cases wrongly diagnosed).
• Note that sensitivity and specificity are two distinct properties. Where classification is based on a cut point along a continuum, there is a tradeoff between the two.
Example:
The saliva pregnancy test detects progesterone
(a pregnancy-related hormone). A refined
version is developed.
Suppose you add a drop of indicator solution
to the saliva sample. It can stay clear (0
reaction) or turn green (1+), red (2+), or
black (3+).
The researcher conducts a validation study and finds
the following:
                Pregnant    Non-pregnant    Totals
Saliva 3+          85             5            90
Saliva 2+          10            10            20
Saliva 1+           3            17            20
Saliva 0            2            68            70
Totals            100           100           200
The sensitivity and specificity of the saliva test will depend on the definition of “positive” and “negative” used.
• If “positive” ≥ 1+, sensitivity = (85+10+3)/100 = 98%, specificity = 68/100 = 68%
• If “positive” ≥ 2+, sensitivity = (85+10)/100 = 95%, specificity = (68+17)/100 = 85%
• If “positive” = 3+, sensitivity = 85/100 = 85%, specificity = (68+17+10)/100 = 95%
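A minimal sketch (assuming the graded counts from the validation table above, ordered from 3+ down to 0) of the sensitivity and specificity at each possible cutpoint:

```python
# Counts per grade for each group, ordered from the strongest reaction (3+) down to 0.
pregnant = [85, 10, 3, 2]        # gold-standard positives at 3+, 2+, 1+, 0
non_pregnant = [5, 10, 17, 68]   # gold-standard negatives at 3+, 2+, 1+, 0

for k, cutpoint in enumerate(["3+", "2+", "1+"], start=1):
    # "Positive" means a reaction at this cutpoint or stronger.
    tp = sum(pregnant[:k])
    fp = sum(non_pregnant[:k])
    sens = tp / sum(pregnant)
    spec = 1 - fp / sum(non_pregnant)
    print(f"positive >= {cutpoint}: sensitivity {sens:.0%}, specificity {spec:.0%}")
# positive >= 3+: sensitivity 85%, specificity 95%
# positive >= 2+: sensitivity 95%, specificity 85%
# positive >= 1+: sensitivity 98%, specificity 68%
```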
The choice of cutpoint depends on the relative adverse consequences of false negatives vs. false positives.
If it is most important not to miss anyone, use a cutpoint with higher sensitivity and lower specificity.
If it is most important that people not be erroneously labeled as having the condition, use a cutpoint with lower sensitivity and higher specificity.
Key points:
• The positive and negative predictive values depend on the
pretest probability of the condition of interest - in addition to
the sensitivity and specificity of the test.
• This pretest probability is often the prevalence of the condition
in the population of interest.
• But it can also reflect restriction of this population based on
clinical features and/or other test results.
• For example, the pretest probability of pregnancy will be very
different among young women using oral contraceptives from
that among sexually active young women using no form of
contraception.
Example: The saliva pregnancy test is administered 30 days after
the first day of the last menstrual period to two groups of women
who have thus far “missed” a period.
Group 1: 1000 sexually active young women using no
contraception. Pretest probability of pregnancy 40%
(hypothetical)
Based on sensitivity of 95%, expected TP = 400 x 0.95 = 380
expected FN = 400-380 = 20
Based on specificity of 85%, expected TN = 600 x 0.85 = 510
expected FP = 600-510 = 90
             Pregnant    Non-pregnant    Totals
Test +          380            90           470
Test -           20           510           530
Totals          400           600          1000
Positive predictive value = TP/(TP+FP) = 380/470 = 81%
In this context, a woman with a positive saliva test has an 81% chance of being pregnant.
Negative predictive value = TN/(TN+FN) = 510/530 = 96%
In this context, a woman with a negative saliva test has a 96% chance of not being pregnant (and a 4% chance of being pregnant).
Group 2: 1000 oral contraceptive users - pretest probability of
pregnancy = 10% (hypothetical)
Using sensitivity = 95%, expected TP = 0.95 x 100 = 95; expected FN = 100 - 95 = 5
Using specificity = 85%, expected TN = 0.85 x 900 = 765; expected FP = 900 - 765 = 135

             Pregnant    Non-pregnant    Totals
Test +           95           135           230
Test -            5           765           770
Totals          100           900          1000

In this context, the positive predictive value is only 95/230 = 41% [TP/(TP+FP)]
The negative predictive value is [TN/(TN+FN)] = 765/770 = 99%
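A minimal sketch (assuming sensitivity 95% and specificity 85%, as above) showing how the predictive values depend on the pretest probability, via Bayes' theorem:

```python
def predictive_values(pretest: float, sensitivity: float, specificity: float):
    """PPV and NPV from the pretest probability, sensitivity and specificity."""
    tp = pretest * sensitivity
    fp = (1 - pretest) * (1 - specificity)
    fn = pretest * (1 - sensitivity)
    tn = (1 - pretest) * specificity
    return tp / (tp + fp), tn / (tn + fn)

for pretest in (0.40, 0.10):   # Group 1 and Group 2
    ppv, npv = predictive_values(pretest, 0.95, 0.85)
    print(f"pretest {pretest:.0%}: PPV {ppv:.0%}, NPV {npv:.0%}")
# pretest 40%: PPV 81%, NPV 96%
# pretest 10%: PPV 41%, NPV 99%
```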
In which situation is the saliva test more helpful?
Group 1 (pretest probability 40%):
PPV: 81% probability of pregnancy
NPV: 96% probability of no pregnancy
Group 2 (pretest probability 10%):
PPV: 41% probability of pregnancy
NPV: 99% probability of no pregnancy
Note that the same test would likely be used and
interpreted very differently in these two contexts.
• This does not imply any difference in the
characteristics of the test itself, i.e. sensitivity and
specificity are not altered by the pretest
probability of the condition of interest.
• Tests are most useful when the pretest probability is in a middle range. They are unlikely to be useful when the pretest probability is already very high or very low.
Likelihood Ratio
LRs are an alternate way of describing the performance of a diagnostic test.
Generic formula for LR*:
LR* = (probability of test result* in people with disease) / (probability of test result* in people without disease)
Or:
LR* = (probability of test result* in the disease of interest) / (probability of test result* in other disease(s))
Probability & Odds
D+ = disease present; D- = no disease
Probability of disease = D+ / (D+ + D-)
Odds of disease = D+ / D-
Example: P = 0.3 = 0.3/(0.3 + 0.7); Odds = 0.3/0.7 ≈ 0.43
Odds and Probability
Probability    Odds
50%            1
1/3            0.5
0.2            0.25
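A minimal sketch of the probability-to-odds conversion used in this table (odds = p / (1 - p)):

```python
def prob_to_odds(p: float) -> float:
    return p / (1 - p)

def odds_to_prob(odds: float) -> float:
    return odds / (1 + odds)

for p in (0.5, 1/3, 0.2):
    print(f"probability {p:.2f} -> odds {prob_to_odds(p):.2f}")
# probability 0.50 -> odds 1.00
# probability 0.33 -> odds 0.50
# probability 0.20 -> odds 0.25
```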
Uses of Likelihood Ratio
Post-test odds = Pre-test odds × Likelihood ratio
Likelihood Ratios: when a test has only two outcomes
When there are only two outcomes for a test (+ or -), the LRs can be calculated from the test's sensitivity and specificity:
LR+ = (probability of a + test in people with disease) / (probability of a + test in people without disease)
    = sensitivity / (1 - specificity)
LR- = (probability of a - test in people with disease) / (probability of a - test in people without disease)
    = (1 - sensitivity) / specificity
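A minimal sketch (using sensitivity 95% and specificity 85% from the saliva example, with a hypothetical pretest probability of 40%) of how the LRs convert a pretest probability into a post-test probability through the odds form of Bayes' theorem:

```python
def post_test_probability(pretest: float, sensitivity: float, specificity: float,
                          test_positive: bool = True) -> float:
    """Post-test probability via: post-test odds = pre-test odds x LR."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    lr = lr_pos if test_positive else lr_neg
    pre_odds = pretest / (1 - pretest)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

print(post_test_probability(0.40, 0.95, 0.85, test_positive=True))   # ~0.81
print(post_test_probability(0.40, 0.95, 0.85, test_positive=False))  # ~0.04
```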
Likelihood Ratios: Advantages
• Sensitivity and specificity are concepts that often waste information (e.g. when a graded result is reduced to a single cutpoint).
• Likelihood ratios best describe the value of diagnostic tests.
• Likelihood ratios are best for combinations of tests:
  (a) easier to calculate for combinations of independent tests (see the sketch below);
  (b) better estimates of probability when tests are not independent (an LR for each combination of test results).
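A minimal sketch (with hypothetical LR+ values of 4 and 3 for two conditionally independent tests) of the multiplicative combination noted in (a):

```python
def combined_post_test_probability(pretest: float, *likelihood_ratios: float) -> float:
    """For conditionally independent tests: post-test odds = pre-test odds x LR1 x LR2 x ..."""
    odds = pretest / (1 - pretest)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1 + odds)

# Hypothetical example: pretest probability 20%, two positive tests with LR+ of 4 and 3
print(combined_post_test_probability(0.20, 4, 3))  # 0.75
```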
Limitations and strategies:
Studies of diagnostic tests are susceptible to errors due to chance and bias.
Design-phase strategies should deal with these errors:
• Random errors (compute 95% CI; see the sketch below)
• Systematic errors
  – Sampling bias
  – Measurement bias
  – Reporting bias
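A minimal sketch (using the normal approximation for a proportion; an exact or Wilson interval could be substituted) of a 95% CI for an estimated sensitivity or specificity:

```python
import math

def ci_95(successes: int, n: int):
    """Approximate 95% CI for a proportion (normal approximation)."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p - 1.96 * se, p + 1.96 * se

# Sensitivity 95/100 from the saliva example:
low, high = ci_95(95, 100)
print(f"sensitivity 95% (95% CI {low:.0%} to {high:.0%})")  # roughly 91% to 99%
```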
Sampling Bias
A. Selection of cases and non-cases
• Selection of cases from a referral center leads to overestimation of sensitivity
• Selection of non-cases from volunteers leads to overestimation of specificity
• Therefore, the study sample should be representative of the target population in which the test would eventually be used
B. Prevalence (prior probability) of disease in the patients being studied
• A higher prevalence leads to overestimation of the predictive values
• Therefore, report predictive values at various prevalences
Measurement Bias
A. Non-blinding of raters
B. Borderline or unsatisfactory results
Reporting Bias (publication bias)
Steps in Planning a Study of a Diagnostic Test
1. Determine whether there is a need for a new diagnostic test
2. Describe the way in which the subjects will be selected
3. Choose a reasonable gold standard
4. Ensure that the gold standard and the diagnostic test are applied in a standardized and blinded manner
5. Estimate the sample size required for the desired 95% CI for sensitivity and specificity
6. Find a sufficient number of willing subjects to satisfy ‘n’ and the sampling criteria
7. Finally, report the results in terms of sensitivity and specificity, as well as PPV and NPV at different prior probabilities of disease
Diagnostic Tests and Screening
Readings:
• Jean Bourbeau, Dick Menzies, Kevin Schwartzman. Epidemiology and Biostatistics 679: Clinical Epidemiology, May 31-June 25, 2004. Respiratory Epidemiology and Clinical Research Unit, Montreal Chest Institute, K1, 3650 St. Urbain
• Fletcher, chapters 1 (Introduction), 3 (Diagnosis), 8 (Prevention)
• Barry MJ. Prostate-specific antigen testing for early diagnosis of prostate cancer. N Engl J Med 2001; 344:1373-1377 [Clinical Practice]
• Hamm CW et al. Emergency room triage of patients with acute chest pain by means of rapid testing for cardiac troponin T or troponin I. N Engl J Med 1997; 337:1648-53
• Scott Evans. Evaluation of Screening and Diagnostic Tests. Introduction to Biostatistics, Harvard Extension School, Fall 2004