Validity and reliability: screening and diagnostic tests


Lecture 2
Screening and diagnostic tests
• Normal and abnormal
• Validity: “gold” or criterion standard
• Sensitivity, specificity, predictive value
• Likelihood ratio
• ROC curves
• Bias: spectrum, verification, information
1
Clinical/public health
applications
• screening: for asymptomatic disease (e.g.,
Pap test, mammography)
• case-finding: testing of patients for diseases
unrelated to their complaint
• diagnostic: to help make diagnosis in
symptomatic disease or to follow-up on
screening test
2
Evaluation of screening and
diagnostic tests
• Performance characteristics
– test alone
• Effectiveness (on outcomes of disease):
– test + intervention
3
Criteria for test selection
• Reproducibility
• Validity
• Feasibility
• Simplicity
• Cost
• Acceptability
4
Sources of variation:
Biological or true variation
• between individuals
• within individuals (e.g., diurnal variation in
BP)
– “controlled” by standardizing time of
measurement
5
Sources of variation:
Measurement error
• random error vs systematic error (bias)
• method (measuring instrument)
• observer
6
7
Quality of measurements
• Validity (accuracy)
– Does it measure what it is intended to?
– Lack of bias
• Reproducibility (reliability, precision,
consistency) of measurements
8
Examples of types of
reproducibility
• Between and within observer (inter- and
intra-observer variation)
– May be random or systematic
• Regression toward the mean
– Systematic error when subjects have extreme
values (more likely to be in error than typical
values)
9
Validity (accuracy)
• Criterion validity
– concurrent
– predictive
• Face validity, content validity: judgement of the
appropriateness of content of measurement
• Construct validity: validity of underlying entity or
theoretical construct
10
Normal vs abnormal
• Statistical definition
– “Gaussian” or “normal” distribution
• Clinical definition
– using criterion
11
12
13
14
15
Selection of criterion
• Concurrent
– salivary screening test for HIV
– history of cough more than 2 weeks (for TB)
• Predictive
– APACHE (acute physiology and chronic
disease evaluation) instrument for ICU patients
– blood lipid level
– maternal height
16
                        "True" Disease Status
Screening test results  Present                Absent
Positive                "True positives" A     "False positives" B
Negative                "False negatives" C    "True negatives" D

Sensitivity of screening test = A / (A + C)
Specificity of screening test = D / (B + D)
Predictive value of positive test = A / (A + B)
Predictive value of negative test = D / (C + D)
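The four measures can be computed directly from the cell counts A, B, C and D. A minimal Python sketch follows; the function name and the counts in the usage lines are illustrative assumptions, not data from the lecture.

```python
def two_by_two_measures(a, b, c, d):
    """Performance measures from a 2x2 table.

    a: true positives, b: false positives,
    c: false negatives, d: true negatives.
    """
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "ppv": a / (a + b),  # predictive value of a positive test
        "npv": d / (c + d),  # predictive value of a negative test
    }


# Hypothetical counts, chosen only to illustrate the calculation
measures = two_by_two_measures(a=80, b=30, c=20, d=270)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

Sensitivity and specificity are column-wise proportions (within disease status), while the predictive values are row-wise proportions (within test result).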
17
Sensitivity and specificity
Assess correct classification of:
• People with the disease (sensitivity)
• People without the disease (specificity)
18
Predictive value
• More relevant to clinicians and patients
• Affected by prevalence
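The dependence on prevalence can be illustrated with Bayes' theorem. This is a sketch only: the function name is hypothetical, and the sensitivity (80%), specificity (90%) and prevalences are assumed values chosen for illustration.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Predictive value of a positive test via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Same hypothetical test applied at three assumed prevalences
for prevalence in (0.01, 0.10, 0.25):
    ppv = positive_predictive_value(prevalence, 0.80, 0.90)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.0%}")
```

With these assumed values the predictive value of a positive test rises from about 7% to 47% to 73% as prevalence increases, although sensitivity and specificity are unchanged.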
19
Choice of cut-point
If higher score increases probability of disease
• Lower cut-point:
– increases sensitivity, reduces specificity
• Higher cut-point:
– reduces sensitivity, increases specificity
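The trade-off can be seen by applying two cut-points to the same scores. A minimal sketch, assuming a score of at least the cut-point counts as test-positive; the function name and the scores are hypothetical.

```python
def sens_spec_at_cutpoint(diseased_scores, healthy_scores, cutpoint):
    """Treat score >= cutpoint as test-positive; return (sensitivity, specificity)."""
    true_pos = sum(score >= cutpoint for score in diseased_scores)
    true_neg = sum(score < cutpoint for score in healthy_scores)
    return true_pos / len(diseased_scores), true_neg / len(healthy_scores)


# Hypothetical scores; higher values are more likely in people with disease
diseased = [2, 3, 3, 4, 5, 5, 6, 6]
healthy = [0, 0, 1, 1, 2, 2, 3, 4]

low = sens_spec_at_cutpoint(diseased, healthy, cutpoint=2)
high = sens_spec_at_cutpoint(diseased, healthy, cutpoint=4)
print("cut-point 2:", low)
print("cut-point 4:", high)
```

In this made-up data, raising the cut-point from 2 to 4 lowers sensitivity from 1.0 to 0.625 and raises specificity from 0.5 to 0.875.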
20
Considerations in selection of
cut-point
Implications of false positive results
• burden on follow-up services
• labelling effect
Implications of false negative results
• Failure to intervene
21
Likelihood ratio
• Likelihood ratio (LR) = sensitivity / (1 - specificity)
• Used to compute post-test odds of disease
from pre-test odds:
post-test odds = pre-test odds x LR
• pre-test odds derived from prevalence
• post-test odds can be converted to
predictive value of positive test
22
Example of LR
• prevalence of disease in a population is 25%
• sensitivity is 80%
• specificity is 90%
• pre-test odds = 0.25 / (1 - 0.25) = 1/3
• likelihood ratio = 0.80 / (1 - 0.90) = 8
23
Example of LR
• If prevalence of disease in a population is
25%
• pre-test odds = 0.25 / (1 - 0.25) = 1/3
• post-test odds = 1/3 x 8 = 8/3
• predictive value of positive result = post-test odds / (1 + post-test odds)
= (8/3) / (1 + 8/3) = 8/(3 + 8) = 8/11 = 73%
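The arithmetic of this example can be checked with exact fractions. A short Python sketch using the slide's values (prevalence 25%, sensitivity 80%, specificity 90%); the variable names are illustrative.

```python
from fractions import Fraction

prevalence = Fraction(25, 100)
sensitivity = Fraction(80, 100)
specificity = Fraction(90, 100)

pre_test_odds = prevalence / (1 - prevalence)       # 1/3
likelihood_ratio = sensitivity / (1 - specificity)  # 8
post_test_odds = pre_test_odds * likelihood_ratio   # 8/3

# Convert odds back to a probability: odds / (1 + odds)
ppv = post_test_odds / (1 + post_test_odds)         # 8/11

print(pre_test_odds, likelihood_ratio, post_test_odds, ppv, f"{float(ppv):.0%}")
```

The same 8/11 is obtained from the 2x2 formula A / (A + B), since the odds form is just a rearrangement of Bayes' theorem.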
24
Receiver operating characteristic
(ROC) curve
• Evaluates test over range of cut-points
• Plot of sensitivity against 1-specificity
• Area under curve (AUC) summarizes
performance:
– AUC of 0.5 = no better than chance
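A ROC curve can be traced by computing sensitivity and 1 - specificity at every candidate cut-point, and the AUC estimated with the trapezoidal rule. The sketch below is one simple way to do this, not necessarily the method used in the studies discussed later; the function names and scores are hypothetical.

```python
def roc_points(diseased_scores, healthy_scores):
    """Return (1 - specificity, sensitivity) pairs, one per candidate cut-point."""
    cutpoints = sorted(set(diseased_scores) | set(healthy_scores), reverse=True)
    points = [(0.0, 0.0)]  # cut-point above every score: nobody tests positive
    for cut in cutpoints:
        sens = sum(s >= cut for s in diseased_scores) / len(diseased_scores)
        fpr = sum(s >= cut for s in healthy_scores) / len(healthy_scores)
        points.append((fpr, sens))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical scores; higher values are more likely in people with disease
diseased = [2, 3, 3, 4, 5, 5, 6, 6]
healthy = [0, 0, 1, 1, 2, 2, 3, 4]
auc = auc_trapezoid(roc_points(diseased, healthy))
print(f"AUC = {auc:.3f}")
```

The AUC equals the probability that a randomly chosen person with disease scores higher than a randomly chosen person without it, counting ties as one half.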
25
26
Spectrum bias
• Study population should be representative
of population in which test will be used
• Is range of subjects tested adequate?
– In population with low risk of outcome,
sensitivity will be lower, specificity higher
– In population with high risk of outcome,
sensitivity will be higher, specificity lower
• Comorbidity may affect sensitivity and
specificity
27
Verification bias
• results of test affect intensity of subsequent
investigation
• increasing probability of detection of
outcome in those with positive test result
28
Information bias
• Diagnosis is not blind to test result
• Inflates apparent test performance
29
Example: Screening seniors in the
emergency department (ED) for risk of
functional decline
• High risk group
• Many not adequately evaluated or referred
for appropriate services
• Development and validation of a brief
screening tool to identify those at increased
risk of functional decline and other adverse
outcomes
30
Two multi-site studies in
Montreal EDs
• Study 1: development of ISAR
– Prospective observational cohort study
– JAGS (1999) 47: 1226-1237.
• Study 2: evaluation of 2-step intervention
– randomized controlled trial
– JAGS (2001) 49: 1272-1281.
31
Common features of 2 studies
• 4 Montreal hospitals (2 participated in both
studies)
• Patients aged 65+, community dwelling,
English or French-speaking
• Exclusions:
– cognitively impaired or severe illness
with no proxy informant
– language barrier (no English or French)
32
Differences between 2 studies:
Study design
• Study 1
– Observational study
– Follow-up at 3 and 6 months after ED visit
• Study 2
– Randomized controlled trial: 2-step intervention
vs usual care
– Randomization by day of visit
– Follow-up at 1 and 4 months after ED visit
33
RESULTS: ISAR development
Adverse health outcome defined as any of
following during 6 months after ED visit
• >10% ADL decline
• Death
• Institutionalization
34
Scale development
• Selection of items that predicted all adverse
health events
• Multiple logistic regression - “best subsets”
analysis
• Review of candidate scales with clinicians
to select clinically relevant scale
35
Identification of Seniors At Risk
(ISAR)
1. Before the illness or injury that brought you to the Emergency, did you
need someone to help you on a regular basis? (yes)
2. Since the illness or injury that brought you to the Emergency, have you
needed more help than usual to take care of yourself? (yes)
3. Have you been hospitalized for one or more nights during the past 6
months (excluding a stay in the Emergency Department)? (yes)
4. In general, do you see well? (no)
5. In general, do you have serious problems with your memory? (yes)
6. Do you take more than three different medications every day? (yes)
Scoring: 0 - 6 (positive score shown in parentheses)
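The scoring rule (one point per item answered as shown in parentheses) can be sketched in Python. The dictionary keys and the example patient are hypothetical labels introduced here for illustration; only the scoring answers come from the slide.

```python
# Answer that scores one point for each ISAR item (from the slide).
# Key names are hypothetical shorthand for items 1 to 6.
ISAR_SCORING_ANSWERS = {
    "premorbid_help": "yes",          # item 1
    "more_help_since": "yes",         # item 2
    "recent_hospitalization": "yes",  # item 3
    "sees_well": "no",                # item 4 scores on "no"
    "memory_problems": "yes",         # item 5
    "more_than_three_meds": "yes",    # item 6
}


def isar_score(responses):
    """Count items answered with the scoring response (total 0 to 6)."""
    return sum(
        responses[item] == scoring_answer
        for item, scoring_answer in ISAR_SCORING_ANSWERS.items()
    )


# Hypothetical patient responses
patient = {
    "premorbid_help": "no",
    "more_help_since": "yes",
    "recent_hospitalization": "no",
    "sees_well": "no",
    "memory_problems": "no",
    "more_than_three_meds": "yes",
}
score = isar_score(patient)
print("ISAR score:", score)
```

Note that item 4 is reverse-keyed: a "no" answer to seeing well adds a point, whereas the other items score on "yes".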
36
[Figure: percentage of patients (y-axis label "Any adve...", 0 to 80%) by ISAR score (0, 1, 2, 3, 4, 5-6), shown separately for discharged and admitted patients]
37
Other Outcomes Related to ISAR
Source: Dendukuri et al, JAGS, in press
• Does ISAR score identify patients with
current functional problems?
– Self-reported premorbid function (OARS)
– Function at home visit assessed by nurse 1-2
weeks after ED visit (SMAF)
38
Area under the curve (AUC) for concurrent validity criteria
[Figure: AUC (95% confidence interval), axis from 0.5 to 1.0, for:
• Detection of depression at baseline: Study 2
• Severe functional impairment: OARS (Study 1), OARS (Study 2), SMAF (Study 1)]
39
Other Outcomes Related to ISAR
• Does ISAR predict adverse outcomes (other
than functional decline) during the
subsequent 5 or 6 months?
– High hospital utilization (11+ days/5 months)
– Frequent ED visits
– Frequent community health center visits
– Increase in depressive symptoms
40
Area under the curve (AUC) for predictive validation criteria
among patients discharged from ED
[Figure: AUC (95% confidence interval), axis from 0.5 to 1.0, for:
• Adverse health outcome: Study 1
• Increase in depressive symptoms: Study 2
• 10+ community health center visits/5 months: Study 2
• 11+ hospital days/5 months: Study 1, Study 2
• 2+ ED visits/5 months: Study 1, Study 2]
41
Summary of data on performance
• Very good detection of patients with current
functional problems and depression (AUC
values 0.8 - 0.9)
• Moderate ability to predict future adverse
health events (functional decline) and health
center utilization (AUC values around 0.7)
• Fair ability to predict future hospital and ED
utilization (AUC values 0.6 - 0.7)
42
Comparison with other screening tools
for patients admitted to hospital
Source: McCusker et al, J Gerontol 2002; 57A: M569-577
• Systematic literature review
• Predictors of functional decline (including
nursing home admission) among
hospitalized seniors
• Investigated individual risk factors and
predictive indices
43
Predictive indices
• Inouye (1993): functional decline (FD) and nursing home admission (NH) at 3 mo
– 4 factors: decubitus ulcer, cognitive
impairment, premorbid functional impairment,
low social activity
• Mateev (1998): D/NH at 3 mo
– clinical targeting criteria
44
Predictive indices (cont)
• McCusker (1999): FD/NH/D at 6 mo
– Identification of Seniors At Risk (ISAR): 6-item self-report questionnaire
• Narain (1988): NH at 6 mo
– hand-developed algorithm based on residence,
mental status, diagnosis
45
Predictive indices (cont)
• Rubenstein (1984): FD and NH at 12 mo
– expected discharge location and diagnosis
• Sager (1996): FD at 3 mo
– Hospital Admission Risk Profile (HARP) (age,
MMSE and IADL)
• Zureik (1997): NH at discharge
– 6-item index
46
Performance of 7 predictive indices for
functional decline
[Figure: sensitivity plotted against 1 - specificity, both axes 0 to 100, for indices A: Inouye, B: Mateev, C: McCusker, D: Narain, E: Rubenstein, F: Sager, G: Zureik]
47
Performance of predictive indices
• Moderate performance (AUC 0.65 - 0.66)
48