Slide 1
Risk Prediction Models:
Calibration, Recalibration, and
Remodeling
HST 951: Biomedical Decision Support
12/04/2006 – Lecture 23
Michael E. Matheny, MD, MS
Brigham & Women’s Hospital
Boston, MA
Slide 2
Lecture Outline

• Review Risk Model Performance Measurements
• Individual Risk Prediction for Binary Outcomes
• Inadequate Calibration is “the rule not the exception”
• Addressing the problem with Recalibration and Remodeling
Slide 3
Model Performance Measures

• Discrimination
  – Ability to distinguish well between patients who will and will not experience an outcome
• Calibration
  – Ability of a model to match expected and observed outcome rates across all of the data
Slide 4
Discrimination
Area Under the Receiver Operating Characteristic Curve

$$\mathrm{Sens} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}}
\qquad
\mathrm{Spec} = \frac{\text{True Negative}}{\text{True Negative} + \text{False Positive}}$$
Slide 5
Discrimination
ROC Curve Generation
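The sensitivity and specificity above, together with the ROC area, are straightforward to compute. A minimal Python sketch (not part of the lecture; function names are illustrative), applied to the example predictions and outcomes from the next slide:

```python
import numpy as np

def sens_spec(y_true, y_pred, threshold):
    """Sensitivity = TP/(TP+FN), Specificity = TN/(TN+FP) at one cut point."""
    y_true = np.asarray(y_true)
    y_hat = (np.asarray(y_pred) >= threshold).astype(int)
    tp = np.sum((y_hat == 1) & (y_true == 1))
    fn = np.sum((y_hat == 0) & (y_true == 1))
    tn = np.sum((y_hat == 0) & (y_true == 0))
    fp = np.sum((y_hat == 1) & (y_true == 0))
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, y_pred):
    """ROC area: probability a random event case is ranked above a random non-event case."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    pos, neg = y_pred[y_true == 1], y_pred[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

expected = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50]
observed = [0, 0, 0, 0, 1, 0, 0, 1, 1, 1]
print(sens_spec(observed, expected, threshold=0.25))  # (1.0, ~0.67) at this cut point
print(auc(observed, expected))                        # ~0.917
```

Sweeping the threshold across the predicted values traces out the ROC curve; the AUC summarizes it in a single number.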
Slide 6
Calibration
Example Data

Expected Outcome    Observed Outcome
       0.05                0
       0.10                0
       0.15                0
       0.20                0
       0.25                1
       0.30                0
       0.35                0
       0.40                1
       0.45                1
       0.50                1
Total  2.75                4
Slide 7
Standardized Outcomes Ratio

• Most Aggregated (Crude) comparison of expected and observed values
• 1 Value for Entire Sample
• Risk-Adjusted by using a risk prediction model to generate expected outcomes

$$\frac{\text{Observed Outcomes}}{\text{Expected Outcomes}} = \frac{4}{2.75} = 1.45$$
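The same ratio takes a couple of lines of Python (a minimal sketch, not from the lecture, using the Slide 6 example data):

```python
# Standardized outcomes ratio: total observed events over total model-expected events.
expected = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50]
observed = [0, 0, 0, 0, 1, 0, 0, 1, 1, 1]
oe_ratio = sum(observed) / sum(expected)
print(round(oe_ratio, 2))  # 4 / 2.75 = 1.45
```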
Slide 8
Standardized Mortality Ratios (SMR)

CANCER MORTALITY ANALYSIS, ALL MALES, SCRANTON CITY, 1975-1985

Cause of Death (ICD codes 140-204)    Expected Deaths   Observed Deaths   SMR
All Cancer Deaths                          1325.37            1516        1.14
Lip, Oral Cavity and Pharynx                 33.81              47        1.39
Esophagus                                    36.84              45        1.22
Stomach                                      54.58              72        1.32
Colon, Rectum, Rectosigmoid                 180.48             238        1.32
Pancreas                                     62.51              72        1.15
Trachea, Bronchus & Lung                    430.98             481        1.12
Genitourinary                               168.90             162        0.96
Bladder                                      45.02              50        1.11
Lymphomas                                    44.57              47        1.05
Slide 9
Outcome Ratios

• Strengths
  – Simple
  – Frequently used in medical literature
  – Easily understood by clinical audiences
• Weaknesses
  – Not a quantitative test of model calibration
  – Unable to show variations in calibration in different risk strata
  – Likely to underestimate the lack of fit
Slide 10
Outcome Ratios
Example Calibration Plot
Slide 11
Global Performance Measurements
with Calibration Components

• Methods that calculate a value for each data point (most granular)
  – Pearson Test
  – Residual Deviance
  – Brier Score

$$\text{Brier} = \frac{1}{n} \sum_{i} (y_i - p_i)^2$$
Slide 12
Brier Score Calculation

Expected Outcome   Observed Outcome   (Yi – Pi)²
      0.05               0              0.0025
      0.10               0              0.01
      0.15               0              0.0225
      0.20               0              0.04
      0.25               1              0.5625
      0.30               0              0.09
      0.35               0              0.1225
      0.40               1              0.36
      0.45               1              0.3025
      0.50               1              0.25
                                Sum:    1.7625
Slide 13
Brier Score Calculation

$$\text{Brier} = \frac{1}{n} \sum_{i} (y_i - p_i)^2 = \frac{1}{10} \times 1.7625 = 0.17625$$

• To assess the accuracy of the set of predictions, Spiegelhalter’s method is used
  – Expected Brier (EBrier) = 0.17875
  – Variance of Brier (VBrier) = 0.003292

$$Z = \frac{\text{Brier} - \text{EBrier}}{\text{VBrier}^{0.5}} = \frac{0.17625 - 0.17875}{0.003292^{0.5}} = -0.0436$$
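A minimal Python sketch of the Brier score and Spiegelhalter's Z statistic on the same ten cases (not from the lecture; the EBrier and VBrier lines are the standard Spiegelhalter expressions, the mean of p(1-p) and the sum of p(1-p)(1-2p)² divided by n²):

```python
import numpy as np

p = np.array([0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50])
y = np.array([0, 0, 0, 0, 1, 0, 0, 1, 1, 1])

brier = np.mean((y - p) ** 2)                                    # 0.17625
e_brier = np.mean(p * (1 - p))                                   # expected Brier under the model = 0.17875
v_brier = np.sum(p * (1 - p) * (1 - 2 * p) ** 2) / len(p) ** 2   # variance of the Brier score ~ 0.003292
z = (brier - e_brier) / np.sqrt(v_brier)                         # ~ -0.044
print(brier, e_brier, v_brier, z)
```

A |Z| this small would not reject the hypothesis that these ten predictions are well calibrated.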
Slide 14
Brier Score

• Strengths
  – Quantitative evaluation
• Weaknesses
  – Sensitive to sample size (↑sample size more likely to fail test)
  – Sensitive to outliers (large differences between expected and observed)
  – Difficult to determine relative performance in risk subpopulations
Slide 15
Hosmer-Lemeshow
Goodness of Fit

• Divide the data into subgroups and compare observed to expected outcomes by subgroup
• C Test
  – Divides the sample into 10 equal groups (by number of samples)
• H Test
  – Divides the sample into 10 groups (by deciles of risk)
Slide 16
Hosmer-Lemeshow
Goodness of Fit

$$\mathrm{HL}_G = \sum_{j=1}^{10} \frac{(O_j - E_j)^2}{E_j\,(1 - E_j/n_j)} \;\sim\; \chi^2_8$$

n_j = number of observations in the jth group
O_j = observed number of cases in the jth group
E_j = expected number of cases in the jth group
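A minimal sketch of the C test (equal-size groups formed by ranked predicted risk); not from the lecture, assumes numpy and scipy are available, and the function name is illustrative:

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow_c(y_true, y_pred, n_groups=10):
    """Return the H-L statistic and its p-value (chi-square with n_groups - 2 df)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    order = np.argsort(y_pred)
    groups = np.array_split(order, n_groups)   # roughly equal numbers of observations per group
    stat = 0.0
    for g in groups:
        n_j = len(g)
        o_j = y_true[g].sum()                  # observed events in group j
        e_j = y_pred[g].sum()                  # expected events in group j
        stat += (o_j - e_j) ** 2 / (e_j * (1 - e_j / n_j))
    return stat, chi2.sf(stat, n_groups - 2)
```

The H test differs only in how the groups are formed: fixed cut points at deciles of predicted risk rather than equal group sizes.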
Slide 17
CALICO Registry
Hosmer-Lemeshow Goodness of Fit: C Test

Predicted Mortality by Decile (%)   Admissions   Observed Deaths   Expected Deaths   H-L Statistic
0.007 - 0.034                            466             2               10.3              6.88
0.034 - 0.052                            461            17               19.7              0.39
0.052 - 0.073                            454            27               28.3              0.07
0.073 - 0.100                            478            24               41.5              8.07
0.100 - 0.127                            450            35               51.4              5.89
0.127 - 0.154                            469            53               65.8              2.90
0.154 - 0.202                            465            66               82.1              3.83
0.203 - 0.287                            461            93              111.2              3.94
0.288 - 0.445                            463           138              162.5              5.70
0.445 - 0.968                            463           255              287.9              9.94
Total                                   4630           710              860.8             47.61

C = 47.61, df 8, p < 0.0001
Slide 18
Calibration Plot
C Test Data

[Calibration plot: observed versus expected mortality (0–1 on both axes) for the C test deciles]
Slide 19
CALICO Registry
Hosmer-Lemeshow Goodness of Fit: H Test

Predicted Mortality by Decile (%)   Admissions   Observed Deaths   Expected Deaths   H-L Statistic
0.007 - 0.100                           1859            70               99.9              9.46
0.100 - 0.200                           1348           149              192.0             11.24
0.200 - 0.300                            555           115              135.5              4.10
0.301 - 0.400                            323            97              110.9              2.65
0.400 - 0.499                            185            58               83.0             13.64
0.500 - 0.598                            131            70               71.7              0.09
0.600 - 0.694                            103            58               66.4              3.02
0.701 - 0.800                             65            48               48.6              0.03
0.803 - 0.896                             48            34               40.7              7.29
0.904 - 0.968                             13            11               12.1              1.59
Total                                   4630           710              860.8             53.10

H = 53.10, df 8, p < 0.0001
Slide 20
Calibration Plot
H Test Data

[Calibration plot: observed versus expected mortality (0–1 on both axes) for the H test risk deciles]
Slide 21
Hosmer-Lemeshow
Goodness of Fit

• Strengths
  – Quantitative evaluation
  – Assesses calibration in risk subgroups
• Weaknesses
  – Disagreement over how to generate subgroups (C versus H)
  – Even among the same method (C or H), different statistical packages generate different results due to rounding rule differences
  – Sensitive to sample size (↑sample size more likely to fail test)
  – Sensitive to outliers (but to a lesser degree than the Brier Score)
Slide 22
Risk Prediction Models
for Binary Outcomes

Case Data (Variables X1..Xi)
-> Predictive Model for Outcome Y (Yes/No)
-> Case Outcome Prediction (0 – 1)

• Logistic Regression
• Bayesian Networks
• Artificial Neural Networks
• Support Vector Machine Regression
Slide 23
Risk Prediction Models
Clinical Utility

• Risk Stratification for Research and Clinical Practice
• Risk-Adjusted Assessment of Providers and Institutions
• Individual Risk Prediction
Slide 24
Individual Risk Prediction

• Good discrimination is necessary but not sufficient for individual risk prediction
• Calibration is the key index for individual risk prediction
Slide 25
Inadequate Calibration
Why?

• Models require external validation to be generally accepted, and in those studies the general trend is:
  – Discrimination retained
  – Calibration fails
• Factors that contribute to inadequate model calibration in clinical practice:
  – Regional Variation
    • Different Clinical Practice Standards
    • Different Patient Case Mixes
  – Temporal Variation
    • Changes in Clinical Practice
    • New diagnostic tools available
    • Changes in Disease Incidence and Prevalence
Slide 26
Individual Risk Prediction
Clinical Examples

• 10 year “Hard” Coronary heart disease risk estimation
  – Framingham Heart Study
• Logistic Regression
• Calibration Problems
  – Low SES
  – Young age
  – Female
  – Non-US populations

Kannel et al. Am J Cardiol, 1976
Slide 27
Individual Risk Prediction
Clinical Examples

• Lifetime Invasive Breast Cancer Risk Estimation
  – Gail Model
• Logistic Regression
• Calibration Problems
  – Age <35
  – Prior Hx Breast CA
  – Strong Family Hx
  – Lack of regular mammograms

Gail et al. JNCI, 1989
Slide 28
Individual Risk Prediction
Clinical Examples

• Intensive Care Unit Mortality Prediction
  – APACHE-II
  – APACHE-III
  – MPM0
  – MPM0-II
  – SAPS
  – SAPS-II
Slide 29
Individual Risk Prediction
Clinical Examples
Ohno-Machado, et al. Annu Rev Biomed Eng. 2006;8:567-99
Slide 30
Individual Risk Prediction
Clinical Examples
Ohno-Machado, et al. Annu Rev Biomed Eng. 2006;8:567-99
Slide 31
Individual Risk Prediction
Clinical Examples

• Interventional Cardiology Mortality Prediction

Model      Dates         Location         Sample
NY 1992    1991          NY                 5827
NY 1997    1991 – 1994   NY                62670
CC 1997    1993 – 1994   Cleveland, OH     12985
NNE 1999   1994 – 1996   NH, ME, MA, VT    15331
MI 2001    1999 – 2000   Detroit, MI       10796
BWH 2001   1997 – 1999   Boston, MA         2804
ACC 2002   1998 – 2000   National         100253

Matheny, et al. J Biomed Inform. 2005 Oct;38(5):367-75
Slide 32
Individual Risk Prediction
Clinical Examples

Model      Expected Deaths   AUC    HL χ2   HL (p)
NY 1992          96.7        0.82    31.1   <0.001
NY 1997          61.6        0.88    32.2   <0.001
CC 1997          78.8        0.88    27.8   <0.001
NNE 1999         56.2        0.89    45.9   <0.001
MI 2001          61.8        0.86    30.4   <0.001
BWH 2001        136.1        0.89    39.7   <0.001
ACC 2002         49.9        0.90    42.0   <0.001
BWH 2004         70.5        0.93    7.61    0.473

Observed Deaths = 71

Matheny, et al. J Biomed Inform. 2005 Oct;38(5):367-75
Slide 33
Inadequate Calibration
What to do?

• In most cases, risk prediction models are developed on much larger data sets than are available for local model generation.
  – Decreased variance and increased stability of model covariate values
  – Large, external models (especially those that have been externally validated) are generally accepted by domain experts
• Goal is to ‘throw out’ as little prior model information as possible while improving performance
Slide 34
Recalibration and Remodeling
General Evaluation Rules

• Model recalibration or remodeling follows the same rules of evaluation as model building in general
  – Separate training and test data, or
  – Cross-Validation, etc.
• If temporal issues are central to that domain’s calibration problems, training data should be both before (in time) and separate from testing data
Slide 35
Discrimination versus Calibration

Model A Expected   Model B Expected   Observed Outcome
      0.05               0.33                0
      0.10               0.45                0
      0.15               0.47                0
      0.20               0.53                0
      0.25               0.68                1
      0.30               0.77                0
      0.35               0.81                0
      0.40               0.93                1
      0.45               0.95                1
      0.50               0.96                1
Total 2.75               6.88                4
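A quick numeric check of the point this table makes (not from the lecture): the two models rank the cases identically, so their ROC areas are the same, but their observed-to-expected ratios are very different.

```python
import numpy as np

obs = np.array([0, 0, 0, 0, 1, 0, 0, 1, 1, 1])
model_a = np.array([0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50])
model_b = np.array([0.33, 0.45, 0.47, 0.53, 0.68, 0.77, 0.81, 0.93, 0.95, 0.96])

def auc(y, p):
    """Pairwise-ranking AUC (no tied predictions in this example)."""
    pos, neg = p[y == 1], p[y == 0]
    return (pos[:, None] > neg[None, :]).mean()

print(auc(obs, model_a), auc(obs, model_b))                  # both ~0.917: identical discrimination
print(obs.sum() / model_a.sum(), obs.sum() / model_b.sum())  # O/E 1.45 vs 0.58: very different calibration
```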
Slide 36
Logistic Regression
General Equation

$$P(Y = 1) = \frac{1}{1 + e^{-(B_0 + B_1 x_1 + \dots + B_i x_i)}}$$

• B0 is the intercept of the equation, which represents the outcome probability in the absence of all other risk factors (baseline risk)
• The model assumes the covariates are independent of one another, and Bx is the natural log of the odds ratio of the risk attributable to that risk factor
Slide 37
Logistic Regression
“Original” Model and Cases

Variable           Model β coeff   Case 1   Case 2   Case 3   Case 4*
Intercept               -3            1        1        1        1
Variable 1               0.2          0        1        1        1
Variable 2               0.5          0        0        1        1
Variable 3               1.0          0        0        0        1
Case Probability                    0.047    0.057    0.091    0.310

* Case 4 is Outcome = 1, Cases 1 – 3 are Outcome = 0

• Minimum predicted risk for each case is intercept only
• Adjusting intercept scales all results
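Applying the logistic equation to one of the covariate patterns above takes a single line; a minimal sketch (not from the lecture, names illustrative) that reproduces Case 3's predicted probability:

```python
import numpy as np

def predict_probability(beta, x):
    """P(Y=1) = 1 / (1 + exp(-(B0 + B1*x1 + ... + Bi*xi))); x[0] = 1 carries the intercept."""
    return 1.0 / (1.0 + np.exp(-np.dot(beta, x)))

beta = np.array([-3.0, 0.2, 0.5, 1.0])    # intercept and Variable 1-3 coefficients from the table
case_3 = np.array([1.0, 1.0, 1.0, 0.0])   # Variables 1 and 2 present, Variable 3 absent
print(round(predict_probability(beta, case_3), 3))  # 0.091
```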
Slide 38
LR Intercept Recalibration

• The proportion of risk contributed by the intercept (baseline) can be calculated for a data set by:

$$\text{RiskInt}(\%) = \frac{\dfrac{1}{n_{obs}} \sum \dfrac{1}{1 + e^{-(B_0)}}}{\dfrac{1}{n_{obs}} \sum \dfrac{1}{1 + e^{-(B_0 + B_1 x_1 + \dots + B_i x_i)}}}$$
Slide 39
LR Intercept Recalibration

• The intercept contribution to risk (RiskInt(%)) is multiplied by the observed event rate, and converted back to a beta coefficient from a probability:

$$B_0(\text{New}) = -\ln\!\left(\frac{1}{\text{RiskInt}(\%) \times \text{ObsEventRate}} - 1\right)$$

• A relative weakness of the method is that values can exceed 1, and must be truncated
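Putting the two intercept-recalibration steps together on the four-case example (a minimal sketch, not the lecture's code; it uses the original predicted probabilities and outcomes from Slide 37 and the original intercept B0 = -3):

```python
import numpy as np

b0_old = -3.0
p_orig = np.array([0.047, 0.057, 0.091, 0.310])  # original model output on the local cases
y = np.array([0, 0, 0, 1])                       # observed outcomes for those cases

baseline_risk = 1.0 / (1.0 + np.exp(-b0_old))    # intercept-only probability
risk_int = baseline_risk / p_orig.mean()         # proportion of predicted risk from the intercept
p_new_baseline = min(risk_int * y.mean(), 1.0)   # multiply by observed event rate; truncate at 1
b0_new = -np.log(1.0 / p_new_baseline - 1.0)     # convert back to a beta coefficient (logit)
print(round(float(b0_new), 2))  # ~ -2.27; the recalibrated intercept on the next slide is -2.2
```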
Slide 40
LR Intercept Recalibration
Example Model and Cases

Variable     Old β coeff   New β coeff   Case 1   Case 2   Case 3   Case 4*
Intercept       -3.0          -2.2          1        1        1        1
Variable 1       0.2           0.2          0        1        1        1
Variable 2       0.5           0.5          0        0        1        1
Variable 3       1.0           1.0          0        0        0        1
New Prob.                                 0.099    0.119    0.182    0.500
Orig Prob.                                0.047    0.057    0.091    0.310

• Original Expected = 0.51
• Intercept Recalibration Expected = 0.90
Slide 41
LR Slope Recalibration

• In this method, the output probability of the original LR equation is used to model a new LR equation with that output as the only covariate:

$$P(\text{New}) = \frac{1}{1 + e^{-(B_0 + B_1 [P(\text{Old})])}}$$
Slide 42
LR Slope Recalibration
Example Model and Cases

Variable                New Model β coeff   Case 1   Case 2   Case 3   Case 4*
New Model Intercept          -3.0              1        1        1        1
Orig Model Result            11.0            0.047    0.057    0.091    0.310
New Probability                              0.077    0.086    0.119    0.601
Intercept Probability                        0.099    0.119    0.182    0.500

• Original Expected = 0.51
• Slope Recalibration Expected = 0.88
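A minimal sketch applying the recalibrated coefficients above to the original model's output (not the lecture's code; in practice B0 and B1 are estimated by fitting a logistic regression of the observed outcome on P(Old) in local training data):

```python
import numpy as np

b0_new, b1_new = -3.0, 11.0                     # new-model coefficients from the table above
p_old = np.array([0.047, 0.057, 0.091, 0.310])  # original model probabilities for Cases 1-4
p_new = 1.0 / (1.0 + np.exp(-(b0_new + b1_new * p_old)))
print(np.round(p_new, 3))  # ~[0.077 0.085 0.119 0.601]; the table lists 0.077, 0.086, 0.119, 0.601
```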
Slide 43
LR Covariate Recalibration

Variable     Old β coeff   New β coeff   Case 1   Case 2   Case 3   Case 4*
Intercept       -3            -2.5          1        1        1        1
Variable 1       0.2           0.1          0        1        1        1
Variable 2       0.5           0.3          0        0        1        1
Variable 3       1.0           3.0          0        0        0        1
New Prob                                  0.076    0.083    0.109    0.711
Orig Prob                                 0.047    0.057    0.091    0.310

• Original Expected = 0.51
• Covariate Recalibration Expected = 0.97
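Covariate recalibration, as in the table above, keeps the original model's variables but re-estimates every β coefficient on local data. A minimal sketch, assuming scikit-learn is available; the data arrays are placeholders, not the lecture's registry data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholders: local cases with the same three covariates as the original model.
rng = np.random.default_rng(0)
X_local = rng.random((500, 3)) < 0.5        # binary covariate values
y_local = rng.random(500) < 0.2             # local observed outcomes

refit = LogisticRegression(C=1e9)           # very large C ~ unpenalized maximum likelihood
refit.fit(X_local.astype(float), y_local.astype(int))
print(refit.intercept_[0], refit.coef_[0])  # new beta coefficients, analogous to the "New β coeff" column
```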
Slide 44
Recalibration Example
Local Institutional Data

Year   Cases   Mortality (%)
2002    1947   15 (0.8%)
2003    1841   33 (1.8%)
2004    1767   33 (1.9%)
Slide 45
Recalibration Example
External Risk Prediction Models

Model                    Abbrev   Outcomes   Sample     %
National ACC             ACC         707      50123    1.4
Northern New England     NNE         165      15331    1.1
University of Michigan   MIC         169      10796    1.6
Cleveland Clinic         CCL         169      12985    1.3
Slide 46
Results
No Recalibration

Year   Model   Observed   Expected   HL χ2
2003   ACC        33         41.4     63.4
2003   NNE        33         39.0     24.3
2003   MIC        33         27.2      6.6
2003   CCL        33         56.3     14.0
2004   ACC        33         41.8     64.1
2004   NNE        33         36.6     51.0
2004   MIC        33         23.3     22.9
2004   CCL        33         60.3     21.2
Slide 47
Results
LR Intercept Recalibration

Year   Model   Observed   Expected   HL χ2
2003   ACC        33         45.1     10.0
2003   NNE        33         26.0     43.6
2003   MIC        33         22.1     12.7
2003   CCL        33         24.8     10.5
2004   ACC        33         34.1     14.6
2004   NNE        33         28.9     69.8
2004   MIC        33         26.5     17.6
2004   CCL        33         33.5     14.2
Slide 48
Results
LR Slope Recalibration

Year   Model   Observed   Expected   HL χ2
2003   ACC        33         24.0     12.7
2003   NNE        33         18.6     32.9
2003   MIC        33         20.1     24.0
2003   CCL        33         25.5     15.2
2004   ACC        33         32.0     35.7
2004   NNE        33         31.2     21.7
2004   MIC        33         31.0     23.6
2004   CCL        33         31.6     13.2
Slide 49
Clinical Applications
CALICO

• California Intensive Care Outcomes (CALICO) Project
  – 23 volunteer hospitals beginning in 2002
  – Compare hospital outcomes for selected conditions, procedures, and intensive care unit types
  – Identified popular, well-validated models
    • MPM0-II, SAPS-II, APACHE-II, APACHE-III
  – Evaluated the models on CALICO data; after determining they were inadequately calibrated, recalibrated each of the models using the LR Covariate Recalibration method
Slide 50
Clinical Applications
CALICO
Slide 51
Examples on Website

• Most of the calculations from this presentation are available on the website in an Excel workbook
Slide 52
The End
Michael Matheny, MD, MS
[email protected]
Brigham & Women’s Hospital
Thorn 309
75 Francis Street
Boston, MA 02115