Transcript: Integrated discrimination improvement (IDI)
Comparison of the C-statistic with new model discriminators in the prediction of long versus short hospital stay
Richard J Woodman (1), Campbell H Thompson (2), Susan W Kim (1), Paul Hakendorf (3)
(1) Flinders Centre for Epidemiology and Biostatistics, Flinders University, Adelaide
(2) Discipline of General Medicine, Adelaide University, Adelaide
(3) Redesigning Care, Flinders Medical Centre, Adelaide
2011 Australia and New Zealand Stata Users Group meeting
17th September 2011
Usefulness of new predictors
• Meaningful new risk predictors
– Traditionally rely on the concordance statistic (C-statistic / ROC) for assessing the usefulness of new predictive measures
• C-statistic
– Measures overall test/model accuracy (sensitivity/specificity)
– A weighted average of sensitivity over all possible cut-points
  • Weighted by the pdf of non-events
  • High sensitivities (low cut-points) have high weights
– Probability interpretation: the probability of assigning a greater risk to a randomly selected patient with the event than to a randomly selected patient without the event
– P(p_event > p_non-event) for a random pair
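The probability interpretation can be checked by brute force over all event/non-event pairs. The following is a minimal Python sketch, not the Stata routine used in the talk; the function name c_statistic and the toy data are hypothetical, and ties are counted as half a concordant pair (a common convention, assumed here).

```python
from itertools import product

def c_statistic(p, y):
    """Share of (event, non-event) pairs where the event has the higher p.

    Ties count as half a concordant pair (an assumed convention).
    """
    events = [pi for pi, yi in zip(p, y) if yi == 1]
    nonevents = [pi for pi, yi in zip(p, y) if yi == 0]
    score = 0.0
    for pe, pn in product(events, nonevents):
        if pe > pn:
            score += 1.0
        elif pe == pn:
            score += 0.5
    return score / (len(events) * len(nonevents))

# Hypothetical toy data: 1 = long stay (event), 0 = short stay (non-event)
y = [1, 1, 1, 0, 0]
p = [0.9, 0.6, 0.4, 0.5, 0.2]
c = c_statistic(p, y)
print(round(c, 4))
```

Here 5 of the 6 possible pairs are concordant, so the sketch returns 5/6.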
Receiver Operating Characteristic (ROC) curve
[Figure: ROC curve, true positive rate against false positive rate (1 - specificity); area under ROC curve = 0.7167. Companion panel: predicted Pr(Longstay) for short-stay and long-stay patients.]
∆ C-statistic interpretation: the increase in the probability that a random event subject will have a higher predicted p than a random non-event subject.
Usually small once a few good predictors are included in the model.
New Risk reclassification measures
• Clinicians want to know whether an added predictor will change risk such that they should treat patients differently
• Can we better quantify improvement in risk prediction from new biomarkers?
• Net Reclassification Improvement (NRI)
• Integrated Discrimination Improvement (IDI)
– Pencina, D'Agostino et al., Statist. Med. 2008; 27:157-172.
• How do they differ from the C-statistic?
• How and when should we be using them?
Net Reclassification Improvement
• NRI can be calculated as a sum of two separate components: one for individuals with events and the other for individuals without events
• For events, assign 1 for upward reclassification, -1 for downward and 0 for people who do not change their risk category
• The opposite is done for non-events
• Sum the individual scores and divide by the number of people in each group
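The scoring scheme above can be sketched in a few lines. This is an illustrative Python version, not the Stata nri commands; the function name categorical_nri, the integer category coding (0 = low risk, 1 = high risk) and the toy data are all assumptions.

```python
def categorical_nri(old_cat, new_cat, y):
    """Score each subject +1 / -1 / 0, then average within events and non-events."""
    event_scores, nonevent_scores = [], []
    for old, new, yi in zip(old_cat, new_cat, y):
        move = (new > old) - (new < old)   # +1 moved up, -1 moved down, 0 unchanged
        if yi == 1:
            event_scores.append(move)       # events: moving up is correct
        else:
            nonevent_scores.append(-move)   # non-events: moving down is correct
    event_nri = sum(event_scores) / len(event_scores)
    nonevent_nri = sum(nonevent_scores) / len(nonevent_scores)
    return event_nri, nonevent_nri, event_nri + nonevent_nri

# Hypothetical toy data: four events followed by four non-events
y       = [1, 1, 1, 1, 0, 0, 0, 0]
old_cat = [0, 0, 1, 1, 0, 1, 1, 0]
new_cat = [1, 1, 1, 0, 0, 0, 0, 1]
event_nri, nonevent_nri, nri = categorical_nri(old_cat, new_cat, y)
print(event_nri, nonevent_nri, nri)
```

In the toy data two events move up and one moves down (event NRI = 0.25), while two non-events move down and one moves up (non-event NRI = 0.25), giving NRI = 0.5.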
Category-free NRI
• Calculate p1 and p2 (old model = p1, new model = p2)
• Event NRI = P(up | event) – P(down | event)
• Non-event NRI = P(down | non-event) – P(up | non-event)
• NRI = Event NRI + Non-event NRI (Pencina 2008)
Or
• ½ NRI (Pencina 2010)
Or
• ½ wNRI (Pencina 2010)
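With no categories, any increase in p counts as "up" and any decrease as "down". The definitions above might be sketched as follows; this is Python for illustration only, and the function name category_free_nri and the toy probabilities are hypothetical.

```python
def category_free_nri(p1, p2, y):
    """Event NRI, non-event NRI and their sum, using any change in p as a move."""
    pairs_ev = [(a, b) for a, b, yi in zip(p1, p2, y) if yi == 1]
    pairs_ne = [(a, b) for a, b, yi in zip(p1, p2, y) if yi == 0]
    # Events: P(up | event) - P(down | event)
    event_nri = (sum(b > a for a, b in pairs_ev)
                 - sum(b < a for a, b in pairs_ev)) / len(pairs_ev)
    # Non-events: P(down | non-event) - P(up | non-event)
    nonevent_nri = (sum(b < a for a, b in pairs_ne)
                    - sum(b > a for a, b in pairs_ne)) / len(pairs_ne)
    return event_nri, nonevent_nri, event_nri + nonevent_nri

# Hypothetical toy data: four events followed by five non-events
y  = [1, 1, 1, 1, 0, 0, 0, 0, 0]
p1 = [0.5, 0.6, 0.7, 0.4, 0.5, 0.3, 0.4, 0.2, 0.6]   # old model
p2 = [0.6, 0.7, 0.6, 0.4, 0.4, 0.2, 0.5, 0.1, 0.6]   # new model
ev, ne, nri = category_free_nri(p1, p2, y)
half_nri = nri / 2
print(ev, ne, nri, half_nri)
```

Subjects whose p does not change contribute to the denominators but to neither numerator, so the two components stay within -1 and 1 and their sum within -2 and 2.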
Integrated Discrimination Improvement (IDI)
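• Absolute IDI: the difference in discrimination slopes between the new and old models, where the discrimination slope is the mean difference in p between events and non-events. Subscripts 1 and 2 denote the old and new models; E and NE denote events and non-events.
$\mathrm{IDI} = (\bar{p}_{2E} - \bar{p}_{2NE}) - (\bar{p}_{1E} - \bar{p}_{1NE}) = (\bar{p}_{2E} - \bar{p}_{1E}) - (\bar{p}_{2NE} - \bar{p}_{1NE})$
• Relative IDI
$= (\bar{p}_{2E} - \bar{p}_{2NE}) / (\bar{p}_{1E} - \bar{p}_{1NE})$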
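Both forms of the IDI reduce to group means of the predicted probabilities. The sketch below is in Python rather than Stata and makes no claim to reproduce the idi command; the function names and toy data are hypothetical. The relative IDI is computed as the plain ratio of slopes, as written on the slide.

```python
from statistics import mean

def discrimination_slope(p, y):
    """Mean p among events minus mean p among non-events."""
    p_events = [pi for pi, yi in zip(p, y) if yi == 1]
    p_nonevents = [pi for pi, yi in zip(p, y) if yi == 0]
    return mean(p_events) - mean(p_nonevents)

def idi(p1, p2, y):
    """Absolute IDI (difference of slopes) and relative IDI (ratio of slopes)."""
    slope_old = discrimination_slope(p1, y)
    slope_new = discrimination_slope(p2, y)
    return slope_new - slope_old, slope_new / slope_old

# Hypothetical toy data: four events followed by five non-events
y  = [1, 1, 1, 1, 0, 0, 0, 0, 0]
p1 = [0.5, 0.6, 0.7, 0.4, 0.5, 0.3, 0.4, 0.2, 0.6]   # old model
p2 = [0.6, 0.7, 0.6, 0.4, 0.4, 0.2, 0.5, 0.1, 0.6]   # new model
absolute_idi, relative_idi = idi(p1, p2, y)
print(round(absolute_idi, 4), round(relative_idi, 4))
```

In the toy data the slope rises from 0.15 to 0.215, so the absolute IDI is 0.065 and the ratio of slopes is about 1.43.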
Recent example
Veeranna et al. JACC 2011; 58(10): 1025-33 (August 2011): category-dependent NRI
Am J Epidemiology 174 (5); June 27, 2011: NRI
Stratified versus Unstratified NRI
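[Figure: reclassification of non-cases and cases by quartile (Q1 to Q4) for the stratified NRI, and of non-cases and cases overall for the unstratified NRI. Values shown on the slide: 0.085, 0.055, 0.088, 0.053, 0.003, -0.002, -0.01 (0.016), 0.72.]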
Statistical testing: Z score for discordance ~ McNemar’s test.
Predicting length of hospital stay
• Short-stay wards are necessary due to bed shortages in specialist wards
• But incorrectly assigning patients to short-stay wards
– Would overfill short-stay units
– Would prevent correct treatment for long-stay patients
• Clinicians are trained to diagnose and treat, not to predict length of stay
• Few variables beyond age appear informative
Dataset
• 3 major hospitals
– FMC
– RGH
– Auckland
• N=1457 general medical patients
• Complete data on:
– Age
– SBP
– HR
– RR
– Mobility
– WBC count
– Cardiac failure (CF)
– Need for supplementary oxygen (SuO2)
• All previously collected for predicting outcome
• Modified Early Warning Score (MEWS): used by Emergency Medical Services to quickly determine risk of death
– SBP
– HR
– RR
– Temperature
Statistical Analysis
• Logistic regression model for predicting p: P(long stay)
• Scaling using 2 Stata commands:
– lintrend (Joanne Garrett – Univ North Carolina)
– fracpoly (Patrick Royston)
• Calibration
– HL-deciles and LR tests
• Measures of discrimination
– C-statistic
– IDI
– Category-dependent NRI
  • 50% cut-off
  • 57% cut-off
– Category-free NRI
STATA lintrend command – log odds age
lintrend longstay age, round(10) plot(log) xlab ylab
STATA lintrend command – log odds WBC count
lintrend longstay wbc, round(1) plot(log) xlab ylab
Fracpoly WBC
. fracpoly logistic longstay wbc, table compare
........
-> gen double Iwbc__1 = X^.5-.9876731667 if e(sample)
-> gen double Iwbc__2 = X^.5*ln(X)+.0245010876 if e(sample)
   (where: X = wbc/10)

Logistic regression                         Number of obs =    1457
                                            LR chi2(2)    =   49.38
                                            Prob > chi2   =  0.0000
Log likelihood = -971.8662                  Pseudo R2     =  0.0248

-----------------------------------------------------------------------------
    longstay | Odds Ratio   Std. Err.      z    P>|z|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
     Iwbc__1 |   .0040704   .0076682    -2.92   0.003    .0001014   .1633818
     Iwbc__2 |   34.78284   33.17947     3.72   0.000    5.362915   225.5948
-----------------------------------------------------------------------------
Deviance: 1943.73. Best powers of wbc among 44 models fit: .5 .5.

Fractional polynomial model comparisons:
--------------------------------------------------------------
wbc             df    Deviance   Dev. dif.   P (*)   Powers
--------------------------------------------------------------
Not in model     0    1993.113     49.380    0.000
Linear           1    1954.819     11.087    0.011   1
m = 1            2    1949.234      5.502    0.064   2
m = 2            4    1943.732      -        -       .5 .5
--------------------------------------------------------------
(*) P-value from deviance difference comparing reported model with m = 2 model
Final model
Variable                     Odds ratio   95% CI            P-value
Age (yrs)                    1.07         1.04-1.10         <0.001
HR (10 bpm)                  1.04         1.01-1.06         0.001
Age#HR                       0.9996       0.9993-0.9999     0.04
Mobility (range 0 to 3.5)    13.1         3.9-44.2          <0.001
Age#mobility                 0.97         0.96-0.99         <0.001
BP (mmHg)                    0.995        0.992-0.998       0.001
WBC_1 (^0.5)                 0.001        0.000-0.085       0.002
WBC_2 (^0.5*ln(x))           49.2         5.8-417.9         <0.001
RR (breaths/min)             1.05         1.01-1.09         0.02
CCF (0=N,1=Y)                1.68         1.14-2.48         0.009
SuO2 (0=N,1=Y)               1.54         1.05-2.26         0.03
Calibration
[Figure: observed and predicted long-stay (n) across 10 groups of predicted risk]
number of observations = 1457
number of groups = 10
Hosmer-Lemeshow chi2(8) = 14.66
Prob > chi2 = 0.07

number of observations = 1457
number of covariate patterns = 1457
Pearson chi2(1445) = 1486.69
Prob > chi2 = 0.22

[Figure: observed and predicted long-stay (n) across 5 groups of predicted risk]
number of observations = 1457
number of groups = 5
Hosmer-Lemeshow chi2(3) = 5.64
Prob > chi2 = 0.13
C-statistic
* Compare Age with Age + Heart rate using "roccomp"
quietly logistic longstay age
predict p1 if e(sample), p
quietly logistic longstay c.age##c.hrby10
predict p2 if e(sample), p
roccomp longstay p1 p2

                        ROC                  -Asymptotic Normal-
           Obs         Area    Std. Err.    [95% Conf. Interval]
------------------------------------------------------------------------
p1        1457       0.7167       0.0136     0.69000     0.74338
p2        1457       0.7433       0.0131     0.71767     0.76897
------------------------------------------------------------------------
Ho: area(p1) = area(p2)
    chi2(1) = 15.68     Prob>chi2 = 0.0001
ROC curves
[Figure: ROC curves as predictors are added in turn (Age, HR, RR, mobility, CCF, BP, SuppO2, WBC). Age: area under ROC = 0.717. Age + heart rate: area under ROC = 0.743.]
Increase in P(p_event > p_non-event) for a random pair ~ 2.5%
Sensitivity and Specificity
[Figure: two panels, sensitivity and specificity (0.0 to 1.0) against cut-point (0.0 to 1.0), one curve per model as predictors are added: Age, HR, RR, mobility, CCF, BP, SuppO2, WBC.]
Improved sensitivity only at high cut-points.
The C-statistic weights large sensitivities more heavily. This may be why improvements in sensitivity with later predictors don't translate into an increased C.
Predicted probabilities
[Figure: histograms (n) of predicted probabilities from 0 to 1 for models p1 to p7, shown separately for short-stay (n=630) and long-stay (n=827) patients. Annotations: distribution of probabilities shifts lower; distribution of probabilities flattens.]
STATA NRI command
User written – Author: Liisa Byberg, Department of Surgical Sciences, Orthopedics unit, and Uppsala Clinical Research Center, Uppsala University, Sweden
type net from http://www.ucr.uu.se/sv/images/stories/downloads
Syntax
nri1 depvar varlist1, prvars(varlist2) cut(#)
nri2 depvar varlist1, prvars(varlist2) cut(# #)
nri3 depvar varlist1, prvars(varlist2) cut(# # #)
nri1 – heart rate (probability cut-point = 50%)
nri1 longstay age,prvars(hrby10 agehrby10) cut(50)
-----------------------------------------------------------------
NRI       |  Estimate   Std. Err.         Z    P-value
----------+------------------------------------------------------
          |   0.05170     0.01792   2.88484    0.00392
-----------------------------------------------------------------

------------------------------
longstay  |
and       | Established risk
Establish | factors + new
ed risk   | predictors
factors   |  <50%  >=50%  Total
----------+-------------------
1         |
     <50% |   108     63    171
    >=50% |    36    620    656
          |
    Total |   144    683    827
----------+-------------------
0         |
     <50% |   294     29    323
    >=50% |    41    266    307
          |
    Total |   335    295    630
------------------------------

                                        Long-stay (n=827)    Short-stay (n=630)
reclassified Downward (%)               36/827 (0.0435)      41/630 (0.0651)
reclassified Upward (%)                 63/827 (0.0762)      29/630 (0.0460)
reclassified Upward minus Downward (%)  (0.0327)             (-0.0190)

NRI = 0.0327 - (-0.0190) = 0.0517
P-value = 0.004
SE = √((0.0762+0.0435)/827 + (0.0460+0.0651)/630) = 0.0179
z = 0.0517/0.0179 = 2.88 (McNemar – asymptotic test for correlated proportions)
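The arithmetic on this slide can be reproduced directly from the counts in the reclassification table. The Python sketch below is a check on the numbers, not the nri1 source code; the variable names are hypothetical, and the two-sided normal p-value is an assumption about how the reported P-value was obtained.

```python
from math import sqrt, erfc

# Counts from the 50% cut-point reclassification table
n_event, up_event, down_event = 827, 63, 36            # long-stay
n_nonevent, up_nonevent, down_nonevent = 630, 29, 41   # short-stay

event_nri = (up_event - down_event) / n_event
nonevent_nri = (down_nonevent - up_nonevent) / n_nonevent
nri = event_nri + nonevent_nri

# Same formula as the slide: (P(up) + P(down)) / n for each group
se = sqrt((up_event + down_event) / n_event**2
          + (up_nonevent + down_nonevent) / n_nonevent**2)
z = nri / se
p_value = erfc(abs(z) / sqrt(2))   # two-sided normal p-value (assumed)

print(f"NRI={nri:.5f} SE={se:.5f} z={z:.3f} p={p_value:.5f}")
```

Working from the raw counts avoids the small rounding differences that appear when the four-decimal proportions are plugged in by hand.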
STATA IDI command syntax
idi depvar varlist1,prvars(varlist2)

idi longstay age,prvars(hrby10 agehrby10)
---------------------------------------------------
IDI       |  Estimate   Std. Err.   P-value
----------+----------------------------------------
          |   0.04195     0.00525   0.00000
---------------------------------------------------

Definition: $\mathrm{IDI} = (IS_2 - IS_1) - (IP_2 - IP_1)$, where $IS = \int \text{sensitivity}$ and $IP = \int (1 - \text{specificity})$, each integrated over all cut-points.
Equivalently, $\mathrm{IDI} = \overline{(p_2 - p_1)}_{\text{events}} - \overline{(p_2 - p_1)}_{\text{non-events}}$
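The two definitions agree because integrating sensitivity over equally weighted cut-points from 0 to 1 gives the mean predicted probability among events, and likewise for 1 – specificity among non-events. The Python sketch below checks this numerically; the function name integrated_rate, the midpoint grid of cut-points and the toy data are assumptions for illustration.

```python
def integrated_rate(p_group, steps=1000):
    """Average of P(p > c) in the group over evenly spaced cut-points c in (0, 1)."""
    total = 0.0
    for k in range(steps):
        c = (k + 0.5) / steps   # midpoint of each cut-point interval
        total += sum(1 for p in p_group if p > c) / len(p_group)
    return total / steps

# Hypothetical toy data: four events followed by five non-events
y  = [1, 1, 1, 1, 0, 0, 0, 0, 0]
p1 = [0.5, 0.6, 0.7, 0.4, 0.5, 0.3, 0.4, 0.2, 0.6]   # old model
p2 = [0.6, 0.7, 0.6, 0.4, 0.4, 0.2, 0.5, 0.1, 0.6]   # new model

def split(p):
    """Return (probabilities for events, probabilities for non-events)."""
    return ([pi for pi, yi in zip(p, y) if yi == 1],
            [pi for pi, yi in zip(p, y) if yi == 0])

ev1, ne1 = split(p1)
ev2, ne2 = split(p2)
is1, ip1 = integrated_rate(ev1), integrated_rate(ne1)   # IS and IP, old model
is2, ip2 = integrated_rate(ev2), integrated_rate(ne2)   # IS and IP, new model
idi_integral = (is2 - is1) - (ip2 - ip1)
idi_means = ((sum(ev2) / len(ev2) - sum(ev1) / len(ev1))
             - (sum(ne2) / len(ne2) - sum(ne1) / len(ne1)))
print(round(idi_integral, 4), round(idi_means, 4))
```

Every cut-point carries the same weight in this integral, in contrast to the C-statistic, which weights cut-points by the distribution of non-events.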
Predicted probabilities and the IDI
[Figure: predicted p (0 to 1) for individual subjects, with the overall mean, against predictor variable (models 1 to 8), shown separately for short-stay and long-stay patients. Second panel: mean p by predictor variable for short-stay and long-stay patients. IDI = difference minus baseline difference.]
IDI interpretation:
Improvement in average sensitivity plus any potential decrease in average (1 - specificity).
Magnitude is hard to interpret.
Some studies also present relative IDI (%).
[Figure: bar charts of the change in C-statistic (axis 0 to .03) and of the IDI (axis 0 to .045) as each predictor is added: HR, RR, Mobility, CCF, BP, Supp_O2, WBC. Bars are annotated with asterisks.]
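[Figure: bar charts of the category-dependent NRI at the 50% cut-point (NRI50, axis 0 to .06) and at the 57% cut-point (NRI57, axis -.01 to .06) as each predictor is added: HR, RR, Mobility, CCF, BP, Supp_O2, WBC. Bars are annotated with asterisks.]
The effect of each variable on re-classification depends on the classification cut-point.
Small changes in the chosen cut-point can have large influences.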
Overall category-free NRI
[Figure: bar chart of the overall category-free NRI (axis -.05 to .3) as each predictor is added: HR, RR, Mobility, CCF, BP, Supp_O2, WBC. Bars are annotated with asterisks.]
Interpretation: proportion of subjects with movement of p in the correct direction, averaged for event and non-event subjects.
Category-free event NRI and category-free non-event NRI
[Figure: bar charts of the category-free event NRI (axis -.7 to .1) and the category-free non-event NRI (axis 0 to .8) as each predictor is added: HR, RR, Mobility, CCF, BP, Supp_O2, WBC. Bars are annotated with asterisks.]
Interpretation: net movement of p's in the correct direction, for event and non-event subjects separately.
Event NRI: Pr(p is higher) - Pr(p is lower) → mostly poorer re-classification
Non-event NRI: Pr(p is lower) - Pr(p is higher) → consistently improved re-classification
[Figure: two bar charts (axis 0 to 1) by predictor added: HR, RR, Mobility, CCF, BP, Supp_O2, WBC.]
Proportion of long-stay whose p went up: mostly < 50% with each new variable
Proportion of short-stay whose p went down: consistently > 50% with each new variable
Summary
• IDI
– Mirrored the C-statistic but was more sensitive.
– Equally weights sensitivity across cut-points.
– C-statistic weights large sensitivities more heavily.
• Category-dependent NRI
– The variables selected were heavily dependent on the chosen cut-points.
– Fewer variables identified as important discriminators than for the C-statistic, the IDI or the category-free NRI.
• Category-free NRI
– Overall, quite similar results to the C-statistic and IDI.
– Very different performance amongst the short-stay and long-stay patients.
Conclusions
• Discrimination statistics cannot be used interchangeably
• May be necessary to present all 4 for greatest insight.
• C-statistic: averaged sensitivity
– Does not weight equally across cut-points
– Does not assess risk re-classification.
• IDI: averaged sensitivity
– Weights cut-points equally
– Adjusts for specificity differently to the C-statistic
– May better highlight potentially important predictors.
• Category-free NRI: % subjects with correct movement in p.
– Event and non-event NRI may perform quite differently
• Category-dependent NRI: % correct movement across categories.
– Results may be heavily influenced by chosen cut-points.
– Be wary of studies using the category-dependent NRI with non-predefined cut-points.