Integrated discrimination improvement (IDI)


Comparison of the C-statistic with new model discriminators in the prediction of long versus short hospital stay

Richard J Woodman¹, Campbell H Thompson², Susan W Kim¹, Paul Hakendorf³

¹ Flinders Centre for Epidemiology and Biostatistics, Flinders University, Adelaide
² Discipline of General Medicine, Adelaide University, Adelaide
³ Redesigning Care, Flinders Medical Centre, Adelaide

2011 Australia and New Zealand Stata Users Group meeting

17th September 2011

Usefulness of new predictors

• Meaningful new risk predictors
  – Traditionally rely on the concordance statistic (C-statistic / area under the ROC curve) for assessing the usefulness of new predictive measures
• C-statistic
  – Measures overall test/model accuracy (sensitivity/specificity)
  – A weighted average of sensitivity over all possible cut-points
    • Weighted by the pdf of non-events
    • High sensitivities (low cut-points) have high weights
  – Probability interpretation: the probability of assigning a greater risk to a randomly selected patient with the event than to a randomly selected patient without the event
  – P(p_event > p_non-event) for a random pair
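The probability interpretation can be checked directly by counting pairs. The following is a minimal Python sketch, not the Stata routine used in the talk; the function name `c_statistic` and the predicted probabilities are hypothetical, and ties are counted as one half by assumption.

```python
from itertools import product

def c_statistic(p_events, p_nonevents):
    """Share of (event, non-event) pairs in which the event subject has the
    higher predicted probability. Ties count as 1/2 (an assumption)."""
    score = 0.0
    for pe, pn in product(p_events, p_nonevents):
        if pe > pn:
            score += 1.0
        elif pe == pn:
            score += 0.5
    return score / (len(p_events) * len(p_nonevents))

# Hypothetical predicted probabilities, NOT the study data
p_long = [0.9, 0.7, 0.6, 0.4]   # subjects with the event (long stay)
p_short = [0.5, 0.3, 0.2]       # subjects without the event (short stay)

c = c_statistic(p_long, p_short)
print(f"C-statistic = {c:.4f}")
```

Here 11 of the 12 possible pairs are concordant, so the sketch returns 11/12. Pair counting is quadratic in sample size, which is fine for illustration but slower than rank-based formulas on large datasets.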

Receiver Operating Characteristic (ROC) curve

[Figure: ROC curve plotting true positive rate (sensitivity) against false positive rate (1 - specificity), alongside a panel of predicted p, Pr(Longstay), for short-stay and long-stay patients. Area under ROC curve = 0.7167.]

∆ C-statistic interpretation: the increase in the probability that a random event subject will have a higher predicted p than a random non-event subject.

Usually small once a few good predictors are included in the model.

New Risk reclassification measures

• Clinicians want to know whether an added predictor will change risk such that they should treat patients differently
• Can we better quantify improvement in risk prediction from new biomarkers?
  – Net Reclassification Improvement (NRI)
  – Integrated Discrimination Improvement (IDI)
  – Pencina, D'Agostino et al., Statist. Med. 2008; 27:157-172.
• How do they differ from the C-statistic?
• How and when should we be using them?

Net Reclassification Improvement

• NRI can be calculated as a sum of two separate components: one for individuals with events and the other for individuals without events
• For events, assign 1 for upward reclassification, -1 for downward and 0 for people who do not change their risk category
• The opposite is done for non-events
• Sum the individual scores and divide by the number of people in each group

Category-free NRI

• Calculate p1 and p2 (old model = p1, new model = p2)
• Event NRI = P(up | event) – P(down | event)
• Non-event NRI = P(down | non-event) – P(up | non-event)
• NRI = Event NRI + Non-event NRI (Pencina 2008)
  Or
• ½ NRI (Pencina 2010)
  Or
• ½ wNRI (Pencina 2010)
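The category-free calculation can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the `nri` Stata commands shown later: the function name `category_free_nri` and the data are hypothetical, and subjects whose probability is unchanged count as neither up nor down.

```python
def category_free_nri(p_old, p_new, event):
    """Return (event NRI, non-event NRI, overall NRI) from paired predictions.
    p_old: probabilities from the old model (p1)
    p_new: probabilities from the new model (p2)
    event: 1 for event subjects, 0 for non-event subjects"""
    up_e = down_e = n_e = 0
    up_ne = down_ne = n_ne = 0
    for p1, p2, y in zip(p_old, p_new, event):
        if y == 1:
            n_e += 1
            up_e += int(p2 > p1)
            down_e += int(p2 < p1)
        else:
            n_ne += 1
            up_ne += int(p2 > p1)
            down_ne += int(p2 < p1)
    event_nri = (up_e - down_e) / n_e          # P(up | event) - P(down | event)
    nonevent_nri = (down_ne - up_ne) / n_ne    # P(down | non-event) - P(up | non-event)
    return event_nri, nonevent_nri, event_nri + nonevent_nri

# Hypothetical data, NOT the study data
p_old = [0.6, 0.7, 0.5, 0.4, 0.3, 0.5]
p_new = [0.8, 0.6, 0.7, 0.2, 0.4, 0.3]
event = [1, 1, 1, 0, 0, 0]

ev_nri, nev_nri, nri = category_free_nri(p_old, p_new, event)
print(ev_nri, nev_nri, nri)
```

In this toy example two of three event subjects move up and one moves down, and two of three non-event subjects move down and one moves up, so each component is 1/3 and the overall NRI is 2/3. Halving the total gives the ½ NRI variant.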

Integrated Discrimination Improvement (IDI)

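Absolute IDI: the difference in discrimination slopes between the new and old models, where the discrimination slope is the mean difference in p between events and non-events.

$$\mathrm{IDI} = (\bar{p}_{2,E} - \bar{p}_{2,NE}) - (\bar{p}_{1,E} - \bar{p}_{1,NE}) = (\bar{p}_{2,E} - \bar{p}_{1,E}) - (\bar{p}_{2,NE} - \bar{p}_{1,NE})$$

Relative IDI:

$$\text{Relative IDI} = \frac{\bar{p}_{2,E} - \bar{p}_{2,NE}}{\bar{p}_{1,E} - \bar{p}_{1,NE}}$$

Subscripts 1 and 2 denote the old and new model; E and NE denote events and non-events. (Pencina et al. 2008 report this ratio minus 1, i.e. the proportional increase in discrimination slope.)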
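As an illustration of the two formulas, here is a minimal Python sketch. It is not the `idi` Stata command; the function names and the data are hypothetical, and the relative measure is returned as the plain ratio of slopes, as written on the slide.

```python
from statistics import mean

def discrimination_slope(p, event):
    """Mean predicted p among events minus mean predicted p among non-events."""
    p_events = [pi for pi, yi in zip(p, event) if yi == 1]
    p_nonevents = [pi for pi, yi in zip(p, event) if yi == 0]
    return mean(p_events) - mean(p_nonevents)

def idi(p_old, p_new, event):
    """Return (absolute IDI, ratio of discrimination slopes)."""
    slope_old = discrimination_slope(p_old, event)
    slope_new = discrimination_slope(p_new, event)
    absolute = slope_new - slope_old   # difference in discrimination slopes
    ratio = slope_new / slope_old      # slide's "relative IDI"; subtract 1 for a proportional gain
    return absolute, ratio

# Hypothetical data, NOT the study data
p_old = [0.6, 0.7, 0.5, 0.4, 0.3, 0.5]
p_new = [0.8, 0.6, 0.7, 0.2, 0.4, 0.3]
event = [1, 1, 1, 0, 0, 0]

absolute_idi, slope_ratio = idi(p_old, p_new, event)
print(absolute_idi, slope_ratio)
```

In this toy example the discrimination slope doubles from 0.2 to 0.4, giving an absolute IDI of 0.2 and a slope ratio of 2.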

Recent example

Veerana et al., JACC 2011; 58(10): 1025-33 (August 2011).

Category-dependent NRI

[Table image: NRI results from Am J Epidemiology 174 (5); June 27, 2011]

Stratified versus Unstratified NRI

[Figure: stratified NRI by quartile (Q1 to Q4) for non-cases and cases, and unstratified NRI for non-cases and cases. Values shown on the slide, in extraction order: 0.085, 0.055, 0.088, 0.053, 0.003, -0.002, -0.01 (0.016), 0.72.]

Statistical testing: Z score for discordance ~ McNemar’s test.

Predicting length of hospital stay

• Short-stay wards are necessary due to bed shortages in specialist wards
• But incorrectly assigning patients to short-stay
  – Would overfill short-stay units
  – Would prevent correct treatment for long-stay patients
• Clinicians are trained to diagnose and treat, not to predict length of stay
• Few variables beyond age appear informative

Dataset

• 3 major hospitals
  – FMC
  – RGH
  – Auckland
• N=1457 general medical patients
• Complete data on:
  – Age
  – SBP
  – HR
  – RR
  – Mobility
  – WBC count
  – Cardiac failure (CF)
  – Need for supplementary oxygen (SuO2)
• All previously collected for predicting outcome
• Modified Early Warning Score (MEWS): used by Emergency Medical Services to quickly determine risk of death
  – SBP
  – HR
  – RR
  – Temperature

Statistical Analysis

• Logistic regression model for predicting p: P(long stay)
• Scaling using 2 Stata commands:
  – lintrend (Joanne Garrett, Univ North Carolina)
  – fracpoly (Patrick Royston)
• Calibration
  – HL-deciles and LR tests
• Measures of discrimination
  – C-statistic
  – IDI
  – Category-dependent NRI
    • 50% cut-off
    • 57% cut-off
  – Category-free NRI

STATA lintrend command – log odds age

lintrend longstay age, round(10) plot(log) xlab ylab

STATA lintrend command – log odds WBC count

lintrend longstay wbc, round(1) plot(log) xlab ylab

Fracpoly WBC

. fracpoly logistic longstay wbc, table compare
........
-> gen double Iwbc__1 = X^.5-.9876731667 if e(sample)
-> gen double Iwbc__2 = X^.5*ln(X)+.0245010876 if e(sample)
   (where: X = wbc/10)

Logistic regression                       Number of obs =   1457
                                          LR chi2(2)    =  49.38
                                          Prob > chi2   = 0.0000
Log likelihood = -971.8662                Pseudo R2     = 0.0248

-----------------------------------------------------------------------------
    longstay | Odds Ratio   Std. Err.      z    P>|z|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
     Iwbc__1 |   .0040704   .0076682    -2.92   0.003    .0001014    .1633818
     Iwbc__2 |   34.78284   33.17947     3.72   0.000    5.362915    225.5948
-----------------------------------------------------------------------------
Deviance: 1943.73. Best powers of wbc among 44 models fit: .5 .5.

Fractional polynomial model comparisons:
--------------------------------------------------------------
wbc             df    Deviance    Dev. dif.   P (*)   Powers
--------------------------------------------------------------
Not in model     0    1993.113     49.380     0.000
Linear           1    1954.819     11.087     0.011   1
m = 1            2    1949.234      5.502     0.064   2
m = 2            4    1943.732        -         -     .5 .5
--------------------------------------------------------------
(*) P-value from deviance difference comparing reported model with m = 2 model
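To see what the two generated terms do to a raw WBC value, the transformation can be reproduced outside Stata. This is a minimal Python sketch: the centring constants and odds ratios are copied from the output above, while the function names and the example WBC values are hypothetical.

```python
import math

# Constants copied from the fracpoly output above
CENTRE_1 = 0.9876731667    # subtracted from X^.5
CENTRE_2 = 0.0245010876    # added to X^.5 * ln(X)
OR_1, OR_2 = 0.0040704, 34.78284   # univariate odds ratios for Iwbc__1, Iwbc__2

def fp2_wbc_terms(wbc):
    """Return (Iwbc__1, Iwbc__2) for a WBC count, using X = wbc/10."""
    x = wbc / 10.0
    root = math.sqrt(x)
    return root - CENTRE_1, root * math.log(x) + CENTRE_2

def wbc_log_odds(wbc):
    """Contribution of WBC to the log odds of long stay in the WBC-only model."""
    t1, t2 = fp2_wbc_terms(wbc)
    return math.log(OR_1) * t1 + math.log(OR_2) * t2

# Hypothetical WBC values chosen only to show the curve is not linear
for wbc in (5, 10, 20):
    print(wbc, round(wbc_log_odds(wbc), 3))
```

Both terms are approximately zero at the centring value, wbc = 10 × 0.9876731667², so the odds ratios describe departures from that reference point.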

Final model

Predictor                     Odds ratio   95% CI           P-value
Age (yrs)                     1.07         1.04-1.10        <0.001
HR (10 bpm)                   1.04         1.01-1.06        0.001
Age#HR                        0.9996       0.9993-0.9999    0.04
Mobility (range 0 to 3.5)     13.1         3.9-44.2         <0.001
Age#mobility                  0.97         0.96-0.99        <0.001
BP (mmHg)                     0.995        0.992-0.998      0.001
WBC_1 (^0.5)                  0.001        0.000-0.085      0.002
WBC_2 (^0.5*ln(x))            49.2         5.8-417.9        <0.001
RR (breaths/min)              1.05         1.01-1.09        0.02
CCF (0=N,1=Y)                 1.68         1.14-2.48        0.009
SuO2 (0=N,1=Y)                1.54         1.05-2.26        0.03

Calibration

[Figure: observed versus predicted numbers (n) of long-stay patients across 10 risk groups]

number of observations = 1457
number of groups = 10
Hosmer-Lemeshow chi2(8) = 14.66
Prob > chi2 = 0.07

number of observations = 1457
number of covariate patterns = 1457
Pearson chi2(1445) = 1486.69
Prob > chi2 = 0.22

[Figure: observed versus predicted numbers (n) of long-stay patients across 5 risk groups]

number of observations = 1457
number of groups = 5
Hosmer-Lemeshow chi2(3) = 5.64
Prob > chi2 = 0.13
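For readers unfamiliar with how the Hosmer-Lemeshow statistic is built, the following is a minimal Python sketch of the usual grouped calculation. It is an assumption-laden illustration, not Stata's implementation: the function name and data are hypothetical, groups are formed by simple equal-size slicing of the sorted predictions, and the p-value is omitted because it needs a chi-squared distribution function.

```python
def hosmer_lemeshow(p, y, groups=10):
    """Return (HL chi-squared statistic, degrees of freedom).
    p: predicted probabilities, y: observed outcomes (0/1).
    Subjects are sorted by p and cut into `groups` roughly equal-size groups."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    n = len(order)
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        n_g = len(idx)
        observed = sum(y[i] for i in idx)   # observed events in the group
        expected = sum(p[i] for i in idx)   # expected events in the group
        # Fails if a group's expected count is 0 or n_g (not handled in this sketch)
        stat += (observed - expected) ** 2 / (expected * (1 - expected / n_g))
    return stat, groups - 2

# Hypothetical, perfectly calibrated data: 1/5, 2/5, 3/5 and 4/5 events per group
p = [0.2] * 5 + [0.4] * 5 + [0.6] * 5 + [0.8] * 5
y = [1, 0, 0, 0, 0,  1, 1, 0, 0, 0,  1, 1, 1, 0, 0,  1, 1, 1, 1, 0]

hl_stat, hl_df = hosmer_lemeshow(p, y, groups=4)
print(hl_stat, hl_df)
```

With observed and expected counts equal in every group, the statistic is zero; larger values indicate worse calibration. Because the result depends on how groups are formed, the 10-group and 5-group outputs above need not agree.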

C-statistic

* Compare Age with Age + Heart rate using "roccomp"
quietly logistic longstay age
predict p1 if e(sample), p
quietly logistic longstay c.age##c.hrby10
predict p2 if e(sample), p
roccomp longstay p1 p2

                       ROC                     -Asymptotic Normal-
         Obs      Area     Std. Err.      [95% Conf. Interval]
------------------------------------------------------------------------
p1      1457    0.7167       0.0136        0.69000     0.74338
p2      1457    0.7433       0.0131        0.71767     0.76897
------------------------------------------------------------------------
Ho: area(p1) = area(p2)
    chi2(1) = 15.68       Prob>chi2 = 0.0001

ROC curves

[Figure: ROC curves as predictors are added; legend: Age, HR, RR, mobility, CCF, BP, SuppO2, WBC. Age: area under ROC = 0.717. Age + heart rate: area under ROC = 0.743.]

Increase in P(p_event > p_non-event) for a random pair ≈ 2.5%

Sensitivity and Specificity

[Figure: two panels plotting sensitivity and specificity against cut-point (0.0 to 1.0) for successive models; legend: Age, HR, RR, mobility, CCF, BP, SuppO2, WBC.]

Improved sensitivity only at high cut-points.

The C-statistic weights large sensitivities more heavily, which may be why improvements in sensitivity with later predictors do not translate into an increased C.
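The cut-point plots rest on a simple calculation repeated at every threshold. Below is a minimal Python sketch with hypothetical data and a hypothetical function name; it assumes a subject is classified as long-stay when predicted p is at or above the cut-point.

```python
def sens_spec(p, y, cut):
    """Sensitivity and specificity when p >= cut is classified as an event."""
    tp = sum(1 for pi, yi in zip(p, y) if yi == 1 and pi >= cut)
    fn = sum(1 for pi, yi in zip(p, y) if yi == 1 and pi < cut)
    tn = sum(1 for pi, yi in zip(p, y) if yi == 0 and pi < cut)
    fp = sum(1 for pi, yi in zip(p, y) if yi == 0 and pi >= cut)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical predictions, NOT the study data
p = [0.9, 0.7, 0.6, 0.4, 0.5, 0.3, 0.2]
y = [1, 1, 1, 1, 0, 0, 0]

for cut in (0.25, 0.5, 0.75):
    sens, spec = sens_spec(p, y, cut)
    print(f"cut={cut:.2f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```

Raising the cut-point lowers sensitivity and raises specificity, so a predictor that only helps at high cut-points improves sensitivity where it is already low.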

Predicted probabilities

[Figure: histograms of predicted probabilities (x-axis 0 to 1, y-axis n) for models p1 to p7; panels: Short-stay (n=630) and Long-stay (n=827).]

Annotations on the slide: distribution of probabilities shifts lower; distribution of probabilities flattens.

STATA NRI command

User-written. Author: Liisa Byberg, Department of Surgical Sciences, Orthopedics unit, and Uppsala Clinical Research Center, Uppsala University, Sweden

type: net from http://www.ucr.uu.se/sv/images/stories/downloads

Syntax:
nri1 depvar varlist1, prvars(varlist2) cut(#)
nri2 depvar varlist1, prvars(varlist2) cut(# #)
nri3 depvar varlist1, prvars(varlist2) cut(# # #)

nri1 – heart rate (probability cut-point=50)

nri1 longstay age, prvars(hrby10 agehrby10) cut(50)

-----------------------------------------------------------------
NRI       |   Estimate    Std. Err.          Z     P-value
----------+------------------------------------------------------
          |    0.05170      0.01792    2.88484     0.00392
-----------------------------------------------------------------

Rows: established risk factors. Columns: established risk factors + new predictors.

------------------------------------
longstay  |   <50%   >=50%    Total
----------+-------------------------
1         |
     <50% |    108      63      171
    >=50% |     36     620      656
    Total |    144     683      827
----------+-------------------------
0         |
     <50% |    294      29      323
    >=50% |     41     266      307
    Total |    335     295      630
------------------------------------

                              Long-stay          Short-stay
Reclassified downward (%)     36/827 (0.0435)    41/630 (0.0651)
Reclassified upward (%)       63/827 (0.0762)    29/630 (0.0460)
Upward minus downward         0.0327             -0.0190

NRI = 0.0517, P-value = 0.004

SE = √((0.0762+0.0435)/827 + (0.0460+0.0651)/630) = 0.0179
z = 0.0517/0.0179 = 2.88 (McNemar: asymptotic test for correlated proportions)
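The arithmetic can be reproduced from the reclassification counts in the table. This is a minimal Python sketch, not the `nri1` source code: the function name is hypothetical, the standard error follows the formula shown on the slide, and the two-sided p-value assumes a standard normal reference distribution.

```python
import math

def nri_from_counts(up_e, down_e, n_e, up_ne, down_ne, n_ne):
    """NRI, standard error, z and two-sided p-value from reclassification counts.
    *_e: events (long stay), *_ne: non-events (short stay)."""
    nri = (up_e - down_e) / n_e + (down_ne - up_ne) / n_ne
    # Same as sqrt((P(up|E) + P(down|E))/n_e + (P(up|NE) + P(down|NE))/n_ne)
    se = math.sqrt((up_e + down_e) / n_e ** 2 + (up_ne + down_ne) / n_ne ** 2)
    z = nri / se
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal p-value
    return nri, se, z, p_value

# Counts taken from the nri1 table: 63 up and 36 down among 827 long-stay,
# 29 up and 41 down among 630 short-stay
nri, se, z, p_value = nri_from_counts(63, 36, 827, 29, 41, 630)
print(f"NRI={nri:.4f}  SE={se:.4f}  z={z:.2f}  p={p_value:.4f}")
```

Working from counts rather than rounded proportions avoids small rounding differences in the standard error.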

STATA IDI command syntax

idi depvar varlist1, prvars(varlist2)

idi longstay age, prvars(hrby10 agehrby10)

---------------------------------------------------
IDI       |   Estimate    Std. Err.     P-value
----------+----------------------------------------
          |    0.04195      0.00525     0.00000
---------------------------------------------------

Definition:

$$\mathrm{IDI} = (IS_2 - IS_1) - (IP_2 - IP_1), \qquad IS = \int \text{sensitivity}, \qquad IP = \int (1 - \text{specificity})$$

Equivalently:

$$\mathrm{IDI} = \overline{(p_2 - p_1)}_{\text{events}} - \overline{(p_2 - p_1)}_{\text{non-events}}$$
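A short derivation may help connect the integral form to the mean-difference form. It assumes that the integrals run over all cut-points c in [0, 1] and uses the standard tail formula for the mean of a variable bounded in [0, 1]; the symbols $\hat{p}$, $c$, $E$ (events) and $NE$ (non-events) are notation introduced here for illustration.

```latex
\begin{aligned}
IS &= \int_0^1 P(\hat{p} > c \mid E)\,dc = \mathbb{E}[\hat{p} \mid E], \\
IP &= \int_0^1 P(\hat{p} > c \mid NE)\,dc = \mathbb{E}[\hat{p} \mid NE], \\
\mathrm{IDI} &= (IS_2 - IS_1) - (IP_2 - IP_1)
  = \mathbb{E}[\hat{p}_2 - \hat{p}_1 \mid E] - \mathbb{E}[\hat{p}_2 - \hat{p}_1 \mid NE].
\end{aligned}
```

In a sample, the expectations are replaced by the mean predicted probabilities among events and non-events, which gives the mean-difference form.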

Predicted probabilities and the IDI

[Figure: predicted p (0.0 to 1.0) by predictor variable (1 to 8), in separate panels for short-stay and long-stay patients, showing individual subjects and the overall mean; a second plot overlays the short-stay and long-stay means by predictor variable.]

IDI = difference minus baseline difference

IDI interpretation:

Improvement in average sensitivity plus any potential decrease in average (1 - specificity).

Magnitude is hard to interpret.

Some studies also present relative IDI (%).

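[Figure: two bar charts of the increase in C-statistic (y-axis 0 to 0.03) and in IDI (y-axis 0 to 0.045) as each variable is added: HR, RR, Mobility, CCF, BP, Supp_O2, WBC. Significance asterisks on the slide: C-statistic, four increments (***, ***, **, **); IDI, seven increments (***, ***, ***, ***, ***, *, *).]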

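[Figure: two bar charts of the category-dependent NRI at the 50% cut-off (NRI50, y-axis 0 to 0.06) and the 57% cut-off (NRI57, y-axis -0.01 to 0.06) as each variable is added: HR, RR, Mobility, CCF, BP, Supp_O2, WBC. Three increments in each panel carry significance asterisks (**, *, **).]

The effect of each variable on re-classification depends on the classification cut-point.

Small changes in the chosen cut-point can have large influences.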

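Overall Category-free NRI

[Figure: bar chart of overall category-free NRI (y-axis -0.05 to 0.3) as each variable is added: HR, RR, Mobility, CCF, BP, Supp_O2, WBC. Five increments carry significance asterisks (***, ***, ***, *, *).]

Interpretation: proportion of subjects with movement of p in the correct direction, averaged for event and non-event subjects.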

[Figure: two bar charts, Category-free Event NRI (y-axis -0.7 to 0.1) and Category-free Non-Event NRI (y-axis 0 to 0.8), as each variable is added: HR, RR, Mobility, CCF, BP, Supp_O2, WBC. All seven increments in each panel are marked ***.]

Interpretation: net movement of p's in the correct direction, for event and non-event subjects separately.

• Event NRI: Pr(p is higher) - Pr(p is lower) → mostly poorer re-classification
• Non-event NRI: Pr(p is lower) - Pr(p is higher) → consistently improved re-classification

[Figure: two bar charts (y-axis 0 to 1) as each variable is added: HR, RR, Mobility, CCF, BP, Supp_O2, WBC.]

• Proportion of long-stay whose p went up: mostly < 50% with each new variable
• Proportion of short-stay whose p went down: consistently > 50% with each new variable

Summary

• IDI
  – Mirrored the C-statistic but was more sensitive.
  – Equally weights sensitivity across cut-points.
  – C-statistic weights large sensitivities more heavily.
• Category-dependent NRI
  – The variables selected were heavily dependent on the chosen cut-points.
  – Fewer variables identified as important discriminators than for either the C-statistic, the IDI or category-free NRI.
• Category-free NRI
  – Overall, quite similar results to the C-statistic and IDI.
  – Very different performances amongst the short-stay and long-stay patients.

Conclusions

• Discrimination statistics cannot be used interchangeably
• May be necessary to present all 4 for greatest insight.
• C-statistic: averaged sensitivity
  – Does not weight equally across cut-points
  – Does not assess risk re-classification.
• IDI: averaged sensitivity
  – Weights cut-points equally
  – Adjusts for specificity differently to C-statistic
  – May better highlight potentially important predictors.
• Category-free NRI: % subjects with correct movement in p.
  – Event and non-event NRI may perform quite differently
• Category-dependent NRI: % correct movement across categories.
  – Results may be heavily influenced by chosen cut-points.
  – Be wary of studies using the category-dependent NRI with non-predefined cut-points.