Added Predictive Ability


A SAS Macro to Compute Added Predictive Ability of New Markers in Logistic Regression

Kevin Kennedy, MS Saint Luke’s Hospital, Kansas City, MO Kansas City Area SAS User Group Meeting September 3, 2009

Acknowledgment

A special thanks to Michael Pencina (PhD, Biostatistics Dept, Boston University, Harvard Clinical Research Institute) for his valuable input, ideas, and additional formulas to strengthen output.

Motivation

Predicting dichotomous outcomes is important

Will a patient develop a disease?

Will an applicant default on a loan?

Will KSU win a game in the NCAA tournament?

Improving an already usable model should be a continuing goal

Motivation

Many published models exist

Risk of Bleeding after PCI

Predicts a patient’s risk of bleeding after a PCI (percutaneous coronary intervention) based on 10 patient characteristics: age, gender, shock, prior intervention, kidney function, etc.

Example: a 78-year-old female with prior shock and congestive heart failure has a ~8% chance of bleeding after the procedure (national average = 2-3%)

Why important? We can treat those at high risk appropriately

Mehta et al. Circ Intervention, June 2009

However…..

These models aren’t etched in stone

New markers (variables) should be investigated to improve model performance

Important Question: How do we determine if we should add these new markers to the model?

Project Goal

Compare:

model1: logit(y) = β1·var1 + … + βn·varn

with

model2: logit(y) = β1·var1 + … + βn·varn + βn+1·varn+1

How much does the new variable add to model performance?

Output of Interest: Predicted Probability of Event (one for each model) computed with the logistic regression model
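The macro itself works in SAS, but the quantity being compared is easy to sketch in any language. A minimal Python illustration of turning each model’s linear predictor into a predicted probability (the coefficients and covariate values here are invented for the example, not taken from any fitted model):

```python
import math

def predicted_probability(intercept, coefs, covariates):
    """Invert the logit: p = 1 / (1 + exp(-(b0 + b1*x1 + ... + bn*xn)))."""
    linear_predictor = intercept + sum(b * x for b, x in zip(coefs, covariates))
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Hypothetical fitted coefficients: model 2 adds one new marker to model 1
p1 = predicted_probability(-2.0, [0.03, 0.8], [65, 1])          # model 1: age, gender
p2 = predicted_probability(-2.5, [0.03, 0.8, 1.1], [65, 1, 1])  # model 2: + new marker

print(round(p1, 3), round(p2, 3))  # 0.679 0.794
```

Each subject gets one such probability per model; everything that follows (AUC, IDI, NRI, decision curves) is computed from those two probability columns.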

Outline

Traditional Comparisons

Receiver Operating Characteristic Curve

New Measures

IDI, NRI, Vickers Decision Curve

SAS macro to obtain output

Traditional Approach-AUCs

Common to plot the Receiver Operating Characteristic (ROC) curve and report the area underneath it (AUC), or c-statistic

Measures model discrimination

Equivalent to the probability that the predicted risk is higher for an event than for a non-event [4]

AUC/ROC Background

ROC/AUC plot depicts the trade-off between benefit (true positives) and cost (false positives) by defining a “cut-off” to determine positive and negative individuals

[Figure: overlapping distributions of predicted probabilities for events and non-events with a cut-off line; at this cut-off the true positive rate (sensitivity) = .9 and the false positive rate (1-specificity) = .4]

[Figure: ROC curve, true positive rate (sensitivity, 0-100%) vs. false positive rate (1-specificity, 0-100%), AUC ≈ .9; rough guide: AUC 1 to ~.9 perfect discrimination (“Yippy!”), ~.9 to ~.7 (“Ehh…”), ~.7 to .65 (“Ugh…”), .5 no discrimination]

AUC Computation

All possible pairs are made between events and non-events

A dataset with 100 events and 1000 non-events would have 100*1000 = 100,000 pairs

If the predicted probability is higher for the subject actually experiencing the event, score the pair ‘1’ (concordant), otherwise ‘0’ (discordant)

The c-statistic is the average of the 1’s and 0’s (.5 for ties)

Now use methods by DeLong to compare the AUCs of Model 1 and Model 2

Advantages

Widely used

Recommended guidelines exist for excellent/good/poor discrimination

Default output in proc logistic, with new extensions in version 9.2 (roc and roccontrast statements) along with other computing packages

Disadvantages

Rank based

A comparison of .51 to .50 is treated the same as .7 to .1

High discrimination doesn’t imply a useful model

Example: if all events have probability .51 and all non-events .50, discrimination is perfect (c = 1) but the model is not useful

Extremely hard to find markers that result in a high AUC

Pepe [5] claims an odds ratio of 3 doesn’t yield good discrimination

Alternatives

Pencina and D’Agostino in 2008 (Statistics in Medicine) suggest 2 additional statistics:

IDI (Integrated Discrimination Improvement)

NRI (Net Reclassification Improvement)

Vickers [2,3] developed graphical techniques of comparing models

IDI

Measure of improved sensitivity without sacrificing specificity

Formula measures how much the mean predicted probability ‘p’ increases for events and decreases for non-events

Absolute_IDI = (p_event_2 - p_event_1) + (p_nonevent_1 - p_nonevent_2)

Relative_IDI = (p_event_2 - p_nonevent_2) / (p_event_1 - p_nonevent_1) - 1

where p_event_2 is the mean predicted probability of the events in the 2nd model, and the other terms are defined analogously

IDI

[Figure: bar chart of mean predicted probability (%) under each model: non-events 15 (Model 1) vs. 12 (Model 2); events 20 (Model 1) vs. 25 (Model 2)]

Absolute_IDI = (.25 - .2) + (.15 - .12) = .08

Relative_IDI = (.25 - .12) / (.2 - .15) - 1 = 1.6

Mean ‘p’ increased by 5% for events; mean ‘p’ decreased by 3% for non-events; 160% relative improvement
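The two IDI formulas can be checked with a few lines of Python (the function is my own sketch, not the macro’s code; the toy data below has one event and one non-event whose probabilities equal the slide’s group means):

```python
def mean(xs):
    return sum(xs) / len(xs)

def idi(p_model1, p_model2, outcomes):
    """Absolute and relative IDI from each model's predicted probabilities."""
    pe1 = mean([p for p, y in zip(p_model1, outcomes) if y == 1])
    pe2 = mean([p for p, y in zip(p_model2, outcomes) if y == 1])
    pn1 = mean([p for p, y in zip(p_model1, outcomes) if y == 0])
    pn2 = mean([p for p, y in zip(p_model2, outcomes) if y == 0])
    absolute = (pe2 - pe1) + (pn1 - pn2)
    relative = (pe2 - pn2) / (pe1 - pn1) - 1
    return absolute, relative

# Means match the slide: events .2 -> .25, non-events .15 -> .12
absolute, relative = idi([0.2, 0.15], [0.25, 0.12], [1, 0])
print(round(absolute, 2), round(relative, 1))  # 0.08 1.6
```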

NRI

Measures how well the new model reclassifies events and non-events

Dependent on how we decide to classify observations

Example: 0-10% low, 10-20% moderate, >20% high

Questions: do the patients experiencing an event go up in risk (moderate to high between models 1 and 2)? Do patients not experiencing an event go down in risk (moderate to low)?

NRI

Formula:

NRI = ((# events moving up) - (# events moving down)) / (# of events) + ((# non-events moving down) - (# non-events moving up)) / (# of non-events)

NRI Computation Example

3 groups defined:

<10% (low)

10-20% (moderate)

>20% (high)

Each individual will have a group for models 1 and 2

2 cross-tabulation tables (events and non-events)

NRI Computation Example

Crosstab1: Events (100 events); rows = Model 1, columns = Model 2

        Low   Mod   High
Low      10     8      2
Mod       3    30     10
High      2     5     30

20 events moving up, 10 events moving down, 70 events not moving

Net of 10/100 (10%) of events getting reclassified correctly

NRI Computation Example

Crosstab2: Non-Events (200 non-events); rows = Model 1, columns = Model 2

        Low   Mod   High
Low      50     5      0
Mod      20    40     10
High      5    10     60

15 non-events moving up, 35 non-events moving down, 150 non-events not moving

Net of 20/200 (10%) of non-events getting reclassified correctly

NRI

   20 100  10 100      35 200  15 200    .

2
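Given each subject’s risk group under both models, the NRI is a simple tally; a Python sketch (function name and toy data are mine, not the macro’s code):

```python
def nri(group_pre, group_post, outcomes):
    """Net reclassification improvement from ordinal risk-group assignments
    under the old model (group_pre) and the new model (group_post)."""
    up_e = down_e = up_ne = down_ne = 0
    for pre, post, y in zip(group_pre, group_post, outcomes):
        if y == 1:
            up_e += post > pre    # events should move up in risk
            down_e += post < pre
        else:
            up_ne += post > pre   # non-events should move down
            down_ne += post < pre
    n_e, n_ne = outcomes.count(1), outcomes.count(0)
    return (up_e - down_e) / n_e + (down_ne - up_ne) / n_ne

# Toy data: one event moves up, one non-event moves down, the rest stay put
print(nri([1, 2, 3, 2], [2, 2, 2, 2], [1, 1, 0, 0]))  # 1.0
```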

NRI Caveats

Dependent on groups

Would we reach similar conclusions with groups <15, 15-30, >30?

Alternative ways to define groups:

Any up/down movement: a change in p from .15 to .151 would be an ‘up’ movement

A threshold: e.g., a change of 3% constitutes an up/down movement, so .33 to .37 would be an ‘up’ but .33 to .34 would be no movement

Good news: the macro handles all these cases, and you can request them all at once

Vickers Decision Curve

A graphical comparison of models 1 and 2 based on ‘net’ benefit (first attributed to Peirce, 1884)

Useful if a threshold is important. Example: if a person’s predicted probability of an outcome is greater than 10%, we treat with strategy A

Here we’d want to compare the models at this threshold

NetBenefit = (True Positives / n) - (False Positives / n) * (p_t / (1 - p_t))

where p_t is the threshold probability

Vickers Decision Curve

Example: N = 1000, dichotomous event, 10% as threshold

        Outcome Yes (true positives)   Outcome No (false positives)
≥10%               80                            200
<10%               20                            700

NetBenefit = (80/1000) - (200/1000) * (.1/(1 - .1)) = .0578

5.78 net true positive results per 100 compared with treating all as negative
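The worked example above can be checked with a few lines of Python (the function is my own sketch of the net-benefit formula, not the macro’s code):

```python
def net_benefit(true_pos, false_pos, n, threshold):
    """Vickers' net benefit: TP/n - FP/n * (pt / (1 - pt)),
    i.e. false positives are penalized by the odds of the threshold."""
    return true_pos / n - false_pos / n * (threshold / (1 - threshold))

# Slide example: 80 true and 200 false positives among 1000, 10% threshold
print(round(net_benefit(80, 200, 1000, 0.1), 4))  # 0.0578
```

A decision curve simply plots this quantity for each model across a range of thresholds.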

Vickers Decision Curve

[Figure: decision curves of net benefit (-0.01 to 0.06) vs. threshold probability in % (0-50) for Model1, Model2, and treat all; no difference between models 1, 2, and treat-all at low thresholds, and Model 2 seems to outperform Model 1]

SAS Macro to Obtain Output

%added_pred(data= ,id= , y= , model1cov= ,model2cov= , nripoints=ALL, nriallcut=%str(), vickersplot=FALSE, vickerpoints=%str());

model1cov = initial model

model2cov = new model

nripoints = NRI levels (e.g., <10, 10-20, >20: insert nripoints=.1 .2)

nriallcut = to test an amount of increase or decrease instead of levels, i.e., if you want to know whether a person increases/decreases by 5%, insert nriallcut=.05

vickerpoints = thresholds to test (e.g., 10%)

AUC section of Macro

AUC comparisons are easy with SAS 9.2:

PROC LOGISTIC DATA=&DATA;
  MODEL &y=&model1cov &model2cov;
  ROC 'FIRST' &model1cov;
  ROC 'SECOND' &model2cov;
  ROCCONTRAST REFERENCE='FIRST';
  ODS OUTPUT ROCASSOCIATION=ROCASS ROCCONTRASTESTIMATE=ROCDIFF;
RUN;

If working from an earlier version, the %roc macro will be called from the SAS website [6]

AUC Section of Macro

Sample output:

%added_pred(data=data, id=id, y=event, model1cov=x1 x2 x3 x4, model2cov=x1 x2 x3 x4 x5, …);

Model 1 AUC: 0.77907
Model 2 AUC: 0.79375
Difference in AUC: 0.0147
Std error for difference: 0.0042
P-value for difference: 0.0005
95% CI for difference: (0.00646, 0.0229)

IDI Section of Macro

Use the PROC LOGISTIC output datasets for model 1 and model 2:

PROC LOGISTIC DATA=&DATA;
  MODEL &y=&model1cov;
  OUTPUT OUT=OLD PRED=P_OLD;
RUN;
PROC LOGISTIC DATA=&DATA;
  MODEL &y=&model2cov;
  OUTPUT OUT=NEW PRED=P_NEW;
RUN;

proc sql noprint;
  create table probs as
  select *, (p_new-p_old) as pdiff
  from old(keep=&id &y &model1cov &model2cov p_old) as a
  join new(keep=&id &y &model1cov &model2cov p_new) as b
  on a.&id=b.&id
  order by &y;
quit;

Now use PROC MEANS or SQL to obtain: p_event_new, p_event_old, p_nonevent_new, p_nonevent_old

IDI Section of Macro

Sample output:

IDI: .0207
IDI std err: .0064
Z-value: 6.3
P-value: <.0001
95% CI: (0.0143, 0.0272)
Probability change for events: .0186
Probability change for non-events: -.002125
Relative IDI: .167

NRI Section of Macro

In 3 parts:

All groups (any up/down movement)

User defined (e.g., <10, 10-20, >20)

Threshold (e.g., a change of 3%)

Coding is more involved here, containing:

Counts for the number of groups

Do-loops for the various numbers of thresholds and user groups

NRI Section of Macro

User defined groups (e.g., <10, 10-20, >20: nripoints=.1 .2):

%if &nripoints^=ALL %then %do;
  /*figure out how many ordinal groups*/
  %let numgroups=%eval(%words(&nripoints)+1);
  data nriprobs;
    set probs;
    /*define first ordinal group for pre and post*/
    if 0<=p_old<=%scan(&nripoints,1,' ') then group_pre=1;
    if 0<=p_new<=%scan(&nripoints,1,' ') then group_post=1;
    /*assign the remaining ordinal groups*/
    %let i=1;
    %do %until(&i>%eval(&numgroups-1));
      if %scan(&nripoints,&i,' ')<p_old then group_pre=%eval(&i+1);
      if %scan(&nripoints,&i,' ')<p_new then group_post=%eval(&i+1);
      %let i=%eval(&i+1);
    %end;
    /*flag up/down movement for events and non-events*/
    if &y=0 then do;
      if group_post>group_pre then up_nonevent=1;
      if group_post<group_pre then down_nonevent=1;
    end;
    if &y=1 then do;
      if group_post>group_pre then up_event=1;
      if group_post<group_pre then down_event=1;
    end;
    if up_nonevent=. then up_nonevent=0;
    if down_nonevent=. then down_nonevent=0;
    if up_event=. then up_event=0;
    if down_event=. then down_event=0;
  run;
%end;


NRI Section of Macro

Sample output: %added_pred(data=,…..,nripoints=.1 .2, nriallcut=.03,…..);

Group     NRI    Std Err   Z value   P value   95% CI            % of events             % of non-events
                                                                 correctly reclassified  correctly reclassified
ALL       .454   .09       9.7       <.0001    (0.3649,0.5447)   (10%)                   56%
CUT_.03   .101   .08       2.46      .014      (0.0205,0.1817)   4%                      6%
USER      .127   .05       4.95      <.0001    (0.0769,0.177)    5%                      8%

Vickers’ Section of Macro

Default is no analysis

Can test multiple thresholds

Uses bootstrapping techniques to create 95% CIs

If testing thresholds, run time will increase due to bootstrapping

Vickers Section of Macro

%added_pred(data=,…..,vickersplot=TRUE,…..)

[Figure: Vickers decision curve; net benefit (-0.01 to 0.11) vs. threshold probability in % (0-100) for Model1, Model2, and treat all]

Conclusions

Don’t rely only on the AUC and statistical significance to assess a marker’s added predictive ability; use a combination of methods

Future: extend to time-to-event analysis

Q’s or Comments

If you want to use the macro or obtain literature, contact me at:

[email protected] or [email protected]

References

1) Pencina MJ, D'Agostino RB Sr, D’Agostino RB Jr. Evaluating the added predictive ability of a new marker: From area under the ROC curve to reclassification and beyond. Stat Med 2008; 27:157-72.

2) Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction models. Medical Decision Making. 2006 Nov-Dec;26(6):565-74.

3) Vickers AJ, Cronin AM, Elkin EB, Gonen M. Extensions to decision curve analysis, a novel method for evaluating diagnostic tests, prediction models and molecular markers. BMC Medical Informatics and Decision Making. 2008 Nov 26;8(1):53.

4) Cook NR. Use and Misuse of the receiver operating characteristics curve in risk prediction. Circulation 2007; 115:928-935.

5) Pepe MS, Janes H, Longton G, Leisenring W, Newcomb P. Limitations of the odds ratio in gauging the performance of a diagnostic, prognostic, or screening marker. Am J Epidemiol. 2004;159:882-890.

6) http://support.sas.com/kb/25/017.html