Statistics 262: Intermediate Biostatistics

May 18, 2004: Cox Regression III: residuals and
diagnostics, repeated events
Jonathan Taylor and Kristin Cobb
Statistics 262
Residuals


Residuals are used to investigate the
lack of fit of a model to a given subject.
For Cox regression, there’s no easy
analog to the usual “observed minus
predicted” residual of linear regression.
Deviance Residuals
Deviance residuals are based on
martingale residuals: $c_i$ (1 if event, 0 if
censored) minus the estimated
cumulative hazard up to $t_i$ (as a function of the
fitted model) for individual $i$:
$c_i - \hat{H}(t_i, X_i, \hat{\beta})$
See Hosmer and Lemeshow for more
discussion…
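
As a rough illustration of the definition above, the sketch below computes a martingale residual and converts it to a deviance residual with the standard transformation $d_i = \mathrm{sign}(M_i)\sqrt{-2[M_i + c_i \log(c_i - M_i)]}$. The function names are hypothetical, and the estimated cumulative hazard is assumed to come from an already-fitted Cox model.

```python
import math

def martingale_residual(c, cum_hazard):
    """c is 1 for an event, 0 for censoring; cum_hazard is the estimated H(t_i)."""
    return c - cum_hazard

def deviance_residual(c, cum_hazard):
    """Rescale the martingale residual so it is more symmetric around 0."""
    m = martingale_residual(c, cum_hazard)
    # For an event, c - m equals the cumulative hazard; the log term is 0 if censored.
    log_term = math.log(c - m) if c == 1 else 0.0
    sign = (m > 0) - (m < 0)
    # max() guards against a tiny negative value caused by floating-point rounding.
    return sign * math.sqrt(max(0.0, -2.0 * (m + log_term)))

# Made-up inputs: an event and a censored subject, both with estimated H = 0.5
print(deviance_residual(1, 0.5))  # about 0.62
print(deviance_residual(0, 0.5))  # -1.0
```

A subject who has the event early (small cumulative hazard) gets a positive residual; a censored subject always gets a negative one.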

Deviance Residuals
• Behave like residuals from ordinary linear regression.
• Should be roughly symmetrically distributed around 0, with a standard deviation of about 1.0, when the model fits.
• Negative for observations whose observed survival times are longer than expected.
• Plot deviance residuals against covariates to look for unusual patterns.
Deviance Residuals
In SAS, option on the output statement:
output out=outdata resdev=

Schoenfeld residuals

• Schoenfeld (1982) proposed the first set of residuals for use with Cox regression packages.
  Schoenfeld D. Partial residuals for the proportional hazards regression model. Biometrika, 1982, 69(1):239-241.
• Instead of a single residual for each individual, there is a separate residual for each individual for each covariate.
• Based on the individual contributions to the derivative of the log partial likelihood (see chapter 6 in Hosmer and Lemeshow for more math details, p. 198-199).
• Note: Schoenfeld residuals are not defined for censored individuals.
Schoenfeld residuals
Where k is the covariate of interest,
the Schoenfeld residual is the covariate value, $x_{ik}$, for
the person ($i$) who actually died at time $t_i$ minus the
expected value of the covariate for the risk set at $t_i$
(= a weighted average of the covariate, weighted by
each individual’s likelihood of dying at $t_i$):

$$\text{residual}_{ik} = x_{ik} - \sum_{j \in R(t_i)} x_{jk}\, p_j$$

where $R(t_i)$ is the risk set at $t_i$ and $p_j$ is individual $j$’s likelihood of dying at $t_i$.

Plot Schoenfeld residuals against time to evaluate the PH
assumption
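
The residual described above can be sketched for a single covariate as follows. This is a minimal illustration, not the computation any particular package performs: it assumes no tied event times, takes the fitted coefficient `beta` as given, and uses the usual Cox-model weight $p_j = e^{\beta x_j} / \sum_{l \in R(t_i)} e^{\beta x_l}$. The function name and the toy data are hypothetical.

```python
import math

def schoenfeld_residuals(times, events, x, beta):
    """Return (event time, residual) pairs for one covariate, assuming no ties."""
    residuals = []
    for i, (t_i, died) in enumerate(zip(times, events)):
        if not died:
            continue  # Schoenfeld residuals are not defined for censored individuals
        # Risk set: everyone still under observation at t_i.
        risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
        weights = [math.exp(beta * x[j]) for j in risk_set]
        total = sum(weights)
        # Expected covariate value: weighted average with p_j = weight_j / total.
        expected = sum(x[j] * w / total for j, w in zip(risk_set, weights))
        residuals.append((t_i, x[i] - expected))
    return residuals

# Toy data, made up for illustration
times = [1, 2, 3]
events = [1, 0, 1]
x = [1.0, 0.0, 0.0]
print(schoenfeld_residuals(times, events, x, beta=0.0))
```

With `beta=0.0` every member of the risk set gets equal weight, so the first residual is 1 minus the plain average of the covariate (about 0.667) and the last is 0 because the risk set contains only the person who died.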
Schoenfeld residuals
In SAS:
option on the output statement:
ressch=
Influence diagnostics

How would the result change if a
particular observation is removed from
the analysis?
Influence statistics
• Likelihood displacement (ld): measures the influence of removing one individual on the model as a whole. What’s the change in the likelihood when this individual is omitted?
• DFBETA: how much each coefficient will change with the removal of a single observation
  • A negative DFBETA indicates that the coefficient increases when the observation is removed
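
To make the two statistics concrete, the sketch below computes them by brute force for a one-covariate Cox model: refit without each observation, then compare. Several things here are assumptions for illustration only: the function names, the toy data, the use of bisection on the score function instead of Newton-Raphson, and the exact refitting itself (statistical packages typically report approximations instead of refitting). DFBETA is taken as the full-data estimate minus the leave-one-out estimate, which matches the sign convention above.

```python
import math

def _risk_sums(times, x, beta, t_i):
    """Sums over the risk set at t_i: total weight and weighted covariate sum."""
    s0 = s1 = 0.0
    for t_j, x_j in zip(times, x):
        if t_j >= t_i:
            w = math.exp(beta * x_j)
            s0 += w
            s1 += w * x_j
    return s0, s1

def log_partial_likelihood(times, events, x, beta):
    total = 0.0
    for t_i, died, x_i in zip(times, events, x):
        if died:
            s0, _ = _risk_sums(times, x, beta, t_i)
            total += beta * x_i - math.log(s0)
    return total

def score(times, events, x, beta):
    """Derivative of the log partial likelihood with respect to beta."""
    total = 0.0
    for t_i, died, x_i in zip(times, events, x):
        if died:
            s0, s1 = _risk_sums(times, x, beta, t_i)
            total += x_i - s1 / s0
    return total

def fit_beta(times, events, x, lo=-20.0, hi=20.0):
    """Find the root of the score by bisection (the score is decreasing in beta)."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if score(times, events, x, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def influence(times, events, x):
    """Return the full-data estimate and a (dfbeta, ld) pair per observation."""
    beta_full = fit_beta(times, events, x)
    ll_full = log_partial_likelihood(times, events, x, beta_full)
    results = []
    for i in range(len(times)):
        keep = [j for j in range(len(times)) if j != i]
        beta_i = fit_beta([times[j] for j in keep],
                          [events[j] for j in keep],
                          [x[j] for j in keep])
        dfbeta = beta_full - beta_i
        # Both likelihoods are evaluated on the full data set.
        ld = 2.0 * (ll_full - log_partial_likelihood(times, events, x, beta_i))
        results.append((dfbeta, ld))
    return beta_full, results

# Toy data, made up for illustration
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 0, 1, 1, 0, 1, 1]
x = [0.2, 1.0, 0.1, 0.8, 0.0, 0.6, 1.2, 0.5]
beta_full, results = influence(times, events, x)
for i, (dfbeta, ld) in enumerate(results):
    print(i, round(dfbeta, 3), round(ld, 3))
```

In this toy data the fifth subject has the lowest covariate value yet dies fairly late, which pulls the coefficient down; removing that subject raises the estimate, so its DFBETA is negative.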
Influence statistics
In SAS:
option on the output statement:
ld= dfbeta=
What about repeated events?

Death (presumably) can only happen
once, but many outcomes could happen
twice…
• Fractures
• Heart attacks
• Pregnancy
• Etc.

Repeated events: Strategy 1


Strategy 1: run a second Cox
regression (among those who had a
first event) starting with the first event time
as the origin.
Repeat for third, fourth, fifth events,
etc.

Problem: sample sizes get smaller and smaller
with each successive event.
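
A minimal sketch of the data preparation this strategy implies, assuming each subject is stored as a dictionary with hypothetical keys `id`, `event_times` (sorted), and `followup_end`. Only subjects with a first event enter the second-event analysis, and their clock restarts at that first event.

```python
def second_event_dataset(subjects):
    """Build rows for a Cox regression on time from first event to second event."""
    rows = []
    for s in subjects:
        event_times = s["event_times"]
        if not event_times:
            continue  # never had a first event, so not at risk for a second
        origin = event_times[0]
        if len(event_times) >= 2:
            rows.append({"id": s["id"], "time": event_times[1] - origin, "event": 1})
        else:
            # No second event observed: censored at the end of follow-up.
            rows.append({"id": s["id"], "time": s["followup_end"] - origin, "event": 0})
    return rows

# Toy data, made up for illustration
subjects = [
    {"id": "A", "event_times": [2, 5, 9], "followup_end": 12},
    {"id": "B", "event_times": [4], "followup_end": 10},
    {"id": "C", "event_times": [], "followup_end": 8},
]
print(second_event_dataset(subjects))
```

Subject C drops out of the second-event data entirely, which is the shrinking-sample problem in miniature.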
Repeated events: Strategy 2

Treat each interval as a distinct
observation, such that someone who
had 3 events, for example, contributes 3
observations to the dataset.

Major problem: dependence among observations from the
same individual
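
One way to picture the restructured dataset is sketched below. It assumes each interval is measured as the gap since the previous event, and that any follow-up after the last event is kept as one extra censored interval; both choices, along with the function and field names, are illustrative assumptions.

```python
def expand_to_spells(subject_id, event_times, followup_end):
    """Turn one subject's event history into one row per interval (spell)."""
    rows = []
    start = 0
    for spell, t in enumerate(sorted(event_times), start=1):
        rows.append({"id": subject_id, "spell": spell, "gap_time": t - start, "event": 1})
        start = t
    if followup_end > start:
        # Time observed after the last event, with no further event.
        rows.append({"id": subject_id, "spell": len(rows) + 1,
                     "gap_time": followup_end - start, "event": 0})
    return rows

# A subject with 3 events whose follow-up ends at the third event gives 3 rows.
for row in expand_to_spells("A", [2, 5, 9], followup_end=9):
    print(row)
```

All rows for a subject share the same `id`, which is what makes the dependence visible: the rows are not independent draws.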
Repeated events: Strategy 3


Stratify by individual (“fixed effects partial
likelihood”)
In PROC PHREG: strata id;




Problems:
• Does not work well with RCT data
• Requires that most individuals have at least 2 events
• Can only estimate coefficients for those covariates that vary across successive spells for each individual; this excludes constant personal characteristics such as age, education, gender, ethnicity, genotype
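
The last limitation can be checked numerically. When each individual is a separate stratum, the risk sets contain only that individual's own spells, so a covariate that is constant within the individual cancels out of the partial likelihood. The sketch below, with hypothetical function and variable names and made-up spell data, evaluates one stratum's log partial likelihood at two coefficient values.

```python
import math

def stratum_log_pl(times, events, x, beta):
    """Log partial likelihood for one stratum (one individual's spells)."""
    total = 0.0
    for t_i, died, x_i in zip(times, events, x):
        if died:
            # Risk set: this individual's spells that last at least as long as t_i.
            denom = sum(math.exp(beta * x_j) for t_j, x_j in zip(times, x) if t_j >= t_i)
            total += beta * x_i - math.log(denom)
    return total

spell_lengths = [2.0, 3.0, 4.0]
events = [1, 1, 1]
constant_x = [1.0, 1.0, 1.0]  # e.g. a fixed personal characteristic
varying_x = [0.0, 1.0, 0.0]   # changes from spell to spell

# Same value at any beta: the data carry no information about this coefficient.
print(stratum_log_pl(spell_lengths, events, constant_x, 0.0),
      stratum_log_pl(spell_lengths, events, constant_x, 2.0))
# Different values: this coefficient can be estimated.
print(stratum_log_pl(spell_lengths, events, varying_x, 0.0),
      stratum_log_pl(spell_lengths, events, varying_x, 2.0))
```

The same arithmetic shows why an individual with a single spell contributes nothing: the only risk set has one member, and that term is 0 for every coefficient value.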