Statistics 262: Intermediate Biostatistics
May 18, 2004: Cox Regression III: residuals and
diagnostics, repeated events
Jonathan Taylor and Kristin Cobb
Statistics 262
Residuals
Residuals are used to investigate the
lack of fit of a model to a given subject.
For Cox regression, there is no simple
analog to the usual “observed minus
predicted” residual of linear regression.
Deviance Residuals
Deviance residuals are based on
martingale residuals: c_i (1 if event, 0 if
censored) minus the estimated
cumulative hazard to t_i (as a function of the
fitted model) for individual i:
$\hat{M}_i = c_i - \hat{H}(t_i, X_i, \hat{\beta})$
See Hosmer and Lemeshow for more
discussion…
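As a minimal sketch of how a deviance residual is obtained from a martingale residual, the Python below uses the standard transformation d_i = sign(M_i) * sqrt(-2 [M_i + c_i log(c_i - M_i)]). The function names and the example cumulative hazard values are hypothetical; in practice the estimated cumulative hazard would come from a fitted Cox model.

```python
import math

def martingale_residual(event, cum_hazard):
    """c_i minus the estimated cumulative hazard at t_i."""
    return event - cum_hazard

def deviance_residual(event, cum_hazard):
    """Rescale the martingale residual so it is more symmetric around 0.

    Assumes cum_hazard > 0 whenever event == 1.
    """
    m = martingale_residual(event, cum_hazard)
    # c_i * log(c_i - M_i); since c_i - M_i equals the cumulative hazard,
    # this term is log(cum_hazard) for events and 0 for censored subjects.
    log_term = math.log(cum_hazard) if event == 1 else 0.0
    # max() guards against a tiny negative value from rounding
    magnitude = math.sqrt(max(0.0, -2.0 * (m + log_term)))
    return math.copysign(magnitude, m)

# Hypothetical values: an early death (low cumulative hazard) and a
# censored subject who survived longer than the model expected.
early_death = deviance_residual(1, 0.2)
long_survivor = deviance_residual(0, 0.5)
```

The early death gets a positive residual and the long-surviving censored subject a negative one, matching the sign convention described for deviance residuals.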
Deviance Residuals
Behave like residuals from ordinary linear
regression
Should be symmetrically distributed around 0
and have standard deviation of 1.0.
Negative for observations with longer than
expected observed survival times.
Plot deviance residuals against covariates to
look for unusual patterns.
Deviance Residuals
In SAS, option on the output statement:
output out=outdata resdev=
Schoenfeld residuals
Schoenfeld (1982) proposed the first set of residuals
for use with Cox regression packages:
Schoenfeld D. Partial residuals for the proportional hazards
regression model. Biometrika, 1982, 69(1):239-241.
Instead of a single residual for each individual, there
is a separate residual for each individual for each
covariate.
Based on the individual contributions to the derivative
of the log partial likelihood (see chapter 6 in Hosmer
and Lemeshow for more math details, p.198-199).
Note: Schoenfeld residuals are not defined for
censored individuals.
Schoenfeld residuals
Where k is the covariate of interest,
the Schoenfeld residual is the covariate value, x_ik, for
the person (i) who actually died at time t_i minus the
expected value of the covariate for the risk set at t_i
(= a weighted average of the covariate, weighted by
each individual’s likelihood of dying at t_i).
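$$\text{residual} = x_{ik} - \sum_{j \in R(t_i)} x_{jk}\, p_j$$
where R(t_i) is the risk set at t_i and p_j is individual j’s likelihood of dying at t_i.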
Plot Schoenfeld residuals against time to evaluate PH
assumption
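The definition above can be sketched in Python for a single covariate. This is an illustration under stated assumptions, not the SAS implementation: it assumes no tied event times, takes the coefficient `beta` as given, and uses the usual Cox weights exp(beta * x_j) normalized over the risk set as each individual's likelihood of dying. The function name and toy data are hypothetical.

```python
import math

def schoenfeld_residuals(times, events, x, beta):
    """Return (event time, residual) pairs for one covariate.

    Residual = covariate of the subject who died at t_i minus the
    weighted average of the covariate over the risk set at t_i.
    Censored subjects get no residual.
    """
    out = []
    for i, (t_i, c_i) in enumerate(zip(times, events)):
        if c_i != 1:
            continue  # not defined for censored individuals
        risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
        weights = [math.exp(beta * x[j]) for j in risk_set]
        expected = sum(x[j] * w for j, w in zip(risk_set, weights)) / sum(weights)
        out.append((t_i, x[i] - expected))
    return out

# Toy data (hypothetical): three subjects, the second is censored.
res = schoenfeld_residuals(times=[1, 2, 3], events=[1, 0, 1],
                           x=[1.0, 0.0, 0.0], beta=0.0)
```

Plotting the second element of each pair against the first gives the residual-versus-time plot used to check the PH assumption.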
Schoenfeld residuals
In SAS:
option on the output statement:
ressch=
Influence diagnostics
How would the result change if a
particular observation were removed from
the analysis?
Influence statistics
• Likelihood displacement (ld): measures the
influence of removing one individual on the
model as a whole. What’s the change in the
likelihood when this individual is omitted?
• DFBETA: how much each coefficient will
change by removal of a single observation
  • A negative DFBETA indicates the coefficient increases
  when the observation is removed
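To make the two statistics concrete, here is a rough Python sketch that computes them by brute force: refit a one-covariate Cox model with each subject left out. Statistical packages may use faster approximations rather than exact refits, so treat this as an illustration of the idea only. It assumes no tied event times; all function names and the toy data are hypothetical. DFBETA is taken as the full-data estimate minus the leave-one-out estimate, which matches the sign convention stated above.

```python
import math

def risk_sums(times, x, beta, t_i):
    """Sums of w, x*w, x^2*w over the risk set at t_i, with w = exp(beta*x)."""
    s0 = s1 = s2 = 0.0
    for t_j, x_j in zip(times, x):
        if t_j >= t_i:
            w = math.exp(beta * x_j)
            s0 += w
            s1 += x_j * w
            s2 += x_j * x_j * w
    return s0, s1, s2

def log_partial_likelihood(times, events, x, beta):
    total = 0.0
    for t_i, c_i, x_i in zip(times, events, x):
        if c_i == 1:
            s0, _, _ = risk_sums(times, x, beta, t_i)
            total += beta * x_i - math.log(s0)
    return total

def fit_beta(times, events, x, max_iter=100, tol=1e-10):
    """Newton-Raphson on the partial likelihood for a single covariate."""
    beta = 0.0
    for _ in range(max_iter):
        score = info = 0.0
        for t_i, c_i, x_i in zip(times, events, x):
            if c_i == 1:
                s0, s1, s2 = risk_sums(times, x, beta, t_i)
                mean = s1 / s0
                score += x_i - mean
                info += s2 / s0 - mean * mean
        if info <= 0.0:
            break  # no covariate variation left in any risk set
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

def influence(times, events, x):
    """Exact leave-one-out DFBETA and likelihood displacement per subject."""
    full = fit_beta(times, events, x)
    ll_full = log_partial_likelihood(times, events, x, full)
    dfbeta, ld = [], []
    for i in range(len(times)):
        keep = [j for j in range(len(times)) if j != i]
        b = fit_beta([times[j] for j in keep],
                     [events[j] for j in keep],
                     [x[j] for j in keep])
        dfbeta.append(full - b)
        # drop in the full-data log partial likelihood at the reduced estimate
        ld.append(2.0 * (ll_full - log_partial_likelihood(times, events, x, b)))
    return full, dfbeta, ld

# Toy data (hypothetical): eight subjects, all events, binary covariate.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 1]
x = [1, 0, 1, 0, 0, 1, 1, 0]
beta_hat, dfbeta, ld = influence(times, events, x)
```

Subjects with large absolute DFBETA or large likelihood displacement are the ones worth inspecting individually.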
Influence statistics
In SAS:
option on the output statement:
ld= dfbeta=
What about repeated events?
Death (presumably) can only happen
once, but many outcomes could happen
twice…
Fractures
Heart attacks
Pregnancy
Etc…
Repeated events: Strategy 1
Strategy 1: run a second Cox
regression (among those who had a
first event) starting with first event time
as the origin
Repeat for third, fourth, fifth, events,
etc.
Problem: sample sizes get smaller with each
successive event.
Repeated events: Strategy 2
Treat each interval as a distinct
observation, such that someone who
had 3 events, for example, gives 3
observations to the dataset
Major problem: dependence among observations
from the same individual
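A small sketch of the data layout this strategy implies: each subject's follow-up is split into one row per interval. The function name and row layout are hypothetical, and the trailing censored interval (follow-up after the last event) is an assumption of this sketch rather than something the strategy above spells out.

```python
def to_intervals(subject_id, event_times, end_of_followup):
    """Split one subject's follow-up into (id, start, stop, event) rows."""
    rows = []
    start = 0.0
    for t in sorted(event_times):
        rows.append((subject_id, start, t, 1))  # interval ending in an event
        start = t
    if end_of_followup > start:
        # remaining time at risk after the last event, censored
        rows.append((subject_id, start, end_of_followup, 0))
    return rows

# Someone with 3 events, followed until the third event, gives 3 rows.
rows_a = to_intervals("A", [2.0, 5.0, 9.0], 9.0)
# Someone with no events gives a single censored row.
rows_b = to_intervals("B", [], 4.0)
```

Because every row for subject "A" shares the same individual, the rows are not independent, which is the dependence problem noted above.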
Repeated events: Strategy 3
Stratify by individual (“fixed effects partial
likelihood”)
In PROC PHREG: strata id;
Problems:
Does not work well with RCT data
Requires that most individuals have at least 2
events
Can only estimate coefficients for those covariates
that vary across successive spells for each
individual; this excludes constant personal
characteristics such as age, education, gender,
ethnicity, genotype
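The last limitation can be checked numerically. In the sketch below, each individual is a stratum and the risk set at an event contains only that individual's own spells of at least that length. With a covariate that is constant within each person, every event's contribution to the score (the derivative of the log partial likelihood) is zero whatever the coefficient, so the data carry no information about it. The function name, the gap-time layout, and the toy rows are hypothetical, and ties are assumed absent.

```python
import math

def stratified_score(rows, beta):
    """Score of the partial log-likelihood stratified by individual.

    rows: (subject_id, spell_length, event, x) for one covariate x.
    """
    by_id = {}
    for sid, t, c, x in rows:
        by_id.setdefault(sid, []).append((t, c, x))
    score = 0.0
    for spells in by_id.values():
        for t_i, c_i, x_i in spells:
            if c_i != 1:
                continue
            # risk set: this person's spells lasting at least t_i
            risk = [x_j for t_j, _, x_j in spells if t_j >= t_i]
            weights = [math.exp(beta * x_j) for x_j in risk]
            mean = sum(x_j * w for x_j, w in zip(risk, weights)) / sum(weights)
            score += x_i - mean
    return score

# Covariate constant within each person (e.g. genotype): score is always 0.
rows_const = [("A", 2.0, 1, 1.0), ("A", 3.0, 1, 1.0), ("A", 1.5, 0, 1.0),
              ("B", 1.0, 1, 0.0), ("B", 4.0, 1, 0.0)]
# Covariate that changes between spells: score is informative.
rows_vary = [("A", 2.0, 1, 1.0), ("A", 3.0, 1, 0.0),
             ("B", 1.0, 1, 1.0), ("B", 4.0, 1, 0.0)]
const_scores = [stratified_score(rows_const, b) for b in (0.0, 1.3)]
vary_score = stratified_score(rows_vary, 0.0)
```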