Statistics 262: Intermediate Biostatistics

May 11, 2004: Cox Regression II: tied data, PH
assumption, time-dependent covariates
Jonathan Taylor and Kristin Cobb
Recall: Partial Likelihood
$$L_p(\beta) = \prod_{i=1}^{m} \left( \frac{e^{\beta x_i}}{\sum_{j \in R(t_i)} e^{\beta x_j}} \right)^{\delta_i}$$
where $\delta_i$ is the censoring variable (1 if event, 0 if censored) and $R(t_i)$ is the risk set at time $t_i$.
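As a rough illustration of the formula above, the sketch below evaluates the partial likelihood for a single covariate when there are no tied event times. The function name `partial_likelihood` and the three-subject toy data are assumptions made for this example; they are not from the course data.

```python
import math

def partial_likelihood(beta, times, events, x):
    """Cox partial likelihood for one covariate, assuming no tied event times."""
    lik = 1.0
    for i in range(len(times)):
        if events[i] != 1:
            continue  # censored subjects contribute no factor (delta_i = 0)
        # Risk set R(t_i): everyone still under observation at time t_i.
        denom = sum(math.exp(beta * x[j])
                    for j in range(len(times)) if times[j] >= times[i])
        lik *= math.exp(beta * x[i]) / denom
    return lik

# Hypothetical toy data: three subjects, the second one is censored.
times = [2, 3, 5]
events = [1, 0, 1]
x = [1, 0, 0]
print(partial_likelihood(0.0, times, events, x))
```

With beta = 0 every subject has the same weight, so each event's factor is simply 1 divided by the size of its risk set.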
Ties


The partial likelihood (PL) assumes no tied values among
the observed survival times.
This is often not the case with real data.
Ties




Exact method (time is continuous; ties are a result
of imprecise measurement of time)
Breslow approximation (SAS default)
Efron approximation
Discrete method (treats time as discrete; ties are
real)
In SAS:
option on the model statement:
ties=exact/efron/breslow/discrete
Ties: Exact method
Assumes ties result from imprecise measurement of time.

Assumes there is a true unknown order of events in time.

Mathematically, the exact method computes the exact probability of the
observed data by summing over all possible orderings of the tied
events.

For example, in the hmohiv data, there were 15 events at
time=1 month. (We can assume that the patients did not all die
at precisely the same moment, but that time is measured
imprecisely.) IDs = 13, 16, 28, 32, 52, 54, 69, 72, 78, 79, 82,
83, 93, 96, 100

With 15 events, there are 15! (about 1.3 x 10^12) different orderings.
Instead of 15 terms in the partial likelihood for 15 events, we get 1
term that equals:
$$L = \sum_{i=1}^{15!} P(O_i)$$
where $O_i$ is the ith possible ordering; for example, here, the 15!th ordering is:
100, 96, 93, 83, 82, 79, 78, 72, 69, 54, 52, 32, 28, 16, 13

Exact, continued
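$$L = \sum_{i=1}^{15!} P(O_i)$$
e.g.:
$$P(O_{15!}) = \left( \frac{e^{\beta x_{100}}}{e^{\beta x_1} + e^{\beta x_2} + \dots + e^{\beta x_{100}}} \right) \left( \frac{e^{\beta x_{96}}}{e^{\beta x_1} + e^{\beta x_2} + \dots + e^{\beta x_{99}}} \right) \left( \frac{e^{\beta x_{93}}}{e^{\beta x_1} + e^{\beta x_2} + \dots + e^{\beta x_{95}} + e^{\beta x_{97}} + e^{\beta x_{98}} + e^{\beta x_{99}}} \right) \cdots$$
Each P(O_i) has 15 terms; sum the 15! P(O_i)'s…
Hugely complex computation!…so need approximations…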
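To make the sum over orderings concrete, here is a minimal brute-force sketch for a small number of tied events. The function name `exact_tied_term` and its arguments are assumptions for illustration only; this is not how SAS computes the exact method, and it is only feasible for a handful of ties.

```python
import math
from itertools import permutations

def exact_tied_term(beta, x_tied, x_rest):
    """Sum P(O_i) over every ordering O_i of the tied events.

    x_tied: covariate values of the subjects with an event at this time.
    x_rest: covariate values of everyone else still at risk.
    """
    w_tied = [math.exp(beta * v) for v in x_tied]
    w_rest = sum(math.exp(beta * v) for v in x_rest)
    total = 0.0
    for order in permutations(range(len(w_tied))):
        denom = sum(w_tied) + w_rest  # full risk set before the first event
        p = 1.0
        for k in order:
            p *= w_tied[k] / denom
            denom -= w_tied[k]        # subject k leaves the risk set
        total += p
    return total

# Hypothetical example: 2 tied events among 4 subjects at risk.
print(exact_tied_term(0.0, [1, 0], [1, 0]))
# Number of orderings that would be needed for 15 tied events:
print(math.factorial(15))
```

The loop visits d! orderings for d tied events, which is why 15 ties are already out of reach for this direct approach.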
Breslow and Efron methods
Breslow (1974)
Efron (1977)
Both are approximations to the exact method.
Both have much faster calculation times.
Breslow is the SAS default.
Breslow does not do well when the number of
ties at a particular time point is a large
proportion of the number of cases at risk.
Prefer Efron to Breslow.
We'll see how to implement these in SAS today and
compare methods.
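For comparison, the two approximations can be sketched for a single tied time point as follows. The formulas are the usual textbook forms of the Breslow and Efron terms (the slides do not write them out), and the function names are assumptions. Both are written without the constant factor d! that appears when summing over orderings; that factor does not involve beta.

```python
import math

def breslow_term(beta, x_tied, x_rest):
    """Breslow: every tied event uses the full risk-set sum as its denominator."""
    w_tied = [math.exp(beta * v) for v in x_tied]
    risk = sum(w_tied) + sum(math.exp(beta * v) for v in x_rest)
    return math.prod(w_tied) / risk ** len(w_tied)

def efron_term(beta, x_tied, x_rest):
    """Efron: the l-th denominator removes a fraction l/d of the tied subjects' weight."""
    w_tied = [math.exp(beta * v) for v in x_tied]
    d = len(w_tied)
    tied_sum = sum(w_tied)
    risk = tied_sum + sum(math.exp(beta * v) for v in x_rest)
    denom = 1.0
    for l in range(d):
        denom *= risk - (l / d) * tied_sum
    return math.prod(w_tied) / denom

# Hypothetical example: 2 tied events among 4 subjects at risk.
print(breslow_term(0.0, [1, 0], [1, 0]))
print(efron_term(0.0, [1, 0], [1, 0]))
```

When there is only one event at a time point, both functions reduce to the ordinary partial likelihood term; they differ more as the tied events make up a larger share of the risk set.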

Discrete method
Assumes time is truly discrete.
 When would time be discrete?
When events are only periodic, such as:
--Winning an Olympic medal (can only happen
every 4 years)
--Missing this class (can only happen on
Tuesdays at 10am)
--Voting for President (can only happen every 4
years)

Discrete method


Models proportional odds: coefficients represent odds
ratios, not hazard ratios.
For example, at time = 1 month in the hmohiv data, we
could ask the question: given that 15 events occurred,
what is the probability that they happened to this
particular set of 15 people out of the 98 at risk at 1
month?
Odds are a function of an individual's covariates.
$$L_1 = \frac{\prod_{i=1}^{15} \text{Odds}_i}{\prod_{j \in P(1)} \text{Odds}_j + \prod_{j \in P(2)} \text{Odds}_j + \dots}$$
The denominator sums over all possible sets $P(k)$ of 15 out of the 98 at risk!
A recursive algorithm makes it possible to calculate.
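One standard recursion for a denominator of this kind builds the sum over all size-d subsets one subject at a time. The sketch below is an assumption about the type of algorithm meant, not a description of what SAS actually runs; the function names are hypothetical, and `odds_at_risk` is taken to include the subjects who had events.

```python
import math
from itertools import combinations

def sum_over_subsets(odds, d):
    """Sum, over all subsets of size d, of the product of the odds in the subset."""
    # b[k] holds the sum over size-k subsets of the subjects processed so far.
    b = [1.0] + [0.0] * d
    for o in odds:
        # Update from k = d downward so each subject enters a subset at most once.
        for k in range(d, 0, -1):
            b[k] += o * b[k - 1]
    return b[d]

def discrete_term(odds_events, odds_at_risk):
    """Likelihood term: product of the event subjects' odds over the subset sum."""
    return math.prod(odds_events) / sum_over_subsets(odds_at_risk, len(odds_events))

# Hypothetical odds for 4 subjects at risk, 2 of whom had events.
odds = [1.0, 2.0, 3.0, 4.0]
brute_force = sum(a * b for a, b in combinations(odds, 2))
print(sum_over_subsets(odds, 2), brute_force)
print(discrete_term([1.0, 2.0], odds))
```

The recursion needs on the order of n times d multiplications, instead of enumerating every one of the possible subsets.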
Evaluation of Proportional
Hazards assumption:
Recall proportional hazards concept:
$$HR = \frac{h_i(t)}{h_j(t)} = \frac{\lambda_0(t) e^{\beta_1 x_{i1}}}{\lambda_0(t) e^{\beta_1 x_{j1}}} = e^{\beta_1 (x_{i1} - x_{j1})}$$
implies: $h_i(t) = HR \, h_j(t)$, where the hazard ratio HR is constant over time
Evaluation of Proportional
Hazards assumption:
$$h_i(t) = HR \, h_j(t)$$
$$S_j(t) = e^{-\int_0^t h_j(u)\,du} \quad \text{and} \quad S_i(t) = e^{-\int_0^t HR \, h_j(u)\,du}$$
$$\Rightarrow S_i(t) = e^{HR \left( -\int_0^t h_j(u)\,du \right)} = \left( e^{-\int_0^t h_j(u)\,du} \right)^{HR} \Rightarrow S_i(t) = S_j(t)^{HR}$$
Take log of both sides:
$$\log S_i(t) = \log S_j(t)^{HR} \Rightarrow \log S_i(t) = HR \log S_j(t)$$
Multiply both sides by a negative and take logs again:
$$\log(-\log S_i(t)) = \log(-HR \log S_j(t))$$
$$\log(-\log S_i(t)) = \log HR + \log(-\log S_j(t))$$
$$Y(t) = K + X(t)$$
i.e., log(-log) survival curves are parallel, and differ by log(HR)
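The result can be checked numerically. The sketch below assumes a hypothetical Weibull-type survival curve for group j and a hypothetical HR of 2; neither comes from the course data. It computes the vertical gap between the two log(-log) curves at several times.

```python
import math

HR = 2.0  # hypothetical hazard ratio

def surv_j(t):
    # Hypothetical reference group: S_j(t) = exp(-(0.1 t)^1.5)
    return math.exp(-(0.1 * t) ** 1.5)

def surv_i(t):
    # Under proportional hazards, S_i(t) = S_j(t)^HR
    return surv_j(t) ** HR

def loglog(s):
    return math.log(-math.log(s))

for t in [1, 5, 10, 20]:
    gap = loglog(surv_i(t)) - loglog(surv_j(t))
    print(t, gap, math.log(HR))
```

The gap should match log(HR) at every time point up to floating-point error, which is the "parallel curves" pattern to look for in a log(-log) plot.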
Evaluation of Proportional
Hazards assumption:
e.g., graph we’ll produce in lab today…
Cox models with Non-Proportional Hazards
Violation of the PH assumption for a given
covariate is equivalent to that covariate having a
significant interaction with time.
$$\log h(t) = \log \lambda_0(t) + \beta_x x + \beta_{xt} x t$$
$$\log h(t) = \log \lambda_0(t) + (\beta_x + \beta_{xt} t) x$$
Here $\beta_{xt}$ is the time-interaction coefficient and $xt$ is the covariate multiplied by time.
If the interaction coefficient is significant, this indicates non-proportionality, and at the same time its
inclusion in the model corrects for non-proportionality!
A negative value indicates that the effect of x decreases linearly with time.
A positive value indicates that the effect of x increases linearly with time.
This introduces the concept of a time-dependent covariate…
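Under this model the hazard ratio for a one-unit increase in x is exp(beta_x + beta_xt * t), so it changes with time. A small sketch, with purely hypothetical coefficient values chosen for illustration:

```python
import math

def hazard_ratio(t, beta_x, beta_xt):
    """HR for a one-unit increase in x at time t when log h(t) includes (beta_x + beta_xt * t) * x."""
    return math.exp(beta_x + beta_xt * t)

# Hypothetical coefficients: a harmful effect that fades over time.
beta_x, beta_xt = 0.8, -0.05
for t in [0, 6, 12, 24]:
    print(t, hazard_ratio(t, beta_x, beta_xt))
```

With a negative interaction coefficient the log hazard ratio falls linearly in t; in this made-up example it crosses 0 (HR = 1) at t = 16.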
Time-dependent covariates





Covariate values for an individual may change over time
For example, if you are evaluating the effect of taking the drug
raloxifene on breast cancer risk in an observational study, women
may start and stop the drug at will. Subject A may be taking
raloxifene at the time of the first event, but may have stopped taking
it by the time the 15th case of breast cancer happens.
If you are evaluating the effect of weight on diabetes risk over a long
study period, subjects may gain and lose large amounts of weight,
making their baseline weight a less than ideal predictor.
If you are evaluating the effects of smoking on the risk of pancreatic
cancer, study participants may change their smoking habits
throughout the study.
Cox regression can handle these time-dependent covariates!
Time-dependent covariates


For example, evaluating the effect of taking
oral contraceptives (OCs) on stress fracture
risk in women athletes over two years: many
women switch on or off OCs.
If you just examine risk by a woman's OC status at baseline, you can't see much effect for
OCs. But you can incorporate times of
starting and stopping OCs.
Time-dependent covariates


Ways to look at OC use:
Not time-dependent
  Ever/never during the study
  Yes/no use at baseline
  Total months use during the study
Time-dependent
  Using OCs at event time t (yes/no)
  Months of OC use up to time t
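The two time-dependent definitions can be sketched as small functions of a start month and a stop month. This is a minimal sketch under assumptions not stated on the slides: each woman has at most one period of use, the period is treated as the closed interval from start to stop, and never-users are coded as `None`. The function names are hypothetical.

```python
def on_oc_at(t, start, stop):
    """1 if the woman is using OCs at time t, else 0."""
    if start is None:  # never used OCs during the study
        return 0
    return 1 if start <= t <= stop else 0

def months_on_oc_up_to(t, start, stop):
    """Cumulative months of OC use accrued by time t."""
    if start is None:
        return 0
    return max(0, min(t, stop) - start)

# Hypothetical woman who starts OCs at month 1 and stops at month 7.
print(on_oc_at(6, 1, 7), on_oc_at(12, 1, 7))
print(months_on_oc_up_to(6, 1, 7), months_on_oc_up_to(12, 1, 7))
```

Both functions are evaluated at each event time t, so the same woman can contribute different covariate values to different risk sets.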
Time-dependent covariates
ID   Time   Fracture   StartOC   StopOC
1    12     1          0         12
2    11     0          10        11
3    20     1          .         .
4    24     0          0         24
5    19     0          0         11
6    6      1          .         .
7    17     1          1         7
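The PL using baseline value of
OC use
$$L_p(\beta_{oc}) = \frac{e^{\beta(0)}}{3e^{\beta(1)} + 4e^{\beta(0)}} \times \frac{e^{\beta(1)}}{3e^{\beta(1)} + 2e^{\beta(0)}} \times \frac{e^{\beta(0)}}{2e^{\beta(1)} + 2e^{\beta(0)}} \times \frac{e^{\beta(0)}}{e^{\beta(1)} + e^{\beta(0)}}$$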
A second time-independent option would be to
use the variable “ever took OCs” during the study
period
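The PL at t=6, with OC use as a time-dependent covariate
$$L_p(\beta_{oc}) = \frac{e^{\beta x_6(t=6)}}{e^{\beta x_1(6)} + e^{\beta x_2(6)} + e^{\beta x_3(6)} + e^{\beta x_4(6)} + e^{\beta x_5(6)} + e^{\beta x_6(6)} + e^{\beta x_7(6)}}$$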
The PL at t=6
$$L_p(\beta_{oc}) = \frac{e^{\beta x_6(t=6)}}{e^{\beta x_1(6)} + e^{\beta x_2(6)} + e^{\beta x_3(6)} + e^{\beta x_4(6)} + e^{\beta x_5(6)} + e^{\beta x_6(6)} + e^{\beta x_7(6)}} = \frac{e^{\beta(0)}}{3e^{\beta(0)} + 4e^{\beta(1)}}$$
The PL at t=12
$$L_p(\beta_{oc}) = \frac{e^{\beta(0)}}{3e^{\beta(0)} + 4e^{\beta(1)}} \times \frac{e^{\beta x_1(t=12)}}{e^{\beta x_1(12)} + e^{\beta x_3(12)} + e^{\beta x_4(12)} + e^{\beta x_5(12)} + e^{\beta x_7(12)}}$$
The PL at t=12
$$L_p(\beta_{oc}) = \frac{e^{\beta(0)}}{3e^{\beta(0)} + 4e^{\beta(1)}} \times \frac{e^{\beta(1)}}{2e^{\beta(1)} + 3e^{\beta(0)}}$$
The PL at t=17
$$L_p(\beta_{oc}) = \frac{e^{\beta(0)}}{3e^{\beta(0)} + 4e^{\beta(1)}} \times \frac{e^{\beta(1)}}{2e^{\beta(1)} + 3e^{\beta(0)}} \times \frac{e^{\beta(0)}}{e^{\beta(1)} + 3e^{\beta(0)}}$$
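The PL at t=20
$$L_p(\beta_{oc}) = \frac{e^{\beta(0)}}{3e^{\beta(0)} + 4e^{\beta(1)}} \times \frac{e^{\beta(1)}}{2e^{\beta(1)} + 3e^{\beta(0)}} \times \frac{e^{\beta(0)}}{e^{\beta(1)} + 3e^{\beta(0)}} \times \frac{e^{\beta(0)}}{e^{\beta(1)} + e^{\beta(0)}}$$
vs. PL for OC-status at baseline (from before):
$$L_p(\beta_{oc}) = \frac{e^{\beta(0)}}{3e^{\beta(1)} + 4e^{\beta(0)}} \times \frac{e^{\beta(1)}}{3e^{\beta(1)} + 2e^{\beta(0)}} \times \frac{e^{\beta(0)}}{2e^{\beta(1)} + 2e^{\beta(0)}} \times \frac{e^{\beta(0)}}{e^{\beta(1)} + e^{\beta(0)}}$$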
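The two partial likelihoods can be reproduced from the seven-subject table with a short sketch. The data are taken from the table; the helper names, the use of `None` for the "." entries, the closed-interval treatment of OC use (so subject 1 still counts as a user at month 12), and the value beta = 0.5 are assumptions for illustration.

```python
import math

# (ID, time, fracture, start_oc, stop_oc); None marks "." (never used OCs)
subjects = [
    (1, 12, 1, 0, 12),
    (2, 11, 0, 10, 11),
    (3, 20, 1, None, None),
    (4, 24, 0, 0, 24),
    (5, 19, 0, 0, 11),
    (6, 6, 1, None, None),
    (7, 17, 1, 1, 7),
]

def oc_at(t, start, stop):
    # Assumption: use is counted on the closed interval [start, stop].
    return 1 if start is not None and start <= t <= stop else 0

def time_dependent(s, t):
    return oc_at(t, s[3], s[4])   # OC status at the event time t

def baseline(s, t):
    return oc_at(0, s[3], s[4])   # OC status at month 0, ignoring t

def partial_likelihood(beta, covariate):
    """covariate(subject, t) returns the subject's x value used at event time t."""
    lik = 1.0
    for s in subjects:
        t, event = s[1], s[2]
        if not event:
            continue
        risk_set = [r for r in subjects if r[1] >= t]
        denom = sum(math.exp(beta * covariate(r, t)) for r in risk_set)
        lik *= math.exp(beta * covariate(s, t)) / denom
    return lik

beta = 0.5  # arbitrary illustrative value
print(partial_likelihood(beta, time_dependent))
print(partial_likelihood(beta, baseline))
```

The numerators and the risk sets are the same in both versions; only the covariate values plugged into each denominator change, which is where the two likelihoods diverge.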
Next week: Cox regression III



Diagnostics and influence statistics
Sample size calculations
Repeated events