Cox Regression with Replicate Measures on Error


Cox Model With Intermittent and Error-Prone Covariate Observation

Yury Gubman, PhD thesis in Statistics. Supervisors: Prof. David Zucker, Prof. Orly Manor

Introduction

Regression analysis of right-censored survival data commonly arises in many fields, especially in medical science. The most popular survival data regression model is the Cox (1972) proportional hazards model, in which the hazard function for an individual with covariate vector X is modeled as

$$\lambda(t \mid X) = \lambda_0(t) \exp(\beta^T X)$$
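As a small numerical illustration of the Cox model above, the sketch below evaluates the hazard at one time point and the hazard ratio between two covariate values. The function names and the numbers in the usage note are illustrative assumptions, not part of the thesis.

```python
import math

def cox_hazard(baseline_hazard, beta, x):
    """Hazard lambda_0(t) * exp(beta^T x) at one time point.

    baseline_hazard is the value of lambda_0(t); beta and x are
    equal-length sequences of coefficients and covariates.
    """
    linear_predictor = sum(b * xi for b, xi in zip(beta, x))
    return baseline_hazard * math.exp(linear_predictor)

def hazard_ratio(beta, x_a, x_b):
    """Ratio of hazards for covariates x_a versus x_b; lambda_0(t) cancels."""
    return math.exp(sum(b * (a - c) for b, a, c in zip(beta, x_a, x_b)))
```

For example, with a single covariate and β = -0.04, `hazard_ratio([-0.04], [70.0], [60.0])` equals exp(-0.4), so a covariate that is 10 units higher multiplies the hazard by roughly 0.67.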

Introduction

  In many cases, the covariate X is not measured exactly.

Instead of X, we observe a surrogate measure W, which is subject to error. Measurement error in the covariates has three main effects:

a) it causes bias in parameter estimation for statistical models;
b) it leads to a loss of power, sometimes profound, for detecting covariate effects;
c) it masks the features of the data, making graphical model analysis difficult.

In addition, although in theory the covariate is a continuous process in time, in practice measurements are taken only over a discrete grid of timepoints (every 6 months, once a year, etc.). This discrepancy may also lead to bias.

Introduction

The classical measurement error model for individual i and measurement j can be written as:

$$W_{ij} = X_i + \varepsilon_{ij}$$

where the measurement error term $\varepsilon_{ij}$ is independent of $X_i$, with zero mean given $X_i$. Independence across i is also assumed.

Let n be the number of individuals and J the number of observations (we assume that J is the same for all individuals), and suppose the measurements of the surrogate covariate are $W_{ij}$, $i = 1, \dots, n$, $j = 1, \dots, J$.
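To make the classical error model concrete, here is a minimal simulation sketch of replicate surrogate measurements, together with the pooled within-individual variance as an estimate of the error variance. Normal distributions, the function names, and all numeric values are assumptions made only for this illustration; the method itself does not assume a parametric error distribution.

```python
import random
import statistics

def simulate_replicates(n, n_rep, sd_x, sd_eps, seed=0):
    """Draw true covariates X_i and replicates W_ij = X_i + eps_ij."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, sd_x) for _ in range(n)]
    w = [[x_i + rng.gauss(0.0, sd_eps) for _ in range(n_rep)] for x_i in x]
    return x, w

def pooled_error_variance(w):
    """Average within-individual sample variance, an estimate of Var(eps)."""
    return statistics.fmean([statistics.variance(row) for row in w])

x, w = simulate_replicates(n=2000, n_rep=4, sd_x=10.0, sd_eps=2.0)
print(pooled_error_variance(w))  # should be close to sd_eps**2 = 4
```

Because each row of `w` shares the same true value, differences within a row reflect only the error term, which is why the within-row variance recovers the error variance.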

Introduction

   Tsiatis and Davidian (2004) have suggested a so-called joint model, where models for event time distribution and longitudinal data depend on a common set of latent random variables.

Andersen and Liestol (2003) proposed a simple approach based on the regression calibration idea. The bias due to observing the longitudinal covariate process over a discrete grid is handled by introducing additional variables to the standard Cox model, while measurement error is treated using an external procedure.

Most of the proposed approaches are limited to special cases and/or specific distributional assumptions.

Proposed Method

  We propose an approach that is of intermediate complexity relative to the simple approach of Andersen and Liestol and the joint modeling approach.

We assume an additive model for the measurement error. The error term $\varepsilon_{ij}$ is independent across i and j, and independent of all other random variables in the model. We do not assume a specific parametric distribution for $\varepsilon_{ij}$.

We assume a working parametric model F for the conditional distribution

$$W_i(t_2) \mid \{ W_i(t_1) = w,\; Y_i(t_2) = 1 \}, \quad t_2 > t_1.$$

Note that we do not assume that the data are actually distributed according to F, but rather that F is a close enough approximation to yield reasonable estimators of the Cox coefficients.

Proposed Method

Assume that the moment generating function (MGF) of F exists and is well-defined. To cover a variety of cases, flexible distributions may be used, such as the semi-nonparametric (SNP) distribution of Gallant and Nychka (1987).

Proposed Method

To start with, assume that $W_{ij} = X_i$ (no measurement error). Let $T_i$ be the event time, $\delta_i$ the event indicator, and

$$N_i(t) = I(T_i \le t,\; \delta_i = 1).$$

It follows that the hazard function may be represented as:

$$
\begin{aligned}
\lambda(t)\,dt &= E[\, dN(t) \mid \{W(u), u < t\},\; Y(t) = 1 \,] \\
&= E[\, E[\, dN(t) \mid \{W(u), u \le t\},\; Y(t) = 1 \,] \mid \{W(u), u < t\},\; Y(t) = 1 \,] \\
&= \lambda_0(t)\, E[\, e^{\beta W(t)} \mid \{W(u), u < t\},\; Y(t) = 1 \,]\,dt
\end{aligned}
$$

where Y(t) is the indicator of being at risk at time t.

Proposed Method

The above expression may be approximated by the moment generating function (MGF) of F. It follows that the approximated hazard is given by:

$$\lambda^*(t) = \lambda_0(t)\, E[\, e^{\beta W_i(t)} \mid \{ W_i(\tau(t)),\; Y_i(t) = 1 \} \,] = \lambda_0(t)\, \mathrm{MGF}_F(\beta;\, W_i(\tau(t)),\, \tau(t),\, t)$$

where $\tau(t)$ is the nearest timepoint before t at which $W_i$ is available ($W_i$ is observed only at the grid timepoints $\tau_1, \dots, \tau_J$).
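The role of the MGF can be illustrated with a deliberately simple working model. The sketch below assumes, purely for illustration, that the covariate at time t given its last observed value is normal, with a mean that drifts linearly and a variance that grows linearly with the time since the last observation. This normal model, the `drift` and `var_rate` parameters, and the function names are hypothetical stand-ins, not the SNP model used in the thesis. It compares the MGF-based relative risk with the naive last-value-carried-forward (LVCF) plug-in.

```python
import math

def normal_mgf(beta, mean, variance):
    """MGF of a normal distribution evaluated at beta."""
    return math.exp(beta * mean + 0.5 * beta ** 2 * variance)

def approx_relative_risk(beta, w_last, tau, t, drift, var_rate):
    """Approximate E[exp(beta * W(t)) | W(tau) = w_last, at risk at t].

    Hypothetical working model: W(t) given W(tau) is normal with mean
    w_last + drift * (t - tau) and variance var_rate * (t - tau).
    """
    gap = t - tau
    return normal_mgf(beta, w_last + drift * gap, var_rate * gap)

def lvcf_relative_risk(beta, w_last):
    """Naive last-value-carried-forward plug-in, exp(beta * w_last)."""
    return math.exp(beta * w_last)
```

When `t` equals `tau` the two quantities coincide. As the gap grows, the MGF-based value accounts for both the expected change in the covariate and its uncertainty, which the LVCF plug-in ignores.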

Proposed Method

The discrete grid problem is treated by introducing working models for each central moment $m_k$:

$$
\begin{aligned}
g_k(m_k(t_1, t_2, w)) ={}& \theta_{k1} + \theta_{k2} t_1 + \theta_{k3} t_1^2 + \theta_{k4}(t_2 - t_1) + \theta_{k5}(t_2 - t_1)^2 \\
&+ \theta_{k6} w + \theta_{k7} w^2 + \theta_{k8}(t_2 - t_1) w + \theta_{k9}\,\mathrm{Slope}_{hist}(t_1) \\
={}& \theta_k^T Z(t_1, t_2, w)
\end{aligned}
$$

In the above expression, $g_k$ is some function (chosen for numerical reasons), and $\mathrm{Slope}_{hist}$ is the slope of the historical data (before $t_1$).

The coefficients $\theta_k$ are estimated using all available data at the observed timepoints $\tau_q$, $\tau_p$ ($\tau_q < \tau_p$, conditioning on being at risk at $\tau_q$). Ordinary least squares (OLS) is applied.
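The working model above is linear in its nine coefficients, so fitting it reduces to ordinary least squares on a design vector Z(t1, t2, w). The following is a minimal pure-Python sketch under stated assumptions: the link g_k is taken to be the identity, the historical slope is supplied by the caller, and the helper names (`design_row`, `solve_linear`, `fit_ols`) are hypothetical. The normal equations are solved by Gaussian elimination, which is adequate for a small, well-scaled design but is not how a production fit would be done.

```python
def design_row(t1, t2, w, slope_hist):
    """Regressors Z(t1, t2, w) in the order theta_k1, ..., theta_k9."""
    d = t2 - t1
    return [1.0, t1, t1 ** 2, d, d ** 2, w, w ** 2, d * w, slope_hist]

def solve_linear(a, b):
    """Solve a square linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations (Z'Z) theta = Z'y."""
    p = len(rows[0])
    ztz = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    zty = [sum(r[i] * y_i for r, y_i in zip(rows, y)) for i in range(p)]
    return solve_linear(ztz, zty)
```

In use, each pair of observed timepoints with the individual still at risk would contribute one row, with the response being the relevant power of the later measurement (possibly transformed by g_k).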

Proposed Method

Using the estimated coefficients of the working model from the previous slide, compute the estimated moments:

$$\hat m_k(\tau_q, \tau_p, w) = g_k^{-1}\big(\hat\theta_k^T Z(\tau_q, \tau_p, w)\big), \quad w = W(\tau_q).$$

Given these estimated moments, solve the formulas for the moments of F at $(W_i(\tau(t)), \tau(t), t)$ for every t.

Given the above, the MGF is defined at every point, and the hazard can be calculated for every t > s.

The estimator for $\beta$ is obtained using the Cox partial likelihood, incorporating the proposed hazard function.

 

Proposed Method

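Note that:

$$E e^{\beta W_i(t)} = E e^{\beta [X_i(t) + \varepsilon_i(t)]} = E e^{\beta X_i(t)}\, E e^{\beta \varepsilon_i(t)} = \mathrm{MGF}_{X_i(t)}(\beta)\, \mathrm{MGF}_{\varepsilon_i(t)}(\beta)$$

The Cox partial likelihood is given in this case by:

$$
\begin{aligned}
L_{Cox} &= \prod_{i=1}^{n} \left[ \frac{\mathrm{MGF}_{W_i(T_i)}(\beta)}{\sum_{j=1}^{n} Y_j(T_i)\, \mathrm{MGF}_{W_j(T_i)}(\beta)} \right]^{\delta_i} \\
&= \prod_{i=1}^{n} \left[ \frac{\mathrm{MGF}_{X_i(T_i)}(\beta)\, \mathrm{MGF}_{\varepsilon(T_i)}(\beta)}{\sum_{j=1}^{n} Y_j(T_i)\, \mathrm{MGF}_{X_j(T_i)}(\beta)\, \mathrm{MGF}_{\varepsilon(T_i)}(\beta)} \right]^{\delta_i} \\
&= \prod_{i=1}^{n} \left[ \frac{\mathrm{MGF}_{X_i(T_i)}(\beta)}{\sum_{j=1}^{n} Y_j(T_i)\, \mathrm{MGF}_{X_j(T_i)}(\beta)} \right]^{\delta_i}
\end{aligned}
$$

The error MGF is the same for all individuals at a given time, so it cancels from the ratio.

The variance of $\hat\beta$ is estimated using the weighted bootstrap approach.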

Simulation Study

Data simulation is based on the setting of Andersen and Liestol (2003), patterned after a clinical trial studying the effect of prednisone treatment versus placebo on survival with liver cirrhosis (Christensen, 1986). The true data are simulated from the model:

$$X_i(t) = \mu_t + A_i + U_i(t)$$

where $\mu_t$ denotes a common trend, $A_i$ represents initial variation between individuals, and $U_i(t)$ is a stochastic process representing changes in the covariate over time.

Simulation Study

Following Andersen and Liestol (2003), we take $U_i(t)$ to be either a Brownian motion (BM) process or an Ornstein–Uhlenbeck (OU) process with correlation parameter 0.282. The common trend has an initial level of 72 and a decrease of 5 units per year.

In the paper, the Cox regression parameter is $\beta = -0.04$.

Sample size is 300, and 12 observations are available, every half a year (total trial length is 6 years). Failure times were simulated from a Weibull hazard. Each result was obtained based on 500 simulation runs.
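A covariate path of the kind described in this simulation setting can be sketched as follows, for the Brownian motion case only. The sketch reads "initial level of 72 and a decrease of 5 units per year" as a linear trend and places the 12 observations at 0, 0.5, ..., 5.5 years. Both readings, along with the scale parameters `sd_a`, `bm_sd`, `error_sd` and the use of normal errors, are assumptions for illustration and are not values taken from the thesis.

```python
import math
import random

def simulate_subject(rng, sd_a, bm_sd, error_sd, n_obs=12, step=0.5):
    """Simulate one covariate path on the observation grid.

    X(t) = 72 - 5 t + A + U(t), with U a Brownian motion started at 0,
    and W(t) = X(t) + eps(t). The scale inputs are hypothetical, and
    normal errors are used only for illustration.
    """
    a = rng.gauss(0.0, sd_a)
    u = 0.0
    times, x, w = [], [], []
    for k in range(n_obs):
        t = k * step
        if k > 0:
            # A Brownian increment over one grid step has variance bm_sd^2 * step.
            u += rng.gauss(0.0, bm_sd * math.sqrt(step))
        x_t = 72.0 - 5.0 * t + a + u
        times.append(t)
        x.append(x_t)
        w.append(x_t + rng.gauss(0.0, error_sd))
    return times, x, w
```

Calling `simulate_subject(random.Random(1), sd_a=20.0, bm_sd=10.0, error_sd=7.0)` returns the grid, the true path, and the error-prone path for one individual; repeating the call 300 times would give a sample of the size used in the study.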

Results

Estimation of β, W(t) is measured without error

In the table, "est" denotes the estimate $\hat\beta$ and "std" denotes std($\hat\beta$).

Process  λ0   β      Proposed (SNP)     Andersen-Liestol   LVCF (naive)       Proportion
                     est      std       est      std       est      std       of events
BM       2.3  -0.04  -0.0425  0.0080    -0.0078  0.0044    -0.0066  0.0035    0.58
BM       2.3  -0.06  -0.0627  0.0093    -0.0263  0.0067    -0.0235  0.0044    0.57
BM       2.3  -0.08  -0.0832  0.0116    -0.0422  0.0068    -0.0351  0.0047    0.56
BM       1    -0.04  -0.0419  0.0099    -0.0098  0.0053    -0.0136  0.0049    0.28
BM       1    -0.06  -0.0631  0.0126    -0.0291  0.0088    -0.0311  0.0065    0.27
BM       1    -0.08  -0.0840  0.0157    -0.0489  0.0092    -0.0449  0.0063    0.27
BM       0.2  -0.04  -0.0409  0.0144    -0.0280  0.0127    -0.0257  0.0101    0.09
BM       0.2  -0.06  -0.0621  0.0177    -0.0448  0.0186    -0.0422  0.0120    0.09
BM       0.2  -0.08  -0.0831  0.0211    -0.0620  0.0210    -0.0611  0.0136    0.08
OU       2.3  -0.04  -0.0398  0.0088    -0.0043  0.0051    -0.0029  0.0045    0.58
OU       2.3  -0.06  -0.0530  0.0105    -0.0220  0.0081    -0.0202  0.0061    0.56
OU       2.3  -0.08  -0.0628  0.0119    -0.0365  0.0082    -0.0306  0.0058    0.56
OU       1    -0.04  -0.0411  0.0104    -0.0160  0.0063    -0.0112  0.0056    0.34
OU       1    -0.06  -0.0569  0.0122    -0.0281  0.0107    -0.0292  0.0071    0.29
OU       1    -0.08  -0.0676  0.0134    -0.0442  0.0109    -0.0393  0.0075    0.28
OU       0.2  -0.04  -0.0423  0.0190    -0.0286  0.0157    -0.0248  0.0118    0.09
OU       0.2  -0.06  -0.0626  0.0208    -0.0430  0.0211    -0.0447  0.0147    0.09
OU       0.2  -0.08  -0.0776  0.0213    -0.0610  0.0216    -0.0564  0.0147    0.09

Results

Estimation of β, W(t) is measured with error, λ0 = 2.3

Proposed method with SNP approximation; σ² is the error variance, "est" denotes $\hat\beta$ and "std" denotes std($\hat\beta$).

Process  β      σ²    est      std     Proportion of events
BM       -0.04  50    -0.0401  0.0083  0.58
BM       -0.04  100   -0.0386  0.0088  0.58
BM       -0.06  50    -0.0579  0.0101  0.57
BM       -0.06  100   -0.0551  0.0109  0.57
BM       -0.08  50    -0.0743  0.0118  0.56
BM       -0.08  100   -0.0694  0.0125  0.56
OU       -0.04  50    -0.0371  0.0089  0.58
OU       -0.04  100   -0.0352  0.0093  0.58
OU       -0.06  50    -0.0490  0.0107  0.56
OU       -0.06  100   -0.0464  0.0114  0.56
OU       -0.08  50    -0.0575  0.0114  0.55
OU       -0.08  100   -0.0544  0.0127  0.55

Results

Variance estimation by Weighted Bootstrap

Proposed method with SNP approximation, β = -0.04, λ0 = 2.3. Here std_emp is the empirical standard deviation of $\hat\beta$, std_Boot is the weighted bootstrap estimate, and P_cover is the coverage proportion.

Process  σ²    est      std_emp  std_Boot  P_cover
BM       0     -0.0425  0.0080   0.0076    0.9480
BM       213   -0.0386  0.0128   0.0100    0.8840
OU       0     -0.0398  0.0088   0.0087    0.9350
OU       213   -0.0349  0.0108   0.0114    0.8720
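A coverage proportion of the kind reported above can be computed from per-run estimates and bootstrap standard errors. The sketch below assumes normal-approximation (Wald) intervals at the 95% level; whether the thesis used Wald or another type of bootstrap interval is not stated here, so this is an illustrative assumption, as are the function names.

```python
def wald_interval(estimate, std_error, z=1.96):
    """Normal-approximation confidence interval for one simulation run."""
    return estimate - z * std_error, estimate + z * std_error

def coverage(estimates, std_errors, true_value, z=1.96):
    """Fraction of runs whose interval contains the true parameter."""
    hits = 0
    for est, se in zip(estimates, std_errors):
        lower, upper = wald_interval(est, se, z)
        hits += lower <= true_value <= upper
    return hits / len(estimates)
```

With 500 simulation runs, `coverage` would be called with 500 estimates and 500 bootstrap standard errors, and values near 0.95 indicate that the bootstrap standard error is well calibrated.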

Conclusions

 We propose a new semiparametric estimation approach for the Cox regression model when covariates are measured intermittently and with error. The intermittent measurement issue is handled by modeling the parameters of the distribution of the covariate among individuals still at risk as a function of time. The relative risk is then computed using the MGF.

The accuracy of the proposed estimators depends critically on the form of the OLS working model for times between observations. Increasing the accuracy of the estimates by assuming a more flexible interpolation model is a topic for future research.

Conclusions

In a simulation study we found that in most cases the proposed method provides reasonable estimates of the Cox regression parameter and its standard deviation.

 Because the SNP model covers a range of distributional shapes, the method can be applied in a range of settings.

The computational burden is moderate: less than one minute per run for the SNP-based procedure.