Transcript Slide 1
Empirical Likelihood
Mai Zhou
Department of Statistics, University of Kentucky

• A new (2001) book by A. Owen, “Empirical Likelihood”.
• But the Cox model with likelihood ratio output has existed for a long time: SAS proc phreg and the Splus/R function coxph( ) both compute it.

Claim: The (partial) likelihood ratio statistic for the regression coefficients in the Cox model can be interpreted as a case of the Empirical Likelihood Ratio. (Pan 1997)

Empirical Likelihood allows the statistician to employ likelihood methods, without having to pick a parametric family of distributions for the data. (Owen)

Empirical Likelihood allows for testing hypotheses and constructing confidence regions without a variance estimator.
• The advantage is most visible
  • when sample sizes are small to medium
  • when the parameter(s) is/are near the boundary

• For n independent observations $x_1, x_2, \ldots, x_n$ from $F(t)$, the empirical likelihood is
  $EL(F) = \prod_{i=1}^{n} \Delta F(x_i)$, where $\Delta F(x_i) = P_F(X = x_i)$.

Censored observations
• For a right censored observation $x_i$, the likelihood contribution is $1 - F(x_i)$.
• For a left censored observation, the contribution is $F(x_i)$.
• Interval censored: $F(U_i) - F(L_i)$.

Truncated observations
• For a left truncated observation (often referred to as delayed entry): (entry time, survival time) = $(y_i, x_i)$.
• The likelihood contribution is $\dfrac{\Delta F(x_i)}{1 - F(y_i)}$.
• If the survival time is right censored, then the contribution is $\dfrac{1 - F(x_i)}{1 - F(y_i)}$.

Maximize the log empirical likelihood with/without the mean fixed at a given value (or the median, or a hazard, or …).
• -2 [max log EL(mean fixed) - max log EL(not fixed)] has an approximate chi-square distribution if the mean is fixed at the correct value, i.e. under the null hypothesis.
• The proofs are rather involved for censored data, and the maximizer is difficult to describe.
• The actual computation is easier: iteration.

• Idea of proof: construct distributions $F(t \mid \lambda, h)$
• such that $F(t \mid 0, h)$ = the Kaplan-Meier estimator,
• where $\lambda$ is a 1-dim parameter and $h(\cdot)$ is a function.
• It is easier to find the maximum for this family of distributions, and easier to work out the asymptotics (with $h(\cdot)$ fixed).
• We then maximize over all possible $h(\cdot)$.

• Quantity to be fixed:
  1. $\int g(t)\, dF(t) = \mu$
  2. $\int g(t)\, d\Lambda(t) = \theta$
  where $g(t)$ and $\mu$ or $\theta$ are given.

• Once we have proved the chi-square limiting distribution for the -2 log likelihood ratio (Wilks' theorem), the implementation is conceptually simple: find the maximums. This leaves the dirty work, the search for the maximum, to the computer.
• This feature is similar to the bootstrap method.

Software
• R is “Gnu S”, or free Splus.
  http://www.cran-us.org
  http://www.r-project.org
• Many additional packages are available for R.
• There is a package called emplik, which mostly does hypothesis testing using the empirical likelihood ratio with censored or truncated data.
    library(emplik)
    library(help=emplik)
    el.cen.EM(x, d, fun=gfun, mu=0.5)

Paired comparison, log(times)
(+ marks a right censored value, - a left censored one)

    Y1      Y2      d=Y1-Y2
    2.73    2.98+   -0.25-
    2.80    2.98+   -0.18-
    2.01    2.84    -0.83
    2.19    2.76    -0.57
    2.34    2.83    -0.49
    ……………………………………
    2.97    2.64     0.33
    2.74    2.31     0.43
    2.96    2.51     0.45
    2.98+   2.68     0.30+

Test $H_0$: median of (Y1-Y2) = 0
• The largest loglik is -41.19336
• The loglik at median = 0 is -41.43003
• The chi-sq statistic is 2 × (-41.19336 + 41.43003) = 0.47334
• The P-value is 0.5085
• 95% confidence interval = [-0.57, 0.33]

Improving estimation/testing in the Cox proportional hazards model
• Make use of additional information on the baseline hazard.
    library(coxEL)
    coxphEL(Surv(time, status)~x, gfun=myfun, lam=0.2)
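
The likelihood ratio recipe described earlier (maximize the log empirical likelihood with and without the mean fixed, then refer -2 times the difference to a chi-square distribution) can be sketched for the simplest case of uncensored data. This is a minimal base-R illustration, not the emplik implementation. The helper name el_mean_test is hypothetical, the data vector is made up, and the Lagrange multiplier form of the constrained weights is the standard construction for uncensored data rather than anything stated on the slides.

```r
# Empirical likelihood ratio test for a mean, uncensored data only.
# el_mean_test is a hypothetical helper name, not a function from emplik.
el_mean_test <- function(x, mu) {
  z <- x - mu
  n <- length(z)
  # mu must lie strictly inside the range of the data; otherwise no set of
  # positive weights on the observations can have mean mu.
  if (min(z) >= 0 || max(z) <= 0) stop("mu is outside the range of x")

  # With the mean fixed at mu, the maximizing weights are
  #   p_i = 1 / (n * (1 + lambda * z_i)),
  # where lambda solves sum(z_i / (1 + lambda * z_i)) = 0.
  score <- function(lambda) sum(z / (1 + lambda * z))

  # All weights stay positive only for lambda in (-1/max(z), -1/min(z));
  # shrink the interval slightly so the endpoints are finite.
  lo <- (-1 / max(z)) * (1 - 1e-8)
  hi <- (-1 / min(z)) * (1 - 1e-8)
  lambda <- uniroot(score, lower = lo, upper = hi, tol = 1e-12)$root

  p <- 1 / (n * (1 + lambda * z))        # weights with the mean fixed
  # Without the constraint the maximum is at p_i = 1/n, so
  # -2 [max log EL(mean fixed) - max log EL(not fixed)] simplifies to:
  stat <- 2 * sum(log(1 + lambda * z))

  list(lambda = lambda, prob = p, stat = stat,
       pval = pchisq(stat, df = 1, lower.tail = FALSE))
}

x <- c(1.2, 2.5, 3.1, 4.8, 5.0, 6.3)     # made-up data for illustration
res <- el_mean_test(x, mu = 3)
res$stat   # -2 log empirical likelihood ratio
res$pval   # upper-tail chi-square(1) probability
```

The last step uses the upper tail of the chi-square distribution. For the paired-comparison statistic 0.47334 reported above, pchisq(0.47334, df = 1, lower.tail = FALSE) evaluates to about 0.4915, while pchisq(0.47334, df = 1) gives about 0.5085, which matches the P-value quoted in that example.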