
Empirical Likelihood
Mai Zhou
Department of Statistics
University of Kentucky
• A new (2001) book by A. Owen,
“Empirical Likelihood”.
But the Cox model with likelihood ratio
output has existed for a long time. SAS proc
phreg and the Splus/R function coxph() both
compute it.
Claim: The (partial) likelihood ratio
statistic for the regression coefficients
in the Cox model can be interpreted as a
case of the Empirical Likelihood Ratio
(Pan 1997).
Empirical Likelihood allows the statistician
to employ likelihood methods, without
having to pick a parametric family of
distributions for the data. --- Owen
Empirical Likelihood allows for testing
hypotheses and constructing confidence
regions without a variance estimator.
• The advantage is most visible
• when sample sizes are small to medium
• when the parameter(s) is/are near the boundary
• For n independent observations $x_1, x_2, \ldots, x_n$
from a distribution $F(t)$, the empirical
likelihood is
• $EL(F) = \prod_{i=1}^{n} \Delta F(x_i)$
where $\Delta F(x_i) = P_F(X = x_i)$.
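As a small illustration of the definition, the following base R sketch evaluates the log empirical likelihood for two candidate distributions supported on the same uncensored data points. The helper name `log_el` and the data are made up for illustration; the point is only that putting mass 1/n on each observation gives the largest value.

```r
# Log empirical likelihood for uncensored data.
# p[i] plays the role of Delta F(x_i); log_el is an illustrative helper name.
log_el <- function(p) {
  stopifnot(all(p > 0), abs(sum(p) - 1) < 1e-8)
  sum(log(p))
}

x <- c(2.1, 3.4, 0.7, 5.0, 1.8)                 # made-up observations
n <- length(x)

p_uniform <- rep(1 / n, n)                      # empirical distribution: mass 1/n each
p_tilted  <- c(0.30, 0.10, 0.20, 0.15, 0.25)    # another distribution on the same points

log_el(p_uniform)   # equals n * log(1 / n)
log_el(p_tilted)    # strictly smaller
```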
Censored Observations
For a right censored observation $x_i$, the
likelihood contribution is
$1 - F(x_i)$
• For a left censored observation the contribution is
$F(x_i)$
• Interval censored:
$F(U_i) - F(L_i)$
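A minimal base R sketch of how these contributions combine for right censored data, assuming no ties. The function names `km_jumps` and `log_el_cens` and the data are hypothetical. The Kaplan-Meier jumps are computed by hand and compared with another distribution on the same points; the Kaplan-Meier estimator is known to maximize this likelihood.

```r
# Kaplan-Meier jumps (no ties assumed). d: 1 = uncensored, 0 = right censored.
km_jumps <- function(x, d) {
  ord <- order(x)
  n <- length(x)
  at_risk <- n - seq_len(n) + 1
  surv <- cumprod(1 - d[ord] / at_risk)   # survival just after each ordered time
  jumps <- c(1, surv[-n]) - surv          # mass placed at each ordered time
  p <- numeric(n)
  p[ord] <- jumps
  p
}

# Log empirical likelihood: Delta F(x_i) if uncensored, 1 - F(x_i) if right censored.
log_el_cens <- function(x, d, p) {
  surv <- sapply(x, function(t) 1 - sum(p[x <= t]))
  sum(log(ifelse(d == 1, p, surv)))
}

x <- c(1, 2, 3, 4, 5)                     # made-up times
d <- c(1, 0, 1, 1, 0)

p_km  <- km_jumps(x, d)
p_alt <- c(0.2, 0, 0.3, 0.3, 0)           # another candidate, mass 0.2 left beyond x = 5

log_el_cens(x, d, p_km)
log_el_cens(x, d, p_alt)                  # smaller than the Kaplan-Meier value
```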
Truncated observations
For a left truncated observation (often referred to
as delayed entry):
(entry time, survival time) = $(y_i, x_i)$
• The likelihood contribution is
$\frac{\Delta F(x_i)}{1 - F(y_i)}$
• If the survival time is right censored, then the
contribution is
$\frac{1 - F(x_i)}{1 - F(y_i)}$
Maximize the log empirical likelihood with and without the
mean fixed at a given value (or the median, or a hazard,
or …).
-2 [max log EL(mean fixed) - max log EL(not fixed)]
has an approximate chi-square distribution if the
mean is fixed at the correct value, i.e. under the null hypothesis.
(Proofs are rather involved for censored data; the
maximizer is difficult to describe ….)
(Actual computation is easier: iteration.)
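To make the recipe concrete, here is a sketch for the simplest case only: uncensored data and a fixed mean, where the constrained maximizer has a closed form up to one Lagrange multiplier (Owen's formulation). It is not the censored-data algorithm discussed in these slides. The function name `el_mean_test` and the data are made up.

```r
# Empirical likelihood ratio test for a mean, uncensored data only.
el_mean_test <- function(x, mu) {
  z <- x - mu
  stopifnot(min(z) < 0, max(z) > 0)        # mu must lie strictly inside the data range
  # Lagrange multiplier equation: sum z_i / (1 + lam * z_i) = 0
  score <- function(lam) sum(z / (1 + lam * z))
  lo <- -1 / max(z)                        # bounds keep every 1 + lam * z_i positive
  hi <- -1 / min(z)
  eps <- 1e-8 * (hi - lo)
  lam <- uniroot(score, c(lo + eps, hi - eps), tol = 1e-12)$root
  w <- 1 / (length(x) * (1 + lam * z))     # maximizing weights with the mean fixed at mu
  stat <- 2 * sum(log(1 + lam * z))        # -2 [log EL(mean fixed) - log EL(not fixed)]
  list(weights = w, stat = stat,
       pval = pchisq(stat, df = 1, lower.tail = FALSE))
}

x <- c(1.2, 0.4, 2.7, 3.1, 0.9, 1.8)       # made-up data
res <- el_mean_test(x, mu = 1.5)
res$stat
res$pval
```

The unconstrained maximum is attained at weights 1/n, so only the constrained problem needs a numerical search, here a one-dimensional root find.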
• Idea of proof: construct distributions
$F(t \mid \lambda, h)$
• such that
$F(t \mid \lambda = 0, h) =$ the Kaplan-Meier estimator,
• where $\lambda$ is a 1-dim parameter and $h(\cdot)$ is a function.
• It is easier to find the max for this family of
distributions, and easier to work out the asymptotics
(fix $h(\cdot)$).
• We then max over all possible $h(\cdot)$.
• Quantity to be fixed
1. $\int g(t)\, dF(t) = \mu$
2. $\int g(t)\, d\Lambda(t) = \theta$
where $g(t)$ and $\mu$ or $\theta$ are given.
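A few standard choices of $g$ show what kind of quantity such a constraint can fix. These are illustrative choices rather than examples from the slides. The fixed value is written as $\mu$ for a distribution constraint and $\theta$ for a cumulative hazard constraint (a notational assumption), with $\Lambda$ denoting the cumulative hazard.

```latex
\begin{align*}
g(t) = t
  &\;\Rightarrow\; \int t \, dF(t) = \mu
  && \text{(mean fixed at } \mu\text{)} \\
g(t) = I[t \le t_0]
  &\;\Rightarrow\; F(t_0) = \mu
  && \text{(probability at } t_0 \text{ fixed)} \\
g(t) = I[t \le m] - \tfrac{1}{2},\ \mu = 0
  &\;\Rightarrow\; F(m) = \tfrac{1}{2}
  && \text{(median fixed at } m\text{)} \\
g(t) = I[t \le t_0]
  &\;\Rightarrow\; \Lambda(t_0) = \theta
  && \text{(cumulative hazard at } t_0 \text{ fixed)}
\end{align*}
```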
• Once we have proved the chi-square limiting
distribution for the -2 log likelihood ratio (Wilks'
theorem), the implementation is conceptually
simple: find the maximums.
This leaves the dirty work to the computer, which searches for
the maximum.
• This feature is similar to the bootstrap method.
Software: R is “GNU S”, a free counterpart of Splus
http://www.cran-us.org
http://www.r-project.org
Many additional packages are available for R.
• There is a package called emplik, which mostly
tests hypotheses using the empirical
likelihood ratio with censored or truncated
data
library(emplik)
library(help=emplik)
el.cen.EM(x, d, fun=gfun, mu=0.5)
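A hedged sketch of what such a call could look like in full. The slide does not define `gfun`, so the indicator function below is a hypothetical choice: with `mu = 0.5` it tests whether F(3.5) = 0.5, i.e. whether the median is 3.5. The data are made up, and the coding of `d` (1 = uncensored, 0 = right censored) and the result component names are as I recall them from the emplik documentation; check `?el.cen.EM` before relying on them.

```r
# Hypothetical gfun: indicator of t <= 3.5, so mu = 0.5 corresponds to H0: F(3.5) = 0.5.
gfun <- function(t) as.numeric(t <= 3.5)

x <- c(1, 1.5, 2, 3, 4, 5, 6, 5, 4, 1, 2, 4.5)   # made-up observed times
d <- c(1, 1,   0, 1, 0, 1, 1, 1, 1, 0, 0, 1)     # 1 = uncensored, 0 = right censored

# Run only if the emplik package is installed.
if (requireNamespace("emplik", quietly = TRUE)) {
  res <- emplik::el.cen.EM(x, d, fun = gfun, mu = 0.5)
  # Assumed component names: "-2LLR" (test statistic) and "Pval" (p-value).
  print(res[c("-2LLR", "Pval")])
}
```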
Paired comparison, log(times)
Y1      Y2      d = Y1 - Y2
2.73    2.98+   -0.25-
2.80    2.98+   -0.18-
2.01    2.84    -0.83
2.19    2.76    -0.57
2.34    2.83    -0.49
……………………………………
2.97    2.64    0.33
2.74    2.31    0.43
2.96    2.51    0.45
2.98+   2.68    0.30+
(A trailing + marks a right censored value; a trailing - marks a left censored difference.)
Test $H_0$: median of (Y1 - Y2) = 0
• The largest loglik is -41.19336
• The loglik at median = 0 is -41.43003
• The chi-square statistic is 2 × (-41.19336 + 41.43003) = 0.47334
• The P-value is 0.4915 (upper tail of the chi-square distribution with 1 degree of freedom)
• 95% confidence interval = [-0.57, 0.33]
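The arithmetic of this test can be checked in base R. The two log likelihood values are copied from the slide; everything else is standard chi-square calculation with 1 degree of freedom. Note that the conventional p-value is the upper tail area, `pchisq(stat, 1, lower.tail = FALSE)`, whereas `pchisq(stat, 1)` alone gives the lower tail.

```r
loglik_max  <- -41.19336    # largest log empirical likelihood (from the slide)
loglik_null <- -41.43003    # log empirical likelihood with the median fixed at 0

stat <- 2 * (loglik_max - loglik_null)             # -2 log likelihood ratio
pval <- pchisq(stat, df = 1, lower.tail = FALSE)   # upper tail p-value
crit <- qchisq(0.95, df = 1)                       # 95% cutoff, about 3.84

stat
pval
stat < crit    # TRUE: median = 0 is not rejected at the 5% level
```

A 95% confidence interval is obtained the same way: it collects every hypothesized median whose statistic stays below `crit`.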
Improving estimation/testing in the Cox
proportional hazards model
Make use of additional information on the
baseline hazard
library(coxEL)
coxphEL(Surv(time, status) ~ x, gfun=myfun, lam=0.2)