Transcript slides

On ranking in survival analysis: Bounds on the concordance index
Vikas C. Raykar | Harald Steck | Balaji Krishnapuram
CAD & Knowledge Solutions (IKM CKS), Siemens Medical Solutions USA, Inc., Malvern, USA
Cary Dehing-Oberije | Philippe Lambin
Maastro clinic, University Hospital Maastricht, University Maastricht-GROW, The Netherlands
NIPS 2007
1
Organization
• Motivation
• Brief review of survival analysis
• Concordance index
• Our proposed ranking approach
• Connections to survival analysis
• Results
2
Motivation: Personalized medicine
Predict survival time of lung cancer
patients.
Different kinds of treatment
Chemo/radiotherapy dosage
Survival time
Different patient characteristics
Age/gender/health
Dataset available from our collaborator, MAASTRO hospital.
3
Why not use regression?
• Not amenable to standard statistical/
machine learning methods due to
censored data.
• Well studied in statistics as survival
analysis.
4
Review: Survival Analysis
Branch of statistics that deals with
time until the occurrence of an event
• When did a patient die?
• When did the disease manifest?
• When did the machine fail?
Widely used in medical statistics, epidemiology,
reliability engineering, economics, sociology,
marketing, insurance, etc.
5
What is censored data?
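[Figure: patient timelines on a TIME axis, with the years 2001 and 2005 marked, running from the start of the study to the end of the study, when the data are collected.]
• Some patients die during the study period (death observed).
• Some patients become unavailable for follow-up (censored data).
• At the end of the study a lot of patients may still survive (censored data).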
The exact survival time may be longer than the observation period
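A minimal sketch of how such right-censored observations are commonly encoded: each patient becomes a pair of an observed time and a flag saying whether the event was actually seen. The function name `observe`, its arguments, and the example patients are hypothetical and chosen only for illustration.

```python
from typing import Optional, Tuple

def observe(entry: float, death: Optional[float], last_seen: float) -> Tuple[float, bool]:
    """Return (observed_time, event_observed) for one patient.

    entry     -- time the patient entered the study
    death     -- time of death, or None if no death was recorded
    last_seen -- last time the patient was known to be alive
                 (end of study, or the point of loss to follow-up)
    """
    if death is not None:
        return death - entry, True   # exact survival time is known
    return last_seen - entry, False  # censored: true survival time is at least this long

# Hypothetical patients in a study running from 2001 to 2005.
died = observe(2001.0, 2003.5, 2005.0)      # died during the study
survivor = observe(2001.0, None, 2005.0)    # still alive at the end of the study
lost = observe(2002.0, None, 2004.0)        # unavailable for follow-up
print(died, survivor, lost)
```

For the two censored patients the recorded time is only a lower limit on the survival time, which is why plain regression on the recorded times would be misleading.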
6
Censoring provides only partial information
Typically a large portion of the data is censored.
[Figure: survival time for observed data and censored data.]
7
Notation: Survival analysis
8
Proportional Hazard (PH) Model
• Has become a standard model for studying the effect of
covariates on survival time distributions.
[Equation: the hazard function is the baseline hazard function multiplied by a relative hazard function of the covariate and the unknown regression parameters.]
• Parameter estimates for PH model are obtained by maximizing
Cox’s partial likelihood.
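For reference, the labels on this slide match the standard form of the proportional hazard model, and the Cox partial log-likelihood is shown with it in its usual form, assuming no tied event times. The symbols $h_0$, $w$, $x_i$, $t_i$ and $\delta_i$ are notation chosen here and may differ from the notation on the original slides.

```latex
% Proportional hazard model: baseline hazard times relative hazard
h(t \mid x) = h_0(t)\, \exp(w^\top x)

% Cox partial log-likelihood (no ties);
% \delta_i = 1 if the event was observed for subject i, 0 if censored
\ell(w) = \sum_{i:\, \delta_i = 1}
          \Big[ w^\top x_i - \log \sum_{j:\, t_j \ge t_i} \exp(w^\top x_j) \Big]
```

The baseline hazard $h_0$ cancels out of the partial likelihood, so $w$ can be estimated without specifying it.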
9
Concordance Index or c-index
• Standard performance measure for model
assessment in survival analysis.
• Generalization of the area under the ROC
curve to regression problems/censored
data.
• Fraction of all pairs of subjects whose
survival times can be ordered such that the
subject with higher predicted survival is the
one who actually survived longer.
10
Concordance Index-no censoring
[Figure: survival time plotted against covariate.]
C=1 perfect prediction accuracy
C=0.5 as good as a random predictor
11
Concordance Index-with censoring
[Figure: survival time of each subject, with censored points marked.]
No arrow can go above a censored point.
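The counting rule behind the c-index, with and without censoring, can be sketched as follows. A pair is comparable only when the subject with the shorter recorded time has an observed event. The function name `concordance_index` and the toy data are hypothetical, and the handling of ties (tied times skipped, tied scores counted as half) is an assumption made for this sketch.

```python
from itertools import combinations
from typing import Sequence

def concordance_index(times: Sequence[float],
                      observed: Sequence[bool],
                      scores: Sequence[float]) -> float:
    """Fraction of comparable pairs that the scores order correctly.

    times    -- observed survival or censoring times
    observed -- True if the event was observed, False if censored
    scores   -- predicted survival scores (higher means longer predicted survival)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: pairs with tied times are skipped
        # Order the pair so that `a` has the shorter recorded time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not observed[a]:
            continue  # shorter time is censored, so the true order is unknown
        comparable += 1
        if scores[a] < scores[b]:
            concordant += 1.0
        elif scores[a] == scores[b]:
            concordant += 0.5  # assumption: tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

times = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [0.1, 0.4, 0.3, 0.8, 0.9]  # subjects 2 and 3 are ranked in the wrong order
c_full = concordance_index(times, [True] * 5, scores)
c_cens = concordance_index(times, [True, False, True, False, True], scores)
print(c_full, c_cens)
```

In the second call the only misordered pair starts at a censored subject, so it is not comparable and drops out of the count.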
12
Proposed approach:
Maximize CI directly
• While CI is widely used to evaluate a learnt
model, it is not generally used as an objective
function for training.
• CI is invariant to monotone transformation of the
survival times.
• Hence the model learnt by maximizing the CI is a
ranking function. (N-partite ranking problem)
13
Lower bounds on the CI
Discrete optimization problem
Use a differentiable
concave lower bound
Related to the PH model
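One way to write this out, assuming a ranking function $f$ and a set $\mathcal{E}$ of comparable pairs $(i, j)$ with $T_i < T_j$ (notation chosen here, not taken from the slides): the CI is an average of indicator functions, which makes maximizing it a discrete problem, and each indicator can be bounded from below by a differentiable concave function. The log-sigmoid and exponential bounds below are two such choices, both tight at $z = 0$.

```latex
% Concordance index as an average of indicators over comparable pairs
\mathrm{CI}(f) = \frac{1}{|\mathcal{E}|} \sum_{(i,j) \in \mathcal{E}}
                 \mathbf{1}\big[ f(x_j) - f(x_i) > 0 \big]

% Two concave lower bounds on the indicator, with \sigma(z) = 1 / (1 + e^{-z})
\mathbf{1}[z > 0] \;\ge\; 1 + \frac{\log \sigma(z)}{\log 2}
\qquad \text{and} \qquad
\mathbf{1}[z > 0] \;\ge\; 1 - e^{-z}
```

Substituting either bound for the indicator gives a smooth concave function of the pairwise score differences that never exceeds the CI.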
14
Maximize lower bounds on the CI
Linear ranking functions
Regularization
Use gradient-based methods to
maximize this
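The recipe on this slide can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: it uses the log-sigmoid lower bound, a linear ranking function f(x) = w . x, an L2 penalty, and plain fixed-step gradient ascent. All names (`fit`, `lower_bound`, `lam`, `lr`, `steps`) and the toy data are hypothetical, and the step size is chosen for this toy problem only.

```python
import math
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]

def comparable_pairs(times: Sequence[float], observed: Sequence[bool]) -> List[Pair]:
    """Pairs (i, j) where subject i has an observed event before times[j]."""
    n = len(times)
    return [(i, j) for i in range(n) for j in range(n)
            if observed[i] and times[i] < times[j]]

def log_sigmoid(z: float) -> float:
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def score_gap(w, xi, xj) -> float:
    """w . (xj - xi) for the linear ranking function f(x) = w . x."""
    return sum(wk * (b - a) for wk, a, b in zip(w, xi, xj))

def lower_bound(w, X, pairs, lam) -> float:
    """Log-sigmoid lower bound on the CI minus an L2 penalty."""
    fit_term = sum(1.0 + log_sigmoid(score_gap(w, X[i], X[j])) / math.log(2.0)
                   for i, j in pairs) / len(pairs)
    return fit_term - 0.5 * lam * sum(wk * wk for wk in w)

def fit(X, times, observed, lam=0.01, lr=0.5, steps=500):
    """Maximize the penalized lower bound by fixed-step gradient ascent."""
    pairs = comparable_pairs(times, observed)
    w = [0.0] * len(X[0])
    scale = math.log(2.0) * len(pairs)
    for _ in range(steps):
        grad = [-lam * wk for wk in w]  # gradient of the penalty term
        for i, j in pairs:
            z = score_gap(w, X[i], X[j])
            coeff = (1.0 - math.exp(log_sigmoid(z))) / scale  # d/dz log sigma(z) = 1 - sigma(z)
            for k in range(len(w)):
                grad[k] += coeff * (X[j][k] - X[i][k])
        w = [wk + lr * gk for wk, gk in zip(w, grad)]
    return w

# Toy data: one covariate, larger values go with longer survival; subject 3 is censored.
X = [[0.0], [1.0], [2.0], [3.0]]
times = [1.0, 2.0, 3.0, 4.0]
observed = [True, True, False, True]
w = fit(X, times, observed)
pairs = comparable_pairs(times, observed)
print(w, lower_bound(w, X, pairs, 0.01))
```

Because the objective is concave in w, any gradient-based optimizer could replace the fixed-step loop; the regularization weight `lam` keeps w finite when the training pairs are perfectly separable, as they are in this toy example.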
15
Connection to the PH model
Log-likelihood for correct ranking
For a proportional hazard model we can
show that
This is a common assumption made in ranking
literature. We have shown that if we use PH
models this is exactly the case.
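A short derivation of this result, sketched under the assumption that the hazard takes the standard form $h(t \mid x) = h_0(t) \exp(w^\top x)$, that the two survival times are independent, and that the cumulative baseline hazard $H_0(t)$ grows without bound. The notation is chosen here and the sign convention may differ from the one on the slides.

```latex
% Survival function under the PH model, with \lambda_i = \exp(w^\top x_i)
S_i(t) = \exp\big( -\lambda_i H_0(t) \big), \qquad H_0(t) = \int_0^t h_0(s)\, ds

% Probability that subject j outlives subject i
\Pr[T_i < T_j]
  = \int_0^\infty \lambda_i h_0(t)\, e^{-(\lambda_i + \lambda_j) H_0(t)}\, dt
  = \frac{\lambda_i}{\lambda_i + \lambda_j}
  = \sigma\big( w^\top (x_i - x_j) \big)
```

With the ranking function taken as $f(x) = -w^\top x$, this is the sigmoid of the score difference $f(x_j) - f(x_i)$, so the log-likelihood of a correct ranking is a sum of log-sigmoid terms.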
16
Penalized log-likelihood
Compare this with the objective function
using the lower bound approach
17
Cox partial likelihood
• Our proposed method explicitly maximizes
a lower bound on the CI.
• Cox method maximizes partial likelihood.
• Experimental results indicate that both do
well.
• Conjecture: Is Cox’s partial likelihood also
a lower bound on the CI?
18
Cox partial likelihood (cont.)
19
Results
Proposed method slightly
better than Cox-PH.
However, differences not
significant.
20
Thank You! | Questions?
21