Biostatistics in Prostate Cancer Research


Cancer Control Journal Club
Survival Analysis
Elizabeth Garrett-Mayer, PhD
Director of Biostatistics, HCC
October 2, 2014
Motivating example
Wang C-Y, Tapsoba JD, Anderson M, et al.
Time to Screening in Systems of Support to Increase Colorectal Cancer
Screening Trial
Cancer Epidemiology, Biomarkers & Prevention
2014; 23(8):1683-1688
Design paper:
Green BB, Wang C-Y, Horner K, et al.
Systems of Support to Increase Colorectal Cancer Screening and Follow-up Rates (SOS): Design, Challenges, and Baseline Characteristics of Trial Participants
Contemporary Clinical Trials
2010; 31: 589-603.
Summary
• Understanding how interventions affect time to completion of
colorectal cancer (CRC) screening might assist in planning and
delivering population-based screening interventions.
• The Systems of Support to Increase CRC Screening (SOS) study was
conducted between 2008 and 2011 at 21 primary care medical
centers in Western Washington.
• Participants in the study, aged 50-73 years, were eligible if they
were enrolled in Group Health and were due for CRC screening.
• 4,675 recruited participants were randomized to usual care (UC) or
one of three interventions with incremental levels of systems of
support for completion of CRC screening.
• We conducted time to screening analyses of the SOS data in year 1
and year 2.
Time to event outcomes
• In cancer research, we are often interested in
measuring time until an event occurs
• The event is usually bad so we are trying to
prevent the event from occurring:
• Time to death
• Time to recurrence
• Time to progression
• However, sometimes the event is good:
• Time to screening
• Time to recovery of neutrophil counts.
Why time to event (TTE) outcomes are tricky
• At first glance, these ‘times’ look like
continuous variables.
• But, in most studies, at the end of the study,
many patients will not have had the outcome.
Times for patients whose events are not observed
are called ‘censored’
• More specifically, “right censored”
• (left-censoring occurs, too, but not discussed today)
Simple example: [figure: follow-up timelines for several patients, starting at Time 0]
Introduce “administrative” censoring [figure: the same timelines, cut off at STUDY END]
More realistic: patients enter study at different times [figure: staggered timelines between Time 0 and STUDY END]
Additional issues
• Patient drop-out
• Loss to follow-up
Drop-out or LTFU [figure: patient timelines between Time 0 and STUDY END]
How do we quantify the data?
Shift everything so each patient time represents time on study, measured from the time of enrollment [figure: timelines realigned to a common start]
Quantifying TTE data
• Requires two pieces of
information per subject:
– The time
– Indicator of whether or
not the time is an event
time or a censoring time.
Time   Death (1 = event, 0 = censored)
12     1
15     1
27     1
143    0
26     0
68     1
42     0
55     1
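A minimal sketch of how the eight subjects in the table above could be stored in Python, one (time, event indicator) pair per subject. It also shows why the indicator matters: averaging all eight times treats the three censored times as if they were event times, and since a censored time is only a lower bound on the true event time, that naive mean can only understate the mean time to event for these subjects.

```python
# One (time, event) pair per subject, copied from the table above:
#   event = 1 -> the time is an observed event (death) time
#   event = 0 -> the time is a censoring time (event not yet observed)
records = [(12, 1), (15, 1), (27, 1), (143, 0), (26, 0), (68, 1), (42, 0), (55, 1)]

times = [t for t, _ in records]
events = [e for _, e in records]

n_events = sum(events)
n_censored = len(events) - n_events

# Naive mean that ignores the indicator: censored times are only lower
# bounds on the true event times, so this average is biased downward.
naive_mean = sum(times) / len(times)

print(f"{len(records)} subjects: {n_events} events, {n_censored} censored")
print(f"naive mean of observed times: {naive_mean}")
```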
Our example
• The outcome of interest is time to screening.
• Time to screening in year 1
• Time to screening in year 2
• Important to define when “the clock” starts
– Very often starts at randomization.
– Sometimes at diagnosis or at surgery
– Depends on the type of study and it is not always obvious.
• To calculate time to screening in year 1 you need:
– Date the patient was randomized
– Date the patient was screened
– If no screening occurred by 365 days, the patient’s time was
censored at 365 days.
– There were other reasons for censoring earlier (e.g. dropped
out of plan, moved, etc.). Last date of a clinic visit would be
used to calculate a censored time.
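The recipe above can be sketched as a small Python function that turns dates into a (days, event) pair. The function name and the `last_contact` argument (the last clinic-visit date for a patient who left follow-up early) are hypothetical; the SOS investigators' actual data handling may differ in details such as how a screening date after the last visit is treated.

```python
from datetime import date
from typing import Optional, Tuple

YEAR1_DAYS = 365  # administrative censoring at the end of study year 1


def year1_time_to_screening(randomized: date,
                            screened: Optional[date],
                            last_contact: Optional[date] = None) -> Tuple[int, int]:
    """Return (days, event) for time to screening in year 1 (hypothetical helper).

    event = 1: screening observed; days = randomization to screening.
    event = 0: censored; days = time observed without screening.
    last_contact: last clinic-visit date for a patient who left follow-up
    early (dropped out of the plan, moved, ...); None if followed all year.
    """
    # Observation ends at the end of year 1 or at early loss, whichever is first.
    end = YEAR1_DAYS
    if last_contact is not None:
        end = min(end, (last_contact - randomized).days)
    if screened is not None:
        days = (screened - randomized).days
        if days <= end:
            return days, 1
    return end, 0


r = date(2008, 3, 1)
print(year1_time_to_screening(r, date(2008, 5, 10)))       # -> (70, 1)
print(year1_time_to_screening(r, None))                     # -> (365, 0)
print(year1_time_to_screening(r, None, date(2008, 9, 15)))  # -> (198, 0)
```

Censoring at the earlier of the year-end and the last visit is what keeps a patient who moved away from being counted as "never screened".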
Groups
• Routine/Usual Care: Patients are reminded to complete preventive health
screening tests by an outreach birthday letter reminding them of
screening tests that are due (including colorectal cancer screening) as well
as efforts by health care providers at medical visits.
• Automated: Usual care plus automated support in the beginning of each
study year, including mailed reminders, FOBT screening cards, information
about colorectal cancer screening options, and a phone number to call in
case they had questions or wanted to discuss screening preferences.
• Assisted: Usual care plus Automated support plus a phone call from a
medical assistant (soon after automated support was provided), who
documented elected screening options and sent patients’ requests to their
assigned physicians.
• Navigated: Usual care plus Automated and Assisted support, plus care
management from a nurse who provided telephone counseling about
screening options, action plans, and follow-up to assure screening
completion.
Kaplan–Meier survival (nonscreened) curves for time to completion
of any colorectal cancer screening for year 1 or year 2.
©2014 by American Association for Cancer Research
Wang C et al. Cancer Epidemiol Biomarkers Prev
2014;23:1683-1688
Describing TTE data: Kaplan-Meier curves
• Kaplan-Meier curves show the attrition over time
as the events occur in the population.
– Y-axis: Proportion of patients who have not had the
event. Usually denoted S(t) for ‘survival at time t’.
– X-axis: time (t)
• Provide a picture of the survival distribution
• “step-function” where each step represents one
or more events at that particular time
• Hash-marks indicate censored times
• At time=0, the height is always S(t)=1
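The step function described above can be computed with the standard Kaplan-Meier product-limit formula. This is a from-scratch Python sketch (function names are hypothetical), applied to the eight-subject time/death table shown earlier; in practice one would use established survival software.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier estimate of S(t).

    Returns (t, n_at_risk, n_events, S(t)) rows, one per distinct event time.
    Censored times create no step; they only shrink later risk sets.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    surv = 1.0
    rows = []
    for t in sorted(deaths):
        # Risk set: subjects still observed and event-free just before t
        # (a subject censored exactly at t counts as at risk, by convention).
        n_at_risk = sum(1 for u in times if u >= t)
        d = deaths[t]
        surv *= 1.0 - d / n_at_risk  # conditional probability of surviving past t
        rows.append((t, n_at_risk, d, surv))
    return rows


def surv_at(rows, t):
    """Height of the KM step function at time t (S = 1 before the first event)."""
    s = 1.0
    for event_time, _, _, s_t in rows:
        if event_time > t:
            break
        s = s_t
    return s


times = [12, 15, 27, 143, 26, 68, 42, 55]
events = [1, 1, 1, 0, 0, 1, 0, 1]
rows = kaplan_meier(times, events)
for t, n, d, s in rows:
    print(f"t={t:>3}  at risk={n}  events={d}  S(t)={s:.3f}")
```

On these data the curve steps down to 0.875, 0.75, 0.6, 0.4 and 0.2 at times 12, 15, 27, 55 and 68; the censored times 26, 42 and 143 produce no steps.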
Describing TTE data: Kaplan-Meier curves
• Why are they special?
• They incorporate the partial information provided by
censored times.
– Standard methods would treat a censored time as a missing value
– KM approaches use the information regarding how long a
person has ‘survived’ without the event
• The curve represents the proportion of people who
haven’t had the event among those in the ‘risk set’
• The risk set at time t includes everyone who is still under
observation and event-free just before t: both those whose event
times we will observe and those who will later be censored.
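The risk set can be made concrete on the eight subjects from the time/death table (a Python sketch; `risk_set` is a hypothetical helper name). At each event time it counts how many subjects are at risk and how many of them will later be censored: those censored subjects keep contributing to the denominators until they leave observation, which is the partial information the KM curve uses.

```python
times = [12, 15, 27, 143, 26, 68, 42, 55]
events = [1, 1, 1, 0, 0, 1, 0, 1]


def risk_set(t, times):
    """Indices of subjects still under observation and event-free just before t."""
    return [i for i, u in enumerate(times) if u >= t]


event_times = sorted({u for u, e in zip(times, events) if e == 1})
summary = {}
for t in event_times:
    members = risk_set(t, times)
    # Split members by whether their own time is an event time or a censoring time.
    observed = sum(1 for i in members if events[i] == 1)
    censored_later = len(members) - observed
    summary[t] = (len(members), observed, censored_later)
    print(f"t={t:>3}: {len(members)} at risk "
          f"({observed} with observed event times, {censored_later} censored later)")
```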
Kaplan–Meier survival (nonscreened) curves for time to completion
of any colorectal cancer screening for year 1 or year 2.
Now what?
• Summary statistics are usually of interest
• Common ways of reporting TTE outcomes
– Median time to event
• Why not the mean?
– S(t) for a particular time
• Pancreatic cancer: Overall survival at 6 months
• Breast cancer: Recurrence-free survival at 5 years
• Colorectal screening: Unscreened population at 1 year.
• 95% confidence intervals should also be reported.
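A sketch of the two summaries above, computed from the time/death table in Python: the median (the first event time at which the KM curve reaches 0.5 or below) and S(t) at a fixed time with a 95% CI. The CI uses Greenwood's variance on the log(−log) scale, which is one common choice among several (plain and log scales are also used); function names are hypothetical. On "why not the mean?": in these data the largest time (143) is censored, so the curve never reaches 0 and the mean (the area under the curve) cannot be estimated without extra assumptions.

```python
import math
from collections import Counter

Z_95 = 1.959963984540054  # 97.5th percentile of the standard normal


def km_with_greenwood(times, events):
    """KM rows (t, S(t), Greenwood sum) at each distinct event time.
    The Greenwood sum accumulates d / (n * (n - d)) over event times <= t."""
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    surv, gw, rows = 1.0, 0.0, []
    for t in sorted(deaths):
        n = sum(1 for u in times if u >= t)  # number at risk just before t
        d = deaths[t]
        surv *= 1.0 - d / n
        if n > d:
            gw += d / (n * (n - d))
        rows.append((t, surv, gw))
    return rows


def median_time(rows):
    """Smallest event time with S(t) <= 0.5, or None if S(t) never gets there."""
    for t, s, _ in rows:
        if s <= 0.5:
            return t
    return None


def surv_ci(rows, t0, z=Z_95):
    """S(t0) with a 95% CI computed on the log(-log) scale (Greenwood variance)."""
    s, gw = 1.0, 0.0
    for t, s_t, gw_t in rows:
        if t > t0:
            break
        s, gw = s_t, gw_t
    if s <= 0.0 or s >= 1.0:
        return s, None, None  # interval undefined at the boundaries
    se = math.sqrt(gw) / abs(math.log(s))  # SE of log(-log S(t0))
    return s, s ** math.exp(z * se), s ** math.exp(-z * se)


times = [12, 15, 27, 143, 26, 68, 42, 55]
events = [1, 1, 1, 0, 0, 1, 0, 1]
rows = km_with_greenwood(times, events)
s, lo, hi = surv_ci(rows, 30)
print("median time to event:", median_time(rows))
print(f"S(30) = {s:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```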
Our example: Proportion screened at 1 year
Group          1-year screening rate
Usual Care     39.7%
Automated      62.2%
Assisted       68.3%
Navigated      74.4%
• Reported rates are based
on simple proportions
with intent to treat
• KM estimates at 1 year
would be slightly different
• Why?
– ITT counts patients who
dropped out before screening
as not screened
– KM approach keeps them
in until dropout and then
treats them as censored; it
assumes dropout is unrelated
to the chance of screening
(non-informative censoring).
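A toy illustration of why the two numbers differ, using made-up data for 10 patients (not SOS data): 6 are screened during the year, 2 drop out unscreened, and 2 are followed all year without screening. The ITT-style proportion counts the drop-outs as unscreened; the KM estimate censors them at dropout.

```python
from collections import Counter


def km_surv_at(times, events, t0):
    """Kaplan-Meier S(t0): estimated proportion not yet screened at t0."""
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    surv = 1.0
    for t in sorted(deaths):
        if t > t0:
            break
        n_at_risk = sum(1 for u in times if u >= t)
        surv *= 1.0 - deaths[t] / n_at_risk
    return surv


# Hypothetical year-1 data (days since randomization):
#   event = 1: screened on that day
#   event = 0, time < 365: dropped out unscreened on that day
#   event = 0, time = 365: followed all year, not screened
times = [30, 60, 90, 100, 150, 200, 250, 300, 365, 365]
events = [1, 1, 1, 0, 1, 0, 1, 1, 0, 0]

# ITT-style simple proportion: drop-outs count as "not screened".
itt_screened = sum(events) / len(events)
# KM: drop-outs contribute until they leave, then are censored.
km_screened = 1.0 - km_surv_at(times, events, 365)

print(f"ITT proportion screened by 1 year: {itt_screened:.3f}")
print(f"KM estimate screened by 1 year:    {km_screened:.3f}")
```

Here the simple proportion is 0.600 while the KM estimate is about 0.708: KM comes out higher because the two drop-outs are not assumed to have gone unscreened.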
So how do we get p-values?!
• Two main approaches
– Log rank tests
– Cox regression analysis
• There are plenty of other ways!
– Gehan test, Fleming-Harrington tests, Tarone-Ware, (Modified) Peto-Peto…
– Additive hazards regression, accelerated failure
time models, …
• More later on which to choose….
Hazards
• Survival analysis is based on the concept of a
hazard function, h(t), which can be hard to grasp.
• Technically, the hazard function at time t
represents the instantaneous event rate at time t,
conditional on survival up to time t
• Sort of like a probability of having the event at
time t, but it is a rate (events per unit of time), so
the units are not the same.
• Hazards themselves are not interpretable, but
– Relative hazards are: they represent relative risks
– Hazard functions can be inspected, to see if rates
increase or decrease.
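To see why a hazard is a rate rather than a probability, here is a Python sketch that estimates a single constant hazard from the censored time/death table, assuming an exponential model (a simplifying assumption the talk does not make). Under that model the maximum-likelihood estimate is the number of events divided by total follow-up time. Because it is "events per unit of time", changing the time unit rescales it, and it can exceed 1. The table's time unit is not stated, so months is assumed purely for illustration.

```python
import math

times = [12, 15, 27, 143, 26, 68, 42, 55]  # assumed to be months (hypothetical)
events = [1, 1, 1, 0, 0, 1, 0, 1]


def constant_hazard_mle(times, events):
    """MLE of a constant hazard (exponential model) with right censoring:
    observed events divided by total person-time at risk."""
    return sum(events) / sum(times)


lam_per_month = constant_hazard_mle(times, events)
lam_per_year = lam_per_month * 12     # same hazard, different time unit
lam_per_decade = lam_per_month * 120  # a rate, so it is not bounded by 1

# Survival implied by a constant hazard: S(t) = exp(-lambda * t)
s_at_24_months = math.exp(-lam_per_month * 24)
print(f"hazard: {lam_per_month:.4f}/month = {lam_per_year:.3f}/year "
      f"= {lam_per_decade:.2f}/decade")
print(f"implied S(24 months) = {s_at_24_months:.3f}")
```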
Hazard function vs. Survival function
• If the hazard rate is constant ($h(t) = \lambda$ for all t),
then $S(t) = e^{-\lambda t}$
• If not, then there are integrals and limits, etc.
involved.
• Don’t worry about the terminology: the
important point is that there is a one-to-one
correspondence between S(t) and h(t)
• Why did I bring this up? Most of our hypothesis
testing approaches rely on the comparison of
hazards.
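For reference, the one-to-one correspondence can be written out for a continuous event time using the standard definitions below; the cumulative hazard $H(t)$ is a symbol introduced here, not one used elsewhere in the talk. The constant-hazard case recovers the formula above.

```latex
H(t) = \int_0^t h(u)\,du, \qquad
S(t) = \exp\{-H(t)\}, \qquad
h(t) = -\frac{d}{dt}\log S(t)

\text{Constant hazard } h(t) = \lambda:\quad
H(t) = \lambda t \;\Longrightarrow\; S(t) = e^{-\lambda t}
```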
Example of hazards vs. survival
• Kidney infection data
• Two groups:
– patients with percutaneous placement of
catheters (N=76)
– patients with surgical placement of catheters
(N=43)
[Figure, two panels over 0–25 months: “Kaplan-Meier Survival Curves” (S(t) vs. time) and “Estimated Hazard Functions” (hazard vs. time to infection, in months), each comparing Percutaneous and Surgical]
Hypothesis Testing
• Global hypothesis test:
– H0: the rates of time to screening are the same in the four
groups
– H1: the rates of time to screening are not the same in all four
groups (at least one group differs).
• Pairwise hypothesis test:
– H0: the rates of time to screening in Usual Care group are the
same as in Automated group
– H1: the rates of time to screening in Usual Care group are
different than in Automated group
– The log-rank test provides a p-value for determining if there
is evidence to reject H0
Log-rank test
• Practically, it is a weighted average of the
differences in the event (hazard) rates in the
groups being compared.
• Log-rank tests give as much weight to
differences that occur early as to differences that
occur later in time.
– Often criticized
– By contrast, Gehan test gives more weight to early
differences.
• You just get a p-value (no summary statistic to
interpret).
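A from-scratch Python sketch of the two-sample test described above. The hypothetical `weight` switch selects log-rank weights (1 at every event time) or Gehan weights (the number at risk, which is why Gehan emphasizes early differences, when more patients are at risk). The statistic is the weighted sum of observed minus expected events in one group, squared and divided by its variance under H0, and compared with a chi-square distribution on 1 df. The data in the usage lines are made up.

```python
import math


def weighted_logrank(times, events, groups, weight="logrank"):
    """Two-sample weighted log-rank test.

    weight="logrank": every event time gets weight 1.
    weight="gehan":   weight = number at risk, so early differences count more.
    groups holds 0/1 labels. Returns (chi-square statistic on 1 df, p-value).
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    u = 0.0    # weighted sum of (observed - expected) events in group 1
    var = 0.0  # its variance under H0 (hypergeometric, allowing ties)
    for t in event_times:
        n = sum(1 for s in times if s >= t)
        n1 = sum(1 for s, g in zip(times, groups) if s >= t and g == 1)
        d = sum(1 for s, e in zip(times, events) if s == t and e == 1)
        d1 = sum(1 for s, e, g in zip(times, events, groups)
                 if s == t and e == 1 and g == 1)
        w = 1.0 if weight == "logrank" else float(n)
        u += w * (d1 - d * n1 / n)
        if n > 1:
            var += w * w * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0.0:
        return 0.0, 1.0
    chi2 = u * u / var
    # Upper tail of a chi-square with 1 df: P(|Z| > sqrt(chi2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical data: group 1 vs. group 0
times = [2, 3, 5, 7, 11, 4, 6, 8, 10, 12]
events = [1, 1, 1, 0, 1, 1, 0, 1, 1, 0]
groups = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
for w in ("logrank", "gehan"):
    chi2, p = weighted_logrank(times, events, groups, weight=w)
    print(f"{w}: chi-square = {chi2:.3f}, p = {p:.3f}")
```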
Cox regression
(aka Proportional hazards regression)
• Very popular regression approach for TTE data
• Two very appealing aspects of this model:
– It’s a regression approach so we can adjust for other
covariates/confounders.
– The model coefficients are very interpretable: log hazard ratios.
• Hazard ratio: a relative risk. It’s the ratio of the hazard rate
in group A vs. the hazard rate in group B.
• In our example, for the analysis of time to screening in Year 1:
– Automated vs. Usual care: HR = 2.46 (95% CI: 2.19, 2.77)
– Assisted vs. Usual care: HR = 2.82 (95% CI: 2.50, 3.16)
– Navigated vs. Usual care: HR = 3.23 (95% CI: 2.88, 3.62)
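A teaching sketch of where such numbers come from, in Python: a Cox model with one binary covariate fitted by Newton-Raphson on the partial likelihood (Breslow handling of ties), reported as HR = exp(β) with a Wald 95% CI. This is not how the SOS authors fitted their models (they used standard software and adjusted for covariates), and all names and the toy data are hypothetical. The last function backs out β and its SE from a published HR and CI, assuming a Wald interval; for Automated vs. Usual care it gives log HR ≈ 0.90 with SE ≈ 0.06.

```python
import math


def score_and_information(beta, times, events, x):
    """Score and observed information of the Cox partial log-likelihood for a
    single covariate, using the Breslow approximation for tied event times."""
    score, info = 0.0, 0.0
    for ti, ei, xi in zip(times, events, x):
        if ei != 1:
            continue
        a0 = a1 = a2 = 0.0
        for tj, xj in zip(times, x):
            if tj >= ti:  # subject j is in the risk set at this event time
                r = math.exp(beta * xj)
                a0 += r
                a1 += r * xj
                a2 += r * xj * xj
        mean_x = a1 / a0  # risk-weighted mean covariate in the risk set
        score += xi - mean_x
        info += a2 / a0 - mean_x ** 2
    return score, info


def fit_cox(times, events, x, tol=1e-10, max_iter=50):
    """Newton-Raphson for the log hazard ratio; returns (beta, standard error)."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = score_and_information(beta, times, events, x)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = score_and_information(beta, times, events, x)
    return beta, 1.0 / math.sqrt(info)


def hr_with_ci(beta, se, z=1.96):
    """Hazard ratio exp(beta) with a Wald 95% CI exp(beta +/- z * se)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coef_from_reported_hr(hr, lower, upper, z=1.96):
    """Back out (beta, se) from a published HR and 95% CI (assumes a Wald CI)."""
    return math.log(hr), (math.log(upper) - math.log(lower)) / (2 * z)


# Hypothetical data: x = 1 for an intervention arm, x = 0 for usual care.
times = [2, 3, 5, 7, 11, 4, 6, 8, 10, 12]
events = [1, 1, 1, 0, 1, 1, 0, 1, 1, 0]
x = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
beta, se = fit_cox(times, events, x)
print("HR = %.2f (95%% CI %.2f, %.2f)" % hr_with_ci(beta, se))

# Automated vs. Usual care from the slide: HR = 2.46 (2.19, 2.77)
b, s = coef_from_reported_hr(2.46, 2.19, 2.77)
print(f"log HR = {b:.3f}, SE = {s:.3f}, z = {b / s:.1f}")
```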
Big Assumption
• Proportional hazards assumption: the survival
curves for two groups must have hazard
functions that are proportional over time (i.e.
constant relative hazard).
• Simpler: Relative risk of the event is constant
over time.
• How does this look?
Proportionality
• Notice that this is via
the hazard functions.
• Alternatively, log(h(t))
curves are parallel
(easier to see
proportionality).
• Very hard to tell if
assumption holds by
looking at survival
curves!
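The algebra behind these bullets, for two groups with a constant hazard ratio HR (group 1 vs. group 0): the log hazards differ by the constant log HR, so they are parallel, while the survival curves are related by a power rather than a shift, which is why proportionality is hard to judge from KM curves. The log(−log S(t)) curves (log cumulative hazards) are also separated by log HR, which is the basis of a common graphical check.

```latex
h_1(t) = \mathrm{HR}\, h_0(t)
\;\Longrightarrow\;
\log h_1(t) = \log \mathrm{HR} + \log h_0(t)

S_1(t) = \exp\Big\{-\int_0^t \mathrm{HR}\, h_0(u)\,du\Big\} = S_0(t)^{\mathrm{HR}}
\;\Longrightarrow\;
\log\{-\log S_1(t)\} = \log \mathrm{HR} + \log\{-\log S_0(t)\}
```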
Diagnostics
• There are approaches to evaluating the
assumption
• Graphical diagnostics
– Graphs of the hazards (or log hazards) over time
per group
– Graph of the hazard RATIO (or log HR) over time
– Tests of the (Schoenfeld) residuals.
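A simplified Python sketch of the residual idea for a single binary covariate. After fitting the Cox model, each event gets a Schoenfeld residual: its covariate value minus the risk-weighted mean covariate in the risk set at that time. Under proportional hazards these residuals should show no trend over time, so the sketch reports their correlation with event time. The formal test (Grambsch and Therneau) works with scaled residuals and a chi-square statistic, which this sketch does not implement; names and data are hypothetical, with the data built so that group 1 has early events but also the longest survivors.

```python
import math


def risk_moments(beta, t_event, times, x):
    """Risk-weighted mean and variance of x over the risk set at t_event."""
    a0 = a1 = a2 = 0.0
    for tj, xj in zip(times, x):
        if tj >= t_event:
            r = math.exp(beta * xj)
            a0 += r
            a1 += r * xj
            a2 += r * xj * xj
    mean = a1 / a0
    return mean, a2 / a0 - mean * mean


def fit_cox(times, events, x, tol=1e-10, max_iter=50):
    """Single-covariate Cox fit by Newton-Raphson (Breslow ties)."""
    beta = 0.0
    for _ in range(max_iter):
        score = info = 0.0
        for ti, ei, xi in zip(times, events, x):
            if ei == 1:
                mean, var = risk_moments(beta, ti, times, x)
                score += xi - mean
                info += var
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


def schoenfeld_residuals(beta, times, events, x):
    """(event time, x_i minus risk-weighted mean of x) for each observed event."""
    return [(ti, xi - risk_moments(beta, ti, times, x)[0])
            for ti, ei, xi in zip(times, events, x) if ei == 1]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))


# Hypothetical data (x = 1: group 1, x = 0: group 0)
times = [1, 2, 3, 4, 20, 22, 24, 5, 6, 8, 10, 12, 14, 16]
events = [1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0]
x = [1] * 7 + [0] * 7

beta_hat = fit_cox(times, events, x)
res = schoenfeld_residuals(beta_hat, times, events, x)
rho = pearson([t for t, _ in res], [r for _, r in res])
print(f"HR = {math.exp(beta_hat):.2f}; correlation of residuals with time = {rho:.2f}")
```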
Kaplan–Meier survival (nonscreened) curves for time to completion
of any colorectal cancer screening for year 1 or year 2.
Back to our example
• Authors did their job: they noticed the
potential problem and checked the
assumption.
Time-varying adjusted HRs for time to completion of any colorectal cancer
screening in years 1 and 2: the solid curves are the estimated HR curves and
the dashed curves represent the 95% CIs.
Implications?
• What do those estimated HRs mean?
• Are they valid?
• Are the p-values valid?
• How might the analysis be changed to provide
valid inferences?
– Time-varying approaches
– You can allow for associations to vary over time.
– Surprised this was not done in a formal way.
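One simple time-varying approach, sketched in Python: let the log hazard ratio take one value before a cutpoint and another after it, β(t) = β_early for t < cut and β_late for t ≥ cut. Each event's partial-likelihood factor depends only on β at that event's time, so the two pieces can be maximized separately: each uses only the events in its window, but with the usual risk sets. The cutpoint, names and data are hypothetical, and the sketch assumes each window contains events from both groups with mixed risk sets (otherwise the estimate is infinite or undefined).

```python
import math


def window_score_info(beta, times, events, x, lo, hi):
    """Cox score/information using only events with lo <= t < hi.
    Risk sets are unchanged: everyone with observed time >= the event time."""
    score = info = 0.0
    for ti, ei, xi in zip(times, events, x):
        if ei != 1 or not (lo <= ti < hi):
            continue
        a0 = a1 = a2 = 0.0
        for tj, xj in zip(times, x):
            if tj >= ti:
                r = math.exp(beta * xj)
                a0 += r
                a1 += r * xj
                a2 += r * xj * xj
        mean = a1 / a0
        score += xi - mean
        info += a2 / a0 - mean * mean
    return score, info


def fit_window(times, events, x, lo, hi, tol=1e-10, max_iter=50):
    """Newton-Raphson for the log HR that applies to events in [lo, hi)."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = window_score_info(beta, times, events, x, lo, hi)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


def piecewise_hr(times, events, x, cut):
    """HRs before and after `cut` from a Cox model whose log HR may change at `cut`."""
    early = fit_window(times, events, x, 0.0, cut)
    late = fit_window(times, events, x, cut, math.inf)
    return math.exp(early), math.exp(late)


# Hypothetical data (x = 1: intervention, x = 0: usual care), cutpoint at t = 6
times = [1, 2, 4, 7, 9, 12, 3, 5, 6, 8, 10, 11]
events = [1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1]
x = [1] * 6 + [0] * 6
hr_early, hr_late = piecewise_hr(times, events, x, cut=6)
print(f"HR before t=6: {hr_early:.2f}; HR from t=6 on: {hr_late:.2f}")
```

If the two HRs differ markedly, a single HR averaged over follow-up would be a misleading summary, which is the concern raised above.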
Some take-home points
• Survival analysis methods are a critically important set
of tools in cancer research
• Survival methods are widely available and tools specific
for survival analysis should be used for analysis of
censored data.
• Cox regression is omnipresent, yet the proportionality
assumption is often ignored, leading to an invalid
summary statistic (HR).
• Cox regression and log-rank tests are just the ‘tip of the
iceberg’. There are many other methods out there that
may be more appropriate, just less often utilized.