Additional Regression techniques
Scott Harris
October 2009
Learning outcomes
By the end of this session you should:
• be aware of 2 additional regression techniques:
– Cox Regression
– Logistic regression;
• know when these techniques are applicable;
• be able to interpret the results from these regression
techniques.
Contents
• Cox Regression
– Assumptions behind the model
– Fitting Cox regression models in SPSS
– Interpreting the model
– Testing the assumptions
• Log-log plot
• Plots of partial residuals against rank time
• Logistic Regression
– When to use it
– ‘How to’ in SPSS
– Interpreting the output
Cox regression
• Models time-to-event data in the presence of censored
cases.
• Allows the inclusion of predictor variables
(covariates). These can be categorical or continuous.
• Can be extended to allow for time-dependent
covariates (not covered here).
• Also known as the Cox proportional hazards model or
the Cox model.
Hazard functions
[Figure: Kaplan-Meier survival curves for Group A and Group B. Horizontal axis: Time (0 to 30). Vertical axis: Proportion: No event (0.00 to 1.00). Log rank test, p = 0.1191.]
Hazard rates & ratios
• The hazard rate is the probability that if the event in
question has not already occurred, it will occur in the
next time interval, divided by the length of that
interval. This time interval is made very short, so that
in effect the hazard rate represents an instantaneous
rate.
• The hazard ratio is an estimate of the ratio of the
hazard rate in the treated versus the control group.
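The definition above can be checked numerically. The sketch below assumes a constant hazard in each group (exponential survival), which is an illustration choice and not something the slides require. The rates 0.10 and 0.25 and the function names are hypothetical. As the interval `dt` shrinks, the conditional probability divided by `dt` settles on the instantaneous rate, and the ratio of the two rates is the hazard ratio.

```python
import math


def survival(t, rate):
    """Probability of no event by time t for a constant hazard `rate`."""
    return math.exp(-rate * t)


def approx_hazard(t, dt, rate):
    """P(event in [t, t + dt) | no event before t), divided by dt."""
    s_now = survival(t, rate)
    s_next = survival(t + dt, rate)
    conditional_prob = (s_now - s_next) / s_now
    return conditional_prob / dt


# Hypothetical constant hazards for a control and a treated group.
rate_control, rate_treated = 0.10, 0.25

for dt in (1.0, 0.1, 0.001):
    h_control = approx_hazard(5.0, dt, rate_control)
    h_treated = approx_hazard(5.0, dt, rate_treated)
    # The last column is the hazard ratio, treated versus control.
    print(dt, round(h_control, 4), round(h_treated, 4),
          round(h_treated / h_control, 4))
```

With a long interval the approximation understates the rate. With a very short one it recovers the rate that was fed in.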
Cox regression: PH assumption
• Assumption of Proportional Hazards: the ratio of the hazards
between groups is constant over time. The hazards themselves may
change over time, but not differently in each group.
• Can be graphically assessed by looking at the Log-Log
plot: If PH model is true then the curves should be
approximately parallel.
• Can also examine the residuals (Schoenfeld residuals):
If PH is true then the plot of the residuals should be
horizontal and close to 0.
SPSS – Cox regression
Analyze → Survival → Cox Regression…
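* Cox regression adjusted for age.
* Status(1) marks the event; Indicator(1) makes the first level of Group the reference.
* SAVE=PRESID saves the partial residuals; PRINT=CI(95) adds 95% CIs for Exp(B).
COXREG Time
  /STATUS=Status(1)
  /CONTRAST (Group)=Indicator(1)
  /METHOD=ENTER Age Group
  /SAVE=PRESID
  /PRINT=CI(95)
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20).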
Info: Cox regression in SPSS
1) From the menus select ‘Analyze’ → ‘Survival’ → ‘Cox Regression…’.
2) Put the variable containing the time into the ‘Time:’ box.
3) Put the categorical variable that indicates whether a case had the event of
interest or not into the ‘Status:’ box. Then click the ‘Define Event…’ button
and enter the single value or range of values that indicates that the event
occurred. Click ‘Continue’.
4) Add any other variables that you would like included in your model into the
‘Covariates:’ box.
5) If any of the variables that were included in the ‘Covariates:’ box are
categorical then click the ‘Categorical…’ button. Each of these variables then
needs to be moved to the ‘Categorical Covariates:’ box. In the ‘Change
Contrast’ box decide, for each variable, whether the reference category should
be the first or the last level and make any changes if appropriate. Click
‘Continue’.
6) Click the ‘Save…’ button and tick the ‘Partial Residuals’ option in the
‘Diagnostics’ box. Click ‘Continue’.
7) Click the ‘Options’ button and tick the ‘CI for exp(β):’ option in the ‘Model
Statistics’ box. Click ‘Continue’.
8) Finally click ‘OK’ to produce the test results or ‘Paste’ to add the syntax for this
into your syntax file.
SPSS – Cox regression: Output
Categorical Variable Codings (b)
Group (a)     Frequency   (1)
1=Group A     8           0
2=Group B     8           1
a. Indicator Parameter Coding
b. Category variable: Group
This table, in conjunction with how the contrast was set up, defines how you
should interpret the output for the categorical variables. Here the reference
category was set up as the first level, which sets Group A as the reference.
Variables in the Equation
                                                     95.0% CI for Exp(B)
         B       SE      Wald    df   Sig.   Exp(B)   Lower    Upper
Age      .576    .195    8.723   1    .003   1.779    1.214    2.607
Group    2.175   .961    5.120   1    .024   8.804    1.338    57.938
Age row: hazard ratio for each unit increase in Age, with CI and p value.
Group row: hazard ratio for being in Group B, relative to Group A (reference),
with CI and p value.
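The columns of the ‘Variables in the Equation’ table are linked by simple arithmetic, which the sketch below reproduces from the printed B and SE values. It assumes the usual Wald construction: Exp(B) is the exponential of B, the 95% limits are exp(B ± 1.96 × SE), and Wald is (B / SE) squared. The function name is hypothetical, and because the printed B and SE are rounded, the results match the SPSS output only approximately.

```python
import math


def hazard_ratio_summary(b, se, z=1.96):
    """Return Exp(B), its Wald 95% limits and the Wald statistic."""
    return {
        "exp_b": math.exp(b),           # hazard ratio
        "lower": math.exp(b - z * se),  # lower 95% limit
        "upper": math.exp(b + z * se),  # upper 95% limit
        "wald": (b / se) ** 2,          # Wald chi-square on 1 df
    }


# B and SE as printed in the output table above.
age = hazard_ratio_summary(0.576, 0.195)
group = hazard_ratio_summary(2.175, 0.961)

for name, row in (("Age", age), ("Group", group)):
    print(name, {key: round(value, 3) for key, value in row.items()})
```

Because the interval is symmetric on the log scale, it is lopsided on the hazard ratio scale, which is why the upper limit for Group is so far from the estimate.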
SPSS – Cox regression
           Hazard ratio (95% CI)    p value
Age        1.78 (1.21, 2.61)        0.003
Group B    8.80 (1.34, 57.94)       0.024
Here you can see that the hazard is 78% higher for each
additional year of age and that this effect is highly significant
(p=0.003). Having adjusted for age, however, there appears to be
a clear difference between the groups, with a hazard ratio for
Group B relative to Group A of 8.80 (95% CI: 1.34 to 57.94)
(p=0.024). Notice that this confidence interval is very wide and
that the lower limit suggests that the true hazard ratio may be as
low as 1.34.
SPSS – Cox regression
           Hazard ratio (95% CI)    p value
Group B    2.56 (0.74, 8.82)        0.136
If we take Age out of the model then the estimated effect of group is
reduced: the hazard ratio for Group B relative to Group A is now
2.56 (95% CI: 0.74 to 8.82), which is not statistically significant
at the 5% level (p=0.136).
Model selection for survival models is as important as it is for
other modelling procedures and needs to be thought about
carefully.
The PH assumption: Log-log plot
The log-log plot is one way to assess graphically
whether the assumption of proportional hazards was
reasonable. For the assumption to hold, the log-log
plot should show the separate lines as
approximately parallel to each other.
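A small numerical sketch shows why parallel lines are expected. Under proportional hazards the survival curve of one group is the other group's curve raised to the power of the hazard ratio, so on the log(-log S(t)) scale the two curves differ by the constant log(hazard ratio) at every time point. The baseline rate, hazard ratio and time points below are hypothetical values chosen for illustration, and the exponential baseline is an assumption made only to keep the example short.

```python
import math


def log_minus_log(survival_prob):
    """Transform a survival probability to the log(-log S) scale."""
    return math.log(-math.log(survival_prob))


baseline_rate = 0.05  # hypothetical constant hazard in Group A
hazard_ratio = 2.5    # hypothetical hazard ratio, Group B versus Group A
times = [1, 5, 10, 20, 30]

gaps = []
for t in times:
    s_a = math.exp(-baseline_rate * t)  # survival in Group A
    s_b = s_a ** hazard_ratio           # survival in Group B under PH
    gaps.append(log_minus_log(s_b) - log_minus_log(s_a))

# Every vertical gap equals log(hazard_ratio): the curves are parallel.
print([round(gap, 4) for gap in gaps], round(math.log(hazard_ratio), 4))
```

If the hazard ratio changed over time, the gaps would change with it and the plotted lines would converge, diverge or cross.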
SPSS – The PH assumption: Log-log plot
To produce an accurate log-log plot in SPSS you need to define
the categorical variable as a stratum (the ‘Strata:’ box).
* Log-log plot.
* Group is now a stratum, not a covariate; PLOT LML requests the log minus log plot.
COXREG Time
  /STATUS=Status(1)
  /STRATA=Group
  /METHOD=ENTER Age
  /PLOT LML
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20).
Info: Cox regression: Log-log plot in SPSS
1) Follow the information sheet on producing a Cox
regression, but stop after point 5.
2) To produce the Log-log plot we need to remove the
most important categorical variable from the
‘Covariates:’ box and put it into the ‘Strata:’ box
instead. This variable is quite often the groups that
we are looking to compare.
3) Once a variable is in the ‘Strata:’ box, click on the
‘Plots…’ button. Tick the option for the ‘Log minus
log’ plot in the ‘Plot Type’ box. Click ‘Continue’.
4) Finally click ‘OK’ to produce the plot or ‘Paste’ to
add the syntax for this into your syntax file.
SPSS – The PH assumption: Log-log plot
Warnings
Since coefficients did not converge, no further models will be fitted.
Not enough cases in each stratum: the dataset is too small.
SPSS – Cox regression: Aside
Aside: Strata
Fitting the group variable as a stratum instead of as a
covariate, with no other covariates in the model,
replicates the Kaplan-Meier plot if we ask for the
survival plot.
SPSS – The PH assumption: Residual plots
Plot each of the residuals against rank time.
If the PH assumption has not been violated then each of the
plots should:
– not show a clear trend over time (i.e. not drastically
increasing or decreasing);
– be centered close to 0.
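* Creating the ranks.
* RANK saves the ranks as RTime; ties share the mean of their ranks.
RANK VARIABLES=Time (A)
  /RANK
  /PRINT=YES
  /TIES=MEAN.
* Producing the scatter graphs.
* PR1_1 and PR2_1 are the partial residuals saved by COXREG.
GRAPH
  /SCATTERPLOT(BIVAR)=RTime WITH PR1_1
  /MISSING=LISTWISE.
GRAPH
  /SCATTERPLOT(BIVAR)=RTime WITH PR2_1
  /MISSING=LISTWISE.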
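For readers unsure what ranking with mean ties does, the sketch below reproduces the idea outside SPSS. It is an illustration only, not the SPSS implementation: values are ranked in ascending order and tied values share the mean of the ranks they occupy. The function name and the example times are hypothetical.

```python
def mean_ranks(values):
    """Rank values in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        # Ranks are 1-based, so the block covers ranks start+1 to end+1.
        shared_rank = (start + 1 + end + 1) / 2
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


# Hypothetical event times with one tie (the two 8s).
example_times = [5, 8, 8, 12, 3]
print(mean_ranks(example_times))
```

Plotting the residuals against these ranks rather than the raw times spaces the points evenly along the x axis, which makes a trend easier to see.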
Info: Cox regression: Residual plots in SPSS
1) Follow the information sheet on producing a Cox regression all the way
through until the end. This will save a new set of variables to the
dataset that contain the residuals (you will get 1 residual variable for
each covariate in the model and their names will start with PR).
2) We now need to produce a rank time variable. To do this we need to go
to ‘Transform’ → ‘Rank Cases’.
– Now put the time variable into the ‘Variable(s):’ box.
– Click ‘OK’ to produce the ranks or ‘Paste’ to add the syntax for
this into your syntax file.
3) Now we have the 2 elements to produce the scatter plots. To draw the
scatterplots we go to: ‘Graphs’ → ‘Scatter/Dot…’ then select ‘Simple
Scatter’ and click ‘Define’. Put the new rank time on the x axis and
each of the residual variables in turn on the y axis.
4) Finally click ‘OK’ to produce the plot or ‘Paste’ to add the syntax for
this into your syntax file.
5) You can now edit the plot to improve presentation (see Introduction
course notes). It is often useful to add a horizontal reference line at 0
to aid interpretation.
SPSS – The PH assumption: Residual plots
These plots don’t seem to indicate any obvious trend and are
generally centered close to zero, but we are dealing with a very
small example dataset here.
Logistic regression
• Logistic regression is used when the outcome variable is binary
(is categorical and has 2 levels).
• Allows the inclusion of predictor variables (covariates). These can
be categorical or continuous.
• The modeling is conducted on the log odds scale but the results
should be presented on the odds scale (see categorical notes).
• Can be extended to deal with outcomes with more than 2 levels.
These models are known as multinomial or ordinal regression
(not covered here).
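The link between the log odds scale and the odds scale can be shown with a short sketch. The two probabilities below are hypothetical and the function names are illustrative. The difference between two log odds plays the role of a model coefficient, and exponentiating it gives the odds ratio that would be reported.

```python
import math


def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)


def log_odds(p):
    """Convert a probability to the log odds (logit) scale."""
    return math.log(odds(p))


# Hypothetical probabilities of the outcome in two groups.
p_unexposed, p_exposed = 0.20, 0.40

# A coefficient is a difference on the log odds scale.
coefficient = log_odds(p_exposed) - log_odds(p_unexposed)

# Exponentiating moves back to the odds scale and gives the odds ratio.
odds_ratio = math.exp(coefficient)

print(round(coefficient, 3), round(odds_ratio, 3))
```

The odds ratio here is the ratio of the two odds (0.4 / 0.6 divided by 0.2 / 0.8), not the ratio of the two probabilities.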
SPSS – Logistic regression
Analyze → Regression → Binary Logistic…
‘Dependent:’ box: binary outcome variable.
‘Covariates:’ box: all other covariates.
SPSS – Logistic regression…
If you have any categorical variables then you need to use the
‘Categorical…’ option to set up how to deal with these.
ln_yesno is a binary yes/no variable so we move it into the
‘Categorical Covariates:’ box.
SPSS – Logistic regression…
Right click and select ‘Variable information’.
For each categorical variable you now need to set up which level
will be the reference category. Here ‘No’ is the first category (the
lowest code) and so we set this as the reference.
SPSS – Logistic regression…
Go into the options and tick the box for
confidence intervals for the odds ratios.
Info: Logistic Regression in SPSS
1) From the menus select ‘Analyze’ → ‘Regression’ → ‘Binary
Logistic…’.
2) Put the variable containing the binary outcome into the ‘Dependent:’
box.
3) Add all other variables that you would like included in your model into
the ‘Covariates:’ box.
4) If any of the variables that were included in the ‘Covariates:’ box are
categorical then click the ‘Categorical…’ button. Each of these
variables then needs to be moved to the ‘Categorical Covariates:’ box.
In the ‘Change Contrast’ box decide, for each variable, whether the
reference category should be the first or the last level and make any
changes if appropriate. Click ‘Continue’.
5) Click the ‘Options’ button and tick the ‘CI for exp(β):’ option in the
‘Statistics and Plots’ box. Click ‘Continue’.
6) Finally click ‘OK’ to produce the test results or ‘Paste’ to add the
syntax for this into your syntax file.
SPSS Logistic Regression: Output
• Information on the amount of data used in the analysis.
• Very important as this identifies the level of the binary outcome
that is being modelled. Here the higher level is 1, which was used to
indicate subjects who died within 5 years, and so this is what our
model will be looking at.
• Convergence information.
SPSS Logistic Regression: Output…
• P values.
• Odds ratios.
• 95% confidence intervals for the odds ratios.
Interpretation:
Having adjusted for lymph node involvement, each additional year of age multiplies the
odds of mortality within 5 years by a factor of 0.99 (95% CI 0.97 to 1.01), a slight
decrease, although this was not statistically significant (p=0.375).
Having adjusted for age, subjects with lymph node involvement have their odds of
mortality within 5 years increased by a factor of 2.65 (95% CI 1.49 to 4.72) compared to
those with no lymph node involvement. This effect was highly statistically significant
(p=0.001).
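Two quick checks help when reading odds ratios like those above: the percentage change in the odds that an odds ratio implies, and whether its 95% confidence interval excludes 1 (the value for no effect). The sketch below applies both to the figures quoted in the interpretation. The function names are hypothetical, and the interval check is only a rough companion to the p value, not a replacement for it.

```python
def percent_change_in_odds(odds_ratio):
    """Percentage change in the odds implied by an odds ratio."""
    return (odds_ratio - 1) * 100


def interval_excludes_one(lower, upper):
    """True if the confidence interval lies wholly above or below 1."""
    return lower > 1 or upper < 1


# Odds ratios and 95% limits quoted in the interpretation above.
results = {
    "Age (per year)": (0.99, 0.97, 1.01),
    "Lymph node involvement": (2.65, 1.49, 4.72),
}

summary = {}
for name, (odds_ratio, lower, upper) in results.items():
    summary[name] = (
        round(percent_change_in_odds(odds_ratio), 1),
        interval_excludes_one(lower, upper),
    )
    print(name, summary[name])
```

The interval for age spans 1, in line with p=0.375. The interval for lymph node involvement lies wholly above 1, in line with p=0.001.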
Summary
You should now:
• be aware of 2 additional regression techniques:
– Cox Regression
– Logistic regression;
• know when these techniques are applicable;
• be able to interpret the results from these regression
techniques.