
Analysis of Variance
(ANOVA)
Scott Harris
October 2009
Learning outcomes
By the end of this session you should be able to choose between,
perform (using SPSS) and interpret the results from:
– Analysis of Variance (ANOVA),
– Kruskal-Wallis test,
– Adjusted ANOVA (can also be called Univariate General Linear Model or multiple linear regression).
Contents
• Reminder of the example dataset.
• Comparison of more than 2 independent groups (P/NP)
– Test information.
– ‘How to’ in SPSS.
• Adjusting for additional variables
– ‘How to’ in SPSS.
– What to do when you add a continuous predictor.
– What to do when you add 2 or more categorical predictors.
– Interpreting the output.
Example dataset: Information
CISR (Clinical Interview Schedule: Revised) data:
– Measure of depression – the higher the score the worse the
depression.
– A CISR value of 12 or greater is used to indicate a clinical
case of depression.
– 3 groups of patients (each receiving a different form of
treatment: GP, CMHN and CMHN problem solving).
– Data collected at two time points (baseline and then a follow-up visit 6 months later).
– Calculated age at interview from the 2 dates.
Example CISR dataset
Comparing more than two independent groups
Analysis of variance (ANOVA) or Kruskal-Wallis test

Normally distributed data
Analysis of variance (ANOVA)
More than 2 groups?
When there are more than 2 groups that you wish to compare, t tests are no longer suitable and you should employ Analysis of Variance (ANOVA) techniques instead.
Analysis of Variance (ANOVA): Hypotheses
H0: μ1 = μ2 = μ3 = … = μn
H1: Not all the means are the same
The null hypothesis (H0) is that all of the group means are the same. The alternative hypothesis (H1) is that they are not all the same.
SPSS – One-way ANOVA
Analyze → Compare Means → One-Way ANOVA…
SPSS – One-way ANOVA…
* One-way ANOVA.
ONEWAY
  B0SCORE BY TMTGR
  /STATISTICS DESCRIPTIVES
  /MISSING ANALYSIS
  /POSTHOC = LSD BONFERRONI ALPHA(.05).
Info: One-way ANOVA in SPSS
1) From the menus select 'Analyze' → 'Compare Means' → 'One-Way ANOVA…'.
2) Put the variable that you want to test into the ‘Dependent List:’
box.
3) Put the categorical variable, that indicates which group the
values come from, into the ‘Factor:’ box.
4) Click the ‘Options’ button and then tick the boxes for
‘Descriptive’. Click ‘Continue’.
5) Click the ‘Post Hoc…’ button and then tick the boxes for the
post hoc tests that you would like. Click ‘Continue’.
6) Finally click ‘OK’ to produce the test results or ‘Paste’ to add
the syntax for this into your syntax file.
SPSS – One-way ANOVA: Output

Group summary statistics (descriptives option):

Descriptives: B0SCORE
Group     N    Mean     Std. Deviation  Std. Error  95% CI for Mean     Minimum  Maximum
GP         28  23.8571  11.01418        2.08148     19.5863 to 28.1280  6.00     52.00
CMHN       40  28.8750  10.27116        1.62401     25.5901 to 32.1599  7.00     48.00
CMHN PS    41  28.1463  10.76699        1.68152     24.7479 to 31.5448  4.00     57.00
Total     109  27.3119  10.75286        1.02994     25.2704 to 29.3534  4.00     57.00

ANOVA: B0SCORE
Source          Sum of Squares  df   Mean Square  F      Sig.
Between Groups  460.469         2    230.234      2.029  .137
Within Groups   12026.926       106  113.462
Total           12487.394       108

The 'Sig.' column gives the 2-sided p value, with an alternative hypothesis of non-equality of at least one group. It is non-significant (P=0.137), hence there is no significant evidence to suggest differences between the groups.
SPSS – One-way ANOVA: Output…

These methods use t tests to perform all pair-wise comparisons between group means. The LSD option makes no adjustment for multiple comparisons; the Bonferroni option gives p values adjusted for multiple comparisons.

Multiple Comparisons
Dependent Variable: B0SCORE

Method      (I) TMTGR  (J) TMTGR  Mean Difference (I-J)  Std. Error  Sig.   95% Confidence Interval
LSD         GP         CMHN       -5.01786               2.62464     .059   -10.2215 to .1857
            GP         CMHN PS    -4.28920               2.61143     .103   -9.4666 to .8882
            CMHN       GP          5.01786               2.62464     .059   -.1857 to 10.2215
            CMHN       CMHN PS     .72866                2.36725     .759   -3.9647 to 5.4220
            CMHN PS    GP          4.28920               2.61143     .103   -.8882 to 9.4666
            CMHN PS    CMHN       -.72866                2.36725     .759   -5.4220 to 3.9647
Bonferroni  GP         CMHN       -5.01786               2.62464     .176   -11.4025 to 1.3668
            GP         CMHN PS    -4.28920               2.61143     .310   -10.6417 to 2.0633
            CMHN       GP          5.01786               2.62464     .176   -1.3668 to 11.4025
            CMHN       CMHN PS     .72866                2.36725     1.000  -5.0298 to 6.4872
            CMHN PS    GP          4.28920               2.61143     .310   -2.0633 to 10.6417
            CMHN PS    CMHN       -.72866                2.36725     1.000  -6.4872 to 5.0298

The 'Mean Difference (I-J)' column gives the mean difference between Groups I and J; the 95% confidence interval is for that difference. The 'Sig.' column gives the p value for each comparison.
Non-normally distributed data
Kruskal-Wallis test
SPSS – Kruskal-Wallis test
Analyze → Nonparametric Tests → K Independent Samples…
* Kruskal-Wallis test.
NPAR TESTS
  /K-W=M6SCORE BY TMTGR(1 3)
  /MISSING ANALYSIS.
Info: Kruskal-Wallis test in SPSS
1) From the menus select 'Analyze' → 'Nonparametric Tests' → 'K Independent Samples…'.
2) Put the variable that you want to test into the ‘Test Variable
List:’ box.
3) Put the categorical variable, that indicates which group the
values come from, into the ‘Grouping Variable:’ box.
4) Click the ‘Define Range…’ box and then enter the numeric
codes for the minimum and maximum of the groups that you
want to compare. Click ‘Continue’.
5) Ensure that the ‘Kruskal-Wallis H’ option is ticked in the ‘Test
Type’ box.
6) Finally click ‘OK’ to produce the test results or ‘Paste’ to add
the syntax for this into your syntax file.
SPSS – Kruskal-Wallis test: Output

Ranks (observed mean ranks): M6SCORE
TMTGR     N    Mean Rank
GP         28  68.66
CMHN       40  48.24
CMHN PS    41  52.27
Total     109

Test Statistics (Kruskal-Wallis test; grouping variable: TMTGR)
Chi-Square   7.388
df           2
Asymp. Sig.  .025

The 'Asymp. Sig.' value is the 2-sided p value, with an alternative hypothesis of non-equality of the groups. It is significant (P=0.025), hence there is significant evidence that at least one of the groups is different.

If you want to find out where the differences are then you need to conduct a series of pair-wise Mann-Whitney U tests.
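These follow-up tests can be run as separate NPAR TESTS commands, one per pair of group codes (a sketch; the codes 1–3 follow the TMTGR(1 3) coding used in the Kruskal-Wallis syntax above):

```spss
* Pair-wise Mann-Whitney U tests to locate where the differences lie.
NPAR TESTS /M-W=M6SCORE BY TMTGR(1 2) /MISSING ANALYSIS.
NPAR TESTS /M-W=M6SCORE BY TMTGR(1 3) /MISSING ANALYSIS.
NPAR TESTS /M-W=M6SCORE BY TMTGR(2 3) /MISSING ANALYSIS.
```

Note that these pair-wise p values carry no multiple-testing adjustment, so a Bonferroni-style correction would need to be applied by hand.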
Practical Questions
Analysis of Variance
Questions 1 and 2
Practical Questions
From the course webpage download the file HbA1c.sav by
clicking the right mouse button on the file name and selecting
Save Target As.
The dataset is pre-labelled and contains data on blood sugar reduction for 245 patients divided into 3 groups.
1) Assuming that the outcome variable is normally distributed: Conduct a suitable statistical test to compare the finishing HbA1c level (HBA1C_2) between all of the 3 groups. What are your conclusions from this test if you don't worry about multiple testing? What about if you do, using a Bonferroni correction?

2) Assuming that the outcome variable is NOT normally distributed: Conduct a suitable statistical test to compare the finishing HbA1c level (HBA1C_2) between all of the 3 groups. What are your conclusions from this test?
Practical Solutions
1) The ANOVA table shows that at least one of the groups is significantly
different from the others (p=0.010).
Descriptives: HB1AC_2
Group     N    Mean    Std. Deviation  Std. Error  95% CI for Mean   Minimum  Maximum
Active A   83  5.7208  1.79766         .19732      5.3283 to 6.1133  1.36     10.36
Active B   80  6.0132  1.60600         .17956      5.6558 to 6.3706  2.31     9.88
Placebo    82  6.5105  1.60229         .17694      6.1584 to 6.8625  3.44     11.21
Total     245  6.0806  1.69735         .10844      5.8670 to 6.2942  1.36     11.21

ANOVA: HB1AC_2
Source          Sum of Squares  df   Mean Square  F      Sig.
Between Groups  26.261          2    13.131       4.696  .010
Within Groups   676.706         242  2.796
Total           702.967         244
Practical Solutions
Looking at the individual LSD and Bonferroni corrected pair-wise comparisons, it can be seen that only one contrast shows a significant difference at the 5% level: Active A vs. Placebo, with the Placebo levels higher.
Multiple Comparisons
Dependent Variable: HB1AC_2

Method      (I) Treatment group  (J) Treatment group  Mean Difference (I-J)  Std. Error  Sig.  95% Confidence Interval
LSD         Active A             Active B             -.29239                .26200      .266  -.8085 to .2237
            Active A             Placebo              -.78968*               .26037      .003  -1.3026 to -.2768
            Active B             Active A              .29239                .26200      .266  -.2237 to .8085
            Active B             Placebo              -.49728                .26278      .060  -1.0149 to .0204
            Placebo              Active A              .78968*               .26037      .003  .2768 to 1.3026
            Placebo              Active B              .49728                .26278      .060  -.0204 to 1.0149
Bonferroni  Active A             Active B             -.29239                .26200      .797  -.9240 to .3392
            Active A             Placebo              -.78968*               .26037      .008  -1.4174 to -.1620
            Active B             Active A              .29239                .26200      .797  -.3392 to .9240
            Active B             Placebo              -.49728                .26278      .179  -1.1308 to .1362
            Placebo              Active A              .78968*               .26037      .008  .1620 to 1.4174
            Placebo              Active B              .49728                .26278      .179  -.1362 to 1.1308

*. The mean difference is significant at the .05 level.
Practical Solutions
2) For the non-parametric test, again, there is only a p value to report from the test (although the group medians could be reported from elsewhere; the pair-wise comparisons need to be done as separate Mann-Whitney U tests, as shown in Analysing Continuous Data, and CIs for these differences could be calculated from CIA).
Ranks: HB1AC_2
Treatment group  N    Mean Rank
Active A          83  108.99
Active B          80  119.30
Placebo           82  140.79
Total            245

Test Statistics (Kruskal-Wallis test; grouping variable: Treatment group)
Chi-Square   8.631
df           2
Asymp. Sig.  .013

The Kruskal-Wallis test shows that at least one of the groups is significantly different from the others (p=0.013).
Comparing groups and
adjusting for other variables
Adjusted ANOVA
Adjusted ANOVA
Sometimes you wish to look at a relationship that is more
complicated than one continuous outcome with one categorical
group ‘predictor’.
Adjusted ANOVA allows for the addition of other covariates
(predictor variables). These can be either categorical, continuous
or a combination of both.
The next command in SPSS is one of the most powerful. SPSS
calls it a Univariate General Linear Model (GLM). It can replicate
one-way ANOVA and Linear regression.
It is also equivalent to multiple regression but with a bit more
flexibility.
Example 1
Replicating the one-way ANOVA
SPSS – Adjusted ANOVA
Analyze → General Linear Model → Univariate…
The dialog takes the outcome variable ('Dependent Variable:'), categorical predictor variables ('Fixed Factor(s):') and continuous predictor variables ('Covariate(s):').
SPSS – Adjusted ANOVA…
The same additional options can be set as for the one-way ANOVA, with post hoc pair-wise comparisons and simple descriptive statistics available. Pair-wise comparisons can be produced for multiple categorical variables.
Info: Adjusted ANOVA in SPSS (no continuous covariates)
1) From the menus select 'Analyze' → 'General Linear Model' → 'Univariate…'.
2) Put the variable that you want to test into the ‘Dependent
Variable:’ box.
3) Put any categorical variables, that indicate which group the
values come from or some other category, into the ‘Fixed
Factor(s):’ box.
4) Click the ‘Options’ button and then tick the boxes for
‘Descriptive statistics’. Click ‘Continue’.
5) Click the ‘Post Hoc…’ button and then move over the
categorical variable(s) that you would like the pairwise
comparisons for. Then tick the boxes for the post hoc tests that
you would like. Click ‘Continue’.
6) Finally click ‘OK’ to produce the test results or ‘Paste’ to add
the syntax for this into your syntax file.
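Clicking 'Paste' at step 6 produces UNIANOVA command syntax along these lines (a sketch using the lecture's B0SCORE and TMTGR variables; the exact subcommands depend on the options and post hoc tests you ticked):

```spss
* Adjusted ANOVA (Univariate GLM) replicating the one-way ANOVA.
UNIANOVA B0SCORE BY TMTGR
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /POSTHOC=TMTGR(LSD BONFERRONI)
  /PRINT=DESCRIPTIVE
  /DESIGN=TMTGR.
```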
SPSS – Adjusted ANOVA: Output
This is the same p value as from the one-way ANOVA and it is interpreted in the same way. Notice how the row uses the variable name (important for later).
The same post-hoc pair-wise results are produced as before.
Example 2
Adjusting for continuous and
categorical covariates
SPSS – Adjusted ANOVA
Analyze → General Linear Model → Univariate…
Here there are 2 categorical predictor variables in the 'Fixed Factor(s):' box and 1 continuous predictor variable in the 'Covariate(s):' box.
SPSS – Adjusted ANOVA…
As soon as we include a continuous covariate, the Post Hoc option is no
longer available and we need to use the ‘Contrasts…’ option which isn’t
quite as powerful.
SPSS – Adjusted ANOVA…
Select the category that you want the contrast for, and then you can select the type of contrast and the reference level.
'Simple' is the standard contrast (simple differences between levels) and the reference category is the level that all other levels of the categorical variable are compared against.
Info: Adjusted ANOVA in SPSS (inc. continuous covariates)
1) From the menus select 'Analyze' → 'General Linear Model' → 'Univariate…'.
2) Put the variable that you want to test into the 'Dependent Variable:' box.
3) Put any categorical variables, that indicate which group the values come from or some other category, into the 'Fixed Factor(s):' box.
4) Put any continuous variables into the 'Covariate(s):' box.
5) Click the 'Contrasts…' button and set up any contrasts that you want for any categorical variables. You need to select the variable, then choose the type of contrast (generally you use 'Simple'). Next you need to select the reference level. This can be either first or last and it will dictate the level of the category variable that all other levels will be compared against (first will compare all other levels against the first: 2nd-1st, 3rd-1st etc.), then click 'Change'. When you are finished click 'Continue'.
6) Click the 'Options' button and then tick the box for 'Descriptive statistics'. Click 'Continue'.
7) Finally click 'OK' to produce the test results or 'Paste' to add the syntax for this into your syntax file.
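Pasting these choices gives syntax roughly as follows (a sketch using the AgeInt covariate calculated earlier for the example dataset; the /CONTRAST subcommand takes the place of the unavailable post hoc tests):

```spss
* Adjusted ANOVA with a continuous covariate: post hoc tests are unavailable,
* so a 'Simple' contrast with the first level as the reference is requested.
UNIANOVA B0SCORE BY TMTGR WITH AgeInt
  /METHOD=SSTYPE(3)
  /CONTRAST(TMTGR)=SIMPLE(1)
  /PRINT=DESCRIPTIVE
  /DESIGN=TMTGR AgeInt.
```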
SPSS – Adjusted ANOVA…
If you have more than 1 variable in the 'Fixed Factor(s)' box then you need to go into the 'Model…' options.
SPSS – Adjusted ANOVA…
The default model is
‘Full factorial’. This
will include all
possible interactions
between factors.
Generally we want to
consider only main
effects (at least at
the start). To do this
select ‘Custom’ and
then set ‘Type:’ to
‘Main effects’ and
move all ‘Factors &
Covariates’ into the
‘Model:’ box.
Info: Adjusted ANOVA in SPSS (2+ categorical covariates)
1) From the menus select 'Analyze' → 'General Linear Model' → 'Univariate…'.
2) Put the variable that you want to test into the 'Dependent Variable:' box.
3) Put the 2 or more categorical variables, that indicate which group the values come from or some other category, into the 'Fixed Factor(s):' box.
4) Click the 'Model…' button and then select 'Custom…'. Change the 'Type:' to 'Main effects' and then move all 'Factors & Covariates' into the 'Model:' box.
5) Put any continuous variables into the 'Covariate(s):' box.
6) Set up either the 'Post Hoc…' or 'Contrasts…' options depending on whether there are any continuous covariates or not (see previous information slides).
7) Click the 'Options' button and then tick the box for 'Descriptive statistics'. Click 'Continue'.
8) Finally click 'OK' to produce the test results or 'Paste' to add the syntax for this into your syntax file.
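For the adjusted model discussed next (TMTGR and SEX as factors, AgeInt as a covariate), the pasted main-effects-only syntax would look roughly like this (a sketch; the /DESIGN line listing only the three main effects is what the 'Custom'/'Main effects' model setting produces, with no interaction terms):

```spss
* Main-effects-only adjusted ANOVA: two factors plus a continuous covariate.
UNIANOVA B0SCORE BY TMTGR SEX WITH AgeInt
  /METHOD=SSTYPE(3)
  /CONTRAST(TMTGR)=SIMPLE(1)
  /PRINT=DESCRIPTIVE
  /DESIGN=TMTGR SEX AgeInt.
```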
SPSS – Adjusted ANOVA: Output
Descriptive statistics separated by all
combinations of the factor variables.
This is now the p-value for the effect of
TMTGR having adjusted for SEX and
AgeInt. So having taken into account
variability due to SEX and AgeInt there is
no statistically significant difference
between the treatment groups (p=0.121).
Similar statements can be
made regarding the other
variables in the model, i.e.
having adjusted for TMTGR
and AgeInt there is no
statistically significant
difference between the 2 sexes
(p=0.261).
SPSS – Adjusted ANOVA: Output…
The contrast results (interpretation is the same as the previous slide):
The TMTGR variable had 3 levels here:
1 – GP
2 – CMHN
3 – CMHN PS
By selecting the first level of the factor as the reference category the contrast will produce:
CMHN – GP (2-1)
CMHN PS – GP (3-1)
For each contrast the output gives the difference (e.g. between CMHN PS and GP), the p-value for that difference and its 95% CI.
Practical Questions
Analysis of Variance
Question 3
Practical Questions
3) Using an Adjusted ANOVA with the final HbA1c level (HBA1C_2) as the outcome:
i. Replicate the model from question 1.
ii. Add the baseline level of HbA1c (HBA1C_1) in as a covariate. How does this affect the results?
iii. Add Gender to the model from part (ii). Look at just the main effects of the variables rather than any interactions. Does this change your results? Do you think Gender should be in the model?
Practical Solutions
3) i. By adding HBA1C_2 as the dependent variable and GROUP as a Fixed factor we can replicate the one-way ANOVA.
Practical Solutions
The same multiple comparisons:
Practical Solutions
3) ii. By adding HBA1C_1 to the model, GROUP has become more significant. We are explaining an additional amount of variability, hence increasing the precision.
Practical Solutions
We need to use the contrasts option when a continuous covariate is added to the model. To see the remaining contrast we need to re-run with a different reference category.
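The re-run only changes the reference category on the contrast subcommand; pasted syntax would look roughly like this (a sketch using the practical's GROUP, HBA1C_1 and HBA1C_2 variables):

```spss
* Contrasts with the first level as the reference category; omitting the (1)
* uses the default reference, which is the last level of GROUP.
UNIANOVA HBA1C_2 BY GROUP WITH HBA1C_1
  /METHOD=SSTYPE(3)
  /CONTRAST(GROUP)=SIMPLE(1)
  /PRINT=DESCRIPTIVE
  /DESIGN=GROUP HBA1C_1.
```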
Practical Solutions
3) iii. Adding in another categorical covariate means we need to go into the model options or we will get interaction terms fitted as default.
Practical Solutions
The size of the effects alters slightly but the conclusion remains the same. Although Gender is not statistically significant it may still be important in the model. We can include terms if they are significant in our sample, if they are key variables that have been shown to be important in the literature, or if we want to test them for differences.
Summary
You should now be able to choose between, perform
(using SPSS) and interpret the results from:
– Comparing three or more independent groups:
• Parametric: Analysis of Variance (ANOVA)
• Non-parametric: Kruskal-Wallis test.
– Comparing independent groups and adjusting
for other variables:
• Parametric: Adjusted ANOVA (can also be called Univariate GLM or multiple linear regression).
References
Parametric
• Practical Statistics for Medical Research, D Altman: Chapter 9.
• Medical Statistics, B Kirkwood, J Sterne: Chapters 7 & 9.
• An Introduction to Medical Statistics, M Bland: Chapter 10.
• Statistics for the Terrified: Testing for differences between groups.
Non-parametric
• Practical Statistics for Medical Research, D Altman: Chapter 9.
• Medical Statistics, B Kirkwood, J Sterne: Chapter 30.
• An Introduction to Medical Statistics, M Bland: Chapter 12.
• Statistics for the Terrified: Testing for differences between groups.
• Practical Nonparametric Statistics (3rd ed.), W.J. Conover.