Transcript Analysis of Covariance - University of Windsor
Analysis of Covariance
46-512: Statistics for Graduate Study in Psychology 1
Learning Outcomes
What is an ANCOVA?
How does it relate to what we have done already?
When would we use it?
What are the issues & assumptions?
What are some limitations and alternatives?
Experiment: 3 instructional methods
Subj. #       1    2    3    4    5    6   10   11   12    7    8    9    n    Mean   Std. Dev.   Pearson r
Group 1  X   98  102  104  103  112  113  118  120  115  106  112  122   12  110.42    7.73        0.87
         Y   60   63   66   69   72   75   78   80   67   70   74   76   12   70.83    6.12
Group 2  X  104  109  104  117  120  113  117  126  113  109  125  118   12  114.58    7.28        0.72
         Y   62   63   67   71   77   79   82   84   64   68   72   77   12   72.17    7.57
Group 3  X  102  117  108  117  105  116  111  120  107  104  128  119   12  112.83    7.87        0.45
         Y   65   68   72   76   78   80   82   84   75   77   79   81   12   76.42    5.68
First, let’s run it as an MRA…
Treat Group 3 as control: DC1 identifies Group 1, DC2 identifies Group 2.
compute dc1=0.
compute dc2=0.
if (gpid=1) dc1=1.
if (gpid=2) dc2=1.
execute.
REGRESSION
  /DESCRIPTIVES MEAN STDDEV CORR SIG N
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS CI R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT y
  /METHOD=ENTER dc1 dc2 .
Result…
ANOVA(b)
Model 1      Sum of Squares   df   Mean Square   F       Sig.
Regression         204.056     2       102.028   2.411   .105(a)
Residual          1396.250    33        42.311
Total             1600.306    35
a. Predictors: (Constant), dc2, dc1
b. Dependent Variable: y
R² = 204.056 / 1600.306 = .128
Coefficients(a)
Model 1      B        Std. Error   Beta    t        Sig.   95% CI Lower   95% CI Upper
(Constant)   76.417   1.878                40.696   .000    72.596         80.237
dc1          -5.583   2.656        -.395   -2.103   .043   -10.986          -.181
dc2          -4.250   2.656        -.300   -1.600   .119    -9.653          1.153
a. Dependent Variable: y
Which agrees with GLM
Tests of Between-Subjects Effects
Dependent Variable: y
Source            Type III Sum of Squares   df   Mean Square   F          Sig.
Corrected Model        204.056(a)            2       102.028      2.411   .105
Intercept           192574.694               1    192574.694   4551.452   .000
gpid                   204.056               2       102.028      2.411   .105
Error                 1396.250              33        42.311
Total               194175.000              36
Corrected Total       1600.306              35
a. R Squared = .128 (Adjusted R Squared = .075)
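The agreement between the dummy-coded regression and the one-way GLM can be checked by hand. The following is a minimal Python sketch, not the SPSS procedure used in these slides; it assumes the Y scores from the data table and uses illustrative variable names (b0, b_dc1, b_dc2). With Group 3 as the reference category, the constant is the Group 3 mean and each dummy coefficient is that group's mean minus the Group 3 mean.

```python
from statistics import mean

# Y scores copied from the data table (Groups 1-3).
y = {
    1: [60, 63, 66, 69, 72, 75, 78, 80, 67, 70, 74, 76],
    2: [62, 63, 67, 71, 77, 79, 82, 84, 64, 68, 72, 77],
    3: [65, 68, 72, 76, 78, 80, 82, 84, 75, 77, 79, 81],
}

group_means = {g: mean(scores) for g, scores in y.items()}
all_y = [s for scores in y.values() for s in scores]
grand_mean = mean(all_y)

# Least-squares solution for dummy coding with Group 3 as reference.
b0 = group_means[3]                      # constant = reference-group mean
b_dc1 = group_means[1] - group_means[3]  # Group 1 minus Group 3
b_dc2 = group_means[2] - group_means[3]  # Group 2 minus Group 3

# One-way ANOVA partition of the same data.
k, n_total = len(y), len(all_y)
ss_between = sum(len(s) * (group_means[g] - grand_mean) ** 2 for g, s in y.items())
ss_total = sum((s - grand_mean) ** 2 for s in all_y)
ss_within = ss_total - ss_between
r_squared = ss_between / ss_total
f_ratio = (ss_between / (k - 1)) / (ss_within / (n_total - k))

print(round(b0, 3), round(b_dc1, 3), round(b_dc2, 3))
print(round(ss_between, 3), round(ss_within, 3), round(r_squared, 3), round(f_ratio, 3))
```

The printed values can be compared with the B column of the regression output and with the gpid row of the GLM table.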
Back to MRA
Enter our continuous variable (IQ)
Sans interaction term for the time being.
ANOVA(b)
Model 1      Sum of Squares   df   Mean Square   F        Sig.
Regression         843.542     3       281.181   11.890   .000(a)
Residual           756.764    32        23.649
Total             1600.306    35
a. Predictors: (Constant), x, dc2, dc1
b. Dependent Variable: y
R² = .527
Coefficients(a)
Model 1      B        Std. Error   Beta    t        Sig.   95% CI Lower   95% CI Upper
(Constant)   11.324   12.596                 .899   .375   -14.334         36.981
dc1          -4.189    2.003       -.296   -2.091   .045    -8.270          -.109
dc2          -5.260    1.995       -.372   -2.637   .013    -9.323         -1.196
x              .577     .111        .649    5.200   .000      .351           .803
a. Dependent Variable: y
What have we done?
Analysis of Covariance
What does it tell us?
In general, why would we use this technique?
1)
2)
3)
Examples of different applications
Elimination of systematic bias
  The relationship between questionnaire responses and business performance, controlling for pre-existing differences in business performance.
Reduce error variance
  In a random-assignment experiment, looking at vigilance and using age as a covariate.
Step-down analysis
  Studying the effects of an educational intervention on performance & self-esteem.
Effects & Extensions
Types of Effects
  Significance of covariate(s)
  Main effects
  Interactions among factors
  Interactions between factors and covariate(s) = bad news.
Extensions
  Can have multiple covariates
  Factorial designs
  Mixed randomized-by-repeated designs
  Within-subjects designs
Back to our example (as one-way)
Tests of Between-Subjects Effects
Dependent Variable: y
Source            Type III Sum of Squares   df   Mean Square   F          Sig.
Corrected Model        204.056(a)            2       102.028      2.411   .105
Intercept           192574.694               1    192574.694   4551.452   .000
gpid                   204.056               2       102.028      2.411   .105
Error                 1396.250              33        42.311
Total               194175.000              36
Corrected Total       1600.306              35
a. R Squared = .128 (Adjusted R Squared = .075)
Run through GLM as ANCOVA
Tests of Between-Subjects Effects
Dependent Variable: y
Source            Type III Sum of Squares   df   Mean Square   F        Sig.
Corrected Model        843.542(a)            3       281.181   11.890   .000
Intercept               10.082               1        10.082     .426   .518
x                      639.486               1       639.486   27.041   .000
gpid                   185.333               2        92.666    3.918   .030
Error                  756.764              32        23.649
Total               194175.000              36
Corrected Total       1600.306              35
a. R Squared = .527 (Adjusted R Squared = .483)
Why is GPID now significant?
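One way to explore that question is to rebuild the ANCOVA sums of squares by hand and watch what happens to the error term. The following is a minimal Python sketch using the classic adjusted sums-of-squares computation, assuming the raw scores from the data table; it is not the SPSS GLM procedure itself, and the helper names ss and sp are illustrative.

```python
# Raw scores copied from the data table: x = covariate (IQ), y = outcome.
x = {
    1: [98, 102, 104, 103, 112, 113, 118, 120, 115, 106, 112, 122],
    2: [104, 109, 104, 117, 120, 113, 117, 126, 113, 109, 125, 118],
    3: [102, 117, 108, 117, 105, 116, 111, 120, 107, 104, 128, 119],
}
y = {
    1: [60, 63, 66, 69, 72, 75, 78, 80, 67, 70, 74, 76],
    2: [62, 63, 67, 71, 77, 79, 82, 84, 64, 68, 72, 77],
    3: [65, 68, 72, 76, 78, 80, 82, 84, 75, 77, 79, 81],
}

def ss(a):
    """Sum of squared deviations from the mean."""
    m = sum(a) / len(a)
    return sum((v - m) ** 2 for v in a)

def sp(a, b):
    """Sum of cross-products of deviations from the means."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b))

all_x = [v for g in x for v in x[g]]
all_y = [v for g in y for v in y[g]]
k, n_total = len(y), len(all_y)

# Within-group (pooled) and total sums of squares and cross-products.
ss_w_x = sum(ss(x[g]) for g in x)
ss_w_y = sum(ss(y[g]) for g in y)
sp_w = sum(sp(x[g], y[g]) for g in x)
ss_t_x, ss_t_y, sp_t = ss(all_x), ss(all_y), sp(all_x, all_y)

b_w = sp_w / ss_w_x                           # pooled within-group slope
ss_error_adj = ss_w_y - sp_w ** 2 / ss_w_x    # error SS after removing x
ss_total_adj = ss_t_y - sp_t ** 2 / ss_t_x
ss_group_adj = ss_total_adj - ss_error_adj    # adjusted group effect

ms_error_anova = ss_w_y / (n_total - k)            # error term without x
ms_error_ancova = ss_error_adj / (n_total - k - 1) # error term with x
f_group = (ss_group_adj / (k - 1)) / ms_error_ancova

print(round(b_w, 3), round(ss_group_adj, 3), round(ss_error_adj, 3))
print(round(ms_error_anova, 3), round(ms_error_ancova, 3), round(f_group, 3))
```

Comparing ms_error_anova with ms_error_ancova shows how much of the within-group variability in y the covariate absorbs, while the group sum of squares changes comparatively little.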
Means and Adjusted Means
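Group     Mean for X   Unadjusted Mean for Y   Adjusted Mean for Y
Group 1   110.417      70.833                  72.099
Group 2   114.583      72.167                  71.029
Group 3   112.833      76.417                  76.288
Total     112.611      73.139                  73.139
Adjusted means calculated as…
$\bar{Y}'_j = \bar{Y}_j - b\,(\bar{X}_j - \bar{X})$
where $b$ is the slope for x from the regression (.577) and $\bar{X}$ is the grand mean of X.
For Group 1… $70.833 - .577\,(110.417 - 112.611) = 72.099$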
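The adjustment can be reproduced for all three groups from the summary values alone. This is a minimal Python sketch assuming the rounded slope (.577) and the means reported in the tables; the dictionary names are illustrative. Because the slope is rounded, the third decimal may differ slightly from the SPSS values.

```python
# Summary values copied from the slides.
b = 0.577                                        # slope for x (pooled within groups)
grand_mean_x = 112.611
x_means = {1: 110.417, 2: 114.583, 3: 112.833}
y_means = {1: 70.833, 2: 72.167, 3: 76.417}

# Adjusted mean = unadjusted mean minus slope times the group's
# deviation from the grand mean on the covariate.
adjusted = {
    g: y_means[g] - b * (x_means[g] - grand_mean_x)
    for g in y_means
}

for g in sorted(adjusted):
    print(f"Group {g}: unadjusted {y_means[g]:.3f}, adjusted {adjusted[g]:.3f}")
```

Groups below the grand mean on X (Group 1) are adjusted upward; groups above it (Groups 2 and 3) are adjusted downward.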
Parameter Estimates from SPSS
Parameter Estimates
Dependent Variable: y
Parameter     B        Std. Error   t        Sig.   95% CI Lower   95% CI Upper
Intercept     11.324   12.596         .899   .375   -14.334         36.981
x               .577     .111        5.200   .000      .351           .803
[gpid=1.00]   -4.189    2.003       -2.091   .045    -8.270          -.109
[gpid=2.00]   -5.260    1.995       -2.637   .013    -9.323         -1.196
[gpid=3.00]    0(a)     .            .       .        .              .
a. This parameter is set to zero because it is redundant.
Compare to those from our MRA
Post-Hocs from SPSS
Pairwise Comparisons
Dependent Variable: y
(I) gpid   (J) gpid   Mean Difference (I-J)   Std. Error   Sig.(a)   95% CI Lower(a)   95% CI Upper(a)
1.00       2.00        2.472                  1.990        .531       -2.541             7.485
1.00       3.00       -4.092                  1.897        .111       -8.871              .687
2.00       1.00       -2.472                  1.990        .531       -7.485             2.541
2.00       3.00       -6.564*                 1.921        .005      -11.404            -1.724
3.00       1.00        4.092                  1.897        .111        -.687             8.871
3.00       2.00        6.564*                 1.921        .005        1.724            11.404
Based on estimated marginal means
*. The mean difference is significant at the .05 level.
a. Adjustment for multiple comparisons: Sidak.
Bryant-Paulson Post Hoc
$BP = \dfrac{\bar{y}^*_i - \bar{y}^*_j}{\sqrt{MS^*_W\,[1 + MS_{B_X} / SS_{W_X}]\,/\,n}}$
$BP = \dfrac{5.584}{1.423} = 3.924$
BP crit = 3.55, Cell 3 is significantly higher than 1 & 2
Bryant-Paulson is an extension of Tukey’s Post-Hoc test, and more appropriate if X is random.
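The denominator of the Bryant-Paulson statistic can be rebuilt from the covariate scores. The following is a minimal Python sketch under stated assumptions: MS*W is taken as the ANCOVA error mean square (23.649) from the GLM table, the X scores come from the data table, and the numerator (5.584) is taken as given on the slide. Variable names are illustrative.

```python
from math import sqrt

# Covariate (X) scores copied from the data table.
x = {
    1: [98, 102, 104, 103, 112, 113, 118, 120, 115, 106, 112, 122],
    2: [104, 109, 104, 117, 120, 113, 117, 126, 113, 109, 125, 118],
    3: [102, 117, 108, 117, 105, 116, 111, 120, 107, 104, 128, 119],
}
ms_w_adj = 23.649   # error mean square from the ANCOVA table (assumed MS*W)
mean_diff = 5.584   # numerator as given on the slide
n = 12              # scores per group
k = len(x)

group_means = {g: sum(v) / len(v) for g, v in x.items()}
all_x = [s for v in x.values() for s in v]
grand_mean = sum(all_x) / len(all_x)

# Between-groups mean square and within-groups sum of squares on X.
ss_b_x = sum(len(v) * (group_means[g] - grand_mean) ** 2 for g, v in x.items())
ms_b_x = ss_b_x / (k - 1)
ss_w_x = sum((s - group_means[g]) ** 2 for g, v in x.items() for s in v)

denominator = sqrt(ms_w_adj * (1 + ms_b_x / ss_w_x) / n)
bp = mean_diff / denominator

print(round(ms_b_x, 3), round(ss_w_x, 3))
print(round(denominator, 3), round(bp, 3))
```

The resulting statistic is then compared with the tabled critical value (3.55 on the slide).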
ANCOVA & Intact Groups
Groups can still differ in unknown ways.
Question whether groups that are equivalent on the covariate ever exist – since ANCOVA adjusts for equivalence on the covariate.
Assumptions of linearity and homogeneity of regression slopes need to be satisfied.
Differential growth of subjects, i.e., is the difference due to treatment or to differential growth?
Measurement error can produce spurious results.
Assumptions of ANCOVA
Larger sample sizes (because of the regression of the DV on the CV)
Absence of multicollinearity and singularity
Normality of sampling distributions (of the means)
Homogeneity of variance
Linearity of the relationship between covariate and dependent variable
Homogeneity of regression
Reliability of covariates
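Homogeneity of regression can be examined on the example data by comparing a model with one pooled slope against a model with a separate slope per group. The following is a minimal Python sketch of that comparison, assuming the raw scores from the data table; it is a hand computation, not SPSS output, and the helper names ss and sp are illustrative.

```python
# Raw scores copied from the data table: x = covariate (IQ), y = outcome.
x = {
    1: [98, 102, 104, 103, 112, 113, 118, 120, 115, 106, 112, 122],
    2: [104, 109, 104, 117, 120, 113, 117, 126, 113, 109, 125, 118],
    3: [102, 117, 108, 117, 105, 116, 111, 120, 107, 104, 128, 119],
}
y = {
    1: [60, 63, 66, 69, 72, 75, 78, 80, 67, 70, 74, 76],
    2: [62, 63, 67, 71, 77, 79, 82, 84, 64, 68, 72, 77],
    3: [65, 68, 72, 76, 78, 80, 82, 84, 75, 77, 79, 81],
}

def ss(a):
    """Sum of squared deviations from the mean."""
    m = sum(a) / len(a)
    return sum((v - m) ** 2 for v in a)

def sp(a, b):
    """Sum of cross-products of deviations from the means."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b))

k = len(x)
n_total = sum(len(v) for v in y.values())

# Separate regression slope of y on x within each group.
slopes = {g: sp(x[g], y[g]) / ss(x[g]) for g in x}

# Residual SS with one pooled slope (the ANCOVA error term).
ss_w_x = sum(ss(x[g]) for g in x)
sp_w = sum(sp(x[g], y[g]) for g in x)
ss_res_pooled = sum(ss(y[g]) for g in y) - sp_w ** 2 / ss_w_x

# Residual SS when each group keeps its own slope.
ss_res_separate = sum(ss(y[g]) - sp(x[g], y[g]) ** 2 / ss(x[g]) for g in x)

# F test for slope heterogeneity: (k - 1) and (N - 2k) degrees of freedom.
df_slopes, df_error = k - 1, n_total - 2 * k
f_slopes = ((ss_res_pooled - ss_res_separate) / df_slopes) / (ss_res_separate / df_error)

print({g: round(b, 3) for g, b in slopes.items()})
print(round(ss_res_pooled, 3), round(ss_res_separate, 3), round(f_slopes, 3))
```

The F ratio is compared with the critical value for (k - 1, N - 2k) degrees of freedom; a large value would signal a factor-by-covariate interaction.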
Alternatives
In pre-post situations, using difference scores (assuming same metric)
  Controversial and carries some risk
Incorporating pre-scores into a RM ANOVA design.
Residualize DV and run an ANOVA on the residualized scores.
  Controversial, not a very popular approach
Blocking (rather than tackling!): assigning/matching people based on pre-scores or creating appropriate IV categories of intact groups.
Utilizing the CV as a factor in the experiment, if it lends itself well to categorization.
  This side-steps many issues, such as homogeneity of regression.
Johnson-Neyman technique
  See Stevens (1999) for an alternative
Things to consider about covariates
Number
Reliability
Pre-screening
Multicollinearity
Loss of df
More complicated designs
More than one covariate
Factorial designs
Repeated measures designs
For now, we will suspend discussion of more complicated designs, but revisit them when we cover MANOVA and MANCOVA.