Stat 512 Class 1 - Purdue University

Download Report

Transcript Stat 512 Class 1 - Purdue University

Topic 26: Analysis of
Covariance
Outline
• One-way analysis of covariance
– Data
– Model
– Inference
– Diagnostics and rememdies
• Multifactor analysis of covariance
Data for One-Way
ANCOVA
• Yij is the jth observation on the
response variable in the ith group
• Xij is the jth observation on the
covariate in the ith group
• i = 1, . . . , r levels (groups) of factor
• j = 1, . . . , ni observations for level i
KNNL Example (pg 927)
• Y is cases of crackers sold during
promotion period
• Factor is the type of promotion (r=3)
– Customers sample crackers in store
– Additional shelf space
– Special display shelves
• ni=5 different stores per type
• The covariate X is the number of cases
of crackers sold at the store in the
preceding period
Data
data a1;
infile 'c:\...\CH22TA01.DAT';
input cases last trt store;
proc print data=a1;
run;
Output
Obs
1
2
3
4
5
6
7
cases
38
39
36
45
33
43
38
last
21
26
22
28
19
34
26
trt
1
1
1
1
1
2
2
store
1
2
3
4
5
1
2
Output
Obs
8
9
10
11
12
13
14
15
cases
38
27
34
24
32
31
21
28
last
29
18
25
23
29
30
16
29
trt
2
2
2
3
3
3
3
3
store
3
4
5
1
2
3
4
5
Plot the data
title1 'Plot of the data';
symbol1 v='1' i=none c=black;
symbol2 v='2' i=none c=black;
symbol3 v='3' i=none c=black;
proc gplot data=a1;
plot cases*last=trt/frame;
run;
Background
• Covariates are sometimes called
concomitant variables
• Covariates should be related to the
response variable
• Covariates should not be affected by
the treatment variable (factor)
• Often they are some kind of baseline
or pretest value
Basic ideas
• A covariate can reduce the MSE,
thereby increasing power
• A covariate can adjust for differences
in characteristics of subjects in the
treatment groups
• We assume that the covariate will be
linearly related to the response and
that the relationship will be the same
for all levels of the factor. Similar to
comparing regression lines.
Cell Means Model for
one-way ancova
•
Yij  i   ( Xij  X.. )   ij
– the ij are iid N(0, σ2)
– Yij ~N(i   ( Xij  X.. ), σ2) and indep
• For each i, we have a simple linear
regression
• The slopes are the same
• The intercepts can be different
Plot of the data with lines
title1 'Plot of the data
with lines';
symbol1 v='1' i=rl c=black;
symbol2 v='2' i=rl c=black;
symbol3 v='3' i=rl c=black;
proc gplot data=a1;
plot cases*last=trt/frame;
run;
Parameters
• The parameters of the model are
– μi for i = 1 to r
–β
– σ2
Estimates
• We use multiple regression methods
to estimate the μi and β
• We use the residuals from the model
to estimate σ2
• The estimate is s2 (equal to the MSE)
Factor Effects Model for
one way anova
• Yij     i  ( X ij  X .. )   ij
– the ij are iid N(0, σ2)
• The usual constraints are  = 0
• Proc glm sets αr= 0
Interpretation of model
• Expected value of a Y with level i
and Xij=x is    i   ( X  X.. )
• Expected value of a Y with level i´
and Xij=x is    i    ( X  X.. )
• The difference is i - i´
• Note that this difference does not
depend on the value of x (due to
assumption of constant slopes)
Proc glm
proc glm data=a1;
class trt;
model cases=last trt;
run;
Model output
Source DF
MS
F
P
Model
3 202.60 57.78 .0001
Error 11
3.50
Total 14
Anova table output
Type I
Source DF SS MS
F
P
last
1 190 190 54.38 <.0001
trt
2 417 208 59.48 <.0001
Type III
Source DF SS MS
F
P
last
1 269 269 76.72 <.0001
trt
2 417 208 59.48 <.0001
Type I vs Type III
• Because X for covariate is not
orthogonal (like indep argument
before) to X’s for factor
– Order that the X are fit makes a
difference. Want to compare
means after adjusting for covariate
– General rule to use Type III SS
when Type I and III SS differ
Parameter Estimates
proc glm data=a1;
class trt;
model cases=last trt
/solution;
run;
Output
Par
Est
Int 4.37 B
last 0.89
trt1 12.9 B
trt2 7.9 B
trt3 0.0 B
SE
t
P
2.73 1.60 0.1381
0.10 8.76 <.0001
1.20 10.76 <.0001
1.18 6.65 <.0001
Common slope is 0.89
Interpretation
• Expected value of Y with level i of factor
A and X=x is    i   ( X  X.. )
• So ˆ  ˆi is the expected value of Y
when X is equal to the average covariate
value
• This is usually the level of X where the trt
means are calculated and compared
• Need to make sure this level of X is
reasonable for each level of the factor
LSMEANS
• The L(least)S(square) means can be
used to obtain these estimates
– All other categorical values are set at an
equal mix for all levels (I.e., average
over the other factors)
– All continuous values are set at the
overall mean
• These are similar to subpopulation
mean estimates
Intepretation for KNNL
example
• Y is cases of crackers sold under a
particular promotion scenario
• X is the cases of crackers sold
during the last period
• The LSMEANS are the estimated
cases of crackers that would be sold
for a store with the ave number of
crackers sold during the last period
LSMEANS Statement
proc glm data=a1;
class trt;
model cases=last trt;
lsmeans trt/
stderr tdiff pdiff cl;
run;
Output
treat LSMEAN
1
39.8
2
34.7
3
26.8
SE
0.8
0.8
0.8
P
<.0001
<.0001
<.0001
Output
Least
Squares Means for Effect treat
t for H0: LSMean(i)=LSMean(j) / Pr > |t|
Dependent Variable: cases
i/j
1
2
3
1
-4.12981
0.0017
-10.7636
<.0001
2
4.129808
0.0017
-6.64687
<.0001
3
10.76359
<.0001
6.646871
<.0001
Output
treat LSMEAN
1
39.8
2
34.7
3
26.8
95% CL
37.9 41.7
32.8 36.6
24.9 28.6
Output
i
1
1
2
Difference
Between
95% CL for
j Means
LSM(i)-LSM(j)
2 5.0
2.3 7.7
3 12.9
10.3 15.6
3 7.9
5.2 10.5
Prep data for plot
title1 'Plot of the data
with the model';
proc glm data=a1;
class trt;
model cases=last trt;
output out=a2 p=pred;
Prep data for plot
data a3; set a2;
drop cases pred;
if trt eq 1 then do
cases1=cases;
pred1=pred;
output; end;
Prep data for plot
if treat eq 2 then do
cases2=cases;
pred2=pred;
output; end;
if treat eq 3 then do
cases3=cases;
pred3=pred;
output; end;
proc print data=a3; run;
Code for plot
symbol1 v='1' i=none c=black;
symbol2 v='2' i=none c=black;
symbol3 v='3' i=none c=black;
symbol4 v=none i=rl c=black;
symbol5 v=none i=rl c=black;
symbol6 v=none i=rl c=black;
proc gplot data=a3;
plot (cases1 cases2 cases3
pred1 pred2 pred3)
*last/frame overlay;
run;
Check for equality of
slopes
title1 'Check for equal slopes';
proc glm data=a1;
class trt;
model cases=last trt
last*trt;
run;
Output
Type I
Source DF
last
1
treat
2
last*treat 2
Type III
Source
DF
last
1
treat
2
last*treat 2
F
54.44
59.55
1.01
P
<.0001
<.0001
0.4032
F
P
69.42 <.0001
0.18 0.8379
1.01 0.4032
Diagnostics and remedies
•
•
•
•
Examine the data and residuals
Look for outliers that are influential
Transform if needed, consider Box-Cox
Examine variances (standard deviations)
– Use a BY statement and look at the
MSE
Last slide
• Read KNNL Ch 22
• We used topic26.sas to generate the
output for today