Doing ANOVA and t-tests
LISA short course by Ciro Velasco-Cruz
October 21, 2008
ONE SAMPLE t TEST
Example
In a study, 15 lobsters were randomly selected from recent catches
along a certain region of the Maine shoreline. The lobsters were
weighed to the nearest ounce, with results:
26 14 18 13 22 15 24 21 29 10 12 31 19 16 21
Suppose that, for research purposes, the mean lobster weight is
required to equal 15 ounces. It is known that lobster weight is
normally distributed, with both mean and standard deviation unknown.
SAS for coding
The data step
data lobsters_w;
input type weigth @@;
datalines;
1 26 1 14 1 18 1 13 1 22
1 15 1 24 1 21 1 29 1 10
1 12 1 31 1 19 1 16 1 21
;
SAS for coding
Exploratory data analysis:
proc means data=lobsters_w mean std max min median;
var weigth;
run;
proc boxplot data=lobsters_w;
title'BoxPlot for one sample t-test example';
plot (weigth)*type/ cframe = vligb
cboxes = dagr
cboxfill = ywh;
inset mean max min /CFILL = WHITE
header = "Summary"
CTEXT = RED;
run;
SAS OUTPUT
The SAS System
The MEANS Procedure
Analysis Variable : weigth

      Mean       Std Dev       Maximum       Minimum        Median
19.4000000     6.2655521    31.0000000    10.0000000    19.0000000
SAS OUTPUT
[Figure: BoxPlot for one sample t-test example]
SAS coding
Data analysis:
proc ttest data=lobsters_w h0=15;
title 'One sample t test example';
var weigth;
run;
SAS OUTPUT
One sample t test example
The TTEST Procedure

Statistics
                 Lower CL             Upper CL   Lower CL             Upper CL
Variable    N        Mean     Mean        Mean    Std Dev   Std Dev    Std Dev   Std Err   Minimum   Maximum
weigth     15       15.93     19.4       22.87     4.5872    6.2656     9.8814    1.6178        10        31

T-Tests
Variable   DF   t Value   Pr > |t|
weigth     14      2.72     0.0166
Conclusion: Since the p-value (0.0166) is less than 0.05, we reject the null
hypothesis that the mean equals 15 at the 5% significance level.
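The t value reported by PROC TTEST can be reproduced by hand from the summary statistics. The worked version below uses the rounded values printed in the output, so the last digit may differ slightly; the two-sided alternative is assumed because the output reports Pr > |t|.

```latex
\begin{aligned}
H_0 &: \mu = 15 \quad \text{vs.} \quad H_1 : \mu \neq 15 \\
t &= \frac{\bar{y} - \mu_0}{s/\sqrt{n}}
   = \frac{19.4 - 15}{6.2656/\sqrt{15}}
   = \frac{4.4}{1.6178} \approx 2.72,
   \qquad \text{df} = n - 1 = 14
\end{aligned}
```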
Two Sample t-test example
An animal scientist is interested in comparing two different topical treatments
(A, B) against osteoarthritis in the leg joints of horses. Seven horses with the
illness are available at the animal clinic. For each horse it is randomly
determined which of the front legs receives treatment A and which treatment B.
After four weeks of treatment, the horses’ mobility is measured.
For now, we treat the two treatment groups as independent samples and run a
two-sample t test.
SAS data step
data horses;
input trt horse mobility @@ ;
cards;
1 1 48.2 1 2 44.6 1 3 49.7 1 4 40.5
1 5 54.6 1 6 47.1 1 7 46.8 2 1 41.5
2 2 40.1 2 3 44.0 2 4 41.2 2 5 49.8
2 6 41.7 2 7 51.4
;
SAS E.D.A.
proc means data=horses mean std max min median;
class trt;
var mobility;
run;
proc boxplot data=horses;
title'BoxPlot for two sample t-test example';
plot (mobility)*trt/ cframe = vligb
cboxes = dagr
cboxfill = ywh;
insetgroup mean max min q1 q2 q3/header = 'Summary by Treatment'
ctext = red;
run;
SAS OUTPUT
The MEANS Procedure
Analysis Variable : mobility

trt   N Obs         Mean      Std Dev      Maximum      Minimum       Median
1         7   47.3571429    4.3523393   54.6000000   40.5000000   47.1000000
2         7   44.2428571    4.5199031   51.4000000   40.1000000   41.7000000
SAS OUTPUT
[Figure: BoxPlot for two sample t-test example]
SAS t test
proc ttest data=horses;
title 'Two sample t test example';
class trt;
var mobility;
run;
SAS OUTPUT
Two sample t test example
The TTEST Procedure

Statistics
                            Lower CL             Upper CL   Lower CL             Upper CL
Variable   trt          N       Mean     Mean        Mean    Std Dev   Std Dev    Std Dev   Std Err   Minimum   Maximum
mobility   1            7     43.332   47.357      51.382     2.8046    4.3523     9.5841     1.645      40.5      54.6
mobility   2            7     40.063   44.243      48.423     2.9126    4.5199     9.9531    1.7084      40.1      51.4
mobility   Diff (1-2)         -2.053   3.1143      8.2816     3.1816    4.4369     7.3242    2.3716

T-Tests
Variable   Method          Variances   DF   t Value   Pr > |t|
mobility   Pooled          Equal       12      1.31     0.2137
mobility   Satterthwaite   Unequal     12      1.31     0.2137

Equality of Variances
Variable   Method     Num DF   Den DF   F Value   Pr > F
mobility   Folded F        6        6      1.08   0.9293
Conclusion
• About variances: Since the p-value (0.9293) is larger
than 5%, we do not reject the hypothesis that the
variances are equal, so the pooled test applies.
• About means: Since the p-value for this test (0.2137)
is also larger than 5%, we do not reject the hypothesis
that the means are equal.
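The pooled t value and the folded F value can both be recovered from the group standard deviations in the Statistics table. A worked sketch using the rounded values from the output (n1 = n2 = 7):

```latex
\begin{aligned}
s_p^2 &= \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}
       = \frac{6(4.3523)^2 + 6(4.5199)^2}{12} \approx 19.686,
       \qquad s_p \approx 4.4369 \\
\mathrm{SE}(\bar{y}_1 - \bar{y}_2) &= s_p \sqrt{\tfrac{1}{7} + \tfrac{1}{7}} \approx 2.3716 \\
t &= \frac{47.357 - 44.243}{2.3716} \approx 1.31, \qquad \text{df} = 12 \\
F &= \frac{\max(s_1^2, s_2^2)}{\min(s_1^2, s_2^2)}
   = \frac{(4.5199)^2}{(4.3523)^2} \approx 1.08
\end{aligned}
```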
Paired t test example
• Let’s reconsider the last example. Treatments
A and B were both measured on the same
horse, so the mobility measurements are not
independent within horses. The right way to
analyze the data is therefore a paired t test.
• Idea: for each horse i, we look at the difference
between the responses to treatments A and B:
D_i = Y_iA - Y_iB
SAS paired test
proc ttest data=newhorses;
paired MobilityA*MobilityB;
run;
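The PROC TTEST step above reads a data set named newhorses that the slides do not show. A minimal sketch of one way to build it, assuming trt 1 is treatment A and trt 2 is treatment B; that mapping is an assumption, although the resulting differences do reproduce the mean, minimum and maximum in the paired output. The variable names MobilityA and MobilityB are taken from the PAIRED statement.

```sas
/* Hypothetical reconstruction of newhorses: one row per horse, */
/* with the two treatments side by side.                        */
/* Assumes trt=1 is treatment A and trt=2 is treatment B.       */
data newhorses;
  input horse MobilityA MobilityB;
  datalines;
1 48.2 41.5
2 44.6 40.1
3 49.7 44.0
4 40.5 41.2
5 54.6 49.8
6 47.1 41.7
7 46.8 51.4
;
run;
```

This step has to run before the PROC TTEST call with the PAIRED statement.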
The SAS System
The TTEST Procedure

Statistics
                                Lower CL             Upper CL   Lower CL             Upper CL
Difference                  N       Mean     Mean        Mean    Std Dev   Std Dev    Std Dev   Std Err   Minimum   Maximum
MobilityA - MobilityB       7     -0.729   3.1143      6.9571     2.6775    4.1551     9.1498    1.5705      -4.6       6.7

T-Tests
Difference                 DF   t Value   Pr > |t|
MobilityA - MobilityB       6      1.98     0.0946
But why is this happening?
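One way to read the question: the estimated mean difference is 3.1143 in both analyses, so what changes between them is the standard error and the degrees of freedom. A side-by-side of the two t statistics, using only values printed in the two outputs:

```latex
\begin{aligned}
\text{two-sample (pooled):} \quad
t &= \frac{\bar{y}_A - \bar{y}_B}{s_p\sqrt{2/n}}
   = \frac{3.1143}{2.3716} \approx 1.31, & \text{df} &= 12 \\
\text{paired:} \quad
t &= \frac{\bar{D}}{s_D/\sqrt{n}}
   = \frac{3.1143}{4.1551/\sqrt{7}}
   = \frac{3.1143}{1.5705} \approx 1.98, & \text{df} &= 6
\end{aligned}
```

Here, differencing within each horse removes horse-to-horse variation from the error term, so the standard error shrinks even though the degrees of freedom are halved.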
One Way Anova
An experiment was conducted to study the growth of
plant tissue in the presence of hormone solutions
containing various growth-inhibiting substances. For
each solution, 10 independent tissue cultures were
prepared and the growth of the plant tissue was recorded
in mm.
This experiment has one factor with 5 levels. Each level has
10 replications.
SAS data step
data peasection;
input trtmnt growth @@;
/* trtmnt codes: 1=Control, 2=Sol.1, 3=Sol.2, 4=Mixture, 5=Sol.3 */
datalines;
1 7.84 1 8.69 1 8.11 1 8.35 1 7.74
1 7.69 1 7.98 1 7.64 1 8.57 1 8.32
2 6.78 2 6.69 2 6.95 2 6.64 2 6.41
2 6.69 2 6.72 2 6.57 2 6.67 2 7.07
3 6.79 3 6.79 3 6.79 3 6.61 3 6.43
3 6.69 3 6.57 3 6.49 3 7.05 3 6.72
4 6.64 4 6.57 4 6.78 4 6.48 4 6.54
4 6.36 4 6.67 4 6.26 4 6.67 4 6.68
5 7.31 5 7.65 5 7.26 5 7.39 5 6.98
5 7.46 5 7.32 5 7.13 5 7.07 5 7.25
;
SAS coding
proc boxplot data=peasection;
title'BoxPlot for one-way ANOVA example';
plot growth*trtmnt/ cframe
= vligb
cboxes = dagr
cboxfill = ywh;
insetgroup mean stddev q1 q2 q3/header = 'Summary by Treatment'
ctext = red;
run;
SAS output
[Figure: BoxPlot for one-way ANOVA example]
SAS glm anyway
proc glm data=peasection;
class trtmnt;
model growth=trtmnt;
lsmeans trtmnt /pdiff adjust=tukey ;
/* coefficients: 2*(level 5) - (level 1) - (level 3) */
contrast 'our first contrast with contrast' trtmnt -1 0 -1 0 2;
estimate 'our first contrast with estimate' trtmnt -1 0 -1 0 2;
output out=residuals p=yhat r=res;
run;
SAS output
The GLM Procedure
Dependent Variable: growth

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               4      16.11827200    4.02956800     74.32   <.0001
Error              45       2.43972000    0.05421600
Corrected Total    49      18.55799200

Source   DF      Type I SS   Mean Square   F Value   Pr > F
trtmnt    4    16.11827200    4.02956800     74.32   <.0001

Source   DF    Type III SS   Mean Square   F Value   Pr > F
trtmnt    4    16.11827200    4.02956800     74.32   <.0001
SAS output
trtmnt   growth LSMEAN   LSMEAN Number
1           8.09300000               1
2           6.71900000               2
3           6.69300000               3
4           6.56500000               4
5           7.28200000               5

Least Squares Means for effect trtmnt
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: growth

i/j        1        2        3        4        5
1               <.0001   <.0001   <.0001   <.0001
2     <.0001             0.9991   0.5812   <.0001
3     <.0001   0.9991             0.7346   <.0001
4     <.0001   0.5812   0.7346             <.0001
5     <.0001   <.0001   <.0001   <.0001

Contrast                           DF   Contrast SS   Mean Square   F Value   Pr > F
our first contrast with contrast    1    0.08214000    0.08214000      1.52   0.2248

Parameter                             Estimate   Standard Error   t Value   Pr > |t|
our first contrast with estimate   -0.22200000       0.18035964     -1.23     0.2248
Note that: -(8.093+6.693)+2*7.282= -.222
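The standard error, t value and F value for this contrast follow from the ANOVA mean square error (0.054216 on 45 df) and the 10 replications per level. A worked sketch with coefficients c = (-1, 0, -1, 0, 2):

```latex
\begin{aligned}
\hat{L} &= \sum_i c_i \bar{y}_i = -8.093 - 6.693 + 2(7.282) = -0.222 \\
\mathrm{SE}(\hat{L}) &= \sqrt{\mathrm{MSE} \cdot \frac{\sum_i c_i^2}{n}}
  = \sqrt{0.054216 \cdot \frac{1 + 1 + 4}{10}} \approx 0.18036 \\
t &= \frac{\hat{L}}{\mathrm{SE}(\hat{L})} \approx -1.23,
  \qquad F = t^2 = \frac{0.08214}{0.054216} \approx 1.52
\end{aligned}
```

The CONTRAST and ESTIMATE statements therefore test the same hypothesis, which is why both report Pr = 0.2248.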
Remedies
• Transform the response (to stabilize the variance). Fit
log(var(y)) = c0 + q*log(mean(y))
and then use:
1. g(y) = y^(1-q/2) if q is different from 2
2. g(y) = log(y) if q = 2 and all y > 0
3. g(y) = log(y + shift) if q = 2 and some y <= 0
• Use an analysis for Gaussian data with
unequal variances: Satterthwaite’s
approximation, or Welch’s test (for one-way
ANOVA)
SAS E.D.A.
proc means data=peasection noprint;
var growth;
by trtmnt;
output out=varmeans var= vargro mean=meangro;
run;
data varmeans;
set varmeans;
vargro=log(vargro);
meangro=log(meangro);
run;
proc gplot data=varmeans;
plot vargro*meangro;
run;
proc reg data=varmeans;
model vargro=meangro;
run;
SAS output
[Figure: plot of vargro against meangro]
SAS regression
The REG Procedure
Model: MODEL1
Dependent Variable: vargro

Root MSE           0.24863   R-Square   0.8990
Dependent Mean    -3.14165   Adj R-Sq   0.8654
Coeff Var         -7.91405

Parameter Estimates
                  Parameter   Standard
Variable    DF     Estimate      Error   t Value   Pr > |t|
Intercept    1    -17.58795    2.79721     -6.29     0.0081
meangro      1      7.39762    1.43125      5.17     0.0141
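The slope of this regression is the estimate of q in the variance-mean relation from the Remedies slide, and it determines the power transformation. The arithmetic, using the estimates printed in the regression output:

```latex
\begin{aligned}
\log \widehat{\mathrm{Var}}(y) &= -17.58795 + 7.39762 \, \log \bar{y}
  \quad\Rightarrow\quad \hat{q} = 7.39762 \neq 2 \\
1 - \frac{\hat{q}}{2} &= 1 - 3.69881 = -2.69881
  \quad\Rightarrow\quad g(y) = y^{-2.69881}
\end{aligned}
```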
SAS transformation and analysis code
data trans;
set peasection;
yt=growth**(-2.69881);
run;
proc glm data=trans;
class trtmnt;
model yt=trtmnt;
means trtmnt /hovtest=levene(type=square);
output out=resi r=res;
run;
proc boxplot data=resi;
title'BoxPlot for one-way ANOVA example';
plot res*trtmnt/ cframe
= vligb
cboxes = dagr
cboxfill = ywh;
insetgroup mean stddev q1 q2 q3/header = 'Summary by Treatment'
ctext = red;
run;
SAS output
The GLM Procedure
Dependent Variable: yt

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               4       0.00004922    0.00001231     72.52   <.0001
Error              45       0.00000764    0.00000017
Corrected Total    49       0.00005686

Source   DF      Type I SS   Mean Square   F Value   Pr > F
trtmnt    4     0.00004922    0.00001231     72.52   <.0001

Source   DF    Type III SS   Mean Square   F Value   Pr > F
trtmnt    4     0.00004922    0.00001231     72.52   <.0001

Levene's Test for Homogeneity of yt Variance
ANOVA of Squared Deviations from Group Means

Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
trtmnt    4          3.2E-14         8E-15      0.21   0.9297
Error    45         1.69E-12      3.75E-14
SAS output
[Figure: box plot of the residuals (res) by trtmnt]
Two-way ANOVA fixed factors
An educational researcher was interested in the
factors noise and solitude as they affect study conditions. Each
subject in an experiment was asked to study an essay on
American history for 15 minutes and then was tested on a 25-item
quiz, the number of correct items being the score. The subjects
differed, however, in the conditions under which they were allowed
to study.
Factor Solitude with 2 levels: alone and not alone (w/stooge)
Factor Noise with 3 levels: no noise, soft background music, and
loud rock and roll music.
There are 3 replications of each treatment combination.
SAS data step
data QuizScores;
input Solitude $ Noise $ Score @@;
datalines;
Alone None 10 Alone None 6 Alone None 14
Alone Soft 21 Alone Soft 21 Alone Soft 16
Alone Loud 5 Alone Loud 15 Alone Loud 7
Stooge None 6 Stooge None 11 Stooge None 1
Stooge Soft 6 Stooge Soft 17 Stooge Soft 13
Stooge Loud 1 Stooge Loud 2 Stooge Loud 6
;
SAS E.D.A.
proc boxplot data=quizscores;
title'BoxPlot for two-way ANOVA example';
plot score*noise(solitude)/ cframe = vligb
cboxes = dagr
cboxfill = ywh;
*inset mean max min/pos=tm header='The overall summary';
insetgroup mean stddev q1 q2 q3/header = 'Summary by Treatment'
ctext = red;
run;
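/* BY processing needs the data sorted by the BY variables */
proc sort data=quizscores;
by solitude noise;
run;
proc means data=quizscores noprint;
by solitude noise;
var score;
output out=meanquizscore mean=meanquiz;
run;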
symbol i=j;
symbol2 i=j;
proc gplot data=meanquizscore;
plot meanquiz*Noise=solitude;
plot meanquiz*solitude=noise;
run;
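SAS output
[Figures: BoxPlot for two-way ANOVA example; meanquiz by Noise for each Solitude level; meanquiz by Solitude for each Noise level]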
SAS output
proc glm data=quizscores;
class solitude noise;
model score=solitude|noise;
run;
The GLM Procedure
Dependent Variable: Score

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               5      471.1111111    94.2222222      4.90   0.0113
Error              12      230.6666667    19.2222222
Corrected Total    17      701.7777778

Source            DF     Type I SS   Mean Square   F Value   Pr > F
Solitude           1   150.2222222   150.2222222      7.82   0.0162
Noise              2   312.4444444   156.2222222      8.13   0.0059
Solitude*Noise     2     8.4444444     4.2222222      0.22   0.8060

Source            DF   Type III SS   Mean Square   F Value   Pr > F
Solitude           1   150.2222222   150.2222222      7.82   0.0162
Noise              2   312.4444444   156.2222222      8.13   0.0059
Solitude*Noise     2     8.4444444     4.2222222      0.22   0.8060
Slices
• In this example the interaction was not significant.
But what should we do if it were?
One way to handle this problem: SLICES.
Since main effects may or may not be significant
in the presence of an interaction, we need to
test how the effect of one factor changes at each
given level of the other factor.
In SAS, we use the following statement to obtain
the slices, where A*B is the interaction and A is the
factor whose levels define the slices:
lsmeans A*B / slice=A;
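Applied to the quiz-score example, the slices could be requested as follows. This is a sketch that assumes the QuizScores data set from the two-way data step; slicing by both factors is a choice made for illustration, not something the slides specify.

```sas
proc glm data=quizscores;
  class solitude noise;
  model score = solitude|noise;
  /* Test the Solitude effect separately at each level of Noise */
  lsmeans solitude*noise / slice=noise;
  /* Test the Noise effect separately at each level of Solitude */
  lsmeans solitude*noise / slice=solitude;
run;
quit;
```

Each SLICE= request adds a table of F tests to the LSMEANS output, one row per level of the slicing factor.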
SAS two-way ANOVA with a random factor
An experiment was performed to examine the effect of aging time
on the strength of cement. From a large number of mixes, three
cement mixes were randomly selected and six specimens were
produced from each mix. After two days, three randomly selected
specimens from each mix were tested for strength with a load test,
and the other three specimens were tested after seven days.
This is a two-way classification with factors Cement Mix (three
levels) and Time (2 levels). The levels of factor Time were
predetermined. The three levels of Cement Mix were randomly
selected from a large number of mixes, thus the Cement Mix factor is
random.
SAS data input
data YieldLoads;
input Aging $ Mix Load @@;
datalines;
2-Days 1 574 2-Days 1 564 2-Days 1 550
2-Days 2 524 2-Days 2 573 2-Days 2 551
2-Days 3 576 2-Days 3 540 2-Days 3 592
7-Days 1 1092 7-Days 1 1086 7-Days 1 1065
7-Days 2 1028 7-Days 2 1073 7-Days 2 998
7-Days 3 1066 7-Days 3 1045 7-Days 3 1055
;
SAS code
proc glm data=yieldloads;
class aging mix;
model load = aging mix aging*mix;
random mix aging*mix /test;
run;
OR USING:
proc mixed data=yieldloads;
class aging mix;
model load= aging;
random mix mix*aging;
run;
SAS output
Source       Type III Expected Mean Square
Aging        Var(Error) + 3 Var(Aging*Mix) + Q(Aging)
Mix          Var(Error) + 3 Var(Aging*Mix) + 6 Var(Mix)
Aging*Mix    Var(Error) + 3 Var(Aging*Mix)

The GLM Procedure
Tests of Hypotheses for Mixed Model Analysis of Variance
Dependent Variable: Load

Source                  DF   Type III SS   Mean Square   F Value   Pr > F
Aging                    1       1107072       1107072   1965.80   0.0005
Mix                      2   2957.444444   1478.722222      2.63   0.2758
Error: MS(Aging*Mix)     2   1126.333333    563.166667

Source                  DF   Type III SS   Mean Square   F Value   Pr > F
Aging*Mix                2   1126.333333    563.166667      1.06   0.3774
Error: MS(Error)        12   6386.666667    532.222222
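The expected mean squares explain the choice of denominators. Under the null hypothesis for Aging or for Mix, the expected mean square reduces to Var(Error) + 3 Var(Aging*Mix), which is the expected mean square of Aging*Mix, so both are tested against MS(Aging*Mix); the interaction itself is tested against MS(Error). The F values in the output follow directly:

```latex
\begin{aligned}
F_{\text{Aging}} &= \frac{MS(\text{Aging})}{MS(\text{Aging} \times \text{Mix})}
  = \frac{1107072}{563.166667} \approx 1965.80, & \text{df} &= (1, 2) \\
F_{\text{Mix}} &= \frac{MS(\text{Mix})}{MS(\text{Aging} \times \text{Mix})}
  = \frac{1478.722222}{563.166667} \approx 2.63, & \text{df} &= (2, 2) \\
F_{\text{Aging} \times \text{Mix}} &= \frac{MS(\text{Aging} \times \text{Mix})}{MS(\text{Error})}
  = \frac{563.166667}{532.222222} \approx 1.06, & \text{df} &= (2, 12)
\end{aligned}
```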
Question…
• Option 1. Go back and complete SLICE
part
or
• Option 2. Go ahead to the MANOVA
• ?
MANOVA example
A researcher randomly assigns 33 subjects to one of three
groups:
G1 receives technical dietary information interactively from an
on-line website.
G2 receives the same information from a nurse practitioner.
G3 receives the information from a video tape made by the
same nurse practitioner.
The researcher looks at three different ratings of the
presentation (difficulty, usefulness and importance) to determine if
there is a difference in the modes of presentation. In particular,
the researcher is interested in whether the interactive website is
superior, because that is the most cost-effective way of
delivering the information.
SAS code
proc glm data=manovaex;
class group;
model useful difficulty importance = group;
contrast '1 vs 2&3' group 2 -1 -1;
contrast '2 vs 3' group 0 1 -1;
manova h=_all_;
run;
Note: go to the manova.sas example