Transcript Document

Be humble in our attribute, be loving and varying in our
attitude, that is the way to live in heaven.
Applied Statistics Using SAS
and SPSS
Topic: One Way ANOVA
By Prof Kelly Fan, Cal State Univ, East Bay
Statistical Tools vs. Variable Types
                        Response (output)
Predictor (input)       Numerical                           Categorical
-----------------       -------------------------------     -------------------------
Numerical               Simple and Multiple Regression      Categorical data analysis
Categorical/Mixed       Analysis of Variance (ANOVA),       Categorical data analysis
                        Analysis of Covariance (ANCOVA)
Example: Battery Lifetime
• 8 brands of battery are studied. We would like to find out whether or not the brand of a battery affects its lifetime and, if so, which brands last longer than the others.
• Data collection: for each brand, 3 batteries are tested for their lifetime.
• What is the Y variable? What is the X variable?
Data: Y = LIFETIME (HOURS), 3 replications per level of BRAND

BRAND:   1    2    3    4    5    6    7    8
        1.8  4.2  8.6  7.0  4.2  4.2  7.8  9.0
        5.0  5.4  4.6  5.0  7.8  4.2  7.0  7.4
        1.0  4.2  4.2  9.0  6.6  5.4  9.8  5.8
Mean:   2.6  4.6  5.8  7.0  6.2  4.6  8.2  7.4   (grand mean = 5.8)
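As a point of reference, here is a minimal SAS sketch of how this one-way layout could be entered and analyzed; the dataset and variable names (battery, brand, life) are illustrative choices, not taken from the slides.

   data battery;
      input brand life @@;        /* read (brand, lifetime) pairs across each line */
      datalines;
   1 1.8  2 4.2  3 8.6  4 7.0  5 4.2  6 4.2  7 7.8  8 9.0
   1 5.0  2 5.4  3 4.6  4 5.0  5 7.8  6 4.2  7 7.0  8 7.4
   1 1.0  2 4.2  3 4.2  4 9.0  5 6.6  6 5.4  7 9.8  8 5.8
   ;
   run;

   proc anova data=battery;       /* PROC ANOVA is suitable for this balanced design */
      class brand;                /* brand is the categorical predictor (X) */
      model life = brand;         /* one-way ANOVA of lifetime (Y) on brand */
   run;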
[Figure: Dotplot for life (hours, 1 to 10)]

[Figure: Dotplots of life by brand (group means are indicated by lines); y-axis: life, x-axis: brand 1-8]

[Figure: Boxplots of life by brand (means are indicated by solid circles); y-axis: life, x-axis: brand 1-8]
Statistical Model

Data layout by "LEVEL" OF BRAND (brand is, of course, represented as "categorical"):
one column per level, 1 through C, with n observations per level.

LEVEL OF BRAND:    1      2     ...     C
                  Y11    Y21    ...    YC1
                  Y12    Y22    ...    YC2
                   :      :     Yij     :
                  Y1n    Y2n    ...    YCn

Model:  Yij = μi + εij,   i = 1, . . . , C;   j = 1, . . . , n
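Written out with the error assumptions that the diagnostics later in these notes are checking (normality, equal variances, independence), the model reads, as a sketch:

$$Y_{ij} = \mu_i + \varepsilon_{ij}, \qquad \varepsilon_{ij} \overset{\text{iid}}{\sim} N(0,\sigma^2), \qquad i = 1,\dots,C,\; j = 1,\dots,n.$$

For the battery data, C = 8 brands and n = 3 replications; for example, the lifetime of the 3rd battery of brand 2 is Y_{2,3} = μ_2 + ε_{2,3}, and μ_2 is estimated by the brand-2 sample mean, 4.6 hours.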
Hypotheses Setup
H0: Level of X has no impact on Y
H1: Level of X does have an impact on Y

H0: μ1 = μ2 = ... = μ8
H1: not all μi are EQUAL
ONE WAY ANOVA
Analysis of Variance for life

Source    DF       SS      MS      F       P
brand      7    69.12    9.87   3.38   0.021
Error     16    46.72    2.92
Total     23   115.84
Estimate of the common variance σ²:  S² = MSE = 2.92
S = 1.709    R-Sq = 59.67%    R-Sq(adj) = 42.02%
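These summary statistics follow directly from the ANOVA table above; as a quick check of the arithmetic:

$$S = \sqrt{MSE} = \sqrt{2.92} \approx 1.709, \qquad F = \frac{MS_{brand}}{MSE} = \frac{9.87}{2.92} \approx 3.38,$$
$$R^2 = \frac{SS_{brand}}{SS_{total}} = \frac{69.12}{115.84} \approx 59.67\%, \qquad R^2_{adj} = 1 - \frac{MSE}{SS_{total}/DF_{total}} = 1 - \frac{2.92}{115.84/23} \approx 42.02\%.$$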
Review
Fitted value = Predicted value
Residual = Observed value – fitted value
Diagnosis: Normality
• The points on the normality plot must more or less follow a straight line to claim that the errors are "normally distributed".
• There are statistical tests to verify this formally.
• The ANOVA method we learn here is not sensitive to the normality assumption; that is, a mild departure from the normal distribution will not change our conclusions much.
Normality plot: normal scores vs. residuals
From the Battery lifetime data:
[Figure: Normal Probability Plot of the Residuals (response is life); x-axis: Residual (-4 to 4), y-axis: Percent (1 to 99)]
Diagnosis: Equal Variances
• The points on the residual plot must be more or less within a horizontal band to claim "constant variances".
• There are statistical tests to verify this formally.
• The ANOVA method we learn here is not sensitive to the constant-variances assumption; that is, slightly different variances within groups will not change our conclusions much.
Residual plot: fitted values vs. residuals
From the Battery lifetime data:
[Figure: Residuals Versus the Fitted Values (response is life); x-axis: Fitted Value (2 to 8), y-axis: Residual (-2 to 3)]
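The two diagnostic plots shown above (the normal probability plot of the residuals and the plot of residuals versus fitted values) could be produced in SAS roughly as follows; this is a minimal sketch reusing the illustrative battery dataset from earlier, and the output dataset and variable names (diag, fitted, resid) are assumptions.

   proc glm data=battery;
      class brand;
      model life = brand;
      output out=diag p=fitted r=resid;      /* save fitted values and residuals */
   run;

   proc sgplot data=diag;                    /* residuals vs. fitted values */
      scatter x=fitted y=resid;
      refline 0 / axis=y;
   run;

   proc univariate data=diag normal;         /* normality tests and a normal Q-Q plot */
      var resid;
      qqplot resid / normal(mu=est sigma=est);
   run;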
Multiple Comparison
Procedures
Once we reject H0: μ1 = μ2 = ... = μC in favor of H1: NOT all μ's are equal, we don't yet know the way in which they're not all equal, but simply that they're not all the same. If there are 4 columns, are all 4 μ's different? Are 3 the same and one different? If so, which one? etc. These "more detailed" inquiries into the process are called MULTIPLE COMPARISON PROCEDURES.
Errors (Type I):
We set up "α" as the significance level for a hypothesis test. Suppose we test 3 independent hypotheses, each at α = .05; each test has a type I error probability (rejecting H0 when it's true) of .05. However,

P(at least one type I error in the 3 tests)
   = 1 - P(all 3 accepted | all 3 H0 true)
   = 1 - (.95)^3 ≈ .14

In other words, the probability is .14 that at least one type I error is made. For 5 tests, the probability ≈ .23.
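As a quick check of those two numbers:

$$1 - (0.95)^3 = 1 - 0.857375 \approx 0.14, \qquad 1 - (0.95)^5 = 1 - 0.7738 \approx 0.23.$$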
Question - Should we choose α = .05 for each test, and suffer (for 5 tests) a .23 OVERALL (experimentwise) error rate?

OR

Should we choose/control the overall error rate to be .05, and find the individual-test α from 1 - (1-α)^5 = .05 (which gives us α ≈ .0102)?

The formula 1 - (1-α)^5 = .05 would be valid only if the tests are independent; often they're not.
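For completeness, solving that relation for the per-test α (valid under that independence assumption):

$$1 - (1-\alpha)^5 = 0.05 \;\Longrightarrow\; (1-\alpha)^5 = 0.95 \;\Longrightarrow\; \alpha = 1 - (0.95)^{1/5} \approx 0.0102.$$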
[ e.g., three tests: (1) μ1 = μ2, (2) μ2 = μ3, (3) μ1 = μ3.
If (1) is accepted and (2) is rejected, isn't it more likely that (3) is rejected? ]
When the tests are not independent, it's usually very difficult to arrive at the correct α for an individual test so that a specified value results for the overall error rate.
Categories of multiple
comparison tests
- “Planned”/ “a priori” comparisons (stated in
advance, usually a linear combination of the
column means equal to zero.)
- “Post hoc”/ “a posteriori” comparisons (decided
after a look at the data - which comparisons “look
interesting”)
- “Post hoc” multiple comparisons (every column
mean compared with each other column mean)
There are many multiple comparison
procedures. We’ll cover only a few.
Post hoc multiple comparisons
1)Pairwise comparisons: Do a series of
pairwise tests; Duncan and SNK tests
2)(Optional) Comparisons to control: Dunnett
tests
Example: Broker Study
A financial firm would like to determine whether the brokers it uses to execute trades differ with respect to their ability to provide a stock purchase for the firm at a low buying price per share. To measure cost, an index Y is used:

   Y = 1000(A - P)/A

where
   P = per-share price paid for the stock;
   A = average of the high price and low price per share for the day.

"The higher Y is, the better the trade."
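For example (hypothetical numbers, not from the study): a purchase at P = 19.8 on a day whose high/low average is A = 20 gives

$$Y = 1000\,(20 - 19.8)/20 = 10.$$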
Col: broker     1     2     3     4     5
               12     7     8    21    24
                3    17     1    10    13
                5    13     7    15    14
               -1    11     4    12    18
               12     7     3    20    14
                5    17     7     6    19     } R = 6
Mean:           6    12     5    14    17
Five brokers were in the study and six trades
were randomly assigned to each broker.
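Alongside the SPSS menu path shown below, the same analysis could be run in SAS roughly as follows; this is a hedged sketch, with illustrative dataset/variable names (brokers, broker, sales) chosen to match the SPSS output labels.

   data brokers;
      input broker sales @@;       /* read (broker, index Y) pairs */
      datalines;
   1 12  1  3  1  5  1 -1  1 12  1  5
   2  7  2 17  2 13  2 11  2  7  2 17
   3  8  3  1  3  7  3  4  3  3  3  7
   4 21  4 10  4 15  4 12  4 20  4  6
   5 24  5 13  5 14  5 18  5 14  5 19
   ;
   run;

   proc glm data=brokers;
      class broker;
      model sales = broker;
      means broker / snk duncan;   /* Student-Newman-Keuls and Duncan groupings */
   run;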
SPSS Output
Analyze>>General Linear Model>>Univariate…
Tests of Between-Subjects Effects
Dependent Variable: sales

Source             Type III Sum of Squares    df    Mean Square         F     Sig.
Corrected Model             640.800(a)         4        160.200     7.557     .000
Intercept                  3499.200            1       3499.200   165.057     .000
broker                      640.800            4        160.200     7.557     .000
Error                       530.000           25         21.200
Total                      4670.000           30
Corrected Total            1170.800           29

a. R Squared = .547 (Adjusted R Squared = .475)
Homogeneous Subsets
sales

                                                   Subset
                            broker     N         1          2
Student-Newman-Keuls(a,b)   3.00       6      5.0000
                            1.00       6      6.0000
                            2.00       6                12.0000
                            4.00       6                14.0000
                            5.00       6                17.0000
                            Sig.                .710       .165
Duncan(a,b)                 3.00       6      5.0000
                            1.00       6      6.0000
                            2.00       6                12.0000
                            4.00       6                14.0000
                            5.00       6                17.0000
                            Sig.                .710       .086

Means for groups in homogeneous subsets are displayed.
Based on Type III Sum of Squares. The error term is Mean Square(Error) = 21.200.
a. Uses Harmonic Mean Sample Size = 6.000.
b. Alpha = .05.
Conclusion: {3, 1}  {2, 4, 5}
Conclusion: {3, 1}  {2}  {4}  {5} ???
Conclusion: {3, 1}  {2, 4}  {4, 5}
• Brokers 1 and 3 are not significantly different from each other, but they are significantly different from the other 3 brokers.
• Brokers 2 and 4 are not significantly different, and brokers 4 and 5 are not significantly different, but broker 2 is significantly different from (smaller than) broker 5.
Comparisons to Control
Dunnett’s test
Designed specifically for (and incorporating
the interdependencies of) comparing several
“treatments” to a “control.”
Example - in our study, broker 1 (column 1) is the CONTROL:

              CONTROL
Col (broker):    1      2      3      4      5
Mean:            6     12      5     14     17     } R = 6
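In SAS, the corresponding comparisons to the control could be requested with Dunnett's test; a minimal sketch, continuing the illustrative brokers dataset above, where '1' names broker 1 as the control level.

   proc glm data=brokers;
      class broker;
      model sales = broker;
      means broker / dunnett('1');   /* compare brokers 2-5 to the control, broker 1 */
   run;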
Multiple Comparisons
Dependent Variable: sales
Dunnett t (2-sided)(a)

                              Mean                                 95% Confidence Interval
(I) broker   (J) broker   Difference (I-J)   Std. Error    Sig.    Lower Bound   Upper Bound
2.00         1.00             6.0000          2.65832      .103       -.9300       12.9300
3.00         1.00            -1.0000          2.65832      .987      -7.9300        5.9300
4.00         1.00             8.0000*         2.65832      .020       1.0700       14.9300
5.00         1.00            11.0000*         2.65832      .001       4.0700       17.9300

Based on observed means.
*. The mean difference is significant at the .05 level.
a. Dunnett t-tests treat one group as a control, and compare all other groups against it.

- Cols 4 and 5 differ from the control [ 1 ].
- Cols 2 and 3 are not significantly different from the control.
Exercise: Sales Data
Sales

Treatment:     1     2     3
               6     6    11
               3     5    10
               8     4     8
               3     9    11
               6     6    11
               3     5    10
               8     4     8
               3     9    11
Col Mean:      5     6    10
Exercise.
1. Find the ANOVA table.
2. Perform SNK tests at α = 5% to group the treatments.
3. Perform Duncan tests at α = 5% to group the treatments.
4. Which treatment would you use?
Post Hoc and A Priori Comparisons
• F test for a linear combination of column means (a contrast)
• Scheffe test: tests all linear combinations at once. Very conservative; not to be used when only a few comparisons are of interest.
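For reference, a hedged SAS sketch of both ideas on the illustrative brokers dataset; the contrast label and coefficients are made-up examples (here, broker 1 vs. broker 3), not comparisons taken from the slides.

   proc glm data=brokers;
      class broker;
      model sales = broker;
      contrast 'broker 1 vs broker 3' broker 1 0 -1 0 0;  /* F test for one linear combination (contrast) */
      means broker / scheffe;                             /* Scheffe procedure covering all linear combinations */
   run;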