Transcript Lecture 10

Single-Factor ANOVA
We skip Sec. 10.3.
Single-factor ANOVA focuses on a comparison of more
than two population or treatment means. Let

I = the number of populations or treatments being compared

$\mu_1$ = the mean of population 1 or the true average response
when treatment 1 is applied

$\vdots$

$\mu_I$ = the mean of population I or the true average response
when treatment I is applied
Single-Factor ANOVA
The relevant hypotheses are

$H_0: \mu_1 = \mu_2 = \cdots = \mu_I$

versus

$H_a$: at least two of the $\mu_i$'s are different

If I = 4, $H_0$ is true only if all four $\mu_i$'s are identical. $H_a$ would
be true, for example, if $\mu_1 = \mu_2 \neq \mu_3 = \mu_4$, if $\mu_1 = \mu_3 = \mu_4 \neq \mu_2$,
or if all four $\mu_i$'s differ from one another.
Notation and Assumptions
Let
$X_{ij}$ = the random variable (rv) that denotes the jth
measurement taken from the ith population, or the
measurement taken on the jth experimental unit that
receives the ith treatment

$x_{ij}$ = the observed value of $X_{ij}$ when the experiment is
performed
Notation and Assumptions
Here we'll focus on the case of equal sample sizes;
let J denote the number of observations in each sample
(J = 6 in Example 1). The data set consists of IJ
observations.
The individual sample means will be denoted by
$\bar{X}_{1\cdot}, \bar{X}_{2\cdot}, \ldots, \bar{X}_{I\cdot}$. That is,

$$\bar{X}_{i\cdot} = \frac{1}{J} \sum_{j=1}^{J} X_{ij} \qquad i = 1, \ldots, I$$
Notation and Assumptions
The dot in place of the second subscript signifies that we
have added over all values of that subscript while holding
the other subscript value fixed, and the horizontal bar
indicates division by J to obtain an average.
Similarly, the average of all IJ observations, called the
grand mean, is

$$\bar{X}_{\cdot\cdot} = \frac{1}{IJ} \sum_{i=1}^{I} \sum_{j=1}^{J} X_{ij}$$
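As a concrete illustration, here is a minimal NumPy sketch that computes the sample means and the grand mean for an I × J layout; the data values are made up for illustration.

```python
import numpy as np

# Hypothetical data: I = 3 treatments, J = 4 observations each,
# arranged so row i holds the J observations from treatment i.
x = np.array([[10.2,  9.8, 11.1, 10.5],
              [12.0, 11.4, 12.3, 11.9],
              [ 9.1,  9.5,  8.8,  9.9]])

sample_means = x.mean(axis=1)   # X-bar_i. : average over j for each i
grand_mean   = x.mean()         # X-bar_.. : average of all IJ observations

print(sample_means)  # one mean per treatment
print(grand_mean)
```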
Notation and Assumptions
Assumptions
The I population or treatment distributions are all normal
with the same variance $\sigma^2$. That is, each $X_{ij}$ is normally
distributed with

$$E(X_{ij}) = \mu_i \qquad V(X_{ij}) = \sigma^2$$

The I sample standard deviations will generally differ
somewhat even when the corresponding $\sigma$'s are identical.
The Test Statistic
Definition
The mean square for treatments is given by

$$\text{MSTr} = \frac{J}{I - 1} \sum_{i=1}^{I} (\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot})^2$$

and the mean square for error is

$$\text{MSE} = \frac{S_1^2 + S_2^2 + \cdots + S_I^2}{I}$$

The test statistic for single-factor ANOVA is $F = \text{MSTr}/\text{MSE}$.
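These definitions translate directly into code; a minimal sketch, reusing the hypothetical layout above (row i = sample i):

```python
import numpy as np

def anova_f(x):
    """Single-factor ANOVA F statistic for an I x J array x
    (row i = the J observations from population/treatment i)."""
    I, J = x.shape
    sample_means = x.mean(axis=1)
    grand_mean = x.mean()
    # MSTr: between-samples variation of the I sample means
    mstr = J / (I - 1) * np.sum((sample_means - grand_mean) ** 2)
    # MSE: average of the I sample variances (within-samples variation)
    mse = np.mean(x.var(axis=1, ddof=1))
    return mstr / mse, mstr, mse
```

With the hypothetical array x from the previous sketch, anova_f(x) returns the observed f along with MSTr and MSE.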
The Test Statistic
The terminology “mean square” will be explained shortly.
Notice that uppercase X’s and S2’s are used, so MSTr and
MSE are defined as statistics.
We will follow tradition and also use MSTr and MSE
(rather than mstr and mse) to denote the calculated values
of these statistics.
Each $S_i^2$ assesses variation within a particular sample, so
MSE is a measure of within-samples variation.
The Test Statistic
Proposition
When $H_0$ is true,

$$E(\text{MSTr}) = E(\text{MSE}) = \sigma^2$$

whereas when $H_0$ is false,

$$E(\text{MSTr}) > E(\text{MSE}) = \sigma^2$$

That is, both statistics are unbiased for estimating the
common population variance $\sigma^2$ when $H_0$ is true, but MSTr
tends to overestimate $\sigma^2$ when $H_0$ is false.
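A quick simulation makes the proposition concrete: when all $\mu_i$ are equal, the long-run averages of MSTr and MSE both approximate $\sigma^2$, while shifting one mean inflates MSTr only. A hedged sketch, reusing the anova_f helper defined above with illustrative values of I, J, and $\sigma$:

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, sigma = 4, 6, 2.0

def average_mean_squares(mus, reps=5000):
    """Monte Carlo averages of MSTr and MSE for given true means."""
    mstr_vals, mse_vals = [], []
    for _ in range(reps):
        # Draw J normal observations per treatment, row i having mean mus[i]
        x = rng.normal(loc=np.reshape(mus, (I, 1)), scale=sigma, size=(I, J))
        _, mstr, mse = anova_f(x)
        mstr_vals.append(mstr)
        mse_vals.append(mse)
    return np.mean(mstr_vals), np.mean(mse_vals)

print(average_mean_squares([5, 5, 5, 5]))  # H0 true: both near sigma^2 = 4
print(average_mean_squares([5, 5, 5, 8]))  # H0 false: MSTr well above 4, MSE still near 4
```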
F Distributions and the F Test
Theorem
Let F = MSTr/MSE be the test statistic in a single-factor
ANOVA problem involving I populations or treatments with
a random sample of J observations from each one.
When $H_0$ is true and the basic assumptions of this section
are satisfied, F has an F distribution with $\nu_1 = I - 1$ and
$\nu_2 = I(J - 1)$.

With f denoting the computed value of F, the rejection
region $f \geq F_{\alpha,\, I-1,\, I(J-1)}$ then specifies a test with significance
level $\alpha$.
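For example, the critical value $F_{\alpha,\, I-1,\, I(J-1)}$ can be looked up with scipy; a sketch with illustrative values I = 4, J = 6, $\alpha = .05$:

```python
from scipy.stats import f

I, J, alpha = 4, 6, 0.05
v1, v2 = I - 1, I * (J - 1)          # 3 and 20 df
crit = f.ppf(1 - alpha, v1, v2)      # upper-tail critical value F_.05,3,20
print(crit)                          # about 3.10; reject H0 if f >= crit

# Equivalently, a P-value for a computed f:
# p = f.sf(f_computed, v1, v2)
```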
F Distributions and the F Test
The rationale for $\nu_1 = I - 1$ is that although MSTr is based
on the I deviations $\bar{X}_{1\cdot} - \bar{X}_{\cdot\cdot}, \ldots, \bar{X}_{I\cdot} - \bar{X}_{\cdot\cdot}$, we have
$\sum_{i=1}^{I} (\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot}) = 0$, so only I – 1 of these are freely determined.
Because each sample contributes J – 1 df to MSE and
these samples are independent,
$$\nu_2 = (J - 1) + \cdots + (J - 1) = I(J - 1)$$
Sums of Squares
The introduction of sums of squares facilitates developing
an intuitive appreciation for the rationale underlying
single-factor and multifactor ANOVAs.
Let xi represent the sum (not the average, since there is no
bar) of the xij’s for i fixed (sum of the numbers in the ith row
of the table) and x denote the sum of all the xij’s
(the grand total).
Sums of Squares
Definition
The total sum of squares (SST), treatment sum of
squares (SSTr), and error sum of squares (SSE) are
given by

$$\text{SST} = \sum_{i=1}^{I} \sum_{j=1}^{J} (x_{ij} - \bar{x}_{\cdot\cdot})^2$$

$$\text{SSTr} = \sum_{i=1}^{I} \sum_{j=1}^{J} (\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot})^2 = J \sum_{i=1}^{I} (\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot})^2$$

$$\text{SSE} = \sum_{i=1}^{I} \sum_{j=1}^{J} (x_{ij} - \bar{x}_{i\cdot})^2$$
Sums of Squares
Fundamental Identity

$$\text{SST} = \text{SSTr} + \text{SSE} \tag{10.1}$$

Thus if any two of the sums of squares are computed, the
third can be obtained through (10.1); SST and SSTr are
easiest to compute, and then SSE = SST – SSTr. The proof
follows from squaring both sides of the relationship

$$x_{ij} - \bar{x}_{\cdot\cdot} = (x_{ij} - \bar{x}_{i\cdot}) + (\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot}) \tag{10.2}$$

and summing over all i and j; the cross-product term vanishes
because $\sum_j (x_{ij} - \bar{x}_{i\cdot}) = 0$ for each i.
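The identity is easy to check numerically; a minimal sketch on the same hypothetical data array used earlier:

```python
import numpy as np

def sums_of_squares(x):
    """Return (SST, SSTr, SSE) for an I x J data array."""
    sample_means = x.mean(axis=1, keepdims=True)   # x-bar_i.
    grand_mean = x.mean()                          # x-bar_..
    sst = np.sum((x - grand_mean) ** 2)
    sstr = x.shape[1] * np.sum((sample_means - grand_mean) ** 2)
    sse = np.sum((x - sample_means) ** 2)
    return sst, sstr, sse

sst, sstr, sse = sums_of_squares(np.array([[10.2,  9.8, 11.1, 10.5],
                                           [12.0, 11.4, 12.3, 11.9],
                                           [ 9.1,  9.5,  8.8,  9.9]]))
print(np.isclose(sst, sstr + sse))  # True: SST = SSTr + SSE
```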
Sums of Squares
Once SSTr and SSE are computed, each is divided by its
associated df to obtain a mean square (mean in the sense
of average). Then F is the ratio of the two mean squares:

$$\text{MSTr} = \frac{\text{SSTr}}{I - 1} \qquad \text{MSE} = \frac{\text{SSE}}{I(J - 1)} \qquad F = \frac{\text{MSTr}}{\text{MSE}} \tag{10.3}$$
Sums of Squares
The computations are often summarized in a tabular
format, called an ANOVA table, as displayed in Table 10.2.
Tables produced by statistical software customarily include
a P-value column to the right of f.

Table 10.2  An ANOVA Table

Source of Variation   df         Sum of Squares   Mean Square            f
Treatments            I - 1      SSTr             MSTr = SSTr/(I - 1)    MSTr/MSE
Error                 I(J - 1)   SSE              MSE = SSE/(I(J - 1))
Total                 IJ - 1     SST
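Statistical software assembles this table automatically; as a sketch, the rows can be printed from the sums_of_squares helper defined above, and the f statistic and P-value cross-checked against scipy's f_oneway (the data are the same hypothetical array):

```python
import numpy as np
from scipy.stats import f_oneway

x = np.array([[10.2,  9.8, 11.1, 10.5],
              [12.0, 11.4, 12.3, 11.9],
              [ 9.1,  9.5,  8.8,  9.9]])
I, J = x.shape

sst, sstr, sse = sums_of_squares(x)              # helper defined above
mstr, mse = sstr / (I - 1), sse / (I * (J - 1))
print(f"Treatments: df={I-1}, SS={sstr:.2f}, MS={mstr:.2f}, f={mstr/mse:.2f}")
print(f"Error:      df={I*(J-1)}, SS={sse:.2f}, MS={mse:.2f}")
print(f"Total:      df={I*J-1}, SS={sst:.2f}")

# Cross-check f and the P-value with scipy (one argument per sample):
print(f_oneway(*x))
```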
Multiple Comparisons in ANOVA
When the computed value of the F statistic in single-factor
ANOVA is not significant, the analysis is terminated
because no differences among the $\mu_i$'s have been
identified.

But when $H_0$ is rejected, the investigator will usually want to
know which of the $\mu_i$'s are different from one another.
A method for carrying out this further analysis is called a
multiple comparisons procedure.
Multiple Comparisons in ANOVA
Several of the most frequently used procedures are based
on the following central idea.
First calculate a confidence interval for each pairwise
difference $\mu_i - \mu_j$ with i < j. Thus if I = 4, the six required CIs
would be for $\mu_1 - \mu_2$ (but not also for $\mu_2 - \mu_1$), $\mu_1 - \mu_3$,
$\mu_1 - \mu_4$, $\mu_2 - \mu_3$, $\mu_2 - \mu_4$, and $\mu_3 - \mu_4$.

Then if the interval for $\mu_1 - \mu_2$ does not include 0, conclude
that $\mu_1$ and $\mu_2$ differ significantly from one another; if the
interval does include 0, the two $\mu$'s are judged not
significantly different.
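A minimal sketch of this central idea, using unadjusted two-sample t intervals based on the pooled MSE; note that actual multiple-comparison procedures such as Tukey's replace the t critical value with one that controls the simultaneous confidence level:

```python
import itertools
import numpy as np
from scipy.stats import t

def pairwise_cis(x, alpha=0.05):
    """Naive (unadjusted) CIs for mu_i - mu_j, i < j, from an I x J array."""
    I, J = x.shape
    means = x.mean(axis=1)
    mse = np.mean(x.var(axis=1, ddof=1))        # pooled within-sample variance
    # Half-width: t critical value (df = I(J-1)) times SE of a difference
    half = t.ppf(1 - alpha / 2, I * (J - 1)) * np.sqrt(2 * mse / J)
    for i, j in itertools.combinations(range(I), 2):
        d = means[i] - means[j]
        print(f"mu_{i+1} - mu_{j+1}: ({d - half:.2f}, {d + half:.2f})")
```

In practice one would typically reach for a packaged procedure instead, e.g. pairwise_tukeyhsd from statsmodels.stats.multicomp, which implements Tukey's simultaneous intervals.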