Chapter 13
Analysis of Variance
1
Chapter Outline
An introduction to experimental design and
analysis of variance
Analysis of Variance and the completely
randomized design
2
An Introduction to Experimental Design and
Analysis of Variance
• Statistical studies can be classified as being either
experimental or observational.
• In an experimental study, one or more factors are
controlled so that data can be obtained about how the
factors influence the variables of interest.
• In an observational study, no attempt is made to control the
factors.
• Cause-and-effect relationships are easier to establish in
experimental studies than in observational studies.
• Analysis of variance (ANOVA) can be used to analyze the
data obtained from experimental or observational studies.
3
An Introduction to Experimental Design and
Analysis of Variance
• Three types of experimental designs are
introduced.
  • A completely randomized design
  • A randomized block design
  • A factorial experiment
4
An Introduction to Experimental Design and
Analysis of Variance
• A factor is a variable that the experimenter has
selected for investigation.
• A treatment is a level of a factor.
  • For example, if location is a factor, then a treatment of
  location can be New York, Chicago, or Seattle.
• Experimental units are the objects of interest in the
experiment.
• A completely randomized design is an
experimental design in which the treatments are
randomly assigned to the experimental units.
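As a minimal sketch of a completely randomized design, the snippet below randomly assigns the three location treatments to fifteen experimental units. The unit labels, the equal group size of five, and the helper name `assign_treatments` are illustrative assumptions and do not come from the slides.

```python
import random

def assign_treatments(units, treatments, seed=None):
    """Randomly assign treatments to units in equal-sized groups (hypothetical helper)."""
    if len(units) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # a random order removes any systematic pattern in the assignment
    size = len(shuffled) // len(treatments)
    # Consecutive slices of the shuffled list become the treatment groups.
    return {t: shuffled[i * size:(i + 1) * size] for i, t in enumerate(treatments)}

units = [f"unit_{i}" for i in range(1, 16)]       # illustrative unit labels
treatments = ["New York", "Chicago", "Seattle"]   # treatment levels named on the slide
assignment = assign_treatments(units, treatments, seed=42)
for treatment, group in assignment.items():
    print(treatment, group)
```

Every unit receives exactly one treatment, and chance alone decides which one.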
5
Analysis of Variance: A Conceptual Overview
• Analysis of Variance (ANOVA) can be used to test
for the equality of three or more population
means.
• Data obtained from observational or experimental
studies can be used for the analysis.
• We want to use the sample results to test the
following hypotheses:
$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$
$H_a$: Not all population means are equal
6
Analysis of Variance: A Conceptual Overview
$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$
$H_a$: Not all population means are equal
If H0 is rejected, we cannot conclude that all
population means are different.
Rejecting H0 means that at least two population
means have different values.
7
Analysis of Variance: A Conceptual Overview
Assumptions for Analysis of Variance
For each population, the response (dependent)
variable is normally distributed.
The variance of the response variable, denoted $\sigma^2$,
is the same for all of the populations.
The observations must be independent.
8
Analysis of Variance: A Conceptual Overview
Sampling Distribution of $\bar{x}$ Given H0 is True
[Figure: sampling distribution of $\bar{x}$, centered at $\mu$ with variance $\sigma_{\bar{x}}^2 = \sigma^2 / n$. The sample means $\bar{x}_1$, $\bar{x}_2$, and $\bar{x}_3$ all fall close to $\mu$.]
Sample means are likely to be close to the same population
mean if H0 is true.
If H0 is true, all the populations have the same mean. It is also
assumed that all the populations have the same variance. Therefore,
all the sample means are drawn from the same sampling distribution.
As a result, the sample means tend to be close to one another.
9
Analysis of Variance: A Conceptual Overview
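Sampling Distribution of $\bar{x}$ Given H0 is False
When H0 is false, sample means are drawn from different
populations. As a result, sample means tend NOT to be
close together. Instead, they tend to be close to their own
population means.
[Figure: three separate sampling distributions centered at $\mu_1$, $\mu_2$, and $\mu_3$. Each sample mean $\bar{x}_1$, $\bar{x}_2$, $\bar{x}_3$ falls near its own population mean.]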
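The contrast between the two cases can be illustrated with a small simulation. This is only a sketch: the population means, the standard deviation of 5, the sample size of 5, and the helper names are assumptions chosen for illustration. It measures how far apart the three sample means are, on average, when H0 is true and when it is false.

```python
import random
import statistics

def sample_means(pop_means, sigma, n, rng):
    """Draw one sample of size n from each normal population; return the sample means."""
    return [statistics.mean(rng.gauss(mu, sigma) for _ in range(n)) for mu in pop_means]

def average_spread(pop_means, sigma=5.0, n=5, reps=2000, seed=1):
    """Average range (max - min) of the sample means over many repetitions."""
    rng = random.Random(seed)
    spreads = []
    for _ in range(reps):
        means = sample_means(pop_means, sigma, n, rng)
        spreads.append(max(means) - min(means))
    return statistics.mean(spreads)

spread_h0_true = average_spread([60, 60, 60])    # assumed: equal population means
spread_h0_false = average_spread([55, 68, 57])   # assumed: unequal population means
print(f"H0 true:  average spread of sample means = {spread_h0_true:.2f}")
print(f"H0 false: average spread of sample means = {spread_h0_false:.2f}")
```

With equal population means the sample means cluster together. With unequal means their spread is noticeably larger.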
10
Analysis of Variance
• Between-treatments estimate of population variance
• Within-treatments estimate of population variance
• Comparing the variance estimates: The F test
• ANOVA table
11
Between-Treatments Estimate of Population
Variance $\sigma^2$
• The estimate of $\sigma^2$ based on the variation of the
sample means is called the mean square due to
treatments and is denoted by MSTR.
$$\text{MSTR} = \frac{\sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2}{k - 1}$$
• The numerator is called the sum of squares due to treatments (SSTR).
• The denominator is the degrees of freedom associated with SSTR.
12
Between-Treatments Estimate of Population
Variance $\sigma^2$
$$\text{MSTR} = \frac{\sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2}{k - 1}$$
• $k$ is the number of treatments (number of samples)
• $n_j$ is the number of observations in treatment $j$
• $\bar{x}_j$ is the sample mean of treatment $j$
• $\bar{\bar{x}}$ is the overall mean, i.e. the average value of ALL the
observations from all the treatments:
$$\bar{\bar{x}} = \frac{\sum_{j=1}^{k} \sum_{i=1}^{n_j} x_{ij}}{n_T}$$
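The MSTR formula can be sketched in a few lines of Python. The function name `mstr` and the tiny data set are hypothetical, chosen so the arithmetic is easy to follow by hand. They are not taken from the slides.

```python
def mstr(samples):
    """Mean square due to treatments: SSTR / (k - 1)."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    overall_mean = sum(sum(s) for s in samples) / n_total   # mean of ALL observations
    # SSTR weights each squared deviation of a sample mean by its sample size n_j.
    sstr = sum(len(s) * (sum(s) / len(s) - overall_mean) ** 2 for s in samples)
    return sstr / (k - 1)

# Hypothetical data: three treatments with sample means 2, 3, and 7 (overall mean 4).
samples = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
print(mstr(samples))  # SSTR = 3(2-4)^2 + 3(3-4)^2 + 3(7-4)^2 = 42, so MSTR = 42/2 = 21
```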
13
Within-Treatments Estimate of Population
Variance $\sigma^2$
• The estimate of $\sigma^2$ based on the variation of the
sample observations within each sample is called
the mean square due to error and is denoted by
MSE.
$$\text{MSE} = \frac{\sum_{j=1}^{k} (n_j - 1) s_j^2}{n_T - k}$$
• The numerator is called the sum of squares due to error (SSE).
• The denominator is the degrees of freedom associated with SSE.
14
Within-Treatments Estimate of Population
Variance $\sigma^2$
$$\text{MSE} = \frac{\sum_{j=1}^{k} (n_j - 1) s_j^2}{n_T - k}$$
• $k$ is the number of treatments (number of samples)
• $n_j$ is the number of observations in treatment $j$
• $s_j^2$ is the sample variance of treatment $j$
• $n_T$ is the total number of ALL the observations from all the
treatments:
$$n_T = \sum_{j=1}^{k} n_j$$
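A matching sketch for MSE pools the sample variances, weighting each by its degrees of freedom. The function name `mse` and the data are hypothetical. `statistics.variance` from the Python standard library computes the sample variance with an n - 1 divisor.

```python
import statistics

def mse(samples):
    """Mean square due to error: SSE / (n_T - k)."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    # Each sample contributes (n_j - 1) * s_j^2 to the sum of squares due to error.
    sse = sum((len(s) - 1) * statistics.variance(s) for s in samples)
    return sse / (n_total - k)

# Hypothetical data: every sample variance is 1, so SSE = 2 + 2 + 2 = 6 and MSE = 6/6 = 1.
samples = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
print(mse(samples))
```

Shifting a whole sample up or down changes its mean but not its variance, which is why MSE does not depend on whether the population means are equal.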
15
Comparing the Variance Estimates: The F Test
• Because the within-treatments estimate (MSE) of $\sigma^2$ only
involves sample variances, all of which are unbiased
estimates of the population variance (according to the
assumptions, all the population variances are the same),
MSE is a good estimate of the population variance regardless
of whether H0 is true or not.
• On the other hand, the between-treatments estimate
(MSTR), which uses sample means, will be a good
estimate of $\sigma^2$ only if H0 is true, since all the sample means are
drawn from the same population when H0 is true.
• When H0 is false, the sample means are drawn from
different populations (with different $\mu$). Therefore, MSTR
will overestimate $\sigma^2$ since the sample means will not be
close together.
16
Comparing the Variance Estimates: The F Test
• If the null hypothesis is true and the ANOVA assumptions
are valid, the sampling distribution of MSTR/MSE is an F
distribution with numerator (MSTR) degrees of freedom (d.f.) equal to
k - 1 and denominator (MSE) d.f. equal to nT - k.
• If H0 is true, MSTR/MSE should be close to 1 since both
are good estimates of $\sigma^2$.
• If H0 is false, i.e. if the means of the k populations are not
equal, the ratio MSTR/MSE will tend to be larger than 1 since
MSTR overestimates $\sigma^2$.
• Hence, we will reject H0 if the value of MSTR/MSE
proves to be too large to have resulted by chance
from the appropriate F distribution.
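The behavior of MSTR/MSE can be checked with a small simulation. This is a sketch under assumed settings (three normal populations, standard deviation 5, five observations per sample). The helper names are hypothetical. It averages the ratio over many repeated samples, once with equal population means and once with unequal ones.

```python
import random
import statistics

def f_ratio(samples):
    """Compute MSTR/MSE for a list of samples."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    overall_mean = sum(sum(s) for s in samples) / n_total
    sstr = sum(len(s) * (statistics.mean(s) - overall_mean) ** 2 for s in samples)
    sse = sum((len(s) - 1) * statistics.variance(s) for s in samples)
    return (sstr / (k - 1)) / (sse / (n_total - k))

def mean_f_ratio(pop_means, sigma=5.0, n=5, reps=2000, seed=7):
    """Average MSTR/MSE over repeated samples from normal populations."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(reps):
        samples = [[rng.gauss(mu, sigma) for _ in range(n)] for mu in pop_means]
        ratios.append(f_ratio(samples))
    return statistics.mean(ratios)

ratio_h0_true = mean_f_ratio([60, 60, 60])    # assumed: H0 true
ratio_h0_false = mean_f_ratio([55, 68, 57])   # assumed: H0 false
print(f"H0 true:  average MSTR/MSE = {ratio_h0_true:.2f}")
print(f"H0 false: average MSTR/MSE = {ratio_h0_false:.2f}")
```

When H0 is true the average ratio stays near 1. The theoretical mean of an F distribution is d2/(d2 - 2), which is 1.2 for 12 denominator d.f. When H0 is false the ratio is inflated well above that.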
17
Comparing the Variance Estimates: The F Test
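Sampling Distribution of MSTR/MSE
[Figure: sampling distribution of MSTR/MSE (an F distribution), with MSTR/MSE on the horizontal axis. The critical value $F_\alpha$ cuts off an upper-tail area of $\alpha$. Do not reject H0 to the left of the critical value; reject H0 to the right.]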
18
ANOVA Table
Source of     Sum of     Degrees of   Mean
Variation     Squares    Freedom      Square                  F           p-Value
Treatments    SSTR       k - 1        MSTR = SSTR/(k - 1)     MSTR/MSE
Error         SSE        nT - k       MSE = SSE/(nT - k)
Total         SST        nT - 1

SST is partitioned into SSTR and SSE.
SST’s degrees of freedom (d.f.) are partitioned into
SSTR’s d.f. and SSE’s d.f.
19
ANOVA Table
SST divided by its degrees of freedom nT – 1 is the
overall sample variance that would be obtained if we
treated the entire set of observations as one data set.
With the entire data set as one sample, the formula
for computing the total sum of squares, SST, is:
$$\text{SST} = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{\bar{x}})^2 = \text{SSTR} + \text{SSE}$$
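The reason SST splits exactly into SSTR and SSE can be sketched as follows. Each deviation from the overall mean is written as a within-treatment part plus a between-treatment part. After squaring and summing, the cross term vanishes because the deviations of the observations in a sample from their own sample mean sum to zero. The notation follows the slides; the derivation itself is a standard one added here for illustration.

```latex
\begin{aligned}
x_{ij} - \bar{\bar{x}} &= (x_{ij} - \bar{x}_j) + (\bar{x}_j - \bar{\bar{x}}) \\
\sum_{j=1}^{k}\sum_{i=1}^{n_j} (x_{ij} - \bar{\bar{x}})^2
  &= \sum_{j=1}^{k}\sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2
   + \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2
   + 2\sum_{j=1}^{k} (\bar{x}_j - \bar{\bar{x}})
      \underbrace{\sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)}_{=\,0} \\
  &= \underbrace{\sum_{j=1}^{k} (n_j - 1)\, s_j^2}_{\text{SSE}}
   + \underbrace{\sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2}_{\text{SSTR}}
\end{aligned}
```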
20
ANOVA Table
ANOVA can be viewed as the process of partitioning
the total sum of squares and the degrees of freedom
into their corresponding sources: treatments and error.
Dividing the sum of squares by the appropriate
degrees of freedom provides the variance estimates.
The F value (MSTR/MSE) is used to test the hypothesis
of equal population means.
21
Test for the Equality of k Population Means
• Hypotheses
$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$
$H_a$: Not all population means are equal
• Test Statistic
F = MSTR/MSE
22
Test for the Equality of k Population Means
• Rejection Rule
p-value Approach:
Reject H0 if p-value < $\alpha$
Critical Value Approach:
Reject H0 if F > $F_\alpha$
where the value of $F_\alpha$ is based on an
F distribution with k - 1 numerator d.f.
and nT - k denominator d.f.
23
Test for the Equality of k Population Means:
An Observational Study
• Example: Reed Manufacturing
Janet Reed would like to know if there is any
significant difference in the mean number of hours
worked per week for the department managers at
her three manufacturing plants (in Buffalo,
Pittsburgh, and Detroit). An F test will be conducted
using $\alpha$ = .05.
24
Test for the Equality of k Population Means:
An Observational Study
• Example: Reed Manufacturing
A simple random sample of five managers from
each of the three plants was taken and the number of
hours worked by each manager in the previous week
is shown on the next slide.
Factor . . . Manufacturing plant
Treatments . . . Buffalo, Pittsburgh, Detroit
Experimental units . . . Managers
Response variable . . . Number of hours worked
25
Test for the Equality of k Population Means:
An Observational Study
                   Plant 1    Plant 2       Plant 3
Observation        Buffalo    Pittsburgh    Detroit
1                  48         73            51
2                  54         63            63
3                  57         66            61
4                  54         64            54
5                  62         74            56
Sample Mean        55         68            57
Sample Variance    26.0       26.5          24.5
26
Test for the Equality of k Population Means:
An Observational Study
1. Develop the hypotheses.
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a$: Not all the means are equal
where:
$\mu_1$ = mean number of hours worked per
week by the managers at Plant 1
$\mu_2$ = mean number of hours worked per
week by the managers at Plant 2
$\mu_3$ = mean number of hours worked per
week by the managers at Plant 3
27
Test for the Equality of k Population Means:
An Observational Study
2. Specify the level of significance.
$\alpha$ = .05
3. Compute the value of the test statistic.
Mean Square Due to Treatments
(The overall mean equals the average of the sample means
only when the sample sizes are all equal.)
$\bar{\bar{x}} = (\bar{x}_1 + \bar{x}_2 + \bar{x}_3)/3 = (55 + 68 + 57)/3 = 60$
$\text{SSTR} = 5(55 - 60)^2 + 5(68 - 60)^2 + 5(57 - 60)^2 = 490$
MSTR = 490/(3 - 1) = 245
28
Test for the Equality of k Population Means:
An Observational Study
3. Compute the value of the test statistic. (cont.)
Mean Square Due to Error
SSE = 4(26.0) + 4(26.5) + 4(24.5) = 308
MSE = 308/(15 - 3) = 25.667
F = MSTR/MSE = 245/25.667 = 9.55
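Step 3 can be reproduced from the raw observations with the Python standard library. This is a sketch for checking the arithmetic. The variable names are chosen here for illustration, and the data are the hours listed in the Reed Manufacturing table.

```python
import statistics

# Hours worked by the five sampled managers at each plant (data from the slides).
hours = {
    "Buffalo":    [48, 54, 57, 54, 62],
    "Pittsburgh": [73, 63, 66, 64, 74],
    "Detroit":    [51, 63, 61, 54, 56],
}

k = len(hours)                                        # number of treatments
n_total = sum(len(v) for v in hours.values())         # total number of observations
overall_mean = sum(sum(v) for v in hours.values()) / n_total

# Between-treatments and within-treatments sums of squares.
sstr = sum(len(v) * (statistics.mean(v) - overall_mean) ** 2 for v in hours.values())
sse = sum((len(v) - 1) * statistics.variance(v) for v in hours.values())

mstr = sstr / (k - 1)
mse = sse / (n_total - k)
f_stat = mstr / mse

print(f"SSTR = {sstr:.0f}, MSTR = {mstr:.0f}")   # SSTR = 490, MSTR = 245
print(f"SSE = {sse:.0f}, MSE = {mse:.3f}")       # SSE = 308, MSE = 25.667
print(f"F = {f_stat:.2f}")                       # F = 9.55
```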
29
Test for the Equality of k Population Means:
An Observational Study
• ANOVA Table
Source of    Sum of     Degrees of   Mean
Variation    Squares    Freedom      Square    F       p-Value
Treatment    490        2            245       9.55    .0033
Error        308        12           25.667
Total        798        14
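The Total row can be verified independently by treating all fifteen observations as one data set. The sketch below computes SST directly, computes SSTR from the sample means, and obtains SSE by subtraction. The layout of the printed table is an illustrative choice, not the format of any particular software package.

```python
hours = [
    [48, 54, 57, 54, 62],   # Buffalo
    [73, 63, 66, 64, 74],   # Pittsburgh
    [51, 63, 61, 54, 56],   # Detroit
]

k = len(hours)
all_obs = [x for sample in hours for x in sample]
n_total = len(all_obs)
overall_mean = sum(all_obs) / n_total

# SST uses every observation; SSTR uses only the sample means and sample sizes.
sst = sum((x - overall_mean) ** 2 for x in all_obs)
sstr = sum(len(s) * (sum(s) / len(s) - overall_mean) ** 2 for s in hours)
sse = sst - sstr   # partition: SST = SSTR + SSE

rows = [
    ("Treatment", sstr, k - 1),
    ("Error", sse, n_total - k),
    ("Total", sst, n_total - 1),
]
print(f"{'Source':<10}{'SS':>8}{'d.f.':>6}{'MS':>10}")
for source, ss, df in rows:
    ms = "" if source == "Total" else f"{ss / df:.3f}"   # no mean square for the Total row
    print(f"{source:<10}{ss:>8.0f}{df:>6}{ms:>10}")
```

The degrees of freedom partition the same way as the sums of squares: 2 + 12 = 14.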
30
Test for the Equality of k Population Means:
An Observational Study
• p-Value Approach
4. Compute the p-value.
With 2 numerator d.f. and 12 denominator d.f.,
the p-value is .0033 for F = 9.55.
5. Determine whether to reject H0.
The p-value < .05, so we reject H0.
We have sufficient evidence to conclude that the
mean number of hours worked per week by
department managers is not the same at all 3 plants.
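The p-value in step 4 can be checked without statistical tables. When the numerator has exactly 2 d.f., the upper-tail area of the F distribution has the closed form (1 + 2F/d2)^(-d2/2), where d2 is the denominator d.f. The sketch below relies on that special case only. The function name is hypothetical, and for other numerator d.f. a statistics library would be needed (for example, SciPy provides `scipy.stats.f.sf`).

```python
def f_pvalue_df1_2(f_stat, df2):
    """Upper-tail p-value of an F distribution with 2 numerator d.f. (special case only)."""
    return (1 + 2 * f_stat / df2) ** (-df2 / 2)

f_stat = 245 / (308 / 12)   # MSTR/MSE from the slides, about 9.55
alpha = 0.05
p_value = f_pvalue_df1_2(f_stat, df2=12)

print(f"p-value = {p_value:.4f}")   # p-value = 0.0033
print("Reject H0" if p_value < alpha else "Do not reject H0")
```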
31
Test for the Equality of k Population Means:
An Observational Study
• Critical Value Approach
4. Determine the critical value and rejection rule.
Based on an F distribution with 2 numerator
d.f. and 12 denominator d.f., $F_{.05}$ = 3.89.
Reject H0 if F > 3.89 (critical value)
5. Determine whether to reject H0.
Because F = 9.55 > 3.89, we reject H0.
We have sufficient evidence to conclude that the
mean number of hours worked per week by
department managers is not the same at all 3 plants.
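The critical value can likewise be reproduced for this example. With exactly 2 numerator d.f., the upper-tail area of the F distribution is (1 + 2F/d2)^(-d2/2). Setting that equal to alpha and solving for F gives F_alpha = (d2/2)(alpha^(-2/d2) - 1). The sketch below uses only that special case. The function name is hypothetical, and a general-purpose routine such as SciPy's `scipy.stats.f.ppf` would be the usual choice for other degrees of freedom.

```python
def f_critical_df1_2(alpha, df2):
    """Upper-tail critical value of an F distribution with 2 numerator d.f. (special case only)."""
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)

f_stat = 245 / (308 / 12)                     # MSTR/MSE from the slides, about 9.55
f_crit = f_critical_df1_2(alpha=0.05, df2=12)

print(f"Critical value = {f_crit:.2f}")       # Critical value = 3.89
print("Reject H0" if f_stat > f_crit else "Do not reject H0")
```

Both approaches lead to the same decision, because F exceeds the critical value exactly when the p-value falls below alpha.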
32