Testing and Estimating Variances


Comparing k Populations
Means – One-way Analysis of Variance (ANOVA)
The F test – for comparing k means
Situation
• We have k normal populations
• Let $\mu_i$ and $\sigma_i$ denote the mean and standard deviation of population i.
• i = 1, 2, 3, …, k.
• Note: we assume that the standard deviation for each population is the same:
$\sigma_1 = \sigma_2 = \dots = \sigma_k = \sigma$
We want to test
$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$
against
$H_A: \mu_i \ne \mu_j$ for at least one pair i, j
The data
• Assume we have collected data from each of the k populations
• Let $x_{i1}, x_{i2}, x_{i3}, \dots$ denote the $n_i$ observations from population i.
• i = 1, 2, 3, …, k.
Let
\[ \bar{x}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}, \qquad s_i = \sqrt{\frac{\sum_{j=1}^{n_i}\left(x_{ij} - \bar{x}_i\right)^2}{n_i - 1}} \]
One possible solution (incorrect)
• Choose the populations two at a time
• then perform a two-sample t test of
$H_0: \mu_i = \mu_j$ vs $H_A: \mu_i \ne \mu_j$
\[ t = \frac{\bar{x} - \bar{y}}{s_{\text{pooled}}\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}} \]
• Repeat this for every possible pair of
populations
• The flaw with this procedure is that you are
performing a collection of tests rather than a
single test
• If each test is performed with α = 0.05, then the probability that any one test makes a type I error is 5%, but the probability that at least one test in the group makes a type I error could be considerably higher than 5%.
• i.e. suppose there is no difference in the means of the populations. The chance that this procedure declares a significant difference could be considerably higher than 5%.
The Bonferroni inequality
If N tests are performed, each with significance level α, then
P[group of N tests makes a type I error] ≤ Nα
If the N tests are independent, this probability is exactly 1 – (1 – α)^N, which never exceeds Nα.
Example:
Suppose α = 0.05, N = 10; then for independent tests
P[group of N tests makes a type I error]
= 1 – (0.95)^10 ≈ 0.40
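The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch that assumes the N tests are independent; the function names are illustrative and do not come from the slides, and the bound Nα is included only for comparison.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """P(at least one type I error) among n_tests independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_bound(alpha: float, n_tests: int) -> float:
    """Upper bound N * alpha, valid whether or not the tests are independent."""
    return min(1.0, n_tests * alpha)


# The slide's example: alpha = 0.05, N = 10.
fwer = familywise_error_rate(0.05, 10)
bound = bonferroni_bound(0.05, 10)
print(f"independent tests: {fwer:.4f}, upper bound: {bound:.2f}")
```

With ten tests the chance of at least one false rejection is already about eight times the nominal 5% level, which is the motivation for a single overall test.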
For this reason we are going to consider a single test for testing:
$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$
against
$H_A: \mu_i \ne \mu_j$ for at least one pair i, j
Note: If k = 10, the number of pairs of means (and hence the number of tests that would have to be performed) is:
\[ \binom{10}{2} = {}_{10}C_2 = \frac{10 \cdot 9}{2 \cdot 1} = 45 \]
The F test
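To test
$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$
against $H_A: \mu_i \ne \mu_j$ for at least one pair i, j
use the test statistic
\[ F = \frac{s^2_{\text{Between}}}{s^2_{\text{Pooled}}} = \frac{\displaystyle\sum_{i=1}^{k} n_i\left(\bar{x}_i - \bar{x}\right)^2 \Big/ (k - 1)}{\displaystyle\sum_{i=1}^{k} (n_i - 1)s_i^2 \Big/ \left(\sum_{i=1}^{k} n_i - k\right)} \]
where
$\bar{x}_i$ = mean for the i-th sample,
$s_i$ = standard deviation for the i-th sample,
\[ \bar{x} = \frac{n_1\bar{x}_1 + \dots + n_k\bar{x}_k}{n_1 + \dots + n_k} = \text{overall mean} \]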
The statistic
\[ \sum_{i=1}^{k} n_i\left(\bar{x}_i - \bar{x}\right)^2 \]
is called the Between Sum of Squares and is denoted by SSBetween.
It measures the variability between samples.
k – 1 is known as the Between degrees of freedom, and
\[ \sum_{i=1}^{k} n_i\left(\bar{x}_i - \bar{x}\right)^2 \Big/ (k - 1) \]
is called the Between Mean Square and is denoted by MSBetween.
The statistic
\[ \sum_{i=1}^{k} (n_i - 1)s_i^2 \]
is called the Within Sum of Squares and is denoted by SSWithin.
\[ \sum_{i=1}^{k} n_i - k = N - k \]
is known as the Within degrees of freedom, and
\[ \sum_{i=1}^{k} (n_i - 1)s_i^2 \Big/ \left(\sum_{i=1}^{k} n_i - k\right) \]
is called the Within Mean Square and is denoted by MSWithin.
Then
\[ F = \frac{MS_{\text{Between}}}{MS_{\text{Within}}} \]
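The definitions of SSBetween, SSWithin and the two mean squares translate directly into code. The sketch below uses only the Python standard library; the function name and the small data set are made up for illustration and are not part of the slides.

```python
from statistics import mean, variance


def one_way_anova_f(groups):
    """F = MSBetween / MSWithin computed from the definitional formulas.

    groups: list of samples (each a list with at least two numbers).
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    means = [mean(g) for g in groups]

    # Overall mean = weighted average of the sample means.
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / n_total

    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    # variance() is the sample variance s_i^2 (divisor n_i - 1).
    ss_within = sum((n - 1) * variance(g) for n, g in zip(sizes, groups))

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical toy data: three samples of size three.
toy = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(one_way_anova_f(toy))
```

For the toy data the sample means are 2, 3 and 6, SSBetween is 26 on 2 d.f. and SSWithin is 6 on 6 d.f., so F = 13 / 1 = 13.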
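The Computing formula for F:
Compute
1) $T_i = \sum_{j=1}^{n_i} x_{ij}$ = Total for sample i
2) $G = \sum_{i=1}^{k} T_i = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}$ = Grand Total
3) $N = \sum_{i=1}^{k} n_i$ = Total sample size
4) $\sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2$
5) $\sum_{i=1}^{k} \dfrac{T_i^2}{n_i}$
Then
1) $SS_{\text{Between}} = \sum_{i=1}^{k} \dfrac{T_i^2}{n_i} - \dfrac{G^2}{N}$
2) $SS_{\text{Within}} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2 - \sum_{i=1}^{k} \dfrac{T_i^2}{n_i}$
3) $F = \dfrac{SS_{\text{Between}} / (k - 1)}{SS_{\text{Within}} / (N - k)}$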
The critical region for the F test
We reject
$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$
if
$F \ge F_\alpha$
where $F_\alpha$ is the critical point under the F distribution with $\nu_1 = k - 1$ degrees of freedom in the numerator and $\nu_2 = N - k$ degrees of freedom in the denominator.
Example
In the following example we are comparing weight
gains resulting from the following six diets
1. Diet 1 - High Protein , Beef
2. Diet 2 - High Protein , Cereal
3. Diet 3 - High Protein , Pork
4. Diet 4 - Low protein , Beef
5. Diet 5 - Low protein , Cereal
6. Diet 6 - Low protein , Pork
Gains in weight (grams) for rats under six diets
differing in level of protein (High or Low)
and source of protein (Beef, Cereal, or Pork)
Diet      | 1      | 2     | 3      | 4     | 5     | 6
          | 73     | 98    | 94     | 90    | 107   | 49
          | 102    | 74    | 79     | 76    | 95    | 82
          | 118    | 56    | 96     | 90    | 97    | 73
          | 104    | 111   | 98     | 64    | 80    | 86
          | 81     | 95    | 102    | 86    | 98    | 81
          | 107    | 88    | 102    | 51    | 74    | 97
          | 100    | 82    | 108    | 72    | 74    | 106
          | 87     | 77    | 91     | 90    | 67    | 70
          | 117    | 86    | 120    | 95    | 89    | 61
          | 111    | 92    | 105    | 78    | 58    | 82
Mean      | 100.0  | 85.9  | 99.5   | 79.2  | 83.9  | 78.7
Std. Dev. | 15.14  | 15.02 | 10.92  | 13.89 | 15.71 | 16.55
Σx        | 1000   | 859   | 995    | 792   | 839   | 787
Σx²       | 102062 | 75819 | 100075 | 64462 | 72613 | 64401
Hence
i   | 1    | 2   | 3   | 4   | 5   | 6   | Total (G)
T_i | 1000 | 859 | 995 | 792 | 839 | 787 | 5272
\[ N = \sum_{i=1}^{k} n_i = \text{Total sample size} = 60 \]
\[ \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2 = 479432 \]
\[ \sum_{i=1}^{k} \frac{T_i^2}{n_i} = 467846 \]
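Thus
\[ SS_{\text{Between}} = \sum_{i=1}^{k} \frac{T_i^2}{n_i} - \frac{G^2}{N} = 467846 - \frac{5272^2}{60} = 4612.933 \]
\[ SS_{\text{Within}} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2 - \sum_{i=1}^{k} \frac{T_i^2}{n_i} = 479432 - 467846 = 11586 \]
\[ F = \frac{SS_{\text{Between}} / (k - 1)}{SS_{\text{Within}} / (N - k)} = \frac{4612.933 / 5}{11586 / 54} = \frac{922.6}{214.56} = 4.3 \]
$F_{0.05} = 2.386$ with $\nu_1 = 5$ and $\nu_2 = 54$
Thus since F > 2.386 we reject H0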
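The computing formula can be applied to the raw diet data in a few lines. The following is a sketch using only the Python standard library; the function and variable names are illustrative, and the critical value 2.386 is taken from the slide rather than computed.

```python
# Weight gains (grams) for the six diets, ten rats per diet.
diets = [
    [73, 102, 118, 104, 81, 107, 100, 87, 117, 111],  # Diet 1
    [98, 74, 56, 111, 95, 88, 82, 77, 86, 92],        # Diet 2
    [94, 79, 96, 98, 102, 102, 108, 91, 120, 105],    # Diet 3
    [90, 76, 90, 64, 86, 51, 72, 90, 95, 78],         # Diet 4
    [107, 95, 97, 80, 98, 74, 74, 67, 89, 58],        # Diet 5
    [49, 82, 73, 86, 81, 97, 106, 70, 61, 82],        # Diet 6
]


def anova_computing_formula(groups):
    """Return (SSBetween, SSWithin, F) using totals and sums of squares."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)                     # N
    grand_total = sum(sum(g) for g in groups)                 # G
    sum_of_squares = sum(x * x for g in groups for x in g)    # sum of x_ij^2
    sum_t2_over_n = sum(sum(g) ** 2 / len(g) for g in groups) # sum of T_i^2 / n_i

    ss_between = sum_t2_over_n - grand_total ** 2 / n_total
    ss_within = sum_of_squares - sum_t2_over_n
    f_ratio = (ss_between / (k - 1)) / (ss_within / (n_total - k))
    return ss_between, ss_within, f_ratio


ss_between, ss_within, f_ratio = anova_computing_formula(diets)
F_CRITICAL = 2.386  # F_0.05 with 5 and 54 d.f., as quoted on the slide
print(f"SSBetween = {ss_between:.3f}, SSWithin = {ss_within:.3f}, F = {f_ratio:.2f}")
print("reject H0" if f_ratio > F_CRITICAL else "do not reject H0")
```

The sums of squares agree with the hand calculation (4612.933 and 11586), and F is about 4.30.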
The ANOVA Table
A convenient method for displaying the calculations for the F-test
Anova Table
Source  | d.f.  | Sum of Squares | Mean Square | F-ratio
Between | k - 1 | SSBetween      | MSBetween   | MSB / MSW
Within  | N - k | SSWithin       | MSWithin    |
Total   | N - 1 | SSTotal        |             |
The Diet Example
Source  | d.f. | Sum of Squares | Mean Square | F-ratio
Between | 5    | 4612.933       | 922.587     | 4.3 (p = 0.0023)
Within  | 54   | 11586.000      | 214.556     |
Total   | 59   | 16198.933      |             |
Equivalence of the F-test and the t-test
when k = 2
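the t-test
\[ t = \frac{\bar{x} - \bar{y}}{s_{\text{Pooled}}\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}}, \qquad s_{\text{Pooled}} = \sqrt{\frac{(n - 1)s_x^2 + (m - 1)s_y^2}{n + m - 2}} \]
the F-test
\[ F = \frac{s^2_{\text{Between}}}{s^2_{\text{Pooled}}} = \frac{\displaystyle\sum_{i=1}^{k} n_i\left(\bar{x}_i - \bar{x}\right)^2 \Big/ (k - 1)}{\displaystyle\sum_{i=1}^{k} (n_i - 1)s_i^2 \Big/ \left(\sum_{i=1}^{k} n_i - k\right)} = \frac{n_1\left(\bar{x}_1 - \bar{x}\right)^2 + n_2\left(\bar{x}_2 - \bar{x}\right)^2}{\left[(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2\right] / (n_1 + n_2 - 2)} \]
denominator $= s^2_{\text{Pooled}}$
numerator $= n_1\left(\bar{x}_1 - \bar{x}\right)^2 + n_2\left(\bar{x}_2 - \bar{x}\right)^2$
\[ n_1\left(\bar{x}_1 - \bar{x}\right)^2 = n_1\left(\bar{x}_1 - \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}\right)^2 = \frac{n_1 n_2^2}{(n_1 + n_2)^2}\left(\bar{x}_1 - \bar{x}_2\right)^2 \]
\[ n_2\left(\bar{x}_2 - \bar{x}\right)^2 = n_2\left(\bar{x}_2 - \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}\right)^2 = \frac{n_1^2 n_2}{(n_1 + n_2)^2}\left(\bar{x}_1 - \bar{x}_2\right)^2 \]
\[ n_1\left(\bar{x}_1 - \bar{x}\right)^2 + n_2\left(\bar{x}_2 - \bar{x}\right)^2 = \frac{n_1 n_2^2 + n_1^2 n_2}{(n_1 + n_2)^2}\left(\bar{x}_1 - \bar{x}_2\right)^2 = \frac{n_1 n_2}{n_1 + n_2}\left(\bar{x}_1 - \bar{x}_2\right)^2 = \frac{\left(\bar{x}_1 - \bar{x}_2\right)^2}{\dfrac{1}{n_1} + \dfrac{1}{n_2}} \]
Hence
\[ F = \frac{\left(\bar{x}_1 - \bar{x}_2\right)^2}{\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)s^2_{\text{Pooled}}} = t^2 \]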
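The identity F = t² for k = 2 can also be checked numerically. The sketch below uses the Diet 1 and Diet 2 samples from the example purely as convenient data; the function names are illustrative, not taken from the slides.

```python
from math import sqrt
from statistics import mean, variance

diet1 = [73, 102, 118, 104, 81, 107, 100, 87, 117, 111]
diet2 = [98, 74, 56, 111, 95, 88, 82, 77, 86, 92]


def pooled_variance(x, y):
    """s^2_Pooled for two samples (equal-variance assumption)."""
    n, m = len(x), len(y)
    return ((n - 1) * variance(x) + (m - 1) * variance(y)) / (n + m - 2)


def pooled_t(x, y):
    """Two-sample t statistic with pooled standard deviation."""
    n, m = len(x), len(y)
    return (mean(x) - mean(y)) / sqrt(pooled_variance(x, y) * (1 / n + 1 / m))


def two_group_f(x, y):
    """One-way ANOVA F statistic for k = 2 groups (so k - 1 = 1)."""
    n, m = len(x), len(y)
    grand_mean = (sum(x) + sum(y)) / (n + m)
    ss_between = n * (mean(x) - grand_mean) ** 2 + m * (mean(y) - grand_mean) ** 2
    return ss_between / pooled_variance(x, y)


t = pooled_t(diet1, diet2)
f = two_group_f(diet1, diet2)
print(f"t = {t:.4f}, t^2 = {t * t:.4f}, F = {f:.4f}")
```

The two printed values t² and F agree up to floating-point rounding, as the algebra predicts.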
Using SPSS
Note: The use of another statistical package
such as Minitab is similar to using SPSS
Assume the data is contained in an Excel file
Each variable is in a column
1. Weight gain (wtgn)
2. diet
3. Source of protein (Source)
4. Level of Protein (Level)
After starting the SPSS program, a start-up dialogue box appears.
If you select Opening an existing file and press OK, a file-selection dialogue box appears, followed by a second dialogue box for reading the file.
If the variable names are in the file, ask the program to read the names. If you do not specify the Range, the program will identify the Range.
Once you click OK, two windows will appear: one that will contain the output, and the other containing the data.
To perform ANOVA select Analyze -> General Linear Model -> Univariate.
In the dialogue box that appears, select the dependent variable and the fixed factors.
Press OK to perform the analysis.
The Output
Tests of Between-Subjects Effects
Dependent Variable: wtgn
Source          | Type III Sum of Squares | df | Mean Square | F        | Sig.
Corrected Model | 4612.933(a)             | 5  | 922.587     | 4.300    | .002
Intercept       | 463233.067              | 1  | 463233.067  | 2159.036 | .000
diet            | 4612.933                | 5  | 922.587     | 4.300    | .002
Error           | 11586.000               | 54 | 214.556     |          |
Total           | 479432.000              | 60 |             |          |
Corrected Total | 16198.933               | 59 |             |          |
a R Squared = .285 (Adjusted R Squared = .219)
Comments
• The F-test tests H0: μ1 = μ2 = μ3 = … = μk against HA: at least one pair of means is different.
• If H0 is accepted, we conclude that the means are not significantly different (this does not prove that they are all equal).
• If H0 is rejected, we conclude that at least one pair of means is significantly different.
• The F-test gives no information about which pairs of means are different.
• One can now use two-sample t tests to determine which pairs of means are significantly different.
Fisher's LSD (least significant difference) procedure:
1. Test H0: μ1 = μ2 = μ3 = … = μk against HA: at least one pair of means is different, using the ANOVA F-test.
2. If H0 is accepted, conclude that the means are not significantly different, and stop.
3. If H0 is rejected, conclude that at least one pair of means is significantly different, then follow this by
• using two-sample t tests to determine which pairs of means are significantly different.
Example
In the following example we are comparing weight
gains resulting from the following six diets
1. Diet 1 - High Protein , Beef
2. Diet 2 - High Protein , Cereal
3. Diet 3 - High Protein , Pork
4. Diet 4 - Low protein , Beef
5. Diet 5 - Low protein , Cereal
6. Diet 6 - Low protein , Pork
The ANOVA Table
Source  | d.f. | Sum of Squares | Mean Square | F-ratio
Between | 5    | 4612.933       | 922.587     | 4.3 (p = 0.0023)
Within  | 54   | 11586.000      | 214.556     |
Total   | 59   | 16198.933      |             |
$F_{0.05} = 2.386$ with $\nu_1 = 5$ and $\nu_2 = 54$
Thus since F > 2.386 we reject H0
Conclusion: There are significant differences
amongst the k = 6 means
Now we want to perform t tests to compare the k = 6 means
\[ t = \frac{\bar{x}_i - \bar{x}_j}{s_{\text{pooled}}\sqrt{\dfrac{1}{n_i} + \dfrac{1}{n_j}}} \]
with $t_{0.025} = 2.005$ for 54 d.f. and $s_{\text{pooled}} = \sqrt{MS_{\text{Within}}}$
Table of means
Level  | High  | High   | High | Low  | Low    | Low
Source | Beef  | Cereal | Pork | Beef | Cereal | Pork
Diet   | 1     | 2      | 3    | 4    | 5      | 6
Mean   | 100.0 | 85.9   | 99.5 | 79.2 | 83.9   | 78.7
t test results
The value tabled is
\[ t = \frac{\bar{x}_i - \bar{x}_j}{s_{\text{pooled}}\sqrt{\dfrac{1}{n_i} + \dfrac{1}{n_j}}}, \qquad \text{where } s_{\text{pooled}} = \sqrt{MS_{\text{Within}}} \]
i | 1 vs i | 2 vs i  | 3 vs i | 4 vs i | 5 vs i
2 | 2.152* |         |        |        |
3 | 0.076  | -2.076* |        |        |
4 | 3.175* | 1.023   | 3.099* |        |
5 | 2.458* | 0.305   | 2.381* | -0.717 |
6 | 3.252* | 1.099   | 3.175* | 0.076  | 0.794
Critical value t0.025 = 2.005 for 54 d.f.
t values that are significant are marked with an asterisk (*).
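The table of pairwise t values can be reproduced from the six sample means and MSWithin. This is a minimal sketch: the means, MSWithin = 214.556, n = 10 per diet and the critical value 2.005 are taken from the slides, while the function and variable names are illustrative.

```python
from itertools import combinations
from math import sqrt

means = {1: 100.0, 2: 85.9, 3: 99.5, 4: 79.2, 5: 83.9, 6: 78.7}
N_PER_DIET = 10
MS_WITHIN = 214.556  # from the ANOVA table
T_CRITICAL = 2.005   # t_0.025 with 54 d.f., as quoted on the slide


def lsd_t(mean_i, mean_j, n_i, n_j, ms_within):
    """t statistic for comparing two means with s_pooled = sqrt(MSWithin)."""
    return (mean_i - mean_j) / (sqrt(ms_within) * sqrt(1 / n_i + 1 / n_j))


t_values = {
    (i, j): lsd_t(means[i], means[j], N_PER_DIET, N_PER_DIET, MS_WITHIN)
    for i, j in combinations(sorted(means), 2)
}
significant = sorted(pair for pair, t in t_values.items() if abs(t) > T_CRITICAL)

for (i, j), t in sorted(t_values.items()):
    flag = "*" if (i, j) in significant else ""
    print(f"diet {i} vs diet {j}: t = {t:6.3f}{flag}")
```

Every significant pair involves diet 1 or diet 3 against one of diets 2, 4, 5 and 6, which is the pattern summarised in the conclusions.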
Conclusions:
1. There is no significant difference between diet 1 (high protein, beef) and diet 3 (high protein, pork).
2. There are no significant differences amongst diets 2, 4, 5 and 6 (i.e. high protein, cereal (diet 2) and the low protein diets (diets 4, 5 and 6)).
3. There are significant differences between diets 1 and 3 (high protein, meat) and the other diets (2, 4, 5, and 6).
Major conclusion: High protein diets result in a higher weight gain but only if the source of protein is a meat source.
These are similar conclusions to those made
using exploratory techniques
– Examining box-plots
[Figure: Box Plots: Weight Gains for Six Diets. Horizontal axis: Diet (1 to 6; Beef, Cereal, Pork for High Protein and for Low Protein). Vertical axis: Weight Gain (40 to 130). Legend: Non-Outlier Max, Non-Outlier Min, Median, 75%, 25%.]
Conclusions
• Weight gain is higher for the high protein meat
diets
• Increasing the level of protein - increases
weight gain but only if source of protein is a
meat source
Carrying out the F-test and Fisher's LSD establishes the statistical significance of these conclusions. Differences observed using exploratory methods alone could have occurred by chance.
Comparing k Populations
Proportions
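The χ² test for independence
The two sample test for proportions
\[ \text{test statistic } z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\left(1 - \hat{p}\right)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \]
where
\[ \hat{p}_1 = \frac{x_1}{n_1}, \qquad \hat{p}_2 = \frac{x_2}{n_2} \qquad \text{and} \qquad \hat{p} = \frac{x_1 + x_2}{n_1 + n_2} \]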
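The z statistic is straightforward to compute from the two success counts and sample sizes. The sketch below is illustrative only: the function name and the counts (45 successes out of 100 versus 30 out of 100) are hypothetical and do not come from the slides.

```python
from math import sqrt


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """z statistic for H0: p1 = p2 using the pooled proportion."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_hat = (x1 + x2) / (n1 + n2)  # pooled estimate under H0
    standard_error = sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / standard_error


# Hypothetical counts for illustration.
z = two_proportion_z(45, 100, 30, 100)
print(f"z = {z:.3f}")
```

For a two-sided test at α = 0.05, z would be compared with the standard normal critical value 1.96.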
The data can be displayed in the following table:
population | 1       | 2       | Total
Success    | x1      | x2      | x1 + x2
Failure    | n1 - x1 | n2 - x2 | n1 + n2 - (x1 + x2)
Total      | n1      | n2      | n1 + n2
This problem can be extended in two ways:
1. Increasing the populations (columns) from 2 to k (or c)
2. Increasing the number of categories (rows) from 2 to r.
      | 1   | 2   | … | c   | Total
1     | x11 | x12 | … | x1c | R1
2     | x21 | x22 | … | x2c | R2
…     | …   | …   | … | …   | …
r     | xr1 | xr2 | … | xrc | Rr
Total | C1  | C2  | … | Cc  | N
The χ² test for independence
Situation
• We have two categorical variables R and C.
• The number of categories of R is r.
• The number of categories of C is c.
• We observe n subjects from the population and count $x_{ij}$ = the number of subjects for which R = i and C = j.
• R = rows, C = columns
Example
Both Systolic Blood pressure (C) and Serum Cholesterol (R) were measured for a sample of n = 1237 subjects.
The categories for Blood Pressure are:
<127, 127-146, 147-166, 167+
The categories for Cholesterol are:
<200, 200-219, 220-259, 260+
Table: two-way frequency
                  | Systolic Blood pressure
Serum Cholesterol | <127 | 127-146 | 147-166 | 167+ | Total
<200              | 117  | 121     | 47      | 22   | 307
200-219           | 85   | 98      | 43      | 20   | 246
220-259           | 119  | 209     | 68      | 43   | 439
260+              | 67   | 99      | 46      | 33   | 245
Total             | 388  | 527     | 204     | 118  | 1237
The χ² test for independence
Define
\[ R_i = \sum_{j=1}^{c} x_{ij} = i\text{-th row Total} \]
\[ C_j = \sum_{i=1}^{r} x_{ij} = j\text{-th column Total} \]
\[ E_{ij} = \frac{R_i C_j}{n} \]
= Expected frequency in the (i, j)-th cell in the case of independence.
Justification for $E_{ij} = R_i C_j / n$ in the case of independence:
Let $p_{ij} = P[R = i, C = j] = P[R = i]\,P[C = j] = \rho_i \gamma_j$ in the case of independence. Then
\[ E_{ij} = n p_{ij} = n \rho_i \gamma_j \approx n \hat{\rho}_i \hat{\gamma}_j = n\left(\frac{R_i}{n}\right)\left(\frac{C_j}{n}\right) = \frac{R_i C_j}{n} \]
= Expected frequency in the (i, j)-th cell in the case of independence.
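As a worked instance of this formula, take the first cell of the blood pressure and cholesterol table (cholesterol <200, blood pressure <127), where the row total is 307, the column total is 388 and n = 1237:

\[ E_{11} = \frac{R_1 C_1}{n} = \frac{307 \times 388}{1237} = \frac{119116}{1237} \approx 96.29 \]

The observed count in that cell is 117, so about 21 more subjects fall in it than independence would predict.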
Then to test
H0: R and C are independent
against
HA: R and C are not independent
Use test statistic
\[ \chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{\left(x_{ij} - E_{ij}\right)^2}{E_{ij}} \]
$E_{ij} = R_i C_j / n$ = Expected frequency in the (i, j)-th cell in the case of independence.
$x_{ij}$ = observed frequency in the (i, j)-th cell
Sampling distribution of test statistic when H0 is true
\[ \chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{\left(x_{ij} - E_{ij}\right)^2}{E_{ij}} \]
has a χ² distribution with degrees of freedom ν = (r - 1)(c - 1)
Critical and Acceptance Region
Reject H0 if: $\chi^2 \ge \chi^2_\alpha$
Accept H0 if: $\chi^2 < \chi^2_\alpha$
Table
Expected frequencies, (Observed frequencies), Standardized Residuals
                  | Systolic Blood pressure
Serum Cholesterol | <127                | 127-146             | 147-166            | 167+               | Total
<200              | 96.29 (117) 2.11    | 130.79 (121) -0.86  | 50.63 (47) -0.51   | 29.29 (22) -1.35   | 307
200-219           | 77.16 (85) 0.89     | 104.80 (98) -0.66   | 40.57 (43) 0.38    | 23.47 (20) -0.72   | 246
220-259           | 137.70 (119) -1.59  | 187.03 (209) 1.61   | 72.40 (68) -0.52   | 41.88 (43) 0.17    | 439
260+              | 76.85 (67) -1.12    | 104.38 (99) -0.53   | 40.40 (46) 0.88    | 23.37 (33) 1.99    | 245
Total             | 388                 | 527                 | 204                | 118                | 1237
χ² = 20.85
Standardized residuals
\[ r_{ij} = \frac{x_{ij} - E_{ij}}{\sqrt{E_{ij}}} \]
Test statistic
\[ \chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{\left(x_{ij} - E_{ij}\right)^2}{E_{ij}} = \sum_{i=1}^{r}\sum_{j=1}^{c} r_{ij}^2 = 20.85 \]
degrees of freedom ν = (r - 1)(c - 1) = 9
$\chi^2_{0.05} = 16.919$
Reject H0 using α = 0.05
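The whole calculation for the blood pressure and cholesterol table can be sketched in a few lines of Python. The function name is illustrative, and the critical value 16.919 is the one quoted on the slide rather than something computed here.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom, expected frequencies).

    table: list of rows of observed counts x_ij.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # E_ij = R_i * C_j / n under independence.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (x - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for x, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


# Rows: cholesterol <200, 200-219, 220-259, 260+
# Columns: blood pressure <127, 127-146, 147-166, 167+
observed = [
    [117, 121, 47, 22],
    [85, 98, 43, 20],
    [119, 209, 68, 43],
    [67, 99, 46, 33],
]

chi2, df, expected = chi_square_independence(observed)
CHI2_CRITICAL = 16.919  # chi-square 0.05 critical value for 9 d.f., from the slide
print(f"chi-square = {chi2:.2f} on {df} d.f.")
print("reject H0" if chi2 >= CHI2_CRITICAL else "accept H0")
```

The statistic comes out at about 20.85 on 9 degrees of freedom, in line with the slide, so independence is rejected at α = 0.05.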
Another Example
This data comes from a Globe and Mail study
examining the attitudes of the baby boomers.
Data was collected on various age groups
Age group                     | Total
Echo (Age 20 – 29)            | 398
Gen X (Age 30 – 39)           | 342
Younger Boomers (Age 40 – 49) | 378
Older Boomers (Age 50 – 59)   | 286
Pre Boomers (Age 60+)         | 445
Total                         | 1849
One question with responses
In an average week, how many times would you drink alcohol?
Age group                     | never | once | twice | three or four times | five or more times | Total
Echo (Age 20 – 29)            | 115   | 135  | 64    | 48                  | 36                 | 398
Gen X (Age 30 – 39)           | 130   | 123  | 38    | 31                  | 20                 | 342
Younger Boomers (Age 40 – 49) | 136   | 87   | 64    | 57                  | 34                 | 378
Older Boomers (Age 50 – 59)   | 109   | 74   | 40    | 43                  | 20                 | 286
Pre Boomers (Age 60+)         | 218   | 80   | 45    | 40                  | 62                 | 445
Total                         | 708   | 499  | 251   | 219                 | 172                | 1849
Are there differences in weekly consumption of alcohol related to age?
Table: Expected frequencies
Age group                     | never  | once   | twice | three or four times | five or more times | Total
Echo (Age 20 – 29)            | 152.40 | 107.41 | 54.03 | 47.14               | 37.02              | 398
Gen X (Age 30 – 39)           | 130.96 | 92.30  | 46.43 | 40.51               | 31.81              | 342
Younger Boomers (Age 40 – 49) | 144.74 | 102.01 | 51.31 | 44.77               | 35.16              | 378
Older Boomers (Age 50 – 59)   | 109.51 | 77.18  | 38.82 | 33.87               | 26.60              | 286
Pre Boomers (Age 60+)         | 170.39 | 120.09 | 60.41 | 52.71               | 41.40              | 445
Total                         | 708    | 499    | 251   | 219                 | 172                | 1849
Table: Residuals
\[ r_{ij} = \frac{x_{ij} - E_{ij}}{\sqrt{E_{ij}}} \]
Age group                     | never  | once   | twice  | three or four times | five or more times
Echo (Age 20 – 29)            | -3.029 | 2.662  | 1.357  | 0.125               | -0.168
Gen X (Age 30 – 39)           | -0.083 | 3.196  | -1.237 | -1.494              | -2.095
Younger Boomers (Age 40 – 49) | -0.726 | -1.486 | 1.771  | 1.828               | -0.196
Older Boomers (Age 50 – 59)   | -0.049 | -0.362 | 0.189  | 1.568               | -1.280
Pre Boomers (Age 60+)         | 3.647  | -3.659 | -1.982 | -1.750              | 3.203
\[ \chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{\left(x_{ij} - E_{ij}\right)^2}{E_{ij}} = \sum_{i=1}^{r}\sum_{j=1}^{c} r_{ij}^2 = 93.97 \]
$\chi^2_{.05} = 26.296$ for (5 - 1)(5 - 1) = 4 × 4 = 16 d.f.
Conclusion: There is a significant relationship between
age group and weekly alcohol use
Examining the Residuals allows one to identify the cells
that indicate a departure from independence
• Large positive residuals indicate cells where the observed frequencies were larger than expected if independent
• Large negative residuals indicate cells where the observed frequencies were smaller than expected if independent
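A short script can compute the standardized residuals for the alcohol table and pick out the cells that depart most from independence. The cut-off |r| > 2 used here is an assumption made for illustration (the slides do not state a threshold), and the function and variable names are likewise illustrative.

```python
from math import sqrt

age_groups = ["Echo", "Gen X", "Younger Boomers", "Older Boomers", "Pre Boomers"]
responses = ["never", "once", "twice", "three or four times", "five or more times"]

# Observed counts: rows are age groups, columns are responses.
observed = [
    [115, 135, 64, 48, 36],
    [130, 123, 38, 31, 20],
    [136, 87, 64, 57, 34],
    [109, 74, 40, 43, 20],
    [218, 80, 45, 40, 62],
]


def standardized_residuals(table):
    """r_ij = (x_ij - E_ij) / sqrt(E_ij), with E_ij = R_i * C_j / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [
        [(x - r * c / n) / sqrt(r * c / n) for x, c in zip(row, col_totals)]
        for row, r in zip(table, row_totals)
    ]


residuals = standardized_residuals(observed)
chi2 = sum(r * r for row in residuals for r in row)

THRESHOLD = 2.0  # assumed cut-off for a "large" residual, not from the slides
flagged = [
    (age_groups[i], responses[j], round(r, 3))
    for i, row in enumerate(residuals)
    for j, r in enumerate(row)
    if abs(r) > THRESHOLD
]

print(f"chi-square = {chi2:.2f}")
for age, response, r in flagged:
    print(f"{age:16s} {response:20s} r = {r:+.3f}")
```

With this cut-off the flagged cells are concentrated in the Echo, Gen X and Pre Boomers rows, mostly in the "never", "once" and "five or more times" columns.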
Another question with responses
In an average week, how many times would you surf the internet?
Age group                     | never | 1 to 4 times | 5 to 9 times | 10 or more times | Total
Echo (Age 20 – 29)            | 48    | 72           | 100          | 178              | 398
Gen X (Age 30 – 39)           | 51    | 82           | 92           | 117              | 342
Younger Boomers (Age 40 – 49) | 79    | 128          | 76           | 95               | 378
Older Boomers (Age 50 – 59)   | 92    | 63           | 57           | 74               | 286
Pre Boomers (Age 60+)         | 276   | 71           | 67           | 31               | 445
Total                         | 546   | 416          | 392          | 495              | 1849
Are there differences in weekly internet use related to age?
Table: Expected frequencies
Age group                     | never  | 1 to 4 times | 5 to 9 times | 10 or more times | Total
Echo (Age 20 – 29)            | 117.53 | 89.54        | 84.38        | 106.55           | 398
Gen X (Age 30 – 39)           | 100.99 | 76.95        | 72.51        | 91.56            | 342
Younger Boomers (Age 40 – 49) | 111.62 | 85.04        | 80.14        | 101.20           | 378
Older Boomers (Age 50 – 59)   | 84.45  | 64.35        | 60.63        | 76.57            | 286
Pre Boomers (Age 60+)         | 131.41 | 100.12       | 94.34        | 119.13           | 445
Total                         | 546    | 416          | 392          | 495              | 1849
Table: Residuals
\[ r_{ij} = \frac{x_{ij} - E_{ij}}{\sqrt{E_{ij}}} \]
Age group                     | never | 1 to 4 times | 5 to 9 times | 10 or more times
Echo (Age 20 – 29)            | -6.41 | -1.85        | 1.70         | 6.92
Gen X (Age 30 – 39)           | -4.97 | 0.58         | 2.29         | 2.66
Younger Boomers (Age 40 – 49) | -3.09 | 4.66         | -0.46        | -0.62
Older Boomers (Age 50 – 59)   | 0.82  | -0.17        | -0.47        | -0.29
Pre Boomers (Age 60+)         | 12.61 | -2.91        | -2.82        | -8.07
\[ \chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{\left(x_{ij} - E_{ij}\right)^2}{E_{ij}} = \sum_{i=1}^{r}\sum_{j=1}^{c} r_{ij}^2 = 406.29 \]
$\chi^2_{.05} = 21.03$ for (5 - 1)(4 - 1) = 4 × 3 = 12 d.f.
Conclusion: There is a significant relationship between age group and weekly internet use
[Figure: bar charts of weekly internet use, one panel per age group (Echo, Gen X, Younger Boomers, Older Boomers, Pre Boomers). Horizontal axis: never, 1 to 4 times, 5 to 9 times, 10 or more times. Vertical axis: 0.0 to 70.0.]
Regressions and Correlation
Estimation by confidence intervals,
Hypothesis Testing