Tests of Goodness of Fit and Independence


Slides by John Loucks, St. Edward’s University
© 2008 Thomson South-Western. All Rights Reserved
Slide 1
Chapter 12
Tests of Goodness of Fit and Independence
- Goodness of Fit Test: A Multinomial Population
- Test of Independence
- Goodness of Fit Test: Poisson and Normal Distributions
Slide 2
Hypothesis (Goodness of Fit) Test
for Proportions of a Multinomial Population
1. Set up the null and alternative hypotheses.
2. Select a random sample and record the observed
frequency, fi , for each of the k categories.
3. Assuming H0 is true, compute the expected
frequency, ei , in each category by multiplying the
category probability by the sample size.
Slide 3
Hypothesis (Goodness of Fit) Test
for Proportions of a Multinomial Population
4. Compute the value of the test statistic.
$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$
where:
fi = observed frequency for category i
ei = expected frequency for category i
k = number of categories
Note: The test statistic has a chi-square distribution
with k – 1 df provided that the expected frequencies
are 5 or more for all categories.
Slide 4
Hypothesis (Goodness of Fit) Test
for Proportions of a Multinomial Population
5. Rejection rule:
p-value approach:
Reject H0 if p-value < α
Critical value approach:
Reject H0 if χ² ≥ χ²_α
where α is the significance level and
there are k - 1 degrees of freedom
Slide 5
Multinomial Distribution Goodness of Fit Test

Example: Finger Lakes Homes (A)
Finger Lakes Homes manufactures
four models of prefabricated homes,
a two-story colonial, a log cabin, a
split-level, and an A-frame. To help
in production planning, management
would like to determine if previous
customer purchases indicate that there
is a preference in the style selected.
Slide 6
Multinomial Distribution Goodness of Fit Test

Example: Finger Lakes Homes (A)
The number of homes sold of each
model for 100 sales over the past two
years is shown below.
Model     Colonial   Log   Split-Level   A-Frame
# Sold       30       20       35          15
Slide 7
Multinomial Distribution Goodness of Fit Test

Hypotheses
H0: pC = pL = pS = pA = .25
Ha: The population proportions are not
pC = .25, pL = .25, pS = .25, and pA = .25
where:
pC = population proportion that purchase a colonial
pL = population proportion that purchase a log cabin
pS = population proportion that purchase a split-level
pA = population proportion that purchase an A-frame
Slide 8
Multinomial Distribution Goodness of Fit Test

Rejection Rule
Reject H0 if p-value < .05 or χ² > 7.815.
With α = .05 and k - 1 = 4 - 1 = 3 degrees of freedom, χ²_.05 = 7.815.
[Figure: chi-square distribution with the "Do Not Reject H0" region to the left of 7.815 and the "Reject H0" region to the right.]
Slide 9
Multinomial Distribution Goodness of Fit Test

Expected Frequencies
e1 = .25(100) = 25
e2 = .25(100) = 25
e3 = .25(100) = 25
e4 = .25(100) = 25
Test Statistic
$\chi^2 = \frac{(30 - 25)^2}{25} + \frac{(20 - 25)^2}{25} + \frac{(35 - 25)^2}{25} + \frac{(15 - 25)^2}{25}$
= 1 + 1 + 4 + 4
= 10
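The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the function name `chi_square_gof` is an assumption introduced here, and the check on expected frequencies reflects the "5 or more" condition stated with the test statistic.

```python
def chi_square_gof(observed, probabilities):
    """Return (statistic, expected) for a multinomial goodness of fit test.

    Hypothetical helper: observed is a list of counts f_i and probabilities
    holds the category probabilities stated in H0.
    """
    n = sum(observed)
    expected = [p * n for p in probabilities]
    # The chi-square approximation assumes every expected frequency is >= 5.
    if min(expected) < 5:
        raise ValueError("expected frequency below 5; combine categories")
    statistic = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    return statistic, expected


# Finger Lakes Homes (A): colonial, log cabin, split-level, A-frame
observed = [30, 20, 35, 15]
chi2, expected = chi_square_gof(observed, [0.25] * 4)
df = len(observed) - 1  # k - 1 degrees of freedom
print(chi2, df)
```

With the slide's data this reproduces the expected frequencies of 25 per model and the statistic of 10 with 3 degrees of freedom.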
Slide 10
Multinomial Distribution Goodness of Fit Test

Conclusion Using the p-Value Approach
Area in Upper Tail    .10     .05     .025    .01      .005
χ² Value (df = 3)     6.251   7.815   9.348   11.345   12.838
Because χ² = 10 is between 9.348 and 11.345, the
area in the upper tail of the distribution is between
.025 and .01.
The p-value < α. We can reject the null hypothesis.
Note: A precise p-value can be found using
Minitab or Excel.
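The slides point to Minitab or Excel for the precise p-value. As an alternative sketch using only the Python standard library, the upper-tail area of a chi-square distribution can be computed from the series for the regularized incomplete gamma function. The function name `chi_square_upper_tail` is an assumption made for this illustration, and the series is intended only for moderate values of the statistic such as those in these examples.

```python
import math


def chi_square_upper_tail(x, df):
    """Upper-tail area P(X >= x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, x/2):
    # P = exp(-t) * t**a / Gamma(a) * sum_{n>=0} t**n / (a (a+1) ... (a+n)), t = x/2
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= half_x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))
    return 1.0 - lower


# Finger Lakes Homes (A): chi-square = 10 with 3 degrees of freedom
p_value = chi_square_upper_tail(10, 3)
print(round(p_value, 4))
```

The result falls between .01 and .025, in agreement with the table lookup, so H0 is rejected at α = .05.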
Slide 11
Multinomial Distribution Goodness of Fit Test

Conclusion Using the Critical Value Approach
χ² = 10 > 7.815
We reject, at the .05 level of significance,
the assumption that there is no home style
preference.
Slide 12
Test of Independence: Contingency Tables
1. Set up the null and alternative hypotheses.
2. Select a random sample and record the observed
frequency, fij , for each cell of the contingency table.
3. Compute the expected frequency, eij , for each cell.
$e_{ij} = \frac{(\text{Row } i \text{ Total})(\text{Column } j \text{ Total})}{\text{Sample Size}}$
Slide 13
Test of Independence: Contingency Tables
4. Compute the test statistic.
$\chi^2 = \sum_{i} \sum_{j} \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$
5. Determine the rejection rule.
Reject H0 if p-value < α or χ² ≥ χ²_α,
where α is the significance level and,
with n rows and m columns, there are
(n - 1)(m - 1) degrees of freedom.
Slide 14
Contingency Table (Independence) Test

Example: Finger Lakes Homes (B)
Each home sold by Finger Lakes
Homes can be classified according to
price and to style. Finger Lakes’
manager would like to determine if
the price of the home and the style of
the home are independent variables.
Slide 15
Contingency Table (Independence) Test

Example: Finger Lakes Homes (B)
The number of homes sold for
each model and price for the past two
years is shown below. For convenience,
the price of the home is listed as either
$99,000 or less or more than $99,000.
Price        Colonial   Log   Split-Level   A-Frame
≤ $99,000       18        6        19          12
> $99,000       12       14        16           3
Slide 16
Contingency Table (Independence) Test

Hypotheses
H0: Price of the home is independent of the
style of the home that is purchased
Ha: Price of the home is not independent of the
style of the home that is purchased
Slide 17
Contingency Table (Independence) Test

Expected Frequencies
Observed frequencies with the row and column totals used to compute the expected frequencies:
Price     Colonial   Log   Split-Level   A-Frame   Total
≤ $99K       18        6        19          12       55
> $99K       12       14        16           3       45
Total        30       20        35          15      100
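As a worked check of the expected frequency formula (row total times column total, divided by the sample size), the three cells that appear explicitly in the test statistic work out as follows. This arithmetic is added here for illustration and uses only the totals in the table above.

```latex
e_{11} = \frac{(55)(30)}{100} = 16.5, \qquad
e_{12} = \frac{(55)(20)}{100} = 11, \qquad
e_{24} = \frac{(45)(15)}{100} = 6.75
```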
Slide 18
Contingency Table (Independence) Test

Rejection Rule
With α = .05 and (2 - 1)(4 - 1) = 3 d.f., χ²_.05 = 7.815
Reject H0 if p-value < .05 or χ² > 7.815

Test Statistic
$\chi^2 = \frac{(18 - 16.5)^2}{16.5} + \frac{(6 - 11)^2}{11} + \cdots + \frac{(3 - 6.75)^2}{6.75}$
= .1364 + 2.2727 + . . . + 2.0833 = 9.149
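The full sum over all eight cells can be sketched in Python as below. The function name `chi_square_independence` is an assumption introduced for this illustration; it computes every expected frequency from the row and column totals and then accumulates the statistic.

```python
def chi_square_independence(table):
    """Return (statistic, expected, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # e_ij = (row i total)(column j total) / sample size
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (f - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for f, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, expected, df


# Finger Lakes Homes (B): columns are colonial, log, split-level, A-frame
homes = [
    [18, 6, 19, 12],   # $99,000 or less
    [12, 14, 16, 3],   # more than $99,000
]
chi2, expected, df = chi_square_independence(homes)
print(round(chi2, 3), df)
```

Rounded to three decimals the statistic is 9.149 with 3 degrees of freedom, matching the value on the slide.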
Slide 19
Contingency Table (Independence) Test

Conclusion Using the p-Value Approach
Area in Upper Tail    .10     .05     .025    .01      .005
χ² Value (df = 3)     6.251   7.815   9.348   11.345   12.838
Because χ² = 9.149 is between 7.815 and 9.348, the
area in the upper tail of the distribution is between
.05 and .025.
The p-value < α. We can reject the null hypothesis.
Note: A precise p-value can be found using
Minitab or Excel.
Slide 20
Contingency Table (Independence) Test

Conclusion Using the Critical Value Approach
χ² = 9.149 > 7.815
We reject, at the .05 level of significance,
the assumption that the price of the home is
independent of the style of home that is
purchased.
Slide 21
Goodness of Fit Test: Poisson Distribution
1. Set up the null and alternative hypotheses.
H0: Population has a Poisson probability distribution
Ha: Population does not have a Poisson distribution
2. Select a random sample and
a. Record the observed frequency fi for each value of
the Poisson random variable.
b. Compute the mean number of occurrences μ.
3. Compute the expected frequency of occurrences ei
for each value of the Poisson random variable.
Slide 22
Goodness of Fit Test: Poisson Distribution
4. Compute the value of the test statistic.
$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$
where:
fi = observed frequency for category i
ei = expected frequency for category i
k = number of categories
Slide 23
Goodness of Fit Test: Poisson Distribution
5. Rejection rule:
p-value approach:
Reject H0 if p-value < α
Critical value approach:
Reject H0 if χ² ≥ χ²_α
where α is the significance level and
there are k - 2 degrees of freedom
Slide 24
Goodness of Fit Test: Poisson Distribution

Example: Troy Parking Garage
In studying the need for an
additional entrance to a city
parking garage, a consultant
has recommended an analysis
approach that is applicable
only in situations where the number of cars
entering during a specified time period follows a
Poisson distribution.
Slide 25
Goodness of Fit Test: Poisson Distribution

Example: Troy Parking Garage
A random sample of 100 one-minute time intervals resulted
in the customer arrivals listed
below. A statistical test must
be conducted to see if the
assumption of a Poisson distribution is reasonable.
# Arrivals   0   1   2   3    4    5    6    7    8   9   10   11   12
Frequency    0   1   4   10   14   20   12   12   9   8   6    3    1
Slide 26
Goodness of Fit Test: Poisson Distribution

Hypotheses
H0: Number of cars entering the garage during
a one-minute interval is Poisson distributed
Ha: Number of cars entering the garage during a
one-minute interval is not Poisson distributed
Slide 27
Goodness of Fit Test: Poisson Distribution

Estimate of Poisson Probability Function
Total Arrivals = 0(0) + 1(1) + 2(4) + . . . + 12(1) = 600
Total Time Periods = 100
Estimate of μ = 600/100 = 6
Hence,
$f(x) = \frac{6^x e^{-6}}{x!}$
Slide 28
Goodness of Fit Test: Poisson Distribution

Expected Frequencies
x       f(x)     nf(x)
0       .0025      .25
1       .0149     1.49
2       .0446     4.46
3       .0892     8.92
4       .1339    13.39
5       .1606    16.06
6       .1606    16.06
7       .1377    13.77
8       .1033    10.33
9       .0688     6.88
10      .0413     4.13
11      .0225     2.25
12+     .0201     2.01
Total  1.0000   100.00
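The table of expected frequencies can be reproduced with a short Python sketch. The variable names and the `poisson_pmf` helper are assumptions introduced for this illustration; the "12+" row is taken as the remaining upper-tail probability so that the probabilities sum to 1.

```python
import math


def poisson_pmf(x, mu):
    """Poisson probability f(x) = mu^x e^(-mu) / x!."""
    return mu ** x * math.exp(-mu) / math.factorial(x)


# Troy Parking Garage: arrivals per one-minute interval and observed frequencies
arrivals = list(range(13))
frequency = [0, 1, 4, 10, 14, 20, 12, 12, 9, 8, 6, 3, 1]

n = sum(frequency)                                               # time periods
mu_hat = sum(x * f for x, f in zip(arrivals, frequency)) / n     # estimated mean

probs = [poisson_pmf(x, mu_hat) for x in range(12)]   # x = 0, ..., 11
probs.append(1.0 - sum(probs))                        # "12+" upper tail
expected = [n * p for p in probs]

for label, p, e in zip(list(range(12)) + ["12+"], probs, expected):
    print(label, round(p, 4), round(e, 2))
```

Several of the resulting expected frequencies are below 5 (for example the rows for 0, 1, 2, 10, 11, and 12+), which is why adjacent categories are combined before the test statistic is computed.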
Slide 29
Goodness of Fit Test: Poisson Distribution

Observed and Expected Frequencies
i              fi     ei      fi - ei
0 or 1 or 2     5     6.20    -1.20
3              10     8.92     1.08
4              14    13.39     0.61
5              20    16.06     3.94
6              12    16.06    -4.06
7              12    13.77    -1.77
8               9    10.33    -1.33
9               8     6.88     1.12
10 or more     10     8.39     1.61
Slide 30
Goodness of Fit Test: Poisson Distribution

Rejection Rule
With α = .05 and k - p - 1 = 9 - 1 - 1 = 7 d.f.
(where k = number of categories and p = number
of population parameters estimated), χ²_.05 = 14.067
Reject H0 if p-value < .05 or χ² > 14.067.

Test Statistic
$\chi^2 = \frac{(-1.20)^2}{6.20} + \frac{(1.08)^2}{8.92} + \cdots + \frac{(1.61)^2}{8.39} = 3.268$
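A sketch of the full nine-term sum, using the grouping from the slides ("0 or 1 or 2", 3 through 9, "10 or more") and an estimated mean of 6. The helper and variable names are assumptions for this illustration. The expected frequencies are computed from unrounded Poisson probabilities, so the result may differ slightly in the last digit from a hand calculation with the rounded table values.

```python
import math


def poisson_pmf(x, mu):
    """Poisson probability f(x) = mu^x e^(-mu) / x!."""
    return mu ** x * math.exp(-mu) / math.factorial(x)


n, mu = 100, 6
# Observed frequencies for: 0-2, 3, 4, 5, 6, 7, 8, 9, 10 or more
observed = [5, 10, 14, 20, 12, 12, 9, 8, 10]

low = sum(poisson_pmf(x, mu) for x in range(3))       # P(X <= 2)
middle = [poisson_pmf(x, mu) for x in range(3, 10)]   # P(X = 3), ..., P(X = 9)
high = 1.0 - low - sum(middle)                        # P(X >= 10)
expected = [n * p for p in [low] + middle + [high]]

chi2 = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
p_estimated = 1                        # one parameter (the mean) estimated from the data
df = len(observed) - p_estimated - 1   # k - p - 1
print(round(chi2, 3), df)
```

The statistic is far below the critical value 14.067 for 7 degrees of freedom, so the Poisson assumption is not rejected.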
Slide 31
Goodness of Fit Test: Poisson Distribution

Conclusion Using the p-Value Approach
Area in Upper Tail    .90     .10      .05      .025     .01
χ² Value (df = 7)     2.833   12.017   14.067   16.013   18.475
Because χ² = 3.268 is between 2.833 and 12.017 in the
Chi-Square Distribution Table, the area in the upper tail
of the distribution is between .90 and .10.
The p-value > α. We cannot reject the null hypothesis.
There is no reason to doubt the assumption of a Poisson
distribution.
Note: A precise p-value can be found
using Minitab or Excel.
Slide 32
Goodness of Fit Test: Normal Distribution
1. Set up the null and alternative hypotheses.
2. Select a random sample and
a. Compute the mean and standard deviation.
b. Define intervals of values so that the expected
frequency is at least 5 for each interval.
c. For each interval record the observed frequencies
3. Compute the expected frequency, ei , for each interval.
Slide 33
Goodness of Fit Test: Normal Distribution
4. Compute the value of the test statistic.
$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$
5. Reject H0 if χ² ≥ χ²_α (where α is the significance level
and there are k - 3 degrees of freedom).
Slide 34
Normal Distribution Goodness of Fit Test

Example: IQ Computers
IQ Computers (one better than HP?)
manufactures and sells a general
purpose microcomputer. As part of
a study to evaluate sales personnel, management
wants to determine, at a .05 significance level, if the
annual sales volume (number of units sold by a
salesperson) follows a normal probability distribution.
Slide 35
Normal Distribution Goodness of Fit Test

Example: IQ Computers
A simple random sample of 30 of
the salespeople was taken and their
numbers of units sold are below.
33   43   44   45   52   52   56   58   63    64
64   65   66   68   70   72   73   73   74    75
83   84   85   86   91   92   94   98   102   105
(mean = 71, standard deviation = 18.54)
Slide 36
Normal Distribution Goodness of Fit Test

Hypotheses
H0: The population of number of units sold
has a normal distribution with mean 71
and standard deviation 18.54.
Ha: The population of number of units sold
does not have a normal distribution with
mean 71 and standard deviation 18.54.
Slide 37
Normal Distribution Goodness of Fit Test

Interval Definition
To satisfy the requirement of an expected
frequency of at least 5 in each interval we will
divide the normal distribution into 30/5 = 6
equal probability intervals.
Slide 38
Normal Distribution Goodness of Fit Test

Interval Definition
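[Figure: normal curve centered at 71, divided into six intervals, each with area 1.00/6 = .1667.]
Interval boundaries:
71 - .97(18.54) = 53.02
71 - .43(18.54) = 63.03
71
71 + .43(18.54) = 78.97
71 + .97(18.54) = 88.98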
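The equal-probability boundaries can also be obtained from the inverse normal CDF rather than a z table. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names are assumptions for this illustration. Because the slides round the z values to .97 and .43, the computed boundaries differ from the slide values by a few hundredths.

```python
from statistics import NormalDist

# Hypothesized distribution of units sold, using the sample mean and standard deviation
sales = NormalDist(mu=71, sigma=18.54)

k = 6  # number of equal-probability intervals (30 observations / 5 per interval)
boundaries = [sales.inv_cdf(i / k) for i in range(1, k)]

for b in boundaries:
    print(round(b, 2))
```

Each of the six intervals defined by these five cut points has probability 1/6 under H0, so each expected frequency is 30/6 = 5.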
Slide 39
Normal Distribution Goodness of Fit Test

Observed and Expected Frequencies
i                  fi    ei    fi - ei
Less than 53.02     6     5      1
53.02 to 63.03      3     5     -2
63.03 to 71.00      6     5      1
71.00 to 78.97      5     5      0
78.97 to 88.98      4     5     -1
More than 88.98     6     5      1
Total              30    30
Slide 40
Normal Distribution Goodness of Fit Test

Rejection Rule
With α = .05 and k - p - 1 = 6 - 2 - 1 = 3 d.f.
(where k = number of categories and p = number
of population parameters estimated), χ²_.05 = 7.815
Reject H0 if p-value < .05 or χ² > 7.815.

Test Statistic
$\chi^2 = \frac{(1)^2}{5} + \frac{(-2)^2}{5} + \frac{(1)^2}{5} + \frac{(0)^2}{5} + \frac{(-1)^2}{5} + \frac{(1)^2}{5} = 1.600$
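The observed counts and the statistic can be checked with a short Python sketch that bins the 30 sales figures using the interval boundaries from the slides. Variable names are assumptions for this illustration; `bisect_right` from the standard library finds the interval index for each observation.

```python
from bisect import bisect_right

# IQ Computers: units sold by the 30 sampled salespeople
units_sold = [
    33, 43, 44, 45, 52, 52, 56, 58, 63, 64,
    64, 65, 66, 68, 70, 72, 73, 73, 74, 75,
    83, 84, 85, 86, 91, 92, 94, 98, 102, 105,
]
# Interval boundaries as given on the slides
boundaries = [53.02, 63.03, 71.00, 78.97, 88.98]

# Count the observations falling in each of the six intervals
counts = [0] * (len(boundaries) + 1)
for x in units_sold:
    counts[bisect_right(boundaries, x)] += 1

expected = len(units_sold) / len(counts)   # 30 / 6 = 5 per interval
chi2 = sum((f - expected) ** 2 / expected for f in counts)
df = len(counts) - 2 - 1                   # k - p - 1, with mean and std. dev. estimated
print(counts, chi2, df)
```

The counts agree with the observed frequencies in the table, and the statistic of 1.6 is well below the critical value 7.815.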
Slide 41
Normal Distribution Goodness of Fit Test

Conclusion Using the p-Value Approach
Area in Upper Tail    .90    .10     .05     .025    .01
χ² Value (df = 3)     .584   6.251   7.815   9.348   11.345
Because χ² = 1.600 is between .584 and 6.251 in the
Chi-Square Distribution Table, the area in the upper tail
of the distribution is between .90 and .10.
The p-value > α. We cannot reject the null hypothesis.
There is little evidence to support rejecting the
assumption the population is normally distributed with
μ = 71 and σ = 18.54.
A precise p-value can be found
using Minitab or Excel.
Slide 42
End of Chapter 12
Slide 43