Math 3680
Lecture #13
Hypothesis Testing:
The z Test
The One-Sided z Test
Example: Before a design artist was hired to
improve its entrance, an average of 3218 people
entered a department store daily, with an SD of 287.
Since the entrance was redesigned, a simple random
sample of 42 days has been studied. The results are
shown below.
3767  3678  3456  3596  3449  3358  3341
3445  3556  3216  3309  3716  3667  3100
3154  3171  3379  3405  3310  3780  3212
3220  3197  3715  3034  3420  3082  3482
3039  3630  3525  3193  3338  3125  3186
3378  3695  3470  3506  3070  3526  3558
Average ≈ 3392
Is this statistically significant for indicating that the
average number of people entering the store daily has
increased?
Note: Don’t compare the difference
3392 - 3218 = 174
with the population standard deviation 287.
The former applies to an average, while the latter is
for individual days.
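The distinction in this note can be checked with a small simulation. The sketch below assumes, purely for illustration, that daily counts are normally distributed with mean 3218 and SD 287 (the slides do not specify the shape of the box); the names `simulate_sample_means`, `reps`, and `seed` are hypothetical. It compares the spread of 42-day averages with the spread of individual days.

```python
import math
import random
import statistics

MU, SIGMA, N = 3218, 287, 42  # values taken from the example


def simulate_sample_means(reps=5000, seed=0):
    """Draw `reps` samples of N days each and return their averages.

    Assumption (not in the slides): daily counts are normal(MU, SIGMA).
    """
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(N))
        for _ in range(reps)
    ]


means = simulate_sample_means()
sd_of_means = statistics.stdev(means)      # spread of the 42-day averages
theoretical_se = SIGMA / math.sqrt(N)      # square-root law: sigma / sqrt(n)

print(f"SD of individual days:         {SIGMA}")
print(f"SD of 42-day averages (sim.):  {sd_of_means:.1f}")
print(f"sigma / sqrt(n):               {theoretical_se:.1f}")
```

The averages fluctuate far less than individual days do, which is why the difference of 174 should be judged against the SD of the average rather than against 287.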
There are two possibilities:
• The average number of people entering has not
changed. The observed sample average of 3,392
can be reasonably attributed to chance
fluctuations.
• The difference between the observed average and
the expected average is too large to be simply
chance. The average number of people entering
has increased with the improved entrance.
Definition: Null Hypothesis. The first
hypothesis, which asserts simple chance
fluctuations, is called the null hypothesis.
Definition: Alternative Hypothesis. The
second hypothesis, which asserts that the
average has in fact increased, is called the
alternative hypothesis.
The null hypothesis is the default assumption.
This is the assumption to be disproved.
For example, if the sample average were 3219 per
day, that would hardly be convincing evidence.
However, if the new sample average were 5000
per day, we can be confident of the lure of the
new storefront – and rule out simple chance. So
where is the “cut-off” value?
Solution.
• H0: μ = 3218
(The average number of customers entering the store has not
changed due to the improved storefront)
• Ha: μ > 3218
(The average number of customers entering the store has
increased due to the improved storefront)
• We choose α = 0.05
Before continuing, why isn’t Ha written as μ …?
Assuming H0, we have a sample of 42 days drawn from a box
with μ = 3218 and σ = 287.
The sample average has the following moments:
\[ E(\bar{X}) = \mu = 3218 \]
\[ SD(\bar{X}) = \frac{\sigma}{\sqrt{n}} = \frac{287}{\sqrt{42}} \]
• Test statistic:
\[ z_s = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{3392 - 3218}{287/\sqrt{42}} \approx 3.929 \]
• P-value. Assuming H0, we must find the chance of obtaining a test
statistic at least this extreme. For this problem, that means
\[ P(Z \ge z_s) = P(Z \ge 3.929) \approx 4.265 \times 10^{-5} \]
• Conclusion: We reject the null hypothesis. There is good reason to
believe that the average number of customers has increased after the
redesign.
Excel: Use the command
=ZTEST(A1:D11, 3218, 287)
With the 42 daily counts entered in cells A1:D11, this returns 4.362E-05.
The small difference from the hand computation arises because Excel uses
the unrounded sample average (about 3391.8) rather than 3392.
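For readers without Excel, the same one-sided z test can be sketched in Python using only the standard library. The function name `z_test_upper` is hypothetical, and the code assumes the population SD is known, as in the example. Because it works from the raw data rather than the rounded average, its P-value should agree with the Excel figure up to rounding.

```python
import math
from statistics import NormalDist, fmean

# The 42 daily counts from the example.
counts = [
    3767, 3678, 3456, 3596, 3449, 3358, 3341,
    3445, 3556, 3216, 3309, 3716, 3667, 3100,
    3154, 3171, 3379, 3405, 3310, 3780, 3212,
    3220, 3197, 3715, 3034, 3420, 3082, 3482,
    3039, 3630, 3525, 3193, 3338, 3125, 3186,
    3378, 3695, 3470, 3506, 3070, 3526, 3558,
]


def z_test_upper(sample, mu0, sigma):
    """Upper-tailed z test of H0: mu = mu0 with known population SD sigma."""
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(n))
    # P(Z >= z), computed as P(Z <= -z) by symmetry to limit rounding error.
    p_value = NormalDist().cdf(-z)
    return z, p_value


z_s, p_value = z_test_upper(counts, mu0=3218, sigma=287)
print(f"z = {z_s:.3f}, P-value = {p_value:.3e}")
```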
Observations:
1. This test of significance is called the z-test,
named after the test statistic.
2. The z-test is best used with large samples –
so that the normal approximation may be
safely made.
3. Notice we have not proven beyond a
shadow of a doubt that the new storefront was
effective in increasing the number of patrons.
4. The alternative hypothesis is that the daily
average of patrons is greater than 3218. It is not
that the new average is exactly equal to 3392. In
other words, the alternative hypothesis was a
compound hypothesis, not a simple hypothesis.
5. Small values of P are evidence against the
null hypothesis; they indicate that something
besides chance is at work.
6. We are NOT saying that there is 1 chance in
20,000 for the null hypothesis to be correct.
Another (equivalent) procedure for hypothesis testing:
In the previous problem, if the test statistic was any number
greater than 1.645, then we would have obtained a P-value
less than 0.05, the specified α. (Why?)
We call z_c = 1.645 the critical value, and the interval
(1.645, ∞) is called the rejection region.
Since z_s lies in the rejection region, we choose to reject H0.
[Figure: standard normal curve; the rejection region to the right of 1.645 has area 5%.]
In terms of the customers, we have the critical value
\[ x_c = 3218 + z_c \sigma_{\bar{X}} = 3218 + (1.645) \frac{287}{\sqrt{42}} \approx 3290.8 \]
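The critical-value procedure can be sketched in a few lines of Python. The variable names are illustrative, not from the slides; `NormalDist().inv_cdf` from the standard library supplies the 95th percentile of the standard normal curve, which the slides round to 1.645.

```python
import math
from statistics import NormalDist

mu0, sigma, n, alpha = 3218, 287, 42, 0.05

z_c = NormalDist().inv_cdf(1 - alpha)   # critical value on the z scale
se = sigma / math.sqrt(n)               # SD of the sample average
x_c = mu0 + z_c * se                    # critical value in customers per day

sample_mean = 3392                      # observed average from the example
reject = sample_mean > x_c              # True means the average lies in the rejection region

print(f"z_c = {z_c:.3f}, x_c = {x_c:.1f}, reject H0: {reject}")
```

Comparing the sample average with `x_c` is the same decision as comparing z_s with z_c, just expressed in the original units.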
Hypothesis testing may be correctly conducted by
using the P-value (the first method) or by using the
critical value (as we just discussed).
In scientific articles, both are usually reported, even
though the two methods are logically equivalent.
As we now discuss, the critical value also eases
computation of the power of the test.
Example: In the previous example, suppose that the
redesign increased the average number of patrons by
100, from 3218 to 3318. How likely is it that a sample
of only 42 days will come to the correct conclusion
(by rejecting the null hypothesis)?
Note: Recall that this is called the power of the test.
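Solution: Recall the critical value x_c = 3290.8.
\[
\begin{aligned}
P(\text{Reject } H_0 \mid \mu = 3318)
&= P(\bar{X} > 3290.8 \mid \mu = 3318) \\
&= P\left( \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} > \frac{3290.8 - 3318}{287/\sqrt{42}} \right) \\
&= P(Z > -0.6142) \\
&\approx 0.7305
\end{aligned}
\]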
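[Figure: null distribution centered at 3218 and alternative distribution centered at 3318. The critical value 3291 cuts off the upper 5% of the null distribution and the upper 73.05% of the alternative distribution.]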
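As a check on the power calculation, here is a minimal Python sketch. It takes the rounded critical value 3290.8 from the slides as given and treats the sample average as normal with mean 3318 under the alternative; the variable names are illustrative.

```python
import math
from statistics import NormalDist

sigma, n = 287, 42
x_c = 3290.8        # critical value from the slides (rounded)
mu_true = 3318      # assumed true average after the redesign

se = sigma / math.sqrt(n)
# Under the alternative, the sample average is approximately normal(mu_true, se).
# The power is the chance that it lands above the critical value.
power = 1 - NormalDist(mu=mu_true, sigma=se).cdf(x_c)

print(f"Power at mu = {mu_true}: {power:.4f}")
```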
[Figure: Power of the test (1 − β) as a function of the true average, plotted for true averages from 3225 to 3400.]
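The power curve can be tabulated with a short function. This is a sketch under the same assumptions as the example (known σ = 287, n = 42, α = 0.05, upper-tailed test); the function name `power` and its parameters are hypothetical. It uses the exact 95th percentile rather than the rounded 1.645, so its values may differ from the slides in the last digit.

```python
import math
from statistics import NormalDist


def power(mu_true, mu0=3218, sigma=287, n=42, alpha=0.05):
    """Power of the upper-tailed z test when the true average is mu_true."""
    se = sigma / math.sqrt(n)
    z_c = NormalDist().inv_cdf(1 - alpha)
    # Reject when Z > z_c; under mu_true the z statistic is shifted by (mu_true - mu0) / se.
    return 1 - NormalDist().cdf(z_c - (mu_true - mu0) / se)


for mu in range(3225, 3401, 25):
    print(f"true average {mu}: power = {power(mu):.3f}")
```

At the null value itself the function returns α, and the power rises toward 1 as the true average moves away from 3218.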
Example: The average braking distance from 60 mph of a
Mercury Sable is 159 feet, with an SD of 23.5 feet. Sables
equipped with (hopefully) improved tires have just undergone
early testing; the results of the first 45 tests are shown below.
Does this indicate that the new tires have decreased the
braking distance? Use α = 0.05.
139.6  150.9  151.1  134.9  120.9  170.4  157.1  159.1  150.8
165.0  127.9  175.4  120.1  178.8  164.4  157.7  174.2  143.2
179.2  183.0  172.4  136.4  143.6  126.2  175.2  175.4  171.1
171.8  125.5  175.5  182.0  147.3  151.0  159.4  167.7  157.6
137.8  146.0  132.2  163.2  125.8  167.6  120.3  118.2  180.0
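One way to work this example is sketched below in Python. The slides leave the solution to the reader, so this is an illustration rather than the lecture's own worked answer. It assumes H0: μ = 159 against Ha: μ < 159 with known σ = 23.5, and the function name `z_test_lower` is hypothetical.

```python
import math
from statistics import NormalDist, fmean

# The 45 braking distances (feet) from the example.
distances = [
    139.6, 150.9, 151.1, 134.9, 120.9, 170.4, 157.1, 159.1, 150.8,
    165.0, 127.9, 175.4, 120.1, 178.8, 164.4, 157.7, 174.2, 143.2,
    179.2, 183.0, 172.4, 136.4, 143.6, 126.2, 175.2, 175.4, 171.1,
    171.8, 125.5, 175.5, 182.0, 147.3, 151.0, 159.4, 167.7, 157.6,
    137.8, 146.0, 132.2, 163.2, 125.8, 167.6, 120.3, 118.2, 180.0,
]


def z_test_lower(sample, mu0, sigma):
    """Lower-tailed z test of H0: mu = mu0 with known population SD sigma."""
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(n))
    p_value = NormalDist().cdf(z)   # P(Z <= z): small averages count as extreme
    return z, p_value


alpha = 0.05
z_s, p_value = z_test_lower(distances, mu0=159, sigma=23.5)
print(f"z = {z_s:.3f}, P-value = {p_value:.4f}, reject H0: {p_value < alpha}")
```

Because the alternative points toward smaller braking distances, the P-value is the area to the left of the test statistic rather than to the right.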
Example: Compute the probability β of
committing a Type II error if the average braking
distance with the improved tires is now 155 feet.
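A possible approach, sketched in Python: find the critical value for the lower-tailed test, then compute the chance that the sample average fails to fall below it when the true mean is 155. The slides do not show this solution, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu0, sigma, n, alpha = 159, 23.5, 45, 0.05
mu_true = 155   # assumed true mean braking distance with the improved tires

se = sigma / math.sqrt(n)
# Lower-tailed test: reject H0 when the sample average falls below x_c.
x_c = mu0 + NormalDist().inv_cdf(alpha) * se
# Type II error: the sample average stays at or above x_c even though mu = mu_true.
beta = 1 - NormalDist(mu=mu_true, sigma=se).cdf(x_c)

print(f"x_c = {x_c:.2f} feet, beta = {beta:.4f}, power = {1 - beta:.4f}")
```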
Conceptual Questions:
1) We made a test of significance because
(choose one)
i) We knew what was in the box, but did not
know how the sample would turn out; or
ii) We knew how the sample turned out, but
did not know what was in the box.
2) The null hypothesis says that the average
of the (sample / box) is 159 feet.
3) True or False:
a) The observed significance level of 8%
depends on the data (i.e. sample)
b) There are 92 chances out of 100 for the
alternative hypothesis to be correct.
4) Suppose only 10 tests were performed
instead of 45. Should we use the normal
curve to compute P?
5) True or False:
a) A “highly statistically significant” result
cannot possibly be due to chance.
b) If a sample difference is “highly
statistically significant,” there is less than a
1% chance for the null hypothesis to be
correct.
6) True or False:
a) If P = 43%, then the null hypothesis looks
plausible.
b) If P = 0.43%, then the null hypothesis looks
implausible.
The Two-Sided z Test
Example: A company claims to have designed a
new fishing line that has a mean breaking strength of
8 kg with an SD of 0.5 kg. Consumer Reports tests a
random sample of 45 lines; the results are shown
below. Test the validity of the company’s claim.
7.56  7.96  7.36  7.81  7.84  7.93  7.72  7.50  7.33
7.72  7.36  7.84  8.06  7.87  8.17  8.19  7.56  7.51
8.15  7.40  7.48  8.05  7.40  8.23  7.72  7.82  8.16
8.21  7.95  7.34  7.43  7.45  8.18  7.72  7.74  8.24
7.27  7.81  7.45  7.93  7.44  7.33  8.10  7.59  7.54
Notes:
• To avoid data snooping, we must use a two-tailed
test. Before the tests were actually performed, we had
no a priori reason to think that the sample average
would come out either too high or too low.
• For a two-sided alternative hypothesis, the P-value
is twice as large as for a one-sided alternative.
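The two-sided test for the fishing line data can be sketched as follows. This is an illustration, assuming H0: μ = 8 against Ha: μ ≠ 8 with known σ = 0.5; the function name `z_test_two_sided` is hypothetical, and the slides do not state a significance level for this example.

```python
import math
from statistics import NormalDist, fmean

# The 45 breaking strengths (kg) from the example.
strengths = [
    7.56, 7.96, 7.36, 7.81, 7.84, 7.93, 7.72, 7.50, 7.33,
    7.72, 7.36, 7.84, 8.06, 7.87, 8.17, 8.19, 7.56, 7.51,
    8.15, 7.40, 7.48, 8.05, 7.40, 8.23, 7.72, 7.82, 8.16,
    8.21, 7.95, 7.34, 7.43, 7.45, 8.18, 7.72, 7.74, 8.24,
    7.27, 7.81, 7.45, 7.93, 7.44, 7.33, 8.10, 7.59, 7.54,
]


def z_test_two_sided(sample, mu0, sigma):
    """Two-sided z test of H0: mu = mu0 with known population SD sigma."""
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(n))
    # Double the one-tailed area beyond |z|, since both directions count as extreme.
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value


z_s, p_value = z_test_two_sided(strengths, mu0=8, sigma=0.5)
print(f"z = {z_s:.3f}, two-sided P-value = {p_value:.5f}")
```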
Example. Let’s take a look at the results of the Salk
vaccine trial, which we first saw back in Lecture #2:
              Treatment   Control     Total
Number           200745    201229    401974
Polio Cases          57       142       199
Does it appear that the vaccine was effective?
Note: If the vaccine were ineffective, then each of the
199 polio cases would fall in the treatment group with probability
p = 200745/401974 = 0.499398,
and the 57 polio cases among the treated would be just
a run of luck.
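One way to quantify "a run of luck" is sketched below. It assumes, beyond what the note states, that the number of cases in the treatment group can be treated as a binomial count with 199 trials and approximated by the normal curve; the variable names are illustrative.

```python
import math
from statistics import NormalDist

n_treatment, n_total = 200745, 401974
cases_treatment, cases_total = 57, 199

# Chance that any one case falls in the treatment group if the vaccine is ineffective.
p = n_treatment / n_total
expected = cases_total * p                    # expected treated cases under H0
sd = math.sqrt(cases_total * p * (1 - p))     # SD of the binomial count
z = (cases_treatment - expected) / sd
p_value = NormalDist().cdf(z)                 # lower tail: fewer treated cases than expected

print(f"p = {p:.6f}, expected = {expected:.1f}, SD = {sd:.2f}")
print(f"z = {z:.2f}, P-value = {p_value:.2e}")
```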