Introduction to Hypothesis Testing


Lecture 3
• Miscellaneous details about hypothesis testing
• Type II error
• Practical significance vs. statistical significance
• Chapter 12.2: Inference about the mean when the s.d. is unknown
Relation between p-value and rejection region methods
• Compare the p-value to α. Reject the null hypothesis only if p-value < α.
• Ex. 11.1 (sample size = 400):
  Estim Mean    177.9965
  Hypoth Mean   170
  T Ratio       2.3392873316
  P Value       0.0099065919
  [Figure: density plot, Y from 0.00 to 0.14 against X from 150 to 190.]
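The agreement between the two methods can be checked numerically. The sketch below is a z-test version of Ex. 11.1, assuming σ = 65 is known (the value used in the later slides) and α = 0.05. The function name is illustrative, and the z value it produces differs somewhat from the T ratio reported above, which is presumably based on the sample standard deviation.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_z_test(xbar, mu0, sigma, n, alpha):
    """Upper-tail z-test: return (p_value, critical_xbar)."""
    se = sigma / sqrt(n)                      # standard error of the mean
    z = (xbar - mu0) / se
    p_value = 1 - NormalDist().cdf(z)         # P(Z > z) under H0
    critical_xbar = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    return p_value, critical_xbar

alpha = 0.05
xbar = 177.9965                               # estimated mean from the output above
p_value, critical_xbar = one_sided_z_test(xbar, 170, 65, 400, alpha)

reject_by_p_value = p_value < alpha           # p-value method
reject_by_region = xbar > critical_xbar       # rejection region method
print(round(p_value, 4), round(critical_xbar, 2))
print(reject_by_p_value, reject_by_region)
```

Both flags always agree, because x̄ exceeds the critical value exactly when the p-value falls below α.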
Null Hypothesis in One-Sided Test
• We start by defining H1 because this is the focus of our test.
• Example 11.1: H1: μ > 170
• The null hypothesis H0: μ ≤ 170 is more logically satisfying than H0: μ = 170.
• However, only the parameter value in H0 that is closest to H1 influences the form of the test.
• We therefore take H0: μ = 170 for simplicity.
Calculating the Probability of a
Type II Error
• To properly interpret the results of a test of
hypothesis, we need to
– specify an appropriate significance level or judge the
p-value of a test;
– understand the relationship between Type I and Type
II errors.
• How do we compute the probability of a Type II error?
Calculation of the Probability
of a Type II Error
• A Type II error occurs when a false H0 is not
rejected.
• To calculate the probability of a Type II error we need to
– express the rejection region directly, in terms of the
parameter hypothesized (not standardized).
– specify the alternative value under H1.
• Let us revisit Example 11.1
– The rejection region was x̄ > 175.34 with α = .05.
– Let the alternative value be μ = 180 (rather than just μ > 170).
$$\beta = P(\bar{x} < 175.34 \mid \mu = 180) = P\left(z < \frac{175.34 - 180}{65/\sqrt{400}}\right) = 0.0764$$
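The calculation for Example 11.1 can be reproduced step by step. This is a minimal sketch using Python's standard library, with σ = 65 and n = 400 taken from the slide. The variable names are illustrative, and the result differs from 0.0764 in the third decimal only because the slide rounds the critical value and z before using a table.

```python
from math import sqrt
from statistics import NormalDist

mu0, mu1 = 170, 180        # null value and the chosen alternative value
sigma, n = 65, 400
alpha = 0.05

se = sigma / sqrt(n)                                  # 65 / 20 = 3.25
x_crit = mu0 + NormalDist().inv_cdf(1 - alpha) * se   # boundary of the rejection region
# Type II error: the sample mean falls below x_crit although the true mean is mu1.
beta = NormalDist().cdf((x_crit - mu1) / se)
print(round(x_crit, 2), round(beta, 4))
```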
Judging the Test
• A hypothesis test is effectively defined by the significance level α and by the sample size n.
• One measure of effectiveness is the probability of a Type II error, β. Typically we want to keep β as small as possible.
• If β is judged to be too large, we can reduce it by
– increasing α, and/or
– increasing the sample size.
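Both ways of reducing the Type II error probability can be seen numerically. The sketch below reuses the Example 11.1 setting (null mean 170, alternative 180, σ = 65) and varies the significance level and the sample size one at a time. The helper `type_ii_error` and the particular levels and sample sizes tried are illustrative choices, not part of the lecture.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(mu0, mu1, sigma, n, alpha):
    """Type II error probability of the upper-tail z-test when the true mean is mu1."""
    se = sigma / sqrt(n)
    x_crit = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    return NormalDist().cdf((x_crit - mu1) / se)

# Vary the significance level with n fixed at 400.
by_alpha = {a: type_ii_error(170, 180, 65, 400, a) for a in (0.01, 0.05, 0.10)}
# Vary the sample size with the significance level fixed at 0.05.
by_n = {n: type_ii_error(170, 180, 65, n, 0.05) for n in (100, 400, 1000)}

for a, b in by_alpha.items():
    print("alpha =", a, "beta =", round(b, 4))
for n, b in by_n.items():
    print("n =", n, "beta =", round(b, 4))
```

The price of a larger significance level is a larger Type I error probability, whereas a larger sample lowers the Type II error probability while leaving the significance level unchanged.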
Judging the Test
• Increasing the sample size reduces β.
Recall:
$$z_\alpha = \frac{\bar{x}_L - \mu}{\sigma/\sqrt{n}}, \quad \text{thus} \quad \bar{x}_L = \mu + z_\alpha \frac{\sigma}{\sqrt{n}}$$
By increasing the sample size, the standard deviation of the sampling distribution of the mean decreases. Thus, x̄_L decreases.
Judging the Test
• In Example 11.1, suppose n increases from 400 to 1000.
$$\bar{x}_L = \mu + z_\alpha \frac{\sigma}{\sqrt{n}} = 170 + 1.645 \cdot \frac{65}{\sqrt{1000}} = 173.38$$
$$\beta = P\left(Z < \frac{173.38 - 180}{65/\sqrt{1000}}\right) = P(Z < -3.22) \approx 0$$
• α remains 5%, but the probability of a Type II error drops dramatically.
Judging the Test
• Another way of expressing how well a test
performs is to report its power
– The power of a test is defined as 1 − β.
– It represents the probability of rejecting the null hypothesis when it is false.
Planning Studies
• Power calculations are important in planning
studies.
• Using a hypothesis test with low power makes it
unlikely that you will reject H0 even if the truth is
far from the null hypothesis.
• The operating characteristic curve is a plot of β versus the alternative μ for a fixed sample size n and a fixed significance level α.
Operating Characteristic Curve for Example 11.1
[Figure: Operating Characteristic Curves. Probability of a Type II error (vertical axis, 0.0 to 0.8) against population mean (horizontal axis, 170 to 190), with one curve each for n = 100, 400, 1000 and 2000.]
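The points on such curves can be tabulated directly. The sketch below assumes the Example 11.1 setting (upper-tail z-test of a null mean of 170, σ = 65, significance level 0.05) and evaluates the Type II error probability on a coarse grid of alternative means. The function name and the grid are illustrative, and plotting is left out to keep the example to the standard library.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(mu1, n, mu0=170, sigma=65, alpha=0.05):
    """Type II error probability of the upper-tail z-test when the true mean is mu1."""
    se = sigma / sqrt(n)
    x_crit = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    return NormalDist().cdf((x_crit - mu1) / se)

means = list(range(170, 191, 5))              # 170, 175, 180, 185, 190
oc_curves = {n: [type_ii_error(m, n) for m in means]
             for n in (100, 400, 1000, 2000)}

for n, betas in oc_curves.items():
    print(n, [round(b, 3) for b in betas])

# Power is one minus the Type II error probability at the same alternative.
power_at_180 = {n: 1 - type_ii_error(180, n) for n in oc_curves}
print(power_at_180)
```

At the null value itself every curve starts at one minus the significance level, and larger samples make the curve fall away faster as the alternative mean moves from 170.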
Problem 11.54
Many Alpine ski centers base their projections of revenues
and profits on the assumption that the average Alpine
skier skis 4 times per year.
To investigate the validity of this assumption, a random
sample of 63 skiers is drawn and each is asked to report
the number of times they skied the previous year.
Assume that the population standard deviation is 2, and
the sample mean is 4.84. Can we infer at the 10% level
that the assumption is wrong?
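One way to work Problem 11.54 is sketched below. It assumes a two-sided z-test of a null mean of 4 (the question asks only whether the assumption is wrong, not in which direction) with the population standard deviation of 2 treated as known. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma = 4, 2          # hypothesized mean and known population s.d.
n, xbar = 63, 4.84         # sample size and sample mean
alpha = 0.10

z = (xbar - mu0) / (sigma / sqrt(n))
p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # two-sided p-value
z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # about 1.645
reject = abs(z) > z_crit
print(round(z, 2), round(p_value, 5), reject)
```

Under these assumptions the test statistic lies well beyond the critical value, so the data support the inference that the assumed average of 4 is wrong at the 10% level.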
Problem 11.54 follow-up
• What is the probability of making a Type II error if
the average Alpine skier skis 4.2 times per year?
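The follow-up can be approached the same way as Example 11.1, except that a two-sided test has a non-rejection region with two boundaries. The sketch below assumes the two-sided z-test at the 10% level from Problem 11.54 and a true mean of 4.2. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, mu1 = 4, 4.2          # hypothesized mean and the alternative value
sigma, n, alpha = 2, 63, 0.10

se = sigma / sqrt(n)
margin = NormalDist().inv_cdf(1 - alpha / 2) * se
lower, upper = mu0 - margin, mu0 + margin       # H0 is not rejected inside (lower, upper)

# Type II error: the sample mean lands in the non-rejection region
# even though the true mean is mu1.
sampling_dist = NormalDist(mu1, se)
beta = sampling_dist.cdf(upper) - sampling_dist.cdf(lower)
print(round(lower, 3), round(upper, 3), round(beta, 3))
```

The probability comes out at roughly 0.8, so a sample of 63 has little power to detect a shift as small as 0.2.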
Problem: Effects of SAT Coaching
• Suppose that SAT mathematics scores in the absence of coaching have a normal distribution with mean 475 and standard deviation 100. Suppose further that coaching may change the mean but not the standard deviation. Calculate the p-value for the test of H0: μ = 475 versus H1: μ > 475 for each of the following three situations:
(a) A coaching service coaches 100 students; their SAT-M scores average x̄ = 478.
(b) By the next year, the coaching service has coached 1000 students; their SAT-M scores average x̄ = 478.
(c) An advertising campaign brings the total number of students coached to 10,000; their average score is still x̄ = 478.
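The three p-values can be computed with the same few lines. The sketch below assumes an upper-tail z-test (the alternative that coaching raises the mean above 475) with the standard deviation of 100 treated as known. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def upper_tail_p_value(xbar, mu0, sigma, n):
    """P(Z > z) for the z statistic of a sample mean xbar under H0: mean = mu0."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    return 1 - NormalDist().cdf(z)

p_values = {n: upper_tail_p_value(478, 475, 100, n) for n in (100, 1000, 10000)}
for n, p in p_values.items():
    print(n, round(p, 4))
```

The observed difference of 3 points is the same in all three situations; only the sample size changes, and the p-value falls from about 0.38 to about 0.001.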
Practical Significance vs.
Statistical Significance
• An increase in the average SAT-M score from 475 to 478 is of little importance in seeking admission to college, but with a large enough sample size even very small effects will be declared statistically significant.
• A confidence interval provides information about the size of the effect and should always be reported. The two-sided 95% confidence intervals for the SAT coaching problem are 478 ± (1.96)(100/√n). Thus, for (a): (458.4, 497.6); (b): (471.8, 484.2); (c): (476.04, 479.96).
• For large samples, the CI says “Yes, the mean score is higher after coaching but only by a small amount.”
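The three intervals can be reproduced as follows. This is a minimal sketch assuming a known standard deviation of 100 and a sample mean of 478 in each case. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, level=0.95):
    """Two-sided confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.96 for 95%
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

intervals = {n: z_confidence_interval(478, 100, n) for n in (100, 1000, 10000)}
for n, (lo, hi) in intervals.items():
    print(n, round(lo, 2), round(hi, 2))
```

The interval narrows around 478 as n grows, which is what makes a 3-point effect statistically detectable without making it any larger.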
Chapter 12
• In this chapter we utilize the approach developed
before to describe a population.
– Identify the parameter to be estimated or tested.
– Specify the parameter’s estimator and its sampling
distribution.
– Construct a confidence interval estimator or perform
a hypothesis test.
12.2 Inference About a Population
Mean When the Population Standard
Deviation Is Unknown
Recall that when σ is known we use the following statistic to estimate and test a population mean:
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
When σ is unknown, we use its point estimator s, and the z-statistic is then replaced by the t-statistic.
t-Statistic
xm
t
s/ n
• When the sampled population is normally distributed, the t
statistic is Student t distributed with n-1 degrees of freedom.
s
a /2
x

t
a / 2 ,n1
• Confidence Interval:
where
is
the
t
a / 2,n1
n
quantile of the Student t-distribution with n-1 degrees of freedom.
21
The t - Statistic
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
The t distribution is mound-shaped and symmetrical around zero.
The “degrees of freedom” (a function of the sample size) determine how spread out the distribution is (compared to the normal distribution).
[Figure: two t densities centered at 0, one with d.f. = v1 and one with d.f. = v2, where v1 < v2.]
[Figure: t density with upper-tail area A = .05 to the right of t_A.]

Degrees of Freedom   t.100    t.05     t.025    t.01     t.005
1                    3.078    6.314    12.706   31.821   63.657
2                    1.886    2.920    4.303    6.965    9.925
.                    .        .        .        .        .
20                   1.325    1.725    2.086    2.528    2.845
.                    .        .        .        .        .
200                  1.286    1.653    1.972    2.345    2.601
∞                    1.282    1.645    1.960    2.326    2.576
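The last row of the table (infinite degrees of freedom) is the standard normal distribution, which can be checked against Python's standard library. The sketch below copies that row from the table and compares each entry with the corresponding normal quantile. The dictionary name is illustrative.

```python
from statistics import NormalDist

# Upper-tail area A -> critical value from the last row of the t table.
t_table_infinity = {0.100: 1.282, 0.05: 1.645, 0.025: 1.960,
                    0.01: 2.326, 0.005: 2.576}

differences = {}
for tail_area, table_value in t_table_infinity.items():
    z = NormalDist().inv_cdf(1 - tail_area)    # normal value with area A to its right
    differences[tail_area] = abs(z - table_value)
    print(tail_area, table_value, round(z, 3))
```

Reading down any column of the table, the critical values shrink toward these normal values as the degrees of freedom grow, which is why the z and t procedures nearly coincide for large samples.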
Testing μ when σ is unknown
• Example 12.1
– In order to determine the number of workers required to meet
demand, the productivity of newly hired trainees is studied.
– It is believed that trainees can process and distribute more
than 450 packages per hour within one week of hiring.
– Fifty trainees were observed for one hour. In this sample of
50 trainees, the mean number of packages processed is
460.38 and s=38.82.
– Can we conclude that the belief is correct, based on the
productivity observation of 50 trainees?
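A possible way to carry out the test for Example 12.1 is sketched below, assuming an upper-tail t-test of a null mean of 450 against a mean above 450. The standard library has no Student t distribution, so the tail probability is obtained here by integrating the t density with Simpson's rule. The helper functions are illustrative, and a statistics package would normally supply this value.

```python
from math import sqrt, pi, lgamma, exp, log

def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + x * x / df))

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # The density is symmetric about zero, so P(T > 0) = 0.5.
    return 0.5 - total * h / 3

xbar, mu0, s, n = 460.38, 450, 38.82, 50
t_stat = (xbar - mu0) / (s / sqrt(n))
p_value = t_upper_tail(t_stat, n - 1)          # n - 1 = 49 degrees of freedom
print(round(t_stat, 2), round(p_value, 4))
```

The example does not state a significance level; at an assumed level of 0.05 the p-value is below the level, which would support the belief that trainees process more than 450 packages per hour.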
Checking the required conditions
• In deriving the test and confidence interval, we have made two
assumptions: (i) the sample is a random sample from the
population; (ii) the distribution of the population is normal.
• The t test is robust – the results are still approximately valid as
long as the population is not extremely nonnormal. Also if the
sample size is large, the results are approximately valid.
• A rough graphical approach to examining normality is to look at the sample histogram.
[Figure: Distributions. Histogram of Packages, horizontal axis from 350 to 550.]
JMP Example
• Problem 12.45: Companies that sell groceries over the
Internet are called e-grocers. Customers enter their
orders, pay by credit card, and receive delivery by truck.
A potential e-grocer analyzed the market and
determined that to be profitable the average order
would have to exceed $85. To determine whether an e-grocer would be profitable in one large city, she offered
the service and recorded the size of the order for a
random sample of customers. Can we infer from the
data that e-grocery will be profitable in this city at
significance level 0.05?
Practice Problems
• 11.68, 11.84, 12.40, 12.46