Transcript Slide 1

8 Tests of Hypotheses Based on a Single Sample
8.1-8.2 Hypotheses; Tests About a Population Mean
Overview of Inference

Methods for drawing conclusions about a population from sample
data are called statistical inference.

Methods:
* Confidence intervals: estimate the value of a population parameter.
* Tests of hypotheses: assess the evidence for a claim about a population.
Inference is appropriate when data are produced by either
* a random sample or
* a randomized experiment.
Stating hypotheses
A hypothesis test uses sample data to decide on the validity of a
specific hypothesis.
In statistics, a hypothesis is an assumption or a theory about the
characteristics of one or more variables in one or more populations.
What you want to know: Does the calibrating machine that sorts cherry
tomatoes into packs need revision?
The same question reframed statistically: Is the population mean µ for the
distribution of weights of cherry tomato packages equal to 227 g (i.e., half
a pound)?
The null hypothesis is a very specific statement about a parameter of
the population(s). It is labeled H0.
The alternative hypothesis is a more general statement about a
parameter of the population(s) that is exclusive of the null hypothesis. It
is labeled Ha.
Weight of cherry tomato packs:
H0 : µ = 227 g (µ is the average weight of the population of packs)
Ha : µ ≠ 227 g (µ is either larger or smaller)
One-sided and two-sided tests
A two-tail or two-sided test of the population mean has these null
and alternative hypotheses:

H0 : µ = [a specific number] Ha : µ ≠ [a specific number]
A one-tail or one-sided test of a population mean has these null and
alternative hypotheses:

H0 : µ = [a specific number] Ha : µ < [a specific number]
OR
H0 : µ = [a specific number] Ha : µ > [a specific number]
The FDA tests whether a generic drug has an absorption extent similar to
the known absorption extent of the brand-name drug it is copying. Higher or
lower absorption would both be problematic, thus we test:
H0 : µgeneric = µbrand
Ha : µgeneric ≠ µbrand (two-sided)
Test Statistic
A test of significance is based on a statistic that estimates the
parameter that appears in the hypotheses. When H0 is true, we expect
the estimate to take a value near the parameter value specified in H0.
Values of the estimate far from the parameter value specified by H0
give evidence against H0.
A test statistic calculated from the sample data measures how far
the data diverge from what we would expect if the null hypothesis
H0 were true.
$$ z = \frac{\text{estimate} - \text{hypothesized value}}{\text{standard deviation of the estimate}} $$
Large values of the statistic show that the data are not consistent
with H0.
P-Value
The null hypothesis H0 states the claim that we are seeking evidence
against. The probability that measures the strength of the evidence
against a null hypothesis is called a P-value.
The probability, computed assuming H0 is true, that the statistic
would take a value as extreme as or more extreme than the one
actually observed is called the P-value of the test. The smaller
the P-value, the stronger the evidence against H0 provided by
the data.
* Small P-values are evidence against H0 because they say that the
observed result is unlikely to occur when H0 is true.
* Large P-values fail to give convincing evidence against H0 because
they say that the observed result is likely to occur by chance when
H0 is true.
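As a minimal sketch of how a P-value follows from a z statistic, the function below uses the standard normal distribution from Python's standard library. The function name and the labels "less", "greater", and "two-sided" are illustrative choices, not notation from the slides.

```python
from statistics import NormalDist

def z_p_value(z, alternative="two-sided"):
    """P-value of a z statistic, computed assuming H0 is true."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "less":        # Ha: parameter < hypothesized value
        return std_normal.cdf(z)
    if alternative == "greater":     # Ha: parameter > hypothesized value
        return 1.0 - std_normal.cdf(z)
    if alternative == "two-sided":   # Ha: parameter differs from hypothesized value
        return 2.0 * std_normal.cdf(-abs(z))
    raise ValueError(f"unknown alternative: {alternative!r}")

# The farther z is from 0, the smaller the P-value.
for z in (0.5, 2.0, 3.0):
    print(f"z = {z}: two-sided P-value = {z_p_value(z):.4f}")
```

Statistics far from zero give small P-values (strong evidence against H0), while statistics near zero give large ones.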
Statistical Significance
The final step in performing a significance test is to draw a conclusion
about the competing claims you were testing. We will make one of two
decisions based on the strength of the evidence against the null
hypothesis (and in favor of the alternative hypothesis): reject H0 or fail
to reject H0.
* If our sample result is too unlikely to have happened by chance
assuming H0 is true, then we’ll reject H0.
* Otherwise, we will fail to reject H0.
Note: A fail-to-reject H0 decision in a significance test doesn’t mean
that H0 is true. For that reason, you should never “accept H0” or use
language implying that you believe H0 is true.
In a nutshell, our conclusion in a significance test comes down to:
P-value small → reject H0 → conclude Ha (in context)
P-value large → fail to reject H0 → cannot conclude Ha (in context)
Statistical Significance
There is no rule for how small a P-value we should require in order to
reject H0; it’s a matter of judgment and depends on the specific
circumstances. But we can compare the P-value with a fixed value
that we regard as decisive, called the significance level (or level of
significance). We write it as α, the Greek letter alpha. When our P-value
is less than the chosen α, we say that the result is statistically
significant at level α.
When we use a fixed level of significance to draw a conclusion in a
significance test:
P-value < α → reject H0 → conclude Ha (in context)
P-value ≥ α → fail to reject H0 → cannot conclude Ha (in context)
Tests for a Population Mean
Four Steps of Tests of Significance
1. State the null and alternative hypotheses.
2. Calculate the value of the test statistic.
3. Find the P-value for the observed data.
4. State a conclusion.
We will learn the details of many tests of significance in the following
chapters. The proper test statistic is determined by the hypotheses
and the data collection design.
Does the packaging machine need revision?
H0 : µ = 227 g versus Ha : µ ≠ 227 g
Sample: x̄ = 222 g, σ = 5 g, n = 4
What is the probability of drawing a random sample such
as yours if H0 is true?
$$ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{222 - 227}{5/\sqrt{4}} = -2 $$
From table A, the area under the standard
normal curve to the left of z is 0.0228.
Thus, P-value = 2 × 0.0228 = 4.56%.
The probability of getting a random sample average so different from
µ is so low that we reject H0. The machine does need recalibration.
[Figure: sampling distribution of the average package weight (n = 4) under H0, centered at µ = 227 g with σ/√n = 2.5 g; horizontal axis marked at 217, 222, 227, 232, and 237 g; the tails beyond z = −2 and z = 2 each have area 2.28%.]
The significance level: α
The significance level, α, is the largest P-value for which we still reject
H0 (how much evidence against H0 we require). It is also the probability of
rejecting a null hypothesis that is actually true. This value is chosen
before conducting the test.
* If the P-value is equal to or less than α (P ≤ α), then we reject H0.
* If the P-value is greater than α (P > α), then we fail to reject H0.
Does the packaging machine need revision?
Two-sided test. The P-value is 4.56%.
* If α had been set to 5%, then the result would be significant (4.56% < 5%).
* If α had been set to 1%, then the result would not be significant (4.56% > 1%).
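The packaging example can be reproduced in a few lines. This is a sketch using only Python's standard library; the variable names are illustrative, and the numbers are those given in the slides (µ0 = 227 g, x̄ = 222 g, σ = 5 g, n = 4).

```python
from math import sqrt
from statistics import NormalDist

mu0 = 227.0    # hypothesized mean weight under H0 (g)
x_bar = 222.0  # observed sample mean (g)
sigma = 5.0    # known population standard deviation (g)
n = 4          # sample size

std_error = sigma / sqrt(n)              # standard deviation of x_bar: 2.5 g
z = (x_bar - mu0) / std_error            # (222 - 227) / 2.5
p_value = 2 * NormalDist().cdf(-abs(z))  # two-sided: area in both tails

print(f"z = {z:.2f}, P-value = {p_value:.4f}")
for alpha in (0.05, 0.01):
    decision = "reject H0" if p_value <= alpha else "fail to reject H0"
    print(f"alpha = {alpha}: {decision}")
```

The computed P-value is about 0.0455; the slide's 4.56% comes from doubling the table value 0.0228, which is already rounded.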
Two-Sided Significance Tests and
Confidence Intervals
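Because a two-sided test is symmetrical, you can also use a 1 – α
confidence interval to test a two-sided hypothesis at level α: reject H0
when the hypothesized value falls outside the interval.
In a two-sided test, C = 1 – α, where C is the confidence level and α is the
significance level. Each tail has area α/2.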
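The link between a two-sided test and a confidence interval can be checked numerically. The sketch below reuses the tomato-pack numbers and assumes the known-σ interval x̄ ± z*·σ/√n, which is not written out on these slides; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, x_bar, sigma, n = 227.0, 222.0, 5.0, 4  # tomato-pack example
alpha = 0.05                                  # significance level, so C = 1 - alpha = 95%

std_normal = NormalDist()
z_star = std_normal.inv_cdf(1 - alpha / 2)    # leaves alpha/2 in each tail
margin = z_star * sigma / sqrt(n)
ci_low, ci_high = x_bar - margin, x_bar + margin

z = (x_bar - mu0) / (sigma / sqrt(n))
p_value = 2 * std_normal.cdf(-abs(z))

reject_by_interval = not (ci_low <= mu0 <= ci_high)  # mu0 outside the interval
reject_by_p_value = p_value < alpha

print(f"{1 - alpha:.0%} CI: ({ci_low:.2f}, {ci_high:.2f})")
print(f"reject by interval: {reject_by_interval}, reject by P-value: {reject_by_p_value}")
```

Both routes lead to the same decision here: 227 g lies just outside the 95% interval, and the P-value is just below 0.05.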
Sweetening colas
Cola manufacturers want to test how much the sweetness of a new
cola drink is affected by storage. The sweetness loss due to storage
was evaluated by 10 professional tasters (by comparing the sweetness
before and after storage):

Taster            1     2     3     4     5     6     7     8     9     10
Sweetness loss    2.0   0.4   0.7   2.0   −0.4  2.2   −1.3  1.2   1.1   2.3
Obviously, we want to test if storage results in a loss of
sweetness, thus:
H0: µ = 0 versus Ha: µ > 0
This looks familiar. However, here we do not know the population parameter σ.
* The population of all cola drinkers is too large.
* Since this is a new cola recipe, we have no population data.
This situation is very common with real data.
When σ is unknown
The sample standard deviation s provides an estimate of the population
standard deviation σ.
* When the sample size is large, the sample is likely to contain
elements representative of the whole population. Then s is a
good estimate of σ.
* But when the sample size is small, the sample contains only
a few individuals. Then s is a mediocre estimate of σ.
[Figure: population distribution compared with a large sample and a small sample]
Standard deviation s – standard error s/√n
For a sample of size n, the sample standard deviation s is:
$$ s = \sqrt{\frac{1}{n-1} \sum_{i} (x_i - \bar{x})^2} $$
The value s/√n is called the standard error of the mean x̄.
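To make the two quantities concrete, here is a short sketch that computes s and the standard error for the ten sweetness-loss values from the cola example. It writes out the n − 1 formula by hand and compares it with `statistics.stdev`, which uses the same definition; variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Sweetness loss reported by the 10 tasters in the cola example
sweetness_loss = [2.0, 0.4, 0.7, 2.0, -0.4, 2.2, -1.3, 1.2, 1.1, 2.3]

n = len(sweetness_loss)
x_bar = mean(sweetness_loss)

# Sample standard deviation written out from the formula (divide by n - 1)
s_manual = sqrt(sum((x - x_bar) ** 2 for x in sweetness_loss) / (n - 1))
s = stdev(sweetness_loss)  # library version, same n - 1 definition

std_error = s / sqrt(n)    # standard error of the mean

print(f"mean = {x_bar:.2f}, s = {s:.3f}, standard error = {std_error:.3f}")
```

The values agree with the summary in the slides (average 1.02, standard deviation 1.196).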
The t distributions
Suppose that an SRS of size n is drawn from an N(µ, σ) population.
* When σ is known, the sampling distribution of x̄ is N(µ, σ/√n).
* When σ is estimated from the sample standard deviation s, the
sampling distribution follows a t distribution t(µ, s/√n) with degrees
of freedom n − 1.
$$ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} $$
is the one-sample t statistic.
When n is very large, s is a very good estimate of σ, and the
corresponding t distributions are very close to the normal distribution.
The t distributions become wider for smaller sample sizes, reflecting the
lack of precision in estimating σ from s.
Standardizing the data before using t-table
As with the normal distribution, the first step is to standardize the data.
Then we can use t-table to obtain the area under the curve.
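$$ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} $$
[Figure: the t(µ, s/√n) distribution of x̄ (df = n − 1), centered at µ with scale s/√n, is standardized to the t(0, 1) distribution of t (df = n − 1), centered at 0 with scale 1.]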
T-table
When σ is unknown, we use a t distribution with “n−1” degrees of
freedom (df):
$$ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} $$
When σ is known, we use the normal distribution and the
standardized z-value.
Table shows the z-values and t-values corresponding to
landmark P-values/confidence levels.
z-table vs. t-table
Z-table (Table A) gives the area to the LEFT of hundreds of z-values.
It should only be used for Normal distributions.
(…)
t-table (Table D) gives the area to the RIGHT of a dozen t or z-values.
It can be used for t distributions of a given df and for the
Normal distribution.
(…)
T-table also gives the middle area under a t or normal distribution
between the negative and positive value of t or z.
The P-value is the probability, if H0 is true, of randomly drawing a
sample like the one obtained or more extreme, in the direction of Ha.
The P-value is calculated as the corresponding area under the curve,
one-tailed or two-tailed depending on Ha:
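[Figure: shaded tail areas under the t curve for a one-sided (one-tailed) test and for a two-sided (two-tailed) test]
$$ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} $$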
T-table
For df = 9 we only look into the corresponding row.
The calculated value of t is 2.7. We find the 2 closest t values:
2.398 < t = 2.7 < 2.821, thus 0.02 > upper tail p > 0.01.
For a one-sided Ha, this is the P-value (between 0.01 and 0.02);
for a two-sided Ha, the P-value is doubled (between 0.02 and 0.04).
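The table only brackets the P-value. As a rough cross-check, the sketch below integrates the Student t density numerically with Simpson's rule, using only the standard library. The helper names `t_pdf` and `t_upper_tail` are illustrative, and a statistics package would normally be used instead of hand-rolled integration.

```python
from math import gamma, pi, sqrt

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T >= t) for t >= 0 using Simpson's rule on [0, t]."""
    if t < 0:
        raise ValueError("this sketch only handles t >= 0")
    h = t / steps                       # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2      # interior points alternate 4, 2, 4, ...
        total += weight * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 0.5 - area_0_to_t            # by symmetry, P(T >= 0) = 0.5

p_one_sided = t_upper_tail(2.7, df=9)
p_two_sided = 2 * p_one_sided
print(f"one-sided P = {p_one_sided:.4f}, two-sided P = {p_two_sided:.4f}")
```

The one-sided result falls between 0.01 and 0.02, and the two-sided result between 0.02 and 0.04, matching the bounds read from the table.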
Sweetening colas (continued)
Is there evidence that storage results in sweetness loss for the new cola
recipe at the 0.05 level of significance (α = 5%)?
H0: µ = 0 versus Ha: µ > 0 (one-sided test)
$$ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{1.02 - 0}{1.196/\sqrt{10}} = 2.70 $$
* The critical value tα = 1.833 (df = 9, upper tail area 0.05).
t > tα, thus the result is significant.
* 2.398 < t = 2.70 < 2.821, thus 0.02 > p > 0.01.
p < α, thus the result is significant.
Taster   Sweetness loss
1        2.0
2        0.4
3        0.7
4        2.0
5        -0.4
6        2.2
7        -1.3
8        1.2
9        1.1
10       2.3
___________________________
Average              1.02
Standard deviation   1.196
Degrees of freedom   n − 1 = 9
The t-test has a significant p-value. We reject H0.
There is a significant loss of sweetness, on average, following storage.
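The cola test can be recomputed directly from the raw data. This sketch uses only the standard library, so the critical value 1.833 is taken from the t-table lookup in the slides rather than computed; variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

sweetness_loss = [2.0, 0.4, 0.7, 2.0, -0.4, 2.2, -1.3, 1.2, 1.1, 2.3]
mu0 = 0.0           # H0: mean sweetness loss is 0
t_critical = 1.833  # from the slides' t-table: df = 9, upper tail area 0.05

n = len(sweetness_loss)
df = n - 1
x_bar = mean(sweetness_loss)
s = stdev(sweetness_loss)              # sample standard deviation (n - 1)
t = (x_bar - mu0) / (s / sqrt(n))      # one-sample t statistic

significant = t > t_critical           # one-sided test, Ha: mu > 0
print(f"t = {t:.2f} with df = {df}; significant at 5%: {significant}")
```

Because t exceeds the critical value, the one-sided test rejects H0 at the 5% level, in line with the conclusion above.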
The one-sample t-test
As in the previous chapter, a test of hypotheses requires a few steps:
1. Stating the null and alternative hypotheses (H0 versus Ha)
2. Deciding on a one-sided or two-sided test
3. Choosing a significance level α
4. Calculating t and its degrees of freedom
5. Finding the area under the curve with t-table
6. Stating the P-value and interpreting the result

The one-sample t-confidence interval
The level C confidence interval is an interval computed by a method that
has probability C of containing the true population parameter.
We have a data set from a population with both µ and σ unknown. We
use x̄ to estimate µ and s to estimate σ, using a t distribution (df n−1).
Practical use of t: t*
* C is the area between −t* and t*. We find t* in the line of Table D
for df = n−1 and confidence level C.
* The margin of error m is:
$$ m = t^* \frac{s}{\sqrt{n}} $$
[Figure: t distribution with central area C between −t* and t*; the interval extends a margin m on each side of the center.]
Red wine, in moderation
Drinking red wine in moderation may protect against heart attacks. The
polyphenols it contains act on blood cholesterol, likely helping to prevent heart
attacks.
To see if moderate red wine consumption increases the average blood level of
polyphenols, a group of nine randomly selected healthy men were assigned to
drink half a bottle of red wine daily for two weeks. Their blood polyphenol levels
were assessed before and after the study, and the percent change is presented
here:
0.7  3.5  4  4.9  5.5  7  7.4  8.1  8.4
Firstly: Are the data approximately normal?
[Histogram: frequency (0 to 4) of the percentage change in polyphenol blood levels, with bins 2.5, 5, 7.5, 9, and More]
There is a low value, but overall the data can be considered
reasonably normal.
What is the 95% confidence interval for the average percent change?
Sample average = 5.5; s = 2.517; df = n − 1 = 8
(…)
The sampling distribution is a t distribution with n − 1 degrees of freedom.
For df = 8 and C = 95%, t* = 2.306.
The margin of error m is: m = t*s/√n = 2.306*2.517/√9 ≈ 1.93.
Therefore, the confidence interval is (5.5 − 1.93, 5.5 + 1.93) = (3.57, 7.43).
With 95% confidence, the population average percent increase in
polyphenol blood levels of healthy men drinking half a bottle of red wine
daily is between 3.6% and 7.4%.
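The interval can be recomputed from the nine observations. The sketch below takes t* = 2.306 from the table lookup in the slides (df = 8, C = 95%) rather than computing it, and uses illustrative variable names.

```python
from math import sqrt
from statistics import mean, stdev

# Percent change in blood polyphenol level for the nine men
percent_change = [0.7, 3.5, 4, 4.9, 5.5, 7, 7.4, 8.1, 8.4]
t_star = 2.306  # from the slides' t-table: df = 8, confidence level 95%

n = len(percent_change)
x_bar = mean(percent_change)
s = stdev(percent_change)          # sample standard deviation (n - 1)
margin = t_star * s / sqrt(n)      # m = t* s / sqrt(n)
ci_low, ci_high = x_bar - margin, x_bar + margin

print(f"mean = {x_bar:.1f}, s = {s:.3f}, m = {margin:.2f}")
print(f"95% CI: ({ci_low:.1f}, {ci_high:.1f})")
```

The result reproduces the summary statistics and the interval of roughly 3.6% to 7.4% stated above.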
Type I and II errors
When we draw a conclusion from a significance test, we hope our
conclusion will be correct. But sometimes it will be wrong. There are two
types of mistakes we can make.
If we reject H0 when H0 is true, we have committed a Type I error.
If we fail to reject H0 when H0 is false, we have committed a Type II
error.
                         Truth about the population
Conclusion based
on sample                H0 true               H0 false (Ha true)
Reject H0                Type I error          Correct conclusion
Fail to reject H0        Correct conclusion    Type II error
Type I and II errors

* A Type I error is made when we reject the null hypothesis and the
null hypothesis is actually true (incorrectly reject a true H0).
The probability of making a Type I error is the significance level α.
* A Type II error is made when we fail to reject the null hypothesis
and the null hypothesis is false (incorrectly keep a false H0).
The probability of making a Type II error is labeled β.
The power of a test is 1 − β.
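To illustrate α, β, and power, the sketch below computes the rejection probability of a two-sided z-test for the tomato-pack setup (µ0 = 227 g, σ = 5 g, n = 4, α = 0.05). The "true" mean of 222 g is a hypothetical value chosen for illustration only, and the function name is not from the slides.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_z_power(mu_true, mu0, sigma, n, alpha):
    """Probability that a two-sided z-test rejects H0 when the mean is mu_true."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # reject when |z| >= z_crit
    shift = (mu_true - mu0) / (sigma / sqrt(n))  # mean of z when mu = mu_true
    lower_tail = std_normal.cdf(-z_crit - shift)     # P(z <= -z_crit)
    upper_tail = 1 - std_normal.cdf(z_crit - shift)  # P(z >= z_crit)
    return lower_tail + upper_tail

mu0, sigma, n, alpha = 227.0, 5.0, 4, 0.05

# When H0 is true, the rejection probability is the Type I error rate alpha.
type_i_rate = two_sided_z_power(mu0, mu0, sigma, n, alpha)

# Hypothetical alternative: suppose the true mean were 222 g.
power = two_sided_z_power(222.0, mu0, sigma, n, alpha)
beta = 1 - power  # probability of a Type II error at that alternative

print(f"Type I error rate = {type_i_rate:.3f}")
print(f"power = {power:.3f}, beta = {beta:.3f}")
```

Under this assumed alternative the power is only about 0.52, so a sample of four packs would miss a 5 g shift nearly half the time.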
The Common Practice of Testing Hypotheses
1. State H0 and Ha as in a test of significance.
2. Think of the problem as a decision problem, so the probabilities of
Type I and Type II errors are relevant.
3. Consider only tests in which the probability of a Type I error is no
greater than α.
4. Among these tests, select a test that makes the probability of a
Type II error as small as possible.
Steps for Tests of Significance
1. Assumptions/Conditions
   Specify variable, parameter, method of data collection, shape of population.
2. State hypotheses
   Null hypothesis H0 and alternative hypothesis Ha.
3. Calculate value of the test statistic
   A measure of “difference” between hypothesized value and its estimate.
4. Determine the P-value
   Probability, assuming H0 is true, that the test statistic takes the observed
   value or a more extreme value.
5. State the decision and conclusion
   Interpret P-value, make decision about H0.
8.3 Tests Concerning a Population Proportion
Sampling distribution of sample proportion
The sampling distribution of a sample proportion p̂ is approximately
normal (normal approximation of a binomial distribution) when the
sample size is large enough.
Conditions for inference on p
Assumptions:
1. The data used for the estimate are an SRS from the population
studied.
2. The population is at least 10 times as large as the sample used for
inference.
3. The sample size n is large enough that the sampling distribution can
be approximated with a normal distribution. Otherwise, rely on the
binomial distribution.
Large-sample confidence interval for p
Confidence intervals contain the population proportion p in C% of
samples. For an SRS of size n drawn from a large population, and with
sample proportion p̂ calculated from the data, an approximate level C
confidence interval for p is:
$$ \hat{p} \pm m, \qquad m = z^* \, SE = z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} $$
where m is the margin of error.
C is the area under the standard normal curve between −z* and z*.
Use this method when the number of successes and the number of
failures are both at least 15.
[Figure: standard normal curve with central area C between −z* and z*; the interval extends a margin m on each side.]
Medication side effects
Arthritis is a painful, chronic inflammation of the joints.
An experiment on the side effects of pain relievers
examined arthritis patients to find the proportion of
patients who suffer side effects.
What are some side effects of ibuprofen?
Serious side effects (seek medical attention immediately):
Allergic reaction (difficulty breathing, swelling, or hives)
Muscle cramps, numbness, or tingling
Ulcers (open sores) in the mouth
Rapid weight gain (fluid retention)
Seizures
Black, bloody, or tarry stools
Blood in your urine or vomit
Decreased hearing or ringing in the ears
Jaundice (yellowing of the skin or eyes)
Abdominal cramping, indigestion, or heartburn
Less serious side effects (discuss with your doctor):
Dizziness or headache
Nausea, gaseousness, diarrhea, or constipation
Depression
Fatigue or weakness
Dry mouth
Irregular menstrual periods
Let’s calculate a 90% confidence interval for the population proportion of
arthritis patients who suffer some “adverse symptoms.”
What is the sample proportion p̂?
$$ \hat{p} = \frac{23}{440} \approx 0.052 $$
What is the sampling distribution for the proportion of arthritis patients with
adverse symptoms for samples of 440?
$$ \hat{p} \sim N\left(p, \sqrt{p(1 - p)/n}\right) $$
For a 90% confidence level, z* = 1.645.

Upper tail probability P   0.25   0.20   0.15   0.10   0.05   0.025  0.02   0.01
z*                         0.67   0.841  1.036  1.282  1.645  1.960  2.054  2.326
Confidence level C         50%    60%    70%    80%    90%    95%    96%    98%

Using the large sample method, we calculate a margin of error m:
$$ m = z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = 1.645 \sqrt{\frac{0.052(1 - 0.052)}{440}} = 1.645 \times 0.0106 \approx 0.0174 $$
90% CI for p: p̂ ± m, or 0.052 ± 0.0174.
* With a 90% confidence level, between 3.5% and 6.9% of arthritis patients
taking this pain medication experience some adverse symptoms.
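The same interval can be computed in code. This sketch uses the counts from the example (23 of 440 patients) and gets z* from the standard normal distribution instead of the table; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

successes, n = 23, 440   # patients with adverse symptoms, sample size
confidence = 0.90        # confidence level C

failures = n - successes
# Large-sample condition from the slides: at least 15 successes and 15 failures
large_sample_ok = successes >= 15 and failures >= 15

p_hat = successes / n
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.645 for C = 90%
margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
ci_low, ci_high = p_hat - margin, p_hat + margin

print(f"p_hat = {p_hat:.3f}, z* = {z_star:.3f}, m = {margin:.4f}")
print(f"90% CI: ({ci_low:.3f}, {ci_high:.3f})")
```

Using the unrounded p̂ gives an upper limit near 0.070 rather than 0.069; the small difference comes from the slides rounding p̂ to 0.052 before computing the interval.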
Significance test for p
The sampling distribution for p̂ is approximately normal for large
sample sizes and its shape depends solely on p and n.
Thus, we can easily test the null hypothesis:
H0: p = p0 (a given value we are testing).
If H0 is true, the sampling distribution is known: it is approximately
normal with mean p0 and standard deviation
$$ \sqrt{\frac{p_0(1 - p_0)}{n}} $$
The likelihood of our sample proportion given the
null hypothesis depends on how far from p0 our p̂
is in units of standard deviation:
$$ z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}} $$
This is valid when both expected counts (expected successes np0 and
expected failures n(1 − p0)) are each 10 or larger.
A national survey by the National Institute for Occupational Safety and Health on
restaurant employees found that 75% said that work stress had a negative impact
on their personal lives.
You investigate a restaurant chain to see if the proportion of all their employees
negatively affected by work stress differs from the national proportion p0 = 0.75.
H0: p = p0 = 0.75 vs. Ha: p ≠ 0.75 (2 sided alternative)
In your SRS of 100 employees, you find that 68 answered “Yes” when asked,
“Does work stress have a negative impact on your personal life?”
The expected counts are 100 × 0.75 = 75 and 100 × 0.25 = 25.
Both are greater than 10, so we can use the z-test.
The test statistic is:
$$ z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}} = \frac{0.68 - 0.75}{\sqrt{\dfrac{(0.75)(0.25)}{100}}} = -1.62 $$
From Table A we find the area to the left of z = -1.62 is 0.0526.
Thus P(Z ≤ -1.62) = 0.0526. Since the alternative hypothesis is two-sided, the P-value is the area in both tails, and therefore the P-value = 2 × 0.0526 = 0.1052.
* The chain restaurant data are not significantly different
from the national survey results
(p̂ = 0.68, z = -1.62, P-value = 0.11).
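The restaurant example can be checked with a short script. This is a sketch with illustrative variable names that uses the normal distribution from the standard library instead of Table A.

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.75        # national proportion under H0
n = 100          # employees sampled from the chain
yes_count = 68   # employees answering "Yes"

# Condition from the slides: expected successes and failures both at least 10
expected_ok = n * p0 >= 10 and n * (1 - p0) >= 10

p_hat = yes_count / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # standard deviation uses p0, not p_hat
p_value = 2 * NormalDist().cdf(-abs(z))      # two-sided alternative

print(f"z = {z:.2f}, P-value = {p_value:.3f}")
```

The computed P-value is about 0.106; the slide's 0.1052 comes from rounding z to −1.62 before the table lookup. Either way the result is not significant at the 5% level.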