AP STATISTICS LESSON 12

AP STATISTICS
LESSON 12 - 1
INFERENCE FOR A POPULATION
PROPORTION
ESSENTIAL QUESTION:
What are the procedures for
creating significance tests and
confidence intervals for
population proportion problems?
Objectives:
• To create confidence intervals for
population proportions.
• To perform significance tests for
population proportions.
Introduction
We often want to answer questions
about the proportion of some outcome
in a population, or to compare
proportions across several
populations.
Population Proportion Problems
Page 685
• Example 12.1 Risky Behavior in the Age of
AIDS (estimating a single population
proportion)
• Example 12.2 Does Preschool Make a
Difference? (comparing two population
proportions)
• Example 12.3 Extracurriculars and Grades
(comparing more than two population
proportions)
Inference for a Population
Proportion
We are interested in the unknown
proportion p of a population that has
some outcome.
For convenience, call the outcome we
are looking for a “success.”
Sample Proportion
p̂ = (count of successes in the sample) /
(count of observations in the sample)
Read the sample proportion p̂ as “p-hat.”
Conditions for Inference
• As always, inference is based on the
sampling distribution of a statistic.
• The mean of the sampling distribution of
p̂ is p. That is, the sample
proportion p̂ is an unbiased estimator of
the population proportion p. The standard
deviation of p̂ is √(p(1 – p)/n), provided that
the population is at least 10 times as large
as the sample. If the sample size is large
enough that both np and n(1 – p) are at
least 10, the distribution of p̂ is
approximately normal.
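The mean and standard deviation stated above can be checked numerically. The sketch below assumes the count of successes follows a binomial distribution (reasonable when the population is much larger than the sample); the values n = 50 and p = 0.3 and the function name are illustrative only, not from the lesson.

```python
import math

def phat_mean_and_sd(n, p):
    """Exact mean and standard deviation of p-hat = X/n for X ~ Binomial(n, p)."""
    # Binomial probabilities P(X = k) for k = 0..n
    probs = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum((k / n) * pr for k, pr in enumerate(probs))
    var = sum((k / n - mean) ** 2 * pr for k, pr in enumerate(probs))
    return mean, math.sqrt(var)

n, p = 50, 0.3  # illustrative values (np = 15 and n(1 - p) = 35)
mean, sd = phat_mean_and_sd(n, p)
formula_sd = math.sqrt(p * (1 - p) / n)
print(f"mean of p-hat = {mean:.6f}, sd = {sd:.6f}, formula sd = {formula_sd:.6f}")
```

The computed mean should agree with p and the computed standard deviation with √(p(1 – p)/n), up to floating-point rounding.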
z Statistic
z = (p̂ – p) / √(p(1 – p)/n)
The statistic z has approximately the
standard normal distribution N(0,1) if
the sample is not too small and the
sample is not a large part of the
population.
Working Without p
• To test the null hypothesis H0: p = p0 that the
unknown p has a specific value p0, just
replace p by p0 in the z statistic and in
checking the values of np and n(1 – p).
• In a confidence interval for p, we have no
specific value to substitute. In large
samples, p̂ will be close to p. So we replace
p by p̂ in determining the values of np and
n(1 – p). We also replace the standard
deviation by the standard error of p̂,
SE = √(p̂(1 – p̂)/n),
to get a confidence interval of the form
estimate ± z* SE.
Conditions for Inference About a
Proportion
• The data are an SRS from the population of
interest.
• The population is at least 10 times as large
as the sample.
• For a test of H0: p = p0, the sample size n is
so large that both np0 and
n(1 – p0) are 10 or more. For a confidence
interval, n is so large that both the count of
successes np̂ and the count of failures
n(1 – p̂) are 10 or more.
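The two numerical conditions are easy to check mechanically. The helpers below are a minimal sketch with hypothetical names; the SRS condition depends on how the data were collected and cannot be checked by code. For a test, pass the hypothesized p0; for a confidence interval, pass the sample proportion.

```python
def population_large_enough(population_size, n):
    """Population is at least 10 times as large as the sample."""
    return population_size >= 10 * n

def counts_large_enough(n, p, threshold=10):
    """Both np and n(1 - p) reach the threshold (10 in this lesson)."""
    return n * p >= threshold and n * (1 - p) >= threshold

# Illustrative calls; the population size here is made up.
print(population_large_enough(1_000_000, 2673))
print(counts_large_enough(4040, 0.5))  # a test with p0 = 0.5
print(counts_large_enough(50, 0.1))    # np = 5, so the condition fails
```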
Example 12.4 Page 688
Are the Conditions Met?
• The sampling design was in fact a
complex stratified sample, and the
survey used inference procedures for
that design. The overall effect is
close to an SRS, however.
• The number of adult heterosexuals
(the population) is much larger than
10 times the sample size, n = 2673
Inference for a Population
Proportion
• Draw an SRS of size n from a large
population with unknown proportion p of
successes. An approximate level C
confidence interval for p is
p̂ ± z* √(p̂(1 – p̂)/n)
where z* is the upper (1 – C)/2 standard
normal critical value. To test the
hypothesis H0: p = p0, compute the z
statistic z = (p̂ – p0)/√(p0(1 – p0)/n).
Inference for Population
Proportion (continued…)
In terms of a variable Z having the
standard normal distribution, the
approximate P-value for a test of H0
against
Ha: p > p0 is P(Z ≥ z)
Ha: p < p0 is P(Z ≤ z)
Ha: p ≠ p0 is 2P(Z ≥ |z|)
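The z statistic and the three P-value rules can be collected into one small function. This is a sketch, not a library routine: the function name and the `alternative` labels are assumptions, and the data in the usage lines (60 successes in 100 trials, p0 = 0.5) are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0, alternative="two-sided"):
    """Return (z, P-value) for H0: p = p0 using the normal approximation."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    std_normal = NormalDist()  # standard normal N(0, 1)
    if alternative == "greater":      # Ha: p > p0
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":       # Ha: p < p0
        p_value = std_normal.cdf(z)
    elif alternative == "two-sided":  # Ha: p != p0
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p_value

z, p_value = one_proportion_z_test(60, 100, 0.5)
print(f"z = {z:.2f}, two-sided P-value = {p_value:.4f}")
```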
Example 12.5 Page 690
Estimating Risky Behavior
The National AIDS Behavioral Surveys found that 170 of
a sample of 2673 adult heterosexuals had multiple
partners. That is, p̂ = 170/2673 = 0.0636.
A 99% confidence interval for the proportion p of all
adult heterosexuals with multiple partners uses the
standard normal critical value z* = 2.576 (use the bottom
row of Table C for standard normal critical values)
We are 99% confident that the percent of adult
heterosexuals who had more than one sexual partner in
the past year lies between about 5.1% and 7.6%
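The interval in this example can be reproduced from the counts. The sketch below uses a hypothetical helper name and takes z* from Python's standard library instead of Table C, so the critical value is 2.5758 instead of the rounded 2.576.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_ci(successes, n, confidence):
    """Approximate level-C interval: p-hat +/- z* sqrt(p-hat(1 - p-hat)/n)."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)  # standard error of p-hat
    return p_hat - z_star * se, p_hat + z_star * se

lower, upper = one_proportion_ci(170, 2673, 0.99)
print(f"99% CI: {lower:.4f} to {upper:.4f}")
```

Rounded to one decimal place as percents, the endpoints come out at about 5.1% and 7.6%.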
Example 12.6
Page 691
Binge Drinking in College
Binge drinking for men = 5 or more drinks (women = 4 or more
drinks) on at least one occasion within two weeks.
In a representative sample of 140 colleges and 17,592 students
(SRS), 7741 students identified themselves as binge drinkers.
Does this constitute strong evidence that more than 40% of all
college students engage in binge drinking?
Answer:
The P-value tells us that there is virtually no chance of obtaining a
sample proportion as far above 0.40 as p̂ = 0.44 if H0: p = 0.40
were true. We reject H0
and conclude that more than 40% of U.S. college students have
engaged in binge drinking.
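A short calculation shows how extreme the result is. This sketch tests H0: p = 0.40 against Ha: p > 0.40 using the counts given in the example. Because the P-value is far too small for 1 – Φ(z) to be computed accurately in floating point, it uses the complementary error function, P(Z ≥ z) = erfc(z/√2)/2; that choice is an implementation detail, not part of the lesson.

```python
from math import sqrt, erfc

binge, n, p0 = 7741, 17592, 0.40
p_hat = binge / n

# z statistic with p0 in the standard deviation, as in a test of H0: p = p0
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Upper-tail probability P(Z >= z), computed without cancellation
p_value = 0.5 * erfc(z / sqrt(2))

print(f"p-hat = {p_hat:.4f}, z = {z:.2f}, P-value = {p_value:.2e}")
```

The z statistic is close to 10.8, which is why the P-value is effectively zero.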
Example 12.7
Page 692
Is That Coin Fair?
A coin that is balanced should come up heads half the time in
the long run. The French naturalist Count Buffon tossed a
coin 4040 times and got 2048 heads (p̂ = 0.5069).
Is this evidence that Buffon’s coin was not balanced? (hint: use
the p-value for the two-sided test)
Answer:
We failed to find good evidence against H0: p
= 0.5. We cannot conclude that H0 is true,
that is, that the coin is perfectly balanced.
NOTE: The test of significance only shows
that the results of Buffon’s 4040 tosses can’t
distinguish this coin from one that is perfectly
balanced. To see what values of p are
consistent with sample results, use a
confidence interval.
Example 12.8 Page 693
Confidence Interval For p
We are 95% confident that the probability
of a head is between 0.4915 and
0.5223.
The confidence interval is more
informative than the test in Example
12.7.
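Both results for Buffon's coin can be reproduced side by side: the two-sided test of H0: p = 0.5 from Example 12.7 and the 95% confidence interval here. The sketch below is a direct calculation from the counts. Note that the test uses p0 in the standard deviation, while the interval uses p̂ in the standard error.

```python
from math import sqrt
from statistics import NormalDist

heads, n, p0 = 2048, 4040, 0.5
p_hat = heads / n
std_normal = NormalDist()

# Two-sided test of H0: p = 0.5
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 2 * (1 - std_normal.cdf(abs(z)))

# 95% confidence interval for p
z_star = std_normal.inv_cdf(0.975)
se = sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - z_star * se, p_hat + z_star * se

print(f"z = {z:.2f}, P-value = {p_value:.3f}")
print(f"95% CI: {lower:.4f} to {upper:.4f}")
```

The value 0.5 lies inside the interval, which matches the failure to reject H0, and the interval also shows how far from 0.5 the true probability could plausibly be.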
Choosing the Sample Size
In planning a study, we may want to
choose a sample size that will allow us to
estimate the parameter within a given
margin of error.
m = z* √(p̂(1 – p̂)/n)
Here z* is the standard normal critical
value for the level of confidence we want.
Because the margin of error involves the
sample proportion of successes p̂, we need
to guess this value when choosing n.
Call our guess p*. Here are two ways to
get p*.
Ways to Get p*
1. Use a guess p* based on a pilot study or
on past experience with similar studies. You
should do several calculations that cover the
range of p̂ values you might get.
2. Use p* = 0.5 as the guess. The margin of
error m is largest when
p̂ = 0.5, so this guess is conservative in the
sense that if we get any other p̂ when we do our
study, we will get a margin of error smaller
than planned.
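To see why 0.5 is the conservative guess, the margin of error can be tabulated for several values of p̂ at a fixed n. The sample size n = 1000 and the 95% confidence level below are illustrative assumptions, not values from the lesson.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(p, n, z_star):
    """m = z* sqrt(p(1 - p)/n)"""
    return z_star * sqrt(p * (1 - p) / n)

z_star = NormalDist().inv_cdf(0.975)  # 95% confidence (assumed)
n = 1000                              # illustrative sample size

margins = {p: margin_of_error(p, n, z_star) for p in (0.1, 0.3, 0.5, 0.7, 0.9)}
for p, m in margins.items():
    print(f"p-hat = {p:.1f}: m = {m:.4f}")
```

The margin is symmetric about 0.5 and peaks there, because p(1 – p) is largest at p = 0.5.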
Sample Size for
Desired Margin of Error
To determine the sample size n that will yield a
level C confidence interval for a population
proportion p with a specified margin of error m,
set the following expression for the margin of
error to be less than or equal to m, and solve
for n:
z* √(p*(1 – p*)/n) ≤ m
where p* is a guessed value for the sample
proportion. The margin of error will be less
than or equal to m if you take the guess p* to be
0.5.
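Solving the inequality for n makes the required sample size explicit. The steps below use only the symbols already defined (z*, p*, m, n) and the fact that all quantities are positive:

```latex
\[
z^{*}\sqrt{\frac{p^{*}(1-p^{*})}{n}} \le m
\;\Longleftrightarrow\;
\frac{p^{*}(1-p^{*})}{n} \le \left(\frac{m}{z^{*}}\right)^{2}
\;\Longleftrightarrow\;
n \ge \left(\frac{z^{*}}{m}\right)^{2} p^{*}(1-p^{*}).
\]
```

With p* = 0.5 this becomes n ≥ (z*/(2m))², and n is rounded up to the next whole number. Since p(1 – p) is never larger than 1/4, this choice keeps the margin of error at or below m whatever p̂ turns out to be.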
Choosing p*
The method for finding the guess p* does not
matter that much in most cases. The n you get
doesn’t change much when you change p* as
long as p* is not too far from 0.5. So use the
conservative guess p* = 0.5 if you expect the
true p̂ to be roughly between 0.3 and 0.7. If the
true p̂ is close to 0 or 1, using p* = 0.5 as your guess
will give a sample much larger than you need.
So try to use a better guess from a pilot study
when you suspect that p̂ will be less than 0.3 or
greater than 0.7.
Example 12.9 Page 696
Determining Sample Size
for Election Polling
Find the sample size for a 2.5% margin of
error (sample size n = 1537),
and
for a 2% margin of error (n = 2401).
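These sample sizes can be recomputed directly. The slide does not state the confidence level or the guess p*; the sketch below assumes 95% confidence and the conservative p* = 0.5, which reproduces n = 1537 for the 2.5% margin. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size(m, confidence=0.95, p_star=0.5):
    """Smallest n with z* sqrt(p*(1 - p*)/n) <= m."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z_star / m) ** 2 * p_star * (1 - p_star))

n_25 = sample_size(0.025)  # 2.5% margin of error
n_20 = sample_size(0.02)   # 2% margin of error
print(n_25, n_20)
```

Under these assumptions the 2% margin of error requires n = 2401. Tightening the margin from 2.5% to 2% raises the sample size by more than half, because n grows with 1/m².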