Chapter 9: Estimation Using a Single Sample

Confidence Intervals
Inferential Statistics
• Our study of confidence intervals begins our study
of inferential statistics
• In inferential statistics, our objective is to learn
about a population from a sample of data
• Use a sample of data to decrease our uncertainty
about the population the sample was drawn from
• More specifically, we’ll be using samples of data
to estimate unknown population parameters like $\mu$
and $\pi$.

Point Estimates
• A single number derived from a sample of
data (statistic) that represents a plausible
value for a population parameter
• First, we decide what is the appropriate
statistic. We then collect a random sample
of data. The computed statistic is our point
estimate, for example $\bar{X}$ as a point estimate for $\mu$
More than one choice
• Interested in the proportion of American voters
who support gay marriages
• Here the appropriate statistic is the sample
proportion: p as an estimate for $\pi$
• Sometimes there’s more than one choice
• Sample mean, trimmed mean or median as a point
estimate for population mean
• How do you choose? Choose the statistic that
tends, on average, to be the closest estimate to the
true value.
Biased and Unbiased Statistic
• When there’s more than one choice we want to
choose the statistic that is most accurate
• Sampling distributions of statistics give us
information about how accurate a statistic is for
estimating a population parameter
• Statistics with sampling distributions that are
centered on the parameter we’re trying to estimate
are called unbiased
• The two unbiased statistics we’ll be studying are
sample mean and sample proportion
Accuracy of Point Estimates
• Even though we might select an unbiased statistic,
how accurate is this single number that we
calculate?
• Remember sampling variability?
• Example – samples of 50 from a normal
distribution
• Using an unbiased statistic guarantees no
systematic tendency to underestimate or
overestimate the parameter; if its standard
deviation is also small, the estimates will tend to
be relatively close to the true value
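The point about sampling variability can be illustrated with a small simulation. This is only a sketch under assumed values: the population mean of 100, standard deviation of 15, and the function name `simulate_estimators` are made up for illustration and do not come from the slides. It draws many samples of 50 from a normal population and compares two candidate statistics, the sample mean and the sample median.

```python
import random
import statistics

def simulate_estimators(mu=100.0, sigma=15.0, n=50, reps=2000, seed=1):
    """Draw `reps` random samples of size n from a normal population
    (mu and sigma are assumed values) and return the sample means and
    sample medians computed from those samples."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.mean(sample))
        medians.append(statistics.median(sample))
    return means, medians

means, medians = simulate_estimators()

# Center of each sampling distribution: both should sit near mu = 100.
print("average of sample means:  ", round(statistics.mean(means), 2))
print("average of sample medians:", round(statistics.mean(medians), 2))

# Spread of each sampling distribution: the smaller one is the more
# accurate estimator for this population.
print("sd of sample means:  ", round(statistics.stdev(means), 2))
print("sd of sample medians:", round(statistics.stdev(medians), 2))
```

For a normal population both statistics center on the true mean, but the sample mean varies less from sample to sample, which is why it is the usual choice here.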
Confidence Intervals
• How accurate a point estimate is depends on
which sample you happen to draw from the
population
• While the point estimate using an unbiased
statistic may be our best single-number guess,
it’s not the only plausible estimate
• An alternative to a single-number estimate is to
provide a range of values, or an interval, that we
feel very confident the true value will fall into
• We call this type of estimate a confidence
interval
Definition of Confidence Interval
• An interval of plausible values for the
characteristic. It is constructed so that, with
a chosen degree of confidence, the value of
the characteristic – parameter – will be
captured in the interval
Confidence Interval
• Confidence Interval = Statistic ± (Critical Value × Statistic Std Dev)
• Statistic
• Standard Deviation of Sampling Distribution
• Critical Value
• Associated confidence level
– How much confidence we have in the method used to
construct the CI
– Not our confidence in any particular interval
Basic Concept of CI
• We start with the sampling distribution of
the statistic we are using
• We will be using sampling distributions that
are well approximated by a normal
distribution
• We take a sample and calculate a point
estimate, a statistic (unbiased) from that
sample
Continuing …
• With what we know about normal distributions,
we know that about 95% of the statistics
calculated from random samples will fall within 2
sd of the mean.
• The mean of the sampling distribution is centered
on the population parameter
• If the statistic is within approx 2 sd of the
sampling distribution’s mean 95% of the time,
then the interval Statistic ± (Critical Value × Statistic Std Dev) will
capture the mean of the sampling distribution 95%
of the time
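The "95% of the time" claim refers to the method, not to any one interval, and a simulation can make that concrete. The sketch below assumes a normal population with a made-up mean of 50 and a known standard deviation of 10; the function name `coverage` is hypothetical. It builds the interval statistic ± 1.96 × (sd of the statistic) from many random samples and counts how often the interval captures the true mean.

```python
import math
import random
import statistics

def coverage(mu=50.0, sigma=10.0, n=40, reps=2000, z_star=1.96, seed=7):
    """Fraction of simulated intervals xbar +/- z* * sigma/sqrt(n)
    that capture the true mean mu (all population values are assumed)."""
    rng = random.Random(seed)
    sd_of_statistic = sigma / math.sqrt(n)   # sd of the sampling distribution
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.mean(sample)
        low = xbar - z_star * sd_of_statistic
        high = xbar + z_star * sd_of_statistic
        if low <= mu <= high:
            hits += 1
    return hits / reps

print("proportion of intervals capturing mu:", coverage())
```

The printed proportion should land close to 0.95. Each individual interval either contains the mean or it does not; the 95% describes the long-run success rate of the procedure.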
More …
• The width of the interval is adjusted by selecting a
different confidence level
• Typical confidence levels are 90%, 95% and 99%
• The endpoints are the statistic plus and minus
the critical value (which is determined by the
confidence level) multiplied by the sampling
distribution standard deviation (sd of the statistic)
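The link between confidence level and critical value can be sketched with Python's standard library in place of a printed normal table. The helper name `z_critical` is an assumption for illustration; `statistics.NormalDist().inv_cdf` is the standard-library inverse of the normal cumulative distribution.

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """Return z* such that the central `confidence_level` area of the
    standard normal curve lies between -z* and +z*."""
    tail_area = (1 - confidence_level) / 2
    return NormalDist().inv_cdf(1 - tail_area)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z* = {z_critical(level):.3f}")
```

This prints z* values of 1.645, 1.960 and 2.576. A higher confidence level gives a larger critical value and therefore a wider interval.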
Large Sample Confidence Interval for a
Population Proportion
• Parameter of interest is the population
proportion $\pi$
• Statistic used is the sample proportion p
• Why a "large sample" CI? From the last
chapter, when the sample is large, the sampling
distribution of p is approximately normal
• How large is large? $n\pi \ge 10$ and $n(1-\pi) \ge 10$
• We know $\mu_p = \pi$ and $\sigma_p = \sqrt{\dfrac{\pi(1-\pi)}{n}}$
Large Sample Confidence Interval for a
Population Proportion
• Calculate a sample proportion from a random
sample
– $p = \dfrac{\text{number in sample that have the characteristic}}{n}$
• Estimate the standard deviation of the sample proportion
– $\sqrt{\dfrac{p(1-p)}{n}}$ (the standard error)
• Choose a confidence level – let’s say 95%
• Determine the critical value
– Use standard normal table – 1.96
• Calculate your confidence interval
– Confidence Interval = Statistic ± (Critical Value × Statistic Std Dev)
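The steps above can be sketched in a few lines of Python. The function name `proportion_ci` and the survey counts (520 of 1000) are made up for illustration and are not taken from the textbook problem. The large-sample check uses the sample proportion p in place of the unknown population proportion.

```python
import math

def proportion_ci(successes, n, z_star=1.96):
    """Large-sample confidence interval for a population proportion:
    p +/- z* * sqrt(p(1 - p)/n). The default z* = 1.96 gives 95%."""
    p = successes / n                        # sample proportion
    if n * p < 10 or n * (1 - p) < 10:       # "how large is large?" check
        raise ValueError("sample too small for the large-sample interval")
    std_error = math.sqrt(p * (1 - p) / n)   # estimated sd of p
    bound = z_star * std_error
    return p - bound, p + bound

# Hypothetical data: 520 of 1000 sampled people have the characteristic.
low, high = proportion_ci(successes=520, n=1000)
print(f"95% CI for the population proportion: ({low:.3f}, {high:.3f})")
```

With these assumed counts the interval is roughly (0.489, 0.551).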
Let’s do an example
• Pg 453 Problem # 9.14
In summary
• The Large Sample Confidence Interval for $\pi$
– p is the sample proportion from a random sample
– The sample size, n, is large: $np \ge 10$ and $n(1-p) \ge 10$
– The CI is $p \pm z^* \sqrt{\dfrac{p(1-p)}{n}}$
– The desired confidence level determines which critical
value is used
– Note: This method is not appropriate for small samples
Choosing the Sample Size
• Terminology: Bound
• Confidence Interval = Statistic ± (Critical Value × Statistic Std Dev)
• Consider the statistic an estimate of the parameter
• Consider ‘critical value x standard deviation’ the
bound on the error of your estimate
• In the case of population proportions the interval is
$p \pm z^* \sqrt{\dfrac{p(1-p)}{n}}$, so the bound is
$B = z^* \sqrt{\dfrac{p(1-p)}{n}}$
Finding appropriate sample size
• Consider that before you do a study, you may be
asked to estimate a particular parameter to a
certain degree of accuracy
• The question now is: how big a sample should I
take to get a specific degree of accuracy at a
certain confidence level?
• We use the ‘bound’ to determine sample size
$n = \pi(1-\pi)\left(\dfrac{z^*}{B}\right)^2$
• But the population parameter is unknown so we
make a reasonable estimate – or use .5 as a
conservative estimate for $\pi$
• Example – pg 454, 9.25
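The sample-size calculation $n = \pi(1-\pi)(z^*/B)^2$ can be sketched as below. The function name `sample_size_for_proportion` and the bounds of 0.03 and 0.05 are assumptions for illustration, not the numbers from the textbook exercise. The result is rounded up because a fractional observation is not possible and rounding down would miss the target bound.

```python
import math

def sample_size_for_proportion(bound, z_star=1.96, pi_guess=0.5):
    """Smallest n for which z* * sqrt(pi(1 - pi)/n) <= bound.
    pi_guess = 0.5 is the conservative choice: it maximizes pi(1 - pi)."""
    n = pi_guess * (1 - pi_guess) * (z_star / bound) ** 2
    return math.ceil(n)

# Hypothetical targets at 95% confidence.
print(sample_size_for_proportion(bound=0.03))                # 1068
print(sample_size_for_proportion(bound=0.05))                # 385
# A prior guess away from 0.5 lowers the required sample size.
print(sample_size_for_proportion(bound=0.03, pi_guess=0.2))
```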
Confidence Interval for Population Mean
• We’ll look at these cases:
– Population standard deviation is known
• $n \ge 30$
• Small sample but population is approx normal
– Population standard deviation is unknown
• $n \ge 30$
• Small sample but population is approx normal
Sampling Distribution of the Sample Mean
• $\mu_{\bar{X}} = \mu$
• $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$
• When the population is normal, the sampling
distribution is normal regardless of sample size
• When the population is not normal, the sampling
distribution is approximately normal if the sample
size is large (CLT).
Confidence Interval for Population Mean
$\sigma$ Known
• $\bar{X}$ is the sample mean from a random sample
• Sample size is large or population is
approximately normal
• Population standard deviation is known
• CI is:
$\bar{X} \pm z^* \left(\dfrac{\sigma}{\sqrt{n}}\right)$
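A minimal sketch of the known-standard-deviation interval, sample mean ± z* × σ/√n, follows. The function name `mean_ci_sigma_known` and the numbers (sample mean 50, σ = 10, n = 25) are hypothetical. The critical value comes from the standard library's `NormalDist().inv_cdf` rather than a printed table.

```python
import math
from statistics import NormalDist

def mean_ci_sigma_known(xbar, sigma, n, confidence_level=0.95):
    """Confidence interval for a population mean when the population
    standard deviation sigma is known: xbar +/- z* * sigma/sqrt(n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    bound = z_star * sigma / math.sqrt(n)
    return xbar - bound, xbar + bound

# Hypothetical sample: mean 50 from n = 25 observations, sigma known to be 10.
low, high = mean_ci_sigma_known(xbar=50.0, sigma=10.0, n=25)
print(f"95% CI for the population mean: ({low:.2f}, {high:.2f})")
```

With n = 25 the interval is only appropriate if the population is approximately normal, as the conditions on the slide require.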
Sampling Distribution of the Sample Mean
$\sigma$ Unknown

• $\mu_{\bar{X}} = \mu$
• $\sigma_{\bar{X}}$ is estimated by $\dfrac{s}{\sqrt{n}}$
• When the population is normal, the sampling
distribution is normal regardless of sample size
• When the population is not normal, the sampling
distribution is approximately normal if the sample
size is large (CLT)
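Confidence Interval for Population Mean
$\sigma$ Unknown

• $\bar{X}$ is the sample mean from a random sample
• Sample size is large or population is
approximately normal
• Population standard deviation is unknown, so the
sample standard deviation s is used in its place
• CI is:
$\bar{X} \pm t^* \dfrac{s}{\sqrt{n}}$, where $t^*$ is the t critical value based on $n - 1$ df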
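The unknown-standard-deviation interval, sample mean ± t* × s/√n, can be sketched as follows. Python's standard library has no t-distribution, so this sketch assumes the critical value is looked up in a t table and passed in. The function name `mean_ci_sigma_unknown` and the ten data values are made up for illustration; 2.262 is the table value for 95% confidence with 9 df.

```python
import math
import statistics

def mean_ci_sigma_unknown(sample, t_star):
    """Confidence interval for a population mean using the sample
    standard deviation: xbar +/- t* * s/sqrt(n).
    t_star must be the table value for n - 1 degrees of freedom."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)        # sample sd (divides by n - 1)
    bound = t_star * s / math.sqrt(n)
    return xbar - bound, xbar + bound

# Hypothetical measurements, n = 10, so df = 9 and t* = 2.262 at 95%.
sample = [4.0, 8.0, 6.0, 5.0, 3.0, 7.0, 9.0, 5.0, 6.0, 7.0]
low, high = mean_ci_sigma_unknown(sample, t_star=2.262)
print(f"95% CI for the population mean: ({low:.3f}, {high:.3f})")
```

For this made-up sample the mean is 6 and the interval is roughly (4.694, 7.306).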
Student’s t-Distribution
• Recall that a standard normal distribution is a
bell-shaped distribution with parameters $\mu = 0$ and $\sigma = 1$
• The t-distribution is bell-shaped and centered on 0.
• There are many t-distributions differentiated by
the degrees of freedom, which here is n − 1
• Each t-curve is a little more spread out than the
z-curve, but as n gets larger and larger, the
t-distribution approaches the z-curve.

Student’s t-Distribution
• Recall from our study of sampling distributions the
properties of the sampling distribution of $\bar{X}$
• When the population standard deviation is not
known and s is used in its place, the standardized
statistic $t = \dfrac{\bar{X} - \mu}{s/\sqrt{n}}$ is distributed
according to the t-distribution with n − 1 df
• This distribution will give us critical values a little
higher than a normal distribution, since we don’t
know the value of the population standard deviation
and estimating it introduces a little more uncertainty
t-Distribution Table
• Appendix III in the back of your textbook
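As a small illustration of how the table is used, the sketch below hard-codes a few two-sided 95% critical values as they appear in a standard t table and compares them with the z critical value. The selection of rows and the names `T_STAR_95` and `width_ratio` are assumptions for illustration; your textbook's table may list different degrees of freedom.

```python
# Two-sided 95% critical values copied from a standard t table
# (keys are degrees of freedom, n - 1).
T_STAR_95 = {5: 2.571, 10: 2.228, 30: 2.042, 120: 1.980}
Z_STAR_95 = 1.960

def width_ratio(df):
    """How much wider a t-based interval is than a z-based one
    with the same standard error, for the tabulated df."""
    return T_STAR_95[df] / Z_STAR_95

for df in sorted(T_STAR_95):
    print(f"df = {df:>3}: t* = {T_STAR_95[df]:.3f}, "
          f"interval is {width_ratio(df):.2f} times as wide as with z*")
```

The ratio shrinks toward 1 as the degrees of freedom grow, matching the statement that the t-distribution approaches the z-curve for large n.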
Choosing the Sample Size
• When estimating the population mean using a
large sample or a small sample from a normal
population, the bound on error of estimation
associated with a 95% CL is
$B = 1.96\left(\dfrac{\sigma}{\sqrt{n}}\right)$
• Since the population standard deviation is usually
unknown we can
– Make a best guess
– Divide the range by 4 and use that as a rough
estimate of $\sigma$
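Solving $B = 1.96\,\sigma/\sqrt{n}$ for n gives $n = (1.96\,\sigma/B)^2$, which the sketch below evaluates. The function name `sample_size_for_mean` and the numbers (a range of 100 and a bound of 5) are hypothetical. The range/4 rule supplies the guess for the standard deviation, and the result is rounded up to a whole observation.

```python
import math

def sample_size_for_mean(bound, sigma_guess, z_star=1.96):
    """Smallest n for which z* * sigma/sqrt(n) <= bound, obtained by
    solving B = z* * sigma/sqrt(n) for n and rounding up."""
    return math.ceil((z_star * sigma_guess / bound) ** 2)

# Hypothetical planning values: observations expected to span about
# 100 units, so sigma is guessed as range / 4 = 25.
sigma_guess = 100 / 4
n = sample_size_for_mean(bound=5, sigma_guess=sigma_guess)
print("required sample size:", n)   # 97
```

Halving the bound roughly quadruples the required sample size, because n depends on the square of σ/B.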
Degrees of Freedom
• The number of independent pieces of
information that go into the estimate of the
parameter
• The number of values in the calculation of a
statistic that are free to vary
• The number of independent pieces of
info that go into an estimate minus the
number of parameters estimated