Interval estimates - University of Regina

Interval estimation
ASW, Chapter 8
Economics 224, Notes for October 8, 2008
Central limit theorem – CLT (ASW, 271)
The sampling distribution of the sample mean, x̄, is
approximated by a normal distribution when the sample is a
simple random sample and the sample size, n, is large.
In this case, the mean of the sampling distribution is the
population mean, μ, and the standard deviation of the
sampling distribution is the population standard deviation, σ,
divided by the square root of the sample size. The latter is
referred to as the standard error of the mean.
In symbols, the standard error is \(\sigma_{\bar{x}} = \sigma/\sqrt{n}\).
A sample size of 100 or more elements is generally considered
sufficient to permit using the CLT. If the population from
which the sample is drawn is symmetrically distributed, n > 30
may be sufficient to use the CLT.
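The CLT statement above can be checked with a small simulation. The sketch below is illustrative only: the exponential population (mean 1, standard deviation 1), the sample size, the number of repetitions, and the seed are all assumptions chosen for the demonstration, not values from ASW.

```python
import random
import statistics

def simulate_sample_means(n, repetitions, seed=224):
    """Draw `repetitions` simple random samples of size n from a skewed
    (exponential, mean 1, sd 1) population and return their sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(repetitions)
    ]

n = 100
means = simulate_sample_means(n, repetitions=2000)
mean_of_means = statistics.fmean(means)   # should be close to mu = 1
sd_of_means = statistics.stdev(means)     # should be close to sigma / sqrt(n)
theoretical_se = 1.0 / n ** 0.5           # sigma / sqrt(n) = 0.1

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"sd of sample means:   {sd_of_means:.3f} (theory: {theoretical_se:.3f})")
```

Even though the assumed population is strongly skewed, the simulated sample means should centre on the population mean, with a spread close to the standard error.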
Large random sample from any population

                     Any population   Sampling distribution of x̄ when sample is random
No. of elements      N                n
Mean                 μ                μ
Standard deviation   σ                \(\sigma_{\bar{x}} = \sigma/\sqrt{n}\)

A sample size n of greater than 100 is
generally considered sufficiently large to use
these results from the CLT.
Probability that a sample mean is within a specified
distance of the population mean
Population: μ = 2352 and σ = 1485
Random sample of n = 50
x̄ has mean μ = 2352
x̄ has standard deviation of \(\sigma/\sqrt{n} = 1485/\sqrt{50} \approx 210\)
What is the probability that a particular random sample
of size n = 50 has a mean that is within $100 of the
population mean? See next slide.
• Within $100 of the mean is from 2352 - 100 = 2252 to 2352 +
100 = 2452.
• The sampling distribution of the sample mean is approximately normal since
the sample size n = 50 is large, with mean µ = 2352 and standard error \(\sigma_{\bar{x}}\) = 210.
• The required probability is the area under the normal curve
between 2252 and 2452.
• Obtain the corresponding Z-values.
\(Z = \frac{2452 - 2352}{210} = 0.48\) and \(Z = \frac{2252 - 2352}{210} = -0.48\)
For Z = -0.48, cumulative probability is 0.3156
For Z = 0.48, cumulative probability is 0.6844
Required probability is 0.6844 – 0.3156 = 0.3688. The probability
that a sample yields a sample mean within $100 of the
population mean is 0.37.
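The table lookup above can be reproduced with Python's standard library. This is a minimal sketch: the variable names are illustrative, and `NormalDist` simply stands in for the printed Z table. It also shows the slightly different answer obtained when Z is not rounded to two decimals.

```python
from statistics import NormalDist

mu, sigma, n, distance = 2352, 1485, 50, 100

standard_error = sigma / n ** 0.5        # about 210
z_exact = distance / standard_error      # about 0.476
z_table = round(distance / 210, 2)       # 0.48, as read from a Z table

std_normal = NormalDist()                # mean 0, standard deviation 1
p_table = std_normal.cdf(z_table) - std_normal.cdf(-z_table)
p_exact = std_normal.cdf(z_exact) - std_normal.cdf(-z_exact)

print(f"standard error          = {standard_error:.1f}")
print(f"table-style probability = {p_table:.4f}")
print(f"unrounded probability   = {p_exact:.4f}")
```

Both versions round to 0.37, so the two-decimal Z value used with the table does not change the conclusion.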
Standard error for the sample mean
• The standard deviation of the sampling distribution of the
sample mean is also referred to as the standard error. As n
increases, the standard error decreases, so the sample means
are less variable. As n increases, the sample means tend to be
closer to the population mean. That is, for a larger n, there is
an increased probability that the sample mean lies within any
specified distance from the population mean. See the next slide for a
diagram.
\(\sigma_{\bar{x}} = \sigma/\sqrt{n}\)
• In the last example, if n = 200, the standard error is 1485
divided by the square root of 200, or 105. With this larger
sample size, the probability that a sample mean is within $100
of the population mean is the area under a normal curve
between z = -0.95 and z = 0.95, or 0.6578.
Example of the effect of changing sample size

n       \(\sigma_{\bar{x}} = \sigma/\sqrt{n}\)    Value of standard error    Probability of sample mean
                                 when σ = 1485              being within $100 of μ
50      σ/7.071                  210                        0.37
200     σ/14.142                 105                        0.66
500     σ/22.361                 66.4                       0.87
1000    σ/31.623                 47.0                       0.97

From these calculations, note how the larger sample size
produces sampling distributions where the sample mean is
generally closer to the population mean μ. The last column
shows how there is increased probability that the sample mean
is within $100 of the population mean µ as n becomes larger.
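The table can be regenerated for any set of sample sizes. The helper below is a sketch under the same normal approximation; the function name `prob_within` is an illustrative choice, not something taken from ASW.

```python
from statistics import NormalDist

def prob_within(sigma, n, distance):
    """Return (standard error, P(|x-bar - mu| <= distance)) under the
    normal approximation from the CLT."""
    se = sigma / n ** 0.5
    z = distance / se
    prob = NormalDist().cdf(z) - NormalDist().cdf(-z)
    return se, prob

sigma, distance = 1485, 100
rows = [(n, *prob_within(sigma, n, distance)) for n in (50, 200, 500, 1000)]

for n, se, prob in rows:
    print(f"n = {n:4d}   standard error = {se:6.1f}   probability = {prob:.2f}")
```

Because the standard error shrinks with the square root of n, the sample size has to be quadrupled (50 to 200) to halve it (210 to 105).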
Constructing interval estimates of a parameter
• The general form for the interval estimate of a population
parameter is
Point estimate of parameter ± Margin of error
• The margin of error is an amount that is added to and
subtracted from the point estimate to produce
an interval estimate of the parameter.
• The size of the margin of error depends on
– The type of sampling distribution for the sample statistic.
– The percentage of the area under the sampling
distribution that a researcher decides to include – usually
90%, 95%, or 99%. This is termed a confidence level.
• Each interval estimate is an interval constructed around the
point estimate, along with a confidence level.
Examples of interval estimates
• Statistics Canada reports that mean weekly food expenditures
for Prairie households in 2001 were $127.78. But the data
were obtained from a sample so there is sampling error
associated with this estimate. The “true” value of the mean is
between $123.78 and $131.78, 68% of the time and between
$119.78 and $135.78, 95% of the time.
Source: Statistics Canada, Food Expenditure in Canada 2001, catalogue no. 62-554-XIE, pp. 16, 70, 81.
• “The margin of error is estimated to be plus or minus 3.51 per
cent, 19 times out of 20.” From the Palliser electoral district
poll reporting Conservative at 43.3%, NDP at 35.7%, Liberal at
17.3%, and Green at 3.5% of decided voters, conducted by
Sigma Analytics.
Source: Leader-Post, Regina, October 3, 2008, pp. A1-A2.
Statistics Canada uses the
following method
For example, if the estimate of an average
expenditure for a given category is $75 and
the corresponding CV is 5%, then the “true”
value is between $71.25 and $78.75, 68% of
the time and between $67.50 and $82.50,
95% of the time. (p. 70 of 62-554-XIE).
The intervals for mean food expenditure on
the last slide were constructed using this method.
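The quoted Statistics Canada example can be reproduced as follows. This is a sketch of one reading of the quotation: the standard error is taken to be the estimate times its coefficient of variation (CV), with one standard error used for the 68% interval and two for the 95% interval. That reading is inferred from the quoted numbers ($75 ± $3.75 and $75 ± $7.50); the function name is illustrative.

```python
def cv_intervals(estimate, cv):
    """Return the 68% and 95% intervals implied by an estimate and its
    coefficient of variation (cv given as a proportion, e.g. 0.05)."""
    standard_error = estimate * cv
    interval_68 = (estimate - standard_error, estimate + standard_error)
    interval_95 = (estimate - 2 * standard_error, estimate + 2 * standard_error)
    return interval_68, interval_95

interval_68, interval_95 = cv_intervals(75, 0.05)
print(f"68%: {interval_68[0]:.2f} to {interval_68[1]:.2f}")
print(f"95%: {interval_95[0]:.2f} to {interval_95[1]:.2f}")
```

On the same reading, the food expenditure intervals ($127.78 ± $4.00 and ± $8.00) correspond to a standard error of $4.00.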
Modified FIGURE 8.1
SAMPLING DISTRIBUTION OF THE SAMPLE MEAN AMOUNT SPENT FROM
SIMPLE RANDOM SAMPLES OF 100 CUSTOMERS
A sampling distribution of the sample mean for a simple
random sample of 100 individuals from a population with a
standard deviation of 20. The mean of the sampling
distribution of x̄ is the population mean μ and its standard
deviation, or standard error, is 2. This distribution can be
used to construct an interval estimate of μ.
Constructing an interval estimate for a
population mean μ
• Obtain the point estimate of μ, that is, the sample mean x̄.
• Determine the distribution of the sample mean. If n is large,
then the Central Limit Theorem can be used and x̄ is
approximately normally distributed with mean μ and standard deviation
\(\sigma_{\bar{x}} = \sigma/\sqrt{n}\)
• Select a confidence level. The most common level is 95%.
• Obtain the margin of error associated with the confidence
level. For a normal distribution, the interval from Z = -1.96 to
Z = 1.96 contains 95% of the area under the curve or of the
sample means. See next slide to illustrate this.
• The 95% interval estimate is
\(\bar{x} - 1.96\,\sigma/\sqrt{n}\) to \(\bar{x} + 1.96\,\sigma/\sqrt{n}\)
Modified FIGURE 8.2. SAMPLING DISTRIBUTION OF x̄ SHOWING THE
LOCATION OF SAMPLE MEANS THAT ARE WITHIN 3.92 (1.96 STANDARD ERRORS) OF μ
In this example, the standard error is 2 and the margin
of error is 2 x 1.96 = 3.92. For the general case, 1.96
is multiplied by the standard error to determine the
margin of error.
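The general recipe, point estimate plus or minus Z times the standard error, can be sketched as a small function. The population values (σ = 20, n = 100) are those of the figure; the sample mean of 82 is a hypothetical value supplied only so that the interval can be printed.

```python
def interval_estimate(sample_mean, standard_error, z=1.96):
    """Return (margin of error, lower limit, upper limit) for
    sample_mean +/- z * standard_error."""
    margin = z * standard_error
    return margin, sample_mean - margin, sample_mean + margin

sigma, n = 20, 100
standard_error = sigma / n ** 0.5   # 20 / 10 = 2
x_bar = 82.0                        # hypothetical sample mean, for illustration only

margin, lower, upper = interval_estimate(x_bar, standard_error)
print(f"margin of error = {margin:.2f}")
print(f"95% interval estimate: {lower:.2f} to {upper:.2f}")
```

Keeping `z` as a parameter makes it easy to switch to another confidence level without changing the rest of the calculation.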
Example of interval estimates - I
Statistics of total income, Saskatchewan females
employed full-time and full-year, by age, 2003
(income in thousands of dollars)

Age group   Mean   Standard deviation   Sample size
25-34       33.3   13.5                 55
35-44       40.3   20.7                 57
45-54       45.1   25.9                 37
55-64       40.1   25.9                 31

Source: Data for this question adapted from Statistics Canada. General Social
Survey of Canada, 2003. Cycle 17: Social Engagement [machine readable data
file]. 1st Edition. Ottawa, ON: Statistics Canada [publisher and distributor]
10/1/2004. Obtained through University of Regina Data Library Services.
Example of interval estimates - II
• Obtain 95% interval estimates for the mean income
of all full-time, full-year employed females in
Saskatchewan in these age groups.
• Describe the pattern of mean income by age.
Analysis: The pattern in the samples is clear –
increased mean income from ages 25-34 to 45-54,
then a decline for ages 55-64. However, the data
from each of the four age groups is a sample, so
interval estimates are necessary to comment on
whether this pattern appears to hold for all females.
Example of interval estimates - III
Obtain an interval estimate for the mean income of all females
aged 25-34. Call this μ.
• The point estimate of μ is the sample mean, x̄ = 33.3
• Since n = 55 is reasonably large, the Central Limit Theorem
will be used. Thus, x̄ is approximately normally distributed with mean μ
and standard deviation \(\sigma_{\bar{x}} = \sigma/\sqrt{n}\)
• Select the 95% confidence level, as requested.
• In a normal distribution, Z = -1.96 to Z = 1.96 has 95% of the
area under the curve or of the sample means.
• The 95% interval estimate is
\(\bar{x} - 1.96\,\sigma/\sqrt{n}\) to \(\bar{x} + 1.96\,\sigma/\sqrt{n}\)
• In this example, s is used as an estimate of σ.
• The interval is \(33.3 - 1.96(13.5/\sqrt{55})\) to \(33.3 + 1.96(13.5/\sqrt{55})\)
• The margin of error is ±3.6 and the 95% interval estimate of μ
is (29.7, 36.9) thousand dollars.
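The arithmetic behind this margin of error, written out as a worked calculation (intermediate values are rounded, and s = 13.5 stands in for σ as stated above):

```latex
\[
\sigma_{\bar{x}} \approx \frac{s}{\sqrt{n}} = \frac{13.5}{\sqrt{55}} \approx \frac{13.5}{7.416} \approx 1.82
\]
\[
\text{margin of error} = 1.96 \times 1.82 \approx 3.57 \approx 3.6
\]
\[
33.3 - 3.6 = 29.7, \qquad 33.3 + 3.6 = 36.9
\]
```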
Example of interval estimates - IV
Income in thousands of dollars

Age group   Mean   Standard deviation   Sample size   Margin of error   95% interval estimate
25-34       33.3   13.5                 55            ±3.6              (29.7, 36.9)
35-44       40.3   20.7                 57            ±5.4              (34.9, 45.7)
45-54       45.1   25.9                 37            ±8.2              (36.9, 53.3)
55-64       40.1   25.9                 31            ±9.1              (31.0, 49.2)

• Explain why the margins of error differ as they do.
• Explain the pattern of mean income by age for all females of
each age group, now that interval estimates are available.
Example of interval estimates - V
• The margin of error is greater when s is larger or n is smaller.
All these interval estimates have the same Z = ±1.96
associated with the 95% confidence level. A larger confidence
level produces a larger Z, a larger margin of error, and a wider
interval.
• The intervals for each of the groups between ages 35 and 64
overlap a lot, meaning that there may not be differences in
the mean income for all females of these ages. The intervals
for the 45-54 and 25-34 age groups do not overlap, so it is
fairly certain that the mean income of all females aged 25-34 is
lower than the mean income of all those aged 45-54.
• Note that the target or sampled populations in this example are
not really all Saskatchewan females of each age group, but
only those employed full-time and full-year.
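How the margin of error responds to s and n can be seen by recomputing it for each age group. This is a sketch using the rounded means, standard deviations, and sample sizes from the table, with Z fixed at 1.96. Because those inputs are rounded, a recomputed margin can differ from the slide in the last digit (the 45-54 group comes out near 8.3 here).

```python
Z_95 = 1.96  # Z value for the 95% confidence level

# (age group, mean, standard deviation s, sample size n), income in $000s
groups = [
    ("25-34", 33.3, 13.5, 55),
    ("35-44", 40.3, 20.7, 57),
    ("45-54", 45.1, 25.9, 37),
    ("55-64", 40.1, 25.9, 31),
]

def margin_of_error(s, n, z=Z_95):
    """Margin of error z * s / sqrt(n), using s as an estimate of sigma."""
    return z * s / n ** 0.5

margins = {age: margin_of_error(s, n) for age, _, s, n in groups}
for age, mean, s, n in groups:
    print(f"{age}: s = {s}, n = {n}, margin of error = {margins[age]:.2f}")
```

The two oldest groups share the same s, so the larger margin for ages 55-64 comes entirely from the smaller sample (31 rather than 37).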
Interpretation of interval estimates
• The interval estimate is an interval of values constructed around the sample
mean x̄. We hope that this interval contains the population
mean μ.
• With repeated random sampling, if a 95% confidence level is
selected, the probability is 0.95 that an interval constructed this way contains the
population mean μ. A particular interval may or may not
contain μ, but the method employed here means that 95% of
the intervals constructed will contain the population
mean μ. (For example, 95% confidence intervals for the two
poor samples, samples 65 and 171, in the 192 sample
simulation do not contain the population mean.) See the
following slide for an illustration of this.
• When reporting a confidence interval, make sure you report
both the interval and the confidence level. One without the
other is meaningless.
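The repeated-sampling interpretation can be illustrated by simulation. The sketch below is not the 192-sample simulation referred to above. It assumes a normal population that borrows μ = 2352 and σ = 1485 from the earlier example, and the sample size, number of repetitions, and seed are arbitrary choices.

```python
import random
import statistics

def coverage(mu, sigma, n, repetitions, z=1.96, seed=2008):
    """Fraction of simulated 95% intervals x-bar +/- z*s/sqrt(n) that
    contain the population mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        margin = z * statistics.stdev(sample) / n ** 0.5
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / repetitions

share = coverage(mu=2352, sigma=1485, n=100, repetitions=1000)
print(f"share of intervals containing mu: {share:.3f}")
```

The share should be close to 0.95. The intervals that miss μ play the role of the "poor samples" mentioned in the text.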
Determination of σ
• In order to construct an interval estimate, it is necessary to
obtain some estimate of σ, the variability of the population
from which the sample is drawn. This is required to obtain an
estimate of the standard error of the sample mean, \(\sigma_{\bar{x}} = \sigma/\sqrt{n}\).
• Generally, the sample standard deviation s is used as an
estimate of σ. For a large sample size, assume the CLT holds
and assume s provides a reasonable estimate of σ. For a
small sample, where n < 30, the t-distribution should be used,
again using s as an estimate of σ.
• In sections 8.1 and 8.2, ASW distinguish methods for when σ
is known and unknown. In practice σ is rarely known and in
note 1, p. 299, ASW state this. In addition, as n increases, the
t-distribution approaches the normal distribution. Thus, so
long as n > 30, it is acceptable to use s as an estimate of σ for
purposes of constructing an interval estimate.
Selecting a confidence level
• There is no one confidence level that is appropriate for all
circumstances.
• A greater confidence level means greater certainty that the
interval estimate of µ actually contains µ. But for a 99% or
99.9% confidence level, the interval may be very wide.
• Smaller confidence levels (e.g. 80% or 90%) produce smaller
margins of error and seemingly more precise interval
estimates, but they are less likely to contain µ.
• Use the level requested or the level others have used when
researching similar issues.
• By tradition, the default level is 95%.
• Issues such as manufacturing products to be safe for human
use, e.g. foods, should require high confidence levels (99.9%+).
But this may increase costs of manufacture and checking for
safety.
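The trade-off between confidence level and interval width can be made concrete by looking up the Z value for each level. This is a sketch using the standard library's `NormalDist.inv_cdf`. The standard error of 2 is borrowed from the earlier σ = 20, n = 100 illustration, and the function name is illustrative.

```python
from statistics import NormalDist

def z_for_level(confidence_level):
    """Z value that leaves (1 - confidence_level) / 2 in each tail of the
    standard normal distribution."""
    upper_tail = (1 - confidence_level) / 2
    return NormalDist().inv_cdf(1 - upper_tail)

standard_error = 2.0   # from the earlier sigma = 20, n = 100 illustration
z_values = {level: z_for_level(level) for level in (0.80, 0.90, 0.95, 0.99, 0.999)}

for level, z in z_values.items():
    print(f"{level:.1%} level: Z = {z:.3f}, margin of error = {z * standard_error:.2f}")
```

Moving from 95% to 99.9% confidence raises Z from about 1.96 to about 3.29, so the interval becomes roughly two-thirds wider for the same data.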
Cautions about interval estimates
• There are many assumptions involved in interval estimation:
– The sample is randomly selected from a population.
– The sample size is sufficiently large to use the CLT.
– The population standard deviation is known or s is a good
estimate of σ.
– The selection of a confidence level is an arbitrary process.
– The population is not too skewed (note 2, ASW, 308).
• As a result, interval estimates are not precise, but are
estimates or approximations.
• Larger n, repeated sampling, comparisons with other studies,
and careful sampling and survey design and practice can
improve the quality of the estimates.
Next week
• t-distribution (ASW, sections 8.1, 8.2).
• Sample size (ASW, section 8.3)
• Interval estimates for proportions (ASW, sections 6.3,
7.6, 8.4).
• Extra office hour – Friday, October 10, 1-3 p.m., CL
237.