Transcript LGM 8

Presentation 2
Sampling Distributions
• Sampling Distribution of sample proportions
• Sampling Distribution of sample means
Statistics vs. Parameters

Statistic: a numerical value computed from a sample.

Parameter: a numerical value associated with a population.

Ideally, we would like to know the parameter. But in most cases it is hard to know the parameter, since the population is too large to measure completely.
So we have to estimate the parameter with an appropriate statistic computed from the sample.
Some Notation

p = population proportion
p̂ = sample proportion
μ = population mean
x̄ = sample mean
σ = population standard deviation
s = sample standard deviation
A. Sampling Distribution of the
Sample Proportion
Situation 1: A survey is undertaken to determine the proportion of PSU students who engage in under-age drinking. The survey asks 200 randomly selected under-age students (assume no problems with bias). Suppose the true population proportion of those who drink is 60%.
Thus, p = 0.6, and p̂ is the proportion in the sample who drink.
Repeated Samples
Imagine repeating this survey many times, and each time we record the sample proportion of those who have engaged in under-age drinking. What would the sampling distribution of p̂ look like?

Sample (n = 200)    Sample Proportion
1                   p̂1
2                   p̂2
3                   p̂3
4                   p̂4
5                   p̂5
…                   …
150,000             p̂150,000

p̂ is a random variable assigning a value to each sample!
[Figure: Histogram of p̂ for 150,000 samples (horizontal axis p̂, 0.4 to 0.8).]
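The repeated-sampling idea can be imitated with a short Python simulation. This is a sketch under stated assumptions, not part of the original slides: each of the 200 students is treated as drinking independently with probability 0.6 (as if sampling from a very large population), only 10,000 samples are drawn instead of 150,000 to keep the run short, and the helper name `simulate_p_hat` and the seed are illustrative choices.

```python
import random
import statistics

def simulate_p_hat(n=200, p=0.6, num_samples=10_000, seed=1):
    """Return the sample proportion p-hat from each of num_samples simulated
    surveys of n students, where each student drinks with probability p."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    p_hats = []
    for _ in range(num_samples):
        drinkers = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(drinkers / n)
    return p_hats

p_hats = simulate_p_hat()
# Centre and spread of the simulated p-hats (compare with p = 0.6
# and sqrt(0.6 * 0.4 / 200), about 0.035).
print(round(statistics.mean(p_hats), 3), round(statistics.stdev(p_hats), 4))
```

A histogram of `p_hats` should look much like the one on the slide: centred near 0.6, with almost all values between about 0.5 and 0.7.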
Sampling Distribution of p̂
Let X be the number of respondents who say they engage in under-age drinking.
X is binomial with n = 200 and p = 0.6.
So we can calculate the probability of each possible outcome of X (0 to 200). The probability mass function (PMF) is plotted below:
[Figure: Binomial probabilities P(X = x) for x from 69 to 169; vertical axis Probability, 0.00 to 0.06.]
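The binomial probabilities behind this plot can be computed directly. The Python sketch below (variable names are illustrative) evaluates P(X = x) = C(200, x)(0.6)^x(0.4)^(200 - x) for every outcome with `math.comb` and confirms that the peak sits at np = 120.

```python
from math import comb

n, p = 200, 0.6

# P(X = x) for every possible outcome x = 0, 1, ..., 200 of X ~ Bin(200, 0.6)
pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]

most_likely = max(range(n + 1), key=lambda x: pmf[x])
print(most_likely)         # 120 = n * p, the peak of the plot
print(round(sum(pmf), 6))  # 1.0: the probabilities over all outcomes sum to one
```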
Sampling Distribution of p̂

$\hat{p} = \frac{\text{number of students in the sample that drink}}{\text{total number of students in the sample}} = \frac{X}{n} = \frac{X}{200}$

Since X ~ Bin(n = 200, p = 0.6), the sampling distribution of p̂ is the same as that of the binomial distribution divided by n.
Therefore we have

Mean for p̂: $E(\hat{p}) = \frac{E(X)}{n} = \frac{np}{n} = p$

Std. Dev. for p̂: $sd(\hat{p}) = \frac{sd(X)}{n} = \frac{\sqrt{np(1-p)}}{n} = \sqrt{\frac{p(1-p)}{n}}$
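As a quick numeric check of the two formulas, this hypothetical helper `p_hat_mean_sd` divides the binomial mean and standard deviation by n for Situation 1 (n = 200, p = 0.6).

```python
from math import sqrt

def p_hat_mean_sd(n, p):
    """Mean and std. dev. of p-hat = X / n when X ~ Bin(n, p)."""
    mean_x = n * p                # E(X) = np
    sd_x = sqrt(n * p * (1 - p))  # sd(X) = sqrt(np(1 - p))
    return mean_x / n, sd_x / n   # divide both by n

mean, sd = p_hat_mean_sd(200, 0.6)
print(round(mean, 4), round(sd, 4))  # 0.6 0.0346
```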
Sampling Distribution of p̂ - Cont.
Using the Normal approximation to the binomial distribution, the sampling distribution of p̂ is approximately Normal with mean p and std. dev. $\sqrt{p(1-p)/n}$, i.e.

$\hat{p} \overset{\text{approx.}}{\sim} N\left(p, \frac{p(1-p)}{n}\right)$

The conditions for this approximation to be valid are:
1. The sample selected from the population is random.
2. The sample must be large enough: np and n(1 - p) MUST be greater than 5, and should be greater than 10.
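To see the approximation and its conditions in action, the sketch below compares the Normal approximation with the exact binomial probability for one illustrative cutoff, P(p̂ ≤ 0.55), in Situation 1. The cutoff, the function names and the use of 10 as the default threshold are assumptions made for the example; the Normal CDF is written with `math.erfc` from the Python standard library.

```python
from math import comb, erfc, sqrt

def normal_cdf(z):
    """Standard Normal CDF, Phi(z)."""
    return 0.5 * erfc(-z / sqrt(2))

def conditions_ok(n, p, threshold=10):
    """Large-sample check from the slide: np and n(1 - p) above the threshold
    (5 is the minimum, 10 is the recommended value)."""
    return n * p > threshold and n * (1 - p) > threshold

def p_hat_cdf_normal(c, n, p):
    """P(p-hat <= c) from the approximation p-hat ~ N(p, p(1 - p) / n)."""
    return normal_cdf((c - p) / sqrt(p * (1 - p) / n))

def p_hat_cdf_exact(c, n, p):
    """P(p-hat <= c) = P(X <= nc), summed exactly from X ~ Bin(n, p)."""
    k_max = int(n * c + 1e-9)  # largest count x with x / n <= c
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k_max + 1))

n, p = 200, 0.6
print(conditions_ok(n, p))  # True: np = 120 and n(1 - p) = 80
print(round(p_hat_cdf_normal(0.55, n, p), 4), round(p_hat_cdf_exact(0.55, n, p), 4))
```

The two probabilities should be close but not identical; the gap shrinks as n grows.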
Example:
Recent studies have shown that about 20% of American adults fit the medical definition of being obese.
A large medical clinic would like to estimate what percent of its patients are obese, so it takes a random sample of 100 patients and finds that 18 percent are obese.
Suppose that, in truth, the same percentage holds for the patients of the medical clinic as for the general population, 20%.
Give the notation and the numerical value for each of the following.
Problem - Cont.
a. The population proportion of obese patients in the medical clinic:
b. The proportion of obese patients in the sample of 100 patients:
c. The mean of the sampling distribution of p̂:
d. The standard deviation of the sampling distribution of p̂:
e. The variance of the sampling distribution of p̂:
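The arithmetic for parts a to e can be written out in a few lines of Python. This is one possible worked answer, assuming the formulas for the mean and standard deviation of p̂ from the previous slides; the variable names are illustrative.

```python
from math import sqrt

n = 100        # patients sampled
p = 0.20       # (a) population proportion obese, p
p_hat = 0.18   # (b) sample proportion obese, p-hat

mean_p_hat = p               # (c) E(p-hat) = p
var_p_hat = p * (1 - p) / n  # (e) Var(p-hat) = p(1 - p) / n
sd_p_hat = sqrt(var_p_hat)   # (d) sd(p-hat) = sqrt(p(1 - p) / n)

print(p, p_hat, mean_p_hat, round(sd_p_hat, 4), round(var_p_hat, 4))
# 0.2 0.18 0.2 0.04 0.0016
```

Note that the observed 0.18 plays no role in parts c to e: the sampling distribution depends only on p and n.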
B. Sampling Distribution of the
Sample Mean
Situation 2: The height of women age 20 to 30, X, is normally distributed (bell-shaped) with a mean of 65 inches and a standard deviation of 3 inches, i.e.
X ~ N(65, 9).
A random sample of 200 women was taken and the sample mean X̄ recorded.
Now IMAGINE taking MANY samples of size 200 from the population of women. For each sample we record X̄.
What is the sampling distribution of X̄?
Histograms for the Distribution of X and X-Bar
[Figure: Original Population of Women, X = height of a random woman (horizontal axis 50 to 80 inches); Distribution of Sample Means, X-bar = mean of a random sample of size 200 (horizontal axis 62 to 68 inches).]
Normal Data
Consider a Normal random variable X with mean μ and standard deviation σ, X ~ N(μ, σ²).
The sampling distribution of the sample mean X̄ for a sample of size n is Normal with

Mean or Expected Value of X̄: $E(\bar{X}) = \mu$
Variance of X̄: $Var(\bar{X}) = \frac{\sigma^2}{n}$
Std. Dev. of X̄: $s.d.(\bar{X}) = \frac{\sigma}{\sqrt{n}}$

i.e. $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
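For Situation 2 (μ = 65, σ = 3, n = 200), these formulas give Var(X̄) = 9/200 = 0.045 and s.d.(X̄) = 3/√200 ≈ 0.212. The Python sketch below checks this by simulation; the number of repeated samples (5,000) and the seed are arbitrary choices, not from the slides.

```python
import random
import statistics
from math import sqrt

mu, sigma, n = 65, 3, 200  # Situation 2: X ~ N(65, 9), samples of size 200

# Theory for Normal data: X-bar ~ N(mu, sigma^2 / n)
theory_var = sigma**2 / n    # 9 / 200 = 0.045
theory_sd = sigma / sqrt(n)  # 3 / sqrt(200), about 0.212

# Simulation: many samples of 200 heights, keeping each sample mean
rng = random.Random(2)
x_bars = [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
          for _ in range(5_000)]

print(theory_var, round(theory_sd, 3))
print(round(statistics.mean(x_bars), 2), round(statistics.stdev(x_bars), 3))
```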
Skewed or Non-Normal Data
Situation 3: In a college survey, students were asked to report the number of CDs they own. The number of CDs is clearly a right-skewed data set. Suppose our population looked something like the histogram below; let us take repeated samples from this population and see what the sample mean looks like.
[Figure: Histogram of the number of CDs owned (horizontal axis CDs, 0 to 600), right-skewed.]
Suppose we take repeated samples of size n = 4, 8, 16, 32.
[Figure: Histograms of the sample mean for n = 4, n = 8, n = 16 and n = 32.]
Statistics From Skewed Data
Using that CD sample as the population, µ = 87.6, σ = 87.8.
The sample means from the previous slide had the following summary statistics:

Sample Size   Mean of X-bar   Std. Dev. of X-bar
n = 4         86.6            43.2
n = 8         86.8            30.9
n = 16        86.7            21.9
n = 32        86.6            15.6

Note that the mean remains roughly constant, while the std. deviation decreases as the sample size increases!
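The standard deviations in the table can be compared with σ/√n, the formula for s.d.(X̄) given for Normal data. A small Python check (names illustrative) using the slide's σ = 87.8:

```python
from math import sqrt

sigma = 87.8  # population std. dev. of the CD data, from the slide
observed_sd = {4: 43.2, 8: 30.9, 16: 21.9, 32: 15.6}  # "Std. Dev. of X-bar" column

for n, observed in observed_sd.items():
    predicted = sigma / sqrt(n)  # theoretical s.d.(X-bar)
    print(f"n = {n:2d}: observed {observed:.1f}, sigma/sqrt(n) = {predicted:.1f}")
```

Each predicted value is within one CD of the observed one, even though the population is skewed.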
Central Limit Theorem
For non-normal data coming from a population with mean µ and standard deviation σ, the sampling distribution of the sample mean is approximately normal with

Mean or Expected Value of X̄: $E(\bar{X}) = \mu$
Variance of X̄: $Var(\bar{X}) = \frac{\sigma^2}{n}$
Std. Dev. of X̄: $s.d.(\bar{X}) = \frac{\sigma}{\sqrt{n}}$

i.e. $\bar{X} \overset{\text{approx.}}{\sim} N\left(\mu, \frac{\sigma^2}{n}\right)$

Conditions: The above is true if the sample size is large enough; usually n > 30 is sufficient.
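The CD data themselves are not available here, so the sketch below substitutes an assumed right-skewed population: an Exponential distribution with mean 87.6 (its standard deviation is also 87.6, close to the slide's σ = 87.8). It repeats the experiment for n = 4, 8, 16, 32 and reports the mean and standard deviation of X̄, the value σ/√n, and the skewness of X̄, which should shrink toward 0 (theory for this population: 2/√n) as the shape becomes more Normal. The population choice, sample counts and seed are illustrative assumptions.

```python
import random
import statistics
from math import sqrt

POP_MEAN = 87.6  # assumed Exponential population, so its std. dev. is also 87.6

def sample_means(n, num_samples=20_000, seed=3):
    """Means of num_samples samples of size n from a right-skewed
    Exponential population with mean POP_MEAN."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1 / POP_MEAN) for _ in range(n))
            for _ in range(num_samples)]

def skewness(xs):
    """Sample skewness: 0 for a symmetric (e.g. Normal) shape."""
    m, s = statistics.fmean(xs), statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

for n in (4, 8, 16, 32):
    xb = sample_means(n)
    # n, mean of X-bar, sd of X-bar, sigma/sqrt(n), skewness of X-bar
    print(n, round(statistics.fmean(xb), 1), round(statistics.stdev(xb), 1),
          round(POP_MEAN / sqrt(n), 1), round(skewness(xb), 2))
```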
What next?
We have shown that the sampling distribution of the sample proportion and the sampling distribution of the sample mean are both approximately normal under certain conditions.
Now we can use what we know about normal distributions to make conclusions about p̂ and X̄!
In the following we will see how to use the values of the statistics (p-hat, x-bar) to make inferences about the parameters (p, µ).
Exercise 1
The population proportion is 0.30. Consider the following questions.
1. Find the sampling distribution of p-hat for each of the following sample sizes: n = 100, n = 200, n = 1000.
2. What is the probability that a sample proportion will be within ±0.04 of the population proportion for each of these sample sizes?
3. What is the advantage of a larger sample size?
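One way to work questions 1 and 2 numerically, assuming the Normal approximation applies (np and n(1 - p) are at least 30 for all three sizes). The Normal CDF is built from `math.erfc`; names are illustrative.

```python
from math import erfc, sqrt

def normal_cdf(z):
    """Standard Normal CDF, Phi(z)."""
    return 0.5 * erfc(-z / sqrt(2))

p = 0.30
probs = {}
for n in (100, 200, 1000):
    sd = sqrt(p * (1 - p) / n)                 # 1. p-hat ~ N(0.30, sd^2)
    z = 0.04 / sd
    probs[n] = normal_cdf(z) - normal_cdf(-z)  # 2. P(|p-hat - p| <= 0.04)
    print(n, round(sd, 4), round(probs[n], 3))
```

The probabilities rise from about 0.62 (n = 100) to about 0.78 (n = 200) and 0.99 (n = 1000): a larger sample puts p̂ close to p far more often.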
Exercise 2
A certain antibiotic is known to cure 85% of strep bacteria infections. A scientist wants to make sure the drug does not lose its potency over time. He treats 100 strep patients with a 1-year-old supply of the antibiotic.
Let p̂ be the proportion of individuals who are cured.
ASSUME the drug has NOT lost potency, and answer the following questions…
1. What is the sampling distribution of p̂? Draw a picture.
2. If we repeated this study many times, we would expect 95% of the p̂ values to fall within what interval?
3. What is the probability that more than 90% in the sample are cured?
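A possible numeric sketch for Exercise 2, assuming the drug has not lost potency (p = 0.85) and using the Normal approximation; n(1 - p) = 15 exceeds 10, so the large-sample condition holds. The 95% interval uses z = 1.96; the 68-95-99.7 rule's value of 2 gives almost the same interval. Names are illustrative.

```python
from math import erfc, sqrt

def normal_cdf(z):
    """Standard Normal CDF, Phi(z)."""
    return 0.5 * erfc(-z / sqrt(2))

p, n = 0.85, 100                         # cure rate if no potency is lost; patients treated
sd = sqrt(p * (1 - p) / n)               # 1. p-hat ~ N(0.85, sd^2), sd about 0.036
print(round(n * p), round(n * (1 - p)))  # 85 15: both above 10

low, high = p - 1.96 * sd, p + 1.96 * sd        # 2. middle 95% of p-hat values
prob_over_90 = 1 - normal_cdf((0.90 - p) / sd)  # 3. P(p-hat > 0.90)
print(round(low, 3), round(high, 3), round(prob_over_90, 3))
```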
Exercise 3
A newspaper conducts a poll to determine the proportion of adults who favor a certain candidate. They ask a random sample of 800 people whether or not they favor that candidate (Assume no bias!). Suppose the true proportion of adults who favor the candidate is 58%.
1. The newspaper records the sample proportion who favor the candidate. What is the sampling distribution of the sample proportion? Draw a picture of its PDF (center it correctly and include the appropriate scale).
2. What is the probability that the newspaper would have recorded a sample proportion greater than 62%?
3. What is the probability that less than 50% of the newspaper respondents would support this candidate?
4. What is the probability that a randomly selected individual favors this candidate?
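A sketch of the calculations for Exercise 3 under the Normal approximation (np = 464 and n(1 - p) = 336 are both large). The Normal CDF uses `math.erfc` because question 3 lands far in the tail, where the erfc form keeps more relative precision than 1 + erf. Names are illustrative.

```python
from math import erfc, sqrt

def normal_cdf(z):
    """Standard Normal CDF, Phi(z)."""
    return 0.5 * erfc(-z / sqrt(2))

p, n = 0.58, 800
sd = sqrt(p * (1 - p) / n)                       # 1. p-hat ~ N(0.58, sd^2)
prob_above_62 = 1 - normal_cdf((0.62 - p) / sd)  # 2. P(p-hat > 0.62)
prob_below_50 = normal_cdf((0.50 - p) / sd)      # 3. P(p-hat < 0.50)
prob_one_person = p                              # 4. one person: no averaging, just p
print(round(sd, 4), round(prob_above_62, 4), f"{prob_below_50:.1e}", prob_one_person)
```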
Exercise 4
Suppose the number of calories FIT students consume in a day is normally distributed with mean 2000 and standard deviation 300.
1. About 95% of these students have a daily caloric intake between what two values?
2. What is the probability that a randomly selected individual consumed between 1800 and 2100 calories yesterday?
3. Suppose I take a random sample of 36 students and record the number of calories each consumed on a given day. Describe the sampling distribution of the sample mean.
4. Draw a picture of the sampling distribution of the sample mean (center it correctly and include the appropriate scale).
5. If I take a sample of size 36 from the student body, what is the probability that the sample mean will be less than 2050?
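A worked sketch for Exercise 4 in Python. Question 1 uses the 68-95-99.7 rule (μ ± 2σ); using 1.96σ instead gives roughly 1412 to 2588. Questions 2 and 5 use a Normal CDF written with `math.erfc`, and question 3 uses s.d.(X̄) = σ/√n = 300/6 = 50. Names are illustrative.

```python
from math import erfc, sqrt

def normal_cdf(x, mean=0.0, sd=1.0):
    """P(X <= x) for X ~ N(mean, sd^2)."""
    return 0.5 * erfc(-(x - mean) / (sd * sqrt(2)))

mu, sigma = 2000, 300

# 1. Middle 95% of individual intakes: mu +/- 2 sigma (68-95-99.7 rule)
print(mu - 2 * sigma, mu + 2 * sigma)  # 1400 2600

# 2. P(1800 < X < 2100) for one randomly selected student
p_individual = normal_cdf(2100, mu, sigma) - normal_cdf(1800, mu, sigma)

# 3. Sample of n = 36: X-bar ~ N(mu, sigma^2 / n), so s.d.(X-bar) = 300 / 6
n = 36
sd_xbar = sigma / sqrt(n)  # 50.0

# 5. P(X-bar < 2050)
p_mean = normal_cdf(2050, mu, sd_xbar)
print(round(p_individual, 4), sd_xbar, round(p_mean, 4))
```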
Exercise 5
Assume the length of trout living in the Susquehanna River is normally distributed with a mean of 14 inches and a standard deviation of 2 inches. A random sample of 16 trout is taken from the river.
1. What is the sampling distribution of the average trout length (i) in a sample of size 16, (ii) in a sample of size 100?
2. What happens to the sampling distribution of the sample mean as the sample size increases? (Draw a picture)
3. What is the probability that a random sample of 16 trout will provide a sample mean within one inch of the population mean?
4. What is the probability that a random sample of 100 trout will provide a sample mean within one inch of the population mean?
5. What is the advantage of a larger sample size when one is attempting to estimate the population mean?
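A sketch for questions 1, 3 and 4 of Exercise 5: s.d.(X̄) = 2/√16 = 0.5 for n = 16 and 2/√100 = 0.2 for n = 100, and the helper `prob_within` (a hypothetical name) computes P(|X̄ - µ| < 1) from the Normal CDF.

```python
from math import erfc, sqrt

def normal_cdf(z):
    """Standard Normal CDF, Phi(z)."""
    return 0.5 * erfc(-z / sqrt(2))

mu, sigma = 14, 2  # trout length in inches

def prob_within(d, n):
    """P(|X-bar - mu| < d) when X-bar ~ N(mu, sigma^2 / n)."""
    sd = sigma / sqrt(n)
    return normal_cdf(d / sd) - normal_cdf(-d / sd)

for n in (16, 100):
    # sample size, s.d. of X-bar, probability of landing within 1 inch of mu
    print(n, sigma / sqrt(n), round(prob_within(1, n), 4))
```

For n = 16 the probability is about 0.954 (z = 2); for n = 100 it is essentially 1 (z = 5), so the larger sample pins X̄ much closer to µ.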