Transcript Document

Sampling Distributions
Parameter & Statistic
Parameter
• Summary measure about population
Sample Statistic
• Summary measure about sample
• P in Population & Parameter
• S in Sample & Statistic
Common Statistics & Parameters
Measure               Sample Statistic   Population Parameter
Mean                  $\bar{X}$          $\mu$
Standard Deviation    $S$                $\sigma$
Variance              $S^2$              $\sigma^2$
Binomial Proportion   $\hat{p}$          $p$
Sampling Distribution
1. Theoretical probability distribution
2. Random variable is sample statistic
   • Sample mean, sample proportion, etc.
3. Results from drawing all possible samples of a fixed size
4. List of all possible [x, p(x)] pairs
   • Sampling distribution of the sample mean
Sampling from
Normal Populations
Theorem 1
If $x \sim N(\mu, \sigma^2)$ and $Y = a + bx$, then $Y \sim N(a + b\mu, b^2\sigma^2)$.
Mean
$E(Y) = E(a + bx) = E(a) + E(bx) = a + bE(x) = a + b\mu$
Variance
$V(Y) = V(a + bx) = V(a) + V(bx) = 0 + b^2 V(x) = b^2\sigma^2$
Theorem 2
If $X \sim N(\mu_X, \sigma_X^2)$, $Y \sim N(\mu_Y, \sigma_Y^2)$, X and Y are independent,
and $w = aX + bY$, then $w \sim N(a\mu_X + b\mu_Y, a^2\sigma_X^2 + b^2\sigma_Y^2)$.
Theorem: Let $Y_1, Y_2, \ldots, Y_n$ be a random sample of size n
from a normal distribution with mean $\mu$ and variance $\sigma^2$.
Then
$\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i$
is normally distributed with mean $\mu_{\bar{Y}} = \mu$
and variance $\sigma_{\bar{Y}}^2 = \sigma^2 / n$.
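Proof:
$E(Y_i) = \mu$ and $V(Y_i) = \sigma^2$ for $i = 1, 2, \ldots, n$
$\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i = \frac{1}{n}(Y_1) + \frac{1}{n}(Y_2) + \cdots + \frac{1}{n}(Y_n) = a_1 Y_1 + a_2 Y_2 + \cdots + a_n Y_n$
where $a_i = 1/n$, $i = 1, 2, \ldots, n$
$E(\bar{Y}) = E\left[\frac{1}{n}(Y_1) + \frac{1}{n}(Y_2) + \cdots + \frac{1}{n}(Y_n)\right] = \frac{1}{n}(\mu) + \frac{1}{n}(\mu) + \cdots + \frac{1}{n}(\mu) = \mu$
$V(\bar{Y}) = V\left[\frac{1}{n}(Y_1) + \frac{1}{n}(Y_2) + \cdots + \frac{1}{n}(Y_n)\right] = \frac{1}{n^2}(\sigma^2) + \frac{1}{n^2}(\sigma^2) + \cdots + \frac{1}{n^2}(\sigma^2) = \frac{1}{n^2}(n\sigma^2) = \frac{\sigma^2}{n}$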
Properties of the Sampling Distribution of $\bar{x}$
Standard Error of the Mean
1. Standard deviation of all possible sample means, $\bar{x}$
   ● Measures scatter in all sample means, $\bar{x}$
2. Less than population standard deviation (for n > 1)
3. Formula (sampling with replacement): $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
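Sampling from Normal Populations
Central Tendency: $\mu_{\bar{x}} = \mu$
Dispersion (sampling with replacement): $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
[Figure: population distribution with $\mu = 50$, $\sigma = 10$, and sampling distributions of $\bar{X}$ centered at $\mu_{\bar{x}} = 50$, with $\sigma_{\bar{x}} = 5$ for n = 4 and $\sigma_{\bar{x}} = 2.5$ for n = 16]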
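Standardizing the Sampling Distribution of $\bar{x}$
$Z = \frac{\bar{X} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
[Figure: the sampling distribution of $\bar{X}$ (mean $\mu_{\bar{x}}$, standard deviation $\sigma_{\bar{x}}$) is mapped to the standardized normal distribution of Z (mean 0, standard deviation 1)]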
Thinking Challenge
You’re an operations analyst for AT&T. Long-distance
telephone calls are normally distributed with $\mu = 8$ min.
and $\sigma = 2$ min. If you select random samples of 25 calls,
what percentage of the sample means would be
between 7.8 & 8.2 minutes?
© 1984-1994 T/Maker Co.
Sampling Distribution Solution*
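$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{7.8 - 8}{2 / \sqrt{25}} = -.50$
$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{8.2 - 8}{2 / \sqrt{25}} = .50$
$P(7.8 \le \bar{X} \le 8.2) = P(-.50 \le Z \le .50) = .1915 + .1915 = .3830$
[Figure: sampling distribution of $\bar{X}$ ($\sigma_{\bar{x}} = .4$) shaded between 7.8 and 8.2, and the standardized normal distribution ($\sigma = 1$) shaded between Z = –.50 and Z = .50, total area .3830]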
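The table lookup in the solution above can be checked numerically. The following is a minimal sketch that assumes Python's standard-library `statistics.NormalDist`; the helper name `prob_sample_mean_between` is made up for this illustration and is not part of the slides. The result should land close to the table value of .3830.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_mean_between(mu, sigma, n, low, high):
    """P(low <= sample mean <= high) when the sample mean is N(mu, sigma^2 / n)."""
    standard_error = sigma / sqrt(n)          # sigma_xbar = sigma / sqrt(n)
    sampling_dist = NormalDist(mu, standard_error)
    return sampling_dist.cdf(high) - sampling_dist.cdf(low)


# AT&T example: mu = 8 min, sigma = 2 min, n = 25 calls.
p = prob_sample_mean_between(mu=8, sigma=2, n=25, low=7.8, high=8.2)
print(f"{p:.4f}")
```

Working with the distribution of the sample mean directly avoids rounding Z to two decimals, so small differences from the printed table are expected.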
Sampling from
Non-Normal Populations
Developing Sampling Distributions
Suppose There’s a Population ...
• Population size, N = 4
• Random variable, x
• Values of x: 1, 2, 3, 4
• Uniform distribution
Population Characteristics
Summary Measures
$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = 2.5$
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} = 1.12$
[Figure: Population Distribution, P(x) against x = 1, 2, 3, 4, each value with probability .25]
All Possible Samples of Size n = 2
16 Samples (sample with replacement)
1st Obs   2nd Observation
          1     2     3     4
1         1,1   1,2   1,3   1,4
2         2,1   2,2   2,3   2,4
3         3,1   3,2   3,3   3,4
4         4,1   4,2   4,3   4,4

16 Sample Means
1st Obs   2nd Observation
          1     2     3     4
1         1.0   1.5   2.0   2.5
2         1.5   2.0   2.5   3.0
3         2.0   2.5   3.0   3.5
4         2.5   3.0   3.5   4.0
Sampling Distribution of All Sample Means
16 Sample Means
1st Obs   2nd Observation
          1     2     3     4
1         1.0   1.5   2.0   2.5
2         1.5   2.0   2.5   3.0
3         2.0   2.5   3.0   3.5
4         2.5   3.0   3.5   4.0

Sampling Distribution of the Sample Mean
$\bar{x}$      1.0    1.5    2.0    2.5    3.0    3.5    4.0
$P(\bar{x})$   1/16   2/16   3/16   4/16   3/16   2/16   1/16
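The enumeration of all samples of size 2 can be reproduced in a few lines. This is only an illustrative sketch: the variable names are assumptions, and exact fractions are used so the probabilities are not blurred by rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4]
n = 2  # sample size

# All ordered samples of size n drawn with replacement (4**2 = 16 of them).
samples = list(product(population, repeat=n))
sample_means = [Fraction(sum(s), n) for s in samples]

# Relative frequency of each distinct sample mean.
counts = Counter(sample_means)
sampling_dist = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

for m, p in sampling_dist.items():
    print(float(m), p)
```

Changing `population` or `n` gives the sampling distribution for other small populations, although the number of samples grows as `len(population) ** n`.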
Summary Measures of All Sample Means
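Here N = 16, the number of sample means.
$\mu_{\bar{x}} = \frac{\sum_{i=1}^{N} \bar{X}_i}{N} = \frac{1.0 + 1.5 + \cdots + 4.0}{16} = 2.5$
$\sigma_{\bar{x}} = \sqrt{\frac{\sum_{i=1}^{N} (\bar{X}_i - \mu_{\bar{x}})^2}{N}} = \sqrt{\frac{(1.0 - 2.5)^2 + (1.5 - 2.5)^2 + \cdots + (4.0 - 2.5)^2}{16}} = .79$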
Comparison
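Population: $\mu = 2.5$, $\sigma = 1.12$
Sampling Distribution: $\mu_{\bar{x}} = 2.5$, $\sigma_{\bar{x}} = .79$
[Figure: P(x) against x = 1, 2, 3, 4 for the population, next to $P(\bar{x})$ against $\bar{x}$ = 1.0, 1.5, ..., 4.0 for the sampling distribution]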
Sampling Distribution of the Mean…
A fair die is thrown infinitely many times,
with the random variable X = # of spots on any throw.
The probability distribution of X is:
x      1     2     3     4     5     6
P(x)   1/6   1/6   1/6   1/6   1/6   1/6
…and the mean and variance are calculated as well:
$\mu = \sum x P(x) = 3.5$
$\sigma^2 = \sum (x - \mu)^2 P(x) = 2.92$
Sampling Distribution of Two Dice
A sampling distribution is created by looking at
all samples of size n=2 (i.e. two dice) and their means…
While there are 36 possible samples of size 2, there are only 11
values for $\bar{x}$, and some (e.g. $\bar{x}$=3.5) occur more frequently than
others (e.g. $\bar{x}$=1).
Sampling Distribution of Two Dice…
$\bar{x}$   $P(\bar{x})$
1.0   1/36
1.5   2/36
2.0   3/36
2.5   4/36
3.0   5/36
3.5   6/36
4.0   5/36
4.5   4/36
5.0   3/36
5.5   2/36
6.0   1/36
[Figure: bar chart of $P(\bar{x})$ against $\bar{x}$ = 1.0, 1.5, ..., 6.0, peaking at 6/36 for $\bar{x}$ = 3.5]
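As a check on the two-dice sampling distribution, the sketch below compares the exact mean and variance of one die with those of the mean of two dice. The helper `mean_and_variance` is a hypothetical name introduced here; it treats its inputs as equally likely outcomes, which holds for the 6 faces and for the 36 ordered pairs.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)


def mean_and_variance(values):
    """Exact mean and variance of equally likely outcomes."""
    values = list(values)
    mu = Fraction(sum(values), len(values))
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, var


# One die: X takes the values 1..6, each with probability 1/6.
mu_x, var_x = mean_and_variance(faces)

# Two dice: the 36 equally likely samples of size n = 2 and their means.
n = 2
means = [Fraction(a + b, n) for a, b in product(faces, repeat=n)]
mu_xbar, var_xbar = mean_and_variance(means)

print(mu_x, var_x)        # mean and variance of X
print(mu_xbar, var_xbar)  # mean and variance of the sample mean
print(len(set(means)))    # number of distinct sample means
```

The sample mean keeps the population mean, while its variance is the population variance divided by the sample size n = 2.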
Compare…
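Compare the distribution of X…
[Figure: P(x) against x = 1, 2, ..., 6]
…with the sampling distribution of $\bar{X}$.
[Figure: $P(\bar{x})$ against $\bar{x}$ = 1.0, 1.5, ..., 6.0]
As well, note that:
$\mu_{\bar{x}} = \mu$
$\sigma_{\bar{x}}^2 = \sigma^2 / 2$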
Sampling from
Non-Normal Populations
Law of Large Numbers
The law of large numbers states that, under general
conditions, $\bar{Y}$ will be near $\mu_Y$ with very high probability
when n is large.
The conditions for the law of large numbers are
• $Y_i$, i = 1, …, n, are i.i.d.
• The variance of $Y_i$, $\sigma_Y^2$, is finite.
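A small simulation can make the law of large numbers concrete. The sketch below assumes i.i.d. throws of a fair die (population mean 3.5); the seed, the number of throws, and the helper name `running_means` are arbitrary choices for illustration.

```python
import random


def running_means(draws):
    """Sample mean after each additional draw."""
    total = 0
    means = []
    for i, y in enumerate(draws, start=1):
        total += y
        means.append(total / i)
    return means


rng = random.Random(0)  # fixed seed, an arbitrary choice for reproducibility
draws = [rng.randint(1, 6) for _ in range(100_000)]  # i.i.d. fair-die throws
means = running_means(draws)

# The running mean should settle near 3.5 as n grows.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(means[n - 1], 4))
```

Both conditions hold here: the throws are independent and identically distributed, and a die has finite variance.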
Central Limit Theorem…
The sampling distribution of the mean of a random
sample drawn from any population is approximately
normal for a sufficiently large sample size.
The larger the sample size, the more closely the sampling
distribution of $\bar{X}$ will resemble a normal distribution.
Central Limit Theorem…
If the population is normal, then $\bar{X}$ is normally distributed
for all values of n.
If the population is non-normal, then $\bar{X}$ is approximately
normal only for larger values of n.
In most practical situations, a sample size of 30 may be
sufficiently large to allow us to use the normal distribution
as an approximation for the sampling distribution of $\bar{X}$.
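One way to see the central limit theorem at work is to simulate sample means from a clearly non-normal population. The sketch below assumes an exponential population with rate 1 (mean 1, standard deviation 1), which is not an example from the slides; the seed and the number of repetitions are arbitrary as well.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

rng = random.Random(1)  # arbitrary seed
n = 30                  # sample size, matching the rule of thumb above
reps = 5_000            # number of simulated samples (an assumption)
mu = sigma = 1.0        # exponential population with rate 1: mean 1, sd 1

# Draw many samples from a right-skewed population and keep each sample mean.
sample_means = [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

# If X-bar is roughly N(mu, sigma^2 / n), about 95% of sample means should lie
# within z standard errors of mu, where z is close to 1.96.
z = NormalDist().inv_cdf(0.975)
se = sigma / sqrt(n)
inside = sum(abs(m - mu) <= z * se for m in sample_means) / reps
print(round(inside, 3))
```

Repeating the run with a much smaller `n` shows the approximation degrading for a skewed population, in line with the statement that non-normal populations need larger samples.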
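Sampling from Non-Normal Populations
Central Tendency: $\mu_{\bar{x}} = \mu$
Dispersion (sampling with replacement): $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
[Figure: non-normal population distribution with $\mu = 50$, $\sigma = 10$, and sampling distributions of $\bar{X}$ centered at $\mu_{\bar{x}} = 50$, with $\sigma_{\bar{x}} = 5$ for n = 4 and $\sigma_{\bar{x}} = 1.8$ for n = 30]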
Central Limit Theorem
As sample size gets large enough ($n \ge 30$), the sampling distribution
of $\bar{X}$ becomes almost normal, with $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma / \sqrt{n}$.
Central Limit Theorem Example
The amount of soda in cans of a
particular brand has a mean of 12 oz
and a standard deviation of .2 oz. If
you select random samples of 50 cans,
what percentage of the sample means
would be less than 11.95 oz?
Central Limit Theorem Solution*
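$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{11.95 - 12}{.2 / \sqrt{50}} = -1.77$
$P(\bar{X} < 11.95) = P(Z < -1.77) = .5 - .4616 = .0384$
So about 3.84% of the sample means would be less than 11.95 oz.
[Figure: sampling distribution of $\bar{X}$ ($\sigma_{\bar{x}} = .03$) with the area below 11.95 shaded, and the standardized normal distribution ($\sigma = 1$) with the area .0384 below Z = –1.77 shaded; shaded area exaggerated]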
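The same steps (standard error, Z value, lower-tail area) can be written out explicitly. This is a sketch using the standard-library `NormalDist`; the variable names are assumptions, and the unrounded Z gives a probability slightly different from the two-decimal table lookup.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 12.0, 0.2, 50   # population mean, sd (oz) and sample size
x_bar = 11.95                  # threshold for the sample mean

standard_error = sigma / sqrt(n)   # sigma_xbar
z = (x_bar - mu) / standard_error  # standardized threshold
p_below = NormalDist().cdf(z)      # lower-tail area of the standard normal

print(round(standard_error, 4), round(z, 2), round(p_below, 4))
```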
Example
One survey interviewed 25 people who graduated one
year ago and determined their weekly salary.
The sample mean was $750.
To interpret the finding, one needs to calculate the
probability that a sample of 25 graduates would have a
mean of $750 or less when the population mean is $800
and the standard deviation is $100.
After calculating the probability, one needs to draw some
conclusions.
Example
We want to find the probability that the sample
mean is less than $750. Thus, we seek
$P(\bar{X} < 750)$
The distribution of X, the weekly income, is likely to
be positively skewed, but not sufficiently so to make
the distribution of $\bar{X}$ nonnormal. As a result, we may
assume that $\bar{X}$ is normal with mean
$\mu_{\bar{x}} = \mu = 800$
and standard deviation
$\sigma_{\bar{x}} = \sigma / \sqrt{n} = 100 / \sqrt{25} = 20$
Example
Thus,
$P(\bar{X} < 750) = P\left(\frac{\bar{X} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} < \frac{750 - 800}{20}\right) = P(Z < -2.5) = .5 - .4938 = .0062$
The probability of observing a sample mean as low as
$750 when the population mean is $800 is extremely
small. Because this event is quite unlikely, we would
have to conclude that the dean's claim is not justified.
Using the Sampling Distribution for Inference
Here’s another way of expressing the probability
calculated from a sampling distribution.
P(-1.96 < Z < 1.96) = .95
Substituting the formula for the sampling distribution
$P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} < 1.96\right) = .95$
With a little algebra
$P\left(\mu - 1.96 \frac{\sigma}{\sqrt{n}} < \bar{X} < \mu + 1.96 \frac{\sigma}{\sqrt{n}}\right) = .95$
Using the Sampling Distribution for Inference
Returning to the chapter-opening example where µ =
800, σ = 100, and n = 25, we compute
$P\left(800 - 1.96 \frac{100}{\sqrt{25}} < \bar{X} < 800 + 1.96 \frac{100}{\sqrt{25}}\right) = .95$
or
$P(760.8 < \bar{X} < 839.2) = .95$
This tells us that there is a 95% probability that a
sample mean will fall between 760.8 and 839.2.
Because the sample mean was computed to be $750,
we would have to conclude that the dean's claim is not
supported by the statistic.
Using the Sampling Distribution for Inference
For example, with µ = 800, σ = 100, n = 25 and
α = .01, we produce
$P\left(\mu - z_{.005} \frac{\sigma}{\sqrt{n}} < \bar{X} < \mu + z_{.005} \frac{\sigma}{\sqrt{n}}\right) = 1 - .01$
$P\left(800 - 2.575 \frac{100}{\sqrt{25}} < \bar{X} < 800 + 2.575 \frac{100}{\sqrt{25}}\right) = .99$
$P(748.5 < \bar{X} < 851.5) = .99$
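The interval bounds for any α follow the same pattern, so they can be computed from the inverse normal CDF instead of a table. The sketch below is illustrative only; `central_interval` is a hypothetical helper name, and `NormalDist().inv_cdf` returns slightly more precise critical values than the rounded 1.96 and 2.575.

```python
from math import sqrt
from statistics import NormalDist


def central_interval(mu, sigma, n, alpha):
    """Bounds that contain the sample mean with probability 1 - alpha."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. about 1.96 for alpha = .05
    margin = z * sigma / sqrt(n)
    return mu - margin, mu + margin


# Salary example: mu = 800, sigma = 100, n = 25.
for alpha in (0.05, 0.01):
    low, high = central_interval(mu=800, sigma=100, n=25, alpha=alpha)
    print(alpha, round(low, 1), round(high, 1))
```

The observed sample mean of $750 lies outside the 95% interval but just inside the 99% interval, which is consistent with the probability of .0062 computed earlier.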
Sampling Distributions The Proportion

The proportion of the population having some
characteristic is denoted π.
The sample proportion is
$p = \frac{X}{n} = \frac{\text{number of items in the sample having the characteristic of interest}}{\text{sample size}}$
Sampling Distributions The Proportion

Standard error for the proportion:
$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}}$
Z value for the proportion:
$Z = \frac{p - \pi}{\sigma_p} = \frac{p - \pi}{\sqrt{\frac{\pi(1 - \pi)}{n}}}$
Sampling Distributions The Proportion: Example
If the true proportion of voters who support
Proposition A is π = .4, what is the probability that a
sample of size 200 yields a sample proportion between
.40 and .45?
• In other words, if π = .4 and n = 200, what is P(.40 ≤ p ≤ .45)?
Sampling Distributions The Proportion: Example
Find $\sigma_p$:
$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{.4(1 - .4)}{200}} = .03464$
Convert to standardized normal:
$P(.40 \le p \le .45) = P\left(\frac{.40 - .40}{.03464} \le Z \le \frac{.45 - .40}{.03464}\right) = P(0 \le Z \le 1.44)$
Sampling Distributions The Proportion: Example
Use cumulative normal table:
P(0 ≤ Z ≤ 1.44) = P(Z ≤ 1.44) – 0.5 = .4251
[Figure: the sampling distribution of p between .40 and .45 is standardized to the standardized normal distribution between Z = 0 and Z = 1.44; the shaded area is .4251]
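The proportion example can be reproduced with the same standard-library tools. In this sketch the variable names (`pi_`, `sigma_p`, and so on) are assumptions for illustration, and Z is not rounded before the lookup, so the result differs slightly from the table value of .4251.

```python
from math import sqrt
from statistics import NormalDist

pi_, n = 0.4, 200          # population proportion and sample size
low, high = 0.40, 0.45     # bounds on the sample proportion p

sigma_p = sqrt(pi_ * (1 - pi_) / n)   # standard error of the proportion
z_low = (low - pi_) / sigma_p
z_high = (high - pi_) / sigma_p

std_normal = NormalDist()
prob = std_normal.cdf(z_high) - std_normal.cdf(z_low)
print(round(sigma_p, 5), round(z_high, 2), round(prob, 4))
```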