SAMPLING DISTRIBUTION - Middle East Technical University

SAMPLING DISTRIBUTION
Introduction
• In real life calculating parameters of populations is
usually impossible because populations are very
large.
• Rather than investigating the whole population, we
take a sample, calculate a statistic related to the
parameter of interest, and make an inference.
STATISTIC
• Let X1, X2,…,Xn be a random sample (r.s.) of size n from a
population and let T(x1,x2,…,xn) be a function
which does not depend on any unknown
parameters. Then the random variable (r.v.) or random vector
Y=T(X1, X2,…,Xn) is called a statistic.
• The sample mean is the arithmetic average of
the values in a r.s.:
\[ \bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i \]
• The sample variance is the statistic defined by
\[ S^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2 \]
• The sample standard deviation is the statistic
defined by $S = \sqrt{S^2}$.
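The two definitions above can be sketched in a few lines of Python. The data values are hypothetical and serve only as an illustration; the result is cross-checked against the standard library's `statistics.variance`, which also uses the n − 1 divisor.

```python
import math
import statistics


def sample_mean(xs):
    """Arithmetic average of the values in the sample."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


# Hypothetical sample, chosen only for illustration.
data = [2.0, 4.0, 4.0, 5.0, 7.0]

xbar = sample_mean(data)
s2 = sample_variance(data)
s = math.sqrt(s2)  # sample standard deviation

print(xbar, s2, s)
print(statistics.variance(data))  # library value, same n - 1 divisor
```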
SAMPLING DISTRIBUTION
• A statistic is also a random variable. Its
distribution depends on the distribution of the
random sample and the form of the function
Y=T(X1, X2,…,Xn). The probability distribution
of a statistic Y is called the sampling
distribution of Y.
Sampling Distribution of the Mean
• An example
– A die is thrown infinitely many times. Let X
represent the number of spots showing on any
throw.
– The probability distribution of X is

x      1     2     3     4     5     6
p(x)   1/6   1/6   1/6   1/6   1/6   1/6

\[ E(X) = 1(1/6) + 2(1/6) + 3(1/6) + \cdots = 3.5 \]
\[ V(X) = (1-3.5)^2(1/6) + (2-3.5)^2(1/6) + \cdots = 2.92 \]
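The expected value and variance of a single throw can be verified with exact fractions. This is a minimal sketch; the variable names are chosen here for illustration.

```python
from fractions import Fraction

# Fair die: each face 1..6 has probability 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# E(X) = sum of x * p(x); V(X) = sum of (x - E(X))^2 * p(x).
expected = sum(x * p for x, p in pmf.items())
variance = sum((x - expected) ** 2 * p for x, p in pmf.items())

print(expected, float(expected))            # 7/2 3.5
print(variance, round(float(variance), 2))  # 35/12 2.92
```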
Throwing a die twice – sample mean
• Suppose we want to estimate the population
mean $\mu$ from the mean $\bar{X}$ of a sample
of size n = 2.
• What is the distribution of $\bar{X}$?
All 36 equally likely samples of size n = 2 and their means:

Sample  Throws  Mean  |  Sample  Throws  Mean  |  Sample  Throws  Mean
  1      1,1    1.0   |    13     3,1    2.0   |    25     5,1    3.0
  2      1,2    1.5   |    14     3,2    2.5   |    26     5,2    3.5
  3      1,3    2.0   |    15     3,3    3.0   |    27     5,3    4.0
  4      1,4    2.5   |    16     3,4    3.5   |    28     5,4    4.5
  5      1,5    3.0   |    17     3,5    4.0   |    29     5,5    5.0
  6      1,6    3.5   |    18     3,6    4.5   |    30     5,6    5.5
  7      2,1    1.5   |    19     4,1    2.5   |    31     6,1    3.5
  8      2,2    2.0   |    20     4,2    3.0   |    32     6,2    4.0
  9      2,3    2.5   |    21     4,3    3.5   |    33     6,3    4.5
 10      2,4    3.0   |    22     4,4    4.0   |    34     6,4    5.0
 11      2,5    3.5   |    23     4,5    4.5   |    35     6,5    5.5
 12      2,6    4.0   |    24     4,6    5.0   |    36     6,6    6.0
The distribution of $\bar{X}$ when n = 2

[Figure: bar chart of the sampling distribution of $\bar{x}$ for n = 2; horizontal axis $\bar{x}$ from 1.0 to 6.0 in steps of 0.5, vertical axis probability from 1/36 to 6/36]

\[ E(\bar{X}) = 1.0(1/36) + 1.5(2/36) + \cdots = 3.5 \]
\[ V(\bar{X}) = (1.0-3.5)^2(1/36) + (1.5-3.5)^2(2/36) + \cdots = 1.46 \]

Note: $\mu_{\bar{x}} = \mu_x$ and $\sigma^2_{\bar{x}} = \sigma^2_x / 2$.
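The sampling distribution of the mean of two throws can be checked by enumerating all 36 ordered samples. The following sketch uses exact fractions; the names `dist`, `mean` and `var` are chosen here for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Count how often each sample mean occurs among the 36 ordered pairs.
counts = Counter(Fraction(a + b, 2) for a, b in product(faces, repeat=2))

# Sampling distribution of the mean: value -> probability.
dist = {m: Fraction(c, 36) for m, c in sorted(counts.items())}

mean = sum(m * p for m, p in dist.items())
var = sum((m - mean) ** 2 * p for m, p in dist.items())

for m, p in dist.items():
    print(float(m), p)
print(float(mean), round(float(var), 2))  # 3.5 1.46
```

The variance comes out as 35/24, which is half of the single-throw variance 35/12, in line with the note that the variance of the mean equals the population variance divided by 2.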
Sampling Distribution of the Mean

n = 5:   $\mu_{\bar{x}} = 3.5$,  $\sigma^2_{\bar{x}} = 0.5833 \; (= \sigma^2_x / 5)$
n = 10:  $\mu_{\bar{x}} = 3.5$,  $\sigma^2_{\bar{x}} = 0.2917 \; (= \sigma^2_x / 10)$
n = 25:  $\mu_{\bar{x}} = 3.5$,  $\sigma^2_{\bar{x}} = 0.1167 \; (= \sigma^2_x / 25)$

[Figure: sampling distributions of $\bar{x}$ for n = 5, n = 10 and n = 25]

Notice that $\sigma^2_{\bar{x}}$ is smaller than $\sigma^2_x$.
The larger the sample size, the smaller $\sigma^2_{\bar{x}}$.
Therefore, $\bar{x}$ tends to fall closer to $\mu$ as the sample
size increases.
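The variances quoted for n = 5, 10 and 25 follow from dividing the single-throw variance of a fair die by n. A minimal sketch that reproduces them:

```python
from fractions import Fraction

faces = range(1, 7)

# Population mean and variance of one throw of a fair die.
mu = Fraction(sum(faces), 6)
sigma2 = sum((x - mu) ** 2 for x in faces) / 6  # 35/12

# Variance of the sample mean is sigma^2 / n.
var_of_mean = {n: sigma2 / n for n in (5, 10, 25)}

for n, v in var_of_mean.items():
    print(n, round(float(v), 4))
# 5 0.5833
# 10 0.2917
# 25 0.1167
```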
SAMPLING FROM THE NORMAL
DISTRIBUTION
Properties of the Sample Mean and Sample
Variance
• Let X1, X2,…,Xn be a r.s. of size n from a
$N(\mu, \sigma^2)$ distribution. Then,
a) $\bar{X}$ and $S^2$ are independent r.v.s.
b) $\bar{X} \sim N(\mu, \sigma^2/n)$
c) $\dfrac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$
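Properties (b) and (c) can be checked informally by simulation. The sketch below is only an illustration, not a proof: the parameter values, seed and number of replications are arbitrary assumptions. It relies on the facts that a chi-square variable with n − 1 degrees of freedom has mean n − 1 and variance 2(n − 1).

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical settings, chosen only for illustration.
mu, sigma, n, reps = 10.0, 2.0, 5, 10000

means = []
scaled_vars = []
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    means.append(statistics.fmean(sample))
    # (n - 1) S^2 / sigma^2 should behave like chi-square with n - 1 df.
    scaled_vars.append((n - 1) * statistics.variance(sample) / sigma ** 2)

mean_of_means = statistics.fmean(means)       # expect about mu
var_of_means = statistics.variance(means)     # expect about sigma^2 / n
mean_scaled = statistics.fmean(scaled_vars)   # expect about n - 1
var_scaled = statistics.variance(scaled_vars) # expect about 2 (n - 1)

print(mean_of_means, var_of_means, mean_scaled, var_scaled)
```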
• Let X1, X2,…,Xn be a r.s. of size n from a
$N(\mu, \sigma^2)$ distribution. Then,
\[ \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1) \]
• Most of the time $\sigma$ is unknown, so we replace it with the
sample standard deviation S and use
\[ \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}, \]
Student's t distribution with n − 1 degrees of freedom.
• In statistical inference, Student's t distribution
is very important.
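When σ is unknown, the standardized mean uses S in place of σ. A minimal sketch of that computation follows; the function name `t_statistic`, the data and the hypothesized mean `mu0` are assumptions made for illustration.

```python
import math
import statistics


def t_statistic(sample, mu0):
    """Return ((xbar - mu0) / (s / sqrt(n)), degrees of freedom n - 1)."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation, n - 1 divisor
    return (xbar - mu0) / (s / math.sqrt(n)), n - 1


# Hypothetical data and hypothesized mean.
t_value, df = t_statistic([1.0, 2.0, 3.0, 4.0, 5.0], mu0=2.0)
print(t_value, df)
```

For a normal population with true mean `mu0`, the returned value would be a draw from a t distribution with `df` degrees of freedom.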
• Let X1, X2,…,Xn be a r.s. of size n from a
$N(\mu_X, \sigma_X^2)$ distribution and let Y1,Y2,…,Ym
be a r.s. of size m from an independent
$N(\mu_Y, \sigma_Y^2)$ distribution.
• If we are interested in comparing the
variability of the populations, one
quantity of interest would be the ratio
$\sigma_X^2 / \sigma_Y^2$, which is estimated by $S_X^2 / S_Y^2$.
• The F distribution allows us to compare these
quantities by giving the distribution of
\[ \frac{S_X^2 / S_Y^2}{\sigma_X^2 / \sigma_Y^2} = \frac{S_X^2 / \sigma_X^2}{S_Y^2 / \sigma_Y^2} \sim F_{n-1,\,m-1} \]
• If $X \sim F_{p,q}$, then $1/X \sim F_{q,p}$.
• If $X \sim t_q$, then $X^2 \sim F_{1,q}$.
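The ratio of sample variances is straightforward to compute. The sketch below uses hypothetical data and a helper named `variance_ratio` (an assumption, not part of the slides); it also shows the reciprocal relation, since swapping the samples inverts the ratio and swaps the degrees of freedom.

```python
import statistics


def variance_ratio(xs, ys):
    """Return S_X^2 / S_Y^2 and its degrees of freedom (n - 1, m - 1)."""
    ratio = statistics.variance(xs) / statistics.variance(ys)
    return ratio, (len(xs) - 1, len(ys) - 1)


# Hypothetical samples of sizes n = 5 and m = 4.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0]

f_xy, df_xy = variance_ratio(x, y)
f_yx, df_yx = variance_ratio(y, x)

print(f_xy, df_xy)
print(f_yx, df_yx)
```

If the two population variances are equal, `f_xy` would be a draw from an F distribution with `df_xy` degrees of freedom.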
CENTRAL LIMIT THEOREM
If a random sample is drawn from any population with finite variance, the sampling
distribution of the sample mean is approximately normal for a
sufficiently large sample size. The larger the sample size, the
more closely the sampling distribution of $\bar{X}$ will resemble a
normal distribution.

[Figure: a random sample (X1, X2, X3, …, Xn) is drawn from the population distribution of the random variable X; as n → ∞, the distribution of the sample mean $\bar{X}$ approaches a normal distribution]
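A small simulation can illustrate the theorem with the die example used earlier. This is a sketch under arbitrary assumptions (sample size 30, 5000 replications, a fixed seed): if the sample mean is approximately normal, about 95% of the standardized means should fall within ±1.96.

```python
import random
from math import sqrt

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical simulation settings.
n, reps = 30, 5000

# Mean and standard deviation of one throw of a fair die.
mu = 3.5
sigma = sqrt(35 / 12)
se = sigma / sqrt(n)  # standard deviation of the sample mean

inside = 0
for _ in range(reps):
    xbar = sum(rng.randint(1, 6) for _ in range(n)) / n
    z = (xbar - mu) / se
    if abs(z) <= 1.96:
        inside += 1

fraction_inside = inside / reps
print(fraction_inside)  # should be close to 0.95
```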
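Sampling Distribution of the Sample
Mean
\[ \mu_{\bar{X}} = \mu, \qquad \sigma^2_{\bar{X}} = \frac{\sigma^2}{n} \quad \text{or} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \]
If X is normal, $\bar{X}$ is normal:
\[ \bar{X} \sim N(\mu, \sigma^2/n) \;\Rightarrow\; Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1) \]
If X is non-normal, $\bar{X}$ is approximately normally
distributed for sample size greater than or
equal to 30.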
EXAMPLE 1
• The amount of soda pop in each bottle is
normally distributed with a mean of 32.2 ounces
and a standard deviation of 0.3 ounces.
– Find the probability that a bottle bought by a customer
will contain more than 32 ounces.
– Solution
• The random variable X is the
amount of soda in a bottle.
\[ P(X > 32) = P\left(\frac{X - \mu}{\sigma_x} > \frac{32 - 32.2}{0.3}\right) = P(Z > -0.67) = 0.7486 \]
[Figure: normal curve centered at $\mu$ = 32.2 with the area to the right of x = 32 shaded, equal to 0.7486]
EXAMPLE 1 (contd.)
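• Find the probability that a carton of four bottles will have
a mean of more than 32 ounces of soda per bottle.
• Solution
– Define the random variable as the mean amount of soda per
bottle.
\[ P(\bar{X} > 32) = P\left(\frac{\bar{X} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} > \frac{32 - 32.2}{0.3/\sqrt{4}}\right) = P(Z > -1.33) = 0.9082 \]
[Figure: two normal curves centered at 32.2; the area to the right of x = 32 is 0.7486 for a single bottle ($\mu$ = 32.2) and the area to the right of $\bar{x}$ = 32 is 0.9082 for the carton mean ($\mu_{\bar{x}}$ = 32.2)]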
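Both probabilities in Example 1 can be reproduced with the standard library's `statistics.NormalDist`. This is a minimal sketch; it uses the unrounded z values, so the results differ slightly in the third decimal from the table-based figures 0.7486 and 0.9082, which round z to −0.67 and −1.33.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 32.2, 0.3

# Single bottle: X ~ N(32.2, 0.3^2).
p_bottle = 1 - NormalDist(mu, sigma).cdf(32)

# Carton of four: the mean has standard deviation sigma / sqrt(4).
p_carton = 1 - NormalDist(mu, sigma / sqrt(4)).cdf(32)

print(round(p_bottle, 4), round(p_carton, 4))
```

Averaging over four bottles halves the standard deviation, which is why the carton mean exceeds 32 ounces with higher probability than a single bottle does.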
Sampling Distribution of
a Proportion
• The parameter of interest for nominal data is
the proportion of times a particular outcome
(success) occurs.
• To estimate the population proportion p we
use the sample proportion
\[ \hat{p} = \frac{X}{n}, \]
where X is the number of successes and n is the sample size.
• Since X is binomial, probabilities about $\hat{p}$ can
be calculated from the binomial distribution.
• Yet, for inference about $\hat{p}$ we prefer to use
the normal approximation to the binomial
whenever this approximation is appropriate.
Approximate Sampling Distribution of a
Sample Proportion
• From the laws of expected value and variance, it can be
shown that $E(\hat{p}) = p$ and $V(\hat{p}) = p(1-p)/n$.
• If both np ≥ 5 and n(1-p) ≥ 5, then
\[ Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \]
is approximately standard normally distributed.
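The normal approximation for a sample proportion can be wrapped in a small helper. The function name `proportion_tail_prob` and the input values are hypothetical; the helper refuses to run when the np ≥ 5 and n(1−p) ≥ 5 conditions are not met.

```python
from math import sqrt
from statistics import NormalDist


def proportion_tail_prob(p, n, threshold):
    """Approximate P(p_hat > threshold) with the normal approximation."""
    if n * p < 5 or n * (1 - p) < 5:
        raise ValueError("normal approximation not appropriate: "
                         "need n*p >= 5 and n*(1-p) >= 5")
    se = sqrt(p * (1 - p) / n)  # standard deviation of p_hat
    z = (threshold - p) / se
    return 1 - NormalDist().cdf(z)


# Hypothetical inputs, chosen only for illustration.
prob = proportion_tail_prob(p=0.3, n=100, threshold=0.35)
print(round(prob, 4))
```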
EXAMPLE
– A state representative received 52% of the
votes in the last election.
– One year later the representative wanted to
study his popularity.
– If his popularity has not changed, what is the
probability that more than half of a sample of
300 voters would vote for him?
EXAMPLE (contd.)
Solution
• The number of respondents who prefer the representative is
binomial with n = 300 and p = .52. Thus, np = 300(.52) = 156 and
n(1-p) = 300(1-.52) = 144 (both greater than 5).
\[ P(\hat{p} > .50) = P\left(\frac{\hat{p} - p}{\sqrt{p(1-p)/n}} > \frac{.50 - .52}{\sqrt{(.52)(1-.52)/300}}\right) = P(Z > -0.69) = .7549 \]
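The election example can be recomputed directly, and the normal approximation compared with the exact binomial probability. This is a sketch with illustrative variable names; the approximation uses the unrounded z, so it lands near 0.756 rather than the table value .7549 obtained with z rounded to −0.69.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 300, 0.52

# Normal approximation to P(p_hat > 0.50).
se = sqrt(p * (1 - p) / n)
z = (0.50 - p) / se
approx = 1 - NormalDist().cdf(z)

# Exact binomial: p_hat > 0.50 means at least 151 of the 300 voters.
exact = sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
            for k in range(151, n + 1))

print(round(z, 2), round(approx, 4), round(exact, 4))
```

The exact value is somewhat lower than the approximation because the approximation above applies no continuity correction.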