
45-733: lecture 7 (chapter 6)
Sampling Distributions
William B. Vogt, Carnegie Mellon, 45-733
5/27/2016
Samples from populations

There is some population we are interested in:
– Families in the US
– Products coming off our assembly line
– Consumers in our product’s market segment
– Employees
Samples from populations

We are interested in some quantitative information (called variables) about these populations:
– Income of families in the US
– Defects in products coming off our assembly line
– Perception of consumers of our product
– Productivity of our employees
Samples from populations

All the information (accessible to statistics) about a quantity in a population is contained in its distribution function
– Real-world distribution functions are complicated things
– In real life, we usually know little or nothing about the distribution functions of the variables we are interested in
Samples from populations

Because distribution functions are complex, we only try to find out about certain aspects of them (parameters):
– Average income of families in the US
– Rate of defects coming off our production line
– % of customers who view our product favorably
– Average pieces/hour finished by a worker
Samples from populations
Of course, we do not begin by knowing even these quantities
One possibility is to measure the whole population
– Allows us to answer any question about the distribution or parameters, using the techniques of chapter 2
– However, this is almost always expensive and often infeasible
Samples from populations
Instead, we take a sample
Taking a sample:
– We select only a few of the members of the population
– We measure the variables of interest for those members we select
– Examples:
  – Phone survey
  – Take 1 out of each 10,000 units off our production line
Samples from populations

The whole of statistics is figuring out what we can learn about the population from a sample:
– What can we say about the distribution of a variable from the information in a sample?
– What can we say about the parameters we are interested in from our sample?
– How good is the information in our sample about the population?
Samples from populations

Example:
– We are interested in how favorably our product is viewed by customers
– We do a phone survey of our 5 good friends and ask them if they view our product favorably or unfavorably
– All 5 say favorably
– What can we conclude?
Samples from populations

Example:
– We are interested in how favorably our product is viewed by customers
– We do a phone survey of 500 people who have purchased our product before and ask them if they view our product favorably or unfavorably
– 466 say they view our product favorably
– What can we conclude?
Samples from populations

Example:
– We are interested in how favorably our product is viewed by customers
– We do a phone survey of 500 random adults and ask them if they view our product favorably or unfavorably
– 351 say they view our product favorably
– What can we conclude?
Samples and statistics
As a practical matter, we are usually interested in using our sample to say something about a parameter of the distribution we care about
To get at this parameter, we construct a variable called an estimator or statistic
Samples and statistics

Example:
– If we want to know the average income of families in the US, we draw a sample from a random phone survey of 1000 families
– We ask, among other things, for their family income
– To estimate E(I), we calculate the estimator or statistic called the sample mean:
\bar{I} = \frac{1}{1000} \sum_{i=1}^{1000} I_i
Samples and statistics

Example:
– But, what does the sample mean of income tell us about E(I)?
– Answering this question is the subject of the rest of the course, and of statistics in general
Random sampling



There are different ways to sample a population, different sampling schemes
The simplest sampling scheme is called “simple random sampling” or just “random sampling”
If there is a population of size N from which we are to draw a sample of size n, random sampling says that the probability of any one of the N members of the population being drawn is 1/N, and that the draws are independent
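As a sketch of what drawing a simple random sample looks like in code (the population of unit IDs, sample size, and seed below are invented for illustration, not from the lecture):

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw a simple random sample of size n without replacement:
    every member of the population is equally likely to be selected."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# An invented population of N = 10,000 unit IDs; draw a sample of n = 100
population = list(range(1, 10_001))
sample = simple_random_sample(population, 100, seed=0)
```

Note that `random.sample` draws without replacement; sampling with replacement (which makes the draws independent, as in the slide's definition) would use `rng.choices(population, k=n)` instead.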
Statistic or estimator
A statistic (or estimator) is any function of a sample
It is an algorithm which tells us what we would do given a sample
Example:
– Sample mean: \bar{I} = \frac{1}{1000} \sum_{i=1}^{1000} I_i
– Sample variance: \hat{\sigma}_I^2 = \frac{1}{999} \sum_{i=1}^{1000} \left( I_i - \bar{I} \right)^2
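These two estimators can be written as short functions. A minimal sketch; the toy income numbers are invented for illustration:

```python
def sample_mean(xs):
    """Estimator of the population mean: (1/n) * sum of the observations."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Estimator of the population variance, using the n-1 divisor."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# A tiny invented "sample" of family incomes
incomes = [40_000, 55_000, 62_000, 48_000, 75_000]
mean_hat = sample_mean(incomes)      # 56000.0
var_hat = sample_variance(incomes)
```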
Statistic as random variable
A statistic is a random variable!!
A statistic is a random variable!!
A statistic is a random variable!!
A statistic is a random variable!!
A statistic is a random variable!!
A statistic is a random variable!!
A statistic is a random variable!!
A statistic is a random variable!!
Statistic as random variable

A simple example
– Consider the Bernoulli random variable X with parameter p
– We are interested in p, the probability of a success
– To estimate p, we will calculate the sample mean of X:
\hat{p} = \frac{1}{n} \sum_{i=1}^{n} X_i
Statistic as random variable

A simple example
– First, with a sample size of n=1:
\hat{p} = \begin{cases} 1 & \text{w.p. } p \\ 0 & \text{w.p. } 1-p \end{cases}
Statistic as random variable

A simple example
– Next, with a sample size of n=2:
\hat{p} = \begin{cases} 1 & \text{w.p. } p^2 \\ 1/2 & \text{w.p. } 2p(1-p) \\ 0 & \text{w.p. } (1-p)^2 \end{cases}
Statistic as random variable

A simple example
– Next, with a sample size of n=3:
\hat{p} = \begin{cases} 1 & \text{w.p. } p^3 \\ 2/3 & \text{w.p. } 3p^2(1-p) \\ 1/3 & \text{w.p. } 3p(1-p)^2 \\ 0 & \text{w.p. } (1-p)^3 \end{cases}
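The n=1, n=2, and n=3 distributions all follow one pattern: the sample mean takes the value k/n with binomial probability. A sketch that enumerates this exactly (the choice of n=3 and p=0.5 below is illustrative):

```python
from math import comb

def phat_pmf(n, p):
    """Exact distribution of the sample proportion p-hat for a Bernoulli(p)
    sample of size n: P(p-hat = k/n) = C(n,k) * p^k * (1-p)^(n-k)."""
    return {k / n: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

dist = phat_pmf(3, 0.5)
# For n=3, p=0.5: values 0, 1/3, 2/3, 1 with probabilities 1/8, 3/8, 3/8, 1/8
```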
Statistic as random variable

The statistic is a random variable
– It has a distribution
  – Probability function or density
  – Cumulative distribution function
– It has an expectation
– It has a variance / standard deviation
Statistic as random variable

For the Bernoulli example
– Expectation, variance with n=1
E(\hat{p}) = 1 \cdot p + 0 \cdot (1-p) = p
V(\hat{p}) = E\left(\hat{p}^2\right) - \left[ E(\hat{p}) \right]^2
V(\hat{p}) = 1^2 \cdot p + 0^2 \cdot (1-p) - p^2 = p(1-p)
Statistic as random variable

For the Bernoulli example
– Expectation, variance with n=2
E(\hat{p}) = 1 \cdot p^2 + \tfrac{1}{2} \cdot 2p(1-p) + 0 \cdot (1-p)^2 = p
V(\hat{p}) = E\left(\hat{p}^2\right) - \left[ E(\hat{p}) \right]^2
V(\hat{p}) = 1^2 \cdot p^2 + \tfrac{1}{4} \cdot 2p(1-p) + 0^2 \cdot (1-p)^2 - p^2 = \tfrac{1}{2} p(1-p)
Statistic as random variable

For the Bernoulli example
– Expectation, variance with n=3
E(\hat{p}) = p
V(\hat{p}) = \tfrac{1}{3} p(1-p)
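The n=3 results can be checked by computing the mean and variance directly from the exact distribution of the sample proportion. A sketch (the choice of p=0.3 is arbitrary):

```python
from math import comb

def phat_moments(n, p):
    """Mean and variance of p-hat computed from its exact distribution."""
    pmf = {k / n: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}
    mean = sum(value * prob for value, prob in pmf.items())
    var = sum((value - mean) ** 2 * prob for value, prob in pmf.items())
    return mean, var

m, v = phat_moments(3, 0.3)
# m equals p = 0.3 and v equals p(1-p)/3 = 0.07, matching the slide
```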
Statistic as random variable

For the Bernoulli example
– Probability function, n=1
[Figure: probability function of \hat{p} for n=1 — mass 1-p at \hat{p}=0 and mass p at \hat{p}=1]
Statistic as random variable

For the Bernoulli example
– Probability function, n=2
[Figure: probability function of \hat{p} for n=2 — masses at \hat{p}=0, 1/2, and 1]
Statistic as random variable

For the Bernoulli example
– Probability function, n=3
[Figure: probability function of \hat{p} for n=3 — masses at \hat{p}=0, 1/3, 2/3, and 1]
Sample mean

As we have discussed before, the sample mean of a random variable X from a sample of size n is:
\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i
Sample mean
The sample mean is a random variable!!
The sample mean is made out of n random variables; therefore, it is a random variable:
\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i
Sample mean

Let’s suppose X is a random variable with mean \mu_X and standard deviation \sigma_X, and let’s consider the sample mean:
\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i
Sample mean

Since the sample mean is a random variable, we can ask about its expectation:
E(\bar{X}) = E\left( \frac{1}{n} \sum_{i=1}^{n} X_i \right) = \frac{1}{n} E\left( \sum_{i=1}^{n} X_i \right)
Sample mean

Since the sample mean is a random variable, we can ask about its expectation:
E(\bar{X}) = \frac{1}{n} \sum_{i=1}^{n} E(X_i) = \frac{1}{n} \sum_{i=1}^{n} \mu_X = \mu_X
Sample mean
The expectation of the sample mean is equal to the expectation of the underlying random variable
On average, the sample mean equals the mean of the underlying random variable
Sample mean

We can also ask about the variance of the sample mean:
V(\bar{X}) = V\left( \frac{1}{n} \sum_{i=1}^{n} X_i \right) = \frac{1}{n^2} V\left( \sum_{i=1}^{n} X_i \right)
= \frac{1}{n^2} \left( \sum_{i=1}^{n} V(X_i) + 2 \sum_{i<j} \mathrm{Cov}(X_i, X_j) \right)
Sample mean

If it is an independent, random sample then the covariances are all zero:
V(\bar{X}) = \frac{1}{n^2} \left( \sum_{i=1}^{n} V(X_i) + 2 \sum_{i<j} \mathrm{Cov}(X_i, X_j) \right) = \frac{1}{n^2} \sum_{i=1}^{n} V(X_i)
= \frac{1}{n^2} \sum_{i=1}^{n} \sigma_X^2 = \frac{1}{n} \sigma_X^2
Sample mean
The variance of the sample mean is less than the variance of the underlying random variable
The variance of the sample mean gets smaller as the sample size increases
The variance of the sample mean goes to zero as the sample size goes to infinity
Sample mean

Our two results:
E(\bar{X}) = \mu_X = E(X)
V(\bar{X}) = \frac{1}{n} \sigma_X^2
\sigma_{\bar{X}} = \frac{1}{\sqrt{n}} \sigma_X \quad \text{(called the standard error)}
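Both results can be checked by simulation: draw many samples, compute each sample mean, and look at the average and spread of those means. A sketch (the normal population, n=25, replication count, and seed are invented for illustration):

```python
import random
import statistics

rng = random.Random(42)
n, mu, sigma = 25, 10.0, 4.0

# Draw many samples of size n and record each sample mean
means = [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
         for _ in range(20_000)]

avg_of_means = statistics.fmean(means)  # close to mu = 10
sd_of_means = statistics.stdev(means)   # close to the standard error sigma/sqrt(n) = 0.8
```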
Sample mean

Say that:
– On average, the sample mean is equal to the mean of the underlying random variable, regardless of sample size
– As the sample size grows, the variance of the sample mean shrinks, eventually approaching zero
Sample mean
What would happen if the sample size “got to” infinity?
Then the sample mean would no longer be a random variable; it would literally equal the population mean, E(X):
E(\bar{X}) = \mu_X = E(X)
V(\bar{X}) = \frac{1}{n} \sigma_X^2 \to 0
Sample mean

Suppose X ~ N(1,1).
[Figure: density f(x) of \bar{X} for n=1 and n=100; the n=100 density is far taller and narrower, concentrated around 1]
Sample mean

Suppose X ~ N(1,1).
[Figure: density f(x) of \bar{X} for n=1, n=100, and n=1000; each increase in n further concentrates the density around 1]
Sample mean

Finite sample correction
– What has gone before has assumed either that you sample with replacement or that the population you are sampling from is very large (infinite)
– Just as we needed to use the hypergeometric rather than the binomial when sampling from a small population without replacement, so here:
Sample mean

Finite sample correction
– For a population of size N, sampled without replacement by a sample of size n:
E(\bar{X}) = \mu_X = E(X)
V(\bar{X}) = \frac{1}{n} \sigma_X^2 \cdot \frac{N-n}{N-1}
\sigma_{\bar{X}} = \frac{1}{\sqrt{n}} \sigma_X \sqrt{\frac{N-n}{N-1}}
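The corrected standard error can be sketched as a small function (the numeric values below are invented; the correction factor matches the formula above):

```python
from math import sqrt

def se_of_mean(sigma, n, N=None):
    """Standard error of the sample mean; applies the finite-population
    correction sqrt((N - n) / (N - 1)) when a population size N is given."""
    se = sigma / sqrt(n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))
    return se

se_large_pop = se_of_mean(10.0, 100)         # 1.0 (no correction)
se_small_pop = se_of_mean(10.0, 100, N=500)  # smaller: the sample is 20% of the population
```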
Sample mean

Normal variables and \bar{X}
– If X is normal, then so is \bar{X}
– If X is normal, then:
Z = \frac{\bar{X} - \mu_X}{\sigma_{\bar{X}}} \sim N(0,1)
Sample mean

Central limit theorem and \bar{X}:
– As long as \bar{X} comes from an independent random sample:
Z = \frac{\bar{X} - \mu_X}{\sigma_{\bar{X}}} \xrightarrow{d} N(0,1)
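The CLT can be illustrated by standardizing sample means of a decidedly non-normal variable, here Uniform(0,1). A sketch; the sample size, replication count, and seed are invented:

```python
import random
import statistics
from math import sqrt

rng = random.Random(7)
n = 200
mu, sigma = 0.5, sqrt(1 / 12)  # mean and sd of a Uniform(0,1) variable

# Standardized sample means: Z = (X-bar - mu) / (sigma / sqrt(n))
zs = [(statistics.fmean(rng.random() for _ in range(n)) - mu) / (sigma / sqrt(n))
      for _ in range(10_000)]

# By the CLT these Z values should behave like draws from N(0,1):
frac_within_1 = sum(abs(z) < 1 for z in zs) / len(zs)  # near 0.683
```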
Sample proportion
Consider W, a Bernoulli random variable, and an independent random sample of size n
Observe that X = W_1 + W_2 + \dots + W_n is distributed Binomial (and therefore approximately normal)
Sample proportion
The sample mean (i.e., the sample proportion) is:
– Just a binomial divided by n
– Also approximately normal
\bar{W} = \frac{1}{n} \sum_{i=1}^{n} W_i = \frac{1}{n} X
Sample proportion

To emphasize that we are estimating the p parameter of the Bernoulli, we may write:
\hat{p} = \bar{W} = \frac{1}{n} \sum_{i=1}^{n} W_i = \frac{1}{n} X
Sample proportion

Just as before, the sample mean has the same expectation as the underlying Bernoulli random variable:
E(\hat{p}) = E(\bar{W}) = p
Sample proportion

Just as before, the sample mean has the variance of the underlying Bernoulli random variable over n:
V(\hat{p}) = V(\bar{W}) = \frac{p(1-p)}{n}
\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}
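Applied to the earlier survey-style example (466 favorable out of 500), a sketch that plugs the estimated proportion in for p:

```python
from math import sqrt

def proportion_se(p, n):
    """Standard error of the sample proportion: sqrt(p(1-p)/n)."""
    return sqrt(p * (1 - p) / n)

p_hat = 466 / 500                   # 0.932
se_hat = proportion_se(p_hat, 500)  # about 0.011
```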
Sample proportion

Just as before, if there is a finite population sampled without replacement:
V(\hat{p}) = V(\bar{W}) = \frac{p(1-p)}{n} \cdot \frac{N-n}{N-1}
\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n} \cdot \frac{N-n}{N-1}}
Sample variance

As we have discussed before, the sample variance and sample standard deviation are given by:
\hat{\sigma}_X^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( X_i - \bar{X} \right)^2
\hat{\sigma}_X = \sqrt{\hat{\sigma}_X^2}
Sample variance

Sometimes these are written:
2
1
X i  X 
s 

n  1 i 1
n
2
X
s X  s X2
William B. Vogt, Carnegie Mellon, 45-733
54
5/27/2016
Sample variance

It turns out that:
  
Es
2
X
William B. Vogt, Carnegie Mellon, 45-733
2
X
55
5/27/2016
Sample variance

It turns out that:
 
V s
2
X
William B. Vogt, Carnegie Mellon, 45-733
2

n 1
4
X
56
5/27/2016
Sample variance

It turns out that, if X is distributed normal:
n  1s

2
X
William B. Vogt, Carnegie Mellon, 45-733
2
X
~
2
 n 1
57
5/27/2016
Sample variance

It turns out that (by the CLT), if X is from
an independent random sample:
n  1s

2
X
William B. Vogt, Carnegie Mellon, 45-733
2
X

 
2
 n 1
58
5/27/2016
Sample variance

Discuss the Chi-Squared distribution
Sample variance

Example (problem 46, page 251)
– A drug company manufactures pills
– These pills have normally distributed weight
– The drug company wants the variance of weight to be smaller than 1.5 milligrams squared
– The drug company collects a sample of size 20
– The sample variance is 2.05
– How likely is it that a sample variance this high or higher would be found if the true variance is 1.5?
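A sketch of the calculation: under the hypothesized variance, the statistic (n-1)s^2/sigma^2 = 19 × 2.05 / 1.5 ≈ 25.97 is chi-squared with 19 degrees of freedom, and its upper-tail probability can be estimated by simulating sums of 19 squared standard normals (the seed and replication count are arbitrary choices):

```python
import random

# Problem data: n = 20 pills, sample variance 2.05, hypothesized variance 1.5
n, s2, sigma2 = 20, 2.05, 1.5
stat = (n - 1) * s2 / sigma2  # about 25.97, chi-squared with 19 df

# Estimate P(chi2_19 >= stat) by Monte Carlo: a chi-squared variable with
# 19 degrees of freedom is a sum of 19 squared independent N(0,1) draws
rng = random.Random(1)
sims = 100_000
exceed = sum(
    sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n - 1)) >= stat
    for _ in range(sims)
)
p_value = exceed / sims  # roughly 0.13
```

A tail probability around 0.13 means a sample variance this large would arise fairly often even if the true variance really were 1.5, so this sample alone is weak evidence that the variance exceeds the target.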