45-733: lecture 7 (chapter 6)
Sampling Distributions
William B. Vogt, Carnegie Mellon, 45-733
1
5/27/2016
Samples from populations
There is some population we are interested in:
– Families in the US
– Products coming off our assembly line
– Consumers in our product’s market segment
– Employees
Samples from populations
We are interested in some quantitative
information (called variables) about these
populations:
– Income of families in the US
– Defects in products coming off our assembly
line
– Perception of consumers of our product
– Productivity of our employees
Samples from populations
All the information (accessible to
statistics) about a quantity in a population
is contained in its distribution function
– Real-world distribution functions are
complicated things
– In real life, we usually know little or nothing
about the distribution functions of the
variables we are interested in
Samples from populations
Because distribution functions are
complex, we only try to find out about
certain aspects of them (parameters):
– Average income of families in the US
– Rate of defects coming off our production line
– % of customers who view our product
favorably
– Average pieces/hour finished by a worker
Samples from populations
Of course, we do not begin by knowing
even these quantities
One possibility is to measure the whole
population
– Allows us to answer any question about the
distribution or parameters, using the
techniques of chapter 2
– However, this is almost always expensive and
often infeasible
Samples from populations
Instead, we take a sample
Taking a sample
– We select only a few of the members of the
population
– We measure the variables of interest for those
members we select
– Examples
Phone survey
Take 1 out of every 10,000 units off our production line
Samples from populations
The whole of statistics is figuring out what
we can learn about the population from a
sample:
– What can we say about the distribution of a
variable from the information in a sample?
– What can we say about the parameters we are
interested in from our sample?
– How good is the information in our sample
about the population?
Samples from populations
Example:
– We are interested in how favorably our
product is viewed by customers
– We do a phone survey of our 5 good friends
and ask them if they view our product
favorably or unfavorably
All 5 say favorably
What can we conclude?
Samples from populations
Example:
– We are interested in how favorably our
product is viewed by customers
– We do a phone survey of 500 people who have
purchased our product before and ask them if
they view our product favorably or
unfavorably
466 say they view our product favorably
What can we conclude?
Samples from populations
Example:
– We are interested in how favorably our
product is viewed by customers
– We do a phone survey of 500 random adults
and ask them if they view our product
favorably or unfavorably
351 say they view our product favorably
What can we conclude?
Samples and statistics
As a practical matter, we are usually
interested in using our sample to say
something about a parameter of the
distribution we care about
To get at this parameter, we construct a
variable called an estimator or statistic
Samples and statistics
Example:
– If we want to know the average income of families in
the US, we draw a sample from a random phone
survey of 1000 families
– We ask, among other things, for their family income
– To estimate E(I), we calculate the estimator or statistic
called the sample mean:
\bar{I} = \frac{1}{1000} \sum_{i=1}^{1000} I_i
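The calculation above can be sketched in code; the incomes below are simulated stand-ins for survey responses, not real data:

```python
import random

random.seed(1)  # reproducible illustration

# Hypothetical phone survey: 1000 simulated family incomes
incomes = [random.uniform(20_000, 150_000) for _ in range(1000)]

# The sample mean, I-bar = (1/1000) * sum of I_i, is our estimate of E(I)
sample_mean = sum(incomes) / len(incomes)
print(round(sample_mean, 2))
```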
Samples and statistics
Example:
– But, what does the sample mean of income tell
us about E(I)?
– Answering this question is the subject of the
rest of the course, and of statistics in general
Random sampling
There are different ways to sample a population,
different sampling schemes
The simplest sampling scheme is called “simple
random sampling” or just “random sampling”
If there is a population of size N from which we
are to draw a sample of size n, random sampling
just says that the probability of any one of the N
members of the population being drawn is just
1/N, and that the draws are independent.
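A minimal sketch of simple random sampling, assuming a hypothetical population of size N = 100. Drawing with replacement makes the two conditions hold exactly: probability 1/N per member on every draw, and independent draws:

```python
import random

random.seed(0)  # reproducible illustration

N, n = 100, 10
population = list(range(1, N + 1))  # hypothetical population of size N

# random.choices samples with replacement: each member has probability
# 1/N on every draw, and the draws are independent
sample = random.choices(population, k=n)
print(sample)
```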
Statistic or estimator
A statistic (or estimator) is any function of
a sample
It is an algorithm which tells us what we
would do given a sample
Example:
– Sample mean: \bar{I} = \frac{1}{1000} \sum_{i=1}^{1000} I_i
– Sample variance: \hat{\sigma}_I^2 = \frac{1}{999} \sum_{i=1}^{1000} (I_i - \bar{I})^2
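As the slide says, a statistic is just an algorithm applied to a sample. A sketch of both statistics (the data list is an arbitrary illustration):

```python
def sample_mean(xs):
    """The sample mean: (1/n) * sum of the observations."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """The sample variance: divides by n - 1 (the slide's 1/999 for n = 1000)."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # arbitrary sample
print(sample_mean(data))      # 5.0
print(sample_variance(data))  # 32/7, about 4.571
```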
Statistic as random variable
A statistic is a random variable!!
A statistic is a random variable!!
A statistic is a random variable!!
A statistic is a random variable!!
A statistic is a random variable!!
A statistic is a random variable!!
A statistic is a random variable!!
A statistic is a random variable!!
Statistic as random variable
A simple example
– Consider the Bernoulli random variable X
with parameter p
– We are interested in p, the probability of a
success
– To estimate p, we will calculate the sample
mean of X:
\hat{p} = \frac{1}{n} \sum_{i=1}^{n} X_i
Statistic as random variable
A simple example
– First, with a sample size of n = 1:
\hat{p} = \begin{cases} 1 & \text{w.p. } p \\ 0 & \text{w.p. } 1-p \end{cases}
Statistic as random variable
A simple example
– Next, with a sample size of n = 2:
\hat{p} = \begin{cases} 1 & \text{w.p. } p^2 \\ 1/2 & \text{w.p. } 2p(1-p) \\ 0 & \text{w.p. } (1-p)^2 \end{cases}
Statistic as random variable
A simple example
– Next, with a sample size of n = 3:
\hat{p} = \begin{cases} 1 & \text{w.p. } p^3 \\ 2/3 & \text{w.p. } 3p^2(1-p) \\ 1/3 & \text{w.p. } 3p(1-p)^2 \\ 0 & \text{w.p. } (1-p)^3 \end{cases}
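The distributions above can be generated by brute-force enumeration of all 2^n possible samples; a sketch (p = 0.5 is an arbitrary illustration):

```python
from itertools import product
from collections import defaultdict

def phat_distribution(n, p):
    """Exact distribution of p-hat = (1/n) * sum(X_i) for Bernoulli(p) X_i."""
    dist = defaultdict(float)
    for outcome in product([0, 1], repeat=n):   # all 2^n possible samples
        prob = 1.0
        for x in outcome:
            prob *= p if x == 1 else 1 - p      # independent draws
        dist[sum(outcome) / n] += prob
    return dict(dist)

# n = 3 matches the slide: P(1) = p^3, P(2/3) = 3p^2(1-p),
# P(1/3) = 3p(1-p)^2, P(0) = (1-p)^3
for value, prob in sorted(phat_distribution(3, 0.5).items()):
    print(value, prob)
```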
Statistic as random variable
The statistic is a random variable
– It has a distribution
Probability function or density
Cumulative distribution function
– It has an expectation
– It has a variance / standard deviation
Statistic as random variable
For the Bernoulli example
– Expectation, variance with n=1
E(\hat{p}) = 1 \cdot p + 0 \cdot (1-p) = p
V(\hat{p}) = E(\hat{p}^2) - [E(\hat{p})]^2
V(\hat{p}) = 1^2 \cdot p + 0^2 \cdot (1-p) - p^2 = p(1-p)
Statistic as random variable
For the Bernoulli example
– Expectation, variance with n=2
E(\hat{p}) = 1 \cdot p^2 + \tfrac{1}{2} \cdot 2p(1-p) + 0 \cdot (1-p)^2 = p
V(\hat{p}) = E(\hat{p}^2) - [E(\hat{p})]^2
V(\hat{p}) = 1^2 \cdot p^2 + \tfrac{1}{4} \cdot 2p(1-p) + 0^2 \cdot (1-p)^2 - p^2 = \tfrac{1}{2} p(1-p)
Statistic as random variable
For the Bernoulli example
– Expectation, variance with n=3
E(\hat{p}) = p
V(\hat{p}) = \tfrac{1}{3} p(1-p)
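These moments can be checked by brute-force enumeration over all 2^n Bernoulli samples; a sketch confirming E(p̂) = p and V(p̂) = p(1−p)/n (p = 0.3 and n = 3 are arbitrary choices):

```python
from itertools import product

def phat_moments(n, p):
    """Exact mean and variance of p-hat by summing over all 2^n samples."""
    outcomes = list(product([0, 1], repeat=n))

    def prob_of(outcome):
        q = 1.0
        for x in outcome:
            q *= p if x == 1 else 1 - p
        return q

    mean = sum(prob_of(o) * sum(o) / n for o in outcomes)
    var = sum(prob_of(o) * (sum(o) / n - mean) ** 2 for o in outcomes)
    return mean, var

p, n = 0.3, 3
m, v = phat_moments(n, p)
print(m)                   # equals p
print(v, p * (1 - p) / n)  # both equal p(1-p)/n
```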
Statistic as random variable
For the Bernoulli example
– Probability function, n = 1
[Figure: probability function of \hat{p} for n = 1; mass 1-p at \hat{p} = 0 and mass p at \hat{p} = 1]
Statistic as random variable
For the Bernoulli example
– Probability function, n = 2
[Figure: probability function of \hat{p} for n = 2, with mass at \hat{p} = 0, 1/2, and 1]
Statistic as random variable
For the Bernoulli example
– Probability function, n = 3
[Figure: probability function of \hat{p} for n = 3, with mass at \hat{p} = 0, 1/3, 2/3, and 1]
Sample mean
As we have discussed before, the sample
mean of a random variable X from a
sample of size n is:
\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i
Sample mean
The sample mean is a random variable!!
The sample mean is a function of n random variables; therefore, it is itself a random variable:
\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i
Sample mean
Let’s suppose X is a random variable with mean \mu_X and standard deviation \sigma_X, and let’s consider the sample mean:
\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i
Sample mean
Since the sample mean is a random
variable, we can ask about its expectation:
E(\bar{X}) = E\!\left(\frac{1}{n} \sum_{i=1}^{n} X_i\right) = \frac{1}{n} \sum_{i=1}^{n} E(X_i)
Sample mean
Since the sample mean is a random
variable, we can ask about its expectation:
E(\bar{X}) = E\!\left(\frac{1}{n} \sum_{i=1}^{n} X_i\right) = \frac{1}{n} \sum_{i=1}^{n} \mu_X = \mu_X
Sample mean
The expectation of the sample mean is equal to the expectation of the underlying random variable
On average, the sample mean equals the mean of the underlying random variable
Sample mean
We can also ask about the variance of the
sample mean:
V(\bar{X}) = V\!\left(\frac{1}{n} \sum_{i=1}^{n} X_i\right)
= \frac{1}{n^2} V\!\left(\sum_{i=1}^{n} X_i\right)
= \frac{1}{n^2} \left[\sum_{i=1}^{n} V(X_i) + \sum_{i \ne j} \mathrm{Cov}(X_i, X_j)\right]
Sample mean
If it is an independent random sample, then the covariances are all zero:
V(\bar{X}) = \frac{1}{n^2} \left[\sum_{i=1}^{n} V(X_i) + \sum_{i \ne j} \mathrm{Cov}(X_i, X_j)\right]
= \frac{1}{n^2} \sum_{i=1}^{n} V(X_i)
= \frac{1}{n^2} \sum_{i=1}^{n} \sigma_X^2 = \frac{1}{n} \sigma_X^2
Sample mean
The variance of the sample mean is less
than the variance of the underlying random
variable
The variance of the sample mean gets
smaller as the sample size increases
The variance of the sample mean goes to
zero as the sample size goes to infinity
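A simulation sketch of these three facts: the sample mean stays centered at the population mean while its variance shrinks like \sigma_X^2/n (the values of \mu and \sigma below are illustrative choices):

```python
import random
import statistics

random.seed(2)
mu, sigma = 1.0, 1.0  # illustrative population mean and sd

def simulate_xbar_variance(n, reps=20_000):
    """Variance of the sample mean across many simulated samples of size n."""
    xbars = [statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
             for _ in range(reps)]
    return statistics.variance(xbars)

for n in (1, 4, 16):
    # should track sigma^2 / n: roughly 1.0, 0.25, 0.0625
    print(n, round(simulate_xbar_variance(n), 3))
```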
Sample mean
Our two results:
E(\bar{X}) = \mu_X = E(X)
V(\bar{X}) = \frac{1}{n} \sigma_X^2
\sigma_{\bar{X}} = \frac{1}{\sqrt{n}} \sigma_X \quad \text{(called the standard error)}
Sample mean
We can say that:
– On average, the sample mean is equal to the
mean of the underlying random variable,
regardless of sample size
– As the sample size grows, the variance of the
sample mean shrinks, eventually approaching
zero
Sample mean
What would happen if the sample size “got
to” infinity?
Then the sample mean would no longer be
a random variable, it would literally equal
the population mean, E(X):
E(\bar{X}) = \mu_X = E(X)
V(\bar{X}) = \frac{1}{n} \sigma_X^2 \to 0
Sample mean
Suppose X~N(1,1).
[Figure: densities f(x) of \bar{X} for n = 1 and n = 100 when X ~ N(1,1); the n = 100 density is much taller and tighter around 1]
Sample mean
Suppose X~N(1,1).
[Figure: densities f(x) of \bar{X} for n = 1, n = 100, and n = 1000 when X ~ N(1,1); larger n gives an ever tighter density around 1]
Sample mean
Finite sample correction
– What has gone before has assumed either that
you sample with replacement or that the
population you are sampling from is very
large (infinite)
– Just as we needed the hypergeometric rather than the binomial when sampling from a small population without replacement, so here:
Sample mean
Finite sample correction
– For a population of size N, sampled without
replacement by a sample of size n:
E(\bar{X}) = \mu_X = E(X)
V(\bar{X}) = \frac{\sigma_X^2}{n} \cdot \frac{N-n}{N-1}
\sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}
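A sketch of the corrected standard error; the numbers plugged in below are illustrative, not from the slides:

```python
import math

def standard_error(sigma, n, N=None):
    """sigma / sqrt(n), times the finite-population correction when N is given."""
    se = sigma / math.sqrt(n)
    if N is not None:  # sampling without replacement from a population of size N
        se *= math.sqrt((N - n) / (N - 1))
    return se

print(standard_error(10.0, 25))         # 2.0 (infinite population)
print(standard_error(10.0, 25, N=100))  # 2.0 * sqrt(75/99), about 1.74
```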
Sample mean
Normal variables and \bar{X}
– If X is normal, then so is \bar{X}
– If X is normal, then:
Z = \frac{\bar{X} - \mu_X}{\sigma_{\bar{X}}} \sim N(0,1)
Sample mean
Central limit theorem and \bar{X}:
– As long as X comes from an independent random sample:
Z = \frac{\bar{X} - \mu_X}{\sigma_{\bar{X}}} \approx N(0,1) \text{ for large } n
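A simulation sketch of the CLT for a deliberately non-normal X (Uniform(0,1), which has mean 1/2 and variance 1/12): the standardized sample means behave approximately like N(0,1):

```python
import math
import random
import statistics

random.seed(3)
mu, sigma = 0.5, math.sqrt(1 / 12)  # mean and sd of Uniform(0, 1)
n, reps = 30, 10_000

zs = []
for _ in range(reps):
    xbar = statistics.fmean(random.random() for _ in range(n))
    zs.append((xbar - mu) / (sigma / math.sqrt(n)))  # standardize

print(round(statistics.fmean(zs), 2))  # near 0
print(round(statistics.stdev(zs), 2))  # near 1
# roughly 95% fall inside (-1.96, 1.96), as for a standard normal
print(sum(abs(z) < 1.96 for z in zs) / reps)
```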
Sample proportion
Consider a Bernoulli random variable W and an independent random sample of size n
Observe that X = W_1 + W_2 + … + W_n is distributed binomial (and therefore approximately normal)
Sample proportion
The sample mean (i.e., the sample proportion) is just a binomial divided by n, and is also approximately normal:
\bar{W} = \frac{1}{n} \sum_{i=1}^{n} W_i = \frac{1}{n} X
Sample proportion
To emphasize that we are estimating the p
parameter of the Bernoulli, we may write:
\hat{p} = \bar{W} = \frac{1}{n} \sum_{i=1}^{n} W_i = \frac{1}{n} X
Sample proportion
Just as before, the sample mean has the
same expectation as the underlying
Bernoulli random variable:
E(\hat{p}) = E(\bar{W}) = p
Sample proportion
Just as before, the sample mean has the
variance of the underlying Bernoulli
random variable over n:
V(\hat{p}) = V(\bar{W}) = \frac{p(1-p)}{n}
\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}
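A sketch applying this to the earlier survey example (351 of 500 favorable); substituting p̂ for the unknown p gives the usual estimated standard error:

```python
import math

n, favorable = 500, 351  # survey numbers from the earlier slide
p_hat = favorable / n
se = math.sqrt(p_hat * (1 - p_hat) / n)  # p-hat substituted for unknown p

print(p_hat)         # 0.702
print(round(se, 4))  # about 0.0205
```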
Sample proportion
Just as before, if there is a finite
population sampled w/o replacement:
V(\hat{p}) = V(\bar{W}) = \frac{p(1-p)}{n} \cdot \frac{N-n}{N-1}
\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n} \cdot \frac{N-n}{N-1}}
Sample variance
As we have discussed before, the sample
variance and sample standard deviation are
given by:
\hat{\sigma}_X^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2
\hat{\sigma}_X = \sqrt{\hat{\sigma}_X^2}
Sample variance
Sometimes these are written:
s_X^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2
s_X = \sqrt{s_X^2}
Sample variance
It turns out that:
E(s_X^2) = \sigma_X^2
Sample variance
It turns out that:
V(s_X^2) = \frac{2}{n-1} \sigma_X^4
Sample variance
It turns out that, if X is distributed normal:
\frac{(n-1) s_X^2}{\sigma_X^2} \sim \chi^2_{n-1}
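A simulation sketch consistent with this: for normal data, (n−1)s²/σ² should have mean n−1 and variance 2(n−1), the moments of a chi-squared with n−1 degrees of freedom (n = 10 and σ² = 4 are arbitrary choices):

```python
import random
import statistics

random.seed(4)
n, sigma2, reps = 10, 4.0, 20_000

chis = []
for _ in range(reps):
    xs = [random.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
    s2 = statistics.variance(xs)      # sample variance: divides by n - 1
    chis.append((n - 1) * s2 / sigma2)

print(round(statistics.fmean(chis), 1))     # near n - 1 = 9
print(round(statistics.variance(chis), 1))  # near 2(n - 1) = 18
```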
Sample variance
It turns out that (by the CLT), if X is from
an independent random sample:
\frac{(n-1) s_X^2}{\sigma_X^2} \approx \chi^2_{n-1}
Sample variance
Discuss Chi-Squared distribution
Sample variance
Example (problem 46, page 251)
– A drug company manufactures pills
– These pills have normally distributed weight
– The drug company wants the variance of weight to be smaller than 1.5 milligrams squared
– The company collects a sample of size 20
– The sample variance is 2.05
– How likely is it that a sample variance this high or higher would be found if the true variance is 1.5?
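The question uses the chi-squared result from the preceding slides: under σ² = 1.5, (n−1)s²/σ² = 19 · 2.05/1.5 ≈ 25.97 is a draw from χ²₁₉, and we want its upper-tail probability. A Monte Carlo sketch (the exact chi-squared tail is about 0.13):

```python
import random
import statistics

random.seed(5)
n, sigma2, observed_s2, reps = 20, 1.5, 2.05, 20_000

# Simulate many samples of 20 normal pill weights with true variance 1.5
# and count how often the sample variance reaches 2.05 or more
count = 0
for _ in range(reps):
    xs = [random.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
    if statistics.variance(xs) >= observed_s2:
        count += 1

print(count / reps)  # roughly 0.13: not strong evidence the variance exceeds 1.5
```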