Working with Sampling distributions 2


We draw a random sample of size n.
The sample mean is x-bar; the sample standard deviation is s.
Our estimate of the population standard deviation is s; the real population standard deviation is sigma.
Our estimate of the standard deviation of the sampling distribution (the standard error) is $s/\sqrt{n}$.
Consider the range:
Begin with $\bar{x} - 2s/\sqrt{n}$
End with $\bar{x} + 2s/\sqrt{n}$
[Figure: a number line marking $\bar{x} - 2s/\sqrt{n}$, $\bar{x}$, and $\bar{x} + 2s/\sqrt{n}$]
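A minimal sketch of these quantities in Python; the data values below are made up for illustration and are not from the transcript:

```python
import numpy as np

# Hypothetical sample of size n = 10 (values assumed for illustration)
sample = np.array([12.1, 9.8, 11.4, 10.7, 13.2, 10.1, 11.9, 9.5, 12.6, 10.8])

n = len(sample)
x_bar = sample.mean()          # sample mean, x-bar
s = sample.std(ddof=1)         # sample standard deviation, s (n - 1 denominator)
se = s / np.sqrt(n)            # estimated standard error, s / sqrt(n)

# The range from x-bar - 2*s/sqrt(n) to x-bar + 2*s/sqrt(n)
low, high = x_bar - 2 * se, x_bar + 2 * se
print(f"x-bar = {x_bar:.2f}, s = {s:.2f}, SE = {se:.2f}")
print(f"x-bar +/- 2 SE: ({low:.2f}, {high:.2f})")
```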
[Figure: a sampling distribution with six sample means (Sample 1 through Sample 6) marked along the axis. What is the probability (p) of each type of sample (1-6)?]
Note
We earlier said that the population mean ± 2 standard deviations of the sampling distribution (called the standard error of the mean, or just the standard error) will include 95% of the samples.
Now we say that the sample mean ± 2 standard errors will include the population mean 95% of the time.
In other words
Sample $\bar{x} \pm 2s/\sqrt{n}$ will include the population mean 95% of the time.
This does not depend upon the shape or distribution of the population, especially as the sample size (n) increases.
When the sample is small, we use the alternative t distribution rather than the normal distribution.
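A small simulation sketch of this claim; the population mean, standard deviation, and sample size below are assumptions chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population and sample size (assumed values)
mu, sigma, n, trials = 50.0, 10.0, 40, 10_000

covered = 0
for _ in range(trials):
    sample = rng.normal(mu, sigma, size=n)
    x_bar = sample.mean()
    se = sample.std(ddof=1) / np.sqrt(n)
    # Does the interval x-bar +/- 2 SE include the population mean?
    if x_bar - 2 * se <= mu <= x_bar + 2 * se:
        covered += 1

print(f"x-bar +/- 2 SE covered the population mean in {covered / trials:.1%} of samples")
```

With n = 40 the coverage should come out close to 95%; rerunning with a much smaller n (say 5) shows the shortfall that motivates the t distribution discussed next.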
The “t” distribution
[Figure: the t distribution, with fewer cases in the middle and fatter tails than the normal]
Why the t distribution?
We use the normal distribution when we know the standard deviation of the population.
We use the t distribution when we have to estimate the population standard deviation from the sample. In doing this we lose useful information, and the resulting sampling distribution has more cases far from the mean.
Why the t distribution?
We lose one piece of information when we calculate the mean.
Consider the sample 1 2 3. Now fix the mean at 2. What possible values can each case take if the mean is 2? Once any two of the values are chosen, the third is forced (the three must sum to 6), so only two of them are free to vary.
This has a bigger impact when we have a small sample.
How much information is there in a sample? n pieces (n is the sample size).
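A tiny sketch of the idea, using the transcript's sample of 1 2 3 with the mean fixed at 2 (the two "free" choices below are hypothetical values):

```python
# With the mean fixed at 2 for a sample of size 3, the sum must be 6,
# so choosing any two values determines the third: only n - 1 = 2 are free.
fixed_mean, n = 2, 3
free_choices = [1, 4]                          # hypothetical values for two cases
forced_third = fixed_mean * n - sum(free_choices)
print(forced_third)                            # prints 1, the only value the third case can take
```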
Lost information
With less information, there is more variability in the sampling distribution. This means fatter tails and fewer cases in the middle.
Look again at the t distribution. This one is for samples of size 10.
The t distribution
[Figure: the t distribution for samples of size 10, again with fewer cases in the middle and fatter tails than the normal]
Different t distributions
There is a different t distribution for every sample size.
We label them by the number of degrees of freedom.
As the sample size increases (the number of degrees of freedom increases), the t becomes more and more normal in shape.
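A short sketch of this convergence using scipy (assumed to be available here): the 97.5th-percentile cutoff of the t distribution drifts down toward the normal value of about 1.96 as the degrees of freedom grow.

```python
from scipy import stats

print(f"normal 97.5th percentile: {stats.norm.ppf(0.975):.3f}")
for df in (5, 10, 30, 100, 1000):
    # t cutoff that leaves 2.5% in each tail, for the given degrees of freedom
    print(f"t with {df:>4} df: {stats.t.ppf(0.975, df):.3f}")
```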
Compare at ± 2 standard errors

Sample Size   Normal Prob.   t Prob.   Diff.
        10        .0455       .0734    .0279
        20        .0455       .0593    .0138
        30        .0455       .0546    .0091
        40        .0455       .0523    .0068
        50        .0455       .0509    .0054
        60        .0455       .0500    .0045
       100        .0455       .0482    .0027
       200        .0455       .0469    .0014
       500        .0455       .0460    .0005
      1000        .0455       .0458    .0003
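A sketch that reproduces the two-tailed probabilities beyond ± 2 standard errors with scipy; the t column above matches when the listed sample size is used directly as the degrees of freedom.

```python
from scipy import stats

normal_tail = 2 * stats.norm.sf(2)          # probability beyond +/- 2 under the normal
print(f"Normal: {normal_tail:.4f}")

for size in (10, 20, 30, 40, 50, 60, 100, 200, 500, 1000):
    t_tail = 2 * stats.t.sf(2, df=size)     # probability beyond +/- 2 under the t
    print(f"{size:>4}: t = {t_tail:.4f}, diff = {t_tail - normal_tail:.4f}")
```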
With small sample sizes
The mean of the sampling distribution is the mean of the population. Our estimate is $\bar{x}$.
The shape of the sampling distribution is t.
The standard error is $\sigma/\sqrt{n}$. Our estimate is $s/\sqrt{n}$.
Confidence interval
For large samples, the confidence interval for the mean is $\bar{X} \pm z \cdot s/\sqrt{n}$.
For small samples, the confidence interval for the mean is $\bar{X} \pm t \cdot s/\sqrt{n}$.
For large samples, the confidence interval for proportions is $p \pm z \cdot \sqrt{p(1-p)/n}$.
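A sketch of all three intervals in Python; the summary statistics below are assumed values for illustration, with the 95% cutoffs taken from scipy.

```python
import numpy as np
from scipy import stats

# Hypothetical summary statistics (assumed for illustration)
x_bar, s, n = 72.0, 8.0, 15          # sample mean, sd, and size for the mean intervals
p_hat, n_p = 0.40, 400               # sample proportion and size for the proportion interval

z = stats.norm.ppf(0.975)            # about 1.96 for a 95% interval
t = stats.t.ppf(0.975, df=n - 1)     # t cutoff with n - 1 degrees of freedom

se_mean = s / np.sqrt(n)
se_prop = np.sqrt(p_hat * (1 - p_hat) / n_p)

print("Large-sample mean CI:", (x_bar - z * se_mean, x_bar + z * se_mean))
print("Small-sample mean CI:", (x_bar - t * se_mean, x_bar + t * se_mean))
print("Proportion CI:       ", (p_hat - z * se_prop, p_hat + z * se_prop))
```

Using the same summary statistics for both mean intervals makes the z-versus-t difference visible: the t interval is a little wider.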
Conf. interval & hypothesis test
We have been looking at confidence intervals around means and proportions.
An alternative approach is to test a hypothesis.
These two approaches do much the same thing, though we state our conclusions differently.
Confidence interval conclusion
The population mean (or proportion) is likely (95%) to be within the interval $\bar{X} \pm z \cdot s/\sqrt{n}$ or $\bar{X} \pm t \cdot s/\sqrt{n}$.
Hypothesis-test conclusion
I am confident (95%) that this sample did not come from the hypothesized population. Therefore I reject the hypothesis.
I cannot reject the hypothesis that the sample came from the hypothesized population.
Compare
Reject any hypothesis that was outside the confidence interval (H1).
Fail to reject any hypothesis that was within the confidence interval (H2).
[Figure: a number line marking $\bar{x} - 2s/\sqrt{n}$, $\bar{x}$, and $\bar{x} + 2s/\sqrt{n}$, with H1 falling outside the interval and H2 falling inside it]
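A small sketch of this comparison; the sample summary and the two hypothesized means (standing in for H1 and H2) are assumed values.

```python
import numpy as np

# Hypothetical sample summary (assumed for illustration)
x_bar, s, n = 100.0, 12.0, 36
se = s / np.sqrt(n)
low, high = x_bar - 2 * se, x_bar + 2 * se   # the interval x-bar +/- 2 SE

# Two hypothesized population means: one outside the interval (H1), one inside (H2)
for label, hypothesized_mean in [("H1", 90.0), ("H2", 102.0)]:
    if low <= hypothesized_mean <= high:
        print(f"{label}: {hypothesized_mean} is inside ({low:.1f}, {high:.1f}) -> fail to reject")
    else:
        print(f"{label}: {hypothesized_mean} is outside ({low:.1f}, {high:.1f}) -> reject")
```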
Which samples would reject the hypothesis of a mean at the blue line?
[Figure: the sampling distribution with Sample 1 through Sample 6 marked, and a blue line at the hypothesized mean]