Sampling Distribution of the Mean
Central Limit Theorem
Given a population with mean μ and variance σ², the sampling distribution of the mean will have:
A mean: μx̄ = μ
A variance: σx̄² = σ²/N
A standard error (of the mean): σx̄ = σ/√N
As N increases, the shape of the sampling distribution becomes normal (whatever the shape of the population).
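These claims are easy to check numerically. The sketch below (a simulation, not part of the slides; the uniform population, sample size, and seed are arbitrary choices) draws many samples from a decidedly non-normal population and confirms that the sample means centre on μ with spread σ/√N:

```python
import numpy as np

rng = np.random.default_rng(0)

# Population: uniform on [0, 100) -- deliberately non-normal.
# For this population mu = 50 and sigma = 100/sqrt(12), about 28.87.
N = 25                    # sample size
draws = 100_000           # number of repeated samples

# Draw many samples of size N and keep each sample's mean.
means = rng.uniform(0, 100, size=(draws, N)).mean(axis=1)

print(means.mean())       # close to mu = 50
print(means.std())        # close to sigma/sqrt(N) = 28.87/5, about 5.77
```

A histogram of `means` would also look bell-shaped, even though the population is flat.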
Testing Hypotheses: μ and σ Known
Remember: we could test a hypothesis concerning a population and a single score by
z = (x − μ) / σ
Obtain p(z) and use the z table.
We will continue the same logic.
Given: Behavior Problem Scores of 10-year-olds
μ = 50, σ = 10
Sample of 10-year-olds under stress (N = 5): x̄ = 56
H0: μ = 50
H1: μ ≠ 50
Because we know μ and σ, we can use the Central Limit Theorem to obtain the sampling distribution when H0 is true.
The sampling distribution will have
μx̄ = 50
σx̄² = σ²/N = 10²/5 = 20
σx̄ = σ/√N = √20 = 4.47 (standard error)
We can find areas under the distribution by referring to the z table.
We need to know p(x̄ ≥ 56).
Minor change from the z score:
z = (x − μ) / σ
NOW
z = (x̄ − μx̄) / σx̄ = (x̄ − μ) / (σ/√N)
With our data
z = (56 − 50) / 4.47 = 6 / 4.47 = 1.34
The formula changes because we are dealing with a distribution of means, NOT individual scores.
From the z table we find p(z ≥ 1.34) = 0.0901.
Because we want a two-tailed test, we double 0.0901:
(2)(0.0901) = 0.1802
0.1802 > 0.05, so we do NOT reject H0
(equivalently, per tail: 0.0901 > 0.025).
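The same arithmetic can be done with Python's standard library (a sketch of this example's z test; `NormalDist` plays the role of the z table):

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 50, 10            # known population parameters
n, xbar = 5, 56               # sample size and observed sample mean

se = sigma / sqrt(n)          # standard error: 10/sqrt(5) = 4.47
z = (xbar - mu) / se          # (56 - 50)/4.47 = 1.34

p_upper = 1 - NormalDist().cdf(z)   # one-tailed area, about 0.09
p_two = 2 * p_upper                 # two-tailed p, about 0.18

print(round(z, 2), round(p_two, 2))
# p_two > 0.05, so H0 is not rejected
```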
One-Sample t test
Pop'n: μ known & σ² unknown, so we must estimate σ² with S².
Because we use S, we can no longer declare the answer to be a z; now it is a t.
Why?
Sampling Distribution of t
- S² is an unbiased estimator of σ²
- The problem is the shape of the S² distribution: positively skewed
thus: S² is more likely to UNDERESTIMATE σ² (especially with small N)
thus: t is likely to be larger than z (S² is in the denominator)
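The skew claim can be verified by simulation (a sketch, not from the slides; the normal population with σ = 10 and the sample size N = 5 are arbitrary choices): S² is unbiased on average, yet with small N more than half of the S² values fall below σ².

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2 = 100.0            # true population variance (sigma = 10)
N = 5                     # small sample size

# Sampling distribution of S^2: many samples, one S^2 each.
samples = rng.normal(0, 10, size=(100_000, N))
s2 = samples.var(axis=1, ddof=1)   # unbiased estimator (divides by N - 1)

print(s2.mean())                   # near 100: unbiased on average
print((s2 < sigma2).mean())        # around 0.6: S^2 underestimates more often than not
```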
t-statistic
Take
z = (x̄ − μ) / σx̄ = (x̄ − μ) / (σ/√N)
and substitute S² for σ²:
t = (x̄ − μ) / Sx̄ = (x̄ − μ) / (S/√N) = (x̄ − μ) / √(S²/N)
To treat t as a z would give us too many significant results.
Guinness Brewing Company ("Student")
Student's t distribution: we switch to the t table when we use S².
Go to Table
Unlike z, the t distribution is a function of df; as N → ∞, t → z.
Degrees of Freedom
For one-sample cases, df = N − 1.
1 df is lost because we used x̄ (the sample mean) to calculate S²:
Σ(x − x̄) = 0, so all x can vary save for 1.
Example: One-Sample, σ Unknown
Effect of statistics tutorials:
Last 100 years (no tutorials): μ = 76.0
This year (tutorials): x̄ = 79.3, N = 20, S = 6.4
H0: μ = 76
H1: μ ≠ 76
t = (x̄ − μ) / Sx̄ = (sample mean − pop'n mean) / standard error = (x̄ − μ) / (S/√N)
= (79.3 − 76) / (6.4/√20) = 3.3 / 1.43 = 2.31
Go to the t-Table
t-Table
- gives not the area (p) above or below a value of t
- gives the t values that cut off critical areas, e.g., 0.05
- t is also defined for each df
N = 20, so df = N − 1 = 20 − 1 = 19
t.05(19) = 2.093 (critical value)
2.31 > 2.093, so reject H0
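A sketch of the same test in Python (assuming SciPy is available; `stats.t.ppf` plays the role of the t table):

```python
from math import sqrt
from scipy import stats

mu0 = 76.0                   # H0: the long-run mean
xbar, s, n = 79.3, 6.4, 20   # this year's sample (with tutorials)

se = s / sqrt(n)             # estimated standard error: 6.4/sqrt(20) = 1.43
t = (xbar - mu0) / se        # 3.3/1.43 = 2.31

crit = stats.t.ppf(0.975, df=n - 1)   # two-tailed .05 critical value, t.05(19)
print(round(t, 2), round(crit, 3))    # t exceeds 2.093, so reject H0
```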
Factors Affecting Magnitude of t & Decision
1. Difference between x̄ and μ: the larger the numerator, the larger the t value
2. Size of S²: as S² decreases, t increases
3. Size of N: as N increases, the denominator decreases and t increases
4. One- or two-tailed test
5. α level
Confidence Limits on Mean
Point estimate: a specific value taken as the estimator of a parameter
Interval estimate: a range of values estimated to include the parameter
Confidence limits: a range of values that has a specific probability (p) of bracketing the parameter; the end points are the confidence limits.
How large or small could μ be without rejecting H0 if we ran a t test on the obtained sample mean?
Confidence Limits (C.I.)
t = (x̄ − μ) / Sx̄ = (x̄ − μ) / (S/√N)
We already know x̄, S, and N.
We know the critical value for t at α = .05:
t.05(19) = 2.093
We solve for μ:
±2.093 = (79.3 − μ) / (6.4/√20) = (79.3 − μ) / 1.43
Rearranging:
μ = 79.3 ± 2.093(1.43) = 79.3 ± 2.993
Using +2.993 and −2.993:
upper = 79.3 + 2.993 = 82.29
lower = 79.3 − 2.993 = 76.31
C.I.95: 76.31 ≤ μ ≤ 82.29
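The interval can be computed the same way (a sketch assuming SciPy; any small differences from the hand-worked values are rounding):

```python
from math import sqrt
from scipy import stats

xbar, s, n = 79.3, 6.4, 20
se = s / sqrt(n)                       # about 1.43
t_crit = stats.t.ppf(0.975, df=n - 1)  # t.05(19) = 2.093

lower = xbar - t_crit * se             # about 76.31
upper = xbar + t_crit * se             # about 82.29
print(round(lower, 2), round(upper, 2))
```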
Two Related Samples t
Related Samples
Design in which the same subject is observed under more than one condition (repeated measures, matched samples).
Each subject will have 2 measures, x1 and x2, that will be correlated. This must be taken into account.
Example: promoting social skills in adolescents, before and after an intervention.
H0: μ1 = μ2, or μ1 − μ2 = 0 (μ1 = before, μ2 = after)
Difference Scores
A set of scores representing the difference between the subject's performance on two occasions (x1 and x2).
[Table: x1, x2, and the difference D = x1 − x2 for each of the 15 subjects; column summaries below]
x1: x̄ = 13.333, S = 6.914
x2: x̄ = 11.133, S = 5.998
D: x̄ = 2.200, S = 2.933
Our data can be the D column.
H0: μD = 0 (from μ1 − μ2 = 0)
Under H0 we are testing a hypothesis using ONE sample.
Related Samples t
remember
t = (x̄ − μ) / Sx̄
now
t = (D̄ − μD) / SD̄ = (D̄ − 0) / (SD/√N) = D̄ / (SD/√N)
where N = the number of D scores.
Degrees of Freedom
Same as for the one-sample case:
df = N − 1 = 15 − 1 = 14
Our data:
t = (2.20 − 0) / (2.933/√15) = 2.20 / 0.757 = 2.91
Go to table
t.05(14) = 2.145; t = 2.91 > 2.145
so reject H0
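From the summary statistics of the D column alone, the test works out like this (a sketch assuming SciPy):

```python
from math import sqrt
from scipy import stats

d_bar, s_d, n = 2.200, 2.933, 15      # mean, SD, and count of the difference scores

se = s_d / sqrt(n)                    # about 0.757
t = (d_bar - 0) / se                  # about 2.91

crit = stats.t.ppf(0.975, df=n - 1)   # t.05(14) = 2.145
print(round(t, 2), round(crit, 3))    # t exceeds the critical value: reject H0
```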
Advantages of Related Samples
1. Avoids problems that come with subject-to-subject variability: the difference between x1 = 26 and x2 = 24 is the same as between x1 = 6 and x2 = 4. This increases power (less variance, lower denominator, greater t).
2. Control of extraneous variables.
3. Requires fewer subjects.
Disadvantages
1. Order effects
2. Carry-over effects
Two Independent Samples t
H0: μ1 − μ2 = 0
Sampling distribution of differences between means
Suppose 2 pop'ns: x1 with μ1 and σ1²; x2 with μ2 and σ2².
Draw pairs of samples of sizes N1 and N2, record the means x̄1 and x̄2 and the difference between them (x̄1 − x̄2) for each pair of samples, and repeat ∞ times:
Pair 1: x̄1(1), x̄2(1), mean difference x̄1(1) − x̄2(1)
Pair 2: x̄1(2), x̄2(2), mean difference x̄1(2) − x̄2(2)
...
The resulting sampling distributions:
                 x̄1          x̄2          x̄1 − x̄2
Mean             μ1           μ2           μ1 − μ2
Variance         σ1²/N1       σ2²/N2       σ1²/N1 + σ2²/N2
Standard Error   √(σ1²/N1)    √(σ2²/N2)    √(σ1²/N1 + σ2²/N2)
Variance Sum Law
Variance of a sum or difference of two INDEPENDENT
variables = sum of their variances
The distribution of the differences is also normal
t (Difference Between Means)
z = [(x̄1 − x̄2) − (μ1 − μ2)] / σ(x̄1−x̄2) = [(x̄1 − x̄2) − (μ1 − μ2)] / √(σ1²/N1 + σ2²/N2)
We must estimate σ² with s²:
t = [(x̄1 − x̄2) − (μ1 − μ2)] / s(x̄1−x̄2)
Because H0: μ1 − μ2 = 0,
t = (x̄1 − x̄2) / s(x̄1−x̄2)
or
t = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)
t = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2) is O.K. only when the n's are the same size.
When n1 ≠ n2 we need a better estimate of σ².
We must assume homogeneity of variance (σ1² = σ2²).
Rather than using s1² or s2² to estimate σ², we use their average.
Because n1 ≠ n2, we need a weighted average, weighted by their degrees of freedom:
sp² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)
(sp² = pooled variance)
Now
t = (x̄1 − x̄2) / s(x̄1−x̄2) = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)
becomes
t = (x̄1 − x̄2) / √[sp²(1/n1 + 1/n2)]
The 1/n1 + 1/n2 terms come from the formula for the standard error.
Degrees of Freedom
Two means have been used to calculate sp², so
df = (n1 − 1) + (n2 − 1) = n1 + n2 − 2
Example:
Group 1   Group 2
17        13
17        18
21        17
18        13
22        14
18        13
16        18
15        19
18        16
20        14
21        13
16        15
15        14
16        16
20        15
          15
          13
          17
          17
          15
x̄:  18.00   15.25
s²:  5.286   3.671
Example:
We have the numerator: 18.00 − 15.25 = 2.75
We need the denominator.
Pooled variance, because n1 ≠ n2:
sp² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)
= [14(5.286) + 19(3.671)] / (15 + 20 − 2)
= (74.004 + 69.749) / 33
= 4.356
The denominator becomes
√[sp²(1/n1 + 1/n2)] = √[4.356(1/15 + 1/20)] = √(4.356/15 + 4.356/20)
so
t = (x̄1 − x̄2) / √[sp²(1/n1 + 1/n2)]
= (18.00 − 15.25) / √(4.356/15 + 4.356/20)
= 2.75 / √0.5082
= 2.75 / 0.713
= 3.86
df = (15 + 20 − 2) = 33
Go to Table
t.05(33) = 2.04
t = 3.86 > 2.04
so reject H0
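SciPy can run the whole pooled-variance test directly from these summary statistics (a sketch; `ttest_ind_from_stats` takes standard deviations, so the variances are square-rooted first):

```python
from scipy import stats

# Summary statistics from the example above.
t, p = stats.ttest_ind_from_stats(
    mean1=18.00, std1=5.286 ** 0.5, nobs1=15,
    mean2=15.25, std2=3.671 ** 0.5, nobs2=20,
    equal_var=True,            # pooled variance, df = n1 + n2 - 2 = 33
)
print(round(t, 2), round(p, 4))   # t about 3.86, p well below .05
```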
Summary
If μ and σ² are known, then treat x̄ as x in the z-score formula; σx̄ replaces σ:
z = (x̄ − μ) / (σ/√n)
If μ is known and σ² is unknown, then s replaces σ:
t = (x̄ − μ) / (s/√n)
If two related samples, then D̄ replaces x̄ and sD̄ replaces sx̄:
t = (D̄ − 0) / (sD/√n)
If two independent samples and the n's are of equal size, then sD̄ is replaced by √(s1²/n + s2²/n):
t = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)
If two independent samples and the n's are NOT equal, then s1² and s2² are replaced by sp²:
t = (x̄1 − x̄2) / √[sp²(1/n1 + 1/n2)]