Sampling Distribution of the Mean
Central Limit Theorem
Given a population with mean μ and variance σ², the sampling distribution of the mean will have:
A mean: μx̄ = μ
A variance: σx̄² = σ²/N
A standard error (of the mean): σx̄ = σ/√N
As N increases, the shape of the sampling distribution becomes normal (whatever the shape of the population).
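These claims are easy to check numerically. The sketch below (a simulation, not part of the slides; the uniform population, sample size, and seed are arbitrary choices) draws many samples from a decidedly non-normal population and confirms that the sample means centre on μ with spread σ/√N:

```python
import numpy as np

rng = np.random.default_rng(0)

# Population: uniform on [0, 100) -- deliberately non-normal.
# For this population mu = 50 and sigma = 100/sqrt(12), about 28.87.
N = 25                    # sample size
draws = 100_000           # number of repeated samples

# Draw many samples of size N and keep each sample's mean.
means = rng.uniform(0, 100, size=(draws, N)).mean(axis=1)

print(means.mean())       # close to mu = 50
print(means.std())        # close to sigma/sqrt(N) = 28.87/5, about 5.77
```

A histogram of `means` would also look bell-shaped, even though the population is flat.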
Testing Hypotheses: μ and σ Known
Remember: we could test a hypothesis concerning a population and a single score by
z = (x − μ) / σ
Obtain p(z) and use the z table.
We will continue the same logic.
Given: Behavior Problem Scores of 10-year-olds
μ = 50, σ = 10
Sample of 10-year-olds under stress (N = 5): x̄ = 56
H0: μ = 50
H1: μ ≠ 50
Because we know μ and σ, we can use the Central Limit Theorem to obtain the sampling distribution when H0 is true.
The sampling distribution will have
μx̄ = 50
σx̄² = σ²/N = 10²/5 = 20
σx̄ = σ/√N = √20 = 4.47 (standard error)
We can find areas under the distribution by referring to the z table.
We need to know p(x̄ ≥ 56).
Minor change from the z score:
z = (x − μ) / σ
NOW
z = (x̄ − μx̄) / σx̄ = (x̄ − μ) / (σ/√N)
With our data
z = (56 − 50) / 4.47 = 6 / 4.47 = 1.34
The formula changes because we are dealing with a distribution of means, NOT individual scores.
From the z table we find p(z ≥ 1.34) = 0.0901.
Because we want a two-tailed test, we double 0.0901:
(2)(0.0901) = 0.1802
0.1802 > 0.05, so we do NOT reject H0
(equivalently, per tail: 0.0901 > 0.025).
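The same arithmetic can be done with Python's standard library (a sketch of this example's z test; `NormalDist` plays the role of the z table):

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 50, 10            # known population parameters
n, xbar = 5, 56               # sample size and observed sample mean

se = sigma / sqrt(n)          # standard error: 10/sqrt(5) = 4.47
z = (xbar - mu) / se          # (56 - 50)/4.47 = 1.34

p_upper = 1 - NormalDist().cdf(z)   # one-tailed area, about 0.09
p_two = 2 * p_upper                 # two-tailed p, about 0.18

print(round(z, 2), round(p_two, 2))
# p_two > 0.05, so H0 is not rejected
```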
One-Sample t test
Pop'n: μ known & σ² unknown, so we must estimate σ² with S².
Because we use S, we can no longer declare the answer to be a z; now it is a t.
Why?
Sampling Distribution of t
- S² is an unbiased estimator of σ²
- The problem is the shape of the S² distribution: positively skewed
thus: S² is more likely to UNDERESTIMATE σ² (especially with small N)
thus: t is likely to be larger than z (S² is in the denominator)
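The skew claim can be verified by simulation (a sketch, not from the slides; the normal population with σ = 10 and the sample size N = 5 are arbitrary choices): S² is unbiased on average, yet with small N more than half of the S² values fall below σ².

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2 = 100.0            # true population variance (sigma = 10)
N = 5                     # small sample size

# Sampling distribution of S^2: many samples, one S^2 each.
samples = rng.normal(0, 10, size=(100_000, N))
s2 = samples.var(axis=1, ddof=1)   # unbiased estimator (divides by N - 1)

print(s2.mean())                   # near 100: unbiased on average
print((s2 < sigma2).mean())        # around 0.6: S^2 underestimates more often than not
```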
t-statistic
Take
z = (x̄ − μ) / σx̄ = (x̄ − μ) / (σ/√N)
and substitute S² for σ²:
t = (x̄ − μ) / Sx̄ = (x̄ − μ) / (S/√N) = (x̄ − μ) / √(S²/N)
To treat t as a z would give us too many significant results.
Guinness Brewing Company ("Student")
Student's t distribution: we switch to the t table when we use S².
Go to Table
Unlike z, the t distribution is a function of df; as N → ∞, t → z.
Degrees of Freedom
For one-sample cases, df = N − 1.
1 df is lost because we used x̄ (the sample mean) to calculate S²:
Σ(x − x̄) = 0, so all x can vary save for 1.
Example: One-Sample, σ Unknown
Effect of statistics tutorials:
Last 100 years (no tutorials): μ = 76.0
This year (tutorials): x̄ = 79.3, N = 20, S = 6.4
H0: μ = 76
H1: μ ≠ 76
t = (x̄ − μ) / Sx̄ = (sample mean − pop'n mean) / standard error = (x̄ − μ) / (S/√N)
= (79.3 − 76) / (6.4/√20) = 3.3 / 1.43 = 2.31
Go to the t-Table
t-Table
- gives not the area (p) above or below a value of t
- gives the t values that cut off critical areas, e.g., 0.05
- t is also defined for each df
N = 20, so df = N − 1 = 20 − 1 = 19
t.05(19) = 2.093 (critical value)
2.31 > 2.093, so reject H0
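A sketch of the same test in Python (assuming SciPy is available; `stats.t.ppf` plays the role of the t table):

```python
from math import sqrt
from scipy import stats

mu0 = 76.0                   # H0: the long-run mean
xbar, s, n = 79.3, 6.4, 20   # this year's sample (with tutorials)

se = s / sqrt(n)             # estimated standard error: 6.4/sqrt(20) = 1.43
t = (xbar - mu0) / se        # 3.3/1.43 = 2.31

crit = stats.t.ppf(0.975, df=n - 1)   # two-tailed .05 critical value, t.05(19)
print(round(t, 2), round(crit, 3))    # t exceeds 2.093, so reject H0
```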
Factors Affecting Magnitude of t & Decision
1. Difference between x̄ and μ: the larger the numerator, the larger the t value
2. Size of S²: as S² decreases, t increases
3. Size of N: as N increases, the denominator decreases and t increases
4. One- or two-tailed test
5. α level
Confidence Limits on Mean
Point estimate: a specific value taken as the estimator of a parameter
Interval estimate: a range of values estimated to include the parameter
Confidence limits: a range of values that has a specific probability (p) of bracketing the parameter; the end points are the confidence limits.
How large or small could μ be without rejecting H0 if we ran a t test on the obtained sample mean?
Confidence Limits (C.I.)
t = (x̄ − μ) / Sx̄ = (x̄ − μ) / (S/√N)
We already know x̄, S, and N.
We know the critical value for t at α = .05:
t.05(19) = 2.093
We solve for μ:
±2.093 = (79.3 − μ) / (6.4/√20) = (79.3 − μ) / 1.43
Rearranging:
μ = 79.3 ± 2.093(1.43) = 79.3 ± 2.993
Using +2.993 and −2.993:
upper = 79.3 + 2.993 = 82.29
lower = 79.3 − 2.993 = 76.31
C.I.95: 76.31 ≤ μ ≤ 82.29
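The interval can be computed the same way (a sketch assuming SciPy; any small differences from the hand-worked values are rounding):

```python
from math import sqrt
from scipy import stats

xbar, s, n = 79.3, 6.4, 20
se = s / sqrt(n)                       # about 1.43
t_crit = stats.t.ppf(0.975, df=n - 1)  # t.05(19) = 2.093

lower = xbar - t_crit * se             # about 76.31
upper = xbar + t_crit * se             # about 82.29
print(round(lower, 2), round(upper, 2))
```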
Two Related Samples t
Related Samples
Design in which the same subject is observed under more than one condition (repeated measures, matched samples).
Each subject will have 2 measures, x1 and x2, that will be correlated. This must be taken into account.
Example: promoting social skills in adolescents, before and after an intervention.
H0: μ1 = μ2, or μ1 − μ2 = 0 (μ1 = before, μ2 = after)
Difference Scores
A set of scores representing the difference between the subject's performance on two occasions (x1 and x2).
[Table: x1, x2, and the difference D = x1 − x2 for each of the 15 subjects; column summaries below]
x1: x̄ = 13.333, S = 6.914
x2: x̄ = 11.133, S = 5.998
D: x̄ = 2.200, S = 2.933
Our data can be the D column.
H0: μD = 0 (from μ1 − μ2 = 0)
Under H0 we are testing a hypothesis using ONE sample.
Related Samples t
remember
t = (x̄ − μ) / Sx̄
now
t = (D̄ − μD) / SD̄ = (D̄ − 0) / (SD/√N) = D̄ / (SD/√N)
where N = the number of D scores.
Degrees of Freedom
Same as for the one-sample case:
df = N − 1 = 15 − 1 = 14
Our data:
t = (2.20 − 0) / (2.933/√15) = 2.20 / 0.757 = 2.91
Go to table
t.05(14) = 2.145; t = 2.91 > 2.145
so reject H0
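From the summary statistics of the D column alone, the test works out like this (a sketch assuming SciPy):

```python
from math import sqrt
from scipy import stats

d_bar, s_d, n = 2.200, 2.933, 15      # mean, SD, and count of the difference scores

se = s_d / sqrt(n)                    # about 0.757
t = (d_bar - 0) / se                  # about 2.91

crit = stats.t.ppf(0.975, df=n - 1)   # t.05(14) = 2.145
print(round(t, 2), round(crit, 3))    # t exceeds the critical value: reject H0
```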
Advantages of Related Samples
1. Avoids problems that come with subject-to-subject variability: the difference between x1 = 26 and x2 = 24 is the same as between x1 = 6 and x2 = 4. This increases power (less variance, lower denominator, greater t).
2. Control of extraneous variables.
3. Requires fewer subjects.
Disadvantages
1. Order effects
2. Carry-over effects
Two Independent Samples t
H0: μ1 − μ2 = 0
Sampling distribution of differences between means
Suppose 2 pop'ns: x1 with μ1 and σ1²; x2 with μ2 and σ2².
Draw pairs of samples of sizes N1 and N2, record the means x̄1 and x̄2 and the difference between them (x̄1 − x̄2) for each pair of samples, and repeat ∞ times:
Pair 1: x̄1(1), x̄2(1), mean difference x̄1(1) − x̄2(1)
Pair 2: x̄1(2), x̄2(2), mean difference x̄1(2) − x̄2(2)
...
The resulting sampling distributions:
                 x̄1          x̄2          x̄1 − x̄2
Mean             μ1           μ2           μ1 − μ2
Variance         σ1²/N1       σ2²/N2       σ1²/N1 + σ2²/N2
Standard Error   √(σ1²/N1)    √(σ2²/N2)    √(σ1²/N1 + σ2²/N2)
Variance Sum Law
Variance of a sum or difference of two INDEPENDENT
variables = sum of their variances
The distribution of the differences is also normal
t (Difference Between Means)
z = [(x̄1 − x̄2) − (μ1 − μ2)] / σ(x̄1−x̄2) = [(x̄1 − x̄2) − (μ1 − μ2)] / √(σ1²/N1 + σ2²/N2)
We must estimate σ² with s²:
t = [(x̄1 − x̄2) − (μ1 − μ2)] / s(x̄1−x̄2)
Because H0: μ1 − μ2 = 0,
t = (x̄1 − x̄2) / s(x̄1−x̄2)
or
t = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)
t = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2) is O.K. only when the n's are the same size.
When n1 ≠ n2 we need a better estimate of σ².
We must assume homogeneity of variance (σ1² = σ2²).
Rather than using s1² or s2² to estimate σ², we use their average.
Because n1 ≠ n2, we need a weighted average, weighted by their degrees of freedom:
sp² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)
(sp² = pooled variance)
Now
t = (x̄1 − x̄2) / s(x̄1−x̄2) = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)
becomes
t = (x̄1 − x̄2) / √[sp²(1/n1 + 1/n2)]
The 1/n1 + 1/n2 terms come from the formula for the standard error.
Degrees of Freedom
Two means have been used to calculate sp², so
df = (n1 − 1) + (n2 − 1) = n1 + n2 − 2
Example:
Group 1   Group 2
17        13
17        18
21        17
18        13
22        14
18        13
16        18
15        19
18        16
20        14
21        13
16        15
15        14
16        16
20        15
          15
          13
          17
          17
          15
x̄:  18.00   15.25
s²:  5.286   3.671
Example:
We have the numerator: 18.00 − 15.25 = 2.75
We need the denominator.
Pooled variance, because n1 ≠ n2:
sp² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)
= [14(5.286) + 19(3.671)] / (15 + 20 − 2)
= (74.004 + 69.749) / 33
= 4.356
The denominator becomes
√[sp²(1/n1 + 1/n2)] = √[4.356(1/15 + 1/20)] = √(4.356/15 + 4.356/20)
so
t = (x̄1 − x̄2) / √[sp²(1/n1 + 1/n2)]
= (18.00 − 15.25) / √(4.356/15 + 4.356/20)
= 2.75 / √0.5082
= 2.75 / 0.713
= 3.86
df = (15 + 20 − 2) = 33
Go to Table
t.05(33) = 2.04
t = 3.86 > 2.04
so reject H0
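SciPy can run the whole pooled-variance test directly from these summary statistics (a sketch; `ttest_ind_from_stats` takes standard deviations, so the variances are square-rooted first):

```python
from scipy import stats

# Summary statistics from the example above.
t, p = stats.ttest_ind_from_stats(
    mean1=18.00, std1=5.286 ** 0.5, nobs1=15,
    mean2=15.25, std2=3.671 ** 0.5, nobs2=20,
    equal_var=True,            # pooled variance, df = n1 + n2 - 2 = 33
)
print(round(t, 2), round(p, 4))   # t about 3.86, p well below .05
```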
Summary
If μ and σ² are known, then treat x̄ as x in the z-score formula; σx̄ replaces σ:
z = (x̄ − μ) / (σ/√n)
If μ is known and σ² is unknown, then s replaces σ:
t = (x̄ − μ) / (s/√n)
If two related samples, then D̄ replaces x̄ and sD̄ replaces sx̄:
t = (D̄ − 0) / (sD/√n)
If two independent samples and the n's are of equal size, then sD̄ is replaced by √(s1²/n + s2²/n):
t = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)
If two independent samples and the n's are NOT equal, then s1² and s2² are replaced by sp²:
t = (x̄1 − x̄2) / √[sp²(1/n1 + 1/n2)]