Hypothesis Testing – Two Samples
Chapters 12 & 13
Chapter Topics
Comparing Two Independent Samples
Independent samples Z test for the difference in two
means
Pooled-variance t test for the difference in two means
F Test for the Difference in Two Variances
Comparing Two Related Samples
Paired-sample Z test for the mean difference
Paired-sample t test for the mean difference
Two-sample Z test for population proportions
Independent and Dependent Samples
Comparing Two Independent
Samples
Different Data Sources
Unrelated
Independent
Sample selected from one population has no effect or
bearing on the sample selected from the other
population
Use the Difference between 2 Sample Means
Use Z Test or t Test for Independent Samples
Independent Sample Z Test
(Variances Known)
Assumptions
Samples are randomly and independently
drawn from normal distributions
Population variances are known
Test Statistic
$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
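The Z statistic above can be sketched in a few lines of Python. This is a minimal illustration, not part of the slides: the function names and all input numbers are hypothetical, and the normal CDF is built from `math.erf` so that no external library is assumed.

```python
import math

def two_sample_z(mean1, mean2, var1, var2, n1, n2, hypothesized_diff=0.0):
    """Z statistic for the difference in two means, population variances known."""
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    return ((mean1 - mean2) - hypothesized_diff) / standard_error

def standard_normal_cdf(z):
    """Standard normal cumulative probability via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical inputs chosen only for illustration (not from the slides).
z = two_sample_z(mean1=52.0, mean2=50.0, var1=16.0, var2=9.0, n1=40, n2=36)
p_two_tail = 2.0 * (1.0 - standard_normal_cdf(abs(z)))
print(round(z, 3), round(p_two_tail, 4))
```

With these made-up inputs the two-tail p-value falls below 0.05, so the null hypothesis of equal means would be rejected at that level.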
t Test for Independent Samples
(Variances Unknown)
Assumptions
Both populations are normally distributed
Samples are randomly and independently
drawn
Population variances are unknown but
assumed equal
If either population is not normal, large sample sizes are needed
Developing the t Test for
Independent Samples
Setting Up the Hypotheses
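Two-Tail Test
$H_0: \mu_1 = \mu_2$
$H_1: \mu_1 \neq \mu_2$
OR
$H_0: \mu_1 - \mu_2 = 0$
$H_1: \mu_1 - \mu_2 \neq 0$
Right-Tail Test
$H_0: \mu_1 \le \mu_2$
$H_1: \mu_1 > \mu_2$
OR
$H_0: \mu_1 - \mu_2 \le 0$
$H_1: \mu_1 - \mu_2 > 0$
Left-Tail Test
$H_0: \mu_1 \ge \mu_2$
$H_1: \mu_1 < \mu_2$
OR
$H_0: \mu_1 - \mu_2 \ge 0$
$H_1: \mu_1 - \mu_2 < 0$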
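The three hypothesis set-ups translate into three rejection rules. The sketch below assumes a symmetric reference distribution (Z or t) and a positive critical value; the function name `reject_h0` and the example numbers are hypothetical.

```python
def reject_h0(stat, critical, tail):
    """Return True when the statistic falls in the rejection region.

    `critical` is the positive critical value of a symmetric distribution;
    `tail` is 'two', 'right', or 'left'.
    """
    if tail == "two":
        return abs(stat) > critical
    if tail == "right":
        return stat > critical
    if tail == "left":
        return stat < -critical
    raise ValueError("tail must be 'two', 'right', or 'left'")

# Hypothetical statistic and critical value, for illustration only.
print(reject_h0(2.5, 2.0, "two"), reject_h0(2.5, 2.0, "right"), reject_h0(2.5, 2.0, "left"))
```

Note that the critical value itself depends on the tail type: a two-tail test puts half of the significance level in each tail, a one-tail test puts all of it in one.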
Developing the t Test for
Independent Samples
(continued)
Calculate the Pooled Sample Variance as
an Estimate of the Common Population
Variance
$S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{(n_1 - 1) + (n_2 - 1)}$
$S_p^2$ : Pooled sample variance
$n_1$ : Size of sample 1
$S_1^2$ : Variance of sample 1
$n_2$ : Size of sample 2
$S_2^2$ : Variance of sample 2
Developing the t Test for
Independent Samples
(continued)
Compute the Sample Statistic
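$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$, with $df = n_1 + n_2 - 2$
$\mu_1 - \mu_2$ : Hypothesized difference
$S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{(n_1 - 1) + (n_2 - 1)}$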
t Test for Independent Samples:
Example
You’re a financial analyst for Charles Schwab. Is there a
difference in average dividend yield between stocks
listed on the NYSE & NASDAQ? You collect the
following data:
                  NYSE    NASDAQ
Number            21      25
Sample Mean       3.27    2.53
Sample Std Dev    1.30    1.16
Assuming equal variances, is there a difference in average yield ($\alpha = 0.05$)?
© 1984-1994 T/Maker Co.
Calculating the Test Statistic
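$S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{(n_1 - 1) + (n_2 - 1)} = \frac{(21 - 1) 1.30^2 + (25 - 1) 1.16^2}{(21 - 1) + (25 - 1)} = 1.502$
$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} = \frac{(3.27 - 2.53) - 0}{\sqrt{1.502 \left( \frac{1}{21} + \frac{1}{25} \right)}} = 2.04$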
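The arithmetic on this slide can be reproduced from the summary statistics alone. The following is a minimal sketch; the function names are illustrative, and only the NYSE/NASDAQ figures come from the example. Evaluated without intermediate rounding, the pooled variance is about 1.502 and the statistic comes out near 2.04.

```python
import math

def pooled_variance(s1, n1, s2, n2):
    """Pooled sample variance from two sample standard deviations."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / ((n1 - 1) + (n2 - 1))

def pooled_t(mean1, s1, n1, mean2, s2, n2, hypothesized_diff=0.0):
    """Pooled-variance t statistic, its degrees of freedom, and S_p^2."""
    sp2 = pooled_variance(s1, n1, s2, n2)
    standard_error = math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
    t = ((mean1 - mean2) - hypothesized_diff) / standard_error
    return t, n1 + n2 - 2, sp2

# Summary statistics from the NYSE (sample 1) and NASDAQ (sample 2) example.
t_stat, df, sp2 = pooled_t(3.27, 1.30, 21, 2.53, 1.16, 25)
print(round(sp2, 3), round(t_stat, 2), df)
```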
Solution
$H_0: \mu_1 - \mu_2 = 0$, i.e. $\mu_1 = \mu_2$
$H_1: \mu_1 - \mu_2 \neq 0$, i.e. $\mu_1 \neq \mu_2$
$\alpha = 0.05$
$df = 21 + 25 - 2 = 44$
Critical Value(s): $\pm 2.0154$ (area .025 in each tail)
Test Statistic:
$t = \frac{3.27 - 2.53}{\sqrt{1.502 \left( \frac{1}{21} + \frac{1}{25} \right)}} = 2.04$
Decision:
Reject $H_0$ at $\alpha = 0.05$.
Conclusion:
There is evidence of a difference in means.
p-Value Solution
(p-Value is between .02 and .05) < ($\alpha = 0.05$): Reject $H_0$.
(p-Value)/2, the area in each tail beyond the test statistic, is between .01 and .025; $\alpha/2 = .025$.
Rejection regions: $t < -2.0154$ or $t > 2.0154$.
The test statistic 2.04 is in the rejection region.
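The p-value bounds above come from a t table. As a rough cross-check, the sketch below integrates the Student t density numerically with Simpson's rule, using only the standard library. The function names and the step count are illustrative choices, not part of the slides, and the statistic is taken as approximately 2.04 (the pooled t value without intermediate rounding).

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tail_p_value(t, df, steps=2000):
    """Two-tail p-value; integrates the density from 0 to |t| by Simpson's rule."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # area between -|t| and |t| is twice this
    return 1.0 - 2.0 * central_area

p_value = two_tail_p_value(2.04, 44)
p_at_critical = two_tail_p_value(2.0154, 44)  # should be close to 0.05
print(round(p_value, 3), round(p_at_critical, 3))
```

The computed p-value lands inside the tabled range of .02 to .05, which agrees with the decision to reject.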
Example
You’re a financial analyst for Charles Schwab. You
collect the following data:
                  NYSE    NASDAQ
Number            21      25
Sample Mean       3.27    2.53
Sample Std Dev    1.30    1.16
You want to construct a 95%
confidence interval for the
difference in population average
yields of the stocks listed on
NYSE and NASDAQ.
Example: Solution
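$S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{(n_1 - 1) + (n_2 - 1)} = \frac{(21 - 1) 1.30^2 + (25 - 1) 1.16^2}{(21 - 1) + (25 - 1)} = 1.502$
$(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2,\, n_1 + n_2 - 2} \sqrt{S_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$
$= (3.27 - 2.53) \pm 2.0154 \sqrt{1.502 \left( \frac{1}{21} + \frac{1}{25} \right)}$
$0.0088 < \mu_1 - \mu_2 < 1.4712$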
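The interval can be checked with a short script. This is a sketch under the slide's assumptions (equal variances, critical value 2.0154 read from a t table with 44 degrees of freedom); the variable names are illustrative.

```python
import math

T_CRIT = 2.0154  # t_{0.025, 44}, taken from the slide rather than computed

n1, mean1, s1 = 21, 3.27, 1.30  # NYSE
n2, mean2, s2 = 25, 2.53, 1.16  # NASDAQ

sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
margin = T_CRIT * math.sqrt(sp2 * (1 / n1 + 1 / n2))
lower = (mean1 - mean2) - margin
upper = (mean1 - mean2) + margin
print(round(lower, 4), round(upper, 4))
```

Because the interval excludes zero, it agrees with the two-tail test that rejected equal means at the 0.05 level.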
Independent Sample (Two Sample) t Test in JMP
Independent Sample t Test with Variances Unknown
Analyze | Fit Y by X | Measurement in Y box
(Continuous) | Grouping Variable in X box
(Nominal) | | Means/Anova/Pooled t
Comparing Two Related Samples
Test the Means of Two Related Samples
Paired or matched
Repeated measures (before and after)
Use difference between pairs
$D_i = X_{1i} - X_{2i}$
Eliminates Variation between Subjects
Z Test for Mean Difference (Variance
Known)
Assumptions
Both populations are normally distributed
Observations are paired or matched
Variance known
Test Statistic
$Z = \frac{\bar{D} - \mu_D}{\sigma_D / \sqrt{n}}$
$\bar{D} = \frac{\sum_{i=1}^{n} D_i}{n}$
t Test for Mean Difference (Variance
Unknown)
Assumptions
Both populations are normally distributed
Observations are matched or paired
Variance unknown
If population not normal, need large samples
Test Statistic
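$t = \frac{\bar{D} - \mu_D}{S_D / \sqrt{n}}$
$\bar{D} = \frac{\sum_{i=1}^{n} D_i}{n}$
$S_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}}$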
Dependent-Sample t Test: Example
Assume you work in the finance department. Is the new financial package faster ($\alpha = 0.05$ level)? You collect the following processing times:
User    Existing System (1)    New Software (2)    Difference $D_i$
C.B.    9.98 Seconds           9.88 Seconds        .10
T.F.    9.88                   9.86                .02
M.H.    9.84                   9.75                .09
R.K.    9.99                   9.80                .19
M.O.    9.94                   9.87                .07
D.S.    9.84                   9.84                .00
S.S.    9.86                   9.87                -.01
C.T.    10.12                  9.98                .14
K.T.    9.90                   9.83                .07
S.Z.    9.91                   9.86                .05
$\bar{D} = \frac{\sum D_i}{n} = .072$
$S_D = \sqrt{\frac{\sum (D_i - \bar{D})^2}{n - 1}} = .06215$
Dependent-Sample t Test: Example
Solution
Is the new financial package faster (0.05 level)?
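$H_0: \mu_D \le 0$
$H_1: \mu_D > 0$
$\alpha = .05$, $\bar{D} = .072$
Critical Value = 1.8331
$df = n - 1 = 9$
Test Statistic
$t = \frac{\bar{D} - \mu_D}{S_D / \sqrt{n}} = \frac{.072 - 0}{.06215 / \sqrt{10}} = 3.66$
Decision: Reject $H_0$. The t statistic (3.66) is in the rejection zone, beyond the critical value 1.8331.
Conclusion: The new software package is faster.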
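The paired test can be reproduced directly from the ten pairs of processing times. The sketch below uses Python's `statistics` module; the variable names are illustrative, and the critical value 1.8331 is copied from the slide rather than computed.

```python
import math
import statistics

# Processing times in seconds, in the user order given on the slide.
existing = [9.98, 9.88, 9.84, 9.99, 9.94, 9.84, 9.86, 10.12, 9.90, 9.91]
new = [9.88, 9.86, 9.75, 9.80, 9.87, 9.84, 9.87, 9.98, 9.83, 9.86]

diffs = [x1 - x2 for x1, x2 in zip(existing, new)]  # D_i = X_1i - X_2i
d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)  # sample standard deviation, n - 1 in the denominator
t_stat = (d_bar - 0.0) / (s_d / math.sqrt(len(diffs)))

T_CRIT = 1.8331  # right-tail critical value for alpha = .05, df = 9 (from the slide)
reject = t_stat > T_CRIT
print(round(d_bar, 3), round(s_d, 5), round(t_stat, 2), reject)
```

Working on the differences removes the user-to-user variation, which is why a mean difference of only .072 seconds still gives a large t statistic.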
Confidence Interval Estimate for $\mu_D$
of Two Dependent Samples
Assumptions
Both populations are normally distributed
Observations are matched or paired
Variance is unknown
$100(1 - \alpha)\%$ Confidence Interval Estimate:
$\bar{D} \pm t_{\alpha/2,\, n-1} \frac{S_D}{\sqrt{n}}$
Example
Assume you work in the finance department. You want
to construct a 95% confidence interval for the mean
difference in data entry time. You collect the following
processing times:
User    Existing System (1)    New Software (2)    Difference $D_i$
C.B.    9.98 Seconds           9.88 Seconds        .10
T.F.    9.88                   9.86                .02
M.H.    9.84                   9.75                .09
R.K.    9.99                   9.80                .19
M.O.    9.94                   9.87                .07
D.S.    9.84                   9.84                .00
S.S.    9.86                   9.87                -.01
C.T.    10.12                  9.98                .14
K.T.    9.90                   9.83                .07
S.Z.    9.91                   9.86                .05
Solution:
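$\bar{D} = \frac{\sum D_i}{n} = .072$
$S_D = \sqrt{\frac{\sum (D_i - \bar{D})^2}{n - 1}} = .06215$
$t_{\alpha/2,\, n-1} = t_{0.025,\, 9} = 2.2622$
$\bar{D} \pm t_{\alpha/2,\, n-1} \frac{S_D}{\sqrt{n}} = .072 \pm 2.2622 \frac{.06215}{\sqrt{10}}$
$0.0275 < \mu_D < 0.1165$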
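A quick numerical check of the interval, starting from the summary values on the slide. This is a sketch only; the critical value 2.2622 is taken from the slide's t table lookup and the variable names are illustrative.

```python
import math

d_bar, s_d, n = 0.072, 0.06215, 10  # summary values from the slide
T_CRIT = 2.2622                     # t_{0.025, 9}, from the slide

margin = T_CRIT * s_d / math.sqrt(n)
lower, upper = d_bar - margin, d_bar + margin
print(round(lower, 4), round(upper, 4))
```

The whole interval lies above zero, consistent with the one-tail test concluding that the new software is faster.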
F Test for Difference in Two
Population Variances
Test for the Difference in 2 Independent
Populations
Parametric Test Procedure
Assumptions
Both populations are normally distributed
Test is not robust to this violation
Samples are randomly and independently
drawn
The F Test Statistic
$F = \frac{S_1^2}{S_2^2}$
$S_1^2$ = Variance of Sample 1
$n_1 - 1$ = degrees of freedom (numerator)
$S_2^2$ = Variance of Sample 2
$n_2 - 1$ = degrees of freedom (denominator)
Developing the F Test
Hypotheses
$H_0: \sigma_1^2 = \sigma_2^2$
$H_1: \sigma_1^2 \neq \sigma_2^2$
Test Statistic
$F = S_1^2 / S_2^2$
Two Sets of Degrees of Freedom
$df_1 = n_1 - 1$; $df_2 = n_2 - 1$
Critical Values: $F_L(n_1 - 1, n_2 - 1)$ and $F_U(n_1 - 1, n_2 - 1)$
$F_L = 1 / F_U^*$
(*degrees of freedom switched)
Reject $H_0$ if $F < F_L$ or $F > F_U$ (area $\alpha/2$ in each tail); otherwise do not reject.
Developing the F Test
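Easier Way
Put the larger sample variance in the numerator.
Test Statistic
$F = S_1^2 / S_2^2$
Reject $H_0$ if $F$ exceeds the upper-tail critical value (tail area $\alpha/2$ for the two-tail test); otherwise do not reject.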
F Test: An Example
Assume you are a financial analyst for Charles Schwab. You
want to compare dividend yields between stocks listed on the
NYSE & NASDAQ. You collect the following data:
            NYSE    NASDAQ
Number      21      25
Mean        3.27    2.53
Std Dev     1.30    1.16
Is there a difference in the variances between the NYSE & NASDAQ at the $\alpha = 0.05$ level?
F Test: Example Solution
Finding the Critical Value for $\alpha = .05$ (two-tail test, larger variance in the numerator)
$df_1 = n_1 - 1 = 21 - 1 = 20$
$df_2 = n_2 - 1 = 25 - 1 = 24$
$F_{.025,\, 20,\, 24} = 2.33$
F Test: Example Solution
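$H_0: \sigma_1^2 = \sigma_2^2$
$H_1: \sigma_1^2 \neq \sigma_2^2$
$\alpha = .05$
$df_1 = 20$, $df_2 = 24$
Critical Value(s): reject $H_0$ if $F > 2.33$ (upper-tail area $\alpha/2 = .025$)
Test Statistic:
$F = \frac{S_1^2}{S_2^2} = \frac{1.30^2}{1.16^2} \approx 1.26$
Decision:
Do not reject at $\alpha = 0.05$.
Conclusion:
There is insufficient evidence to prove a difference in variances.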
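The "larger variance in the numerator" approach can be sketched as follows. The helper name `f_statistic` is hypothetical, and the upper critical value 2.33 is copied from the slide rather than computed, since the Python standard library has no F distribution.

```python
def f_statistic(s_a, n_a, s_b, n_b):
    """Return F with the larger sample variance in the numerator, plus (df_num, df_den)."""
    var_a, var_b = s_a**2, s_b**2
    if var_a >= var_b:
        return var_a / var_b, n_a - 1, n_b - 1
    return var_b / var_a, n_b - 1, n_a - 1

F_UPPER = 2.33  # upper critical value shown on the slide for df = (20, 24)

# NYSE: s = 1.30, n = 21; NASDAQ: s = 1.16, n = 25
f_stat, df_num, df_den = f_statistic(1.30, 21, 1.16, 25)
reject = f_stat > F_UPPER
print(round(f_stat, 3), df_num, df_den, reject)
```

Since the ratio of about 1.26 is well below the critical value, the equal-variance assumption used in the pooled t test is not contradicted by these data.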
F Test: One-Tail
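$H_0: \sigma_1^2 \ge \sigma_2^2$
$H_1: \sigma_1^2 < \sigma_2^2$
$\alpha = .05$
Reject $H_0$ if $F < F_{L(n_1 - 1,\, n_2 - 1)} = \frac{1}{F_{U(n_2 - 1,\, n_1 - 1)}}$ (degrees of freedom switched; area $\alpha = .05$ in the lower tail).
or
$H_0: \sigma_1^2 \le \sigma_2^2$
$H_1: \sigma_1^2 > \sigma_2^2$
$\alpha = .05$
Reject $H_0$ if $F > F_{U(n_1 - 1,\, n_2 - 1)}$ (area $\alpha = .05$ in the upper tail).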
F Test: One-Tail
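Easier Way
Put the larger sample variance in the numerator.
Test Statistic
$F = S_1^2 / S_2^2$
Reject $H_0$ if $F$ exceeds the upper-tail critical value (tail area $\alpha$); otherwise do not reject.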
Z Test for Differences in Two
Proportions (Independent Samples)
What is It Used For?
To determine whether there is a difference
between 2 population proportions and whether
one is larger than the other
Assumptions:
Independent samples
Population follows binomial distribution
Sample size large enough: $np \ge 5$ and $n(1 - p) \ge 5$ for each population
Z Test Statistic
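With the pooled estimates $\hat{p}_{Pd}$ and $\hat{q}_{Pd}$:
$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{\hat{p}_{Pd}\, \hat{q}_{Pd}}{n_1} + \frac{\hat{p}_{Pd}\, \hat{q}_{Pd}}{n_2}}}$
$\hat{p}_{Pd} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}$
$\hat{q}_{Pd} = \frac{n_1 \hat{q}_1 + n_2 \hat{q}_2}{n_1 + n_2}$
With separate sample estimates:
$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}}$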
The Hypotheses for the
Z Test
Hypothesis    Research Questions
              No Difference /       Prop 1 >= Prop 2 /    Prop 1 <= Prop 2 /
              Any Difference        Prop 1 < Prop 2       Prop 1 > Prop 2
H0            p1 - p2 = 0           p1 - p2 >= 0          p1 - p2 <= 0
H1            p1 - p2 ≠ 0           p1 - p2 < 0           p1 - p2 > 0
Z Test for Differences in Two
Proportions: Example
As personnel director, you
want to test the perception
of fairness of two methods
of performance evaluation.
63 of 78 employees rated
Method 1 as fair. 49 of 82
rated Method 2 as fair. At
the 0.01 significance level,
is there a difference in
perceptions?
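$\hat{p}_1 = \frac{63}{78} = 0.8077$, $n_1 = 78$
$\hat{p}_2 = \frac{49}{82} = 0.5976$, $n_2 = 82$
Sample-size check: $n\hat{p} \ge 5$ and $n(1 - \hat{p}) \ge 5$ hold for both samples.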
Calculating the Test Statistic
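$\hat{p}_{Pd} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2} = \frac{63 + 49}{78 + 82} = 0.70$
$\hat{q}_{Pd} = \frac{n_1 \hat{q}_1 + n_2 \hat{q}_2}{n_1 + n_2} = \frac{15 + 33}{78 + 82} = 0.30$
$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{\hat{p}_{Pd}\, \hat{q}_{Pd}}{n_1} + \frac{\hat{p}_{Pd}\, \hat{q}_{Pd}}{n_2}}} = \frac{(0.8077 - 0.5976) - 0}{\sqrt{\frac{(0.70)(0.30)}{78} + \frac{(0.70)(0.30)}{82}}} \approx 2.90$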
Z Test for Differences in Two
Proportions: Solution
$H_0: p_1 - p_2 = 0$
$H_1: p_1 - p_2 \neq 0$
$\alpha = 0.01$
$n_1 = 78$, $n_2 = 82$
Critical Value(s): $\pm 2.58$ (area .005 in each tail)
Test Statistic:
$Z \approx 2.90$
Decision:
Reject $H_0$ at $\alpha = 0.01$.
Conclusion:
There is evidence of a
difference in proportions.
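The test above can be reproduced from the raw counts. The sketch below is one possible implementation of the pooled-proportion Z statistic; the function name is illustrative, and the critical value 2.58 is the one quoted on the slide.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled Z statistic for H0: p1 - p2 = 0, from success counts and sample sizes."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # equals (n1*p1_hat + n2*p2_hat) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / standard_error

Z_CRIT = 2.58  # two-tail critical value for alpha = 0.01, from the slide

z = two_proportion_z(63, 78, 49, 82)
reject = abs(z) > Z_CRIT
print(round(z, 2), reject)
```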
Confidence Interval for Differences in
Two Proportions
The $100(1 - \alpha)\%$ Confidence Interval for Differences in Two Proportions
$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1 (1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2 (1 - \hat{p}_2)}{n_2}}$
Confidence Interval for Differences in Two
Proportions: Example
As personnel director, you
want to find out the
perception of fairness of
two methods of
performance evaluation. 63
of 78 employees rated
Method 1 as fair. 49 of 82
rated Method 2 as fair.
Construct a 99%
confidence interval for the
difference in two
proportions.
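$\hat{p}_1 = \frac{63}{78} = 0.8077$, $n_1 = 78$
$\hat{p}_2 = \frac{49}{82} = 0.5976$, $n_2 = 82$
Sample-size check: $n\hat{p} \ge 5$ and $n(1 - \hat{p}) \ge 5$ hold for both samples.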
Confidence Interval for Differences in Two
Proportions: Solution
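$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1 (1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2 (1 - \hat{p}_2)}{n_2}}$
$= (0.8077 - 0.5976) \pm 2.5758 \sqrt{\frac{0.8077 (1 - 0.8077)}{78} + \frac{0.5976 (1 - 0.5976)}{82}}$
$0.0294 < p_1 - p_2 < 0.3909$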
We are 99% confident that the difference between two
proportions is somewhere between 0.0294 and 0.3909.
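The interval can be verified from the counts with a few lines of Python. This is a sketch with illustrative variable names; the value 2.5758 is the normal critical value used on the slide. Unlike the hypothesis test, the interval uses the separate (unpooled) sample proportions in the standard error.

```python
import math

Z_CRIT = 2.5758  # z_{0.005} for a 99% interval, from the slide

x1, n1 = 63, 78  # Method 1: rated fair, sample size
x2, n2 = 49, 82  # Method 2: rated fair, sample size

p1_hat, p2_hat = x1 / n1, x2 / n2
standard_error = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
lower = (p1_hat - p2_hat) - Z_CRIT * standard_error
upper = (p1_hat - p2_hat) + Z_CRIT * standard_error
print(round(lower, 4), round(upper, 4))
```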
Z Test for Differences in Two
Proportions (Dependent Samples)
What is It Used For?
To determine whether there is a difference
between 2 population proportions and whether
one is larger than the other
Assumptions:
Dependent samples
Population follows binomial distribution
Sample size large enough: $np \ge 5$ and $n(1 - p) \ge 5$ for each population
Z Test Statistic for Dependent
Samples
$Z = \frac{a - d}{\sqrt{a + d}}$
                            Sample Two
Sample One      Category 1    Category 2    Total
Category 1      a             b             a+b
Category 2      c             d             c+d
Total           a+c           b+d           N
This Z can be used when a+d>10
If 10<a+d<20, use the correction in the text
Unit Summary
Compared Two Independent Samples
Performed Z test for the differences in two
means
Performed t test for the differences in two
means
Performed Z test for differences in two
proportions
Addressed F Test for Difference in Two Variances
Unit Summary
Compared Two Related Samples
Performed dependent sample Z tests for the
mean difference
Performed dependent sample t tests for the
mean difference
Performed Z tests for proportions using
dependent samples