Chapter 8
Inferences from Two Samples
Variation (one population)
6-4 Confidence interval for the population variance
7-5 Inferences about the standard deviation
8-1 Inferences about Two Means: Independent and LARGE Samples
8-2 Inferences about Two Means: Matched Pairs
8-3 Comparing Variation in Two Samples
8-4 Inferences about Two Means: Independent and SMALL Samples
8-5 Inferences about Two Proportions
1
Overview
There are many important and meaningful
situations in which it becomes necessary
to compare two sets of sample data.
2
6-4, 7-5, 8-3
Estimation and Inferences
Variation
• 6-4 Confidence interval for the population variance
• 7-5 Inferences about the standard deviation
• 8-3 Comparing Variation in Two Samples
3
Assumptions
1. The sample is a random sample.
2. The population must have
normally distributed values (even if the sample
is large).
5
Chi-Square Distribution
χ² = (n - 1)s² / σ²
where
n = sample size
s² = sample variance
σ² = population variance
6
χ² Critical Values Found in the Chi-Square Table
• Web Site
• Degrees of freedom (df) = n - 1
7
Properties of the Distribution of
the Chi-Square Statistic
1. The chi-square distribution is not symmetric, unlike
the normal and Student t distributions.
As the number of degrees of freedom increases, the
distribution becomes more symmetric. (continued)
[Figure: Chi-square distributions for df = 10 and df = 20 — not symmetric, all values nonnegative.]
8
Properties of Chi-Square
Distribution
All values of χ² are nonnegative, and the distribution is not symmetric.
There is a different distribution for each number of degrees of freedom.
The critical values are found in the Chi-Square Table using n - 1 degrees of freedom.
9
Chi-Square (χ²) Distribution
Degrees of freedom (df); each entry is the critical value with the given area to the right.

df     0.995    0.99   0.975    0.95    0.90     0.10     0.05    0.025     0.01    0.005
1         —       —    0.001   0.004   0.016    2.706    3.841    5.024    6.635    7.879
2      0.010   0.020   0.051   0.103   0.211    4.605    5.991    7.378    9.210   10.597
3      0.072   0.115   0.216   0.352   0.584    6.251    7.815    9.348   11.345   12.838
4      0.207   0.297   0.484   0.711   1.064    7.779    9.488   11.143   13.277   14.860
5      0.412   0.554   0.831   1.145   1.610    9.236   11.071   12.833   15.086   16.750
6      0.676   0.872   1.237   1.635   2.204   10.645   12.592   14.449   16.812   18.548
7      0.989   1.239   1.690   2.167   2.833   12.017   14.067   16.013   18.475   20.278
8      1.344   1.646   2.180   2.733   3.490   13.362   15.507   17.535   20.090   21.955
9      1.735   2.088   2.700   3.325   4.168   14.684   16.919   19.023   21.666   23.589
10     2.156   2.558   3.247   3.940   4.865   15.987   18.307   20.483   23.209   25.188
11     2.603   3.053   3.816   4.575   5.578   17.275   19.675   21.920   24.725   26.757
12     3.074   3.571   4.404   5.226   6.304   18.549   21.026   23.337   26.217   28.299
13     3.565   4.107   5.009   5.892   7.042   19.812   22.362   24.736   27.688   29.819
14     4.075   4.660   5.629   6.571   7.790   21.064   23.685   26.119   29.141   31.319
15     4.601   5.229   6.262   7.261   8.547   22.307   24.996   27.488   30.578   32.801
16     5.142   5.812   6.908   7.962   9.312   23.542   26.296   28.845   32.000   34.267
17     5.697   6.408   7.564   8.672  10.085   24.769   27.587   30.191   33.409   35.718
18     6.265   7.015   8.231   9.390  10.865   25.989   28.869   31.526   34.805   37.156
19     6.844   7.633   8.907  10.117  11.651   27.204   30.144   32.852   36.191   38.582
20     7.434   8.260   9.591  10.851  12.443   28.412   31.410   34.170   37.566   39.997
21     8.034   8.897  10.283  11.591  13.240   29.615   32.671   35.479   38.932   41.401
22     8.643   9.542  10.982  12.338  14.042   30.813   33.924   36.781   40.289   42.796
23     9.260  10.196  11.689  13.091  14.848   32.007   35.172   38.076   41.638   44.181
24     9.886  10.856  12.401  13.848  15.659   33.196   36.415   39.364   42.980   45.559
25    10.520  11.524  13.120  14.611  16.473   34.382   37.652   40.646   44.314   46.928
26    11.160  12.198  13.844  15.379  17.292   35.563   38.885   41.923   45.642   48.290
27    11.808  12.879  14.573  16.151  18.114   36.741   40.113   43.194   46.963   49.645
28    12.461  13.565  15.308  16.928  18.939   37.916   41.337   44.461   48.278   50.993
29    13.121  14.257  16.047  17.708  19.768   39.087   42.557   45.722   49.588   52.336
30    13.787  14.954  16.791  18.493  20.599   40.256   43.773   46.979   50.892   53.672
40    20.707  22.164  24.433  26.509  29.051   51.805   55.758   59.342   63.691   66.766
50    27.991  29.707  32.357  34.764  37.689   63.167   67.505   71.420   76.154   79.490
60    35.534  37.485  40.482  43.188  46.459   74.397   79.082   83.298   88.379   91.952
70    43.275  45.442  48.758  51.739  55.329   85.527   90.531   95.023  100.425  104.215
80    51.172  53.540  57.153  60.391  64.278   96.578  101.879  106.629  112.329  116.321
90    59.196  61.754  65.647  69.126  73.291  107.565  113.145  118.136  124.116  128.299
100   67.328  70.065  74.222  77.929  82.358  118.498  124.342  129.561  135.807  140.169
10
Critical Values: Table
Example: for df = 9 with an area of 0.025 in each tail, the left critical value is χ²L = 2.700 (area 0.975 to its right) and the right critical value is χ²R = 19.023 (area 0.025 to its right).
11
Estimators of σ²
The sample variance s² is the best point estimate of the population variance σ².
12
Confidence Interval for the Population Variance σ²
(n - 1)s² / χ²R < σ² < (n - 1)s² / χ²L
where χ²R is the right-tail critical value and χ²L is the left-tail critical value.
Confidence Interval for the Population Standard Deviation σ
√[(n - 1)s² / χ²R] < σ < √[(n - 1)s² / χ²L]
13
Example
Find the confidence interval for IQ scores of professional athletes (assume the population has a normal distribution).
1 - α = 0.90
n = 12
x̄ = 104
s = 12
14
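As a check on this example, the interval can be computed with a short Python sketch (assuming the scipy library is available; the numbers are the ones given above):

    from scipy import stats

    # 90% confidence interval for sigma, IQ example: n = 12, s = 12
    n, s, conf = 12, 12, 0.90
    alpha = 1 - conf
    df = n - 1

    chi2_R = stats.chi2.ppf(1 - alpha / 2, df)   # right-tail critical value
    chi2_L = stats.chi2.ppf(alpha / 2, df)       # left-tail critical value

    var_lo = (df * s**2) / chi2_R
    var_hi = (df * s**2) / chi2_L
    print(f"sigma^2: ({var_lo:.1f}, {var_hi:.1f})")
    print(f"sigma:   ({var_lo**0.5:.1f}, {var_hi**0.5:.1f})")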
Roundoff Rule for Confidence Interval
Estimates of σ or σ²
1. When using the original set of data to construct a
confidence interval, round the confidence interval
limits to one more decimal place than is used for
the original set of data.
2. When the original set of data is unknown and
only the summary statistics (n, s) are used, round
the confidence interval limits to the same number
of decimal places used for the sample standard
deviation or variance.
15
Estimation and Inferences
Variation
• 6-4 Confidence interval for the population variance
• 7-5 Inferences about the standard deviation
• 8-3 Comparing Variation in Two Samples
16
Chi-Square Distribution
Test Statistic
χ² = (n - 1)s² / σ²
where
n = sample size
s² = sample variance
σ² = population variance (given in the null hypothesis)
17
Example: Aircraft altimeters have measuring errors with a standard
deviation of 43.7 ft. With new production equipment, 81 altimeters
measure errors with a standard deviation of 52.3 ft. Use the 0.05
significance level to test the claim that the new altimeters have a
standard deviation different from the old value of 43.7 ft.
Claim: σ ≠ 43.7
H0: σ = 43.7
H1: σ ≠ 43.7
α = 0.05, so α/2 = 0.025 in each tail
n = 81, df = 80
Use the table: with 0.025 in each tail, the critical values are 57.153 and 106.629.
18
χ² = (n - 1)s² / σ² = (81 - 1)(52.3)² / 43.7² ≈ 114.586
Since χ² = 114.586 falls above the right critical value of 106.629 (rejection region: below 57.153 or above 106.629), reject H0.
The sample evidence supports the claim that the
standard deviation is different from 43.7 ft.
19
Again, χ² ≈ 114.586 falls in the rejection region (above 106.629), so reject H0.
The new production method appears to be worse than the old method: the data support the conclusion that there is more variation in the error readings than before.
20
P-Value Method
Table A-4 includes only selected values of α, so specific P-values usually cannot be found.
Some calculators and computer programs will find exact P-values.
Don't worry about finding P-values; just know how to read them.
21
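For instance, the exact P-value for the altimeter test can be found with a short Python sketch (assuming the scipy library is available; the critical values match the table used above):

    from scipy import stats

    # Altimeter example: claim sigma != 43.7 ft, n = 81, s = 52.3, alpha = 0.05
    n, s, sigma0, alpha = 81, 52.3, 43.7, 0.05
    df = n - 1

    chi2_stat = df * s**2 / sigma0**2              # test statistic, about 114.586
    crit_lo = stats.chi2.ppf(alpha / 2, df)        # about 57.153
    crit_hi = stats.chi2.ppf(1 - alpha / 2, df)    # about 106.629
    # two-tailed P-value
    p = 2 * min(stats.chi2.cdf(chi2_stat, df), stats.chi2.sf(chi2_stat, df))
    print(chi2_stat, (crit_lo, crit_hi), p)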
8-1
Inferences about Two Means:
Independent and
LARGE Samples
22
Definitions
Two Samples: Independent
The sample values selected from one
population are not related or somehow paired
with the sample values selected from the
other population.
If the values in one sample are related to the
values in the other sample, the samples are
dependent. Such samples are often referred
to as matched pairs or paired samples.
23
Assumptions
1. The two samples are independent.
2. The two sample sizes are large. That is,
n1 > 30 and n2 > 30.
3. Both samples are random samples.
24
TI – 83 Procedure(s) for this
Section
2-SampZInt
2-SampZTest
Note: Must know σ1 and σ2
25
Hypothesis Tests
Test Statistic for Two Means:
Independent and Large Samples
z* = [(x̄1 - x̄2) - (µ1 - µ2)] / √(σ1²/n1 + σ2²/n2)
26
Hypothesis Tests
Null Hypothesis
H0: µ1 = µ2 or H0: µ1 - µ2 = 0
Alternative Hypothesis
H1: µ1 ≠ µ2
H1: µ1 > µ2
H1: µ1 < µ2
27
Hypothesis Tests
Test Statistic for Two Means:
Independent and Large Samples
Procedure
If σ1 and σ2 are not known, use s1 and s2 in their places, provided that both samples are large.
Decision:
Use the computed value of the test statistic z,
the critical values and either the traditional or
P-value method to draw your conclusion.
28
Weights of Men
         Men aged 25-34   Men aged 65-74
n             804              1657
x̄             176               164
σ              35                27
Sometimes it is better to use a subscript that reflects something about each population instead of just 1 or 2 — for example, o (older) and y (younger).
29
Weights of Men
Claim: the average weight of older men is less than the average weight of younger men; that is, µo < µy.
H0: µo = µy
H1: µo < µy
α = 0.01
Critical value: z = -2.33. Reject H0 if the test statistic falls to the left of z = -2.33; otherwise fail to reject H0. (The sampling distribution is centered at µo - µy = 0, i.e., z = 0.)
30
Weights of Men
Test Statistic for Two Means:
Independent and Large Samples
z* = [(164 - 176) - 0] / √(27²/1657 + 35²/804) = -8.56
31
Weights of Men
Claim: µo < µy
H0: µo = µy
H1: µo < µy
α = 0.01
The test statistic z* = -8.56 falls in the rejection region (to the left of the critical value z = -2.33), so reject H0.
32
Weights of Men
Claim: µo < µy
H0: µo = µy
H1: µo < µy
α = 0.01
What about the P-value? P(z < -8.56) is extremely small (essentially 0).
33
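A short Python sketch of this test (assuming the scipy library is available; the summary statistics are those in the table above):

    import math
    from scipy import stats

    # Weights of men: older (o) vs. younger (y)
    n_o, xbar_o, sigma_o = 1657, 164, 27
    n_y, xbar_y, sigma_y = 804, 176, 35

    se = math.sqrt(sigma_o**2 / n_o + sigma_y**2 / n_y)
    z = ((xbar_o - xbar_y) - 0) / se      # about -8.56
    p = stats.norm.cdf(z)                 # left-tailed P-value, essentially 0
    print(z, p)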
Confidence Intervals
(x̄1 - x̄2) - E < (µ1 - µ2) < (x̄1 - x̄2) + E
where E = z(α/2) · √(σ1²/n1 + σ2²/n2)
34
Confidence Intervals
E = 2.575 (1.401) = 3.6
12 – 3.6 < (µy - µo) < 12 + 3.6
8.4 < (µy - µo) < 15.6
Can we use this confidence interval to test equality of
means? Set up the hypothesis.
Note: When calculating x̄1 - x̄2 for the interval, label the samples so that the difference is positive.
35
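The same interval can be reproduced with a short Python sketch (assuming the scipy library is available):

    import math
    from scipy import stats

    n_o, xbar_o, sigma_o = 1657, 164, 27
    n_y, xbar_y, sigma_y = 804, 176, 35

    z_crit = stats.norm.ppf(0.995)                                 # 2.575 for 99% confidence
    E = z_crit * math.sqrt(sigma_y**2 / n_y + sigma_o**2 / n_o)    # about 3.6
    diff = xbar_y - xbar_o                                         # 12
    print(diff - E, diff + E)                                      # about (8.4, 15.6)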
8-2
Inferences about Two Means:
Matched Pairs
36
Assumptions
1. The sample data consist of matched pairs.
2. The samples are random samples.
3. If the number of pairs of sample data is small (n ≤ 30), then the population of differences in the paired values must be approximately normally distributed.
37
TI – 83 Procedure(s) for this
Section
T-Test
T-Int
Note: (put differences in L1)
38
Notation for Matched Pairs
µd = mean value of the differences d for the population of paired data
d̄ = mean value of the differences d for the paired sample data (equal to the mean of the x - y values)
sd = standard deviation of the differences d for the paired sample data
n = number of pairs of data
39
Test Statistic for Matched Pairs of Sample Data
t* = (d̄ - µd) / (sd / √n)
where degrees of freedom = n - 1
40
Hypothesis Tests
Null Hypothesis
H0: µd = 0
Alternative Hypothesis
H1: µd ≠ 0
H1: µd > 0
H1: µd < 0
41
Critical Values
If n ≤ 30, critical values are found in Table A-3 (t distribution).
If n > 30, critical values are found in Table A-2 (normal distribution).
42
Confidence Intervals
d̄ - E < µd < d̄ + E
where E = t(α/2) · sd / √n
degrees of freedom = n - 1
43
SAT Scores – HW #7
d̄ = 11
sd = 20.24846
n = 10
tα = -1.833 (found from Table A-3 with 9 degrees of freedom and 0.05 in one tail)
Set up the hypothesis, sketch, and test the claim.
44
Confidence Interval
E = t(α/2) · sd / √n = (2.262)(20.24846 / √10) ≈ 14.5
45
Confidence Interval
-3.5 < µd < 25.5
In the long run, 95% of such samples will lead to
confidence intervals that actually do contain the true
population mean of the differences.
Using the confidence interval to test the claim
Since the interval does contain 0, we FAIL TO REJECT H0. There is not sufficient evidence to support the claim that there is a difference between the scores for those students who took the preparatory course and those who didn't.
46
Test Problem from this section
will have data similar to
                A    B    C    D    E
Sample 1       75   76   76   79   76
Sample 2       79   85   74   82   88
Difference      4    9   -2    3   12
(differences taken as Sample 2 - Sample 1)
47
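A short Python sketch of the matched-pairs calculation on these differences (assuming the scipy library is available):

    import statistics as st
    from scipy import stats

    d = [4, 9, -2, 3, 12]                 # differences (Sample 2 - Sample 1)
    n = len(d)
    d_bar = st.mean(d)                    # mean of the differences
    s_d = st.stdev(d)                     # sample standard deviation of the differences
    t_star = d_bar / (s_d / n**0.5)       # test statistic, df = n - 1
    p_two_tailed = 2 * stats.t.sf(abs(t_star), n - 1)
    print(d_bar, s_d, t_star, p_two_tailed)

    # TI-83 equivalent: put the differences in L1 and run T-Test.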
Estimation and Inferences
Variation
• 6-4 Confidence interval for the population variance
• 7-5 Inferences about the standard deviation
• 8-3 Comparing Variation in Two Samples
48
Measures of Variation
s = standard deviation of sample
σ = standard deviation of population
s² = variance of sample
σ² = variance of population
49
Comparing Variation in
Two Samples
Assumptions
1. The two populations are
independent of each other.
2. The two populations are each
normally distributed.
50
TI – 83 Procedure(s) for this
Section
2-SampFTest
51
Notation for Hypothesis Tests
with Two Variances
s1² = larger of the two sample variances
n1 = size of the sample with the larger variance
σ1² = variance of the population from which the sample with the larger variance was drawn
The symbols s2², n2, and σ2² are used for the other sample and population.
52
F - distribution
Not symmetric (skewed to the right)
Nonnegative values only
There is a different F distribution for each different pair of degrees of freedom for the numerator and denominator.
53
Critical Values
Critical Values: Using F-Table, we obtain critical F
values that are determined by the following three
values:
1. The significance level α.
2. Numerator degrees of freedom (df1) = n1 - 1
3. Denominator degrees of freedom (df2) = n2 - 1
54
• All one-tailed tests will be right-tailed.
• All two-tailed tests will need only the critical value to the right.
Since T-5 only lists certain values of the degrees of freedom, you may have to interpolate or at least approximate the F value.
Excel will give you the exact value.
55
Table: F-Distribution (for a specified α value)
Rows give the denominator degrees of freedom (df2 = 1, 2, 3, 4, 5, ...); columns give the numerator degrees of freedom (df1 = 1, 2, 3, 4, ...).
56
Let’s find some critical values
1. 1 - α = 0.95, n1 = 10, n2 = 20
2. 1 - α = 0.90, n1 = 10, n2 = 20
3. 1 - α = 0.95, n1 = 20, n2 = 10
See the Excel calculator.
57
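These critical values can also be checked with a short Python sketch (assuming the scipy library is available):

    from scipy import stats

    # F critical values for the three cases above (right-tail area = alpha)
    for conf, n1, n2 in [(0.95, 10, 20), (0.90, 10, 20), (0.95, 20, 10)]:
        alpha = 1 - conf
        df1, df2 = n1 - 1, n2 - 1          # numerator, denominator degrees of freedom
        print(conf, stats.f.ppf(1 - alpha, df1, df2))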
Test Statistic for Hypothesis Tests with
Two Variances
F = s1² / s2²
where s1² is the larger of the two sample variances
58
If the two populations do have equal variances, then F = s1²/s2² will be close to 1 because s1² and s2² are close in value.
If the two populations have radically different variances, then F = s1²/s2² will be a large number.
Remember, the larger sample variance will be s1².
59
Consequently, a value of F near 1 will be evidence in favor of the conclusion that σ1² = σ2².
But a large value of F will be evidence against the conclusion of equality of the population variances.
60
Example
Comparison of Filtered and Non Filtered
Cigarettes
         Filtered   Non Filtered
n           21           8
x̄          0.94        1.65
s          0.31        0.16
61
Example
Claim: σ1² = σ2²
H0: σ1² = σ2²
H1: σ1² ≠ σ2²
α = 0.1 (use 0.05 in the table, since the test is two-tailed)
62
Example
Claim: σ1² = σ2²
H0: σ1² = σ2²
H1: σ1² ≠ σ2²
α = 0.1
Critical value: F ≈ 3.4445 (reject H0 for values of F above it).
Test statistic: F = s1²/s2² = 0.31²/0.16² = 3.7539
63
Example
Claim: σ1² = σ2²
H0: σ1² = σ2²
H1: σ1² ≠ σ2²
α = 0.1
Critical value: F ≈ 3.4445; sample data: F = 3.7539, which falls in the rejection region, so reject H0.
There is sufficient evidence to warrant rejection of
the claim that the two variances are equal.
64
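A short Python sketch of this F test (assuming the scipy library is available; it also gives the exact P-value that the table cannot):

    from scipy import stats

    # Filtered vs. non-filtered cigarettes (summary statistics from the slide)
    n1, s1 = 21, 0.31          # sample with the larger variance
    n2, s2 = 8, 0.16
    alpha = 0.10

    F = s1**2 / s2**2                                   # about 3.7539
    crit = stats.f.ppf(1 - alpha / 2, n1 - 1, n2 - 1)   # about 3.44
    p = 2 * stats.f.sf(F, n1 - 1, n2 - 1)               # two-tailed P-value
    print(F, crit, p)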
8-4
Inferences about Two Means:
Independent and
Small Samples
65
Inferences about Two Means:
Independent and Small Samples
Assumptions
1. The two samples are independent.
2. The two samples are random samples from
normally distributed populations.
3. At least one of the two samples is small (n ≤ 30).
66
TI – 83 Procedure(s) for this
Section
2-SampTInt
2-SampTTest
Note: Must consider pooled variances
67
Three Different Procedures
Case 1: The values of both population variances
are known. (This case seldom occurs.)
Case 2: The two populations appear to have equal variances. (That is, σ1² = σ2².)
Case 3: The two populations appear to have unequal variances. (That is, σ1² ≠ σ2².)
68
Case 1:
Both Population Variances
are Known
Almost never occurs, as it is necessary to know all the values of both populations so that the values of σ1 and σ2 are known. Use the method for large samples.
69
Choosing between
Case 2 and Case 3:
Apply the F test described in Section 8-3 to test the null hypothesis that σ1² = σ2².
Use the conclusion of that test as follows:
Fail to reject σ1² = σ2²: assume equal population variances (Case 2).
70
Choosing between
Case 2 and Case 3:
Apply the F test described in Section 8-3 to test the null hypothesis that σ1² = σ2².
Use the conclusion of that test as follows:
Fail to reject σ1² = σ2²: assume equal population variances (Case 2).
Reject σ1² = σ2²: assume unequal population variances (Case 3).
71
Case 2:
The Two Populations Appear to have
Equal Variances
• A pooled estimate of the variance σ² that is common to both populations, denoted by sp², is calculated.
• sp² is a weighted average of s1² and s2².
72
Case 2:
Test Statistic
(Small Independent Samples and Equal Variances)
t* = [(x̄1 - x̄2) - (µ1 - µ2)] / √(sp²/n1 + sp²/n2)
where
sp² = [(n1 - 1)s1² + (n2 - 1)s2²] / [(n1 - 1) + (n2 - 1)]
degrees of freedom df = n1 + n2 - 2
73
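A short Python sketch of the Case 2 (pooled) calculation (assuming the scipy library is available; the summary statistics passed in at the bottom are made up purely for illustration):

    import math
    from scipy import stats

    def pooled_t(x1bar, s1, n1, x2bar, s2, n2):
        # Case 2: small independent samples, equal population variances assumed
        sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)   # pooled variance
        t = (x1bar - x2bar) / math.sqrt(sp2 / n1 + sp2 / n2)
        df = n1 + n2 - 2
        return t, df, 2 * stats.t.sf(abs(t), df)      # two-tailed P-value

    print(pooled_t(75, 4.0, 12, 72, 3.5, 15))         # hypothetical summary statistics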
Case 2:
Confidence Interval
(Small Independent Samples and Equal Variances)
(x̄1 - x̄2) - E < (µ1 - µ2) < (x̄1 - x̄2) + E
where E = t(α/2) · √(sp²/n1 + sp²/n2)
and sp² is as given in the test statistic
74
Case 3:
Unequal Population Variances
75
Case 3:
Test Statistic
(Small Independent Samples and Unequal Variances)
An approximate method
t* = [(x̄1 - x̄2) - (µ1 - µ2)] / √(s1²/n1 + s2²/n2)
where df = smaller of n1 - 1 and n2 - 1
76
Case 3:
Confidence Interval
(Small Independent Samples and Unequal Variances)
(x̄1 - x̄2) - E < (µ1 - µ2) < (x̄1 - x̄2) + E
where E = t(α/2) · √(s1²/n1 + s2²/n2)
and df = smaller of n1 - 1 and n2 - 1
77
More exact results are obtained using
df = (A + B)² / [A²/(n1 - 1) + B²/(n2 - 1)]
where A = s1²/n1 and B = s2²/n2.
This is known as the Fisher-Behrens test.
It is still an approximation, just a better one.
78
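A short Python sketch of the Case 3 calculation with the more exact df formula above; it reuses the cigarette summary statistics from Section 8-3 purely as an illustration (assuming the scipy library is available):

    import math
    from scipy import stats

    def welch_t(x1bar, s1, n1, x2bar, s2, n2):
        # Case 3: unequal variances; df from the A/B formula above
        A, B = s1**2 / n1, s2**2 / n2
        t = (x1bar - x2bar) / math.sqrt(A + B)
        df = (A + B)**2 / (A**2 / (n1 - 1) + B**2 / (n2 - 1))
        return t, df, 2 * stats.t.sf(abs(t), df)      # two-tailed P-value

    print(welch_t(0.94, 0.31, 21, 1.65, 0.16, 8))     # cigarette data from Section 8-3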
Inferences about the Means of Two Populations
[Flowchart: choosing among Sections 8-2 (matched pairs), 8-1 (independent, large samples), 8-4 (independent, small samples), and 8-4 Case 2 (pooled variances).]
79
8-5
Inferences about Two Proportions
80
Inferences about Two Proportions
Assumptions
1. We have proportions from two
independent simple random samples.
2. For both samples, the conditions np ≥ 5 and nq ≥ 5 are satisfied.
81
TI – 83 Procedure(s) for this
Section
2-PropZInt
2-PropZTest
82
Notation for Two Proportions
For population 1, we let:
p1 = population proportion
n1 = size of the sample
x1 = number of successes in the sample
p̂1 = x1/n1 (the sample proportion)
q̂1 = 1 - p̂1
The corresponding meanings are attached to p2, n2, x2, p̂2, and q̂2, which come from population 2.
83
Pooled Estimate of p1 and p2
• The pooled estimate of p1 and p2 is denoted by p̄.
p̄ = (x1 + x2) / (n1 + n2)
q̄ = 1 - p̄
84
Test Statistic for Two Proportions
For H0: p1 = p2, H0: p1 ≤ p2, H0: p1 ≥ p2
    H1: p1 ≠ p2, H1: p1 < p2, H1: p1 > p2
z* = [(p̂1 - p̂2) - (p1 - p2)] / √(p̄q̄/n1 + p̄q̄/n2)
85
Test Statistic for Two Proportions
For H0: p1 = p2, H0: p1 ≤ p2, H0: p1 ≥ p2
    H1: p1 ≠ p2, H1: p1 < p2, H1: p1 > p2
p1 - p2 = 0 (assumed in the null hypothesis)
where
p̂1 = x1/n1 and p̂2 = x2/n2
p̄ = (x1 + x2) / (n1 + n2) and q̄ = 1 - p̄
86
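A short Python sketch of the two-proportion test (assuming the scipy library is available; the counts at the bottom are made up purely for illustration):

    import math
    from scipy import stats

    def two_prop_z(x1, n1, x2, n2):
        # Pooled two-proportion z test (H0: p1 = p2)
        p1_hat, p2_hat = x1 / n1, x2 / n2
        p_bar = (x1 + x2) / (n1 + n2)
        q_bar = 1 - p_bar
        z = (p1_hat - p2_hat) / math.sqrt(p_bar * q_bar / n1 + p_bar * q_bar / n2)
        return z, 2 * stats.norm.sf(abs(z))        # two-tailed P-value

    print(two_prop_z(45, 200, 30, 180))            # hypothetical counts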
Confidence Interval Estimate of p1 - p2
(p̂1 - p̂2) - E < (p1 - p2) < (p̂1 - p̂2) + E
where E = z(α/2) · √(p̂1q̂1/n1 + p̂2q̂2/n2)
87