Transcript Document

Chapter 9: Inferences Based on Two Samples
Copyright (c) 2004 Brooks/Cole, a division of Thomson Learning, Inc.
9.1 z Tests and Confidence Intervals for a Difference Between Two Population Means
The Difference Between Two Population Means

New notation and assumptions:
1. $X_1,\ldots,X_m$ is a random sample from a population with mean $\mu_1$ and variance $\sigma_1^2$ ($m$: sample size 1).
2. $Y_1,\ldots,Y_n$ is a random sample from a population with mean $\mu_2$ and variance $\sigma_2^2$ ($n$: sample size 2).
3. The X and Y samples are independent of one another.
Expected Value and Standard Deviation of $\bar{X} - \bar{Y}$

The expected value is $\mu_1 - \mu_2$, so $\bar{X} - \bar{Y}$ is an estimator of $\mu_1 - \mu_2$. Think of $\mu_1 - \mu_2$ as the parameter. The standard deviation is

$$\sigma_{\bar{X}-\bar{Y}} = \sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}$$
Test Procedures for Normal Populations With Known Variances

Null hypothesis: $H_0: \mu_1 - \mu_2 = \Delta_0$ ($\Delta_0 = 0$ means "the means are the same").

Test statistic value:

$$z = \frac{\bar{x} - \bar{y} - \Delta_0}{\sqrt{\dfrac{\sigma_1^2}{m} + \dfrac{\sigma_2^2}{n}}}$$
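As a quick numeric sketch of this test (the function name and the sample numbers are mine, not from the slides), assuming both population variances are known:

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(xbar, ybar, delta0, var1, m, var2, n):
    """z statistic for H0: mu1 - mu2 = delta0 with known variances."""
    se = sqrt(var1 / m + var2 / n)   # standard deviation of Xbar - Ybar
    return (xbar - ybar - delta0) / se

# Illustrative numbers (not from the text):
z = two_sample_z(xbar=5.0, ybar=4.0, delta0=0.0, var1=4.0, m=100, var2=9.0, n=100)
# Two-sided P-value from the standard normal cdf:
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
```

At level .05 this z would fall in the two-tailed rejection region $|z| \geq z_{.025} = 1.96$.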
 () = P(Type II Error)
 (  1  2 )
Alt. Hypothesis
Ha : 1  2  0
Ha : 1  2  0
Ha : 1  2  0
Similar to p. 330
formulas
   0 

  z 

 

   0 

1     z 




   0 

  z / 2 

 

   0 

   z / 2 

 

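The upper-tailed case can be sketched in a few lines (helper name and numbers are illustrative, not from the slides):

```python
from math import sqrt
from statistics import NormalDist

def beta_upper(delta_prime, delta0, var1, m, var2, n, alpha=0.05):
    """P(Type II error) at mu1 - mu2 = delta_prime for the upper-tailed z test."""
    sigma = sqrt(var1 / m + var2 / n)           # sigma of Xbar - Ybar
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # upper-alpha critical value
    return NormalDist().cdf(z_alpha - (delta_prime - delta0) / sigma)
```

As expected, $\beta$ shrinks as the true difference $\Delta'$ moves further from $\Delta_0$.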
Large-Sample Tests

The assumptions of normal population distributions and known values of $\sigma_1, \sigma_2$ are unnecessary. The Central Limit Theorem guarantees that $\bar{X} - \bar{Y}$ has approximately a normal distribution. Rule of thumb: both $m, n > 40$.
Large-Sample Tests

Use of the test statistic value

$$z = \frac{\bar{x} - \bar{y} - \Delta_0}{\sqrt{\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}}}$$

($\Delta_0$ is usually zero; requires $m, n > 40$) along with the previously stated rejection regions based on z critical values gives large-sample tests whose significance levels are approximately $\alpha$.
Confidence Interval for $\mu_1 - \mu_2$

Provided m and n are large, a CI for $\mu_1 - \mu_2$ with a confidence level of $100(1-\alpha)\%$ is

$$\bar{x} - \bar{y} \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}$$

One-sided confidence bounds can be found by replacing $z_{\alpha/2}$ by $z_\alpha$.
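A minimal sketch of this large-sample interval (function name and inputs are mine, not from the slides):

```python
from math import sqrt
from statistics import NormalDist

def large_sample_ci(xbar, ybar, s1, m, s2, n, conf=0.95):
    """Large-sample CI for mu1 - mu2 (rule of thumb: m, n > 40)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    half = z * sqrt(s1**2 / m + s2**2 / n)
    center = xbar - ybar
    return center - half, center + half

# Illustrative numbers:
lo, hi = large_sample_ci(5.0, 4.0, 2.0, 100, 3.0, 100)
```

The interval is centered at $\bar{x} - \bar{y}$, the point estimate of $\mu_1 - \mu_2$.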
9.2 The Two-Sample t Test and Confidence Interval
Assumptions
Both populations are normal, so that
X1,…,Xm is a random sample from a
normal distribution and so is Y1,…,Yn.
The plausibility of these assumptions can
be judged by constructing a normal
probability plot of the xi’s and another of
the yi’s.
Normality assumption important for (small-sample) t-tests!
t Distribution

When the population distributions are both normal, the standardized variable

$$T = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{m} + \dfrac{S_2^2}{n}}}$$

has approximately a t distribution…
t Distribution

The df $\nu$ can be estimated from the data by

$$\nu = \frac{\left(\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}\right)^2}{\dfrac{(s_1^2/m)^2}{m-1} + \dfrac{(s_2^2/n)^2}{n-1}}$$

(round down to the nearest integer). Yuck! Don't do this by hand if you can help it.
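Since the slides say not to compute $\nu$ by hand, here is a direct transcription of the formula (helper name is mine):

```python
from math import floor

def welch_df(s1, m, s2, n):
    """Estimated df for the two-sample t statistic, rounded down to an integer."""
    a = s1**2 / m
    b = s2**2 / n
    nu = (a + b) ** 2 / (a**2 / (m - 1) + b**2 / (n - 1))
    return floor(nu)

df = welch_df(1.0, 10, 3.0, 10)
```

When the two sample variances (and sizes) are equal, $\nu$ reduces to $m + n - 2$; otherwise it is smaller.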
Two-Sample CI for $\mu_1 - \mu_2$

The two-sample t CI for $\mu_1 - \mu_2$ with a confidence level of $100(1-\alpha)\%$ is

$$\bar{x} - \bar{y} \pm t_{\alpha/2,\nu}\sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}$$
Two-Sample t Test

Null hypothesis: $H_0: \mu_1 - \mu_2 = \Delta_0$ ($\Delta_0$ is usually zero).

Test statistic value:

$$t = \frac{\bar{x} - \bar{y} - \Delta_0}{\sqrt{\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}}}$$
The Two-Sample t Test

Alternative Hypothesis — Rejection Region for Approx. Level $\alpha$ Test:
$H_a: \mu_1 - \mu_2 > \Delta_0$:  $t \geq t_{\alpha,\nu}$
$H_a: \mu_1 - \mu_2 < \Delta_0$:  $t \leq -t_{\alpha,\nu}$
$H_a: \mu_1 - \mu_2 \neq \Delta_0$:  $t \geq t_{\alpha/2,\nu}$ or $t \leq -t_{\alpha/2,\nu}$
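The statistic and its estimated df can be computed together from raw samples; a minimal stdlib sketch (helper name and data are illustrative, not from the slides):

```python
from math import floor, sqrt
from statistics import mean, stdev

def two_sample_t(x, y, delta0=0.0):
    """Two-sample t statistic and estimated df for H0: mu1 - mu2 = delta0."""
    m, n = len(x), len(y)
    v1, v2 = stdev(x) ** 2, stdev(y) ** 2       # sample variances s1^2, s2^2
    t = (mean(x) - mean(y) - delta0) / sqrt(v1 / m + v2 / n)
    nu = (v1 / m + v2 / n) ** 2 / (
        (v1 / m) ** 2 / (m - 1) + (v2 / n) ** 2 / (n - 1)
    )
    return t, floor(nu)

t, df = two_sample_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

Compare |t| against the $t_{\alpha,\nu}$ (or $t_{\alpha/2,\nu}$) critical value from the table above.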
Pooled t Procedures

Important: the pooled t assumes equal variances! Assume the two populations are normal and have equal variances. If $\sigma^2$ denotes the common variance, it can be estimated by combining information from the two samples. Standardizing $\bar{X} - \bar{Y}$ using the pooled estimator gives a t variable based on $m + n - 2$ df.
Pooled Sample Variance

$$S_p^2 = \frac{m-1}{m+n-2}\,S_1^2 + \frac{n-1}{m+n-2}\,S_2^2$$

Usage in formulas:

$$\sqrt{\frac{S_1^2}{m} + \frac{S_2^2}{n}} \quad\text{becomes}\quad \sqrt{\frac{S_p^2}{m} + \frac{S_p^2}{n}} = S_p\sqrt{\frac{1}{m} + \frac{1}{n}}$$
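The pooled estimator and its standard error translate directly into code (helper names are mine, not from the slides):

```python
from math import sqrt

def pooled_variance(s1, m, s2, n):
    """Pooled estimate of the common variance sigma^2 (equal-variance assumption)."""
    return ((m - 1) * s1**2 + (n - 1) * s2**2) / (m + n - 2)

def pooled_se(s1, m, s2, n):
    """Standard error of Xbar - Ybar using the pooled variance: Sp*sqrt(1/m + 1/n)."""
    return sqrt(pooled_variance(s1, m, s2, n) * (1 / m + 1 / n))
```

Note the weights $(m-1)$ and $(n-1)$: the larger sample gets more say in the pooled estimate.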
9.3 Analysis of Paired Data
Paired Data (Assumptions)

Important: a natural pairing must exist! The data consist of n independently selected pairs $(X_1, Y_1),\ldots,(X_n, Y_n)$, with $E(X_i) = \mu_1$ and $E(Y_i) = \mu_2$. Let $D_1 = X_1 - Y_1, \ldots, D_n = X_n - Y_n$. The $D_i$'s are assumed to be normally distributed with mean value $\mu_D$ and variance $\sigma_D^2$. Bottom line: the two-sample problem becomes a one-sample problem!
The Paired t Test

Null hypothesis: $H_0: \mu_D = \Delta_0$ ($\Delta_0$ is usually zero).

Test statistic value:

$$t = \frac{\bar{d} - \Delta_0}{s_D/\sqrt{n}}$$

where $\bar{d}$ and $s_D$ are the sample mean and standard deviation of the $d_i$'s.
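Because the paired test is just a one-sample t on the differences, it is a few lines of code (helper name and data are illustrative, not from the slides):

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(x, y, delta0=0.0):
    """Paired t statistic: a one-sample t on the differences d_i = x_i - y_i."""
    d = [xi - yi for xi, yi in zip(x, y)]   # reduce to one sample
    n = len(d)
    return (mean(d) - delta0) / (stdev(d) / sqrt(n))

t = paired_t([5, 6, 8, 9], [4, 4, 5, 6])
```

Only the differences enter the computation; the original x and y values are never used separately.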
The Paired t Test

Nothing new here! Alternative Hypothesis — Rejection Region for Level $\alpha$ Test:
$H_a: \mu_D > \Delta_0$:  $t \geq t_{\alpha,n-1}$
$H_a: \mu_D < \Delta_0$:  $t \leq -t_{\alpha,n-1}$
$H_a: \mu_D \neq \Delta_0$:  $t \geq t_{\alpha/2,n-1}$ or $t \leq -t_{\alpha/2,n-1}$
Confidence Interval for $\mu_D$

Nothing new here! The paired t CI for $\mu_D$ is

$$\bar{d} \pm t_{\alpha/2,n-1} \cdot s_D/\sqrt{n}$$

One-sided confidence bounds can be found by replacing $t_{\alpha/2,n-1}$ by $t_{\alpha,n-1}$. For large samples, you could use the z test and CI.
Paired Data and Two-Sample t

$$V(\bar{X} - \bar{Y}) = V(\bar{D}) = V\!\left(\frac{1}{n}\sum D_i\right) = \frac{V(D_i)}{n} = \frac{\sigma_1^2 + \sigma_2^2 - 2\rho\,\sigma_1\sigma_2}{n}$$

Remember: smaller variance means better estimates.
Independence between X and Y $\Rightarrow \rho = 0$
Positive dependence $\Rightarrow \rho > 0$
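A one-line sketch of this variance makes the effect of $\rho$ concrete (helper name and numbers are mine, not from the slides):

```python
def var_dbar(sigma1, sigma2, rho, n):
    """Variance of Dbar = Xbar - Ybar for paired data with correlation rho."""
    return (sigma1**2 + sigma2**2 - 2 * rho * sigma1 * sigma2) / n
```

With $\rho = 0$ (independence) this is the usual two-sample variance; with $\rho > 0$ (positive dependence, the typical paired situation) it is strictly smaller, which is why pairing helps.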
Pros and Cons of Pairing

1. For great heterogeneity and large correlation within experimental units, the loss in degrees of freedom will be compensated for by the increased precision associated with pairing (use pairing).
2. If the units are relatively homogeneous and the correlation within pairs is not large, the gain in precision due to pairing will be outweighed by the decrease in degrees of freedom (use independent samples).

Usually, we're in case 1; use pairing if possible.
9.4 Inferences Concerning a Difference Between Population Proportions
Difference Between Population Proportions

Let $X \sim \text{Bin}(m, p_1)$ and $Y \sim \text{Bin}(n, p_2)$ with X and Y independent variables. Then $\hat{p}_1 - \hat{p}_2$ is an estimator of $p_1 - p_2$.

Note: $\hat{p}_1 = X/m$ and $\hat{p}_2 = Y/n$.

$$E(\hat{p}_1 - \hat{p}_2) = p_1 - p_2, \qquad V(\hat{p}_1 - \hat{p}_2) = \frac{p_1 q_1}{m} + \frac{p_2 q_2}{n} \quad (q_i = 1 - p_i)$$
Large-Sample Test

Valid provided $m\hat{p}_1 \geq 10$, $m\hat{q}_1 \geq 10$, $n\hat{p}_2 \geq 10$, and $n\hat{q}_2 \geq 10$.

Null hypothesis: $H_0: p_1 - p_2 = 0$

Test statistic value:

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\,(1/m + 1/n)}}$$

The standard error involves $\hat{p}$, a weighted average of $\hat{p}_1$ and $\hat{p}_2$.
Only for a test of $H_0: p_1 - p_2 = 0$: the standard error involves $\hat{p}$, a weighted average of $\hat{p}_1$ and $\hat{p}_2$:

$$\hat{p} = \frac{m}{m+n}\,\hat{p}_1 + \frac{n}{m+n}\,\hat{p}_2 = \frac{\text{total number of successes } (X + Y)}{\text{total number of trials } (m + n)}$$
Confidence Interval for $p_1 - p_2$

$$\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{m} + \frac{\hat{p}_2\hat{q}_2}{n}}$$

Note: the standard error here is slightly different than for the test!
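A sketch tying the section together: the test uses the pooled $\hat{p}$ in its standard error, while the CI uses the unpooled one (helper names and counts are mine, not from the slides):

```python
from math import sqrt
from statistics import NormalDist

def two_prop_test(x, m, y, n):
    """z statistic for H0: p1 - p2 = 0 using the pooled estimate p-hat."""
    p1, p2 = x / m, y / n
    p = (x + y) / (m + n)   # pooled: total successes / total trials
    return (p1 - p2) / sqrt(p * (1 - p) * (1 / m + 1 / n))

def two_prop_ci(x, m, y, n, conf=0.95):
    """CI for p1 - p2; note the unpooled standard error differs from the test's."""
    p1, p2 = x / m, y / n
    zc = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half = zc * sqrt(p1 * (1 - p1) / m + p2 * (1 - p2) / n)
    return (p1 - p2) - half, (p1 - p2) + half

# Illustrative counts: 60/100 successes vs 40/100 successes
z = two_prop_test(60, 100, 40, 100)
lo, hi = two_prop_ci(60, 100, 40, 100)
```

Remember the large-sample conditions: each of $m\hat{p}_1$, $m\hat{q}_1$, $n\hat{p}_2$, $n\hat{q}_2$ should be at least 10.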