Comparing 2 Population Means

Comparison of 2 Population Means
• Goal: To compare 2 populations/treatments
with respect to a numeric outcome
• Sampling Design: Independent Samples
(Parallel Groups) vs Paired Samples
(Crossover Design)
• Data Structure: Normal vs Non-normal
• Sample Sizes: Large (n1,n2>20) vs Small
Independent Samples
• Units in the two samples are different
• Sample sizes may or may not be equal
• Large-sample inference based on Normal
Distribution (Central Limit Theorem)
• Small-sample inference depends on
distribution of individual outcomes (Normal
vs non-Normal)
Parameters/Estimates
(Independent Samples)
• Parameter: $\mu_1 - \mu_2$
• Estimator: $\bar{Y}_1 - \bar{Y}_2$
• Estimated standard error: $\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$
• Shape of sampling distribution:
– Normal if data are normal
– Approximately normal if $n_1, n_2 > 20$
– Non-normal otherwise (typically)
Large-Sample Test of $\mu_1-\mu_2$
• Null hypothesis: The population means differ by
$D_0$ (which is typically 0): $H_0: \mu_1 - \mu_2 = D_0$
• Alternative Hypotheses:
– 1-Sided: $H_A: \mu_1 - \mu_2 > D_0$
– 2-Sided: $H_A: \mu_1 - \mu_2 \ne D_0$
• Test Statistic:
$$z_{obs} = \frac{(\bar{y}_1 - \bar{y}_2) - D_0}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$$
Large-Sample Test of $\mu_1-\mu_2$
• Decision Rule:
– 1-sided alternative $H_A: \mu_1 - \mu_2 > D_0$
• If $z_{obs} \ge z_{\alpha}$ ==> Conclude $\mu_1 - \mu_2 > D_0$
• If $z_{obs} < z_{\alpha}$ ==> Do not reject $\mu_1 - \mu_2 = D_0$
– 2-sided alternative $H_A: \mu_1 - \mu_2 \ne D_0$
• If $z_{obs} \ge z_{\alpha/2}$ ==> Conclude $\mu_1 - \mu_2 > D_0$
• If $z_{obs} \le -z_{\alpha/2}$ ==> Conclude $\mu_1 - \mu_2 < D_0$
• If $-z_{\alpha/2} < z_{obs} < z_{\alpha/2}$ ==> Do not reject $\mu_1 - \mu_2 = D_0$
Large-Sample Test of $\mu_1-\mu_2$
• Observed Significance Level (P-Value)
– 1-sided alternative $H_A: \mu_1 - \mu_2 > D_0$
• $P = P(z \ge z_{obs})$ (From the std. Normal distribution)
– 2-sided alternative $H_A: \mu_1 - \mu_2 \ne D_0$
• $P = 2P(z \ge |z_{obs}|)$ (From the std. Normal distribution)
• If P-Value $\le \alpha$, then reject the null hypothesis
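The test statistic and P-value rules above can be sketched in a few lines of Python. This is only an illustration using the standard library; the function names and the input numbers are made up for the sketch and do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_z(ybar1, s1, n1, ybar2, s2, n2, d0=0.0):
    """z statistic for H0: mu1 - mu2 = d0, using the unpooled standard error."""
    se = sqrt(s1**2 / n1 + s2**2 / n2)
    return ((ybar1 - ybar2) - d0) / se


def p_value(z_obs, two_sided=True):
    """P-value from the standard normal distribution."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if two_sided:
        return 2 * (1 - std_normal.cdf(abs(z_obs)))
    return 1 - std_normal.cdf(z_obs)  # upper-tail alternative mu1 - mu2 > d0


# Hypothetical summary statistics, chosen only to exercise the functions.
z = large_sample_z(ybar1=10.0, s1=2.0, n1=50, ybar2=9.0, s2=2.0, n2=50)
print(round(z, 2), round(p_value(z), 4), round(p_value(z, two_sided=False), 4))
```

The null hypothesis is rejected when the printed P-value is at most the chosen significance level.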
Large-Sample $(1-\alpha)100\%$ Confidence
Interval for $\mu_1-\mu_2$
• Confidence Coefficient $(1-\alpha)$ refers to the proportion
of times this rule would provide an interval that
contains the true parameter value $\mu_1-\mu_2$ if it were
applied over all possible samples
• Rule:
$$(\bar{y}_1 - \bar{y}_2) \pm z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$
Large-Sample $(1-\alpha)100\%$ Confidence
Interval for $\mu_1-\mu_2$
• For 95% Confidence Intervals, $z_{.025} = 1.96$
• Confidence Intervals and 2-sided tests give
identical conclusions at same $\alpha$-level:
– If entire interval is above $D_0$, conclude $\mu_1-\mu_2 > D_0$
– If entire interval is below $D_0$, conclude $\mu_1-\mu_2 < D_0$
– If interval contains $D_0$, do not reject $\mu_1-\mu_2 = D_0$
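A minimal sketch of the interval rule and its link to the 2-sided test, assuming summary statistics are available. The helper names and the input numbers are hypothetical; the critical value comes from `statistics.NormalDist` in the Python standard library.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_ci(ybar1, s1, n1, ybar2, s2, n2, alpha=0.05):
    """(1 - alpha)100% interval for mu1 - mu2 from the normal approximation."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    se = sqrt(s1**2 / n1 + s2**2 / n2)
    diff = ybar1 - ybar2
    return diff - z_crit * se, diff + z_crit * se


def two_sided_decision(ci, d0=0.0):
    """Read the 2-sided test conclusion (same alpha) off the interval."""
    lower, upper = ci
    if lower > d0:
        return "conclude mu1 - mu2 > D0"
    if upper < d0:
        return "conclude mu1 - mu2 < D0"
    return "do not reject mu1 - mu2 = D0"


# Hypothetical summary statistics, for illustration only.
ci = large_sample_ci(10.0, 2.0, 50, 9.0, 2.0, 50)
print(ci, two_sided_decision(ci))
```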
Example: Vitamin C for Common Cold
• Outcome: Number of Colds During Study Period
for Each Student
• Group 1: Given Placebo
$\bar{y}_1 = 2.2$, $s_1 = 0.12$, $n_1 = 155$
• Group 2: Given Ascorbic Acid (Vitamin C)
$\bar{y}_2 = 1.9$, $s_2 = 0.10$, $n_2 = 208$
Source: Pauling (1971)
2-Sided Test to Compare Groups
• $H_0: \mu_1-\mu_2 = 0$ (No difference in treatment effects)
• $H_A: \mu_1-\mu_2 \ne 0$ (Difference in treatment effects)
• Test Statistic:
$$z_{obs} = \frac{(2.2 - 1.9) - 0}{\sqrt{\frac{(0.12)^2}{155} + \frac{(0.10)^2}{208}}} = \frac{0.3}{0.0119} = 25.3$$
• Decision Rule ($\alpha = 0.05$)
– Conclude $\mu_1-\mu_2 > 0$ since $z_{obs} = 25.3 > z_{.025} = 1.96$
95% Confidence Interval for $\mu_1-\mu_2$
• Point Estimate: $\bar{y}_1 - \bar{y}_2 = 2.2 - 1.9 = 0.3$
• Estimated Std. Error: $\sqrt{\frac{(0.12)^2}{155} + \frac{(0.10)^2}{208}} = 0.0119$
• Critical Value: $z_{.025} = 1.96$
• 95% CI: $0.30 \pm 1.96(0.0119) = 0.30 \pm 0.023 = (0.277, 0.323)$. Entire interval > 0
Small-Sample Test for $\mu_1-\mu_2$
Normal Populations
• Case 1: Common Variances ($\sigma_1^2 = \sigma_2^2 = \sigma^2$)
• Null Hypothesis: $H_0: \mu_1 - \mu_2 = D_0$
• Alternative Hypotheses:
– 1-Sided: $H_A: \mu_1 - \mu_2 > D_0$
– 2-Sided: $H_A: \mu_1 - \mu_2 \ne D_0$
• Test Statistic: (where $S_p^2$ is a “pooled” estimate of $\sigma^2$)
$$t_{obs} = \frac{(\bar{y}_1 - \bar{y}_2) - D_0}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \qquad S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$
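As a rough sketch, the pooled variance and the equal-variance t statistic can be written as two small functions. The function names and the inputs in the usage lines are hypothetical, not taken from the slides.

```python
from math import sqrt


def pooled_variance(s1, n1, s2, n2):
    """Pooled estimate of the common variance, weighted by degrees of freedom."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)


def pooled_t(ybar1, s1, n1, ybar2, s2, n2, d0=0.0):
    """Return (t statistic, df) for H0: mu1 - mu2 = d0 assuming equal variances."""
    sp2 = pooled_variance(s1, n1, s2, n2)
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    return ((ybar1 - ybar2) - d0) / se, n1 + n2 - 2


# Hypothetical inputs: with equal sample SDs the pooled variance equals s squared.
t_obs, df = pooled_t(10.0, 2.0, 10, 9.0, 2.0, 10)
print(pooled_variance(2.0, 10, 2.0, 10), round(t_obs, 3), df)
```

The statistic is then compared with a t critical value on `df` degrees of freedom, which has to come from a table or a statistics package.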
Small-Sample Test for $\mu_1-\mu_2$
Normal Populations
• Decision Rule: (Based on t-distribution with $\nu = n_1 + n_2 - 2$ df)
– 1-sided alternative
• If $t_{obs} \ge t_{\alpha,\nu}$ ==> Conclude $\mu_1-\mu_2 > D_0$
• If $t_{obs} < t_{\alpha,\nu}$ ==> Do not reject $\mu_1-\mu_2 = D_0$
– 2-sided alternative
• If $t_{obs} \ge t_{\alpha/2,\nu}$ ==> Conclude $\mu_1-\mu_2 > D_0$
• If $t_{obs} \le -t_{\alpha/2,\nu}$ ==> Conclude $\mu_1-\mu_2 < D_0$
• If $-t_{\alpha/2,\nu} < t_{obs} < t_{\alpha/2,\nu}$ ==> Do not reject $\mu_1-\mu_2 = D_0$
Small-Sample Test for $\mu_1-\mu_2$
Normal Populations
• Observed Significance Level (P-Value)
• Exact P-values require special tables; they are printed by
statistical software packages
– 1-sided alternative
• $P = P(t \ge t_{obs})$ (From the $t_{\nu}$ distribution)
– 2-sided alternative
• $P = 2P(t \ge |t_{obs}|)$ (From the $t_{\nu}$ distribution)
• If P-Value $\le \alpha$, then reject the null hypothesis
Small-Sample $(1-\alpha)100\%$ Confidence Interval
for $\mu_1-\mu_2$, Normal Populations
• Confidence Coefficient $(1-\alpha)$ refers to the proportion of
times this rule would provide an interval that contains the
true parameter value $\mu_1-\mu_2$ if it were applied over all
possible samples
• Rule:
$$(\bar{y}_1 - \bar{y}_2) \pm t_{\alpha/2,\nu}\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
• Interpretations same as for large-sample CIs
Small-Sample Inference for $\mu_1-\mu_2$
Normal Populations
• Case 2: $\sigma_1^2 \ne \sigma_2^2$
• Don’t pool variances:
$$S_{\bar{y}_1 - \bar{y}_2} = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$
• Use “adjusted” degrees of freedom (Satterthwaite’s Approximation):
$$\nu^* = \frac{\left(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}\right)^2}{\frac{\left(S_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(S_2^2/n_2\right)^2}{n_2 - 1}}$$
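The adjusted degrees of freedom are easy to mis-key by hand, so a small helper may be useful. This is a sketch of Satterthwaite's approximation as stated above; the function name and the inputs in the usage lines are hypothetical.

```python
def satterthwaite_df(s1, n1, s2, n2):
    """Approximate df for the unpooled (unequal-variance) t procedure."""
    v1 = s1**2 / n1   # estimated variance of the first sample mean
    v2 = s2**2 / n2   # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


# Hypothetical inputs. With equal SDs and equal sample sizes the result
# coincides with the pooled df, n1 + n2 - 2.
print(satterthwaite_df(2.0, 10, 2.0, 10))
# With very unequal variances the df falls below n1 + n2 - 2.
print(satterthwaite_df(1.0, 5, 10.0, 50))
```

The result is generally not an integer; tables are usually entered with the value rounded down, while software works with the fractional value.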
Example - Scalp Wound Closure
• Groups: Stapling (n1=15) / Suturing (n2=16)
• Outcome: Physician Reported VAS Score at 1-Year
                  Mean    Std Dev   Sample Size
Stapling (i=1)    96.92   7.51      15
Suturing (i=2)    96.31   8.06      16
• Conduct a 2-sided test of whether mean scores differ
• Construct a 95% Confidence Interval for true difference
Source: Khan, et al (2002)
Example - Scalp Wound Closure
$H_0: \mu_1-\mu_2 = 0$   $H_A: \mu_1-\mu_2 \ne 0$   ($\alpha = 0.05$)
$$S_p^2 = \frac{(15 - 1)(7.51)^2 + (16 - 1)(8.06)^2}{15 + 16 - 2} = 60.83$$
$$TS: t_{obs} = \frac{96.92 - 96.31}{\sqrt{60.83\left(\frac{1}{15} + \frac{1}{16}\right)}} = \frac{0.61}{2.80} = 0.22$$
$RR: |t_{obs}| \ge t_{.025,29} = 2.045$
95% CI: $0.61 \pm 2.045(2.80) = 0.61 \pm 5.73 = (-5.12, 6.34)$
No significant difference between 2 methods
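The scalp wound closure numbers can be reproduced from the summary table with the script below. It is a sketch: the critical value 2.045 is taken from the slide rather than computed, and the variable names are illustrative.

```python
from math import sqrt

# Summary statistics from the example (Khan, et al, 2002).
ybar1, s1, n1 = 96.92, 7.51, 15   # stapling
ybar2, s2, n2 = 96.31, 8.06, 16   # suturing

df = n1 + n2 - 2
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df   # pooled variance
se = sqrt(sp2 * (1 / n1 + 1 / n2))                  # std. error of the difference
diff = ybar1 - ybar2
t_obs = diff / se                                   # test statistic with D0 = 0

t_crit = 2.045                                      # t_.025,29 as quoted on the slide
ci = (diff - t_crit * se, diff + t_crit * se)
reject = abs(t_obs) >= t_crit

print(f"Sp2 = {sp2:.2f}, SE = {se:.2f}, t = {t_obs:.2f}, "
      f"CI = ({ci[0]:.2f}, {ci[1]:.2f}), reject H0: {reject}")
```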
Small Sample Test to Compare Two
Medians - Nonnormal Populations
• Two Independent Samples (Parallel Groups)
• Procedure (Wilcoxon Rank-Sum Test):
– Rank measurements across samples from smallest (1)
to largest (n1+n2). Ties take average ranks.
– Obtain the rank sum for each group (T1 , T2 )
– 1-sided tests: Conclude $H_A: M_1 > M_2$ if $T_2 \le T_0$
– 2-sided tests: Conclude $H_A: M_1 \ne M_2$ if $\min(T_1, T_2) \le T_0$
– Values of T0 are given in many texts for various sample
sizes and significance levels. P-values printed by
statistical software packages.
Example - Levocabastine in Renal Patients
• 2 Groups: Non-Dialysis/Hemodialysis (n1 = n2 = 6)
• Outcome: Levocabastine AUC (1 Outlier/Group)

AUC values, with ranks across both samples in parentheses:

Non-Dialysis    Hemodialysis
857 (12)        527 (7)
567 (9)         740 (11)
626 (10)        392 (2.5)
532 (8)         514 (6)
444 (5)         433 (4)
357 (1)         392 (2.5)
T1 = 45         T2 = 33

2-sided Test: Conclude Medians differ if $\min(T_1, T_2) \le 26$
Here $\min(T_1, T_2) = 33 > 26$, so the test does not conclude that the medians differ.
Source: Zagornik, et al (1993)
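The ranking step of the Wilcoxon rank-sum procedure, including average ranks for ties, can be sketched as follows and applied to the AUC values in this example. The helper name `average_ranks` is made up for the sketch, and the cutoff 26 is the tabled value quoted on the slide.

```python
def average_ranks(values):
    """Rank from smallest (1) to largest; tied values share their average rank."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(ordered):
        first = ordered.index(v) + 1           # rank of the first occurrence
        count = ordered.count(v)               # number of tied observations
        rank_of[v] = first + (count - 1) / 2   # mean of ranks first..first+count-1
    return [rank_of[v] for v in values]


non_dialysis = [857, 567, 626, 532, 444, 357]
hemodialysis = [527, 740, 392, 514, 433, 392]

# Rank all 12 measurements together, then split the ranks back by group.
ranks = average_ranks(non_dialysis + hemodialysis)
t1 = sum(ranks[:len(non_dialysis)])
t2 = sum(ranks[len(non_dialysis):])

t0 = 26  # critical value quoted on the slide for the 2-sided test
conclude_differ = min(t1, t2) <= t0
print(t1, t2, conclude_differ)
```

The rank sums come out as 45 and 33, matching the slide, and the smaller one exceeds 26.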
Computer Output - SPSS
[SPSS output tables for the preceding example; contents not recoverable from the extracted text]
Inference Based on Paired Samples
(Crossover Designs)
• Setting: Each treatment is applied to each subject or pair
(preferably in random order)
• Data: di is the difference in scores (Trt1-Trt2) for subject
(pair) i
• Parameter: $\mu_D$ - Population mean difference
• Sample Statistics:
$$\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n} \qquad s_d^2 = \frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n - 1} \qquad s_d = \sqrt{s_d^2}$$
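Test Concerning $\mu_D$
• Null Hypothesis: $H_0: \mu_D = D_0$ (almost always 0)
• Alternative Hypotheses:
– 1-Sided: $H_A: \mu_D > D_0$
– 2-Sided: $H_A: \mu_D \ne D_0$
• Test Statistic (shown for $D_0 = 0$; for a nonzero $D_0$ the numerator is $\bar{d} - D_0$):
$$t_{obs} = \frac{\bar{d}}{s_d / \sqrt{n}}$$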
Test Concerning $\mu_D$
• Decision Rule: (Based on t-distribution with $\nu = n - 1$ df)
– 1-sided alternative
• If $t_{obs} \ge t_{\alpha,\nu}$ ==> Conclude $\mu_D > D_0$
• If $t_{obs} < t_{\alpha,\nu}$ ==> Do not reject $\mu_D = D_0$
– 2-sided alternative
• If $t_{obs} \ge t_{\alpha/2,\nu}$ ==> Conclude $\mu_D > D_0$
• If $t_{obs} \le -t_{\alpha/2,\nu}$ ==> Conclude $\mu_D < D_0$
• If $-t_{\alpha/2,\nu} < t_{obs} < t_{\alpha/2,\nu}$ ==> Do not reject $\mu_D = D_0$
Confidence Interval for $\mu_D$
$$\bar{d} \pm t_{\alpha/2,\nu}\left(\frac{s_d}{\sqrt{n}}\right)$$
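A minimal sketch of the paired-sample statistics, test statistic, and confidence interval, starting from a list of within-pair differences. The function names and the toy differences are hypothetical. The t critical value is passed in by the caller because the Python standard library has no t-distribution quantile function; the null value `d0` defaults to 0.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(differences, d0=0.0):
    """Return (t statistic, df) for H0: mu_D = d0 from paired differences."""
    n = len(differences)
    d_bar = mean(differences)
    s_d = stdev(differences)          # sample SD with divisor n - 1
    return (d_bar - d0) / (s_d / sqrt(n)), n - 1


def paired_ci(differences, t_crit):
    """Interval for mu_D; t_crit = t_{alpha/2, n-1} must be supplied."""
    n = len(differences)
    d_bar = mean(differences)
    half_width = t_crit * stdev(differences) / sqrt(n)
    return d_bar - half_width, d_bar + half_width


# Toy differences, for illustration only; 2.776 is the tabled t_.025 with 4 df.
toy = [1, 2, 3, 4, 5]
t_obs, df = paired_t(toy)
print(round(t_obs, 3), df, paired_ci(toy, t_crit=2.776))
```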
Example - Evaluation of Transdermal
Contraceptive Patch In Adolescents
• Subjects: Adolescent Females on O.C. who then
received Ortho Evra Patch
• Response: 5-point scores on ease of use for each
type of contraception (1=Strongly Agree)
• Data: di = difference (O.C.-EVRA) for subject i
• Summary Statistics: $\bar{d} = 1.77$, $s_d = 1.48$, $n = 13$
Source: Rubinstein, et al (2004)
Example - Evaluation of Transdermal
Contraceptive Patch In Adolescents
• 2-sided test for differences in ease of use ($\alpha = 0.05$)
• $H_0: \mu_D = 0$   $H_A: \mu_D \ne 0$
$$TS: t_{obs} = \frac{1.77}{1.48/\sqrt{13}} = \frac{1.77}{0.41} = 4.31$$
$RR: |t_{obs}| \ge t_{.025,12} = 2.179$
95% CI: $1.77 \pm 2.179(0.41) = 1.77 \pm 0.89 = (0.88, 2.66)$
Conclude mean scores are higher for O.C.; girls find
the Patch easier to use (low scores are better)
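The patch example can be checked from its three summary statistics. This is a sketch; the critical value 2.179 is the tabled value quoted on the slide, and the variable names are illustrative.

```python
from math import sqrt

d_bar, s_d, n = 1.77, 1.48, 13   # summary statistics from the slide
se = s_d / sqrt(n)               # standard error of the mean difference
t_obs = d_bar / se               # test statistic with D0 = 0
t_crit = 2.179                   # t_.025,12 as quoted on the slide
ci = (d_bar - t_crit * se, d_bar + t_crit * se)
reject = abs(t_obs) >= t_crit

print(f"SE = {se:.2f}, t = {t_obs:.2f}, "
      f"CI = ({ci[0]:.2f}, {ci[1]:.2f}), reject H0: {reject}")
```

To two decimals this gives the standard error 0.41, the statistic 4.31, and the interval (0.88, 2.66), in agreement with the slide.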
Small-Sample Test For Nonnormal Data
• Paired Samples (Crossover Design)
• Procedure (Wilcoxon Signed-Rank Test)
– Compute Differences di (as in the paired t-test) and obtain their
absolute values (ignoring 0s)
– Rank the observations by |di| (smallest=1), averaging ranks for ties
– Compute T+ and T-, the rank sums for the positive and negative
differences, respectively
– 1-sided tests: Conclude $H_A: M_1 > M_2$ if $T^- \le T_0$
– 2-sided tests: Conclude $H_A: M_1 \ne M_2$ if $\min(T^+, T^-) \le T_0$
– Values of T0 are given in many texts for various sample sizes and
significance levels. P-values printed by statistical software
packages.
Example - New MRI for 3D Coronary
Angiography
• Previous vs new Magnetization Prep Schemes (n=7)
• Response: Blood/Myocardium Contrast-Noise-Ratio
Subject   Previous   New   Diff=Pre-New   |Diff|   Rank(|Diff|)
A         20         36    -16            16       7
B         31         37    -6             6        1
C         20         27    -7             7        2
D         19         32    -13            13       5
E         40         48    -8             8        3
F         28         40    -12            12       4
G         10         25    -15            15       6
• All Differences are negative, $T^- = 1+2+\dots+7 = 28$, $T^+ = 0$
• From tables for 2-sided tests, n=7, $\alpha = 0.05$, $T_0 = 2$
• Since $\min(0, 28) \le 2$, Conclude the scheme means differ
Source: Nguyen, et al (2004)
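The signed-rank computation for this example can be sketched directly from the Previous and New columns. The helper `avg_rank` is a made-up name, and the cutoff 2 is the tabled value quoted on the slide.

```python
previous = [20, 31, 20, 19, 40, 28, 10]
new = [36, 37, 27, 32, 48, 40, 25]

# Differences (Previous - New), dropping any zero differences.
diffs = [p - q for p, q in zip(previous, new) if p != q]
abs_sorted = sorted(abs(d) for d in diffs)


def avg_rank(x):
    """Rank of |d| among all |d| (smallest = 1), averaging ranks for ties."""
    first = abs_sorted.index(x) + 1
    return first + (abs_sorted.count(x) - 1) / 2


t_plus = sum(avg_rank(abs(d)) for d in diffs if d > 0)    # rank sum, positive diffs
t_minus = sum(avg_rank(abs(d)) for d in diffs if d < 0)   # rank sum, negative diffs

t0 = 2  # tabled critical value quoted on the slide (2-sided, n = 7, alpha = 0.05)
conclude_differ = min(t_plus, t_minus) <= t0
print(t_plus, t_minus, conclude_differ)
```

Every difference is negative, so the positive rank sum is 0 and the negative rank sum is 28, as on the slide.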
Computer Output - SPSS
[SPSS output tables for the preceding example; contents not recoverable from the extracted text]
Note that SPSS is taking NEW-PREVIOUS in top table