Comparing 2 Population Means
Comparison of 2 Population Means
• Goal: To compare 2 populations/treatments with respect to a numeric outcome
• Sampling Design: Independent Samples (Parallel Groups) vs Paired Samples (Crossover Design)
• Data Structure: Normal vs Non-normal
• Sample Sizes: Large ($n_1, n_2 > 20$) vs Small
Independent Samples
• Units in the two samples are different
• Sample sizes may or may not be equal
• Large-sample inference is based on the Normal distribution (Central Limit Theorem)
• Small-sample inference depends on the distribution of individual outcomes (Normal vs non-Normal)
Parameters/Estimates (Independent Samples)
• Parameter: $\mu_1 - \mu_2$
• Estimator: $\bar{Y}_1 - \bar{Y}_2$
• Estimated standard error: $\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$
• Shape of sampling distribution:
– Normal if data are normal
– Approximately normal if $n_1, n_2 > 20$
– Non-normal otherwise (typically)
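A minimal Python sketch of the estimator and its estimated standard error, assuming only summary statistics (sample means, standard deviations and sizes) are available. The function name `diff_estimate_and_se` and the numbers in the usage line are hypothetical, chosen only for illustration.

```python
import math

def diff_estimate_and_se(ybar1, s1, n1, ybar2, s2, n2):
    """Point estimate of mu1 - mu2 and its estimated standard error
    sqrt(S1^2/n1 + S2^2/n2) for independent samples (variances not pooled)."""
    estimate = ybar1 - ybar2
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return estimate, se

# Hypothetical summary statistics: SE = sqrt(9/9 + 16/16) = sqrt(2)
est, se = diff_estimate_and_se(10.0, 3.0, 9, 8.0, 4.0, 16)
print(est, se)
```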
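Large-Sample Test of $\mu_1 - \mu_2$
• Null hypothesis: The population means differ by $D_0$ (which is typically 0): $H_0: \mu_1 - \mu_2 = D_0$
• Alternative Hypotheses:
– 1-Sided: $H_A: \mu_1 - \mu_2 > D_0$
– 2-Sided: $H_A: \mu_1 - \mu_2 \neq D_0$
• Test Statistic:
$$z_{obs} = \frac{(\bar{y}_1 - \bar{y}_2) - D_0}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$$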
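Large-Sample Test of $\mu_1 - \mu_2$
• Decision Rule:
– 1-sided alternative $H_A: \mu_1 - \mu_2 > D_0$
• If $z_{obs} \ge z_\alpha$ ==> Conclude $\mu_1 - \mu_2 > D_0$
• If $z_{obs} < z_\alpha$ ==> Do not reject $\mu_1 - \mu_2 = D_0$
– 2-sided alternative $H_A: \mu_1 - \mu_2 \neq D_0$
• If $z_{obs} \ge z_{\alpha/2}$ ==> Conclude $\mu_1 - \mu_2 > D_0$
• If $z_{obs} \le -z_{\alpha/2}$ ==> Conclude $\mu_1 - \mu_2 < D_0$
• If $-z_{\alpha/2} < z_{obs} < z_{\alpha/2}$ ==> Do not reject $\mu_1 - \mu_2 = D_0$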
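The test statistic and the rejection-region rule above can be sketched in Python as follows. This is an illustration under assumptions: the function `large_sample_z_test` is hypothetical, and it obtains $z_\alpha$ and $z_{\alpha/2}$ from `statistics.NormalDist` (Python 3.8+) instead of a printed table.

```python
import math
from statistics import NormalDist

def large_sample_z_test(ybar1, s1, n1, ybar2, s2, n2, d0=0.0, alpha=0.05, two_sided=True):
    """Return (z_obs, decision) for H0: mu1 - mu2 = D0 using the large-sample rule."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    z_obs = ((ybar1 - ybar2) - d0) / se
    if two_sided:
        z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
        if z_obs >= z_crit:
            decision = "conclude mu1 - mu2 > D0"
        elif z_obs <= -z_crit:
            decision = "conclude mu1 - mu2 < D0"
        else:
            decision = "do not reject mu1 - mu2 = D0"
    else:
        z_crit = NormalDist().inv_cdf(1 - alpha)  # z_alpha, for H_A: mu1 - mu2 > D0
        if z_obs >= z_crit:
            decision = "conclude mu1 - mu2 > D0"
        else:
            decision = "do not reject mu1 - mu2 = D0"
    return z_obs, decision

# Hypothetical inputs: z_obs = 2 / sqrt(2), about 1.41, inside (-1.96, 1.96)
print(large_sample_z_test(10.0, 3.0, 9, 8.0, 4.0, 16))
```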
Large-Sample Test of $\mu_1 - \mu_2$
• Observed Significance Level (P-Value)
– 1-sided alternative $H_A: \mu_1 - \mu_2 > D_0$
• $P = P(z \ge z_{obs})$ (from the standard Normal distribution)
– 2-sided alternative $H_A: \mu_1 - \mu_2 \neq D_0$
• $P = 2P(z \ge |z_{obs}|)$ (from the standard Normal distribution)
• If P-value $\le \alpha$, then reject the null hypothesis
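As a sketch, the standard Normal tail probability can be computed without tables through the identity $P(Z \ge z) = \tfrac{1}{2}\operatorname{erfc}(z/\sqrt{2})$; the helper names below are hypothetical.

```python
import math

def normal_upper_tail(z):
    """P(Z >= z) for a standard Normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def large_sample_p_value(z_obs, two_sided=True):
    """1-sided: P(z >= z_obs); 2-sided: 2 P(z >= |z_obs|)."""
    if two_sided:
        return 2 * normal_upper_tail(abs(z_obs))
    return normal_upper_tail(z_obs)

print(large_sample_p_value(1.96))  # close to 0.05
```

The P-value is then compared with $\alpha$ exactly as in the last bullet.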
Large-Sample $(1-\alpha)100\%$ Confidence Interval for $\mu_1 - \mu_2$
• Confidence Coefficient $(1-\alpha)$ refers to the proportion of times this rule would provide an interval that contains the true parameter value if it were applied over all possible samples
• Rule:
$$(\bar{y}_1 - \bar{y}_2) \pm z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$
Large-Sample $(1-\alpha)100\%$ Confidence Interval for $\mu_1 - \mu_2$
• For 95% Confidence Intervals, $z_{.025} = 1.96$
• Confidence Intervals and 2-sided tests give identical conclusions at the same $\alpha$-level:
– If the entire interval is above $D_0$, conclude $\mu_1 - \mu_2 > D_0$
– If the entire interval is below $D_0$, conclude $\mu_1 - \mu_2 < D_0$
– If the interval contains $D_0$, do not reject $\mu_1 - \mu_2 = D_0$
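A minimal sketch of the interval rule and of reading the 2-sided test's conclusion off the interval. The function names are hypothetical, and $z_{\alpha/2}$ comes from `statistics.NormalDist` rather than a table.

```python
import math
from statistics import NormalDist

def large_sample_ci(ybar1, s1, n1, ybar2, s2, n2, conf=0.95):
    """(1 - alpha)100% CI for mu1 - mu2: estimate +/- z_{alpha/2} * SE."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when conf = 0.95
    est = ybar1 - ybar2
    half_width = z * math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return est - half_width, est + half_width

def ci_conclusion(lower, upper, d0=0.0):
    """Conclusion of the 2-sided test implied by where D0 falls relative to the CI."""
    if lower > d0:
        return "conclude mu1 - mu2 > D0"
    if upper < d0:
        return "conclude mu1 - mu2 < D0"
    return "do not reject mu1 - mu2 = D0"

# Hypothetical inputs: 2 +/- 1.96 * sqrt(2) straddles 0
lo, hi = large_sample_ci(10.0, 3.0, 9, 8.0, 4.0, 16)
print(lo, hi, ci_conclusion(lo, hi))
```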
Example: Vitamin C for Common Cold
• Outcome: Number of Colds During Study Period for Each Student
• Group 1: Given Placebo
$\bar{y}_1 = 2.2$, $s_1 = 0.12$, $n_1 = 155$
• Group 2: Given Ascorbic Acid (Vitamin C)
$\bar{y}_2 = 1.9$, $s_2 = 0.10$, $n_2 = 208$
Source: Pauling (1971)
2-Sided Test to Compare Groups
• $H_0: \mu_1 - \mu_2 = 0$ (No difference in trt effects)
• $H_A: \mu_1 - \mu_2 \neq 0$ (Difference in trt effects)
• Test Statistic:
$$z_{obs} = \frac{(2.2 - 1.9) - 0}{\sqrt{\frac{(0.12)^2}{155} + \frac{(0.10)^2}{208}}} = \frac{0.3}{0.0119} = 25.3$$
• Decision Rule ($\alpha = 0.05$)
– Conclude $\mu_1 - \mu_2 > 0$ since $z_{obs} = 25.3 > z_{.025} = 1.96$
95% Confidence Interval for $\mu_1 - \mu_2$
• Point Estimate: $\bar{y}_1 - \bar{y}_2 = 2.2 - 1.9 = 0.3$
• Estimated Std. Error: $\sqrt{\frac{(0.12)^2}{155} + \frac{(0.10)^2}{208}} = 0.0119$
• Critical Value: $z_{.025} = 1.96$
• 95% CI: $0.30 \pm 1.96(0.0119) \equiv 0.30 \pm 0.023 \equiv (0.277, 0.323)$. The entire interval is $> 0$
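The arithmetic on the last two slides can be re-checked in a few lines of Python; the inputs are the summary statistics from the Pauling (1971) example, and the output is rounded the way the slides round.

```python
import math

ybar1, s1, n1 = 2.2, 0.12, 155   # placebo
ybar2, s2, n2 = 1.9, 0.10, 208   # ascorbic acid (vitamin C)
z_crit = 1.96                    # z_.025

se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
z_obs = ((ybar1 - ybar2) - 0) / se
half_width = z_crit * se
ci = (ybar1 - ybar2 - half_width, ybar1 - ybar2 + half_width)
print(f"SE = {se:.4f}, z_obs = {z_obs:.1f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
# SE = 0.0119, z_obs = 25.3, 95% CI = (0.277, 0.323)
```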
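Small-Sample Test for Normal Populations
• Case 1: Common Variances ($\sigma_1^2 = \sigma_2^2 = \sigma^2$)
• Null Hypothesis: $H_0: \mu_1 - \mu_2 = D_0$
• Alternative Hypotheses:
– 1-Sided: $H_A: \mu_1 - \mu_2 > D_0$
– 2-Sided: $H_A: \mu_1 - \mu_2 \neq D_0$
• Test Statistic (where $S_p^2$ is a "pooled" estimate of $\sigma^2$):
$$t_{obs} = \frac{(\bar{y}_1 - \bar{y}_2) - D_0}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$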
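A short sketch of the pooled-variance statistic, assuming summary statistics as input; `pooled_t_statistic` is a hypothetical helper that returns $t_{obs}$, $S_p^2$ and the degrees of freedom $n_1 + n_2 - 2$.

```python
import math

def pooled_t_statistic(ybar1, s1, n1, ybar2, s2, n2, d0=0.0):
    """Equal-variance two-sample t statistic; returns (t_obs, Sp^2, df)."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df  # pooled estimate of sigma^2
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t_obs = ((ybar1 - ybar2) - d0) / se
    return t_obs, sp2, df

# Hypothetical inputs with equal SDs, so Sp^2 is just that common variance (4.0)
print(pooled_t_statistic(13.0, 2.0, 5, 10.0, 2.0, 5))
```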
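Small-Sample Test for Normal Populations
• Decision Rule: (Based on the t-distribution with $\nu = n_1 + n_2 - 2$ df)
– 1-sided alternative
• If $t_{obs} \ge t_{\alpha,\nu}$ ==> Conclude $\mu_1 - \mu_2 > D_0$
• If $t_{obs} < t_{\alpha,\nu}$ ==> Do not reject $\mu_1 - \mu_2 = D_0$
– 2-sided alternative
• If $t_{obs} \ge t_{\alpha/2,\nu}$ ==> Conclude $\mu_1 - \mu_2 > D_0$
• If $t_{obs} \le -t_{\alpha/2,\nu}$ ==> Conclude $\mu_1 - \mu_2 < D_0$
• If $-t_{\alpha/2,\nu} < t_{obs} < t_{\alpha/2,\nu}$ ==> Do not reject $\mu_1 - \mu_2 = D_0$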
Small-Sample Test for Normal Populations
• Observed Significance Level (P-Value)
• Special tables are needed; P-values are printed by statistical software packages
– 1-sided alternative
• $P = P(t \ge t_{obs})$ (from the $t_\nu$ distribution)
– 2-sided alternative
• $P = 2P(t \ge |t_{obs}|)$ (from the $t_\nu$ distribution)
• If P-value $\le \alpha$, then reject the null hypothesis
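Python's standard library has no t distribution, so this sketch integrates the $t_\nu$ density numerically with Simpson's rule. It only illustrates what the "special tables" contain; in practice a statistics package (for example the survival function of SciPy's `scipy.stats.t`) would be used. The helper names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T >= t): symmetry plus Simpson's rule on [0, t] (steps must be even)."""
    if t < 0:
        return 1 - t_upper_tail(-t, df, steps)
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 - total * h / 3

def t_p_value(t_obs, df, two_sided=True):
    """1-sided: P(t >= t_obs); 2-sided: 2 P(t >= |t_obs|)."""
    if two_sided:
        return 2 * t_upper_tail(abs(t_obs), df)
    return t_upper_tail(t_obs, df)

print(t_p_value(2.045, 29))  # about 0.05, since t_.025,29 = 2.045
```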
Small-Sample $(1-\alpha)100\%$ Confidence Interval for Normal Populations
• Confidence Coefficient $(1-\alpha)$ refers to the proportion of times this rule would provide an interval that contains the true parameter value if it were applied over all possible samples
• Rule:
$$(\bar{y}_1 - \bar{y}_2) \pm t_{\alpha/2,\nu}\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
• Interpretations same as for large-sample CIs
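Small-Sample Inference for Normal Populations
• Case 2: $\sigma_1^2 \neq \sigma_2^2$
• Don't pool variances:
$$S_{\bar{y}_1 - \bar{y}_2} = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$
• Use "adjusted" degrees of freedom (Satterthwaite's Approximation):
$$\nu^* = \frac{\left(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}\right)^2}{\frac{\left(S_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(S_2^2/n_2\right)^2}{n_2 - 1}}$$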
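A minimal sketch of the unpooled standard error and Satterthwaite's adjusted degrees of freedom; `welch_se_df` is a hypothetical name. $\nu^*$ is generally not an integer, and many texts round it down before looking up $t_{\alpha/2,\nu^*}$ in a table.

```python
import math

def welch_se_df(s1, n1, s2, n2):
    """Unpooled SE of ybar1 - ybar2 and Satterthwaite's approximate df."""
    v1 = s1 ** 2 / n1
    v2 = s2 ** 2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return se, df

# With equal SDs and sample sizes, nu* equals n1 + n2 - 2
print(welch_se_df(2.0, 5, 2.0, 5))
```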
Example - Scalp Wound Closure
• Groups: Stapling ($n_1 = 15$) / Suturing ($n_2 = 16$)
• Outcome: Physician Reported VAS Score at 1 Year

Group            Mean    Std Dev   Sample Size
Stapling (i=1)   96.92   7.51      15
Suturing (i=2)   96.31   8.06      16

• Conduct a 2-sided test of whether mean scores differ
• Construct a 95% Confidence Interval for the true difference
Source: Khan, et al (2002)
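Example - Scalp Wound Closure
H0: $\mu_1 - \mu_2 = 0$   HA: $\mu_1 - \mu_2 \neq 0$   ($\alpha = 0.05$)
$$S_p^2 = \frac{(15 - 1)(7.51)^2 + (16 - 1)(8.06)^2}{15 + 16 - 2} = 60.83$$
TS: $t_{obs} = \frac{96.92 - 96.31}{\sqrt{60.83\left(\frac{1}{15} + \frac{1}{16}\right)}} = \frac{0.61}{2.80} = 0.22$
RR: $|t_{obs}| \ge t_{.025,29} = 2.045$
95% CI: $0.61 \pm 2.045(2.80) \equiv 0.61 \pm 5.73 \equiv (-5.12, 6.34)$
No significant difference between the 2 methods ($|t_{obs}| = 0.22 < 2.045$, and the CI contains 0)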
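The pooled-variance arithmetic for the Khan, et al (2002) example can be reproduced as below; $t_{.025,29} = 2.045$ is taken from the slide's table value, and the output is rounded as on the slide.

```python
import math

ybar1, s1, n1 = 96.92, 7.51, 15   # stapling
ybar2, s2, n2 = 96.31, 8.06, 16   # suturing
t_crit = 2.045                    # t_.025,29 from a t table

df = n1 + n2 - 2
sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
t_obs = (ybar1 - ybar2) / se
lo = (ybar1 - ybar2) - t_crit * se
hi = (ybar1 - ybar2) + t_crit * se
print(f"Sp^2 = {sp2:.2f}, SE = {se:.2f}, t_obs = {t_obs:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
# Sp^2 = 60.83, SE = 2.80, t_obs = 0.22, 95% CI = (-5.12, 6.34)
```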
Small Sample Test to Compare Two Medians - Nonnormal Populations
• Two Independent Samples (Parallel Groups)
• Procedure (Wilcoxon Rank-Sum Test):
– Rank measurements across samples from smallest (1) to largest ($n_1 + n_2$). Ties take average ranks.
– Obtain the rank sum for each group ($T_1$, $T_2$)
– 1-sided tests: Conclude $H_A: M_1 > M_2$ if $T_2 \le T_0$
– 2-sided tests: Conclude $H_A: M_1 \neq M_2$ if $\min(T_1, T_2) \le T_0$
– Values of $T_0$ are given in many texts for various sample sizes and significance levels. P-values are printed by statistical software packages.
Example - Levocabastine in Renal Patients
• 2 Groups: Non-Dialysis/Hemodialysis ($n_1 = n_2 = 6$)
• Outcome: Levocabastine AUC (1 Outlier/Group)

Non-Dialysis (rank)   Hemodialysis (rank)
857 (12)              527 (7)
567 (9)               740 (11)
626 (10)              392 (2.5)
532 (8)               514 (6)
444 (5)               433 (4)
357 (1)               392 (2.5)
T1 = 45               T2 = 33

2-sided Test: Conclude Medians differ if $\min(T_1, T_2) \le 26$. Here $\min(T_1, T_2) = 33 > 26$, so we cannot conclude that the medians differ.
Source: Zagornik, et al (1993)
Computer Output - SPSS
[SPSS output for the Wilcoxon rank-sum test; not recoverable from the transcript]
Inference Based on Paired Samples (Crossover Designs)
• Setting: Each treatment is applied to each subject or pair (preferably in random order)
• Data: $d_i$ is the difference in scores (Trt1 - Trt2) for subject (pair) $i$
• Parameter: $\mu_D$ - Population mean difference
• Sample Statistics:
$$\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n}, \qquad s_d^2 = \frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n - 1}, \qquad s_d = \sqrt{s_d^2}$$
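A sketch of the paired summary statistics using Python's `statistics` module, whose `stdev` uses the $n - 1$ divisor shown above. The helper name `paired_summary` is hypothetical; the usage line borrows the Previous/New contrast-noise ratios from the MRI example later in this section.

```python
import statistics

def paired_summary(x, y):
    """Differences d_i = x_i - y_i, their mean d-bar, and SD s_d (n - 1 divisor)."""
    d = [xi - yi for xi, yi in zip(x, y)]
    d_bar = statistics.mean(d)
    s_d = statistics.stdev(d)  # sqrt(sum((d_i - d_bar)^2) / (n - 1))
    return d, d_bar, s_d

previous = [20, 31, 20, 19, 40, 28, 10]
new = [36, 37, 27, 32, 48, 40, 25]
print(paired_summary(previous, new))
```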
Test Concerning $\mu_D$
• Null Hypothesis: $H_0: \mu_D = D_0$ (almost always 0)
• Alternative Hypotheses:
– 1-Sided: $H_A: \mu_D > D_0$
– 2-Sided: $H_A: \mu_D \neq D_0$
• Test Statistic:
$$t_{obs} = \frac{\bar{d} - D_0}{s_d/\sqrt{n}}$$
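Test Concerning $\mu_D$
• Decision Rule: (Based on the t-distribution with $\nu = n - 1$ df)
– 1-sided alternative
• If $t_{obs} \ge t_{\alpha,\nu}$ ==> Conclude $\mu_D > D_0$
• If $t_{obs} < t_{\alpha,\nu}$ ==> Do not reject $\mu_D = D_0$
– 2-sided alternative
• If $t_{obs} \ge t_{\alpha/2,\nu}$ ==> Conclude $\mu_D > D_0$
• If $t_{obs} \le -t_{\alpha/2,\nu}$ ==> Conclude $\mu_D < D_0$
• If $-t_{\alpha/2,\nu} < t_{obs} < t_{\alpha/2,\nu}$ ==> Do not reject $\mu_D = D_0$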
Confidence Interval for $\mu_D$
$$\bar{d} \pm t_{\alpha/2,\nu}\frac{s_d}{\sqrt{n}}$$
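The paired test statistic and interval reduce to a one-sample computation on the differences. In this sketch, `paired_t` is a hypothetical helper and the caller supplies $t_{\alpha/2,\nu}$ from a t table. The usage line plugs in the summary statistics of the contraceptive-patch example that follows ($\bar{d} = 1.77$, $s_d = 1.48$, $n = 13$, $t_{.025,12} = 2.179$).

```python
import math

def paired_t(d_bar, s_d, n, t_crit, d0=0.0):
    """Paired t statistic and CI: (t_obs, (lower, upper)) with SE = s_d / sqrt(n)."""
    se = s_d / math.sqrt(n)
    t_obs = (d_bar - d0) / se
    half_width = t_crit * se
    return t_obs, (d_bar - half_width, d_bar + half_width)

t_obs, ci = paired_t(1.77, 1.48, 13, t_crit=2.179)
print(f"t_obs = {t_obs:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
# t_obs = 4.31, 95% CI = (0.88, 2.66)
```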
Example - Evaluation of Transdermal Contraceptive Patch In Adolescents
• Subjects: Adolescent Females on O.C. who then received the Ortho Evra Patch
• Response: 5-point scores on ease of use for each type of contraception (1 = Strongly Agree)
• Data: $d_i$ = difference (O.C. - EVRA) for subject $i$
• Summary Statistics: $\bar{d} = 1.77$, $s_d = 1.48$, $n = 13$
Source: Rubinstein, et al (2004)
Example - Evaluation of Transdermal Contraceptive Patch In Adolescents
• 2-sided test for differences in ease of use ($\alpha = 0.05$)
• $H_0: \mu_D = 0$   $H_A: \mu_D \neq 0$
TS: $t_{obs} = \frac{1.77}{1.48/\sqrt{13}} = \frac{1.77}{0.41} = 4.31$
RR: $|t_{obs}| \ge t_{.025,12} = 2.179$
95% CI: $1.77 \pm 2.179(0.41) \equiv 1.77 \pm 0.89 \equiv (0.88, 2.66)$
Conclude that mean scores are higher for O.C.; since low scores are better, the girls find the Patch easier to use.
Small-Sample Test For Nonnormal Data
• Paired Samples (Crossover Design)
• Procedure (Wilcoxon Signed-Rank Test)
– Compute differences $d_i$ (as in the paired t-test) and obtain their absolute values (ignoring 0s)
– Rank the observations by $|d_i|$ (smallest = 1), averaging ranks for ties
– Compute $T^+$ and $T^-$, the rank sums for the positive and negative differences, respectively
– 1-sided tests: Conclude $H_A: M_1 > M_2$ if $T^- \le T_0$
– 2-sided tests: Conclude $H_A: M_1 \neq M_2$ if $\min(T^+, T^-) \le T_0$
– Values of $T_0$ are given in many texts for various sample sizes and significance levels. P-values are printed by statistical software packages.
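A minimal sketch of the signed-rank procedure: drop zero differences, rank $|d_i|$ with average ranks for ties, and sum the ranks by sign. The function name is hypothetical, and $T_0$ must still come from a table; the usage line takes $T_0 = 2$ ($n = 7$, $\alpha = 0.05$, 2-sided) from the MRI example that follows.

```python
def signed_rank_sums(differences):
    """Wilcoxon signed-rank sums (T+, T-) for paired differences."""
    d = [x for x in differences if x != 0]          # ignore zero differences
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1       # average rank for ties
        i = j + 1
    t_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    t_minus = sum(r for r, x in zip(ranks, d) if x < 0)
    return t_plus, t_minus

diffs = [-16, -6, -7, -13, -8, -12, -15]            # Previous - New
t_plus, t_minus = signed_rank_sums(diffs)
print(t_plus, t_minus, min(t_plus, t_minus) <= 2)   # 0 28.0 True
```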
Example - New MRI for 3D Coronary Angiography
• Previous vs new Magnetization Prep Schemes (n = 7)
• Response: Blood/Myocardium Contrast-Noise-Ratio

Subject                 A     B     C     D     E     F     G
Previous               20    31    20    19    40    28    10
New                    36    37    27    32    48    40    25
Diff = Previous - New -16    -6    -7   -13    -8   -12   -15
|Diff|                 16     6     7    13     8    12    15
Rank(|Diff|)            7     1     2     5     3     4     6

• All differences are negative: $T^- = 1 + 2 + \dots + 7 = 28$, $T^+ = 0$
• From tables for 2-sided tests with n = 7 and $\alpha = 0.05$: $T_0 = 2$
• Since $\min(T^+, T^-) = \min(0, 28) = 0 \le 2$, conclude the scheme medians differ
Source: Nguyen, et al (2004)
Computer Output - SPSS
[SPSS output for the Wilcoxon signed-rank test; not recoverable from the transcript]
Note that SPSS computes NEW - PREVIOUS in the top table