Chapter 6
Inferences Regarding Locations of
Two Distributions
Comparing 2 Means - Independent Samples
• Goal: Compare responses between 2 groups (populations,
treatments, conditions)
• Observed individuals from the 2 groups are samples from
distinct populations (identified by $(\mu_1,\sigma_1)$ and $(\mu_2,\sigma_2)$)
• Measurements across groups are independent (different
individuals in the 2 groups)
• Summary statistics obtained from the 2 groups:
Group 1 : Mean : $\bar{y}_1$ Std. Dev. : $s_1$ Sample Size : $n_1$
Group 2 : Mean : $\bar{y}_2$ Std. Dev. : $s_2$ Sample Size : $n_2$
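Sampling Distribution of $\bar{Y}_1 - \bar{Y}_2$
• Underlying distributions normal ==> sampling distribution
is normal
• Underlying distributions nonnormal, but large sample
sizes ==> sampling distribution approximately normal
• Mean, variance, standard error (Std. Dev. of estimator):
$$E(\bar{Y}_1 - \bar{Y}_2) = \mu_{\bar{Y}_1 - \bar{Y}_2} = \mu_1 - \mu_2$$
$$V(\bar{Y}_1 - \bar{Y}_2) = \sigma^2_{\bar{Y}_1 - \bar{Y}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$
$$\sigma_{\bar{Y}_1 - \bar{Y}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$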
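Small-Sample Test for $\mu_1 - \mu_2$
Normal Populations
• Case 1: Common Variances ($\sigma_1^2 = \sigma_2^2 = \sigma^2$)
• Null Hypothesis:
$H_0: \mu_1 - \mu_2 = \Delta_0$
• Alternative Hypotheses:
– 1-Sided: $H_A: \mu_1 - \mu_2 > \Delta_0$
– 2-Sided: $H_A: \mu_1 - \mu_2 \ne \Delta_0$
• Test Statistic: (where $s_p^2$ is a “pooled” estimate of $\sigma^2$)
$$t_{obs} = \frac{(\bar{y}_1 - \bar{y}_2) - \Delta_0}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$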
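Small-Sample Test for $\mu_1 - \mu_2$
Normal Populations
• Decision Rule: (Based on t-distribution with $\nu = n_1 + n_2 - 2$ df)
– 1-sided alternative
• If $t_{obs} \ge t_{\alpha,\nu}$ ==> Conclude $\mu_1 - \mu_2 > \Delta_0$
• If $t_{obs} < t_{\alpha,\nu}$ ==> Do not reject $\mu_1 - \mu_2 = \Delta_0$
– 2-sided alternative
• If $t_{obs} \ge t_{\alpha/2,\nu}$ ==> Conclude $\mu_1 - \mu_2 > \Delta_0$
• If $t_{obs} \le -t_{\alpha/2,\nu}$ ==> Conclude $\mu_1 - \mu_2 < \Delta_0$
• If $-t_{\alpha/2,\nu} < t_{obs} < t_{\alpha/2,\nu}$ ==> Do not reject $\mu_1 - \mu_2 = \Delta_0$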
Small-Sample Test for $\mu_1 - \mu_2$
Normal Populations
• Observed Significance Level (P-Value)
• Special Tables Needed, Printed by Statistical Software
Packages
– 1-sided alternative
• $P = P(t \ge t_{obs})$ (From the $t_\nu$ distribution)
– 2-sided alternative
• $P = 2P(t \ge |t_{obs}|)$ (From the $t_\nu$ distribution)
• If P-Value $\le \alpha$, then reject the null hypothesis
Small-Sample $(1-\alpha)100\%$ Confidence Interval
for $\mu_1 - \mu_2$ Normal Populations
• Confidence Coefficient $(1-\alpha)$ refers to the proportion of times this
rule would provide an interval that contains the true parameter
value $\mu_1 - \mu_2$ if it were applied over all possible samples
• Rule:
$$(\bar{y}_1 - \bar{y}_2) \pm t_{\alpha/2}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
• Interpretation (at the $\alpha$ significance level):
– If interval contains 0, do not reject $H_0: \mu_1 = \mu_2$
– If interval is strictly positive, conclude that $\mu_1 > \mu_2$
– If interval is strictly negative, conclude that $\mu_1 < \mu_2$
t-test when Variances are Unequal
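• Case 2: Population Variances not assumed to be equal ($\sigma_1^2 \ne \sigma_2^2$)
• Approximate degrees of freedom (two options)
– Calculated from a function of sample variances and sample sizes (see formula
below) - Satterthwaite’s approximation
– Smaller of $n_1 - 1$ and $n_2 - 1$
• Estimated standard error and test statistic for testing $H_0: \mu_1 = \mu_2$:
$$\text{Estimated standard error}: \ SE_{\bar{Y}_1 - \bar{Y}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
$$\text{Test Statistic}: \ t_{obs} = \frac{\bar{y}_1 - \bar{y}_2}{SE_{\bar{Y}_1 - \bar{Y}_2}} = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
$$\text{Satterthwaite's df}: \ \nu^* = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$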
Example - Maze Learning (Adults/Children)
• Groups: Adults (n1=14) / Children (n2=10)
• Outcome: Average # of Errors in Maze Learning Task
• Raw Data on next slide
Group            Mean    Std Dev   Sample Size
Adults (i=1)     13.28   4.47      14
Children (i=2)   18.28   9.93      10
• Conduct a 2-sided test of whether true mean scores differ
• Construct a 95% Confidence Interval for true difference
Source: Gould and Perrin (1916)
Example - Maze Learning (Adults/Children)
Name   Group   Trials   Errors   Average
H      1       41       728      17.76
W      1       25       333      13.32
Mac    1       33       453      13.73
McG    1       31       528      17.03
L      1       41       335       8.17
R      1       48       553      11.52
Hv     1       24       217       9.04
Hy     1       32       711      22.22
F      1       46       839      18.24
Wd     1       47       473      10.06
Rh     1       35       532      15.20
D      1       69       538       7.80
Hg     1       27       213       7.89
Hp     1       27       375      13.89
Hl     2       42       254       6.05
McS    2       89      1559      17.52
Lin    2       38      1089      28.66
B      2       20       254      12.70
N      2       49       599      12.22
T      2       40       520      13.00
J      2       50       828      16.56
Hz     2       40       516      12.90
Lev    2       54      2171      40.20
K      2       58      1331      22.95

Group   n    Mean    Std Dev
1       14   13.28   4.47
2       10   18.28   9.93
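Example - Maze Learning
Case 1 - Equal Variances
$H_0: \mu_1 - \mu_2 = 0$    $H_A: \mu_1 - \mu_2 \ne 0$    ($\alpha = 0.05$)
$$s_p = \sqrt{\frac{(14-1)(4.47)^2 + (10-1)(9.93)^2}{14 + 10 - 2}} = \sqrt{52.15} = 7.22$$
$$TS: \ t_{obs} = \frac{13.28 - 18.28}{7.22\sqrt{\frac{1}{14} + \frac{1}{10}}} = \frac{-5.00}{2.99} = -1.67$$
$$RR: \ |t_{obs}| \ge t_{.025,22} = 2.074$$
$$P\text{-value}: \ 2P(T \ge |-1.67|) = .1091 \ \text{(From EXCEL)}$$
$$95\%\ CI: \ -5.00 \pm 2.074(2.99) = -5.00 \pm 6.20 = (-11.20, 1.20)$$
No significant difference between 2 age groups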
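The Case 1 arithmetic can be checked with a few lines of Python. This is a minimal sketch working from the summary statistics on the slide; the function name `pooled_t` is hypothetical, and the critical value 2.074 is copied from the slide rather than computed.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance t statistic for H0: mu1 - mu2 = 0, from summary statistics."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df   # pooled variance
    se = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)     # SE of the difference
    return (mean1 - mean2) / se, se, df

# Maze data: adults (group 1) vs children (group 2)
t_obs, se, df = pooled_t(13.28, 4.47, 14, 18.28, 9.93, 10)

t_crit = 2.074                 # t_{.025,22}, taken from the slide (assumption)
diff = 13.28 - 18.28
ci = (diff - t_crit * se, diff + t_crit * se)

print(round(t_obs, 2), round(se, 2), df)
print(tuple(round(x, 2) for x in ci))
```

The rounded results agree with the hand calculation: t is about -1.67 on 22 df, and the interval covers 0.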
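Example - Maze Learning
Case 2 - Unequal Variances
$H_0: \mu_1 - \mu_2 = 0$    $H_A: \mu_1 - \mu_2 \ne 0$    ($\alpha = 0.05$)
$$\frac{s_1^2}{n_1} = \frac{(4.47)^2}{14} = 1.43 \qquad \frac{s_2^2}{n_2} = \frac{(9.93)^2}{10} = 9.86$$
$$\nu^* = \frac{(1.43 + 9.86)^2}{\frac{(1.43)^2}{13} + \frac{(9.86)^2}{9}} = \frac{127.46}{10.96} = 11.63$$
$$TS: \ t_{obs} = \frac{13.28 - 18.28}{\sqrt{\frac{(4.47)^2}{14} + \frac{(9.93)^2}{10}}} = \frac{-5.00}{3.36} = -1.49$$
$$RR: \ |t_{obs}| \ge t_{.025,11.63} = 2.19$$
$$95\%\ CI: \ -5.00 \pm 2.19(3.36) = -5.00 \pm 7.36 = (-12.36, 2.36)$$
No significant difference between 2 age groups
Note: Alternative would be to use 9 df (10-1)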
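The unequal-variance calculation, including Satterthwaite's df, can be sketched the same way. The helper name `welch_t` is an assumption for illustration; the inputs are the rounded summary statistics from the slide, so the df differs slightly from the SPSS value computed on unrounded data.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unequal-variance t statistic with Satterthwaite's approximate df."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2        # s_i^2 / n_i for each group
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return (mean1 - mean2) / se, se, df

# Maze data: adults (group 1) vs children (group 2)
t_obs, se, df = welch_t(13.28, 4.47, 14, 18.28, 9.93, 10)
print(round(t_obs, 2), round(se, 2), round(df, 2))
```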
SPSS Output
Group Statistics
AVE_ERR   GROUP   N    Mean      Std. Deviation   Std. Error Mean
          Adult   14   13.2761   4.46784          1.19408
          Child   10   18.2759   9.93279          3.14102

Independent Samples Test (AVE_ERR)
Levene's Test for Equality of Variances: F = 4.420, Sig. = .047
t-test for Equality of Means:
                              t        df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       -1.672   22       .109              -4.9998           2.99017                 -11.20101      1.20145
Equal variances not assumed   -1.488   11.621   .163              -4.9998           3.36034                 -12.34787      2.34831
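$(1-\alpha)100\%$ Confidence Interval for $\mu_1 - \mu_2$
Case 1 ($\sigma_1^2 = \sigma_2^2$):
$$(\bar{y}_1 - \bar{y}_2) \pm t_{\alpha/2}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \qquad df = n_1 + n_2 - 2$$
Maze Data (df = 22):
$$95\%\ CI: \ -5.00 \pm 2.074(2.99) = -5.00 \pm 6.20 = (-11.20, 1.20)$$
Case 2 ($\sigma_1^2 \ne \sigma_2^2$):
$$(\bar{y}_1 - \bar{y}_2) \pm t_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \qquad df = \text{Satterthwaite or smaller of } n_1 - 1,\ n_2 - 1$$
Maze Data (df = 11.63 or could use 9):
$$95\%\ CI: \ -5.00 \pm 2.19(3.36) = -5.00 \pm 7.36 = (-12.36, 2.36)$$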
Small Sample Test to Compare Two
Medians - Nonnormal Populations
• Two Independent Samples (Parallel Groups)
• Procedure (Wilcoxon Rank-Sum Test):
– Null hypothesis: Population Medians are equal H0: M1 = M2
– Rank measurements across samples from smallest (1) to
largest (n1+n2). Ties take average ranks.
– Obtain the rank sum for group with smallest sample size ($T$)
– 1-sided tests: Conclude $H_A: M_1 > M_2$ if $T > T_U$;
conclude $H_A: M_1 < M_2$ if $T < T_L$
– 2-sided tests: Conclude $H_A: M_1 \ne M_2$ if $T > T_U$ or $T < T_L$
– Values of TL and TU are given in Table 6, p. 683 for various
sample sizes and significance levels.
– This test is mathematically equivalent to Mann-Whitney U-test
Example - Levocabastine in Renal Patients
• 2 Groups: Non-Dialysis/Hemodialysis (n1 = n2 = 6)
• Outcome: Levocabastine AUC (1 Outlier/Group)

Non-Dialysis AUC (rank)   Hemodialysis AUC (rank)
857 (12)                  527 (7)
567 (9)                   740 (11)
626 (10)                  392 (2.5)
532 (8)                   514 (6)
444 (5)                   433 (4)
357 (1)                   392 (2.5)
T1 = 45                   T2 = 33

• 2-sided Test ($\alpha$ = 0.05): TL = 26, TU = 52, T = 45 (Group 1)
• Conclude Medians differ (M1<M2) if T < 26
• Conclude Medians differ (M1>M2) if T > 52
• Neither criterion is met, so do not conclude medians differ
Source: Zagornik, et al (1993)
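The ranking step, including the average rank given to the tied values of 392, can be sketched as follows. The helper `average_ranks` is a hypothetical name written for this illustration, and the critical values 26 and 52 are the tabled values quoted on the slide.

```python
def average_ranks(values):
    """Rank values from smallest (1) to largest; ties get the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

non_dialysis = [857, 567, 626, 532, 444, 357]
hemodialysis = [527, 740, 392, 514, 433, 392]

ranks = average_ranks(non_dialysis + hemodialysis)
T1 = sum(ranks[:len(non_dialysis)])    # rank sum, Non-Dialysis
T2 = sum(ranks[len(non_dialysis):])    # rank sum, Hemodialysis

T_L, T_U = 26, 52                      # tabled 2-sided critical values (from the slide)
reject = T1 < T_L or T1 > T_U
print(T1, T2, reject)                  # prints: 45.0 33.0 False
```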
Computer Output - SPSS
Ranks
AUC   GROUP          N    Mean Rank   Sum of Ranks
      Non-Dialysis   6    7.50        45.00
      Hemodialysis   6    5.50        33.00
      Total          12

Test Statistics (b)
                                 AUC
Mann-Whitney U                   12.000
Wilcoxon W                       33.000
Z                                -.962
Asymp. Sig. (2-tailed)           .336
Exact Sig. [2*(1-tailed Sig.)]   .394 (a)
a. Not corrected for ties.
b. Grouping Variable: GROUP
Note that SPSS uses rank sum for Group 2 as test statistic
Rank-Sum Test: Normal Approximation
• Under the null hypothesis of no difference in the two
groups (let T be rank sum for group 1):
$$\mu_T = \frac{n_1(N+1)}{2} \qquad \sigma_T = \sqrt{\frac{n_1 n_2 (N+1)}{12}} \qquad N = n_1 + n_2$$
• A z-statistic can be computed and P-value (approximate)
can be obtained from Z-distribution
$$z_{obs} = \frac{T - \mu_T}{\sigma_T} = \frac{T - n_1(N+1)/2}{\sqrt{n_1 n_2 (N+1)/12}}$$
Note: When there are many ties in ranks, a more complex formula
for $\sigma_T$ is often used, see p. 254 of Longnecker and Ott.
Example - Maze Learning
Adults = Group 1
Name   Group   Trials   Errors   Average   Rank   In Grp 1   In Grp 2   Rank (Grp 1)   Rank (Grp 2)
Hl     2       42       254       6.05     1      0          1          0              1
D      1       69       538       7.80     2      1          0          2              0
Hg     1       27       213       7.89     3      1          0          3              0
L      1       41       335       8.17     4      1          0          4              0
Hv     1       24       217       9.04     5      1          0          5              0
Wd     1       47       473      10.06     6      1          0          6              0
R      1       48       553      11.52     7      1          0          7              0
N      2       49       599      12.22     8      0          1          0              8
B      2       20       254      12.70     9      0          1          0              9
Hz     2       40       516      12.90     10     0          1          0              10
T      2       40       520      13.00     11     0          1          0              11
W      1       25       333      13.32     12     1          0          12             0
Mac    1       33       453      13.73     13     1          0          13             0
Hp     1       27       375      13.89     14     1          0          14             0
Rh     1       35       532      15.20     15     1          0          15             0
J      2       50       828      16.56     16     0          1          0              16
McG    1       31       528      17.03     17     1          0          17             0
McS    2       89      1559      17.52     18     0          1          0              18
H      1       41       728      17.76     19     1          0          19             0
F      1       46       839      18.24     20     1          0          20             0
Hy     1       32       711      22.22     21     1          0          21             0
K      2       58      1331      22.95     22     0          1          0              22
Lin    2       38      1089      28.66     23     0          1          0              23
Lev    2       54      2171      40.20     24     0          1          0              24
Rank sums:                                                              T = T1 = 158   T2 = 142
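Example - Maze Learning
$H_0: M_1 = M_2$    $H_A: M_1 \ne M_2$    $\alpha = 0.05$
Group 1 : Adults    $T = 158$    $n_1 = 14$    $n_2 = 10$    $N = n_1 + n_2 = 24$
$$\mu_T = \frac{n_1(N+1)}{2} = \frac{14(25)}{2} = 175$$
$$\sigma_T = \sqrt{\frac{n_1 n_2 (N+1)}{12}} = \sqrt{\frac{14(10)(25)}{12}} = 17.08$$
$$z_{obs} = \frac{158 - 175}{17.08} = -0.9954$$
$$RR: \ |z_{obs}| \ge z_{\alpha/2} = 1.96$$
$$\text{2-sided P-value}: \ 2P(Z \ge |-.9954|) = 2(.16) = .32$$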
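The normal approximation for the rank-sum test needs only the standard library. The sketch below assumes no correction for ties; the function name `rank_sum_z` is hypothetical.

```python
import math
from statistics import NormalDist

def rank_sum_z(T, n1, n2):
    """Normal approximation for the rank sum T of group 1 (no tie correction)."""
    N = n1 + n2
    mu_T = n1 * (N + 1) / 2
    sigma_T = math.sqrt(n1 * n2 * (N + 1) / 12)
    z = (T - mu_T) / sigma_T
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return mu_T, sigma_T, z, p_two_sided

# Maze data: T = 158 for adults (n1 = 14), children n2 = 10
mu_T, sigma_T, z_obs, p_value = rank_sum_z(158, 14, 10)
print(mu_T, round(sigma_T, 2), round(z_obs, 4), round(p_value, 2))
```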
Computer Output - SPSS
Ranks
AVE_ERR   GROUP   N    Mean Rank   Sum of Ranks
          Adult   14   11.29       158.00
          Child   10   14.20       142.00
          Total   24

Test Statistics (b)
                                 AVE_ERR
Mann-Whitney U                   53.000
Wilcoxon W                       158.000
Z                                -.995
Asymp. Sig. (2-tailed)           .320
Exact Sig. [2*(1-tailed Sig.)]   .341 (a)
a. Not corrected for ties.
b. Grouping Variable: GROUP
Inference Based on Paired Samples
(Crossover Designs)
• Setting: Each treatment is applied to each subject or pair
(preferably in random order)
• Data: $d_i$ is the difference in scores (Trt1-Trt2) for subject
(pair) $i$
• Parameter: $\mu_D$ - Population mean difference
• Sample Statistics:
$$\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n} \qquad s_d^2 = \frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n - 1} \qquad s_d = \sqrt{s_d^2}$$
Test Concerning $\mu_D$
• Null Hypothesis: $H_0: \mu_D = \Delta_0$
(almost always 0)
• Alternative Hypotheses:
– 1-Sided: $H_A: \mu_D > \Delta_0$
– 2-Sided: $H_A: \mu_D \ne \Delta_0$
• Test Statistic:
$$t_{obs} = \frac{\bar{d} - \Delta_0}{s_d / \sqrt{n}}$$
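Test Concerning $\mu_D$
Decision Rule: (Based on t-distribution with $\nu = n - 1$ df)
1-sided alternative ($H_A: \mu_D > \Delta_0$)
If $t_{obs} \ge t_{\alpha}$ ==> Conclude $\mu_D > \Delta_0$
If $t_{obs} < t_{\alpha}$ ==> Do not reject $\mu_D = \Delta_0$
2-sided alternative ($H_A: \mu_D \ne \Delta_0$)
If $t_{obs} \ge t_{\alpha/2}$ ==> Conclude $\mu_D > \Delta_0$
If $t_{obs} \le -t_{\alpha/2}$ ==> Conclude $\mu_D < \Delta_0$
If $-t_{\alpha/2} < t_{obs} < t_{\alpha/2}$ ==> Do not reject $\mu_D = \Delta_0$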
Confidence Interval for $\mu_D$
$$\bar{d} \pm t_{\alpha/2} \frac{s_d}{\sqrt{n}}$$
Example Antiperspirant Formulations
• Subjects - 20 Volunteers’ armpits (df=20-1=19)
• Treatments - Dry Powder vs Powder-in-Oil
• Measurements - Average Rating by Judges
– Higher scores imply more disagreeable odor
• Summary Statistics (Raw Data on next slide):
$\bar{d} = 0.15$    $s_d = 0.248$    $n = 20$
Source: E. Jungermann (1974)
Example Antiperspirant Formulations
Subject   Dry Powder   Powder-in-Oil   Difference
1         2.0          1.9              0.1
2         2.8          2.4              0.4
3         1.3          1.5             -0.2
4         1.8          1.8              0.0
5         1.9          1.8              0.1
6         2.8          2.4              0.4
7         2.0          2.2             -0.2
8         1.5          1.5              0.0
9         1.9          1.7              0.2
10        2.9          2.8              0.1
11        2.9          2.7              0.2
12        2.3          1.5              0.8
13        2.3          2.5             -0.2
14        3.6          3.2              0.4
15        2.2          2.1              0.1
16        2.1          1.9              0.2
17        2.5          2.6             -0.1
18        2.4          2.0              0.4
19        3.1          2.9              0.2
20        2.0          1.9              0.1
Mean of differences:      0.15
Std Dev of differences:   0.248151058
Example Antiperspirant Formulations
$H_0: \mu_D = 0$ (No difference in formulation effects)
$H_A: \mu_D \ne 0$ (Formulation effects differ)
$$TS: \ t_{obs} = \frac{\bar{d}}{s_d / \sqrt{n}} = \frac{0.15}{0.248 / \sqrt{20}} = \frac{0.15}{.0555} = 2.70$$
$$RR: \ |t_{obs}| \ge t_{.025,19} = 2.093$$
$$P\text{-value} = 2P(t \ge 2.70)$$
95% CI for $\mu_D$:
$$\bar{d} \pm t_{.025} \frac{s_d}{\sqrt{n}} = 0.15 \pm 2.093(.0555) = 0.15 \pm 0.116 = (0.034, 0.266)$$
Evidence that scores are higher (more unpleasant) for the dry
powder (formulation 1)
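The paired analysis can be reproduced from the raw ratings. This is a sketch under the assumption that the tabulated ratings are exact; the variable names are illustrative, and the critical value 2.093 is taken from the slide rather than computed.

```python
import math
from statistics import mean, stdev

# Judges' average ratings for the 20 subjects, in subject order
dry = [2, 2.8, 1.3, 1.8, 1.9, 2.8, 2, 1.5, 1.9, 2.9,
       2.9, 2.3, 2.3, 3.6, 2.2, 2.1, 2.5, 2.4, 3.1, 2]
oil = [1.9, 2.4, 1.5, 1.8, 1.8, 2.4, 2.2, 1.5, 1.7, 2.8,
       2.7, 1.5, 2.5, 3.2, 2.1, 1.9, 2.6, 2, 2.9, 1.9]

d = [a - b for a, b in zip(dry, oil)]    # Dry Powder minus Powder-in-Oil
n = len(d)
d_bar = mean(d)
s_d = stdev(d)                           # sample std dev (n - 1 divisor)
se = s_d / math.sqrt(n)
t_obs = d_bar / se                       # tests H0: mu_D = 0

t_crit = 2.093                           # t_{.025,19}, taken from the slide (assumption)
ci = (d_bar - t_crit * se, d_bar + t_crit * se)
print(round(d_bar, 2), round(s_d, 3), round(t_obs, 2))
print(tuple(round(x, 3) for x in ci))
```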
Small-Sample Test For Nonnormal Data
• Paired Samples (Crossover Design)
• Procedure (Wilcoxon Signed-Rank Test)
– Compute Differences di (as in the paired t-test) and obtain their
absolute values (ignoring 0s). n= number of non-zero differences
– Rank the observations by |di| (smallest=1), averaging ranks for ties
– Compute $T_+$ and $T_-$, the rank sums for the positive and negative
differences, respectively
– 1-sided tests: Conclude $H_A: M_1 > M_2$ if $T = T_- \le T_0$
– 2-sided tests: Conclude $H_A: M_1 \ne M_2$ if $T = \min(T_+, T_-) \le T_0$
– Values of T0 are given in Table 7, pp 684-685 for various sample
sizes and significance levels. P-values printed by statistical
software packages.
Signed-Rank Test: Normal Approximation
• Under the null hypothesis of no difference in the
two groups:
$$\mu_T = \frac{n(n+1)}{4} \qquad \sigma_T = \sqrt{\frac{n(n+1)(2n+1)}{24}}$$
• A z-statistic can be computed and P-value
(approximate) can be obtained from Z-distribution
$$z_{obs} = \frac{T - \mu_T}{\sigma_T} = \frac{T - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}}$$
Example - Caffeine and Endurance
• Subjects: 9 well-trained cyclists
• Treatments: 13mg Caffeine (Condition 1) vs 5mg (Condition 2)
• Measurements: Minutes Until Exhaustion
• This is a subset of a larger study (we’ll see later)
• Step 1: Take absolute values of differences (eliminating 0s)
• Step 2: Rank the absolute differences (averaging ranks for ties)
• Step 3: Sum Ranks for positive and negative true differences
Source: Pasman, et al (1995)
Example - Caffeine and Endurance
Original Data
Cyclist   mg13    mg5     mg13-mg5
1         37.55   42.47    -4.92
2         59.30   85.15   -25.85
3         79.12   63.20    15.92
4         58.33   52.10     6.23
5         70.54   66.20     4.34
6         69.47   73.25    -3.78
7         46.48   44.50     1.98
8         66.35   57.17     9.18
9         36.20   35.05     1.15
Example - Caffeine and Endurance
Absolute Differences
Cyclist   mg13    mg5     mg13-mg5   abs(diff)
1         37.55   42.47    -4.92      4.92
2         59.30   85.15   -25.85     25.85
3         79.12   63.20    15.92     15.92
4         58.33   52.10     6.23      6.23
5         70.54   66.20     4.34      4.34
6         69.47   73.25    -3.78      3.78
7         46.48   44.50     1.98      1.98
8         66.35   57.17     9.18      9.18
9         36.20   35.05     1.15      1.15

Ranked Absolute Differences
Cyclist   mg13    mg5     mg13-mg5   abs(diff)   rank
9         36.20   35.05     1.15      1.15       1
7         46.48   44.50     1.98      1.98       2
6         69.47   73.25    -3.78      3.78       3
5         70.54   66.20     4.34      4.34       4
1         37.55   42.47    -4.92      4.92       5
4         58.33   52.10     6.23      6.23       6
8         66.35   57.17     9.18      9.18       7
3         79.12   63.20    15.92     15.92       8
2         59.30   85.15   -25.85     25.85       9

T+ = 1+2+4+6+7+8 = 28
T- = 3+5+9 = 17
Example - Caffeine and Endurance
Under null hypothesis of no difference in the two groups ($T = T_+$):
$$\mu_T = \frac{n(n+1)}{4} = \frac{9(9+1)}{4} = \frac{90}{4} = 22.5$$
$$\sigma_T = \sqrt{\frac{n(n+1)(2n+1)}{24}} = \sqrt{\frac{9(9+1)(18+1)}{24}} = \sqrt{\frac{1710}{24}} = 8.44$$
$$z_{obs} = \frac{T - \mu_T}{\sigma_T} = \frac{28 - 22.5}{8.44} = \frac{5.5}{8.44} = 0.65$$
$$P\text{-value}: \ 2P(Z \ge |0.65|) = 2(.2578) = .5156$$
There is no evidence that endurance times differ for the 2
doses (we will see later that both are higher than no dose)
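The three steps of the signed-rank procedure and its normal approximation can be sketched as below. This sketch relies on the fact that these data contain no zero differences and no tied absolute differences, so plain ranks 1..n suffice; data with ties would need average ranks. Variable names are illustrative.

```python
import math
from statistics import NormalDist

# Minutes until exhaustion for the 9 cyclists, in cyclist order
mg13 = [37.55, 59.30, 79.12, 58.33, 70.54, 69.47, 46.48, 66.35, 36.20]
mg5 = [42.47, 85.15, 63.20, 52.10, 66.20, 73.25, 44.50, 57.17, 35.05]

# Step 1: differences (mg13 - mg5), dropping any zeros
diffs = [a - b for a, b in zip(mg13, mg5) if a != b]

# Step 2: rank by absolute difference (no ties here, so ranks are 1..n)
ordered = sorted(diffs, key=abs)

# Step 3: rank sums for positive and negative differences
t_plus = sum(rank for rank, d in enumerate(ordered, start=1) if d > 0)
t_minus = sum(rank for rank, d in enumerate(ordered, start=1) if d < 0)

# Normal approximation with T = T+
n = len(diffs)
mu_T = n * (n + 1) / 4
sigma_T = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
z_obs = (t_plus - mu_T) / sigma_T
p_value = 2 * (1 - NormalDist().cdf(abs(z_obs)))
print(t_plus, t_minus, round(z_obs, 2), round(p_value, 3))
```

Using the unrounded z gives a P-value near .515, matching the SPSS output; the slide's .5156 comes from rounding z to 0.65 before the table lookup.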
SPSS Output
Ranks
MG5 - MG13       N       Mean Rank   Sum of Ranks
Negative Ranks   6 (a)   4.67        28.00
Positive Ranks   3 (b)   5.67        17.00
Ties             0 (c)
Total            9
a. MG5 < MG13
b. MG5 > MG13
c. MG5 = MG13

Test Statistics (b)
                         MG5 - MG13
Z                        -.652 (a)
Asymp. Sig. (2-tailed)   .515
a. Based on positive ranks.
b. Wilcoxon Signed Ranks Test
Note that SPSS is taking MG5-MG13, while we used MG13-MG5
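Sample Sizes for Given Margin of Error
• Goal: Achieve a particular margin of error (E) for
estimating $\mu_1 - \mu_2$ (Width of the $(1-\alpha)100\%$ CI will be 2E)
– Case 1: Independent Samples (Assumes equal variances)
$$E = z_{\alpha/2}\,\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = z_{\alpha/2}\,\sigma\sqrt{\frac{2}{n}} \ \ (\text{when } n_1 = n_2 = n) \quad \Rightarrow \quad n = \frac{2 z_{\alpha/2}^2 \sigma^2}{E^2}$$
– Case 2: Paired Samples
$$E = z_{\alpha/2}\,\sigma_d\sqrt{\frac{1}{n}} \quad \Rightarrow \quad n = \frac{z_{\alpha/2}^2 \sigma_d^2}{E^2}$$
In practice, the variance will need to be estimated in a pilot study or
obtained from previously conducted work.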
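As a rough illustration of the two margin-of-error formulas, the sketch below solves for n and rounds up to a whole subject. The function name `n_for_margin` and the planning values E = 2 and sigma = 5 are hypothetical, chosen only to show the arithmetic; rounding up is a common convention that the slide does not state.

```python
import math
from statistics import NormalDist

def n_for_margin(E, sigma, alpha=0.05, paired=False):
    """Sample size giving margin of error E for a (1 - alpha)100% CI.

    Independent samples: n per group = 2 z^2 sigma^2 / E^2 (equal variances).
    Paired samples:      n pairs     =   z^2 sigma_d^2 / E^2 (pass sigma_d as sigma).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)      # z_{alpha/2}
    factor = 1 if paired else 2
    return math.ceil(factor * (z * sigma / E) ** 2)

# Hypothetical planning values, not from the slides
n_indep = n_for_margin(E=2.0, sigma=5.0)
n_paired = n_for_margin(E=2.0, sigma=5.0, paired=True)
print(n_indep, n_paired)                         # prints: 49 25
```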
Sample Size Calculations for Fixed Power
• Goal - Choose sample sizes to have a favorable chance of
detecting a specified difference in $\mu_1$ and $\mu_2$
• Step 1 - Define an important difference in means: $\Delta = \mu_1 - \mu_2$
• Step 2 - Choose the desired power to detect the clinically
meaningful difference ($1-\beta$, typically at least .80). For 2-sided test:
$$\text{Independent Samples}: \ n_1 = n_2 = \frac{2\sigma^2 (z_{\alpha/2} + z_\beta)^2}{\Delta^2}$$
$$\text{Paired Samples}: \ n = \frac{\sigma_d^2 (z_{\alpha/2} + z_\beta)^2}{\Delta^2}$$
For 1-sided tests, replace $z_{\alpha/2}$ with $z_\alpha$
In practice, variance must be estimated, or $\Delta$ given in units of $\sigma$
Example - Rosiglitazone for HIV-1
Lipoatrophy
• Trts - Rosiglitazone vs Placebo
• Response - Change in Limb fat mass
• Clinically Meaningful Difference - $\Delta = 0.5\sigma$
• Desired Power - $1-\beta = 0.80$
• Significance Level - $\alpha = 0.05$
$$z_{\alpha/2} = 1.96 \qquad z_\beta = z_{.20} = .84$$
$$n_1 = n_2 = \frac{2(1.96 + 0.84)^2}{(0.5)^2} = 62.72 \approx 63$$
Source: Carr, et al (2004)
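The power-based sample size can be checked with a short sketch. The function name `n_for_power` is hypothetical, the difference is expressed in units of sigma as in the example, and rounding up to a whole subject is an assumed convention.

```python
import math
from statistics import NormalDist

def n_for_power(delta_over_sigma, alpha=0.05, power=0.80, paired=False):
    """Sample size for a 2-sided test to detect Delta (in units of sigma).

    Independent samples: n per group = 2 (z_{alpha/2} + z_beta)^2 / (Delta/sigma)^2.
    Paired samples:      n pairs     =   (z_{alpha/2} + z_beta)^2 / (Delta/sigma_d)^2.
    """
    z_alpha2 = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)         # upper-beta point, beta = 1 - power
    factor = 1 if paired else 2
    return math.ceil(factor * (z_alpha2 + z_beta) ** 2 / delta_over_sigma ** 2)

# Rosiglitazone example: Delta = 0.5 sigma, power 0.80, alpha 0.05
n_per_group = n_for_power(0.5)
print(n_per_group)                               # prints: 63
```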
Data Sources
• Zagonik, J., M.L. Huang, A. Van Peer, et al. (1993). “Pharmacokinetics of
Orally Administered Levocabastine in Patients with Renal Insufficiency,”
Journal of Clinical Pharmacology, 33:1214-1218
• Gould, M.C. and F.A.C. Perrin (1916). “A Comparison of the Factors Involved
in the Maze Learning of Human Adults and Children,” Journal of Experimental
Psychology, 1:122-???
• Jungermann, E. (1974). “Antiperspirants: New Trends in Formulation and
Testing Technology,” Journal of the Society of Cosmetic Chemists 25:621-638
• Pasman, W.J., M.A. van Baak, A.E. Jeukendrup, and A. de Haan (1995). “The
Effect of Different Dosages of Caffeine on Endurance Performance Time,”
International Journal of Sports Medicine, 16:225-230
• Carr, A., C. Workman, D. Crey, et al. (2004). “No Effect of Rosiglitazone for
Treatment of HIV-1 Lipoatrophy: Randomised, Double-Blind, Placebo-Controlled Trial,” Lancet, 363:429-438