Transcript Chapter 6

Chapter 6
Inferences Regarding Locations of
Two Distributions
Comparing 2 Means - Independent Samples
• Goal: Compare responses between 2 groups (populations,
treatments, conditions)
• Observed individuals from the 2 groups are samples from
distinct populations (identified by $(\mu_1, \sigma_1)$ and $(\mu_2, \sigma_2)$)
• Measurements across groups are independent (different
individuals in the 2 groups)
• Summary statistics obtained from the 2 groups:
Group 1: Mean: $\bar{y}_1$   Std. Dev.: $s_1$   Sample Size: $n_1$
Group 2: Mean: $\bar{y}_2$   Std. Dev.: $s_2$   Sample Size: $n_2$
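Sampling Distribution of $\bar{Y}_1 - \bar{Y}_2$
• Underlying distributions normal $\Rightarrow$ sampling distribution
is normal
• Underlying distributions nonnormal, but large sample
sizes $\Rightarrow$ sampling distribution approximately normal
• Mean, variance, standard error (Std. Dev. of estimator):
$$E(\bar{Y}_1 - \bar{Y}_2) = \mu_{\bar{Y}_1 - \bar{Y}_2} = \mu_1 - \mu_2$$
$$V(\bar{Y}_1 - \bar{Y}_2) = \sigma^2_{\bar{Y}_1 - \bar{Y}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$
$$\sigma_{\bar{Y}_1 - \bar{Y}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$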
Small-Sample Test for $\mu_1 - \mu_2$
Normal Populations
• Case 1: Common Variances ($\sigma_1^2 = \sigma_2^2 = \sigma^2$)
• Null Hypothesis: $H_0: \mu_1 - \mu_2 = \Delta_0$
• Alternative Hypotheses:
– 1-Sided: $H_A: \mu_1 - \mu_2 > \Delta_0$
– 2-Sided: $H_A: \mu_1 - \mu_2 \ne \Delta_0$
• Test Statistic (where $s_p^2$ is a "pooled" estimate of $\sigma^2$):
$$t_{obs} = \frac{(\bar{y}_1 - \bar{y}_2) - \Delta_0}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \qquad s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$
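Small-Sample Test for $\mu_1 - \mu_2$
Normal Populations
• Decision Rule: (Based on t-distribution with $\nu = n_1 + n_2 - 2$ df)
– 1-sided alternative
• If $t_{obs} \ge t_{\alpha,\nu}$ ==> Conclude $\mu_1 - \mu_2 > \Delta_0$
• If $t_{obs} < t_{\alpha,\nu}$ ==> Do not reject $\mu_1 - \mu_2 = \Delta_0$
– 2-sided alternative
• If $t_{obs} \ge t_{\alpha/2,\nu}$ ==> Conclude $\mu_1 - \mu_2 > \Delta_0$
• If $t_{obs} \le -t_{\alpha/2,\nu}$ ==> Conclude $\mu_1 - \mu_2 < \Delta_0$
• If $-t_{\alpha/2,\nu} < t_{obs} < t_{\alpha/2,\nu}$ ==> Do not reject $\mu_1 - \mu_2 = \Delta_0$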
Small-Sample Test for $\mu_1 - \mu_2$
Normal Populations
• Observed Significance Level (P-Value)
• Special Tables Needed, Printed by Statistical Software
Packages
– 1-sided alternative
• $P = P(t \ge t_{obs})$ (From the $t_\nu$ distribution)
– 2-sided alternative
• $P = 2P(t \ge |t_{obs}|)$ (From the $t_\nu$ distribution)
• If P-Value $\le \alpha$, then reject the null hypothesis
Small-Sample $(1-\alpha)100\%$ Confidence Interval
for $\mu_1 - \mu_2$, Normal Populations
• Confidence Coefficient $(1-\alpha)$ refers to the proportion of times this
rule would provide an interval that contains the true parameter
value $\mu_1 - \mu_2$ if it were applied over all possible samples
• Rule:
$$(\bar{y}_1 - \bar{y}_2) \pm t_{\alpha/2}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
• Interpretation (at the $\alpha$ significance level):
– If interval contains 0, do not reject $H_0: \mu_1 = \mu_2$
– If interval is strictly positive, conclude that $\mu_1 > \mu_2$
– If interval is strictly negative, conclude that $\mu_1 < \mu_2$
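The pooled test statistic and interval rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `pooled_t` is hypothetical, the critical value is passed in by hand (2.074 is the tabled $t_{.025,22}$ used in the maze-learning example of this chapter), and the inputs are that example's rounded summary statistics.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2, t_crit, delta0=0.0):
    """Pooled-variance t statistic and CI from summary statistics.

    t_crit is the tabled critical value t_{alpha/2, n1+n2-2}; it is an
    input here because the standard library has no t-distribution quantiles.
    """
    df = n1 + n2 - 2
    # Pooled standard deviation s_p
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    # Estimated standard error of ybar1 - ybar2
    se = sp * math.sqrt(1 / n1 + 1 / n2)
    diff = mean1 - mean2
    t_obs = (diff - delta0) / se
    ci = (diff - t_crit * se, diff + t_crit * se)
    return t_obs, df, ci

# Maze-learning summary statistics (adults = group 1, children = group 2)
t_obs, df, ci = pooled_t(13.28, 4.47, 14, 18.28, 9.93, 10, t_crit=2.074)
print(f"t_obs = {t_obs:.2f}, df = {df}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Because the interval covers 0, the sketch leads to the same reading as the interpretation rule: do not reject $H_0: \mu_1 = \mu_2$.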
t-test when Variances are Unequal
• Case 2: Population Variances not assumed to be equal ($\sigma_1^2 \ne \sigma_2^2$)
• Approximate degrees of freedom (two options):
– Calculated from a function of sample variances and sample sizes (see formula
below) - Satterthwaite's approximation
– Smaller of $n_1 - 1$ and $n_2 - 1$
• Estimated standard error and test statistic for testing $H_0: \mu_1 = \mu_2$:
$$\text{Estimated standard error: } SE(\bar{Y}_1 - \bar{Y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
$$\text{Test Statistic: } t_{obs} = \frac{\bar{y}_1 - \bar{y}_2}{SE(\bar{y}_1 - \bar{y}_2)} = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
$$\text{Satterthwaite's df: } \nu^* = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$
Example - Maze Learning (Adults/Children)
• Groups: Adults (n1=14) / Children (n2=10)
• Outcome: Average # of Errors in Maze Learning Task
• Raw Data on next slide
Group            Mean    Std Dev   Sample Size
Adults (i=1)     13.28   4.47      14
Children (i=2)   18.28   9.93      10
• Conduct a 2-sided test of whether true mean scores differ
• Construct a 95% Confidence Interval for true difference
Source: Gould and Perrin (1916)
Example - Maze Learning (Adults/Children)
Name   Group   Trials   Errors   Average
H      1       41       728      17.76
W      1       25       333      13.32
Mac    1       33       453      13.73
McG    1       31       528      17.03
L      1       41       335      8.17
R      1       48       553      11.52
Hv     1       24       217      9.04
Hy     1       32       711      22.22
F      1       46       839      18.24
Wd     1       47       473      10.06
Rh     1       35       532      15.20
D      1       69       538      7.80
Hg     1       27       213      7.89
Hp     1       27       375      13.89
Hl     2       42       254      6.05
McS    2       89       1559     17.52
Lin    2       38       1089     28.66
B      2       20       254      12.70
N      2       49       599      12.22
T      2       40       520      13.00
J      2       50       828      16.56
Hz     2       40       516      12.90
Lev    2       54       2171     40.20
K      2       58       1331     22.95

Group   n    Mean    Std Dev
1       14   13.28   4.47
2       10   18.28   9.93
Example - Maze Learning
Case 1 - Equal Variances
$H_0: \mu_1 - \mu_2 = 0$    $H_A: \mu_1 - \mu_2 \ne 0$    ($\alpha = 0.05$)
$$s_p = \sqrt{\frac{(14-1)(4.47)^2 + (10-1)(9.93)^2}{14 + 10 - 2}} = \sqrt{52.15} = 7.22$$
$$TS: t_{obs} = \frac{13.28 - 18.28}{7.22\sqrt{\frac{1}{14} + \frac{1}{10}}} = \frac{-5.00}{2.99} = -1.67$$
$RR: |t_{obs}| \ge t_{.025,22} = 2.074$
P-value: $2P(T \ge |-1.67|) = .1091$ (From EXCEL)
95% CI: $-5.00 \pm 2.074(2.99) \equiv -5.00 \pm 6.20 \equiv (-11.2, 1.2)$
No significant difference between 2 age groups
Example - Maze Learning
Case 2 - Unequal Variances
$H_0: \mu_1 - \mu_2 = 0$    $H_A: \mu_1 - \mu_2 \ne 0$    ($\alpha = 0.05$)
$$\frac{S_1^2}{n_1} = \frac{(4.47)^2}{14} = 1.43 \qquad \frac{S_2^2}{n_2} = \frac{(9.93)^2}{10} = 9.86$$
$$\nu^* = \frac{(1.43 + 9.86)^2}{\frac{(1.43)^2}{13} + \frac{(9.86)^2}{9}} = \frac{127.46}{10.96} = 11.63$$
$$TS: t_{obs} = \frac{13.28 - 18.28}{\sqrt{\frac{(4.47)^2}{14} + \frac{(9.93)^2}{10}}} = \frac{-5.00}{3.36} = -1.49$$
$RR: |t_{obs}| \ge t_{.025,11.63} = 2.19$
95% CI: $-5.00 \pm 2.19(3.36) \equiv -5.00 \pm 7.36 \equiv (-12.36, 2.36)$
No significant difference between 2 age groups
Note: Alternative would be to use 9 df (10-1)
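The Case 2 arithmetic can be checked with a short Python sketch. The function name `welch_t` is hypothetical, and the inputs are the rounded summary statistics from this example, so the degrees of freedom differ slightly in the third decimal from the SPSS value computed on unrounded data.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unequal-variance t statistic with Satterthwaite's approximate df."""
    v1 = sd1**2 / n1          # s1^2 / n1
    v2 = sd2**2 / n2          # s2^2 / n2
    se = math.sqrt(v1 + v2)   # estimated standard error of ybar1 - ybar2
    t_obs = (mean1 - mean2) / se
    # Satterthwaite's approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t_obs, se, df

# Maze-learning summary statistics (adults = group 1, children = group 2)
t_obs, se, df = welch_t(13.28, 4.47, 14, 18.28, 9.93, 10)
print(f"t_obs = {t_obs:.2f}, SE = {se:.2f}, df = {df:.2f}")

# Conservative alternative mentioned in the note: smaller of n1-1 and n2-1
df_conservative = min(14 - 1, 10 - 1)
```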
SPSS Output
Group Statistics (AVE_ERR)
GROUP   N    Mean      Std. Deviation   Std. Error Mean
Adult   14   13.2761   4.46784          1.19408
Child   10   18.2759   9.93279          3.14102

Independent Samples Test (AVE_ERR)
Levene's Test for Equality of Variances: F = 4.420, Sig. = .047
t-test for Equality of Means:
                              t        df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       -1.672   22       .109              -4.9998           2.99017                 -11.20101      1.20145
Equal variances not assumed   -1.488   11.621   .163              -4.9998           3.36034                 -12.34787      2.34831
$(1-\alpha)100\%$ Confidence Interval for $\mu_1 - \mu_2$
Case 1 ($\sigma_1^2 = \sigma_2^2$):
$$(\bar{y}_1 - \bar{y}_2) \pm t_{\alpha/2}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \qquad df = n_1 + n_2 - 2$$
Maze Data ($df = 22$):
95% CI: $-5.00 \pm 2.074(2.99) \equiv -5.00 \pm 6.20 \equiv (-11.2, 1.2)$
Case 2 ($\sigma_1^2 \ne \sigma_2^2$):
$$(\bar{y}_1 - \bar{y}_2) \pm t_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \qquad df = \text{Satterthwaite or smaller of } n_1 - 1,\ n_2 - 1$$
Maze Data ($df = 11.63$ or could use 9):
95% CI: $-5.00 \pm 2.19(3.36) \equiv -5.00 \pm 7.36 \equiv (-12.36, 2.36)$
Small Sample Test to Compare Two
Medians - Nonnormal Populations
• Two Independent Samples (Parallel Groups)
• Procedure (Wilcoxon Rank-Sum Test):
– Null hypothesis: Population medians are equal, $H_0: M_1 = M_2$
– Rank measurements across samples from smallest (1) to
largest ($n_1 + n_2$). Ties take average ranks.
– Obtain the rank sum for the group with the smaller sample size ($T$)
– 1-sided tests: Conclude $H_A: M_1 > M_2$ if $T > T_U$
– Conclude $H_A: M_1 < M_2$ if $T < T_L$
– 2-sided tests: Conclude $H_A: M_1 \ne M_2$ if $T > T_U$ or $T < T_L$
– Values of $T_L$ and $T_U$ are given in Table 6, p. 683 for various
sample sizes and significance levels.
– This test is mathematically equivalent to Mann-Whitney U-test
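The ranking step, including average ranks for ties, can be sketched as follows. The helper names `average_ranks` and `rank_sums` are hypothetical, and the data are the levocabastine AUC values from the renal-patient example in this chapter. The critical values $T_L$ and $T_U$ still have to come from the table.

```python
def average_ranks(values):
    """Rank values from smallest (1) to largest; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sums(group1, group2):
    """Rank the combined sample and return the rank sum of each group."""
    ranks = average_ranks(list(group1) + list(group2))
    return sum(ranks[:len(group1)]), sum(ranks[len(group1):])

non_dialysis = [857, 567, 626, 532, 444, 357]
hemodialysis = [527, 740, 392, 514, 433, 392]
t1, t2 = rank_sums(non_dialysis, hemodialysis)
print(t1, t2)
```

A quick sanity check is that the two rank sums always add up to $N(N+1)/2$, here $12(13)/2 = 78$.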
Example - Levocabastine in Renal Patients
• 2 Groups: Non-Dialysis/Hemodialysis ($n_1 = n_2 = 6$)
• Outcome: Levocabastine AUC (1 Outlier/Group)
AUC values, with ranks in parentheses:
Non-Dialysis   Hemodialysis
857 (12)       527 (7)
567 (9)        740 (11)
626 (10)       392 (2.5)
532 (8)        514 (6)
444 (5)        433 (4)
357 (1)        392 (2.5)
$T_1 = 45$     $T_2 = 33$
• 2-sided Test ($\alpha = 0.05$): $T_L = 26$, $T_U = 52$, $T = 45$ (Group 1)
• Conclude Medians differ ($M_1 < M_2$) if $T < 26$
• Conclude Medians differ ($M_1 > M_2$) if $T > 52$
• Neither criterion is met, so do not conclude medians differ
Source: Zagornik, et al (1993)
Computer Output - SPSS
Ranks (AUC)
GROUP          N    Mean Rank   Sum of Ranks
Non-Dialysis   6    7.50        45.00
Hemodialysis   6    5.50        33.00
Total          12

Test Statistics (b) (AUC)
Mann-Whitney U                    12.000
Wilcoxon W                        33.000
Z                                 -.962
Asymp. Sig. (2-tailed)            .336
Exact Sig. [2*(1-tailed Sig.)]    .394 (a)
a. Not corrected for ties.
b. Grouping Variable: GROUP
Note that SPSS uses rank sum for Group 2 as test statistic
Rank-Sum Test: Normal Approximation
• Under the null hypothesis of no difference in the two
groups (let $T$ be rank sum for group 1):
$$\mu_T = \frac{n_1(N+1)}{2} \qquad \sigma_T = \sqrt{\frac{n_1 n_2 (N+1)}{12}} \qquad N = n_1 + n_2$$
• A z-statistic can be computed and P-value (approximate)
can be obtained from Z-distribution
$$z_{obs} = \frac{T - \mu_T}{\sigma_T} = \frac{T - n_1(N+1)/2}{\sqrt{n_1 n_2 (N+1)/12}}$$
Note: When there are many ties in ranks, a more complex formula
for $\sigma_T$ is often used, see p. 254 of Longnecker and Ott.
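A minimal sketch of the normal approximation, assuming no tie correction (the simple formula for $\sigma_T$). The function name `rank_sum_z` is hypothetical. The two-sided P-value uses the identity $2P(Z \ge |z|) = \mathrm{erfc}(|z|/\sqrt{2})$, and the inputs are the rank sum and sample sizes from the maze-learning example in this chapter.

```python
import math

def rank_sum_z(t, n1, n2):
    """Normal approximation for the rank-sum statistic T of group 1 (no tie correction)."""
    n_total = n1 + n2
    mu_t = n1 * (n_total + 1) / 2
    sigma_t = math.sqrt(n1 * n2 * (n_total + 1) / 12)
    z = (t - mu_t) / sigma_t
    # Two-sided P-value: 2 * P(Z >= |z|) for a standard normal Z
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return mu_t, sigma_t, z, p_two_sided

# Maze learning: adults are group 1 with T = 158, n1 = 14, n2 = 10
mu_t, sigma_t, z, p = rank_sum_z(158, 14, 10)
print(f"mu_T = {mu_t}, sigma_T = {sigma_t:.2f}, z = {z:.4f}, P = {p:.2f}")
```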
Example - Maze Learning
Adults = Group 1
Observations sorted by average errors. The last four columns are group indicators and the rank credited to each group.
Name   Group   Trials   Errors   Average   Rank   Adult   Child   Adult rank   Child rank
Hl     2       42       254      6.05      1      0       1       0            1
D      1       69       538      7.80      2      1       0       2            0
Hg     1       27       213      7.89      3      1       0       3            0
L      1       41       335      8.17      4      1       0       4            0
Hv     1       24       217      9.04      5      1       0       5            0
Wd     1       47       473      10.06     6      1       0       6            0
R      1       48       553      11.52     7      1       0       7            0
N      2       49       599      12.22     8      0       1       0            8
B      2       20       254      12.70     9      0       1       0            9
Hz     2       40       516      12.90     10     0       1       0            10
T      2       40       520      13.00     11     0       1       0            11
W      1       25       333      13.32     12     1       0       12           0
Mac    1       33       453      13.73     13     1       0       13           0
Hp     1       27       375      13.89     14     1       0       14           0
Rh     1       35       532      15.20     15     1       0       15           0
J      2       50       828      16.56     16     0       1       0            16
McG    1       31       528      17.03     17     1       0       17           0
McS    2       89       1559     17.52     18     0       1       0            18
H      1       41       728      17.76     19     1       0       19           0
F      1       46       839      18.24     20     1       0       20           0
Hy     1       32       711      22.22     21     1       0       21           0
K      2       58       1331     22.95     22     0       1       0            22
Lin    2       38       1089     28.66     23     0       1       0            23
Lev    2       54       2171     40.20     24     0       1       0            24
Rank sums: $T = T_1 = 158$, $T_2 = 142$
Example - Maze Learning
$H_0: M_1 = M_2$    $H_A: M_1 \ne M_2$    $\alpha = 0.05$
Group 1: Adults    $T = 158$    $n_1 = 14$    $n_2 = 10$    $N = n_1 + n_2 = 24$
$$\mu_T = \frac{n_1(N+1)}{2} = \frac{14(25)}{2} = 175$$
$$\sigma_T = \sqrt{\frac{n_1 n_2 (N+1)}{12}} = \sqrt{\frac{14(10)(25)}{12}} = 17.08$$
$$z_{obs} = \frac{158 - 175}{17.08} = -0.9954$$
$RR: |z_{obs}| \ge z_{\alpha/2} = 1.96$
2-sided P-value: $2P(Z \ge |-.9954|) = 2(.16) = .32$
Computer Output - SPSS
Ranks (AVE_ERR)
GROUP   N    Mean Rank   Sum of Ranks
Adult   14   11.29       158.00
Child   10   14.20       142.00
Total   24

Test Statistics (b) (AVE_ERR)
Mann-Whitney U                    53.000
Wilcoxon W                        158.000
Z                                 -.995
Asymp. Sig. (2-tailed)            .320
Exact Sig. [2*(1-tailed Sig.)]    .341 (a)
a. Not corrected for ties.
b. Grouping Variable: GROUP
Inference Based on Paired Samples
(Crossover Designs)
• Setting: Each treatment is applied to each subject or pair
(preferably in random order)
• Data: $d_i$ is the difference in scores (Trt1-Trt2) for subject
(pair) $i$
• Parameter: $\mu_D$ - Population mean difference
• Sample Statistics:
$$\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n} \qquad s_d^2 = \frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n - 1} \qquad s_d = \sqrt{s_d^2}$$
Test Concerning mD
• Null Hypothesis: $H_0: \mu_D = \Delta_0$ (almost always 0)
• Alternative Hypotheses:
– 1-Sided: $H_A: \mu_D > \Delta_0$
– 2-Sided: $H_A: \mu_D \ne \Delta_0$
• Test Statistic:
$$t_{obs} = \frac{\bar{d} - \Delta_0}{s_d / \sqrt{n}}$$
Test Concerning mD
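Decision Rule: (Based on t-distribution with $\nu = n - 1$ df)
1-sided alternative ($H_A: \mu_D > \Delta_0$)
If $t_{obs} \ge t_\alpha$ ==> Conclude $\mu_D > \Delta_0$
If $t_{obs} < t_\alpha$ ==> Do not reject $\mu_D = \Delta_0$
2-sided alternative ($H_A: \mu_D \ne \Delta_0$)
If $t_{obs} \ge t_{\alpha/2}$ ==> Conclude $\mu_D > \Delta_0$
If $t_{obs} \le -t_{\alpha/2}$ ==> Conclude $\mu_D < \Delta_0$
If $-t_{\alpha/2} < t_{obs} < t_{\alpha/2}$ ==> Do not reject $\mu_D = \Delta_0$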
Confidence Interval for $\mu_D$
$$\bar{d} \pm t_{\alpha/2} \left( \frac{s_d}{\sqrt{n}} \right)$$
Example Antiperspirant Formulations
• Subjects - 20 Volunteers’ armpits (df=20-1=19)
• Treatments - Dry Powder vs Powder-in-Oil
• Measurements - Average Rating by Judges
– Higher scores imply more disagreeable odor
• Summary Statistics (Raw Data on next slide):
$\bar{d} = 0.15$    $s_d = 0.248$    $n = 20$
Source: E. Jungermann (1974)
Example Antiperspirant Formulations
Subject   Dry Powder   Powder-in-Oil   Difference
1         2            1.9             0.1
2         2.8          2.4             0.4
3         1.3          1.5             -0.2
4         1.8          1.8             0
5         1.9          1.8             0.1
6         2.8          2.4             0.4
7         2            2.2             -0.2
8         1.5          1.5             0
9         1.9          1.7             0.2
10        2.9          2.8             0.1
11        2.9          2.7             0.2
12        2.3          1.5             0.8
13        2.3          2.5             -0.2
14        3.6          3.2             0.4
15        2.2          2.1             0.1
16        2.1          1.9             0.2
17        2.5          2.6             -0.1
18        2.4          2               0.4
19        3.1          2.9             0.2
20        2            1.9             0.1
Mean of differences: 0.15
Std Dev of differences: 0.248151058
Example Antiperspirant Formulations
$H_0: \mu_D = 0$ (No difference in formulation effects)
$H_A: \mu_D \ne 0$ (Formulation effects differ)
$$TS: t_{obs} = \frac{\bar{d}}{s_d / \sqrt{n}} = \frac{0.15}{0.248 / \sqrt{20}} = \frac{0.15}{.0555} = 2.70$$
$RR: |t_{obs}| \ge t_{.025,19} = 2.093$
P-value $= 2P(t \ge 2.70)$
95% CI for $\mu_D$:
$$\bar{d} \pm t_{.025} \frac{s_d}{\sqrt{n}} \equiv 0.15 \pm 2.093(.0555) \equiv 0.15 \pm 0.116 \equiv (0.034, 0.266)$$
Evidence that scores are higher (more unpleasant) for the dry
powder (formulation 1)
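The paired calculations can be reproduced from the raw ratings with the standard library. This is a sketch with a hypothetical function name `paired_t`; the critical value 2.093 is the tabled $t_{.025,19}$ quoted in the example and is passed in by hand.

```python
import math
import statistics

def paired_t(differences, t_crit):
    """Paired t statistic for H0: mu_D = 0 and the matching confidence interval."""
    n = len(differences)
    d_bar = statistics.mean(differences)
    s_d = statistics.stdev(differences)   # sample std. dev. (divides by n - 1)
    se = s_d / math.sqrt(n)
    t_obs = d_bar / se
    ci = (d_bar - t_crit * se, d_bar + t_crit * se)
    return d_bar, s_d, t_obs, ci

# Judges' average ratings for the 20 subjects in the antiperspirant example
dry = [2, 2.8, 1.3, 1.8, 1.9, 2.8, 2, 1.5, 1.9, 2.9,
       2.9, 2.3, 2.3, 3.6, 2.2, 2.1, 2.5, 2.4, 3.1, 2]
oil = [1.9, 2.4, 1.5, 1.8, 1.8, 2.4, 2.2, 1.5, 1.7, 2.8,
       2.7, 1.5, 2.5, 3.2, 2.1, 1.9, 2.6, 2, 2.9, 1.9]
diffs = [a - b for a, b in zip(dry, oil)]   # Trt1 - Trt2

d_bar, s_d, t_obs, ci = paired_t(diffs, t_crit=2.093)
print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.3f}, t_obs = {t_obs:.2f}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```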
Small-Sample Test For Nonnormal Data
• Paired Samples (Crossover Design)
• Procedure (Wilcoxon Signed-Rank Test)
– Compute Differences di (as in the paired t-test) and obtain their
absolute values (ignoring 0s). n= number of non-zero differences
– Rank the observations by |di| (smallest=1), averaging ranks for ties
– Compute T+ and T- , the rank sums for the positive and negative
differences, respectively
– 1-sided tests: Conclude $H_A: M_1 > M_2$ if $T = T^- \le T_0$
– 2-sided tests: Conclude $H_A: M_1 \ne M_2$ if $T = \min(T^+, T^-) \le T_0$
– Values of T0 are given in Table 7, pp 684-685 for various sample
sizes and significance levels. P-values printed by statistical
software packages.
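The procedure can be sketched in Python as follows. The function name `signed_rank_sums` is hypothetical, and the data are the endurance times from the caffeine example in this chapter. Ties among the absolute differences are detected by exact equality, which is adequate for this data set but may need a tolerance for other floating-point data.

```python
def signed_rank_sums(x, y):
    """Return (T+, T-, n) for paired samples, dropping zero differences."""
    diffs = [a - b for a, b in zip(x, y)]
    diffs = [d for d in diffs if d != 0]          # ignore zero differences
    abs_d = [abs(d) for d in diffs]
    order = sorted(range(len(diffs)), key=lambda i: abs_d[i])
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied absolute differences
        while j + 1 < len(order) and abs_d[order[j + 1]] == abs_d[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1   # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return t_plus, t_minus, len(diffs)

mg13 = [37.55, 59.30, 79.12, 58.33, 70.54, 69.47, 46.48, 66.35, 36.20]
mg5 = [42.47, 85.15, 63.20, 52.10, 66.20, 73.25, 44.50, 57.17, 35.05]
t_plus, t_minus, n = signed_rank_sums(mg13, mg5)
print(t_plus, t_minus, n)
```

The two sums always add up to $n(n+1)/2$, which gives a quick check on the ranking.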
Signed-Rank Test: Normal Approximation
• Under the null hypothesis of no difference in the
two groups:
$$\mu_T = \frac{n(n+1)}{4} \qquad \sigma_T = \sqrt{\frac{n(n+1)(2n+1)}{24}}$$
• A z-statistic can be computed and P-value
(approximate) can be obtained from Z-distribution
$$z_{obs} = \frac{T - \mu_T}{\sigma_T} = \frac{T - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}}$$
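A minimal sketch of this approximation, with a hypothetical function name `signed_rank_z`. The inputs $T = T^+ = 28$ and $n = 9$ come from the caffeine example in this chapter, and the two-sided P-value again uses $2P(Z \ge |z|) = \mathrm{erfc}(|z|/\sqrt{2})$.

```python
import math

def signed_rank_z(t, n):
    """Normal approximation for a signed-rank sum T based on n non-zero differences."""
    mu_t = n * (n + 1) / 4
    sigma_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (t - mu_t) / sigma_t
    # Two-sided P-value: 2 * P(Z >= |z|) for a standard normal Z
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return mu_t, sigma_t, z, p_two_sided

mu_t, sigma_t, z, p = signed_rank_z(28, 9)
print(f"mu_T = {mu_t}, sigma_T = {sigma_t:.2f}, z = {z:.2f}, P = {p:.3f}")
```

Using $T^-$ instead of $T^+$ only flips the sign of $z$, so the two-sided P-value is unchanged.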
Example - Caffeine and Endurance
• Subjects: 9 well-trained cyclists
• Treatments: 13mg Caffeine (Condition 1) vs 5mg (Condition 2)
• Measurements: Minutes Until Exhaustion
• This is subset of larger study (we’ll see later)
• Step 1: Take absolute values of differences (eliminating 0s)
• Step 2: Rank the absolute differences (averaging ranks for ties)
• Step 3: Sum Ranks for positive and negative true differences
Source: Pasman, et al (1995)
Example - Caffeine and Endurance
Original Data
Cyclist   mg13    mg5     mg13-mg5
1         37.55   42.47   -4.92
2         59.30   85.15   -25.85
3         79.12   63.20   15.92
4         58.33   52.10   6.23
5         70.54   66.20   4.34
6         69.47   73.25   -3.78
7         46.48   44.50   1.98
8         66.35   57.17   9.18
9         36.20   35.05   1.15
Example - Caffeine and Endurance
Absolute Differences
Cyclist   mg13    mg5     mg13-mg5   abs(diff)
1         37.55   42.47   -4.92      4.92
2         59.30   85.15   -25.85     25.85
3         79.12   63.20   15.92      15.92
4         58.33   52.10   6.23       6.23
5         70.54   66.20   4.34       4.34
6         69.47   73.25   -3.78      3.78
7         46.48   44.50   1.98       1.98
8         66.35   57.17   9.18       9.18
9         36.20   35.05   1.15       1.15

Ranked Absolute Differences
Cyclist   mg13    mg5     mg13-mg5   abs(diff)   rank
9         36.20   35.05   1.15       1.15        1
7         46.48   44.50   1.98       1.98        2
6         69.47   73.25   -3.78      3.78        3
5         70.54   66.20   4.34       4.34        4
1         37.55   42.47   -4.92      4.92        5
4         58.33   52.10   6.23       6.23        6
8         66.35   57.17   9.18       9.18        7
3         79.12   63.20   15.92      15.92       8
2         59.30   85.15   -25.85     25.85       9

$T^+ = 1+2+4+6+7+8 = 28$
$T^- = 3+5+9 = 17$
Example - Caffeine and Endurance
Under null hypothesis of no difference in the two groups ($T = T^+$):
$$\mu_T = \frac{n(n+1)}{4} = \frac{9(9+1)}{4} = \frac{90}{4} = 22.5$$
$$\sigma_T = \sqrt{\frac{n(n+1)(2n+1)}{24}} = \sqrt{\frac{9(9+1)(18+1)}{24}} = \sqrt{\frac{1710}{24}} = 8.44$$
$$z_{obs} = \frac{T - \mu_T}{\sigma_T} = \frac{28 - 22.5}{8.44} = \frac{5.5}{8.44} = 0.65$$
P-Value: $2P(Z \ge |0.65|) = 2(.2578) = .5156$
There is no evidence that endurance times differ for the 2
doses (we will see later that both are higher than no dose)
SPSS Output
Ranks (MG5 - MG13)
                 N       Mean Rank   Sum of Ranks
Negative Ranks   6 (a)   4.67        28.00
Positive Ranks   3 (b)   5.67        17.00
Ties             0 (c)
Total            9
a. MG5 < MG13
b. MG5 > MG13
c. MG5 = MG13

Test Statistics (b) (MG5 - MG13)
Z                        -.652 (a)
Asymp. Sig. (2-tailed)   .515
a. Based on positive ranks.
b. Wilcoxon Signed Ranks Test
Note that SPSS is taking MG5-MG13, while we used MG13-MG5
Sample Sizes for Given Margin of Error
• Goal: Achieve a particular margin of error ($E$) for
estimating $\mu_1 - \mu_2$ (Width of 95% CI will be $2E$)
– Case 1: Independent Samples (Assumes equal variances)
$$E = z_{\alpha/2}\,\sigma \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = z_{\alpha/2}\,\sigma \sqrt{\frac{2}{n}} \text{ when } n_1 = n_2 = n \ \Rightarrow\ n = \frac{2 z_{\alpha/2}^2 \sigma^2}{E^2}$$
– Case 2: Paired Samples
$$E = z_{\alpha/2}\,\sigma_d \sqrt{\frac{1}{n}} \ \Rightarrow\ n = \frac{z_{\alpha/2}^2 \sigma_d^2}{E^2}$$
In practice, the variance will need to be estimated in a pilot study or
obtained from previously conducted work.
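The two sample-size formulas can be sketched as small Python helpers. The function names and the planning values ($\sigma = 10$, $\sigma_d = 10$, $E = 5$) are hypothetical and are not taken from any study in this chapter. Results are rounded up, since a fractional subject cannot be sampled.

```python
import math

def n_per_group_for_margin(sigma, margin, z=1.96):
    """Independent samples, equal variances: n = 2 z^2 sigma^2 / E^2 per group."""
    return math.ceil(2 * (z * sigma / margin) ** 2)

def n_pairs_for_margin(sigma_d, margin, z=1.96):
    """Paired samples: n = z^2 sigma_d^2 / E^2 pairs."""
    return math.ceil((z * sigma_d / margin) ** 2)

# Hypothetical planning values: sigma = sigma_d = 10, desired margin E = 5
n_independent = n_per_group_for_margin(sigma=10, margin=5)
n_paired = n_pairs_for_margin(sigma_d=10, margin=5)
print(n_independent, n_paired)
```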
Sample Size Calculations for Fixed Power
• Goal - Choose sample sizes to have a favorable chance of
detecting a specified difference in $\mu_1$ and $\mu_2$
• Step 1 - Define an important difference in means: $\Delta = \mu_1 - \mu_2$
• Step 2 - Choose the desired power to detect the clinically
meaningful difference ($1-\beta$, typically at least .80). For 2-sided test:
$$\text{Independent Samples: } n_1 = n_2 = \frac{2\sigma^2 \left(z_{\alpha/2} + z_\beta\right)^2}{\Delta^2}$$
$$\text{Paired Samples: } n = \frac{\sigma_d^2 \left(z_{\alpha/2} + z_\beta\right)^2}{\Delta^2}$$
For 1-sided tests, replace $z_{\alpha/2}$ with $z_\alpha$
In practice, variance must be estimated, or $\Delta$ given in units of $\sigma$
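As a rough sketch, the power-based formulas can be coded with the difference expressed in standard-deviation units, so that $\sigma$ cancels. The function names are hypothetical, and the default z-values 1.96 and 0.84 correspond to a 2-sided $\alpha = 0.05$ and power 0.80.

```python
import math

def n_per_group_for_power(delta_over_sigma, z_alpha=1.96, z_beta=0.84):
    """Independent samples: n1 = n2 = 2 (z_alpha + z_beta)^2 / (Delta/sigma)^2."""
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / delta_over_sigma ** 2)

def n_pairs_for_power(delta_over_sigma_d, z_alpha=1.96, z_beta=0.84):
    """Paired samples: n = (z_alpha + z_beta)^2 / (Delta/sigma_d)^2."""
    return math.ceil((z_alpha + z_beta) ** 2 / delta_over_sigma_d ** 2)

# A difference of half a standard deviation, 2-sided alpha = 0.05, power = 0.80
n_independent = n_per_group_for_power(0.5)
n_paired = n_pairs_for_power(0.5)
print(n_independent, n_paired)
```

For a 1-sided test the same functions apply with `z_alpha` set to $z_\alpha$ (1.645 for $\alpha = 0.05$).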
Example - Rosiglitazone for HIV-1
Lipoatrophy
• Trts - Rosiglitazone vs Placebo
• Response - Change in Limb fat mass
• Clinically Meaningful Difference - $\Delta = 0.5\sigma$
• Desired Power - $1-\beta = 0.80$
• Significance Level - $\alpha = 0.05$
$z_{\alpha/2} = 1.96$    $z_\beta = z_{.20} = .84$
$$n_1 = n_2 = \frac{2(1.96 + 0.84)^2}{(0.5)^2} \approx 63$$
Source: Carr, et al (2004)
Data Sources
• Zagonik, J., M.L. Huang, A. Van Peer, et al. (1993). “Pharmacokinetics of
Orally Administered Levocabastine in Patients with Renal Insufficiency,”
Journal of Clinical Pharmacology, 33:1214-1218
• Gould, M.C. and F.A.C. Perrin (1916). “A Comparison of the Factors Involved
in the Maze Learning of Human Adults and Children,” Journal of Experimental
Psychology, 1:122-???
• Jungermann, E. (1974). “Antiperspirants: New Trends in Formulation and
Testing Technology,” Journal of the Society of Cosmetic Chemists 25:621-638
• Pasman, W.J., M.A. van Baak, A.E. Jeukendrup, and A. de Haan (1995). “The
Effect of Different Dosages of Caffeine on Endurance Performance Time,”
International Journal of Sports Medicine, 16:225-230
• Carr, A., C. Workman, D. Crey, et al. (2004). “No Effect of Rosiglitazone for
Treatment of HIV-1 Lipoatrophy: Randomised, Double-Blind, Placebo-Controlled Trial,” Lancet, 363:429-438