Transcript Chapter 6
Chapter 6 Inferences Regarding Locations of Two Distributions

Comparing 2 Means - Independent Samples
• Goal: Compare responses between 2 groups (populations, treatments, conditions)
• Observed individuals from the 2 groups are samples from distinct populations (identified by $(\mu_1,\sigma_1)$ and $(\mu_2,\sigma_2)$)
• Measurements across groups are independent (different individuals in the 2 groups)
• Summary statistics obtained from the 2 groups:

Group     Mean           Std. Dev.   Sample Size
Group 1   $\bar{y}_1$    $s_1$       $n_1$
Group 2   $\bar{y}_2$    $s_2$       $n_2$

Sampling Distribution of $\bar{Y}_1-\bar{Y}_2$
• Underlying distributions normal ==> sampling distribution is normal
• Underlying distributions nonnormal, but large sample sizes ==> sampling distribution approximately normal
• Mean, variance, standard error (Std. Dev. of estimator):

$$E(\bar{Y}_1-\bar{Y}_2) = \mu_{\bar{Y}_1-\bar{Y}_2} = \mu_1-\mu_2$$

$$V(\bar{Y}_1-\bar{Y}_2) = \sigma^2_{\bar{Y}_1-\bar{Y}_2} = \frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}$$

$$\sigma_{\bar{Y}_1-\bar{Y}_2} = \sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}$$

Small-Sample Test for $\mu_1-\mu_2$ Normal Populations
• Case 1: Common Variances ($\sigma_1^2 = \sigma_2^2 = \sigma^2$)
• Null Hypothesis: $H_0: \mu_1-\mu_2 = \Delta_0$
• Alternative Hypotheses:
  – 1-Sided: $H_A: \mu_1-\mu_2 > \Delta_0$
  – 2-Sided: $H_A: \mu_1-\mu_2 \ne \Delta_0$
• Test Statistic (where $s_p^2$ is a "pooled" estimate of $\sigma^2$):

$$t_{obs} = \frac{(\bar{y}_1-\bar{y}_2)-\Delta_0}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}} \qquad s_p = \sqrt{\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}}$$

Small-Sample Test for $\mu_1-\mu_2$ Normal Populations
• Decision Rule: (Based on t-distribution with $\nu = n_1+n_2-2$ df)
  – 1-sided alternative
    • If $t_{obs} \ge t_{\alpha,\nu}$ ==> Conclude $\mu_1-\mu_2 > \Delta_0$
    • If $t_{obs} < t_{\alpha,\nu}$ ==> Do not reject $\mu_1-\mu_2 = \Delta_0$
  – 2-sided alternative
    • If $t_{obs} \ge t_{\alpha/2,\nu}$ ==> Conclude $\mu_1-\mu_2 > \Delta_0$
    • If $t_{obs} \le -t_{\alpha/2,\nu}$ ==> Conclude $\mu_1-\mu_2 < \Delta_0$
    • If $-t_{\alpha/2,\nu} < t_{obs} < t_{\alpha/2,\nu}$ ==> Do not reject $\mu_1-\mu_2 = \Delta_0$

Small-Sample Test for $\mu_1-\mu_2$ Normal Populations
• Observed Significance Level (P-Value)
• Special Tables Needed, Printed by Statistical Software Packages
  – 1-sided alternative
    • $P = P(t \ge t_{obs})$ (From the $t_\nu$ distribution)
  – 2-sided alternative
    • $P = 2P(t \ge |t_{obs}|)$ (From the $t_\nu$ distribution)
• If P-Value $\le \alpha$, then reject the null hypothesis

Small-Sample $(1-\alpha)100\%$ Confidence Interval for $\mu_1-\mu_2$ Normal Populations
• Confidence Coefficient $(1-\alpha)$ refers to the proportion of times this rule would provide an interval that contains the true parameter value $\mu_1-\mu_2$ if it were applied over all possible samples
• Rule:

$$(\bar{y}_1-\bar{y}_2) \pm t_{\alpha/2}\, s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}$$

• Interpretation (at the $\alpha$ significance level):
  – If interval contains 0, do not reject $H_0: \mu_1 = \mu_2$
  – If interval is strictly positive, conclude that $\mu_1 > \mu_2$
  – If interval is strictly negative, conclude that $\mu_1 < \mu_2$

t-test when Variances are Unequal
• Case 2: Population Variances not assumed to be equal ($\sigma_1^2 \ne \sigma_2^2$)
• Approximate degrees of freedom
  – Calculated from a function of sample variances and sample sizes (see formula below) - Satterthwaite's approximation
  – Smaller of $n_1-1$ and $n_2-1$
• Estimated standard error and test statistic for testing $H_0: \mu_1=\mu_2$:

$$\text{Estimated standard error}: \quad SE_{\bar{Y}_1-\bar{Y}_2} = \sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}$$

$$\text{Test Statistic}: \quad t_{obs} = \frac{\bar{y}_1-\bar{y}_2}{SE_{\bar{Y}_1-\bar{Y}_2}} = \frac{\bar{y}_1-\bar{y}_2}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}$$

$$\text{Satterthwaite's df}: \quad \nu^* = \frac{\left(\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1-1}+\frac{\left(s_2^2/n_2\right)^2}{n_2-1}}$$

Example - Maze Learning (Adults/Children)
• Groups: Adults ($n_1=14$) / Children ($n_2=10$)
• Outcome: Average # of Errors in Maze Learning Task
• Raw Data on next slide

Group            Mean    Std Dev   Sample Size
Adults (i=1)     13.28   4.47      14
Children (i=2)   18.28   9.93      10

• Conduct a 2-sided test of whether true mean scores differ
• Construct a 95% Confidence Interval for true difference
Source: Gould and Perrin (1916)

Example - Maze Learning (Adults/Children)

Name   Group   Trials   Errors   Average
H      1       41       728      17.76
W      1       25       333      13.32
Mac    1       33       453      13.73
McG    1       31       528      17.03
L      1       41       335      8.17
R      1       48       553      11.52
Hv     1       24       217      9.04
Hy     1       32       711      22.22
F      1       46       839      18.24
Wd     1       47       473      10.06
Rh     1       35       532      15.20
D      1       69       538      7.80
Hg     1       27       213      7.89
Hp     1       27       375      13.89
Hl     2       42       254      6.05
McS    2       89       1559     17.52
Lin    2       38       1089     28.66
B      2       20       254      12.70
N      2       49       599      12.22
T      2       40       520      13.00
J      2       50       828      16.56
Hz     2       40       516      12.90
Lev    2       54       2171     40.20
K      2       58       1331     22.95

Group   n    Mean    Std Dev
1       14   13.28   4.47
2       10   18.28   9.93

Example - Maze Learning Case 1 - Equal Variances
$H_0: \mu_1-\mu_2 = 0$    $H_A: \mu_1-\mu_2 \ne 0$    ($\alpha = 0.05$)

$$s_p = \sqrt{\frac{(14-1)(4.47)^2+(10-1)(9.93)^2}{14+10-2}} = \sqrt{52.15} = 7.22$$

$$TS: \quad t_{obs} = \frac{13.28-18.28}{7.22\sqrt{\frac{1}{14}+\frac{1}{10}}} = \frac{-5.00}{2.99} = -1.67$$

$$RR: \quad |t_{obs}| \ge t_{.025,22} = 2.074$$

$$P\text{-value}: \quad 2P(T \ge |-1.67|) = .1091 \quad \text{(From EXCEL)}$$

$$95\%\ CI: \quad -5.00 \pm 2.074(2.99) = -5.00 \pm 6.20 = (-11.2,\ 1.2)$$

No significant difference between 2 age groups

Example - Maze Learning Case 2 - Unequal Variances
$H_0: \mu_1-\mu_2 = 0$    $H_A: \mu_1-\mu_2 \ne 0$    ($\alpha = 0.05$)

$$\frac{s_1^2}{n_1} = \frac{(4.47)^2}{14} = 1.43 \qquad \frac{s_2^2}{n_2} = \frac{(9.93)^2}{10} = 9.86$$

$$\nu^* = \frac{(1.43+9.86)^2}{\frac{(1.43)^2}{13}+\frac{(9.86)^2}{9}} = \frac{127.46}{10.96} = 11.63$$

$$TS: \quad t_{obs} = \frac{13.28-18.28}{\sqrt{1.43+9.86}} = \frac{-5.00}{3.36} = -1.49$$

$$RR: \quad |t_{obs}| \ge t_{.025,11.63} = 2.19$$

$$95\%\ CI: \quad -5.00 \pm 2.19(3.36) = -5.00 \pm 7.36 = (-12.36,\ 2.36)$$

No significant difference between 2 age groups
Note: Alternative would be to use 9 df (10-1)

SPSS Output

Group Statistics (AVE_ERR)
GROUP   N    Mean      Std. Deviation   Std. Error Mean
Adult   14   13.2761   4.46784          1.19408
Child   10   18.2759   9.93279          3.14102

Independent Samples Test (AVE_ERR)
Levene's Test for Equality of Variances: F = 4.420, Sig. = .047

t-test for Equality of Means
                              t        df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       -1.672   22       .109              -4.9998           2.99017                 -11.20101      1.20145
Equal variances not assumed   -1.488   11.621   .163              -4.9998           3.36034                 -12.34787      2.34831

$(1-\alpha)100\%$ Confidence Interval for $\mu_1-\mu_2$
Case 1 ($\sigma_1^2 = \sigma_2^2$):

$$(\bar{y}_1-\bar{y}_2) \pm t_{\alpha/2}\, s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}} \qquad df = n_1+n_2-2$$

Maze Data (df = 22): 95% CI: $-5.00 \pm 2.074(2.99) = -5.00 \pm 6.20 = (-11.2,\ 1.2)$

Case 2 ($\sigma_1^2 \ne \sigma_2^2$):

$$(\bar{y}_1-\bar{y}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}} \qquad df = \text{Satterthwaite or smaller of } n_1-1,\ n_2-1$$

Maze Data (df = 11.63 or could use 9): 95% CI: $-5.00 \pm 2.19(3.36) = -5.00 \pm 7.36 = (-12.36,\ 2.36)$

Small Sample Test to Compare Two Medians - Nonnormal Populations
• Two Independent Samples (Parallel Groups)
• Procedure (Wilcoxon Rank-Sum Test):
  – Null hypothesis: Population Medians are equal, $H_0: M_1 = M_2$
  – Rank measurements across samples from smallest (1) to largest ($n_1+n_2$). Ties take average ranks.
  – Obtain the rank sum for the group with the smallest sample size ($T$)
  – 1-sided tests: Conclude $H_A: M_1 > M_2$ if $T > T_U$; conclude $H_A: M_1 < M_2$ if $T < T_L$
  – 2-sided tests: Conclude $H_A: M_1 \ne M_2$ if $T > T_U$ or $T < T_L$
  – Values of $T_L$ and $T_U$ are given in Table 6, p. 683 for various sample sizes and significance levels.
  – This test is mathematically equivalent to the Mann-Whitney U-test

Example - Levocabastine in Renal Patients
• 2 Groups: Non-Dialysis/Hemodialysis ($n_1 = n_2 = 6$)
• Outcome: Levocabastine AUC (1 Outlier/Group)

Non-Dialysis AUC (rank)   Hemodialysis AUC (rank)
857 (12)                  527 (7)
567 (9)                   740 (11)
626 (10)                  392 (2.5)
532 (8)                   514 (6)
444 (5)                   433 (4)
357 (1)                   392 (2.5)
$T_1 = 45$                $T_2 = 33$

• 2-sided Test ($\alpha = 0.05$): $T_L = 26$, $T_U = 52$, $T = 45$ (Group 1)
• Conclude Medians differ ($M_1 < M_2$) if $T < 26$
• Conclude Medians differ ($M_1 > M_2$) if $T > 52$
• Neither criterion is met, so do not conclude that the medians differ
Source: Zagornik, et al (1993)

Computer Output - SPSS

Ranks (AUC)
GROUP          N    Mean Rank   Sum of Ranks
Non-Dialysis   6    7.50        45.00
Hemodialysis   6    5.50        33.00
Total          12

Test Statistics (b)                       AUC
Mann-Whitney U                         12.000
Wilcoxon W                             33.000
Z                                      -.962
Asymp. Sig. (2-tailed)                 .336
Exact Sig. [2*(1-tailed Sig.)]         .394 (a)
a. Not corrected for ties.
b. Grouping Variable: GROUP

Note that SPSS uses the rank sum for Group 2 as the test statistic

Rank-Sum Test: Normal Approximation
• Under the null hypothesis of no difference in the two groups (let $T$ be the rank sum for group 1):

$$\mu_T = \frac{n_1(N+1)}{2} \qquad \sigma_T = \sqrt{\frac{n_1 n_2 (N+1)}{12}} \qquad N = n_1+n_2$$

• A z-statistic can be computed and an (approximate) P-value can be obtained from the Z-distribution:

$$z_{obs} = \frac{T-\mu_T}{\sigma_T} = \frac{T - n_1(N+1)/2}{\sqrt{n_1 n_2 (N+1)/12}}$$

Note: When there are many ties in ranks, a more complex formula for $\sigma_T$ is often used; see p. 254 of Longnecker and Ott.
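
The ranking step and the normal approximation above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, applied to the Levocabastine AUC values from the example; the function names `average_ranks` and `rank_sum_z` are hypothetical, and the sketch uses the simple formula for $\sigma_T$ with no correction for ties.

```python
import math

def average_ranks(values):
    """Rank values from smallest (1) to largest; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + 1 + j + 1) / 2.0  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_z(group1, group2):
    """Rank sum T for group 1 and its normal approximation (no tie correction)."""
    n1, n2 = len(group1), len(group2)
    N = n1 + n2
    ranks = average_ranks(list(group1) + list(group2))
    T = sum(ranks[:n1])
    mu_T = n1 * (N + 1) / 2.0
    sigma_T = math.sqrt(n1 * n2 * (N + 1) / 12.0)
    z = (T - mu_T) / sigma_T
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * P(Z >= |z|)
    return T, mu_T, sigma_T, z, p_two_sided

# AUC values from the Levocabastine example
non_dialysis = [857, 567, 626, 532, 444, 357]
hemodialysis = [527, 740, 392, 514, 433, 392]

T, mu_T, sigma_T, z, p = rank_sum_z(non_dialysis, hemodialysis)
print(f"T = {T}, mu_T = {mu_T}, sigma_T = {sigma_T:.3f}, z = {z:.3f}, p = {p:.3f}")
```

With Group 1 as the reference this gives T = 45 and z of about 0.961. SPSS reports -.962 because it uses the Group 2 rank sum (hence the sign) and applies a tie correction to $\sigma_T$ (hence the small difference in magnitude).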
Example - Maze Learning
Adults = Group 1. Observations are sorted by Average; the last four columns give the group indicators and the rank credited to each group.

Name   Group   Trials   Errors   Average   Rank   Grp 1   Grp 2   Rank 1   Rank 2
Hl     2       42       254      6.05      1      0       1       0        1
D      1       69       538      7.80      2      1       0       2        0
Hg     1       27       213      7.89      3      1       0       3        0
L      1       41       335      8.17      4      1       0       4        0
Hv     1       24       217      9.04      5      1       0       5        0
Wd     1       47       473      10.06     6      1       0       6        0
R      1       48       553      11.52     7      1       0       7        0
N      2       49       599      12.22     8      0       1       0        8
B      2       20       254      12.70     9      0       1       0        9
Hz     2       40       516      12.90     10     0       1       0        10
T      2       40       520      13.00     11     0       1       0        11
W      1       25       333      13.32     12     1       0       12       0
Mac    1       33       453      13.73     13     1       0       13       0
Hp     1       27       375      13.89     14     1       0       14       0
Rh     1       35       532      15.20     15     1       0       15       0
J      2       50       828      16.56     16     0       1       0        16
McG    1       31       528      17.03     17     1       0       17       0
McS    2       89       1559     17.52     18     0       1       0        18
H      1       41       728      17.76     19     1       0       19       0
F      1       46       839      18.24     20     1       0       20       0
Hy     1       32       711      22.22     21     1       0       21       0
K      2       58       1331     22.95     22     0       1       0        22
Lin    2       38       1089     28.66     23     0       1       0        23
Lev    2       54       2171     40.20     24     0       1       0        24
Rank sums                                                         158      142
                                                                  $T = T_1$    $T_2$

Example - Maze Learning
$H_0: M_1 = M_2$    $H_A: M_1 \ne M_2$    $\alpha = 0.05$
Group 1: Adults    $T = 158$    $n_1 = 14$    $n_2 = 10$    $N = n_1+n_2 = 24$

$$\mu_T = \frac{n_1(N+1)}{2} = \frac{14(25)}{2} = 175$$

$$\sigma_T = \sqrt{\frac{n_1 n_2 (N+1)}{12}} = \sqrt{\frac{14(10)(25)}{12}} = 17.08$$

$$z_{obs} = \frac{158-175}{17.08} = \frac{-17}{17.08} = -0.9954$$

$$RR: \quad |z_{obs}| \ge z_{\alpha/2} = 1.96$$

$$\text{2-sided P-value}: \quad 2P(Z \ge |-.9954|) = 2(.16) = .32$$

Computer Output - SPSS

Ranks (AVE_ERR)
GROUP   N    Mean Rank   Sum of Ranks
Adult   14   11.29       158.00
Child   10   14.20       142.00
Total   24

Test Statistics (b)                       AVE_ERR
Mann-Whitney U                         53.000
Wilcoxon W                             158.000
Z                                      -.995
Asymp. Sig. (2-tailed)                 .320
Exact Sig. [2*(1-tailed Sig.)]         .341 (a)
a. Not corrected for ties.
b. Grouping Variable: GROUP

Inference Based on Paired Samples (Crossover Designs)
• Setting: Each treatment is applied to each subject or pair (preferably in random order)
• Data: $d_i$ is the difference in scores (Trt1-Trt2) for subject (pair) $i$
• Parameter: $\mu_D$ - Population mean difference
• Sample Statistics:

$$\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n} \qquad s_d^2 = \frac{\sum_{i=1}^{n}(d_i-\bar{d})^2}{n-1} \qquad s_d = \sqrt{s_d^2}$$

Test Concerning $\mu_D$
• Null Hypothesis: $H_0: \mu_D = \Delta_0$ (almost always 0)
• Alternative Hypotheses:
  – 1-Sided: $H_A: \mu_D > \Delta_0$
  – 2-Sided: $H_A: \mu_D \ne \Delta_0$
• Test Statistic:

$$t_{obs} = \frac{\bar{d}-\Delta_0}{s_d/\sqrt{n}}$$

Test Concerning $\mu_D$
Decision Rule: (Based on t-distribution with $\nu = n-1$ df)
• 1-sided alternative ($H_A: \mu_D > \Delta_0$)
  – If $t_{obs} \ge t_{\alpha}$ ==> Conclude $\mu_D > \Delta_0$
  – If $t_{obs} < t_{\alpha}$ ==> Do not reject $\mu_D = \Delta_0$
• 2-sided alternative ($H_A: \mu_D \ne \Delta_0$)
  – If $t_{obs} \ge t_{\alpha/2}$ ==> Conclude $\mu_D > \Delta_0$
  – If $t_{obs} \le -t_{\alpha/2}$ ==> Conclude $\mu_D < \Delta_0$
  – If $-t_{\alpha/2} < t_{obs} < t_{\alpha/2}$ ==> Do not reject $\mu_D = \Delta_0$

Confidence Interval for $\mu_D$:

$$\bar{d} \pm t_{\alpha/2}\frac{s_d}{\sqrt{n}}$$

Example Antiperspirant Formulations
• Subjects - 20 Volunteers' armpits (df = 20-1 = 19)
• Treatments - Dry Powder vs Powder-in-Oil
• Measurements - Average Rating by Judges
  – Higher scores imply more disagreeable odor
• Summary Statistics (Raw Data on next slide): $\bar{d} = 0.15$, $s_d = 0.248$, $n = 20$
Source: E. Jungermann (1974)

Example Antiperspirant Formulations

Subject   Dry Powder   Powder-in-Oil   Difference
1         2            1.9             0.1
2         2.8          2.4             0.4
3         1.3          1.5             -0.2
4         1.8          1.8             0
5         1.9          1.8             0.1
6         2.8          2.4             0.4
7         2            2.2             -0.2
8         1.5          1.5             0
9         1.9          1.7             0.2
10        2.9          2.8             0.1
11        2.9          2.7             0.2
12        2.3          1.5             0.8
13        2.3          2.5             -0.2
14        3.6          3.2             0.4
15        2.2          2.1             0.1
16        2.1          1.9             0.2
17        2.5          2.6             -0.1
18        2.4          2               0.4
19        3.1          2.9             0.2
20        2            1.9             0.1
Mean                                   0.15
Std Dev                                0.248151058

Example Antiperspirant Formulations
$H_0: \mu_D = 0$ (No difference in formulation effects)
$H_A: \mu_D \ne 0$ (Formulation effects differ)

$$TS: \quad t_{obs} = \frac{\bar{d}}{s_d/\sqrt{n}} = \frac{0.15}{0.248/\sqrt{20}} = \frac{0.15}{.0555} = 2.70$$

$$RR: \quad |t_{obs}| \ge t_{.025,19} = 2.093$$

$$P\text{-value} = 2P(t \ge 2.70)$$

$$95\%\ CI \text{ for } \mu_D: \quad \bar{d} \pm t_{.025}\frac{s_d}{\sqrt{n}} = 0.15 \pm 2.093(.0555) = 0.15 \pm 0.116 = (0.034,\ 0.266)$$

Evidence that scores are higher (more unpleasant) for the dry powder (formulation 1)

Small-Sample Test For Nonnormal Data
• Paired Samples (Crossover Design)
• Procedure (Wilcoxon Signed-Rank Test)
  – Compute differences $d_i$ (as in the paired t-test) and obtain their absolute values (ignoring 0s). $n$ = number of non-zero differences
  – Rank the observations by $|d_i|$ (smallest = 1), averaging ranks for ties
  – Compute $T^+$ and $T^-$, the rank sums for the positive and negative differences, respectively
  – 1-sided tests: Conclude $H_A: M_1 > M_2$ if $T = T^- \le T_0$
  – 2-sided tests: Conclude $H_A: M_1 \ne M_2$ if $T = \min(T^+, T^-) \le T_0$
  – Values of $T_0$ are given in Table 7, pp 684-685 for various sample sizes and significance levels. P-values are printed by statistical software packages.
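
The ranking steps of the signed-rank procedure can be sketched as follows. This is an illustrative Python sketch only: it applies the steps to the 20 antiperspirant differences tabulated above because they contain both zeros and ties, but the slides themselves do not report a signed-rank test for these data, and the function name `signed_rank_sums` is hypothetical.

```python
def signed_rank_sums(differences):
    """Return (n, T_plus, T_minus) for the Wilcoxon signed-rank procedure.

    Zero differences are dropped; tied absolute differences share their average rank.
    """
    nonzero = [d for d in differences if d != 0]
    abs_sorted = sorted(abs(d) for d in nonzero)

    # Map each distinct |d| to the average of the ranks it occupies
    rank_of = {}
    i = 0
    while i < len(abs_sorted):
        j = i
        while j + 1 < len(abs_sorted) and abs_sorted[j + 1] == abs_sorted[i]:
            j += 1
        rank_of[abs_sorted[i]] = (i + 1 + j + 1) / 2.0  # ranks are 1-based
        i = j + 1

    t_plus = sum(rank_of[abs(d)] for d in nonzero if d > 0)
    t_minus = sum(rank_of[abs(d)] for d in nonzero if d < 0)
    return len(nonzero), t_plus, t_minus

# Differences (Dry Powder - Powder-in-Oil) as listed in the table above
differences = [0.1, 0.4, -0.2, 0.0, 0.1, 0.4, -0.2, 0.0, 0.2, 0.1,
               0.2, 0.8, -0.2, 0.4, 0.1, 0.2, -0.1, 0.4, 0.2, 0.1]

n, t_plus, t_minus = signed_rank_sums(differences)
print(f"n = {n}, T+ = {t_plus}, T- = {t_minus}")
```

The two rank sums always add up to $n(n+1)/2$, which is a quick check on the ranking. The smaller sum, or $T^-$ for the 1-sided test, would then be compared with the tabled $T_0$.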
Signed-Rank Test: Normal Approximation
• Under the null hypothesis of no difference in the two groups:

$$\mu_T = \frac{n(n+1)}{4} \qquad \sigma_T = \sqrt{\frac{n(n+1)(2n+1)}{24}}$$

• A z-statistic can be computed and an (approximate) P-value can be obtained from the Z-distribution:

$$z_{obs} = \frac{T-\mu_T}{\sigma_T} = \frac{T - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}}$$

Example - Caffeine and Endurance
• Subjects: 9 well-trained cyclists
• Treatments: 13mg Caffeine (Condition 1) vs 5mg (Condition 2)
• Measurements: Minutes Until Exhaustion
• This is a subset of a larger study (we'll see later)
• Step 1: Take absolute values of differences (eliminating 0s)
• Step 2: Rank the absolute differences (averaging ranks for ties)
• Step 3: Sum Ranks for positive and negative true differences
Source: Pasman, et al (1995)

Example - Caffeine and Endurance
Original Data

Cyclist   mg13    mg5     mg13-mg5
1         37.55   42.47   -4.92
2         59.30   85.15   -25.85
3         79.12   63.20   15.92
4         58.33   52.10   6.23
5         70.54   66.20   4.34
6         69.47   73.25   -3.78
7         46.48   44.50   1.98
8         66.35   57.17   9.18
9         36.20   35.05   1.15

Example - Caffeine and Endurance
Absolute Differences

Cyclist   mg13    mg5     mg13-mg5   abs(diff)
1         37.55   42.47   -4.92      4.92
2         59.30   85.15   -25.85     25.85
3         79.12   63.20   15.92      15.92
4         58.33   52.10   6.23       6.23
5         70.54   66.20   4.34       4.34
6         69.47   73.25   -3.78      3.78
7         46.48   44.50   1.98       1.98
8         66.35   57.17   9.18       9.18
9         36.20   35.05   1.15       1.15

Ranked Absolute Differences

Cyclist   mg13    mg5     mg13-mg5   abs(diff)   rank
9         36.20   35.05   1.15       1.15        1
7         46.48   44.50   1.98       1.98        2
6         69.47   73.25   -3.78      3.78        3
5         70.54   66.20   4.34       4.34        4
1         37.55   42.47   -4.92      4.92        5
4         58.33   52.10   6.23       6.23        6
8         66.35   57.17   9.18       9.18        7
3         79.12   63.20   15.92      15.92       8
2         59.30   85.15   -25.85     25.85       9

$T^+ = 1+2+4+6+7+8 = 28$    $T^- = 3+5+9 = 17$

Example - Caffeine and Endurance
Under the null hypothesis of no difference in the two groups ($T = T^+$):

$$\mu_T = \frac{n(n+1)}{4} = \frac{9(9+1)}{4} = \frac{90}{4} = 22.5$$

$$\sigma_T = \sqrt{\frac{n(n+1)(2n+1)}{24}} = \sqrt{\frac{9(9+1)(18+1)}{24}} = \sqrt{\frac{1710}{24}} = 8.44$$

$$z_{obs} = \frac{T-\mu_T}{\sigma_T} = \frac{28-22.5}{8.44} = \frac{5.5}{8.44} = 0.65$$

$$P\text{-value}: \quad 2P(Z \ge |0.65|) = 2(.2578) = .5156$$

There is no evidence that endurance times differ for the 2 doses (we will see later that both are higher than no dose)

SPSS Output

Ranks (MG5 - MG13)
                 N       Mean Rank   Sum of Ranks
Negative Ranks   6 (a)   4.67        28.00
Positive Ranks   3 (b)   5.67        17.00
Ties             0 (c)
Total            9
a. MG5 < MG13
b. MG5 > MG13
c. MG5 = MG13

Test Statistics (b)         MG5 - MG13
Z                        -.652 (a)
Asymp. Sig. (2-tailed)   .515
a. Based on positive ranks.
b. Wilcoxon Signed Ranks Test

Note that SPSS is taking MG5-MG13, while we used MG13-MG5

Sample Sizes for Given Margin of Error
• Goal: Achieve a particular margin of error ($E$) for estimating $\mu_1-\mu_2$ (Width of 95% CI will be $2E$)
  – Case 1: Independent Samples (Assumes equal variances)

$$E = z_{\alpha/2}\,\sigma\sqrt{\frac{1}{n_1}+\frac{1}{n_2}} = z_{\alpha/2}\,\sigma\sqrt{\frac{2}{n}} \ \text{ when } n_1 = n_2 = n \quad\Rightarrow\quad n = \frac{2 z_{\alpha/2}^2\sigma^2}{E^2}$$

  – Case 2: Paired Samples

$$E = z_{\alpha/2}\,\sigma_d\sqrt{\frac{1}{n}} \quad\Rightarrow\quad n = \frac{z_{\alpha/2}^2\sigma_d^2}{E^2}$$

In practice, the variance will need to be estimated in a pilot study or obtained from previously conducted work.

Sample Size Calculations for Fixed Power
• Goal - Choose sample sizes to have a favorable chance of detecting a specified difference in $\mu_1$ and $\mu_2$
• Step 1 - Define an important difference in means: $\Delta = \mu_1-\mu_2$
• Step 2 - Choose the desired power to detect the clinically meaningful difference ($1-\beta$, typically at least .80). For a 2-sided test:

$$\text{Independent Samples}: \quad n_1 = n_2 = \frac{2\sigma^2\left(z_{\alpha/2}+z_{\beta}\right)^2}{\Delta^2}$$

$$\text{Paired Samples}: \quad n = \frac{\sigma_d^2\left(z_{\alpha/2}+z_{\beta}\right)^2}{\Delta^2}$$

For 1-sided tests, replace $z_{\alpha/2}$ with $z_{\alpha}$
In practice, the variance must be estimated, or $\Delta$ given in units of $\sigma$

Example - Rosiglitazone for HIV-1 Lipoatrophy
• Trts - Rosiglitazone vs Placebo
• Response - Change in Limb fat mass
• Clinically Meaningful Difference - $\Delta = 0.5\sigma$
• Desired Power - $1-\beta = 0.80$
• Significance Level - $\alpha = 0.05$

$$z_{\alpha/2} = 1.96 \qquad z_{\beta} = z_{.20} = .84$$

$$n_1 = n_2 = \frac{2(1.96+0.84)^2}{(0.5)^2} = 63$$

Source: Carr, et al (2004)

Data Sources
• Zagonik, J., M.L. Huang, A. Van Peer, et al. (1993). “Pharmacokinetics of Orally Administered Levocabastine in Patients with Renal Insufficiency,” Journal of Clinical Pharmacology, 33:1214-1218
• Gould, M.C. and F.A.C. Perrin (1916). “A Comparison of the Factors Involved in the Maze Learning of Human Adults and Children,” Journal of Experimental Psychology, 1:122-???
• Jungermann, E. (1974). “Antiperspirants: New Trends in Formulation and Testing Technology,” Journal of the Society of Cosmetic Chemists, 25:621-638
• Pasman, W.J., M.A. van Baak, A.E. Jeukendrup, and A. de Haan (1995). “The Effect of Different Dosages of Caffeine on Endurance Performance Time,” International Journal of Sports Medicine, 16:225-230
• Carr, A., C. Workman, D. Crey, et al. (2004). “No Effect of Rosiglitazone for Treatment of HIV-1 Lipoatrophy: Randomised, Double-Blind, Placebo-Controlled Trial,” Lancet, 363:429-438
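
Worked check of the fixed-power sample-size formulas: the short Python sketch below evaluates the independent-samples and paired-samples formulas with the difference expressed in standard-deviation units, and reproduces the Rosiglitazone figure. The function names are hypothetical, and the z-values are the rounded ones quoted on the slide (1.96 and 0.84) rather than values computed from the normal distribution.

```python
import math

def n_per_group_independent(delta_in_sd_units, z_alpha_2, z_beta):
    """n1 = n2 = 2 (z_{alpha/2} + z_beta)^2 / (Delta/sigma)^2 for a 2-sided test."""
    return 2.0 * (z_alpha_2 + z_beta) ** 2 / delta_in_sd_units ** 2

def n_pairs(delta_in_sd_units, z_alpha_2, z_beta):
    """n = (z_{alpha/2} + z_beta)^2 / (Delta/sigma_d)^2 for a 2-sided paired test."""
    return (z_alpha_2 + z_beta) ** 2 / delta_in_sd_units ** 2

# Rosiglitazone example: Delta = 0.5 sigma, alpha = 0.05, power = 0.80
Z_ALPHA_2 = 1.96  # z_{.025}, as quoted on the slide
Z_BETA = 0.84     # z_{.20}, as quoted on the slide

n_raw = n_per_group_independent(0.5, Z_ALPHA_2, Z_BETA)
n = math.ceil(n_raw)  # round up to a whole number of subjects per group
print(f"raw n per group = {n_raw:.2f}, rounded up = {n}")
```

The raw value is 62.72, which rounds up to the 63 per group shown in the example. For a 1-sided test the same functions apply with $z_{\alpha}$ passed in place of $z_{\alpha/2}$.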