Transcript Document
Be humble in our attribute, be loving and varying in our attitude; that is the way to live in heaven.

Applied Statistics Using SAS and SPSS
Topic: One-Way ANOVA
By Prof. Kelly Fan, Cal State Univ, East Bay

Statistical Tools vs. Variable Types

The appropriate tool depends on the types of the response (output) variable Y and the predictor (input) variable X:

  Response Y     Predictor X          Tool
  Numerical      Numerical            Simple and multiple regression
  Numerical      Categorical/mixed    Analysis of variance (ANOVA); analysis of covariance (ANCOVA)
  Categorical    (any)                Categorical data analysis

Example: Battery Lifetime

Eight brands of battery are studied. We would like to find out whether or not the brand of a battery affects its lifetime and, if so, which brand's batteries last longer than the others. Data collection: for each brand, 3 batteries are tested for their lifetime. What is the Y variable? The X variable?

Data: Y = LIFETIME (HOURS), 3 replications per level of BRAND:

  BRAND:   1    2    3    4    5    6    7    8
           1.8  4.2  8.6  7.0  4.2  4.2  7.8  9.0
           5.0  5.4  4.6  5.0  7.8  4.2  7.0  7.4
           1.0  4.2  4.2  9.0  6.6  5.4  9.8  5.8
  Mean:    2.6  4.6  5.8  7.0  6.2  4.6  8.2  7.4   (grand mean = 5.8)

[Figure: Dotplots of life by brand; group means are indicated by lines.]
[Figure: Boxplots of life by brand; means are indicated by solid circles.]

Statistical Model

With C levels of BRAND and n observations per level, the observations are Y11, ..., Y1n, up through YC1, ..., YCn. (Brand is, of course, represented as "categorical".) The model is

  Yij = μi + εij,   i = 1, . . . , C,   j = 1, . . .
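The course materials use SAS and SPSS; as a quick cross-check of the arithmetic, here is a minimal sketch in plain Python that computes the per-brand means (the estimates of the μi in the model above) and the grand mean from the 24 lifetimes transcribed from the slides.

```python
# Battery-lifetime data (hours) transcribed from the slides:
# 8 brands (levels of X), 3 replications each (Y = lifetime).
lifetimes = {
    1: [1.8, 5.0, 1.0],
    2: [4.2, 5.4, 4.2],
    3: [8.6, 4.6, 4.2],
    4: [7.0, 5.0, 9.0],
    5: [4.2, 7.8, 6.6],
    6: [4.2, 4.2, 5.4],
    7: [7.8, 7.0, 9.8],
    8: [9.0, 7.4, 5.8],
}

# Per-brand means: the estimates of mu_i in the model Y_ij = mu_i + eps_ij.
brand_means = {b: sum(ys) / len(ys) for b, ys in lifetimes.items()}

# Grand mean over all C * n = 24 observations.
all_obs = [y for ys in lifetimes.values() for y in ys]
grand_mean = sum(all_obs) / len(all_obs)

print({b: round(m, 1) for b, m in brand_means.items()})
print(round(grand_mean, 1))  # 5.8
```

The printed means reproduce the "Mean" row of the data table (2.6, 4.6, 5.8, 7.0, 6.2, 4.6, 8.2, 7.4) and the grand mean 5.8.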
, n.

Hypotheses Setup

H0: The level of X has no impact on Y.
H1: The level of X does have an impact on Y.

Equivalently:

H0: μ1 = μ2 = · · · = μ8
H1: not all μi are EQUAL

ONE-WAY ANOVA

Analysis of Variance for life:

  Source   DF     SS     MS     F      P
  brand     7   69.12   9.87   3.38  0.021
  Error    16   46.72   2.92
  Total    23  115.84

The Error mean square (2.92) is the estimate of the common variance σ². S = 1.709, R-Sq = 59.67%, R-Sq(adj) = 42.02%.

Review: Fitted value = predicted value. Residual = observed value − fitted value.

Diagnosis: Normality

• The points on the normality plot must more or less follow a line to claim that the residuals are normally distributed.
• There are statistical tests to verify this scientifically.
• The ANOVA method we learn here is not sensitive to the normality assumption. That is, a mild departure from the normal distribution will not change our conclusions much.

[Figure: Normal probability plot of the residuals (normal scores vs. residuals) for the battery lifetime data; response is life.]

Diagnosis: Equal Variances

• The points on the residual plot (fitted values vs. residuals) must lie more or less within a horizontal band to claim constant variances.
• There are statistical tests to verify this scientifically.
• The ANOVA method we learn here is not sensitive to the constant-variances assumption. That is, slightly different variances within groups will not change our conclusions much.

[Figure: Residuals versus fitted values for the battery lifetime data; response is life.]

Multiple Comparison Procedures

Once we reject H0: μ1 = μ2 = · · · = μc in favor of H1: NOT all μ's are equal, we don't yet know the way in which they're not all equal, but simply that they're not all the same. If there are 4 columns, are all 4 μ's different? Are 3 the same and one different? If so, which one? Etc. These "more detailed" inquiries into the process are called MULTIPLE COMPARISON PROCEDURES.
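The entries of the ANOVA table above can be reproduced by hand. As a sketch (plain Python rather than the course's SAS/SPSS, purely to verify the arithmetic), the following computes the between-groups and within-groups sums of squares and the F statistic for the battery data.

```python
# Reproduce the one-way ANOVA table for the battery data.
# groups: 8 brands x 3 lifetimes (hours), transcribed from the slides.
groups = [
    [1.8, 5.0, 1.0], [4.2, 5.4, 4.2], [8.6, 4.6, 4.2], [7.0, 5.0, 9.0],
    [4.2, 7.8, 6.6], [4.2, 4.2, 5.4], [7.8, 7.0, 9.8], [9.0, 7.4, 5.8],
]

C = len(groups)                      # number of levels (brands)
N = sum(len(g) for g in groups)      # total number of observations
grand = sum(sum(g) for g in groups) / N

# Between-groups ("brand") and within-groups ("Error") sums of squares.
ss_brand = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
ss_error = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
ss_total = ss_brand + ss_error

df_brand, df_error = C - 1, N - C                     # 7 and 16
ms_brand, ms_error = ss_brand / df_brand, ss_error / df_error
f_stat = ms_brand / ms_error

print(round(ss_brand, 2), round(ss_error, 2), round(ss_total, 2))
print(round(f_stat, 2))  # 3.38, matching the table
```

This reproduces SS = 69.12, 46.72, 115.84 and F = 3.38. If SciPy is available, `scipy.stats.f_oneway(*groups)` returns the same F statistic together with the p-value (the table shows P = 0.021).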
Errors (Type I): We set up α as the significance level for a hypothesis test. Suppose we test 3 independent hypotheses, each at α = .05; each test has a Type I error probability (rejecting H0 when it's true) of .05. However,

  P(at least one Type I error in the 3 tests) = 1 − P(all 3 accepted, given all true) = 1 − (.95)^3 ≈ .14.

In other words, the probability is .14 that at least one Type I error is made. For 5 tests, the probability is ≈ .23.

Question: Should we choose α = .05 per test, and suffer (for 5 tests) a .23 OVERALL error rate (the "experimentwise" rate)? OR should we choose/control the overall error rate to be .05, and find the individual-test α from 1 − (1 − α)^5 = .05 (which gives α ≈ .0102)?

The formula 1 − (1 − α)^5 = .05 would be valid only if the tests are independent; often they're not. [For example, take H1: μ1 = μ2, H2: μ2 = μ3, H3: μ1 = μ3. If H1 is accepted and H2 is rejected, isn't it more likely that H3 is rejected?] When the tests are not independent, it's usually very difficult to arrive at the correct α for an individual test so that a specified value results for the overall error rate.

Categories of multiple comparison tests:
- "Planned" / "a priori" comparisons (stated in advance, usually a linear combination of the column means set equal to zero).
- "Post hoc" / "a posteriori" comparisons (decided after a look at the data: which comparisons "look interesting").
- "Post hoc" multiple comparisons (every column mean compared with every other column mean).

There are many multiple comparison procedures. We'll cover only a few.

Post hoc multiple comparisons:
1) Pairwise comparisons: do a series of pairwise tests (Duncan and SNK tests).
2) (Optional) Comparisons to a control: Dunnett tests.

Example: Broker Study

A financial firm would like to determine whether the brokers they use to execute trades differ with respect to their ability to provide a stock purchase for the firm at a low buying price per share. To measure cost, an index Y is used:

  Y = 1000(A − P)/A,

where P = per-share price paid for the stock, and A = average of the day's high and low price per share.
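The error-rate arithmetic above can be checked directly. This short sketch (plain Python, added here for illustration) computes the experimentwise error rate for k independent tests and inverts it to find the per-test α that controls the overall rate at .05.

```python
# Experimentwise (overall) Type I error rate when k independent tests
# are each run at per-test significance level alpha.
def overall_error(alpha, k):
    return 1 - (1 - alpha) ** k

print(round(overall_error(0.05, 3), 3))  # 0.143  (the ".14" in the text)
print(round(overall_error(0.05, 5), 3))  # 0.226  (the ".23" in the text)

# Inverting 1 - (1 - alpha)**5 = 0.05 gives the per-test alpha that
# keeps the overall rate at .05 for 5 independent tests:
per_test_alpha = 1 - (1 - 0.05) ** (1 / 5)
print(round(per_test_alpha, 4))  # 0.0102
```

Note that this inversion is exactly the formula the text warns about: it is valid only when the tests are independent.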
"The higher Y is, the better the trade is."

Five brokers were in the study, and six trades were randomly assigned to each broker (R = 6):

  Broker    Y values                  Mean
  1         12   3   5  -1  12   5     6
  2          7  17  13  11   7  17    12
  3          8   1   7   4   3   7     5
  4         21  10  15  12  20   6    14
  5         24  13  14  18  14  19    17

SPSS Output: Analyze >> General Linear Model >> Univariate...

Tests of Between-Subjects Effects (dependent variable: sales):

  Source            Type III SS   df   Mean Square       F     Sig.
  Corrected Model      640.800a    4       160.200    7.557    .000
  Intercept           3499.200     1      3499.200  165.057    .000
  broker               640.800     4       160.200    7.557    .000
  Error                530.000    25        21.200
  Total               4670.000    30
  Corrected Total     1170.800    29

  a. R Squared = .547 (Adjusted R Squared = .475)

Homogeneous Subsets (sales):

                          Subset 1            Subset 2
  Student-Newman-Keuls    broker 3: 5.0000    broker 2: 12.0000
                          broker 1: 6.0000    broker 4: 14.0000
                                              broker 5: 17.0000
                          Sig. = .710         Sig. = .165
  Duncan                  broker 3: 5.0000    broker 2: 12.0000
                          broker 1: 6.0000    broker 4: 14.0000
                                              broker 5: 17.0000
                          Sig. = .710         Sig. = .086

  Means for groups in homogeneous subsets are displayed, based on Type III Sum of Squares; the error term is Mean Square(Error) = 21.200. N = 6 in each group (uses harmonic mean sample size = 6.000); alpha = .05.

Conclusion: the homogeneous subsets are {3, 1} and {2, 4, 5}. Brokers 1 and 3 are not significantly different from each other, but they are significantly different from the other 3 brokers. Under the finer, overlapping grouping {3, 1}, {2, 4}, {4, 5}: brokers 2 and 4 are not significantly different, and brokers 4 and 5 are not significantly different, but broker 2 is significantly different from (smaller than) broker 5.

Comparisons to Control

Dunnett's test is designed specifically for (and incorporates the interdependencies of) comparing several "treatments" to a "control." In our example the control is column (broker) 1, with column means 6, 12, 5, 14, 17 and R = 6.

Multiple Comparisons (dependent variable: sales), Dunnett t (2-sided), control = broker 1:

  (I) broker   (J) broker   Mean Difference (I-J)   Std. Error   Sig.
  2.00         1.00           6.0000                2.65832      .103
  3.00         1.00          -1.0000                2.65832      .987
  4.00         1.00           8.0000*               2.65832      .020
  5.00         1.00          11.0000*               2.65832      .001

  Based on observed means.
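The SPSS between-subjects numbers can be verified with the same hand computation used for the battery data. This is a plain-Python sketch (not part of the original SPSS materials) on the broker data transcribed from the slides.

```python
# Broker-study data (the Y cost index) transcribed from the slides:
brokers = {
    1: [12, 3, 5, -1, 12, 5],
    2: [7, 17, 13, 11, 7, 17],
    3: [8, 1, 7, 4, 3, 7],
    4: [21, 10, 15, 12, 20, 6],
    5: [24, 13, 14, 18, 14, 19],
}

N = sum(len(v) for v in brokers.values())           # 30 trades in total
grand = sum(sum(v) for v in brokers.values()) / N   # grand mean 10.8

# Between-brokers and error sums of squares.
ss_broker = sum(len(v) * (sum(v) / len(v) - grand) ** 2
                for v in brokers.values())
ss_error = sum((y - sum(v) / len(v)) ** 2
               for v in brokers.values() for y in v)

ms_broker = ss_broker / (len(brokers) - 1)   # 160.2
ms_error = ss_error / (N - len(brokers))     # 21.2, the SPSS Mean Square(Error)
f_stat = ms_broker / ms_error

print(round(ss_broker, 1), ss_error)  # 640.8 530.0
print(round(f_stat, 3))               # 7.557
```

This reproduces the SPSS table: SS(broker) = 640.800, SS(Error) = 530.000, and F = 7.557.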
  *. The mean difference is significant at the .05 level.
  a. Dunnett t-tests treat one group as a control and compare all other groups against it.

  95% confidence intervals for the mean differences: broker 2 vs. 1: (-.9300, 12.9300); broker 3 vs. 1: (-7.9300, 5.9300); broker 4 vs. 1: (1.0700, 14.9300); broker 5 vs. 1: (4.0700, 17.9300).

- Cols 4 and 5 differ from the control [broker 1].
- Cols 2 and 3 are not significantly different from the control.

Exercise: Sales Data

  Treatment    1    2    3
               6    6   11
               3    5   10
               8    4    8
               3    9   11
  Col Mean     5    6   10

Exercise:
1. Find the ANOVA table.
2. Perform SNK tests at α = 5% to group the treatments.
3. Perform Duncan tests at α = 5% to group the treatments.
4. Which treatment would you use?

Post Hoc and A Priori Comparisons

• F test for a linear combination of column means (a contrast).
• Scheffé test: tests all linear combinations at once. Very conservative; not to be used when only a few comparisons are of interest.
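The mean differences and the common standard error in the Dunnett output for the broker study follow directly from the group means and the pooled error mean square. A minimal sketch, assuming equal group sizes n = 6 and MSE = 21.2 from the ANOVA table (the significance values themselves come from Dunnett's critical values, which SPSS supplies and we do not reproduce here):

```python
import math

# Broker means from the study (broker 1 is the control) and the
# pooled error mean square from the SPSS ANOVA table.
means = {1: 6.0, 2: 12.0, 3: 5.0, 4: 14.0, 5: 17.0}
mse, n = 21.2, 6   # Mean Square(Error); trades per broker

control = means[1]
diffs = {b: m - control for b, m in means.items() if b != 1}

# Standard error of each (treatment - control) mean difference:
# sqrt(MSE * (1/n + 1/n)), since every group has n = 6 trades.
se = math.sqrt(2 * mse / n)

print(diffs)         # {2: 6.0, 3: -1.0, 4: 8.0, 5: 11.0}
print(round(se, 5))  # 2.65832
```

The differences (6, -1, 8, 11) and the standard error 2.65832 match the "Mean Difference (I-J)" and "Std. Error" columns of the Dunnett table above.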