Be humble in our attribute, be loving and varying in our attitude, that is the way to live in heaven.

Statistical Package Usage
Topic: One Way ANOVA
By Dr. Kelly Fan, Cal State Univ, East Bay

Statistical Tools vs. Variable Types

                        Predictor (input)
Response (output)       Numerical                         Categorical/Mixed
Numerical               Simple and Multiple Regression    Analysis of Variance (ANOVA);
                                                          Analysis of Covariance (ANCOVA)
Categorical             Categorical data analysis         Categorical data analysis

Example: Broker Study
A financial firm would like to determine if the brokers it uses to execute trades differ with respect to their ability to provide a stock purchase for the firm at a low buying price per share. To measure cost, an index, Y, is used:
Y = 1000(A - P)/A
where P = per share price paid for the stock; A = average of the high price and low price per share, for the day.
"The higher Y is, the better the trade is."

Col (broker):     1     2     3     4     5
                 12     7     8    21    24
                  3    17     1    10    13
                  5    13     7    15    14
                 -1    11     4    12    18
                 12     7     3    20    14
                  5    17     7     6    19
Column mean:      6    12     5    14    17

R = 6 (rows per column)
Five brokers were in the study and six trades were randomly assigned to each broker.

Statistical Model
(Broker is, of course, represented as "categorical")

"LEVEL" OF BROKER
          1      2     ...     C
  1      Y11    Y12    ...    Y1C
  2      Y21    ...
  ...           Yij
  n      Yn1    ...    ...    YnC

$Y_{ij} = \mu_j + \varepsilon_{ij}$,  i = 1, ..., n;  j = 1, ..., C

One-Way ANOVA F-Test:
H0: Level of X has no impact on Y
H1: Level of X does have impact on Y
H0: $\mu_1 = \mu_2 = \cdots = \mu_C$
H1: not all $\mu_j$ are EQUAL

ONE WAY ANOVA
The GLM Procedure
Dependent Variable: TRADE

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               4        640.800000     160.200000       7.56    0.0004
Error              25        530.000000      21.200000
Corrected Total    29       1170.800000

R-Square    Coeff Var    Root MSE    TRADE Mean
0.547318     42.63283    4.604346      10.80000

Root MSE is the estimate of the common standard deviation $\sigma$.

Diagnosis: Normality
Normality plot: normal scores vs. residuals
• Don't check normality separately for each group; check only the residuals.
• The points on the normality plot must more or less follow a straight line to claim "normally distributed".
• There are statistical tests to verify it formally.
• The ANOVA method we learn here is not sensitive to the normality assumption. That is, a mild departure from the normal distribution will not change our conclusions much.
From the Broker data:
[Figure: normal quantile plot of the residuals. x-axis: Normal Quantiles; y-axis: RESIDUAL]

Diagnosis: Equal Variances
Residual plot: predicted values vs. residuals
• The points on the residual plot must be more or less within a horizontal band to claim "constant variances".
• There are statistical tests to verify it formally.
• The ANOVA method we learn here is not sensitive to the constant variances assumption. That is, slightly different variances within groups will not change our conclusions much.
From the Broker data:
[Figure: residual plot. x-axis: PREDICTED; y-axis: RESIDUAL]

Multiple Comparison Procedures
Once we reject H0: $\mu_1 = \mu_2 = \cdots = \mu_C$ in favor of H1: NOT all $\mu$'s are equal, we don't yet know the way in which they're not all equal, but simply that they're not all the same. If there are 4 columns, are all 4 $\mu$'s different? Are 3 the same and one different? If so, which one? etc.

Pairwise Comparison
Goal: grouping levels
Method: Compare each pair of levels
The SNK procedure is a popular procedure and is introduced here.

SAS Output for SNK Procedure

Number of Means            2            3            4            5
Critical Range     5.4749249    6.6214244    7.3120942    7.8071501

Means with the same letter are not significantly different.

SNK Grouping      Mean    N    BROKER
           A    17.000    6    5
           A    14.000    6    4
           A    12.000    6    2
           B     6.000    6    1
           B     5.000    6    3

Conclusion (brokers ordered from largest to smallest mean): 5 4 2 1 3
Brokers 1 and 3 are not significantly different from each other, but they are significantly different from the other 3 brokers.
Brokers 2, 4, and 5 are not significantly different from one another: the largest gap among them, between brokers 5 and 2 (17 - 12 = 5), is smaller than the SNK critical range for three means (6.62).

Comparisons to Control
Dunnett Procedure
Designed specifically for comparing several "treatments" to a "control."

Example (R = 6 observations per column; column 1 is the CONTROL):

Col:      1 (CONTROL)    2     3     4     5
Mean:     6             12     5    14    17

In our example:

Comparisons significant at the 0.05 level are indicated by ***.

BROKER         Difference         Simultaneous 95%
Comparison     Between Means      Confidence Limits
5 - 1              11.000          4.070    17.930    ***
4 - 1               8.000          1.070    14.930    ***
2 - 1               6.000         -0.930    12.930
3 - 1              -1.000         -7.930     5.930

- Cols 4 and 5 differ from the control [1].
- Cols 2 and 3 are not significantly different from the control.

Contrast
Question 1: Broker 1 vs. the others
Question 2: Brokers 1 and 2 are more experienced than the others. Experienced vs. less experienced brokers

SAS Output for Question 1

Contrast                    DF    Contrast SS    Mean Square    F Value    Pr > F
BROKER 1 VS THE OTHERS       1    172.8000000    172.8000000       8.15    0.0085

KRUSKAL-WALLIS TEST (Non-Parametric Alternative)
H0: The probability distributions are identical for each level of the factor
H1: Not all the distributions are the same

Example: Life Insurance Amount

State:    1: CA    2: KA    3: CO
             90       80      165
            200      140      160
            225      150      140
            100      140      160
            170      150      175
            300      300      155
            250      280      180

[Figure: residual plot. x-axis: PREDICTED; y-axis: RESIDUAL]

KRUSKAL-WALLIS TEST

Kruskal-Wallis Test
Chi-Square            1.0791
DF                         2
Pr > Chi-Square       0.5830

SAS Code

DATA INSURANCE;
  INPUT STATE $ AMOUNT @@;
  DATALINES;
CA 90  CA 200 CA 225 CA 100 CA 170 CA 300 CA 250
KA 80  KA 140 KA 150 KA 140 KA 150 KA 300 KA 280
CO 165 CO 160 CO 140 CO 160 CO 175 CO 155 CO 180
;
** NON-PARAMETRIC TEST;
PROC NPAR1WAY DATA=INSURANCE WILCOXON;
  TITLE "NONPARAMETRIC TEST TO COMPARE STATES";
  CLASS STATE;
  VAR AMOUNT;
RUN;
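
As a hand check on the Chi-Square value that PROC NPAR1WAY reports for the insurance data, the Kruskal-Wallis statistic can be worked out from the rank sums. This is a sketch that assumes the usual tie-corrected form of the statistic, with midranks for tied amounts. The symbols $R_j$, $n_j$, $N$, $t_k$, $H$, and $C$ are notation introduced here for illustration and do not appear in the slides.

```latex
% Rank all N = 21 amounts together; tied amounts share the average rank.
% Rank sums per state (n_j = 7 each):
R_{CA} = 89.5, \qquad R_{KA} = 65.5, \qquad R_{CO} = 76

% Uncorrected statistic
H = \frac{12}{N(N+1)} \sum_j \frac{R_j^2}{n_j} - 3(N+1)
  = \frac{12}{21 \cdot 22} \cdot \frac{89.5^2 + 65.5^2 + 76^2}{7} - 66
  \approx 1.0742

% Tie correction: 140 occurs 3 times; 150, 160, and 300 occur twice each
C = 1 - \frac{\sum_k (t_k^3 - t_k)}{N^3 - N}
  = 1 - \frac{24 + 6 + 6 + 6}{9240}
  \approx 0.99545

% Corrected statistic, compared to a chi-square with 3 - 1 = 2 DF
\frac{H}{C} \approx 1.0791, \qquad
P\left(\chi^2_2 > 1.0791\right) = e^{-1.0791/2} \approx 0.583
```

Both values agree with the Chi-Square and Pr > Chi-Square lines in the output above, so the three states show no significant difference in distribution.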
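
The slides show only the output for the broker study, not the program behind it. The following is a minimal sketch of how that output could be requested, assuming the data are entered as one (BROKER, TRADE) pair per trade. The dataset names BROKERS and DIAG, the output variable names PREDICTED and RESIDUAL, the HOVTEST request, and the contrast coefficients for Question 2 are assumptions made for illustration; only the variable names TRADE and BROKER and the label of the first contrast come from the output shown earlier.

```sas
/* Sketch only: dataset and output-variable names are assumed, not from the slides */
DATA BROKERS;
  INPUT BROKER TRADE @@;
  DATALINES;
1 12  1 3   1 5   1 -1  1 12  1 5
2 7   2 17  2 13  2 11  2 7   2 17
3 8   3 1   3 7   3 4   3 3   3 7
4 21  4 10  4 15  4 12  4 20  4 6
5 24  5 13  5 14  5 18  5 14  5 19
;

PROC GLM DATA=BROKERS;
  CLASS BROKER;
  MODEL TRADE = BROKER;              /* one-way ANOVA F-test */
  MEANS BROKER / SNK;                /* pairwise grouping of levels */
  MEANS BROKER / DUNNETT;            /* control defaults to the first level, broker 1 */
  MEANS BROKER / HOVTEST=LEVENE;     /* formal test of equal variances */
  CONTRAST 'BROKER 1 VS THE OTHERS' BROKER 4 -1 -1 -1 -1;
  CONTRAST 'BROKERS 1,2 VS 3,4,5'   BROKER 3 3 -2 -2 -2;   /* assumed coding for Question 2 */
  OUTPUT OUT=DIAG P=PREDICTED R=RESIDUAL;   /* save values for the diagnostic plots */
RUN;
QUIT;

/* Normality diagnosis on the residuals only, not group by group */
PROC UNIVARIATE DATA=DIAG NORMAL;
  VAR RESIDUAL;
  QQPLOT RESIDUAL / NORMAL(MU=EST SIGMA=EST);
RUN;

/* Residual plot: predicted values vs. residuals */
PROC SGPLOT DATA=DIAG;
  SCATTER X=PREDICTED Y=RESIDUAL;
  REFLINE 0 / AXIS=Y;
RUN;
```

The contrast coefficients sum to zero in each statement; reversing all signs of a contrast leaves its sum of squares and F value unchanged.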
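
The entries of the broker ANOVA table and the Question 1 contrast can also be checked by hand from the five column means (6, 12, 5, 14, 17) and the grand mean 10.8. This is a worked sketch using the standard one-way ANOVA formulas; the symbols $\bar{Y}_{\cdot j}$, $\bar{Y}_{\cdot\cdot}$, $c_j$, and $L$ are notation introduced here, not taken from the slides.

```latex
% Between-broker (Model) sum of squares, R = 6 trades per broker
SS_{Model} = 6 \sum_{j=1}^{5} \left(\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot}\right)^2
           = 6\,(23.04 + 1.44 + 33.64 + 10.24 + 38.44) = 640.8

% Within-broker (Error) sum of squares, one term per broker
SS_{Error} = 132 + 102 + 38 + 170 + 88 = 530

% Mean squares, F statistic, and summary measures
MS_{Model} = \frac{640.8}{4} = 160.2, \qquad
MS_{Error} = \frac{530}{25} = 21.2, \qquad
F = \frac{160.2}{21.2} \approx 7.56

\text{Root MSE} = \sqrt{21.2} \approx 4.604, \qquad
R^2 = \frac{640.8}{1170.8} \approx 0.5473

% Contrast "broker 1 vs. the others" with coefficients c = (4, -1, -1, -1, -1)
L = 4(6) - (12 + 5 + 14 + 17) = -24, \qquad
SS_{L} = \frac{L^2}{\sum_j c_j^2 / 6} = \frac{576}{20/6} = 172.8, \qquad
F = \frac{172.8}{21.2} \approx 8.15
```

These reproduce the Model, Error, Root MSE, R-Square, and Contrast SS values in the SAS output shown earlier.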