Transcript: Lecture 10
Single-Factor ANOVA

(We skip Sec. 10.3.)

Single-factor ANOVA focuses on a comparison of more than two population or treatment means. Let

$I$ = the number of populations or treatments being compared
$\mu_1$ = the mean of population 1 or the true average response when treatment 1 is applied
$\vdots$
$\mu_I$ = the mean of population $I$ or the true average response when treatment $I$ is applied

The relevant hypotheses are

$H_0\colon \mu_1 = \mu_2 = \cdots = \mu_I$

versus

$H_a\colon$ at least two of the $\mu_i$'s are different

If $I = 4$, $H_0$ is true only if all four $\mu_i$'s are identical. $H_a$ would be true, for example, if $\mu_1 = \mu_2 \neq \mu_3 = \mu_4$, if $\mu_1 = \mu_3 = \mu_4 \neq \mu_2$, or if all four $\mu_i$'s differ from one another.

Notation and Assumptions

Let

$X_{ij}$ = the random variable (rv) that denotes the $j$th measurement taken from the $i$th population, or the measurement taken on the $j$th experimental unit that receives the $i$th treatment
$x_{ij}$ = the observed value of $X_{ij}$ when the experiment is performed

Here we'll focus on the case of equal sample sizes; let $J$ denote the number of observations in each sample ($J = 6$ in Example 1). The data set consists of $IJ$ observations. The individual sample means will be denoted by $\bar{X}_{1\cdot}, \bar{X}_{2\cdot}, \ldots, \bar{X}_{I\cdot}$. That is,

$\bar{X}_{i\cdot} = \frac{\sum_{j=1}^{J} X_{ij}}{J}, \quad i = 1, 2, \ldots, I$

The dot in place of the second subscript signifies that we have added over all values of that subscript while holding the other subscript value fixed, and the horizontal bar indicates division by $J$ to obtain an average. Similarly, the average of all $IJ$ observations, called the grand mean, is

$\bar{X}_{\cdot\cdot} = \frac{\sum_{i=1}^{I} \sum_{j=1}^{J} X_{ij}}{IJ}$

Assumptions
The $I$ population or treatment distributions are all normal with the same variance $\sigma^2$. That is, each $X_{ij}$ is normally distributed with

$E(X_{ij}) = \mu_i, \qquad V(X_{ij}) = \sigma^2$

The $I$ sample standard deviations will generally differ somewhat even when the corresponding $\sigma$'s are identical.

The Test Statistic

Definition
The mean square for treatments is given by

$MSTr = \frac{J}{I-1}\left[(\bar{X}_{1\cdot} - \bar{X}_{\cdot\cdot})^2 + \cdots + (\bar{X}_{I\cdot} - \bar{X}_{\cdot\cdot})^2\right] = \frac{J}{I-1}\sum_{i}(\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot})^2$

and the mean square for error is

$MSE = \frac{S_1^2 + S_2^2 + \cdots + S_I^2}{I}$

The test statistic for single-factor ANOVA is $F = MSTr/MSE$.

The terminology "mean square" will be explained shortly. Notice that uppercase $X$'s and $S^2$'s are used, so MSTr and MSE are defined as statistics. We will follow tradition and also use MSTr and MSE (rather than mstr and mse) to denote the calculated values of these statistics. Each $S_i^2$ assesses variation within a particular sample, so MSE is a measure of within-samples variation.

Proposition
When $H_0$ is true,

$E(MSTr) = E(MSE) = \sigma^2$

whereas when $H_0$ is false,

$E(MSTr) > E(MSE) = \sigma^2$

That is, both statistics are unbiased for estimating the common population variance $\sigma^2$ when $H_0$ is true, but MSTr tends to overestimate $\sigma^2$ when $H_0$ is false.

F Distributions and the F Test

Theorem
Let $F = MSTr/MSE$ be the test statistic in a single-factor ANOVA problem involving $I$ populations or treatments with a random sample of $J$ observations from each one. When $H_0$ is true and the basic assumptions of this section are satisfied, $F$ has an $F$ distribution with $\nu_1 = I - 1$ and $\nu_2 = I(J - 1)$. With $f$ denoting the computed value of $F$, the rejection region $f \geq F_{\alpha,\, I-1,\, I(J-1)}$ then specifies a test with significance level $\alpha$.

The rationale for $\nu_1 = I - 1$ is that although MSTr is based on the $I$ deviations $\bar{X}_{1\cdot} - \bar{X}_{\cdot\cdot}, \ldots, \bar{X}_{I\cdot} - \bar{X}_{\cdot\cdot}$, we have $\sum_{i}(\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot}) = 0$, so only $I - 1$ of these are freely determined. Because each sample contributes $J - 1$ df to MSE and these samples are independent, $\nu_2 = (J - 1) + \cdots + (J - 1) = I(J - 1)$.
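To make the definitions above concrete, here is a minimal Python sketch that computes MSTr, MSE, the F statistic, and its P-value for the equal-sample-size case. The 3 × 5 data matrix (so I = 3, J = 5) is hypothetical and chosen purely for illustration; it is not data from the lecture.

```python
import numpy as np
from scipy.stats import f

# Hypothetical data: I = 3 treatments, J = 5 observations each
# (values are illustrative, not from the lecture).
x = np.array([
    [10.2,  9.8, 11.1, 10.5,  9.9],   # treatment 1
    [12.0, 11.6, 12.4, 11.9, 12.2],   # treatment 2
    [10.8, 11.0, 10.4, 11.3, 10.7],   # treatment 3
])
I, J = x.shape

sample_means = x.mean(axis=1)   # X-bar_i. for each treatment
grand_mean = x.mean()           # X-bar_..

# MSTr = J/(I-1) * sum of squared deviations of sample means from grand mean
mstr = J * np.sum((sample_means - grand_mean) ** 2) / (I - 1)

# MSE = average of the I sample variances (valid for equal sample sizes)
mse = np.mean(x.var(axis=1, ddof=1))

F_stat = mstr / mse
p_value = f.sf(F_stat, I - 1, I * (J - 1))   # upper-tail area of F(I-1, I(J-1))

print(f"MSTr = {mstr:.3f}, MSE = {mse:.3f}, F = {F_stat:.3f}, P = {p_value:.4f}")
```

As a sanity check, `scipy.stats.f_oneway(*x)` should return the same F statistic and P-value.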
Sums of Squares

The introduction of sums of squares facilitates developing an intuitive appreciation for the rationale underlying single-factor and multifactor ANOVAs. Let $x_{i\cdot}$ represent the sum (not the average, since there is no bar) of the $x_{ij}$'s for $i$ fixed (the sum of the numbers in the $i$th row of the table) and $x_{\cdot\cdot}$ denote the sum of all the $x_{ij}$'s (the grand total).

Definition
The total sum of squares (SST), treatment sum of squares (SSTr), and error sum of squares (SSE) are given by

$SST = \sum_{i=1}^{I}\sum_{j=1}^{J}(x_{ij} - \bar{x}_{\cdot\cdot})^2$, with df $= IJ - 1$
$SSTr = \sum_{i=1}^{I}\sum_{j=1}^{J}(\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot})^2 = J\sum_{i=1}^{I}(\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot})^2$, with df $= I - 1$
$SSE = \sum_{i=1}^{I}\sum_{j=1}^{J}(x_{ij} - \bar{x}_{i\cdot})^2$, with df $= I(J - 1)$

Fundamental Identity

$SST = SSTr + SSE$   (10.1)

Thus if any two of the sums of squares are computed, the third can be obtained through (10.1); SST and SSTr are easiest to compute, and then SSE = SST – SSTr. The proof follows from squaring both sides of the relationship

$x_{ij} - \bar{x}_{\cdot\cdot} = (x_{ij} - \bar{x}_{i\cdot}) + (\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot})$   (10.2)

and summing over all $i$ and $j$; the cross-product term vanishes in the summation, leaving the identity.

Once SSTr and SSE are computed, each is divided by its associated df to obtain a mean square (mean in the sense of average). Then $F$ is the ratio of the two mean squares:

$F = \frac{MSTr}{MSE} = \frac{SSTr/(I-1)}{SSE/[I(J-1)]}$   (10.3)

The computations are often summarized in a tabular format, called an ANOVA table, as displayed in Table 10.2. Tables produced by statistical software customarily include a P-value column to the right of $f$.

Table 10.2  An ANOVA Table

Source of Variation | df         | Sum of Squares | Mean Square            | f
Treatments          | $I - 1$    | SSTr           | MSTr = SSTr/(I - 1)    | MSTr/MSE
Error               | $I(J - 1)$ | SSE            | MSE = SSE/[I(J - 1)]   |
Total               | $IJ - 1$   | SST            |                        |

Multiple Comparisons in ANOVA

When the computed value of the F statistic in single-factor ANOVA is not significant, the analysis is terminated because no differences among the $\mu_i$'s have been identified. But when $H_0$ is rejected, the investigator will usually want to know which of the $\mu_i$'s are different from one another. A method for carrying out this further analysis is called a multiple comparisons procedure.

Several of the most frequently used procedures are based on the following central idea. First calculate a confidence interval for each pairwise difference $\mu_i - \mu_j$ with $i < j$. Thus if $I = 4$, the six required CIs would be for $\mu_1 - \mu_2$ (but not also for $\mu_2 - \mu_1$), $\mu_1 - \mu_3$, $\mu_1 - \mu_4$, $\mu_2 - \mu_3$, $\mu_2 - \mu_4$, and $\mu_3 - \mu_4$. Then if the interval for $\mu_1 - \mu_2$ does not include 0, conclude that $\mu_1$ and $\mu_2$ differ significantly from one another; if the interval does include 0, the two $\mu$'s are judged not significantly different.
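The sketch below, continuing with the same hypothetical 3 × 5 data matrix, verifies the fundamental identity (10.1), computes F via (10.3), and then forms the pairwise CIs described above. One assumption on my part: the excerpt states only the central pairwise-CI idea, so the interval half-width here uses Tukey's procedure (the studentized range critical value, available as `scipy.stats.studentized_range` in SciPy 1.7+), one of the frequently used multiple comparisons procedures.

```python
import numpy as np
from itertools import combinations
from scipy.stats import studentized_range

# Same hypothetical I x J data matrix as in the previous sketch.
x = np.array([
    [10.2,  9.8, 11.1, 10.5,  9.9],
    [12.0, 11.6, 12.4, 11.9, 12.2],
    [10.8, 11.0, 10.4, 11.3, 10.7],
])
I, J = x.shape
xbar_i = x.mean(axis=1)                 # row means x-bar_i.
xbar = x.mean()                         # grand mean x-bar_..

# Sums of squares; SSE obtained via the fundamental identity (10.1)
sst = np.sum((x - xbar) ** 2)
sstr = J * np.sum((xbar_i - xbar) ** 2)
sse = sst - sstr
# check against the direct definition of SSE
assert np.isclose(sse, np.sum((x - xbar_i[:, None]) ** 2))

mstr, mse = sstr / (I - 1), sse / (I * (J - 1))   # mean squares, per (10.3)
print(f"SST={sst:.3f}  SSTr={sstr:.3f}  SSE={sse:.3f}  F={mstr/mse:.3f}")

# Pairwise CIs for mu_i - mu_j (i < j), Tukey half-width w = Q * sqrt(MSE/J)
alpha = 0.05
q = studentized_range.ppf(1 - alpha, I, I * (J - 1))
w = q * np.sqrt(mse / J)
for i, j in combinations(range(I), 2):
    d = xbar_i[i] - xbar_i[j]
    verdict = "differ" if abs(d) > w else "not significantly different"
    print(f"mu{i+1} - mu{j+1}: ({d - w:.3f}, {d + w:.3f}) -> {verdict}")
```

Because every sample has the same size $J$, a single half-width $w$ applies to all $I(I-1)/2$ pairs; an interval excludes 0 exactly when the corresponding sample means differ by more than $w$.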