Transcript Effect size
Instructor: Mr. Chu Duc Nghia
Group members: Duong Thi Chi, Pham Thi Hoa, Pham Thi Mai, Nguyen Thi Van

Problem: Susan predicts that students will learn most effectively with a constant background sound, as opposed to an unpredictable sound or no sound at all. She randomly divides twenty-four students into three groups of eight. All students study a passage of text for 30 minutes. Those in group 1 study with sound at a constant volume in the background. Those in group 2 study with noise that changes volume periodically. Those in group 3 study with no sound at all. After studying, all students take a 10-point multiple-choice test over the material. Significance level: α = 0.05.

Source    SS      df   MS      F
Among     30.08   2    15.04   3.59
Within    87.88   21   4.18

Group                 Scores             Sample mean
Constant sound (1)    7 4 6 8 6 6 2 9    6
Random sound (2)      5 5 3 4 4 7 2 2    4
No sound (3)          2 4 7 1 2 1 5 5    3.375

Decision rule: reject H0 if $F > F_{\alpha,\,k-1,\,n-k} = F_{0.05,\,2,\,21} = 3.4668$
Decision: reject H0, as $F = 3.59 > F_{0.05,\,2,\,21} = 3.4668$
Conclusion: a difference exists among the average scores of the 3 groups => background sound affects study results.

1. Which pairs of means does the difference belong to?
2. How do we identify them?
3. With multiple t-tests?

Multiple t-tests are not suitable, because the more means there are:
• the more t-tests we have to run
• the greater the Type I error (the probability of rejecting the null hypothesis when it is true)
=> Use the TUKEY TEST

Tukey test: a multiple comparison procedure and statistical test developed by John Tukey.
Characteristics:
• Compares all possible pairs of means to find which means are significantly different from one another
• Generally used in conjunction with an ANOVA
• Based on the studentized range distribution q

Identify the technique
Problem objective: detect the difference between population means
Data type: quantitative
Experimental design: independent
Assumptions:
• The observations being tested are independent
• The means are from normally distributed populations
• There is equal variation across the groups being compared
(homoscedasticity)

The studentized range distribution is built upon the formula:

$$q = \frac{\bar{x}_{\max} - \bar{x}_{\min}}{\sqrt{MSE / n}}$$

It is similar to the Student t-distribution, but the q-distribution takes into account the number of means under consideration. The more means under consideration, the larger the q value (studentized t).
How is it built? We take random samples from the independent populations of interest. We then identify the largest and the smallest of the sample means, calculate the difference between these two means, and compute q with the formula above. After repeating the procedure many times, we get many values of q. These values form the q-distribution.

Step 1: arrange the means from the smallest to the largest and calculate the difference between each pair of means.
Step 2: calculate the critical value ω:

$$\omega = q_{\alpha,k,\nu} \sqrt{\frac{MSE}{n_g}}$$

k: number of samples
ν: degrees of freedom associated with MSE (ν = n - k)
α: significance level
$q_{\alpha,k,\nu}$: critical value of the studentized range (see the table on the next slide)
$n_g$: number of observations in each sample
* equal sample sizes: $n_g = n_1 = n_2 = n_3 = \dots$
* unequal sample sizes: $n_g = \frac{2 n_1 n_2}{n_1 + n_2}$

Step 3: compare each calculated difference with ω. If a difference is larger than ω, that pair of means is significantly different.

The Tukey confidence limits:

$$(\bar{x}_{\text{larger}} - \bar{x}_{\text{smaller}}) \pm q_{\alpha,k,\nu} \sqrt{\frac{MSE}{n_g}}$$

How to use the confidence interval?
- Calculate the confidence interval for each pair of means.
- If the interval contains 0, conclude that the difference for that pair is not significantly different from 0.
- If the interval lies entirely on the negative or the positive side, a difference exists between that pair of means.

Problem objective: detect the difference between population means
Data type: quantitative
Experimental design: independent
=> use the Tukey test, with the assumptions:
• The means (average scores of students from each group) are from normally distributed populations
• There is equal variation across the groups (homoscedasticity)

Step 1:

                        No sound (3)   Random S. (2)   Const. S. (1)
                        3.375          4               6
No sound (3)    3.375   -              0.625           2.625
Random S. (2)   4       -              -               2
Const. S. (1)   6       -              -               -

Step 2:

$$\omega = q_{0.05,\,3,\,24-3} \sqrt{\frac{MSE}{n_g}} = q_{0.05,\,3,\,21} \sqrt{\frac{4.18}{8}} = 3.58 \times 0.72 = 2.5776$$

Step 3: the difference between the constant sound group and the no sound group is significant because 2.625 > 2.5776. The other two differences (2 and 0.625) are smaller than ω, so those pairs are not significantly different.

Other solution to the example: using the Tukey confidence interval.
The 95% confidence intervals for the 3 pairs of means are:

$$0.0474 < \bar{x}_1 - \bar{x}_3 < 5.2026$$
$$-0.5776 < \bar{x}_1 - \bar{x}_2 < 4.5776$$
$$-1.9526 < \bar{x}_2 - \bar{x}_3 < 3.2026$$

The intervals for $\bar{x}_1$ & $\bar{x}_2$ and for $\bar{x}_2$ & $\bar{x}_3$ contain zero => these differences are not significantly different from zero.
The interval for $\bar{x}_1$ & $\bar{x}_3$ does not contain zero => the difference is statistically significant, i.e. the difference between the constant sound group and the no sound group is significant. This conclusion is consistent with the Tukey test.

What if the result of Ex. 1 changed to: do not reject H0? This result may be explained by:
• The kind of background sound really does not affect study results (H0 is true).
• We made a wrong decision (H0 is false but we could not reject it) => we made a Type II error.
=> How do we know whether we made a wrong decision? Based on the power of the test!

According to Cohen (1988), power is “the probability of rejecting a null hypothesis when it is false — and therefore should be rejected.”

                 H0 is true                  H0 is false
Reject H0        Type I error = α            Correct decision = 1 - β = power
Not reject H0    Correct decision = 1 - α    Type II error = β

Example:
H0: beautiful girls are intelligent.
Ha: beautiful girls are not intelligent.
If beautiful girls are actually intelligent but we say they are not, we make a Type I error.
If they are actually not intelligent but we say they are, we commit a Type II error.
If they are actually not intelligent and we say they are not, we decide correctly; a test with high power makes this outcome likely.

(Figure: non-rejection region)

Role of power analysis: find the optimal sample size and compute the test's power, i.e. the probability (in %) that the test will not make a Type II error. Important!

A priori power analysis
• Done before the research
• Aim: find the optimal sample size so that the test is powerful enough (power 1 - β ≥ 0.8)
• Too large a sample size wastes time, money, effort, etc.
• Too small a sample size gives the test low power.

A posteriori power analysis
• Done after the research
• Compute the test's power.

Power depends on:
• Effect size
• Significance level (conventionally 0.05)
• Sample size
• Type of test (ANOVA, t-test, ...)

Sample size: a larger sample size => more information collected => the test is more powerful. But too large a sample size wastes time, money and other resources.
Statistical significance level (conventionally 0.05): the greater α, the smaller β => the more powerful the test.
Effect size: the bigger the effect size, the more power the test has.

EFFECT SIZE: shows how large the difference is, i.e. whether it matters in practice.
Generally, effect size is calculated by taking the difference between the two group means and dividing it by the standard deviation. To interpret the resulting number, most social scientists use this general guide developed by Cohen:
▪ < 0.1 = trivial effect
▪ 0.1 - 0.3 = small effect
▪ 0.3 - 0.5 = moderate effect
▪ > 0.5 = large effect

Because effect size can only be calculated after the data are collected, you will have to use an estimate for the power analysis. How to estimate?
• Literature review: use a similar test from the same field in which the author detected the effect size successfully.
• Your own experience, rationale and judgment.
• Neutral: use a value of 0.5, as it indicates a moderate to large difference.

EFFECT SIZE: effect size can be used for many types of tests; each test has a specific formula to calculate it.
For 2 means:

$$ES = \frac{\bar{x}_1 - \bar{x}_2}{s}$$

with s the standard deviation.
For ANOVA:

$$ES = \sqrt{\frac{\sum_i (\bar{x}_i - \bar{x})^2}{k \cdot MSE}}$$

k: number of groups

Example: testing the effectiveness of two different teaching methods, A and B. Two random samples of students with the same prior study results were taken from two classes to participate in the test. After 1 month, the results revealed that group A students had better scores than group B students, as measured by the mean scores of the two groups.
Group A's result is 10 points higher than group B's, with s = 30:

$$ES = \frac{\bar{x}_A - \bar{x}_B}{s} = \frac{10}{30} = 0.33$$

=> moderate effect.

Using the example from the Tukey test: α = 0.05, medium ES, power = 0.8, ANOVA with 3 groups => according to the table on the next slide, the required sample size for each group is 52.

• G*Power (free, available at http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/)
• Power and Precision - Biostat (www.PowerAnalysis.com)
• One-Stop F Calculator (included in Murphy & Myors (2004))
• PASS - NCSS software (www.ncss.com/pass.html)

Tukey test:
• Helps detect which pairs of means the difference belongs to while simultaneously controlling the Type I error α (rejecting H0 when it is true, the serious case)
• But it is conservative: it loses power because all pairwise differences of means are compared with a single critical value.

Power analysis:
• Helps best estimate the sample sizes when conducting different kinds of tests
• Makes the test more meaningful, as it points out the effect size of each test
• Avoids the case where researchers cannot reject H0 and arbitrarily conclude that H0 is true

http://137.148.49.106/offices/assessment/Assessment%20Reports%202006/CoS/Psychology%203%20of%203.pdf
http://pcbfaculty.ou.edu/classfiles/MGT%206973%20Seminar%20in%20Research%20Methods/MGT%206973%20Res%20Methods%20Spr%202006/Week5%20Research%20Design%20and%20Primary%20Data%20Collection/Cohen%201992%20PB%20A%20power%20primer.pdf
http://www.cvgs.k12.va.us/DIGSTATS/main/Guides/g_tukey.html
http://www.epa.gov/bioiweb1/statprimer/power.html
http://www.faculty.sfasu.edu/cobledean/Biostatistics/Lecture6/MultipleComparisonTests.PDF
http://web.mst.edu/~psyworld/tukeyssteps.html
http://faculty.vassar.edu/lowry/ch14pt2.html
http://people.richland.edu/james/lecture/m170/ch13-1wy.html
http://faculty.vassar.edu/lowry/vsanova.html
http://www.statsoft.com/textbook/power-analysis/
http://math.yorku.ca/SCS/Online/power/
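
The ANOVA table in the background-sound example can be rechecked from the raw scores. The following is a minimal Python sketch, not the presenters' own procedure: the function name `one_way_anova` is ours, and the critical value `F_CRIT` is copied from the slides rather than computed.

```python
# Scores copied from the background-sound example (three groups of eight).
scores = {
    "constant": [7, 4, 6, 8, 6, 6, 2, 9],
    "random": [5, 5, 3, 4, 4, 7, 2, 2],
    "none": [2, 4, 7, 1, 2, 1, 5, 5],
}

F_CRIT = 3.4668  # F(0.05; 2, 21), taken from the slides, not computed here


def one_way_anova(groups):
    """Return (ss_among, ss_within, df_among, df_within, f_stat)."""
    all_scores = [x for g in groups for x in g]
    n, k = len(all_scores), len(groups)
    grand_mean = sum(all_scores) / n
    means = [sum(g) / len(g) for g in groups]
    # Among-groups sum of squares: group size times squared distance
    # of each group mean from the grand mean.
    ss_among = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-groups sum of squares: squared distance of each score
    # from its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_among, df_within = k - 1, n - k
    f_stat = (ss_among / df_among) / (ss_within / df_within)
    return ss_among, ss_within, df_among, df_within, f_stat


ss_a, ss_w, df_a, df_w, f_stat = one_way_anova(list(scores.values()))
print(f"SS among = {ss_a:.2f}, SS within = {ss_w:.2f}")
print(f"MS among = {ss_a / df_a:.2f}, MS within = {ss_w / df_w:.2f}")
print(f"F = {f_stat:.2f}, reject H0: {f_stat > F_CRIT}")
```

The sketch uses only the standard library, so the critical value has to come from a table; a statistics package could supply it instead.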
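
The argument against running multiple t-tests can be illustrated with a rough calculation of how the overall Type I error grows with the number of means. This sketch assumes the pairwise tests are independent, which is only an approximation for t-tests that share groups; the function name `familywise_error` is ours.

```python
from math import comb


def familywise_error(k, alpha=0.05):
    """Return (number of pairwise tests, chance of at least one Type I error).

    Assumes the tests are independent, so the result is an approximation.
    """
    n_tests = comb(k, 2)  # every pair of the k means gets its own t-test
    return n_tests, 1 - (1 - alpha) ** n_tests


for k in (3, 4, 5, 6):
    n_tests, fwer = familywise_error(k)
    print(f"{k} means: {n_tests} t-tests, overall Type I error about {fwer:.3f}")
```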
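
Steps 1 to 3 of the Tukey test and the Tukey confidence intervals can be sketched together for the background-sound example. This is an illustrative sketch under stated assumptions: `tukey_pairs` is a name of ours, and `Q_CRIT` is the tabulated value 3.58 used on the slides, not a value computed by the code.

```python
from itertools import combinations
from math import sqrt

# Values copied from the worked example.
means = {"constant (1)": 6.0, "random (2)": 4.0, "none (3)": 3.375}
MSE = 4.18
N_G = 8
Q_CRIT = 3.58  # q(0.05; k = 3, v = 21) as read from the slides


def tukey_pairs(means, mse, n_g, q_crit):
    """Compare every pair of means with omega = q * sqrt(MSE / n_g)."""
    omega = q_crit * sqrt(mse / n_g)
    results = []
    for (name_a, mean_a), (name_b, mean_b) in combinations(means.items(), 2):
        diff = abs(mean_a - mean_b)  # larger mean minus smaller mean
        results.append({
            "pair": (name_a, name_b),
            "diff": diff,
            "interval": (diff - omega, diff + omega),
            "significant": diff > omega,
        })
    return omega, results


omega, results = tukey_pairs(means, MSE, N_G, Q_CRIT)
print(f"omega = {omega:.4f}")
for r in results:
    low, high = r["interval"]
    print(f"{r['pair']}: diff = {r['diff']:.3f}, "
          f"interval = ({low:.4f}, {high:.4f}), significant = {r['significant']}")
```

Because the square root is not rounded to 0.72 here, omega comes out slightly above the 2.5776 shown on the slides; the conclusion for each pair is the same.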
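
The two effect size formulas can be tried on the numbers from the slides. This is a sketch with names of our own choosing (`effect_size_from_difference`, `effect_size_anova`, `interpret`); the ANOVA version assumes equal group sizes, so that the grand mean is the plain average of the group means, and the handling of values exactly on a boundary of the interpretation guide is our assumption.

```python
from math import sqrt


def effect_size_from_difference(mean_difference, s):
    """ES for two means: difference between the means divided by s."""
    return mean_difference / s


def effect_size_anova(group_means, mse):
    """ES for ANOVA: sqrt(sum((mean_i - grand_mean)^2) / (k * MSE)).

    Assumes equal group sizes, so the grand mean is the average of the means.
    """
    k = len(group_means)
    grand_mean = sum(group_means) / k
    return sqrt(sum((m - grand_mean) ** 2 for m in group_means) / (k * mse))


def interpret(es):
    """Label an effect size using the guide quoted on the slides."""
    size = abs(es)
    if size < 0.1:
        return "trivial"
    if size < 0.3:
        return "small"
    if size <= 0.5:
        return "moderate"
    return "large"


# Teaching-method example: group A is 10 points higher, s = 30.
es_teaching = effect_size_from_difference(10, 30)
print(f"Teaching methods: ES = {es_teaching:.2f} ({interpret(es_teaching)})")

# Background-sound example: group means and MSE from the ANOVA table.
es_sound = effect_size_anova([6.0, 4.0, 3.375], 4.18)
print(f"Background sound: ES = {es_sound:.2f}")
```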