Effect size

Instructor: Mr. Chu Duc Nghia
Group members:
Duong Thi Chi
Pham Thi Hoa
Pham Thi Mai
Nguyen Thi Van

Problem: Susan predicts that students will learn most effectively with a constant background sound, as opposed to an unpredictable sound or no sound at all. She randomly divides twenty-four students into three groups of eight. All students study a passage of text for 30 minutes. Those in group 1 study with background sound at a constant volume. Those in group 2 study with noise that changes volume periodically. Those in group 3 study with no sound at all. After studying, all students take a 10-point multiple-choice test over the material.
Significance level: $\alpha = 0.05$
ANOVA table:
SOURCES   SS      df   MS      F
Among     30.08   2    15.04   3.59
Within    87.88   21   4.18

Scores and sample means (8 students per group):
Constant sound (1): 7, 4, 6, 8, 6, 6, 2, 9 (sample mean 6)
Random sound (2): 5, 5, 3, 4, 4, 7, 2, 2 (sample mean 4)
No sound (3): 2, 4, 7, 1, 2, 1, 5, 5 (sample mean 3.375)

Decision rule: reject H0 if $F > F_{\alpha,k-1,n-k} = F_{0.05,2,21} = 3.4668$

DECISION: reject H0, since $F = 3.59 > F_{0.05,2,21} = 3.4668$
Conclusion: the average scores of the 3 groups are not all equal
=> background sound affects study results
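As a check on the ANOVA table, the sums of squares, mean squares and F can be recomputed from the raw scores. This is a minimal Python sketch using only the standard library; the function name `one_way_anova` is hypothetical, and the critical value 3.4668 is copied from the slide rather than computed.

```python
from statistics import mean

# Scores from the slide: 8 students per group, 10-point test.
groups = {
    "constant sound (1)": [7, 4, 6, 8, 6, 6, 2, 9],
    "random sound (2)": [5, 5, 3, 4, 4, 7, 2, 2],
    "no sound (3)": [2, 4, 7, 1, 2, 1, 5, 5],
}

def one_way_anova(samples):
    """Return SS, df and MS for 'among' and 'within', plus F, for a one-way ANOVA."""
    all_scores = [x for s in samples for x in s]
    grand_mean = mean(all_scores)
    k, n = len(samples), len(all_scores)
    group_means = [mean(s) for s in samples]
    # Among groups: squared distance of each group mean from the grand mean, times group size
    ss_among = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, group_means))
    # Within groups: squared distance of each score from its own group mean
    ss_within = sum((x - m) ** 2 for s, m in zip(samples, group_means) for x in s)
    df_among, df_within = k - 1, n - k
    ms_among, ms_within = ss_among / df_among, ss_within / df_within
    return {
        "SS": (ss_among, ss_within),
        "df": (df_among, df_within),
        "MS": (ms_among, ms_within),
        "F": ms_among / ms_within,
    }

for name, scores in groups.items():
    print(f"{name}: mean = {mean(scores)}")

table = one_way_anova(list(groups.values()))
print(table)

F_CRIT = 3.4668  # F_{0.05,2,21}, copied from the slide
print("reject H0" if table["F"] > F_CRIT else "do not reject H0")
```

Rounded to two decimals, the results match the table: SS = 30.08 and 87.88, MS = 15.04 and 4.18, F = 3.59.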
1. Which pairs of means does the difference belong to?
2. How can we identify them?
3. Why not use multiple t-tests?
Multiple t-tests are not a good answer, because the more means there are:
• the more t-tests we have to run;
• the greater the overall Type I error (the probability of rejecting the null hypothesis when it is true).
=> Use the TUKEY TEST
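To see how fast the overall Type I error grows, the sketch below counts the pairwise t-tests among k means and computes the chance of at least one false rejection as $1 - (1 - \alpha)^m$. That formula assumes the m tests are independent, which pairwise t-tests sharing the same means are not, so the numbers are only a rough illustration; the function name is hypothetical.

```python
from math import comb

def familywise_error(k: int, alpha: float = 0.05) -> tuple[int, float]:
    """Return (number of pairwise t-tests among k means, P(at least one Type I error)),
    treating the tests as independent, which is only an approximation."""
    m = comb(k, 2)  # number of pairs of means
    return m, 1 - (1 - alpha) ** m

for k in (3, 4, 5, 6):
    m, fwer = familywise_error(k)
    print(f"k = {k}: {m} t-tests, P(at least one Type I error) = {fwer:.3f}")
```

For k = 3 means this gives 3 tests and roughly a 14% chance of at least one Type I error instead of 5%.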
Tukey test: a multiple comparison procedure and statistical test developed by John Tukey.
Characteristics:
• Compares all possible pairs of means to find which means are significantly different from one another
• Generally used in conjunction with an ANOVA
• Based on the studentized range distribution q
Identify the technique
• Problem objective: detect the difference between population means
• Data type: quantitative
• Experimental design: independent samples
Assumptions
• The observations being tested are independent
• The samples come from normally distributed populations
• The populations have equal variances (homoscedasticity)

The studentized range distribution is built upon the formula:
$$q = \frac{\bar{x}_{\max} - \bar{x}_{\min}}{\sqrt{MSE/n}}$$

It is similar to the Student t distribution, but the q distribution takes into account the number of means under consideration: the more means under consideration, the larger the critical value of q.

How is it built?
We take random samples from the independent populations of interest, identify the largest and the smallest of the sample means, calculate the difference between these two means, and then compute q with the formula above. Repeating this procedure many times gives many values of q, and these values form the q distribution.
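The procedure just described can be simulated directly. This is a Monte Carlo sketch using only the Python standard library, assuming all k populations are normal with the same mean and variance (H0 true) and using the pooled within-group variance as MSE with $v = k(n-1)$ degrees of freedom; the function names, number of repetitions and seed are arbitrary choices for illustration. The empirical 95th percentile approximates the critical value $q_{0.05,k,v}$.

```python
import random
from statistics import fmean

def simulate_q(k: int, n: int, reps: int = 20_000, seed: int = 1) -> list[float]:
    """Draw k samples of size n from one normal population, reps times,
    and return the studentized range q for each repetition."""
    rng = random.Random(seed)
    qs = []
    for _ in range(reps):
        samples = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
        means = [fmean(s) for s in samples]
        # MSE: pooled within-group variance with v = k(n - 1) degrees of freedom
        sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
        mse = sse / (k * (n - 1))
        # q = (largest sample mean - smallest sample mean) / sqrt(MSE / n)
        qs.append((max(means) - min(means)) / (mse / n) ** 0.5)
    return qs

def critical_q(k: int, n: int, alpha: float = 0.05, **kwargs) -> float:
    """Empirical (1 - alpha) quantile of the simulated q distribution."""
    qs = sorted(simulate_q(k, n, **kwargs))
    return qs[int((1 - alpha) * len(qs))]

print(f"k = 3, n = 8 (v = 21): q_0.05 ~ {critical_q(3, 8):.2f}")
print(f"k = 2, n = 8 (v = 14): q_0.05 ~ {critical_q(2, 8):.2f}")
```

For 3 means and v = 21 the estimate should fall close to the tabulated critical values of about 3.5 to 3.6, and the 2-mean case gives a smaller value, in line with the point that more means give a larger critical q.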
Step 1: Arrange the means from smallest to largest and calculate the difference between each pair of means.
Step 2: Calculate the critical value ω:
$$\omega = q_{\alpha,k,v}\sqrt{\frac{MSE}{n_g}}$$
k: number of samples (groups)
v: degrees of freedom associated with MSE ($v = n - k$, where n is the total number of observations)
α: significance level
$q_{\alpha,k,v}$: critical value of the studentized range (see the table on the next slide)
$n_g$: number of observations in each group
* equal sample sizes: $n_g = n_1 = n_2 = n_3 = \dots$
* unequal sample sizes (comparing groups 1 and 2): $n_g = \dfrac{2 n_1 n_2}{n_1 + n_2}$
Step 3: Compare each calculated difference with ω. If a difference is larger than ω, that pair of means is significantly different.
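Steps 1 to 3 can be sketched in Python as follows. This is an illustration, not a library implementation: the critical value `q_crit` must be looked up in a studentized range table (3.58 below is the value used in the worked example), and the sketch assumes equal group sizes so that one $n_g$ applies to every pair. The helper `harmonic_n` shows the unequal-size formula for a single pair. All function names are hypothetical.

```python
from itertools import combinations
from math import sqrt

def tukey_hsd(means: dict[str, float], mse: float, n_g: float, q_crit: float):
    """Tukey steps for equal group sizes. q_crit must come from a studentized
    range table for (alpha, k, v); it is not computed here."""
    omega = q_crit * sqrt(mse / n_g)                        # Step 2: critical value
    ordered = sorted(means.items(), key=lambda kv: kv[1])   # Step 1: smallest to largest
    results = []
    for (small_name, small), (large_name, large) in combinations(ordered, 2):
        diff = large - small                                # Step 1: pairwise difference
        results.append((large_name, small_name, diff, diff > omega))  # Step 3
    return omega, results

def harmonic_n(n1: int, n2: int) -> float:
    """n_g for one pair of groups with unequal sizes: 2*n1*n2 / (n1 + n2)."""
    return 2 * n1 * n2 / (n1 + n2)

omega, rows = tukey_hsd(
    {"constant sound (1)": 6.0, "random sound (2)": 4.0, "no sound (3)": 3.375},
    mse=4.18, n_g=8, q_crit=3.58,  # numbers from the worked example
)
print(f"omega = {omega:.4f}")
for large, small, diff, significant in rows:
    print(f"{large} - {small} = {diff:.3f}: {'significant' if significant else 'not significant'}")
```

Without rounding $\sqrt{MSE/n_g}$ to 0.72, ω comes out at about 2.588 rather than 2.5776; either way only the constant-sound vs no-sound difference (2.625) exceeds it.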

The Tukey confidence limits:
$$(\bar{x}_{\text{larger}} - \bar{x}_{\text{smaller}}) \pm q_{\alpha,k,v}\sqrt{\frac{MSE}{n_g}}$$

How to use the confidence intervals:
- Calculate a confidence interval for each pair of means.
- If the interval contains 0, conclude that the difference for that pair is not significantly different from 0.
- If the interval lies entirely on the negative or the positive side of 0, a difference exists for that pair of means.
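A minimal sketch of the confidence-interval version: each interval is the larger mean minus the smaller mean, plus or minus ω, and a pair is flagged when its interval excludes 0. Here ω = 3.58 × 0.72 = 2.5776, the rounded value from the worked example; the function name is hypothetical.

```python
from itertools import combinations

def tukey_intervals(means: dict[str, float], omega: float):
    """Tukey confidence limits (larger mean - smaller mean) +/- omega for each pair.
    A pair is flagged when its interval does not contain 0."""
    ordered = sorted(means.items(), key=lambda kv: kv[1], reverse=True)  # largest first
    intervals = []
    for (big_name, big), (small_name, small) in combinations(ordered, 2):
        diff = big - small
        low, high = diff - omega, diff + omega
        intervals.append((big_name, small_name, low, high, not (low <= 0 <= high)))
    return intervals

omega = 3.58 * 0.72  # q * sqrt(MSE/n_g), rounded as in the worked example (2.5776)
for a, b, low, high, significant in tukey_intervals({"1": 6.0, "2": 4.0, "3": 3.375}, omega):
    label = "significant" if significant else "contains 0"
    print(f"mu{a} - mu{b}: [{low:.4f}, {high:.4f}] {label}")
```

With the example means this reproduces the intervals used in the example: [0.0474, 5.2026] for μ1 − μ3, which excludes 0, and intervals containing 0 for the other two pairs.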
Problem objective: detect the difference between population means
• Data type: quantitative
• Experimental design: independent samples
• Use the Tukey test, with the assumptions that:
  • the scores in each group (average scores of students in each group) come from normally distributed populations
  • the groups have equal variances (homoscedasticity)
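Step 1: Arrange the sample means from smallest to largest and compute the pairwise differences:

                      No sound (3)   Random S. (2)   Const. S. (1)
                      3.375          4               6
No sound (3)   3.375  -              0.625           2.625
Random S. (2)  4      -              -               2
Const. S. (1)  6      -              -               -

Step 2: $\omega = q_{0.05,3,24-3}\sqrt{\dfrac{MSE}{n_g}} = q_{0.05,3,21}\sqrt{\dfrac{4.18}{8}} \approx 3.58 \times 0.72 = 2.5776$

Step 3: The difference between the constant-sound group and the no-sound group is significant, because 2.625 > 2.5776. The other two differences (2 and 0.625) are smaller than ω, so they are not significant.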
Another solution to the example: using the Tukey confidence intervals.
The 95% confidence intervals for the 3 pairs of means are:
$0.0474 \le \mu_1 - \mu_3 \le 5.2026$
$-0.5776 \le \mu_1 - \mu_2 \le 4.5776$
$-1.9526 \le \mu_2 - \mu_3 \le 3.2026$
The intervals for $\mu_1 - \mu_2$ and $\mu_2 - \mu_3$ contain zero, so those differences are not significantly different from zero. The interval for $\mu_1 - \mu_3$ lies entirely above zero, so the difference between the constant-sound group and the no-sound group is statistically significant. This conclusion is consistent with the Tukey test.


What if the result of Ex. 1 had been: do not reject H0?
This result could be explained in two ways:
• The kind of background sound really does not affect study results (H0 is true).
• We made a wrong decision (H0 is false, but we could not reject it) => we made a Type II error.
=> How can we tell whether we made a wrong decision? We cannot know for certain, but the power of the test tells us how likely a Type II error is.

According to Cohen (1988), Power is “the probability of rejecting a null hypothesis when it is
false — and therefore should be rejected.”
                H0 is true                  H0 is false
Reject H0       Type I error (α)            Correct decision (1 − β = power)
Not reject H0   Correct decision (1 − α)    Type II error (β)

Example: H0: beautiful girls are intelligent.
Ha: beautiful girls are not intelligent.
• If beautiful girls are actually intelligent but we conclude they are not, we reject a true H0: a Type I error.
• If they are actually not intelligent but we conclude they are, we fail to reject a false H0: a Type II error.
• If they are actually not intelligent and we conclude they are not, we make the correct decision; the probability of doing so is the test's power.

[Figure: non-rejection region]
Role of power analysis: find the optimal sample size, and compute the test's power, i.e. the probability that the test avoids a Type II error when H0 is false. This is important!
A priori power analysis
• Before the research
• Aim: find the optimal sample size to ensure the test is powerful (power = 1 − β ≥ 0.8).
• Too large a sample size wastes time, money, effort, etc.
• Too small a sample size gives the test low power.
Post hoc (a posteriori) power analysis
• After the research
• Compute the test's power.
[Figure: effect size, significance level (conventionally 0.05), sample size, type of test (ANOVA, t-test, ...) and power]

Sample size: a larger sample collects more information, so the test is more powerful. But too large a sample wastes time, money and other resources.

Statistical significance level (conventionally 0.05): the greater α, the smaller β, and the more powerful the test.

Effect size: the bigger the effect size, the more power the test has.
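The three effects above can be checked by simulation. This is a Monte Carlo sketch in Python, assuming normal populations with a common standard deviation and equal group sizes; the group means in the usage lines are made-up illustrations, not data from the slides, and the critical F is estimated by simulating under H0 instead of reading it from an F table. The function names are hypothetical.

```python
import random
from statistics import fmean

def f_statistic(samples):
    """One-way ANOVA F = MS_among / MS_within."""
    k, n = len(samples), sum(len(s) for s in samples)
    means = [fmean(s) for s in samples]
    grand = fmean([x for s in samples for x in s])
    ss_among = sum(len(s) * (m - grand) ** 2 for s, m in zip(samples, means))
    ss_within = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    return (ss_among / (k - 1)) / (ss_within / (n - k))

def simulated_power(group_means, n, sd=1.0, alpha=0.05, reps=2000, seed=7):
    """Monte Carlo power of a one-way ANOVA with n observations per group.
    The critical F is the (1 - alpha) quantile of F simulated under H0."""
    rng = random.Random(seed)

    def draw(mus):
        return [[rng.gauss(mu, sd) for _ in range(n)] for mu in mus]

    null_f = sorted(f_statistic(draw([0.0] * len(group_means))) for _ in range(reps))
    f_crit = null_f[int((1 - alpha) * reps)]
    rejections = sum(f_statistic(draw(group_means)) > f_crit for _ in range(reps))
    return rejections / reps

spread = [-0.3, 0.0, 0.3]  # hypothetical group means, in units of the common SD
for n in (8, 16, 32):
    print(f"n = {n} per group: power ~ {simulated_power(spread, n):.2f}")
print(f"alpha = 0.10, n = 8: power ~ {simulated_power(spread, 8, alpha=0.10):.2f}")
print(f"larger effect, n = 8: power ~ {simulated_power([-0.6, 0.0, 0.6], 8):.2f}")
```

Estimated power should rise with the sample size, with a larger α, and with a wider spread of group means; with all means equal, the rejection rate should stay near α.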
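EFFECT SIZE: shows how large a difference is, not whether it is statistically significant.

Generally, effect size is calculated by taking the difference between the two group means and dividing it by the standard deviation.

To interpret the resulting number, most social scientists use this general guide developed by Cohen:
▪ < 0.1 = trivial effect
▪ 0.1 - 0.3 = small effect
▪ 0.3 - 0.5 = moderate effect
▪ > 0.5 = large effect
(These cut-offs are Cohen's benchmarks for a correlation r; for a standardized difference between two means, his benchmarks are 0.2 small, 0.5 medium and 0.8 large, and for the ANOVA effect size f they are 0.10, 0.25 and 0.40.)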
Because the effect size can only be calculated after the data are collected, you will have to use an estimate for the power analysis. How can it be estimated?
• Literature review: use the effect size found in similar past studies in the same field.
• Your own experience, reasoning and judgment.
• Neutral choice: use a value of 0.5, as it indicates a moderate to large difference.

EFFECT SIZE:
Effect size can be used for many types of tests; each test has a specific formula for calculating it.

For 2 means:
$$ES = \frac{\bar{x}_1 - \bar{x}_2}{s}$$
with s: standard deviation

For ANOVA:
$$ES = \sqrt{\frac{\sum_{i=1}^{k}(\bar{x}_i - \bar{x})^2}{k \cdot MSE}}$$
with k: number of groups, $\bar{x}$: grand mean
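Applied to the background-sound example (group means 6, 4 and 3.375, MSE = 4.18, 8 students per group), the ANOVA formula can be evaluated as below. This sketch assumes the formula is $ES = \sqrt{\sum_i (\bar{x}_i - \bar{x})^2 / (k \cdot MSE)}$ and takes the grand mean as the plain average of the group means, which equals the overall mean only when the groups are the same size; the function name is hypothetical.

```python
from statistics import fmean

def anova_effect_size(group_means: list[float], mse: float) -> float:
    """ES = sqrt( sum_i (mean_i - grand_mean)^2 / (k * MSE) ).
    grand_mean is the average of the group means (valid for equal group sizes)."""
    k = len(group_means)
    grand = fmean(group_means)
    return (sum((m - grand) ** 2 for m in group_means) / (k * mse)) ** 0.5

es = anova_effect_size([6.0, 4.0, 3.375], mse=4.18)
print(f"ES = {es:.2f}")
```

For this example the ES comes out at about 0.55.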
Example:
Testing the effectiveness of two different teaching methods, A and B. Two random samples of students with the same prior study results were taken from two classes to participate in the test. After 1 month, the results showed that group A's students had better scores than group B's, as measured by the mean scores of the two groups: group A's mean was 10 points higher than group B's, with s = 30.
$$ES = \frac{\bar{x}_A - \bar{x}_B}{s} = \frac{10}{30} = 0.33$$
=> moderate effect.
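The teaching-method example, together with the general interpretation guide on the effect-size slide, can be sketched as follows. The guide's ranges share their end points, so which label a boundary value receives is an assumption of this sketch; the function names are hypothetical.

```python
def effect_size(mean_diff: float, s: float) -> float:
    """ES for two means: (mean_A - mean_B) / s."""
    return mean_diff / s

def interpret(es: float) -> str:
    """Label |ES| with the general guide from the effect-size slide. The guide's
    bands share end points; here 0.1 and 0.3 go to the higher band, and 0.5
    stays 'moderate' because the guide reserves 'large' for > 0.5 (assumption)."""
    es = abs(es)
    if es < 0.1:
        return "trivial"
    if es < 0.3:
        return "small"
    if es <= 0.5:
        return "moderate"
    return "large"

es = effect_size(10, 30)  # group A's mean is 10 points above group B's, s = 30
print(f"ES = {es:.2f} -> {interpret(es)} effect")
```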
Using the example from the Tukey test:
α = 0.05, medium ES, power = 0.8, ANOVA with 3 groups.
Looking at the table on the next slide, the required sample size for each group is 52.




Software for power analysis:
• G*Power (free, available at http://www.psycho.uniduesseldorf.de/abteilungen/aap/gpower3/)
• Power and Precision - Biostat (www.PowerAnalysis.com)
• One-Stop F Calculator (included in Murphy & Myors (2004))
• PASS - NCSS software (www.ncss.com/pass.html)

Tukey test:
• Helps detect which pairs of means the difference belongs to, while controlling the overall Type I error rate α (rejecting H0 when it is true, a serious error).
• But it is conservative: it loses power because every pairwise difference is compared with a single critical value.

Power analysis:
• Helps estimate appropriate sample sizes when conducting different kinds of tests.
• Makes the test more meaningful, as it points out the effect size of each test.
• Avoids the case where researchers cannot reject H0 and arbitrarily conclude that H0 is true.

http://137.148.49.106/offices/assessment/Assessment%20Reports%202006/CoS/Psychology%203%20of%203.pdf
http://pcbfaculty.ou.edu/classfiles/MGT%206973%20Seminar%20in%20Research%20Methods/MGT%206973%20Res%20Methods%20Spr%202006/Week5%20Research%20Design%20and%20Primary%20Data%20Collection/Cohen%201992%20PB%20A%20power%20primer.pdf
http://www.cvgs.k12.va.us/DIGSTATS/main/Guides/g_tukey.html
http://www.epa.gov/bioiweb1/statprimer/power.html
http://www.faculty.sfasu.edu/cobledean/Biostatistics/Lecture6/MultipleComparisonTests.PDF
http://web.mst.edu/~psyworld/tukeyssteps.html
http://faculty.vassar.edu/lowry/ch14pt2.html
http://people.richland.edu/james/lecture/m170/ch13-1wy.html
http://faculty.vassar.edu/lowry/vsanova.html
http://www.statsoft.com/textbook/power-analysis/
http://math.yorku.ca/SCS/Online/power/