Why effect size? - Department of Education


Funded through the ESRC’s Researcher
Development Initiative
Sessions 1.2-1.3: Effect Size Calculation
Prof. Herb Marsh
Ms. Alison O’Mara
Dr. Lars-Erik Malmberg
Department of Education,
University of Oxford
Meta-analysis workflow (flowchart):
1. Establish research question
2. Define relevant studies
3. Develop code materials
4. Locate and collate studies
5. Pilot coding; coding
6. Data entry and effect size calculation
7. Main analyses
8. Supplementary analyses
The effect size makes meta-analysis possible
- It is based on the “dependent variable” (i.e., the outcome)
- It standardizes findings across studies so that they can be directly compared
Any standardized index can be an “effect size” (e.g., standardized mean difference, correlation coefficient, odds-ratio), but it must
- be comparable across studies (standardization)
- represent the magnitude and direction of the relationship
- be independent of sample size
Different studies in the same meta-analysis can be based on different statistics, but each has to be transformed to a standardized effect size that is comparable across studies
Sample size, significance and d effect size

Study  Group  N    M    SD   t      p      d
1      Exp    10   105  15   0.750  0.466  0.333
1      Cntr   10   100  15
2      Exp    50   105  15   1.667  0.099  0.333
2      Cntr   50   100  15
3      Exp    100  105  15   2.360  0.019  0.333
3      Cntr   100  100  15
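The point of the three studies can be sketched in a few lines of Python: with the same means and SDs, d stays constant while t grows with the sample size. This is a minimal sketch that assumes equal group sizes and equal SDs (so the pooled SD equals the common SD); the function name is illustrative and not part of the workshop files.

```python
import math

def t_and_d_equal_groups(m_exp, m_cntr, sd, n_per_group):
    """t and d for two groups with the same SD and the same n (assumption)."""
    d = (m_exp - m_cntr) / sd                      # standardized mean difference
    t = d / math.sqrt(2 / n_per_group)             # t = d / sqrt(1/n + 1/n)
    return t, d

for n in (10, 50, 100):
    t, d = t_and_d_equal_groups(105, 100, 15, n)
    print(f"n per group = {n:3d}   t = {t:.3f}   d = {d:.3f}")
```

The printed d is identical in all three rows; only t (and hence p) changes with n.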
ESRC RDI One Day Meta-analysis workshop (Marsh, O’Mara, Malmberg)
Simulate ds on a homemade calculator: T-test and effect sizes (ES.xls)

             M    SD   N
Treatment    105  15   15
Control      100  15   15
pooled SD = 15; T = 0.91; DF = 28; sign = 0.3691 (two-tailed); d = 0.33

             M    SD   N
Treatment    100  15   15
Control      105  15   15
pooled SD = 15; T = -0.91; DF = 28; sign = 0.3691 (two-tailed); d = -0.33

Things to try:
- Change direction of effects
- Change Ns (equal or unequal?)
- Change SDs
Effect size as proportion in the Treatment group doing better than the average Control group person

[Figure: three panels of overlapping Control and Treatment distributions, with the Control mean (xc) marked]
- d = .20: 57% of T above xc
- d = .50: 69% of T above xc
- d = .80: 79% of T above xc
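The percentages in this display can be approximated with the standard normal distribution. The sketch below assumes both groups are normally distributed with equal SDs, so the share of the Treatment group above the Control mean is Φ(d); the function name is illustrative.

```python
from statistics import NormalDist

def share_of_treatment_above_control_mean(d):
    """Proportion of T scoring above the C mean, assuming normal
    distributions with equal SDs (an assumption of this sketch)."""
    return NormalDist().cdf(d)

for d in (0.20, 0.50, 0.80):
    print(f"d = {d:.2f}: {share_of_treatment_above_control_mean(d):.3f}")
```

Under these assumptions the results agree with the displayed percentages to within about one percentage point.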
Effect size as proportion of success in the Treatment versus Control group (Binomial Effect Size Display = BESD):

[Figure: three panels of overlapping Control and Treatment distributions, with the cut-off xc marked]
- d = .20: success in 55% of T, 45% of C
- d = .50: success in 62% of T, 38% of C
- d = .80: success in 68% of T, 32% of C
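A BESD can be derived from d by first converting d to r and then placing the two success rates symmetrically around .50. The sketch below assumes two groups of equal size, for which r = d / sqrt(d² + 4); the function names are illustrative.

```python
import math

def r_from_d(d):
    """Convert d to r, assuming two equally sized groups."""
    return d / math.sqrt(d ** 2 + 4)

def besd(d):
    """Return (treatment success rate, control success rate)."""
    r = r_from_d(d)
    return 0.5 + r / 2, 0.5 - r / 2

for d in (0.20, 0.50, 0.80):
    treat, cntr = besd(d)
    print(f"d = {d:.2f}: success in {treat:.3f} of T, {cntr:.3f} of C")
```

The two rates always sum to 1, and their difference equals r.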
Why effect size?
- There has been a long focus on significance level (safeguarding against Type I (α) error); today the focus is on practical and meaningful significance.
Cohen, J. (1994). The earth is round (p < .05), American Psychologist, 49, 997–1003.

                    Real world: H0 true    Real world: H1 true
Study: accept H0    ok                     Type II (β) error
Study: accept H1    Type I (α) error       ok
A short history of the effect size (Huberty, 2002;
see also Olejnik & Algina, 2000 for review of effect sizes)
Power and effect size
- Power: “Finding what is out there”
- Type II (β) error: “not finding what is out there”
- Power (1 – β): the probability of rejecting a false H0
- Power of .80 or .90 is sought in primary research
Power, sought effect size, at significance level α = .05 in primary research (prior to conducting study)

[Figure: “Sample size for three effect sizes, α = .05”. x-axis: power (0.50 to 0.90); y-axis: N needed per sample (0 to 600); one curve each for effect .20, .25 and .30]
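The relation between power, effect size and required sample size can be sketched with the usual normal approximation for a two-sided, two-group comparison, n per group ≈ 2((z₁₋α/₂ + z_power) / d)². This is an assumption of the sketch; the slides do not say how the plotted curves were computed, and exact t-based calculations give slightly larger values.

```python
import math
from statistics import NormalDist

def n_per_group(d, power=0.80, alpha=0.05):
    """Approximate n per group for a two-sided two-sample test
    (normal approximation; an assumption of this sketch)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for two-sided alpha
    z_power = z.inv_cdf(power)           # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)

for d in (0.20, 0.25, 0.30):
    sizes = [n_per_group(d, power) for power in (0.50, 0.60, 0.70, 0.80, 0.90)]
    print(f"effect {d:.2f}: {sizes}")
```

Smaller sought effects and higher power both push the required n per group up sharply.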
How meaningful is a “small” effect size?

Raw counts
           No heart attack   Heart attack   Total
Aspirin    10,933            104            11,037
Placebo    10,845            189            11,034
Total      21,778            293            22,071

Percentages (row)
           No heart attack   Heart attack   Total
Aspirin    99.06             0.94           100
Placebo    98.29             1.71           100
Total      98.67             1.33           100

Binomial effect size display (percentages)
           No heart attack   Heart attack   Total
Aspirin    51.7              48.3           100
Placebo    48.3              51.7           100
Total      100               100            200

- A small effect size changed the course of an RCT in 1987: placebo group participants were given aspirin instead (see Rosenthal, 1994, p. 242)

$$ \chi^2_{[1]} = 25.01 $$
$$ r = \phi = \sqrt{\frac{\chi^2_{[1]}}{n}} = \sqrt{\frac{25.01}{22071}} = 0.034 $$
r = φ = .034 (r² = .0011)

BESD (Binomial Effect Size Display):
Treatment condition success rate = .50 + r/2
Control condition success rate = .50 − r/2
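The aspirin figures can be reproduced from the raw counts. The sketch below uses the standard (uncorrected) Pearson chi-square for a 2 × 2 table, then converts it to φ and to the two BESD rates; the function name is illustrative.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table
    [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Rows: aspirin, placebo. Columns: no heart attack, heart attack.
a, b = 10933, 104
c, d = 10845, 189
n = a + b + c + d

chi2 = chi2_2x2(a, b, c, d)
phi = math.sqrt(chi2 / n)
besd_aspirin, besd_placebo = 0.5 + phi / 2, 0.5 - phi / 2

print(f"chi2 = {chi2:.2f}, phi = {phi:.3f}")
print(f"BESD: aspirin {besd_aspirin:.3f}, placebo {besd_placebo:.3f}")
```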
- Within the one meta-analysis, one can include studies based on any combination of statistical analyses (e.g., t-tests, ANOVA, correlation, odds-ratio, chi-square, etc.).
- The “art” of meta-analysis is how to compute effect sizes based on non-standard designs and studies that do not supply complete data (see Lipsey&Wilson_AppB.pdf).
- Convert all effect sizes into a common metric based on the “natural” metric given research in the area, e.g., d, r, OR.
- Standardized mean difference
  - Group contrast research
    - Treatment groups
    - Naturally occurring groups
  - Inherently continuous construct
- Correlation coefficient
  - Association between inherently continuous constructs
- Odds-ratio
  - Group contrast research
    - Treatment or naturally occurring groups
  - Inherently dichotomous construct
- Regression coefficients and other multivariate effects
  - Requires access to variance-covariance (correlation) matrices for each included study
Calculating ds (1)
Means and standard deviations, correlations, p-values, F-statistics, t-statistics, “other” test statistics → d
Almost all test statistics can be transformed into a standardized effect size “d”
Calculating ds (1)
- Represents a standardized group contrast on an inherently continuous measure
- Uses the pooled standard deviation
- Commonly called “d”

$$ ES = \frac{\bar{X}_{G1} - \bar{X}_{G2}}{s_{pooled}} $$

If $n_1 = n_2$:
$$ s_{pooled} = \sqrt{\frac{s_1^2 + s_2^2}{2}} $$

If $n_1 \neq n_2$:
$$ s_{pooled} = \sqrt{\frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2}} $$
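The pooled standard deviation and d can be computed directly from means, SDs and ns. This is a minimal sketch of the formulas above; the general (unequal n) form is used because it reduces to the equal-n form when n1 = n2. Function names are illustrative.

```python
import math

def pooled_sd(sd1, n1, sd2, n2):
    """Pooled SD weighted by degrees of freedom."""
    return math.sqrt((sd1 ** 2 * (n1 - 1) + sd2 ** 2 * (n2 - 1)) / (n1 + n2 - 2))

def d_from_means(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference, group 1 minus group 2."""
    return (m1 - m2) / pooled_sd(sd1, n1, sd2, n2)

# Treatment M = 25, SD = 5, n = 25; control M = 20, SD = 5, n = 30
print(round(d_from_means(25, 5, 25, 20, 5, 30), 2))
# Unequal SDs and ns: M = 103.5 vs 100, SD = 22.0 vs 18.5, n = 45 vs 35
print(round(d_from_means(103.5, 22.0, 45, 100, 18.5, 35), 2))
```

Entering the treatment group as group 1 keeps a beneficial treatment effect positive.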
Various contrast effect sizes
- Cohen’s d: $ES = d = \frac{\bar{X}_{G1} - \bar{X}_{G2}}{SD_{pooled}}$
- Hedges’ g: $ES = g = \frac{\bar{X}_{G1} - \bar{X}_{G2}}{S_{pooled}}$
- Glass’s Δ: $ES = \Delta = \frac{\bar{X}_{G1} - \bar{X}_{G2}}{s_C}$, where $s_C$ is the standard deviation of the control group
Calculating d (1) using Ms, SDs and ns

T-test and effect sizes
             M    SD   N
Treatment    25   5    25
Control      20   5    30
pooled SD = 5; T = 3.6927; DF = 53; sign = 0.0005 (two-tailed); d = 1.0000

$$ ES = \frac{\bar{X}_{Exper} - \bar{X}_{Control}}{s_{pooled}} = \frac{25 - 20}{5} = 1.00 $$

Remember to code the treatment effect in the positive direction!
ES_calculator.xls
20
Calculating d (2) using ES calculator, using
Ms, ns, and t-value
21
Calculating d (3) using ES calculator, using ns and t-value
- The treatment group scored higher than the control group at Time 2 (t[28] = 4.11; p < .001).
- From the sample description we learn that n1 = n2

$$ ES_{sm} = t \sqrt{\frac{n_1 + n_2}{n_1 n_2}} = 4.11 \sqrt{\frac{15 + 15}{(15)(15)}} = 4.11 \sqrt{\frac{30}{225}} = 4.11 \sqrt{.1333} = (4.11)(0.365) = 1.50 $$
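The t-to-d conversion is a one-liner in code. This sketch applies the formula for an independent-samples t-test; the function name is illustrative, and the sign of t has to be set by the coder so that a positive d favours the treatment group.

```python
import math

def d_from_t(t, n1, n2):
    """d from an independent-samples t-value and the two group sizes."""
    return t * math.sqrt((n1 + n2) / (n1 * n2))

print(round(d_from_t(4.11, 15, 15), 2))    # worked example: t[28] = 4.11
print(round(d_from_t(5.66, 200, 160), 2))  # unequal group sizes
```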
Calculating d (3) correcting for small sample bias
Hedges proposed a correction for small sample size bias (ns < 20)
Must be applied before analysis

$$ ES'_{sm} = ES_{sm} \left[ 1 - \frac{3}{4N - 9} \right] $$

$$ ES'_{sm} = 1.5 \left[ 1 - \frac{3}{4N - 9} \right] = (1.5)(.97) = 1.46 $$
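The small-sample correction can be sketched as follows, with N taken as the total sample size (15 + 15 = 30 in the worked example). The function name is illustrative.

```python
def small_sample_correction(es, n_total):
    """Hedges' small-sample correction: es * (1 - 3 / (4N - 9))."""
    return es * (1 - 3 / (4 * n_total - 9))

print(round(small_sample_correction(1.5, 30), 2))  # two groups of 15
```

The correction factor approaches 1 as N grows, so it matters mainly for small studies.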
Calculating d (4) using ES calculator, using ns and F-value

$$ ES = 2 \sqrt{\frac{F}{N}} = 2 \sqrt{\frac{1.6}{40}} = .40 $$

Remember: in a two-group ANOVA, F = t²
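Because F = t² in a two-group ANOVA, the F-to-d conversion follows from the t-to-d formula. The sketch below uses the general form sqrt(F(n1 + n2) / (n1 n2)), which reduces to 2·sqrt(F/N) when the groups are equal in size; the function name is illustrative. F carries no sign, so the direction of the effect has to be taken from the reported means.

```python
import math

def d_from_f(f, n1, n2):
    """Absolute d from a two-group ANOVA F-value (F = t squared)."""
    return math.sqrt(f * (n1 + n2) / (n1 * n2))

print(round(d_from_f(1.6, 20, 20), 2))     # N = 40 split evenly
print(round(d_from_f(8.73, 100, 120), 2))  # unequal group sizes
```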
Calculating d (5) using ES calculator, using p-value
“The mean-level comparison was not significant (p = .53)”

T-test table
df = (n1 + n2 – 2)
Sometimes authors only report e.g., p < .01 (n = 22). If so, use a conservative approach to reading the t-test table.
NOTE: When p = n.s., some researchers code d = 0 in the database
Example dataset so far (1) (ES_enter.sav):

study  es    Treat  Cntr  n    Groups
1      1.00  25     30    55   2
2      0.80  40     40    80   2
3      1.46  15     15    30   2
4      0.40  20     20    40   2
5      0.10  80     75    155  2
Use all available tools for calculating the following 5 effect sizes
ES 6: MT = 21, MC = 20, nT = 60, nC = 60, t = .55
ES 7: MT = 103.5, MC = 100, SDT = 22.0, SDC = 18.5, nT = 45, nC = 35
ES 8: nT = 45, nC = 40, p < .05
ES 9: nT = 100, nC = 120, F = 8.73
ES 10: nT = 200, nC = 160, t = 5.66
(see electronic document: “Correct ds for 5 effect sizes.doc”)
Example dataset so far (2) (ES_enter.sav):

study  es    Treat  Cntr  n    Groups
1      1.00  25     30    55   2
2      0.80  40     40    80   2
3      1.46  15     15    30   2
4      0.40  20     20    40   2
5      0.10  80     75    155  2
6      0.10  60     60    120  2
7      0.17  45     35    80   2
8      0.43  45     40    85   2
9      0.40  100    120   220  2
10     0.60  200    160   360  2
Calculating d (11) using ES calculator, using number of successful outcomes per group

Frequencies        Success   Failure
Treatment Group    a         b
Control Group      c         d

             Success   Failure   Total
Treatment    28        28        56
Control      31        34        65
Total        59        62        121

$$ ES_{OR} = \frac{ad}{bc} = \frac{28 \times 34}{28 \times 31} = 1.097 $$
$$ \log_e(1.097) = .092 $$
$$ \text{logit } ES = .092 / 1.83 = .05 $$
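The odds-ratio route to d can be sketched as below. The divisor 1.83 is taken from the slide; the standard deviation of the standard logistic distribution, on which the logit method is based, is π/√3 ≈ 1.81, and either value gives the same result to two decimals here. The function names and the `divisor` parameter are illustrative.

```python
import math

def odds_ratio(a, b, c, d):
    """OR = ad / bc for treatment (a success, b failure) and
    control (c success, d failure)."""
    return (a * d) / (b * c)

def logit_d(or_value, divisor=1.83):
    """Rescale ln(OR) to the d metric (logit method). The default
    divisor follows the slide; pi / sqrt(3) is about 1.81."""
    return math.log(or_value) / divisor

or_value = odds_ratio(28, 28, 31, 34)
print(f"OR = {or_value:.3f}, ln(OR) = {math.log(or_value):.3f}, d = {logit_d(or_value):.2f}")
```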
Calculating d (12) using ES calculator,
using proportion of successes per group
(53% vs. 48.5%)
32
Calculating d (13) using paired t-test (only one experimental group; “each person their own control”)

$$ ES_{sg} = \frac{\bar{X}_{T2} - \bar{X}_{T1}}{s_{pooled}} $$
$$ s_{pooled} = \sqrt{(s_{T1}^2 + s_{T2}^2) / 2} $$

$\bar{X}_{T1} = 4.50$, $\bar{X}_{T2} = 6.25$, $s_P = 2.50$
$$ ES_{sg} = \frac{6.25 - 4.50}{2.50} = .70 $$

Don’t use the SD of the change score!
r = correlation between Time 1 and Time 2
Calculating d (14) using paired t-test (only
one experimental group)
n (pairs) = 90, t-value = 6.5, r = .70
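The slide gives only the inputs for this case. One conversion that reproduces the value entered for this study in the example dataset (0.53) is ES = t·sqrt(2(1 − r)/n), where n is the number of pairs; it is assumed here rather than taken from the slide, and the function name is illustrative.

```python
import math

def gain_es_from_paired_t(t, n_pairs, r):
    """Standardized gain from a paired t-value, the number of pairs and
    the Time 1 / Time 2 correlation (formula assumed, see lead-in)."""
    return t * math.sqrt(2 * (1 - r) / n_pairs)

print(round(gain_es_from_paired_t(6.5, 90, 0.70), 2))
```

The correlation matters: the higher r is, the smaller the standardized gain implied by a given paired t.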
34
Calculating d (15)
- “The 20 participants increased .84 z-scores between time 1 and time 2 (p < .01)”
- ES = .84
- Correct for small sample bias

$$ ES'_{sm} = .84 \left[ 1 - \frac{3}{(4)(20) - 9} \right] = (.84)(.958) = .80 $$
Example dataset so far (3) (ES_enter.sav):

study  es    Treat  Cntr  n    Groups
1      1.00  25     30    55   2
2      0.80  40     40    80   2
3      1.46  15     15    30   2
4      0.40  20     20    40   2
5      0.10  80     75    155  2
6      0.10  60     60    120  2
7      0.17  45     35    80   2
8      0.43  45     40    85   2
9      0.40  100    120   220  2
10     0.60  200    160   360  2
11     0.05  56     65    121  2
12     0.10  70     80    150  2
13     0.70  80     0     80   1
14     0.53  90     0     90   1
15     0.80  20     0     20   1

Method difference: mean contrast (Groups = 2) and gain scores (Groups = 1)
Summary of equations from Lipsey &
Wilson (2001) (for more formulae see Lipsey &
Wilson Appendix B)
37
Weighting for mean-level differences
The effect sizes are weighted by the inverse of the variance to give more weight to effects based on larger sample sizes
Variance for a mean-level comparison is calculated as

$$ v_i = \frac{n_1 + n_2}{n_1 n_2} + \frac{d_i^2}{2(n_1 + n_2)} $$

The standard error of each effect size is given by the square root of the sampling variance

$$ SE = \sqrt{v_i} $$
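The variance and standard error of a mean-contrast d follow directly from the formula. A minimal sketch, with illustrative function names, checked against the first studies of the example dataset:

```python
import math

def variance_d(d, n1, n2):
    """Sampling variance of d for two independent groups."""
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))

def se_d(d, n1, n2):
    """Standard error of d: square root of the sampling variance."""
    return math.sqrt(variance_d(d, n1, n2))

for d, n1, n2 in [(1.00, 25, 30), (0.80, 40, 40), (1.46, 15, 15)]:
    print(f"d = {d:.2f}, n = {n1} + {n2}: se = {se_d(d, n1, n2):.4f}")
```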
Enter_w.xls

$$ se = \sqrt{\frac{n_1 + n_2}{n_1 n_2} + \frac{d_i^2}{2(n_1 + n_2)}} $$

study  es    Treat  Cntr  n    Groups  se
1      1.00  25     30    55   2       0.2871
2      0.80  40     40    80   2       0.2324
3      1.46  15     15    30   2       0.4109
4      0.40  20     20    40   2       0.3194
5      0.10  80     75    155  2       0.1608
6      0.10  60     60    120  2       0.1827
7      0.17  45     35    80   2       0.2258
8      0.43  45     40    85   2       0.2198
9      0.40  100    120   220  2       0.1367
10     0.60  200    160   360  2       0.1084
11     0.05  56     65    121  2       0.1824
12     0.10  70     80    150  2       0.1638
Weighting for gain scores
SE for gain scores

$$ SE_{sg} = \sqrt{\frac{2(1 - r)}{n} + \frac{ES_{sg}^2}{2n}} $$

T1 and T2 scores are dependent, so we need to get the correlation between T1 and T2 into the equation (not always reported)

Inverse variance for gain scores

$$ w_{sg} = \frac{1}{SE_{sg}^2} = \frac{2n}{4(1 - r) + ES_{sg}^2} $$
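The gain-score standard error and weight can be sketched as follows; the two functions are algebraically equivalent routes to the same weight, and their names are illustrative.

```python
import math

def se_gain(es, n, r):
    """Standard error of a standardized gain score."""
    return math.sqrt(2 * (1 - r) / n + es ** 2 / (2 * n))

def w_gain(es, n, r):
    """Inverse-variance weight of a standardized gain score."""
    return 2 * n / (4 * (1 - r) + es ** 2)

for es, n, r in [(0.70, 80, 0.65), (0.53, 90, 0.70), (0.80, 20, 0.50)]:
    print(f"es = {es:.2f}, n = {n}, r = {r:.2f}: "
          f"se = {se_gain(es, n, r):.4f}, w = {w_gain(es, n, r):.2f}")
```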
Enter_w.xls

$$ SE_{sg} = \sqrt{\frac{2(1 - r)}{n} + \frac{ES_{sg}^2}{2n}} $$

study  es    Treat  Cntr  n   Groups  r     se
13     0.70  80     0     80  1       0.65  0.1087
14     0.53  90     0     90  1       0.70  0.0907
15     0.80  20     0     20  1       0.50  0.2569
Compute the weighted mean ES and s.e.
of the ES in SPSS (var_ofES.sps) (1)
Compute the weighted mean ES and s.e.
of the ES in SPSS (var_ofES.sps) (2)
Compute the weighted mean ES and s.e. of the ES
Weight each ES by the inverse of its squared s.e. (the inverse variance)

$$ w = \frac{1}{SE_{ES}^2} $$

The average ES

$$ \overline{ES} = \frac{\sum (w_i ES_i)}{\sum w_i} $$

Standard error of the average ES

$$ SE_{\overline{ES}} = \sqrt{\frac{1}{\sum w_i}} $$
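The three formulas can be applied in a few lines. The sketch below uses the 15 effect sizes and standard errors of the workshop's example dataset (rounded as entered, so sums differ slightly from the spreadsheet) and adds a 95% confidence interval as mean ± 1.96 × s.e.; the function name is illustrative.

```python
import math

def weighted_mean_es(effect_sizes, standard_errors):
    """Inverse-variance weighted mean ES and its standard error."""
    weights = [1 / se ** 2 for se in standard_errors]
    mean = sum(w * es for w, es in zip(weights, effect_sizes)) / sum(weights)
    se_mean = math.sqrt(1 / sum(weights))
    return mean, se_mean

es = [1.00, 0.80, 1.46, 0.40, 0.10, 0.10, 0.17, 0.43,
      0.40, 0.60, 0.05, 0.10, 0.70, 0.53, 0.80]
se = [0.2871, 0.2324, 0.4109, 0.3194, 0.1608, 0.1827, 0.2258, 0.2198,
      0.1367, 0.1084, 0.1824, 0.1638, 0.1087, 0.0907, 0.2569]

mean, se_mean = weighted_mean_es(es, se)
lower, upper = mean - 1.96 * se_mean, mean + 1.96 * se_mean
print(f"mean ES = {mean:.2f}, s.e. = {se_mean:.2f}, 95% C.I. = [{lower:.2f}, {upper:.2f}]")
```

Large studies dominate the result: the three studies with the smallest standard errors carry about half of the total weight.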
Enter_w.xls

$$ w = \frac{1}{SE_{ES}^2} = \frac{1}{(0.2871)^2} = 12.13 $$
$$ wes = w \times ES = 12.13 \times 1.000 = 12.13 $$

study  es    Treat  Cntr  n    Groups  se      w      wes
1      1.00  25     30    55   2       0.2871  12.13  12.13
2      0.80  40     40    80   2       0.2324  18.52  14.81
3      1.46  15     15    30   2       0.4109  5.92   8.65
4      0.40  20     20    40   2       0.3194  9.80   3.92
5      0.10  80     75    155  2       0.1608  38.66  3.87
6      0.10  60     60    120  2       0.1827  29.96  3.00
7      0.17  45     35    80   2       0.2258  19.62  3.34
8      0.43  45     40    85   2       0.2198  20.70  8.90
9      0.40  100    120   220  2       0.1367  53.48  21.39
10     0.60  200    160   360  2       0.1084  85.11  51.06
11     0.05  56     65    121  2       0.1824  30.07  1.50
12     0.10  70     80    150  2       0.1638  37.29  3.69
study  es    Treat  Cntr  n    Groups  r     se      w       wes
1      1.00  25     30    55   2       .     0.2871  12.13   12.13
2      0.80  40     40    80   2       .     0.2324  18.52   14.81
3      1.46  15     15    30   2       .     0.4109  5.92    8.65
4      0.40  20     20    40   2       .     0.3194  9.80    3.92
5      0.10  80     75    155  2       .     0.1608  38.66   3.87
6      0.10  60     60    120  2       .     0.1827  29.96   3.00
7      0.17  45     35    80   2       .     0.2258  19.62   3.34
8      0.43  45     40    85   2       .     0.2198  20.70   8.90
9      0.40  100    120   220  2       .     0.1367  53.48   21.39
10     0.60  200    160   360  2       .     0.1084  85.11   51.06
11     0.05  56     65    121  2       .     0.1824  30.07   1.50
12     0.10  70     80    150  2       .     0.1638  37.29   3.69
13     0.70  80     0     80   1       0.65  0.1087  84.66   59.26
14     0.53  90     0     90   1       0.70  0.0907  121.55  64.42
15     0.80  20     0     20   1       0.50  0.2569  15.15   12.12
Sums                                                 582.63  272.07

$$ \overline{ES} = \frac{\sum (w_i ES_i)}{\sum w_i} = \frac{272.07}{582.63} = 0.47 $$
$$ SE_{\overline{ES}} = \sqrt{\frac{1}{\sum w_i}} = \sqrt{\frac{1}{582.63}} = 0.04 $$
Funnel plot for x = sample size, y = ES
- Does the average of the ES converge toward the ES of the largest (n) study?

average es        0.47
se of mean es     0.04
95% C.I. Lower    0.39
95% C.I. Upper    0.55

95% C.I. = ±1.96 * s.e.
99% C.I. = ±2.58 * s.e.
99.9% C.I. = ±3.29 * s.e.

[Figure: “Effect sizes by sample size”. x-axis: sample size (0 to 400); y-axis: ES (0.00 to 1.60)]
Funnel plot including s.e. of ES
- The ES in a smaller sample has a larger standard error (s.e.)

[Figure: “Effect sizes by sample size”, with the s.e. of each ES shown. x-axis: sample size (0 to 400); y-axis: ES (-0.25 to 2.00)]
Population: N = ‘size’, μ = ‘mean’, δ = ‘effect size’
Sample: n = ‘size’, m = ‘mean’, d = ‘effect size’

Interval estimates
The “likely” population parameter is the sample parameter ± uncertainty
- Standard errors (s.e.)
- Confidence intervals (C.I.)
Calculating rs
Means and standard deviations (d), χ², p-values, F-statistics, t-statistics, “other” test statistics → r
Almost all test statistics can be transformed into a standardized effect size “r”
Correlations / relationships between variables
- r_xy: Pearson’s product moment coefficient (continuous × continuous)
- r_pb: point-biserial correlation (dichotomous × continuous)
- χ² (dichotomous × dichotomous)
- r_s: Spearman’s rank-order coefficient (ordinal × ordinal)
And others, e.g.,
- φ coefficient, Odds-Ratio (OR)
- Cramer’s V, Contingency coefficient C
- Tetrachoric and polychoric correlations (etc.)
Bias when dichotomising continuous variables
X and Y are both “truly” continuous, but in the study one of them is dichotomised
- X = continuous, Y = 50/50 split gives an r_pb that is 80% of the value it would have had if Y had been kept continuous
X and Y are both “truly” continuous, but both are dichotomised
- Maximum value of φ if X = 30/70 split and Y = 50/50 split is φ = .33
Calculating rs from d (1)
r can be used in all situations d can, but d cannot be used in all situations where r is appropriate

r       d
0.90    4.13
0.80    2.67
0.70    1.96
0.60    1.50
0.50    1.15
0.40    0.87
0.30    0.63
0.20    0.41
0.10    0.20
0.00    0.00
-0.10   -0.20
-0.20   -0.41
-0.30   -0.63
-0.40   -0.87
-0.50   -1.15
-0.60   -1.50
-0.70   -1.96
-0.80   -2.67
-0.90   -4.13
Calculating rpb (2)
- If X and Y are inherently continuous, mean-contrast is a better option than r_pb

$$ ES_{sm} = \frac{2 r_{pb}}{\sqrt{1 - r_{pb}^2}} $$
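The conversion from r to d is easy to tabulate. This sketch applies the formula above and prints a few rows for comparison with the r/d table; the function name is illustrative.

```python
import math

def d_from_r(r):
    """Convert a (point-biserial) correlation to d: 2r / sqrt(1 - r^2)."""
    return 2 * r / math.sqrt(1 - r ** 2)

for r in (0.10, 0.30, 0.50, 0.70, 0.90):
    print(f"r = {r:.2f}  ->  d = {d_from_r(r):.2f}")
```

The relation is close to d ≈ 2r for small r and becomes steeply non-linear as r approaches ±1.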
Calculating r (3) from t-value
Appropriate for both independent and dependent samples t-test values

$$ r = \sqrt{\frac{t^2}{t^2 + df}} = \frac{t}{\sqrt{t^2 + df}} $$

Calculating r (4) from χ²-value

$$ r = \phi = \sqrt{\frac{\chi^2}{N}} = \sqrt{\frac{Z^2}{N}} $$
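Both conversions can be sketched in a few lines. The second form of the t formula is used so that the sign of t carries over to r; χ² has no sign, so the direction must be taken from the table itself. The example values (t[28] = 4.11; χ²[1] = 24.68 with N = 576) are taken from other slides in this session, and the function names are illustrative.

```python
import math

def r_from_t(t, df):
    """r from a t-value and its degrees of freedom (keeps the sign of t)."""
    return t / math.sqrt(t ** 2 + df)

def r_from_chi2(chi2, n):
    """phi from a 1-df chi-square and the total sample size."""
    return math.sqrt(chi2 / n)

print(round(r_from_t(4.11, 28), 3))
print(round(r_from_chi2(24.68, 576), 3))
```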
Sources of error
Cf. Structural Equation Model (circle = latent/unobserved construct, rectangle = manifest/observed variable)

[Diagram: latent (unobserved) X and latent (unobserved) Y correlate r_x*y*; each is measured by a manifest (observed) variable, x with reliability r_xx and y with reliability r_yy; the manifest variables correlate r_xy]

$$ r_{x^*y^*} = \frac{r_{xy}}{\sqrt{r_{xx} r_{yy}}} $$
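The correction for unreliability (disattenuation) can be sketched as below, using the first study of the workshop's correlation example (r = .46, reliabilities .80 and .70). The function name is illustrative.

```python
import math

def disattenuate(r_xy, r_xx, r_yy):
    """Estimate the latent correlation from the observed correlation
    and the reliabilities of the two measures."""
    return r_xy / math.sqrt(r_xx * r_yy)

print(round(disattenuate(0.46, 0.80, 0.70), 4))
```

Because reliabilities are below 1, the disattenuated correlation is always larger in absolute value than the observed one.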
Alternatively: transform rs into Fisher’s Zr-transformed rs, which are more normally distributed

$$ ES_{Zr} = .5 \log_e \left[ \frac{1 + ES_r}{1 - ES_r} \right] $$
$$ SE_{Zr} = \frac{1}{\sqrt{n - 3}} $$

r       Fisher's zr
0.90    1.47
0.80    1.10
0.70    0.87
0.60    0.69
0.50    0.55
0.40    0.42
0.30    0.31
0.20    0.20
0.10    0.10
0.00    0.00
-0.10   -0.10
-0.20   -0.20
-0.30   -0.31
-0.40   -0.42
-0.50   -0.55
-0.60   -0.69
-0.70   -0.87
-0.80   -1.10
-0.90   -1.47
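Fisher's transformation and its standard error can be sketched as follows. The back-transformation (the hyperbolic tangent) is not shown on the slide and is included here only as the standard inverse, for reporting a pooled Zr as r again; function names are illustrative.

```python
import math

def fisher_z(r):
    """Fisher's Zr transformation: 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

def se_fisher_z(n):
    """Standard error of Zr for a sample of size n."""
    return 1 / math.sqrt(n - 3)

def r_from_fisher_z(z):
    """Inverse transformation (standard result, not from the slide)."""
    return math.tanh(z)

print(round(fisher_z(0.46), 4), round(se_fisher_z(145), 4))
```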
rr.xls

$$ r_{x^*y^*} = \frac{r_{xy}}{\sqrt{r_{xx} r_{yy}}} = \frac{.46}{\sqrt{.80 \times .70}} = .46 / .748 = .61 $$
$$ SE_{Zr} = \frac{1}{\sqrt{n - 3}} $$
$$ ES_{Zr} = .5 \log_e \left[ \frac{1 + ES_r}{1 - ES_r} \right] $$

study  r       n     aY    aX    rx*y*    SEZr    FisherZr  FisherZr_dis
1      0.460   145   0.80  0.70  0.6147   0.0839  0.4973    0.7164
2      0.330   132   0.70  0.71  0.4681   0.0880  0.3428    0.5076
3      0.250   80    0.83  0.78  0.3107   0.1140  0.2554    0.3213
4      -0.200  442   0.82  0.80  -0.2469  0.0477  -0.2027   -0.2521
5      -0.250  662   0.86  0.69  -0.3245  0.0390  -0.2554   -0.3367
6      -0.400  320   0.75  0.80  -0.5164  0.0562  -0.4236   -0.5714
7      -0.100  450   0.89  0.83  -0.1163  0.0473  -0.1003   -0.1169
8      0.100   106   0.82  0.87  0.1184   0.0985  0.1003    0.1190
9      0.275   1927  0.71  0.76  0.3744   0.0228  0.2823    0.3935
10     0.150   2863  0.80  0.83  0.1841   0.0187  0.1511    0.1862
[Figure: “Ten effect sizes (r)”. x-axis: N (0 to 3500); y-axis: ES (r) (-0.600 to 0.600)]

[Figure: “Ten disattenuated ES (rx*y*)”. x-axis: N (0 to 3500); y-axis: ES (disattenuated r) (-0.600 to 0.800)]
Calculating OR (chi2.sps)

$$ r = \phi = \sqrt{\frac{\chi^2_{[1]}}{N}} = \sqrt{\frac{Z^2}{N}} $$
inocul (inoculated) * escape (escaped disease) Crosstabulation
Count

                    escape: 0 caught disease   escape: 1 escaped   Total
0 non-inoculated    75                         204                 279
1 inoculated        32                         265                 297
Total               107                        469                 576

$$ \chi^2_{[1]} = 24.68 $$
$$ r = \phi = \sqrt{\frac{\chi^2_{[1]}}{n}} = \sqrt{\frac{24.68}{576}} = 0.207 $$
Frequencies        Success   Failure
Treatment Group    a         b
Control Group      c         d

$$ ES_{OR} = \frac{ad}{bc} = \frac{265 \times 75}{32 \times 204} = 3.04 $$
or, from the unrounded cell proportions (shown rounded):
$$ ES_{OR} = \frac{.46 \times .13}{.06 \times .35} = 3.04 $$
$$ \log_e(3.04) = 1.11 $$

Risk Estimate
                                                          Value   95% C.I. Lower   95% C.I. Upper
Odds Ratio for inocul (0 non-inoculated / 1 inoculated)   3.045   1.937            4.786
For cohort escape = 0 caught disease                      2.495   1.706            3.649
For cohort escape = 1 escaped                             .819    .755             .889
N of Valid Cases                                          576

$$ \text{logit effect size} = 1.11 / 1.83 = .61 $$
Pearson’s 5 studies escaping Enteric Fever (1904)

N
         non-inoculated       inoculated
Study    cases    escaped     cases    escaped
1        75       204         32       265
2        1489     9040        35       1670
3        257      10724       26       2509
4        82       1203        72       1135
5        1475     109034      84       10798

Ratios (escaped / cases)
Study    non-inoculated   inoculated
1        2.72             8.28
2        6.07             47.71
3        41.73            96.50
4        14.67            15.76
5        73.92            128.55

Proportions
         non-inoculated       inoculated
Study    cases    escaped     cases    escaped
1        0.27     0.73        0.11     0.89
2        0.14     0.86        0.02     0.98
3        0.02     0.98        0.01     0.99
4        0.06     0.94        0.06     0.94
5        0.01     0.99        0.01     0.99

Odds-ratio (OR) = success of inoculation / success of non-inoculation; LN(OR) = natural logarithm of OR
Study    OR      LN(OR)
1        3.04    1.11
2        7.86    2.06
3        2.31    0.84
4        1.07    0.07
5        1.74    0.55
$$ \text{odds of an event} = \frac{p}{1 - p}, \quad \text{where } p = \text{probability of event} $$

p      p/(1-p)   logit = ln(p/(1-p))   EXP(logit)
0.01   0.01      -4.595                0.0101
0.05   0.05      -2.944                0.0526
0.10   0.11      -2.197                0.1111
0.20   0.25      -1.386                0.2500
0.30   0.43      -0.847                0.4286
0.40   0.67      -0.405                0.6667
0.50   1.00      0.000                 1.0000
0.60   1.50      0.405                 1.5000
0.70   2.33      0.847                 2.3333
0.80   4.00      1.386                 4.0000
0.90   9.00      2.197                 9.0000
0.95   19.00     2.944                 19.0000
0.99   99.00     4.595                 99.0000
$$ ES_{OR} = \frac{265 \times 75}{32 \times 204} = 3.044 $$
$$ \log_e(3.044) = 1.113 $$
$$ 1.113 / 1.83 = 0.61 $$
Each study is one line in the database, with columns such as:
- Effect size
- Variance of the effect size
- Sample sizes
- Duration
- Reliability of the instrument
Organising effect sizes within study (1) “Flat dataset”

Study_ID  ES1   ES2   ES3   ES4   DV1cat  DV2cat  DV3cat  DV4cat
001       0.77  .     .     .     1       .       .       .
002       0.20  0.05  0.10  .     2       2       4       .
003       0.40  0.30  .     .     3       1       .       .
004       0.25  0.22  .     .     1       1       .       .
005       0.30  0.40  0.10  .     4       4       2       .
006       0.60  0.50  0.30  0.30  2       2       1       4

Categories of verbal DV
1 = verbal IQ
2 = reading comprehension
3 = reading-lag
4 = spelling-lag
Organising effect sizes within study (2) “hierarchical dataset” (effect sizes nested within study)

Study_ID  ES    DVcat
001       0.77  1
002       0.20  2
002       0.05  2
002       0.10  4
003       0.40  3
003       0.30  1
004       0.25  1
004       0.22  1
005       0.30  4
005       0.40  4
005       0.10  2
006       0.60  2
006       0.50  2
006       0.30  1
006       0.30  4

Categories of verbal DV
1 = verbal IQ
2 = reading comprehension
3 = reading-lag
4 = spelling-lag
Organising effect sizes within study (3) “hierarchical dataset”, with one construct per DV per study

Study_ID  ES     DVcat
001       0.770  1
002       0.130  2
002       0.100  4
003       0.400  3
003       0.300  1
004       0.235  1
005       0.350  4
005       0.100  2
006       0.555  2
006       0.300  1
006       0.300  4

Categories of verbal DV
1 = verbal IQ
2 = reading comprehension
3 = reading-lag
4 = spelling-lag
Organising effect sizes within study (4) “hierarchical dataset”, with one DV per study

Study_ID  ES
001       0.770
002       0.120
003       0.350
004       0.240
005       0.270
006       0.430

NOTE: alternative to aggregating ESs within study: multilevel meta-analysis
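The aggregation steps can be sketched with plain Python. The rows below are the 15 effect sizes of the hierarchical dataset; a simple unweighted mean is assumed for aggregation, and the printed values can differ from the slide tables in the last digit because of rounding. Function and variable names are illustrative.

```python
from collections import defaultdict
from statistics import mean

# (Study_ID, ES, DVcat) rows of the hierarchical dataset
rows = [
    ("001", 0.77, 1),
    ("002", 0.20, 2), ("002", 0.05, 2), ("002", 0.10, 4),
    ("003", 0.40, 3), ("003", 0.30, 1),
    ("004", 0.25, 1), ("004", 0.22, 1),
    ("005", 0.30, 4), ("005", 0.40, 4), ("005", 0.10, 2),
    ("006", 0.60, 2), ("006", 0.50, 2), ("006", 0.30, 1), ("006", 0.30, 4),
]

def aggregate(rows, key):
    """Unweighted mean ES per group, where key(row) defines the group."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[1])
    return {group: mean(values) for group, values in groups.items()}

per_study_and_dv = aggregate(rows, key=lambda row: (row[0], row[2]))  # one ES per DV per study
per_study = aggregate(rows, key=lambda row: row[0])                   # one ES per study

for study, es in per_study.items():
    print(study, f"{es:.3f}")
```

Aggregating to one ES per study removes the dependence between effect sizes from the same sample, at the cost of losing the DV category as a moderator.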
Exercise: effect size calculation (4 method/result extracts from journals):
- Do boys have higher general (global) self-concept (self-worth) than girls?
- Decide which effect size to use (d, r, OR)
- Calculate appropriate effect sizes
Effect size literature
- Cohen, J. (1969). Statistical Power Analysis for the Behavioral Sciences, 1st Edition, Lawrence Erlbaum Associates, Hillsdale (2nd Edition, 1988).
- Cohen, J. (1994). The earth is round (p < .05), American Psychologist, 49, 997–1003.
- Gwet, K. (2001). Handbook of interrater reliability. How to estimate the level of agreement between two or multiple raters. Gaithersburg: STATAXIS Publishing.
- Huberty, C. J. (2002). A history of effect size indices. Educational and Psychological Measurement, 62, 227-240.
- McCartney, K., & Rosenthal, R. (2000). Effect size, practical importance, and social policy for children. Child Development, 71, 173-180.
- Olejnik, S., & Algina, J. (2000). Measures of effect size for comparative studies: Applications, interpretations, and limitations. Contemporary Educational Psychology, 25, 241-286.