Effect size - Department of Education
Funded through the ESRC’s Researcher
Development Initiative
Session 1.2: Introduction
Prof. Herb Marsh
Ms. Alison O’Mara
Dr. Lars-Erik Malmberg
Department of Education,
University of Oxford
Meta-analysis is an increasingly popular tool for
summarising research findings
Cited extensively in research literature
Relied upon by policymakers
Important that we understand the method, whether
we conduct or simply consume meta-analytic
research
Should be one of the topics covered in all
introductory research methodology courses
Meta-analysis: a statistical analysis of a set of
estimates of an effect (the effect sizes), with the
goal of producing an overall (summary) estimate of
the effects. Often combined with analysis of
variables that moderate/predict this effect
Systematic review: a comprehensive, critical,
structured review of studies dealing with a certain
topic. They are characterised by a scientific,
transparent approach to study retrieval and
analysis
Most meta-analyses start with a systematic review
Coding: the process of extracting the information
from the literature included in the meta-analysis.
Involves noting the characteristics of the studies in
relation to a priori variables of interest (qualitative)
Effect size: the numerical outcome to be analysed
in a meta-analysis; a summary statistic of the data
in each study included in the meta-analysis
(quantitative)
Summarise effect sizes: central tendency,
variability, relations to study characteristics
(quantitative)
The meta-analysis process:
1. Establish research question
2. Define relevant studies
3. Locate and collate studies
4. Develop code materials
5. Pilot coding; coding
6. Data entry and effect size calculation
7. Main analyses
8. Supplementary analyses
Comparison of treatment & control groups
What is the effectiveness of a reading skills program for a
treatment group compared to an inactive control group?
Pretest-posttest differences
Is there a change in motivation over time?
Correlation between two variables
What is the relation between teaching effectiveness and
research productivity?
Moderators of an outcome
Does gender moderate the effect of a peer-tutoring
program on academic achievement?
Do you wish to generalise your findings to other
studies not in the sample?
Do you have multiple outcomes per study? E.g.:
achievement in different school subjects;
5 different personality scales;
multiple criteria of success
Such questions determine the choice of meta-analytic model:
fixed effects
random effects
multilevel
Brown, S. A. (1990). Studies of educational interventions and
outcomes in diabetic adults: A meta-analysis revisited.
Patient Education and Counseling, 16, 189-215.
Need to have explicit inclusion and exclusion criteria
The broader the research domain, the more detailed
they tend to become
Refine criteria as you interact with the literature
Components of a detailed set of criteria:
distinguishing features
research respondents
key variables
research methods
cultural and linguistic range
time frame
publication types
Brown, S. A., Upchurch, S. L., & Acton, G. J. (2003). A framework for
developing a coding scheme for meta-analysis. Western Journal of
Nursing Research, 25, 205-222.
Search electronic databases (e.g., ISI,
Psychological Abstracts, Expanded Academic
ASAP, Social Sciences Index, PsycINFO, and
ERIC)
Examine the reference lists of included studies
to find other relevant studies
If including unpublished data, email researchers
in your discipline, take advantage of Listservs,
and search Dissertation Abstracts International
“motivation” OR “job satisfaction” produces
ALL articles that contain EITHER motivation OR
job satisfaction anywhere in the text
inclusive, larger yield
“motivation” AND “job satisfaction” will capture
only those subsets that have BOTH motivation
AND job satisfaction anywhere in the text
restrictive, smaller yield
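The two operators can also be combined. For example (an illustrative query, not from the original slides), ("motivation" OR "job satisfaction") AND "teachers" retrieves only articles about teachers that mention at least one of the two constructs, trading some yield for precision.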
The inclusion process usually requires several steps to
cull inappropriate studies:
1. Check abstract & title. Not relevant? DISCARD.
Relevant? Continue.
2. Check the participants and results sections. Not
relevant? DISCARD. Relevant? COLLECT.
Example from Bazzano, L. A., Reynolds, K., Holder, K. N., & He, J.
(2006). Effect of folic acid supplementation on risk of cardiovascular
diseases: A meta-analysis of randomized controlled trials. JAMA,
296, 2720-2726.
The researcher must have a thorough knowledge of
the literature.
The process typically involves (Brown et al.,
2003):
a) reviewing a random subset of studies to be
synthesized,
b) listing all relevant coding variables as they appear
during the review,
c) including these variables in the coding sheet, and
d) pilot testing the coding sheet on a separate subset
of studies.
Coded data usually fall into the following four basic
categories:
1. methodological features
Study identification code
Type of publication
Year of publication
Country
Participant characteristics
Study design (e.g., random assignment, representative
sampling)
2. substantive features
Variables of interest (e.g., theoretical framework)
3. study quality
‘Total’ measure of quality & study design
4. outcome measures - Effect size information
The code book guides the coding process
Almost like a dictionary or manual
“...each variable is theoretically and operationally
defined to facilitate intercoder and intracoder
agreement during the coding process. The
operational definition of each category should be
mutually exclusive and collectively exhaustive”
(Brown et al., 2003, p. 208).
Code Sheet
_1_   Study ID
_99_  Year of publication
_2_   Publication type (1-5)
_1_   Geographical region (1-7)
_87_  Total sample size
_41_  Total number of males
_46_  Total number of females
Code Book
Publication type (1-5)
1. Journal article
2. Book/book chapter
3. Thesis or doctoral
dissertation
4. Technical report
5. Conference paper
From Brown et al. (2003).
Code sheet = Table 1.
Code book = Table 4.
Random selection of papers coded by both
coders
Meet to compare code sheets
Where there is discrepancy, discuss to reach
agreement
Amend code materials/definitions in code book
if necessary
May need to do several rounds of piloting, each
time using different papers
Coding should ideally be done independently by 2
or more researchers to minimise errors and
subjective judgements
Ways of assessing the amount of agreement
between the raters:
Percent agreement
Cohen’s kappa coefficient
Correlation between different raters
Intraclass correlation
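As a rough illustration of the first two of these measures, here is a minimal Python sketch (the function names and the two-coder data are hypothetical, not from the slides) computing percent agreement and Cohen's kappa for categorical codes:

def percent_agreement(codes1, codes2):
    """Share of items on which the two coders assigned the same code."""
    return sum(a == b for a, b in zip(codes1, codes2)) / len(codes1)

def cohens_kappa(codes1, codes2):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(codes1)
    p_observed = percent_agreement(codes1, codes2)
    # Chance agreement: product of each coder's marginal proportions per category
    categories = set(codes1) | set(codes2)
    p_chance = sum(
        (codes1.count(c) / n) * (codes2.count(c) / n) for c in categories
    )
    return (p_observed - p_chance) / (1 - p_chance)

# Example: two coders rating publication type (1-5) for five studies
coder_a = [1, 2, 1, 4, 5]
coder_b = [1, 2, 2, 4, 5]
print(percent_agreement(coder_a, coder_b))  # 0.8
print(cohens_kappa(coder_a, coder_b))       # agreement beyond chance, ~0.74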
Lipsey & Wilson (2001) present many formulae for
calculating effect sizes from different information
However, all effect sizes need to be converted into a
common metric, typically the "natural" metric for
research in the area. E.g.:
Standardized mean difference
Odds-ratio
Correlation coefficient
Standardized mean difference
Group contrasts
Treatment groups
Naturally occurring groups
Inherently continuous construct
Odds-ratio
Group contrasts
Treatment groups
Naturally occurring groups
Inherently dichotomous construct
Correlation coefficient
Association between variables
Standardized mean difference:
$ES = \dfrac{\bar{X}_{\text{Males}} - \bar{X}_{\text{Females}}}{SD_{\text{pooled}}}$
Odds-ratio:
$ES = \dfrac{ad}{bc}$
Correlation coefficient:
$ES = r$
Effect sizes (d) and their standard errors (SE) can be
calculated from, e.g.: means and standard deviations,
correlations, p-values, F-statistics, t-statistics.
From Brown et al. (2003), Table 3.
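As a minimal Python sketch of these formulas (the function names are my own, not from the slides; the t-to-d conversion is one common variant):

import math

def standardized_mean_difference(m1, m2, sd1, sd2, n1, n2):
    """SMD: difference in group means divided by the pooled SD."""
    sd_pooled = math.sqrt(
        ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    )
    return (m1 - m2) / sd_pooled

def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table: a, b = group 1 events/non-events;
    c, d = group 2 events/non-events."""
    return (a * d) / (b * c)

def d_from_t(t, n1, n2):
    """Convert an independent-samples t statistic to d."""
    return t * math.sqrt(1 / n1 + 1 / n2)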
Includes the entire population of studies to be
considered; do not want to generalise to other
studies not included (e.g., future studies).
All of the variability between effect sizes is due to
sampling error alone. Thus, the effect sizes are only
weighted by the within-study variance.
Effect sizes are independent.
There are 2 general ways of conducting a fixed
effects meta-analysis: ANOVA & multiple regression
The analogue to the ANOVA homogeneity analysis
is appropriate for categorical variables
Looks for systematic differences between groups of
responses within a variable
Multiple regression homogeneity analysis is more
appropriate for continuous variables and/or when
there are multiple variables to be analysed
Tests the ability of groups within each variable to predict
the effect size
Can include categorical variables in multiple regression
as dummy variables. (ANOVA is a special case of
multiple regression)
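A minimal sketch of the fixed effects pooling step, assuming each study contributes one effect size and its sampling variance (names are illustrative):

def fixed_effects(effect_sizes, variances):
    """Inverse-variance weighted mean effect and the Q homogeneity statistic."""
    weights = [1 / v for v in variances]
    pooled = sum(w * es for w, es in zip(weights, effect_sizes)) / sum(weights)
    se = (1 / sum(weights)) ** 0.5
    # Q is compared to a chi-square with k - 1 degrees of freedom;
    # a significant Q suggests more heterogeneity than sampling error alone
    q = sum(w * (es - pooled) ** 2 for w, es in zip(weights, effect_sizes))
    return pooled, se, q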
Is only a sample of studies from the entire
population of studies to be considered; want to
generalise to other studies not included (including
future studies).
Variability between effect sizes is due to sampling
error plus variability in the population of effects.
Effect sizes are independent.
Variations in sampling schemes can introduce
heterogeneity into the results, that is, the presence of
more than one intercept in the solution
Heterogeneity: between-study variation in effect
estimates is greater than random (sampling)
variance
Could be due to differences in the study design,
measurement instruments used, the researcher, etc
Random effects models attempt to account for
between-study differences
If the homogeneity test is rejected (it almost always
will be), it suggests that there are larger differences
than can be explained by chance variation (at the
individual participant level). There is more than one
“population” in the set of different studies.
The random effects model helps to determine how
much of the between-study variation can be
explained by study characteristics that we have
coded.
The total variance associated with the effect sizes
has two components, one associated with
differences within each study (participant level
variation) and one between study variance
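A minimal sketch of this two-component weighting, using the DerSimonian-Laird estimator of the between-study variance (a standard choice, though not named on the slides):

def random_effects(effect_sizes, variances):
    """Random effects pooling with a DerSimonian-Laird estimate of tau^2."""
    k = len(effect_sizes)
    w = [1 / v for v in variances]
    fixed = sum(wi * es for wi, es in zip(w, effect_sizes)) / sum(w)
    q = sum(wi * (es - fixed) ** 2 for wi, es in zip(w, effect_sizes))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance, floored at 0
    # Re-weight by total variance: within-study plus between-study
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * es for wi, es in zip(w_star, effect_sizes)) / sum(w_star)
    se = (1 / sum(w_star)) ** 0.5
    return pooled, se, tau2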
Meta-analytic data is inherently hierarchical (i.e.,
effect sizes nested within studies) and has random
error that must be accounted for.
Effect sizes are not necessarily independent
Allows for multiple effect sizes per study
Level 2: study component
Publications
Level 1: outcome-level component
Effect sizes
Similar to a multiple regression equation, but
accounts for error at both the outcome (effect size)
level and the study level
Start with the intercept-only model, which
incorporates both the outcome-level and the study-level
components (analogous to the random effects
multiple regression model)
Expand model to include predictor variables, to
explain systematic variance between the study
effect sizes
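In one common notation (not from the slides), the intercept-only two-level model can be written as:

$$d_{ij} = \beta_{0j} + e_{ij}, \quad e_{ij} \sim N(0, v_{ij}) \qquad \text{(Level 1: outcomes within studies)}$$
$$\beta_{0j} = \gamma_{00} + u_{0j}, \quad u_{0j} \sim N(0, \tau^2) \qquad \text{(Level 2: between studies)}$$

Adding a study-level predictor $W_j$ then expands level 2 to $\beta_{0j} = \gamma_{00} + \gamma_{01} W_j + u_{0j}$.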
Fixed, random, or multilevel?
Generally, if more than one effect size per study is
included in sample, multilevel should be used
However, if there is little variation at the study level
and/or there are no predictors included in the
model, the results of multilevel modelling meta-analyses
are similar to those of random effects models
Do you wish to generalise your findings to other
studies not in the sample?
Yes: random effects or multilevel
No: fixed effects
Do you have multiple outcomes per study?
Yes: multilevel
No: random effects or fixed effects
Publication bias
Fail-safe N (Rosenthal, 1991)
Trim and fill procedure (Duval & Tweedie, 2000a, 2000b)
Sensitivity analysis
E.g., Vevea & Woods (2005)
Power analysis
E.g., Muncer, Craigie, & Holmes (2003)
Study quality
Quality weighting (Rosenthal, 1991)
Use of kappa statistic in determining validity of quality
filtering for meta-analysis (Sands & Murphy, 1996).
Regression with “quality” as a predictor of effect size
(see Valentine & Cooper, 2008)
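As one illustration of the first of these checks, Rosenthal's fail-safe N has a simple closed form under the Stouffer combination of one-tailed z values; a minimal sketch (function name is my own):

def fail_safe_n(z_values, z_alpha=1.645):
    """Number of unpublished null-result studies needed to raise the
    combined one-tailed p-value above .05 (Rosenthal's fail-safe N)."""
    k = len(z_values)
    return (sum(z_values) / z_alpha) ** 2 - k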
Brown, S. A., Upchurch, S. L., & Acton, G. J. (2003). A framework for developing a
coding scheme for meta-analysis. Western Journal of Nursing Research, 25, 205-222.
Duval, S., & Tweedie, R. (2000a). A nonparametric "trim and fill" method of
accounting for publication bias in meta-analysis. Journal of the American Statistical
Association, 95, 89-98.
Duval, S., & Tweedie, R. (2000b). Trim and fill: A simple funnel-plot-based method of
testing and adjusting for publication bias in meta-analysis. Biometrics, 56, 455-463.
Lipsey, M. W., & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA:
Sage Publications.
Muncer, S. J., Craigie, M., & Holmes, J. (2003). Meta-analysis and power: Some
suggestions for the use of power in research synthesis. Understanding Statistics, 2, 1-12.
Rosenthal, R. (1991). Quality-weighting of studies in meta-analytic research.
Psychotherapy Research, 1, 25-28.
Sands, M. L., & Murphy, J. R. (1996). Use of kappa statistic in determining validity of
quality filtering for meta-analysis: A case study of the health effects of
electromagnetic radiation. Journal of Clinical Epidemiology, 49, 1045-1051.
Valentine, J. C., & Cooper, H. M. (2008). A systematic and transparent approach for
assessing the methodological quality of intervention effectiveness research: The
Study Design and Implementation Assessment Device (Study DIAD). Psychological
Methods, 13, 130-149.
Vevea, J. L., & Woods, C. M. (2005). Publication bias in research synthesis: Sensitivity
analysis using a priori weight functions. Psychological Methods, 10, 428-443.