Transcript Document

Reading and interpreting quantitative intervention research syntheses: an introduction
Steve Higgins, Durham University
Robert Coe, Durham University
Mark Newman, EPPI Centre, IoE, London University
James Thomas, EPPI Centre, IoE, London University
Carole Torgerson, IEE, York University
Part 2
Acknowledgements
• This presentation is an outcome of the work of the ESRC-funded Researcher Development Initiative: “Training in the Quantitative synthesis of Intervention Research Findings in Education and Social Sciences”, which ran from 2008 to 2011.
• The training was designed by Steve Higgins and Rob Coe (Durham University), Carole Torgerson (Birmingham University) and Mark Newman and James Thomas, Institute of Education, London University.
• The team acknowledges the support of Mark Lipsey, David Wilson and Herb Marsh in the preparation of some of the materials, particularly Lipsey and Wilson’s (2001) “Practical Meta-analysis” and David Wilson’s slides at http://mason.gmu.edu/~dwilsonb/ma.html (accessed 9/3/11).
• The materials are offered to the wider academic and educational community under a Creative Commons licence: Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
• You should only use the materials for educational, not-for-profit use and you should acknowledge the source in any use.
Session 2
1.30 pm  Reading and interpreting a meta-analysis; overview of challenges to effective meta-analysis
3.00 pm  Break
3.15 pm  Summary, conclusions and evaluation
4.00 pm  Finish
Recap
• Should be conducted as part of a systematic
(or at least transparent) review
• Meta-analysis is the statistical combination of
research study findings to answer a specific
question
• Uses a common metric - effect size - to
aggregate and explore the findings across
studies
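As a minimal illustration of the common metric, the Python sketch below (with invented group statistics, not taken from any study in this session) computes a standardised mean difference from group means, standard deviations and sample sizes, together with Hedges' small-sample correction.

import math

def standardised_mean_difference(m_t, m_c, sd_t, sd_c, n_t, n_c):
    """Cohen's d: difference in group means divided by the pooled SD."""
    pooled_sd = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2)
                          / (n_t + n_c - 2))
    d = (m_t - m_c) / pooled_sd
    g = d * (1 - 3 / (4 * (n_t + n_c) - 9))  # Hedges' g small-sample correction
    return d, g

# Hypothetical study: intervention mean 105 (SD 15, n 30) vs control mean 100 (SD 14, n 30)
d, g = standardised_mean_difference(105, 100, 15, 14, 30, 30)
print(f"d = {d:.2f}, g = {g:.2f}")  # roughly d = 0.34, g = 0.34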
Stages of synthesis
• What is the question? (theories and assumptions in the review question)
• What data are available? (by addressing the review question according to the conceptual framework)
• What are the patterns in the data? (including study, intervention, outcomes and participant characteristics)
• How does integrating the data answer the question? (to address the question, including theory testing or development)
• How robust is the synthesis? (for quality, sensitivity, coherence and relevance)
• What is the result?
• What does the result mean? (conclusions)
• What new research questions emerge? Can the conceptual framework be developed?

Cooper, H.M. (1982) Scientific Guidelines for Conducting Integrative Research Reviews. Review of Educational Research, 52, 291.
See also: Popay et al. (2006) Guidance on the Conduct of Narrative Synthesis in Systematic Reviews. Lancaster: Institute for Health Research, Lancaster University.
http://www.lancs.ac.uk/fass/projects/nssr/research.htm
Procedures for a meta-analysis
• Key question
• Search/ retrieval
strategy
• Inclusion/ exclusion
criteria
• Coding
• Analysis
• Synthesis
What is the question?
What data are available?
What patterns are in the data?
How does integrating the data
answer the question?
How robust is the synthesis?
What is the result?
What are the conclusions?
Overview
• Interpreting a meta-analysis
– Forest plots and other forms of data presentation
• Issues in meta-analysis
– Research designs and quality
– Heterogeneity
– Models for pooling the results
Forest plots
• Effective way of presenting results
– Studies, effect sizes, confidence intervals
– Provides an overview of consistency of
effects
– Summarises an overall effect (with
confidence interval)
• Useful visual model of a meta-analysis
Anatomy of a forest plot
Labelled elements of the plot:
• Studies (one row per study)
• N of study
• Study effect size (with C.I.)
• Confidence interval (C.I.)
• Weighting of study in the meta-analysis
• Line of no effect
• Pooled effect size
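To show how these elements fit together, here is a small matplotlib sketch of a forest plot using entirely hypothetical studies and weights: one row per study, a square sized by its weight, a horizontal line for its confidence interval, a dashed line of no effect and a diamond for the pooled effect.

import matplotlib.pyplot as plt

# Hypothetical studies: (name, effect size, lower CI, upper CI, % weight)
studies = [
    ("Study A", 0.30, -0.10, 0.70, 25.0),
    ("Study B", 0.55,  0.20, 0.90, 40.0),
    ("Study C", 0.10, -0.40, 0.60, 35.0),
]
pooled = (0.34, 0.09, 0.59)  # illustrative pooled effect with its 95% CI

fig, ax = plt.subplots()
for i, (name, es, lo, hi, weight) in enumerate(studies):
    y = len(studies) - i
    ax.plot([lo, hi], [y, y], color="black")                   # study confidence interval
    ax.plot(es, y, "s", color="black", markersize=weight / 4)  # square sized by weight
    ax.text(hi + 0.05, y, name, va="center")
ax.plot([pooled[1], pooled[2]], [0, 0], color="black")
ax.plot(pooled[0], 0, "D", color="black", markersize=10)       # diamond = pooled effect
ax.axvline(0, linestyle="--", color="grey")                    # line of no effect
ax.set_xlabel("Standardised mean difference")
ax.set_yticks([])
plt.show()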
Exercise
1. What is the effect size in the Bletchman et al. study?
2. Is the effect size in the Kelley et al. study bigger or smaller than in the Patrick & Marsh study?
3. How many subjects were in the Patrick and Marsh study?
4. What is the 95% confidence interval of the pooled effect size?
5. What is the weighting given to the Bletchman study in the meta-analysis?
6. How does the confidence interval differ between the Kelley et al. study and the other two?
A Systematic Review of the
Research Literature on the Use of
Phonics in the Teaching of Reading
and Spelling
Torgerson, Brooks and Hall, 2006
The Department for Education and Skills (DfES) commissioned the Universities of York and Sheffield to conduct a systematic review of experimental research on the use of phonics instruction in the teaching of reading and spelling.
This review is based on evidence from
randomised controlled trials (RCTs).
Interpreting a forest plot
Have a look at the forest plot from the meta-analysis of phonics interventions on the handout. These are RCTs with a separate analysis for lower attaining (Cluster 0) and normally attaining pupils (Cluster 1).
– What do you notice?
– Work in a pair or small group to ‘read’ it to each other
– What questions can you raise about the meta-analysis?
Forest plot

Study          Standardised mean difference (95% CI)   % Weight
Ability==0
  Greaney        0.30 (-0.36, 0.95)                      6.8
  Lovett89       0.22 (-0.14, 0.57)                     23.1
  Lovett90      -0.20 (-0.85, 0.46)                      6.9
  Martinussen    0.46 (-0.30, 1.21)                      5.2
  O'Connor       0.57 (-0.59, 1.73)                      2.2
  Torgesen99     0.07 (-0.34, 0.48)                     17.3
  Torgesen01    -0.31 (-0.87, 0.24)                      9.5
  Umbach         2.77 ( 1.77, 3.77)                      2.9
  Subtotal       0.21 ( 0.01, 0.41)                     73.9
Ability==1
  Haskell        0.07 (-0.73, 0.87)                      4.6
  Johnston       0.97 ( 0.43, 1.51)                     10.1
  Leach          0.84 (-0.08, 1.75)                      3.5
  Skailand      -0.17 (-0.78, 0.44)                      8.0
  Subtotal       0.45 ( 0.11, 0.78)                     26.1
Overall          0.27 ( 0.10, 0.45)                    100.0

[Plot axis: standardised mean difference from -3.77 to 3.77; effects left of zero favour the control, effects right of zero favour phonics]
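As a rough arithmetic check (an illustration only, not the review's own analysis), the sketch below treats the printed % weights as fixed-effect weights and recomputes the subtotals and overall effect as weighted averages; the results agree with the plot to within rounding.

# Effect sizes and % weights read from the forest plot above
ability_0 = [  # lower attaining cluster
    (0.30, 6.8), (0.22, 23.1), (-0.20, 6.9), (0.46, 5.2),
    (0.57, 2.2), (0.07, 17.3), (-0.31, 9.5), (2.77, 2.9),
]
ability_1 = [(0.07, 4.6), (0.97, 10.1), (0.84, 3.5), (-0.17, 8.0)]  # normally attaining

def weighted_mean(pairs):
    """Weighted average of effect sizes, taking the weights as given."""
    total = sum(w for _, w in pairs)
    return sum(es * w for es, w in pairs) / total

print(round(weighted_mean(ability_0), 2))              # about 0.21 (subtotal, Ability==0)
print(round(weighted_mean(ability_1), 2))              # about 0.45 (subtotal, Ability==1)
print(round(weighted_mean(ability_0 + ability_1), 2))  # about 0.27 (overall)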
Issues to consider
Is it reasonable to combine the results of the individual studies?
i) Study design/quality
ii) Are the studies too different (heterogeneity)?
• Methodological heterogeneity
• Educational heterogeneity
• Statistical heterogeneity
Assessing between-study heterogeneity
• When effect sizes differ only by an amount consistent with chance error, the effect size estimates are considered to be homogeneous (estimating a single true effect).
• When the variability in effect sizes is greater than expected by chance, the effects are considered to be heterogeneous.
• Presence of heterogeneity affects the
process of the meta-analysis
Methodological quality
• Traditional reviews privilege methodological
rigour
– Low quality studies have higher effect sizes (Hattie, Biggs & Purdie, 1996)
– No difference (Marzano, 1998)
– High quality studies, higher effect sizes (Lipsey &
Wilson, 1993)
• Depends on your definition of quality
– Assessing quality
– Dimensions of quality
– Exploring its impact
Methodological quality
• What about ‘low quality’ studies?
– All studies are likely to have weaknesses
(methodological quality is on a range or
continuum)
– Exclusivity restricts the scope and scale of the
analysis and generalizability
– Inclusivity may weaken confidence in the findings
– Some methodological quality is in the “eye of the beholder”
– Needs a balance appropriate to the key research
question
Which designs?
• RCTs only?
• RCTs plus rigorously controlled
experimental and quasi-experimental
designs?
• All RCTs, and experimental designs?
• All pre-post comparisons?
Task: Diamond Ranking
• Have a look at the different descriptions of research
  – Which do you think would be most appropriate for a meta-analysis? Which would be the least appropriate?
  – Can you place or rank the others?
[Diamond ranking template, from “Most important” at the top to “Least important” at the bottom]
Methodological heterogeneity
• Study design
• Sample characteristics
• Assessment (measures, timing)
Educational heterogeneity
• ‘Clinical’ or ‘pedagogical’ heterogeneity
• Systematic variation in response to the
intervention
– Teacher level effects
– Pupil level effects
Statistical heterogeneity
• Due to chance
• Unexplainable
Statistical methods to identify
heterogeneity
• Presence
– Q statistic (Cooper & Hedges, 1994)
• Significance level (p-value)
– 2
– 2
• Extent
– I² (Higgins & Thompson, 2002)
  • If it exceeds 50%, it may be advisable not to combine the studies
All have low power with a small number of studies (Huedo-Medina et al., 2006)
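A minimal Python sketch of these statistics, with invented effect sizes and variances: Cochran's Q is computed from inverse-variance weights, its p-value from the chi-squared distribution, and I² as the percentage of variation beyond what chance alone would produce.

import numpy as np
from scipy import stats

def heterogeneity(effect_sizes, variances):
    """Return Cochran's Q, its p-value and I-squared (as a %) for a set of studies."""
    es = np.asarray(effect_sizes, dtype=float)
    v = np.asarray(variances, dtype=float)
    w = 1.0 / v                                  # inverse-variance weights
    pooled = np.sum(w * es) / np.sum(w)          # fixed effect estimate
    q = np.sum(w * (es - pooled) ** 2)           # Cochran's Q
    df = len(es) - 1
    p = stats.chi2.sf(q, df)                     # Q follows chi-squared under homogeneity
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, p, i2

# Hypothetical effects and variances for five studies
q, p, i2 = heterogeneity([0.2, 0.5, 0.1, 0.8, 0.3], [0.04, 0.06, 0.05, 0.09, 0.04])
print(f"Q = {q:.2f}, p = {p:.3f}, I2 = {i2:.0f}%")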
Exploring heterogeneity
• In a meta-analysis, exploring heterogeneity of
effect can be as important as reporting
averages
• Exploring to what extent the variation can be explained by factors in the coding of studies (age, gender, duration of intervention, etc.)
• Forming sub-groups with greater homogeneity (see the sketch below)
• Identifying the extent of the variation through further analysis
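A small sketch of the sub-grouping step, assuming each study has been coded on a hypothetical moderator (here school phase) and pooling each sub-group separately with inverse-variance weights; the moderator values and numbers are invented.

from collections import defaultdict

# Hypothetical coded studies: (moderator value, effect size, variance)
studies = [
    ("primary",   0.45, 0.05), ("primary",   0.30, 0.04),
    ("secondary", 0.10, 0.06), ("secondary", 0.05, 0.03),
]

def pool(group):
    """Inverse-variance pooled effect and variance for one sub-group."""
    weights = [1.0 / v for _, _, v in group]
    effects = [es for _, es, _ in group]
    pooled = sum(w * es for w, es in zip(weights, effects)) / sum(weights)
    return pooled, 1.0 / sum(weights)

groups = defaultdict(list)
for study in studies:
    groups[study[0]].append(study)

for name, group in groups.items():
    pooled, var = pool(group)
    print(f"{name}: pooled effect = {pooled:.2f} (variance {var:.3f})")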
Coding for exploration
• Factors which may relate to variation
– The intervention
• E.g. duration, intensity, design, implementation
– The sample
• E.g. age, gender, ethnicity, particular needs
– The research
• E.g. design (RCT, quasi-experimental), quality,
tests/outcomes, comparison group
Pooling the results
• In a meta-analysis, the effects found across studies are combined or ‘pooled’ to produce a weighted average effect of all the studies: the summary effect.
• Each study is weighted according to some measure
of its importance.
• In most meta-analyses, this is achieved by giving a
weight to each study in inverse proportion to the
variance of its effect.
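A minimal sketch of inverse-variance (fixed effect) pooling with hypothetical effects and variances: each weight is the reciprocal of the study's variance, and the standard error of the pooled effect is the square root of one over the summed weights.

import math

def fixed_effect_pool(effects, variances):
    """Inverse-variance pooling: weight each study by 1 / variance."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * es for w, es in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))              # SE of the pooled effect
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)   # 95% confidence interval
    return pooled, se, ci

# Hypothetical studies: the larger (low-variance) study dominates the pooled result
pooled, se, ci = fixed_effect_pool([0.2, 0.5, 0.35], [0.02, 0.08, 0.04])
print(f"pooled = {pooled:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")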
Fixed effect model
• The difference between the studies is
due to chance
– Observed study effect = Fixed effect + error
Fixed effect model
Each study is seen as
being a sample from
a distribution of
studies, all estimating
the same overall
effect, but differing
due to random error
Random effects model
Assumes there are two components of variation:
1. Due to differences within the studies (e.g.
different design, different populations,
variations in the intervention, different
implementation, etc.)
2. Due to sampling error
Random effects model
There are two separable effects that can be
measured
1. The effect that each study is estimating
2. The common effect that all studies are
estimating
Observed study effect = study specific (random) effect + error
Random effects model
• Each study is seen
as representing the
mean of a
distribution of
studies
• There is still a
resultant overall
effect size
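One common way of putting this into practice is the DerSimonian-Laird approach: estimate the between-study variance (tau²) from Q, add it to each study's own variance, and then weight as before. The sketch below uses hypothetical data and illustrates just one estimator of tau², not necessarily the one used in any particular review.

import math

def dersimonian_laird(effects, variances):
    """Random effects pooling with the DerSimonian-Laird estimate of tau-squared."""
    w = [1.0 / v for v in variances]
    fixed = sum(wi * es for wi, es in zip(w, effects)) / sum(w)
    q = sum(wi * (es - fixed) ** 2 for wi, es in zip(w, effects))   # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                    # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]   # random effects weights
    pooled = sum(wi * es for wi, es in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2

# Hypothetical studies: with tau2 > 0 the weights are spread more evenly across studies
pooled, se, tau2 = dersimonian_laird([0.1, 0.6, 0.4], [0.02, 0.08, 0.04])
print(f"pooled = {pooled:.2f}, SE = {se:.2f}, tau2 = {tau2:.3f}")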
Which model?
• “Random effects” model assumes a different
underlying effect for each study.
• This model gives relatively more weight to smaller studies and produces wider confidence intervals than fixed effect models.
• The use of this model is recommended if
there is heterogeneity between study results.
• Usually recommended in education
[Annotated meta-analysis output showing: study effect sizes (with C.I.), extent of heterogeneity (on a % scale), degrees of freedom and significance level]
Other forms of data
presentation
• Box plots
• Mean and range
Presenting Results
• Stem and leaf plot
[Stem and leaf plot of the study effect sizes, with stems running from -.4 to .9]
Interpreting meta-analysis results
• Conceptual
• Scope and scale of searches
• Robustness of evidence
• Wider applicability
What is the question?
How does integrating the data
answer the question?
How robust is the synthesis?
What does the result mean?
How many studies?
• How many studies and of what quality
would be needed to make a ‘strong
recommendation’ or for ‘strong evidence of
effect’?
• On what scale?
– How many participants/ sites, 350, 500?
• Is there an empirical answer?
Issues and challenges in
meta-analysis
• Conceptual
– Comparability
– Reductionist
– Atheoretical
• Technical
– Heterogeneity
– Methodological quality
– Publication bias
Comparability
• Apples and oranges
  – Same test
  – Different measures of the same construct
  – Different measures of different constructs
  – What question are you trying to answer?
  – How strong is the evidence for this?
“Of course it mixes apples and oranges; in the study of fruit,
nothing else is sensible; comparing apples and oranges is
the only endeavor worthy of true scientists; comparing apples
to apples is trivial” (Glass, 2000).
Reductionist or ‘flat earth’
critique
The “flat earth” criticism is based on Lee Cronbach’s
assertion that a meta-analysis looks at the “big picture”
and provides only a crude average.
According to Cronbach:
“… some of our colleagues are beginning to sound like a Flat Earth Society.
They tell us that the world is essentially simple: most social phenomena are
adequately described by linear relations; one-parameter scaling can
discover coherent variables independent of culture and population; and
inconsistencies among studies of the same kind will vanish if we but
amalgamate a sufficient number of studies…The Flat Earth folk seek to
bury any complex hypothesis with an empirical bulldozer…” (Cronbach,
1982, in Glass, 2000).
Oversimplification - the answer is .42?
Empirical … so not
theoretical?
• What is your starting point?
• Conceptual/ theoretical critique
– Marzano, 1998
– Hattie, 2008
– Sipe and Curlette, 1997
• Theory testing
• Theory generating
Remaining technical issues
• Interventions
• Publication bias
• (Methodological quality)
• (Homogeneity/heterogeneity)
Interventions
• “Super-realisation bias” (Cronbach et al., 1980)
– Small-scale interventions tend to get larger
effects
– Enthusiasm, attention to detail, quality of
personal relationships
Publication bias
• The ‘file drawer problem’
– Statistically significant (positive) findings
– Smaller studies need larger effect size to reach
significance
– Large studies tend to get smaller effect sizes
• Replications difficult to get published
• Sources of funding
Dealing with publication bias
• Trim and fill techniques
• A ‘funnel plot’ is sometimes used to explore this: a scatterplot of the effects from individual studies (horizontal axis) against a measure of study size (vertical axis), as sketched below
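A minimal matplotlib sketch of a funnel plot with invented results; here the vertical axis uses the standard error as the measure of study size, inverted so that larger, more precise studies sit at the top. Marked asymmetry (for example, a missing cluster of small studies with small or negative effects) can suggest publication bias.

import matplotlib.pyplot as plt

# Hypothetical study results: (effect size, standard error)
results = [(0.80, 0.40), (0.60, 0.35), (0.50, 0.30), (0.45, 0.25),
           (0.35, 0.15), (0.30, 0.10), (0.28, 0.08), (0.25, 0.05)]

effects = [es for es, _ in results]
errors = [se for _, se in results]

fig, ax = plt.subplots()
ax.scatter(effects, errors)
ax.invert_yaxis()                 # precise (large) studies at the top of the funnel
ax.set_xlabel("Effect size")
ax.set_ylabel("Standard error")
ax.set_title("Funnel plot")
plt.show()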
Dealing with heterogeneity
• Tackle variation in effect sizes
• Investigate to find clusters (moderator
variables)
• Explore against coded variables
• Evaluate whether a pooled result is an
appropriate answer to the question
.42?
Summary
• “Replicable and defensible” method for
synthesizing findings across studies (Lipsey &
Wilson, 2001)
• Identifies gaps in the literature, providing a
sound basis for further research
• Indicates the need for replication in education
• Facilitates identification of patterns in the
accumulating results of individual evaluations
• Provides a frame for theoretical critique
Some useful websites
EPPI, Institute of Education, London
http://eppi.ioe.ac.uk/
The Campbell Collaboration
http://www.campbellcollaboration.org/
Best Evidence Encyclopedia, Johns Hopkins
http://www.bestevidence.org/
Best Evidence Synthesis (BES), NZ
http://www.educationcounts.govt.nz/themes/BES
Institute for Effective Education (York)
http://www.york.ac.uk/iee/research/#reviews
Google Scholar
http://scholar.google.com/
Keyword(s) + meta-analysis
Evaluation
• Please complete the evaluation sheet
• Feedback will be used:
  – to inform other sessions / revise materials for independent use
  – for ESRC evaluation
Acknowledgements
• This presentation is an outcome of the work of the ESRC-funded Researcher Development Initiative: “Training in the Quantitative synthesis of Intervention Research Findings in Education and Social Sciences”, which ran from 2008 to 2011.
• The training was designed by Steve Higgins and Rob Coe (Durham University), Carole Torgerson (Birmingham University) and Mark Newman and James Thomas, Institute of Education, London University.
• The team acknowledges the support of Mark Lipsey, David Wilson and Herb Marsh in the preparation of some of the materials, particularly Lipsey and Wilson’s (2001) “Practical Meta-analysis” and David Wilson’s slides at http://mason.gmu.edu/~dwilsonb/ma.html (accessed 9/3/11).
• The materials are offered to the wider academic and educational community under a Creative Commons licence: Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
• You should only use the materials for educational, not-for-profit use and you should acknowledge the source in any use.