C2 Training: May 9 – 10, 2011
Introduction to Systematic Reviews
The Campbell Collaboration
www.campbellcollaboration.org
Part 1: The science of research synthesis
• Summarize existing empirical research to
  – Inform policy and practice
  – Provide directions for further research
• Use empirical evidence about the reliability and validity of review methods
  – Cochrane Collaboration (methodological reviews), Campbell Collaboration
C2 Training Materials – Oslo – May 2011
www.campbellcollaboration.org
Evidence for practice and policy
Adapted from: Gibbs (2003), Davies (2004)
Efficacy and Effectiveness
• Much attention paid to these issues
  – Not more important than other topics, but effects matter
• Much room for improvement in how we analyze, synthesize, and understand treatment effects
  – Even though many reviews deal with these topics
The problem: Studies pile up
• “What can you build with thousands of bricks?” (Lipsey, 1997)
• Many studies are conducted on the same topic
• Which one(s) do we use? How do we use them?
Rationale for Research Synthesis
Combining the results of multiple studies:
1. Provides more compelling evidence than the results of any single study
  • Single studies can have undue influence on practice and policy
  • We don’t use single-subject (N=1) designs to assess public opinion, and we shouldn’t rely on single studies to answer important questions (e.g., about treatment effects)
Rationale for Research Syntheses (cont.)
2. Provides new opportunities to investigate what works best for whom under what conditions
  • Why results may vary across studies that differ in
    – Research designs
    – Sample characteristics (populations/problems)
    – Intervention/implementation
    – Comparison conditions
    – Measures
    – Geographic/cultural context/setting
  • Using analyses that capitalize on natural variations across studies
What can we build with thousands of bricks?
How do we build evidence?
• What are our blueprints?
• What are the raw materials?
• What methods are used to combine results across studies?
Blueprints
• Plans for the review
• Reviews vary in amount of planning, transparency, and rigor
• Three approaches:
  – Traditional, narrative reviews (still very common in the social and behavioral sciences)
  – Systematic reviews
  – Meta-analysis
Traditional reviews
• Convenience samples of published studies
• Narrative description of studies
• Cognitive algebra or “vote counting” to synthesize results
  – Relies on statistical significance in primary studies, which may be “underpowered” (too small or too weak to detect effects)
• Decision rules are not transparent
• Vulnerable to many sources of bias…
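The “underpowered studies” problem behind vote counting can be made concrete with a small simulation (a sketch with hypothetical numbers: a true effect of d = 0.3, 30 participants per group, and a known SD of 1):

```python
import math
import random

def simulate_vote_count(true_effect=0.3, n_per_group=30, n_studies=1000, seed=1):
    """Simulate many two-group studies of a real effect (in SD units).

    Each study estimates the mean difference; with known SD = 1 the
    standard error is sqrt(2/n), and a study is "significant" when
    |z| > 1.96.  Returns the fraction of significant studies.
    """
    rng = random.Random(seed)
    se = math.sqrt(2.0 / n_per_group)
    wins = 0
    for _ in range(n_studies):
        estimate = rng.gauss(true_effect, se)  # sampling distribution of the difference
        if abs(estimate) / se > 1.96:
            wins += 1
    return wins / n_studies

frac = simulate_vote_count()
print(f"Share of studies reaching p < .05: {frac:.0%}")
# With d = 0.3 and n = 30 per group, power is only about 20%, so a
# vote count concludes "doesn't work" even though every simulated
# study samples from a population where the effect is real.
```

Most studies of a real but modest effect fail to reach significance, so counting significant vs. non-significant votes against an intervention that actually works.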
Publication bias
• Studies with statistically significant, positive results are approximately three times more likely to be published than similar studies with null or negative results (Song et al., 2009, inception cohort)
  – i.e., the likelihood of publication is related to the direction and significance of results, net of the influence of other variables
  – (Dickersin, 2005; Scherer et al., 2004; Song et al., 2009; Torgerson, 2006)
• Sources of publication bias are complex
  – Investigators are less likely to submit null results for conference presentations (Song et al., 2009) and publication (Dickersin, 2005; Song et al., 2009)
  – Are peer reviewers and editors less likely to accept and publish null results? (Mahoney, 1977 vs. Song et al., 2009)
Dissemination biases
• Studies with significant results are
  – Published faster (Hopewell et al., 2001)
  – Cited and reprinted more often (Egger & Smith)
  – More likely to be published in English than in other languages (Egger, Zellweger-Zahner et al.)
• Easier to locate (esp. in English)
Outcome reporting bias
• Within studies with mixed results, significant results are more likely to be
  – reported (mentioned at all)
  – fully reported (i.e., data provided)
  – (Chan et al., 2004, 2005; Williamson et al., 2006)
• See also a recent article in the New York Review of Books on Tamiflu
Confirmation bias
• Tendency to seek and accept information that confirms prior expectations (hypotheses) and to ignore evidence to the contrary
  – (Bacon, 1621/1960; Watson, 1960, 1968; Mahoney, 1977; Fugelsang et al., 1994; Nickerson, 1998; Schrag, 1999)
• Allegiance bias
  – Researchers’ preferences predict results (Luborsky et al., 1999)
Other sources of bias
• Selection bias
  – Trivial properties of studies or reports affect recall and evaluation of information
  – Memorable titles (Bushman & Wells, 2001)
Problems
• Publication and reporting biases are cumulative (Altman, 2006)
  – Tend to inflate estimates of effects
  – Serve to maintain orthodoxy (popular theories/treatments)
• These biases are ubiquitous, but often ignored
Better blueprints: Systematic reviews
• Aim to minimize bias and error in the review process
• Develop & follow a pre-determined plan (protocol)
• Use transparent (well-documented, replicable) procedures to locate, analyze, and synthesize results of previous studies
Systematic reviews (SRs)
Steps to reduce bias and error:
• Set explicit inclusion/exclusion criteria
• Develop and document strategies for locating all relevant studies (regardless of publication status)
• Inter-rater agreement (reliability) on key decisions, data extraction, coding
• Formal study quality assessment (risk of bias)
• Meta-analysis (when possible) to synthesize results across studies
Meta-analysis (MA)
Set of statistical procedures used to assess
• Averages across studies
• Variations across studies
• Potential sources of variation (moderators)
• Risk of bias (e.g., tests for publication & small-sample bias)
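The core machinery behind these procedures can be sketched in a few lines. Below is a minimal DerSimonian-Laird random-effects synthesis (the standard inverse-variance approach used in Cochrane and Campbell meta-analyses); the effect sizes and variances in the usage line are hypothetical, chosen only for illustration:

```python
import math

def random_effects(effects, variances):
    """DerSimonian-Laird random-effects meta-analysis.

    effects   -- per-study effect sizes (e.g., log odds ratios or SMDs)
    variances -- their within-study sampling variances
    Returns (pooled effect, (ci_low, ci_high), tau2, i2).
    """
    w = [1.0 / v for v in variances]                   # fixed-effect (inverse-variance) weights
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                      # between-study variance
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0      # share of variation beyond chance
    w_re = [1.0 / (v + tau2) for v in variances]       # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2, i2

# Hypothetical standardized mean differences and variances from three studies:
pooled, ci, tau2, i2 = random_effects([0.25, -0.10, 0.40], [0.04, 0.09, 0.06])
print(f"pooled = {pooled:.2f}, 95% CI = [{ci[0]:.2f}, {ci[1]:.2f}], tau2 = {tau2:.2f}, I2 = {i2:.0%}")
```

The same function pools averages, quantifies variation (tau², I²), and supplies the pooled estimate and confidence interval reported in forest plots.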
• Systematic reviews don’t always include meta-analysis
  – Might include narrative synthesis (or no synthesis)
  – Can include multiple meta-analyses
• Meta-analyses are not always based on systematic reviews
  – Many use convenience samples of published studies
  – Vulnerable to publication and dissemination biases
Some “systematic reviews” aren’t
• Evidence-based standards for SRs & MA
  – based on methodological research (Cochrane Library)
• Standards for the conduct of SRs
  – developed by the Cochrane and Campbell Collaborations (Higgins & Green, 2009)
• Standards for reporting SRs & MA
  – PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses; Moher et al., 2009)
• Standards not followed by US Evidence-based Practice Centers, most peer-reviewed journals, etc.
Quality of raw materials matters
• High-quality materials (studies) are needed to produce a strong, reliable product
• The “best” materials depend on what we are building (aims)
What are studies made of?
Multiple concepts and measures
Building evidence
• A simple example: one study, 30 outcome measures
• What did investigators make of results?
• What did reviewers make of results?
• (Littell, 2008)
An Example: (Brunk et al. 1987)
Parent training vs Multisystemic Therapy
• 43 families of abused/neglected children randomly assigned to parent training (PT) groups or Multisystemic Therapy (MST)
• 33 of 43 families completed treatment and provided data on outcomes immediately after treatment
• 30 outcomes (scales and subscales)
Results expected by chance (30 outcomes)
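With 30 outcomes each tested at alpha = .05, some “significant” findings are expected even if the two treatments are truly equivalent. A quick binomial check (a sketch; it assumes, unrealistically, that the 30 tests are independent):

```python
import math

alpha, k = 0.05, 30  # significance level, number of outcome tests

def binom_pmf(n, r, p):
    """Probability of exactly r successes in n independent trials."""
    return math.comb(n, r) * p**r * (1 - p)**(n - r)

expected = k * alpha                      # mean count of false positives
p_at_least_one = 1 - (1 - alpha)**k       # chance of >= 1 "significant" result
p_two_or_more = 1 - binom_pmf(k, 0, alpha) - binom_pmf(k, 1, alpha)

print(f"Expected significant results by chance: {expected:.1f}")
print(f"P(at least one): {p_at_least_one:.2f}")   # ~0.79
print(f"P(two or more):  {p_two_or_more:.2f}")    # ~0.45
```

By this benchmark, the 7 significant results Brunk et al. obtained stand out against the roughly 1.5 expected by chance, though correlated outcomes make this only a rough comparison.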
Results obtained (Brunk et al., 1987)
Results obtained (Brunk et al. 1987): Parent Training vs Multisystemic Therapy
[chart: outcome data]
What did the investigators make of these results?
• Data provided on
  – all (7) statistically significant results
  – 12 of 22 non-significant results
• Outcome reporting bias
What did the investigators make of these results?
Abstract:
• Both groups showed decreased psychiatric symptoms, reduced stress, and reduced severity of identified problems.
• MST was more effective than PT at restructuring parent-child relations.
• PT was more effective than MST at reducing identified social problems.
What did the investigators build?
A balanced report (with some missing data)
What did the published reviews build?
“Parents in both groups reported decreases in psychiatric [symptoms]
and reduced overall stress....both groups demonstrated decreases in the
severity of the identified problems....[MST] improved [parent-child]
interactions, implying a decreased risk for maltreatment of children in the
MST condition” (p. 293).
Summary: Published reviews describing Brunk et al.
• Most reviews used a single phrase to characterize the results of this study, highlighting advantages of one approach (MST)
• Ignoring valuable information on the relative advantages, disadvantages, and equivalent results of the different approaches
Reviews include multiple studies
How do reviewers add them up? (synthesize results)
What methods do reviewers use?
• Analysis of reviews of research on the effects of MST published after 1996 (Littell, 2008)
  – 86+ reviews (more reviews than studies!)
  – Assessed 66 reviews
  – Many “lite” reviews (rely on other reviews)
  – 37 reviews cited one or more primary studies
What methods do reviewers use?
• 37 reviews cited one or more primary studies
  – Most were traditional, narrative summaries of convenience samples of published reports (Littell, 2008)
  – Most conclude that MST “works” (is consistently more effective than alternatives)
  – Some conclude that MST is effective across problems, populations, and settings
    • citing Brunk et al. (1987) [only] as the evidence for effects in cases of child abuse and neglect (Burns et al., 2000; Kazdin, 2003; Kazdin & Weisz, 1998; Lansverk, 2007)
Better ways to build evidence
• Using the science of research synthesis
• An example…
A Cochrane/Campbell review
• Multisystemic Therapy for social, emotional, and behavioral problems in youth aged 10-17 (Littell, Popa, & Forsythe, 2004, 2005)
  – Protocol published in 2004; review published in 2005 in the Cochrane Library and the Campbell Library
  – Series of articles in Children and Youth Services Review (includes a debate with MST developers)
  – Update underway now
Campbell/Cochrane review
• Search for relevant studies regardless of publication status
  – Information retrieval specialists
  – Multiple strategies for locating grey literature
• Search for missing data from published studies
  – Contacts with investigators
• Formal data extraction & coding
  – Reliability checks
• Formal study quality assessment
  – Identified methodological problems that had not been mentioned in the literature
  – Ranked studies by methodological quality
• Separate syntheses (meta-analyses) for conceptually distinct outcomes
Out-of-home placement (MST treatment vs. control; odds ratios, IV, random effects, 95% CI)

1.4.1 Incarceration
  01 Leschied 2002:   treatment 70/211, control 63/198, weight 31.1%, OR 1.06 [0.70, 1.61]
  04 Henggeler 1997:  treatment 31/82,  control 37/73,  weight 26.4%, OR 0.59 [0.31, 1.12]
  05 Henggeler 1999a: treatment 19/58,  control 16/60,  weight 23.2%, OR 1.34 [0.61, 2.96]
  06 Henggeler 1992:  treatment 9/43,   control 28/41,  weight 19.4%, OR 0.12 [0.05, 0.33]
  Subtotal (95% CI):  treatment 129/394, control 144/372, 100.0%, OR 0.61 [0.27, 1.39]
  Heterogeneity: Tau² = 0.57; Chi² = 18.15, df = 3 (P = 0.0004); I² = 83%
  Test for overall effect: Z = 1.18 (P = 0.24)

1.4.2 Hospitalization
  03 Henggeler 1999b: treatment 38/79, control 36/77, weight 100.0%, OR 1.06 [0.56, 1.98]
  Subtotal (95% CI): OR 1.06 [0.56, 1.98]
  Heterogeneity: not applicable
  Test for overall effect: Z = 0.17 (P = 0.87)

1.4.3 Composite
  02 Sundell:         treatment 31/79, control 29/77, weight 100.0%, OR 1.07 [0.56, 2.04]
  Subtotal (95% CI): OR 1.07 [0.56, 2.04]
  Heterogeneity: not applicable
  Test for overall effect: Z = 0.20 (P = 0.84)

(Forest plot axis 0.1 to 10; OR < 1 favours experimental, OR > 1 favours control)
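Each study row in the placement analysis can be reproduced from its 2×2 counts. A minimal sketch of the odds ratio with a Woolf (log-based) 95% confidence interval, checked against the Henggeler 1992 row (9/43 vs. 28/41):

```python
import math

def odds_ratio_ci(events_t, n_t, events_c, n_c, z=1.96):
    """Odds ratio and 95% CI from two-group event counts (Woolf method)."""
    a, b = events_t, n_t - events_t   # treatment: events, non-events
    c, d = events_c, n_c - events_c   # control:   events, non-events
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1/a + 1/b + 1/c + 1/d)  # SE of the log odds ratio
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi

or_, lo, hi = odds_ratio_ci(9, 43, 28, 41)
print(f"OR = {or_:.2f} [{lo:.2f}, {hi:.2f}]")  # OR = 0.12 [0.05, 0.33]
```

An interval that crosses 1.0 (as in the subtotals above) means the data are compatible with no difference between MST and control.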
Self-reported delinquency (standardized mean differences, IV, random effects, 95% CI)

4.1.1 ITT
  02 Sundell:         treatment mean 29.64 (SD 46.66), n = 79; control mean 33.45 (SD 42.42), n = 77; weight 29.9%; SMD -0.08 [-0.40, 0.23]
  19 Rowland 2005:    treatment mean 8.47 (SD 19.82), n = 15; control mean 4.13 (SD 7.43), n = 16; weight 7.6%; SMD 0.29 [-0.42, 0.99]
  Subtotal (95% CI):  n = 94 vs. 93; weight 37.5%; SMD -0.02 [-0.31, 0.26]
  Heterogeneity: Tau² = 0.00; Chi² = 0.88, df = 1 (P = 0.35); I² = 0%
  Test for overall effect: Z = 0.16 (P = 0.87)

4.1.5 TOT
  04 Henggeler 1997:  treatment mean 0.58 (SD 0.57), n = 75; control mean 0.75 (SD 0.62), n = 65; weight 27.3%; SMD -0.28 [-0.62, 0.05]
  05 Henggeler 1999a: treatment mean 32 (SD 38), n = 54; control mean 30 (SD 36), n = 54; weight 22.7%; SMD 0.05 [-0.32, 0.43]
  06 Henggeler 1992:  treatment mean 2.9 (SD 5.1), n = 33; control mean 8.6 (SD 16.5), n = 23; weight 12.4%; SMD -0.50 [-1.04, 0.04]
  Subtotal (95% CI):  n = 162 vs. 142; weight 62.5%; SMD -0.21 [-0.50, 0.08]
  Heterogeneity: Tau² = 0.03; Chi² = 3.17, df = 2 (P = 0.21); I² = 37%
  Test for overall effect: Z = 1.40 (P = 0.16)

Total (95% CI): n = 256 vs. 235; 100.0%; SMD -0.13 [-0.33, 0.07]
  Heterogeneity: Tau² = 0.01; Chi² = 4.94, df = 4 (P = 0.29); I² = 19%
  Test for overall effect: Z = 1.27 (P = 0.20)

(Forest plot axis -1 to 1; negative SMD favours treatment, positive favours control)
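The standardized mean differences above can likewise be reproduced from the group means, SDs, and sample sizes. A sketch using Hedges' g (the small-sample-corrected Cohen's d, which appears to be the estimator here since it reproduces the reported intervals), checked against the Henggeler 1997 row:

```python
import math

def hedges_g_ci(m1, sd1, n1, m2, sd2, n2, z=1.96):
    """Standardized mean difference (Hedges' g) with a 95% CI."""
    # Pooled standard deviation across the two groups
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / sp                      # Cohen's d
    j = 1 - 3 / (4 * (n1 + n2) - 9)         # small-sample correction factor
    g = j * d
    se = math.sqrt((n1 + n2) / (n1 * n2) + g**2 / (2 * (n1 + n2)))
    return g, g - z * se, g + z * se

g, lo, hi = hedges_g_ci(0.58, 0.57, 75, 0.75, 0.62, 65)
print(f"SMD = {g:.2f} [{lo:.2f}, {hi:.2f}]")  # SMD = -0.28 [-0.62, 0.05]
```

Standardizing by the pooled SD puts outcomes measured on different scales (delinquency scores, symptom checklists) on a common metric so they can be pooled.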
Family cohesion
Summary: SR of MST
• Effects are not consistent across studies
  – Few studies, most conducted by the program developers in the USA
  – All studies have mixed results across outcomes, except those with null results on all outcomes
• Contrary to the conclusions of most published reviews
  – Which suggest the effectiveness of MST is well established and consistent across studies
Why traditional reviews and well-meaning experts can be misleading
• Scholars are human
• They rely on “natural” methods to filter and synthesize data
• The human brain is
  – Good at detecting patterns, maintaining homeostasis, defending territory
  – Bad at complex math and at revising beliefs (Runciman, 2007)
• Research synthesis is too complex for informal methods (“cognitive algebra”)
Conclusions
• Different review methods produce different results
  – Traditional methods are “haphazard” (Petticrew & Roberts, 2006) and can lead to wrong conclusions
  – Scientific methods are needed to minimize bias and error
• “Science is cumulative but scientists rarely cumulate evidence scientifically” (Chalmers, Hedges, & Cooper, 2002)
• We can use scientific principles and methods to synthesize evidence…