Appraisal, Extraction and Pooling of Quantitative Data for Reviews of Effects – from experimental, observational and descriptive studies
Introduction
• Recap of the Introductory module:
  – Developing a question (PICO)
  – Inclusion criteria
  – Search strategy
  – Selecting studies for retrieval
• This module considers how to appraise, extract and synthesize evidence from experimental, observational and descriptive studies.
Program Overview
Day 1

Time        Session                                         Group Work
0900        Introductions and overview of Module 3
0930        Session 1: The Critical Appraisal of Studies
1000–1030   Morning Tea
1030        Session 2: Appraising RCTs and experimental     Group Work 1: Critically appraising RCTs and
            studies                                         experimental studies. Report back
1145        Session 3: Appraising observational studies     Group Work 2: Critically appraising
                                                            observational studies. Report back
1230–1330   Lunch
1415        Session 4: Study data and data extraction       Group Work 3: Data extraction. Report back
1515–1530   Afternoon tea
1600        Session 5: Protocol development                 Protocol development
1700        End
Program Overview
Day 2

Time   Session                                            Group Work
0900   Overview of Day 1
0915   Session 6: Data analysis and meta-analysis
1030   Morning Tea
1100   Session 7: Appraisal, extraction and synthesis     Group Work 4: MAStARI trial.
       using JBI MAStARI                                  Report back
1230   Lunch
1330   Session 8: Protocol Development                    Protocol development
1415   Session 9: Assessment                              MCQ Assessment
1445   Afternoon tea
1500   Session 10: Protocol Presentations                 Protocol presentations
1700   End
Session 1: The Critical Appraisal of Studies
Search and selection flow:
1004 references → 172 duplicates removed → 832 references scanned by title/abstract → 715 do not meet inclusion criteria → 117 studies retrieved → 82 do not meet inclusion criteria → 35 studies for critical appraisal
Why Critically Appraise?
• Combining the results of poor quality research may lead to biased or misleading estimates of effectiveness.
The Aims of Critical Appraisal
• To establish validity – that is, to establish the risk of bias
Internal & External Validity
[Diagram: internal validity – "How internally valid is the study?" (strength) – and external validity – "Can the results be used locally?" – with the relationship between the two; magnitude – "How large is the effect?" – is paired with precision.]
Clinical Significance and Magnitude of Effect
• Pooling of homogeneous studies of effect or harm
• Weigh the effect against the cost/resources of change
• Determine the precision of the estimate
Assessing the Risk of Bias
• Numerous tools are available for assessing the methodological quality of clinical trials and observational studies.
• JBI requires the use of a specific tool for assessing the risk of bias in each included study.
• 'High quality' research methods can still leave a study at important risk of bias (e.g. when blinding is impossible).
• Some markers of quality are unlikely to have direct implications for risk of bias (e.g. ethical approval, sample size calculation).
Sources of Bias
• Selection
• Performance
• Detection
• Attrition
Selection Bias
• Systematic differences between participant characteristics at the start of a trial
• These systematic differences occur during allocation to groups
• Can be avoided by concealment of the allocation of participants to groups
[Diagram: population → allocation → treatment and control groups. Selection bias arises at allocation; allocation concealment is the corresponding quality assessment.]
Performance Bias
• Systematic differences in the intervention of interest, or the influence of concurrent interventions
• These systematic differences occur during the intervention phase of a trial
• Can be avoided by blinding of investigators and/or participants to group
[Diagram: population → allocation → treatment group (exposed to intervention) and control group (not exposed). Performance bias arises during the intervention phase; blinding is the corresponding quality assessment, alongside allocation concealment for selection bias.]
Detection Bias
• Systematic differences in how the outcome is assessed between groups
• These systematic differences occur at measurement points during the trial
• Can be avoided by blinding of the outcome assessor
[Diagram: the same trial flow, extended to outcome measurement in each group. Detection bias arises at the measurement points; blinding of the outcome assessor is the corresponding quality assessment.]
Attrition Bias
• Systematic differences in withdrawals and exclusions between groups
• Can be avoided by:
  – accurate reporting of losses and reasons for withdrawal
  – use of intention-to-treat (ITT) analysis
[Diagram: the full trial flow from population through allocation, intervention and follow-up in each group. Attrition bias arises at follow-up; ITT follow-up is the corresponding quality assessment.]
Ranking the "Quality" of Evidence of Effectiveness
• To what extent does the study design minimize bias and demonstrate validity?
• Generally linked to the actual study design in ranking evidence of effectiveness
• Thus a "hierarchy" of evidence is most often used, with levels of quality equated with specific study designs
Hierarchy of Evidence – Effectiveness: EXAMPLE 1
• Grade I – systematic reviews of all relevant RCTs
• Grade II – at least one properly designed RCT
• Grade III-1 – controlled trials without randomisation
• Grade III-2 – cohort or case-control studies
• Grade III-3 – multiple time series, or dramatic results from uncontrolled studies
• Grade IV – opinions of respected authorities and descriptive studies
(NH&MRC 1995)
Hierarchy of Evidence – Effectiveness: EXAMPLE 2
• Grade I – systematic review of all relevant RCTs
• Grade II – at least one properly designed RCT
• Grade III-1 – well designed pseudo-randomised controlled trials
• Grade III-2 – cohort studies, case-control studies, interrupted time series with a control group
• Grade III-3 – comparative studies with historical control, two or more single-arm studies, or interrupted time series without a control group
• Grade IV – case series
(NH&MRC 2001)
JBI Levels of Evidence – Effectiveness

Level    Effectiveness E (1–4)
1        SR (with homogeneity) of experimental studies (e.g. RCTs with concealed allocation),
         OR one or more large experimental studies with narrow confidence intervals
2        One or more smaller RCTs with wider confidence intervals,
         OR quasi-experimental studies (e.g. without randomisation)
3        3a. Cohort studies (with control group)
         3b. Case-control studies
         3c. Observational studies (without control groups)
4        Expert opinion, or based on physiology, bench research or consensus
The Critical Appraisal Process
• Every review must set out to use an explicit appraisal process. Essentially:
  – appraisers need a good understanding of research design; and
  – the use of an agreed checklist is usual.
Session 2: Appraising RCTs and
experimental studies
RCTs
• RCTs and quasi (pseudo) RCTs provide the most robust form of evidence for effects
  – the ideal design for experimental studies
• They focus on establishing certainty through measurable attributes
• They provide evidence related to:
  – whether or not a causal relationship exists between a stated intervention and a specific, measurable outcome, and
  – the direction and strength of the relationship
• These characteristics are associated with the reliability and generalizability of experimental studies
Randomised Controlled Trials
• Evaluate the effectiveness of a treatment/therapy/intervention
• Randomization is critical
• Properly performed RCTs reduce bias, confounding, and the likelihood that results are due to chance
Experimental studies
• Three essential elements:
  – randomisation (where possible)
  – researcher-controlled manipulation of the independent variable
  – researcher control of the experimental situation
Other experimental studies
• Quasi-experiments lack a true method of randomization to treatment groups. Designs include:
  – quasi-experimental designs without control groups
  – quasi-experimental designs that use control groups but not pre-tests
  – quasi-experimental designs that use control groups and pre-tests
Sampling
• Selecting participants from the population
• Inclusion/exclusion criteria
• The sample should represent the population
Sampling Methods
• Probabilistic (random) sampling
• Consecutive
• Systematic
• Convenience
Randomization
Randomization Issues
• Simple methods, such as tossing a coin or rolling a die, may result in unequal group sizes
  – block randomization guarantees equal allocation
• Chance imbalances can introduce confounding factors
  – stratification prior to randomization ensures that important baseline characteristics are even in both groups
Block Randomization
• List all possible combinations within a block, ignoring those with unequal allocation:
  1 AABB
  2 ABAB
  3 ABBA
  4 BABA
  5 BAAB
  6 BBAA
• Use a table of random numbers and generate the allocation from the sequence, e.g. 533 2871
• Minimize bias by varying the block size (see the sketch below)
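As a rough illustration, the Python sketch below generates a blocked allocation sequence for two groups. The function name and the fixed block size of 4 are illustrative only; in practice the block size would be varied as noted above.

```python
import random

def block_randomization(n_participants, block_size=4, groups=("A", "B")):
    """Generate a blocked allocation sequence with equal numbers per group in each block."""
    per_group = block_size // len(groups)
    sequence = []
    while len(sequence) < n_participants:
        block = list(groups) * per_group   # e.g. ['A', 'B', 'A', 'B']
        random.shuffle(block)              # random order within the block
        sequence.extend(block)
    return sequence[:n_participants]

print(block_randomization(12))
```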
Stratified Randomization
Blinding
• A method to eliminate bias arising from human behaviour
• Applies to participants, investigators, assessors, etc.
• Blinding of allocation
• Single, double and triple blinded
Blinding
[Figure from Schulz, 2002]
Intention to Treat
• ITT analysis is an analysis based on the initial treatment intent, not on the treatment eventually administered.
• It avoids various misleading artifacts that can arise in intervention research.
  – E.g. if people with a more serious problem tend to drop out at a higher rate, even a completely ineffective treatment may appear to provide benefits if one merely compares those who finish the treatment with those who were enrolled in it.
• Everyone who begins the treatment is considered to be part of the trial, whether they finish it or not.
Minimizing Risk of Bias
• Randomization
• Allocation concealment
• Blinding
• Intention-to-treat (ITT) analysis
Appraising RCTs/Quasi-experimental Studies: JBI-MAStARI Instrument

Assessing Study Quality as a Basis for Inclusion in a Review
[Diagram: studies ranked from high quality to poor quality, with a cut-off point separating included studies from excluded studies.]
Group Work 1
• Working in pairs, critically appraise the two papers in your workbook
• Reporting back
Session 3: Appraising
Observational Studies
Rationale and potential of observational studies as evidence
• They account for the majority of published research studies
• Need to clarify which designs to include
• Need appropriate critical appraisal/quality assessment tools
• Concerns about methodological issues inherent to observational studies
  – confounding, biases, differences in design
  – precise but spurious results
Appraisal of Observational Studies
• Critical appraisal and assessment of quality are often more difficult than for RCTs
• Using scales/checklists developed for RCTs may not be appropriate
• Methods and tools are still being developed and validated
• Some published tools are available
Confounding
• The apparent effect is not the true effect
• There may be other factors relevant to the outcome in question
• Can be an important threat to the validity of results
• Adjustments for confounding factors can be made using multivariate analysis
• Authors often look for a plausible explanation for the results
Bias
• Selection bias
  – participants differ from the population with the same condition
• Follow-up bias
  – attrition may be due to differences in outcome
• Measurement/detection bias
  – knowledge of the outcome may influence assessment of exposure, and vice versa
Observational Studies – Types
• Cohort studies
• Case-control studies
• Case series/case reports
• Cross-sectional studies
Cohort Studies
• A group of people who share a common characteristic
• Useful to determine the natural history and incidence of a disorder or exposure
• Two types:
  – prospective (longitudinal)
  – retrospective (historic)
• Aid in studying causal associations
Prospective Cohort Studies
[Figure taken from Tay & Tinmouth, 2007]
Prospective Cohort Studies
• Longitudinal observation through time
• Allow investigation of rare exposures and outcomes with long latency
• Expensive
• Increased likelihood of attrition
• A long time before useful data emerge
Retrospective Cohort Studies
[Figure taken from Tay & Tinmouth, 2007]

Retrospective Cohort Studies
• Mainly data collection
• No follow-up through time
• Cheaper, faster
Case-Control Studies
• 'Cases' already have the disease/condition
• 'Controls' don't have the disease/condition
• Otherwise matched to control confounding
• Frequently used
• A rapid means of studying risk factors
• Sometimes referred to as a retrospective study

Case-Control Studies
[Figure from the Biomedical Library, University of Minnesota, 2002]
Case-Control Study
• Inexpensive
• Little manpower required
• Fast
• No indication of absolute risk
Case series/Case reports
• Track patients given a similar treatment (prospective)
• Examine medical records for exposure and outcome (retrospective)
• A detailed report of an individual patient
• May identify new diseases and adverse effects
Cross-sectional Studies
• Take a 'slice' or 'snapshot' of the target group
• Frequency and characteristics of disease/variables in a population at a point in time
• Often use survey research methods
• Also called prevalence studies
Appraising Comparable Cohort and Case-control Studies: JBI-MAStARI Instrument

Appraising Descriptive/Case Series Studies: JBI-MAStARI Instrument
Group Work 2
• Working in pairs:
  – critically appraise the cohort study in your workbook
  – critically appraise the case-control study in your workbook
• Reporting back
Session 4: Study data and Data Extraction

Considerations in Data Extraction
• Source – citation and contact details
• Eligibility – confirm eligibility for the review
• Methods – study design, concerns about bias
• Participants – total number, setting, diagnostic criteria
• Interventions – total number of intervention groups
• Outcomes – outcomes and time points
• Results – for each outcome of interest: sample size, etc.
• Miscellaneous – funding source, etc.
Quantitative Data Extraction
• The data extracted for a systematic review are the results from individual studies specifically related to the review question.
• Difficulties related to the extraction of data include:
  – different populations used
  – different outcome measures
  – different scales or measures used
  – interventions administered differently
  – reliability of data extraction (i.e. between reviewers)
Minimising Error in Data Extraction
• Strategies to minimise the risk of error when extracting data from studies include:
  – utilising a data extraction form developed specifically for each review
  – pilot testing the extraction form prior to commencement of the review
  – training and assessing data extractors
  – having two people extract data from each study
  – blinding extraction before conferring
Search and selection flow:
1004 references → 172 duplicates removed → 832 references scanned by title/abstract → 715 do not meet inclusion criteria → 117 studies retrieved → 82 do not meet inclusion criteria → 35 studies for critical appraisal → 9 studies excluded → 26 studies included in review
Data most frequently extracted
Outcome Data: Effect of Treatment or Exposure
• Dichotomous
  – effect/no effect
  – present/absent
• Continuous
  – interval or ratio level data
  – BP, HR, weight, etc.
What do you want to know?
• Is treatment X more effective than treatment Y?
• Is exposure to X more likely to result in an outcome
or not?
• How many people need to receive an intervention
before someone benefits or is harmed?
Risk
• Risk = (# times something happens) / (# opportunities for it to happen)
• 'Risk' of birthing a baby boy?
  – One boy is born for every 2 opportunities: 1/2 = 0.5
  – That is, a 50% probability (risk) of having a boy
• If one of every 100 persons treated has a side-effect: risk = 1/100 = 0.01
Relative Risk (RR)
• The ratio of the risk in the exposed group to the risk in the unexposed group (P_exposed / P_unexposed)
  – The RR of anaemia during pregnancy = the risk of developing anaemia for pregnant women divided by the risk of developing anaemia for women who are not pregnant.
  – The RR of further stroke for patients who have had a stroke = the risk of a stroke within one year post stroke divided by the risk of having a stroke in one year for a similar group of patients who have not had a stroke.
For example
• A trial examined whether patients with chronic fatigue syndrome improved 6 weeks after treatment with i.m. magnesium. The group who received the magnesium was compared to a placebo group, and the outcome was feeling better.

'Risk' of improvement on magnesium = 12/15 = 0.80
'Risk' of improvement on placebo = 3/17 = 0.18
Relative risk (of improvement on Mg2+ therapy vs placebo) = 0.80/0.18 = 4.5

Thus patients on magnesium therapy are about 4.5 times more likely to feel better on magnesium than on placebo.
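These calculations are simple enough to script. A minimal Python sketch using the counts above (variable names are illustrative):

```python
# 2x2 counts from the magnesium trial above
improved_mg, total_mg = 12, 15
improved_placebo, total_placebo = 3, 17

risk_mg = improved_mg / total_mg                  # 0.80
risk_placebo = improved_placebo / total_placebo   # approx. 0.18
relative_risk = risk_mg / risk_placebo            # approx. 4.5

print(f"risk (Mg) = {risk_mg:.2f}, risk (placebo) = {risk_placebo:.2f}, RR = {relative_risk:.1f}")
```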
Interpreting Risk
• What does a relative risk of 1 mean?
  – That there is no difference in risk between the two groups.
  – In the magnesium example it would mean that patients are as likely to "feel better" on magnesium as on placebo.
  – If there were no difference between the groups, the confidence interval would include 1.
• It is important to know whether relative or absolute risk is being presented, as this influences the way in which it is interpreted.
Issues with RR – defining success

              Success   Failure
Treatment A   0.96      0.04
Treatment B   0.99      0.01

• If the outcome of interest is success, then RR = 0.96/0.99 = 0.97
• If the outcome of interest is failure, then RR = 0.04/0.01 = 4
Absolute Risk Difference
• The absolute additional risk of an event due to an exposure
  – the risk in the exposed group minus the risk in the unexposed (or differently exposed) group
• Absolute risk reduction (ARR) = P_exposed – P_unexposed
• If the absolute risk is increased by an exposure, we sometimes use the term absolute risk increase (ARI)
For example
• From the previous example comparing magnesium therapy and placebo:

'Risk' of improvement on magnesium = 12/15 = 0.80
'Risk' of improvement on placebo = 3/17 = 0.18
Absolute risk reduction = 0.80 – 0.18 = 0.62
Number Needed to Treat
• The additional number of people you would need to give a new treatment to in order to cure one extra person, compared to the old treatment.
• For a harmful exposure, the number needed to harm is the additional number of individuals who need to be exposed to the risk in order for one extra person to develop the disease, compared to the unexposed group.
  – Number needed to treat = 1 / ARR
  – Number needed to harm = 1 / ARR, ignoring the negative sign
For example
• From the previous example comparing magnesium therapy and placebo:

'Risk' of improvement on magnesium = 12/15 = 0.80
'Risk' of improvement on placebo = 3/17 = 0.18
Absolute risk reduction = 0.80 – 0.18 = 0.62
Number needed to treat (to benefit) = 1 / 0.62 = 1.61 ≈ 2

Thus, on average, one would give magnesium to 2 patients in order to expect one extra patient (compared to placebo) to feel better.
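Continuing the same example in Python, and rounding the NNT up to a whole patient, as is conventional:

```python
import math

risk_mg, risk_placebo = 12 / 15, 3 / 17

arr = risk_mg - risk_placebo   # absolute risk reduction, approx. 0.62
nnt = math.ceil(1 / arr)       # 1/0.62 = 1.61, rounded up to 2 patients

print(f"ARR = {arr:.2f}, NNT = {nnt}")
```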
Odds
• Odds = (# times something happens) / (# times it does not happen)
• What are the odds of birthing a boy?
  – For every 2 births, one is a boy and one isn't: 1/1 = 1
  – That is, the odds are even
• If one of every 100 persons treated has a side-effect: odds = 1/99 = 0.0101
Odds Ratio
• The ratio of the odds for the exposed group to the odds for the unexposed group:
  OR = [P_exposed / (1 – P_exposed)] / [P_unexposed / (1 – P_unexposed)]
For example
• From the previous example comparing magnesium therapy and placebo:

Odds of improvement on magnesium = 12/3 = 4.0
Odds of improvement on placebo = 3/14 = 0.21
Odds ratio (of Mg2+ vs placebo) = 4.0 / 0.21 = 19.0

Therefore, improvement was 19 times more likely in the Mg2+ group than in the placebo group.
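The same odds calculations in Python (the slides round the placebo odds to 0.21, which gives 19.0; the unrounded ratio is about 18.7):

```python
odds_mg = 12 / 3        # improved vs not improved on magnesium = 4.0
odds_placebo = 3 / 14   # improved vs not improved on placebo, approx. 0.21

odds_ratio = odds_mg / odds_placebo   # approx. 18.7; the slides round to 19
print(f"OR = {odds_ratio:.1f}")
```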
Relative Risk and Odds Ratio
• The odds ratio can be interpreted as a relative risk when an event is rare, and the two are often quoted interchangeably
• With a and b the events, and c and d the non-events, in the exposed and unexposed groups respectively:
  – Relative risk = [a/(a+c)] / [b/(b+d)]
  – Odds ratio = (a/c) / (b/d)
• This is because when the event is rare, (a+c) → c and (b+d) → d, so the two expressions converge
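A small Python check of this approximation, using the cell labels above (the function name and counts are illustrative):

```python
def rr_and_or(a, b, c, d):
    """a, b = events in the exposed/unexposed groups; c, d = non-events."""
    rr = (a / (a + c)) / (b / (b + d))
    odds_ratio = (a / c) / (b / d)
    return rr, odds_ratio

# With a rare event (a << c, b << d) the two measures converge:
print(rr_and_or(10, 5, 990, 995))   # RR = 2.0, OR approx. 2.01
```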
Relative Risk and Odds Ratio
• For case-control studies it is not possible to calculate the RR, and thus the OR is used.
• For cohort and cross-sectional studies, both can be derived.
• ORs have mathematical properties which make them more often quoted for formal statistical analyses.
Continuous data
• Means, averages, change scores, etc.
  – e.g. BP, plasma protein concentration
• Any value, often within a specified range
• Mean, standard deviation (SD), N
• Often only the standard error (SE) is presented
• SD = SE × √N (see the sketch below)
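A one-line conversion in Python, with illustrative values (not taken from any study in this module):

```python
import math

se, n = 2.5, 64          # illustrative SE and sample size only
sd = se * math.sqrt(n)   # SD = SE * sqrt(N) = 20.0
print(sd)
```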
MAStARI Data Extraction Instrument
Group Work 3
• Working in pairs:
– Extract the data from the two papers in your workbook
• Reporting Back
Session 5: Protocol development
Overview
• Recap of Day 1:
  – critical appraisal
  – study design
  – types of studies (experimental and observational)
  – data extraction
• Today the focus is on data analysis and synthesis.
Session 6: Data Analysis and Meta-synthesis/Meta-analysis
General Analysis – What Can be Reported and How
• What interventions/activities have been evaluated
• The effectiveness/appropriateness/feasibility of the intervention/activity
• Contradictory findings and conflicts
• Limitations of study methods
• Issues related to study quality
• The use of inappropriate definitions
• Specific populations excluded from studies
• Future research needs
Meta-Analysis
Search and selection flow:
1004 references → 172 duplicates removed → 832 references scanned by title/abstract → 715 do not meet inclusion criteria → 117 studies retrieved → 82 do not meet inclusion criteria → 35 studies for critical appraisal → 9 studies excluded → 26 studies included in review → 6 studies included in meta-analysis and 20 studies included in narrative synthesis
Statistical methods for meta-analysis
• A quantitative method of combining the results of independent studies
• The aim is to increase the precision of the overall estimate
• Investigates reasons for differences in risk estimates between studies
• Discovers patterns of risk amongst studies
When is meta-analysis useful?
• If studies report different treatment effects.
• If studies are too small (insufficient power) to detect
meaningful effects.
• Single studies rarely, if ever, provide definitive
conclusions regarding the effectiveness of an
intervention.
When meta-analysis can be used
• Meta analysis can be used if studies:
– have the same population
– use the same intervention administered in the same way.
– measure the same outcomes
• Homogeneity
– studies are sufficiently similar to estimate an average
effect.
Calculating an Overall Effect Estimate
• Odds ratio
  – for dichotomous data, e.g. an outcome that is present or absent
  – 51/49 = 1.04
  – (no difference between groups = 1)
• Weighted mean difference
  – for continuous data, such as weight
  – (no difference between groups = 0)
• Confidence interval
  – the range in which the real result lies, with the given degree of certainty
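As an illustration, a minimal Python sketch computing an odds ratio with an approximate 95% confidence interval via the standard log method; the function name is illustrative, and this is not the MAStARI implementation:

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with an approximate 95% CI via the log method.
    a/b = events/non-events in one group, c/d = events/non-events in the other."""
    or_ = (a / b) / (c / d)
    se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper

# Magnesium example: 12/3 improved/not improved vs 3/14
print(odds_ratio_ci(12, 3, 3, 14))   # approx. (18.7, 3.2, 110)
```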
Confidence Intervals
• Confidence intervals are an indication of how precise the findings are
• Sample size greatly impacts the CI
  – the larger the sample size, the smaller the CI, and the greater the power and confidence of the estimate

CIs indicate:
• When calculated for an OR, the CI provides the upper and lower limits of the odds that a treatment may or may not work
• If the odds ratio is 1, the odds are even and therefore not significantly different
  – recall the odds of having a boy
The Meta-view Graph
[Forest plot: the results of different studies combined, on an axis running from "favours treatment" through "no effect" to "favours control".]
Heterogeneity
• Is it appropriate to combine or pool results from various studies?
• Different methodologies?
• Different outcomes measured?
• The problem is greater in observational than in clinical studies
Heterogeneity
[Forest plot: differences between studies, shown on the same "favours treatment" / "no effect" / "favours control" axis.]
Tests of Heterogeneity
• Measure the extent to which observed study outcomes differ from the calculated overall outcome
• Visually inspect the forest plot, including the size of the CIs
• The χ² test for homogeneity (Cochran's Q test) can be used (a sketch follows below)
  – low power (use p < 0.1 or 0.2)
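A minimal sketch of the Q statistic in Python, assuming each study supplies an effect estimate and its variance (the example numbers are illustrative):

```python
def cochrans_q(effects, variances):
    """Cochran's Q: inverse-variance-weighted squared deviations from the pooled effect."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    return sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))

# Illustrative log odds ratios and variances from three studies; compare Q against
# a chi-squared distribution with k - 1 = 2 df, using p < 0.1 given the low power.
print(cochrans_q([0.2, 0.5, 0.4], [0.04, 0.09, 0.05]))
```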
Insufficient Power
[Forest plot: studies too small to detect any effect, shown on the same axis.]
Meta-analysis
• The overall summary measure is a weighted average of the study outcomes
• The weight indicates the influence of a study
• A study with more subjects is more influential
• The CI is a measure of precision
• The CI should be smaller for the summary measure
Subgroup analysis
• Used when some participants, interventions or outcomes were thought likely to be quite different to the others
• Should be specified in advance in the protocol
• Only if there are good clinical reasons
• Two types:
  – between trial – trials classified into subgroups
  – within trial – each trial contributes to all subgroups
Example subgroup analysis
[Figure taken from Egger, M. et al. BMJ 1998;316:140-144]
Sensitivity Analysis
• Exclude and/or include individual studies in the analysis
• Establishes whether the assumptions or decisions we have made have a major effect on the results of the review
• 'Are the findings robust to the method used to obtain them?'
Meta-analysis
• Statistical methods
– Fixed effects model
– Random effects model
Fixed Effects Model
• All included studies measure the same outcome
• Assumes any difference observed is due to chance
  – no inherent variation in the source population
  – variation within studies, not between studies
• Inappropriate where heterogeneity is present
• The CI of the summary measure reflects the variability between patients within the samples
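A minimal sketch of inverse-variance fixed-effect pooling in Python, assuming the effects are on a scale where they can be averaged (e.g. log odds ratios); the names and numbers are illustrative, and this is not the MAStARI implementation:

```python
import math

def fixed_effect_pool(effects, variances):
    """Inverse-variance fixed-effect pooling: weight = 1/variance,
    so larger (more precise) studies dominate the weighted average."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)

# Illustrative log odds ratios and their variances from three studies
print(fixed_effect_pool([0.2, 0.5, 0.4], [0.04, 0.09, 0.05]))
```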
Random Effects Model
• Assumes the studies are different and each outcome fluctuates around its own true value
  – true values drawn randomly from a population
  – variability between patients within each study and from differences between studies
• The overall summary outcome is an estimate of the mean from which the sample of outcomes was drawn
• More commonly used with observational studies due to heterogeneity

Random Effects Model
• The summary value will often have a wider CI than with the fixed effects model
• Where there is no heterogeneity, the results of the two methods will be similar
• If heterogeneity is present, it may be best to do a solely narrative systematic review
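For comparison, a minimal sketch of DerSimonian–Laird random-effects pooling under the same assumptions; when the Q statistic is no larger than its degrees of freedom, tau² is zero and the result coincides with the fixed effects model, matching the point above:

```python
import math

def random_effects_pool(effects, variances):
    """DerSimonian-Laird random-effects pooling: the between-study variance
    tau^2 is added to each study's variance before weighting."""
    w = [1 / v for v in variances]
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)    # between-study variance
    w_star = [1 / (v + tau2) for v in variances]     # random-effects weights
    pooled = sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)

print(random_effects_pool([0.2, 0.5, 0.4], [0.04, 0.09, 0.05]))
```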
Session 7: Appraisal, Extraction and Synthesis Using JBI-MAStARI

Meta Analysis of Statistics Assessment and Review Instrument (MAStARI)
Group Work 4
MAStARI Trial and Meta Analysis
Session 8: Protocol development
Session 9: Assessment
Session 10: Protocol Presentations