Transcript quality - Department of Education
Funded through the ESRC’s Researcher
Development Initiative
Session 2.3: Assessing Quality
Prof. Herb Marsh
Ms. Alison O’Mara
Dr. Lars-Erik Malmberg
Department of Education,
University of Oxford
[Flowchart: stages of a meta-analysis — establish research question; define relevant studies; locate and collate studies; develop code materials; pilot coding; coding; data entry and effect size calculation; main analyses; supplementary analyses]
Does it make a difference?
How do we assess primary study quality?
The quality of evidence generated by a review
depends entirely on the quality of primary
studies which make up the review
Garbage-in, garbage-out!
Quality assessment helps to set apart a meta-analysis or
systematic review from a narrative review
In meta-analysis, quality refers to methodological
quality (the internal validity of primary studies)
High quality reduces the risk of bias by eliminating
or taking into account sources of bias such as:
Selection bias
Information bias
Confounding
Increasingly, meta-analysts evaluate the quality
of each study included in a meta-analysis
Sometimes this is a global holistic (subjective)
rating. In this case it is important to have multiple
raters to establish inter-rater agreement
Sometimes study quality is quantified in relation to
objective criteria of a good study, e.g.
larger sample sizes;
more representative samples;
better measures;
use of random assignment;
appropriate control for potential bias;
double blinding, and
low attrition rates (particularly for longitudinal studies)
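With multiple raters giving global quality ratings, inter-rater agreement can be quantified with Cohen's kappa. A minimal sketch (the two coders' ratings below are invented for illustration):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if the two raters were independent
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical global quality ratings from two coders of 8 studies
a = ["high", "high", "low", "low", "high", "low", "high", "low"]
b = ["high", "low",  "low", "low", "high", "low", "high", "high"]
kappa = cohens_kappa(a, b)
```

Values near 1 indicate strong agreement; values near 0 indicate agreement no better than chance.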
In a meta-analysis of Social Science meta-analyses,
Wilson & Lipsey (1993) found an effect size of .50.
They evaluated how this was related to study quality:
For meta-analyses providing a global (subjective) rating of
the quality of each study, there was no significant
difference between high- and low-quality studies; the
average correlation between effect size and quality was
almost exactly zero.
There was almost no difference between effect sizes based on
random and non-random assignment (effect sizes were slightly
larger for random assignment).
The only study quality characteristic that made a difference
was the one-group pre/post design with no control group at
all, which produced positively biased effects
Goldring (1990) evaluated the effects of gifted
education programs on achievement. She found a
positive effect, but emphasised that findings were
questionable because of weak studies:
21 of the 24 studies were unpublished and only one used
random assignment.
Effects varied with matching procedures:
the largest effects for achievement outcomes were for studies
in which all differences between non-equivalent groups were
controlled by only one pretest variable.
Effect sizes decreased as the number of control variables
increased, and
disappeared altogether with random assignment.
Goldring (1990, p. 324) concluded that policy makers need
to be aware of the limitations of the gifted and talented (GAT) literature.
Schulz (1995) evaluated study quality in 250 randomized
clinical trials (RCTs) from 33 meta-analyses. Poor-quality
studies led to positively biased estimates of treatment effects:
inadequate or unclear concealment of allocation (30-41% exaggeration),
lack of double-blinding (17% exaggeration),
participants excluded after randomization (no significant effect).
Moher et al. (1998) reanalysed 127 randomized clinical
trials (RCTs) from 11 meta-analyses for study quality.
Low-quality trials resulted in significantly larger effect sizes: 30-50%
exaggeration in estimates of treatment efficacy.
Wood et al. (2008) evaluated study quality (1,346 RCTs from
146 meta-analyses):
subjective outcomes: inadequate/unclear concealment and lack of
blinding resulted in substantial biases;
objective outcomes: no significant effects;
conclusion: systematic reviewers should routinely assess risk of bias.
Meta-analyses should always include subjective and/or
objective indicators of study quality.
In the Social Sciences, there is some evidence that studies
with highly inadequate control for pre-existing
differences lead to inflated effect sizes. However, it is
surprising that other indicators of study quality make
so little difference.
In medical research, studies are largely limited to RCTs
where there is MUCH more control than in social
science research. Here, there is evidence that
inadequate concealment of assignment and lack of
double-blind inflate effect sizes, but perhaps only for
subjective outcomes.
These issues are likely to be idiosyncratic to individual
discipline areas and research questions.
It is important to code study quality characteristics.
Juni, Witschi, Bloch, and Egger (1999):
Evaluation of scales designed to assess the quality of
randomized field trials in medicine
Used an identical set of 17 studies
Applied the quality weightings dictated by 25 different scales
Seven of the scales showed that high quality trials showed
an effect whereas low quality trials did not.
Six of the scales found that high quality trials showed no
effect whereas low quality trials did (the reverse conclusion).
For the remaining 12 scales, effect estimates were similar
across the quality levels
Overall summary quality scores were not significantly
associated with treatment effects.
In summary: the scale used to evaluate study quality can
determine whether a difference in quality levels is detected
Assessing quality requires designing the code materials to
include adequate questions about the study design and
reporting
It requires skill and training in identifying quality
characteristics
May require additional analyses:
Quality weighting (Rosenthal, 1991)
Use of kappa statistic in determining validity of quality
filtering for meta-analysis (Sands & Murphy, 1996)
Regression with “quality” as a predictor of effect size
(see Valentine & Cooper, 2008)
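Quality weighting in the spirit of Rosenthal (1991) can be sketched by multiplying the usual inverse-variance weights by a study-quality score. The effect sizes, sampling variances, and 0-1 quality scores below are hypothetical:

```python
def pooled_effect(effects, variances, quality=None):
    """Fixed-effect pooled estimate; optionally multiply the usual
    inverse-variance weights by a study-quality score (quality weighting)."""
    weights = [1.0 / v for v in variances]
    if quality is not None:
        weights = [w * q for w, q in zip(weights, quality)]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)

# Hypothetical studies: effect size d, sampling variance v, quality score q
d = [0.60, 0.45, 0.20]
v = [0.04, 0.02, 0.05]
q = [0.5, 1.0, 0.9]

unweighted = pooled_effect(d, v)     # inverse-variance weights only
weighted = pooled_effect(d, v, q)    # quality-adjusted weights
```

Here the quality-adjusted estimate is pulled toward the higher-quality studies; as noted above, such weighting is not usually recommended because quality scales are not consistent.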
Uses of information about quality:
Narrative discussion of impact of quality on results
Display study quality and results in a tabular format
Weight the data by quality - not usually recommended
because scales are not always consistent (see Juni et al.,
1999; Valentine & Cooper, 2008)
Subgroup analysis by quality
Include quality as a covariate in meta-regression
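The meta-regression option can be illustrated with a toy weighted regression of effect size on a quality score (a fixed-effect sketch; real meta-regression would also model between-study variance). All study values below are invented:

```python
def wls_fit(x, y, w):
    """Weighted least squares for y = b0 + b1*x (weights w, e.g. 1/variance)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    b1 = (sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
          / sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x)))
    b0 = ybar - b1 * xbar
    return b0, b1

# Hypothetical studies: quality score (0-1), effect size, sampling variance
quality = [0.2, 0.4, 0.6, 0.8, 1.0]
effect = [0.80, 0.65, 0.55, 0.45, 0.40]
var = [0.05, 0.04, 0.04, 0.03, 0.03]
w = [1.0 / v for v in var]

b0, b1 = wls_fit(quality, effect, w)
# A negative slope b1 suggests lower-quality studies report larger effects
```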
Valentine and Cooper (2008) developed an instrument for
assessing study quality for inclusion in meta-analyses and
systematic reviews; it "focuses on the operational details of
studies and results in a profile of scores instead of a single
score to represent study quality" (p. 130).
Study Design and Implementation Assessment
Device (DIAD)
Hierarchical: consists of “global”, “composite”, and
“design and implementation” questions
[Figure from Valentine & Cooper (2008, p. 139): multiple
questions within each global and composite question]
[Example excerpt of Table 4 (p. 144)]
The Critical Appraisal Skills Programme (CASP) provides
checklists to help with the process of critically appraising articles:
Systematic Reviews (!)
Randomised Controlled Trials (RCTs)
Qualitative Research
Economic Evaluation Studies
Cohort Studies
Case Control Studies
Diagnostic Test Studies
Good start, but not comprehensive
Download from
http://www.phru.nhs.uk/Pages/PHD/resources.htm
Example from the CASP Cohort Studies form
http://www.phru.nhs.uk/Pages/PHD/resources.htm
Centre for Reviews and Dissemination (CRD),
University of York
Report on “Study quality assessment”
www.york.ac.uk/inst/crd/pdf/crdreport4_ph5.pdf
Development of quality assessment instruments
Also, quality assessment of:
effectiveness studies
accuracy studies
qualitative research
economic evaluations
Some questions to consider regarding quality in
case series designs (can also be used for survey
data designs)
Evidence for Policy and Practice Information and
Co-ordinating Centre (EPPI-Centre), Institute of
Education, University of London
Some issues to consider when coding are listed in
the Guidelines for the REPOrting of primary
empirical research Studies in Education (The
REPOSE Guidelines)
Not as detailed/thorough (prescriptive?) as the medical
research guidelines...
When designing your code materials, you can look
at guidelines for what should be reported, and turn
those into questions to evaluate quality
For example, Strengthening the Reporting of
Observational Studies in Epidemiology (STROBE)
STROBE Statement—Checklist of items that should be
included in reports of cross-sectional studies
Point 13(b): Give reasons for non-participation at each
stage
Could be rephrased in the coding materials as:
Did the study give a reason for non-participation?
STROBE checklists are available for cohort, case-control, and cross-sectional studies at
http://www.strobe-statement.org/Checklist.html
Quality of reporting
It is often hard to separate quality of reporting from
methodological quality: "Not reported" is not always "Not done".
Code "Unspecified" as distinct from "Criteria not met".
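That distinction can be built directly into the coding materials, e.g. a three-level code per criterion. A hypothetical sketch (the criterion names are invented):

```python
from enum import Enum

class Criterion(Enum):
    MET = "met"
    NOT_MET = "criteria not met"
    UNSPECIFIED = "unspecified"  # "not reported" is not the same as "not done"

# Hypothetical coding-sheet entry for one primary study
study = {
    "random_assignment": Criterion.MET,
    "double_blinding": Criterion.UNSPECIFIED,  # nothing reported either way
    "low_attrition": Criterion.NOT_MET,
}

# Count explicitly met criteria; keep UNSPECIFIED separate so that
# sensitivity analyses can treat unreported items differently from failures
n_met = sum(v is Criterion.MET for v in study.values())
n_unspecified = sum(v is Criterion.UNSPECIFIED for v in study.values())
```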
Consult as many materials as possible when
developing coding materials
There are some good references for systematic reviews that
also apply to meta-analysis
Torgerson’s (2003) book
Gough’s (2007) framework
Search Cochrane Collaboration (http://www.cochrane.org/) for
“assessing quality”
Gough, D. (2007). Weight of evidence: A framework for the appraisal of the
quality and relevance of evidence. In J. Furlong & A. Oancea (Eds.), Applied and
practice-based research [Special issue]. Research Papers in Education,
22, 213-228.
Juni, P., Witschi, A., Bloch, R., & Egger, M. (1999). The hazards of scoring the
quality of clinical trials for meta-analysis. JAMA, 282, 1054-1060.
Rosenthal, R. (1991). Quality-weighting of studies in meta-analytic research.
Psychotherapy Research, 1, 25-28.
Sands, M. L., & Murphy, J. R. (1996). Use of kappa statistic in determining
validity of quality filtering for meta-analysis: A case study of the health effects
of electromagnetic radiation. Journal of Clinical Epidemiology, 49, 1045-1051.
Torgerson, C. (2003). Systematic reviews. UK: Continuum International.
Valentine, J. C., & Cooper, H. M. (2008). A systematic and transparent
approach for assessing the methodological quality of intervention
effectiveness research: The Study Design and Implementation Assessment
Device (Study DIAD). Psychological Methods, 13, 130-149.