Transcript Slide 1

Developing evidence-based products
using the systematic review process
Session 4/Unit 10
Assessing study quality
Carole Torgerson
November 13th 2007
NCDDR training course for NIDRR grantees
1
Assessing study quality or critical
appraisal
• To investigate whether the individual studies in the
review are affected by bias. Systematic errors in a study
can bias its results by overestimating or underestimating
effects. Bias in an individual study or individual studies
in the review can in turn bias the results of the review
• To make a judgement about the weight of evidence that
should be placed on each study (higher weight given to
studies of higher quality in a meta-analysis)
• To investigate differences in effect sizes between high
and low quality studies in a meta-regression
2
Coding for study quality
• Collect and record information about each study
on quality of design and quality of analysis
(internal validity)
• Base quality assessment judgement about each
study on this information
• Use quality assessment judgements of all
included studies to inform the synthesis (meta-analysis)
» Only use findings from studies judged to be of high
quality or qualify findings
» Look for homogeneity/heterogeneity
» Examine differences in findings according to quality
(sensitivity analysis)
3
“A careful look at randomized
experiments will make clear that
they are not the gold standard. But
then, nothing is. And the
alternatives are usually worse.”
Berk RA. (2005) Journal of Experimental Criminology 1, 417-433.
4
Code for design: Experiments/quasi-experiments;
fidelity to random allocation; concealment
Code for whether RCT or quasi-experiment (specify) or
other (specify)
Is the method of assignment unclear?
» Need to ascertain how assignment was undertaken
and code for this; if unclear, may need to contact
authors for clarification
» Look for confusion between non-random and random
assignment – the former can lead to bias.
If RCT
» Need to look for and code for assignment
discrepancies, e.g. failure to keep to random
allocation
» Need to code for whether or not allocation was concealed
5
Which studies are RCTs?
• “We took two groups of schools – one group had high ICT use
and the other low ICT use – we then took a random sample of
pupils from each school and tested them”.
• “We put the students into two groups, we then randomly
allocated one group to the intervention whilst the other formed
the control”
• “We formed the two groups so that they were approximately
balanced on gender and pre-test scores”
• “We identified 200 children with a low reading age and then
randomly selected 50 to whom we gave the intervention. They
were then compared to the remaining 150”.
• “Of the eight [schools] two randomly chosen schools served as
a control group”
6
Is it randomised?
“The groups were balanced for gender and,
as far as possible, for school. Otherwise,
allocation was randomised.”
Thomson et al. Br J Educ Psychology 1998;68:475-91.
7
Is it randomised?
“The students were assigned to one of three
groups, depending on how revisions were
made: exclusively with computer word
processing, exclusively with paper and
pencil or a combination of the two
techniques.”
Greda and Hannafin, J Educ Res 1992;85:144.
8
Mixed allocation
“Students were randomly assigned to either
Teen Outreach participation or the control
condition either at the student level (i.e.,
sites had more students sign up than
could be accommodated and participants
and controls were selected by picking
names out of a hat or choosing every
other name on an alphabetized list) or less
frequently at the classroom level”
Allen et al, Child Development 1997;64:729-42.
9
Non-random assignment confused with
random allocation
“Before mailing, recipients were randomized by
rearranging them in alphabetical order according
to the first name of each person. The first 250
received one scratch ticket for a lottery
conducted by the Norwegian Society for the
Blind, the second 250 received two such scratch
tickets, and the third 250 were promised two
scratch tickets if they replied within one week.”
Finsen V, Storeheier, AH (2006) Scratch lottery tickets are a poor
incentive to respond to mailed questionnaires. BMC Medical Research
Methodology 6, 19. doi:10.1186/1471-2288-6-19.
10
Misallocation issues
“23 offenders from the treatment group
could not attend the CBT course and they
were then placed in the control group”.
11
Concealed allocation – why is it
important?
» Good evidence from multiple sources shows that
effect sizes in RCTs where randomisation was not
independently conducted were larger than in RCTs
that used independent assignment methods.
» A wealth of evidence is available that indicates that
unless random assignment was undertaken by an
independent third party, then subversion of the
allocation may have occurred (leading to selection
bias and exaggeration of any differences between the
groups).
12
Allocation concealment:
a meta-analysis
• Schulz and colleagues took a database of 250
randomised trials in the field of pregnancy and
childbirth.
• The trials were divided into 3 groups with
respect to concealment:
» Good concealment (difficult to subvert);
» Unknown (not enough detail in paper);
» Poor (e.g., randomisation list on a public notice
board).
• They found exaggerated effect sizes for poorly
concealed compared with well concealed
randomisation.
13
Comparison of adequate, unclear
and inadequate concealment
Allocation concealment    Effect size (OR)
Adequate                  1.0
Unclear                   0.67
Inadequate                0.59
(P < 0.01)
Schulz et al. JAMA 1995;273:408.
14
Small vs large trials
• Small trials tend to give greater effect
sizes than large trials: this shouldn’t
happen.
• Kjaergard et al. showed this phenomenon
was due to poor allocation concealment in
small trials; when trials were grouped by
allocation method, 'secure' allocation
reduced the effect by 51%.
Kjaergard et al. Ann Intern Med 2001;135:982.
15
Case study
• Subversion is rarely reported for individual
studies.
• One study where it has been reported was for a
large, multi-centred surgical trial.
• Participants were randomised at 5+ centres
using sealed envelopes (sealed envelopes can
be opened in advance and participants can be
selected by the recruiting researcher into groups
rather than by randomisation).
16
Mean ages of groups
Clinician    Experimental    Control
All          59              63         (p < 0.01)
1            62              61         (p = 0.84)
2            43              52         (p = 0.60)
3            57              72         (p < 0.01)
4            33              69         (p < 0.001)
5            47              72         (p = 0.03)
Others       64              59         (p = 0.99)
17
Using telephone allocation
Clinician    Experimental    Control
All          59              57         (p = 0.37)
1            57              57         (p = 0.62)
2            60              51         (p = 0.24)
3            61              70         (NA)
4            63              65         (p = 0.99)
5            57              62         (p = 0.91)
Others       59              56         (p = 0.99)
18
Recent blocked trial
“This was a block randomised study (four patients to
each block) with separate randomisation at each of
the three centres. Blocks of four cards were
produced, each containing two cards marked with
"nurse" and two marked with "house officer." Each
card was placed into an opaque envelope and the
envelope sealed. The block was shuffled and, after
shuffling, was placed in a box.”
Kinley et al., BMJ 2002 325:1323.
19
• Block randomisation is a method of
ensuring numerical balance; in this case,
blocking was by centre.
• If block randomisation with blocks of 4 was used,
then the numbers in each group at each centre
should not differ by more than 2 participants
(see the sketch below).
20
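
To make the blocking arithmetic concrete, here is a minimal sketch of block randomisation in Python. The blocks of four with two "nurse" and two "house officer" cards mirror the Kinley et al. description; the function name and structure are illustrative, not taken from the trial.

    import random

    def block_randomise(n_participants,
                        block=("nurse", "nurse", "house officer", "house officer")):
        # Shuffle a fresh block of four cards for each group of four
        # participants, mimicking the shuffled sealed envelopes.
        allocations = []
        while len(allocations) < n_participants:
            cards = list(block)
            random.shuffle(cards)
            allocations.extend(cards)
        return allocations[:n_participants]

    # One centre's sequence: with blocks of four, the two arms can never
    # differ by more than two participants at any point in recruitment.
    sequence = block_randomise(18)
    print(sequence.count("nurse"), sequence.count("house officer"))
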
Problem?
             Southampton    Sheffield    Doncaster
Doctor       500            308          118
Nurse        511            319          118

Kinley et al., 2002 BMJ 325:1323.
21
Examples of good allocation
concealment
• “Randomisation by centre was conducted
by personnel who were not otherwise
involved in the research project” [1]
• Distant assignment was used to: “protect
overrides of group assignment by the staff,
who might have a concern that some
cases receive home visits regardless of
the outcome of the assignment process”[2]
[1] Cohen et al. (2005) J of Speech Language and Hearing Res. 48, 715-729.
[2] Davis RG, Taylor BG. (1997) Criminology 35, 307-333.
22
Assignment discrepancy
• “Pairs of students in each classroom were
matched on a salient pretest variable, Rapid
Letter Naming, and randomly assigned to
treatment and comparison groups.”
• “The original sample – those students were
tested at the beginning of Grade 1 – included 64
assigned to the SMART program and 63
assigned to the comparison group.”
Baker S, Gersten R, Keating T. (2000) When less may be more: A 2-year
longitudinal evaluation of a volunteer tutoring program requiring minimal training.
Reading Research Quarterly 35, 494-519.
23
Change in concealed allocation
[Bar chart: percentage of trials using concealed allocation, before 1997 (<1997) vs after 1996 (>1996), for drug trials (P = 0.04) and non-drug trials (P = 0.70); y-axis 0-50%]
NB: No education trial used concealed allocation
24
Example of unbalanced trial
affecting results
• Trowman and colleagues undertook a
systematic review to see if calcium supplements
were useful for helping weight loss among
overweight people.
• The meta-analysis of final weights showed a
statistically significant benefit of calcium
supplements. HOWEVER, a meta-analysis of
baseline weights showed that most of the trials
had ‘randomised’ lower weight people into the
intervention group. When this was taken into
account there was no longer any difference.
25
Meta-analysis of baseline body weight
Trowman R et al (2006) A systematic review of the effects of calcium
supplementation on body weight. British Journal of Nutrition 95, 1033-38.
26
Summary of assignment and
concealment
• Code for whether RCT or quasi-experiment (specify) or
other (specify)
• There is increasing evidence that subversion of random
allocation is a problem in randomised trials. The
'gold-standard' method of random allocation is the use
of a secure third-party method.
• Code whether or not the trial reports that an independent
method of allocation was used. Poor quality trials: use
sealed envelopes; do not specify allocation method; or
use allocation methods within the control of the
researcher (e.g., tossing a coin).
• Code for assignment discrepancies, e.g. failure to keep
to random allocation
27
5 minute break!
28
Other design issues
• Attrition (drop-out) can introduce selection bias
• Unblinded ascertainment (outcome
measurement) can lead to ascertainment bias
• Small samples can lead to Type II error
(concluding there is no difference when there is
a difference)
• Multiple statistical tests can give Type I errors
(concluding there is a difference when this is
due to chance)
• Poor reporting of uncertainty (e.g., lack of
confidence intervals).
29
Coding for other design
characteristics
• Code for attrition in intervention and control
groups
• Code for whether or not there is ‘blinding’ of
participants
• Code for whether or not there is blinded
assessment of outcome
• Code for whether or not the sample size is
adequate
• Code for whether the primary and secondary
outcomes are pre-specified
30
Blinding of participants and
investigators
• Participants can be blinded to:
» Research hypotheses
» Nature of the control or experimental condition
» Whether or not they are taking part in a trial
• This may help to reduce bias from resentful
demoralisation
• Investigators should be blinded (if possible) to
follow-up tests as this eliminates 'ascertainment'
bias – where, consciously or unconsciously,
investigators ascribe a better outcome than the
truth based on knowledge of the assigned groups.
31
Blinding of outcome assessment
• Code for whether or not post-tests were administered by
someone who is unaware of the group allocation.
Ascertainment bias can result when the assessor is not
blind to group assignment, e.g., a homeopathy study of
histamine showed an effect when researchers were not
blind to the assignment but no effect when they were.
• Example of outcome assessment blinding: Study “was
implemented with blind assessment of outcome by
qualified speech language pathologists who were not
otherwise involved in the project”
Cohen et al. (2005) J of Speech Language and Hearing Res. 48, 715-729.
32
Blinded outcome assessment
[Bar chart: percentage of trials using blinded outcome assessment, before 1997 (<1997) vs after 1996 (>1996), in health and education trials (P = 0.13 and P = 0.03); y-axis 0-40%]
Torgerson CJ, Torgerson DJ, Birks YF, Porthouse J. (2005) A comparison
of randomised controlled trials in health and education. British
Educational Research Journal,31:761-785.
33
Statistical power
• Few effective educational interventions produce
large effect sizes, especially when the comparator
group is an 'active' intervention. In a tightly
controlled setting, 0.5 of a standard deviation
difference at post-test is good. Smaller effect sizes
are to be expected in field trials (e.g., 0.25). To
detect an effect size of 0.5 with 80% power (sig =
0.05), we need 128 participants for an individually
randomised experiment (see the sketch below).
34
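
As a check on the sample size quoted above, a minimal sketch of the standard two-group calculation, assuming scipy is available. The exact normal-approximation formula gives 63 per group (~126 total); Lehr's shortcut of 16/d^2 per group gives the 64 + 64 = 128 on the slide.

    import math
    from scipy.stats import norm

    def n_per_group(d, alpha=0.05, power=0.80):
        # Two-sided test for a difference in means, standardised effect d
        z_a = norm.ppf(1 - alpha / 2)  # 1.96 for alpha = 0.05
        z_b = norm.ppf(power)          # 0.84 for 80% power
        return math.ceil(2 * ((z_a + z_b) / d) ** 2)

    print(2 * n_per_group(0.5))   # 126 in total, i.e. roughly the 128 quoted
    print(2 * n_per_group(0.25))  # 504 in total for a field-trial effect size
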
Percentage of trials underpowered
(n < 128)
[Bar chart: percentage of underpowered trials (n < 128), before 1997 (<1997) vs after 1996 (>1996), in health and education trials (P = 0.22 and P = 0.76); y-axis 0-90%]
Torgerson CJ, Torgerson DJ, Birks YF, Porthouse J. (2005) A comparison of
randomised controlled trials in health and education. British Educational
Research Journal,31:761-785.
35
Code for analysis issues
• Code for whether, once randomised, all
participants are included within their
allocated groups for analysis (i.e., was
intention to treat analysis used).
• Code for whether a single analysis is pre-specified before data analysis.
36
Attrition
• Attrition can lead to bias; a high quality trial will
have maximal follow-up after allocation.
• It can be difficult to ascertain the amount of
attrition and whether or not attrition rates are
comparable between groups.
• A good trial reports low attrition with no between
group differences.
• Rule of thumb: 0-5% attrition is not likely to be a problem;
6% to 20% is worrying; >20% suggests selection bias (a small
helper encoding this rule follows below).
37
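
A small helper encoding the rule of thumb above; the thresholds come from the slide, but the function itself is only an illustration, not a standard tool.

    def attrition_risk(n_randomised, n_followed_up):
        # Classify overall attrition using the slide's rule of thumb
        rate = 1 - n_followed_up / n_randomised
        if rate <= 0.05:
            return f"{rate:.0%} attrition: not likely to be a problem"
        if rate <= 0.20:
            return f"{rate:.0%} attrition: worrying"
        return f"{rate:.0%} attrition: possible selection bias"

    print(attrition_risk(200, 150))  # "25% attrition: possible selection bias"
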
Poorly reported attrition
• In an RCT of foster carers, extra training was
given.
» “Some carers withdrew from the study once the dates
and/or location were confirmed; others withdrew once
they realized that they had been allocated to the
control group” “117 participants comprised the final
sample”
• No split between groups is given, except in one
table, which shows 67 in the intervention group
and 50 in the control group: 25% more in the
intervention group. Unequal attrition is a hallmark
of potential selection bias. But we cannot be sure.
Macdonald & Turner, Brit J Social Work (2005) 35, 1265.
38
What is the problem here?
Random allocation
160 children in 20 schools (8 per school)
80 in each group
1 school (8 children) withdrew
N = 17 children replaced following
discussion with teachers
76 children allocated to control
76 allocated to intervention group
39
• In this example one school withdrew and pupils
were lost from both groups – unlikely to be a
source of bias.
• BUT 17 pupils were withdrawn by teachers as they
did not want them to have their allocated
intervention and these were replaced by 17 others.
• This WILL introduce bias into the experiment.
Note: such a trial should be regarded as a quasi-experiment.
40
What about matched pairs?
• It is sometimes stated that selection bias due to
attrition can be avoided using a matched pairs
design, whereby the survivor of a pair is
removed from the analysis (1).
• We can only match on observable variables and
we trust to randomisation to ensure that
unobserved covariates or confounders are
equally distributed between groups.
• Using matched pairs won’t remove attrition bias
from the unknown covariate.
(1) Farrington DP, Welsh BC. (2005) Randomized experiments in criminology:
What have we learned in the last two decades? Journal of Experimental
Criminology 1, 9-38.
41
Pairs matched on gender
Control (unknown covariate)    Intervention (unknown covariate)
Boy (high)                     Boy (low)
Girl (high)                    Girl (high)
Girl (low)                     Girl (high)
Boy (high)                     Boy (low)
Girl (low)                     Girl (high)
3 girls and 3 highs            3 girls and 3 highs
42
Drop-out of 1 girl
Control                        Intervention
Boy (high)                     Boy (low)
Girl (high)                    Girl (high)
Girl (low)                     Girl (high)
Boy (high)                     Boy (low)
(dropped out)                  Girl (high)
2 girls and 3 highs            3 girls and 3 highs
43
Removing matched pair does not
balance the groups!
Control                        Intervention
Boy (high)                     Boy (low)
Girl (high)                    Girl (high)
Girl (low)                     Girl (high)
Boy (high)                     Boy (low)
2 girls and 3 highs            2 girls and 2 highs
44
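
The three tables above can be replayed in a few lines of Python; this sketch re-counts each group after the drop-out and after removing her matched partner, confirming that balance on the matching variable (gender) is restored but balance on the unknown covariate is not.

    control = [("Boy", "high"), ("Girl", "high"), ("Girl", "low"),
               ("Boy", "high"), ("Girl", "low")]
    intervention = [("Boy", "low"), ("Girl", "high"), ("Girl", "high"),
                    ("Boy", "low"), ("Girl", "high")]

    def summary(group):
        girls = sum(1 for sex, _ in group if sex == "Girl")
        highs = sum(1 for _, cov in group if cov == "high")
        return f"{girls} girls, {highs} highs"

    control.pop()        # the Girl (low) in the control arm drops out
    intervention.pop()   # ...and her matched pair partner is removed
    print("Control:", summary(control))            # 2 girls, 3 highs
    print("Intervention:", summary(intervention))  # 2 girls, 2 highs
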
Intention to treat (ITT)
• Randomisation abolishes selection bias at baseline;
after randomisation some participants may cross over
into the opposite treatment group (e.g., fail to take the
allocated treatment or obtain the experimental
intervention elsewhere).
• There is often a temptation for trialists to analyse
the groups as treated rather than as randomised.
• This is incorrect and can introduce selection bias,
as the sketch below illustrates.
45
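
A minimal sketch of the contrast, with invented data: intention to treat analyses everyone in the arm they were randomised to, whereas an "as treated" analysis regroups the cross-overs and can reintroduce selection bias.

    from statistics import mean

    # Each participant: (allocated arm, arm actually received, outcome)
    participants = [
        ("intervention", "intervention", 12), ("intervention", "control", 5),
        ("intervention", "intervention", 11), ("control", "control", 7),
        ("control", "control", 6), ("control", "intervention", 13),
    ]

    def arm_means(key):
        groups = {}
        for allocated, received, outcome in participants:
            groups.setdefault(key(allocated, received), []).append(outcome)
        return {arm: round(mean(vals), 1) for arm, vals in groups.items()}

    print("ITT:       ", arm_means(lambda allocated, received: allocated))
    print("As treated:", arm_means(lambda allocated, received: received))
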
ITT analysis: examples
• Seven participants allocated to the control condition
(1.6%) received the intervention, whilst 65 allocated
to the intervention failed to receive treatment
(15%). (1) The authors, however, analysed by
randomised group (CORRECT approach)
• “It was found in each sample that approximately
86% of the students with access to reading
supports used them. Therefore, one-way ANOVAs
were computed for each school sample, comparing
this subsample with subjects who did not have
access to reading supports.” (2) (INCORRECT)
(1) Davis RG, Taylor BG. (1997) Criminology 35, 307-333.
(2) Feldman SC, Fish MC. (1991) Journal of Educational Computing Research 7, 25-36.
46
Unit of allocation
• Participants can be randomised individually (the
most usual approach) or as groups.
• The latter are known as cluster (or group, or place)
randomised controlled trials.
• Often it is not possible to randomise individuals,
for example:
» Evaluating training interventions on clinicians or
teachers and measuring outcomes on patients or
students;
» Spill over or contamination of the control group.
47
Clusters
• A cluster can take many forms:
» GP practice or patients belonging to an
individual practitioner;
» Hospital ward;
» School, class;
» A period of time (week; day; month);
» Geographical area (village; town; postal
district).
48
Code for quality of cluster trials
• Code for whether the participants were recruited
before the clusters were randomised – if not, this
could have led to selection bias.
• Individuals within clusters have outcomes that
are related, and this needs to be accounted for
both in the sample size calculation and the
analysis. Code for the following: did the trial
report its intracluster correlation coefficient
(ICC)? Did the analysis use some form of
statistical approach to take clustering into
account (e.g., cluster-level means, hierarchical
linear modelling, robust standard errors)?
49
What is wrong here?
• “the remaining 4 classes of fifth-grade
students (n = 96) were randomly assigned,
each as an intact class, to the [4]
prewriting treatment groups;”
Brodney et al. J Exp Educ 1999;68,5-20.
50
Insufficient cluster replication
• The key quality criterion of a cluster trial is
not the number of individual participants in
the study but the number of clusters.
• A cluster trial with only 1 cluster per group
cannot be thought of as a trial as it is
impossible to control for cluster level
confounders. At least 4 (some say 7)
clusters per group are needed to have
some hope of balancing out confounders.
51
Which is better?
• Cluster trial A: We randomised 10 schools with
500 children in each: 5 to the intervention and 5
to the control (i.e., 5,000 children in all);
OR
• Cluster trial B: We randomised 100 classes with
25 children in each: 50 to the control and 50 to
the intervention (i.e., 2,500 children in all).
• Trial B is better as it has 100 units of allocation
rather than 10, despite having 50% fewer
children (see the sketch below).
52
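
The comparison can be quantified with the design effect, 1 + (m - 1) * ICC for clusters of size m: dividing the raw sample size by it gives the effective sample size. The ICC of 0.02 below is an assumed illustrative value, not taken from the slides.

    def effective_sample_size(n_clusters, cluster_size, icc):
        # The design effect inflates the variance of a cluster trial;
        # effective n = raw n / design effect
        design_effect = 1 + (cluster_size - 1) * icc
        return n_clusters * cluster_size / design_effect

    icc = 0.02  # assumed for illustration
    print(round(effective_sample_size(10, 500, icc)))   # Trial A: ~455 from 5,000
    print(round(effective_sample_size(100, 25, icc)))   # Trial B: ~1,689 from 2,500
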
Selection bias in cluster
randomised trials
• Given enough clusters, selection bias should not
occur in cluster trials, as randomisation
will have dealt with this.
• HOWEVER, the clusters will be balanced at the
individual level ONLY if all eligible people, or a
random sample, within the cluster were included
in the trial.
• In some trials this does not apply as
randomisation occurred BEFORE recruitment.
This could have introduced selection bias.
53
Reviews of Cluster Trials
• Donner et al. (1990): 16 non-therapeutic intervention trials, 1979–1989 –
clustering allowed for in sample size: <20%; in analysis: <50%
• Simpson et al. (1995): 21 trials from American Journal of Public Health
and Preventive Medicine, 1990–1993 – sample size: 19%; analysis: 57%
• Isaakidis and Ioannidis (2003): 51 trials in Sub-Saharan Africa,
1973–2001 (half post 1995) – sample size: 20%; analysis: 37%
• Puffer et al. (2003): 36 trials in British Medical Journal, Lancet, and
New England Journal of Medicine, 1997–2002 – sample size: 56%; analysis: 92%
• Eldridge et al. (Clinical Trials 2004): 152 trials in primary health
care, 1997–2000 – sample size: 20%; analysis: 59%
54
Analysis
Many cluster randomised health care trials were improperly
analysed. Most analyses use t-tests or chi-squared tests,
which assume independence of observations – an assumption
that is violated in a cluster trial.
This leads to spurious p values and overly narrow CIs.
Various methods exist, e.g., multilevel models and comparing
means of clusters, which will produce correct estimates
(a minimal version is sketched below).
See a worked example at Martin Bland’s website:
http://www-users.york.ac.uk/~mb55/
55
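
A minimal sketch of the "comparing means of clusters" approach, with invented data and assuming scipy is available: summarising each cluster by a single mean restores the independence that the naive pooled t-test wrongly assumes.

    from statistics import mean
    from scipy.stats import ttest_ind

    # Outcome scores, one list per cluster (e.g., per school); invented data
    intervention = [[12, 14, 11], [15, 13, 16], [10, 12, 11], [14, 15, 13]]
    control = [[9, 11, 10], [12, 10, 11], [8, 9, 10], [11, 12, 10]]

    # Wrong: pool all individuals and ignore clustering (spurious precision)
    naive = ttest_ind(sum(intervention, []), sum(control, []))
    # Better: one summary mean per cluster, then a t-test on cluster means
    cluster_level = ttest_ind([mean(c) for c in intervention],
                              [mean(c) for c in control])
    print("naive p =", naive.pvalue)
    print("cluster-level p =", cluster_level.pvalue)
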
Survey of trial quality
Characteristic              Drug    Health    Education
Cluster randomised          1%      36%       18%
Sample size justified       59%     28%       0%
Concealed randomisation     40%     8%        0%
Blinded follow-up           53%     30%       14%
Use of CIs                  68%     41%       1%
Low statistical power       45%     41%       85%
Torgerson CJ, Torgerson DJ, Birks YF, Porthouse J. (2005) A comparison of
randomised controlled trials in health and education. British Educational
Research Journal,31:761-785. (based on n = 168 trials)
56
CONSORT
• Because the majority of health care trials
were badly reported, a group of health
care trial methodologists developed the
CONSORT statement, which indicates key
methodological items that must be
reported in a trial report.
• This has now been adopted by all major
medical journals and some psychology
journals.
57
The CONSORT guidelines, adapted for
trials in educational research
• Was the target sample size adequately determined?
• Was intention to treat analysis used? (i.e. were all children who
were randomised included in the follow-up and analysis?)
• Were the participants allocated using random number tables, coin
flip, or computer generation?
• Was the randomisation process concealed from the investigators?
(i.e. were the researchers who were recruiting children to the trial
blind to the child's allocation until after that child had been
included in the trial?)
• Were follow-up measures administered blind? (i.e. were the
researchers who administered the outcome measures blind to
treatment allocation?)
• Was precision of effect size estimated (confidence intervals)?
• Were summary data presented in sufficient detail to permit
alternative analyses or replication?
• Was the discussion of the study findings consistent with the
data?
58
Flow Diagram
• In health care trials reported in the main
medical journals, authors are required to
produce a CONSORT flow diagram.
• The trial by Hatcher et al. clearly shows
the fate of the participants after
randomisation until analysis.
59
Flow Diagram
635 children in 16 schools screened using group spelling test
→ 118 children with poor spelling skills given individual tests of
vocabulary, letter knowledge, word reading and phoneme manipulation
→ 2 schools excluded due to insufficient numbers of poor spellers
→ 84/118 children in 14 remaining schools (6 per school) selected for
randomisation to interventions (9 children excluded due to behaviour)
→ 1 school (6 children) withdrew from study after randomisation
→ 20-week intervention: 39/42 children in 13 remaining schools
allocated; 39/42 children included
→ 10-week intervention: 39/42 children in 13 remaining schools
allocated; 1 child left study (moved school); 38/42 children included
Hatcher et al. 2005 J Child Psych Psychiatry: online
60
Year 7 pupils (N = 155) randomised
→ ICT group: N = 77; 3 left school; 70 valid pre-tests; 67 valid
post-tests; 63 valid pre- and post-tests
→ No ICT group: N = 78; 1 left school; 75 valid pre-tests; 71 valid
post-tests; 67 valid pre- and post-tests
61
Dr Carole Torgerson
Senior Research Fellow
Institute for Effective Education
University of York
[email protected]
62