Transcript Slide 1

Sample size issues & Trial
Quality
David Torgerson
Chance
• When we do a trial we want to be sure that any
effect we see is not simply by chance.
• The probability of any difference occurring by
chance declines as the sample size increases.
• Also, as the sample size increases, smaller
differences become statistically significant.
Statistical significance
• A p value of 0.05 means that, if there were
really no difference and we repeated the same
trial 100 times, we would expect to see a
difference as large as the one observed about
5 times simply by chance (see the sketch below).
• A small p value does not indicate the strength
of the association: even a small difference will
have a small p value if the sample size is large.
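A minimal simulation sketch of this point (the group sizes and the use of a t-test are illustrative assumptions, not from the slides): when there is no true difference, roughly 5% of trials still produce p < 0.05.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_sims, false_positives = 1000, 0
for _ in range(n_sims):
    control = rng.normal(0.0, 1.0, 50)  # 50 per group, purely illustrative
    treated = rng.normal(0.0, 1.0, 50)  # same distribution: no true effect
    if ttest_ind(control, treated).pvalue < 0.05:
        false_positives += 1

print(false_positives / n_sims)  # close to 0.05, as the slide describes
```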
Power
• As well as significance there is the issue of
power.
• A sample size might have 80% power to
detect a specified difference at the 5% level.
In other words, for a given sample size we
would have an 80% probability of observing
an effect, if it exists, at 5% significance
(see the sketch below).
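To make the definition concrete, here is a hedged sketch using the standard normal approximation for a two-group comparison of means (textbook material, not taken from the slides):

```python
from math import sqrt
from scipy.stats import norm

def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means
    (normal approximation)."""
    z_alpha = norm.ppf(1 - alpha / 2)          # critical value (about 1.96)
    ncp = effect_size * sqrt(n_per_group / 2)  # expected test statistic
    return norm.cdf(ncp - z_alpha)

# 64 per group (128 in total) with a 0.5 SD effect gives roughly 80% power,
# matching the worked example later in these slides:
print(round(approx_power(0.5, 64), 2))  # about 0.81
```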
Sample Size
• Don’t believe statistics textbooks. A trial can
NEVER be too big. Most trials in education use
tiny sample sizes (e.g., 30 participants).
• A small trial will miss an important difference.
For example, if a school-based intervention
increased exam pass rates by 10% this would
have very important benefits to society, BUT we
would need a trial of at least 800 participants in
an individually randomised trial to observe this
effect.
Sample Size
• Small trials will miss important differences.
• Bigger is better in trials.
• Ask why the number was chosen. A good
justification: “given an incidence of 10% we
wanted to have 80% power to show a halving
to 5%”; a poor one: “we enrolled 100
participants”.
What is a reasonable
difference?
• In their 1993 review of quasi-experiments
across the social sciences, Lipsey and Wilson
found that few effective interventions had
effect sizes greater than 0.5. Health care
produces similar gains of 0.5 of a standard
deviation or lower.
Lipsey & Wilson, 1993. American Psychologist 48: 1181-1209.
Effect size & Sample size
• We should, therefore, plan trials that are
large enough to identify a difference of 0.5
of a standard deviation between the two
experimental groups if it exists.
Who needs a statistician?
• A simple way to calculate a total sample size
is to take 32 (for 80% power; 42 for 90%)
and divide this by the square of the effect size
(see the sketch below).
• 0.5 squared is 0.25, and 32/0.25 = 128.
• For an effect size of 0.25 the answer is 512:
note that halving the effect size quadruples
the sample size.
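A minimal sketch of this rule of thumb (it is essentially Lehr's approximation for a two-sided test at the 5% level; the function name is mine):

```python
def rule_of_thumb_total_n(effect_size, power=0.80):
    """Total sample size (both groups combined): 32/ES^2 for 80% power,
    42/ES^2 for 90% power, two-sided test at the 5% level."""
    numerator = {0.80: 32, 0.90: 42}[power]
    return numerator / effect_size ** 2

print(rule_of_thumb_total_n(0.5))   # 128.0
print(rule_of_thumb_total_n(0.25))  # 512.0: halving the effect quadruples n
```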
Cluster sample size
• Because in education we often randomise by
classes or schools, we need to take the
correlation between pupils into account in
sample size calculations. This can often
double, or more than double, the required
number of participants (see the sketch below).
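The usual adjustment multiplies the individually randomised sample size by the design effect, 1 + (m - 1) * ICC, where m is the average cluster size and ICC is the intraclass correlation between pupils. A sketch with purely illustrative values:

```python
def design_effect(mean_cluster_size, icc):
    """Inflation factor for cluster randomisation: 1 + (m - 1) * ICC."""
    return 1 + (mean_cluster_size - 1) * icc

# Assumed values for illustration: classes of 25 pupils, ICC = 0.05
deff = design_effect(25, 0.05)  # 2.2
print(round(128 * deff))        # 282: the 128 from the rule of thumb more than doubles
```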
Reporting Quality of Trials
• As well as having an adequate sample
size there are other important aspects of
trial quality.
Important quality items
• Allocation method:
» method of randomisation;
» secure randomisation.
• Intention to treat analysis.
• Blinding.
• Attrition.
Blinding
• Who knew who got what, and when?
• Was the participant blind?
• Most IMPORTANT: was the outcome
assessment blind?
Attrition
• What was the final number of participants
compared with the number randomised?
• What happened to those lost along the
way?
• Was there equal attrition?
External Validity
• Once a trial has high internal validity, our next
task is to assess whether its results are
applicable outside its sample.
• Are the participants similar to the general
population to whom we would apply the
intervention?
• Was the intervention that was used generalisable?
Methods comparison of trials
• We undertook a methodological review of
RCTs in health and education to answer
the following questions:
» Were poor-quality trials confined to health care?
» Was methodological quality improving over
time?
Torgerson et al., 2004. BERJ (accepted).
Study Characteristics
Characteristic             Drug    Health   Education
Cluster randomised           1%      36%       18%
Sample size justified       59%      28%        0%
Concealed randomisation     40%       8%        0%
Blinded follow-up           53%      30%       14%
Use of CIs                  68%      41%        1%
Low statistical power       45%      41%       85%
Change in concealed allocation
[Bar chart: percentage of trials with concealed allocation, <1997 vs >1996. Drug: P = 0.04; No Drug: P = 0.70.]
NB: no education trial used concealed allocation.
Blinded Follow-up
[Bar chart: percentage of trials with blinded follow-up, <1997 vs >1996. Drug: P = 0.54; Health: P = 0.13; Education: P = 0.03.]
Underpowered
[Bar chart: percentage of underpowered trials, <1997 vs >1996. Drug: P = 0.01; Health: P = 0.76; Education: P = 0.22.]
Mean Change in Items
[Bar chart: mean change in quality items per trial, <1997 vs >1996. Drug: P = 0.001; No Drug: P = 0.07; Education: P = 0.03.]
Quality conclusions
• Apart from ‘drug’ trials, the quality of health
care trials is poor and is not improving
outside the major journals.
• Education trials are bad and getting worse!
Trial Examples
• CBT vs Fire Safety Education (FSE) for
child arsonists.
• N = 38 boys randomised to receive FSE or
CBT.
• Outcomes measured at 13 weeks and 12
months included firesetting and match-play
behaviour.
Kolko, 2001. J Child Psychiatr 42: 359.
Results
• Outcomes were mixed: some favoured CBT,
whilst for others there was no difference.
Results
[Bar chart comparing outcome percentages for the FSE and CBT groups.]
Problems with Trial
• Too SMALL.
• Trial could have missed very important
differences.
• Outcomes were NOT arson: there were no
reports of arson by any children in the
study.
• Unsure whether randomisation was
concealed.
Domestic violence experiment
• 404 men convicted of partner abuse were
randomised to probation or counselling.
• Data were collected at 12 months on re-arrests
and on beliefs and behaviours concerning partner
abuse.
Feder & Dugan, 2002. Justice Quarterly 19: 343.
Results
• No difference in re-offending as measured
by re-arrest statistics (i.e., 24% in both
groups).
• No differences in attitudes towards partner
abuse.
Trial Methods
• The trial was relatively large (> 200 in each group)
and would have had enough power to detect a halving
of offending, from 24% down to 12% (see the check below).
• For ‘beliefs’ there was a high drop-out rate (50%),
which may make those results unreliable.
• Allocation appeared to be secure.
• Cross-over was slight.
• Unclear whether the re-arrest data were collected
‘blindly’.
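As a rough check on the power claim above (a sketch using Cohen's arcsine effect size for two proportions; the helper is mine, not from the paper):

```python
from math import asin, sqrt
from scipy.stats import norm

def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power for a two-sided comparison of two proportions."""
    h = 2 * (asin(sqrt(p1)) - asin(sqrt(p2)))  # Cohen's h
    z_alpha = norm.ppf(1 - alpha / 2)
    return norm.cdf(abs(h) * sqrt(n_per_group / 2) - z_alpha)

# Halving re-arrests from 24% to 12% with about 200 per group:
print(round(power_two_proportions(0.24, 0.12, 200), 2))  # about 0.89
```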
Conclusion
• Counselling is probably an ineffective
method of trying to prevent spousal abuse.
• Other interventions should be sought.
• Message: if you’re being battered by your
spouse, don’t bother with counselling!
Preventing unscheduled school
transfers
• Unscheduled school transfers are
associated with poor academic outcomes.
The Raising Healthy Children project in
Seattle aimed to put in place
interventions among high-risk students
who exhibit academic or behavioural
problems.
Fleming et al., 2001. Evaluation Review 25: 655.
Design
• Cluster randomised trial
• 10 schools randomised. 5 experimental
schools received a variety of interventions
to help high risk students and their
families.
• Analysis was multilevel to take clustering
into account.
Results
• The intervention produced a statistically
significant reduction of two-thirds in the
transfer rate: 61% versus 45%, a difference of
16 percentage points. NOTE that the intervention
schools still had a high transfer rate.
• Also, the effects of the intervention waned over
the 5 years, suggesting it would need to be
continuous to be effective.
Study implications
• The study showed an effective intervention,
but the number of clusters (10) was on the
small side; ideally there should have been
more, as there was a high chance of missing
a smaller effect.
Conclusions
• The RCT is the BEST evaluative method.
• RCTs can be, and have been, done in the
field of education.
• We need MORE, larger, better quality
trials to inform future policy in this area.