Experimental Design


Causality and Experimental Design
Doctor of Education (EdD)
Analysing, Interpreting and Using Educational Research (Research Methodology)
University of Durham

Dr Robert Coe
University of Durham School of Education
Tel: (+44 / 0) 191 334 4184
Fax: (+44 / 0) 191 334 4180
E-mail: [email protected]
http://www.dur.ac.uk/r.j.coe
Causal claims

Examples:
- “A causes B”
- “B is affected by A”
- “A influences B”
- “A improves B”
- “B benefits from A”
- “A results in B”
- “A prevents B”
- “The effect of A on B …”
© 2005 Robert Coe, University of Durham

If you do A, B will result (provided X, Y, Z)
- A must be something you can choose to do (‘manipulable’)
- B would not result if you didn’t do A (How can you ever know this?)
Can we do without ‘causality’?
- Very hard to avoid (even in ‘interpretative’ writing)
- No causality → no predictability
- You can have it both ways:
  - There are general laws, but we can choose
  - Causal laws are probabilistic – better applied to groups than individuals
  - Just because we actively interact with our environment and interpret situations does not mean our behaviour is not (partially) predictable
  - Human behaviour is unpredictable, but also predictable – the trick is to predict the bits you can
ImpaCT2: ICT and achievement
What the report said:
- “evidence of a positive relationship between ICT use and achievement” (p2)
- “pupils characterised as high ICT users outperformed, on average, low ICT users in English and mathematics [at KS2]” (p11)
What the press release said:
- “A new independent research report shows that computers can help to raise standards in schools”
Questions
- What is the difference between (1) “a positive relationship” and (2) “computers can help to raise standards”?
- How might the first be true and not the second? (List as many possible reasons as you can think of)
- How could you test (2) even if you knew that (1) was true? (To do this you will need to define exactly what you think (2) means)
What is ‘evidence-based’?
Knowledge about ‘what works’ must come from …
- Intervention, not description
- Evaluation, not common sense
Effective Teachers (according to Hay McBer) …
- Set high expectations
- Are good at planning
- Employ a variety of teaching strategies
- Have a clear strategy for pupil management
- Manage time and resources wisely
- Employ a range of assessment methods
- Set appropriate homework
- Keep pupils on task
(For which the DfEE paid £3m)
How to get rich (according to John D. Rockefeller)
- Go to bed early
- Rise early
- Strike oil
(This advice was given for free)
The need for high expectations
- Everybody knows (Ofsted, school effectiveness research) teachers should have high expectations
- ‘Pygmalion’ effect (Rosenthal and Jacobson, 1968)
- Hugely influential, but flawed
- Is it an effective intervention for teaching?
  - No. For experienced teachers who know their students, the ‘expectation effect’ is zero (Raudenbush, 1984).
  - ‘High expectations’ may not be easily alterable, or may be the effect rather than the cause
Pre-experimental design
1 group: ‘Treatment’ → Outcome 1
1. You did A
2. B followed
⇒ 3. A caused B
Example 1: Mentoring
- In England, part of the KS3 Strategy
- Backed by Government and private funding
- ‘Mentoring’ means a lot of different things
- Research evidence is case studies:
  - Feelings and perceptions of participants
  - Completely inadequate to infer impact
Neil Appleby’s Experiment
- A randomised controlled trial involving 20 underachieving Y8 (12-13 year-old) students
- Paired and split into two groups
- Mentored group had 20 mins individually every two weeks
  - ‘It nearly killed me’
- Cost estimated at £250 per mentored pupil
  - Approx 20% of the school’s annual per-pupil funding
What the teachers said about the mentored students …
- “**** is a changed person this year she has progressed greatly and is a superb helpful student.”
- “Better now, has achieved more, more confident.”
- “Generally a great improvement recently.”
- “****’s attitude and effort have improved over the year. He is a lot pleasanter and more willing to participate in lessons particularly oral work, he responds well to praise.”
What they said about the control group …
- “Has improved overall this term.”
- “****’s attitude and effort have improved over the last few months, she is now trying very hard to achieve her target. Great effort.”
- “Commended for attitude and progress.”
- “**** has settled since the beginning of the year.”
- “**** has undergone quite a transformation since September. Her attitude towards the teacher and her learning have improved drastically and she should be congratulated.”
What this proves
- If you identify a group of underachieving pupils at a particular time and then come back to them after a few months, many of them will have improved, whatever you did.
- Others (the ‘hard cases’) will not have improved, whether mentored or left alone.
- The interpretation of this would have been very different without a ‘control’ group
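The point above is ‘regression to the mean’, and a minimal simulation makes it concrete (an illustration only; the numbers and the pupil model are assumptions, not data from the mentoring study). Pupils selected as ‘underachieving’ on one noisy test score higher, on average, when retested, even though nothing was done to them.

```python
# Illustrative simulation (assumed numbers): regression to the mean.
# Pupils are selected as 'underachieving' on a noisy first test; on a
# second test, the selected group improves with no intervention at all.
import random

random.seed(1)

N = 1000
true_ability = [random.gauss(50, 10) for _ in range(N)]

def noisy_score(ability):
    """One test sitting: true ability plus measurement noise."""
    return ability + random.gauss(0, 10)

test1 = [noisy_score(a) for a in true_ability]
test2 = [noisy_score(a) for a in true_ability]

# Select the 'underachievers': the bottom 10% on the first test.
cutoff = sorted(test1)[N // 10]
selected = [i for i in range(N) if test1[i] < cutoff]

mean1 = sum(test1[i] for i in selected) / len(selected)
mean2 = sum(test2[i] for i in selected) / len(selected)

print(f"Selected group, test 1 mean: {mean1:.1f}")
print(f"Selected group, test 2 mean: {mean2:.1f}")  # higher, untreated
```

The selected pupils were partly unlucky on test 1, and that bad luck does not repeat on test 2, which is exactly why an untreated control group is needed before crediting the mentoring.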
Quasi-experimental design
Experimental (Group 1): ‘Treatment’ (intervention) → Outcome 1 (post-test)
Control (Group 2): No ‘treatment’ → Outcome 2 (post-test)
1. The two groups were the same at the beginning
2. You treated them differently
3. They were different at the end
⇒ 4. The treatment caused the difference
Example 2: Class size
- Do pupils in small or large classes get better results?
- Overall, large classes do better, because:
  - Top sets are large, bottom sets are small
  - Pupils who misbehave are put in small classes
  - Socially advantaged schools are more popular and hence need to have larger classes
  - Socially disadvantaged schools are given more money, so provide smaller classes
  - Schools that are popular with pupils also attract the best teachers
- Pupils in large classes are not the same as those in small classes (and would have been different, regardless of class size)
Quasi-experimental design (with pre-test)
Group 1: Pre-test → ‘Treatment’ → Outcome 1
Group 2: Pre-test → No ‘treatment’ → Outcome 2
- Pre-test does not guarantee equivalence, it just tells you whether they are equivalent
- If they are not equivalent, interpretation is quite difficult
- Even if pre-test scores are well matched, the groups may not actually be the same
Example 3: Graduate earnings
- Graduates earn an extra £400 000 over a lifetime as a result of going to university
18 year olds, with qualifications at 18 as the pre-test measure:
- Go to university → £££
- Don’t go to university → £
Randomised (true) experimental design
Randomised Controlled Trial (RCT)
1 group → random allocation:
- ‘Treatment’ → Outcome 1
- No ‘treatment’ → Outcome 2
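The design above can be sketched in a few lines of code (a hypothetical illustration; the group sizes, outcome model and effect size are assumptions, not data from any study). Random allocation means the two groups differ only by chance, so the difference in mean outcomes is an unbiased estimate of the treatment effect.

```python
# Sketch of a randomised controlled trial (all numbers assumed).
import random

random.seed(42)

pupils = list(range(20))       # e.g. 20 pupils, as in the mentoring trial
random.shuffle(pupils)
treatment = pupils[:10]        # randomly allocated to 'treatment'
control = pupils[10:]          # randomly allocated to no 'treatment'

def outcome(pupil, treated, effect=0.0):
    """Hypothetical outcome: everyone tends to improve a little anyway;
    the treatment adds a (possibly zero) effect on top."""
    baseline_change = random.gauss(2, 5)   # improvement that happens regardless
    return baseline_change + (effect if treated else 0.0)

treat_scores = [outcome(p, True, effect=3.0) for p in treatment]
ctrl_scores = [outcome(p, False) for p in control]

diff = sum(treat_scores) / len(treat_scores) - sum(ctrl_scores) / len(ctrl_scores)
print(f"Estimated treatment effect: {diff:+.1f}")
```

With only 10 per group the estimate is noisy, which is why the later slides on sampling variation and statistical power matter.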
Example 4: Support for ‘at-risk’ youngsters
- Cambridge-Somerville Youth Study (US, WWII)
- Aim: to reduce delinquency
- 650 ‘difficult’ & ‘average’ boys aged 5-11, half randomly allocated to receive support (home visits from social workers, psychologists and psychiatrists, doctors, tutors, up to twice monthly for five years, summer camps, etc)
- At the end of the project, most said it was helpful, but no clear differences between the two groups
30 years later …
- Two thirds said it was helpful; e.g. kept them off the streets/out of jail, made them more understanding, showed someone cared.
- 57 ‘objective’ comparisons of criminal behaviour, health, family, work and leisure time, beliefs and attitudes.
- Those who had received the support were no better on any of these outcomes
- 7 comparisons showed significant advantage to the control group
Percentage who had experienced ‘undesirable outcomes’ (eg death, criminal conviction, psychiatric disorder):
[Bar chart comparing Treatment and Control groups; y-axis 0-45%]
Those ‘supported’ were more dissatisfied with life, work and marriage (McCord, 1981)
The logic of experimental design
1. The two groups were the same at the beginning
2. You treated them differently
3. They were different at the end
Hence, the treatment caused the difference (internal validity)
4. A similar result would be found elsewhere (external validity)
Threats to causal inference 1: Lack of valid comparison
- No sense of what would have happened otherwise
  - No control/comparison group
  - Non-randomised control group:
    - Non-equivalence initially
    - Volunteer / selection effects
    - Statistical regression
    - Overcompensation by covariance analysis
  - Random allocation that has not worked
Threats to causal inference 2: Unintended treatments
- The effect (or lack of it) was due to something other than the intended treatment
  - Reactivity (to pre-test or other measures)
  - Experiment effects (Hawthorne effects, competition/demoralisation between groups, contamination of treatments, expectation/apprehension effects)
  - Implementation fidelity (treatment not delivered as intended)
  - Control group intervention not specified
  - Extraneous factors (history, maturation)
Threats to causal inference 3: Incorrect interpretation
- An apparent effect (or lack of it) is not what it seems
  - Inappropriate timing of outcome measures
  - Invalid outcome measures (inappropriate, biased, too specific, too broad, wrong interpretation)
  - Measurement issues (reliability, ceiling/floor effects, interval scales)
  - Attrition (missing data, drop-out, loss to follow-up)
  - Statistical power (sampling variation)
  - Post hoc bias (fishing/opportunism, selective analysis and reporting)
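The ‘statistical power (sampling variation)’ threat can be illustrated with a quick simulation (assumed numbers, not from the lecture): the same true effect is estimated from trials of different sizes, and small trials give wildly variable estimates, so a genuine effect is easily missed or exaggerated.

```python
# Illustrative simulation (assumed numbers): sampling variation.
# The true effect is fixed; only the group size changes.
import random
import statistics

random.seed(7)

def run_trial(n_per_group, true_effect=0.3):
    """One simulated trial: returns the estimated effect (difference in means)."""
    control = [random.gauss(0, 1) for _ in range(n_per_group)]
    treated = [random.gauss(true_effect, 1) for _ in range(n_per_group)]
    return statistics.mean(treated) - statistics.mean(control)

for n in (10, 100, 1000):
    estimates = [run_trial(n) for _ in range(200)]
    print(f"n={n:4d}: mean estimate {statistics.mean(estimates):+.2f}, "
          f"sd {statistics.stdev(estimates):.2f}")
```

The spread (sd) of the estimates shrinks roughly with the square root of the sample size, which is why a small trial that ‘finds nothing’ is weak evidence of no effect.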
Threats to causal inference 4: Failure to generalise
- Similar results would/could not be found again in other contexts
  - Representativeness (context & population must be well-defined; sample - intended and actual - must be representative of this population)
  - Replicability (intervention must be feasible, well-defined, replicable)
  - Scale (differences between small-scale experiment and large-scale policy)
  - Transfer (results may not apply in other contexts)
Experiments are unethical?
- It is unethical not to evaluate practices and policies
- Teachers ‘experiment’ anyway
- Need not have a ‘no treatment’ control group
- Can compensate the control group in other ways
- Sometimes treatments are withheld anyway
- Only randomise at the borderline
Making evidence-based decisions
- The problem (and possible solutions) must be ‘generalisable’ (= ‘policy’)
- Agreement about outcomes?
- What evidence exists already?
  - Theory (formal and informal)
  - Experience
  - Research
  - Systematic reviews of research
- Conduct an experiment
Teaching is more complex
[Bar chart: Average frequency of activities in class, Geology departments; frequency scale 1 ‘Never or almost never’, 2 ‘Once a term’, 3 ‘Once a month’, 4 ‘Once a fortnight’, 5 ‘Once or twice a week’, 6 ‘Every lesson’, with classroom activities along the x-axis]