
The following lecture has been approved for University Undergraduate Students.
This lecture may contain information, ideas, concepts and discursive anecdotes that may be thought-provoking and challenging.
It is not intended for the content or delivery to cause offence.
Any issues raised in the lecture may require the viewer to engage in further thought, insight, reflection or critical evaluation.
Validity of Research
Threats to Validity
Dr. Craig Jackson
Senior Lecturer in Health Psychology
School of Health and Policy Studies
Faculty of Health & Community Care
University of Central England
[email protected]
Validity
Important consideration
Example project:
access to 300 workers
workers’ ability is assessed
workers attend a 1 week training course
workers’ ability is assessed again
classic within-subjects design (pre-post test design)
Design Concept - Between-subjects method
300 subjects randomised:
control group (n = 150): assess ability -> control results
intervention group (n = 150): attend training course, assess ability -> intervention results
compare mean scores between groups
Design Concept - Within-subjects method - better
300 subjects
assess ability #1 (all 300 act as the control group)
training course
assess ability #2 (the same 300 now form the treatment group)
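The two designs above can be compared in a short simulation. This is a minimal sketch, assuming a true training effect of +10 points on a 0-100 ability scale (all numbers invented):

```python
import random
import statistics

random.seed(1)

TRUE_EFFECT = 10  # assumed true training gain, in ability points

# 300 workers with normally distributed baseline ability
workers = [random.gauss(60, 10) for _ in range(300)]
random.shuffle(workers)

# Between-subjects: randomise into control (n=150) and intervention (n=150)
control = workers[:150]
intervention = [score + TRUE_EFFECT for score in workers[150:]]
between_estimate = statistics.mean(intervention) - statistics.mean(control)

# Within-subjects: all 300 are tested, trained, then tested again
pre = workers
post = [score + TRUE_EFFECT for score in workers]
within_estimate = statistics.mean(b - a for a, b in zip(pre, post))

print(round(between_estimate, 1))  # noisy estimate of the +10 effect
print(round(within_estimate, 1))   # each worker is their own control
```

In this noiseless-gain sketch the within-subjects estimate recovers the effect exactly, while the between-subjects estimate carries sampling noise from splitting the group; real data would add measurement noise to both.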
Threats to within-subjects designs
[Graph: test scores (0-100) rise from test #1 to test #2 - an observed gain after the training course]
student concludes the outcome (improvement) is due to training
could this be wrong?
some threats to internal validity that critics (examiners) might raise
and some plausible alternative explanations for the observed effects
History threats
Some “historical” event caused increase – not the training
TV & other media
Sesame Street, Countdown, Tomorrow’s World, Open University
Elementary intellectual content
Can be mundane or extraordinary
“Specific event / chain of events”
British Journal of Psychiatry (2000) 177: pp469-72
Maturation threats
“Age is the key to wisdom”
Improvement would occur without any training course
Measuring natural maturation / growth of understanding
Effects up to a certain limit
Differential maturation
Similar to “history threat”?
Testing threats
Specific to pre-post test designs
Taking a test can increase knowledge
Taking test #1 may teach participants
Priming – taking test #1 makes participants ready for the training in a way they otherwise would not be
Heisenberg’s Uncertainty Principle (1927)
Instrumentation threats
Specific to pre-post test designs
“Making the goals bigger”
Taking a test twice can increase knowledge
To avoid testing threats, studies do not use the same test twice
But perhaps the 2 versions of the test are not really similar
The instrument then causes the changes, not the training course
Instrumentation threats (further)
Specific to pre-post test designs
Especially likely with human “instruments”
Observations or Clinical assessment
3 Factors
Observers fatigue over time
Observers improve over time
Different observers
Mortality threats
“Mortality” here is metaphorical: it means dropping out of the study
Obvious problem?
Especially when drop out is non-trivial
N = 300 take test #1
N =50 drop-out after taking test #1
N = 250 remain and take test #2
What if the drop-outs were low-scorers on test #1? (self-esteem)
Mortality threats (further)
Mean gain from test #1 to test #2
Using all of the scores available on each occasion
Includes 50 low test #1 scorers (soon-to-be-dropouts) in the test #1 score
Mean score:
Test #1 (n = 300): 60.5 (± 9.7)
Test #2 (n = 250): 81.6 (± 8.9)
Problem: the drop-out removes the potential low scorers from test #2
This inflates the mean test #2 score over what it would be if the poor scorers had taken it
Solution: compare mean test #1 and test #2 scores for only those workers who stayed in the whole study (n = 250)?
No: such a sub-sample would certainly not be representative of the original sample
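The inflation above can be reproduced in a few lines. A hypothetical illustration, assuming a uniform true gain of 20 points and that exactly the 50 lowest test #1 scorers drop out:

```python
import random
import statistics

random.seed(2)

# Everyone truly gains 20 points, but the 50 lowest test #1 scorers drop out
true_gain = 20
test1 = [random.gauss(60, 10) for _ in range(300)]

dropouts = set(sorted(test1)[:50])             # 50 lowest test #1 scores leave
stayers = [s for s in test1 if s not in dropouts]

mean_test1_all = statistics.mean(test1)                                # n = 300
mean_test2_observed = statistics.mean(s + true_gain for s in stayers)  # n = 250
mean_test2_full = statistics.mean(s + true_gain for s in test1)        # n = 300

print(round(mean_test2_observed - mean_test1_all, 1))  # inflated apparent gain
print(round(mean_test2_full - mean_test1_all, 1))      # the true gain, 20.0
```

The observed gain exceeds the true gain purely because the low scorers are missing from the second mean, not because the training worked any better.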
Mortality threats (further)
Degree of this threat gauged by comparison
Compare the drop-out group (n = 50) with the non drop-out group (n = 250)
e.g.
using test #1 scores
demographic data – especially age & sex
If no major differences between groups:
Reasonable to assume mortality occurred across entire sample
Reasonable to assume mortality was not biasing results
Depends greatly on size of mortality N
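The comparison described above can be sketched as a standardised group difference. A minimal example, assuming dropout happens at random (so the groups should look similar on test #1):

```python
import random
import statistics

random.seed(3)

# Gauge the mortality threat: compare dropouts (n = 50) with stayers (n = 250)
test1 = [random.gauss(60, 10) for _ in range(300)]
random.shuffle(test1)
dropouts, stayers = test1[:50], test1[50:]

diff = statistics.mean(dropouts) - statistics.mean(stayers)
cohens_d = diff / statistics.pstdev(test1)   # standardised group difference

print(round(diff, 2), round(cohens_d, 2))
```

A small standardised difference (conventionally |d| < 0.2) supports the assumption that mortality was not biasing the results; the same comparison can be run on age, sex and other demographic data.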
Regression threats
Things can only get better – things can only get worse
“Regression artefact”
“Regression to the mean”
Purely statistical phenomenon
Occurs whenever there is:
a non-random (extreme) sample from a population, and
two measures that are imperfectly correlated with each other
(e.g. test #1 and test #2 scores)
Regression threats
Few measurements stay exactly the same – confusing?
e.g.:
If a training program only includes people who are in the lowest 10% of the class on test #1, what are the chances that they would constitute exactly the lowest 10% on test #2?
Not very likely!
Most of them would score low on the post-test, but they are unlikely to be the lowest 10% twice!
As the lowest 10% on test #1, they can't get any lower than being the lowest: they can only go up from there, relative to the larger population from which they were selected
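This artefact is easy to demonstrate with a simulation in which there is no training effect at all. Each test score is a stable true ability plus fresh measurement noise, so test #1 and test #2 are imperfectly correlated (all numbers hypothetical):

```python
import random
import statistics

random.seed(4)

# Stable ability, plus independent measurement noise on each testing occasion
ability = [random.gauss(50, 10) for _ in range(300)]
test1 = [a + random.gauss(0, 8) for a in ability]
test2 = [a + random.gauss(0, 8) for a in ability]   # NO training effect

# Select the lowest 10% on test #1, as the hypothetical program does
lowest = sorted(range(300), key=lambda i: test1[i])[:30]
mean1 = statistics.mean(test1[i] for i in lowest)
mean2 = statistics.mean(test2[i] for i in lowest)

print(round(mean1, 1), round(mean2, 1))  # mean2 moves back toward the overall mean
```

The selected group's second mean rises toward the population mean even though nothing changed: an apparent "gain" produced purely by regression to the mean.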
Summary of single-group threats
History threats
Maturation threats
Testing threats
Instrumentation threats
Mortality threats
Regression threats
Multiple Group threats
Comparison of 2 different methods
Training course to aid factory workers in living a healthy lifestyle
Example of an MSc project:
Student has access to 300 workers
1. Workers’ lifestyle is assessed (test #1)
2a. 50% of workers attend a 1-week healthy lifestyle program
2b. 50% of workers are shown healthy lifestyle software
3. Workers’ lifestyle is assessed again (test #2)
Design Concept – Between- and Within-subjects method
Randomisation of 300 subjects:
software group (n = 150): complete lifestyle assessment (test #1) -> trained on software -> complete lifestyle assessment (test #2)
training course group (n = 150): complete lifestyle assessment (test #1) -> attend training course -> complete lifestyle assessment (test #2)
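The analysis for this mixed design can be sketched as follows: each group is measured twice (the within-subjects part), and the two groups' gain scores are then compared (the between-subjects part). The effect sizes below are invented for illustration:

```python
import random
import statistics

random.seed(5)

def simulate_group(n, true_gain):
    # Pre-test scores, plus a true gain and fresh noise at post-test
    pre = [random.gauss(50, 10) for _ in range(n)]
    post = [p + true_gain + random.gauss(0, 5) for p in pre]
    return pre, post

software_pre, software_post = simulate_group(150, true_gain=5)   # assumed
course_pre, course_post = simulate_group(150, true_gain=12)      # assumed

def mean_gain(pre, post):
    return statistics.mean(b - a for a, b in zip(pre, post))

software_gain = mean_gain(software_pre, software_post)
course_gain = mean_gain(course_pre, course_post)

# The between-groups comparison of interest is the difference in mean gains
print(round(software_gain, 1), round(course_gain, 1),
      round(course_gain - software_gain, 1))
```

Comparing gain scores rather than raw post-test scores uses each worker as their own control while still allowing the two programs to be compared.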
Factory workers and Healthy Lifestyle Training
What does the graph show?
[Graph: Healthy Lifestyle Score (HLS), 0-100, for both groups at Test #1 and Test #2]
Selection comparability threats
What if there is:
an overall change from test #1 to test #2
level of change different between the two groups?
Student concludes:
the difference in outcomes is due to the two different lifestyle programs.
How could this be wrong?
Key validity issue:
the degree to which the groups are comparable
before the study.
If the groups are comparable, and the only difference between them is the program, then post-test differences can be attributed to the program
a big “IF”
Selection comparability threats
If groups not comparable to begin with, how much of the change can be
attributed to training programs or to the initial differences between groups?
The only multiple group threat to internal validity
This threat is a selection bias or selection threat
Selection threat -
“any factor other than the program that leads to
post-test differences between groups”
Selection History threats
Any other event that occurs between test #1 and test #2 that the 2 groups
experience differently
“selection threat”
“history threat”
the groups differ in some way
the way the groups differ is with respect to their
reactions to / experiences of “historical” events
e.g.
Perhaps the groups differ in their reading or viewing habits
Perhaps the training course group read about “health” more frequently than those in the software group
A higher test #2 score for the training course group doesn't indicate the effect of lifestyle training…..it's really an effect of the two groups differentially experiencing a relevant event (the media they consume)
Selection Instrumentation threats
Any differential change in the test used for each group between pre-course and post-course
e.g. the test may change differently for the two groups
Especially with human observers: differential changes between groups
Selection Mortality threats
Arises when there is differential (non-random) dropout between the two
groups, from test #1 to test #2
Different types of workers might drop out of each group
More may drop out of one group than the other
Possibly based on how they were selected
Observed differences in results might be due to the different types of
dropouts -- the selection-mortality -- and not to the different training programs
If the selection into groups was not random a bias will often exist
Selection Regression threats
Occurs when there are different rates of regression to the mean in the two
groups.
This might happen if one group scores more extremely on test #1 than the
other group – bias again
Perhaps the software group gets a disproportionate number of low-ability workers (factory managers think they need the “new” tutoring)
Managers don't understand the need for 'comparable' program and
comparison groups!
Since the software group has more extreme lower scorers at test #1, their
mean will regress (increase) a greater distance toward the overall population
mean at test #2, and they will appear to “gain” more than the training course
group
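A minimal simulation of this selection-regression threat, assuming neither program has any real effect and that managers assign the lowest test #1 scorers to the software group (all numbers invented):

```python
import random
import statistics

random.seed(6)

# Scores = stable ability + fresh measurement noise on each occasion
ability = [random.gauss(50, 10) for _ in range(300)]
test1 = [a + random.gauss(0, 8) for a in ability]
test2 = [a + random.gauss(0, 8) for a in ability]   # NO training effect at all

# Managers send the 150 lowest test #1 scorers to the "new" software group
order = sorted(range(300), key=lambda i: test1[i])
software, course = order[:150], order[150:]

def mean_gain(indices):
    return statistics.mean(test2[i] - test1[i] for i in indices)

print(round(mean_gain(software), 1))  # regresses upward: an apparent "gain"
print(round(mean_gain(course), 1))    # regresses downward: an apparent "loss"
```

The software group appears to gain and the course group appears to decline, even though neither program did anything: the differential regression alone produces the between-group difference.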