Variation: role of error, bias and confounding Raj Bhopal, Bruce and John Usher Professor of Public Health, Public Health Sciences Section, Division of Community Health.


Variation: role of error,
bias and confounding
Raj Bhopal,
Bruce and John Usher Professor of Public Health,
Public Health Sciences Section,
Division of Community Health Sciences,
University of Edinburgh, Edinburgh EH89AG
Educational objectives
On completion of your studies you should
understand:
• That error is crucially important in applied
sciences based on free-living populations such
as epidemiology
• Bias, considered as an error which affects
comparison groups unequally, is particularly
important in epidemiology
• The major causes of error and bias in
epidemiology can be analysed based on the
chronology of a research project
• Bias in posing the research question, stating
hypotheses and choosing the study population
are relatively neglected but important topics in
epidemiology
Educational objectives



Errors and bias in data interpretation and
publication are particularly important in
epidemiology because of its health policy
and health care applications
Confounding is the mis-measurement of the
relationship between a risk factor and
disease and arises in comparisons of
groups which differ in ways that affect
disease
Different epidemiological study designs
share most of the problems of error and bias
Exercise: Error and bias
Reflect on the words error and bias.
What is the difference, if any, between
error and bias?
Why might error and bias be particularly
common and important in epidemiology?
Error






An error is by definition an act, an assertion, or a
belief that deviates from what is right. But what is
right?
The true length of a metre is arbitrarily decided by
agreeing a definition
The difference between a "correct" metre stick and
an erroneous one can be accurately measured
For health and disease the truth is usually
unknown and cannot be defined in the way we
define a metre
Error should be considered as an inevitable and
important part of human endeavor
The Popperian view is that science progresses by the
rejection of hypotheses (by falsification) rather
than the establishing of so-called truths (by
verification)
Bias




A preference or an inclination
Bias may be intentional or unintentional
In statistics a bias is an error caused by
systematically favoring some outcomes
over others
Bias in epidemiology can be
conceptualised as error which applies
unequally to comparison groups.
Error and bias in biology





Biological research is difficult because of
the complexity and variety of living things
Circadian and other natural rhythms cause
change
Measurement techniques are usually limited by
technology, cost or ethical considerations
Strict rules restrict what measurement is
permissible ethically and what humans are
willing to give their consent to
Experimental manipulation to test a hypothesis
is usually done late
Figure 4.1
(a) Error is unequal in one of these groups, leading to a false interpretation of the pattern of disease: falsely detecting differences
(b) Error is unequal in one of these groups, leading to a false interpretation of the pattern of disease: here, failure to detect differences
Error and bias in epidemiology



Error and bias in epidemiology focus on:
(a) selection (of population),
(b) information (collection, analysis and
interpretation of data) and
(c) confounding
Error and bias are also inherent in the process
of developing research questions and
hypotheses but are seldom discussed
Are questions of sex or racial differences in
intelligence, disease, physiology or health
biased questions?
The research question,
theme or hypothesis



Science is done by human beings who
often have strong ideas and views
They share in the social values and beliefs
of their era such as class, racial and sexual
prejudice
The question "Are men more intelligent (or
healthy) than women?" could be
considered a biased question
Research question




Apparently the neutral hypothesis here
would be that there are no gender
differences in intelligence
The underlying values of the researchers
may be that men are more intelligent than
women
Such values are likely to be revealed at the
analysis and interpretation stage by biased
interpretation
It is problematic to describe difference
without conveying a sense of superiority
and inferiority
The research question



The Syphilis Study of the US Public Health
Service followed up 600 African American
men for some 40 years
The question: does syphilis have different
and, particularly, less serious outcomes in
African Americans than in European-origin
Americans?
Investigators denied the study subjects
treatment even when it was available and
curative (penicillin)
Choice of population





Bias in the choice of population is known as selection bias
Volunteers are a popular choice
Volunteers tend to be different in their
attitudes, behaviours and health status
compared to those who do not volunteer
Men have been more often selected than
women
Investigators are prone to exclude individuals
and populations for reasons of convenience,
cost or preference rather than for neutral,
scientific reasons
Selection bias






Selection bias is inevitable, simply because
investigators need to make choices
Captive populations are popular: some may be
fairly representative, e.g. schoolchildren, others
not at all, e.g. university students
People are also missed either inadvertently or
because they actively do not participate
Selection bias matters much more in epidemiology
than in biologically based medical sciences.
Biological factors are usually generalisable between
individuals and populations, so there is a prior
presumption of generalisability
If an anatomist describes the presence of a particular
muscle, or cell type, based on one human being it is
likely to be present in all human beings (and possibly
all mammals)
Non-participation






Some subjects chosen for a study do not
participate causing selection bias
The non-response in good studies is typically
30%-40%
Non-responders differ from those who respond
The problem is compounded when the non-response differs greatly in two populations
that are to be compared
The effect may be understood if some
information is available on those not
participating e.g. their age, sex, social
circumstances and why they refused
Non-response bias is an intrinsic limitation of
the survey method and hence of epidemiology
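The effect of unequal non-response on a comparison can be sketched numerically. The figures below are hypothetical and are not taken from the lecture: in both populations responders are assumed to have a disease prevalence of 10% and non-responders 20%, while the response rate is 65% in population A and 90% in population B.

```python
def observed_and_true_prevalence(n, response_rate, prev_responders, prev_nonresponders):
    """Return (observed, true) prevalence for a survey with non-response.

    All inputs are hypothetical illustration values, not data from the lecture.
    """
    responders = n * response_rate
    nonresponders = n - responders
    cases_resp = responders * prev_responders
    cases_nonresp = nonresponders * prev_nonresponders
    observed = cases_resp / responders        # what the survey sees
    true = (cases_resp + cases_nonresp) / n   # what full participation would show
    return observed, true

# Population A has more non-response than population B (assumed rates).
obs_a, true_a = observed_and_true_prevalence(1000, 0.65, 0.10, 0.20)
obs_b, true_b = observed_and_true_prevalence(1000, 0.90, 0.10, 0.20)
print(f"A: observed {obs_a:.3f}, true {true_a:.3f}")
print(f"B: observed {obs_b:.3f}, true {true_b:.3f}")
```

Under these assumptions the two surveys report the same prevalence although the populations truly differ, which is why information on those not participating is so valuable.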
Figure 4.2 (diagram labels: Study population; Comparison population; Ignored population; Ignoring populations; Questions harming one population; Measuring unequally; Generalising from unrepresentative populations)
Comparing risk factor-disease
outcome relationships in populations
which differ (confounding)




Confounding is a difficult idea to explain and
grasp
It is the error in the measure of association
between a specific risk factor and disease
outcome, which arises when there are differences
in the comparison populations other than the risk
factor under study
Confounding is derived from a Latin word
meaning to mix up, a useful idea, for confounding
mixes up causal and non-causal relationships
The potential for it to occur is there whenever the
cardinal rule “compare like-with-like” is broken
Exercise: Confounding




Imagine that a study follows up people
who drink alcohol and observes the
occurrence of lung cancer
A group of people who do not drink and are of
the same age and sex provide the comparison
group
The study finds that lung cancer is more
common in alcohol drinkers, i.e. there is an
association between alcohol consumption
and lung cancer.
Does alcohol cause lung cancer?
Confounding




In what other important ways might the
study (alcohol drinking) and comparison
(no alcohol drinking) populations be
different?
Could the association between alcohol and
lung cancer be confounded?
What might be the confounding variable?
First key analysis in all epidemiological
studies is to compare the characteristics of
the populations under study
Examples of confounding
(a) The confounded association: People who drink alcohol have a raised risk of lung cancer
One possible explanation: Alcohol drinking and smoking are behaviours which go together
The confounded factor: Alcohol, which is a marker for, on average, smoking more cigarettes
The confounding (causal) factor: Tobacco, which is associated with both alcohol and with the disease
To check the assumption: See if the alcohol-lung cancer relationship holds in people not exposed to tobacco: if yes, tobacco is not a confounder (stratified analysis, chapter 7).
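The check described for the alcohol example can be sketched as a small stratified analysis. The counts are hypothetical, chosen only so that drinkers are mostly smokers and alcohol has no effect within either smoking stratum; `relative_risk` and `strata` are illustrative names.

```python
def relative_risk(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)

# Hypothetical counts, (lung cancer cases, people), for illustration only.
strata = {
    "smokers": {"drinkers": (80, 800), "non_drinkers": (20, 200)},
    "non-smokers": {"drinkers": (2, 200), "non_drinkers": (8, 800)},
}

# Crude analysis: pool the strata and ignore smoking.
crude = relative_risk(
    sum(s["drinkers"][0] for s in strata.values()),
    sum(s["drinkers"][1] for s in strata.values()),
    sum(s["non_drinkers"][0] for s in strata.values()),
    sum(s["non_drinkers"][1] for s in strata.values()),
)
print(f"crude relative risk: {crude:.2f}")

# Stratified analysis: compare like with like within each smoking stratum.
stratum_rr = {
    name: relative_risk(*s["drinkers"], *s["non_drinkers"])
    for name, s in strata.items()
}
for name, rr in stratum_rr.items():
    print(f"{name}: relative risk = {rr:.2f}")
```

With these invented numbers the crude relative risk is well above 1, yet the association disappears within each stratum, the pattern expected when tobacco is a confounder.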
Figure 4.3 (diagram labels: The true cause and confounding variable; Apparent but spurious risk factor for disease; Disease; A statistical but not causal association)
Figure 4.4 (diagram labels: Smoking; Alcohol drinking; Lung cancer; Alcohol is statistically but not causally linked to lung cancer)
Possible actions to control
confounding
Possible action
Study design: Randomise individual subjects or
units of populations, e.g. schools
Study design: Select comparable groups / restrict
entry into study
Study design: Match individuals or whole
populations
Analysis: Analyse subgroups separately
Analysis: Adjust data statistically
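As one illustration of statistical adjustment, the sketch below pools stratum-specific results with the Mantel-Haenszel summary relative risk. This is a common choice made here for illustration, not a method named in the lecture, and the counts are hypothetical: alcohol is assumed to raise risk 1.5-fold within both smokers and non-smokers.

```python
def mantel_haenszel_rr(strata):
    """Mantel-Haenszel summary relative risk for cumulative-incidence data.

    Each stratum is (cases_exposed, n_exposed, cases_unexposed, n_unexposed).
    """
    numerator = 0.0
    denominator = 0.0
    for a, n1, c, n0 in strata:
        total = n1 + n0
        numerator += a * n0 / total
        denominator += c * n1 / total
    return numerator / denominator

# Hypothetical counts for illustration only.
strata = [
    (120, 800, 20, 200),  # smokers: drinkers vs non-drinkers
    (3, 200, 8, 800),     # non-smokers: drinkers vs non-drinkers
]

# Crude relative risk ignores smoking altogether.
crude_rr = (sum(s[0] for s in strata) / sum(s[1] for s in strata)) / (
    sum(s[2] for s in strata) / sum(s[3] for s in strata)
)
adjusted_rr = mantel_haenszel_rr(strata)
print(f"crude relative risk: {crude_rr:.2f}")
print(f"smoking-adjusted relative risk: {adjusted_rr:.2f}")
```

In this invented example the crude figure greatly overstates the effect of alcohol, while the adjusted figure recovers the 1.5-fold risk built into each stratum.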
Measurement errors in
epidemiology





Information bias
Why are measurement errors in epidemiology
likely to be more common and more important
than in other scientific disciplines - say
physics, anatomy, biochemistry or animal
physiology?
Assessing the presence of disease in living
human beings requires a judgement
Measuring socio-economic circumstances,
ethnic group, cigarette smoking habits or
alcohol consumption are complex matters
These errors are life-and-death matters, even
in epidemiological research
Measurement errors





Past exposures will need to be estimated,
sometimes from contemporary measures
Biological variation needs to be taken into
account e.g. blood pressure varies from
moment to moment in response to
physiological needs related to activity, in a 24
hour (circadian) cycle with lowered pressure in
the night, and with the ambient temperature
Some variables have natural variation so great
that making estimates is extremely difficult, for
example, in diet, alcohol consumption, and the
level of stress
Machine imprecision is also inevitable
Inaccurate observation by the investigator or
diagnostician
Measurement errors and bias


Measurement errors which occur unequally
in the comparison populations are:
- differential misclassification errors or biases
- likely to irreversibly destroy a study
- liable to increase the strength of the association
in error
Non-differential errors or biases, occurring
equally in both comparison populations, are much
more likely to occur
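A minimal numerical sketch of differential error, in the spirit of Figure 4.1. The risks and detection rates are hypothetical: `apparent_risk` assumes disease is detected with a given sensitivity and, for simplicity, perfect specificity.

```python
def apparent_risk(true_risk, sensitivity, specificity=1.0):
    """Risk observed when disease is detected imperfectly (illustrative model)."""
    return true_risk * sensitivity + (1 - true_risk) * (1 - specificity)

# (a) No true difference (10% in both groups), but disease is sought more
# thoroughly in the study group than in the comparison group.
false_difference = apparent_risk(0.10, 0.95) / apparent_risk(0.10, 0.70)

# (b) A true doubling of risk (20% vs 10%), hidden because half the cases
# in the study group are missed.
missed_difference = apparent_risk(0.20, 0.50) / apparent_risk(0.10, 1.00)

# Equal (non-differential) under-detection leaves this ratio unchanged.
equal_error = apparent_risk(0.10, 0.80) / apparent_risk(0.10, 0.80)

print(f"(a) apparent relative risk: {false_difference:.2f}")
print(f"(b) apparent relative risk: {missed_difference:.2f}")
print(f"equal error in both groups: {equal_error:.2f}")
```

Unequal error can therefore create a difference that does not exist or conceal one that does, depending on which group is measured less well.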






Misclassification bias
Misclassification error (or bias) occurs when
a person is put into the wrong category
(or population sub-group), usually as a result
of faulty measurement
Some people who are hypertensive will be
misclassified as normal
Some who are not hypertensive will be misclassified
as hypertensive
The end result in terms of the prevalence of
hypertension may be about right
The degree to which a measure leads to a correct
classification can be quantified using the concepts
of sensitivity and specificity - and these are
discussed in relation to screening tests
In measuring the strength of association between
exposures and disease outcomes non-differential
misclassification error has an important and not
always predictable effect
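The hypertension example can be made concrete with a small calculation of sensitivity and specificity. The counts are hypothetical: 1,000 people, of whom 200 are truly hypertensive, with 30 people misclassified in each direction.

```python
def classification_summary(tp, fn, fp, tn):
    """Summarise a two-by-two classification table (illustrative helper)."""
    total = tp + fn + fp + tn
    return {
        "sensitivity": tp / (tp + fn),          # true hypertensives correctly classified
        "specificity": tn / (tn + fp),          # true normotensives correctly classified
        "true_prevalence": (tp + fn) / total,
        "measured_prevalence": (tp + fp) / total,
    }

# Hypothetical counts: 30 hypertensives labelled normal, 30 normotensives labelled hypertensive.
summary = classification_summary(tp=170, fn=30, fp=30, tn=770)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Because the two kinds of error cancel in this invented example, the measured prevalence matches the true prevalence even though 60 individuals are in the wrong category.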
Non-differential
misclassification error




Imagine a study of 20,000 women, 10,000 on the
contraceptive pill and the rest not
Say that over 10 years 20% of those on the pill
develop a cardiovascular disease compared to
10% of those not on the pill
The rate of disease in the oral contraceptive
group is doubled (relative risk = 2)
Assume that misclassification in exposure
occurs 10% of the time, so that 10% of women
actually on the pill were classified as not on the
pill, and that 10% who were not on were
classified as on the pill
Imaginary study of cardiovascular outcome and pill use: no misclassification

True classification      Cardiovascular disease
of pill use status       Yes        No         Total
Yes                      2,000      8,000      10,000
No                       1,000      9,000      10,000
Total                    3,000      17,000     20,000
Pill and cardiovascular disease: 10% misclassification of pill use

Classification of pill use status                                            Cardiovascular disease
                                                                             Yes      No       Total
Yes, classified right (on the pill so incidence rate is 20%)                 1,800    7,200    9,000
Yes, classified wrong (actually not on the pill so incidence rate is 10%)    100      900      1,000
Subtotal                                                                     1,900    8,100    10,000
No, classified right (not on the pill so incidence rate is 10%)              900      8,100    9,000
No, classified wrong (actually on the pill so incidence rate is 20%)         200      800      1,000
Subtotal                                                                     1,100    8,900    10,000
TOTAL                                                                        3,000    17,000   20,000
Misclassification: the pill

The risk of CVD in the "pill users group" with
10% misclassification is 1,900/10,000, and in
the "not on the pill group" is 1,100/10,000, so
the relative risk is
\[ \frac{1{,}900/10{,}000}{1{,}100/10{,}000} \approx 1.7 \]



Misclassification will, inevitably, also arise in
measurement of the disease outcome, further
reducing the strength of the association
Generally, non-differential misclassification
bias lowers the relative risk.
This general principle may break down when
misclassification occurs in confounding
variables as well
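The pill example can be reproduced, and extended to other misclassification rates, with a short calculation. This is a sketch under the lecture's own assumptions (10,000 women per group, risks of 20% and 10%, equal misclassification in both directions); the function name and the rates other than 10% are illustrative.

```python
def observed_relative_risk(n_per_group, risk_on_pill, risk_off_pill, misclassified):
    """Relative risk after swapping a fraction of women between exposure groups."""
    right = n_per_group * (1 - misclassified)   # women classified correctly
    wrong = n_per_group * misclassified         # women placed in the other group
    cases_labelled_on = right * risk_on_pill + wrong * risk_off_pill
    cases_labelled_off = right * risk_off_pill + wrong * risk_on_pill
    return (cases_labelled_on / n_per_group) / (cases_labelled_off / n_per_group)

# A rate of 0.1 corresponds to the 1,900/10,000 versus 1,100/10,000 comparison.
for rate in (0.0, 0.1, 0.3):
    rr = observed_relative_risk(10_000, 0.20, 0.10, rate)
    print(f"misclassification {rate:.0%}: relative risk = {rr:.2f}")
```

As the misclassification rate rises, the observed relative risk moves from the true value of 2 towards 1, which is the dilution of association described above.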
Analysis and interpretation






Usually the potential for data analysis is far
greater than that actually done
The choices will be informed by the prior interests
(and biases) and expertise of the researcher
External scrutiny at an early stage by objective
advisors of the research protocol could reduce
such biases
Inclusion of objective, uninvolved people in the
research team at the data analysis and
interpretation stage is possible but unusual, so
investigators should ensure their analysis is driven
by hypotheses, research questions and an analysis
strategy prepared in advance
One proposal is that investigators should make public
their data questionnaire, the analysis strategy, and
other information required to replicate the analysis
Judgement and action





The data and interpretation are examined by
those who need to make decisions
Interpretations, especially those which involve
change that may threaten powerful interests, will
be contested.
Interpretation is a matter of judgement and
judgement will depend on the prior values, beliefs
and interests of the observer
Epidemiologists are not the sole arbiters of the
theory and data.
Epidemiologists, however, have responsibilities
for minimising the impact of their own biases and
preventing the misinterpretation of data and
recommendations by those with vested interests
Study population bias:
generalisation



Much of epidemiology is concerned with
population subgroups and comparisons
between them
The interpretation rests on the
assumption that the results apply, at least,
to the whole group as originally chosen if
not the whole population
Error arises in the inappropriate
generalisation of study data to another
population
Controlling errors and bias



Error control requires awareness and
good scientific technique
Bias control needs equal attention to error
control in all the population sub-groups
Error and bias cannot be fully controlled
so the most important need is for
systematic, cautious and critical
interpretation of data
Conclusion





Bias is a central issue in epidemiology
When epidemiological data are applied to
provide health advice to individuals and to
shape public health policy, error and bias are
especially important
I am not aware of an epidemiological theory on
why error and bias occur
Social sciences research on the nature of
science indicates that the scientific endeavour
is not wholly objective but open to the
influence of society and context
The framework provided by the chronology
and structure of a research project offers a
logical approach to analysis of bias and error
Conclusions









The main principles are:
develop research questions and hypotheses
which benefit all the population and will not lead
to harm
study a representative population
measure accurately and with equal care across
comparison groups
compare like-with-like
check for the main findings in subgroups before
assuming that inferences and generalisations
apply across all groups
findings of a single study should rarely be
accepted at face value
first consider artefact
a critical attitude is essential