Study design Raj Bhopal, Bruce and John Usher Professor of Public Health, Public Health Sciences Section, Division of Community Health Sciences, University of Edinburgh, Edinburgh.

Study design
Raj Bhopal,
Bruce and John Usher Professor of Public Health,
Public Health Sciences Section,
Division of Community Health Sciences,
University of Edinburgh, Edinburgh EH8 9AG
[email protected]
Educational objectives






Understanding disease causation and measuring the
burden of disease are the two key purposes underlying
epidemiological studies.
Epidemiological studies are unified by their common
purposes, by their utilisation of the survey method and
their dependency on the concept of a defined population.
All study designs potentially contribute to questions of
cause and effect, health policy and planning, and clinical
practice.
A clinical case-series is a coherent set of cases compiled
by one or a few clinicians.
A population case-series study, consisting of a set of
cases in a defined population and time, lays the
foundation for description of disease by place, time and
characteristics of population.
If cases are compared with non-cases from the same
population the design is that of a case-control study,
which generates and tests causal hypotheses, through
the analysis of associations.
Study design: introduction




There are five basic designs based on
individual data.
There are modifications of these study
designs.
Discussions tend to consider each design
as being distinct, but the ideas which
underlie study design are interrelated.
Commonalities in study design





The common goal of epidemiological
studies is understanding the frequency,
pattern and causes of disease in
populations.
Reliance on the survey method.
Rooted in the concept of population.
The underlying, or base population, is the
starting point.
Such understanding helps define the
common ground, and relative unity, of
epidemiological study design.
Five basic epidemiological designs
for studies based on individuals;
and a mode of analysis






Case-series (clinical and population)
Cross-sectional
Case-control
Cohort (prospective and retrospective)
Trial
Most textbooks refer to an ecological
study design but such studies are usually
a mode of analysis, applied in large-scale
research.
Classifications of study design:
five dichotomies





Descriptive and analytic studies.
Retrospective and prospective studies.
Observational and experimental studies.
Presence or absence of disease at the
beginning of a study.
Studies which incorporate a specific
comparison group and those which do not.
Case-series: clinical and
population based






Clinical case-series - usually a coherent and
consecutive set of cases of a disease (or similar
problem) which derive from either the practice of
one or more health care professionals or a
defined health care setting e.g. a hospital or
family practice.
A case-series is, effectively, a register of cases.
Analyse cases together to learn about the
disease.
Clinical case-series are of value in epidemiology.
Studying symptoms and signs.
Creating case definitions.
Case series



When a clinical case-series is complete for a
defined geographical area for which the
population is known, it is, effectively, a
population based case-series consisting of a
population register of cases.
Epidemiologically, the most important case-series
are registers of serious diseases or
deaths, and of health service utilisation, e.g.
hospital admissions.
These are usually compiled for administrative and legal
reasons.
Thought exercise: Case-series,
natural history and spectrum

How does the case-series (clinical
and population) contribute to our
understanding of the natural history
and spectrum of disease?
Case series: natural history
and spectrum



By delving into the past circumstances of
these patients, including examination of past
medical records, and by continuing to observe
them to death (and necropsy as appropriate)
clinicians can build up a picture of the natural
history of a disease.
A population case-series is a systematic
extension of this series which includes
additional cases, e.g. those dying without
being seen by the clinicians.
It adds breadth to the understanding of the
spectrum and natural history of disease.
Case series: population






Full epidemiological use of case-series data
needs information on the population to
permit calculation of rates.
Key to understanding the distribution of
disease in populations and to the study of
variations over time, between places and by
population characteristics.
Case-series can provide the key to sound
case-control and cohort studies and trials.
The design of a case-series is conceptually simple.
The investigator defines a disease or health problem to be
studied and sets up a system for capturing
data on the health status and related factors in
consecutive cases.
Many countries have no valid case-series even
for mortality.
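The dependence of rates on a population denominator can be sketched in a few lines. This is only an illustration and not part of the original slides: the function name `rate_per_100k`, the area names and all counts are hypothetical.

```python
def rate_per_100k(cases: int, population: int) -> float:
    """Crude rate per 100,000 population at risk."""
    if population <= 0:
        raise ValueError("population at risk must be positive")
    return cases / population * 100_000

# Hypothetical register counts and population denominators, by place.
register = {"Area A": (120, 250_000), "Area B": (45, 60_000)}

rates = {area: rate_per_100k(cases, pop) for area, (cases, pop) in register.items()}
for area, rate in rates.items():
    print(f"{area}: {rate:.1f} per 100,000")
```

In this invented example Area B has fewer cases but the higher rate, which is why case counts alone cannot describe the distribution of disease.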
Case series: requirements for
interpretation
To make sense of case-series data the key
requirements are:
• The diagnosis or, for mortality, the cause of death.
• The date when the disease or death occurred
(time).
• The place where the person lived, worked etc.
(place).
• The characteristics of the person (person).
• The opportunity to collect additional data from
medical records (possibly by electronic data
linkage) or the person directly.
• The size and characteristics of the population at
risk.
Case series: additional data



Case-series data can be linked to other
health data either in the past or the future,
e.g. mortality data can be linked to
hospital admissions including at birth and
childhood, cancer registrations and other
records to obtain information on
exposures and disease.
Cases may also be contacted for
additional information.
This type of action may turn a case-series
design into a cohort design.
Case series: analysis





Case-series data are analysed using rates.
There are three circumstances, in particular, where
rates are not used:
Spatial clustering.
When the population is stable.
When there is no suitable denominator
(use proportional ratios).
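A proportional measure needs only the cases themselves, not a population denominator. The sketch below assumes the slide has in mind something like the proportional mortality ratio (PMR), which compares the share of all deaths due to one cause in a study group with the same share in a reference population. The function name and all counts are hypothetical.

```python
def proportional_mortality_ratio(cause_deaths_study: int, all_deaths_study: int,
                                 cause_deaths_ref: int, all_deaths_ref: int) -> float:
    """Share of deaths due to a cause in the study group, relative to the reference."""
    study_share = cause_deaths_study / all_deaths_study
    reference_share = cause_deaths_ref / all_deaths_ref
    return study_share / reference_share

# Hypothetical counts: 30 of 200 deaths in the study group are from the cause,
# against 1,000 of 10,000 deaths in the reference population.
pmr = proportional_mortality_ratio(30, 200, 1_000, 10_000)
print(f"PMR = {pmr:.2f}")
```

A PMR above 1 says only that the cause accounts for a larger share of deaths in the study group; it does not show that the death rate from that cause is higher.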
Case series: strengths


Population case-series permit two arguably
unique forms of epidemiological analysis and
insight.
They paint a truly national and even international
population perspective on disease.
The disease patterns can be related to aspects
of society or the environment that affect the
population but have no sensible measure at
the individual level e.g. ozone concentration at
ground level and the thickness of the ozone
layer in the earth's atmosphere.
Figure 9.1: (a) Clinical case series; (b) Population based case series.
[Diagram of CHD cases. Labels: Boundary; Visitor; Visitor excluded; Additional cases; Cases outside boundary excluded.]
(a) Clinical case series
Identify cases seen by one or more
clinicians.
Case series is unlikely to be a complete
set of cases.
Assess characteristics.
There is no accurately defined boundary
so rates cannot be calculated.
(b) Population based case series
Only cases within a defined boundary are
included.
Note that there are extra deaths compared
to the figure for the clinical case series.
The extra cases symbolise those not seen at
the clinical facility, e.g. street deaths.
Figure 9.2: Natural history.
[Diagram of CHD cases on a timeline: Past (healthy), Now (diseased), Future (dead).]
Making use of indicators with no
valid individual measures: exercise

How might epidemiology study the
potential role in disease causation of
factors which vary little between
individuals within a region or nation, e.g.
fluoride content of the water, the hardness
or softness of water supplies or annual
exposure to sunshine?
Ecological studies, design and analyses





Ecology is the study of living organisms in
relation to their environment.
How, then, must we conceptualise the ecological
study?
There are variables which are truly not based on
individual data and that are useful in
epidemiology.
Gross national product, air quality measures, lead
in water, the weather, expenditure on roads, the
type of political structure, the density of
population.
Variables can be studied on their own with
descriptions of time trends, variation between
places, and differences by the characteristics of
the populations in these places.
Ecological studies/ecological analysis







There are studies where exposure data relating to a
place (say hardness of water, which could be
collected on individuals) are correlated with health
data collected on individuals but summarised by
place (say CHD rates).
Are these ecological studies?
Boundaries are blurred.
Conceptually, the ecological component in this kind
of study is an issue of data analysis and not study
design.
Cross-sectional, case-control and cohort studies and
trials (and not just population case-series) could also
be analysed in relation to such "ecological" variables
and such units of analysis.
Most ecological analyses are based on population
case-series.
Ecological analyses are subject to the ecological fallacy.
Ecological fallacy: example






Imagine a study of the rate of coronary heart
disease in the capital cities of the world
relating the rate to average income.
Within the cities studied, coronary heart
disease is higher in the richer cities than in the
poorer ones.
We might predict from such a finding that
being rich increases your risk of heart disease.
In the industrialised world the opposite is the
case - within cities such as London,
Washington and Stockholm, poor people have
higher CHD rates than rich ones.
The ecological fallacy is usually interpreted as
a major weakness of ecological analyses.
Ecological analyses, however, inform us
about forces which act on whole populations.
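The cities example can be mimicked with a toy calculation. The sketch below uses invented numbers (three hypothetical cities, with equal numbers of poorer and richer residents assumed for simplicity) to show how an association between cities can point one way while the association within each city points the other.

```python
# Hypothetical cities: (average income, CHD rate per 1,000 in poorer residents,
# CHD rate per 1,000 in richer residents). All numbers are invented.
cities = {
    "City 1": (10_000, 4.0, 2.0),
    "City 2": (20_000, 8.0, 5.0),
    "City 3": (30_000, 12.0, 9.0),
}

def city_rate(poor_rate: float, rich_rate: float) -> float:
    # Assumption: equal numbers of poorer and richer residents in each city.
    return (poor_rate + rich_rate) / 2

ordered = sorted(cities.values())  # tuples sort by average income first
overall = [city_rate(poor, rich) for _, poor, rich in ordered]

# Between cities: the overall CHD rate rises with average income.
between_city_rises = all(a < b for a, b in zip(overall, overall[1:]))
# Within every city: richer residents have the lower rate.
within_city_rich_lower = all(rich < poor for _, poor, rich in cities.values())

print(overall, between_city_rises, within_city_rich_lower)
```

Inferring from the between-city pattern that being rich raises an individual's risk would be the ecological fallacy; in this toy data the individual-level association is the reverse.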
Exercise: applying individual data
to populations



Reflect on whether observations on
individuals are always applicable to
populations.
Can you think of an example of when this
is so and when it is not?
Why do you think this happens?
Atomistic fallacy





Studies of individuals are prone to the
opposite of the ecological fallacy, the
so-called atomistic fallacy.
The fallacy is to wrongly assume from observations on the
causes of disease in individuals that the same
forces apply to whole populations.
For example, at an individual level a high
income or a marker of material success such
as employment, car access etc., is associated
with a lower rate of suicide.
This does not mean that populations or societies
which are rich have a lower rate of suicide or
better mental health.
The opposite seems to be true.
Case series: final comments


The viewpoint that case-series studies
(whether based on individuals or
aggregate data) are descriptive,
observational and epidemiologically weak
is inappropriate.
They offer some unique opportunities and
perspectives on the pattern and causes of
disease in populations.
Cross-sectional study





A cross-section is the shape that results from
cutting a slice from an object.
A cross-sectional study exposes and studies
disease and risk factor patterns in a
representative part of the population, in a
narrowly defined time period.
Primarily, this study provides information on
prevalence of disease and risk factors.
It also can seek associations, generate and test
hypotheses and, by repetition, be used to
measure change.
The ideal cross-sectional study is of a geographically
defined, representative sample of the population
studied within a slice of time and space.
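As a rough illustration of the primary output of a cross-sectional study, the sketch below computes a prevalence from a sample. The simple (Wald) 95% confidence interval is an addition for illustration and is not discussed in the slides; the function name and the counts are hypothetical.

```python
import math

def prevalence_with_ci(cases: int, sample_size: int, z: float = 1.96):
    """Prevalence and an approximate (Wald) 95% confidence interval."""
    p = cases / sample_size
    se = math.sqrt(p * (1 - p) / sample_size)
    return p, max(0.0, p - z * se), min(1.0, p + z * se)

# Hypothetical survey: 150 people with the disease in a sample of 2,000.
p, low, high = prevalence_with_ci(cases=150, sample_size=2_000)
print(f"Prevalence {p:.1%} (approx. 95% CI {low:.1%} to {high:.1%})")
```

The Wald interval is a simple approximation that behaves poorly for very small samples or prevalences close to 0 or 1.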
Cross sectional study








The sampling frame usually conforms to
the snapshot analogy.
The measurements are made over a relatively
short period of time such as a year or two.
Excellent for measuring the population burden
of disease.
People representing virtually all stages of
health and disease.
A wide spectrum of disease.
Indirect insights on the natural history.
People with severe disease, however, may be
institutionalised.
Survivor bias.
Figure 9.3
[Diagram of CHD cases on a timeline labelled past, now and future.]
Figure 9.4
[Diagram of CHD cases. Labels: Explore natural history; Past; Now.]
Case-control study




The case-control study is a comparative study
where people with the disease (or problem) of
interest are compared with a reference
population.
The comparison, control or reference group
supplies information about the expected risk
factor profile in the population from which the
case group is drawn.
Cases can be obtained from a number of sources
- from a clinical case-series, a population register
of cases, from the new cases identified in a
cohort study, and from those identified in a
cross-sectional survey.
The ideal set of cases would be all the new (incident)
ones in the population under study.
Case-control studies




Control subjects should be chosen with no
selection in relation to their pattern of
exposure to the postulated causes, but should
otherwise be similar to the cases.
In some studies, controls are recruited to
match each case, e.g. if a woman of 53 years
was recruited as a case, the investigator
would seek a control of similar age (e.g. 57
would be fine, but not 72).
The concept is clear: to find differences in
exposure to the hypothesised causes in the
past lives of cases as compared to controls.
The data are summarised first as differences
in prevalence of exposure, and then as the
odds ratio.
Case-control studies


A classic study by Herbst et al on the
occurrence of the extremely rare disease
adenocarcinoma of the vagina in girls and
young women illustrates the issues.
There was an association between the disease
and use of diethylstilbestrol by mothers of
cases in the first trimester of pregnancy: the
mothers of seven of the eight cases had been treated
with the drug, compared with none of the mothers of
the 32 controls.
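The two summary steps (difference in prevalence of exposure, then the odds ratio) can be sketched with the counts quoted above. Because no control was exposed, the crude odds ratio is unbounded; the sketch therefore also shows a continuity correction of 0.5 added to every cell (the Haldane-Anscombe correction), which is an addition for illustration and not something the slides describe. The function name is hypothetical.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls,
               correction=0.0):
    """Odds ratio from a 2x2 table, with an optional continuity correction."""
    a = exposed_cases + correction
    b = unexposed_cases + correction
    c = exposed_controls + correction
    d = unexposed_controls + correction
    if b * c == 0:
        # Unbounded when a*d > 0; the 0/0 case is not handled in this sketch.
        return float("inf")
    return (a * d) / (b * c)

# Counts as quoted: 7 of 8 cases exposed, 0 of 32 controls exposed.
exposure_in_cases = 7 / 8
exposure_in_controls = 0 / 32

crude = odds_ratio(7, 1, 0, 32)
corrected = odds_ratio(7, 1, 0, 32, correction=0.5)
print(exposure_in_cases, exposure_in_controls, crude, corrected)
```

With so few subjects any such estimate is very imprecise; the point is the size and direction of the association, not the exact figure.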
Figure 9.5
[Diagram of cases (CHD) and controls (C) on a timeline labelled past, now and future, with "Exposure?" asked of the past.]
Figure 9.6
[Diagram of cases (CHD) and controls (C). Labels: Seek differences in exposure and other aspects of past natural history; Past; Now; Disease is known, exposure unknown.]
Cohort studies






People, particularly clinicians, speak of their
cohort, simply meaning a group, irrespective
of the study design.
The word comes from the Latin word cohors
meaning an enclosure, company or crowd.
In Roman times a cohort was a body of 300-600
infantry.
In epidemiological terms the cohort is a group of
people with something in common - usually an
exposure or involvement in a defined population
group.
A cohort study involves tracking the study
population over a period of time.
The essential idea is to relate one or more
characteristics, exercise for example, to future
outcomes e.g. incidence of coronary heart
disease.
Cohort studies






Cohort studies measure disease incidence
rates.
Cohort studies usually test the hypothesis that
disease incidence differs in people with
different characteristics (exposures) at
baseline.
They begin by establishing baseline data, often
from a cross-sectional study.
The cohort can be followed up directly, or
the baseline data can be linked to health
records.
The ratio of the incidence rates in the exposed
and non-exposed groups derived from the
cohort study is the relative risk.
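The relative risk described above is a simple ratio of two incidence rates. A minimal sketch, with hypothetical function names and invented follow-up figures:

```python
def incidence_rate(new_cases: int, person_years: float) -> float:
    """New cases per person-year of follow-up."""
    return new_cases / person_years

def relative_risk(rate_exposed: float, rate_unexposed: float) -> float:
    """Ratio of the incidence rate in the exposed to that in the non-exposed."""
    return rate_exposed / rate_unexposed

# Hypothetical follow-up: 30 new cases in 10,000 person-years among the exposed,
# 15 new cases in 10,000 person-years among the non-exposed.
rate_exposed = incidence_rate(30, 10_000)
rate_unexposed = incidence_rate(15, 10_000)
rr = relative_risk(rate_exposed, rate_unexposed)
print(f"Relative risk = {rr:.2f}")
```

A value above 1 indicates higher incidence in the exposed group, a value below 1 lower incidence.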
Figure 9.7
[Diagram of a cohort at Time 0 / Now and Time 1 / Future. Individuals are marked E or NE at baseline; CHD cases appear at follow-up.]
Figure 9.8
[Diagram of a cohort marked E, NE and CHD. Labels: Explores natural history including disease outcomes; Past; Now; Future; Exposure is known, outcome will be explored.]
Figure 9.9: retrospective cohort study
[Diagram of a cohort at Time 0 / Past and Time 1 / Now. Individuals are marked E or NE; CHD cases appear at Time 1. Label: Define the cohort.]
Figure 9.10: the retrospective cohort study
[Diagram of a cohort marked E, NE and CHD. Labels: Explores natural history including disease outcomes; Past; Now; Future; Exposure status is known for the past, outcomes are explored in the present.]
Clinical trials




Clinical trials are studies where an intervention
designed to improve health has been
applied to a population.
Trials are experiments.
A trial has the same design as a cohort study with one
vital difference: the exposure status
of the study population has been
deliberately changed by the investigator.
We observe how this change in exposure
alters the incidence of disease or other
features of the natural history.
Clinical trials







Define a study population suitable for
answering the question.
Divide the study population into two or more
groups.
The control group may be offered the best known
alternative.
In the ideal trial, the study and control
populations are similar in characteristics
impacting on disease outcomes.
To achieve this similarity individuals in the study
are assigned randomly to the groups.
This is a randomised, controlled trial.
"Best known alternative" is sometimes an
intervention which is "psychologically" of similar
impact to the study intervention.
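The random assignment step can be sketched as below. This is a minimal illustration of simple randomisation into two equal-sized groups, not a description of how any particular trial was run; the function name and participant identifiers are hypothetical, and real trials commonly add safeguards such as concealed allocation and blocked or stratified schemes.

```python
import random

def randomise(participants, seed=None):
    """Shuffle participants and split them into intervention and control groups."""
    rng = random.Random(seed)  # a seed makes the allocation reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"intervention": shuffled[:half], "control": shuffled[half:]}

# Hypothetical study population of 20 participants.
study_population = [f"P{i:02d}" for i in range(1, 21)]
groups = randomise(study_population, seed=42)
print(len(groups["intervention"]), len(groups["control"]))
```

Because allocation depends only on chance, characteristics affecting disease outcomes should, on average, be balanced between the two groups.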
Figure 9.11
[Diagram of a trial at Time 0 / Now, Time 1 / Shortly and Time 2 / Future. The study population is randomly allocated to an intervention group (I) and a control group (C); CHD cases appear in both groups at follow-up.]
Figure 9.12
[Diagram of intervention (I) and control (C) groups with CHD cases. Labels: Explores progression / prognosis (intervention group); Explores natural history if placebo controlled, or prognosis if best alternate treatment (control group); Now; Future.]
Size of the study







Sample size will be dictated by the research
questions and stated study hypotheses.
Hypotheses need to be specified in a way that
can be quantified.
Precision of the answer required needs to be
stated.
The size of the minimum difference that it is
important to detect should be stated.
Keep low the chances of two types of
statistical error.
Type 1 (finding a difference when none exists)
Type 2 (missing a difference that does exist)
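These ingredients (a quantified hypothesis, the minimum difference worth detecting, and acceptable Type 1 and Type 2 error rates) feed directly into a sample size calculation. The sketch below uses the standard normal-approximation formula for comparing two proportions; the formula choice, the function name and the example proportions are assumptions for illustration, not taken from the slides.

```python
import math
from statistics import NormalDist

def sample_size_two_proportions(p1: float, p2: float,
                                alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number needed per group to detect p1 versus p2 (two-sided test)."""
    if p1 == p2:
        raise ValueError("the two proportions must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # controls Type 1 error
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - Type 2 error
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Hypothetical aim: detect a fall in incidence from 10% to 5%.
n_per_group = sample_size_two_proportions(0.10, 0.05)
# A smaller minimum difference (10% versus 8%) needs a larger study.
n_small_difference = sample_size_two_proportions(0.10, 0.08)
print(n_per_group, n_small_difference)
```

Tightening either error rate, or shrinking the difference to be detected, increases the required size.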
Data and analysis and
interpretation





Interpret data properly, particularly taking into
account error, bias and frameworks for
analysis of associations.
Make appropriate choices in data analysis.
Examine numbers of cases and percentages
and age and sex specific prevalence or
incidence data.
Choices of summary measures need to be
made.
Judgements of cause and effect will often be
required.
Design and theory



Epidemiological designs are based on the
theories discussed in earlier chapters,
particularly that differential exposure to the
causes of disease leads to differential
population patterns of disease.
The cohort study tests this theory directly.
The trial tests it indirectly.
The case-series, case-control and cross-sectional
designs test the theory indirectly and
retrospectively.
Exercise: Strengths and
weaknesses of the study designs


Based on the principles of study design and
your knowledge of the purposes of
epidemiology, consider the relative strengths
and weaknesses of case-series, cross-sectional,
case-control and cohort studies, and trials.
Put these in a table. You may find the
following key words and phrases helpful in
your reflection: ease, timing, maintenance and
continuity, costs, ethics, data utilisation, main
contributions, observer and selection bias,
analytic outputs.
Some of the strengths and
weaknesses of each study design
Theme
• Ease
• Timing
• Maintenance and continuity
• Costs
• Ethics
• Data utilisation
• Main contribution
• Observer bias
• Selection bias
• Analytic output
Overlap in the conceptual basis of the case-series,
cross-sectional, case-control, cohort
and trial designs






The cross-sectional study can be repeated.
If the same sample is studied for a second time, i.e. it
is followed up, the original cross-sectional study now
becomes a cohort study.
If, during a cohort study, possibly in a subgroup, the
investigator imposes an intervention, a trial begins.
A cohort study also gives birth to case-control studies,
using incident cases (nested case-control study).
Cases in a case-series, particularly a population based
one, may be the starting point of a case-control study
or a trial.
Not every epidemiological study fits neatly into one of
the basic five designs.
Summary







Studies have a common goal to understand
the frequency and causes of disease.
Seeking causes starts by describing
associations between exposures (causes) and
outcomes (disease).
Survey method.
Basis in defined populations.
A case-series is a coherent set of cases of a
disease (or similar problem).
If cases are compared with a reference group, we
have a case-control study.
In a population studied at a specific time and
place (a cross-section) the primary output is
prevalence data, though association between
risk factors and disease can be generated.
Summary





If the population in a cross-sectional survey
is followed up to measure health outcomes
this study design is a cohort study.
If the population of such a study is, at
baseline, divided into two groups, and the
investigators impose a health intervention
upon one of the groups the design is that of a
trial.
Studies based on aggregated data are
commonly referred to as ecological studies.
Mostly, ecological studies are a mode of
analysis, rather than a design.
Interpretation and application of data are
easier when the relationship between the
population observed and the target population
is understood.