Slide 1 - cjcu.edu.tw

Epidemiology - Design and Bias Evaluation
Fall, 2008
Study type 2
Reference:
1. Szklo M and Nieto FJ. Epidemiology – Beyond the
Basics. Aspen Publishers, Maryland 2000.
2. Rothman K and Greenland S. Modern Epidemiology.
Lippincott-Raven Publishers, Philadelphia, 1998.
3. Rothman K. Epidemiology – An Introduction. Oxford
University Press, New York, 2002.
4. Kelsey JL, Whittemore AS, Evans AS, and Thompson
WD. Methods in Observational Epidemiology. Oxford
University Press, New York, 1996
5. Kleinbaum DG, Kupper LL, and Morgenstern H.
Epidemiologic Research. Van Nostrand Reinhold,
New York. 1982.
Class Contents:
1. Types of Observational Studies
2. Ecological Study
3. Cohort Study
4. Case-control Study – the principles
5. Case-based Case-control Study
6. Nested Case-control Study
7. Case-cohort Study
8. Case-crossover Study
9. Cross-sectional Study
10. Study Critique
Questions:
Please contact me at :
06-2785123 ext 3118 or
[email protected]
Study type 1
Observational Epidemiologic Studies
Descriptive (incidence, prevalence)
Analytic (associate characteristics of
population with risk of disease)
Study type 3
Types of Observational studies
A. Ecologic
B. Cross-sectional
C. Cohort
1. Prospective cohort
2. Retrospective (non-concurrent) cohort
D. Proportional mortality studies
E. Case-control
Study type 4
Variants of the case-control design
Case-based case-control
Nested case-control study
Case-cohort
Case-crossover
Steps in epidemiologic research
Study type 5
1. Define questions/hypotheses based on present
state of knowledge.
2. Choose appropriate study design.
3. Define groups for comparison
(e.g., cases vs. controls, exposed vs.
nonexposed).
4. Define exposure variable(s), outcome
variables, and measure of its frequency, and
primary measure of association.
(continue)
Study type 6
5. Define extraneous variables to be measured.
6. Develop or choose measurement instruments
that are valid and reliable.
7. Determine sample size.
8. Develop protocol, train staff.
9. Recruit subjects, collect data, quality control
procedures.
10. Process data.
(continue)
Study type 7
11. Analyze data.
A. Determine if valid statistical association
exists.
1. Rule out chance.
2. Rule out bias.
B. Determine if there are effect-modifiers
of the association.
C. Judge if association is causal
12. Discuss practical significance of findings.
Study type 8
Studies making observations on groups of individuals vs.
individuals
• Studies using group level data are
usually called ecological studies
• Two main points about ecological
studies
– Weak design for identifying cause and
effect associations because of ecological
fallacy
– In some study situations group-level
measures may actually provide better
inference than individual-level measures
Study type 9
Example from Szklo and Nieto of grouped data
from cohorts in the Seven Countries Study
Study type 10
Ecological Fallacy
• Cannot tell whether the predictor and
the outcome are related at the individual
level
• In this example: cannot tell whether the
individuals in the cohorts eating more
saturated fat are the individuals who are
experiencing a higher rate of heart
disease
• Sometimes called confounding at the
group level
Study type 11
Confounding in group data
• If no ecological fallacy, still left with
possible confounding: some third
variable really causing the increase in
risk
• Difficult to control for because measures
may not be available
• Even if data available, don’t know
relationship of confounding variable to
other two variables at individual level
Study type 13
Advantages of group data/ecological studies
1. Inexpensive: secondary data already
collected (vital statistics, disease registries,
HMOs, etc.)
Study type 14
(continue)
2. Rapid test of hypothesis:
a. Idea that ecological studies are
hypothesis-generating doesn’t reflect
their usual purpose
b. If hypothesized risk factor is associated
with disease, it may well be seen in
group level data
Study type 15
(continue)
3. Can overcome “threshold” problem:
exposure is so universal that effect is
difficult to detect in one setting
(continue)
Study type 16
4. Some disease transmission dynamics
can only be studied at group level
(eg, herd immunity and infectious
disease transmission)
Allows global measures of group
characteristics (e.g., type of health
care system)
Allows tests of area-level interventions
(eg, closing of a public hospital)
Strategies that can strengthen inferences
from ecological Studies:
Study type 17
1. Multiple kinds of comparisons to strengthen
inference of association; eg, across
geographic areas and over different time
periods
Example: Valerie Beral’s study showing inverse
association between average family size and
ovarian cancer mortality using comparisons
among different birth cohorts, different countries,
and different social and ethnic groups (Lancet, 1978)
Study type 18
2. “Small Area” analysis: Used in health
services research to investigate
variation within small geographic areas.
Reduce confounding by comparing small
areas from a larger area thought to be fairly
homogeneous on potential confounders
(SES, disease prevalence)
Example: Wennberg’s study of variation in
rates of surgical procedures in 6 areas of
Vermont with similar disease prevalence
(Medical Care, 1987)
Study type 19
3. Mixed studies that collect data on
individuals but use secondary group data
for rare outcomes (multilevel studies)
Doesn’t avoid ecological fallacy but reduces
confounding by key measures at individual
level
Using group data may make study feasible
that would be otherwise prohibitively
expensive
Study type 20
Example: Bindman’s study of health care
access (personal data) and rates of
preventable hospitalizations (group data)
in California medical service areas
(JAMA, 1995)
Study type 21
Cohort Study Design
1.Gold standard because exposure/risk factor
is observed before the outcome occurs
2. Randomized trial is a cohort design in which
the exposure is assigned rather than
observed
3. Other study designs can be understood by
the way in which they sample the experience
of a cohort
Study type 22
4. Easiest design to understand because it
explicitly defines the study base as a cohort
5. Measures individual characteristics before
disease occurrence fulfilling the temporal
order required for cause and effect (but is
not the only study design that can do this).
6. Provides conceptual basis for understanding
sampling strategies of case-control,
case-cohort, and cross-sectional designs
Study type 23
Study type 24
Study type 25
[Figure: Cohort Study. Timeline of follow-up from Begin to End
(x-axis: Time of Follow-up), showing subjects dying or lost to
follow-up and subjects followed until end of study.
Legend: X = dead, L = lost, D = disease]
Fixed (closed) vs. dynamic (open) cohort
Study type 26
Fixed: When the exposure groups in a cohort
study represent groups that are defined at
the start of follow-up, with no movement of
individuals between exposure groups
during the follow-up, the exposure groups
are sometimes called fixed cohorts.
Dynamic (open): Describes a population in
which the person-time experience can
accrue from a changing roster of
individuals.
Study type 27
Cohort Design
Prospective cohort design:
Present exposure data --> Future diseases
Retrospective cohort design:
Past recorded exposure -->
Diseases accumulated to the present
Study type 28
[Figure: Timeline of cohort designs (Past, Present, Future).
Retrospective cohort: identify cohort defined in past; on basis of
existing records classify individuals in cohort as to past
exposure status; determine whether disease has developed.
Prospective cohort: select cohort and classify as to exposure
status; follow to see if disease develops.]
Study type 29
Groups Investigated in Cohort Studies
A. General population sample
B. Select groups of the population
1. Special groups - professional, insured, alumni,
veteran, etc.
2. Exposed groups - medically or toxically exposed,
occupational, etc.
Defining Exposed/Nonexposed Groups
for Comparison
Study type 31
I. Sources of exposure data
A. Types
1. Records (e.g., hospital, employment)
2. Interviews, questionnaires
3. Direct examination
4. Indirect measures of exposure estimated
from investigating the environment
B. Retrospective cohort studies use records or
indirect measures and generally obtain less
detail on exposure (and confounding)
II. Defining nonexposed
Study type 32
A.Internal comparison group
B. External comparison group
1. Common in retrospective cohort
(especially occupational studies)
2. Drawbacks
a. Typically assumes exposure is rare in
comparison group
b. Wrong comparison group
(e.g., Healthy worker effect)
3. Must be certain endpoint is comparably
defined
Study type 33
III. Other considerations
A. Exposed should be truly “exposed”;
Non exposed truly “nonexposed.”
B. Exposure should be measured similarly in
exposed and nonexposed.
C. Can select exposed/nonexposed groups
with equal or unequal sampling fraction
D. Matching
E. Problem of homogeneity of exposure
Study type 34
Follow-up
I. Objective
A. Uniform and complete follow-up of all cohort
groups
B. Complete ascertainment of outcome events
C. Standardized diagnosis of outcome events
Study type 35
II. Considerations
A. Length of follow-up is related to :
1. The natural history of the disease
2. The incubation period (latency) between
exposure and disease
B. Obtain tracing information at baseline
1. Name, address, phone number
2. Age, birthdate, state of birth, maiden name
3. Social Security Number, driver’s license
number
4. Name and address of friends, employer,
physician
Study type 36
III. Methods
A. Direct contact throughout the study
1. Correspondence, telephone
2. Re-examination
B. Indirect surveillance
1. Hospitalizations/physician records
2. Disease registries
3. Death records
4. National Death Index
5. Social Security Administration
6. Pension/retirement associations
Sources for tracing subjects
Study type 37
Post office
Tax record
Phone directory
Prison system
Relatives
Medical record
Neighbors
Family physician
Drivers Bureau
National Death Index
Schools
Commercial tracing firm
Veterans groups
Credit bureau
Union
Church
Employment (payroll files)
Pension/Retirement association
Study type 38
Study type 39
Cumulative Incidence
Cohort is followed a uniform length of time
           D     not D   Total
E          a     b       N1      Risk = a/N1
not E      c     d       N0      Risk = c/N0
Relative Risk = (a/N1) / (c/N0)
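The cumulative-incidence layout above can be sketched in a few lines of Python. This is only an illustration: the function name and the counts in the usage line are hypothetical, not taken from the slides.

```python
def relative_risk(a, n1, c, n0):
    """Risk ratio for a cohort followed a uniform length of time.

    a  = diseased among the exposed,   n1 = all exposed
    c  = diseased among the unexposed, n0 = all unexposed
    """
    risk_exposed = a / n1
    risk_unexposed = c / n0
    return risk_exposed / risk_unexposed


# Hypothetical counts, chosen only to show the call.
example_rr = relative_risk(a=30, n1=1000, c=10, n0=1000)
print(round(example_rr, 2))
```

The function assumes every subject is followed for the same length of time; with variable follow-up, person-time rates are the appropriate measure.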
Study type 40
Life table analysis: (Survival Analysis)
Example: Assume that 100 persons have
received cardiac transplants and that we wish to
estimate the probability of surviving the surgery.
The data are as follows:
Time         Individuals   Number of   Loss to
(interval)   at risk       events      observation
1            100           10          10
2            80            10          0
3            70            10          10
4            50            10          10
5            30            10          0
6            20            10          0
Study type 41
Life table calculation:
            Under                       Probability of   Probability of
Time        observation   Died during   dying during     surviving
interval    at time       interval      interval         through interval
1           100           10            .1               .9
2           80            10            .125             .875
3           70            10            .143             .857
4           50            10            .2               .8
5           30            10            .333             .667
6           20            10            .5               .5
Cumulative probability of surviving 24 months is
(.9) (.875) (.857) (.8) (.667) (.5) = 0.18
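The life table calculation can be reproduced with a short Python sketch. It follows the slide's arithmetic as shown, dividing deaths by the number under observation at the start of each interval with no adjustment for withdrawals; the function and variable names are illustrative.

```python
# Data from the cardiac transplant example above.
at_risk = [100, 80, 70, 50, 30, 20]   # under observation at start of interval
deaths = [10, 10, 10, 10, 10, 10]     # died during interval


def cumulative_survival(at_risk, deaths):
    """Return per-interval survival probabilities and their product."""
    per_interval = []
    cumulative = 1.0
    for n, d in zip(at_risk, deaths):
        p_survive = 1 - d / n          # probability of surviving this interval
        per_interval.append(p_survive)
        cumulative *= p_survive        # chain the conditional probabilities
    return per_interval, cumulative


per_interval, cumulative = cumulative_survival(at_risk, deaths)
print([round(p, 3) for p in per_interval])
print(round(cumulative, 2))
```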
Study type 42
Survival Curve
Assumptions in the estimation of cumulative
incidence based on survival analysis
Study type 43
1. Uniformity of events and losses within each
interval:
If risk increases or decreases too rapidly
within a given interval, then calculating an
average risk (from our example, 0.18) over
the interval is not fully informative.
One good remedy is to shorten the intervals
used for calculation.
Study type 44
2. Independence Between Censoring and
Survival:
One needs to assume that the censored
observations have the same probability
of the event (after censoring) as those
remaining under observation.
Study type 45
Examples of non-independence Between
Censoring and Survival:
In a study with lung cancer as the outcome,
participants dying from coronary heart
disease (CHD) are censored. Since lung cancer
and CHD share an important risk factor,
smoking, it is possible that individuals dying
from CHD would have had a higher risk of
lung cancer if they had not died from heart
disease. The risk of lung cancer will therefore be underestimated.
Study type 46
3. Lack of Secular Trends
In studies in which the accrual of participants
covers an extended period, the decision to
pool all individuals at time 0 assumes lack
of secular trends with regard to the type and
characteristics of these individuals that affect
the outcome of interest.
Study type 47
Example of a secular trend
It would not be appropriate to carry out a
survival analysis pooling at time 0 all HIV
positive individuals recruited into a cohort
accrued between 1995 and 1999 – that is,
both before and after a new effective
treatment (protease inhibitors) became
available.
Incidence Density
Study type 48
Cohort has variable lengths of follow-up, due to:
1. Losses to follow-up
2. Deaths
3. Termination of the study
4. No longer “at risk” (able to develop the disease)
           D     not D   Person-years
E          a     b       N1             Incidence rate = a/N1
not E      c     d       N0             Incidence rate = c/N0
Rate ratio = (a/N1) / (c/N0)
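A minimal Python sketch of the incidence density calculation, where the denominators are person-years rather than counts of individuals. The function names and the numbers in the usage lines are hypothetical.

```python
def incidence_rate(cases, person_years):
    """Incidence rate: cases per unit of person-time."""
    return cases / person_years


def rate_ratio(a, pt_exposed, c, pt_unexposed):
    """Ratio of the exposed rate (a/N1) to the unexposed rate (c/N0)."""
    return incidence_rate(a, pt_exposed) / incidence_rate(c, pt_unexposed)


# Hypothetical cohort: 20 cases in 4,000 exposed person-years,
# 10 cases in 8,000 unexposed person-years.
example_ratio = rate_ratio(a=20, pt_exposed=4000, c=10, pt_unexposed=8000)
print(round(example_ratio, 2))
```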
Study type 49
Assumptions in the estimation of
incidence rates based on person-time
1. Independence between
censoring and survival
2. Lack of secular trends
(continue)
Study type 50
3. The estimated risk applies equally to any time unit
within the interval
• n persons followed during t units of time
are equivalent to t persons observed
during n units of time.
• The effect resulting from the exposure is
not cumulative within the follow-up interval
of interest
(continue)
Study type 51
For example, the risk of chronic bronchitis for
1 smoker followed for 10 years is certainly
not the same as that of 10 smokers followed
for 1 year, in view of the strong cumulative effect
of smoking.
Study type 52
Incidence Rate
Average risks (cumulative incidence):
are measured with individuals as the unit in the
denominator; are conceptually tied to the
identification of specific cohorts of individuals.
Incidence rates:
have person-time as the unit of measure;
can define the comparison groups in terms of
person-time units that do not correspond to
specific cohorts of individuals.
Study type 53
(continue)
Thus, an individual whose exposure
experience changes with time can,
depending on details of the study
hypothesis, contribute follow-up time
to several different exposure-specific rates.
Study type 54
Classification of Person-Time
The main guide to the classification of
persons or person-time is the study
hypothesis, which should be spelled out
in as much detail as possible.
In studies with chronic exposures, it is easy
to confuse the time during which exposure
occurs with the time at risk of exposure
effects.
Study type 55
In occupational studies, time of employment
is sometimes confused with time at risk for
exposure effects:
Time of employment:
a time during which exposure accumulates
Time at risk for exposure effects:
must logically come after the accumulation
of a specific amount of exposure, because
only after that time can disease be caused
or prevented by that amount of exposure.
(Continue)
Study type 56
Comments:
1. The lengths of these two time periods have
no inherent relation to one another.
2. The time at risk of effects might well extend
beyond the end of employment.
3. It is only the time at risk of effects that
should be tallied in the denominator of
incidence rates for that amount of exposure
Study type 57
(Continue)
In a study of the delayed effects of exposure
to the atomic bomb, the exposure is almost
instantaneous, but the risk period during
which the exposure has an effect may be
very long, perhaps lifelong, and the risk for
certain diseases may not go up immediately
after exposure .
Study type 58
Hypothesis of Induction time
There is no way to estimate exposure effects
without making some assumption about the
induction time.
For example, assume one year of induction time for
the atomic bomb subjects in Japan.
Diseases that occurred within one year of the
exposure were not counted as outcomes.
(continue)
Study type 59
If the investigator does not have any basis
for hypothesizing an induction period, s/he can
estimate effects according to
categories of time since exposure.
In an unbiased study, we would expect
the effect estimates to rise above the null
value when the minimum induction period
has passed.
Study type 60
(continue)
This procedure works best when the
exposure itself occurs at a point or narrow
interval of time, but it can be used even if
exposure is chronic, as long as there is a
way to define when a certain hypothesized
accumulation of exposure has occurred.
Study type 61
Categorizing Exposure
One problem to consider is that the study
hypothesis may not provide reasonable
guidance on where to draw the boundary
between exposed and unexposed.
Study type 62
(continue)
1. exposure as continuous:
it is not necessary to draw boundaries
at all, but rather to use the quantitative
information from each individual fully
either by using some type of smoothing
method, such as moving averages,
or by putting the exposure variable into
regression as a continuous term.
Study type 63
(continue)
2. Rate calculations:
will require a reasonably sized population
within categories.
It should be possible to form several
cohorts that correspond to various levels
of exposure. There are two ways to allocate
the person-times of exposure:
Study type 64
(continue)
a. An individual who passes through one level
of exposure along the way to a higher level
would later have time at risk for disease that
theoretically might meet the definition for
more than one category of exposure.
Usually the time is allocated only to the
highest category of exposure that applied.
Study type 65
(continue)
b. The follow-up time that an individual spends
at a given exposure intensity (induction time)
could begin to accumulate as soon as that
level of intensity is reached.
Once the person-time spent at each
category of exposure has been determined
for each study subject, the classification of
the disease events (cases) follows the same
rules.
(continue)
Allowing for a 5-year induction period
Study type 66
(continue)
Study type 67
c. One can also define current exposure
according to the average (arithmetic or
geometric mean) intensity or level of
exposure up to the current time, rather
than by a cumulative measure.
In the occupational setting, the average
concentration of an agent in the ambient air
would be an example of exposure intensity,
although one would also have to take into
account any protective gear that might affect
the individual’s exposure to the agent.
Study type 68
(continue)
Intensity of exposure is a concept that applies
to a point in time, and intensity typically will
vary over time. Studies that measure exposure
intensity might use a time-weighted average of
intensity, which would require multiple
measurements of exposure over time.
The weight for each exposure intensity would
equal the amount of time that an individual is
exposed to that intensity.
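The time-weighted average described above can be sketched as follows, with each intensity weighted by the time spent at it. The function name, the units, and the exposure history in the usage lines are hypothetical.

```python
def time_weighted_average(segments):
    """Time-weighted average exposure intensity.

    segments: list of (intensity, duration) pairs, where duration is the
    amount of time the individual was exposed to that intensity.
    """
    total_time = sum(duration for _, duration in segments)
    if total_time <= 0:
        raise ValueError("total exposure time must be positive")
    weighted_sum = sum(intensity * duration for intensity, duration in segments)
    return weighted_sum / total_time


# Hypothetical worker: 3 years at intensity 2.0, then 1 year at intensity 5.0
# (units are arbitrary, e.g. mg/m3 of an airborne agent).
history = [(2.0, 3), (5.0, 1)]
print(time_weighted_average(history))
```

A plain arithmetic mean of the two intensities would give 3.5 here; weighting by duration pulls the average toward the intensity experienced for longer.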
Study type 69
Timing of Outcome Events
For some events, such as death, it is not
difficult to determine the time of the event.
For others, such as human immunodeficiency
virus (HIV) seroconversion, the time of the
event can be defined in a reasonably precise
manner (the appearance of HIV antibodies in
the blood stream), but measurement of the
time is difficult.
Suggestions for determining
occurrence of Outcome Events
Study type 70
start with at least one written protocol to
classify subjects based on available
information.
For example, seroconversion time may be
measured as the midpoint between time of
last negative and first positive test.
Study type 71
(continue)
For unambiguously defined events, any
deviation of actual times from the protocol
determination can be viewed as
measurement error.
(continue)
Study type 72
Ambiguously timed diseases, such as cancers
or vascular conditions, are often taken as
occurring at diagnosis time, but the use of a
minimum lag period is advisable whenever a
long latent (undiagnosed) period is inevitable.
It may sometimes be possible to interview
cases about the earliest onset of symptoms,
but such recollections and symptoms can
be subject to considerable error and
between-person variability.
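The midpoint rule for seroconversion time mentioned above can be written as a one-line protocol. This is a sketch using Python's standard `datetime` module; the function name and the test dates are hypothetical.

```python
from datetime import date


def seroconversion_midpoint(last_negative, first_positive):
    """Estimate the event time as the midpoint between the last negative
    and the first positive test."""
    if first_positive < last_negative:
        raise ValueError("first positive test precedes last negative test")
    # Adding a timedelta to a date keeps whole days only.
    return last_negative + (first_positive - last_negative) / 2


# Hypothetical test dates for one subject.
estimate = seroconversion_midpoint(date(2000, 1, 1), date(2000, 1, 31))
print(estimate.isoformat())
```

The wider the gap between the two tests, the larger the possible measurement error in the estimated time.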
Study type 73
Key Potential Biases in Cohort Studies
A. Bias related to follow-up
In principle, a cohort study could be used
to estimate average risks, rates, and
occurrence times. Loss of subjects during
the study period will prevent direct
measurements of these averages, since
the outcome of lost subjects is unknown.
(continue)
Study type 74
Subjects who die from competing risks
(outcomes other than the one of interest)
likewise prevent the investigator from
estimating conditional risk directly.
When losses and competing risks do occur,
one may still directly estimate the incidence
rate, whereas average risk and occurrence
time must be estimated using survival
(life-table) methods.
Study type 75
(continue)
A substantial number of subjects lost to
follow-up can raise serious doubts about the
validity of the study.
Follow-ups that trace less than about 60% of
subjects are generally regarded with
skepticism, but even follow-up of 70% or 80%
or more can be too low to provide sufficient
assurance against bias if there is reason to
believe that loss to follow-up may be correlated
with both exposure and disease.
(continue)
Study type 76
B. Bias related to participation (nonresponse)
Participants and non-participants may differ
on their exposure status or disease outcome.
C. Bias of misclassification of exposure
1. Change in exposure level over time may
lead to random error and/or, if related to
disease outcome, may lead to bias.
2. Misclassification due to measurement
error.
(continue)
Study type 77
D. Bias related to observation and detection of
disease
1. Those with certain exposure may be followed
more intensively for disease than those
without the exposure
(diagnostic suspicion bias)
2. Unmasking bias (innocent exposure leads to
greater likelihood of detecting disease)
Effects of Study Loss in Cohort Study
Study type 78
Selected cohort: 1000 (250 exposed, 750 unexposed)
Exposed:
13% refused participation: 250-32=218
16% of those entering were lost to follow-up: 218-35=183
183 remaining
Unexposed:
14% refused participation: 750-105=645
16% of those entering were lost to follow-up: 645-105=540
540 remaining
Greenland. Response and follow-up bias in cohort studies. Am J Epid 1977; 106: 184-7
Study type 79
Effects of Study Loss in Cohort Study
Study outcome:
              Exposed   Not Exposed   Total
Disease       16        23            39
No Disease    167       517           684
Total         183       540           723
Relative Risk = (16/183) / (23/540) = 2.05
Evaluation of Extremes of Outcome
Study type 80
A. All of the lost exposed subjects developed disease;
none of the lost unexposed subjects developed disease
              Exposed   Not Exposed   Total
Disease       83        23            106
No Disease    167       727           894
Total         250       750           1000
Relative Risk = (83/250) / (23/750) = 10.83
Evaluation of Extremes of Outcome
Study type 81
B. None of the lost exposed subjects developed disease;
all of the lost unexposed subjects developed disease
              Exposed   Not Exposed   Total
Disease       16        233           249
No Disease    234       517           751
Total         250       750           1000
Relative Risk = (16/250) / (233/750) = 0.21
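The observed relative risk and the two extreme scenarios can be recomputed from the counts on these slides. In this sketch the extremes are obtained by assigning every subject who refused or was lost (250-183 exposed, 750-540 unexposed) to "disease" or "no disease"; the function and variable names are illustrative.

```python
def risk_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
    """Ratio of the risk in the exposed to the risk in the unexposed."""
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)


# Counts from the slides.
selected_exp, selected_unexp = 250, 750      # originally selected cohort
remaining_exp, remaining_unexp = 183, 540    # still under study at the end
cases_exp, cases_unexp = 16, 23              # observed disease

missing_exp = selected_exp - remaining_exp        # 67 exposed with unknown outcome
missing_unexp = selected_unexp - remaining_unexp  # 210 unexposed with unknown outcome

observed = risk_ratio(cases_exp, remaining_exp, cases_unexp, remaining_unexp)
# Extreme A: every missing exposed subject diseased, no missing unexposed diseased.
extreme_a = risk_ratio(cases_exp + missing_exp, selected_exp,
                       cases_unexp, selected_unexp)
# Extreme B: no missing exposed subject diseased, every missing unexposed diseased.
extreme_b = risk_ratio(cases_exp, selected_exp,
                       cases_unexp + missing_unexp, selected_unexp)

print(round(observed, 2), round(extreme_a, 2), round(extreme_b, 2))
```

The gap between the two extremes shows how little the observed estimate alone says when outcomes are unknown for this many subjects.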
Case-Control Studies
Study type 82
Case-Control Studies
Study type 83
Potential Advantages
A. Quick and less expensive
B. Well-suited to evaluation of diseases with
long latent periods
C. Optimal if disease is rare
D. Can examine multiple etiologic factors for
a single disease.
Selection of Cases
Study type 84
A. Sources:
Hospital
Physicians’ office
Disease registries
Vital statistics bureau
B. Type:
General population vs. special group
Incident and/or prevalent cases
Representativeness
Study type 85
(continue)
C. Definition:
1. Should truly be a case: validation of
diagnosis using objective criteria
2. Should represent a defined eligible
population: inclusions and exclusions
should be specified clearly
Study type 86
3. Detection issues that may be important
a. Variation among cases in medical care,
self or medical diagnostic procedures
b. Must a series of sequential events be
present for detection to occur?
E.g., perceived symptoms followed by
drug use
Pathway Between A Disease and A Diagnosis
Study type 87
[Figure: flowchart from a diseased person at home to a diagnosed
“case”. Decision points shown: Clinical signal; Is signal overt?;
Medical surveillance; Diagnostic suspicion?; Exploratory exam?;
Referral for definitive test procedure?; Procedure performed?;
Positive result?; Patient is diagnosed “case”]
Selection of Controls - Principles
Study type 88
A. Controls should be selected from the same
population - the source population or study
base – that gives rise to the cases.
Case-control studies can be viewed as
efficient versions of cohort studies, in which
the relative sizes of the denominators of the
incidence rates are estimated by taking a
sample of the source population.
(continue)
Study type 89
B. Controls should be selected independently
of their exposure status and should be
representative of the source population
with respect to exposure.
Study type 90
(continue)
C. For unmatched control selections, the
probability of selecting any potential
control subject should be proportional to
the amount of time that he or she
contributes to the denominator of the rates
that would have been calculated, had a
cohort study of the source population
been undertaken.
(continue)
Study type 91
For example, if in the source population one
subject contributes twice as much person-time
during the study period as another subject, the
first subject should have twice the probability
of the second of being selected as a control in
the case-control study.
In doing so, the sampling rate for exposed
and unexposed controls will be the same,
and the ratio of pseudo-rates will be equal
to the incidence rate ratio in the source
population.
Study type 92
(continue)
D. If a subject’s exposure may vary over
time, cases’ exposure (or exposure
history) at the time of disease occurrence
should be used as the indicator of
exposure.
(continue)
Study type 93
E. To ensure that the number of exposed
and unexposed controls will be in
proportion to the amount of exposed
and unexposed person-time in the
source population:
one must sample controls at a steady rate
throughout the study period and use the
control’s exposure (or history) at the time
of sampling; exposure after the time of
selection must be ignored.
(continue)
Study type 94
F. The time during which a subject is eligible
to be a control should be the same in which
that individual is eligible to be a case,
if the disease should occur.
(continue)
Study type 95
One way to implement this rule is to choose
controls from the set of individuals in the
source population who are at risk of becoming
a case at the time that the case is diagnosed;
this set is sometimes referred to as the risk set
for the case.
Controls sampled in this manner are matched
to the case with respect to sampling time; thus,
if time is related to exposure, the resulting data
should be analyzed as matched data.
Study type 96
(continue)
It is also possible to enforce the rule in unmatched
sampling if one knows the time interval at risk for
each population member; one then selects a
control by sampling members with probability
proportional to time at risk and then randomly
samples a time to measure exposure within the
interval at risk.
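Principle C above says that when controls are sampled in proportion to person-time, at the same rate for exposed and unexposed, the ratio of pseudo-rates equals the incidence rate ratio in the source population. The sketch below checks this with expected control counts rather than random sampling; all numbers and names are hypothetical.

```python
# Hypothetical source population.
pt_exposed, pt_unexposed = 20_000, 80_000   # person-years at risk
cases_exposed, cases_unexposed = 40, 80     # incident cases

# Incidence rate ratio a full cohort study would have estimated.
true_rate_ratio = (cases_exposed / pt_exposed) / (cases_unexposed / pt_unexposed)

# Controls sampled at the same rate per person-year regardless of exposure.
sampling_rate = 0.01                        # expected controls per person-year
controls_exposed = sampling_rate * pt_exposed
controls_unexposed = sampling_rate * pt_unexposed

# Pseudo-rates: cases divided by controls standing in for person-time.
pseudo_rate_exposed = cases_exposed / controls_exposed
pseudo_rate_unexposed = cases_unexposed / controls_unexposed
pseudo_rate_ratio = pseudo_rate_exposed / pseudo_rate_unexposed

print(round(true_rate_ratio, 2), round(pseudo_rate_ratio, 2))
```

The sampling rate cancels in the ratio, which is why its actual value need not be known. In a real study the control counts would vary by chance around these expected values.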
Sources of Controls
A. General population Controls
1. Useful when the series of cases is
population-based
2. Select from a population registry or
random digit dialing (RDD)
3. Advantages: generalizability
Study type 97
Study type 98
B. Hospital controls
1. Select from other hospital admissions
2. Advantages:
a. Feasibility
b. High cooperation rate
c. Have been ill, therefore, “mental set” is
similar (potentially less recall error),
d. Makes cases and controls similar with
respect to some determinants of
hospitalization
Study type 99
3. Disadvantages:
a. Selection (Berkson’s) bias
b. Controls may have a condition that
shares etiologic features with the disease
under study
4. Preferable to select from many diagnostic
categories
Study type 100
C. Neighborhood Controls
1. Similar to general population but also
matches on factors related to geography
2. Select by a modification of random digit
dialing or “walking algorithm”
D. Others:
Friends
Siblings
Co-workers
Guidelines for control selection
Study type 101
A.Goal: avoid selection bias
B. Draw controls from same reference
population as cases
C. Subjects not at risk for disease should be
excluded from control group
D. If cases are excluded because they are
not at risk for exposure, similar criteria
should be applied to controls
Study type 102
(continue)
F. Individuals with medical conditions known
to be associated with the exposure under
study (positively or negatively) usually
should be excluded from the control series
e.g., aspirin exposure for MI: rheumatoid
arthritis or peptic ulcer controls
G. Customary rule: Hospital controls should
be selected from more than one disease
category.
Study type 103
(continue)
H. Ideally, controls should undergo the same
diagnostic procedures as cases. However,
this often isn’t practical.
I. Exposure status and confounders must be
able to be measured comparably in cases
and controls. Case or control status must
be defined before exposure is determined.
Study type 104
(continue)
J. Problems: agents that cause one disease
in an organ often cause other diseases of
that organ
e.g., smoking -> lung cancer and
chronic bronchitis
may bias association towards the null.
Multiple Control Groups
Study type 105
A. Increased cost
B. Useful when no single control group is best
C. Useful to compare findings among groups
D. If findings contrast may help determine etiology
e.g., association of Hodgkin's disease (HD) and
tonsillectomy: using spouse controls, OR = 3.1;
using sibling controls, OR = 1.4.
So some aspect of lifestyle in childhood, perhaps
an infection, may be a cause of HD.
F. Caveat: may be hard to explain the results
Study type 106
Matching
A.Refers to the selection of a reference
series - unexposed subjects in a cohort
study or controls in a case-control study –
that is identical, or nearly so, to the index
series with respect to the distribution of
one or more potentially confounding factors.
Individual matching: performed subject by
subject
Frequency matching: performed for
groups of subjects
(continue)
Study type 107
B. In a cohort study, the index subject is
exposed, and one or more unexposed
subjects are matched to each exposed
subject.
C. In a case-control study, the index subject
is a case, and one or more controls are
matched to each case.
(continue)
Study type 108
D. Frequency matching involves selection
of an entire stratum of reference subjects
with matching-factor values equal to that
of a stratum of index subjects.
e.g., case: 50% male and 50% female
control: also select a combination
of 50% male and 50% female
Purpose and Effect of Matching
Study type 109
A.The chief importance of matching in
observational studies stems from its effect
on study efficiency.
B. In a cohort study, without competing risks
or losses to follow-up, no additional action
is required in the analysis to control
confounding of the point estimate by the
matching factors, because matching
unexposed to exposed prevents an
association between exposure and the
matching factors.
Study type 110
(continue)
C. In case-control studies, if the matching
factors are associated with the exposure
in the source population, matching requires
control by matching factors in the analysis,
even if the matching factors are not risk
factors for the disease.
Study type 111
Hypothetical source population: 2,000,000
                 Male                    Female
            exp        unexp        exp        unexp
diseased    4,500      50           100        90
total       900,000    100,000      100,000    900,000
RRmale = 10    RRfemale = 10
Crude RR = 32.9
Gender is a confounder
Study type 112
Matching in cohort study
1,000,000 exposed
10% sample: 100,000 exposed
  90,000 Male, 10,000 Female (Male : Female = 9:1)
Matched unexposed: 100,000
  90,000 Male, 10,000 Female (Male : Female = 9:1)
Study type 113
(continue)
                 Male                 Female
            exp       unexp       exp       unexp
diseased    450       45          10        1
total       90,000    90,000      10,000    10,000
RRmale = 10    RRfemale = 10
Crude RR = 10
Confounder controlled!
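The figures in the cohort matching example can be checked with a short Python sketch. It computes stratum-specific and crude relative risks for the hypothetical source population and for the gender-matched cohort; the dictionary layout and function names are illustrative.

```python
def risk_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)


def crude_and_stratum_rr(strata):
    """strata maps name -> (cases_exp, n_exp, cases_unexp, n_unexp)."""
    stratum_rr = {name: risk_ratio(*counts) for name, counts in strata.items()}
    # Collapse over strata to get the crude (unstratified) table.
    totals = [sum(col) for col in zip(*strata.values())]
    return stratum_rr, risk_ratio(*totals)


# Source population from the slides.
source = {"male": (4500, 900_000, 50, 100_000),
          "female": (100, 100_000, 90, 900_000)}
# Cohort with unexposed matched to a 10% sample of the exposed on gender.
matched = {"male": (450, 90_000, 45, 90_000),
           "female": (10, 10_000, 1, 10_000)}

source_strata, source_crude = crude_and_stratum_rr(source)
matched_strata, matched_crude = crude_and_stratum_rr(matched)
print(round(source_crude, 1), round(matched_crude, 1))
```

In the source population the crude estimate is far from the stratum-specific value of 10, while in the matched cohort the crude and stratum-specific values agree, because matching removed the association between gender and exposure.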
Study type 114
Matching in case-control study
4,740 cases: 4,550 Male, 190 Female (Male : Female = 24 : 1)
1,995,260 non-cases: 995,450 Male, 999,810 Female
Using frequency matching, select 4,740 controls:
sample 4,550 Male and 190 Female controls
Selection proportion in male non-cases: 4,550/995,450
Selection proportion in female non-cases: 190/999,810
Study type 115
(continue)
                  Male                   Female
             exp        unexp       exp        unexp
cases        4500       50          100        90
controls     4092       457         19         171
ORmale = 10
ORfemale = 10
Crude OR = 5
Confounding!
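A minimal sketch of the same comparison in code, using the frequency-matched counts above. The Mantel-Haenszel summary odds ratio is not named in the slides; it is used here only as one common way to control for a matching factor in the analysis, and all names are illustrative.

```python
# Per stratum: (exposed cases, unexposed cases, exposed controls, unexposed controls).
strata = {
    "male": (4500, 50, 4092, 457),
    "female": (100, 90, 19, 171),
}


def odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp):
    """Exposure odds among cases divided by exposure odds among controls."""
    return (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)


stratum_or = {sex: odds_ratio(*cells) for sex, cells in strata.items()}

# Crude OR: collapse the table over gender, ignoring the matching factor.
collapsed = [sum(column) for column in zip(*strata.values())]
crude_or = odds_ratio(*collapsed)

# Mantel-Haenszel summary OR (assumed analysis method, stratifying on gender).
numerator = sum(a * d / (a + c + b + d) for a, c, b, d in strata.values())
denominator = sum(c * b / (a + c + b + d) for a, c, b, d in strata.values())
mh_or = numerator / denominator

print(stratum_or)
print(round(crude_or, 2), round(mh_or, 2))
```

The crude odds ratio is biased toward the null, while the stratified summary recovers a value close to the stratum-specific odds ratios.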
Study type 116
What accounts for this discrepancy?
In a cohort study, matching is of unexposed
to exposed. It is undertaken without regard to
disease status, which is unknown at the start
of follow-up, and it alters the distribution of
the matching factors in the entire source
population from which study cases arise.
Study type 117
(continue)
In contrast, matching in a case-control
study involves matching non-diseased to
diseased and thus affects only the
distribution of controls; if the matching
factors are associated with exposure, the
selection process will be differential with
respect to both exposure and disease,
thereby resulting in selection bias.
Study type 118
Matching and Efficiency
Suppose that one anticipates that the age
distribution for cases is shifted strongly
toward older ages, compared with the age
distribution of the entire population.
As a result, without matching, there may be
some age strata with many cases and few
controls, and others with few cases and many
controls. If controls were matched to cases
by age, the ratio of controls to cases would
instead be constant over age strata.
Cost of Matching
Study type 119
A. Research limitation: If a factor has been
matched in a study, it is no longer possible
to estimate the effect of that factor from the
stratified data alone, since matching distorts
the relation of the factor to the disease.
B. It is still possible to study the factor as a
modifier of relative risk by seeing how the
odds ratio varies across strata.
C. Cost $
Overmatching
Study type 120
A. Matching that harms statistical efficiency,
such as case-control matching on a
variable associated with exposure but not
disease.
B. Matching that harms validity, such as
matching on an intermediate between
exposure and disease.
C. Matching that harms cost efficiency,
such as matching on a variable not
associated with disease and exposure.
Why perform Matching?
Study type 121
Besides enhancing efficiency, here are some situations
in which matching is desirable or even necessary.
A. If the process of obtaining exposure and confounder
information from the study subjects is expensive,
it may be more efficient to optimize the amount of
information obtained per subject than to increase the
number of subjects.
B. In the process of control selection, neighborhood,
sibling, spouse, friend, and occupational controls are
sometimes chosen and are inevitably matched to cases on
some factors.
Study type 122
(however)
Selection of neighborhood, friend, sibling, spouse,
and occupational controls may induce bias under ordinary
circumstances. For example, friendship may be related
to exposure (e.g., through lifestyle factors), but not to
disease. As a result, use of such friend controls could
entail a loss of statistical efficiency.
The decision to use convenient controls should weigh
any cost savings against any efficiency loss and bias
relative to the viable alternatives (e.g., general
population controls).
(continue)
Study type 123
C. Matching is sometimes employed to achieve
comparability in the quality of information
collected.
A typical situation in which such matching
might be undertaken is a case-control study
in which some or all of the cases have
already died and surrogates must be
interviewed for exposure and confounder
information. Many investigators prefer to
match dead controls to dead cases.
Study type 124
Multiple Controls per Case
A. Improved precision
Efficiency of matching is often expressed as the
ratio of the variance of the odds ratio with infinitely
many controls per case to its variance with M
controls per case.
Efficiency = M/(M+1)
So, one control is 50% efficient
four controls are 80% efficient
Little is gained beyond M=4
Study type 125
(continue)
B. Useful when controls plentiful and
inexpensive and/or when cases are rare.
C. Varying number of controls per case is
possible
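The diminishing return from adding controls can be seen by tabulating the efficiency formula M/(M+1) for a few values of M. This is a small illustrative sketch; the function name is an assumption.

```python
def relative_efficiency(controls_per_case):
    """Efficiency relative to an unlimited number of controls: M / (M + 1)."""
    return controls_per_case / (controls_per_case + 1)


for m in range(1, 7):
    print(f"M = {m}: {relative_efficiency(m):.0%}")
```

Going from one to four controls per case raises efficiency from 50% to 80%, while each further control adds only a few percentage points.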
Study type 126
Classifying Exposure
A. Most commonly obtained by interview or
records.
B. Must be obtained comparably in cases and
controls.
C. Greater potential for interviewer bias and
need for blinding interviewers than in cohort
studies.
Study type 127
(continue)
D. Latency period and temporality
considerations if measurements made
directly on study subjects.
E. Recall bias
Potential Biases in
Case-Control Studies
Study type 128
I. Selection bias
A. When cases and controls are not selected
from the same (or a similar) reference
population (e.g., Berkson’s bias)
B. Nonresponse
C. Cases died from competing risks
Study type 129
(continue)
II. Measurement Biases
A.Diagnostic suspicion
B. Exposure suspicion
C. Recall
D. Random (Non-differential)
misclassification
Study type 130
Various Designs of Case-Control Study
Case-based case-control study
Study type 131
Cases and noncases are identified at a given point in
time among living individuals. Although this study is carried
out “cross-sectionally” (i.e., cases and controls are identified
at the same time), cases must necessarily have occurred over a
given time period prior to their inclusion in the study.
Thus, it is necessary to assume that the cases who
survive through the time of the study are representative of
all cases with regard to the exposure experience and that,
if exposure data are obtained through interviews, recall or
other bias will not distort their reported exposure status.
(Szklo and Nieto, Epidemiology - Beyond the Basics, 2000)
Study type 132
(continue)
• Source of cases is often one or more
hospitals or other medical facilities
• Problem is identifying the population
who would come to those institutions if
they were diagnosed with the disease
• Careful consideration has to be given to
factors causing someone to show up at
that institution with that diagnosis
Case-based design using prevalent cases:
essentially same as cross-sectional design
Study type 133
Study type 134
Case-Control Study with Case-Based Sampling
[Figure: sampling within a cohort study, ascertaining all cases (D = disease = case) and sampling controls (C = control, no disease) from subjects disease free at end of follow-up. Possible bias: potential controls not in study at end of follow-up.]
Study type 135
(continue)
To validly compare cases and controls
regarding their exposure status, it is
necessary to assume that they originate
from the same reference population.
Study type 136
Example of case-based design using prevalent cases
• Sampling glioma patients under
treatment in a hospital during study
period
• Poor survival so patients in treatment
will over-represent those who live
longest
• Nature of the bias is variable and not
predictable
Case-cohort Study and
Nested Case-Control Study
Study type 137
When cases are identified within a well defined cohort,
it is possible to carry out nested case-control or
case-cohort studies.
Case-control studies within a cohort are also known as
“hybrid or ambidirectional” designs because they
combine some of the features and advantages of both
cohort and case-control designs.
Nested Case-Control Study
Study type 139
Controls are from a random sample of the cohort
selected at the time each case occurs. This study
design is called a nested case-control design and
is based on a sampling approach known as
“incidence density sampling” or “risk-set sampling.”
The idea underlying this sampling scheme is that
it allows the comparison of cases with a subset of the
cohort members at risk of being cases at the time
when each case occurs - that is, a “risk set” of all
cohort members under observation at the time of
each case’s occurrence.
Study type 140
Study type 141
Nested Case-Control (Incidence Density Sampling)
[Figure: sampling within a cohort, including all cases and sampling controls from subjects disease free at the time each case is diagnosed. Cases = 10 D's; controls = 10 C's, formed from 9 risk sets (Risk Set 1, Risk Set 2, etc., through Risk Set 9).]
Study type 142
Risk set
A risk set is the set of people in the source
population who are at risk for disease
at the time a case occurs.
The definition of the risk set changes for each
case because the identity of the risk set is
related to the timing of the case.
If several diseases are to be studied, each one
will require its own control group to maintain
risk-set sampling.
Nested Case-Control Study
Study type 143
The control series represents the person-time
distribution of exposure in the source population.
The probability that any person in the source
population is selected as a control is
proportional to his or her person-time
contribution to the incidence rates in the source
population.
By this strategy, cases occurring later in the
follow-up are eligible to be controls for earlier
cases.
Study type 144
(continue)
Incidence density sampling is the equivalent of
matching cases and controls on duration of
follow-up.
In this situation, it can be demonstrated that
the estimated exposure odds ratio is a
statistically unbiased estimate of the rate ratio.
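Risk-set sampling as described above can be sketched in a few lines. The cohort below is hypothetical, everyone is assumed to enter follow-up at time 0, and the function names are illustrative; this is a simplified illustration, not a full sampling procedure.

```python
import random

# Hypothetical cohort: subject id -> (exit_time, became_case).
# exit_time is disease onset for cases, end of observation otherwise.
cohort = {
    "s1": (2.0, True),
    "s2": (5.0, False),
    "s3": (3.5, True),
    "s4": (5.0, False),
    "s5": (1.0, False),  # leaves observation before any case occurs
    "s6": (4.0, True),
}


def risk_set(cohort, case_id):
    """Subjects still under observation and disease free when case_id occurs."""
    case_time = cohort[case_id][0]
    return [sid for sid, (exit_time, _) in cohort.items()
            if sid != case_id and exit_time > case_time]


def sample_controls(cohort, controls_per_case, seed=0):
    """Draw controls at random from the risk set of each case."""
    rng = random.Random(seed)
    case_ids = sorted((sid for sid, (_, is_case) in cohort.items() if is_case),
                      key=lambda sid: cohort[sid][0])
    matched_sets = {}
    for case_id in case_ids:
        eligible = risk_set(cohort, case_id)
        k = min(controls_per_case, len(eligible))
        matched_sets[case_id] = rng.sample(eligible, k)
    return matched_sets


matched_sets = sample_controls(cohort, controls_per_case=2)
for case_id, controls in matched_sets.items():
    print(case_id, cohort[case_id][0], controls)
```

Note that s3 and s6, who become cases later, belong to the risk set of s1, so later cases remain eligible as controls for earlier cases.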
Study type 145
Example of Case-Control Incidence Density
Sampling
• Use cancer registry covering San
Francisco County to identify all new
cases of glioma during a defined time
period
• At time each new glioma case is
reported, randomly sample two controls
from current residents of San Francisco
Case-cohort Study
Study type 146
Controls are from a random sample of the total cohort
at baseline (case-cohort study), thus allowing some
cases that develop during follow-up to be part of both
the case and control groups.
Every person in the source population has the same
chance of being included as a control, regardless of
how much time that person has contributed to the
person-time experience of the cohort.
Study type 147
Study type 148
Case-Cohort Study
[Figure: sampling within a cohort, including all cases and sampling controls from all subjects at baseline of cohort. D = disease = case; C = control (no disease).]
Study type 149
Case-cohort Study
An important advantage of the case-cohort design is that
the availability of a sample of the cohort (the control
group) allows the estimation of risk factor distributions,
and the odds ratio is an estimate of the risk ratio rather
than the rate ratio.
If the risks are small, then the risk ratio is approximately
equal to the incidence rate ratio, so in many instances
there may be little difference between the result from
a case-cohort study and the result from a density or
nested case-control study.
Selection of Case-Cohort or
Nested Case-Control?
Study type 150
The definition of the risk set changes for each case
because the identity of the risk set is related to the timing
of the case. If several diseases are to be studied,
each one will require its own control group to maintain
risk-set sampling.
Control sampling for a case-cohort study, however,
requires just a single sample of people from the roster
of people who constitute the cohort. The same control
group could be used to compare with various case
series, just as the same denominators are used when
calculating the risks of various diseases in a cohort.
Study type 151
Why use case-control study within a cohort?
It is an efficient approach when additional information
that was not obtained or measured for the whole cohort
is needed.
A typical situation is a concurrent cohort study in which
serum samples are collected at baseline and stored in
freezers. Once a sufficient number of cases is accrued
during the follow-up, the frozen serum samples for cases
(or a sample of cases) and for a sample of controls can
be thawed and analyzed.
Study type 152
Why use case-control study within a cohort?
A similar situation arises when the assessment of key
exposures or confounding variables requires labor-intensive
data collection activities. Collecting this
additional information in cases and a sample of the
total cohort is a cost-effective alternative to using the
entire cohort.
(Szklo and Nieto, Epidemiology - Beyond the Basics, 2000)
Case-crossover
Study type 153
This design is a case-control study analogue of the
crossover study.
A crossover study is an experimental study in which
two (or more) interventions are compared, with each
study participant acting as his or her own control.
Each subject receives both interventions in a random
sequence, with some time interval between them so
that the outcome can be measured after each
intervention.
Case-crossover
Study type 154
A crossover study thus requires that the effect period
of the intervention is short enough so that it does not
persist into the time period during which the next
treatment is administered.
In a case-crossover study, all subjects in the study are
cases. The control series does not comprise a different
set of people but, rather, a sample of the time
experience of the cases before they develop disease.
The control information is obtained from the cases
themselves.
Case-crossover
Study type 155
Only certain types of study question can be studied
with a case-crossover design. The exposure must be
something that varies from time to time within a person.
The case-crossover study is convenient for evaluating
exposures that trigger a short-term effect, and the
disease must have an abrupt onset.
How short is brief? The duration of the exposure effect
should be shorter than the typical interval between
episodes of exposure so that the effect of exposure
is gone before the next episode of exposure occurs.
Case-crossover
Study type 156
First, a study hypothesis is defined in relation to a
specific exposure that causes the disease within a
specified time period. Each case is considered exposed
or unexposed according to the time relation specified
in the hypothesis.
Hypothesis Testing
Study type 157
A. Significance Testing:
Tests whether, beyond chance, a point estimate is
different from expected or is different from another
point estimate
B. Confidence Interval (CI):
The limits of precision of a point estimate within
which the true population value lies
(with a specified probability)
Study type 158
C. A 95% CI can be interpreted as follows: if the study
were repeated on 100 samples, about 95 of the resulting
intervals would contain the population value.
For a proportion the confidence interval is:
$p \pm z \times \sqrt{pq/n}$, where q = 1 - p
Therefore, the CI depends on the standard error
(variance of the point estimate, sample size) and
a probability value.
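As a small worked illustration of the confidence interval for a proportion, the sketch below uses hypothetical data (20 events among 100 subjects) and z = 1.96 for a 95% interval. The function name and the data are assumptions for illustration.

```python
import math


def proportion_ci(events, n, z=1.96):
    """Normal-approximation CI for a proportion: p +/- z * sqrt(p * q / n)."""
    p = events / n
    q = 1 - p
    standard_error = math.sqrt(p * q / n)
    return p - z * standard_error, p + z * standard_error


# Hypothetical sample: 20 events among 100 subjects.
lower, upper = proportion_ci(events=20, n=100)
print(round(lower, 3), round(upper, 3))

# A sample four times as large halves the standard error and narrows the interval.
lower_big, upper_big = proportion_ci(events=80, n=400)
print(round(lower_big, 3), round(upper_big, 3))
```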
Measures of Association in
Case-Control study
Study type 159
Odds Ratio = odds of exposure among cases / odds of exposure among controls
= (a/c)/(b/d) = ad/bc
odds of exposure among cases = exposed/unexposed
= a/c
odds of exposure among controls = exposed/unexposed
= b/d
               Case    Control
Exposed        a       b
Unexposed      c       d
Equivalent Hypotheses in
Case-Control Study
Study type 160
A. Is disease (case vs. control) status associated with
exposure status?
B. Does the odds of exposure differ between cases
and controls?
C. Does the odds ratio differ significantly from one?
D. Does the confidence interval for the odds ratio
include 1.0?
Study type 161
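Unmatched Case-Control Study
I. Point estimate
OR = ad/bc
II. Significance test of H0: OR = 1
$X^2 = \frac{(ad - bc)^2 \, T}{n_1 n_2 m_1 m_2}$
where T is the total number of subjects and n1, n2, m1, m2
are the row and column totals of the 2x2 table.
If any expected cell count is less than 5, use Fisher’s exact test.
III. Confidence interval for OR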
Study type 162
A. Woolf’s method
$CI = \exp\left(\ln(OR) \pm z \times SE[\ln(OR)]\right)$
where $SE[\ln(OR)] = \sqrt{1/a + 1/b + 1/c + 1/d}$
and ln is the natural logarithm
B. Test-based method
$CI = OR^{\,(1 \pm z / \sqrt{X^2})}$
where X2 is the chi-square value for the 2x2 table
Study type 163
Example:
                   Obese    Thin    Total
Colon cancer       30       120     150
No colon cancer    10       140     150
Total              40       260     300
OR = 3.5
X2 = 11.5, p < 0.001
Woolf’s 95% CI = [1.64, 7.45]
Test-based 95% CI = [1.69, 7.22]
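The figures in this example can be reproduced with a short sketch of the formulas for the unmatched design: the odds ratio, the chi-square statistic without continuity correction, Woolf's interval, and the test-based interval. Function and variable names are illustrative. The last digit of the test-based limits may differ slightly from the slide, which appears to use the rounded value X2 = 11.5.

```python
import math


def odds_ratio_summary(a, b, c, d, z=1.96):
    """a, b: exposed cases and controls; c, d: unexposed cases and controls."""
    odds_ratio = (a * d) / (b * c)
    total = a + b + c + d
    # Chi-square for a 2x2 table: (ad - bc)^2 * T / product of marginal totals.
    chi2 = (a * d - b * c) ** 2 * total / ((a + b) * (c + d) * (a + c) * (b + d))
    # Woolf's method: interval on the log scale, then exponentiate.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    woolf = (math.exp(math.log(odds_ratio) - z * se_log_or),
             math.exp(math.log(odds_ratio) + z * se_log_or))
    # Test-based method: OR raised to the power (1 +/- z / sqrt(chi2)).
    test_based = (odds_ratio ** (1 - z / math.sqrt(chi2)),
                  odds_ratio ** (1 + z / math.sqrt(chi2)))
    return odds_ratio, chi2, woolf, test_based


# Obese = exposed, thin = unexposed; colon cancer = case.
odds_ratio, chi2, woolf, test_based = odds_ratio_summary(a=30, b=10, c=120, d=140)
print(odds_ratio, round(chi2, 1))
print([round(x, 2) for x in woolf])
print([round(x, 2) for x in test_based])
```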
(supplement)
Study type 164
             Case    Control
exp          a       b
unexp        c       d
Odds Ratio = (a/c)/(b/d)
a/c: relative prevalence of exposure (odds) among cases
b/d: relative prevalence of exposure (odds) among controls
Study type 165
In density sampling, when:
1. The sample of cases gives an unbiased
estimate of the exposure distribution
among cases
2. The sample of controls gives an unbiased
estimate of the exposure distribution in
the population at risk over the study period
Relative prevalence of exposure (odds) among controls
≈
Relative prevalence of exposure among person-time
in population at risk
(continue)
Study type 166
$$\frac{\text{Relative prevalence of exposure among cases}}{\text{Relative prevalence of exposure among person-time}} = \frac{\text{Number of exposed cases} / \text{Number of unexposed cases}}{\text{Person-time among exposed} / \text{Person-time among unexposed}}$$
Study type 167
(continue)
$$\frac{\text{Number of exposed cases} / \text{Number of unexposed cases}}{\text{Person-time among exposed} / \text{Person-time among unexposed}} = \frac{\text{Number of exposed cases} / \text{Person-time among exposed}}{\text{Number of unexposed cases} / \text{Person-time among unexposed}}$$
Study type 168
(continue)
             Case    Person-time
exp          a       b
unexp        c       d
$$\text{Odds Ratio} = \frac{a/c}{b/d} = \frac{a/b}{c/d} = \text{Rate Ratio}$$
1:1 Matched Case-Control Study
                     Control
                     exp     unexp
Case     exp         a       b
         unexp       c       d
OR = b/c
Study type 169
Cross-Sectional Studies
(Surveys, Prevalence Studies)
I. Features:
A. Selection often by probability sampling
B. Subjects observed, questioned, examined to
determine disease status, their current or past
study factor level, and relevant variables.
Study type 170
Cross-Sectional Studies
(Surveys, Prevalence Studies)
Study type 171
II. Uses:
A. Descriptive
1. Measure prevalence of common diseases
2. Assess need for health services and facilities
in a target population
3. Assess impact of a planned intervention on the
health status of a target population
B. Analytic
1. Generate new etiologic hypotheses
2. Analyze the determinants of frequent diseases
of long duration, which often go undiagnosed
or unreported
Cross-Sectional Studies
Study type 172
The cross-sectional study can be conceptualized as a
way to analyze cohort data, albeit an often flawed one,
in that it consists of taking a “snapshot” of a cohort by
recording information on disease outcomes and
exposures at a single point in time.
Study type 173
Cross-Sectional Studies
Study type 174
Accordingly, the case-based case-control study
can also be regarded as a cross-sectional study,
as it includes cross-sectionally ascertained
prevalent cases and noncases .
Study type 175
Study type 176
Cross-sectional Study in a Dynamic Population
Cross-sectional sample of a dynamic population differs from
sampling in fixed cohort setting. Persons enter as well as leave
the population. Disease sampling is still of prevalent cases.
[Figure: subjects in a cross-sectional study of a dynamic population, with persons entering the population and persons leaving the population. D = disease = case.]
Data Analysis for Cross-Sectional Studies
Study type 177
When cross-sectional data are obtained for a defined
reference population or cohort, the analytic approach
may consist of either comparing point prevalence rates
for the outcome of interest between exposed and
unexposed individuals or using a “case-control” strategy,
in which prevalent cases and noncases are compared
with regard to odds of exposure.
Study type 178
Cross-sectional studies can also be done
periodically for the purpose of monitoring disease
or risk factor prevalence rates, as in the case
of the US National Health Survey.
Study type 179
Major Drawbacks of the Cross-Sectional
Studies for Etiologic Research
I. Temporality:
Separating cause from effect is impossible
Temporal bias typically occurs in cross-sectional surveys
when information is lacking on the time sequence with
regard to the presumed risk factor and the outcome.
In other words, it is difficult to know which came first,
the exposure or the disease.
I. Temporal Bias
Study type 180
To prevent temporal bias it is occasionally possible
to improve the information on temporality when
obtaining data through questionnaires, by asking questions
such as “When were you first exposed to … ?”
Obviously, even if temporality can be established
in a cross-sectional study, the investigator will
still have the incidence-prevalence bias to
contend with.
Study type 181
                             Point x                 Point y
Job A                        100 workers:            90 workers:
                             80 well, 20 ill         80 well, 10 ill
Job B                        100 workers:            110 workers:
                             95 well, 5 ill          95 well, 15 ill
(10 ill workers move from Job A to Job B between point x and point y)
Prevalence in job A:         20% (20/100)            11% (10/90)
Prevalence in job B:         5% (5/100)              14% (15/110)
Ratio of prevalence (A/B):   4.0                     0.8
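The reversal of the prevalence ratio between the two survey points follows directly from the counts in the example. A minimal sketch, with illustrative names and the slide's own numbers:

```python
# (ill workers, all workers) in each job at the two survey points.
point_x = {"A": (20, 100), "B": (5, 100)}
point_y = {"A": (10, 90), "B": (15, 110)}


def prevalence_ratio(survey):
    """Prevalence in job A divided by prevalence in job B."""
    ill_a, total_a = survey["A"]
    ill_b, total_b = survey["B"]
    return (ill_a / total_a) / (ill_b / total_b)


ratio_x = prevalence_ratio(point_x)
ratio_y = prevalence_ratio(point_y)
print(round(ratio_x, 1), round(ratio_y, 1))
```

A survey taken at point y would suggest that job B carries the higher risk, although the excess of ill workers there arose in job A.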
Bias of Cross-Sectional Studies
Study type 182
II. Incidence-Prevalence Bias
A study of prevalent cases will have a higher
proportion of cases with disease of long duration
than a study of incident cases.
The data obtained reflect determinants of survival
as well as etiology.
Study type 188
Choosing a Study Design
• What has already been done?
– If no research, a rapid and inexpensive
ecological study may be useful
– If several case-control studies have
already been done, what would yours
contribute?
– Is it worth repeating, in a different
population, a cohort study that has already
been done in one population (eg, in women
rather than in men)?
Study type 189
(continue)
Cohort study decision:
– Need to represent a larger population?
• Not necessarily relevant to
biological question of relative
disease risk in exposed and
unexposed
• May be important to generalizing
findings
(continue)
Study type 190
Larger cohort versus longer follow-up:
Shorter follow-up limits potential usefulness
of cohort to examine other research
questions.
Shorter follow-up desirable if rapid answer
to research question is a high priority
Choosing a Study Design: Case-cohort versus
nested case-control
Study type 191
• Nested case-control somewhat more
statistically efficient in cohorts with long
follow-up and substantial censoring
• Analysis is more familiar and available
for nested case-control
• Power of nested case-control requires
only an estimate of the number of cases and
controls; case-cohort requires
information on the whole cohort and the
dropout rate
Study type 192
(continue)
• Case-cohort can use same controls for
multiple disease outcomes
• Case-cohort allows direct modeling of
disease incidence in exposed and
unexposed
• Case-cohort allows multiple time scales
(age, calendar time); nested case-control
only one
• Nested case-control allows more efficient
collection of time dependent exposures
Study type 193
(continue)
• Case-cohort can use same controls for a
future period of additional cohort follow-up
• Case-cohort can use controls for other
purposes (such as monitoring compliance)
• Controls can be selected more rapidly in
case-cohort; nested case-control may
require control selection at end of study for
late cases