Transcript Confounder

Confounding in epidemiology
Maura Pugliatti, MD, PhD
Associate Professor of Neurology
Dept. of Clinical and Experimental Medicine, Unit of Clinical Neurology
University of Sassari, Italy
1st International Course of Neuroepidemiology
Chisinau, Moldova, 24-28 Sept. 2012
Definitions
“Confounding, the situation in which an apparent effect of an
exposure on risk is explained by its association with other factors, is
probably the most important cause of spurious associations in
observational epidemiology”
BMJ Editorial: “The scandal of poor epidemiological research” BMJ 2004;329:868-869
“Bias of the estimated effect of an exposure on an outcome, due to
the presence of a common cause of the exposure and the
outcome”
Porta, 2008
Overview
Causality: central concern of epidemiology
Confounding: central concern when
establishing causality
Four approaches to understand confounding
Avoiding and controlling for confounding is
essential in health research
Causality
Main application of epidemiology:
to identify etiologic (causal) associations
between exposure(s) and outcome(s)
[Diagram: Exposure → ? → Outcome]
Key biases in identifying causal effects:
• Causal effect
• Random error
• Confounding
• Information bias (misclassification)
• Selection bias
• Bias in inference
• Reporting & publication bias
• Bias in knowledge use
[Diagram labels: RRcausal (“truth”) vs. RRassociation]
Adapted from: Maclure M, Schneeweiss S. Epidemiology 2001;12:114-122.
Confounding: four approaches
1. “Mixing of effects”
2. Based on a priori criteria (classical
approach)
3. Data-based criteria
4. “Counterfactual” and non-comparability
approaches
Overlapping
“Confounding is confusion, or mixing, of
effects; the effect of the exposure is mixed
together with the effect of another variable,
leading to bias”
Latin: “confundere” = “to mix together”
Rothman KJ. Epidemiology. An introduction. Oxford: Oxford University Press, 2002
Association between birth order and Down Syndrome
Data from Stark and Mantel (1966)
Association between maternal age and Down Syndrome
Data from Stark and Mantel (1966)
Association between maternal age and Down Syndrome,
stratified by birth order
Data from Stark and Mantel (1966)
A factor is a confounder if 3 criteria are met:
1. A confounder must be causally or non-causally
associated with the exposure in the source population
(study base) being studied;
2. A confounder must be a causal risk factor (or a
surrogate measure of a cause) for the disease in the
unexposed cohort; and
3. A confounder must not be an intermediate cause (not
an intermediate step in the causal pathway between
the exposure and the disease)
[Diagrams: (1) C associated with E; (2) C as a cause of D;
(3) E → C → D crossed out (X)]
[Diagram: Confounder (C), Exposure (E), Disease (outcome) (D)]
[Diagram: Exposure (E) → Intermediate cause (C) → Disease (D)]
Szklo M, Nieto FJ. Epidemiology: Beyond the basics. Aspen Publishers, Inc., 2000.
Gordis L. Epidemiology. Philadelphia: WB Saunders, 4th Edition.
Confounder:
‘parent’ of the exposure, not ‘daughter’ of
the exposure!!!
[Diagram: Confounder (C), Exposure (E), Disease (D)]
Confounding factor: Maternal Age (C)
Exposure: Birth Order (E)
Disease: Down Syndrome (D)
Simple causal graphs
[Diagram with nodes C, E, D]
Maternal age (C) can confound the association
between multivitamin use (E) and the risk of certain
birth defects (D)
Hernan MA, et al. Causal knowledge as a prerequisite for confounding evaluation: an application
to birth defects epidemiology. Am J Epidemiol 2002;155:176-84.
Complex causal graphs
[Diagram with nodes U, C, E, D]
History of birth defects (C) may increase the chance of
periconceptional vitamin intake (E). A genetic factor (U) could
have been the cause of previous birth defects in the family, and
could again cause birth defects in the current pregnancy (D)
Hernan MA, et al. Causal knowledge as a prerequisite for confounding evaluation: an
application to birth defects epidemiology. Am J Epidemiol 2002;155:176-84.
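A minimal sketch of how the graph described above could be written down and searched for "backdoor" paths, i.e. paths from E to D that start with an arrow pointing into E. The edge list is an assumption read off the text of the slide (U → C, U → D, C → E), and the helper name `backdoor_paths` is hypothetical. The function only lists candidate paths; it does not check whether a path is open or blocked.

```python
# Edges (parent, child) as described in the text of the slide (an assumption):
# U = genetic factor, C = history of birth defects,
# E = periconceptional vitamin intake, D = birth defect in current pregnancy.
EDGES = [("U", "C"), ("U", "D"), ("C", "E")]


def backdoor_paths(edges, exposure, outcome):
    """List simple paths from exposure to outcome whose first edge points INTO the exposure."""
    neighbours = {}
    for parent, child in edges:
        neighbours.setdefault(parent, set()).add(child)
        neighbours.setdefault(child, set()).add(parent)

    paths = []

    def walk(node, path):
        # Depth-first search that ignores arrow direction after the first step.
        if node == outcome:
            paths.append(path)
            return
        for nxt in sorted(neighbours.get(node, ())):
            if nxt not in path:
                walk(nxt, path + [nxt])

    for parent, child in edges:
        if child == exposure:
            walk(parent, [exposure, parent])
    return paths


paths = backdoor_paths(EDGES, "E", "D")
print(paths)  # [['E', 'C', 'U', 'D']]
```

The single path found, E ← C → ... read as E ← C ← U → D, passes through C, which is why adjusting for history of birth defects (C) is considered in this example even though C itself does not cause D.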
More complicated causal graphs
Physical
Activity
Smoking
A
B
BMI
C
U
E
D
Calcium
supplementation
Bone
fractures
Source: Hertz-Picciotto
A factor is a confounder if:
a) the effect measure is homogeneous across the
strata defined by the confounder and
b) the crude and common stratum-specific
(adjusted) effect measures are unequal (“lack of
collapsibility”)
Usually evaluated using 2x2 tables, and simple
stratified analyses to compare crude effects with
adjusted effects
“Collapsibility is equality of stratum-specific measures of effect with the crude
(collapsed), unstratified measure” Porta, 2008, Dictionary
Crude vs. Adjusted Effects
Crude: does not take into account the effect of the
confounder
Adjusted: accounts for the confounder
Mantel-Haenszel estimator
Multivariate analyses (e.g. logistic regression)
Confounding is likely when:
RRcrude ≠ RRadjusted
ORcrude ≠ ORadjusted
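For reference, the usual form of the Mantel-Haenszel pooled odds ratio is given below. The notation is an assumption, not taken from the slides: in stratum i, a_i and b_i are the exposed cases and exposed non-cases, c_i and d_i the unexposed cases and unexposed non-cases, and n_i is the stratum total.

```latex
\[
\mathrm{OR}_{\mathrm{MH}}
  = \frac{\sum_{i} a_i d_i / n_i}{\sum_{i} b_i c_i / n_i},
\qquad n_i = a_i + b_i + c_i + d_i
\]
```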
Stratified Analysis
1. Crude 2 x 2 table: calculate the crude OR (or RR), ORCrude
2. Stratify by confounder (Stratum 1, Stratum 2): calculate the
OR for each stratum (OR1, OR2)
3. If the stratum-specific ORs are similar, calculate the
adjusted OR (e.g. MH)
4. If Crude OR ≠ Adjusted OR, confounding is likely;
if Crude OR = Adjusted OR, confounding is unlikely
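The steps above can be sketched in a few lines of Python. The counts are hypothetical, chosen only so that the arithmetic is easy to follow; each stratum is written as (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases), and the function names are illustrative.

```python
def odds_ratio(a, b, c, d):
    """OR for a 2 x 2 table: a, b = exposed cases/non-cases; c, d = unexposed cases/non-cases."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled OR over a list of (a, b, c, d) tables."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical data: two strata of the confounder.
strata = [(80, 20, 40, 10), (10, 40, 20, 80)]

# Step 1: collapse the strata into the crude table and compute the crude OR.
crude_table = tuple(sum(column) for column in zip(*strata))
crude_or = odds_ratio(*crude_table)

# Step 2: stratum-specific ORs.
stratum_ors = [odds_ratio(*table) for table in strata]

# Step 3: the stratum ORs are similar, so pool them.
mh_or = mantel_haenszel_or(strata)

print(crude_or, stratum_ors, mh_or)  # 2.25 [1.0, 1.0] 1.0
```

In this made-up example the crude OR (2.25) differs from the adjusted OR (1.0), so step 4 would conclude that confounding is likely.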
Ideal “causal contrast” between exposed
and unexposed groups:
“A causal contrast compares disease frequency
under two exposure distributions, but in one target
population during one etiologic time period”
If the ideal causal contrast is met, the observed
effect is the “causal effect”
Maldonado & Greenland, Int J Epi 2002;31:422-29
Ideal counterfactual comparison to
determine causal effects:
Exposed cohort: Iexp
Unexposed cohort: Iunexp
Initial conditions are identical in the
exposed and unexposed groups, except
for presence of exposure (=cause)
RRcausal = Iexp / Iunexp
Maldonado & Greenland, Int J Epi 2002;31:422-29
What happens in reality?
Exposed cohort: Iexp
Unexposed cohort: Iunexp
Substitute, unexposed cohort: Isubstitute
In this case:
IDEAL: RRcausal = Iexp / Iunexp
ACTUAL: RRassoc = Iexp / Isubstitute
“Confounding is present if the substitute population represents
imperfectly what the target would have been like under the
counterfactual condition”
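A small numerical illustration of the ideal versus actual contrast. All three incidences are hypothetical (expressed as cases per 1000 persons); in a real study Iunexp for the exposed cohort is never observed, which is the whole point of the slide.

```python
def risk_ratio(incidence_a, incidence_b):
    """Ratio of two incidences measured on the same scale."""
    return incidence_a / incidence_b


# Hypothetical incidences, cases per 1000 persons.
I_EXP = 300         # exposed cohort (observed)
I_UNEXP = 100       # the same cohort had it been unexposed (counterfactual, not observable)
I_SUBSTITUTE = 150  # the actual unexposed comparison group

rr_causal = risk_ratio(I_EXP, I_UNEXP)      # ideal contrast
rr_assoc = risk_ratio(I_EXP, I_SUBSTITUTE)  # what the study can estimate

# The substitute misrepresents the counterfactual, so confounding is present.
confounding_present = I_SUBSTITUTE != I_UNEXP

print(rr_causal, rr_assoc, confounding_present)  # 3.0 2.0 True
```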
Simulating the counterfactual comparison:
Experimental Studies: Randomized Clinical Trials
[Diagram: Eligible population → Randomization → Treated individuals
(Disease + / Disease -) vs. Untreated individuals (Disease + /
Disease -); compare rates]
Randomization helps to make the groups “comparable” (i.e. similar
initial conditions) with respect to known and unknown confounders
Confounding is unlikely at randomization - time t0
Simulating the counterfactual comparison:
Observational Studies: Cohort studies, case-control studies
[Diagram: Exposed cohort (Disease + / Disease -) vs. Unexposed cohort
(Disease + / Disease -), followed from PRESENT to FUTURE; compare rates]
In observational studies, because exposures are not assigned randomly,
exchangeability cannot be guaranteed: “initial conditions” are likely to be
different and the groups may not be comparable
Confounding:
Observational studies vs
randomized trials
Example:
Aspirin to reduce cardiovascular mortality
Confounding: adjustment and controls
• Control at the design stage
– Randomization
– Restriction
– Matching
• Control at the analysis stage
– Conventional approaches
• Stratified analyses
• Multivariate analyses
– Newer approaches
• Graphical approaches using DAGs
• Propensity scores
• Instrumental variables
• Marginal structural models
• Options at the design stage:
– Randomization
• Reduces potential for confounding by generating groups that
are fairly comparable with respect to known and unknown
confounding variables
– Restriction
• Eliminates variation in the confounder (e.g. only recruiting one
gender)
– Matching
• Involves selection of a comparison group that is forced to
resemble the index group with respect to the distribution of
one or more potential confounders
Randomization
• Randomization
– Only for intervention studies
– Definition: random assignment of study subjects to
exposure categories
– Aim: to control/reduce the effect of confounding variables,
including those of which the investigator is unaware (i.e. both
known and unknown confounders tend to be distributed evenly
because of randomization)
– Randomization does not always eliminate confounding
• Covariate imbalance in small trials
• “Maldistribution” of potentially confounding variables after
randomization (“Table I: Baseline characteristics” in the
randomized trial)
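The point about covariate imbalance in small trials can be illustrated with a short simulation. This is a sketch under stated assumptions: a single binary prognostic factor with 50% prevalence, equal allocation to two arms, and hypothetical trial sizes of 20 and 2000 subjects. The function name `mean_imbalance` is illustrative.

```python
import random


def mean_imbalance(n_subjects, n_trials, prevalence=0.5, seed=0):
    """Average absolute difference between arms in the proportion carrying a prognostic factor."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        # 1 = subject carries the (hypothetical) prognostic factor.
        factor = [1 if rng.random() < prevalence else 0 for _ in range(n_subjects)]
        # Random allocation: shuffle the subjects, then split them into two equal arms.
        rng.shuffle(factor)
        half = n_subjects // 2
        treated, control = factor[:half], factor[half:]
        total += abs(sum(treated) / len(treated) - sum(control) / len(control))
    return total / n_trials


small_trial = mean_imbalance(n_subjects=20, n_trials=200)
large_trial = mean_imbalance(n_subjects=2000, n_trials=200)
print(f"mean imbalance, n=20:   {small_trial:.3f}")
print(f"mean imbalance, n=2000: {large_trial:.3f}")
```

Randomization balances the factor on average in both settings, but any single small trial can end up with a sizeable chance imbalance, which is what the baseline characteristics table is meant to reveal.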
Randomization breaks any links
between treatment and prognostic factors
[Diagram: Randomization removes (X) the link between Confounder (C)
and Exposure (E); nodes: Confounder (C), Exposure (E),
Disease (outcome) (D)]
Restriction
• The distribution of the potential confounding factors does
not vary across exposure or disease categories
– An investigator may restrict study subjects to only those falling
within specific level(s) of a confounding variable
• Advantages of restriction
– straightforward, convenient, inexpensive (but reduces
recruitment!)
• Disadvantages of restriction
– Limits number of eligible subjects
– Limits ability to generalize the study findings
– Residual confounding
– Impossible to evaluate the relationship of interest at different
levels of the confounder
Matching
• Matching is commonly used in case-control
studies
• Match on strong confounder
• Types:
– Pair (individual) matching
– Frequency matching
• The use of matching usually requires special
analysis techniques (e.g. matched pair
analyses and conditional logistic regression)
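As an illustration of why matched data need a matched analysis, the sketch below computes the matched-pair odds ratio (ratio of discordant pairs) for a 1:1 matched case-control study and compares it with the OR obtained when the pairing is ignored. The pair counts are hypothetical.

```python
# Hypothetical 1:1 matched case-control study: each entry counts case-control PAIRS.
pairs = {
    "both_exposed": 30,
    "case_only_exposed": 40,     # discordant: case exposed, control unexposed
    "control_only_exposed": 20,  # discordant: control exposed, case unexposed
    "neither_exposed": 10,
}

# Matched-pair analysis: only the discordant pairs carry information.
or_matched = pairs["case_only_exposed"] / pairs["control_only_exposed"]

# Analysis that ignores the matching: rebuild an ordinary 2 x 2 table of individuals.
cases_exposed = pairs["both_exposed"] + pairs["case_only_exposed"]
cases_unexposed = pairs["control_only_exposed"] + pairs["neither_exposed"]
controls_exposed = pairs["both_exposed"] + pairs["control_only_exposed"]
controls_unexposed = pairs["case_only_exposed"] + pairs["neither_exposed"]
or_unmatched = (cases_exposed * controls_unexposed) / (cases_unexposed * controls_exposed)

print(or_matched)              # 2.0
print(round(or_unmatched, 2))  # 2.33
```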
Matching
• Disadvantages of matching
– Finding appropriate control subjects can be difficult and
expensive, and may limit sample size
– Confounder used to match subjects cannot be
evaluated with respect to the outcome/disease
– Matching does not control for confounders other than
those used to match
– The use of matching makes the use of stratified
analysis very difficult
– Matching is most often used in case-control studies
(prohibitive in a large cohort study)
– In a case-control study, matching may even introduce
confounding
Controlling Confounding:
At the analysis stage
Conventional approaches
Confounding: control at the analysis stage
• Confounding is one type of bias that can be
adjusted for in the analysis (unlike selection and
information bias)
• Options at the analysis stage:
– Stratification
– Multivariate methods
• To control for confounding in the analyses,
confounders must be measured in the study
Stratification
• Produce groups within which the confounder
does not vary
• Evaluate the exposure-disease association
within each stratum of the confounder
[Figure: Cases of Down syndrome per 100,000 by birth order (1 to 5)
and mother's age group (<20, 20-24, 25-29, 30-34, 35-39, 40+)]
Source: www.epiet.org
Direction of Confounding
• Confounding “pulls” the observed association
away from the true association
– It can either exaggerate/over-estimate the true
association (positive confounding)
• Example
– ORcausal = 1.0
– ORobserved = 3.0
or
– It can hide/under-estimate the true association
(negative confounding)
• Example
– ORcausal = 3.0
– ORobserved = 1.0
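The two examples above can be expressed as a small helper that compares how far each estimate lies from the null value (OR = 1) on the log scale. This is a sketch with a hypothetical function name, and it assumes that both estimates lie on the same side of the null.

```python
import math


def confounding_direction(or_observed, or_causal):
    """Classify confounding by comparing distance from the null (OR = 1) on the log scale.

    Assumes both ORs are on the same side of the null.
    """
    observed_strength = abs(math.log(or_observed))
    causal_strength = abs(math.log(or_causal))
    if observed_strength > causal_strength:
        return "positive"  # true association exaggerated
    if observed_strength < causal_strength:
        return "negative"  # true association hidden or under-estimated
    return "none"


print(confounding_direction(or_observed=3.0, or_causal=1.0))  # positive
print(confounding_direction(or_observed=1.0, or_causal=3.0))  # negative
```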
Multivariate Analysis
• Stratified analysis works best only in the presence of 1 or
2 confounders
• If the number of potential confounders is large,
multivariate analyses offer the only real solution
– Can handle large numbers of confounders (covariates)
simultaneously
– Based on statistical regression “models”
• E.g. logistic regression, multiple linear regression
– Always done with statistical software packages
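To make the idea of regression adjustment concrete, here is a minimal pure-Python sketch of logistic regression fitted by Newton-Raphson on grouped data. It is for illustration only: in practice a statistical package would be used, as the slide says. The data are hypothetical (a binary exposure E and a binary confounder C), and the function names `solve` and `fit_logistic` are illustrative.

```python
import math


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_logistic(rows, n_iter=25):
    """rows: (covariates, cases, total). Returns [intercept, b1, b2, ...]."""
    k = len(rows[0][0]) + 1
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, cases, total in rows:
            xi = [1.0] + list(x)
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            for i in range(k):
                grad[i] += (cases - total * p) * xi[i]
                for j in range(k):
                    hess[i][j] += total * p * (1.0 - p) * xi[i] * xi[j]
        # Newton-Raphson update: beta + (X'WX)^-1 * score
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta


# Hypothetical grouped data: ((E, C), cases, total)
rows = [((1, 1), 80, 100), ((0, 1), 40, 50), ((1, 0), 10, 50), ((0, 0), 20, 100)]

# Model with exposure only: exp(b1) is the crude OR.
crude_rows = [((e,), cases, total) for (e, c), cases, total in rows]
or_crude = math.exp(fit_logistic(crude_rows)[1])

# Model with exposure and confounder: exp(b1) is the OR adjusted for C.
or_adjusted = math.exp(fit_logistic(rows)[1])

print(round(or_crude, 3), round(or_adjusted, 3))  # 2.25 1.0
```

With more confounders the same idea applies: each one is added as a further covariate, which is where stratified tables become impractical.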
Residual confounding
• Confounding that can persist, even after
adjustment
– Unmeasured confounding
– Some variables were actually not confounders
– Confounders were measured with error (e.g.
misclassification)
– Categories of the confounder improperly defined
Effect modification and interaction
Definition
• Biological interaction
• Effect modification (“effect-measure
modification”)
– Heterogeneity of effects
– Subgroup effects
• Statistical interaction
– Deviation from a specified model form
(additive or multiplicative)
Biological interaction
“the interdependent operation of two or
more biological causes to produce,
prevent or control an effect”
[Porta, Dictionary, 2008]
Multicausality and interdependent effects
• Disease processes tend to be multifactorial:
“multicausality”
• The “one-variable-at-a-time” perspective has
several limitations
• Confounding and effect modification:
manifestations of multicausality
Schoenbach, 2000
Effect modification and statistical interaction
• Two definitions (related):
– Based on homogeneity or heterogeneity of effects
• Interaction occurs when the effect of a risk factor (X) on an
outcome (Y) is not homogeneous in strata formed by a third
variable (Z, effect modifier)
• “Differences in the effect measure for one factor at different levels
of another factor” [Porta, 2008]
• This is often called “effect modification”
– Based on the comparison between observed and
expected joint effects of a risk factor and a third variable
• Interaction occurs when the observed joint effect of the risk
factor (X) and third variable (Z) differs from that expected on the
basis of their independent effects
• This is often called “statistical interaction”
Szklo & Nieto, Epidemiology: Beyond the basics. 2007
Definition based on homogeneity or
heterogeneity of effects
Effect of exposure on the disease is modified
depending on the value of a third variable:
the “effect modifier”
[Diagram: Effect modifier acting on the Exposure → Disease relationship]
Stratified Analysis
1. Crude 2 x 2 table: calculate the crude OR (or RR), ORCrude
2. Stratify by confounder (Stratum 1, Stratum 2): calculate the
OR for each stratum (OR1, OR2)
3. If the stratum-specific ORs are the same or similar, calculate
the adjusted OR (e.g. MH):
– If Crude OR ≠ Adjusted OR, confounding is likely.
Report the adjusted OR
– If Crude OR = Adjusted OR, confounding is unlikely.
Report the crude OR
4. If the stratum-specific ORs are not similar, effect modification
is present. Report the stratum-specific ORs
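The decision flow above can be sketched as a single function. The counts are hypothetical, and the `max_ratio` threshold used to decide whether stratum-specific ORs are "similar" is an arbitrary assumption for illustration; the slides do not give a numeric rule, and formal tests of homogeneity exist. Each table is (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases).

```python
def odds_ratio(a, b, c, d):
    """OR for a 2 x 2 table: a, b = exposed cases/non-cases; c, d = unexposed cases/non-cases."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled OR over a list of (a, b, c, d) tables."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


def stratified_report(strata, max_ratio=1.5):
    """Follow the flow: heterogeneous strata -> report stratum ORs; otherwise crude vs. adjusted."""
    stratum_ors = [odds_ratio(*table) for table in strata]
    # Arbitrary illustrative rule: strata are "not similar" if the ORs differ by more than max_ratio.
    if max(stratum_ors) / min(stratum_ors) > max_ratio:
        return {"finding": "effect modification", "stratum_ors": stratum_ors}
    crude_or = odds_ratio(*[sum(column) for column in zip(*strata)])
    return {
        "finding": "homogeneous strata",
        "crude_or": crude_or,
        "adjusted_or": mantel_haenszel_or(strata),
    }


heterogeneous = stratified_report([(60, 40, 20, 80), (30, 70, 30, 70)])
homogeneous = stratified_report([(80, 20, 40, 10), (10, 40, 20, 80)])
print(heterogeneous)  # effect modification, stratum ORs 6.0 and 1.0
print(homogeneous)    # homogeneous strata, crude OR 2.25 vs. adjusted OR 1.0
```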
Confounding vs. interaction
• Confounding is a problem we want to eliminate
(control or adjust for) in our study
– Comparing crude vs. adjusted effect estimates
• Interaction is a natural occurrence that we want
to describe and study further
– Comparing stratum-specific estimates
Heterogeneity of effects
• Can occur at the level of:
– Individual study: within subgroups of a single study or
trial
• Seen in subgroup or stratified analyses within a study
– Across studies: if several studies are done on the
same topic, the effect measures may vary across
studies
• Seen in meta-analyses (across trials)
Definition based on the comparison between
observed and expected joint effects of a risk
factor and a third variable
Deviation from additive or multiplicative joint effects
This is often called “statistical interaction”
Observed vs expected joint effects of a risk factor and a third
variable
No interaction
Positive interaction
Negative interaction
Szklo & Nieto, Epidemiology: Beyond the basics. 2007
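A short worked example of comparing observed and expected joint effects on the two scales. The risks are hypothetical (cases per 1000 persons): R00 for neither factor, R10 for the risk factor X alone, R01 for the third variable Z alone, R11 for both. The function names are illustrative.

```python
def expected_joint_risk(r00, r10, r01):
    """Expected risk with both factors present, under each 'no interaction' model."""
    additive = r10 + r01 - r00            # excess risks add up
    multiplicative = r00 * (r10 / r00) * (r01 / r00)  # risk ratios multiply
    return additive, multiplicative


def classify(observed, expected, tolerance=1e-9):
    """Compare the observed joint risk with the expected one."""
    if observed > expected + tolerance:
        return "positive interaction"
    if observed < expected - tolerance:
        return "negative interaction"
    return "no interaction"


# Hypothetical risks per 1000 persons.
R00, R10, R01, R11 = 10, 20, 30, 60

expected_additive, expected_multiplicative = expected_joint_risk(R00, R10, R01)
print(expected_additive, classify(R11, expected_additive))              # 40 positive interaction
print(expected_multiplicative, classify(R11, expected_multiplicative))  # 60.0 no interaction
```

The same data show positive interaction on the additive scale and no interaction on the multiplicative scale, which is why the scale has to be stated whenever interaction is reported.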
Deviation from additive or multiplicative
joint effects
• Interaction on an “additive” scale (additive interaction)
– Effect measure modification when risk difference is used as
measure of effect
– Additive statistical model:
• Linear regression: y = a + b1x1 + b2x2
• Interaction on a “multiplicative” scale (multiplicative
interaction)
– Effect measure modification when risk ratio is used as measure
of effect
– Multiplicative statistical model:
• Logistic regression: ln[p / (1 - p)] = a + b1x1 + b2x2
Additive or multiplicative model?
• The additive model underpins the methods for assessing biological
interaction
– Interaction here is a departure from additivity of disease rates (risk
difference is the key measure)
– Risk difference scale is of greatest public health importance (based on
attributable risk)
• Many of the models used in epidemiology are inherently
multiplicative (e.g. logistic regression)
– Vast majority of epi analyses implicitly use the multiplicative scale (risk
ratio is the key measure)
– This is because most epi studies report RR and OR estimates and use
regression models such as logistic regression and survival analysis;
these models inherently use ratio measures and are therefore multiplicative
Ahlbom A et al. Eur J Epi 2005
Why is interaction/effect modification important?
• Better understanding of causation
• Identification of “high-risk” groups
• Target interventions at specific subgroups