Transcript Confounding

Confounding
Confounding Bias
Michael Engelgau
Shanghai FETP
August 15, 2012
The Nature of Epidemiologic Research
- Epidemiology is the study of disease occurrence and health indicators in human populations
- The use of populations distinguishes epidemiology from other biomedical sciences and clinical medicine
- Basic features of population epidemiology:
  - Quantitative/empirical
  - Probabilistic
  - Comparative
Causal Inference in Epidemiology
Bridging the gap between our ideas and our observations.
Criteria:
- Strength of association
- Consistency of findings
- Specificity of association
- Temporality (exposure clearly precedes disease)
- Biologic gradient (dose-response effect)
- Biologic plausibility of the hypothesis
- Coherence of evidence
- Experimental evidence
Confounding: A Fundamental Problem of Causal Inference
- Confounding is bias due to inherent (unobservable) differences in risk between exposed and unexposed populations, i.e., a lack of comparability.
- Confounding is usually not a major source of bias in randomized trials (assuming sample size is large enough) because randomization tends to equalize inherent risks between treatment groups (treated group = exposed, untreated = unexposed).
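To make the randomization argument concrete, here is a minimal Python sketch (standard library only) that compares how a hypothetical confounder C is distributed across exposure groups under self-selected versus coin-flip allocation. The 30% prevalence of C and the 0.8/0.2 exposure probabilities are arbitrary assumptions chosen for illustration, not data from this lecture.

```python
import random

def confounder_prevalence(n, randomized, seed=1):
    """Share of subjects carrying a hypothetical confounder C among the
    exposed (key True) and unexposed (key False) in a simulated population."""
    rng = random.Random(seed)
    counts = {True: [0, 0], False: [0, 0]}  # exposed? -> [carriers of C, total]
    for _ in range(n):
        has_c = rng.random() < 0.3  # assumed: 30% of the population carries C
        if randomized:
            exposed = rng.random() < 0.5  # coin-flip allocation ignores C
        else:
            # assumed self-selection: carriers of C are more often exposed
            exposed = rng.random() < (0.8 if has_c else 0.2)
        counts[exposed][0] += has_c
        counts[exposed][1] += 1
    return {group: carriers / total for group, (carriers, total) in counts.items()}

for randomized in (False, True):
    prev = confounder_prevalence(20_000, randomized)
    label = "randomized   " if randomized else "observational"
    print(f"{label}: C in exposed {prev[True]:.2f}, in unexposed {prev[False]:.2f}")
```

Under self-selection, C is far more common among the exposed, so their inherent risk differs from that of the unexposed; under coin-flip allocation both groups end up with nearly the same share of C, and the leftover difference shrinks as the sample size grows.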
Confounding
- May produce an apparent association when none exists
- May obscure an association that exists
- Information on potential confounders should be collected in the study and used in the analysis; otherwise, they cannot be excluded as alternative explanations for the findings
- Confounding factors must be considered during study design
Confounding
- Mixing of the effect of the exposure on disease with the effect of another factor that is associated with the exposure
- Bias in estimating the effect of exposure (E) on disease (D) occurrence, due to the lack of comparability between exposed and unexposed populations
- Risk among unexposed ≠ risk among exposed if they had been unexposed
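The counterfactual comparison can be illustrated with a hypothetical cohort in which, unlike in real data, we fix by construction the risk each person would have without exposure. The sketch below (all numbers invented; the factor C and its risks are assumptions) computes the risk the unexposed actually show and the risk the exposed would have shown had they been unexposed.

```python
from fractions import Fraction

# Hypothetical cohort (invented numbers). By construction, the risk of D
# in the absence of exposure depends only on a factor C.
risk_if_unexposed = {0: Fraction(1, 9), 1: Fraction(3, 7)}
exposed_by_c = {0: 100, 1: 200}    # the exposed are mostly C = 1
unexposed_by_c = {0: 360, 1: 70}   # the unexposed are mostly C = 0

def risk_without_exposure(group_by_c):
    """Average risk a group would have if none of its members were exposed."""
    total = sum(group_by_c.values())
    return sum(n * risk_if_unexposed[c] for c, n in group_by_c.items()) / total

observed_unexposed = risk_without_exposure(unexposed_by_c)    # observable
counterfactual_exposed = risk_without_exposure(exposed_by_c)  # not observable in real data
print(observed_unexposed, float(observed_unexposed))
print(counterfactual_exposed, float(counterfactual_exposed))
```

Here the unexposed have a risk of 7/43 (about 0.16), while the exposed would have had 61/189 (about 0.32) without exposure, so the unexposed are a poor stand-in for the exposed group's counterfactual risk. In a real study the second quantity cannot be computed.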
Confounding
- We cannot directly examine the correctness of the comparability assumption that defines confounding (the presence or absence of confounding cannot be observed because it depends on a counterfactual condition: the risk in the exposed group in the absence of exposure)
- Instead, we attempt to identify and control for empirical manifestations of confounding.
Properties of Confounders
Three criteria for a variable (C) to be a confounder:
- C must be a risk factor for the disease (D) in the unexposed population
- C must be associated with the exposure (E) in the population from which the cases arose
- The association between C and E must not be due entirely to the effect of E on C (i.e., C cannot be an intermediate step between E and D)
[Diagrams: EXPOSURE and DISEASE alone; EXPOSURE and DISEASE with a CONFOUNDER and an INTERMEDIATE; EXPOSURE and DISEASE with a CONFOUNDER]
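The first two criteria have empirical manifestations that can be checked in data. A minimal sketch with invented cohort counts (the dictionaries `cases` and `totals` and the helper `odds_ratio` are hypothetical) computes the C-D odds ratio among the unexposed and the C-E odds ratio in the whole cohort:

```python
# Hypothetical cohort counts (invented), keyed by (exposure e, candidate confounder c).
cases  = {(1, 0): 20,  (0, 0): 40,  (1, 1): 120, (0, 1): 30}
totals = {(1, 0): 100, (0, 0): 360, (1, 1): 200, (0, 1): 70}

def odds_ratio(a, b, c, d):
    """Odds ratio a*d / (b*c) for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)

# Criterion 1: is C associated with D among the unexposed (e = 0)?
or_c_d_unexposed = odds_ratio(
    cases[(0, 1)], totals[(0, 1)] - cases[(0, 1)],  # C present: cases, non-cases
    cases[(0, 0)], totals[(0, 0)] - cases[(0, 0)],  # C absent:  cases, non-cases
)

# Criterion 2: is C associated with E in the source population (whole cohort)?
or_c_e = odds_ratio(
    totals[(1, 1)], totals[(0, 1)],  # C present: exposed, unexposed
    totals[(1, 0)], totals[(0, 0)],  # C absent:  exposed, unexposed
)

print(f"C-D OR among unexposed: {or_c_d_unexposed:.2f}")
print(f"C-E OR in the cohort:   {or_c_e:.2f}")
```

With these numbers both odds ratios are far from 1 (6.0 and about 10.3). The third criterion cannot be checked from counts at all: whether E causes C is a question of prior knowledge about the causal pathway.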
Example of Confounding
[Diagram: alcohol drinking (exposure) and oral cancer (disease); potential confounder: cigarette smoking]
Example of Confounding
[Diagram: birth order (exposure) and Down syndrome (disease); potential confounders?]
Down Syndrome by Birth Order
[Figure: cases of Down syndrome per 100,000 live births by birth order (1 to 5)]
The second, third, and fourth child are more often affected by Down syndrome than the first child.
Down Syndrome by Maternal Age
[Figure: cases of Down syndrome per 100,000 live births by maternal age group (<20, 20-24, 25-29, 30-34, 35-39, 40+)]
Down Syndrome by Birth Order and Maternal Age
[Figure: cases of Down syndrome per 100,000 live births by birth order (1 to 5) and maternal age group]
Example of Confounding
[Diagram: birth order (exposure) and Down syndrome (disease), with maternal age as the confounder]
Confounding or Intermediate Effect?
- If a covariate is an intermediate variable (I) in the causal pathway linking E and D, then conventional adjustment for this variable will produce a biased estimate of the net E effect.
- Typically, the direction of this bias will be toward the null (no effect).
- The process of executing sophisticated statistical modeling is, at times, divorced from making sound causal inference.
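A small invented example shows how adjusting for an intermediate can erase a real effect. In this hypothetical cohort, E affects D only through I, and the sketch compares the crude risk ratio with a Mantel-Haenszel risk ratio stratified on I (the numbers and function names are illustrative assumptions).

```python
# Hypothetical cohort (invented numbers): E makes the intermediate I more
# likely, and D depends only on I (there is no direct E -> D path).
# cells[(e, i)] = (cases of D, number of people)
cells = {
    (1, 1): (60, 600), (1, 0): (8, 400),    # exposed
    (0, 1): (20, 200), (0, 0): (16, 800),   # unexposed
}

def crude_risk(cells, e):
    """Risk of D in exposure group e, ignoring I."""
    cases = sum(cells[(e, i)][0] for i in (0, 1))
    people = sum(cells[(e, i)][1] for i in (0, 1))
    return cases / people

def mh_risk_ratio_by_i(cells):
    """Mantel-Haenszel risk ratio stratified on I (the 'adjusted' estimate)."""
    num = den = 0.0
    for i in (0, 1):
        a, n1 = cells[(1, i)]   # exposed: cases, people
        c, n0 = cells[(0, i)]   # unexposed: cases, people
        n = n1 + n0
        num += a * n0 / n
        den += c * n1 / n
    return num / den

crude_rr = crude_risk(cells, 1) / crude_risk(cells, 0)
adjusted_rr = mh_risk_ratio_by_i(cells)
print(f"crude RR {crude_rr:.2f}; RR adjusted for I {adjusted_rr:.2f}")
```

The crude risk ratio (about 1.89) is the total effect of E in this cohort; stratifying on I returns 1.00, an estimate biased all the way to the null.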
Confounding or Intermediate Effect?
- Researchers should carefully scrutinize each variable considered for adjustment in an attempt to report unbiased estimates of the effect of exposure.
- Bulterys & Morgenstern proposed the term “iatrogenic bias” to denote bias introduced by the analyst when inappropriately controlling for variables as though they were confounders (Paediatr Perinat Epidemiol 1993; 7:387-94).
Confounding or Intermediate Effect?
- The process of covariate adjustment depends critically on the investigator’s prior knowledge of disease etiology and on adequate resources for measuring confounders accurately.
- Graphical examination of the relationships among three or more variables is useful.
- Alternative, more complex analytic approaches, such as G-estimation (Robins JM et al.), may also be used.
Confounding or Intermediate Effect?
[Diagram: physical activity and colorectal cancer, with body mass index/obesity as a possible confounder or intermediate (?)]
Confounding and/or Intermediate Effect?
- In many instances, it may be most appropriate to present both adjusted and unadjusted estimates of effect, so that readers can assess the sensitivity of conclusions to alternative assumptions about the possible effect of the exposure on certain covariates.
- CAN YOU THINK OF EXAMPLES?
Residual Confounding
- If a confounding variable is misclassified, the ability to control confounding in the analysis is hampered.
- If confounding is strong and the E-D relation is weak, misclassification of the confounding variable can lead to very misleading results.
- Residual confounding occurs when adjustment is not fine enough to capture the full variability of the confounder.
Example: adjusting for smoking history using a crude ever/never variable vs. using detailed smoking duration or the age at which smoking began.
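Residual confounding from coarse categories can be sketched with invented counts: within each of three smoking levels the exposure has no effect, yet exposure is more common among heavier smokers. The sketch (hypothetical data; `mh_rr` is a plain Mantel-Haenszel risk ratio) compares no adjustment, ever/never adjustment, and three-level adjustment.

```python
# Hypothetical cohort (invented numbers). Within each smoking level the
# exposure has no effect on disease, but heavier smokers are more often exposed.
# Each stratum: (exposed cases, exposed total, unexposed cases, unexposed total)
fine = {
    "never": (1, 100, 9, 900),
    "light": (15, 300, 15, 300),
    "heavy": (120, 600, 20, 100),
}

def collapse(strata, keys):
    """Pool the counts of several strata into a single stratum."""
    return tuple(sum(strata[k][j] for k in keys) for j in range(4))

coarse = {"never": fine["never"], "ever": collapse(fine, ["light", "heavy"])}
unadjusted = {"all": collapse(fine, list(fine))}

def mh_rr(strata):
    """Mantel-Haenszel risk ratio across the given strata."""
    num = den = 0.0
    for a, n1, c, n0 in strata.values():
        n = n1 + n0
        num += a * n0 / n
        den += c * n1 / n
    return num / den

for label, strata in [("crude", unadjusted), ("ever/never", coarse),
                      ("never/light/heavy", fine)]:
    print(f"{label:>17}: RR = {mh_rr(strata):.2f}")
```

The pattern is crude RR of about 4.02, ever/never-adjusted RR of about 1.69, and three-level-adjusted RR of 1.00: the ever/never variable removes much of the bias but leaves a spurious residual association, because "ever smokers" mixes light and heavy smokers.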
Effect Measure Modification
- Heterogeneity in a measure of effect across levels of a third variable
- Identify subgroups at lower or higher risk, to study interaction between risk factors and to target public health action
HIV prevalence and age difference in years between pregnant women and spouse/partner, Zambia, 2004
Age difference between woman and spouse/partner | All women 15-44 years: %HIV+ | POR (95% CI) | Young women 15-19 years: %HIV+ | POR (95% CI)
Partner is younger | 18.4 | 0.86 (0.60-1.22) | 0 | --
Partner 0-1 yrs older | 20.9 | 1.00 | 7.8 | 1.00
Partner 2-3 yrs older | 17.1 | 0.79 (0.64-0.97) | 9.2 | 1.21 (0.57-2.56)
Partner 4-5 yrs older | 17.5 | 0.81 (0.66-0.99) | 10.1 | 1.34 (0.65-2.78)
Partner 6-7 yrs older | 19.4 | 0.91 (0.74-1.12) | 13.7 | 1.88 (0.91-3.90)
Partner 8-9 yrs older | 21.2 | 1.02 (0.81-1.28) | 13.6 | 1.88 (0.86-4.10)
Partner 10+ yrs older | 23.5 | 1.16 (0.94-1.44) | 19.9 | 2.94 (1.40-6.20)
POR: prevalence odds ratio; partner 0-1 yrs older is the reference group (POR = 1.00).
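As a rough check on the HIV table, the prevalence odds ratios can be approximated directly from the %HIV+ columns, with the partner 0-1 yrs older group as reference. This is a crude calculation from rounded percentages (whether the published PORs were adjusted is not stated), so it is only expected to match the table to within a few hundredths.

```python
# %HIV+ copied from the table; "0-1" (partner 0-1 yrs older) is the reference.
all_women = {"younger": 18.4, "0-1": 20.9, "2-3": 17.1, "4-5": 17.5,
             "6-7": 19.4, "8-9": 21.2, "10+": 23.5}
young_women = {"0-1": 7.8, "2-3": 9.2, "4-5": 10.1,
               "6-7": 13.7, "8-9": 13.6, "10+": 19.9}  # "younger" had 0% HIV+

def prevalence_odds_ratio(pct, ref_pct):
    """Crude prevalence odds ratio from two prevalences given in percent."""
    p, p_ref = pct / 100, ref_pct / 100
    return (p / (1 - p)) / (p_ref / (1 - p_ref))

for label, group in [("women 15-44", all_women), ("women 15-19", young_women)]:
    pors = {k: round(prevalence_odds_ratio(v, group["0-1"]), 2) for k, v in group.items()}
    print(label, pors)
```

The crude values reproduce the contrast in the table: among all women the POR barely moves with age difference, while among women 15-19 it climbs to about 2.9 for partners 10 or more years older. That heterogeneity across levels of the woman's age is what effect measure modification refers to.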
Controlling Confounding
In the design:
- Restrict the study population
- Matching
- Collect information on potential confounders
In the analysis, control for confounding through:
- Restricting the analysis to subgroups
- Stratified analysis
- Multivariable regression
Restriction
Restrict the study or the analysis to a subgroup that is homogeneous for the possible confounder.
Evaluation of Confounding and Effect Modification by Stratification
- Consider potential confounders and effect measure modifiers
- Stratify by levels of the potential confounders or modifiers
- Compute stratum-specific measures of association (OR or RR)
- Evaluate the similarity of the stratum-specific estimates (test for homogeneity)
- If the stratum-specific estimates are similar, calculate a summary adjusted estimate
- Evaluate the change between the crude and adjusted estimates (e.g., a 5%, 10%, or 20% criterion)
- If the effects are not uniform and are statistically different, report the stratum-specific estimates
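The stratification steps can be sketched in plain Python with invented counts for two strata of a potential confounder C. The homogeneity test used here is Woolf's chi-square test on the stratum log odds ratios, one of several options (Breslow-Day is another); the change-in-estimate is computed relative to the crude OR. All names and numbers are illustrative assumptions.

```python
import math

# Hypothetical counts for two strata of a potential confounder C (invented).
# Each table: (a, b, c, d) = (exposed cases, exposed non-cases,
#                             unexposed cases, unexposed non-cases)
strata = [(20, 80, 40, 320),    # C = 0
          (120, 80, 30, 40)]    # C = 1

def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)

def mantel_haenszel_or(tables):
    """Summary OR: stratum ORs averaged with weights b*c/n (Mantel-Haenszel)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

def woolf_homogeneity(tables):
    """Woolf chi-square for equal stratum log-ORs; returns (chi2, df)."""
    log_ors = [math.log(odds_ratio(*t)) for t in tables]
    weights = [1 / (1 / a + 1 / b + 1 / c + 1 / d) for a, b, c, d in tables]
    pooled = sum(w * x for w, x in zip(weights, log_ors)) / sum(weights)
    chi2 = sum(w * (x - pooled) ** 2 for w, x in zip(weights, log_ors))
    return chi2, len(tables) - 1

stratum_ors = [odds_ratio(*t) for t in strata]
crude_or = odds_ratio(*[sum(col) for col in zip(*strata)])  # pool the strata
adjusted_or = mantel_haenszel_or(strata)
chi2, df = woolf_homogeneity(strata)
change = abs(adjusted_or - crude_or) / crude_or * 100  # % change vs crude

print("stratum ORs:", stratum_ors)
print(f"crude OR {crude_or:.2f}, MH OR {adjusted_or:.2f}, change {change:.1f}%")
print(f"Woolf chi-square {chi2:.3f} on {df} df")
```

With these counts both stratum ORs are 2.0 (chi-square 0 on 1 df, so no evidence of heterogeneity), the Mantel-Haenszel OR is 2.0, and the crude OR of 4.5 differs from it by far more than any of the 5%, 10%, or 20% cut-offs. The chi-square is compared against a chi-square distribution with `df` degrees of freedom (3.84 at the 0.05 level for 1 df).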
Adjusting for Confounding: Stratified Analysis
Strengths
- Ease and clarity of presentation
- The Mantel-Haenszel method combines subgroups to provide a summary estimate
Weaknesses
- Small numbers in the subgroups
- Adjusts for only one variable (the stratum)
Adjusting for Confounding: Multivariable Analysis
- Analyze data in a statistical model that includes both the presumed cause (exposure) and possible confounders
- Determine a priori the criteria for inclusion of covariates in the model (prior knowledge, change in estimate)
- Evaluate the independent effect of an exposure after adjustment for other measured confounders
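To show what including both the exposure and a confounder in a model means mechanically, here is a stdlib-only sketch that fits logit P(D) = b0 + b1*E + b2*C to invented grouped data by Newton-Raphson. In practice you would use a statistics package (for example R's `glm` with `family = binomial`, or the `Logit` model in statsmodels); the helper names `solve` and `fit_logistic` are hypothetical.

```python
import math

# Hypothetical grouped data (invented): (E, C, cases, non_cases)
groups = [(1, 0, 20, 80), (0, 0, 40, 320), (1, 1, 120, 80), (0, 1, 30, 40)]

def solve(m, v):
    """Solve m x = v by Gauss-Jordan elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fit_logistic(groups, iterations=25):
    """Maximum-likelihood fit of logit P(D) = b0 + b1*E + b2*C (Newton-Raphson)."""
    beta = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        score = [0.0] * 3                      # gradient of the log-likelihood
        info = [[0.0] * 3 for _ in range(3)]   # Fisher information matrix
        for e, c, cases, non_cases in groups:
            x = [1.0, e, c]
            n = cases + non_cases
            p = 1 / (1 + math.exp(-sum(b * xi for b, xi in zip(beta, x))))
            for j in range(3):
                score[j] += (cases - n * p) * x[j]
                for k in range(3):
                    info[j][k] += n * p * (1 - p) * x[j] * x[k]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

b0, b_e, b_c = fit_logistic(groups)
print(f"OR for E adjusted for C: {math.exp(b_e):.2f}")
```

Because the invented strata share an odds ratio of exactly 2, exp(b1) recovers 2.0, the same value a Mantel-Haenszel summary gives, whereas the crude OR in these data is 4.5.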
Multivariable Analysis
Strengths
- Can adjust for multiple covariates simultaneously
Weaknesses
- Subjects with missing data on covariates are deleted from the analysis, which may lead to biased results
- This sophisticated process requires that the assumptions on which the model is based are valid
- Results can be difficult to display or explain to inexperienced readers
Limitations of Regression Modeling
- The logistic regression model and the Cox proportional hazards model are most commonly used. Both models are based on similar assumptions (e.g., joint effects are multiplicative).
- Selection of variables in the model should be based primarily on prior knowledge of relevant associations.
- Liberal use of graphical methods is recommended for checking the reasonableness of model assumptions.
- Model-based results should always be subjected to sensitivity analyses.
Model Building
Terms in the model:
- Model colorectal cancer = physical activity: 0.60 (0.44-0.83)
- Model colorectal cancer = body mass index: 6.31 (1.55-25.70)
- Model colorectal cancer = age + physical activity: 0.64 (0.42-0.96)
  Change in estimate: (0.64 - 0.60) = 0.04; (0.04/0.60 x 100) = 6.7%
- Model colorectal cancer = age + physical activity + body mass index: 0.73 (0.52-1.01)
  Change in estimate: (0.73 - 0.64) = 0.09; (0.09/0.64 x 100) = 14.1%
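The change-in-estimate arithmetic on the Model Building slide can be reproduced with a one-line helper (hypothetical name `percent_change`), following the slide's convention of dividing by the previous model's estimate:

```python
def percent_change(previous, new):
    """Change in estimate, as a percentage of the previous model's estimate."""
    return (new - previous) / previous * 100

# Physical-activity estimates copied from the models on the slide.
pa_only = 0.60        # colorectal cancer = physical activity
pa_age = 0.64         # + age
pa_age_bmi = 0.73     # + age + body mass index

print(round(percent_change(pa_only, pa_age), 1))     # 6.7
print(round(percent_change(pa_age, pa_age_bmi), 1))  # 14.1
```

Adding age moves the estimate by 6.7%, below a 10% criterion, while adding body mass index moves it by 14.1%. A change of this size shows only that BMI matters to the estimate; whether it should be adjusted for as a confounder or left alone as an intermediate still depends on prior knowledge of the causal pathway.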
MET-hours per week (year before enrollment), colon cancer, men
Terms in model | Highest vs. lowest
Age | 0.64 (0.42-0.96)
Age + education | 0.67 (0.45-1.02)
Age + family history | 0.64 (0.42-0.96)
Age + BMI | 0.69 (0.46-1.04)
Age + energy | 0.64 (0.42-0.96)
Age + occupation | 0.64 (0.43-0.97)
Age + cigarette smoking | 0.65 (0.43-0.98)
Age + alcohol | 0.64 (0.43-0.97)
Age + aspirin | 0.64 (0.43-0.97)
Age + multivitamin use | 0.65 (0.43-0.97)
Age + fiber | 0.68 (0.45-1.03)
Age + folate | 0.67 (0.45-1.02)
Age + calcium | 0.66 (0.43-0.99)
Age + red meat | 0.66 (0.44-0.99)
Age + vegetables | 0.67 (0.44-1.01)
Age + fruit | 0.66 (0.44-1.00)
Age + hours spent sitting | 0.63 (0.42-0.95)
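The change-in-estimate criterion can be applied to each row of the colon cancer table. The sketch below copies the point estimates, uses the age-only model (0.64) as the reference, and flags covariates that move the estimate by at least 10%; the threshold is an assumption picked from the conventional 5%/10%/20% cut-offs.

```python
# Highest vs. lowest MET-hours estimates copied from the table.
base = 0.64  # age-only model
estimates = {
    "education": 0.67, "family history": 0.64, "BMI": 0.69, "energy": 0.64,
    "occupation": 0.64, "cigarette smoking": 0.65, "alcohol": 0.64,
    "aspirin": 0.64, "multivitamin use": 0.65, "fiber": 0.68, "folate": 0.67,
    "calcium": 0.66, "red meat": 0.66, "vegetables": 0.67, "fruit": 0.66,
    "hours spent sitting": 0.63,
}

THRESHOLD = 10.0  # percent; assumed cut-off

changes = {name: abs(est - base) / base * 100 for name, est in estimates.items()}
for name, pct in sorted(changes.items(), key=lambda kv: -kv[1]):
    flag = "meets" if pct >= THRESHOLD else "below"
    print(f"Age + {name:<20} {pct:5.1f}%  {flag} the {THRESHOLD:.0f}% criterion")
```

No single covariate reaches 10% here; BMI comes closest at about 7.8%.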
Special thanks to Drs. Bob Fontaine and Marc Bulterys.
Exercise
Modify what you wrote down:
- What is the research question (issue)?
- What is/are the outcome(s) or disease(s)?
- What is/are the exposure(s)?
- What’s the study population? Where? Age?
- What data will you collect? What variables?
- How will you collect the data?
- What analyses will you perform?
- What manuscripts will you generate?