
Controlling Confounders
Logistic Regression
Aya Goto
Nguyen Quang Vinh
SLIDE 1
Key concepts
• Confounding → Indicative of a true association. Can be controlled at the design or analysis stage.
• Bias → Should be minimized at the design stage.
• Random error → Inherent in quantitative data.
• Non-differential (random) misclassification → Inherent in (inaccurate) measurement.
SLIDE 2
Review question
In a cohort study, pregnant women with
intended and unintended pregnancies are
compared, and it is found that the proportion
of mothers who lack confidence in child
rearing after birth is higher among
mothers with unintended pregnancy.
SLIDE 3
EXAMPLE OF (          )
A higher proportion of mothers with unintended
pregnancy are first-time mothers, and thus they
naturally have less confidence in child rearing.
We are not sure whether the observed relationship
between pregnancy intention and maternal
confidence is true, or is due to the effect
of pregnancy history. This is called (          )
because the observation is correct, but it must
be interpreted carefully to see the truth.
SLIDE 4
EXAMPLE OF (          )
The interview about parenting conducted by a public
health nurse after birth is less complete among women
with intended pregnancy because they are
considered lower-risk mothers. This is called
(          ) because the observation itself is in
error.
SLIDE 5
EXAMPLE OF (          )
By chance, there are more episodes of losing
confidence in the unintended pregnancy group
in the study sample.
EXAMPLE OF (          )
Lack of good information on pregnancy
intention results in some mothers with intended
pregnancy being randomly classified as
unintended pregnancy, and vice versa. If this
happens, the study finding underestimates the
true relative risk (RR).
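The last example's claim, that random misclassification of exposure pulls the RR toward 1, can be checked with a small expected-value calculation. This is only a sketch: the cohort sizes, the true risks, and the error rates (`sensitivity`, `specificity`) are hypothetical numbers chosen for illustration, not data from the study.

```python
# Hypothetical cohort: every number here is an illustrative assumption.
n_unintended, n_intended = 200, 800          # true group sizes
risk_unintended, risk_intended = 0.6, 0.3    # true risks of low confidence

true_rr = risk_unintended / risk_intended

# Random misclassification of exposure: the error rates are the same
# whether or not the mother later lacks confidence.
sensitivity = 0.8   # truly unintended recorded as unintended
specificity = 0.9   # truly intended recorded as intended


def mixed_risk(n_truly_unintended, n_truly_intended):
    """Expected risk in a recorded group that mixes both true groups."""
    cases = (n_truly_unintended * risk_unintended
             + n_truly_intended * risk_intended)
    return cases / (n_truly_unintended + n_truly_intended)


# Expected make-up of the groups as recorded in the study.
risk_recorded_unintended = mixed_risk(sensitivity * n_unintended,
                                      (1 - specificity) * n_intended)
risk_recorded_intended = mixed_risk((1 - sensitivity) * n_unintended,
                                    specificity * n_intended)

observed_rr = risk_recorded_unintended / risk_recorded_intended

print(f"true RR     = {true_rr:.2f}")
print(f"observed RR = {observed_rr:.2f}")
```

With these assumed inputs the observed RR falls between 1 and the true RR, which is the underestimation described above.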
SLIDE 6
“Mothers with unintended pregnancy
tend to lose confidence.”
Pregnancy intention → Maternal confidence
First-time motherhood
SLIDE 7
Confounding
It occurs when there is a confounder, a factor that
is independently associated with both the exposure
and the disease.
Exposure → Disease
Confounder
SLIDE 8
http://www.amazon.co.jp/Coffee-Cigarettes-RobertoBenigni/dp/B0001XAO7U
Does drinking coffee increase the risk of
myocardial infarction (MI)?
Coffee → MI
Smoking
SLIDE 9
Control confounding at the design stage

Strategy: Specification (“Include only non-smokers.”)
Advantages:
• Easily understood
Disadvantages:
• Limits generalizability
• May limit sample size

Strategy: Matching (“Match smoking status of cases and controls.”)
Advantages:
• Useful for eliminating the influence of strong constitutional confounders like age and sex
Disadvantages:
• The decision to match must be made at the design stage and can have irreversible adverse effects on analysis
• Time consuming
• Cannot analyze associations of matched variables with the outcome
SLIDE 10
Control confounding at the analysis stage

Strategy: Stratification (“Conduct analysis separately for smokers and non-smokers.”)
Advantages:
• Easily understood
• Reversible
Disadvantages:
• May be limited by sample size for each stratum
• Difficult to control for multiple confounders

Strategy: Statistical adjustment (“Conduct multivariate analysis controlling (adjusting) for smoking status.”)
Advantages:
• Multiple confounders can be controlled
• Reversible
Disadvantages:
• Needs advanced statistical techniques
• Results may be difficult to understand
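To make the stratification strategy concrete, the sketch below compares a crude odds ratio for coffee and MI with the odds ratios inside each smoking stratum. All counts are hypothetical and were chosen so that smoking fully explains the crude association. The pooled estimate uses the Mantel-Haenszel formula, a common way to combine strata that the slides themselves do not name.

```python
# Hypothetical 2x2 tables, one per smoking stratum.
# Each tuple: (coffee & MI, coffee & no MI, no coffee & MI, no coffee & no MI)
strata = {
    "smokers":     (60, 240, 20, 80),
    "non-smokers": (5, 95, 15, 285),
}


def odds_ratio(a, b, c, d):
    """Odds ratio of a 2x2 table (a, b = exposed; c, d = unexposed)."""
    return (a * d) / (b * c)


def mantel_haenszel_or(tables):
    """Pooled odds ratio across strata (Mantel-Haenszel estimator)."""
    tables = list(tables)
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


# Crude table: add the strata together, ignoring smoking.
crude_table = tuple(sum(cells) for cells in zip(*strata.values()))
crude_or = odds_ratio(*crude_table)

stratum_ors = {name: odds_ratio(*table) for name, table in strata.items()}
mh_or = mantel_haenszel_or(strata.values())

print(f"crude OR            = {crude_or:.2f}")
for name, value in stratum_ors.items():
    print(f"OR among {name:<11} = {value:.2f}")
print(f"Mantel-Haenszel OR  = {mh_or:.2f}")
```

In this made-up data set the crude OR is about 2, while the OR within each stratum and the pooled OR are 1, so the apparent effect of coffee comes entirely from smoking.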
SLIDE 11
“Whichever method you choose,
you have to know potential
confounders reported in previous
studies.”
→ Literature searching is important
SLIDE 12
Correlation and regression
Correlation
“Variable A and B are correlated / associated.”
Regression
“Variable A predicts / explains Variable B.”
Commonly used statistical analyses can only be
applied to a linear relationship.
SLIDE 13
Linear relationship? No / Yes
If yes: Variable A predicts / explains Variable B? No / Yes
No → Correlation
• Parametric: Pearson’s correlation
• Non-parametric: Spearman’s correlation
Yes → Regression
• Parametric: Linear regression
• Non-parametric: Median regression
• Logistic regression
SLIDE 14
UNIVARIATE logistic regression
Y = aX + e
Research question:
Does Factor X explain Outcome Y?
Outcome (dependent variable):
Binomial (yes or no)
Factors of interest (independent variables):
Any type of data
X: Pregnancy intention → Y: Low confidence
SLIDE 15
Assumptions
(For advanced learners)
Linearity: Logistic regression does not require linear
relationships between the independent factor or covariates and
the dependent, but it does assume a linear relationship
between the independents and the log odds (logit) of the
dependent. One strategy for mitigating lack of linearity of a
continuous variable is to divide it into categories.
Normal distribution: The dependent variable need not be
normally distributed.
Homoscedasticity: The dependent variable need not be
homoscedastic for each level of the independents; that is,
there is no homogeneity of variance assumption: variances
need not be the same within categories.
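The linearity assumption can be illustrated numerically. In the sketch below the intercept and slope are hypothetical values, not estimates from any data set. The predicted probabilities do not rise by equal steps, but their log odds (logits) do, which is the sense in which the model is linear.

```python
import math


def logit(p):
    """Log odds of a probability."""
    return math.log(p / (1 - p))


def inv_logit(z):
    """Probability that corresponds to a log-odds value."""
    return 1 / (1 + math.exp(-z))


# Hypothetical coefficients for a single continuous predictor x.
intercept, slope = -2.0, 0.5

xs = [0, 1, 2, 3, 4]
probabilities = [inv_logit(intercept + slope * x) for x in xs]
log_odds = [logit(p) for p in probabilities]

# Step sizes between consecutive values of x.
probability_steps = [b - a for a, b in zip(probabilities, probabilities[1:])]
log_odds_steps = [b - a for a, b in zip(log_odds, log_odds[1:])]

for x, p, lo in zip(xs, probabilities, log_odds):
    print(f"x={x}  P(Y=1)={p:.3f}  log odds={lo:+.2f}")
```

Each one-unit increase in x adds the same amount (the slope) to the log odds, so the odds ratio per unit is constant at exp(slope), even though the change in probability differs from step to step.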
SLIDE 16
MULTIVARIATE logistic regression
Y = aX1 + bX2 + cX3 …. + e
X1: Pregnancy intention → Y: Low confidence
X2: Pregnancy history
X3: Financial status
etc.
SLIDE 17
Major types of multivariate model
1. Find associated factors
STEP 1. Univariate analysis
STEP 2. Multivariate analysis
2. Test a specific hypothesis
Multivariate analysis controlling
for potential confounders at once.
SLIDE 18
MODEL 1. Finding associated factors
X1: Pregnancy intention
X2: Pregnancy history
X3: Financial status
etc.
→ Y: Low confidence
SLIDE 19
SLIDE 20
MODEL 2. Testing a specific hypothesis
X1: Pregnancy intention → Y: Low confidence
Potential confounders:
X2: Pregnancy history
X3: Financial status
etc.
SLIDE 21
SLIDE 22
Influence of pregnancy intention on mother’s
attachment towards her baby

             Not confident   Confident   OR (95%CI)
Intended          42%           58%      1.00
Unintended        67%           33%      3.1 (1.1-8.8)

The odds ratio was calculated by using intended
pregnancy as a reference group and adjusted by
multiple logistic regression analysis for six factors:
whether she was a first-time mother or not…
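For comparison, a crude (unadjusted) odds ratio can be computed directly from the two percentages in the table. This is a sketch that uses only the rounded percentages shown, so it will not reproduce the adjusted value of 3.1, which also accounts for the six covariates.

```python
# Proportion "not confident" in each group, taken from the table.
p_not_confident = {"intended": 0.42, "unintended": 0.67}


def odds(p):
    """Convert a proportion into odds."""
    return p / (1 - p)


# Intended pregnancy is the reference group, as in the table.
crude_or = odds(p_not_confident["unintended"]) / odds(p_not_confident["intended"])

print(f"crude OR = {crude_or:.2f}")
```

The gap between this crude figure and the adjusted odds ratio reflects both the adjustment and the rounding of the percentages.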
SLIDE 23
Technical notes
➢ Sample size and number of independent factors
“As a rule of thumb, there should be 5 to 10 events
for each X variable.”
Example: A study was conducted among 457
patients receiving aortic grafts, 86 of whom had a
cardiac complication, and factors associated with
its occurrence were analyzed.
→ Limit variables to 8 to 16.
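The rule of thumb is a simple division of the number of events by the required events per variable. The sketch below applies it to the aortic graft example. The function name is made up for illustration. Note that the rule counts events (86 complications), not the 457 patients.

```python
def max_variables(n_events, events_per_variable):
    """Largest number of X variables allowed by an events-per-variable rule."""
    return n_events // events_per_variable


n_patients = 457   # not used by the rule; shown for contrast
n_events = 86      # cardiac complications

strict_limit = max_variables(n_events, 10)   # 10 events per variable
lenient_limit = max_variables(n_events, 5)   # 5 events per variable

print(f"10 events per variable: up to {strict_limit} variables")
print(f" 5 events per variable: up to {lenient_limit} variables")
```

Direct division gives limits of 8 and 17, close to the range quoted above. Because this is only a rule of thumb, staying toward the lower end is the safer choice.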
SLIDE 24
➢ Conceptual framework
“Key for clarity and teamwork”

Socio-demographic characteristics
• Age
• Residence in commune whole life
• Duration of marriage
• Living with husband
• Religion
• Occupation
• Education
• Household assets

Health behaviors
• Frequency of genital washing
• Water used for genital washing
• Material used for genital washing

Present pregnancy condition
• Gestational week
• Pregnancy related symptoms
• Number of antenatal care
• Current RTI treatment
• Current usage of antibiotics

Medical history
• Past contraceptive use
• Pregnancy history
• Age at first sexual intercourse

→ RTI
SLIDE 25
➢ Categorization of variables
“Important, challenging and rewarding”
Where to “cut”?
1. Defined cut-off point
2. Median
3. Quantile (tertile, quartile, etc.)
How to decide? → Tabulate or draw a graph.
SLIDE 26
Value        11  16  18  24  25  26  27  28  29  30  31  32
Intended      1   1   0   6   2   1   5   8  10   9   8  64
Unintended    0   0   1   5   3   0   0   1   1   2   1  10

[Bar chart: percentage (0 to 60%) of the Intended and Unintended groups at each value, 11 to 32]
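The tabulated counts can be turned into percentages and a pooled median in a few lines, which is one way to see where a cut would fall before choosing it. This sketch retypes the counts from the table. The variable names are illustrative, and the slide does not say what the tabulated score measures.

```python
from statistics import median

values     = [11, 16, 18, 24, 25, 26, 27, 28, 29, 30, 31, 32]
intended   = [1, 1, 0, 6, 2, 1, 5, 8, 10, 9, 8, 64]
unintended = [0, 0, 1, 5, 3, 0, 0, 1, 1, 2, 1, 10]


def percentages(counts):
    """Percentage of a group falling at each value."""
    total = sum(counts)
    return [100 * c / total for c in counts]


pct_intended = percentages(intended)
pct_unintended = percentages(unintended)

# Rebuild the pooled list of observations to find the median.
pooled = []
for value, n_int, n_unint in zip(values, intended, unintended):
    pooled.extend([value] * (n_int + n_unint))

pooled_median = median(pooled)
top_share = 100 * (intended[-1] + unintended[-1]) / len(pooled)

for value, a, b in zip(values, pct_intended, pct_unintended):
    print(f"{value:>2}: intended {a:5.1f}%   unintended {b:5.1f}%")
print(f"pooled median = {pooled_median}, share at top value = {top_share:.1f}%")
```

Here more than half of all mothers sit at the highest value, so the pooled median equals the maximum and a median split would simply separate the top value from everything below it. That is the kind of feature that only becomes visible once the data are tabulated or graphed.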
SLIDE 27
➢ Stepwise selection
“Convenient, but do not rely on it.”
Especially when you are testing a hypothesis.
SLIDE 28
STATA: Logistic regression
SLIDE 29
When a categorical variable with
multiple levels is included, enter the
command in the command window.
Job (1=housewife, 2=office worker,
3=manual worker)
xi: logistic attach pregint age i.job
SLIDE 30
. xi: logistic attach pregint age i.job

Logistic regression                               Number of obs   =        122
                                                  LR chi2(5)      =       9.81
                                                  Prob > chi2     =     0.0809
Log likelihood = -65.300399                       Pseudo R2       =     0.0698

------------------------------------------------------------------------------
    aichaku2 | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     pregint |   4.268066   2.536309     2.44   0.015     1.331688    13.67917
         age |   1.015464   .0508943     0.31   0.759     .9204558    1.120279
       job_2 |   .7797739   .3511123    -0.55   0.581     .3226223    1.884704
       job_3 |    .683234   .8093097     1.08   0.279     .6559658    1.319243
------------------------------------------------------------------------------
SLIDE 31
Odds ratio = Multiplicative change in the odds for a
one-unit change in the independent variable.
Interpretation:
Dichotomy: The odds ratio is 4.3 for pregnancy intention
(1=intended, 2=unintended), which means that the odds of
less attachment for unintended pregnancy are 4.3 times
those for intended pregnancy.
Continuous variable: Each additional year of age multiplies
the odds by 1.02 (not statistically significant, P = 0.759).
Categorical variable: The “job” variable has three levels:
1=housewife, 2=office worker, 3=manual worker. The
lowest category “housewife” is the default reference
category (OR=1), meaning that the odds for office workers are
0.8 times those for housewives (that is, lower).
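The z statistic, P value and confidence interval in the Stata table can be recovered from the odds ratio and its standard error alone. The sketch below does this for the pregint row. It relies on Stata reporting the standard error on the odds-ratio scale, so dividing by the odds ratio gives the standard error of the log odds ratio. The function name is illustrative.

```python
import math
from statistics import NormalDist


def wald_summary(odds_ratio, se_odds_ratio, level=0.95):
    """z, two-sided P value and CI for an odds ratio from its reported SE."""
    log_or = math.log(odds_ratio)
    se_log_or = se_odds_ratio / odds_ratio      # back to the log-odds scale
    z = log_or / se_log_or
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    lower = math.exp(log_or - z_crit * se_log_or)
    upper = math.exp(log_or + z_crit * se_log_or)
    return z, p_value, lower, upper


# Values for pregint copied from the output above.
z, p_value, lower, upper = wald_summary(4.268066, 2.536309)

print(f"z = {z:.2f}, P = {p_value:.3f}, 95% CI = {lower:.3f} to {upper:.3f}")
```

The interval is symmetric on the log scale, not on the odds-ratio scale, which is why 4.3 sits much closer to the lower limit than to the upper one.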
SLIDE 32
Pseudo R2 = "percent of variance explained".
Interpretation
About 7% of the variance in attachment is explained
by the three factors (pregnancy intention, age
and job).
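The pseudo R2 that Stata prints is McFadden's, which compares the log likelihood of the fitted model with that of an intercept-only model. It can be reproduced from two numbers in the output header, as sketched below. The small difference from 0.0698 is expected, because the LR chi2 is printed to only two decimals.

```python
# Values copied from the logistic regression output header.
log_likelihood_model = -65.300399
lr_chi2 = 9.81

# LR chi2 = 2 * (ll_model - ll_null), so the null log likelihood is:
log_likelihood_null = log_likelihood_model - lr_chi2 / 2

# McFadden's pseudo R2.
pseudo_r2 = 1 - log_likelihood_model / log_likelihood_null

print(f"null log likelihood = {log_likelihood_null:.4f}")
print(f"pseudo R2           = {pseudo_r2:.4f}")
```

Because it is built from likelihoods, not sums of squares, it is best read as a rough analogue of R2 in linear regression.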
SLIDE 33
STATA: Goodness-of-fit
SLIDE 34
. lfit

Logistic model for attach, goodness-of-fit test

       number of observations =       122
 number of covariate patterns =        85
             Pearson chi2(79) =     89.36
                  Prob > chi2 =    0.1996

Not significant = the model fits
If the P value of the goodness-of-fit test is greater than .05, we
fail to reject the null hypothesis that there is no
difference between observed and model-predicted
values, implying that the model's estimates fit the data
at an acceptable level.
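The P value follows from the Pearson chi-square statistic and its degrees of freedom. The sketch below approximates it with the Wilson-Hilferty normal approximation to the chi-square distribution. This approximation was chosen here only to keep the example within the Python standard library. Stata computes the exact tail probability.

```python
from statistics import NormalDist


def chi2_upper_tail(x, df):
    """Approximate P(chi-square with df degrees of freedom > x).

    Uses the Wilson-Hilferty cube-root normal approximation.
    """
    c = 2.0 / (9.0 * df)
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - c)) / c ** 0.5
    return 1.0 - NormalDist().cdf(z)


# Pearson chi2(79) = 89.36, copied from the lfit output.
p_value = chi2_upper_tail(89.36, 79)
model_fits = p_value > 0.05

print(f"approximate P value = {p_value:.3f}")
print("fit acceptable" if model_fits else "evidence of poor fit")
```

The approximation lands very close to the reported 0.1996 and leads to the same conclusion: the test gives no evidence of poor fit.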
SLIDE 35
SPSS: Logistic regression
Statnotes:
Topics in Multivariate Analysis, by G. David Garson
http://faculty.chass.ncsu.edu/garson/PA765/logistic.htm
Binary logistic regression with SPSS, by Karl L. Wuensch
http://www.ecu.edu/cs-cas/psyc/wuenschk/index.cfm
SLIDE 36
In SPSS, binomial logistic regression is under
Analyze - Regression - Binary Logistic
Example: Factors associated with “not owning a gun”
SLIDE 37
Categorical independent variables must be declared
by clicking the “Categorical” button. The last category
becomes the reference by default.
SLIDE 38
SLIDE 39
OR
Most reported
Goodness-of-fit
SLIDE 40