Logistic Regression III: Advanced topics
Conditional Logistic Regression for Matched Data
Recall: Matching
* Matching can control for extraneous sources of variability and increase the power of a statistical test.
* Match M controls to each case based on potential confounders, such as age and gender.
* If the data are matched, you must account for the matching in the statistical analysis!!
Recall: Agresti example, diabetes and MI
Match each MI case to a control (without MI) based on age and gender.
Ask about history of diabetes to find out whether diabetes increases your risk of MI.
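                            Controls (no MI)
MI cases          Diabetes   No diabetes   Total
Diabetes               9          37          46
No diabetes           16          82          98
Total                 25         119         144

The OR is the odds that a discordant pair "favors" the case, where b = pairs in which the case has diabetes and the control does not, and c = pairs in which the control has diabetes and the case does not:
$$\widehat{OR}=\frac{b}{c}=\frac{37}{16}\approx 2.31$$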
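As a quick check of the arithmetic, here is a minimal Python sketch of the matched-pairs odds ratio. The class `PairedTable` and its field names are illustrative choices, not part of any library; note that only the two discordant counts enter the estimate.

```python
from dataclasses import dataclass


@dataclass
class PairedTable:
    """Counts of matched case-control pairs, classified by exposure."""
    both_exposed: int      # case and control both exposed
    case_only: int         # b: case exposed, control not ("favors" the case)
    control_only: int      # c: control exposed, case not ("favors" the control)
    neither_exposed: int   # neither member exposed

    def total_pairs(self) -> int:
        return (self.both_exposed + self.case_only
                + self.control_only + self.neither_exposed)

    def odds_ratio(self) -> float:
        """Matched-pairs odds ratio b / c; concordant pairs do not enter."""
        return self.case_only / self.control_only


mi = PairedTable(both_exposed=9, case_only=37, control_only=16, neither_exposed=82)
print(mi.total_pairs(), mi.odds_ratio())   # 144 2.3125
```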
Conditional Logistic Regression
The conditional likelihood: each stratum (matched pair), rather than each individual, gets one term in the likelihood; only the discordant strata end up carrying information about the exposure effect.
* The numerator is the probability (as a function of exposures) that the case gets disease and the control does not.
* The denominator is the probability that the case gets disease and the control does not, OR that the control (with all her exposures) gets disease and the case (with all her exposures) does not.
$$L=\prod_{i=1}^{\text{all strata}}\frac{P(D\mid\text{case exposures})\,P(\sim D\mid\text{control exposures})}{P(D\mid\text{case exposures})\,P(\sim D\mid\text{control exposures})+P(\sim D\mid\text{case exposures})\,P(D\mid\text{control exposures})}$$
For each stratum, we multiply into the likelihood the CONDITIONAL probability that the case got disease and the control did not, given that exactly one member of the pair got disease.
Note: the marginal probability of disease may differ in each age-gender stratum, but we assume that the (multiplicative) increase in the odds of disease due to exposure is constant across strata.
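Recall probability terms:
$$\ln\left(\frac{P(D\mid E)}{1-P(D\mid E)}\right)=\alpha+\beta(1)$$
$$\ln\left(\frac{P(D\mid \sim E)}{1-P(D\mid \sim E)}\right)=\alpha+\beta(0)=\alpha$$
$$P(D\mid E)=\frac{e^{\alpha+\beta}}{1+e^{\alpha+\beta}}\qquad P(\sim D\mid E)=\frac{1}{1+e^{\alpha+\beta}}$$
$$P(D\mid \sim E)=\frac{e^{\alpha}}{1+e^{\alpha}}\qquad P(\sim D\mid \sim E)=\frac{1}{1+e^{\alpha}}$$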
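The likelihood term for stratum i depends on which of the four pair types it is (exposure coded 1 = diabetes, 0 = no diabetes; α_i is the baseline log odds of stratum i):

Case (MI) = 1 (diabetes), control = 0 (no diabetes):
$$L(\alpha_i,\beta)=\frac{\frac{e^{\alpha_i+\beta}}{1+e^{\alpha_i+\beta}}\cdot\frac{1}{1+e^{\alpha_i}}}{\frac{e^{\alpha_i+\beta}}{1+e^{\alpha_i+\beta}}\cdot\frac{1}{1+e^{\alpha_i}}+\frac{1}{1+e^{\alpha_i+\beta}}\cdot\frac{e^{\alpha_i}}{1+e^{\alpha_i}}}$$

Case (MI) = 0 (no diabetes), control = 1 (diabetes):
$$L(\alpha_i,\beta)=\frac{\frac{e^{\alpha_i}}{1+e^{\alpha_i}}\cdot\frac{1}{1+e^{\alpha_i+\beta}}}{\frac{e^{\alpha_i}}{1+e^{\alpha_i}}\cdot\frac{1}{1+e^{\alpha_i+\beta}}+\frac{1}{1+e^{\alpha_i}}\cdot\frac{e^{\alpha_i+\beta}}{1+e^{\alpha_i+\beta}}}$$

Case (MI) = 1, control = 1 (both diabetes):
$$L(\alpha_i,\beta)=\frac{\frac{e^{\alpha_i+\beta}}{1+e^{\alpha_i+\beta}}\cdot\frac{1}{1+e^{\alpha_i+\beta}}}{2\cdot\frac{e^{\alpha_i+\beta}}{1+e^{\alpha_i+\beta}}\cdot\frac{1}{1+e^{\alpha_i+\beta}}}=\frac{1}{2}$$

Case (MI) = 0, control = 0 (neither has diabetes):
$$L(\alpha_i,\beta)=\frac{\frac{e^{\alpha_i}}{1+e^{\alpha_i}}\cdot\frac{1}{1+e^{\alpha_i}}}{2\cdot\frac{e^{\alpha_i}}{1+e^{\alpha_i}}\cdot\frac{1}{1+e^{\alpha_i}}}=\frac{1}{2}$$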
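The conditional likelihood:
$$L(\beta)=\prod_{j=1}^{m}\frac{\frac{e^{\alpha_j}}{1+e^{\alpha_j}}\cdot\frac{1}{1+e^{\alpha_j+\beta}}}{\frac{e^{\alpha_j}}{1+e^{\alpha_j}}\cdot\frac{1}{1+e^{\alpha_j+\beta}}+\frac{1}{1+e^{\alpha_j}}\cdot\frac{e^{\alpha_j+\beta}}{1+e^{\alpha_j+\beta}}}\times\prod_{i=1}^{n}\frac{\frac{e^{\alpha_i+\beta}}{1+e^{\alpha_i+\beta}}\cdot\frac{1}{1+e^{\alpha_i}}}{\frac{e^{\alpha_i+\beta}}{1+e^{\alpha_i+\beta}}\cdot\frac{1}{1+e^{\alpha_i}}+\frac{1}{1+e^{\alpha_i+\beta}}\cdot\frac{e^{\alpha_i}}{1+e^{\alpha_i}}}$$
The first product runs over the m discordant strata that favor the control; the second runs over the n discordant strata that favor the case (concordant strata contribute only constant factors of 1/2).
Within each age-gender stratum, the case and control share the same baseline odds of disease (e^{α_i}); but these baseline odds may differ across strata.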
Conditional Logistic Regression
$$L(\beta)=\prod_{j=1}^{m}\frac{e^{\alpha_j}}{e^{\alpha_j}+e^{\alpha_j+\beta}}\times\prod_{i=1}^{n}\frac{e^{\alpha_i+\beta}}{e^{\alpha_i+\beta}+e^{\alpha_i}}$$
The e^{α_i}'s cancel!! (This gets rid of the nuisance parameters.)
$$L(\beta)=\prod_{j=1}^{m}\frac{1}{1+e^{\beta}}\times\prod_{i=1}^{n}\frac{e^{\beta}}{e^{\beta}+1}=\left(\frac{1}{e^{\beta}+1}\right)^{m}\left(\frac{e^{\beta}}{e^{\beta}+1}\right)^{n}$$
Example: MI and diabetes
$$L(\beta)=\left(\frac{1}{e^{\beta}+1}\right)^{16}\left(\frac{e^{\beta}}{e^{\beta}+1}\right)^{37}$$
Conditional Logistic Regression
$$\log L(\beta)=37\beta-53\log(e^{\beta}+1)$$
$$\frac{d\log L}{d\beta}=37-\frac{53e^{\beta}}{e^{\beta}+1}=0$$
$$37(e^{\beta}+1)=53e^{\beta}\;\Rightarrow\;37=16e^{\beta}\;\Rightarrow\;e^{\beta}=\frac{37}{16}$$
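The closed-form answer can be confirmed by maximizing the conditional log-likelihood numerically. The sketch below uses a hand-rolled Newton-Raphson iteration, chosen only for illustration (it is not a claim about what SAS does internally); `n` and `m` are the discordant-pair counts from the MI example.

```python
import math


def conditional_loglik(beta: float, n: int, m: int) -> float:
    """log L(beta) = n*beta - (n + m)*log(e^beta + 1).

    n: discordant pairs favoring the case; m: discordant pairs favoring the control.
    """
    return n * beta - (n + m) * math.log(math.exp(beta) + 1.0)


def fit_beta(n: int, m: int, tol: float = 1e-12, max_iter: int = 50) -> float:
    """Maximize conditional_loglik by Newton-Raphson, starting from beta = 0."""
    beta = 0.0
    for _ in range(max_iter):
        p = math.exp(beta) / (math.exp(beta) + 1.0)
        score = n - (n + m) * p                 # first derivative of log L
        information = (n + m) * p * (1.0 - p)   # minus the second derivative
        step = score / information
        beta += step
        if abs(step) < tol:
            break
    return beta


beta_hat = fit_beta(n=37, m=16)
print(math.exp(beta_hat), 37 / 16)   # both approximately 2.3125
```

Because the log-likelihood is concave in β, Newton's method converges quickly here from β = 0.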
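In SAS, adding a STRATA statement to PROC LOGISTIC requests conditional logistic regression:

proc logistic data = YourData;              /* one record per subject */
    model MI (event = "Yes") = diabetes;    /* model P(MI = "Yes") */
    strata PairID;                          /* PairID identifies each matched pair */
run;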
Example: Prenatal ultrasound examinations and risk of childhood leukemia: case-control study
BMJ 2000;320:282-283
* Could there be an association between exposure to ultrasound in utero and an increased risk of childhood malignancies?
* Previous studies have found no association, but they have had poor statistical power to detect an association.
* Swedish researchers performed a nationwide population-based case-control study using prospectively assembled data on prenatal exposure to ultrasound.
* 535 cases: all children born and diagnosed as having myeloid leukemia between 1973 and 1989 in the Swedish registers of birth, cancer, and causes of death.
* 535 matched controls: 1 control was randomly selected for each case from the Swedish Birth Registry, matched by sex and year and month of birth.
                          Myeloid leukemia controls
Leukemia cases       Ultrasound   No ultrasound   Total
Ultrasound              115            85          200
No ultrasound           100           235          335
Total                   215           320          535

$$\widehat{OR}=\frac{b}{c}=\frac{85}{100}=0.85$$
But this type of analysis is limited to a single dichotomous exposure.
* Used conditional logistic regression to look at dose-response with number of ultrasounds. Results:
  * Reference (OR = 1.0): no ultrasounds
  * OR = 0.91 for 1-2 ultrasounds
  * OR = 0.64 for ≥3 ultrasounds
* Conclusion: no evidence of a positive association between prenatal ultrasound and childhood leukemia; there is even evidence of an inverse association (which could be explained by the reasons for frequent ultrasound).
Extension: 1:M matching
* Each term in the likelihood represents a stratum of 1+M individuals.
* More complicated likelihood expression!
* Just as easy to implement in SAS, as we'll see Wednesday…
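For 1:M matching with one case per matched set, the standard form of each stratum's term is the case's exp(βx) divided by the sum of exp(βx) over all 1+M members of the set, and α_i still cancels. A minimal sketch for a single covariate; the helper names and the 1:3 exposure values are made up for illustration.

```python
import math
from typing import Iterable, Sequence, Tuple


def matched_set_term(beta: float, case_x: float, control_xs: Sequence[float]) -> float:
    """Conditional probability that the case, rather than one of its M controls,
    is the diseased member of a 1:M matched set (single covariate x)."""
    case_score = math.exp(beta * case_x)
    total = case_score + sum(math.exp(beta * x) for x in control_xs)
    return case_score / total


def conditional_loglik(beta: float,
                       matched_sets: Iterable[Tuple[float, Sequence[float]]]) -> float:
    """Sum of log terms over matched sets given as (case_x, [control_x, ...])."""
    return sum(math.log(matched_set_term(beta, cx, ctrl)) for cx, ctrl in matched_sets)


b = math.log(37 / 16)
# With M = 1 the term reduces to the 1:1 pair terms, e.g. e^b / (e^b + 1):
print(matched_set_term(b, 1, [0]))
# Hypothetical 1:3 matched sets with made-up exposures, for illustration only:
sets = [(1, [0, 0, 1]), (0, [1, 0, 0]), (1, [0, 0, 0])]
print(conditional_loglik(b, sets))
```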
Ordinal Logistic Regression
What if your outcome variable has more than two levels?
For ordinal outcomes, use ordinal logistic regression:
* Relies on the cumulative logit
* Models the probabilities of all the cumulative outcomes at once
* Also known as the "proportional odds model"
Ordinal Variable Example: Likert Scale
1 = strongly disagree
2 = disagree
3 = neutral
4 = agree
5 = strongly agree
Cumulative outcomes:
* strongly agree vs. the rest
* agree or strongly agree vs. neutral or negative
* neutral or positive vs. negative
* the rest vs. strongly disagree
Ordinal logistic regression gives you a way to model these cumulative outcomes all at once!
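The cumulative splits follow mechanically from the ordering of the levels. A small Python sketch (the helper `cumulative_splits` is hypothetical) that lists the K-1 splits for the five Likert levels, highest cut point first:

```python
from typing import List, Tuple

LIKERT = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]


def cumulative_splits(levels: List[str]) -> List[Tuple[List[str], List[str]]]:
    """Return the K - 1 cumulative splits of K ordered levels as
    (levels at or above a cut point, levels below it), highest cut first."""
    return [(levels[k:], levels[:k]) for k in range(len(levels) - 1, 0, -1)]


for upper, lower in cumulative_splits(LIKERT):
    print(" or ".join(upper), "vs.", " or ".join(lower))
```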
Ordinal Variable Example: Continuous variable measured crudely
1 = breastfed ≥6 months
2 = breastfed 4-5 months
3 = breastfed 2-3 months
4 = breastfed <2 months
The outcome variable, breastfeeding, was measured only at limited time points, so it may not be best modeled as a continuous variable in linear regression. Use ordinal logistic!
Another example, 3 levels, from my data on runners:
1 = eumenorrhea (normal menses) (66.6%)
2 = oligomenorrhea (mild irregularity) (24.6%)
3 = amenorrhea (severe irregularity) (8.6%)
Amenorrhea alone is the most "severe" outcome; amenorrhea or oligomenorrhea is the more inclusive definition of a "positive" outcome.
Cumulative logit, 3 groups (2 potential "positive" outcomes)
$$\text{cumulative logit for amenorrhea}=\log\left(\frac{p_{\text{amenorrhea}}}{p_{\text{oligomenorrhea or normal}}}\right)$$
$$\text{cumulative logit for any irregularity}=\log\left(\frac{p_{\text{amenorrhea or oligomenorrhea}}}{p_{\text{normal}}}\right)$$
In words: the log odds of having amenorrhea (versus everything else), and the log odds of having any irregularity (versus normal).
Corresponding logistic model (no predictors)
The intercept-only model has no predictors (but two intercepts!):
Log odds (amenorrhea) = $\alpha_{\text{amen}}$
Log odds (any irregularity) = $\alpha_{\text{amen or oligo}}$
Fitted model:
* Logit of amenorrhea: 8.6% of my sample has amenorrhea. Odds = 8.6/91.4 = 0.094; ln(0.094) ≈ -2.36.
* Logit of any irregularity: 33.3% has any irregularity (24.6% + 8.6%). Odds ≈ (1/3)/(2/3) = 1/2; ln(1/2) ≈ -0.7.
Fitted models are: Log odds (amenorrhea) = -2.36; Log odds (any irregularity) = -0.70
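With no predictors, each intercept is just an empirical cumulative logit, which this sketch recomputes from the reported percentages. Because those percentages are rounded, the results agree with the fitted values only to about two decimal places.

```python
import math

# Reported (rounded) sample proportions for the two irregular categories;
# the normal category is the remainder.
P_OLIGO, P_AMEN = 0.246, 0.086


def logit(p: float) -> float:
    """Log odds of a probability p."""
    return math.log(p / (1.0 - p))


# With no predictors, each intercept is simply an empirical cumulative logit.
alpha_amen = logit(P_AMEN)                      # amenorrhea vs. the other two
alpha_amen_or_oligo = logit(P_AMEN + P_OLIGO)   # any irregularity vs. normal
print(round(alpha_amen, 2), round(alpha_amen_or_oligo, 2))
```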
Logistic model with predictors:
$$\text{Log odds(amenorrhea)}=\alpha_{\text{amen}}+\beta_1X_1+\beta_2X_2$$
$$\text{Log odds(any irregularity)}=\alpha_{\text{amen or oligo}}+\beta_1X_1+\beta_2X_2$$
Note: different intercepts but shared betas (shared slopes)!
Odds ratio interpretation (a):
$$OR=\frac{\text{odds of amenorrhea for the exposed}}{\text{odds of amenorrhea for the unexposed}}=\frac{e^{\alpha_{\text{amen}}+\beta_{\text{exposure}}(1)+\beta_{\text{confounder}}(1)}}{e^{\alpha_{\text{amen}}+\beta_{\text{exposure}}(0)+\beta_{\text{confounder}}(1)}}=\frac{e^{\beta_{\text{exposure}}(1)}}{1}=e^{\beta_{\text{exposure}}}$$
Odds ratio interpretation (b):
$$OR=\frac{\text{odds of any menstrual irregularity for the exposed}}{\text{odds of any menstrual irregularity for the unexposed}}=\frac{e^{\alpha_{\text{amen or oligo}}+\beta_{\text{exposure}}(1)+\beta_{\text{confounder}}(1)}}{e^{\alpha_{\text{amen or oligo}}+\beta_{\text{exposure}}(0)+\beta_{\text{confounder}}(1)}}=\frac{e^{\beta_{\text{exposure}}(1)}}{1}=e^{\beta_{\text{exposure}}}$$
Odds ratio interpretation:
Interpretation of the betas: e^β = adjusted odds ratio.
For every 1-unit increase in X, the odds of any menstrual irregularity (compared with none) are multiplied by e^β, and so are the odds of amenorrhea (compared with the other two categories), adjusted for any other predictors in the model.
Note: proportional odds assumption! The odds ratios are the same across different levels of the outcome.
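To see why a single e^β serves both cumulative outcomes, the sketch below uses made-up coefficient values (not estimates from the runners data) and shows that the odds ratio for exposure comes out the same for either intercept and at either confounder level.

```python
import math

# Made-up coefficients for illustration only; they are NOT the fitted values.
INTERCEPTS = {"amenorrhea": -3.0, "amenorrhea or oligomenorrhea": -1.2}
BETA_EXPOSURE, BETA_CONFOUNDER = 0.5, 0.3


def odds(alpha: float, exposure: int, confounder: float) -> float:
    """Odds of a cumulative outcome = exp(linear predictor), shared slopes."""
    return math.exp(alpha + BETA_EXPOSURE * exposure + BETA_CONFOUNDER * confounder)


for outcome, alpha in INTERCEPTS.items():
    for confounder in (0, 1):
        ratio = odds(alpha, 1, confounder) / odds(alpha, 0, confounder)
        print(f"{outcome}, confounder={confounder}: OR = {ratio:.6f}")
```

Every line prints e^0.5 because the intercept and the confounder term appear in both numerator and denominator and cancel.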
Example predictor, EDI-A: score on the anorexia subscale of the Eating Disorder Inventory (EDI-A)
(Figure: cumulative logit plot, 4 bins of EDI-A)
These lines should be linear and parallel (equal slopes, one beta!). The slopes represent the increase in the log odds of either outcome for every 1-unit increase in EDI-A score. The intercept of the any-irregularity line is the log odds of any irregularity where EDI-A = 0; the intercept of the amenorrhea line is the log odds of amenorrhea where EDI-A = 0.
Fitted model with EDI-A:
Analysis of Maximum Likelihood Estimates
Parameter     DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept 1    1    -3.2630       0.3823           72.8648          <.0001
Intercept 2    1    -1.3888       0.2478           31.4220          <.0001
EDIA           1     0.1211       0.0265           20.9065          <.0001

Log odds (amen) = -3.2630 + 0.1211*EDI-A
Log odds (any irregularity) = -1.3888 + 0.1211*EDI-A
(Figure: Fitted model, predicted logit at every level of EDI-A)
(Figure: Comparison of actual data and fitted model)
Fitted model with EDI-A:
Odds Ratio Estimates
Effect   Point Estimate   95% Wald Confidence Limits
EDIA         1.129           1.072      1.189
For every 1-unit increase in EDI-A score, there's a 13% increase in the odds of being amenorrheic versus the other two categories and a 13% increase in the odds of being amenorrheic or oligomenorrheic versus normal.
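The odds ratio and its Wald limits can be reproduced from the EDIA row of the parameter table by exponentiating the estimate and the estimate ± 1.96 × SE. A minimal sketch using the rounded values as printed:

```python
import math

# EDIA row of the "Analysis of Maximum Likelihood Estimates" table.
ESTIMATE, STD_ERROR = 0.1211, 0.0265
Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution

odds_ratio = math.exp(ESTIMATE)
lower = math.exp(ESTIMATE - Z_975 * STD_ERROR)
upper = math.exp(ESTIMATE + Z_975 * STD_ERROR)
# About 20.9; the table's 20.9065 was presumably computed from unrounded values.
wald_chisq = (ESTIMATE / STD_ERROR) ** 2

print(f"OR = {odds_ratio:.3f}, 95% Wald CI = ({lower:.3f}, {upper:.3f})")
print(f"Wald chi-square = {wald_chisq:.1f}")
```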
Predictions:
Log odds (outcome) = -3.2630 (amenorrhea) or -1.3888 (any irregularity) + 0.1211*EDI-A
The model predicts that a woman with an EDI-A score of 15 would have:
$$P(\text{amen}=1)=\frac{e^{-3.2630+0.1211(15)}}{1+e^{-3.2630+0.1211(15)}}\approx\frac{e^{-1.446}}{1+e^{-1.446}}\approx 19\%$$
$$P(\text{any irregularity}=1)=\frac{e^{-1.3888+0.1211(15)}}{1+e^{-1.3888+0.1211(15)}}\approx\frac{e^{0.428}}{1+e^{0.428}}\approx 60.5\%$$
Predictions:
(Figure: predictions at EDI-A = 15, with a 50% probability line. Any irregularity: predicted logit = 0.4281, predicted probability = 60.5%. Amenorrhea: predicted logit = -1.446, predicted probability = 19%.)
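Because amenorrhea is nested inside "any irregularity," the probability of each of the three categories follows by subtraction from the two cumulative probabilities. This sketch (the helper `category_probs` is hypothetical) uses the fitted coefficients as rounded in the output, so the linear predictor can differ from the values above in the last digit.

```python
import math

# Fitted coefficients from the EDI-A model (rounded as reported).
ALPHA_AMEN, ALPHA_ANY = -3.2630, -1.3888
BETA_EDIA = 0.1211   # shared slope under the proportional odds assumption


def expit(z: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probs(edia: float) -> dict:
    """Probabilities of the three ordered categories at a given EDI-A score."""
    p_amen = expit(ALPHA_AMEN + BETA_EDIA * edia)  # P(amenorrhea)
    p_any = expit(ALPHA_ANY + BETA_EDIA * edia)    # P(amenorrhea or oligomenorrhea)
    return {
        "amenorrhea": p_amen,
        "oligomenorrhea": p_any - p_amen,  # nested outcomes: subtract
        "normal": 1.0 - p_any,
    }


probs = category_probs(15)
print({name: round(p, 3) for name, p in probs.items()})
```

Since both logits share one slope and the amenorrhea intercept is the smaller one, P(any irregularity) exceeds P(amenorrhea) at every EDI-A value, so the middle category never gets a negative probability.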
Advantages & disadvantages
* Ordinal logistic is better than running separate logistic models for different outcomes (e.g., one model for amenorrhea, one model for any irregularity) because of the improvement in statistical power!
* Ordinal logistic prevents you from having to arbitrarily turn an ordinal variable into a binary variable!
* But it does require that you meet the proportional odds assumption…