Transcript Slide 1

Summer School
Week 2
1
Contents
• Logistic regression refresher
• Some familiar + some less familiar polytomous
models
• 1PL/2PL in Stata and R
• PCM/RSM/GRM in R
• Link IRT to CFA/UV in Mplus
• DIF/MIMIC in Mplus
2
Types of outcome
• Two categories
– Binary / dichotomous
• Ordered
– e.g. low birthweight (< 2500g), height > 6ft, age > 70
• Unordered
– e.g. gender, car-ownership, disease status
• Presence of ordering is unimportant for binaries
3
Types of outcome
• 3+ categories
– Polytomous
• Ordered (ordinal)
– Age (<30,30-40,41+)
– “Likert” items (str disagree, disagree, …, str agree)
• Unordered (nominal)
– Ethnicity (white/black/asian/other)
– Pet ownership (none/cat/dog/budgie/goat)
4
Modelling options (LogR/IRT)
Outcome                | Logistic regression models        | IRT models
-----------------------+-----------------------------------+------------------------------------
Binary                 | Binary logistic                   | Binary IRT (1PL (Rasch), 2PL, 3PL)
Polytomous (nominal)   | Multinomial logistic              | Nominal response model (2PL)
Polytomous (ordinal)   | Ordinal logistic, e.g.            | Partial Credit (1PL), Rating Scale
                       | proportional odds model           | (1PL), Graded Response (2PL)
5
Binary Logistic Regression
6
Binary Logistic Regression
exp(c  X )
P(u  1 | X ) 
1  exp(c  X )
Probability of a positive
response / outcome
given a covariate
Intercept
Regression
coefficient
7
Binary Logistic Regression
exp(c  X )
P(u  1 | X ) 
1  exp(c  X )
Probability of a negative
response
1
P(u  0 | X ) 
1  exp(c  X )
8
Logit link function
 P(u  1 | X ) 
Logit  ln 
 c  X

1  P(u  1 | X ) 
• Probabilities only in range [0,1]
• Logit transformation is cts in range (–inf,inf)
• Logit is linear in covariates
9
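As an aside, the link is easy to explore numerically: base R's qlogis() and plogis() are the logit and inverse-logit. A minimal sketch:

p <- seq(0.01, 0.99, by = 0.01)
lo <- qlogis(p)              # ln(p/(1-p)): maps (0,1) onto (-Inf, Inf)
all.equal(plogis(lo), p)     # the inverse logit recovers the probabilities
# on this scale the model is linear: logit(p) = c + alpha*X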
Simple example – cts predictor
• Relationship between birthweight and head
circumference (at birth)
• Exposure
– birthweight (standardized)

 variable |    mean       sd
----------+------------------
      bwt | 3381.5g   580.8g

[Figure: density histogram of bwt_z (standardized birthweight)]
10
Simple example – cts predictor
• Outcome
– Head-circumference ≥ 53cm
 headcirc |   Freq.       %
----------+-----------------
        0 |   8,898   84.4%
        1 |   1,651   15.7%
----------+-----------------
    Total |  10,549

[Figure: density histogram of head circumference (cm)]
11
Simple example – cts predictor

[Figure: raw data - headcirc (0/1) plotted against bwt_z]

The raw data – doesn’t show much
12
Simple example – cts predictor
Logistic regression models the probabilities
(here shown for deciles of bwt)
           |              bwt_z_grp (deciles)
  headcirc |      0       1       2       3       4 |
-----------+-----------------------------------------+
         0 |  1,006     993   1,050     946   1,024 |
           |  99.80   98.12   97.95   96.04   93.35 |
-----------+-----------------------------------------+
         1 |      2      19      22      39      73 |
           |   0.20    1.88    2.05    3.96    6.65 |
-----------+-----------------------------------------+

  headcirc |      5       6       7       8       9 |    Total
-----------+-----------------------------------------+---------
         0 |    931     922     856     688     381 |    8,797
           |  89.95   84.98   81.84   66.67   35.94 |    84.33
-----------+-----------------------------------------+---------
         1 |    104     163     190     344     679 |    1,635
           |  10.05   15.02   18.16   33.33   64.06 |    15.67
-----------+-----------------------------------------+---------
13
Simple example – cts predictor
[Figure: percentage with +ve outcome by birthweight group (deciles 1-10)]
Increasing, non-linear relationship
14
Simple example – cts predictor
Logistic regression

Log likelihood = -3240.9881                     Number of obs   =      10432
                                                LR chi2(1)      =    2577.30
                                                Prob > chi2     =     0.0000
                                                Pseudo R2       =     0.2845

------------------------------------------------------------------------------
    headcirc | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       bwt_z |   7.431853     .378579   39.38   0.000      6.72569    8.212159
------------------------------------------------------------------------------

Or in less familiar log-odds format

------------------------------------------------------------------------------
    headcirc |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       bwt_z |   2.005775    .0509401   39.38   0.000     1.905935    2.105616
       _cons |  -2.592993    .0474003  -54.70   0.000    -2.685896     -2.50009
------------------------------------------------------------------------------
15
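For comparison, the same model can be reproduced in R with glm(); a minimal sketch, assuming a data frame dat containing the headcirc and bwt_z variables used above:

fit <- glm(headcirc ~ bwt_z, family = binomial, data = dat)
summary(fit)     # coefficients on the log-odds scale (second Stata table)
exp(coef(fit))   # exponentiated: odds ratios (first Stata table)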
Simple example – cts predictor
Fitted model – logit scale

[Figure: logitp plotted against bwt_z - a straight line]
16
Simple example – cts predictor
Fitted model – logit scale

[Figure: logitp against bwt_z, annotated with Slope = 2.00 and Cons = -2.59]
17
Simple example – cts predictor
But also… a logit of zero represents the point at which both
levels of the outcome are equally likely

[Figure: logitp against bwt_z]
18
Simple example – cts predictor
Fitted model – probability scale

[Figure: Pr(headcirc) plotted against bwt_z - an S-shaped curve]
19
Simple example – cts predictor
Fitted model – probability scale

[Figure: Pr(headcirc) against bwt_z, annotated at the point at which the curve changes direction]
20
Simple example – cts predictor
Observed and fitted values (within deciles of bwt)
[Figure: observed and fitted percentage with +ve outcome by birthweight group (deciles 1-10)]
21
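A comparison like this can be rebuilt from the model; a sketch, assuming dat and the fitted glm() object fit from the earlier aside:

# observed vs mean fitted probability within deciles of bwt_z
dat$dec <- cut(dat$bwt_z, quantile(dat$bwt_z, 0:10/10), include.lowest = TRUE)
obs  <- tapply(dat$headcirc, dat$dec, mean)   # observed proportion positive
pred <- tapply(fitted(fit), dat$dec, mean)    # mean fitted probability
round(cbind(observed = obs, fitted = pred), 3)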
LogR cts predictor - summary
[Figure: fitted logit (logitp) against bwt_z]

• Logit is linearly related to covariate
• Gradient gives strength of association
• Intercept is related to prevalence of outcome + seldom used

[Figure: fitted probability Pr(headcirc) against bwt_z]

• Non-linear (S-shaped) relationship between probabilities and covariate
• Steepness of the linear section indicates strength of association
• Point at which the curve changes direction, where P(u=1|X) = P(u=0|X), can be thought of as the location + is related to prevalence of outcome
22
LogR – binary predictor
• Define binary predictor: bwt ≥ 8lb
– 32% of the sample had a birthweight of 8lb+
• Same outcome
– Head circumference > 53cm
• Does being 8lb+ at birth increase the chance of
you being born with a larger head?
23
Association can be cross-tabbed
           |       headcirc
   bwt_8lb |       0         1 |     Total
-----------+-------------------+----------
         0 |   6,704       384 |     7,088
           |   94.58      5.42 |    100.00
-----------+-------------------+----------
         1 |   2,093     1,251 |     3,344
           |   62.59     37.41 |    100.00
-----------+-------------------+----------
     Total |   8,797     1,635 |    10,432
           |   84.33     15.67 |    100.00
24
Association can be cross-tabbed
[Crosstab repeated from the previous slide]
Familiar with
(6704*1251)/(2093*384) = 10.43 = odds-ratio
25
Association can be cross-tabbed
[Crosstab repeated from the previous slide]
Familiar with
(6704*1251)/(2093*384) = 10.43 = odds-ratio

However
ln[(6704*1251)/(2093*384)] = 2.345 = log odds-ratio
26
Association can be cross-tabbed
[Crosstab repeated from the previous slide]
Familiar with
(6704*1251)/(2093*384) = 10.43 = odds-ratio

However
ln[(6704*1251)/(2093*384)] = 2.345 = log odds-ratio

and
ln[(384)/(6704)] = ln(0.057) = -2.86 = intercept on logit scale
27
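These hand calculations are easy to check in R from the four cell counts alone; a sketch, with the counts taken from the crosstab above:

tab <- matrix(c(6704, 2093, 384, 1251), nrow = 2,
              dimnames = list(bwt_8lb = 0:1, headcirc = 0:1))
(tab[1, 1] * tab[2, 2]) / (tab[2, 1] * tab[1, 2])        # 10.43, odds-ratio
log((tab[1, 1] * tab[2, 2]) / (tab[2, 1] * tab[1, 2]))   # 2.345, log odds-ratio
log(tab[1, 2] / tab[1, 1])                               # -2.86, intercept on logit scale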
Logit output (from Stata)
. logit headcirc bwt_8lb
Logistic regression

Log likelihood = -3703.6925                     Number of obs   =      10432
                                                LR chi2(1)      =    1651.89
                                                Prob > chi2     =     0.0000
                                                Pseudo R2       =     0.1823

------------------------------------------------------------------------------
    headcirc |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     bwt_8lb |   2.345162     .063486   36.94   0.000     2.220732    2.469592
       _cons |  -2.859817    .0524722  -54.50   0.000    -2.962661   -2.756974
------------------------------------------------------------------------------
28
What lovely output figures!

[Figure: logitp_binary against bwt_8lb - linear relationship in logit space]

[Figure: probability against bwt_8lb - there is still an assumed s-shape on the probability scale although the curve is not apparent]
29
What lovely output figures!

[Figure: logitp_binary against bwt_8lb, annotated with Intercept = -2.86 and Slope = 2.35 - linear relationship in logit space]

[Figure: probability against bwt_8lb - there is still an assumed s-shape on the probability scale although the curve is not apparent]
30
LogR binary predictor - summary
• The same maths/assumptions underlie the
models with a binary predictor
• Estimation is simpler – can be done from
crosstab rather than needing ML
• Regression estimates relate to linear
relationship on logit scale
31
Multinomial Logistic Regression
32
Multinomial Logistic Regression
• Typically used for non-ordinal (nominal) outcomes
• Can be used for ordered data (some information is
ignored)
• 3+ outcome levels
• Adding another level adds another set of parameters so
more than 4 or 5 levels can be unwieldy
33
Multinomial Logistic Regression
P(u  j | X ) 
exp(c j   j X )
J
 exp(ci  i X )
i 0
where c0 = α0 = 0
Here the probabilities are obtained by a “divide-by-total”
procedure
34
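To make the divide-by-total step concrete, a small sketch in R, using the estimates that appear later (slide 42) as illustrative values:

c_j   <- c(0, 1.06, 0.78, 0.33)   # intercepts; c_0 = 0 for the base level
alpha <- c(0, 2.10, 3.52, 4.88)   # slopes; alpha_0 = 0 for the base level
X <- 0.5
num <- exp(c_j + alpha * X)       # one exponent per outcome level
num / sum(num)                    # P(u = j | X), j = 0..3; sums to 1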
Examples
• Outcome: head-circumference
– 4 roughly equal groups (quartiles)
– Ordering will be ignored
  headcirc4 |  Freq.   Percent
------------+------------------
     <= 49cm |  2,574     24.4%
 49.1–50.7cm |  2,655     25.2%
 50.8–51.9cm |  2,260     21.4%
      52+ cm |  3,060     29.0%
------------+------------------
       Total | 10,549    100.00
• Exposure 1: birthweight of 8lb or more
• Exposure 2: standardized birthweight
35
Exposure 1: bwt > 8lb
• 32% of the sample had a birthweight of 8lb+
• Does being 8lb+ at birth increase the chance of you
being born with a larger head
• Unlike the logistic model, we are concerned with three
probabilities
• P(headcirc = 49.1 – 50.7cm)
• P(headcirc = 50.8 – 51.9cm)
• P(headcirc = 52+cm)
• Each is referenced against the “negative response” i.e.
that headcirc <= 49cm
36
Exposure 1: bwt > 8lb
. mlogit headcirc4 bwt_8lb, baseoutcome(0)
Multinomial logistic regression
--------------------------------------------------------------
   headcirc4 |  Coef.     SE       z     P>|z|      [95% CI]
-------------+------------------------------------------------
1            |
     bwt_8lb |   1.56    .135   11.53    0.000     1.30   1.83
       _cons |   -.07    .029   -2.30    0.022    -0.12  -0.01
-------------+------------------------------------------------
2            |
     bwt_8lb |   3.09    .129   23.98    0.000     2.84   3.34
       _cons |   -.58    .034  -17.33    0.000    -0.65  -0.52
-------------+------------------------------------------------
3            |
     bwt_8lb |   4.39    .127   34.43    0.000     4.14   4.64
       _cons |   -.99    .039  -25.56    0.000    -1.06  -0.92
--------------------------------------------------------------
(headcirc4==0 is the base outcome)

3 sets of results
Each is referenced against the “baseline” group, i.e. <=49cm
37
Exposure 1: bwt > 8lb
. mlogit headcirc4 bwt_8lb, baseoutcome(0)
Multinomial logistic regression
---------------------------------
   headcirc4 |   Coef.     (SE)
-------------+-------------------
1            |
     bwt_8lb |    1.56    (.135)
       _cons |    -.07    (.029)
-------------+-------------------
2            |
     bwt_8lb |    3.09    (.129)
       _cons |    -.58    (.034)
-------------+-------------------
3            |
     bwt_8lb |    4.39    (.127)
       _cons |    -.99    (.039)
---------------------------------
(headcirc4==0 is the base outcome)

Logistic regression
-----------------------------------
      head_1 |     Coef.  Std. Err.
-------------+---------------------
     bwt_8lb |   1.56099   .1353772
       _cons | -.0664822   .0289287
-----------------------------------
Logistic regression
-----------------------------------
      head_2 |     Coef.  Std. Err.
-------------+---------------------
     bwt_8lb |  3.088329   .1287576
       _cons | -.5822197   .0335953
-----------------------------------
Logistic regression
-----------------------------------
      head_3 |     Coef.  Std. Err.
-------------+---------------------
     bwt_8lb |  4.389338    .127473
       _cons | -.9862376   .0385892
-----------------------------------
38
Exposure 1: bwt > 8lb
• For a categorical exposure, a multinomial logistic model
fitted over 4 outcome levels gives the same estimates as
3 logistic models, i.e.
Multinomial(0v1, 0v2, 0v3)  ≡  Logit(0v1) + Logit(0v2) + Logit(0v3)
• In this instance, the single model is merely more
convenient and allows the testing of equality constraints
40
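The equivalence is easy to verify directly; a sketch, assuming a data frame dat with headcirc4 (coded 0-3) and bwt_8lb:

library(VGAM)
fit_mn  <- vglm(headcirc4 ~ bwt_8lb, multinomial(refLevel = 1), data = dat)
fit_0v1 <- glm(headcirc4 ~ bwt_8lb, family = binomial,
               data = subset(dat, headcirc4 %in% c(0, 1)))
coef(fit_mn, matrix = TRUE)   # first column should match coef(fit_0v1)
coef(fit_0v1)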
Exposure 2: Continuous bwt
• Using standardized birthweight, we are interested in how
the probability of having a larger head, i.e.
• P(headcirc = 49.1 – 50.7cm)
• P(headcirc = 50.8 – 51.9cm)
• P(headcirc = 52+cm)
increases as birthweight increases
As with the binary logistic models, estimates will reflect
• A change in log-odds per SD change in birthweight
• The gradient or slope on the logit scale
41
Exposure 2: Continuous bwt
mlogit headcirc4 bwt_z, baseoutcome(0)
Multinomial logistic regression
--------------------------------------------------------------
   headcirc4 |  Coef.     SE       z     P>|z|      [95% CI]
-------------+------------------------------------------------
1            |
       bwt_z |   2.10    .063   33.11    0.000     1.97   2.22
       _cons |   1.06    .044   23.85    0.000     0.97   1.14
-------------+------------------------------------------------
2            |
       bwt_z |   3.52    .078   44.89    0.000     3.37   3.68
       _cons |   0.78    .046   16.95    0.000     0.69   0.87
-------------+------------------------------------------------
3            |
       bwt_z |   4.88    .086   56.90    0.000     4.72   5.05
       _cons |   0.33    .051    6.51    0.000     0.23   0.43
--------------------------------------------------------------
(headcirc4==0 is the base outcome)
42
Exposure 2: Continuous bwt
Multinomial logistic regression
------------------------------
   headcirc4 |  Coef.  (SE)
-------------+----------------
1            |
       bwt_z |  2.10  (.063)
       _cons |  1.06  (.044)
-------------+----------------
2            |
       bwt_z |  3.52  (.078)
       _cons |  0.78  (.046)
-------------+----------------
3            |
       bwt_z |  4.88  (.086)
       _cons |  0.33  (.051)
------------------------------
(headcirc4==0 is the base outcome)

No longer identical

Logistic regression
-------------------------------------
      head_1 |     Coef.    Std. Err.
-------------+-----------------------
       bwt_z |  2.093789     .0650987
       _cons |  1.058445     .0447811
-------------------------------------
Logistic regression
-------------------------------------
      head_2 |     Coef.    Std. Err.
-------------+-----------------------
       bwt_z |  3.355041     .0959539
       _cons |  .6853272     .0464858
-------------------------------------
Logistic regression
-------------------------------------
      head_3 |     Coef.    Std. Err.
-------------+-----------------------
       bwt_z |  3.823597     .1065283
       _cons |  .3129028     .0492469
-------------------------------------

43
Exposure 2: Continuous bwt

[Figure: three fitted logit lines (logitp2, logitp3, logitp4) against bwt_z]

Outcome level 2 = [49.1 – 50.7]: Intercept = 1.06, Slope = 2.10 (shallowest)
Outcome level 3 = [50.8 – 51.9]: Intercept = 0.78, Slope = 3.52
Outcome level 4 = [52.0+]:       Intercept = 0.33, Slope = 4.88 (steepest)

Risk of being in outcome level 4 increases most sharply as bwt increases
44
Can plot probabilities for all 4 levels

[Figure: four panels - Pr(headcirc4==0), Pr(headcirc4==1), Pr(headcirc4==2) and Pr(headcirc4==3), each plotted against bwt_z]
45
Or altogether on one graph….

[Figure: Pr(headcirc4==0), Pr(headcirc4==1), Pr(headcirc4==2) and Pr(headcirc4==3) against bwt_z on a single set of axes]
46
Ordinal Logistic Models
48
Ordinal Logistic Models
• When applicable, it is useful to favour ordinal models
over multinomial models
• If outcome levels are increasing, e.g. in severity of a
condition or agreement with a statement, we expect the
model parameters to behave in a certain way
• The typical approach is to fit ordinal models with
constraints, resulting in greater parsimony (fewer
parameters)
49
Contrasting among response categories - some alternative models

For a 4-level outcome there are three comparisons (C1, C2, C3) to be made:

Model 1 – that used in the multinomial logistic model:
C1: level 1 v level 2;  C2: level 1 v level 3;  C3: level 1 v level 4

Model 2 – used with the proportional-odds ordinal model:
C1: level 1 v levels 2-4;  C2: levels 1-2 v levels 3-4;  C3: levels 1-3 v level 4

Model 3 – adjacent category model:
C1: level 1 v level 2;  C2: level 2 v level 3;  C3: level 3 v level 4
50
Proportional odds model
• For an ordinal outcome, all three models shown on the
previous slide could be fitted.
• Which one is chosen will depend on the type of data
being analysed, and the assumptions you wish to make
• POM is very popular in the literature (thanks to SPSS)
• Consists of model 2 applied with a parameter constraint
51
Proportional odds model
P(u  j | X ) 
exp(c j  X )
1  exp(c j  X )
Alternative but equivalent
parameterizations are
sometimes used
 P(u  j | X ) 
logit  ln 
 c j  X

 P(u  j | X ) 
• The alphas do not have a subscript, hence the proportional
odds assumption, AKA equal slopes
52
Proportional odds model
• As we are modelling cumulative probabilities, the
probability of a specific response category occurring
must be obtained by subtraction
P(u  j | X )
 P(u  j | X )  P(u  j  1 | X )
53
E.g. for 3-category ordinal 0,1,2
exp(c0  X )
exp(c1  X )
P(u  1 | X ) 

1  exp(c1  X ) 1  exp(c0  X )
exp(c2  X )
exp(c1  X )
P(u  2 | X ) 

1  exp(c2  X ) 1  exp(c1  X )
54
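Numerically, the subtraction looks like this; a sketch using the constrained-model estimates fitted later (slide 57) as illustrative values:

c_j <- c(1.78, -0.17, -1.70)              # cut intercepts (proportional odds)
alpha <- 2.53; X <- 0
cum <- c(1, plogis(c_j + alpha * X), 0)   # P(u >= j | X) for j = 0..3, then 0
cum[-length(cum)] - cum[-1]               # P(u = j | X); the four values sum to 1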
Exposure: continuous birthweight
. gologit2 headcirc4 bwt_z, npl
Generalized Ordered Logit Estimates
---------------------------------
   headcirc4 |   Coef.     (SE)
-------------+-------------------
0            |
       bwt_z |    2.79    (.059)
       _cons |    1.90    (.040)
-------------+-------------------
1            |
       bwt_z |    2.61    (.049)
       _cons |   -0.13    (.026)
-------------+-------------------
2            |
       bwt_z |    2.23    (.048)
       _cons |   -1.55    (.034)
---------------------------------

Unconstrained model

[Figure: fitted logits (logit_npl1, logit_npl2, logit_npl3) against bwt_z]

WARNING! 8 in-sample cases have an outcome with a predicted probability that is less than 0.
See the gologit2 help section on Warning Messages for more information.
55
. gologit2 headcirc4 bwt_z if kz030>1.25, npl
Generalized Ordered Logit Estimates

Log likelihood = -10396.564                     Number of obs   =      10422
                                                LR chi2(3)      =    7981.91
                                                Prob > chi2     =     0.0000
                                                Pseudo R2       =     0.2774

------------------------------------------------------------------------------
   headcirc4 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
0            |
       bwt_z |   2.785725    .0585952   47.54   0.000      2.67088    2.900569
       _cons |   1.904522    .0395861   48.11   0.000     1.826935    1.982109
-------------+----------------------------------------------------------------
1            |
       bwt_z |    2.60551    .0494048   52.74   0.000     2.508678    2.702341
       _cons |  -.1269783    .0260562   -4.87   0.000    -.1780475   -.0759092
-------------+----------------------------------------------------------------
2            |
       bwt_z |   2.232922    .0481026   46.42   0.000     2.138642    2.327201
       _cons |  -1.550289    .0336295  -46.10   0.000    -1.616202   -1.484377
------------------------------------------------------------------------------
56
Exposure: continuous birthweight
. gologit2 headcirc4 bwt_z, pl
Generalized Ordered Logit Estimates
---------------------------------
   headcirc4 |   Coef.     (SE)
-------------+-------------------
0            |
       bwt_z |    2.53    (.036)
       _cons |    1.78    (.031)
-------------+-------------------
1            |
       bwt_z |    2.53    (.036)
       _cons |   -0.17    (.025)
-------------+-------------------
2            |
       bwt_z |    2.53    (.036)
       _cons |   -1.70    (.030)
---------------------------------

Constrained model

[Figure: fitted logits (logit_pl1, logit_pl2, logit_pl3) against bwt_z - parallel lines]
57
Exposure: continuous birthweight
• Equality constraint has gained us 2 d.f.
• With more covariates (or more outcome levels) the savings could be
considerable
• Model test:
. gologit2 headcirc4 bwt_z, npl store(unconstrained)
. gologit2 headcirc4 bwt_z, pl store(constrained)
. lrtest constrained unconstrained
Likelihood-ratio test                                 LR chi2(2)  =     69.78
(Assumption: constrained nested in unconstrained)     Prob > chi2 =    0.0000
58
Exposure: continuous birthweight

[Figure, left panel: Pr(headcirc4==0), Pr(headcirc4==1), Pr(headcirc4==2) and Pr(headcirc4==3) against bwt_z - Unconstrained]

[Figure, right panel: the same probabilities against bwt_z - Constrained - proportional odds (equal slopes)]
59
Introduce a new model
Proportional odds model:

P(u ≥ j | X) = exp(c_j + αX) / (1 + exp(c_j + αX))

Adjacent category model:

P(u = j | X) / P(u = j+1 | X) = exp(c_j + α_j X)
60
Adjacent-category model
• The adjacent category model is a nominal response model
• Each category in turn is compared to its nearest
neighbour
• Within each comparison, nothing is said about the other
categories – unless model constraints are imposed
• Potential for testing the ordinal nature of the item – as
we will see later
61
3-category adjacent-category model
Consider a categorical outcome with 3 levels: 0,1,2
P(u  0 | X )
 exp(c0   0 X )
P(u  1 | X )
P(u  1 | X )
 exp(c1  1 X )
P(u  2 | X )
P(u  0 | X )  P(u  1 | X )  P(u  2 | X )  1
62
We can rearrange this to give (us a headache)
P(u  0 | X ) 
exp((c0   0 X )  (c1  1 X ))
1  exp(c1  1 X )  exp((c0   0 X )  (c1  1 X ))
P(u  1 | X ) 
exp(c1  1 X )
1  exp(c1  1 X )  exp((c0   0 X )  (c1  1 X ))
P(u  2 | X ) 
1
1  exp(c1  1 X )  exp((c0   0 X )  (c1  1 X ))
63
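The shared denominator is easy to verify numerically; a sketch with arbitrary illustrative parameter values:

c0 <- 0.5; a0 <- -1.0; c1 <- 0.2; a1 <- -0.8; X <- 1
num0 <- exp((c0 + a0*X) + (c1 + a1*X))       # numerator for P(u = 0 | X)
num1 <- exp(c1 + a1*X)                       # numerator for P(u = 1 | X)
num2 <- 1                                    # numerator for P(u = 2 | X)
c(num0, num1, num2) / (num0 + num1 + num2)   # three probabilities, sum to 1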
And?
Notice
• The 3 denominators are the same
• Each response category probability can be written DIRECTly as a ratio of sums of exponents
• Also known as a “divide by total” method
• This is contrary to POM, which was an INDIRECT or “difference” method, since probabilities are obtained through subtraction
64
Summary so far
• Aim of logistic regression is to model non-linear
probabilistic relationship between explanatory
variables and a categorical outcome
• The logit form of the regression is assumed
linear in its regression terms
• Can be classified as a generalized linear model
65
Summary so far
• For polytomous outcomes, J outcome levels
leads to J-1 regression analyses
• Appropriate (ordinal) data gives the potential for
parsimony through constraints such as
proportional odds
66
Worked example
• Outcome:
– 4-level categorical head circumference
• Exposure:
– Standardized birthweight
• Aim:
– Fit the equivalent of Stata’s -mlogit- and -gologit2- functions in R and compare with the adjacent category model
67
Ingredients
• one dataset “data_for_R2.dta”
• one R-script “bwt headcirc for R.R”
• one copy of R (latest version 2.9.2)
• R packages
– foreign (to load dataset)
– VGAM (to run models)
68
[R] Multinomial / adjacent category models
##########################################
# setup - load packages and the dataset
##########################################
library(foreign)                 # read.dta() for the Stata file
library(VGAM)                    # vglm() and the model families
dat <- read.dta("data_for_R2.dta")
attach(dat)                      # so headc4 and bwtz are visible to vglm()
##########################################
# multinomial logit model - cts covariate
##########################################
fit_multinom = vglm(headc4 ~ bwtz, multinomial(refLevel=1))
constraints(fit_multinom)
coef(fit_multinom, matrix=TRUE)
summary(fit_multinom)
##################################################
# adjacent categories logit model - cts covariate
##################################################
fit_adjcat = vglm(headc4 ~ bwtz, acat)
constraints(fit_adjcat)
coef(fit_adjcat, matrix=TRUE)
summary(fit_adjcat)
69
[R] Cumulative logistic model
##########################################
# cumulative model - cts covariate
##########################################
# constrained (parallel) model
###############################
fit_pom = vglm(headc4 ~ bwtz, cumulative(parallel=TRUE, reverse=TRUE))
constraints(fit_pom)
coef(fit_pom, matrix=TRUE)
# unconstrained (non-parallel) model
#####################################
fit_nonpom = vglm(headc4 ~ bwtz, cumulative(parallel=FALSE, reverse=TRUE))
constraints(fit_nonpom)
coef(fit_nonpom, matrix=TRUE)
# Check the proportional odds assumption with a LRT
####################################################
1 - pchisq(2*(logLik(fit_nonpom)-logLik(fit_pom)),
df=length(coef(fit_nonpom))-length(coef(fit_pom)))
70
Non-POM in VGAM
• VGAM procedure struggles with unequal slopes
when continuous predictor strongly related to
outcome
• This occurred in this example and also in the
exercises
• We are using R as a useful teaching
environment; it is free, but not everything is
100% reliable
71
summary(fit_multinom)
Call:
vglm(formula = headc4 ~ bwtz, family = multinomial(refLevel = 1))
Coefficients:
                 Value  Std. Error  t value
(Intercept):1  1.05708    0.044320  23.8514
(Intercept):2  0.78024    0.046040  16.9471
(Intercept):3  0.33206    0.050998   6.5113
bwtz:1         2.09505    0.063267  33.1146
bwtz:2         3.52171    0.078459  44.8858
bwtz:3         4.88490    0.085853  56.8987

Number of linear predictors: 3

Names of linear predictors: log(mu[,2]/mu[,1]), log(mu[,3]/mu[,1]), log(mu[,4]/mu[,1])

Dispersion Parameter for multinomial family: 1

Residual Deviance: 20946.78 on 31290 degrees of freedom

Log-likelihood: -10473.39 on 31290 degrees of freedom

Number of Iterations: 6
72
summary(fit_adjcat)
Call:
vglm(formula = headc4 ~ bwtz, family = acat)
Coefficients:
                  Value  Std. Error   t value
(Intercept):1   1.05708    0.044320   23.8514
(Intercept):2  -0.27685    0.031349   -8.8312
(Intercept):3  -0.44817    0.040190  -11.1512
bwtz:1          2.09505    0.063267   33.1146
bwtz:2          1.42666    0.057806   24.6800
bwtz:3          1.36319    0.052771   25.8323

Number of linear predictors: 3

Names of linear predictors: log(P[Y=2]/P[Y=1]), log(P[Y=3]/P[Y=2]), log(P[Y=4]/P[Y=3])

Dispersion Parameter for acat family: 1

Residual Deviance: 20946.78 on 31290 degrees of freedom

Log-likelihood: -10473.39 on 31290 degrees of freedom
Number of Iterations: 6
73
summary(fit_pom)
Call:
vglm(formula = headc4 ~ bwtz, family = cumulative(parallel = TRUE,
reverse = TRUE))
Pearson Residuals:
                     Min        1Q    Median       3Q      Max
logit(P[Y>=2]) -20.3575  0.013626  0.138778  0.46234   8.1378
logit(P[Y>=3])  -9.7316 -0.461336  0.030467  0.44740  12.7044
logit(P[Y>=4])  -6.0701 -0.374240 -0.134896  0.30254  14.7435

Coefficients:
                  Value  Std. Error   t value
(Intercept):1   1.78467    0.031368   56.8950
(Intercept):2  -0.17112    0.025049   -6.8316
(Intercept):3  -1.70386    0.030295  -56.2419
bwtz            2.52516    0.036331   69.5052

Number of linear predictors: 3

Names of linear predictors: logit(P[Y>=2]), logit(P[Y>=3]), logit(P[Y>=4])

Dispersion Parameter for cumulative family: 1

Residual Deviance: 20862.91 on 31292 degrees of freedom

Log-likelihood: -10431.46 on 31292 degrees of freedom

Number of Iterations: 6
74
summary(fit_nonpom)
Call:
vglm(formula = headc4 ~ bwtz, family = cumulative(parallel = FALSE,
reverse = TRUE), maxit = 100)
Pearson Residuals:
                     Min        1Q    Median       3Q     Max
logit(P[Y>=2]) -26.8061  0.010352  0.127186  0.39716  10.639
logit(P[Y>=3]) -10.2091 -0.488329  0.031261  0.44992  15.139
logit(P[Y>=4])  -4.8869 -0.375265 -0.151709  0.36328  10.577

Coefficients:
                  Value  Std. Error    t value
(Intercept):1   1.90529   0.0392664    48.5222
(Intercept):2  -0.12164   0.0238242    -5.1056
(Intercept):3  -1.54436   0.0249593   -61.8751
bwtz:1          2.77653   0.0566026    49.0531
bwtz:2          2.57170   0.0085618   300.3694
bwtz:3          2.21528   0.0048085   460.7037

Number of linear predictors: 3

Names of linear predictors: logit(P[Y>=2]), logit(P[Y>=3]), logit(P[Y>=4])

Dispersion Parameter for cumulative family: 1

Residual Deviance: 20793.62 on 31290 degrees of freedom

Log-likelihood: NA on 31290 degrees of freedom

Number of Iterations: 67
75
Summary of estimates
Contrast    |  Multinom     |  Adjcat      |  Pom         |  Non-pom
------------+---------------+--------------+--------------+---------------
1 v 2       |  2.10 (0.06)  |  2.10 (0.06) |              |
1 v 3       |  3.52 (0.08)  |              |              |
1 v 4       |  4.88 (0.09)  |              |              |
1 v 2-4     |               |              |  2.53 (0.04) |  2.78 (0.06)
1-2 v 3-4   |               |              |  2.53 (0.04) |  2.57 (0.009)
1-3 v 4     |               |              |  2.53 (0.04) |  2.22 (0.005)
2 v 3       |               |  1.43 (0.06) |              |
3 v 4       |               |  1.36 (0.05) |              |
76
Plots for multinomial model
77
Plots for adjacent category model
LOGITS have different interpretation compared with last model
78
Plots for cumulative models: POM
79
OCCs for POM
80
Plots for cumulative models: non-POM
81
OCCs for non-POM
82
Some decision about which is best?
• Seen an exhausting but non-exhaustive range of ordinal
models
• The decision regarding which is “best” is not really a
statistical one
• One should always consider a variety of models to
assess whether one’s conclusions are affected by a
(perhaps arbitrary) model choice
• Here practical differences are likely to be minor, at least
away from the distribution tails
83
Now for some exercises
84
Exercise 1 – binary predictor
Categorical outcome and binary predictor
• Using a simple crosstab, obtain log-odds estimates for
– Multinomial logistic
– Un-equal slopes cumulative logistic
– Adjacent category logistic
• Verify the results by fitting the equivalent models in R
using -VGAM-
85
Exercise 2 – continuous predictor
Categorical outcome and continuous predictor
• Using the -VGAM- package, estimate the following models
– Multinomial logistic
– Un-equal slopes cumulative logistic
– Equal slopes cumulative logistic (POM)
– Adjacent category logistic
• Examine the predicted probabilities/logits graphically for
each model
86
Data for the exercise – bmi11to14data.dta

• Binary predictor (bmi7_med)
– BMI ≥ median (15.6) at age 7
• Continuous predictor (bmi7/bmi7z)
– cts BMI at age 7
• 4-level categorical outcome (bmi14_g4)
– BMI at age 14
– Categorised: min/20=0 20/22=1 22/24=2 24/max=3
87
Exercise 1
• Crosstab of bmi7>20 and bmi14_g4
           |              bmi14_g4
  bmi7_med |      0        1        2        3 |   Total
-----------+-----------------------------------+--------
         0 |  2,079      313       71       37 |   2,500
           |  83.2%    12.5%     2.8%     1.5% |
-----------+-----------------------------------+--------
         1 |    707      690      464      639 |   2,500
           |  28.3%    27.6%    18.6%    25.6% |
-----------+-----------------------------------+--------
     Total |  2,786    1,003      535      676 |   5,000
88
Optional “homework”
• Refit these models in a more familiar stats package e.g.
– SPSS: nomreg / PLUM
(http://www.norusis.com/pdf/ASPC_v13.pdf)
– Stata: mlogit / gologit2
– http://www.ats.ucla.edu/stat/Stata/dae/ologit.htm
– SAS: proc logistic
– http://www.ats.ucla.edu/stat/sas/dae/ologit.htm
• ACAT not possible in SPSS, nor Stata (except possibly
in GLLAMM)
89
http://cran.r-project.org/
90
91
92