Transcript Slide 1
Slide 1: Summer School Week 2

Slide 2: Contents
• Logistic regression refresher
• Some familiar + some less familiar polytomous models
• 1PL/2PL in Stata and R
• PCM/RSM/GRM in R
• Link IRT to CFA/UV in Mplus
• DIF/MIMIC in Mplus

Slide 3: Types of outcome
• Two categories – Binary / dichotomous
  – Ordered – e.g. low birthweight (< 2500g), height > 6ft, age > 70
  – Unordered – e.g. gender, car-ownership, disease status
• Presence of ordering is unimportant for binaries

Slide 4: Types of outcome
• 3+ categories – Polytomous
  – Ordered (ordinal)
    – Age (<30, 30–40, 41+)
    – "Likert" items (str disagree, disagree, …, str agree)
  – Unordered (nominal)
    – Ethnicity (white/black/asian/other)
    – Pet ownership (none/cat/dog/budgie/goat)

Slide 5: Modelling options (LogR/IRT)

Outcome              | Logistic regression models                      | IRT models
---------------------+-------------------------------------------------+------------------------------------------------------------
Binary               | Binary logistic                                 | Binary IRT (1PL (Rasch), 2PL, 3PL)
Polytomous, nominal  | Multinomial logistic                            | Nominal response model (2PL)
Polytomous, ordinal  | Ordinal logistic, e.g. proportional odds model  | Partial Credit (1PL), Rating Scale (1PL), Graded Response (2PL)

Slide 6: Binary Logistic Regression

Slide 7: Binary Logistic Regression
P(u = 1 | X) = exp(c + αX) / [1 + exp(c + αX)]
Probability of a positive response / outcome given a covariate; c is the intercept and α the regression coefficient.

Slide 8: Binary Logistic Regression
Probability of a negative response:
P(u = 0 | X) = 1 − P(u = 1 | X) = 1 / [1 + exp(c + αX)]

Slide 9: Logit link function
logit = ln{ P(u = 1 | X) / [1 − P(u = 1 | X)] } = c + αX
• Probabilities only in range [0,1]
• Logit transformation is cts in range (−inf, inf)
• Logit is linear in covariates

Slide 10: Simple example – cts predictor
• Relationship between birthweight and head circumference (at birth)
• Exposure – birthweight (standardized)
  variable |   mean      sd
  ---------+-----------------
  bwt      | 3381.5g   580.8g
[Figure: density histogram of standardized birthweight (bwt_z)]

Slide 11: Simple example – cts predictor
• Outcome – Head circumference ≥ 53cm
  headcirc |  Freq.  |   %
  ---------+---------+-------
         0 |  8,898  | 84.4%
         1 |  1,651  | 15.7%
  ---------+---------+-------
     Total | 10,549  |
[Figure: histogram of head circumference (headcirc)]

Slide 12: Simple example – cts predictor
[Figure: headcirc (0/1) plotted against bwt_z]
The raw data – doesn't show much

Slide 13: Simple example – cts predictor
Logistic regression models the probabilities (here shown for deciles of bwt):

bwt_z decile | headcirc = 0 (n, %) | headcirc = 1 (n, %)
-------------+---------------------+--------------------
           0 |   1,006     99.80   |       2      0.20
           1 |     993     98.12   |      19      1.88
           2 |   1,050     97.95   |      22      2.05
           3 |     946     96.04   |      39      3.96
           4 |   1,024     93.35   |      73      6.65
           5 |     931     89.95   |     104     10.05
           6 |     922     84.98   |     163     15.02
           7 |     856     81.84   |     190     18.16
           8 |     688     66.67   |     344     33.33
           9 |     381     35.94   |     679     64.06
-------------+---------------------+--------------------
       Total |   8,797     84.33   |   1,635     15.67

Slide 14: Simple example – cts predictor
[Figure: percentage with +ve outcome by birthweight decile group (1–10)]
Increasing, non-linear relationship

Slide 15: Simple example – cts predictor
Logistic regression
Log likelihood = -3240.9881        Number of obs =    10432
                                   LR chi2(1)    =  2577.30
                                   Prob > chi2   =   0.0000
                                   Pseudo R2     =   0.2845
------------------------------------------------------------------------------
    headcirc | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       bwt_z |   7.431853     .378579   39.38   0.000      6.72569    8.212159
------------------------------------------------------------------------------
Or in less familiar log-odds format:
------------------------------------------------------------------------------
    headcirc |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       bwt_z |   2.005775    .0509401   39.38   0.000     1.905935    2.105616
       _cons |  -2.592993    .0474003  -54.70   0.000    -2.685896    -2.50009
------------------------------------------------------------------------------
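A minimal R sketch of the binary logistic fit above, assuming the data sit in a data frame dat containing the 0/1 outcome headcirc and standardized birthweight bwt_z (the data-frame name is an assumption, not part of the original materials):

# Binary logistic regression of headcirc (0/1) on standardized birthweight
fit_bin <- glm(headcirc ~ bwt_z, family = binomial(link = "logit"), data = dat)
summary(fit_bin)                 # coefficients on the log-odds (logit) scale
exp(coef(fit_bin))               # exponentiate to get odds ratios
exp(confint.default(fit_bin))    # Wald-type 95% CIs on the odds-ratio scale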
Slide 16: Simple example – cts predictor
Fitted model – logit scale
[Figure: fitted logit (logitp) against bwt_z – a straight line]

Slide 17: Simple example – cts predictor
Fitted model – logit scale
[Figure: as slide 16, annotated with Cons = -2.59 and Slope = 2.00]

Slide 18: Simple example – cts predictor
But also… a logit of zero represents the point at which both levels of the outcome are equally likely
[Figure: fitted logit against bwt_z, highlighting where the line crosses zero]

Slide 19: Simple example – cts predictor
Fitted model – probability scale
[Figure: fitted Pr(headcirc) against bwt_z – an S-shaped curve]

Slide 20: Simple example – cts predictor
Fitted model – probability scale
[Figure: as slide 19, annotating the point at which the curve changes direction]

Slide 21: Simple example – cts predictor
Observed and fitted values (within deciles of bwt)
[Figure: percentage with +ve outcome by birthweight group, observed points with fitted curve]

Slide 22: LogR cts predictor – summary
• Logit is linearly related to the covariate
• Gradient gives strength of association
• Intercept is related to prevalence of outcome + seldom used
• Non-linear (S-shaped) relationship between probabilities and the covariate
• Steepness of the linear section reflects strength of association
• The point at which the curve changes direction, where P(u=1|X) = P(u=0|X), can be thought of as the location + is related to prevalence of outcome
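As a quick illustration of the summary above, a small R sketch (purely illustrative, plugging in the estimates reported on slide 15) showing the linear logit alongside the S-shaped probability curve:

# Fitted relationship using the reported estimates: logit = -2.59 + 2.00 * bwt_z
bwt_z  <- seq(-6, 4, by = 0.1)
logitp <- -2.59 + 2.00 * bwt_z        # linear on the logit scale
p      <- plogis(logitp)              # i.e. exp(logitp) / (1 + exp(logitp))

op <- par(mfrow = c(1, 2))
plot(bwt_z, logitp, type = "l", main = "Logit scale")
abline(h = 0, lty = 2)                # logit = 0: both outcome levels equally likely
plot(bwt_z, p, type = "l", main = "Probability scale")
abline(h = 0.5, lty = 2)              # P = 0.5 at the same point (the 'location')
par(op)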
Slide 23: LogR – binary predictor
• Define binary predictor: bwt ≥ 8lb
  – 32% of the sample had a birthweight of 8lb+
• Same outcome – Head circumference > 53cm
• Does being 8lb+ at birth increase the chance of being born with a larger head?

Slide 24: Association can be cross-tabbed
Crosstab of bwt_8lb by headcirc (counts and row %):

          | headcirc = 0    headcirc = 1  |  Total
----------+-------------------------------+-----------------
bwt_8lb=0 |  6,704  94.58%    384   5.42% |  7,088   100.00%
bwt_8lb=1 |  2,093  62.59%  1,251  37.41% |  3,344   100.00%
----------+-------------------------------+-----------------
    Total |  8,797  84.33%  1,635  15.67% | 10,432   100.00%

Slide 25: Association can be cross-tabbed
[Same crosstab as slide 24]
Familiar with: (6704*1251)/(2093*384) = 10.43 = odds ratio

Slide 26: Association can be cross-tabbed
[Same crosstab as slide 24]
Familiar with: (6704*1251)/(2093*384) = 10.43 = odds ratio
However: ln[(6704*1251)/(2093*384)] = 2.345 = log odds ratio

Slide 27: Association can be cross-tabbed
[Same crosstab as slide 24]
Familiar with: (6704*1251)/(2093*384) = 10.43 = odds ratio
However: ln[(6704*1251)/(2093*384)] = 2.345 = log odds ratio
and: ln[384/6704] = ln(0.057) = -2.86 = intercept on the logit scale
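A tiny R sketch reproducing the hand calculations above from the cell counts of the crosstab on slide 24:

# Cell counts from the bwt_8lb x headcirc crosstab
n00 <- 6704   # bwt_8lb = 0, headcirc = 0
n01 <-  384   # bwt_8lb = 0, headcirc = 1
n10 <- 2093   # bwt_8lb = 1, headcirc = 0
n11 <- 1251   # bwt_8lb = 1, headcirc = 1

(n00 * n11) / (n10 * n01)        # odds ratio,      ~ 10.43
log((n00 * n11) / (n10 * n01))   # log odds ratio,  ~  2.345 (the bwt_8lb coefficient)
log(n01 / n00)                   # intercept on the logit scale, ~ -2.86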
Slide 28: Logit output (from Stata)
. logit headcirc bwt_8lb
Logistic regression
Log likelihood = -3703.6925        Number of obs =    10432
                                   LR chi2(1)    =  1651.89
                                   Prob > chi2   =   0.0000
                                   Pseudo R2     =   0.1823
------------------------------------------------------------------------------
    headcirc |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     bwt_8lb |   2.345162     .063486   36.94   0.000     2.220732    2.469592
       _cons |  -2.859817    .0524722  -54.50   0.000    -2.962661   -2.756974
------------------------------------------------------------------------------

Slide 29: What lovely output figures!
[Figure, left: predicted probability against bwt_8lb – there is still an assumed S-shape on the probability scale although the curve is not apparent]
[Figure, right: logitp_binary against bwt_8lb – linear relationship in logit space]

Slide 30: What lovely output figures!
[Figure, left: predicted probability against bwt_8lb – still an assumed S-shape on the probability scale although the curve is not apparent]
[Figure, right: logitp_binary against bwt_8lb – linear relationship in logit space, annotated Intercept = -2.86, Slope = 2.35]

Slide 31: LogR binary predictor – summary
• The same maths/assumptions underlie the models with a binary predictor
• Estimation is simpler – it can be done from a crosstab rather than needing ML
• Regression estimates relate to the linear relationship on the logit scale

Slide 32: Multinomial Logistic Regression

Slide 33: Multinomial Logistic Regression
• Typically used for non-ordinal (nominal) outcomes
• Can be used for ordered data (some information is ignored)
• 3+ outcome levels
• Adding another level adds another set of parameters, so more than 4 or 5 levels can be unwieldy

Slide 34: Multinomial Logistic Regression
P(u = j | X) = exp(c_j + α_j X) / Σ_{i=0}^{J} exp(c_i + α_i X),   where c_0 = α_0 = 0
Here the probabilities are obtained by a "divide-by-total" procedure

Slide 35: Examples
• Outcome: head circumference – 4 roughly equal groups (quartiles) – ordering will be ignored

  headcirc4    |  Freq.  | Percent
  -------------+---------+---------
  <= 49cm      |  2,574  |  24.4%
  49.1–50.7cm  |  2,655  |  25.2%
  50.8–51.9cm  |  2,260  |  21.4%
  52+ cm       |  3,060  |  29.0%
  -------------+---------+---------
  Total        | 10,549  | 100.0%

• Exposure 1: birthweight of 8lb or more
• Exposure 2: standardized birthweight

Slide 36: Exposure 1: bwt > 8lb
• 32% of the sample had a birthweight of 8lb+
• Does being 8lb+ at birth increase the chance of being born with a larger head?
• Unlike the logistic model, we are concerned with three probabilities:
  – P(headcirc = 49.1–50.7cm)
  – P(headcirc = 50.8–51.9cm)
  – P(headcirc = 52+cm)
• Each is referenced against the "negative response", i.e. that headcirc <= 49cm

Slide 37: Exposure 1: bwt > 8lb
. mlogit headcirc4 bwt_8lb, baseoutcome(0)
Multinomial logistic regression
--------------------------------------------------------------
   headcirc4 |  Coef.     SE       z     P>|z|     [95% CI]
-------------+------------------------------------------------
1            |
     bwt_8lb |   1.56    .135   11.53   0.000    1.30    1.83
       _cons |   -.07    .029   -2.30   0.022   -0.12   -0.01
-------------+------------------------------------------------
2            |
     bwt_8lb |   3.09    .129   23.98   0.000    2.84    3.34
       _cons |   -.58    .034  -17.33   0.000   -0.65   -0.52
-------------+------------------------------------------------
3            |
     bwt_8lb |   4.39    .127   34.43   0.000    4.14    4.64
       _cons |   -.99    .039  -25.56   0.000   -1.06   -0.92
--------------------------------------------------------------
(headcirc4==0 is the base outcome)
3 sets of results. Each is referenced against the "baseline" group, i.e. <=49cm
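A small R sketch of the "divide-by-total" calculation on slide 34, plugging in the mlogit estimates from slide 37 (illustrative only; the baseline category has c_0 = α_0 = 0):

# Multinomial probabilities by divide-by-total, using the slide-37 estimates
c_j     <- c(0, -0.07, -0.58, -0.99)   # intercepts, category 0 fixed at 0
alpha_j <- c(0,  1.56,  3.09,  4.39)   # slopes for bwt_8lb, category 0 fixed at 0

multinom_probs <- function(x) {
  eta <- c_j + alpha_j * x             # linear predictor for each category
  exp(eta) / sum(exp(eta))             # divide by total
}

multinom_probs(0)   # predicted P(headcirc4 = 0..3) when bwt_8lb = 0
multinom_probs(1)   # predicted P(headcirc4 = 0..3) when bwt_8lb = 1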
Slide 38: Exposure 1: bwt > 8lb
. mlogit headcirc4 bwt_8lb, baseoutcome(0)

Multinomial logistic regression
-----------------------------------
   headcirc4 |  Coef.   (SE)
-------------+---------------------
1            |
     bwt_8lb |   1.56   (.135)
       _cons |   -.07   (.029)
-------------+---------------------
2            |
     bwt_8lb |   3.09   (.129)
       _cons |   -.58   (.034)
-------------+---------------------
3            |
     bwt_8lb |   4.39   (.127)
       _cons |   -.99   (.039)
-----------------------------------
(headcirc4==0 is the base outcome)

Compared with three separate binary logistic regressions:
-------------------------------------
      head_1 |      Coef.   Std. Err.
-------------+-----------------------
     bwt_8lb |    1.56099    .1353772
       _cons |  -.0664822    .0289287
-------------------------------------
      head_2 |      Coef.   Std. Err.
-------------+-----------------------
     bwt_8lb |   3.088329    .1287576
       _cons |  -.5822197    .0335953
-------------------------------------
      head_3 |      Coef.   Std. Err.
-------------+-----------------------
     bwt_8lb |   4.389338     .127473
       _cons |  -.9862376    .0385892
-------------------------------------

Slide 39: Exposure 1: bwt > 8lb
[Same comparison as slide 38]

Slide 40: Exposure 1: bwt > 8lb
• For a categorical exposure, a multinomial logistic model fitted over 4 outcome levels gives the same estimates as 3 logistic models (see the R sketch below), i.e.
  Multinomial(0v1, 0v2, 0v3) ≡ Logit(0v1), Logit(0v2), Logit(0v3)
• In this instance, the single model is merely more convenient and allows the testing of equality constraints

Slide 41: Exposure 2: Continuous bwt
• Using standardized birthweight, we are interested in how the probability of having a larger head, i.e.
  – P(headcirc = 49.1–50.7cm)
  – P(headcirc = 50.8–51.9cm)
  – P(headcirc = 52+cm)
  increases as birthweight increases
• As with the binary logistic models, estimates will reflect
  – a change in log-odds per SD change in birthweight
  – the gradient or slope when on the logit scale

Slide 42: Exposure 2: Continuous bwt
. mlogit headcirc4 bwt_z, baseoutcome(0)
Multinomial logistic regression
--------------------------------------------------------------
   headcirc4 |  Coef.     SE       z     P>|z|     [95% CI]
-------------+------------------------------------------------
1            |
       bwt_z |   2.10    .063   33.11   0.000    1.97    2.22
       _cons |   1.06    .044   23.85   0.000    0.97    1.14
-------------+------------------------------------------------
2            |
       bwt_z |   3.52    .078   44.89   0.000    3.37    3.68
       _cons |   0.78    .046   16.95   0.000    0.69    0.87
-------------+------------------------------------------------
3            |
       bwt_z |   4.88    .086   56.90   0.000    4.72    5.05
       _cons |   0.33    .051    6.51   0.000    0.23    0.43
--------------------------------------------------------------
(headcirc4==0 is the base outcome)

Slide 43: Exposure 2: Continuous bwt
Multinomial logistic regression
-----------------------------------
   headcirc4 |  Coef.   (SE)
-------------+---------------------
1            |
       bwt_z |   2.10   (.063)
       _cons |   1.06   (.044)
-------------+---------------------
2            |
       bwt_z |   3.52   (.078)
       _cons |   0.78   (.046)
-------------+---------------------
3            |
       bwt_z |   4.88   (.086)
       _cons |   0.33   (.051)
-----------------------------------
(headcirc4==0 is the base outcome)

No longer identical to three separate binary logistic regressions:
-------------------------------------
      head_1 |      Coef.   Std. Err.
-------------+-----------------------
       bwt_z |   2.093789    .0650987
       _cons |   1.058445    .0447811
-------------------------------------
      head_2 |      Coef.   Std. Err.
-------------+-----------------------
       bwt_z |   3.355041    .0959539
       _cons |   .6853272    .0464858
-------------------------------------
      head_3 |      Coef.   Std. Err.
-------------+-----------------------
       bwt_z |   3.823597    .1065283
       _cons |   .3129028    .0492469
-------------------------------------
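Slide 40's claim (exact equivalence for a categorical exposure) can be checked directly. A hedged sketch, assuming the data have been loaded into a data frame dat with the variable names used in the Stata output (dat is an assumption, not part of the original materials):

library(VGAM)

# Multinomial logit, baseline = lowest category of headcirc4
fit_mn <- vglm(factor(headcirc4) ~ bwt_8lb,
               family = multinomial(refLevel = 1), data = dat)
coef(fit_mn, matrix = TRUE)

# Separate binary logits, each contrasting one level against the baseline only
fit_1 <- glm(I(headcirc4 == 1) ~ bwt_8lb, family = binomial,
             data = subset(dat, headcirc4 %in% c(0, 1)))
fit_2 <- glm(I(headcirc4 == 2) ~ bwt_8lb, family = binomial,
             data = subset(dat, headcirc4 %in% c(0, 2)))
fit_3 <- glm(I(headcirc4 == 3) ~ bwt_8lb, family = binomial,
             data = subset(dat, headcirc4 %in% c(0, 3)))
rbind(coef(fit_1), coef(fit_2), coef(fit_3))   # matches the multinomial estimates
                                               # for a categorical exposure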
Slide 44: Exposure 2: Continuous bwt
[Figure: fitted logits (logitp2, logitp3, logitp4) against bwt_z for the three contrasts]
Outcome level 2 = [49.1–50.7]: Intercept = 1.06, Slope = 2.10 (shallowest)
Outcome level 3 = [50.8–51.9]: Intercept = 0.78, Slope = 3.52
Outcome level 4 = [52.0+]:     Intercept = 0.33, Slope = 4.88 (steepest)
Risk of being in outcome level 4 increases most sharply as bwt increases

Slide 45: Can plot probabilities for all 4 levels
[Figure: four panels, Pr(headcirc4==0) to Pr(headcirc4==3) against bwt_z]

Slide 46: Or altogether on one graph…
[Figure: Pr(headcirc4==0), Pr(headcirc4==1), Pr(headcirc4==2), Pr(headcirc4==3) against bwt_z on a single plot]

Slide 47: Or altogether on one graph…
[Same figure as slide 46]

Slide 48: Ordinal Logistic Models

Slide 49: Ordinal Logistic Models
• When applicable, it is useful to favour ordinal models over multinomial models
• If outcome levels are increasing, e.g. in severity of a condition or agreement with a statement, we expect the model parameters to behave in a certain way
• The typical approach is to fit ordinal models with constraints, resulting in greater parsimony (fewer parameters)

Slide 50: Contrasting among response categories – some alternative models
[Diagram: the three contrasts C1–C3 applied to outcome levels 1–4 under each model]
For a 4-level outcome there are three comparisons to be made:
• Model 1 – that used in the multinomial logistic model (each of levels 2, 3 and 4 contrasted against level 1)
• Model 2 – used with the proportional-odds ordinal model (cumulative splits: level 1 vs 2–4, levels 1–2 vs 3–4, levels 1–3 vs 4)
• Model 3 – adjacent category model (level 1 vs 2, level 2 vs 3, level 3 vs 4)

Slide 51: Proportional odds model
• For an ordinal outcome, all three models shown on the previous slide could be fitted
• Which one is chosen will depend on the type of data being analysed, and the assumptions you wish to make
• POM is very popular in the literature (thanks to SPSS)
• Consists of Model 2 applied with a parameter constraint

Slide 52: Proportional odds model
P(u > j | X) = exp(c_j + αX) / [1 + exp(c_j + αX)]
Alternative but equivalent parameterizations are sometimes used:
logit[P(u > j | X)] = ln{ P(u > j | X) / P(u ≤ j | X) } = c_j + αX
• The α does not have a j subscript, hence the proportional odds assumption, AKA equal slopes

Slide 53: Proportional odds model
• As we are modelling cumulative probabilities, the probability of a specific response category occurring must be obtained by subtraction:
P(u = j | X) = P(u > j−1 | X) − P(u > j | X)

Slide 54: E.g. for a 3-category ordinal outcome 0, 1, 2
P(u = 1 | X) = exp(c0 + αX) / [1 + exp(c0 + αX)] − exp(c1 + αX) / [1 + exp(c1 + αX)]
P(u = 2 | X) = exp(c1 + αX) / [1 + exp(c1 + αX)]        (since P(u > 2 | X) = 0)
(and P(u = 0 | X) = 1 − exp(c0 + αX) / [1 + exp(c0 + αX)])
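A short R sketch of this subtraction step for the 4-level headcirc4 outcome, using the constrained (proportional-odds) estimates that appear later on slide 57 (cut-point intercepts 1.78, -0.17, -1.70 and common slope 2.53); purely illustrative:

# Category probabilities from cumulative logits, using the slide-57 estimates
cuts  <- c(1.78, -0.17, -1.70)   # intercepts for P(u > 0), P(u > 1), P(u > 2)
slope <- 2.53                    # common (proportional-odds) slope for bwt_z

pom_probs <- function(x) {
  p_gt <- plogis(cuts + slope * x)   # P(u > 0), P(u > 1), P(u > 2)
  cum  <- c(1, p_gt, 0)              # add P(u > -1) = 1 and P(u > 3) = 0
  -diff(cum)                         # P(u = 0), ..., P(u = 3) by subtraction
}

pom_probs(0)                                  # probabilities at average birthweight
rowSums(t(sapply(c(-2, 0, 2), pom_probs)))    # each set of probabilities sums to 1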
Slide 55: Exposure: continuous birthweight
. gologit2 headcirc4 bwt_z, npl
Generalized Ordered Logit Estimates – unconstrained model
-----------------------------------
   headcirc4 |  Coef.   (SE)
-------------+---------------------
0            |
       bwt_z |   2.79   (.059)
       _cons |   1.90   (.040)
-------------+---------------------
1            |
       bwt_z |   2.61   (.049)
       _cons |  -0.13   (.026)
-------------+---------------------
2            |
       bwt_z |   2.23   (.048)
       _cons |  -1.55   (.034)
-----------------------------------
[Figure: the three fitted logits (logit_npl1, logit_npl2, logit_npl3) against bwt_z]
WARNING! 8 in-sample cases have an outcome with a predicted probability that is less than 0. See the gologit2 help section on Warning Messages for more information.

Slide 56:
. gologit2 headcirc4 bwt_z if kz030>1.25, npl
Generalized Ordered Logit Estimates
Log likelihood = -10396.564        Number of obs =    10422
                                   LR chi2(3)    =  7981.91
                                   Prob > chi2   =   0.0000
                                   Pseudo R2     =   0.2774
------------------------------------------------------------------------------
   headcirc4 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
0            |
       bwt_z |   2.785725    .0585952   47.54   0.000      2.67088    2.900569
       _cons |   1.904522    .0395861   48.11   0.000     1.826935    1.982109
-------------+----------------------------------------------------------------
1            |
       bwt_z |    2.60551    .0494048   52.74   0.000     2.508678    2.702341
       _cons |  -.1269783    .0260562   -4.87   0.000    -.1780475   -.0759092
-------------+----------------------------------------------------------------
2            |
       bwt_z |   2.232922    .0481026   46.42   0.000     2.138642    2.327201
       _cons |  -1.550289    .0336295  -46.10   0.000    -1.616202   -1.484377
------------------------------------------------------------------------------

Slide 57: Exposure: continuous birthweight
. gologit2 headcirc4 bwt_z, pl
Generalized Ordered Logit Estimates – constrained (proportional odds) model
-----------------------------------
   headcirc4 |  Coef.   (SE)
-------------+---------------------
0            |
       bwt_z |   2.53   (.036)
       _cons |   1.78   (.031)
-------------+---------------------
1            |
       bwt_z |   2.53   (.036)
       _cons |  -0.17   (.025)
-------------+---------------------
2            |
       bwt_z |   2.53   (.036)
       _cons |  -1.70   (.030)
-----------------------------------
[Figure: the three fitted logits (logit_pl1, logit_pl2, logit_pl3) against bwt_z – parallel lines]

Slide 58: Exposure: continuous birthweight
• The equality constraint has brought us 2 d.f.
• With more covariates (or more outcome levels) the savings could be considerable
• Model test:
. gologit2 headcirc4 bwt_z, npl store(unconstrained)
. gologit2 headcirc4 bwt_z, pl store(constrained)
. lrtest constrained unconstrained
Likelihood-ratio test                                 LR chi2(2)  =   69.78
(Assumption: constrained nested in unconstrained)     Prob > chi2 =  0.0000

Slide 59: Exposure: continuous birthweight
[Figure, left: predicted probabilities Pr(headcirc4==0..3) against bwt_z – unconstrained model]
[Figure, right: predicted probabilities Pr(headcirc4==0..3) against bwt_z – constrained, proportional odds (equal slopes) model]

Slide 60: Introduce a new model
Proportional odds model:
P(u > j | X) = exp(c_j + αX) / [1 + exp(c_j + αX)]
Adjacent category model:
P(u = j | X) / P(u = j+1 | X) = exp(c_j + α_j X)

Slide 61: Adjacent-category model
• The adjacent category model is a nominal response model
• Each category in turn is compared to its nearest neighbour
• Within each comparison, nothing is said about the other categories – unless model constraints are imposed
• Potential for testing the ordinal nature of the item – as we will see later

Slide 62: 3-category adjacent-category model
Consider a categorical outcome with 3 levels: 0, 1, 2
P(u = 0 | X) = exp(c0 + α0 X) · P(u = 1 | X)
P(u = 1 | X) = exp(c1 + α1 X) · P(u = 2 | X)
P(u = 0 | X) + P(u = 1 | X) + P(u = 2 | X) = 1

Slide 63: We can rearrange this to give (us a headache)
P(u = 0 | X) = exp[(c0 + α0 X) + (c1 + α1 X)] / D
P(u = 1 | X) = exp(c1 + α1 X) / D
P(u = 2 | X) = 1 / D
where D = 1 + exp(c1 + α1 X) + exp[(c0 + α0 X) + (c1 + α1 X)]

Slide 64: And? Notice…
• The 3 denominators are the same
• Each response category probability can be written DIRECTLY as a ratio of sums of exponents
• Also known as a "divide by total" method
• This is contrary to the POM, which was an INDIRECT or "difference" method, since probabilities were obtained through subtraction
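A quick numerical check of the rearrangement above (a sketch with arbitrary illustrative parameter values, not estimates from the data): the three divide-by-total expressions sum to 1 and reproduce the two adjacent-category ratios.

# Adjacent-category model, 3 categories (0, 1, 2): divide-by-total check
c0 <- 0.5; a0 <- -0.8     # arbitrary illustrative values
c1 <- 0.2; a1 <- -0.4
x  <- 1.3

A0 <- c0 + a0 * x         # linear predictor for P(u=0)/P(u=1)
A1 <- c1 + a1 * x         # linear predictor for P(u=1)/P(u=2)

D <- 1 + exp(A1) + exp(A0 + A1)
p <- c(exp(A0 + A1), exp(A1), 1) / D   # P(u=0), P(u=1), P(u=2)

sum(p)                     # 1, as required
p[1] / p[2]; exp(A0)       # equal: the first adjacent-category ratio
p[2] / p[3]; exp(A1)       # equal: the second adjacent-category ratio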
Slide 65: Summary so far
• The aim of logistic regression is to model the non-linear probabilistic relationship between explanatory variables and a categorical outcome
• The logit form of the regression is assumed linear in its regression terms
• Can be classified as a generalized linear model

Slide 66: Summary so far
• For polytomous outcomes, J outcome levels lead to J-1 regression analyses
• Appropriate (ordinal) data gives the potential for parsimony through constraints such as proportional odds

Slide 67: Worked example
• Outcome: 4-level categorical head circumference
• Exposure: standardized birthweight
• Aim: fit the equivalent of Stata's -mlogit- and -gologit2- functions in R and compare with the adjacent category model

Slide 68: Ingredients
• one dataset "data_for_R2.dta"
• one R script "bwt headcirc for R.R"
• one copy of R (latest version 2.9.2)
• R packages:
  – foreign (to load the dataset)
  – VGAM (to run the models)

Slide 69: [R] Multinomial / adjacent category models

##########################################
# multinomial logit model - cts covariate
##########################################
fit_multinom = vglm(headc4 ~ bwtz, multinomial(refLevel=1))
constraints(fit_multinom)
coef(fit_multinom, matrix=TRUE)
summary(fit_multinom)

##################################################
# adjacent categories logit model - cts covariate
##################################################
fit_adjcat = vglm(headc4 ~ bwtz, acat)
constraints(fit_adjcat)
coef(fit_adjcat, matrix=TRUE)
summary(fit_adjcat)

Slide 70: [R] Cumulative logistic model

##########################################
# cumulative model - cts covariate
##########################################
# constrained (parallel) model
###############################
fit_pom = vglm(headc4 ~ bwtz, cumulative(parallel=TRUE, reverse=TRUE))
constraints(fit_pom)
coef(fit_pom, matrix=TRUE)

# unconstrained (non-parallel) model
#####################################
fit_nonpom = vglm(headc4 ~ bwtz, cumulative(parallel=FALSE, reverse=TRUE))
constraints(fit_nonpom)
coef(fit_nonpom, matrix=TRUE)

# Check the proportional odds assumption with a LRT
####################################################
1 - pchisq(2*(logLik(fit_nonpom)-logLik(fit_pom)),
           df=length(coef(fit_nonpom))-length(coef(fit_pom)))
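The fitted category probabilities used in the plot slides further on can be extracted directly from the vglm fits. A minimal sketch, assuming the fits above have been run and that headc4 and bwtz are available in the workspace as in the script:

# Predicted probabilities for each response category from a VGAM fit
probs <- fitted(fit_pom)            # one column per category of headc4
head(probs)

# Plot all four probability curves against the covariate
ord <- order(bwtz)
matplot(bwtz[ord], probs[ord, ], type = "l", lty = 1,
        xlab = "bwtz", ylab = "Predicted probability")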
Slide 71: Non-POM in VGAM
• The VGAM procedure struggles with unequal slopes when a continuous predictor is strongly related to the outcome
• This occurred in this example and also in the exercises
• We are using R as a useful teaching environment, but it is free, and not everything is 100% reliable

Slide 72: summary(fit_multinom)
Call: vglm(formula = headc4 ~ bwtz, family = multinomial(refLevel = 1))
Coefficients:
                  Value   Std. Error   t value
(Intercept):1   1.05708     0.044320   23.8514
(Intercept):2   0.78024     0.046040   16.9471
(Intercept):3   0.33206     0.050998    6.5113
bwtz:1          2.09505     0.063267   33.1146
bwtz:2          3.52171     0.078459   44.8858
bwtz:3          4.88490     0.085853   56.8987
Number of linear predictors: 3
Names of linear predictors: log(mu[,2]/mu[,1]), log(mu[,3]/mu[,1]), log(mu[,4]/mu[,1])
Dispersion Parameter for multinomial family: 1
Residual Deviance: 20946.78 on 31290 degrees of freedom
Log-likelihood: -10473.39 on 31290 degrees of freedom
Number of Iterations: 6

Slide 73: summary(fit_adjcat)
Call: vglm(formula = headc4 ~ bwtz, family = acat)
Coefficients:
                  Value   Std. Error   t value
(Intercept):1   1.05708     0.044320   23.8514
(Intercept):2  -0.27685     0.031349   -8.8312
(Intercept):3  -0.44817     0.040190  -11.1512
bwtz:1          2.09505     0.063267   33.1146
bwtz:2          1.42666     0.057806   24.6800
bwtz:3          1.36319     0.052771   25.8323
Number of linear predictors: 3
Names of linear predictors: log(P[Y=2]/P[Y=1]), log(P[Y=3]/P[Y=2]), log(P[Y=4]/P[Y=3])
Dispersion Parameter for acat family: 1
Residual Deviance: 20946.78 on 31290 degrees of freedom
Log-likelihood: -10473.39 on 31290 degrees of freedom
Number of Iterations: 6

Slide 74: summary(fit_pom)
Call: vglm(formula = headc4 ~ bwtz, family = cumulative(parallel = TRUE, reverse = TRUE))
Pearson Residuals:
                     Min         1Q     Median       3Q      Max
logit(P[Y>=2])  -20.3575   0.013626   0.138778  0.46234   8.1378
logit(P[Y>=3])   -9.7316  -0.461336   0.030467  0.44740  12.7044
logit(P[Y>=4])   -6.0701  -0.374240  -0.134896  0.30254  14.7435
Coefficients:
                  Value   Std. Error   t value
(Intercept):1   1.78467     0.031368   56.8950
(Intercept):2  -0.17112     0.025049   -6.8316
(Intercept):3  -1.70386     0.030295  -56.2419
bwtz            2.52516     0.036331   69.5052
Number of linear predictors: 3
Names of linear predictors: logit(P[Y>=2]), logit(P[Y>=3]), logit(P[Y>=4])
Dispersion Parameter for cumulative family: 1
Residual Deviance: 20862.91 on 31292 degrees of freedom
Log-likelihood: -10431.46 on 31292 degrees of freedom
Number of Iterations: 6

Slide 75: summary(fit_nonpom)
Call: vglm(formula = headc4 ~ bwtz, family = cumulative(parallel = FALSE, reverse = TRUE), maxit = 100)
Pearson Residuals:
                     Min         1Q     Median       3Q     Max
logit(P[Y>=2])  -26.8061   0.010352   0.127186  0.39716  10.639
logit(P[Y>=3])  -10.2091  -0.488329   0.031261  0.44992  15.139
logit(P[Y>=4])   -4.8869  -0.375265  -0.151709  0.36328  10.577
Coefficients:
                  Value   Std. Error    t value
(Intercept):1   1.90529    0.0392664    48.5222
(Intercept):2  -0.12164    0.0238242    -5.1056
(Intercept):3  -1.54436    0.0249593   -61.8751
bwtz:1          2.77653    0.0566026    49.0531
bwtz:2          2.57170    0.0085618   300.3694
bwtz:3          2.21528    0.0048085   460.7037
Number of linear predictors: 3
Names of linear predictors: logit(P[Y>=2]), logit(P[Y>=3]), logit(P[Y>=4])
Dispersion Parameter for cumulative family: 1
Residual Deviance: 20793.62 on 31290 degrees of freedom
Log-likelihood: NA on 31290 degrees of freedom
Number of Iterations: 67

Slide 76: Summary of estimates

Contrast    | Multinom     | Adjcat       | POM          | Non-POM
------------+--------------+--------------+--------------+-------------
1 v 2       | 2.10 (0.06)  | 2.10 (0.06)  |              |
1 v 3       | 3.52 (0.08)  |              |              |
1 v 4       | 4.88 (0.09)  |              |              |
1 v 2-4     |              |              | 2.53 (0.04)  | 2.78 (0.06)
1-2 v 3-4   |              |              | 2.53 (0.04)  | 2.57 (0.009)
1-3 v 4     |              |              | 2.53 (0.04)  | 2.22 (0.005)
2 v 3       |              | 1.43 (0.06)  |              |
3 v 4       |              | 1.36 (0.05)  |              |

Slide 77: Plots for multinomial model
[Figure]

Slide 78: Plots for adjacent category model
[Figure]
LOGITS have a different interpretation compared with the last model

Slide 79: Plots for cumulative models: POM
[Figure]

Slide 80: OCC's for POM
[Figure]

Slide 81: Plots for cumulative models: non-POM
[Figure]

Slide 82: OCC's for non-POM
[Figure]
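One way to line the fits up side by side before deciding among them (a sketch, assuming the fits from slides 69-70 are still in memory; note that VGAM reported the non-parallel model's log-likelihood as NA in this run, so its residual deviance from summary() is the more usable quantity):

# Compare fits informally (the models are not all nested in one another)
logLik(fit_multinom)   # -10473.39
logLik(fit_adjcat)     # identical: the full-slope acat model is a
                       # re-parameterization of the multinomial model
logLik(fit_pom)        # -10431.46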
Slide 83: Some decision about which is best?
• We have seen an exhausting but non-exhaustive range of ordinal models
• The decision regarding which is "best" is not really a statistical one
• One should always consider a variety of models to assess whether one's conclusions are affected by a (perhaps arbitrary) model choice
• Here practical differences are likely to be minor, at least away from the distribution tails

Slide 84: Now for some exercises

Slide 85: Exercise 1 – binary predictor
Categorical outcome and binary predictor
• Using a simple crosstab, estimate log-odds estimates for:
  – Multinomial logistic
  – Unequal slopes cumulative logistic
  – Adjacent category logistic
• Verify the results by fitting the equivalent models in R using -VGAM-

Slide 86: Exercise 2 – continuous predictor
Categorical outcome and continuous predictor
• Using the -VGAM- function, estimate the following models:
  – Multinomial logistic
  – Unequal slopes cumulative logistic
  – Equal slopes cumulative logistic (POM)
  – Adjacent category logistic
• Examine the predicted probabilities/logits graphically for each model

Slide 87: Data for the exercise – bmi11to14data.dta
• Binary predictor (bmi7_med) – BMI ≥ median (15.6) at age 7
• Continuous predictor (bmi7/bmi7z) – cts BMI at age 7
• 4-level categorical outcome (bmi14_g4) – BMI at age 14
  – Categorised: min/20 = 0, 20/22 = 1, 22/24 = 2, 24/max = 3

Slide 88: Exercise 1
• Crosstab of bmi7_med and bmi14_g4

                        bmi14_g4
bmi7_med |      0        1        2        3  |  Total
---------+-----------------------------------+--------
       0 |  2,079      313       71       37  |  2,500
         |  83.2%    12.5%     2.8%     1.5%  |
---------+-----------------------------------+--------
       1 |    707      690      464      639  |  2,500
         |  28.3%    27.6%    18.6%    25.6%  |
---------+-----------------------------------+--------
   Total |  2,786    1,003      535      676  |  5,000

Slide 89: Optional "homework"
• Refit these models in a more familiar stats package, e.g.
  – SPSS: nomreg / PLUM (http://www.norusis.com/pdf/ASPC_v13.pdf)
  – Stata: mlogit / gologit2 – http://www.ats.ucla.edu/stat/Stata/dae/ologit.htm
  – SAS: proc logistic – http://www.ats.ucla.edu/stat/sas/dae/ologit.htm
• ACAT not possible in SPSS, nor Stata (except possibly in GLLAMM)

Slide 90: http://cran.r-project.org/

Slides 91–92: [blank]
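A possible starting point for Exercise 1 (a sketch only): the three sets of log-odds can be computed directly from the crosstab counts on slide 88 before being checked against the VGAM fits. The exact signs and reference categories depend on how each model is parameterized, so treat these as one plausible set of contrasts.

# Counts from the bmi7_med x bmi14_g4 crosstab (slide 88)
n0 <- c(2079, 313,  71,  37)   # bmi7_med = 0, outcome categories 0-3
n1 <- c( 707, 690, 464, 639)   # bmi7_med = 1, outcome categories 0-3

# Multinomial log-odds: each category j vs baseline category 0
log((n1[2:4] / n1[1]) / (n0[2:4] / n0[1]))

# Cumulative (unequal slopes) log-odds: split above each cut-point
cuts <- 1:3
sapply(cuts, function(k)
  log((sum(n1[(k + 1):4]) / sum(n1[1:k])) /
      (sum(n0[(k + 1):4]) / sum(n0[1:k]))))

# Adjacent-category log-odds: each category vs its neighbour
log((n1[2:4] / n1[1:3]) / (n0[2:4] / n0[1:3]))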