Transcript Logistic and Probit Regression
Logistic Regression Chongming Yang Research Support Center FHSS College
Rules of Logarithm
$\log(uv) = \log(u) + \log(v)$
$\log(u/v) = \log(u) - \log(v)$
$\log(u^v) = v \log(u)$
Exponential & Logarithmic Functions: Inverses of One Another $y = a^x \iff x = \log_a(y)$
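As a quick numerical check of the logarithm rules and the inverse relationship, here is a minimal sketch. The values of u, v, a, and x are arbitrary choices for illustration and do not come from the slides.

```python
import math

u, v, a = 8.0, 2.0, 10.0  # arbitrary positive example values

# Product, quotient, and power rules
product_ok = math.isclose(math.log(u * v), math.log(u) + math.log(v))
quotient_ok = math.isclose(math.log(u / v), math.log(u) - math.log(v))
power_ok = math.isclose(math.log(u ** v), v * math.log(u))

# Exponential and logarithm undo each other: y = a**x  <=>  x = log_a(y)
x = 3.0
y = a ** x
inverse_ok = math.isclose(math.log(y, a), x)

print(product_ok, quotient_ok, power_ok, inverse_ok)
```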
Assumptions of Linear Regression $Y_i = \alpha + \beta X_i + \varepsilon_i$
$Y_i$ is continuous & unbounded
Expected value or mean of the errors: $E(\varepsilon_i) = 0$
$\varepsilon_i$ is normally distributed
$\varepsilon_i$ is not correlated with predictors
Absence of perfect multicollinearity
No measurement error in any variable
Violation of LR Assumptions Dichotomous Dependent Variable (DV) Unordered Categorical (Nominal) DV Ordered Categorical (Ordinal) DV
Natural Logarithmic Transformation (Binary DV) Let p = probability of an event
Logit Model
Rearranged Logit Model
Logistic Model
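The equations for the three slides above did not survive in this transcript. The block below is a sketch of the standard forms, writing the intercept as $a$ and the slope as $B$ (symbol choices assumed here, not confirmed by the slides), with $p$ the probability of the event:

```latex
\begin{align*}
\text{Logit model:} \quad & \ln\!\left(\frac{p}{1-p}\right) = a + BX \\
\text{Rearranged logit model:} \quad & \frac{p}{1-p} = e^{a + BX} \\
\text{Logistic model:} \quad & p = \frac{e^{a + BX}}{1 + e^{a + BX}} = \frac{1}{1 + e^{-(a + BX)}}
\end{align*}
```

Each line follows from the previous one: exponentiating the logit gives the odds, and solving the odds for $p$ gives the logistic form.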
Odds Ratio $OR = \frac{p(1) / [1 - p(1)]}{p(0) / [1 - p(0)]} = e^{B}$
Interpretation of Coefficients (odds ratio) Dichotomous predictor X1: The predicted odds of a positive response for group A are ? times the odds for group B.
The odds of a positive response for group A are ?% higher than the odds for group B.
Continuous predictor X2: A one-unit increase in X2 is associated with a ?% increase in the predicted odds of a positive response.
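To make the "? times" and "?%" blanks concrete, here is a minimal sketch of the arithmetic. The probabilities and the coefficient are hypothetical values chosen for illustration, not results from the course data.

```python
import math

def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)

# Hypothetical predicted probabilities for X1 = 1 (group A) and X1 = 0 (group B)
p1, p0 = 0.60, 0.40
odds_ratio = odds(p1) / odds(p0)   # OR = [p(1)/(1-p(1))] / [p(0)/(1-p(0))]
B = math.log(odds_ratio)           # the coefficient that yields this OR

# Percent change in odds per one-unit increase, for a hypothetical coefficient
B2 = 0.25
pct_change = (math.exp(B2) - 1.0) * 100.0

print(round(odds_ratio, 2), round(B, 3), round(pct_change, 1))
```

The odds ratio fills the "? times" statement directly, and (OR - 1) x 100 fills the "?% higher" statement.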
Interpretation See Handout
Interpretation of Interaction Definition: The effect of a covariate depends on the level of another covariate.
Interpretation: Plug in some values of the two variables. Plot the estimated logit. Interpret the interaction effect only when the main effects are present.
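The plug-in approach can be sketched as follows. All four coefficients are hypothetical, and the model is assumed to contain two covariates plus their product term.

```python
import math

# Hypothetical coefficients: intercept, two main effects, and their interaction
a, b1, b2, b12 = -1.0, 0.8, 0.5, -0.6

def logit(x1, x2):
    """Estimated logit for given covariate values."""
    return a + b1 * x1 + b2 * x2 + b12 * x1 * x2

def prob(x1, x2):
    """Back-transform the logit to a probability."""
    return 1.0 / (1.0 + math.exp(-logit(x1, x2)))

# Effect of x1 (going from 0 to 1) at several levels of x2
for x2 in (0, 1, 2):
    effect = logit(1, x2) - logit(0, x2)   # equals b1 + b12 * x2
    print(x2, round(effect, 2), round(prob(0, x2), 3), round(prob(1, x2), 3))
```

Because the effect of x1 on the logit is b1 + b12 * x2, it changes with each value of x2, which is what the definition of interaction describes.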
Likelihood at value of X (left side of equation) $L = \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i}$
Log Likelihood (left side of equation)
Log Logit Model (right side of equation)
Maximum Likelihood Estimation
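A minimal sketch of maximum likelihood estimation for a logistic model with one predictor, using Newton-Raphson iterations. The data set is made up for illustration, and the names `log_likelihood` and `fit` are hypothetical helpers; statistical packages use more careful versions of the same idea.

```python
import math

# Small made-up data set (not from the slides): one predictor, binary outcome
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 0, 1, 0, 1, 0, 1, 1]

def log_likelihood(a, b):
    """Sum of y*log(p) + (1-y)*log(1-p) with p = 1/(1+exp(-(a+b*x)))."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

def fit(iterations=25):
    """Newton-Raphson: repeatedly solve (negated Hessian) * step = gradient."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # gradient with respect to the intercept
            g1 += (yi - p) * xi     # gradient with respect to the slope
            h00 += w                # entries of the negated Hessian
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

a_hat, b_hat = fit()
print(a_hat, b_hat, log_likelihood(a_hat, b_hat))
```

The estimates are the values of the intercept and slope at which the log likelihood stops increasing, so moving either one slightly away from the solution lowers it.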
Likelihood Ratio Test of $\beta_0, \beta_1, \ldots$
Likelihood Ratio Test = Deviance = $-2\log(\text{likelihood of fitted model} / \text{likelihood of saturated model})$
Likelihood of saturated model = 1
Deviance = $-2\log(\text{likelihood of fitted model})$
$\chi^2$ Test of $\beta_0, \beta_1, \ldots$
1. $\chi^2 = -2\ln[(\text{likelihood of model without } x) / (\text{likelihood of model with } x)]$
2. Degrees of freedom = $j - (p+1)$, where $j$ = (# of categories) + (# of continuous variables) and $p$ = # of parameters
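The chi-square computation can be sketched as below. The two log-likelihoods are hypothetical, and the example assumes the models differ by a single predictor so that the statistic is compared against one degree of freedom, a simplification made only for this illustration.

```python
import math

def likelihood_ratio_chi2(loglik_without_x, loglik_with_x):
    """chi2 = -2 ln(L_without / L_with) = -2 (lnL_without - lnL_with)."""
    return -2.0 * (loglik_without_x - loglik_with_x)

def p_value_df1(chi2):
    """Upper-tail chi-square probability, valid for 1 degree of freedom only."""
    return math.erfc(math.sqrt(chi2 / 2.0))

# Hypothetical log-likelihoods for two nested models differing by one predictor
chi2 = likelihood_ratio_chi2(-120.5, -115.2)
print(round(chi2, 2), p_value_df1(chi2))
```

Working with log-likelihoods avoids forming the ratio of two very small likelihoods, since the log of a ratio is the difference of the logs.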
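Hosmer-Lemeshow Test ($\chi^2$)
(grouping by percentiles of estimated p)
$\chi^2 = \sum_{k=1}^{g} \frac{(o_k - n'_k \bar{p}_k)^2}{n'_k \bar{p}_k (1 - \bar{p}_k)}$
Where $o_k = \sum_{j=1}^{c_k} y_j$ and $\bar{p}_k = \sum_{j=1}^{c_k} \frac{m_j \hat{p}_j}{n'_k}$
$g$ = 10, $k$ = 1..10, $n'_k$ = number of subjects in the $k$th group, $\bar{p}_k$ = average estimated probability, $c_k$ = # of covariate patterns, $m_j$ = number of subjects with covariate pattern $j$, df = $g - 2$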
         Group 1 (10% prob.)    Group 2 (20% prob.)    ...    Group 10 (100% prob.)
         Estimated  Observed    Estimated  Observed    ...    Estimated  Observed
y = 1    N1         N2          …          …           ...    …          …
y = 0    N3         N4          …          …           ...    …          …
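The grouping and the statistic can be sketched as follows. The function name `hosmer_lemeshow` is hypothetical, subjects are split into g equal-sized groups by rank of estimated probability (one common way to form percentile groups), and the data are a made-up, perfectly calibrated toy case.

```python
def hosmer_lemeshow(y, p_hat, g=10):
    """Return the Hosmer-Lemeshow statistic and its degrees of freedom (g - 2)."""
    pairs = sorted(zip(p_hat, y), key=lambda t: t[0])  # order by estimated p
    n = len(pairs)
    stat = 0.0
    for k in range(g):
        group = pairs[k * n // g:(k + 1) * n // g]     # k-th percentile group
        n_k = len(group)
        o_k = sum(yi for _, yi in group)               # observed events
        p_bar = sum(pi for pi, _ in group) / n_k       # average estimated p
        stat += (o_k - n_k * p_bar) ** 2 / (n_k * p_bar * (1.0 - p_bar))
    return stat, g - 2

# Made-up example: every subject has estimated p = 0.5 and half have y = 1
p_hat = [0.5] * 40
y = [0, 1] * 20
stat, df = hosmer_lemeshow(y, p_hat)
print(stat, df)
```

A small statistic relative to the chi-square distribution with g - 2 degrees of freedom means observed and estimated counts agree within groups. The sketch assumes no group has an average estimated probability of exactly 0 or 1, which would make the denominator zero.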
Wald Test of $\beta_0, \beta_1, \ldots$ $W = \hat{\beta} / se(\hat{\beta})$ (se = standard error) Normal Distribution test
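A minimal sketch of the Wald statistic and its two-sided p-value from the standard normal distribution. The estimate and standard error are hypothetical, and `wald_test` is an illustrative helper name.

```python
import math

def wald_test(b, se):
    """Return W = b / se and its two-sided p-value under the standard normal."""
    w = b / se
    p = math.erfc(abs(w) / math.sqrt(2.0))
    return w, p

w, p = wald_test(0.80, 0.32)   # hypothetical estimate and standard error
print(round(w, 2), round(p, 4))
```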
Multinomial Logistic Regression (non-ordered categorical DV)
P = probability of a response category
$P_{i1} + P_{i2} + P_{i3} = 1$
$\log(p_{i1} / p_{i3}) = B_1 X$
$\log(p_{i2} / p_{i3}) = B_2 X$
$\log(p_{i1} / p_{i2}) = B_3 X$
Multinomial Logistic Regression $p(y_i = K) = \frac{1}{1 + \sum_{k=1}^{K-1} e^{B_k x}}$
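A minimal sketch of how category probabilities follow from log-odds taken against a reference category. It assumes three categories with the last one as the reference, and the two linear predictor values are hypothetical.

```python
import math

def multinomial_probs(linear_predictors):
    """Probabilities for K categories given B_k X for the K-1 non-reference ones."""
    exps = [math.exp(v) for v in linear_predictors]
    denom = 1.0 + sum(exps)
    # Non-reference categories first, reference category last
    return [e / denom for e in exps] + [1.0 / denom]

probs = multinomial_probs([0.4, -0.3])   # hypothetical B_1 X and B_2 X, K = 3
print([round(p, 3) for p in probs], sum(probs))
```

The probabilities sum to 1, and the log-odds of category 1 versus category 2 equal the difference of the two linear predictors, so only two of the three pairwise comparisons carry independent information.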
Interpretation See handout
Ordinal Logistic Models Adjacent Category Model Compare two adjacent categories
Adjacent Categories Model Let j be an ordinal scale, j = 1, …, J; j & j+1 = two adjacent categories
Model $\log\left(\frac{p_{i,j+1}}{p_{ij}}\right) = a_j + B_j x_i$
Practice Run Logistic Regression Using ‘binary.sav’ DV = Admit IV = gre, gpa, rank Annotated output: http://www.ats.ucla.edu/stat/spss/dae/logit.htm
Pseudo R-squared (based on Likelihood) Explained Variability Improvement from null model to fitted model Square of correlation (predicted and observed)
Pseudo R-Square
Cox & Snell: Improvement of full model over intercept model
Nagelkerke: Improvement of full model over intercept model
McFadden (adjusted): analogous to adjusted R-squared in OLS, penalizing a model with too many predictors
http://www.ats.ucla.edu/stat/mult_pkg/faq/general/Psuedo_RSquareds.htm
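These measures can be computed from the log-likelihoods of the intercept-only and fitted models. The sketch below uses the commonly cited formulas. The log-likelihoods, sample size, and penalty count k are hypothetical, and `pseudo_r2` is an illustrative helper name.

```python
import math

def pseudo_r2(ll_null, ll_full, n, k=0):
    """Pseudo R-squared measures from two log-likelihoods.

    ll_null: log-likelihood of the intercept-only model
    ll_full: log-likelihood of the fitted model
    n: sample size; k: penalty count for the adjusted McFadden version
    """
    mcfadden = 1.0 - ll_full / ll_null
    mcfadden_adj = 1.0 - (ll_full - k) / ll_null
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_full) / n)
    # Nagelkerke rescales Cox & Snell by its maximum attainable value
    nagelkerke = cox_snell / (1.0 - math.exp(2.0 * ll_null / n))
    return {
        "mcfadden": mcfadden,
        "mcfadden_adj": mcfadden_adj,
        "cox_snell": cox_snell,
        "nagelkerke": nagelkerke,
    }

r2 = pseudo_r2(ll_null=-100.0, ll_full=-80.0, n=200, k=3)
print({name: round(value, 3) for name, value in r2.items()})
```

The measures generally differ from one another for the same model, so it helps to report which one is used.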
Practice (continued) Run Multinomial Logistic Regression Using ‘mlogit.sav’ DV= Brand IV = female, age Annotated output: http://www.ats.ucla.edu/stat/spss/dae/mlogit.htm
Practice (continued) Run Ordinal Logistic Regression Using ologit.sav
DV= admit IV = gre, gpa, topnotch Annotated output: http://www.ats.ucla.edu/stat/SPSS/dae/ologit.htm
Practical Issues 1. Low Ratio of Cases to Variables Problem: Extremely large parameter estimates and standard errors Solution: Collapse categories Delete the offending category Delete discrete predictors
Practical Issues 2. Inadequacy of Expected Frequencies & Power Problem: Lower power with small cell frequencies Solution: Accept low power Collapse categories or delete discrete predictors Evaluate model fit with $\chi^2$
Practical Issues 3. Presence of multicollinearity Problem: Large standard errors, or estimates Solution: Run multiway frequency tables to identify categorical variables Run correlations to identify continuous variables Delete theoretically less important predictors or combine with other procedures
Practical Issues Rare events may be better suited to Poisson regression or negative binomial regression.
References
1. Allison, P. D. Logistic regression using the SAS system. Cary, NC: SAS Institute, Inc.
2. Hosmer, D. W. & Lemeshow, S. (2000). Applied logistic regression. New York: John Wiley & Sons, Inc.
3. Menard, S. (1994). Applied logistic regression analysis. Thousand Oaks, CA: Sage Publications, Inc.
4. Liao, T. F. (1994). Interpreting probability models: Logit, probit, and other generalized linear models. Thousand Oaks, CA: Sage Publications, Inc.
5. Long, J. S. & Freese, J. (2006). Regression models for categorical dependent variables using Stata. College Station, TX: Stata Press.