Logistic and Probit Regression

Logistic Regression
Chongming Yang
Research Support Center, FHSS College

Rules of Logarithm
$\log(uv) = \log(u) + \log(v)$
$\log(u/v) = \log(u) - \log(v)$
$\log(u^v) = v \log(u)$

Rules of Exponentiation
$a^0 = 1$
$a^u a^v = a^{u+v}$
$a^u / a^v = a^{u-v}$
$(a^u)^v = a^{uv}$

Exponential and Logarithmic Functions Are Inverses of One Another
If $y = a^x$, then $x = \log_a(y)$.
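The logarithm rules and the inverse relationship between exponentials and logarithms can be checked numerically. This is a minimal sketch; the values of u, v, a and x are arbitrary illustrative choices, not taken from the slides.

```python
import math

# Arbitrary illustrative values (assumptions, not from the slides).
u, v, a = 6.0, 2.0, 10.0

log_product = math.log(u * v)    # Log(uv)
log_quotient = math.log(u / v)   # Log(u/v)
log_power = math.log(u ** v)     # Log(u^v)

# Each rule should hold up to floating-point rounding.
assert math.isclose(log_product, math.log(u) + math.log(v))
assert math.isclose(log_quotient, math.log(u) - math.log(v))
assert math.isclose(log_power, v * math.log(u))

# Exponential and logarithm undo each other: if y = a**x, then x = log_a(y).
x = 3.0
y = a ** x
x_back = math.log(y, a)          # logarithm with base a
print(log_product, log_quotient, log_power, x_back)
```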

Assumptions of Linear Regression
$Y_i = \alpha + \beta X_i + \varepsilon_i$
- $Y_i$ is continuous and unbounded
- The expected value (mean) of $\varepsilon_i$ is 0: $E(\varepsilon_i) = 0$
- $\varepsilon_i$ is normally distributed and not correlated with the predictors
- Absence of perfect multicollinearity
- No measurement error in any variable

Violation of Linear Regression Assumptions
- Dichotomous dependent variable (DV)
- Unordered categorical (nominal) DV
- Ordered categorical (ordinal) DV

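Natural Logarithmic Transformation (Binary DV)
Let p = probability of an event. The odds of the event are p/(1 − p), and the logit is the natural logarithm of the odds, $\ln[p/(1-p)]$.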

Logit Model
$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k$

Rearranged Logit Model
$\frac{p}{1-p} = e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k}$

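Logistic Model
$p = \frac{e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k}}{1 + e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k}} = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k)}}$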
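The logit, odds and logistic forms of the model are three views of the same linear predictor. A minimal sketch with one predictor; the coefficients b0 and b1 are hypothetical values chosen only for illustration.

```python
import math

def logit(p: float) -> float:
    """Log odds ln(p / (1 - p)); defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def logistic(eta: float) -> float:
    """Inverse of the logit: p = e^eta / (1 + e^eta) = 1 / (1 + e^-eta)."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients for logit(p) = b0 + b1 * x.
b0, b1 = -1.5, 0.8

for x in (0.0, 1.0, 2.0, 3.0):
    eta = b0 + b1 * x        # logit model: the linear predictor
    odds = math.exp(eta)     # rearranged logit model: p / (1 - p) = e^eta
    p = logistic(eta)        # logistic model: probability of the event
    print(f"x={x:.0f}  logit={eta:+.2f}  odds={odds:.3f}  p={p:.3f}")
```

The two helper functions undo each other, so logit(logistic(eta)) returns eta up to rounding.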

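Odds Ratio
$OR = \frac{p(1)/[1 - p(1)]}{p(0)/[1 - p(0)]} = e^{B}$
where p(1) and p(0) are the probabilities of the event when the predictor equals 1 and 0, and B is the predictor's logit coefficient.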
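Under the logistic model, the ratio of the two odds reduces exactly to e^B. A minimal sketch that computes the odds ratio from the two predicted probabilities and compares it with e^B; the intercept b0 and coefficient B are hypothetical.

```python
import math

def logistic(eta: float) -> float:
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical model: logit(p) = b0 + B * x, x dichotomous (1 = group A, 0 = group B).
b0, B = -0.4, 0.9

p0 = logistic(b0)        # p(0): probability of a positive response when x = 0
p1 = logistic(b0 + B)    # p(1): probability of a positive response when x = 1

odds0 = p0 / (1 - p0)
odds1 = p1 / (1 - p1)
odds_ratio = odds1 / odds0   # OR = [p(1)/(1 - p(1))] / [p(0)/(1 - p(0))]

print(f"OR from probabilities = {odds_ratio:.4f}, e^B = {math.exp(B):.4f}")
```

The intercept cancels in the ratio, which is why the odds ratio depends only on B.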

Interpretation of Coefficients (odds ratio)
- Dichotomous predictor X1:
  - The predicted odds of a positive response for group A are ? times the odds for group B.
  - The odds of a positive response for group A are ?% higher than the odds for group B.
- Continuous predictor X2:
  - A one-unit increase in X2 is associated with a ?% increase in the predicted odds of a positive response.
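The "? times" and "?% higher" blanks are filled from the coefficient: e^B gives the multiplicative change in the odds, and (e^B − 1) × 100 gives the percent change. A minimal sketch; the coefficient values are hypothetical.

```python
import math

def odds_ratio(b: float) -> float:
    """Multiplicative change in the odds for a one-unit increase in the predictor."""
    return math.exp(b)

def percent_change_in_odds(b: float, delta: float = 1.0) -> float:
    """Percent change in the odds for a `delta`-unit increase: (e^(b*delta) - 1) * 100."""
    return (math.exp(b * delta) - 1.0) * 100.0

# Hypothetical coefficients.
b_x1 = 0.693    # dichotomous X1 (group A coded 1, group B coded 0)
b_x2 = -0.105   # continuous X2

print(f"X1: odds for group A are {odds_ratio(b_x1):.2f} times the odds for group B "
      f"({percent_change_in_odds(b_x1):.1f}% higher)")
print(f"X2: a one-unit increase changes the odds by {percent_change_in_odds(b_x2):.1f}%")
```

A negative coefficient gives a negative percent change, i.e., a decrease in the odds.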

Interpretation: see handout

Interpretation of Interaction
- Definition: the effect of a covariate depends on the level of another covariate.
- Interpretation:
  - Plug in some values of the two variables.
  - Plot the estimated logit.
  - Interpret the interaction effect only when the main effects are present.
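The plug-in approach can be sketched by evaluating the estimated logit over a small grid of the two variables. This sketch assumes a model with a product term x1*x2, hypothetical coefficients, and prints a table instead of drawing a plot.

```python
import math

# Hypothetical fitted model with an interaction term:
# logit(p) = b0 + b1*x1 + b2*x2 + b3*x1*x2
b0, b1, b2, b3 = -2.0, 0.5, 0.3, 0.4

def estimated_logit(x1: float, x2: float) -> float:
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

# Plug in a few values: x1 dichotomous, x2 at low / middle / high values.
for x1 in (0, 1):
    row = [estimated_logit(x1, x2) for x2 in (-1.0, 0.0, 1.0)]
    print(f"x1={x1}: " + "  ".join(f"{v:+.2f}" for v in row))

# The effect of x1 on the logit depends on x2: it equals b1 + b3 * x2.
for x2 in (-1.0, 0.0, 1.0):
    effect = estimated_logit(1, x2) - estimated_logit(0, x2)
    print(f"x2={x2:+.0f}: effect of x1 on logit = {effect:+.2f}, "
          f"odds ratio = {math.exp(effect):.2f}")
```

Plotting each row against x2 would give two non-parallel lines, which is the visual signature of an interaction on the logit scale.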

Likelihood at Value of X (left side of equation)
$L = \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i}$

Log Likelihood (left side of equation)
$\ln L = \sum_{i=1}^{n} \left[ y_i \ln p_i + (1 - y_i) \ln(1 - p_i) \right]$

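Log Logit Model (right side of equation)
Substituting the logit model $\ln[p_i/(1 - p_i)] = \beta_0 + \beta_1 X_i$ into the log likelihood gives
$\ln L = \sum_{i=1}^{n} \left[ y_i (\beta_0 + \beta_1 X_i) - \ln\left(1 + e^{\beta_0 + \beta_1 X_i}\right) \right]$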
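The log likelihood written in terms of p_i and the version written in terms of the logit are the same quantity. A minimal sketch that evaluates both on a small made-up data set; the data and the coefficients b0, b1 are illustrative assumptions only.

```python
import math

def log_likelihood_from_p(y, p):
    """ln L = sum[ y_i ln p_i + (1 - y_i) ln(1 - p_i) ]."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p))

def log_likelihood_from_logit(y, eta):
    """Same quantity in terms of the logit eta_i = b0 + b1*x_i:
    ln L = sum[ y_i * eta_i - ln(1 + e^eta_i) ]."""
    return sum(yi * ei - math.log1p(math.exp(ei)) for yi, ei in zip(y, eta))

# Small made-up data set and hypothetical coefficients.
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y = [0, 0, 1, 0, 1, 1]
b0, b1 = -2.0, 1.2

eta = [b0 + b1 * xi for xi in x]
p = [1 / (1 + math.exp(-e)) for e in eta]

print(log_likelihood_from_p(y, p), log_likelihood_from_logit(y, eta))
```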

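Maximum Likelihood Estimation
The estimates of β0, β1, … are the values that maximize the log likelihood. Logistic regression has no closed-form solution, so the estimates are found iteratively (for example, by Newton-Raphson).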
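Newton-Raphson is one common way to find the maximum likelihood estimates; statistical packages may use it or a related iterative algorithm. A minimal NumPy sketch, assuming a design matrix with an intercept column and simulated data from hypothetical "true" coefficients; the function name fit_logistic_newton is illustrative, not a library API.

```python
import numpy as np

def fit_logistic_newton(X, y, max_iter=25, tol=1e-10):
    """Maximum likelihood logistic regression by Newton-Raphson.

    X: (n, k) design matrix whose first column is all ones (the intercept).
    y: (n,) array of 0/1 outcomes.
    Returns (beta_hat, standard_errors, log_likelihood).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        score = X.T @ (y - p)                      # gradient of ln L
        info = (X.T * (p * (1.0 - p))) @ X         # information matrix (minus the Hessian)
        step = np.linalg.solve(info, score)        # Newton step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    eta = X @ beta
    p = 1.0 / (1.0 + np.exp(-eta))
    info = (X.T * (p * (1.0 - p))) @ X
    se = np.sqrt(np.diag(np.linalg.inv(info)))     # from the inverse information matrix
    log_lik = float(np.sum(y * eta - np.log1p(np.exp(eta))))
    return beta, se, log_lik

# Simulated data; the "true" coefficients (-0.5, 1.0) are hypothetical.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ np.array([-0.5, 1.0])))))

beta_hat, se, log_lik = fit_logistic_newton(X, y)
print("estimates:", beta_hat, "SE:", se, "log likelihood:", log_lik)
```

At the maximum, the score (gradient) is zero; the inverse of the information matrix at that point supplies the standard errors used by the Wald test.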

Likelihood Ratio Test of β0, β1, …
- Deviance (a likelihood ratio statistic comparing the fitted model with the saturated model) = −2 ln(likelihood of fitted model / likelihood of saturated model)
- For individual binary data, the likelihood of the saturated model = 1
- Therefore, Deviance = −2 ln(likelihood of fitted model)

χ² Test of β0, β1, …
1. χ² = −2 ln(likelihood of model without x / likelihood of model with x)
2. Degrees of freedom = j − (p + 1), where j = (# of categories) + (# of continuous variables) and p = # of parameters
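The χ² statistic only needs the two maximized log likelihoods, since −2 ln(L_without / L_with) = −2(ln L_without − ln L_with). A minimal sketch with hypothetical log likelihoods; the degrees of freedom are passed in explicitly (for two nested models the usual choice is the number of parameters that differ between them), and chi2_sf is a small hand-written helper using the closed-form chi-square tail for integer df, not a library function.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(chi-square_df > x) for a positive integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: e^(-x/2) * sum_{k=0}^{df/2-1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2.0) / k
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) e^(-x/2) * sum_{r=1}^{(df-1)/2} x^(r-1) / (1*3*...*(2r-1))
    term, total = 1.0, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0) * total)

def likelihood_ratio_test(loglik_without_x: float, loglik_with_x: float, df: int):
    """chi-square = -2 ln(L_without / L_with) = -2 (ln L_without - ln L_with)."""
    stat = -2.0 * (loglik_without_x - loglik_with_x)
    return stat, chi2_sf(stat, df)

# Hypothetical log likelihoods for two nested models.
ll_without_x, ll_with_x = -250.0, -240.0
stat, p_value = likelihood_ratio_test(ll_without_x, ll_with_x, df=2)
deviance_with_x = -2.0 * ll_with_x   # deviance when the saturated likelihood is 1
print(f"chi-square = {stat:.1f}, p = {p_value:.2e}, deviance = {deviance_with_x:.1f}")
```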

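Hosmer-Lemeshow Test (χ²)
Groups are formed from percentiles of the estimated p.
$\hat{C} = \sum_{k=1}^{g} \frac{(o_k - n'_k \bar{p}_k)^2}{n'_k \bar{p}_k (1 - \bar{p}_k)}$
where
- g = 10 groups, k = 1, …, 10
- $o_k = \sum_{j=1}^{c_k} y_j$ = observed number of positive responses in the kth group
- $\bar{p}_k = \sum_{j=1}^{c_k} \frac{m_j \hat{p}_j}{n'_k}$ = average estimated probability in the kth group
- $n'_k$ = number of subjects in the kth group
- $c_k$ = number of covariate patterns in the kth group, with $m_j$ subjects sharing covariate pattern j
- df = g − 2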

                Group 1 (10% prob.)    Group 2 (20% prob.)    …    Group 10 (100% prob.)
                Estimated  Observed    Estimated  Observed         Estimated  Observed
y = 1           N1         N2          …          …                …          …
y = 0           N3         N4          …          …                …          …
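The grouping and the statistic can be sketched in a few lines of NumPy. This simplified version sorts subjects by estimated probability and splits them into g groups of nearly equal size, treating each subject separately rather than pooling covariate patterns or handling ties specially; the simulated data are hypothetical. The p-value helper uses the closed-form chi-square tail, which is valid only for even df, so g must be even (the default g = 10 gives df = 8).

```python
import math
import numpy as np

def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability; closed form valid only for even df."""
    if df % 2 != 0:
        raise ValueError("this sketch only handles even degrees of freedom")
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= (x / 2.0) / k
        total += term
    return math.exp(-x / 2.0) * total

def hosmer_lemeshow(y, p_hat, g=10):
    """Hosmer-Lemeshow statistic with g groups of (nearly) equal size."""
    y = np.asarray(y, dtype=float)
    p_hat = np.asarray(p_hat, dtype=float)
    order = np.argsort(p_hat, kind="stable")     # sort subjects by estimated p
    stat = 0.0
    for idx in np.array_split(order, g):
        n_k = len(idx)                 # n'_k: number of subjects in group k
        o_k = y[idx].sum()             # o_k: observed positive responses
        p_bar = p_hat[idx].mean()      # average estimated probability in group k
        stat += (o_k - n_k * p_bar) ** 2 / (n_k * p_bar * (1.0 - p_bar))
    df = g - 2
    return stat, df, chi2_sf_even_df(stat, df)

# Usage with simulated data (hypothetical, not real results).
rng = np.random.default_rng(42)
p_hat = rng.uniform(0.05, 0.95, size=500)
y = rng.binomial(1, p_hat)
stat, df, p_value = hosmer_lemeshow(y, p_hat)
print(f"HL chi-square = {stat:.2f}, df = {df}, p = {p_value:.3f}")
```

A large statistic (small p-value) indicates that observed and estimated counts disagree across the groups, i.e., poor fit.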

Wald Test of β0, β1, …
- W = β̂ / se(β̂) (se = standard error)
- W is tested against the standard normal distribution
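Because W is compared with the standard normal distribution, its two-sided p-value is 2·P(Z > |W|), which equals erfc(|W|/√2). A minimal sketch; the estimate and standard error are hypothetical.

```python
import math

def wald_test(beta_hat: float, se: float):
    """W = beta_hat / se(beta_hat), with a two-sided standard-normal p-value."""
    w = beta_hat / se
    p_two_sided = math.erfc(abs(w) / math.sqrt(2.0))   # = 2 * P(Z > |W|)
    return w, p_two_sided

w, p = wald_test(0.8, 0.25)   # hypothetical estimate and standard error
print(f"W = {w:.2f}, two-sided p = {p:.4f}, W^2 = {w ** 2:.2f} (chi-square, 1 df)")
```

Some software reports W² instead, which follows a chi-square distribution with 1 df and gives the same p-value.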

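Multinomial Logistic Regression (non-ordered categorical DV)
P = probability of a response category; with three categories, $P_{i1} + P_{i2} + P_{i3} = 1$
$\log\left(\frac{P_{i1}}{P_{i3}}\right) = B_1 X$
$\log\left(\frac{P_{i2}}{P_{i3}}\right) = B_2 X$
$\log\left(\frac{P_{i1}}{P_{i2}}\right) = B_3 X$
Because $\log(P_{i1}/P_{i2}) = \log(P_{i1}/P_{i3}) - \log(P_{i2}/P_{i3})$, $B_3 = B_1 - B_2$.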

Multinomial Logistic Regression
$p(y_i = k) = \frac{e^{B_k x_i}}{1 + \sum_{h=1}^{K-1} e^{B_h x_i}}$, for k = 1, …, K − 1, with category K as the reference ($B_K = 0$)
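With the last category as the reference, each non-reference category gets its own coefficients and the reference category's linear predictor is fixed at 0, so its term in the denominator is e^0 = 1. A minimal sketch for K = 3 with one predictor; each B_k is written as a hypothetical (intercept, slope) pair, so B_k x includes an intercept.

```python
import math

def multinomial_probs(x, betas):
    """Category probabilities for a multinomial logit with the last category as reference.

    betas: list of (intercept, slope) pairs for categories 1..K-1; the reference
    category K has its coefficients fixed at zero.
    """
    scores = [b0 + b1 * x for b0, b1 in betas] + [0.0]   # reference category: e^0 = 1
    denom = sum(math.exp(s) for s in scores)            # = 1 + sum_{h<K} e^(B_h x)
    return [math.exp(s) / denom for s in scores]

# Hypothetical coefficients for K = 3 categories.
B1 = (0.5, 0.8)
B2 = (-0.2, 0.3)
x = 1.5
p1, p2, p3 = multinomial_probs(x, [B1, B2])
print(p1, p2, p3, p1 + p2 + p3)
```

The log of p1/p3 recovers B1's linear predictor, and the log of p1/p2 equals the difference of the two linear predictors, matching B3 = B1 − B2.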

Interpretation: see handout

Ordinal Logistic Models
- Adjacent Categories Model: compare two adjacent categories

Adjacent Categories Model
- Let j index the categories of an ordinal scale, j = 1, …
- j and j + 1 are two adjacent categories
- Model: $\log\left(\frac{p_{i,j+1}}{p_{ij}}\right) = a_j + B_j x$
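Category probabilities follow from the adjacent-category logits by chaining them: starting from category 1, each next category's log-probability (up to a constant) adds a_j + B_j x, and normalizing makes them sum to 1. A minimal sketch assuming J = 4 categories with hypothetical intercepts and slopes.

```python
import math

def adjacent_category_probs(x, a, B):
    """Probabilities for J ordered categories under log(p_{j+1} / p_j) = a_j + B_j * x.

    a, B: lists of length J - 1 (one intercept and one slope per adjacent pair).
    """
    log_unnorm = [0.0]                       # category 1 as the baseline
    for a_j, b_j in zip(a, B):
        log_unnorm.append(log_unnorm[-1] + a_j + b_j * x)
    denom = sum(math.exp(v) for v in log_unnorm)
    return [math.exp(v) / denom for v in log_unnorm]

a = [0.4, -0.1, -0.8]    # hypothetical intercepts (J = 4 categories)
B = [0.3, 0.3, 0.3]      # hypothetical slopes
probs = adjacent_category_probs(2.0, a, B)
print([round(pr, 3) for pr in probs])
```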

Practice
- Run logistic regression using ‘binary.sav’
- DV = Admit
- IV = gre, gpa, rank
- Annotated output: http://www.ats.ucla.edu/stat/spss/dae/logit.htm

Pseudo R-squared (based on likelihood)
Interpreted as:
- Explained variability
- Improvement from the null model to the fitted model
- Square of the correlation between predicted and observed values

Pseudo R-squared
- Cox & Snell: improvement of the full model over the intercept-only model
- Nagelkerke: improvement of the full model over the intercept-only model, rescaled so that its maximum is 1
- McFadden (adjusted): analogous to adjusted R-squared in OLS, penalizing a model with too many predictors
- http://www.ats.ucla.edu/stat/mult_pkg/faq/general/Psuedo_RSquareds.htm
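These measures can all be computed from the log likelihoods of the intercept-only and full models. A minimal sketch using the standard formulas: Cox & Snell = 1 − (L0/LM)^(2/n); Nagelkerke = Cox & Snell divided by its maximum 1 − L0^(2/n); McFadden = 1 − ln LM / ln L0; adjusted McFadden subtracts the number of parameters k from ln LM. The log likelihoods, n and k in the usage line are hypothetical.

```python
import math

def pseudo_r2(ll_null: float, ll_model: float, n: int, k: int) -> dict:
    """Pseudo R-squared measures from log likelihoods.

    ll_null: log likelihood of the intercept-only model
    ll_model: log likelihood of the full model
    n: sample size; k: number of parameters in the full model (adjusted McFadden)
    """
    cox_snell = 1.0 - math.exp((2.0 / n) * (ll_null - ll_model))
    max_cox_snell = 1.0 - math.exp((2.0 / n) * ll_null)
    return {
        "cox_snell": cox_snell,
        "nagelkerke": cox_snell / max_cox_snell,
        "mcfadden": 1.0 - ll_model / ll_null,
        "mcfadden_adjusted": 1.0 - (ll_model - k) / ll_null,
    }

r2 = pseudo_r2(ll_null=-100.0, ll_model=-80.0, n=200, k=3)   # hypothetical inputs
for name, value in r2.items():
    print(f"{name}: {value:.4f}")
```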

Practice (continued)
- Run multinomial logistic regression using ‘mlogit.sav’
- DV = Brand
- IV = female, age
- Annotated output: http://www.ats.ucla.edu/stat/spss/dae/mlogit.htm

Practice (continued)
- Run ordinal logistic regression using ‘ologit.sav’
- DV = admit
- IV = gre, gpa, topnotch
- Annotated output: http://www.ats.ucla.edu/stat/SPSS/dae/ologit.htm

Practical Issues
1. Low Ratio of Cases to Variables
- Problem: extremely large parameter estimates and standard errors
- Solution:
  - Collapse categories
  - Delete the offending category
  - Delete discrete predictors

Practical Issues
2. Inadequacy of Expected Frequencies & Power
- Problem: lower power with small-frequency cells
- Solution:
  - Accept low power
  - Collapse categories or delete discrete predictors
  - Evaluate model fit with χ²

Practical Issues
3. Presence of Multicollinearity
- Problem: large standard errors or estimates
- Solution:
  - Run multiway frequency tables to identify collinear categorical variables
  - Run correlations to identify collinear continuous variables
  - Delete theoretically less important predictors or combine with other procedures
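For continuous predictors, the correlation check can be sketched with NumPy by flagging pairs whose absolute Pearson correlation exceeds a cutoff. The 0.8 threshold and the simulated predictors (x2 built as a near copy of x1) are assumptions for illustration, not a rule from the slides.

```python
import numpy as np

def flag_correlated_pairs(X, names, threshold=0.8):
    """Return predictor pairs whose absolute Pearson correlation exceeds `threshold`."""
    r = np.corrcoef(X, rowvar=False)        # columns are variables
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) > threshold:
                flagged.append((names[i], names[j], float(r[i, j])))
    return flagged

# Simulated predictors (hypothetical): x2 is nearly a copy of x1.
rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = 2 * x1 + rng.normal(scale=0.1, size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

pairs = flag_correlated_pairs(X, ["x1", "x2", "x3"])
print(pairs)
```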

Practical Issues
- For rare events, Poisson regression or negative binomial regression may be appropriate.

References
1. Allison, P. D. Logistic regression using the SAS system. Cary, NC: SAS Institute, Inc.
2. Hosmer, D. W., & Lemeshow, S. (2000). Applied logistic regression. New York: John Wiley & Sons, Inc.
3. Menard, S. (1994). Applied logistic regression analysis. Thousand Oaks, CA: Sage Publications, Inc.
4. Liao, T. F. (1994). Interpreting probability models: Logit, probit, and other generalized linear models. Thousand Oaks, CA: Sage Publications, Inc.
5. Long, J. S., & Freese, J. (2006). Regression models for categorical dependent variables using Stata. College Station, TX: Stata Press.