
Logistic and Poisson Regression: Modeling Binary and Count Data
LISA Short Course Series
Mark Seiss, Dept. of Statistics
November 5, 2008

Presentation Outline
1. Introduction to Generalized Linear Models
2. Binary Response Data – Logistic Regression Model
3. Count Response Data – Poisson Regression Model

Reference Material

Categorical Data Analysis – Alan Agresti

Examples found with SAS Code at www.stat.ufl.edu/~aa/cda/cda.html

Presentation and Data from Examples

www.stat.vt.edu/consult/short_courses.html

Generalized Linear Models

• Generalized linear models (GLMs) extend ordinary regression to non-normal response distributions.

• 3 Components
  • Random – identifies the response Y and its probability distribution
  • Systematic – the explanatory variables in a linear predictor function (Xβ)
  • Link function – a function g(.) that links the mean of the response (E[Y_i] = μ_i) to the systematic component

• Model (see the glm() sketch below)

  g(μ_i) = Σ_j β_j x_ij,   for i = 1 to n
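These three components map directly onto R's glm() call; a minimal sketch, where y, x1, x2, and dat are hypothetical names rather than course data:

# systematic component: the formula; random component: the family;
# link function: the link argument inside the family
fit <- glm(y ~ x1 + x2, data = dat, family = binomial(link = logit))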

Generalized Linear Models

• Why do we use GLMs?
  • Linear regression assumes that the response is distributed normally.
  • GLMs allow us to analyze the linear relationship between predictor variables and the mean of the response variable when it is not reasonable to assume the data are distributed normally.

Generalized Linear Models

• Predictor Variables
  • Two types: Continuous and Categorical
  • Continuous Predictor Variables
    • Examples – Time, Grade Point Average, Test Score, etc.
    • Coded with one parameter – β_i x_i
  • Categorical Predictor Variables
    • Examples – Sex, Political Affiliation, Marital Status, etc.
    • The actual value assigned to a category is not important
      • Ex) Sex – Male/Female, M/F, 1/2, 0/1, etc.
    • Coded differently than continuous variables

Generalized Linear Models

• Categorical Predictor Variables cont.
  • Consider a categorical predictor variable with L categories
  • One category is selected as the reference category
    • The assignment of the reference category is arbitrary
  • The variable is represented by L-1 dummy variables
    • Ensures model identifiability
  • Two types of coding – Dummy and Effect

Generalized Linear Models

• Categorical Predictor Variables cont. (illustrated in the R sketch below)
  • Dummy Coding (used in R)
    • x_k = 1 if the predictor variable is equal to category k, 0 otherwise
    • x_k = 0 for all k if the predictor variable equals the reference category
  • Effect Coding (used in JMP)
    • x_k = 1 if the predictor variable is equal to category k, 0 otherwise
    • x_k = -1 for all k if the predictor variable equals the reference category
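Both coding schemes can be inspected directly in R; a minimal sketch with a hypothetical 3-level factor (contr.treatment is R's default dummy coding; contr.sum gives effect coding):

x <- factor(c("A", "B", "C"))
contrasts(x)                   # dummy (treatment) coding: reference category is all 0s
contrasts(x) <- contr.sum(3)   # switch to effect coding
contrasts(x)                   # reference category is now all -1s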

Generalized Linear Models

Saturated Model
• Contains a separate indicator parameter for each observation
• Perfect fit: μ = y
• Not useful since there is no data reduction, i.e. the number of parameters equals the number of observations
• Maximum achievable log likelihood – a baseline for comparison to other model fits

Generalized Linear Models

• Deviance
  • Let L(μ|y) = the maximum of the log likelihood for the model
  • L(y|y) = the maximum of the log likelihood for the saturated model
  • Deviance = D(y|μ) = -2[L(μ|y) - L(y|y)]
  • Likelihood ratio statistic for testing the null hypothesis that the model is a good alternative to the saturated model
  • The likelihood ratio statistic has an asymptotic chi-squared distribution with N - p degrees of freedom, where p is the number of parameters in the model
  • Allows for the comparison of one model to another using the likelihood ratio test

Generalized Linear Models

• Nested Models
  • Model 1 – model with p predictor variables {X_1, X_2, X_3, ..., X_p} and vector of fitted values μ_1
  • Model 2 – model with q < p predictor variables {X_1, X_2, ..., X_q} and vector of fitted values μ_2
  • Model 2 is nested within Model 1; its linear predictor is Model 1's with the extra coefficients set to zero:

    β_1 X_1 + ... + β_q X_q + 0*X_(q+1) + 0*X_(q+2) + ... + 0*X_p

Generalized Linear Models

• Likelihood Ratio Test
  • Null Hypothesis: There is not a significant difference between the fits of the two models.
  • Null Hypothesis for Nested Models: The predictor variables in Model 1 that are not found in Model 2 are not significant to the model fit.
  • Alternative Hypothesis for Nested Models: The predictor variables in Model 1 that are not found in Model 2 are significant to the model fit.
  • Likelihood Ratio Statistic = -2[L(μ_2|y) - L(μ_1|y)] = D(y, μ_2) - D(y, μ_1)
    • Difference of the deviances of the two models
    • Always D(y, μ_2) ≥ D(y, μ_1), which implies LRT ≥ 0
  • The LRT is distributed chi-squared with p - q degrees of freedom

Generalized Linear Models

• Likelihood Ratio Test cont.

• Later, we will use the Likelihood Ratio Test to test the significance of variables in Logistic and Poisson regression models.

Generalized Linear Models

• Theoretical Example of the Likelihood Ratio Test (see the R sketch below)
  • 3 predictor variables – 1 continuous (X_1), 1 categorical with 4 categories (X_2, X_3, X_4), 1 categorical with 2 categories coded by one dummy variable (X_5)
  • Model 1 – predictor variables {X_1, X_2, X_3, X_4, X_5}
  • Model 2 – predictor variables {X_1, X_5}
  • Null Hypothesis – The variable with 4 categories is not significant to the model (β_2 = β_3 = β_4 = 0)
  • Alternate Hypothesis – The variable with 4 categories is significant
  • Likelihood Ratio Statistic = D(y, μ_2) - D(y, μ_1)
    • Difference of the deviance statistics from the two models
  • Chi-squared distribution with 5 - 2 = 3 degrees of freedom
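This comparison is one line in R; a sketch under the same setup, where dat, y, x1, x234 (a 4-level factor), and x5 are hypothetical names:

model2 <- glm(y ~ x1 + x5, data = dat, family = binomial)
model1 <- glm(y ~ x1 + x234 + x5, data = dat, family = binomial)
anova(model2, model1, test = "Chisq")   # difference of deviances, chi-squared on 3 df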

Generalized Linear Models

• Model Selection
  • 2 Goals: complex enough to fit the data well; simple to interpret, does not overfit the data
  • Study the effect of each predictor on the response Y
    • Continuous predictor – graph P[Y=1] versus X
    • Discrete predictor – contingency table of P[Y=1] versus the categories of X
  • Unbalanced data – few responses of one type
    • Guideline – 10 outcomes of each type for each X term
    • Example – if Y=1 for only 30 observations out of 1000, the model should contain no more than 3 X terms

Generalized Linear Models

• Model Selection cont.

• Multicollinearity
  • Correlations among the predictors result in an increase in variance
  • Reduces the significance value of the variable
  • Occurs when several correlated predictor variables are used in the model
• Determining Model Fit
  • Criteria other than significance tests (i.e. the Likelihood Ratio Test) can be used to select a model

Generalized Linear Models

• Model Selection cont.

• Determining Model Fit cont.

• Akaike Information Criterion (AIC) – penalizes the model for having many parameters (see the R sketch below)
  • AIC = Deviance + 2p, where p is the number of parameters in the model
• Bayesian Information Criterion (BIC)
  • BIC = -2 Log L + ln(n)*p, where p is the number of parameters in the model and n is the number of observations
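Both criteria are available for any fitted glm in R; a minimal sketch assuming a hypothetical fitted model named fit (note R's AIC() uses -2 Log L + 2p, which differs from the deviance-based version above only by a constant, so model rankings agree):

AIC(fit)   # -2 log L + 2p
BIC(fit)   # -2 log L + ln(n)*p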

Generalized Linear Models

• Model Selection cont.

• Selection Algorithms • • Best subset – Tests all combinations of predictor variables to find best subset Algorithmic – Forward, Backward and Stepwise Procedures

Generalized Linear Models

• Best Subsets Procedure
  • Run the model with all possible combinations of the predictor variables
  • The number of possible models is equal to 2^p, where p is the number of predictor variables
  • Dummy variables for a categorical predictor are considered together
  • Ex) For a set of predictors {X_1, X_2, X_3}, the procedure runs models with the sets of predictors {X_1, X_2, X_3}, {X_1, X_2}, {X_2, X_3}, {X_1, X_3}, {X_1}, {X_2}, {X_3}, and no predictor variables: 2^3 = 8 possible models
  • Most programs only allow a small set of predictor variables
    • Otherwise the procedure cannot be run in a reasonable amount of time
    • 2^10 = 1024 models are run for a set of 10 predictor variables

Generalized Linear Models

• Forward Selection
  • Idea: Start with no variables in the model and add one at a time
  • Step One: Fit the model with each single predictor variable and determine fit
  • Step Two: Select the predictor variable with the best fit and add it to the model
  • Step Three: Add each remaining variable to the model one at a time and determine fit
  • Step Four: If at least one variable produces a better fit, return to Step Two; if no variables produce a better fit, use the current model
  • Drawback: Variables added to the model cannot be taken out

Generalized Linear Models

• Backward Selection
  • Idea: Start with all variables in the model and take out one at a time
  • Step One: Fit all predictor variables in the model and determine fit
  • Step Two: Delete one variable at a time and determine fit
  • Step Three: If the deletion of at least one variable produces a better fit, remove the variable that produces the best fit when deleted and return to Step Two; if the deletion of a variable does not produce a better fit, use the current model
  • Drawback: Variables taken out of the model cannot be added back in

Generalized Linear Models

• Stepwise Selection
  • Idea: A combination of forward and backward selection – a forward step, then a backward step (see the step() sketch below)
  • Step One: Fit each predictor variable as a single predictor variable and determine fit
  • Step Two: Select the variable that produces the best fit and add it to the model
  • Step Three: Add each predictor variable one at a time to the model and determine fit
  • Step Four: Select the variable that produces the best fit and add it to the model
  • Step Five: Delete each variable in the model one at a time and determine fit
  • Step Six: Remove the variable that produces the best fit when deleted
  • Step Seven: Return to Step Two
  • Loop until no variables added or deleted improve the fit
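All three algorithmic procedures can be run with R's built-in step() function; a minimal sketch with hypothetical names (note that step() adds and drops terms by AIC rather than by the significance tests described above):

null <- glm(y ~ 1, data = dat, family = binomial)
full <- glm(y ~ x1 + x2 + x3, data = dat, family = binomial)
step(null, scope = formula(full), direction = "forward")   # forward selection
step(full, direction = "backward")                         # backward selection
step(null, scope = formula(full), direction = "both")      # stepwise selection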

Generalized Linear Models

• Summary
  • 3 Components of the GLM
    • Random (Y)
    • Link Function (g(E[Y]))
    • Systematic (x^t β)
  • Continuous and Categorical Predictor Variables
    • Coding Categorical Variables – Effect and Dummy Coding
  • Likelihood Ratio Test for Nested Models
    • Tests the significance of a predictor variable or a set of predictor variables in the model
  • Model Selection – Best Subset, Forward, Backward, Stepwise

Generalized Linear Models

Questions/Comments

Logistic Regression

• Consider a binary response variable.

• Variable with two outcomes
  • One outcome represented by a 1 and the other represented by a 0
• Examples:
  • Does the person have a disease? Yes or No
  • Who is the person voting for? McCain or Obama
  • Outcome of a baseball game? Win or Loss

Logistic Regression

• Logistic Regression Example Data Set
  • Response Variable –> Admission to Grad School (Admit)
    • 0 if admitted, 1 if not admitted
  • Predictor Variables
    • GRE Score (gre) – Continuous
    • University Prestige (topnotch) – 1 if prestigious, 0 otherwise
    • Grade Point Average (gpa) – Continuous

Logistic Regression

• First 10 Observations of the Data Set

ADMIT   GRE   TOPNOTCH   GPA
1       380   0          3.61
0       660   1          3.67
0       800   0          4.00
1       640   0          3.19
0       520   0          2.93
1       760   1          3.00
0       560   1          2.98
0       400   0          3.08
1       540   0          3.39
0       700   0          3.92

Logistic Regression

• Consider the linear probability model

  E[Y_i] = P(Y_i = 0 | x_i) = π(x_i) = α + β x_i

  where
  y_i = response for observation i
  x_i = 1x(p+1) matrix of covariates for observation i
  p = number of covariates

• GLM with binomial random component and identity link g(μ) = μ
• Issue: π(x_i) can take on values less than 0 or greater than 1
• Issue: Predicted probabilities for some subjects fall outside of the [0,1] range

Logistic Regression

• Consider the logistic regression model

  E[Y_i] = P(Y_i = 0 | x_i) = π(x_i) = exp(α + β x_i) / (1 + exp(α + β x_i))

  logit(π_i) = log( π_i / (1 - π_i) ) = α + β x_i

• GLM with binomial random component and logit link g(μ) = logit(μ)
• The range of values for π(x_i) is 0 to 1 (an R sketch of this fit follows)
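The same model can also be fit in R; a minimal sketch assuming the example data set is loaded as a data frame named admissions (note that for a 0/1 response, R's glm() models P(Y=1), unlike JMP's P(Y=0) shown in the output below):

fit <- glm(ADMIT ~ GPA, data = admissions, family = binomial(link = logit))
summary(fit)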

Logistic Regression

• Consider the logistic regression model

  logit(π_i) = α + β * gpa_i

  and the linear probability model

  π(x_i) = α + β * gpa_i

  Then the graph of the predicted probabilities for different grade point averages:

Important Note: JMP models P(Y=0), and effect coding is used for categorical variables

Logistic Regression

[Graph: predicted probability of admission versus GPA under the logistic and linear probability models]

Logistic Regression

• Interpretation of Coefficient β – Odds Ratio
  • The odds ratio is a statistic that measures the odds of one event compared to the odds of another event.
  • Say the probability of Event 1 is π_1 and the probability of Event 2 is π_2. Then the odds ratio of Event 1 to Event 2 is:

    Odds_Ratio = Odds(π_1) / Odds(π_2) = [π_1 / (1 - π_1)] / [π_2 / (1 - π_2)]

  • Values of the odds ratio range from 0 to infinity
  • A value between 0 and 1 indicates the odds of Event 2 are greater
  • A value between 1 and infinity indicates the odds of Event 1 are greater
  • A value equal to 1 indicates the events are equally likely
  • For example, if π_1 = 0.75 and π_2 = 0.50, the odds are 3 and 1, so the odds ratio of Event 1 to Event 2 is 3

Logistic Regression

• Interpretation of Coefficient β – Odds Ratio cont.
  • Link to logistic regression:

    Log(Odds_Ratio) = Log( π_1 / (1 - π_1) ) - Log( π_2 / (1 - π_2) ) = Logit(π_1) - Logit(π_2)

  • Thus the odds ratio between the two events is

    Odds_Ratio = exp{ Logit(π_1) - Logit(π_2) }

Logistic Regression

• Interpretation of Coefficient β – Odds Ratio cont.
  • Consider Event 1 as Y=0 given X+1 and Event 2 as Y=0 given X

    Log(Odds_Ratio) = Logit(P(Y=0 | X+1)) - Logit(P(Y=0 | X))

  • From our logistic regression model:

    (α + β(X+1)) - (α + βX) = β

  • Thus the ratio of the odds of Y=0 at X+1 to the odds at X is

    Odds_Ratio = exp(β)

  (see the R sketch below)

Logistic Regression

• Single Continuous Predictor Variable - GPA

Generalized Linear Model Fit
Response: Admit, Modeling P(Admit=0)
Distribution: Binomial, Link: Logit
Observations (or Sum Wgts) = 400

Whole Model Test
Model        -LogLikelihood   L-R ChiSquare   DF   Prob>ChiSq
Difference   6.50444839       13.0089         1    0.0003
Full         243.48381
Reduced      249.988259

Goodness Of Fit
Statistic   ChiSquare   DF    Prob>ChiSq
Pearson     401.1706    398   0.4460
Deviance    486.9676    398   0.0015

Logistic Regression

• Single Continuous Predictor Variable – GPA cont.

Effect Tests
Source   DF   L-R ChiSquare   Prob>ChiSq
GPA      1    13.008897       0.0003

Parameter Estimates
Term        Estimate    Std Error   L-R ChiSquare   Prob>ChiSq   Lower CL    Upper CL
Intercept   -4.357587   1.0353175   19.117873       <.0001       -6.433355   -2.367383
GPA         1.0511087   0.2988695   13.008897       0.0003       0.4742176   1.6479411

Interpretation of the Parameter Estimate: Exp{1.0511087} = 2.86 = the odds ratio between the odds at x+1 and the odds at x, for all x. The ratio of the odds of being admitted for a person with a 3.0 GPA to those for a person with a 2.0 GPA is 2.86; equivalently, the odds for the person with the 3.0 are 2.86 times the odds for the person with the 2.0.

Logistic Regression

• Single Categorical Predictor Variable – Top Notch

Generalized Linear Model Fit
Response: Admit, Modeling P(Admit=0)
Distribution: Binomial, Link: Logit
Observations (or Sum Wgts) = 400

Whole Model Test
Model        -LogLikelihood   L-R ChiSquare   DF   Prob>ChiSq
Difference   3.53984692       7.0797          1    0.0078
Full         246.448412
Reduced      249.988259

Goodness Of Fit
Statistic   ChiSquare   DF    Prob>ChiSq
Pearson     400.0000    398   0.4624
Deviance    492.8968    398   0.0008

Logistic Regression

• Single Categorical Predictor Variable – Top Notch cont.

Effect Tests
Source     DF   L-R ChiSquare   Prob>ChiSq
TOPNOTCH   1    7.0796939       0.0078

Parameter Estimates
Term          Estimate    Std Error   L-R ChiSquare   Prob>ChiSq   Lower CL    Upper CL
Intercept     -0.525855   0.138217    14.446085       0.0001       -0.799265   -0.255667
TOPNOTCH[0]   -0.371705   0.138217    7.0796938       0.0078       -0.642635   -0.099011

Interpretation of the Parameter Estimate: Exp{2*(-0.371705)} = 0.4755 = the odds ratio between the odds of admittance for a student from a less prestigious university and the odds of admittance for a student from a more prestigious university. (The factor of 2 appears because effect coding assigns +1 and -1 to the two categories, so their logits differ by 2β.)

The odds of being admitted from a less prestigious university are 0.48 times the odds of being admitted from a more prestigious university.

Logistic Regression

• Variable Selection – Likelihood Ratio Test
  • Consider the model with GPA, GRE, and Top Notch as predictor variables

Generalized Linear Model Fit
Response: Admit, Modeling P(Admit=0)
Distribution: Binomial, Link: Logit
Observations (or Sum Wgts) = 400

Whole Model Test
Model        -LogLikelihood   L-R ChiSquare   DF   Prob>ChiSq
Difference   10.9234504       21.8469         3    <.0001
Full         239.064808
Reduced      249.988259

Goodness Of Fit
Statistic   ChiSquare   DF    Prob>ChiSq
Pearson     396.9196    396   0.4775
Deviance    478.1296    396   0.0029

Logistic Regression

• Variable Selection – Likelihood Ratio Test cont.

Effect Tests
Source     DF   L-R ChiSquare   Prob>ChiSq
TOPNOTCH   1    2.2143635       0.1367
GPA        1    4.2909753       0.0383
GRE        1    5.4555484       0.0195

Parameter Estimates
Term          Estimate    Std Error   L-R ChiSquare   Prob>ChiSq   Lower CL    Upper CL
Intercept     -4.382202   1.1352224   15.917859       <.0001       -6.657167   -2.197805
TOPNOTCH[0]   -0.218612   0.1459266   2.2143635       0.1367       -0.503583   0.070142
GPA           0.6675556   0.3252593   4.2909753       0.0383       0.0356956   1.3133755
GRE           0.0024768   0.0010702   5.4555484       0.0195       0.0003962   0.0046006

Logistic Regression

• Model Selection – Forward

Stepwise Fit
Response: Admit

Stepwise Regression Control
Prob to Enter: 0.250
Prob to Leave: 0.100
Direction: Forward
Rules:

Current Estimates
-LogLikelihood: 239.06481
RSquare: 0.0437

Logistic Regression

• Model Selection – Forward cont.

Current Estimates
Parameter       Estimate      nDF   Wald/Score ChiSq   "Sig Prob"
Intercept[1]    -4.3821986    1     0                  1.0000
GRE             0.00247683    1     5.356022           0.0207
GPA             0.66755511    1     4.212258           0.0401
TOPNOTCH{1-0}   0.21861181    1     2.244286           0.1341

Step History
Step   Parameter       Action    L-R ChiSquare   "Sig Prob"   RSquare   p
1      GRE             Entered   13.92038        0.0002       0.0278    2
2      GPA             Entered   5.712157        0.0168       0.0393    3
3      TOPNOTCH{1-0}   Entered   2.214363        0.1367       0.0437    4

Logistic Regression

• Model Selection – Backward
  • Start by selecting to enter all variables into the model

Stepwise Fit
Response: Admit

Stepwise Regression Control
Prob to Enter: 0.250
Prob to Leave: 0.100
Direction: Backward
Rules: Combine

Logistic Regression

• Model Selection – Backward cont.

Current Estimates
-LogLikelihood: 240.17199
RSquare: 0.0393

Parameter       Estimate      nDF   Wald/Score ChiSq   "Sig Prob"
Intercept[1]    -4.9493751    1     0                  1.0000
GRE             0.00269068    1     6.473978           0.0109
GPA             0.75468641    1     5.576461           0.0182
TOPNOTCH{1-0}   0             1     2.259729           0.1328

Step History
Step   Parameter       Action    L-R ChiSquare   "Sig Prob"   RSquare   p
1      TOPNOTCH{1-0}   Removed   2.214363        0.1367       0.0393    3

Logistic Regression

• Variable Selection – Stepwise

Stepwise Fit
Response: Admit

Stepwise Regression Control
Prob to Enter: 0.250
Prob to Leave: 0.250
Direction: Mixed
Rules: Combine

Current Estimates
-LogLikelihood: 239.06481
RSquare: 0.0437

Logistic Regression

• Variable Selection – Stepwise cont.

Current Estimates
Parameter       Estimate      nDF   Wald/Score ChiSq   "Sig Prob"
Intercept[1]    -4.3821986    1     0                  1.0000
GRE             0.00247683    1     5.356022           0.0207
GPA             0.66755511    1     4.212258           0.0401
TOPNOTCH{1-0}   0.21861181    1     2.244286           0.1341

Step History
Step   Parameter       Action    L-R ChiSquare   "Sig Prob"   RSquare   p
1      GRE             Entered   13.92038        0.0002       0.0278    2
2      GPA             Entered   5.712157        0.0168       0.0393    3
3      TOPNOTCH{1-0}   Entered   2.214363        0.1367       0.0437    4

Logistic Regression

• Summary
  • Introduction to the Logistic Regression Model
  • Interpretation of the Parameter Estimates β – Odds Ratio
  • Variable Significance – Likelihood Ratio Test
  • Model Selection
    • Forward
    • Backward
    • Stepwise

Logistic Regression

Questions/Comments

Poisson Regression

• Consider a count response variable.

• Response variable is the number of occurrences in a given time frame.
  • Outcomes equal to 0, 1, 2, ...
• Examples:
  • Number of penalties during a football game
  • Number of customers shopping at a store on a given day
  • Number of car accidents at an intersection

Poisson Regression

• Poisson Regression Example Data Set
  • Response Variable –> Number of Days Absent – Integer
  • Predictor Variables
    • Gender – 1 if Female, 2 if Male
    • Ethnicity – 6 ethnic categories
    • School – 1 if School 1, 2 if School 2
    • Math Test Score – Continuous
    • Language Test Score – Continuous
    • Bilingual Status – 6 bilingual categories

Poisson Regression

• First 10 Observations from the Poisson Regression Example Data Set

    GENDER   ethnicity   school.1.or.2   ctbs.math.nce   ctbs.lang.nce   bilingual.status   number.days.absent
1   2        4           1               56.988830       42.45086        2                  4
2   2        4           1               37.094160       46.82059        2                  4
3   1        4           1               32.275460       43.56657        2                  2
4   1        4           1               29.056720       43.56657        2                  3
5   1        4           1               6.748048        27.24847        3                  3
6   1        4           1               61.654280       48.41482        0                  13
7   1        4           1               56.988830       40.73543        2                  11
8   2        4           1               10.390490       15.35938        2                  7
9   2        4           1               50.527950       52.11514        2                  10
10  2        6           1               49.472050       42.45086        0                  9

Poisson Regression

• Consider the model

  E[Y_i] = μ_i = α + β x_i

  where
  Y_i = response for observation i
  x_i = 1x(p+1) matrix of covariates for observation i
  p = number of covariates
  μ_i = expected number of events given x_i

• GLM with Poisson random component and identity link g(μ) = μ
• Issue: Predicted values range from -∞ to +∞

Poisson Regression

• Consider the Poisson log-linear model

  E[Y_i | x_i] = μ_i = exp(α + β x_i)

  log(μ_i) = α + β x_i

• GLM with Poisson random component and log link g(μ) = log(μ)
• Predicted response values fall between 0 and +∞
• In the case of a single predictor, an increase of one unit in x multiplies μ by a factor of exp(β)

Poisson Regression

• Consider the Poisson log-linear model

  log(μ_i) = α + β * Math_Score_i

  and the Poisson linear model

  μ_i = α + β * Math_Score_i

  Then a graph of the predicted values from the two models:

Poisson Regression

[Graph: Predicted Number of Days Absent versus Math Score (0-120), comparing predictions under the log link and the identity link]

Poisson Regression

• Single Continuous Predictor Variable – Math Score

> fitline <- glm(number.days.absent ~ ctbs.math.nce, data = poisson_data, family = poisson(link = log))
> summary(fitline)

Call:
glm(formula = number.days.absent ~ ctbs.math.nce, family = poisson(link = log),
    data = poisson_data)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-4.4451  -2.5583  -1.0842   0.6647  12.4431

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)    2.302100   0.062776  36.671   <2e-16 ***
ctbs.math.nce -0.011568   0.001294  -8.939   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Poisson Regression

• Single Continuous Predictor Variable – Math Score cont.

(Dispersion parameter for poisson family taken to be 1)

Null deviance: 2409.8 on 315 degrees of freedom
Residual deviance: 2330.6 on 314 degrees of freedom
AIC: 3196

Number of Fisher Scoring iterations: 6

Interpretation of the parameter estimate: Exp{-0.011568} = 0.98 = the multiplicative effect on the expected number of days absent for an increase of 1 in the math score.

Fabricated Example – If a student with a math score of 50 is expected to miss 5 days, then another student with a math score of 51 is expected to miss 5 * 0.98 = 4.9 days.

Poisson Regression

• Single Categorical Predictor Variable – Gender

> fitline <- glm(number.days.absent ~ factor(GENDER), data = poisson_data, family = poisson(link = log))
> summary(fitline)

Call:
glm(formula = number.days.absent ~ factor(GENDER), family = poisson(link = log),
    data = poisson_data)

Deviance Residuals:
   Min      1Q  Median      3Q     Max
-3.660  -2.755  -1.128   0.902   9.738

Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)      1.90174    0.03036  62.644  < 2e-16 ***
factor(GENDER)2 -0.31729    0.04747  -6.684 2.32e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Poisson Regression

• Single Categorical Predictor Variable – Gender cont.

(Dispersion parameter for poisson family taken to be 1)

Null deviance: 2409.8 on 315 degrees of freedom
Residual deviance: 2364.5 on 314 degrees of freedom
AIC: 3229.9

Number of Fisher Scoring iterations: 5

Important Note: The function factor(categorical variable) uses dummy coding.

Interpretation of the parameter estimate: Exp{-0.31729} = 0.7289 = the multiplicative effect on the expected number of days absent of being male rather than female. If a female student is expected to miss X days, then a male student is expected to miss 0.7289*X days.

Poisson Regression

• Variable Selection – Likelihood Ratio Test
  • Model with all variables

> fitline <- glm(number.days.absent ~ factor(GENDER) + factor(school.1.or.2) + ctbs.math.nce + ctbs.lang.nce + factor(bilingual.status) + factor(ethnicity), data = poisson_data, family = poisson(link = log))
> summary(fitline)

Call:
glm(formula = number.days.absent ~ factor(GENDER) + factor(school.1.or.2) +
    ctbs.math.nce + ctbs.lang.nce + factor(bilingual.status) +
    factor(ethnicity), family = poisson(link = log), data = poisson_data)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-4.5222  -2.1863  -0.9622   0.7454  10.4077

Poisson Regression

• Variable Selection – Likelihood Ratio Test
  • Model with all variables cont.

Coefficients:
                            Estimate Std. Error z value Pr(>|z|)
(Intercept)                 2.972325   0.424645   7.000 2.57e-12 ***
factor(GENDER)2            -0.401980   0.048954  -8.211  < 2e-16 ***
factor(school.1.or.2)2     -0.582321   0.070717  -8.235  < 2e-16 ***
ctbs.math.nce              -0.001043   0.001845  -0.565  0.57181
ctbs.lang.nce              -0.003048   0.002003  -1.521  0.12822
factor(bilingual.status)1  -0.344696   0.083754  -4.116 3.86e-05 ***
factor(bilingual.status)2  -0.282194   0.070846  -3.983 6.80e-05 ***
factor(bilingual.status)3  -0.053406   0.081850  -0.652  0.51409
factor(ethnicity)2         -0.131202   0.420704  -0.312  0.75515
factor(ethnicity)3         -0.434061   0.418013  -1.038  0.29909
factor(ethnicity)4         -0.326230   0.419158  -0.778  0.43639
factor(ethnicity)5         -0.876270   0.416398  -2.104  0.03534 *
factor(ethnicity)6         -1.188835   0.457470  -2.599  0.00936 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Poisson Regression

• Variable Selection – Likelihood Ratio Test
  • Model with all variables cont.

(Dispersion parameter for poisson family taken to be 1)

Null deviance: 2409.8 on 315 degrees of freedom

Residual deviance: 1909.2 on 303 degrees of freedom

AIC: 2796.6

Number of Fisher Scoring iterations: 6

Poisson Regression

• Variable Selection – Likelihood Ratio Test
  • Model with all variables except Ethnicity

> fitline <- glm(number.days.absent ~ factor(GENDER) + factor(school.1.or.2) + ctbs.math.nce + ctbs.lang.nce + factor(bilingual.status), data = poisson_data, family = poisson(link = log))
> summary(fitline)

Call:
glm(formula = number.days.absent ~ factor(GENDER) + factor(school.1.or.2) +
    ctbs.math.nce + ctbs.lang.nce + factor(bilingual.status),
    family = poisson(link = log), data = poisson_data)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-4.6955  -2.3130  -0.9115   0.7527  11.4247

Poisson Regression

• Variable Selection – Likelihood Ratio Test
  • Model with all variables except Ethnicity cont.

Coefficients:
                             Estimate Std. Error z value Pr(>|z|)
(Intercept)                 2.5741133  0.0838754  30.690  < 2e-16 ***
factor(GENDER)2            -0.4212841  0.0484383  -8.697  < 2e-16 ***
factor(school.1.or.2)2     -0.8242109  0.0570241 -14.454  < 2e-16 ***
ctbs.math.nce               0.0008193  0.0018278   0.448  0.65398
ctbs.lang.nce              -0.0050753  0.0019380  -2.619  0.00882 **
factor(bilingual.status)1  -0.3080131  0.0762534  -4.039 5.36e-05 ***
factor(bilingual.status)2  -0.1815997  0.0581877  -3.121  0.00180 **
factor(bilingual.status)3   0.0363656  0.0686396   0.530  0.59625
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Poisson Regression

• Variable Selection – Likelihood Ratio Test
  • Model with all variables except Ethnicity cont.

(Dispersion parameter for poisson family taken to be 1)

Null deviance: 2409.8 on 315 degrees of freedom

Residual deviance: 1984.1 on 308 degrees of freedom

AIC: 2861.5

Number of Fisher Scoring iterations: 6

Poisson Regression

• Variable Selection – Likelihood Ratio Test
  • Model 1 with all variables – Deviance = 1909.2 with df = 303
  • Model 2 without Ethnicity – Deviance = 1984.1 with df = 308
  • Likelihood Ratio Test = Deviance(Model 2) - Deviance(Model 1) = 1984.1 - 1909.2 = 74.9
  • Likelihood Ratio Test ~ Chi-Square with 308 - 303 = 5 degrees of freedom
  • P-Value < .0001 (verified in the R sketch below)
  • There is significant evidence to conclude that ethnicity is a significant predictor variable.
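The p-value can be checked directly in R from the values above:

pchisq(74.9, df = 5, lower.tail = FALSE)   # LRT statistic on 5 df; far below .0001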

Poisson Regression

• Model Selection
  • Forward Selection

> fitline <- glm(number.days.absent ~ 1, data = data1, family = poisson(link = log))
> step(fitline, scope = list(upper = ~factor(GENDER) + factor(school.1.or.2) + ctbs.math.nce + ctbs.lang.nce + factor(bilingual.status) + factor(ethnicity), lower = ~1), direction = "forward")

Start: AIC=3273.22
number.days.absent ~ 1

                            Df Deviance    AIC
+ factor(school.1.or.2)      1   2103.7 2969.1
+ factor(ethnicity)          5   2095.9 2969.3
+ ctbs.lang.nce              1   2311.7 3177.0
+ ctbs.math.nce              1   2330.6 3196.0
+ factor(bilingual.status)   3   2339.2 3208.6
+ factor(GENDER)             1   2364.5 3229.9
<none>                           2409.8 3273.2

Poisson Regression

• Model Selection
  • Forward Selection cont.

Step: AIC=2969.12
number.days.absent ~ factor(school.1.or.2)

                            Df Deviance    AIC
+ factor(ethnicity)          5   2018.7 2894.1
+ factor(GENDER)             1   2029.3 2896.7
+ factor(bilingual.status)   3   2066.0 2937.4
+ ctbs.lang.nce              1   2092.7 2960.1
+ ctbs.math.nce              1   2096.7 2964.1
<none>                           2103.7 2969.1

Poisson Regression

• Model Selection
  • Forward Selection cont.

Step: AIC=2894.07
number.days.absent ~ factor(school.1.or.2) + factor(ethnicity)

                            Df Deviance    AIC
+ factor(GENDER)             1   1951.3 2828.7
+ factor(bilingual.status)   3   1981.6 2863.0
+ ctbs.math.nce              1   2011.1 2888.5
+ ctbs.lang.nce              1   2012.5 2889.9
<none>                           2018.7 2894.1

Step: AIC=2828.67
number.days.absent ~ factor(school.1.or.2) + factor(ethnicity) + factor(GENDER)

                            Df Deviance    AIC
+ factor(bilingual.status)   3   1915.3 2798.8
+ ctbs.lang.nce              1   1938.5 2817.8
+ ctbs.math.nce              1   1942.3 2821.7
<none>                           1951.3 2828.7

Poisson Regression

• Model Selection
  • Forward Selection cont.

Step: AIC=2798.75
number.days.absent ~ factor(school.1.or.2) + factor(ethnicity) + factor(GENDER) + factor(bilingual.status)

                 Df Deviance    AIC
+ ctbs.lang.nce   1   1909.5 2794.9
+ ctbs.math.nce   1   1911.5 2796.9
<none>                1915.3 2798.8

Step: AIC=2794.89
number.days.absent ~ factor(school.1.or.2) + factor(ethnicity) + factor(GENDER) + factor(bilingual.status) + ctbs.lang.nce

                 Df Deviance    AIC
<none>                1909.5 2794.9
+ ctbs.math.nce   1   1909.2 2796.6

Poisson Regression

• Model Selection
  • Forward Selection cont.

Call: glm(formula = number.days.absent ~ factor(school.1.or.2) + factor(ethnicity) +
    factor(GENDER) + factor(bilingual.status) + ctbs.lang.nce,
    family = poisson(link = log), data = data1)

Coefficients:
(Intercept)                 2.948689
factor(school.1.or.2)2     -0.586678
factor(ethnicity)2         -0.126806
factor(ethnicity)3         -0.423376
factor(ethnicity)4         -0.313360
factor(ethnicity)5         -0.862743
factor(ethnicity)6         -1.175574
factor(GENDER)2            -0.404215
factor(bilingual.status)1  -0.343907
factor(bilingual.status)2  -0.284027
factor(bilingual.status)3  -0.051558
ctbs.lang.nce              -0.003763

Degrees of Freedom: 315 Total (i.e. Null); 304 Residual
Null Deviance: 2410

Poisson Regression

• Model Selection cont.

• Backward Selection

> fitline <- glm(number.days.absent ~ factor(GENDER) + factor(school.1.or.2) + ctbs.math.nce + ctbs.lang.nce + factor(bilingual.status) + factor(ethnicity), data = poisson_data, family = poisson(link = log))
> backwards <- step(fitline, direction = "backward")

Start: AIC=2796.57
number.days.absent ~ factor(GENDER) + factor(school.1.or.2) + ctbs.math.nce + ctbs.lang.nce + factor(bilingual.status) + factor(ethnicity)

                            Df Deviance    AIC
- ctbs.math.nce              1   1909.5 2794.9
<none>                           1909.2 2796.6
- ctbs.lang.nce              1   1911.5 2796.9
- factor(bilingual.status)   3   1937.8 2819.2
- factor(ethnicity)          5   1984.1 2861.5
- factor(GENDER)             1   1977.8 2863.2
- factor(school.1.or.2)      1   1983.6 2869.0

Poisson Regression

• Model Selection cont.

• Backward Selection cont.

Step: AIC=2794.89
number.days.absent ~ factor(GENDER) + factor(school.1.or.2) + ctbs.lang.nce + factor(bilingual.status) + factor(ethnicity)

                            Df Deviance    AIC
<none>                           1909.5 2794.9
- ctbs.lang.nce              1   1915.3 2798.8
- factor(bilingual.status)   3   1938.5 2817.8
- factor(ethnicity)          5   1984.3 2859.7
- factor(GENDER)             1   1979.4 2862.8
- factor(school.1.or.2)      1   1986.5 2869.9

Poisson Regression

• Model Selection cont.

• Stepwise Selection

> fitline <- glm(number.days.absent ~ 1, data = data1, family = poisson(link = log))
> step(fitline, scope = list(upper = ~factor(GENDER) + factor(school.1.or.2) + ctbs.math.nce + ctbs.lang.nce + factor(bilingual.status) + factor(ethnicity), lower = ~1), direction = "both")

Start: AIC=3273.22
number.days.absent ~ 1

                            Df Deviance    AIC
+ factor(school.1.or.2)      1   2103.7 2969.1
+ factor(ethnicity)          5   2095.9 2969.3
+ ctbs.lang.nce              1   2311.7 3177.0
+ ctbs.math.nce              1   2330.6 3196.0
+ factor(bilingual.status)   3   2339.2 3208.6
+ factor(GENDER)             1   2364.5 3229.9
<none>                           2409.8 3273.2

Poisson Regression

• Model Selection cont.

• Stepwise Selection cont.

Step: AIC=2969.12
number.days.absent ~ factor(school.1.or.2)

                            Df Deviance    AIC
+ factor(ethnicity)          5   2018.7 2894.1
+ factor(GENDER)             1   2029.3 2896.7
+ factor(bilingual.status)   3   2066.0 2937.4
+ ctbs.lang.nce              1   2092.7 2960.1
+ ctbs.math.nce              1   2096.7 2964.1
<none>                           2103.7 2969.1
- factor(school.1.or.2)      1   2409.8 3273.2

Poisson Regression

• Model Selection cont.

• Stepwise Selection cont.

Step: AIC=2894.07
number.days.absent ~ factor(school.1.or.2) + factor(ethnicity)

                            Df Deviance    AIC
+ factor(GENDER)             1   1951.3 2828.7
+ factor(bilingual.status)   3   1981.6 2863.0
+ ctbs.math.nce              1   2011.1 2888.5
+ ctbs.lang.nce              1   2012.5 2889.9
<none>                           2018.7 2894.1
- factor(ethnicity)          5   2103.7 2969.1
- factor(school.1.or.2)      1   2095.9 2969.3

Poisson Regression

• Model Selection cont.

• Stepwise Selection cont.

Step: AIC=2828.67
number.days.absent ~ factor(school.1.or.2) + factor(ethnicity) + factor(GENDER)

                            Df Deviance    AIC
+ factor(bilingual.status)   3   1915.3 2798.8
+ ctbs.lang.nce              1   1938.5 2817.8
+ ctbs.math.nce              1   1942.3 2821.7
<none>                           1951.3 2828.7
- factor(GENDER)             1   2018.7 2894.1
- factor(ethnicity)          5   2029.3 2896.7
- factor(school.1.or.2)      1   2050.5 2925.9

Poisson Regression

• Model Selection cont.

• Stepwise Selection cont.

Step: AIC=2798.75
number.days.absent ~ factor(school.1.or.2) + factor(ethnicity) + factor(GENDER) + factor(bilingual.status)

                            Df Deviance    AIC
+ ctbs.lang.nce              1   1909.5 2794.9
+ ctbs.math.nce              1   1911.5 2796.9
<none>                           1915.3 2798.8
- factor(bilingual.status)   3   1951.3 2828.7
- factor(GENDER)             1   1981.6 2863.0
- factor(ethnicity)          5   1993.4 2866.8
- factor(school.1.or.2)      1   2003.4 2884.8

Poisson Regression

• Model Selection cont.

• Stepwise Selection cont.

Step: AIC=2794.89
number.days.absent ~ factor(school.1.or.2) + factor(ethnicity) + factor(GENDER) + factor(bilingual.status) + ctbs.lang.nce

                            Df Deviance    AIC
<none>                           1909.5 2794.9
+ ctbs.math.nce              1   1909.2 2796.6
- ctbs.lang.nce              1   1915.3 2798.8
- factor(bilingual.status)   3   1938.5 2817.8
- factor(ethnicity)          5   1984.3 2859.7
- factor(GENDER)             1   1979.4 2862.8
- factor(school.1.or.2)      1   1986.5 2869.9

Poisson Regression

• Model Selection cont.

• Stepwise Selection cont.

Call: glm(formula = number.days.absent ~ factor(school.1.or.2) + factor(ethnicity) +
    factor(GENDER) + factor(bilingual.status) + ctbs.lang.nce,
    family = poisson(link = log), data = data1)

Coefficients:
(Intercept)                 2.948689
factor(school.1.or.2)2     -0.586678
factor(ethnicity)2         -0.126806
factor(ethnicity)3         -0.423376
factor(ethnicity)4         -0.313360
factor(ethnicity)5         -0.862743
factor(ethnicity)6         -1.175574
factor(GENDER)2            -0.404215
factor(bilingual.status)1  -0.343907
factor(bilingual.status)2  -0.284027
factor(bilingual.status)3  -0.051558
ctbs.lang.nce              -0.003763

Degrees of Freedom: 315 Total (i.e. Null); 304 Residual
Null Deviance: 2410
Residual Deviance: 1909
AIC: 2795

Poisson Regression

• Let's look back at the Poisson log-linear model

  log(μ_i) = α + β * Math_Score_i

• Taking the sample mean and sample standard deviation of the response for intervals of math scores (reproduced in R in the sketch below):

Math Score   Sample Mean   Sample Standard Deviation
0-20         11.66666667   10.64397095
20-40        6.453333333   6.595029523
40-60        5.270072993   7.382913152
60-80        4.324675325   5.434881392
80-100       9.666666667   14.50861813
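These interval summaries can be reproduced with cut() and tapply(), using the example data's column names:

bins <- cut(poisson_data$ctbs.math.nce, breaks = seq(0, 100, by = 20))
tapply(poisson_data$number.days.absent, bins, mean)   # sample means by math-score interval
tapply(poisson_data$number.days.absent, bins, sd)     # sample standard deviations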

Poisson Regression

• Overdispersion for Poisson Regression Models
  • For Y_i ~ Poisson(λ_i), E[Y_i] = Var[Y_i] = λ_i
  • Here the variance of the response is much larger than the mean.
  • Larger variance is known as overdispersion
  • Consequences: parameter estimates are still consistent, but the standard errors are inconsistent
  • Remedy: Negative Binomial model (see the sketch below)
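A sketch of the suggested remedy using glm.nb() from the MASS package (this fit is not part of the course output; the formula reuses the final model from the selection steps above):

library(MASS)
fitnb <- glm.nb(number.days.absent ~ factor(school.1.or.2) + factor(ethnicity) +
                factor(GENDER) + factor(bilingual.status) + ctbs.lang.nce,
                data = poisson_data)
summary(fitnb)   # estimates a dispersion parameter theta alongside the coefficients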

Poisson Regression

• Summary
  • Introduction to the Poisson Regression Model
  • Interpretation of β
  • Variable Significance – Likelihood Ratio Test
  • Model Selection
    • Forward
    • Backward
    • Stepwise
  • Overdispersion

Poisson Regression

Questions/Comments