Generalized linear panel data models


10. Generalized linear models
• 10.1 Homogeneous models
– Exponential families of distributions, link functions,
likelihood estimation
• 10.2 Example: Tort filings
• 10.3 Marginal models and GEE
• 10.4 Random effects models
• 10.5 Fixed effects models
– Maximum likelihood, conditional likelihood, Poisson data
• 10.6 Bayesian Inference
• Appendix 10A Exponential families of distributions
10.1 Homogeneous models
• Section Outline
– 10.1.1 Exponential families of distributions
– 10.1.2 Link functions
– 10.1.3 Likelihood estimation
• In this section, we consider only independent responses.
– No serial correlation.
– No random effects that would induce serial correlation.
Exponential families of distributions
• The basic one parameter exponential family is
 yq  b(q )

p( y,q ,f )  exp 
 S( y,f ) 
f


– Here, y is a response and q is the parameter of interest.
– The parameter f is a scale parameter that we often will
assume is known.
– The term b(q) depends only on the parameter q, not the
responses.
– S(y, f ) depends only on the responses and the scale
parameter, not the parameter q.
– The response y may be discrete or continuous.
• Some straightforward calculations show that
E y = b′(θ) and Var y = b″(θ) φ.
Special cases of the basic exponential family
• Normal
– The probability density function is
f(y, μ, σ²) = ( σ √(2π) )^(−1) exp( −(y − μ)²/(2σ²) )
= exp( ( yμ − μ²/2 )/σ² − y²/(2σ²) − ln(2πσ²)/2 )
– Take μ = θ, σ² = φ, b(θ) = θ²/2 and
S(y, φ) = −y²/(2φ) − ln(2πφ)/2 .
– Note that E y = b′(θ) = θ = μ and Var y = b″(μ) σ² = σ².

• Binomial, n trials and probability p of success
– The probability mass function is
f(y, p) = (n choose y) p^y (1 − p)^(n−y)
= exp( y ln( p/(1 − p) ) + n ln(1 − p) + ln (n choose y) )
– Take ln( p/(1 − p) ) = logit(p) = θ, φ = 1, b(θ) = n ln(1 + e^θ) and
S(y, φ) = ln (n choose y) .
– Note that E y = b′(θ) = n e^θ/(1 + e^θ) = n p and Var y = b″(θ)(1)
= n e^θ/(1 + e^θ)² = n p(1 − p) , as anticipated.
Another special case of the basic exponential
family
• Poisson
– The probability mass function is
f(y, λ) = λ^y exp(−λ) / y! = exp( y ln λ − λ − ln y! )
– Take ln λ = θ, φ = 1, b(θ) = e^θ and S(y, φ) = −ln(y!) .
– Note that E y = b′(θ) = e^θ = λ and
– Var y = b″(θ)(1) = e^θ = λ , as anticipated.
Table 10A.1 Selected Distributions of the One-Parameter Exponential Family

General
– Parameters: θ, φ. Density or mass function: exp( ( yθ − b(θ) )/φ + S(y, φ) ).
– Components: θ, φ, b(θ), S(y, φ). E y = b′(θ). Var y = b″(θ) φ.
Normal
– Parameters: μ, σ². Density: ( σ √(2π) )^(−1) exp( −(y − μ)²/(2σ²) ).
– θ = μ, φ = σ², b(θ) = θ²/2, S(y, φ) = −y²/(2φ) − ln(2πφ)/2.
– E y = θ = μ. Var y = φ = σ².
Binomial
– Parameter: p. Mass function: (n choose y) p^y (1 − p)^(n−y).
– θ = ln( p/(1 − p) ), φ = 1, b(θ) = n ln(1 + e^θ), S(y, φ) = ln (n choose y).
– E y = n e^θ/(1 + e^θ) = np. Var y = n e^θ/(1 + e^θ)² = np(1 − p).
Poisson
– Parameter: λ. Mass function: λ^y exp(−λ)/y!.
– θ = ln λ, φ = 1, b(θ) = e^θ, S(y, φ) = −ln(y!).
– E y = e^θ = λ. Var y = e^θ = λ.
Gamma
– Parameters: α, γ. Density: γ^α y^(α−1) exp(−yγ)/Γ(α).
– θ = −γ/α, φ = α^(−1), b(θ) = −ln(−θ),
S(y, φ) = φ^(−1) ln(φ^(−1)) − ln Γ(φ^(−1)) + (φ^(−1) − 1) ln y.
– E y = −1/θ = α/γ. Var y = φ/θ² = α/γ².
10.1.2 Link functions
• To link up the univariate exponential family with regression
problems, we define the systematic component of yit to be
ηit = xit′β .
• The idea is now to choose a “link” between the systematic
component and the mean of yit , say μit , of the form:
ηit = g(μit) .
– g(·) is the link function.
• Linear combinations of explanatory variables, ηit = xit′β,
may vary between negative and positive infinity.
– However, means may be restricted to a smaller range. For
example, Poisson means vary between zero and infinity.
– The link function serves to map the set of possible mean
values onto the whole real line.
Bernoulli illustration of links
• Bernoulli means vary between 0 and 1, although linear
combinations of explanatory variables may vary between
negative and positive infinity.
• Here are three important examples of link functions for the
Bernoulli distribution:
– Logit: η = g(μ) = logit(μ) = ln( μ/(1 − μ) ) .
– Probit: η = g(μ) = Φ^(−1)(μ) , where Φ^(−1) is the inverse of
the standard normal distribution function.
– Complementary log-log: η = g(μ) = ln( −ln(1 − μ) ) .
• Each function maps the unit interval (0,1) onto the whole
real line.
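The three links can be sketched in a few lines of code; this is an illustration with arbitrary evaluation points, using the standard library's inverse normal distribution function for the probit:

```python
import math
from statistics import NormalDist

# Three Bernoulli links: each maps a mean in (0, 1) onto the whole real line.
def logit(m):
    return math.log(m / (1 - m))

def probit(m):
    return NormalDist().inv_cdf(m)  # inverse standard normal cdf

def cloglog(m):
    return math.log(-math.log(1 - m))

for m in (0.001, 0.5, 0.999):
    print(round(logit(m), 3), round(probit(m), 3), round(cloglog(m), 3))
```

Means near 0 or 1 map to large negative or positive values, and μ = 1/2 maps to 0 under the logit and probit links.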
Canonical links
• As we have seen with the Bernoulli, there are several link
functions that may be suitable for a particular distribution.
• When the systematic component equals the parameter of
interest (η = θ), we have an intuitively appealing case.
– That is, the parameter of interest, θ, equals a linear
combination of explanatory variables, η.
– Recall that η = g(μ) and μ = b′(θ).
– Thus, if g^(−1) = b′, then η = g(b′(θ)) = θ.
– The choice of g such that g^(−1) = b′ is called the canonical
link.
• Examples: Normal: g(μ) = μ; Binomial: g(μ) = logit(μ/n);
Poisson: g(μ) = ln μ.
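A one-line numeric illustration (with arbitrary θ values) of the canonical-link property in the Poisson case, where g = ln is the inverse of b′ = exp:

```python
import math

# Canonical link check for the Poisson: g(mu) = ln(mu) inverts b'(theta) = e^theta,
# so eta = g(b'(theta)) recovers theta exactly.
for theta in (-1.0, 0.0, 2.5):
    mu = math.exp(theta)  # mu = b'(theta)
    assert math.isclose(math.log(mu), theta, abs_tol=1e-12)
print("g(b'(theta)) = theta holds for each test value")
```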
10.1.3 Estimation
• Begin with likelihood estimation for canonical links
• Consider responses yit , with mean μit , systematic component
ηit = g(μit) = xit′β and canonical link, so that ηit = θit .
– Assume the responses are independent.
• Then, the log-likelihood is
ln p(y) = Σit [ ( yit θit − b(θit) )/φ + S(yit , φ) ]
= Σit [ ( yit xit′β − b(xit′β) )/φ + S(yit , φ) ]
MLEs - Canonical links
• The log-likelihood is
ln p(y) = Σit [ ( yit xit′β − b(xit′β) )/φ + S(yit , φ) ]
• Taking the partial derivative with respect to β yields the score
equations:
0 = Σit xit ( yit − b′(xit′β) )/φ = Σit xit ( yit − μit )/φ
– because μit = b′(θit) = b′(xit′β).
• Thus, we can solve for the MLEs of β through:
0 = Σit xit ( yit − μit ).
– This is a special case of the method of moments.
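A minimal sketch of solving these moment equations, assuming invented toy data and a two-dimensional β (intercept plus one covariate) for the Poisson case, μit = exp(xit′β); Newton-Raphson with the information matrix Σ x x′ μ does the work:

```python
import math

# Solve 0 = sum_it x_it (y_it - mu_it), mu_it = exp(x_it' beta), by Newton-Raphson.
# Toy data: x rows are (intercept, covariate); y are counts.
x = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0), (1.0, 3.0)]
y = [1.0, 2.0, 4.0, 9.0]

beta = [0.0, 0.0]
for _ in range(50):
    mu = [math.exp(beta[0] * x0 + beta[1] * x1) for x0, x1 in x]
    # score vector sum x (y - mu) and 2x2 information matrix sum x x' mu
    s0 = sum(x0 * (yi - mi) for (x0, x1), yi, mi in zip(x, y, mu))
    s1 = sum(x1 * (yi - mi) for (x0, x1), yi, mi in zip(x, y, mu))
    i00 = sum(x0 * x0 * mi for (x0, x1), mi in zip(x, mu))
    i01 = sum(x0 * x1 * mi for (x0, x1), mi in zip(x, mu))
    i11 = sum(x1 * x1 * mi for (x0, x1), mi in zip(x, mu))
    det = i00 * i11 - i01 * i01
    beta[0] += (i11 * s0 - i01 * s1) / det
    beta[1] += (i00 * s1 - i01 * s0) / det

# At the solution the method-of-moments equations hold for each column of x.
mu = [math.exp(beta[0] * x0 + beta[1] * x1) for x0, x1 in x]
print(round(sum(yi - mi for yi, mi in zip(y, mu)), 6),
      round(sum(x1 * (yi - mi) for (x0, x1), yi, mi in zip(x, y, mu)), 6))
```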
MLEs - general links
• For general links, we no longer assume the relation qit = xit .
• We assume that  is related to qit through
mit = b(qit) and hit = xit  = g(mit).
• Recall that the log-likelihood is
 yit θit  b(θit )

ln p(y )   
 S( yit ,f )
f

it 
– Further, E yit = mit and Var yit = b(qit) / f .
• The jth element of the score function is
 θ it y it  m it 

ln p(y ) 


 j
f

it 
  j

– because b (qit) = mit
MLEs - more on general links
• To eliminate θit , we use the chain rule to get
∂μit/∂βj = b″(θit) ∂θit/∂βj .
• Thus,
(1/φ) ∂θit/∂βj = ( ∂μit/∂βj ) / ( φ b″(θit) ) = ( ∂μit/∂βj ) / Var yit .
• This yields
∂/∂βj ln p(y) = Σit ( ∂μit/∂βj ) ( Var yit )^(−1) ( yit − μit ).
• This is called the generalized estimating equations form.
Overdispersion
• When fitting models to data with binary or count dependent
variables, it is common to observe that the variance exceeds
that anticipated by the fit of the mean parameters.
– This phenomenon is known as overdispersion.
– A probabilistic model may be available to explain this
phenomenon.
• In many situations, analysts are content to postulate an
approximate model through the relation
Var yit = s 2 f b(xit β) / wit.
– The scale parameter f is specified through the choice of
the distribution
– The scale parameter σ2 allows for extra variability.
• When the additional scale parameter σ2 is included, it is
customary to estimate it by Pearson’s chi-square statistic
divided by the error degrees of freedom. That is,
σ̂² = ( 1/(N − K) ) Σit wit ( yit − b′(xit′ bMLE) )² / ( φ b″(xit′ bMLE) )
10.2 Example: Tort filings
Table 10.2 State Characteristics
Dependent Variable
NUMFILE
Number of filings of tort actions against insurance companies.
State Legal Characteristics
JSLIAB
An indicator of joint and several liability reform.
COLLRULE
An indicator of collateral source reform.
CAPS
An indicator of caps on non-economic damages.
PUNITIVE
An indicator of limits on punitive damages.
State Economic and Demographic Characteristics
POP
The state population, in millions.
POPLAWYR
The population per lawyer.
VEHCMILE
Number of automobile miles per mile of road, in thousands.
POPDENSY
Number of people per ten square miles of land.
WCMPMAX
Maximum workers’ compensation weekly benefit.
URBAN
Percentage of population living in urban areas.
UNEMPLOY
State unemployment rate, in percentages.
Source: An Empirical Study of the Effects of Tort Reforms on the Rate of Tort Filings,
unpublished Ph.D. Dissertation, Han-Duck Lee, University of Wisconsin (1994).
Table 10.3 Averages with Explanatory Indicator Variables

Explanatory  Average              Average NUMFILE      Average NUMFILE
Variable     Explanatory Variable When Variable = 0    When Variable = 1
JSLIAB       0.491                15,530               24,682
COLLRULE     0.304                25,967               6,727
CAPS         0.232                20,727               17,693
PUNITIVE     0.321                20,027               26,469
Table 10.4 Summary Statistics for Other Variables

Variable   Mean    Median  Minimum  Maximum  Standard   Correlation
                                             deviation  with NUMFILE
NUMFILE    20514   9085    512      137455   29039      1.0
POP        6.7     3.4     0.5      29.0     7.2        0.902
POPLAWYR   377.3   382.5   211.0    537.0    75.7       -0.378
VEHCMILE   654.8   510.5   63.0     1899.0   515.4      0.518
POPDENSY   168.2   63.9    0.9      1043.0   243.9      0.368
WCMPMAX    350.0   319.0   203.0    1140.0   151.7      -0.265
URBAN      69.4    78.9    18.9     100.0    24.8       0.550
UNEMPLOY   6.2     6.0     2.6      10.8     1.6        0.008
Offsets
• We assume that yit is Poisson distributed with parameter
POPit exp(xit′β),
– where POPit is the population of the ith state at time t.
• In GLM terminology, a variable with a known coefficient
equal to 1 is known as an offset.
• Using logarithmic population, our Poisson parameter for yit
is
exp( ln POPit + xit,1 β1 + … + xit,K βK ) = exp( ln POPit + xit′β ) = POPit exp(xit′β)
• An alternative approach is to use the average number of tort
filings as the response and assume approximate normality.
– Note that in the Poisson model above, the expectation of
the average response is
E( yit / POPit ) = exp(xit′β),
– whereas the variance is
Var( yit / POPit ) = exp(xit′β) / POPit .
Tort filings
• Purpose: to understand ways in which state legal, economic
and demographic characteristics affect the number of
filings.
• Table 10.3 suggests more filings under JSLIAB and
PUNITIVE but fewer under CAPS.
• Table 10.5
– All variables under the homogeneous model are
statistically significant.
– However, the estimated scale parameter seems important.
• Here, only JSLIAB is (positively) statistically significant.
– The time (categorical) variable seems important.
Table 10.5 Tort Filings Model Coefficient Estimates
Based on N = 112 observations from n = 19 states and T = 6 years.
Logarithmic population is used as an offset.

                    Homogeneous          Model with estimated  Model with scale parameter
                    Poisson model        scale parameter       and time categorical variable
Variable            Estimate  p-value    Estimate  p-value     Estimate  p-value
Intercept           -7.943    <.0001     -7.943    <.0001
POPLAWYR/1000        2.163    <.0001      2.163    0.0002       2.123    0.0004
VEHCMILE/1000        0.862    <.0001      0.862    <.0001       0.856    <.0001
POPDENSY/1000        0.392    <.0001      0.392    0.0038       0.384    0.0067
WCMPMAX/1000        -0.802    <.0001     -0.802    0.1226      -0.826    0.1523
URBAN/1000           0.892    <.0001      0.892    0.8187       0.977    0.8059
UNEMPLOY             0.087    <.0001      0.087    0.0005       0.086    0.0024
JSLIAB               0.177    <.0001      0.177    0.0292       0.130    0.2705
COLLRULE            -0.030    <.0001     -0.030    0.7444      -0.023    0.8053
CAPS                -0.032    <.0001     -0.032    0.7457      -0.056    0.6008
PUNITIVE             0.030    <.0001      0.030    0.6623       0.053    0.4986
Scale                1.000               35.857                36.383
Deviance            118,309.0            118,309.0             115,496.4
Pearson Chi-Square  129,855.7            129,855.7             127,073.9
10.3 Marginal models
• This approach reduces the reliance on the distributional
assumptions by focusing on the first two moments.
• We first assume that the variance is a known function of the
mean up to a scale parameter, that is, Var yit = v(μit) φ.
– In the exponential family this is a consequence; here it is
a basic assumption.
– That is, in the GLM setting, we have Var yit = b″(θit) φ and
μit = b′(θit).
– Because b(·) and φ are assumed known, Var yit is a known
function of μit .
• We also assume that the correlation between two observations
within the same subject is a known function of their means, up
to a vector of parameters t.
– That is, corr(yir , yis ) = ρ(μir , μis , τ) , for a known function ρ(·).
Marginal model
• This framework incorporates the linear model nicely; we
simply use a GLM with a normal distribution.
– However, for nonlinear situations, a correlation is not
always the best way to capture dependencies among
observations.
• Here is some notation to help see the estimation procedures.
• Define μi = (μi1 , μi2 , ..., μiTi)′ to be the vector of means for
the ith subject.
• To express the variance-covariance matrix, we
– define a diagonal matrix of variances
Vi = diag( v(μi1), ..., v(μiTi) )
– and define the matrix of correlations Ri(τ) to have
ρ(μir , μis , τ) in the rth row and sth column.
– Thus, Var yi = Vi^(1/2) Ri(τ) Vi^(1/2) .
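A sketch of assembling Var yi = Vi^(1/2) Ri(τ) Vi^(1/2) for one subject, assuming the Poisson variance function v(μ) = μ and an exchangeable working correlation; the means and τ are invented:

```python
import math

# Build the variance-covariance matrix V^{1/2} R(tau) V^{1/2} for one subject
# with Poisson variance function v(mu) = mu and exchangeable correlation tau.
mu = [2.0, 4.0, 8.0]   # toy means mu_i1, mu_i2, mu_i3
tau = 0.3
T = len(mu)

R = [[1.0 if r == s else tau for s in range(T)] for r in range(T)]
sd = [math.sqrt(m) for m in mu]  # v(mu)^{1/2}
cov = [[sd[r] * R[r][s] * sd[s] for s in range(T)] for r in range(T)]

# Diagonal entries are the Poisson variances; off-diagonals carry tau.
print([round(cov[0][0], 4), round(cov[1][1], 4), round(cov[0][1], 4)])
```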
Generalized estimating equations
• These assumptions are suitable for a method of moments estimation
procedure called “generalized estimating equations” (GEE) in
biostatistics, also known as the generalized method of moments
(GMM) in econometrics.
• GEE with known correlation parameter
– Assuming τ is known, the GEE for β is
0K = Σi Gμ(b)′ Vi(b)^(−1) ( yi − μi(b) )
– Here, the matrix
Gμ(β, τ) = ( ∂μi1/∂β, ..., ∂μiTi/∂β )′
is Ti × K.
• For linear models with μit = zit′αi + xit′β, this is the GLS estimator
introduced in Section 3.3.
Consistency of GEEs
• The solution, bEE , is asymptotically normal with
covariance matrix
Var bEE = ( Σi Gμ(b)′ Vi(b)^(−1) Gμ(b) )^(−1)
– Because this is a function of the means, μi , it can be
consistently estimated.
Robust estimation of standard errors
• Empirical standard errors may be calculated using the
following estimator of the asymptotic variance of bEE :
( Σi Gμ′ Vi^(−1) Gμ )^(−1) ( Σi Gμ′ Vi^(−1) (yi − μi)(yi − μi)′ Vi^(−1) Gμ ) ( Σi Gμ′ Vi^(−1) Gμ )^(−1)
GEE - correlation parameter estimation
• For GEEs with unknown correlation parameters, Prentice
(1988) suggests using a second estimating equation of the
form:
Σi ( ∂ E yi* / ∂τ )′ Wi^(−1) ( yi* − E yi* ) = 0
– where
yi* = ( yi1², yi1 yi2 , yi1 yi3 , ..., yi1 yiTi , yi2², ..., yiTi² )′
– Diggle, Liang and Zeger (1994) suggest using the
identity matrix for Wi for most discrete data.
• However, for binary responses,
– they note that the last Ti observations are redundant,
because yit² = yit , and should be ignored.
– they recommend using
Wi = diag( Var(yi1 yi2), ..., Var(yi,Ti−1 yiTi) ) .
Tort filings
• Assume an independent working correlation.
– This yields the same parameter estimates as in Table
10.5 under the homogeneous Poisson model with an
estimated scale parameter.
– JSLIAB is (positively) statistically significant, using
both model-based and robust standard errors.
• To test the robustness of this model fit, we fit the same
model with an AR(1) working correlation.
– Again, JSLIAB is (positively) statistically significant.
– It is interesting that CAPS is now borderline, but in the
opposite direction from that suggested by Table 10.3.
Table 10.6 Comparison of GEE Estimators.
All models use an estimated scale parameter.
Logarithmic population is used as an offset.

                   Independent Working Correlation   AR(1) Working Correlation
Parameter          Estimate  Model-Based  Empirical  Estimate  Model-Based  Empirical
                             Std Error    Std Error            Std Error    Std Error
Intercept          -7.943*   0.435        0.612      -7.840*   0.806        0.870
POPLAWYR/1000       2.163    0.589        1.101       2.231    0.996        1.306
VEHCMILE/1000       0.862*   0.120        0.265       0.748*   0.180        0.166
POPDENSY/1000       0.392*   0.135        0.175       0.400*   0.223        0.181
WCMPMAX/1000       -0.802    0.519        0.895      -0.764    0.506        0.664
URBAN/1000          0.892    3.891        5.367       3.508    7.130        7.251
UNEMPLOY            0.087*   0.025        0.042       0.048*   0.021        0.018
JSLIAB              0.177*   0.081        0.089       0.139*   0.049        0.020
COLLRULE           -0.030    0.091        0.120      -0.014    0.065        0.079
CAPS               -0.032    0.098        0.098       0.142*   0.066        0.068
PUNITIVE            0.030    0.068        0.125      -0.043    0.049        0.054
Scale              35.857                            35.857
AR(1) Coefficient                                     0.8541

The asterisk (*) indicates that the estimate is more than twice the empirical standard error, in absolute value.
10.4 Random effects models
• The motivation and sampling issues regarding random
effects were introduced in Chapter 3.
• The model is easiest to introduce and interpret in the
following hierarchical fashion:
– 1. Subject effects {αi} are a random sample from a
distribution that is known up to a vector of parameters τ.
– 2. Conditional on {αi}, the responses
{yi1 , yi2 , ..., yiTi} are a random sample from a GLM with
systematic component ηit = zit′αi + xit′β .
Random effects models
• This model is a generalization of:
– 1. The linear random effects model in Chapter 3 - use a
normal distribution.
– 2. The binary dependent variables random effects model
of Section 9.2 - using a Bernoulli distribution. (In
Section 9.2, we focused on the case zit =1.)
• Because we are sampling from a known distribution with a
finite/small number of parameters, the maximum likelihood
method of estimation is readily available.
• We will use this method, assuming normally distributed
random effects.
• Also available in the literature is the EM (expectation-maximization)
algorithm for estimation; see Diggle, Liang and Zeger (1994).
Random effects likelihood
• Conditional on αi , the likelihood for the ith subject at the tth
observation is
exp( ( yit θit − b(θit) )/φ + S(yit , φ) )
• where b′(θit) = E(yit | αi) and ηit = zit′αi + xit′β = g( E(yit | αi) ).
• Conditional on αi , the likelihood for the ith subject is:
exp( Σt [ ( yit θit − b(θit) )/φ + S(yit , φ) ] )
• We take expectations over αi to get the (unconditional) likelihood.
• To see this explicitly, let us use the canonical link, so that θit = ηit .
The (unconditional) likelihood for the ith subject is
li = exp( Σt S(yit , φ) ) ∫ exp( Σt [ yit (zit′a + xit′β) − b(zit′a + xit′β) ]/φ ) dG(a)
• Hence, the total log-likelihood is Σi ln li .
– The constant Σit S(yit , φ) is unimportant for determining MLEs.
– Although evaluating, and maximizing, the likelihood requires
numerical integration, it is easy to do on the computer.
Random effects and serial correlation
• We saw in Chapter 3 that permitting subject-specific
effects, αi , to be random induced serial correlation in the
responses yit .
– This is because the variance-covariance matrix of yi is
no longer diagonal.
• This is also true for the nonlinear GLM models. To see this,
– let us use a canonical link and
– recall that E(yit | αi) = b′(θit) = b′(ηit) = b′(αi + xit′β).
Covariance calculations
• The covariance between two responses, yi1 and yi2 , is
Cov(yi1 , yi2) = E yi1 yi2 − E yi1 E yi2
= E{ b′(αi + xi1′β) b′(αi + xi2′β) }
− E b′(αi + xi1′β) E b′(αi + xi2′β)
• To see this, use the law of iterated expectations:
E yi1 yi2 = E E(yi1 yi2 | αi)
= E{ E(yi1 | αi) E(yi2 | αi) }
= E{ b′(αi + xi1′β) b′(αi + xi2′β) }
More covariance calculations
• Normality
– For the normal distribution we have b′(a) = a.
– Thus, Cov(yi1 , yi2)
= E{ (αi + xi1′β)(αi + xi2′β) } − E(αi + xi1′β) E(αi + xi2′β)
= E αi² + (xi1′β)(xi2′β) − (xi1′β)(xi2′β) = Var αi (taking E αi = 0).
• For the Poisson, we have b′(a) = e^a. Thus,
E yit = E b′(αi + xit′β) = E exp(αi + xit′β)
= exp(xit′β) E exp(αi) , and
Cov(yi1 , yi2)
= E{ exp(αi + xi1′β) exp(αi + xi2′β) } − exp( (xi1 + xi2)′β ) { E exp(αi) }²
= exp( (xi1 + xi2)′β ) { E exp(2αi) − ( E exp(αi) )² }
= exp( (xi1 + xi2)′β ) Var exp(αi) .
Random effects likelihood
• Recall that the (unconditional) likelihood for the
ith subject is
li = exp( Σt S(yit , φ) ) ∫ exp( Σt [ yit (zit′a + xit′β) − b(zit′a + xit′β) ]/φ ) dG(a)
• Here, we use zit = 1, φ = 1, and g(a) as the density of αi .
• For the Poisson, we have b(a) = e^a and S(y, φ) = −ln(y!), so the
likelihood is
li = exp( −Σt ln(yit!) ) ∫ exp( Σt [ yit (a + xit′β) − exp(a + xit′β) ] ) g(a) da
= exp( −Σt ln(yit!) ) exp( Σt yit xit′β ) ∫ exp( Σt [ yit a − exp(a + xit′β) ] ) g(a) da
• As before, evaluating and maximizing the likelihood requires
numerical integration, yet it is easy to do on the computer.
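A sketch of that numerical integration for a single subject's likelihood li, assuming a normal density g(a) with mean zero and using a simple midpoint-rule grid; the data and parameter values are invented:

```python
import math

# Numerically integrate the Poisson random-intercept likelihood l_i for one
# subject: alpha ~ N(0, sigma2), canonical link; toy data and parameters.
y = [1, 0, 2]            # responses y_i1..y_iT
eta = [0.2, -0.1, 0.5]   # fixed-effect parts x_it' beta (assumed values)
sigma2 = 0.5

def integrand(a):
    # exp( sum_t [ y_it (a + eta_it) - exp(a + eta_it) - ln(y_it!) ] ) * phi(a)
    loglik = sum(yt * (a + e) - math.exp(a + e) - math.lgamma(yt + 1)
                 for yt, e in zip(y, eta))
    density = math.exp(-a * a / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)
    return math.exp(loglik) * density

# midpoint rule on a wide grid; the integrand decays quickly in |a|
h = 0.002
li = sum(integrand(-8 + (k + 0.5) * h) * h for k in range(8000))
print(0.0 < li < 1.0)
```

In practice, Gauss-Hermite quadrature is the usual refinement of this brute-force grid.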
Table 10.7 Tort Filings Model Coefficient Estimates – Random Effects
Logarithmic population is used as an offset.

                   Homogeneous Model with     Random Effects Model
                   estimated scale parameter
Variable           Estimate  p-value          Estimate  p-value
Intercept          -7.943    <.0001           -2.753    <.0001
POPLAWYR/1000       2.163    0.0002           -2.694    <.0001
VEHCMILE/1000       0.862    <.0001           -0.183    0.0004
POPDENSY/1000       0.392    0.0038            9.547    <.0001
WCMPMAX/1000       -0.802    0.1226           -1.900    <.0001
URBAN/1000          0.892    0.8187          -47.820    <.0001
UNEMPLOY            0.087    0.0005            0.035    <.0001
JSLIAB              0.177    0.0292            0.546    0.3695
COLLRULE           -0.030    0.7444           -1.031    0.1984
CAPS               -0.032    0.7457            0.391    0.5598
PUNITIVE            0.030    0.6623            0.110    0.8921
State Variance                                 2.711
-2 Log Likelihood  119,576                    15,623
10.5 Fixed effects models
• Consider responses yit , with mean μit , systematic component
ηit = g(μit) = zit′αi + xit′β and canonical link, so that ηit = θit .
– Assume the responses are independent.
• Then, the log-likelihood is
ln p(y) = Σit [ ( yit θit − b(θit) )/φ + S(yit , φ) ]
= Σit [ ( yit (zit′αi + xit′β) − b(zit′αi + xit′β) )/φ + S(yit , φ) ]
• Thus, the likelihood depends on the parameters only through
summary statistics.
– That is, the statistics Σt yit zit are sufficient for αi .
– The statistics Σit yit xit are sufficient for β.
– This is a convenient property of canonical links. It is
not available for other choices of link.
MLEs - Canonical links
• The log-likelihood is
ln p(y) = Σit [ ( yit (zit′αi + xit′β) − b(zit′αi + xit′β) )/φ + S(yit , φ) ]
• Taking the partial derivative with respect to αi yields:
0 = Σt zit ( yit − b′(zit′αi + xit′β) )/φ = Σt zit ( yit − μit )/φ
– because μit = b′(θit) = b′(zit′αi + xit′β).
• Taking the partial derivative with respect to β yields:
0 = Σit xit ( yit − b′(zit′αi + xit′β) )/φ = Σit xit ( yit − μit )/φ
• Thus, we can solve for the MLEs of αi and β through:
0 = Σt zit ( yit − μit ) and 0 = Σit xit ( yit − μit ).
– This is a special case of the method of moments.
– This may produce inconsistent estimates of β, as we have seen in
Chapter 9.
Table 10.8 Tort Filings Model Coefficient Estimates – Fixed Effects
All models have an estimated scale parameter.
Logarithmic population is used as an offset.

                   Homogeneous Model    Model with state      Model with state and time
                                        categorical variable  categorical variables
Variable           Estimate  p-value    Estimate  p-value     Estimate  p-value
Intercept          -7.943    <.0001
POPLAWYR/1000       2.163    0.0002      0.788    0.5893      -0.428    0.7869
VEHCMILE/1000       0.862    <.0001      0.093    0.7465       0.303    0.3140
POPDENSY/1000       0.392    0.0038      4.351    0.2565       3.123    0.4385
WCMPMAX/1000       -0.802    0.1226      0.546    0.3791       1.150    0.0805
URBAN/1000          0.892    0.8187    -33.941    0.3567     -31.910    0.4080
UNEMPLOY            0.087    0.0005      0.028    0.1784       0.014    0.5002
JSLIAB              0.177    0.0292      0.131    0.0065       0.120    0.0592
COLLRULE           -0.030    0.7444     -0.024    0.6853      -0.016    0.7734
CAPS               -0.032    0.7457      0.079    0.2053       0.040    0.5264
PUNITIVE            0.030    0.6623     -0.022    0.6377       0.039    0.4719
Scale              35.857               16.779                16.315
Deviance           118,309.0            22,463.4              19,834.2
Pearson Chi-Square 129,855.7            23,366.1              20,763.0
Conditional likelihood estimation
• Assume the canonical link so that qit = hit = zit i + xit .
• Define the likelihood for a single observation to be
 yit (zit α i  xit β )  b(zit α i  xit β )

p( yit , α i , β)  exp
 S( yit ,f ) 
f


• Let Si be the random vector representing St zit yit and let sumi
be the realization of St zit yit .
– Recall that St zit yit are sufficient for i .
• The conditional likelihood of the data set is
 p( yi1 , α i , β) p( yi 2 , α i , β)p( yiTi , α i , β) 


Pr
ob
(
S

sum
)
i
i
i 1 

n

– This likelihood does not depend on {i}, only on .
– Maximizing it with respect to  yields root-n consistent
estimates.
• In general, the distribution of Si is messy and difficult to compute.
Poisson distribution
• The Poisson is the most widely used distribution for
counted responses.
– Examples include the number of migrants from state to
state and the number of tort filings within a state.
• A feature of the fixed effects version of the model is that the
mean equals the variance.
• To illustrate the application of Poisson panel data models,
let us use the canonical link and zit = 1, so that
ln E(yit | αi) = g( E(yit | αi) ) = θit = ηit = αi + xit′β .
• Through the log function, this links the mean to a linear
combination of explanatory variables. It is the basis of the
so-called “log-linear” model.
Conditional likelihood estimation
• We first examine the fixed effects model and thus assume
that {αi} are fixed parameters.
– Thus, E yit = exp(αi + xit′β).
– The distribution is
p(yit , αi , β) = (E yit)^{yit} exp(−E yit) / yit!
– From Section 10.5, Σt yit is a sufficient statistic for αi .
• The distribution of Σt yit turns out to be Poisson, with mean
exp(αi) Σt exp(xit′β) .
• Note that the ratio of means,
πit = E yit / Σt E yit = exp(xit′β) / Σt exp(xit′β)
– does not depend on αi .
Conditional likelihood details
• Thus, the conditional likelihood for the
ith subject is
p(yi1 , αi , β) p(yi2 , αi , β) ⋯ p(yiTi , αi , β) / Prob( Si = Σt yit )
= [ (E yi1)^{yi1} ⋯ (E yiTi)^{yiTi} exp( −(E yi1 + ⋯ + E yiTi) ) / ( yi1! ⋯ yiTi! ) ]
÷ [ ( Σt E yit )^{Σt yit} exp( −Σt E yit ) / ( Σt yit )! ]
Conditional likelihood details
• Simplifying, this ratio equals
[ ( Σt yit )! / ( yi1! ⋯ yiTi! ) ] Πt πit^{yit}
– where
πit = E yit / Σt E yit = exp(xit′β) / Σt exp(xit′β)
– This is a multinomial distribution.
Multinomial distribution
• Thus, the joint distribution of yi1 , ..., yiTi given Σt yit has a
multinomial distribution.
• The conditional likelihood is:
L = Πi [ ( Σt yit )! / ( yi1! ⋯ yiTi! ) Πt πit(β)^{yit} ]
• Taking partial derivatives yields:
∂/∂β ln L = Σit yit ∂/∂β ln πit(β) = Σit yit ( xit − Σt xit πit(β) )
– where
πit(β) = exp(xit′β) / Σt exp(xit′β) .
• Thus, the conditional MLE, b, is the solution of:
Σit yit ( xit − Σt xit πit(b) ) = 0 .
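The multinomial claim can be checked exactly for one small example: the joint Poisson probability, divided by the Poisson probability of the total, must equal the multinomial probability with πt = (E yt)/Σt E yt; the means and counts below are arbitrary:

```python
import math

# Independent Poisson responses, conditioned on their sum, are multinomial
# with probabilities pi_t = mu_t / sum_t mu_t; toy means and one realization.
mu = [1.0, 2.0, 3.0]   # toy means exp(alpha_i + x_it' beta)
y = [2, 1, 3]          # one realization; the sufficient statistic is s = 6
s = sum(y)

def pois(y_, m):
    return m ** y_ * math.exp(-m) / math.factorial(y_)

joint = math.prod(pois(yt, mt) for yt, mt in zip(y, mu))
total = sum(mu)
prob_sum = pois(s, total)  # the sum of independent Poissons is Poisson
conditional = joint / prob_sum

pi = [m / total for m in mu]
multinomial = (math.factorial(s) // math.prod(math.factorial(yt) for yt in y)
               ) * math.prod(p ** yt for p, yt in zip(pi, y))
print(math.isclose(conditional, multinomial, rel_tol=1e-9))
```

Note that the α-dependent factor exp(αi) cancels in the ratio, which is exactly why the conditional likelihood is free of the fixed effects.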