Maximum Likelihood Estimation. Panel Data Structures


Part 6: MLE for RE Models [ 1/38]
Econometric Analysis of Panel Data
William Greene
Department of Economics
Stern School of Business
Part 6: MLE for RE Models [ 2/38]
The Random Effects Model

The random effects model:

y_it = x_it'β + c_i + ε_it ,   observation for person i at time t
y_i  = X_i β + c_i i + ε_i ,   T_i observations in group i
     = X_i β + c_i + ε_i ,     note c_i = (c_i, c_i, ..., c_i)'
y    = Xβ + c + ε,             Σ_{i=1}^N T_i observations in the sample
c    = (c_1', c_2', ..., c_N')',  a Σ_{i=1}^N T_i by 1 vector

c_i is uncorrelated with x_it for all t:

E[c_i | X_i] = 0
E[ε_it | X_i, c_i] = 0
Part 6: MLE for RE Models [ 3/38]
Error Components Model
Generalized Regression Model
y_it = x_it'β + ε_it + u_i
E[ε_it | X_i] = 0
E[ε_it² | X_i] = σ_ε²
E[u_i | X_i] = 0
E[u_i² | X_i] = σ_u²

y_i = X_i β + ε_i + u_i i   for the T_i observations in group i

Var[ε_i + u_i i] is the T_i × T_i matrix with σ_ε² + σ_u² in every
diagonal position and σ_u² in every off-diagonal position.
Part 6: MLE for RE Models [ 4/38]
Notation
 2   u2
 u2

2
2
2




u

u
Var[ε i +uii ]  


2
 u2
  u
=  2I Ti   u2ii Ti  Ti




2
2
    u 
 u2
 u2
=  2I Ti   u2ii
= Ωi
Ω1
0
Var[w | X ]  


 0
0
Ω2
0
0
0

 (Note these differ only

 in the dimension Ti )

ΩN 
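A minimal numpy sketch of this covariance structure (the variance values and group sizes are illustrative, not from the slides):

```python
import numpy as np
from scipy.linalg import block_diag

sigma_e2, sigma_u2 = 1.0, 0.5          # illustrative variance components

def omega_i(T_i):
    """Omega_i = sigma_e^2 * I_Ti + sigma_u^2 * ii'  (T_i x T_i)."""
    i = np.ones((T_i, 1))
    return sigma_e2 * np.eye(T_i) + sigma_u2 * (i @ i.T)

# The blocks differ only in the dimension T_i, as the slide notes.
T = [3, 5, 4]
Var_w = block_diag(*[omega_i(Ti) for Ti in T])
print(Var_w.shape)                      # (12, 12)
```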
Part 6: MLE for RE Models [ 5/38]
Maximum Likelihood
Assuming normality of ε_it and u_i, treat the T_i joint observations
[(ε_i1, ε_i2, ..., ε_iTi), u_i] as one T_i-variate observation. The mean
vector of ε_i + u_i i is zero and the covariance matrix is
Ω_i = σ_ε² I + σ_u² ii'. The joint density for ε_i = (y_i - X_i β) is

  f(ε_i) = (2π)^(-Ti/2) |Ω_i|^(-1/2) exp[ -(1/2)(y_i - X_iβ)'Ω_i⁻¹(y_i - X_iβ) ]

logL = Σ_{i=1}^N logL_i where

  logL_i(β, σ_ε², σ_u²) = (-1/2)[ T_i log 2π + log|Ω_i| + (y_i - X_iβ)'Ω_i⁻¹(y_i - X_iβ) ]
                        = (-1/2)[ T_i log 2π + log|Ω_i| + ε_i'Ω_i⁻¹ε_i ]
Part 6: MLE for RE Models [ 6/38]
MLE Panel Data Algebra (1)
Ω_i⁻¹ = (1/σ_ε²) [ I_Ti - (σ_u²/(σ_ε² + T_i σ_u²)) ii' ]

So,

ε_i'Ω_i⁻¹ε_i = (1/σ_ε²) [ ε_i'ε_i - (σ_u²/(σ_ε² + T_i σ_u²)) ε_i'ii'ε_i ]
             = (1/σ_ε²) [ ε_i'ε_i - σ_u²(T_i ε̄_i)²/(σ_ε² + T_i σ_u²) ]
Part 6: MLE for RE Models [ 7/38]
MLE Panel Data Algebra (1, cont.)
Ω_i = σ_ε² I + σ_u² ii' = σ_ε² [I + γ² ii'] = σ_ε² A, where γ² = σ_u²/σ_ε²

|Ω_i| = (σ_ε²)^Ti Π_{t=1}^{Ti} λ_t ,   λ = a characteristic root of A

The roots are (real, since A is symmetric) solutions to Ac = λc:
Ac = λc = c + γ² ii'c, or γ² i(i'c) = (λ - 1)c.
Any vector whose elements sum to zero (i'c = 0) is a characteristic
vector that corresponds to root λ = 1. There are T_i - 1 such vectors,
so T_i - 1 of the roots are 1. Suppose i'c ≠ 0. Premultiply by i' to
find γ² i'i(i'c) = (λ - 1)i'c, or T_i γ²(i'c) = (λ - 1)(i'c). Since
i'c ≠ 0, divide by it to obtain the remaining root λ = 1 + T_i γ².

Therefore, |Ω_i| = (σ_ε²)^Ti (1 + T_i γ²).
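A quick numerical check of the two closed-form results just derived (the parameter values are illustrative):

```python
import numpy as np

sigma_e2, sigma_u2, Ti = 1.3, 0.6, 5
gamma2 = sigma_u2 / sigma_e2
i = np.ones((Ti, 1))
Omega = sigma_e2 * np.eye(Ti) + sigma_u2 * (i @ i.T)

# Closed-form inverse from the previous slide:
Omega_inv = (1.0 / sigma_e2) * (np.eye(Ti)
             - (sigma_u2 / (sigma_e2 + Ti * sigma_u2)) * (i @ i.T))
assert np.allclose(Omega_inv, np.linalg.inv(Omega))

# Determinant: Ti - 1 roots equal 1, one root equals 1 + Ti*gamma^2.
assert np.isclose(np.linalg.det(Omega), sigma_e2**Ti * (1 + Ti * gamma2))
```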
Part 6: MLE for RE Models [ 8/38]
MLE Panel Data Algebra (1, conc.)
logL_i = (-1/2)[ T_i log 2π + log|Ω_i| + ε_i'Ω_i⁻¹ε_i ]
       = (-1/2)[ T_i log 2π + T_i log σ_ε² + log(1 + T_i γ²)
                 + (1/σ_ε²)( ε_i'ε_i - σ_u²(T_i ε̄_i)²/(σ_ε² + T_i σ_u²) ) ]

logL = Σ_{i=1}^N logL_i
     = (-1/2)[ (log 2π + log σ_ε²) Σ_{i=1}^N T_i + Σ_{i=1}^N log(1 + T_i γ²) ]
       - (1/(2σ_ε²)) Σ_{i=1}^N [ ε_i'ε_i - σ_u²(T_i ε̄_i)²/(σ_ε² + T_i σ_u²) ]

Since γ² = σ_u²/σ_ε²,

  σ_u²(T_i ε̄_i)²/(σ_ε² + T_i σ_u²) = γ²(T_i ε̄_i)²/(1 + T_i γ²)

so

  logL_i = (-1/2)[ T_i(log 2π + log σ_ε²) + log(1 + T_i γ²) ]
           - (1/(2σ_ε²))[ ε_i'ε_i - γ²(T_i ε̄_i)²/(1 + T_i γ²) ]
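A sketch of this scalar form of the log likelihood (the data layout, a list of per-group arrays, and the function name are mine):

```python
import numpy as np

def loglik_re(beta, sigma_e2, sigma_u2, y_groups, X_groups):
    """Random effects log likelihood summed over groups i = 1..N."""
    gamma2 = sigma_u2 / sigma_e2
    ll = 0.0
    for y_i, X_i in zip(y_groups, X_groups):
        Ti = len(y_i)
        e = y_i - X_i @ beta
        ebar = e.mean()
        # e'e - gamma^2 (Ti*ebar)^2 / (1 + Ti*gamma^2), from the slide
        quad = e @ e - gamma2 * (Ti * ebar) ** 2 / (1 + Ti * gamma2)
        ll += -0.5 * (Ti * np.log(2 * np.pi) + Ti * np.log(sigma_e2)
                      + np.log(1 + Ti * gamma2) + quad / sigma_e2)
    return ll
```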
Part 6: MLE for RE Models [ 9/38]
Maximizing the Log Likelihood
	Difficult: "brute force" + some elegant theoretical results: see
Baltagi, pp. 22-23. (Back and forth from GLS to σ_ε² and σ_u².)
	Somewhat less difficult and more practical: at any iteration, given
estimates of σ_ε² and σ_u², the estimator of β is GLS (of course), so
we iterate back and forth between these. See Hsiao, pp. 39-40.

0. Begin iterations with, say, FGLS estimates of β, σ_ε², σ_u².
1. Given σ̂²_ε,r and σ̂²_u,r, compute β̂_r+1 by FGLS(σ̂²_ε,r, σ̂²_u,r).
2. Given β̂_r+1, compute σ̂²_ε,r+1 = Σ_{i=1}^N ε̂_i,r+1' M_D^i ε̂_i,r+1 / Σ_{i=1}^N (T_i - 1).
3. Given β̂_r+1, compute σ̂²_u,r+1 = (1/N) Σ_{i=1}^N ε̄̂²_i,r+1.
4. Return to step 1 and repeat until β̂_r+1 - β̂_r = 0.
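A sketch of the iteration in steps 0-4 (the GLS step uses the familiar partial-demeaning form; starting values are assumed supplied, and the tolerance is my choice):

```python
import numpy as np

def iterate_fgls(y_groups, X_groups, beta, sigma_e2, sigma_u2,
                 tol=1e-8, max_iter=200):
    for _ in range(max_iter):
        # Step 1: GLS of y on X given the current variance estimates,
        # done as OLS on partially demeaned data.
        rows_y, rows_X = [], []
        for y_i, X_i in zip(y_groups, X_groups):
            Ti = len(y_i)
            lam = 1 - np.sqrt(sigma_e2 / (sigma_e2 + Ti * sigma_u2))
            rows_y.append(y_i - lam * y_i.mean())
            rows_X.append(X_i - lam * X_i.mean(axis=0))
        ys, Xs = np.concatenate(rows_y), np.vstack(rows_X)
        beta_new = np.linalg.lstsq(Xs, ys, rcond=None)[0]

        # Steps 2-3: update the variance components from raw residuals,
        # following the slide's formulas.
        e = [y_i - X_i @ beta_new for y_i, X_i in zip(y_groups, X_groups)]
        sigma_e2 = sum(((ei - ei.mean()) ** 2).sum() for ei in e) \
                   / sum(len(ei) - 1 for ei in e)
        sigma_u2 = np.mean([ei.mean() ** 2 for ei in e])

        # Step 4: stop when beta stops changing.
        if np.max(np.abs(beta_new - beta)) < tol:
            break
        beta = beta_new
    return beta, sigma_e2, sigma_u2
```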
Part 6: MLE for RE Models [ 10/38]
Direct Maximization of LogL
Simpler: take advantage of the invariance of maximum likelihood
estimators to transformations of the parameters.
Let θ = 1/σ_ε², τ = σ_u²/σ_ε², R_i = T_i τ + 1, Q_i = τ/R_i. Then

  logL_i = (-1/2)[ θ(ε_i'ε_i - Q_i(T_i ε̄_i)²) + log R_i - T_i log θ + T_i log 2π ]

This can be maximized using ordinary optimization methods (not Newton,
as suggested by Hsiao). Treat it as a standard nonlinear optimization
problem and solve with iterative, gradient methods.
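A sketch of the direct approach under this reparameterization, using a quasi-Newton optimizer from scipy (the optimizer choice and the log-transform that keeps θ and τ positive are mine):

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(params, y_groups, X_groups, K):
    beta = params[:K]
    theta, tau = np.exp(params[K]), np.exp(params[K + 1])  # enforce positivity
    ll = 0.0
    for y_i, X_i in zip(y_groups, X_groups):
        Ti = len(y_i)
        e = y_i - X_i @ beta
        Ri = Ti * tau + 1
        Qi = tau / Ri
        ll += -0.5 * (theta * (e @ e - Qi * (Ti * e.mean()) ** 2)
                      + np.log(Ri) - Ti * np.log(theta)
                      + Ti * np.log(2 * np.pi))
    return -ll

# res = minimize(neg_loglik, x0, args=(y_groups, X_groups, K), method="BFGS")
```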
Part 6: MLE for RE Models [ 11/38]
Part 6: MLE for RE Models [ 12/38]
Part 6: MLE for RE Models [ 13/38]
Maximum Simulated Likelihood
Assume ε_it and u_i are normally distributed. Write u_i = σ_u v_i where
v_i ~ N[0,1]. Then y_it = x_it'β + σ_u v_i + ε_it. If v_i were observed
data, all observations would be independent, and

  log f(y_it | x_it, v_i) = (-1/2)[ log 2π + log σ_ε² + (y_it - x_it'β - σ_u v_i)²/σ_ε² ]

Let θ² = 1/σ_ε². The log of the joint density for T_i observations with
common v_i is

  logL_i(β, σ_u, θ² | v_i) = Σ_{t=1}^{Ti} (-1/2)[ log 2π + log σ_ε² + θ²(y_it - x_it'β - σ_u v_i)² ]

The conditional log likelihood for the sample is then

  logL(β, σ_u, θ² | v) = Σ_{i=1}^N Σ_{t=1}^{Ti} (-1/2)[ log 2π + log σ_ε² + θ²(y_it - x_it'β - σ_u v_i)² ]
Part 6: MLE for RE Models [ 14/38]
Likelihood Function for Individual i
The conditional log likelihood for the sample is

  logL(β, σ_u, θ² | v) = Σ_{i=1}^N Σ_{t=1}^{Ti} (-1/2)[ log 2π + log σ_ε² + θ²(y_it - x_it'β - σ_u v_i)² ]

The unconditional log likelihood is obtained by integrating v_i out of
L_i(β, σ_u, σ_ε² | v_i):

  L_i(β, σ_u, σ_ε²) = ∫ [ Π_{t=1}^{Ti} θ exp(-(θ²/2)(y_it - x_it'β - σ_u v_i)²) / √(2π) ] φ(v_i) dv_i
                    = E_{v_i}[ L_i(β, σ_u, σ_ε² | v_i) ]

The integral usually does not have a closed form. (For the normal
distribution above, actually, it does. We used that earlier. We ignore
that for now.)
Part 6: MLE for RE Models [ 15/38]
Log Likelihood Function
The full log likelihood function that needs to be maximized is

  logL = Σ_{i=1}^N logL_i(β, σ_u, σ_ε²)
       = Σ_{i=1}^N log ∫ [ Π_{t=1}^{Ti} θ exp(-(θ²/2)(y_it - x_it'β - σ_u v_i)²) / √(2π) ] φ(v_i) dv_i
       = Σ_{i=1}^N log E_{v_i}[ L_i(β, σ_u, σ_ε² | v_i) ]

This is the function to be maximized to obtain the MLE of [β, θ, σ_u].
Part 6: MLE for RE Models [ 16/38]
Computing the Expected LogL
How to compute the integral: first note φ(v_i) = exp(-v_i²/2)/√(2π), so

  ∫ [ Π_{t=1}^{Ti} θ exp(-(θ²/2)(y_it - x_it'β - σ_u v_i)²) / √(2π) ] φ(v_i) dv_i
    = E_{v_i}[ L_i(β, σ_u, σ_ε² | v_i) ]

(1) Numerical (Gauss-Hermite) quadrature for integrals of this form is
remarkably accurate:

  ∫_{-∞}^{∞} e^(-v²) g(v) dv ≈ Σ_{h=1}^H w_h g(a_h)

Example: Hermite quadrature nodes and weights, H = 5
Nodes:   -2.02018, -0.95857, 0.00000, 0.95857, 2.02018
Weights:  0.01995,  0.39362, 0.94531, 0.39362, 0.01995

Applications usually use many more points, up to 96, and much more
accurate (more digits) representations.
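The nodes and weights above can be reproduced with numpy's hermgauss; a quick check on a known integral follows (the example function is mine):

```python
import numpy as np

nodes, weights = np.polynomial.hermite.hermgauss(5)
print(nodes)    # matches the H = 5 nodes above
print(weights)  # 0.01995, 0.39362, 0.94531, 0.39362, 0.01995

# Check: E[v^2] = 1 for v ~ N(0,1), using the change of variable v = sqrt(2)*a.
g = lambda v: v ** 2
approx = (weights * g(np.sqrt(2) * nodes)).sum() / np.sqrt(np.pi)
print(approx)   # 1.0000...
```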
Part 6: MLE for RE Models [ 17/38]
Quadrature
A change of variable is needed to get the integral into the right form.
Each term then becomes

  L_i,Q = (1/√π) Σ_{h=1}^H w_h Π_{t=1}^{Ti} [ θ exp(-(θ²/2)(y_it - x_it'β - σ_u√2 a_h)²) / √(2π) ]

and the problem is solved by maximizing

  logL_Q = Σ_{i=1}^N log L_i,Q

with respect to β, σ_ε², σ_u. (Maximization will be continued later in
the semester.)
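A sketch of logL_Q built from L_i,Q (the number of points H and the data layout are my choices):

```python
import numpy as np

def loglik_quadrature(beta, sigma_u, theta2, y_groups, X_groups, H=32):
    a, w = np.polynomial.hermite.hermgauss(H)   # nodes a_h, weights w_h
    theta = np.sqrt(theta2)
    ll = 0.0
    for y_i, X_i in zip(y_groups, X_groups):
        # residuals at each node: Ti x H array via broadcasting
        e = y_i[:, None] - (X_i @ beta)[:, None] - sigma_u * np.sqrt(2) * a
        dens = theta * np.exp(-0.5 * theta2 * e ** 2) / np.sqrt(2 * np.pi)
        L_iQ = (w * dens.prod(axis=0)).sum() / np.sqrt(np.pi)
        ll += np.log(L_iQ)
    return ll
```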
Part 6: MLE for RE Models [ 18/38]
Gauss-Hermite Quadrature
  ∫ [ Π_{t=1}^{Ti} θ exp(-(θ²/2)(y_it - x_it'β - σ_u v_i)²) / √(2π) ] φ(v_i) dv_i ,
    where φ(v_i) = exp(-v_i²/2)/√(2π)

Make the change of variable a_i = v_i/√2, so v_i = √2 a_i and
dv_i = √2 da_i:

  = (1/√(2π)) ∫ exp(-a_i²) Π_{t=1}^{Ti} [ θ exp(-(θ²/2)(y_it - x_it'β - σ_u√2 a_i)²) / √(2π) ] √2 da_i
  = (1/√π) ∫ exp(-a_i²) [ Π_{t=1}^{Ti} θ exp(-(θ²/2)(y_it - x_it'β - σ_u√2 a_i)²) / √(2π) ] da_i
  = (1/√π) ∫ exp(-a_i²) g(a_i) da_i  ≈  (1/√π) Σ_{h=1}^H w_h g(a_h)
Part 6: MLE for RE Models [ 19/38]
Simulation
The unconditional log likelihood is an expected value:

  logL_i(β, σ_u, σ_ε²) = log ∫ [ Π_{t=1}^{Ti} θ exp(-(θ²/2)(y_it - x_it'β - σ_u v_i)²) / √(2π) ] φ(v_i) dv_i
                       = log E_{v_i}[ L_i(β, σ_u, σ_ε² | v_i) ] = log E_v[ g(v_i) ]

An expected value can be 'estimated' by sampling R observations and
averaging them:

  Ê_v[g(v_i)] = (1/R) Σ_{r=1}^R Π_{t=1}^{Ti} [ θ exp(-(θ²/2)(y_it - x_it'β - σ_u v_ir)²) / √(2π) ]

The simulated (unconditional) log likelihood function is then

  Σ_{i=1}^N log { (1/R) Σ_{r=1}^R Π_{t=1}^{Ti} [ θ exp(-(θ²/2)(y_it - x_it'β - σ_u v_ir)²) / √(2π) ] }

This is a function of (β, σ_ε², σ_u | y_i, X_i, v_i,1, ..., v_i,R),
i = 1, ..., N. The random draws on v_i,r become part of the data, and
the function is maximized with respect to the unknown parameters.
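A sketch of the simulated log likelihood (the draws v[i, r] are made once, outside the function, and reused at every trial parameter vector, exactly because they are treated as part of the data):

```python
import numpy as np

rng = np.random.default_rng(1234)

def loglik_simulated(beta, sigma_u, theta2, y_groups, X_groups, v):
    """v is an N x R array of standard normal draws, fixed in advance."""
    theta = np.sqrt(theta2)
    ll = 0.0
    for i, (y_i, X_i) in enumerate(zip(y_groups, X_groups)):
        # residuals at each draw: Ti x R array via broadcasting
        e = y_i[:, None] - (X_i @ beta)[:, None] - sigma_u * v[i][None, :]
        dens = theta * np.exp(-0.5 * theta2 * e ** 2) / np.sqrt(2 * np.pi)
        ll += np.log(dens.prod(axis=0).mean())   # average over the R draws
    return ll

# v = rng.standard_normal((N, R))   # draw once, before optimization begins
```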
Part 6: MLE for RE Models [ 20/38]
Convergence Results
Target is the expected log likelihood: log E_{v_i}[ L(β, σ² | v_i) ].
The simulation estimator is based on random sampling from the
population of v_i:

  logL_S(β, σ²) = Σ_{i=1}^N log { (1/R) Σ_{r=1}^R Π_{t=1}^{Ti} [ θ exp(-(θ²/2)(y_it - x_it'β - σ_u v_ir)²) / √(2π) ] }

The essential result is

  plim (R→∞) logL_S(β, σ²) = log E_{v_i}[ L(β, σ² | v_i) ]

Conditions:
(1) General regularity and smoothness of the log likelihood.
(2) R increases faster than √N. ('Intelligent draws', e.g., Halton
    sequences, make this somewhat ambiguous.)

Result: the maximizer of logL_S(β, σ²) converges to the maximizer of
log E_{v_i}[ L(β, σ² | v_i) ].
Part 6: MLE for RE Models [ 21/38]
MSL vs. ML
0.15427² = 0.023799
Part 6: MLE for RE Models [ 22/38]
Two Level Panel Data
	Nested by construction
	Unbalanced panels
	No real obstacle to estimation
	Some inconvenient algebra.
	In 2-step FGLS of the RE model, we need "1/T" to solve for an
estimate of σ_u². What to use?

  (1/T)-bar = (1/N) Σ_{i=1}^N (1/T_i)     (early NLOGIT)
  1/T̄, where T̄ = (1/N) Σ_{i=1}^N T_i     (Stata)
  Q_H = [ Π_{i=1}^N (1/T_i) ]^(1/N)       (TSP, current NLOGIT; do not use this.)
Part 6: MLE for RE Models [ 23/38]
Balanced Nested Panel Data
z_ijkt = test score for student t, teacher k, school j, district i
L = 2 school districts, i = 1,…,L
M_i = 3 schools in each district, j = 1,…,M_i
N_ij = 4 teachers in each school, k = 1,…,N_ij
T_ijk = 20 students in each class, t = 1,…,T_ijk
Antweiler, W., “Nested Random Effects Estimation in Unbalanced
Panel Data,” Journal of Econometrics, 101, 2001, pp. 295-313.
Part 6: MLE for RE Models [ 24/38]
Nested Effects Model
y_ijkt = x_ijkt'β + u_ijk + v_ij + w_i + ε_ijkt

Strict exogeneity; all parts uncorrelated.
(The normality assumption is added later.)

Var[u_ijk + v_ij + w_i + ε_ijkt] = σ_u² + σ_v² + σ_w² + σ_ε²
Overall covariance matrix Ω is block diagonal over i, each diagonal block
is block diagonal over j, each of these, in turn, is block diagonal over k,
and each lowest level block has the form of Ω we saw earlier.
Part 6: MLE for RE Models [ 25/38]
GLS with Nested Effects
Define

  σ_1² = Tσ_u² + σ_ε²                      →  σ_1² - σ_ε² = Tσ_u²
  σ_2² = NTσ_v² + Tσ_u² + σ_ε²             →  σ_2² - σ_1² = NTσ_v²
  σ_3² = MNTσ_w² + NTσ_v² + Tσ_u² + σ_ε²   →  σ_3² - σ_2² = MNTσ_w²

GLS is equivalent to OLS regression of

  y*_ijkt = y_ijkt - (1 - σ_ε/σ_1) ȳ_ijk. - (σ_ε/σ_1 - σ_ε/σ_2) ȳ_ij.. - (σ_ε/σ_2 - σ_ε/σ_3) ȳ_i...

on the same transformation of x_ijkt. FGLS estimates are obtained by
"three group-wise between estimators and the within estimator for the
innermost group."
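A pandas sketch of the partial-demeaning transformation for the balanced case (the column names and the σ inputs are illustrative):

```python
import pandas as pd

def gls_transform(df, col, s_e, s1, s2, s3):
    """y* = y - (1 - s_e/s1)*ybar_ijk - (s_e/s1 - s_e/s2)*ybar_ij
             - (s_e/s2 - s_e/s3)*ybar_i,  with s's the sigmas above."""
    m_ijk = df.groupby(["district", "school", "teacher"])[col].transform("mean")
    m_ij = df.groupby(["district", "school"])[col].transform("mean")
    m_i = df.groupby("district")[col].transform("mean")
    return (df[col] - (1 - s_e / s1) * m_ijk
            - (s_e / s1 - s_e / s2) * m_ij
            - (s_e / s2 - s_e / s3) * m_i)
```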
Part 6: MLE for RE Models [ 26/38]
Unbalanced Nested Data
	With unbalanced panels, all the preceding results fall apart.
	GLS, FGLS, even fixed effects become analytically intractable.
	The log likelihood is very tractable.
	Note a collision of practicality with nonrobustness: normality
must be assumed.
Part 6: MLE for RE Models [ 27/38]
Log Likelihood (1)
Define the variance ratios:

  ρ_u = σ_u²/σ_ε² ,  ρ_v = σ_v²/σ_ε² ,  ρ_w = σ_w²/σ_ε²

Construct:

  θ_ijk = 1 + T_ijk ρ_u ,   θ_ij = Σ_{k=1}^{Nij} T_ijk/θ_ijk
  φ_ij  = 1 + θ_ij ρ_v ,    φ_i  = Σ_{j=1}^{Mi} θ_ij/φ_ij
  λ_i   = 1 + φ_i ρ_w

Sums of squares:

  A_ijk = Σ_{t=1}^{Tijk} e_ijkt² ,   e_ijkt = y_ijkt - x_ijkt'β
  B_ijk = Σ_{t=1}^{Tijk} e_ijkt ,   B_ij = Σ_{k=1}^{Nij} B_ijk/θ_ijk ,   B_i = Σ_{j=1}^{Mi} B_ij/φ_ij
Part 6: MLE for RE Models [ 28/38]
Log Likelihood (2)
H = total number of observations

logL = (-1/2) [ H log(2πσ_ε²)
         + Σ_{i=1}^L { log λ_i - ρ_w B_i²/(σ_ε² λ_i)
           + Σ_{j=1}^{Mi} { log φ_ij - ρ_v B_ij²/(σ_ε² φ_ij)
             + Σ_{k=1}^{Nij} { log θ_ijk + A_ijk/σ_ε² - ρ_u B_ijk²/(σ_ε² θ_ijk) } } } ]

(For 3 levels instead of 4, set L = 1 and ρ_w = 0.)
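A sketch of this log likelihood, with the θ, φ, λ and A, B quantities built up exactly as defined on the previous slide (the nested-list data layout e[i][j][k] of residual vectors is my choice, not the paper's):

```python
import numpy as np

def nested_loglik(e, sigma_e2, rho_u, rho_v, rho_w):
    """e[i][j][k] is the numpy array of residuals e_ijkt for one class."""
    H = sum(len(eijk) for ei in e for eij in ei for eijk in eij)
    total = H * np.log(2 * np.pi * sigma_e2)
    for ei in e:
        phi_i, B_i = 0.0, 0.0
        for eij in ei:
            theta_ij, B_ij = 0.0, 0.0
            for eijk in eij:
                Tijk = len(eijk)
                theta_ijk = 1 + Tijk * rho_u
                A_ijk = (eijk ** 2).sum()
                B_ijk = eijk.sum()
                total += (np.log(theta_ijk) + A_ijk / sigma_e2
                          - rho_u * B_ijk ** 2 / (sigma_e2 * theta_ijk))
                theta_ij += Tijk / theta_ijk
                B_ij += B_ijk / theta_ijk
            phi_ij = 1 + theta_ij * rho_v
            total += np.log(phi_ij) - rho_v * B_ij ** 2 / (sigma_e2 * phi_ij)
            phi_i += theta_ij / phi_ij
            B_i += B_ij / phi_ij
        lam_i = 1 + phi_i * rho_w
        total += np.log(lam_i) - rho_w * B_i ** 2 / (sigma_e2 * lam_i)
    return -0.5 * total
```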
Part 6: MLE for RE Models [ 29/38]
Maximizing Log L
	Antweiler provides analytic first derivatives for gradient
methods of optimization. Ugly to program.
	Numerical derivatives:

Let δ be the full vector of K+4 parameters. Let ι_r be a perturbation
vector with Δ_r = max(ξ_0, ξ_1 |δ_r|) in the rth position and zeros in
the other K+3 positions. Then

  ∂logL/∂δ_r ≈ [ logL(δ + ι_r) - logL(δ - ι_r) ] / (2Δ_r)
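A sketch of this central-difference gradient (ξ_0 and ξ_1 are the perturbation constants above; their values here are illustrative):

```python
import numpy as np

def num_gradient(loglik, delta, xi0=1e-5, xi1=1e-5):
    """Central differences, one parameter at a time."""
    g = np.empty_like(delta)
    for r in range(len(delta)):
        h = max(xi0, xi1 * abs(delta[r]))   # Delta_r = max(xi0, xi1*|delta_r|)
        step = np.zeros_like(delta)
        step[r] = h
        g[r] = (loglik(delta + step) - loglik(delta - step)) / (2 * h)
    return g
```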
Part 6: MLE for RE Models [ 30/38]
Asymptotic Covariance Matrix
"Even with an analytic gradient, however, the Hessian
matrix, Ψ is typically obtained through numeric approximation
methods." Read "the second derivatives are too complicated
to derive, much less program." Also, since logL is not a sum
of terms, the BHHH estimator is not useable. Numerical
second derivatives were used.
Part 6: MLE for RE Models [ 31/38]
An Appropriate Asymptotic Covariance Matrix
The expected Hessian is block diagonal. We can isolate the block for β.
Define the group sums s_ijk = Σ_{t=1}^{Tijk} x_ijkt,
s_ij = Σ_{k=1}^{Nij} s_ijk/θ_ijk, and s_i = Σ_{j=1}^{Mi} s_ij/φ_ij. Then

  -∂²logL/∂β∂β' = (1/σ_ε²) [ Σ_{i=1}^L Σ_{j=1}^{Mi} Σ_{k=1}^{Nij} Σ_{t=1}^{Tijk} x_ijkt x_ijkt'
                    - ρ_u Σ_{i=1}^L Σ_{j=1}^{Mi} Σ_{k=1}^{Nij} s_ijk s_ijk'/θ_ijk
                    - ρ_v Σ_{i=1}^L Σ_{j=1}^{Mi} s_ij s_ij'/φ_ij
                    - ρ_w Σ_{i=1}^L s_i s_i'/λ_i ]

The inverse of this, evaluated at the MLEs, provides the appropriate
estimated asymptotic covariance matrix for β̂. Standard errors for the
variance estimators are not needed.
Part 6: MLE for RE Models [ 32/38]
Some Observations
	Assuming the wrong (e.g., nonnested) error structure:
	Still consistent: GLS with the wrong weights.
	Standard errors (apparently) biased downward (Moulton bias).
	Adding "time" effects or other nonnested effects is "very
challenging." Perhaps do with "fixed" effects (dummy variables).
Part 6: MLE for RE Models [ 33/38]
An Application
	y_jkt = log of atmospheric sulfur dioxide concentration at
observation station k at time t, in country j.
	H = 2621 observations: 293 stations, 44 countries, various
numbers of observations, not equally spaced.
	Three levels here, not 4 as in the article.
	x_jkt = 1, log(GDP/km²), log(K/L), log(Income), Suburban, Rural,
Communist, log(Oil price), average temperature, time trend.
Part 6: MLE for RE Models [ 34/38]
Estimates
Variable   Dimension   Random Effects      Nested Effects
x1         . . .       -10.787 (12.03)     -7.103  (5.613)
x2         C S T         0.445 (7.921)      0.202  (2.531)
x3         C . T         0.255 (1.999)      0.371  (2.345)
x4         C . T        -0.714 (5.005)     -0.477  (2.620)
x5         C S T        -0.627 (3.685)     -0.720  (4.531)
x6         C S T        -0.834 (2.181)     -1.061  (3.439)
x7         C . .         0.471 (2.241)      0.613  (1.443)
x8         . . T        -0.831 (2.267)     -0.089  (2.410)
x9         C S T        -0.045 (4.299)     -0.044  (3.719)
x10        . . T        -0.043 (1.666)     -0.046 (10.927)
σ_ε²                     0.330              0.329
σ_u                      1.807              1.017
σ_v                       --                1.347
logL                 -2645.4            -2606.0
(t ratios in parentheses; dimension flags C, S, T = varies by country,
by station, over time)
Part 6: MLE for RE Models [ 35/38]
Rotating Panel-1
The structure of the sample and selection of individuals in a rotating
sampling design are as follows. Let all individuals in the population
be numbered consecutively. The sample in period 1 consists of N_1
individuals. In period 2, a fraction, m_e2 (0 < m_e2 < N_1), of the
sample in period 1 is replaced by m_i2 new individuals from the
population. In period 3, another fraction of the sample in period 2,
m_e2 (0 < m_e2 < N_2) individuals, is replaced by m_i3 new individuals,
and so on. Thus the sample size in period t is N_t = N_{t-1} - m_{e,t-1} + m_it.
The procedure of dropping the m_{e,t-1} individuals selected in period
t - 1 and replacing them with m_it individuals from the population in
period t is called rotating sampling. In this framework, the total
numbers of observations and of individuals observed are Σ_t N_t and
N_1 + Σ_{t=2}^T m_it, respectively.

Heshmati, A., "Efficiency Measurement in Rotating Panel Data," Applied
Economics, 30, 1998, pp. 919-930.
Part 6: MLE for RE Models [ 36/38]
Rotating Panel-2
The outcome of the rotating sample for farms producing dairy products
is given in Table 1. Each annual sample is composed of four parts or
subsamples. For example, in 1980 the sample contains 79, 62, 98, and 74
farms. The first three parts (79, 62, and 98) are those not replaced
during the transition from 1979 to 1980. The last subsample contains 74
newly included farms from the population. At the same time, 85 farms
are excluded from the sample in 1979. The difference between the
excluded part (85) and the included part (74) corresponds to the change
in the rotating sample size between these two periods, i.e.,
313 - 324 = -11. This difference includes only the part of the sample
where each farm is observed consecutively for four years, N_rot. The
difference in the non-rotating part, N_non, is due to those farms which
are not observed consecutively. The proportion of farms not observed
consecutively, N_non, in the total annual sample varies from 11.2 to
22.6% with an average of 18.7 per cent.
Part 6: MLE for RE Models [ 37/38]
Rotating Panels-3
	Simply an unbalanced panel
	Treat with the familiar techniques
	Accounting is complicated
	Time effects may be complicated.
	Biorn and Jansen (Scand. J. E., 1983): households in cohort 1
have T = 1976, 1977 while cohort 2 has T = 1977, 1978.
	But… "time in sample bias" may require special treatment. The
Mexican labor survey has a 3-period rotation; some families appear
in 1, 2, or 3 periods.
Part 6: MLE for RE Models [ 38/38]
Pseudo Panels
T different cross sections:

  y_i(t),t = x_i(t),t'β + u_i(t) + ε_i(t),t ,   i(t) = 1, ..., N(t); t = 1, ..., T

These are Σ_{t=1}^T N(t) independent observations.

Define C cohorts, e.g., those born 1950-1955, and average within each
cohort:

  ȳ_c,t = x̄_c,t'β + ū_c,t + ε̄_c,t ,   c = 1, ..., C; t = 1, ..., T

Cohort sizes are N_c(t). Assume they are large. Then ū_c,t ≈ ū_c for
each cohort, which creates a fixed effects model:

  ȳ_c,t = x̄_c,t'β + ū_c + ε̄_c,t ,   c = 1, ..., C; t = 1, ..., T.
(See Baltagi 10.3 for issues relating to measurement error.)
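A pandas sketch of constructing the pseudo panel (the column names are illustrative):

```python
import pandas as pd

def pseudo_panel(df, cohort_col, time_col, cols):
    """Collapse repeated cross sections to cohort-by-period cell means."""
    return df.groupby([cohort_col, time_col])[cols].mean().reset_index()

# cells = pseudo_panel(df, "birth_cohort", "year", ["y", "x1", "x2"])
# Then estimate the fixed effects model on the C x T cell means,
# e.g., by LSDV with cohort dummies or the within transformation.
```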