Greene-TrueRandomEffects-APPC2014 - NYU Stern

1/76
True Random Effects in Stochastic
Frontier Models
William Greene
New York University
2/76
Agenda
• Skew normality – Adelchi Azzalini
• Stochastic frontier model
• Panel Data: Time varying and time invariant inefficiency models
• Panel Data: True random effects models
• Maximum Simulated Likelihood Estimation
• Applications of true random effects
  • Persistent and transient inefficiency in Swiss railroads
  • A panel data sample selection corrected stochastic frontier model
  • Spatial effects in a stochastic frontier model
http://people.stern.nyu.edu/wgreene/appc2014.pdf
3/76
Skew Normality
4/76
The Stochastic Frontier Model
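\[ \ln y_i = \alpha + \beta' x_i + v_i - u_i, \]
\[ v_i \sim N[0, \sigma_v^2], \]
\[ u_i = |U_i|, \quad U_i \sim N[0, \sigma_u^2], \]
\[ \varepsilon_i = v_i - u_i = v_i - |U_i| \]
Convenient parameterization (notation)
\[ \varepsilon_i = \sigma_v V_i - \sigma_u |U_i| = \sigma_v N[0,1]_i - \sigma_u |N[0,1]| \]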
5/76
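Log Likelihood
\[ \lambda = \frac{\sigma_u}{\sigma_v}, \quad \sigma = \sqrt{\sigma_u^2 + \sigma_v^2} \]
\[ \log L(\alpha, \beta, \sigma, \lambda) = \sum_{i=1}^{N} \left[ \log\frac{2}{\sigma} + \log\phi\!\left(\frac{y_i - \alpha - \beta' x_i}{\sigma}\right) + \log\Phi\!\left(\frac{-\lambda\,(y_i - \alpha - \beta' x_i)}{\sigma}\right) \right] \]
Skew Normal Density
\[ = \sum_{i=1}^{N} \log\left[ \frac{2}{\sigma}\, \phi\!\left(\frac{\varepsilon_i}{\sigma}\right) \Phi\!\left(\frac{-\lambda\,\varepsilon_i}{\sigma}\right) \right] \]
6/76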
Birnbaum (1950) Wrote About Skew Normality
Effect of
Linear
Truncation on
a Multinormal
Population
7/76
Weinstein (1964) Found f(ε)
Query 2: The Sum of
Values from a
Normal and a
Truncated Normal
Distribution
See, also, Nelson (Technometrics, 1964), Roberts (JASA, 1966)
8/76
O’Hagan and Leonard (1976) Found
Something Like f(ε)
Resembles f(ε)
Bayes Estimation
Subject to Uncertainty
About Parameter
Constraints
9/76
ALS (1977) Discovered How
to Make Great Use of f(ε)
See, also, Forsund and Hjalmarsson (1974), Battese and Corra (1976)
Poirier,… Timmer, … several others.
10/76
Azzalini (1985) Figured Out f(ε)
And Noticed the Connection to ALS
The standard skew normal distribution
\[ f(\varepsilon) = 2\,\phi(\varepsilon)\,\Phi(\lambda\varepsilon) \]
11/76
© 2014
http://azzalini.stat.unipd.it/SN/
12/76
http://azzalini.stat.unipd.it/SN/abstracts.html#sn99
ALS
13/76
A Useful FAQ About the Skew Normal
How to generate pseudo random draws on $\varepsilon$
1. Draw U, V from independent N[0,1]
2. $\varepsilon = \sigma_v V + \sigma_u |U|$
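The two-step recipe above can be sketched in a few lines of Python. This is only an illustration: the function names, the seed, and the example values of λ and σ are assumptions, and the mapping from (λ, σ) to (σ_u, σ_v) uses the definitions λ = σ_u/σ_v and σ² = σ_u² + σ_v² from the log likelihood slide. The sign follows step 2 as printed; for the frontier's composed error v − u the half-normal term would enter with a minus sign.

```python
import math
import random


def scales_from(lam, sigma):
    """Map (lambda, sigma) to (sigma_u, sigma_v), using
    lambda = sigma_u / sigma_v and sigma^2 = sigma_u^2 + sigma_v^2."""
    sigma_v = sigma / math.sqrt(1.0 + lam * lam)
    sigma_u = lam * sigma_v
    return sigma_u, sigma_v


def draw_epsilon(sigma_u, sigma_v, n, seed=12345):
    """Steps 1-2 of the recipe: eps = sigma_v * V + sigma_u * |U|."""
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    draws = []
    for _ in range(n):
        v = rng.gauss(0.0, 1.0)  # V ~ N[0,1]
        u = rng.gauss(0.0, 1.0)  # U ~ N[0,1], independent of V
        draws.append(sigma_v * v + sigma_u * abs(u))
    return draws


# Hypothetical example values for lambda and sigma.
sigma_u, sigma_v = scales_from(lam=1.5, sigma=2.0)
sample = draw_epsilon(sigma_u, sigma_v, n=100000)
sample_mean = sum(sample) / len(sample)
# The half-normal part has mean sigma_u * sqrt(2/pi); compare the two numbers.
print(round(sample_mean, 3), round(sigma_u * math.sqrt(2.0 / math.pi), 3))
```

The sample mean should sit close to σ_u√(2/π), which gives a quick sanity check on the draws.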
14/76
Random Number Generator
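For a particular desired $\lambda$ and $\sigma$, use
\[ \sigma_u = \sqrt{\frac{\sigma^2 \lambda^2}{1 + \lambda^2}} \quad \text{and} \quad \sigma_v = \sqrt{\frac{\sigma^2}{1 + \lambda^2}}, \quad \text{so that } \sigma^2 = \sigma_v^2 + \sigma_u^2. \]
Then
\[ \varepsilon = \sigma_v N(0,1) + \sigma_u |N(0,1)| \]
15/76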
How Many Applications of SF Are There?
16/76
W. D. Walls (2006) On Skewness in the Movies
17/76
Cites Azzalini.
2( z )(z )
SNARCH Model for Financial Crises (2013)
“The skew-normal
distribution
developed by Sahu et
al. (2003)…”
Does not
know Azzalini.
18/76
A Skew Normal Mixed Logit Model (2010)
Mixed Logit Model
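\[ \text{Prob}(\text{Choice}_i = j) = \frac{\exp(\beta_i' x_{ij})}{\sum_{j=1}^{J} \exp(\beta_i' x_{ij})} \]
Random Parameters
\[ \beta_{ik} = \beta + \sigma w_{ik} \]
Asymmetric (Skewed) Parameter Distribution
\[ w_{ik} = v_{ik} + \lambda |U_{ik}| \sim SN(0, \sigma, \lambda) \]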
Greene (2010, knows Azzalini and ALS),
Bhat (2011, knows not Azzalini … or ALS)
19/76
Skew Normal Applications
• Foundation: An Entire Field
  • Stochastic Frontier Model
• Occasional Modeling Strategy
  • Culture: Skewed Distribution of Movie Revenues
  • Finance: Crisis and Contagion
  • Choice Modeling: The Mixed Logit Model
• How can these people find each other?
• Where else do applications appear?
20/76
Stochastic Frontier
21/76
The Cross Section Departure Point: 1977
Aigner et al. (ALS) Stochastic Frontier Model
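\[ y_i = \alpha + \beta' x_i + v_i - u_i \]
\[ v_i \sim N[0, \sigma_v^2] \]
\[ u_i = |U_i| \text{ and } U_i \sim N[0, \sigma_u^2] \]
Jondrow et al. (JLMS) Inefficiency Estimator
\[ \hat{u}_i = E[u_i \mid \varepsilon_i] = \frac{\sigma\lambda}{1 + \lambda^2} \left[ \mu_i + \frac{\phi(\mu_i)}{\Phi(\mu_i)} \right] \]
\[ \varepsilon_i = v_i - u_i, \quad \lambda = \frac{\sigma_u}{\sigma_v}, \quad \sigma = \sqrt{\sigma_v^2 + \sigma_u^2}, \quad \mu_i = \frac{-\lambda\,\varepsilon_i}{\sigma} \]
22/76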
The Panel Data Models Appear: 1981
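Pitt and Lee Random Effects Approach: 1981
\[ y_{it} = \alpha + \beta' x_{it} + v_{it} - u_i \quad (u_i \text{ time fixed}) \]
\[ v_{it} \sim N[0, \sigma_v^2], \quad u_i = |U_i| \text{ and } U_i \sim N[0, \sigma_u^2] \]
\[ \varepsilon_{it} = v_{it} - u_i \]
Counterpart to Jondrow et al. (1982)
\[ \hat{u}_i = E[u_i \mid \varepsilon_{i1}, \ldots, \varepsilon_{iT}] = \mu_i^* + \sigma_* \left[ \frac{\phi(\mu_i^*/\sigma_*)}{\Phi(\mu_i^*/\sigma_*)} \right] \]
\[ \mu_i^* = \frac{-\gamma T \bar{\varepsilon}_i}{1 + \gamma T}, \quad \gamma = \frac{\sigma_u^2}{\sigma_v^2}, \quad \sigma_*^2 = \frac{\sigma_u^2}{1 + \gamma T} \]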
23/76
Reinterpreting the Within Estimator: 1984
Schmidt and Sickles Fixed Effects Approach: 1984
\[ y_{it} = \alpha_i + \beta' x_{it} + v_{it} \]
\[ v_{it} \sim N[0, \sigma_v^2], \quad \alpha_i \text{ semiparametrically specified:} \]
fixed mean, constant variance.
Counterpart to Jondrow et al. (1982)
\[ \hat{u}_i = \max_i(\hat{\alpha}_i) - \hat{\alpha}_i \]
(The cost of the semiparametric specification is the
location of the inefficiency distribution. The authors
also revisit Pitt and Lee to demonstrate.)
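A minimal sketch of the Schmidt and Sickles calculation, assuming a single regressor and a small made-up panel: the within estimator gives the slope, the firm constants are recovered from firm means, and each firm's inefficiency is measured against the largest estimated constant. The function name, the data layout, and the example numbers are illustrative assumptions, not part of the original slides.

```python
def schmidt_sickles(panel):
    """panel: dict mapping firm -> list of (y, x) pairs, one regressor.
    Returns (beta_hat, alpha_hat by firm, u_hat by firm)."""
    sxy = 0.0
    sxx = 0.0
    means = {}
    for firm, obs in panel.items():
        ybar = sum(y for y, _ in obs) / len(obs)
        xbar = sum(x for _, x in obs) / len(obs)
        means[firm] = (ybar, xbar)
        # Within estimator: pool deviations from firm means.
        for y, x in obs:
            sxy += (x - xbar) * (y - ybar)
            sxx += (x - xbar) ** 2
    beta = sxy / sxx
    # Firm specific constants recovered from the firm means.
    alpha = {firm: ybar - beta * xbar for firm, (ybar, xbar) in means.items()}
    best = max(alpha.values())
    # u_hat_i = max_i(alpha_hat_i) - alpha_hat_i, so the best firm gets zero.
    u_hat = {firm: best - a for firm, a in alpha.items()}
    return beta, alpha, u_hat


# Hypothetical noiseless panel: y = alpha_i + 0.5 * x.
example = {
    "A": [(2.0 + 0.5 * x, x) for x in (1.0, 2.0, 4.0)],
    "B": [(1.5 + 0.5 * x, x) for x in (1.0, 3.0, 5.0)],
    "C": [(1.0 + 0.5 * x, x) for x in (2.0, 3.0, 6.0)],
}
beta_hat, alpha_hat, u_hat = schmidt_sickles(example)
print(beta_hat, u_hat)
```

Because the benchmark is the best firm in the sample, the estimates are relative rather than absolute, which is the location cost noted in the parenthetical remark.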
24/76
Time
fixed
Misgivings About Time Fixed Inefficiency: 1990-
Cornwell, Schmidt and Sickles (1990)
\[ \alpha_{it} = \theta_{0i} + \theta_{1i} t + \theta_{2i} t^2 \]
Kumbhakar (1990)
\[ u_{it} = [1 + \exp(bt + ct^2)]^{-1} |U_i| \]
Battese and Coelli (1992, 1995)
\[ u_{it} = \exp[-\eta(t - T)]\,|U_i|, \quad u_{it} = \exp[g(t, T, z_{it})]\,|U_i| \]
Cuesta (2000)
\[ u_{it} = \exp[-\eta_i(t - T)]\,|U_i|, \quad u_{it} = \exp[g_i(t, T, z_{it})]\,|U_i| \]
25/76
Are the systematically time varying models
more like time fixed or freely time varying?
A Pooled Model
\[ y_{it} = \alpha + \beta' x_{it} + v_{it} - u_{it} \]
Battese and Coelli (1992)
\[ u_{it} = \exp[-\eta(t - T)]\,|U_i| \]
Pitt and Lee (1981)
\[ y_{it} = \alpha + \beta' x_{it} + v_{it} - |U_i| \]
Where is Battese and Coelli?
Closer to the pooled model or to Pitt and Lee?
Greene (2004): Much closer to the Pitt and Lee model
26/76
In these models with time varying inefficiency,
\[ y_{it} = \alpha + \beta' x_{it} + v_{it} - g_i(t, z_{it})\,|U_i| \]
\[ v_{it} \sim N[0, \sigma_v^2] \text{ and } U_i \sim N[0, \sigma_u^2], \]
where does unobserved time invariant
heterogeneity end up?
In the inefficiency! Even with the extensions.
27/76
Skepticism About Time Varying Inefficiency
Models: Greene (2004)
28/76




True Random Effects
29/76
True Random and Fixed Effects: 2004
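True Random and Fixed Effects Approach: 2004
\[ y_{it} = \alpha_i + \beta' x_{it} + v_{it} - u_{it} \quad (\alpha_i \text{ time fixed}, \; u_{it} \text{ time varying}) \]
\[ v_{it} \sim N[0, \sigma_v^2], \quad u_{it} = |U_{it}| \text{ and } U_{it} \sim N[0, \sigma_u^2] \]
$\alpha_i$ = Unobserved time invariant heterogeneity,
not unobserved time invariant inefficiency
Jondrow et al. (JLMS) Inefficiency Estimator
\[ E[u_{it} \mid \varepsilon_{it}] = \frac{\sigma\lambda}{1 + \lambda^2} \left[ \mu_{it} + \frac{\phi(\mu_{it})}{\Phi(\mu_{it})} \right] \]
\[ \varepsilon_{it} = v_{it} - u_{it}, \quad \lambda = \frac{\sigma_u}{\sigma_v}, \quad \sigma = \sqrt{\sigma_v^2 + \sigma_u^2}, \quad \mu_{it} = \frac{-\lambda\,\varepsilon_{it}}{\sigma} \]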
30/76
Estimation of TFE and TRE Models: 2004
True Fixed Effects: MLE
\[ y_{it} = \alpha_i + \beta' x_{it} + v_{it} - u_{it} \]
\[ v_{it} \sim N[0, \sigma_v^2], \quad u_{it} = |U_{it}| \text{ and } U_{it} \sim N[0, \sigma_u^2] \]
$\alpha_i$ = Unobserved time invariant heterogeneity,
not unobserved time invariant inefficiency
Just add firm dummy variables to the SF model (!)
True Random Effects: Maximum Simulated Likelihood (RPM)
\[ y_{it} = (\alpha + w_i) + \beta' x_{it} + v_{it} - u_{it} \]
\[ v_{it} \sim N[0, \sigma_v^2], \quad u_{it} = |U_{it}| \text{ and } U_{it} \sim N[0, \sigma_u^2], \quad w_i \sim N[0, \sigma_w^2] \]
$\alpha_i$ = Unobserved time invariant heterogeneity,
not unobserved time invariant inefficiency
Random parameters stochastic frontier model
31/76
Log likelihood function for stochastic frontier model
\[ \log L(\alpha, \beta, \sigma, \lambda) = \sum_{i=1}^{N} \left[ \log\frac{2}{\sigma} + \log\phi\!\left(\frac{y_i - \alpha - \beta' x_i}{\sigma}\right) + \log\Phi\!\left(\frac{-\lambda\,(y_i - \alpha - \beta' x_i)}{\sigma}\right) \right] \]
32/76
Simulated log likelihood function for stochastic frontier model
with a time invariant random constant term. (TRE model)
\[ \log L_S(\alpha, \beta, \sigma, \lambda, \sigma_w) = \sum_{i=1}^{N} \log \frac{1}{R} \sum_{r=1}^{R} \prod_{t=1}^{T} \frac{2}{\sigma}\, \phi\!\left(\frac{y_{it} - (\alpha + \sigma_w w_{ir}) - \beta' x_{it}}{\sigma}\right) \Phi\!\left(\frac{-\lambda\,(y_{it} - (\alpha + \sigma_w w_{ir}) - \beta' x_{it})}{\sigma}\right) \]
$w_{ir}$ = draws from N[0,1].
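The simulated log likelihood for the TRE model can be sketched directly from its definition. The code below is an illustration under stated assumptions: the panel layout (a list of firms, each a list of (y, x) pairs), the function names, the number of draws, and the use of seeded pseudo-random draws are all choices made for the example, and nothing here is tuned for numerical robustness with long panels.

```python
import math
import random


def log_sf_density(eps, sigma, lam):
    """log of (2/sigma) * phi(eps/sigma) * Phi(-lam * eps / sigma)."""
    z = eps / sigma
    log_phi = -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)
    # Phi(a) = 0.5 * erfc(-a / sqrt(2)), evaluated at a = -lam * z.
    cdf = 0.5 * math.erfc(lam * z / math.sqrt(2.0))
    return math.log(2.0 / sigma) + log_phi + math.log(cdf)


def tre_sim_loglik(panel, alpha, beta, sigma, lam, sigma_w, R=200, seed=1):
    """panel: list of firms; each firm is a list of (y_it, x_it), x_it a list.
    The same seed on every call keeps the draws fixed across evaluations."""
    rng = random.Random(seed)
    total = 0.0
    for firm in panel:
        draws = [rng.gauss(0.0, 1.0) for _ in range(R)]  # w_ir, constant over t
        sim = 0.0
        for w in draws:
            log_prod = 0.0
            for y, x in firm:
                xb = sum(b * xk for b, xk in zip(beta, x))
                eps = y - (alpha + sigma_w * w) - xb
                log_prod += log_sf_density(eps, sigma, lam)
            sim += math.exp(log_prod)  # product over t for this draw
        total += math.log(sim / R)  # log of the average over draws
    return total


# Hypothetical two-firm panel with one regressor.
panel = [
    [(1.0, [0.5]), (1.2, [0.7])],
    [(0.3, [0.1]), (0.8, [0.9]), (0.5, [0.4])],
]
print(tre_sim_loglik(panel, 0.4, [0.9], 0.5, 1.2, 0.3))
```

Setting `sigma_w` to zero removes the random constant, so the function then reproduces the pooled stochastic frontier log likelihood exactly, which is a convenient check.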
33/76
The Most Famous Frontier Study Ever
34/76
The Famous WHO Model
• $\log \text{COMP} = \alpha + \beta_1 \log \text{PerCapitaHealthExpenditure} + \beta_2 \log \text{YearsEduc} + \beta_3 \log^2 \text{YearsEduc} + \varepsilon$
• $\varepsilon = v - u$
• Schmidt/Sickles FEM
• 191 Countries. 140 of them observed 1993-1997.
35/76
The Notorious WHO Results
37
36/76
August 12, 2012
37
No, it doesn’t.
37/76
Huffington Post, April 17, 2014
38/76
we are #37
39/76
Greene, W., Distinguishing Between
Heterogeneity and Inefficiency:
Stochastic Frontier Analysis of the
World Health Organization’s Panel
Data on National Health Care
Systems, Health Economics, 13, 2004,
pp. 959-980.
40/76
\[ x = (1, \log \text{Exp}, \log \text{Ed}, \log^2 \text{Ed}) \]
\[ z = (\log \text{PopDen}, \log \text{PerCapitaGDP}, \text{GovtEff}, \text{VoxPopuli}, \text{OECD}, \text{GINI}) \]
41/76
Three Extensions of the
True Random Effects Model
42/76
Generalized True Random Effects Model
Generalized True Random Effects Stochastic Frontier Model
\[ y_{it} = \alpha + A_i - B_i + \beta' x_{it} + v_{it} - u_{it} \]
Transient random components
\[ v_{it} - u_{it} \]
Time varying normal - half normal SF
Persistent random components
\[ A_i - B_i \]
Time fixed normal - half normal SF
43/76
A Stochastic Frontier Model with Short-Run and
Long-Run Inefficiency:
Colombi, R., Kumbhakar, S., Martini, G., Vittadini,
G., University of Bergamo, WP, 2011, JPA 2014,
forthcoming.
Tsionas, G. and Kumbhakar, S.
Firm Heterogeneity, Persistent and Transient Technical Inefficiency:
A Generalized True Random Effects Model
Journal of Applied Econometrics. Published online, November, 2012.
Extremely involved Bayesian MCMC procedure. Efficiency components estimated by
data augmentation.
44/76
Generalized True Random Effects Stochastic Frontier Model
\[ y_{it} = (\alpha + \sigma_w w_i - \theta |e_i|) + \beta' x_{it} + v_{it} - u_{it} \]
Time varying, transient random components
\[ v_{it} \sim N[0, \sigma_v^2], \quad u_{it} = |U_{it}| \text{ and } U_{it} \sim N[0, \sigma_u^2], \]
Time invariant random components
\[ w_i \sim N[0,1], \quad e_i \sim N[0,1] \]
The random constant term in this model has a closed skew
normal distribution, instead of the usual normal distribution.
45/76
Estimating Efficiency in the CSN Model
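Moment Generating Function for the Multivariate CSN Distribution
\[ E[\exp(t' u_i) \mid y_i] = \frac{\Phi_{T+1}(R r_i + \Lambda t, \Lambda)}{\Phi_{T+1}(R r_i, \Lambda)} \exp\left( t' R r_i + \tfrac{1}{2} t' \Lambda t \right) \]
$\Phi_{T+1}(\ldots, \Lambda)$ = Multivariate normal cdf. Parts defined in Colombi et al.
Computed using GHK simulator.
\[ u_i = \begin{pmatrix} e_i \\ u_{i1} \\ \vdots \\ u_{iT} \end{pmatrix}, \quad t = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \ldots, \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix} \]
46/76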
Estimating the GTRE Model
47/76
Colombi et al. Classical Maximum Likelihood Estimator
\[ \log L = \sum_{i=1}^{N} \left[ \log \phi_T(y_i - X_i\beta - 1_T\alpha, \; \Sigma + A V A') + \log \Phi_q(R(y_i - X_i\beta - 1_T\alpha), \; \Lambda) \right] + Nq \log 2 \]
$\phi_T(\ldots)$ = T-variate normal pdf.
$\Phi_q(\ldots, \Lambda)$ = (T+1)-variate normal integral.
Very time consuming and complicated.
“From the sampling theory perspective, the application
of the model is computationally prohibitive when T is
large. This is because the likelihood function depends
on a (T+1)-dimensional integral of the normal
distribution.” [Tsionas and Kumbhakar (2012, p. 6)]
48/76
Kumbhakar, Lien, Hardaker
Technical Efficiency in Competing Panel Data Models: A Study of
Norwegian Grain Farming, JPA, Published online, September, 2012.
Three steps based on GLS:
(1) RE/FGLS to estimate (α, β)
(2) Decompose time varying residuals using MoM and SF.
(3) Decompose estimates of time invariant residuals.
49/76
Maximum Simulated Full Information log likelihood function for the
"generalized true random effects stochastic frontier model"
\[ \log L_S(\alpha, \beta, \sigma, \lambda, \sigma_w, \theta) = \sum_{i=1}^{N} \log \frac{1}{R} \sum_{r=1}^{R} \prod_{t=1}^{T} \frac{2}{\sigma}\, \phi\!\left(\frac{y_{it} - (\alpha + \sigma_w w_{ir} - \theta |U_{ir}|) - \beta' x_{it}}{\sigma}\right) \Phi\!\left(\frac{-\lambda\,(y_{it} - (\alpha + \sigma_w w_{ir} - \theta |U_{ir}|) - \beta' x_{it})}{\sigma}\right) \]
$w_{ir}$ = draws from N[0,1]
$|U_{ir}|$ = absolute values of draws from N[0,1]
50/76
WHO Results: 2014
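\[ x = (1, \log \text{Exp}, \log \text{Ed}, \log^2 \text{Ed}) \]
\[ z = (\log \text{PopDen}, \log \text{PerCapitaGDP}, \text{GovtEff}, \text{VoxPopuli}, \text{OECD}, \text{GINI}) \]
\[ \varepsilon_{it} = A_i - B_i + v_{it} - u_{it} \]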
51/76
52/76
Empirical application
Cost Efficiency of Swiss Railway
Companies
53/76
Model Specification
TC = f(Y1, Y2, PL, PC, PE, N, NS, dt)
TC : Total costs
Y1 : Passenger-km
Y2 : Ton-km
PL : Price of labor (wage per FTE)
PC : Price of capital (capital costs / total number of seats)
PE : Price of electricity
N  : Network length
NS : Number of stations
dt : Time dummies
54/76
Data
• 50 railway companies
• Period 1985 to 1997
• Unbalanced panel with number of periods (Ti) varying from 1 to 13 and with 45 companies with 12 or 13 years, resulting in 605 observations
• Data source: Swiss federal transport office
• Data set available at http://people.stern.nyu.edu/wgreene/
• Data set used in: Farsi, Filippini, Greene (2005), Efficiency and measurement in network industries: application to the Swiss railway companies, Journal of Regulatory Economics
55/76
56/76
57/76
Cost Efficiency Estimates
58/76
Correlations
59/76
MSL Estimation
60/76
Why is the MSL method so computationally
efficient compared to classical FIML and
Bayesian MCMC for this model?
• Conditioned on the persistent effects, the group observations are independent.
• The joint conditional distribution is simple and easy to compute, in closed form.
• The full likelihood is obtained by integrating over only one dimension. (This was discovered by Butler and Moffitt in 1982.)
• Neither of the other methods takes advantage of this result. Both integrate over T+1 dimensions.
61/76
62/76
Equivalent Log Likelihood – Identical Outcome
One Dimensional Integration over δi
T+1 Dimensional Integration over Rei.
63/76
Simulated [over (w,h)] Log Likelihood
\[ \log L_S = \sum_{i=1}^{N} \log \frac{1}{R} \sum_{r=1}^{R} G_i(\delta_{ir} \mid \alpha, \beta, \lambda, \sigma, \sigma_w, \sigma_h) \]
Very Fast – with T=13, one minute or so
64/76
Also Simulated Log Likelihood
GHK simulator is used to approximate the T+1 variate normal
integrals.
Very Slow – Huge amount of unnecessary computation.
65/76
Computation of the GTRE Model is Actually Fast and Easy
247 Farms, 6 years.
100 Halton draws.
Computation time:
35 seconds including
computing efficiencies.
66/76
Simulation Variance
67/76
Does the simulation chatter degrade the
econometric efficiency of the MSL estimator?
• Hajivassiliou, V., “Some practical issues in maximum simulated likelihood,” Simulation-based Inference in Econometrics: Methods and Applications, Mariano, R., Weeks, M. and Schuermann, T., Cambridge University Press, 2008
  • Speculated that Asy.Var[estimator] = V + (1/R)C
  • The contribution of the chatter would be of second or third order. R is typically in the hundreds or thousands.
• No other evidence on this subject.
68/76
An Experiment
Pooled Spanish Dairy Farms Data
• Stochastic frontier using FIML.
• Random constant term linear regression with constant term equal to $\alpha - \sigma_u |w|$, w ~ N[0,1]. This is equivalent to the stochastic frontier model.
• Maximum simulated likelihood
• 500 random draws for the simulation for the base case. Uses Mersenne Twister for the RNG
• 50 repetitions of estimation based on 500 random draws to suggest variation due to simulation chatter.
69/76
\[ \hat{\sigma}_v = 0.10371 \]
\[ \hat{\sigma}_u = 0.15573 \]
70/76
Simulation Noise in Standard Errors of Coefficients
Chatter
.00543
.00590
.00042
.00119
71/76
Quasi-Monte Carlo Integration Based on
Halton Sequences
Coverage of the unit interval is the objective,
not randomness of the set of draws.
Halton sequences --- Markov chain
p = a prime number,
r = the sequence of integers, decomposed as
\[ r = \sum_{i=0}^{I} b_i p^i \]
\[ H(r \mid p) = \sum_{i=0}^{I} b_i p^{-i-1}, \quad r = r_1, \ldots \text{ (e.g., 10, 11, 12, ...)} \]
For example, using base p = 5, the integer r = 37 has $b_0 = 2$, $b_1 = 2$, and $b_2 = 1$ ($37 = 1 \times 5^2 + 2 \times 5^1 + 2 \times 5^0$). Then
$H(37 \mid 5) = 2 \times 5^{-1} + 2 \times 5^{-2} + 1 \times 5^{-3} = 0.488$.
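The digit expansion above translates into a short routine. The sketch below computes H(r|p) by peeling off base-p digits and, as an assumed extra step, maps the Halton points to standard normal draws through the inverse normal cdf; the function names and the burn-in length of 10 are illustrative choices.

```python
from statistics import NormalDist


def halton(r, p):
    """Radical inverse of the integer r in prime base p:
    the sum of b_i * p**(-i-1) over the base-p digits b_i of r."""
    h = 0.0
    weight = 1.0 / p
    while r > 0:
        r, digit = divmod(r, p)  # peel off b_0, then b_1, and so on
        h += digit * weight
        weight /= p
    return h


def halton_normal_draws(p, R, burn_in=10):
    """R standard normal draws from the Halton sequence in base p.
    The first burn_in points are skipped (an assumed choice)."""
    inv = NormalDist().inv_cdf
    return [inv(halton(r, p)) for r in range(burn_in + 1, burn_in + R + 1)]


print(halton(37, 5))  # the worked example with p = 5 and r = 37
print(halton_normal_draws(3, 5))
```

Different primes give different sequences, so each random component in a model would typically get its own base.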
72/76
Is It Really Simulation?
• Halton or Sobol sequences are not random
• Far more stable than random draws, by a factor of about 10.
• There is no simulation chatter
• View the same as numerical quadrature
• There may be some approximation error. How would we know?
73/76
74/76
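Haltonized Log Likelihood
\[ \text{LogL}(\alpha, \beta, \lambda, \sigma, \sigma_w, \sigma_h) = \sum_{i=1}^{N} \log \int_{\delta_i} \prod_{t=1}^{T} \left[ \frac{2}{\sigma}\, \phi\!\left(\frac{y_{it} - \alpha - \beta' x_{it} - \delta_i}{\sigma}\right) \Phi\!\left(\frac{-\lambda\,(y_{it} - \alpha - \beta' x_{it} - \delta_i)}{\sigma}\right) \right] \frac{2}{\sigma_\delta}\, \phi\!\left(\frac{\delta_i}{\sigma_\delta}\right) \Phi\!\left(\frac{-\lambda_\delta\, \delta_i}{\sigma_\delta}\right) d\delta_i \]
\[ \sigma_\delta = \sqrt{\sigma_w^2 + \sigma_h^2}, \quad \lambda_\delta = \frac{\sigma_h}{\sigma_w} \]
\[ \text{LogL}_S(\alpha, \beta, \lambda, \sigma, \sigma_w, \sigma_h) = \sum_{i=1}^{N} \log \frac{1}{R} \sum_{r=1}^{R} \prod_{t=1}^{T} \frac{2}{\sigma}\, \phi\!\left(\frac{y_{it} - \alpha - \beta' x_{it} - \delta_{ir}}{\sigma}\right) \Phi\!\left(\frac{-\lambda\,(y_{it} - \alpha - \beta' x_{it} - \delta_{ir})}{\sigma}\right) \]
\[ \delta_{ir} = \sigma_w W_{ir} - \sigma_h |H_{ir}| \]
\[ W_{ir} = \Phi^{-1}(\text{Halton}[\text{prime}(w), \; r + \text{burn in}]) \]
\[ H_{ir} = \Phi^{-1}(\text{Halton}[\text{prime}(h), \; r + \text{burn in}]) \]
75/76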
Summary
• The skew normal distribution
• Two useful models for panel data (and one potentially useful model pending development)
  • Extension of TRE model that allows both transient and persistent random variation and inefficiency
  • Sample selection corrected stochastic frontier
  • Spatial autocorrelation stochastic frontier model
• Methods: Maximum simulated likelihood as an alternative to received brute force methods
  • Simpler
  • Faster
  • Accurate
  • Simulation “chatter” is a red herring – use Halton sequences
76/76
Sample Selection
77/76
TECHNICAL EFFICIENCY ANALYSIS CORRECTING FOR
BIASES FROM OBSERVED AND UNOBSERVED
VARIABLES: AN APPLICATION TO A NATURAL RESOURCE
MANAGEMENT PROJECT
Empirical Economics: Volume 43, Issue 1 (2012), Pages 55-72
Boris Bravo-Ureta
University of Connecticut
Daniel Solis
University of Miami
William Greene
New York University
78/76
The MARENA Program in Honduras
• Several programs have been implemented to address resource degradation while also seeking to improve productivity, managerial performance and reduce poverty (and in some cases make up for lack of public support).
• One such effort is the Programa Multifase de Manejo de Recursos Naturales en Cuencas Prioritarias or MARENA in Honduras focusing on small scale hillside farmers.
79/76
Expected Impact Evaluation
80/76
Methods
• A matched group of beneficiaries and control farmers is determined using Propensity Score Matching techniques to mitigate biases that would stem from selection on observed variables.
• In addition, we deal with possible self-selection on unobservables arising from unobserved variables using a selectivity correction model for stochastic frontiers introduced by Greene (2010).
81/76
A Sample Selected SF Model
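\[ d_i = 1[\gamma' z_i + h_i > 0], \quad h_i \sim N[0, 1^2] \]
\[ y_i = \alpha + \beta' x_i + \varepsilon_i, \quad \varepsilon_i \sim N[0, \sigma_\varepsilon^2] \]
$(y_i, x_i)$ observed only when $d_i = 1$.
\[ \varepsilon_i = v_i - u_i \]
\[ u_i = \sigma_u |U_i| \text{ where } U_i \sim N[0, 1^2] \]
\[ v_i = \sigma_v V_i \text{ where } V_i \sim N[0, 1^2]. \]
\[ (h_i, v_i) \sim N_2[(0,1), (1, \rho\sigma_v, \sigma_v^2)] \]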
82/76
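Simulated logL for the Standard SF Model
\[ f(y_i \mid x_i, |U_i|) = \frac{\exp[-\tfrac{1}{2}(y_i - \alpha - \beta' x_i + \sigma_u |U_i|)^2 / \sigma_v^2]}{\sigma_v \sqrt{2\pi}} \]
\[ f(y_i \mid x_i) = \int_{|U_i|} \frac{\exp[-\tfrac{1}{2}(y_i - \alpha - \beta' x_i + \sigma_u |U_i|)^2 / \sigma_v^2]}{\sigma_v \sqrt{2\pi}} \; p(|U_i|) \, d|U_i| \]
\[ p(|U_i|) = \frac{2 \exp[-\tfrac{1}{2} |U_i|^2]}{\sqrt{2\pi}}, \quad |U_i| \ge 0. \quad \text{(Half normal)} \]
\[ f(y_i \mid x_i) \approx \frac{1}{R} \sum_{r=1}^{R} \frac{\exp[-\tfrac{1}{2}(y_i - \alpha - \beta' x_i + \sigma_u |U_{ir}|)^2 / \sigma_v^2]}{\sigma_v \sqrt{2\pi}} \]
\[ \log L_S(\alpha, \beta, \sigma_u, \sigma_v) = \sum_{i=1}^{N} \log \left\{ \frac{1}{R} \sum_{r=1}^{R} \frac{\exp[-\tfrac{1}{2}(y_i - \alpha - \beta' x_i + \sigma_u |U_{ir}|)^2 / \sigma_v^2]}{\sigma_v \sqrt{2\pi}} \right\} \]
This is simply a linear regression with a random constant term, $\alpha_i = \alpha - \sigma_u |U_i|$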
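The claim that averaging a normal density over half-normal draws recovers the normal-half-normal density can be checked numerically. The sketch below is an illustration only: it uses Halton-based draws in base 2 with an assumed burn-in of 10, hypothetical values σ_u = 0.5 and σ_v = 0.3, and takes the residual y − α − β'x as a single number rather than fitting any data.

```python
import math
from statistics import NormalDist


def halton(r, p):
    """Radical inverse of integer r in prime base p."""
    h, weight = 0.0, 1.0 / p
    while r > 0:
        r, digit = divmod(r, p)
        h += digit * weight
        weight /= p
    return h


def half_normal_draws(R, p=2, burn_in=10):
    """|U_r|: absolute values of Halton-based standard normal draws."""
    inv = NormalDist().inv_cdf
    return [abs(inv(halton(r, p))) for r in range(burn_in + 1, burn_in + R + 1)]


def sim_density(resid, sigma_u, sigma_v, abs_u):
    """(1/R) * sum over r of the N(0, sigma_v^2) density evaluated at
    resid + sigma_u * |U_r|, where resid = y - alpha - beta'x."""
    total = 0.0
    for u in abs_u:
        z = (resid + sigma_u * u) / sigma_v
        total += math.exp(-0.5 * z * z) / (sigma_v * math.sqrt(2.0 * math.pi))
    return total / len(abs_u)


def closed_form_density(resid, sigma_u, sigma_v):
    """(2/sigma) * phi(resid/sigma) * Phi(-lambda * resid / sigma)."""
    sigma = math.hypot(sigma_u, sigma_v)
    lam = sigma_u / sigma_v
    nd = NormalDist()
    return (2.0 / sigma) * nd.pdf(resid / sigma) * nd.cdf(-lam * resid / sigma)


abs_u = half_normal_draws(5000)
for resid in (-0.5, -0.2, 0.1):
    print(resid, sim_density(resid, 0.5, 0.3, abs_u),
          closed_form_density(resid, 0.5, 0.3))
```

The two columns printed for each residual should agree closely, and summing the log of `sim_density` over observations gives the simulated log likelihood.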
83/76
Likelihood For a Sample Selected SF Model
\[ f(y_i \mid x_i, d_i, z_i, |U_i|) = d_i \left[ \frac{\exp(-\tfrac{1}{2}(y_i - \alpha - \beta' x_i + \sigma_u |U_i|)^2 / \sigma_v^2)}{\sigma_v \sqrt{2\pi}} \; \Phi\!\left( \frac{\rho\,(y_i - \alpha - \beta' x_i + \sigma_u |U_i|)/\sigma_v + \gamma' z_i}{\sqrt{1 - \rho^2}} \right) \right] + (1 - d_i)\, \Phi(-\gamma' z_i) \]
\[ f(y_i \mid x_i, d_i, z_i) = \int_{|U_i|} f(y_i \mid x_i, d_i, z_i, |U_i|) \, f(|U_i|) \, d|U_i| \]
84/76
Simulated Log Likelihood for a Selectivity
Corrected Stochastic Frontier Model
The simulation is over the inefficiency term.
\[ \log L_S(\alpha, \beta, \sigma_u, \sigma_v, \gamma, \rho) = \sum_{i=1}^{N} \log \frac{1}{R} \sum_{r=1}^{R} \left\{ d_i \left[ \frac{\exp(-\tfrac{1}{2}(y_i - \alpha - \beta' x_i + \sigma_u |U_{ir}|)^2 / \sigma_v^2)}{\sigma_v \sqrt{2\pi}} \; \Phi\!\left( \frac{\rho\,(y_i - \alpha - \beta' x_i + \sigma_u |U_{ir}|)/\sigma_v + \gamma' z_i}{\sqrt{1 - \rho^2}} \right) \right] + (1 - d_i)\, \Phi(-\gamma' z_i) \right\} \]
85/76
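JLMS Estimator of $u_i$
\[ \hat{f}_{ir} = \frac{\exp(-\tfrac{1}{2}(y_i - \hat{\alpha} - \hat{\beta}' x_i + \hat{\sigma}_u |U_{ir}|)^2 / \hat{\sigma}_v^2)}{\hat{\sigma}_v \sqrt{2\pi}} \; \Phi\!\left( \frac{\hat{\rho}\,(y_i - \hat{\alpha} - \hat{\beta}' x_i + \hat{\sigma}_u |U_{ir}|)/\hat{\sigma}_v + a_i}{\sqrt{1 - \hat{\rho}^2}} \right) \]
\[ \hat{A}_i = \frac{1}{R} \sum_{r=1}^{R} (\hat{\sigma}_u |U_{ir}|)\, \hat{f}_{ir}, \quad \hat{B}_i = \frac{1}{R} \sum_{r=1}^{R} \hat{f}_{ir} \]
\[ \hat{u}_i = \text{Estimator of } E[u_i \mid \varepsilon_i] = \frac{\hat{A}_i}{\hat{B}_i} = \sum_{r=1}^{R} \hat{g}_{ir}\, \hat{\sigma}_u |U_{ir}| \quad \text{where } \hat{g}_{ir} = \frac{\hat{f}_{ir}}{\sum_{r=1}^{R} \hat{f}_{ir}}, \quad \sum_{r=1}^{R} \hat{g}_{ir} = 1 \]
86/76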
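The weighted-average form of the simulated JLMS estimator can be illustrated with a small sketch. Assumptions: Halton-based half-normal draws in base 2 with a burn-in of 10, hypothetical values σ_u = 0.5 and σ_v = 0.3, and the residual y − α − β'x passed in directly. With ρ = 0 the selection factor is constant across draws and cancels from the weights, so the result can be compared with the closed-form JLMS expression for the standard model.

```python
import math
from statistics import NormalDist


def halton(r, p):
    """Radical inverse of integer r in prime base p."""
    h, weight = 0.0, 1.0 / p
    while r > 0:
        r, digit = divmod(r, p)
        h += digit * weight
        weight /= p
    return h


def half_normal_draws(R, p=2, burn_in=10):
    """|U_r|: absolute values of Halton-based standard normal draws."""
    inv = NormalDist().inv_cdf
    return [abs(inv(halton(r, p))) for r in range(burn_in + 1, burn_in + R + 1)]


def simulated_jlms(resid, sigma_u, sigma_v, abs_u, rho=0.0, a_i=0.0):
    """u_hat = sum_r g_r * sigma_u * |U_r| with g_r = f_r / sum_r f_r."""
    nd = NormalDist()
    num = 0.0
    den = 0.0
    for u in abs_u:
        e = resid + sigma_u * u  # implied value of v_i for this draw
        f = math.exp(-0.5 * (e / sigma_v) ** 2) / (sigma_v * math.sqrt(2.0 * math.pi))
        f *= nd.cdf((rho * e / sigma_v + a_i) / math.sqrt(1.0 - rho * rho))
        num += sigma_u * u * f
        den += f
    return num / den


def closed_form_jlms(resid, sigma_u, sigma_v):
    """sigma*lambda/(1+lambda^2) * [mu + phi(mu)/Phi(mu)], mu = -lambda*resid/sigma."""
    sigma = math.hypot(sigma_u, sigma_v)
    lam = sigma_u / sigma_v
    mu = -lam * resid / sigma
    nd = NormalDist()
    return sigma * lam / (1.0 + lam * lam) * (mu + nd.pdf(mu) / nd.cdf(mu))


abs_u = half_normal_draws(20000)
for resid in (-0.5, -0.2, 0.1):
    print(resid, simulated_jlms(resid, 0.5, 0.3, abs_u),
          closed_form_jlms(resid, 0.5, 0.3))
```

Larger negative residuals produce larger inefficiency estimates in both columns, as expected for a production frontier.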
Closed Form for the Selection Model
• The selection model can be estimated without simulation
• “The stochastic frontier model with correction for sample selection revisited.” Lai, Hung-pin. Forthcoming, JPA
• Based on closed skew normal distribution
• Similar to Maddala’s 1982 result for the linear selection model. See slide 42.
• Not more computationally efficient.
• Statistical properties identical.
• Suggested possibility that simulation chatter is an element of inefficiency in the maximum simulated likelihood estimator.
87/76
Closed Form vs. Simulation
Spanish Dairy Farms: Selection based on being farm #1-125. 6 periods
The theory works.
88/76
Variables Used
in the Analysis
Production
Participation
89/76
Findings from the First Wave
90/76
A Panel Data Model
• Selection takes place only at the baseline.
• There is no attrition.
Sample Selector
\[ d_{i0} = 1[\gamma' z_{i0} + h_{i0} > 0] \]
Stochastic Frontier
\[ y_{it} = \alpha + w_i + \beta' x_{it} + v_{it} - u_{it}, \quad t = 0, 1, \ldots \]
Selection effect is exerted on $w_i$; $\text{Corr}(h_{i0}, w_i) = \rho$
\[ P(y_{it}, d_{i0}) = P(d_{i0})\, P(y_{it} \mid d_{i0}) \]
Conditioned on the selection ($h_{i0}$) observations are independent.
\[ P(y_{i0}, y_{i1}, \ldots, y_{iT} \mid d_{i0}) = \prod_{t=0}^{T} P(y_{it} \mid d_{i0}) \]
I.e., the selection is acting like a permanent random effect.
\[ P(y_{i0}, y_{i1}, \ldots, y_{iT}, d_{i0}) = P(d_{i0}) \prod_{t=0}^{T} P(y_{it} \mid d_{i0}) \]
91/76
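Simulated Log Likelihood
\[ \log L_{S,C}(\alpha, \beta, \sigma_u, \sigma_v, \rho) = \sum_{d_i = 1} \log \frac{1}{R} \sum_{r=1}^{R} \prod_{t=0}^{T} \left[ \frac{\exp(-\tfrac{1}{2}(y_{it} - \alpha - \beta' x_{it} + \sigma_u |U_{itr}|)^2 / \sigma_v^2)}{\sigma_v \sqrt{2\pi}} \; \Phi\!\left( \frac{\rho\,(y_{it} - \alpha - \beta' x_{it} + \sigma_u |U_{itr}|)/\sigma_v + a_{i0}}{\sqrt{1 - \rho^2}} \right) \right] \]
92/76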
Main Empirical Conclusions from Waves 0 and 1
• Benefit group is more efficient in both years
• The gap is wider in the second year
• Both means increase from year 0 to year 1
• Both variances decline from year 0 to year 1
93/76
94/76
Spatial Autocorrelation
95/76
True Random Spatial Effects
• Spatial Stochastic Frontier Models: Accounting for Unobserved Local Determinants of Inefficiency: A.M.Schmidt, A.R.B.Morris, S.M.Helfand, T.C.O.Fonseca, Journal of Productivity Analysis, 31, 2009, pp. 101-112
• Simply redefines the random effect to be a ‘region effect.’ Just a reinterpretation of the ‘group.’ No spatial decay with distance.
• True REM does not “perform” as well as several other specifications. (“Performance” has nothing to do with the frontier model.)
96/76