Part 22: Stochastic Frontier [1/83] Econometric Analysis of Panel Data William Greene Department of Economics Stern School of Business.
[2/83] 22. Stochastic Frontier Models and Efficiency Measurement

[3/83] Applications
  Banking
  Accounting Firms, Insurance Firms
  Health Care: Hospitals, Nursing Homes
  Higher Education
  Fishing
  Sports: Hockey, Baseball
  World Health Organization – World Health
  Industries: Railroads, Farming
  Several hundred applications in print since 2000

[4/83] Technical Efficiency

[5/83] Technical Inefficiency
  $\beta$ = production parameters, “i” = firm i.

[6/83] (Nonparametric) Data Envelopment Analysis

[7/83] DEA is done using linear programming

[8/83] Regression Basis

[9/83] Maintaining the Theory
  One Sided Residuals, $u_i < 0$
  Deterministic Frontier
    Statistical Approach: Gamma Frontier. Not successful
    Nonstatistical Approach: Data Envelopment Analysis based on linear programming – wildly successful. Hundreds of applications; an industry with an army of management consultants

[10/83] Gamma Frontier
  Greene (1980, 1993, 2003)

[11/83] Cost Frontier

[12/83] The Stochastic Frontier Model

[13/83] Stochastic Frontier Disturbances

[14/83] Half Normal Model (ALS)
  Closed Skew Normal Distribution
  $\varepsilon = v - u$ where $v \sim N[0,\sigma_v^2]$ and $u = |N[0,\sigma_u^2]|$
  $$f(\varepsilon) = \frac{2}{\sigma}\,\phi\!\left(\frac{\varepsilon}{\sigma}\right)\Phi\!\left(\frac{-\lambda\varepsilon}{\sigma}\right); \quad \lambda = \frac{\sigma_u}{\sigma_v}, \quad \sigma^2 = \sigma_v^2 + \sigma_u^2$$

[15/83] Estimating the Stochastic Frontier
  OLS
    Slope estimator is unbiased and consistent
    Constant term is biased downward
    $e'e/N$ estimates $\mathrm{Var}[\varepsilon] = \mathrm{Var}[v] + \mathrm{Var}[u] = \sigma_v^2 + \sigma_u^2[(\pi-2)/\pi]$
    No estimates of the variance components
  Maximum Likelihood
    The usual properties
    Likelihood function has two modes: OLS with $\lambda = 0$ and ML with $\lambda > 0$.
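A minimal sketch, not part of the original slides, that checks two half normal results from slides 14 and 15 numerically: the density $f(\varepsilon) = (2/\sigma)\phi(\varepsilon/\sigma)\Phi(-\lambda\varepsilon/\sigma)$ should integrate to one, and its variance should equal $\sigma_v^2 + \sigma_u^2(\pi-2)/\pi$. The function names, the grid settings, and the values of $\sigma_v$ and $\sigma_u$ are illustrative assumptions. Only the Python standard library is used.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sf_density(eps, sigma_v, sigma_u):
    """Normal-half normal density of eps = v - u (production form)."""
    sigma = math.sqrt(sigma_v ** 2 + sigma_u ** 2)
    lam = sigma_u / sigma_v
    return (2.0 / sigma) * norm_pdf(eps / sigma) * norm_cdf(-lam * eps / sigma)


def moments_by_quadrature(sigma_v, sigma_u, n=20001, width=10.0):
    """Trapezoid-rule mass, mean and variance of eps on [-width*sigma, width*sigma]."""
    sigma = math.sqrt(sigma_v ** 2 + sigma_u ** 2)
    lo, hi = -width * sigma, width * sigma
    h = (hi - lo) / (n - 1)
    mass = mean = second = 0.0
    for k in range(n):
        e = lo + k * h
        w = h * (0.5 if k in (0, n - 1) else 1.0)  # trapezoid end-point weights
        f = sf_density(e, sigma_v, sigma_u)
        mass += w * f
        mean += w * e * f
        second += w * e * e * f
    return mass, mean, second - mean ** 2


# Illustrative magnitudes only (assumed values, not a re-estimation).
SIGMA_V, SIGMA_U = 0.10884, 0.14944

mass, mean, var = moments_by_quadrature(SIGMA_V, SIGMA_U)
expected_mean = -SIGMA_U * math.sqrt(2.0 / math.pi)          # E[v - u] = -E[u]
expected_var = SIGMA_V ** 2 + SIGMA_U ** 2 * (math.pi - 2.0) / math.pi

print("mass     :", mass)
print("mean     :", mean, "vs", expected_mean)
print("variance :", var, "vs", expected_var)
```

The nonzero mean of $\varepsilon$ is what shifts the OLS constant term, while the slopes are unaffected; the variance identity is the reason $e'e/N$ alone cannot separate $\sigma_v^2$ from $\sigma_u^2$.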
[16/83] Other Possible Distributions
  Exponential
  $$f_u(u_i) = \theta\exp(-\theta u_i), \quad \theta > 0,\ u_i > 0.$$
  $$\log L(\alpha,\beta,\sigma_v,\sigma_u) = \sum_{i=1}^{N}\left[-\ln\sigma_u + \frac{1}{2}\left(\frac{\sigma_v}{\sigma_u}\right)^2 + \ln\Phi\!\left(\frac{-(\varepsilon_i + \sigma_v^2/\sigma_u)}{\sigma_v}\right) + \frac{\varepsilon_i}{\sigma_u}\right]$$
  Gamma
  $$f_u(u_i) = \frac{\sigma_u^{-P}}{\Gamma(P)}\exp(-u_i/\sigma_u)\,u_i^{P-1}, \quad u_i > 0,\ P > 0$$
  $$\log L(\alpha,\beta,\sigma_v,\sigma_u) = \sum_{i=1}^{N}\left[-P\ln\sigma_u - \ln\Gamma(P) + \ln q(P-1,\varepsilon_i) + \frac{1}{2}\left(\frac{\sigma_v}{\sigma_u}\right)^2 + \ln\Phi\!\left(\frac{-(\varepsilon_i + \sigma_v^2/\sigma_u)}{\sigma_v}\right) + \frac{\varepsilon_i}{\sigma_u}\right]$$
  $\ln q(P-1,\varepsilon_i)$ must be approximated using simulation

[17/83] Normal vs. Exponential Models

[18/83] Estimating Inefficiency
  $$E[u_i \mid \varepsilon_i] = \frac{\sigma\lambda}{1+\lambda^2}\left[\frac{\phi(z_i)}{\Phi(z_i)} + z_i\right], \quad \text{where } z_i = \frac{-\lambda\varepsilon_i}{\sigma}$$

[19/83] Dual Cost Function

[20/83] Application: Electricity Data
  Sample = 123 Electricity Generating Firms, Data from 1970

  Variable   Mean      Std. Dev.  Description
  ========================================================
  FIRM       62.000    35.651     Firm number, 1,…,123
  COST       48.467    64.064     Total cost
  OUTPUT     9501.1    12512.     Total generation in KWH
  CAPITAL    .14397    .19558     K = Capital share * Cost / PK
  LABOR      .00074    .00099     L = Labor share * Cost / PL
  FUEL       1.0047    1.2867     F = Fuel share * Cost / PF
  LPRICE     7988.6    1252.8     PL = Average labor price
  LSHARE     .14286    .056310    Labor share in total cost
  CPRICE     72.895    9.5163     PK = Capital price
  CSHARE     .22776    .06010     Capital share in total cost
  FPRICE     30.807    7.9282     PF = Fuel price in cents per BTU
  FSHARE     .62938    .08619     Fuel share in total cost
  LOGC_PF    -.38339   1.5385     Log (Cost/PF)
  LOGQ       8.1795    1.8299     Log output
  LOGQSQ     35.113    13.095     ½ (Log Q)^2

[21/83] OLS – Cost Function
+----------------------------------------------------+
| Ordinary least squares regression                  |
| Residuals   Sum of squares        = 2.443509       |
|             Standard error of e   = .1439017       |
| Fit         R-squared             = .9915380       |
| Diagnostic  Log likelihood        = 66.47364       |
+----------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 Constant     -7.29402077        .34427692  -21.186     .0000
 LOGQ           .39090935        .03698792   10.569     .0000 8.17947153
 LOGPL_PF       .26078497        .06810921    3.829     .0002 5.58088278
 LOGPK_PF       .07478746        .06164533    1.213     .2275  .88666047
 LOGQSQ         .06241301        .00515483   12.108     .0000 35.1125267

[22/83] ML – Cost Function
+---------------------------------------------+
| Maximum Likelihood Estimates                |
| Log likelihood function        66.86502     |
| Variances: Sigma-squared(v)=     .01185     |
|            Sigma-squared(u)=     .02233     |
|            Sigma(v)        =     .10884     |
|            Sigma(u)        =     .14944     |
| Sigma = Sqr[(s^2(u)+s^2(v)]=     .18488     |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 Primary Index Equation for Model
 Constant     -7.49421176        .32997411  -22.712     .0000
 LOGQ           .41097893        .03599288   11.418     .0000 8.17947153
 LOGPL_PF       .26058898        .06554430    3.976     .0001 5.58088278
 LOGPK_PF       .05531289        .06001748     .922     .3567  .88666047
 LOGQSQ         .06058236        .00493666   12.272     .0000 35.1125267
 Variance parameters for compound error
 Lambda        1.37311716        .29711056    4.622     .0000
 Sigma          .18487506        .00110120  167.884     .0000

[23/83] Estimated Efficiencies

[24/83] Panel Data Applications
  $U_i$ is the ‘effect’
    Fixed (OLS) or random effect (ML)
  Is inefficiency fixed over time?
    $y_{it} = \alpha + \beta' x_{it} + v_{it} - u_i$
  ‘True’ fixed and random effects
    Is inefficiency time varying?
    Where does heterogeneity show up in the model?
    $y_{it} = \alpha_i + \beta' x_{it}\,(z_i) + v_{it} - u_{it}\,(z_i)$

[25/83] Main Issues in Panel Data Modeling
  Issues
    Capturing Time Invariant Effects
    Dealing with Time Variation in Inefficiency
    Separating Heterogeneity from Inefficiency
  Contrasts – Panel Data vs. Cross Section

[26/83] Familiar RE and FE Models
  Wisdom from the linear model
  FE: y(i,t) = f[x(i,t)] + a(i) + e(i,t)
  RE: y(i,t) = f[x(i,t)] + u(i) + e(i,t)
  What does a(i) capture?
    Nonorthogonality of a(i) and x(i,t)
    The LSDV estimator
  How does u(i) differ from a(i)?
    Generalized least squares and maximum likelihood
  What are the time invariant effects?

[27/83] Frontier Model for Panel Data
  y(i,t) = β’x(i,t) – u(i) + v(i,t)
  Effects model with time invariant inefficiency
  Same dichotomy between FE and RE – correlation with x(i,t).
  FE case is completely unlike the assumption in the cross section case

[28/83] Pitt and Lee RE Model

[29/83] Estimating Efficiency

[30/83] Schmidt and Sickles FE Model
  $\ln y_{it} = \alpha + \beta' x_{it} + a_i + v_{it}$, estimated by least squares (‘within’)
  $\hat u_i = \max_j(\hat a_j) - \hat a_i \ge 0$ (for production or profit)
  $\hat u_i = \hat a_i - \min_j(\hat a_j) \ge 0$ (for cost or distance)
  Implications: One firm is perfectly efficient. Either deterministic frontier, or firms are compared to other firms, not an absolute standard of zero.

[31/83] A Problem of Heterogeneity
  In the “effects” model, u(i) absorbs two sources of variation
    Time invariant inefficiency
    Time invariant heterogeneity unrelated to inefficiency
  (Decomposing u(i,t)=u*(i)+u**(i,t) in the presence of v(i,t) is hopeless.)
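The Schmidt and Sickles calculation on slide 30 can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: a single regressor, a noise-free toy panel, and hypothetical function names (`within_slope`, `schmidt_sickles`). The within estimator gives the slope, the firm effects are recovered from the group means, and inefficiency is measured relative to the best firm in the sample.

```python
def firm_means(obs):
    """Return (mean of y, mean of x) for one firm's list of (y, x) pairs."""
    n = len(obs)
    return sum(y for y, _ in obs) / n, sum(x for _, x in obs) / n


def within_slope(panel):
    """Within (LSDV) slope for a single regressor; panel maps firm -> [(y, x), ...]."""
    num = den = 0.0
    for obs in panel.values():
        ybar, xbar = firm_means(obs)
        for y, x in obs:
            num += (x - xbar) * (y - ybar)
            den += (x - xbar) ** 2
    return num / den


def schmidt_sickles(panel, cost=False):
    """Estimate slope, firm effects a_i, and inefficiency u_i relative to the best firm."""
    beta = within_slope(panel)
    effects = {}
    for firm, obs in panel.items():
        ybar, xbar = firm_means(obs)
        effects[firm] = ybar - beta * xbar  # absorbs the overall constant as well
    if cost:
        best = min(effects.values())
        ineff = {f: a - best for f, a in effects.items()}
    else:
        best = max(effects.values())
        ineff = {f: best - a for f, a in effects.items()}
    return beta, effects, ineff


# Toy data (assumed): ln y = 1 + 2 x + a_i with a = 0, -0.3, -0.5 and no noise.
true_effects = {"A": 0.0, "B": -0.3, "C": -0.5}
x_values = {"A": [1.0, 2.0, 3.0], "B": [2.0, 3.0, 5.0], "C": [1.0, 4.0, 6.0]}
panel = {
    firm: [(1.0 + 2.0 * x + true_effects[firm], x) for x in xs]
    for firm, xs in x_values.items()
}

beta, effects, u_prod = schmidt_sickles(panel)
_, _, u_cost = schmidt_sickles(panel, cost=True)
print("slope:", beta)
print("production-side inefficiency:", u_prod)
print("cost-side inefficiency:", u_cost)
```

Because the benchmark is the sample maximum (or minimum), exactly one firm is assigned zero inefficiency, which is the implication noted on the slide.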
[32/83] Time Invariant Heterogeneity

[33/83] A True RE Model (Greene, 2004)

[34/83] Kumbhakar et al. (2011) – True True RE
  $y_{it} = b_0 + b' x_{it} + (e_{i0} + e_{it}) - (u_{i0} + u_{it})$
  $e_{i0}$ and $e_{it}$ full normally distributed
  $u_{i0}$ and $u_{it}$ half normally distributed
  (So far, only one application)
  Colombi, Kumbhakar, Martini, Vittadini, “A Stochastic Frontier with Short Run and Long Run Inefficiency,” 2011

[35/83] Generalized True Random Effects Model
  Generalized True Random Effects Stochastic Frontier Model
  $y_{it} = \alpha + A_i - B_i + \beta' x_{it} + v_{it} - u_{it}$
  Transient random components: $v_{it} - u_{it}$. Time varying normal – half normal SF
  Persistent random components: $A_i - B_i$. Time fixed normal – half normal SF

[36/83]
  A Stochastic Frontier Model with Short-Run and Long-Run Inefficiency: Colombi, R., Kumbhakar, S., Martini, G., Vittadini, G., University of Bergamo, WP, 2011, JPA 2014, forthcoming.
  Tsionas, G. and Kumbhakar, S. Firm Heterogeneity, Persistent and Transient Technical Inefficiency: A Generalized True Random Effects Model, Journal of Applied Econometrics. Published online, November, 2012.
    Extremely involved Bayesian MCMC procedure. Efficiency components estimated by data augmentation.

[37/83] Generalized True Random Effects Stochastic Frontier Model
  $y_{it} = (\alpha + \sigma_w w_i - \sigma_e |e_i|) + \beta' x_{it} + v_{it} - u_{it}$
  Time varying, transient random components: $v_{it} \sim N[0,\sigma_v^2]$, $u_{it} = |U_{it}|$ and $U_{it} \sim N[0,\sigma_u^2]$
  Time invariant random components: $w_i \sim N[0,1]$, $e_i \sim N[0,1]$
  The random constant term in this model has a closed skew normal distribution, instead of the usual normal distribution.

[38/83] Estimating Efficiency in the CSN Model
  Moment Generating Function for the Multivariate CSN Distribution
  $$E[\exp(t'u_i) \mid y_i] = \frac{\bar\Phi_{T+1}(R r_i + \Lambda t,\ \Lambda)}{\bar\Phi_{T+1}(R r_i,\ \Lambda)}\,\exp\!\left(t' R r_i + \tfrac{1}{2}\, t' \Lambda t\right)$$
  $\bar\Phi(\ldots,\ldots)$ = Multivariate normal cdf. Parts defined in Colombi et al. Computed using GHK simulator.
  $$u_i = (e_i,\ u_{i1},\ \ldots,\ u_{iT})', \qquad t = (1,0,\ldots,0)',\ (0,1,\ldots,0)',\ \ldots,\ (0,0,\ldots,1)'$$

[39/83] Estimating the GTRE Model

[40/83] Colombi et al. Classical Maximum Likelihood Estimator
  $$\log L = \sum_{i=1}^{N}\left[\log\phi_T\!\left(y_i - X_i\beta - \alpha 1_T,\ \Sigma + AVA'\right) + \log\bar\Phi_q\!\left(R\,(y_i - X_i\beta - \alpha 1_T),\ \Lambda\right)\right] + nq\log 2$$
  $\phi_T(\ldots)$ = T-variate normal pdf.
  $\bar\Phi_q(\ldots,\ldots)$ = (T+1)-variate multivariate normal integral.
  Very time consuming and complicated.
  “From the sampling theory perspective, the application of the model is computationally prohibitive when T is large. This is because the likelihood function depends on a (T+1)-dimensional integral of the normal distribution.” [Tsionas and Kumbhakar (2012, p. 6)]

[41/83] Kumbhakar, Lien, Hardaker
  Technical Efficiency in Competing Panel Data Models: A Study of Norwegian Grain Farming, JPA, Published online, September, 2012.
  Three steps based on GLS:
    (1) RE/FGLS to estimate $(\alpha, \beta)$
    (2) Decompose time varying residuals using MoM and SF.
    (3) Decompose estimates of time invariant residuals.

[42/83] Maximum Simulated Full Information log likelihood function for the "generalized true random effects stochastic frontier model"
  $$\log L_S(\alpha,\beta,\lambda,\sigma,\sigma_w,\sigma_e) = \sum_{i=1}^{N}\log\frac{1}{R}\sum_{r=1}^{R}\prod_{t=1}^{T}\frac{2}{\sigma}\,\phi\!\left(\frac{y_{it} - (\alpha + \sigma_w w_{ir} - \sigma_e|U_{ir}|) - \beta' x_{it}}{\sigma}\right)\Phi\!\left(\frac{-\lambda\,\bigl(y_{it} - (\alpha + \sigma_w w_{ir} - \sigma_e|U_{ir}|) - \beta' x_{it}\bigr)}{\sigma}\right)$$
  $w_{ir}$ = draws from N[0,1]
  $|U_{ir}|$ = absolute values of draws from N[0,1]

[43/83] WHO Results: 2014
  $x = (1,\ \log Exp,\ \log Ed,\ \log^2 Ed)$
  $z = (\log PopDen,\ \log PerCapitaGDP,\ GovtEff,\ VoxPopuli,\ OECD,\ GINI)$
  $\varepsilon_{it} = A_i - B_i + v_{it} - u_{it}$

[44/83] A True FE Model

[45/83] Schmidt et al. (2011) – Results on TFE
  Problem of TFE model – incidental parameters problem.
  Where is the bias? Estimator of u
  Is there a solution?
    Not based on OLS
    Chen, Schmidt, Wang: MLE for data in group mean deviation form
    $y_{it} - \bar y_i = \beta'[x_{it} - \bar x_i] + (v_{it} - \bar v_i) - (u_{it} - \bar u_i)$
    Trades fixed effects problem for the problem of obtaining the distribution of the deviations of the one sided terms.
    Derives a "within MLE" estimator.

[46/83]

[47/83] Health Care Systems

[48/83]

[49/83]

[50/83] WHO Was Interested in Broad Goals of a Health System

[51/83] They Created a Measure – COMP = Composite Index
  “In order to assess overall efficiency, the first step was to combine the individual attainments on all five goals of the health system into a single number, which we call the composite index. The composite index is a weighted average of the five component goals specified above. First, country attainment on all five indicators (i.e., health, health inequality, responsiveness-level, responsiveness-distribution, and fair-financing) were rescaled restricting them to the [0,1] interval. Then the following weights were used to construct the overall composite measure: 25% for health (DALE), 25% for health inequality, 12.5% for the level of responsiveness, 12.5% for the distribution of responsiveness, and 25% for fairness in financing. These weights are based on a survey carried out by WHO to elicit stated preferences of individuals in their relative valuations of the goals of the health system.”
  (From the World Health Organization Technical Report)

[52/83] Did They Rank Countries by COMP?
  Yes, but that was not what produced the number 37 ranking!

[53/83] Comparative Health Care Efficiency of 191 Countries

[54/83] The US Ranked 37th in Efficiency!
  Countries were ranked by overall efficiency

[55/83]

[56/83] World Health Organization

  Variable  Mean         Std. Dev.    Description
  ==============================================================================
  Time Varying: 1993-1997
  COMP      75.0062726   12.2051123   Composite health attainment
  DALE      58.3082712   12.1442590   Disability adjusted life expectancy
  HEXP      548.214857   694.216237   Health expenditure per capita
  EDUC      6.31753664   2.73370613   Education
  Time Invariant
  OECD      .279761905   .449149577   OECD Member country, dummy variable
  GDPC      8135.10785   7891.20036   Per capita GDP in PPP units
  POPDEN    953.119353   2871.84294   Population density
  GINI      .379477914   .090206941   Gini coefficient for income distribution
  TROPICS   .463095238   .498933251   Dummy variable for tropical location
  PUBTHE    58.1553571   20.2340835   Proportion of health spending paid by govt
  GEFF      .113293978   .915983955   World bank government effectiveness measure
  VOICE     .192624849   .952225978   World bank measure of democratization

  Application: Distinguishing Between Heterogeneity and Inefficiency: Stochastic Frontier Analysis of the World Health Organization’s Panel Data on National Health Care Systems, Health Economics, 2005

[57/83] WHO Results Based on FE Model

[58/83] SF Model with Country Heterogeneity

[59/83] Stochastic Frontier Results

[60/83] TECHNICAL EFFICIENCY ANALYSIS CORRECTING FOR BIASES FROM OBSERVED AND UNOBSERVED VARIABLES: AN APPLICATION TO A NATURAL RESOURCE MANAGEMENT PROJECT
  Empirical Economics: Volume 43, Issue 1 (2012), Pages 55-72
  Boris Bravo-Ureta, University of Connecticut
  Daniel Solis, University of Miami
  William Greene, Stern School of Business, New York University

[61/83] The MARENA Program in Honduras
  Several programs have been implemented to address resource degradation while also seeking to improve productivity, managerial performance and reduce poverty (and in some cases make up for lack of public support).
  One such effort is the Programa Multifase de Manejo de Recursos Naturales en Cuencas Prioritarias, or MARENA, in Honduras, focusing on small scale hillside farmers.

[62/83] OVERALL CONCEPTUAL FRAMEWORK
  Diagram labels: MARENA; Training & Financing; Natural, Human & Social Capital; Off-Farm Income; More Production and Productivity; More Farm Income; Sustainability
  Working HYPOTHESIS: if farmers receive private benefits (higher income) from project activities (e.g., training, financing) then adoption is likely to be sustainable and to generate positive externalities.

[63/83] Expected Impact Evaluation

[64/83] Methods
  A matched group of beneficiaries and control farmers is determined using Propensity Score Matching techniques to mitigate biases that would stem from selection on observed variables.
  In addition, we deal with possible self-selection on unobservables arising from unobserved variables using a selectivity correction model for stochastic frontiers recently introduced by Greene (2010).

[65/83] First Wave MARENA Study
  This paper brings together the stochastic frontier analysis with impact evaluation methodology to analyze the impact of a development program in Central America.
  We compare technical efficiency (TE) across treatment and control groups using cross sectional data associated with the MARENA Program in Honduras.

[66/83] “Standard” Sample Selection Linear Model: 2 Step
  $d_i = 1[\gamma' z_i + h_i > 0]$, $h_i \sim N[0,1^2]$
  $y_i = \alpha + \beta' x_i + \varepsilon_i$, $\varepsilon_i \sim N[0,\sigma_\varepsilon^2]$
  $(h_i,\varepsilon_i) \sim N_2[(0,0),\ (1,\ \rho\sigma_\varepsilon,\ \sigma_\varepsilon^2)]$
  $(y_i, x_i)$ observed only when $d_i = 1$.
  $$E[y_i \mid x_i, d_i=1] = \alpha + \beta' x_i + E[\varepsilon_i \mid d_i=1] = \alpha + \beta' x_i + \rho\sigma_\varepsilon\,\frac{\phi(\gamma' z_i)}{\Phi(\gamma' z_i)} = \alpha + \beta' x_i + \theta\lambda_i.$$

[67/83] MLE for Sample Selection: FIML and “2 Step”
  $$\log L(\alpha,\beta,\sigma_\varepsilon,\rho,\gamma) = \sum_{i=1}^{N}\log\left\{ d_i\left[\frac{\exp\left(-\tfrac12\,\varepsilon_i^2/\sigma_\varepsilon^2\right)}{\sigma_\varepsilon\sqrt{2\pi}}\,\Phi\!\left(\frac{\rho\,\varepsilon_i/\sigma_\varepsilon + \gamma' z_i}{\sqrt{1-\rho^2}}\right)\right] + (1-d_i)\,\Phi(-\gamma' z_i)\right\}, \quad \varepsilon_i = y_i - \alpha - \beta' x_i$$
  Two-Step MLE for Sample Selection: Estimate $\gamma$ first, then treat $\hat\gamma' z_i$ as data. 2nd step estimation based on selected sample.
  $$\log L(\alpha,\beta,\sigma_\varepsilon,\rho \mid \hat\gamma) = \sum_{d_i=1}\log\left[\frac{\exp\left(-\tfrac12\,\varepsilon_i^2/\sigma_\varepsilon^2\right)}{\sigma_\varepsilon\sqrt{2\pi}}\,\Phi\!\left(\frac{\rho\,\varepsilon_i/\sigma_\varepsilon + \hat\gamma' z_i}{\sqrt{1-\rho^2}}\right)\right]$$

[68/83] Stochastic Frontier Model: ML

[69/83] Simulated logL for the Standard SF Model
  $$f(y_i \mid x_i, |U_i|) = \frac{\exp\left[-\tfrac12 (y_i - \alpha - \beta' x_i + \sigma_u|U_i|)^2/\sigma_v^2\right]}{\sigma_v\sqrt{2\pi}}$$
  $$f(y_i \mid x_i) = \int_{|U_i|}\frac{\exp\left[-\tfrac12 (y_i - \alpha - \beta' x_i + \sigma_u|U_i|)^2/\sigma_v^2\right]}{\sigma_v\sqrt{2\pi}}\,p(|U_i|)\,d|U_i|$$
  $$p(|U_i|) = \frac{2\exp\left[-\tfrac12 |U_i|^2\right]}{\sqrt{2\pi}}, \quad |U_i| \ge 0. \ \text{(Half normal)}$$
  $$f(y_i \mid x_i) \approx \frac{1}{R}\sum_{r=1}^{R}\frac{\exp\left[-\tfrac12 (y_i - \alpha - \beta' x_i + \sigma_u|U_{ir}|)^2/\sigma_v^2\right]}{\sigma_v\sqrt{2\pi}}$$
  $$\log L_S(\alpha,\beta,\sigma_u,\sigma_v) = \sum_{i=1}^{N}\log\frac{1}{R}\sum_{r=1}^{R}\frac{\exp\left[-\tfrac12 (y_i - \alpha - \beta' x_i + \sigma_u|U_{ir}|)^2/\sigma_v^2\right]}{\sigma_v\sqrt{2\pi}}$$
  This is simply a linear regression with a random constant term, $\alpha_i = \alpha - \sigma_u|U_i|$

[70/83] A Sample Selected SF Model
  $d_i = 1[\gamma' z_i + h_i > 0]$, $h_i \sim N[0,1^2]$
  $y_i = \alpha + \beta' x_i + \varepsilon_i$, $\varepsilon_i \sim N[0,\sigma_\varepsilon^2]$
  $(y_i, x_i)$ observed only when $d_i = 1$.
  $\varepsilon_i = v_i - u_i$
  $u_i = \sigma_u|U_i|$ where $U_i \sim N[0,1^2]$
  $v_i = \sigma_v V_i$ where $V_i \sim N[0,1^2]$.
  $(h_i, v_i) \sim N_2[(0,0),\ (1,\ \rho\sigma_v,\ \sigma_v^2)]$

[71/83] Likelihood For a Sample Selected SF Model
  $$f(y_i \mid x_i, d_i, z_i, |U_i|) = d_i\left[\frac{\exp\left(-\tfrac12 (y_i - \alpha - \beta' x_i + \sigma_u|U_i|)^2/\sigma_v^2\right)}{\sigma_v\sqrt{2\pi}}\,\Phi\!\left(\frac{\rho\,(y_i - \alpha - \beta' x_i + \sigma_u|U_i|)/\sigma_v + \gamma' z_i}{\sqrt{1-\rho^2}}\right)\right] + (1-d_i)\,\Phi(-\gamma' z_i)$$
  $$f(y_i \mid x_i, d_i, z_i) = \int_{|U_i|} f(y_i \mid x_i, d_i, z_i, |U_i|)\,f(|U_i|)\,d|U_i|$$

[72/83] Simulated Log Likelihood for a Selectivity Corrected Stochastic Frontier Model
  The simulation is over the inefficiency term.
  $$\log L_S(\alpha,\beta,\sigma_u,\sigma_v,\gamma,\rho) = \sum_{i=1}^{N}\log\frac{1}{R}\sum_{r=1}^{R}\left\{ d_i\left[\frac{\exp\left(-\tfrac12 (y_i - \alpha - \beta' x_i + \sigma_u|U_{ir}|)^2/\sigma_v^2\right)}{\sigma_v\sqrt{2\pi}}\,\Phi\!\left(\frac{\rho\,(y_i - \alpha - \beta' x_i + \sigma_u|U_{ir}|)/\sigma_v + \gamma' z_i}{\sqrt{1-\rho^2}}\right)\right] + (1-d_i)\,\Phi(-\gamma' z_i)\right\}$$

[73/83] A 2 Step MSL Approach
  Estimate $\gamma$ – Probit MLE for selection mechanism
  Estimate $[\alpha,\beta,\sigma_v,\sigma_u,\rho]$ by maximum simulated likelihood using selected observations, conditioned on the estimate of $\gamma$.
  2nd step standard errors corrected by Murphy-Topel.
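As a check on the simulation idea used throughout slides 69 to 73, the sketch below compares the simulated density of the standard SF model (slide 69, no selection) with the closed form normal-half normal density. It is an illustration, not the estimator used in the study: the function names, the seed, the number of draws R, and the parameter values are all assumptions, and only the Python standard library is used.

```python
import math
import random


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def closed_form_density(e, sigma_v, sigma_u):
    """f(e) = (2/sigma) phi(e/sigma) Phi(-lambda e/sigma), with e = y - alpha - beta'x."""
    sigma = math.sqrt(sigma_v ** 2 + sigma_u ** 2)
    lam = sigma_u / sigma_v
    return (2.0 / sigma) * norm_pdf(e / sigma) * norm_cdf(-lam * e / sigma)


def simulated_density(e, sigma_v, sigma_u, draws):
    """Average over draws U_r of the normal density of v = e + sigma_u |U_r|."""
    total = 0.0
    for U in draws:
        v = e + sigma_u * abs(U)
        total += norm_pdf(v / sigma_v) / sigma_v
    return total / len(draws)


def simulated_loglik(residuals, sigma_v, sigma_u, draws):
    """Simulated log likelihood for given residuals e_i (same draws reused for every i)."""
    return sum(math.log(simulated_density(e, sigma_v, sigma_u, draws)) for e in residuals)


# Assumed illustrative settings.
SIGMA_V, SIGMA_U = 0.10884, 0.14944
rng = random.Random(12345)
draws = [rng.gauss(0.0, 1.0) for _ in range(50000)]

points = (-0.2, 0.0, 0.2)
exact = [closed_form_density(e, SIGMA_V, SIGMA_U) for e in points]
approx = [simulated_density(e, SIGMA_V, SIGMA_U, draws) for e in points]
for e, f_exact, f_sim in zip(points, exact, approx):
    print(f"e = {e:+.1f}  closed form = {f_exact:.4f}  simulated = {f_sim:.4f}")

print("simulated logL:", simulated_loglik(points, SIGMA_V, SIGMA_U, draws))
```

In the selection model the closed form is no longer available in this simple shape, which is why the same averaging over draws of $|U_{ir}|$ is applied to the product of the normal density and the selection probability.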
[74/83] 2nd Step of the MSL Approach
  $$\log L_{S,C}(\alpha,\beta,\sigma_u,\sigma_v,\rho) = \sum_{d_i=1}\log\frac{1}{R}\sum_{r=1}^{R}\left\{ d_i\left[\frac{\exp\left(-\tfrac12 (y_i - \alpha - \beta' x_i + \sigma_u|U_{ir}|)^2/\sigma_v^2\right)}{\sigma_v\sqrt{2\pi}}\,\Phi\!\left(\frac{\rho\,(y_i - \alpha - \beta' x_i + \sigma_u|U_{ir}|)/\sigma_v + a_i}{\sqrt{1-\rho^2}}\right)\right] + (1-d_i)\,\Phi(-a_i)\right\}$$
  where $a_i = \hat\gamma' z_i$
  $$\log L_{S,C}(\alpha,\beta,\sigma_u,\sigma_v,\rho) = \sum_{d_i=1}\log\frac{1}{R}\sum_{r=1}^{R}\frac{\exp\left(-\tfrac12 (y_i - \alpha - \beta' x_i + \sigma_u|U_{ir}|)^2/\sigma_v^2\right)}{\sigma_v\sqrt{2\pi}}\,\Phi\!\left(\frac{\rho\,(y_i - \alpha - \beta' x_i + \sigma_u|U_{ir}|)/\sigma_v + a_i}{\sqrt{1-\rho^2}}\right)$$

[75/83] JLMS Estimator of $u_i$
  $$\hat f_{ir} = \frac{\exp\left(-\tfrac12 (y_i - \hat\alpha - \hat\beta' x_i + \hat\sigma_u|U_{ir}|)^2/\hat\sigma_v^2\right)}{\hat\sigma_v\sqrt{2\pi}}\,\Phi\!\left(\frac{\hat\rho\,(y_i - \hat\alpha - \hat\beta' x_i + \hat\sigma_u|U_{ir}|)/\hat\sigma_v + a_i}{\sqrt{1-\hat\rho^2}}\right)$$
  $$\hat A_i = \frac{1}{R}\sum_{r=1}^{R}(\hat\sigma_u|U_{ir}|)\,\hat f_{ir}, \qquad \hat B_i = \frac{1}{R}\sum_{r=1}^{R}\hat f_{ir}$$
  $$\hat u_i = \frac{\hat A_i}{\hat B_i} = \text{Estimator of } E[u_i \mid \varepsilon_i] = \sum_{r=1}^{R}\hat g_{ir}\,|\hat\sigma_u U_{ir}|, \quad \text{where } \hat g_{ir} = \frac{\hat f_{ir}}{\sum_{r=1}^{R}\hat f_{ir}}, \quad \sum_{r=1}^{R}\hat g_{ir} = 1$$

[76/83] Variables Used in the Analysis
  Production
  Participation

[77/83] Findings from the First Wave
  B = Benefits recipients
  C = Controls
  U = Unmatched Sample
  M = Matched Subsamples (Propensity Score Matching)

[78/83] Findings from the First Wave
  Avg. TE for Beneficiaries is 71% in all models except for BENEF-U-SS where average TE is 80%.
  Average TE for control farmers ranges from 39% (CONTROL-U) to 66% (CONTROL-U-SS).
  TE gap between beneficiaries and control decreases with matching. This result is expected since PSM makes both studied samples comparable.
  Correcting for Sample Selection further decreases this gap.
  TE for Beneficiaries remains consistently higher than for control farmers.

[79/83] A Panel Data Model
  Selection takes place only at the baseline. There is no attrition.
  $d_{i0} = 1[\gamma' z_{i0} + h_{i0} > 0]$   (Sample Selector)
  $y_{it} = \alpha + w_i + \beta' x_{it} + v_{it} - u_{it}$, $t = 0,1,\ldots$   (Stochastic Frontier)
  Selection effect is exerted on $w_i$; $\mathrm{Corr}(h_{i0}, w_i) = \rho$
  $P(y_{it}, d_{i0}) = P(d_{i0})\,P(y_{it} \mid d_{i0})$
  Conditioned on the selection ($h_{i0}$) observations are independent.
  $$P(y_{i0}, y_{i1}, \ldots, y_{iT} \mid d_{i0}) = \prod_{t=0}^{T} P(y_{it} \mid d_{i0})$$
  I.e., the selection is acting like a permanent random effect.
  $$P(y_{i0}, y_{i1}, \ldots, y_{iT}, d_{i0}) = P(d_{i0})\prod_{t=0}^{T} P(y_{it} \mid d_{i0})$$

[80/83] Simulated Log Likelihood Using the Two Step Approach
  $$\log L_{S,C}(\alpha,\beta,\sigma_u,\sigma_v,\rho) = \sum_{d_i=1}\log\frac{1}{R}\sum_{r=1}^{R}\prod_{t=0}^{T}\frac{\exp\left(-\tfrac12 (y_{it} - \alpha - \beta' x_{it} + \sigma_u|U_{itr}|)^2/\sigma_v^2\right)}{\sigma_v\sqrt{2\pi}}\,\Phi\!\left(\frac{\rho\,(y_{it} - \alpha - \beta' x_{it} + \sigma_u|U_{itr}|)/\sigma_v + a_{i0}}{\sqrt{1-\rho^2}}\right)$$

[81/83] Main Empirical Conclusions from Waves 0 and 1
  Benefit group is more efficient in both years
  The gap is wider in the second year
  Both means increase from year 0 to year 1
  Both variances decline from year 0 to year 1

[82/83]

[83/83]
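As a closing illustration of the simulation-based JLMS estimator on slide 75, the sketch below computes $\hat u_i = \hat A_i/\hat B_i$ for the special case without selection ($\rho = 0$), where the selection factor $\Phi(\cdot)$ is constant across draws and cancels in the ratio. In that case the result can be compared with the closed form $E[u_i \mid \varepsilon_i]$ from slide 18. This is a simplification chosen for checking purposes, not the selectivity corrected estimator itself; the function names, seed, number of draws, and parameter values are assumptions.

```python
import math
import random


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def jlms_closed_form(e, sigma_v, sigma_u):
    """E[u|e] = sigma*lambda/(1+lambda^2) * [phi(z)/Phi(z) + z], z = -lambda*e/sigma."""
    sigma = math.sqrt(sigma_v ** 2 + sigma_u ** 2)
    lam = sigma_u / sigma_v
    z = -lam * e / sigma
    return sigma * lam / (1.0 + lam ** 2) * (norm_pdf(z) / norm_cdf(z) + z)


def jlms_simulated(e, sigma_v, sigma_u, draws):
    """A_i / B_i with weights f_ir from the normal density of v = e + sigma_u |U_r|."""
    a_sum = b_sum = 0.0
    for U in draws:
        u = sigma_u * abs(U)
        f = norm_pdf((e + u) / sigma_v) / sigma_v  # rho = 0: no selection factor
        a_sum += u * f
        b_sum += f
    return a_sum / b_sum


# Assumed illustrative settings.
SIGMA_V, SIGMA_U = 0.10884, 0.14944
rng = random.Random(2024)
draws = [rng.gauss(0.0, 1.0) for _ in range(50000)]

residuals = (-0.2, 0.0, 0.2)
exact = [jlms_closed_form(e, SIGMA_V, SIGMA_U) for e in residuals]
approx = [jlms_simulated(e, SIGMA_V, SIGMA_U, draws) for e in residuals]
for e, u_exact, u_sim in zip(residuals, exact, approx):
    print(f"e = {e:+.1f}  closed form = {u_exact:.4f}  simulated = {u_sim:.4f}")
```

In the production form used here, a more negative residual yields a larger estimated inefficiency, so the three printed values should decrease as e increases.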