Part 2A: Basic Econometrics [ 1/75] Econometric Analysis of Panel Data William Greene Department of Economics Stern School of Business.


Part 2A: Basic Econometrics [ 1/75]
Econometric Analysis of Panel Data
William Greene
Department of Economics
Stern School of Business
Part 2A: Basic Econometrics [ 2/75]
Endogeneity



y = Xβ + ε,
Definition: E[ε|x]≠0
Why not?






Omitted variables
Unobserved heterogeneity (equivalent to omitted
variables)
Measurement error on the RHS (equivalent to omitted
variables)
Structural aspects of the model
Endogenous sampling and attrition
Simultaneity (?)
Part 2A: Basic Econometrics [ 3/75]
Instrumental Variable Estimation




One “problem” variable – the “last” one
yit = β1x1it + β2x2it + … + βKxKit + εit
E[εit|xKit] ≠ 0. (0 for all others)
There exists a variable zit such that
E[xKit| x1it, x2it,…, xK-1,it,zit] = g(x1it, x2it,…, xK-1,it,zit)
In the presence of the other variables, zit “explains” xKit
E[εit| x1it, x2it,…, xK-1,it,zit] = 0
In the presence of the other variables, zit and εit are
uncorrelated.
A projection interpretation: In the projection
xKit = θ1x1it + θ2x2it + … + θK-1xK-1,it + θK zit,
θK ≠ 0.
Part 2A: Basic Econometrics [ 4/75]
The First IV Study: Natural Experiment
(Snow, J., On the Mode of Communication of Cholera, 1855)
http://www.ph.ucla.edu/epi/snow/snowbook3.html


London Cholera epidemic, ca 1853-4
Cholera = f(Water Purity,u)+ε.


‘Causal’ effect of water purity on cholera?
Purity = f(cholera prone environment (poor, garbage
in streets, rodents, etc.)). Regression does not work.
Two London water companies: Lambeth and Southwark
[Figure: River Thames; main sewage discharge]
Paul Grootendorst: A Review of Instrumental Variables Estimation of Treatment Effects…
http://individual.utoronto.ca/grootendorst/pdf/IV_Paper_Sept6_2007.pdf
Part 2A: Basic Econometrics [ 5/75]
IV Estimation





Cholera = f(Purity, u) + ε
Z = water company
Z is randomly mixed in the population (two full
sets of pipes) and uncorrelated with the behavioral
unobservables, u.
Cholera = α + δPurity + u + ε
Purity = Mean + random variation + λu
Cov(Cholera, Z) = δCov(Purity, Z)
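The covariance identity above implies δ = Cov(Cholera, Z) / Cov(Purity, Z). A minimal simulation sketch of that ratio follows. All numbers (δ = -2, λ = 1, the 0.8 shift in purity by water company) are illustrative assumptions, not values from Snow's data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
delta = -2.0   # assumed true effect of purity on cholera (illustrative)
lam = 1.0      # assumed loading of the unobservable u on purity

z = rng.integers(0, 2, size=n).astype(float)   # water company, randomly mixed
u = rng.normal(size=n)                          # behavioral unobservables
purity = 1.0 + 0.8 * z + lam * u + rng.normal(size=n)
cholera = 3.0 + delta * purity + u + rng.normal(size=n)

def cov(a, b):
    """Sample covariance of two vectors."""
    return np.mean((a - a.mean()) * (b - b.mean()))

delta_ols = cov(cholera, purity) / cov(purity, purity)  # contaminated by u
delta_iv = cov(cholera, z) / cov(purity, z)             # uses only variation due to Z
print(delta_ols, delta_iv)
```

In this sketch the regression slope is pulled away from δ because u moves both variables, while the ratio of covariances with Z is not.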
Part 2A: Basic Econometrics [ 6/75]
Cornwell and Rupert Data
Cornwell and Rupert Returns to Schooling Data, 595 Individuals, 7 Years
Variables in the file are
EXP   = work experience
WKS   = weeks worked
OCC   = occupation, 1 if blue collar
IND   = 1 if manufacturing industry
SOUTH = 1 if resides in south
SMSA  = 1 if resides in a city (SMSA)
MS    = 1 if married
FEM   = 1 if female
UNION = 1 if wage set by union contract
ED    = years of education
LWAGE = log of wage = dependent variable in regressions
These data were analyzed in Cornwell, C. and Rupert, P., "Efficient Estimation with Panel
Data: An Empirical Comparison of Instrumental Variable Estimators," Journal of Applied
Econometrics, 3, 1988, pp. 149-155. See Baltagi, page 122 for further analysis. The
data were downloaded from the website for Baltagi's text.
Part 2A: Basic Econometrics [ 7/75]
Part 2A: Basic Econometrics [ 8/75]
Specification: Quadratic Effect of Experience
Part 2A: Basic Econometrics [ 9/75]
The Effect of Education on LWAGE
LWAGE = β1 + β2EDUC + β3EXP + β4EXP² + ... + ε
What is ε? Ability, Motivation,... + everything else
EDUC = f(GENDER, SMSA, SOUTH, Ability, Motivation,...)
Part 2A: Basic Econometrics [ 10/75]
What Influences LWAGE?
LWAGE = β1 + β2EDUC(X, Ability, Motivation,...)
+ β3EXP + β4EXP² + ...
+ ε(Ability, Motivation)
Increased Ability is associated with increases in
EDUC(X, Ability, Motivation,...) and ε(Ability, Motivation)
What looks like an effect due to an increase in EDUC may
be an increase in Ability. The estimate of β2 picks up
the effect of EDUC and the hidden effect of Ability.
Part 2A: Basic Econometrics [ 11/75]
An Exogenous Influence
LWAGE = β1 + β2EDUC(X, Z, Ability, Motivation,...)
+ β3EXP + β4EXP² + ...
+ ε(Ability, Motivation)
Increased Z is associated with increases in
EDUC(X, Z, Ability, Motivation,...) and not ε(Ability, Motivation)
An effect due to the effect of an increase in Z on EDUC will
only be an increase in EDUC. The estimate of β2 picks up
the effect of EDUC only.
Z is an Instrumental Variable
Part 2A: Basic Econometrics [ 12/75]
Instrumental Variables

Structure



LWAGE (ED,EXP,EXPSQ,WKS,OCC,
SOUTH,SMSA,UNION)
ED (MS, FEM)
Reduced Form:
LWAGE[ ED (MS, FEM),
EXP,EXPSQ,WKS,OCC,
SOUTH,SMSA,UNION ]
Part 2A: Basic Econometrics [ 13/75]
Two Stage Least Squares Strategy


Reduced Form:
LWAGE[ ED (MS, FEM,X),
EXP,EXPSQ,WKS,OCC,
SOUTH,SMSA,UNION ]
Strategy



(1) Purge ED of the influence of everything but MS,
FEM (and the other variables). Predict ED using all
exogenous information in the sample (X and Z).
(2) Regress LWAGE on this prediction of ED and
everything else.
Standard errors must be adjusted for the predicted ED
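The two steps can be sketched on simulated data. This is a hedged illustration only: the variable names borrow from the slides, but the data generating process, the coefficient values, and the use of continuous instruments (rather than the MS and FEM dummies) are assumptions made for the example.

```python
import numpy as np

def ols(W, y):
    """Least squares coefficients for y on the columns of W."""
    return np.linalg.lstsq(W, y, rcond=None)[0]

rng = np.random.default_rng(1)
n = 20_000
const = np.ones(n)
x1 = rng.normal(size=n)                # exogenous regressor (stands in for EXP, WKS, ...)
z1, z2 = rng.normal(size=(2, n))       # instruments (stand in for MS, FEM)
ability = rng.normal(size=n)           # unobserved, enters both equations
ed = 0.5 * x1 + 0.7 * z1 + 0.4 * z2 + ability + rng.normal(size=n)
lwage = 1.0 + 0.1 * ed + 0.3 * x1 + ability + rng.normal(size=n)

X = np.column_stack([const, ed, x1])       # structural regressors
Z = np.column_stack([const, x1, z1, z2])   # all exogenous information

# Step (1): predict ED from everything exogenous.
ed_hat = Z @ ols(Z, ed)
# Step (2): regress LWAGE on the prediction and everything else.
b_2sls = ols(np.column_stack([const, ed_hat, x1]), lwage)
b_ols = ols(X, lwage)
print("OLS coefficient on ED :", b_ols[1])
print("2SLS coefficient on ED:", b_2sls[1])
```

The standard errors printed by a step (2) regression would not be valid as they stand, which is the adjustment the slide refers to.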
Part 2A: Basic Econometrics [ 14/75]
OLS
Part 2A: Basic Econometrics [ 15/75]
The weird results for the
coefficient on ED happened
because the instruments,
MS and FEM are dummy
variables. There is not
enough variation in these
variables.
Part 2A: Basic Econometrics [ 16/75]
Source of Endogeneity


LWAGE = f(ED,
EXP,EXPSQ,WKS,OCC,
SOUTH,SMSA,UNION) + ε
ED = f(MS,FEM,
EXP,EXPSQ,WKS,OCC,
SOUTH,SMSA,UNION) + u
Part 2A: Basic Econometrics [ 17/75]
Remove the Endogeneity



LWAGE = f(ED,
EXP,EXPSQ,WKS,OCC,
SOUTH,SMSA,UNION) + u + ε
Strategy
Estimate u
Add u to the equation. ED is uncorrelated with ε when u is
in the equation.
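A sketch of the strategy on simulated data, under the same kind of assumed data generating process as a textbook example (names and coefficients are hypothetical). It estimates u as the residual from the regression of ED on all exogenous variables, adds it to the LWAGE equation, and compares the result with two stage least squares.

```python
import numpy as np

def ols(W, y):
    """Least squares coefficients for y on the columns of W."""
    return np.linalg.lstsq(W, y, rcond=None)[0]

rng = np.random.default_rng(5)
n = 5_000
const = np.ones(n)
x1 = rng.normal(size=n)
z = rng.normal(size=(n, 2))            # instruments (hypothetical)
ability = rng.normal(size=n)           # unobserved common factor
ed = 0.5 * x1 + z @ np.array([0.7, 0.4]) + ability + rng.normal(size=n)
lwage = 1.0 + 0.1 * ed + 0.3 * x1 + ability + rng.normal(size=n)

Z = np.column_stack([const, x1, z])
u_hat = ed - Z @ ols(Z, ed)            # estimate of u

# Control function: OLS with the residual added as a regressor.
b_cf = ols(np.column_stack([const, ed, x1, u_hat]), lwage)

# 2SLS: replace ED with its prediction.
ed_hat = ed - u_hat
b_2sls = ols(np.column_stack([const, ed_hat, x1]), lwage)

print(b_cf[:3])
print(b_2sls)
```

The coefficients on the constant, ED, and x1 agree across the two regressions. Only the coefficients are compared here; the reported standard errors of the control function regression are a separate matter.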
Part 2A: Basic Econometrics [ 18/75]
Auxiliary Regression for
ED to Obtain Residuals
Part 2A: Basic Econometrics [ 19/75]
OLS with Residual (Control Function) Added
2SLS
Part 2A: Basic Econometrics [ 20/75]
A Warning About Control Functions
The sum of squares is not computed correctly because U is in the regression.
This is a general result: control function estimators usually require a
correction to the estimated covariance matrix of the estimator.
Part 2A: Basic Econometrics [ 21/75]
On Sat, May 3, 2014 at 4:48 PM, … wrote:
Dear Professor Greene,
I am giving an Econometrics course in Brazil and we are using
your great textbook. I got a question which I think only you can
help me. In our last class, I did a formal proof that
var(beta_hat_OLS) is lower or equal than var(beta_hat_2SLS),
under homoscedasticity.
We know this assertive is also valid under heteroscedasticity, but
a graduate student asked me the proof (which is my problem).
Do you know where can I find it?
Part 2A: Basic Econometrics [ 22/75]
Part 2A: Basic Econometrics [ 23/75]
Part 2A: Basic Econometrics [ 24/75]
Part 2A: Basic Econometrics [ 25/75]
The General Problem
y = X1β + X2δ + ε
Cov(X1, ε) = 0, K1 variables
Cov(X2, ε) ≠ 0, K2 variables
X2 is endogenous
OLS regression of y on (X1, X2) cannot estimate (β, δ)
consistently. Some other estimator is needed.
Additional structure:
X2 = ZΠ + V where Cov(V, ε) ≠ 0 but Cov(Z, ε) = 0.
An instrumental variable (IV) estimator based on (X1, X2, Z) may
be able to estimate (β, δ) consistently.
Part 2A: Basic Econometrics [ 26/75]
Instrumental Variables


Framework: y = Xβ + ε, K variables in X.
There exists a set of K variables, Z, such that
plim(Z’X/n) ≠ 0 but plim(Z’ε/n) = 0

The variables in Z are called instrumental variables.
An alternative (to least squares) estimator of β is
bIV = (Z’X)^(-1) Z’y

We consider the following:



Why use this estimator?
What are its properties compared to least squares?
We will also examine an important application
Part 2A: Basic Econometrics [ 27/75]
IV Estimators
* Consistent
bIV = (Z’X)^(-1) Z’y
= (Z’X/n)^(-1) (Z’X/n)β + (Z’X/n)^(-1) Z’ε/n
= β + (Z’X/n)^(-1) Z’ε/n → β
* Asymptotically normal (same approach to proof as for OLS)
* Inefficient – to be shown.
* By construction, the IV estimator is consistent. We have an
estimator that is consistent when least squares is not.
Part 2A: Basic Econometrics [ 28/75]
IV Estimation
Why use an IV estimator? Suppose that X and
ε are correlated. Then least squares is
neither unbiased nor consistent.
Recall the proof of consistency of least squares:
b = β + (X’X/n)^(-1) (X’ε/n).
Plim b = β requires plim(X’ε/n) = 0. If this does
not hold, the estimator is inconsistent.
Part 2A: Basic Econometrics [ 29/75]
A Popular Misconception
If only one variable in X is correlated with ε, the other coefficients are
consistently estimated. False.
Suppose only the first variable is correlated with ε.
Under the assumptions,
$\operatorname{plim}(X'\varepsilon/n) = (\sigma_1, 0, \ldots, 0)'$. Then
$\operatorname{plim} b - \beta = \operatorname{plim}(X'X/n)^{-1}(\sigma_1, 0, \ldots, 0)' = \sigma_1 (q^{11}, q^{21}, \ldots, q^{K1})'$
$= \sigma_1$ times the first column of $Q^{-1}$
The problem is “smeared” over the other coefficients.
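A small numerical sketch of the smearing result. The matrix Q and the value of σ1 below are made-up illustrative numbers, not estimates from any data set.

```python
import numpy as np

# Assumed plim X'X/n for three correlated regressors (illustrative values).
Q = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
sigma1 = 0.4                              # assumed plim of x1'eps/n
gamma = np.array([sigma1, 0.0, 0.0])      # only the first variable is correlated with eps

bias = np.linalg.solve(Q, gamma)          # plim b - beta = Q^{-1} gamma
first_col = np.linalg.inv(Q)[:, 0]        # first column of Q^{-1}
print(bias)
print(sigma1 * first_col)
```

Every element of the bias vector is nonzero here because the regressors are correlated with one another. With a diagonal Q, only the first element would be affected.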
Part 2A: Basic Econometrics [ 30/75]
Consistency and Asymptotic
Normality of the IV Estimator
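$\hat\beta = (Z'X)^{-1}Z'y = \beta + (Z'X)^{-1}Z'\varepsilon$
$\operatorname{plim}(\hat\beta - \beta) = \operatorname{plim}(Z'X/N)^{-1}(Z'\varepsilon/N) = Q_{ZX}^{-1}\, 0 = 0$
$\text{Asy.Var}[\hat\beta - \beta] = Q_{ZX}^{-1}\, \text{Asy.Var}[Z'\varepsilon/N]\, Q_{XZ}^{-1}$
$= (1/N)\, Q_{ZX}^{-1} [\operatorname{plim} Z'\Omega Z/N]\, Q_{XZ}^{-1} = (1/N)\, Q_{ZX}^{-1}\, \Omega_{ZZ}\, Q_{XZ}^{-1}$
(Assuming "well behaved" data)
$\sqrt{N}(\hat\beta - \beta) = (Z'X/N)^{-1} \sqrt{N} \left( \frac{\sum_{i,t} z_{it}\varepsilon_{it}}{N} \right) = (Z'X/N)^{-1} \sqrt{N}\, \bar{w}$
Invoke Slutsky and Lindeberg-Feller for $\sqrt{N}\,\bar{w}$ to assert asymptotic normality.
Estimate $\text{Asy.Var}[\hat\beta]$ with
$\frac{\sum_{i,t} (y_{it} - x_{it}'\hat\beta)^2}{N \text{ or } (N-K)}\, [Z'X]^{-1} [Z'Z] [X'Z]^{-1}$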
Part 2A: Basic Econometrics [ 31/75]
Asymptotic Covariance Matrix of bIV
1
bIV    (Z'X) Z ' 
(bIV  )(bIV  ) '  (Z'X) 1 Z ' 'Z(X'Z)-1
E[(bIV  )(bIV  ) ' | X, Z]  2 (Z'X) 1 Z ' Z(X'Z)-1
Part 2A: Basic Econometrics [ 32/75]
Asymptotic Efficiency
Asymptotic efficiency of the IV estimator. The variance is
larger than that of LS. (A large sample type of Gauss-Markov result is at work.)
(1) It’s a moot point. LS is inconsistent.
(2) Mean squared error is uncertain:
MSE[estimator|β]=Variance + square of bias.
IV may be better or worse. Depends on the data
Part 2A: Basic Econometrics [ 33/75]
Two Stage Least Squares
How to use an “excess” of instrumental variables
(1) X is K variables. Some (at least one) of the K
variables in X are correlated with ε.
(2) Z is M > K variables. Some of the variables in
Z are also in X, some are not. None of the
variables in Z are correlated with ε.
(3) Which K variables to use to compute Z’X and Z’y?
Part 2A: Basic Econometrics [ 34/75]
Choosing the Instruments




Choose K randomly?
Choose the included Xs and the remainder randomly?
Use all of them? How?
A theorem: (Brundy and Jorgenson, ca. 1972) There is
a most efficient way to construct the IV estimator from
this subset:



(1) For each column (variable) in X, compute the predictions
of that variable using all the columns of Z.
(2) Linearly regress y on these K predictions.
This is two stage least squares
Part 2A: Basic Econometrics [ 35/75]
Algebraic Equivalence
Two stage least squares is equivalent to


(1) each variable in X that is also in Z is replaced by
itself.
(2) Variables in X that are not in Z are replaced by
predictions of that X using


All other variables in X that are not correlated with ε
All the variables in Z that are not in X.
Part 2A: Basic Econometrics [ 36/75]
The weird results for the
coefficient on ED happened
because the instruments,
MS and FEM are dummy
variables. There is not
enough variation in these
variables.
Part 2A: Basic Econometrics [ 37/75]
2SLS Algebra
$\hat{X} = Z(Z'Z)^{-1}Z'X$
$b_{2SLS} = (\hat{X}'\hat{X})^{-1}\hat{X}'y$
But, $Z(Z'Z)^{-1}Z'X = (I - M_Z)X$ and $(I - M_Z)$ is idempotent.
$\hat{X}'\hat{X} = X'(I - M_Z)(I - M_Z)X = X'(I - M_Z)X = \hat{X}'X$ so
$b_{2SLS} = (\hat{X}'X)^{-1}\hat{X}'y$ = a real IV estimator by the definition.
Note, $\operatorname{plim}(\hat{X}'\varepsilon/n) = 0$ since columns of $\hat{X}$ are linear combinations
of the columns of Z, all of which are uncorrelated with ε.
$b_{2SLS} = [X'(I - M_Z)X]^{-1}X'(I - M_Z)y$
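The algebra can be checked numerically. The sketch below uses arbitrary simulated matrices (an assumption for illustration; no endogeneity is simulated because only the matrix identities are being verified). It confirms that the projection is idempotent, that X̂'X̂ = X̂'X, that the two formulas for b2SLS coincide, and that columns of X that are also in Z are predicted by themselves.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
Z = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])      # M = 4 instruments
x_endog = Z @ np.array([0.5, 1.0, -1.0, 0.5]) + rng.normal(size=n)
X = np.column_stack([Z[:, :2], x_endog])   # two columns shared with Z, one not
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(size=n)

P = Z @ np.linalg.solve(Z.T @ Z, Z.T)      # P = I - M_Z, the projection onto Z
X_hat = P @ X

b_two_stage = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)   # (Xhat'Xhat)^{-1} Xhat'y
b_iv_form = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)         # (Xhat'X)^{-1} Xhat'y

print(np.allclose(P @ P, P))                      # idempotent
print(np.allclose(X_hat.T @ X_hat, X_hat.T @ X))  # Xhat'Xhat = Xhat'X
print(np.allclose(b_two_stage, b_iv_form))        # same estimator
print(np.allclose(X_hat[:, :2], X[:, :2]))        # columns in Z replaced by themselves
```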
Part 2A: Basic Econometrics [ 38/75]
Asymptotic Covariance Matrix for 2SLS
General Result for Instrumental Variable Estimation
$E[(b_{IV} - \beta)(b_{IV} - \beta)' \mid X, Z] = \sigma^2 (Z'X)^{-1}Z'Z(X'Z)^{-1}$
Specialize for 2SLS, using $Z = \hat{X} = (I - M_Z)X$
$E[(b_{2SLS} - \beta)(b_{2SLS} - \beta)' \mid X, Z] = \sigma^2 (\hat{X}'X)^{-1}\hat{X}'\hat{X}(X'\hat{X})^{-1}$
$= \sigma^2 (\hat{X}'\hat{X})^{-1}\hat{X}'\hat{X}(\hat{X}'\hat{X})^{-1}$
$= \sigma^2 (\hat{X}'\hat{X})^{-1}$
Part 2A: Basic Econometrics [ 39/75]
2SLS Has Larger Variance than LS
A comparison to OLS
$\text{Asy.Var}[2SLS] = \sigma^2 (\hat{X}'\hat{X})^{-1}$
Neglecting the inconsistency,
$\text{Asy.Var}[LS] = \sigma^2 (X'X)^{-1}$
(This is the variance of LS around its mean, not β)
$\text{Asy.Var}[2SLS] \geq \text{Asy.Var}[LS]$ in the matrix sense.
Compare inverses:
$\{\text{Asy.Var}[LS]\}^{-1} - \{\text{Asy.Var}[2SLS]\}^{-1} = (1/\sigma^2)[X'X - \hat{X}'\hat{X}]$
$= (1/\sigma^2)[X'X - X'(I - M_Z)X] = (1/\sigma^2)[X'M_Z X]$
This matrix is nonnegative definite. (Not positive definite
as it might have some rows and columns which are zero.)
Implication for "precision" of 2SLS.
The problem of "Weak Instruments"
Part 2A: Basic Econometrics [ 40/75]
Estimating σ2
Estimating the asymptotic covariance matrix: a caution about estimating $\sigma^2$.
Since the regression is computed by regressing y on $\hat{x}$,
one might use
$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{x}_i'b_{2sls})^2$
This is inconsistent. Use
$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (y_i - x_i'b_{2sls})^2$
(Degrees of freedom correction is optional. Conventional,
but not necessary.)
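A simulated sketch of the caution. The data generating process is an assumption chosen so that the true disturbance variance is 2; the instruments and coefficients are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
z = rng.normal(size=(n, 2))                  # instruments (hypothetical)
ability = rng.normal(size=n)                 # source of endogeneity
x = z @ np.array([0.7, 0.4]) + ability + rng.normal(size=n)
eps = ability + rng.normal(size=n)           # true Var[eps] = 2 by construction
y = 1.0 + 1.0 * x + eps

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]     # column-by-column predictions
b_2sls = np.linalg.lstsq(X_hat, y, rcond=None)[0]

s2_wrong = np.mean((y - X_hat @ b_2sls) ** 2)   # residuals built from predicted X
s2_right = np.mean((y - X @ b_2sls) ** 2)       # residuals built from actual X
print(s2_wrong, s2_right)
```

In this design the estimate based on the predicted X lands far from 2, while the one based on the actual X is close to it.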
Part 2A: Basic Econometrics [ 41/75]
Robust estimation of VC
Counterpart to the White estimator allows heteroscedasticity
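$\text{Est.Asy.Var}[\hat\beta] = (\hat{X}'\hat{X})^{-1} \left[ \sum_{i,t} (y_{it} - x_{it}'\hat\beta)^2\, \hat{x}_{it}\hat{x}_{it}' \right] (\hat{X}'\hat{X})^{-1}$
“Predicted” X: $\hat{x}_{it}$ in the outer products and in $(\hat{X}'\hat{X})^{-1}$
“Actual” X: $x_{it}$ in the residuals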
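A sketch of this estimator in code. The function name `robust_2sls_cov` and the simulated heteroscedastic data are assumptions for illustration; the formula is the White-type sandwich with predicted X in the outer products and actual X in the residuals.

```python
import numpy as np

def robust_2sls_cov(X, X_hat, y, b):
    """White-type covariance for 2SLS (hypothetical helper, not a library function)."""
    e = y - X @ b                                     # residuals use the actual X
    bread = np.linalg.inv(X_hat.T @ X_hat)
    meat = (X_hat * (e ** 2)[:, None]).T @ X_hat      # sum of e_i^2 * xhat_i xhat_i'
    return bread @ meat @ bread

rng = np.random.default_rng(7)
n = 2_000
z = rng.normal(size=(n, 2))
ability = rng.normal(size=n)
x = z @ np.array([0.7, 0.4]) + ability + rng.normal(size=n)
scale = 1.0 + 0.5 * np.abs(z[:, 0])                   # heteroscedasticity tied to z
y = 1.0 + 0.5 * x + scale * (ability + rng.normal(size=n))

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
b = np.linalg.lstsq(X_hat, y, rcond=None)[0]

V = robust_2sls_cov(X, X_hat, y, b)
print(np.sqrt(np.diag(V)))                            # robust standard errors
```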
Part 2A: Basic Econometrics [ 42/75]
2SLS vs. Robust Standard Errors
+--------------------------------------------------+
| Robust Standard Errors                           |
+---------+--------------+----------------+--------+
|Variable | Coefficient  | Standard Error |b/St.Er.|
+---------+--------------+----------------+--------+
| B_1     |  45.4842872  |   4.02597121   |  11.298|
| B_2     |    .05354484 |    .01264923   |   4.233|
| B_3     |   -.00169664 |    .00029006   |  -5.849|
| B_4     |    .01294854 |    .05757179   |    .225|
| B_5     |    .38537223 |    .07065602   |   5.454|
| B_6     |    .36777247 |    .06472185   |   5.682|
| B_7     |    .95530115 |    .08681261   |  11.000|
+--------------------------------------------------+
| 2SLS Standard Errors                             |
+---------+--------------+----------------+--------+
|Variable | Coefficient  | Standard Error |b/St.Er.|
+---------+--------------+----------------+--------+
| B_1     |  45.4842872  |    .36908158   | 123.236|
| B_2     |    .05354484 |    .03139904   |   1.705|
| B_3     |   -.00169664 |    .00069138   |  -2.454|
| B_4     |    .01294854 |    .16266435   |    .080|
| B_5     |    .38537223 |    .17645815   |   2.184|
| B_6     |    .36777247 |    .17284574   |   2.128|
| B_7     |    .95530115 |    .20846241   |   4.583|
+---------+--------------+----------------+--------+
Part 2A: Basic Econometrics [ 43/75]
Endogeneity Test? (Hausman)
          Exogenous                 Endogenous
OLS       Consistent, Efficient     Inconsistent
2SLS      Consistent, Inefficient   Consistent
Base a test on d = b2SLS - bOLS
Use a Wald statistic, d’[Var(d)]^(-1) d
What to use for the variance matrix?
Hausman: V2SLS - VOLS
Part 2A: Basic Econometrics [ 44/75]
Hausman Test
Part 2A: Basic Econometrics [ 45/75]
Hausman Test: One at a Time?
Part 2A: Basic Econometrics [ 46/75]
Endogeneity Test: Wu



Considerable complication in Hausman test
(text, pp. 234-237)
Simplification: Wu test.
Regress y on X and $\hat{X}$ estimated for the
endogenous part of X. Then use an ordinary
Wald test.
Part 2A: Basic Econometrics [ 47/75]
Wu Test
Part 2A: Basic Econometrics [ 48/75]
Regression Based Endogeneity Test
An easy t test. (Wooldridge 2010, p. 127)
$y_{it} = x_{it}'\delta + q_{it}\gamma + \varepsilon_{it}$
Z = a set of M instruments.
Write $q = Z\pi + v$
Can be estimated by ordinary least squares.
Endogeneity concerns correlation between v and ε.
Add $\hat{v} = q - z'\hat\pi$ to the equation and use OLS
$y_{it} = x_{it}'\delta + q_{it}\gamma + \hat{v}_{it}\rho + \{\varepsilon_{it} + \text{error}\}$
Simple t test on whether ρ equals 0.
Even easier, algebraically identical, (Wu, 1973), add $\hat{q}$
to the equation and do the same test.
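The equivalence of the two versions of the test can be sketched with simulated data. The helper `ols_with_se` and the data generating process are assumptions for illustration. Adding the residual or adding the prediction yields coefficients of opposite sign with the same standard error, so the t statistics agree in absolute value.

```python
import numpy as np

def ols_with_se(W, y):
    """OLS coefficients and conventional standard errors (hypothetical helper)."""
    WtW_inv = np.linalg.inv(W.T @ W)
    b = WtW_inv @ W.T @ y
    e = y - W @ b
    s2 = e @ e / (len(y) - W.shape[1])
    return b, np.sqrt(s2 * np.diag(WtW_inv))

rng = np.random.default_rng(4)
n = 5_000
x = np.column_stack([np.ones(n), rng.normal(size=n)])   # exogenous regressors
z_extra = rng.normal(size=n)                            # instrument not in the equation
common = rng.normal(size=n)                             # makes q endogenous
q = x @ np.array([1.0, 0.5]) + 0.8 * z_extra + common + rng.normal(size=n)
y = x @ np.array([2.0, 1.0]) + 0.3 * q + common + rng.normal(size=n)

Z = np.column_stack([x, z_extra])
q_hat = Z @ np.linalg.lstsq(Z, q, rcond=None)[0]
v_hat = q - q_hat

b_v, se_v = ols_with_se(np.column_stack([x, q, v_hat]), y)   # residual added
b_q, se_q = ols_with_se(np.column_stack([x, q, q_hat]), y)   # prediction added
t_v = b_v[-1] / se_v[-1]
t_q = b_q[-1] / se_q[-1]
print(t_v, t_q)
```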
Part 2A: Basic Econometrics [ 49/75]
Testing Endogeneity of WKS
(1) Regress WKS on 1,EXP,EXPSQ,OCC,SOUTH,SMSA,MS.
U=residual, WKSHAT=prediction
(2) Regress LWAGE on 1,EXP,EXPSQ,OCC,SOUTH,SMSA,WKS, U or WKSHAT
+---------+--------------+----------------+--------+---------+-------------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X   |
+---------+--------------+----------------+--------+---------+-------------+
|Constant | -9.97734299  |    .75652186   | -13.188|  .0000  |             |
|EXP      |   .01833440  |    .00259373   |   7.069|  .0000  |  19.8537815 |
|EXPSQ    |  -.799491D-04|    .603484D-04 |  -1.325|  .1852  | 514.405042  |
|OCC      |  -.28885529  |    .01222533   | -23.628|  .0000  |    .51116447|
|SOUTH    |  -.26279891  |    .01439561   | -18.255|  .0000  |    .29027611|
|SMSA     |   .03616514  |    .01369743   |   2.640|  .0083  |    .65378151|
|WKS      |   .35314170  |    .01638709   |  21.550|  .0000  |  46.8115246 |
|U        |  -.34960141  |    .01642842   | -21.280|  .0000  | -.341879D-14|
+---------+--------------+----------------+--------+---------+-------------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X   |
+---------+--------------+----------------+--------+---------+-------------+
|Constant | -9.97734299  |    .75652186   | -13.188|  .0000  |             |
|EXP      |   .01833440  |    .00259373   |   7.069|  .0000  |  19.8537815 |
|EXPSQ    |  -.799491D-04|    .603484D-04 |  -1.325|  .1852  | 514.405042  |
|OCC      |  -.28885529  |    .01222533   | -23.628|  .0000  |    .51116447|
|SOUTH    |  -.26279891  |    .01439561   | -18.255|  .0000  |    .29027611|
|SMSA     |   .03616514  |    .01369743   |   2.640|  .0083  |    .65378151|
|WKS      |   .00354028  |    .00116459   |   3.040|  .0024  |  46.8115246 |
|WKSHAT   |   .34960141  |    .01642842   |  21.280|  .0000  |  46.8115246 |
+---------+--------------+----------------+--------+---------+-------------+
Part 2A: Basic Econometrics [ 50/75]
General Test for Endogeneity
Extending the Wu test
$y = X_1\beta_1 + X_2\beta_2 + \varepsilon$
There are $M \geq K_2$ instruments Z.
To carry out the test, compute $\hat{X}_2$ column
by column with regressions on Z.
Compute by OLS the extended model
$y = X_1\beta_1 + X_2\beta_2 + \hat{X}_2\gamma + \text{error}$
Use an F test to test $H_0: \gamma = 0$
Part 2A: Basic Econometrics [ 51/75]
Alternative to Hausman’s Formula?



H test requires the difference between an
efficient and an inefficient estimator.
Any way to compare any two competing
estimators even if neither is efficient?
Bootstrap? (Maybe)
Part 2A: Basic Econometrics [ 52/75]
Part 2A: Basic Econometrics [ 53/75]
Weak Instruments



Symptom: The relevance condition, plim Z’X/n not zero, is close
to being violated.
Detection:

Standard F test in the regression of xk on Z. F < 10 suggests a
problem.

F statistic based on 2SLS – see text p. 351.
Remedy:

Not much – most of the discussion is about the condition, not
what to do about it.

Use LIML? Requires a normality assumption. Probably not too
restrictive.
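A sketch of the first-stage F diagnostic. The helper `first_stage_F` is a hypothetical function written for this example. It computes the F statistic for the hypothesis that the coefficients on the instruments excluded from the main equation are all zero, which is one common way to operationalize the rule of thumb; the simulated strong and weak designs are assumptions.

```python
import numpy as np

def first_stage_F(x_k, X_exog, Z_excl):
    """F statistic for H0: all coefficients on the excluded instruments are zero."""
    def ssr(W):
        b = np.linalg.lstsq(W, x_k, rcond=None)[0]
        e = x_k - W @ b
        return e @ e
    W_u = np.column_stack([X_exog, Z_excl])      # unrestricted first stage
    ssr_r, ssr_u = ssr(X_exog), ssr(W_u)         # restricted drops Z_excl
    J = Z_excl.shape[1]
    return ((ssr_r - ssr_u) / J) / (ssr_u / (len(x_k) - W_u.shape[1]))

rng = np.random.default_rng(6)
n = 1_000
X_exog = np.ones((n, 1))                 # just a constant in this sketch
Z_excl = rng.normal(size=(n, 1))         # one excluded instrument
noise = rng.normal(size=n)
x_strong = 0.5 * Z_excl[:, 0] + noise    # instrument matters
x_weak = 0.02 * Z_excl[:, 0] + noise     # instrument barely matters

F_strong = first_stage_F(x_strong, X_exog, Z_excl)
F_weak = first_stage_F(x_weak, X_exog, Z_excl)
print(F_strong, F_weak)
```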
Part 2A: Basic Econometrics [ 54/75]
Weak Instruments (cont.)
$\operatorname{plim} \hat\beta = \beta + [\text{Cov}(Z, X)]^{-1}\text{Cov}(Z, \varepsilon)$
If Cov(Z, ε) is "small" but nonzero, small
Cov(Z, X) may hugely magnify the effect.
IV is not only inefficient, may be very badly
biased by "weak" instruments.
Solutions? Can one "test" for weak instruments?
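A one-variable arithmetic sketch of the magnification. With a single regressor and a single instrument, the asymptotic bias is Cov(Z, ε) / Cov(Z, X). The covariance values below are made up for illustration.

```python
def iv_plim_bias(cov_z_eps, cov_z_x):
    """Asymptotic bias of the scalar IV estimator: Cov(Z, eps) / Cov(Z, X)."""
    return cov_z_eps / cov_z_x

# The same small violation of exogeneity, with two instrument strengths.
bias_strong = iv_plim_bias(cov_z_eps=0.01, cov_z_x=0.5)
bias_weak = iv_plim_bias(cov_z_eps=0.01, cov_z_x=0.02)
print(bias_strong, bias_weak)
```

The violation is identical in both cases; only the denominator changes, and the bias grows by a factor of 25.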
Part 2A: Basic Econometrics [ 55/75]
Weak Instruments
Which is better?
LS is inconsistent, but probably has smaller variance
LS may be more precise
IV is consistent, but probably has larger variance
$\text{Asy.Var}[\hat\beta] = Q_{ZX}^{-1}\, \Omega_{ZZ}\, Q_{XZ}^{-1}$
$Q_{XZ}^{-1}$ may be large. (Compared to what?)
Strange results with "small" QZX
IV estimator tends to resemble OLS (bias) (not a
function of sample size).
Contradictory result. Suppose z is perfectly correlated
with x. IV MUST be the same as OLS.
Part 2A: Basic Econometrics [ 56/75]
A study of moral hazard
Riphahn, Wambach, Million: “Incentive Effects in
the Demand for Healthcare”
Journal of Applied Econometrics, 2003
Did the presence of the ADDON insurance
influence the demand for health care – doctor
visits and hospital visits?
For a simple example, we examine the PUBLIC
insurance (89%) instead of ADDON insurance (2%).
Part 2A: Basic Econometrics [ 57/75]
Application: Health Care Panel Data
German Health Care Usage Data, 7,293 Individuals, Varying Numbers of Periods
Variables in the file are
Data downloaded from the Journal of Applied Econometrics Archive. This is an unbalanced panel with 7,293
individuals. The data can be used for regression, count models, binary choice, ordered choice, and bivariate binary
choice. This is a large data set. There are altogether 27,326 observations. The number of observations
ranges from 1 to 7. (Frequencies are: 1=1525, 2=2158, 3=825, 4=926, 5=1051, 6=1000, 7=987). Note, the
variable NUMOBS below tells how many observations there are for each person. This variable is repeated in each
row of the data for the person.
DOCTOR   = 1(Number of doctor visits > 0)
HOSPITAL = 1(Number of hospital visits > 0)
HSAT     = health satisfaction, coded 0 (low) - 10 (high)
DOCVIS   = number of doctor visits in last three months
HOSPVIS  = number of hospital visits in last calendar year
PUBLIC   = insured in public health insurance = 1; otherwise = 0
ADDON    = insured by add-on insurance = 1; otherwise = 0
HHNINC   = household nominal monthly net income in German marks / 10000.
           (4 observations with income=0 were dropped)
HHKIDS   = children under age 16 in the household = 1; otherwise = 0
EDUC     = years of schooling
AGE      = age in years
MARRIED  = marital status
EDUC     = years of education
Part 2A: Basic Econometrics [ 58/75]
Evidence of Moral Hazard?
Part 2A: Basic Econometrics [ 59/75]
Regression Study
Part 2A: Basic Econometrics [ 60/75]
Endogenous Dummy Variable


Doctor Visits = f(Age, Educ, Health,
Presence of Insurance,
Other unobservables)
Insurance
= f(Expected Doctor Visits,
Other unobservables)
Part 2A: Basic Econometrics [ 61/75]
Approaches



(Parametric) Control Function: Build a structural
model for the two variables (Heckman)
(Semiparametric) Instrumental Variable: Create
an instrumental variable for the dummy variable
(Barnow/Cain/ Goldberger, Angrist, Current
generation of researchers)
(?) Propensity Score Matching (Heckman et al.,
Becker/Ichino, Many recent researchers)
Part 2A: Basic Econometrics [ 62/75]
Heckman’s Control Function Approach


Y = xβ + δT + E[ε|T] + {ε - E[ε|T]}
λ = E[ε|T] , computed from a model for whether T = 0 or 1
Magnitude = 11.1200 is nonsensical in this context.
Part 2A: Basic Econometrics [ 63/75]
Instrumental Variable Approach
Construct a prediction for T using only the exogenous information
Use 2SLS using this instrumental variable.
Magnitude = 23.9012 is also nonsensical in this context.
Part 2A: Basic Econometrics [ 64/75]
Propensity Score Matching



Create a model for T that produces probabilities for T=1: “Propensity Scores”
Find people with the same propensity score – some with T=1, some with T=0
Compare number of doctor visits of those with T=1 to those with T=0.
Part 2A: Basic Econometrics [ 65/75]
Treatment Effect


Earnings and Education: Effect of an additional
year of schooling
Estimating Average and Local Average
Treatment Effects of Education when
Compulsory Schooling Laws Really Matter


Philip Oreopoulos
AER, 96,1, 2006, 152-175
Part 2A: Basic Econometrics [ 66/75]
Treatment Effects and
Natural Experiments
Part 2A: Basic Econometrics [ 67/75]
How do panel data fit into this?

We can use the usual models.



Observations are surely correlated.




We can use far more elaborate models
We can study effects through time
The same individual is observed more than once
Unobserved heterogeneity that appears in the disturbance in a
cross section remains persistent across observations (on the
same ‘unit’).
Procedures must be adjusted.
Dynamic effects are likely to be present.
Part 2A: Basic Econometrics [ 68/75]
Appendix:
Structure and
Regression
Part 2A: Basic Econometrics [ 69/75]
Least Squares Revisited
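$b = (X'X)^{-1}X'y = \beta + (X'X)^{-1}X'\varepsilon$
$\operatorname{plim} b = \beta + \operatorname{plim}(X'X/N)^{-1} \operatorname{plim}(X'\varepsilon/N) = \beta + Q_{XX}^{-1}\gamma$
$b - \operatorname{plim} b = \beta + (X'X)^{-1}X'\varepsilon - (\beta + Q_{XX}^{-1}\gamma) = (X'X/N)^{-1}(X'\varepsilon/N) - Q_{XX}^{-1}\gamma$
$\text{Asy.Var}[b - Q_{XX}^{-1}\gamma] = Q_{XX}^{-1}\, \text{Asy.Var}[X'\varepsilon/N]\, Q_{XX}^{-1}$
$= (1/N)\, Q_{XX}^{-1} [\operatorname{plim} X'\Omega X/N]\, Q_{XX}^{-1} = (1/N)\, Q_{XX}^{-1}\, \Omega_{XX}\, Q_{XX}^{-1}$
(Assuming "well behaved" data)
$\sqrt{N}(b - \operatorname{plim} b) = (X'X/N)^{-1} \sqrt{N} \left( \frac{\sum_{i,t} x_{it}\varepsilon_{it}}{N} \right) - \sqrt{N}\, Q_{XX}^{-1}\gamma$
$\approx (X'X/N)^{-1} \sqrt{N} (\bar{w} - \operatorname{plim} \bar{w})$ in large samples, where $\bar{w} = \sum_{i,t} x_{it}\varepsilon_{it}/N$ and $\operatorname{plim} \bar{w} = \gamma$
Invoke Slutsky and Lindeberg-Feller for $\sqrt{N}(\bar{w} - \operatorname{plim} \bar{w})$ to assert asymptotic
normality. b is also asymptotically normally distributed, but inconsistent.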
Part 2A: Basic Econometrics [ 70/75]
Inference with IV Estimators
(1) Wald Statistics:
$(R\hat\beta - q)' \{R\, \text{Est.Asy.Var}[\hat\beta]\, R'\}^{-1} (R\hat\beta - q)$
(E.g., the usual 't-statistics')
(2) A type of F statistic:
Compute $SSUA = (y - X\hat\beta_U)'(y - X\hat\beta_U)$ without restrictions
Compute $SSR = (y - \hat{X}\hat\beta_R)'(y - \hat{X}\hat\beta_R)$ with restrictions
Compute $SSU = (y - \hat{X}\hat\beta_U)'(y - \hat{X}\hat\beta_U)$ without restrictions
$F = \frac{(SSR - SSU)/J}{SSUA/(N-K)} \sim F[J, N-K]$
Part 2A: Basic Econometrics [ 71/75]
Comparing OLS and IV
Least squares b
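$\operatorname{plim} b = \beta + Q_{XX}^{-1}\gamma$
$\text{Asy.Var}[b] = Q_{XX}^{-1}\, \Omega_{XX}\, Q_{XX}^{-1}$
Precision = Mean squared error$[b \mid \beta]$
$= \text{Asy.Var}[b] + [\operatorname{plim}(b - \beta)][\operatorname{plim}(b - \beta)]'$
$= Q_{XX}^{-1}\, \Omega_{XX}\, Q_{XX}^{-1} + Q_{XX}^{-1}\gamma\gamma' Q_{XX}^{-1}$
Instrumental variables $\hat\beta$
$\operatorname{plim} \hat\beta = \beta$
$\text{Asy.Var}[\hat\beta] = Q_{ZX}^{-1}\, \Omega_{ZZ}\, Q_{XZ}^{-1}$
Precision = Mean squared error$[\hat\beta \mid \beta]$
$= Q_{ZX}^{-1}\, \Omega_{ZZ}\, Q_{XZ}^{-1}$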
Part 2A: Basic Econometrics [ 72/75]
Testing for Endogeneity(?)
A test for endogeneity? Consider one variable:
$y_{it} = x_{it}'\delta + q_{it}\gamma + \varepsilon_{it}$ ==> $y = X\beta + \varepsilon$
q may be endogenous. Z = a set of M instruments.
(1) OLS: $b = (X'X)^{-1}X'y$. Inconsistent if $\text{Cov}[q, \varepsilon] \neq 0$.
Consistent and efficient if $\text{Cov}[q, \varepsilon] = 0$.
(2) 2SLS: $\hat\beta = (\hat{X}'\hat{X})^{-1}\hat{X}'y$. Always consistent.
Inefficient if $\text{Cov}[q, \varepsilon] = 0$.
Hausman test? $[\hat\beta - b]' \{\hat\sigma^2_{\varepsilon,2SLS} (\hat{X}'\hat{X})^{-1} - \hat\sigma^2_{\varepsilon,OLS} (X'X)^{-1}\}^{-1} [\hat\beta - b]$
(a) Need to use the same estimator of $\sigma^2$.
$\frac{1}{\hat\sigma^2_{\varepsilon,2SLS}} [\hat\beta - b]' \{(\hat{X}'\hat{X})^{-1} - (X'X)^{-1}\}^{-1} [\hat\beta - b]$
(b) Even with this fix, the resulting matrix is singular.
Part 2A: Basic Econometrics [ 73/75]
Structure vs. Regression


Reduced Form vs. Structural Model
Simultaneous equations origin



Q(d) = a0 + a1P + a2I + e(d) (demand)
Q(s) = b0 + b1P + b2R + e(s) (supply)
Q(.) = Q(d) = Q(s)
What is the effect of a change in I on Q(.)?
(Not a regression)
Reduced form: Q = c0 + c1I + c2R + v.
(Regression)
Modern concepts of structure vs. regression:
The search for causal effects.
Part 2A: Basic Econometrics [ 74/75]
Implications



The structure is the theory
The regression is the conditional mean
There is always a conditional mean



It may not equal the structure
It may be linear in the same variables
What is the implication for least squares estimation?



LS estimates regressions
LS does not necessarily estimate structures
Structures may not be estimable – they may not be
identified.
Part 2A: Basic Econometrics [ 75/75]
Structure and Regression


Simultaneity? What if E[ε|x]≠0
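$y = \beta x + \varepsilon$, $x = \delta y + u$. $\text{Cov}[x, \varepsilon] \neq 0$
$\beta x$ is not the regression?
What is the regression?
Reduced form: Assume ε and u are uncorrelated.
$y = [\beta/(1 - \delta\beta)]u + [1/(1 - \delta\beta)]\varepsilon$
$x = [1/(1 - \delta\beta)]u + [\delta/(1 - \delta\beta)]\varepsilon$
$\text{Cov}[x, y]/\text{Var}[x] = \gamma$
$= [\beta\sigma_u^2 + \delta\sigma_\varepsilon^2] / [\sigma_u^2 + \delta^2\sigma_\varepsilon^2]$
$= w\beta + (1 - w)(1/\delta)$ where $w = \sigma_u^2 / [\sigma_u^2 + \delta^2\sigma_\varepsilon^2]$
The regression is $y = \gamma x + v$, where $E[v \mid x] = 0$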
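A simulation sketch of the result. The parameter values (β = 0.5, δ = 0.5, unit variances) are assumptions for illustration. The least squares slope of y on x settles at the weighted average wβ + (1 - w)(1/δ), not at the structural coefficient β.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 200_000
beta, delta = 0.5, 0.5            # assumed structural parameters
sigma_u, sigma_e = 1.0, 1.0       # assumed standard deviations

u = sigma_u * rng.normal(size=n)
eps = sigma_e * rng.normal(size=n)

# Reduced form: solve the two structural equations for y and x.
y = (beta * u + eps) / (1.0 - delta * beta)
x = (u + delta * eps) / (1.0 - delta * beta)

slope = np.mean((x - x.mean()) * (y - y.mean())) / np.var(x)   # Cov[x,y]/Var[x]

w = sigma_u**2 / (sigma_u**2 + delta**2 * sigma_e**2)
gamma = w * beta + (1.0 - w) / delta                           # regression slope from the formula
print(slope, gamma, beta)
```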
Part 2A: Basic Econometrics [ 76/75]
Structure vs. Regression
Supply = a + b*Price + c*Capacity
Demand = A + B*Price + C*Income