Econometrics I Professor William Greene Stern School of Business Department of Economics 13-1/62 Part 13: Endogeneity.

I am here to ask for a little help with endogeneity.
I have a main regression in which the independent variables are lagged 1 year
(this is an unbalanced panel dataset); I use fixed effects (xtreg):
Main Regression: $Y_t = X_{t-1} + Q_{t-1} + Z_{t-1}$
I suspect endogeneity: variable X may itself be determined by prior-year Y.
As a solution, I read about this strategy: regress the endogenous variable $X_{t-1}$ on the
dependent variable ($Y_{t-2}$) and the other independent variables (i.e., $Q_{t-2}$ and $Z_{t-2}$);
these Y, Q and Z are all in year t-2, while X is in t-1. Then, from this regression,
calculate the “predicted” values for X, and include them as a control for endogeneity
(e.g., a variable named “Endogeneity-control”) in the main regression above.
Question 1: in the Main Regression above, when including the control for
endogeneity (i.e., the variable “Endogeneity-control”), do I have to lag its value?
That is, do I have to include Endogeneity-control in t-1, or just the predicted
values, without lagging?
Cornwell and Rupert Data
Cornwell and Rupert Returns to Schooling Data, 595 Individuals, 7 Years
Variables in the file are
EXP = work experience
WKS = weeks worked
OCC = occupation, 1 if blue collar
IND = 1 if manufacturing industry
SOUTH = 1 if resides in south
SMSA = 1 if resides in a city (SMSA)
MS = 1 if married
FEM = 1 if female
UNION = 1 if wage set by union contract
ED = years of education
LWAGE = log of wage = dependent variable in regressions
These data were analyzed in Cornwell, C. and Rupert, P., "Efficient Estimation with Panel
Data: An Empirical Comparison of Instrumental Variable Estimators," Journal of Applied
Econometrics, 3, 1988, pp. 149-155. See Baltagi, page 122 for further analysis. The
data were downloaded from the website for Baltagi's text.
Specification: Quadratic Effect of Experience
The Effect of Education on LWAGE
$LWAGE = \beta_1 + \beta_2 EDUC + \beta_3 EXP + \beta_4 EXP^2 + \dots + \varepsilon$
What is ε? Ability, Motivation,... + everything else
EDUC = f(GENDER, SMSA, SOUTH, Ability, Motivation,...)
What Influences LWAGE?
$LWAGE = \beta_1 + \beta_2 EDUC(X, Ability, Motivation, \dots) + \beta_3 EXP + \beta_4 EXP^2 + \dots + \varepsilon(Ability, Motivation)$
Increased Ability is associated with increases in both
EDUC(X, Ability, Motivation, ...) and ε(Ability, Motivation).
What looks like an effect of an increase in EDUC may
be an effect of an increase in Ability. The estimate of $\beta_2$ picks up
the effect of EDUC and the hidden effect of Ability.
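A minimal simulation sketch of this point, in Python with only the standard library. Everything in it is an assumption for illustration, not the Cornwell and Rupert data: a standard-normal Ability raises EDUC (EDUC = 12 + 2·Ability + noise) and also enters the wage disturbance with weight 0.1, while the true return to EDUC is set to 0.05. With these made-up numbers the least squares slope should settle near 0.05 + 0.2/5 = 0.09, because it absorbs the hidden effect of Ability.

```python
import random
from statistics import fmean

def slope(x, y):
    """Least squares slope from a regression of y on a constant and x."""
    xb, yb = fmean(x), fmean(y)
    sxy = sum((a - xb) * (b - yb) for a, b in zip(x, y))
    sxx = sum((a - xb) ** 2 for a in x)
    return sxy / sxx

random.seed(2024)
n = 20_000
beta_educ = 0.05                                   # assumed true return to EDUC
ability = [random.gauss(0, 1) for _ in range(n)]   # unobserved by the researcher
# Hypothetical DGP: Ability raises schooling...
educ = [12 + 2 * a + random.gauss(0, 1) for a in ability]
# ...and also sits in the wage disturbance.
lwage = [1 + beta_educ * e + 0.1 * a + random.gauss(0, 0.3)
         for e, a in zip(educ, ability)]

b_ols = slope(educ, lwage)
# plim b_ols = 0.05 + Cov(EDUC, 0.1*Ability)/Var(EDUC) = 0.05 + 0.2/5 = 0.09
print(f"OLS slope on EDUC: {b_ols:.4f} (true value {beta_educ})")
```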
An Exogenous Influence
$LWAGE = \beta_1 + \beta_2 EDUC(X, Z, Ability, Motivation, \dots) + \beta_3 EXP + \beta_4 EXP^2 + \dots + \varepsilon(Ability, Motivation)$
Increased Z is associated with increases in
EDUC(X, Z, Ability, Motivation, ...) but not with ε(Ability, Motivation).
An increase in Z changes LWAGE only through the increase it causes in EDUC.
Estimated from this variation, $\beta_2$ picks up the effect of EDUC only.
Z is an Instrumental Variable.
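Extending the same kind of hypothetical simulation (Python, standard library; all parameter values are assumptions), the sketch below adds an instrument Z that shifts EDUC but is independent of Ability. The simple IV ratio Cov(Z, LWAGE)/Cov(Z, EDUC) uses only the Z-driven variation in EDUC, so it should land near the assumed true value 0.05, while OLS stays near 0.05 + 0.2/6 ≈ 0.083.

```python
import random
from statistics import fmean

def cov(x, y):
    """Sample covariance (divisor n) of two equal-length lists."""
    xb, yb = fmean(x), fmean(y)
    return fmean([(a - xb) * (b - yb) for a, b in zip(x, y)])

random.seed(99)
n = 20_000
beta_educ = 0.05                                   # assumed true effect of EDUC
ability = [random.gauss(0, 1) for _ in range(n)]
z = [random.gauss(0, 1) for _ in range(n)]         # hypothetical instrument
# Z shifts EDUC but is independent of Ability (and so of the disturbance).
educ = [12 + 1.0 * zi + 2 * a + random.gauss(0, 1) for zi, a in zip(z, ability)]
lwage = [1 + beta_educ * e + 0.1 * a + random.gauss(0, 0.3)
         for e, a in zip(educ, ability)]

b_ols = cov(educ, lwage) / cov(educ, educ)   # contaminated by Ability
b_iv = cov(z, lwage) / cov(z, educ)          # uses only Z-driven variation in EDUC
print(f"OLS: {b_ols:.4f}  IV: {b_iv:.4f}  true: {beta_educ}")
```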
Instrumental Variables
Structure
LWAGE = f(ED, EXP, EXPSQ, WKS, OCC, SOUTH, SMSA, UNION)
ED = f(MS, FEM)
Reduced Form:
LWAGE = f[ED(MS, FEM), EXP, EXPSQ, WKS, OCC, SOUTH, SMSA, UNION]
Two Stage Least Squares Strategy
Reduced Form:
LWAGE = f[ED(MS, FEM, X), EXP, EXPSQ, WKS, OCC, SOUTH, SMSA, UNION]
Strategy
(1) Purge ED of the influence of everything but MS and FEM (and the other
exogenous variables): predict ED using all exogenous information in the sample (X and Z).
(2) Regress LWAGE on this prediction of ED and everything else.
Standard errors must be adjusted for the use of the predicted ED.
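The two steps can be sketched directly. The Python sketch below (standard library only) simulates hypothetical data in which two continuous variables z1 and z2 stand in for the excluded instruments MS and FEM, and ability contaminates both the regressor and the disturbance; the true slope is set to 1.0. Stage 1 predicts x from the constant, z1 and z2; stage 2 regresses y on that prediction. The last lines recompute the same number in the IV form $(\hat X'X)^{-1}\hat X'y$. No second-stage standard errors are printed, since those need the adjustment mentioned above.

```python
import random

def cross(u, v):
    """Inner product u'v of two equal-length lists."""
    return sum(p * q for p, q in zip(u, v))

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(cols, y):
    """Least squares coefficients of y on the regressor columns in cols."""
    return solve([[cross(ci, cj) for cj in cols] for ci in cols],
                 [cross(ci, y) for ci in cols])

random.seed(12345)
n = 5_000
one = [1.0] * n
z1 = [random.gauss(0, 1) for _ in range(n)]   # hypothetical stand-ins for the
z2 = [random.gauss(0, 1) for _ in range(n)]   # excluded instruments MS and FEM
ability = [random.gauss(0, 1) for _ in range(n)]
# Endogenous regressor: driven by the instruments and by ability.
x = [1 + 0.5 * a + 0.5 * b + c + random.gauss(0, 1) for a, b, c in zip(z1, z2, ability)]
# Outcome: assumed true slope 1.0; ability sits in the disturbance.
y = [2 + 1.0 * xi + c + random.gauss(0, 1) for xi, c in zip(x, ability)]

# Stage 1: predict x from all exogenous information (constant, z1, z2).
g = ols([one, z1, z2], x)
xhat = [g[0] + g[1] * a + g[2] * b for a, b in zip(z1, z2)]

# Stage 2: regress y on the constant and the prediction of x.
b_2sls = ols([one, xhat], y)[1]
b_ols = ols([one, x], y)[1]

# The same number computed as an IV estimator, (Xhat'X)^{-1} Xhat'y.
xh_cols, x_cols = [one, xhat], [one, x]
b_iv = solve([[cross(h, c) for c in x_cols] for h in xh_cols],
             [cross(h, y) for h in xh_cols])[1]
print(f"OLS {b_ols:.3f}   2SLS {b_2sls:.3f}   IV form {b_iv:.3f}")
```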
OLS
The weird results for the coefficient on ED happened because the instruments,
MS and FEM, are dummy variables. There is not enough variation in these variables.
Source of Endogeneity
LWAGE = f(ED, EXP, EXPSQ, WKS, OCC, SOUTH, SMSA, UNION) + ε
ED = f(MS, FEM, EXP, EXPSQ, WKS, OCC, SOUTH, SMSA, UNION) + u
ED is endogenous in the LWAGE equation because u and ε are correlated.
Remove the Endogeneity
LWAGE = f(ED, EXP, EXPSQ, WKS, OCC, SOUTH, SMSA, UNION) + u + ε
Strategy
Estimate u.
Add u to the equation. ED is uncorrelated with ε when u is in the equation.
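A sketch of the control function idea on hypothetical simulated data (Python, standard library; z1 and z2 stand in for MS and FEM, and all parameter values are assumptions). The auxiliary regression of ED on the instruments gives the residual u_hat; adding u_hat to the equation leaves the coefficient on ED equal to the 2SLS coefficient. In this linear setting the equality is exact up to rounding, because u_hat is orthogonal to the constant and the instruments, so ED = ed_hat + u_hat splits into two orthogonal pieces.

```python
import random

def cross(u, v):
    """Inner product u'v of two equal-length lists."""
    return sum(p * q for p, q in zip(u, v))

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(cols, y):
    """Least squares coefficients of y on the regressor columns in cols."""
    return solve([[cross(ci, cj) for cj in cols] for ci in cols],
                 [cross(ci, y) for ci in cols])

random.seed(321)
n = 5_000
one = [1.0] * n
z1 = [random.gauss(0, 1) for _ in range(n)]   # hypothetical stand-ins for MS and FEM
z2 = [random.gauss(0, 1) for _ in range(n)]
ability = [random.gauss(0, 1) for _ in range(n)]
ed = [1 + 0.5 * a + 0.5 * b + c + random.gauss(0, 1) for a, b, c in zip(z1, z2, ability)]
lwage = [2 + 1.0 * e + c + random.gauss(0, 1) for e, c in zip(ed, ability)]  # true slope 1.0

# Auxiliary regression: ED on the constant and the instruments; keep the residual.
g = ols([one, z1, z2], ed)
ed_hat = [g[0] + g[1] * a + g[2] * b for a, b in zip(z1, z2)]
u_hat = [e - h for e, h in zip(ed, ed_hat)]

b_cf = ols([one, ed, u_hat], lwage)[1]    # OLS with the control function added
b_2sls = ols([one, ed_hat], lwage)[1]     # 2SLS
b_ols = ols([one, ed], lwage)[1]          # plain OLS, for comparison
print(f"OLS {b_ols:.3f}  control function {b_cf:.3f}  2SLS {b_2sls:.3f}")
```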
Auxiliary Regression for ED to Obtain Residuals
OLS with Residual (Control Function) Added
2SLS
A Warning About Control Function
The Problem
$y = X\beta + Y\gamma + \varepsilon$
$\text{Cov}(X, \varepsilon) = 0$, $K_1$ variables
$\text{Cov}(Y, \varepsilon) \neq 0$, $K_2$ variables
Y is endogenous.
OLS regression of y on (X, Y) cannot estimate $(\beta, \gamma)$
consistently. Some other estimator is needed.
Additional structure:
$Y = Z\Pi + V$ where $\text{Cov}(Z, \varepsilon) = 0$.
An instrumental variable (IV) estimator based on (X, Z) may
be able to estimate $(\beta, \gamma)$ consistently.
Instrumental Variables
Framework: $y = X\beta + \varepsilon$, K variables in X.
There exists a set of K variables, Z, such that
$\text{plim}(Z'X/n) \neq 0$ but $\text{plim}(Z'\varepsilon/n) = 0$.
The variables in Z are called instrumental variables.
An alternative (to least squares) estimator of $\beta$ is
$b_{IV} = (Z'X)^{-1}Z'y$
We consider the following:
Why use this estimator?
What are its properties compared to least squares?
We will also examine an important application.
IV Estimators
Consistent:
$b_{IV} = (Z'X)^{-1}Z'y$
$= (Z'X/n)^{-1}(Z'X/n)\beta + (Z'X/n)^{-1}Z'\varepsilon/n$
$= \beta + (Z'X/n)^{-1}Z'\varepsilon/n \;\to\; \beta$ (in probability)
Asymptotically normal (same approach to proof as for OLS).
Inefficient: to be shown.
The General Result
By construction, the IV estimator is consistent. So,
we have an estimator that is consistent when
least squares is not.
LS as an IV Estimator
The least squares estimator is
$(X'X)^{-1}X'y = (X'X)^{-1}\sum_i x_i y_i = \beta + (X'X)^{-1}\sum_i x_i \varepsilon_i$
If $\text{plim}(X'X/n) = Q$, nonzero, and
$\text{plim}(X'\varepsilon/n) = 0$,
then under the usual assumptions LS is an IV estimator:
X is its own instrument.
IV Estimation
Why use an IV estimator? Suppose that X and ε
are correlated. Then least squares is
neither unbiased nor consistent.
Recall the proof of consistency of least squares:
$b = \beta + (X'X/n)^{-1}(X'\varepsilon/n)$.
$\text{plim}\, b = \beta$ requires $\text{plim}(X'\varepsilon/n) = 0$. If this does not
hold, the estimator is inconsistent.
A Popular Misconception
A popular misconception: if only one variable in X is correlated with ε,
the other coefficients are consistently estimated. False.
Suppose only the first variable is correlated with ε.
Under the assumptions, $\text{plim}(X'\varepsilon/n) = (\sigma_1, 0, \dots, 0)'$. Then
$\text{plim}\, b - \beta = \text{plim}(X'X/n)^{-1}(\sigma_1, 0, \dots, 0)' = \sigma_1 (q^{11}, q^{21}, \dots, q^{K1})'$
$= \sigma_1$ times the first column of $Q^{-1}$.
The problem is “smeared” over the other coefficients.
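A numerical check of the smearing result, as a hypothetical simulation in Python (standard library). Assumed design: zero-mean variables, x2 exogenous but correlated with x1, and Cov(x1, ε) = σ1 = 0.5, which gives Q = [[1.5, 0.5], [0.5, 1.0]]. The first column of Q^-1 is (0.8, -0.4), so the formula predicts plim b - β = (0.4, -0.2): the exogenous x2 still gets a biased coefficient.

```python
import random

def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(p * q for p, q in zip(u, v))

random.seed(7)
n = 20_000
# Hypothetical zero-mean design, so no constant term is needed.
eps = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]                              # exogenous
x1 = [0.5 * b + random.gauss(0, 1) + 0.5 * e for b, e in zip(x2, eps)]  # Cov(x1, eps) = 0.5
y = [1.0 * a + 1.0 * b + e for a, b, e in zip(x1, x2, eps)]             # true beta = (1, 1)

# Least squares of y on (x1, x2) via the explicit 2x2 inverse of X'X.
s11, s12, s22 = dot(x1, x1), dot(x1, x2), dot(x2, x2)
s1y, s2y = dot(x1, y), dot(x2, y)
det = s11 * s22 - s12 ** 2
b1 = (s22 * s1y - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det

# Population counterpart: Q = [[1.5, 0.5], [0.5, 1.0]], sigma_1 = 0.5.
sigma1 = 0.5
q_det = 1.5 * 1.0 - 0.5 ** 2
bias = (sigma1 * 1.0 / q_det, sigma1 * -0.5 / q_det)   # sigma_1 * (q^11, q^21)
print(f"b1 = {b1:.3f} (plim {1 + bias[0]:.2f}), b2 = {b2:.3f} (plim {1 + bias[1]:.2f})")
```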
Asymptotic Covariance Matrix of bIV
$b_{IV} - \beta = (Z'X)^{-1}Z'\varepsilon$
$(b_{IV} - \beta)(b_{IV} - \beta)' = (Z'X)^{-1}Z'\varepsilon\varepsilon'Z(X'Z)^{-1}$
$E[(b_{IV} - \beta)(b_{IV} - \beta)' \mid X, Z] = \sigma^2 (Z'X)^{-1}Z'Z(X'Z)^{-1}$
Asymptotic Efficiency
Asymptotic efficiency of the IV estimator: the variance is
larger than that of LS. (A large-sample type of
Gauss-Markov result is at work.)
(1) It’s a moot point: LS is inconsistent.
(2) Mean squared error is uncertain:
MSE[estimator|β] = Variance + square of bias.
IV may be better or worse. It depends on the data.
Two Stage Least Squares
How to use an “excess” of instrumental variables
(1) X is K variables. Some (at least one) of the K
variables in X are correlated with ε.
(2) Z is M > K variables. Some of the variables in
Z are also in X, some are not. None of the
variables in Z are correlated with ε.
(3) Which K variables to use to compute Z’X and Z’y?
Choosing the Instruments
Choose K randomly?
Choose the included Xs and the remainder randomly?
Use all of them? How?
A theorem (Brundy and Jorgenson, ca. 1972): there is a
most efficient way to construct the IV estimator from this
subset:
(1) For each column (variable) in X, compute the predictions of
that variable using all the columns of Z.
(2) Linearly regress y on these K predictions.
This is two stage least squares.
Algebraic Equivalence
Two stage least squares is equivalent to
(1) each variable in X that is also in Z is replaced by
itself.
(2) Variables in X that are not in Z are replaced by
predictions of that X using all the variables in Z.
2SLS Algebra
$\hat X = Z(Z'Z)^{-1}Z'X$
$b_{2SLS} = (\hat X'\hat X)^{-1}\hat X'y$
But $Z(Z'Z)^{-1}Z'X = (I - M_Z)X$ and $(I - M_Z)$ is idempotent.
$\hat X'\hat X = X'(I - M_Z)(I - M_Z)X = X'(I - M_Z)X = \hat X'X$, so
$b_{2SLS} = (\hat X'X)^{-1}\hat X'y$ = a real IV estimator by the definition.
Note, $\text{plim}(\hat X'\varepsilon/n) = 0$ since the columns of $\hat X$ are linear combinations
of the columns of Z, all of which are uncorrelated with ε.
$b_{2SLS} = [X'(I - M_Z)X]^{-1}X'(I - M_Z)y$
Asymptotic Covariance Matrix for 2SLS
General Result for Instrumental Variable Estimation
$E[(b_{IV} - \beta)(b_{IV} - \beta)' \mid X, Z] = \sigma^2 (Z'X)^{-1}Z'Z(X'Z)^{-1}$
Specialize for 2SLS, using $Z = \hat X = (I - M_Z)X$:
$E[(b_{2SLS} - \beta)(b_{2SLS} - \beta)' \mid X, Z] = \sigma^2 (\hat X'X)^{-1}\hat X'\hat X(X'\hat X)^{-1}$
$= \sigma^2 (\hat X'\hat X)^{-1}\hat X'\hat X(\hat X'\hat X)^{-1}$
$= \sigma^2 (\hat X'\hat X)^{-1}$
2SLS Has Larger Variance than LS
A comparison to OLS
$\text{Asy.Var}[2SLS] = \sigma^2(\hat X'\hat X)^{-1}$
Neglecting the inconsistency,
$\text{Asy.Var}[LS] = \sigma^2(X'X)^{-1}$
(This is the variance of LS around its mean, not β.)
$\text{Asy.Var}[2SLS] \geq \text{Asy.Var}[LS]$ in the matrix sense.
Compare inverses:
$\{\text{Asy.Var}[LS]\}^{-1} - \{\text{Asy.Var}[2SLS]\}^{-1} = (1/\sigma^2)[X'X - \hat X'\hat X]$
$= (1/\sigma^2)[X'X - X'(I - M_Z)X] = (1/\sigma^2)[X'M_Z X]$
This matrix is nonnegative definite. (Not positive definite,
as it might have some rows and columns which are zero: those for variables in X
that are also in Z, since $M_Z$ annihilates them.)
Implication for "precision" of 2SLS.
The problem of "Weak Instruments"
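For a single regressor with one instrument and a constant, the comparison reduces to a ratio of scalars: with the same σ² in both, Asy.Var[2SLS]/Asy.Var[LS] for the slope is $\sum(x_i - \bar x)^2/\sum(\hat x_i - \bar x)^2 = 1/R^2$, where R² comes from the first-stage regression. The hypothetical Python sketch below (assumed first-stage coefficients 1.0 and 0.1, standard library) computes that ratio for a strong and a weak instrument; the weak first stage inflates the 2SLS variance many times over.

```python
import random
from statistics import fmean

def var_ratio(pi, n=5_000, seed=1):
    """Return (ratio, r2) for one regressor x and one instrument z.

    ratio = Asy.Var[2SLS] / Asy.Var[LS] for the slope, same sigma^2 in both,
    which equals Sxx / Shh; r2 is the first-stage R-squared.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [pi * zi + rng.gauss(0, 1) for zi in z]          # hypothetical first stage
    zb, xb = fmean(z), fmean(x)
    szz = sum((a - zb) ** 2 for a in z)
    szx = sum((a - zb) * (b - xb) for a, b in zip(z, x))
    xhat = [xb + (szx / szz) * (a - zb) for a in z]      # first-stage fitted values
    sxx = sum((b - xb) ** 2 for b in x)                  # centered x'x
    shh = sum((h - xb) ** 2 for h in xhat)               # centered xhat'xhat
    ssr = sum((b - h) ** 2 for b, h in zip(x, xhat))     # first-stage residual SS
    return sxx / shh, 1 - ssr / sxx

strong, r2_strong = var_ratio(pi=1.0)
weak, r2_weak = var_ratio(pi=0.1)
print(f"strong: ratio {strong:.2f} (R2 {r2_strong:.3f}); weak: ratio {weak:.1f} (R2 {r2_weak:.4f})")
```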
Estimating σ²
Estimating the asymptotic covariance matrix: a caution about estimating σ².
Since the regression is computed by regressing y on $\hat x$,
one might use
$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n (y_i - \hat x_i' b_{2SLS})^2$
This is inconsistent. Use
$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n (y_i - x_i' b_{2SLS})^2$
(A degrees of freedom correction is optional: conventional,
but not necessary.)
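A small hypothetical simulation (Python, standard library; one instrument, assumed parameters, true σ² = 2) shows the difference. Both variance estimates below use the same 2SLS coefficients; the only change is whether the residuals are built from $\hat x$ or from x. In this design the first converges to about 6, the second to the true value.

```python
import random
from statistics import fmean

random.seed(42)
n = 5_000
z = [random.gauss(0, 1) for _ in range(n)]          # hypothetical instrument
ability = [random.gauss(0, 1) for _ in range(n)]    # unobserved; in x and in the disturbance
x = [zi + a + random.gauss(0, 1) for zi, a in zip(z, ability)]
y = [1 + 1.0 * xi + a + random.gauss(0, 1) for xi, a in zip(x, ability)]  # sigma^2 = 2

zb, xb, yb = fmean(z), fmean(x), fmean(y)
szx = sum((a - zb) * (b - xb) for a, b in zip(z, x))
szy = sum((a - zb) * (c - yb) for a, c in zip(z, y))
szz = sum((a - zb) ** 2 for a in z)
xhat = [xb + (szx / szz) * (a - zb) for a in z]     # first-stage fitted values

b1 = szy / szx           # 2SLS slope (just identified, so equal to IV)
b0 = yb - b1 * xb        # 2SLS constant

# Inconsistent: residuals from the second-stage regression on xhat.
s2_wrong = sum((c - b0 - b1 * h) ** 2 for c, h in zip(y, xhat)) / n
# Consistent: the same coefficients, residuals computed with the actual x.
s2_right = sum((c - b0 - b1 * b) ** 2 for c, b in zip(y, x)) / n
print(f"using xhat: {s2_wrong:.2f}   using x: {s2_right:.2f}   true: 2.0")
```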
Endogeneity Test? (Hausman)
          Exogenous                 Endogenous
OLS       Consistent, Efficient     Inconsistent
2SLS      Consistent, Inefficient   Consistent
Base a test on $d = b_{2SLS} - b_{OLS}$.
Use a Wald statistic, $d'[\text{Var}(d)]^{-1}d$.
What to use for the variance matrix?
Hausman: $V_{2SLS} - V_{OLS}$
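A scalar sketch of the Hausman comparison, assuming one endogenous regressor, one instrument and hypothetical simulated data (Python, standard library). One common σ² estimate, here taken from the 2SLS residuals (an assumption; the slide does not say which estimate to use), goes into both variances, which makes V2SLS - VOLS positive in this scalar case. The statistic is compared with the χ²(1) 5% critical value, 3.84.

```python
import random
from statistics import fmean

random.seed(11)
n = 5_000
z = [random.gauss(0, 1) for _ in range(n)]          # hypothetical instrument
ability = [random.gauss(0, 1) for _ in range(n)]
x = [zi + a + random.gauss(0, 1) for zi, a in zip(z, ability)]           # endogenous
y = [1 + 0.5 * xi + a + random.gauss(0, 1) for xi, a in zip(x, ability)]

zb, xb, yb = fmean(z), fmean(x), fmean(y)
sxx = sum((b - xb) ** 2 for b in x)
sxy = sum((b - xb) * (c - yb) for b, c in zip(x, y))
szx = sum((a - zb) * (b - xb) for a, b in zip(z, x))
szy = sum((a - zb) * (c - yb) for a, c in zip(z, y))
szz = sum((a - zb) ** 2 for a in z)

b_ols = sxy / sxx
b_2sls = szy / szx            # one instrument: 2SLS = IV
shh = szx ** 2 / szz          # centered sum of squares of the first-stage fitted values

# One common sigma^2 (assumed: 2SLS residuals computed with x) in both variances.
a_2sls = yb - b_2sls * xb
s2 = sum((c - a_2sls - b_2sls * b) ** 2 for b, c in zip(x, y)) / n
v_ols, v_2sls = s2 / sxx, s2 / shh

d = b_2sls - b_ols
H = d ** 2 / (v_2sls - v_ols)   # chi-squared(1) under exogeneity
print(f"b_OLS {b_ols:.3f}  b_2SLS {b_2sls:.3f}  H {H:.1f} (5% critical value 3.84)")
```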
Hausman Test
Hausman Test: One at a Time?
Endogeneity Test: Wu
Considerable complication in Hausman test
(text, pp. 322-323)
Simplification: Wu test.
Regress y on X and on $\hat X$, the predictions of the
endogenous part of X. Then use an ordinary
Wald test of the coefficients on $\hat X$.
Wu Test
Note: .05544 + .54900 = .60444, which is the 2SLS coefficient on ED.
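The identity in the note can be checked on hypothetical simulated data. The Python sketch below (standard library; one endogenous x, one instrument z, assumed parameters) regresses y on a constant, x and $\hat x$. Because $x - \hat x$ is orthogonal to the constant and to $\hat x$, the two slopes add up to the 2SLS slope, up to rounding. With a single endogenous variable, the Wald test of the coefficient on $\hat x$ is the square of its t ratio, computed at the end.

```python
import math
import random
from statistics import fmean

def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(p * q for p, q in zip(u, v))

random.seed(8)
n = 5_000
z = [random.gauss(0, 1) for _ in range(n)]          # hypothetical instrument
ability = [random.gauss(0, 1) for _ in range(n)]
x = [zi + a + random.gauss(0, 1) for zi, a in zip(z, ability)]
y = [1 + 1.0 * xi + a + random.gauss(0, 1) for xi, a in zip(x, ability)]

zb, xb, yb = fmean(z), fmean(x), fmean(y)
szz = sum((a - zb) ** 2 for a in z)
szx = sum((a - zb) * (b - xb) for a, b in zip(z, x))
pi_hat = szx / szz
# Deviations from means absorb the constant term (Frisch-Waugh).
xc = [b - xb for b in x]
hc = [pi_hat * (a - zb) for a in z]    # centered first-stage fitted values
yc = [c - yb for c in y]

sxx, sxh, shh = dot(xc, xc), dot(xc, hc), dot(hc, hc)
sxy, shy = dot(xc, yc), dot(hc, yc)
det = sxx * shh - sxh ** 2
b_x = (shh * sxy - sxh * shy) / det    # coefficient on x
b_h = (sxx * shy - sxh * sxy) / det    # coefficient on xhat
b_2sls = shy / shh                     # 2SLS slope: y on a constant and xhat

resid = [c - b_x * p - b_h * q for c, p, q in zip(yc, xc, hc)]
s2 = dot(resid, resid) / (n - 3)       # constant, x and xhat estimated
t_h = b_h / math.sqrt(s2 * sxx / det)  # t ratio for the coefficient on xhat
wald = t_h ** 2
print(f"{b_x:.4f} + {b_h:.4f} = {b_x + b_h:.4f}; 2SLS {b_2sls:.4f}; Wald {wald:.1f}")
```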
Alternative to Hausman’s Formula?
The Hausman test requires the difference between an
efficient and an inefficient estimator.
Is there any way to compare two competing
estimators even if neither is efficient?
Bootstrap? (Maybe)
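Measurement Error
$y = \beta x^* + \varepsilon$, all of the usual assumptions
$x = x^* + u$, the true $x^*$ is not observed
(education vs. years of school)
What happens when y is regressed on x? Least
squares attenuation:
$\text{plim}\, b = \frac{\text{cov}(x, y)}{\text{var}(x)} = \frac{\text{cov}(x^* + u, \beta x^* + \varepsilon)}{\text{var}(x^* + u)} = \beta\,\frac{\text{var}(x^*)}{\text{var}(x^*) + \text{var}(u)}$,
which is smaller than β in absolute value.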
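A minimal simulation sketch of attenuation (Python, standard library), assuming var(x*) = 1, var(u) = 0.5 and β = 1.5, so the formula gives plim b = 1.5 × 1/1.5 = 1.0.

```python
import random
from statistics import fmean

random.seed(3)
n = 20_000
beta = 1.5                      # assumed true slope
var_xstar, var_u = 1.0, 0.5     # assumed variances of x* and of the error u
xstar = [random.gauss(0, var_xstar ** 0.5) for _ in range(n)]
x = [xs + random.gauss(0, var_u ** 0.5) for xs in xstar]   # what we observe
y = [beta * xs + random.gauss(0, 1) for xs in xstar]

xb, yb = fmean(x), fmean(y)
b = sum((a - xb) * (c - yb) for a, c in zip(x, y)) / sum((a - xb) ** 2 for a in x)
plim_b = beta * var_xstar / (var_xstar + var_u)            # = 1.0 here
print(f"OLS slope {b:.3f}, attenuated plim {plim_b:.3f}, true beta {beta}")
```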
Why Is Least Squares Attenuated?
$y = \beta x^* + \varepsilon$
$x = x^* + u$
$y = \beta x + (\varepsilon - \beta u)$
$y = \beta x + v$, $\text{cov}(x, v) = -\beta\,\text{var}(u)$
Some of the variation in x is not
associated with variation in y. The
effect of variation in x on y is
dampened by the measurement error.
Measurement Error in Multiple Regression
Multiple regression: $y = \beta_1 x_1^* + \beta_2 x_2^* + \varepsilon$
$x_1^*$ is measured with error; $x_1 = x_1^* + u$
$x_2$ is measured without error.
The regression is estimated by least squares.
Popular myth #1: $b_1$ is biased downward, $b_2$ consistent.
Popular myth #2: All coefficients are biased toward zero.
Result for the simplest case. Let
$\sigma_{ij} = \text{cov}(x_i^*, x_j^*)$, $i, j = 1, 2$ (2x2 covariance matrix)
$\sigma^{ij}$ = ijth element of the inverse of the covariance matrix
$\sigma_u^2 = \text{var}(u)$
For the least squares estimators:
$\text{plim}\, b_1 = \beta_1 \left[\frac{1}{1 + \sigma_u^2\sigma^{11}}\right], \quad \text{plim}\, b_2 = \beta_2 - \beta_1\left[\frac{\sigma_u^2\sigma^{12}}{1 + \sigma_u^2\sigma^{11}}\right]$
The effect is called "smearing."
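A hypothetical simulation (Python, standard library) checks these formulas and shows that neither myth holds. Assumed design: x2 ~ N(0, 1), x1* = 0.6·x2 + N(0, 1), var(u) = 1 and β1 = β2 = 1. The covariance matrix of (x1*, x2) is then [[1.36, 0.6], [0.6, 1.0]] with determinant 1, so σ^11 = 1 and σ^12 = -0.6; the formulas give plim b1 = 0.5 and plim b2 = 1.3, so b2 is biased away from zero.

```python
import random

def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(p * q for p, q in zip(u, v))

random.seed(5)
n = 20_000
beta1, beta2, var_u = 1.0, 1.0, 1.0                         # assumed values
x2 = [random.gauss(0, 1) for _ in range(n)]                  # measured without error
x1_star = [0.6 * b + random.gauss(0, 1) for b in x2]         # true regressor
x1 = [a + random.gauss(0, var_u ** 0.5) for a in x1_star]    # observed with error u
y = [beta1 * a + beta2 * b + random.gauss(0, 1) for a, b in zip(x1_star, x2)]

# Least squares of y on (x1, x2); all variables have mean zero, so no constant.
s11, s12, s22 = dot(x1, x1), dot(x1, x2), dot(x2, x2)
s1y, s2y = dot(x1, y), dot(x2, y)
det = s11 * s22 - s12 ** 2
b1 = (s22 * s1y - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det

# Formulas above, using the population covariance matrix of (x1*, x2).
c11, c12, c22 = 1.36, 0.6, 1.0
cdet = c11 * c22 - c12 ** 2
sig11, sig12 = c22 / cdet, -c12 / cdet                       # sigma^{11}, sigma^{12}
plim_b1 = beta1 / (1 + var_u * sig11)
plim_b2 = beta2 - beta1 * var_u * sig12 / (1 + var_u * sig11)
print(f"b1 = {b1:.3f} (plim {plim_b1:.3f}), b2 = {b2:.3f} (plim {plim_b2:.3f})")
```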
Twins
Application from the literature:
Ashenfelter/Krueger: A wage
equation for twins that includes
“schooling.”
Orthodoxy
A proxy is not an instrumental variable.
Instrument is a noun, not a verb.
Are you sure that the instrument is really
exogenous? The “natural experiment.”
The First IV Study
(Snow, J., On the Mode of Communication of Cholera, 1855)
London Cholera epidemic, ca 1853-4
Cholera = f(Water Purity, u) + ε.
Effect of water purity on cholera?
Purity = f(cholera-prone environment (poor, garbage in
streets, rodents, etc.)). Regression does not work.
Two London water companies
Lambeth
Southwark
(Diagram: the two water companies and the main sewage discharge.)
Paul Grootendorst: A Review of Instrumental Variables Estimation of Treatment Effects…
http://individual.utoronto.ca/grootendorst/pdf/IV_Paper_Sept6_2007.pdf
IV Estimation
Cholera = f(Purity, u) + ε
Z = water company
Cov(Cholera, Z) = δCov(Purity, Z)
Z is randomly mixed in the population (two full
sets of pipes) and uncorrelated with behavioral
unobservables, u.
Cholera = α + δPurity + u + ε
Purity = Mean + random variation + λu
Since Cov(u + ε, Z) = 0, δ = Cov(Cholera, Z)/Cov(Purity, Z).
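With a binary instrument, the IV estimator δ = Cov(Cholera, Z)/Cov(Purity, Z) is the Wald (grouping) estimator: the difference in mean cholera between the two companies' customers divided by the difference in mean purity. The Python sketch below uses invented numbers (continuous "purity" and "cholera" scores, δ = -2, λ = -1 so that a cholera-prone environment lowers purity); it illustrates the algebra and is not a reconstruction of Snow's data.

```python
import random
from statistics import fmean

random.seed(1854)
n = 20_000
alpha, delta, lam = 10.0, -2.0, -1.0   # hypothetical parameters
rows = []
for _ in range(n):
    z = 1 if random.random() < 0.5 else 0       # which company supplies the house
    u = random.gauss(0, 1)                      # unobserved cholera-prone environment
    purity = 5 + 1.5 * z + random.gauss(0, 1) + lam * u
    cholera = alpha + delta * purity + u + random.gauss(0, 1)
    rows.append((z, purity, cholera))

def group_mean(k, zval):
    """Mean of column k (1 = purity, 2 = cholera) among rows with Z == zval."""
    return fmean([r[k] for r in rows if r[0] == zval])

# Wald (grouping) estimator: ratio of differences in group means.
wald = (group_mean(2, 1) - group_mean(2, 0)) / (group_mean(1, 1) - group_mean(1, 0))

# OLS slope of cholera on purity, for comparison.
pb, cb = fmean(r[1] for r in rows), fmean(r[2] for r in rows)
b_ols = (sum((r[1] - pb) * (r[2] - cb) for r in rows)
         / sum((r[1] - pb) ** 2 for r in rows))
print(f"Wald/IV {wald:.3f}   OLS {b_ols:.3f}   true delta {delta}")
```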
Autism: Natural Experiment
Autism ----- Television watching
Which way does the causation go?
We need an instrument: Rainfall
Rainfall affects staying indoors, which influences TV
watching.
Rainfall is definitely absolutely truly exogenous, so it
is a perfect instrument.
The correlation survives, so TV “causes” autism.
Two Problems with 2SLS
Z'X/n may not be sufficiently large. The
covariance matrix for the IV estimator is
$\text{Asy.Cov}(b) = \sigma^2[(X'Z)(Z'Z)^{-1}(Z'X)]^{-1}$
If $Z'X/n \to 0$, the variance explodes.
Additional problems:
2SLS is biased toward plim OLS.
Asymptotic results for inference fall apart.
When there are many instruments, $\hat X$ is too close
to X; 2SLS becomes OLS.
Weak Instruments
Symptom: The relevance condition, plim Z'X/n not zero, is close to
being violated.
Detection:
Standard F test in the regression of xk on Z. F < 10 suggests a
problem.
F statistic based on 2SLS: see text p. 351.
Remedy:
Not much: most of the discussion is about the condition, not
what to do about it.
Use LIML? Requires a normality assumption. Probably not too
restrictive.
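A sketch of the first-stage F check for one endogenous regressor and one instrument (Python, standard library; simulated data with assumed first-stage coefficients 1.0 and 0.02). With one instrument, the F statistic for H0: π = 0 is R²/[(1 - R²)/(n - 2)], and the rule of thumb above flags F < 10.

```python
import random
from statistics import fmean

def first_stage_f(pi, n=2_000, seed=0):
    """F statistic for H0: pi = 0 in a regression of x on a constant and one instrument z."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [pi * zi + rng.gauss(0, 1) for zi in z]   # hypothetical first stage
    zb, xb = fmean(z), fmean(x)
    szz = sum((a - zb) ** 2 for a in z)
    szx = sum((a - zb) * (b - xb) for a, b in zip(z, x))
    sxx = sum((b - xb) ** 2 for b in x)
    r2 = szx ** 2 / (szz * sxx)                   # first-stage R-squared
    return r2 / ((1 - r2) / (n - 2))              # 1 restriction, n - 2 residual df

f_strong = first_stage_f(pi=1.0)
f_weak = first_stage_f(pi=0.02)
print(f"first-stage F: strong {f_strong:.1f}, weak {f_weak:.2f}")
```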
A study of moral hazard
Riphahn, Wambach, Million: “Incentive Effects in the Demand
for Healthcare”
Journal of Applied Econometrics, 2003
Did the presence of the ADDON insurance influence the
demand for health care – doctor visits and hospital visits?
For a simple example, we examine the PUBLIC insurance
(89%) instead of ADDON insurance (2%).
Application: Health Care Panel Data
German Health Care Usage Data, 7,293 Individuals, Varying Numbers of Periods
Data downloaded from Journal of Applied Econometrics Archive. This is an unbalanced panel with 7,293
individuals. They can be used for regression, count models, binary choice, ordered choice, and bivariate
binary choice. This is a large data set. There are altogether 27,326 observations. The number of
observations ranges from 1 to 7. (Frequencies are: 1=1525, 2=2158, 3=825, 4=926, 5=1051,
6=1000, 7=987). Note, the variable NUMOBS below tells how many observations there are for each
person. This variable is repeated in each row of the data for the person. (Downloaded from the JAE
Archive)
Variables in the file are
DOCTOR = 1(Number of doctor visits > 0)
HOSPITAL = 1(Number of hospital visits > 0)
HSAT = health satisfaction, coded 0 (low) - 10 (high)
DOCVIS = number of doctor visits in last three months
HOSPVIS = number of hospital visits in last calendar year
PUBLIC = insured in public health insurance = 1; otherwise = 0
ADDON = insured by add-on insurance = 1; otherwise = 0
HHNINC = household nominal monthly net income in German marks / 10000
(4 observations with income=0 were dropped)
HHKIDS = children under age 16 in the household = 1; otherwise = 0
EDUC = years of schooling
AGE = age in years
MARRIED = marital status
Evidence of Moral Hazard?
Regression Study
Endogenous Dummy Variable
Doctor Visits = f(Age, Educ, Health, Presence of Insurance, Other unobservables)
Insurance = f(Expected Doctor Visits, Other unobservables)
Approaches
(Parametric) Control Function: Build a structural
model for the two variables (Heckman)
(Semiparametric) Instrumental Variable: Create
an instrumental variable for the dummy variable
(Barnow/Cain/Goldberger, Angrist, current
generation of researchers)
(?) Propensity Score Matching (Heckman et al.,
Becker/Ichino, many recent researchers)
Heckman’s Control Function Approach
$Y = x\beta + \delta T + E[\varepsilon|T] + \{\varepsilon - E[\varepsilon|T]\}$
$\lambda = E[\varepsilon|T]$, computed from a model for whether T = 0 or 1
Magnitude = 11.1200 is nonsensical in this context.
Instrumental Variable Approach
Construct a prediction for T using only the exogenous information.
Use 2SLS with this instrumental variable.
Magnitude = 23.9012 is also nonsensical in this context.
Propensity Score Matching
Create a model for T that produces probabilities for T=1: “Propensity Scores”
Find people with the same propensity score, some with T=1, some with T=0.
Compare number of doctor visits of those with T=1 to those with T=0.
Treatment Effect
Earnings and Education: Effect of an additional
year of schooling
Estimating Average and Local Average
Treatment Effects of Education when
Compulsory Schooling Laws Really Matter
Philip Oreopoulos
AER, 96(1), 2006, pp. 152-175
Treatment Effects and Natural Experiments