Differences-in-Differences and A (Very) Brief Introduction


Instrumental Variables: Problems
Methods of Economic Investigation
Lecture 16
Last Time

IV



Monotonic
Exclusion Restriction
Can we test our exclusion restriction?


Overidentification test
Separate Regression Tests
Today’s Class

Issues with Instrumental Variables

Heterogeneous Treatment Effects



LATE framework
interpretation
Weak Instruments



Bias in 2SLS
Asymptotic properties
Problems when the first stage is not very big
Heterogeneous Treatment Effects

Recall our counterfactual worlds



Each individual has two potential outcomes, Y0 and Y1
We only observe one of these for any given individual
Define a counterfactual S now:



S1 is the value of S if Z = 1
S0 is the value of S if Z = 0
We only ever observe one of these for any given
individual
The Counterfactual
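[Figure: tree for individual U. The Z=1 branch leads to S1U=1 or S1U=0; the Z=0 branch leads to S0U=1 or S0U=0. Each of the four end nodes carries the potential outcomes Y1U and Y0U.]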
ATE is E[Y1U – Y0U]: with homogeneous effects it doesn't matter which node we use.
With heterogeneous effects, it's the average across the nodes
Observed difference is E[Y | S=1] – E[Y | S=0] (light blue vs dark blue)
ITT is E[Y | Z=1] – E[Y | Z=0] (red vs. orange)
LATE is E[Y | S1U=1, Z=1] – E[Y | S0U=1, Z=0]
Writing the first stage as counterfactuals

We can now define our variable of interest S as follows:
Si = S0i + (S1i – S0i)Zi = π0 + π1i Zi + νi
In this specification:



π0 = E[S0i]
π1i = S1i – S0i
E[π1i] = E[S1i – S0i]: The average effect of Z on
S—this is just our ATE for the first stage
regression
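As a rough illustration of the decomposition above, the following Python sketch simulates potential treatments (S0i, S1i) for three hypothetical groups (compliers, always-takers, never-takers) and checks that the difference in mean S between Z = 1 and Z = 0 is close to the average of π1i = S1i – S0i. The group shares, sample size, seed and function name are assumptions made for this example, not part of the lecture.

```python
import random


def simulate_first_stage(n=20000, p_complier=0.5, p_always=0.2, seed=0):
    """Return (first-stage difference in means, sample mean of S1 - S0)."""
    rng = random.Random(seed)
    s_when_z1, s_when_z0, pi1 = [], [], []
    for _ in range(n):
        u = rng.random()
        if u < p_complier:
            s0, s1 = 0, 1          # complier: takes treatment only if Z = 1
        elif u < p_complier + p_always:
            s0, s1 = 1, 1          # always-taker
        else:
            s0, s1 = 0, 0          # never-taker
        z = rng.random() < 0.5     # instrument assigned at random
        if z:
            s_when_z1.append(s1)   # only S1 is observed when Z = 1
        else:
            s_when_z0.append(s0)   # only S0 is observed when Z = 0
        pi1.append(s1 - s0)        # individual first-stage effect
    first_stage = (sum(s_when_z1) / len(s_when_z1)
                   - sum(s_when_z0) / len(s_when_z0))
    return first_stage, sum(pi1) / n


first_stage, mean_pi1 = simulate_first_stage()
print(f"E[S|Z=1] - E[S|Z=0] = {first_stage:.3f}, E[S1 - S0] = {mean_pi1:.3f}")
```

With no defiers in the simulated population, both numbers should sit near the assumed complier share of 0.5.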
Exclusion Restriction

The instrument operates only through the
channel of the variable of interest

With homogeneous effects we describe this as
E[ηZ]=0

For any value of S (i.e. S = 0, 1)
Y(S, 0) = Y(S, 1)

Another way to think of this is that the
exclusion restriction says we only want to
look at the part of S that is varying with Z
The set of potential outcomes

Use the exclusion restriction to define
potential outcomes with Y(S,Z)


Y1i = Y(1,1) = Y(1,0) = Y(S=1)
Y0i = Y(0,1) = Y(0,0) = Y(S=0)
Rewrite the potential outcome as:
Yi = Yi(0,Zi) + [Yi(1,Zi) – Yi(0,Zi)]Si
= Y0i + (Y1i – Y0i)Si
= α0 + ρiSi + ηi

S is the unique channel through which the instrument operates
Monotonicity

For the set of individuals affected by the instrument, the instrument must move S in the same direction

It can have no effect on some people (e.g. always takers, never takers)
For those it has an effect on (e.g. compliers) it must be that π1i > 0 for all i, or π1i < 0 for all i,
where Si = π0 + π1i Zi + νi
In terms of the counterfactual, it must be the case that S1i ≥ S0i for all i (or S1i ≤ S0i for all i)
Back to LATE

Given these assumptions
$$\frac{E[Y \mid Z=1] - E[Y \mid Z=0]}{E[S \mid Z=1] - E[S \mid Z=0]} = E[Y_{1i} - Y_{0i} \mid S_{1i} > S_{0i}] = E[\rho_i \mid \pi_{1i} > 0]$$

To see why note the following:



E[Yi | Zi=1] = E[Y0i + (Y1i – Y0i)Si | Zi=1]
= E[Y0i + (Y1i – Y0i)S1i ]
E[Yi | Zi=0] = E[Y0i + (Y1i – Y0i)S0i ]
E[S |Zi =1] – E[S |Zi =0] = E[S1i – S0i] = Pr[S1i>S0i]
LATE continued

Substituting these equalities in to our
formula we get:
$$\frac{E[Y_i \mid Z_i=1] - E[Y_i \mid Z_i=0]}{E[S_i \mid Z_i=1] - E[S_i \mid Z_i=0]} = \frac{E[(Y_{1i} - Y_{0i})(S_{1i} - S_{0i})]}{\Pr(S_{1i} > S_{0i})} = \frac{E[Y_{1i} - Y_{0i} \mid S_{1i} > S_{0i}]\,\Pr(S_{1i} > S_{0i})}{\Pr(S_{1i} > S_{0i})}$$

We are left with our LATE estimate
$$E[Y_{1i} - Y_{0i} \mid S_{1i} > S_{0i}]$$
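The result can be checked numerically. The sketch below, in Python, builds a hypothetical population with heterogeneous effects (compliers gain 2.0, always-takers 5.0, never-takers 0.5), assigns Z at random, and compares the Wald ratio with the population ATE. All shares, effect sizes, the seed and the function name are assumptions chosen for illustration.

```python
import random


def wald_vs_ate(n=100000, seed=1):
    """Return (Wald/IV estimate, sample ATE) under heterogeneous effects."""
    rng = random.Random(seed)
    # Each type: (S0, S1, individual effect Y1 - Y0). No defiers, so
    # monotonicity holds.
    types = [(0, 1, 2.0),   # compliers
             (1, 1, 5.0),   # always-takers
             (0, 0, 0.5)]   # never-takers
    shares = [0.4, 0.3, 0.3]
    sum_y = [0.0, 0.0]      # indexed by Z
    sum_s = [0.0, 0.0]
    count = [0, 0]
    total_effect = 0.0
    for _ in range(n):
        s0, s1, effect = rng.choices(types, weights=shares)[0]
        y0 = rng.gauss(0.0, 1.0)
        y1 = y0 + effect
        z = 1 if rng.random() < 0.5 else 0   # independent of (Y0, Y1, S0, S1)
        s = s1 if z == 1 else s0
        y = y1 if s == 1 else y0             # exclusion: Z affects Y only via S
        sum_y[z] += y
        sum_s[z] += s
        count[z] += 1
        total_effect += effect
    reduced_form = sum_y[1] / count[1] - sum_y[0] / count[0]
    first_stage = sum_s[1] / count[1] - sum_s[0] / count[0]
    return reduced_form / first_stage, total_effect / n


wald, ate = wald_vs_ate()
print(f"Wald estimate = {wald:.2f}, ATE = {ate:.2f}")
```

Under these assumptions the Wald estimate should land near the complier effect of 2.0, while the ATE is near 0.4(2.0) + 0.3(5.0) + 0.3(0.5) = 2.45.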
How to Interpret the LATE

Remember we thought of the LATE as
useful because




Y(S=1, Z=0) = Y(S=1, Z=1)= Y(S=1)
In the case of heterogeneous effects this is not
true
The LATE will not be the same as the ATE
Our estimate is “local” to the set of people our instrument affects (the compliers)

Is this a group we care about on its own?
Is there a theory on how this group’s effect size might relate to other groups’ effects?
Finite Sample Problems

This is a very complicated topic


Exact results for special cases, approximations for
more general cases
Hard to say anything that is definitely true but can
give useful guidance

With sufficiently strong instruments in a
sufficiently large finite sample—you’re fine

Weak Instruments generate 3 problems:



Bias
Incorrect measurement of variance
Non-normal distribution
Some Intuition for why Strength of
Instruments is Important

Consider very strong instrument



Z can explain a lot of variation in s
s very close to s-hat
Think of limiting case where correlation perfect – then s-hat=s
IV estimator identical to OLS estimator
Will have same distribution
If errors normal then this is same as asymptotic distribution
What if we have weak instrument…

Think of extreme case where the true correlation between s and Z is zero, so Z is useless

First stage tries to find some correlation, so the estimated coefficients will not normally be zero and there will be some variation in s-hat
No reason to believe s-hat contains more ‘good’ variation than s itself
So central tendency is OLS estimate

But a lot more noise – so very big variance
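A small Monte Carlo sketch in Python can make this intuition concrete. It assumes a single endogenous regressor x, an instrument z that is truly unrelated to x, and errors built so that OLS converges to β + 0.8. The parameter values, seed and function names are illustrative assumptions.

```python
import random
import statistics


def ols_and_iv_with_useless_instrument(n=100, reps=2000, beta=1.0, seed=2):
    """Simulate OLS and IV slope estimates when z is unrelated to x."""
    rng = random.Random(seed)
    ols, iv = [], []
    for _ in range(reps):
        szx = szy = sxx = sxy = 0.0
        for _ in range(n):
            z = rng.gauss(0.0, 1.0)              # instrument, irrelevant for x
            u = rng.gauss(0.0, 1.0)
            eps = 0.8 * u + rng.gauss(0.0, 0.6)  # correlated with u: endogeneity
            x = u                                # first-stage coefficient is 0
            y = beta * x + eps
            szx += z * x
            szy += z * y
            sxx += x * x
            sxy += x * y
        ols.append(sxy / sxx)   # OLS without intercept (variables are mean zero)
        iv.append(szy / szx)    # IV without intercept
    return ols, iv


def iqr(values):
    """Interquartile range, used because the IV draws have very fat tails."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


ols, iv = ols_and_iv_with_useless_instrument()
print("median OLS:", round(statistics.median(ols), 2), "IQR:", round(iqr(ols), 2))
print("median IV: ", round(statistics.median(iv), 2), "IQR:", round(iqr(iv), 2))
```

In this setup the two medians should be close to each other (both near 1.8 rather than the true β = 1), while the IV estimates are far more dispersed.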
A Simple Example
One endogenous variable, no exogenous variables, one instrument
All variables known to be mean zero so estimate equations without intercepts

$$y_i = \beta x_i + \varepsilon_i$$
$$x_i = \pi z_i + u_i$$
$$\hat\beta^{IV} = \frac{\sum_i z_i y_i}{\sum_i z_i x_i}$$
Finite Sample Problems 1 and 2

Substituting yi = βxi + εi into the IV estimator gives β-hat(IV) = β + Σziεi / Σzixi.
To assess bias we want the expectation of the final term; we would like it to be zero.

Problem – mean does not exist




‘Fat tails’, i.e. sizeable probability of getting a very large outcome
This happens when Σzixi is close to zero, which is more likely when instruments are weak
Similar issue for variance estimation
Finite Sample Problem 3

zi non-stochastic

(εi, ui) have joint normal distribution with mean zero, variances σ²ε, σ²u, and covariance σ²εu

If σ²εu = 0 then no endogeneity problem and OLS estimator consistent

If σ²εu ≠ 0 then endogeneity problem and OLS estimator is inconsistent
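IV Estimator for this special case
$$\hat\beta^{IV} = \frac{\sum_i z_i\left(\beta(\pi z_i + u_i) + \varepsilon_i\right)}{\sum_i z_i(\pi z_i + u_i)} = \beta + \frac{\frac{1}{n}\sum_i z_i \varepsilon_i}{\pi \frac{1}{n}\sum_i z_i^2 + \frac{1}{n}\sum_i z_i u_i}$$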
Both numerator and denominator of final term are
linear combinations of normal random variables so
are also normally distributed
So deviation of IV estimator from β is ratio of two
(correlated) normal random variables
Sounds simple but isn’t
A Very Special Case: π = σ²εu = 0
x exogenous and z useless (basically, OLS would be okay, but maybe you don’t know this)
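In this case, consider the numerator and denominator in:
$$\hat\beta^{IV} = \beta + \frac{\frac{1}{n}\sum_i z_i \varepsilon_i}{\pi \frac{1}{n}\sum_i z_i^2 + \frac{1}{n}\sum_i z_i u_i}$$
• ε and u are independent with mean zero, so with π = 0 the numerator and denominator are independent mean-zero normal variables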
• The IV estimator has a Cauchy distribution –
this has no mean (or other moments)
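The Cauchy result can be illustrated with a short Python sketch. It assumes π = 0, independent standard normal ε and u, and a z vector drawn once and then held fixed, so the deviation of the IV estimator from β is Σziεi / Σziui. For a standard Cauchy variable, half of the draws exceed 1 in absolute value and roughly 6% exceed 10; the sample sizes, seed and names below are assumptions for the example.

```python
import random
import statistics


def iv_deviations(n=50, reps=5000, seed=3):
    """Draw beta_hat_IV - beta when pi = 0 and cov(eps, u) = 0."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]   # drawn once, then held fixed
    deviations = []
    for _ in range(reps):
        num = sum(zi * rng.gauss(0.0, 1.0) for zi in z)  # sum z_i * eps_i
        den = sum(zi * rng.gauss(0.0, 1.0) for zi in z)  # sum z_i * u_i (pi = 0)
        deviations.append(num / den)
    return deviations


dev = iv_deviations()
share_beyond_1 = sum(abs(d) > 1 for d in dev) / len(dev)
share_beyond_10 = sum(abs(d) > 10 for d in dev) / len(dev)
print("median deviation:", round(statistics.median(dev), 3))
print("share with |deviation| > 1: ", round(share_beyond_1, 3))
print("share with |deviation| > 10:", round(share_beyond_10, 3))
```

The median is a sensible summary here because it stays near zero across runs, whereas the sample mean of these draws does not settle down as the number of replications grows.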
Rules-of-Thumb
Mean of IV estimator exists if more than
two over-identifying restrictions
Where mean exists:

• Probably can use as measure of central tendency
of IV estimator where mean does not exist
• This is where rule-of-thumb on F-stat comes from
What to do - 1

Report the first stage and think about whether
it makes sense.



Are the magnitude and sign as you would expect,
or are the estimates too big, or large but wrong-signed?
Report the F-statistic on the instruments.


The bigger this is, the better.
General suggestion: F-statistics above about
10 put you in the safe zone
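For a single instrument, the first-stage F-statistic is just the squared t-statistic on the instrument. The Python sketch below computes it by hand under the classical homoskedastic variance formula. The function name and the four-observation data set are made up for illustration, and applied work would normally rely on a regression package.

```python
def first_stage_f(z, x):
    """F-statistic for the slope in an OLS regression of x on z with intercept.

    With a single instrument this equals the squared t-statistic and uses
    the classical (homoskedastic) variance formula.
    """
    n = len(z)
    z_bar = sum(z) / n
    x_bar = sum(x) / n
    szz = sum((zi - z_bar) ** 2 for zi in z)
    szx = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x))
    slope = szx / szz
    intercept = x_bar - slope * z_bar
    rss = sum((xi - intercept - slope * zi) ** 2 for zi, xi in zip(z, x))
    sigma2 = rss / (n - 2)            # residual variance, 2 estimated parameters
    return slope ** 2 / (sigma2 / szz)


# Tiny made-up data set, purely for illustration.
z = [1, 2, 3, 4]
x = [1, 3, 2, 4]
f_stat = first_stage_f(z, x)
print(round(f_stat, 2))  # 3.56, well below the rule-of-thumb value of 10
```

Here the slope is 0.8 with estimated variance 0.18, giving F = 0.64 / 0.18 ≈ 3.56, which the rule of thumb would flag as a weak first stage.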
What to do – 2

Pick your best single instrument and report
just-identified estimates using this one only.


Just-identified IV is median-unbiased and therefore
unlikely to be subject to a weak-instruments
critique.
Look at the coefficients, t-statistics, and F-statistics for excluded instruments in the
reduced-form regression of dependent
variables on instruments.


Remember that the reduced form is proportional to
the causal effect of interest.
Most importantly, the reduced-form estimates, since
they are OLS, are unbiased.
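The proportionality between the reduced form and the causal effect can be seen in a minimal Python sketch: with one instrument, the IV estimate is the reduced-form slope divided by the first-stage slope. The helper name and the data are hypothetical, constructed so that y is exactly 2x.

```python
def slope(z, v):
    """OLS slope from regressing v on z (with an intercept)."""
    n = len(z)
    z_bar = sum(z) / n
    v_bar = sum(v) / n
    szv = sum((zi - z_bar) * (vi - v_bar) for zi, vi in zip(z, v))
    szz = sum((zi - z_bar) ** 2 for zi in z)
    return szv / szz


# Made-up data in which y is exactly 2 * x, so the causal effect is 2.
z = [1, 2, 3, 4]
x = [1, 3, 2, 4]
y = [2 * xi for xi in x]

first_stage = slope(z, x)     # effect of z on x
reduced_form = slope(z, y)    # effect of z on y
iv_estimate = reduced_form / first_stage
print(first_stage, reduced_form, iv_estimate)
```

If the reduced-form slope is indistinguishable from zero, the ratio gives little reason to believe in a non-zero causal effect, which is why the reduced form is worth reporting on its own.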
Next time

Maximum Likelihood Estimation

Two uses:


LIML as an alternative to 2SLS
Discrete Choice Models (logit, probit, etc.)