Instrumental Variables


Causal Inference: experimental and
quasi-experimental methods
Draft
©G. Mason
(2005)
Scientific truth always goes through three stages. First, people say it
conflicts with the Bible; next they say it has been discovered before;
and lastly they say that they always believed it
Louis Agassiz, Swiss naturalist
We do not know a truth without knowing its cause
Aristotle, Nicomachean Ethics
Development of Western science is based on two great achievements:
the invention of the formal logical system (Euclidean geometry) by the
Greek philosophers, and the discovery of the possibility to find out
causal relationships by systematic experiment (during the
Renaissance)
Albert Einstein
Models of cause and effect 1
• Fishbone
[Figure: fishbone diagram in which causes A, B and C lead to the Effect]
Models of cause and effect 2
• Path Diagram (Regression sentence)
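[Figure: path diagram in which X1, X2, ..., XK each point to Y with coefficients B1, B2, ..., BK, and an Error term also points to Y]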
Concept of causality
• Causality often implies inevitability, but the
reality is that causal statements usually reflect
degrees of uncertainty.
• Causality and probability are fundamentally
connected because we want to:
– Know the causes of an event
– Measure the relative strength of these causes
Randomized experiments
• The classic experiment is the random, double-blind experiment (RDE):
– subjects are selected randomly into a treatment
and control group
– each subject receives a code
– an independent third party assigns codes
randomly to treatment and control group
members
– the treatment is not identifiable (i.e., the real and
fake pills are identical)
– those administering the treatment and placebo
do not know which one each subject receives
Key benefits of the RDE
• randomization creates statistically equivalent
groups
• in the absence of any intervention (the drug
under test), the incidence of disease is the
same for both groups
• the groups are the “same” (statistically),
except that one gets the drug and the other a
placebo
• analysis can be done by difference of means
tests or other basic techniques.
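The difference-of-means analysis mentioned above can be sketched as follows. This is a minimal illustration on simulated data: the population, the sample size and the value of `true_effect` are assumptions made for the example, not figures from the source.

```python
import random
import statistics

random.seed(0)

# Hypothetical population: each subject has a baseline outcome.
n = 2000
baseline = [random.gauss(50.0, 10.0) for _ in range(n)]

# Random assignment: shuffle subject indices and split them in half.
ids = list(range(n))
random.shuffle(ids)
treated_ids = set(ids[: n // 2])

true_effect = 5.0  # assumed treatment effect, chosen for illustration
outcome = [
    b + (true_effect if i in treated_ids else 0.0)
    for i, b in enumerate(baseline)
]

treated = [outcome[i] for i in range(n) if i in treated_ids]
control = [outcome[i] for i in range(n) if i not in treated_ids]

# Difference-of-means estimate and its standard error (Welch form).
diff = statistics.mean(treated) - statistics.mean(control)
se = (statistics.variance(treated) / len(treated)
      + statistics.variance(control) / len(control)) ** 0.5
t_stat = diff / se
print(f"difference of means = {diff:.2f}, t = {t_stat:.1f}")
```

Because assignment is random, the two groups differ only by chance before treatment, so the simple difference of means should land close to the assumed effect.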
Limits of RDE
• In social science, randomized double-blind
experiments are often not feasible:
– human subjects are unreliable (they move,
die or otherwise fail to participate in the full
experiment).
– many see the administration of a placebo
as withholding a treatment.
– social policy cannot be masked (creating a
placebo is difficult).
Quasi-experimental designs
• Most policy testing in social sciences uses a
quasi-experimental design.
• Two approaches exist
– Multivariate (regression) models specify the outcome as the
dependent variable and include dummy
variables to identify those in the program. Other
covariates are included to control for the
interventions.
– Matching: program participants and non-participants serve as the basis for the treatment
and control groups.
Four potential models for evaluating policy
1. Randomized control (RC)
2. Natural experiments (difference-in-differences, discontinuity)
3. Quasi-experimental methods
– Heckman two step
– Statistical matching
4. Instrumental variables (treated separately)
Randomized control
Attempts to create a situation where
Cov(X’, ε) = 0, or
E(T’, ε) = E(T”, W), where ε is the error term and W are the
omitted variables that determine selection
into treatment.
Natural experiments
• Create a “split” in the sample, where treated
and untreated are classified by a variable that
is not related to the treatment.
• This split occurs “naturally” where the program
change occurs in one area/jurisdiction but not in
others that are “closely similar.”
• Difference-in-differences (DID) methods are a
common evaluation framework.
Difference in Differences
• The DID estimator uses the average before and after values for
an outcome variable for the program and comparison group.
DID = [Yp(t=a) – Yp(t=b)] – [Yc(t=a) – Yc(t=b)], where a = after and b = before
• Example:
– Avg. earned income before, program group = $4,500
– Avg. earned income after, program group = $6,500
– Avg. earned income before, comparison group = $10,500
– Avg. earned income after, comparison group = $11,000
DID = [6,500 – 4,500] – [11,000 – 10,500] = $1,500
= net impact attributable to the program (treatment)
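The arithmetic of the example can be checked with a few lines of Python. The function name `did` and its argument names are illustrative choices, not part of the source.

```python
# Average earned income from the example above (dollars).
program_before, program_after = 4500, 6500
comparison_before, comparison_after = 10500, 11000


def did(yp_after, yp_before, yc_after, yc_before):
    """Change in the program group minus change in the comparison group."""
    return (yp_after - yp_before) - (yc_after - yc_before)


net_impact = did(program_after, program_before,
                 comparison_after, comparison_before)
print(net_impact)  # 1500
```

The program group gained $2,000 and the comparison group gained $500, so $1,500 of the program group's gain is attributed to the program.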
Net impact using DID
[Figure: outcome Y (income, hours, etc.) against time, from t=b to t=a, for the program group (Yp) and the comparison group (Yc). Points a to e are marked on the lines, with bc = de.]
Causal Inference – comparison in
regression
Problem: Estimate effect of treatment (T) on
observed outcome (Y), or estimate B in
Yi = B0 + B1Ti + εi = XiB + εi (where Xi = [1, Ti] and B = [B0, B1]’)
Assume
– dichotomous treatment variable: T=1 if treated, 0 otherwise
– homogeneous treatment effect (B) (every i experiences
same effect) “average treatment effect”
– Linear (no dose)
– no covariates mediate the outcome
The simple comparison group model
T = 1 (treatment), = 0 otherwise
Bhat = Ybar(T=1) – Ybar(T=0)
[Figure: Yi for the T = 0 and T = 1 groups, with B marked]
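With a single 0/1 treatment dummy and no other covariates, the OLS slope equals the difference in group means. The sketch below checks this on a small made-up data set; the outcome values are hypothetical.

```python
import statistics

# Hypothetical outcomes (e.g. weekly hours) and a 0/1 treatment dummy.
y = [30.0, 34.0, 29.0, 41.0, 38.0, 44.0, 40.0, 33.0]
t = [0, 0, 0, 1, 1, 1, 1, 0]

# Difference in group means: Ybar(T=1) - Ybar(T=0).
ybar1 = statistics.mean([yi for yi, ti in zip(y, t) if ti == 1])
ybar0 = statistics.mean([yi for yi, ti in zip(y, t) if ti == 0])
b_hat_means = ybar1 - ybar0

# OLS slope for Y = B0 + B1*T + e: sum of cross products over sum of squares.
t_bar, y_bar = statistics.mean(t), statistics.mean(y)
cov_ty = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
var_t = sum((ti - t_bar) ** 2 for ti in t)
b1_ols = cov_ty / var_t
b0_ols = y_bar - b1_ols * t_bar

print(b_hat_means, b1_ols, b0_ols)
```

The intercept is the mean outcome of the untreated group, and the slope is the gap between the two group means.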
No covariates is the key assumption
OLS assumption: E(X’ε) = 0, with sample analogue
X’(y – X BOLS) = 0,
which yields the OLS estimator
BOLS = (X’X)^(–1) X’y
But, with omitted variables, the validity of OLS
requires the omitted variables to be
uncorrelated with T (the treatment). This is the
essence of the self-selection problem.
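The self-selection problem can be illustrated with a small simulation. Everything here is assumed for the example: the omitted variable `w` (think of it as motivation), the effect size `true_b`, and the selection rule. The sketch compares the simple comparison-group estimate under random assignment and under self-selection.

```python
import random
import statistics

random.seed(1)
n = 5000
true_b = 2.0  # assumed treatment effect, for illustration only


def diff_in_means(y, t):
    y1 = [yi for yi, ti in zip(y, t) if ti == 1]
    y0 = [yi for yi, ti in zip(y, t) if ti == 0]
    return statistics.mean(y1) - statistics.mean(y0)


# w is an omitted variable that raises the outcome.
w = [random.gauss(0.0, 1.0) for _ in range(n)]
noise = [random.gauss(0.0, 1.0) for _ in range(n)]

# Case 1: random assignment, so T is unrelated to w.
t_random = [random.randint(0, 1) for _ in range(n)]
# Case 2: self-selection, so people with high w are more likely to enrol.
t_selected = [1 if wi + random.gauss(0.0, 1.0) > 0 else 0 for wi in w]


def outcome(t):
    return [true_b * ti + 3.0 * wi + ei for ti, wi, ei in zip(t, w, noise)]


b_random = diff_in_means(outcome(t_random), t_random)
b_selected = diff_in_means(outcome(t_selected), t_selected)
print(f"random assignment: {b_random:.2f}, self-selection: {b_selected:.2f}")
```

Under random assignment the estimate stays near the assumed effect. Under self-selection the estimate also picks up the effect of the omitted variable, because that variable is correlated with T.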
Selection and attrition
• Random selection into program
• Non-random selection into program (self-selection)
– participant choice
– program manager choice (creaming)
• Non-random selection out of program (attrition)
– participant choice
– program manager choice (creaming)
• Random samples may be upset by self-selection, attrition, or both
Heckman two step procedure (basic)
The original motivation for this procedure was to correct evaluation
studies for sample distortion caused by self-selection.
Two steps:
1. Estimate the probability of participation for each participant
and non-participant:
Yi = B0 + B1Xi1 + B2Xi2 + … + BkXik + ei (Y = 1 for a participant,
0 for a non-participant).
2. For each participant and non-participant a unique probability of
participation is estimated. Call this λi. This is then
inserted into the outcome regression
Wi = B0 + B1Xi1 + B2Xi2 + … + BkXik + Bl λi + ei, where Wi is the
outcome for person i (wages, hours worked, etc.)
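The two steps as described on this slide can be sketched in plain Python. Several things are assumptions of the sketch rather than the author's method: the data are simulated, a logit is used for step 1, the hypothetical variable `z` enters only the participation equation, and the fitted probability itself is used as λ. The textbook Heckman procedure instead fits a probit and inserts the inverse Mills ratio.

```python
import math
import random

random.seed(2)


def fit_logit(rows, y, steps=500, lr=0.5):
    """Fit P(y=1) = 1/(1+exp(-b.x)) by plain gradient ascent."""
    b = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(steps):
        grad = [0.0] * len(b)
        for r, yi in zip(rows, y):
            p = 1.0 / (1.0 + math.exp(-sum(bj * rj for bj, rj in zip(b, r))))
            for j, rj in enumerate(r):
                grad[j] += (yi - p) * rj
        b = [bj + lr * gj / n for bj, gj in zip(b, grad)]
    return b


def ols(rows, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda row: abs(a[row][c]))
        a[c], a[piv] = a[piv], a[c]
        for row in range(k):
            if row != c:
                f = a[row][c] / a[c][c]
                a[row] = [u - f * v for u, v in zip(a[row], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Simulated data: x affects both participation and outcome, z only participation.
n = 400
x = [random.gauss(0.0, 1.0) for _ in range(n)]
z = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [1 if 0.8 * xi + zi + random.gauss(0.0, 1.0) > 0 else 0
     for xi, zi in zip(x, z)]
w = [10.0 + 2.0 * xi + random.gauss(0.0, 1.0) for xi in x]

# Step 1: probability of participation for every person.
b_part = fit_logit([[1.0, xi, zi] for xi, zi in zip(x, z)], y)
lam = [1.0 / (1.0 + math.exp(-(b_part[0] + b_part[1] * xi + b_part[2] * zi)))
       for xi, zi in zip(x, z)]

# Step 2: outcome regression with lambda as an extra regressor.
b_out = ols([[1.0, xi, li] for xi, li in zip(x, lam)], w)
print(b_part, b_out)
```

`b_out` holds the intercept, the coefficient on `x`, and the coefficient on λ, in that order.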
Matching
In social experiments, participants differ from non-participants because of:
– failure to hear of program
– constraints on participation or completion
– selection by staff
Matching participants and non-participants can be
accomplished via
– pair-wise
– statistical
Matching Process
[Figure: participants and non-participants pass through the matching process (pairwise or statistical) to form the program group and the comparison group]
Pair wise matching
• The theory will indicate those attributes that
are likely to make a difference in the quasi-experiment.
– For labour markets, gender, education and rural-urban location are important
– For health policy, age, rural-urban location, and family
history might be important.
• The analyst starts with the first variable, and
divides the participants and non-participants
into two sets.
• Within the sets, the samples are classified
with respect to the second variable, and so
on.
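The classification described above can be sketched as grouping both samples into cells and pairing within each cell. The records, identifiers and the choice of gender and education as matching attributes are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (person_id, gender, education).
participants = [
    ("p1", "F", "College"), ("p2", "M", "High School or Less"),
    ("p3", "F", "Graduate"), ("p4", "F", "College"),
]
non_participants = [
    ("n1", "F", "College"), ("n2", "M", "High School or Less"),
    ("n3", "M", "College"), ("n4", "F", "College"),
    ("n5", "M", "High School or Less"),
]


def by_cell(records):
    """Classify records by gender, then by education within gender."""
    cells = defaultdict(list)
    for person_id, gender, education in records:
        cells[(gender, education)].append(person_id)
    return cells


program_cells = by_cell(participants)
comparison_cells = by_cell(non_participants)

# Pair participants with non-participants that fall in the same cell.
pairs, unmatched = [], []
for cell, ids in program_cells.items():
    pool = list(comparison_cells.get(cell, []))
    for pid in ids:
        if pool:
            pairs.append((pid, pool.pop(0)))
        else:
            unmatched.append(pid)

print(pairs, unmatched)
```

A participant with no non-participant in the same cell stays unmatched, which is one practical limit of matching on many attributes at once.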
Pair wise Matching
[Figure: program participants and comparison non-participants are each split by GENDER (men, women) and then by EDUCATION (high school or less, college, graduate)]
Statistical Matching
• Matching is needed because we cannot randomly allocate clients to the
program and comparison groups. Program benefits cannot be
withheld.
• A logit model provides the estimate of the propensity to participate for
participants and non-participants.
• The key idea is that the propensity to participate is estimated from
the observed attributes of the participants and non-participants.
• Participants are assigned a “Y” value of 1 and non-participants are
assigned a “Y” value of 0.
• A logistic regression then estimates the propensity to participate.
• Note that even though a non-participant did not actually participate, the
model will assign a score between 0 and 1. Typically non-participants
will have lower scores than participants, but there will be an overlap.
• The overlap is termed the region of common support.
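The scoring step and the region of common support can be sketched as follows. The data are simulated with a single observed attribute `x`, and the logit is fitted by simple gradient ascent. Both are assumptions of this sketch, and in practice a statistics package would normally do the fitting.

```python
import math
import random

random.seed(3)


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


# Hypothetical data: one observed attribute x; y = 1 for participants.
n = 300
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [1 if random.random() < sigmoid(-0.5 + 1.2 * xi) else 0 for xi in x]

# Fit P(y=1|x) = sigmoid(b0 + b1*x) by gradient ascent on the
# mean log-likelihood.
b0, b1 = 0.0, 0.0
for _ in range(1000):
    g0 = sum(yi - sigmoid(b0 + b1 * xi) for xi, yi in zip(x, y)) / n
    g1 = sum((yi - sigmoid(b0 + b1 * xi)) * xi for xi, yi in zip(x, y)) / n
    b0, b1 = b0 + 0.5 * g0, b1 + 0.5 * g1

# Every person, participant or not, gets a score strictly between 0 and 1.
score = [sigmoid(b0 + b1 * xi) for xi in x]
part = [s for s, yi in zip(score, y) if yi == 1]
nonpart = [s for s, yi in zip(score, y) if yi == 0]

# Region of common support: the range of scores found in both groups.
low = max(min(part), min(nonpart))
high = min(max(part), max(nonpart))
print(f"common support: [{low:.2f}, {high:.2f}]")
```

Participants and non-participants whose scores fall outside `[low, high]` have no counterpart in the other group.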
Rationale for statistical matching
• Matching is needed because we cannot randomly allocate EI clients to
the program and comparison groups. Part II benefits cannot be
withheld.
Statistical matching simplified
[Figure: comparison and program cases (each marked X) placed along the propensity to participate, from 0 to 1]
Each participant is matched to a "nearest neighbour" non-participant. Most non-participants are not matched to
participants and are discarded from the
sample survey and the analysis.
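Nearest-neighbour matching on the propensity score can be sketched as below. The scores and identifiers are hypothetical. Matching one-to-one without replacement, starting from the highest-scoring participant, is a design choice of this sketch rather than something stated in the source.

```python
# Hypothetical propensity scores keyed by person id.
program = {"p1": 0.62, "p2": 0.81, "p3": 0.45}
comparison = {"c1": 0.10, "c2": 0.44, "c3": 0.58, "c4": 0.15,
              "c5": 0.79, "c6": 0.30}

# Match each participant to the closest unused non-participant.
available = dict(comparison)
matches = {}
for pid, p_score in sorted(program.items(), key=lambda kv: -kv[1]):
    cid = min(available, key=lambda c: abs(available[c] - p_score))
    matches[pid] = cid
    del available[cid]

discarded = sorted(available)  # non-participants left without a match
print(matches, discarded)
```

Here three of the six non-participants are matched and the other three are dropped from the analysis.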
The logit model
Linear probability model (LPM): Pi = E(Y = 1 | Xi)
= B1 + B2X2 + B3X3 + ... + BkXk
Logit model: Pi = E(Y = 1 | Xi)
= 1/[1 + e^(–Zi)], where Zi = B1 + B2X2 + B3X3 + ... + BkXk
In log-odds format:
Li = ln[Pi/(1 – Pi)] = Zi
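The logit probability and its log-odds form are inverses of each other, which a short check makes concrete. The coefficients and attribute values below are made up for illustration.

```python
import math


def logit_probability(z):
    """P = 1 / (1 + e^(-Z))."""
    return 1.0 / (1.0 + math.exp(-z))


def log_odds(p):
    """L = ln(P / (1 - P))."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients and attributes: Z = B1 + B2*X2 + B3*X3.
b = [-1.0, 0.8, 0.05]
x = [1.0, 1.0, 12.0]  # the leading 1.0 multiplies the intercept B1
z = sum(bj * xj for bj, xj in zip(b, x))
p = logit_probability(z)

print(z, p, log_odds(p))
```

Unlike the linear probability model, the logit form keeps P strictly between 0 and 1 for any value of Z.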
Region of Common Support
• Each participant has an observed Y value of 1 and
each non-participant has an observed Y value of 0.
• However, once the model is estimated, each
participant and non-participant has a score
between 0 and 1. Participants tend to have
scores closer to 1 and non-participants are
closer to 0.
• The distribution of scores can be graphed.
Statistical Matching
[Figure: participants (EI clients) and non-participants are paired into twins, labelled up to Twin 10,000, with one Program member and one Comparison member in each pair. Stages shown: statistical matching, then analysis (statistical matching and structural modelling).]
Matching variables
– age
– gender, income
– prior interventions
– region
– time on EI
– ......
Analysis: difference in pre- and post-program earnings, hours,
etc. regressed against intervention dummy variables, active/reachback, etc. for all
twinned pairs
Statistical Matching: The region of
common support
[Figure: relative frequency of the probability of participation (propensity score, from 0 to 1) for non-participants and participants; the overlap of the two distributions is the region of common support]
Issues in Matching
• The matching is limited to the variables available in
the administrative files.
• The balancing test compares the program and
comparison groups for each covariate using a t test for
differences in means.
• Two key weaknesses are:
– matching on the observed variables may not align the
program and comparison groups on the non-observed
variables.
– The statistical quality of the match is very important. Every
additional variable that is introduced to the matching
equation potentially improves the closeness of the
match.
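The balancing test mentioned above can be sketched as a t statistic computed covariate by covariate. The covariate names and values are hypothetical. The Welch form of the statistic and the |t| > 2 cut-off are assumptions of this sketch, not rules taken from the source.

```python
import statistics


def t_stat(a, b):
    """Welch t statistic for the difference in means of two samples."""
    se = (statistics.variance(a) / len(a)
          + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se


# Hypothetical covariate values after matching.
program = {"age": [34, 41, 29, 38, 45, 31],
           "years_on_ei": [2, 3, 1, 2, 4, 2]}
comparison = {"age": [35, 40, 30, 37, 44, 33],
              "years_on_ei": [1, 1, 0, 1, 2, 1]}

# One t statistic per covariate; large |t| flags a poorly balanced covariate.
balance = {name: t_stat(program[name], comparison[name]) for name in program}
unbalanced = [name for name, t in balance.items() if abs(t) > 2.0]
print(balance, unbalanced)
```

In this made-up sample, age is well balanced while time on EI is not, which would suggest revisiting the matching equation.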
Application of the DID estimator
in a matching context
• When combined with control, it measures the impact
of the observed differences between the two groups,
which is participation in treatment (program)
• Cannot measure the net impact of different
interventions unless these are added as covariates.
• This requires a 1–1 match between a program
participant and a non-participant (i.e., matched
program and comparison groups)
DIDi–j = B1T1,i–j + B2T2,i–j + ... + BkTk,i–j + ui–j,
where the subscript i–j denotes the matched pair of participant i and non-participant j
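The pair-level regression can be sketched on made-up data. The sketch assumes the T variables are mutually exclusive intervention dummies and that there is no intercept. Under those assumptions the OLS coefficient for each intervention equals the mean DID of the pairs that received it, so no matrix algebra is needed. The intervention names and earnings figures are hypothetical.

```python
from collections import defaultdict

# Hypothetical matched pairs. Each record holds (before, after) earnings for
# the participant (p) and the matched non-participant (c), plus the
# intervention the participant received.
pairs = [
    {"intervention": "training", "p": (4000, 6500), "c": (4200, 5000)},
    {"intervention": "training", "p": (5000, 7000), "c": (5100, 5900)},
    {"intervention": "wage_subsidy", "p": (3000, 4200), "c": (3100, 3900)},
    {"intervention": "wage_subsidy", "p": (6000, 6800), "c": (5900, 6500)},
]


def pair_did(pair):
    """DID for one matched pair: participant change minus twin's change."""
    (p_before, p_after), (c_before, c_after) = pair["p"], pair["c"]
    return (p_after - p_before) - (c_after - c_before)


by_intervention = defaultdict(list)
for pair in pairs:
    by_intervention[pair["intervention"]].append(pair_did(pair))

# With exclusive dummies and no intercept, each coefficient is a group mean.
coefficients = {k: sum(v) / len(v) for k, v in by_intervention.items()}
print(coefficients)
```

Adding further covariates or overlapping interventions would require a full regression rather than group means.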