Randomization and Impact Evaluation

The Types of Program Evaluation
(1) Process evaluation
• Audit and monitoring
• Did the intended policy actually happen?
(2) Impact evaluation
• What effect (if any) did the policy have?
Why Impact Evaluation?
• Knowledge is a global public good
• Long-term credibility
• Helps in choosing the best projects: builds long-term support for development
The evaluation problem and alternative solutions
• Impact is the difference between the relevant outcome indicator with the program and that without it.
• However, we can never simultaneously observe someone in two different states of nature.
• So, while a post-intervention indicator is observed, its value in the absence of the program is not: it is a counterfactual.
Problems when Evaluation is not Built in Ex-Ante
• Need a reliable comparison group.
• Before/after comparisons: other things may happen over the same period.
• Units with/without the policy: they may differ for reasons other than the policy (e.g. because the policy is placed in specific areas).
[Figure] We observe an outcome indicator, Y0 (observed), at the time of the intervention (t = 0)...
[Figure] ...and its observed value rises to Y1 after the program (t = 1).
[Figure] However, we need to identify the counterfactual Y1*, the value the indicator would have taken at t = 1 in the absence of the program...
[Figure] ...since only then can we determine the impact of the intervention: Impact = Y1 - Y1*.
How can we fill in the missing data on the counterfactual?
• Randomization
• Matching
• Propensity-score matching
• Difference-in-difference
• Matched double difference
• Regression Discontinuity Design
• Instrumental variables
1. Randomization
The “randomized out” group reveals the counterfactual.
• Only a random sample participates.
• As long as the assignment is genuinely random, impact is revealed in expectation.
• Randomization is the theoretical ideal and the benchmark for non-experimental methods. Identification issues are more transparent compared with other evaluation techniques.
• But there are problems in practice:
  • internal validity: selective non-compliance
  • external validity: it is difficult to extrapolate results from a pilot experiment to the whole population
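To make the mechanics concrete, here is a minimal Python sketch (not part of the workshop material) of how the average impact is estimated under random assignment: a simple difference in mean outcomes between treated and randomized-out units. The simulated data and the variable names (y, treated) are assumptions for illustration only.

```python
# Minimal sketch: estimating impact under random assignment as a
# difference in means between treated and "randomized out" units.
# The data are simulated; the true impact of 1.5 is an arbitrary assumption.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 1000
treated = rng.integers(0, 2, size=n)            # genuinely random assignment
y = 2.0 + 1.5 * treated + rng.normal(size=n)    # simulated outcome

impact = y[treated == 1].mean() - y[treated == 0].mean()
t_stat, p_value = stats.ttest_ind(y[treated == 1], y[treated == 0])
print(f"estimated impact = {impact:.2f} (p = {p_value:.3f})")
```

Because assignment is random, the difference in means is an unbiased estimate of the average impact in expectation.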
2. Matching
Matched comparators identify the counterfactual.
Propensity-score matching: match on the basis of the probability of participation.
• Match participants to non-participants from a larger survey.
• The matches are chosen on the basis of similarities in observed characteristics.
• This assumes no selection bias based on unobservable heterogeneity.
• Validity of matching methods depends heavily on data quality.
3. Propensity-score matching (PSM)
Match on the probability of participation.
• Ideally we would match on the entire vector X of observed characteristics. However, this is practically impossible: X could be huge.
• Rosenbaum and Rubin: match on the basis of the propensity score, $P(X_i) = \Pr(D_i = 1 \mid X_i)$.
• This assumes that participation is independent of outcomes given X. If there is no bias given X, then there is no bias given P(X).
Steps in score matching:
1: Representative, highly comparable surveys of the non-participants and participants.
2: Pool the two samples and estimate a logit (or probit) model of program participation. Predicted values are the “propensity scores”.
3: Restrict samples to assure common support. Failure of common support is an important source of bias in observational studies (Heckman et al.).
[Figures: densities of propensity scores (from 0 to 1) for participants and for non-participants; the overlap of the two densities defines the region of common support]
Steps in score matching (continued):
4: For each participant, find a sample of non-participants that have similar propensity scores.
5: Compare the outcome indicators. The difference is the estimate of the gain due to the program for that observation.
6: Calculate the mean of these individual gains to obtain the average overall gain.
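A minimal Python sketch of steps 1-6, assuming a pooled survey held in a pandas DataFrame; the column names (outcome, participant, x1, x2) and the nearest-neighbour matching rule are illustrative assumptions, not part of the workshop material.

```python
# Minimal sketch of nearest-neighbour propensity-score matching (steps 1-6).
import numpy as np
import pandas as pd
import statsmodels.api as sm

def psm_average_gain(df: pd.DataFrame, outcome="outcome", treat="participant",
                     covars=("x1", "x2")):
    # Step 2: pooled logit of participation; fitted values are the propensity scores.
    X = sm.add_constant(df[list(covars)])
    df = df.assign(pscore=sm.Logit(df[treat], X).fit(disp=0).predict(X))

    treated = df[df[treat] == 1]
    controls = df[df[treat] == 0]

    # Step 3: restrict the treated sample to the region of common support.
    lo, hi = controls["pscore"].min(), controls["pscore"].max()
    treated = treated[treated["pscore"].between(lo, hi)]

    # Steps 4-5: match each participant to the nearest non-participant on the
    # score and take the difference in outcomes as that observation's gain.
    gains = []
    for _, row in treated.iterrows():
        nearest = (controls["pscore"] - row["pscore"]).abs().idxmin()
        gains.append(row[outcome] - controls.loc[nearest, outcome])

    # Step 6: average the individual gains.
    return float(np.mean(gains))
```

In practice one would also match with replacement or on several neighbours and check balance; this sketch only illustrates the sequence of steps.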
4. Difference-in-difference (double difference)
Observed changes over time for non-participants provide the counterfactual for participants.
• Collect baseline data on non-participants and (probable) participants before the program.
• Compare with data after the program.
• Subtract the two differences, or use a regression with a dummy variable for participants.
• This allows for selection bias, but the bias must be time-invariant and additive.
[Figure: selection bias appears as a baseline gap between the comparison group's trajectory and the counterfactual Y1*]
[Figure: diff-in-diff requires that the bias is additive and time-invariant]
[Figure: the method fails if the comparison group is on a different trajectory]
Diff-in-diff: $E[Y_{it}^T - Y_{it}^C] = G_{it}$
if (i) the change over time for the comparison group reveals the counterfactual, $E[Y_{it}^C] = E[Y_{it}^*]$,
and (ii) the baseline is uncontaminated by the program, $G_{i0} = 0$.
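A minimal Python sketch of the double difference, both as the raw difference of before/after changes and as the equivalent regression with a participant dummy; the column names (y, participant, post) are illustrative assumptions.

```python
# Minimal sketch: double difference, raw and as a regression.
import pandas as pd
import statsmodels.formula.api as smf

def diff_in_diff(df: pd.DataFrame):
    # Raw double difference: (after - before) for participants minus
    # (after - before) for non-participants.
    means = df.groupby(["participant", "post"])["y"].mean()
    dd = (means.loc[(1, 1)] - means.loc[(1, 0)]) - (means.loc[(0, 1)] - means.loc[(0, 0)])

    # Equivalent regression: the coefficient on the participant x post
    # interaction is the impact estimate.
    fit = smf.ols("y ~ participant * post", data=df).fit()
    return dd, fit.params["participant:post"]
```

With this fully saturated specification, the interaction coefficient reproduces the raw double difference of the four cell means.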
5. Matched double difference
Matching helps control for bias in diff-in-diff.
• Score-match participants and non-participants based on observed characteristics at baseline.
• Then do a double difference.
• This deals with observable heterogeneity in initial conditions that can influence subsequent changes over time.
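A minimal Python sketch combining the two methods: match on baseline propensity scores, then difference the before/after changes between each participant and its matched comparator. The columns (dy for the change in the outcome, participant, pscore) are illustrative assumptions; pscore is taken to have been estimated from baseline characteristics as in the PSM sketch above.

```python
# Minimal sketch: matched double difference.
import numpy as np
import pandas as pd

def matched_double_difference(df: pd.DataFrame):
    treated = df[df["participant"] == 1]
    controls = df[df["participant"] == 0]

    gains = []
    for _, row in treated.iterrows():
        # Nearest non-participant on the baseline propensity score.
        nearest = (controls["pscore"] - row["pscore"]).abs().idxmin()
        # Double difference: participant's change minus the matched
        # comparator's change over the same period.
        gains.append(row["dy"] - controls.loc[nearest, "dy"])
    return float(np.mean(gains))
```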
6. Regression Discontinuity Design
• The selection function is a discontinuous function of a score: units above the cut-off score receive treatment, units below serve as controls.
• UPP in Indonesia: two similar kecamatan in the same kabupaten that have scores within the neighborhood of the cut-off score can be treated differently.
[Figure: selection (0 or 1) as a step function of the kecamatan score, jumping from control to treatment at the cut-off]
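A minimal Python sketch of a sharp regression-discontinuity estimate under the assumption of a known cut-off: fit a local linear regression on each side of the threshold and read off the jump in outcomes at the cut-off. The names (score, y, cutoff, bandwidth) are illustrative assumptions.

```python
# Minimal sketch: sharp regression discontinuity with a local linear fit.
import numpy as np
import statsmodels.api as sm

def sharp_rdd(score, y, cutoff, bandwidth):
    # Keep only observations within the bandwidth around the cut-off.
    keep = np.abs(score - cutoff) <= bandwidth
    s = score[keep] - cutoff               # centered score
    out = y[keep]
    treated = (s >= 0).astype(float)       # above the cut-off -> treated

    # Separate slopes on each side of the cut-off; the coefficient on
    # `treated` is the jump in outcomes at the threshold.
    X = sm.add_constant(np.column_stack([treated, s, treated * s]))
    fit = sm.OLS(out, X).fit()
    return fit.params[1]                   # estimated impact at the cut-off
```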
7. Instrumental variables
Identifying exogenous variation using a third variable.
Outcome regression: $Y_i = \beta D_i + \varepsilon_i$, where D = 0, 1 indicates our program and is not random.
• The “instrument” (Z) influences participation, but does not affect outcomes given participation (the “exclusion restriction”).
• This identifies the exogenous variation in outcomes due to the program.
Treatment regression: $D_i = \gamma Z_i + u_i$
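A minimal Python sketch of the two-stage least squares logic: first estimate the treatment regression of D on the instrument Z, then regress the outcome on the predicted participation. Variable names are illustrative assumptions; in practice a dedicated IV routine should be used, since the standard errors from a manual second stage are not valid.

```python
# Minimal sketch: instrumental variables via two-stage least squares.
import statsmodels.api as sm

def iv_two_stage(y, d, z):
    # First stage (treatment regression): D_i on the instrument Z_i.
    z_const = sm.add_constant(z)
    d_hat = sm.OLS(d, z_const).fit().predict(z_const)

    # Second stage (outcome regression): Y_i on predicted participation.
    second = sm.OLS(y, sm.add_constant(d_hat)).fit()
    return second.params[1]   # IV estimate of the program impact
```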
Randomization: An example from Mexico
• Progresa: grants to poor families, conditional on preventive health care and school attendance for children; given to women.
• The Mexican government wanted an evaluation; the order of community phase-in was random.
• Results: child illness down 23%; height increased 1-4 cm; 3.4% increase in enrollment.
• After the evaluation: PROGRESA was expanded within Mexico, and similar programs were adopted throughout other Latin American countries.
Randomization: An example from Kenya
• School-based deworming: treat with a single pill every 6 months, at a cost of 49 cents per student per year.
• 27% of treated students had moderate-to-heavy infection, versus 52% of the comparison group.
• Treatment reduced school absenteeism by 25%, or 7 percentage points.
• It costs only $3 per additional year of school participation.
Lessons from randomized experiments
• Randomized evaluations are often feasible:
  • They have been conducted successfully.
  • They are labor-intensive and costly, but no more so than other data collection activities.
• Results from randomized evaluations can be quite different from those drawn from retrospective evaluations.
• NGOs are well-suited to conduct randomized evaluations in collaboration with academics and external funders.
Lessons from randomized experiments
While randomization is a powerful tool:
• Internal validity can be questionable if we do not allow properly for selective compliance with the randomized assignment.
• It is not always feasible beyond pilot projects, which raises concerns about external validity.
• Contextual factors influence outcomes; a scaled-up program may work differently.
Matching Method Example:
Piped water and child health in rural India
• Is a child less vulnerable to diarrhea if he/she lives in a household with piped water?
• Do children in poor, or poorly educated, households realize the same health gains from piped water as others?
• Does income matter independently of parental education?
The evaluation problem
• There are observable differences between those households with piped water and those without it.
• And these differences probably also matter to child health.
Naïve comparisons can be deceptive
• Common practice: compare villages with piped water, or some other infrastructure facility, and those without.
• Failure to control for differences in village characteristics that influence infrastructure placement can severely bias such comparisons.
Model for the propensity scores for piped water placement in India
• Village variables: agricultural modernization, educational and social infrastructure.
• Household variables: demographics, education, religion, ethnicity, assets, housing conditions, and state dummy variables.
More likely to have piped water if:
• the household lives in a larger village, with a high school, a pucca road, a bus stop, a telephone, a bank, and a market;
• it is not a member of a scheduled tribe;
• it is a Christian household;
• it rents rather than owns the home; this is not a perverse wealth effect, but is related to the fact that rental housing tends to be better equipped;
• it is female-headed;
• it owns more land.
Impacts of piped water on child health
• The results for mean impact indicate that access to piped water significantly reduces diarrhea incidence and duration.
• Disease incidence amongst those with piped water would be 21% higher without it. Illness duration would be 29% higher.
Stratifying by income per capita:
• No significant child-health gains amongst the poorest 40% (roughly corresponding to the poor in India).
• Very significant impacts for the upper 60%.
• Without piped water there would be no difference in infant diarrhea incidence between the poorest quintile and the richest.
When we stratify by both income and education:
• For the poor, the education of female members matters greatly to achieving the child-health benefits from piped water.
• Even in the poorest 40%, women’s schooling results in lower incidence and duration of diarrhea among children in households with piped water.
• Women’s education matters much less for upper income groups.
Lessons on matching methods
• When neither randomization nor a baseline survey is feasible, careful matching to control for observable heterogeneity is crucial.
• This requires good data, to capture the factors relevant to participation.
• Look for heterogeneity in impact; the average impact may hide important differences in the characteristics of those who gain or lose from the intervention.
Tracking participants and non-participants over time
1. Single-difference matching can still be contaminated by selection bias: latent heterogeneity in factors relevant to participation.
2. Tracking individuals over time allows a double difference: this eliminates all time-invariant additive selection bias.
3. Combining double difference with matching: this allows us to eliminate observable heterogeneity in factors relevant to subsequent changes over time.
Improving Evaluation Practice
When there is an impact evaluation:
• Build in evaluation ex-ante.
• Make a quality evaluation a primary responsibility of the manager of the program.
• Allocate the necessary resources.
• Encourage randomization whenever feasible (education, health, micro-finance, governance, not monetary policy…).
Practical suggestions
• Not every project needs an impact evaluation: select projects in priority areas, where knowledge is needed.
• Take advantage of budget constraints and phase-in.
• Require a pilot project before a large-scale project.
• Finance pilot projects and evaluations with grants.
• Collaborate with others:
  • Academics (e.g. the Evaluation Based Policy Fund in the UK)
  • NGOs
Evaluation: An Opportunity
Creating hard evidence of success will:
• help spend future resources more effectively
• influence other policymakers
• build public support