Transcript Slide 1

Using Randomized Evaluations to test
Development Effectiveness
Dean Karlan
Kamilla Gumede
Annie Duflo
3ie Conference on Perspectives on Evaluations,
Cairo, April 1, 2009
Why Evaluate?
• Three simple reasons
1. To motivate those with money to give more
2. To know where to spend limited resources
3. To know how to improve programs
Peter Singer
• Utilitarian philosopher
• Would you save a child drowning in a lake if it would cost you $100 in ruined clothing or a missed appointment?
• Would you send $100 right now to an NGO in a poor country to save a child?
• Why are these questions not the same?
– Some say because “who really knows if my $100 can save a child? Maybe it will just get wasted.”
– This is a common excuse for inaction.
• Evaluation rebuts this excuse.
Key Themes
• Evaluation is an investment
– Not a cost
• Context matters
– Replication and theory are needed to make reliable prescriptions
• Quantitative vs. qualitative is a false debate
– It confuses survey methodology with measurement of the counterfactual
Why is More Evidence Needed?
• Knowing what to do (several ideas seem good)
Example: Can community empowerment or external auditors best reduce corruption in road construction?
• Sometimes conventional wisdom needs to be rethought
Example: “Group liability is an essential and necessary aspect of successful microfinance schemes; consumer lending is not beneficial, so the focus should be on entrepreneurial credit.”
• Evaluation can teach us how to improve design
Example: Reminders to save. Marketing of rainfall insurance. Providing computer games to improve math skills.
Different Types of Evaluation
(1) Process evaluation
• Audit and monitoring
• Did the intended policy actually happen?
• How many people were reached, books distributed, etc.?
(2) Impact evaluation
• What effect (if any) did the policy have?
• How would individuals who did benefit from the program have fared in its absence?
• How would those who did not benefit have fared if they had been exposed to the program?
Why is Measuring Impact so Hard?
• To know the impact of a program, we must answer a counterfactual question:
– How would individuals have fared without the program?
– But we can’t observe the same individual both with and without the program.
• We need an adequate comparison group:
– Individuals who, except for the fact that they were not beneficiaries of the program, are similar to those who received it.
• Common approaches:
– Before and after
– Cross section
• Programs are done in a particular place at a particular time for a reason.
• Even with more sophisticated approaches, we can’t control for unobservables:
– A study of 1.9 million voters matched on 10 characteristics gave the wrong policy conclusion (Arceneaux, Gerber, Green, 2004).
• There is no simple way to determine when alternatives to randomization will give the right answer (a simulated example of the problem follows below).
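To see why this matters, consider a minimal simulation sketch in Python (the “motivation” variable and all numbers are illustrative assumptions, not taken from any study above): people who enroll differ on an unobserved trait, so a naive enrolled-vs-not comparison overstates the true effect, while random assignment recovers it.

    # Sketch: selection on an unobserved trait biases naive comparisons.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    true_effect = 0.5

    # Motivated people are more likely to enroll AND do better anyway.
    motivation = rng.normal(size=n)
    enrolled = motivation + rng.normal(size=n) > 0
    outcome = true_effect * enrolled + motivation + rng.normal(size=n)
    naive = outcome[enrolled].mean() - outcome[~enrolled].mean()

    # Random assignment breaks the link between treatment and motivation.
    assigned = rng.random(n) < 0.5
    outcome_rct = true_effect * assigned + motivation + rng.normal(size=n)
    randomized = outcome_rct[assigned].mean() - outcome_rct[~assigned].mean()

    print(f"true: {true_effect}, naive: {naive:.2f}, randomized: {randomized:.2f}")
    # The naive estimate is far above 0.5; the randomized one is close to it.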
Randomized Evaluation
• Assign units to treatment and control randomly
• By construction, program beneficiaries are not more motivated, richer, or more educated than non-beneficiaries (a tiny balance-check sketch follows below)
• Gives clean results that are hard to manipulate or dispute
• Randomization can be incorporated in many different ways
• Must be planned ex ante
• Can be done ethically (in many cases it is more ethical, as it is fair and avoids favoritism, nepotism, politicking, etc.)
• Can measure externalities or spillovers
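A tiny sketch of why this holds: under random assignment, any pre-existing characteristic (here a hypothetical “years of education” variable) has nearly the same average in both groups.

    # Sketch: randomization balances pre-existing characteristics.
    import numpy as np

    rng = np.random.default_rng(3)
    n = 10_000
    education = rng.normal(10, 3, size=n)   # hypothetical covariate
    treated = rng.random(n) < 0.5           # coin-flip assignment
    print(f"treatment mean: {education[treated].mean():.2f}, "
          f"control mean: {education[~treated].mean():.2f}")  # nearly equal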
How to Introduce Randomness
1. Lottery
– e.g., whether you get into the training program
2. Randomize the order of phase-in of a program
– e.g., the order in which you clean up springs
3. Randomly encourage some more than others
– e.g., offer a savings commitment scheme to some bank account holders
(Sketches of each design follow below.)
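In code, the three designs could look like the following sketch (the applicant, spring, and account lists are hypothetical placeholders, not from the deck):

    # Sketches of the three ways to introduce randomness.
    import random

    random.seed(1)
    applicants = [f"person_{i}" for i in range(100)]
    springs = [f"spring_{i}" for i in range(12)]
    accounts = [f"account_{i}" for i in range(1000)]

    # 1. Lottery: half the applicants win a training slot.
    winners = set(random.sample(applicants, k=50))

    # 2. Phase-in: every spring is cleaned eventually, but in random order;
    #    early- vs late-phase springs form treatment and comparison groups.
    phase_in_order = random.sample(springs, k=len(springs))

    # 3. Encouragement: a random subset of account holders is offered the
    #    savings commitment scheme; take-up stays voluntary for everyone.
    encouraged = set(random.sample(accounts, k=500))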
Constraints and Opportunities for Evaluation
• Not all programs can (or should) be evaluated using the randomized methodology
– Hard to evaluate monetary policy or freedom of the press
• Projects that are most straightforward to evaluate:
– Serve specific beneficiaries (individuals or communities)
– Limited budget or organizational constraints give a natural rationale for phasing
• Opportunities to do a randomized evaluation:
– Beta testing
– Pilot project
– Expanding a project into a new area over several years
– A popular program is oversubscribed
– Testing the impact of a national program if take-up is not yet 100 percent
• Can measure outcomes that might seem hard to measure
– Empowerment of girls
– How to reduce corruption
Post-Conflict Sierra Leone
Sierra Leone Background
• Devastating 10-year civil war.
• The conflict was intergenerational, not ethnic, reflecting the lack of power and control among the young as well as economic mismanagement.
• Many decision makers were killed; schools and other infrastructure were destroyed.
• The Government of Sierra Leone, with the World Bank, is implementing an ambitious decentralization program.
• It is piloting a Community Driven Development (CDD) program, which gives money to villages that decide their own priorities for investment. It includes processes designed to promote participation of excluded groups.
Sierra Leone Evaluation
• 250 villages in two districts
– half get the CDD pilot, half do not (a sketch of the assignment follows below)
• Outcomes: trust, participation
• Measures of trust
– frequency of common actions that require trusting a neighbor
• Measures of participation
– follow common decisions
– observe the role of youth, women, outsiders
– how often do they speak? do people respond?
– are outcomes more linked to the preferences of youth or elders?
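A sketch of what the village-level assignment could look like in Python, stratified by district so each district contributes equally to both groups (the village names, the 125/125 district split, and the stratification itself are illustrative assumptions, not details from the study):

    # Sketch: assign half the villages in each district to the CDD pilot.
    import random

    random.seed(42)
    villages = {
        "district_A": [f"village_A{i}" for i in range(125)],
        "district_B": [f"village_B{i}" for i in range(125)],
    }

    assignment = {}
    for district, names in villages.items():
        shuffled = random.sample(names, k=len(names))
        half = len(shuffled) // 2
        for v in shuffled[:half]:
            assignment[v] = "CDD pilot"   # treatment
        for v in shuffled[half:]:
            assignment[v] = "comparison"  # control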
Women’s Empowerment in India
Women’s Empowerment
• Does having a woman leader make a difference?
• In 1992, India devolved power over expenditures on local public goods to Gram Panchayats. One third were randomly chosen to be headed by women.
• Many believed the policy had little impact, as women leaders appeared to be deferring to their husbands.
• Collected data on women’s preferences and public works carried out in West Bengal and Rajasthan, in reserved and unreserved villages.
• Women leaders invested more in goods that were of higher priority to women in their state.
• Perception surveys suggest that even when women are doing a better job than men (e.g., on water quality), perceptions are that they are doing a worse job. This is a potential justification for quotas of some kind.
Comparative Costs in Education
Calculating comparative costs
• The MDGs for education seek 100% participation in primary school and gender equality in education participation more generally.
• Many different approaches have been tried to increase enrolment.
• Reducing the cost of school is clearly effective in increasing attendance:
– Progresa
– Providing free uniforms to school children
– School meals for preschoolers
• Health can have an important impact, and be very cost-effective.
• Gender equity:
– Do women teachers make a difference?
– Scholarships for girls
– General programs had an important impact on girls
• Quantity and quality
Which proved true, which false?
1. Giving children in poor schools computer math games increases math test scores.
2. Group liability in microfinance produces higher repayment rates than individual-liability lending.
3. Monitoring by locals is more effective in combating corruption in local community projects than bringing in an outside auditor from the government auditing agency (Indonesia).
Impact Evaluations: A Public Good
• Evaluation results benefit everybody. Knowledge of what works and what does not is a global public good.
• If it is difficult for donors to distinguish good and bad evaluations, promoters will prefer biased or imprecise methods, and select the best result for their program.
• Sponsors understand the game that is being played, and rationally discount any estimate that is presented to them.
– Money flows to the most eloquent, impassioned, and connected?
• If nobody knows what works and what does not, efficiency and support for development assistance fall.
• Randomized evaluation can cut through this:
– Difficult to manipulate results
– A simple standard for sponsors to recognize quality
– Simple presentation of results, no fancy econometrics
What We Have Learned So Far
Kamilla Gumede
3ie Conference on Perspectives on Evaluations,
Cairo, April 1, 2009
J-PAL has 181 completed and ongoing projects in 30 countries
Sectors we work in
General findings
• Aid effectiveness: there is a lot to learn from randomized experiments; impacts cannot be assumed.
• Poor, rational … and human: randomized evaluations help shape economic theory.
• There are important cross-sectoral similarities.
Aid effectiveness
• Impacts cannot be assumed; they should be rigorously tested.
• We lack intuition about sub-program cost-effectiveness.
• Cheap, effective, and scalable anti-poverty programs do exist.
Poor, rational … and human
• Procrastination affects the poor too (much)
• Demand for commitment devices
• Something special about zero
Cross-sectoral similarities
• Problem of provider absence
• Take-up (incentives, commitment needed)
• Sensitivity to positive prices
Conclusion
• Randomized experiments are feasible.
• A new generation of studies goes beyond impacts to look inside the ‘black box’ and understand why projects work (or don’t) and for whom.
• Impacts cannot be assumed; they must be tested.
• Many similarities exist across sectors and locations.
Working with implementing organizations
• Randomized controlled trials can be used to evaluate impact (of an intervention vs. no intervention)
• But also to improve products and programs and understand what works best (one program variation vs. another)
• RCTs can be very easy to implement, and in some cases fairly cheap
– e.g., product-innovation testing in microfinance
How can RCTs be useful for practitioners?
Problems and candidate solutions whose implementation can be tested:
• Safe savings products: commitment mechanisms? voluntary?
• Vendors caught in a debt trap: financial literacy? health prevention?
• Defaults due to health events: insurance? independent insurance? bundled product?
• Farmers often do not make productive investments: innovative financial product design? training?
Our partners
• We work with different types of partners: NGOs, microfinance institutions, governments and government bodies, and international organizations
• Some examples:
– Pratham, an Indian NGO focused on education across India
– Green Bank, a microfinance institution in the Philippines
– Care International in Ghana, Malawi and Uganda
– The Rajasthan police in India
– The government of Sierra Leone
How do we work with organizations
• First step: talk to the partner organization and jointly identify issues they deal with, and interventions they’d like to test
• Seva Mandir, an Indian NGO, was concerned about the health status of the district in which it works
– We conducted a baseline survey to identify the main health issues in the districts, and relevant interventions to test
• Green Bank in the Philippines was concerned about a specific problem: why don’t people save more?
– We brainstormed about possible product designs before deciding to evaluate one
• In other cases, organizations are interested in evaluating the impact of a project that they have already designed
– Micro-credit programs
– Care International’s Village Savings and Loan Associations program
How do we work with organizations
• Second step: take the organization’s staff through the steps of implementing an RCT, and its implications for the organization
• Third step: identify the best ways to introduce a randomized design without disturbing the organization’s objectives
– In India, one of the main issues we identified in the baseline was a very low rate of immunization
– We designed a health camp + incentives program
– There were not enough funds to run this program everywhere: the project had to be implemented in a limited number of villages
– In the Philippines, the organization was not sure whether the new product would work, and wanted to pilot it first
– Randomizing was a way to pilot the program on a small scale, in such a way that it could be rigorously tested
How do we work with organizations
• Fourth step: identify the right area and the right sample frame
– Either in new areas, or among existing beneficiaries, depending on the evaluation
– Need to do the evaluation in a representative area
– Need to understand what population is targeted by the program, so that the study is carried out among such populations
How do we work with organizations
• Fifth step: start the evaluation!
– Conduct a baseline
– Randomize (i.e randomly assign individuals, or
communities, to receive the new program or not)
– Start the intervention in treatment groups
– After some time, conduct a follow up survey
• Note that in some cases, what we are
interested in can be found in the
organization’s database
– For e.g when we are interested in a new savings
product aiming to get people to save more
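As a sketch of how the follow-up comparison works, the simplest analysis is a difference in mean outcomes between the randomized groups (the data below are simulated; the sample size and the 0.3 “impact” are arbitrary illustrations, not results from any study here):

    # Sketch: estimate program impact as a difference in means.
    import numpy as np

    rng = np.random.default_rng(7)
    n = 2_000
    treated = rng.random(n) < 0.5                         # random assignment
    followup = 1.0 + 0.3 * treated + rng.normal(size=n)   # e.g. savings balance

    effect = followup[treated].mean() - followup[~treated].mean()
    se = np.sqrt(followup[treated].var(ddof=1) / treated.sum()
                 + followup[~treated].var(ddof=1) / (~treated).sum())
    print(f"estimated impact: {effect:.2f} (s.e. {se:.2f})")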
How do organizations use our results
• Pratham Read India
– Has scaled up the program to more than 20,000,000 children in India, and will scale to 60,000,000
• Seva Mandir immunization program (in India)
• A flip-chart program was stopped after it was shown to be ineffective
Scaling up beyond those organizations
• Once we have findings about what works and what does not in different contexts, one of our goals is to encourage the scaling up of those ideas
• Deworm the World
– An NGO created to scale up a deworming program
– The program was evaluated in Kenya and proved very effective at keeping children in school
• Microfinance products:
– Through practitioners’ manuals and technical assistance
– Starting such an exercise for text reminders to save