Transcript Slide 1

The Overall Agenda
• When Will We Ever Learn: a general introduction to Impact Evaluation
• When Random Assignment Is Possible
– Implementing and Evaluating RCTs
• When Random Assignment Is Not Possible
– Quasi-experimental methods: propensity scores, matching, IV, Regression Discontinuity and DinD (difference-in-differences).
When Will We Ever Learn?
What is impact evaluation, and when
and how should we use it?
Session 1
Scott Rozelle
Stanford University
Amazing Ideas
• Sleeping Bag Incubator
• Treadle Pump Irrigation
• Agricultural Price Services through Mobile Phones
• Computer Assisted Learning for Remedial Tutoring
New programs in China (huge fiscal investment)
• Rural Health Insurance (NCMS, or hezuo yiliao in China)
• New Subsidy Program (liangshi butie)
• New Education Programs (e.g., raising teacher salaries … or … eliminating tuition for high school)
• Financial Crisis Stimulus Package (investments by central gov’t; investments by localities)
• How many of the innovations/programs that we have heard about on the news …
… how many of the new technologies/programs that we have become excited about …
… how many have been rigorously evaluated?
• Do we have empirical evidence, based on carefully constructed counterfactuals, that these breakthroughs/programs work, can positively affect the lives of the poor, and do so in a cost-effective way?
Unfortunately, the answer is almost certainly: some, but not many …
Huge Global Initiatives
• UN Millennium Development Villages
• USAID Bilateral Investment Program
• World Bank / ADB’s Loan Program
Statement of Facts:
“Accelerating social progress in low- and middle-income countries requires knowledge about
what kinds of social programs are effective. Yet
all too often, such basic knowledge is lacking
because governments, development agencies,
and foundations/NGOs have few incentives to
start and sustain the impact evaluations that
generate this important information.”
(International Evaluation Gap Working Group)
“When it comes to attribution, there is shockingly
little concrete evidence about what works and
what does not” (Author of report: When will we
ever learn)
I was at a conference about 2 to 3 years ago where one young researcher claimed (in front of scores of older, experienced development economists) that after 40+ years of development economics we had not learned anything … until his work (of course).
The Excuses
• We don’t have time
• It costs too much to do rigorous impact
evaluation
• It is unethical
• Project implementation is completely site
and context specific
• We already know!
Example:
• J. Sachs: we already know ITNs (insecticide-treated nets) work for the prevention of malaria
• In fact, this time, rigorous public health trials support it:
– 90 villages: give residents ITNs
– 90 villages: give residents “0”
In treatment villages: reductions in malaria and anemia, and other benefits
… even positive spillovers: villages/hamlets around the treatment villages (within 300 meters) also benefitted through reduced malaria (although they had no ITNs) … a miracle?
[Photo: ITNs (insecticide-treated nets)]
Policy implications
• People were not buying them …
– Despite people being “very afraid” of malaria …
• Why?
– Stanford University team’s hypothesis: the one-time cost is too high
– Leads to a new RCT
Micro credit or free
• Through an NGO that had “cells of members” (10 to 20 to 30 households per village) in 100s of villages in India, an RCT was run with two treatment arms:
– Treatment village 1: give ITNs away for free to all NGO members
– Treatment village 2: sell ITNs to households as part of a micro credit (peer monitoring) project
– Control villages: “0”
• What is the outcome?
Impact: ZERO
[none for malaria / none for anemia … NONE
none in treatment village 1 / none in treatment village 2 / none in control villages]
• Explanation:
– We have no definitive proof (though now we may know why villagers do not buy them … they don’t seem to work …)
• Theory:
– Revisit the original trial … and revisit and live in our own project villages
• People do not always use ITNs … trouble / hard to hang / uncomfortable / too many people, not enough nets
An explanation
• How is it that, if people do not use them (even in the treatment villages of the original public health trial), they have an impact in those villages AND in surrounding villages?
• The only real difference between the original trial (100% of households in the trial) and Stanford’s trial (10 to 20% of households in the trial): maybe all mosquitos are killed and populations collapse when all households have ITNs … this would account for both the efficacy in the trial and the spillover …
• However, in the partial roll-out villages, the ITNs were not effective!
ITNs do work … but with a caveat
• Current most plausible explanation:
– In the large public health trial, when all of the villagers received ITNs … and were encouraged to use them (and did, at first) … ALL of the mosquitos died …
this reduced malaria in the treatment villages and in the surrounding hamlets
• Jeffrey’s response?
– Of course, that is why we give them to all of the families … of course, he had no idea …
– Maybe he did “know” … but surely he does not understand … [but then, none of us do now]
http://reap.stanford.edu
The Rural Education Action Project of Stanford University is a
Research Organization / NGO / Government Organization /
Policy Action partnership.
At Stanford University
Collaborators in China
Fundamental question we try to answer:
What Can Be Done to Overcome the Gap in Human Capital between the rural, unskilled poor and the urban, skilled middle class?
To understand the barriers keeping the rural poor from closing the gap AND to learn what can be done …
REAP Works in Two Ways . . .
1.) We design and implement new program interventions
AND we do the evaluations
2.) We partner with NGOs and gov’t agencies who are
trying to implement projects
– We advise.
– They carry out.
– We evaluate.
REAP Partners
Including our best partner
(of course):
LICOS
REAP’s Educational Challenge Areas
Health, Nutrition and Education
Technology and
Human Capital
Access to Secondary
Education and Beyond
REAP Projects in China (1)
Health, Nutrition and Education
1. Overcoming the Anemia Puzzle in Rural China
2. Worm Count: Intestinal Worms in Rural China
3. Is One Egg Enough? School Nutrition Programs in Rural Shaanxi
4. Vitameal or Vitamins? Grades and Nutrition in Shaanxi
5. Experimenting with Nutrition: Ningshan County
6. Paying for Performance in the Battle against Anemia
7. Conditional Cash Transfers and Cost Effectiveness in the Battle Against Anemia
8. Nutritional Training in Ningxia
9. Eggs and Grades
10. Reducing Transaction Costs: Chewable Vitamins in Gansu
11. Best Buy Toolkit: Nutrition, Deworming & Vision Interventions in Rural Schools
REAP Projects in China (2)
Technology and Human Capital
12. Computer Assisted Learning in Beijing Area Migrant Schools
13. Computer Assisted Learning in Rural Boarding Schools
14. Computer Assisted Learning in Rural Minority Areas
15. One Laptop Per Child: Does It Help?
16. Nutritional Training and Mobile Messaging
REAP Projects in China (3)
Access to Quality Secondary Education and Beyond
17. Boarding School Management
18. Pre School Vouchers for Needy Families
19. Evaluating Pre School Teacher Training (Nokia, China)
20. Early Commitment of Financial Aid for University
21. SOAR Foundation: What if High School Were Free?
22. Scholarships with Strings Attached: Community Service
23. Financial Aid in Shilou County
24. Contracting for Dreams in Ningshan County
25. Summer Fresh Migrant School Teacher Training Program
26. Peer Tutoring versus Paying for Grades
27. Vouchers, Vocational Education and Career Counselling
28. Scholarships at Four Tier One Universities
29. Breaking the Cycle of Poverty: Cash Transfers for Jr. High
REAP Projects in China
Today’s (this session’s) plan
• Introduce the concept of IE
– Definitions and examples of what is right and what is
not right
– RCTs … when possible, the sexy gold standard!
– When you can’t randomize (still a lot of excitement)
• IE is not enough: supplementary tools
• Issues in choosing an IE strategy
– Selecting a control group
– RCTs or quasi-experimental approaches?
A lot more rigor in sessions 2 and 3 …
What is impact?
• Impact = the outcome with the intervention
compared to what it would have been in
the absence of the intervention
• Unpacking the definition
– Can include unintended outcomes
– Can include others, not just intended beneficiaries
– No reference to time-frame, which is context-specific
– At the heart of it is the idea of attribution, and attribution implies a counterfactual (either implicit or explicit)
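The definition can be written compactly in the potential-outcomes notation common in the IE literature (the symbols are our addition, not the slide’s): Y_i(1) is the outcome of unit i with the intervention, Y_i(0) its outcome without it, and D_i = 1 marks participants.

```latex
\text{Impact}_i = Y_i(1) - Y_i(0),
\qquad
\text{ATT} = \mathbb{E}\left[\, Y_i(1) - Y_i(0) \mid D_i = 1 \,\right]
```

For a participant only Y_i(1) is ever observed; Y_i(0) is the counterfactual, which is why attribution needs a comparison group to stand in for it.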
Defined in this way, UNFORTUNATELY (as discussed above), we (the international development community) have little evidence on the impact of development programs
[in other words: we don’t (systematically) know the results of many of the uncountable number of programs that development agencies, gov’ts and other …]
The attribution problem: factual and counterfactual
[Figure: outcome path with the project (factual) versus without the project (counterfactual); the gap between them is the project impact]
Impact varies over time
Impacts are also defined over time … little attention has been given to these dynamics … though people do think about this …
[Figure: Change in the CAL program effect on standardized math test scores over time; y-axis: standardized math test score; x-axis: Baseline (Sept. 1), Midterm (early Nov), Final (late Dec)]
The CAL program effect occurred by the midterm evaluation, less
than two months after the start of the program.
There is no improvement between month two and month three.
Impact of nutrition at infancy in
Guatemala
• After 2 years: greater BMI
• After 10 years: higher grades in school
• After 15 years: higher school attainment
• After 40 years: higher wages / income
And, even longer run …
What has been the impact of
the French Revolution?
“It is too early to say”
Zhou Enlai
Let’s examine a less grandiose intervention
• The venue: poor areas of Southwest China … a remote mountainous region … populated by Dai and Dong minority groups …
• In the 1980s and 1990s only a small share of girls attended school … most were involved in farming, tending livestock and raising siblings …
• An NGO began giving scholarships in the early 1990s … objective: increase the school attendance of girls … in their very polished promotional material and in the many workshops they attend, they claim that they have been effective in their mission …
What do we need to measure impact?
Girls’ primary school enrollment (%)
• Project (treatment): before = not measured; after = 92
• Control: before = not measured; after = not measured
The majority of evaluations have just this information … which means we can say absolutely nothing about impact
NOTE: if you measure this well, what is it? Outcome monitoring
What does 92 percent mean?
• Is it high?
• Is it low?
• What does a single number mean?
• What do we compare this to?
Even if done well … outcome monitoring in its simplest form TELLS US NOTHING about impact
“Before versus after” single-difference comparisons
Before versus after = 92 – 40 = 52
Girls’ primary school enrollment (%)
• Project (treatment): before = 40; after = 92
• Control: not measured
Claim: “scholarships have led to rising schooling of young girls in the project villages”
This ‘before versus after’ approach is a more careful form of outcome monitoring, which has become popular recently.
Outcome monitoring has its place, but:
outcome monitoring ≠ impact evaluation
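A minimal Python sketch of the “before versus after” calculation, using the enrollment figures from the slide; the function name `before_after` is illustrative, not from the source. The docstring spells out why the number is not an impact estimate.

```python
def before_after(before: float, after: float) -> float:
    """Single difference over time for the treated group only.

    This is outcome monitoring, not impact evaluation: every change
    between the two survey rounds (including changes driven by the
    wider economy) is attributed to the program.
    """
    return after - before


# Girls' primary school enrollment (%) in project villages, from the slide
estimate = before_after(before=40, after=92)
print(f"Before versus after: {estimate} percentage points")
```

Nothing in this calculation can separate the scholarships from the rising off-farm wages and school completion shown on the next slide.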
The changing macro environment … and rising employment opportunities and wages
[Figure: employment of 16 to 25 year olds in the off-farm labor market (percent of cohort) and the off-farm wage rate (yuan/month), 1990 to 2005]
[Figure: rates of completion of elementary school by male and female students in all of rural China’s poor areas (share of rural children), girls and boys, 1993 and 2008]
Outcome monitoring does not
tell us about effectiveness
Results… cannot as a rule be attributed
specifically, either wholly or in part, to the
intervention
An (important) aside
Collecting data in order to measure
outcomes “before an intervention”
• Can we collect data on outcomes “before the intervention” after the intervention has already happened (that is: are recollection data valid)?
• No (or be careful): work by economists has shown clearly that relying on recollection data introduces lots of biases into IE (most of them psychological)
– If individuals have been given a treatment, they often remember selectively … they exaggerate the benefits as a way of showing their gratitude …
– Those in the control groups often want to show they are less fortunate and understate their condition (or improvement)
– Empirically, recollection data have lots of biases … and it is hard to determine their direction …
– Best practice (the only practice?): collect baseline data before the project begins
Another common approach (let’s compare to another set of villages):
Post-treatment control comparisons
Single difference = 92 – 84 = 8
Girls’ primary school enrollment (%)
• Project (treatment): before = not measured; after = 92
• Control: before = not measured; after = 84
But we don’t know if the treatment and control groups were similar before …
• How often are intervention villages / schools / clinics / etcetera chosen in a way that makes them systematically different from control villages?
[either for convenience / political necessity / feasibility / cost considerations / or from leaving it to the local partner, who uses who-knows-what type of selection method]
• In the SW China villages, the NGO went to a poor county, but the local bureau of education chose the villages … and chose them along the road …
• Is attendance in elementary school lower in the control villages because the NGO did not pass out scholarships, or because villagers in control villages had less use for education (or the cost of going to school was higher)?
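The worry about non-comparable villages can be made exact with a small Python sketch. Taking the baseline values that the double-difference slide reports (40 for project villages, 26 for control villages), the post-only difference splits algebraically into a change-over-time part and a pre-existing gap; the function names are illustrative.

```python
def post_only_difference(treat_after: float, control_after: float) -> float:
    """Post-treatment control comparison: treated minus control, after only."""
    return treat_after - control_after


def decompose(treat_before, treat_after, control_before, control_after):
    """Split the post-only difference into two additive parts.

    post-only difference = (change in treated - change in control)
                         + (baseline gap between treated and control)
    The second term is a pre-existing difference (selection), not impact.
    """
    did = (treat_after - treat_before) - (control_after - control_before)
    baseline_gap = treat_before - control_before
    return did, baseline_gap


single = post_only_difference(92, 84)
did, gap = decompose(40, 92, 26, 84)
print(single, did, gap)  # 8 -6 14
assert single == did + gap
```

In this example the whole post-only difference of 8, and more, comes from the baseline gap of 14 created by where the villages were chosen.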
Main point: post-treatment control comparisons are only valid if the treatment and control groups were identical at the time the intervention began …
Therefore: let’s collect data for all of the cells?
Double difference = (92-40)-(84-26) = 52-58 = -6
Girls’ primary school enrollment (%)
• Project (treatment): before = 40; after = 92
• Control: before = 26; after = 84
Conclusion: longitudinal (panel) data, with a control group, allow for the strongest impact evaluation design
(BUT: we still need matching … if the groups are different at the start of the project … is there something different in the villages that would affect their response to the intervention?)
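The same double difference is often estimated as a regression with a treatment-by-period interaction. The Python sketch below (using NumPy) fits that regression to the four cell means from the slide; with only four cells it simply reproduces the hand calculation, while with unit-level panel data the same specification would also yield standard errors.

```python
import numpy as np

# Cell means of girls' enrollment (%) from the slide: (treated, post, outcome)
cells = [
    (0, 0, 26),  # control, before
    (0, 1, 84),  # control, after
    (1, 0, 40),  # treated, before
    (1, 1, 92),  # treated, after
]

treated = np.array([c[0] for c in cells], dtype=float)
post = np.array([c[1] for c in cells], dtype=float)
y = np.array([c[2] for c in cells], dtype=float)

# Regression form of difference-in-differences:
# y = b0 + b1*treated + b2*post + b3*(treated*post); b3 is the double difference
X = np.column_stack([np.ones_like(y), treated, post, treated * post])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

did_by_hand = (92 - 40) - (84 - 26)
print(round(coef[3], 6), did_by_hand)
```

The intercept recovers the control baseline (26), b1 the baseline gap (14) and b2 the control group’s change (58), so each term of the hand formula has a named coefficient.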
Main points so far
• Analysis of impact implies a counterfactual
comparison
• Outcome monitoring is a factual analysis, and so
cannot tell us about impact
• The counterfactual is most commonly
determined by using a rigorously/carefully
chosen control group
If you are going to do impact evaluation, you need a credible counterfactual using a control group (not necessarily an RCT, but you still need a control)
“Gold Standard”: Randomized Control Trials
• Medical trials: treatment versus “zero”. What is the counterfactual?
• Crop field trials: treatment versus “zero”. What is the counterfactual?
We can also run randomized control trials in schools to test the effectiveness of a new school program … Social experimentation …
Step 1: choose 50 elementary schools in Gansu … randomly divide them into 2 groups of 25
Does one egg per day improve test scores / attendance?
• 25 elementary schools in Gansu: one egg per day
• 25 elementary schools in Gansu: none (“0”)
What is the counterfactual?
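Step 1 can be sketched as a simple school-level lottery in Python. The school identifiers and the seed are hypothetical; the sketch assumes an unstratified draw, and a fixed seed keeps the assignment reproducible and auditable.

```python
import random


def assign_schools(school_ids, n_treated, seed):
    """Randomly split schools into treatment and control groups.

    Simple (unstratified) random assignment at the school level.
    A fixed seed makes the draw reproducible and auditable.
    """
    rng = random.Random(seed)
    treated = set(rng.sample(list(school_ids), n_treated))
    control = [s for s in school_ids if s not in treated]
    return sorted(treated), control


# Hypothetical school identifiers (the actual Gansu sample is not listed here)
schools = [f"school_{i:02d}" for i in range(1, 51)]
egg_schools, control_schools = assign_schools(schools, n_treated=25, seed=2024)
print(len(egg_schools), len(control_schools))  # 25 25
```

Because chance alone decides which 25 schools get eggs, the control schools stand in for what the egg schools would have looked like without the program.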
Randomized Control Trial [like in agriculture or medicine]
Our question: Will one egg per day lead to higher test scores?
Three stages:
1. Baseline survey
2. POLICY EXPERIMENT (RCT): treated schools versus control schools (“0”)
3. Evaluation survey
Outcome: change in test scores between the baseline and evaluation surveys
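The comparison in stage 3 can be sketched in Python as a difference in mean test-score gains with an unequal-variance standard error. The sketch assumes one average gain per school, matching the school-level randomization; the numbers are toy values for illustration, not the study’s data.

```python
import math
from statistics import mean, variance


def gain_difference(treated_gains, control_gains):
    """Difference in mean test-score gains (evaluation minus baseline).

    Each list holds one gain per school, the unit of randomization.
    Returns the difference, its standard error (unequal variances,
    sample variances with n - 1) and the t statistic.
    """
    diff = mean(treated_gains) - mean(control_gains)
    se = math.sqrt(variance(treated_gains) / len(treated_gains)
                   + variance(control_gains) / len(control_gains))
    return diff, se, diff / se


# Toy numbers for illustration only (not the study's data)
diff, se, t = gain_difference([3.0, 5.0, 4.0], [1.0, 2.0, 0.0])
print(f"difference = {diff:.2f}, se = {se:.3f}, t = {t:.2f}")
```

With many schools per arm, |t| above about 1.96 corresponds to significance at the 95% level quoted on the results slide; with only a few schools, a t distribution with fewer degrees of freedom would be needed.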
Results: One Egg vs. “0”
[Figure: change in test scores, One Egg/Day schools versus Control schools (“0”); difference statistically significant at the 95% level of confidence]
What is causing the difference between treatment schools and control schools? What is the counterfactual?
Before we talk more about the pros/cons and keys/pitfalls of running large RCT studies:
Broaden our set of definitions about IE
• Discussion above was for ‘large n’
interventions
– There are a large number of units of intervention, e.g. children, households,
firms, schools.
– Examples of “small n” are most (but not all) policy reforms and many (but not all) capacity-building projects.
– E.g.: some reforms (e.g. health insurance) can be given large-n designs
• ‘Small n’ interventions require:
– Modelling (e.g. computable general equilibrium, CGE, models) of, for example, trade and fiscal policy … or the role of agriculture in development
– A theory-based analysis (this is what is modeled in a small-n study …): the logic through which the reform / new capacity will drive economic change …
In fact, many things can’t be randomized
• Effect of a road on access to off-farm employment
• An agricultural subsidy program (that has already been rolled out)
• Impact of the decisions of migrant families to: leave kids behind (get educated in the village’s rural public schools while living with Grandma) … or go with Mom and Dad and get educated in the city in a private, unregulated migrant school.
How well do students who attend migrant schools perform on standardized tests?
[Figure: standardized math scores of urban students in Beijing (BJ) public schools, migrant students in BJ public schools, migrant students in BJ migrant schools, and rural students in SX rural schools]
Children in migrant schools actually score a bit above those in poor rural schools
Control for observable characteristics of students and parents (in both rural schools and migrant schools) … and for the length of time that migrant children have been in migrant schools … and use quasi-experimental methods (e.g., matching in this case)
[Figure: standardized math scores after controlling for observables: urban students in BJ public schools, migrant students in BJ public schools, migrant students in BJ migrant schools, and left-behind children (rural schools)]
The argument is that parents bring their children who are better students into urban areas with them … so after controlling for these factors … the difference goes away …
AND: if you then compare students in migrant schools who have been in Beijing for > 3 years …
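Matching on observables, as used in the migrant-school comparison, can be sketched in Python as one-nearest-neighbour matching. The students and covariates below are hypothetical, chosen only to show the mechanics; each migrant-school student is paired (with replacement) with the most similar rural-school student.

```python
import math


def nn_match_att(treated, controls):
    """Average effect on the treated via 1-nearest-neighbour matching.

    treated, controls: lists of (covariates, outcome) pairs, where
    covariates is a tuple of observable characteristics. Each treated
    unit is matched, with replacement, to the control unit whose
    covariates are closest by Euclidean distance.
    """
    gaps = []
    for x_t, y_t in treated:
        x_c, y_c = min(controls, key=lambda c: math.dist(x_t, c[0]))
        gaps.append(y_t - y_c)
    return sum(gaps) / len(gaps)


# Hypothetical students: ((parental schooling in years, prior test score), math score)
migrant_school = [((9, 60), 72.0), ((12, 70), 78.0)]
rural_school = [((9, 61), 70.0), ((12, 69), 77.0), ((6, 50), 60.0)]
print(nn_match_att(migrant_school, rural_school))  # 1.5
```

Raw Euclidean distance depends on the units of each covariate, so in practice covariates are usually standardized or summarized in a propensity score; and, as the next slides stress, matching only balances what is observed.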
Should we not work on these questions?
But some “randomistas” act as though “if you can’t randomize, don’t study it”
• Why?
• They argue: you can’t control for unobservables … no matter what you do … except randomize …
[there is a name for this: RADICAL SKEPTICISM]
OLS does not work / IV does not work / matching is not enough / regression discontinuity … there are always possible unobservables that might confound the results … so just don’t try … just randomize them away …
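The randomistas’ concern can be illustrated with a small simulation in Python (using NumPy). The data-generating process is hypothetical: an unobservable u raises outcomes and also makes people more likely to take the program. A comparison of participants with non-participants then mixes the program effect with u, while a coin-flip assignment recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
true_effect = 1.0

# u is an unobservable (e.g. motivation) that raises outcomes
u = rng.normal(size=n)

# Self-selection: units with high u are more likely to take the program
d_selected = (u + rng.normal(size=n) > 0).astype(float)
# Random assignment: a coin flip, independent of u
d_random = rng.integers(0, 2, size=n).astype(float)


def outcome(d):
    """Outcome = program effect + effect of the unobservable + noise."""
    return true_effect * d + 2.0 * u + rng.normal(scale=0.5, size=n)


def diff_in_means(y, d):
    return y[d == 1].mean() - y[d == 0].mean()


naive = diff_in_means(outcome(d_selected), d_selected)
rct = diff_in_means(outcome(d_random), d_random)
print(f"self-selected: {naive:.2f}, randomized: {rct:.2f}, truth: {true_effect}")
```

In this setup the self-selected comparison is roughly three times the true effect, and no amount of data fixes it because u is never observed; the Carter and Barrett argument below is about how far that worry should be pushed.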
Carter and Barrett response:
• Barrett, C. & M. Carter. “The Power and Pitfalls of Experiments in Development Economics.” Applied Economic Perspectives and Policy 32(4): 515-548
• Carter, M. & C. Barrett. “Retreat from Radical Skepticism: Rebalancing Theory, Observational Data and Randomization in Development Economics.” On the web (forthcoming in an edited volume)
Their main point:
• According to the randomistas: “If you are a radical skeptic (you can never account for unobservables) …”
• Carter/Barrett’s statement: if this is true, you should NOT be a randomista … [based on the randomistas’ own logic]
• Logically, you cannot believe that there is anything generalizable to be learned from impact evaluations … Why? … because there is NO external validity, for the same reason of radical skepticism … [there are unobservables in the locality in which the experiments are being run that interact with the treatment and are part of the measured effect … since those unobservables are unknowable / unmeasurable and uncountable, we have ZERO predictive power when we take the program outside of the original experimental zone …]
How do biologists deal with this?
• I met a biologist recently who told me that he was working on a study:
Meta-Analysis of Meta-Analyses: Effect of Aspirin on Heart Attacks …
In the past 15 years: 1500 studies (almost all RCTs) … 33 meta-studies … and now 1 meta-meta-study …
This is one way to deal with it? But are we ever going to do 1500 RCTs on the effect of ITNs on malaria?
In economics, will we ever do two?
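A basic building block of such meta-analyses is inverse-variance pooling. The Python sketch below implements the fixed-effect version, which assumes every study estimates the same true effect (random-effects models relax that); it is not a description of the biologist’s study, and the three estimates are hypothetical.

```python
import math


def fixed_effect_pool(estimates, std_errors):
    """Inverse-variance weighted (fixed-effect) pooled estimate.

    Each study is weighted by 1 / se^2, so more precise studies count
    more. Returns the pooled estimate and its standard error.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


# Hypothetical effect estimates from three trials (not real aspirin data)
pooled, se = fixed_effect_pool([0.20, 0.10, 0.15], [0.05, 0.10, 0.05])
print(f"pooled = {pooled:.3f}, se = {se:.3f}")
```

Pooling needs many comparable studies of the same intervention, which is exactly what development economics rarely has.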
Truth is somewhere between:
• Need skepticism … not radical skepticism
…
• RCTs are great, when you can do them …
but, there are still limitations …
• Observational data sets have limitations,
but, there is still a lot to learn from them …
if care and caution are taken …
Regardless of being large-n or small-n, our focus
is on learning why things work, not just what:
(measurement is not evaluation)
This is where we need:
• Qualitative supplementary work (with
quantitative IE)
and/or
• Theory-based impact evaluation
Why? To allow us to interpret the IE measurements … and identify why a
project is working … or not … or how it can be improved … etc …
This helps address one more criticism
of the current wave of IE studies …
They only tell us: what works … and
not much else!
• This has two disadvantages:
– Why would anyone want to be told that their project does not work?
• World Bank employee?
• Government official?
• NGO?
– If you only know it does not work, what is the
implication? Eliminate the program … or fix it?
But, how?
International Initiative for Impact
Evaluation (3ie) is an international
organization trying to put the “how” [or
wow] in rigorous IE with “theory-based
evaluation” or “causal chain analysis”
Example: a nutrition project in
Bangladesh
• Source: Howard White and Edoardo Masset (2007) ‘The
Bangladesh Integrated Nutrition Program: findings from an
impact evaluation’ Journal of International Development 19:
627-652
• Bangladesh Integrated Nutrition Project
(BINP) … a World Bank Project
• Problem: lots of malnutrition … difficult to solve in traditional institutional structures
• Growth monitoring, nutritional counselling and supplementary feeding (based on a successful program in Tamil Nadu)
• By design, the project was implemented at field level by NGOs, using Community Nutrition Practitioners (CNPs)
Program design (theory of change)
• Mothers (nutritional counselling):
– A. Target group participates in program (mothers of young children)
– B1. Target group for nutritional counselling is the relevant one
– C1. Exposure to nutritional counselling results in knowledge acquisition and behaviour change
– D1. Behaviour change sufficient to change child nutrition
• Children (supplementary feeding):
– B2. Children are correctly identified to be enrolled in the program
– C2. Food is delivered to those enrolled
– D2. Supplementary feeding is supplemental, i.e. no leakage or substitution
– D3. Food is of sufficient quantity and quality
• E. Improved nutritional outcomes
The evaluation story
• Looked like it was working – all bits in place and
outcome monitoring data showed fall in severe
malnutrition
• Bank agreed to scale up (this is an expensive
program … funded at expense of other projects)
• But Save the Children UK was critical, though the Bank’s M&E team was positive
• Bank’s evaluation department (IEG) did a more
rigorous evaluation – found little or no impact
• Theory-based approach explains why
[Figure: Height-for-age scores, single differences between treatment and controls (measuring outcomes and impacts, M&E), at midterm and after the project: Project M&E post-treatment control differences versus OED propensity score matching (PSM) estimates]
Two points:
1.) We need a control group that is similar to the treatment group (this is what PSM does)
2.) The results demand an explanation of the problems with the program
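Propensity score matching (PSM) can be sketched in Python with NumPy: estimate each unit’s probability of participating from observables with a logistic regression (fitted here by Newton-Raphson), then match each participant to the non-participant with the closest score. The data are simulated and hypothetical: an observable x drives both participation and outcomes, and the true effect is 0.5.

```python
import numpy as np


def fit_logit(X, d, iters=25):
    """Logistic regression by Newton-Raphson; X must include a constant column."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (d - p)                           # score of the log-likelihood
        hess = X.T @ (X * (p * (1.0 - p))[:, None])    # information matrix
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def psm_att(x, d, y):
    """Effect on the treated via 1-nearest-neighbour matching on the estimated score."""
    X = np.column_stack([np.ones_like(x), x])
    score = 1.0 / (1.0 + np.exp(-X @ fit_logit(X, d)))
    treated, control = d == 1, d == 0
    # For each treated unit, the control whose score is closest (with replacement)
    nearest = np.abs(score[treated][:, None] - score[control][None, :]).argmin(axis=1)
    return float((y[treated] - y[control][nearest]).mean())


# Hypothetical data: x (e.g. household wealth) drives participation and outcomes
rng = np.random.default_rng(1)
n = 4000
x = rng.normal(size=n)
d = (rng.random(n) < 1.0 / (1.0 + np.exp(-1.5 * x))).astype(float)
y = 2.0 * x + 0.5 * d + rng.normal(scale=0.5, size=n)  # true effect = 0.5

naive = y[d == 1].mean() - y[d == 0].mean()
print(f"naive: {naive:.2f}, PSM: {psm_att(x, d, y):.2f}, truth: 0.50")
```

Matching removes the bias here only because participation depends on an observed variable; if it also depended on something unobserved, the PSM estimate would inherit that bias.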
Implementing theory-based analysis (assumption: finding)
• Provide nutritional counseling to care givers: mothers are not decision makers, especially if they live with their mother-in-law
• Women know about sessions and attend: 90% participation, lower in more conservative areas
• Malnourished and growth-faltering children correctly identified: No, community nutrition practitioners (CNPs) cannot interpret growth charts
• Women acquire knowledge: those attending training do so
• And knowledge is turned into practice: No, there is a substantial knowledge-practice gap
• Supplementary feeding is additional food for the intended beneficiary: No, considerable evidence of substitution and leakage
• Adopted changes are sufficient to improve intended outcomes: only sometimes
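Causal chain analysis can be sketched in Python as walking each branch of the programme theory until the first link that does not hold. The link names paraphrase the BINP findings listed above, and the "holds" / "weak" / "fails" labels are our own coding of those findings, not the authors’.

```python
# BINP findings, paraphrased; status labels are our own coding, not the authors'.
FINDINGS = {
    "women know about sessions and attend": "holds",  # 90% participation
    "counselling reaches decision makers": "fails",   # mothers-in-law decide
    "women acquire knowledge": "holds",
    "knowledge turned into practice": "fails",        # knowledge-practice gap
    "changes sufficient for outcomes": "weak",        # only sometimes
    "children correctly identified": "fails",         # CNPs misread growth charts
    "feeding is additional food": "fails",            # leakage and substitution
}

# Each branch of the programme theory must hold link by link to deliver impact.
BRANCHES = {
    "counselling": [
        "women know about sessions and attend",
        "counselling reaches decision makers",
        "women acquire knowledge",
        "knowledge turned into practice",
        "changes sufficient for outcomes",
    ],
    "supplementary feeding": [
        "children correctly identified",
        "feeding is additional food",
    ],
}


def first_break(branch):
    """Return the first link in a branch that does not fully hold, or None."""
    for link in BRANCHES[branch]:
        if FINDINGS[link] != "holds":
            return link
    return None


for name in BRANCHES:
    print(f"{name}: chain breaks at '{first_break(name)}'")
```

Locating the first broken link points to what to fix (for example, targeting mothers-in-law or training CNPs) rather than only concluding that the program “does not work”.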
[Figure: Height-for-age scores, single differences between treatment and controls, impacts when the mother participated: Project M&E post-treatment control differences versus OED propensity score matching (PSM) estimates, at midterm and after the project]
When examining just mothers who did not live with their mothers-in-law … and whose babies were supposed to be in the project …
Illustrating the principles of rigorous IE
(good measurement and astute analysis)
• Rigorous evaluation of impact using an appropriate
counterfactual: PSM versus simple control
• Map out the causal chain (programme theory)
• Understand context: Bangladesh is not Tamil Nadu (where CNPs were well trained; they were NOT well trained in Bangladesh)
• Anticipate heterogeneity: more malnourished
children are in one subgroup of population than others
• Use mixed methods: informed by anthropology, focus
groups, own field visits
Final comments
Things to remember
Why has impact evaluation become important?
• The results agenda
– E.g. Millennium Development Goals and report cards
– Focus on outcomes is an improvement over input monitoring
• But outcome monitoring is not impact
– Experience of USAID
Other places it is taken more seriously
• Examples of results-orientation
– Academic funding in developed countries
– Mexico: Progresa/Oportunidades and Coneval
• Progresa/Oportunidades …
– India: outcome budgeting and IEO
Key steps for impact evaluation design
• Well defined objectives and measurable
outcomes
• A credible counterfactual, usually using a
comparison group based on either random
assignment or matching (before versus after and
naïve comparisons don’t work)
Key steps for impact evaluation design (2)
• Baseline data considerably strengthens
the design [start early!!]
• A theory-based approach (using causal
chain) allowing analysis of causal
pathways
The appeal of random assignment
• In a randomized control trial the treatment is randomly
allocated
• This is more possible than you may think, with examples
from governance, legal reform, and environment as well
as health and education
• Most arguments against it are readily countered
Alternatives to randomization
• Randomization not always possible
• Then use statistical matching / other quasi-experimental
methods …
• Don’t give up on observational data (either your own … or others’): you can often examine program impact using existing data (e.g. IADB vocational training studies)
“Dean Karlan fights poverty one RCT at a time …”: this is the type of statement that gets Esther Duflo and others who practice rigorous IE in trouble … [on the cover of his book: “More Than Good Intentions”]
When to do an impact evaluation
• Pilot programs – only a meaningful pilot if you
evaluate it (and outcome monitoring tells you
nothing about impact)
• Representative or important programs
• We have learned a lot about development in the
past 40 years … we could have learned a lot
more!
The final word
• Impact evaluation matters
• Matters to spending money well
• Matters to know how to make programs
work better
• Matters to ending poverty
Thank you
[see next slide for schedule for rest of today and
tomorrow]
Rest of the day / tomorrow
• Doing Impact Evaluation: When Relying on Randomized Control Trials
• Doing Impact Evaluation: With Quasi-experimental Methods