Transcript: English (1-day version)
Promoting a RealWorld and Holistic Approach to Impact Evaluation
Designing Evaluations under Budget, Time, Data and Political Constraints
AEA pre-conference professional development workshop
Minneapolis, October 24, 2012
Facilitated by
Michael Bamberger and Jim Rugh
Note: This PowerPoint presentation and the summary
(condensed) chapter of the book are available at:
www.RealWorldEvaluation.org
1
Workshop Objectives
1. The basics of the RealWorld Evaluation approach
for addressing common issues and constraints faced
by evaluators, such as: when the evaluator is not
called in until the project is nearly completed and
there was no baseline or comparison group; when
the evaluation must be conducted with an inadequate
budget and insufficient time; or when there are
political pressures and expectations about how the
evaluation should be conducted and what the
conclusions should say;
2
Workshop Objectives
2. Identifying and assessing various design options
that could be used in a particular evaluation setting;
3. Some tips on developing logic models;
4. Ways to reconstruct baseline data;
5. How to account for what would have happened
without the project's interventions: alternative
counterfactuals;
6. The advantages of using mixed-methods designs;
7. Considerations for more holistic approaches to
impact evaluation.
3
Workshop Objectives
Note: In this workshop we focus on project-level impact evaluations. There are, of
course, many other purposes, scopes,
evaluands and types of evaluations. Some
of these methods may apply to them, but our
examples will be based on project impact
evaluations, most of them in the context of
developing countries.
4
Workshop agenda: morning
8:00-8:30    Introduction + brief overview of the RealWorld Evaluation approach
8:30-9:00    Basic evaluation design scenarios (Jim, slides 29-48)
9:00-9:30    Small group discussion: personal introductions, including RWE-type constraints you have faced in your own practice
9:30-9:50    Break
9:50-10:20   Logic models; reconstructing baselines (Jim, slides 51-77)
10:20-10:40  Mixed Method evaluations (Michael, slides 78-102)
10:40-11:00  Alternative counterfactuals (Jim, slides 103-115)
11:00-12:00  Small group exercise, Part I: reading of case studies, then preparing an evaluation design when working under budget, time, data or political constraints
12:00-1:00   Lunch
Workshop agenda: afternoon
1:00-1:45    Impact Evaluations (Michael, slides 120-134; Jim, slides 135-155)
1:45-2:30    Small group exercise, Part II: 'clients' and 'consultants' re-negotiate the case study evaluation ToR. (Note: in addition to becoming familiar with some of the RWE approaches, the case study exercises will also illustrate the different evaluation agendas and perspectives of evaluation consultants, project implementers and funding agencies.)
2:30-2:50    Plenary discussion of main learnings from the group exercises and the RealWorld Evaluation approach (Jim, slides 156-159)
2:50-3:00    Wrap-up and evaluation of the workshop by participants
RealWorld Evaluation
Designing Evaluations under Budget,
Time, Data and Political Constraints
OVERVIEW OF THE
RWE APPROACH
7
Typical RealWorld Evaluation
Scenario
Evaluator(s) not brought in until near the end of the project.
For political, technical or budget reasons:
• There was no life-of-project evaluation plan
• There was no baseline survey
• Project implementers did not collect adequate data on project participants at the beginning of, or during, the life of the project
• It is difficult to collect data on comparable control groups
8
Reality Check – Real-World
Challenges to Evaluation
• All too often, project designers do not think evaluatively – evaluation is not designed until the end
• There was no baseline – at least not one with data comparable to the evaluation
• There was/can be no control/comparison group
• Limited time and resources for the evaluation
• Clients have prior expectations for what they want the evaluation findings to say
• Many stakeholders do not understand evaluation; distrust the process; or even see it as a threat (dislike of being judged)
9
RealWorld Evaluation
Quality Control Goals
Achieve maximum possible evaluation rigor
within the limitations of a given context
Identify and control for methodological
weaknesses in the evaluation design
Negotiate with clients trade-offs between
desired rigor and available resources
Presentation of findings must acknowledge
methodological weaknesses and how they
affect generalization to broader populations
10
The Need for the RealWorld
Evaluation Approach
As a result of these kinds of constraints, many of
the basic principles of rigorous impact evaluation
design (comparable pre-test/post-test design,
control group, adequate instrument development
and testing, random sample selection, control for
researcher bias, thorough documentation of the
evaluation methodology, etc.) are often sacrificed.
11
The RealWorld Evaluation Approach
An integrated approach to
ensure acceptable standards
of methodological rigor while
operating under real-world
budget, time, data and
political constraints.
This is the cover of the 1st edition of the RealWorld Evaluation book (2006)
12
This is the cover of the 2nd edition: RealWorld Evaluation: Working Under Budget, Time, Data, and Political Constraints, by Michael Bamberger, Jim Rugh and Linda Mabry.

Back-cover text: This book addresses the challenges of conducting program evaluations in real-world contexts where evaluators and their clients face budget and time constraints and where critical data may be missing. The book is organized around a seven-step model developed by the authors, which has been tested and refined in workshops and in practice. Vignettes and case studies—representing evaluations from a variety of geographic regions and sectors—demonstrate adaptive possibilities for small projects with budgets of a few thousand dollars to large-scale, long-term evaluations of complex programs. The text incorporates quantitative, qualitative, and mixed-method designs, and this Second Edition reflects important developments in the field over the last five years.

New to the Second Edition:
• Adds two new chapters on organizing and managing evaluations, including how to strengthen capacity and promote the institutionalization of evaluation systems
• Includes a new chapter on the evaluation of complex development interventions, with a number of promising new approaches presented
• Incorporates new material, including on ethical standards, debates over the "best" evaluation designs and how to assess their validity, and the importance of understanding settings
• Expands the discussion of program theory, incorporating theory of change, contextual and process analysis, multi-level logic models, using competing theories, and trajectory analysis
• Provides case studies of each of the 19 evaluation designs, showing how they have been applied in the field

"This book represents a significant achievement. The authors have succeeded in creating a book that can be used in a wide variety of locations and by a large community of evaluation practitioners."
—Michael D. Niles, Missouri Western State University

"This book is exceptional and unique in the way that it combines foundational knowledge from social sciences with theory and methods that are specific to evaluation."
—Gary Miron, Western Michigan University

"The book represents a very good and timely contribution worth having on an evaluator's shelf, especially if you work in the international development arena."
—Thomaz Chianca, independent evaluation consultant, Rio de Janeiro, Brazil
The RealWorld Evaluation
approach
Developed to help evaluation practitioners
and clients
• managers, funding agencies and external
consultants
Still a work in progress (we continue to learn
more through workshops like this)
Originally designed for developing countries,
but equally applicable in industrialized
nations
14
Most RealWorld Evaluation tools are not
new— but promote a holistic, integrated
approach
Most of the RealWorld Evaluation data
collection and analysis tools will be familiar to
experienced evaluators.
What we emphasize is an integrated
approach which combines a wide range of
tools adapted to produce the best quality
evaluation under RealWorld constraints.
15
What is Special About the
RealWorld Evaluation Approach?
There is a series of steps, each with
checklists for identifying constraints and
determining how to address them
These steps are summarized on the following
slide and in the more detailed flow-chart on
page 5 of the Condensed Overview.
16
The Steps of the RealWorld
Evaluation Approach
Step 1: Planning and scoping the evaluation
Step 2: Addressing budget constraints
Step 3: Addressing time constraints
Step 4: Addressing data constraints
Step 5: Addressing political constraints
Step 6: Assessing and addressing the strengths and
weaknesses of the evaluation design
Step 7: Helping clients use the evaluation
See page 5 of the Condensed Summary for more details
17
Planning and Scoping the Evaluation
Understanding client information needs
Defining the program theory model
Preliminary identification of constraints to
be addressed by the RealWorld
Evaluation
18
Understanding client information
needs
Typical questions clients want answered:
Is the project achieving its objectives?
Is it having desired impact?
Are all sectors of the target population
benefiting?
Will the results be sustainable?
Which contextual factors determine the
degree of success or failure?
19
Understanding client information
needs
A full understanding of client information
needs can often reduce the types of
information collected and the level of
detail and rigor necessary.
However, this understanding could also
increase the amount of information
required!
20
Still part of scoping: Other questions
to answer as you customize an
evaluation Terms of Reference (ToR):
1. Who asked for the evaluation? (Who are the key stakeholders?)
2. What are the key questions to be answered?
3. Will this be mostly a developmental, formative or summative evaluation?
4. Will there be a next phase, or other projects designed based on the findings of this evaluation?
21
Other questions to answer as
you customize an evaluation
ToR:
5. What decisions will be made in response to the findings of this evaluation?
6. What is the appropriate level of rigor?
7. What is the scope / scale of the evaluation / evaluand (thing to be evaluated)?
8. How much time will be needed / available?
9. What financial resources are needed / available?
22
Other questions to answer as
you customize an evaluation
ToR:
10. Should the evaluation rely mainly on quantitative or qualitative methods?
11. Should participatory methods be used?
12. Can / should there be a household survey?
13. Who should be interviewed?
14. Who should be involved in planning / implementing the evaluation?
15. What are the most appropriate forms of communicating the findings to different stakeholder audiences?
23
Before we return to
the RealWorld steps,
let’s gain a
perspective on levels
of rigor, and what a
life-of-project
evaluation plan could
look like
24
Different levels of rigor
(depending on the source of evidence, the level of confidence needed, and the use of the information)

Objective, high precision – but requiring more time & expense
Level 5: A very thorough research project is undertaken to conduct in-depth analysis of the situation; P = +/-1%. Book published!
Level 4: Good sampling and data collection methods are used to gather data that is representative of the target population; P = +/-5%. Decision maker reads the full report.
Level 3: A rapid survey is conducted on a convenience sample of participants; P = +/-10%. Decision maker reads a 10-page summary of the report.
Level 2: A fairly good mix of people are asked their perspectives about the project; P = +/-25%. Decision maker reads at least the executive summary of the report.
Level 1: A few people are asked their perspectives about the project; P = +/-40%. Decision made in a few minutes.
Level 0: Decision-maker's impressions based on anecdotes and sound bites heard during brief encounters (hallway gossip), mostly intuition; level of confidence +/-50%. Decision made in a few seconds.
Quick & cheap – but subjective, sloppy
25
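The precision figures above can be tied to sample size. As an illustrative sketch (our addition, not from the workshop materials), assuming simple random sampling of a proportion at 95% confidence, the sample size implied by each margin of error is n = z^2 * p(1-p) / e^2:

```python
# Illustrative sketch (not from the workshop materials): the sample size
# implied by each precision level, assuming simple random sampling of a
# proportion (p = 0.5, the most conservative case) at 95% confidence.
import math

Z_95 = 1.96  # z-score for a 95% confidence level

def sample_size(margin_of_error, p=0.5):
    """n = z^2 * p * (1 - p) / e^2, rounded up."""
    return math.ceil(Z_95**2 * p * (1 - p) / margin_of_error**2)

for level, e in [(4, 0.05), (3, 0.10), (2, 0.25), (1, 0.40)]:
    print(f"Level {level}: +/-{e:.0%} margin -> n >= {sample_size(e)}")
# Level 4: +/-5% -> n >= 385; Level 3: +/-10% -> n >= 97;
# Level 2: +/-25% -> n >= 16; Level 1: +/-40% -> n >= 7
```

The steep rise in required sample size is one reason higher rigor costs so much more time and money.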
CONDUCTING AN EVALUATION IS
LIKE LAYING A PIPELINE
QUALITY OF INFORMATION GENERATED BY AN EVALUATION
DEPENDS UPON LEVEL OF RIGOR OF ALL COMPONENTS
AMOUNT OF “FLOW” (QUALITY) OF INFORMATION IS LIMITED TO
THE SMALLEST COMPONENT OF THE SURVEY “PIPELINE”
Determining appropriate levels of precision for events in a life-of-project evaluation plan
[Diagram: level of rigor (vertical axis, low to high) plotted against time during the project life cycle. The baseline study, mid-term evaluation and final evaluation are held at the same high level of rigor, with the needs assessment, a special study and annual self-evaluations at lower levels.]
28
RealWorld Evaluation
Designing Evaluations under Budget,
Time, Data and Political Constraints
EVALUATION
DESIGNS
29
Some of the purposes for program evaluation
Formative: learning and improvement including early
identification of possible problems
Knowledge generation: identify cause-effect correlations
and generic principles about effectiveness.
Accountability: to demonstrate that resources are used
efficiently to attain desired results
Summative judgment: to determine value and future of
program
Developmental evaluation: adaptation in complex,
emergent and dynamic conditions
-- Michael Quinn Patton, Utilization-Focused Evaluation, 4th edition, pages 139-140
30
Determining appropriate (and
feasible) evaluation design
Based on the main purpose for
conducting an evaluation, an
understanding of client information
needs, required level of rigor, and what
is possible given the constraints, the
evaluator and client need to determine
what evaluation design is required and
possible under the circumstances.
31
Some of the considerations
pertaining to evaluation design
1. When evaluation events take place (baseline, midterm, endline)
2. Review different evaluation designs (experimental, quasi-experimental, other)
3. Determine what needs to be done to "fill in the missing pieces" when the "ideal" evaluation design/scenario is not feasible.
32
An introduction to various evaluation designs
Illustrating the need for a quasi-experimental longitudinal time-series evaluation design
[Graph: the scale of a major impact indicator plotted over time for project participants and a comparison group, with observations at baseline, end-of-project evaluation and post-project evaluation.]
33
OK, let’s stop the action to
identify each of the major
types of evaluation (research)
design …
… one at a time, beginning with the
most rigorous design.
34
First of all, the key to the traditional symbols:
X = Intervention (treatment), i.e. what the project does in a community
O = Observation event (e.g. baseline, mid-term evaluation, end-of-project evaluation)
P (top row) = Project participants
C (bottom row) = Comparison (control) group
Note: the 7 RWE evaluation designs are laid out on page 8 of the Condensed Overview of the RealWorld Evaluation book
35
Design #1: Longitudinal Quasi-experimental
Project participants: P1  X  P2  X  P3  P4
Comparison group:     C1     C2     C3  C4
(Observations at baseline, midterm, end-of-project evaluation and post-project evaluation)
36
Design #2: Quasi-experimental (pre+post, with comparison)
Project participants: P1  X  P2
Comparison group:     C1     C2
(Observations at baseline and end-of-project evaluation)
37
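One standard way (our illustration with invented numbers, not necessarily the book's exposition) to combine Design #2's four observations is a difference-in-differences estimate: the change in the project group net of the change in the comparison group.

```python
# Difference-in-differences sketch for Design #2. P1, P2, C1, C2 are group
# means of the impact indicator; the numbers are invented for illustration.
P1, P2 = 500.0, 750.0  # project group: baseline, end-of-project
C1, C2 = 500.0, 600.0  # comparison group: baseline, end-of-project

impact = (P2 - P1) - (C2 - C1)  # project change net of comparison change
print(f"Estimated impact: {impact:+.0f}")  # +150
```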
Design #2+: Randomized Control Trial
Project participants: P1  X  P2
Control group:        C1     C2
(Observations at baseline and end-of-project evaluation. Research subjects are randomly assigned either to the project group or to the control group.)
38
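A brief sketch (ours, using a hypothetical list of eligible households) of the random assignment step that distinguishes Design #2+ from Design #2:

```python
# Random assignment sketch for Design #2+ (hypothetical eligible units).
import random

random.seed(42)  # fixed seed so the assignment is reproducible/auditable
eligible = [f"household_{i}" for i in range(200)]
random.shuffle(eligible)
treatment, control = eligible[:100], eligible[100:]

# Because assignment is random, the control group's endline mean (C2) is an
# unbiased counterfactual, so the impact estimate is simply P2 - C2
# (optionally adjusted for baseline values).
print(len(treatment), len(control))  # 100 100
```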
Design #3: Truncated Longitudinal
Project participants: X  P1  X  P2
Comparison group:        C1     C2
(Observations at midterm and end-of-project evaluation)
39
Design #4: Pre+post of project; post-only comparison
Project participants: P1  X  P2
Comparison group:            C
(Project group observed at baseline and end-of-project evaluation; comparison group observed at end of project only)
40
Design #5: Post-test only of project and comparison
Project participants: X  P
Comparison group:        C
(Both groups observed at end-of-project evaluation only)
41
Design #6: Pre+post of project; no comparison
Project participants: P1  X  P2
(Observations at baseline and end-of-project evaluation)
42
Design #7: Post-test only of project participants
Project participants: X  P
(Observation at end-of-project evaluation only)
43
See Table 2.2 on page 8 of the Condensed Overview of RWE. The seven designs across the project timeline (T1 = baseline, T2 = midterm, T3 = endline, T4 = ex-post; X = intervention, possibly continuing):

Design  T1        X   T2        X   T3       T4
1       P1, C1    X   P2, C2    X   P3, C3   P4, C4
2       P1, C1    X                 P2, C2
3                 X   P1, C1    X   P2, C2
4       P1        X                 P2, C
5                 X                 P, C
6       P1        X                 P2
7                 X                 P
44
“Non-Experimental” Designs
[NEDs]
NEDs are impact evaluation designs that
do not include a matched comparison
group
Outcomes and impacts assessed without
a conventional statistical counterfactual
to address the question
•
“what would have been the situation of the
target population if the project had not taken
place?”
45
Situations in which an NED may
be the best design option
Not possible to define a comparison group
When the project involves complex processes of
behavioral change
Complex, evolving contexts
Outcomes not known in advance
Many outcomes are qualitative
Projects operate in different local settings
When it is important to study implementation
Project evolves slowly over a long period of time
46
Some potentially strong NEDs
A. Interrupted time series (see the sketch after this list)
B. Single case evaluation designs
C. Longitudinal designs
D. Mixed method case study designs
E. Analysis of causality through program theory models
F. Concept mapping
G. Contribution Analysis
47
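To make design A concrete, here is a minimal sketch (our illustration, with simulated data) of an interrupted time series estimated by segmented regression, y = b0 + b1*t + b2*post + b3*(t - t0)*post:

```python
# Interrupted time series sketch (design A above); all data are simulated.
import numpy as np

t = np.arange(20)                  # e.g. 20 monthly observations
t0 = 10                            # intervention begins at month 10
post = (t >= t0).astype(float)
rng = np.random.default_rng(0)
y = 2.0 + 0.5 * t + 4.0 * post + rng.normal(0, 1, size=20)  # true shift = 4

# Segmented regression: intercept, pre-trend, level shift, slope change
X = np.column_stack([np.ones_like(t, dtype=float), t, post, (t - t0) * post])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"Estimated level shift at the intervention: {b[2]:.2f}")  # ~4
```

The pre-intervention trend serves as the counterfactual, which is why no separate comparison group is required.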
Any questions?
48
TIME FOR SMALL
GROUP DISCUSSION
49
1. Self-introductions
2. What constraints of
these types have you
faced in your evaluation
practice?
3. How did you cope with
them?
50
RealWorld Evaluation
Designing Evaluations under Budget,
Time, Data and Political Constraints
LOGIC
MODELS
51
Defining the program theory
model
All programs are based on a set of assumptions (hypotheses) about how the project's interventions should lead to desired outcomes.
Sometimes these are clearly spelled out in project documents.
Sometimes they are only implicit, and the evaluator needs to help stakeholders articulate the hypotheses through a logic model.
52
Defining the program theory
model
Defining and testing critical assumptions
are essential (but often ignored)
elements of program theory models.
The following is an example of a model
to assess the impacts of microcredit on
women’s social and economic
empowerment
53
Critical logic chain hypothesis for a
Gender-Inclusive Micro-Credit Program
Sustainability
• Structural changes will lead to long-term impacts.
Medium/long-term impacts
• Increased women’s economic and social empowerment.
• Economic and social welfare of women and their families will
improve.
Short-term outcomes
• If women obtain loans they will start income-generating activities.
• Women will be able to control the use of loans and reimburse them.
Outputs
• If credit is available women will be willing and able to obtain loans
and technical assistance.
54
What does it take to measure indicators at each level?
Impact: Population-based survey (baseline, endline evaluation)
Outcome: Change in behavior of participants (can be surveyed annually)
Output: Measured and reported by project staff (annually)
Activities: On-going (monitoring of interventions)
Inputs: On-going (financial accounts)
[Diagram: a problem tree – a central PROBLEM with primary, secondary and tertiary causes branching below it and consequences above – mirrored by a results tree in which the DESIRED IMPACT rests on outcomes, outputs and interventions at the corresponding levels.]
[Diagram: a results hierarchy for a girls' education program – interventions such as schools built, curriculum improved, and the school system hiring and paying teachers lead to female enrollment rates increasing, parents persuaded to send girls to school, improved educational policies, and economic opportunities for women; these support young women being educated and women in leadership roles, women empowered, and ultimately a reduction in poverty. To have synergy and achieve impact, all of these need to address the same target population.]
[Diagram: the program goal at impact level – young women educated – is supported by three project goals: an Advocacy Project (goal: improved educational policies enacted), a Construction Project (goal: more classrooms built) and a Teacher Education Project (goal: improved quality of curriculum). One of these is OUR project, another a PARTNER will do, and the remainder rests on the ASSUMPTION that others will do it.]
One form of Program Theory (Logic) Model
[Diagram: a results chain running from Design through Inputs, Implementation Process, Outputs, Outcomes, Impacts and Sustainability (the orange boxes of a conventional Program Theory Model), embedded in the economic context, the political context, the institutional and operational context, and the socio-economic and cultural characteristics of the affected populations (blue boxes). Adding the blue boxes provides the recommended, more complete analysis.]
60
61
Education Intervention Logic
[Diagram: output clusters (institutional management; education facilities; curricula & teaching materials; teacher recruitment & training) lead to specific outcomes (better allocation of educational resources; increased affordability of education; quality of education; skills and learning enhancement; equitable access to education – MDG 2), then to intermediate impacts (greater income opportunities; optimal employment; improved participation in society; improved family planning & health awareness) and to global impacts (economic growth; poverty reduction – MDG 1; social development; health; MDGs 2 and 3).]
Source: OECD/DAC Network on Development Evaluation
Expanding the results chain for a multi-donor, multi-component program
[Diagram: inputs from a donor, the government and other donors (credit for small farmers; rural roads; schools; health services) produce outputs and intermediate outcomes (increased production; access to off-farm employment; increased school enrolment; increased use of health services) leading to impacts (increased rural household income; increased political participation; improved education performance; improved health).]
Attribution gets very difficult! Consider plausible contributions each makes.
Contribution Analysis*
1. Develop a program logic (theory of change)
2. Identify existing evidence on the causal links between interventions and possible effects. Identify gaps in information.
3. Assess alternative explanations of possible effects. Assess the possible contributions of different factors to effects.
* John Mayne et al in EES’ Evaluation journal, Vol. 18, No. 3, July 2012
64
Contribution Analysis*
4. Create a performance story on the contribution of the intervention to changes in impact variables.
5. Seek out additional evidence to improve the program's performance story.
6. Revise and strengthen the performance story.
* John Mayne et al in EES’ Evaluation journal, Vol. 18, No. 3, July 2012
65
RealWorld Evaluation
Designing Evaluations under Budget,
Time, Data and Political Constraints
Where there was
no baseline
Ways to reconstruct baseline conditions
A. Secondary data
B. Project records
C. Recall
D. Key informants
E. Participatory methods
67
Assessing the utility of potential
secondary data
Reference period
Population coverage
Inclusion of required indicators
Completeness
Accuracy
Free from bias
68
Assessing the reliability of
project records
Who collected the data and for what
purpose?
Were they collected for record-keeping or to
influence policymakers or other groups?
Do monitoring data only refer to project
activities or do they also cover changes in
outcomes?
Were the data intended exclusively for
internal use? For use by a restricted group?
Or for public use?
69
Limitations of recall
Generally not reliable for precise
quantitative data
Sample selection bias
Deliberate or unintentional distortion
Few empirical studies (except on
expenditure) to help adjust estimates
70
Sources of bias in recall
Who provides the information
Under-estimation of small and routine expenditures
"Telescoping" of recall concerning major expenditures
Distortion to conform to accepted behavior:
• Intentional or unconscious
• Romanticizing the past
• Exaggerating (e.g. "We had nothing before this project came!")
Contextual factors:
• Time intervals used in the question
• Respondents' expectations of what the interviewer wants to know
Implications for the interview protocol
71
Improving the validity of recall
Conduct small studies to compare recall
with survey or other findings.
Ensure all relevant groups interviewed
Triangulation
Link recall to important reference events
• Elections
• Drought/flood/tsunami/war/displacement
• Construction of road, school etc
72
Key informants
Not just officials and high status people
Everyone can be a key informant on
their own situation:
• Single mothers
• Factory workers
• Users of public transport
• Sex workers
• Street children
73
Guidelines for key-informant
analysis
Triangulation greatly enhances validity
and understanding
Include informants with different
experiences and perspectives
Understand how each informant fits into
the picture
Employ multiple rounds if necessary
Carefully manage ethical issues
74
PRA and related participatory
techniques
PRA (Participatory Rapid Appraisal) and PLA
(Participatory Learning and Action)
techniques collect data at the group or
community [rather than individual] level
Can either seek to identify consensus or
identify different perspectives
Risk of bias:
• If only certain sectors of the community participate
• If certain people dominate the discussion
75
Participatory approaches should be
used as much as possible
but even they should be used with appropriate rigor: how many (and which) people's perspectives contributed to the story?
76
Summary of issues in baseline
reconstruction
Variations in reliability of recall
Memory distortion
Secondary data not easy to use
Secondary data incomplete or unreliable
Key informants may distort the past
Yet in many cases we need to use one or
more of these methods to determine what
change occurred during the life of a project.
77
RealWorld Evaluation
Designing Evaluations under Budget,
Time, Data and Political Constraints
Mixed-method
(MM) evaluations
The Main Messages
1. No single evaluation approach can fully address the complexities of development evaluations
2. MM combines the breadth of quantitative (QUANT) evaluation methods with the depth of qualitative (QUAL) methods
3. MM is an integrated approach to evaluation with specific tools and techniques for each stage of the evaluation cycle
4. MM are used differently by evaluators with a QUANT orientation and those with a QUAL orientation – and offer distinct benefits for each kind of evaluation
79
A. Why mixed methods?
No single
evaluation
methodology can
fully explain how
development
programs operate
in the real world
This explains the growing
interest in mixed methods
evaluations
80
Why mixed methods? No single evaluation
method can fully explain how development programs
operate in the real world
1. Programs operate in complex and changing environments
2. Interventions are affected by historical, cultural, political, economic and other contextual factors
3. Different methodologies are needed to measure different contextual factors, processes and outcomes.
4. Even "simple" interventions often involve complex processes of organizational and behavioral change
5. Programs change depending on how different sectors of the target population respond
81
The benefits of a mixed
methods approach
QUANTITATIVE (breadth):
• How many? How much?
• How representative of the total population?
• Are changes statistically significant?
+
QUALITATIVE (depth):
• How were changes experienced by individuals?
• What actually happened on the ground?
• The quality of services
82
Four decisions for designing a
mixed methods evaluation
1. At which stages of the evaluation are mixed methods used?
2. Is the design sequential or concurrent?
3. Which approach is dominant?
4. Is the design single or multi-level?
83
Decision 1: At which stages of the
evaluation are mixed methods used?
For each stage, methods may be QUANT, QUAL or Mixed:
1. Formulation of hypotheses
2. Sample design
3. Evaluation design
4. Data collection and recording
5. Triangulation
6. Data analysis and interpretation
Mixed methods can be used at any stage of the evaluation. A fully integrated MM design combines QUANT and QUAL methods at all stages of the evaluation.
84
Decision 2: Is the design sequential or
concurrent?
Sequential designs:
• QUANT and QUAL approaches are used in
sequence
Concurrent designs
• QUANT and QUAL approaches are both
used at the same time
85
Sequential QUAL-dominant MM design: evaluating the adoption of new seed varieties by different types of rural families (quant → QUAL → QUAL):
1. Rapid QUANT household survey in project villages to estimate household characteristics, ethnicity, agricultural production and seed adoption.
2. QUAL data collection using key informants, focus groups, observation, and preparation of case studies on households and farming practices.
3. QUAL data analysis using within- and between-case analysis and constant comparison. Triangulation among different data sources.
86
A concurrent MM design: triangulating QUANT and QUAL estimates of household income in project and comparison areas.
QUANT and QUAL data collection methods are used at the same time in project communities and comparison communities:
• QUANT household surveys
• QUANT/QUAL observation of household possessions and construction quality
• QUAL focus groups
Estimates from the 3 sources are triangulated to obtain the most reliable estimate of household income.
87
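As a toy sketch of that triangulation step (our illustration; the income figures and the 20% threshold are invented), the three estimates can be compared and large disagreement flagged for follow-up:

```python
# Toy triangulation sketch: compare three estimates of mean household income
# (invented figures) and flag large disagreement for follow-up in the field.
estimates = {
    "QUANT household survey": 412.0,
    "Observation-based proxy": 455.0,
    "QUAL focus groups": 430.0,
}
values = list(estimates.values())
spread = (max(values) - min(values)) / (sum(values) / len(values))
print(f"Relative spread across sources: {spread:.1%}")
if spread > 0.20:  # threshold is an arbitrary illustration
    print("Estimates diverge substantially -> review data / return to field")
else:
    print("Estimates broadly agree -> report the pooled estimate or range")
```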
Decision 3: which approach is dominant?
[Diagram: a continuum from completely QUANT designs to completely QUAL designs.]
A = completely QUANT design
B = dominant QUANT with some QUAL elements
C = QUANT-oriented design giving equal weight to both approaches
D = study designed as balanced MM
E = QUAL-oriented design giving equal weight to both approaches
F = dominant QUAL design with some QUANT elements
G = completely QUAL design
88
A QUANT-dominant evaluation design
Example:
A rapid qualitative diagnostic study is
conducted to help design a quantitative
household survey. The data is analyzed
using quantitative analysis techniques
[e.g. regression analysis]
89
A QUAL-dominant evaluation design
Example:
A rapid quantitative sample survey is conducted.
This is used to develop a typology of rice
production systems. Qualitative case studies are
selected to represent each type. The data is
analyzed and presented using qualitative
methods such as narrative descriptions,
photographs and social maps.
90
A Multi-level mixed methods design
The effects of a school feeding program on school enrolment
C. Using mixed methods to strengthen
each stage of the evaluation
1. Hypothesis formulation
2. Sample design
3. Evaluation design
4. Data collection
5. Triangulation
6. Data analysis and
interpretation
Stage 1. Mixed methods approaches to
hypothesis development
Combining deductive (QUANT) and inductive
(QUAL) hypotheses
Basing the evaluation framework on a theory
of change
Strengthening construct validity by combining
different QUANT and QUAL indicators
Contextualizing the evaluation
93
Comparing DEDUCTIVE and INDUCTIVE hypotheses

Deductive:
• Mainly used in QUANT research
• Hypotheses test theories based on prior research
• Hypotheses defined at the start of the evaluation, before data collection begins
• Hypotheses normally do not change
• Hypotheses can be tested experimentally

Inductive:
• Mainly used in QUAL research
• Hypotheses based on observations in the field
• Hypotheses not defined until data collection begins
• Hypotheses evolve as data collection progresses
• Hypotheses are tested using a Theory of Change, or logically

Mixed methods hypotheses combine both deductive and inductive.
94
Stage 2. Mixed method sample
designs
Parallel mixed method sampling
• Random (QUANT) and purposive (QUAL)
sampling
Sequential MM sampling
Multi-level MM sampling
Strengthening the coverage of the sampling
frame
Strengthening the matching of the project
and control groups
95
Stage 3. Mixed method
evaluation design
Combining experimental and quasi-experimental designs with QUAL techniques to explore:
• Processes and quality of services
• Context
• Behavioral change
Flexibility to adapt the evaluation to changes in
the project design or the project context
In-depth analysis of how the project affects
different groups
Creative identification of comparison groups
96
Stage 4. Strengthening data collection
A. Integrating survey and QUAL data collection
B. Commonly used mixed-method data collection methods for strengthening QUANT evaluations:
   1. Focus groups
   2. Observation
   3. Secondary data
   4. Case studies
C. Reconstructing baseline data
D. Interviewing difficult-to-reach groups
E. Collecting information on sensitive topics
F. Attention to contextual clues
97
Stage 5. Validating findings through triangulation
• QUANT data collection: household survey data collected on income and expenditures.
• QUANT data analysis: calculating means, frequency distributions and standard deviations of income and expenditures.
• QUAL data collection: a sub-sample of household-interview families is selected; interviews, key informants and observation; detailed notes, taped interviews and photos.
• QUAL data analysis: review of interview and observation notes; analysis using the constant comparative method.
• TRIANGULATION PROCESS: findings are compared, reconciled and integrated. When different estimates are obtained, all of the data is reviewed to understand why the differences occur. If necessary, teams may return to the field to investigate further.
98
Different kinds of triangulation
Different data collection methods
Different interviewers
Collecting information at different times
Different locations and contexts
99
Stage 6. Mixed method data
analysis and interpretation
Parallel MM data analysis
Conversion MM data analysis
• Converting QUAL data into QUANT indicators and vice versa
Sequential MM data analysis
Multi-level MM data analysis
Generalizing findings and recommendations
to other potential program settings
100
Quantitative data collection methods
Strengths and weaknesses
Strengths:
• Generalization: statistically representative
• Estimate magnitude and distribution of impacts
• Clear documentation of methods
• Standardized approach
• Statistical control of bias and external factors
Weaknesses:
• Surveys cannot capture many types of information
• Do not work for difficult-to-reach groups
• No analysis of context
• Survey situation may alienate respondents
• Long delay in obtaining results
• Reducing data to easy-to-measure indicators loses perspectives on complex information
101
Using Qualitative methods to improve
the Evaluation design and results
Use recall to reconstruct the pre-test situation
Interview key informants to identify other changes in the
community or in gender relations
Conduct interviews or focus groups with women and men to:
• assess the effect of loans on gender relations within the household, such as changes in control of resources and decision-making
• identify other important results or unintended consequences: increases in women's work load; increases in the incidence of gender-based or domestic violence
102
RealWorld Evaluation
Designing Evaluations under Budget,
Time, Data and Political Constraints
Determining
Counterfactuals
Attribution and counterfactuals
How do we know if the observed changes in the project participants or communities
• income, health, attitudes, school attendance, etc.
are due to the implementation of the project
• credit, water supply, transport vouchers, school construction, etc.
or to other unrelated factors?
• changes in the economy, demographic movements, other development programs, etc.
104
The Counterfactual
What change would have occurred in
the relevant condition of the target
population if there had been no
intervention by this project?
105
Where is the counterfactual?
After families had been living in a new housing project for 3 years, a study found average household income had increased by 50%.
Does this show that housing is an effective way to raise income?
106
Comparing the project with two possible comparison groups
[Graph: household income from 2004 to 2009. The project group rises from 500 to 750 – a 50% increase. Scenario 1: no increase in comparison group income – potential evidence of project impact. Scenario 2: a 50% increase in comparison group income as well – no evidence of project impact.]
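The arithmetic behind the two scenarios is worth making explicit (a sketch using the income values shown on the graph):

```python
# Net project impact under the two comparison-group scenarios shown above.
baseline, endline = 500, 750          # project group income: a 50% increase
project_change = endline - baseline   # 250

scenarios = {
    "Scenario 1: comparison group flat": 0,
    "Scenario 2: comparison group also +50%": 250,
}
for label, comparison_change in scenarios.items():
    net = project_change - comparison_change
    print(f"{label} -> net change attributable to project: {net}")
# Scenario 1 -> 250 (potential evidence of impact); Scenario 2 -> 0
```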
Control group and comparison group
Control group = randomized allocation of
subjects to project and non-treatment group
Comparison group = separate procedure for
sampling project and non-treatment groups
that are as similar as possible in all aspects
except the treatment (intervention)
108
IE Designs: Experimental Designs
Randomized Control Trials
Eligible individuals, communities, schools
etc are randomly assigned to either:
• The project group (that receives the services)
or
• The control group (that does not have access
to the project services)
109
A graphical illustration of an 'ideal' counterfactual using a pre-project trend line, then an RCT
[Graph: the primary outcome over time. Subjects are randomly assigned either to the treatment group or to the control group at the intervention; the gap between the two lines at follow-up is the IMPACT.]
110
There are other methods for
assessing the counterfactual
Reliable secondary data that depicts
relevant trends in the population
Longitudinal monitoring data (if it includes
non-reached population)
Qualitative methods to obtain perspectives
of key informants, participants, neighbors,
etc.
111
Ways to reconstruct comparison
groups
Judgmental matching of communities
When there is phased introduction of project services, beneficiaries entering in later phases can be used as "pipeline" comparison groups
Internal controls when different subjects receive different combinations and levels of services
112
Using propensity scores and
other methods to strengthen
comparison groups
Propensity score matching
Rapid assessment studies can compare characteristics of project and comparison groups using:
• Observation
• Key informants
• Focus groups
• Secondary data
• Aerial photos and GIS data
113
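As a hedged sketch of how propensity score matching works in practice (our illustration with simulated data; scikit-learn's LogisticRegression stands in for whatever model an evaluator might use), participation is predicted from observed covariates and each project case is matched to the comparison case with the nearest score:

```python
# Propensity score matching sketch (simulated data, nearest-neighbour match).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))   # observed covariates, e.g. age, income,
                                # distance to the project site
# Participation depends on the first covariate (self-selection, not random)
treated = rng.random(300) < 1 / (1 + np.exp(-X[:, 0]))

# Step 1: model the probability of participation given the covariates
ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

# Step 2: match each project case to the comparison case with the nearest
# propensity score (with replacement)
t_idx, c_idx = np.where(treated)[0], np.where(~treated)[0]
matches = {i: c_idx[np.argmin(np.abs(ps[c_idx] - ps[i]))] for i in t_idx}
print(f"Matched {len(matches)} project cases to comparison cases")
```

Note that matching can only balance the observed covariates; the "unobservables" caveat on the next slide still applies.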
Issues in reconstructing
comparison groups
Project areas often selected purposively and
difficult to match
Differences between project and comparison
groups - difficult to assess whether outcomes
were due to project interventions or to these
initial differences
Lack of good data to select comparison
groups
Contamination (good ideas tend to spread!)
Econometric methods cannot fully adjust for
initial differences between the groups
[unobservables]
114
What experiences have you had with
identifying counterfactual data?
115
Enough of our
presentations: it’s time
for you (THE
RealWorld PEOPLE!)
to get involved
yourselves.
Time for small-group work. Read
your case studies
and begin your
discussions.
Small group case study work
1. Some of you are playing the role of evaluation consultants, others are clients commissioning the evaluation.
2. Decide what your group will propose to do to address the given constraints/challenges.
3. Prepare to negotiate the ToR with the other group (later this afternoon).
The purpose of this exercise is to gain some practical 'feel' for applying what we've learned about RealWorld Evaluation.
Group A (consultants): the evaluation team will consider how to propose a revised evaluation design and plan that reduces the budget by 25% to 50% and yet meets the needs of both clients (the City Housing Department and the international donor).
Group B (clients) will also review the initial proposal in light of what they have
learned about RealWorld Evaluation, and prepare to re-negotiate the plans
with the consultancy group. Note: there are two types of clients: the Housing
Department (project implementers) and the international donor (foundation).
Groups are given 45 minutes to prepare their cases.
Later the Consultants’ group will meet with Clients’ groups to negotiate their
proposed revisions in the plans for this evaluation. 45 minutes will be available
for those negotiation sessions.
RealWorld Evaluation
Designing Evaluations under Budget,
Time, Data and Political Constraints
More holistic
approaches to
Impact Evaluation
Some recent developments in impact evaluation in development
[Timeline of milestones, 2003-2012, including: J-PAL (2003) – "best understood as a network of affiliated researchers … united by their use of the randomized trial methodology…"; the 3ie and AfrEA conference Impact Evaluation for Improving Development, Cairo, March 2009; and Stern, E., N. Stame, J. Mayne, K. Forss, R. Davies and B. Befani (2012), Broadening the Range of Designs and Methods for Impact Evaluations, DFID Working Paper 38.]
121
Randomized control trials
and strong statistical
designs
When might they be
appropriate … and when not?
The arguments in favor of RCTs
1. Selection bias has resulted in wrongly attributing changes to the results of an intervention:
   a. Selection bias tends to over-estimate program impacts
   b. Agencies continue to support their favorite programs – even though they may be producing fewer benefits than claimed
2. A high percentage of evaluations use methodologically unsound methods, but funding agencies are rarely challenged
3. Agencies evaluate themselves and are able to ensure that the findings are never too critical.
123
The arguments in favor of RCTs
4. Isolating defined treatments and testing them in a systematically controlled way is an effective way to improve our understanding of the potential benefits of this specific intervention.
5. It has proved possible to use RCTs in a very wide range of contexts and sectors, and in many cases they have produced useful findings at the operational and policy levels.
124
The arguments in favor of RCTs
6. RCTs provide a reference point against which to assess the methodological rigor of other evaluation designs. A good way to keep evaluators honest.
125
Practical, political and ethical
limitations on the use of RCTs
1. It is only possible to use RCTs in a small proportion of evaluations:
• It is often not possible to identify a comparison group
• The goal of reaching as many people as possible does not permit the exclusion of certain people
• Programs are often implemented gradually over a period of time, and it is not possible to define a specific starting point.
126
Practical, political and ethical
limitations on the use of RCTs
2. Opposition to the random assignment of beneficiaries:
• Politicians do not wish to lose their control over who receives benefits
• Managers (and others) may wish to target the poorest or most vulnerable sectors and not give scarce resources to better-off groups
• Communities may not accept this approach
127
Practical, political and ethical
limitations on the use of RCTs
3. Ethical considerations:
• Withholding needed resources from people who need them
• Giving benefits to less needy groups while depriving the most needy
• Treating people as guinea pigs in a laboratory
128
Practical, political and ethical
limitations on the use of RCTs
The ethical arguments are not so clear and one-sided:
• Using weaker evaluation designs may result in poor programs continuing to be funded – thus depriving people of potentially higher benefits from new programs that could use these funds
• Resources are frequently limited, so some allocation criterion is required – what are the alternatives to randomization?
• Rigorous evaluations can improve program performance
129
Methodological limitations of
RCTs and strong QEDs
1. Can only evaluate a single component (intervention) of a multi-component program
2. Inflexible design – difficult to adjust the design to modifications of the program design or to changes in the context in which programs are implemented
3. Cannot control what happens during implementation
4. Regression only estimates the average effect size – difficult to generalize the findings to other contexts.
130
Methodological limitations of
RCTs and strong QEDs
5. Simple RCTs do not assess the implementation process, and when intended impacts are not found, the analysis cannot distinguish between "design failure" and "implementation failure".
131
Implications for RWE
1. RCTs are a potentially useful component of some kinds of evaluation – but should almost never be used in isolation. An RCT is strongest when part of a mixed-methods design.
2. Define whether a program is intended:
• as an experiment to rigorously assess promising new interventions that might be replicated on a large scale, OR
• to provide services to as many people as possible
132
Implications for RWE
3. If programs are designed as experiments, an RCT can be cheaper than many alternative survey designs.
4. Always ask the question of how non-experimental and weak QEDs can address the very serious issues of selection bias and the over-estimation of project impacts.
• Never automatically reject RCTs without assessing the important reasons why these methods were developed
• But never accept RCTs (or any other evaluation method) as the best or only approach
133
Implications for RWE
5. The real world is too complex for any single evaluation approach to fully address all of the important questions that must be answered.
134
So what should be included in a
“rigorous impact evaluation”?
1. Direct cause-effect relationship between one output (or a very limited number of outputs) and an outcome that can be measured by the end of the research project? Pretty clear attribution.
… OR …
2. Changes in higher-level indicators of sustainable improvement in the quality of life of people, e.g. the MDGs (Millennium Development Goals)? More significant, but much more difficult to assess direct attribution.
135
So what should be included in a
“rigorous impact evaluation”?
OECD-DAC (2002: 24) defines impact as “the positive and
negative, primary and secondary long-term effects
produced by a development intervention, directly or
indirectly, intended or unintended. These effects can be
economic, sociocultural, institutional, environmental,
technological or of other types”.
Does it mention or imply direct attribution? Or point to the
need for counterfactuals or Randomized Control Trials
(RCTs)?
136
Different lenses needed for different situations in the RealWorld

Simple (following a recipe): recipes are tested to assure easy replication; the best recipes give good results every time.
Complicated (sending a rocket to the moon): sending one rocket to the moon increases assurance that the next will also be a success; there is a high degree of certainty of outcome.
Complex (raising a child): raising one child provides experience but is no guarantee of success with the next; uncertainty of outcome remains.

Sources: Westley et al (2006) and Stacey (2007), cited in Patton 2008; also presented by Patricia Rogers at the Cairo impact conference, 2009.
137
Examples of cause-effect correlations
that are generally accepted
• Vaccinating young children with a standard
set of vaccinations at prescribed ages leads to
reduction of childhood diseases (means of
verification involves viewing children’s health charts,
not just total quantity of vaccines delivered to clinic)
•
Other examples … ?
138
But look at examples of what kinds of interventions
have been “rigorously tested” using RCTs
• Conditional cash transfers
• The use of visual aids in Kenyan schools
• Deworming children (as if that’s all that’s needed
to enable them to get a good education)
• Note that this kind of research is based on
the quest for Silver Bullets – simple, most
cost-effective solutions to complex problems.
139
"Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise."
J. W. Tukey (1962, page 13), "The future of data analysis". Annals of Mathematical Statistics 33(1), pp. 1-67.
Quoted by Patricia Rogers, RMIT University
140
Is that what we call “scientific method”?
There is much more to impact, to rigor,
and to “the scientific method” than
RCTs. Serious impact evaluations
require a more holistic approach.
141
[Diagram: the results tree again – a DESIRED IMPACT supported by outcomes 1-3 and outputs 2.1-2.3, with interventions 2.2.1-2.2.3 below. A simple RCT tests only one intervention-to-output link; a more comprehensive design covers the whole tree.]
The limited use of strong
evaluation designs
In the RealWorld (at least of international development programs) we estimate that:
• fewer than 5%-10% of project impact evaluations use strong experimental or even quasi-experimental designs
• significantly less than 5% of impact evaluations use randomized control trials ('pure' experimental designs)
143
What kinds of evaluation designs are actually used in the real world of international development? Findings from meta-evaluations of 336 evaluation reports of an INGO:
• Post-test only: 59%
• Before-and-after: 25%
• With-and-without: 15%
• Other counterfactual: 1%
Rigorous impact evaluation should
include (but is not limited to):
1) thorough consultation with and
involvement by a variety of stakeholders,
2) articulating a comprehensive logic model
that includes relevant external influences,
3) getting agreement on desirable ‘impact
level’ goals and indicators,
4) adapting evaluation design as well as data
collection and analysis methodologies to
respond to the questions being asked, …
Rigorous impact evaluation should
include (but is not limited to):
5) adequately monitoring and
documenting the process throughout the
life of the program being evaluated,
6) using an appropriate combination of
methods to triangulate evidence being
collected,
7) being sufficiently flexible to account
for evolving contexts, …
Rigorous impact evaluation should
include (but is not limited to):
8) using a variety of ways to determine
the counterfactual,
9) estimating the potential sustainability
of whatever changes have been
observed,
10) communicating the findings to
different audiences in useful ways,
11) etc. …
The point is that the list of
what’s required for ‘rigorous’
impact evaluation goes way
beyond initial randomization
into treatment and ‘control’
groups.
We must be careful that in using the
“Gold Standard”
we do not violate the “Golden Rule”:
“Judge not that you not be judged!”
In other words:
“Evaluate others as you would have
them evaluate you.”
Caution: Too often what is called Impact Evaluation is based
on a “we will examine and judge you” paradigm. When we
want our own programs evaluated we prefer a more holistic
approach.
How much more helpful it is when the approach to
evaluation is more like holding up a mirror to help people
reflect on their own reality: facilitated self-evaluation.
The bottom line is defined by this
question:
Are our programs making plausible
contributions towards positive
impact on the quality of life of our
intended beneficiaries?
Let’s not forget them!
153
ANY QUESTIONS?
154
Time for consultancy
teams to meet with
clients to negotiate the
revised ToRs for the
evaluation of the
housing project.
155
What did you learn about RealWorld Evaluation by getting involved in that case study and role-playing exercise?
156
Next steps: What are some of the practical
realities of applying RWE approaches in our
own work?
157
Main workshop messages
1. Evaluators must be prepared for RealWorld evaluation challenges.
2. There is considerable experience to learn from.
3. A toolkit of practical "RealWorld" evaluation techniques is available (see www.RealWorldEvaluation.org).
4. Never use time and budget constraints as an excuse for sloppy evaluation methodology.
5. A "threats to validity" checklist helps keep you honest by identifying potential weaknesses in your evaluation design and analysis.
6. The evaluation report should disclose what constraints were faced and what was done to address them.
158
Additional References for IE
DFID, "Broadening the range of designs and methods for impact evaluations": http://www.oecd.org/dataoecd/0/16/50399683.pdf
Robert Picciotto, "Experimentalism and development evaluation: Will the bubble burst?" in Evaluation journal (EES), April 2012: http://evi.sagepub.com/
Martin Ravallion, "Should the Randomistas Rule?" (Economists' Voice, 2009): http://ideas.repec.org/a/bpj/evoice/v6y2009i2n6.html
William Easterly, "Measuring How and Why Aid Works—or Doesn't": http://aidwatchers.com/2011/05/controlled-experiments-and-uncontrollablehumans/
"Control freaks: Are 'randomised evaluations' a better way of doing aid and development policy?" The Economist, June 12, 2008: http://www.economist.com/node/11535592
Series of guidance notes (and webinars) on impact evaluation produced by InterAction (a consortium of US-based INGOs): http://www.interaction.org/impact-evaluation-notes
Evaluation (EES journal), Special Issue on Contribution Analysis, Vol. 18, No. 3, July 2012: www.evi.sagepub.com
Impact Evaluation in Practice: www.worldbank.org/ieinpractice
159
Unanticipated consequences of
development interventions:
a blind spot for evaluation theory
and practice
[and if time permits]
A few thoughts on evaluating
complex development interventions
Michael Bamberger
1. Development programs almost
never work out as planned
The outcomes of most programs are unpredictable
•
•
Programs propose simple, linear solutions to complex
problems
Even when programs go smoothly and are implemented as
planned – human behavior is difficult to predict
Programs operate in complex systems with constantly changing economic, political, security, natural-environmental and socio-cultural conditions
Program planners have much less control than they
often think
161
The scope of unanticipated outcomes
increases as we move from programs to
complex development interventions -
• such as Country Programs or multi-donor
collaborative initiatives
162
2. So why are evaluators still
surprised when programs have
unanticipated outcomes?
Unanticipated outcomes still catch
planners and evaluators off-guard even
though experience has shown that very
few programs work out exactly as
planned
Many unanticipated outcomes, including
serious negative consequences, are not
even detected by many evaluations
163
Some serious negative outcomes
not captured by the evaluation
A food-for-work program in Central America:
• Many women were forbidden by their husbands from participating in the project
• Some women were seriously beaten by their husbands for attending meetings
A slum upgrading project in South East Asia:
• Prior to the official start of the project, many slum dwellers were forced (sometimes at gunpoint) to sell their houses at a very low price to people with political contacts
A road construction project in East Africa:
• Prior to the first supervision mission for a road construction project, the government destroyed a number of houses to avoid paying compensation to the families as specified in the loan agreement
164
3. Why are evaluations often
unable to detect these
unanticipated outcomes?
Funders only want evaluators to look at
whether their programs achieve their
intended outcomes/ goals
• Don’t rock the boat
Real-world evaluation constraints:
• Budget, time, data constraints
• Political and organizational constraints
165
Many evaluation designs make it difficult
to identify unanticipated outcomes
Many theories of change and logic models only
explain and assess how a program is expected
to achieve its objectives
•
They say very little about other things that might
happen
Results frameworks only monitor intended
outcomes
Conventional quantitative evaluation designs
only assess whether observed changes can be
attributed to the program intervention
166
Many evaluation designs make it difficult
to identify unanticipated outcomes, cont.
Structured surveys collect information
relevant to program objectives but rarely
provide an opportunity for respondents
to talk about other things on their mind
Difficult to find/interview vulnerable
groups and those who do not benefit
Focus groups often only cover people
involved in the project
167
Many development evaluations focus too
much on the internal concerns and
procedures of their own agency
• And do not focus sufficiently on the national
development context and the concerns of
national stakeholders.
168
Many unanticipated events occur outside
the vision field of the evaluation
Key events occur before the evaluation begins
(e.g. people are forced to sell land or houses at
bargain prices)
Events occur when the evaluator is not there (at
night, during the rainy season, in the privacy of
the household)
Critical events occur after the evaluation has
ended
Many quantitative evaluations have no contact
with the program between the two applications
of the survey instruments (pre-test and post-test)
169
Many unanticipated events occur outside
the vision field of the evaluation, cont.
Evaluators often ignore context
Budget constraints mean that the
poorest and most vulnerable groups who
live far from the village center, often
cannot be interviewed
170
4. Why should we worry?
Unanticipated events:
• Seriously reduce the efficiency and effectiveness of programs
• Important lessons are not learned
• The vulnerable and voiceless tend to be the most affected
• Some groups or communities can be significantly worse off as a result of development interventions
171
5. What can be done?
This section draws on and expands the
framework presented in:
Jonathan Morell “Evaluation in the Face
of Uncertainty: Anticipating Surprise
and Responding to the Inevitable”
Guilford 2010
172
What can be done? Understanding the nature of the unexpected
Types of surprise:
• Surprises we anticipate and address
• Surprises that could be planned for and detected
• Surprises that cannot be anticipated
The probability and magnitude of surprise is related to:
• The level of innovation in the program [uncharted territory]
• The nature of the intervention
• Fidelity to the design protocol
• Robustness of the program in different settings
• Level of understanding of the program context
• Time between the intervention and the estimation of outcomes
173
What can be done? Ways to foresee and address unanticipated outcomes
• Use of theory-based evaluation, with a focus on mechanisms, linkages and testing assumptions
• Systematic use of past experience
• Participatory consultations with affected populations
• Break the evaluation down into shorter segments
• Adapt forecasting methods used by planners
• Broader definition of the role of monitoring
174
What can be done? Agile evaluation
• Always have a "Plan B" [see next slide]
• Flexibility in terms of data collection
• Flexible designs that can be rapidly adapted to changing circumstances
• Dynamic monitoring
• Unpacking the program into separate, but closely linked, components – each of which is easier to evaluate
• Constant review of the program theory and the underlying assumptions
175
“Plan B”
Planning for the unexpected
What to do when:
• the program begins before the baseline study has been conducted?
• the comparison groups vanish?
• the program is restructured?
• essential data is not available? (It may be intentionally withheld.)
176
“Eyes wide open” evaluation
Knowing what is really happening on the ground
• Find out about critical events that took place before the evaluation/program officially began
• Develop diverse sources inside and outside the program to make sure you know what everyone else in the country knows
• Check who you are getting information from and who you are not meeting; know who participates in focus groups
• Develop creative ways to get information on, and about, non-beneficiaries and groups who are worse off
177
A few thoughts on the
evaluation of complex programs
The dimensions of complexity:
1. Scope and scale
2. Organizational complexity
3. Contextual factors
4. Behavioral
5. Challenges assessing attribution, contribution and substitution
6. Some interventions are less complex than claimed
178
1. Scope and scale
a. Number of components and service levels
b. Number and types of geographic areas
c. Larger programs tend to have more features and to be implemented differently in different regions
179
2. Organizational complexity
a. Programs co-funded by different agencies tend to be more complex:
• Different implementation strategies and management procedures
• Different components/services
• Coordination problems
b. Programs with several coordinating and implementing agencies
c. National and regional level programs have more levels of coordination
d. National agencies may lack implementation capacity
180
3. Contextual factors affecting complexity
a. Economic setting
b. Political context
c. Administrative and organizational
d. Climatic and ecological
e. Legal and regulatory
f. Historical
g. Socio-cultural
181
4. Behavioral
a. "Simple" projects may involve complex processes of behavioral change
b. Understanding group behavior
c. Social networks and social communication
182
5. Attribution, contribution and substitution
a. Attribution difficult to assess for complex programs:
• Different donors
• Target groups exposed to other programs
• Difficult to define a counterfactual
b. Contribution analysis:
• Conventional CA only focuses on process, not outcomes
c. Substitution:
• Difficult to track donor funds and other inputs
• Even more difficult to track substitution of donor for government resources
183
6. Interventions that look "complex" may not be
a. "Complex" interventions may comprise a number of "simple" components; coordination and management may be the only "complex" dimensions
b. Some agencies may find it convenient to claim their programs are too complex to be rigorously evaluated – to avoid critical scrutiny.
184
STRATEGIES FOR EVALUATING COMPLEX PROGRAMS
[Diagram: theory-based, qualitative, quantitative and mixed-method approaches, plus rating scales, feed two streams: (1) counterfactual designs – attribution analysis, contribution analysis and substitution analysis; and (2) strengthening alternative counterfactuals – "unpacking" complex programs, portfolio analysis, reconstructing baseline data, creative use of secondary data, and triangulation. These support estimating impacts, the value added of agency X, and the net increase in resources for a program.]
185