RealWorld Evaluation
Designing Evaluations under Budget, Time, Data
and Political Constraints
ReLAC conference
professional pre-session workshop
San Jose, Costa Rica
26 July 2010
Facilitated by
Jim Rugh
Note: This PowerPoint presentation and the summary
chapter of the book are available at:
www.RealWorldEvaluation.org
1
Workshop Objectives
1. The seven steps of the RealWorld Evaluation approach for addressing common issues and constraints faced by evaluators, such as: when the evaluator is not called in until the project is nearly completed and there was no baseline or comparison group; when the evaluation must be conducted with an inadequate budget and insufficient time; and when there are political pressures and expectations for how the evaluation should be conducted and what the conclusions should say;
2
Workshop Objectives
2. Defining what impact evaluation should be;
3. Identifying and assessing various design options that could be used in a particular evaluation setting;
4. Ways to reconstruct baseline data when the evaluation does not begin until the project is well advanced or completed;
5. How to minimize threats to validity or adequacy by utilizing appropriate combinations of quantitative and qualitative approaches (i.e. mixed methods) with reference to the specific context of RealWorld evaluations.
3
Workshop Objectives
Note: This workshop will focus on project-level impact evaluations. There are, of
course, many other purposes, scopes,
evaluands and types of evaluations. Some
of these methods may apply to them, but our
examples will be based on project impact
evaluations, most of them in the context of
developing countries.
4
Workshop agenda
1. Introduction [10 minutes]
2. Brief summary of the RealWorld Evaluation (RWE) approach [30 minutes]
3. Small group self-introductions and sharing of RWE issues you have faced in
your own practice. [30 minutes]
4. RWE Steps 1, 2 and 3: Scoping the evaluation and strategies for addressing
budget and time constraints [75 minutes]
--- short break [15 minutes] ---
5. RWE Step 4: Addressing data constraints [30 minutes]
6. Small groups read their case studies and begin discussions [30 minutes]
--- lunch [60 minutes] ---
7. Quantitative, qualitative and mixed methods [20 minutes]
8. Small groups complete preparation of their case study ToR exercises [30
minutes]
9. Paired groups negotiate their ToRs [60 minutes]
10. Feedback from exercise [15 minutes]
11. Wrap-up discussion, evaluation of the workshop [30 minutes]
RealWorld Evaluation
Designing Evaluations under Budget,
Time, Data and Political Constraints
OVERVIEW OF THE
RWE APPROACH
6
RealWorld Evaluation Scenarios
Scenario 1: Evaluator(s) not brought in until near the end of the project
For political, technical or budget reasons:
• There was no life-of-project evaluation plan
• There was no baseline survey
• Project implementers did not collect adequate data on project participants at the beginning or during the life of the project
• It is difficult to collect data on comparable control groups
7
RealWorld Evaluation Scenarios
Scenario 2: The evaluation team is called in early in the life of the project
But for budget, political or methodological reasons:
• The ‘baseline’ was a needs assessment, not comparable to eventual evaluation
• It was not possible to collect baseline data on a comparison group
8
Reality Check – Real-World
Challenges to Evaluation
• All too often, project designers do not think evaluatively – evaluation not designed until the end
• There was no baseline – at least not one with data comparable to evaluation
• There was/can be no control/comparison group
• Limited time and resources for evaluation
• Clients have prior expectations for what they want evaluation findings to say
• Many stakeholders do not understand evaluation; distrust the process; or even see it as a threat (dislike of being judged)
9
RealWorld Evaluation
Quality Control Goals
• Achieve maximum possible evaluation rigor within the limitations of a given context
• Identify and control for methodological weaknesses in the evaluation design
• Negotiate with clients trade-offs between desired rigor and available resources
• Presentation of findings must acknowledge methodological weaknesses and how they affect generalization to broader populations
10
The Need for the RealWorld
Evaluation Approach
As a result of these kinds of constraints, many of the basic principles of rigorous impact evaluation design (comparable pre-test/post-test design, control group, adequate instrument development and testing, random sample selection, control for researcher bias, thorough documentation of the evaluation methodology, etc.) are often sacrificed.
11
The RealWorld Evaluation Approach
An integrated approach to
ensure acceptable standards
of methodological rigor while
operating under real-world
budget, time, data and
political constraints.
See the RealWorld Evaluation book, or at least the summary chapter, for more details.
12
The RealWorld Evaluation
approach
• Developed to help evaluation practitioners and clients: managers, funding agencies and external consultants
• Still a work in progress (we continue to learn more through workshops like this)
• Originally designed for developing countries, but equally applicable in industrialized nations
13
Special Evaluation Challenges in
Developing Countries
• Unavailability of needed secondary data
• Scarce local evaluation resources
• Limited budgets for evaluations
• Institutional and political constraints
• Lack of an evaluation culture (though evaluation associations are addressing this)
• Many evaluations are designed by and for external funding agencies and seldom reflect local and national stakeholder priorities
14
Expectations for “rigorous” evaluations
Despite these challenges, there is a
growing demand for methodologically
sound evaluations which assess the
impacts, sustainability and replicability of
development projects and programs.
(We’ll be talking more about that later.)
15
Most RealWorld Evaluation tools are not
new— but promote a holistic, integrated
approach
• Most of the RealWorld Evaluation data collection and analysis tools will be familiar to experienced evaluators.
• What we emphasize is an integrated approach which combines a wide range of tools adapted to produce the best quality evaluation under RealWorld constraints.
16
What is Special About the
RealWorld Evaluation Approach?
• There is a series of steps, each with checklists for identifying constraints and determining how to address them
• These steps are summarized on the following slide and then in the more detailed flow-chart …
17
The Steps of the RealWorld
Evaluation Approach
Step 1: Planning and scoping the evaluation
Step 2: Addressing budget constraints
Step 3: Addressing time constraints
Step 4: Addressing data constraints
Step 5: Addressing political constraints
Step 6: Assessing and addressing the strengths and
weaknesses of the evaluation design
Step 7: Helping clients use the evaluation
18
The Real-World Evaluation Approach
Step 1: Planning and scoping the evaluation
A. Defining client information needs and understanding the political context
B. Defining the program theory model
C. Identifying time, budget, data and political constraints to be addressed by the RWE
D. Selecting the design that best addresses client needs within the RWE constraints
Step 2: Addressing budget constraints
A. Modify evaluation design
B. Rationalize data needs
C. Look for reliable secondary data
D. Revise sample design
E. Economical data collection methods

Step 3: Addressing time constraints
All Step 2 tools plus:
F. Commissioning preparatory studies
G. Hire more resource persons
H. Revising format of project records to include critical data for impact analysis
I. Modern data collection and analysis technology

Step 4: Addressing data constraints
A. Reconstructing baseline data
B. Recreating comparison groups
C. Working with non-equivalent comparison groups
D. Collecting data on sensitive topics or from difficult to reach groups
E. Multiple methods

Step 5: Addressing political influences
A. Accommodating pressures from funding agencies or clients on evaluation design
B. Addressing stakeholder methodological preferences
C. Recognizing influence of professional research paradigms

Step 6: Assessing and addressing the strengths and weaknesses of the evaluation design
An integrated checklist for multi-method designs:
A. Objectivity/confirmability
B. Replicability/dependability
C. Internal validity/credibility/authenticity
D. External validity/transferability/fittingness

Step 7: Helping clients use the evaluation
A. Utilization
B. Application
C. Orientation
D. Action
19
TIME FOR SMALL
GROUP DISCUSSION
20
1. Self-introductions
2. What constraints of
these types have you
faced in your evaluation
practice?
3. How did you cope with
them?
21
RealWorld Evaluation
Designing Evaluations under Budget,
Time, Data and Political Constraints
Step 1
PLANNING AND SCOPING THE
EVALUATION
22
Step 1: Planning and Scoping the
Evaluation
• Understanding client information needs
• Defining the program theory model
• Preliminary identification of constraints to be addressed by the RealWorld Evaluation
23
A. Understanding client information
needs
Typical questions clients want answered:
• Is the project achieving its objectives?
• Is it having desired impact?
• Are all sectors of the target population benefiting?
• Will the results be sustainable?
• Which contextual factors determine the degree of success or failure?
24
A. Understanding client information
needs
A full understanding of client information
needs can often reduce the types of
information collected and the level of
detail and rigor necessary.
However, this understanding could also
increase the amount of information
required!
25
B. Defining the program theory
model
All programs are based on a set of assumptions (hypotheses) about how the project’s interventions should lead to desired outcomes.
• Sometimes this is clearly spelled out in project documents.
• Sometimes it is only implicit and the evaluator needs to help stakeholders articulate the hypotheses through a logic model.
26
B. Defining the program theory
model
• Defining and testing critical assumptions are essential (but often ignored) elements of program theory models.
• The following is an example of a model to assess the impacts of microcredit on women’s social and economic empowerment.
27
Critical logic chain hypothesis for a
Gender-Inclusive Micro-Credit Program
Sustainability
• Structural changes will lead to long-term impacts.
Medium/long-term impacts
• Increased women’s economic and social empowerment.
• Economic and social welfare of women and their families will improve.
Short-term outcomes
• If women obtain loans they will start income-generating activities.
• Women will be able to control the use of loans and reimburse them.
Outputs
• If credit is available women will be willing and able to obtain loans and technical assistance.
28
[Diagram: a generic problem tree and results chain. On the problem side, a central PROBLEM with primary, secondary and tertiary causes below it and consequences above it; on the results side, interventions lead to outputs, outcomes and a DESIRED IMPACT with its consequences.]
[Diagram: example logic for a girls’ education program. Schools built, curriculum improved, the school system hiring and paying teachers, parents persuaded to send girls to school, improved educational policies, economic opportunities for women and women in leadership roles lead to increased female enrollment rates, young women educated, women empowered, and ultimately a reduction in poverty.]
To have synergy and achieve impact all of these need to address the same target population.
[Diagram: one program goal at impact level (young women educated) served by several projects: an advocacy project (goal: improved educational policies enacted), a construction project (goal: more classrooms built) and a teacher education project (goal: improve quality of curriculum). One of these is OUR project, another will be done by a PARTNER, and there is an ASSUMPTION that others will do the rest.]
What does it take to measure indicators at each level?
Impact: Population-based survey (baseline, endline evaluation)
Outcome: Change in behavior of participants (can be surveyed annually)
Output: Measured and reported by project staff (annually)
Activities: On-going (monitoring of interventions)
Inputs: On-going (financial accounts)
We need to recognize which evaluative process is most appropriate for measurement at various levels:
• Impact
• Outcomes
• Output
• Activities
• Inputs
[Diagram brackets these levels into IMPACT EVALUATION, PROJECT EVALUATION and PERFORMANCE MONITORING.]
One form of Program Theory (Logic) Model
[Diagram: Design → Inputs → Implementation Process → Outputs → Outcomes → Impacts → Sustainability, set within the economic context in which the project operates, the institutional and operational context, the political context, and the socio-economic and cultural characteristics of the affected populations.]
Note: The orange boxes are included in conventional Program Theory Models. The addition of the blue boxes provides the recommended more complete analysis.
35
36
Expanding the results chain for multi-donor, multi-component program
Impacts: increased rural household income; increased political participation; improved education performance; improved health
Intermediate outcomes: increased production; access to off-farm employment; increased school enrolment; increased use of health services
Outputs: credit for small farmers; rural roads; schools; health services
Inputs: donor, government and other donors
Attribution gets very difficult! Consider plausible contributions each makes.
Education Intervention Logic
[Diagram: output clusters (institutional management; education facilities; curricula & teaching materials; teacher recruitment & training) lead to outcomes (better allocation of educational resources; increased affordability of education; quality of education; equitable access to education), to the specific impact of skills and learning enhancement, to intermediate impacts (greater income opportunities; optimal employment; improved participation in society; improved family planning & health awareness), and to global impacts (economic growth; poverty reduction; social development; health), linked to MDGs 1, 2 and 3.]
Source: OECD/DAC Network on Development Evaluation
So what should be included in a
“rigorous impact evaluation”?
1. Direct cause-effect relationship between one output (or a very limited number of outputs) and an outcome that can be measured by the end of the research project? → Pretty clear attribution.
… OR …
2. Changes in higher-level indicators of sustainable improvement in the quality of life of people, e.g. the MDGs (Millennium Development Goals)? → More significant but much more difficult to assess direct attribution.
39
So what should be included in a
“rigorous impact evaluation”?
OECD-DAC (2002: 24) defines impact as “the positive and
negative, primary and secondary long-term effects
produced by a development intervention, directly or
indirectly, intended or unintended. These effects can be
economic, sociocultural, institutional, environmental,
technological or of other types”.
Does it mention or imply direct attribution? Or point to the
need for counterfactuals or Randomized Control Trials
(RCTs)?
40
Coming to agreement on what levels of the
logic model to include in evaluation
• This can be a sensitive issue: Project staff generally don’t like to be held accountable for more than the output level, while donors (and intended beneficiaries) may insist on evaluating higher-level outcomes.
• An approach evaluators might take is that if the correlation between intermediary effects (outcomes) and impact has been adequately established through research and previous evaluations, then assessing intermediary outcome-level indicators might suffice, as long as the contexts can be shown to be sufficiently similar to where such cause-effect correlations have been tested.
41
Definition of program evaluation
Program evaluation is the systematic collection of
information about the activities, characteristics and results
of a program to make judgments about the program,
improve or further develop program effectiveness, inform
decisions about future programming, and/or increase
understanding.
-- Michael Quinn Patton, Utilization-Focused Evaluation, 4th edition, 2008, page 39
42
Some of the purposes for program evaluation
• Formative: learning and improvement including early identification of possible problems
• Knowledge generating: identify cause-effect correlations and generic principles about effectiveness
• Accountability: to demonstrate that resources are used efficiently to attain desired results
• Summative judgment: to determine value and future of program
• Developmental evaluation: adaptation in complex, emergent and dynamic conditions
-- Michael Quinn Patton, Utilization-Focused Evaluation, 4th edition, pages 139-140
43
Determining appropriate (and
feasible) evaluation design
Based on the main purpose for conducting an evaluation, an understanding of client information needs, required level of rigor, and what is possible given the constraints, the evaluator and client need to determine what evaluation design is required and possible under the circumstances.
44
Some of the considerations
pertaining to evaluation design
1. When evaluation events take place (baseline, midterm, endline)
2. Review different evaluation designs (experimental, quasi-experimental, other)
3. Levels of rigor
4. Qualitative & quantitative methods
5. A life-of-project evaluation design perspective
45
An introduction to various evaluation designs
Illustrating the need for quasi-experimental longitudinal time series evaluation design
[Graph: the scale of a major impact indicator plotted for project participants and a comparison group at baseline, end-of-project evaluation and post-project evaluation.]
46
OK, let’s stop the action to
identify each of the major
types of evaluation (research)
design …
… one at a time, beginning with the
most rigorous design.
47
First of all: the key to the traditional symbols:
• X = Intervention (treatment), i.e. what the project does in a community
• O = Observation event (e.g. baseline, mid-term evaluation, end-of-project evaluation)
• P (top row): Project participants
• C (bottom row): Comparison (control) group
Note: the 7 RWE evaluation designs are laid out on page 41 of the Condensed Overview of the RealWorld Evaluation book
48
Design #1: Longitudinal Quasi-experimental
Project participants: P1   X   P2   X   P3   P4
Comparison group:     C1       C2       C3   C4
(observations at baseline, midterm, end-of-project evaluation and post-project evaluation)
49
Design #2: Quasi-experimental (pre+post, with comparison)
Project participants: P1   X   P2
Comparison group:     C1       C2
(observations at baseline and end-of-project evaluation)
50
Design #2+: Randomized Control Trial
Project participants: P1   X   P2
Control group:        C1       C2
Research subjects are randomly assigned either to the project or to the control group.
(observations at baseline and end-of-project evaluation)
51
Design #3: Truncated Longitudinal
Project participants: X   P1   X   P2
Comparison group:          C1       C2
(observations at midterm and end-of-project evaluation)
52
Design #4: Pre+post of project; post-only comparison
Project participants: P1   X   P2
Comparison group:              C
(project observed at baseline and end-of-project evaluation; comparison group observed only at end of project)
53
Design #5: Post-test only of project and comparison
Project participants: X   P
Comparison group:         C
(observation at end-of-project evaluation only)
54
Design #6: Pre+post of project; no comparison
Project participants: P1   X   P2
(observations at baseline and end-of-project evaluation)
55
Design #7: Post-test only of project participants
Project participants: X   P
(observation at end-of-project evaluation only)
56
Summary table of the seven RWE evaluation designs:

Design | T1 (baseline) | X (intervention) | T2 (midterm) | X (intervention, cont.) | T3 (endline) | T4 (ex-post)
1      | P1, C1        | X                | P2, C2       | X                       | P3, C3       | P4, C4
2      | P1, C1        | X                |              |                         | P2, C2       |
3      |               | X                | P1, C1       | X                       | P2, C2       |
4      | P1            | X                |              |                         | P2, C        |
5      |               | X                |              |                         | P, C         |
6      | P1            | X                |              |                         | P2           |
7      |               | X                |              |                         | P            |
57
Attribution and counterfactuals
How do we know if the observed changes in the project participants or communities
• income, health, attitudes, school attendance, etc.
are due to the implementation of the project
• credit, water supply, transport vouchers, school construction, etc.
or to other unrelated factors?
• changes in the economy, demographic movements, other development programs, etc.
58
The Counterfactual
What change would have occurred in the relevant condition of the target population if there had been no intervention by this project?
59
Where is the counterfactual?
After families had been living in a new housing project for 3 years, a study found average household income had increased by 50%.
Does this show that housing is an effective way to raise income?
60
Comparing the project with two possible comparison groups
[Graph: household income (scale 250–750) plotted from 2004 to 2009. Project group: 50% increase. Scenario 1: no increase in comparison group income – potential evidence of project impact. Scenario 2: 50% increase in comparison group income – no evidence of project impact.]
(A simple numeric reading of the two scenarios follows below.)
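One hedged way to put numbers on the two scenarios above is a simple difference-in-differences calculation. The sketch below is illustrative only: the 500-to-750 figures are read off the graph, and the `diff_in_diff` helper is a hypothetical name, not something taken from the RWE materials.

```python
# Illustrative sketch: reading the two comparison-group scenarios above as a
# difference-in-differences. The income figures mirror the graph (project group
# rising from roughly 500 to 750); the framing is an assumption, not a method
# prescribed by the slides.
def diff_in_diff(project_before: float, project_after: float,
                 comparison_before: float, comparison_after: float) -> float:
    """Change observed in the project group minus change in the comparison group."""
    return (project_after - project_before) - (comparison_after - comparison_before)

print(diff_in_diff(500, 750, 500, 500))  # Scenario 1: 250 -> potential evidence of impact
print(diff_in_diff(500, 750, 500, 750))  # Scenario 2: 0 -> no evidence of project impact
```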
Control group and comparison group
• Control group = randomized allocation of subjects to project and non-treatment group
• Comparison group = separate procedure for sampling project and non-treatment groups that are as similar as possible in all aspects except the treatment (intervention)
62
Some recent developments in impact evaluation in development
[Timeline, 2003–2010, of recent initiatives in impact evaluation for development.]
“J-PAL is best understood as a network of affiliated researchers … united by their use of the randomized trial methodology…”
63
So, is Jim saying that Randomized Control Trials
(RCTs) are the Gold Standard and should be used
in most if not all program impact evaluations?
Yes or no?
Why or why not?
If so, under what circumstances
should they be used?
If not, under what circumstances
would they not be appropriate?
64
Evidence-based policy for simple interventions (or
simple aspects): when RCTs may be appropriate
Question needed for evidence-based policy → What works?
What interventions look like → Discrete, standardized intervention
How interventions work → Pretty much the same everywhere
Process needed for evidence uptake → Knowledge transfer
Adapted from Patricia Rogers, RMIT University
65
When might rigorous evaluations of higher-level “impact” indicators not be needed?
• Complicated, complex programs where there are multiple interventions by multiple actors
• Projects working in evolving contexts (e.g. conflicts, natural disasters)
• Projects with multiple layered logic models, or unclear cause-effect relationships between outputs and higher level “vision statements” (as is often the case in the RealWorld of international development projects)
66
When might rigorous evaluations of higher-level “impact” indicators not be needed?
• An approach evaluators might take is that if the correlation between intermediary effects (outcomes) and higher-level impact has been adequately established through research and previous evaluations, then assessing intermediary outcome-level indicators might suffice, as long as the contexts (internal and external conditions) can be shown to be sufficiently similar to where such cause-effect correlations have been tested.
67
Examples of cause-effect correlations
that are generally accepted
• Vaccinating young children with a standard
set of vaccinations at prescribed ages leads to
reduction of childhood diseases (means of
verification involves viewing children’s health charts,
not just total quantity of vaccines delivered to clinic)
• Other examples … ?
68
Different lenses needed for different
situations in the RealWorld
Simple (following a recipe): recipes are tested to assure easy replication; the best recipes give good results every time.
Complicated (sending a rocket to the moon): sending one rocket to the moon increases assurance that the next will also be a success; there is a high degree of certainty of outcome.
Complex (raising a child): raising one child provides experience but is no guarantee of success with the next; uncertainty of outcome remains.
Sources: Westley et al (2006) and Stacey (2007), cited in Patton 2008; also presented by Patricia Rogers at the Cairo impact conference 2009.
69
“Far better an approximate answer to
the right question, which is often vague,
than an exact answer to the wrong
question, which can always be made
precise.“
J. W. Tukey (1962, page 13), "The future of data analysis".
Annals of Mathematical Statistics 33(1), pp. 1-67.
Quoted by Patricia Rogers, RMIT University
70
There can be validity problems with RCTs
• Internal validity
Quality issues – poor measurement, poor adherence to randomisation, inadequate statistical power, ignored differential effects, inappropriate comparisons, fishing for statistical significance, differential attrition between control and treatment groups, treatment leakage, unplanned cross-over, unidentified poor quality implementation
Other issues – random error, contamination from other sources, need for a complete causal package, lack of blinding
• External validity
Effectiveness in real world practice, transferability to new situations
Patricia Rogers, RMIT University
71
The limited use of strong
evaluation designs
In the RealWorld (at least of international development programs) we estimate that:
• fewer than 5%-10% of project impact evaluations use a strong experimental or even quasi-experimental design
• significantly fewer than 5% use randomized control trials (‘pure’ experimental design)
72
Consider the RealWorld of programs
to be evaluated as a giant puzzle
Experimental research (evaluation) designs,
much less RCTs, are only appropriate for a few
pieces of that giant puzzle.
That’s why good evaluators (and those who
commission evaluations) need a bigger toolbox,
with a more diverse set of tools to be customized
when designing evaluations that respond to
different purposes and circumstances.
73
There are other methods for
assessing the counterfactual
• Reliable secondary data that depicts relevant trends in the population
• Longitudinal monitoring data (if it includes non-reached population)
• Qualitative methods to obtain perspectives of key informants, participants, neighbors, etc.
• We’ll talk more about this later
74
Still part of Step 1: Other questions to
answer as you customize an
evaluation Terms of Reference (ToR):
1. Who asked for the evaluation? (Who are the key stakeholders?)
2. What are the key questions to be answered?
3. Will this be a formative or summative evaluation?
4. Will there be a next phase, or other projects designed based on the findings of this evaluation?
75
Other questions to answer as
you customize an evaluation
ToR:
5. What decisions will be made in response to the findings of this evaluation?
6. What is the appropriate level of rigor?
7. What is the scope / scale of the evaluation / evaluand (thing to be evaluated)?
8. How much time will be needed / available?
9. What financial resources are needed / available?
76
Other questions to answer as
you customize an evaluation
ToR:
10. Should the evaluation rely mainly on quantitative or qualitative methods?
11. Should participatory methods be used?
12. Can / should there be a household survey?
13. Who should be interviewed?
14. Who should be involved in planning / implementing the evaluation?
15. What are the most appropriate media for communicating the findings to different stakeholder audiences?
77
Evaluation (research) design?
Key questions?
Evaluand (what to evaluate)?
Qualitative?
Quantitative?
Scope?
Appropriate level of rigor?
Resources available?
Time available?
Skills available?
Participatory?
Extractive?
Evaluation FOR whom?
Does this help, or just confuse things more? Who said evaluations (like life) would be easy?!!
78
Before we return to
the RealWorld steps,
let’s gain a
perspective on levels
of rigor, and what a
life-of-project
evaluation plan could
look like
79
Different levels of rigor
(depends on source of evidence, level of confidence, and use of information)
Objective, high precision – but requiring more time & expense
Level 5: A very thorough research project is undertaken to conduct in-depth analysis of the situation; P = +/- 1%. Book published!
Level 4: Good sampling and data collection methods used to gather data that is representative of the target population; P = +/- 5%. Decision maker reads full report.
Level 3: A rapid survey is conducted on a convenient sample of participants; P = +/- 10%. Decision maker reads 10-page summary of report.
Level 2: A fairly good mix of people are asked their perspectives about the project; P = +/- 25%. Decision maker reads at least the executive summary of the report.
Level 1: A few people are asked their perspectives about the project; P = +/- 40%. Decision made in a few minutes.
Level 0: Decision-maker’s impressions based on anecdotes and sound bites heard during brief encounters (hallway gossip), mostly intuition; level of confidence +/- 50%. Decision made in a few seconds.
Quick & cheap – but subjective, sloppy
(A small margin-of-error sketch follows below.)
80
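As a rough guide to how the “+/-” figures above relate to survey size, the sketch below applies the usual margin-of-error formula for a proportion. The sample sizes shown are hypothetical, and the slide’s levels are broad judgments of confidence rather than outputs of this exact formula.

```python
# Illustrative sketch: 95% margin of error for an estimated proportion,
#   e = z * sqrt(p(1-p)/n).
# The sample sizes below are hypothetical; the slide's +/- figures describe
# broad levels of confidence, not this exact formula.
import math
from statistics import NormalDist

def margin_of_error(n: int, p: float = 0.5, confidence: float = 0.95) -> float:
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # e.g. about 1.96 for 95% confidence
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 400, 1100):
    print(f"n = {n}: +/- {margin_of_error(n):.1%}")
# n = 100: +/- 9.8%, n = 400: +/- 4.9%, n = 1100: +/- 3.0%
```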
CONDUCTING AN EVALUATION IS
LIKE LAYING A PIPELINE
QUALITY OF INFORMATION GENERATED BY AN EVALUATION
DEPENDS UPON LEVEL OF RIGOR OF ALL COMPONENTS
AMOUNT OF “FLOW” (QUALITY) OF INFORMATION IS LIMITED TO
THE SMALLEST COMPONENT OF THE SURVEY “PIPELINE”
Determining appropriate levels of precision for events in a life-of-project evaluation plan
[Chart: rigor (low to high) plotted against time during the project life cycle. A needs assessment and annual self-evaluations sit at lower rigor (level 2), a special study and the mid-term evaluation at level 3, and the baseline study and final evaluation at the same, higher level of rigor (level 4).]
83
Now, where were we?
Oh, yes, we’re ready for Steps 2 and
3 of the RealWorld Evaluation
Approach.
Let’s continue …
84
RealWorld Evaluation
Designing Evaluations under Budget,
Time, Data and Political Constraints
Steps 2 + 3
ADDRESSING BUDGET AND
TIME CONSTRAINTS
85
Step 2: Addressing budget
constraints
A. Clarifying client information needs
B. Simplifying the evaluation design
C. Look for reliable secondary data
D. Review sample size
E. Reducing costs of data collection and analysis
86
Rationalize data needs
• Use information from Step 1 to identify client information needs
• Simplify evaluation design (but be prepared to compensate for ‘missing pieces’)
• Review all data collection instruments and cut out any questions not directly related to the objectives of the evaluation.
87
Look for reliable secondary
sources
• Planning studies, project administrative records, government ministries, other NGOs, universities / research institutes, mass media.
88
Look for reliable secondary
sources, cont.
Assess the relevance and reliability of sources for the evaluation with respect to:
• Coverage of the target population
• Time period
• Relevance of the information collected
• Reliability and completeness of the data
• Potential biases
89
Some ways to save time and
money
Depending upon the purpose and level of rigor required, some of the options might include:
• Reducing the number of units studied (communities, families, schools)
• Reducing the number of case studies or the duration and complexity of the cases
• Reducing the duration or frequency of observations
90
Seeking ways to reduce
sample size
Accepting a lower level of precision significantly reduces the required number of interviews:
• To test for a 5% change in proportions requires a minimum sample of 1086
• To test for a 10% change in proportions requires a minimum sample of 270
(An illustrative sample-size calculation follows below.)
91
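The figures above come from a standard sample-size calculation for detecting a change in a proportion. The sketch below uses the common normal-approximation formula; the significance level, power and baseline proportion are illustrative assumptions, so its outputs will only match the slide’s 1086 and 270 for one particular choice of those parameters.

```python
# Illustrative sketch of the normal-approximation sample-size formula for
# detecting a change in a proportion between two survey rounds:
#   n = (z_alpha + z_beta)^2 * [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2
# The alpha, power and baseline proportion are assumptions for illustration,
# not the exact parameters behind the slide's figures.
import math
from statistics import NormalDist

def sample_size(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Halving the detectable change roughly quadruples the required sample:
print(sample_size(0.50, 0.60))  # detect a 10-point change
print(sample_size(0.50, 0.55))  # detect a 5-point change
```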
Reducing costs of data
collection and analysis
• Use self-administered questionnaires
• Reduce length and complexity of survey instrument
• Use direct observation
• Obtain estimates from focus groups and community forums
• Key informants
• Participatory assessment methods
• Multi-methods and triangulation
92
Step 3: Addressing time
constraints
In addition to Step 2 (budget constraint) methods:
• Reduce time pressures on external consultants
  • Commission preparatory studies
  • Video conferences
• Hire more consultants/researchers
• Incorporate outcome indicators in project monitoring systems and documents
• Technology for data inputting/coding
93
Addressing time constraints
Negotiate with the client to discuss questions such as the
following:
1. What information is essential and what could be dropped or reduced?
2. How much precision and detail is required for the essential information? E.g. is it necessary to have separate estimates for each geographical region or sub-group or is a population average acceptable?
3. Is it necessary to analyze all project components and services or only the most important?
4. Is it possible to obtain additional resources (money, staff, computer access, vehicles etc) to speed up the data collection and analysis process?
94
RealWorld Evaluation
Designing Evaluations under Budget,
Time, Data and Political Constraints
Step 4
Addressing data
constraints
Ways to reconstruct baseline
conditions
A. Secondary data
B. Project records
C. Recall
D. Key informants
97
Ways to reconstruct baseline
conditions
E. PRA (Participatory Rapid Appraisal) and PLA (Participatory Learning and Action) and other participatory techniques such as timelines and critical incidents to help establish the chronology of important changes in the community
98
Assessing the utility of potential
secondary data
• Reference period
• Population coverage
• Inclusion of required indicators
• Completeness
• Accuracy
• Free from bias
99
Examples of secondary data to
reconstruct baselines
• Census
• Other surveys by government agencies
• Special studies by NGOs, donors
• University research studies
• Mass media (newspapers, radio, TV)
• External trend data that might have been monitored by implementing agency
100
Using internal project records
Types of data
• Feasibility/planning studies
• Application/registration forms
• Supervision reports
• Management Information System (MIS) data
• Meeting reports
• Community and agency meeting minutes
• Progress reports
• Construction, training and other implementation records, including costs
101
Assessing the reliability of
project records
• Who collected the data and for what purpose?
• Were they collected for record-keeping or to influence policymakers or other groups?
• Do monitoring data only refer to project activities or do they also cover changes in outcomes?
• Were the data intended exclusively for internal use? For use by a restricted group? Or for public use?
102
Using recall to reconstruct
baseline data
• School attendance and time/cost of travel
• Sickness/use of health facilities
• Income and expenditures
• Community/individual knowledge and skills
• Social cohesion/conflict
• Water usage/quality/cost
• Periods of stress
• Travel patterns
103
Where Knowledge about Recall
is Greatest
• Areas where most research has been done on the validity of recall
  • Income and expenditure surveys
  • Demographic data and fertility behavior
• Types of Questions
  • Yes/No; fact
  • Scaled
  • Easily related to major events
104
Limitations of recall
• Generally not reliable for precise quantitative data
• Sample selection bias
• Deliberate or unintentional distortion
• Few empirical studies (except on expenditure) to help adjust estimates
105
Sources of bias in recall
• Who provides the information
• Under-estimation of small and routine expenditures
• “Telescoping” of recall concerning major expenditures
• Distortion to conform to accepted behavior:
  • Intentional or unconscious
  • Romanticizing the past
  • Exaggerating (e.g. “We had nothing before this project came!”)
• Contextual factors:
  • Time intervals used in question
  • Respondents’ expectations of what the interviewer wants to know
• Implications for the interview protocol
106
Improving the validity of recall
• Conduct small studies to compare recall with survey or other findings
• Ensure all relevant groups interviewed
• Triangulation
• Link recall to important reference events
  • Elections
  • Drought/flood/tsunami/war/displacement
  • Construction of road, school etc
107
Key informants
• Not just officials and high status people
• Everyone can be a key informant on their own situation:
  • Single mothers
  • Factory workers
  • Users of public transport
  • Sex workers
  • Street children
108
Guidelines for key-informant
analysis
• Triangulation greatly enhances validity and understanding
• Include informants with different experiences and perspectives
• Understand how each informant fits into the picture
• Employ multiple rounds if necessary
• Carefully manage ethical issues
109
PRA and related participatory
techniques
• PRA (Participatory Rapid Appraisal) and PLA (Participatory Learning and Action) techniques collect data at the group or community [rather than individual] level
• Can either seek to identify consensus or identify different perspectives
• Risk of bias:
  • If only certain sectors of the community participate
  • If certain people dominate the discussion
110
Summary of issues in baseline
reconstruction
• Variations in reliability of recall
• Memory distortion
• Secondary data not easy to use
• Secondary data incomplete or unreliable
• Key informants may distort the past
111
2. Ways to reconstruct
comparison groups
• Judgmental matching of communities
• When there is phased introduction of project services, beneficiaries entering in later phases can be used as “pipeline” comparison groups
• Internal controls when different subjects receive different combinations and levels of services
112
Using propensity scores to
strengthen comparison groups
• Propensity score matching
• Rapid assessment studies can compare characteristics of project and comparison groups using:
  • Observation
  • Key informants
  • Focus groups
  • Secondary data
  • Aerial photos and GIS data
(A small illustrative sketch of propensity score matching follows below.)
113
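For readers who want to see what propensity score matching can look like in practice, here is a minimal sketch. It assumes survey data in a pandas DataFrame with a binary `treated` column; the covariate names are hypothetical, and the nearest-neighbour matching shown is only one of several common variants, not a procedure prescribed by the RWE materials.

```python
# Minimal sketch of propensity score matching (illustrative; column names are
# hypothetical). A logistic regression estimates each household's propensity
# to be in the project, then each project household is paired with the
# comparison household whose propensity score is closest (with replacement).
import pandas as pd
from sklearn.linear_model import LogisticRegression

COVARIATES = ["household_size", "land_area", "baseline_income"]  # hypothetical names

def propensity_match(df: pd.DataFrame) -> pd.DataFrame:
    model = LogisticRegression(max_iter=1000).fit(df[COVARIATES], df["treated"])
    df = df.assign(pscore=model.predict_proba(df[COVARIATES])[:, 1])
    treated = df[df["treated"] == 1].reset_index(drop=True)
    comparison = df[df["treated"] == 0].reset_index(drop=True)
    # For each project household, pick the comparison household with the
    # nearest propensity score.
    matched = [
        comparison.iloc[(comparison["pscore"] - p).abs().idxmin()]
        for p in treated["pscore"]
    ]
    return pd.concat([treated, pd.DataFrame(matched)], keys=["project", "matched"])
```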
Issues in reconstructing
comparison groups
• Project areas often selected purposively and difficult to match
• Differences between project and comparison groups – difficult to assess whether outcomes were due to project interventions or to these initial differences
• Lack of good data to select comparison groups
• Contamination (good ideas tend to spread!)
• Econometric methods cannot fully adjust for initial differences between the groups [unobservables]
114
Enough of my
presentations: it’s time
for you (THE
RealWorld PEOPLE!)
to get involved
yourselves.
Time for small-group
work. Read your case
studies and begin your
discussions.
117
Small group case study work
1. Some of you are playing the role of evaluation consultants, others are clients commissioning the evaluation.
2. Decide what your group will propose to do to address the given constraints/challenges.
3. Prepare to negotiate the ToR with the other group (later this afternoon).
RealWorld Evaluation
Designing Evaluations under Budget,
Time, Data and Political Constraints
Mixed-method
evaluations
It should NOT be a fight between pure QUALITATIVE (verbiage alone) and pure QUANTITATIVE (numbers alone).
[Cartoon: a “Qualoid” and a “Quantoid” squaring off.]
120
“Your numbers look impressive, but let me tell you the human interest story.”
“Your human interest story sounds nice, but let me show you the statistics.”
121
What’s needed is the right combination of
BOTH QUALITATIVE methods
AND QUANTITATIVE methods
122
Quantitative data collection methods
• Structured surveys (household, farm, transport usage, etc)
• Structured observation
• Anthropometric methods
• Aptitude and behavioral tests
• Indicators that can be counted
123
Qualitative data collection methods
Characteristics
• The researcher’s perspective is an integral part of what is recorded about the social world
• Scientific detachment is not possible
• Meanings given to social phenomena and situations must be understood
• Programs cannot be studied independently of their context
• Difficult to define clear cause and effect
• Change must be studied holistically
124
Using Qualitative methods to improve
the Evaluation design and results
• Use recall to reconstruct the pre-test situation
• Interview key informants to identify other changes in the community or in gender relations
• Conduct interviews or focus groups with women and men to
  • assess the effect of loans on gender relations within the household, such as changes in control of resources and decision-making
  • identify other important results or unintended consequences:
    • increase in women’s work load
    • increase in incidence of gender-based or domestic violence
125
Mixed method evaluation designs
• Combine the strengths of both QUANT and QUAL approaches
• One approach (QUANT or QUAL) is often dominant and the other complements it
• Can have both approaches equal, but harder to design and manage
• Can be used sequentially or concurrently
126
Determining appropriate precision and mix of multiple methods
[Diagram: methods such as nutritional measurements, household surveys, focus groups, key informant interviews and large group meetings placed along two axes – participatory/qualitative vs. extractive/quantitative, and low rigor (questionable quality, quick and cheap) vs. high rigor (high quality, more time & expense).]
Participatory approaches should be used as much as possible, but even they should be used with appropriate rigor: how many (and which) people’s perspectives contributed to the story?
128
Questions?
129
Time for consultancy
teams to meet with
clients to negotiate the
revised ToRs for the
evaluation of the
housing project.
130
In conclusion:
Evaluators must be prepared to:
1. Enter at a late stage in the project cycle;
2. Work under budget and time restrictions;
3. Not have access to comparative baseline data;
4. Do without a feasible comparison group;
5. Work with very few well-qualified evaluation researchers;
6. Reconcile different evaluation paradigms and information needs of different stakeholders.
131
Main workshop messages
1. Evaluators must be prepared for RealWorld evaluation challenges.
2. There is considerable experience to learn from.
3. A toolkit of practical “RealWorld” evaluation techniques is available (see www.RealWorldEvaluation.org).
4. Never use time and budget constraints as an excuse for sloppy evaluation methodology.
5. A “threats to validity” checklist helps keep you honest by identifying potential weaknesses in your evaluation design and analysis.
132
133