International Program for Development Evaluation Training


RealWorld Evaluation
Designing Evaluations under Budget, Time, Data
and Political Constraints
American Evaluation Association
Professional pre-session workshop
Denver
November 5, 2008
Facilitated by:
Michael Bamberger
and Jim Rugh
Workshop Objectives
1. The seven steps of the RealWorld Evaluation approach for addressing common issues and constraints faced by evaluators, such as: when the evaluator is not called in until the project is nearly completed and there was no baseline or comparison group; when the evaluation must be conducted with an inadequate budget and insufficient time; and when there are political pressures and expectations for how the evaluation should be conducted and what the conclusions should say
Workshop Objectives
2. Identifying and assessing various design options that could be used in a particular evaluation setting
3. Ways to reconstruct baseline data when the evaluation does not begin until the project is well advanced or completed
4. How to identify and address threats to the validity or adequacy of quantitative, qualitative and mixed-methods designs, with reference to the specific context of RealWorld evaluations
Workshop Objectives
Note: Given time constraints, the workshop will focus on project-level impact evaluations. However, if the results of a pre-workshop survey of participants call for it, a brief introduction to the application of RWE techniques in other forms of evaluation, including the assessment of country programs and policy interventions, could be included.
Workshop agenda
8.00 – 8.20: Session 1: Introduction
• Workshop objectives
• Feedback from participant survey
• Handout: RealWorld Evaluation Overview (summary chapter of book)
8.20 – 8.50: Session 2: RealWorld Evaluation overview and addressing the counterfactual
• Handout: “Why evaluators can’t sleep at night”
8.50 – 9.20: Session 3: Small group discussions
• Participants will introduce themselves and then share experiences on the types of constraints they have faced when designing and conducting evaluations, and what they did to try to address those constraints.
9.20 – 10.00: Session 4: RWE Steps 1, 2 and 3: Scoping the evaluation and strategies for addressing budget and time constraints
• Presentation and discussion
10.00 – 10.15: BREAK
Workshop agenda, cont.
10.15 – 10.45: Session 5: RWE Step 4: Addressing data constraints
• Presentation and discussion
10.45 – 11.15: Session 6: Mixed methods
• Presentation and discussion
11.15 – 12.00: Session 7: Small groups read their case studies and begin to discuss the learning exercise.
• We will use a low-cost housing case study. All four groups will discuss the same project but from different perspectives.
12.00 – 1.00: LUNCH
1.00 – 1.45: Session 8: Identifying and addressing threats to the validity of the evaluation design and conclusions
1.45 – 2.30: Session 9: Small groups complete exercise.
• Negotiate with your paired group how you propose to modify the ToR of your case study.
2.30 – 2.45: Session 10: Feedback from exercise.
• Discussion of lessons learned from the case study or the RealWorld Evaluation approach in general.
2.45 – 3.00: Session 11: Wrap-up and workshop evaluation
RealWorld Evaluation
Designing Evaluations under Budget,
Time, Data and Political Constraints
Session 2. a
OVERVIEW OF THE
RWE APPROACH
RealWorld Evaluation Scenarios
Scenario 1: Evaluator(s) not brought in until near
end of project
For political, technical or budget reasons:
• There was no baseline survey
• Project implementers did not collect
adequate data on project participants at the
beginning or during the life of the project
• It is difficult to collect data on comparable
control groups
RealWorld Evaluation Scenarios
Scenario 2: The evaluation team is called in early in the life of the project
But for budget, political or methodological reasons:
• The ‘baseline’ was a needs assessment, not comparable to the eventual evaluation
• It was not possible to collect baseline data on a comparison group
Reality Check – Real-World
Challenges to Evaluation
• All too often, project designers do not think evaluatively – evaluation is not designed until the end
• There was no baseline – at least not one with data comparable to the evaluation
• There was/can be no control/comparison group
• Limited time and resources for the evaluation
• Clients have prior expectations for what the evaluation findings will say
• Many stakeholders do not understand evaluation; distrust the process; or even see it as a threat (dislike of being judged)
RealWorld Evaluation
Quality Control Goals
• Achieve maximum possible evaluation rigor within the limitations of a given context
• Identify and control for methodological weaknesses in the evaluation design
• Negotiate with clients trade-offs between desired rigor and available resources
• Presentation of findings must recognize methodological weaknesses and how they affect generalization to broader populations
The Need for the RealWorld
Evaluation Approach
As a result of these kinds of constraints, many of the basic principles of impact evaluation design (comparable pre-test/post-test design, comparison group, instrument development and testing, random sample selection, control for researcher bias, thorough documentation of the evaluation methodology, etc.) are often sacrificed.
The RealWorld Evaluation
Approach
An integrated approach to
ensure acceptable standards
of methodological rigor while
operating under real-world
budget, time, data and
political constraints.
See handout summary chapter extracted from
RealWorld Evaluation book for more details
The RealWorld Evaluation
approach
• Developed to help evaluation practitioners and clients (managers, funding agencies and external consultants)
• A work in progress
• Originally designed for developing countries, but equally applicable in industrialized nations
Special Evaluation Challenges in
Developing Countries
• Unavailability of needed data
• Scarce local evaluation resources
• Limited budgets for evaluations
• Institutional and political constraints
• Lack of an evaluation culture
• Many evaluations are designed by, and for, external funding agencies and seldom reflect local and national stakeholder priorities
Special Evaluation Challenges in
Developing Countries
Despite these challenges, there is a
growing demand for methodologically
sound evaluations which assess the
impacts, sustainability and replicability of
development projects and programs
Most RealWorld Tools are not New—
Only the Integrated Approach is New
• Most of the RealWorld Evaluation data collection and analysis tools will be familiar to most evaluators
• What is new is the integrated approach, which combines a wide range of tools to produce the best quality evaluation under real-world constraints
Who Uses RealWorld Evaluation
and When?
Two main users:
• Evaluation practitioners
• Managers, funding agencies and external consultants
The evaluation may start at:
• The beginning of the project
• After the project is fully operational
• During or near the end of project implementation
• After the project is finished
What is Special About the
RealWorld Evaluation Approach?
• There is a series of steps, each with checklists for identifying constraints and determining how to address them
• These steps are summarized on the following slide and then in the more detailed flow chart
(See page 6 of handout)
The Steps of the RealWorld
Evaluation Approach
Step 1: Planning and scoping the evaluation
Step 2: Addressing budget constraints
Step 3: Addressing time constraints
Step 4: Addressing data constraints
Step 5: Addressing political constraints
Step 6: Assessing and addressing the strengths and weaknesses of the evaluation design
Step 7: Helping clients use the evaluation
The Real-World Evaluation Approach
Step 1: Planning and scoping the evaluation
A. Defining client information needs and understanding the political context
B. Defining the program theory model
C. Identifying time, budget, data and political constraints to be addressed by the RWE
D. Selecting the design that best addresses client needs within the RWE constraints
Step 2: Addressing budget constraints
A. Modify evaluation design
B. Rationalize data needs
C. Look for reliable secondary data
D. Revise sample design
E. Economical data collection methods
Step 3: Addressing time constraints
All Step 2 tools plus:
F. Commissioning preparatory studies
G. Hiring more resource persons
H. Revising the format of project records to include critical data for impact analysis
I. Modern data collection and analysis technology
Step 4: Addressing data constraints
A. Reconstructing baseline data
B. Recreating comparison groups
C. Working with non-equivalent comparison groups
D. Collecting data on sensitive topics or from difficult-to-reach groups
E. Multiple methods
Step 5: Addressing political influences
A. Accommodating pressures from funding agencies or clients on evaluation design
B. Addressing stakeholder methodological preferences
C. Recognizing influence of professional research paradigms
Step 6: Assessing and addressing the strengths and weaknesses of the evaluation design
An integrated checklist for multi-method designs:
A. Objectivity/confirmability
B. Replicability/dependability
C. Internal validity/credibility/authenticity
D. External validity/transferability/fittingness
Step 7: Helping clients use the evaluation
A. Utilization
B. Application
C. Orientation
D. Action
RealWorld Evaluation
Designing Evaluations under Budget,
Time, Data and Political Constraints
Session 2.b
The challenge of the
counterfactual
Attribution and counterfactuals
How do we know if the observed changes in the project participants or communities
• income, health, attitudes, school attendance, etc.
are due to the implementation of the project
• credit, water supply, transport vouchers, school construction, etc.
or to other unrelated factors?
• changes in the economy, demographic movements, other development programs, etc.
The Counterfactual
What would have been the condition of the project population at the time of the evaluation if the project had not taken place?
Where is the counterfactual?
After families had been living in a new housing project for 3 years, a study found average household income had increased by 50%.
Does this show that housing is an effective way to raise income?
Comparing the project with two possible comparison groups
[Chart: household income (scale 250–750) tracked from 2000 to 2002]
• Project group: 50% increase in income (500 to 750)
• Scenario 1: 50% increase in comparison group income. No evidence of project impact.
• Scenario 2: No increase in comparison group income. Potential evidence of project impact.
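To make the double-difference logic of the two scenarios explicit (values read from the chart above, taking 500 as the assumed baseline income for both groups):

    Scenario 1: impact = (750 − 500) − (750 − 500) = 0
    Scenario 2: impact = (750 − 500) − (500 − 500) = 250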
5 main evaluation strategies
for addressing the counterfactual
Randomized designs
I. True experimental designs
II. Randomized field designs
Quasi-experimental designs
III. Strong quasi-experimental designs
IV. Weaker quasi-experimental designs
Non-experimental designs.
V. No logically defensible counterfactual
The best statistical design option in most field settings: randomized or strong quasi-experimental evaluation designs
Subjects are randomly assigned to the project and control groups, or the control group is selected using statistical or judgmental matching. Conditions of both groups are not controlled during the project.

                T1 (pre-test)   T2 (treatment [project])   T3 (post-test)
Project group   P1              X                          P2
Control group   C1                                         C2

Gain score [impact] = (P2 – P1) – (C2 – C1)
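A minimal sketch of this gain-score computation in Python, using invented income figures purely for illustration (the numbers are assumptions, not workshop data):

    import statistics

    project_pre  = [480, 520, 510, 490]   # P1: project group at pre-test
    project_post = [730, 760, 755, 745]   # P2: project group at post-test
    control_pre  = [500, 495, 505, 510]   # C1: control group at pre-test
    control_post = [515, 505, 520, 525]   # C2: control group at post-test

    # Gain score [impact] = (P2 - P1) - (C2 - C1)
    impact = (statistics.mean(project_post) - statistics.mean(project_pre)) - \
             (statistics.mean(control_post) - statistics.mean(control_pre))
    print(f"Estimated impact: {impact:.1f}")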
Control group and comparison group
• Control group = randomized allocation of subjects to project and non-treatment group
• Comparison group = separate procedure for sampling project and non-treatment groups
Reference sources for
randomized field trial designs
1. MIT Poverty Action Lab
www.povertyactionlab.org
2. Center for Global Development
“When will we ever learn?”
http://www.cgdev.org/content/publications/detail/7973
3. International Initiative for Impact Evaluation = 3ie
http://www.3ieimpact.org/
The limited use of strong
evaluation designs
It is estimated that:
• Only 5–10% of impact evaluations use a strong quasi-experimental design
• Significantly less than 5% use randomized control trials
TIME FOR DISCUSSION
Introductory small-group
discussions
Introduce yourselves, including something
about your experience in coordinating or
conducting evaluations.
In particular share experiences on the types
of constraints you have faced when
designing and conducting evaluations, and
what you did to try to address those
constraints.
RealWorld Evaluation
Designing Evaluations under Budget,
Time, Data and Political Constraints
Session 4, Step #1
PLANNING AND SCOPING THE
EVALUATION
Step 1: Planning and Scoping the
Evaluation
• Understanding client information needs
• Defining the program theory model
• Preliminary identification of constraints to be addressed by the RealWorld Evaluation
A. Understanding client information
needs
Typical questions clients want answered:
• Is the project achieving its objectives?
• Are all sectors of the target population benefiting?
• Are the results sustainable?
• Which contextual factors determine the degree of success or failure?
A. Understanding client information
needs
A full understanding of client information
needs can often reduce the types of
information collected and the level of
detail and rigor necessary.
However, this understanding could also
increase the amount of information
required!
B. Defining the program theory
model
All programs are based on a set of assumptions (hypotheses) about how the project’s interventions should lead to desired outcomes.
• Sometimes this is clearly spelled out in project documents.
• Sometimes it is only implicit, and the evaluator needs to help stakeholders articulate the hypothesis through a logic model.
B. Defining the program theory
model
• Defining and testing critical assumptions are essential (but often ignored) elements of program theory models.
• The following is an example of a model to assess the impacts of microcredit on women’s social and economic empowerment.
Critical Hypothesis for a Gender-Inclusive
Micro-Credit Program
Outputs
• If credit is available, women will be willing and able to obtain loans and technical assistance.
Short-term outcomes
• If women obtain loans, they will start income-generating activities.
• Women will be able to control the use of loans and reimburse them.
Medium/long-term impacts
• Economic and social welfare of women and their families will improve.
• Increased women’s economic and social empowerment.
Sustainability
• Structural changes will lead to long-term impacts.
C. Determining appropriate (and
feasible) evaluation design
Based on an understanding of client information needs, the required level of rigor, and what is possible given the constraints, the evaluator and client need to determine what evaluation design is required and possible under the circumstances.
Let’s focus for a while on evaluation
design (a quick review)
1: Review different evaluation (experimental/research) designs
2: Develop criteria for determining appropriate Terms of Reference (ToR) for evaluating a project, given its own (planned or unplanned) evaluation design
3: Defining levels of rigor
4: A life-of-project evaluation design perspective
An introduction to various evaluation designs
Illustrating the need for a quasi-experimental longitudinal time-series evaluation design
[Chart: scale of a major impact indicator for project participants vs. a comparison group, observed at baseline, end-of-project evaluation and post-project evaluation]
OK, let’s stop the action to
identify each of the major
types of evaluation (research)
design …
… one at a time, beginning with the
most rigorous design.
First of all, the key to the traditional symbols:
• X = Intervention (treatment), i.e. what the project does in a community
• O = Observation event (e.g. baseline, mid-term evaluation, end-of-project evaluation)
• P (top row) = Project participants
• C (bottom row) = Comparison (control) group
Note: the RWE evaluation designs are laid out in Table 3 on page 46 of your handout
Design #1: Longitudinal Quasi-experimental
Project participants: P1  X  P2  X  P3  P4
Comparison group:     C1      C2      C3  C4
(observation points: baseline, midterm, end-of-project evaluation, post-project evaluation)
Design #1+: Longitudinal Randomized Control Trial
(Research subjects randomly assigned either to project or control group)
Project participants: P1  X  P2  X  P3  P4
Control group:        C1      C2      C3  C4
(observation points: baseline, midterm, end-of-project evaluation, post-project evaluation)
Design #2: Randomized Control Trial
(Research subjects randomly assigned either to project or control group)
Project participants: P1  X  P2
Control group:        C1      C2
(observation points: baseline, end-of-project evaluation)
Design #3: Quasi-experimental (pre+post, with comparison)
Project participants: P1  X  P2
Comparison group:     C1      C2
(observation points: baseline, end-of-project evaluation)
Design #7: Truncated Longitudinal
Project participants: X  P1  X  P2
Comparison group:        C1      C2
(observation points: midterm, end-of-project evaluation)
Design #8: Pre+post of project; post-only comparison
Project participants: P1  X  P2
Comparison group:             C
(project observed at baseline and end of project; comparison group observed at end of project only)
Design #9: Post-test only of project and comparison
Project participants: X  P
Comparison group:        C
(observation point: end-of-project evaluation)
Design #10: Pre+post of project; no comparison
Project participants: P1  X  P2
(observation points: baseline, end-of-project evaluation)
Design #11: Post-test only of project participants
Project participants: X  P
(observation point: end-of-project evaluation)
Some of the questions to consider as
you customize an evaluation Terms of
Reference (ToR):
1. Who asked for the evaluation? (Who are the key stakeholders?)
2. What are the key questions to be answered?
3. Will this be a formative or summative evaluation?
4. Will there be a next phase, or other projects designed based on the findings of this evaluation?
Other questions to answer as
you customize an evaluation
ToR:
5. What decisions will be made in response to the findings of this evaluation?
6. What is the appropriate level of rigor?
7. What is the scope/scale of the evaluation/evaluand (thing to be evaluated)?
8. How much time will be needed/available?
9. What financial resources are needed/available?
Other questions to answer as
you customize an evaluation
ToR:
10. Should the evaluation rely mainly on quantitative or qualitative methods?
11. Should participatory methods be used?
12. Can/should there be a household survey?
13. Who should be interviewed?
14. Who should be involved in planning/implementing the evaluation?
15. What are the most appropriate media for communicating the findings to different stakeholder audiences?
Evaluation (research) design?
Key questions?
Evaluand (what to evaluate)?
Qualitative?
Quantitative?
Scope?
Appropriate level of rigor?
Resources available?
Time available?
Skills available?
Participatory?
Extractive?
Evaluation FOR whom?
Does this help, or just confuse things more? Who said evaluations (like life) would be easy?!
TIME FOR DISCUSSION
Now, where were we?
Oh, yes, we’re ready for Steps 2 and
3 of the RealWorld Evaluation
Approach.
Let’s continue …
RealWorld Evaluation
Designing Evaluations under Budget,
Time, Data and Political Constraints
Steps 2 + 3
ADDRESSING BUDGET AND
TIME CONSTRAINTS
Step 2: Addressing budget
constraints
A. Clarifying client information needs
B. Simplifying the evaluation design
C. Look for reliable secondary data
D. Review sample size
E. Reducing costs of data collection and analysis
2A: Simplifying the evaluation
design
• For quantitative evaluations it is possible to select among the most common evaluation designs (noting the trade-offs when using a simpler design).
• For qualitative evaluations the options will vary depending on the type of design.
2A (cont): Qualitative designs
Depending upon the design, some of the options might include:
• Reducing the number of units studied (communities, families, schools)
• Reducing the number of case studies or the duration and complexity of the cases
• Reducing the duration or frequency of observations
2.B. Rationalize data needs
• Use information from Step 1 to identify client information needs
• Review all data collection instruments and cut out any questions not directly related to the objectives of the evaluation
2.C. Look for reliable
secondary sources
Planning studies, project administrative records, government ministries, other NGOs, universities/research institutes, mass media.
2.C. Look for reliable
secondary sources, cont.
Assess the relevance and reliability of sources for the evaluation with respect to:
• Coverage of the target population
• Time period
• Relevance of the information collected
• Reliability and completeness of the data
• Potential biases
2.D. Seeking ways to reduce
sample size
Accepting a lower level of precision significantly reduces the required number of interviews:
• To test for a 5% change in proportions requires a maximum sample of 1,086
• To test for a 10% change in proportions requires a maximum sample of up to 270
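The figures above depend on assumptions (baseline proportion, significance level, power) that the slide does not state. A sketch of how such sample sizes can be checked with statsmodels; with a 0.50 baseline, alpha = 0.05 and 90% power (all assumptions), the results come out close to the figures quoted:

    from statsmodels.stats.power import NormalIndPower
    from statsmodels.stats.proportion import proportion_effectsize

    analysis = NormalIndPower()
    for target in (0.55, 0.60):  # 5-point and 10-point changes from a 0.50 baseline
        effect = proportion_effectsize(0.50, target)
        n = analysis.solve_power(effect_size=effect, alpha=0.05, power=0.90)
        print(f"detecting 0.50 -> {target:.2f}: about {n:.0f} interviews per group")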
2.E. Reducing costs of data
collection and analysis
• Use self-administered questionnaires
• Reduce length and complexity of instruments
• Use direct observation
• Obtain estimates from focus groups and community forums
• Key informants
• Participatory assessment methods
• Multi-methods and triangulation
Step 3: Addressing time
constraints
In addition to Step 2 methods:
• Reduce time pressures on external consultants
  • Commission preparatory studies
  • Video conferences
• Hire more consultants/researchers
• Incorporate outcome indicators in project monitoring systems and documents
• Technology for data inputting/coding
Addressing time constraints
It is important to distinguish between approaches that reduce:
a) the duration in terms of time over the life of the project (e.g. from baseline to final evaluation over 5 years);
b) the duration in terms of the time needed to undertake the actual evaluation study or studies (e.g. 6 weeks, whether completed in an intensive consecutive 6 weeks or a cumulative total of 6 weeks periodically over the course of a year); and
c) the level of effort (person-days, i.e. number of staff x total days required).
Addressing time constraints
Negotiate with the client to discuss questions such as the following:
1. What information is essential and what could be dropped or reduced?
2. How much precision and detail is required for the essential information? E.g. is it necessary to have separate estimates for each geographical region or sub-group, or is a population average acceptable?
3. Is it necessary to analyze all project components and services, or only the most important?
4. Is it possible to obtain additional resources (money, staff, computer access, vehicles, etc.) to speed up the data collection and analysis process?
TIME FOR DISCUSSION
RealWorld Evaluation
Designing Evaluations under Budget,
Time, Data and Political Constraints
Session 5
Addressing data
constraints
Step 4: Addressing data constraints
Where this step fits in the RWE approach:
Step 1: Planning and scoping the evaluation
Step 2: Addressing budget constraints
Step 3: Addressing time constraints
Step 4: Addressing data constraints
Step 5: Addressing political constraints
Step 6: Assessing the strengths and weaknesses of the evaluation design
Step 7: Strengthening the evaluation design
Step 4: Addressing data constraints
A. Reconstructing baseline data
B. Special challenges in working with comparison groups
C. Collecting data on sensitive topics
D. Collecting data on difficult-to-reach groups
Two kinds of data constraints:
1. Reconstructing baseline data
2. Special data issues for comparison groups
1. Reconstructing baseline
conditions for project
and comparison groups
[see Table 10, p. 59]
1. The importance of baseline data
• Hard to assess change without data on pre-project conditions
• Post-test comparisons do not fully address:
  • Selection bias: initial differences between participants and non-participants (propensity score matching and instrumental variables partially address this)
  • Historical factors influencing outcomes that were assumed to have been caused by the project intervention
1. Ways to reconstruct baseline
conditions
A. Secondary data
B. Project records
C. Recall
D. Key informants
E. PRA and other participatory techniques, such as timelines and critical incidents, to help establish the chronology of important changes in the community
1-A. Assessing the utility of
potential secondary data
• Reference period
• Population coverage
• Inclusion of required indicators
• Completeness
• Accuracy
• Freedom from bias
1-A. Using secondary data to
reconstruct baselines
• Census
• Surveys
• Project administrative data
• Agency reports
• Special studies by NGOs, donors
• University studies
• Mass media (newspapers, radio, TV)
1-A. Using secondary data to
reconstruct baselines
• Community organization records
• Notices in offices, community centers, etc.
• Posters
• Birth/death records
• Wills and documents concerning property
• Private sector data
1-B. Using project records
Types of data:
• Feasibility/planning studies
• Application/registration forms
• Supervision reports
• MIS data
• Meeting reports
• Community and agency meeting minutes
• Progress reports
• Construction costs
1-B. Assessing the reliability of
project records
• Who collected the data and for what purpose?
• Were they collected for record-keeping or to influence policymakers or other groups?
• Do monitoring data only refer to project activities, or do they also cover changes in outcomes?
• Were the data intended exclusively for internal use? For use by a restricted group? Or for public use?
1-B. Assessing the reliability of
project records
• How accurate and complete are the data? Are there obvious gaps? Were these intentional or due to poor record-keeping?
• Are there potential biases with respect to the key indicators required for the impact evaluation?
1-B. Working with the client to improve the
utility of project data for evaluation
• Collecting additional information on applicants or participants
• Ensure identification data are included and accurate
• Ensure data are organized in the way needed for the evaluation (by community, types of service, or family, rather than just by individuals, economic level, etc.)
1-C. Using recall to reconstruct
baseline data
• School attendance and time/cost of travel
• Sickness/use of health facilities
• Income and expenditures
• Community/individual knowledge and skills
• Social cohesion/conflict
• Water usage/quality/cost
• Periods of stress
• Travel patterns
1-C. Where Knowledge about
Recall is Greatest
Areas where most research has been done on the validity of recall:
• Income and expenditure surveys
• Demographic data and fertility behavior
Types of questions:
• Yes/no; fact
• Scaled
• Easily related to major events
1-C. Limitations of recall
• Generally not reliable for precise quantitative data
• Sample selection bias
• Deliberate or unintentional distortion
• Few empirical studies (except on expenditure) to help adjust estimates
1-C. Sources of bias in recall
• Who provides the information
• Under-estimation of small and routine expenditures
• “Telescoping” of recall concerning major expenditures
• Distortion to conform to accepted behavior:
  • Intentional
  • Romanticizing the past
• Contextual factors:
  • Time intervals used in the question
  • Respondents’ expectations of what the interviewer wants to know
• Implications for the interview protocol
1-C. Improving the validity of
recall
• Conduct small studies to compare recall with survey or other findings
• Ensure all groups are interviewed
• Triangulation
• Link recall to important reference events:
  • Elections
  • Drought/floods
  • Construction of a road, school, etc.
1-D. Key informants
• Not just officials and high-status people
• Everyone can be a key informant on their own situation:
  • Single mothers
  • Factory workers
  • Users of public transport
  • Sex workers
  • Street children
1-D. Guidelines for key-informant analysis
• Triangulation greatly enhances validity and understanding
• Include informants with different experiences and perspectives
• Understand how each informant fits into the picture
• Employ multiple rounds if necessary
• Carefully manage ethical issues
1-E. PRA and related participatory
techniques
• PRA techniques collect data at the group or community (rather than individual) level
• Can either seek to identify consensus or identify different perspectives
• Risk of bias:
  • Only certain sectors of the community attend
  • Certain people dominate the discussion
1-E. Time-related PRA techniques
useful for reconstructing the past
• Time line
• Trend analysis
• Historical transect
• Seasonal diagram
• Daily activity schedule
• Participatory genealogy
• Dream map
• Critical incidents
1-E. Using PRA recall methods: seasonal calendars
Seasonal Calendar of Poverty Drawn by Villagers in Nyamira, Kenya
[Table: rows for light meals, begging, migration, unemployment, income, disease and rainfall, scored month by month (Jan–Dec) with circles indicating intensity; the cell-by-cell layout cannot be reliably reconstructed here]
Source: Rietbergen-McCracken and Narayan 1997
1-F. Issues in baseline
reconstruction
• Variations in reliability of recall
• Memory distortion
• Secondary data not easy to use
• Secondary data incomplete or unreliable
• Key informants may distort the past
2. Reconstructing comparison
(control) groups
2. Ways to reconstruct control
groups
• Judgmental matching of communities
• When project services are introduced in phases, beneficiaries entering in later phases can be used as a “pipeline” control group
• Internal controls when different subjects receive different combinations and levels of services
2. Using propensity scores to
strengthen comparison groups
• Propensity score matching
• Rapid assessment studies can compare characteristics of project and control groups using:
  • Observation
  • Key informants
  • Focus groups
  • Secondary data
  • Aerial photos and GIS data
2. Using propensity scores to
strengthen comparison groups
• Logistic regression (logit) on the project and comparison populations to identify determinants of project participation
• Select “nearest neighbors” (usually around 5) from the comparison group who most closely match a participant
• Project impact = gain score = difference between the project participant’s score and the mean score of the nearest neighbors
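A minimal sketch of this three-step matching procedure (synthetic data; a real application would add common-support checks and balance diagnostics):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import NearestNeighbors

    # Synthetic data: X = covariates, treated = project participation, y = outcome
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    treated = rng.integers(0, 2, size=200)
    y = X[:, 0] + 0.5 * treated + rng.normal(size=200)

    # Step 1: logistic regression of participation gives each subject a propensity score
    ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

    # Step 2: for each participant, find ~5 nearest neighbors in the comparison group
    nn = NearestNeighbors(n_neighbors=5).fit(ps[treated == 0].reshape(-1, 1))
    _, idx = nn.kneighbors(ps[treated == 1].reshape(-1, 1))

    # Step 3: gain score = participant outcome minus mean outcome of matched neighbors
    gains = y[treated == 1] - y[treated == 0][idx].mean(axis=1)
    print(f"Estimated average impact: {gains.mean():.2f}")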
Issues in reconstructing control
groups
• Project areas are often selected purposively and are difficult to match
• Differences between project and control groups make it difficult to assess whether outcomes are due to the project or to these initial differences
• Lack of good data to select control groups
• Contamination
• Econometric methods cannot fully adjust for initial differences between the groups (unobservables)
References
• Bamberger, M., Rugh, J. and Mabry, L. (2006). RealWorld Evaluation, Chapter 5.
• Kumar, S. (2002). Methods for Community Participation: A Complete Guide for Practitioners.
• Patton, M.Q. (2002). Qualitative Research and Evaluation Methods, Chapters 6 and 7.
• Roche, C. (1999). Impact Assessment for Development Agencies, Chapter 5.
Pause for DISCUSSION
RealWorld Evaluation
Designing Evaluations under Budget,
Time, Data and Political Constraints
Session 6
Mixed-method evaluations
It should NOT be a fight between pure QUALITATIVE (verbiage alone) and pure QUANTITATIVE (numbers alone) approaches.
[Cartoon: a “Qualoid” and a “Quantoid” squaring off]
“Your human interest story sounds nice, but let me show you the statistics.” (the QUANTITATIVE view)
“Your numbers look impressive, but let me tell you the human interest story.” (the QUALITATIVE view)
What’s needed is the right combination of
BOTH QUALITATIVE methods
AND QUANTITATIVE methods
I. Mixed Method Designs
1. Quantitative data collection methods
• Structured surveys (household, farm, transport usage, etc.)
• Structured observation
• Anthropometric methods
• Aptitude and behavioral tests
1. Quantitative data collection methods: strengths and weaknesses
Strengths:
• Generalization
• Statistically representative
• Estimate magnitude and distribution of impacts
• Clear documentation of methods
• Standardized approach
• Statistical control of bias and external factors
Weaknesses:
• Surveys cannot capture many types of information
• Do not work for difficult-to-reach groups
• No analysis of context
• Survey situation may alienate
• Long delay in obtaining results
• Data reduction loses information
2. Qualitative data collection methods
Interviewing:
• Structured
• Semi-structured
• Unstructured
• Focus groups
• Community interviews
• PRA
• Audio recording
Observation:
• Participant observation
• Structured observation
• Unstructured observation
• Photography and video recording
Analysis of documents and artifacts:
• Project documents
• Published reports
• E-mail
• Legal documents: birth and death certificates, property transfer documents, marriage certificates
• Posters
• Decorations in the house
• Clothing and gang insignia
2. Qualitative data collection methods: characteristics
• The researcher’s perspective is an integral part of what is recorded about the social world
• Scientific detachment is not possible
• Meanings given to social phenomena and situations must be understood
• Programs cannot be studied independently of their context
• Cause and effect cannot be defined, and change must be studied holistically
2. Qualitative data collection methods: strengths and weaknesses
Strengths:
• Flexible designs that can evolve
• Sampling focuses on high-value subjects
• Holistic focus (“the big picture”)
• Multiple sources provide complex understanding
• Narrative more accessible to non-specialists
• Triangulation strengthens validity of findings
Weaknesses:
• Lack of clear design may frustrate clients
• Lack of generalizability
• Multiple perspectives make it hard to reach consensus
• Individual factors not isolated
• Interpretive methods may appear too subjective
3. Mixed method evaluation designs
• Combine the strengths of both QUANT and QUAL approaches
• One approach (QUANT or QUAL) is often dominant and the other complements it
• Both approaches can be given equal weight, but this is harder to design and manage
• Can be used sequentially or concurrently
Determining appropriate precision and mix of multiple methods
[Diagram: a continuum of method mixes – nutritional measurements, household surveys, focus groups, key informant interviews, large-group meetings – ranging from “low rigor, questionable quality, quick and cheap” to “high rigor, high quality, more time & expense”, and from participatory/qualitative to extractive/quantitative approaches]
3. Mixed method evaluation designs: how quantitative and qualitative methods complement each other
A. Broaden the conceptual framework
• Combining theories from different disciplines
• Exploratory QUAL studies can help define the framework
B. Combine generalizability with depth and context
• Random subject selection ensures representativity and generalizability
• Case studies, focus groups, etc. can help understand the characteristics of the different groups selected in the sample
C. Permit access to difficult-to-reach groups [QUAL]
• PRA, focus groups, case studies, snowball samples, etc. can be effective ways to reach women, ethnic minorities and other vulnerable groups
• Direct observation can provide information on groups that are difficult to interview, for example the informal sector and illegal economic activities
D. Enable process analysis [QUAL]
• Observation, focus groups and informal conversations are more effective for understanding group processes or interaction between people and public agencies, and for studying the organization
3. Mixed method evaluation designs: how quantitative and qualitative methods complement each other (cont.)
E. Analysis of, and control for, underlying structural factors [QUANT]
• Sampling and statistical analysis can avoid misleading conclusions
• Propensity scores and multivariate analysis can statistically control for differences between project and control groups
Example:
• Meetings with women may suggest gender biases in local firms’ hiring practices; however,
• Using statistical analysis to control for years of education or experience may show there are no differences in hiring policies for workers with comparable qualifications
Example:
• Participants who volunteer to attend a focus group may be strongly in favor of or opposed to a certain project, but
• A rapid sample survey may show that most community residents have different views
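A sketch of the statistical control described in the first example, using invented hiring data (all variable names and values are hypothetical). Linear probability models are used for simplicity; the point is that an apparent gender gap can shrink once qualifications are controlled for:

    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.DataFrame({
        "hired":     [1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0],
        "female":    [1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0],
        "years_edu": [12, 8, 12, 16, 9, 14, 10, 15, 10, 7, 16, 12],
    })

    naive = smf.ols("hired ~ female", data=df).fit()                  # no controls
    controlled = smf.ols("hired ~ female + years_edu", data=df).fit() # with controls
    print(naive.params["female"], controlled.params["female"])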
3. Mixed method evaluation designs: how quantitative and qualitative methods complement each other (cont.)
F. Triangulation and consistency checks
• Direct observation may identify inconsistencies in interview responses
Examples:
• A family may say they are poor, but observation shows they have new furniture, good clothes, etc.
• A woman may say she has no source of income, but an early-morning visit may show she operates an illegal beer-brewing business
G. Broadening the interpretation of findings
• Combining personal experience with “social facts”
• Statistical analysis frequently includes unexpected or interesting findings which cannot be explained through the statistics. Rapid follow-up visits may help explain the findings.
3. Mixed method evaluation designs: how quantitative and qualitative methods complement each other (cont.)
G. Interpreting findings (cont.)
Example:
• A QUANT survey of community water management in Indonesia found that, with only one exception, all village water supply was managed by women
• Follow-up visits found that in the one exceptional village women managed a very profitable dairy-farming business, so men were willing to manage water to allow women time to produce and sell dairy produce
Source: Brown (2000)
Using qualitative methods to improve the evaluation design and results
• Use recall to reconstruct the pre-test situation
• Interviews with key informants to identify other changes in the community or in gender relations
• Interviews or focus groups with women and men to:
  • assess the effect of loans on gender relations within the household, such as changes in control of resources and decision-making
  • identify other important results or unintended consequences: increase in women’s workload, increase in incidence of gender-based or domestic violence
Enough of our
presentations: it’s time for
you (THE RealWorld
PEOPLE!) to get involved
Small group case study work
1. Some of you are playing the role of evaluation consultants; others are clients coordinating the evaluation.
2. Decide what your group will do to address the given constraints/challenges.
3. Prepare to negotiate the ToR with the other group after lunch.
RealWorld Evaluation
Designing Evaluations under Budget,
Time, Data and Political Constraints
Session 8
Identifying and addressing threats
to the validity of the evaluation
design and conclusions
The Real World Evaluation [RWE] Approach
Step 1: Planning and scoping the evaluation
Step 2: Addressing budget constraints
Step 3: Addressing time constraints
Step 4: Addressing data constraints
Step 5: Addressing political influences
Step 6: Strengthening the evaluation design and validity
Step 7: Helping clients use the evaluation

Step 6: Strengthening the evaluation design and the validity of the conclusions
A. Identifying threats to validity of quasi-experimental designs
B. Assessing the adequacy of qualitative designs
C. An integrated checklist for mixed-method designs
D. Addressing threats to quantitative evaluation designs
E. Addressing threats to the adequacy of qualitative designs
F. Addressing threats to mixed-method designs
Session outline
1. What is validity and why does it matter?
2. General guidelines for assessing validity
3. Additional threats to validity for quantitative evaluation designs
4. Strategies for addressing threats to validity
1. What is validity and
why does it matter?
Defining validity
The degree to which the evaluation findings and recommendations are supported by:
• The conceptual framework describing how the project is supposed to achieve its objectives
• Statistical techniques (including sample design)
• How the project and the evaluation were implemented
• The similarities between the project population and the wider population to which findings are generalized
Importance of validity
Evaluations provide recommendations for future decisions and action. If the findings and interpretation are not valid:
• Programs which do not work may continue or even be expanded
• Good programs may be discontinued
• Priority target groups may not have access or benefit
RWE quality control goals
• The evaluator must achieve the greatest possible methodological rigor within the limitations of a given context
• Standards must be appropriate for different types of evaluation
• The evaluator must identify and control for methodological weaknesses in the evaluation design
• The evaluation report must identify methodological weaknesses and how these affect generalization to broader populations
2. General guidelines for assessing the
validity of all evaluation designs
[see Overview Handbook Appendix 1]
A. Confirmability
B. Reliability
C. Credibility
D. Transferability
E. Utilization
A.
Confirmability
Are the conclusions drawn from the available evidence
and is the research relatively free of researcher
bias?
Examples:
A-1: Inadequate documentation of methods and
procedures
A-2: Is data presented to support the conclusions and
are the conclusions consistent with the findings?
[Compare the executive summary with the data in
the main report]
B.
Reliability
Is the process of the study consistent, reasonably
stable over time and across researchers and
methods?
Examples:
B-2: Data was only collected from people who
attended focus groups or community meetings
B-4: Were coding and quality checks made and
did they show agreement?
C. Credibility
Are the findings credible to the people studied and to
readers? Is there an authentic picture of what is being
studied?
Examples:
C-1: Is there sufficient information to provide a credible
description of the subjects or situations studied?
C-3: Was triangulation among methods and data sources
systematically applied? Were findings generally
consistent? What happened if they were not?
D. Transferability
Do the conclusions fit other contexts and how
widely can they be generalized?
Examples:
D-1: Are the characteristics of the sample
described in enough detail to permit
comparisons with other samples?
D-4: Does the report present enough detail for
readers to assess potential transferability?
E. Utilization
Were findings useful to clients,
researchers and communities studied?
Examples:
E-1: Were findings intellectually and
physically accessible to potential
users?
E-3: Do the findings provide guidance for
future action?
3. Additional threats to validity for Quasi-Experimental Designs [QED]
[see Overview Handbook Appendix 1]
F. Threats to statistical conclusion validity: why inferences about statistical association between two variables (for example, project intervention and outcome) may not be valid
G. Threats to internal validity: why assumptions that project interventions have caused observed outcomes may not be valid
H. Threats to construct validity: why selected indicators may not adequately describe the constructs and causal linkages in the evaluation model
I. Threats to external validity: why assumptions about the potential replicability of a project in other locations or with other groups may not be valid
F. Statistical conclusion validity
The statistical design and analysis may incorrectly assume that program interventions have contributed to the observed outputs.
• The wrong tests are used, or they are applied/interpreted incorrectly
• Problems with sample design
• Measurement errors
G. Threats to internal validity
It may be incorrectly assumed that there is a causal relationship between project interventions and observed outputs.
• Unclear temporal sequence between the project and the observed outcomes
• Need to control for external factors
• Effects of time
• Unreliable measures
Example of threat to internal validity: the assumed causal model
Women join the village bank, where they receive loans, learn skills and gain self-confidence, WHICH…
• Increases women’s income
• Increases women’s control over household resources

An alternative causal model
• Some women had previously taken literacy training, which increased their self-confidence and work skills
• Women who had taken literacy training are more likely to join the village bank; their literacy and self-confidence make them more effective entrepreneurs
• Women’s income and control over household resources increased as a combined result of literacy, self-confidence and loans
H. Threats to construct validity
The indicators of outputs, impacts and contextual variables may not adequately describe and measure the constructs (hypotheses/concepts) on which the program theory is based.
• Indicators may not adequately measure key concepts
• The program theory model and the interactions between stages of the model may not be adequately specified
• Reactions to the experimental context are not well understood
I. Threats to external validity
Assumptions about how the findings could be generalized to other contexts may not be valid.
• Some important characteristics of the project context may not be understood
• Important characteristics of the project participants may not be understood
• Seasonal and other cyclical effects may have been overlooked
RealWorld Evaluation book
• Appendix 2 gives a worksheet for
assessing the quality and validity of an
evaluation design
• Appendix 3 provides worked examples
4. Addressing generic
threats to validity for
all evaluation designs
A. Confirmability
Example: Threat A-1: inadequate documentation of methods and procedures
Possible ways to address:
• Request the researchers to revise their documentation to explain their methodology more fully or to provide missing material
• Rapid data collection methods (surveys, desk research, secondary data) to fill gaps
B. Reliability
Example: Threat B-4: data were not collected across the full range of appropriate settings, times, respondents, etc.
Possible ways to address:
• If the study has not yet been conducted, revise the sample design or use qualitative methods to cover the missing settings, times or respondents
• If data collection has already been completed, consider using rapid assessment methods such as focus groups, interviews with key informants, participant observation, etc. to fill in some of the gaps
C. Credibility
Example: Threat C-2: the account does not ring true and does not reflect the local context
Possible ways to address:
• If the study has not yet been conducted, revise the sample design or use qualitative methods to cover the missing settings, times or respondents
• If data collection has already been completed, consider using rapid assessment methods such as focus groups, interviews with key informants, participant observation, etc. to fill in some of the gaps
D. Transferability
Example: Threat D-3: the sample does not permit generalization to other populations
Possible ways to address:
• Organize workshops or consult key informants to assess whether the problems concern missing information, factual issues, or how the material was interpreted by the evaluator
• Return to the field to fill in the gaps, or include the impressions of key informants, focus group participants, or participant observers to provide different perspectives
E. Utilization
Example: Threat E-2: the findings do not provide guidance for future action
Possible ways to address:
• If the researchers have the necessary information, ask them to make their recommendations more explicit
• If they do not have the information, organize brainstorming sessions with community groups or the implementing agencies to develop more specific recommendations for action
Lightning feedback
• What are some of the most serious threats to validity affecting your evaluations?
• How can they be addressed?
Time for more discussion
Small group case study work, cont.
1. Evaluation ‘consultants’ meet with ‘clients’ working on the same case study (1A+1B) and (2A+2B)
2. Negotiate your proposed modification of the ToR in order to cope with the given constraints
3. Be prepared to summarize lessons learned from this exercise (and workshop)
In conclusion:
Evaluators must be prepared to:
1. Enter at a late stage in the project cycle;
2. Work under budget and time restrictions;
3. Not have access to comparative baseline data;
4. Not have access to identified comparison groups;
5. Work with very few well-qualified evaluation researchers;
6. Reconcile different evaluation paradigms and information needs of different stakeholders.
Main workshop messages
1. Evaluators must be prepared for real-world evaluation challenges
2. There is considerable experience to draw on
3. A toolkit of rapid and economical “RealWorld” evaluation techniques is available
4. Never use time and budget constraints as an excuse for sloppy evaluation methodology
5. A “threats to validity” checklist helps keep you honest by identifying potential weaknesses in your evaluation design and analysis