International Program for Development Evaluation Training


RealWorld Evaluation
Designing Evaluations under Budget, Time, Data
and Political Constraints
International Perspectives on Impact Evaluation
Professional pre-session workshop #5
Cairo
29 March 2009
Facilitated by
Jim Rugh
Note: This PowerPoint presentation and the summary
chapter of the book are available at:
www.RealWorldEvaluation.org
1
Workshop Objectives
1. The seven steps of the RealWorld Evaluation approach for addressing common issues and constraints faced by evaluators, such as: when the evaluator is not called in until the project is nearly completed and there was no baseline or comparison group; when the evaluation must be conducted with an inadequate budget and insufficient time; and when there are political pressures and expectations for how the evaluation should be conducted and what the conclusions should say
2
Workshop Objectives
2. Defining what impact evaluation should be
3. Identifying and assessing various design options that could be used in a particular evaluation setting
4. Ways to reconstruct baseline data when the evaluation does not begin until the project is well advanced or completed
5. How to identify and address threats to the validity or adequacy of quantitative, qualitative and mixed-method designs with reference to the specific context of RealWorld evaluations
3
Workshop Objectives
Note: This workshop will focus on project-level impact evaluations. There are, of course, many other purposes, scopes, evaluands and types of evaluations. Some of these methods may apply to them, but our examples will be based on project impact evaluations, most of them in the context of developing countries.
4
Workshop agenda
1. Introduction [15 minutes]
2. Quick summary of the RealWorld Evaluation (RWE) approach [30 minutes]
3. Small group self-introductions and sharing of RWE issues you have faced in
your own practice. [45 minutes]
4. Scoping the evaluation and identifying budget and time constraints, also
logic models, evaluation designs [60 minutes]
--- short break ---
5. Addressing data constraints [30 minutes]
6. Mixed methods [30 minutes]
7.a. Group exercise: preparing an evaluation design when working under
budget, time, data or political constraints. The cases will also illustrate the
different evaluation agendas and perspectives of evaluation consultants,
project implementers and funding agencies. [30 minutes]
--- lunch [60 minutes] ---
7.b. Small group work continues [60 minutes]
8. Plenary: Identifying threats to validity [30 minutes]
9. Paired groups negotiate their ToRs [45 minutes]
10-11-12: Feedback, wrap-up discussion, evaluation of the workshop
RealWorld Evaluation
Designing Evaluations under Budget,
Time, Data and Political Constraints
OVERVIEW OF THE
RWE APPROACH
6
RealWorld Evaluation Scenarios
Scenario 1: Evaluator(s) not brought in until near
end of project
For political, technical or budget reasons:
• There was no baseline survey
• Project implementers did not collect
adequate data on project participants at the
beginning or during the life of the project
• It is difficult to collect data on comparable
control groups
7
RealWorld Evaluation Scenarios
Scenario 2: The evaluation team is called in
early in the life of the project
But for budget, political or methodological
reasons:
 The ‘baseline’ was a needs assessment,
not comparable to eventual evaluation
 It was not possible to collect baseline data
on a comparison group
8
Reality Check – Real-World
Challenges to Evaluation
• All too often, project designers do not think evaluatively – evaluation not designed until the end
• There was no baseline – at least not one with data comparable to the evaluation
• There was/can be no control/comparison group
• Limited time and resources for evaluation
• Clients have prior expectations for what the evaluation findings will say
• Many stakeholders do not understand evaluation; distrust the process; or even see it as a threat (dislike of being judged)
9
RealWorld Evaluation
Quality Control Goals
• Achieve maximum possible evaluation rigor within the limitations of a given context
• Identify and control for methodological weaknesses in the evaluation design
• Negotiate with clients trade-offs between desired rigor and available resources
• Presentation of findings must recognize methodological weaknesses and how they affect generalization to broader populations
10
The Need for the RealWorld
Evaluation Approach
As a result of these kinds of constraints, many of the basic principles of rigorous impact evaluation design (comparable pre-test/post-test design, control group, adequate instrument development and testing, random sample selection, control for researcher bias, thorough documentation of the evaluation methodology, etc.) are often sacrificed.
11
The RealWorld Evaluation
Approach
An integrated approach to ensure acceptable standards of methodological rigor while operating under real-world budget, time, data and political constraints.
See handout summary chapter extracted from
RealWorld Evaluation book for more details
12
The RealWorld Evaluation
approach
• Developed to help evaluation practitioners and clients: managers, funding agencies and external consultants
• Still a work in progress (more to be learned)
• Originally designed for developing countries, but equally applicable in industrialized nations
13
Special Evaluation Challenges in
Developing Countries
• Unavailability of needed secondary data
• Scarce local evaluation resources
• Limited budgets for evaluations
• Institutional and political constraints
• Lack of an evaluation culture (though evaluation associations are addressing this)
• Many evaluations are designed by and for external funding agencies and seldom reflect local and national stakeholder priorities
14
Special Evaluation Challenges in
Developing Countries
Despite these challenges, there is a growing demand for methodologically sound evaluations which assess the impacts, sustainability and replicability of development projects and programs.
15
Most RealWorld Tools are not New—
Only the Integrated Approach is New
• Most of the RealWorld Evaluation data collection and analysis tools will be familiar to most evaluators
• What is new is the integrated approach which combines a wide range of tools to produce the best quality evaluation under real-world constraints
16
Who Uses RealWorld Evaluation
and When?
• Two main users:
  • evaluation practitioners
  • managers, funding agencies and external consultants
• The evaluation may start at:
  • the beginning of the project
  • after the project is fully operational
  • during or near the end of project implementation
  • after the project is finished
18
What is Special About the
RealWorld Evaluation Approach?
• There is a series of steps, each with checklists for identifying constraints and determining how to address them
• These steps are summarized on the following slide and then in the more detailed flow-chart (see page 6 of handout)
19
The Steps of the RealWorld
Evaluation Approach
Step 1: Planning and scoping the evaluation
Step 2: Addressing budget constraints
Step 3: Addressing time constraints
Step 4: Addressing data constraints
Step 5: Addressing political constraints
Step 6: Assessing and Addressing the strengths
and weaknesses of the evaluation design
Step 7: Helping clients use the evaluation
20
The Real-World Evaluation Approach

Step 1: Planning and scoping the evaluation
A. Defining client information needs and understanding the political context
B. Defining the program theory model
C. Identifying time, budget, data and political constraints to be addressed by the RWE
D. Selecting the design that best addresses client needs within the RWE constraints

Step 2: Addressing budget constraints
A. Modify evaluation design
B. Rationalize data needs
C. Look for reliable secondary data
D. Revise sample design
E. Economical data collection methods

Step 3: Addressing time constraints
All Step 2 tools plus:
F. Commissioning preparatory studies
G. Hire more resource persons
H. Revising format of project records to include critical data for impact analysis
I. Modern data collection and analysis technology

Step 4: Addressing data constraints
A. Reconstructing baseline data
B. Recreating comparison groups
C. Working with non-equivalent comparison groups
D. Collecting data on sensitive topics or from difficult to reach groups
E. Multiple methods

Step 5: Addressing political influences
A. Accommodating pressures from funding agencies or clients on evaluation design
B. Addressing stakeholder methodological preferences
C. Recognizing influence of professional research paradigms

Step 6: Assessing and addressing the strengths and weaknesses of the evaluation design
An integrated checklist for multi-method designs:
A. Objectivity/confirmability
B. Replicability/dependability
C. Internal validity/credibility/authenticity
D. External validity/transferability/fittingness

Step 7: Helping clients use the evaluation
A. Utilization
B. Application
C. Orientation
D. Action
21
TIME FOR DISCUSSION
22
1. Self-introductions
2. What constraints of
these types have you
faced in your evaluation
practice?
3. How did you cope with
them?
23
RealWorld Evaluation
Designing Evaluations under Budget,
Time, Data and Political Constraints
The challenge of the
counterfactual
Attribution and counterfactuals
How do we know if the observed changes in the project participants or communities
• income, health, attitudes, school attendance, etc.
are due to the implementation of the project
• credit, water supply, transport vouchers, school construction, etc.
or to other unrelated factors?
• changes in the economy, demographic movements, other development programs, etc.
25
The Counterfactual
What would have been the condition of the project population at the time of the evaluation if the project had not taken place?
26
Where is the counterfactual?
After families had been living in a new housing project for 3 years, a study found average household income had increased by 50%.
Does this show that housing is an effective way to raise income?
27
Comparing the project with two
possible comparison groups
[Chart: household income (vertical axis, 250–750) from 2000 to 2002. Project group: 50% increase. Scenario 1: no increase in comparison group income – potential evidence of project impact. Scenario 2: 50% increase in comparison group income – no evidence of project impact.]
5 main evaluation strategies
for addressing the counterfactual
Randomized designs
I. True experimental designs
II. Randomized selection of participants &
control
Quasi-experimental designs
III. Strong quasi-experimental designs
IV. Weaker quasi-experimental designs
Non-experimental designs.
V. No logically defensible counterfactual
29
The most rigorous statistical designs: randomized experimental or at least strong quasi-experimental evaluation designs

Subjects are randomly assigned to the project and control groups, or the control group is selected using statistical or judgmental matching. Conditions of both groups are not controlled during the project.

                T1 Pre-test    T2 Treatment [project]    T3 Post-test
Project group   P1             X                         P2
Control group   C1                                       C2

Gain score [impact] = (P2 – P1) – (C2 – C1)
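To make the gain-score arithmetic concrete, here is a minimal sketch in Python (our illustration, not from the book); the group means are hypothetical.

```python
# Difference-in-differences gain score, as in the design table above.
# All values are hypothetical group means on the impact indicator.

def gain_score(p1: float, p2: float, c1: float, c2: float) -> float:
    """Impact estimate: project group gain minus control group gain."""
    return (p2 - p1) - (c2 - c1)

# Example: project incomes rise 500 -> 750 while control incomes
# rise 500 -> 600, so the estimated impact is 150.
print(gain_score(p1=500, p2=750, c1=500, c2=600))  # 150.0
```

Randomization (or careful matching) is what justifies attributing this difference to the project rather than to initial differences between the groups.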
Control group and comparison group
• Control group = randomized allocation of subjects to project and non-treatment group
• Comparison group = separate procedure for sampling project and non-treatment groups that are as similar as possible in all aspects except the treatment (intervention)
31
Reference sources for
randomized field trial designs
1. MIT Poverty Action Lab
www.povertyactionlab.org
2. Center for Global Development
“When will we ever learn?”
http://www.cgdev.org/content/publications/detail/7973
32
The limited use of strong
evaluation designs
In the real world we estimate that:
• Less than 5-10% of impact evaluations use a strong quasi-experimental design
• Significantly less than 5% use randomized control trials (experimental design)
33
There are other methods for
assessing the counterfactual
• Reliable secondary data that depicts relevant trends in the population
• Longitudinal monitoring data (if it includes non-reached population)
• Qualitative methods to obtain perspectives of key informants, participants, neighbors, etc.
• We'll talk more about this in the 5th session
34
RealWorld Evaluation
Designing Evaluations under Budget,
Time, Data and Political Constraints
Step 1
PLANNING AND SCOPING THE
EVALUATION
35
Step 1: Planning and Scoping the
Evaluation
• Understanding client information needs
• Defining the program theory model
• Preliminary identification of constraints to be addressed by the RealWorld Evaluation
36
A. Understanding client information
needs
Typical questions clients want answered:
 Is the project achieving its objectives?
 Are all sectors of the target population
benefiting?
 Are the results sustainable?
 Which contextual factors determine the
degree of success or failure?
37
A. Understanding client information
needs
A full understanding of client information
needs can often reduce the types of
information collected and the level of
detail and rigor necessary.
However, this understanding could also
increase the amount of information
required!
38
B. Defining the program theory
model
All programs are based on a set of assumptions (hypotheses) about how the project's interventions should lead to desired outcomes.
• Sometimes this is clearly spelled out in project documents.
• Sometimes it is only implicit and the evaluator needs to help stakeholders articulate the hypothesis through a logic model.
39
B. Defining the program theory
model
• Defining and testing critical assumptions are essential (but often ignored) elements of program theory models.
• The following is an example of a model to assess the impacts of microcredit on women's social and economic empowerment.
40
Critical Hypothesis for a Gender-Inclusive
Micro-Credit Program
• Outputs
  • If credit is available, women will be willing and able to obtain loans and technical assistance.
• Short-term outcomes
  • If women obtain loans they will start income-generating activities.
  • Women will be able to control the use of loans and reimburse them.
• Medium/long-term impacts
  • Economic and social welfare of women and their families will improve.
  • Increased women's economic and social empowerment.
• Sustainability
  • Structural changes will lead to long-term impacts.
41
[Diagram: generic problem tree – consequences above a central PROBLEM, with PRIMARY CAUSES 1–3 below, each broken into secondary causes (e.g. 2.1–2.3) and tertiary causes (e.g. 2.2.1–2.2.3).]

[Diagram: example problem tree – high infant mortality rate; children are malnourished; diarrheal disease; underlying causes include insufficient food, poor quality of food, contaminated water, flies and rodents, and unsanitary practices (people do not wash hands before eating, do not use facilities correctly); a need for improved health policies is noted.]

[Diagram: example results chain for a microcredit program – MFI provides credit and training of agents; S&L groups organized, women educated, credit provided to entrepreneurs; women able to reimburse loans, improved economic conditions, women achieve rights within household, women in leadership roles; leading to women empowered and reduction in poverty.]
What does it take to measure
indicators at each level?
Program impact: population-based survey (program baseline, program evaluation some time after projects completed)
Project impact: population-based survey (baseline, evaluation)
Effect (b): population-based survey (usually only during baseline and evaluation)
Effect (a): follow-up survey of participants (can be done annually)
Output: measured by project staff annually
Activities: on-going (monitoring)
Inputs: on-going (financial accounts)
We need to recognize which evaluative process is most appropriate for measurement at various levels:
[Diagram: Impact and Effect are measured through program and project evaluation; Output, Activities and Inputs through performance monitoring.]
Coming to agreement on what levels of the
logic model to include in evaluation
• This can be a sensitive issue: project staff generally don't like to be held accountable for more than the output level, while donors (and intended beneficiaries) may insist on evaluating higher-level outcomes.
• An approach evaluators might take is that if the correlation between intermediary outcomes (or even qualified outputs) and impact has been adequately established through research and program evaluations, then assessing intermediary outcome-level indicators might suffice, as long as the contexts can be shown to be sufficiently similar to where such hypotheses have been tested.
47
Determining appropriate (and
feasible) evaluation design
Based on an understanding of client information needs, required level of rigor, and what is possible given the constraints, the evaluator and client need to determine what evaluation design is required and possible under the circumstances.
48
Let’s focus for a while on evaluation
design (a quick review)
1: Review different evaluation (experimental/quasi-experimental) designs
2: Develop criteria for determining appropriate Terms of Reference (ToR) for evaluating a project, given its own (planned or unplanned) evaluation design
3: Use decision tree to make choices of what's required (or feasible) to include in an evaluation ToR
4: A life-of-project evaluation design perspective
49
An introduction to various evaluation designs
Illustrating the need for quasi-experimental longitudinal time series evaluation design
[Chart: scale of major impact indicator over time for project participants vs comparison group, with observations at baseline, end-of-project evaluation and post-project evaluation.]
50
OK, let’s stop the action to
identify each of the major
types of evaluation (research)
design …
… one at a time, beginning with the
most rigorous design.
51
First of all: the key to the traditional symbols:
• X = Intervention (treatment), i.e. what the project does in a community
• O = Observation event (e.g. baseline, mid-term evaluation, end-of-project evaluation)
• P (top row): Project participants
• C (bottom row): Comparison (control) group
Note: the 7 RWE evaluation designs are laid out on page 46 of your handout
52
Design #1: Longitudinal Quasi-experimental
Project participants: P1 X P2 X P3 P4
Comparison group:     C1   C2   C3 C4
(observations at baseline, midterm, end-of-project evaluation and post-project evaluation)
53
Design #1+: Longitudinal Randomized Control Trial
Project participants: P1 X P2 X P3 P4
Control group:        C1   C2   C3 C4
(research subjects randomly assigned either to project or control group; observations at baseline, midterm, end-of-project evaluation and post-project evaluation)
54
Design #2+: Randomized Control Trial
Project participants: P1 X P2
Control group:        C1   C2
(research subjects randomly assigned either to project or control group; observations at baseline and end-of-project evaluation)
55
Design #2: Quasi-experimental (pre+post, with comparison)
Project participants: P1 X P2
Comparison group:     C1   C2
(observations at baseline and end-of-project evaluation)
56
Design #3: Truncated Longitudinal
Project participants: X P1 X P2
Comparison group:       C1   C2
(treatment under way before the first observation; observations at midterm and end-of-project evaluation)
57
Design #4: Pre+post of project; post-only comparison
Project participants: P1 X P2
Comparison group:            C
(baseline and end-of-project observations of participants; comparison group observed at end of project only)
58
Design #5: Post-test only of project and comparison
Project participants: X P
Comparison group:       C
(end-of-project observation only)
59
Design #6: Pre+post of project; no comparison
Project participants: P1 X P2
(observations at baseline and end-of-project evaluation; no comparison group)
60
Design #7: Post-test only of project participants
Project participants: X P
(end-of-project observation only; no baseline, no comparison group)
61
Other questions to answer as
you customize an evaluation
Terms of Reference (ToR):
1. Who asked for the evaluation? (Who are the key stakeholders?)
2. What are the key questions to be answered?
3. Will this be a formative or summative evaluation?
4. Will there be a next phase, or other projects designed based on the findings of this evaluation?
68
Other questions to answer as
you customize an evaluation
ToR:
5. What decisions will be made in response to the findings of this evaluation?
6. What is the appropriate level of rigor?
7. What is the scope/scale of the evaluation/evaluand (thing to be evaluated)?
8. How much time will be needed/available?
9. What financial resources are needed/available?
69
Other questions to answer as
you customize an evaluation
ToR:
10. Should the evaluation rely mainly on quantitative or qualitative methods?
11. Should participatory methods be used?
12. Can/should there be a household survey?
13. Who should be interviewed?
14. Who should be involved in planning/implementing the evaluation?
15. What are the most appropriate media for communicating the findings to different stakeholder audiences?
70
Evaluation (research) design?
Key questions?
Evaluand (what to evaluate)?
Qualitative?
Quantitative?
Scope?
Appropriate level of rigor?
Resources available?
Time available?
Skills available?
Participatory?
Extractive?
Evaluation FOR whom?
Does this help, or just confuse things more? Who said evaluations (like life) would be easy?!!
71
Before we return to
the RealWorld steps,
let’s gain a
perspective on levels
of rigor, and what a
Life-Of-Project
Evaluation Plan could
look like
72
Different levels of rigor
(depends on source of evidence, level of confidence, and use of information)

Objective, high precision, more time & expense:
• Level 5: A thorough research project is undertaken to conduct in-depth analysis of the situation; P = +/- 1%. Book published!
• Level 4: Good sampling and data collection methods used to gather data which is representative of the target population; P = +/- 5%. Decision maker reads full report.
• Level 3: A rapid survey is conducted on a convenient sample of participants; P = +/- 15%. Decision maker reads 10-page summary of report.
• Level 2: A fairly good mix of people are asked their perspectives about the project; P = +/- 25%. Decision maker reads at least executive summary of report.
• Level 1: A few people are asked their perspectives about the project; P = +/- 40%. Decision made in a few minutes.
• Level 0: Decision-maker's impressions based on anecdotes and sound bites heard during brief encounters (hallway gossip), mostly intuition; level of confidence +/- 50%. Decision made in a few seconds.
Subjective, sloppy, quick & cheap
73
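The "+/-" figures can be read roughly as survey margins of error. A minimal sketch (our illustration, assuming simple random sampling and a proportion near 50%) shows how precision is bought with sample size:

```python
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for an estimated proportion p
    from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical sample sizes: quadrupling n halves the margin of error.
for n in (25, 100, 400):
    print(f"n={n}: +/- {100 * margin_of_error(n):.1f}%")
# n=25: +/- 19.6%, n=100: +/- 9.8%, n=400: +/- 4.9%
```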
Determining appropriate levels of precision for events in a life-of-project evaluation plan

[Chart: level of rigor (low to high) against time during the project life cycle. The needs assessment and annual self-evaluations sit at lower rigor; a special study at mid rigor; the baseline study, mid-term evaluation and final evaluation are all held at the same, higher level of rigor.]
74
TIME FOR DISCUSSION
75
Now, where were we?
Oh, yes, we’re ready for Steps 2 and
3 of the RealWorld Evaluation
Approach.
Let’s continue …
76
RealWorld Evaluation
Designing Evaluations under Budget,
Time, Data and Political Constraints
Steps 2 + 3
ADDRESSING BUDGET AND
TIME CONSTRAINTS
77
Step 2: Addressing budget
constraints
A. Clarifying client information needs
B. Simplifying the evaluation design
C. Look for reliable secondary data
D. Review sample size
E. Reducing costs of data collection and analysis
78
2A: Simplifying the evaluation
design
• For quantitative evaluations it is possible to select among the 7 most common evaluation designs (noting the trade-offs when using a simpler design)
• For qualitative evaluations the options will vary depending on the type of design
79
2A (cont): Qualitative designs
Depending upon the design, some of the options might include:
• Reducing the number of units studied (communities, families, schools)
• Reducing the number of case studies or the duration and complexity of the cases
• Reducing the duration or frequency of observations
80
2.B. Rationalize data needs
• Use information from Step 1 to identify client information needs
• Review all data collection instruments and cut out any questions not directly related to the objectives of the evaluation
81
2.C. Look for reliable
secondary sources
Planning studies, project administrative records, government ministries, other NGOs, universities / research institutes, mass media.
82
2.C. Look for reliable
secondary sources, cont.
Assess the relevance and reliability of sources for the evaluation with respect to:
• Coverage of the target population
• Time period
• Relevance of the information collected
• Reliability and completeness of the data
• Potential biases
83
2.D. Seeking ways to reduce
sample size
Accepting a lower level of precision significantly reduces the required number of interviews:
• To test for a 5% change in proportions requires a maximum sample of 1086
• To test for a 10% change in proportions requires a maximum sample of up to 270
84
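The exact figures above depend on the significance level, power and baseline proportion assumed. As a rough sketch (our assumptions: two-sided alpha = 0.05, 80% power, baseline proportion 50%), the standard two-sample comparison-of-proportions formula shows the slide's point: doubling the detectable change cuts the required sample to roughly a quarter.

```python
import math

def n_per_group(p1: float, p2: float,
                z_alpha: float = 1.96, z_beta: float = 0.84) -> int:
    """Approximate sample size per group to detect a change in a
    proportion from p1 to p2 (two-sided alpha = 0.05, power = 0.80)."""
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

print(n_per_group(0.50, 0.55))  # ~1561 per group for a 5-point change
print(n_per_group(0.50, 0.60))  # ~385 per group for a 10-point change
```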
2.E. Reducing costs of data
collection and analysis
• Use self-administered questionnaires
• Reduce length and complexity of instrument
• Use direct observation
• Obtain estimates from focus groups and community forums
• Key informants
• Participatory assessment methods
• Multi-methods and triangulation
86
Step 3: Addressing time
constraints
In addition to Step 2 methods:
• Reduce time pressures on external consultants:
  • Commission preparatory studies
  • Video conferences
• Hire more consultants/researchers
• Incorporate outcome indicators in project monitoring systems and documents
• Technology for data inputting/coding
87
Addressing time constraints
It is important to distinguish between approaches that reduce:
a) the duration in terms of time over the life of the project (e.g. from baseline to final evaluation over 5 years),
b) the duration in terms of the time needed to undertake the actual evaluation study/studies (e.g. 6 weeks, whether completed in an intensive consecutive 6 weeks or a cumulative total of 6 weeks periodically over the course of a year), and
c) the level of effort (person-days, i.e. number of staff x total days required).
88
Addressing time constraints
Negotiate with the client to discuss questions such as the following:
1. What information is essential and what could be dropped or reduced?
2. How much precision and detail is required for the essential information? E.g. is it necessary to have separate estimates for each geographical region or sub-group, or is a population average acceptable?
3. Is it necessary to analyze all project components and services or only the most important?
4. Is it possible to obtain additional resources (money, staff, computer access, vehicles etc.) to speed up the data collection and analysis process?
89
TIME FOR A BREAK !
90
RealWorld Evaluation
Designing Evaluations under Budget,
Time, Data and Political Constraints
Step 4
Addressing data
constraints
Step 4: Addressing data constraints

(Recap of the RWE steps: Step 1 Planning and scoping the evaluation; Step 2 Addressing budget constraints; Step 3 Addressing time constraints; Step 4 Addressing data constraints; Step 5 Addressing political constraints; Step 6 Assessing the strengths and weaknesses of the evaluation design; Step 7 Helping clients use the evaluation.)
Step 4: Addressing data constraints
A. Reconstructing baseline data
B. Special challenges in working with comparison groups
C. Collecting data on sensitive topics
D. Collecting data on difficult to reach groups
1. The importance of baseline data
• Hard to assess change without data on pre-project conditions
• Post-test comparisons do not fully address:
  • Selection bias: initial differences between participants and non-participants (propensity score matching and instrumental variables partially address this)
  • Historical factors influencing outcomes that were assumed to have been caused by the project intervention
93
1. Ways to reconstruct baseline
conditions
A. Secondary data
B. Project records
C. Recall
D. Key informants
E. PRA and other participatory techniques such as timelines and critical incidents, to help establish the chronology of important changes in the community
94
1-A. Assessing the utility of
potential secondary data
• Reference period
• Population coverage
• Inclusion of required indicators
• Completeness
• Accuracy
• Free from bias
95
1-A. Using secondary data to
reconstruct baselines
• Census
• Surveys
• Project administrative data
• Agency reports
• Special studies by NGOs, donors
• University studies
• Mass media (newspapers, radio, TV)
96
1-B. Using project records
Types of data
 Feasibility/planning studies
 Application/registration forms
 Supervision reports
 Management Information System (MIS) data
 Meeting reports
 Community and agency meeting minutes
 Progress reports
 Construction costs
97
1-B. Assessing the reliability of
project records
• Who collected the data and for what purpose?
• Were they collected for record-keeping or to influence policymakers or other groups?
• Do monitoring data only refer to project activities or do they also cover changes in outcomes?
• Were the data intended exclusively for internal use? For use by a restricted group? Or for public use?
98
1-B. Assessing the reliability of
project records
• How accurate and complete are the data? Are there obvious gaps? Were these intentional or due to poor record-keeping?
• Potential biases with respect to the key indicators required for the impact evaluation?
99
1-B. Working with the client to improve the
utility of project monitoring data for
evaluation
• Collecting additional information on applicants or participants
• Ensure identification data is included and accurate
• Ensure data are organized in the way needed for evaluation [by community / types of service / family rather than just individuals / economic level etc.]
100
1-C. Using recall to reconstruct
baseline data
• School attendance and time/cost of travel
• Sickness/use of health facilities
• Income and expenditures
• Community/individual knowledge and skills
• Social cohesion/conflict
• Water usage/quality/cost
• Periods of stress
• Travel patterns
101
1-C. Where Knowledge about
Recall is Greatest
• Areas where most research has been done on the validity of recall:
  • Income and expenditure surveys
  • Demographic data and fertility behavior
• Types of questions:
  • Yes/No; fact
  • Scaled
  • Easily related to major events
102
1-C. Limitations of recall
• Generally not reliable for precise quantitative data
• Sample selection bias
• Deliberate or unintentional distortion
• Few empirical studies (except on expenditure) to help adjust estimates
103
1-C. Sources of bias in recall
• Who provides the information
• Under-estimation of small and routine expenditures
• “Telescoping” of recall concerning major expenditures
• Distortion to conform to accepted behavior:
  • Intentional or unconscious
  • Romanticizing the past
  • Exaggerating (e.g. “We had nothing before this project came.”)
• Contextual factors:
  • Time intervals used in question
  • Respondents' expectations of what the interviewer wants to know
• Implications for the interview protocol
104
1-C. Improving the validity of
recall
• Conduct small studies to compare recall with survey or other findings
• Ensure all groups are interviewed
• Triangulation
• Link recall to important reference events:
  • Elections
  • Drought/floods
  • Construction of road, school, etc.
105
1-F. Summary of issues in
baseline reconstruction
• Variations in reliability of recall
• Memory distortion
• Secondary data not easy to use
• Secondary data incomplete or unreliable
• Key informants may distort the past
111
2. Reconstructing comparison
(control) groups
112
2. Ways to reconstruct control
groups
• Judgmental matching of communities
• When project services are introduced in phases, beneficiaries entering in later phases can be used as a “pipeline” control group
• Internal controls when different subjects receive different combinations and levels of services
113
2. Using propensity scores to
strengthen comparison groups
• Propensity score matching
• Rapid assessment studies can compare characteristics of project and control groups using:
  • Observation
  • Key informants
  • Focus groups
  • Secondary data
  • Aerial photos and GIS data
114
2. Using propensity scores to
strengthen comparison groups
• Logistic regression (logit) on project and comparison population to identify determinants of project participation
• Select “nearest neighbors” (usually around 5) from the comparison group who most closely match a participant
• Project impact = gain score = difference between project participant score and mean score for nearest neighbors
115
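A minimal sketch of this nearest-neighbor procedure (our illustration using scikit-learn, not the book's prescribed implementation; the column names "treated" and "outcome" and the covariate list are hypothetical):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def psm_impact(df: pd.DataFrame, covariates: list,
               treated_col: str = "treated", outcome_col: str = "outcome",
               k: int = 5) -> float:
    # 1. Logistic regression estimates each unit's propensity to participate.
    logit = LogisticRegression(max_iter=1000).fit(df[covariates], df[treated_col])
    scores = logit.predict_proba(df[covariates])[:, 1]
    is_treated = df[treated_col].to_numpy() == 1

    # 2. For each participant, find the k nearest comparison-group
    #    neighbors on the propensity score (the slide suggests around 5).
    nn = NearestNeighbors(n_neighbors=k).fit(scores[~is_treated].reshape(-1, 1))
    _, idx = nn.kneighbors(scores[is_treated].reshape(-1, 1))

    # 3. Impact = mean gain score: each participant's outcome minus the
    #    mean outcome of their matched comparison-group neighbors.
    control_outcomes = df.loc[~is_treated, outcome_col].to_numpy()
    treated_outcomes = df.loc[is_treated, outcome_col].to_numpy()
    return float((treated_outcomes - control_outcomes[idx].mean(axis=1)).mean())
```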
Issues in reconstructing control
groups
• Project areas often selected purposively and difficult to match
• Differences between project and control groups make it difficult to assess whether outcomes are due to the project or to these initial differences
• Lack of good data to select control groups
• Contamination
• Econometric methods cannot fully adjust for initial differences between the groups [unobservables]
116
Pause for DISCUSSION
RealWorld Evaluation
Designing Evaluations under Budget,
Time, Data and Political Constraints
Mixed-method evaluation
I. Mixed Method Designs
1. Quantitative data collection
methods
• Structured surveys (household, farm, transport usage, etc.)
• Structured observation
• Anthropometric methods
• Aptitude and behavioral tests
120
1. Quantitative data collection methods
Strengths and weaknesses
Strengths:
• Statistically representative
• Generalization
• Estimate magnitude and distribution of impacts
• Clear documentation of methods
• Standardized approach
• Statistical control of bias and external factors

Weaknesses:
• Surveys cannot capture many types of information
• Do not work for difficult to reach groups
• Lack analysis of context
• Survey situation may alienate
• Long delay in obtaining results
• Data reduction loses information
121
2. Qualitative data collection methods
Interviewing:
• Structured
• Semi-structured
• Unstructured
• Focus groups
• Community interviews
• Participatory Rapid Appraisal (PRA) methods
• Audio recording

Observation:
• Participant observation
• Structured observation
• Unstructured observation
• Photography and video recording

Analysis of documents and artifacts:
• Project documents
• Published reports
• E-mail
• Legal documents: birth and death certificates, property transfer documents, marriage certificates
• Posters
• Decorations in the house
• Clothing and gang insignia
122
2. Qualitative data collection methods
Characteristics
• The researcher's perspective is an integral part of what is recorded about the social world
• Scientific detachment is not possible
• Meanings given to social phenomena and situations must be understood
• Programs cannot be studied independently of their context
• Difficult to define clear cause and effect
• Change must be studied holistically
123
2. Qualitative data collection methods
Strengths and weaknesses
Strengths:
• Flexible to evolve
• Sampling focuses on high value subjects
• Holistic focus (“the big picture”)
• Multiple sources provide complex understanding
• Narrative more accessible to non-specialists
• Triangulation strengthens validity of findings

Weaknesses:
• Lack of clear design may frustrate clients
• Lack of generalizability
• Multiple perspectives - hard to reach consensus
• Individual factors not isolated
• Interpretive methods appear too subjective
124
3. Mixed method evaluation designs
• Combine the strengths of both QUANT and QUAL approaches
• One approach (QUANT or QUAL) is often dominant and the other complements it
• Can have both approaches equal, but harder to design and manage
• Can be used sequentially or concurrently
125
It should NOT be a fight between pure QUALITATIVE (verbiage alone, the “Qualoid”) and pure QUANTITATIVE (numbers alone, the “Quantoid”).
126
Qualoid: “Your numbers look impressive, but let me tell you the human interest story.”
Quantoid: “Your human interest story sounds nice, but let me show you the statistics.”
127
What’s needed is the right combination of
BOTH QUALITATIVE methods
AND QUANTITATIVE methods
128
Participatory approaches should be used as much as possible, but even they should be used with appropriate rigor: how many (and which) people's perspectives contributed to the story?
129
3. Mixed method evaluation designs
How quantitative and qualitative methods
complement each other
A. Broaden the conceptual framework
• Combining theories from different disciplines
• Exploratory QUAL studies can help define the framework
B. Combine generalizability with depth and context
• Random subject selection ensures representativity and generalizability
• Case studies, focus groups, etc., can help understand the characteristics of the different groups selected in the sample
C. Permit access to difficult to reach groups [QUAL]
• PRA, focus groups, case studies, etc., can be effective ways to reach women, ethnic minorities and other vulnerable groups
• Direct observation can provide information on groups difficult to interview, for example the informal sector and illegal economic activities
D. Enable process analysis [QUAL]
• Observation, focus groups and informal conversations are more effective for understanding group processes or interaction between people and public agencies, and studying the organization
130
3. Mixed method evaluation designs
How quantitative and qualitative methods
complement each other (cont.)
E. Analysis and control for underlying structural factors [QUANT]
• Sampling and statistical analysis can avoid misleading conclusions
• Propensity scores and multivariate analysis can statistically control for differences between project and control groups
Example:
• Meetings with women may suggest gender biases in local firms' hiring practices; however,
• Using statistical analysis to control for years of education or experience may show there are no differences in hiring policies for workers with comparable qualifications
Example:
• Participants who volunteer to attend a focus group may be strongly in favor of or opposed to a certain project, but
• A rapid sample survey may show that most community residents have different views
131
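A sketch of what “statistical control” means in the hiring example (our illustration using statsmodels; the column names wage, female, education and experience are hypothetical):

```python
import pandas as pd
import statsmodels.formula.api as smf

def hiring_gap(df: pd.DataFrame):
    """Compare the raw gender gap in wages with the gap remaining
    after controlling for qualifications."""
    raw = smf.ols("wage ~ female", data=df).fit()
    adjusted = smf.ols("wage ~ female + education + experience", data=df).fit()
    # If the adjusted coefficient on `female` is near zero, the apparent
    # gap is explained by qualifications, as in the slide's example.
    return raw.params["female"], adjusted.params["female"]
```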
3. Mixed method evaluation designs
How quantitative and qualitative methods
complement each other (cont.)
F. Triangulation and consistency checks
• Direct observation may identify inconsistencies in interview responses
• Examples:
  • A family may say they are poor but observation shows they have new furniture, good clothes, etc.
  • A woman may say she has no source of income, but an early morning visit may show she operates an illegal beer brewing business
G. Broadening the interpretation of findings
• Combining personal experience with “social facts”
• Statistical analysis frequently includes unexpected or interesting findings which cannot be explained through the statistics. Rapid follow-up visits may help explain the findings
132
3. Mixed method evaluation designs
How quantitative and qualitative methods
complement each other (cont.)
H. Interpreting findings
Example:
• A QUANT survey of community water management in Indonesia found that, with only one exception, all village water supply was managed by women
• Follow-up visits found that in the one exceptional village women managed a very profitable dairy farming business, so men were willing to manage water to allow women time to produce and sell dairy produce
Source: Brown (2000)
133
Using Qualitative methods to improve
the Evaluation design and results
• Use recall to reconstruct the pre-test situation
• Interview key informants to identify other changes in the community or in gender relations
• Conduct interviews or focus groups with women and men to:
  • assess the effect of loans on gender relations within the household, such as changes in control of resources and decision-making
  • identify other important results or unintended consequences: increase in women's work load; increase in incidence of gender-based or domestic violence
134
Questions?
135
Enough of my
presentations: it’s time
for you (THE
RealWorld PEOPLE!)
to get involved
yourselves.
Time for small-group
work. Read your case
studies and begin your
discussions. (To be
continued after lunch.)
Small group case study work
1. Some of you are playing the role of evaluation consultants; others are clients commissioning the evaluation.
2. Decide what your group will propose to do to address the given constraints/challenges.
3. Prepare to negotiate the ToR with the other group (later this afternoon).
138
RealWorld Evaluation
Designing Evaluations under Budget,
Time, Data and Political Constraints
Step 6
Identifying and addressing threats
to the validity of the evaluation
design and conclusions
139
The Real World Evaluation [RWE] Approach
Step 1: Planning and scoping the evaluation
Step 2: Addressing budget constraints
Step 3: Addressing time constraints
Step 4: Addressing data constraints
Step 5: Addressing political influences
Step 6: Strengthening the evaluation design and validity
Step 7: Helping clients use the evaluation

Step 6: Strengthening the evaluation design and the validity of the conclusions
A. Identifying threats to validity of quasi-experimental designs
B. Assessing the adequacy of qualitative designs
C. An integrated checklist for mixed-method designs
D. Addressing threats to quantitative evaluation designs
E. Addressing threats to the adequacy of qualitative designs
F. Addressing threats to mixed-method designs
1. What is validity and why does it matter?
Defining validity
The degree to which the evaluation findings and recommendations are supported by:
• The conceptual framework describing how the project is supposed to achieve its objectives
• Statistical techniques (including sample design)
• How the project and the evaluation were implemented
• The similarities between the project population and the wider population to which findings are generalized
142
Importance of validity
Evaluations provide recommendations for future decisions and action. If the findings and interpretation are not valid:
• Programs which do not work may continue or even be expanded
• Good programs may be discontinued
• Priority target groups may not have access or benefit
143
RWE quality control goals
• The evaluator must achieve the greatest possible methodological rigor within the limitations of a given context
• Standards must be appropriate for different types of evaluation
• The evaluator must identify and control for methodological weaknesses in the evaluation design
• The evaluation report must identify methodological weaknesses and how these affect generalization to broader populations
144
2. General guidelines for assessing the
validity of all evaluation designs
[see Overview Handout Appendix 1, p. 69]
A. Confirmability
B. Reliability
C. Credibility
D. Transferability
E. Utilization
145
A. Confirmability
Are the conclusions drawn from the available evidence, and is the research relatively free of researcher bias?
Examples:
A-1: Inadequate documentation of methods and procedures
A-2: Are data presented to support the conclusions, and are the conclusions consistent with the findings? [Compare the executive summary with the data in the main report]
146
B. Reliability
Is the process of the study consistent, reasonably stable over time and across researchers and methods?
Examples:
B-2: Data was only collected from people who attended focus groups or community meetings
B-4: Were coding and quality checks made, and did they show agreement?
147
C. Credibility
Are the findings credible to the people studied and to readers? Is there an authentic picture of what is being studied?
Examples:
C-1: Is there sufficient information to provide a credible description of the subjects or situations studied?
C-3: Was triangulation among methods and data sources systematically applied? Were findings generally consistent? What happened if they were not?
148
D. Transferability (generalizability)
Do the conclusions fit other contexts, and how widely can they be generalized?
Examples:
D-1: Are the characteristics of the sample described in enough detail to permit comparisons with other samples?
D-4: Does the report present enough detail for readers to assess potential transferability?
149
E. Utilization
Were findings useful to clients, researchers and the communities studied?
Examples:
E-1: Were findings intellectually and physically accessible to potential users?
E-3: Do the findings provide practical guidance for future action?
150
3. Additional threats to validity for
Quasi-Experimental Designs [QED]
[see Overview Handout Appendix 2]
F. Threats to statistical conclusion validity: why inferences about statistical association between two variables (for example, project intervention and outcome) may not be valid
G. Threats to internal validity: why assumptions that project interventions have caused observed outcomes may not be valid
H. Threats to construct validity: why selected indicators may not adequately describe the constructs and causal linkages in the evaluation model
I. Threats to external validity: why assumptions about the potential replicability of a project in other locations or with other groups may not be valid
151
Example of threat to internal validity: the assumed causal model
Women join the village bank, where they receive loans, learn skills and gain self-confidence, which increases women's income and increases women's control over household resources.
An alternative causal model
Some women had previously taken literacy training, which increased their self-confidence and work skills. Women who had taken literacy training are more likely to join the village bank, and their literacy and self-confidence make them more effective entrepreneurs. Women's income and control over household resources increased as a combined result of literacy, self-confidence and loans.
RealWorld Evaluation book
• Appendix 2 gives a worksheet for assessing the quality and validity of an evaluation design
• Appendix 3 gives a worked example
158
Lightning feedback
What are some of the most serious threats to validity affecting your evaluations?
How can they be addressed?
Time for small groups
to meet together to
negotiate the ToRs for
the evaluation of the
housing project.
166
In conclusion:
Evaluators must be prepared to:
1. Enter at a late stage in the project cycle;
2. Work under budget and time restrictions;
3. Not have access to comparative baseline data;
4. Not have access to identified comparison groups;
5. Work with very few well qualified evaluation researchers;
6. Reconcile different evaluation paradigms and information needs of different stakeholders.
167
Main workshop messages
1. Evaluators must be prepared for real-world evaluation challenges
2. There is considerable experience to draw on
3. A toolkit of rapid and economical “RealWorld” evaluation techniques is available (see www.RealWorldEvaluation.org)
4. Never use time and budget constraints as an excuse for sloppy evaluation methodology
5. A “threats to validity” checklist helps keep you honest by identifying potential weaknesses in your evaluation design and analysis
168
169