
IDEV 624 – Monitoring and Evaluation
Evaluating Program Impact
Elke de Buhr, PhD
Payson Center for International Development
Tulane University
Process vs. Outcome/Impact Monitoring

• Process Monitoring
• Outcome/Impact Monitoring & Evaluation
LFM
USAID Results Framework
A Public Health Questions Approach to HIV/AIDS M&E

M&E questions and methods, from problem identification (base) to collective effectiveness (top):

1. What is the problem? → Situation Analysis & Surveillance (Problem Identification)
2. What are the contributing factors? → Determinants Research (Problem Identification)
3. What interventions can work (efficacy & effectiveness)? → Efficacy & Effectiveness Studies, Formative & Summative Evaluation, Research Synthesis (Understanding Potential Responses: "Are we doing the right things?")
4. What interventions and resources are needed? → Needs, Resource, Response Analysis & Input Monitoring (INPUTS)
5. What are we doing? Are we doing it right? → Process Monitoring & Evaluation, Quality Assessments (ACTIVITIES: "Are we doing them right?")
6. Are we implementing the program as planned? → Outputs Monitoring (OUTPUTS; Monitoring & Evaluating National Programs)
7. Are interventions working/making a difference? → Outcome Evaluation Studies (OUTCOMES)
8. Are collective efforts being implemented on a large enough scale to impact the epidemic (coverage; impact)? → Surveys & Surveillance (OUTCOMES & IMPACTS; Determining Collective Effectiveness: "Are we doing them on a large enough scale?")

(UNAIDS 2008)
Strategic Planning for M&E:
Setting Realistic Expectations

Number of projects by level of monitoring & evaluation effort:
• All: Input/Output Monitoring
• Most: Process Evaluation
• Some: Outcome Monitoring / Evaluation
• Few*: Impact Monitoring / Evaluation

*Disease impact monitoring is synonymous with disease surveillance and should be part of all national-level efforts, but cannot be easily linked to specific projects.
Monitoring Strategy
• Process → Activities
• Outcome/Impact → Goals and Objectives
Impact Evaluation
• Impact evaluations are undertaken to find out
whether a program has accomplished its intended
effects
• Directed at the net effects of an intervention,
impact evaluations produce "an estimate of the
impact of the intervention uncontaminated by the
influence of other processes and events that also
may affect the behavior or conditions at which the
social program being evaluated is directed"
(Rossi/Freeman 1989: 229)
• Ideally, impact assessments establish causality by
means of a randomized experiment
Outcome vs. Impact
• Outcome level: Status of an outcome at some
point in time
• Outcome change: Difference between
outcome levels at different points in time
• Impact/program effect: Proportion of an
outcome change that can be attributed
uniquely to a program as opposed to the
influence of some other factor
(Rossi/Lipsey/Freeman 2004)
Outcome vs. Impact (cont.)
• Impact/program effect: the value added or
net gain that would not have occurred without
the program and the only part of the outcome
for which the program can honestly take
credit
– Most demanding evaluation task
– Time-consuming and costly
(Rossi/Lipsey/Freeman 2004: 207)
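To make the distinction concrete, here is a minimal sketch in Python; the numbers, and especially the counterfactual change, are hypothetical assumptions for illustration only:

```python
# Hypothetical numbers for illustration only.
outcome_level_before = 40.0   # outcome level at baseline
outcome_level_after  = 60.0   # outcome level at endline

outcome_change = outcome_level_after - outcome_level_before   # 20.0

# Assumed counterfactual: other factors (secular trends, other programs)
# would have produced a change of 12.0 even without the program.
change_without_program = 12.0

# Impact/program effect: the net gain the program can take credit for.
program_effect = outcome_change - change_without_program      # 8.0
print(program_effect)
```

In practice the counterfactual is never observed directly; the evaluation designs discussed below exist to estimate it.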
Outline of an Impact
Evaluation
1. Unit of analysis
2. Research question/hypothesis
3. Evaluation design
4. Sampling method
5. Impact indicators
6. Data analysis plan
1. Unit of Analysis
Unit of Analysis
• Unit of analysis: The units on which outcome
measures are taken in an impact assessment
and, correspondingly, the units on which data are
available for analysis
• The unit of analysis in impact assessments is
determined by
1. the nature of the intervention, and
2. the targets to which the intervention is directed
• Can be individuals, households, neighborhoods,
organizations, geographic areas, etc.
(Rossi/Lipsey/Freeman 2004)
What are your program’s units
of analysis?
2. Research
Question/Hypothesis
Hypothesis
• Hypothesis: Formal statement that
predicts a relationship between one or more
factors and the problem under study
• Support or reject the null hypothesis
• Null = no relationship
• Test:
– Compare same variable over time
– Comparison between two or more groups
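As a minimal sketch of the between-groups case, assuming hypothetical scores and using scipy's independent-samples t-test:

```python
# Null hypothesis: no difference in mean outcomes between groups.
from scipy import stats

intervention = [72, 68, 75, 80, 66, 74, 78, 71]   # hypothetical scores
comparison   = [65, 70, 62, 68, 64, 67, 69, 63]   # hypothetical scores

t_stat, p_value = stats.ttest_ind(intervention, comparison)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# A small p-value (e.g., < 0.05) is evidence against the null hypothesis.
```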
Can you formulate a null
hypothesis for your program?
3. Evaluation Design
Evaluation Designs
• Evaluation strategies:
– Comparisons over time
– Comparison between groups
• Research designs:
– Pre-test/Post-test designs
– Time series
– Quasi-experiments
– Randomized experiments
Comparisons Over Time

Pretest/Posttest design:
  O1 X O2

Longitudinal designs / Time series:
  O1 O2 O3 X O4 O5 O6
  O1 X O2 X O3 X O4
Effect of Intervention?
(Series of example graphs; Fisher, A. A. and J. R. Foreit, Designing HIV/AIDS Intervention Studies: An Operations Handbook. Population Council, May 2002, pp. 56-58)
Comparisons Between Groups

Quasi-experimental design:
  Experimental group:  O1 X O2
  Comparison group:    O3      O4

Experimental design:
  Experimental group:  R  O1 X O2
  Control group:       R  O3      O4
Randomized Experiments
• “Flagships of impact assessment”
(Rossi/Lipsey/Freeman 2004: 262)
• When conducted well, provide the most credible
conclusions about program effects
• Isolate the effects of the intervention being evaluated
by ensuring that intervention and control group are
statistically equivalent except for the intervention
received
• In practice, it is sufficient if groups, as aggregates,
are comparable with regard to any characteristic
relevant to the outcome
Randomization
• Randomization: Assignment of potential targets to
intervention and control groups on the basis of
chance so that every unit in a target population
has the same probability as any other to be
selected for either group
• Approximations of randomization: Acceptable if
the groups that are being compared do not differ
on any characteristic relevant to the intervention or
the expected outcomes (→ Quasi-experiments)
(Rossi/Lipsey/Freeman 2004)
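A minimal sketch of chance-based assignment, assuming hypothetical household IDs and using only the Python standard library:

```python
import random

units = [f"household_{i}" for i in range(1, 101)]   # hypothetical units
random.shuffle(units)   # chance ordering: every unit has the same
                        # probability of landing in either group
half = len(units) // 2
intervention_group = units[:half]
control_group      = units[half:]
print(len(intervention_group), len(control_group))  # 50 50
```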
Feasible?
• Randomized experiments are not feasible for all impact
assessments
• Results may be ambiguous if
– the program is in early stages of implementation
– interventions change in ways experiments cannot easily
capture
• In addition, the method may
– be perceived as unfair or unethical (requires withholding
services from parts of the target population)
– be too resource intensive (technical expertise, time, costs, etc.)
– cause disruption in program procedures for delivering services
and create an artificial situation
Quasi-Experimental Designs
• Often used when it is not feasible to
randomly assign targets to intervention
and control groups
• Types of quasi-experimental designs:
matched controls, statistical controls,
reflexive controls, etc.
• Threats to validity: Selection bias, secular
trends, interfering events, maturation
Threats to Validity

Threats to Internal Validity

• INTERNAL VALIDITY: Any changes that are observed in the dependent variable are due to the effect of the independent variable, not to some other independent variables (extraneous variables, alternative explanations, rival hypotheses). The extraneous variables need to be controlled for in order to be sure that any results are due to the treatment, and thus that the study is internally valid.

• Threat of History: Study participants may have had outside learning experiences and enhanced their knowledge of a topic, and thus score better when they are assessed after an intervention, independent of the impact of the intervention. (No control group)

• Threat of Maturation: Study participants may have matured in their ability to understand concepts and developed learning skills over time, and thus score better when they are assessed after an intervention, independent of the impact of the intervention. (No control group)

• Threat of Mortality: Study participants may drop out and not participate in all measures. Those who drop out are likely to differ from those who continue to participate. (No pretest)

• Threat of Testing: Study participants might do better on the posttest compared to the pretest simply because they take the same test a second time.

• Threat of Instrumentation: The posttest may have been revised or otherwise modified compared to the pretest, so that the two tests are no longer comparable.

• John Henry Effect: The control group may try extra hard after not becoming part of the "chosen" group (compensatory rivalry).

• Resentful Demoralization of Control Group: Opposite of the John Henry Effect. The control group may be demoralized and perform below normal after not becoming part of the "chosen" group.

• Compensatory Equalization: The control group may feel disadvantaged for not being part of the "chosen" group and receive extra resources to keep everybody happy. This can cloud the effect of the intervention.

• Statistical Regression: A threat to validity in cases in which the researcher uses extreme groups as study participants that have been selected based on test scores. Due to the role that chance plays in test scores, the scores of students who score at the bottom of the normal curve are likely to go up, and the scores of those who score at the top are likely to go down, if they are assessed a second time.

• Differential Selection: Experimental and control groups differ in their characteristics. This may influence the results.

• Selection-Maturation Interaction: Combines the threats to validity described as differential selection and maturation. If experimental and control groups differ in important respects, for example age, differences in achievement might be due to this maturational characteristic rather than the treatment.

• Experimental Treatment Diffusion: Close proximity of treatment and control groups might result in treatment diffusion. This clouds the effect of the intervention.
Threats to Validity Matrix

Key: YES = threat present; CONT. = threat controlled; MAYBE = possible threat; "-" = not applicable

| Design | History | Maturation | Mortality | Testing | Instrumentation | John Henry Effect | Compensatory Equalization | Differential Selection |
|---|---|---|---|---|---|---|---|---|
| One-Shot Case Study | YES | YES | YES | - | - | - | - | - |
| One-Group Pretest-Posttest Design | YES | YES | CONT. | YES | MAYBE | - | - | - |
| Time Series Design | YES | CONT. | CONT. | YES | MAYBE | - | - | - |
| Pretest-Posttest Control Group Design | CONT. | CONT. | CONT. | CONT. | CONT. | MAYBE | MAYBE | CONT. |
| Posttest-Only Control Group Design | CONT. | CONT. | YES | - | - | MAYBE | MAYBE | CONT. |
| Single-Factor Multiple Treatment Designs | CONT. | CONT. | CONT. | CONT. | CONT. | MAYBE | MAYBE | CONT. |
| Solomon 4-Group Design | CONT. | CONT. | CONT. | CONT. | CONT. | MAYBE | MAYBE | CONT. |
| Factorial Design | CONT. | CONT. | CONT. | CONT. | CONT. | MAYBE | MAYBE | CONT. |
| Static-Group Comparison Design | CONT. | CONT. | YES | - | - | MAYBE | MAYBE | YES |
| Nonequivalent Control Group Design | CONT. | CONT. | CONT. | CONT. | CONT. | MAYBE | MAYBE | CONT. |
Research Designs - Variations
A. Simple Designs
B. Cross-Sectional Studies
C. Longitudinal Studies
D. Experimental Designs
A. Simple Designs
• One-Shot Case Study
X O
• One-Group Pretest-Posttest Design
O X O
• Time Series Design
O O O O X O O O O
R = Random assignment of subjects to conditions
X = Experimental treatment
O = Observation of the dependent variable (pretest, posttest, interim measure, etc.)
B. Cross-Sectional Studies

Comparison of groups (Group 1, Group 2, Group 3) at one point in time.

Variations: Case-control study
Case-Control Study

Comparison of groups at one point in time: Group 1 (with characteristic) is compared with Group 2 (without characteristic) with respect to earlier event(s).

Major limitation: Cannot be sure that the population has not changed since the event(s).
C. Longitudinal Studies

Comparison of a population over time. Repeated measurements.

Variations: Panel study, Cohort study
Panel Study

Measures change over time. Repeated data collection from the same individuals.

Major limitation: High drop-out rates pose a threat to internal validity.
Cohort Study

Measures change over time. Repeated data collection from the same cohort but different individuals.

Major limitation: Measures total change, but fluctuations within the cohort are not assessed.
D. Experimental Designs

Compares group(s) exposed to a treatment (Group 1) with a group not exposed (Group 2). Measures at two points in time (pre-test and post-test).

Variations: True experimental design, Quasi-experimental design
True Experimental Design

Groups are drawn from the target population and assigned randomly. Compares group(s) exposed to the treatment with a group not exposed. Measures at two points in time (pre-test and post-test). Research subjects are assigned randomly to treatment and control groups.

Major limitations: Not feasible for all research; ethical problems.
True Experimental Designs
• True experimental designs use control groups
and random assignment of participants
Variations:
• Pretest-Posttest Control Group Design
• Posttest-Only Control Group Design
• Single-Factor Multiple Treatment Designs
• Solomon 4-Group Design
• Factorial Design
Pretest-Posttest Control Group
Design
R O X O
R O    O
• The randomly assigned experimental
group receives the treatment and the
control group receives no treatment or an
alternative treatment
Posttest-Only Control Group
Design
R X O
R    O
• Like previous but without pretest.
Single-Factor Multiple Treatment
Designs
R O X1 O
R O X2 O
R O       O
• Extension of Pretest-Posttest Control
Group Design
• Sample is assigned randomly to one of
several conditions
Solomon 4-Group Design

R O X O
R O    O
R    X O
R       O

• Developed by researchers concerned about the effect of pretesting on the validity of the results.
Factorial Design
Two independent variables: effects A, B, and AxB
Three independent variables: effects A, B, C, AxB, AxC, BxC, and AxBxC

• Allows the inclusion of more than one independent variable.
• Tests for the effects of different kinds of variables that might be expected to influence outcomes (gender, age, etc.).
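A minimal sketch of the condition grid for a 2x2 factorial design, with hypothetical factors:

```python
from itertools import product

factor_a = ["health message", "no message"]   # hypothetical factor A
factor_b = ["in person", "by phone"]          # hypothetical factor B

# Crossing the factors yields one cell per combination (here 4);
# analysis then tests main effects (A, B) and the interaction (AxB).
for condition in product(factor_a, factor_b):
    print(condition)
```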
Quasi-Experimental Design

Groups are drawn from the target population but not assigned randomly. Compares group(s) exposed to the treatment (Group 1) with a group not exposed (Group 2). Measures at two points in time (pre-test and post-test). Random assignment not possible.

Major limitations: Not a true experiment; threats to validity (→ selection bias).
Quasi-Experimental Designs
• Quasi-experimental designs lack the random
assignment of experimental designs.
Variations:

• Static-Group Comparison Design:
      X O
  ---------
        O

• Nonequivalent Control Group Design:
  O X O
  ---------
  O      O

(The dashed line indicates intact groups not formed by random assignment.)
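One common way to analyze the nonequivalent control group design is a difference-in-differences estimate; a minimal sketch with hypothetical group means:

```python
# Hypothetical group means: O X O (intervention), O  O (comparison).
pre_treat, post_treat = 48.0, 63.0
pre_comp,  post_comp  = 50.0, 57.0

# Subtract the comparison group's change to adjust for the secular
# trends and interfering events both groups share.
did = (post_treat - pre_treat) - (post_comp - pre_comp)
print(f"Estimated program effect: {did:.1f}")   # 8.0
# Still vulnerable to selection bias if the groups would have trended
# differently even without the program.
```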
Choosing an Evaluation Design
Impact Evaluation Strategy
• Comparison
– Same group (over time)
– Different groups
• Design balances accuracy and reliability
with cost and feasibility
→ What is a "good enough" research design?
Research Design Flow-Chart

Research Design
• Observational Study
  – Cross-Sectional
  – Longitudinal
  – Methods: Survey Research, Participant Observation
• Experimental Study
  – Single Group
  – True Experiment
  – Quasi-Experiment
  – Methods: Clinical Experiment, Natural Experiment, Comparison Group Flow Chart

(Methodologist Toolchest, Version 3.0)
4. Sampling Methods
Sample Selection
• Sample size
• Sampling frame
• Sample selection = sampling
– Probability sampling
– Nonprobability sampling
Sampling Methods
• Census vs. Sampling
– Census measures all units in a population
– Sampling identifies and measures a subset of
individuals within the population
• Probability vs. Non-Probability Sampling
– Probability sampling results in a sample that
is representative of the target population
– A non-probability sample is not representative
of any population
Probability Sampling

• Sample representative of the target population; large sample size
  – Simple random/systematic sampling
  – Stratified random/systematic sampling
  – Cluster sampling
  – Experimental and quasi-experimental designs

Advantages:
• Findings representative of the population
• Advanced statistical analysis

Disadvantages:
• Costly and time-consuming (depending on target population)
• Significant training needs
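A minimal sketch of simple random and stratified random sampling, assuming a hypothetical sampling frame and using only the standard library:

```python
import random

# Hypothetical sampling frame of 1,000 units.
frame = [{"id": i, "region": random.choice(["north", "south"])}
         for i in range(1, 1001)]

# Simple random sampling: every unit has an equal chance of selection.
simple_sample = random.sample(frame, 100)

# Stratified random sampling: sample within each stratum so each
# region is represented in proportion to its size in the frame.
stratified_sample = []
for region in ("north", "south"):
    stratum = [u for u in frame if u["region"] == region]
    n = round(100 * len(stratum) / len(frame))      # proportional allocation
    stratified_sample += random.sample(stratum, n)  # rounding may shift n by 1
```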
5. Impact Indicators
Concepts, Variables and Indicators

• Example 1: Concept: Size → Variable: Area → Indicator: Square kilometers
• Example 2: Concept: Economic well-being → Variable: Income per capita → Indicator: Purchasing Power Parity (PPP) GNP ($) per capita
• Example 3: Concept: Health → Variable: Life expectancy → Indicator: Average years of life if born in 1970

(Phuong Pham, Introduction to Quantitative Analysis)
Indicator Criteria
1. Measurable (able to be recorded and
analyzed in quantitative or qualitative
terms)
2. Precise (defined the same way by all
people)
3. Consistent (not changing over time so
that it always measures the same
thing)
4. Sensitive (changing proportionally in
response to actual changes in the
condition or item being measured)
Categorical vs. Continuous
Variables
• Continuous variables
– A variable that can be measured (weight,
height, age, etc.)
• Categorical variables
– A variable that cannot be measured but
can be categorized (ethnic group, age
group, educational level, socio-economic
class, etc.)
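A minimal sketch of the distinction in pandas, with hypothetical respondent data:

```python
import pandas as pd

df = pd.DataFrame({
    "age_years":    [23, 35, 41, 58],       # continuous: measured
    "ethnic_group": ["A", "B", "A", "C"],   # categorical: classified
})
df["ethnic_group"] = df["ethnic_group"].astype("category")
print(df.dtypes)   # numeric dtype vs. category dtype
```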
6. Data Analysis Plan
Data Analysis
• Type of variable
– Categorical
– Continuous
• Type of data analysis
– Descriptive analysis
– Hypothesis testing
Descriptive Analysis vs.
Hypothesis Testing
• Descriptive data analysis
– Organizing and summarizing data
• Statistical inference
– Procedure by which we reach a
conclusion about a population on the
basis of the information contained in a
sample that has been drawn from that
population
(Phuong Pham, Introduction to Quantitative Analysis)
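A minimal sketch of the two kinds of analysis, with hypothetical scores (pandas for description, scipy for inference):

```python
import pandas as pd
from scipy import stats

outcome = pd.Series([8, 9, 7, 10, 9, 6, 7, 5, 6, 7])  # hypothetical sample

# Descriptive analysis: organize and summarize the sample itself.
print(outcome.describe())   # count, mean, std, quartiles, ...

# Statistical inference: reach a conclusion about the population the
# sample was drawn from, e.g., a 95% confidence interval for the mean.
low, high = stats.t.interval(0.95, len(outcome) - 1,
                             loc=outcome.mean(), scale=stats.sem(outcome))
print(f"95% CI for the population mean: {low:.2f} to {high:.2f}")
```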
Exercise
• Outline an Outcome and/or Impact Evaluation
for your program
• Include a description of:
1. Unit of analysis
2. Research question/hypothesis
3. Evaluation design
4. Sampling method
5. Impact indicators
6. Data analysis plan