American Lessons on Designing Reliable Impact Evaluations

American Lessons on Designing Reliable Impact Evaluations,
from Studies of WIA and Its Predecessor Programs
Larry L. Orr, Independent Consultant
Stephen H. Bell, Abt Associates
Jacob A. Klerman, Abt Associates
The early evaluations
• 1960s: MDTA (pre/post)
• 1970s:
– YEDPA (400+ studies; various methods)
– CETA (comparison groups from national survey samples)
• 1980s:
– National Academy review of YEDPA studies found “little reliable
information on the effectiveness of the programs”, recommended
random assignment
– More than a dozen CETA evaluations produced widely divergent
impact estimates – with essentially the same data (Barnow, 1987)
– DOL-convened expert panel recommended random assignment
for evaluation of new Job Training Partnership Act (JTPA)
Evaluating the econometric evaluations
• LaLonde (1986) and Maynard and Fraker (1987) applied a
variety of nonexperimental methods to data from a
randomized trial and were unable to replicate the
experimental estimates
• Since then, a number of replication studies have been
conducted (see summaries in Glazerman et al., 2003;
Bloom et al., 2005; and Pirog et al., 2009). No
nonexperimental method has consistently replicated
experimental results.
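To make the replication-study logic concrete, here is a minimal simulation sketch of that design: the same outcome data yield one impact estimate from a randomized control group and another from a self-selected comparison group. The data-generating process, effect size, and variable names are all illustrative assumptions, not figures from LaLonde or any study cited above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
true_impact = 1_000            # assumed effect of training on annual earnings ($)

# Unobserved "employability" drives both earnings and who seeks training
ability = rng.normal(0.0, 1.0, n)
earnings_no_training = 15_000 + 4_000 * ability + rng.normal(0, 3_000, n)

# Program applicants tend to have lower earnings potential (illustrative assumption)
applicant = ability < 0

# Experimental design: applicants are randomly assigned to treatment or control
assigned_treatment = rng.random(n) < 0.5
treated = applicant & assigned_treatment
control = applicant & ~assigned_treatment

observed = earnings_no_training + true_impact * treated

experimental_estimate = observed[treated].mean() - observed[control].mean()

# Nonexperimental design: trainees compared with non-applicants
# (e.g., a comparison group drawn from a national survey sample)
nonexperimental_estimate = observed[treated].mean() - observed[~applicant].mean()

print(f"true impact:              {true_impact:>8.0f}")
print(f"experimental estimate:    {experimental_estimate:>8.0f}")
print(f"nonexperimental estimate: {nonexperimental_estimate:>8.0f}")
```

In this toy world the experimental estimate recovers the true impact up to sampling error, while the comparison-group estimate is dominated by selection on the unobserved ability term; that divergence is the pattern LaLonde-style replication studies look for in real data.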
The current consensus
• No known nonexperimental method can reliably produce unbiased
estimates of the impact of training programs – this means that you can
never know ex post whether you have a good estimate or not
• Randomized trials are the strongly preferred method of estimating
training program impacts on technical grounds
• Randomized trials are also more intuitively understandable to policy
makers than complex econometric methods
• Nonexperimental studies frequently give rise to technical controversy
that detracts from their credibility and acceptance, whereas
randomized trials are generally accepted by both evaluators and
policy makers
Why is it so hard to obtain reliable results from
nonexperimental studies?
• “Impact” is the difference between trainees’ actual
outcomes (e.g., earnings) and what those outcomes
would have been without training
• The fundamental problem of evaluation is to estimate
what the trainees’ outcomes would have been without
training (written out in notation below)
• To see how difficult that task is, consider the time path of
earnings for the JTPA control group – individuals who
were just like the trainees except that they didn’t get JTPA
services…
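The counterfactual logic above can be written compactly in standard potential-outcomes notation; the symbols below (Y_1, Y_0, D) are conventional textbook notation rather than anything appearing on the original slides.

```latex
% Y_1: earnings with training; Y_0: earnings without training; D = 1 for trainees.
\[
  \Delta_{\mathrm{ATT}} \;=\; E[\,Y_1 \mid D = 1\,] \;-\; E[\,Y_0 \mid D = 1\,]
\]
% The first term is observed for trainees; the second term (the counterfactual) never is.
% A comparison group identifies the impact only when
\[
  E[\,Y_0 \mid D = 0\,] \;=\; E[\,Y_0 \mid D = 1\,],
\]
% a condition that random assignment guarantees by construction and that
% self-selected comparison groups generally violate.
```

Random assignment makes that condition hold by construction, which is why the control-group earnings path shown next is a valid stand-in for the trainees’ counterfactual.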
Time path of earnings, control group, National JTPA Study
[Figure: control-group earnings by month, from 12 months before to 18 months after program entry]
What is the margin for error?
[Figure: treatment-group and control-group earnings by month, from 12 months before to 18 months after program entry]
[Figure: time path of earnings, program and comparison groups, from Heinrich et al.]
Our Conclusions/Recommendations (1)
• Random assignment is the only safe way to estimate the
impacts of training programs
– Different nonexperimental approaches yield widely varying
results
– In dozens of replication studies, nonexperimental methods
have almost never satisfactorily replicated the experimental
estimates
– The stakes are too high to take the kind of risk and
uncertainty entailed in nonexperimental methods
– Nonexperimental evaluations inevitably shift the debate
from substance to method
Our Conclusions/Recommendations (2)
• If the ESF does decide to use nonexperimental methods:
– Need to pay close attention to the timing of job loss and the
pre-program dynamics of earnings when matching the comparison
group (a necessary, but not sufficient, condition; illustrated in
the sketch below)
– Before adopting any nonexperimental method, it should be
demonstrated that it replicates multiple experimental results
(Note that what should be tested is an algorithm that can
be applied in other evaluations, not a set of estimates that
are unique to a single evaluation.)
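As one way to operationalize that matching condition, the sketch below does nearest-neighbor matching on a vector of pre-program quarterly earnings plus months since job loss. The feature set, the function name (match_comparison), and the array shapes are illustrative assumptions; this is not the algorithm used in any study cited here, and by itself it remains subject to the caveat above.

```python
import numpy as np

def match_comparison(trainee_X, pool_X):
    """Nearest-neighbor matching (with replacement) on standardized features.

    Rows are people; columns might be, e.g., four quarters of pre-program
    earnings plus months since job loss (an illustrative feature set).
    Returns, for each trainee, the index of the closest pool member.
    """
    mu = pool_X.mean(axis=0)
    sd = pool_X.std(axis=0) + 1e-9          # avoid division by zero
    t = (trainee_X - mu) / sd
    p = (pool_X - mu) / sd
    # Squared Euclidean distance from every trainee to every pool member
    dist = ((t[:, None, :] - p[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

# Illustrative call with fabricated data: 200 trainees, 5,000 candidate
# comparison members, 5 matching variables each
rng = np.random.default_rng(1)
trainees = rng.normal(size=(200, 5))
candidate_pool = rng.normal(size=(5_000, 5))
comparison_group = candidate_pool[match_comparison(trainees, candidate_pool)]
```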
Our Conclusions/Recommendations (3)
Learn from our mistakes –
don’t spend 40 years
repeating them!
For copies of these slides, contact…
[email protected]