
Using Mixed Methods in
Development Research and
Project Evaluation
Michael Woolcock, DECRG
World Bank Poverty & Inequality Course
May 11-12, 2010
Source material
• Bamberger, Michael, Vijayendra Rao and Michael Woolcock (2010) “Using Mixed Methods in Monitoring and Evaluation: Experiences from International Development”, in Abbas Tashakkori and Charles Teddlie (eds.) Handbook of Mixed Methods (2nd revised edition). Thousand Oaks, CA: Sage Publications
• Barron, Patrick, Rachael Diprose and Michael Woolcock (2010) Contesting Development: Participatory Projects and Local Conflict Dynamics in Indonesia. New Haven: Yale University Press (in press)
• Woolcock, Michael (2009) “Toward a Plurality of Methods in Project Evaluation: A Contextualized Approach to Understanding Impact Trajectories and Efficacy”, Journal of Development Effectiveness 1(1): 1-14
Ten Reasons to Use Qualitative
Approaches in Projects and Evaluation
1. Understanding Political, Social Change
   • ‘Process’ often as important as ‘product’
   • Modernization of rules, social relations, meaning systems
2. Examining Dynamics (not just ‘Demographics’) of Group Membership
   • How are boundaries defined, determined? How are leaders determined?
3. Accessing Sensitive Issues and Stigmatized/Marginalized Groups
   • E.g., conflict and corruption; sex workers
4. Explaining Context Idiosyncrasies
   • Beyond “context matters” to understanding how and why, at different units of analysis
   • ‘Contexts’ not merely “out there” but “in here”; the Bank produces legible contexts
5. Unpacking Understandings of Concepts and (‘Fixed’) Categories
   • Surveys assume everyone understands questions and categories the same way; do they?
   • Qualitative methods can be used to correct and/or complement orthodox surveys
6. Facilitating Researcher-Respondent Interaction
   • Enhance two-way flow of information
   • Cross-checking; providing feedback
7. Exploring Alternative Approaches to Understanding ‘Causality’
   • Econometrics: robustness tests on large-N datasets; controlling for various contending factors
   • History: single/rare event processes
   • Anthropology: deep knowledge of contexts
   • Exploring inductive approaches (cf. ER doctors, courtroom lawyers, solving jigsaws)
8. Observing ‘Unobservables’
   • Project impact is not just a function of easily measured factors; unobserved factors—such as motivation, political ties—are also important
9. Exploring Characteristics of ‘Outliers’
   • Not necessarily ‘noise’ or ‘exceptional’; can be highly instructive (cf. illness informs health)
10. Resolving Apparent Anomalies
   • Nice when inter- and intra-method results align, but sometimes they don’t; who/which is ‘right’?
Overview
• Three challenges:
– Allocating development resources
– Assessing project effectiveness (in general)
– Assessing effectiveness of complex ‘social’
projects (in particular)
• Discussion of options, strategies for assessing
projects using mixed methods
Three challenges
• How to allocate development resources?
• How to assess project effectiveness in general?
• How to assess social development projects
(such as ‘Justice for the Poor’) in particular?
1. Allocating development resources
• How to allocate finite resources to projects believed
likely to have a positive development impact?
• Allocations made for good and bad reasons, only a part
of which is ‘evidence-based’, but most of which is
‘theory-based’, i.e., done because of an implicit (if not
explicit) belief that Intervention A will ‘cause’ Impact B
in Place C net of Factors D and E for Reasons F and G.
– E.g., micro-credit will raise the income of villagers in Flores,
independently of their education and wealth, because it
enhances their capacity to respond to shocks (floods, illness)
and enables larger-scale investment in productive assets
(seeds, fertilizer)
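A minimal sketch of how such a theory-based claim gets formalized empirically (Python, entirely synthetic data; the variable names, coefficients, and sample are hypothetical, and this ignores selection into micro-credit):

import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical 'Factors D and E': education and wealth
education = rng.normal(8, 2, n)
wealth = rng.normal(10, 3, n)
# Hypothetical 'Intervention A': 1 if the household received micro-credit
microcredit = (rng.random(n) < 0.4).astype(float)
# Synthetic 'Impact B' (income), built with an assumed true effect of 2.0
income = (2.0 * microcredit + 0.5 * education + 0.3 * wealth
          + rng.normal(0, 1, n))

# OLS of income on micro-credit, net of education and wealth
X = np.column_stack([np.ones(n), microcredit, education, wealth])
beta, *_ = np.linalg.lstsq(X, income, rcond=None)
print(f"micro-credit effect net of education and wealth: {beta[1]:.2f}")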
1. Allocating development resources
• Imperatives of the prevailing resource allocation
mechanisms (e.g., those of the World Bank) strongly
favor one-size-fits-all policy solutions (despite
protestations to the contrary!) that deliver predictable,
readily-measurable results in a short time frame
– Roads, electrification, immunization
• Want project impacts to be independent of context,
scale, and time so that ‘successful’ examples (‘best
practices’) can be scaled up and replicated
• Projects that diverge from this structure enter the
resource allocation game at a distinct disadvantage. But
the obligation to demonstrate impact (rightly) remains;
just need to enter the fray well armed, empirically and
strategically…
Core task
• Ask interesting and important questions, then assemble
the best combination of methods to answer them
– Not, “What questions can I answer with this data?”
– Not, “I don’t have a randomized design, so therefore I can’t
say anything defensible”
• Generate data to help projects ‘learn’, in real time
– Be useful
– Make ‘M’ as cool as ‘E’
• Help to more carefully identify the conditions under
which given interventions ‘work’
– Individual methods, per se, are not inherently ‘rigorous’;
they become so to the extent they appropriately match the
problems they confront, the constraints they overcome
– Focus on understanding SD as much as determining LATE
How to Assess Project Effectiveness?
• Need to disentangle the effect of a given intervention over and
above other factors occurring simultaneously
– Distinguishing between the ‘signal’ and ‘noise’
• Is my job creation program reducing unemployment, or is it just the
booming economy?
• Furthermore, an intervention itself may have many components
– TTLs are most immediately concerned about which aspect is the
most important, or the binding constraint
– (Important as this is, it is not the same thing as assessing impact)
• Need to be able to make defensible causal claims about project
efficacy even (especially) when apparently ‘rigorous’
econometric methods aren’t suitable or available
– Thus need to change both the terms and content of debate
Impact Evaluation 101
• Core evaluation challenge:
– Disentangling effects of people, place, and project (or policy) from
what would have happened otherwise
• i.e., need a counterfactual (but this is rarely observed)
• ‘Tin’ standard
– Beneficiary assessments, administrative checks
• ‘Silver’
– Double difference: before/after, program/control
• ‘Gold’
– Randomized allocation, natural experiments
• (‘Diamond’?)
– Randomized, triple-blind, placebo-controlled, cross-over
• Alchemy?
– Making ‘gold’ with what you have, given prevailing constraints
(people, money, time, logistics, politics)…
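To make the ‘silver’ standard concrete, a minimal double-difference sketch (Python; all numbers are made up):

# Mean outcomes, before and after, for program and comparison areas (hypothetical)
program = {"before": 100.0, "after": 130.0}
control = {"before": 98.0, "after": 116.0}

# Naive before/after change: conflates the program 'signal' with background
# 'noise' (e.g., a booming economy lifting outcomes everywhere)
single_diff = program["after"] - program["before"]   # 30.0

# Change the comparison group experienced anyway
background = control["after"] - control["before"]    # 18.0

# Double difference: program change net of the background change
double_diff = single_diff - background               # 12.0
print(f"before/after: {single_diff:.0f}; double difference: {double_diff:.0f}")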
Making knowledge claims in project
evaluation and development research
• Construct validity
– How well does my instrument assess the underlying
concepts (‘poverty’, ‘participation’, ‘conflict’,
‘empowerment’)?
• Internal validity
– How well have I addressed various sources of bias
(most notably selection effects) influencing the
relationship between IV and DV?
• i.e., what is my identification strategy?
• External validity
– How well can I extrapolate my findings? If my project
works ‘here’, will it also work ‘there’? Will bigger be
better?
We observe an outcome indicator…
[Figure: outcome indicator at level Y0 when the intervention occurs at t=0]
…and its value rises after the program
[Figure: observed outcome rises from Y0 at t=0 to Y1 at t=1]
However, we need to identify the counterfactual
(i.e., what would have happened otherwise)…
[Figure: observed path to Y1 versus counterfactual path to Y1*]
… since only then can we determine the
impact of the intervention
[Figure: Impact = Y1 − Y1*]
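To put hypothetical numbers on the figure: if Y0 = 100 and Y1 = 130, the naive before/after change is Y1 − Y0 = 30; but if outcomes would have risen to Y1* = 118 anyway, the impact is Y1 − Y1* = 130 − 118 = 12.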
The Challenge of Assessing SD Projects
• You’re a star in development if you devise a “best
practice” and a “tool kit”—i.e., a universal, easy-to-administer solution to a common problem
• There are certain problems for which finding such a
universal solution is both desirable and possible (e.g., TB,
roads for high rainfall environments)…
• But many key problems, such as those pertaining to local
governance and law reform (e.g., J4P), inherently require
context-specific solutions that are heavily dependent on
negotiation and teamwork, not a technology (pills,
bridges, seeds)
– Not clear that if such a project works ‘here’ that it will also work
‘there’, or that ‘bigger’ will be ‘better’
– Assessing such complex projects is enormously difficult
Why are ‘complex’ interventions so hard to
evaluate? A simple example
• You are the inventor of ‘BrightSmile’, a new toothpaste
that you are sure makes teeth whiter and reduces
cavities without any harmful side effects. How would you
‘prove’ this to public health officials and (say) Colgate?
• Hopefully (!), you would be able to:
– Randomly assign participants to a ‘treatment’ and ‘control’
group (and then have them switch after a certain period); make
sure both groups brushed the same way, with the same
frequency, using the same amount of paste and the same type
of brush; ensure nobody (except an administrator, who did not
do the data analysis) knew who was in which group
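A sketch of that assignment step (Python; the participant IDs, group size, and seed are all hypothetical):

import random

random.seed(42)  # reproducible assignment; the seed is arbitrary
participants = [f"P{i:03d}" for i in range(1, 201)]  # hypothetical IDs
random.shuffle(participants)
half = len(participants) // 2

# Period 1: random split into BrightSmile vs. placebo paste
period1 = {"treatment": participants[:half], "control": participants[half:]}
# Period 2: crossover, so the groups switch arms
period2 = {"treatment": period1["control"], "control": period1["treatment"]}

# Only the administrator (who does no analysis) keeps this mapping; brushing
# technique, frequency, paste amount, and brush type are fixed by protocol.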
Demonstrating ‘impact’ of BrightSmile vs.
SD projects
• Enormously difficult—methodologically, logistically and
empirically—to formally identify ‘impact’; equally
problematic to draw general ‘policy implications’, especially
for other countries
• Prototypical “complex” CDD/J4P project:
– Open project menu: unconstrained content of intervention
– Highly participatory: communities control resources and decision-making
– Decentralized: local providers and communities given high degree of
discretion in implementation
– Emphasis on building capabilities and the capacity for collective action
– Context-specific; project is (in principle) designed to respond to and
reflect local cultural realities
– Project’s impact may be ‘non-additive’ (e.g., stepwise, exponential,
high initially then tapering off…)
How does J4P work over time?
(or, what is its ‘functional form’?)
[Figures: stylized impact-over-time trajectories, panels A–J: ‘Governance’?; CCTs?; Bridges?; ‘AIDS awareness’?; Unintended consequences?; Shocks? (‘impulse response function’); ‘Pest control’? (e.g., cane toads); ‘Empowerment’?; Unknown… Unknowable?]
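One way to see why the ‘functional form’ matters is to write a few candidate trajectories down explicitly. A sketch (Python/NumPy); the shapes and the panel labels they echo are assumptions, not estimates:

import numpy as np

t = np.linspace(0, 10, 101)  # arbitrary time units

linear = 0.5 * t                          # steady, cumulative gains (CCTs?)
step = np.where(t < 4, 0.0, 5.0)          # nothing until completion (bridges?)
tapering = 6.0 * (1 - np.exp(-0.8 * t))   # big early gains, then a plateau
j_curve = 0.3 * t**2 - 1.5 * t            # worse before better ('empowerment'?)

# A single-shot evaluation implicitly assumes a functional form: measuring the
# step trajectory at t=3 finds 'no impact', while at t=5 it finds a large one.
print(f"step impact at t=3: {step[30]:.1f}; at t=5: {step[50]:.1f}")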
Science, Complexity, and Evaluation
(spectrum: Pure Science → Applied Science → Human Dev (education, health) projects → Social Dev (e.g., CDD) projects)
• Theory (predictive precision, cumulative knowledge, subject/object gap): Hi → Lo
• Mechanisms (# of causal pathways, # of ‘people-based’ transactions): Few → Many
• Outcomes (plausible range, measurement precision): Narrow → Wide
So, what can we do when…
• Inputs are variables (not constants)?
– Facilitation/participation vs. tax cuts (seeds, pills, etc)
– Teaching vs. text books
– Therapy vs. medicine
• Adapting to context is an explicit, desirable feature?
– Each context/project nexus is thus idiosyncratic
• Outcomes are inherently hard to define and measure?
– E.g., empowerment, collective action, conflict mediation, social
capital
Linking Questions, Methodologies,
Methods, Data
• Questions should drive choice of methods and
measurement tools (not vice versa)
• Social science data is always partial, an imperfect reflection
of a more complex underlying reality
• Data can be manipulated for political purposes
• Some (very important) things cannot be measured—love,
identity, meaning
• “Not everything that can be counted, counts”
• “It’s better to be vaguely right than precisely wrong”
⇒ “Triangulation”—integrating more abundant, more diverse,
and higher-quality evidence
• Begin with interesting and important questions
– “The most important questions of method begin where the
standard techniques do not apply” (C Wright Mills)
– Finding answers may require single or multiple methods and
data forms—need to be a good detective
– But difficult to do when one has invested many years in
mastering difficult techniques—“Everything looks like a nail
when all you have is a hammer”
• Methodologies as the particular combination and sequence
of methods used to answer the question(s)
• Methods can be qualitative and/or quantitative
• Data can also be qualitative and/or quantitative
• Qual/quan disputes often stem from…
  1. Conflating methods and data
  2. Mismatches between question, methods and data
  3. Assumptions that different “standards” apply
• Qualitative approaches seen as
  – inductive, valid, subjective, process (‘how’), generating ideas
• Quantitative approaches seen as
  – deductive, reliable, objective, effects (‘whether’), testing ideas
⇒ Not necessarily…
• Integrating qual and quan approaches to…
  • Complement strengths, compensate for weaknesses
  • Address problems of missing/inadequate data
  • Observe the unobservable
Types of Mixed Methods
• Pure Qualitative: ‘Think quan, act qual’
• Parallel: Quan and qual done separately (qual ∥ quan)
• Sequential: Quan follows qual (qual → quan)
• Iterative: Quan and qual in constant dialogue (qual ⇄ quan)
• (Pure Quantitative)
Forms and sources of data
• Quantitative (“numbers”)
  – Household and other surveys (e.g., census, LSMS)
  – Opinion polls (e.g., Gallup, marketing research)
  – Data from official files (e.g., membership lists, government reports)
  – Indexes created from multiple sources (e.g., “governance”)
• Qualitative (“texts”)
  – Historical records, political reports, letters, legal documents
  – Media (print, radio, and television)
  – Open-ended responses to survey questions
  – Observation (ethnography)
  – Interviews—key informants, focus groups
  – Participatory approaches—PRA, etc.
• Comparative (“cases”)
  – ‘Rare’, ‘small-N’ historical events (e.g., wars, economic crises)
Arraying methods by source of variation
(Source: Gerring 2004: 343)

Spatial variation        | No temporal variation | Temporal variation
None (1 unit)            | Impossible            | Case study I
Within-unit              | Case study II         | Case study III
Across-unit              | Cross-section         | Panel data
Across- and within-unit  | Hierarchical          | Hierarchical time-series; comparative-historical
Types of methods of analysis
• Quantitative
– Statistical analysis
– Hypothesis testing (deductive)
• Qualitative
– Emergent themes
– Generates propositions (inductive)
– Software available: e.g., N6 (reduces ‘small N’ problem)
• Comparative (“cases”)
– Differences among otherwise similar cases
– Commonalities among otherwise different cases
– Common strategy in history; used to try to explain ‘causality’
⇒ “The goal is not to show which approach is best, but rather to
generate dialogue between ideas and evidence” (Ragin)
Alternative causal
claim-making modalities
1. Econometrics
   • Robustness tests on large-N datasets; controlling for various contending factors
2. History
   • Single/rare event processes; ‘process tracing’ of case studies
   • QCA; fuzzy sets (Ragin)
3. Anthropology
   • Deep knowledge of contexts (cf. CEOs…)
4. Legal approaches
   • Civil standard: ‘preponderance of the evidence’
   • Criminal standard: ‘beyond a reasonable doubt’
Types of Data and Methods
[2×2 figure adapted from Hentschel (1999): type of data (qual/quan) crossed with type of methods (qual/quan); examples arrayed in the space include Standard Survey, Subjective Welfare, Quantitative Anthropology, Small-N Matched Comparisons, Ethnography, and PRA]
Integrating Qualitative and
Quantitative Approaches
1. Parallel Qual/Quan
– Teams work separately
– Best suited to large (e.g. country level) assessments (GUAPA)
– Quantitative
• Large household survey
– Qualitative
• In-depth work with selected groups
– Data analyzed separately, integrated as part of write-up and
conclusions
Integrating Qualitative and
Quantitative Approaches
2. Sequential Qual/Quan (the ‘classical’ approach)
– Qualitative
• Use PRA, focus groups, etc to get a grounded
understanding of key issues
– Quantitative
• Use this material to design a survey instrument
• Use the survey to test hypotheses that emerged from the
qualitative work
– Examples
• Survival and mobility in Delhi slums (Jha, Rao and
Woolcock, 2007)
• Evaluating Jamaica Social Investment Fund (Rao and
Ibanez, 2002)
Integrating Qualitative and
Quantitative Approaches
3. Iterative Qual/Quan (‘Bayesian’ approach)
   – Ongoing dialogue between Qual and Quan
   – Qualitative
     • As above: used to generate initial hypotheses, establish validity of questions
   – Quantitative
     • Hypotheses tested with household survey
     • Return to the field; cycle repeats
   – Example: potters in India (Rao, 2000)
     • Initial study of marriage markets led to a study of domestic violence, and another on unit price differentials/inequality
Other uses for Mixed Methods
1. When existing time and resources preclude
doing or using formal survey/census data
   • Examples: St Lucia and Colombia
2. When it’s unclear what “intervention” might
be responsible for observed outcomes
   – That is, no clear ex ante hypotheses; working
inductively from matched comparison cases
   • Examples:
     – Putnam (1993) on regional governance in Italy
     – Mahoney (2010) on governance in Central America
     – Collins (2001) on “good to great” US companies
     – Varshney (2002) on sources of ethnic violence in India
Practical examples
1. Poverty in Guatemala (GUAPA)
– ‘Parallel’
– Quan: expanded LSMS
• first social capital module
• large differences by region, gender, income, ethnicity
• pervasive elite capture
– Qual: 10 villages (5 different ethnic groups)
• perceptions of exclusion, access to services
• fear of reprisal, of children being stolen
• legacy of shocks (political and natural)
• links to LSMS data
Practical examples
2. Poverty in Delhi slums
– ‘Sequential’
– Qual: 4 migrant communities
• near, far, recent, long-term
– Quan: 800 randomly selected representative households
– From survival to mobility
• role of norms (sharing, status) and networks (kinship, politics)
• housing, employment transitions
• property rights
– Understanding ‘governance’
• managing collective action
• crucial role of service provision
Practical examples
3. ‘Justice for the Poor’ Initiative
– Origins in Indonesia
• Draws on the approach and findings from large local conflict
study
– Integrated qualitative and quantitative approach
– Results show importance of understanding
• Rules of the game (meta-rules)
• Dynamics of difference (politics of ‘us’-‘them’ relations)
• Efficacy of intermediaries (legitimacy, enforceability)
• Extension to Cambodia…
– Research on collective disputes (e.g., land), to inform IDA grant
in 2007
• …and now into Africa and East Asia
– Sierra Leone, Kenya, Vanuatu, East Timor, PNG…
J4P: Core Research Design
• Enormous investment in recruiting, training, keeping
local field staff
• Training centers on techniques, ethics, data
management and analysis
• Where possible, use existing quantitative data sources
to (a) complement qualitative work, and (b) help with
sampling
• Sampling based on basic comparative method:
– Maximum difference between contexts
– Focus on outliers (‘exceptions to the rule’)
• Rough rule of thumb: analysis takes three times as long
as data collection
– Analysis can’t be “outsourced”: research team needs to be
involved at all stages
Concluding thoughts
• The virtues and limits of measurement
  – Tension between simplifying versus complicating reality
• Triangulation
  – Integrating more data, better data, more diverse data as “substitutes” and “complements”
• Surveys as a tool for adaptation and guidance
  – Not prescription for uniformity or control
  – One size does not fit all
  – Encouraging comparability across time and space
Summary of methods used in
Barron, Diprose and Woolcock (2010)
[Figure: methods arrayed along a breadth–depth spectrum: PODES, GDS (breadth); Newspaper Analysis; Key Informant Survey; Case Studies (depth)]