WBI WORKSHOP
Randomization and Impact evaluation
The Types of Program Evaluation
(1) Process evaluation
• Audit and monitoring
• Did the intended policy actually happen?
(2) Impact evaluation
• What effect (if any) did the policy have?
Why Impact Evaluation?
Knowledge is a global public good
Long-term credibility
Helps in choosing the best projects and builds long-term support for development
The evaluation problem and alternative solutions
• Impact is the difference between the relevant outcome indicator with the program and its value without it.
• However, we can never observe the same unit simultaneously in two different states of nature.
• So, while the post-intervention indicator is observed, its value in the absence of the program is not; it is a counterfactual.
Problems when Evaluation is not Built in Ex-Ante
Need a reliable comparison group
Before/after: other things may happen over the same period
Units with/without the policy: may differ for reasons other than the policy (e.g., because the policy is placed in specific areas)
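The two naive comparisons above can be illustrated with a small simulation. This is a sketch under an assumed, hypothetical data-generating process (the function `simulate` and all its parameters are made up): the program is placed in areas that were already better off, and all outcomes share a common upward trend. Before/after then mixes the trend into the estimate, and with/without mixes in the pre-existing gap between areas.

```python
import random
import statistics

# Hypothetical data-generating process (an assumption for illustration only):
# the program is placed in better-off areas, and every unit's outcome also
# drifts upward by `trend` between t=0 and t=1 for reasons unrelated to the program.
def simulate(n=20000, true_impact=2.0, trend=1.0, seed=0):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        area_quality = rng.gauss(0, 1)                # not seen by the evaluator
        treated = area_quality > 0                    # non-random placement
        y0 = 10 + 3 * area_quality + rng.gauss(0, 1)  # outcome at t=0
        y1 = y0 + trend + (true_impact if treated else 0.0) + rng.gauss(0, 1)
        rows.append((treated, y0, y1))
    return rows

rows = simulate()
treated_rows = [r for r in rows if r[0]]
control_rows = [r for r in rows if not r[0]]

# Before/after for participants picks up the common trend as well as the impact.
before_after = statistics.mean(y1 - y0 for _, y0, y1 in treated_rows)
# With/without at t=1 picks up the pre-existing gap between areas.
with_without = (statistics.mean(y1 for _, _, y1 in treated_rows)
                - statistics.mean(y1 for _, _, y1 in control_rows))

print(f"true impact 2.00 | before/after {before_after:.2f} | with/without {with_without:.2f}")
```

With these assumed parameters the before/after estimate lands near 3 (impact plus trend) and the with/without estimate well above 5, while the true impact is 2.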
We observe an outcome indicator,
[Figure: outcome indicator over time; Y0 at t=0, when the intervention begins; Y1 (observed)]
and its value rises after the program:
[Figure: outcome indicator over time; Y0 at t=0 (intervention) rising to Y1 (observed) at t=1]
However, we need to identify the counterfactual…
[Figure: outcome indicator over time; Y0 at t=0 (intervention), observed Y1 and counterfactual Y1* at t=1]
… since only then can we determine the impact of the intervention
[Figure: outcome indicator over time; Y0 at t=0, observed Y1 and counterfactual Y1* at t=1]
Impact = Y1 - Y1*
How can we fill in the missing data on the counterfactual?
• Randomization
• Matching
• Propensity-score matching
• Difference-in-difference
• Matched double difference
• Regression Discontinuity Design
• Instrumental variables
1. Randomization
The “randomized-out” group reveals the counterfactual.
• Only a random sample participates.
• As long as the assignment is genuinely random, impact is revealed in expectation.
• Randomization is the theoretical ideal and the benchmark for non-experimental methods. Identification issues are more transparent than with other evaluation techniques.
• But there are problems in practice:
  • internal validity: selective non-compliance
  • external validity: it is difficult to extrapolate results from a pilot experiment to the whole population
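A minimal sketch of why random assignment works, assuming a hypothetical outcome that depends on an unobserved `ability` term. Because a coin flip, not ability, decides participation, the randomized-out group's mean is an unbiased stand-in for the counterfactual mean, and the simple difference in means recovers the impact. The function name and parameters are illustrative assumptions.

```python
import random
import statistics

def randomized_estimate(n=20000, true_impact=2.0, seed=1):
    """Hypothetical trial: a coin flip decides who participates."""
    rng = random.Random(seed)
    treated_y, control_y = [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)        # heterogeneity that would bias a naive comparison
        treated = rng.random() < 0.5     # assignment independent of ability
        y = 10 + 3 * ability + (true_impact if treated else 0.0) + rng.gauss(0, 1)
        (treated_y if treated else control_y).append(y)
    # The randomized-out group's mean stands in for the counterfactual mean.
    return statistics.mean(treated_y) - statistics.mean(control_y)

estimate = randomized_estimate()
print(f"difference in means: {estimate:.2f} (true impact 2.00)")
```

Changing `seed` moves the estimate around the true value of 2 without shifting it systematically, which is what "revealed in expectation" means. Selective non-compliance would break this: if some of those assigned to treatment refuse it, comparing actual participants with non-participants reintroduces selection on ability.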
2. Matching
Matched comparators identify the counterfactual.
Propensity-score matching: Match on the basis of the probability of participation.
• Match participants to non-participants from a larger survey.
• The matches are chosen on the basis of similarities in observed characteristics.
• This assumes no selection bias based on unobservable heterogeneity.
• Validity of matching methods depends heavily on data quality.
3. Propensity-score matching (PSM)
Match on the probability of participation.
• Ideally we would match on the entire vector X of observed characteristics. However, this is practically impossible: X could be huge.
• Rosenbaum and Rubin: match on the basis of the propensity score
$P(X_i) = \Pr(D_i = 1 \mid X_i)$
• This assumes that participation is independent of outcomes given X. If there is no bias given X, then there is no bias given P(X).
Steps in score matching:
1: Representative, highly comparable surveys of the non-participants and participants.
2: Pool the two samples and estimate a logit (or probit) model of program participation. The predicted values are the “propensity scores”.
3: Restrict the samples to ensure common support.
Failure of common support is an important source of bias in observational studies (Heckman et al.)
[Figure: density of propensity scores (0 to 1) for participants]
[Figure: density of propensity scores (0 to 1) for non-participants]
[Figure: density of propensity scores (0 to 1) for non-participants, with the region of common support marked]
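Steps 2 and 3 can be sketched in plain Python. This assumes a single observed characteristic `x` and a hypothetical simulated sample; `fit_logit` estimates the logit by Newton-Raphson, and `common_support` applies one simple rule (keep units whose scores fall inside both groups' observed ranges). Both names are made up for illustration; in practice a statistics package and a much richer X would be used, and other trimming rules for common support exist.

```python
import math
import random

def fit_logit(x, d, iters=25):
    """Newton-Raphson for the one-covariate logit Pr(D=1 | x) = 1 / (1 + exp(-(a + b*x)))."""
    a = b = 0.0
    for _ in range(iters):
        ga = gb = iaa = iab = ibb = 0.0  # score vector and information matrix
        for xi, di in zip(x, d):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            ga += di - p
            gb += (di - p) * xi
            iaa += w
            iab += w * xi
            ibb += w * xi * xi
        det = iaa * ibb - iab * iab
        # Newton step: (a, b) += I^{-1} * gradient, using the explicit 2x2 inverse
        a += (ibb * ga - iab * gb) / det
        b += (iaa * gb - iab * ga) / det
    return a, b

def common_support(scores, d):
    """Keep units whose score lies inside both groups' observed score ranges."""
    treated = [s for s, di in zip(scores, d) if di]
    control = [s for s, di in zip(scores, d) if not di]
    lo = max(min(treated), min(control))
    hi = min(max(treated), max(control))
    keep = [lo <= s <= hi for s in scores]
    return keep, (lo, hi)

# Hypothetical pooled sample: one observed characteristic x drives participation.
rng = random.Random(4)
x = [rng.gauss(0, 1) for _ in range(5000)]
d = [1 if rng.random() < 1 / (1 + math.exp(-(-0.5 + 1.5 * xi))) else 0 for xi in x]

a, b = fit_logit(x, d)
scores = [1 / (1 + math.exp(-(a + b * xi))) for xi in x]  # the propensity scores
keep, (lo, hi) = common_support(scores, d)
print(f"a={a:.2f}, b={b:.2f}, support=[{lo:.3f}, {hi:.3f}], kept {sum(keep)} of {len(keep)}")
```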
Steps in score matching:
4: For each participant, find a sample of non-participants that have similar propensity scores.
5: Compare the outcome indicators. The difference is the estimate of the gain due to the program for that observation.
6: Calculate the mean of these individual gains to obtain the average overall gain.
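Steps 4 to 6 in a minimal form, assuming propensity scores have already been estimated. The helper `match_and_average_gain` is hypothetical; it uses nearest-neighbour matching with replacement, which is only one of several ways to pick "a sample of non-participants that have similar propensity scores" (kernel and caliper matching are common alternatives). The numbers are made up.

```python
def match_and_average_gain(participants, nonparticipants, k=1):
    """participants, nonparticipants: lists of (propensity_score, outcome) pairs.

    Step 4: for each participant, take the k non-participants with the closest scores
    (nearest neighbours, drawn with replacement).
    Step 5: the gain is the participant's outcome minus the mean outcome of those matches.
    Step 6: average the individual gains.
    """
    gains = []
    for score, outcome in participants:
        nearest = sorted(nonparticipants, key=lambda c: abs(c[0] - score))[:k]
        counterfactual = sum(y for _, y in nearest) / len(nearest)
        gains.append(outcome - counterfactual)
    return sum(gains) / len(gains), gains

# Small hypothetical example (made-up numbers, not from any study).
participants = [(0.30, 12.0), (0.55, 15.0), (0.80, 18.0)]
nonparticipants = [(0.20, 9.0), (0.28, 10.0), (0.35, 11.0),
                   (0.50, 12.5), (0.62, 13.0), (0.78, 16.0)]
average_gain, gains = match_and_average_gain(participants, nonparticipants)
print(gains, round(average_gain, 3))  # [2.0, 2.5, 2.0] 2.167
```

Raising `k` averages over more comparators per participant, generally trading some bias for lower variance.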
4. Difference-in-difference (double difference)
Observed changes over time for non-participants provide the counterfactual for participants.
• Collect baseline data on non-participants and (probable) participants before the program.
• Compare with data after the program.
• Subtract the two differences, or use a regression with a dummy variable for participants.
• This allows for selection bias, but it must be time-invariant and additive.
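The double difference can be sketched with a hypothetical simulated panel in which participants start 3 units higher than non-participants (a time-invariant, additive selection bias) and everyone shares the same trend. The sketch computes the double difference from group means and, equivalently, as the OLS slope of the change y1 - y0 on a participant dummy; `simulate_panel` and all parameter values are assumptions.

```python
import random
import statistics

def simulate_panel(n=10000, true_impact=2.0, trend=1.0, bias=3.0, seed=2):
    """Hypothetical panel: participants start `bias` higher (time-invariant,
    additive selection bias) and everyone shares the same trend."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        d = rng.random() < 0.5
        level = rng.gauss(10, 2) + (bias if d else 0.0)
        y0 = level + rng.gauss(0, 1)  # baseline
        y1 = level + trend + (true_impact if d else 0.0) + rng.gauss(0, 1)  # follow-up
        panel.append((d, y0, y1))
    return panel

panel = simulate_panel()
T = [p for p in panel if p[0]]
C = [p for p in panel if not p[0]]

# Subtract the two differences.
did = ((statistics.mean(y1 for _, _, y1 in T) - statistics.mean(y0 for _, y0, _ in T))
       - (statistics.mean(y1 for _, _, y1 in C) - statistics.mean(y0 for _, y0, _ in C)))

# Equivalent regression: OLS slope of the change (y1 - y0) on a participant dummy.
dummy = [1.0 if d else 0.0 for d, _, _ in panel]
change = [y1 - y0 for _, y0, y1 in panel]
md, mc = statistics.mean(dummy), statistics.mean(change)
slope = (sum((u - md) * (v - mc) for u, v in zip(dummy, change))
         / sum((u - md) ** 2 for u in dummy))

# A single difference at follow-up is contaminated by the selection bias.
single = statistics.mean(y1 for _, _, y1 in T) - statistics.mean(y1 for _, _, y1 in C)
print(f"double difference {did:.2f}, regression {slope:.2f}, single difference {single:.2f}")
```

Both estimates coincide (a regression of a change on a single dummy reproduces the difference in group means) and land near the true impact of 2, whereas the single difference at follow-up is inflated by the selection bias.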
Selection bias
[Figure: outcome indicator over time (t=0 to t=1), showing Y0, the observed Y1, the counterfactual Y1*, the impact, and the selection bias]
Diff-in-diff requires that the bias is additive and time-invariant
[Figure: outcome indicator over time (t=0 to t=1), showing Y0, the observed Y1, the counterfactual Y1*, and the impact]
The method fails if the comparison group is on a different trajectory
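[Figure: outcome indicator over time (t=0 to t=1), showing Y0, the observed Y1, the counterfactual Y1*, and a doubtful impact estimate (“Impact?”)]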
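Diff-in-diff: if (i) the change over time for the comparison group reveals the counterfactual change,
$E(\Delta Y_{it}^C) = E(\Delta Y_{it}^*)$
and (ii) the baseline is uncontaminated by the program,
$G_{i0} = 0$
then the double difference gives the mean impact:
$DD = E(\Delta Y_{it}^T - \Delta Y_{it}^C) = E(G_{it})$
where $\Delta Y_{it} = Y_{it} - Y_{i0}$ is the change since baseline and $G_{it}$ is the gain from the program for unit i at date t.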
5. Matched double difference
Matching helps control for bias in diff-in-diff.
• Score-match participants and non-participants based on observed characteristics at baseline.
• Then do a double difference.
• This deals with observable heterogeneity in initial conditions that can influence subsequent changes over time.
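A sketch of matched double difference, under an assumed data-generating process in which an observed baseline characteristic `x` raises both participation and later growth, so plain diff-in-diff is biased. To keep it short, the change y1 - y0 is simulated directly and participants are matched on `x` itself; with a single covariate and a participation probability that increases in `x`, matching on `x` is equivalent to matching on the propensity score. Names and parameters are hypothetical.

```python
import bisect
import math
import random
import statistics

def simulate(n=4000, true_impact=2.0, seed=3):
    """Hypothetical panel in which an observed baseline characteristic x raises both
    the chance of participating and the subsequent growth in the outcome."""
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        d = rng.random() < 1 / (1 + math.exp(-1.5 * x))
        growth = 1.0 + x  # change over time depends on initial conditions
        change = growth + (true_impact if d else 0.0) + rng.gauss(0, 1)  # y1 - y0
        units.append((x, d, change))
    return units

units = simulate()
treated = [(x, c) for x, d, c in units if d]
controls = sorted((x, c) for x, d, c in units if not d)
control_x = [x for x, _ in controls]

# Plain double difference: mean change of participants minus that of non-participants.
plain_dd = statistics.mean(c for _, c in treated) - statistics.mean(c for _, c in controls)

# Matched double difference: compare each participant's change with the change of the
# non-participant closest in baseline x (one nearest neighbour, with replacement).
diffs = []
for x, c in treated:
    i = bisect.bisect_left(control_x, x)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(controls)]
    best = min(candidates, key=lambda k: abs(control_x[k] - x))
    diffs.append(c - controls[best][1])
matched_dd = statistics.mean(diffs)
print(f"plain DD {plain_dd:.2f}, matched DD {matched_dd:.2f} (true impact 2.00)")
```

Plain DD here comes out near 3, because participants tend to have higher `x` and hence faster growth; matching on baseline `x` before differencing brings the estimate back close to 2.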
6. Regression Discontinuity Design
The selection function is a discontinuous function of a score.
UPP in Indonesia: two similar kecamatan in the same kabupaten whose scores lie within a neighborhood of the cut-off score can be treated differently.
[Figure: selection (0 or 1) plotted against kecamatan score, with control and treatment regions on either side of the cut-off]
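A minimal regression discontinuity sketch, assuming hypothetical units scored from 0 to 100 and treated when the score is at or above a cut-off of 50 (the direction and all numbers are assumptions, not the UPP rule). It fits a straight line on each side within a bandwidth of the cut-off and reads the impact off the jump between the two fitted lines at the cut-off.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def rd_estimate(data, cutoff, bandwidth):
    """Local linear fits on each side of the cut-off; the jump in the fitted
    intercepts at the cut-off estimates the impact there."""
    left = [(s - cutoff, y) for s, y in data if cutoff - bandwidth <= s < cutoff]
    right = [(s - cutoff, y) for s, y in data if cutoff <= s <= cutoff + bandwidth]
    a_left, _ = fit_line([s for s, _ in left], [y for _, y in left])
    a_right, _ = fit_line([s for s, _ in right], [y for _, y in right])
    return a_right - a_left

# Hypothetical units scored 0-100; those at or above 50 are treated.
rng = random.Random(5)
data = []
for _ in range(5000):
    s = rng.uniform(0, 100)
    treated = s >= 50
    data.append((s, 5 + 0.1 * s + (3.0 if treated else 0.0) + rng.gauss(0, 1)))

rd = rd_estimate(data, cutoff=50, bandwidth=10)
naive = (statistics.mean(y for s, y in data if s >= 50)
         - statistics.mean(y for s, y in data if s < 50))
print(f"RD estimate {rd:.2f}, naive difference {naive:.2f} (true jump 3.00)")
```

The local estimate lands near the true jump of 3, while a naive comparison of everyone above versus below the cut-off also picks up the underlying slope in the score. The bandwidth is a judgment call: narrower windows reduce bias from curvature but leave fewer observations.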
7. Instrumental variables
Identifying exogenous variation using a third variable
Outcome regression:
$Y_i = \alpha + \beta D_i + \varepsilon_i$
D = 0, 1 is our program; it is not randomly assigned.
• The “instrument” (Z) influences participation but does not affect outcomes given participation (the “exclusion restriction”).
• This identifies the exogenous variation in outcomes due to the program.
Treatment regression:
$D_i = \gamma + \delta Z_i + u_i$
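A sketch of the instrumental-variables logic with a binary instrument, using hypothetical simulated data: an unobserved `m` drives both participation and outcomes, while `Z` shifts participation only. The ratio of the effect of Z on Y to the effect of Z on D (the Wald estimator, which equals two-stage least squares with one instrument and one endogenous regressor) recovers the program effect here because the effect is assumed to be the same for every unit.

```python
import random
import statistics

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

# Hypothetical data: unobserved motivation m raises both participation D and outcome Y,
# while the instrument Z shifts D but has no direct effect on Y (exclusion restriction).
rng = random.Random(6)
Z, D, Y = [], [], []
for _ in range(50000):
    m = rng.gauss(0, 1)
    z = 1 if rng.random() < 0.5 else 0
    d = 1 if m + z + rng.gauss(0, 1) > 0.5 else 0
    y = 10 + 2.0 * d + 3 * m + rng.gauss(0, 1)  # true program effect is 2
    Z.append(z)
    D.append(d)
    Y.append(y)

# Naive comparison of participants and non-participants: biased by m.
naive = (statistics.mean(y for y, d in zip(Y, D) if d)
         - statistics.mean(y for y, d in zip(Y, D) if not d))
# IV estimate: reduced-form effect of Z on Y divided by the first-stage effect of Z on D.
iv = cov(Z, Y) / cov(Z, D)
print(f"naive {naive:.2f}, IV {iv:.2f} (true effect 2.00)")
```

If Z is a randomized assignment and D is actual take-up, the same ratio is one standard way to handle selective non-compliance; with heterogeneous effects it then estimates the impact for those whose participation responds to assignment (under a monotonicity assumption), not the average for everyone.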
Randomization: An example from Mexico
PROGRESA: grants to poor families, conditional on preventive health care and children's school attendance; given to women.
The Mexican government wanted an evaluation; the order in which communities were phased in was random.
Results: child illness down 23%; height increased by 1-4 cm; 3.4% increase in enrollment.
After the evaluation, PROGRESA was expanded within Mexico, and similar programs were adopted in other Latin American countries.
Randomization: An example from Kenya
School-based deworming: treatment with a single pill every 6 months, at a cost of 49 cents per student per year
27% of treated students had moderate-to-heavy infection, compared with 52% of comparison students
Treatment reduced school absenteeism by 25%, or 7 percentage points
Costs only $3 per additional year of school participation
Lessons from randomized experiments
Randomized evaluations:
• are often feasible
• have been conducted successfully
• are labor-intensive and costly, but no more so than other data collection activities
Results from randomized evaluations can be quite different from those drawn from retrospective evaluations
NGOs are well-suited to conduct randomized evaluations in collaboration with academics and external funders
Lessons from randomized experiments
While randomization is a powerful tool:
Internal validity can be questionable if we do not allow properly for selective compliance with the randomized assignment.
Randomization is not always feasible beyond pilot projects, which raises concerns about external validity.
Contextual factors influence outcomes; a scaled-up program may work differently.
Matching Method Example: Piped Water and Child Health in Rural India
Is a child less vulnerable to diarrhea if he/she lives in a household with piped water?
Do children in poor, or poorly educated, households realize the same health gains from piped water as others?
Does income matter independently of parental education?
The evaluation problem
There are observable differences between those households with piped water and those without it.
And these differences probably also matter to child health.
Naïve comparisons can be deceptive
Common practice: compare villages with piped water, or some other infrastructure facility, and those without.
Failure to control for differences in village characteristics that influence infrastructure placement can severely bias such comparisons.
Model for the propensity scores for piped water placement in India
Village variables: agricultural modernization, educational and social infrastructure.
Household variables: demographics, education, religion, ethnicity, assets, housing conditions, and state dummy variables.
More likely to have piped water if:
Household lives in a larger village, with a high school, a pucca road, a bus stop, a telephone, a bank, and a market;
it is not a member of a scheduled tribe; it is a Christian household; it rents rather than owns its home (this is not a perverse wealth effect: rental housing tends to be better equipped);
it is female-headed; it owns more land.
Impacts of piped water on child health
The results for mean impact indicate that access to piped water significantly reduces diarrhea incidence and duration.
Disease incidence amongst those with piped water would be 21% higher without it. Illness duration would be 29% higher.
Stratifying by income per capita:
No significant child-health gains amongst the poorest 40% (roughly corresponding to the poor in India).
Highly significant impacts for the upper 60%.
Without piped water there would be no difference in infant diarrhea incidence between the poorest quintile and the richest.
When we stratify by both income and education:
For the poor, the education of female members matters greatly to achieving the child-health benefits from piped water.
Even in the poorest 40%, women’s schooling results in lower incidence and duration of diarrhea among children from piped water.
Women’s education matters much less for upper income groups.
Lessons on matching methods
When neither randomization nor a baseline survey is feasible, careful matching to control for observable heterogeneity is crucial.
This requires good data, to capture the factors relevant to participation.
Look for heterogeneity in impact; average impact may hide important differences in the characteristics of those who gain or lose from the intervention.
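Looking for heterogeneity can be as simple as averaging the matched gains within subgroups as well as overall. The helper `mean_gain_by_group` and the gains below are made up for illustration and are not results from the piped-water study; they only show how an overall average can mask a group that gains nothing.

```python
from collections import defaultdict

def mean_gain_by_group(gains, groups):
    """Average matched gains overall and within each subgroup."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for g, grp in zip(gains, groups):
        totals[grp] += g
        counts[grp] += 1
    by_group = {grp: totals[grp] / counts[grp] for grp in totals}
    overall = sum(gains) / len(gains)
    return overall, by_group

# Made-up gains for illustration only (not results from the India study).
gains = [0.0, 0.5, -0.5, 0.0, 3.0, 2.5, 3.5, 3.0, 2.0, 4.0]
groups = ["poorest 40%"] * 4 + ["upper 60%"] * 6
overall, by_group = mean_gain_by_group(gains, groups)
print(overall, by_group)  # 1.8 {'poorest 40%': 0.0, 'upper 60%': 3.0}
```

Here the overall mean gain of 1.8 hides a zero average gain in one group and 3.0 in the other.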
Tracking participants and non-participants over time
1. Single-difference matching can still be contaminated by selection bias: latent heterogeneity in factors relevant to participation.
2. Tracking individuals over time allows a double difference: this eliminates all time-invariant additive selection bias.
3. Combining double difference with matching: this allows us to eliminate observable heterogeneity in factors relevant to subsequent changes over time.
Improving Evaluation Practice
When there is an impact evaluation, build it in ex ante
Make a quality evaluation a primary responsibility of the manager of the program
Allocate the necessary resources
Encourage randomization whenever feasible (education, health, micro-finance, governance, not monetary policy…)
Practical suggestions
Not every project needs impact evaluation: select projects in priority areas, where knowledge needed
Take advantage of budget constraints and phase-in
Require a pilot project before a large-scale project
Finance pilot projects and evaluations with grants
Collaborate with others: academics (e.g., Evaluation Based Policy Fund in UK) and NGOs
Evaluation: An Opportunity
Creating hard evidence of success will:
• help spend future resources more effectively
• influence other policymakers
• build public support