ICAR-IFPRI: Problems of impact evaluation, confounding factors and selection biases
Upload: international-food-policy-research-institute-south-asia-office
Post on 12-Apr-2017
Thursday, October 01, 2015
Problems of Impact evaluation
Confounding factors and selection biases
INTERNATIONAL FOOD POLICY RESEARCH INSTITUTE Page 2
Objective of Impact Evaluation
Measure the effect of the program on its beneficiaries (and eventually on its
non-beneficiaries) by answering the counterfactual question:
• How would individuals who participated in a program have fared in the absence
of the program?
• How would those who were not exposed to the program have fared in the
presence of the program?
Two main problems arise: confounding factors and selection biases.
Comparing averages
• Individual-level measure of impact: what would the outcome (e.g. purchase patterns) have been had he/she not participated in the program (in our case, the treatment)?
• Compare the individual with the program to the same individual without the program, at the same time?
Problem: we can never observe both; this is a missing data problem.
• Instead: Average impact on given groups of individuals
• Compare mean outcome in group of participants (Treatment group) to mean outcome in similar group of non-participants (Control group)
• Average Treatment effect on the treated (ATT):
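In standard potential-outcomes notation, the ATT can be written as below. The symbols are an assumption for illustration, since the slide's own formula is not reproduced in the transcript: D_i = 1 marks a participant, Y_i(1) is the outcome of individual i with the program, and Y_i(0) is the outcome of the same individual without it.

```latex
\mathrm{ATT} = E\left[\, Y_i(1) - Y_i(0) \mid D_i = 1 \,\right]
```

The term E[Y_i(0) | D_i = 1] is the counterfactual that is never observed, which is why a comparable control group is needed to stand in for it.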
Building a control group
• Compare what is comparable.
• “Treatment” and “Control” groups must look the same in the absence of the program.
• But: very often, those individuals who benefit from the program initially
differ from those who don’t.
• External selection: programs are explicitly targeted (Particular areas,
Particular individuals).
• Self selection: the decision to participate is voluntary.
Problem with comparing beneficiaries and non-beneficiaries: the difference can be
attributed to either the program's impact or the original differences between the groups.
• SELECTION BIAS arises when individuals or groups are selected for
treatment on characteristics that may also affect their outcomes.
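The point can be made explicit with a standard decomposition, sketched here in potential-outcomes notation that is assumed for illustration and does not come from the slides (D_i = 1 for participants, Y_i(1) and Y_i(0) for the outcomes with and without the program):

```latex
\underbrace{E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]}_{\text{observed difference}}
= \underbrace{E[Y_i(1) - Y_i(0) \mid D_i = 1]}_{\text{impact on participants}}
+ \underbrace{E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]}_{\text{selection bias}}
```

The last term is zero only if participants and non-participants would have had the same average outcome without the program.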
[Figure: Program selection does not lead to selection bias. The initial population is shown by quintile, from Quintile I (poorer) to Quintile V (richer). Selection yields a treatment group (receives procedure X) and a control group (does not receive procedure X). Impact = Y Exp – Y Control.]
[Figure: Program selection leads to selection bias. The initial population is shown by quintile, from Quintile I (poorer) to Quintile V (richer). Selection yields a treatment group (receives procedure X) and a control group (does not receive procedure X). Impact ≠ Y Exp – Y Control.]
“Sign” of the selection bias (1)
Program targeted on “worse-off” households
[Figure: outcomes for Treatment and Control, contrasting the actual impact with the observed difference. Observed difference is negative.]
“Sign” of the selection bias (2)
Program targeted on “better-off” households
[Figure: outcomes for Treatment and Control, contrasting the actual impact with the observed difference. Observed difference is very large.]
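The sign of the bias under the two targeting rules can be illustrated with a small numerical sketch. Everything in it is hypothetical and chosen for illustration: the eight baseline outcomes, the assumed true impact of 5, and the function name `naive_difference` are not taken from the slides.

```python
def naive_difference(baseline, treated_flags, true_impact):
    """Mean outcome of the treated minus mean outcome of the controls."""
    treated = [y + true_impact for y, t in zip(baseline, treated_flags) if t]
    control = [y for y, t in zip(baseline, treated_flags) if not t]
    return sum(treated) / len(treated) - sum(control) / len(control)

# Hypothetical outcomes that eight households would have WITHOUT the program.
baseline = [10, 20, 30, 40, 50, 60, 70, 80]
true_impact = 5  # assumed: the program raises every treated outcome by 5

# Three hypothetical selection rules (True = household is treated).
worse_off = [True] * 4 + [False] * 4      # program targets the poorest half
better_off = [False] * 4 + [True] * 4     # program targets the richest half
comparable = [True, False, False, True, True, False, False, True]  # same baseline mean

diff_worse_off = naive_difference(baseline, worse_off, true_impact)
diff_better_off = naive_difference(baseline, better_off, true_impact)
diff_comparable = naive_difference(baseline, comparable, true_impact)

print(diff_worse_off, diff_better_off, diff_comparable)
```

With these made-up numbers, targeting worse-off households gives a negative observed difference even though the true impact is positive, targeting better-off households overstates the impact, and only the comparable groups recover the assumed impact of 5.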
Exercise
1. Detail how confounding factors may be an issue in evaluating the impact
of your project.
2. Suppose that you were to compare households in communities where the
project was implemented to households in the neighboring communities
where the project was not implemented.
- What would be the likely sign of the selection bias?
3. Suppose that you were to compare, within the communities where the
project is implemented, households who have decided to use the project
(e.g. drink water from the tap or build stone bunds in their field) to the
ones who have decided not to use it.
- What would be the likely sign of the selection bias?
Step 3: What data to collect. Collect qualitative and
quantitative data on both treatment and control households
in the baseline
• Qualitative data are a key supplement to quantitative impact evaluation (IE), providing
complementary perspectives on the program’s performance.
• Approaches include focus group discussions (FGD), expert elicitation, and key informant
interviews.
• Qualitative work is useful in three ways:
• 1. It can be used to develop hypotheses as to how and why
the program would work.
• 2. Before quantitative IE results are out, it can
provide quick insights into what is happening in the program.
• 3. In the analysis stage, it can provide context and
explanations for the quantitative results.
Mixed methods: quantitative and qualitative
Possible rationale:
• Triangulation: to cross-check and compare results and offset any
weaknesses in one method by the strengths of another;
• Complementarities: examining overlapping and different facets of a
phenomenon by using several approaches and tools;
• Initiation: discovering paradoxes, identifying contradictions, or obtaining
fresh perspectives that relate to the topic of investigation;
• Development: using quantitative and qualitative methods sequentially, such
that results from the first method inform the use of the second method and
vice versa; and
• Expansion: adding breadth and scope to a project to convey findings and
recommendations to audiences with different expertise and interests.
Step 3 linked to Step 2: Focusing on quantitative methods.
Propose to execute double difference methods
• The central feature of the method is the use of longitudinal data to compute a “difference-in-differences” or “double difference” estimate.
• The method relies on baseline data collected before the project is implemented and follow-up data collected after it starts, which together give a “before/after” comparison.
• Data are collected from households receiving the program and from those that do not (“with the program” / “without the program”).
Double difference method
Survey round             Intervention group (Group I)   Control group (Group C)   Difference across groups
Follow-up                I1                             C1                        I1 – C1
Baseline                 I0                             C0                        I0 – C0
Difference across time   I1 – I0                        C1 – C0                   Double-difference: (I1 – C1) – (I0 – C0)
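The double difference can be computed directly from household survey records. The sketch below is a minimal illustration under stated assumptions: the record layout (group, survey round, outcome), the function name `double_difference`, and all outcome values are hypothetical and not part of the original slides.

```python
from statistics import mean

def double_difference(records):
    """Compute (I1 - C1) - (I0 - C0) from (group, survey_round, outcome) records.

    group is "I" (intervention) or "C" (control);
    survey_round is 0 (baseline) or 1 (follow-up).
    """
    cells = {}
    for group, survey_round, outcome in records:
        cells.setdefault((group, survey_round), []).append(outcome)
    m = {key: mean(values) for key, values in cells.items()}
    diff_followup = m[("I", 1)] - m[("C", 1)]  # I1 - C1
    diff_baseline = m[("I", 0)] - m[("C", 0)]  # I0 - C0
    return diff_followup - diff_baseline

# Hypothetical outcomes for two households per group and round.
records = [
    ("I", 0, 100), ("I", 0, 110), ("I", 1, 120), ("I", 1, 130),
    ("C", 0, 90), ("C", 0, 100), ("C", 1, 100), ("C", 1, 110),
]

dd = double_difference(records)
print(dd)
```

The same number is obtained by differencing across time first, (I1 – I0) – (C1 – C0), since the two expressions are algebraically identical.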
Double difference methods: continued
• Why are both “before/after” and “with/without” data necessary?
• Suppose data were collected only from beneficiaries.
• Suppose that, between the baseline and the follow-up, some adverse event occurs,
with the benefits of the program being more than offset by the damage
from that event. These effects would show up in the difference over
time in the intervention group, in addition to the effects attributable to
the program.
• More generally, restricting the evaluation to only “before/after”
comparisons makes it impossible to separate program impacts from
the influence of other events that affect beneficiary households.
• To guard against this, add a second dimension to the evaluation design
that includes data on households “with” and “without” the program.
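The argument can be written out as a short derivation. The symbols are assumptions for illustration: δ is the program effect, s is the effect of the other event, and the event is assumed to hit intervention and control households equally.

```latex
\begin{aligned}
I_1 - I_0 &= \delta + s && \text{(before/after, intervention group)} \\
C_1 - C_0 &= s && \text{(before/after, control group)} \\
(I_1 - I_0) - (C_1 - C_0) &= \delta && \text{(double difference)}
\end{aligned}
```

For example, with a hypothetical δ = 10 and s = −15, the before/after comparison among beneficiaries alone gives −5 and would suggest the program did harm, while the double difference recovers 10.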
Summary of the method and its application
• The approach: the project impact is estimated quantitatively by comparing changes in selected outcome
indicators between the treatment group and a comparable
control group.
• The approach can also be applied to measure spillover effects from
the treated to the non-treated farmers in the treated areas.
• Spillovers are examined by comparing the outcomes of non-treated households
in treatment areas with those of households in control areas.
• Moreover, impact heterogeneity across population sub-groups can be
investigated.
• The sub-groups can be defined based on caste, gender, agro-ecological zones, etc.
• Such information will be collected in the baseline survey.
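The spillover comparison can be sketched with the same double-difference arithmetic. The mean outcomes below are hypothetical, and the helper name and variable names are assumptions made for this illustration.

```python
def double_difference(group_base, group_follow, control_base, control_follow):
    """(group - control) at follow-up minus (group - control) at baseline."""
    return (group_follow - control_follow) - (group_base - control_base)

# Hypothetical mean outcomes as (baseline, follow-up) pairs.
treated = (50, 70)        # treated households in treatment areas
non_treated = (48, 56)    # non-treated households in treatment areas
control_area = (49, 53)   # households in control areas

# Direct effect: treated households versus control-area households.
direct_effect = double_difference(treated[0], treated[1],
                                  control_area[0], control_area[1])

# Spillover effect: non-treated households in treatment areas
# versus control-area households.
spillover_effect = double_difference(non_treated[0], non_treated[1],
                                     control_area[0], control_area[1])

print(direct_effect, spillover_effect)
```

Running the same calculation separately within each sub-group (for example by gender or agro-ecological zone) would be one simple way to look at impact heterogeneity.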
Other issues
Using Monitoring Data
• Monitoring data are a critical resource in an IE. They let the evaluator verify:
• which participants received the program,
• how fast the program is expanding,
• how resources are being spent, and
• whether activities are being implemented as planned.
• This information is critical to implementing the evaluation, for example, to ensure that baseline data are collected before the program is introduced and to verify the integrity of the treatment and comparison groups.
• In addition, monitoring can provide information on the cost of implementing the program, which is also needed for cost-benefit analysis.
Evaluation question
• What is the impact or causal effect of the
program on an outcome of interest?
Setting up an evaluation: The steps (Gertler et al)
• (i) establishing the type of question to be answered by the evaluation, (ii) constructing a theory of change that outlines how the project is supposed to achieve the intended results, and (iii) developing a results chain, formulating hypotheses to be tested by the evaluation, and selecting performance indicators.
• All of these steps are best taken at the outset of the program, engaging a range of stakeholders from policy makers to program managers, to forge a common vision of the program’s goals and how they will be achieved. This engagement builds consensus regarding the main questions to be answered and will strengthen links between the evaluation, program implementation, and policy.
Theories of change
• A theory of change is a description of how an intervention is supposed to deliver the desired results. It describes the causal logic of how and why a particular project, program, or policy will reach its intended outcomes.
• A theory of change is a key underpinning of any impact evaluation, given the cause-and-effect focus of the research.
• A theory of change can specify the research questions.
• The best time to develop a theory of change for a program is at the beginning of the design process, when stakeholders can be brought together to develop a common vision for the program, its goals, and the path to achieving those goals.
• Stakeholders can then start program implementation from a common understanding of the program, how it works, and its objectives.
Theories of change: The results chain
• A basic results chain maps the following elements:
• Inputs: Resources at the disposal of the project, including staff and budget
• Activities: Actions taken or work performed to convert inputs into outputs
• Outputs: The tangible goods and services that the project activities produce (They are directly under the control of the implementing agency.)
• Outcomes: Results likely to be achieved once the beneficiary population uses the project outputs (They are usually achieved in the short-to-medium term.)
• Final outcomes: The final project goals (They can be influenced by multiple factors and are typically achieved over a longer period of time.)
• The results chain has three main parts:
Results
• Results: Intended results consist of the
outcomes and final outcomes, which are not
under the direct control of the project and are
contingent on behavioral changes by program
beneficiaries.
Selecting performance indicators (Gertler et al 2010)
• SMART is the rule
• Specific: to measure the information required as
closely as possible
• Measurable: to ensure that the information can
be readily obtained
• Attributable: to ensure that each measure is
linked to the project’s efforts
• Realistic: to ensure that the data can be
obtained in a timely fashion, with reasonable
frequency, and at reasonable cost
• Targeted: to the objective population.
Intent to treat versus treatment: easier to understand with a randomized assessment example
• Program offering: less than full compliance
• Non-compliance is possible on both sides, among beneficiaries as well as non-beneficiaries
• Under these circumstances, a straight comparison of the group originally assigned to treatment with the group originally assigned to comparison will yield the “intent-to-treat” estimate (ITT).
• We will be comparing those whom we intended to treat (those assigned to the treatment group) with those whom we intended not to treat (those assigned to the comparison group).
• The ITT is important in its own right, since most policy makers can only offer a program and cannot force the program on their target population.
What about treatment effects?
• Getting the treatment effect requires correcting for the fact that some of the units assigned to the treatment group did not actually receive the treatment, or that some of the units assigned to the comparison group actually did receive it.
• In other words, we want to estimate the impact of the program on those to whom treatment was offered and who actually enrolled. This is the “treatment-on-the-treated” estimate (TOT).
Example (Gertler et al 2010)
• Enroll-if-offered. These are the individuals who comply with their assignment.
• If they are assigned to the treatment group (offered the program), they take it up, or enroll; if they are assigned to the comparison group (not offered the program), they do not enroll.
• Never. These are the individuals that never enroll in or take up the program, even if they are assigned to the treatment group. They are noncompliers in the treatment group.
• Always. These are the individuals who will find a way to enroll in the program or take it up, even if they are assigned to the comparison group. They are noncompliers in the comparison group.
Non-compliance issue: continued (Gertler et al 2010)
ITT and ATT (Gertler et al 2010)
• If the average income (Y) for the treatment group is $110, and the average income for the comparison group is $70, then the ITT is $40.
• Next, we need to recover the treatment-on-the-treated estimate (TOT) from the intention-to-treat estimate. To do that, we will need to identify where the $40 difference came from. Let us proceed by elimination. First, we know that the difference cannot be caused by any differences between the Nevers in the treatment and comparison groups. The reason is that the Nevers never enroll in the program, so that for them, it makes no difference whether they are in the treatment group or in the comparison group. Second, we know that the $40 difference cannot be caused by differences between the Always people in the treatment and comparison groups because the Always people always enroll in the program. For them, too, it makes no difference whether they are in the treatment group or the comparison group.
• Thus, the difference in outcomes between the two groups must necessarily come from the effect of the program on the only group affected by their assignment to treatment or comparison, that is, the Enroll-if-offered group.
ITT and ATT
• Suppose that a doctor tells everyone in a treatment group to go home and exercise for an hour per day and tells the control group nothing.
• After a month, he evaluates the difference in their blood pressure.
• If we just compare the difference in mean blood pressure between the two groups, we get the ITT.
• This doesn't tell us the causal effect of exercise on blood pressure, but the causal effect of telling people to exercise on blood pressure. We would presume that this estimate would be smaller than the treatment effect of exercise per se, as only a (small!) fraction of people in the treatment group would follow the advice.
Retrieving ATT (Gertler et al 2010)
• We know that the entire impact of $40 came from a difference in enrollment for the 80 percent of the units in our sample who are Enroll-if-offered. Now if 80 percent of the units are responsible for an average impact of $40 for the entire group offered treatment, then the impact on those 80 percent of Enroll-if-offered must be 40/0.8, or $50. Put another way, the impact of the program for the Enroll-if-offered is $50, but when this impact is spread across the entire group offered treatment, the average effect is watered down by the percent that was noncompliant with the original randomized assignment.
• ATT=40/0.8=50
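The rescaling above can be sketched in a few lines. This is a minimal illustration using the numbers from the example ($110, $70, and 80 percent Enroll-if-offered); the function name `treatment_on_treated` and its parameter names are assumptions for this sketch.

```python
def treatment_on_treated(mean_treatment, mean_comparison, complier_share):
    """Return (ITT, TOT), where TOT rescales the ITT by the share of
    Enroll-if-offered units (those whose enrollment follows assignment)."""
    if not 0 < complier_share <= 1:
        raise ValueError("complier_share must be in (0, 1]")
    itt = mean_treatment - mean_comparison  # intent-to-treat estimate
    tot = itt / complier_share              # effect on Enroll-if-offered units
    return itt, tot

# Numbers from the example: average income of $110 vs. $70, and 80 percent
# of units are Enroll-if-offered.
itt, tot = treatment_on_treated(110, 70, 0.8)
print(itt, round(tot, 2))
```

In practice the share of Enroll-if-offered units is not observed directly; under the three-type classification used in the example, it can be estimated as the enrollment rate in the treatment group minus the enrollment rate in the comparison group.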