icar-ifpri : problems of impact evaluation confounding factors and selection biases

Thursday, October 01, 2015

Problems of Impact evaluation

Confounding factors and selection biases

INTERNATIONAL FOOD POLICY RESEARCH INSTITUTE

Objective of Impact Evaluation

Measure the effect of the program on its beneficiaries (and possibly on its non-beneficiaries) by answering the counterfactual question:

• How would individuals who participated in a program have fared in the absence

of the program?

• How would those who were not exposed to the program have fared in the

presence of the program?

Two main problems arise: confounding factors and selection biases.


Comparing averages

• Individual-level measure of impact: what would the outcome (e.g. purchase patterns) have been had he/she not participated in the program (in our case, the treatment)?

• Compare the individual with the program to the same individual without the program, at the same time?

Problem: we can never observe both; this is a missing data problem.

• Instead: Average impact on given groups of individuals

• Compare mean outcome in group of participants (Treatment group) to mean outcome in similar group of non-participants (Control group)

• Average Treatment effect on the treated (ATT):
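In potential-outcomes notation (an assumption here, since the slides do not define symbols), let Y_i(1) and Y_i(0) be individual i's outcome with and without the program, and let T_i = 1 for participants. A standard way to write the ATT is then:

```latex
\mathrm{ATT}
  = E\left[\, Y_i(1) - Y_i(0) \mid T_i = 1 \,\right]
  = E\left[\, Y_i(1) \mid T_i = 1 \,\right] - E\left[\, Y_i(0) \mid T_i = 1 \,\right]
```

The first term is observed in the treatment group. The second term is the counterfactual, which is never observed and which the control group is meant to approximate.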


Building a control group

• Compare what is comparable.

• “Treatment” and “Control” groups must look the same in the absence of the program.

• But: very often, those individuals who benefit from the program initially

differ from those who don’t.

• External selection: programs are explicitly targeted (Particular areas,

Particular individuals).

• Self selection: the decision to participate is voluntary.

Problem with comparing beneficiaries and non-beneficiaries: the difference can be attributed to either the impact of the program or the original differences between the groups.

• SELECTION BIAS when individuals or groups are selected for

treatment on characteristics that may also affect their outcomes.
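A standard way to make this precise, sketched here in notation that is not in the slides: write Y(1) and Y(0) for the outcome with and without the program, and T = 1 for treated units, T = 0 for control units. The naive comparison of group means then splits into the impact on the treated plus a selection-bias term:

```latex
\underbrace{E[\,Y(1) \mid T = 1\,] - E[\,Y(0) \mid T = 0\,]}_{\text{observed difference}}
  = \underbrace{E[\,Y(1) - Y(0) \mid T = 1\,]}_{\text{impact on the treated}}
  + \underbrace{E[\,Y(0) \mid T = 1\,] - E[\,Y(0) \mid T = 0\,]}_{\text{selection bias}}
```

The selection-bias term is zero only if the two groups would have had the same average outcome without the program.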


Program selection does not lead to selection bias

[Figure: initial population by wealth quintile, Quintile I (poorer) to Quintile V (richer); selection into a Treatment group (receives procedure X) and a Control group (does not receive procedure X).]

Impact = Y Exp – Y Control


Program selection leads to selection bias

[Figure: initial population by wealth quintile, Quintile I (poorer) to Quintile V (richer); selection into a Treatment group (receives procedure X) and a Control group (does not receive procedure X).]

Impact ≠ Y Exp – Y Control


“Sign” of the selection bias (1)

Program targeted on “worse-off” households

[Figure: outcomes of the Treatment and Control groups, marking the actual impact; the observed difference is negative.]


“Sign” of the selection bias (2)

Program targeted on “better-off” households

[Figure: outcomes of the Treatment and Control groups, marking the actual impact; the observed difference is very large.]
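The two cases can be reproduced with a few lines of arithmetic. The sketch below is only illustrative: the function name and all numbers are hypothetical, and it assumes the program adds the same fixed amount to every treated household.

```python
def observed_difference(true_impact, baseline_treated, baseline_control):
    """Naive comparison of mean outcomes: treatment group minus control group."""
    mean_treated = baseline_treated + true_impact  # treated households receive the impact
    mean_control = baseline_control                # control households do not
    return mean_treated - mean_control


# Hypothetical numbers, chosen only for illustration.
true_impact = 10.0

# Case (1): program targeted on worse-off households (lower outcome without the program).
worse_off = observed_difference(true_impact, baseline_treated=50.0, baseline_control=80.0)

# Case (2): program targeted on better-off households (higher outcome without the program).
better_off = observed_difference(true_impact, baseline_treated=80.0, baseline_control=50.0)

for label, diff in [("worse-off targeting", worse_off), ("better-off targeting", better_off)]:
    # Selection bias is whatever the naive comparison adds to the true impact.
    print(label, "observed:", diff, "selection bias:", diff - true_impact)
```

With these numbers the observed difference is negative in case (1) and four times the true impact in case (2), even though the true impact is the same in both.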


Exercise

1. Detail how confounding factors may be an issue in evaluating the impact

of your project.

2. Suppose that you were to compare households in communities where the project was implemented to households in the neighboring communities where the project was not implemented.

- What would be the likely sign of the selection bias?

3. Suppose that you were to compare, within the communities where the project is implemented, households who have decided to use the project (e.g. drink water from the tap or build stone bunds in their field) to the ones who have decided not to use it.

- What would be the likely sign of the selection bias?


Step 3: What data to collect. Collect qualitative and quantitative data on both treatment and control households in the baseline

• Qualitative data are a key supplement to quantitative IE, providing complementary perspectives on a program’s performance.

• Approaches include focus group discussions (FGD), expert elicitation, and key informant interviews.

• Qualitative work is useful in three ways:

• 1. It can be used to develop hypotheses as to how and why the program would work.

• 2. Before quantitative IE results are out, it can provide quick insights into what is happening in the program.

• 3. In the analysis stage, it can provide context and explanations for the quantitative results.


Mixed methods: quantitative and qualitative

• Possible rationale:

• Triangulation: to cross-check and compare results and offset any weaknesses in one method by the strengths of another;

• Complementarities: examining overlapping and different facets of a

phenomenon by using several approaches and tools;

• Initiation: discovering paradoxes, identifying contradictions, or obtaining

fresh perspectives that relate to the topic of investigation;

• Development: using quantitative and qualitative methods sequentially, such

that results from the first method inform the use of the second method and

vice versa; and

• Expansion: adding breadth and scope to a project to convey findings and

recommendations to audiences with different expertise and interests.


Step 3 linked to Step 2: Focusing on quantitative methods. Propose to execute double difference methods

• The central feature of the method is the use of longitudinal data to compute “difference-in-differences” or “double difference” estimates.

• The method relies on baseline data collected before project implementation and follow-up data collected after it starts, to develop a “before/after” comparison.

• Data are collected from households receiving the program and from those that do not (“with the program” / “without the program”).


Double difference method

Survey round             Intervention group (Group I)   Control group (Group C)   Difference across groups
Follow-up                I1                             C1                        I1 – C1
Baseline                 I0                             C0                        I0 – C0
Difference across time   I1 – I0                        C1 – C0                   Double-difference: (I1 – C1) – (I0 – C0)
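The table translates directly into a small calculation. The sketch below uses the table's symbols (I0, I1, C0, C1) as group mean outcomes; the function name and the numbers are hypothetical and serve only to show that differencing across groups first or across time first gives the same result.

```python
def double_difference(i0, i1, c0, c1):
    """Double difference from the four group means in the table.

    i0, i1: intervention-group means at baseline and follow-up.
    c0, c1: control-group means at baseline and follow-up.
    """
    difference_at_followup = i1 - c1  # I1 - C1
    difference_at_baseline = i0 - c0  # I0 - C0
    return difference_at_followup - difference_at_baseline


# Hypothetical mean outcomes, for illustration only.
i0, i1 = 100.0, 130.0
c0, c1 = 90.0, 105.0

dd = double_difference(i0, i1, c0, c1)

# Same quantity computed from the "difference across time" row instead.
dd_time = (i1 - i0) - (c1 - c0)

print(dd, dd_time)
```

Both routes give 15 with these numbers: the intervention group gained 30, the control group gained 15, and the double difference attributes the remainder to the program.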


Double difference methods: continued

• Why are both “before/after” and “with/without” data necessary?

• Suppose data were collected only from beneficiaries.

• Suppose that, between the baseline and follow-up, some adverse event occurs, with the benefits of the program being more than offset by the damage from the bad event. The effects of the event would show up in the difference over time in the intervention group, in addition to the effects attributable to the program.

• More generally, restricting the evaluation to only “before/after” comparisons makes it impossible to separate program impacts from the influence of other events that affect beneficiary households.

• To guard against this, add a second dimension to the evaluation design that includes data on households “with” and “without” the program.
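A small numerical sketch of this argument follows. All numbers are hypothetical, and the example assumes the adverse event affects beneficiary and non-beneficiary households equally, which is the key assumption the double difference relies on.

```python
# Hypothetical numbers, chosen only for illustration.
PROGRAM_EFFECT = 20.0   # true gain for households with the program
ADVERSE_SHOCK = -30.0   # assumed to hit every household between the two rounds

# Mean outcomes at baseline (0) and follow-up (1).
i0 = 100.0                               # intervention group, baseline
i1 = i0 + PROGRAM_EFFECT + ADVERSE_SHOCK  # intervention group, follow-up
c0 = 95.0                                # control group, baseline
c1 = c0 + ADVERSE_SHOCK                   # control group, follow-up

# "Before/after" on beneficiaries only: program effect and shock are mixed together.
before_after = i1 - i0

# Adding the "with/without" dimension removes the common shock.
double_diff = (i1 - i0) - (c1 - c0)

print("before/after estimate:", before_after)
print("double difference:", double_diff)
```

The before/after comparison alone suggests the program made households worse off (a change of -10), while the double difference recovers the assumed program effect of 20.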


Summary of the method and its application

• The approach: by comparing changes in selected outcome indicators between the treatment group and a comparable control group, the project impact is estimated quantitatively.

• The approach can also be applied to measure spillover effects from the treated to the non-treated farmers in the treated areas.

• Spillovers are examined by comparing outcomes between non-treated households in treatment areas and households in control areas.

• Moreover, impact heterogeneity across population sub-groups can be investigated.

• The sub-groups can be defined based on caste, gender, agro-ecological zones, etc.

• Such information will be collected in the baseline survey.

Other issues


Using Monitoring Data

• Monitoring data are a critical resource in an IE.

• They let the evaluator verify which participants received the program,

• how fast the program is expanding,

• how resources are being spent, and

• whether activities are being implemented as planned.

• This information is critical to implementing the evaluation, for example, to ensure that baseline data are collected before the program is introduced and to verify the integrity of the treatment and comparison groups.

• In addition, monitoring can provide information on the cost of implementing the program, which is also needed for cost-benefit analysis.


Evaluation question

• What is the impact or causal effect of the

program on an outcome of interest?


Setting up an evaluation: The steps (Gertler et al)

• (i) establishing the type of question to be answered by the evaluation, (ii) constructing a theory of change that outlines how the project is supposed to achieve the intended results, and (iii) developing a results chain, formulating hypotheses to be tested by the evaluation, and selecting performance indicators.

• All of these steps are best taken at the outset of the program, engaging a range of stakeholders from policy makers to program managers, to forge a common vision of the program’s goals and how they will be achieved. This engagement builds consensus regarding the main questions to be answered and will strengthen links between the evaluation, program implementation, and policy.


Theories of change

• A theory of change is a description of how an intervention is supposed to deliver the desired results. It describes the causal logic of how and why a particular project, program, or policy will reach its intended outcomes.

• A theory of change is a key underpinning of any impact evaluation, given the cause-and-effect focus of the research.

• A theory of change can specify the research questions.

• The best time to develop a theory of change for a program is at the beginning of the design process, when stakeholders can be brought together to develop a common vision for the program, its goals, and the path to achieving those goals.

• Stakeholders can then start program implementation from a common understanding of the program, how it works, and its objectives.


Theories of change: The results chain

• A basic results chain maps the following elements

• Inputs: Resources at the disposal of the project, including staff and budget

• Activities: Actions taken or work performed to convert inputs into outputs

• Outputs: The tangible goods and services that the project activities produce (They are directly under the control of the implementing agency.)

• Outcomes: Results likely to be achieved once the beneficiary population uses the project outputs (They are usually achieved in the short-to-medium term.)

• Final outcomes: The final project goals (They can be influenced by multiple factors and are typically achieved over a longer period of time.)

• The results chain has three main parts:


Results

• Results: Intended results consist of the

outcomes and final outcomes, which are not

under the direct control of the project and are

contingent on behavioral changes by program

beneficiaries.


Selecting performance indicators (Gertler et

al 2010)

• SMART is the rule

• Specific: to measure the information required as

closely as possible

• Measurable: to ensure that the information can

be readily obtained

• Attributable: to ensure that each measure is

linked to the project’s efforts

• Realistic: to ensure that the data can be

obtained in a timely fashion, with reasonable

frequency, and at reasonable cost

• Targeted: to the objective population.


Intent to treat versus treatment: easier to understand with a randomized assessment example

• Program offering: less than full compliance.

• Noncompliance is possible on both sides, among beneficiaries as well as non-beneficiaries.

• Under these circumstances, a straight comparison of the group originally assigned to treatment with the group originally assigned to comparison will yield the “intent-to-treat” estimate (ITT).

• We will be comparing those whom we intended to treat (those assigned to the treatment group) with those whom we intended not to treat (those assigned to the comparison group).

• The ITT matters in its own right, since most policy makers can only offer a program and cannot force it on their target population.


What about treatment effects?

• Getting the treatment effect requires correcting for the fact that some of the units assigned to the treatment group did not actually receive the treatment, or that some of the units assigned to the comparison group actually did receive it.

• In other words, we want to estimate the impact of the program on those to whom treatment was offered and who actually enrolled. This is the “treatment-on-the-treated” estimate (TOT).


Example (Gertler et al 2010)

• Enroll-if-offered. These are the individuals who comply with their assignment.

• If they are assigned to the treatment group (offered the program), they take it up, or enroll; if they are assigned to the comparison group (not offered the program), they do not enroll.

• Never. These are the individuals that never enroll in or take up the program, even if they are assigned to the treatment group. They are noncompliers in the treatment group.

• Always. These are the individuals who will find a way to enroll in the program or take it up, even if they are assigned to the comparison group. They are noncompliers in the comparison group.


Non-compliance issue: continued (Gertler et

al 2010)


ITT and ATT (Gertler et al 2010)

• If the average income (Y) for the treatment group is $110, and the average income for the comparison group is $70, then the ITT is $40.

• Second, we need to recover the treatment-on-the-treated estimate (TOT) from the intention-to-treat estimate. To do that, we will need to identify where the $40 difference came from. Let us proceed by elimination. First, we know that the difference cannot be caused by any differences between the Nevers in the treatment and comparison groups. The reason is that the Nevers never enroll in the program, so that for them, it makes no difference whether they are in the treatment group or in the comparison group. Second, we know that the $40 difference cannot be caused by differences between the Always people in the treatment and comparison groups, because the Always people always enroll in the program. For them, too, it makes no difference whether they are in the treatment group or the comparison group.

• Thus, the difference in outcomes between the two groups must necessarily come from the effect of the program on the only group affected by their assignment to treatment or comparison, that is, the Enroll-if-offered group.
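The elimination argument can be written compactly. As an illustrative assumption (these symbols do not appear in the slides), let π_E, π_N and π_A be the population shares of Enroll-if-offered, Never and Always units, and let Δ_E be the average impact of the program on the Enroll-if-offered group:

```latex
\mathrm{ITT}
  = \pi_E \, \Delta_E + \pi_N \cdot 0 + \pi_A \cdot 0
  = \pi_E \, \Delta_E
\quad\Longrightarrow\quad
\Delta_E = \frac{\mathrm{ITT}}{\pi_E}
```

The two zeros express that assignment changes nothing for the Nevers and the Always, so the whole ITT is carried by the Enroll-if-offered share.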


ITT and ATT

• Suppose that a doctor tells everyone in a treatment group to go home and exercise for an hour per day and tells the control group nothing.

• After a month, he evaluates the difference in their blood pressure.

• If we just compare the difference in mean blood pressure between the two groups, we get the ITT.

• This does not tell us the causal effect of exercise on blood pressure, but the causal effect of telling people to exercise on blood pressure. We would presume that this estimate would be smaller than the treatment effect of exercise per se, as only a (small!) fraction of people in the treatment group would follow the advice.


Retrieving ATT (Gertler et al 2010)

• We know that the entire impact of $40 came from a difference in enrollment for the 80 percent of the units in our sample who are Enroll-if-offered. Now if 80 percent of the units are responsible for an average impact of $40 for the entire group offered treatment, then the impact on those 80 percent of Enroll-if-offered must be 40/0.8, or $50. Put another way, the impact of the program for the Enroll-if-offered is $50, but when this impact is spread across the entire group offered treatment, the average effect is watered down by the percent that was noncompliant with the original randomized assignment.

• ATT = $40 / 0.8 = $50
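The same calculation as a short sketch, using the figures from the example ($110, $70 and an 80 percent Enroll-if-offered share). The function and argument names are hypothetical, and the division is only valid under the assumption made above that the Enroll-if-offered group is the only one whose enrollment depends on assignment.

```python
def itt_and_tot(mean_assigned_treatment, mean_assigned_comparison, share_enroll_if_offered):
    """Return (ITT, TOT) from group means and the Enroll-if-offered share."""
    if not 0.0 < share_enroll_if_offered <= 1.0:
        raise ValueError("share_enroll_if_offered must be in (0, 1]")
    # ITT: straight comparison of the groups as originally assigned.
    itt = mean_assigned_treatment - mean_assigned_comparison
    # TOT: scale the ITT up by the share of units that respond to assignment.
    tot = itt / share_enroll_if_offered
    return itt, tot


itt, tot = itt_and_tot(
    mean_assigned_treatment=110.0,
    mean_assigned_comparison=70.0,
    share_enroll_if_offered=0.8,
)
print("ITT:", itt)
print("TOT:", round(tot, 2))
```

With full compliance (a share of 1.0) the two estimates coincide; the lower the share, the more the ITT understates the impact on those who actually enrolled.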
