
Randomized controlled trials and the evaluation of development programs

Chris Elbers
VU University and AIID

11 November 2015

• Joint work with Jan Willem Gunning
• Ideas developed when evaluating Dutch development programs (commissioned work)
• Related work by White (2006), Elbers, Gunning and de Hoop (WD 2009), De Janvry, Finan and Sadoulet (REStat 2012)

RCTs under fire

• Great successes trigger criticism
• RCTs’ claim to Gold Standard status has been attacked more or less aggressively
  – External validity questioned
  – Black box approach not scientific
  – Cannot answer ‘big questions’ (e.g. on economic development)
  – “experiments have no special ability to produce more credible knowledge than other methods” (Deaton, 2010, JEL)

Practical considerations

• External validity not an issue if the goal is to evaluate a particular project

• RCTs are great for providing proof of concept

But…

• Actual programs not always amenable to evaluation by RCT
  – Of course, the program could be changed…

• Salvage old-fashioned regression using observational (i.e. non-experimental) data?

Outline

• Internal validity of RCTs not automatic
• Programs vs projects
• What do we want to estimate when evaluating a program?
  – The total program effect
• Application to health insurance in Vietnam
• Conclusion

When internal validity of RCTs could fail

• Example: program implemented at arm’s length:
  – Program officers (POs) select (intended) participants based on information specific to them
• Evaluation of the program must follow this design
  – Direct random assignment of (intended) treatment to ultimate participants misses the effect of POs’ selection activity → internal validity violated
  – Randomization must be over POs …
  – … killing statistical power
    • Precision is of the order of the number of POs in the sample
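The loss of power from randomizing over POs can be illustrated with the standard design-effect formula for cluster randomization. This is only a sketch under stated assumptions: equal numbers of participants per PO, half of the POs in each arm, and an intracluster correlation `icc` between participants of the same PO. The function name and all numbers are hypothetical and do not come from the slides.

```python
import math

def se_difference_in_means(n_clusters, cluster_size, icc, sigma=1.0):
    """Approximate standard error of a treated-minus-control difference in means
    when whole clusters (here: program officers) are randomized, half to each arm.

    Uses the standard design effect 1 + (m - 1) * icc for equal cluster sizes m.
    """
    n_total = n_clusters * cluster_size
    design_effect = 1.0 + (cluster_size - 1) * icc
    return math.sqrt(4.0 * sigma**2 * design_effect / n_total)

# Hypothetical numbers: 20 POs, outcome s.d. of 1, intracluster correlation 0.10.
for m in (10, 100, 1000):
    print(m, round(se_difference_in_means(20, m, icc=0.10), 3))

# Limit of the standard error as the number of participants per PO grows:
# it depends only on the number of POs, not on the number of individuals.
floor = math.sqrt(4.0 * 0.10 / 20)
print("floor:", round(floor, 3))
```

Adding participants per PO barely moves the standard error once clusters are large, whereas adding POs does. Setting `icc=0` recovers the familiar individual-level case, in which precision is driven by the total number of sampled individuals.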

Regression alternative (simplest case)

• Take random sample from potential beneficiaries of program

• Observe (intended) treatment status T and outcome y

• Regress y on T
  – Regression coefficient on T is ATET (assuming absence of confounders)
  – ATET times treatment fraction is per capita effect of program (‘total program effect’)
  – Precision is of order of number of sampled individuals
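The simplest case can be checked on simulated data. The sketch below assumes a hypothetical data-generating process (not from the slides) in which individual effects `b` vary, treatment is more likely where the effect is large, and there are no confounders in the untreated outcome. It regresses y on T and multiplies the coefficient by the treated fraction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical data-generating process (an assumption for illustration only):
# individual effects b vary, and treatment is more likely where b is large.
b = rng.normal(1.0, 0.5, size=n)
p_treat = 1.0 / (1.0 + np.exp(-(b - 1.0) * 2.0))
T = (rng.random(n) < p_treat).astype(float)
y = 2.0 + b * T + rng.normal(0.0, 1.0, size=n)

# OLS of y on a constant and T; with a binary regressor the slope equals
# the difference in mean outcomes between treated and untreated.
X = np.column_stack([np.ones(n), T])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
atet_hat = coef[1]

treated_fraction = T.mean()
tpe_hat = atet_hat * treated_fraction   # per capita effect of the program
tpe_true = np.mean(b * T)               # known only because the data are simulated

print("ATET estimate:", atet_hat)
print("TPE estimate:", tpe_hat, "simulated truth:", tpe_true)
```

Because treatment is selective, the coefficient on T estimates the effect on the treated rather than the population-average effect of 1.0; multiplying by the treated fraction turns it into a per capita figure.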

Projects and programs

• RCTs best suited for evaluating simple interventions in homogeneous group of subjects with strong supervision of implementation

• Real-life interventions are messier
  – They are a change of already existing policies
  – They implement different policies simultaneously, with different intensity, involving officers with varying degrees of enthusiasm, …
  – Selective participation is part and parcel of a typical program
• Should we not also try to quantify the impact of such programs? Can we?

A regression approach for evaluation of programs

• Consider a model with
  – (at least) two observations, t=0 and t=1
  – Random sample of beneficiaries i
  – A bundle of policy variables and a vector of impact coefficients
• We want to link a change of the outcome variable over time to a change in policy
• The contribution of the policy change to the change in outcome per beneficiary in the population is the total program effect:
  TPE = average over beneficiaries of (impact coefficients × change in policy variables)
• Policy mix must be observed at observation unit level
  – Intervention histories
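One way to write the model and the TPE down is sketched here. All symbols are notation assumed for this sketch rather than taken from the slides: $y_{it}$ is the outcome, $P_{it}$ the bundle of policy variables, $X_{it}$ the controls, $\beta_i$ the beneficiary-specific impact coefficients, $\eta_i$ a fixed effect and $\varepsilon_{it}$ an error term.

```latex
% Assumed notation; a sketch consistent with the bullets above.
\begin{align}
y_{it} &= \beta_i' P_{it} + \gamma' X_{it} + \eta_i + \varepsilon_{it},
          \qquad t = 0, 1, \\
\Delta y_i &= \beta_i' \Delta P_i + \gamma' \Delta X_i + \Delta\varepsilon_i, \\
\mathrm{TPE} &= \mathrm{E}\left[\beta_i' \Delta P_i\right].
\end{align}
```

With a single binary treatment $T_i$ and nobody treated at $t=0$, this reduces to $\mathrm{E}[\beta_i T_i] = \mathrm{E}[\beta_i \mid T_i = 1]\,\Pr(T_i = 1)$, that is, ATET times the treated fraction, as in the simplest regression case.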

The total program effect

TPE = average over beneficiaries of (impact coefficients × change in policy variables)

• Different combinations of interventions affect different individuals
• Allow for selectivity: differences in policies will be correlated to impact parameters (e.g. POs selecting participants based on perceived likelihood of success)
  – Selectivity compromises simple estimation of impact coefficients even if the error term is independent of the policy variables and the controls (usual assumptions needed for regressions)
  – This problem of treatment heterogeneity can be tackled by modeling the impact coefficients as a function of the policy variables and the controls

Formally:
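A possible formalization, given as a sketch with assumed notation: $\Delta P_i$ is the change in a scalar policy variable for unit $i$, $X_i$ its observed characteristics, $\Delta X_i$ the change in controls, $\beta_i$ the unit-specific impact coefficient and $u_i$ an error term. The linear form of the conditional expectation is an approximation chosen for illustration, not a result stated on the slides.

```latex
% Assumed notation and a linear approximation; scalar policy variable for simplicity.
\begin{align}
\mathrm{E}\left[\beta_i \mid \Delta P_i, X_i\right]
  &\approx b_0 + b_1 \Delta P_i + b_2' X_i, \\
\Delta y_i
  &= b_0 \Delta P_i + b_1 (\Delta P_i)^2 + b_2' X_i \, \Delta P_i
     + \gamma' \Delta X_i + u_i, \\
\widehat{\mathrm{TPE}}
  &= \frac{1}{N} \sum_{i=1}^{N}
     \left(\hat b_0 + \hat b_1 \Delta P_i + \hat b_2' X_i\right) \Delta P_i .
\end{align}
```

Under this approximation the heterogeneity enters the regression as interaction terms between the policy variable and the observed characteristics, and the TPE is the sample average of the fitted unit-specific effect times the policy change.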

Example: health insurance in Vietnam

• Using data from a study by Wagstaff and Pradhan (WB, 2005)
• Health insurance introduced in 1990s
• Wagstaff and Pradhan try to avoid bias from treatment heterogeneity by matching insured and uninsured households on the likelihood of being insured (propensity score matching)
• This technique not suitable for TPE
  – Sample with matched T/C individuals is no longer representative of population

Table 1: Data for the Vietnam Insurance Example

Variable: change in (average)                              Mean     Std. Dev.   Min        Max
Arm circumference (cm)                                     1.154    2.013       -7.3       9.4
Height (cm)                                                5.175    11.35       -49.57     39.84
Body weight (kg)                                           2.983    6.544       -27.75     26.25
Health expenditure (‘000 Dong)                             1,081    5,519       -8,808     233,965
Total consumption expenditure (‘000 Dong)                  6,513    8,009       -22,988    116,826
Insurance (binary at individual level)                     0.170    0.268       0          1
School attended                                            -0.017   0.683       -3.5       3
Currently attending school (binary at individual level)    0.082    0.388       -2         2
Gender                                                     0.002    0.138       -0.75      1
Age                                                        3.522    8.299       -48.43     48.6
Farm dummy                                                 -0.079   0.421       -1         1
Household size                                             -0.267   1.696       -18        11

The number of observations varies between 4299 and 4305.
Source: authors’ calculations using the Vietnam Living Standard Surveys 1992-3, 1997-8.

Estimation of TPE

• Program variable is fraction of insured household members
  – No ITT approach: (self) selection is part of an insurance program
• Unit of analysis is household
  – Individual outcomes averaged per household
• TPE estimated ‘naively’
  – Assuming impact coefficients constant across households i
• TPE estimated as proposed above, allowing impact coefficients to be household specific
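The two estimation strategies can be compared on simulated data. The sketch below is not the authors' code and does not use the Vietnam data: it assumes a hypothetical household characteristic `z`, a household-specific impact coefficient that is linear in `z`, and program uptake `dP` (the change in the insured fraction) that rises with `z`.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

z = rng.normal(size=n)                       # household characteristic (hypothetical)
dX = rng.normal(size=n)                      # change in a control variable
# Selectivity: households with larger z take up more of the program.
dP = 1.0 / (1.0 + np.exp(-(z + rng.normal(size=n) - 1.5)))
beta_i = 1.0 + 1.0 * z                       # household-specific impact coefficient
dy = beta_i * dP + 0.5 * dX + rng.normal(0.0, 0.5, size=n)

def ols(columns, target):
    """Least-squares coefficients for target on the given columns."""
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

const = np.ones(n)

# (I) Naive: one impact coefficient for every household.
c_naive = ols([const, dP, dX], dy)
tpe_naive = c_naive[1] * dP.mean()

# (II) Heterogeneous: impact coefficient linear in z, hence an interaction term.
c_het = ols([const, dP, dP * z, dX], dy)
tpe_het = np.mean((c_het[1] + c_het[2] * z) * dP)

tpe_true = np.mean(beta_i * dP)              # known only because the data are simulated
print("naive:", tpe_naive, "heterogeneous:", tpe_het, "truth:", tpe_true)
```

In this setup specification (II) matches the simulated process, so its estimate should land close to the simulated truth, while the naive estimate can differ because uptake and impact are correlated. How large that gap is in real data depends on how well the interactions capture the actual selection.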

Table 2: Total Program Effects

                                 Naïve program      Total program       R-squared of
                                 effect† (I)        effect†† (II)       underlying regressions
Dependent variable               (s.e.)             (s.e.)              I       II       Remarks
Arm circumference                0.022 (0.029)      0.090*** (0.027)    0.22    0.23
Height                           -0.190 (0.154)     0.095 (0.139)       0.34    0.36
Body weight                      0.167* (0.083)     0.384*** (0.074)    0.31    0.33
Health expenditure               -28.08 (60.59)     -52.79 (51.01)      0.03    0.04     Total consumption included in controls
Health expenditure               55.41 (66.42)      64.32 (52.87)       0.00    0.00     Total consumption expenditure not included
Total consumption expenditure    626.7*** (110.9)   888.8*** (105.7)    0.10    0.12     Total consumption expenditure not included

Conclusions

• RCTs ill suited for evaluation of ‘programs’
• Programs involve strategic participation by potential participants and implementers
  – Evaluation must take that into account
• Regression techniques combined with proper sampling can identify combined impact of program elements
  – Under nontrivial assumptions
• Simplest approximation of treatment heterogeneity suggests extensive use of interactions
