Emergent U.S. Design and Analysis Strategies for Learning More from Social
Experiments – with Development Applications (PART 1)
Stephen H. Bell, Ph.D. Abt Associates
September 1, 2014
The Three Biggest RCT Challenges in the U.S.
• Making randomized exclusions acceptable
– for internal validity
• Doing so under characteristic circumstances
– for external validity
• Finding ways to have experimental evidence guide program improvements
– for policy relevance
• Only time for two: the first and the last
– Furthest along in the States
– For external validity, see, e.g., Klerman, 2014; Olsen et al., 2013
Bell & Bradley, APPAM Presentation, Nov 6 - 8 , 2008 2
Outline of Workshop (pre-break: 14.00 – 15.30)
Making random assignment acceptable
• Constituencies and their concerns
• Estimating long-run impacts after the control group
receives the intervention
• Building local agency priorities into the random
assignment design
Questions
Discussion: possible development applications
Outline of Workshop (post-break: 16.00 – 17.30)
Learning “what works” to guide program improvements
• Case study: Training for health care occupations
– Learning which local program models work best
• Case study: The role of quality in early childhood programs
– Learning what in-program experiences help individuals most
Questions
Discussion: possible development applications
Making Program Exclusions for the Control Group Palatable
• Goal = be able to do RCTs more often
• Caveat = do so without sacrificing scientific integrity
• Constituencies
– Political / community leaders
– Implementation agencies
– Target population for intervention (major issue in U.S.; not apparent as a challenge in the developing world, so will not address)
For Political / Community Leaders
Graduated phase-in of intervention (“step wedge” design)
• Fits resource and implementation capacity circumstances
• Lottery = fairest way to determine “who goes first”
• No “losers” in long-run
What about evidence of long-run impacts / sustainability?
Use recursive estimation (Bell & Bradley, 2013)
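The lottery that decides “who goes first” in a graduated phase-in might be sketched in Python as follows. This is an illustration only: the helper name `assign_phase_in_waves`, the village labels, and the three-wave schedule are assumptions, not details from the presentation.

```python
import random

def assign_phase_in_waves(units, n_waves, seed=None):
    """Randomly order units and deal them into phase-in waves of near-equal size.

    Hypothetical helper: every unit eventually receives the intervention;
    the lottery only decides which wave (1 = first) each unit joins.
    """
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    # Deal units round-robin so wave sizes differ by at most one.
    return {unit: (i % n_waves) + 1 for i, unit in enumerate(shuffled)}

# Illustrative roster: nine villages phased in over three waves.
villages = [f"village_{k}" for k in range(1, 10)]
waves = assign_phase_in_waves(villages, n_waves=3, seed=2014)
```

Units in later waves serve as controls for earlier waves until their own start date, which is why there are no “losers” in the long run.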
For Implementation Agencies
• Agencies care about
– How long exclusions will last
– How many cases go into the C group
– Which cases go in
• How long = “step wedge” design
• How many = uneven random assignment ratio
• Which ones = “Wild card” exemptions
“Agency-Preferred Random Assignment”
Estimation of Long-Run Impacts after the Control Group Receives the Intervention
OUTLINE:
• Motivation of problem
• Method for estimating impacts in RCTs after the control
group receives the intervention
• The sole assumption behind the method . . . and the
conditions under which it is fulfilled
• Designing RCTs to test the assumption
• Making future satisfaction of the assumption more likely
Context of the Problem
• For equity or to obtain leader & implementation agency cooperation
– Guarantee all communities or families a new intervention within a discrete time period (e.g., one year)
• Once included, the control group can no longer provide the no-intervention counterfactual with which to estimate impacts experimentally
• Need to know longer-term impacts to judge success / test sustainability
• Example: Aflasafe pilot in Nigeria
– Rolling out to all maize-producing villages within 3 years
Getting Past the End of the Control Group: Bad and Better Options
• Studies providing lagged intervention to control group
– Most stop reporting impacts, so there are no long-run findings
– Those that don’t stop typically do pre/post or interrupted time series (ITS) analysis to go further
• Biased if . . .
– Exogenous trend shift concurrent with program start
– Different trend shift (if have comparison group for ITS)
• What to do instead?
Take continued advantage of experimental design . . .
Desired Time 2 Comparison
[Figure: outcome levels (vertical axis, 0 to 9) for the Treatment and Control groups at Baseline, Time 1, and Time 2]
Observed Time 2 Comparison
[Figure: outcome levels (vertical axis, 0 to 9) for the Treatment and Control groups at Baseline, Time 1, and Time 2]
Recursive Method for Estimating Time 2 Impact
[Figure: outcome levels (vertical axis, 0 to 9) for the Treatment, Lag Control, and Control series at Baseline, Time 1, and Time 2]
Computation
• Subtract $I_1 = Y_1^T - Y_1^C$ from $Y_2^C$ to get $Y_2^{C*}$
• Estimate $I_2 = Y_2^T - Y_2^{C*} = (Y_2^T - Y_2^C) + (Y_1^T - Y_1^C)$
• Bell & Bradley (2013) provide
– Standard error formula
– Recursive extension to impacts in third period & beyond . . .
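The computation above can be sketched in Python. This is a minimal illustration, not the implementation in Bell & Bradley (2013): the function name `recursive_impacts` and the outcome means are hypothetical, the extension beyond period 2 is one reading of “recursive extension” that has not been checked against the paper, and the standard error formula is not reproduced.

```python
def recursive_impacts(treat_means, lagged_control_means):
    """Recursive impact estimates, one per follow-up period.

    treat_means[t] and lagged_control_means[t] are mean outcomes in period t + 1.
    Assumes the control group starts the intervention exactly one period after
    the treatment group, and that impacts by year-of-exposure are the same
    for both groups (the method's key assumption).
    """
    impacts = []
    for t, (y_t, y_c) in enumerate(zip(treat_means, lagged_control_means)):
        # Period 1: the control group is untreated, so use the plain T - C gap.
        # Later periods: add back the impact the control group is assumed to
        # be receiving, i.e. the treatment group's impact one period earlier.
        prior = impacts[t - 1] if t > 0 else 0.0
        impacts.append((y_t - y_c) + prior)
    return impacts

# Hypothetical means: I1 = 6.0 - 4.0 = 2.0; I2 = (7.0 - 6.5) + 2.0 = 2.5
estimates = recursive_impacts([6.0, 7.0], [4.0, 6.5])
```

Each later estimate inherits the sampling error of all earlier ones, so precision falls as the recursion extends further.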
Sufficient Condition for Unbiased Estimation
Impact on T group in its initial year of intervention
= Impact on C group in its initial year of intervention
Things that affect impacts must stay constant over time;
Things that affect outcomes can change!
Next . . .
• Identify conditions under which the assumption of constancy holds
• Discuss how one might test the conditions
• Suggest how those conditions might be made more likely
Intervention-Related Factors that Need to Remain Stable over Time
Sponsor’s guidelines for intervention’s design and
implementation
– Eligibility guidelines / intake process
– Design of services / service delivery guidelines
Central implementation agency’s desire and evolving ability
to support the intervention
– OK if local partner agencies’ desires and abilities differ
between the two years, if random (i.e., not a time trend)
Consequences of Delay that Must Not Happen (Part I)
Different types of cases choose to participate in the
intervention in the C group than in the T group because . . .
– Learn the study’s early findings (unlikely)
– Economy shifts over time
– Other programs become available
• Don’t need 100% participation for either T or C group
• Don’t even need the same % participation
– can do separate Bloom no-show adjustment for each group
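One way to read “separate Bloom no-show adjustment for each group” is sketched below. It uses the standard Bloom adjustment (impact per participant = intent-to-treat impact divided by the participation rate, assuming no-shows experience no impact). The rescaling step and both function names are assumptions made for illustration, not the presenter's stated procedure.

```python
def bloom_adjust(itt_impact, participation_rate):
    """Impact per participant under the Bloom no-show adjustment.

    Assumes non-participants experience zero impact.
    """
    if not 0 < participation_rate <= 1:
        raise ValueError("participation_rate must be in (0, 1]")
    return itt_impact / participation_rate

def control_first_year_itt(itt_t, rate_t, rate_c):
    """Hypothetical rescaling: the first-year impact to net out of the
    C group's outcome when its participation rate differs from the T group's."""
    per_participant = bloom_adjust(itt_t, rate_t)
    return per_participant * rate_c

# Illustrative numbers: T-group ITT impact of 2.0 with 80% participation,
# C group later participating at 60%.
embedded = control_first_year_itt(itt_t=2.0, rate_t=0.8, rate_c=0.6)
```

Under this reading, what must stay constant is the impact per participant, not the participation rate itself.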
Consequences of Delay that Must Not Happen (Part II)
Local implementation agencies systematically invest less
effort in the intervention in the later C-group year than did
the same or different local agencies in the T-group year
For example, one-year delay in opportunity to launch
intervention may
– Reduce enthusiasm
– Involve the agency in other new initiatives
An Experimental Test of the Method
• 3-way random assignment to:
– Immediate treatment (T)
– Lagged treatment (L)
– Permanent control (C)
• Use T and L to compute the recursive estimate
• Compare to purely experimental long-run result (T vs. C)
• At least 3 such tests exist in the U.S. – by happenstance
– all are too small to be informative
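The comparison in such a three-arm test might be computed as below. This is a sketch with hypothetical names and made-up group means. A real test would also need standard errors for the gap, which are not shown.

```python
def recursive_vs_benchmark(y1_t, y2_t, y1_l, y2_l, y2_c):
    """Compare the recursive Time 2 estimate (T and L arms) with the
    purely experimental benchmark (T vs. permanent C).

    y1_*, y2_* are group mean outcomes at Time 1 and Time 2.
    """
    # Recursive estimate: Time 2 T-L gap plus the Time 1 impact, when L was untreated.
    recursive = (y2_t - y2_l) + (y1_t - y1_l)
    # Benchmark: C never receives the intervention, so T - C is experimental.
    benchmark = y2_t - y2_c
    return recursive, benchmark, recursive - benchmark

# Made-up means for illustration only.
recursive, benchmark, gap = recursive_vs_benchmark(
    y1_t=6.0, y2_t=7.0, y1_l=4.0, y2_l=6.5, y2_c=4.5
)
```

A gap near zero, relative to its sampling error, would support the constancy assumption. A large gap would signal that first-year impacts differed between the T and L arms.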
Making Satisfaction of Unbiasedness Conditions More Likely
• Lock in sponsor/developer commitment to keep the
intervention unchanged over time
• Maximize all local implementation agencies’ up-front
commitment to fully implement with fidelity, regardless of
timing
• Minimize circulation of early study results
• Shorten the lag before C group implementation? (fewer
things change)
• Lengthen the lag? (fewer years of assumptions)
Known Applications and Extensions
Two known uses of the recursive method in K-12 education
research in the U.S.
• PCI Reading Evaluation
– No significant impact in Year 2 using QED method
– Significant impact in Year 2 with recursive approach
• Alabama Mathematics, Science, and Technology Initiative
(AMSTI) Evaluation
– Significant impact in Year 2 with recursive approach
– Also applied, through two iterations, to Year 3
Making Intervention Exclusions Palatable to Implementation Agencies
• Many experiments seek to randomize
– Facilities (e.g., health care clinics)
– Workers (e.g., farmers)
– Target clients (e.g., poor children)
• Consider cases where randomization will be carried out all
within one organization
– Organization usually has preferences for inclusions and
exclusions
– Always prefers fewer and shorter exclusions
Concerns about Control Group Exclusions for Implementing Agencies
• In other words, regarding control group members,
agencies care about
– How long the exclusion lasts
– How many cases are excluded
– Which cases are excluded
• How long = limit the embargo & use recursive estimation
• How many = use an uneven random assignment ratio
• Which ones = “wild card” exemptions
“agency-preferred random assignment”
Options for Addressing Agency Concerns about “How Many” and “Which Ones”
• Allow more than half to participate (immediately)
– Tilt the random assignment ratio away from 50:50, toward
the treatment group [does not create mismatch or bias]
• Allow “wild card” exemptions from RA for a few cases
– Automatically “in”, no questions asked
– Excluded from the research
• Higher odds of inclusion for “preferred” cases
– Above 50:50 for those agency most wants included
– Below 50:50 for others
Tilting the Random Assignment Ratio
Ability to detect smaller impacts deteriorates slowly as
treatment group share of sample goes up . . .
Treatment Group Share | Random Assignment Ratio (T:C) | Minimum Detectable Impact
0.50 | 1:1 | 100 units
0.60 | 3:2 | 102
0.67 | 2:1 | 106
0.75 | 3:1 | 116
0.83 | 5:1 | 135
0.90 | 9:1 | 167
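The pattern in the table can be approximated with the textbook variance formula for a difference in means. The sketch below assumes a fixed total sample and equal outcome variance in both groups, under which the minimum detectable impact is proportional to 1 / sqrt(p(1 − p)) for treatment share p. The function name `relative_mdi` is hypothetical, and the slide does not state which formula was used. This version lands within about one unit of each table entry.

```python
import math

def relative_mdi(treatment_share, benchmark=100.0):
    """MDI relative to a 1:1 design scaled to `benchmark` units.

    Assumes fixed total N and equal outcome variance in T and C, so that
    the standard error of the impact is proportional to 1 / sqrt(p * (1 - p)).
    """
    p = treatment_share
    if not 0 < p < 1:
        raise ValueError("treatment_share must be strictly between 0 and 1")
    return benchmark * math.sqrt(0.25 / (p * (1 - p)))

for t, c in [(1, 1), (3, 2), (2, 1), (3, 1), (5, 1), (9, 1)]:
    share = t / (t + c)
    print(f"{t}:{c}  share={share:.2f}  relative MDI={relative_mdi(share):.1f}")
```

Because p(1 − p) is flat near p = 0.5, moderate tilts toward the treatment group cost little precision.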
“Wild Card” Exemptions from Random Assignment
Minor distortions (< 10%) unless a large impact ratio (3 to 1) is
combined with generous exemptions (> 1 in 20)
Exempted Share | Impact Ratios that Hold Distortion < 10% | Size of Distortion with 3 to 1 Ratio
1 in 5 | < 1.6 to 1 | 29%
1 in 10 | < 2.1 to 1 | 23%
1 in 15 | < 2.6 to 1 | 17%
1 in 20 | < 3.2 to 1 | 9%
1 in 33 | < 4.7 to 1 | 7%
1 in 50 | < 6.5 to 1 | 4%
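The slide does not define “distortion” or “impact ratio”, so the sketch below rests on an assumed definition. It treats the impact ratio r as the impact on exempted cases relative to the impact on randomized cases, and the distortion as the share of the all-cases average impact that the research sample misses. With exempted share e this gives e(r − 1) / (1 + e(r − 1)). Under that assumption the thresholds come close to the middle column of the table. It is not confirmed as the presenter's formula and does not reproduce every entry in the last column. Both function names are hypothetical.

```python
def wildcard_distortion(exempt_share, impact_ratio):
    """Relative shortfall of the research-sample impact versus the
    average impact over all served cases (assumed definition).

    impact_ratio: impact on exempted cases / impact on randomized cases.
    """
    extra = exempt_share * (impact_ratio - 1.0)
    return extra / (1.0 + extra)

def max_ratio_for_distortion(exempt_share, max_distortion=0.10):
    """Largest impact ratio that keeps the distortion below `max_distortion`,
    obtained by solving the expression above for impact_ratio."""
    return 1.0 + max_distortion / ((1.0 - max_distortion) * exempt_share)

for denom in (5, 10, 15, 20, 33, 50):
    e = 1.0 / denom
    print(f"1 in {denom}: ratio < {max_ratio_for_distortion(e):.2f} to 1, "
          f"distortion at 3 to 1 = {wildcard_distortion(e, 3.0):.1%}")
```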
“Agency-Preferred Random Assignment”
• Set the treatment group assignment probability higher for
preferred cases than other cases (Olsen et al., 2007)
Example: 2:1 (T vs. C) for preferred cases; 1:2 (T vs. C) for others
• If equal shares are “preferred” and “other,” results are
identical to a uniform 1:1 (T vs. C) ratio for both groups with respect to
– Total number randomized
– Share (50%) and number excluded as control group
– Expected value of impact estimate
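A minimal Python sketch of stratum-specific assignment probabilities follows, using the 2:1 and 1:2 example. The function names, the case identifiers, and the `preferred` flag are hypothetical and chosen for illustration.

```python
import random

def agency_preferred_assign(cases, p_preferred=2/3, p_other=1/3, seed=None):
    """Assign each case to T or C with a probability that depends on
    whether the agency flagged it as preferred.

    cases: iterable of (case_id, preferred) pairs, with preferred a bool.
    """
    rng = random.Random(seed)
    assignment = {}
    for case_id, preferred in cases:
        p_treat = p_preferred if preferred else p_other
        assignment[case_id] = "T" if rng.random() < p_treat else "C"
    return assignment

def expected_control_share(share_preferred, p_preferred=2/3, p_other=1/3):
    """Expected fraction of all randomized cases that land in the control group."""
    return (share_preferred * (1 - p_preferred)
            + (1 - share_preferred) * (1 - p_other))

# Illustrative roster: half the cases are agency-preferred.
roster = [(f"case_{k}", k % 2 == 0) for k in range(100)]
arms = agency_preferred_assign(roster, seed=7)
control_share = expected_control_share(0.5)  # equal shares give one half
```

With equal shares of preferred and other cases, the expected control share is one half, matching the uniform 1:1 design.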
How Findings Change with “Agency-Preferred Random Assignment”
• Minimum detectable impacts increase
– With uneven T:C ratios, MDIs rise above the illustrative
benchmark of 100 (reversing ratios for “preferred” vs.
other cases makes no difference to MDIs)
• For an ongoing program, able to calculate impacts
separately for
– “Usually included” group (the “agency preferred” cases),
whose finding cannot be distorted by added cases with
different impact magnitudes
– Cases that would be added by expansion (the “other”
cases), which are also important to policy
Minimum Detectable Impacts for “Agency-Preferred Random Assignment”
For “usually included” group, confining analysis to half the
sample escalates the penalty from an uneven RA ratio
T:C Ratios Used | Overall Minimum Detectable Impact | MDI for Usually-Included Group
1:1 and 1:1 | 100 | 141
3:2 and 2:3 | 102 | 144
2:1 and 1:2 | 106 | 149
3:1 and 1:3 | 116 | 164
5:1 and 1:5 | 135 | 190
9:1 and 1:9 | 167 | 235
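The extra penalty for the “usually included” group can be approximated by combining two standard factors: the uneven-ratio penalty, proportional to 1 / sqrt(p(1 − p)), and the smaller-sample penalty, proportional to 1 / sqrt(fraction of the sample analyzed). The sketch below assumes equal outcome variances and a subgroup that is exactly half the sample. The function name `subgroup_mdi` is hypothetical, and the slide does not state the formula used. The results fall within about one unit of the right-hand column.

```python
import math

def subgroup_mdi(t, c, sample_fraction=0.5, benchmark=100.0):
    """Approximate MDI for a subgroup randomized at ratio t:c.

    benchmark is the MDI of the full sample under 1:1 assignment.
    Assumes equal outcome variance in T and C.
    """
    p = t / (t + c)
    ratio_penalty = math.sqrt(0.25 / (p * (1 - p)))  # uneven T:C split
    size_penalty = math.sqrt(1.0 / sample_fraction)  # smaller analysis sample
    return benchmark * ratio_penalty * size_penalty

for t, c in [(1, 1), (3, 2), (2, 1), (3, 1), (5, 1), (9, 1)]:
    print(f"{t}:{c}  usually-included MDI={subgroup_mdi(t, c):.1f}")
```

Halving the sample multiplies every MDI by about 1.41, so the cost of an uneven ratio grows in absolute terms for the subgroup.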