Evaluating Public Programs with Close Substitutes:
The Case of Head Start
Patrick Kline
UC Berkeley/NBER
Christopher Walters∗
UC Berkeley/NBER
April 2016
Abstract
This paper empirically evaluates the cost-effectiveness of Head Start, the largest early-
childhood education program in the United States. Using data from the Head Start Impact
Study (HSIS), we show that Head Start draws roughly a third of its participants from compet-
ing preschool programs, many of which receive public funds. Accounting for the public savings
associated with reduced enrollment in other subsidized preschools substantially increases esti-
mates of the program’s rate of return. To parse Head Start’s test score impacts relative to home
care and competing preschools, we selection correct test scores in each care environment using
excluded interactions between experimental offer status and household characteristics. We find
that Head Start’s effects are greater for children who would not otherwise attend preschool and
for children who are less likely to participate in the program.
Keywords: Program Evaluation, Head Start, Early-Childhood Education, Cost-Benefit Analy-
sis, Substitution Bias
∗We thank Danny Yagan, James Heckman, Nathan Hendren, Magne Mogstad, Jesse Rothstein, Melissa Tartari, and seminar participants at UC Berkeley, the University of Chicago, Arizona State University, Harvard University, Stanford University, the NBER Public Economics Spring 2015 Meetings, Columbia University, UC San Diego, Princeton University, the NBER Labor Studies Summer Institute 2015 Meetings, Uppsala University, MIT, and Vanderbilt University for helpful comments. Raffaele Saggio provided outstanding research assistance. We also thank Research Connections for providing the data. Generous funding support for this project was provided by the Berkeley Center for Equitable Growth.
I. Introduction
Many government programs provide services that can be obtained, in roughly comparable form, via
markets or through other public organizations. The presence of close program substitutes compli-
cates the task of program evaluation by generating ambiguity regarding which causal estimands are
of interest. Standard intent-to-treat impacts from experimental demonstrations can yield unduly
negative assessments of program effectiveness if most participants would receive similar services
in the absence of an intervention (Heckman et al. 2000). On the other hand, experiments that
artificially restrict substitution alternatives may yield impacts that are not representative of the
costs and benefits of actual policy changes.
This paper assesses the cost-effectiveness of Head Start – a prominent public education program
for which close public and private substitutes are widely available. Head Start is the largest early-
childhood education program in the United States. Launched in 1965 as part of President Lyndon
Johnson’s war on poverty, the program has evolved from an eight-week summer program into a
year-round program that offers education, health, and nutrition services to disadvantaged children
and their families. By 2013, Head Start enrolled about 900,000 3- and 4-year-old children at a cost
of $7.6 billion (US DHHS 2013).
Views on the effectiveness of Head Start vary widely (Ludwig and Phillips [2007] and Gibbs,
Ludwig and Miller [2011] provide reviews). A number of observational studies find substantial
short- and long-run impacts on test scores and other outcomes (Currie and Thomas 1995; Garces,
Thomas and Currie 2002; Ludwig and Miller 2007; Deming 2009; Carneiro and Ginja 2014). By
contrast, a recent randomized evaluation – the Head Start Impact Study (HSIS) – finds small
impacts on test scores that fade out quickly (Puma et al. 2010; Puma, Bell and Heid 2012). These
results have generally been interpreted as evidence that Head Start is ineffective and in need of
reform (Barnett 2011; Klein 2011).
Two observations suggest such conclusions are premature. First, research on early childhood
interventions finds long run gains in adult outcomes despite short run fadeout of test score impacts
(Heckman et al. 2010; Heckman, Pinto and Savelyev 2013; Chetty et al. 2011; Chetty, Friedman
and Rockoff 2014b). Second, roughly one-third of the HSIS control group participated in alternate
forms of preschool. This suggests that the HSIS may have shifted many students between different
sorts of preschools without altering their exposure to preschool services. The aim of this paper is
to clarify how the presence of substitute preschools affects the interpretation of the HSIS results
and the cost-effectiveness of the Head Start program.
Our study begins by revisiting the experimental impacts of the HSIS on student test scores.
We replicate the fade-out pattern found in previous work but find that adjusting for experimental
non-compliance leads to imprecise estimates of the effect of Head Start participation beyond the
first year of the experiment. As a result, the conclusion of complete effect fadeout is less clear than
naive intent-to-treat estimates suggest. Turning to substitution patterns, we find that roughly one
third of Head Start compliers in the HSIS experiment would have participated in other forms of
preschool had they not been lotteried into the program. These alternative preschools draw heavily
on public funding, which mitigates the net costs to government of shifting children from other
preschools into Head Start.
These facts motivate a theoretical analysis clarifying which parameters are (and are not) pol-
icy relevant when publicly subsidized program substitutes are present. We work with a stylized
model where test score impacts are valued according to their effects on children’s after-tax lifetime
earnings. We show that, when competing preschool programs are not rationed, the policy-relevant
causal parameter governing the benefits of Head Start expansion is an average effect of Head Start
participation relative to the next best alternative, regardless of whether that alternative is a com-
peting program or home care. This parameter coincides with the local average treatment effect
(LATE) identified by a randomized experiment with imperfect compliance when the experiment
contains a representative sample of program “compliers” (Angrist, Imbens and Rubin 1996). Hence,
imperfect compliance and program substitution, often thought to be confounding limitations of so-
cial experiments, turn out to be virtues when the substitution patterns in the experiment replicate
those found in the broader population.
We use this result to derive an estimable benefit cost ratio associated with Head Start expan-
sions. This ratio scales Head Start’s projected impacts on the after-tax earnings of children by
its net costs to government inclusive of fiscal externalities. Chief among these externalities is the
cost savings that arise when Head Start draws children away from competing subsidized preschool
programs. While such effects are typically ignored in cost-benefit analyses of Head Start and other
similar programs (e.g., CEA 2015), we find via a calibration exercise that such omissions can be
quantitatively important: Head Start roughly breaks even when the cost savings associated with
program substitution are ignored, but yields benefits nearly twice as large as costs when these sav-
ings are incorporated. This appears to be a robust finding – after accounting for fiscal externalities,
Head Start’s benefits exceed its costs whenever short run test score impacts yield earnings gains
within the range found in the recent literature.
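The mechanics of this adjustment can be sketched with purely hypothetical per-child figures (the paper's actual calibration inputs appear in Section VI); the point is only that subtracting substitution savings from gross cost mechanically raises the ratio:

```python
def benefit_cost_ratio(earnings_gain, gross_cost, substitution_savings=0.0):
    """Projected after-tax earnings gains divided by net cost to government."""
    return earnings_gain / (gross_cost - substitution_savings)

# Hypothetical per-child figures, for illustration only:
earnings_gain = 8000.0   # present value of projected earnings impact
gross_cost = 8000.0      # gross cost of a Head Start slot
# Suppose a third of compliers vacate a subsidized slot costing half as much.
savings = (1 / 3) * 0.5 * gross_cost

naive = benefit_cost_ratio(earnings_gain, gross_cost)             # ignores savings
adjusted = benefit_cost_ratio(earnings_gain, gross_cost, savings) # includes savings
assert adjusted > naive
```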
A limitation of our baseline analysis is that it assumes changes in program scale do not alter
the mix of program compliers. To address this issue, we also consider “structural reforms” to
Head Start that change the mix of compliers without affecting test score outcomes. Examples
of such reforms might include increased transportation services, marketing efforts, or spending on
program features that parents value. Households who respond to structural reforms may differ from
experimental compliers on unobserved dimensions, including their mix of counterfactual program
choices. Assessing these reforms therefore requires knowledge of parameters not directly identified
by the HSIS experiment. Specifically, we show that such reforms require identification of a variant
of the marginal treatment effect (MTE) concept of Heckman and Vytlacil (1999).
To assess reforms that attract new children, we develop a selection model that parameterizes
variation in treatment effects with respect to counterfactual care alternatives as well as observed
and unobserved child characteristics. We prove that the model parameters are identified, and
propose a two-step control function estimator that exploits heterogeneity in the response to Head
Start offers across sites and demographic groups to infer relationships between unobserved factors
driving preschool enrollment and potential outcomes. The estimator is shown to pass a variety of
specification tests and to accurately reproduce patterns of treatment effect heterogeneity found in
the experiment. The model estimates indicate that Head Start has large positive short run effects
on the test scores of children who would have otherwise been cared for at home, and insignificant
effects on children who would otherwise attend other preschools – a finding corroborated by Feller et
al. (forthcoming), who reach similar conclusions using principal stratification methods (Frangakis
and Rubin 2002). Our estimates also reveal a “reverse Roy” pattern of selection whereby children
with unobserved characteristics that make them less likely to enroll in Head Start experience larger
test score gains.
We conclude with an assessment of prospects for increasing Head Start’s rate of return via
outreach to new populations. Our estimates suggest that expansions of Head Start could boost
the program’s rate of return provided that the proposed technology for increasing enrollment (e.g.
improved transportation services) is not too costly. We also use our estimated selection model
to examine the robustness of our results to rationing of competing preschools. Rationing implies
that competing subsidized preschools do not contract when Head Start expands, which shuts down
a form of public savings. On the other hand, expanding Head Start generates opportunities for
new children to fill vacated seats in substitute programs. Our estimates indicate that the effect on
test scores (and therefore earnings) of moving children from home care to competing preschools is
substantial, leading us to conclude that rationing is unlikely to undermine the favorable estimated
rates of return found in our baseline analysis.
The rest of the paper is structured as follows. Section II provides background on Head Start.
Section III describes the HSIS data and basic experimental impacts. Section IV presents evidence on
substitution patterns. Section V introduces a theoretical framework for assessing public programs
with close substitutes. Section VI provides a cost-benefit analysis of Head Start. Section VII
develops our econometric selection model and discusses identification and estimation. Section VIII
reports estimates of the model. Section IX simulates the effects of structural program reforms.
Section X concludes.
II. Background on Head Start
Head Start provides preschool for disadvantaged children in the United States. The program is
funded by federal grants awarded to local public or private organizations. Grantees are required
to match at least 20 percent of their Head Start awards from other sources and must meet a set of
program-wide performance criteria. Eligibility for Head Start is generally limited to children from
households below the federal poverty line, though families above this threshold may be eligible if
they meet other criteria such as participation in the Temporary Assistance for Needy Families (TANF)
program. Up to 10 percent of a Head Start center’s enrollment can also come from higher-income
families. The program is free: Head Start grantees are prohibited from charging families fees for
services (US DHHS 2014). It is also oversubscribed: in 2002, 85 percent of Head Start participants
attended programs with more applicants than available seats (Puma et al. 2010).
Head Start is not the only form of subsidized preschool available to poor families. Preschool par-
ticipation rates for disadvantaged children have risen over time as cities and states expanded their
public preschool offerings (Cascio and Schanzenbach 2013). Moreover, the Child Care Development
Fund program provides block grants that finance childcare subsidies for low-income families, often in
the form of childcare vouchers that can be used for center-based preschool (US DHHS 2012). Most
states also use TANF funds to finance additional childcare subsidies (Schumacher, Greenberg and
Duffy 2001). Because Head Start services are provided by local organizations who themselves must
raise outside funds, it is unclear to what extent Head Start and other public preschool programs
actually differ in their education technology.
A large non-experimental literature suggests that Head Start produced large short- and long-
run benefits for early cohorts of program participants. Several studies estimate the effects of Head
Start by comparing program participants to their non-participant siblings (Currie and Thomas
1995; Garces, Thomas and Currie 2002; Deming 2009). Results from this research design show
positive short run effects on test scores and long run effects on educational attainment, earnings
and crime. Other studies exploit discontinuities in Head Start program rules to infer program effects
(Ludwig and Miller 2007; Carneiro and Ginja 2014). These studies show longer run improvements
in health outcomes and criminal activity.
In contrast to these non-experimental estimates, results from a recent randomized controlled
trial reveal smaller, less-persistent effects. The 1998 Head Start reauthorization bill included a
congressional mandate to determine the effects of the program. This mandate resulted in the HSIS:
an experiment in which more than 4,000 applicants were randomly assigned via lottery
to either a treatment group with access to Head Start or a control group without access in the Fall
of 2002. The experimental results showed that a Head Start offer increased measures of cognitive
achievement by roughly 0.1 standard deviations during preschool, but that these gains faded out
by kindergarten. Moreover, the experiment showed little evidence of effects on non-cognitive or
health outcomes (Puma et al. 2010; Puma, Bell and Heid 2012). These results suggest both
smaller short-run effects and faster fadeout than non-experimental estimates for earlier cohorts.
Scholars and policymakers have generally interpreted the HSIS results as evidence that Head Start
is ineffective and in need of reform (Barnett 2011). The experimental results have also been cited
in the popular media to motivate calls for dramatic restructuring or elimination of the program
(Klein 2011; Stossel 2014).1
Differences between the HSIS results and the non-experimental literature could be due to
1Subsequent analyses of the HSIS data suggest caveats to this negative interpretation, but do not overturn the finding of modest mean test score impacts accompanied by rapid fadeout. Gelber and Isen (2013) find persistent effects on parental engagement with children. Bitler, Domina and Hoynes (2014) find larger experimental impacts at low quantiles of the test score distribution. These quantile treatment effects fade out by first grade, though there is some evidence of persistent effects at the bottom of the distribution for Spanish-speakers. Walters (2015) finds evidence of substantial heterogeneity in impacts across experimental sites and investigates the relationship between this heterogeneity and observed program characteristics. Walters finds smaller effects for Head Start centers that draw more children from other preschools rather than home care, a finding we explore in more detail here.
changes in program effectiveness over time or to selection bias in non-experimental sibling compar-
isons. Another explanation, however, is that these two research designs identify different parame-
ters. Most non-experimental analyses have focused on recovering the effect of Head Start relative to
home care. In contrast, the HSIS measures the effect of Head Start relative to a mix of alternative
care environments, including other preschools.
III. Data and Experimental Impacts
Before turning to an analysis of program substitution issues, we first describe the HSIS data and
report experimental impacts on test scores and program compliance.
III.A. Data
Our core analysis sample includes 3,571 HSIS applicants with non-missing baseline characteristics
and Spring 2003 test scores. Appendix A describes construction of this sample (this and all other
Appendix material appears in the Online Appendix). The outcome of interest is a summary index of
cognitive test scores that averages Woodcock Johnson III (WJIII) test scores with Peabody Picture
and Vocabulary Test (PPVT) scores, with each test normed to have mean zero and variance one
in the control group by cohort and year. We use WJIII and PPVT scores because these are among
the most reliable tests in the HSIS data; both are also available in each year of the experiment,
allowing us to produce comparable estimates over time.
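The norming described above can be sketched as follows; record fields and function names are ours, not the HSIS file layout:

```python
from statistics import mean, pstdev

def norm_scores(records):
    """Standardize scores to control-group mean 0 and variance 1,
    separately within each (cohort, year) cell."""
    cells = {}
    for r in records:
        cells.setdefault((r["cohort"], r["year"]), []).append(r)
    for cell in cells.values():
        ctrl = [r["score"] for r in cell if r["offer"] == 0]
        mu, sd = mean(ctrl), pstdev(ctrl)  # population SD: control variance exactly one
        for r in cell:
            r["z"] = (r["score"] - mu) / sd
    return records
```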
Table I provides summary statistics for our analysis sample. The HSIS experiment included
two age cohorts: 55 percent of applicants were randomized at age 3 and could attend Head Start
for up to two years, while the remaining 45 percent were randomized at age 4 and could attend
for up to one year. The demographic information in Table I shows that the Head Start population
is disadvantaged. Less than half of Head Start applicants live in two-parent households, and the
average applicant’s household earns about 90 percent of the federal poverty line. Column (2) of
Table I compares these and other baseline characteristics for the HSIS treatment and control groups
to check balance in randomization. The results here indicate that randomization was successful:
baseline characteristics were similar for offered and non-offered applicants.2
Columns (3) through (5) of Table I report summary statistics for children attending Head
Start, other preschool centers, and no preschool.3 Children in other preschools tend to be less
disadvantaged than children in Head Start or no preschool, though most differences between these
groups are modest. The other preschool group has a lower share of high school dropout mothers, a
higher share of mothers who attended college, and higher average household income than the Head
2Random assignment in the HSIS occurred at the Head Start center level, and offer probabilities differed across centers. We weight all models by the inverse probability of a child’s assignment, calculated as the site-specific fraction of children assigned to the treatment group. Because the numbers of treatment and control children at each center were fixed in advance, this is an error-free measure of the probability of an offer (Puma et al. 2010).
3Preschool attendance is measured from the HSIS “focal arrangement type” variable, which combines information from parent interviews and teacher/care provider interviews to construct a summary measure of the childcare setting. See Appendix A for details.
Start and no preschool groups. Children in other preschools outscore the other groups by about
0.1 standard deviations on a baseline summary index of cognitive skills. The other preschool group
also includes a relatively large share of four-year-olds, likely reflecting the fact that alternative
preschool options are more widely available for four-year-olds (Cascio and Schanzenbach 2013).
III.B. Experimental Impacts
Table II reports experimental impacts on test scores. Columns (1), (4) and (7) report intent-to-
treat impacts of the Head Start offer, separately by year and age cohort. To increase precision,
we regression-adjust these treatment/control differences using the baseline characteristics in Table
I.4 The intent-to-treat estimates mirror those previously reported in the literature (e.g., Puma et
al. 2010). In the first year of the experiment, children offered Head Start scored higher on the
summary index. For example, three-year-olds offered Head Start gained 0.19 standard deviations in
test score outcomes relative to those denied Head Start. The corresponding effect for four-year-olds
is 0.14 standard deviations. However, these gains diminish rapidly: the pooled impact falls to a
statistically insignificant 0.02 standard deviations by year three. Our data includes a fourth year
of follow-up for the three-year-old cohort. Here too, the intent-to-treat is small and statistically
insignificant (0.038 standard deviations).
Interpretation of these intent-to-treat impacts is clouded by noncompliance with random as-
signment. Columns (2), (5) and (8) of Table II report first-stage effects of assignment to Head Start
on the probability of participating in Head Start and Columns (3), (6) and (9) report instrumental
variables (IV) estimates, which scale the intent-to-treat estimates by the first stage estimates.5
These estimates can be interpreted as local average treatment effects (LATEs) for “compliers” –
children who respond to the Head Start offer by enrolling in Head Start. Assignment to Head Start
increases the probability of participation by two-thirds in the first year after random assignment.
The corresponding IV estimate implies that Head Start attendance boosts first-year test scores by
0.247 standard deviations.
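These IV estimates are Wald ratios: the intent-to-treat effect divided by the first stage. A minimal sketch using the pooled first-year magnitudes quoted above (the intent-to-treat value here is the one implied by those figures, shown for illustration):

```python
def wald_late(itt, first_stage):
    """Local average treatment effect as ITT scaled by the first stage."""
    return itt / first_stage

first_stage = 2 / 3   # offer raises participation by about two-thirds in year one
itt = 0.165           # implied pooled intent-to-treat, in standard deviation units
late = wald_late(itt, first_stage)
assert abs(late - 0.247) < 0.005  # close to the 0.247 SD estimate quoted in the text
```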
Compliance for the three-year-old cohort falls after the first year as members of the control
group reapply for Head Start, resulting in larger standard errors for estimates in later years of the
experiment. The first stage for three-year-olds falls to 0.36 in the second year, while the intent-
to-treat falls roughly in proportion, generating a second-year IV estimate of 0.245 for this cohort.
Estimates in years three and four are statistically insignificant and imprecise. The fourth-year
4The control vector includes gender, race, assignment cohort, teen mother, mother’s education, mother’s marital status, presence of both parents, an only child dummy, a Spanish language indicator, dummies for quartiles of family income and missing income, urban status, an indicator for whether the Head Start center provides transportation, an index of Head Start center quality, and a third-order polynomial in baseline test scores.
5Here we define Head Start participation as enrollment at any time prior to the test. This definition includes attendance at Head Start centers outside the experimental sample. An experimental offer may cause some children to switch from an out-of-sample center to an experimental center; if the quality of these centers differs, the exclusion restriction required for our IV approach is violated. Appendix Table A.I compares characteristics of centers attended by children in the control group (always takers) to those of the experimental centers to which these children applied. These two groups of centers are very similar, suggesting that substitution between Head Start centers is unlikely to bias our estimates.
estimate for the three-year-old cohort (corresponding to first grade) is 0.110 standard deviations,
with a standard error of 0.098. The corresponding first grade estimate for four year olds is 0.081
with a standard error of 0.060. Notably, the 95-percent confidence intervals for first-grade impacts
include effects as large as 0.2 standard deviations for four-year-olds and 0.3 standard deviations
for three-year-olds. These results show that although the longer-run estimates are insignificant,
they are also imprecise due to experimental noncompliance. Evidence for fadeout is therefore less
definitive than the naive intent-to-treat estimates suggest. This observation helps to reconcile
the HSIS results with observational studies based on sibling comparisons, which show effects that
partially fade out but are still detectable in elementary school (Currie and Thomas, 1995; Deming,
2009).6
IV. Program Substitution
We now turn to documenting program substitution in the HSIS and how it influences our results. It
is helpful to develop some notation to describe the role of alternative care environments. Each Head
Start applicant participates in one of three possible treatments: Head Start, which we label h; other
center-based preschool programs, which we label c; and no preschool (i.e., home care), which we
label n. Let Zi ∈ {0, 1} indicate whether household i has a Head Start offer, and Di(z) ∈ {h, c, n} denote household i’s potential treatment status as a function of the offer. Then observed treatment status can be written Di = Di(Zi).
The structure of the HSIS leads to natural theoretical restrictions on substitution patterns. We
expect a Head Start offer to induce some children who would otherwise participate in c or n to
enroll in Head Start. By revealed preference, no child should switch between c and n in response
to a Head Start offer, and no child should be induced by an offer to leave Head Start. These
restrictions can be expressed succinctly by the following condition:
Di(1) ≠ Di(0) =⇒ Di(1) = h, (1)
which extends the monotonicity assumption of Imbens and Angrist (1994) to a setting with multiple
counterfactual treatments. This restriction states that anyone who changes behavior as a result of
the Head Start offer does so to attend Head Start.7
Under restriction (1), the population of Head Start applicants can be partitioned into five groups
defined by their values of Di(1) and Di(0):
1. n-compliers: Di(1) = h, Di(0) = n,
6One might also be interested in the effects of Head Start on non-cognitive outcomes, which appear to be important mediators of the effects of early childhood programs in other contexts (Chetty et al. 2011; Heckman, Pinto and Savelyev 2013). The HSIS includes short-run parent-reported measures of behavior and teacher-reported measures of teacher/student relationships, and Head Start appears to have no impact on these outcomes (Puma et al. 2010; Walters 2015). The HSIS non-cognitive outcomes differ significantly from those analyzed in previous studies, however, and it is unclear whether they capture the same skills.
7See Engberg et al. (2014) for discussion of related restrictions in the context of attrition from experimental data.
2. c-compliers: Di(1) = h, Di(0) = c,
3. n-never takers: Di(1) = Di(0) = n,
4. c-never takers: Di(1) = Di(0) = c,
5. always takers: Di(1) = Di(0) = h.
The n- and c-compliers switch to Head Start from home care and competing preschools, respectively, when offered a seat. The two groups of never takers choose not to attend Head Start
regardless of the offer. Always takers manage to enroll in Head Start even when denied an offer,
presumably by applying to other Head Start centers outside the HSIS sample.
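Under restriction (1), each household's group is determined by the pair (Di(1), Di(0)); a small sketch of this classification:

```python
def applicant_type(d1, d0):
    """Classify a household by potential treatments under offer (d1) and no offer (d0).
    Treatments: 'h' = Head Start, 'c' = other center, 'n' = home care."""
    if d1 != d0:
        # Monotonicity: any household that switches, switches into Head Start.
        assert d1 == "h"
        return d0 + "-complier"
    return "always taker" if d1 == "h" else d1 + "-never taker"

assert applicant_type("h", "n") == "n-complier"
assert applicant_type("h", "c") == "c-complier"
assert applicant_type("h", "h") == "always taker"
assert applicant_type("n", "n") == "n-never taker"
```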
Using this rubric, the group of children enrolled in alternative preschool programs is a mixture
of c-never takers and c-compliers denied Head Start offers. Similarly, the group of children in home
care includes n-never takers and n-compliers without offers. The two complier subgroups switch
into Head Start when offered admission; as a result, the set of children enrolled in Head Start is a
mixture of always takers and the two groups of offered compliers.
IV.A. Substitution Patterns
Table III presents empirical evidence on substitution patterns by comparing program participation
choices for offered and non-offered households. In the first year of the experiment 8.3 percent of
households decline Head Start offers in favor of other preschool centers; this is the share of c-never
takers. Similarly, column (3) shows that 9.5 percent of households are n-never takers. As can be
seen in column (4), 13.6 percent of households manage to attend Head Start without an offer, which
is the share of always takers. The Head Start offer reduces the share of children in other centers
from 31.5 percent to 8.3 percent, and reduces the share of children in home care from 55 percent
to 9.5 percent. This implies that 23.2 percent of households are c-compliers, and 45.5 percent are
n-compliers.
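The first-year group shares reported above follow mechanically from the participation rates of offered and non-offered households; a sketch using the numbers just quoted from Table III:

```python
# First-year shares quoted in the text (fractions of households).
other_center = {"no_offer": 0.315, "offer": 0.083}  # offer-group share = c-never takers
home_care    = {"no_offer": 0.550, "offer": 0.095}  # offer-group share = n-never takers
always_takers = 0.136                               # Head Start attendance without an offer

c_compliers = other_center["no_offer"] - other_center["offer"]  # 0.232
n_compliers = home_care["no_offer"] - home_care["offer"]        # 0.455

# The five groups partition the applicant population (up to rounding).
total = (c_compliers + n_compliers + always_takers
         + other_center["offer"] + home_care["offer"])
assert abs(total - 1.0) < 0.005
```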
Notably, in the first year of the experiment, three-year-olds have uniformly higher participation rates in Head Start and lower participation rates in competing centers, which likely reflects the fact that many state-provided programs accept only four-year-olds. In the second year of the experiment, participation in Head Start drops among children in the three-year-old cohort with a program offer, suggesting that many families enrolled in the first year decided that Head Start was a bad match for their child. We also see that Head Start enrollment rises among those families that did not obtain an offer in the first round, which reflects reapplication behavior.
IV.B. Interpreting IV
How do the substitution patterns displayed in Table III affect the interpretation of the HSIS test
score impacts? Let Yi(d) denote child i’s potential test score if he or she participates in treatment
d ∈ {h, c, n}. Observed scores are given by Yi = Yi(Di). We shall assume that Head Start offers
affect test scores only through program participation choices. Under assumption (1), IV identifies
a variant of the Local Average Treatment Effect (LATE) of Imbens and Angrist (1994), giving the
average effect of Head Start participation for compliers relative to a mix of program alternatives.
Specifically, under (1) and excludability of Head Start offers:
(E[Yi | Zi = 1] − E[Yi | Zi = 0]) / (E[1{Di = h} | Zi = 1] − E[1{Di = h} | Zi = 0]) = E[Yi(h) − Yi(Di(0)) | Di(1) = h, Di(0) ≠ h] ≡ LATEh. (2)
The left-hand side of (2) is the population coefficient from a model that instruments Head Start
attendance with the Head Start offer. This equation implies that the IV strategy employed in Table
II yields the average effect of Head Start for compliers relative to their own counterfactual care
choices, a quantity we label LATEh.
We can decompose LATEh into a weighted average of “subLATEs” measuring the effects of
Head Start for compliers drawn from specific counterfactual alternatives as follows:
LATEh = ScLATEch + (1− Sc)LATEnh, (3)
where LATEdh ≡ E[Yi(h) − Yi(d) | Di(1) = h, Di(0) = d] gives the average treatment effect on d-compliers for d ∈ {c, n}, and the weight Sc ≡ P(Di(1) = h, Di(0) = c) / P(Di(1) = h, Di(0) ≠ h) gives the fraction of compliers drawn from other preschools.
Column (7) of Table III reports estimates of Sc by year and cohort, computed as minus the ratio
of the Head Start offer’s effect on other preschool attendance to its effect on Head Start attendance
(see Appendix B). In the first year of the HSIS experiment, 34 percent of compliers would have
otherwise attended competing preschools. IV estimates combine effects for these compliers with
effects for compliers who would not otherwise attend preschool.
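The weight Sc can be computed directly as minus the ratio of the offer's effect on other-preschool attendance to its effect on Head Start attendance; with the first-year shares quoted in Section IV.A:

```python
# Offer effects implied by the first-year shares in the text.
effect_on_other_centers = 0.083 - 0.315   # offer moves c-compliers out of other centers
effect_on_head_start    = 0.232 + 0.455   # first stage: both complier groups enroll

S_c = -effect_on_other_centers / effect_on_head_start
assert abs(S_c - 0.34) < 0.01  # about a third of compliers come from other preschools
```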
As detailed in Appendix D, the competing preschools attended by c-compliers are largely
publicly funded and provide services roughly comparable to Head Start. The modal alternative
preschool is a state-provided preschool program, while others receive funding from a mix of public
sources (see Appendix Table A.II). Moreover, it is likely that even Head Start-eligible children at-
tending private preschool centers receive public funding (e.g., through CCDF or TANF subsidies).
We next consider the implications of substitution from these alternative preschools for assessments
of Head Start’s cost-effectiveness.
V. A Model of Head Start Provision
In this section, we develop a model of Head Start participation with the goal of conducting a cost-
benefit analysis that acknowledges the presence of publicly subsidized program substitutes. Our
model is highly stylized and focuses on obtaining an estimable lower bound on the rate of return
to potential reforms of Head Start measured in terms of lifetime earnings. The analysis ignores
redistributional motives and any effects of human capital investment on criminal activity (Lochner
and Moretti 2004; Heckman et al. 2010), health (Deming 2009; Carneiro and Ginja 2014), or grade
repetition (Currie 2001). Adding such features would tend to raise the implied return to Head
Start. We also abstract from parental labor supply decisions because prior analyses of the Head
Start Impact Study find no short term impacts on parents’ work decisions (Puma et al. 2010).8
Again, incorporating parental labor supply responses would likely raise the program’s rate of return.
Our discussion emphasizes that the cost-effectiveness of Head Start is contingent upon assump-
tions regarding the structure of the market for preschool services and the nature of the specific
policy reforms under consideration. Building on the heterogeneous effects framework of the previ-
ous section, we derive expressions for policy relevant “sufficient statistics” (Chetty 2009) in terms
of causal effects on student outcomes. Specifically, we show that a variant of the Local Average
Treatment Effect concept of Imbens and Angrist (1994) is policy relevant when considering pro-
gram expansions in an environment where slots in competing preschools are not rationed. With
rationing, a mix of LATEs becomes relevant, which poses challenges to identification with the HSIS
data. When considering reforms to Head Start program features that change selection into the
program, the policy relevant parameter is shown to be a variant of the Marginal Treatment Effect
concept of Heckman and Vytlacil (1999).
V.A. Setup
Consider a population of households, indexed by i, each with a single preschool-aged child. Each
household can enroll its child in Head Start, a competing preschool program (e.g., state subsidized
preschool), or care for the child at home. The government rations Head Start participation via
program offers Zi, which arrive at random via lottery with probability δ ≡ P (Zi = 1). Offers are
distributed in a first period. In a second period, households make enrollment decisions. Tenacious
applicants who have not received an offer can enroll in Head Start by exerting additional effort.
We begin by assuming that competing programs are not rationed and then relax this assumption
below.
Each household has utility over its enrollment options given by the function Ui (d, z). The
argument d ∈ {h, c, n} indexes child care environments, while the argument z ∈ {0, 1} indexes offer
status. Head Start offers raise the value of Head Start and have no effect on the value of other
options, so that:
Ui (h, 1) > Ui (h, 0) , Ui (c, z) = Ui (c) , Ui (n, z) = Ui (n).
Household i’s enrollment choice, as a function of its offer status z, is given by:
D_i(z) = \arg\max_{d \in \{h, c, n\}} U_i(d, z).
It is straightforward to show that this model satisfies the monotonicity restriction (1). Since offers
are assigned at random, market shares for the three care environments can be written:

P(D_i = d) = \delta P(D_i(1) = d) + (1 - \delta) P(D_i(0) = d).

8 We replicate this analysis for our sample in Table A.III, which shows that a Head Start offer has no effect on the probability that a child’s mother works or on the likelihood of working full- vs. part-time. Recent work by Long (2015) suggests that Head Start may have small effects on full- vs. part-time work for mothers of three-year-olds.
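This setup is easy to simulate, which makes the monotonicity claim concrete. The sketch below uses arbitrary utility distributions (all numbers hypothetical) and checks that no household is pushed out of Head Start by receiving an offer:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical utilities: an offer raises U(h); U(c) and U(n) are unaffected.
u_h0 = rng.normal(0.0, 1.0, n)   # U_i(h, 0)
offer_bonus = 1.5                # U_i(h, 1) - U_i(h, 0) > 0
u_c = rng.normal(0.2, 1.0, n)    # U_i(c)
u_n = np.zeros(n)                # U_i(n), normalized to zero

def choices(u_h):
    """D_i(z) = argmax over {h, c, n} of utility."""
    utils = np.column_stack([u_h, u_c, u_n])
    return np.array(["h", "c", "n"])[utils.argmax(axis=1)]

d0 = choices(u_h0)                # choices without an offer
d1 = choices(u_h0 + offer_bonus)  # choices with an offer

# Monotonicity restriction (1): anyone choosing h without an offer
# still chooses h with one.
assert np.all(d1[d0 == "h"] == "h")

# Market share of Head Start with offer probability delta = 0.6:
delta = 0.6
share_h = delta * (d1 == "h").mean() + (1 - delta) * (d0 == "h").mean()
print(f"P(D=h) = {share_h:.3f}")
```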
V.B. Benefits and Costs
Debate over the effectiveness of educational programs often centers on their test score impacts. A
standard means of valuing such impacts is in terms of their effects on later life earnings (Heckman
et al. 2010; Heckman, Pinto and Savelyev 2013; Chetty et al. 2011; Chetty, Friedman and Rockoff
2014b).9 Let the symbol B denote the total after-tax lifetime income of a cohort of children. We
assume that B is linked to test scores by the equation:
B = B0 + (1− τ) pE [Yi] , (4)
where p gives the market price of human capital, τ is the tax rate faced by the children of eligible
households, and B0 is an intercept reflecting how test scores are normed. Our focus on mean test
scores neglects distributional concerns which may lead us to undervalue Head Start’s test score
impacts (see Bitler, Domina and Hoynes 2014).
The net costs to government of financing preschool are given by:
C = C0 + φhP (Di = h) + φcP (Di = c)− τpE [Yi] , (5)
where the term C0 reflects fixed costs of administering the program and φh gives the administrative
cost of providing Head Start services to an additional child. Likewise, φc gives the administrative
cost to government of providing competing preschool services (which often receive public subsidies)
to another student. The term τpE [Yi] captures the revenue generated by taxes on the adult earnings
of Head Start-eligible children. This formulation abstracts from the fact that program outlays must
be determined before the children enter the labor market and begin paying taxes, a complication
we will adjust for in our empirical work via discounting.
V.C. Changing Offer Probabilities
Consider now the effects of adjusting Head Start enrollment by changing the rationing probability
δ. An increase in δ draws additional households into Head Start from competing programs and
home care. As shown in Appendix C, the effect of a change in the offer rate δ on average test scores
is given by:
\frac{\partial E[Y_i]}{\partial \delta} = \underbrace{LATE_h}_{\text{Effect on compliers}} \times \underbrace{\frac{\partial P(D_i = h)}{\partial \delta}}_{\text{Complier density}}. \quad (6)
In words, the aggregate impact on test scores of a small increase in the offer rate equals the average
impact of Head Start on complier test scores times the measure of compliers. By the arguments
9Appendix C considers how such valuations should be adjusted when test score impacts yield labor supply re-sponses.
in Section IV, both LATE_h and \partial P(D_i = h)/\partial \delta = P(D_i(1) = h, D_i(0) \neq h) are identified by the
HSIS experiment. Hence, (6) implies that the hypothetical effects of a market-level change in offer
probabilities can be inferred from an individual-level randomized trial with a fixed offer probability.
This convenient result follows from the assumption that Head Start offers are distributed at random
and that δ does not directly enter the alternative specific choice utilities, which in turn implies that
the composition of compliers (and hence LATEh) does not change with δ. Below we explore how
this expression changes when the composition of compliers responds to a policy change.
From (4), the marginal benefit of an increase in δ is given by:
\frac{\partial B}{\partial \delta} = (1 - \tau)\, p\, LATE_h \times \frac{\partial P(D_i = h)}{\partial \delta}.
The offsetting marginal cost to government of financing such an expansion can be written:
\frac{\partial C}{\partial \delta} = \Big[\underbrace{\phi_h}_{\text{Provision Cost}} - \underbrace{\phi_c S_c}_{\text{Public Savings}} - \underbrace{\tau p\, LATE_h}_{\text{Added Revenue}}\Big] \times \frac{\partial P(D_i = h)}{\partial \delta}. \quad (7)
This cost consists of the measure of compliers times the administrative cost φh of enrolling them
in Head Start minus the probability Sc that a complying household comes from a substitute
preschool times the expected government savings φc associated with reduced enrollment in sub-
stitute preschools. The quantity φh − φcSc can be viewed as a local average treatment effect of
Head Start on government spending for compliers. Subtracted from this effect is any extra revenue
the government gets from raising the productivity of the children of complying households.
The ratio of marginal impacts on after-tax income and government costs gives the marginal
value of public funds (Mayshar 1990; Hendren 2016), which we can write:
MVPF_\delta \equiv \frac{\partial B/\partial \delta}{\partial C/\partial \delta} = \frac{(1 - \tau)\, p\, LATE_h}{\phi_h - \phi_c S_c - \tau p\, LATE_h}. \quad (8)
The MVPFδ gives the value of an extra dollar spent on Head Start net of fiscal externalities. These
fiscal externalities include reduced spending on competing subsidized programs (captured by the
term φcSc) and additional tax revenue generated by higher earnings (captured by τpLATEh). As
emphasized by Hendren (2016), the MVPF is a metric that can easily be compared across programs
without specifying exactly how program expenditures are to be funded. In our case, if MVPFδ > 1
a dollar of government spending can raise the after-tax incomes of children by more than a dollar,
which is a robust indicator that program expansions are likely to be welfare improving.
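Equation (8) is simple enough to code directly. The sketch below is a minimal implementation; the inputs in the usage line are illustrative round numbers, not the paper's calibration (which is described in Section VI and Appendix D):

```python
def mvpf_delta(p, tau, late_h, phi_h, phi_c, s_c):
    """Equation (8): after-tax earnings gain per complier divided by the
    net government cost per complier (provision cost, minus public savings
    on substitute preschools, minus added tax revenue)."""
    benefit = (1 - tau) * p * late_h
    net_cost = phi_h - phi_c * s_c - tau * p * late_h
    return benefit / net_cost

# Illustrative inputs: lifetime earnings value p of a 1 sd score gain,
# tax rate tau, complier effect late_h in sd units, per-seat costs in dollars.
print(mvpf_delta(p=50_000, tau=0.2, late_h=0.2, phi_h=8_000, phi_c=6_000, s_c=0.34))
```

Ignoring substitution (setting phi_c = 0) mechanically lowers the ratio, which is the substantive point of the section.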
An important lesson of the above analysis is that identifying costs and benefits of changes to offer
probabilities does not require identification of treatment effects relative to particular counterfactual
care states. Specifically, it is not necessary to separately identify the subLATEs. This result
shows that program substitution is not a design flaw of evaluations. Rather, it is a feature of the
policy environment that needs to be considered when computing the likely effects of changes to
policy parameters. Here, program substitution alters the usual logic of program evaluation only by
requiring identification of the complier share Sc, which governs the degree of public savings realized
as a result of reducing subsidies to competing programs.
V.D. Rationed Substitutes
The above analysis presumes that Head Start expansions yield reductions in the enrollment of
competing preschools. However, if competing programs are also over-subscribed, the slots vacated
by c−compliers may be filled by other households. This will reduce the public savings associated
with Head Start expansions but also generate the potential for additional test score gains.
With rationing in substitute preschool programs, the utility of enrollment in c can be written
Ui(c, Zic), where Zic indicates an offered slot in the competing program. Household i’s enrollment
choice, Di(Zih, Zic), depends on both the Head Start offer Zih and the competing program offer.
Assume these offers are assigned independently with probabilities δh and δc, but that δc adjusts to
changes in δh to keep total enrollment in c constant. In addition, assume that all children induced
to move into c as a result of an increase in δc come from n rather than h.
We show in Appendix C that under these assumptions the marginal impact of expanding Head
Start becomes:
\frac{\partial E[Y_i]}{\partial \delta_h} = (LATE_h + LATE_{nc} \cdot S_c) \times \frac{\partial P(D_i = h)}{\partial \delta_h},
where LATEnc ≡ E [Yi (c)− Yi (n) |Di (Zih, 1) = c,Di (Zih, 0) = n]. Intuitively, every c−complier
now spawns a corresponding n-to-c complier who fills the vacated preschool slot.
The marginal cost to government of inducing this change in test scores can be written:
\frac{\partial C}{\partial \delta_h} = \big[\phi_h - \tau p\,(LATE_h + LATE_{nc} \cdot S_c)\big] \times \frac{\partial P(D_i = h)}{\partial \delta_h}.
Relative to (7), rationing eliminates the public savings from reduced enrollment in substitute pro-
grams but adds another fiscal externality in its place: the tax revenue associated with any test
score gains of shifting children from home care to competing preschools. The resulting marginal
value of public funds can be written:
MVPF_{\delta,rat} = \frac{(1 - \tau)\, p\,(LATE_h + LATE_{nc} \cdot S_c)}{\phi_h - \tau p\,(LATE_h + LATE_{nc} \cdot S_c)}. \quad (9)
While the impact of rationed substitutes on the marginal value of public funds is theoretically
ambiguous, there is good reason to expect MVPFδ,rat > MVPFδ in practice. Specifically, ignoring
rationing of competing programs yields a lower bound on the rate of return to Head Start expansions
if Head Start and other forms of center based care have roughly comparable effects on test scores and
competing programs are cheaper (see Appendix C). Unfortunately, effects for n-to-c compliers are
not nonparametrically identified by the HSIS experiment since one cannot know which households
that care for their children at home would otherwise choose to enroll them in competing preschools.
We return to this issue in Section IX.
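Under the stated assumptions, equation (9) can be coded the same way as equation (8). The sketch below is illustrative only; since LATE_nc is not nonparametrically identified by the HSIS, any value supplied for it is an assumption:

```python
def mvpf_delta_rationed(p, tau, late_h, late_nc, s_c, phi_h):
    """Equation (9): with rationed substitutes there are no public savings
    (phi_c drops out), but each c-complier is replaced by an n-to-c complier
    whose score gain late_nc adds to both earnings and tax revenue."""
    total = late_h + late_nc * s_c
    return (1 - tau) * p * total / (phi_h - tau * p * total)
```

Comparing this to equation (8) at given inputs shows how rationing trades the public-savings term for the extra test score gains of refilled slots.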
V.E. Structural Reforms
An important assumption in the previous analyses is that changing lottery probabilities does not
alter the mix of program compliers. Consider now the effects of altering some structural feature f of
the Head Start program that households value but which has no impact on test scores. For example,
Executive Order #13330, issued by President Bush in February 2004, mandated enhancements to
the transportation services provided by Head Start and other federal programs (Federal Register
2004). Expanding Head Start transportation services should not directly influence educational
outcomes but may yield a compositional effect by drawing in households from a different mix of
counterfactual care environments.10 By shifting the composition of program participants, changes
in f may boost the program’s rate of return.
To establish notation, we assume that households now value Head Start participation as:
Ui(h, Zi, f) = Ui (h, Zi) + f.
Utilities for other preschools and home care are assumed to be unaffected by changes in f . This
implies that increases in f make Head Start more attractive for all households. For simplicity, we
return to our prior assumption that competing programs are not rationed. As shown in Appendix
C, the assumption that f has no effect on potential outcomes implies:
\frac{\partial E[Y_i]}{\partial f} = MTE_h \times \frac{\partial P(D_i = h)}{\partial f},
where
MTE_h \equiv E[Y_i(h) - Y_i(c) \mid U_i(h, Z_i) + f = U_i(c),\, U_i(c) > U_i(n)]\, \vec{S}_c + E[Y_i(h) - Y_i(n) \mid U_i(h, Z_i) + f = U_i(n),\, U_i(n) > U_i(c)]\,(1 - \vec{S}_c),

and \vec{S}_c gives the share of children on the margin of participating in Head Start who prefer the
competing program to preschool non-participation. Following the terminology in Heckman, Urzua
and Vytlacil (2008), the marginal treatment effect MTEh is the average effect of Head Start on test
scores among households indifferent between Head Start and the next best alternative. This is a
marginal version of the result in (6), where integration is now over a set of children who may differ
from current program compliers in their mean impacts. Like LATEh, MTEh is a weighted average
of “subMTEs” corresponding to whether the next best alternative is home care or a competing
preschool program. The weight \vec{S}_c may differ from S_c if inframarginal participants are drawn from
different sources than marginal ones.
10This presumes that peer effects are not an important determinant of test score outcomes. Large changes in thestudent composition of Head Start classrooms could potentially change the effectiveness of Head Start.
The test score effects of improvements to the program feature must be balanced against the
costs. We suppose that changing program features changes the average cost φh (f) of Head Start
services, so that the net costs to government of financing preschool are now:
C (f) = C0 + φh (f)P (Di = h) + φcP (Di = c)− τpE [Yi] , (10)
where ∂φh (f) /∂f ≥ 0. The marginal costs to government (per program complier) of a change in
the program feature can be written:
\frac{\partial C(f)/\partial f}{\partial P(D_i = h)/\partial f} = \underbrace{\phi_h}_{\text{Marginal Provision Cost}} + \underbrace{\frac{\partial \phi_h(f)/\partial f}{\partial \ln P(D_i = h)/\partial f}}_{\text{Inframarginal Provision Cost}} - \underbrace{\phi_c \vec{S}_c}_{\text{Public Savings}} - \underbrace{\tau p\, MTE_h}_{\text{Added Revenue}}. \quad (11)
The first term on the right hand side of (11) gives the administrative cost of enrolling another child.
The second term gives the increased cost of providing inframarginal families with the improved
program feature. The third term is the expected savings in reduced funding to competing preschool
programs. And the final term gives the additional tax revenue raised by the boost in the marginal
enrollee’s human capital.
Letting \eta \equiv \frac{\partial \ln \phi_h(f)/\partial f}{\partial \ln P(D_i = h)/\partial f} be the elasticity of costs with respect to enrollment, we can write
the marginal value of public funds associated with a change in program features as:
MVPF_f \equiv \frac{\partial B/\partial f}{\partial C(f)/\partial f} = \frac{(1 - \tau)\, p\, MTE_h}{\phi_h(1 + \eta) - \phi_c \vec{S}_c - \tau p\, MTE_h}. \quad (12)
As in our analysis of optimal program scale, equation (11) shows that it is not necessary to sepa-
rately identify the “subMTEs” that compose MTEh to determine the optimal value of f . Rather,
it is sufficient to identify the average causal effect of Head Start for children on the margin of
participation along with the average net cost of an additional seat in this population.
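Equation (12) adds only two ingredients to equation (8): the cost elasticity η and the marginal (rather than local) average effect. A minimal sketch, with all inputs hypothetical:

```python
def mvpf_f(p, tau, mte_h, phi_h, eta, phi_c, s_c_marginal):
    """Equation (12): MVPF of a change in program features, where eta is the
    elasticity of per-seat cost with respect to enrollment and s_c_marginal
    is the share of marginal entrants drawn from competing preschools."""
    benefit = (1 - tau) * p * mte_h
    net_cost = phi_h * (1 + eta) - phi_c * s_c_marginal - tau * p * mte_h
    return benefit / net_cost
```

With eta = 0 and the marginal quantities equal to their complier counterparts, this collapses to equation (8); a positive eta (costlier inframarginal seats) lowers the return.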
VI. A Cost-Benefit Analysis of Program Expansion
We next use the HSIS data to conduct a formal cost-benefit analysis of changes to Head Start’s
offer rate under the assumption that competing programs are not rationed (we consider the case
with rationing in Section IX). Our analysis focuses on the costs and benefits associated with one
year of Head Start attendance.11 This exercise requires estimates of each term in equation (7). We
11Children in the three-year-old cohort who enroll for two years generate additional costs. As shown in TableIII, a Head Start offer raises the probability of enrollment in the second year by only 0.16, implying that first-yearoffers have modest net effects on second-year costs. Enrollment for two years may also generate additional benefits,but these cannot be estimated without strong assumptions on the Head Start dose/response function. We thereforeconsider only first-year benefits and costs.
estimate LATEh and Sc from the HSIS, and calibrate the remaining parameters using estimates
from the literature. Calibrated parameters are listed in panel A of Table IV. To be conservative,
we deliberately bias our calibrations towards understating Head Start’s benefits and overstating its
costs in order to arrive at a lower bound rate of return. Further details of the calibration exercise
are provided in Appendix D.
Panel B of Table IV reports estimates of the marginal value of public funds associated with an
expansion of Head Start offers (MVPFδ). To account for sampling uncertainty in our estimates
of LATEh and Sc we report standard errors calculated via the delta method. Because asymptotic
delta method approximations can be inaccurate when the statistic of interest is highly nonlinear
(Lafontaine and White 1986), we also report bootstrap p-values from one-tailed tests of the null
hypothesis that the benefit/cost ratio is less than one.12
The results show that accounting for the public savings associated with enrollment in substitute
preschools has a large effect on the estimated social value of Head Start. We conduct cost-benefit
analyses under three assumptions: φc is either zero, 50%, or 75% of φh. Our preferred calibration
uses φc = 0.75φh, reflecting the fact that roughly 75 percent of competing centers are publicly
funded (see Appendix D). Setting φc = 0 yields an MVPFδ of 1.10. Setting φc equal to 0.5φh and
0.75φh raises the MVPFδ to 1.50 and 1.84, respectively. This indicates that the fiscal externality
generated by program substitution has an important effect on the social value of Head Start.
Bootstrap tests decisively reject the null that MVPFδ is less than one when φc = 0.5φh or 0.75φh.
Notably, our preferred estimate of 1.84 is well above the estimated rates of return of comparable
expenditure programs summarized in Hendren (2016), and comparable to the marginal value of
public funds associated with increases in the top marginal tax rate (between 1.33 and 2.0).
To assess the sensitivity of our results to alternative assumptions regarding the relationship
between test score effects and earnings, Table IV also reports “breakeven” relationships between
test scores and earnings that set MVPFδ equal to one for each value of φc. When φc = 0 the
breakeven earnings effect is 9 percent per test score standard deviation, only slightly below our
calibrated value of 10 percent. This indicates that when substitution is ignored, Head Start is
close to breaking even and small changes in assumptions will yield values of MVPFδ below one.
Increasing φc to 0.5φh or 0.75φh reduces the breakeven earnings effect to 8 percent or 7 percent,
respectively. The latter figure is well below comparable estimates in the recent literature, such
as estimates from Chetty et al.’s (2011) study of the Tennessee STAR class size experiment (13
percent; see Appendix Table A.IV). Therefore, after accounting for fiscal externalities, Head Start’s
costs are estimated to exceed its benefits only if its test score impacts translate into earnings gains
at a lower rate than similar interventions for which earnings data are available.
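The breakeven calculation has a convenient closed form: setting MVPFδ = 1 in equation (8) and rearranging, the tax rate cancels, leaving p · LATEh = φh − φcSc. A sketch with hypothetical inputs:

```python
def breakeven_p(late_h, phi_h, phi_c, s_c):
    """Human capital price p that sets MVPF_delta = 1 in equation (8).
    Equating benefit and net cost, tau cancels: p * LATE_h = phi_h - phi_c * S_c."""
    return (phi_h - phi_c * s_c) / late_h

# Raising phi_c lowers the required earnings payoff per sd of test scores,
# mirroring the declining breakeven pattern reported in Table IV.
for phi_c in (0.0, 4_000.0, 6_000.0):
    print(phi_c, breakeven_p(late_h=0.2, phi_h=8_000, phi_c=phi_c, s_c=0.34))
```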
12 This test is computed by a non-parametric block bootstrap of the studentized t-statistic that resamples Head Start sites. We have found in Monte Carlo exercises that delta method confidence intervals for MVPFδ tend to over-cover, while bootstrap-t tests have approximately correct size. This is in accord with theoretical results from Hall (1992) showing that bootstrap-t methods yield a higher-order refinement to p-values based upon the standard delta method approximation.
VII. Beyond LATE
Thus far, we have evaluated the return to a marginal expansion of Head Start under the assumption
that the mix of compliers can be held constant. However, it is likely that major reforms to Head
Start would entail changes to program features such as accessibility that could in turn change the
mix of program compliers. To evaluate such reforms, it is necessary to predict how selection into
Head Start is likely to change and how this impacts the program’s rate of return.
VII.A. Instrumental Variables Estimates of SubLATEs
A first way in which selection into Head Start could change is if the mix of compliers drawn from
home care and competing preschools were altered while holding the composition of those two groups
constant. To predict the effects of such a change on the program’s rate of return we need to estimate
the “subLATEs” in equation (3).
One approach to identifying subLATEs is to conduct two-stage least squares (2SLS) estima-
tion treating Head Start enrollment and enrollment in other preschools as separate endogenous
variables. A common strategy for generating instruments in such settings is to interact an exper-
imentally assigned program offer with observed covariates or site indicators (e.g., Kling, Liebman
and Katz 2007; Abdulkadiroglu, Angrist and Pathak 2014). Such approaches can secure identifi-
cation in a constant effects framework but, as we demonstrate in Appendix E, will typically fail to
identify interpretable parameters if the subLATEs themselves vary across the interacting groups
(see Kirkeboen, Leuven and Mogstad [forthcoming] and Hull [2015] for related results).
Table V reports 2SLS estimates of the separate effects of Head Start and competing preschools
using as instruments the Head Start offer indicator and its interaction with 8 student- and site-
level covariates likely to capture heterogeneity in compliance patterns.13 These instruments strongly
predict Head Start enrollment but induce relatively weak independent variation in enrollment in
other preschools, with a partial first stage F-statistic of only 1.8. The 2SLS estimates indicate
that Head Start and other centers yield large and roughly equivalent effects on test scores of
approximately 0.4 standard deviations. This finding is roughly in line with the view that preschool
effects are homogeneous and that program substitution simply attenuates instrumental variables
estimates of the effect of Head Start relative to home care. Cautioning against this interpretation
is the 2SLS overidentification test, which strongly rejects the constant effects model, indicating the
presence of substantial effect heterogeneity across covariate groups.
A separate source of variation comes from experimental sites: the HSIS was implemented at
hundreds of individual Head Start centers, and previous studies have shown substantial variation
in treatment effects across these centers (Bloom and Weiland 2015; Walters 2015). Using site
13 Previous analyses of the HSIS have shown important effect heterogeneity with respect to baseline scores and first language (Bitler, Domina and Hoynes 2014; Bloom and Weiland 2015), so we include these in the list of student-level interactions. We also allow interactions with variables measuring whether a child’s center of random assignment offers transportation to Head Start, whether the center of random assignment is above the median of the Head Start quality measure, the education level of the child’s mother, whether the child is age four, whether the child is black, and an indicator for family income above the poverty line.
interactions as instruments again yields much more independent variation in Head Start enrollment
than in competing preschools.14 However, now the estimated impact of Head Start is smaller and
competing centers are estimated to yield no gains relative to home care. While these site-based
estimates are nominally more precise than those obtained from the covariate interactions, with 183
instruments the asymptotic standard errors may provide a poor guide to the degree of uncertainty
in the parameter estimates (Bound, Jaeger and Baker 1995). We explore this issue in Appendix
Table A.V, which reports limited information maximum likelihood and jackknife IV estimates of
the same model. These approaches yield much larger standard errors and very different point
estimates, suggesting that weak instrument biases are at play here.
To deal with these statistical problems, we use a choice model with discrete unobserved hetero-
geneity (described in more detail later on) to aggregate Head Start sites together into six groups
with similar substitution patterns. Using the site group interactions as instruments yields signifi-
cant independent variation in both Head Start and competing preschool enrollment, and produces
results more in line with those obtained from the covariate interactions. Pooling the site group
and covariate interaction instruments together yields the most precise estimates, which indicate
that both preschool types increase scores relative to home care and that Head Start is slightly
more effective than competing preschools. However, the overidentification test continues to reject
the constant effects model, suggesting that these estimates are still likely to provide a misleading
guide to the underlying subLATEs. Another important limitation of the interacted 2SLS approach
is that it conditions on realized selection patterns and therefore cannot be used to predict the
effects of reforms that change the underlying composition of n- and c-compliers. We now turn
to developing an econometric selection model that allows us to address both of these limitations.
VII.B. Selection Model
Our selection model parametrizes the preferences and potential outcomes introduced in the model
of Section V to motivate a two-step control function estimator. Like the interacted 2SLS approach,
the proposed estimator exploits interactions of the Head Start offer with covariates and site groups
to separately identify the causal effects of care alternatives. Unlike the interacted 2SLS approach,
the control function estimator allows the interacting groups to have different subLATEs that vary
parametrically with the probability of enrolling in Head Start and competing preschools.
Normalizing the value of preschool non-participation to zero, we assume households have utilities
over program alternatives given by:
Ui (h, Zi) = ψh (Xi, Zi) + vih,
Ui (c) = ψc (Xi) + vic, (13)
Ui (n) = 0,
14To avoid extreme imbalance in site size, we grouped the 356 sites in our data into 183 sites with 10 or moreobservations. See Appendix G for details.
where Xi denotes the vector of baseline household and experimental site characteristics listed
in Table I and Zi again denotes the Head Start offer dummy. The stochastic components of
utility (vih, vic) reflect unobserved differences in household demand for Head Start and competing
preschools relative to home care. In addition to pure preference heterogeneity, these terms may
capture unobserved constraints such as whether family members are available to help with child
care. We suppose these components obey a multinomial probit specification:
(v_{ih}, v_{ic}) \mid X_i, Z_i \sim N\left(0, \begin{bmatrix} 1 & \rho(X_i) \\ \rho(X_i) & 1 \end{bmatrix}\right),
which allows for violations of the Independence from Irrelevant Alternatives (IIA) condition that
underlies multinomial logit selection models such as that of Dubin and McFadden (1984).
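To make the covariance restriction concrete, the sketch below draws the error pair and checks its sample correlation; ρ = 0.5 is an arbitrary illustration:

```python
import numpy as np

def draw_probit_errors(rho, n, rng):
    """Draw (v_ih, v_ic) from a bivariate normal with unit variances and
    correlation rho, as in the multinomial probit specification above."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    return rng.multivariate_normal(np.zeros(2), cov, size=n)

rng = np.random.default_rng(1)
v = draw_probit_errors(rho=0.5, n=200_000, rng=rng)
print(np.corrcoef(v[:, 0], v[:, 1])[0, 1])  # close to 0.5
```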
As in the Heckman (1979) selection framework, we model endogeneity in participation decisions
by allowing linear dependence of mean potential outcomes on the unobservables that influence
choices. Specifically, for each program alternative d ∈ {h, c, n}, we assume:
E [Yi (d) |Xi, Zi, vih, vic] = µd (Xi) + γdhvih + γdcvic. (14)
The {γdh, γdc} coefficients in (14) describe the nature of selection on unobservables. This spec-
ification can accommodate a variety of selection schemes. For example, if γdh = γh > 0, then
conditional on observables, selection into Head Start is governed by potential outcome levels –
those most likely to participate in Head Start have higher test scores in all care environments. But
if γhh > 0 and γnh = −γhh, then households engage in Roy (1951)-style selection into Head Start
based upon test score gains – those most likely to participate in Head Start receive larger test score
benefits when they switch from home care to Head Start.
By iterated expectations, (14) implies the conditional expectation of realized outcomes can be
written:
E [Yi|Xi, Zi, Di = d] = µd (Xi) + γdhλh (Xi, Zi, d) + γdcλc (Xi, Zi, d) , (15)
where λh (Xi, Zi, Di) ≡ E [vih|Xi, Zi, Di] and λc (Xi, Zi, Di) ≡ E [vic|Xi, Zi, Di] are generalizations
of the standard inverse Mills ratio terms used in the two-step Heckman (1979) selection correction
(see Appendix F for details). These terms depend on Xi and Zi only through the conditional
probabilities of enrolling in Head Start and other preschools.
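Appendix F derives closed forms for these generalized Mills-ratio terms. As an illustration of what they measure, the λ terms can also be approximated by brute-force simulation of the choice model; the mean utilities and correlation below are hypothetical:

```python
import numpy as np

def simulated_lambdas(psi_h, psi_c, rho, n_sims=200_000, seed=0):
    """Approximate lambda_h(d) = E[v_ih | D = d] and lambda_c(d) = E[v_ic | D = d]
    by simulating the choice model: draw the error pair, form utilities,
    take the argmax, and average the errors within each chosen alternative."""
    rng = np.random.default_rng(seed)
    v = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n_sims)
    u = np.column_stack([psi_h + v[:, 0], psi_c + v[:, 1], np.zeros(n_sims)])
    d = u.argmax(axis=1)  # 0 = h, 1 = c, 2 = n
    return {alt: (v[d == k, 0].mean(), v[d == k, 1].mean())
            for k, alt in enumerate(["h", "c", "n"])}

lams = simulated_lambdas(psi_h=0.3, psi_c=0.1, rho=0.2)
# Households selecting Head Start have above-average unobserved demand for it;
# households choosing home care have below-average demand for both programs.
```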
VII.C. Identification
To demonstrate identification of the selection coefficients {γdh, γdc} it is useful to eliminate the
main effect of the covariates by differencing (15) across values of the program offer Zi as follows:
E [Yi|Xi, Zi = 1, Di = d]− E [Yi|Xi, Zi = 0, Di = d] = γdh [λh (Xi, 1, d)− λh (Xi, 0, d)]
+γdc [λc (Xi, 1, d)− λc (Xi, 0, d)] . (16)
This difference measures how selected test score outcomes in a particular care alternative respond
to a Head Start offer. Responses in selected outcomes are driven entirely by compositional changes
– i.e. from compliers switching between alternatives.
With two values of the covariates Xi, equation (16) can be evaluated twice, yielding two equa-
tions in the two unknown selection coefficients. Appendix F details the conditions under which this
system can be solved and provides expressions for the selection coefficients in terms of population
moments. Additive separability of the potential outcomes in observables and unobservables is es-
sential for identification. If the selection coefficients in (16) were allowed to depend on Xi, there
would be two unknowns for every value of the covariates and identification would fail. Heuristically
then, our key assumption is that selection on unobservables works “the same way” for every value of
the covariates, which allows us to exploit variation in selected outcome responses across subgroups
to infer the parameters governing the selection process.
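With two covariate cells, solving (16) amounts to a 2×2 linear system. The moments below are hypothetical numbers chosen only to show the mechanics, including the failure mode when the system is singular:

```python
import numpy as np

# Offer-induced changes in the two control-function terms, one row per
# covariate cell x in {0, 1} (hypothetical values):
dlam = np.array([[0.40, -0.15],
                 [0.25, -0.30]])
# Offer-induced changes in selected mean outcomes for the same cells:
dy = np.array([0.08, 0.02])

gamma_dh, gamma_dc = np.linalg.solve(dlam, dy)  # the two selection coefficients

# Identification fails if the rows of dlam are proportional (singular system),
# i.e. if offers shift the control functions the same way in every cell.
```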
To understand this restriction, suppose (as turns out to be the case) that college educated
mothers are more likely to enroll their children in competing preschools when denied access to
Head Start. Our model allows Head Start and other preschools to have different average treatment
effects on the children of more and less educated mothers. However, it rules out the possibility
that children with college educated mothers sort into Head Start on the basis of potential test
score gains, while children of less educated mothers exhibit no sorting on these gains. As in Brinch,
Mogstad and Wiswall (forthcoming), this restriction is testable when Xi takes more than two values
because it implies we should obtain similar estimates of the selection coefficients based on variation
in different subsets of the covariates. We provide evidence along these lines by contrasting estimates
that exploit site variation with estimates based upon household covariates.
VII.D. Estimation
To make estimation tractable, we approximate ψh (X,Z) and ψc(X) with flexible linear functions.
The non-separability of ψh(X,Z) is captured by linear interactions between Z and the 8 covari-
ates used in our earlier 2SLS analysis. We also allow interactions with the 183 experimental site
indicators but, to avoid incidental parameters problems, constrain the coefficients on those dum-
mies to belong to one of K discrete categories. Results from Bonhomme and Manresa (2015) and
Saggio (2012) suggest that this “grouped fixed effects” approach should yield good finite sample
performance even when some sites have as few as 10 observations. As described in Appendix G,
we choose the number of site groups K using the Bayesian Information Criterion (BIC). Finally,
all of the interacting variables (both site groups and covariates) are allowed to influence the correlation parameter ρ(X). We assume that arctanh ρ(X) = (1/2) ln((1 + ρ(X))/(1 − ρ(X))) is linear in these variables, a standard transformation that ensures the correlation is between -1 and 1 (Cox 2008).
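As a quick illustration of this reparameterization (a sketch, not the paper's code): the model works with the unconstrained Fisher-z index and maps back to a correlation with the hyperbolic tangent, so any value of the linear index yields an admissible correlation.

```python
import math

# Sketch of the correlation parameterization: the model is linear in the
# unconstrained index a(X) = arctanh(rho(X)), and the inverse map
# rho = tanh(a) keeps the correlation inside (-1, 1) for any index value.
def rho_from_index(a):
    return math.tanh(a)  # equals (exp(2 * a) - 1) / (exp(2 * a) + 1)

def index_from_rho(rho):
    # Fisher's z: arctanh(rho) = 0.5 * ln((1 + rho) / (1 - rho))
    return 0.5 * math.log((1.0 + rho) / (1.0 - rho))

# The mapped correlation stays strictly inside (-1, 1) for any index:
for a in (-10.0, -0.3, 0.0, 2.0, 10.0):
    assert -1.0 < rho_from_index(a) < 1.0
```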
The model is fit in two steps. First, we estimate the parameters of the Probit model via sim-
ulated maximum likelihood, evaluating choice probabilities with the Geweke-Hajivassiliou-Keane
(GHK) simulator (Geweke 1989; Hajivassiliou and McFadden 1998; Keane 1994). Models including
site groups are estimated with an algorithm that alternates between maximizing the likelihood and
reassigning groups, described in detail in Appendix G. Second, we use the parameters of the choice
model to form control function estimates (λh(Xi, Zi, Di), λc(Xi, Zi, Di)), which are then used in a second step regression of the form:

Yi = θn0 + Xi′θnx + γnh λh(Xi, Zi, Di) + γnc λc(Xi, Zi, Di)    (17)
   + 1{Di = c}[(θc0 − θn0) + Xi′(θcx − θnx) + (γch − γnh) λh(Xi, Zi, c) + (γcc − γnc) λc(Xi, Zi, c)]
   + 1{Di = h}[(θh0 − θn0) + Xi′(θhx − θnx) + (γhh − γnh) λh(Xi, Zi, h) + (γhc − γnc) λc(Xi, Zi, h)]
   + εi.
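The GHK simulator used in the first step can be illustrated in a two-dimensional special case. This is a minimal sketch of the general idea — sampling each latent error from its truncated conditional distribution and averaging conditional probabilities — not the multinomial implementation used in the paper.

```python
import random
from statistics import NormalDist

# Minimal GHK sketch: simulate the probability that a bivariate standard
# normal vector (v1, v2) with correlation rho satisfies v1 < b1 and v2 < b2.
def ghk_bivariate(b1, b2, rho, draws=20_000, seed=0):
    nd = NormalDist()
    rng = random.Random(seed)
    s = (1.0 - rho * rho) ** 0.5
    p1 = nd.cdf(b1)  # P(v1 < b1)
    total = 0.0
    for _ in range(draws):
        # Draw v1 from N(0, 1) truncated to (-inf, b1) by inverse-CDF sampling.
        u = max(rng.random(), 1e-12)
        v1 = nd.inv_cdf(u * p1)
        # Conditional on v1, v2 is N(rho * v1, 1 - rho^2); accumulate the
        # conditional probability that v2 < b2.
        total += nd.cdf((b2 - rho * v1) / s)
    return p1 * total / draws

p = ghk_bivariate(0.5, -0.2, 0.4)
```

When rho = 0 the conditional term is constant and the simulator reduces to the exact product of marginal probabilities, a useful sanity check.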
The covariate vector Xi is normed to have unconditional mean zero, so the intercepts θd0 can be
interpreted as average potential outcomes. Hence, the differences θh0 − θn0 and θh0 − θc0 capture
average treatment effects of Head Start and other preschools relative to no preschool. To avoid
overfitting, we restrict variables other than the site types and 8 key covariates to have common
coefficients across care alternatives.15 Inference on the second step parameters is conducted via the
nonparametric block bootstrap, clustered by experimental site.
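The mechanics of the clustered block bootstrap can be sketched as follows. The regression here is a deliberately simplified stand-in for the second step (a single simulated regressor in place of the control functions), purely for illustration.

```python
import random

# Sketch of second-step inference: a nonparametric block bootstrap that
# resamples clusters (sites) with replacement and re-runs the regression
# on each resample. All data below are simulated.
def ols_slope(pairs):
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    return sxy / sxx

def block_bootstrap_se(sites, reps=500, seed=0):
    """sites: list of clusters, each a list of (regressor, outcome) pairs."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        resample = [rng.choice(sites) for _ in sites]
        slopes.append(ols_slope([obs for site in resample for obs in site]))
    m = sum(slopes) / reps
    return (sum((s - m) ** 2 for s in slopes) / (reps - 1)) ** 0.5

def make_site(rng, n=20, slope=0.5):
    effect = rng.gauss(0, 0.3)  # site-level shock motivating clustering
    out = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        out.append((x, slope * x + effect + rng.gauss(0, 1)))
    return out

rng = random.Random(1)
sites = [make_site(rng) for _ in range(30)]
point = ols_slope([obs for site in sites for obs in site])
se = block_bootstrap_se(sites)
```

Resampling whole sites, rather than individual children, preserves the within-site dependence induced by site-level shocks.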
VIII. Model Estimates
VIII.A. Model Parameters
Table VI reports estimates of the full choice model obtained from exploiting both covariates and site
heterogeneity. The BIC selects a specification with six site groups for the full model (see Appendix
Table A.VI), with group shares that vary between 12% and 21% of the sample. These assignments
comprise the site groups used in the earlier 2SLS analysis of Table V.
Columns (1) and (2) of Table VI show the coefficients governing the mean utility of enrollment
in Head Start. We easily reject the null hypothesis that the program offer interaction effects in the
Head Start utility equation are homogeneous. Panel A of Column (2) indicates that the effects of
an offer are greater at high-quality centers and lower among non-poor children that would typically
be ineligible for Head Start enrollment.16 Panel B of Column (2) reveals the presence of significant
heterogeneity across site groups in the response to a program offer, which likely reflects unobserved
market features such as the presence or absence of state provided preschool.
Column (4) of Table VI reports the parameters governing the correlation in unobserved tastes
for Head Start and competing programs. The correlation is positive for four of six site groups,
indicating that most households view preschool alternatives as more similar to each other than
to home care. This establishes that the IIA condition underlying logit-based choice models is
empirically violated. While there is some evidence of heterogeneity in the correlation based upon
mother’s education, we cannot reject the joint null hypothesis that the correlation is constant across
covariate groups. However, we easily reject that the correlation is constant across site groups.
The many sources of heterogeneity captured by the choice model yield substantial variation
15 This restriction cannot be statistically rejected and has minimal effects on the point estimates.
16 The quality variable aggregates information on center characteristics (teacher and center director education and qualifications, class size) and practices (variety of literacy and math activities, home visiting, health and nutrition) measured in interviews with center directors, teachers, and parents of children enrolled in the preschool center.
in predicted enrollment shares for Head Start and competing preschools. Appendix Figure A.I
shows that these predictions match variation in choice probabilities across subgroups. Moreover,
diagnostics indicate this variation is adequate to secure separate identification of the second stage
control function coefficients. From (16), the model is under-identified if, for any alternative d, the
control function difference λh (Xi, 1, d) − λh (Xi, 0, d) is linearly dependent on the corresponding
difference λc (Xi, 1, d) − λc (Xi, 0, d). Appendix Figure A.II shows that the deviations from linear
dependence are visually apparent and strongly statistically significant.
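A simple way to screen for such linear dependence is the squared cosine between the two vectors of cell-level control-function changes, with values near 1 signaling (near) proportionality. The numbers below are hypothetical; the formal diagnostics are in Appendix F.

```python
# Sketch of the under-identification diagnostic: for each alternative d, the
# offer-induced changes in the two control functions across covariate cells
# must not be proportional to one another. All vectors here are hypothetical.
def squared_cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    uu = sum(a * a for a in u)
    vv = sum(b * b for b in v)
    return dot * dot / (uu * vv)

d_lambda_h = [0.31, 0.12, 0.27, 0.05, 0.19]   # hypothetical cell-level changes
d_lambda_c = [-0.10, 0.22, -0.04, 0.15, 0.02]
cos2 = squared_cosine(d_lambda_h, d_lambda_c)  # near 0: far from dependence
```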
Table VII reports second-step estimates of the parameters in (17). Column (1) omits all controls
and simply reports differences in mean test scores across care alternatives (the omitted category
is home care). Head Start students score 0.2 standard deviations higher than students in home
care, while the corresponding difference for students in competing preschools is 0.26 standard
deviations. Column (2) adds controls for baseline characteristics. Because the controls include a
third order polynomial in baseline test scores, this column can be thought of as reporting “value-
added” estimates of the sort that have received renewed attention in the education literature (Kane,
Rockoff and Staiger 2008; Rothstein 2010; Chetty, Friedman and Rockoff 2014a). Surprisingly,
adding these controls does little to the estimated effect of Head Start relative to home care but
improves precision. By contrast, the estimated impact of competing preschools relative to home
care falls significantly once controls are added.
Columns (3)-(5) add control functions adjusting for selection on unobservables based on choice
models with covariates, site groups, or both. Unlike the specifications in previous columns, these
control function terms exploit experimental variation in offer assignment. Adjusting for selection
on unobservables dramatically raises the estimated average impact of Head Start relative to home
care. However, the estimates are fairly imprecise. Imprecision in estimates of average treatment
effects is to be expected given that these quantities are only identified via parametric restrictions
that allow us to infer the counterfactual outcomes of always takers and never takers. Below we
consider average treatment effects on compliers, which are estimated more precisely.
While some of the control function coefficient estimates are also imprecise, we reject the hypotheses of no selection on levels (γkd = 0 ∀ (k, d)) and no selection on gains (γdk = γjk for d ≠ j, k ∈ {h, c}) in our most precise specification. The selection coefficient estimates exhibit some interesting patterns. One regularity is that estimates of γhh − γnh are negative in all specifications
(though insignificant in the model using site groups only). In other words, children who are more
likely to attend Head Start receive smaller achievement benefits when shifted from home care to
Head Start. This “reverse-Roy” pattern of negative selection on test score gains suggests large
benefits for children with unobservables making them less likely to attend the program.17 Other
preschool programs, by contrast, seem to exhibit positive selection on gains: the estimated dif-
ference γcc − γnc is always positive and in the full model is significant. A possible interpretation
of these patterns is that Head Start is viewed by parents as a preschool of last resort, leading to
17 Walters (2014) finds a related pattern of negative selection in the context of charter schools, though in his setting the fallback potential outcome (as opposed to the charter school outcome) appears to respond positively to unobserved characteristics driving program participation.
enrollment by the families most desperate to get help with child care. Such households cannot be
selective about whether the local Head Start center is a good match for their child, which results
in lower test score gains. By contrast, households considering enrollment in substitute preschools
may have greater resources that afford them the luxury of being more selective about whether such
programs are a good match for their child.
Estimates of the control function coefficients are very similar in columns (3) and (4), though
the estimates are less precise when only site group interactions are used. This indicates that the
implied nature of selection is the same regardless of whether identification is based on site or
covariate interactions, lending credibility to our assumption that selection works the same way
across subgroups. Also supporting this assumption are the results of score tests for the additive
separability of control functions and covariates, reported in the bottom row of Table VII. These
tests are conducted by regressing residuals from the two-step models on interactions of the control
functions with covariates and site groups, along with the main effects from equation (15). In all
specifications, we fail to reject additive separability at conventional levels (see Appendix F for
some additional goodness of fit tests). While these tests do not have the power to detect all forms
of nonseparability, the correspondence between estimates based on covariate and site variation
suggests that our key identifying assumption is reasonable.
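The score-test mechanics can be sketched as follows. The data are simulated under the null of separability, and a single interaction regressor stands in for the full set of interactions used in the paper.

```python
import random

# Sketch of an LM-style score test for additive separability: regress the
# two-step residuals on an interaction of a control function with a covariate
# and compare n * R^2 with the chi-squared critical value (3.84 for one
# restriction at the 5% level). Data are simulated, purely illustrative.
def lm_statistic(z, resid):
    """n * R^2 from a univariate regression of resid on z."""
    n = len(z)
    mz = sum(z) / n
    mr = sum(resid) / n
    szz = sum((a - mz) ** 2 for a in z)
    srr = sum((b - mr) ** 2 for b in resid)
    szr = sum((a - mz) * (b - mr) for a, b in zip(z, resid))
    return n * (szr * szr) / (szz * srr)

rng = random.Random(0)
lam = [rng.gauss(0, 1) for _ in range(1000)]    # control function values
cov = [rng.random() for _ in range(1000)]       # interacting covariate
resid = [rng.gauss(0, 1) for _ in range(1000)]  # residuals under the null
interaction = [l * c for l, c in zip(lam, cov)]
lm = lm_statistic(interaction, resid)
reject = lm > 3.84  # 5% critical value for one restriction
```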
VIII.B. Treatment Effects
Table VIII reports average treatment effects on compliers for each of our selection-corrected models.
The first row uses the model parameters to compute the pooled LATEh, which is nonparametrically
identified by the experiment. The model estimates line up closely with the nonparametric estimate
obtained via IV. Appendix Figure A.III shows that this close correspondence between model and
non-parametric LATEh holds even across different covariate groups, across which there is enormous
heterogeneity. The remaining rows of Table VIII report estimates of average effects for compliers
relative to specific care alternatives (i.e. subLATEs).18 Estimates of the subLATE for n-compliers,
LATEnh, are stable across specifications and indicate that the impact of moving from home care to
Head Start is large – on the order of 0.37 standard deviations. By contrast, estimates of LATEch,
though more variable across specifications, never differ significantly from zero.
Our estimates of LATEnh are somewhat smaller than the average treatment effects of Head
Start relative to home care displayed in Table VII. This is a consequence of the reverse Roy pattern
captured by the control function coefficients: families willing to switch from home care to Head
Start in response to an offer have stronger than average tastes for Head Start, implying smaller
than average gains. We can reject that predicted effects of moving from home care to Head Start
are equal for n-compliers and n-never takers, implying that this pattern is statistically significant
(p = 0.038). Likewise, LATEhc is slightly negative, while the average treatment effect of Head
Start relative to other preschools is positive (0.47 - 0.11). In other words, switching from c to h
18 We compute the subLATEs by integrating over the relevant regions of Xi, vih and vic as described in Appendix F.
reduces test scores for c-compliers, but would improve the score of an average student. This reflects
a combination of above average tastes for competing preschools among c-compliers and positive
selection on gains into other preschools. Note that the control function coefficients in Table VII
capture selection conditional on covariates and sites, while the treatment effects in Table VIII
average over the distribution of observables for each subgroup. The subLATE estimates show that
the selection patterns discussed above still hold when variation in effects across covariate and site
groups is taken into account.
Another interesting point of comparison is to the 2SLS estimates of Table V. The 2SLS ap-
proach found a somewhat smaller LATEnh than our two-step estimator. It also found that
Head Start preschools were slightly more effective at raising test scores than competing programs
(LATEch > 0), while our full control function estimates suggest the opposite. Importantly, the
control function estimates corroborate the failed overidentification tests of Table V by detecting
substantial heterogeneity in the underlying subLATEs. This can be seen in the last four rows of
Table VIII, which report estimates for the top and bottom quintiles of the model-predicted distri-
bution of LATEh (see Appendix F for details). Fixing each group’s Sc at the population average
brings estimates for the top and bottom quintiles closer together, but a large gap remains due to
subLATE heterogeneity.
Finally, it is worth comparing our findings with those of Feller et al. (2014), who use the principal
stratification framework of Frangakis and Rubin (2002) to estimate effects on n- and c-compliers
in the HSIS. They also find large effects for compliers drawn from home and negligible effects
for compliers drawn from other preschools, though their point estimate of LATEnh is somewhat
smaller than ours (0.21 vs. 0.37). This difference reflects a combination of different test score
outcomes (Feller et al. look only at PPVT scores) and different modeling assumptions. Since
neither estimation approach nests the other, it is reassuring that we find qualitatively similar
results.
IX. Policy Counterfactuals
We now use our model estimates to consider policy counterfactuals that are not non-parametrically
identified by the HSIS experiment.
IX.A. Rationed Substitutes
In the cost-benefit analysis of Section VI we assumed that seats at competing preschools were not
rationed. While this assumption is reasonable in states with universal preschool mandates, other
areas may have preschool programs that face relatively fixed budgets and offer any vacated seats
to new children. In this case, increases in Head Start enrollment will create opportunities for new
children to attend substitute preschools rather than generating cost savings in these programs. Our
model-based estimates allow us to assess the sensitivity of our cost/benefit results to the possibility
of rationing in competing programs.
From (9), the marginal value of public funds under rationing depends on LATEnc – the average
treatment effect of competing preschools on “n-to-c compliers” who would move from home care
to a competing preschool program in response to an offered seat. We compute the MVPFδ,rat
under three alternative assumptions regarding this parameter. First, we consider the case where
LATEnc = 0. Next, we consider the case where the average test score effect of competing preschools
for marginal students equals the corresponding effect for Head Start compliers drawn from home
care (i.e. LATEnc = LATEnh). Finally, we use our model to construct an estimate for LATEnc.
Specifically, we compute average treatment effects of competing preschools relative to home care for
students who would be induced to move along this margin by an increase in Ui(c) equal to the
utility value of the Head Start offer coefficient.19 This calculation assumes that the utility value
households place on an offered seat at a competing program is comparable to the value of a Head
Start offer.
Table IX shows the results of this analysis. Setting LATEnc = 0 yields an MVPFδ,rat of 1.10, replicating the naive non-rationed calculation with φc = 0. Both of these
cases ignore costs and benefits due to substitution from competing programs. Assuming that
LATEnc = LATEnh produces a benefit-cost ratio of 2.36. Finally, our preferred model estimates
from Section VIII predict that LATEnc = 0.294, which produces a ratio of 2.02. These results
suggest that, under plausible assumptions about the effects of competing programs relative to home
care, accounting for the benefits generated by vacated seats in these programs yields estimated social
returns larger than those displayed in panel B of Table IV.
IX.B. Structural Reforms
We next predict the social benefits of a reform that expands Head Start by making it more attractive
rather than by extending offers to additional households. This reform is modeled as an improvement
in the structural program feature f , as described in Section V. Examples of such reforms might
include increases in transportation services, outreach efforts, or spending on other services that
make Head Start attractive to parents. Increases in f are assumed to draw additional households
into Head Start but to have no effect on potential outcomes, which rules out peer effects generated
by changes in student composition. We use the estimates from our preferred model to compute
marginal treatment effects and marginal values of public funds for such reforms, treating changes
in f as shifts in the mean Head Start utility ψh(X,Z).
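The mechanics of such a reform can be sketched in a stylized discrete-choice model. All parameter values below are hypothetical and the paper's model is far richer; the point is only to show how a shift in the mean Head Start utility moves households across alternatives and determines the substitution share of marginal entrants.

```python
import random

# Illustrative simulation of a structural reform: raising the mean Head Start
# utility by f draws additional households into the program, some from home
# care (n) and some from competing preschools (c). Utility intercepts and the
# taste correlation are hypothetical.
def choice(v_h, v_c, psi_h, psi_c):
    utils = {"h": psi_h + v_h, "c": psi_c + v_c, "n": 0.0}
    return max(utils, key=utils.get)

def shares(f, psi_h=-0.3, psi_c=-0.2, rho=0.4, draws=50_000, seed=0):
    rng = random.Random(seed)  # common draws across f for comparability
    s = (1.0 - rho * rho) ** 0.5
    counts = {"h": 0, "c": 0, "n": 0}
    for _ in range(draws):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        counts[choice(z1, rho * z1 + s * z2, psi_h + f, psi_c)] += 1
    return {k: v / draws for k, v in counts.items()}

base, reformed = shares(0.0), shares(0.5)
# Share of marginal Head Start entrants drawn from competing preschools:
s_c_marginal = (base["c"] - reformed["c"]) / (reformed["h"] - base["h"])
```

Because the same simulated households are used at both values of f, every marginal entrant is drawn from either c or n, so the substitution share necessarily lies between zero and one.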
Panel A of Figure I displays predicted effects of structural reforms on test scores. Since the
program feature has no intrinsic scale, the horizontal axis is scaled in terms of the Head Start
attendance rate, with a vertical line indicating the current rate (f = 0). The right axis measures S⃗c – the share of marginal students drawn from other preschools. The left axis measures test
19 Ideally we would compute LATEnc for students who do not receive offers to competing programs but would attend these programs if offered. Since we do not observe offers to substitute preschools, it is not possible to distinguish between non-offered children and children who decline offers. Our estimate of LATEnc therefore captures a mix of effects for compliers who would respond to offers and children who currently decline offers but would be induced to attend competing programs if these programs became more attractive.
score effects. The Figure plots average treatment effects for subgroups of marginal students drawn
from home care and other preschools, along with MTEh, a weighted average of alternative-specific
effects.
Figure I shows that Head Start’s effects on marginal home compliers increase modestly with
enrollment and then level out in the neighborhood of the current program scale (f = 0). This
pattern is driven by reverse Roy selection for children drawn from home care: increases in f attract
children with weaker tastes for Head Start, leading to increases in effects for compliers who would
otherwise stay home. Predicted effects for children drawn from other preschools are slightly negative
for all values of f . At the current program scale, the model predicts that the share of marginal
students drawn from other preschools is larger for structural reforms than for an increase in the
offer rate (0.44 vs. 0.35). This implies that marginal compliers are more likely to be drawn from
other preschools than inframarginal compliers. As a result, the value of MTEh is comparable to
the experimental LATEh, despite very large effects on marginal children drawn from home care
(roughly 0.5σ).
To investigate the consequences of this pattern for the social return to Head Start, Panel B plots
MVPFf, the marginal value of public funds for structural reforms. This Figure relies on the same parameter calibrations as Table IV. Calculations of MVPFf must account for the fact that changes
in structural program features may increase the direct costs of the program. This effect is captured
in (12) by the term η which gives the elasticity of the per-child cost of Head Start with respect to
the scale of the program. Without specifying the program feature being manipulated, there is no
natural value for η. We start with the extreme case where η = 0, which allows us to characterize
costs and benefits associated with reforms that draw in children on the margin without changing
the per-capita cost of the program. We then consider how the cost-benefit calculus changes when
η > 0.
As in our basic cost/benefit analysis, the results in panel B of Figure I show that accounting for
the public savings associated with program substitution has an important effect on the marginal
value of public funds. The red curve plots MVPFf setting φc = 0. This calibration suggests
a marginal value of public funds slightly above one at the current program scale, similar to the
naive calibration in Table IV. The blue curve accounts for public savings by setting φc equal to
our preferred value of 0.75φh. This generates an upward shift and steepens the MVPFf schedule,
indicating that both marginal and average social returns increase with program scale. The implied
marginal value of public funds at the current program scale (f = 0) is above 2. This is larger than
the MVPFδ of 1.84 reported in Table IV, which indicates that the social returns to marginal expansions
that shift the composition of compliers are greater than those for expansions that simply raise the
offer rate.
The final scenario in Panel B shows MVPFf when φc = 0.75φh and η = 0.5.20
20 For this case, marginal costs are obtained by solving the differential equation φh′(f) = η φh(f) (∂ ln P(Di = h)/∂f) with the initial condition φh(0) = $8,000. This yields the solution φh(f) = $8,000 exp(η (ln P(Di = h) − ln P0)), where P0 is the initial Head Start attendance rate.
This scenario implies sharply rising marginal costs of Head Start provision: an increase in f that doubles enrollment raises per-capita costs by 50 percent. In this simulation the marginal value of public funds is
roughly equal to one when f = 0, and falls below one for higher values. Hence, if η is at least 0.5,
a dollar increase in Head Start spending generated by structural reform will result in less than one
dollar transferred to Head Start applicants. This exercise illustrates the quantitative importance
of determining provision costs when evaluating specific policy changes such as improvements to
transportation services or marketing.
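The cost schedule implied by footnote 20 can be checked directly. The baseline attendance rate below is a hypothetical placeholder, not an estimate from the paper.

```python
# Worked check of the footnote-20 solution: solving
# phi_h'(f) = eta * phi_h(f) * d ln P(D = h)/df with phi_h(0) = $8,000
# gives phi_h = 8000 * (P / P0) ** eta, so per-child cost depends on f only
# through the attendance rate raised to the elasticity eta.
def per_child_cost(p, p0=0.40, eta=0.5, base=8000.0):
    # p0 is an illustrative baseline attendance rate.
    return base * (p / p0) ** eta

baseline = per_child_cost(0.40)   # $8,000 by construction
expanded = per_child_cost(0.48)   # cost after a 20% rise in attendance
```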
Our analysis of structural reforms suggests increasing returns to the expansion of Head Start in
the neighborhood of the current program scale – expansions will draw in households with weaker tastes for preschool and above-average potential gains. These findings imply that structural reforms
targeting children who are currently unlikely to attend Head Start and children that are likely to
be drawn from non-participation will generate larger effects than reforms that simply create more
seats. Our results also echo other recent studies finding increasing returns to early-childhood invest-
ments, though the mechanism generating increasing returns in these studies is typically dynamic
complementarity in human capital investments rather than selection and effect heterogeneity (see,
e.g., Cunha, Heckman and Schennach 2010).
X. Conclusion
Our analysis suggests that Head Start, in its current incarnation, passes a strict cost-benefit test
predicated only upon projected effects on adult earnings. It is reasonable to expect that this
conclusion would be strengthened by incorporating the value of any impacts on crime (e.g. as
in Lochner and Moretti [2004] and Heckman et al. [2010]), or other externalities such as civic
engagement (Milligan, Moretti and Oreopoulos 2004), or by incorporating the value to parents of
subsidized care (e.g., as in Aaberge et al. 2010). We find evidence that Head Start generates
especially large benefits for children who would not otherwise attend preschool and for children
with weak unobserved tastes for the program. This suggests that the program’s rate of return
can be boosted by reforms that target new populations, though this necessitates the existence of a
cost-effective technology for attracting these children.
The finding that returns are on average greater for nonparticipants is potentially informative for
the debate over calls for universal preschool, which might reach high return households. However,
it is important to note that if competing state level preschool programs become ubiquitous, the
rationale for expansions to federal preschool programs could be undermined. To see this, consider
how the marginal value of expanding Head Start changes as the compliance share Sc approaches
one, so that nearly all denied Head Start applicants would otherwise enroll in competing programs.
If Head Start and competing programs have equivalent effects on test scores, then (8) indicates that
we should decide between federal and state level provision based entirely on cost criteria. Since state
programs are often cheaper (CEA 2015) and are expanding rapidly, the case for federal preschool
may actually be weaker now than at the time of the Head Start Impact Study.
It is important to note some other limitations to our analysis. First, our cost-benefit calculations
rely on literature estimates of the link between test score effects and earnings gains. These calcu-
lations are necessarily speculative, as the only way to be sure of Head Start’s long-run effects is to
directly measure long-run outcomes for HSIS participants. Second, we have ignored the possibility
that substantial changes to program features or scale could, in equilibrium, change the education
production technology. For example, implementing recent proposals for universal preschool could
generate a shortage of qualified teachers (Rothstein, forthcoming). Finally, we have ignored the
possibility that administrative program costs might change with program scale, choosing instead
to equate average with marginal provision costs.
Despite these caveats, our analysis has shown that accounting for program substitution in the
HSIS experiment is crucial for an assessment of the Head Start program’s costs and benefits. Similar
issues arise in the evaluation of job training programs (Heckman et al. 2000), health insurance
(Finkelstein et al. 2012), and housing subsidies (Kling, Liebman and Katz 2007; Jacob and Ludwig
2012). The tools developed here are potentially applicable to a wide variety of evaluation settings
where data on enrollment in competing programs are available.
References
1. Aaberge, Rolf, Manudeep Bhuller, Audun Langørgen, and Magne Mogstad, “The Distribu-tional Impact of Public Services When Needs Differ,” Journal of Public Economics, 94 (2010),549-562.
2. Abdulkadiroglu, Atila, Joshua Angrist, and Parag Pathak, “The Elite Illusion: AchievementEffects at Boston and New York Exam Schools,” Econometrica, 82 (2014), 137-196.
3. Angrist, Joshua, Guido Imbens, and Donald Rubin, “Identification of Causal Effects usingInstrumental Variables,” Journal of the American Statistical Association, 91 (1996), 444-455.
4. Angrist, Joshua, and Jorn-Steffen Pischke, Mostly Harmless Econometrics, Princeton, NJ:Princeton University Press, 2009.
5. Barnett, W. Steven, “Effectiveness of Early Educational Intervention,” Science, 333 (2011),975-978.
6. Bitler, Marianne, Thurston Domina, and Hilary Hoynes, “Experimental Evidence on Distri-butional Effects of Head Start,” NBER Working Paper No. 20434, 2014.
7. Bloom, Howar, and Christina Weiland, “Quantifying Variation in Head Start Effects on YoungChildren’s Cognitive and Socio-Emotional Skills Using Data from the National Head StartImpact Study,” MDRC Report, 2015.
8. Bonhomme, Stephane, and Elena Manresa, “Grouped Patterns of Heterogeneity in PanelData,” Econometrica, 83 (2015), 1147-1184.
9. Bound, John, David Jaeger, and Regina Baker, “Problems With Instrumental Variables Es-timation When the Correlation Between the Instruments and the Endogenous ExplanatoryVariable is Weak,” Journal of the American Statistical Association, 90 (1995), 443-450.
10. Brinch, Christian, Magne Mogstad, and Matthew Wiswall, “Beyond LATE With a DiscreteInstrument,” Journal of Political Economy, forthcoming.
11. Carneiro, Pedro, and Rita Ginja, “Long-Term Impacts of Compensatory Preschool on Healthand Behavior: Evidence from Head Start,” American Economic Journal: Economic Policy,6 (2014), 135-173.
12. Cascio, Elizabeth, and Diane Whitmore Schanzenbach, “The Impacts of Expanding Accessto High-Quality Preschool Education,” Brookings Papers on Economic Activity, Fall (2013),127-192.
13. Chetty, Raj, “Sufficient Statistics for Welfare Analysis: A Bridge Between Structural andReduced-form Methods,” Annual Review of Economics, 1 (2009), 451-488.
14. Chetty, Raj, John Friedman, Nathaniel Hilger, Emmanuel Saez, Diane Whitmore Schanzen-bach, and Danny Yagan, “How Does Your Kindergarten Classroom Affect Your Earnings?Evidence From Project STAR,” Quarterly Journal of Economics, 126 (2011), 1593-1660.
30
15. Chetty, Raj, John Friedman, and Jonah Rockoff, “Measuring the Impacts of Teachers I:Measuring Bias in Teacher Value-added Estimates,” American Economic Review, 104 (2014a),2593-2632.
16. Chetty, Raj, John Friedman, and Jonah Rockoff, “Measuring the Impacts of Teachers II:Teacher Value-Added and Student Outcomes in Adulthood,” American Economic Review,104 (2014b), 2633-2679.
17. Chetty, Raj, Nathaniel Hendren, Patrick Kline, and Emmanuel Saez, “Where is the Land ofOpportunity? The Geography of Intergenerational Mobility in the United States,” QuarterlyJournal of Economics, 129 (2014), 1553-1623.
18. Congressional Budget Office (CBO), “Effective Marginal Tax Rates for Low- and Moderate-Income Workers,” 2012, available at: https://www.cbo.gov/sites/default/files/11-15-2012-MarginalTaxRates.pdf.
19. Council of Economic Advisers (CEA), “The Economics of Early Childhood Investments,”2014, available at: https://www.whitehouse.gov/sites/default/files/docs/early childhood report1.pdf.
20. Cox, Nicholas, “Speaking Stata: Correlation with Confidence, or Fisher’s Z Revisited,” TheStata Journal, 8 (2008), 413-439.
21. Cunha, Flavio, James Heckman, and Susanne Schennach, “Estimating the Technology ofCognitive and Noncognitive Skill Formation,” Econometrica 78 (2010), 883-931.
22. Currie, Janet, “Early Childhood Education Programs,” Journal of Economic Perspectives,15 (2001), 213-238.
23. Currie, Janet, and Duncan Thomas, “Does Head Start Make a Difference?” American Eco-nomic Review, 85 (1995), 341-364.
24. Deming, David, “Early Childhood Intervention and Life-Cycle Skill Development: Evidencefrom Head Start,” American Economic Journal: Applied Economics, 1 (2009), 111-134.
25. Dubin, Jeffrey, and Daniel McFadden, ““An Econometric Analysis of Residential ElectricAppliance Holdings,” Econometrica, 52 (1984), 345-362.
26. Engberg, John, Dennis Epple, Jason Imbrogno, Holger Sieg, and Ron Zimmer, “EvaluatingEducation Programs That Have Lotteried Admission and Selective Attrition,” Journal ofLabor Economics, 32 (2014), 27-63.
27. Federal Register, “Executive Order 13330 of February 24, 2004” 69 (38), 2004, 9185-9187.
28. Feller, Avi, Todd Grindal, Luke Miratrix, and Lindsay Page, “Compared to What? Variationin the Impact of Early Childhood Education by Alternative Care-Type Settings,” Annals ofApplied Statistics, forthcoming.
29. Finkelstein, Amy, Sarah Taubman, Bill Wright, Mira Bernstein, Jonathan Gruber, JosephNewhouse, Heidi Allen, Katherine Baicker, and the Oregon Health Study Group, “The Ore-gon Health Insurance Experiment: Evidence from the First Year,” Quarterly Journal ofEconomics, 127 (2012), 1057-1106.
31
30. Frangakis, Constantine, and Donald Rubin, "Principal Stratification in Causal Inference," Biometrics, 58 (2002), 21-29.
31. Garces, Eliana, Duncan Thomas, and Janet Currie, "Longer-Term Effects of Head Start," American Economic Review, 92 (2002), 999-1012.
32. Gelber, Alexander, and Adam Isen, "Children's Schooling and Parents' Investment in Children: Evidence from the Head Start Impact Study," Journal of Public Economics, 101 (2013), 25-38.
33. Geweke, John, "Bayesian Inference in Econometric Models Using Monte Carlo Integration," Econometrica, 57 (1989), 1317-1339.
34. Gibbs, Chloe, Jens Ludwig, and Douglas Miller, "Does Head Start Do Any Lasting Good?" NBER Working Paper No. 17452, 2011.
35. Hajivassiliou, Vassilis, and Daniel McFadden, "The Method of Simulated Scores for the Estimation of LDV Models," Econometrica, 66 (1998), 863-898.
36. Hall, Peter, The Bootstrap and Edgeworth Expansion, New York, NY: Springer, 1992.
37. Heckman, James, "Sample Selection Bias as a Specification Error," Econometrica, 47 (1979), 153-161.
38. Heckman, James, and Edward Vytlacil, "Local Instrumental Variables and Latent Variable Models for Identifying and Bounding Treatment Effects," Proceedings of the National Academy of Sciences, 96 (1999), 4730-4734.
39. Heckman, James, Neil Hohmann, Jeffrey Smith, and Michael Khoo, "Substitution and Dropout Bias in Social Experiments: A Study of an Influential Social Experiment," Quarterly Journal of Economics, 115 (2000), 651-694.
40. Heckman, James, Seong Hyeok Moon, Rodrigo Pinto, Peter Savelyev, and Adam Yavitz, "The Rate of Return to the High/Scope Perry Preschool Program," Journal of Public Economics, 94 (2010), 114-128.
41. Heckman, James, Rodrigo Pinto, and Peter Savelyev, "Understanding the Mechanisms Through Which an Influential Early Childhood Program Boosted Adult Outcomes," American Economic Review, 103 (2013), 2052-2086.
42. Heckman, James, Sergio Urzua, and Edward Vytlacil, "Instrumental Variables in Models With Multiple Outcomes: The General Unordered Case," Annales d'Economie et de Statistique, 91/92 (2008), 151-174.
43. Hendren, Nathaniel, "The Policy Elasticity," Tax Policy and the Economy, 30 (2016).
44. Hull, Peter, "IsoLATEing: Identifying Counterfactual-Specific Treatment Effects by Stratified Comparison," Working Paper, 2015.
45. Imbens, Guido, and Joshua Angrist, "Identification and Estimation of Local Average Treatment Effects," Econometrica, 62 (1994), 467-475.
46. Jacob, Brian, and Jens Ludwig, "The Effects of Housing Assistance on Labor Supply: Evidence from a Voucher Lottery," American Economic Review, 102 (2012), 272-304.
47. Kane, Thomas, Jonah Rockoff, and Douglas Staiger, "What Does Certification Tell Us About Teacher Effectiveness? Evidence from New York City," Economics of Education Review, 27 (2008), 615-631.
48. Keane, Michael, "A Computationally Practical Simulation Estimator for Panel Data," Econometrica, 62 (1994), 95-116.
49. Kirkeboen, Lars, Edwin Leuven, and Magne Mogstad, "Field of Study, Earnings, and Self-Selection," Quarterly Journal of Economics, forthcoming.
50. Klein, Joe, "Time to Ax Public Programs That Don't Yield Results," Time, July 7th, 2011.
51. Kling, Jeffrey, Jeffrey Liebman, and Lawrence Katz, "Experimental Analysis of Neighborhood Effects," Econometrica, 75 (2007), 83-119.
52. Lafontaine, Francine, and Kenneth White, "Obtaining Any Wald Statistic You Want," Economics Letters, 21 (1986), 35-40.
53. Lee, Chul-In, and Gary Solon, "Trends in Intergenerational Income Mobility," The Review of Economics and Statistics, 91 (2009), 766-772.
54. Lochner, Lance, and Enrico Moretti, "The Effect of Education on Crime: Evidence from Prison Inmates, Arrests, and Self-Reports," American Economic Review, 94 (2004), 155-189.
55. Long, Cuiping, "Experimental Evidence of the Effect of Head Start on Maternal Human Capital Investment," Working Paper, 2015.
56. Ludwig, Jens, and Douglas Miller, "Does Head Start Improve Children's Life Chances? Evidence from a Regression Discontinuity Design," Quarterly Journal of Economics, 122 (2007), 159-208.
57. Ludwig, Jens, and Deborah Phillips, "The Benefits and Costs of Head Start," NBER Working Paper No. 12973, 2007.
58. Mayshar, Joram, "On Measures of Excess Burden and Their Application," Journal of Public Economics, 43 (1990), 263-289.
59. Milligan, Kevin, Enrico Moretti, and Philip Oreopoulos, "Does Education Improve Citizenship? Evidence from the United States and the United Kingdom," Journal of Public Economics, 88 (2004), 1667-1695.
60. Noss, Amanda, "Household Income: 2013," American Community Survey Briefs 13-02, 2014.
61. Puma, Michael, Stephen Bell, Ronna Cook, and Camilla Heid, "Head Start Impact Study: Final Report," Washington, DC: U.S. Department of Health and Human Services, Administration for Children and Families, 2010.
62. Puma, Michael, Stephen Bell, and Camilla Heid, "Third Grade Follow-up to the Head Start Impact Study," Washington, DC: U.S. Department of Health and Human Services, Administration for Children and Families, 2012.
63. Rothstein, Jesse, "Teacher Quality in Educational Production: Tracking, Decay, and Student Achievement," Quarterly Journal of Economics, 125 (2010), 175-214.
64. Rothstein, Jesse, "Teacher Quality Policy When Supply Matters," American Economic Review, 105 (2015), 100-130.
65. Roy, A. D., "Some Thoughts on the Distribution of Earnings," Oxford Economic Papers, 3 (1951), 135-146.
66. Saggio, Raffaele, "Discrete Unobserved Heterogeneity in Discrete Choice Panel Data Models," Master's Thesis, Center for Monetary and Financial Studies, 2012.
67. Schumacher, Rachel, Mark Greenberg, and Janellen Duffy, "The Impact of TANF Funding on State Child Care Subsidy Programs," Washington, DC: Center for Law and Social Policy, 2001.
68. Stossel, John, "Head Start Has Little Effect by Grade School?" Fox Business Television, March 7th, 2014.
69. United States Department of Health and Human Services (US DHHS), Administration for Children and Families, "Child Care and Development Fund Fact Sheet," 2012, available at: http://www.acf.hhs.gov/sites/default/files/occ/ccdf factsheet.pdf.
70. United States Department of Health and Human Services (US DHHS), Administration for Children and Families, "Head Start Program Facts, Fiscal Year 2013," 2013, available at: http://eclkc.ohs.acf.hhs.gov/hslc/mr/factsheets/docs/hs-program-fact-sheet-2011-final.pdf.
71. United States Department of Health and Human Services (US DHHS), Administration for Children and Families, "Head Start Services," 2014, available at: http://www.acf.hhs.gov/programs/ohs/about/head-start.
72. Walters, Christopher, "Inputs in the Production of Early Childhood Human Capital: Evidence from Head Start," American Economic Journal: Applied Economics, 7 (2015), 76-102.
73. Walters, Christopher, "The Demand for Effective Charter Schools," NBER Working Paper No. 20640, 2014.
Figure I. Effects of Structural Reforms

Panel A. Test score effects. Panel B. Marginal value of public funds.

[Figure: Panel A plots MTE_h, MTE_nh, and MTE_ch against the Head Start attendance rate (left axis: test score effect, std. dev.), along with the other-preschool complier share S_c (right axis). Panel B plots the marginal value of public funds against the Head Start attendance rate for three calibrations: φ_c = 0.75φ_h, η = 0 (p = 0.00); φ_c = 0, η = 0 (p = 0.10); and φ_c = 0.75φ_h, η = 0.5 (p = 0.59), where p-values are for the hypothesis MVPF ≤ 1.]

Notes: This figure plots predicted test score effects and marginal values of public funds for various values of the program feature f, which shifts the utility of Head Start attendance. Horizontal axes show the Head Start attendance rate at each f, and a vertical line indicates the HSIS attendance rate (f = 0). Panel A shows marginal treatment effects and competing preschool complier shares. The left axis measures test score effects. MTE_h is the average effect for marginal students, while MTE_nh and MTE_ch are effects for subgroups of marginal students drawn from home care and other preschools. The right axis measures the share of marginal students drawn from other preschools. The shaded region shows a 90-percent symmetric bootstrap confidence interval for MTE_h. Panel B shows predicted marginal values of public funds for structural reforms, using the same parameter calibrations as Table IV. P-values come from bootstrap tests of the hypothesis that the marginal value of public funds is less than or equal to one at f = 0.
Table I. Descriptive Statistics

                                          By offer status                       By preschool choice
                                Offered   Non-offered                     Head     Other      No
                                mean      mean         Differential      Start    centers   preschool
Variable                        (1)       (2)          (3)               (4)      (5)       (6)
Male                            0.494     0.505        -0.011 (0.019)    0.501    0.506     0.492
Black                           0.308     0.298         0.010 (0.010)    0.317    0.353     0.250
Hispanic                        0.376     0.369         0.007 (0.010)    0.380    0.354     0.373
Teen mother                     0.159     0.174        -0.015 (0.014)    0.159    0.169     0.176
Mother married                  0.436     0.448        -0.011 (0.017)    0.439    0.420     0.460
Both parents in household       0.497     0.488         0.009 (0.017)    0.497    0.468     0.499
Mother is high school dropout   0.368     0.397        -0.029 (0.017)    0.377    0.322     0.426
Mother attended some college    0.298     0.281         0.017 (0.016)    0.293    0.342     0.253
Spanish speaker                 0.287     0.273         0.014 (0.011)    0.296    0.274     0.260
Special education               0.136     0.108         0.028 (0.011)    0.134    0.145     0.091
Only child                      0.161     0.139         0.022 (0.012)    0.151    0.190     0.123
Income (fraction of FPL)        0.896     0.896         0.000 (0.024)    0.892    0.983     0.851
Age 4 cohort                    0.448     0.451        -0.003 (0.012)    0.426    0.567     0.413
Baseline summary index          0.003     0.012        -0.009 (0.027)   -0.001    0.106    -0.040
Urban                           0.833     0.835        -0.002 (0.003)    0.834    0.859     0.819
Center provides transportation  0.606     0.604         0.002 (0.005)    0.586    0.614     0.628
Center quality index            0.465     0.470        -0.005 (0.005)    0.452    0.474     0.488
Joint p-value                                           0.506
N                               2,256     1,315         3,571            2,043    598       930

Notes: All statistics weight by the reciprocal of the probability of a child's experimental assignment. Standard errors (in parentheses) are clustered at the center level. The transportation and quality index variables refer to a child's Head Start center of random assignment. The quality variable combines information on center characteristics (teacher and center director education and qualifications, class size) and practices (variety of literacy and math activities, home visiting, health and nutrition). Income is missing for 19 percent of observations; missing values are excluded from the income statistics. The baseline summary index is the average of standardized PPVT and WJIII scores in Fall 2002, with each score standardized to have mean zero and standard deviation one in the control group separately by applicant cohort. The joint p-value is from a test of the hypothesis that all coefficients equal zero.
Table II. Experimental Impacts on Test Scores

              Three-year-old cohort           Four-year-old cohort            Cohorts pooled
              Reduced   First               Reduced   First                Reduced   First
              form      stage     IV        form      stage      IV        form      stage     IV
Time period   (1)       (2)       (3)       (4)       (5)        (6)       (7)       (8)       (9)
Year 1        0.194     0.699     0.278     0.141     0.663      0.213     0.168     0.682     0.247
              (0.029)   (0.025)   (0.041)   (0.029)   (0.022)    (0.044)   (0.021)   (0.018)   (0.031)
N             1,970                         1,601                          3,571
Year 2        0.087     0.356     0.245    -0.015     0.670     -0.022     0.046     0.497     0.093
              (0.029)   (0.028)   (0.080)   (0.037)   (0.023)    (0.054)   (0.024)   (0.020)   (0.049)
N             1,760                         1,416                          3,176
Year 3       -0.010     0.365    -0.027     0.054     0.666      0.081     0.019     0.500     0.038
              (0.031)   (0.028)   (0.085)   (0.040)   (0.025)    (0.060)   (0.025)   (0.020)   (0.050)
N             1,659                         1,336                          2,995
Year 4        0.038     0.344     0.110     -         -          -         -         -         -
              (0.034)   (0.029)   (0.098)
N             1,599

Notes: This table reports experimental estimates of the effects of Head Start on test scores. The outcome is the average of standardized PPVT and WJIII scores, with each score standardized to have mean zero and standard deviation one in the control group separately by applicant cohort and year. Columns (1), (4), and (7) report coefficients from regressions of test scores on an indicator for assignment to Head Start. Columns (2), (5), and (8) report coefficients from first-stage regressions of Head Start attendance on Head Start assignment. The attendance variable is an indicator equal to one if a child attends Head Start at any time prior to the test. Columns (3), (6), and (9) report coefficients from two-stage least squares (2SLS) models that instrument Head Start attendance with Head Start assignment. All models weight by the reciprocal of the probability of a child's experimental assignment and control for sex, race, Spanish language, teen mother, mother's marital status, presence of both parents in the home, family size, special education status, income quartile dummies, urban status, and a cubic polynomial in baseline score. Missing values for covariates are set to zero, and dummies for missing values are included. Standard errors are clustered by center of random assignment.
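With a single instrument and a single endogenous variable, the 2SLS estimates in columns (3), (6), and (9) are simply the ratio of the reduced form to the first stage (the Wald estimator). A minimal sketch using the pooled Year 1 estimates from the table (the helper name `wald_estimate` is ours; this reproduces the point estimate only, since the clustered standard errors require the microdata):

```python
def wald_estimate(reduced_form: float, first_stage: float) -> float:
    """IV/2SLS with one instrument: the offer's effect on test scores
    (reduced form) divided by its effect on Head Start attendance
    (first stage)."""
    return reduced_form / first_stage

# Pooled Year 1 estimates from Table II, columns (7) and (8):
print(round(wald_estimate(0.168, 0.682), 3))  # 0.246, matching column (9)'s 0.247 up to rounding
```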
Table III. Preschool Choices by Year, Cohort, and Offer Status

                                 Offered                          Not offered                      Other center
                       Head     Other      No           Head     Other      No           complier
                       Start    centers    preschool    Start    centers    preschool    share
Time period  Cohort    (1)      (2)        (3)          (4)      (5)        (6)          (7)
Year 1       3-year-olds  0.851    0.058      0.092        0.147    0.256      0.597        0.282
             4-year-olds  0.787    0.114      0.099        0.122    0.386      0.492        0.410
             Pooled       0.822    0.083      0.095        0.136    0.315      0.550        0.338
Year 2       3-year-olds  0.657    0.262      0.081        0.494    0.379      0.127        0.719

Notes: This table reports shares of offered and non-offered students attending Head Start, other center-based preschools, and no preschool, separately by year and age cohort. All statistics are weighted by the reciprocal of the probability of a child's experimental assignment. Column (7) reports estimates of the share of compliers drawn from other preschools, given by minus the ratio of the offer's effect on attendance at other preschools to its effect on Head Start attendance.
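Column (7) can be reproduced directly from columns (1) through (6) using the formula in the notes. A small illustration (the function name is ours):

```python
def other_center_complier_share(offered, not_offered):
    """Share of Head Start compliers drawn from other preschools:
    minus the offer's effect on other-preschool attendance divided by
    its effect on Head Start attendance (Table III, column 7). Each
    argument is a (head_start, other_centers, no_preschool) tuple."""
    d_head_start = offered[0] - not_offered[0]  # offer effect on Head Start
    d_other = offered[1] - not_offered[1]       # offer effect on other centers
    return -d_other / d_head_start

# Year 1, 3-year-old cohort (first row of the table):
share = other_center_complier_share((0.851, 0.058, 0.092), (0.147, 0.256, 0.597))
print(round(share, 3))  # 0.281, matching column (7)'s 0.282 up to rounding
```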
Table IV. Benefits and Costs of Head Start

Panel A. Parameter values
Parameter       Description                                                   Value      Source/Formula
                (1)                                                           (3)        (4)
p               Effect of a 1 SD increase in test scores on earnings          0.10       Table A.IV
e_US            US average present discounted value of lifetime earnings
                at age 3.4                                                    $438,000   Chetty et al. 2011 with 3% discount rate
e_parent/e_US   Average earnings of Head Start parents relative to US average 0.46       Head Start Program Facts
IGE             Intergenerational income elasticity                           0.40       Lee and Solon 2009
e               Average present discounted value of lifetime earnings for
                Head Start applicants                                         $343,392   [1 - (1 - e_parent/e_US)·IGE]·e_US
p·e             Effect of a 1 SD increase in test scores on earnings of
                Head Start applicants                                         $34,339    0.1·e
LATE_h          Local Average Treatment Effect                                0.247      HSIS
τ               Marginal tax rate for Head Start population                   0.35       CBO 2012
S_c             Share of Head Start population drawn from other preschools    0.34       HSIS
φ_h             Marginal cost of enrollment in Head Start                     $8,000     Head Start Program Facts
φ_c             Marginal cost of enrollment in other preschools               $0         Naive assumption: φ_c = 0
                                                                              $4,000     Pessimistic assumption: φ_c = 0.5φ_h
                                                                              $6,000     Preferred assumption: φ_c = 0.75φ_h

Panel B. Marginal value of public funds
NMB    Marginal benefit to Head Start population net of taxes   $5,513       (1 - τ)·p·e·LATE_h
MFC    Marginal fiscal cost of Head Start enrollment            $5,031       φ_h - φ_c·S_c - τ·p·e·LATE_h, naive assumption
                                                                $3,671       Pessimistic assumption
                                                                $2,991       Preferred assumption
MVPF   Marginal value of public funds, NMB/MFC (s.e.)           1.10 (0.22)  Naive assumption; p-value = 0.10; breakeven p = 0.09 (0.01)
                                                                1.50 (0.34)  Pessimistic assumption; p-value = 0.00; breakeven p = 0.08 (0.01)
                                                                1.84 (0.47)  Preferred assumption; p-value = 0.00; breakeven p = 0.07 (0.01)

Notes: This table reports results of cost-benefit calculations for Head Start. Parameter values are obtained from the sources listed in column (4). Standard errors for MVPF ratios are calculated using the delta method. P-values are from one-tailed tests of the null hypothesis that the MVPF is less than one. These tests are performed via nonparametric block bootstrap of the t-statistic, clustered at the Head Start center level. Breakevens give the proportional effect of a one standard deviation increase in test scores on earnings that sets the MVPF equal to one.
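The Panel B figures follow mechanically from the Panel A parameters. A sketch of the calculation (the function name is ours; `earnings_effect` is the $34,339 dollar effect of a 1 SD score gain on applicant earnings, i.e. p·e):

```python
def mvpf(phi_c, *, earnings_effect=34_339, late_h=0.247, tau=0.35,
         s_c=0.34, phi_h=8_000):
    """Marginal value of public funds as defined in Table IV, Panel B:
    NMB = (1 - tau) * earnings_effect * LATE_h
    MFC = phi_h - phi_c * S_c - tau * earnings_effect * LATE_h
    Defaults are the Table IV parameter values."""
    nmb = (1 - tau) * earnings_effect * late_h
    mfc = phi_h - phi_c * s_c - tau * earnings_effect * late_h
    return nmb / mfc

for label, phi_c in [("naive", 0), ("pessimistic", 4_000), ("preferred", 6_000)]:
    print(f"{label}: {mvpf(phi_c):.2f}")
# naive: 1.10, pessimistic: 1.50, preferred: 1.84 -- matching Panel B
```

Accounting for the fiscal savings φ_c·S_c from reduced enrollment in competing subsidized preschools is what lifts the MVPF from 1.10 to 1.84 under the preferred assumption.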
Table V. Two-Stage Least Squares Estimates with Interaction Instruments

                                       One endogenous
                                       variable          Two endogenous variables
                                       Head Start        Head Start     Other centers
Instruments                            (1)               (2)            (3)
Offer                                  0.247             -              -
(1 instrument)                         (0.031)

Offer X covariates                     0.241             0.384          0.419
(9 instruments)                        (0.030)           (0.127)        (0.359)
  First-stage F                        276.2             17.7           1.8
  Overid. p-value                      0.007             0.010

Offer X sites                          0.210             0.213          0.008
(183 instruments)                      (0.026)           (0.039)        (0.095)
  First-stage F                        215.1             90.0           2.7
  Overid. p-value                      0.002             0.006

Offer X site groups                    0.229             0.265          0.110
(6 instruments)                        (0.029)           (0.056)        (0.146)
  First-stage F                        1,015.2           339.1          32.6
  Overid. p-value                      0.077             0.050

Offer X covariates and                 0.229             0.302          0.225
offer X site groups (14 instruments)   (0.029)           (0.054)        (0.134)
  First-stage F                        340.2             121.2          13.3
  Overid. p-value                      0.012             0.002

Notes: This table reports two-stage least squares estimates of the effects of Head Start and other preschool centers on test scores in Spring 2003. The model in the first row instruments Head Start attendance with the Head Start offer. Models in the second row instrument Head Start and other preschool attendance with interactions of the offer with transportation, above-median quality, race, Spanish language, mother's education, an indicator for income above the federal poverty line, and baseline score. The third row uses the Head Start offer interacted with 183 experimental site indicators as instruments. The fourth row uses interactions of the offer with indicators for groups of experimental sites obtained from a multinomial probit model with unobserved group fixed effects, as described in Appendix G. The fifth row uses both covariate and site group interactions. All models control for main effects of the interacting variables and baseline covariates. First-stage F-statistics are Angrist and Pischke (2009) partial F's. Overidentification p-values are reported separately for the one- and two-endogenous-variable models. Standard errors are clustered at the center level.
Table VI. Multinomial Probit Estimates

                              Head Start utility
                          Main        Offer          Other center
                          effect      interaction    utility         Arctanh ρ
                          (1)         (2)            (3)             (4)
Panel A. Covariates
Center provides            0.022       0.111          0.054           0.096
transportation            (0.114)     (0.142)        (0.087)         (0.178)
Above-median              -0.233       0.425         -0.115          -0.007
center quality            (0.091)     (0.102)        (0.082)         (0.153)
Black                      0.095       0.282          0.206          -0.185
                          (0.108)     (0.127)        (0.100)         (0.166)
Spanish speaker           -0.049      -0.273         -0.213           0.262
                          (0.136)     (0.122)        (0.169)         (0.224)
Mother's education         0.106       0.021          0.105          -0.219
                          (0.056)     (0.060)        (0.064)         (0.110)
Income above FPL           0.216      -0.305          0.173           0.097
                          (0.128)     (0.121)        (0.126)         (0.192)
Baseline score             0.080      -0.025          0.292           0.026
                          (0.094)     (0.108)        (0.069)         (0.094)
Age 4                      0.164      -0.277          0.518           0.010
                          (0.142)     (0.166)        (0.104)         (0.170)
P-values: no heterogeneity 0.015       0.000          0.000           0.666

Panel B. Experimental site groups
Group 1                   -0.644       2.095          0.424           0.435
(share = 0.215)           (0.136)     (0.153)        (0.085)         (0.128)
Group 2                   -4.847       6.760         -0.577          -0.496
(share = 0.183)           (0.076)     (0.158)        (0.045)         (0.172)
Group 3                   -2.148       2.912         -0.768           0.530
(share = 0.183)           (0.312)     (0.340)        (0.081)         (0.159)
Group 4                    0.488       0.541         -0.139           0.483
(share = 0.151)           (0.130)     (0.150)        (0.226)         (0.322)
Group 5                   -1.243       2.849         -1.643          -0.772
(share = 0.145)           (0.108)     (0.171)        (0.164)         (0.359)
Group 6                    0.072       1.191          0.110           2.988
(share = 0.124)           (0.127)     (0.183)        (0.106)         (0.925)
P-values: no heterogeneity 0.000       0.000          0.000           0.000

Notes: This table reports simulated maximum likelihood estimates of a multinomial probit model of preschool choice. The model includes fixed effects for six unobserved groups of experimental sites, estimated as described in Appendix G. The Head Start and other center utilities also include the main effects of gender, test language, teen mother, mother's marital status, presence of both parents, family size, special education, family income categories, and second- and third-order terms in baseline test scores. The likelihood is evaluated using the GHK simulator, and likelihood contributions are weighted by the reciprocal of the probability of experimental assignments. P-values for site heterogeneity are from tests that all group-specific constants are equal. P-values for covariate heterogeneity are from tests that all covariate coefficients in a column are zero. Standard errors are clustered at the Head Start center level.
Table VII. Selection-Corrected Estimates of Preschool Effects

                          Least squares            Control function
                          No controls  Covariates  Covariates  Site groups  Full model
                          (1)          (2)         (3)         (4)          (5)
Head Start                0.202        0.218       0.483       0.380        0.470
                          (0.037)      (0.022)     (0.117)     (0.121)      (0.101)
Other preschools          0.262        0.151       0.183       0.065        0.109
                          (0.052)      (0.035)     (0.269)     (0.991)      (0.253)
λh                        -            -           0.015       0.004        0.019
                                                   (0.053)     (0.063)      (0.053)
Head Start X λh                                   -0.167      -0.137       -0.158
                                                   (0.080)     (0.126)      (0.091)
Other preschools X λh                             -0.030      -0.047        0.000
                                                   (0.109)     (0.366)      (0.115)
λc                                                -0.333      -0.174       -0.293
                                                   (0.203)     (0.187)      (0.115)
Head Start X λc                                    0.224       0.065        0.131
                                                   (0.306)     (0.453)      (0.172)
Other preschools X λc                              0.488       0.440        0.486
                                                   (0.248)     (0.926)      (0.197)
P-values:
No selection                                       0.016       0.510        0.046
No selection on gains                              0.133       0.560        0.084
Additive separability                              0.261       0.452        0.349

Notes: This table reports selection-corrected estimates of the effects of Head Start and other preschool centers on test scores in Spring 2003. Each column shows coefficients from regressions of test scores on an intercept, a Head Start indicator, an other-preschool indicator, and controls. Column (1) shows estimates with no controls. Column (2) adds controls for gender, race, home language, test language, mother's education, teen mother, mother's marital status, presence of both parents, family size, special education, income categories, experimental site characteristics (transportation, above-median quality, and urban status), and a third-order polynomial in baseline test score; this column also interacts the preschool variables with transportation, above-median quality, race, Spanish language, mother's education, an indicator for income above the federal poverty line, and the main effect of baseline score. Covariates are de-meaned in the estimation sample, so that main effects can be interpreted as estimates of average treatment effects. Column (3) adds control function terms constructed from a multinomial probit model using the covariates from column (2) and the Head Start offer; the interacting variables from column (2) are allowed to interact with the Head Start offer and enter the preschool taste correlation equation. Column (4) omits observed covariates and includes indicators for experimental site groups, constructed using the algorithm described in Appendix G. The multinomial probit model is saturated in these site group indicators, and the second-step regression interacts site groups with preschool alternatives. Column (5) combines the variables used in columns (3) and (4). Standard errors are bootstrapped and clustered at the center level. The additive separability row shows p-values from a score test of the hypothesis that interactions between the control functions and covariates are zero in each preschool alternative (see Appendix F for details).
Table VIII. Treatment Effects for Subpopulations

                                     Control function
                           IV        Covariates  Sites       Full model
Parameter                  (1)       (2)         (3)         (4)
LATE_h                     0.247     0.261       0.190       0.214
                           (0.031)   (0.032)     (0.076)     (0.042)
LATE_nh                    -         0.386       0.341       0.370
                                     (0.143)     (0.219)     (0.088)
LATE_ch                    -         0.023      -0.122      -0.093
                                     (0.251)     (0.469)     (0.154)
Lowest predicted quintile:
LATE_h                               0.095       0.114       0.027
                                     (0.061)     (0.112)     (0.067)
LATE_h with fixed S_c                0.125       0.125       0.130
                                     (0.060)     (0.434)     (0.119)
Highest predicted quintile:
LATE_h                               0.402       0.249       0.472
                                     (0.042)     (0.173)     (0.079)
LATE_h with fixed S_c                0.364       0.289       0.350
                                     (0.056)     (1.049)     (0.126)

Notes: This table reports estimates of treatment effects for subpopulations. Column (1) reports an IV estimate of the effect of Head Start. Columns (2)-(4) show estimates of treatment effects computed from the control function models displayed in Table VII. The bottom rows show effects in the lowest and highest quintiles of model-predicted LATE. Rows with fixed c-complier shares weight subLATEs using the full-sample estimate of this share (0.34). Standard errors are bootstrapped and clustered at the center level.
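The overall Head Start LATE relates to the two subLATEs as a complier-share-weighted average, which is the aggregation the "fixed S_c" rows use. A short check (function name ours), assuming LATE_h = (1 - S_c)·LATE_nh + S_c·LATE_ch with the full-sample share S_c = 0.34:

```python
def late_from_sublates(late_nh, late_ch, s_c=0.34):
    """Weight the home-care (n->h) and other-preschool (c->h) subLATEs
    by the complier shares to recover an overall Head Start LATE."""
    return (1 - s_c) * late_nh + s_c * late_ch

# Covariates model, column (2): subLATEs 0.386 and 0.023
print(round(late_from_sublates(0.386, 0.023), 3))  # 0.263
```

This is close to the 0.261 reported for LATE_h in column (2); the small gap reflects rounding and the fact that the table's LATE_h uses the model's own estimated complier share rather than the fixed full-sample value.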
Table IX. Benefits and Costs of Head Start when Competing Preschools are Rationed

Parameter   Description                                               Value     Source/Formula
(1)         (2)                                                       (3)       (4)
LATE_h      Head Start Local Average Treatment Effect                 0.247     HSIS
LATE_nc     Effect of other centers for marginal children             0         Naive assumption: no effect of competing preschools
                                                                      0.370     Homogeneity assumption: n->c subLATE equals n->h subLATE
                                                                      0.294     Model-based prediction
NMB         Marginal benefit to Head Start population net of taxes    $5,513    (1 - τ)·p·e·(LATE_h + S_c·LATE_nc), naive assumption
                                                                      $8,321    Homogeneity assumption
                                                                      $7,744    Model-based prediction
MFC         Marginal fiscal cost of Head Start enrollment             $5,031    φ_h - τ·p·e·(LATE_h + S_c·LATE_nc), naive assumption
                                                                      $3,519    Homogeneity assumption
                                                                      $3,830    Model-based prediction
MVPF        Marginal value of public funds                            1.10      Naive assumption
                                                                      2.36      Homogeneity assumption
                                                                      2.02      Model-based prediction

Notes: This table reports results of a rate-of-return calculation for Head Start, assuming that competing preschools are rationed and that marginal students offered seats in these programs as a result of a Head Start expansion would otherwise receive home care. Parameter values are obtained from the sources listed in column (4).
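Table IX's figures can likewise be reproduced from the stated formulas. A sketch (function name ours; `earnings_effect` is the $34,339 effect of a 1 SD score gain on applicant earnings from Table IV):

```python
def mvpf_rationed(late_nc, *, earnings_effect=34_339, late_h=0.247,
                  tau=0.35, s_c=0.34, phi_h=8_000):
    """MVPF when competing preschools are rationed (Table IX): a marginal
    Head Start seat frees a rationed slot, moving a home-care child into
    another center with effect LATE_nc, so
    NMB = (1 - tau) * earnings_effect * (LATE_h + S_c * LATE_nc)
    MFC = phi_h - tau * earnings_effect * (LATE_h + S_c * LATE_nc)."""
    total_effect = late_h + s_c * late_nc
    nmb = (1 - tau) * earnings_effect * total_effect
    mfc = phi_h - tau * earnings_effect * total_effect
    return nmb / mfc

for label, late_nc in [("naive", 0.0), ("homogeneity", 0.370), ("model-based", 0.294)]:
    print(f"{label}: {mvpf_rationed(late_nc):.2f}")
# naive: 1.10, homogeneity: 2.36, model-based: 2.02 -- matching Table IX
```

Relative to Table IV, rationing replaces the fiscal-savings term φ_c·S_c with the test score gains of the children who fill the freed slots, which is why the MVPF rises further under the homogeneity and model-based assumptions.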