The Effect of Large-scale Performance-Based Funding in Higher Education

Jason Ward a and Ben Ost a

a Department of Economics, University of Illinois at Chicago, 601 South Morgan UH718 M/C144, Chicago, IL 60607, United States

Abstract: The use of performance-based funding that ties state higher education appropriations to performance metrics has increased dramatically in recent years, but most programs place a small percent of overall funding at stake. We analyze the effect of two notable exceptions—Ohio and Tennessee—where nearly all state funding is tied to performance measures. Using a difference-in-differences identification strategy along with a synthetic control approach, we find no evidence that these programs improve key academic outcomes.

JEL Classification: I23

Keywords: Performance-based funding; Higher education


Introduction

In the United States, educational funding has historically been a function of inputs. K-12

teachers are typically paid based on years of experience and their certification (regardless of the

performance of their students). Public universities receive state funding based on the number of

students enrolled (regardless of the performance of those students). In the past decade, a

structural shift in funding mechanisms has taken hold at every level of education. From 2009 to

2015, the number of states requiring that teachers be evaluated based on student performance

tripled. Large school districts throughout the country have adopted new compensation systems

that directly link pay to performance (e.g., Dallas and Washington, DC). At the higher education

level, 10 states were in the process of developing new performance-based funding policies in

2015 alone (Snyder 2015). The shift to funding institutions based on outputs has affected every

level of education and it is expected to continue (McLendon and Hearn 2013; Hess and Castle

2008; Dee and Wyckoff 2015).

In this study, we provide the first evidence on the efficacy of a new set of higher

education performance-based funding policies implemented by Ohio and Tennessee in 2010 and

2011 respectively. These policies represent a dramatic shift in the scale of state funding

allocated based on performance, as well as the nature of the incentivized outcomes. In terms of

funding at stake, Tennessee now allocates over 80% of total state dollars using their outcomes

formula and Ohio allocates 100% (Snyder 2016). In terms of outcome measures, these programs

both shift away from course completion counts to incentivize persistence and degree completion

amongst enrolled students.


Performance-based funding for higher education has existed since the late 1970s but early

programs were limited in scope. Starting in the 2000s, a series of states introduced more

consequential performance-based funding measures. This series of policy reforms has been

dubbed “PBF 2.0” to distinguish them from PBF 1.0 policies, which typically offered low

amounts of funding in the form of bonuses and did not tie this funding to outcomes such as

graduation rates (McLendon et al. 2006). PBF 2.0 policies placed larger amounts of funding at

stake (between 2% and 10%) and, while this funding was still often in the form of bonuses, some

states began to place baseline funding at risk as well. Additionally, this second wave of programs

began to tie funding more directly to outcome metrics such as credit hour accumulation and

graduation rates. The literature has generally found PBF 1.0 to have had little (or sometimes

negative) effects, a finding attributed to the fact that PBF 1.0 had limited scope and poor

implementation (Layzell 1999; Dougherty and Reddy 2011; Rutherford and Rabovsky 2014).

Despite the higher stakes and better targeting, the literature to date has found that PBF 2.0 also had a limited impact on student performance (Hillman, Tandberg, and Gross

2014; Hillman, Tandberg, and Fryar 2015; Tandberg and Hillman 2014; Rutherford and

Rabovsky 2014). Recent performance-based funding programs enacted in Ohio and Tennessee

bear important similarities to PBF 2.0 programs in terms of allocating baseline funding on the

basis of incentivized outcomes, but they allocate all or nearly all base funding according to this

incentive structure. This large difference in stakes led Kelchen and Stedrak (2016) to dub these

programs PBF 3.0, a convention we follow in this work. To date, there is no evidence on the

effect of these policies on student outcomes.

Proponents of performance funding argue that the lack of efficacy of PBF 2.0 policies is

due to their limited scope – not because performance-based funding is generally ineffective.


Many states appear to be persuaded by such arguments and are moving toward PBF 3.0 type

programs despite limited existing evidence that PBF 2.0 has been effective and no existing

evidence on the efficacy of PBF 3.0 (Snyder 2015). Our study contributes to the literature by

providing the first estimates on the efficacy of PBF 3.0 policies. Estimating the efficacy of these

full-scale PBF programs will help state policy makers assess whether PBF 2.0 policies are

ineffective because of their small scope and misaligned incentives, or whether performance

funding is more generally ineffective at the higher education level.

We examine the effect of high-stakes performance-based funding on several different

academic outcomes. First, we examine total degree completions, as this outcome is directly

incentivized by performance-based funding. Second, we measure three dimensions of

undergraduate success: first-to-second-year retention, six-year graduation rates, and total BA

completions.1, 2 Though first-to-second year retention and six-year graduation rates are common

measures of institutional performance, one limitation of these measures is that they are only

defined for first-time, full-time freshmen. The total BA completions outcome represents a

broader class of students. We use multiple measures of undergraduate success because there are

many different policy objectives in higher education and the outcome of interest depends on the

policy objective. First-to-second year retention and six-year graduation rates are key metrics for

students considering whether to enroll in a particular university. Total degree completions and

1 As we discuss in the institutional background section, some of the outcomes are directly incentivized, while others are not. Our goal is to assess how PBF affects core academic outcomes whether they are directly incentivized or not. 2 Our analysis focuses on undergraduates at four-year schools, so we do not consider associate degree production. Community colleges are also subject to performance-based funding and, though several papers examine the effect of early PBF programs on community colleges, there is no evidence on the effect of high-stakes PBF on community colleges. We view this as a useful direction for future work.


BA completions are the relevant outcomes for policy makers interested in increasing the

proportion of the state that is college educated.

Using a difference-in-differences strategy, we find no evidence that PBF 3.0 affected

total degree completions, first-to-second year retention, six-year graduation rates or BA

completions. Though estimates are near zero for all four outcomes, the precision of the estimates

varies across outcomes, and we only have precise null estimates for first-to-second year retention

and six-year graduation rates. For total degree completions and BA completions, there is no

statistically significant evidence of improvement on either outcome, but standard errors are

sufficiently large that moderate effects are included in the confidence intervals.

In order to assess the validity of the results, we estimate a dynamic difference-in-

differences model that includes leads and lags of the policy effect. Specifically, we examine

whether the policies have an “effect” on outcomes in the years leading up to implementation and

we directly assess whether there appear to be differential trends in treatment states versus control

states. In addition to paying careful attention to whether treatment and control trend similarly

prior to policy adoption, we complement the difference-in-differences analysis by implementing

a synthetic control approach (Abadie et al. 2010). The synthetic control approach constructs a

weighted control group that minimizes pre-policy differences between the control group and the

treatment group and compares outcome trends in the post-adoption period.

In addition to assessing the overall effect of PBF 3.0 on academic outcomes, we consider

potential mechanisms through which institutions might attempt to improve outcomes. First, we

consider whether institutions alter overall spending or the proportion of spending that goes to

instruction, student supports or research. Second, we consider whether institutions enroll more

students or a different type of student in response to PBF 3.0. Enrolling more students can help


increase outcomes such as total degree production and altering student composition can help

increase graduation rates.3 We find little evidence of an institutional response on either the

institutional spending or student enrollment dimensions. The one exception is that we find that

the proportion of Hispanic students falls slightly following PBF 3.0. An important caveat to these

findings is that we cannot rule out moderate effects, since many of the coefficients are not

precisely estimated.

Our study makes three main contributions. First, as noted above, PBF 3.0 represents a

significant evolution in the scope of performance-based funding policies, and there is little reason to

expect that the literature evaluating small-scale performance-based funding would be informative

regarding the likely effects of full-fledged performance-based funding. Our study provides the

first evidence on the effect of PBF 3.0 on core academic outcomes. Second, we are the first study

to use a synthetic control approach to study the effect of PBF and among only a few studies that

empirically assess the plausibility of the parallel trends assumption when estimating difference-in-

differences models. Finally, our study provides a comprehensive analysis by examining multiple

performance measures along with student compositional changes and institutional spending,

outcomes that may shed light on institutional responses to PBF 3.0.

There are several limitations to our analysis. First, we only study the effect of the

Tennessee and Ohio policies and so our results may not generalize to other states where

implementation details and existing institutional contexts differ. The institutional context

surrounding higher education is very heterogeneous across states and so this is an important

limitation. Second, our analysis is limited to the six years after policy implementation and so we

3 In both Tennessee and Ohio, there are also explicit provisions in the policy that provide incentives to enroll traditionally underrepresented or disadvantaged students.


are only able to assess the short-run effect of PBF 3.0. This is a particularly important limitation

when considering outcomes such as six-year graduation rates. Third, our estimates are only

statistically precise for first-to-second year retention and six-year graduation so we cannot make

strong conclusions regarding degree completions. Finally, the difference-in-differences design

relies on a fundamentally untestable assumption that trends in the control group provide a valid

counterfactual for trends in the treatment group. We test this assumption indirectly by looking

for parallel trends in the pre-period, but we cannot definitively rule out that our estimates may be

affected by unobservable differential trends between “treated” states and the states we use for

comparison.

Literature Review

We focus here on discussing the quantitative literature that aims to estimate the effect of

PBF on academic outcomes at four-year institutions.4 We refer the reader to Dougherty and

Reddy (2011) for a comprehensive review of both the quantitative and qualitative literature on

the effect of performance-based funding on a variety of immediate, intermediate and ultimate

outcomes. We further restrict our analysis to studies that strive to identify the causal effect of

PBF and refer the reader to Tandberg and Hillman (2014) for a discussion of additional studies

that examine associations with PBF.

4 Hillman, Tandberg, and Fryar (2015) study community colleges in Washington state and Tandberg, Hillman and Barakat (2014) study community colleges nationally. We emphasize the literature studying four-year schools here because the policies and institutional context surrounding four-year schools are quite different from those of community colleges. That said, it is important to emphasize that the two-year college sector is arguably just as important as the four-year college sector and future work should assess the effect of PBF 3.0 on two-year colleges.


Tandberg and Hillman (2014) estimate the causal effect of PBF on BA degree

completions among four-year institutions using a difference-in-differences approach, finding no

evidence of a non-zero effect. They note that further research could examine whether states with

stronger incentives generate different educational outcomes.5 Rutherford and Rabovsky (2014)

perform a similar national analysis taking advantage of the disparate timing of PBF adoption

across states. They examine whether PBF 1.0 and PBF 2.0 policies have different effects on

outcomes such as graduation rates and first-to-second year persistence. Their results show no

evidence of either PBF 1.0 or PBF 2.0 affecting academic outcomes. Neither of these studies

directly assesses whether pre-trends are similar across treated and control states.

Hillman, Tandberg and Gross (2014) analyze the effect of a PBF policy enacted in

Pennsylvania in 2000 on BA completions. They use difference-in-differences and provide a

direct assessment of whether treatment and control institutions trended similarly in the pre-policy period.

They find that control institutions defined by geographic proximity show evidence of differential

trends prior to the policy and use an approach matching Pennsylvania institutions to other

institutions based on 1990 characteristics to generate a control group that trended similarly to

Pennsylvania institutions prior to the 2000 policy. With this approach, they find no effect of the

PBF policy on BA degree production.6

Sanford and Hunter (2011) study the effects of earlier PBF policies in Tennessee on six-

year graduation rates and first-to-second year retention rates. They utilize spline-linear mixed

5 Interestingly, in discussing how treatment intensity varies across states in their (pre-2011) data, Tandberg and Hillman point to the example of Pennsylvania, which allocates 8% of appropriations to performance, whereas Oklahoma allocates 2%. The PBF 3.0 reforms come too late to be included in their analysis. 6 Hillman, Tandberg and Gross note that one limitation of their study is that they only use a single outcome (BA production per 100 students) and it is possible that the Pennsylvania policy could have affected other important dimensions of institutional success.


models and control for observable institutional characteristics and find no evidence that early

PBF programs in Tennessee were effective.

Conceptual Framework

Performance-based funding is motivated by the idea that state funding of higher

education fits into a classical principal-agent framework. There are two key features of a

principal-agent model: asymmetric information and divergent preferences between the principal

and the agent. In this context, the principal (the state) seeks to improve student outcomes,

whereas the agents (university administrators) have a different objective. The principal can

observe final outcomes but is assumed to be unable to perfectly observe inputs. In a principal-

agent model, compensation schemes that fail to tie compensation to output will result in

inefficient outcomes from the principal’s perspective. This is the concern with traditional

funding models and a major reason that performance-based funding has grown in popularity.

In a traditional funding model, universities receive funding based on total enrollments

and administrators have little financial stake in the performance of those enrolled students. Such

a funding system may incentivize enrolling too many students, investing too little in improving

educational quality, or devoting resources to non-student outcomes. By tying funding to student

outcomes, the state changes the incentives of the university administrator and encourages them to

focus more resources on the incentivized outcomes. Whether or not this will be successful

depends on several factors.

First, incentivizing certain outcomes will only be effective if there was an initial

disconnect between the priorities of the state and the priorities of administrators. If the state

begins rewarding degree completions, but university administrators already strongly value


degree completions, the incentive may have no effect on behavior. Second, tying funding to

specific outcomes will only improve these outcomes if administrators have the knowledge and

ability to improve the incentivized outcomes. Improving outcomes such as degree completions is

a complex process and the institution’s administrator may have limited understanding of how to

improve these outcomes, even when she has a strong incentive to do so. Relatedly, in many

cases, institution administrators have limited autonomy and cannot experiment with new

approaches because of state-imposed regulations and requirements. Third, Holmstrom and

Milgrom (1991) note that performance pay has the potential to produce unintended consequences

when the agent has many different tasks to complete and only some of these tasks are directly

rewarded. Higher education institutions certainly have multiple outputs and, thus, the incentive

to increase certain outcomes may come at the expense of other outcomes. For example, institutions

focused primarily on increasing course and degree completions may reduce academic standards.

Dougherty et al. (2014) suggest that administrators intend to respond to the incentives of

performance-based funding and Dougherty and Reddy (2011) discuss various dimensions on

which institutions might alter inputs in an attempt to improve outcomes. Here we discuss four

broad dimensions that may allow administrators to affect outcomes. First, administrators can

expand a range of student services from academic counseling to academic supports such as

tutoring. Webber and Ehrenberg (2010) show that spending on student services increases

graduation rates, suggesting that institutions may have some capacity to improve outcomes

through these inputs. Second, administrators can devote more resources to instructional activities

such as providing more teaching assistants, reducing class sizes, or expanding incentives such as

awards and bonuses for exceptional teaching. Bettinger and Long (2018) find that smaller classes

improve student persistence and Philipp, Tretter and Rich (2016) find that undergraduate


teaching assistants help improve student class performance. Though there is little evidence on the

effect of teaching awards on student outcomes, Brawer et al. (2006) find that faculty report

improving the quality of their teaching due to these awards.

Third, administrators could adopt data-driven approaches to student improvement by

introducing tracking systems that include predictive analytics to help better target interventions.

That said, there is mixed evidence on the efficacy of these types of data-driven approaches in

terms of actually improving student outcomes (Alamuddin, Rossman and Kurzweil 2018; Main

and Griffith 2018; Milliron, Malcom and Kil 2014). Finally, administrators could attempt to alter

the number and composition of entering students, either by altering recruiting efforts or by

explicitly changing admission requirements. This would likely have a direct effect on outcomes

through compositional change, and it may also have indirect effects on outcomes through a peer

effects channel.

Institutional Background

Our study is focused on the performance-based funding policies enacted by Ohio and

Tennessee in 2009 and 2010 respectively. Ohio and Tennessee were both early adopters of

performance-based funding, with Tennessee’s first program beginning in 1979 and Ohio’s first

program beginning in 1995. Both states provided universities with bonuses based on a variety of

performance measures, but neither had large amounts of funding at stake until the recent policy

changes. Importantly, in both states, the bonuses in the early programs were less than 5 percent

of state appropriations. Dougherty and Reddy (2011) provide a description of the qualitative

literature based on interviews with university administrators in Ohio and Tennessee. They find

that in both states, prior to 2010, performance funding was viewed as a trivial incentive given its


small scale. Nevertheless, rather than interpret our estimates as the effect of performance funding

relative to no performance funding, it is more appropriate to interpret our estimates as the effect

of moving from a very small performance funding program to a large-scale performance funding

program.

The recent policies enacted in Tennessee and Ohio represent a substantial shift in the

magnitude of performance-based funding. In Ohio, the funding reform moved all state higher

education funding to a formula-based allocation. In 2015, performance-based funding

determined around $4500 per full-time equivalent (FTE) student (Snyder 2015). Ohio’s initial

formula awarded points for accumulating credit hours (progression) at around 60%, for degree

completions at around 20%, and for doctoral and medical degrees at around 20% (Ohio Higher

Ed Funding Commission, 2012).7 Performance-based funding in Tennessee—amounting to

around $4000 per FTE student—is determined similarly, but with greater weight on degree

production relative to Ohio’s initial formula, and with some weight on research and public

service.

Despite these differences in the details of implementation, the Ohio and Tennessee

programs are similar in their core design and implementation. Both programs convert multi-year

moving averages of an additive set of weighted measures—primarily consisting of counts of

students acquiring credit hours or completing a degree program—into points that determine what

proportion of the overall state instructional appropriation will accrue to a school.8 Both programs

include an incentive to increase admissions among disadvantaged students (e.g., low-income,

7 This formula was revised in 2014 and roughly inverted the relative weights for these two primary metrics. 8 In the first four years of the program, Ohio used, variously, two- and five-year averages but standardized on three-year moving averages in 2014, matching the averaging approach used in Tennessee.


adult students, underrepresented groups) that amounts to multipliers on these students in the

overall calculation of points for credit acquisition and graduation. Given the similarity of the

programs, our preferred analysis reduces noise by estimating the aggregate effect of these PBF

3.0 policies, but we also consider the programs separately. The pooled analysis provides an

estimate of the average effect across the two states and should be interpreted keeping in mind

that the programs are not identical.
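To make the mechanics concrete, the sketch below illustrates a stylized points-based allocation of the kind described above. It is not the actual Ohio or Tennessee formula (both states also apply cost weights, subgroup multipliers, and transition adjustments); the metric names, the weights (loosely patterned on Ohio's initial roughly 60/20/20 split), and the function itself are illustrative assumptions only.

```python
# Stylized illustration only: three-year moving averages of weighted outcome
# counts determine each school's share of the state instructional appropriation.
import numpy as np

WEIGHTS = {"credit_hours": 0.60, "degrees": 0.20, "doctoral_medical": 0.20}

def allocate(history, appropriation):
    """history maps school -> {metric: list of the last three annual counts}."""
    # Three-year moving average of each metric for each school.
    ma = {school: {m: float(np.mean(counts[-3:])) for m, counts in metrics.items()}
          for school, metrics in history.items()}
    funding = {school: 0.0 for school in history}
    for metric, weight in WEIGHTS.items():
        pool = weight * appropriation          # dollars tied to this metric
        total = sum(ma[school][metric] for school in history)
        for school in history:
            # A school earns its share of the metric's pool in proportion to
            # its moving average on that metric.
            funding[school] += pool * ma[school][metric] / total
    return funding
```

Under this stylization, a school that raises its degree counts relative to other schools increases its share of the degree pool, mirroring the share-of-points logic described above.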

In both Ohio and Tennessee, there was a short adjustment period to ensure that schools

would not lose too much funding in the first years of the program.9 Our empirical model assesses

the possibility that universities only respond to the policies once the stop-loss ended, but there

are several reasons that institutions had an incentive to respond earlier. First, institutions still had

an incentive to improve outcomes since it would be difficult to concentrate all of the

improvement in the year stop-loss ended. Second, although institutions were protected from large

losses during the stop-loss period, there was no cap on increased funding and so for many

schools, the stop-loss provision was moot. Finally, since performance is measured as a moving

average, outcomes during the stop-loss period still affected performance metrics in the later

periods. Though there are several reasons to expect a response during the stop-loss period, if

institutions expected that PBF would be eliminated before the stop-loss ended, then we would expect

9 In Ohio, there was a formal stop-loss program for the first four years of the program that limited losses to a percentage of the prior year’s funding level. In FY 2011, changes were limited to 1%; in FY 2012, 2%; and in FY 2013, 3%. The state also included a final year of “bridge” funding in FY 2014 before transitioning to no stop loss. See the 2014 Performance-Based Funding Evaluation Report, available at https://www.ohiohighered.org/financial. Tennessee had no explicit stop-loss provision, but the funding adjustments necessary to transition to the performance-based funding model were phased in during the first few years of the policy (personal correspondence with Steven Gentile, Associate Chief Fiscal Officer of the Tennessee Higher Education Commission).


limited response in the early years of the program.

Though no other state has a performance-based funding policy that comes close to

matching the strength of the Ohio and Tennessee programs, many states have adopted smaller-

scale programs. This suggests that treating all states as controls could be inappropriate. That

said, it is unclear which states should be included in the control group because many states have

programs that put 0% of base funding at risk and do not reward persistence or graduation rates at

all. We follow the taxonomy developed by Snyder (2015) that classifies states into four

categories of performance funding. Our baseline control group consists of states with no

performance funding, but we also consider a control group that adds states that have

implemented only “rudimentary” performance funding.10 Rudimentary performance funding

programs do not link funding to college completion or cumulative attainment goals and only

include bonus funding (as opposed to putting baseline funding at risk).

Because we are interested in the effect of very high stakes performance funding, we

exclude states that had moderate performance-based funding from the analysis. These states have

not implemented the treatment of interest, but they are not an appropriate control group since

they have programs that place between 5 and 25 percent of funding at risk.

Data

Our primary source of data is the Integrated Postsecondary Education Data System

(IPEDS) that provides institution-level data for all schools that participate in Title IV funding.

We use IPEDS to measure total degree completions and three measures of undergraduate

10 Appendix Table A1 lists the states in our main and restricted sample specifications.


success: first-to-second year retention, six-year graduation rates and BA completions.11 Total

degree completions is a measure of aggregate production and is directly incentivized by the

programs. First-to-second year retention provides a leading indicator of graduation rates and

improving this outcome is a priority for many institutions since the majority of attrition occurs

during the first few years. Six-year graduation rates are a standard measure of institutional

performance and capture an outcome that is critically important from the perspective of

individual students considering enrolling in a particular institution. BA completions capture a

broader class of students than graduation rates and is the outcome of interest for policy makers

interested in increasing the proportion of college-educated workers in a state. Though the four

outcomes are closely related, they need not move in lock-step and it is useful to consider all four

together.

We also use IPEDS data on full-time equivalent (FTE) enrollment, student composition

variables, and institutional spending in order to try to understand the mechanisms behind any

changes in outcomes. First, institutions could try to improve student outcomes by changing

spending patterns. We use IPEDS data on total institutional spending as well as the share

allocated to student supports, instructional spending and research spending. Second, institutions

may increase degree production by simply enrolling more students or enrolling a different type

of student. We measure total student enrollment, the proportion of undergraduates that are over

24, Pell dollars per enrolled student12, the proportion that are black, and the proportion that are

11 Total degrees captures the total number of BA, MA, or PhD degrees granted by the institution in a given year. 12 Increases in Pell dollars per student could reflect increases in the proportion of students on Pell or it could reflect increases in the severity of need among the existing Pell recipients. It will not capture national changes in Pell grant generosity as all models include year fixed effects.


Hispanic. We supplement the IPEDS data with census data on time varying unemployment rates

and state demographics.

Table 1 shows descriptive statistics for three samples. Column (1) shows Ohio and Tennessee

(the treatment states), column (2) shows states that have no PBF programs during this time period (controls), and column (3) includes states that have at most rudimentary performance-

based funding during this period (alternative control group).13 Though the difference-in-

differences empirical approach does not require the treatment and control groups to have similar

characteristics in levels, examining the characteristics of treatment and control can be useful in

assessing external validity and may suggest areas of potential concern. Comparing across the three

columns suggests that the treatment states are fairly similar to the control states in many, but not

all dimensions. Outcomes such as retention and graduation rates are similar across treatment and

control, but treatment institutions are generally larger and therefore generate more degrees per

year and have higher FTE enrollment. Treatment and control enroll a very similar type of student

in terms of Pell dollars per student and the proportion over 24; however, treated institutions

enroll far fewer Hispanic students compared to the control groups.

Table 2 shows descriptive statistics for our key academic outcomes before and after

policy adoption. Treatment states refer to Ohio and Tennessee and control states refer to states

that have no PBF programs. The alternative control group includes states that have, at most,

rudimentary performance-based funding during this period. The “difference” column for the

treatment states shows that treatment states experienced increased graduation rates, BA

completions and total degree completions over this time period. Though this pattern is

13 The “no-PBF” restriction generates a control group of 25 states. The less-restrictive “rudimentary PBF” restriction generates a control group of 34 states. The included states are detailed in Appendix Table A1.


encouraging, it may simply reflect national changes in higher education. Consistent with this

view, we see the same pattern of changes in these outcomes for the control states, regardless of

which control group we consider. For example, graduation rates increased by 3.16 percentage

points in the treatment states and by 3.6 percentage points in the control states. The raw

difference-in-differences are near zero for all four outcomes.
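For graduation rates, for example, the raw difference-in-differences implied by the figures just cited is simply the change in the treatment states minus the change in the control states:

$$\widehat{DD}_{\text{grad}} = 3.16 - 3.60 = -0.44 \text{ percentage points},$$

which is close to zero.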

Though the simple difference-in-differences estimates are suggestive, Table 2 provides no

confidence intervals around these estimates and the point estimates could be contaminated by

pre-existing trends. In the following section, we describe our empirical model that places the

analysis in a more rigorous regression context.

Empirical Approach

Our empirical approach follows the related literature by estimating school-level

regressions that control for both school and year fixed effects (e.g. Kelchen and Stedrak 2016).

Specifically, for each outcome, we estimate the following two-way fixed effects regression

model

"#$ = & + ()*+,#$ + -.#$ + /# + 0$ + 1#$. (1)

*+,#$ is an indicator for whether a school is affected by performance-based funding 3.0 in a

particular year. We define this variable so that schools are considered affected by the policy

starting the year after the policy passes the state legislature. /# and 0$ are institution and year

fixed effects. "#$ is one of the 4 outcomes, and .#$ is a vector of time-varying characteristics

comprising the fraction of 18- to 26-year-olds in a state that are black or Hispanic, state

unemployment rates, interactions between these demographics and unemployment rates,

institutional total revenue and the share of revenue that comes from the state. In some


specifications, we also include baseline graduation rates interacted with a linear time trend. We

report analytic standard errors clustered at the state level, but results are very similar if we

bootstrap standard errors instead.
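As an illustration of how a specification like equation (1) could be taken to data, the sketch below estimates a two-way fixed effects model with state-clustered standard errors using statsmodels. The DataFrame and column names (df, unitid, year, state, pbf, the covariates, and the outcome) are hypothetical placeholders rather than the paper's actual variable names.

```python
# Minimal sketch of a two-way fixed effects difference-in-differences model in
# the spirit of equation (1). Assumes a school-by-year DataFrame `df` with
# hypothetical columns: ln_ba_degrees (outcome), pbf (post-policy indicator),
# unitid (school), year, state, frac_black, frac_hispanic, unemp,
# total_revenue, state_share, and no missing values in these variables.
import pandas as pd
import statsmodels.formula.api as smf

def estimate_did(df: pd.DataFrame, outcome: str = "ln_ba_degrees"):
    # School and year fixed effects enter as dummy sets; in practice one would
    # absorb them (e.g., via a within transformation) for speed.
    formula = (
        f"{outcome} ~ pbf + frac_black + frac_hispanic + unemp"
        " + frac_black:unemp + frac_hispanic:unemp"
        " + total_revenue + state_share + C(unitid) + C(year)"
    )
    # Standard errors clustered at the state level, as in the paper.
    res = smf.ols(formula, data=df).fit(
        cov_type="cluster", cov_kwds={"groups": df["state"]}
    )
    return res.params["pbf"], res.bse["pbf"]
```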

Equation (1) relies on the assumption that trends in the control states provide a valid

counterfactual for trends in the treatment states. Though this parallel trends assumption is

inherently untestable, we can provide suggestive evidence on its validity by re-estimating

equation (1) but including the full event-study series of indicators for time relative to the policy

change. Specifically, we estimate

"#$ = & + ∑ (3453647 *+,#3 + ∑ (3

7368 *+,#3 + /# + 0$ + 1#$ (2)

Time period t-1 is the omitted category so all estimates are relative to the year before the policy

passes. If schools unaffected by the policy are trending differently than schools affected by the

policy, β-6 through β-2 will be trending up or down and this would cause us to doubt the validity

of our estimates from equation (1). In addition to providing a test of the validity of the empirical

design, the estimates of β0 through β6 in equation (2) provide an estimate of the time-path of the

policy effect.
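A sketch of the corresponding event-study construction follows. It builds indicators for event time k = -6, ..., 6 (omitting k = -1) from a hypothetical adopt_year column that records the policy year for Ohio and Tennessee schools and is missing for control schools; all names are placeholders.

```python
# Sketch of the dynamic (event-study) specification in the spirit of equation
# (2). The year before adoption (k = -1) is the omitted reference period, so
# all coefficients are measured relative to that year.
import pandas as pd
import statsmodels.formula.api as smf

def event_study(df: pd.DataFrame, outcome: str = "retention"):
    df = df.copy()
    k = df["year"] - df["adopt_year"]          # event time; NaN for controls
    names = []
    for j in range(-6, 7):
        if j == -1:
            continue                           # omitted reference period
        name = f"lead_lag_{j + 6}"             # shift so names are nonnegative
        df[name] = (k == j).astype(float)      # zero for all control schools
        names.append(name)
    formula = f"{outcome} ~ {' + '.join(names)} + C(unitid) + C(year)"
    res = smf.ols(formula, data=df).fit(
        cov_type="cluster", cov_kwds={"groups": df["state"]}
    )
    # Pre-period coefficients near zero and untrended support parallel trends;
    # post-period coefficients trace out the time path of the policy effect.
    return res.params.filter(like="lead_lag_")
```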

To assess the importance of differential trends for our estimates, we also examine the

sensitivity of the estimates to the inclusion of state-specific linear time trends. If the difference-

in-differences assumption holds, estimates should be similar when state-specific time trends are

added to the model. Although the model with state-specific time trends allows for the possibility

of differential state trends, this model does not strictly dominate the simpler difference-in-

differences model because it requires the equally strong assumption that deviations from a linear

trend would have been similar in treatment and control in the absence of the policy. In cases

where the simple difference-in-differences and the de-trended difference-in-differences yield


very different estimated effects, which of these estimates is considered to be more reliable

depends on which assumption one considers more plausible. Our interpretation of this scenario is

that if results are strongly dependent on these inherently untestable assumptions, both estimates

should be interpreted with caution.

To complement the difference-in-differences design, we also implement the synthetic

control approach. The synthetic control method, as described in Abadie et al. (2010), constructs a

weighted average of the controls that best matches the pre-trends observed in the treated states.

We implement this analysis at the state-year level, and therefore seek to find a weighted average

of control states that represents a plausible counterfactual for the treated states. We match on all

pre-policy values of the outcome variable of interest, which renders pre-policy covariates

redundant. Our estimates and inference are very similar if we instead exclude some pre-policy

outcome periods and include covariates in the pre-period minimization problem.

We follow the approach laid out in Cavallo et al. (2013) to account for the fact that Ohio

and Tennessee vary in terms of the timing of policy implementation.14 Following Abadie et al.

(2010), we perform inference using a permutation test that sequentially treats all control units as

treated units and asks what proportion of these placebo estimates have a more extreme ratio of

post-treatment mean squared prediction error to pre-treatment mean squared prediction error.

The downside of the synthetic control approach is that it is not possible to convert these p-values

to confidence intervals without implausibly strong assumptions and, thus, we report p-values

rather than standard errors for the synthetic control estimates.

14 We operationalize this analysis using the synth_runner package described in Quistorff and Galiani (2017).
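The permutation inference can be sketched as follows, assuming the treated-minus-synthetic gap series have already been computed for the actual treated unit and for each placebo run that treats a control state as treated; all names are hypothetical.

```python
# Permutation (placebo) p-value based on the ratio of post-policy to pre-policy
# mean squared prediction error, as described above. Each input is a pair of
# arrays: (pre-policy gaps, post-policy gaps) between the (placebo-)treated
# unit and its synthetic control.
import numpy as np

def mspe_ratio(gap_pre: np.ndarray, gap_post: np.ndarray) -> float:
    return float(np.mean(gap_post ** 2) / np.mean(gap_pre ** 2))

def permutation_pvalue(treated, placebos) -> float:
    r_treated = mspe_ratio(*treated)
    r_placebos = np.array([mspe_ratio(pre, post) for pre, post in placebos])
    # Share of placebo ratios at least as extreme as the treated unit's ratio.
    return float(np.mean(r_placebos >= r_treated))
```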


Results

Table 3 shows estimates of the effect of PBF 3.0 based on estimating equation (1) for the

four outcomes of interest. Panel A uses states with no PBF as the control while panel B expands

the control group to include states with no more than rudimentary PBF. The first column for each

outcome estimates a baseline model with just school and year fixed effects. The second column

for each outcome adds the vector of covariates $X_{it}$. The third column for each outcome adds a

control for baseline graduation rates interacted with a time trend to account for the possibility

that policies may have been enacted endogenously to trends in graduation rates.

Across all four performance outcomes, we estimate small, statistically insignificant

effects that are fairly robust across specifications. Though estimates are small for all outcomes,

first-year retention and six-year graduation rates are more precisely estimated, with confidence

intervals that exclude effects larger than 0.015 for both outcomes. Log BA degrees on the other

hand has fairly large standard errors so it is not possible to rule out moderate effects. The total

degree coefficient is more precisely estimated than the BA degree coefficient, but it remains the

case that substantively important effects are in the confidence interval. Though it is impossible to

prove a null result, the six-year graduation and first-year retention results are very unlikely to

have occurred if the true effect were moderate. Consider, for example, that the 99.9% confidence

intervals for both outcomes exclude an effect of 0.025.

A key consideration with interpreting these results is whether or not the treatment states

are trending similarly to the control states in the years prior to policy implementation. We assess

whether there is evidence of differential trends by estimating equation (2) for each outcome in

turn. For these event study plots, the year before policy implementation is the omitted category


and so its coefficient is zero by construction. If the difference-in-differences assumption holds, the

coefficients should be near zero and not be trending up or down in the pre-period.

Panel A of Figure 1 shows the event study coefficients for log total degrees. There are

several statistically significant coefficients, but no general trend in either the pre- or the post-

period. Importantly, the coefficients for t-5 through t-2 are quite similar to the coefficients in the

post-period suggesting that there was no relative change between treatment and control during

this time period. The significance of some of the coefficients in Figure 1 appears to be driven by

the t-1 reference group being unusually low as opposed to a structural shift from the pre to the

post period. Panel B shows that for first-to-second year retention the pre-period estimates are

statistically indistinguishable from zero. Compared to the first-to-second year retention figure,

the graduation rate event study (shown in Panel C) is less stable, and the t-4 and t-3 coefficients

are statistically different from t-1. That said, there is little evidence of an overall differential trend

in the pre-period. In the post-period, there is one significant coefficient, though this coefficient is

fairly similar to many of the pre-period coefficients. Overall, there is little difference between the

post-period coefficients and the pre-period coefficients and we see no evidence suggesting that

the null result for six-year graduation rate is driven by differential trends. Panel D shows that the

event study for log BA completions is similar to the event study for total completions.

Though Figure 1 does not provide clear evidence of differential trends, it is also true that

with the exception of Panel B, the event study plots are somewhat noisy. This prevents any

strong conclusions for these outcomes based solely on the event study analysis. To complement

the event study analysis, we show in Table 4 how our preferred estimates from Table 3 change

when we add state-specific linear time trends. If the estimates in Table 3 are driven by

differential trends, we will observe very different estimates for specifications with and without


time trends. Table 4 shows that the coefficients for log total degrees, first-to-second year retention

and graduation rates are fairly similar with or without linear time trends. The BA completions

coefficient is more sensitive, with the coefficient reversing sign and becoming significant at the

10% level. That said, the large standard error on the baseline specification means that the

coefficients are statistically indistinguishable across specifications.

Table 5 shows the results from estimating the synthetic control approach for each

outcome. Each outcome is estimated separately so the weights used to construct the control

group vary by outcome. Across all four outcomes, none of the estimates are statistically

significant. Columns (2) and (3) show coefficients of essentially zero for first-to-second year

retention and six-year graduation rates. Columns (1) and (4) show moderate point estimates for

BA completions and total degree completions, but neither estimate is statistically significant.

Overall, the synthetic control method confirms the previous findings for graduation rates and

persistence, but it does not clarify whether there is an effect on degree production as the

estimates are moderate, but not statistically significant.

Although the Abadie et al. (2010) approach chooses the synthetic control group to

maximize match quality, whether or not there exists a weighted average of our potential control

groups that matches the treatment group well is an empirical question. To assess the success of

the matching strategy, we plot each outcome for the treatment states along with the

counterfactual path of the synthetic control group of states. Panel A of Figure 2 shows that for

log total degrees, the synthetic control matches the treatment group fairly well in the pre-policy period. In the post-policy period, the outcomes diverge slightly and then converge.

Though this divergence is suggestive of a short-term increase in degree production, some

divergence is expected by chance since the synthetic control approach explicitly matches on the


pre-policy outcomes and does not match on the post-policy outcomes. The p-values from Table 5

suggest that the divergence between the treatment states and the synthetic control is not

unusually large compared to a randomly chosen treatment group.

Panel B of Figure 2 shows that for first-to-second year retention, the synthetic control

group does not perfectly match the treatment group in the pre-period, but the magnitude of the

differences is not large. In the post-period, first-to-second year retention diverges somewhat

between treatment and synthetic control, but this divergence is modest in magnitude and non-

monotonic. Panel C shows that six-year graduation rates are fairly similar in the synthetic control

and treatment groups both before and after the policy. Finally, Panel D shows that pre-policy

match quality is generally strong for log BA degrees and that, as with log total degrees, treatment

and synthetic control diverge in the post-policy period. Again, based on the p-value from the

permutation test shown in Table 5, this divergence is not statistically significant.

Heterogeneity

It is reasonable to expect that certain institutions will be more likely to respond to

performance-based funding than other institutions and therefore the overall null effect may mask

larger effects at certain institutions. To explore this possibility, we split the analysis sample along

several dimensions. Specifically, we stratify treated institutions into groups with high/low

baseline graduation rates, high/low endowments, and high/low baseline reliance on state funding.

This last measure is based on the fraction of an institution’s revenues that come directly from the

state as opposed to other sources such as grants or donations. All of these measures are defined

based on average pre-policy levels between 2005 and 2009. We also include estimated effects for

Ohio and Tennessee separately.
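As a sketch of how these subsamples could be constructed, the snippet below computes 2005-2009 institution averages and splits them at the median; the DataFrame and column names are hypothetical, and the median cutoff is an assumption since the text only specifies high/low groups based on pre-policy averages.

```python
# Sketch of forming the heterogeneity subsamples from a hypothetical
# school-by-year DataFrame with columns unitid, year, grad_rate, endowment,
# and state_share (share of revenue coming directly from the state).
import pandas as pd

def add_baseline_splits(df: pd.DataFrame) -> pd.DataFrame:
    pre = df[(df["year"] >= 2005) & (df["year"] <= 2009)]
    base = pre.groupby("unitid")[["grad_rate", "endowment", "state_share"]].mean()
    for col in ["grad_rate", "endowment", "state_share"]:
        # High/low groups split at the median of the pre-policy average.
        base[f"high_{col}"] = base[col] >= base[col].median()
    # Attach the indicators back to the full school-by-year panel.
    return df.merge(base.filter(like="high_"), left_on="unitid",
                    right_index=True, how="left")
```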


From an ex-ante perspective, we suspect that institutions with low graduation rates will

be more likely to respond to performance-based funding, since they potentially face funding cuts.

That said, it is also possible that institutions with high graduation rates have more resources

available to attempt to improve student outcomes. Relatedly, we expect that institutions with

large endowments may be less responsive to financial pressures from the state, but these

institutions might also be better positioned to respond effectively to performance incentives. We

expect that institutions with greater reliance on state funding will be more likely to respond to

performance-based funding. Finally, the two PBF programs we focus on are not identical (for example, the Ohio and Tennessee programs weight degree completion differently relative to the accumulation of credit hours), and neither are other aspects of each state's higher education system, so we allow for the possibility that overall outcomes may differ according to these differences in programs, contexts, and potential interactions between them.

Table 6 shows estimates of the effect of performance-based funding split by the three

characteristics and by state. For each outcome we show estimates from a simple difference-in-

differences, a difference-in-differences with state time trends, and the synthetic control approach.

The difference-in-differences estimates show standard errors in parentheses and the synthetic

control estimates show p-values from a permutation test in brackets. Given that we are

examining many different hypotheses (12 specifications with 8 subsamples), some coefficients

are likely to be statistically significant even if the null hypothesis is true. As such, we emphasize

the general patterns in the data rather than focusing entirely on the statistically significant

coefficients. We view a finding as most credible when all three of these specifications yield the

same qualitative conclusion.


For log total degree completions, we do not find robust evidence of improved outcomes

for any subsamples. There are some statistically significant estimates but, in each case, the

results are not robust to the other approaches. This general pattern is similar for first-to-second-

year retention, six-year graduation rates, and log BA completions. That said, unlike the overall

analysis, the estimates for many subsamples are large in magnitude and in some cases, standard

errors are so large that the results are essentially uninformative. Given that the synthetic

control approach specifically identifies the control group based on pre-policy trends, this

specification is less likely to be driven by differential trends compared to the difference-in-

differences approach. Looking at just the synthetic control estimates, we see that of the 32

estimates, only one is statistically significant at the 5% level.

Considering each state separately, we estimate positive, statistically significant outcomes

on three of the four measures in Tennessee when using the simple difference-in-differences

model. However, estimated effects on both log total degrees and the first-to-second year

retention rate decline in magnitude tenfold and fivefold, respectively, when we allow for differential state-level trends, and the estimated effect on log BA degrees changes sign. The

synthetic control results also disagree in sign for two of the positive estimates. All but one of the

12 results for Ohio lack statistical significance at the 95% confidence level, and though estimates

are less sensitive to the inclusion of the state-specific time trend, we still observe differently

signed estimates across the difference-in-difference and synthetic control approaches. While it is

possible that our overall null findings are the result of averaging these mostly-positive results for

Tennessee and mostly-negative results for Ohio, we believe substantial caution is warranted

given that none of the statistically significant results are robust across empirical approaches.

Potential Mechanisms


Institutions seeking to improve outcomes such as graduation rates can do so in several

ways. First, institutions can increase spending on student supports or instruction and thereby

increase educational quality. Second, institutions could alter educational quality in ways that do

not require spending money. For example, they may spend more efficiently due to pressure from

the PBF policy. Finally, institutions can try to alter the composition of enrolled students.

Changing student composition can directly affect graduation rates through a compositional

effect, and it can also affect graduation rates through a peer effect. Though we find no evidence

that PBF had an effect on key academic outcomes, assessing whether institutions responded to

PBF in terms of these potential mechanisms provides important context for the null results.

Given our data, we cannot test whether institutions alter educational efficiency, but we are able

to study whether institutions alter educational spending priorities or enroll a different type of

student.

In Table 7, we examine whether institutions alter their total spending and also consider

whether they alter the share of spending on student support, instruction and research. For each

outcome, we show the simple difference-in-differences, the difference-in-differences with state

time-trends and the synthetic control approach. As discussed earlier, we view results to be most

credible when all three approaches yield qualitatively similar results. If PBF causes institutions

to shift spending, we might expect increases in the proportion of spending on student support or

instruction. The results in Table 7 provide no evidence of an increase in spending or a change in

the proportion of spending in different categories. The coefficients are generally small in


magnitude, the signs of the coefficients switch across specifications, and none of the estimates

are statistically significant.15

In Table 8 we examine whether schools respond to the policy by changing the number or

composition of enrolled students. Schools can theoretically alter the composition of students

either by changing recruitment patterns or in some cases, by explicitly altering admission

standards. We measure total student enrollments, Pell dollars per enrolled student, the proportion

who are over 24, the proportion who are black and the proportion who are Hispanic. Columns

(1)-(6) of Table 8 show no statistically significant effect on total enrollment or Pell dollars per

student, though the coefficients are not very precisely estimated. Columns (7)-(9) shows that

there is little change in the proportion of students over 24 and this is precisely estimated. The

proportion of black students is estimated to fall in the simple difference-in-differences model, but

the sign reverses when controlling for state-specific trends and is essentially a zero estimate in

the synthetic control approach. The one result that shows a consistent pattern is that the

proportion Hispanic falls slightly following PBF 3.0. This effect is only statistically significant in

the two difference-in-differences estimates, but the magnitude is fairly similar across all three

specifications.16

The overall change in academic outcomes can be thought of as the combination of two

forces. First, institutions can alter the composition of their enrolled students. Second, institutions

can improve outcomes, conditional on the composition of their enrolled students. The baseline

15 Appendix figure A1 shows the event study plots for the outcomes studied in Table 7. Appendix figure A2 shows the associated synthetic control figures. 16 Appendix figure A3 shows the event study plots for the outcomes studied in Table 8. Panel E shows clear evidence of declining proportion Hispanic prior to the policy, suggesting that the differential trend assumption may not hold for this outcome, but the rate of decline does appear to accelerate in the post period. Appendix figure A4 shows the synthetic control plots for the outcomes studied in Table 8.


analysis examines the overall effect on academic outcomes, but given the declines in Hispanic

enrollment, it is interesting to separately investigate whether academic outcomes change

conditional on the composition of enrolled students. Appendix Table A2 shows two columns for each outcome. The first column replicates the earlier difference-in-differences analysis. The

second column adds controls for student composition. This exercise shows that the coefficients

are fairly similar regardless of whether we control for student composition. This suggests that, in

addition to finding no evidence of effect on academic outcomes overall, there is no evidence of

an effect when holding student composition constant.17

Conclusion

Although the stakes are dramatically higher than in other states, we find no evidence that the

performance-based funding enacted in Ohio and Tennessee had an effect on key academic

outcomes. This finding suggests that even large-scale performance-based funding is unlikely to

be an effective policy for improving higher education outcomes. An important caveat to this

conclusion is that only the estimates for first-to-second year retention and six-year graduation

rates are precisely estimated and we cannot rule out moderate effects on degree production.

A common concern regarding performance funding is that universities may lower academic standards in order to increase course completion, student persistence, and graduation (Fain 2014). Our institution-level data lack the information needed to test this concern directly, but the absence of any effect on persistence points away from it: if institutions had lowered standards or used other artificial means to inflate performance, we should have observed an improvement in measured outcomes. As such, if institutions are attempting to game the new system by reducing standards, they are not doing so successfully.

17 By controlling for the intermediate outcome of student composition, it is possible that we are introducing bias, since these compositional outcomes are themselves affected by the policy. One indirect piece of evidence on this is whether the event study pre-trend plots are similar with or without compositional controls; in other words, we ask whether there are differential trends in academic outcomes conditional on student composition. Appendix Figure A5 shows that the conditional event study plots are fairly similar to the main event study plots shown in Figure 1. We have also formally tested whether the event study indicator coefficients statistically differ across models with and without compositional controls and find no evidence that they do.

One caveat to our results is that we evaluate the relatively short-term effects of performance-based funding. With only six or seven years of follow-up data, we cannot make strong claims regarding the long-term effects of these performance funding systems. In particular, public higher education institutions have a reputation for being slow moving, and it is possible that institutions will respond eventually but simply have not done so yet. This caveat is particularly relevant for our study of graduation rates. However, given that first-to-second year retention is a leading indicator of graduation rates, our null result for retention suggests that we are unlikely to observe large increases in graduation rates in the near future.

Given that these policies create a clear financial incentive to improve outcomes, it is worth considering theoretical reasons why outcomes may not improve. First, performance-based funding is motivated by a principal-agent model in which a state (the principal) provides incentives for universities (the agents) to improve student outcomes. When the principal and agent have very different objectives, such incentives should alter university behavior; but if universities already share the state's objectives, the incentives will not have their intended effect. In other words, schools may already strive to increase persistence and graduation in the absence of state financial incentives. Second, it is possible that schools respond to the incentives created by performance-based funding but do not know how to reallocate resources in ways that successfully improve outcomes. Finally, school administrators and faculty face a principal-agent problem of their own, since the financial incentives apply to the university as a whole rather than to individual employees.

Although we find no evidence that performance-based funding has improved outcomes, we also find no evidence that it has harmed them. As such, it is not clear that existing performance-based funding models should be abandoned, since they appear to be as effective as traditional funding models. However, our study focuses entirely on the effects of performance-based funding on outcomes; we provide no evidence on the direct costs of administering these systems or the costs of compliance.18 High-stakes performance-based funding programs may lead decision-makers to refocus finite resources toward satisfying the compliance requirements of the program and away from efforts more directly related to student outcomes. In such a case, these programs may implicitly harm student outcomes by reducing potential future gains. Future work might estimate these potential costs and their interaction with student outcomes, and assess the longer-term effects of PBF 3.0.

18 Qualitative work on performance-based funding suggests that administrators respond by increasing the use of data for institutional planning and by altering academic and student services (see Dougherty and Reddy (2011) for a review).


Bibliography

Abadie, A., Diamond, A., & Hainmueller, J. (2010). Synthetic control methods for comparative case studies: Estimating the effect of California's tobacco control program. Journal of the American Statistical Association, 105(490), 493-505.

Alamuddin, R., Rossman, D., & Kurzweil, M. (2018, April 4). Monitoring Advising Analytics to Promote Success (MAAPS): Evaluation findings from the first year of implementation. https://doi.org/10.18665/sr.307005

Bettinger, E. P., & Long, B. T. (2018). Mass instruction or higher learning? The impact of college class size on student retention and graduation. Education Finance and Policy, 13(1), 97-118.

Brawer, J., Steinert, Y., St-Cyr, J., Watters, K., & Wood-Dauphinee, S. (2006). The significance and impact of a faculty teaching award: Disparate perceptions of department chairs and award recipients. Medical Teacher, 28(7), 614-617.

Dee, T. S., & Wyckoff, J. (2015). Incentives, selection, and teacher performance: Evidence from IMPACT. Journal of Policy Analysis and Management, 34(2), 267-297.

Dougherty, K. J., & Reddy, V. (2011). The impacts of state performance funding systems on higher education institutions: Research literature review and policy recommendations. New York: Community College Research Center, Teachers College, Columbia University. http://ccrc.tc.columbia.edu/publications/impacts-state-performance-funding.html

Dougherty, K. J., Jones, S. M., Lahr, H., Natow, R. S., Pheatt, L., & Reddy, V. (2014). Performance funding for higher education: Forms, origins, impacts, and futures. The ANNALS of the American Academy of Political and Social Science, 655(1), 163-184.

Fain, P. (2014). Gaming the system. Inside Higher Ed. https://www.insidehighered.com/news/2014/11/19/performance-based-funding-provokes-concern-among-college-administrators

Hess, F., & Castle, J. (2008). Teacher pay and 21st-century school reform. In T. Good (Ed.), 21st Century Education: A Reference Handbook. Thousand Oaks, CA: SAGE Publications. doi:10.4135/9781412964012.n58

Hillman, N. W., Tandberg, D. A., & Gross, J. P. (2014). Performance funding in higher education: Do financial incentives impact college completions? The Journal of Higher Education, 85(6), 826-857.

Hillman, N. W., Tandberg, D. A., & Fryar, A. H. (2015). Evaluating the impacts of "new" performance funding in higher education. Educational Evaluation and Policy Analysis, 37(4), 501-519.

Holmstrom, B., & Milgrom, P. (1991). Multitask principal-agent analyses: Incentive contracts, asset ownership, and job design. Journal of Law, Economics, & Organization, 7, 24-52.

Kelchen, R., & Stedrak, L. J. (2016). Does performance-based funding affect colleges' financial priorities? Journal of Education Finance, 41(3), 302-321.

Layzell, D. T. (1999). Linking performance to funding outcomes at the state level for public institutions of higher education: Past, present, and future. Research in Higher Education, 40(2), 233-246.

Main and Griffith. (2018). From SIGNALS to success? The effects of an online advising system on course grades. Mimeo.

McKinney, L., & Hagedorn, L. S. (2017). Performance-based funding for community colleges: Are colleges disadvantaged by serving the most disadvantaged students? The Journal of Higher Education, 88(2), 159-182.

McLendon, M. K., Hearn, J. C., & Deaton, R. (2006). Called to account: Analyzing the origins and spread of state performance-accountability policies for higher education. Educational Evaluation and Policy Analysis, 28(1), 1-24.

McLendon, M. K., & Hearn, J. C. (2013). The resurgent interest in performance-based funding for higher education. Academe, 99(6), 25.

Milliron, M. D., Malcolm, L., & Kil, D. (2014). Insight and action analytics: Three case studies to consider. Research & Practice in Assessment, 9, 70-89.

Ohio Higher Education Funding Commission. (2012). Recommendations of the Ohio Higher Education Funding Commission.

Philipp, S. B., Tretter, T. R., & Rich, C. V. (2016). Undergraduate teaching assistant impact on student academic achievement. Electronic Journal of Science Education, 20(2).

Quistorff, B., & Galiani, S. (2017). The synth_runner package: Utilities to automate synthetic control estimation using synth. Version 1.3.0. https://github.com/bquistorff/synth_runner

Rutherford, A., & Rabovsky, T. (2014). Evaluating impacts of performance funding policies on student outcomes in higher education. The ANNALS of the American Academy of Political and Social Science, 655(1), 185-208.

Sanford, T., & Hunter, J. M. (2011). Impact of performance funding on retention and graduation rates. Education Policy Analysis Archives, 19, 33.

Snyder, M. (2015). Driving Better Outcomes: Typology and Principles to Inform Outcome-Based Funding Models. HCM Strategists.

Snyder, M. (2016). Driving Better Outcomes: Fiscal Year 2016 State Status & Typology Update. HCM Strategists.

Tandberg, D. A., & Hillman, N. W. (2014). State higher education performance funding: Data, outcomes, and policy implications. Journal of Education Finance, 39(3), 222-243.

Tandberg, D. A., Hillman, N., & Barakat, M. (2014). State higher education performance funding for community colleges: Diverse effects and policy implications. Teachers College Record, 116(12), n12.

Webber, D. A., & Ehrenberg, R. G. (2010). Do expenditures other than instructional expenditures affect graduation and persistence rates in American higher education? Economics of Education Review, 29(6), 947-958.

Table 1: Descriptive Statistics

                                                        Treatment   Control   Alternative Control
First-year undergrad retention rate                      0.7656      0.7968    0.7921
6-yr graduation rate                                     0.4839      0.5055    0.5005
Baccalaureate degrees awarded                            2,912       2,134     2,189
Total degrees                                            4,065       2,895     2,983
Fall FTE undergraduate enrollment                        14,614      9,849     10,211
Pell dollars per student                                 1,044       1,022     1,008
Proportion undergraduates over 25                        0.1879      0.2066    0.2037
Proportion undergraduate black students enrolled         0.1211      0.0953    0.0894
Proportion undergraduate Hispanic students enrolled      0.0249      0.1135    0.1042
R1 or flagship university                                0.1         0.1155    0.131
Share of operating revenue from state appropriations     0.4392      0.5837    0.5645
Share spent on instruction                               0.3724      0.354     0.3517
Share spent on student supports                          0.2454      0.2783    0.2676
Share research expenditures                              0.0614      0.0598    0.0626
Number of schools                                        20          241       326

Notes: The control group is states with no PBF programs; the alternative control group is states with at most rudimentary PBF programs.

Table 2: Descriptive statistics split by pre/post policy

Panel A: Treatment states and control states (no PBF)
                                        Treatment States                 Control States
                                        Before    After    Difference    Before    After    Difference    Diff-in-diff
First-year undergrad retention rate     0.7649    0.7664    0.0015       0.7970    0.7959   -0.0011        0.0026
6-yr graduation rate                    0.4691    0.5007    0.0316       0.4883    0.5243    0.0360       -0.0044
Log BA degrees                          7.6142    7.7958    0.1817       7.1873    7.3672    0.1799        0.0017
Log total degrees                       7.9021    8.0960    0.1939       7.4679    7.6665    0.1986       -0.0047
Number of institutions                  20        20                     241       241
Number of years                         6         6                      6         6

Panel B: Alternative control group (at most rudimentary PBF)
                                        Before    After    Difference    Diff-in-diff
First-year undergrad retention rate     0.7918    0.7920    0.0002        0.0013
6-yr graduation rate                    0.4844    0.5187    0.0343       -0.0027
Log BA degrees                          7.2172    7.3950    0.1777        0.0039
Log total degrees                       7.4919    7.6898    0.1980       -0.0041
Number of institutions                  326       326
Number of years                         6         6

Notes: The control group is states with no PBF programs; the alternative control group is states with at most rudimentary PBF programs. The Diff-in-diff columns compare the pre/post change in treatment states to the corresponding change in the relevant control group.

Table 3: Difference-in-difference main results

For each outcome, the first column includes school fixed effects and no covariates, the second adds covariates, and the third adds covariates and a baseline graduation rate x trend control.

Panel A: Control is states with no program
  Log total degrees           -0.00024 (0.0279)     0.0134 (0.0286)      0.0139 (0.0254)      N = 3104
  First-year retention rate    0.00143 (0.00708)    0.00369 (0.00683)    0.00371 (0.00675)    N = 3105
  6-yr graduation rate        -0.00573 (0.00770)   -0.00035 (0.00657)   -0.00045 (0.00691)    N = 3098
  Log BA degrees              -0.00613 (0.0377)     0.0113 (0.0368)      0.0119 (0.0335)      N = 3104

Panel B: Control is states with at most rudimentary programs
  Log total degrees           -0.00218 (0.0265)     0.0104 (0.0256)      0.0091 (0.0227)      N = 4023
  First-year retention rate    0.000576 (0.00681)   0.00089 (0.00657)    0.000868 (0.00652)   N = 4024
  6-yr graduation rate        -0.00399 (0.00704)   -0.0026 (0.00611)    -0.00232 (0.00668)    N = 4017
  Log BA degrees              -0.00465 (0.0370)     0.00945 (0.0344)     0.00801 (0.0311)     N = 4023

Notes: This table shows the results from estimating equation (1) in the text. Standard errors clustered at the state level are shown in parentheses.

Table 6: Heterogeneity analysis

For each undergraduate outcome, the first estimate is from the specification with covariates and no state-specific time trend, the second adds state-specific time trends, and the third is the synthetic control estimate.

Panel A: High endowment institutions
  Log total degrees            0.0228 (0.0192)      -0.0157 (0.0296)       0.0161 [0.4291]
  First-year retention rate    0.00603 (0.00368)    -0.00972* (0.00508)   -0.0043 [0.8185]
  6-yr graduation rate        -0.0161 (0.0104)       0.00844 (0.00632)     0.0186 [0.4480]
  Log BA degrees               0.0218 (0.0178)      -0.032 (0.0199)        0.0104 [0.4499]

Panel B: Low endowment institutions
  Log total degrees            0.00158 (0.0323)      0.0135 (0.0228)       0.0456 [0.6900]
  First-year retention rate   -0.000875 (0.00887)    0.00804 (0.00672)    -0.0034 [0.4783]
  6-yr graduation rate         0.0130* (0.00645)    -0.00168 (0.00873)    -0.0109 [0.9959]
  Log BA degrees              -0.00435 (0.0437)     -0.0104 (0.0193)       0.0254 [0.9017]

Panel C: High graduation rate institutions
  Log total degrees            0.0141 (0.01310)     -0.0199 (0.01790)      0.0192** [0.0321]
  First-year retention rate    0.00726 (0.00651)    -0.00846** (0.00369)  -0.004 [0.9735]
  6-yr graduation rate        -0.00204 (0.00759)     0.00248 (0.00448)     0.0055 [0.9887]
  Log BA degrees               0.0218 (0.01510)     -0.0229 (0.01960)      0.0084 [0.6144]

Panel D: Low graduation rate institutions
  Log total degrees            0.0198 (0.0579)       0.00942 (0.0165)      0.0522 [0.7483]
  First-year retention rate   -0.00205 (0.00856)     0.00723 (0.00450)    -0.0034 [0.7222]
  6-yr graduation rate         0.00573 (0.00836)     0.00947 (0.00813)     0.0079 [0.8733]
  Log BA degrees               0.00857 (0.0598)     -0.0266 (0.0190)       0.0485 [0.8785]

Panel E: High state-share of revenue institutions
  Log total degrees           -0.00301 (0.0525)     -0.0119 (0.0152)       0.0291 [0.8247]
  First-year retention rate    0.0058 (0.0118)       0.00545 (0.00525)     0.0063 [0.7951]
  6-yr graduation rate         0.000619 (0.00893)    0.0125 (0.0109)       0.0092 [0.8166]
  Log BA degrees               0.0288 (0.0629)      -0.0273 (0.0173)       0.0742 [0.5399]

Panel F: Low state-share of revenue institutions
  Log total degrees            0.0354*** (0.0113)   -0.00545 (0.0234)      0.031 [0.2726]
  First-year retention rate    0.0023 (0.00269)     -0.000941 (0.00408)   -0.0028 [0.9965]
  6-yr graduation rate         0.00249 (0.00792)    -0.00443 (0.00798)    -0.0069 [0.6858]
  Log BA degrees               0.00551 (0.0125)     -0.0195 (0.0202)       0.0175 [0.6337]

Panel G: Tennessee institutions
  Log total degrees            0.0471*** (0.0138)    0.00566 (0.00754)    -0.023 [0.625]
  First-year retention rate    0.0129*** (0.00303)   0.00201 (0.00302)     0.003 [0.7083]
  6-yr graduation rate         0.0062 (0.00533)      0.00382 (0.00414)    -0.0022 [1.0000]
  Log BA degrees               0.0586*** (0.0139)   -0.0119 (0.00851)      0.0244 [0.8333]

Panel H: Ohio institutions
  Log total degrees           -0.0129 (0.0165)      -0.017 (0.0166)        0.0663 [0.4167]
  First-year retention rate   -0.00355 (0.00434)    -0.000248 (0.00360)   -0.0018 [0.9167]
  6-yr graduation rate        -0.00557 (0.00723)     0.00634 (0.00850)    -0.0004 [1.0000]
  Log BA degrees              -0.0248 (0.0203)      -0.0369** (0.0177)     0.043 [0.5000]

Note: The first two columns for each outcome include school and year fixed effects and use states with no PBF as the control group. The control group in the third column for each outcome is the synthetic control. Synthetic control estimates show p-values in square brackets. Standard errors clustered at the state level shown in parentheses. * p<0.1, ** p<0.05, *** p<0.01

Table 7: Spending mechanism

For each outcome, the first estimate is from the specification with covariates and no state-specific time trend, the second adds state-specific time trends, and the third is the synthetic control estimate. N = 3039 in all columns.

  Log total expenditure            -0.0166 (0.0119)     0.00579 (0.0138)     -0.0285 [0.6198]
  Share spent on instruction        0.00273 (0.0133)   -0.00511 (0.0150)     -0.0069 [0.7066]
  Share spent on student supports  -0.0101 (0.0138)    -0.00647 (0.0167)      0.0045 [0.6979]
  Share research expenditures      -0.00233 (0.00322)  -0.00278 (0.00442)    -0.0029 [0.9549]

Note: The first two columns for each outcome include school and year fixed effects and use states with no PBF as the control group. The control group in the third column for each outcome is the synthetic control. Synthetic control estimates show p-values in square brackets. Standard errors clustered at the state level shown in parentheses. * p<0.1, ** p<0.05, *** p<0.01

Table 8: Student composition mechanism

For each outcome, the first estimate is from the specification with covariates and no state-specific time trend, the second adds state-specific time trends, and the third is the synthetic control estimate.

  Log total enrollment (FTE)            0.0158 (0.0126)     -0.0124 (0.0239)      0.01 [0.8750]       N = 2846
  Log Pell dollars per student          0.0212 (0.0407)      0.0647 (0.0392)      0.078 [0.6389]      N = 3103
  Proportion undergraduates over 24     0.0007 (0.004)       0.006 (0.004)        0.0148 [0.550]      N = 2920
  Proportion black                     -0.008** (0.0038)     0.011*** (0.0037)   -0.0039 [0.3750]     N = 3105
  Proportion Hispanic                  -0.0105* (0.00569)   -0.00418* (0.0021)   -0.0061 [0.4948]     N = 3105

Note: The first two columns for each outcome include school and year fixed effects and use states with no PBF as the control group. The control group in the third column for each outcome is the synthetic control. Synthetic control estimates show p-values in square brackets. Standard errors clustered at the state level shown in parentheses. * p<0.1, ** p<0.05, *** p<0.01

Figure 1: Event Study Analysis of Effects of PBF 3.0 on Academic Outcomes
Panel A: Log Total Degrees. Panel B: First-to-Second Year Retention. Panel C: Six-Year Graduation Rate. Panel D: Log BA Degrees.
Notes: Figures show estimates from a regression of the specified outcome on institution and year fixed effects and a set of indicator variables for years before and after a PBF program was enacted. The indicator for t-1 is omitted, so estimated effects are relative to that period. See equation (2) and the accompanying explanation in the text for additional details. Standard errors are clustered at the state level.
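For reference, an event-study specification of this kind typically takes the following form; the notation here is an illustrative rendering rather than a reproduction of equation (2) in the main text, which may differ in minor details:

y_{ist} = \alpha_i + \gamma_t + \sum_{k \neq -1} \beta_k \mathbf{1}[t - T_s = k] + \varepsilon_{ist},

where i indexes institutions, s states, and t calendar years; T_s is the year state s enacted PBF 3.0; \alpha_i and \gamma_t are institution and year fixed effects; and the \beta_k trace out effects relative to the omitted year k = -1.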

Figure 2: Synthetic Control Analysis of Effects of PBF 3.0 on Academic Outcomes
Panel A: Log Total Degrees. Panel B: First-to-Second Year Retention. Panel C: Six-Year Graduation Rate. Panel D: Log BA Degrees.
Notes: Figures show results from synthetic control method estimates (Abadie et al. 2010). The counterfactual path of each outcome is generated by choosing control-state weights that minimize pre-period differences in the dependent variable between treated and control states. See text for further detail.
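Concretely, and using notation that is illustrative rather than drawn from the text, the control-state weights solve a problem of the form

W^* = \arg\min_{W} \sum_{t < T_0} \Big( Y_{1t} - \sum_{j} w_j Y_{jt} \Big)^2 \quad \text{subject to} \quad w_j \ge 0, \; \sum_{j} w_j = 1,

where Y_{1t} is the treated outcome in pre-policy year t, the Y_{jt} are outcomes for donor (control) states, and T_0 is the year the policy takes effect; the post-period counterfactual is then \sum_j w_j^* Y_{jt}.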

Table A1: Control States Used in Analysis

State             (1) No-PBF Control Group    (2) At Most Rudimentary PBF Control Group
Alabama           Yes                         Yes
Arizona           Yes                         Yes
California        Yes                         Yes
Colorado          Yes                         Yes
Connecticut       Yes                         Yes
Georgia           Yes                         Yes
Hawaii            Yes                         Yes
Idaho             Yes                         Yes
Iowa              Yes                         Yes
Kansas            Yes                         Yes
Kentucky          Yes                         Yes
Maryland          Yes                         Yes
Massachusetts     Yes                         Yes
Michigan          -                           Yes
Minnesota         -                           Yes
Missouri          -                           Yes
Nebraska          -                           Yes
New Hampshire     Yes                         Yes
New Jersey        Yes                         Yes
New Mexico        -                           Yes
New York          Yes                         Yes
North Carolina    -                           Yes
Oklahoma          Yes                         Yes
Rhode Island      Yes                         Yes
South Carolina    Yes                         Yes
South Dakota      Yes                         Yes
Texas             Yes                         Yes
Utah              -                           Yes
Vermont           Yes                         Yes
Virginia          Yes                         Yes
Washington        -                           Yes
West Virginia     Yes                         Yes
Wisconsin         Yes                         Yes
Wyoming           -                           Yes

Source: Author calculations

Table A2: Academic Outcomes Conditional on Student Composition

                               Without composition controls    With composition controls
  Log total degrees             0.0137 (0.0253)                 0.0132 (0.0288)
  First-year retention rate     0.0037 (0.0068)                 0.0001 (0.0057)
  6-yr graduation rate          0.0011 (0.0064)                 0.0017 (0.0068)
  Log BA degrees                0.0117 (0.0335)                 0.0036 (0.0333)

All columns include school fixed effects, covariates, and a baseline graduation rate x trend control. Observations: 3,102; 3,103; 3,102.

Notes: For each outcome, the first column replicates the analysis from Table 3 but has a few fewer observations due to schools missing student composition data. The second column adds the student composition controls to assess whether schools improve outcomes conditional on composition. Standard errors clustered at the state level reported in parentheses. * p<0.1, ** p<0.05, *** p<0.01

Figure A1: Event Study Analysis of Effects of PBF 3.0 on Spending Outcomes
Panel A: Log Total Expenditures. Panel B: Share Spent on Instruction. Panel C: Share Spent on Student Supports. Panel D: Share Spent on Research.
See notes to Figure 1.

Figure A2: Synthetic Control Analysis of Effects of PBF 3.0 on Spending Outcomes
Panel A: Log Total Expenditures. Panel B: Share Spent on Instruction. Panel C: Share Spent on Student Supports. Panel D: Share Spent on Research.
See notes to Figure 2.

Figure A3: Event Study Analysis of Effects of PBF 3.0 on Alternate Outcomes
Panel A: Log Total Enrollment. Panel B: Log Pell Dollars per Student. Panel C: Proportion of Students Over Age 24. Panel D: Proportion of Students Black. Panel E: Proportion of Students Hispanic.
See notes to Figure 1.

Figure A4: Synthetic Control Analysis of Effects of PBF 3.0 on Alternate Outcomes
Panel A: Log Total Enrollment. Panel B: Log Pell Dollars per Student. Panel C: Proportion of Students Over Age 24. Panel D: Proportion of Students Black. Panel E: Proportion of Students Hispanic.
See notes to Figure 2.

Figure A5: Event Study Analysis of Effects of PBF 3.0 on Academic Outcomes Holding Student Composition Constant
Panel A: Log Total Degrees. Panel B: First-to-Second Year Retention. Panel C: Six-Year Graduation Rate. Panel D: Log BA Degrees.
See notes to Figure 1.