on interpreting the regression discontinuity design as a...

On Interpreting the Regression DiscontinuityDesign as a Local Experiment∗

Jasjeet S. Sekhon† Rocıo Titiunik‡

Robson Professor James Orin Murfin Associate ProfessorPolitical Science and Statistics Political Science

UC Berkeley University of Michigan

October 2, 2016

Abstract

We discuss the two most popular frameworks for identification, estimation and infer-ence in regression discontinuity (RD) designs: the continuity-based framework, wherethe conditional expectations of the potential outcomes are assumed to be continuousfunctions of the score at the cutoff, and the local randomization framework, wherethe treatment assignment is assumed to be as good as randomized in a neighborhoodaround the cutoff. Using various examples, we show that (i) assuming random assign-ment of the RD running variable in a neighborhood of the cutoff neither implies thatthe potential outcomes and the treatment are statistically independent nor that thepotential outcomes are unrelated to the running variable in this neighborhood, and (ii)assuming local independence between the potential outcomes and the treatment doesnot imply the exclusion restriction that the score affects the outcomes only throughthe treatment indicator. Our discussion highlights key distinctions between “locallyrandomized” RD designs and real experiments, including that statistical independenceand random assignment are conceptually different in RD contexts, and that the RDtreatment assignment rule places no restrictions on how the score and potential out-comes are related. Our findings imply that the methods for RD estimation, inferenceand falsification used in practice will necessarily be different (both in formal propertiesand in interpretation) according to which of the two frameworks is invoked.

∗We are indebted to Matias Cattaneo, Anthony Fowler, Kellie Ottoboni, Nicolas Idrobo Rincon, KosukeImai, Joshua Kalla, Fredrik Savje, Yotam Shem-Tov, and three anonymous reviewers for helpful commentsand discussion. Sekhon gratefully acknowledges support from the Office of Naval Research (N00014-15-1-2367) and Titiunik gratefully acknowledges financial support from the National Science Foundation (SES1357561).†UC Berkeley, Travers Department of Political Science, 210 Barrows Hall #1950, Berkeley, CA 94720-

1950. Email: <[email protected]>, Web: http://sekhon.berkeley.edu.‡Department of Political Science, 505 South State St., 5700 Haven Hall, University of Michigan, Ann

Arbor, MI 48109. Email: <[email protected]>, Web: http://www.umich.edu/~titiunik.

http://sekhon.berkeley.edu

http://www.umich.edu/~titiunik

The regression discontinuity (RD) design is a research strategy based on three main

components—a score or “running variable,” a cutoff, and a treatment. Its basic characteristic

is that the treatment is assigned based on a known rule: all units receive a score value, and

the treatment is offered to those units whose score is above a cutoff and not offered to those

units whose score is below it (or viceversa).

The RD design has been available since the 1960s (Thistlethwaite and Campbell 1960),

but its popularity has grown particularly fast in recent years. In the last decade, an increasing

number of empirical researchers across the social and biomedical sciences has turned to

the RD design to estimate causal effects of treatments that are not, and often cannot be,

randomly assigned. This growth in RD empirical applications has occurred in parallel with

a rapid development of methodological tools for estimation, inference and interpretation of

RD effects. See Cook (2008), Imbens and Lemieux (2008) and Lee and Lemieux (2010)

for early reviews, and the introduction to this volume by Cattaneo and Escanciano for a

comprehensive list of recent references.

The recent popularity of the RD design was in part sparked by the work of Hahn, Todd,

and van der Klaauw (2001), who translated the design into the Neyman-Rubin potential

outcomes framework (Holland 1986) and offered minimal conditions for non-parametric iden-

tification of average effects. These authors showed that, when all units comply with their

assigned treatment, the average effect of the treatment at the cutoff can be identified under

the assumption that the conditional expectations of the potential outcomes given the score

are continuous (and other mild regularity conditions). This emphasis on continuity condi-

tions for identification was a departure from other “quasi-experimental” research designs,

which are typically based on independence or mean independence assumptions.

In the Neyman-Rubin framework, randomized experiments are seen as the gold standard,

and quasi-experimental designs are broadly characterized in terms of the assumptions under

which the (non-random) treatment assignment mechanism is as good as randomized. For ex-

ample, in observational studies based on the unconfoundedness or “selection-on-observables”

1

assumption, the treatment is as good as randomly assigned, albeit with unknown distribu-

tion, after conditioning on observable covariates (Imbens and Rubin 2015). Similarly, instru-

mental variables designs can be seen as randomized experiments with imperfect compliance,

and difference-in-difference designs compare treatment and control groups that, on average—

and except for time-invariant characteristics that can be removed by differencing—differ only

on treatment status. In all these cases, a treatment and a control group are well defined and,

under certain assumptions, can be compared to identify the average treatment effect for the

population of interest (or a subpopulation thereof).

In this context, the results derived by Hahn, Todd, and van der Klaauw (2001) set the RD

design apart. In contrast to most other quasi-experimental designs, RD identification was

established only for the average treatment effect at a single point: the cutoff. This implied

that, unlike differences-in-differences or selection-on-observable designs, the RD design could

not be accurately described by appealing to a comparison between a treatment and a control

group. Given the RD assignment rule, under perfect compliance it is impossible for treated

and control units to have the same score value. Moreover, when the running variable is

continuous, the probability of seeing an observation with a score value exactly equal to the

cutoff is zero. Thus, in the continuity-based RD setup, identification of the average treatment

effect at the cutoff necessarily relies on extrapolation, as there are neither treated nor control

observations with score values exactly equal to the cutoff.

The influential contribution by Lee (2008) changed the way RD was perceived, and

aligned the interpretation of RD designs with experimental (and the other quasi-experimental)

research designs. Lee (2008) argued that in a RD setting where the score can be influenced by

the subjects’ choices and unobservable characteristics, treatment status can be interpreted

to be as good as randomized in a local neighborhood of the cutoff as long as subjects lack

the ability to precisely determine the value of the score they receive—i.e., as long as their

score contains a random chance component. Lee’s framework captured the original ideas

in the seminal article by Thistlethwaite and Campbell, who called a hypothetical experi-

2

ment where the treatment is randomly assigned near the cutoff an “experiment for which

the regression-discontinuity analysis may be regarded as a substitute” (Thistlethwaite and

Campbell 1960, p. 310).

The interpretation of RD designs as local experiments developed by Lee (2008) has been

very influential, both conceptually and practically. Among other things, it established the

need to provide falsification tests based on predetermined covariates just as one would do in

the analysis of experiments, a practice that has now been widely adopted and has increased

the credibility of countless RD applications (e.g., Caughey and Sekhon 2011; Eggers et al.

2014; Hyytinen et al. 2015). Moreover, it provided an intuitive interpretation of the RD

parameter that allowed researchers to think about treatment and control groups instead of

an effect at a single point—the cutoff—where there are effectively no observations.

The claim that the RD treatment assignment rule is “as good as randomized” in a

neighborhood of the cutoff can be interpreted in at least two ways. In one interpretation, it

means that there should be no treatment effect on predetermined covariates at the cutoff,

and that the validity of the underlying RD assumptions can be evaluated by testing the

null hypothesis that the RD treatment effect is zero on predetermined covariates. In another

interpretation, it means that the treatment is (as good as) randomly assigned near the cutoff,

and estimation and inference for treatment effects (and covariate balance tests) can be carried

out using the same tools used in experimental analysis. The first interpretation does not

imply the second because one can test for covariate balance (at the cutoff) under the usual

continuity assumptions. While the first interpretation has resulted in an increased and much

needed focus on credibility and falsification, the second has been the source of considerable

confusion. Our goal is to discuss the source of such confusion in detail. In doing so, we

clarify crucial conceptual distinctions within the local experiment RD framework, which in

turn elucidate the differences and similarities between this framework and the more standard

continuity-based approach.

We explore both the relationship between continuity and local randomization assumptions

3

in RD designs, and the implications of adopting a RD framework based on an explicit local

randomization assumption. Our paper builds on prior studies that have considered this issue.

Hahn, Todd, and van der Klaauw (2001) first invoked a local randomization assumption for

identification of RD effects under noncompliance. More recently, Cattaneo, Frandsen, and

Titiunik (2015) formalized an analogous assumption using a Fisherian, randomization-based

RD framework, which was extended by Cattaneo, Titiunik, and Vazquez-Bare (2016a,b).

Various super-population versions of the local randomization RD assumption were also pro-

posed by Keele, Titiunik, and Zubizarreta (2015) and Angrist and Rokkanen (2015), and

more recently de la Cuesta and Imai (2016) discussed the relationship between local ran-

domization and continuity RD assumptions.

We make two main arguments. First, we show that the usual RD continuity assumptions

are not sufficient for a literal local randomization interpretation of RD designs—that is,

not sufficient to ensure that, near the cutoff, the potential outcomes are independent of the

treatment assignment and unrelated to the running variable. This has been argued previously

(Cattaneo, Frandsen, and Titiunik 2015; de la Cuesta and Imai 2016); we simply provide a

stylized example to further illustrate the main issues. Second, contrary to common practice,

we show that the assumption that the treatment is randomly assigned among units in a

neighborhood of the cutoff cannot be used to justify analyzing and interpreting RD designs

as actual experiments. The restrictions imposed by the treatment assignment rule in a sharp

RD design rule out a local experiment where one randomly assigns treatment status without

changing the units’ score values. Instead, one could assume that score values are randomly

assigned in a neighborhood of the cutoff, a randomization model that is consistent with the

RD assignment mechanism. However, even under this model, interpreting and analyzing the

RD design as an experiment will generally result in invalid inferences because the score can

affect the potential outcomes directly in addition to through treatment status.

In particular, we show that (i) assuming random assignment of the RD running variable

in a neighborhood of the cutoff does not imply that the potential outcomes and the treatment

4

assignment are statistically independent or that the potential outcomes are unrelated to the

running variable in this neighborhood, and (ii) assuming local independence between the

potential outcomes and the treatment assignment does not imply the exclusion restriction

that the score affects the outcomes only via the treatment assignment indicator. Our discus-

sion makes clear that the RD treatment assignment rule need not place any restrictions on

the ways in which the score influences the potential outcomes and shows that, in local ran-

domization RD settings, statistical independence and random assignment are conceptually

different.

The discussion that follows focuses on the sharp RD design, in which units’ compliance

with treatment assignment is perfect, and the probability of receiving treatment changes from

zero to one at the cutoff. We do not explicitly discuss fuzzy RD settings, where compliance is

imperfect and the decision to take treatment is endogenous, because most of our arguments

and conclusions apply equally to both sharp and fuzzy RD designs. When our discussion

must be modified for the fuzzy case, we note it explicitly. Throughout, we refer to the

standard RD design based on continuity identification assumptions as the continuity-based

RD design; and to the RD design based on a local randomization assumption as the local

randomization or randomization-based RD design.

1 The Continuity-Based RD Framework

In this and the subsequent sections we assume that we have a random sample {Yi(1), Yi(0), Xi}ni=1,

where Xi is the score on the basis of which a binary treatment Ti is assigned according to the

rule Ti = 1(Xi ≥ c)—or Ti = 1(Xi ≤ c) depending on the example—for a known constant c,

Yi(1) is the potential outcome under the treatment condition, and Yi(0) the potential out-

come under the untreated or control condition. For every unit i we observe either Yi(0) or

Yi(1), so the observed sample is {Yi, Xi}ni=1 where Yi := TiYi(1) + (1− Ti)Yi(0). We assume

throughout that all moments we employ exist, and that the density of Xi is positive and

continuous at the cutoff or in the intervals we consider.

5

The continuity-based framework is based on the identification conditions and estimation

methods first proposed by Hahn, Todd, and van der Klaauw (2001). The authors proposed

the following assumption:

Assumption 1: Continuity. The regression functions E[Yi(1)|Xi = x] and E[Yi(0)|Xi = x]

are continuous in x at the cutoff c.

Under this assumption, they showed that

τ RDCB ≡ E[Yi(1)− Yi(0)|Xi = c] = limx↓c

E[Yi|Xi = x]− limx↑c

E[Yi|Xi = x]. (1)

Thus, when the focus is on mean effects, the target parameter in the continuity-based RD

framework is τ RDCB—the average treatment effect at cutoff. The identification result in equation

(1) states that, under continuity, this parameter—which depends on potential outcomes that

are fundamentally unobservable, can be expressed as the difference between the right and

left limits of the average observed outcomes at the cutoff. Estimation is therefore concerned

with constructing appropriate estimators for limx↑c E[Yi|Xi = x] and limx↓c E[Yi|Xi = x].

The most commonly used approach to estimate these limits is to rely on local polynomial

methods, fitting two polynomials of the observed outcome on the score—one for observations

above the cutoff, the other for observations below it— using only observations in a neigh-

borhood of the cutoff, with kernel weights assigning higher weights to observations closer to

the cutoff. Because these fitted polynomials are approximations to the unknown regression

functions E[Yi(1)|Xi = x] and E[Yi(0)|Xi = x] near the cutoff, the choice of neighborhood,

commonly known as bandwidth, is crucial. Given the order of the polynomial—typically

one—the bandwidth controls the quality of the approximation, with smaller bandwidths

reducing the bias of the approximation and larger bandwidths reducing its variance—see

Cattaneo and Vazquez-Bare (2016) for an overview of RD neighborhood selection.

Although the technical details of local polynomial estimation and inference are outside

the scope of our discussion, we highlight several issues that are central to distinguishing the

continuity-based approach from the local randomization approach we discuss next. In the

6

continuity-based RD design:1

• The target parameter, τ RDCB , is an average effect at a single point, rather than in an

interval.

• The functional form of the regression functions E[Yi(0)|Xi = x] and E[Yi(1)|Xi = x] is

unknown, and is locally approximated by a polynomial for estimation and inference.

• In general, the polynomial approximation will be imperfect. The approximation error

is controlled by the bandwidth sequence, hn: the smaller hn, the smaller the error.

• Local polynomial estimation and inference are based on large-sample results that re-

quire the bandwidth to shrink to zero as the sample size increases. For example,

consistent estimation of τ RDCB requires hn → 0 and nhn →∞.

• Local polynomial methods require smoothness conditions on the underlying regression

functions E[Yi(0)|Xi = x] and E[Yi(1)|Xi = x] in order to control the leading biases of

the local polynomial RD estimators. These smoothness conditions are stronger than

the continuity assumption required for identification of τ RDCB .

• The bandwidth hn plays no role in the identification of τ RDCB .

• Neither the continuity assumptions required for identification nor the smoothness as-

sumptions required for estimation and inference are implied by the RD treatment

assignment rule. See Sekhon and Titiunik (2016) for further discussion.

1.1 Continuity Does Not Imply Local Randomization

We now consider a stylized example that shows that continuity of the potential outcomes

regression functions (as in Assumption 1) does not imply that the treatment can be seen as

1For a general treatment of local polynomial methods, see Fan and Gijbels (1996) and Calonico, Cattaneo,and Farrell (2016) for recent higher-order results. For local polynomial methods applied specifically to theRD case, see Hahn, Todd, and van der Klaauw (2001), Porter (2003), Imbens and Kalyanaraman (2012),Calonico, Cattaneo, and Titiunik (2014), and Calonico et al. (2016).

7

locally randomly assigned in a literal or precise sense. We assume that we have a sample of

n students who take a mathematics exam in the first quarter of the academic year. Each

student receives a test score Xi in the exam, which ranges from 0 to 100, and students whose

grade is equal to or below 50 receive a double dose of algebra instruction (Ti = 1) during the

second quarter—while students with Xi > 50 receive a single dose (Ti = 0). The outcome of

interest is the test score Yi obtained in another mathematics exam taken at the end of the

second quarter, also ranging between 0 and 100. We assume that test scores are fine enough

so that they have no mass points and can be treated as continuous random variables.

To illustrate our argument, we assume that a student’s expected grade in the second

quarter under the control condition given her score in the first quarter, E[Yi(0)|Xi = x],

is simply equal to her score in the first quarter, Xi. We also assume that the double-dose

algebra treatment effect, τ , is constant for all students. The potential outcomes regression

functions are therefore:

E[Yi(0)|Xi] = Xi (2)

E[Yi(1)|Xi] = Xi + τ .

This model is of course extremely simplistic, but we adopt it because it allows us to illus-

trate the difference between the treatment assignment mechanism and the outcome model

in a straightforward way. We now assume that the initial grade Xi is entirely determined

by each student’s fixed inherent (and unobservable) ability ai—e.g. Xi = g(ai) for some

strictly increasing function g(·). Given this setup, the assignment of treatment according

to the rule 1(Xi ≤ 50) is as far from random as can be conceived, since it assigns all stu-

dents of lower ability to the treatment group and all students of higher ability to the control

group, inducing a complete lack of common support in the distribution of ability between

the groups. Despite this severe selection into treatment based on ability, the effect of the

treatment at the cutoff is readily identifiable, because the regression functions in equation

(2) are continuous at the cutoff (and everywhere else).

8

In this context, what does it mean to say that the RD design induces as-if randomness

near the cutoff? No matter how close to the threshold we get, the average ability in the

control group is always higher than in the treatment group: the effect ofX on E[Yi(j)|Xi] near

the cutoff is always nonzero—dE[Yi(j)|Xi = x]/dx = 1 for j = 0, 1 and for all x. Thus, the

continuity of the regression functions is entirely compatible with a very strong relationship

between outcome and score. This means that continuity is not sufficient to guarantee the

comparability of units on either side of the cutoff, even in a small neighborhood around it. In

other words, for any w > 0, one can always conceive a data generating process such that the

distortion induced by ignoring the relationship between X and Y in the window [c−w, c+w]

is arbitrarily large (a uniformity argument).

Thus, from an identification point of view, it is immediate to see that the continuity

condition in Assumption 1 that ensures identification in the super population does not imply

that the usual finite-sample “random assignment” identification assumptions hold. We will

discuss the latter type of assumptions in detail in the following section, but we now consider

one possibility in the context of the example in equation (2). Imagine that in this example we

wish to invoke a mean independence assumption in a small neighborhood W = [c−w, c+w]

around the cutoff, with w > 0, such as E[Yi(j)|Ti, Xi ∈ W ] = E[Yi(j)|Xi ∈ W ] for j = 0, 1;

recall that in this example Ti = 1(Xi ≤ c).

Focusing, for example, on the potential outcome under control, and given equation (2),

we have

E[Yi(0)|Ti = 0, Xi ∈ W ] = E [Yi(0)|Xi > c,Xi ∈ [c− w, c+ w] ]

= E [Yi(0)|Xi ∈ [c, c+ w] ]

= E [Xi|Xi ∈ [c, c+ w] ]

9

and

E[Yi(0)|Xi ∈ W ] = E [Yi(0)|Xi ∈ [c− w, c+ w] ]

= E [Xi|Xi ∈ [c− w, c+ w] ]

In general, E [Xi|Xi ∈ [c, c+ w] ] 6= E [Xi|Xi ∈ [c− w, c+ w] ], so the mean independence

assumption that is typically invoked in experiments cannot be invoked in this case. In

other words, this example shows that the continuity assumption is not enough to guaran-

tee independence between the potential outcomes and the treatment near the cutoff, and

consequently cannot be used to justify analyzing a RD design as one would analyze an

experiment.

From an estimation point of view, imagine that we mistakenly decided to analyze this

valid continuity-based RD design as an experiment in the fixed neighborhood [c− w, c+ w]

around the cutoff, defining the parameter of interest as average treatment effect in this

window. To calculate this effect, we would simply compare the average treated-control

difference in the observed outcomes in [c − w, c + w]. Unsurprisingly, this approach would

lead to an incorrect answer, since

E[Yi|c− w ≤ Xi ≤ c]−E[Yi|c < Xi ≤ c+ w]

= τ + E[Xi|c− w ≤ Xi ≤ c]− E[Xi|c < Xi ≤ c+ w]

< τ ,

where the last line follows from the fact that E[Xi|c−w ≤ Xi ≤ c]−E[Xi|c < Xi ≤ c+w] ∈

[−2w, 0).2 Thus, for any fixed window W = [c− w, c+ w], analyzing this RD design as one

would analyze an experiment will lead to a biased treatment effect estimate.

In this example, the smaller w, the closer the naive local randomization estimate will

be to the true effect τ . However, as we discuss below, in order for the local randomiza-

2If the treatment assignment rule were 1(Xi ≥ c), which is the more common definition of the treatmentin the RD literature, we would have E[Yi|c ≤ Xi ≤ c + w] − E[Yi|c − w ≤ Xi < c] = τ + E[Xi|c ≤ Xi ≤c+ w]− E[Xi|c− w ≤ Xi < c], and E[Xi|c ≤ Xi ≤ c+ w]− E[Xi|c− w ≤ Xi < c] ∈ (0, 2w].

10

tion RD framework to be conceptually distinct from the continuity-based framework, the

neighborhood W must necessarily be conceived as fixed. And, given a fixed W , the lack

of comparability between treated and control groups cannot be eliminated by increasing

the sample size—a direct consequence of that fact that such lack of comparability is an

identification problem rather than an estimation one.

In contrast, in the continuity-based framework, focusing on the average treated-control

outcome difference in a neighborhood of the cutoff can be understood as approximating the

unknown potential outcomes regression functions with a local constant fit. Such strategy

will result in a possibly large approximation error; however, the error is entirely due to the

estimation strategy and will vanish asymptotically. In this fundamental sense, a RD design

based on a continuity assumption cannot be interpreted literally as a local experiment. In

other words, assuming the conditional regression functions E[Yi(1)|Xi] and E[Yi(0)|Xi] are

continuous in Xi at c is not enough to treat the RD design as a pure experiment near the

cutoff. If we are to treat the RD design as a local experiment, we must effectively fix a

window width, and thus change the parameter of interest.

2 The Local Randomization RD Framework

The simple example above shows that continuity of the conditional regression functions is not

enough to justify analyzing or interpreting the RD design as an experiment in a neighborhood

of the cutoff. We now consider a local randomization RD setup where we explicitly assume

that the treatment is randomly assigned for all subjects with Xi ∈ [c − w, c + w]. As we

discuss, formalizing precisely the local randomization RD framework is difficult, because the

notion of random assignment near the cutoff can be interpreted in different ways.

2.1 Local Randomization of Score or Treatment?

Intuition suggests that simply assuming that the treatment is randomly assigned in a small

neighborhood around the cutoff should be enough to analyze the RD design as a local

11

experiment. However, this is not the case for two reasons: (i) the RD treatment assignment

rule places strict restrictions on the type of random assignments that are conceivable, and

(ii) the assignments that are conceivable do not imply an exclusion restriction that is always

true in actual experiments.

We consider two possible scenarios, according to two different interpretations of what it

means for the treatment to be locally randomly assigned near the cutoff. In the first scenario,

the values of Xi stay constant but the treatment received is randomly changed for subjects

near the cutoff. In the second scenario, the value of Xi is randomly assigned for all subjects

near the cutoff. As we show, this is an important distinction that must be considered when

formalizing the local randomization RD assumption.

Scenario 1: Treatment status randomly assigned for all subjects near the cutoff

The first way to understand locally random treatment assignment in the RD context is to

imagine a situation where all subjects with score in a neighborhood of the cutoff—that is,

with Xi ∈ [c − w, c + w]—are randomly assigned to receive treatment or control. In the

context of our education example introduced in the previous section, we could accomplish

this if, for example, we randomly assigned all students who scored between 45 and 55 in the

first exam to receive either a single or double dose of algebra, with every student receiving

double dose with the same positive probability. Given our assumption that the test score in

the first quarter is entirely determined by the students’ ability, this assignment mechanism

would break the relationship between ability and treatment status induced by the RD rule

1(Xi ≤ 50), because it would make receiving the treatment entirely independent of the grade

obtained in the first exam.

However, this way of conceptualizing randomization implies that the score Xi is unrelated

to treatment status in the local neighborhood, which is incompatible with the treatment as-

signment rule. In particular, this assignment would imply that there would be both treated

and control subjects on both sides of the cutoff in the neighborhood [c − w, c + w], con-

12

tradicting the RD treatment assignment rule 1(Xi ≤ c). This illustrates the general point

that, in a sharp RD design, treatment status is deterministic given the score, and thus it

is not possible to randomly assign the treatment without altering the values of the running

variable.3

Scenario 2: Score value randomly assigned for all subjects near the cutoff

The alternative is to assume that the running variable, not treatment status, is randomly

assigned near the cutoff—a manipulation that would be consistent with the sharp RD treat-

ment rule. There are multiple ways in which one could conceive of such an experiment.

For example, following our double-dose algebra example, we might believe that, even though

exam performance is broadly influenced by ability, the precise grade received by each student

involves some degree of randomness and arbitrariness, such that students whose grades are

four points or less apart should be of comparable ability. The school may therefore imple-

ment a two-stage process, where first all exams are graded, and for those students whose

grades fall between 48 and 52, the original score is replaced with a uniform random number

between 48 and 52. The justification for this two-stage procedure might be that it still as-

signs the treatment to those students who most need it (all students who get very low grades

are guaranteed to receive the treatment) but it implements a transparent and fair process

to assign the treatment to those students very near the cutoff whose characteristics may be

indistinguishable.

Under this setup, the test score in the first exam, Xi, is still broadly determined by

ability—so that a student who obtains a grade of 25 is on average of lower ability than a

student who obtains a grade of 90—but for two students whose grades are between 48 and 52,

the second-stage ensures that they have the same ex-ante probability of receiving treatment.

3Note that in a fuzzy RD design, it is possible to have both treated and control observations on eitherside of the cutoff. However, the simple random assignment we have described would still be incompatiblewith a fuzzy RD assignment because it would assign treatment with the same probability on either side ofthe cutoff, which would violate the assumption of discontinuity of the probability of treatment assignmentat the cutoff.

13

This hypothetical random assignment of score values implies that the average level of ability

(and any other predetermined confounder) of treated students with Xi ∈ [48, 50] is the same

as for control students with Xi ∈ [50, 52]. Note that this setup also assumes that the score

Xi collected by the researcher is the second stage score for all students, so that the treatment

rule 1(Xi ≤ 50) correctly distinguishes treated and control students. Such a two-stage rule

is of course artificial, although a second-stage randomization has been used in some actual

RD settings (see, e.g., Hyytinen et al. 2015).4

The crucial question is whether the assumption that the value of the score is (as-if)

randomly assigned in a neighborhood of the cutoff—that is, the assumption that all units

with score in [c− w, c + w] have the same probability of receiving any score value x within

this neighborhood and therefore the same probability of receiving treatment or control—is

enough to justify analyzing and interpreting the RD design in this neighborhood as an actual

experiment. At first glance, it might appear as if it is. We are assuming that, among units

in a local neighborhood of the cutoff, the value of the score is randomly assigned; since the

treatment is deterministically assigned based on this score, this guarantees that, on average,

there are no differences in the predetermined characteristics of units above and below the

4The reason we use two stages to set up a scenario where the score is randomly assigned is becausewe want to create a scenario where ability—a predetermined characteristic—is not systematically differentbetween treated and control groups. Because of potential individual-level heterogeneity in ability within thewindow [c−w, c+w], this cannot be guaranteed unless we assume a two-stage procedure or make an explicitassumption about the distribution of ability for subjects that fall in the neighborhood of the cutoff. Notealso that the two-stage scenario ensures comparability of all predetermined characteristics near the cutoff,whereas a scenario based on a single stage must consider assumptions about all possible predeterminedcharacteristics that affect the score.

To see this, consider the following alternative setup, where we assume that the score is a step functionof ability, such that higher ability leads to higher grades but the relationship between both variables isconstant in intervals. For example, we might imagine Xi = f(ai) + εi, with f(ai) = f(a) for all i such thatc− w ≤ Xi ≤ c+ w. Then,

P [Ti = 1 ≤ c | c− w ≤ Xi ≤ c+ w] = P [Xi ≤ c | c− w ≤ Xi ≤ c+ w]

= P [f(a) + εi ≤ c | c− w ≤ Xi ≤ c+ w]

= Fε (c− f(a) | c− w ≤ Xi ≤ c+ w)

where Fε(·) is the CDF of ε, which is the same function for all individuals with c−w ≤ Xi ≤ c+w. Underthis setup, all individuals in the local neighborhood of the cutoff have the same probability of receivingtreatment and the same ability.

14

cutoff within this window.

However, this reasoning ignores the possibility that the score itself may affect the po-

tential outcomes, a phenomenon we have explicitly allowed in our example in equation (2).

When we introduced this equation, we motivated it by imagining that students’ unobservable

ability affects both the grade obtained in the first exam, Xi, and the grade obtained in the

second exam, Yi. Our new assumption that the score is unrelated to students’ predetermined

characteristics in the neighborhood [48, 52] might seem at first to imply that the running

variable must be unrelated to the regression functions in this neighborhood, as illustrated

in Figure 1(a). However, if the running variable Xi affects the potential outcomes through

factors other than predetermined characteristics, the regression functions can follow equa-

tion (2) even when Xi is randomly assigned in a window around the cutoff—this situation

is illustrated in 1(b).

Figure 1: Two Scenarios with Randomly Assigned Score

Assigned to Treatment Assigned to Control

E[Y(1)|X]

E[Y(0)|X]

Cutoff

0

20

40

60

80

100

0 20 40 50 60 80 100Math Test Score Quarter I

Mat

h Te

st S

core

Qua

rter

II

(a) Test scores locally unrelated to potential out-comes

Assigned to Treatment Assigned to Control

E[Y(1)|X]

E[Y(0)|X]

Cutoff

0

20

40

60

80

100

0 20 40 50 60 80 100Math Test Score Quarter I

Mat

h Te

st S

core

Qua

rter

II

(b) Test scores locally related to potential out-comes

A more precise way to denote the potential outcomes would be Yi(Ti, Xi), where the first

argument captures the effect of the treatment assignment—i.e., of being on one side of the

cutoff or the other—and the second argument captures the effect of the running variable on

15

the outcome that occurs independently of the treatment status. This “direct” effect of Xi on

Yi would occur in our example if, for instance, a student’s test score in the first exam had a

reinforcing effect. If near the cutoff students who receive lower test scores were discouraged

and put less effort in the second exam than students who receive higher scores (because, for

example, they feel stigmatized by being assigned to the treatment group and are frustrated

by having been so close to the cutoff), Xi might be positively associated with the potential

outcomes even in the absence of a treatment effect. Under this assumption, the model in

equation (2) and Figure 1(b) could still be true even if the score were randomly assigned in

[48, 52].

Another way to see the distinction is to note that any experiment can be seen as a RD

design in which the score is a random number and the cutoff is chosen to ensure the desired

probability of treatment assignment. In our example, if instead of having instructors grade

the exams we assigned each student a uniform random number between 0 and 100, the

treatment assignment rule 1(Xi ≤ 50) would effectively become a rule that assigns double-

dose algebra entirely randomly among students, with each student having a 50% probability

of receiving treatment. In this case, however, there would be no need to add an extra

argument in the regression functions for the random number that determines treatment, as

the score used to randomly assign treatment is simply a computer-generated pseudo-random

number that is unrelated to the potential outcomes by construction.5

In contrast, in a RD design where the score is assumed to be randomly assigned in the

the local neighborhood [c − w, c + w] but is not an arbitrarily generated number unrelated

to the phenomena under study, there is nothing preventing the value of the running variable

from affecting the potential outcomes directly. In other words, local random assignment of

Xi does not guarantee the exclusion restriction that we typically take for granted in actual

randomized experiments. This illustrates the distinction between assumptions about the

assignment of the score Xi, and assumptions about the shape of the regression functions:

5Note that this statement assumes that students are never informed about the particular random “grade”that they receive, as occurs in actual experiments.

16

assumptions about the law of the random variable Xi place no restrictions on the functions

E[Yi(1)|Xi], and E[Yi(0)|Xi].

Note also that this phenomenon is analogous to the instrumental variables (IV) design,

where random assignment of the instrument does not imply the exclusion restriction that is

required to interpret the usual estimand as the average treatment effect for the compliers

(Angrist, Imbens, and Rubin 1996). The parallelism arises because in both IV and RD

designs, researchers are interested in the effect of the treatment on the outcome, but there is

a third variable (the score in RD, the instrument in IV) that induces a change in treatment

status and can also affect the potential outcomes directly. In both cases, the way in which

this third variable is assigned imposes no general restrictions on its relationship with the

potential outcomes.

The analogy between IV and fuzzy RD designs has been long recognized in the continuity-

based framework, where the similarities arise because of imperfect compliance with the

treatment assignment rule in fuzzy RD settings. However, in the context of the local ran-

domization RD framework, similarities occur even in the sharp RD case where compliance

with treatment assignment is perfect. The analogy we point out is not related to treatment

compliance but rather to the possible role of the score (instrument) as a determinant of the

outcome irrespective of treatment assignment and/or status.

2.2 Formalizing the Local Randomization RD Assumption

We now attempt to formalize a local randomization RD assumption. The first such for-

malization was proposed by Cattaneo, Frandsen, and Titiunik (2015), who used a Fisherian

randomization-based approach in which the potential outcomes are seen as fixed as opposed

to random variables. Their proposed assumption has two parts. The first part states that

the conditional distribution function of the score in the finite sample is the same for all units

in the neighborhood of the cutoff. The second explicitly adopts an exclusion restriction that

rules out any “direct” effects of the score on the potential outcomes and states that, within

17

the window, the fixed potential outcomes depend on the running variable solely through the

treatment indicator.6 As our discussion above illustrates, the exclusion restriction plays a

crucial role in the analogy between RD designs and experiments, an issue that has not been

generally recognized by scholars outside of the Fisherian framework.

Since the continuity-based RD framework has been almost exclusively developed within a

random sampling framework in which the potential outcomes are random variables, and the

analogy between RD and experiments is often understood in these terms, we formalize the

local randomization assumption using the random sampling setting introduced in Section

1. We thus retain the random sampling framework throughout the paper. In particular,

our formalization is based on statistical independence between the potential outcomes Yi(1)

and Yi(0) and the binary treatment assignment indicator in a neighborhood of the cutoff,

an assumption routinely invoked in experimental analysis and guaranteed by construction

by the random assignment of treatment in actual randomized experiments. A condition of

this kind has been invoked in local randomization RD settings by, for example, Angrist and

Rokkanen (2015), de la Cuesta and Imai (2016), Keele, Titiunik, and Zubizarreta (2015).

Formally, we state this assumption as:

Assumption 2: (Super-population) Local Independence. There exists a neighborhood

around the cutoff c, W = [c− w, c+ w], w > 0, such that

Yi(1), Yi(0) ⊥⊥ Ti | Xi ∈ W ,

where ⊥⊥ denotes statistical independence and Ti = 1(Xi ≥ c) as defined above.

Under this assumption, the average treatment effect in the window W is identified by

τ RDLR ≡ E[Yi(1)− Yi(0)|Xi ∈ W ] = E[Yi|Ti = 1, Xi ∈ W ]− E[Yi|Ti = 0, Xi ∈ W ]

If the window W is known, estimation can proceed, for example, simply by computing a

6Cattaneo, Titiunik, and Vazquez-Bare (2016a) relax this restriction to allow the potential outcomesfunctions to depend on the score, and use that information to adjust the potential outcomes and build therandomization distribution of the desired test-statistic. Their approach clarifies that one could use a finite-sample randomization-based framework to analyze a RD design when the exclusion restriction is violated,provided one specifies a particular functional form for the potential outcome function yi(x, t).

18

difference in means between treated and control groups for observations in this window.

Under Assumption 2, the local randomization RD framework has the following features:

• The target parameter, τ RDLR , is the average treatment effect in an interval of the support

of the running variable, rather than at a point.

• Because the focus is on an average over an interval rather than at a point, ap-

proximation of the functional form of the regression functions E[Yi(0)|Xi = x] and

E[Yi(1)|Xi = x] is not necessary for estimation.

• Knowledge of the window W = [c− w, c + w] around the cutoff is necessary for iden-

tification of τ RDLR .

• Estimation and inference methods assume the window is fixed as the sample size grows.

Thus, as we have defined it, the local randomization RD setup stands in contrast to the

continuity-based framework described above. We highlight two distinctions in particular.

Remark 1 (Randomization in window versus at the cutoff). Our characterization of the local

randomization RD framework is explicit in stating a “random assignment” assumption that

holds in an interval around the cutoff rather than at the cutoff point. This is in contrast

to some formalizations of the local randomization RD framework that make assumptions

at the cutoff point. The reason we focus on an interval rather than a point is because

the requirement of statistical independence at a point is trivial: any random variable is

independent of a constant, so the potential outcomes and the score will always be independent

at the cutoff. In other words, an assumption of randomization or independence at the cutoff

has no empirical or theoretical content, and thus cannot be used as a justification to analyze

RD designs as experiments.

Naturally, one can still use the local randomization RD interpretation simply as an

approximation device, where the window is not seen as the interval where a randomiza-

tion condition holds but rather as the interval where the unknown regression functions are

approximated—see Cattaneo, Frandsen, and Titiunik (2015, §6.5). But used in this heuristic

way, the local randomization RD framework becomes in essence identical to the continuity-

based framework: the parameter of interest becomes the treatment effect at the cutoff, and

identification and estimation results are ultimately based on some kind of continuity condi-

tion (see, e.g., Lee 2008; Canay and Kamat 2016). �

Remark 2 (The Role of Neighborhood). If the local randomization RD framework is under-

stood in terms of a fixed window as in Assumption 2 and not as the heuristic approximation

19

discussed in Remark 1, the role of the neighborhood in this approach is conceptually different

from the role of the bandwidth in the continuity-based framework. In the latter framework,

the bandwidth is used to control the bias and variance of the local polynomial approxima-

tion to the unknown regression functions; this implies, among other things, that optimal and

valid estimation and inference will require choosing a different bandwidth for every outcome

variable or covariate analyzed. In contrast, in the local randomization RD framework based

on an assumption such as Assumption 2, the window is a fundamental piece of the research

design, since it is the interval where the required identification assumption holds. Consequen-

tially, in the local randomization framework, this single window is used to perform estimation

and inference for all outcomes and covariates. See Cattaneo and Vazquez-Bare (2016) for

further details on the role of neighborhood selection in RD estimation and inference. �

3 The Difference Between Random Assignment and

Independence in RD contexts

We now consider whether Assumption 2 is implied by the random assignment of the score

near the cutoff and whether it implies that the exclusion restriction is satisfied in the RD

context. An in depth consideration of these issues reveals subtle relationships between the

concepts of random treatment assignment, statistical independence, and exclusion restric-

tion in RD settings. Our discussion makes three main points: (i) the random assignment

of the score does not imply local independence between treatment assignment and potential

outcomes; (ii) the local statistical independence between treatment assignment and potential

outcomes does not imply the exclusion restriction that holds by construction in experiments;

and (iii) an experimental analysis is possible under local independence if the exclusion re-

striction fails, provided that the interpretation of the parameter is modified accordingly.

3.1 Random Assignment Of Score Does Not Imply Local Inde-

pendence

The local independence assumption as stated in Assumption 2 is not guaranteed to hold when

we randomly assign the score near the cutoff—i.e., when all subjects with c−w ≤ Xi ≤ c+w

face the same ex-ante probability of receiving every value of the score between c − w and

c + w. The reason is that, even if all subjects in the neighborhood of the cutoff have the

20

same marginal distribution of the running variable, the potential outcomes and the running

variable may be related in ways that violate the local independence assumption. If, as

discussed above, higher test scores lead students to expend systematically more or less effort

in future exams, the randomly assigned value of Xi will affect the potential outcomes directly

even inside the local neighborhood [c−w, c+w], inducing a relationship between the potential

outcomes Yi(1), Yi(0) and the treatment indicator 1(Xi ≤ c) or 1(Xi ≥ c) that may violate

independence.

Thus, unlike in the case of actual randomized experiments, in a local randomization RD

design in the sense of Assumption 2, statistical independence and random assignment are

conceptually different. The reason is that, as discussed above, the implicit randomization

rule in a local randomization RD design does not and cannot manipulate treatment status

directly; instead, the rule must randomly assign the score values. Therefore, if the randomly

assigned score has a direct impact on the potential outcomes—so that, for example, high

values of Xi lead to high values of the potential outcomes—the indicators 1(Xi ≤ c) and

1(Xi ≥ c) may fail to be statistically independent of the potential outcomes.

To see this more formally, simply consider the example in equation (2) introduced above.

We already showed in Section 1 that this example violates Assumption 2; now simply assume

that Xi is randomly assigned in a neighborhood of the cutoff, which is of course allowed by

equation (2).

3.2 Local Independence Does Not Imply Exclusion Restriction

Moreover, the local independence assumption as stated in Assumption 2 is not a sufficient

condition for the exclusion restriction that the score does not affect the potential outcomes

except via the treatment indicator. That is, the local independence assumption does not

imply that the score Xi and the regression functions E[Yi(0) | Xi], E[Yi(1) | Xi] are unrelated

in a local neighborhood of the cutoff. Graphically, this means that regression functions need

not be flat in the neighborhood [c−w, c+w] even when Assumption 2 holds; in other words,

21

local independence does not necessarily imply a scenario like the one illustrated in Figure

1(a).

We briefly present an example to illustrate this point. It suffices to focus on one

potential outcome, so we focus on Yi(1).

Yi(1) = 1(Xi ∈ W ) · (1− |Xi − c|) + 1(Xi /∈ W ) · (1− w) + εi (3)

for all i, with εi a random error independent of Xi, c the cutoff, 1(Xi ≤ c) the treatment

rule, and W = [c − w, c + w] as before. The regression function E[Yi(1)|X] implied by this

model for Yi(1) is shown in Figure 2. To simplify notation, we redefine x := x− c so that the

cutoff for x is normalized to zero. We also assume that the density of X, f(x), is symmetric

around zero in [−w,w].

Figure 2: Example Where Exclusion Restriction Is Violated But Local Independence Holds

Cutoff

1 − w

1

c − w c c + wScore (X)

E[Y

(1)|X

]

We wish to show that in this setup, which clearly violates the exclusion restriction, the

22

local independence assumption holds. Let W = [−w,w]. We want to show

P[Y (1) ≤ y, X ≤ 0 | X ∈ W ] = P[Y (1) ≤ y | X ∈ W ] · P[X ≤ 0 | X ∈ W ]

Under the assumptions we made, we can show that

P[Y (1) ≤ y, X ≤ 0 | X ∈ W ] = P[Y (1) ≤ y | − w ≤ X ≤ 0] · P[X ≤ 0 | − w ≤ X ≤ w]

= E[ P[Y (1) ≤ y | X] | − w ≤ X ≤ 0] · P[X ≤ 0 | − w ≤ X ≤ w]

=

∫ 0

−w Fε(y − 1 + |x|) · f(x) · dx∫ 0

−w f(x)dx·∫ 0

−w f(x)dx∫ w

−w f(x)dx

=

12

(∫ 0

−w Fε(y − 1 + |x|) · f(x) · dx+∫ w

0Fε(y − 1 + |x|) · f(x) · dx

)∫ w

−w f(x)dx

=12

∫ w

−w Fε(y − 1 + |x|) · f(x) · dx∫ w

−w f(x)dx

=1

2P[Y (1) ≤ y |X ∈ W ]

Since we assumed that the density of x is symmetric around zero in the window, P[X ≤

0 | X ∈ W ] = 12. Thus, we have shown that in this example

P[Y (1) ≤ y, X ≤ 0 | X ∈ W ] =1

2P[Y (1) ≤ y |X ∈ W ]

= P[X ≤ 0 | X ∈ W ] · P[Y (1) ≤ y | X ∈ W ]

and therefore Y (1) and 1(X ≤ 0) are independent in W .

The most important assumptions in this example are the symmetry of the density of

the running variable Xi around the cutoff in the window [c − w, c + w], the independence

between the error term εi and Xi in this window, and the symmetry around the cutoff of

the functional form that relates Y (1) and X. The intuition behind the result is as follows:

if X is symmetrically distributed in [c − w, c + w], the value of the random variable Y =

1− |X − c|+ ε will be the same regardless of whether X > c or X < c; being above or below

the cutoff does not provide any information regarding the particular value of Y (1) that will

23

be observed. Thus the local independence condition between the potential outcome Yi(1)

and the treatment indicator 1(X ≤ c) holds even though Y (1) and X are related.

3.3 Experimental Analysis Under Local Independence Will Cap-

ture ‘Overall’ Effect If Exclusion Restriction Fails

We have shown that local statistical independence as stated in Assumption 2 does not guar-

antee that the exclusion restriction holds. Although referring to the potential outcomes

functions as Yi(1) and Yi(0) may give the impression that these functions are only affected

by the treatment indicator but not the score itself, this notation is commonly used to broadly

refer to the potential outcome under the treatment and control conditions, including every-

thing that these conditions entail. In the context of actual randomized experiments, it

is generally unnecessary to make the notation more explicit to let the potential outcomes

depend on the particular randomization device used. But in contexts where the variable

that induces variation in treatment assignment is an important determinant of the potential

outcomes rather than an arbitrary device, generalizing the potential outcomes notation is

necessary. For example, a more general potential outcomes notation has been used to allow

for the direct effect of the instrument in instrumental variable setups (Angrist, Imbens, and

Rubin 1996) and of the cutoff in multi-cutoff RD setups (Cattaneo et al. 2016).

Following the notation we introduced briefly in Scenario 2 in Section 2, we use let

Yi(Ti, Xi) denote the potential outcomes, now explicitly acknowledging that the potential

outcomes may depend on the value of the score Xi directly in addition to through the

treatment assignment indicator. Using this notation, Assumption 2 implies the mean inde-

pendence condition E[Yi(j,Xi)|Ti = j,Xi ∈ W ] = E[Yi(j,Xi)|Xi ∈ W ] for j ∈ {0, 1}. Given

this generalization, we now consider whether the local randomization RD framework can be

used in a context where Assumption 2 holds but the exclusion restriction fails.

Even if the exclusion restriction is violated in the sense that Yi(j,Xi) 6= Yi(j,X′i) for

24

Xi 6= X ′i, j ∈ {0, 1}, under Assumption 2 we have

E[Yi|Ti = 1, Xi ∈ W ] = E[Yi(1, Xi)|Ti = 1, Xi ∈ W ] = E[Yi(1, Xi)|Xi ∈ W ]

E[Yi|Ti = 0, Xi ∈ W ] = E[Yi(0, Xi)|Ti = 0, Xi ∈ W ] = E[Yi(0, Xi)|Xi ∈ W ],

leading to

E[Yi|Ti = 1, Xi ∈ W ]− E[Yi|Ti = 0, Xi ∈ W ] = E[Yi(1, Xi)− Yi(0, Xi)|Xi ∈ W ] = τ RDLR .

Thus, in a broad sense, Assumption 2 justifies analyzing the RD design in the neighbor-

hood of the cutoff as one would analyze an experiment because it allows identification of the

average treatment effect τ RDLR . However, if the exclusion restriction does not hold, the treated

and control average outcomes within the local window will combine the effect of the treat-

ment (e.g., a student receiving double-dose algebra or a party winning an election) on the

outcome, plus the additional effect of the score on the outcome that would occur regardless of

treatment status (e.g., students who receive higher grades may feel motivated to study more,

political donors may wish to donate more money to localities where parties obtain higher

vote shares). In this case, the parameter τ RDLR will not capture the (local) average effect of the

treatment alone, but rather the effect of obtaining a value of the score in [c, c + w] versus

[c− w, c), which will include, among other things, the effect of the treatment.

Using the more general notation, we can see that in a local randomization RD design

where Assumption 2 holds but the exclusion restriction does not, the parameter τ RDLR , unlike

τ RDCB , is not the average effect at any single point x:

τ RDLR = E[Yi(1)− Yi(0)|Xi ∈ W ] = E[Yi(1, Xi)− Yi(0, Xi)|Xi ∈ W ]

6= E[Yi(1, x)− Yi(0, x)|Xi ∈ W ]

Imagine, for example, that E[Yi(0, Xi)|Xi ∈ W ] = µ0, constant for every i in the window,

25

and Yi(1, Xi) follows equation 3. In this case,

τ RDLR = E[1− |Xi − c||Xi ∈ W ]− µ0 =

∫ c+w

c−w (1− |x− c|)f(x)dx∫ c+w

c−w f(x)dx− µ0,

which will take any value between 1 − w − µ0 and 1 − µ0, depending on the density of

the running variable X inside the window. This example shows that, in a scenario where

the exclusion restriction fails, the average effect τ RDLR will take different values according to

the likelihood of observations concentrating in particular ranges of the running variable

inside the window. For example, the same setup just described would lead to different

values of τ RDLR if the observations within the window were uniformly distributed than if they

were disproportionally concentrated near the cutoff. Thus, when the exclusion restriction

fails and the regression functions are not constant in the window around the cutoff, the

interpretation of the average effect τ RDLR differs from the interpretation it would have in a

standard experiment.

4 Concluding Remarks

Our discussion highlights that the local randomization interpretation of the RD design, if

taken literally, introduces conceptual distinctions that are absent in pure experimental de-

signs. As we have shown, in the context of RD designs, the distinction between random

assignment and statistical independence is consequential. This distinction is often meaning-

ful in natural experiments (Sekhon and Titiunik 2012). In an actual experiment, random

assignment of treatment leads to statistical independence between treatment status and po-

tential outcomes, because the “score” used to randomly assign subjects is a device (likely a

computer-generated pseudo-random number) that is by construction arbitrary and unrelated

to the potential outcomes or any systematic characteristic of the experimental subjects. As

we observed, a randomized experiment can be recast as a RD design where the score is a

randomly generated number and the cutoff is chosen to ensure the desired probability of re-

ceiving treatment. Thus, seen as functions of this random number, the regression functions

26

are guaranteed to be constant (graphically flat), because the score is a randomly generated

number that is by construction unrelated to the potential outcomes.

In contrast, in a RD design, the score used to assign treatment, even if its values are

randomly allocated near the cutoff, are usually important determinants of the outcome of

interest. Indeed, the importance of the RD score is often what motivates using it as the

basis of treatment assignment in the first place. In a context where the score is meaningfully

related to the outcome (past and future test scores, past and future vote shares, etc.),

random assignment of the score value contains no information about the particular form of

the regression functions E[Yi(1)|Xi] and E[Yi(0)|Xi]. This is straightforward to see once one

realizes that restrictions on the randomization distribution of the score Xi (and implicitly

of the treatment assignment) are fundamentally different from restrictions on the shape of

the regression functions—and more generally, on the conditional distribution of the potential

outcomes given the score. No matter how much information we have about the way in which

Xi is assigned, this information will generally be insufficient to determine how Xi and the

potential outcomes are related.

Given the additional assumptions needed for the local randomization interpretation to

hold, in most applications one should proceed using the continuity assumption alone. This

is typically a plausible assumption if there are neither formal nor informal mechanisms for

sorting—i.e., for units to appeal and change the score value they were originally assigned

in order to receive their preferred treatment condition. However, in practice, the methods

used to estimate the continuous functions at the cutoff are consequential. Estimation is

delicate because the functional forms are unknown, and one is estimating the trend at an

endpoint (the cutoff) of measure zero. The example in Hyytinen et al. (2015) illustrates

how inferences in RD designs are sensitive to the method used to estimate the continuous

functions on both sides of the cutoff even when sample sizes are not small, and the design is

valid by construction. More generally, the properties of RD estimation and inference based

on local polynomial methods depend crucially on the bandwidth choice; for example, the

27

commonly used mean-squared-error optimal bandwidth, though valid for point estimation,

is too large for inference and requires either undersmoothing or robust methods to yield

valid confidence intervals (Calonico, Cattaneo, and Titiunik 2014). One appeal of the local

randomization interpretation of RD is that it avoids these estimation and inference issues, but

unfortunately, this interpretation requires additional assumptions that may not be plausible.

28

References

Angrist, Joshua D, Guido W Imbens, and Donald B Rubin. 1996. “Identification of causaleffects using instrumental variables.” Journal of the American statistical Association91 (434): 444–455.

Angrist, Joshua D., and Miikka Rokkanen. 2015. “Wanna get away? Regression discontinuityestimation of exam school effects away from the cutoff.” Journal of the American StatisticalAssociation 110 (512): 1331–1344.

Calonico, Sebastian, Matias D. Cattaneo, and Max H. Farrell. 2016. “On the Effect of BiasEstimation on Coverage Accuracy in Nonparametric Inference.” Working paper, Universityof Michigan.

Calonico, Sebastian, Matias D. Cattaneo, Max H. Farrell, and Rocıo Titiunik. 2016. “Re-gression Discontinuity Designs Using Covariates.” Working paper, University of Michigan.

Calonico, Sebastian, Matias D. Cattaneo, and Rocio Titiunik. 2014. “Robust NonparametricConfidence Intervals for Regression-Discontinuity Designs.” Econometrica 82 (6): 2295–2326.

Canay, Ivan A, and Vishal Kamat. 2016. “Approximate Permutation Tests and InducedOrder Statistics in the Regression Discontinuity Design.” Working Paper, NorthwesternUniveristy.

Cattaneo, Matias D., Brigham Frandsen, and Rocio Titiunik. 2015. “Randomization Infer-ence in the Regression Discontinuity Design: An Application to Party Advantages in theU.S. Senate.” Journal of Causal Inference 3 (1): 1–24.

Cattaneo, Matias D., and Gonzalo Vazquez-Bare. 2016. “The Choice of Neighborhood inRegression Discontinuity Designs.” forthcoming Observational Studies.

Cattaneo, Matias D., Luke Keele, Rocıo Titiunik, and Gonzalo Vazquez-Bare. 2016. “In-terpreting Regression Discontinuity Designs with Multiple Cutoffs.” Journal of Politics78 (3): 1229-1248.

Cattaneo, Matias D., Rocio Titiunik, and Gonzalo Vazquez-Bare. 2016a. “Comparing Infer-ence Approaches for RD Designs: A Reexamination of the Effect of Head Start on ChildMortality.” Working paper, University of Michigan.

Cattaneo, Matias D., Rocio Titiunik, and Gonzalo Vazquez-Bare. 2016b. “Inference in Re-gression Discontinuity Designs Under Local Randomization.” The Stata Journal 16 (2):331-367.

Caughey, Devin, and Jasjeet S. Sekhon. 2011. “Elections and the Regression DiscontinuityDesign: Lessons from Close U.S. House Races, 1942–2008.” Political Analysis 19 (4): 385-408.

29

Cook, Thomas D. 2008. ““Waiting for Life to Arrive”: A history of the regression-discontinuity design in Psychology, Statistics and Economics.” Journal of Econometrics142 (2): 636–654.

de la Cuesta, Brandon, and Kosuke Imai. 2016. “Misunderstandings about the RegressionDiscontinuity Design in the Study of Close Elections.” Annual Review of Political Science19: 375–396.

Eggers, Andrew C., Olle Folke, Anthony Fowler, Jens Hainmueller, Andrew B. Hall, andJames M. Snyder. 2014. “On The Validity Of The Regression Discontinuity Design ForEstimating Electoral Effects: New Evidence From Over 40,000 Close Races.” AmericanJournal of Political Science, forthcoming.

Fan, J., and I. Gijbels. 1996. Local Polynomial Modelling and Its Applications. New York:Chapman & Hall/CRC.

Hahn, Jinyong, Petra Todd, and Wilbert van der Klaauw. 2001. “Identification and Estima-tion of Treatment Effects with a Regression-Discontinuity Design.” Econometrica 69 (1):201–209.

Holland, Paul W. 1986. “Statistics and causal inference.” Journal of the American statisticalAssociation 81 (396): 945–960.

Hyytinen, Ari, Jaakko Merilainen, Tuukka Saarimaa, Otto Toivanen, and Janne Tukiainen.2015. “Does Regression Discontinuity Design Work? Evidence from Random ElectionOutcomes.” VATT Institute for Economic Research, Working Paper 59.

Imbens, Guido, and Thomas Lemieux. 2008. “Regression Discontinuity Designs: A Guide toPractice.” Journal of Econometrics 142 (2): 615–635.

Imbens, Guido W., and Donald B. Rubin. 2015. Causal Inference in Statistics, Social, andBiomedical Sciences. Cambridge University Press.

Imbens, Guido W., and Karthik Kalyanaraman. 2012. “Optimal Bandwidth Choice for theRegression Discontinuity Estimator.” Review of Economic Studies 79 (3): 933–959.

Keele, Luke, Rocıo Titiunik, and Jose R Zubizarreta. 2015. “Enhancing a geographic regres-sion discontinuity design through matching to estimate the effect of ballot initiatives onvoter turnout.” Journal of the Royal Statistical Society: Series A (Statistics in Society)178 (1): 223–239.

Lee, David S. 2008. “Randomized Experiments from Non-random Selection in U.S. HouseElections.” Journal of Econometrics 142 (2): 675–697.

Lee, David S., and Thomas Lemieux. 2010. “Regression Discontinuity Designs in Economics.”Journal of Economic Literature 48 (2): 281–355.

Porter, Jack. 2003. “Estimation in the Regression Discontinuity model.” Working paper,University of Wisconsin at Madison.

30

Sekhon, Jasjeet S., and Rocıo Titiunik. 2012. “When natural experiments are neither naturalnor experiments.” American Political Science Review 106 (01): 35–57.

Sekhon, Jasjeet S., and Rocıo Titiunik. 2016. “Understanding Regression Discontinuity De-signs As Observational Studies.” forthcoming Observational Studies.

Thistlethwaite, Donald L., and Donald T. Campbell. 1960. “Regression-discontinuity Analy-sis: An Alternative to the Ex-Post Facto Experiment.” Journal of Educational Psychology51 (6): 309–317.

31

on interpreting the regression discontinuity design as a...

Documents