On Interpreting the Regression Discontinuity Design as a Local Experiment∗
Jasjeet S. Sekhon†, Robson Professor, Political Science and Statistics, UC Berkeley
Rocío Titiunik‡, James Orin Murfin Associate Professor, Political Science, University of Michigan
October 2, 2016
Abstract
We discuss the two most popular frameworks for identification, estimation and inference in regression discontinuity (RD) designs: the continuity-based framework, where the conditional expectations of the potential outcomes are assumed to be continuous functions of the score at the cutoff, and the local randomization framework, where the treatment assignment is assumed to be as good as randomized in a neighborhood around the cutoff. Using various examples, we show that (i) assuming random assignment of the RD running variable in a neighborhood of the cutoff neither implies that the potential outcomes and the treatment are statistically independent nor that the potential outcomes are unrelated to the running variable in this neighborhood, and (ii) assuming local independence between the potential outcomes and the treatment does not imply the exclusion restriction that the score affects the outcomes only through the treatment indicator. Our discussion highlights key distinctions between “locally randomized” RD designs and real experiments, including that statistical independence and random assignment are conceptually different in RD contexts, and that the RD treatment assignment rule places no restrictions on how the score and potential outcomes are related. Our findings imply that the methods for RD estimation, inference and falsification used in practice will necessarily be different (both in formal properties and in interpretation) according to which of the two frameworks is invoked.
∗We are indebted to Matias Cattaneo, Anthony Fowler, Kellie Ottoboni, Nicolas Idrobo Rincon, Kosuke Imai, Joshua Kalla, Fredrik Savje, Yotam Shem-Tov, and three anonymous reviewers for helpful comments and discussion. Sekhon gratefully acknowledges support from the Office of Naval Research (N00014-15-1-2367) and Titiunik gratefully acknowledges financial support from the National Science Foundation (SES 1357561).
†UC Berkeley, Travers Department of Political Science, 210 Barrows Hall #1950, Berkeley, CA 94720-1950. Email: <[email protected]>, Web: http://sekhon.berkeley.edu.
‡Department of Political Science, 505 South State St., 5700 Haven Hall, University of Michigan, Ann Arbor, MI 48109. Email: <[email protected]>, Web: http://www.umich.edu/~titiunik.
The regression discontinuity (RD) design is a research strategy based on three main
components—a score or “running variable,” a cutoff, and a treatment. Its basic characteristic
is that the treatment is assigned based on a known rule: all units receive a score value, and
the treatment is offered to those units whose score is above a cutoff and not offered to those
units whose score is below it (or vice versa).
The RD design has been available since the 1960s (Thistlethwaite and Campbell 1960),
but its popularity has grown particularly fast in recent years. In the last decade, an increasing
number of empirical researchers across the social and biomedical sciences have turned to
the RD design to estimate causal effects of treatments that are not, and often cannot be,
randomly assigned. This growth in RD empirical applications has occurred in parallel with
a rapid development of methodological tools for estimation, inference and interpretation of
RD effects. See Cook (2008), Imbens and Lemieux (2008) and Lee and Lemieux (2010)
for early reviews, and the introduction to this volume by Cattaneo and Escanciano for a
comprehensive list of recent references.
The recent popularity of the RD design was in part sparked by the work of Hahn, Todd,
and van der Klaauw (2001), who translated the design into the Neyman-Rubin potential
outcomes framework (Holland 1986) and offered minimal conditions for non-parametric identification
of average effects. These authors showed that, when all units comply with their
assigned treatment, the average effect of the treatment at the cutoff can be identified under
the assumption that the conditional expectations of the potential outcomes given the score
are continuous (and other mild regularity conditions). This emphasis on continuity conditions
for identification was a departure from other “quasi-experimental” research designs,
which are typically based on independence or mean independence assumptions.
In the Neyman-Rubin framework, randomized experiments are seen as the gold standard,
and quasi-experimental designs are broadly characterized in terms of the assumptions under
which the (non-random) treatment assignment mechanism is as good as randomized. For example,
in observational studies based on the unconfoundedness or “selection-on-observables”
assumption, the treatment is as good as randomly assigned, albeit with unknown distribution,
after conditioning on observable covariates (Imbens and Rubin 2015). Similarly, instrumental
variables designs can be seen as randomized experiments with imperfect compliance,
and difference-in-differences designs compare treatment and control groups that, on average
(and except for time-invariant characteristics that can be removed by differencing), differ only
on treatment status. In all these cases, a treatment and a control group are well defined and,
under certain assumptions, can be compared to identify the average treatment effect for the
population of interest (or a subpopulation thereof).
In this context, the results derived by Hahn, Todd, and van der Klaauw (2001) set the RD
design apart. In contrast to most other quasi-experimental designs, RD identification was
established only for the average treatment effect at a single point: the cutoff. This implied
that, unlike difference-in-differences or selection-on-observables designs, the RD design could
not be accurately described by appealing to a comparison between a treatment and a control
group. Given the RD assignment rule, under perfect compliance it is impossible for treated
and control units to have the same score value. Moreover, when the running variable is
continuous, the probability of seeing an observation with a score value exactly equal to the
cutoff is zero. Thus, in the continuity-based RD setup, identification of the average treatment
effect at the cutoff necessarily relies on extrapolation, as there are neither treated nor control
observations with score values exactly equal to the cutoff.
The influential contribution by Lee (2008) changed the way RD was perceived, and
aligned the interpretation of RD designs with experimental (and the other quasi-experimental)
research designs. Lee (2008) argued that in an RD setting where the score can be influenced by
the subjects’ choices and unobservable characteristics, treatment status can be interpreted
to be as good as randomized in a local neighborhood of the cutoff as long as subjects lack
the ability to precisely determine the value of the score they receive—i.e., as long as their
score contains a random chance component. Lee’s framework captured the original ideas
in the seminal article by Thistlethwaite and Campbell, who called a hypothetical experiment
where the treatment is randomly assigned near the cutoff an “experiment for which
the regression-discontinuity analysis may be regarded as a substitute” (Thistlethwaite and
Campbell 1960, p. 310).
The interpretation of RD designs as local experiments developed by Lee (2008) has been
very influential, both conceptually and practically. Among other things, it established the
need to provide falsification tests based on predetermined covariates just as one would do in
the analysis of experiments, a practice that has now been widely adopted and has increased
the credibility of countless RD applications (e.g., Caughey and Sekhon 2011; Eggers et al.
2014; Hyytinen et al. 2015). Moreover, it provided an intuitive interpretation of the RD
parameter that allowed researchers to think about treatment and control groups instead of
an effect at a single point—the cutoff—where there are effectively no observations.
The claim that the RD treatment assignment rule is “as good as randomized” in a
neighborhood of the cutoff can be interpreted in at least two ways. In one interpretation, it
means that there should be no treatment effect on predetermined covariates at the cutoff,
and that the validity of the underlying RD assumptions can be evaluated by testing the
null hypothesis that the RD treatment effect is zero on predetermined covariates. In another
interpretation, it means that the treatment is (as good as) randomly assigned near the cutoff,
and estimation and inference for treatment effects (and covariate balance tests) can be carried
out using the same tools used in experimental analysis. The first interpretation does not
imply the second because one can test for covariate balance (at the cutoff) under the usual
continuity assumptions. While the first interpretation has resulted in an increased and much
needed focus on credibility and falsification, the second has been the source of considerable
confusion. Our goal is to discuss the source of such confusion in detail. In doing so, we
clarify crucial conceptual distinctions within the local experiment RD framework, which in
turn elucidate the differences and similarities between this framework and the more standard
continuity-based approach.
We explore both the relationship between continuity and local randomization assumptions
in RD designs, and the implications of adopting an RD framework based on an explicit local
randomization assumption. Our paper builds on prior studies that have considered this issue.
Hahn, Todd, and van der Klaauw (2001) first invoked a local randomization assumption for
identification of RD effects under noncompliance. More recently, Cattaneo, Frandsen, and
Titiunik (2015) formalized an analogous assumption using a Fisherian, randomization-based
RD framework, which was extended by Cattaneo, Titiunik, and Vazquez-Bare (2016a,b).
Various super-population versions of the local randomization RD assumption were also proposed
by Keele, Titiunik, and Zubizarreta (2015) and Angrist and Rokkanen (2015), and
more recently de la Cuesta and Imai (2016) discussed the relationship between local randomization
and continuity RD assumptions.
We make two main arguments. First, we show that the usual RD continuity assumptions
are not sufficient for a literal local randomization interpretation of RD designs—that is,
not sufficient to ensure that, near the cutoff, the potential outcomes are independent of the
treatment assignment and unrelated to the running variable. This has been argued previously
(Cattaneo, Frandsen, and Titiunik 2015; de la Cuesta and Imai 2016); we simply provide a
stylized example to further illustrate the main issues. Second, contrary to common practice,
we show that the assumption that the treatment is randomly assigned among units in a
neighborhood of the cutoff cannot be used to justify analyzing and interpreting RD designs
as actual experiments. The restrictions imposed by the treatment assignment rule in a sharp
RD design rule out a local experiment where one randomly assigns treatment status without
changing the units’ score values. Instead, one could assume that score values are randomly
assigned in a neighborhood of the cutoff, a randomization model that is consistent with the
RD assignment mechanism. However, even under this model, interpreting and analyzing the
RD design as an experiment will generally result in invalid inferences because the score can
affect the potential outcomes directly, in addition to affecting them through treatment status.
In particular, we show that (i) assuming random assignment of the RD running variable
in a neighborhood of the cutoff does not imply that the potential outcomes and the treatment
assignment are statistically independent or that the potential outcomes are unrelated to the
running variable in this neighborhood, and (ii) assuming local independence between the
potential outcomes and the treatment assignment does not imply the exclusion restriction
that the score affects the outcomes only via the treatment assignment indicator. Our discussion
makes clear that the RD treatment assignment rule need not place any restrictions on
the ways in which the score influences the potential outcomes and shows that, in local randomization
RD settings, statistical independence and random assignment are conceptually
different.
The discussion that follows focuses on the sharp RD design, in which units’ compliance
with treatment assignment is perfect, and the probability of receiving treatment changes from
zero to one at the cutoff. We do not explicitly discuss fuzzy RD settings, where compliance is
imperfect and the decision to take treatment is endogenous, because most of our arguments
and conclusions apply equally to both sharp and fuzzy RD designs. When our discussion
must be modified for the fuzzy case, we note it explicitly. Throughout, we refer to the
standard RD design based on continuity identification assumptions as the continuity-based
RD design, and to the RD design based on a local randomization assumption as the local
randomization or randomization-based RD design.
1 The Continuity-Based RD Framework
In this and the subsequent sections we assume that we have a random sample $\{Y_i(1), Y_i(0), X_i\}_{i=1}^{n}$,
where $X_i$ is the score on the basis of which a binary treatment $T_i$ is assigned according to the
rule $T_i = 1(X_i \ge c)$ (or $T_i = 1(X_i \le c)$, depending on the example) for a known constant $c$,
$Y_i(1)$ is the potential outcome under the treatment condition, and $Y_i(0)$ the potential outcome
under the untreated or control condition. For every unit $i$ we observe either $Y_i(0)$ or
$Y_i(1)$, so the observed sample is $\{Y_i, X_i\}_{i=1}^{n}$ where $Y_i := T_i Y_i(1) + (1 - T_i) Y_i(0)$. We assume
throughout that all moments we employ exist, and that the density of Xi is positive and
continuous at the cutoff or in the intervals we consider.
The continuity-based framework is based on the identification conditions and estimation
methods first proposed by Hahn, Todd, and van der Klaauw (2001). The authors proposed
the following assumption:
Assumption 1: Continuity. The regression functions E[Yi(1)|Xi = x] and E[Yi(0)|Xi = x]
are continuous in x at the cutoff c.
Under this assumption, they showed that
\[
\tau^{RD}_{CB} \equiv E[Y_i(1) - Y_i(0) \mid X_i = c] = \lim_{x \downarrow c} E[Y_i \mid X_i = x] - \lim_{x \uparrow c} E[Y_i \mid X_i = x]. \tag{1}
\]
Thus, when the focus is on mean effects, the target parameter in the continuity-based RD
framework is $\tau^{RD}_{CB}$, the average treatment effect at the cutoff. The identification result in equation
(1) states that, under continuity, this parameter, which depends on potential outcomes that
are fundamentally unobservable, can be expressed as the difference between the right and
left limits of the average observed outcomes at the cutoff. Estimation is therefore concerned
with constructing appropriate estimators for $\lim_{x \uparrow c} E[Y_i \mid X_i = x]$ and $\lim_{x \downarrow c} E[Y_i \mid X_i = x]$.
The most commonly used approach to estimate these limits is to rely on local polynomial
methods, fitting two polynomials of the observed outcome on the score (one for observations
above the cutoff, the other for observations below it) using only observations in a neighborhood
of the cutoff, with kernel weights assigning higher weights to observations closer to
the cutoff. Because these fitted polynomials are approximations to the unknown regression
functions E[Yi(1)|Xi = x] and E[Yi(0)|Xi = x] near the cutoff, the choice of neighborhood,
commonly known as bandwidth, is crucial. Given the order of the polynomial—typically
one—the bandwidth controls the quality of the approximation, with smaller bandwidths
reducing the bias of the approximation and larger bandwidths reducing its variance—see
Cattaneo and Vazquez-Bare (2016) for an overview of RD neighborhood selection.
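The local polynomial approach just described can be sketched as follows. This is a minimal illustration, assuming a polynomial of order one, a triangular kernel, and a fixed, hand-picked bandwidth `h` rather than a data-driven one. The function name `local_linear_rd`, the simulated data, and all parameter values are hypothetical and are not taken from the text.

```python
import numpy as np

def local_linear_rd(x, y, c, h):
    """Difference between the right and left local linear intercepts at c.

    Assumes the treatment rule T = 1(X >= c) and a triangular kernel
    with bandwidth h (both are illustrative choices).
    """
    def intercept_at_cutoff(mask):
        xs, ys = x[mask] - c, y[mask]                  # center the score at the cutoff
        w = np.clip(1.0 - np.abs(xs) / h, 0.0, None)   # triangular kernel weights
        keep = w > 0                                   # only observations within h of c
        xs, ys, w = xs[keep], ys[keep], w[keep]
        design = np.column_stack([np.ones_like(xs), xs])
        sw = np.sqrt(w)                                # weighted least squares via rescaling
        beta, *_ = np.linalg.lstsq(design * sw[:, None], ys * sw, rcond=None)
        return beta[0]                                 # fitted value at X = c

    return intercept_at_cutoff(x >= c) - intercept_at_cutoff(x < c)

# Hypothetical data: outcome linear in the score plus a jump of tau at the cutoff.
rng = np.random.default_rng(0)
n, c, tau = 20_000, 50.0, 5.0
x = rng.uniform(0.0, 100.0, n)
t = (x >= c).astype(float)
y = 0.5 * x + tau * t + rng.normal(0.0, 1.0, n)

estimate = local_linear_rd(x, y, c, h=10.0)
print(round(estimate, 2))
```

Because the simulated regression functions are exactly linear, the local linear fit has no approximation bias here; with curved regression functions the choice of `h` would matter in the way the paragraph above describes.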
Although the technical details of local polynomial estimation and inference are outside
the scope of our discussion, we highlight several issues that are central to distinguishing the
continuity-based approach from the local randomization approach we discuss next. In the
continuity-based RD design:1
• The target parameter, $\tau^{RD}_{CB}$, is an average effect at a single point, rather than in an
interval.
• The functional form of the regression functions $E[Y_i(0) \mid X_i = x]$ and $E[Y_i(1) \mid X_i = x]$ is
unknown, and is locally approximated by a polynomial for estimation and inference.
• In general, the polynomial approximation will be imperfect. The approximation error
is controlled by the bandwidth sequence, $h_n$: the smaller $h_n$, the smaller the error.
• Local polynomial estimation and inference are based on large-sample results that require
the bandwidth to shrink to zero as the sample size increases. For example,
consistent estimation of $\tau^{RD}_{CB}$ requires $h_n \to 0$ and $n h_n \to \infty$.
• Local polynomial methods require smoothness conditions on the underlying regression
functions $E[Y_i(0) \mid X_i = x]$ and $E[Y_i(1) \mid X_i = x]$ in order to control the leading biases of
the local polynomial RD estimators. These smoothness conditions are stronger than
the continuity assumption required for identification of $\tau^{RD}_{CB}$.
• The bandwidth $h_n$ plays no role in the identification of $\tau^{RD}_{CB}$.
• Neither the continuity assumptions required for identification nor the smoothness assumptions
required for estimation and inference are implied by the RD treatment
assignment rule. See Sekhon and Titiunik (2016) for further discussion.
1 For a general treatment of local polynomial methods, see Fan and Gijbels (1996), and see Calonico, Cattaneo, and Farrell (2016) for recent higher-order results. For local polynomial methods applied specifically to the RD case, see Hahn, Todd, and van der Klaauw (2001), Porter (2003), Imbens and Kalyanaraman (2012), Calonico, Cattaneo, and Titiunik (2014), and Calonico et al. (2016).
1.1 Continuity Does Not Imply Local Randomization
We now consider a stylized example that shows that continuity of the potential outcomes
regression functions (as in Assumption 1) does not imply that the treatment can be seen as
locally randomly assigned in a literal or precise sense. We assume that we have a sample of
n students who take a mathematics exam in the first quarter of the academic year. Each
student receives a test score Xi in the exam, which ranges from 0 to 100, and students whose
grade is equal to or below 50 receive a double dose of algebra instruction (Ti = 1) during the
second quarter—while students with Xi > 50 receive a single dose (Ti = 0). The outcome of
interest is the test score Yi obtained in another mathematics exam taken at the end of the
second quarter, also ranging between 0 and 100. We assume that test scores are fine enough
so that they have no mass points and can be treated as continuous random variables.
To illustrate our argument, we assume that a student’s expected grade in the second
quarter under the control condition given her score in the first quarter, E[Yi(0)|Xi = x],
is simply equal to her score in the first quarter, Xi. We also assume that the double-dose
algebra treatment effect, τ, is constant for all students. The potential outcomes regression
functions are therefore:
\[
\begin{aligned}
E[Y_i(0) \mid X_i] &= X_i \\
E[Y_i(1) \mid X_i] &= X_i + \tau.
\end{aligned} \tag{2}
\]
This model is of course extremely simplistic, but we adopt it because it allows us to illustrate
the difference between the treatment assignment mechanism and the outcome model
in a straightforward way. We now assume that the initial grade Xi is entirely determined
by each student’s fixed inherent (and unobservable) ability ai, e.g., Xi = g(ai) for some
strictly increasing function g(·). Given this setup, the assignment of treatment according
to the rule 1(Xi ≤ 50) is as far from random as can be conceived, since it assigns all students
of lower ability to the treatment group and all students of higher ability to the control
group, inducing a complete lack of common support in the distribution of ability between
the groups. Despite this severe selection into treatment based on ability, the effect of the
treatment at the cutoff is readily identifiable, because the regression functions in equation
(2) are continuous at the cutoff (and everywhere else).
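The selection problem in this example can be checked numerically. The sketch below assumes, purely for illustration, that ability is uniform on [0, 1] and that g(a) = 100a; neither choice comes from the text, and the variable names are hypothetical. It shows that treated and control students share no ability values at all, and that a gap in average ability remains even within one point of the cutoff.

```python
import numpy as np

rng = np.random.default_rng(1)
n, c = 10_000, 50.0

ability = rng.uniform(0.0, 1.0, n)   # hypothetical ability scale
x = 100.0 * ability                  # X_i = g(a_i), with g strictly increasing (assumed form)
t = (x <= c).astype(int)             # double dose for scores at or below 50

# No common support: every treated student has lower ability than every control student.
no_overlap = ability[t == 1].max() < ability[t == 0].min()

# The ability gap does not vanish in a narrow window around the cutoff.
near = np.abs(x - c) <= 1.0
gap_near_cutoff = ability[near & (t == 0)].mean() - ability[near & (t == 1)].mean()

print(no_overlap, gap_near_cutoff > 0)
```

Both checks hold by construction: the rule 1(Xi ≤ 50) sorts students by ability exactly, so narrowing the window shrinks the gap but never removes it.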
In this context, what does it mean to say that the RD design induces as-if randomness
near the cutoff? No matter how close to the threshold we get, the average ability in the
control group is always higher than in the treatment group: the effect of X on E[Yi(j)|Xi] near
the cutoff is always nonzero (dE[Yi(j)|Xi = x]/dx = 1 for j = 0, 1 and for all x). Thus, the
continuity of the regression functions is entirely compatible with a very strong relationship
between outcome and score. This means that continuity is not sufficient to guarantee the
comparability of units on either side of the cutoff, even in a small neighborhood around it. In
other words, for any w > 0, one can always conceive of a data generating process such that the
distortion induced by ignoring the relationship between X and Y in the window [c−w, c+w]
is arbitrarily large (a uniformity argument).
Thus, from an identification point of view, it is immediately clear that the continuity
condition in Assumption 1 that ensures identification in the super population does not imply
that the usual finite-sample “random assignment” identification assumptions hold. We will
discuss the latter type of assumptions in detail in the following section, but we now consider
one possibility in the context of the example in equation (2). Imagine that in this example we
wish to invoke a mean independence assumption in a small neighborhood W = [c−w, c+w]
around the cutoff, with w > 0, such as E[Yi(j)|Ti, Xi ∈ W ] = E[Yi(j)|Xi ∈ W ] for j = 0, 1;
recall that in this example Ti = 1(Xi ≤ c).
Focusing, for example, on the potential outcome under control, and given equation (2),
we have
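\[
\begin{aligned}
E[Y_i(0) \mid T_i = 0, X_i \in W] &= E[Y_i(0) \mid X_i > c, X_i \in [c-w, c+w]] \\
&= E[Y_i(0) \mid X_i \in [c, c+w]] \\
&= E[X_i \mid X_i \in [c, c+w]]
\end{aligned}
\]
and
\[
\begin{aligned}
E[Y_i(0) \mid X_i \in W] &= E[Y_i(0) \mid X_i \in [c-w, c+w]] \\
&= E[X_i \mid X_i \in [c-w, c+w]].
\end{aligned}
\]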
In general, E[Xi|Xi ∈ [c, c + w]] ≠ E[Xi|Xi ∈ [c − w, c + w]], so the mean independence
assumption that is typically invoked in experiments cannot be invoked in this case. In
other words, this example shows that the continuity assumption is not enough to guarantee
independence between the potential outcomes and the treatment near the cutoff, and
consequently cannot be used to justify analyzing an RD design as one would analyze an
experiment.
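As a concrete illustration of this inequality, suppose (an assumption made here only for the arithmetic, not one made in the example itself) that the first-quarter score is uniformly distributed on [0, 100], with c = 50 and w = 5. The two conditional means are then:

```latex
\[
E[X_i \mid X_i \in [c, c+w]] = c + \frac{w}{2} = 52.5,
\qquad
E[X_i \mid X_i \in [c-w, c+w]] = c = 50.
\]
```

Under equation (2), the mean of Yi(0) among control students in W would therefore exceed the mean of Yi(0) among all students in W by w/2 = 2.5 points, and the same kind of gap arises for any w > 0.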
From an estimation point of view, imagine that we mistakenly decided to analyze this
valid continuity-based RD design as an experiment in the fixed neighborhood [c− w, c+ w]
around the cutoff, defining the parameter of interest as the average treatment effect in this
window. To calculate this effect, we would simply take the difference between the average
observed outcomes of treated and control units in [c − w, c + w]. Unsurprisingly, this approach would
lead to an incorrect answer, since
\[
\begin{aligned}
E[Y_i \mid c-w \le X_i \le c] - E[Y_i \mid c < X_i \le c+w]
&= \tau + E[X_i \mid c-w \le X_i \le c] - E[X_i \mid c < X_i \le c+w] \\
&< \tau,
\end{aligned}
\]
where the last line follows from the fact that $E[X_i \mid c-w \le X_i \le c] - E[X_i \mid c < X_i \le c+w] \in [-2w, 0)$.2 Thus, for any fixed window W = [c − w, c + w], analyzing this RD design as one
would analyze an experiment will lead to a biased treatment effect estimate.
2 If the treatment assignment rule were 1(Xi ≥ c), which is the more common definition of the treatment in the RD literature, we would have $E[Y_i \mid c \le X_i \le c+w] - E[Y_i \mid c-w \le X_i < c] = \tau + E[X_i \mid c \le X_i \le c+w] - E[X_i \mid c-w \le X_i < c]$, and $E[X_i \mid c \le X_i \le c+w] - E[X_i \mid c-w \le X_i < c] \in (0, 2w]$.
In this example, the smaller w, the closer the naive local randomization estimate will
be to the true effect τ. However, as we discuss below, in order for the local randomization
RD framework to be conceptually distinct from the continuity-based framework, the
neighborhood W must necessarily be conceived as fixed. And, given a fixed W , the lack
of comparability between treated and control groups cannot be eliminated by increasing
the sample size, a direct consequence of the fact that such lack of comparability is an
identification problem rather than an estimation one.
In contrast, in the continuity-based framework, focusing on the average treated-control
outcome difference in a neighborhood of the cutoff can be understood as approximating the
unknown potential outcomes regression functions with a local constant fit. Such a strategy
will result in a possibly large approximation error; however, the error is entirely due to the
estimation strategy and will vanish asymptotically as the bandwidth shrinks. In this fundamental sense, an RD design
based on a continuity assumption cannot be interpreted literally as a local experiment. In
other words, assuming the conditional regression functions E[Yi(1)|Xi] and E[Yi(0)|Xi] are
continuous in Xi at c is not enough to treat the RD design as a pure experiment near the
cutoff. If we are to treat the RD design as a local experiment, we must effectively fix a
window width, and thus change the parameter of interest.
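The contrast between a fixed window and a shrinking one can be illustrated with a small simulation of the double-dose example. The sketch assumes uniformly distributed scores on [0, 100], unit-variance noise around equation (2), and τ = 5; these values and the helper names `simulate` and `naive_window_estimate` are hypothetical. Under the uniform assumption the bias of the naive difference in means equals −w.

```python
import numpy as np

def simulate(n, rng, c=50.0, tau=5.0):
    """Draw scores and outcomes following equation (2) with rule T = 1(X <= c)."""
    x = rng.uniform(0.0, 100.0, n)               # assumed score distribution
    t = x <= c
    y = x + tau * t + rng.normal(0.0, 1.0, n)    # E[Y(0)|X] = X, constant effect tau
    return x, y

def naive_window_estimate(x, y, c, w):
    """Treated-minus-control difference in mean outcomes within [c - w, c + w]."""
    treated = (x >= c - w) & (x <= c)
    control = (x > c) & (x <= c + w)
    return y[treated].mean() - y[control].mean()

rng = np.random.default_rng(2)
c, tau = 50.0, 5.0

# Fixed window: a larger sample does not remove the bias.
fixed_w = 5.0
bias_fixed = {}
for n in (10_000, 1_000_000):
    x, y = simulate(n, rng, c, tau)
    bias_fixed[n] = naive_window_estimate(x, y, c, fixed_w) - tau

# Shrinking window (a local constant fit): the bias moves toward zero.
x, y = simulate(1_000_000, rng, c, tau)
bias_by_w = {w: naive_window_estimate(x, y, c, w) - tau for w in (5.0, 1.0, 0.2)}

print(bias_fixed)
print(bias_by_w)
```

With w held at 5, the bias stays near −5 at both sample sizes, which is the identification problem described above. Letting w shrink reproduces the continuity-based logic, in which the same estimator is read as an approximation whose error disappears with the bandwidth.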
2 The Local Randomization RD Framework
The simple example above shows that continuity of the conditional regression functions is not
enough to justify analyzing or interpreting the RD design as an experiment in a neighborhood
of the cutoff. We now consider a local randomization RD setup where we explicitly assume
that the treatment is randomly assigned for all subjects with Xi ∈ [c − w, c + w]. As we
discuss, formalizing precisely the local randomization RD framework is difficult, because the
notion of random assignment near the cutoff can be interpreted in different ways.
2.1 Local Randomization of Score or Treatment?
Intuition suggests that simply assuming that the treatment is randomly assigned in a small
neighborhood around the cutoff should be enough to analyze the RD design as a local
experiment. However, this is not the case for two reasons: (i) the RD treatment assignment
rule places strict restrictions on the type of random assignments that are conceivable, and
(ii) the assignments that are conceivable do not imply an exclusion restriction that is always
true in actual experiments.
We consider two possible scenarios, according to two different interpretations of what it
means for the treatment to be locally randomly assigned near the cutoff. In the first scenario,
the values of Xi stay constant but the treatment received is randomly changed for subjects
near the cutoff. In the second scenario, the value of Xi is randomly assigned for all subjects
near the cutoff. As we show, this is an important distinction that must be considered when
formalizing the local randomization RD assumption.
Scenario 1: Treatment status randomly assigned for all subjects near the cutoff
The first way to understand locally random treatment assignment in the RD context is to
imagine a situation where all subjects with score in a neighborhood of the cutoff—that is,
with Xi ∈ [c − w, c + w]—are randomly assigned to receive treatment or control. In the
context of our education example introduced in the previous section, we could accomplish
this if, for example, we randomly assigned all students who scored between 45 and 55 in the
first exam to receive either a single or double dose of algebra, with every student receiving
double dose with the same positive probability. Given our assumption that the test score in
the first quarter is entirely determined by the students’ ability, this assignment mechanism
would break the relationship between ability and treatment status induced by the RD rule
1(Xi ≤ 50), because it would make receiving the treatment entirely independent of the grade
obtained in the first exam.
However, this way of conceptualizing randomization implies that the score Xi is unrelated
to treatment status in the local neighborhood, which is incompatible with the treatment assignment
rule. In particular, this assignment would imply that there would be both treated
and control subjects on both sides of the cutoff in the neighborhood [c − w, c + w], contradicting
the RD treatment assignment rule 1(Xi ≤ c). This illustrates the general point
that, in a sharp RD design, treatment status is deterministic given the score, and thus it
is not possible to randomly assign the treatment without altering the values of the running
variable.3
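The incompatibility in Scenario 1 is easy to see in a small simulation. The sketch below assumes uniformly distributed scores, a window of 45 to 55, and a treatment probability of one half inside the window; all of these, and the variable names, are illustrative assumptions. It counts the students whose coin-flip treatment status disagrees with the rule 1(Xi ≤ 50).

```python
import numpy as np

rng = np.random.default_rng(3)
c, w = 50.0, 5.0

x = rng.uniform(0.0, 100.0, 5_000)       # hypothetical first-exam scores
in_window = np.abs(x - c) <= w           # students scoring between 45 and 55

rd_rule = (x <= c).astype(int)           # sharp RD assignment
coin_flip = rd_rule.copy()
# Scenario 1: inside the window, treatment is assigned with probability 1/2,
# while the scores themselves are left unchanged.
coin_flip[in_window] = rng.integers(0, 2, in_window.sum())

treated_above_cutoff = int(np.sum(in_window & (x > c) & (coin_flip == 1)))
control_below_cutoff = int(np.sum(in_window & (x <= c) & (coin_flip == 0)))

print(treated_above_cutoff, control_below_cutoff)
```

Any nonzero count means the observed data could not have come from a sharp RD design, since such a design never has treated students above the cutoff or control students at or below it.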
Scenario 2: Score value randomly assigned for all subjects near the cutoff
The alternative is to assume that the running variable, not treatment status, is randomly
assigned near the cutoff, a manipulation that would be consistent with the sharp RD treatment
rule. There are multiple ways in which one could conceive of such an experiment.
For example, following our double-dose algebra example, we might believe that, even though
exam performance is broadly influenced by ability, the precise grade received by each student
involves some degree of randomness and arbitrariness, such that students whose grades are
four points or less apart should be of comparable ability. The school may therefore implement
a two-stage process, where first all exams are graded, and for those students whose
grades fall between 48 and 52, the original score is replaced with a uniform random number
between 48 and 52. The justification for this two-stage procedure might be that it still assigns
the treatment to those students who most need it (all students who get very low grades
are guaranteed to receive the treatment) but it implements a transparent and fair process
to assign the treatment to those students very near the cutoff whose characteristics may be
indistinguishable.
3 Note that in a fuzzy RD design, it is possible to have both treated and control observations on either side of the cutoff. However, the simple random assignment we have described would still be incompatible with a fuzzy RD assignment because it would assign treatment with the same probability on either side of the cutoff, which would violate the assumption of discontinuity of the probability of treatment assignment at the cutoff.
Under this setup, the test score in the first exam, Xi, is still broadly determined by
ability (so that a student who obtains a grade of 25 is on average of lower ability than a
student who obtains a grade of 90), but for two students whose grades are between 48 and 52,
the second stage ensures that they have the same ex-ante probability of receiving treatment.
This hypothetical random assignment of score values implies that the average level of ability
(and any other predetermined confounder) of treated students with Xi ∈ [48, 50] is the same
as for control students with Xi ∈ [50, 52]. Note that this setup also assumes that the score
Xi collected by the researcher is the second stage score for all students, so that the treatment
rule 1(Xi ≤ 50) correctly distinguishes treated and control students. Such a two-stage rule
is of course artificial, although a second-stage randomization has been used in some actual
RD settings (see, e.g., Hyytinen et al. 2015).4
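The two-stage procedure can be sketched as follows, again assuming (for illustration only) that ability is uniform on [0, 1] and that the graded score is 100 times ability; the variable names are hypothetical. The sketch compares the treated-control gap in average ability inside [48, 52] before and after the second-stage redraw.

```python
import numpy as np

rng = np.random.default_rng(4)
n, c = 200_000, 50.0

ability = rng.uniform(0.0, 1.0, n)       # hypothetical ability scale
first_stage = 100.0 * ability            # graded score, assumed g(a) = 100a
in_window = (first_stage >= 48.0) & (first_stage <= 52.0)

# Second stage: scores in [48, 52] are replaced by a uniform draw on [48, 52].
score = first_stage.copy()
score[in_window] = rng.uniform(48.0, 52.0, in_window.sum())
treated = score <= c                     # rule applied to the second-stage score

# Control-minus-treated gap in mean ability inside the window.
gap_before = (ability[in_window & (first_stage > c)].mean()
              - ability[in_window & (first_stage <= c)].mean())
gap_after = (ability[in_window & ~treated].mean()
             - ability[in_window & treated].mean())

print(round(gap_before, 4), round(gap_after, 4))
```

Under these assumptions the gap based on the graded score is about 0.02, while the gap based on the second-stage score is close to zero, which is the balance in predetermined characteristics that the random assignment of score values is meant to deliver.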
The crucial question is whether the assumption that the value of the score is (as-if)
randomly assigned in a neighborhood of the cutoff—that is, the assumption that all units
with score in [c− w, c + w] have the same probability of receiving any score value x within
this neighborhood and therefore the same probability of receiving treatment or control—is
enough to justify analyzing and interpreting the RD design in this neighborhood as an actual
experiment. At first glance, it might appear as if it is. We are assuming that, among units
in a local neighborhood of the cutoff, the value of the score is randomly assigned; since the
treatment is deterministically assigned based on this score, this guarantees that, on average,
there are no differences in the predetermined characteristics of units above and below the
cutoff within this window.
4 The reason we use two stages to set up a scenario where the score is randomly assigned is that we want to create a scenario where ability, a predetermined characteristic, is not systematically different between treated and control groups. Because of potential individual-level heterogeneity in ability within the window [c − w, c + w], this cannot be guaranteed unless we assume a two-stage procedure or make an explicit assumption about the distribution of ability for subjects that fall in the neighborhood of the cutoff. Note also that the two-stage scenario ensures comparability of all predetermined characteristics near the cutoff, whereas a scenario based on a single stage must consider assumptions about all possible predetermined characteristics that affect the score.
To see this, consider the following alternative setup, where we assume that the score is a step function of ability, such that higher ability leads to higher grades but the relationship between both variables is constant in intervals. For example, we might imagine Xi = f(ai) + εi, with f(ai) = f(a) for all i such that c − w ≤ Xi ≤ c + w. Then,
\[
\begin{aligned}
P[T_i = 1 \mid c-w \le X_i \le c+w] &= P[X_i \le c \mid c-w \le X_i \le c+w] \\
&= P[f(a) + \varepsilon_i \le c \mid c-w \le X_i \le c+w] \\
&= F_\varepsilon(c - f(a) \mid c-w \le X_i \le c+w),
\end{aligned}
\]
where Fε(·) is the CDF of ε, which is the same function for all individuals with c − w ≤ Xi ≤ c + w. Under this setup, all individuals in the local neighborhood of the cutoff have the same probability of receiving treatment and the same ability.
However, this reasoning ignores the possibility that the score itself may affect the po-
tential outcomes, a phenomenon we have explicitly allowed in our example in equation (2).
When we introduced this equation, we motivated it by imagining that students’ unobservable
ability affects both the grade obtained in the first exam, Xi, and the grade obtained in the
second exam, Yi. Our new assumption that the score is unrelated to students’ predetermined
characteristics in the neighborhood [48, 52] might seem at first to imply that the running
variable must be unrelated to the regression functions in this neighborhood, as illustrated
in Figure 1(a). However, if the running variable Xi affects the potential outcomes through
factors other than predetermined characteristics, the regression functions can follow equa-
tion (2) even when Xi is randomly assigned in a window around the cutoff—this situation
is illustrated in Figure 1(b).
Figure 1: Two Scenarios with Randomly Assigned Score
[Two panels. Each plots the regression functions E[Y(1)|X] (assigned to treatment) and E[Y(0)|X] (assigned to control) against Math Test Score Quarter I on the horizontal axis (0 to 100, cutoff at 50), with Math Test Score Quarter II on the vertical axis (0 to 100).]
(a) Test scores locally unrelated to potential outcomes
(b) Test scores locally related to potential outcomes
A more precise way to denote the potential outcomes would be Yi(Ti, Xi), where the first
argument captures the effect of the treatment assignment—i.e., of being on one side of the
cutoff or the other—and the second argument captures the effect of the running variable on
the outcome that occurs independently of the treatment status. This “direct” effect of Xi on
Yi would occur in our example if, for instance, a student’s test score in the first exam had a
reinforcing effect. If near the cutoff students who receive lower test scores were discouraged
and put less effort in the second exam than students who receive higher scores (because, for
example, they feel stigmatized by being assigned to the treatment group and are frustrated
by having been so close to the cutoff), Xi might be positively associated with the potential
outcomes even in the absence of a treatment effect. Under this assumption, the model in
equation (2) and Figure 1(b) could still be true even if the score were randomly assigned in
[48, 52].
Another way to see the distinction is to note that any experiment can be seen as a RD
design in which the score is a random number and the cutoff is chosen to ensure the desired
probability of treatment assignment. In our example, if instead of having instructors grade
the exams we assigned each student a uniform random number between 0 and 100, the
treatment assignment rule 1(Xi ≤ 50) would effectively become a rule that assigns double-
dose algebra entirely randomly among students, with each student having a 50% probability
of receiving treatment. In this case, however, there would be no need to add an extra
argument in the regression functions for the random number that determines treatment, as
the score used to randomly assign treatment is simply a computer-generated pseudo-random
number that is unrelated to the potential outcomes by construction.5
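The recasting of an experiment as an RD design with a random-number score can be sketched in a short simulation. This is only an illustration under assumed values: the sample size, the outcome model, and the constant `TRUE_EFFECT` are hypothetical and do not come from the text. The sketch draws a uniform score on [0, 100], applies the rule 1(Xi ≤ 50), and checks that about half of the units are treated, that the difference in means recovers the assumed effect, and that the control potential outcome is flat in the score.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
CUTOFF = 50.0
TRUE_EFFECT = 5.0  # hypothetical treatment effect, chosen for illustration

# Predetermined ability drives outcomes but plays no role in the score.
ability = rng.normal(size=n)

# The "score" is a computer-generated uniform number on [0, 100].
score = rng.uniform(0.0, 100.0, size=n)
treated = score <= CUTOFF  # treatment rule 1(X_i <= 50)

# Potential outcomes depend on ability and treatment, never on the score.
y0 = 60.0 + 10.0 * ability + rng.normal(size=n)
y1 = y0 + TRUE_EFFECT
y = np.where(treated, y1, y0)

share_treated = treated.mean()
diff_in_means = y[treated].mean() - y[~treated].mean()
# Slope of the control potential outcome on the score (flat by construction).
slope_y0_on_score = np.polyfit(score, y0, 1)[0]

print(f"share treated:        {share_treated:.3f}")
print(f"difference in means:  {diff_in_means:.3f}")
print(f"slope of Y(0) on X:   {slope_y0_on_score:.5f}")
```

Because the score here is an arbitrary number, no second argument for the score is needed in the potential outcomes, which is the point made in the paragraph above.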
In contrast, in a RD design where the score is assumed to be randomly assigned in the
local neighborhood [c − w, c + w] but is not an arbitrarily generated number unrelated
to the phenomena under study, there is nothing preventing the value of the running variable
from affecting the potential outcomes directly. In other words, local random assignment of
Xi does not guarantee the exclusion restriction that we typically take for granted in actual
randomized experiments. This illustrates the distinction between assumptions about the
assignment of the score Xi, and assumptions about the shape of the regression functions:
5 Note that this statement assumes that students are never informed about the particular random “grade” that they receive, as occurs in actual experiments.
assumptions about the law of the random variable Xi place no restrictions on the functions
E[Yi(1)|Xi], and E[Yi(0)|Xi].
Note also that this phenomenon is analogous to the instrumental variables (IV) design,
where random assignment of the instrument does not imply the exclusion restriction that is
required to interpret the usual estimand as the average treatment effect for the compliers
(Angrist, Imbens, and Rubin 1996). The parallelism arises because in both IV and RD
designs, researchers are interested in the effect of the treatment on the outcome, but there is
a third variable (the score in RD, the instrument in IV) that induces a change in treatment
status and can also affect the potential outcomes directly. In both cases, the way in which
this third variable is assigned imposes no general restrictions on its relationship with the
potential outcomes.
The analogy between IV and fuzzy RD designs has been long recognized in the continuity-
based framework, where the similarities arise because of imperfect compliance with the
treatment assignment rule in fuzzy RD settings. However, in the context of the local ran-
domization RD framework, similarities occur even in the sharp RD case where compliance
with treatment assignment is perfect. The analogy we point out is not related to treatment
compliance but rather to the possible role of the score (instrument) as a determinant of the
outcome irrespective of treatment assignment and/or status.
2.2 Formalizing the Local Randomization RD Assumption
We now attempt to formalize a local randomization RD assumption. The first such for-
malization was proposed by Cattaneo, Frandsen, and Titiunik (2015), who used a Fisherian
randomization-based approach in which the potential outcomes are seen as fixed as opposed
to random variables. Their proposed assumption has two parts. The first part states that
the conditional distribution function of the score in the finite sample is the same for all units
in the neighborhood of the cutoff. The second explicitly adopts an exclusion restriction that
rules out any “direct” effects of the score on the potential outcomes and states that, within
the window, the fixed potential outcomes depend on the running variable solely through the
treatment indicator.6 As our discussion above illustrates, the exclusion restriction plays a
crucial role in the analogy between RD designs and experiments, an issue that has not been
generally recognized by scholars outside of the Fisherian framework.
Since the continuity-based RD framework has been almost exclusively developed within a
random sampling framework in which the potential outcomes are random variables, and the
analogy between RD and experiments is often understood in these terms, we formalize the
local randomization assumption using the random sampling setting introduced in Section
1. We thus retain the random sampling framework throughout the paper. In particular,
our formalization is based on statistical independence between the potential outcomes Yi(1)
and Yi(0) and the binary treatment assignment indicator in a neighborhood of the cutoff,
an assumption routinely invoked in experimental analysis and guaranteed by construction
by the random assignment of treatment in actual randomized experiments. A condition of
this kind has been invoked in local randomization RD settings by, for example, Angrist and
Rokkanen (2015), de la Cuesta and Imai (2016), and Keele, Titiunik, and Zubizarreta (2015).
Formally, we state this assumption as:
Assumption 2: (Super-population) Local Independence. There exists a neighborhood
around the cutoff c, W = [c− w, c+ w], w > 0, such that
Yi(1), Yi(0) ⊥⊥ Ti | Xi ∈ W ,
where ⊥⊥ denotes statistical independence and Ti = 1(Xi ≥ c) as defined above.
Under this assumption, the average treatment effect in the window W is identified by
\[
\tau^{RD}_{LR} \equiv E[Y_i(1) - Y_i(0) \mid X_i \in W] = E[Y_i \mid T_i = 1, X_i \in W] - E[Y_i \mid T_i = 0, X_i \in W].
\]
If the window W is known, estimation can proceed, for example, simply by computing a
6 Cattaneo, Titiunik, and Vazquez-Bare (2016a) relax this restriction to allow the potential outcomes functions to depend on the score, and use that information to adjust the potential outcomes and build the randomization distribution of the desired test-statistic. Their approach clarifies that one could use a finite-sample randomization-based framework to analyze a RD design when the exclusion restriction is violated, provided one specifies a particular functional form for the potential outcome function yi(x, t).
difference in means between treated and control groups for observations in this window.
Under Assumption 2, the local randomization RD framework has the following features:
• The target parameter, $\tau^{RD}_{LR}$, is the average treatment effect in an interval of the support
of the running variable, rather than at a point.
• Because the focus is on an average over an interval rather than at a point, approximation
of the functional form of the regression functions E[Yi(0)|Xi = x] and
E[Yi(1)|Xi = x] is not necessary for estimation.
• Knowledge of the window W = [c − w, c + w] around the cutoff is necessary for identification
of $\tau^{RD}_{LR}$.
• Estimation and inference methods assume the window is fixed as the sample size grows.
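The difference-in-means estimator mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `diff_in_means_in_window` and the toy data are hypothetical, the window half-width `w` is taken as known, and treatment follows Ti = 1(Xi ≥ c) as in Assumption 2.

```python
import numpy as np


def diff_in_means_in_window(y, x, c, w):
    """Treated-minus-control difference in mean outcomes within [c - w, c + w].

    Hypothetical helper: treatment is T_i = 1(X_i >= c) and w is assumed known.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    in_window = (x >= c - w) & (x <= c + w)
    treated = in_window & (x >= c)
    control = in_window & (x < c)
    if not treated.any() or not control.any():
        raise ValueError("window must contain both treated and control units")
    return y[treated].mean() - y[control].mean()


# Toy data (made up): scores 47 and 53 fall outside the window [48, 52].
x = np.array([47.0, 49.0, 49.5, 50.0, 51.0, 53.0])
y = np.array([10.0, 12.0, 14.0, 20.0, 22.0, 40.0])
estimate = diff_in_means_in_window(y, x, c=50.0, w=2.0)
print(estimate)  # 8.0: mean(20, 22) - mean(12, 14)
```

No functional form for the regression functions is fitted here; all the work is done by the choice of window, which matches the features listed above.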
Thus, as we have defined it, the local randomization RD setup stands in contrast to the
continuity-based framework described above. We highlight two distinctions in particular.
Remark 1 (Randomization in window versus at the cutoff). Our characterization of the local
randomization RD framework is explicit in stating a “random assignment” assumption that
holds in an interval around the cutoff rather than at the cutoff point. This is in contrast
to some formalizations of the local randomization RD framework that make assumptions
at the cutoff point. The reason we focus on an interval rather than a point is because
the requirement of statistical independence at a point is trivial: any random variable is
independent of a constant, so the potential outcomes and the score will always be independent
at the cutoff. In other words, an assumption of randomization or independence at the cutoff
has no empirical or theoretical content, and thus cannot be used as a justification to analyze
RD designs as experiments.
Naturally, one can still use the local randomization RD interpretation simply as an
approximation device, where the window is not seen as the interval where a randomiza-
tion condition holds but rather as the interval where the unknown regression functions are
approximated—see Cattaneo, Frandsen, and Titiunik (2015, §6.5). But used in this heuristic
way, the local randomization RD framework becomes in essence identical to the continuity-
based framework: the parameter of interest becomes the treatment effect at the cutoff, and
identification and estimation results are ultimately based on some kind of continuity condition
(see, e.g., Lee 2008; Canay and Kamat 2016). □
Remark 2 (The Role of Neighborhood). If the local randomization RD framework is under-
stood in terms of a fixed window as in Assumption 2 and not as the heuristic approximation
discussed in Remark 1, the role of the neighborhood in this approach is conceptually different
from the role of the bandwidth in the continuity-based framework. In the latter framework,
the bandwidth is used to control the bias and variance of the local polynomial approxima-
tion to the unknown regression functions; this implies, among other things, that optimal and
valid estimation and inference will require choosing a different bandwidth for every outcome
variable or covariate analyzed. In contrast, in the local randomization RD framework based
on an assumption such as Assumption 2, the window is a fundamental piece of the research
design, since it is the interval where the required identification assumption holds. Consequently,
in the local randomization framework, this single window is used to perform estimation
and inference for all outcomes and covariates. See Cattaneo and Vazquez-Bare (2016) for
further details on the role of neighborhood selection in RD estimation and inference. □
3 The Difference Between Random Assignment and Independence in RD Contexts
We now consider whether Assumption 2 is implied by the random assignment of the score
near the cutoff and whether it implies that the exclusion restriction is satisfied in the RD
context. An in-depth consideration of these issues reveals subtle relationships between the
concepts of random treatment assignment, statistical independence, and exclusion restric-
tion in RD settings. Our discussion makes three main points: (i) the random assignment
of the score does not imply local independence between treatment assignment and potential
outcomes; (ii) the local statistical independence between treatment assignment and potential
outcomes does not imply the exclusion restriction that holds by construction in experiments;
and (iii) an experimental analysis is possible under local independence if the exclusion re-
striction fails, provided that the interpretation of the parameter is modified accordingly.
3.1 Random Assignment Of Score Does Not Imply Local Independence
The local independence assumption as stated in Assumption 2 is not guaranteed to hold when
we randomly assign the score near the cutoff—i.e., when all subjects with c−w ≤ Xi ≤ c+w
face the same ex-ante probability of receiving every value of the score between c − w and
c + w. The reason is that, even if all subjects in the neighborhood of the cutoff have the
same marginal distribution of the running variable, the potential outcomes and the running
variable may be related in ways that violate the local independence assumption. If, as
discussed above, higher test scores lead students to expend systematically more or less effort
in future exams, the randomly assigned value of Xi will affect the potential outcomes directly
even inside the local neighborhood [c−w, c+w], inducing a relationship between the potential
outcomes Yi(1), Yi(0) and the treatment indicator 1(Xi ≤ c) or 1(Xi ≥ c) that may violate
independence.
Thus, unlike in the case of actual randomized experiments, in a local randomization RD
design in the sense of Assumption 2, statistical independence and random assignment are
conceptually different. The reason is that, as discussed above, the implicit randomization
rule in a local randomization RD design does not and cannot manipulate treatment status
directly; instead, the rule must randomly assign the score values. Therefore, if the randomly
assigned score has a direct impact on the potential outcomes—so that, for example, high
values of Xi lead to high values of the potential outcomes—the indicators 1(Xi ≤ c) and
1(Xi ≥ c) may fail to be statistically independent of the potential outcomes.
To see this more formally, simply consider the example in equation (2) introduced above.
We already showed in Section 1 that this example violates Assumption 2; now simply assume
that Xi is randomly assigned in a neighborhood of the cutoff, which is of course allowed by
equation (2).
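The argument can also be checked numerically. The sketch below does not reproduce the paper's equation (2); it uses a hypothetical linear outcome model in which the score enters directly with slope `SLOPE`, and all numeric values are assumptions made for illustration. The score is drawn uniformly in [c − w, c + w] independently of ability, and treatment follows Ti = 1(Xi ≥ c) as in Assumption 2.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400_000
c, w = 50.0, 2.0
SLOPE = 0.5        # hypothetical direct effect of the score on the outcome
TRUE_EFFECT = 3.0  # hypothetical effect of treatment at a fixed score value

ability = rng.normal(size=n)
# Random assignment of the score: drawn independently of ability.
x = rng.uniform(c - w, c + w, size=n)
t = x >= c

# The score affects the potential outcomes directly, in addition to ability.
y0 = 40.0 + 5.0 * ability + SLOPE * x + rng.normal(size=n)
y1 = y0 + TRUE_EFFECT
y = np.where(t, y1, y0)

# Predetermined characteristics are balanced across the cutoff...
ability_gap = ability[t].mean() - ability[~t].mean()
# ...but Y(0) differs by treatment status, so local independence fails.
y0_gap = y0[t].mean() - y0[~t].mean()
diff_in_means = y[t].mean() - y[~t].mean()

print(f"ability gap:         {ability_gap:.3f}")
print(f"Y(0) gap:            {y0_gap:.3f}")
print(f"difference in means: {diff_in_means:.3f}")
```

In this assumed model the gap in Y(0) is close to SLOPE × w, because the mean score differs by w between the two halves of the window, so the difference in means lands near TRUE_EFFECT + SLOPE × w rather than TRUE_EFFECT.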
3.2 Local Independence Does Not Imply Exclusion Restriction
Moreover, the local independence assumption as stated in Assumption 2 is not a sufficient
condition for the exclusion restriction that the score does not affect the potential outcomes
except via the treatment indicator. That is, the local independence assumption does not
imply that the score Xi and the regression functions E[Yi(0) | Xi], E[Yi(1) | Xi] are unrelated
in a local neighborhood of the cutoff. Graphically, this means that regression functions need
not be flat in the neighborhood [c−w, c+w] even when Assumption 2 holds; in other words,
local independence does not necessarily imply a scenario like the one illustrated in Figure
1(a).
We briefly present an example to illustrate this point. It suffices to focus on one
potential outcome, so we focus on Yi(1). Suppose that
\[
Y_i(1) = \mathbb{1}(X_i \in W) \cdot (1 - |X_i - c|) + \mathbb{1}(X_i \notin W) \cdot (1 - w) + \varepsilon_i \tag{3}
\]
for all i, with εi a random error independent of Xi, c the cutoff, 1(Xi ≤ c) the treatment
rule, and W = [c − w, c + w] as before. The regression function E[Yi(1)|X] implied by this
model for Yi(1) is shown in Figure 2. To simplify notation, we redefine x := x− c so that the
cutoff for x is normalized to zero. We also assume that the density of X, f(x), is symmetric
around zero in [−w,w].
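Figure 2: Example Where Exclusion Restriction Is Violated But Local Independence Holds
[Plot of E[Y(1)|X] against the score X. The regression function equals 1 − w outside the window; inside [c − w, c + w] it rises linearly to 1 at the cutoff c and then falls back to 1 − w.]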
We wish to show that in this setup, which clearly violates the exclusion restriction, the
local independence assumption holds. Let W = [−w,w]. We want to show
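\[
P[Y(1) \le y,\ X \le 0 \mid X \in W] = P[Y(1) \le y \mid X \in W] \cdot P[X \le 0 \mid X \in W].
\]
Under the assumptions we made, we can show that
\[
\begin{aligned}
P[Y(1) \le y,\ X \le 0 \mid X \in W]
&= P[Y(1) \le y \mid -w \le X \le 0] \cdot P[X \le 0 \mid -w \le X \le w] \\
&= E\big[\, P[Y(1) \le y \mid X] \,\big|\, -w \le X \le 0 \,\big] \cdot P[X \le 0 \mid -w \le X \le w] \\
&= \frac{\int_{-w}^{0} F_\varepsilon(y - 1 + |x|)\, f(x)\, dx}{\int_{-w}^{0} f(x)\, dx} \cdot \frac{\int_{-w}^{0} f(x)\, dx}{\int_{-w}^{w} f(x)\, dx} \\
&= \frac{\frac{1}{2} \left( \int_{-w}^{0} F_\varepsilon(y - 1 + |x|)\, f(x)\, dx + \int_{0}^{w} F_\varepsilon(y - 1 + |x|)\, f(x)\, dx \right)}{\int_{-w}^{w} f(x)\, dx} \\
&= \frac{1}{2} \cdot \frac{\int_{-w}^{w} F_\varepsilon(y - 1 + |x|)\, f(x)\, dx}{\int_{-w}^{w} f(x)\, dx} \\
&= \frac{1}{2}\, P[Y(1) \le y \mid X \in W].
\end{aligned}
\]
Since we assumed that the density of x is symmetric around zero in the window,
P[X ≤ 0 | X ∈ W] = 1/2. Thus, we have shown that in this example
\[
\begin{aligned}
P[Y(1) \le y,\ X \le 0 \mid X \in W] &= \frac{1}{2}\, P[Y(1) \le y \mid X \in W] \\
&= P[X \le 0 \mid X \in W] \cdot P[Y(1) \le y \mid X \in W]
\end{aligned}
\]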
and therefore Y (1) and 1(X ≤ 0) are independent in W .
The most important assumptions in this example are the symmetry of the density of
the running variable Xi around the cutoff in the window [c − w, c + w], the independence
between the error term εi and Xi in this window, and the symmetry around the cutoff of
the functional form that relates Y(1) and X. The intuition behind the result is as follows:
if X is symmetrically distributed in [c − w, c + w], the distribution of the random variable
Y(1) = 1 − |X − c| + ε will be the same regardless of whether X > c or X < c; being above or below
the cutoff does not provide any information regarding the particular value of Y(1) that will
be observed. Thus the local independence condition between the potential outcome Yi(1)
and the treatment indicator 1(X ≤ c) holds even though Y(1) and X are related.
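The result can be checked with a small simulation of equation (3) inside the window, with the cutoff normalized to zero. The sketch is illustrative only: the uniform density for X, the normal error with standard deviation 0.1, the half-width w = 0.5, and the grid of y values are all assumptions not made in the text. It compares the joint probability P[Y(1) ≤ y, X ≤ 0] with the product of the marginals, and separately confirms that Y(1) is strongly related to the distance from the cutoff.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400_000
w = 0.5

# Symmetric density around the normalized cutoff 0 (uniform is an assumption).
x = rng.uniform(-w, w, size=n)
eps = rng.normal(scale=0.1, size=n)   # error term independent of x
y1 = 1.0 - np.abs(x) + eps            # equation (3) restricted to the window

below = x <= 0                        # treatment indicator 1(X <= 0)

# Exclusion restriction fails: Y(1) moves with the distance from the cutoff.
corr_y1_absx = np.corrcoef(y1, np.abs(x))[0, 1]

# Local independence: the joint CDF factorizes into the product of marginals.
grid = np.linspace(0.3, 1.2, 10)
max_gap = max(
    abs(np.mean((y1 <= y) & below) - np.mean(y1 <= y) * np.mean(below))
    for y in grid
)

print(f"corr(Y(1), |X|):          {corr_y1_absx:.3f}")
print(f"max |joint - product|:    {max_gap:.5f}")
```

Replacing the uniform draw with an asymmetric density, or the tent shape with a monotone function of X, would be expected to break the factorization, which is consistent with the role the symmetry assumptions play in the argument.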
3.3 Experimental Analysis Under Local Independence Will Capture ‘Overall’ Effect If Exclusion Restriction Fails
We have shown that local statistical independence as stated in Assumption 2 does not guar-
antee that the exclusion restriction holds. Although referring to the potential outcomes
functions as Yi(1) and Yi(0) may give the impression that these functions are only affected
by the treatment indicator but not the score itself, this notation is commonly used to broadly
refer to the potential outcome under the treatment and control conditions, including every-
thing that these conditions entail. In the context of actual randomized experiments, it
is generally unnecessary to make the notation more explicit to let the potential outcomes
depend on the particular randomization device used. But in contexts where the variable
that induces variation in treatment assignment is an important determinant of the potential
outcomes rather than an arbitrary device, generalizing the potential outcomes notation is
necessary. For example, a more general potential outcomes notation has been used to allow
for the direct effect of the instrument in instrumental variable setups (Angrist, Imbens, and
Rubin 1996) and of the cutoff in multi-cutoff RD setups (Cattaneo et al. 2016).
Following the notation we introduced briefly in Scenario 2 in Section 2, we let
Yi(Ti, Xi) denote the potential outcomes, now explicitly acknowledging that the potential
outcomes may depend on the value of the score Xi directly in addition to through the
treatment assignment indicator. Using this notation, Assumption 2 implies the mean inde-
pendence condition E[Yi(j,Xi)|Ti = j,Xi ∈ W ] = E[Yi(j,Xi)|Xi ∈ W ] for j ∈ {0, 1}. Given
this generalization, we now consider whether the local randomization RD framework can be
used in a context where Assumption 2 holds but the exclusion restriction fails.
Even if the exclusion restriction is violated in the sense that $Y_i(j, X_i) \neq Y_i(j, X_i')$ for
$X_i \neq X_i'$, $j \in \{0, 1\}$, under Assumption 2 we have
\[
\begin{aligned}
E[Y_i \mid T_i = 1, X_i \in W] &= E[Y_i(1, X_i) \mid T_i = 1, X_i \in W] = E[Y_i(1, X_i) \mid X_i \in W] \\
E[Y_i \mid T_i = 0, X_i \in W] &= E[Y_i(0, X_i) \mid T_i = 0, X_i \in W] = E[Y_i(0, X_i) \mid X_i \in W],
\end{aligned}
\]
leading to
\[
E[Y_i \mid T_i = 1, X_i \in W] - E[Y_i \mid T_i = 0, X_i \in W] = E[Y_i(1, X_i) - Y_i(0, X_i) \mid X_i \in W] = \tau^{RD}_{LR}.
\]
Thus, in a broad sense, Assumption 2 justifies analyzing the RD design in the neighbor-
hood of the cutoff as one would analyze an experiment because it allows identification of the
average treatment effect $\tau^{RD}_{LR}$. However, if the exclusion restriction does not hold, the treated
and control average outcomes within the local window will combine the effect of the treat-
ment (e.g., a student receiving double-dose algebra or a party winning an election) on the
outcome, plus the additional effect of the score on the outcome that would occur regardless of
treatment status (e.g., students who receive higher grades may feel motivated to study more,
political donors may wish to donate more money to localities where parties obtain higher
vote shares). In this case, the parameter $\tau^{RD}_{LR}$ will not capture the (local) average effect of the
treatment alone, but rather the effect of obtaining a value of the score in [c, c + w] versus
[c− w, c), which will include, among other things, the effect of the treatment.
Using the more general notation, we can see that in a local randomization RD design
where Assumption 2 holds but the exclusion restriction does not, the parameter $\tau^{RD}_{LR}$, unlike
$\tau^{RD}_{CB}$, is not the average effect at any single point x:
\[
\begin{aligned}
\tau^{RD}_{LR} = E[Y_i(1) - Y_i(0) \mid X_i \in W] &= E[Y_i(1, X_i) - Y_i(0, X_i) \mid X_i \in W] \\
&\neq E[Y_i(1, x) - Y_i(0, x) \mid X_i \in W].
\end{aligned}
\]
Imagine, for example, that E[Yi(0, Xi)|Xi ∈ W] = µ0, constant for every i in the window,
and Yi(1, Xi) follows equation (3). In this case,
\[
\tau^{RD}_{LR} = E[\,1 - |X_i - c| \mid X_i \in W\,] - \mu_0 = \frac{\int_{c-w}^{c+w} (1 - |x - c|)\, f(x)\, dx}{\int_{c-w}^{c+w} f(x)\, dx} - \mu_0,
\]
which can take any value between 1 − w − µ0 and 1 − µ0, depending on the density of
the running variable X inside the window. This example shows that, in a scenario where
the exclusion restriction fails, the average effect $\tau^{RD}_{LR}$ will take different values according to
how strongly observations concentrate in particular ranges of the running variable
inside the window. For example, the same setup just described would lead to different
values of $\tau^{RD}_{LR}$ if the observations within the window were uniformly distributed than if they
were disproportionately concentrated near the cutoff. Thus, when the exclusion restriction
fails and the regression functions are not constant in the window around the cutoff, the
interpretation of the average effect $\tau^{RD}_{LR}$ differs from the interpretation it would have in a
standard experiment.
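The dependence on the density can be made concrete by evaluating the expression above numerically for two densities. The sketch below is a rough illustration under assumptions that are not in the text: a half-width w = 0.5, µ0 = 0, a uniform density, and a triangular density that peaks at the cutoff. The helper name `tau_lr` is hypothetical, and the integral is approximated by a weighted sum on a fine grid.

```python
import numpy as np


def tau_lr(density, c=0.0, w=0.5, mu0=0.0, n_grid=200_001):
    """Grid approximation of E[1 - |X - c| | X in W] - mu0 for a given density.

    `density` takes distances u = x - c and need not be normalized.
    """
    x = np.linspace(c - w, c + w, n_grid)
    weights = density(x - c)
    weights = weights / weights.sum()  # normalize over the window
    return float(np.sum((1.0 - np.abs(x - c)) * weights) - mu0)


W_HALF = 0.5  # assumed half-width of the window


def uniform_density(u):
    # Observations spread evenly across the window.
    return np.ones_like(u)


def peaked_density(u):
    # Triangular density: observations concentrate near the cutoff.
    return W_HALF - np.abs(u)


tau_uniform = tau_lr(uniform_density, w=W_HALF)
tau_peaked = tau_lr(peaked_density, w=W_HALF)
print(f"uniform density: {tau_uniform:.4f}")
print(f"peaked density:  {tau_peaked:.4f}")
```

Under these assumed densities the closed forms are 1 − w/2 − µ0 for the uniform case and 1 − w/3 − µ0 for the triangular case, so with w = 0.5 and µ0 = 0 the two values are roughly 0.75 and 0.83. Both lie between 1 − w − µ0 and 1 − µ0, and they differ even though the potential outcome functions are identical.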
4 Concluding Remarks
Our discussion highlights that the local randomization interpretation of the RD design, if
taken literally, introduces conceptual distinctions that are absent in pure experimental de-
signs. As we have shown, in the context of RD designs, the distinction between random
assignment and statistical independence is consequential. This distinction is often meaning-
ful in natural experiments (Sekhon and Titiunik 2012). In an actual experiment, random
assignment of treatment leads to statistical independence between treatment status and po-
tential outcomes, because the “score” used to randomly assign subjects is a device (likely a
computer-generated pseudo-random number) that is by construction arbitrary and unrelated
to the potential outcomes or any systematic characteristic of the experimental subjects. As
we observed, a randomized experiment can be recast as a RD design where the score is a
randomly generated number and the cutoff is chosen to ensure the desired probability of re-
ceiving treatment. Thus, seen as functions of this random number, the regression functions
are guaranteed to be constant (graphically flat), because the score is a randomly generated
number that is by construction unrelated to the potential outcomes.
In contrast, in a RD design, the score used to assign treatment, even if its values are
randomly allocated near the cutoff, is usually an important determinant of the outcome of
interest. Indeed, the importance of the RD score is often what motivates using it as the
basis of treatment assignment in the first place. In a context where the score is meaningfully
related to the outcome (past and future test scores, past and future vote shares, etc.),
random assignment of the score value contains no information about the particular form of
the regression functions E[Yi(1)|Xi] and E[Yi(0)|Xi]. This is straightforward to see once one
realizes that restrictions on the randomization distribution of the score Xi (and implicitly
of the treatment assignment) are fundamentally different from restrictions on the shape of
the regression functions—and more generally, on the conditional distribution of the potential
outcomes given the score. No matter how much information we have about the way in which
Xi is assigned, this information will generally be insufficient to determine how Xi and the
potential outcomes are related.
Given the additional assumptions needed for the local randomization interpretation to
hold, in most applications one should proceed using the continuity assumption alone. This
is typically a plausible assumption if there are neither formal nor informal mechanisms for
sorting—i.e., for units to appeal and change the score value they were originally assigned
in order to receive their preferred treatment condition. However, in practice, the methods
used to estimate the continuous functions at the cutoff are consequential. Estimation is
delicate because the functional forms are unknown, and one is estimating the trend at an
endpoint (the cutoff) of measure zero. The example in Hyytinen et al. (2015) illustrates
how inferences in RD designs are sensitive to the method used to estimate the continuous
functions on both sides of the cutoff even when sample sizes are not small, and the design is
valid by construction. More generally, the properties of RD estimation and inference based
on local polynomial methods depend crucially on the bandwidth choice; for example, the
commonly used mean-squared-error optimal bandwidth, though valid for point estimation,
is too large for inference and requires either undersmoothing or robust methods to yield
valid confidence intervals (Calonico, Cattaneo, and Titiunik 2014). One appeal of the local
randomization interpretation of RD is that it avoids these estimation and inference issues, but
unfortunately, this interpretation requires additional assumptions that may not be plausible.
References
Angrist, Joshua D., Guido W. Imbens, and Donald B. Rubin. 1996. “Identification of causal effects using instrumental variables.” Journal of the American Statistical Association 91 (434): 444–455.
Angrist, Joshua D., and Miikka Rokkanen. 2015. “Wanna get away? Regression discontinuity estimation of exam school effects away from the cutoff.” Journal of the American Statistical Association 110 (512): 1331–1344.
Calonico, Sebastian, Matias D. Cattaneo, and Max H. Farrell. 2016. “On the Effect of Bias Estimation on Coverage Accuracy in Nonparametric Inference.” Working paper, University of Michigan.
Calonico, Sebastian, Matias D. Cattaneo, Max H. Farrell, and Rocío Titiunik. 2016. “Regression Discontinuity Designs Using Covariates.” Working paper, University of Michigan.
Calonico, Sebastian, Matias D. Cattaneo, and Rocío Titiunik. 2014. “Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs.” Econometrica 82 (6): 2295–2326.
Canay, Ivan A., and Vishal Kamat. 2016. “Approximate Permutation Tests and Induced Order Statistics in the Regression Discontinuity Design.” Working paper, Northwestern University.
Cattaneo, Matias D., Brigham Frandsen, and Rocío Titiunik. 2015. “Randomization Inference in the Regression Discontinuity Design: An Application to Party Advantages in the U.S. Senate.” Journal of Causal Inference 3 (1): 1–24.
Cattaneo, Matias D., and Gonzalo Vazquez-Bare. 2016. “The Choice of Neighborhood in Regression Discontinuity Designs.” forthcoming Observational Studies.
Cattaneo, Matias D., Luke Keele, Rocío Titiunik, and Gonzalo Vazquez-Bare. 2016. “Interpreting Regression Discontinuity Designs with Multiple Cutoffs.” Journal of Politics 78 (3): 1229–1248.
Cattaneo, Matias D., Rocío Titiunik, and Gonzalo Vazquez-Bare. 2016a. “Comparing Inference Approaches for RD Designs: A Reexamination of the Effect of Head Start on Child Mortality.” Working paper, University of Michigan.
Cattaneo, Matias D., Rocío Titiunik, and Gonzalo Vazquez-Bare. 2016b. “Inference in Regression Discontinuity Designs Under Local Randomization.” The Stata Journal 16 (2): 331–367.
Caughey, Devin, and Jasjeet S. Sekhon. 2011. “Elections and the Regression Discontinuity Design: Lessons from Close U.S. House Races, 1942–2008.” Political Analysis 19 (4): 385–408.
Cook, Thomas D. 2008. ““Waiting for Life to Arrive”: A history of the regression-discontinuity design in Psychology, Statistics and Economics.” Journal of Econometrics 142 (2): 636–654.
de la Cuesta, Brandon, and Kosuke Imai. 2016. “Misunderstandings about the Regression Discontinuity Design in the Study of Close Elections.” Annual Review of Political Science 19: 375–396.
Eggers, Andrew C., Olle Folke, Anthony Fowler, Jens Hainmueller, Andrew B. Hall, and James M. Snyder. 2014. “On The Validity Of The Regression Discontinuity Design For Estimating Electoral Effects: New Evidence From Over 40,000 Close Races.” American Journal of Political Science, forthcoming.
Fan, J., and I. Gijbels. 1996. Local Polynomial Modelling and Its Applications. New York: Chapman & Hall/CRC.
Hahn, Jinyong, Petra Todd, and Wilbert van der Klaauw. 2001. “Identification and Estimation of Treatment Effects with a Regression-Discontinuity Design.” Econometrica 69 (1): 201–209.
Holland, Paul W. 1986. “Statistics and causal inference.” Journal of the American Statistical Association 81 (396): 945–960.
Hyytinen, Ari, Jaakko Merilainen, Tuukka Saarimaa, Otto Toivanen, and Janne Tukiainen. 2015. “Does Regression Discontinuity Design Work? Evidence from Random Election Outcomes.” VATT Institute for Economic Research, Working Paper 59.
Imbens, Guido, and Thomas Lemieux. 2008. “Regression Discontinuity Designs: A Guide to Practice.” Journal of Econometrics 142 (2): 615–635.
Imbens, Guido W., and Donald B. Rubin. 2015. Causal Inference in Statistics, Social, and Biomedical Sciences. Cambridge University Press.
Imbens, Guido W., and Karthik Kalyanaraman. 2012. “Optimal Bandwidth Choice for the Regression Discontinuity Estimator.” Review of Economic Studies 79 (3): 933–959.
Keele, Luke, Rocío Titiunik, and Jose R. Zubizarreta. 2015. “Enhancing a geographic regression discontinuity design through matching to estimate the effect of ballot initiatives on voter turnout.” Journal of the Royal Statistical Society: Series A (Statistics in Society) 178 (1): 223–239.
Lee, David S. 2008. “Randomized Experiments from Non-random Selection in U.S. House Elections.” Journal of Econometrics 142 (2): 675–697.
Lee, David S., and Thomas Lemieux. 2010. “Regression Discontinuity Designs in Economics.” Journal of Economic Literature 48 (2): 281–355.
Porter, Jack. 2003. “Estimation in the Regression Discontinuity model.” Working paper, University of Wisconsin at Madison.
Sekhon, Jasjeet S., and Rocío Titiunik. 2012. “When natural experiments are neither natural nor experiments.” American Political Science Review 106 (01): 35–57.
Sekhon, Jasjeet S., and Rocío Titiunik. 2016. “Understanding Regression Discontinuity Designs As Observational Studies.” forthcoming Observational Studies.
Thistlethwaite, Donald L., and Donald T. Campbell. 1960. “Regression-discontinuity Analysis: An Alternative to the Ex-Post Facto Experiment.” Journal of Educational Psychology 51 (6): 309–317.