

Quantile Regression for Program Evaluation:

Some (Introductory) Examples

CESifo Lecture Series

Margherita Fort

University of Bologna, CESifo, IZA

[email protected]

Last updated: May 20, 2014



Some Examples To Get Started 1/3

Case 1 Evaluation of the impact of welfare reforms on family earnings, income and labour supply responses: e.g. AFDC, Jobs First in the US

Theory predicts heterogeneous responses in the sign and magnitude of these effects: e.g. no change in income at the bottom of the distribution; fall or no change in income at the top

Mean impacts will average together positive and negative responses, possibly obscuring the welfare reform effect

Using experimental data, Bitler, Gelbach, Hoynes (AER, 2006) find

• evidence of heterogeneity in the effects consistent with the theory

• that the intra-group variation in the effects at different points of the income distribution exceeds the inter-group variation in mean impacts

Other recent examples: Maynard et al. (JAE, 2009); Ozkan et al. (JAE, 2014)



Jobs First Features

from Bitler et al. (AER, 2006)

TABLE 1: KEY DIFFERENCES IN JOBS FIRST AND AFDC PROGRAMS

Earnings disregard
  Jobs First: All earned income disregarded up to poverty line (policy also applied to food stamps)
  AFDC: Months 1-3: $120 + 1/3; Months 4-12: $120; Months > 12: $90

Time limit
  Jobs First: 21 months (6-month extension if in compliance and nontransfer income less than maximum benefit)
  AFDC: None

Work requirements
  Jobs First: Mandatory work first, exempt if child < 1
  AFDC: Education/training, exempt if child < 2

Sanctions
  Jobs First: 1st violation: 20-percent cut for 3 months; 2nd violation: 35-percent cut for 3 months; 3rd violation: grant cancelled for 3 months
  AFDC: (Rarely enforced) 1st: adult removed from grant until compliant; 2nd: adult removed - 3 months; 3rd: adult removed - 6 months

Other policies
  Jobs First: Asset limit $3,000; Partial family cap (50 percent); Two years transitional Medicaid; Child care assistance; Child support: $100 disregard, full pass-through
  AFDC: Asset limit $1,000; 100-hour rule and work history requirement for two-parent families; One-year transitional Medicaid; Child support: $50 disregard, $50 maximum pass-through

Source: Bloom et al. (2002).

to much recent discussion among policymakers and researchers, our results suggest the possibility that Connecticut's welfare reform reduced income for a nontrivial share of the income distribution after time limits took effect. Fourth, we find that the essential features of our empirical findings could not have been revealed using mean impact analysis on typically defined subgroups: the intra-group variation in QTE greatly exceeds the inter-group variation in mean impacts.

The remainder of the paper is organized as follows. In Section I, we provide an overview of the Jobs First program and its predicted effects. We then discuss our data in Section II. In Section III, we present empirical evidence that strongly suggests the time limit was an important program feature, and we present mean treatment effects in Section IV. Our main QTE results appear in Section V. We discuss extensions and sensitivity tests in Section VI, and we conclude in Section VII.

I. The Jobs First Program and Its Economic Implications

Below we compare the earnings, transfer, and income distributions between a randomly assigned treatment group, whose members face the Jobs First eligibility and program rules, and a randomly assigned control group, whose

members face the AFDC eligibility and program rules. We begin by outlining the two programs and use labor supply theory to generate predictions about earnings, transfers, and income under Jobs First compared to AFDC.

Table 1 summarizes the major features of Connecticut's Jobs First waiver program and the existing AFDC program. The Jobs First waiver contained each of the key elements in PRWORA: time limits, work requirements, and financial sanctions. Jobs First's earnings disregard policy is quite simple: every dollar of earnings below the federal poverty line (FPL) is disregarded for purposes of benefit determination. This leads to an implicit tax rate of 0 percent for all earnings up to the poverty line, which is a very generous policy by comparison to AFDC's. The statutory AFDC policy disregarded the first $120 of monthly earnings during a woman's first 12 months on aid, and $90 thereafter. In the first four months, benefits were reduced by two dollars for every three dollars earned, and starting with the fifth month on aid, benefits were reduced dollar for dollar, so that the long-run statutory implicit tax rate on earnings above the disregard was 100 percent.3

3In practice, AFDC effective tax rates were less than the 100-percent statutory rate. First, there were work expense and




Predictions From Static Labour Supply Theory


As shown in Table 1, the Jobs First time limit is 21 months, which is currently the shortest in the United States (Office of Family Assistance, 2003, Table 12:10). By contrast, there were no time limits in the AFDC program. In addition, work requirements and financial sanctions were strengthened in the Jobs First program relative to AFDC. For example, the Jobs First work requirements moved away from general education and training, focusing instead on "work first" training programs. Further, Jobs First exempts from work requirements only women with children under the age of one, and financial sanctions are supposed to be levied on parents who do not comply with work requirements. While Jobs First's sanctions are more stringent than AFDC's, the available evidence suggests that they were rarely used. For more information on these and other features of the Jobs First program, see our earlier working paper (Bitler et al., 2003b) and MDRC's final report on the Jobs First evaluation (Diana Adams-Ciardullo et al., 2002, henceforth the "final report").

Basic labor supply theory makes strong and heterogeneous predictions concerning welfare reforms like those in Jobs First. In the rest of this section, we discuss the economic impacts of Jobs First on the earnings, transfers, and income distributions. We focus on earnings disregards and time limits, since they are the salient features for examining heterogeneous treatment effects.

A. Economic Impacts of Earnings Disregards

To begin, Figure 1 shows a stylized budget constraint in income-leisure space before and

after Jobs First. The AFDC program is represented by line segment AB while Jobs First is represented by AF. The Jobs First program dramatically affects the budget constraint faced by welfare recipients, lowering the benefit reduction rate to 0 percent and raising the breakeven earnings level to the FPL.4 The effective AFDC benefit reduction rate in this figure is below the statutory long-run rate of 100 percent (see footnote 3 for a discussion).

What is the impact of this transformation of the on-welfare budget segment from AFDC's AB to Jobs First's AF? To begin, we make the usual static labor supply model assumptions: the woman can freely choose hours of work at the given offered wage, and offered wages are constant. In particular, we ignore any human capital, search-theoretic, or related issues. We

FIGURE 1. STYLIZED CONNECTICUT BUDGET CONSTRAINT UNDER AFDC AND JOBS FIRST

[Figure: monthly income (vertical axis, with FPL marked) against monthly work hours (horizontal axis); labeled points A, B, C, D, E, F, G, H]

child care disregards. Second, AFDC eligibility redetermination occurred less frequently than monthly, so there could be a lag between the month when an AFDC participant earned income and the date when benefits were reduced. Third, the Earned Income Tax Credit (EITC) provides a 40-percent wage subsidy in its phase-in region, which generally ended above Connecticut's maximum benefit level. (The EITC is available to both experimental groups in our data, so it raised the net wage above its before-tax level for both groups.) In Bitler et al. (2003b), we present local nonparametric regressions of transfer payments on earnings and find that the control group members receiving AFDC in our sample faced an effective benefit reduction rate of about one-third, similar to earlier studies of the national caseload in Terra McKinnish et al. (1999) and Thomas Fraker et al. (1985). Also, statutory rules for both AFDC and Jobs First tax away nonlabor income other than child support dollar for dollar; we discuss child support interactions in Section VIC.

4 Under AFDC rules, eligibility for AFDC conferred categorical eligibility for food stamps, with a 30-percent benefit reduction rate applied to non-food stamps income. Under Jobs First, food stamps rules mirror those for cash assistance: food stamps benefits are determined after disregarding all earnings up to the poverty line (though this food stamps disregard expansion operates only while a woman assigned to Jobs First is receiving cash welfare payments). However, losing eligibility for welfare benefits under Jobs First assignment (e.g., through time limits) need not eliminate food stamps eligibility, since one could still satisfy the food stamps need standard.





Predictions From Static Labour Supply Theory

TABLE 2: PRE-TIME LIMIT PREDICTED EFFECTS OF JOBS FIRST ASSIGNMENT, BY OPTIMAL CHOICE GIVEN AFDC ASSIGNMENT

Columns: (1) location if assigned to AFDC; compared to this point, does Jobs First assignment change (2) the after-tax wage? (3) nonlabor income?; (4) location on Jobs First budget set; effect on distribution of (5) hours/earnings, (6) transfers, (7) income

(1)  (2)  (3)  (4)                 (5)  (6)  (7)
A    Yes  No   A                   0    0    0
A    Yes  No   On AF, left of A    +    0    +
C    Yes  No   On AF, left of C    +    +    +
D    No   Yes  On AF, right of D   -    +    +
E    No   Yes  On AF, left of A    -    +    +
H    No   Yes  On AF, left of A    -    +    -
H    No   No   H                   0    0    0

Notes: Table contains predictions of static labor supply model for women facing AFDC and counterfactual Jobs First disregard rules (assuming all other rules are the same). Points are those labeled in Figure 1. There are two predictions for women at points A and H depending on those women's preferences.

also assume that there is no time limit. Later we relax these assumptions.

Consider first the case in which an AFDC-assigned woman locates at point A, working zero hours and receiving the maximum benefit payment G. Depending on the woman's preferences (e.g., the steepness of her indifference curves), assignment to Jobs First could lead to either of two outcomes. First, she might continue to work zero hours and receive the maximum benefit with no change in income. Second, she might enter the labor market, moving from A to some point on AF; transfer income remains at the maximum benefit level, while total income rises. This labor supply prediction, together with others discussed below, is summarized in Table 2, which indicates whether Jobs First changes the after-tax wage (in this case, yes) and nonlabor income (in this case, no). Table 2 then indicates the predicted location on the Jobs First budget set and the expected impact of Jobs First assignment on earnings, transfers, and income.

We next consider points such as C, where women work positive hours and receive welfare when they are assigned to AFDC. For such women, assignment to Jobs First has only a price effect: the benefit reduction rate is lower, but there is no change in nonlabor

income at zero hours of work. As long as substitution effects dominate income effects when only the net wage changes, Jobs First will cause an increase in hours, earnings, transfers, and income.

Now imagine that a woman's preferences are such that she would not participate in welfare if assigned to AFDC, instead locating at a point like D. At this point, her earnings would be between the maximum benefit amount and the FPL. Assignment to Jobs First would make this woman income-eligible for welfare even if she did not change her behavior; this is the case of Orley Ashenfelter's (1983) "mechanical" induced eligibility effect leading to an increase in transfers. If we assume that both leisure and consumption are normal goods, then we will expect the increase in nonlabor income accompanying Jobs First assignment to reduce hours of work and increase total income. That is, we expect women who would locate at point D to move to a point on AF that is both right of and above D.

Next consider a woman who would locate at a point like E if assigned to AFDC. At E, earnings are between the poverty line and the sum of the maximum benefit and the poverty line. Such points are clearly dominated under Jobs First assignment: the woman can increase income by reducing hours of work and claiming welfare (an example of Ashenfelter's behavioral induced eligibility effect). If both leisure and consumption are normal goods, we expect this woman to locate on AF at a point higher than E, so that hours worked decrease, while transfers and income both increase.

5 Note that labor supply theory makes predictions about hours worked. Assuming no change in offered wages, this implies a prediction about earnings. Thus the table includes a single prediction for hours/earnings, which is important, since we observe earnings but not hours in our data.





Case 2 Evaluation of the impact of risk or protective factors on weight/BMI

High costs and long-term effects (both medical and economic) motivate an interest in factors associated with particularly low birthweight

BMI outside a range (18-25) is not ideal for health: a variable that has a positive effect only at the bottom of the BMI distribution can be considered a protective factor because it is negatively associated with underweight; conversely, the same positive effect above the median or at the top of the BMI distribution may lead one to consider the variable a risk factor

Mean impacts are less interesting for policy makers than impacts on too low or too high BMI

Related papers: Abrevaya (2001, EE); Brunello et al. (2009b); Stifel (EHB, 2009)




Case 3 Research on income or wage inequality, wage structure

“The school is a promising place to increase the skills and incomes of individuals. As a result, educational policies have the potential to decrease existing, and growing, inequalities in income” Ashenfelter et al. (2001)

What is the impact of education on (within-levels) wage inequality?

To answer this question we need to assess whether returns to education are heterogeneous over the wage distribution

Do public sector firm ownership and lack of competition matter for wages? Shifting workers from the public sector to the private sector has an ambiguous theoretical impact on wages, given the interplay of ownership and competition. Because issues such as pay equity and fairness are encountered in political discussion, privatization is likely to affect not only the average wage but also the distribution of wages.

Related papers: Martins et al. (LE, 2004); Brunello et al. (EJ, 2009); Melly et al. (JEEA, forth.)



Remarks

Theoretical models may predict heterogeneous impacts of “treatments”

Mean impact may miss distributional effects of a policy or a treatment

Policy makers may be intrinsically interested in

• the impact of a policy on extreme values of the distribution of the outcomes

• assessing effects on inequality

There are many reasons to go beyond the average

Quantile regression is an appropriate tool to examine heterogeneous effects on the distribution of a continuous outcome

Quantile regression coefficients may not have a causal interpretation

No multivariate quantile definition (yet)

Conditional and Unconditional Quantiles are two different objects (!)

Quantile Treatment Effects do not speak about quantiles of the treatment effect distribution without further assumptions (!)



Are the examples mentioned related to

your research?

Can you think about research questions

in your field for which heterogeneity matters?

(Share them!)



What we cover in these lectures (introductory level)

Quantile Regression (QR) with Exogenous Regressors: Theory (fundamentals) and Applications (some examples)

QR with Endogenous Regressors, Instrumental Variable Approaches: IV-QTE, LATE-QTE

QR with Endogenous Regressors: The Causal Chain Model

Topics related to identification & estimation of effects of covariates on conditional quantiles

What we do not cover (but may be relevant for your research)

Unconditional Quantile Regression

Quantile Regression for Time Series, Panel Data and Discrete Data

Censored Quantile Regression and Quantile Regression for Duration Analysis

Decomposition Analysis with QR or Unconditional Quantile Regression

Identification Strategies for QTE that rely on approaches other than IVs

Nonlinear Quantile Regression

. . .



Step 1: We Go Through the Fundamentals

Define quantile (percentile)

The simplest quantile regression (QR) model: the two-sample treatment-control model

QR interpretation: basics and examples

Estimation (intuition only) & testing (a little bit)

Key properties of the QR estimator



Disclaimer . . .

Going through the fundamentals may not be fun

but . . . no free lunches



Distributions . . .

Y random variable with cumulative distribution function (c.d.f.) $F_Y(\cdot)$

$F_Y(y) \equiv \mathrm{Prob}[Y \le y]$, $y \in \mathcal{Y}$, $0 \le F_Y(y) \le 1$

[Figure: empirical c.d.f. (ecdf) plotted against x, for x between -4 and 4]



. . . Quantiles

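Quantile function of Y, for any $0 < \tau < 1$: $\mathrm{Quant}_Y(\tau) \equiv q_Y(\tau) \equiv F_Y^{-1}(\tau)$

$\tau$th quantile $\equiv Q_Y(\tau) = \inf\{y : F_Y(y) \ge \tau\}$

[Figure: quantile function, x (between -4 and 4) plotted against the ecdf (between 0 and 1)]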
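The inf-based definition of the quantile can be made concrete with a few lines of code. The following is a minimal sketch that applies the definition to the empirical c.d.f. of a small made-up sample; the function names and the data are illustrative assumptions, not part of the lecture material.

```python
def ecdf(sample, y):
    """Empirical c.d.f.: share of observations less than or equal to y."""
    return sum(1 for v in sample if v <= y) / len(sample)


def quantile(sample, tau):
    """Smallest observed y whose empirical c.d.f. is at least tau (0 < tau < 1)."""
    ordered = sorted(sample)
    n = len(ordered)
    for i, y in enumerate(ordered):
        # (i + 1) / n is the share of observations at positions 0..i
        if (i + 1) / n >= tau:
            return y
    return ordered[-1]


sample = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]  # made-up data
print(quantile(sample, 0.25), quantile(sample, 0.5), quantile(sample, 0.9))
# prints: 2 3 6
```

Statistical packages often interpolate between order statistics, so their sample quantiles can differ slightly from this definition in small samples.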



[Figure, two panels: e.c.d.f. $F(y)$ (ecdf against x) and quantile function $F^{-1}(\tau)$ (x against ecdf)]

Recalling two known results

1. For any known c.d.f. $F(\cdot)$, take $U \sim U(0, 1)$ and $Y = F^{-1}(U)$; then $F_Y(y) = F(y)$ $\forall y \in \mathbb{R}$

2. For any continuous r.v. Y with c.d.f. $F_Y(\cdot)$, take $U = F_Y(Y)$; then $U \sim U(0, 1)$
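Both results can be checked by simulation. The sketch below assumes an exponential distribution with a hypothetical rate of 2 (chosen only because its c.d.f. and inverse have closed forms) and compares simulated moments with their theoretical values.

```python
import math
import random

random.seed(0)
rate = 2.0   # hypothetical rate of the exponential distribution
n = 20000


def F(y):
    """Exponential c.d.f."""
    return 1.0 - math.exp(-rate * y)


def F_inv(u):
    """Inverse of the exponential c.d.f. (quantile function)."""
    return -math.log(1.0 - u) / rate


u_draws = [random.random() for _ in range(n)]
y_draws = [F_inv(u) for u in u_draws]   # result 1: Y = F^{-1}(U) has c.d.f. F
u_back = [F(y) for y in y_draws]        # result 2: F(Y) is uniform on (0, 1)

mean_y = sum(y_draws) / n                               # theory: 1 / rate = 0.5
share_below = sum(1 for y in y_draws if y <= 0.5) / n   # theory: F(0.5)
mean_u = sum(u_back) / n                                # theory: 0.5
print(round(mean_y, 3), round(share_below, 3), round(F(0.5), 3), round(mean_u, 3))
```

Result 1 is also what makes it possible to simulate a conditional distribution from a quantile regression model by plugging a uniform draw into the quantile function.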



A simple (historical) example : food expenditure & income

In 1857, Engel highlighted that the income elasticity of demand for food is $\le 1$: when a household's income X increases, the proportion of money it spends on food Y decreases, even if actual expenditure on food rises.

[Figure: scatter plot of food expenditure against income (500 to 3000), with the predicted mean food expenditure line]

$E[Y|X] = \beta_0 + \beta_1 X$, $(\beta_0, \beta_1): \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2 = \min$



Boxplot: Conditional distribution of food expenditure by income levels

[Figure: boxplots of food expenditure (0 to 2,000) for two income classes: low income (below median) and high income (above median)]

Limits of the boxes: 1st and 3rd quartiles of food expenditure (Y) in each class.

Median: horizontal line in the middle of the box.

The mean food expenditure increases across groups, but the dispersion also changes

You could “draw lines” connecting conditional percentiles: the slopes of these

lines will not be constant across percentiles



Scatter plot, E [Y |X ] & QR

[Figure: scatter plot of food expenditure against income (500 to 3000), with predicted food expenditure lines for the mean, the median and the 25th percentile]

$\mathrm{Quant}_Y(0.5|X) = \beta_{0,0.5} + \beta_{1,0.5} X$

$(\beta_{0,0.5}, \beta_{1,0.5}): \sum_{i=1}^n |y_i - \beta_{0,0.5} - \beta_{1,0.5} x_i| = \min$



The Two-Sample Treatment-Control Model . . .

To interpret the meaning of $\beta_{1,0.5}$ we consider the case with only two levels of income (the treatment), $X_0$ (low income) and $X_1$ (high income)

[Figure, two panels: c.d.f. (left) and quantiles (right) of food expenditure (0 to 2000), for low income and high income]

Providing households with additional income may increase their food expenditure

differently depending on their propensity to consume or on their love for food



QR: interpretation (preview)

[Figure, two panels: conditional p.d.f. (percent) and e.c.d.f. of food expenditure (0 to 2000), for low income and high income]

The quantile treatment effect (QTE) is the change in food expenditure required to keep the individual at the same quantile when moving from the low income (control) distribution $F(\cdot)$ to the high income (treated) distribution $G(\cdot)$, i.e. the horizontal distance $\delta(x)$ between the distributions F and G: $F(x) = G(x + \delta(x))$



Doksum’s (1974) treatment effect function δ(τ)

F (y) food expenditure cdf when income is low

G (y) food expenditure cdf when income is high

$\delta(\tau) = G^{-1}(\tau) - F^{-1}(\tau)$, $0 < \tau < 1$

Taking $\tau = F(x)$ and changing variables, $\delta(x) = G^{-1}(F(x)) - x$

An (analog) estimate $\hat\delta(\tau)$ can be obtained by taking the difference of the sample quantiles, or through (parametric) quantile regression

This does not say that, for instance, an individual

who is at the τ-th quantile on the low income distribution

will be at the τ-th quantile on the high income distribution

after an increase in income
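The analog estimator based on sample quantiles is easy to sketch. The example below uses made-up food expenditure figures for the two income groups (not the Engel data) and the inf-based sample quantile; all names are illustrative.

```python
def quantile(sample, tau):
    """Smallest observed y whose empirical c.d.f. is at least tau."""
    ordered = sorted(sample)
    n = len(ordered)
    for i, y in enumerate(ordered):
        if (i + 1) / n >= tau:
            return y
    return ordered[-1]


def qte(control, treated, tau):
    """Doksum-type estimate: difference of the sample quantiles at tau."""
    return quantile(treated, tau) - quantile(control, tau)


# Made-up samples: F (low income) and G (high income)
low_income = [310, 350, 420, 480, 500, 560, 610, 640, 700, 790]
high_income = [520, 600, 730, 850, 900, 1010, 1150, 1240, 1400, 1650]

taus = [0.1, 0.25, 0.5, 0.75, 0.9]
effects = [qte(low_income, high_income, t) for t in taus]
print(effects)
# prints: [210, 310, 400, 600, 700]
```

In this made-up example the estimated effect grows with the quantile, so a single mean difference would hide the fact that the two distributions differ more at the top than at the bottom.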



When Shall One Use Quantile Regression?

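The regression model is $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, iid errors $\varepsilon \sim F_\varepsilon(\cdot)$. Then $F_Y(y) = F_\varepsilon(y - \beta_0 - \beta_1 x)$ and $\mathrm{Quant}_Y(\tau) = \beta_0 + \beta_1 x + \mathrm{Quant}_\varepsilon(\tau)$

quantiles of food expenditure when income is low ($x = 0$):

$\mathrm{Quant}_Y(\tau) = \underbrace{\beta_0 + \mathrm{Quant}_\varepsilon(\tau)}_{\text{QR intercept}} + \beta_1 x$

quantiles of food expenditure when income is high ($x = 1$):

$\mathrm{Quant}_Y(\tau) = \underbrace{\overbrace{\beta_0 + \mathrm{Quant}_\varepsilon(\tau)}^{F^{-1}(\tau)\ (x=0)} + \beta_1 x}_{G^{-1}(\tau)\ (x=1)} = x_i'\beta(\tau)$

$x_i' \equiv [1 \;\; x_i]$, $\beta(\tau) \equiv [\beta_0 + \mathrm{Quant}_\varepsilon(\tau) \;\; \beta_1]'$

The regression model is nested in a QR model $\mathrm{Quant}_Y(\tau) = x_i'\beta(\tau)$ that restricts the effect of X to be constant at all quantiles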
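The step from the c.d.f. to the quantile function can be spelled out as follows. This is a short derivation under the additional assumption that $F_\varepsilon$ is continuous and strictly increasing, so that its inverse exists; $q$ denotes the $\tau$-th conditional quantile of Y given x.

```latex
\begin{align*}
\tau &= F_Y(q \mid x) = F_\varepsilon(q - \beta_0 - \beta_1 x) \\
F_\varepsilon^{-1}(\tau) &= q - \beta_0 - \beta_1 x \\
q &= \beta_0 + \beta_1 x + F_\varepsilon^{-1}(\tau)
\end{align*}
```

Only the intercept $\beta_0 + F_\varepsilon^{-1}(\tau)$ depends on $\tau$; the slope $\beta_1$ is the same at every quantile, which is the restriction the slide refers to.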



Presenting QR results

Plot graphs of the coefficient estimates with confidence bounds:

y -axis: β(τ); x-axis: quantile

Show the corresponding OLS coefficient estimate on the graph

Interpret the meaning (! need some caution)

Koenker & Hallock adapted from Tukey

“Never estimate intercepts, always estimate centercepts!”

Interpret the pattern (take location, location/scale models as reference)

Plot the estimated conditional quantile functions at $x = \bar{x}$ to check for crossings



Presenting QR results: intercept & income coefficients . . .

[Figure, two panels: estimated intercept (left) and income coefficient (right), plotted against the quantile (.25, .5, .75)]



Presenting QR results: centercept & income coefficients

[Figure, two panels: estimated intercept (left) and res_income coefficient (right), plotted against the quantile (.1, .25, .5, .75, .9)]



Some useful benchmark examples

to learn how to

interpret the pattern of

quantile regression coefficients




[Figure, two panels: c.d.f. (left) and quantiles $Q_y(\tau)$ (right) of y, for control and treatment]




[Figure, two panels: quantiles $Q_y(\tau)$ of y for control and treatment (left) and QTE, the coefficient on x plotted against the quantile (.1, .25, .5, .75, .9) (right)]

The regressor X only affects the location of the distribution of Y

Focusing on the impact of X on the conditional average is enough

The impact of X is the same across quantiles



Case A: Income Affects only the Location of the Food Expenditure Distribution

[Figure: Location Effect Example. Scatter plot of y against x (0 to 10) with the fitted conditional quantile 0.10]

$y_i = \beta_0 + \beta_1 x_i + u$




[Figure: scatter plot of y against x (0 to 10) with the fitted conditional quantiles 0.10 and 0.50]

$y_i = \beta_0 + \beta_1 x_i + u$

$\mathrm{Quant}_Y(\tau|X) = \beta_\tau + \beta_1 X$




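[Figure: scatter plot of y against x (0 to 10) with the fitted conditional quantiles 0.10, 0.50 and 0.90]

$y_i = \beta_0 + \beta_1 x_i + u$

$\mathrm{Quant}_Y(\tau|X) = \beta_\tau + \beta_1 X = \beta_0 + F_u^{-1}(\tau) + \beta_1 X$

$(\beta_\tau, \beta_1): \sum_{y_i \ge \beta_\tau + \beta_1 x_i} \tau |y_i - \beta_\tau - \beta_1 x_i| + \sum_{y_i < \beta_\tau + \beta_1 x_i} (1 - \tau) |y_i - \beta_\tau - \beta_1 x_i| = \min$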



Treatment Effect is a Location & Scale Shift 1/3

[Figure, two panels (Location and Scale Model): conditional p.d.f. of y given x (percent) and c.d.f., for treatment and control, over the range -10 to 20]





[Figure, two panels: c.d.f. (left) and quantiles (right), over the range -10 to 20]





[Figure, two panels: quantiles (left) and QTE, the coefficient on x plotted against the quantile (.1, .25, .5, .75, .9) (right)]

The regressor X only affects the location & scale of the distribution of Y

Focusing on the impact of X on the conditional average is not enough

The impact of X differs across quantiles
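The two benchmark patterns seen so far, a pure location shift and a location-scale shift, can be reproduced with a toy two-sample example. The sketch below uses a made-up control sample and two hypothetical treatments (adding 3, and the transformation 1 + 2y); the numbers are assumptions chosen so that the arithmetic is exact.

```python
def quantile(sample, tau):
    """Smallest observed y whose empirical c.d.f. is at least tau."""
    ordered = sorted(sample)
    n = len(ordered)
    for i, y in enumerate(ordered):
        if (i + 1) / n >= tau:
            return y
    return ordered[-1]


control = [-4, -3, -2, -1, 0, 1, 2, 3, 4, 5]        # made-up control outcomes
location = [y + 3 for y in control]                 # treatment shifts location only
location_scale = [1 + 2 * y for y in control]       # treatment shifts location and scale

taus = [0.1, 0.25, 0.5, 0.75, 0.9]
qte_location = [quantile(location, t) - quantile(control, t) for t in taus]
qte_location_scale = [quantile(location_scale, t) - quantile(control, t) for t in taus]

print(qte_location)        # prints: [3, 3, 3, 3, 3]
print(qte_location_scale)  # prints: [-3, -1, 1, 4, 5]
```

Under the location shift the QTE is flat in the quantile, so the mean effect is a complete summary. Under the location-scale shift the QTE rises with the quantile and even changes sign, which no single average can convey.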



Case B: Income Affects Location & Scale of the Food Exp. Distribution

[Figure: Location-Scale Effect Example. Scatter plot of y (0 to 800) against x (0 to 10) with the fitted conditional quantiles 0.10 and 0.50]

$y_i = \beta_0 + \beta_1 x_i + \beta_2(x_i) u$




[Figure: scatter plot of y (0 to 800) against x (0 to 10) with the fitted conditional quantiles 0.10, 0.50 and 0.90]

$y_i = \beta_0 + \beta_1 x_i + \beta_2(x_i) u$

$\mathrm{Quant}_Y(\tau|X) = \beta_0 + \beta_1 x + \beta_2(x) F_u^{-1}(\tau)$, with $\beta_2(x) > 0$

intuition: the rescaled r.v. $u = \frac{Y - \beta_0 - \beta_1 x}{\beta_2(x)}$ is distributed independently of X




[Figure: scatter plot of y (0 to 800) against x (0 to 10) with the fitted conditional quantiles 0.10, 0.50, 0.70 and 0.90]

$y_i = \beta_0 + \beta_1 x_i + \beta_2(x_i) u$

$\mathrm{Quant}_Y(\tau|X) = \beta_{0,\tau} + \beta_{1,\tau} x = x'\beta(\tau)$

$\beta(\tau): \sum_{y_i - x_i'\beta(\tau) \ge 0} \tau |y_i - x_i'\beta(\tau)| + \sum_{y_i - x_i'\beta(\tau) < 0} (1 - \tau) |y_i - x_i'\beta(\tau)| = \min$



Scatter plot, E[Y|X] & QR

[Figure: scatter plot of food expenditure against income (500 to 3000), with predicted food expenditure lines for the mean, the median and the 25th percentile]

$\mathrm{Quant}_Y(0.5|X) = \beta_{0,0.5} + \beta_{1,0.5} X$

$(\beta_{0,0.5}, \beta_{1,0.5}): \sum_{i=1}^n |y_i - \beta_{0,0.5} - \beta_{1,0.5} x_i| = \min$



(No) Crossings

$\mathrm{Quant}_Y(\tau) = \underbrace{\overbrace{\beta_0(\tau)}^{F^{-1}(\tau)\ (x=0)} + \beta_1(\tau) x}_{G^{-1}(\tau)\ (x=1)} = x_i'\beta(\tau)$

$x_i' \equiv [1 \;\; x_i]$, $\beta(\tau) \equiv [\beta_0(\tau) \;\; \beta_1(\tau)]'$

!! $Y | X = x$ can be simulated by setting $y = x'\beta(U)$, $U \sim U(0, 1)$

Crucial: all the coordinates in $\beta(U)$ are determined by a single draw from $U(0, 1)$

Percentiles are ordered. Implicit in the formulation $\mathrm{Quant}_Y(\tau) = x_i'\beta(\tau)$ is the requirement that $\mathrm{Quant}_Y(\tau)$ is monotone increasing in $\tau$, $\forall x$.

Crossings may be observed for extreme values of x.

At $x = \bar{x}$, quantiles should not cross. If conditional quantile functions cross at a significant number of points, this can be interpreted as evidence of model misspecification.
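The simulation device and the crossing problem can both be illustrated with a hypothetical linear quantile model. The sketch below assumes the coefficient functions implied by y = 1 + 2x + (1 + 0.5x)u with standard normal u; these functions and the evaluation points are assumptions made for illustration only.

```python
import random
from statistics import NormalDist


def beta(tau):
    """Hypothetical coefficient functions (intercept, slope) at quantile tau."""
    z = NormalDist().inv_cdf(tau)
    return 1.0 + z, 2.0 + 0.5 * z


def cond_quantile(x, tau):
    """Linear conditional quantile function x' beta(tau)."""
    b0, b1 = beta(tau)
    return b0 + b1 * x


def crosses(x, taus):
    """True if the quantile function fails to increase in tau at this x."""
    values = [cond_quantile(x, t) for t in taus]
    return any(later < earlier for earlier, later in zip(values, values[1:]))


def simulate(x, n, seed=0):
    """Draw Y | X = x by plugging one uniform draw into every coordinate of beta."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep u strictly inside (0, 1)
        draws.append(cond_quantile(x, u))
    return draws


taus = [0.1, 0.25, 0.5, 0.75, 0.9]
print(crosses(1.0, taus))    # False: quantiles are ordered at x = 1
print(crosses(-4.0, taus))   # True: the linear model breaks down at extreme x

draws = simulate(1.0, 5001)
sim_median = sorted(draws)[len(draws) // 2]
print(round(sim_median, 2), cond_quantile(1.0, 0.5))
```

In this assumed model the implied scale 1 + 0.5x turns negative for x below -2, which is why the linear quantile functions cross there even though they are well behaved for moderate x.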



Treatment Effect is a Shape Change 1/3

[Figure, two panels (Change in Shape): conditional p.d.f. of y given x (percent) and c.d.f., for treatment and control, over the range -20 to 20]





[Figure, two panels: c.d.f. (left) and quantiles (right), over the range -20 to 20]





[Figure, two panels: quantiles (left) and QTE, the coefficient on x plotted against the quantile (.1, .25, .5, .75, .9) (right)]

The regressor X only affects the shape of the distribution of Y

Focusing on the impact of X on the conditional average is not enough

The impact of X differs across quantiles

Narrower spacing of the lower quantiles indicates higher density and short lower tail

Wider spacing of the upper quantiles indicates lower density and long upper tail



Summary: Interpretation of the Pattern of QR coefficients

[Figure, three panels: QTE (coefficient on x) plotted against the quantile (.1, .25, .5, .75, .9) for a location shift, a location-scale shift and a shape change; see also fig. 2.9 in Koenker (2005)]



Intuition on the Estimation of QR coefficients

To estimate the parameters for the $\tau$-th quantile, we seek the values of the parameters such that (roughly) a fraction $\tau$ of the residuals is negative and a fraction $(1 - \tau)$ is positive; e.g. for the median, we need to have as many positive as negative residuals

Robustness: an outlier observation matters for estimation only if it changes the sign of the residual

[Figure, two panels: symmetric loss function for the median (left) and asymmetric loss function for the 25th percentile (right), each plotted against the residual]

! Linear programming
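The link between the asymmetric loss and the sign pattern of the residuals can be checked on simulated data. The sketch below is a brute-force illustration, not the linear programming algorithm used by statistical software: it relies on the fact that, with one regressor and an intercept, a minimizer of the check loss can be found among the lines through two data points. The data-generating process is made up.

```python
import random
from itertools import combinations


def check_loss(residual, tau):
    """Asymmetric absolute loss: tau * r if r >= 0, (tau - 1) * r otherwise."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual


def fit_line_qr(x, y, tau):
    """Search all lines through two data points for the smallest total check loss."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # vertical line, not a regression fit
        b1 = (y[j] - y[i]) / (x[j] - x[i])
        b0 = y[i] - b1 * x[i]
        loss = sum(check_loss(yk - b0 - b1 * xk, tau) for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, b0, b1)
    return best[1], best[2]


random.seed(1)
x = [random.uniform(0, 10) for _ in range(40)]
y = [1.0 + 0.5 * xi + random.gauss(0, 1 + 0.2 * xi) for xi in x]  # made-up data

tau = 0.25
b0, b1 = fit_line_qr(x, y, tau)
residuals = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
n_neg = sum(1 for r in residuals if r < -1e-9)
n_zero = sum(1 for r in residuals if abs(r) <= 1e-9)
n_pos = sum(1 for r in residuals if r > 1e-9)
print(n_neg, n_zero, n_pos)
```

With an intercept in the model, the number of negative residuals is at most n·τ, and adding the zero residuals (the interpolated points) brings the count to at least n·τ. This is the precise version of the statement that a share τ of the observations lies below the fitted τ-th quantile line.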



Properties of QR estimator: Equivariance

Equivariance guarantees a coherent interpretation of the results when the data or the underlying model are modified in a non-essential way.

Scale equivariance

For any $a > 0$, $\hat\beta(\tau; ay, X) = a\hat\beta(\tau; y, X)$ and $\hat\beta(\tau; -ay, X) = -a\hat\beta(1 - \tau; y, X)$

Regression Shift

For any $\gamma \in \mathbb{R}^p$, $\hat\beta(\tau; y + X\gamma, X) = \hat\beta(\tau; y, X) + \gamma$

Reparametrization of Design

For any $|A| \ne 0$, $\hat\beta(\tau; y, XA) = A^{-1}\hat\beta(\tau; y, X)$

Analogous equivariance properties hold in mean regression



Equivariance to Monotone Transformations

For any monotone (nondecreasing) function $h(\cdot)$, conditional quantile functions are equivariant

$\mathrm{Quant}_{h(Y)}(\tau|x) = h(\mathrm{Quant}_Y(\tau|x))$

i.e. the quantiles of the transformed variable are simply the transformed quantiles of the original variable

The analogous property does not hold in mean regression
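A small numerical check makes the contrast with the mean concrete. The sketch below uses made-up wage figures and the logarithm as the monotone transformation; both choices are illustrative assumptions.

```python
import math


def quantile(sample, tau):
    """Smallest observed y whose empirical c.d.f. is at least tau."""
    ordered = sorted(sample)
    n = len(ordered)
    for i, y in enumerate(ordered):
        if (i + 1) / n >= tau:
            return y
    return ordered[-1]


wages = [5.0, 7.5, 8.0, 9.5, 12.0, 15.0, 21.0, 30.0, 48.0, 90.0]  # made-up data
log_wages = [math.log(w) for w in wages]

tau = 0.5
quantile_of_log = quantile(log_wages, tau)        # Quant of h(Y)
log_of_quantile = math.log(quantile(wages, tau))  # h(Quant of Y)

mean_of_log = sum(log_wages) / len(log_wages)     # E[h(Y)]
log_of_mean = math.log(sum(wages) / len(wages))   # h(E[Y])

print(quantile_of_log == log_of_quantile)  # True
print(mean_of_log < log_of_mean)           # True: the mean is not equivariant
```

This is why a quantile regression of log wages can be translated back into statements about wage quantiles by exponentiating, while the same step is not valid for a mean regression.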



Every Serious Estimate Deserves a Standard Error

1 Finite sample distribution of the estimator: limited applicability in practice

2 Asymptotic distribution of the estimator: $\sqrt{n}(\hat\beta(\tau) - \beta(\tau)) \sim N(0, \tau(1 - \tau) H_n^{-1} J_n H_n^{-1})$

$J_n(\tau) = n^{-1} \sum_{i=1}^n x_i x_i' = n^{-1} X'X$, $H_n(\tau) = \lim_{n \to \infty} n^{-1} \sum_{i=1}^n x_i x_i' f_i(\xi_i(\tau))$

$f_i(\xi_i(\tau))$ is the conditional density of the response evaluated at the $\tau$-th conditional quantile

remark 1: the factor $\tau(1 - \tau)$ tends to make quantiles more precise in the tails;

remark 2: the factor $H_n(\tau)$ tends to make quantiles less precise in regions of low density; this effect typically dominates;

remark 3: nonparametric density (the sparsity parameter or quantile-density function) estimation required

3 (Some form of) Resampling (bootstrap)
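As an illustration of the resampling route, the sketch below computes a bootstrap standard error for the QTE in the two-sample model. It assumes one particular scheme (resampling with replacement within each group) and simulated data; other bootstrap variants exist and may be preferable in applications.

```python
import random


def quantile(sample, tau):
    """Smallest observed y whose empirical c.d.f. is at least tau."""
    ordered = sorted(sample)
    n = len(ordered)
    for i, y in enumerate(ordered):
        if (i + 1) / n >= tau:
            return y
    return ordered[-1]


def bootstrap_se_qte(control, treated, tau, reps=500, seed=0):
    """Standard deviation of the QTE estimate across bootstrap resamples."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        c_star = rng.choices(control, k=len(control))  # resample controls
        t_star = rng.choices(treated, k=len(treated))  # resample treated
        draws.append(quantile(t_star, tau) - quantile(c_star, tau))
    mean = sum(draws) / reps
    variance = sum((d - mean) ** 2 for d in draws) / (reps - 1)
    return variance ** 0.5


data_rng = random.Random(42)
control = [data_rng.gauss(0.0, 1.0) for _ in range(200)]  # simulated control group
treated = [data_rng.gauss(1.0, 2.0) for _ in range(200)]  # simulated treated group

estimate = quantile(treated, 0.5) - quantile(control, 0.5)
se_median = bootstrap_se_qte(control, treated, 0.5)
print(round(estimate, 3), round(se_median, 3))
```

The appeal of this route is that it avoids estimating the conditional density that enters the asymptotic variance.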



Few Words on Testing

The results on the distribution of a single vector of QR parameters can be extended to derive the asymptotic covariance matrix for distinct quantiles, thus allowing one to contrast estimates of the slope coefficients across quantiles

Tests of the location-shift model or of symmetry can be described as tests of linear restrictions between the coefficients of a regressor at different quantiles → Wald tests can be constructed

Other approaches: quantile likelihood ratio tests, rank-based inference, . . .



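Discrete or Continuous Treatment

Binary case: $y_i = y_{iC} \cdot (1 - x_i) + y_{iT} \cdot x_i = y_{iC} + (y_{iT} - y_{iC}) \cdot x_i$

$Q_Y(\tau) \equiv F_{y_C}^{-1}(\tau) \cdot (1 - x) + F_{y_T}^{-1}(\tau) \cdot x = F_{y_C}^{-1}(\tau) + (F_{y_T}^{-1}(\tau) - F_{y_C}^{-1}(\tau)) \cdot x = \alpha(\tau) + \beta(\tau) \cdot x$

Discrete case, p treatments: $y_i = y_{iC} + \sum_{j=1}^p (y_{ij} - y_{iC}) \cdot x_{ij} = y_{iC} + \sum_{j=1}^p \delta_{ij} \cdot x_{ij}$

$Q_Y(\tau) \equiv F_{y_C}^{-1}(\tau) + \sum_j (F_{y_j}^{-1}(\tau) - F_{y_C}^{-1}(\tau)) \cdot x_j = \alpha(\tau) + \sum_j \delta_j(\tau) \cdot x_j$

Continuous case: $Q_Y(\tau) = \alpha(\tau) + \gamma(\tau) x$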



Stata Example: Food Expenditure (Y) and Income (X)



QTE of Jobs First on Earnings

from Bitler et al. (AER, 2006)

[Figure: quarterly impact (vertical axis, -600 to 1,000) plotted against the quantile (10 to 90)]

FIGURE 3. QUANTILE TREATMENT EFFECTS ON THE DISTRIBUTION OF EARNINGS, QUARTERS 1-7

Notes: Solid line is QTE; dotted lines provide bootstrapped 90-percent confidence intervals; dashed line is mean impact; all statistics computed using inverse propensity-score weighting. See text for more details.

group over the first seven quarters and 55 percent of corresponding AFDC group person-quarters. For quantiles 49-82, Jobs First group earnings are greater than control group earnings, yielding positive QTE estimates. Between quantiles 83 and 87, earnings are again equal (though non-zero). Finally, for quantiles 88-97, AFDC group earnings exceed Jobs First group earnings, yielding negative QTE estimates. The only quantile having a statistically significant QTE based on a two-sided test is the ninety-second; for all other quantiles between 89 and 96, the two-sided QTE confidence intervals include zero in the confidence interval. On the other hand, one-sided tests yield p-values of 0.10 or lower for all QTE in the 90-95 quantile range.16 These results are what basic labor supply theory, discussed above, predicts. That is, the QTE at the low end are zero, they rise, and then they eventually become negative (if imprecisely estimated). The negative effects at the top of the earnings distribution are particularly interesting given that they have typically not been found in other programs (e.g.,

Nada Eissa and Jeffrey B. Liebman's, 1996, study of the EITC).

The variation in Jobs First's impact across the quantiles of the distributions appears unmistakably significant, both statistically and substantively; these results suggest that the mean treatment effect is far from sufficient to characterize Jobs First's effects on earnings.17

Figure 4 plots the earnings QTE results in quarters 8-16, after the time limit takes effect for at least some women. For the first 76 quantiles, these results are broadly similar to those for the pre-time limit period (though they have a somewhat wider range and become positive slightly earlier). For quantiles 77-97, we again find negative treatment effects (with a few being zero), but none of them is individually signifi-

16 To test whether these QTE estimates are jointly significantly negative, we carry out two sets of tests. Details are somewhat complicated, so we relegate them to Appendix B. Our basic conclusion, however, is that there is some marginal evidence that these QTE are jointly different from zero.

17 Under the null of constant treatment effects, all QTE must equal the mean treatment effect. This null can be rejected decisively simply by noting the large fraction of the treatment group earnings distribution having zero earnings (Heckman et al., 1997, make a similar point regarding treatment effects of job training). We did conduct more formal tests for the null that the $800(= $500 - (-$300)) range of the estimated QTE could have been generated under the null that all quantiles of the Jobs First distribution equal the mean treatment effect plus the corresponding quantiles of the AFDC distribution. These tests, which impose the null by using paired bootstrap sample draws from the AFDC group sample and then adding the mean treatment effect to each sample quantile in one of the pairs, soundly reject the equality of the QTEs.




References (those that fit on the slide . . .)

Abrevaya (2001) ‘The Effects of Demographics and Maternal Behavior on the Distribution of Birth Outcomes’, Empirical Economics, pp. 247-257

Brunello, Fort, Weber (2009) ‘Changes in Compulsory Schooling, Education and the Distribution of Wages in Europe’, Economic Journal 119, pp. 516-539

Bitler, Gelbach, Hoynes (2006) ‘What Mean Impacts Miss: Distributional Effects of Welfare Reform Experiments’, The American Economic Review 96(4), pp. 988-1012

Buchinsky (1998) ‘Recent Advances in Quantile Regression Models: A Practical Guideline for Empirical Research’, The Journal of Human Resources 33(1), pp. 88-126

Koenker & Hallock (2001) ‘Quantile Regression’, Journal of Economic Perspectives 15(4), pp. 143-156

Maynard, Qiu (2009) ‘Public Insurance and Private Savings: Who Is Affected and By How Much?’, Journal of Applied Econometrics 24, pp. 282-308

Martins et al. (2004) ‘Does Education Reduce Wage Inequality? Quantile Regression Evidence from 16 Countries’, Labour Economics 11, pp. 355-371

Melly, Puhani (forth.) ‘Do Public Ownership and Lack of Competition Matter for Wages and Employment? Evidence from Personnel Records of a Privatized Firm’, Journal of the European Economic Association

Koenker (2005) Quantile Regression, Chapters 1, 2 and 6 (covered partially)

Ozkan, Ozbeklik (2014) ‘Who Benefits From Job Corps? A Distributional Analysis of An Active Labour Market Program’, Journal of Applied Econometrics 24, pp. 282-308



Wrap Up on the Fundamentals of Quantile Regression (QR) before Moving to the Discussion on Identification Strategies in QR that Exploit Instrumental Variation



Today’s Running Example (Keep It in Mind To Avoid Getting Lost)

“The school is a promising place to increase the skills and incomes of individuals. As a result,

educational policies have the potential to decrease existing, and growing, inequalities in income”

Ashenfelter et al. (2001)

Does education reduce wage inequality? I.e., are the returns to education homogeneous over the wage distribution?

Policy relevance:

schooling can be a powerful tool to combat inequality

schooling can reduce differences due to genetic & environmental factors

Similar questions may be asked for training programs

Related papers:

Martins et al. (LE, 2004): assumes exogeneity

Chernozhukov et al. (2006): IVQTE model; addresses endogeneity issues √

Abadie et al. (2002) (training): LATE-QTE model; endogeneity issues √

Brunello et al. (EJ, 2009): causal chain model; endogeneity issues √



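Discrete or Continuous Treatment

x treatment (education), y outcome (wages)

Binary case: $y_i = y_{iC} \cdot (1 - x_i) + y_{iT} \cdot x_i = y_{iC} + (y_{iT} - y_{iC}) \cdot x_i$

$Q_Y(\tau) \equiv F^{-1}_{y_C}(\tau) \cdot (1 - x) + F^{-1}_{y_T}(\tau) \cdot x$

$= F^{-1}_{y_C}(\tau) + (F^{-1}_{y_T}(\tau) - F^{-1}_{y_C}(\tau)) \cdot x$

$= \alpha(\tau) + \beta(\tau) \cdot x$

Discrete case: p treatments $y_i = y_{iC} + \sum_{j=1}^{p} (y_{ij} - y_{iC}) \cdot x_{ij} = y_{iC} + \sum_{j=1}^{p} \delta_{ij} \cdot x_{ij}$

$Q_Y(\tau) \equiv F^{-1}_{y_C}(\tau) + \sum_j (F^{-1}_{y_j}(\tau) - F^{-1}_{y_C}(\tau)) \cdot x_j$

$= \alpha(\tau) + \sum_j \delta_j(\tau) \cdot x_j$

Continuous case: $Q_Y(\tau) = \alpha(\tau) + \gamma(\tau) x$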
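The binary case can be checked numerically. The sketch below is only an illustration under assumed distributions (controls N(0, 1), treated N(1, 2^2)); the helper names `quantile`, `check_loss` and `qr_objective` are hypothetical and not part of any package. It computes α(τ) as the control-group quantile and β(τ) as the difference between treated and control quantiles, then confirms that this pair is not beaten by nearby values in the quantile-regression objective built from the check function.

```python
import math
import random


def quantile(values, tau):
    """Empirical tau-quantile: the ceil(n * tau)-th order statistic."""
    ordered = sorted(values)
    return ordered[max(0, math.ceil(tau * len(ordered)) - 1)]


def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def qr_objective(y, x, alpha, beta, tau):
    """QR objective for the model Q_Y(tau | x) = alpha + beta * x."""
    return sum(check_loss(yi - alpha - beta * xi, tau) for yi, xi in zip(y, x))


rng = random.Random(0)
n = 2000
x = [rng.randint(0, 1) for _ in range(n)]  # binary treatment indicator
# Illustrative assumption: controls ~ N(0, 1), treated ~ N(1, 2^2).
y = [rng.gauss(1.0, 2.0) if xi == 1 else rng.gauss(0.0, 1.0) for xi in x]

tau = 0.75
alpha = quantile([yi for yi, xi in zip(y, x) if xi == 0], tau)         # control quantile
beta = quantile([yi for yi, xi in zip(y, x) if xi == 1], tau) - alpha  # quantile difference

best = qr_objective(y, x, alpha, beta, tau)
perturbed = [
    qr_objective(y, x, alpha + da, beta + db, tau)
    for da in (-0.1, 0.0, 0.1)
    for db in (-0.1, 0.0, 0.1)
]
print(f"alpha({tau}) = {alpha:.3f}, beta({tau}) = {beta:.3f}")
print("group quantiles minimise the QR objective:",
      all(best <= p + 1e-9 for p in perturbed))
```

With a single binary regressor the model is saturated, so the objective splits into one check-loss problem per group; this is why the group quantiles solve it.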



Summary: Interpretation of the Pattern of QR coefficients

[Figure: three panels, each plotting the coefficient of x (vertical axis) against the quantile (.1, .25, .5, .75, .9). Panel titles: Location Shift; Location-Scale Shift; Shape Change. See also fig. 2.9 in Koenker (2005)]
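The three patterns can be reproduced with simulated data. The following sketch is purely illustrative: the three treated-arm distributions are assumptions chosen for the example (a constant shift, a shift plus rescaling, and a skewing transformation), and `quantile` is a hypothetical helper. It prints the QTE, i.e. the difference between treated and control quantiles, at the five quantiles shown on the slide.

```python
import math
import random


def quantile(values, tau):
    """Empirical tau-quantile: the ceil(n * tau)-th order statistic."""
    ordered = sorted(values)
    return ordered[max(0, math.ceil(tau * len(ordered)) - 1)]


rng = random.Random(1)
n = 20000
taus = (0.1, 0.25, 0.5, 0.75, 0.9)
control = [rng.gauss(0.0, 1.0) for _ in range(n)]
base = [rng.gauss(0.0, 1.0) for _ in range(n)]  # independent draws for the treated arm

# Illustrative assumptions about how treatment changes the outcome distribution.
treated = {
    "location shift": [0.5 + b for b in base],              # same shape, shifted
    "location-scale shift": [0.5 + 1.5 * b for b in base],  # shifted and more dispersed
    "shape change": [math.exp(b) - 1.0 for b in base],      # right-skewed
}

qte = {
    name: [quantile(sample, t) - quantile(control, t) for t in taus]
    for name, sample in treated.items()
}
for name, effects in qte.items():
    print(f"{name:22s}", " ".join(f"{e:6.2f}" for e in effects))
```

Under these assumptions the first row is roughly flat, the second rises steadily with the quantile, and the third is non-monotone.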



Interpretation of QTE

x treatment (education) y outcome (wages)

QY (τ) = α(τ) + δ(τ)x

We may interpret τ as a latent characteristic

E.g.: τ is an unobserved factor that determines wages (“ability”)

The QTE can then be interpreted as “an interaction effect” between the treatment and unobserved “ability”/“propensity to earn a high wage”

Without further assumptions, “there is no way of knowing whether the treatment actually operates in the manner described by δ(τ). In fact, the treatment may miraculously make weak subjects especially robust and turn the strong into a jello. All we can observe from experimental evidence however is the difference in the two marginal(s) (. . .). This is what the quantile treatment effect does” (Koenker, 2005)



With Randomization, We Identify Marginal cdf (& QTE)

Randomization Does Not Help to Learn the Distribution of the Impact

We Cannot Retrieve Joint Distributions From Marginals w/o Additional Assumptions
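A small constructed example makes the point concrete. In the sketch below (an illustration with made-up data, not taken from any of the cited papers) two hypothetical joint distributions of potential outcomes share exactly the same marginals, hence the same QTE at every quantile, yet imply very different distributions of the individual impact y1 - y0.

```python
import math
import random
import statistics


def quantile(values, tau):
    """Empirical tau-quantile: the ceil(n * tau)-th order statistic."""
    ordered = sorted(values)
    return ordered[max(0, math.ceil(tau * len(ordered)) - 1)]


rng = random.Random(2)
n = 1001
y0 = sorted(rng.gauss(0.0, 1.0) for _ in range(n))  # control potential outcomes
y1_values = [v + 1.0 for v in y0]                   # treated marginal: control shifted by 1

# Two couplings of the SAME marginals.
joint_a = list(zip(y0, y1_values))            # ranks preserved: everyone gains 1
joint_b = list(zip(y0, reversed(y1_values)))  # ranks reversed: low y0 pairs with high y1

taus = (0.1, 0.5, 0.9)
for label, joint in (("rank-preserving", joint_a), ("rank-reversing", joint_b)):
    c = [a for a, _ in joint]
    t = [b for _, b in joint]
    effects = [quantile(t, tau) - quantile(c, tau) for tau in taus]
    impacts = [b - a for a, b in joint]
    print(label, "QTE:", [round(e, 3) for e in effects],
          "sd of impact:", round(statistics.pstdev(impacts), 3))

impacts_a = [b - a for a, b in joint_a]
impacts_b = [b - a for a, b in joint_b]
```

Experimental data reveal only the two marginals, so they cannot distinguish between the two couplings.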



Martins and Pereira (2004)

Table 1. Data-sets description, descriptive statistics and inequality measures

Country      Data set                                Year  No. of   Educ.  Exp.   Log wage         Wage percentiles          Wage ratios (1)
                                                           obs.                   Mean    C.V.     10th     50th     90th    9/1   9/5   5/1
Austria      Mikrozensus                             1993    7175   10.1   21.3    4.57   0.077    65.8     93.8    150      2.28  1.6   1.43
Denmark      Long. Lab. Market Reg.                  1995    4416   12     19.4    4.97   0.072    96.5    138.4    230.4    2.39  1.67  1.43
Finland      Labour Force Survey                     1993    1175   11.4   19.5    4.16   0.091    41.9     62.1    106.1    2.53  1.71  1.48
France       Training Qualif. + Employment Survey    1993    4606   11.4   21.9   10.92   0.036    19.8     29.8     54.1    2.73  1.81  1.5
Germany      Socio-Economic Panel                    1995    1070   11.9   24.7    3.4    0.103     2.64     2.92     3.01   1.45  1.09  1.33
Greece       Household Budget Survey                 1994    2096   10.1   21.9    6.93   0.092   527     1103     1907      3.62  1.73  2.09
Ireland      ESRI Household Survey                   1994    1903   12.4   23.8    1.74   0.351     2.5      5.9     11.9    4.74  2.01  2.36
Italy        Survey of Household Income and Wealth   1995    3441   10.1   22.9    2.52   0.163     7.8     12.5     20.8    2.67  1.67  1.6
Netherlands  Structure of Earnings Survey            1996   49805   12.5   20      3.23   0.142    15.5     24.9     43.8    2.83  1.75  1.61
Norway       Level of Living Survey                  1995     870   12.2   20.9    4.65   0.071    71.4    101.1    158      2.21  1.56  1.42
Portugal     Personnel Records                       1995   28055    6.5   24.5    6.42   0.095   318      531     1456      4.58  2.74  1.67
Spain        Wage Structure Survey                   1995  118005    8.8   26      7.3    0.071   761     1410     2999      3.94  2.13  1.85
Sweden       Level of Living Surveys                 1991    1508   11.8   21.5    4.45   0.070    61       81      127      2.08  1.57  1.33
Switzerland  Labour Force Survey                     1995    6334   13.2   19.8    3.6    0.111    23.9     35.9     60.3    2.53  1.68  1.51
UK           Family Expenditures Survey              1995    2183   12.3   22.6    2      0.245     4.1      7.3     13.5    3.33  1.85  1.8
USA          Current Population Survey               1995   42347   12.6   18.5    2.33   0.202     5.5     10       19      3.45  1.82  1.9

See Appendix A for a more detailed characterisation of the data sets.

Results for France and Spain refer to yearly earnings. Hourly wages for France and Spain were computed assuming 1760 h/year. (1) Inequality figures (1, 5, 9) refer to the 10th, 50th and 90th percentiles.

Source: P.S. Martins, P.T. Pereira / Labour Economics 11 (2004) 355–371, p. 358

Data on (gross) hourly wages of full-time male workers; net wages for Austria, Greece, Italy



Martins and Pereira (2004): Evidence

Fig. 2. Returns to education, QR and OLS.

P.S. Martins, P.T. Pereira / Labour Economics 11 (2004) 355–371362



Martins and Pereira (2004): Evidence

distribution (ninth–fifth deciles), the exceptions being Germany, Greece, Ireland and

the US.6

4. Empirical results

The empirical results were obtained by regressing the following version of the Mincer

(1974) equation, under Becker’s (1975) framework:

$\log y_i = \alpha_\theta + \beta_\theta \cdot educ_i + \delta_{\theta 1} \cdot exp_i + \delta_{\theta 2} \cdot exp_i^2 + u_i,$

where i = 1, . . ., N (N being the number of observations for each year), θ = 0.1, 0.2, . . ., 0.9 is the quantile being analysed, y is the hourly wage, educ is the number of schooling

Fig. 2. (continued).

6 These results are generally in accordance with those presented at Gottschalk and Smeeding (1997). However,

a thorough comparison is impossible as both the time period and the earnings measure covered there are different.

P.S. Martins, P.T. Pereira / Labour Economics 11 (2004) 355–371 363



Martins and Pereira (2004): Discussion

Evidence of increasing returns over the conditional wage distribution

Interpretation: more skilled workers receive higher returns to education; additional schooling may increase within-group wage inequality (!)

Why?

Over-education: extensions of the lower tail of the wage distribution of the highly educated

Schooling & ability non-trivial interaction: differences in ability relevant for the highly educated but less so for the low educated (less dispersion)

Endogeneity: unobserved factors that impact upon pay differentials and are heterogeneous across workers with any given skills level



Chernozhukov and Hansen (2006): Evidence

provided by formal education.[23] Interpreting the quantile index τ as indexing ability, these results are also consistent with a simple model in which individuals acquire education up to the point where the cost equals the rate of return and cost depends negatively on ability.[24] In this case, we would expect the returns to schooling to be

[Figure: two panels, “IV-QR: Schooling” (left) and “QR: Schooling” (right); coefficient estimates on the vertical axis, quantile index (0.2 to 0.8) on the horizontal axis]

Fig. 1. The sample size is 329,509. Coefficient estimates are on the vertical axis, while the quantile index is on the horizontal axis. The shaded region is the 95% confidence band estimated using robust standard errors. The left panel contains estimates of the returns to schooling obtained through instrumental variables quantile regression, and the right panel presents estimates of the effect of years of schooling on earnings obtained through standard quantile regression. For comparison, the dashed line in the first panel plots the schooling coefficient estimated through standard quantile regression. All estimates were computed at 0.05 unit intervals for τ ∈ [0.05, 0.95].

Table 1

Process tests for the earning equation. Subsample size = 5n^{2/5}

Null hypothesis                   Kolmogorov–Smirnov statistic   90% Critical value   95% Critical value
No effect: α(·) = 0               4.563                          2.572                2.935
Constant effect: α(·) = α         2.630                          2.442                2.658
Dominance: α(·) ≥ 0               0.000                          2.185                2.549
Exogeneity: α(·) = α_QR(·)        2.510                          2.465                2.721

23 The term “ability” is used to characterize the unobserved component of earnings, which likely captures elements of ability and motivation as well as noise.

24 See, for example, Card (1999).

Source: V. Chernozhukov, C. Hansen / Journal of Econometrics 132 (2006) 491–525, p. 512



Education affects wages

Is it a location-shift effect? No (QTE heterogeneity)

Is the effect unambiguously beneficial? No

Can we reject exogeneity? Yes



How do C & H get at the (causal) QTE for US ?

They use an instrumental variable

How? We get there now



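E.c.d.f. F(y) and quantile function q(τ) ≡ F^{-1}(τ)

[Figure: two panels. Left: e.c.d.f. (vertical axis, 0 to 1) against x (horizontal axis, -4 to 4). Right: the quantile function, x (vertical axis, -4 to 4) against the e.c.d.f. (horizontal axis, 0 to 1)]

Recalling two known results

1. For any known c.d.f. $F(\cdot)$, take $U \sim U(0,1)$ and $Y = F^{-1}(U)$; then $F_Y(y) = F(y)$ $\forall y \in \mathbb{R}$

2. For any continuous r.v. $Y$ with c.d.f. $F_Y(\cdot)$, take $U = F_Y(Y)$; then $U \sim U(0,1)$

$Y = q(D, U_D)$, with $U_D \mid D \sim U(0,1)$

$\tau \mapsto q(d, \tau)$ is the (structural) conditional quantile function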
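Both results are easy to verify by simulation. The sketch below uses the unit exponential distribution as an assumed example, with c.d.f. F(y) = 1 - exp(-y) and inverse F^{-1}(u) = -log(1 - u); the function names `F` and `F_inv` are chosen only for this illustration.

```python
import math
import random

rng = random.Random(3)
n = 20000


def F(y):
    """C.d.f. of the unit exponential (illustrative choice)."""
    return 1.0 - math.exp(-y) if y > 0 else 0.0


def F_inv(u):
    """Quantile function of the unit exponential."""
    return -math.log(1.0 - u)


# Result 1: Y = F^{-1}(U) with U ~ U(0, 1) has c.d.f. F.
y_from_u = [F_inv(rng.random()) for _ in range(n)]
ecdf_at_1 = sum(v <= 1.0 for v in y_from_u) / n
print(f"ecdf of F^-1(U) at 1: {ecdf_at_1:.3f}, F(1) = {F(1.0):.3f}")

# Result 2: U = F_Y(Y) is uniform on (0, 1) when Y is continuous with c.d.f. F_Y.
y_direct = [rng.expovariate(1.0) for _ in range(n)]
u_from_y = [F(v) for v in y_direct]
share_below = {t: sum(u <= t for u in u_from_y) / n for t in (0.25, 0.5, 0.75)}
print("share of F(Y) below t:", {t: round(s, 3) for t, s in share_below.items()})
```

The representation Y = q(D, U_D) applies result 1 conditionally on D: the rank U_D plays the role of U and q(d, ·) the role of F^{-1}.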



Identification

Exogenous treatment D:

$\mathrm{Prob}[Y \le q(D, \tau) \mid D] = \mathrm{Prob}[U_D \le \tau \mid D] = \tau \quad \forall \tau \in (0,1)$

IVQTE by Chernozhukov and Hansen (2005), Z instrumental variable

$\mathrm{Prob}[Y \le q(D, \tau) \mid Z] = \mathrm{Prob}[U_D \le \tau \mid Z] = \tau \quad \forall \tau \in (0,1)$

A crucial assumption in the IVQTE model is

The rank variable U is made invariant to D via Z

Rank invariance can be relaxed to rank similarity

Still, it rules out any systematic variation

of the rank across treatment states



IVQTE Model: Representation & Assumptions

A1. Potential outcomes: $Y_d = q(d, x, U_d)$ is the potential outcome, with $U_d \sim U(0,1)$ and $q(d, x, \tau)$ strictly increasing in $\tau$

A2. Independence: conditional on $X$, $U_d \perp Z$

A3. Selection: conditional on $X = x$, $Z = z$, $D = \delta(x, z, \nu)$, with $\nu$ random

ν is responsible for different choices of D by observationally identical individuals

E.g.: ν is an unobserved information component correlated with U that includes factors relevant in making the education decision

A4. Rank invariance (a) or rank similarity (b): (a) $U_d = U_{d'}$ or (b) $U_d \sim U_{d'}$

E.g.: U is determined by ability and factors that do not vary with d

(a) makes the joint distribution of potential outcomes $\{Y_d\}$ degenerate

A5. Observed variables: $Y = q(D, X, U_D)$, $D = \delta(Z, X, \nu)$, $X$, $Z$



IVQTE Model: Testable Implications

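Testable implication of A1-A5

$\mathrm{Prob}[Y \le q(D, X, \tau) \mid X, Z] = \mathrm{Prob}[U_D \le \tau \mid X, Z] = \tau \quad \forall \tau \in (0,1)$

$U_D \perp Z, X$

equivalently

$\mathrm{Prob}[Y - q(D, X, \tau) \le 0 \mid X, Z] = \tau \quad \forall \tau \in (0,1)$

i.e.

$Q_{Y - q(D, X, \tau)}(\tau \mid X, Z) = 0 \quad \forall \tau \in (0,1)$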
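The implication can be illustrated on simulated data. The sketch below assumes a toy data-generating process invented for this purpose (no covariates, binary D and Z, structural quantile function q(d, τ) = τ + d(1 + τ), rank invariance, and a treatment probability that rises with the rank U); none of it comes from the papers discussed. It shows that the event Y ≤ q(D, τ) has probability τ given Z, but not given D.

```python
import random


def q(d, tau):
    """Assumed structural quantile function: q(d, tau) = tau + d * (1 + tau)."""
    return tau + d * (1.0 + tau)


def simulate(n, rng):
    """Toy IVQTE model: rank invariance, endogenous D, valid binary instrument Z."""
    rows = []
    for _ in range(n):
        u = rng.random()                     # rank U ~ U(0, 1)
        z = rng.randint(0, 1)                # instrument, independent of U
        p_treat = 0.05 + 0.4 * u + 0.5 * z   # selection depends on U and Z
        d = 1 if rng.random() < p_treat else 0
        rows.append((q(d, u), d, z))         # Y = q(D, U)
    return rows


tau = 0.5
rows = simulate(20000, random.Random(4))


def share_below(rows, key_index, key_value):
    """Share of observations with Y <= q(D, tau) in the chosen subsample."""
    subset = [(y, d) for (y, d, z) in rows if (y, d, z)[key_index] == key_value]
    return sum(y <= q(d, tau) for y, d in subset) / len(subset)


share_given_z = {z: share_below(rows, 2, z) for z in (0, 1)}
share_given_d = {d: share_below(rows, 1, d) for d in (0, 1)}
print("Prob[Y <= q(D, tau) | Z]:", {k: round(v, 3) for k, v in share_given_z.items()})
print("Prob[Y <= q(D, tau) | D]:", {k: round(v, 3) for k, v in share_given_d.items()})
```

Conditioning on D breaks the equality because D is selected on the rank, which is exactly why an ordinary quantile regression of Y on D would not recover q in this setting.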



IVQTE Model: In Practice

We are interested in $Q_Y(\tau \mid X, D) = \alpha(\tau) D + x'\beta(\tau)$

1. Run the “usual” first stage regression $D = x'\delta_1 + z'\delta_2 + \varepsilon$ and predict $\hat{D}$

2. Run the quantile regression $Q_Y(\tau \mid X, D) = \alpha(\tau) D + x'\beta(\tau)$

3. For a grid of values of $\alpha(\tau)$ around the estimate of $\alpha(\tau)$ from step 2, run the quantile regression $Q_{Y - \alpha(\tau) D}(\tau \mid X, Z) = x'\beta(\tau) + \gamma(\tau) Z$

4. Choose $\hat{\alpha}(\tau)$ as the grid value of $\alpha(\tau)$ for which $|\hat{\gamma}(\tau)|$ is closest to zero

5. $(\hat{\alpha}(\tau), \hat{\beta}(\hat{\alpha}(\tau), \tau))$ is the estimator of the parameters of the conditional quantile function we aim at

Routines developed by Hansen available in Ox/MATLAB
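The grid-search idea in steps 3 and 4 can be sketched in a few lines for the simplest possible case. The code below is not Hansen's routine: it is a stripped-down illustration that assumes no covariates, a binary treatment and a binary instrument, and a made-up data-generating process with true α(τ) = 1 + τ. In that saturated case the quantile regression of Y - αD on a constant and Z has slope equal to the difference between the τ-quantiles of Y - αD in the two instrument groups, so no QR solver is needed. All function names are hypothetical.

```python
import math
import random


def quantile(values, tau):
    """Empirical tau-quantile: the ceil(n * tau)-th order statistic."""
    ordered = sorted(values)
    return ordered[max(0, math.ceil(tau * len(ordered)) - 1)]


def simulate(n, rng):
    """Toy IVQTE model with q(d, u) = u + d * (1 + u), so alpha(tau) = 1 + tau."""
    rows = []
    for _ in range(n):
        u = rng.random()                     # rank, common to both potential outcomes
        z = rng.randint(0, 1)                # instrument, independent of the rank
        p_treat = 0.05 + 0.4 * u + 0.5 * z   # endogenous selection into treatment
        d = 1 if rng.random() < p_treat else 0
        rows.append((u + d * (1.0 + u), d, z))
    return rows


def gamma_hat(rows, alpha, tau):
    """Coefficient on Z in the QR of Y - alpha * D on (1, Z), binary Z."""
    r1 = [y - alpha * d for y, d, z in rows if z == 1]
    r0 = [y - alpha * d for y, d, z in rows if z == 0]
    return quantile(r1, tau) - quantile(r0, tau)


def ivqr_grid(rows, tau, grid):
    """Pick the grid value of alpha whose gamma_hat is closest to zero."""
    return min(grid, key=lambda a: abs(gamma_hat(rows, a, tau)))


tau = 0.5
rows = simulate(20000, random.Random(5))
grid = [0.05 * i for i in range(61)]  # candidate values of alpha from 0 to 3
alpha_hat = ivqr_grid(rows, tau, grid)

# Ordinary QR on the binary treatment: difference of group quantiles.
naive = (quantile([y for y, d, _ in rows if d == 1], tau)
         - quantile([y for y, d, _ in rows if d == 0], tau))
print(f"true alpha = {1 + tau:.2f}, IVQR grid estimate = {alpha_hat:.2f}, naive QR = {naive:.2f}")
```

Plotting `abs(gamma_hat(rows, a, tau))` over the grid gives the kind of objective-function pattern worth inspecting: a clear V shape suggests identification, while a flat profile would be a warning sign.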



IVQTE Model, In Practice: Remarks

Rank invariance makes the joint distribution of potential outcomes not truly

multivariate: this does not restrict QTE but affects interpretation

under the IVQTE model assumptions, the QTE is the effect for an individual

that is in the same quantile τ of the treated and control distribution

Rank invariance may be more plausible if the X set is large

Check the pattern of the objective function on the grid-search:

a flat pattern may suggest identification problems

C & H 2008 propose an alternative estimation method (dual inference

procedure) that is robust to weak instruments; also this alternative method

involves a grid-search step; direct and dual inference procedures will deliver

similar results when the correlation between Z (iv) and treatment is strong



IVQTE Model, In Practice: Another Example

Figure 4. Treatment effect when the treatment (additional schooling) is assumed to be exogenous. Females only.

[Figure: two panels for females, -7/+7 window, plotted against the quantile (.1 to .9), each showing the QR coefficient with its 95% CI. First panel: coefficient of yedu, QR. Second panel: QR intercept for the conditional distribution of BMI, i.e. conditional quantiles for the baseline country (U.K.) at the average value of the covariates & zero education]

From Brunello, Fabbri, Fort, IZA WP 2009; revised in JOLE 2013


Years of Schooling and BMI of European Females

Figure 5. Treatment effect when the treatment (years of schooling) is assumed to be exogenous and when it is treated as endogenous and instrumented with years of compulsory education (ycomp). Females only.

[Figure: effect of years of schooling on the conditional distribution of BMI, females, -7/+7 window, QR and IVQR; QR and IVQR coefficients with their 95% CIs plotted against the quantile (.1 to .9)]

Education is protective

There is heterogeneity but the impact is not monotone (as in standard QR)

Imprecise estimates (as standard QR would show)

From Brunello, Fabbri, Fort, IZA WP 2009; revised in JOLE 2013


Years of Schooling and BMI of European Females: Testing

Null hypothesis K-S statistic 90% crit. value 95% crit. value

No effect α(·) = 0 2.799 2.739 2.978

Constant effect α(·) = α(0.5) 1.086 2.748 3.066

Dominance α(·) ≤ 0 0 2.371 2.644

Exogeneity α(·) = αQR (·) 1.542 2.713 2.994



Years of Schooling and BMI of European Females: Grid-search patterns

Figure 6: Objective function at selected quantiles. Quantile: 0.35

[Figure: four panels (Matlab: variable search grid; Matlab: fixed search grid, step 0.025; Matlab: fixed search grid, step 0.05; Ox: fixed search grid, step 0.025), each plotting norm(gamma(alpha)) against alpha]



Years of Schooling and BMI of European Females: Grid-search patterns

Figure 9: Objective function at selected quantiles. Quantile: 0.50

[Figure: several panels (recovered titles: Matlab: variable search grid; Matlab: fixed search grid, step 0.025), each plotting norm(gamma(alpha)) against alpha]



Years of Schooling and BMI of European Females: Grid-search patterns

Figure 13: Objective function at selected quantiles. Quantile: 0.70

[Figure: several panels (recovered titles: Matlab: variable search grid; Matlab: fixed search grid, step 0.025), each plotting norm(gamma(alpha)) against alpha]



JTPA evaluation within the C&H IVQTE model

Unlike the case considered above, we do not find large differences between the direct and dual inference procedures for IVQR in this case. The similarity between the two approaches is not unexpected due to the strong correlation between the instrument and endogenous regressor. The close agreement here further suggests that not much is lost by considering the dual procedure in cases where identification is strong. It also provides further support for the argument that the differences detected in the previous section are due to weak identification. Given the robustness of the dual procedure to the presence of weak instruments and its simple computation, it seems that this inference procedure will be preferable to the standard procedure in many cases.

The dual confidence bounds are further illustrated in Fig. 5, which plots the IVQR objective function W_n(α) over the parameter space A. α is plotted on the horizontal axis, and the vertical axis shows W_n(α). The horizontal line in each graph is the 95% critical value for the dual inference procedure, so all points lying below the horizontal line belong to the confidence region for α(τ). The graphs in Fig. 3 differ markedly from those in Figs. 1 and 2. In particular, all of the objective functions, and hence confidence regions, in Fig. 3 look

[Figure: four panels plotting the training effect against τ (0.2 to 0.8): “QR: Training Effect” and “IVQR: Training Effect” (vertical axis -2000 to 8000), “QR: Percentage Impact of Training” and “IVQR: Percentage Impact of Training” (vertical axis -0.2 to 1.4)]

Fig. 4. Estimates of the training impact by QR and by IVQR. Notes: Left column. QR and IVQR estimates of the impact of a job training program on earnings for τ = 0.15, 0.25, 0.50, 0.75, and 0.85. The top panel reports the QR estimate of the training impact, and the bottom panel reports the IVQR results. In each figure, the solid line represents the point estimates, and the dashed (- -) line represents the 95% confidence interval formed using the direct inference approach. For the IVQR results, the dash-dot (-.) line represents the 95% confidence bound constructed using the dual inference procedure described in the text. In both figures, the horizontal axis measures the quantile index τ, and the vertical axis is the impact of training on earning quantiles measured in dollars. Models include covariates as specified in the text, and the sample size is 5102. Right column. QR and IVQR estimates of the percentage impact of training for τ = 0.15, 0.25, 0.50, 0.75, and 0.85. The top panel reports the QR estimate of the training impact, and the bottom panel reports the IVQR results. Percentage impacts are for moving from non-training to training and all other covariates are evaluated at their sample mean. In both figures, the horizontal axis measures the quantile index τ, and the vertical axis is the percentage impact of training.

Source: V. Chernozhukov, C. Hansen / Journal of Econometrics 142 (2008) 379–398, p. 393



JTPA evaluation within the LATE-QTE model

TABLE III. Quantile Treatment Effects and 2SLS Estimates (from “Quantiles of Trainee Earnings”, p. 105)

Dependent Variable: 30-month Earnings

                                                    Quantile
                                 2SLS       0.15       0.25       0.50       0.75       0.85
A. Men
Training                         1,593      121        702        1,544      3,131      3,378
                                 (895)      (475)      (670)      (1,073)    (1,376)    (1,811)
% Impact of Training             8.55       5.19       12.0       9.64       10.7       9.02
High school or GED               4,075      714        1,752      4,024      5,392      5,954
                                 (573)      (429)      (644)      (940)      (1,441)    (1,783)
Black                            −2,349     −171       −377       −2,656     −4,182     −3,523
                                 (625)      (439)      (626)      (1,136)    (1,587)    (1,867)
Hispanic                         335        328        1,476      1,499      379        1,023
                                 (888)      (757)      (1,128)    (1,390)    (2,294)    (2,427)
Married                          6,647      1,564      3,190      7,683      9,509      10,185
                                 (627)      (596)      (865)      (1,202)    (1,430)    (1,525)
Worked less than 13              −6,575     −1,932     −4,195     −7,009     −9,289     −9,078
  weeks in past year             (567)      (442)      (664)      (1,040)    (1,420)    (1,596)
Constant                         10,641     −134       1,049      7,689      14,901     22,412
                                 (1,569)    (1,116)    (1,655)    (2,361)    (3,292)    (7,655)
B. Women
Training                         1,780      324        680        1,742      1,984      1,900
                                 (532)      (175)      (282)      (645)      (945)      (997)
% Impact of Training             14.6       35.5       23.1       18.4       10.1       7.39
High school or GED               3,470      262        768        2,955      5,518      5,905
                                 (342)      (178)      (274)      (643)      (930)      (1,026)
Black                            −554       0          −123       −401       −1,423     −2,119
                                 (397)      (204)      (318)      (724)      (949)      (1,196)
Hispanic                         −1,145     −73        −138       −1,256     −1,762     −1,707
                                 (488)      (217)      (315)      (854)      (1,188)    (1,172)
Married                          −652       −233       −532       −796       38         −109
                                 (437)      (221)      (352)      (846)      (1,069)    (1,147)
Worked less than 13              −5,329     −1,320     −3,516     −6,524     −6,608     −5,698
  weeks in past year             (370)      (254)      (430)      (781)      (931)      (969)
AFDC                             −2,997     −406       −1,240     −3,298     −3,790     −2,888
                                 (378)      (189)      (301)      (743)      (1,014)    (1,083)
Constant                         10,538     984        3,541      9,928      15,345     20,520
                                 (828)      (547)      (837)      (1,696)    (2,387)    (1,687)

Note: The table reports 2SLS and QTE estimates of the effect of training on earnings. Assignment status is used as an instrument for training. The specification also includes indicators for service strategy recommended, age group, and second follow-up survey. Robust standard errors are reported in parentheses.

quantile. The estimates at low quantiles are substantially smaller than the corresponding quantile regression estimates, and they are small in absolute terms. For example, the QTE estimate (standard error) of the effect on the .15 quantile for men is $121 (475), while the corresponding quantile regression estimate is $1,187 (205). Similarly, the QTE estimate (standard error) of the effect on the .25 quantile for men is $702 (670), while the corresponding quantile regression estimate is



How do AAI get at the (causal) effect of JTPA on earnings?

They use an instrumental variable, extending the IV-LATE identification approach to the identification of QTE

Imbens & Rubin, Review of Economic Studies 1997

Abadie, Journal of the American Statistical Association 2002

Abadie, Angrist, Imbens, Econometrica 2002

Remarks

No rank invariance/rank similarity assumption

QTE for compliers only, not for the average individual in the population

LATE-QTE can accommodate only binary treatment and binary instrument



Identification: Imbens, Rubin (1997) Abadie, Angrist, Imbens (2002) (AAI02)

Z binary instrumental variable, e.g. random assignment to JTPA

D binary treatment variable, e.g. receiving training under JTPA

D_z is the potential treatment under assignment z

Y outcome variable, e.g. earnings

Y_d is the potential outcome under treatment d

Key assumptions:

comparisons by Z identify the effect of Z

Z does not directly affect outcomes

almost surely, individuals do not do the opposite of their assignment




Compliance Types

                      D_j(Z_j = 0) = 0                 D_j(Z_j = 0) = 1
D_j(Z_j = 1) = 0      never-taker: D_j(Z_j) = 0        defier: D_j(Z_j) = 1 − Z_j
D_j(Z_j = 1) = 1      complier: D_j(Z_j) = Z_j         always-taker: D_j(Z_j) = 1

The potential treatment status D = h_D(Z, ε) can be seen as a type indicator

D varies with Z for compliers and defiers

The presence of defiers in the population is ruled out by assumption




Compliance Types by Observed Treatment Status and Assignment to the Treatment given Monotonicity

               Z_j = 0                       Z_j = 1
D_j = 0        never-taker or complier       never-taker
D_j = 1        always-taker                  always-taker or complier

Under the model assumptions, the type indicator defines a partition

The observed outcome distributions Y | D are mixtures of the potential outcome distributions of the population types, with identified proportions

Individual compliers cannot be identified from the observed data
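The mixture structure is what makes the complier distributions recoverable even though no individual complier can be pointed to. The sketch below uses an invented population (20% never-takers, 20% always-takers, 60% compliers, with assumed normal outcomes) and the Imbens and Rubin (1997) differencing argument: the complier share is P(D=1|Z=1) - P(D=1|Z=0), and each complier c.d.f. is a difference of observed sub-distributions divided by that share. The helper names are hypothetical.

```python
import random


def simulate(n, rng):
    """Toy population; types are used to generate data but never by the estimator."""
    rows = []
    for _ in range(n):
        z = rng.randint(0, 1)
        r = rng.random()
        if r < 0.2:                       # never-taker
            d, y = 0, rng.gauss(-2.0, 1.0)
        elif r < 0.4:                     # always-taker
            d, y = 1, rng.gauss(3.0, 1.0)
        else:                             # complier: D = Z
            d = z
            e = rng.gauss(0.0, 1.0)
            y = 1.0 + 1.5 * e if d == 1 else e   # Y1 ~ N(1, 1.5^2), Y0 ~ N(0, 1)
        rows.append((y, d, z))
    return rows


def complier_quantile(rows, d, tau, share):
    """Invert the estimated complier c.d.f. of Y_d at tau (first crossing)."""
    n1 = sum(1 for _, _, z in rows if z == 1)
    n0 = len(rows) - n1
    sign = 1.0 if d == 1 else -1.0
    # Weights implement [P(Y<=y, D=d | Z=1) - P(Y<=y, D=d | Z=0)], sign-flipped for d = 0.
    pts = sorted((y, sign * (1.0 / n1 if z == 1 else -1.0 / n0))
                 for y, dd, z in rows if dd == d)
    cum = 0.0
    for y, w in pts:
        cum += w / share
        if cum >= tau:
            return y
    return pts[-1][0]


rows = simulate(40000, random.Random(6))
n1 = sum(1 for _, _, z in rows if z == 1)
n0 = len(rows) - n1
share = (sum(d for _, d, z in rows if z == 1) / n1
         - sum(d for _, d, z in rows if z == 0) / n0)

qte = {t: complier_quantile(rows, 1, t, share) - complier_quantile(rows, 0, t, share)
       for t in (0.25, 0.5, 0.75)}
print(f"estimated complier share: {share:.3f} (0.6 in this toy population)")
print("complier QTE:", {t: round(v, 2) for t, v in qte.items()})
```

In this design the complier QTE equals 1 + 0.5 times the standard normal τ-quantile, so it is about 1 at the median and larger at upper quantiles.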



Inference: Abadie, Angrist, Imbens (2002) (AAI02)

We cannot condition on the subsample of compliers to get QTEs

Estimation involves running a ‘weighted’ quantile regression that allows estimating the marginal quantiles of the potential outcome distributions for compliers

Estimation requires an auxiliary first-step estimation of the weights

Weights are a function of E[Z | X, D, Y]

In practice: Stata code by Froelich and Melly
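To see why weighting can stand in for conditioning on the unobservable complier group, the sketch below computes Abadie's “kappa” weights, κ = 1 - D(1 - Z)/(1 - π) - (1 - D)Z/π with π = P(Z = 1), on the same kind of invented population as before (no covariates, π = 0.5 known by design). These raw weights can be negative; the estimator discussed on the slide works with a projected version that depends on E[Z | X, D, Y], which this sketch does not implement. It only checks two moment properties: the mean of κ equals the complier share, and κ-weighted means recover complier means.

```python
import random


def simulate(n, rng):
    """Toy population: 20% never-takers, 20% always-takers, 60% compliers (assumed)."""
    rows = []
    for _ in range(n):
        z = rng.randint(0, 1)
        r = rng.random()
        if r < 0.2:
            d, y = 0, rng.gauss(-2.0, 1.0)
        elif r < 0.4:
            d, y = 1, rng.gauss(3.0, 1.0)
        else:
            d = z
            e = rng.gauss(0.0, 1.0)
            y = 1.0 + 1.5 * e if d == 1 else e   # complier means: E[Y1] = 1, E[Y0] = 0
        rows.append((y, d, z))
    return rows


pi = 0.5  # P(Z = 1), known by design in this sketch
rows = simulate(40000, random.Random(7))

# Raw kappa weights: +1 for (D, Z) in {(1, 1), (0, 0)}, negative otherwise.
kappa = [1.0 - d * (1 - z) / (1.0 - pi) - (1 - d) * z / pi for _, d, z in rows]

share_hat = sum(kappa) / len(rows)
mean_y1 = (sum(k * d * y for k, (y, d, _) in zip(kappa, rows))
           / sum(k * d for k, (_, d, _) in zip(kappa, rows)))
mean_y0 = (sum(k * (1 - d) * y for k, (y, d, _) in zip(kappa, rows))
           / sum(k * (1 - d) for k, (_, d, _) in zip(kappa, rows)))
print(f"mean kappa = {share_hat:.3f} (complier share 0.6)")
print(f"kappa-weighted E[Y1] = {mean_y1:.2f}, E[Y0] = {mean_y0:.2f} (compliers: 1 and 0)")
```

Replacing the weighted means with a weighted check-function minimisation is what turns this into a quantile estimator for compliers.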





JTPA evaluation within the C&H IVQTE model

Unlike the case considered above, we do not find large differences between the direct and dual inference procedures for IVQR in this case. The similarity between the two approaches is not unexpected due to the strong correlation between the instrument and endogenous regressor. The close agreement here further suggests that not much is lost by considering the dual procedure in cases where identification is strong. It also provides further support for the argument that the differences detected in the previous section are due to weak identification. Given the robustness of the dual procedure to the presence of weak instruments and its simple computation, it seems that this inference procedure will be preferable to the standard procedure in many cases.

The dual confidence bounds are further illustrated in Fig. 5, which plots the IVQR objective function $W_n(\alpha)$ over the parameter space $A$. $\alpha$ is plotted on the horizontal axis, and the vertical axis shows $W_n(\alpha)$. The horizontal line in each graph is the 95% critical value for the dual inference procedure, so all points lying below the horizontal line belong to the confidence region for $\alpha(\tau)$. The graphs in Fig. 3 differ markedly from those in Figs. 1 and 2. In particular, all of the objective functions, and hence confidence regions, in Fig. 3 look


[Figure: four panels. Top row: "QR: Training Effect" and "QR: Percentage Impact of Training". Bottom row: "IVQR: Training Effect" and "IVQR: Percentage Impact of Training". Horizontal axis: τ (0.2 to 0.8). Vertical axis: training effect (−2000 to 8000 dollars in the left panels, −0.2 to 1.4 in the right panels).]

Fig. 4. Estimates of the training impact by QR and by IVQR. Notes: Left column. QR and IVQR estimates of the impact of a job training program on earnings for τ = 0.15, 0.25, 0.50, 0.75, and 0.85. The top panel reports the QR estimate of the training impact, and the bottom panel reports the IVQR results. In each figure, the solid line represents the point estimates, and the dashed (- -) line represents the 95% confidence interval formed using the direct inference approach. For the IVQR results, the dash-dot (-.) line represents the 95% confidence bound constructed using the dual inference procedure described in the text. In both figures, the horizontal axis measures the quantile index τ, and the vertical axis is the impact of training on earning quantiles measured in dollars. Models include covariates as specified in the text, and the sample size is 5102. Right column. QR and IVQR estimates of the percentage impact of training for τ = 0.15, 0.25, 0.50, 0.75, and 0.85. The top panel reports the QR estimate of the training impact, and the bottom panel reports the IVQR results. Percentage impacts are for moving from non-training to training and all other covariates are evaluated at their sample mean. In both figures, the horizontal axis measures the quantile index τ, and the vertical axis is the percentage impact of training.

V. Chernozhukov, C. Hansen / Journal of Econometrics 142 (2008) 379–398 393



Framework
Heckman et al. (1997)

$Y_{i1}, Y_{i0}$ are the potential outcomes

$D_i$ is the (binary) treatment indicator

$\beta_i = Y_{i1} - Y_{i0}$ is the impact of the treatment

$F_{Y_0}(y)$, $F_{Y_1}(y)$, $F_{Y_1,Y_0}(y_1, y_0)$ are the marginal & joint distributions of the potential outcomes

Doksum’s quantile treatment effect

$\delta(\tau) = F^{-1}_{Y_1}(\tau) - F^{-1}_{Y_0}(\tau), \quad 0 < \tau < 1$

Taking $\tau = F_{Y_0}(y_0)$ and changing variables

$\delta(y_0) = F^{-1}_{Y_1}(F_{Y_0}(y_0)) - y_0$
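As a rough illustration of Doksum's definition, the sketch below computes the difference between empirical quantiles of a treated and a control sample. The data are simulated (a location-scale shift chosen only for illustration), and the helper names `empirical_quantile` and `doksum_qte` are hypothetical, not taken from any of the cited papers.

```python
import math
import random


def empirical_quantile(sample, tau):
    """Inverse empirical cdf: smallest order statistic y with F_n(y) >= tau."""
    ordered = sorted(sample)
    k = max(1, math.ceil(tau * len(ordered)))
    return ordered[k - 1]


def doksum_qte(treated, control, taus):
    """delta(tau) = F_{Y1}^{-1}(tau) - F_{Y0}^{-1}(tau), using sample quantiles."""
    return [empirical_quantile(treated, t) - empirical_quantile(control, t)
            for t in taus]


random.seed(0)
# Simulated potential outcomes under randomization (illustrative assumption):
# Y0 ~ N(0, 1), Y1 ~ N(1, 2^2), so the true QTE is 1 + z_tau and rises with tau.
control = [random.gauss(0.0, 1.0) for _ in range(20000)]
treated = [random.gauss(1.0, 2.0) for _ in range(20000)]

taus = [0.1, 0.25, 0.5, 0.75, 0.9]
qte = doksum_qte(treated, control, taus)
for t, d in zip(taus, qte):
    print(f"tau={t:.2f}  QTE={d:+.3f}")
```

In this design the mean impact is 1, yet the effect is negative at the bottom of the distribution and well above 1 at the top, which is the kind of heterogeneity a mean comparison would hide.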



Exploring reasonable restrictions

1. Marginals can be identified under randomization

2. The joint distribution can be identified if it is degenerate, i.e.

$F_{Y_0}(y_0) = F_{Y_1}(y_0 + \beta)$

(not interesting, though: why?)

3. Under perfect rank dependence, the constant treatment effect assumption in 2. can be relaxed and replaced by

$\beta(y_0) = F^{-1}_{Y_1}(F_{Y_0}(y_0)) - y_0$

4. Bounds for the joint distribution (Hoeffding (1940) & Fréchet (1951))

$\max[F_{Y_1}(y_1|D = 1) + F_{Y_0}(y_0|D = 1) - 1, 0] \le F_{Y_1,Y_0}(y_1, y_0|D = 1) \le \min[F_{Y_1}(y_1|D = 1), F_{Y_0}(y_0|D = 1)]$
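The bounds in point 4 are simple to evaluate once the two marginal cdf values are known. The sketch below does so for one pair of illustrative values; the function name `frechet_hoeffding_bounds` and the numbers 0.7 and 0.6 are assumptions made only for the example.

```python
def frechet_hoeffding_bounds(f1, f0):
    """Bounds on the joint cdf F_{Y1,Y0}(y1, y0) given the marginal cdf
    values f1 = F_{Y1}(y1) and f0 = F_{Y0}(y0)."""
    if not (0.0 <= f1 <= 1.0 and 0.0 <= f0 <= 1.0):
        raise ValueError("cdf values must lie in [0, 1]")
    lower = max(f1 + f0 - 1.0, 0.0)   # perfect negative dependence
    upper = min(f1, f0)               # perfect positive (rank) dependence
    return lower, upper


# Illustrative marginal cdf values (not taken from any data set).
lower, upper = frechet_hoeffding_bounds(0.7, 0.6)
independence = 0.7 * 0.6  # joint cdf if Y1 and Y0 were independent
print(lower, independence, upper)
```

Any joint distribution compatible with the marginals, including the independent one, must lie between the two bounds; the upper bound corresponds to the perfect rank dependence case of point 3.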



Identification exploiting information from revealed preferences 1/3
Heckman & Honore HH (1990) Econometrica

Consider a (Roy) model to explain occupational choice and its consequences for

the distribution of earnings when individuals differ in their occupation

specific skills endowments.

(One could think about education/training choices instead of occupation)

HH(90) show that in the model self-selection leads to reduced inequality

in earnings compared to an economy with random assignments to jobs.

HH(90) show under which assumptions it is possible to determine the correlation of

latent skills or potential wages of persons even if one observes only one skill

for any person.



Identification exploiting information from revealed preferences: the model

2/3

Income maximizing agents with 2 skills S1 > 0, S0 > 0

Skill prices π1, π0

Individuals differ in endowments; agents know their own

F (s1, s0) population distribution of skills

Skill i is useful only in sector i

Agent chooses sector 1 if W1 ≡ π1S1 > π0S0 ≡ W0

D = 1(W1 ≥ W0)

from Heckman & Honore HH (1990) Econometrica



Identification exploiting information from revealed preferences: the model

3/3
Heckman & Honore HH (1990) Econometrica, Theorem 9

Under the previous assumptions (with $\pi_1$ normalized to 1), if we only observe

$Z = \max(S_1, \pi_0 S_0)$ and $\pi_0$ takes all values in $(0, \infty)$, $F$ is identifiable.

$\Pr(\max(S_1, \pi_0 S_0) \le x)$ is known for all $x$ and $\pi_0$

$\Pr(\max(S_1, \pi_0 S_0) \le x) = \Pr(S_1 \le x, S_0 \le x/\pi_0)$

thus taking $s_1 = x$, $\pi_0 = s_1/s_0$, so that $s_0 = s_1/\pi_0$

$F(s_1, s_0) = \Pr(S_1 \le s_1, S_0 \le s_1/\pi_0)$

$F(s_1, s_0)$ can be determined along the ray $s_1 = \pi_0 s_0$ as $\pi_0$ varies
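The identification argument can be checked numerically. The sketch below simulates a population with correlated log-normal skills (an illustrative assumption, as are all parameter values and function names) and recovers the joint cdf at a chosen point using only the distribution of the observed maximum at the matching skill price.

```python
import math
import random

random.seed(1)
n = 20000
rho = 0.5  # assumed correlation of log skills (illustrative)

# Latent skills (S1, S0): in practice only max(S1, pi0 * S0) would be observed.
skills = []
for _ in range(n):
    a, b = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    skills.append((math.exp(a), math.exp(rho * a + math.sqrt(1 - rho ** 2) * b)))


def cdf_of_observed_max(x, pi0):
    """Pr(max(S1, pi0*S0) <= x): uses only the observable Z."""
    return sum(max(s1, pi0 * s0) <= x for s1, s0 in skills) / n


def joint_cdf(a1, a0):
    """F(a1, a0) = Pr(S1 <= a1, S0 <= a0): uses the latent pair (infeasible)."""
    return sum(s1 <= a1 and s0 <= a0 for s1, s0 in skills) / n


# To learn F at (s1, s0), set x = s1 and pick the price pi0 = s1 / s0.
s1_target, s0_target = 1.5, 0.8
pi0 = s1_target / s0_target
identified = cdf_of_observed_max(s1_target, pi0)
truth = joint_cdf(s1_target, s0_target)
print(identified, truth)
```

Repeating the last step over a grid of `(s1_target, s0_target)` pairs traces out the whole joint distribution, which is why variation of the skill price over all of $(0, \infty)$ is needed.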



Comments

The above model assumes that only ‘gains’ determine participation in one

sector and that agents maximize income

The participation rule implies a tight link between outcomes :

in the participating population the mass of the W1 conditional on W0

will be on the right of w0

To sum up: to solve the identification problem we may

(a) assume some (in)dependence

(b) exploit dependence induced by choices



References (those that fit in the slide . . .)

Abadie, A. et al. (2002) ‘Instrumental Variable Estimates of the Effect of Subsidized Training on the Quantiles of Trainee Earnings’, Econometrica, Vol. 70 (1), pp. 91-117

Brunello et al. (2009) ‘Years of Schooling, Human Capital and the Body Mass Index of European Females’, IZA DP 4667

Brunello et al. (2009) ‘Changes in Compulsory Schooling, Education and the Distribution of Wages in Europe’, Economic Journal 119, pp. 516-539

Chernozhukov, V. et al. (2005) ‘An IV Model of Quantile Treatment Effects’, Econometrica, Vol. 73 (1), pp. 245-261

Chernozhukov, V. et al. (2006) ‘Instrumental Quantile Regression Inference for Structural and Treatment Effect Models’, Journal of Econometrics, pp. 491-525

Chernozhukov, V. et al. (2008) ‘Instrumental Variable Quantile Regression: A Robust Inference Approach’, Journal of Econometrics 142, pp. 379-398

Heckman et al. (1990) ‘The Empirical Content of the Roy Model’, Econometrica, 58(5), pp. 1121-1149

Heckman et al. (1997) ‘Making The Most Out of Programme Evaluations and Social Experiments: Accounting for Heterogeneity in Programme Impacts’, Review of Economic Studies, 64(4), pp. 487-535

Imbens et al. (1997) ‘Estimating Outcome Distributions for Compliers in Instrumental Variable Models’, Review of Economic Studies, pp. 555-574

Martins et al. (2004) ‘Does Education Reduce Wage Inequality? Quantile Regression Evidence from 16 Countries’, Labour Economics, Vol. 11, pp. 355-371



So far . . .

QRs are useful to characterize the dependence between an outcome Y and a

treatment D in the presence of heterogeneity of the treatment

impact among observationally equivalent individuals

Quantile treatment effects (QTEs) describe the difference between quantiles

of the outcome distribution under different levels of the treatment

at a given quantile.

Identification of QTEs requires identification of the marginal distributions of

the potential outcomes $Y_d$

Under rank invariance or rank similarity, the identification of the marginal

distributions allows one to identify the impact distribution



Today

We move a step further by recognizing that in principle both Y and D can be

decomposed in two parts (a deterministic one and a stochastic one)

The two parts need not be additively separable

Thus,

We need to define a treatment parameter that allows for

‘more sources of stochastic variation’

Economic models typically place restrictions

on the number of independent sources of stochastic variation in the model

and on the distribution of these stochastic components and X



The exogenous impact function (EIF)

QTEs describe and detect the heterogeneity in the relationship of Y and D

over the distribution of Y

This may be restrictive if there are reasons to believe that the dependence

between the two variables varies also over the distribution of D

Chesher (2003) suggests looking at the exogenous impact function. The EIF

describes the rate at which Y varies as the value of D is marginally increased

at specific quantiles of the distribution of the stochastic components

determining both Y and D (and possibly at specific values of the exogenous covariates).

The EIF has a causal interpretation if one is able to shift D without affecting

the other elements.



Usual example: education (D) and wages (Y )

Some studies find that the returns to education increase across deciles of the

conditional distribution of earnings without addressing endogeneity issues,

e.g. Martins & Pereira (2004)

U.S. data: mixed findings. Ability & education are complements (Arias et al.,

2001) or substitutes (Chernozhukov et al., 2006)

U.K. data: substitutability between education and cognitive and

non-cognitive ability (Denny et al., 2007)

Data on Europe: substitutability between education & ability (Brunello et al., 2009)

⇒ allow two unobserved sources of stochastic variation



Brunello, Fort, Weber, Economic Journal, 2009

BFW study the causal effects of education on the distribution of earnings

using data from 12 European countries exploiting changes in minimum school

leaving age (MSLA) for identification

BFW find that

conditional wage inequality is reduced by marginal increases in education

education & ability act as substitutes in the earnings function

Policy implications: investing in the less fortunate (because of poor labour

market fortune or poor talent) could pay off both on efficiency and equity

grounds



The Causal Chain Model in Steps

1. We start from a simple linear simultaneous (recursive) equation model with 2

equations and we consider the identification & estimation of the treatment

parameter in such model

2. We recall the control function version of the estimator

3. We relax the additive separability assumption on the stochastic components

in the model and we introduce a generalization of the structural model that

allows us to detect and describe heterogeneous structural parameters

(causal chain model)

4. We discuss the assumptions under which the parameters (of interest) in such

model are identified

5. We discuss briefly how the (identified) parameter can be estimated



Notation in Our Running Example

Two endogenous variables Y , wages and D years of education

Both scalar, (approximately) continuous variables.

A matrix of exogenous regressors X (gender, age, country fe, trends,. . .)

The instrument Z , mandatory schooling years

The usual recursive model

$y = \theta s + X\beta_1 + u$

$s = X\beta_2 + z\delta + \nu$

$Cov(X, u) = 0$ & $Cov(X, \nu) = 0$, $Cov(z, \nu) = 0$, $Cov(u, \nu) \neq 0$

Crucial feature: exclusion restriction

Identification: rank & order conditions



The Usual Recursive Model

Recursive model y , s, z scalars u, ν are correlated

y = θs + X β1 + u

s = X β2 + δz + ν

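Reduced form: $y = X\alpha_1 + z\alpha_2 + \tilde{u}$, with $\alpha_1 = \theta\beta_2 + \beta_1$, $\alpha_2 = \theta\delta$, $\tilde{u} = u + \theta\nu$

“Control function approach”, $W = [X\ z]$, $\beta = [\beta_2\ \delta]$:

$y = \theta s + X\beta_1 + \gamma\nu + \varepsilon$

$y = \theta s + X\beta_1 + \gamma[W(\hat{\beta} - \beta) + \hat{\nu}] + \varepsilon$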



The Usual Recursive Model: Remarks

The model specifies the relationship btw the stochastic component and X , Z

There is an exclusion restriction & no feedback: the model has a triangular

structure

The endogeneity is driven by the correlation between u and ν (latent ν)

u and ν enter additively

The number of “independent” sources of stochastic variation in the model

equals the number of observed variables

The treatment parameter θ is invariant wrt the stochastic components of

the model: s exerts a location shift on the distribution of y



The Usual Recursive Model: Remarks (continued)

There is a one-to-one correspondence between quantiles of u and v and the

conditional quantiles of y and s

To come up with an estimator, we describe the links between observed

quantities and unknown quantities (parameters)

The control function approach & the 2 stage approach deliver the same

estimator for θ
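The equivalence between the two-stage and the control function estimators in the linear recursive model can be verified on simulated data. The sketch below is a minimal illustration using NumPy; the data generating process and all coefficient values are assumptions chosen for the example, not estimates from any study.

```python
import numpy as np

rng = np.random.default_rng(0)
n, theta = 5000, 0.5

# Simulated recursive model (all coefficients are illustrative assumptions).
x = rng.normal(size=n)                 # exogenous covariate
z = rng.normal(size=n)                 # instrument, excluded from the y equation
nu = rng.normal(size=n)                # first-stage disturbance
u = 0.8 * nu + rng.normal(size=n)      # Cov(u, nu) != 0: s is endogenous
s = 1.0 + 0.5 * x + 1.0 * z + nu
y = 2.0 + theta * s + 0.3 * x + u


def ols(X, target):
    """Least squares coefficients of target on the columns of X."""
    return np.linalg.lstsq(X, target, rcond=None)[0]


const = np.ones(n)
W = np.column_stack([const, x, z])
s_hat = W @ ols(W, s)                  # first-stage fitted values
nu_hat = s - s_hat                     # first-stage residuals

theta_ols = ols(np.column_stack([s, const, x]), y)[0]           # inconsistent
theta_2sls = ols(np.column_stack([s_hat, const, x]), y)[0]      # two-stage
theta_cf = ols(np.column_stack([s, const, x, nu_hat]), y)[0]    # control function

print(theta_ols, theta_2sls, theta_cf)
```

The two-stage and control function coefficients on s coincide up to floating point error, while plain OLS is biased upward in this design because u and ν are positively correlated.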



Properties of QR estimator: Equivariance

It guarantees a coherent interpretation of the results when the data or the

underlying model are modified in a non-essential way.

Scale equivariance

For any $a > 0$, $\hat{\beta}(\tau; ay, X) = a\hat{\beta}(\tau; y, X)$ and

$\hat{\beta}(\tau; -ay, X) = -a\hat{\beta}(1 - \tau; y, X)$

Regression Shift

For any $\gamma \in R^p$, $\hat{\beta}(\tau; y + X\gamma, X) = \hat{\beta}(\tau; y, X) + \gamma$

Reparametrization of Design

For any $|A| \neq 0$, $\hat{\beta}(\tau; y, XA) = A^{-1}\hat{\beta}(\tau; y, X)$

Analogous equivariance properties hold in mean regression



Equivariance to Monotone Transformations

For any monotone (nondecreasing) function $h(\cdot)$, conditional quantile functions are equivariant

$Quant_{h(Y)}(\tau|x) = h(Quant_Y(\tau|x))$

i.e. the quantiles of the transformed variable are simply the transformed quantiles

of the original variable

The analogous property does not hold in mean regression
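A quick numerical check of this property, in the unconditional case and with h = log, is sketched below. The simulated wage distribution and the helper `empirical_quantile` are assumptions made only for the illustration.

```python
import math
import random


def empirical_quantile(sample, tau):
    """Inverse empirical cdf: smallest order statistic y with F_n(y) >= tau."""
    ordered = sorted(sample)
    k = max(1, math.ceil(tau * len(ordered)))
    return ordered[k - 1]


random.seed(2)
# Simulated log-normal wages (illustrative assumption).
wages = [math.exp(random.gauss(2.0, 0.6)) for _ in range(10001)]
log_wages = [math.log(w) for w in wages]

tau = 0.5
quantile_of_log = empirical_quantile(log_wages, tau)
log_of_quantile = math.log(empirical_quantile(wages, tau))

mean_of_log = sum(log_wages) / len(log_wages)
log_of_mean = math.log(sum(wages) / len(wages))

print(quantile_of_log, log_of_quantile)   # agree
print(mean_of_log, log_of_mean)           # differ (Jensen's inequality)
```

Because log preserves the ordering of the observations, the quantile of the logs equals the log of the quantile, whereas the mean of the logs falls short of the log of the mean.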



A Causal Chain Model with Random Coefficients



A Causal Chain Model with Random Coefficients: Comments

To write the reduced form (hybrid) model

We write ν = f (s, x , z): first stage needs to be monotonic in ν

We exploit recursive structure in observed and latent variables

In a recursive model with monotonicity

there is a one-to-one mapping btw the quantiles of ν and ε

and the quantiles of s and y (equivariance of QR estimator)

One could allow more heterogeneity

$y = s\,\theta(\varepsilon, \nu) + x'\beta_1$, with $\theta(\varepsilon, \nu) = \theta + \lambda\nu + \gamma\varepsilon$

or leave $\theta(\varepsilon, \nu)$ unrestricted



Chesher(2003) Causal Chain Model in a Nutshell

Recursive model with nonlinear nonadditive equations

$y = h_y(s, X, \varepsilon, \nu)$, y continuous

$s = h_s(X, Z, \nu)$, s, z continuous

a. a triangular structure in both endogenous and latent variables

b. $h_y(\cdot)$ differentiable wrt s and ν; strictly monotonic in ε

c. $h_s(\cdot)$ differentiable wrt z and ν; strictly monotonic in ν

d. the conditional $\tau_\varepsilon$ quantile of ε given ν, X, Z is independent of ν, X, Z

e. the conditional $\tau_\nu$ quantile of ν given X, Z is independent of X, Z

The exclusion restriction on the observed variables is required for

identification of the derivatives

ceteris paribus variations can be identified under weaker assumptions



Chesher Causal Chain Model: Estimation

1. Weighted average derivative estimator

2. Control variate estimator proposed by Ma et al. (2006)

step 1: estimate ν at a specific quantile τ of s, for q2 quantiles

step 2: add the generated regressor $\hat{\nu}$ to the

structural quantile regression equation for y, for q1 quantiles

3. Estimation as in 1. and 2. outperforms alternative approaches

4. The output is a q1 × q2 matrix or a multidimensional graph:

the (estimated) exogenous impact function

5. One could obtain QTEs and average treatment effects

No routines available.

Following 2. : simply run a series of QR. Get standard errors with

appropriate bootstrap design (or follow asymptotic results in Ma et al. , 2006)


Empirical Setup I

$\ln(W) = \alpha + S\,\Pi(A, U) + X'\gamma_W + A + U$, with $\Pi(A, U) = \beta + \lambda A + \phi U$ (1)

$S = \alpha + X'\gamma_S + \pi Z + \xi A$ (2)

where

• W is wage; S is the quantity of schooling

• A is ability; U is labour market fortune, orthogonal to A

• Z (years of compulsory schooling) is an instrumental variable

• X is a vector of covariates

Ex-ante individuals do not have information on U

About Ability A

Talent has: (i) an absolute effect (on earnings); (ii) a comparative effect (on returns), Ashenfelter & Rouse (1998)

Ability and schooling are complements if returns increase with ability (λ > 0), substitutes if returns decrease with ability (λ < 0)

We assume that more able individuals get more schooling, consistent with the signalling model and a variant of the human capital model, see Blackburn and Neumark (1993)

About Labour Market Fortune U

The unobservable U captures

the fact that ex-ante identical individuals end up with different wages in the random matching process after school completion, Hornstein et al. (2006)

We assume that “luckier” individuals find a better match, i.e. end up with a higher wage

Alternatively, it may refer to

a zero mean demand shock which affects the relative productivity of jobs and skills, Gosling et al. (2000), Machin et al. (1998)

the ability which is productive only at work, in contrast with cognitive ability, Lang (1993)

Empirical Setup II

$\ln(W) = \alpha + S(\beta + \lambda A + \phi U) + X'\gamma_W + A + U$ (1)

$S = \alpha + X'\gamma_S + \pi Z + \xi A$ (2)

Let $Q_Y(\tau|\cdot)$ denote the τ-th conditional quantile of the random variable Y.

The conditional quantile model corresponding to eq. (1) and (2) is

$Q_S(\tau_A|X, Z) = \alpha + \gamma_S X + \pi Z + \xi Q_A(\tau_A)$

$Q_W(\tau_U|Q_S(\tau_A|X, Z), X, Z) = \alpha + Q_S(\tau_A|X, Z)\,\pi(\tau_A, \tau_U) + \gamma_W X + Q_U(\tau_U) + Q_A(\tau_A)$

The Parameter of Interest: The Impact Function

Doksum (1974); Chesher (2003); Ma & Koenker (2006)

$\pi(\tau_A, \tau_U) \equiv \beta + \lambda Q_A(\tau_A) + \phi Q_U(\tau_U)$

represents the rate at which wages increase as schooling is exogenously increased for a person with ability equal to $Q_A(\tau_A)$ and labour market fortune equal to $Q_U(\tau_U)$

can be summarized as a matrix of quantile treatment effects, $\pi(\tau_A, \tau_U)$, which describe how returns vary over the distribution of wages for a given level of ability
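To make the matrix form concrete, the sketch below tabulates $\pi(\tau_A, \tau_U)$ on a 5 × 5 grid. The values of β, λ and φ are the point estimates reported for males in column (1) of the summary regression later in these slides; the quantile functions of A and U are replaced by hypothetical normal distributions, since the first and second stage residuals are not available here.

```python
from statistics import NormalDist

# Point estimates for males, column (1) of the summary regression in the slides.
beta, lam, phi = 0.051, -0.0021, -0.0089

# Hypothetical stand-ins for the distributions of A and U (assumptions: the
# slides use deciles of the ecdf of the estimated residuals instead).
ability = NormalDist(mu=0.0, sigma=3.0)
fortune = NormalDist(mu=0.0, sigma=0.4)

taus = [0.1, 0.3, 0.5, 0.7, 0.9]


def impact(tau_a, tau_u):
    """pi(tau_A, tau_U) = beta + lambda * Q_A(tau_A) + phi * Q_U(tau_U)."""
    return beta + lam * ability.inv_cdf(tau_a) + phi * fortune.inv_cdf(tau_u)


# Rows index tau_A, columns index tau_U.
matrix = [[impact(ta, tu) for tu in taus] for ta in taus]

print("tau_A\\tau_U " + "  ".join(f"{t:6.2f}" for t in taus))
for ta, row in zip(taus, matrix):
    print(f"{ta:10.2f}  " + "  ".join(f"{v:6.4f}" for v in row))
```

With λ < 0 and φ < 0 the entries fall both down each column and along each row, which is the substitutability pattern discussed in the slides.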


Output “Preview”

π(τA, τU)    τU = .10   . . .   τU = .50   . . .   τU = .90

τA = .10        ↔         ↔        ↔         ↔        ↔
. . .                              ↕
τA = .90                           ↕

by rows ↔: the table shows how returns vary over the distribution of wages for a given level of ability/education (τA)

by columns ↕: the table shows how returns vary across different ability levels for a given level of labour market fortune/wages (τU)

In Practice

Chesher(2003); Ma & Koenker (2006)

1. We estimate conditional quantile models for a set of values of τA,

$Q_S(\tau_A|X, Z) = \alpha + \gamma_S X + \pi Z + Q_A(\tau_A)$, and we compute the

(first stage) residuals $\hat{Q}_A(\tau_A) \equiv S - \hat{Q}_S(\tau_A|X, Z) \equiv S - (\hat{\alpha} + \hat{\gamma}_S X + \hat{\pi} Z)$

2. We estimate the (hybrid) conditional quantile model for a set of

values of τU for each τA using a control variate approach

$Q_W(\tau_U|Q_S(\tau_A|X, Z), X, Z) = \alpha + S\,\pi(\tau_A, \tau_U) + \gamma_W X + Q_U(\tau_U) + \varphi_1 \hat{Q}_A(\tau_A) + \varphi_2 \hat{Q}_A(\tau_A) S$

$\varphi_1$ and $\varphi_2$ may be interpreted as the degree of endogeneity and

exploited to test (local) endogeneity
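The two steps can be sketched on simulated data as follows. This is only an illustration under stated assumptions: the data generating process and its parameter values are invented, covariates X are omitted, and quantile regression is approximated by a simple iteratively reweighted least squares (MM-type) routine written for the example, not by the estimator or software used in the papers.

```python
import numpy as np


def quantile_regression(X, y, tau, n_iter=300, eps=1e-6):
    """Approximate check-loss minimizer via an MM / reweighted least squares
    iteration (a rough stand-in for a linear programming solver)."""
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        c = np.abs(y - X @ coef) + eps          # majorizer scale |r| + eps
        sw = 1.0 / np.sqrt(c)                   # square-root weights
        target = y + (2.0 * tau - 1.0) * c      # asymmetry adjustment
        coef = np.linalg.lstsq(X * sw[:, None], target * sw, rcond=None)[0]
    return coef


rng = np.random.default_rng(0)
n = 2000
Z = rng.uniform(0.0, 4.0, n)                    # instrument
A = rng.normal(size=n)                          # "ability"
U = rng.normal(size=n)                          # "labour market fortune"
beta0, lam, phi = 1.0, -0.2, -0.05              # hypothetical parameters
S = 2.0 + Z + A                                 # first stage
lnW = 1.0 + S * (beta0 + lam * A + phi * U) + A + U

taus = [0.1, 0.5, 0.9]
ones = np.ones(n)
pi_hat = {}
for tau_a in taus:
    # Step 1: first-stage quantile regression and control variate.
    W1 = np.column_stack([ones, Z])
    a_hat = S - W1 @ quantile_regression(W1, S, tau_a)
    # Step 2: hybrid quantile regression with a_hat and a_hat * S added.
    X2 = np.column_stack([ones, S, a_hat, a_hat * S])
    for tau_u in taus:
        pi_hat[(tau_a, tau_u)] = quantile_regression(X2, lnW, tau_u)[1]

for key in sorted(pi_hat):
    print(key, round(float(pi_hat[key]), 3))
```

In this design the coefficient on S in step 2 estimates β + λQ_A(τA) + φQ_U(τU), so the estimated returns should fall as τA rises. Standard errors would have to come from a bootstrap that repeats both steps.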


Benchmark: from Ma & Koenker, 2006

Recalling that integrating the quantile function $F_X^{-1}(t)$ of a random variable, $X$, over the domain $[0, 1]$, yields its expectation, that is,

$EX = \int_0^1 F_X^{-1}(t)\,dt,$

we can define a mean quantile treatment effect by integrating out $\tau_2$, and denoting $\mu_i = E\nu_i$,

$\pi_1(\tau_1) = \int_0^1 (\alpha_1 + \delta(F_1^{-1}(\tau_1) + \lambda F_2^{-1}(\tau_2)))\,d\tau_2 \equiv \alpha_1 + \delta F_1^{-1}(\tau_1) + \delta\lambda\mu_2.$

Averaging again, this time with respect to $\tau_1$ yields the mean treatment effect

$\pi_1 = \int_0^1 (\alpha_1 + \delta F_1^{-1}(\tau_1) + \delta\lambda\mu_2)\,d\tau_1 \equiv \alpha_1 + \delta\mu_1 + \delta\lambda\mu_2.$

This mean treatment effect would be what is estimated by the two-stage least-squares estimator in the pure location shift ($\delta = 0$) version of the model, but when the effects are more heterogeneous as in this location-scale shift model the structural quantile treatment effect $\pi_1(\tau_1, \tau_2)$ represents a deconstruction of the mean effect into its elementary components. Fig. 1 illustrates the three versions of the treatment effect $\pi_1(\tau_1, \tau_2)$, $\pi_1(\tau_1)$, and $\pi_1$ for a particular parametric instance of model (2.6)–(2.7).
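The two averaging steps can be reproduced numerically for the parametric instance used in Fig. 1, $(\alpha_1, \delta, \lambda) = (10, 3, 2)$ with $\nu_1 \sim N(0, 1)$ and $\nu_2 \sim N(0, 0.5)$. The sketch below uses a midpoint rule on (0, 1); reading 0.5 as a variance is an assumption (the result does not depend on it, since only the mean of $\nu_2$ enters), and the function names are hypothetical.

```python
from statistics import NormalDist

alpha1, delta, lam = 10.0, 3.0, 2.0
F1 = NormalDist(mu=0.0, sigma=1.0)
F2 = NormalDist(mu=0.0, sigma=0.5 ** 0.5)  # assumption: 0.5 is the variance


def pi(tau1, tau2):
    """Structural quantile treatment effect pi_1(tau1, tau2)."""
    return alpha1 + delta * (F1.inv_cdf(tau1) + lam * F2.inv_cdf(tau2))


m = 200
grid = [(k + 0.5) / m for k in range(m)]   # midpoint rule nodes on (0, 1)


def mean_quantile_effect(tau1):
    """pi_1(tau1): integrate tau2 out of pi_1(tau1, tau2)."""
    return sum(pi(tau1, t2) for t2 in grid) / m


# Mean treatment effect: integrate tau1 out as well.
mean_effect = sum(mean_quantile_effect(t1) for t1 in grid) / m

print(mean_quantile_effect(0.1), mean_quantile_effect(0.5), mean_quantile_effect(0.9))
print(mean_effect)
```

Since both disturbances have mean zero, the numerical integrals reduce to $10 + 3F_1^{-1}(\tau_1)$ and 10, matching the closed forms quoted in the caption of Fig. 1.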

2.2. Estimation of structural quantile treatment effects

In this section, we will describe two general classes of estimators for the parametric recursive structural model

$Y_{i1} = \varphi_1(Y_{i2}, x_i, \nu_{i1}, \nu_{i2}; \alpha)$, (2.10)

$Y_{i2} = \varphi_2(z_i, x_i, \nu_{i2}; \beta)$. (2.11)

We will maintain our assumptions on the $\nu_{ij}$’s and the functions $\varphi_1$ and $\varphi_2$ and we will explicitly assume that the functions $\varphi_1$ and $\varphi_2$ are known up to the finite-

[Figure: three surface plots over (tau1, tau2), each axis running from 0.2 to 0.8, with vertical range −5 to 25. Panel titles: "Mean Treatment Effect", "Mean Quantile Treatment Effect", "Quantile Treatment Effect".]

Fig. 1. Quantile treatment effects for the structural model: the figure illustrates three different notions of the structural treatment effect for the linear location-scale structural equation model (2.6)–(2.7) with $(\alpha_1, \alpha_2, \delta, \lambda) = (10, 4, 3, 2)$, $(\beta_1, \beta_2, \gamma) = (1, 2, 3)$, $\nu_1 \sim N(0, 1)$, $\nu_2 \sim N(0, 0.5)$. The left figure depicts $\pi_1 = 10$, the mean treatment effect; the middle figure shows $\pi_1(\tau_1) = 10 + 3F_1^{-1}(\tau_1)$, the mean quantile treatment effect; the right figure shows $\pi_1(\tau_1, \tau_2) = 10 + 3(F_1^{-1}(\tau_1) + 2F_2^{-1}(\tau_2))$, the general quantile treatment effect.

L. Ma, R. Koenker / Journal of Econometrics 134 (2006) 471–506 477


Effect of the Changes in MSLA on the Distribution of

Years of Education

Males τA = 0.10 τA = 0.30 τA = 0.50 τA = 0.70 τA = 0.90

Coeff. .354∗∗∗ .056∗∗∗ .120∗∗∗ .078∗∗ .026

F-test 2146.6 19.1 307.6 4.86 0.13

(p-val.) (.000) (.000) (.000) (.027) (.714)

Females τA = 0.10 τA = 0.30 τA = 0.50 τA = 0.70 τA = 0.90

Coeff. .416∗∗∗ .284∗∗∗ .072∗∗∗ .219∗∗∗ .135∗∗∗

F-test 643.8 195.4 88.7 57.4 4.26

(p-val.) (.000) (.000) (.000) (.000) (0.039)

τA denotes quantiles of the years of schooling distribution.

Three stars and two stars denote coefficients statistically significant at the 1% and 5% levels, respectively.

π(τA, τU), τA = 0.3, Males

[Figure: marginal return to schooling with approximate 95% CI. Horizontal axis: quantile of labour market fortune/wages (.1 to .9). Vertical axis: .03 to .08.]

Evidence of heterogeneity in returns, which tend to decrease as one moves from the bottom to the top deciles of the distribution of ln W. Similar pattern for other values of τA.

π(τA, τU), τA = 0.3, Females

[Figure: marginal return to schooling with approximate 95% CI. Horizontal axis: quantile of labour market fortune/wages (.1 to .9). Vertical axis: .06 to .1.]

Evidence of heterogeneity in returns, which tend to decrease as one moves from the bottom to the top deciles of the distribution of ln W. Similar pattern for other values of τA.

π(τA, τU), τU = 0.5, Males

[Figure: marginal return to schooling. Horizontal axis: quantile of ability/schooling (.1 to .9). Vertical axis: .03 to .07.]

Evidence of heterogeneity in returns, which tend to decrease as one moves from the lower to the higher levels of A. Similar pattern for other values of τU.

π(τA, τU), τU = 0.5, Females

[Figure: marginal return to schooling. Horizontal axis: quantile of ability/schooling (.1 to .9). Vertical axis: .05 to .09.]

Evidence of heterogeneity in returns, which tend to decrease as one moves from the lower to the higher levels of A. Similar pattern for other values of τU.

π(τA, τU) = β + λQA(τA) + φQU(τU)

                      Males                       Females
                (1)           (2)           (3)           (4)

β             0.051∗∗∗      0.050∗∗∗      0.070∗∗∗      0.072∗∗∗
             (.0015)       (.0026)       (.0009)       (.0013)

λ            −0.0021∗∗∗    −0.0022∗∗∗    −0.0025∗∗∗    −0.0021∗∗∗
             (.0004)       (.0008)       (.0003)       (.0005)

φ            −0.0089∗∗∗    −0.013∗∗∗     −0.0091∗∗∗    −0.0119∗∗∗
             (.003)        (.0032)       (.0017)       (.0016)

R Squared     0.680         0.692         0.856         0.836

Standard errors in parentheses. Col. (1) and (3) are estimates based on the 25 estimated returns π(τA, τU), τA, τU ∈ {0.1, 0.3, 0.5, 0.7, 0.9}. Col. (2) and (4) are based on excluding τA, τU ∈ {0.7, 0.9} and retaining 15 estimated returns. The regressors $Q_A(\tau_A) \equiv G_A^{-1}(\tau_A)$ and $Q_U(\tau_U) \equiv G_U^{-1}(\tau_U)$ are computed using the deciles of the ecdf of the 1st stage and 2nd stage residuals.

Implications for Conditional Wage Inequality

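$\delta_{\tau_2 - \tau_1} \equiv \frac{\partial Q_Y(\tau_2, X, S, A)}{\partial S} - \frac{\partial Q_Y(\tau_1, X, S, A)}{\partial S} \equiv \pi(\tau_A, \tau_2) - \pi(\tau_A, \tau_1)$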

Males δ30−10 δ50−10 δ70−10 δ90−10

τA = 0.10 −0.0165 −0.0193 −0.0198∗∗ −0.0150

τA = 0.30 −0.0149∗∗ −0.0163∗∗ −0.0206∗∗ −0.0122

τA = 0.50 −0.0172∗∗ −0.0187∗∗∗ −0.0233∗∗∗ −0.0196∗∗

Three stars, two stars and one star denote coefficients statistically significant at the 1%, 5%, and 10% levels, respectively (bootstrap; 100 replications).

Implications for Conditional Wage Inequality

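$\delta_{\tau_2 - \tau_1} \equiv \frac{\partial Q_Y(\tau_2, X, S, A)}{\partial S} - \frac{\partial Q_Y(\tau_1, X, S, A)}{\partial S} \equiv \pi(\tau_A, \tau_2) - \pi(\tau_A, \tau_1)$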

Females δ30−10 δ50−10 δ70−10 δ90−10

τA = 0.10 −0.0172∗∗∗ −0.0164∗∗∗ −0.0132∗ −0.0193∗∗

τA = 0.30 −0.0137∗∗ −0.0125∗∗ −0.0108 −0.0136

τA = 0.50 −0.0168∗∗∗ −0.0158∗∗ −0.0140∗∗ −0.0201∗∗

τA = 0.70 −0.0116∗∗ −0.0101∗∗ −0.0074 −0.0077

Three stars, two stars and one star denote coefficients statistically significant at the 1%, 5%, and 10% levels, respectively (bootstrap; 100 replications).

logW S YCOMP Age %Males Nobs

Austria 2.220 12.181 8.767 50.900 0.492 920

Belgium 2.470 14.887 9.782 33.125 0.465 853

Denmark 2.798 13.667 8.030 44.186 0.477 2235

Finland 2.366 15.153 7.511 37.151 0.496 1409

France 2.399 13.410 9.017 47.074 0.525 1293

Germany 2.439 12.127 8.620 45.649 0.590 1690

Greece 2.005 12.929 7.509 38.270 0.562 984

Ireland 2.265 12.356 8.534 39.331 0.574 1260

Italy 2.367 12.556 7.097 49.066 0.590 1762

Netherl. 2.574 14.166 9.445 37.702 0.592 1294

Spain 2.116 11.049 7.099 43.136 0.626 2284

Sweden 2.328 12.197 8.465 50.410 0.480 2344


Effect of the Changes in MSLA on the Distribution of Years of Education

[Figure: Synthetic Europe-12, Females. Estimated quantiles of years of schooling (vertical axis, 16 to 24) against the first-stage τ (horizontal axis, 0.2 to 0.8), for low ycomp (6 years) and high ycomp (8 years).]

Blue solid line: 8 years of compulsory schooling; red dashed line: 6 years of compulsory schooling.

Effect of the Changes in MSLA on the Distribution of Years of Education

[Figure: Synthetic Europe-12, Males. Estimated quantiles of years of schooling (vertical axis, 20 to 35) against the first-stage τ (horizontal axis, 0.2 to 0.8), for low ycomp (6 years) and high ycomp (8 years).]

Blue solid line: 8 years of compulsory schooling; red dashed line: 6 years of compulsory schooling.

Association between Education and Wages

over the distribution of wages

Coef.(se) τW = 0.10 τW = 0.30 τW = 0.50 τW = 0.70 τW = 0.90

Males .019∗∗∗ .026∗∗∗ .033∗∗∗ .035∗∗∗ .039∗∗∗

(.002) (.001) (.001) (.001) (.002)

Females .027∗∗∗ .037∗∗∗ .043∗∗∗ .050∗∗∗ .051∗∗∗

(.003) (.001) (.001) (.001) (.002)

τW denotes quantiles of the log wage distribution.

Three stars, two stars and one star denote coefficients statistically significant at the 1%, 5%, and 10% levels, respectively.

π(τA, τU) Males

Males        τU = 0.10   τU = 0.30   τU = 0.50   τU = 0.70   τU = 0.90

τA = 0.10      .0748       .0583       .0555       .0550       .0598
              (.004)      (.004)      (.003)      (.004)      (.006)

τA = 0.30      .0625       .0476       .0462       .0420       .0503
              (.007)      (.004)      (.003)      (.005)      (.006)

τA = 0.50      .0665       .0492       .0478       .0432       .0469
              (.006)      (.004)      (.004)      (.004)      (.006)

τA = 0.70      .0486       .0396       .0448       .0411       .0471
              (.006)      (.004)      (.004)      (.004)      (.005)

τA = 0.90      .0468       .0329       .0384       .0332       .0452
              (.006)      (.004)      (.003)      (.004)      (.006)

τU denotes quantiles of the labour market fortune/wages distribution.

τA denotes quantiles of the ability/years of schooling distribution.

Bootstrapped standard errors (100 replications) in parentheses.

All statistically significant at the 1% confidence level.
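The inequality contrasts $\delta_{\tau_2 - \tau_1} = \pi(\tau_A, \tau_2) - \pi(\tau_A, \tau_1)$ reported for males follow directly from the rows of this table. The short sketch below recomputes them from the point estimates above; the variable and function names are chosen only for the example.

```python
# Point estimates pi(tau_A, tau_U) for males, copied from the table above.
taus_u = [0.10, 0.30, 0.50, 0.70, 0.90]
pi_males = {
    0.10: [0.0748, 0.0583, 0.0555, 0.0550, 0.0598],
    0.30: [0.0625, 0.0476, 0.0462, 0.0420, 0.0503],
    0.50: [0.0665, 0.0492, 0.0478, 0.0432, 0.0469],
    0.70: [0.0486, 0.0396, 0.0448, 0.0411, 0.0471],
    0.90: [0.0468, 0.0329, 0.0384, 0.0332, 0.0452],
}


def contrasts(row):
    """delta_{tau - 10}: return at each higher tau_U minus the return at tau_U = 0.10."""
    return [round(value - row[0], 4) for value in row[1:]]


for tau_a, row in pi_males.items():
    print(f"tau_A = {tau_a:.2f}:", contrasts(row))
```

For τA = 0.10 this gives −0.0165, −0.0193, −0.0198 and −0.0150, the values shown in the conditional wage inequality table; small discrepancies in other rows reflect rounding of the published estimates.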


π(τA, τU) Females

Females      τU = 0.10   τU = 0.30   τU = 0.50   τU = 0.70   τU = 0.90

τA = 0.10      .0952       .0780       .0788       .0820       .0759
              (.007)      (.004)      (.004)      (.005)      (.007)

τA = 0.30      .0838       .0701       .0713       .0730       .0702
              (.007)      (.003)      (.003)      (.004)      (.006)

τA = 0.50      .0847       .0679       .0690       .0707       .0646
              (.006)      (.004)      (.003)      (.004)      (.006)

τA = 0.70      .0689       .0573       .0588       .0615       .0612
              (.005)      (.003)      (.003)      (.003)      (.005)

τA = 0.90      .0631       .0502       .0527       .0555       .0567
              (.006)      (.003)      (.003)      (.004)      (.006)

τU denotes quantiles of the labour market fortune/wages distribution.

τA denotes quantiles of the ability/years of schooling distribution.

Bootstrapped standard errors (100 replications) in parentheses.

All statistically significant at the 1% confidence level.


Another example: class size (D) and students’ achievement (Y )

Conventional wisdom: “class size reduction is a viable means to increase

scholastic achievement”

Issue: is there evidence supporting the claim?

Coleman Report (1966): schooling inputs have negligible effects on students’

achievement

Hanushek (1986): mixed findings; modest positive effects of class size

reduction

Krueger (1997): different effects on boys & girls, blacks & whites, inner-city

& out-of-city students

Lazear (2001): theoretical framework in which optimal class size wrt

scholastic achievement differs btw students that behave well or not

Recent research stresses the role of cheating and class composition to explain

mixed evidence and small class size effects



Example: class size (D) and students’ achievement (Y ) cont’d

Levin’s (2001) findings: mixed, no strong evidence of heterogeneity wrt

students’ achievement (standard QR); positive effects at

low achievement levels (2SLAD) → ‘peer effects’,

‘targeted instruction’; evidence of non random selection

of less (more) able students to larger (smaller) classes;

composition of the class (IQ) matters more than size, the effect

is smaller as one moves up in the achievement distribution

Ma & Koenker’s (2006) findings: negligible effects for the average student;

positive effects on language performance and negative

effects for math perf. for lower attainment students;

negative effects on language performance for high attainment

students, negligible for math (causal chain model)

Remark: Levin and Ma & Koenker use the same data but different models.



Levin and Ma & Koenker Study: Some Details

Data: 1st wave from a longitudinal survey with info on Dutch

pupils enrolled in grades 2, 4 (aged 7-8), 6 (aged 9-10), 8 (aged

11-12) in 1994-1995. Variables: test scores of students

wrt intelligence, reading abilities, language (Dutch), mathematics;

background data (parents & teachers); detailed (administrative)

school level data. Sample size: 57,000 pupils; 700 schools

Instrument: weighted school enrollment (WSE), the parameter

according to which the Ministry allocates funding to schools.

The funding determines the number of teachers hired.

WSE is a weighted average of total school enrollment, weighted by the socio-

economic status of the students enrolled in the school.



Levin and Ma & Koenker Study: the IV

$z_i \equiv WSE_i = 1.03 \max\{\sum_{j=1}^{n_i} s_{ij} - 0.9 n_i,\ n_i\}$

$n_i$ enrollment, $i$ school, $j$ student; $s_{ij} \in \{1\ (\text{ref.}), 1.25, 1.4, 1.7, 1.9\ (\text{worst})\}$

Z varies between schools, not within

Z is distinct from school size (Ma & Koenker, 2006)

Class size has more variability in bigger schools but does not increase with

school size (Ma & Koenker, 2006)



Levin, 2001 Empirical Economics, Math

[Table from Levin (2001): class size estimates for math scores, reported by quantile. The numerical entries are not recoverable from this extraction.]

Levin, 2001 EE, Language

[Table from Levin (2001): class size estimates for language scores, reported by quantile. The numerical entries are not recoverable from this extraction.]

Ma & Koenker, 2006 Journal of Econometrics

language and math scores is provided in Figs. 5 and 6, respectively. In the left panel we depict the conventional two-stage least-square estimate of the mean shift effect of class size viewed as a constant function of τ1 and τ2. In the middle panel we show what we have called the mean quantile treatment effect obtained by integrating out the τ2 effect from the WAD estimate, δ(τ1, τ2), of the structural class-size effect. In the right panel we present δ(τ1, τ2).

The two-stage least-square estimate of the class-size effect is −0.07 with a standard error of 0.20, a finding consistent with many other unsuccessful attempts to discern a significant effect of class size. However, our estimates of the mean quantile treatment effect of class size in the middle panel reveals a somewhat more nuanced view. Both math and language plots show a positive effect of around 0.7 at low quantiles and falling gradually to about −0.5 at the upper quantiles, suggesting that poorer students benefit from larger classes, while better students do better in smaller classes. Further disaggregating, the plots in the right panel indicate dispersion in the class-size effect in both the τ1 and τ2 directions, but the picture is roughly similar: positive effects at the lower quantiles of test scores, and negative effects at the upper

[Figure: three panels plotted over τ1 and τ2, with vertical axes δ, δ(τ1) and δ(τ1, τ2)]

Fig. 5. Structural class-size effects for language: τ1: student achievement, τ2: class size.

[Figure: three panels plotted over τ1 and τ2, with vertical axes δ, δ(τ1) and δ(τ1, τ2)]

Fig. 6. Structural class-size effects for math: τ1: student achievement, τ2: class size.

L. Ma, R. Koenker / Journal of Econometrics 134 (2006) 471–506, p. 498
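
The step from the right panel to the middle panel, integrating the second quantile index out of δ(τ1, τ2), can be sketched numerically. The surface `delta` below is a hypothetical linear function chosen only to mimic the qualitative pattern described in the excerpt (positive at low achievement quantiles, negative at high ones); it is not the published WAD estimate, and all function names are assumptions made for illustration.

```python
from typing import List


def delta(tau1: float, tau2: float) -> float:
    """Hypothetical structural effect surface (not the published estimates)."""
    return 0.7 - 1.2 * tau1 + 0.4 * (tau2 - 0.5)


def midpoint_grid(n: int) -> List[float]:
    """n equally spaced midpoints in (0, 1)."""
    return [(k + 0.5) / n for k in range(n)]


def mean_qte(tau1: float, n: int = 200) -> float:
    """Average delta(tau1, tau2) over tau2 with a midpoint rule."""
    return sum(delta(tau1, t2) for t2 in midpoint_grid(n)) / n


def mean_effect(n: int = 200) -> float:
    """Average the mean QTE over tau1 as well, leaving a single number."""
    return sum(mean_qte(t1, n) for t1 in midpoint_grid(n)) / n


low, high, overall = mean_qte(0.1), mean_qte(0.9), mean_effect()
```

In this toy surface the effect is 0.58 at the 0.1 quantile and −0.38 at the 0.9 quantile, while the single averaged number is 0.1: collapsing both indices hides the sign change that the middle and right panels are designed to show.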



Ma & Koenker, 2006


For low achievers, larger classes improve language performance; the effect on math is smaller

For average students, no class size effects

For high achievers, smaller classes better for language; no effects on math



Back to Theory (with this Example in Mind)

In the example, Y (test scores, achievement), D (class size), Z (WSE) are

continuous variables

The LATE approach does not apply

The IV-QTE model requires that, conditional on Z, X, the relative position

of an individual in the achievement distribution is not affected by class size:

the ‘best’ student in a small class is the ‘best’ student in a big class.

LATE and IV-QTE focus on the QTEs and at most reveal the

heterogeneity of the impact of class size at different levels of achievement.
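
The rank condition above (the 'best' student in a small class is the 'best' student in a big class) can be made concrete with a small simulation. This is a sketch under loud assumptions: the structural function `potential_score`, its coefficients and the class-size deviations of ±5 are all hypothetical, and the instrument Z and controls X are left out entirely. Each student carries a latent rank u, and the score is increasing in u at both class sizes, so the ordering of students cannot change.

```python
import random


def potential_score(class_size_dev: float, rank: float) -> float:
    """Hypothetical structural function q(d, u), increasing in u for |d| <= 5."""
    return 50.0 + 10.0 * rank + class_size_dev * (0.7 - 1.2 * rank)


# Latent ranks of 1000 hypothetical students, evenly spread over (0, 1)
# and shuffled so that student index carries no information about rank.
rng = random.Random(0)
ranks = [(k + 0.5) / 1000 for k in range(1000)]
rng.shuffle(ranks)

small = [potential_score(-5.0, u) for u in ranks]  # everyone in a small class
large = [potential_score(+5.0, u) for u in ranks]  # everyone in a large class


def ordering(scores):
    """Indices of students sorted from lowest to highest score."""
    return sorted(range(len(scores)), key=scores.__getitem__)


same_ordering = ordering(small) == ordering(large)
best_small = max(range(len(small)), key=small.__getitem__)
best_large = max(range(len(large)), key=large.__getitem__)
```

Here `same_ordering` is true and `best_small` equals `best_large`, even though the effect of class size differs in sign between low-rank and high-rank students. Heterogeneous effects and unchanged ranks are therefore compatible; what the condition rules out is class size reshuffling students within the achievement distribution.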



Levin vs Ma & Koenker Research Questions

Levin: ‘how does mean class size affect the distribution of academic outcomes?’

Addressing Levin’s question may reveal heterogeneity of class size

effects over the distribution of students’ achievement but

cannot reveal heterogeneity wrt the distribution of class size

Ma & Koenker: ‘Is there any heterogeneity of the class size effect over the

distribution of academic outcomes and the distribution of class

sizes?’

In any case we look at effects on the conditional distribution of the outcome

Sometimes you may need controls for identification but the research question is on

the marginal distribution . . .

This can be done, but we did not cover the tools to address those issues here



References

Brunello et al. (2009) ‘Changes in Compulsory Schooling, Education and the Distribution of Wages in Europe’, Economic Journal, Vol. 119, pp. 516-539

Chesher, A. (2003) ‘Identification in Nonseparable Models’, Econometrica, Vol. 71, pp. 1405-1441 and the 2001 WP version of the paper!

Chesher, A. (2005) ‘Nonparametric Identification under Discrete Variation’, Econometrica, Vol. 73 (5), pp. 1525-1550.

Koenker, R. (2005) Chapter 8 (Section 8.8)

Levin (2001) ‘For Whom the Reductions Count: A Quantile Regression Analysis of Class Size and Peer Effects on Scholastic Achievement’, Empirical Economics, Vol. 26, pp. 221-246

Ma and Koenker (2006) ‘Quantile Regression Methods for Recursive Structural Equation Models’, Journal of Econometrics, Vol. 134 (2), pp. 471-506

Materials for this lecture are also based on lectures by R. Spady at EUI (2006) and the talk by A.

Chesher at the 9th World Congress of the Econometric Society

