
CELOSO | LIBRES | MARCELINO | RIGODON | SAMILEY

QUANTILE REGRESSION

Motivation: Linear Regression Modeling and Its Shortcomings

Recall: Ordinary Least Squares Model

Note:

A fundamental aspect of linear-regression models is that they attempt to describe how the location of

the conditional distribution behaves by utilizing the mean of a distribution to represent its central

tendency.

The OLS model invokes a homoscedasticity assumption; that is, the conditional variance, Var(y|x), is assumed to be a constant σ² for all values of the covariate.

A third distinctive feature of the OLS is its normality assumption.

Outliers (cases that do not follow the relationship for the majority of the data) tend to have undue

influence on the fitted regression line.

Consider an extreme situation:

Note: These results show that the linear-regression model (LRM) approach can be inadequate for a variety of reasons, including heteroscedasticity, sensitivity to outliers, and the failure to detect multiple forms of shape shifts.


Ordinary Least Squares Versus Quantile Regression Model

Aspect | Ordinary Least Squares | Quantile Regression Model
objective function | sum of squared residuals | sum of asymmetrically weighted absolute residuals
estimates | conditional mean functions | conditional quantile functions, such as conditional median functions
allows heteroskedasticity? | no | yes
distributional assumptions | normality and homoskedasticity of error terms | none
comprehensiveness | only yields information about the conditional mean E(Y|X) | yields information about the whole conditional distribution of Y

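    . regress income ed white

          Source |       SS       df       MS              Number of obs =   22624
    -------------+------------------------------           F(  2, 22621) = 2235.92
           Model |  7.7702e+12     2  3.8851e+12           Prob > F      =  0.0000
        Residual |  3.9306e+13 22621  1.7376e+09           R-squared     =  0.1651
    -------------+------------------------------           Adj R-squared =  0.1650
           Total |  4.7076e+13 22623  2.0809e+09           Root MSE      =   41684

    ------------------------------------------------------------------------------
          income |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              ed |   6313.654   100.8045    62.63   0.000      6116.07    6511.237
           white |   11451.75   799.8409    14.32   0.000      9884.01     13019.5
           _cons |  -42655.95   1442.537   -29.57   0.000    -45483.42   -39828.47
    ------------------------------------------------------------------------------

    . estat hettest, normal

    Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
             Ho: Constant variance
             Variables: fitted values of income

             chi2(1)      =  5180.30
             Prob > chi2  =   0.0000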


Quantile Regression Model

Proposed by Koenker and Bassett (1978), quantile regression models conditional quantiles as functions of

predictors. It estimates the effect of a covariate on various quantiles in the conditional distribution.

Quantile Regression Estimation

In Quantile Regression, the distance of points from a line is measured using a weighted sum of vertical

distances (without squaring):

● points below the fitted line are given a weight 1-p;

● points above the fitted line are given a weight p.

Each choice for this proportion p gives rise to a different fitted conditional-quantile function. The task is to find

an estimator with the desired property for each possible p.

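The quantile regression model is described by the following equation:

\[ y_i = x_i^{\top}\beta^{(p)} + \varepsilon_i^{(p)}, \]

where \( \beta^{(p)} \) is the vector of unknown parameters associated with the pth quantile, so that the pth conditional quantile of \( y_i \) given \( x_i \) is \( x_i^{\top}\beta^{(p)} \).

We minimize an asymmetric loss function given by:

\[ \sum_{i:\, y_i \ge x_i^{\top}\beta} p\,\lvert y_i - x_i^{\top}\beta \rvert \;+\; \sum_{i:\, y_i < x_i^{\top}\beta} (1-p)\,\lvert y_i - x_i^{\top}\beta \rvert . \]

The regression estimator can be solved using linear programming, yielding

\[ \hat{\beta}^{(p)} = \arg\min_{\beta} \sum_{i=1}^{n} \rho_p\!\left( y_i - x_i^{\top}\beta \right), \]

where the loss function is defined as

\[ \rho_p(u) = u\,\bigl( p - I(u < 0) \bigr). \]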

The following are the existing algorithms to obtain the regression estimator:

Simplex Method - for moderate data size

Interior Point Method - for large data size

Interior Point Method with Preprocessing - for very large data sets (n > 10^5)

Smoothing Method
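To make the weighting scheme and the loss function concrete, here is a minimal Python sketch for a one-covariate model. It is not one of the algorithms listed above: it relies on the linear-programming fact that some minimizer passes through two data points, and simply tries every such line, which is only practical for very small samples. The data and the function names (`check_loss`, `total_loss`, `fit_quantile_line`) are hypothetical and chosen for illustration.

```python
from itertools import combinations


def check_loss(u, p):
    """Asymmetric loss: weight p for points above the line, 1 - p for points below."""
    return p * u if u >= 0 else (p - 1) * u


def total_loss(b0, b1, xs, ys, p):
    """Sum of weighted vertical distances from the line b0 + b1 * x."""
    return sum(check_loss(y - (b0 + b1 * x), p) for x, y in zip(xs, ys))


def fit_quantile_line(xs, ys, p):
    """Brute-force search over all lines through two data points (illustration only)."""
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical line is not a regression line
        b1 = (ys[j] - ys[i]) / (xs[j] - xs[i])
        b0 = ys[i] - b1 * xs[i]
        loss = total_loss(b0, b1, xs, ys, p)
        if best is None or loss < best[0]:
            best = (loss, b0, b1)
    return best[1], best[2]


# Hypothetical data: four points on the line y = x plus one large outlier.
xs = [1, 2, 3, 4, 5]
ys = [1, 2, 3, 4, 50]

print(fit_quantile_line(xs, ys, 0.5))  # (0.0, 1.0): the median fit ignores the outlier
print(fit_quantile_line(xs, ys, 0.9))  # (-11.25, 12.25): the 0.9 fit is pulled toward it
```

Changing p changes only the weights in `check_loss`, which is why each choice of p gives rise to a different fitted line.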

Properties of Quantile Regression Estimators

1. Scale Equivariant

2. Regression Shift Equivariant

3. Equivariant to Reparametrization of Design

4. Equivariant to Monotone Transformation
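Stated in symbols, the four properties can be sketched as follows. The notation \( \hat\beta(p;\, y, X) \) for the pth quantile regression estimate computed from response y and design matrix X, and \( Q_Y(p \mid x) \) for the pth conditional quantile of Y, is an assumption introduced here and does not come from the handout.

```latex
\begin{align*}
\text{1. Scale:} \quad
  & \hat\beta(p;\, a y, X) = a\,\hat\beta(p;\, y, X), \qquad
    \hat\beta(p;\, -a y, X) = -a\,\hat\beta(1-p;\, y, X), \quad a > 0 \\
\text{2. Regression shift:} \quad
  & \hat\beta(p;\, y + X\gamma, X) = \hat\beta(p;\, y, X) + \gamma \\
\text{3. Reparametrization of design:} \quad
  & \hat\beta(p;\, y, XA) = A^{-1}\hat\beta(p;\, y, X), \quad A \text{ nonsingular} \\
\text{4. Monotone transformation:} \quad
  & Q_{h(Y)}(p \mid x) = h\bigl(Q_Y(p \mid x)\bigr), \quad h \text{ nondecreasing}
\end{align*}
```

The fourth property has no counterpart for the mean: in general \( E[h(Y) \mid x] \neq h(E[Y \mid x]) \), so a model fitted to, say, log income cannot be converted back to the mean of income by simply exponentiating, whereas conditional quantiles can.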


Inference in Quantile Regression

Methods of Constructing Confidence Intervals

1. Sparsity

- based on the asymptotic distribution of the estimator \( \hat\beta^{(p)} \): the asymptotic dispersion matrix involves the reciprocal of the density function of the error terms
- this reciprocal is called the sparsity function, and it must be estimated before confidence intervals can be constructed
- yields different estimates for the case of i.i.d. error terms and for the case of non-i.i.d. error terms

2. Inversion of Rank Tests

- generalization of sign tests

- based on the relationship between order statistics and rank scores

- involves linear programming (simplex method)

- computationally burdensome for large data sets

3. Bootstrap (Resampling)

- does not make use of any distributional assumption

- the number of resamples, M, is usually between 50 and 200
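The resampling idea can be sketched in Python for the simplest possible case, the intercept-only median regression, where the estimate is just the sample median. This is an illustration under stated assumptions rather than the procedure any package uses: the data are hypothetical, the function name `bootstrap_median` is made up, and the percentile interval is only one common way of turning resampled estimates into a confidence interval. For a model with covariates, whole (x, y) rows would be resampled and the model refitted on each resample.

```python
import random
import statistics


def bootstrap_median(data, n_resamples=200, seed=0):
    """Bootstrap standard error and 95% percentile interval for the sample median."""
    rng = random.Random(seed)          # fixed seed so the sketch is reproducible
    n = len(data)
    estimates = sorted(
        statistics.median(rng.choices(data, k=n))   # resample with replacement
        for _ in range(n_resamples)
    )
    std_err = statistics.stdev(estimates)
    k = n_resamples * 25 // 1000       # estimates trimmed from each tail (2.5%)
    return std_err, (estimates[k], estimates[n_resamples - 1 - k])


# Hypothetical right-skewed incomes (in thousands).
incomes = [12, 15, 18, 21, 22, 25, 31, 38, 47, 95]
se, (lower, upper) = bootstrap_median(incomes)
print("bootstrap SE:", se)
print("95% percentile interval:", (lower, upper))
```

No distributional assumption enters the calculation; the only tuning choice is the number of resamples, set here to 200, the upper end of the range quoted above.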

Recommendation:

Let n be the number of observations and k be the number of parameters.

n ≤ 1000 and k ≤ 10: Inversion of Rank Tests
10^4 < nk < 2 × 10^6: Bootstrap
very large data sets: Sparsity

Tests for Significance of Coefficients

1. Wald Test

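Ho: \( \beta_1^{(p)} = 0 \), where \( \beta_1^{(p)} \) is a subset of the parameters

Ha: at least one parameter ≠ 0

Test statistic:

\[ W = \hat\beta_1^{(p)\top}\, \hat\Sigma^{-1}\, \hat\beta_1^{(p)}, \]

where \( \hat\Sigma \) is an estimator of the dispersion matrix of \( \hat\beta_1^{(p)} \).

Under Ho, the test statistic is asymptotically distributed as \( \chi^2 \) with degrees of freedom equal to the number of parameters in \( \beta_1^{(p)} \).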


2. Likelihood Ratio Test

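Ho: \( \beta_1^{(p)} = 0 \), where \( \beta_1^{(p)} \) is a subset of the parameters

Ha: at least one parameter ≠ 0

Test statistic:

\[ LR = \frac{2\,\bigl[ D_0(p) - D_1(p) \bigr]}{p\,(1-p)\,\hat s(p)}, \]

where \( D_0(p) \) and \( D_1(p) \) are the minimized sums of weighted absolute residuals for the restricted and unrestricted models, respectively, and \( \hat s(p) \) is the estimated sparsity function.

Under Ho, the test statistic is asymptotically distributed as \( \chi^2 \) with degrees of freedom equal to the number of parameters in \( \beta_1^{(p)} \).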

Remark: Koenker and Machado (1999) prove that these two tests are asymptotically equivalent.

Test for Equality of Coefficients Across Quantiles

Let p and q be distinct quantiles.

Case 1: Single Coefficient

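Ho: \( \beta_j^{(p)} = \beta_j^{(q)} \)

Ha: \( \beta_j^{(p)} \neq \beta_j^{(q)} \)

Test statistic:

\[ W = \frac{\bigl( \hat\beta_j^{(p)} - \hat\beta_j^{(q)} \bigr)^2}{\hat\sigma^2_{\hat\beta_j^{(p)} - \hat\beta_j^{(q)}}}, \]

where \( \hat\sigma^2_{\hat\beta_j^{(p)} - \hat\beta_j^{(q)}} \) is the estimated variance of \( \hat\beta_j^{(p)} - \hat\beta_j^{(q)} \).

Under Ho, the test statistic is asymptotically distributed as \( \chi^2 \) with 1 degree of freedom.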
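As a rough numerical check of the single-coefficient statistic, the sketch below plugs in the ED coefficients and bootstrap standard errors at the .10th and .90th quantiles from the `sqreg` output of Example 1 later in this handout. The covariance between the two estimates is not printed in that output, so it is set to zero here; that is an assumption, and the function name `equality_wald` is hypothetical. Under that assumption the statistic comes out near 781, close to the F of 780.16 that Stata reports for `test [q10]ed=[q90]ed`.

```python
import math


def equality_wald(b_p, b_q, se_p, se_q, cov_pq=0.0):
    """Wald statistic for Ho: beta(p) = beta(q), with its chi-square(1) p-value."""
    var_diff = se_p ** 2 + se_q ** 2 - 2.0 * cov_pq   # variance of the difference
    stat = (b_p - b_q) ** 2 / var_diff
    # Upper tail of chi-square with 1 df: P(Z^2 > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# ED coefficient and bootstrap SE at the .10th and .90th quantiles (Example 1);
# the zero covariance is an assumption because it is not reported.
stat, p_value = equality_wald(1782.333, 8278.88, 59.18355, 224.7802)
print(round(stat, 1), p_value)
```

A nonzero covariance would change the denominator, so this is a plausibility check rather than a reproduction of Stata's computation.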


Case 2: Multiple Coefficients

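Ho: \( \beta^{(p)} = \beta^{(q)} \)

Ha: \( \beta^{(p)} \neq \beta^{(q)} \)

Test statistic:

\[ W = \bigl( \hat\beta^{(p)} - \hat\beta^{(q)} \bigr)^{\top}\, \hat\Sigma^{-1}\, \bigl( \hat\beta^{(p)} - \hat\beta^{(q)} \bigr), \]

where \( \hat\Sigma \) is the estimated covariance matrix for \( \hat\beta^{(p)} - \hat\beta^{(q)} \).

Under Ho, the test statistic is asymptotically distributed as \( \chi^2 \) with degrees of freedom equal to the number of parameters specified in Ho.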

Goodness of Fit

Recall:

In ordinary least squares, the goodness of fit is measured by R², the coefficient of determination. It is interpreted as the proportion of the variation in the dependent variable explained by the predictor variables in the model.

An analog of the R² statistic is developed for quantile-regression models. Since quantile-regression models are based on minimizing a sum of weighted distances, with different weights used depending on whether \( y_i > \hat y_i \) or \( y_i < \hat y_i \), goodness of fit is measured in a way that is consistent with this criterion.

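Koenker and Machado (1999) suggest measuring goodness of fit by comparing the sum of weighted distances for the model of interest with the sum in which only the intercept appears. Let \( V^1(p) \) be the sum of weighted distances for the full pth quantile regression model and let \( V^0(p) \) be the sum of weighted distances for the model that includes only a constant term.

In a one-covariate model, for instance, we have

\[ V^1(p) = \sum_{y_i \ge \hat\beta_0^{(p)} + \hat\beta_1^{(p)} x_i} p\,\bigl| y_i - \hat\beta_0^{(p)} - \hat\beta_1^{(p)} x_i \bigr| \;+\; \sum_{y_i < \hat\beta_0^{(p)} + \hat\beta_1^{(p)} x_i} (1-p)\,\bigl| y_i - \hat\beta_0^{(p)} - \hat\beta_1^{(p)} x_i \bigr| \]

and

\[ V^0(p) = \sum_{y_i \ge \hat Q^{(p)}} p\,\bigl| y_i - \hat Q^{(p)} \bigr| \;+\; \sum_{y_i < \hat Q^{(p)}} (1-p)\,\bigl| y_i - \hat Q^{(p)} \bigr| , \]

where \( \hat Q^{(p)} \) is the sample pth quantile of y.

Then, the goodness of fit is defined as

\[ R(p) = 1 - \frac{V^1(p)}{V^0(p)} . \]

Since \( V^0(p) \) and \( V^1(p) \) are nonnegative, R(p) is at most 1. Also, \( V^0(p) \) is greater than or equal to \( V^1(p) \), because the intercept-only model is a special case of the full model, implying that R(p) is greater than or equal to zero. Hence, R(p) lies in [0, 1], with larger R(p) indicating better fit.

R(p) allows for comparison of a fitted model with any number of covariates to the model in which only the intercept is present.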

To extend the concept of R(p), the relative R(p) is introduced. It measures the fit relative to a more restricted form of the model. It can be expressed as

\[ \text{relative } R(p) = 1 - \frac{V^2(p)}{V^1(p)} , \]

where \( V^2(p) \) is the sum of weighted distances for the less restricted pth quantile regression model and \( V^1(p) \) is the sum of weighted distances for the more restricted model.

Stata reports R(p) as its measure of goodness of fit and refers to it as “pseudo-R²”.

Remark:

R(p) accounts for the appropriate weight each observation takes in a specific quantile equation. It is easy to comprehend, and its interpretation follows that of the familiar R-squared for OLS.

Interpretation of Coefficients

In OLS, fitted coefficients can be interpreted as the estimated change in the mean of the response variable

resulting from one unit increase in a continuous covariate.


Similarly, the QRM coefficient estimate is interpreted as the estimated change in the pth quantile of the

response variable corresponding to a unit change in the regressor.

Median-Regression Model

The simplest QRM is the median-regression model (MRM), which expresses the conditional median of a response variable given predictor variables. It is an alternative to OLS, which fits the conditional mean. MRM and OLS both attempt to model the central location of a response variable.

The median-regression model is more suitable for modeling the behavior of a collection of skewed conditional distributions. For instance, if these conditional distributions are skewed to the right, their means reflect what is happening in the upper tail and not in the middle.

Interpretation: In the case of a continuous covariate, the coefficient estimate is interpreted as the change in

the median of the response variable corresponding to a unit change in the predictor.

Using QRM Results to Interpret Shape Shifts

Two of the most important features to consider are scale (spread) and skewness.

The analysis of shape effects reveals more information than the analysis of location effects alone.

Arrays of QRM coefficients for a range of quantiles can be used to determine how a one-unit increase in the covariate affects the shape of the response distribution. This shape shift is highlighted using the graphical method. For a particular covariate, we plot the coefficients and the confidence envelope, with the effect of the predictor variable on the y-axis and the value of p on the x-axis.

Graphical patterns for the effect of a covariate on the response:

1. A horizontal line indicates a pure location shift by a one-unit increase in the covariate.

2. An upward-sloping curve indicates an increase in the scale of the conditional-response distribution: the effect of a one-unit increase in the regressor is positive for all values of p and steadily increasing with p.

3. A downward-sloping curve indicates a decrease in the scale of the conditional-response distribution.

Note that a regressor produces a shape shift of the kind in pattern 2 if \( \hat\beta^{(p)} \) is monotonically increasing with p, that is, \( \hat\beta^{(p)} > \hat\beta^{(q)} \) whenever p > q.


Scale Shifts

The standard deviation is a commonly employed measure of the scale or spread of a symmetric distribution. For skewed distributions, the distances between selected quantiles provide a more informative description of the spread than the standard deviation. For a value of p between 0 and .5, we identify two sample quantiles: Q(1−p) and Q(p) (the pth quantile). The pth interquantile range, IQR(p) = Q(1−p) − Q(p), is a measure of spread. This quantity describes the range of the middle (1−2p) proportion of the distribution.

Suppose the reference group and comparison group have the same median. Fixing some choice of p, we can measure the interquantile ranges IQRr = Ur − Lr and IQRc = Uc − Lc for the reference group and comparison group, respectively. The difference-in-differences IQRc − IQRr serves as a measure of the scale shift.
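The difference IQRc − IQRr can be sketched in Python with the standard library. The two groups below are hypothetical and were chosen to share the same median; the helper name `interquantile_range` is made up, and the "inclusive" interpolation rule of `statistics.quantiles` is just one of several sample-quantile definitions.

```python
import statistics


def interquantile_range(data, pct):
    """IQR(p) = Q(1 - p) - Q(p), with p given as a whole percentage below 50."""
    cuts = statistics.quantiles(data, n=100, method="inclusive")  # 99 percentiles
    return cuts[100 - pct - 1] - cuts[pct - 1]


# Hypothetical groups with the same median (5) but different spread.
reference = [1, 2, 3, 4, 5, 6, 7, 8, 9]
comparison = [-3, -1, 1, 3, 5, 7, 9, 11, 13]

iqr_r = interquantile_range(reference, 25)    # Ur - Lr for the middle 50%
iqr_c = interquantile_range(comparison, 25)   # Uc - Lc for the middle 50%
print(iqr_r, iqr_c, iqr_c - iqr_r)            # 4.0 8.0 4.0
```

A positive difference, as here, means the comparison group is more spread out than the reference group over the middle 50% of the distribution.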

The QRM fits provide an alternative approach to estimating scale-shift effects. Here, \( \hat\beta^{(p)} \) is the fitted coefficient indicating the increase or decrease in the pth quantile brought about by a unit increase in the covariate. Thus, when we increase the covariate by one unit, the corresponding pth interquantile range changes by the amount \( \hat\beta^{(1-p)} - \hat\beta^{(p)} \), which is the scale shift:

\[ SCS(p) = \hat\beta^{(1-p)} - \hat\beta^{(p)}, \qquad 0 < p < .5 . \]

When SCS(p) is zero, there is apparently no evidence of scale change. A negative value indicates that increasing the covariate results in a decrease in scale, while a positive value indicates the opposite effect.

Skewness Shifts

A disproportional scale shift that relates to greater skewness indicates an additional effect on the shape of the response distribution.

Let Mr and Mc indicate the median of the reference and the comparison group, respectively. The upper spread is Ur − Mr for the reference and Uc − Mc for the comparison. The lower spread is Mr − Lr for the reference and Mc − Lc for the comparison. The disproportion can be measured by taking the ratio of (Uc − Mc)/(Ur − Mr) to (Mc − Lc)/(Mr − Lr).

If this “ratio-of-ratios” equals 1, then there is no skewness shift. If the ratio-of-ratios is less than 1, the right-skewness is reduced. If the ratio-of-ratios is greater than 1, the right-skewness is increased. The shift in terms of percentage change is obtained by subtracting 1 from this quantity. The result is known as the skewness shift, or SKS.


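In general, a model-based SKS is obtained using the QRM coefficients together with the conditional quantiles of the reference group. The SKS for the middle 100(1−2p)% of the population is:

\[ SKS(p) = \frac{\bigl( \hat Q^{(1-p)} + \hat\beta^{(1-p)} - \hat Q^{(.5)} - \hat\beta^{(.5)} \bigr) \big/ \bigl( \hat Q^{(1-p)} - \hat Q^{(.5)} \bigr)}{\bigl( \hat Q^{(.5)} + \hat\beta^{(.5)} - \hat Q^{(p)} - \hat\beta^{(p)} \bigr) \big/ \bigl( \hat Q^{(.5)} - \hat Q^{(p)} \bigr)} - 1 , \]

where \( \hat Q^{(p)} \) is the fitted pth conditional quantile of the reference group and \( \hat\beta^{(p)} \) is the fitted coefficient at the pth quantile.

Note that because we take the ratio of two ratios, SKS effectively eliminates the influence of a proportional scale shift. SKS = 0 indicates either no scale shift or a proportional scale shift. SKS < 0 indicates a reduction of right-skewness due to the effect of the explanatory variable, whereas SKS > 0 indicates an exacerbation of right-skewness.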
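The ratio-of-ratios can be sketched in Python by treating the comparison group's quantiles as the reference quantiles plus the corresponding coefficients, so that Uc = Q(1−p) + b(1−p), Mc = Q(.5) + b(.5), and Lc = Q(p) + b(p). All numbers below are hypothetical, and the function name `skewness_shift` is made up for illustration.

```python
def skewness_shift(q_lo, q_mid, q_hi, b_lo, b_mid, b_hi):
    """Ratio of upper-spread change to lower-spread change, minus 1.

    q_*: reference-group quantiles at p, .5 and 1 - p
    b_*: coefficients (quantile shifts) at p, .5 and 1 - p
    """
    upper_ratio = (q_hi + b_hi - q_mid - b_mid) / (q_hi - q_mid)   # (Uc - Mc) / (Ur - Mr)
    lower_ratio = (q_mid + b_mid - q_lo - b_lo) / (q_mid - q_lo)   # (Mc - Lc) / (Mr - Lr)
    return upper_ratio / lower_ratio - 1.0


# Hypothetical right-skewed reference group: Q(.25) = 20, Q(.5) = 40, Q(.75) = 80.
# Every quantile grows by 10%: a proportional scale shift, so no skewness shift.
print(skewness_shift(20, 40, 80, 2, 4, 8))   # 0.0

# The upper quantile grows less than proportionally: right-skewness is reduced.
print(skewness_shift(20, 40, 80, 2, 4, 6) < 0)   # True
```

The first call shows why a proportional scale shift leaves SKS at zero: the upper and lower spreads are stretched by the same factor, and the factor cancels in the ratio.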

Quantile Regression in Stata

Example 1:

income = household income

ed = number of years of education of household head

white = 1 if household head is white, 0 if black

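    . qreg income ed white
    Iteration  1:  WLS sum of weighted deviations =  6.202e+08
    note: alternate solutions exist
    note: alternate solutions exist
    Iteration  8: sum of abs. weighted deviations =  6.018e+08

    Median regression                                    Number of obs =     22624
      Raw sum of deviations 6.68e+08 (about 39977.45)
      Min sum of deviations 6.02e+08                     Pseudo R2     =    0.0985

    ------------------------------------------------------------------------------
          income |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              ed |   4794.333   91.68188    52.29   0.000      4614.63    4974.036
           white |   9792.334   727.3664    13.46   0.000     8366.645    11218.02
           _cons |  -29927.67   1312.101   -22.81   0.000    -32499.47   -27355.86
    ------------------------------------------------------------------------------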


An additional one year of education will increase the median income by about $4,794. The median income of

whites is $9,792 higher than that of the blacks. Both ED and WHITE are significant predictors of INCOME based

on the t-statistics. The coefficient for ED in the MRM is lower than the coefficient in the OLS model ($6,314).

This suggests that while an increase of one year of education gives rise to an average increase of $6,314 in

income, the increase would not be as substantial for most of the population. Similarly, the coefficient for

white in the MRM is lower than the corresponding coefficient in the OLS model ($11,452).
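The Pseudo R2 in the median-regression output above can be checked against the definition R(p) = 1 − V¹(p)/V⁰(p), reading Stata's "Raw sum of deviations" as V⁰(p), the intercept-only sum, and "Min sum of deviations" as V¹(p), the sum for the fitted model. A minimal Python sketch follows; the function name `pseudo_r2` is hypothetical, and the inputs are the rounded values printed in the output.

```python
def pseudo_r2(v1, v0):
    """R(p) = 1 - V1(p) / V0(p): one minus the ratio of fitted to intercept-only loss."""
    return 1.0 - v1 / v0


# Sums of weighted deviations as printed (three significant figures) by qreg.
min_sum = 6.02e8   # fitted median regression, V1(.5)
raw_sum = 6.68e8   # intercept-only model, V0(.5)

print(round(pseudo_r2(min_sum, raw_sum), 4))   # 0.0988
```

The result is close to the reported 0.0985. The small gap is what one would expect from using sums that the output displays to only three significant figures.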

Wald Test of Significance

    . test ed white

     ( 1)  ed = 0
     ( 2)  white = 0

           F(  2, 22621) = 1589.34
                Prob > F =    0.0000

Reject the null hypothesis Ho: \( \beta_{ED} = \beta_{WHITE} = 0 \). There is sufficient evidence to say that ED and WHITE are jointly significant predictors of INCOME.

Quantile Regression Estimates for Income

    p          .05     .10     .20     .25     .30     .40     .50     .60     .70     .75     .80     .90     .95
    ED        1130    1782    2757    3172    3571    4266    4794    5571    6224    6598    6954    8279    9575
    WHITE     3197    4689    6557    6724    7541    8744    9792   11091   11739   12142   12972   14049   17484
    CONS     -7910  -13536  -20721  -22986  -25590  -29104  -29928  -33090  -32909  -32344  -30702  -27562  -22126

We see that one more year of education increases income by $1,130 at the .05th quantile and by $1,782 at the .10th quantile. Examining the estimates of education at the .90th and .95th quantiles, the coefficient for the .95th quantile is $9,575, much larger than at the .90th quantile ($8,279). These results suggest the contribution of prestigious higher education to income disparity.


Test for Equality of Coefficients Across Quantiles

    . sqreg income ed white, quantile(0.1 0.25 0.5 0.75 0.9)
    (fitting base model)
    (bootstrapping ....................)

    Simultaneous quantile regression                     Number of obs =     22624
      bootstrap(20) SEs                                  .10 Pseudo R2 =    0.0441
                                                         .25 Pseudo R2 =    0.0726
                                                         .50 Pseudo R2 =    0.0985
                                                         .75 Pseudo R2 =    0.1141
                                                         .90 Pseudo R2 =    0.1208

    ------------------------------------------------------------------------------
                 |              Bootstrap
          income |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    q10          |
              ed |   1782.333   59.18355    30.12   0.000     1666.329    1898.337
           white |   4688.667   300.6245    15.60   0.000     4099.422    5277.912
           _cons |     -13536   715.7417   -18.91   0.000     -14938.9    -12133.1
    -------------+----------------------------------------------------------------
    q25          |
              ed |   3172.222   45.30373    70.02   0.000     3083.424    3261.021
           white |   6723.666   541.5137    12.42   0.000     5662.262     7785.07
           _cons |  -22985.67   814.5297   -28.22   0.000     -24582.2   -21389.13
    -------------+----------------------------------------------------------------
    q50          |
              ed |   4794.333   51.30182    93.45   0.000     4693.778    4894.888
           white |   9792.334    565.642    17.31   0.000     8683.637    10901.03
           _cons |  -29927.67   570.7646   -52.43   0.000     -31046.4   -28808.93
    -------------+----------------------------------------------------------------
    q75          |
              ed |   6598.182   169.8196    38.85   0.000     6265.324     6931.04
           white |   12141.82   827.9499    14.66   0.000     10518.98    13764.66
           _cons |  -32344.18   1995.658   -16.21   0.000    -36255.81   -28432.55
    -------------+----------------------------------------------------------------
    q90          |
              ed |    8278.88   224.7802    36.83   0.000     7838.295    8719.465
           white |   14049.07   1900.115     7.39   0.000     10324.71    17773.43
           _cons |  -27561.84    3388.43    -8.13   0.000    -34203.39   -20920.28
    ------------------------------------------------------------------------------

Testing for equality of the ED coefficient at the .10th and .90th quantiles:

    . test [q10]ed=[q90]ed

     ( 1)  [q10]ed - [q90]ed = 0

           F(  1, 22621) =  780.16
                Prob > F =    0.0000

Testing for equality of the WHITE coefficient at the .10th and .90th quantiles:

    . test [q10]white=[q90]white

     ( 1)  [q10]white - [q90]white = 0

           F(  1, 22621) =   14.71
                Prob > F =    0.0001

Testing for the joint equality of the ED and WHITE coefficients at the .10th and .90th quantiles:

    . test ([q10]ed=[q90]ed) ([q10]white=[q90]white)

     ( 1)  [q10]ed - [q90]ed = 0
     ( 2)  [q10]white - [q90]white = 0

           F(  2, 22621) =  395.42
                Prob > F =    0.0000

The effect of an additional year of education is different for the lower-income bracket and the higher-income bracket. Likewise, the effect of being white is also different for the lower-income bracket and the higher-income bracket. The joint test is also significant; that is, the effects of an additional year of schooling and of being white at the .10th quantile differ from the effects at the .90th quantile.

Shape Shifts

The effect of ED can be described as the change in the income quantile brought about by one additional year of education, at any level of education, fixing race. The education effect is significantly positive, because the confidence envelope does not cross the horizontal zero line. The graph shows an upward-sloping curve for the effects of education: the effect of one more year of schooling is positive for all values of p and steadily increasing with p. The increase accelerates after the .80th quantile.

[Figure: estimated coefficient of ed (y-axis, 0 to 10000) plotted against the quantile p (x-axis, 0 to 1), with confidence envelope]


The effect of WHITE can be described as the change in the

income quantile brought about by changing the race from

black to white, fixing the education level. The effect of

being white is significantly positive, as the zero line is far

below the confidence envelope. The graph shows an

upward-sloping curve for the effect of being white as

compared with being black. The slopes below the .15th

quantile and above the .90th quantile are steeper than

those at the middle quantiles.

Both coefficient estimates are monotonically increasing with p. This tells us that an additional year of education or changing race from black to white has a greater effect on income for higher-income brackets than for lower-income brackets. The monotonicity also has scale-effect implications: changing race from black to white or adding a year of education increases the scale of the response.

Shape Shifts: Scale Shifts

    pth Interquantile Range    SCS (ED)    SCS (WHITE)
    p = 0.25                       3426           5418
    p = 0.10                       6497           9360
    p = 0.05                       8445          14287

The scale shift brought about by one more year of schooling for the middle 50% of the population is $3,426.

One more year of schooling increases the scale of income by $6,497 for the middle 80% of the population, and

by $8,445 for the middle 90% of the population. Controlling for education, whites’ income spread is higher

than blacks’ income spread by: $5,418 for the middle 50% of the population, $9,360 for the middle 80%, and

$14,287 for the middle 90%.
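The scale shifts in the table above follow from SCS(p) = b(1−p) − b(p) applied to the rounded coefficients in the table "Quantile Regression Estimates for Income". A minimal Python sketch reproduces them; the dictionary layout (keys are quantiles in whole percent) and the function name `scale_shift` are choices made here for illustration.

```python
# Rounded coefficients from "Quantile Regression Estimates for Income",
# keyed by quantile in percent (only the quantiles needed here).
ED = {5: 1130, 10: 1782, 25: 3172, 75: 6598, 90: 8279, 95: 9575}
WHITE = {5: 3197, 10: 4689, 25: 6724, 75: 12142, 90: 14049, 95: 17484}


def scale_shift(coefs, pct):
    """SCS(p): coefficient at the (1 - p)th quantile minus coefficient at the pth."""
    return coefs[100 - pct] - coefs[pct]


for pct in (25, 10, 5):
    print(pct, scale_shift(ED, pct), scale_shift(WHITE, pct))
# 25 3426 5418
# 10 6497 9360
# 5 8445 14287
```

Because both coefficient curves rise with p, every difference is positive, which is the numerical counterpart of the upward-sloping curves described under Shape Shifts.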

[Figure: estimated coefficient of white (y-axis, 0 to 25000) plotted against the quantile p (x-axis, 0 to 1), with confidence envelope]


Shape Shifts: Skewness Shifts

Note: The conditional quantiles used are the values at the “typical setting”.

    Middle 100(1-2p)% of the population    SKS (ED)    SKS (WHITE)
    middle 50% (p=0.25)                      -0.047         -0.087
    middle 80% (p=0.10)                      -0.037         -0.085
    middle 90% (p=0.05)                      -0.016         -0.066

One more year of schooling reduces right-skewness by 1.6% for the middle 90% of the population, 3.7% for

the middle 80% and 4.7% for the middle 50%. The impact of being white also decreases right-skewness by 6.6%

for the middle 90%, 8.5% for the middle 80% and 8.7% for the middle 50%. This finding indicates a greater

expansion of the white upper middle class than the black upper middle class.

Summary:

One more year of education induces a positive location and scale shift but a negative skewness shift. Similarly,

being white induces a positive location and scale shift with a negative skewness shift. The model suggests that

while higher education and being white are associated with a higher median income and a wider income

spread, the income distributions for the less educated and for the blacks are more skewed.


Quantile Regression in SAS

Example 2:

Murders – number of murders per 1,000,000 inhabitants per annum

Inhabitants – number of inhabitants

Income – Percentage of families with incomes below $5000

Unemp – Percentage of unemployed inhabitants

    PROC QUANTREG DATA=sample CI=rank;
        MODEL murders = inhabitants income unemp
              / QUANTILE=0.05 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.95
                PLOT=quantplot;
        TEST inhabitants income unemp / WALD LR;
    RUN;

Note: If we consider all quantiles, the rank option for computing confidence intervals is not available. (You

may use only sparsity and resampling.) Likewise, it is not possible to use Wald and Likelihood Ratio Tests.

Quantile Regression Estimates for Number of Murders

    Quantile        0.05    0.10    0.20    0.25    0.30    0.40    0.50    0.60    0.70     0.75     0.80     0.90     0.95
    Intercept     -58.38  -58.38  -59.91  -39.30  -37.18  -46.68  -67.90  -86.95  -76.14  -103.34  -103.42  -104.40  -164.52
    inhabitants     1.96    1.96    1.88    0.72    0.63    1.22    1.86    3.28    3.05     5.07     5.07     5.03     9.41
    income          0.86    0.86    1.12    1.53    1.34    1.44    1.39    1.26    1.10     1.04     1.04     1.12     1.06
    unemp           4.36    4.36    4.04    2.44    2.76    2.88    5.06    5.46    5.08     5.25     5.26     5.31     5.72

An additional inhabitant will increase the median number of murders by 1.86; a unit increase in the

percentage of families with incomes below $5000 will increase the median number of murders by 1.39; a unit

increase in the percentage of unemployed inhabitants will increase the median number of murders by 5.06.


Wald and Likelihood Ratio Tests

Ho: \( \beta_{inhabitants} = \beta_{income} = \beta_{unemp} = 0 \)

Ha: at least one parameter ≠ 0

Test Results

    Quantile    Test                Test Statistic    DF    Chi-Square    Pr > ChiSq
    0.05        Wald                      955.7538     3        955.75        <.0001
    0.10        Wald                      144.0549     3        144.05        <.0001
    0.10        Likelihood Ratio          309.5411     3        309.54        <.0001
    0.20        Wald                       60.6047     3         60.60        <.0001
    0.30        Wald                       55.0154     3         55.02        <.0001
    0.40        Wald                       32.7730     3         32.77        <.0001
    0.50        Wald                       58.0711     3         58.07        <.0001
    0.60        Wald                       96.7067     3         96.71        <.0001
    0.70        Wald                      139.9484     3        139.95        <.0001
    0.80        Wald                      233.4782     3        233.48        <.0001
    0.90        Wald                     1267.9173     3       1267.92        <.0001
    0.95        Wald                      978.7139     3        978.71        <.0001

For all quantiles in consideration, there is sufficient evidence to conclude that the number of inhabitants, the

percentage of families with incomes below $5000, and the percentage of unemployed inhabitants are jointly

significant predictors of the number of murders.

Test for Equality of Coefficients

PROC QUANTREG DATA = sample CI = rank;

MODEL murders = inhabitants income unemp/quantile = 0.75 0.8;

TEST inhabitants income unemp/qinteract;

RUN;

Test Results: Equal Coefficients Across Quantiles

    Chi-Square    DF    Pr > ChiSq
        0.0056     3        0.9999

Thus, there is insufficient evidence to conclude that the coefficients for the 0.75th and the 0.8th quantiles jointly differ.


Shape Shifts

The effect of inhabitants on the number of murders is only significant from around the 0.5th quantile onwards. The effect of income on the number of murders is only significant until somewhere around the 0.7th quantile. The effect of unemployment on the number of murders is only significant until somewhere around the 0.5th quantile. Thus, income and unemployment significantly affect the lower quantiles of the number of murders, while the number of inhabitants significantly affects the upper quantiles.

Scale Shifts

    pth interquantile range    SCS(inhabitants)    SCS(income)    SCS(unemp)
    p = 0.25                             4.3518        -0.4872        2.8115
    p = 0.10                             3.0699         0.2602        0.9506
    p = 0.05                             7.455          0.2017        1.3628

An additional inhabitant increases the scale of the number of murders by 4.3518 for the middle 50% of the

population, by 3.0699 for the middle 80% of the population, and by 7.455 for the middle 90% of the

population. A unit increase in the percentage of families with incomes below $5000 decreases the scale of the

number of murders by 0.4872 for the middle 50% of the population, while it increases the scale of the number

of murders by 0.2602 for the middle 80% of the population, and by 0.2017 for the middle 90% of the

population. A unit increase in the percentage of unemployed inhabitants increases the scale of the number of

murders by 2.8115 for the middle 50% of the population, by 0.9506 for the middle 80% of the population, and

by 1.3628 for the middle 90% of the population.


Skewness Shifts

    Middle 100(1-2p)% of the population    SKS(inhabitants)    SKS(income)     SKS(unemp)
    middle 50% (p=0.25)                            -0.50852    0.114957637       -0.73607
    middle 80% (p=0.10)                            0.100869   -0.465156696       -0.44369
    middle 90% (p=0.05)                            0.520159   -0.403896185       -0.36112

An additional inhabitant reduces the right-skewness by 50.9% for the middle 50% of the population, while it

increases the right-skewness by 10.1% for the middle 80% and by 52% for the middle 90%. A unit increase in

the percentage of families with incomes below $5000 increases the right-skewness by 11.5% for the middle 50%

of the population, while it reduces the right-skewness by 46.5% for the middle 80% and by 40.4% for the

middle 90%. A unit increase in the percentage of unemployed inhabitants reduces the right-skewness by 73.6%

for the middle 50% of the population, by 44.4% for the middle 80%, and by 36.1% for the middle 90%.
