1 CELOSO | LIBRES | MARCELINO | RIGODON | SAMILEY
QUANTILE REGRESSION
Motivation: Linear Regression Modeling and Its Shortcomings
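Recall: Ordinary Least Squares Model
\( y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \), so that \( E(y_i \mid x_i) = \beta_0 + \beta_1 x_i \)
Note: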
A fundamental aspect of linear-regression models is that they attempt to describe how the location of
the conditional distribution behaves by utilizing the mean of a distribution to represent its central
tendency.
It invokes a homoscedasticity assumption; that is, the conditional variance, Var(y|x), is assumed to be
a constant \( \sigma^2 \) for all values of the covariate.
A third distinctive feature of the OLS is its normality assumption.
Outliers (cases that do not follow the relationship for the majority of the data) tend to have undue
influence on the fitted regression line.
Consider an extreme situation:
Note that these results show that the LRM approach can be inadequate for a variety of reasons, including
violations of the homoscedasticity assumption, the undue influence of outliers, and the failure to detect
multiple forms of shape shifts.
Ordinary Least Squares Versus Quantile Regression Model
Feature | Ordinary Least Squares | Quantile Regression Model
objective function | sum of squared residuals | sum of asymmetrically weighted absolute residuals
estimates | conditional mean functions | conditional quantile functions, such as conditional median functions
allows heteroskedasticity? | no | yes
distributional assumptions | normality and homoskedasticity of error terms | none
comprehensiveness | only yields information about the conditional mean E(Y|X) | yields information about the whole conditional distribution of Y
. regress income ed white

Source      SS            df      MS            Number of obs =   22624
                                                F( 2, 22621)  = 2235.92
Model       7.7702e+12    2       3.8851e+12    Prob > F      =  0.0000
Residual    3.9306e+13    22621   1.7376e+09    R-squared     =  0.1651
                                                Adj R-squared =  0.1650
Total       4.7076e+13    22623   2.0809e+09    Root MSE      =   41684

income      Coef.        Std. Err.    t        P>|t|    [95% Conf. Interval]
ed          6313.654     100.8045     62.63    0.000     6116.07     6511.237
white       11451.75     799.8409     14.32    0.000     9884.01     13019.5
_cons      -42655.95     1442.537    -29.57    0.000    -45483.42   -39828.47

. estat hettest,normal

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: fitted values of income
chi2(1) = 5180.30
Prob > chi2 = 0.0000
Quantile Regression Model
Proposed by Koenker and Bassett (1978), quantile regression models conditional quantiles as functions of
predictors. It estimates the effect of a covariate on various quantiles in the conditional distribution.
Quantile Regression Estimation
In Quantile Regression, the distance of points from a line is measured using a weighted sum of vertical
distances (without squaring):
● points below the fitted line are given a weight 1-p;
● points above the fitted line are given a weight p.
Each choice for this proportion p gives rise to a different fitted conditional-quantile function. The task is to find
an estimator with the desired property for each possible p.
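As a minimal sketch of the weighting scheme described above, the following Python snippet computes the weighted sum of vertical distances for a constant fit and searches the observed values for the constant that minimizes it. The function names `weighted_distance` and `best_constant` and the data are illustrative assumptions, not part of the original handout. With no covariates, the minimizer is a sample pth quantile.

```python
def weighted_distance(y, fitted, p):
    """Weighted sum of vertical distances (no squaring).

    Points on or above the fit get weight p; points below get weight 1 - p.
    """
    total = 0.0
    for yi, fi in zip(y, fitted):
        residual = yi - fi
        weight = p if residual >= 0 else 1.0 - p
        total += weight * abs(residual)
    return total


def best_constant(y, p):
    """Search the observed values for the constant fit with the smallest weighted sum."""
    return min(sorted(y), key=lambda c: weighted_distance(y, [c] * len(y), p))


# Illustrative data (hypothetical)
y = [2.0, 3.0, 5.0, 8.0, 13.0, 21.0, 34.0, 55.0, 89.0]
median_fit = best_constant(y, 0.5)  # the sample median, 13.0
upper_fit = best_constant(y, 0.9)   # the largest value, 89.0, since 0.9 * 9 > 8
```

Changing p moves the minimizer along the ordered sample, which is the sense in which each p gives a different fitted quantile function.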
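The quantile regression model is described by the following equation:
\( y_i = x_i'\beta^{(p)} + \varepsilon_i^{(p)} \), with \( Q^{(p)}(y_i \mid x_i) = x_i'\beta^{(p)} \),
where \( \beta^{(p)} \) is the vector of unknown parameters associated with the pth quantile.
We minimize an asymmetric loss function given by:
\( \sum_{i:\, y_i \ge x_i'\beta^{(p)}} p\,|y_i - x_i'\beta^{(p)}| + \sum_{i:\, y_i < x_i'\beta^{(p)}} (1-p)\,|y_i - x_i'\beta^{(p)}| \)
The regression estimator can be solved using linear programming, yielding
\( \hat\beta^{(p)} = \arg\min_{\beta} \sum_{i=1}^{n} \rho_p(y_i - x_i'\beta) \)
where the loss function is defined as
\( \rho_p(u) = u\,(p - I(u < 0)) \)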
The following are the existing algorithms to obtain the regression estimator:
Simplex Method - for moderate data size
Interior Point Method - for large data size
Interior Point Method with Preprocessing - for very large data sets (n > 10^5)
Smoothing Method
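The algorithms listed above are what statistical software uses in practice. Purely as an illustration of the objective they minimize, the sketch below solves a tiny one-covariate problem by brute force. It relies on the linear-programming fact that some optimal line passes through at least two observations, so it simply tries every such line. This is not the simplex or interior point method; the function names and the data are hypothetical.

```python
from itertools import combinations


def check_loss(y, x, b0, b1, p):
    """Asymmetrically weighted sum of absolute residuals for the line b0 + b1 * x."""
    total = 0.0
    for xi, yi in zip(x, y):
        r = yi - (b0 + b1 * xi)
        total += (p if r >= 0 else 1.0 - p) * abs(r)
    return total


def fit_by_enumeration(x, y, p):
    """Try every line through two observations and keep the one with the smallest loss.

    The cost grows like n^3, so this is only suitable for very small samples.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # two points with the same x do not define a regression line
        b1 = (y[j] - y[i]) / (x[j] - x[i])
        b0 = y[i] - b1 * x[i]
        loss = check_loss(y, x, b0, b1, p)
        if best is None or loss < best[0]:
            best = (loss, b0, b1)
    return best


# Hypothetical data: roughly y = 2x, with one outlier at the end
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 30.0]
loss, b0, b1 = fit_by_enumeration(x, y, 0.5)
```

In this example the median fit keeps a slope close to 2 despite the outlier, whereas a least-squares line would be pulled upward by it.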
Properties of Quantile Regression Estimators
1. Scale Equivariant
2. Regression Shift Equivariant
3. Equivariant to Reparametrization of Design
4. Equivariant to Monotone Transformation
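The last property can be illustrated without fitting a regression. The sketch below assumes the sample quantile is defined as the smallest observation whose empirical CDF is at least p. It shows that taking a quantile and applying a monotone transformation (here the logarithm) can be done in either order, which is not true of the mean. The helper name `sample_quantile` and the data are hypothetical.

```python
import math


def sample_quantile(values, p):
    """Smallest observation whose empirical CDF is at least p."""
    ordered = sorted(values)
    k = max(1, math.ceil(p * len(ordered)))
    return ordered[k - 1]


# Hypothetical right-skewed data
incomes = [12.0, 18.0, 25.0, 31.0, 47.0, 60.0, 95.0, 240.0]
p = 0.75

# Quantile of the transformed data versus transformation of the quantile
q_of_log = sample_quantile([math.log(v) for v in incomes], p)
log_of_q = math.log(sample_quantile(incomes, p))

# The same comparison for the mean, which is not equivariant
mean_of_log = sum(math.log(v) for v in incomes) / len(incomes)
log_of_mean = math.log(sum(incomes) / len(incomes))
```

Here `q_of_log` and `log_of_q` coincide, while `mean_of_log` is strictly smaller than `log_of_mean`.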
Inference in Quantile Regression
Methods of Constructing Confidence Intervals
1. Sparsity
- based on the asymptotic distribution of the estimator \( \hat\beta^{(p)} \): the asymptotic dispersion matrix involves the
reciprocal of the density function of the error terms
- this reciprocal is called the sparsity function, and it must be estimated before confidence
intervals can be constructed
- yields different estimates for the case of i.i.d. error terms and for the case of non-i.i.d. error
terms
2. Inversion of Rank Tests
- generalization of sign tests
- based on the relationship between order statistics and rank scores
- involves linear programming (simplex method)
- computationally burdensome for large data sets
3. Bootstrap (Resampling)
- does not make use of any distributional assumption
- the number of resamples, M, is usually between 50 and 200
Recommendation:
Let n be the number of observations and k be the number of parameters.
n ≤ 1000 and k ≤ 10: Inversion of Rank Tests
10^4 < nk < 2 × 10^6: Bootstrap
for very large data sets: Sparsity
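The bootstrap idea can be sketched for the simplest case, the sample median, which corresponds to a median regression with only an intercept. This simplification is an assumption made for brevity. A full quantile-regression bootstrap would refit the model on each resample. The function name, the seed, and the data are hypothetical.

```python
import random
import statistics


def bootstrap_se_median(y, n_resamples=200, seed=0):
    """Bootstrap standard error of the sample median.

    Draws n_resamples samples of the same size with replacement and
    returns the standard deviation of their medians.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    medians = []
    for _ in range(n_resamples):
        resample = [rng.choice(y) for _ in y]
        medians.append(statistics.median(resample))
    return statistics.stdev(medians)


# Hypothetical right-skewed sample
y = [3.1, 4.7, 5.0, 5.2, 6.8, 7.4, 9.9, 12.5, 18.0, 41.0]
se = bootstrap_se_median(y, n_resamples=200)
```

No distributional assumption enters the calculation; the only tuning choice is the number of resamples M, set here to the upper end of the range quoted above.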
Tests for Significance of Coefficients
1. Wald Test
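Ho: \( \beta_2^{(p)} = 0 \), where \( \beta_2^{(p)} \) is a subset of the parameters
Ha: at least one parameter ≠ 0
Test statistic:
\( T_W(p) = \hat\beta_2^{(p)\prime}\, \hat\Sigma(p)^{-1}\, \hat\beta_2^{(p)} \)
where \( \hat\Sigma(p) \) is an estimator of the dispersion matrix of \( \hat\beta_2^{(p)} \).
Under Ho, the test statistic is distributed as \( \chi^2 \) with degrees of freedom equal to the number of
parameters in \( \beta_2^{(p)} \).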
2. Likelihood Ratio Test
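Ho: \( \beta_2^{(p)} = 0 \), where \( \beta_2^{(p)} \) is a subset of the parameters
Ha: at least one parameter ≠ 0
Test statistic:
\( T_{LR}(p) = \dfrac{2\,[D_0(p) - D_1(p)]}{p(1-p)\,\hat s(p)} \)
where \( D_1(p) \) and \( D_0(p) \) are the minimized sums of weighted absolute residuals for the full model and for
the model restricted by Ho, respectively, and \( \hat s(p) \) is the estimated sparsity function.
Under Ho, the test statistic is distributed as \( \chi^2 \) with degrees of freedom equal to the number of
parameters in \( \beta_2^{(p)} \).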
Remark: Koenker and Machado (1999) prove that these two tests are asymptotically equivalent.
Test for Equality of Coefficients Across Quantiles
Let p and q be distinct quantiles.
Case 1: Single Coefficient
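Ho: \( \beta_j^{(p)} = \beta_j^{(q)} \)
Ha: \( \beta_j^{(p)} \ne \beta_j^{(q)} \)
Test statistic:
\( W = \dfrac{(\hat\beta_j^{(p)} - \hat\beta_j^{(q)})^2}{\hat\sigma^2_{\hat\beta_j^{(p)} - \hat\beta_j^{(q)}}} \)
where \( \hat\sigma^2_{\hat\beta_j^{(p)} - \hat\beta_j^{(q)}} \) is the estimated variance of \( \hat\beta_j^{(p)} - \hat\beta_j^{(q)} \).
Under Ho, the test statistic is distributed as \( \chi^2 \) with 1 degree of freedom.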
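A minimal sketch of the single-coefficient test follows. It uses the coefficients and bootstrap standard errors of ed at the .10th and .90th quantiles from the sqreg output in Example 1 of this handout. The covariance between the two estimates is not reported there, so it is set to zero as a labeled assumption, which means the statistic only approximates the value Stata reports. The function name is hypothetical.

```python
import math


def wald_equal_coefficients(b_p, b_q, var_p, var_q, cov_pq=0.0):
    """Wald statistic and chi-square(1) p-value for Ho: coefficient at p equals coefficient at q."""
    # Variance of a difference: Var(a - b) = Var(a) + Var(b) - 2 Cov(a, b)
    var_diff = var_p + var_q - 2.0 * cov_pq
    wald = (b_p - b_q) ** 2 / var_diff
    # For 1 degree of freedom, P(chi2 > w) = erfc(sqrt(w / 2))
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return wald, p_value


wald, p_value = wald_equal_coefficients(
    b_p=1782.333,             # ed coefficient at the .10th quantile
    b_q=8278.88,              # ed coefficient at the .90th quantile
    var_p=59.18355 ** 2,      # squared bootstrap standard error at .10
    var_q=224.7802 ** 2,      # squared bootstrap standard error at .90
    cov_pq=0.0,               # assumption: covariance not reported, set to zero
)
```

Under this assumption the statistic is about 781, close to the F value of 780.16 that Stata reports for the same hypothesis; the small gap reflects the omitted covariance term.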
Case 2: Multiple Coefficients
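Ho: \( \beta_j^{(p)} = \beta_j^{(q)} \) for every coefficient j under test
Ha: \( \beta_j^{(p)} \ne \beta_j^{(q)} \) for at least one coefficient j under test
Test statistic:
\( W = (\hat\beta^{(p)} - \hat\beta^{(q)})'\, \hat\Sigma^{-1}\, (\hat\beta^{(p)} - \hat\beta^{(q)}) \)
where \( \hat\Sigma \) is the estimated covariance matrix for \( \hat\beta^{(p)} - \hat\beta^{(q)} \).
Under Ho, the test statistic is distributed as \( \chi^2 \) with degrees of freedom equal to the number of parameters
specified in Ho.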
Goodness of Fit
Recall:
In ordinary least squares, the goodness of fit is measured by R², the coefficient of determination. It is
interpreted as the proportion of the variation in the dependent variable explained by the predictor variables in
the model.
An analog of the R² statistic is developed for quantile-regression models. Since quantile-regression models are
based on minimizing a sum of weighted distances, with different weights used depending on whether
\( y_i > \hat y_i \) or \( y_i < \hat y_i \), goodness of fit is measured in a way that is consistent with this criterion.
Koenker and Machado (1999) suggest measuring goodness of fit by comparing the sum of weighted distances
for the model of interest with the sum in which only the intercept appears. Let \( V^1(p) \) be the sum of weighted
distances for the full pth quantile regression model and let \( V^0(p) \) be the sum of weighted distances for the
model that includes only a constant term.
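In a one-covariate model, for instance, we have
\( V^1(p) = \sum_{y_i \ge \hat\beta_0^{(p)} + \hat\beta_1^{(p)} x_i} p\,|y_i - \hat\beta_0^{(p)} - \hat\beta_1^{(p)} x_i| + \sum_{y_i < \hat\beta_0^{(p)} + \hat\beta_1^{(p)} x_i} (1-p)\,|y_i - \hat\beta_0^{(p)} - \hat\beta_1^{(p)} x_i| \)
and
\( V^0(p) = \sum_{y_i \ge \hat Q^{(p)}} p\,|y_i - \hat Q^{(p)}| + \sum_{y_i < \hat Q^{(p)}} (1-p)\,|y_i - \hat Q^{(p)}| \), where \( \hat Q^{(p)} \) is the pth sample quantile of y.
Then, the goodness of fit is defined as
\( R(p) = 1 - \dfrac{V^1(p)}{V^0(p)} \)
Since \( V^0(p) \) and \( V^1(p) \) are nonnegative, R(p) is at most 1. Also, \( V^0(p) \) is greater than or equal to \( V^1(p) \),
implying that R(p) is greater than or equal to zero. Hence, R(p) lies in [0, 1], with larger R(p) indicating better fit.
R(p) allows for comparison of a fitted model with any number of covariates to the model in which only the
intercept is present.
To extend the concept of R(p), relative R(p) is introduced. It measures the fit relative to a more restricted
form of the model. It can be expressed as
\( \text{relative } R(p) = 1 - \dfrac{V^2(p)}{V^1(p)} \)
where \( V^2(p) \) is the sum of weighted distances for the less restricted pth quantile
regression model and
\( V^1(p) \) is the sum of weighted distances for the more restricted model.
Stata reports R(p) as its measure of goodness of fit and refers to it as “pseudo-R2”.
Remark:
R(p) accounts for the appropriate weight each observation takes in the specific quantile equation. It is easy to
comprehend, and its interpretation follows that of the familiar R-squared for OLS.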
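As a quick arithmetic check of the definition, the sketch below computes R(p) from two sums of weighted distances. The inputs are the rounded "Min sum of deviations" and "Raw sum of deviations" printed in the Stata median-regression output of Example 1, so the result only approximates the Pseudo R2 of 0.0985 that Stata computes from unrounded sums. The function name `pseudo_r2` is hypothetical.

```python
def pseudo_r2(v_full, v_intercept_only):
    """R(p) = 1 - V1(p) / V0(p), where both sums of weighted distances are nonnegative."""
    if v_intercept_only <= 0:
        raise ValueError("V0(p) must be positive")
    return 1.0 - v_full / v_intercept_only


# Rounded sums from the median regression in Example 1:
# V1(.5) = 6.02e+08 (full model), V0(.5) = 6.68e+08 (constant only)
r_median = pseudo_r2(6.02e8, 6.68e8)  # about 0.0988
```

The same function gives relative R(p) when it is called with the less restricted model's sum first and the more restricted model's sum second.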
Interpretation of Coefficients
In OLS, fitted coefficients can be interpreted as the estimated change in the mean of the response variable
resulting from one unit increase in a continuous covariate.
Similarly, the QRM coefficient estimate is interpreted as the estimated change in the pth quantile of the
response variable corresponding to a unit change in the regressor.
Median-Regression Model
The simplest QRM is the median-regression model (MRM), which expresses the conditional median of a response
variable given predictor variables. It is an alternative to OLS, which fits the conditional mean. MRM and OLS both
attempt to model the central location of a response variable.
The median-regression model is more suitable for modeling the behavior of a collection of skewed conditional
distributions. For instance, if these conditional distributions are skewed to the right, their means reflect what
is happening in the upper tail and not in the middle.
Interpretation: In the case of a continuous covariate, the coefficient estimate is interpreted as the change in
the median of the response variable corresponding to a unit change in the predictor.
Using QRM Results to Interpret Shape Shifts
Two of the most important features to consider are scale (spread) and skewness.
The analysis of shape effects reveals more information than the analysis of location effects alone.
Arrays of QRM coefficients for a range of quantiles can be used to determine how a one-unit increase in the
covariate affects the shape of the response distribution. This shape shift is highlighted using the graphical
method. For a particular covariate, we plot the coefficients and the confidence envelope, with the effect of the
predictor variable on the y-axis and the value of p on the x-axis.
Graphical patterns for the effect of a covariate on the response:
1. A horizontal line indicates a pure location shift by a one-unit increase in the covariate.
2. An upward-sloping curve indicates an increase in the scale of the conditional-response distribution:
the effect of a one-unit increase in the regressor is positive for all values of p and steadily
increasing with p.
3. A downward-sloping curve indicates a decrease in the scale of the conditional-response
distribution.
Note that a regressor produces an increase in scale of this kind if \( \hat\beta^{(p)} \) is monotonically increasing with p, that is,
\( \hat\beta^{(p)} > \hat\beta^{(q)} \) whenever p > q.
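The three graphical patterns can also be checked numerically on an array of coefficient estimates ordered by p. The sketch below is a simplified classifier under the assumption that the estimates are compared directly, ignoring their confidence envelopes. The function name and the coefficient arrays are hypothetical.

```python
def describe_pattern(coefs):
    """Classify coefficient estimates, ordered by increasing p, into one of the patterns above."""
    pairs = list(zip(coefs, coefs[1:]))
    if all(later == earlier for earlier, later in pairs):
        return "pure location shift"
    if all(later > earlier for earlier, later in pairs):
        return "increase in scale"
    if all(later < earlier for earlier, later in pairs):
        return "decrease in scale"
    return "no monotone pattern"


# Hypothetical coefficient estimates at p = .10, .25, .50, .75, .90
flat = [2.0, 2.0, 2.0, 2.0, 2.0]
rising = [0.5, 1.1, 1.9, 2.6, 3.8]
falling = [3.8, 2.6, 1.9, 1.1, 0.5]

patterns = [describe_pattern(c) for c in (flat, rising, falling)]
```

In practice the comparison should take sampling error into account, for example through the tests for equality of coefficients across quantiles described earlier.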
Scale Shifts
The standard deviation is a commonly employed measure of the scale or spread for symmetric distributions. For
skewed distributions, the distances between selected quantiles provide a more informative description of the
spread than the standard deviation. For a value of p between 0 and .5, we identify two sample quantiles: Q(1−p)
and Q(p) (the pth quantile). The pth interquantile range, IQR(p) = Q(1−p) − Q(p), is a measure of spread. This
quantity describes the range of the middle (1−2p) proportion of the distribution.
Suppose the reference group and comparison group have the same median. Fixing some choice of p, we can
measure the interquantile range IQRr = Ur − Lr and IQRc = Uc − Lc for the reference group and comparison group,
respectively. The difference-in-differences IQRc − IQRr serves as a measure of the scale shift.
The QRM fits provide an alternative approach to estimating scale-shift effects. Here, \( \hat\beta^{(p)} \) is the fitted
coefficient indicating the increase or decrease in any particular quantile brought about by a unit increase in
the covariate. Thus, when we increase the covariate by one unit, the corresponding pth interquantile range
changes by the amount \( SCS(p) = \hat\beta^{(1-p)} - \hat\beta^{(p)} \), which is the scale shift (SCS).
When SCS(p) is zero, there is apparently no evidence of scale change. A negative value indicates that increasing
the covariate results in a decrease in scale, while a positive value indicates the opposite effect.
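The scale-shift calculation is simple arithmetic on the fitted coefficients. The sketch below applies SCS(p) = β(1−p) − β(p) to the rounded ED and WHITE coefficients from the table of quantile regression estimates for income in Example 1 of this handout. The function name and the choice to key the coefficients by quantile in percent are illustrative assumptions.

```python
def scale_shift(coef_by_quantile, p):
    """SCS(p) = beta(1 - p) - beta(p) for 0 < p < 0.5.

    coef_by_quantile maps a quantile expressed in percent (e.g. 25) to its coefficient.
    """
    lower = round(100 * p)
    upper = 100 - lower
    return coef_by_quantile[upper] - coef_by_quantile[lower]


# Rounded coefficients from the income example (quantiles in percent)
ed = {5: 1130, 10: 1782, 25: 3172, 50: 4794, 75: 6598, 90: 8279, 95: 9575}
white = {5: 3197, 10: 4689, 25: 6724, 50: 9792, 75: 12142, 90: 14049, 95: 17484}

scs_ed = [scale_shift(ed, p) for p in (0.25, 0.10, 0.05)]        # [3426, 6497, 8445]
scs_white = [scale_shift(white, p) for p in (0.25, 0.10, 0.05)]  # [5418, 9360, 14287]
```

These values reproduce the SCS table reported for the income example.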
Skewness Shifts
A disproportional scale shift that relates to greater skewness indicates an additional effect on the shape of the
response distribution.
Let Mr and Mc indicate the median of the reference and the comparison, respectively. The upper spread is
Ur − Mr and Uc − Mc for the reference and comparison, respectively. The lower spread is Mr − Lr for the reference
and Mc − Lc for the comparison. The disproportion can be measured by taking the ratio of
(Uc − Mc)/(Ur − Mr) to (Mc − Lc)/(Mr − Lr).
If this “ratio-of-ratios” equals 1, then there is no skewness shift. If the ratio-of-ratios is less than 1, the
right-skewness is reduced. If the ratio-of-ratios is greater than 1, the right-skewness is increased. The shift in terms
of percentage change can be obtained as this quantity minus 1. The result is known as the skewness shift, or
SKS.
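In general, using the QRM coefficients, a model-based SKS is obtained. This involves the conditional quantiles of
the reference group. The SKS for the middle 100(1−2p)% of the population is:
\( SKS(p) = \dfrac{(\hat Q_R^{(1-p)} + \hat\beta^{(1-p)} - \hat Q_R^{(.5)} - \hat\beta^{(.5)}) \,/\, (\hat Q_R^{(1-p)} - \hat Q_R^{(.5)})}{(\hat Q_R^{(.5)} + \hat\beta^{(.5)} - \hat Q_R^{(p)} - \hat\beta^{(p)}) \,/\, (\hat Q_R^{(.5)} - \hat Q_R^{(p)})} - 1 \)
where \( \hat Q_R^{(p)} \) is the fitted pth conditional quantile of the reference group.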
Note that because we take the ratio of two ratios, SKS effectively eliminates the influence of a proportional
scale shift. When SKS=0, it indicates either no scale shift or a proportional scale shift. SKS<0 indicates a
reduction of right-skewness due to the effect of the explanatory variable whereas SKS>0 indicates an
exacerbation of right-skewness.
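The ratio-of-ratios definition can be sketched directly. The snippet assumes the comparison-group quantile equals the reference-group quantile plus the QRM coefficient at that quantile, as in the model-based SKS described above. The reference quantiles and coefficients used here are hypothetical values chosen for illustration, and the function name is likewise an assumption.

```python
def skewness_shift(q_ref_low, q_ref_mid, q_ref_high, b_low, b_mid, b_high):
    """Model-based SKS(p): ratio of the upper-spread ratio to the lower-spread ratio, minus 1.

    q_ref_* are reference-group conditional quantiles at p, .5 and 1 - p;
    b_* are the covariate's QRM coefficients at the same quantiles.
    """
    upper_ratio = (q_ref_high + b_high - q_ref_mid - b_mid) / (q_ref_high - q_ref_mid)
    lower_ratio = (q_ref_mid + b_mid - q_ref_low - b_low) / (q_ref_mid - q_ref_low)
    return upper_ratio / lower_ratio - 1.0


# Hypothetical right-skewed reference group: lower spread 20, upper spread 40
q_low, q_mid, q_high = 30.0, 50.0, 90.0

# Proportional scale shift: both spreads grow by 10%, so SKS is zero
sks_proportional = skewness_shift(q_low, q_mid, q_high, b_low=3.0, b_mid=5.0, b_high=9.0)

# The upper spread grows by 20% and the lower by 10%, so right-skewness increases
sks_more_skewed = skewness_shift(q_low, q_mid, q_high, b_low=3.0, b_mid=5.0, b_high=13.0)
```

The first call illustrates why SKS = 0 cannot distinguish between no scale shift and a proportional scale shift: the common factor cancels in the ratio.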
Quantile Regression in Stata
Example 1:
income = household income
ed = number of years of education of household head
white = 1 if household head is white, 0 if black
. qreg income ed white
Iteration 1: WLS sum of weighted deviations = 6.202e+08

Iteration 1: sum of abs. weighted deviations = 6.202e+08
Iteration 2: sum of abs. weighted deviations = 6.151e+08
Iteration 3: sum of abs. weighted deviations = 6.086e+08
note: alternate solutions exist
Iteration 4: sum of abs. weighted deviations = 6.043e+08
Iteration 5: sum of abs. weighted deviations = 6.020e+08
Iteration 6: sum of abs. weighted deviations = 6.018e+08
note: alternate solutions exist
Iteration 7: sum of abs. weighted deviations = 6.018e+08
Iteration 8: sum of abs. weighted deviations = 6.018e+08

Median regression                              Number of obs = 22624
Raw sum of deviations 6.68e+08 (about 39977.45)
Min sum of deviations 6.02e+08                 Pseudo R2 = 0.0985

income      Coef.        Std. Err.    t        P>|t|    [95% Conf. Interval]
ed          4794.333     91.68188     52.29    0.000     4614.63     4974.036
white       9792.334     727.3664     13.46    0.000     8366.645    11218.02
_cons      -29927.67     1312.101    -22.81    0.000    -32499.47   -27355.86
An additional year of education is associated with an increase of about $4,794 in median income. Holding education
fixed, the median income of whites is $9,792 higher than that of blacks. Both ED and WHITE are significant predictors
of INCOME based on the t-statistics. The coefficient for ED in the MRM is lower than the coefficient in the OLS model ($6,314).
This suggests that while an increase of one year of education gives rise to an average increase of $6,314 in
income, the increase would not be as substantial for most of the population. Similarly, the coefficient for
WHITE in the MRM is lower than the corresponding coefficient in the OLS model ($11,452).
Wald Test of Significance
. test ed white

 ( 1) ed = 0
 ( 2) white = 0

 F( 2, 22621) = 1589.34
 Prob > F = 0.0000

Reject the null hypothesis of \( \beta_{ed} = \beta_{white} = 0 \). There is sufficient
evidence to say that ED and WHITE are jointly significant predictors of
INCOME.
Quantile Regression Estimates for Income
p        .05     .10     .20     .25     .30     .40     .50     .60     .70     .75     .80     .90     .95
ED      1130    1782    2757    3172    3571    4266    4794    5571    6224    6598    6954    8279    9575
WHITE   3197    4689    6557    6724    7541    8744    9792   11091   11739   12142   12972   14049   17484
CONS   -7910  -13536  -20721  -22986  -25590  -29104  -29928  -33090  -32909  -32344  -30702  -27562  -22126
One more year of education is associated with an income increase of $1,130 at the .05th quantile and $1,782 at
the .10th quantile. Examining the estimates of education at the .90th and .95th quantiles, the coefficient for
the .95th quantile is $9,575, much larger than at the .90th quantile ($8,279). These results suggest the
contribution of prestigious higher education to income disparity.
Test for Equality of Coefficients Across Quantiles
. sqreg income ed white, quantile(0.1 0.25 0.5 0.75 0.9)
(fitting base model)
(bootstrapping ....................)

Simultaneous quantile regression               Number of obs = 22624
bootstrap(20) SEs                              .10 Pseudo R2 = 0.0441
                                               .25 Pseudo R2 = 0.0726
                                               .50 Pseudo R2 = 0.0985
                                               .75 Pseudo R2 = 0.1141
                                               .90 Pseudo R2 = 0.1208

                         Bootstrap
income      Coef.        Std. Err.    t        P>|t|    [95% Conf. Interval]
q10
ed          1782.333     59.18355     30.12    0.000     1666.329    1898.337
white       4688.667     300.6245     15.60    0.000     4099.422    5277.912
_cons      -13536        715.7417    -18.91    0.000    -14938.9    -12133.1
q25
ed          3172.222     45.30373     70.02    0.000     3083.424    3261.021
white       6723.666     541.5137     12.42    0.000     5662.262    7785.07
_cons      -22985.67     814.5297    -28.22    0.000    -24582.2    -21389.13
q50
ed          4794.333     51.30182     93.45    0.000     4693.778    4894.888
white       9792.334     565.642      17.31    0.000     8683.637    10901.03
_cons      -29927.67     570.7646    -52.43    0.000    -31046.4    -28808.93
q75
ed          6598.182     169.8196     38.85    0.000     6265.324    6931.04
white       12141.82     827.9499     14.66    0.000     10518.98    13764.66
_cons      -32344.18     1995.658    -16.21    0.000    -36255.81   -28432.55
q90
ed          8278.88      224.7802     36.83    0.000     7838.295    8719.465
white       14049.07     1900.115     7.39     0.000     10324.71    17773.43
_cons      -27561.84     3388.43     -8.13     0.000    -34203.39   -20920.28

Testing for equality of \( \beta_{ed} \) at the .10th and .90th quantiles:
. test [q10]ed=[q90]ed

 ( 1) [q10]ed - [q90]ed = 0

 F( 1, 22621) = 780.16
 Prob > F = 0.0000

Testing for equality of \( \beta_{white} \) at the .10th and .90th quantiles:
. test [q10]white=[q90]white

 ( 1) [q10]white - [q90]white = 0

 F( 1, 22621) = 14.71
 Prob > F = 0.0001

Testing for the joint equality of \( \beta_{ed} \) and \( \beta_{white} \) at the .10th and .90th quantiles:
. test ([q10]ed=[q90]ed) ([q10]white=[q90]white)

 ( 1) [q10]ed - [q90]ed = 0
 ( 2) [q10]white - [q90]white = 0

 F( 2, 22621) = 395.42
 Prob > F = 0.0000

The effect of an additional year of education is different for the lower-income bracket and the higher-income
bracket. Likewise, the effect of being white is also different for the lower-income bracket and the higher-income
bracket. The joint test is also significant, i.e., the effects of an additional year of
schooling and of being white at the .10th quantile differ from the effects at the .90th quantile.
Shape Shifts
The effect of ED can be described as the change in the income quantile brought about by one additional year of
education, at any level of education, fixing race. The education effect is significantly positive, because the
confidence envelope does not cross the horizontal zero line. The graph shows an upward-sloping curve for the
effects of education: the effect of one more year of schooling is positive for all values of p and steadily
increasing with p. The increase accelerates after the .80th quantile.
[Figure: estimated coefficient of ed (y-axis, 0 to 10000) plotted against Quantile (x-axis, 0 to 1)]
The effect of WHITE can be described as the change in the income quantile brought about by changing the race from
black to white, fixing the education level. The effect of being white is significantly positive, as the zero line is far
below the confidence envelope. The graph shows an upward-sloping curve for the effect of being white as
compared with being black. The slopes below the .15th quantile and above the .90th quantile are steeper than
those at the middle quantiles.
The estimate \( \hat\beta^{(p)} \) is monotonically increasing with p for both covariates. This tells us that an additional year of
education or changing race from black to white has a greater effect on income for higher-income brackets than for
lower-income brackets. The monotonicity also has scale-effect implications. Changing race from black to white or adding a
year of education increases the scale of the response.
Shape Shifts: Scale Shifts
pth Interquantile Range    SCS (ED)    SCS (WHITE)
p = 0.25                   3426        5418
p = 0.10                   6497        9360
p = 0.05                   8445        14287
The scale shift brought about by one more year of schooling for the middle 50% of the population is $3,426.
One more year of schooling increases the scale of income by $6,497 for the middle 80% of the population, and
by $8,445 for the middle 90% of the population. Controlling for education, whites’ income spread is higher
than blacks’ income spread by: $5,418 for the middle 50% of the population, $9,360 for the middle 80%, and
$14,287 for the middle 90%.
[Figure: estimated coefficient of white (y-axis, 0 to 25000) plotted against Quantile (x-axis, 0 to 1)]
Shape Shifts: Skewness Shifts
Note: The reference-group conditional quantile used is the value at the “typical setting”.
middle 100(1-2p)% of the population    SKS (ED)    SKS (WHITE)
middle 50% (p=0.25)                    -0.047      -0.087
middle 80% (p=0.10)                    -0.037      -0.085
middle 90% (p=0.05)                    -0.016      -0.066
One more year of schooling reduces right-skewness by 1.6% for the middle 90% of the population, 3.7% for
the middle 80% and 4.7% for the middle 50%. The impact of being white also decreases right-skewness by 6.6%
for the middle 90%, 8.5% for the middle 80% and 8.7% for the middle 50%. This finding indicates a greater
expansion of the white upper middle class than the black upper middle class.
Summary:
One more year of education induces a positive location and scale shift but a negative skewness shift. Similarly,
being white induces a positive location and scale shift with a negative skewness shift. The model suggests that
while higher education and being white are associated with a higher median income and a wider income
spread, the income distributions for the less educated and for the blacks are more skewed.
Quantile Regression in SAS
Example 2:
Murders – number of murders per 1,000,000 inhabitants per annum
Inhabitants – number of inhabitants
Income – Percentage of families with incomes below $5000
Unemp – Percentage of unemployed inhabitants
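/* Fit the model at eleven quantiles, with rank-based confidence intervals */
PROC QUANTREG DATA = sample CI = rank;
   MODEL murders = inhabitants income unemp
         / quantile = 0.05 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.95
           plot = quantplot;
   /* Wald and likelihood ratio tests that the three coefficients are jointly zero */
   TEST inhabitants income unemp / wald lr;
RUN;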
Note: If we consider all quantiles, the rank option for computing confidence intervals is not available. (You
may use only sparsity and resampling.) Likewise, it is not possible to use Wald and Likelihood Ratio Tests.
Quantile Regression Estimates for Number of Murders
p             0.05     0.10     0.20     0.25     0.30     0.40     0.50     0.60     0.70     0.75      0.80      0.90      0.95
Intercept   -58.38   -58.38   -59.91   -39.30   -37.18   -46.68   -67.90   -86.95   -76.14   -103.34   -103.42   -104.40   -164.52
inhabitants   1.96     1.96     1.88     0.72     0.63     1.22     1.86     3.28     3.05      5.07      5.07      5.03      9.41
income        0.86     0.86     1.12     1.53     1.34     1.44     1.39     1.26     1.10      1.04      1.04      1.12      1.06
unemp         4.36     4.36     4.04     2.44     2.76     2.88     5.06     5.46     5.08      5.25      5.26      5.31      5.72
An additional inhabitant will increase the median number of murders by 1.86; a unit increase in the
percentage of families with incomes below $5000 will increase the median number of murders by 1.39; a unit
increase in the percentage of unemployed inhabitants will increase the median number of murders by 5.06.
Wald and Likelihood Ratio Tests
Ho: \( \beta_{inhabitants} = \beta_{income} = \beta_{unemp} = 0 \)
Ha: at least one parameter ≠ 0
Test Results
Quantile   Test               Test Statistic   DF   Chi-Square   Pr > ChiSq
0.05       Wald               955.7538         3    955.75       <.0001
0.10       Wald               144.0549         3    144.05       <.0001
0.10       Likelihood Ratio   309.5411         3    309.54       <.0001
0.20       Wald               60.6047          3    60.60        <.0001
0.20       Likelihood Ratio   45.2893          3    45.29        <.0001
0.30       Wald               55.0154          3    55.02        <.0001
0.30       Likelihood Ratio   33.0011          3    33.00        <.0001
0.40       Wald               32.7730          3    32.77        <.0001
0.40       Likelihood Ratio   37.2190          3    37.22        <.0001
0.50       Wald               58.0711          3    58.07        <.0001
0.50       Likelihood Ratio   37.8808          3    37.88        <.0001
0.60       Wald               96.7067          3    96.71        <.0001
0.60       Likelihood Ratio   36.7406          3    36.74        <.0001
0.70       Wald               139.9484         3    139.95       <.0001
0.70       Likelihood Ratio   24.6529          3    24.65        <.0001
0.80       Wald               233.4782         3    233.48       <.0001
0.80       Likelihood Ratio   31.7161          3    31.72        <.0001
0.90       Wald               1267.9173        3    1267.92      <.0001
0.90       Likelihood Ratio   26.3500          3    26.35        <.0001
0.95       Wald               978.7139         3    978.71       <.0001
For all quantiles in consideration, there is sufficient evidence to conclude that the number of inhabitants, the
percentage of families with incomes below $5000, and the percentage of unemployed inhabitants are jointly
significant predictors of the number of murders.
Test for Equality of Coefficients
/* Fit the 0.75th and 0.8th quantiles and test whether their coefficients are equal */
PROC QUANTREG DATA = sample CI = rank;
   MODEL murders = inhabitants income unemp / quantile = 0.75 0.8;
   TEST inhabitants income unemp / qinteract;
RUN;
Test Results: Equal Coefficients Across Quantiles
Chi-Square   DF   Pr > ChiSq
0.0056       3    0.9999
Thus, there is insufficient evidence to conclude that the coefficients for the 0.75th and the 0.8th quantiles
jointly differ.
Shape Shifts
The effect of inhabitants on the number of murders is only significant from around the 0.5th quantile onwards.
The effect of income on the number of murders is only significant until somewhere around the 0.7th quantile.
The effect of unemployment on the number of murders is only significant until somewhere around the
0.5th quantile. Thus, income and unemployment significantly affect the lower quantiles of the number of
murders, while the number of inhabitants significantly affects the upper quantiles.
Scale Shifts
pth interquantile range    SCS(inhabitants)    SCS(income)    SCS(unemp)
p = 0.25                   4.3518              -0.4872        2.8115
p = 0.10                   3.0699              0.2602         0.9506
p = 0.05                   7.455               0.2017         1.3628
An additional inhabitant increases the scale of the number of murders by 4.3518 for the middle 50% of the
population, by 3.0699 for the middle 80% of the population, and by 7.455 for the middle 90% of the
population. A unit increase in the percentage of families with incomes below $5000 decreases the scale of the
number of murders by 0.4872 for the middle 50% of the population, while it increases the scale of the number
of murders by 0.2602 for the middle 80% of the population, and by 0.2017 for the middle 90% of the
population. A unit increase in the percentage of unemployed inhabitants increases the scale of the number of
murders by 2.8115 for the middle 50% of the population, by 0.9506 for the middle 80% of the population, and
by 1.3628 for the middle 90% of the population.
Skewness Shifts
middle 100(1-2p)% of the population    SKS(inhabitants)    SKS(income)     SKS(unemp)
middle 50% (p=0.25)                    -0.50852            0.114957637     -0.73607
middle 80% (p=0.10)                    0.100869            -0.465156696    -0.44369
middle 90% (p=0.05)                    0.520159            -0.403896185    -0.36112
An additional inhabitant reduces the right-skewness by 50.9% for the middle 50% of the population, while it
increases the right-skewness by 10.1% for the middle 80% and by 52% for the middle 90%. A unit increase in
the percentage of families with incomes below $5000 increases the right-skewness by 11.5% for the middle 50%
of the population, while it reduces the right-skewness by 46.5% for the middle 80% and by 40.4% for the
middle 90%. A unit increase in the percentage of unemployed inhabitants reduces the right-skewness by 73.6%
for the middle 50% of the population, by 44.4% for the middle 80%, and by 36.1% for the middle 90%.