Seminar on Robust Regression Methods
TRANSCRIPT
ROBUST REGRESSION METHOD
By Sumon Jose
A Seminar Presentation
Under the Guidance of Dr. Jessy John
February 24, 2015
SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 1 / 69
CONTENTS
1 INTRODUCTION
2 REVIEW
3 ROBUSTNESS & RESISTANCE
4 APPROACH
5 STRENGTHS & WEAKNESSES
6 M-ESTIMATORS
7 DELIVERY TIME PROBLEM
8 ANALYSIS
9 PROPERTIES
10 SURVEY OF OTHER ROBUST REGRESSION ESTIMATORS
11 REFERENCE
INTRODUCTION
Performance Evaluation - Geethu Anna Jose
REVIEW
The classical linear regression model relates the dependent or response variable yi to the independent explanatory variables xi1, xi2, ..., xip for i = 1, ..., n, such that

$$y_i = x_i^T \beta + \varepsilon_i, \qquad i = 1, ..., n \qquad (1)$$

where xiT = (xi1, xi2, ..., xip), the εi denote the error terms, and β = (β1, β2, ..., βp)T.
REVIEW
The expected value of yi, called the fitted value, is

$$\hat{y}_i = x_i^T \hat\beta \qquad (2)$$

and one can use this to calculate the residual for the i-th case,

$$r_i = y_i - \hat{y}_i \qquad (3)$$

In the case of the simple linear regression model we may calculate the values of β0 and β1 using the following formulae:
REVIEW
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} y_i x_i - \frac{1}{n}\sum_{i=1}^{n} y_i \sum_{i=1}^{n} x_i}{\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2} \qquad (4)$$

$$\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x} \qquad (5)$$

The vector of fitted values ŷi corresponding to the observed values yi may be expressed as follows:

$$\hat{y} = X\hat\beta \qquad (6)$$
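The closed-form formulas (4) and (5) can be checked numerically. A minimal sketch (the data here are illustrative, not from the seminar):

```python
import numpy as np

# Fit y = b0 + b1*x using the closed-form least-squares formulas (4), (5).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# (4): b1 = (sum(y*x) - sum(y)*sum(x)/n) / (sum(x^2) - (sum(x))^2/n)
b1 = (np.sum(y * x) - np.sum(y) * np.sum(x) / n) / (np.sum(x**2) - np.sum(x)**2 / n)
# (5): b0 = mean(y) - b1 * mean(x)
b0 = y.mean() - b1 * x.mean()

residuals = y - (b0 + b1 * x)   # (3), elementwise
```

The result agrees with any standard least-squares routine, e.g. `np.polyfit(x, y, 1)`.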
REVIEW
Limitations of Least Square Estimator
Extremely sensitive to deviations from the model assumptions (a normal distribution is assumed for the errors).
Drastically changed by the effect of outliers.
REVIEW
What About Deleting Outliers Before the Analysis?
Not all outliers are erroneous data; they could be exceptional occurrences.
Some such outliers could be the result of factors not considered in the current study.
So, in general, unusual observations are not always bad observations. Moreover, in large data sets it is often very difficult to spot the outlying data.
ROBUSTNESS AND RESISTANCE
Resistant Regression Estimators
Definition
Resistant regression estimators are primarily concerned with robustness of validity, meaning that their main concern is to prevent unusual observations from affecting the estimates produced.
ROBUSTNESS AND RESISTANCE
Robust Regression Estimators
Definition
They are concerned with both robustness of efficiency and robustness of validity, meaning that they should also maintain a small sampling variance even when the data do not follow the assumed distribution.
ROBUSTNESS AND RESISTANCE
⇒ In general, robust regression estimators aim to fit a model that describes the majority of the sample.
⇒ Their robustness is achieved by giving the data different weights,
⇒ whereas in least squares approximation all data are treated equally.
APPROACH
Robust estimation methods are powerful tools for detecting outliers in complicated data sets.
But unless the data are very well behaved, different estimators will give different estimates.
On their own, they do not provide a final model.
A healthy approach is to employ both robust regression methods and the least squares method, and to compare the results.
STRENGTHS & WEAKNESSES
Finite Sample Breakdown Point
Definition
The breakdown point (BDP) is a measure of the resistance of an estimator. The BDP of a regression estimator is the smallest fraction of contamination that can cause the estimator to 'break down' and no longer represent the trend of the data.
STRENGTHS & WEAKNESSES
When an estimator breaks down, the estimate it produces from the contaminated data can be arbitrarily far from the estimate it would give were the data uncontaminated.
STRENGTHS & WEAKNESSES
In order to describe the BDP mathematically, define T as a regression estimator, Z as a sample of n data points and T(Z) = β̂. Let Z′ be the corrupted sample in which m of the original data points are replaced with arbitrary values. The maximum effect that could be caused by such contamination is

$$\text{effect}(m; T, Z) = \sup_{Z'} \, |T(Z') - T(Z)| \qquad (7)$$
STRENGTHS & WEAKNESSES
When (7) is infinite, an outlier can have an arbitrarily large effect on T. The BDP of T at the sample Z is therefore defined as:

$$\mathrm{BDP}(T, Z) = \min\left\{\frac{m}{n} : \text{effect}(m; T, Z) \text{ is infinite}\right\} \qquad (8)$$
STRENGTHS & WEAKNESSES
The least squares estimator, for example, has a breakdown point of 1/n, because just one leverage point can cause it to break down. As the number of data points increases, this breakdown point tends to 0, and so the least squares estimator is said to have a BDP of 0%.
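This 1/n breakdown can be seen in a small simulation (illustrative data, not from the seminar): replacing a single response with an arbitrarily large value drags the least-squares slope arbitrarily far from the clean fit.

```python
import numpy as np

# Clean data on a line of slope 2, plus one corrupted observation.
rng = np.random.default_rng(0)
x = np.arange(10.0)
y = 2.0 * x + rng.normal(0.0, 0.1, size=10)

slope_clean = np.polyfit(x, y, 1)[0]   # close to the true slope 2

y_bad = y.copy()
y_bad[-1] = 1e6                        # a single arbitrarily bad point
slope_bad = np.polyfit(x, y_bad, 1)[0] # the slope is dragged to a huge value
```

Making `y_bad[-1]` larger makes `slope_bad` larger without bound, which is exactly what (8) captures for m = 1.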
STRENGTHS & WEAKNESSES
Remark
The highest breakdown point one can hope for is 50%, since if more than half the data are contaminated, one cannot differentiate between 'good' and 'bad' data.
STRENGTHS & WEAKNESSES
Relative Efficiency of an Estimator
Definition
The efficiency of an estimator for a particular parameter is defined as the ratio of its minimum possible variance to its actual variance. Strictly, an estimator is considered 'efficient' only when this ratio is one.
STRENGTHS & WEAKNESSES
High efficiency is crucial for an estimator if the intention is to use an estimate from sample data to make inferences about the larger population from which the sample was drawn.
STRENGTHS & WEAKNESSES
Relative Efficiency
Relative efficiency compares the efficiency of an estimator to that of a well-known method.
In the context of regression, estimators are compared to the least squares estimator, which under the classical assumptions is the most efficient estimator known.
STRENGTHS & WEAKNESSES
Given two estimators T1 and T2 for a population parameter β, where T1 is the most efficient estimator possible and T2 is less efficient, the relative efficiency of T2 is the ratio of the mean squared error of T1 to that of T2:

$$\text{Efficiency}(T_1, T_2) = \frac{E[(T_1 - \beta)^2]}{E[(T_2 - \beta)^2]} \qquad (9)$$
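Formula (9) can be estimated by simulation. A sketch (illustrative, not from the seminar) comparing the sample mean (the efficient estimator at the normal) with the sample median, whose theoretical relative efficiency for normal data is 2/π ≈ 0.64:

```python
import numpy as np

# Monte-Carlo estimate of (9): MSE of the mean over MSE of the median,
# both estimating the location 0 of N(0, 1) from samples of size 50.
rng = np.random.default_rng(42)
samples = rng.normal(loc=0.0, scale=1.0, size=(20000, 50))

mse_mean = np.mean(samples.mean(axis=1) ** 2)      # E[(T1 - beta)^2]
mse_median = np.mean(np.median(samples, axis=1) ** 2)  # E[(T2 - beta)^2]

rel_eff = mse_mean / mse_median   # roughly 2/pi ~ 0.64
```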
M-ESTIMATORS
Introduction
1 M-estimators were first proposed by Huber (1973).
2 But the early ones were weak in terms of one or more of the desired properties.
3 From them the modern methods developed.
M-ESTIMATORS
Maximum Likelihood Type Estimators
M-estimation is based on the idea that while we still want a maximum likelihood estimator, the errors might be better represented by a different, heavier-tailed distribution.
M-ESTIMATORS
If the probability density function of the errors is f(εi), then the maximum likelihood estimator for β is the one which maximizes the likelihood function

$$\prod_{i=1}^{n} f(\varepsilon_i) = \prod_{i=1}^{n} f(y_i - x_i^T \beta) \qquad (10)$$
M-ESTIMATORS
This means it also maximizes the log-likelihood function

$$\sum_{i=1}^{n} \ln f(\varepsilon_i) = \sum_{i=1}^{n} \ln f(y_i - x_i^T \beta) \qquad (11)$$

When the errors are normally distributed, it has been shown that this leads to minimizing the sum of squared residuals, which is the ordinary least squares method.
M-ESTIMATORS
Assuming the errors are differently distributed leads the maximum likelihood estimator to minimize a different function. Using this idea, an M-estimator β̂ minimizes

$$\sum_{i=1}^{n} \rho(\varepsilon_i) = \sum_{i=1}^{n} \rho(y_i - x_i^T \beta) \qquad (12)$$

where ρ(u) is a continuous, symmetric function called the objective function, with a unique minimum at 0.
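One common choice of ρ is Huber's function, sketched below with tuning constant t = 2 (the value used later in the delivery time example). It is quadratic near zero and linear in the tails, so large residuals are penalized less severely than under ρ(u) = u².

```python
import numpy as np

def huber_rho(u, t=2.0):
    # Huber's objective: u^2/2 for |u| <= t, and t*|u| - t^2/2 beyond,
    # matching the criterion table later in the seminar.
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= t, u**2 / 2.0, t * np.abs(u) - t**2 / 2.0)
```

For example, `huber_rho(0.5)` gives 0.125 (the quadratic branch), while `huber_rho(10.0)` gives 18.0 rather than the 50.0 that u²/2 would give.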
M-ESTIMATORS
1 Knowing the appropriate ρ(u) to use requires knowledge of how the errors are really distributed.
2 Functions are usually chosen by considering how the resulting estimator down-weights the larger residuals.
3 A robust M-estimator achieves this by minimizing the sum of a less rapidly increasing objective function than the ρ(u) = u² of least squares.
M-ESTIMATORS
Constructing a Scale Equivariant Estimator
The M-estimator is not necessarily scale equivariant, i.e. if the errors yi − xiTβ were multiplied by a constant, the new solution to the above equation might not be the scaled version of the old one.
M-ESTIMATORS
To obtain a scale-equivariant version of this estimator we usually solve

$$\sum_{i=1}^{n} \rho\left(\frac{\varepsilon_i}{s}\right) = \sum_{i=1}^{n} \rho\left(\frac{y_i - x_i^T \beta}{s}\right) \qquad (13)$$

where s is a robust estimate of scale.
M-ESTIMATORS
A popular choice for s is the rescaled median absolute deviation

$$s = 1.4826 \times \mathrm{MAD} \qquad (14)$$

where MAD is the median absolute deviation

$$\mathrm{MAD} = \mathrm{Median}\,|y_i - x_i^T \hat\beta| = \mathrm{Median}\,|\varepsilon_i| \qquad (15)$$
M-ESTIMATORS
s is highly resistant to outlying observations, with a BDP of 50%, as it is based on the median rather than the mean. The estimator rescales the MAD by the factor 1.4826 so that when the sample is large and the εi are really distributed as N(0, σ²), s estimates the standard deviation.
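A quick simulation sketch of (14) and (15) (simulated errors, not the seminar's data): for a large sample of N(0, σ²) errors, s = 1.4826 × MAD should come out close to σ.

```python
import numpy as np

# Draw a large sample of normal "errors" with true sigma = 3.
rng = np.random.default_rng(1)
eps = rng.normal(0.0, 3.0, size=100_000)

# MAD of the sample (15), then the rescaled scale estimate (14).
mad = np.median(np.abs(eps - np.median(eps)))
s = 1.4826 * mad   # should be close to 3
```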
M-ESTIMATORS
With a large sample and εi ∼ N(0, σ²):

P(|εi| < MAD) ≈ 0.5
⇒ P(|(εi − 0)/σ| < MAD/σ) ≈ 0.5
⇒ P(|Z| < MAD/σ) ≈ 0.5
⇒ MAD/σ ≈ Φ⁻¹(0.75)
M-ESTIMATORS
⇒ MAD/Φ⁻¹(0.75) ≈ σ
⇒ 1.4826 × MAD ≈ σ

Thus the tuning constant 1.4826 makes s an approximately unbiased estimator of σ if n is large and the error distribution is normal.
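The constant itself can be checked directly from the standard normal quantile function (a sketch using Python's standard library):

```python
from statistics import NormalDist

# Phi^{-1}(0.75) ~ 0.6745, and its reciprocal is the 1.4826 tuning
# constant used in the rescaled MAD.
q = NormalDist().inv_cdf(0.75)
print(round(1.0 / q, 4))  # 1.4826
```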
M-ESTIMATORS
Finding an M-Estimator
To obtain an M-estimate we solve

$$\min_{\beta} \sum_{i=1}^{n} \rho\left(\frac{\varepsilon_i}{s}\right) = \min_{\beta} \sum_{i=1}^{n} \rho\left(\frac{y_i - x_i'\beta}{s}\right) \qquad (16)$$

For that we equate the first partial derivatives of ρ with respect to each βj (j = 0, 1, 2, ..., k) to zero, yielding a necessary condition for a minimum.
M-ESTIMATORS
This gives a system of p = k + 1 equations

$$\sum_{i=1}^{n} x_{ij}\, \psi\!\left(\frac{y_i - x_i'\beta}{s}\right) = 0, \qquad j = 0, 1, 2, ..., k \qquad (17)$$

where ψ = ρ′, x_{ij} is the i-th observation on the j-th regressor, and x_{i0} = 1. In general ψ is a non-linear function, so equation (17) must be solved iteratively. The most widely used method is iteratively reweighted least squares (IRLS).
M-ESTIMATORS
To use iteratively reweighted least squares, suppose that an initial estimate β̂⁰ is available and that s is an estimate of scale. Then we write the p = k + 1 equations as

$$\sum_{i=1}^{n} x_{ij}\, \psi\!\left(\frac{y_i - x_i'\beta}{s}\right) = \sum_{i=1}^{n} x_{ij} \left\{\frac{\psi[(y_i - x_i'\beta)/s]}{(y_i - x_i'\beta)/s}\right\} \frac{y_i - x_i'\beta}{s} = 0 \qquad (18)$$
M-ESTIMATORS
This may be written as

$$\sum_{i=1}^{n} x_{ij} W_i^0 (y_i - x_i'\beta) = 0, \qquad j = 0, 1, 2, ..., k \qquad (19)$$

where

$$W_i^0 = \begin{cases} \dfrac{\psi[(y_i - x_i'\hat\beta^0)/s]}{(y_i - x_i'\hat\beta^0)/s} & \text{if } y_i \neq x_i'\hat\beta^0 \\[2ex] 1 & \text{if } y_i = x_i'\hat\beta^0 \end{cases} \qquad (20)$$
M-ESTIMATORS
We may write the above equations in matrix form as follows:

$$X' W^0 X \beta = X' W^0 y \qquad (21)$$

where W⁰ is an n × n diagonal matrix of weights whose diagonal elements are given by

$$W_i^0 = \begin{cases} \dfrac{\psi[(y_i - x_i'\hat\beta^0)/s]}{(y_i - x_i'\hat\beta^0)/s} & \text{if } y_i \neq x_i'\hat\beta^0 \\[2ex] 1 & \text{if } y_i = x_i'\hat\beta^0 \end{cases} \qquad (22)$$
M-ESTIMATORS
From the matrix form we see that this expression is the same as the usual weighted least squares normal equations. Consequently the one-step estimator is

$$\hat\beta^1 = (X' W^0 X)^{-1} X' W^0 y \qquad (23)$$

At the next step we recompute the weights from the equation for W, but using β̂¹ rather than β̂⁰.
M-ESTIMATORS
NOTE:
Usually only a few iterations are required to obtain convergence.
It can easily be implemented in a computer program.
M-ESTIMATORS
Re-Descending Estimators
Re- descending M estimators are those which have
influence functions that are non decreasing near the origin
but decreasing towards zero far from the origin.
Their ψ can be chosen to redescend smoothly to zero, so
that they usually satisfy ψ(x) = 0 for all |x | > r where r
is referred to as the minimum rejection point.
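A concrete redescending ψ is Andrew's wave function, sketched here with the tuning constant a = 1.48 used later in the example (the rejection point is r = aπ):

```python
import numpy as np

def andrews_psi(z, a=1.48):
    # Andrew's wave: psi(z) = sin(z/a) for |z| <= a*pi, and exactly 0
    # beyond the rejection point a*pi (observations there get weight 0).
    z = np.asarray(z, dtype=float)
    return np.where(np.abs(z) <= a * np.pi, np.sin(z / a), 0.0)
```

For instance, `andrews_psi(10.0)` is exactly 0 because 10 exceeds 1.48π ≈ 4.65, so such an observation is fully rejected.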
M-ESTIMATORS
[Slides 43-45 contain figures only; the images are not reproduced in this transcript.]
M-ESTIMATORS
Robust Criterion Functions

| Criterion | ρ(z) | ψ(z) | w(z) | Range |
|---|---|---|---|---|
| Least squares | z²/2 | z | 1.0 | \|z\| < ∞ |
| Huber's t-function (t = 2) | z²/2 | z | 1.0 | \|z\| ≤ t |
| | t\|z\| − t²/2 | t·sign(z) | t/\|z\| | \|z\| > t |
| Andrew's wave function | a(1 − cos(z/a)) | sin(z/a) | sin(z/a)/(z/a) | \|z\| ≤ aπ |
DELIVERY TIME PROBLEM
Problem
A soft drink bottler is analyzing the vending machine service routes in his distribution system. He is interested in predicting the amount of time required by the route driver to service the vending machines in an outlet. This service activity includes stocking the machine with beverage products and minor maintenance or housekeeping. The industrial engineer responsible for the study has suggested that the two most important variables affecting the delivery time (y) are the number of cases of product stocked (x1) and the distance walked by the route driver (x2). The engineer has collected 25 observations on delivery time, which are shown in the following table. Fit a regression model to it.
DELIVERY TIME PROBLEM
Table of Data

| Observation i | Delivery time y (min) | Number of cases x1 | Distance x2 (ft) |
|---|---|---|---|
| 1 | 16.68 | 7 | 560 |
| 2 | 11.50 | 3 | 320 |
| 3 | 12.03 | 3 | 340 |
| 4 | 14.88 | 4 | 80 |
| 5 | 13.75 | 6 | 150 |
| 6 | 18.11 | 7 | 330 |
| 7 | 8.00 | 2 | 110 |
| 8 | 17.83 | 7 | 210 |
| 9 | 79.24 | 30 | 1460 |
| 10 | 21.50 | 5 | 605 |
| 11 | 40.33 | 16 | 688 |
| 12 | 21.00 | 10 | 215 |
| 13 | 13.50 | 4 | 255 |
DELIVERY TIME PROBLEM
| Observation i | Delivery time y (min) | Number of cases x1 | Distance x2 (ft) |
|---|---|---|---|
| 14 | 19.75 | 6 | 462 |
| 15 | 24.00 | 9 | 448 |
| 16 | 29.00 | 10 | 776 |
| 17 | 15.35 | 6 | 200 |
| 18 | 19.00 | 7 | 132 |
| 19 | 9.50 | 3 | 36 |
| 20 | 35.10 | 17 | 770 |
| 21 | 17.90 | 10 | 140 |
| 22 | 52.32 | 26 | 810 |
| 23 | 18.75 | 9 | 450 |
| 24 | 19.83 | 8 | 635 |
| 25 | 10.75 | 4 | 150 |
DELIVERY TIME PROBLEM
Least Squares Fit of the Delivery Time Data

| Obs. | yi | ŷi | ei | Weight |
|---|---|---|---|---|
| 1 | 16.68 | 21.7081 | -5.0281 | 1.0 |
| 2 | 11.50 | 10.3536 | 1.1464 | 1.0 |
| 3 | 12.03 | 12.0798 | -0.0498 | 1.0 |
| 4 | 14.88 | 9.9557 | 4.9244 | 1.0 |
| 5 | 13.75 | 14.1944 | -0.4444 | 1.0 |
| 6 | 18.11 | 18.3996 | -0.2896 | 1.0 |
| 7 | 8.00 | 7.1554 | 0.8446 | 1.0 |
| 8 | 17.83 | 16.6734 | 1.1566 | 1.0 |
| 9 | 79.24 | 71.8203 | 7.4197 | 1.0 |
| 10 | 21.50 | 19.1236 | 2.3764 | 1.0 |
| 11 | 40.33 | 38.0925 | 2.2375 | 1.0 |
| 12 | 21.00 | 21.5930 | -0.5930 | 1.0 |
| 13 | 13.50 | 12.4730 | 1.0270 | 1.0 |
DELIVERY TIME PROBLEM
| Obs. | yi | ŷi | ei | Weight |
|---|---|---|---|---|
| 14 | 19.75 | 18.6825 | 1.0675 | 1.0 |
| 15 | 24.00 | 23.3288 | 0.6712 | 1.0 |
| 16 | 29.00 | 29.6629 | -0.6629 | 1.0 |
| 17 | 15.35 | 14.9136 | 0.4364 | 1.0 |
| 18 | 19.00 | 15.5514 | 3.4486 | 1.0 |
| 19 | 9.50 | 7.7068 | 1.7932 | 1.0 |
| 20 | 35.10 | 40.8880 | -5.7880 | 1.0 |
| 21 | 17.90 | 20.5142 | -2.6142 | 1.0 |
| 22 | 52.32 | 56.0065 | -3.6865 | 1.0 |
| 23 | 18.75 | 23.3576 | -4.6076 | 1.0 |
| 24 | 19.83 | 24.4029 | -4.5729 | 1.0 |
| 25 | 10.75 | 10.9626 | -0.2126 | 1.0 |
DELIVERY TIME PROBLEM
Accordingly, we have the following values for the parameters:
β0 = 2.3412
β1 = 1.6159
β2 = 0.014385

Thus we have the fitted regression equation:

ŷ = 2.3412 + 1.6159 x1 + 0.014385 x2   (24)
DELIVERY TIME PROBLEM
Huber's t-Function, t = 2

| Obs. | yi | ŷi | ei | Weight |
|---|---|---|---|---|
| 1 | 16.68 | 21.7651 | -5.0851 | 0.6397 |
| 2 | 11.50 | 10.9809 | 0.5191 | 1.0 |
| 3 | 12.03 | 12.6296 | -0.5996 | 1.0 |
| 4 | 14.88 | 10.5856 | 4.2944 | 0.7572 |
| 5 | 13.75 | 14.6038 | -0.8538 | 1.0 |
| 6 | 18.11 | 18.6051 | -0.4951 | 1.0 |
| 7 | 8.00 | 7.9414 | 0.0587 | 1.0 |
| 8 | 17.83 | 16.9564 | 0.8736 | 1.0 |
| 9 | 79.24 | 69.2795 | 9.9605 | 0.3270 |
| 10 | 21.50 | 19.3269 | 2.1731 | 1.0 |
| 11 | 40.33 | 37.2777 | 3.0523 | 1.0 |
| 12 | 21.00 | 21.6097 | -0.6097 | 1.0 |
| 13 | 13.50 | 12.9900 | 0.5100 | 1.0 |
DELIVERY TIME PROBLEM
| Obs. | yi | ŷi | ei | Weight |
|---|---|---|---|---|
| 14 | 19.75 | 18.8904 | 0.8596 | 1.0 |
| 15 | 24.00 | 23.2828 | 0.7172 | 1.0 |
| 16 | 29.00 | 29.3174 | -0.3174 | 1.0 |
| 17 | 15.35 | 15.2908 | 0.0592 | 1.0 |
| 18 | 19.00 | 15.8847 | 3.1153 | 1.0 |
| 19 | 9.50 | 8.4529 | 1.0471 | 1.0 |
| 20 | 35.10 | 39.9326 | -4.8326 | 0.6728 |
| 21 | 17.90 | 20.5793 | -2.6793 | 1.0 |
| 22 | 52.32 | 54.2361 | -1.9161 | 1.0 |
| 23 | 18.75 | 23.3102 | -4.5602 | 0.7135 |
| 24 | 19.83 | 24.3238 | -4.4938 | 0.7238 |
| 25 | 10.75 | 11.5474 | -0.7974 | 1.0 |
DELIVERY TIME PROBLEM
Accordingly, we get the values of the parameters as follows:
β0 = 3.3736
β1 = 1.5282
β2 = 0.013739

Thus we get the fitted regression equation:

ŷ = 3.3736 + 1.5282 x1 + 0.013739 x2   (25)
DELIVERY TIME PROBLEM
Andrew's Wave Function with a = 1.48

| Obs. | yi | ŷi | ei | Weight |
|---|---|---|---|---|
| 1 | 16.68 | 21.6430 | -4.9630 | 0.4276 |
| 2 | 11.50 | 11.6923 | -0.1923 | 0.9989 |
| 3 | 12.03 | 13.1457 | -1.1157 | 0.9646 |
| 4 | 14.88 | 11.4549 | 3.4251 | 0.6949 |
| 5 | 13.75 | 15.2191 | -1.4691 | 0.9393 |
| 6 | 18.11 | 18.8574 | -0.7474 | 0.9840 |
| 7 | 8.00 | 8.9019 | -0.9019 | 0.9769 |
| 8 | 17.83 | 17.4040 | 0.4260 | 0.9947 |
| 9 | 79.24 | 66.0818 | 13.1582 | 0.0 |
| 10 | 21.50 | 19.2716 | 2.2284 | 0.8636 |
| 11 | 40.33 | 36.3170 | 4.0130 | 0.5975 |
| 12 | 21.00 | 21.8392 | -0.8392 | 0.9800 |
| 13 | 13.50 | 13.5744 | -0.0744 | 0.9998 |
DELIVERY TIME PROBLEM
| Obs. | yi | ŷi | ei | Weight |
|---|---|---|---|---|
| 14 | 19.75 | 18.9979 | 0.7521 | 0.9839 |
| 15 | 24.00 | 23.2029 | 0.7971 | 0.9819 |
| 16 | 29.00 | 28.6336 | 0.3664 | 0.9962 |
| 17 | 15.35 | 15.8247 | -0.4747 | 0.9936 |
| 18 | 19.00 | 16.4593 | 2.5407 | 0.8241 |
| 19 | 9.50 | 9.4638 | 0.0362 | 0.9999 |
| 20 | 35.10 | 38.7684 | -3.6684 | 0.6553 |
| 21 | 17.90 | 20.9308 | -3.0308 | 0.7566 |
| 22 | 52.32 | 52.3766 | -0.0566 | 0.9999 |
| 23 | 18.75 | 23.2271 | -4.4771 | 0.5155 |
| 24 | 19.83 | 24.0095 | -4.1796 | 0.5678 |
| 25 | 10.75 | 12.3027 | -1.5527 | 0.9323 |
DELIVERY TIME PROBLEM
Thus we have the estimates as follows:
β0 = 4.6532
β1 = 1.4582
β2 = 0.012111

Thus we get the fitted regression equation:

ŷ = 4.6532 + 1.4582 x1 + 0.012111 x2   (26)
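The three fitted equations (24)-(26) can be cross-checked against the tables above by evaluating them at observation 9 (x1 = 30 cases, x2 = 1460 ft), the most outlying point; the robust fits pull the prediction progressively further from the outlier.

```python
# Evaluate each fitted equation at a given (x1, x2).
def fitted(b0, b1, b2, x1, x2):
    return b0 + b1 * x1 + b2 * x2

yhat_ls = fitted(2.3412, 1.6159, 0.014385, 30, 1460)       # least squares (24)
yhat_huber = fitted(3.3736, 1.5282, 0.013739, 30, 1460)    # Huber, t = 2 (25)
yhat_andrews = fitted(4.6532, 1.4582, 0.012111, 30, 1460)  # Andrew's wave (26)
```

These reproduce the ŷ values for observation 9 listed in the three tables (about 71.82, 69.28 and 66.08 respectively).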
ANALYSIS
Computing M-Estimators
Robust regression methods are not an option in most statistical software today.
SAS PROC NLIN, among others, can be used to implement the iteratively reweighted least squares procedure.
There are also robust procedures available in S-PLUS.
ANALYSIS
Robust Regression Methods...
Robust regression methods have much to offer a data analyst.
They are extremely helpful in locating outliers and highly influential observations.
Whenever a least squares analysis is performed, it would be useful to perform a robust fit as well.
ANALYSIS
If the two fits are in substantial agreement, the
least-squares procedure offers a good estimate of the
parameters.
If the two procedures do not agree, the reason for the
difference should be identified and corrected.
Special attention needs to be given to observations
that are down-weighted in the robust fit.
PROPERTIES
Breakdown Point The finite-sample breakdown point is
the smallest fraction of anomalous data that can render the
estimator useless. The smallest possible breakdown
point is 1/n, i.e. a single observation can distort the estimator
so badly that it is of no practical use to the regression
model builder. The breakdown point of OLS is 1/n.
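The 1/n breakdown of OLS is easy to demonstrate numerically. In this hypothetical sketch, corrupting a single response out of ten swings the fitted slope far away from the true value:

```python
import numpy as np

# Ten points lying exactly on the line y = 2 + 3x
x = np.arange(10, dtype=float)
y = 2 + 3 * x

X = np.column_stack([np.ones(10), x])
clean = np.linalg.lstsq(X, y, rcond=None)[0]   # slope is exactly 3

y_bad = y.copy()
y_bad[9] += 1000.0                             # corrupt one single response
dirty = np.linalg.lstsq(X, y_bad, rcond=None)[0]

# One outlier pulls the OLS slope from 3 to about 57
print(clean[1], dirty[1])
```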
PROPERTIES
M-estimators can be affected by x-space outliers in an
identical manner to OLS.
Consequently, the breakdown point of the class of
M-estimators is 1/n as well.
We would generally want the breakdown point of an
estimator to exceed 10%.
This has led to the development of high-breakdown-
point estimators.
PROPERTIES
Efficiency
M-estimators have high asymptotic efficiency: with the
usual tuning constants they achieve close to the efficiency
of least squares under normal errors, and they remain
efficient when the error distribution is heavy-tailed.
SURVEY OF OTHER ROBUST REGRESSION ESTIMATORS
High-Breakdown-Point Estimators Because both
OLS and M-estimators suffer from a low breakdown point of
1/n, considerable effort has been devoted to finding
estimators that perform better with respect to this
property. A breakdown point of up to 50% is often desirable.
SURVEY OF OTHER ROBUST REGRESSION ESTIMATORS
There are various other estimation procedures, such as:
Least Median of Squares
Least Trimmed Sum of Squares
S-Estimators
R- and L-Estimators
Robust Ridge Regression
MM-Estimation
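To make one of these concrete, least median of squares minimizes the median of the squared residuals instead of their sum; it can be sketched via the classical random elemental-subset algorithm. The function name `lms_line` and the trial count are illustrative assumptions:

```python
import numpy as np

def lms_line(x, y, n_trials=500, seed=0):
    """Least-median-of-squares line fit via random elemental subsets (a sketch)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    best, best_med = None, np.inf
    for _ in range(n_trials):
        # draw two distinct points and fit the line through them exactly
        i, j = rng.choice(n, size=2, replace=False)
        if x[i] == x[j]:
            continue
        b1 = (y[j] - y[i]) / (x[j] - x[i])
        b0 = y[i] - b1 * x[i]
        # score the candidate line by the median of its squared residuals
        med = np.median((y - b0 - b1 * x) ** 2)
        if med < best_med:
            best_med, best = med, (b0, b1)
    return best
```

Because the median ignores the largest residuals, the chosen line tracks the majority of the data even when a large minority of responses is grossly contaminated.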
ABSTRACT & CONCLUSION
Review ⇒ Robustness and Resistance ⇒ Our Approach ⇒ Strengths and Weaknesses ⇒ M-Estimators ⇒ Delivery Time Problem ⇒ Analysis ⇒ Properties ⇒ Survey of Other Robust Regression Estimators
REFERENCE
1 Draper, Norman R. & Smith, Harry. "Applied Regression
Analysis", 3rd edn., John Wiley and Sons, New York, 1998.
2 Montgomery, Douglas C., Peck, Elizabeth A. & Vining, G.
Geoffrey. "Introduction to Linear Regression Analysis", 3rd edn., Wiley
India, 2003.
3 Brook, Richard J. "Applied Regression Analysis and
Experimental Design", Chapman & Hall, London, 1985.
4 Rawlings, John O. "Applied Regression Analysis: A Research
Tool", Springer, New York, 1989.
5 Pedhazur, Elazar J. "Multiple Regression in Behavioral Research:
Explanation and Prediction", Wadsworth, Australia, 1997.
THANK YOU