Different Kinds of Distance and Statistical Distance
![Page 1: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/1.jpg)
WELCOME TO MY PRESENTATION
ON STATISTICAL DISTANCE
![Page 2: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/2.jpg)
Md. Menhazul Abedin
M.Sc. Student
Dept. of Statistics
Rajshahi University
Mob: 01751385142
Email: [email protected]
![Page 3: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/3.jpg)
Objectives
• To understand the meaning of statistical distance, and how it relates to and differs from ordinary (Euclidean) distance
![Page 4: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/4.jpg)
Content
• Definition of Euclidean distance
• Concept & intuition of statistical distance
• Definition of statistical distance
• Necessity of statistical distance
• Concept of Mahalanobis distance (population & sample)
• Distribution of Mahalanobis distance
• Mahalanobis distance in R
• Acknowledgement
![Page 5: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/5.jpg)
Euclidean Distance from origin
[Figure: the point (X, Y) plotted against the origin (0, 0), with horizontal leg X and vertical leg Y.]
![Page 6: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/6.jpg)
Euclidean Distance
[Figure: right triangle with vertices O(0, 0) and P(X, Y), legs X and Y.]
By Pythagoras, $d(O,P) = \sqrt{X^2 + Y^2}$
![Page 7: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/7.jpg)
Euclidean Distance
Specific point
![Page 8: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/8.jpg)
![Page 9: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/9.jpg)
We see two specific points in each picture.
Our problem is to determine the length between the two points.
But how?
Assume that the pictures are placed in a two-dimensional space and the points are joined by a straight line.
![Page 10: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/10.jpg)
Let the first point be $(x_1, y_1)$ and the second point be $(x_2, y_2)$; then the distance is
$D = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$
What happens when the dimension is three?
![Page 11: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/11.jpg)
Distance in three dimensions
![Page 12: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/12.jpg)
• Points are $(x_1, x_2, x_3)$ and $(y_1, y_2, y_3)$
• Distance is given by $d = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2}$
![Page 13: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/13.jpg)
For $p$ dimensions it can be written as the following expression, named the Euclidean distance:
$$P = (x_1, x_2, \ldots, x_p), \quad Q = (y_1, y_2, \ldots, y_p)$$
$$d(P,Q) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \cdots + (x_p - y_p)^2}$$
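A minimal R sketch of the p-dimensional Euclidean distance formula follows. The helper name `euclid` and the two example points are assumptions made for illustration; they do not come from the slides. The result is compared with base R's `dist()`, whose default method is Euclidean.

```r
# Hypothetical helper: Euclidean distance between two numeric vectors
euclid <- function(p, q) sqrt(sum((p - q)^2))

# Two illustrative points in three dimensions (assumed values)
P <- c(1, 2, 3)
Q <- c(4, 6, 3)

d_manual <- euclid(P, Q)                    # sqrt(3^2 + 4^2 + 0^2)
d_base   <- as.numeric(dist(rbind(P, Q)))   # dist() defaults to method = "euclidean"

d_manual
d_base
```

Both values should agree, since `dist()` applies the same formula to each pair of rows.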
![Page 14: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/14.jpg)
05/01/2023 14
Properties of Euclidean Distance and Mathematical Distance
• The usual human concept of distance is Euclidean distance.
• Each coordinate contributes equally to the distance.
$$P = (x_1, x_2, \ldots, x_p), \quad Q = (y_1, y_2, \ldots, y_p)$$
$$d(P,Q) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \cdots + (x_p - y_p)^2}$$
Mathematicians, generalizing its three properties,
1) $d(P,Q) = d(Q,P)$,
2) $d(P,Q) = 0$ if and only if $P = Q$, and
3) $d(P,Q) \le d(P,R) + d(R,Q)$ for all $R$,
define distance on any set.
![Page 15: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/15.jpg)
[Figure: triangle with vertices P(X1, Y1), Q(X2, Y2) and R(Z1, Z2), illustrating the triangle inequality.]
![Page 16: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/16.jpg)
Taxicab Distance: Notion
• Red: Manhattan distance.
• Green: diagonal, straight-line distance.
• Blue, yellow: equivalent Manhattan distances.
![Page 17: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/17.jpg)
• The Manhattan distance is the simple sum of the horizontal and vertical components, whereas
the diagonal distance might be computed by applying the Pythagorean Theorem .
![Page 18: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/18.jpg)
![Page 19: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/19.jpg)
• Manhattan distance: 12 units
• Diagonal (straight-line, or Euclidean) distance: $\sqrt{6^2 + 6^2} = 6\sqrt{2} \approx 8.49$ units
We observe that the Euclidean distance is less than the Manhattan distance.
![Page 20: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/20.jpg)
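Taxicab/Manhattan distance: Definition
[Figure: points $(p_1, p_2)$ and $(q_1, q_2)$ joined by a horizontal leg of length $|p_1 - q_1|$ and a vertical leg of length $|p_2 - q_2|$.]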
![Page 21: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/21.jpg)
Manhattan Distance
• The taxicab distance between (p1,p2) and (q1,q2) is │p1-q1│+│p2-q2│
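The definition can be tried out in R. The sketch below assumes two points six blocks apart in each direction, which reproduces a 12-unit taxicab route. The helper names `manhattan` and `euclid` are illustrative, not from the slides.

```r
# Hypothetical helpers, named here for illustration only
manhattan <- function(p, q) sum(abs(p - q))
euclid    <- function(p, q) sqrt(sum((p - q)^2))

# Assumed endpoints: six blocks apart horizontally and vertically
A <- c(0, 0)
B <- c(6, 6)

d_taxi <- manhattan(A, B)     # |0 - 6| + |0 - 6|
d_eucl <- euclid(A, B)        # sqrt(6^2 + 6^2)

# Base R offers the same metric through dist()
d_taxi_base <- as.numeric(dist(rbind(A, B), method = "manhattan"))

c(taxicab = d_taxi, euclidean = d_eucl, taxicab_base = d_taxi_base)
```

Here the taxicab distance is 12 while the Euclidean distance is 6√2, so the straight line is shorter.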
![Page 22: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/22.jpg)
Relationship between Manhattan & Euclidean distance.
[Figure: city grid in which one route is 7 blocks long and the other is 6 blocks long.]
![Page 23: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/23.jpg)
Relationship between Manhattan & Euclidean distance.
• It now seems that the distance from A to C is 7 blocks, while the distance from A to B is 6 blocks.
• Unless we choose to go off-road, B is now closer to A than C.
• Taxicab distance is sometimes equal to Euclidean distance, but otherwise it is greater than Euclidean distance.
Euclidean distance ≤ Taxicab distance
Is it always true? And for n dimensions?
![Page 24: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/24.jpg)
Proof……..
Absolute values guarantee non-negative value
Addition property of inequality
![Page 25: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/25.jpg)
Continued………..
![Page 26: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/26.jpg)
Continued………..
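The equations on the proof slides did not survive extraction. A sketch of the standard two-dimensional argument, using the two justifications named on the first proof slide, is given here. It is an assumption that the original slides followed these exact steps.

```latex
\[
\begin{aligned}
|x_1-y_1|\,|x_2-y_2| &\ge 0
  && \text{(absolute values are non-negative)}\\
(x_1-y_1)^2+(x_2-y_2)^2
  &\le (x_1-y_1)^2+(x_2-y_2)^2+2\,|x_1-y_1|\,|x_2-y_2|
  && \text{(addition property of inequality)}\\
  &= \bigl(|x_1-y_1|+|x_2-y_2|\bigr)^2\\
\sqrt{(x_1-y_1)^2+(x_2-y_2)^2} &\le |x_1-y_1|+|x_2-y_2|
\end{aligned}
\]
```

Equality holds exactly when one of the two coordinate differences is zero, that is, when the two points lie on a common horizontal or vertical line.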
![Page 27: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/27.jpg)
For high dimension
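• It holds for the high-dimensional case:
• $\sum_i |x_i - y_i|^2 \le \sum_i |x_i - y_i|^2 + 2\sum_{i<j} |x_i - y_i|\,|x_j - y_j| = \left(\sum_i |x_i - y_i|\right)^2$
Which implies $\sqrt{\sum_i (x_i - y_i)^2} \le \sum_i |x_i - y_i|$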
![Page 28: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/28.jpg)
Statistical Distance
• Weight coordinates subject to a great deal of variability less heavily than those that are not highly variable.
[Figure: scatter of data with two marked points at the same distance from the origin. Which point is nearer to the data set?]
![Page 29: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/29.jpg)
• Here the variability along the x1 axis > the variability along the x2 axis.
• Is the same distance from the origin meaningful? Ans: No.
• But how do we take the different variability into account? Ans: Give different weights to the axes.
![Page 30: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/30.jpg)
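Statistical Distance for Uncorrelated Data
$$P = (x_1, x_2), \quad O = (0, 0)$$
Standardization: $x_1^* = x_1/\sqrt{s_{11}}, \quad x_2^* = x_2/\sqrt{s_{22}}$
$$d(O,P) = \sqrt{(x_1^*)^2 + (x_2^*)^2} = \sqrt{\frac{x_1^2}{s_{11}} + \frac{x_2^2}{s_{22}}}$$
The weights are $1/s_{11}$ and $1/s_{22}$.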
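A small R sketch of this weighting follows. The variances in `s` and the two test points are assumed values chosen so that x1 is the more variable coordinate. The helper name `stat_dist` is hypothetical.

```r
# Hypothetical helper: statistical distance from the origin for
# uncorrelated coordinates, given the sample variances s = (s11, s22, ...)
stat_dist <- function(x, s) sqrt(sum(x^2 / s))

s  <- c(4, 1)     # assumed variances: x1 varies more than x2
P1 <- c(2, 0)     # lies along the high-variability axis
P2 <- c(0, 2)     # lies along the low-variability axis

# Both points are at Euclidean distance 2 from the origin
e1 <- sqrt(sum(P1^2))
e2 <- sqrt(sum(P2^2))

# Their statistical distances differ
d1 <- stat_dist(P1, s)   # 2 / sqrt(4)
d2 <- stat_dist(P2, s)   # 2 / sqrt(1)

c(e1 = e1, e2 = e2, d1 = d1, d2 = d2)
```

The point along the more variable axis ends up statistically closer to the origin, which is the behaviour the weighting is meant to produce.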
![Page 31: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/31.jpg)
All points that have coordinates $(x_1, x_2)$ and are a constant squared distance $c^2$ from the origin must satisfy
$$\frac{x_1^2}{s_{11}} + \frac{x_2^2}{s_{22}} = c^2$$
But how do we choose $c$? That is a problem. One choice: pick $c$ so that 95% of the observations fall in this area.
![Page 32: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/32.jpg)
Ellipse of Constant Statistical Distance for Uncorrelated Data
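[Figure: ellipse centred at the origin 0 in the $(x_1, x_2)$ plane, crossing the $x_1$ axis at $\pm c\sqrt{s_{11}}$ and the $x_2$ axis at $\pm c\sqrt{s_{22}}$.]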
![Page 33: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/33.jpg)
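• This expression can be generalized to the statistical distance from an arbitrary point $P = (x_1, x_2)$ to any fixed point $Q = (y_1, y_2)$:
$$d(P,Q) = \sqrt{\frac{(x_1 - y_1)^2}{s_{11}} + \frac{(x_2 - y_2)^2}{s_{22}}}$$
For $p$ dimensions:
$$d(P,Q) = \sqrt{\frac{(x_1 - y_1)^2}{s_{11}} + \frac{(x_2 - y_2)^2}{s_{22}} + \cdots + \frac{(x_p - y_p)^2}{s_{pp}}}$$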
![Page 34: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/34.jpg)
Remarks:
1) The distance of P to the origin O is obtained by setting all $y_i = 0$.
2) If all $s_{ii}$ are equal, the Euclidean distance formula is appropriate.
![Page 35: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/35.jpg)
Scatter Plot for Correlated Measurements
![Page 36: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/36.jpg)
• How do you measure the statistical distance for the above data set?
• Ans: First make it uncorrelated.
• But why, and how?
• Ans: Rotate the axes, keeping the origin fixed.
![Page 37: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/37.jpg)
Scatter Plot for Correlated Measurements
![Page 38: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/38.jpg)
Rotation of axes keeping origin fixed
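[Figure: original axes $x_1$, $x_2$ and rotated axes $\tilde{x}_1$, $\tilde{x}_2$ at angle $\theta$, showing the point $P(x_1, x_2)$ and the construction points O, M, N, Q, R.]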
![Page 39: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/39.jpg)
$x_1 = OM = OR - MR = \tilde{x}_1\cos\theta - \tilde{x}_2\sin\theta$ …. (i)
$x_2 = MP = QR + NP = \tilde{x}_1\sin\theta + \tilde{x}_2\cos\theta$ …. (ii)
![Page 40: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/40.jpg)
• The solution of the above equations is
$\tilde{x}_1 = x_1\cos\theta + x_2\sin\theta$
$\tilde{x}_2 = -x_1\sin\theta + x_2\cos\theta$
![Page 41: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/41.jpg)
Choice of $\theta$
What will you choose? How will you do it?
Data matrix → centred data matrix → covariance matrix → eigenvectors
$\theta$ = angle between the 1st eigenvector and [1,0], or the angle between the 2nd eigenvector and [0,1]
![Page 42: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/42.jpg)
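Why the angle between the 1st eigenvector and [1,0], or between the 2nd eigenvector and [0,1]?
Ans: Let $B$ be a $p \times p$ positive definite matrix with eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p > 0$ and associated normalized eigenvectors $e_1, e_2, \ldots, e_p$. Then
$\max_{x \ne 0} \frac{x'Bx}{x'x} = \lambda_1$, attained when $x = e_1$
$\min_{x \ne 0} \frac{x'Bx}{x'x} = \lambda_p$, attained when $x = e_p$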
![Page 43: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/43.jpg)
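$\max_{x \perp e_1, \ldots, e_k} \frac{x'Bx}{x'x} = \lambda_{k+1}$, attained when $x = e_{k+1}$, $k = 1, 2, \ldots, p-1$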
![Page 44: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/44.jpg)
Choice of $\theta$
#### Exercise 16, page 309. Heights in inches (x) & weights in pounds (y). An Introduction to Statistics and Probability, M. Nurul Islam ####
x=c(60,60,60,60,62,62,62,64,64,64,66,66,66,66,68,68,68,70,70,70);x
y=c(115,120,130,125,130,140,120,135,130,145,135,170,140,155,150,160,175,180,160,175);y
############
plot(x,y)
![Page 45: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/45.jpg)
data=data.frame(x,y);data
as.matrix(data)
colMeans(data)
xmv=c(rep(64.8,20));xmv ### x mean vector
ymv=c(rep(144.5,20));ymv ### y mean vector
meanmatrix=cbind(xmv,ymv);meanmatrix
cdata=data-meanmatrix;cdata ### mean centred data
plot(cdata)
abline(h=0,v=0)
cor(cdata)
![Page 46: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/46.jpg)
• ##################
cov(cdata)
eigen(cov( cdata))
xx1=c(1,0);xx1
xx2=c(0,1);xx2
vv1=eigen(cov(cdata))$vectors[,1];vv1
vv2=eigen(cov(cdata))$vectors[,2];vv2
![Page 47: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/47.jpg)
################
theta = acos( sum(xx1*vv1) / ( sqrt(sum(xx1 * xx1)) * sqrt(sum(vv1 * vv1)) ) );theta
theta = acos( sum(xx2*vv2) / ( sqrt(sum(xx2 * xx2)) * sqrt(sum(vv2 * vv2)) ) );theta
###############
xx=cdata[,1]*cos( 1.41784)+cdata[,2]*sin( 1.41784);xx
yy=-cdata[,1]*sin( 1.41784)+cdata[,2]*cos( 1.41784);yy
plot(xx,yy)
abline(h=0,v=0)
![Page 48: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/48.jpg)
V=eigen(cov(cdata))$vectors;V
tdata=as.matrix(cdata)%*%V;tdata ### transformed data
cov(tdata)
round(cov(tdata),14)
cor(tdata)
plot(tdata)
abline(h=0,v=0)
round(cor(tdata),16)
![Page 49: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/49.jpg)
################ comparison of both methods ############
comparison=tdata - as.matrix(cbind(xx,yy));comparison
round(comparison,4)
![Page 50: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/50.jpg)
########### using package: md from original data #####
md=mahalanobis(data,colMeans(data),cov(data),inverted =F);md ## md = Mahalanobis distance
######## Mahalanobis distance from transformed data ########
tmd=mahalanobis(tdata,colMeans(tdata),cov(tdata),inverted =F);tmd
###### comparison ############
md-tmd
![Page 51: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/51.jpg)
Mahalanobis distance: Manually
mu=colMeans(tdata);mu
incov=solve(cov(tdata));incov
md1=t(tdata[1,]-mu)%*%incov%*%(tdata[1,]-mu);md1
md2=t(tdata[2,]-mu)%*%incov%*%(tdata[2,]-mu);md2
md3=t(tdata[3,]-mu)%*%incov%*%(tdata[3,]-mu);md3
…
md20=t(tdata[20,]-mu)%*%incov%*%(tdata[20,]-mu);md20
The md values from the package and from the manual calculation are equal.
![Page 52: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/52.jpg)
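tdata
s1=sd(tdata[,1]);s1
s2=sd(tdata[,2]);s2
xstar=c(tdata[,1])/s1;xstar
ystar=c(tdata[,2])/s2;ystar
md1=sqrt((-1.46787309)^2 + (0.1484462)^2);md1
md2=sqrt((-1.22516896 )^2 + ( 0.6020111 )^2);md2
…
Not equal to the above distances. Why? Note that mahalanobis() returns the squared distance, so these square roots must be squared before comparing. The mean must also be taken into account when the data are not centred.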
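The comparison can be checked directly. The sketch below rebuilds the standardized rotated coordinates from the height and weight data on the earlier slide and compares them with `mahalanobis()`, which returns squared distances. Using `scale()` for centring and `<-` for assignment are choices made for this sketch, not the slides' own code.

```r
# Height (inches) and weight (pounds) data from the slides
x <- c(60,60,60,60,62,62,62,64,64,64,66,66,66,66,68,68,68,70,70,70)
y <- c(115,120,130,125,130,140,120,135,130,145,135,170,140,155,150,160,175,180,160,175)
data <- cbind(x, y)

# Centre, rotate onto the eigenvectors, then standardize each rotated axis
cdata <- scale(data, center = TRUE, scale = FALSE)
V     <- eigen(cov(cdata))$vectors
tdata <- cdata %*% V
xstar <- tdata[, 1] / sd(tdata[, 1])
ystar <- tdata[, 2] / sd(tdata[, 2])

d2 <- as.numeric(xstar^2 + ystar^2)   # squared statistical distance
md <- as.numeric(mahalanobis(data, colMeans(data), cov(data)))

all.equal(d2, md)          # TRUE: the squared values agree
all.equal(sqrt(d2), md)    # reports a difference: the square roots do not
```

So the square root of `md` is the quantity to compare with `sqrt(xstar^2 + ystar^2)`.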
![Page 53: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/53.jpg)
Statistical Distance under Rotated Coordinate System
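$$O = (0, 0), \quad P = (\tilde{x}_1, \tilde{x}_2)$$
$$d(O,P) = \sqrt{\frac{\tilde{x}_1^2}{\tilde{s}_{11}} + \frac{\tilde{x}_2^2}{\tilde{s}_{22}}}$$
$$\tilde{x}_1 = x_1\cos\theta + x_2\sin\theta, \quad \tilde{x}_2 = -x_1\sin\theta + x_2\cos\theta$$
$$d(O,P) = \sqrt{a_{11}x_1^2 + 2a_{12}x_1x_2 + a_{22}x_2^2}$$
$\tilde{s}_{11}$ and $\tilde{s}_{22}$ are the sample variances of the rotated coordinates.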
![Page 54: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/54.jpg)
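• After some manipulation this can be written in terms of the original variables as $d(O,P) = \sqrt{a_{11}x_1^2 + 2a_{12}x_1x_2 + a_{22}x_2^2}$, where
$$a_{11} = \frac{\cos^2\theta}{\tilde{s}_{11}} + \frac{\sin^2\theta}{\tilde{s}_{22}}, \quad a_{22} = \frac{\sin^2\theta}{\tilde{s}_{11}} + \frac{\cos^2\theta}{\tilde{s}_{22}}, \quad a_{12} = \frac{\sin\theta\cos\theta}{\tilde{s}_{11}} - \frac{\sin\theta\cos\theta}{\tilde{s}_{22}}$$
with $\tilde{s}_{11} = s_{11}\cos^2\theta + 2s_{12}\sin\theta\cos\theta + s_{22}\sin^2\theta$ and $\tilde{s}_{22} = s_{11}\sin^2\theta - 2s_{12}\sin\theta\cos\theta + s_{22}\cos^2\theta$.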
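The claim that the rotated-coordinate distance can be rewritten in the original variables can be checked numerically. The sketch below uses simulated correlated data, an arbitrary angle and an arbitrary test point, all of which are assumptions. The coefficient formulas in the comments come from substituting the rotation into the distance expression.

```r
set.seed(1)                                # arbitrary seed, for reproducibility only
x1 <- rnorm(50)
x2 <- 0.8 * x1 + rnorm(50, sd = 0.5)       # simulated correlated measurements
theta <- 0.6                               # the identity holds for any angle

S   <- cov(cbind(x1, x2))
s11 <- S[1, 1]; s12 <- S[1, 2]; s22 <- S[2, 2]

# Sample variances of the rotated coordinates
st11 <- cos(theta)^2 * s11 + 2 * sin(theta) * cos(theta) * s12 + sin(theta)^2 * s22
st22 <- sin(theta)^2 * s11 - 2 * sin(theta) * cos(theta) * s12 + cos(theta)^2 * s22

# Coefficients of the quadratic form in the original variables
a11 <- cos(theta)^2 / st11 + sin(theta)^2 / st22
a22 <- sin(theta)^2 / st11 + cos(theta)^2 / st22
a12 <- sin(theta) * cos(theta) * (1 / st11 - 1 / st22)

# Evaluate both expressions at one test point P = (p1, p2)
p1 <- 1.5; p2 <- -0.7
pt1 <-  p1 * cos(theta) + p2 * sin(theta)
pt2 <- -p1 * sin(theta) + p2 * cos(theta)

d_rotated  <- sqrt(pt1^2 / st11 + pt2^2 / st22)
d_original <- sqrt(a11 * p1^2 + 2 * a12 * p1 * p2 + a22 * p2^2)
c(d_rotated, d_original)
```

The two numbers coincide, which is the content of the "after some manipulation" step.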
![Page 55: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/55.jpg)
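Proof:
• $\tilde{s}_{11} = \operatorname{Var}(x_1\cos\theta + x_2\sin\theta) = s_{11}\cos^2\theta + 2s_{12}\sin\theta\cos\theta + s_{22}\sin^2\theta$
$\tilde{s}_{22} = \operatorname{Var}(-x_1\sin\theta + x_2\cos\theta) = s_{11}\sin^2\theta - 2s_{12}\sin\theta\cos\theta + s_{22}\cos^2\theta$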
![Page 56: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/56.jpg)
Continued………….
=
![Page 57: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/57.jpg)
Continued………….
![Page 58: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/58.jpg)
General Statistical Distance
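$$P = (x_1, x_2, \ldots, x_p), \quad O = (0, 0, \ldots, 0), \quad Q = (y_1, y_2, \ldots, y_p)$$
$$d(O,P) = \sqrt{a_{11}x_1^2 + a_{22}x_2^2 + \cdots + a_{pp}x_p^2 + 2a_{12}x_1x_2 + 2a_{13}x_1x_3 + \cdots + 2a_{p-1,p}x_{p-1}x_p}$$
$$d(P,Q) = \sqrt{\begin{aligned} &a_{11}(x_1-y_1)^2 + a_{22}(x_2-y_2)^2 + \cdots + a_{pp}(x_p-y_p)^2 \\ &+ 2a_{12}(x_1-y_1)(x_2-y_2) + 2a_{13}(x_1-y_1)(x_3-y_3) + \cdots + 2a_{p-1,p}(x_{p-1}-y_{p-1})(x_p-y_p) \end{aligned}}$$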
![Page 59: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/59.jpg)
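• The above distances are completely determined by the coefficients (weights) $a_{ik}$, $i, k = 1, 2, \ldots, p$. These can be arranged in a rectangular array as
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ a_{12} & a_{22} & \cdots & a_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1p} & a_{2p} & \cdots & a_{pp} \end{pmatrix}$$
This array (matrix) must be symmetric positive definite.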
![Page 60: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/60.jpg)
Why positive definite? Let A be a positive definite matrix.
Then A = C'C for some nonsingular matrix C, so X'AX = X'C'CX = (CX)'(CX) = Y'Y, where Y = CX.
It obeys all the distance properties, so X'AX is a (squared) distance. Different choices of A give different distances.
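The factorization A = C'C can be illustrated with the Cholesky factor that base R's `chol()` returns. The matrix `A` and the vector `x` below are assumed example values, not taken from the slides.

```r
# An assumed symmetric positive definite matrix
A <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)

C <- chol(A)                 # upper triangular factor with t(C) %*% C equal to A

x <- c(1, -2)
q_form <- as.numeric(t(x) %*% A %*% x)   # X'AX
y      <- C %*% x                        # Y = CX
y_norm <- sum(y^2)                       # Y'Y

all.equal(t(C) %*% C, A)     # the factorization reproduces A
c(q_form, y_norm)            # both equal 10 for this example
```

Measuring with A is therefore the same as applying the linear map C and then measuring with ordinary squared Euclidean length.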
![Page 61: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/61.jpg)
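• Why a positive definite matrix?
• Ans: Spectral decomposition: the spectral decomposition of a $k \times k$ symmetric matrix $A$ is given by
$$A = \lambda_1 e_1e_1' + \lambda_2 e_2e_2' + \cdots + \lambda_k e_ke_k'$$
• where $(\lambda_i, e_i)$, $i = 1, \ldots, k$, are pairs of eigenvalues and eigenvectors.
If $A$ is positive definite, every $\lambda_i > 0$ and $A$ is invertible.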
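The decomposition can be reproduced with `eigen()`. The sketch below uses an assumed 2 × 2 positive definite matrix, rebuilds it from its eigenpairs, and builds the inverse from the reciprocal eigenvalues, which is a standard consequence of the decomposition.

```r
# An assumed symmetric positive definite matrix
A <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)

eg     <- eigen(A, symmetric = TRUE)
lambda <- eg$values          # eigenvalues, in decreasing order
E      <- eg$vectors         # columns are normalized eigenvectors

# Rebuild A and its inverse from the eigenpairs
A_rebuilt <- lambda[1] * E[, 1] %*% t(E[, 1]) +
             lambda[2] * E[, 2] %*% t(E[, 2])
A_inv     <- (1 / lambda[1]) * E[, 1] %*% t(E[, 1]) +
             (1 / lambda[2]) * E[, 2] %*% t(E[, 2])

all(lambda > 0)                  # positive definite: every eigenvalue is positive
all.equal(A_rebuilt, A)
all.equal(A_inv, solve(A))
```

Because every eigenvalue is positive, the reciprocals exist, which is why a positive definite matrix is always invertible.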
![Page 62: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/62.jpg)
[Figure: ellipse whose axes lie along the eigenvectors $e_1$ and $e_2$, labelled with the eigenvalues $\lambda_1$ and $\lambda_2$.]
![Page 63: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/63.jpg)
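• Suppose p = 2. The points at distance $c$ from the origin satisfy
$x'Ax = a_{11}x_1^2 + 2a_{12}x_1x_2 + a_{22}x_2^2 = c^2$
By spectral decomposition, $A = \lambda_1 e_1e_1' + \lambda_2 e_2e_2'$, so $x'Ax = \lambda_1 (x'e_1)^2 + \lambda_2 (x'e_2)^2$
[Figure: ellipse in the $(x_1, x_2)$ plane with half-axes of length $c/\sqrt{\lambda_1}$ and $c/\sqrt{\lambda_2}$ along $e_1$ and $e_2$.]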
![Page 64: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/64.jpg)
Another property is
Thus
We use this property in Mahalanobis distance
![Page 65: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/65.jpg)
Necessity of Statistical Distance
[Figure: cluster of points with its center of gravity marked, and another point outside the cluster.]
![Page 66: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/66.jpg)
• Consider the Euclidean distances from the point Q to the points P and the origin O.
• Obviously $d(P,Q) > d(Q,O)$.
• But P appears to be more like the points in the cluster than the origin does.
• If we take into account the variability of the points in the cluster and measure distance by statistical distance, then Q will be closer to P than to O.
![Page 67: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/67.jpg)
Mahalanobis distance
• The Mahalanobis distance is a descriptive statistic that provides a relative measure of a data point's distance from a common point. It is a unitless measure introduced by P. C. Mahalanobis in 1936
![Page 68: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/68.jpg)
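Intuition of Mahalanobis Distance
• Recall the equation
$d(O,P) = \sqrt{a_{11}x_1^2 + 2a_{12}x_1x_2 + a_{22}x_2^2} \;\Rightarrow\; d(O,P) = \sqrt{x'Ax}$, where $x = (x_1, x_2)'$ and $A = \begin{pmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{pmatrix}$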
![Page 69: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/69.jpg)
Intuition of Mahalanobis Distance
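$d(O,P) = \sqrt{x'Ax}$, where $x = (x_1, x_2, \ldots, x_p)'$ and $A = (a_{ik})$ is the $p \times p$ symmetric matrix of weights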
![Page 70: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/70.jpg)
Intuition of Mahalanobis Distance
$d(P,Q) = \sqrt{(x - y)'A(x - y)}$, where $A = (a_{ik})$ is the $p \times p$ symmetric matrix of weights
![Page 71: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/71.jpg)
Mahalanobis Distance
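• Mahalanobis used $\Sigma^{-1}$, the inverse of the covariance matrix, instead of A.
• Thus $d(P,Q) = \sqrt{(x - y)'\Sigma^{-1}(x - y)}$ …. (1)
• And he used the mean $\mu$ instead of y: $d(P,\mu) = \sqrt{(x - \mu)'\Sigma^{-1}(x - \mu)}$ …. (2)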
![Page 72: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/72.jpg)
Mahalanobis Distance
• The above equations are nothing but Mahalanobis Distance ……
• For example, suppose we took a single observation from a bivariate population with Variable X and Variable Y, and that our two variables had the following characteristics
![Page 73: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/73.jpg)
• For the single observation X = 410 and Y = 400, the Mahalanobis distance is computed as:
![Page 74: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/74.jpg)
• Mahalanobis distance = 1.825
![Page 75: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/75.jpg)
• Therefore, our single observation would have a distance of 1.825 standardized units from the mean (mean is at X = 500, Y = 500).
• If we took many such observations, graphed them and colored them according to their Mahalanobis values, we can see the elliptical Mahalanobis regions come out
![Page 76: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/76.jpg)
• The points are actually distributed along two primary axes:
![Page 77: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/77.jpg)
![Page 78: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/78.jpg)
If we calculate Mahalanobis distances for each of these points and shade them according to their distance value, we see clear elliptical patterns emerge:
![Page 79: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/79.jpg)
![Page 80: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/80.jpg)
• We can also draw actual ellipses at regions of constant Mahalanobis values:
68% obs
95% obs
99.7% obs
![Page 81: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/81.jpg)
• Which ellipse do you choose? Ans: Use the 68-95-99.7 rule.
1) About two-thirds (68%) of the points should be within 1 unit of the origin (along the axis).
2) About 95% should be within 2 units.
3) About 99.7% should be within 3 units.
![Page 82: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/82.jpg)
If normal
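If the data are multivariate normal with known mean and covariance, the squared Mahalanobis distance follows a chi-square distribution with p degrees of freedom. The constant c for an ellipse with a given coverage can then be read from chi-square quantiles. The sketch below assumes the bivariate case (p = 2); the variable names are illustrative.

```r
p     <- 2                            # number of variables (bivariate case)
probs <- c(0.68, 0.95, 0.997)         # target coverages

c2 <- qchisq(probs, df = p)           # squared distances c^2 giving those coverages
cc <- sqrt(c2)                        # corresponding distances c
data.frame(coverage = probs, c_squared = c2, c = cc)

# Joint coverage of the ellipses with c = 1, 2, 3 when p = 2
joint <- pchisq(c(1, 2, 3)^2, df = p)
joint
```

The 68-95-99.7 figures describe one standardized axis at a time. For the joint bivariate ellipse the coverage at c = 1, 2, 3 is lower, so a 95% ellipse needs c² = `qchisq(0.95, 2)`, about 5.99.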
![Page 83: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/83.jpg)
Sample Mahalanobis Distance
• The sample Mahalanobis distance is obtained by replacing $\Sigma$ by $S$ and $\mu$ by $\bar{X}$,
• i.e. $(X - \bar{X})' S^{-1} (X - \bar{X})$
![Page 84: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/84.jpg)
For sample
$(X - \bar{X})' S^{-1} (X - \bar{X})$
Distribution of mahalanobis distance
![Page 85: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/85.jpg)
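Distribution of Mahalanobis distance
Let $X_1, X_2, \ldots, X_n$ be independent observations from any population with mean $\mu$ and finite (nonsingular) covariance $\Sigma$. Then $\sqrt{n}(\bar{X} - \mu)$ is approximately $N_p(0, \Sigma)$, and $n(\bar{X} - \mu)' S^{-1} (\bar{X} - \mu)$ is approximately $\chi^2_p$, for $n - p$ large. This is nothing but the central limit theorem.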
![Page 86: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/86.jpg)
Mahalanobis distance in R
########### Mahalanobis Distance ##########
x=rnorm(100);x
dm=matrix(x,nrow=20,ncol=5,byrow=F);dm ## dm = data matrix
cm=colMeans(dm);cm ## cm = column means
cov=cov(dm);cov ## cov = covariance matrix
incov=solve(cov);incov ## incov = inverse of covariance matrix
![Page 87: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/87.jpg)
Mahalanobis distance in R
####### MAHALANOBIS DISTANCE : MANUALLY ######
## Mahalanobis distance of first observation
ob1=dm[1,];ob1 ## first observation
mv1=ob1-cm;mv1 ## deviation of first observation from center of gravity
md1=t(mv1)%*%incov%*%mv1;md1 ## Mahalanobis distance of first observation from center of gravity
![Page 88: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/88.jpg)
Mahalanobis distance in R
## Mahalanobis distance of second observation
ob2=dm[2,];ob2 ## second observation
mv2=ob2-cm;mv2 ## deviation of second observation from center of gravity
md2=t(mv2)%*%incov%*%mv2;md2 ## Mahalanobis distance of second observation from center of gravity
…
![Page 89: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/89.jpg)
Mahalanobis distance in R
…
## Mahalanobis distance of 20th observation
ob20=dm[20,];ob20 ## 20th observation
mv20=ob20-cm;mv20 ## deviation of 20th observation from center of gravity
md20=t(mv20)%*%incov%*%mv20;md20 ## Mahalanobis distance of 20th observation from center of gravity
![Page 90: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/90.jpg)
Mahalanobis distance in R
####### MAHALANOBIS DISTANCE : PACKAGE ########
md=mahalanobis(dm,cm,cov,inverted =F);md ## md = Mahalanobis distance
md=mahalanobis(dm,cm,cov);md
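The twenty manual calculations can be collapsed into one `apply()` call and compared with the package result. This is a sketch under the same setup as the slides (a 20 × 5 matrix of standard normal draws). The seed and the names `md_manual` and `md_package` are assumptions added here.

```r
set.seed(123)                                     # arbitrary seed, for reproducibility only
dm    <- matrix(rnorm(100), nrow = 20, ncol = 5)  # data matrix, as in the slides
cm    <- colMeans(dm)                             # center of gravity
incov <- solve(cov(dm))                           # inverse of the covariance matrix

# Manual squared Mahalanobis distance for every row
md_manual <- apply(dm, 1, function(ob) {
  mv <- ob - cm                                   # deviation from the center of gravity
  as.numeric(t(mv) %*% incov %*% mv)
})

md_package <- mahalanobis(dm, cm, cov(dm))        # returns squared distances
all.equal(md_manual, md_package)
```

Both vectors hold squared distances, so they can be compared without taking square roots.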
![Page 91: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/91.jpg)
Another example
x <- matrix(rnorm(100*3), ncol = 3)
Sx <- cov(x)
D2 <- mahalanobis(x, colMeans(x), Sx)
plot(density(D2, bw = 0.5), main="Squared Mahalanobis distances, n=100, p=3")
qqplot(qchisq(ppoints(100), df = 3), D2, main = expression("Q-Q plot of Mahalanobis" * ~D^2 * " vs. quantiles of" * ~ chi[3]^2))
abline(0, 1, col = 'gray')
??mahalanobis
![Page 93: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/93.jpg)
Acknowledgement
Prof. Mohammad Nasser; Richard A. Johnson & Dean W. Wichern; and others
![Page 94: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/94.jpg)
THANK YOU ALL
![Page 95: Different kind of distance and Statistical Distance](https://reader036.vdocuments.site/reader036/viewer/2022062821/589bbb5d1a28ab082b8b4b7b/html5/thumbnails/95.jpg)
Necessity of Statistical Distance
In home Mother
In mess Female
maid
Student in mess