The Multivariate Gaussian
Jian Zhao
Outline
What is the multivariate Gaussian?
Parameterizations
Mathematical preparation
Joint distributions, marginalization and conditioning
Maximum likelihood estimation
What is the multivariate Gaussian?
The multivariate Gaussian density is

$$p(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}} \exp\Big\{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\Big\}$$

where x is an n×1 vector and Σ is an n×n symmetric, positive definite matrix.
For example, in the case n = 2,

$$x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \qquad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix}$$
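As a quick sanity check, the density formula can be evaluated directly. Here is a minimal NumPy sketch; the values of μ and Σ are arbitrary, chosen only for illustration:

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Evaluate the multivariate Gaussian density at x."""
    n = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)   # (x-mu)^T Sigma^{-1} (x-mu)
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

# Illustrative n = 2 example (parameter values are arbitrary)
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
print(gaussian_pdf(np.array([0.5, 0.5]), mu, Sigma))
```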
Geometrical interpretation
Excluding the normalization factor and the exponential function, we look at the quadratic form

$$(x-\mu)^T \Sigma^{-1} (x-\mu)$$

If we set this quadratic form to a constant value c and consider the case n = 2, we obtain

$$a_{11}x_1^2 + 2a_{12}x_1x_2 + a_{22}x_2^2 = c$$

where the $a_{ij}$ are the entries of $\Sigma^{-1}$ and the coordinates are measured relative to the mean.
Geometrical interpretation (continued)
This is an ellipse in the coordinates x1 and x2. Thus we can easily imagine that as n increases, the ellipse becomes a higher-dimensional ellipsoid.
Geometrical interpretation (continued)
Thus we get a Gaussian bump in n+1 dimensions, where the level surfaces of the Gaussian bump are ellipsoids oriented along the eigenvectors of Σ.
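This eigenstructure can be made concrete. A minimal sketch, using an arbitrary illustrative Σ, that recovers the axes of a level-set ellipse from the eigendecomposition of Σ:

```python
import numpy as np

# Arbitrary 2x2 covariance, for illustration only
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

# The level set (x-mu)^T Sigma^{-1} (x-mu) = c is an ellipse whose axes
# point along the eigenvectors of Sigma, with half-lengths sqrt(c * lambda_i).
c = 1.0
eigvals, eigvecs = np.linalg.eigh(Sigma)
for lam, v in zip(eigvals, eigvecs.T):
    print(f"axis direction {v}, half-length {np.sqrt(c * lam):.3f}")
```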
Parameterization
The standard (moment) parameterization uses the mean vector and covariance matrix:

$$\mu = E[x], \qquad \Sigma = E\big[(x-\mu)(x-\mu)^T\big]$$
Another type of parameterization puts the density into the form of the exponential family:
$$a = -\frac{1}{2}\big(n\log(2\pi) + \log|\Sigma| + \mu^T\Sigma^{-1}\mu\big)$$

$$p(x \mid \eta, \Lambda) = \exp\Big\{a + \eta^T x - \frac{1}{2}x^T \Lambda x\Big\}$$

where $\Lambda = \Sigma^{-1}$ is the canonical (precision) matrix and $\eta = \Sigma^{-1}\mu$ is the canonical mean parameter.
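A minimal sketch of converting between the moment parameters (μ, Σ) and the canonical parameters (η, Λ); the numerical values are arbitrary and illustrative:

```python
import numpy as np

mu = np.array([1.0, -0.5])
Sigma = np.array([[1.5, 0.3],
                  [0.3, 0.8]])
n = len(mu)

# Moment -> canonical
Lam = np.linalg.inv(Sigma)          # Lambda = Sigma^{-1}
eta = Lam @ mu                      # eta = Sigma^{-1} mu
a = -0.5 * (n * np.log(2 * np.pi)
            + np.log(np.linalg.det(Sigma))
            + mu @ Lam @ mu)        # log-normalization term

# Canonical -> moment (round trip recovers the originals)
Sigma_back = np.linalg.inv(Lam)
mu_back = Sigma_back @ eta
print(np.allclose(mu_back, mu), np.allclose(Sigma_back, Sigma))
```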
Mathematical preparation
In order to derive the marginal and conditional distributions of a partitioned multivariate Gaussian, we need the theory of block diagonalization of a partitioned matrix. In order to carry out maximum likelihood estimation, we need some properties of the matrix trace.
Partitioned matrices
Consider a general partitioned matrix

$$M = \begin{pmatrix} E & F \\ G & H \end{pmatrix}$$
To zero out the upper-right-hand and lower-left-hand corners of M, we can premultiply and postmultiply by matrices of the following form:

$$\begin{pmatrix} I & -FH^{-1} \\ 0 & I \end{pmatrix} \begin{pmatrix} E & F \\ G & H \end{pmatrix} \begin{pmatrix} I & 0 \\ -H^{-1}G & I \end{pmatrix} = \begin{pmatrix} E - FH^{-1}G & 0 \\ 0 & H \end{pmatrix}$$
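This block diagonalization is easy to verify numerically. A sketch with randomly generated illustrative blocks (H is shifted so it is safely invertible):

```python
import numpy as np

rng = np.random.default_rng(0)
E, F = rng.normal(size=(2, 2)), rng.normal(size=(2, 2))
G = rng.normal(size=(2, 2))
H = rng.normal(size=(2, 2)) + 3 * np.eye(2)   # keep H invertible

M = np.block([[E, F], [G, H]])
I, Z = np.eye(2), np.zeros((2, 2))
left = np.block([[I, -F @ np.linalg.inv(H)], [Z, I]])
right = np.block([[I, Z], [-np.linalg.inv(H) @ G, I]])

D = left @ M @ right
# Off-diagonal blocks are (numerically) zero, and the top-left block
# equals the Schur complement E - F H^{-1} G.
print(np.round(D, 10))
print(np.allclose(D[:2, :2], E - F @ np.linalg.inv(H) @ G))
```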
Partitioned matrices (continued)
Define the Schur complement of the matrix M with respect to H, denoted M/H, as the term $E - FH^{-1}G$.
Since $(XYZ)^{-1} = Z^{-1}Y^{-1}X^{-1}$, we have

$$\begin{pmatrix} E & F \\ G & H \end{pmatrix}^{-1} = \begin{pmatrix} I & 0 \\ -H^{-1}G & I \end{pmatrix} \begin{pmatrix} (M/H)^{-1} & 0 \\ 0 & H^{-1} \end{pmatrix} \begin{pmatrix} I & -FH^{-1} \\ 0 & I \end{pmatrix}$$

$$= \begin{pmatrix} (M/H)^{-1} & -(M/H)^{-1}FH^{-1} \\ -H^{-1}G(M/H)^{-1} & H^{-1} + H^{-1}G(M/H)^{-1}FH^{-1} \end{pmatrix}$$
Partitioned matrices (continued)
Note that we could alternatively have decomposed the matrix M in terms of E and M/E, yielding the following expression for the inverse:
$$\begin{pmatrix} I & 0 \\ -GE^{-1} & I \end{pmatrix} \begin{pmatrix} E & F \\ G & H \end{pmatrix} \begin{pmatrix} I & -E^{-1}F \\ 0 & I \end{pmatrix} = \begin{pmatrix} E & 0 \\ 0 & H - GE^{-1}F \end{pmatrix} = \begin{pmatrix} E & 0 \\ 0 & M/E \end{pmatrix}$$

$$\begin{pmatrix} E & F \\ G & H \end{pmatrix}^{-1} = \begin{pmatrix} I & -E^{-1}F \\ 0 & I \end{pmatrix} \begin{pmatrix} E^{-1} & 0 \\ 0 & (M/E)^{-1} \end{pmatrix} \begin{pmatrix} I & 0 \\ -GE^{-1} & I \end{pmatrix}$$

$$= \begin{pmatrix} E^{-1} + E^{-1}F(M/E)^{-1}GE^{-1} & -E^{-1}F(M/E)^{-1} \\ -(M/E)^{-1}GE^{-1} & (M/E)^{-1} \end{pmatrix}$$
Partitioned matrices (continued)
Thus, equating the corresponding blocks of the two expressions for $M^{-1}$, we get

$$(E - FH^{-1}G)^{-1} = E^{-1} + E^{-1}F(H - GE^{-1}F)^{-1}GE^{-1}$$

$$(E - FH^{-1}G)^{-1}FH^{-1} = E^{-1}F(H - GE^{-1}F)^{-1}$$
At the same time we get the conclusion
$$|M| = |M/H|\,|H|$$
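Both conclusions, the blockwise inverse (matrix inversion lemma) and the determinant identity, can be checked numerically. A sketch with random illustrative blocks:

```python
import numpy as np

rng = np.random.default_rng(1)
E = rng.normal(size=(2, 2)) + 3 * np.eye(2)
F, G = rng.normal(size=(2, 2)), rng.normal(size=(2, 2))
H = rng.normal(size=(2, 2)) + 3 * np.eye(2)

M = np.block([[E, F], [G, H]])
Einv, Hinv = np.linalg.inv(E), np.linalg.inv(H)
MH = E - F @ Hinv @ G                 # Schur complement M/H
ME = H - G @ Einv @ F                 # Schur complement M/E

# Matrix inversion lemma:
# (E - F H^{-1} G)^{-1} = E^{-1} + E^{-1} F (M/E)^{-1} G E^{-1}
lhs = np.linalg.inv(MH)
rhs = Einv + Einv @ F @ np.linalg.inv(ME) @ G @ Einv
print(np.allclose(lhs, rhs))          # expect True

# Determinant identity: |M| = |M/H| |H|
print(np.isclose(np.linalg.det(M), np.linalg.det(MH) * np.linalg.det(H)))
```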
Theory of traces
Define

$$\mathrm{tr}[A] = \sum_i a_{ii}$$

It has the following properties:

$$\mathrm{tr}[ABC] = \mathrm{tr}[CAB] = \mathrm{tr}[BCA]$$

$$x^T A x = \mathrm{tr}[x^T A x] = \mathrm{tr}[xx^T A]$$
Theory of traces (continued)
$$\frac{\partial}{\partial a_{ij}} \mathrm{tr}[AB] = \frac{\partial}{\partial a_{ij}} \sum_k \sum_l a_{kl} b_{lk} = b_{ji}$$

so

$$\frac{\partial}{\partial A} \mathrm{tr}[BA] = B^T$$

and therefore

$$\frac{\partial}{\partial A} x^T A x = \frac{\partial}{\partial A} \mathrm{tr}[xx^T A] = [xx^T]^T = xx^T$$
Theory of traces (continued)
We want to show that

$$\frac{\partial}{\partial A} \log|A| = (A^{-1})^T = A^{-T}$$

Since

$$\frac{\partial}{\partial a_{ij}} \log|A| = \frac{1}{|A|} \frac{\partial |A|}{\partial a_{ij}}$$

this is equivalent to proving

$$\frac{\partial |A|}{\partial a_{ij}} = |A|\,[A^{-1}]_{ji}$$

Noting the cofactor expansion of the determinant along the i-th row,

$$|A| = \sum_j (-1)^{i+j} a_{ij} M_{ij}$$

where $M_{ij}$ is the $(i,j)$ minor of A, we get $\partial|A|/\partial a_{ij} = (-1)^{i+j} M_{ij} = [\mathrm{adj}(A)]_{ji} = |A|\,[A^{-1}]_{ji}$, which completes the proof.
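A quick finite-difference sketch (with an arbitrary test matrix shifted to keep its determinant positive) confirming $\partial \log|A| / \partial A = A^{-T}$:

```python
import numpy as np

def logdet(X):
    """log|X| via the numerically stable slogdet (assumes |X| > 0 here)."""
    sign, ld = np.linalg.slogdet(X)
    return ld

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 3)) + 3.0 * np.eye(3)   # shifted so |A| > 0
eps = 1e-6

grad_fd = np.zeros_like(A)
for i in range(3):
    for j in range(3):
        Ap, Am = A.copy(), A.copy()
        Ap[i, j] += eps
        Am[i, j] -= eps
        grad_fd[i, j] = (logdet(Ap) - logdet(Am)) / (2 * eps)

print(np.allclose(grad_fd, np.linalg.inv(A).T, atol=1e-5))  # expect True
```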
Joint distributions, marginalization and conditioning
We partition the n×1 vector x into a p×1 block x1 and a q×1 block x2, where n = p + q, and partition μ and Σ conformably:

$$x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \qquad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$$

The joint density is then

$$p(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{(p+q)/2}\,|\Sigma|^{1/2}} \exp\Big\{-\frac{1}{2}\begin{pmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{pmatrix}^T \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}^{-1} \begin{pmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{pmatrix}\Big\}$$
Marginalization and conditioning
Using the block diagonalization of the partitioned matrix, the exponent of the joint density factorizes:

$$\exp\Big\{-\frac{1}{2}\begin{pmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{pmatrix}^T \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}^{-1} \begin{pmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{pmatrix}\Big\}$$

$$= \exp\Big\{-\frac{1}{2}\begin{pmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{pmatrix}^T \begin{pmatrix} I & 0 \\ -\Sigma_{22}^{-1}\Sigma_{21} & I \end{pmatrix} \begin{pmatrix} (\Sigma/\Sigma_{22})^{-1} & 0 \\ 0 & \Sigma_{22}^{-1} \end{pmatrix} \begin{pmatrix} I & -\Sigma_{12}\Sigma_{22}^{-1} \\ 0 & I \end{pmatrix} \begin{pmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{pmatrix}\Big\}$$

$$= \exp\Big\{-\frac{1}{2}\big(x_1-\mu_1-\Sigma_{12}\Sigma_{22}^{-1}(x_2-\mu_2)\big)^T (\Sigma/\Sigma_{22})^{-1} \big(x_1-\mu_1-\Sigma_{12}\Sigma_{22}^{-1}(x_2-\mu_2)\big)\Big\} \cdot \exp\Big\{-\frac{1}{2}(x_2-\mu_2)^T \Sigma_{22}^{-1} (x_2-\mu_2)\Big\}$$
Normalization factor
The normalization factor splits in the same way, using $|\Sigma| = |\Sigma/\Sigma_{22}|\,|\Sigma_{22}|$:

$$\frac{1}{(2\pi)^{(p+q)/2}\,|\Sigma|^{1/2}} = \frac{1}{(2\pi)^{(p+q)/2}\,\big(|\Sigma/\Sigma_{22}|\,|\Sigma_{22}|\big)^{1/2}} = \frac{1}{(2\pi)^{p/2}\,|\Sigma/\Sigma_{22}|^{1/2}} \cdot \frac{1}{(2\pi)^{q/2}\,|\Sigma_{22}|^{1/2}}$$
Marginalization and conditioning
Thus
$$p(x_2) = \frac{1}{(2\pi)^{q/2}\,|\Sigma_{22}|^{1/2}} \exp\Big\{-\frac{1}{2}(x_2-\mu_2)^T \Sigma_{22}^{-1} (x_2-\mu_2)\Big\}$$

$$p(x_1 \mid x_2) = \frac{1}{(2\pi)^{p/2}\,|\Sigma/\Sigma_{22}|^{1/2}} \exp\Big\{-\frac{1}{2}\big(x_1-\mu_1-\Sigma_{12}\Sigma_{22}^{-1}(x_2-\mu_2)\big)^T (\Sigma/\Sigma_{22})^{-1} \big(x_1-\mu_1-\Sigma_{12}\Sigma_{22}^{-1}(x_2-\mu_2)\big)\Big\}$$
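The factorization $p(x_1, x_2) = p(x_1 \mid x_2)\,p(x_2)$ can be verified numerically with SciPy. A sketch with arbitrary illustrative parameters (p = q = 1 for simplicity):

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
x = np.array([0.3, 1.4])                      # (x1, x2)

joint = multivariate_normal(mu, Sigma).pdf(x)

# Marginal of x2 and conditional of x1 given x2
marg2 = multivariate_normal(mu[1], Sigma[1, 1]).pdf(x[1])
mu_c = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (x[1] - mu[1])
Sig_c = Sigma[0, 0] - Sigma[0, 1] ** 2 / Sigma[1, 1]  # Schur complement
cond = multivariate_normal(mu_c, Sig_c).pdf(x[0])

print(np.isclose(joint, cond * marg2))        # expect True
```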
Marginalization & Conditioning
Marginalization:

$$\mu_2^m = \mu_2, \qquad \Sigma_2^m = \Sigma_{22}$$

Conditioning:

$$\mu_{1|2}^c = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2), \qquad \Sigma_{1|2}^c = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$$
In another form, using the canonical parameters:

Conditioning:

$$\eta_{1|2}^c = \eta_1 - \Lambda_{12}x_2, \qquad \Lambda_{1|2}^c = \Lambda_{11}$$

Marginalization:

$$\eta_2^m = \eta_2 - \Lambda_{21}\Lambda_{11}^{-1}\eta_1, \qquad \Lambda_2^m = \Lambda_{22} - \Lambda_{21}\Lambda_{11}^{-1}\Lambda_{12}$$
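A sketch (illustrative 3-dimensional example with p = 1, q = 2) checking that the canonical-form conditioning rule agrees with the moment-form rule:

```python
import numpy as np

mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.5, 0.2],
                  [0.3, 0.2, 1.0]])
p = 1                                  # x1 is the first block, x2 the rest
x2 = np.array([0.8, -0.5])

m1, m2 = mu[:p], mu[p:]
S11, S12 = Sigma[:p, :p], Sigma[:p, p:]
S21, S22 = Sigma[p:, :p], Sigma[p:, p:]

# Moment-form conditioning
mu_c = m1 + S12 @ np.linalg.solve(S22, x2 - m2)
Sig_c = S11 - S12 @ np.linalg.solve(S22, S21)

# Canonical-form conditioning: Lambda_{1|2} = Lambda11, eta_{1|2} = eta1 - Lambda12 x2
Lam = np.linalg.inv(Sigma)
eta = Lam @ mu
L11, L12 = Lam[:p, :p], Lam[:p, p:]
eta_c = eta[:p] - L12 @ x2

print(np.allclose(np.linalg.inv(L11), Sig_c))          # covariances agree
print(np.allclose(np.linalg.solve(L11, eta_c), mu_c))  # means agree
```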
Maximum likelihood estimation
The log-likelihood of an i.i.d. dataset $D = \{x_1, \dots, x_N\}$ is (up to an additive constant)

$$l(\mu, \Sigma \mid D) = -\frac{N}{2}\log|\Sigma| - \frac{1}{2}\sum_{i=1}^N (x_i - \mu)^T \Sigma^{-1} (x_i - \mu)$$
Calculating μ

Taking the derivative with respect to μ:

$$\frac{\partial l}{\partial \mu} = \sum_{i=1}^N \Sigma^{-1}(x_i - \mu)$$

Setting it to zero gives

$$\hat{\mu}_{ML} = \frac{1}{N}\sum_{i=1}^N x_i$$
Estimate Σ
We need to take the derivative with respect to Σ. According to the properties of traces,

$$l(\Sigma \mid D) = -\frac{N}{2}\log|\Sigma| - \frac{1}{2}\sum_{i=1}^N (x_i-\mu)^T \Sigma^{-1} (x_i-\mu)$$

$$= -\frac{N}{2}\log|\Sigma| - \frac{1}{2}\sum_{i=1}^N \mathrm{tr}\big[(x_i-\mu)^T \Sigma^{-1} (x_i-\mu)\big]$$

$$= -\frac{N}{2}\log|\Sigma| - \frac{1}{2}\sum_{i=1}^N \mathrm{tr}\big[(x_i-\mu)(x_i-\mu)^T \Sigma^{-1}\big]$$

It is easier to differentiate with respect to $\Sigma^{-1}$. Using $\log|\Sigma| = -\log|\Sigma^{-1}|$ together with the derivatives of log-determinants and traces derived above (and the symmetry of Σ),

$$\frac{\partial l}{\partial \Sigma^{-1}} = \frac{N}{2}\Sigma - \frac{1}{2}\sum_{i=1}^N (x_i-\mu)(x_i-\mu)^T$$
Estimate Σ
Thus the maximum likelihood estimator is
$$\hat{\Sigma}_{ML} = \frac{1}{N}\sum_{i=1}^N (x_i - \hat{\mu}_{ML})(x_i - \hat{\mu}_{ML})^T$$

The maximum likelihood estimators of the canonical parameters are

$$\hat{\Lambda}_{ML} = \hat{\Sigma}_{ML}^{-1}, \qquad \hat{\eta}_{ML} = \hat{\Sigma}_{ML}^{-1}\hat{\mu}_{ML}$$
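A minimal sketch of both estimators on synthetic data; the true parameters are arbitrary, chosen only to generate an example dataset:

```python
import numpy as np

rng = np.random.default_rng(3)
true_mu = np.array([1.0, -2.0])
true_Sigma = np.array([[1.0, 0.4],
                       [0.4, 2.0]])
X = rng.multivariate_normal(true_mu, true_Sigma, size=5000)  # rows are samples

N = X.shape[0]
mu_hat = X.mean(axis=0)               # (1/N) sum_i x_i
diff = X - mu_hat
Sigma_hat = diff.T @ diff / N         # (1/N) sum_i (x_i - mu)(x_i - mu)^T

# Canonical-parameter estimates
Lam_hat = np.linalg.inv(Sigma_hat)
eta_hat = Lam_hat @ mu_hat
print(mu_hat)
print(Sigma_hat)
```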