The Multivariate Gaussian
Jian Zhao
Outline
What is the multivariate Gaussian?
Parameterizations
Mathematical preparation
Joint distributions, marginalization and conditioning
Maximum likelihood estimation
What is the multivariate Gaussian?
The multivariate Gaussian density is

$$p(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}} \exp\Big\{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\Big\}$$

where x is an n×1 vector and Σ is an n×n symmetric, positive definite matrix.
For example, in the case n = 2,

$$x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \qquad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix}$$
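As a quick sanity check, the density formula can be evaluated directly. Here is a minimal NumPy sketch; the values of μ and Σ are arbitrary, chosen only for illustration:

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Evaluate the multivariate Gaussian density at x."""
    n = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)   # (x-mu)^T Sigma^{-1} (x-mu)
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

# Illustrative n = 2 example (parameter values are arbitrary)
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
print(gaussian_pdf(np.array([0.5, 0.5]), mu, Sigma))
```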
Geometrical interpretation
Excluding the normalization factor and the exponential function, we look at the quadratic form

$$(x-\mu)^T \Sigma^{-1} (x-\mu)$$

If we set this quadratic form to a constant value c and consider the case n = 2, we obtain

$$a_{11}x_1^2 + 2a_{12}x_1x_2 + a_{22}x_2^2 = c$$

where the $a_{ij}$ are the entries of $\Sigma^{-1}$ and the coordinates are measured relative to the mean.
Geometrical interpretation (continued)
This is an ellipse in the coordinates x1 and x2. Thus we can easily imagine that as n increases, the ellipse becomes a higher-dimensional ellipsoid.
Geometrical interpretation (continued)
Thus we get a Gaussian bump in n+1 dimensions, where the level surfaces of the Gaussian bump are ellipsoids oriented along the eigenvectors of Σ.
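This eigenstructure can be made concrete. A minimal sketch, using an arbitrary illustrative Σ, that recovers the axes of a level-set ellipse from the eigendecomposition of Σ:

```python
import numpy as np

# Arbitrary 2x2 covariance, for illustration only
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

# The level set (x-mu)^T Sigma^{-1} (x-mu) = c is an ellipse whose axes
# point along the eigenvectors of Sigma, with half-lengths sqrt(c * lambda_i).
c = 1.0
eigvals, eigvecs = np.linalg.eigh(Sigma)
for lam, v in zip(eigvals, eigvecs.T):
    print(f"axis direction {v}, half-length {np.sqrt(c * lam):.3f}")
```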
Parameterization
The standard (moment) parameterization uses the mean vector and covariance matrix:

$$\mu = E[x], \qquad \Sigma = E\big[(x-\mu)(x-\mu)^T\big]$$
Another type of parameterization puts the density into the form of the exponential family:
$$a = -\frac{1}{2}\big(n\log(2\pi) + \log|\Sigma| + \mu^T\Sigma^{-1}\mu\big)$$

$$p(x \mid \eta, \Lambda) = \exp\Big\{a + \eta^T x - \frac{1}{2}x^T \Lambda x\Big\}$$

where $\Lambda = \Sigma^{-1}$ is the canonical (precision) matrix and $\eta = \Sigma^{-1}\mu$ is the canonical mean parameter.
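A minimal sketch of converting between the moment parameters (μ, Σ) and the canonical parameters (η, Λ); the numerical values are arbitrary and illustrative:

```python
import numpy as np

mu = np.array([1.0, -0.5])
Sigma = np.array([[1.5, 0.3],
                  [0.3, 0.8]])
n = len(mu)

# Moment -> canonical
Lam = np.linalg.inv(Sigma)          # Lambda = Sigma^{-1}
eta = Lam @ mu                      # eta = Sigma^{-1} mu
a = -0.5 * (n * np.log(2 * np.pi)
            + np.log(np.linalg.det(Sigma))
            + mu @ Lam @ mu)        # log-normalization term

# Canonical -> moment (round trip recovers the originals)
Sigma_back = np.linalg.inv(Lam)
mu_back = Sigma_back @ eta
print(np.allclose(mu_back, mu), np.allclose(Sigma_back, Sigma))
```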
Mathematical preparation
In order to derive the marginal and conditional distributions of a partitioned multivariate Gaussian, we need the theory of block diagonalization of a partitioned matrix. In order to carry out maximum likelihood estimation, we need some properties of the matrix trace.
Partitioned matrices
Consider a general partitioned matrix

$$M = \begin{pmatrix} E & F \\ G & H \end{pmatrix}$$
To zero out the upper-right-hand and lower-left-hand corners of M, we can premultiply and postmultiply by matrices of the following form:

$$\begin{pmatrix} I & -FH^{-1} \\ 0 & I \end{pmatrix} \begin{pmatrix} E & F \\ G & H \end{pmatrix} \begin{pmatrix} I & 0 \\ -H^{-1}G & I \end{pmatrix} = \begin{pmatrix} E - FH^{-1}G & 0 \\ 0 & H \end{pmatrix}$$
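This block diagonalization is easy to verify numerically. A sketch with randomly generated illustrative blocks (H is shifted so it is safely invertible):

```python
import numpy as np

rng = np.random.default_rng(0)
E, F = rng.normal(size=(2, 2)), rng.normal(size=(2, 2))
G = rng.normal(size=(2, 2))
H = rng.normal(size=(2, 2)) + 3 * np.eye(2)   # keep H invertible

M = np.block([[E, F], [G, H]])
I, Z = np.eye(2), np.zeros((2, 2))
left = np.block([[I, -F @ np.linalg.inv(H)], [Z, I]])
right = np.block([[I, Z], [-np.linalg.inv(H) @ G, I]])

D = left @ M @ right
# Off-diagonal blocks are (numerically) zero, and the top-left block
# equals the Schur complement E - F H^{-1} G.
print(np.round(D, 10))
print(np.allclose(D[:2, :2], E - F @ np.linalg.inv(H) @ G))
```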
Partitioned matrices (continued)
Define the Schur complement of the matrix M with respect to H, denoted M/H, as the term $E - FH^{-1}G$.
Since $(XYZ)^{-1} = Z^{-1}Y^{-1}X^{-1}$, we have

$$\begin{pmatrix} E & F \\ G & H \end{pmatrix}^{-1} = \begin{pmatrix} I & 0 \\ -H^{-1}G & I \end{pmatrix} \begin{pmatrix} (M/H)^{-1} & 0 \\ 0 & H^{-1} \end{pmatrix} \begin{pmatrix} I & -FH^{-1} \\ 0 & I \end{pmatrix}$$

$$= \begin{pmatrix} (M/H)^{-1} & -(M/H)^{-1}FH^{-1} \\ -H^{-1}G(M/H)^{-1} & H^{-1} + H^{-1}G(M/H)^{-1}FH^{-1} \end{pmatrix}$$
Partitioned matrices (continued)
Note that we could alternatively have decomposed the matrix M in terms of E and M/E, yielding the following expression for the inverse:
$$\begin{pmatrix} I & 0 \\ -GE^{-1} & I \end{pmatrix} \begin{pmatrix} E & F \\ G & H \end{pmatrix} \begin{pmatrix} I & -E^{-1}F \\ 0 & I \end{pmatrix} = \begin{pmatrix} E & 0 \\ 0 & H - GE^{-1}F \end{pmatrix} = \begin{pmatrix} E & 0 \\ 0 & M/E \end{pmatrix}$$

$$\begin{pmatrix} E & F \\ G & H \end{pmatrix}^{-1} = \begin{pmatrix} I & -E^{-1}F \\ 0 & I \end{pmatrix} \begin{pmatrix} E^{-1} & 0 \\ 0 & (M/E)^{-1} \end{pmatrix} \begin{pmatrix} I & 0 \\ -GE^{-1} & I \end{pmatrix}$$

$$= \begin{pmatrix} E^{-1} + E^{-1}F(M/E)^{-1}GE^{-1} & -E^{-1}F(M/E)^{-1} \\ -(M/E)^{-1}GE^{-1} & (M/E)^{-1} \end{pmatrix}$$
Partitioned matrices (continued)
Thus, equating the corresponding blocks of the two expressions for $M^{-1}$, we get

$$(E - FH^{-1}G)^{-1} = E^{-1} + E^{-1}F(H - GE^{-1}F)^{-1}GE^{-1}$$

$$(E - FH^{-1}G)^{-1}FH^{-1} = E^{-1}F(H - GE^{-1}F)^{-1}$$
At the same time we get the conclusion
$$|M| = |M/H|\,|H|$$
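Both conclusions, the blockwise inverse (matrix inversion lemma) and the determinant identity, can be checked numerically. A sketch with random illustrative blocks:

```python
import numpy as np

rng = np.random.default_rng(1)
E = rng.normal(size=(2, 2)) + 3 * np.eye(2)
F, G = rng.normal(size=(2, 2)), rng.normal(size=(2, 2))
H = rng.normal(size=(2, 2)) + 3 * np.eye(2)

M = np.block([[E, F], [G, H]])
Einv, Hinv = np.linalg.inv(E), np.linalg.inv(H)
MH = E - F @ Hinv @ G                 # Schur complement M/H
ME = H - G @ Einv @ F                 # Schur complement M/E

# Matrix inversion lemma:
# (E - F H^{-1} G)^{-1} = E^{-1} + E^{-1} F (M/E)^{-1} G E^{-1}
lhs = np.linalg.inv(MH)
rhs = Einv + Einv @ F @ np.linalg.inv(ME) @ G @ Einv
print(np.allclose(lhs, rhs))          # expect True

# Determinant identity: |M| = |M/H| |H|
print(np.isclose(np.linalg.det(M), np.linalg.det(MH) * np.linalg.det(H)))
```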
Theory of traces
Define

$$\mathrm{tr}[A] = \sum_i a_{ii}$$

It has the following properties:

$$\mathrm{tr}[ABC] = \mathrm{tr}[CAB] = \mathrm{tr}[BCA]$$

$$x^T A x = \mathrm{tr}[x^T A x] = \mathrm{tr}[xx^T A]$$
Theory of traces (continued)
$$\frac{\partial}{\partial a_{ij}} \mathrm{tr}[AB] = \frac{\partial}{\partial a_{ij}} \sum_k \sum_l a_{kl} b_{lk} = b_{ji}$$

so

$$\frac{\partial}{\partial A} \mathrm{tr}[BA] = B^T$$

and therefore

$$\frac{\partial}{\partial A} x^T A x = \frac{\partial}{\partial A} \mathrm{tr}[xx^T A] = [xx^T]^T = xx^T$$
Theory of traces (continued)
We want to show that

$$\frac{\partial}{\partial A} \log|A| = (A^{-1})^T = A^{-T}$$

Since

$$\frac{\partial}{\partial a_{ij}} \log|A| = \frac{1}{|A|} \frac{\partial |A|}{\partial a_{ij}}$$

this is equivalent to proving

$$\frac{\partial |A|}{\partial a_{ij}} = |A|\,[A^{-1}]_{ji}$$

Noting the cofactor expansion of the determinant along the i-th row,

$$|A| = \sum_j (-1)^{i+j} a_{ij} M_{ij}$$

where $M_{ij}$ is the $(i,j)$ minor of A, we get $\partial|A|/\partial a_{ij} = (-1)^{i+j} M_{ij} = [\mathrm{adj}(A)]_{ji} = |A|\,[A^{-1}]_{ji}$, which completes the proof.
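A quick finite-difference sketch (with an arbitrary test matrix shifted to keep its determinant positive) confirming $\partial \log|A| / \partial A = A^{-T}$:

```python
import numpy as np

def logdet(X):
    """log|X| via the numerically stable slogdet (assumes |X| > 0 here)."""
    sign, ld = np.linalg.slogdet(X)
    return ld

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 3)) + 3.0 * np.eye(3)   # shifted so |A| > 0
eps = 1e-6

grad_fd = np.zeros_like(A)
for i in range(3):
    for j in range(3):
        Ap, Am = A.copy(), A.copy()
        Ap[i, j] += eps
        Am[i, j] -= eps
        grad_fd[i, j] = (logdet(Ap) - logdet(Am)) / (2 * eps)

print(np.allclose(grad_fd, np.linalg.inv(A).T, atol=1e-5))  # expect True
```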
Joint distributions, marginalization and conditioning
We partition the n×1 vector x into a p×1 block x1 and a q×1 block x2, where n = p + q, and partition μ and Σ conformably:

$$x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \qquad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$$

The joint density is then

$$p(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{(p+q)/2}\,|\Sigma|^{1/2}} \exp\Big\{-\frac{1}{2}\begin{pmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{pmatrix}^T \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}^{-1} \begin{pmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{pmatrix}\Big\}$$
Marginalization and conditioning
Using the block diagonalization of the partitioned matrix, the exponent of the joint density factorizes:

$$\exp\Big\{-\frac{1}{2}\begin{pmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{pmatrix}^T \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}^{-1} \begin{pmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{pmatrix}\Big\}$$

$$= \exp\Big\{-\frac{1}{2}\begin{pmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{pmatrix}^T \begin{pmatrix} I & 0 \\ -\Sigma_{22}^{-1}\Sigma_{21} & I \end{pmatrix} \begin{pmatrix} (\Sigma/\Sigma_{22})^{-1} & 0 \\ 0 & \Sigma_{22}^{-1} \end{pmatrix} \begin{pmatrix} I & -\Sigma_{12}\Sigma_{22}^{-1} \\ 0 & I \end{pmatrix} \begin{pmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{pmatrix}\Big\}$$

$$= \exp\Big\{-\frac{1}{2}\big(x_1-\mu_1-\Sigma_{12}\Sigma_{22}^{-1}(x_2-\mu_2)\big)^T (\Sigma/\Sigma_{22})^{-1} \big(x_1-\mu_1-\Sigma_{12}\Sigma_{22}^{-1}(x_2-\mu_2)\big)\Big\} \cdot \exp\Big\{-\frac{1}{2}(x_2-\mu_2)^T \Sigma_{22}^{-1} (x_2-\mu_2)\Big\}$$
Normalization factor
The normalization factor splits in the same way, using $|\Sigma| = |\Sigma/\Sigma_{22}|\,|\Sigma_{22}|$:

$$\frac{1}{(2\pi)^{(p+q)/2}\,|\Sigma|^{1/2}} = \frac{1}{(2\pi)^{(p+q)/2}\,\big(|\Sigma/\Sigma_{22}|\,|\Sigma_{22}|\big)^{1/2}} = \frac{1}{(2\pi)^{p/2}\,|\Sigma/\Sigma_{22}|^{1/2}} \cdot \frac{1}{(2\pi)^{q/2}\,|\Sigma_{22}|^{1/2}}$$
Marginalization and conditioning
Thus
$$p(x_2) = \frac{1}{(2\pi)^{q/2}\,|\Sigma_{22}|^{1/2}} \exp\Big\{-\frac{1}{2}(x_2-\mu_2)^T \Sigma_{22}^{-1} (x_2-\mu_2)\Big\}$$

$$p(x_1 \mid x_2) = \frac{1}{(2\pi)^{p/2}\,|\Sigma/\Sigma_{22}|^{1/2}} \exp\Big\{-\frac{1}{2}\big(x_1-\mu_1-\Sigma_{12}\Sigma_{22}^{-1}(x_2-\mu_2)\big)^T (\Sigma/\Sigma_{22})^{-1} \big(x_1-\mu_1-\Sigma_{12}\Sigma_{22}^{-1}(x_2-\mu_2)\big)\Big\}$$
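The factorization $p(x_1, x_2) = p(x_1 \mid x_2)\,p(x_2)$ can be verified numerically with SciPy. A sketch with arbitrary illustrative parameters (p = q = 1 for simplicity):

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
x = np.array([0.3, 1.4])                      # (x1, x2)

joint = multivariate_normal(mu, Sigma).pdf(x)

# Marginal of x2 and conditional of x1 given x2
marg2 = multivariate_normal(mu[1], Sigma[1, 1]).pdf(x[1])
mu_c = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (x[1] - mu[1])
Sig_c = Sigma[0, 0] - Sigma[0, 1] ** 2 / Sigma[1, 1]  # Schur complement
cond = multivariate_normal(mu_c, Sig_c).pdf(x[0])

print(np.isclose(joint, cond * marg2))        # expect True
```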
Marginalization & Conditioning
Marginalization:

$$\mu_2^m = \mu_2, \qquad \Sigma_2^m = \Sigma_{22}$$

Conditioning:

$$\mu_{1|2}^c = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2), \qquad \Sigma_{1|2}^c = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$$
In another form, using the canonical parameters:

Conditioning:

$$\eta_{1|2}^c = \eta_1 - \Lambda_{12}x_2, \qquad \Lambda_{1|2}^c = \Lambda_{11}$$

Marginalization:

$$\eta_2^m = \eta_2 - \Lambda_{21}\Lambda_{11}^{-1}\eta_1, \qquad \Lambda_2^m = \Lambda_{22} - \Lambda_{21}\Lambda_{11}^{-1}\Lambda_{12}$$
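A sketch (illustrative 3-dimensional example with p = 1, q = 2) checking that the canonical-form conditioning rule agrees with the moment-form rule:

```python
import numpy as np

mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.5, 0.2],
                  [0.3, 0.2, 1.0]])
p = 1                                  # x1 is the first block, x2 the rest
x2 = np.array([0.8, -0.5])

m1, m2 = mu[:p], mu[p:]
S11, S12 = Sigma[:p, :p], Sigma[:p, p:]
S21, S22 = Sigma[p:, :p], Sigma[p:, p:]

# Moment-form conditioning
mu_c = m1 + S12 @ np.linalg.solve(S22, x2 - m2)
Sig_c = S11 - S12 @ np.linalg.solve(S22, S21)

# Canonical-form conditioning: Lambda_{1|2} = Lambda11, eta_{1|2} = eta1 - Lambda12 x2
Lam = np.linalg.inv(Sigma)
eta = Lam @ mu
L11, L12 = Lam[:p, :p], Lam[:p, p:]
eta_c = eta[:p] - L12 @ x2

print(np.allclose(np.linalg.inv(L11), Sig_c))          # covariances agree
print(np.allclose(np.linalg.solve(L11, eta_c), mu_c))  # means agree
```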
Maximum likelihood estimation
The log-likelihood of an i.i.d. dataset $D = \{x_1, \dots, x_N\}$ is (up to an additive constant)

$$l(\mu, \Sigma \mid D) = -\frac{N}{2}\log|\Sigma| - \frac{1}{2}\sum_{i=1}^N (x_i - \mu)^T \Sigma^{-1} (x_i - \mu)$$
Calculating μ

Taking the derivative with respect to μ:

$$\frac{\partial l}{\partial \mu} = \sum_{i=1}^N \Sigma^{-1}(x_i - \mu)$$

Setting it to zero gives

$$\hat{\mu}_{ML} = \frac{1}{N}\sum_{i=1}^N x_i$$
Estimate Σ
We need to take the derivative with respect to Σ. According to the properties of traces,

$$l(\Sigma \mid D) = -\frac{N}{2}\log|\Sigma| - \frac{1}{2}\sum_{i=1}^N (x_i-\mu)^T \Sigma^{-1} (x_i-\mu)$$

$$= -\frac{N}{2}\log|\Sigma| - \frac{1}{2}\sum_{i=1}^N \mathrm{tr}\big[(x_i-\mu)^T \Sigma^{-1} (x_i-\mu)\big]$$

$$= -\frac{N}{2}\log|\Sigma| - \frac{1}{2}\sum_{i=1}^N \mathrm{tr}\big[(x_i-\mu)(x_i-\mu)^T \Sigma^{-1}\big]$$

It is easier to differentiate with respect to $\Sigma^{-1}$. Using $\log|\Sigma| = -\log|\Sigma^{-1}|$ together with the derivatives of log-determinants and traces derived above (and the symmetry of Σ),

$$\frac{\partial l}{\partial \Sigma^{-1}} = \frac{N}{2}\Sigma - \frac{1}{2}\sum_{i=1}^N (x_i-\mu)(x_i-\mu)^T$$
Estimate Σ
Thus the maximum likelihood estimator is
$$\hat{\Sigma}_{ML} = \frac{1}{N}\sum_{i=1}^N (x_i - \hat{\mu}_{ML})(x_i - \hat{\mu}_{ML})^T$$

The maximum likelihood estimators of the canonical parameters are

$$\hat{\Lambda}_{ML} = \hat{\Sigma}_{ML}^{-1}, \qquad \hat{\eta}_{ML} = \hat{\Sigma}_{ML}^{-1}\hat{\mu}_{ML}$$
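A minimal sketch of both estimators on synthetic data; the true parameters are arbitrary, chosen only to generate an example dataset:

```python
import numpy as np

rng = np.random.default_rng(3)
true_mu = np.array([1.0, -2.0])
true_Sigma = np.array([[1.0, 0.4],
                       [0.4, 2.0]])
X = rng.multivariate_normal(true_mu, true_Sigma, size=5000)  # rows are samples

N = X.shape[0]
mu_hat = X.mean(axis=0)               # (1/N) sum_i x_i
diff = X - mu_hat
Sigma_hat = diff.T @ diff / N         # (1/N) sum_i (x_i - mu)(x_i - mu)^T

# Canonical-parameter estimates
Lam_hat = np.linalg.inv(Sigma_hat)
eta_hat = Lam_hat @ mu_hat
print(mu_hat)
print(Sigma_hat)
```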