Bayesian Framework
EE 645, Zhao Xin
A Brief Introduction to Bayesian Framework
- The Bayesian Philosophy
- Bayesian Neural Network
- Some Discussion on Priors
Bayes' Rule

P(θ | x^(1), ..., x^(n)) = P(x^(1), ..., x^(n) | θ) P(θ) / P(x^(1), ..., x^(n))

Here P(x^(1), ..., x^(n) | θ) is the likelihood, P(θ) is the prior distribution, and P(x^(1), ..., x^(n)) is the normalizing constant.
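As a minimal numerical sketch of this rule (illustrative only, not from the slides): a uniform prior over a coin's heads-probability is updated by the likelihood of observed flips, with the normalizing constant obtained by summing over a grid.

```python
import numpy as np

# Minimal sketch of Bayes' rule on a parameter grid (illustrative only).
# Model: coin with unknown heads-probability theta; data: 7 heads in 10 flips.
theta = np.linspace(0.01, 0.99, 99)          # grid over the parameter theta
prior = np.ones_like(theta) / theta.size     # uniform prior P(theta)
likelihood = theta**7 * (1 - theta)**3       # P(x^(1),...,x^(n) | theta)
posterior = likelihood * prior
posterior /= posterior.sum()                 # divide by the normalizing constant
print("posterior mean:", (theta * posterior).sum())   # close to 8/12
```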
Bayesian Prediction
P(x^(n+1) | x^(1), ..., x^(n)) = ∫ P(x^(n+1) | θ) P(θ | x^(1), ..., x^(n)) dθ

For supervised pairs, the predictive distribution and the point prediction are

P(y^(n+1) | x^(n+1), (x^(1), y^(1)), ..., (x^(n), y^(n))) = ∫ P(y^(n+1) | x^(n+1), θ) P(θ | (x^(1), y^(1)), ..., (x^(n), y^(n))) dθ

ŷ^(n+1) = f̂(x^(n+1)) = ∫ y P(y | x^(n+1), (x^(1), y^(1)), ..., (x^(n), y^(n))) dy
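The prediction integrals above can be approximated by averaging over posterior samples of θ. A minimal sketch, assuming a toy conjugate Gaussian model (all numbers illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: x ~ N(theta, 1) with unknown mean theta and prior theta ~ N(0, 10).
x = rng.normal(2.0, 1.0, size=20)                   # observed x^(1),...,x^(n)

# Conjugate Gaussian posterior P(theta | x^(1),...,x^(n)):
post_var = 1.0 / (1.0 / 10.0 + len(x) / 1.0)        # posterior variance
post_mean = post_var * x.sum()                      # posterior mean (prior mean 0)

# Monte Carlo version of the predictive integral: sample theta, then x^(n+1).
theta_samples = rng.normal(post_mean, np.sqrt(post_var), size=5000)
x_new = rng.normal(theta_samples, 1.0)              # draws from P(x^(n+1) | x^(1..n))
print("predictive mean:", x_new.mean())             # approximates the integral
```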
Hierarchical Model
P(θ_1, ..., θ_p) = ∫ [ Π_{k=1}^p P(θ_k | α) ] P(α) dα

where α is a hyperparameter with its own prior P(α).
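Ancestral sampling makes the hierarchical construction concrete: draw the hyperparameter from its prior, then the parameters given it. A short sketch with an illustrative inverse-gamma hyperprior (the slides do not specify one):

```python
import numpy as np

rng = np.random.default_rng(1)

# P(theta) = integral of P(theta | alpha) P(alpha) d alpha, realized by sampling.
alpha = 1.0 / rng.gamma(shape=2.0, scale=1.0)     # hyperparameter: prior variance
theta = rng.normal(0.0, np.sqrt(alpha), size=5)   # theta_k | alpha ~ N(0, alpha)
print("alpha:", alpha, "theta:", theta)
```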
An Example Bayesian Network
Inputs x_1, x_2, x_3 feed hidden units h_1(x), ..., h_4(x), which feed outputs f_1(x), f_2(x):

h_j(x) = tanh( a_j + Σ_i w_ij x_i )

f_k(x) = b_k + Σ_j w_jk h_j(x)

P(y_k | x) = (1/√(2πσ²)) exp( −(y_k − f_k(x))² / (2σ²) )
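A minimal sketch of this example network and its Gaussian observation model; the sizes match the diagram (3 inputs, 4 hidden units, 2 outputs), while the parameter values are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(2)

n_in, n_hidden, n_out = 3, 4, 2
a = rng.normal(size=n_hidden)               # hidden-unit biases a_j
w_in = rng.normal(size=(n_hidden, n_in))    # input-to-hidden weights w_ij
b = rng.normal(size=n_out)                  # output biases b_k
w_out = rng.normal(size=(n_out, n_hidden))  # hidden-to-output weights w_jk
sigma = 0.1                                 # observation noise level (illustrative)

def f(x):
    h = np.tanh(a + w_in @ x)               # h_j(x) = tanh(a_j + sum_i w_ij x_i)
    return b + w_out @ h                    # f_k(x) = b_k + sum_j w_jk h_j(x)

def log_lik(x, y):
    # log P(y | x) under the Gaussian noise model above
    r = y - f(x)
    return -0.5 * (r**2).sum() / sigma**2 - n_out * np.log(np.sqrt(2 * np.pi) * sigma)

x = rng.normal(size=n_in)
print(f(x), log_lik(x, f(x) + 0.05))
```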
Some Discussion on Priors
- Priors converging to a Gaussian process if the number of hidden units is infinite
- Priors leading to smooth and Brownian functions
- Fractional Brownian priors
- Priors converging to a non-Gaussian stable process
Bayesian Framework for LS RBF Kernel SVM MUD
- Basic Problem and Solution
- Probabilistic Interpretation of the LS SVM
- First Level Inference
- Second Level Inference
- Third Level Inference
- Basic MUD Model
- Results and Discussion
- Summary
Basic Problem for LS SVM
Given dataset D = {(x_1, y_1), ..., (x_N, y_N)}, where x_i ∈ R^d and y_i ∈ {−1, +1},

min_{w,b,e} J(w, b, e) = (μ/2) wᵀw + (ζ/2) Σ_{i=1}^N e_i²

S.T. y_i [ wᵀφ(x_i) + b ] = 1 − e_i, i = 1, ..., N
Basic Solution for LS SVM
Eliminating w and e gives the dual linear system

[ 0   Yᵀ        ] [ b ]   [ 0   ]
[ Y   Ω + I/γ   ] [ α ] = [ 1_v ]

where γ = ζ/μ, Ω_ij = y_i y_j φ(x_i)ᵀφ(x_j) = y_i y_j K(x_i, x_j), Y = [y_1, ..., y_N]ᵀ, e = [e_1, ..., e_N]ᵀ, α = [α_1, ..., α_N]ᵀ, and 1_v = [1, ..., 1]ᵀ. The resulting classifier is

y(x) = sign( Σ_{i=1}^N α_i y_i K(x, x_i) + b )
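A compact sketch of training and using this classifier with numpy; the RBF kernel and the values of γ and the kernel width are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(8)

def rbf(P, Q, sig2=1.0):                      # Gaussian kernel K(x_i, x_j)
    return np.exp(-((P[:, None, :] - Q[None, :, :])**2).sum(-1) / (2 * sig2))

def lssvm_train(X, y, gamma=10.0):
    N = len(y)
    Omega = (y[:, None] * y[None, :]) * rbf(X, X)   # Omega_ij = y_i y_j K(x_i, x_j)
    # Solve [0 Y^T; Y Omega + I/gamma] [b; alpha] = [0; 1_v]
    A = np.zeros((N + 1, N + 1))
    A[0, 1:], A[1:, 0] = y, y
    A[1:, 1:] = Omega + np.eye(N) / gamma
    sol = np.linalg.solve(A, np.concatenate([[0.0], np.ones(N)]))
    return sol[0], sol[1:]                          # b, alpha

def lssvm_predict(X, y, alpha, b, Xnew):
    # y(x) = sign( sum_i alpha_i y_i K(x, x_i) + b )
    return np.sign(rbf(Xnew, X) @ (alpha * y) + b)

# Toy two-class data
X = np.vstack([rng.normal(-1, 0.5, (20, 2)), rng.normal(1, 0.5, (20, 2))])
y = np.concatenate([-np.ones(20), np.ones(20)])
b, alpha = lssvm_train(X, y)
print("training accuracy:", (lssvm_predict(X, y, alpha, b, X) == y).mean())
```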
The Formula for SVM
With the feature mapping φ(·): R^d → R^K and kernel K(x_i, x_j) = φ(x_i)ᵀφ(x_j),

y_i [ wᵀφ(x_i) + b ] = 1 − e_i, i = 1, ..., N

w_MAP = Σ_{i=1}^N α_i y_i φ(x_i)

ŷ(x) = sign( wᵀ_MAP φ(x) + b_MAP ) = sign( Σ_{i=1}^N α_i y_i K(x, x_i) + b_MAP )

(In fact, we find the solution in the dual space.)
First Level Inference
Given D = {(x_i, y_i)}_{i=1}^N and model H, apply Bayes' rule to the weight w and bias b:

P(w, b | D, log μ, log ζ, H) = P(D | w, b, log μ, log ζ, H) P(w, b | log μ, log ζ, H) / P(D | log μ, log ζ, H)
Some Assumptions of this Level
- Separable Gaussian prior for P(w, b)
- Independent data points
- Gaussian distributed errors
- Variance of b goes to infinity
Prior:

P(w, b | log μ, H) = (μ/(2π))^{n_f/2} (1/√(2πσ_b²)) exp( −(μ/2) wᵀw − b²/(2σ_b²) ), with σ_b² → ∞

where n_f is the dimension of the feature space.
Likelihood:

P(D | w, b, log ζ, H) = Π_{i=1}^N P(x_i, y_i | w, b, log ζ, H) ∝ Π_{i=1}^N P(e_i | w, b, log ζ, H) = Π_{i=1}^N √(ζ/(2π)) exp( −ζ e_i² / 2 )
Result of the First Level
P(w, b | D, log μ, log ζ, H) ∝ exp( −(μ/2) wᵀw − (ζ/2) Σ_{i=1}^N e_i² ) = exp( −J(w, b) )

By this equation, we can find the Maximum A Posteriori estimates of the weight w and bias b, which will be the solution of the classic kernel LS SVM classifier.
Conditional Distribution of Weight w and Bias b
P(w, b | D, log μ, log ζ, H) = (1/( (2π)^{(n_f+1)/2} √(det Q) )) exp( −(1/2) gᵀ Q⁻¹ g )

where g = [ w − w_MAP ; b − b_MAP ] and Q⁻¹ = H is the Hessian of J at the MAP point,

H = [ ∂²J/∂w∂wᵀ   ∂²J/∂w∂b ]
    [ ∂²J/∂b∂wᵀ   ∂²J/∂b²  ]
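A small sketch of this Gaussian (Laplace-style) approximation for a linear feature map φ(x) = x, where the Hessian blocks of J have closed form; μ, ζ, and the toy data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)

mu, zeta = 1.0, 10.0                          # illustrative hyperparameters
X = rng.normal(size=(50, 2))                  # toy inputs, phi(x) = x (linear map)
d = X.shape[1]

# J(w, b) = (mu/2) w^T w + (zeta/2) sum_i (1 - y_i (w^T x_i + b))^2 with y_i^2 = 1,
# so the Hessian H (= Q^{-1}) is constant and has closed form:
H = np.zeros((d + 1, d + 1))
H[:d, :d] = mu * np.eye(d) + zeta * X.T @ X   # d2J / dw dw^T
H[:d, d] = H[d, :d] = zeta * X.sum(axis=0)    # d2J / dw db
H[d, d] = zeta * len(X)                       # d2J / db^2

Q = np.linalg.inv(H)                          # posterior covariance of (w, b)
print("posterior std of b:", np.sqrt(Q[d, d]))
```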
Unbalanced Case of the 1st Level

If the means of the +1 class and the −1 class do not project perfectly onto +1 and −1, a bias term will appear. We introduce two new random variables, as follows.
Write the latent output as z(x) = wᵀφ(x) + b, with class-conditional means m̂₊ and m̂₋. The error density conditioned on the +1 class becomes

P(e_i | x_i, y_i = +1, w, b, log ζ, H) = √(ζ/(2π)) exp( −ζ ( z(x_i) − m̂₊ )² / 2 )

and analogously for the −1 class. With Gaussian class-conditional densities

P(x | y, D, log μ, log ζ, log σ, H) ≈ (1/√(2πσ̂_y²)) exp( −( z(x) − m̂_y )² / (2σ̂_y²) )

the class posterior follows from Bayes' rule:

P(y | x, D, log μ, log ζ, log σ, H) = P(y) P(x | y, D, log μ, log ζ, log σ, H) / [ P(y = +1) P(x | y = +1, D, log μ, log ζ, log σ, H) + P(y = −1) P(x | y = −1, D, log μ, log ζ, log σ, H) ]
Last Solution for First Level
The corrected decision rule becomes

y(x) = sign[ Σ_{i=1}^N α_i y_i K(x, x_i) + b − (m̂₊ + m̂₋)/2 + (σ̂_d² / (m̂₊ − m̂₋)) log( P(y = +1) / P(y = −1) ) ]

where the class-conditional means of the latent output are estimated on the training set, e.g.

m̂₊ = (1/N₊) Σ_{i ∈ I₊} ( Σ_{j=1}^N α_j y_j K(x_i, x_j) + b )

and σ̂_d² is the corresponding within-class variance.
Second Level Inference
In this level, we apply Bayes' rule to the hyperparameters μ and ζ:

P(log μ, log ζ | D, H) = P(D | log μ, log ζ, H) P(log μ, log ζ | H) / P(D | H)

(Assume the prior P(log μ, log ζ) is separable and uniformly distributed.)
Result of Second Level Inference
P(D | log μ, log ζ, H) ∝ √( μ^{N_eff} ζ^N / det H ) exp( −J(w_MAP, b_MAP) )

where det H ∝ Π_i ( μ + ζ λ_{G,i} ), the λ_{G,i} are the eigenvalues of the centered Gram matrix M_c Ω M_c with centering matrix M_c = I_N − (1/N) 1_v 1_vᵀ, and N_eff is the number of non-zero eigenvalues λ_{G,i}. The data-fit term J(w_MAP, b_MAP) can be written in closed form as a quadratic form in M_c Y. Maximizing the evidence is then equivalent to minimizing

J₂(μ, ζ) = J(w_MAP, b_MAP) + (1/2) Σ_{i=1}^{N_eff} log( μ + ζ λ_{G,i} ) − (N_eff/2) log μ − (N/2) log ζ
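A sketch of evaluating this second-level criterion on a grid: the eigenvalues of the centered Gram matrix supply the Occam term, and the LS SVM is re-solved with γ = ζ/μ to supply the data-fit term. The constants follow the form stated above (and should be checked against Van Gestel & Suykens); the kernel width, toy data, and grids are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)

def rbf(P, Q, sig2=1.0):
    return np.exp(-((P[:, None, :] - Q[None, :, :])**2).sum(-1) / (2 * sig2))

def lssvm_alpha(K, y, gamma):
    N = len(y)
    Omega = (y[:, None] * y[None, :]) * K
    A = np.zeros((N + 1, N + 1))
    A[0, 1:], A[1:, 0] = y, y
    A[1:, 1:] = Omega + np.eye(N) / gamma
    sol = np.linalg.solve(A, np.concatenate([[0.0], np.ones(N)]))
    return sol[1:], Omega

X = rng.normal(size=(40, 2))
y = np.sign(X[:, 0] + 1e-9)                  # toy labels
K = rbf(X, X)
N = len(y)
Mc = np.eye(N) - np.ones((N, N)) / N         # centering matrix M_c
lam = np.linalg.eigvalsh(Mc @ K @ Mc)
lam = lam[lam > 1e-10]                       # the N_eff non-zero eigenvalues
N_eff = len(lam)

def J2(mu, zeta):
    alpha, Omega = lssvm_alpha(K, y, gamma=zeta / mu)
    E_W = 0.5 * alpha @ Omega @ alpha           # (1/2) w^T w in dual variables
    E_D = 0.5 * ((alpha * mu / zeta)**2).sum()  # (1/2) sum e_i^2, e_i = alpha_i/gamma
    return (mu * E_W + zeta * E_D + 0.5 * np.log(mu + zeta * lam).sum()
            - 0.5 * N_eff * np.log(mu) - 0.5 * N * np.log(zeta))

grid = [(m, z) for m in np.logspace(-2, 2, 9) for z in np.logspace(-2, 2, 9)]
print("best (mu, zeta) on grid:", min(grid, key=lambda p: J2(*p)))
```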
Last Solution for Second Level
Using the eigendecomposition M_c Ω M_c = U_G D_G U_Gᵀ (with the N_eff non-zero eigenvalues on the diagonal of D_G), the cost terms at the MAP solution can be evaluated directly as quadratic forms of the type

E_D(w_MAP, b_MAP) ∝ Yᵀ M_c U_G ( D_G + I/γ )⁻² U_Gᵀ M_c Y, E_W(w_MAP, b_MAP) ∝ Yᵀ M_c U_G D_G ( D_G + I/γ )⁻² U_Gᵀ M_c Y

Substituting them reduces the second level to a one-dimensional minimization over γ = ζ/μ, of the form

min_γ J₂(γ) = Σ_{i=1}^{N_eff} log( λ_{G,i} + 1/γ ) + (N − 1) log( E_W(w_MAP, b_MAP) + γ E_D(w_MAP, b_MAP) )
Third Level Inference
In this level, we apply Bayes' rule to the model parameter H_j:

P(H_j | D) = P(D | H_j) P(H_j) / P(D), P(D) = Σ_j P(D | H_j) P(H_j)

(Assume the prior P(H_j) is uniformly distributed, so ranking the models by P(H_j | D) reduces to ranking them by the evidence P(D | H_j).)
Some Assumptions in this Level

P(D | H_j) = ∫∫ P(D | log μ, log ζ, H_j) P(log μ, log ζ | H_j) d log μ d log ζ

Assume P(log μ, log ζ | D, H) can be well approximated as a separable Gaussian with error variances σ²_{log μ|D} and σ²_{log ζ|D}, where

σ²_{log μ|D} ≈ 2/(γ_eff − 1), σ²_{log ζ|D} ≈ 2/(N − γ_eff), γ_eff = 1 + Σ_{i=1}^{N_eff} ζ_MAP λ_{G,i} / ( μ_MAP + ζ_MAP λ_{G,i} )

so that

P(D | H_j) ∝ P(D | log μ_MAP, log ζ_MAP, H_j) σ_{log μ|D} σ_{log ζ|D}
Last Solution for Third Level
Combining the above,

P(D | H_j) ∝ √( μ_MAP^{N_eff} ζ_MAP^N / ( (γ_eff − 1)(N − γ_eff) Π_{i=1}^{N_eff} ( μ_MAP + ζ_MAP λ_{G,i} ) ) ) exp( −J(w_MAP, b_MAP) )

We pick the index j of the model H_j which maximizes this evidence.
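A sketch of the resulting model ranking, treating the RBF width as the model H_j; here the minimized level-2 cost is used as a crude stand-in for −log P(D | H_j), omitting the σ correction factors above. Data and grids are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)

X = rng.normal(size=(40, 2))
y = np.sign(X[:, 0] + 1e-9)
N = len(y)
Mc = np.eye(N) - np.ones((N, N)) / N

def neg_log_evidence(sig2):
    d2 = ((X[:, None, :] - X[None, :, :])**2).sum(-1)
    K = np.exp(-d2 / (2 * sig2))                     # model H_j = RBF width sig2
    lam = np.linalg.eigvalsh(Mc @ K @ Mc)
    lam = lam[lam > 1e-10]
    Omega = (y[:, None] * y[None, :]) * K

    def J2(mu, zeta):
        A = np.zeros((N + 1, N + 1))
        A[0, 1:], A[1:, 0] = y, y
        A[1:, 1:] = Omega + np.eye(N) * mu / zeta    # I / gamma with gamma = zeta/mu
        alpha = np.linalg.solve(A, np.concatenate([[0.0], np.ones(N)]))[1:]
        E_W = 0.5 * alpha @ Omega @ alpha
        E_D = 0.5 * ((alpha * mu / zeta)**2).sum()
        return (mu * E_W + zeta * E_D + 0.5 * np.log(mu + zeta * lam).sum()
                - 0.5 * len(lam) * np.log(mu) - 0.5 * N * np.log(zeta))

    grid = [(m, z) for m in np.logspace(-2, 2, 7) for z in np.logspace(-2, 2, 7)]
    return min(J2(m, z) for m, z in grid)            # best level-2 cost for this model

widths = [0.1, 0.5, 1.0, 5.0, 10.0]                  # candidate models, as in the plots
print("selected sig2:", min(widths, key=neg_log_evidence))
```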
Some Comments for this Level
- For a Gaussian kernel machine, the variance of the Gaussian function can represent the model H.
- It is impossible to calculate the evidence for all possible models.
- Luckily, in general (e.g., for the Gaussian kernel SVM), the performance of the classifier is quite smooth with respect to variations of the model parameter. Therefore, we can simply sample the model in the region of interest.
A Synchronous CDMA Transmitter
[Diagram: K users' information bits b_k[i] each pass through spreading coding & modulation; the modulated signals are summed synchronously and sent over an AWGN channel, producing the received signal y(t), which is then demodulated.]
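A minimal simulation of this transmitter front end (random ±1 signatures and illustrative sizes; the SNR convention is one common choice):

```python
import numpy as np

rng = np.random.default_rng(3)

K, L, n_sym = 7, 31, 200                      # users, spreading length, symbols
S = np.sign(rng.normal(size=(K, L)))          # random +-1 signature sequences
A = np.ones(K); A[1:] = 5.0                   # amplitudes, the A_i/A_1 = 5 scenario
b = np.sign(rng.normal(size=(K, n_sym)))      # information bits b_k[i] in {-1,+1}

snr_db = 8.0
sigma = np.sqrt(L * 10**(-snr_db / 10))       # matched-filter SNR for user 1 = snr_db

# Received chips per symbol interval: y[i] = sum_k A_k b_k[i] s_k + noise
Y = (A[:, None] * b).T @ S + sigma * rng.normal(size=(n_sym, L))

# Matched-filter bank front end; MAI from the strong interferers dominates user 1,
# which is what motivates multiuser detection.
Z = Y @ S.T / L
print("single-user matched-filter BER (user 1):", np.mean(np.sign(Z[:, 0]) != b[0]))
```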
The LS SVM Receiver Diagram
[Diagram: the received signal y(t) passes through a bank of matched filters (one per user); the matched-filter outputs Y[i] span the user space and feed an LS SVM network with parameters (α, b) and its hyperparameters, producing the bit estimates b̂_k[i].]
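Combining the two sketches above: matched-filter outputs serve as features for an LS SVM that detects user 1's bits. All hyperparameter values here are illustrative; in the deck they are chosen by the Bayesian inference levels:

```python
import numpy as np

rng = np.random.default_rng(4)

K, L, n_train, n_test = 7, 31, 200, 500
S = np.sign(rng.normal(size=(K, L)))          # signatures
A = np.ones(K); A[1:] = 5.0                   # near-far scenario, A_i/A_1 = 5
sigma = np.sqrt(L * 10**(-8.0 / 10))          # 8 dB (same convention as above)

def batch(n):
    b = np.sign(rng.normal(size=(K, n)))
    Y = (A[:, None] * b).T @ S + sigma * rng.normal(size=(n, L))
    return Y @ S.T / L, b[0]                  # matched-filter features, user-1 bits

Ztr, btr = batch(n_train)
Zte, bte = batch(n_test)

def rbf(P, Q, sig2=5.0):                      # Gaussian kernel, width illustrative
    return np.exp(-((P[:, None, :] - Q[None, :, :])**2).sum(-1) / (2 * sig2))

gamma = 10.0                                  # regularization, illustrative
Omega = (btr[:, None] * btr[None, :]) * rbf(Ztr, Ztr)
Asys = np.zeros((n_train + 1, n_train + 1))
Asys[0, 1:], Asys[1:, 0] = btr, btr
Asys[1:, 1:] = Omega + np.eye(n_train) / gamma
sol = np.linalg.solve(Asys, np.concatenate([[0.0], np.ones(n_train)]))
bhat = np.sign(rbf(Zte, Ztr) @ (sol[1:] * btr) + sol[0])
print("LS SVM detector BER (user 1):", np.mean(bhat != bte))
```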
Results and Discussions
First Inference
[Plot: Performance of First Level Result. PER vs. SNR (dB) from 0 to 12, log scale 10⁻⁴ to 10⁰. Asterisk: revised LS SVM; circle: LS SVM. 7 users, ρ = 0.429, 200 training points, A_i/A_1 = 5.]
Second Inference
[Plot: Second Inference Plot. J₂ vs. log10(C) from −1 to 5, for variance 1 and variance 10. Single-user AWGN channel at 0 dB.]
Third Inference (Plot 1)
[Plot: Third Inference Plot. J₃ vs. log10(C) from −1 to 5, for variances 0.5, 1, 5, 10, and 20. Single-user AWGN channel, SNR = 0 dB, 100 training points.]
Third Inference (Plot 2)
[Plot: Third Inference Plot. J₃ vs. log10(C) from 0 to 6, for variances 0.1, 0.5, 1, 5, and 10. Single-user AWGN channel, SNR = 8 dB, 100 training points.]
A Sample of Chosen Parameters

SNR (dB)   Variance   1/C
0          3.98       0.11
2          2.49       0.49
4          3.98       0.28
6          5.01       0.39
8          5.01       0.28
10         10.0       0.34
12         12.6       0.08
Detector Performance
[Plot: Performance Comparison of Gaussian LS SVM Detector. PER vs. SNR (dB) from 0 to 12, log scale 10⁻⁴ to 10⁰. Circle: basic Gaussian LS SVM; asterisk: 1st inference applied; square: 2nd & 3rd inference applied; diamond: MMSE. 7 users, ρ = 0.429, A_i/A_1 = 5, 200 training points.]
Some Discussions on this Detector
- The first inference improves the performance of the LS SVM detector, especially in the high-SNR region, by accounting for the bias term.
- The LS SVM detector is very smooth with respect to variations of the hyperparameters, which means an adaptive LS SVM will work reasonably well as long as the channel properties do not vary too quickly.
- The computations for the 2nd and 3rd levels of inference are very complex, so the exact calculation is not worthwhile here; approximation formulas can be chosen instead.
Summary of Bayesian Network
- Pick a basic neural network.
- Properly choose the priors (physically reasonable and convenient for theoretical derivation).
- Find a reasonable hierarchical framework (a three-level inference framework is very typical), apply Bayes' rule there, and find beneficial assumptions to simplify the problem.
Some Comments on Bayesian Framework
- It can help us physically understand a neural network model.
- It can theoretically help us find ways to optimize the parameters and, more importantly, the hyperparameters, which can sometimes be impossible to set otherwise.
- It can even complement existing methods in some given problems.
References
- T. Van Gestel and J. A. K. Suykens, "A Bayesian Framework for Least Squares Support Vector Machine Classifiers."
- N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines, 2000.
- R. M. Neal, Bayesian Learning for Neural Networks, 1996.
- S. Verdú, Multiuser Detection.