Bayesian Framework
EE 645, Zhao Xin
A Brief Introduction to Bayesian Framework
- The Bayesian Philosophy
- Bayesian Neural Network
- Some Discussion on Priors
Bayes' Rule

P(θ | x^(1), ..., x^(n)) = P(x^(1), ..., x^(n) | θ) P(θ) / P(x^(1), ..., x^(n))

Here P(x^(1), ..., x^(n) | θ) is the likelihood, P(θ) is the prior distribution, and P(x^(1), ..., x^(n)) is the normalizing constant.
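As a minimal numerical sketch of this rule (illustrative only, not from the slides): a uniform prior over a coin's heads-probability is updated by the likelihood of observed flips, with the normalizing constant obtained by summing over a grid.

```python
import numpy as np

# Minimal sketch of Bayes' rule on a parameter grid (illustrative only).
# Model: coin with unknown heads-probability theta; data: 7 heads in 10 flips.
theta = np.linspace(0.01, 0.99, 99)          # grid over the parameter theta
prior = np.ones_like(theta) / theta.size     # uniform prior P(theta)
likelihood = theta**7 * (1 - theta)**3       # P(x^(1),...,x^(n) | theta)
posterior = likelihood * prior
posterior /= posterior.sum()                 # divide by the normalizing constant
print("posterior mean:", (theta * posterior).sum())   # close to 8/12
```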
Bayesian Prediction
P(x^(n+1) | x^(1), ..., x^(n)) = ∫ P(x^(n+1) | θ) P(θ | x^(1), ..., x^(n)) dθ

For supervised pairs, the predictive distribution and the point prediction are

P(y^(n+1) | x^(n+1), (x^(1), y^(1)), ..., (x^(n), y^(n))) = ∫ P(y^(n+1) | x^(n+1), θ) P(θ | (x^(1), y^(1)), ..., (x^(n), y^(n))) dθ

ŷ^(n+1) = f̂(x^(n+1)) = ∫ y P(y | x^(n+1), (x^(1), y^(1)), ..., (x^(n), y^(n))) dy
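The prediction integrals above can be approximated by averaging over posterior samples of θ. A minimal sketch, assuming a toy conjugate Gaussian model (all numbers illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: x ~ N(theta, 1) with unknown mean theta and prior theta ~ N(0, 10).
x = rng.normal(2.0, 1.0, size=20)                   # observed x^(1),...,x^(n)

# Conjugate Gaussian posterior P(theta | x^(1),...,x^(n)):
post_var = 1.0 / (1.0 / 10.0 + len(x) / 1.0)        # posterior variance
post_mean = post_var * x.sum()                      # posterior mean (prior mean 0)

# Monte Carlo version of the predictive integral: sample theta, then x^(n+1).
theta_samples = rng.normal(post_mean, np.sqrt(post_var), size=5000)
x_new = rng.normal(theta_samples, 1.0)              # draws from P(x^(n+1) | x^(1..n))
print("predictive mean:", x_new.mean())             # approximates the integral
```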
Hierarchical Model
P(θ_1, ..., θ_p) = ∫ [ Π_{k=1}^p P(θ_k | α) ] P(α) dα

where α is a hyperparameter with its own prior P(α).
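Ancestral sampling makes the hierarchical construction concrete: draw the hyperparameter from its prior, then the parameters given it. A short sketch with an illustrative inverse-gamma hyperprior (the slides do not specify one):

```python
import numpy as np

rng = np.random.default_rng(1)

# P(theta) = integral of P(theta | alpha) P(alpha) d alpha, realized by sampling.
alpha = 1.0 / rng.gamma(shape=2.0, scale=1.0)     # hyperparameter: prior variance
theta = rng.normal(0.0, np.sqrt(alpha), size=5)   # theta_k | alpha ~ N(0, alpha)
print("alpha:", alpha, "theta:", theta)
```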
An Example Bayesian Network
Inputs x_1, x_2, x_3 feed hidden units h_1(x), ..., h_4(x), which feed outputs f_1(x), f_2(x):

h_j(x) = tanh( a_j + Σ_i w_ij x_i )

f_k(x) = b_k + Σ_j w_jk h_j(x)

P(y_k | x) = (1/√(2πσ²)) exp( −(y_k − f_k(x))² / (2σ²) )
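A minimal sketch of this example network and its Gaussian observation model; the sizes match the diagram (3 inputs, 4 hidden units, 2 outputs), while the parameter values are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(2)

n_in, n_hidden, n_out = 3, 4, 2
a = rng.normal(size=n_hidden)               # hidden-unit biases a_j
w_in = rng.normal(size=(n_hidden, n_in))    # input-to-hidden weights w_ij
b = rng.normal(size=n_out)                  # output biases b_k
w_out = rng.normal(size=(n_out, n_hidden))  # hidden-to-output weights w_jk
sigma = 0.1                                 # observation noise level (illustrative)

def f(x):
    h = np.tanh(a + w_in @ x)               # h_j(x) = tanh(a_j + sum_i w_ij x_i)
    return b + w_out @ h                    # f_k(x) = b_k + sum_j w_jk h_j(x)

def log_lik(x, y):
    # log P(y | x) under the Gaussian noise model above
    r = y - f(x)
    return -0.5 * (r**2).sum() / sigma**2 - n_out * np.log(np.sqrt(2 * np.pi) * sigma)

x = rng.normal(size=n_in)
print(f(x), log_lik(x, f(x) + 0.05))
```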
Some Discussion on Priors
- Priors converging to a Gaussian process if the number of hidden units is infinite
- Priors leading to smooth and Brownian functions
- Fractional Brownian priors
- Priors converging to a non-Gaussian stable process
Bayesian Framework for LS RBF Kernel SVM MUD
- Basic Problem and Solution
- Probabilistic Interpretation of the LS SVM
- First Level Inference
- Second Level Inference
- Third Level Inference
- Basic MUD Model
- Results and Discussion
- Summary
Basic Problem for LS SVM
Given dataset D = {(x_1, y_1), ..., (x_N, y_N)}, where x_i ∈ R^d and y_i ∈ {−1, +1},

min_{w,b,e} J(w, b, e) = (μ/2) wᵀw + (ζ/2) Σ_{i=1}^N e_i²

S.T. y_i [ wᵀφ(x_i) + b ] = 1 − e_i, i = 1, ..., N
Basic Solution for LS SVM
Eliminating w and e gives the dual linear system

[ 0   Yᵀ        ] [ b ]   [ 0   ]
[ Y   Ω + I/γ   ] [ α ] = [ 1_v ]

where γ = ζ/μ, Ω_ij = y_i y_j φ(x_i)ᵀφ(x_j) = y_i y_j K(x_i, x_j), Y = [y_1, ..., y_N]ᵀ, e = [e_1, ..., e_N]ᵀ, α = [α_1, ..., α_N]ᵀ, and 1_v = [1, ..., 1]ᵀ. The resulting classifier is

y(x) = sign( Σ_{i=1}^N α_i y_i K(x, x_i) + b )
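A compact sketch of training and using this classifier with numpy; the RBF kernel and the values of γ and the kernel width are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(8)

def rbf(P, Q, sig2=1.0):                      # Gaussian kernel K(x_i, x_j)
    return np.exp(-((P[:, None, :] - Q[None, :, :])**2).sum(-1) / (2 * sig2))

def lssvm_train(X, y, gamma=10.0):
    N = len(y)
    Omega = (y[:, None] * y[None, :]) * rbf(X, X)   # Omega_ij = y_i y_j K(x_i, x_j)
    # Solve [0 Y^T; Y Omega + I/gamma] [b; alpha] = [0; 1_v]
    A = np.zeros((N + 1, N + 1))
    A[0, 1:], A[1:, 0] = y, y
    A[1:, 1:] = Omega + np.eye(N) / gamma
    sol = np.linalg.solve(A, np.concatenate([[0.0], np.ones(N)]))
    return sol[0], sol[1:]                          # b, alpha

def lssvm_predict(X, y, alpha, b, Xnew):
    # y(x) = sign( sum_i alpha_i y_i K(x, x_i) + b )
    return np.sign(rbf(Xnew, X) @ (alpha * y) + b)

# Toy two-class data
X = np.vstack([rng.normal(-1, 0.5, (20, 2)), rng.normal(1, 0.5, (20, 2))])
y = np.concatenate([-np.ones(20), np.ones(20)])
b, alpha = lssvm_train(X, y)
print("training accuracy:", (lssvm_predict(X, y, alpha, b, X) == y).mean())
```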
The Formula for SVM
With the feature mapping φ(·): R^d → R^K and kernel K(x_i, x_j) = φ(x_i)ᵀφ(x_j),

y_i [ wᵀφ(x_i) + b ] = 1 − e_i, i = 1, ..., N

w_MAP = Σ_{i=1}^N α_i y_i φ(x_i)

ŷ(x) = sign( wᵀ_MAP φ(x) + b_MAP ) = sign( Σ_{i=1}^N α_i y_i K(x, x_i) + b_MAP )

(In fact, we find the solution in the dual space.)
First Level Inference
Given D = {(x_i, y_i)}_{i=1}^N and model H, apply Bayes' rule to the weight w and bias b:

P(w, b | D, log μ, log ζ, H) = P(D | w, b, log μ, log ζ, H) P(w, b | log μ, log ζ, H) / P(D | log μ, log ζ, H)
Some Assumptions of this Level
- Separable Gaussian prior for P(w, b)
- Independent data points
- Gaussian distributed errors
- Variance of b goes to infinity
Prior:

P(w, b | log μ, H) = (μ/(2π))^{n_f/2} (1/√(2πσ_b²)) exp( −(μ/2) wᵀw − b²/(2σ_b²) ), with σ_b² → ∞

where n_f is the dimension of the feature space.
Likelihood:

P(D | w, b, log ζ, H) = Π_{i=1}^N P(x_i, y_i | w, b, log ζ, H) ∝ Π_{i=1}^N P(e_i | w, b, log ζ, H) = Π_{i=1}^N √(ζ/(2π)) exp( −ζ e_i² / 2 )
Result of the First Level
P(w, b | D, log μ, log ζ, H) ∝ exp( −(μ/2) wᵀw − (ζ/2) Σ_{i=1}^N e_i² ) = exp( −J(w, b) )

By this equation, we can find the Maximum A Posteriori estimates of the weight w and bias b, which will be the solution of the classic kernel LS SVM classifier.
Conditional Distribution of Weight w and Bias b
P(w, b | D, log μ, log ζ, H) = (1/( (2π)^{(n_f+1)/2} √(det Q) )) exp( −(1/2) gᵀ Q⁻¹ g )

where g = [ w − w_MAP ; b − b_MAP ] and Q⁻¹ = H is the Hessian of J at the MAP point,

H = [ ∂²J/∂w∂wᵀ   ∂²J/∂w∂b ]
    [ ∂²J/∂b∂wᵀ   ∂²J/∂b²  ]
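A small sketch of this Gaussian (Laplace-style) approximation for a linear feature map φ(x) = x, where the Hessian blocks of J have closed form; μ, ζ, and the toy data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)

mu, zeta = 1.0, 10.0                          # illustrative hyperparameters
X = rng.normal(size=(50, 2))                  # toy inputs, phi(x) = x (linear map)
d = X.shape[1]

# J(w, b) = (mu/2) w^T w + (zeta/2) sum_i (1 - y_i (w^T x_i + b))^2 with y_i^2 = 1,
# so the Hessian H (= Q^{-1}) is constant and has closed form:
H = np.zeros((d + 1, d + 1))
H[:d, :d] = mu * np.eye(d) + zeta * X.T @ X   # d2J / dw dw^T
H[:d, d] = H[d, :d] = zeta * X.sum(axis=0)    # d2J / dw db
H[d, d] = zeta * len(X)                       # d2J / db^2

Q = np.linalg.inv(H)                          # posterior covariance of (w, b)
print("posterior std of b:", np.sqrt(Q[d, d]))
```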
Unbalanced Case of the 1st Level

If the means of the +1 class and the −1 class do not project perfectly onto +1 and −1, a bias term will appear. We introduce two new random variables, as follows.
Write the latent output as z(x) = wᵀφ(x) + b, with class-conditional means m̂₊ and m̂₋. The error density conditioned on the +1 class becomes

P(e_i | x_i, y_i = +1, w, b, log ζ, H) = √(ζ/(2π)) exp( −ζ ( z(x_i) − m̂₊ )² / 2 )

and analogously for the −1 class. With Gaussian class-conditional densities

P(x | y, D, log μ, log ζ, log σ, H) ≈ (1/√(2πσ̂_y²)) exp( −( z(x) − m̂_y )² / (2σ̂_y²) )

the class posterior follows from Bayes' rule:

P(y | x, D, log μ, log ζ, log σ, H) = P(y) P(x | y, D, log μ, log ζ, log σ, H) / [ P(y = +1) P(x | y = +1, D, log μ, log ζ, log σ, H) + P(y = −1) P(x | y = −1, D, log μ, log ζ, log σ, H) ]
Last Solution for First Level
The corrected decision rule becomes

y(x) = sign[ Σ_{i=1}^N α_i y_i K(x, x_i) + b − (m̂₊ + m̂₋)/2 + (σ̂_d² / (m̂₊ − m̂₋)) log( P(y = +1) / P(y = −1) ) ]

where the class-conditional means of the latent output are estimated on the training set, e.g.

m̂₊ = (1/N₊) Σ_{i ∈ I₊} ( Σ_{j=1}^N α_j y_j K(x_i, x_j) + b )

and σ̂_d² is the corresponding within-class variance.
Second Level Inference
In this level, we apply Bayes' rule to the hyperparameters μ and ζ:

P(log μ, log ζ | D, H) = P(D | log μ, log ζ, H) P(log μ, log ζ | H) / P(D | H)

(Assume the prior P(log μ, log ζ) is separable and uniformly distributed.)
Result of Second Level Inference
P(D | log μ, log ζ, H) ∝ √( μ^{N_eff} ζ^N / det H ) exp( −J(w_MAP, b_MAP) )

where det H ∝ Π_i ( μ + ζ λ_{G,i} ), the λ_{G,i} are the eigenvalues of the centered Gram matrix M_c Ω M_c with centering matrix M_c = I_N − (1/N) 1_v 1_vᵀ, and N_eff is the number of non-zero eigenvalues λ_{G,i}. The data-fit term J(w_MAP, b_MAP) can be written in closed form as a quadratic form in M_c Y. Maximizing the evidence is then equivalent to minimizing

J₂(μ, ζ) = J(w_MAP, b_MAP) + (1/2) Σ_{i=1}^{N_eff} log( μ + ζ λ_{G,i} ) − (N_eff/2) log μ − (N/2) log ζ
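A sketch of evaluating this second-level criterion on a grid: the eigenvalues of the centered Gram matrix supply the Occam term, and the LS SVM is re-solved with γ = ζ/μ to supply the data-fit term. The constants follow the form stated above (and should be checked against Van Gestel & Suykens); the kernel width, toy data, and grids are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)

def rbf(P, Q, sig2=1.0):
    return np.exp(-((P[:, None, :] - Q[None, :, :])**2).sum(-1) / (2 * sig2))

def lssvm_alpha(K, y, gamma):
    N = len(y)
    Omega = (y[:, None] * y[None, :]) * K
    A = np.zeros((N + 1, N + 1))
    A[0, 1:], A[1:, 0] = y, y
    A[1:, 1:] = Omega + np.eye(N) / gamma
    sol = np.linalg.solve(A, np.concatenate([[0.0], np.ones(N)]))
    return sol[1:], Omega

X = rng.normal(size=(40, 2))
y = np.sign(X[:, 0] + 1e-9)                  # toy labels
K = rbf(X, X)
N = len(y)
Mc = np.eye(N) - np.ones((N, N)) / N         # centering matrix M_c
lam = np.linalg.eigvalsh(Mc @ K @ Mc)
lam = lam[lam > 1e-10]                       # the N_eff non-zero eigenvalues
N_eff = len(lam)

def J2(mu, zeta):
    alpha, Omega = lssvm_alpha(K, y, gamma=zeta / mu)
    E_W = 0.5 * alpha @ Omega @ alpha           # (1/2) w^T w in dual variables
    E_D = 0.5 * ((alpha * mu / zeta)**2).sum()  # (1/2) sum e_i^2, e_i = alpha_i/gamma
    return (mu * E_W + zeta * E_D + 0.5 * np.log(mu + zeta * lam).sum()
            - 0.5 * N_eff * np.log(mu) - 0.5 * N * np.log(zeta))

grid = [(m, z) for m in np.logspace(-2, 2, 9) for z in np.logspace(-2, 2, 9)]
print("best (mu, zeta) on grid:", min(grid, key=lambda p: J2(*p)))
```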
Last Solution for Second Level
Using the eigendecomposition M_c Ω M_c = U_G D_G U_Gᵀ (with the N_eff non-zero eigenvalues on the diagonal of D_G), the cost terms at the MAP solution can be evaluated directly as quadratic forms of the type

E_D(w_MAP, b_MAP) ∝ Yᵀ M_c U_G ( D_G + I/γ )⁻² U_Gᵀ M_c Y, E_W(w_MAP, b_MAP) ∝ Yᵀ M_c U_G D_G ( D_G + I/γ )⁻² U_Gᵀ M_c Y

Substituting them reduces the second level to a one-dimensional minimization over γ = ζ/μ, of the form

min_γ J₂(γ) = Σ_{i=1}^{N_eff} log( λ_{G,i} + 1/γ ) + (N − 1) log( E_W(w_MAP, b_MAP) + γ E_D(w_MAP, b_MAP) )
Third Level Inference
In this level, we apply Bayes' rule to the model parameter H_j:

P(H_j | D) = P(D | H_j) P(H_j) / P(D), P(D) = Σ_j P(D | H_j) P(H_j)

(Assume the prior P(H_j) is uniformly distributed, so ranking the models by P(H_j | D) reduces to ranking them by the evidence P(D | H_j).)
Some Assumptions in this Level

P(D | H_j) = ∫∫ P(D | log μ, log ζ, H_j) P(log μ, log ζ | H_j) d log μ d log ζ

Assume P(log μ, log ζ | D, H) can be well approximated as a separable Gaussian with error variances σ²_{log μ|D} and σ²_{log ζ|D}, where

σ²_{log μ|D} ≈ 2/(γ_eff − 1), σ²_{log ζ|D} ≈ 2/(N − γ_eff), γ_eff = 1 + Σ_{i=1}^{N_eff} ζ_MAP λ_{G,i} / ( μ_MAP + ζ_MAP λ_{G,i} )

so that

P(D | H_j) ∝ P(D | log μ_MAP, log ζ_MAP, H_j) σ_{log μ|D} σ_{log ζ|D}
Last Solution for Third Level
Combining the above,

P(D | H_j) ∝ √( μ_MAP^{N_eff} ζ_MAP^N / ( (γ_eff − 1)(N − γ_eff) Π_{i=1}^{N_eff} ( μ_MAP + ζ_MAP λ_{G,i} ) ) ) exp( −J(w_MAP, b_MAP) )

We pick the index j of the model H_j which maximizes this evidence.
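A sketch of the resulting model ranking, treating the RBF width as the model H_j; here the minimized level-2 cost is used as a crude stand-in for −log P(D | H_j), omitting the σ correction factors above. Data and grids are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)

X = rng.normal(size=(40, 2))
y = np.sign(X[:, 0] + 1e-9)
N = len(y)
Mc = np.eye(N) - np.ones((N, N)) / N

def neg_log_evidence(sig2):
    d2 = ((X[:, None, :] - X[None, :, :])**2).sum(-1)
    K = np.exp(-d2 / (2 * sig2))                     # model H_j = RBF width sig2
    lam = np.linalg.eigvalsh(Mc @ K @ Mc)
    lam = lam[lam > 1e-10]
    Omega = (y[:, None] * y[None, :]) * K

    def J2(mu, zeta):
        A = np.zeros((N + 1, N + 1))
        A[0, 1:], A[1:, 0] = y, y
        A[1:, 1:] = Omega + np.eye(N) * mu / zeta    # I / gamma with gamma = zeta/mu
        alpha = np.linalg.solve(A, np.concatenate([[0.0], np.ones(N)]))[1:]
        E_W = 0.5 * alpha @ Omega @ alpha
        E_D = 0.5 * ((alpha * mu / zeta)**2).sum()
        return (mu * E_W + zeta * E_D + 0.5 * np.log(mu + zeta * lam).sum()
                - 0.5 * len(lam) * np.log(mu) - 0.5 * N * np.log(zeta))

    grid = [(m, z) for m in np.logspace(-2, 2, 7) for z in np.logspace(-2, 2, 7)]
    return min(J2(m, z) for m, z in grid)            # best level-2 cost for this model

widths = [0.1, 0.5, 1.0, 5.0, 10.0]                  # candidate models, as in the plots
print("selected sig2:", min(widths, key=neg_log_evidence))
```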
Some Comments for this Level
- For a Gaussian kernel machine, the variance of the Gaussian function can represent the model H.
- It is impossible to calculate the evidence for all possible models.
- Luckily, in general (e.g., for the Gaussian kernel SVM), the performance of the classifier is quite smooth with respect to variations of the model parameter. Therefore, we can simply sample the model in the region of interest.
A Synchronous CDMA Transmitter
[Diagram: K users' information bits b_k[i] each pass through spreading coding & modulation; the modulated signals are summed synchronously and sent over an AWGN channel, producing the received signal y(t), which is then demodulated.]
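A minimal simulation of this transmitter front end (random ±1 signatures and illustrative sizes; the SNR convention is one common choice):

```python
import numpy as np

rng = np.random.default_rng(3)

K, L, n_sym = 7, 31, 200                      # users, spreading length, symbols
S = np.sign(rng.normal(size=(K, L)))          # random +-1 signature sequences
A = np.ones(K); A[1:] = 5.0                   # amplitudes, the A_i/A_1 = 5 scenario
b = np.sign(rng.normal(size=(K, n_sym)))      # information bits b_k[i] in {-1,+1}

snr_db = 8.0
sigma = np.sqrt(L * 10**(-snr_db / 10))       # matched-filter SNR for user 1 = snr_db

# Received chips per symbol interval: y[i] = sum_k A_k b_k[i] s_k + noise
Y = (A[:, None] * b).T @ S + sigma * rng.normal(size=(n_sym, L))

# Matched-filter bank front end; MAI from the strong interferers dominates user 1,
# which is what motivates multiuser detection.
Z = Y @ S.T / L
print("single-user matched-filter BER (user 1):", np.mean(np.sign(Z[:, 0]) != b[0]))
```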
The LS SVM Receiver Diagram
[Diagram: the received signal y(t) passes through a bank of matched filters (one per user); the matched-filter outputs Y[i] span the user space and feed an LS SVM network with parameters (α, b) and its hyperparameters, producing the bit estimates b̂_k[i].]
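Combining the two sketches above: matched-filter outputs serve as features for an LS SVM that detects user 1's bits. All hyperparameter values here are illustrative; in the deck they are chosen by the Bayesian inference levels:

```python
import numpy as np

rng = np.random.default_rng(4)

K, L, n_train, n_test = 7, 31, 200, 500
S = np.sign(rng.normal(size=(K, L)))          # signatures
A = np.ones(K); A[1:] = 5.0                   # near-far scenario, A_i/A_1 = 5
sigma = np.sqrt(L * 10**(-8.0 / 10))          # 8 dB (same convention as above)

def batch(n):
    b = np.sign(rng.normal(size=(K, n)))
    Y = (A[:, None] * b).T @ S + sigma * rng.normal(size=(n, L))
    return Y @ S.T / L, b[0]                  # matched-filter features, user-1 bits

Ztr, btr = batch(n_train)
Zte, bte = batch(n_test)

def rbf(P, Q, sig2=5.0):                      # Gaussian kernel, width illustrative
    return np.exp(-((P[:, None, :] - Q[None, :, :])**2).sum(-1) / (2 * sig2))

gamma = 10.0                                  # regularization, illustrative
Omega = (btr[:, None] * btr[None, :]) * rbf(Ztr, Ztr)
Asys = np.zeros((n_train + 1, n_train + 1))
Asys[0, 1:], Asys[1:, 0] = btr, btr
Asys[1:, 1:] = Omega + np.eye(n_train) / gamma
sol = np.linalg.solve(Asys, np.concatenate([[0.0], np.ones(n_train)]))
bhat = np.sign(rbf(Zte, Ztr) @ (sol[1:] * btr) + sol[0])
print("LS SVM detector BER (user 1):", np.mean(bhat != bte))
```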
Results and Discussions
First Inference
[Plot: Performance of First Level Result. PER vs. SNR (dB) from 0 to 12, log scale 10⁻⁴ to 10⁰. Asterisk: revised LS SVM; circle: LS SVM. 7 users, ρ = 0.429, 200 training points, A_i/A_1 = 5.]
Second Inference
[Plot: Second Inference Plot. J₂ vs. log10(C) from −1 to 5, for variance 1 and variance 10. Single-user AWGN channel at 0 dB.]
Third Inference (Plot 1)
[Plot: Third Inference Plot. J₃ vs. log10(C) from −1 to 5, for variances 0.5, 1, 5, 10, and 20. Single-user AWGN channel, SNR = 0 dB, 100 training points.]
Third Inference (Plot 2)
[Plot: Third Inference Plot. J₃ vs. log10(C) from 0 to 6, for variances 0.1, 0.5, 1, 5, and 10. Single-user AWGN channel, SNR = 8 dB, 100 training points.]
A Sample of Chosen Parameters

SNR (dB)   Variance   1/C
0          3.98       0.11
2          2.49       0.49
4          3.98       0.28
6          5.01       0.39
8          5.01       0.28
10         10.0       0.34
12         12.6       0.08
Detector Performance
[Plot: Performance Comparison of Gaussian LS SVM Detector. PER vs. SNR (dB) from 0 to 12, log scale 10⁻⁴ to 10⁰. Circle: basic Gaussian LS SVM; asterisk: 1st inference applied; square: 2nd & 3rd inference applied; diamond: MMSE. 7 users, ρ = 0.429, A_i/A_1 = 5, 200 training points.]
Some Discussions on this Detector
- The first inference improves the performance of the LS SVM detector, especially in the high-SNR region, by accounting for the bias term.
- The LS SVM detector is very smooth with respect to variations of the hyperparameters, which means an adaptive LS SVM will work reasonably well as long as the channel properties do not vary too quickly.
- The computations for the 2nd and 3rd levels of inference are very complex, so the exact calculation is not worthwhile here; approximation formulas can be chosen instead.
Summary of Bayesian Network
- Pick a basic neural network.
- Properly choose the priors (physically reasonable and convenient for theoretical derivation).
- Find a reasonable hierarchical framework (a three-level inference framework is very typical), apply Bayes' rule there, and find beneficial assumptions to simplify the problem.
Some Comments on Bayesian Framework
- It can help us physically understand a neural network model.
- It can theoretically help us find ways to optimize the parameters and, more importantly, the hyperparameters, which can sometimes be impossible to set otherwise.
- It can even complement existing methods in some given problems.
References
- T. Van Gestel and J. A. K. Suykens, "A Bayesian Framework for Least Squares Support Vector Machine Classifiers."
- N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines, 2000.
- R. M. Neal, Bayesian Learning for Neural Networks, 1996.
- S. Verdú, Multiuser Detection.