Locally averaged Bayesian Dirichlet metrics
A. Cano, M. Gomez-Olmedo, A. R. Masegosa and S. Moral
Department of Computer Science and Artificial Intelligence
University of Granada (Spain)
Belfast, July 2011
European Conference on Symbolic and Quantitative Approaches to Reasoning under Uncertainty
ECSQARU 2011 Belfast (UK) 1/30
Outline
1 Introduction
2 Bayesian Dirichlet Metrics
3 Locally Averaged Bayesian Dirichlet Metrics
4 Experimental Evaluation
5 Conclusions & Future Works
Introduction
Part I
Introduction
Introduction
Bayesian Networks
Excellent models to graphically represent the dependency structure of the underlying distribution in multivariate domains.
This dependency structure in a multivariate problem domain represents a very relevant source of knowledge (direct interactions, conditional independencies, ...).
Introduction
Learning Bayesian Networks from Data
Learning Algorithms
Constraint-based learning, relying on hypothesis tests (e.g., the PC algorithm).
Score+Search methods, which employ a search algorithm guided by a score function.
The model with the highest score is selected.
Introduction
Bayesian Score Metrics
Marginal Likelihood of the data
P(D|G) = ∫ P(D|θ,G) P(θ|G) dθ
Bayesian Dirichlet Equivalent Metric (BDe)
It satisfies the likelihood equivalence property.
A global Dirichlet distribution is assumed in order to guarantee the likelihood equivalence property.
The parametrization depends on the equivalent sample size (ESS) parameter.
score(G : D) = ∏_i ∏_{j=1}^{|U_i|} [ Γ(ESS/|U_i|) / Γ(ESS/|U_i| + N_ij) ] ∏_{k=1}^{|X_i|} [ Γ(ESS/(|U_i| |X_i|) + N_ijk) / Γ(ESS/(|U_i| |X_i|)) ]
Introduction
Sensitivity to ESS parameter
Experimental Evaluations [Silander et al. 2007]
The global MAP BN was computed with an exhaustive search-based algorithm for 20 UCI data sets.
They found that different ESS values lead to different optimal BN models.
For some data sets (e.g., the Yeast database) the optimal BN model monotonically goes from the empty to the fully connected graph.
[Figure: number of arcs in the optimal BN vs. the ESS value]
Introduction
Our approach
Solution: Marginalizing the ESS parameter
As first suggested in [Silander et al. 2007], a possible solution is to employ a Bayesian approach:
Assume a prior distribution on the ESS parameter and marginalize it out.
Locally Averaged Bayesian Dirichlet Metrics
It is based on a local averaging approach to marginalize the ESS parameter.
We experimentally justify that this approach is superior:
It is able to adapt to more complex parameter spaces.
This approach removes the sensitivity of the Bayesian Dirichlet metric to the ESS parameter.
Bayesian Dirichlet Metrics
Part II
Bayesian Dirichlet Metrics
Bayesian Dirichlet Metrics
Notation
Let X = (X1, ..., Xn) be a set of n multinomial random variables.
|Xi| is the number of values of Xi.
We also assume a fully observed multinomial data set D.
A Bayesian Network B can be described by:
G, a directed acyclic graph: G = (Pa(X1), ..., Pa(Xn)).
θG, a set of parameter vectors: P(Xi | Pa(Xi) = j) = θij.
Bayesian Dirichlet Metrics
Bayesian Dirichlet equivalent metric
Marginal Likelihood of a graph structure:
P(D|G) = ∫ P(D|θ,G) P(θ|G) dθ
It is computed under the following assumptions:
Complete labelled training data.
The prior distributions over the parameters are Dirichlet distributions: θij ∼ Dirichlet(αij1, ..., αij|Xi|).
Parameters are globally and locally independent.
scoreBDeu(G|D) = ∏_{i=1}^{n} ∏_{j=1}^{|PaG(Xi)|} [ Γ(αij) / Γ(αij + Nij) ] ∏_{k=1}^{|Xi|} [ Γ(αijk + Nijk) / Γ(αijk) ]
The BDe metric sets the alpha values as follows, in order to guarantee the likelihood equivalence property:
αijk = S / (|Xi| |Pa(Xi)|)
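As an illustration, the BDeu score above decomposes into one factor per variable, and each factor is easiest to compute in log space with the log-gamma function. The sketch below is a minimal rendering of that local factor; the function name and the count layout are assumptions, not part of the paper:

```python
import math

def bdeu_local_score(counts, r_i, q_i, ess):
    """Log of the local BDeu factor for one variable Xi.

    counts[j][k] holds N_ijk, the occurrences of Xi = k under the
    j-th parent configuration; r_i = |Xi|, q_i is the number of
    parent configurations, ess is the equivalent sample size S.
    """
    a_ij = ess / q_i            # Dirichlet mass per parent configuration
    a_ijk = ess / (q_i * r_i)   # Dirichlet mass per cell
    log_score = 0.0
    for j in range(q_i):
        n_ij = sum(counts[j])
        log_score += math.lgamma(a_ij) - math.lgamma(a_ij + n_ij)
        for k in range(r_i):
            log_score += math.lgamma(a_ijk + counts[j][k]) - math.lgamma(a_ijk)
    return log_score
```

The full score of a graph is then the sum of these local log factors over all variables, which is what makes the metric decomposable for greedy search.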
Bayesian Dirichlet Metrics
Sensitivity to the ESS
The problem is that the αijk values become exponentially small with both the number and the cardinality of the parents: αijk = S / (|Xi| |Pa(Xi)|).
[Prior densities: Beta(1,1), Beta(0.5, 0.5), Beta(0.25, 0.25), Beta(0.125, 0.125)]
[Steck & Jaakkola 2002; Steck 2008; Ueno 2010]: small αijk values tend to favor the absence of an edge Y → X over its presence (even if the variables are not conditionally independent).
Especially if the empirical P̂(X|Y) is not very extreme (it does not match the prior assumptions).
Bayesian Dirichlet Metrics
Sensitivity to the ESS
If we increase the S value, we implicitly assume that the marginal distributions P(Xi) = θi are very symmetrical.
[Prior densities: Beta(1,1), Beta(2, 2), Beta(4, 4), Beta(8, 8)]
[Steck & Jaakkola 2002; Steck 2008; Ueno 2010]: larger S values tend to favor the presence of an edge Y → X over its absence (even if the variables are conditionally independent).
Especially if there is notable skewness in both marginal distributions P(X|PaX) and P(Y|PaY).
Locally Averaged Bayesian Dirichlet Metrics
Part III
Locally Averaged Bayesian Dirichlet Metrics
Locally Averaged Bayesian Dirichlet Metrics
Globally Averaged Bayesian Dirichlet Metrics
[Silander et al. 2007]: a Bayesian solution to the problem of selecting an optimal ESS:
Consider S as a random variable, place a prior on S and marginalize it out.
P(D|G) = ∫ P(D|G, s) P(s|G) ds
where P(D|G, s) is the classic marginal likelihood, which depends on the equivalent sample size.
It is assumed that P(S|G) is uniform, and the integral is approximated by a simple averaging method:
P(D|G) = (1/|S|) ∑_{s∈S} ∏_i ∏_{j=1}^{|U_i|} [ Γ(s/|U_i|) / Γ(s/|U_i| + N_ij) ] ∏_{k=1}^{|X_i|} [ Γ(s/(|U_i| |X_i|) + N_ijk) / Γ(s/(|U_i| |X_i|)) ]
where S is a finite set of candidate values for S.
It satisfies the likelihood equivalence property, but it is not locally decomposable.
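A minimal sketch of this globally averaged metric, reusing the same count layout as before (all function names are assumptions). Because the average is over likelihoods, not log-likelihoods, the log-sum-exp trick is needed to compute it stably:

```python
import math

def log_bdeu(counts, card, parents_card, s):
    """Log BDeu marginal likelihood of a fixed structure for one ESS value s.

    counts[i][j][k] = N_ijk; card[i] = |Xi|;
    parents_card[i] = number of parent configurations |Ui|.
    """
    total = 0.0
    for i, c_i in enumerate(counts):
        q, r = parents_card[i], card[i]
        a_ij, a_ijk = s / q, s / (q * r)
        for j in range(q):
            n_ij = sum(c_i[j])
            total += math.lgamma(a_ij) - math.lgamma(a_ij + n_ij)
            for k in range(r):
                total += math.lgamma(a_ijk + c_i[j][k]) - math.lgamma(a_ijk)
    return total

def log_global_avg_bd(counts, card, parents_card, s_values):
    """Log of (1/|S|) * sum over s of the BDeu likelihood (log-sum-exp)."""
    logs = [log_bdeu(counts, card, parents_card, s) for s in s_values]
    m = max(logs)
    return m + math.log(sum(math.exp(v - m) for v in logs)) - math.log(len(logs))
```

Note that the sum over s sits outside the product over variables, which is exactly why this global variant is not locally decomposable.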
Locally Averaged Bayesian Dirichlet Metrics
Sensitivity to the ESS
A toy example:
Z and Y have very skewed marginal distributions.
P(X|Z) is not notably far from uniform.
We generate 1000 data samples.
We evaluate the BN with the highest score.
Locally Averaged Bayesian Dirichlet Metrics
Globally Averaged Bayesian Dirichlet Metrics
Different averaging sets SL were tested:
S1 = {0.5, 1, 2}, S2 = {0.25, 0.5, 1, 2, 4}, ..., S10 = {2^-10, 2^-9, ..., 2^9, 2^10}.
S << 1 (very skewed), S < 1 (skewed), S ≈ 1 (uniform), S >> 1 (strongly uniform).
Results
It always retrieves the empty graph, without any edge.
Reasons:
We assume a global distribution (strongly uniform, uniform, skewed or very skewed) for all parameters at the same time.
This assumption does not fit the parameter space of this Bayesian network.
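The averaging sets SL above follow a simple pattern: powers of two from 2^-L up to 2^L. Under that reading, a one-line helper to build them could be:

```python
def averaging_set(L):
    """Candidate ESS values S_L = {2**i : -L <= i <= L}, matching
    S1 = {0.5, 1, 2} up to S10 = {2**-10, ..., 2**10}."""
    return [2.0 ** i for i in range(-L, L + 1)]
```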
Locally Averaged Bayesian Dirichlet Metrics
Locally Averaged Bayesian Dirichlet Metrics
Locally Averaged Bayesian Dirichlet Metrics
The marginalization of the parameter S is carried out locally:
We assume that each parameter vector θij is drawn from a different Dirichlet distribution, and that the corresponding S parameters are independent.
P(D|G) = ∏_i ∏_{j=1}^{|Pa(X_i)|} (1/|S|) ∑_{s∈S} [ Γ(s/|Pa(X_i)|) / Γ(s/|Pa(X_i)| + N_ij) ] ∏_{k=1}^{|X_i|} [ Γ(s/(|Pa(X_i)| |X_i|) + N_ijk) / Γ(s/(|Pa(X_i)| |X_i|)) ]
where S is a finite set of candidate values for S.
The metric is now locally decomposable, but it loses the likelihood equivalence property.
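A sketch of the locally averaged score for a single variable, under the reading that the average over S is taken separately for each parent configuration (names and data layout are assumptions). As before, averaging likelihoods in log space requires log-sum-exp:

```python
import math

def local_avg_bd_score(counts, r_i, q_i, s_values):
    """Log of the locally averaged BD factor for one variable Xi.

    For each parent configuration j, average its BDeu factor over
    the candidate ESS values in s_values, then accumulate the log.
    counts[j][k] = N_ijk, r_i = |Xi|, q_i = parent configurations.
    """
    log_score = 0.0
    for j in range(q_i):
        n_ij = sum(counts[j])
        term_logs = []
        for s in s_values:
            a_ij, a_ijk = s / q_i, s / (q_i * r_i)
            t = math.lgamma(a_ij) - math.lgamma(a_ij + n_ij)
            for k in range(r_i):
                t += math.lgamma(a_ijk + counts[j][k]) - math.lgamma(a_ijk)
            term_logs.append(t)
        m = max(term_logs)  # log-sum-exp over the |S| candidate values
        log_score += m + math.log(sum(math.exp(t - m) for t in term_logs))
        log_score -= math.log(len(s_values))
    return log_score
```

Because the averaging happens inside the product over (i, j), the overall score is still a sum of per-variable terms, so greedy structure search works unchanged.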
Locally Averaged Bayesian Dirichlet Metrics
Locally Averaged Bayesian Dirichlet Metrics
Different averaging sets SL were tested:
S1 = {0.5, 1, 2}, S2 = {0.25, 0.5, 1, 2, 4}, ..., S10 = {2^-10, 2^-9, ..., 2^9, 2^10}.
S << 1 (very skewed), S < 1 (skewed), S ≈ 1 (uniform), S >> 1 (strongly uniform).
Results
When L ≥ 5 we always retrieve the right graph.
We assume that each parameter vector follows a different Dirichlet distribution (strongly uniform, uniform, skewed or very skewed), independent of the rest of the parameters.
This assumption allows the metric to fit much more complex parameter spaces.
Experimental Evaluation
Part IV
Experimental Evaluation
Experimental Evaluation
Experimental Set-up
Bayesian Networks:
alarm (37 nodes), boblo (23 nodes), boerlage-92 (23 nodes), hailfinder (56 nodes), insurance (27 nodes).
Data Sets:
We run the algorithms 10 times with 1000 data samples (other sample sizes were also evaluated).
Evaluation Measures:
Number of missing/extra links, Kullback-Leibler distance, ...
Algorithms:
A greedy search algorithm is used, assuming we are given a correct topological order of the variables.
Different SL sets are used to perform the averaging: L = 1, ..., 10 (displayed on the x-axis).
Experimental Evaluation
BDe with different S values I
[Figure: missing + extra links (left) and KL distance (right) vs. the log of the S value (−6 to 8), for Alarm, Boblo, Boerlage, Hailfinder and Insurance]
Analysis
The BDe metric is very sensitive to the S values in some domain problems.
There is an optimal S value which is different for each problem.
Experimental Evaluation
BDe with different S values II
[Figure: missing links (left) and extra links (right) vs. the log of the S value (−6 to 8), for Alarm, Boblo, Boerlage, Hailfinder and Insurance]
Analysis
The theoretically predicted tendencies appear:
Higher S values have a tendency to add edges.
Lower S values have a tendency to remove edges.
Experimental Evaluation
Locally Averaged Bayesian Dirichlet metrics
[Figure: missing + extra links (left) and KL distance (right) vs. the L value (1 to 10), for Alarm, Boblo, Boerlage, Hailfinder and Insurance]
Analysis
The higher the L value, the wider the set of averaged S values.
In some domains, the error measures improve with the size of the averaged S set.
In the other domains, the error does not improve, but it does not get worse either.
Experimental Evaluation
Globally Averaged Bayesian Dirichlet metrics
[Figure: missing + extra links (left) and KL distance (right) vs. the L value (1 to 10), for Alarm, Boblo, Boerlage, Hailfinder and Insurance]
Analysis
Similar behavior to locally averaged metrics.
Experimental Evaluation
Globally vs Locally Averaged Bayesian Dirichlet metrics
Global-AvBD error minus Local-AvBD error
[Figure: difference in missing + extra links vs. the L value (1 to 10), for Alarm, Boblo, Boerlage, Hailfinder and Insurance]
Analysis
In Alarm, Boblo and Boerlage, there are hardly any differences between them.
In Hailfinder and Insurance, the Local-AvBD metric performs better.
The performance depends on the complexity of the parameter space.
Experimental Evaluation
BDe metric vs Locally Averaged Bayesian Dirichlet metrics
BD error minus Local-AvBD error
[Figure: difference in missing + extra links vs. the L value (1 to 10), for Alarm, Boblo, Boerlage, Hailfinder and Insurance]
Analysis
For the BD metric, we select the model with the lowest error over all S values in the set SL.
The Local-AvBD metric performs at least as well as the BD metric with an optimal S value.
In some domains (Hailfinder and Insurance), the Local-AvBD metric yields better inferences.
Conclusions and Future Works
Part V
Conclusions and Future Works
Conclusions and Future Works
Conclusions and Future Works
Conclusions
Locally Averaged Bayesian Dirichlet metrics robustly infer more accurate models than the BDe metric with an optimal selection of the ESS parameter.
They are able to adapt to complex parameter spaces.
This metric is valuable for knowledge discovery tasks: the inferences do not depend on any free parameter, yet it matches the performance of an optimal solution.
Future Works
Extend this method to the parameter estimation of a BN model:
P(Xi = k | Pa(Xi) = j) = (N_ijk + S/(|Xi| |Pa(Xi)|)) / (N_ij + S/|Pa(Xi)|)
Conclusions and Future Works
Thanks for your attention!