-
Intro BR, CC and LBP BN MBC Appl Concl Ref
MULTI-DIMENSIONAL BAYESIAN NETWORK CLASSIFIERS
Pedro Larrañaga
Computational Intelligence Group, Artificial Intelligence Department
Technical University of Madrid
EAIA 2018 - DS4BD. Advanced School on Data Science for Big Data. Porto, July 4, 2018
Pedro Larrañaga Multi-dimensional Bayesian network classifiers
-
Outline
1 Multi-label classification and multi-dimensional classification
2 Binary relevance, classifier chain and label power set
3 Bayesian networks
4 Multi-dimensional Bayesian network classifiers
5 Applications
6 Conclusions
7 References
-
Simultaneous object recognition in images (multi-label)
-
Medical diagnosis (multi-label)
-
Multiple fault diagnosis (multi-label)
-
Single label classification versus multi-label classification
X1   X2   X3   X4   X5   C
3.2  1.4  4.7  7.5  3.7  1
2.8  6.3  1.6  4.7  2.7  0
7.7  6.2  4.1  3.3  7.7  1
9.2  0.4  2.8  0.5  3.9  0
5.5  5.3  4.9  0.6  6.6  1
Single label classification
X1   X2   X3   X4   X5   C1 C2 C3 C4
3.2  1.4  4.7  7.5  3.7  1  0  1  1
2.8  6.3  1.6  4.7  2.7  0  0  1  0
7.7  6.2  4.1  3.3  7.7  1  0  1  1
9.2  0.4  2.8  0.5  3.9  0  1  0  0
5.5  5.3  4.9  0.6  6.6  1  1  0  1
Multi-label classification
-
Multi-label classification vs multi-dimensional classification
X1   X2   X3   X4   X5   C1 C2 C3 C4
3.2  1.4  4.7  7.5  3.7  1  0  1  1
2.8  6.3  1.6  4.7  2.7  0  0  1  0
7.7  6.2  4.1  3.3  7.7  1  0  1  1
9.2  0.4  2.8  0.5  3.9  0  1  0  0
5.5  5.3  4.9  0.6  6.6  1  1  0  1
Multi-label classification
X1   X2   X3   X4   X5   C1 C2 C3 C4
3.2  1.4  4.7  7.5  3.7  1  0  2  4
2.8  6.3  1.6  4.7  2.7  0  0  1  0
7.7  6.2  4.1  3.3  7.7  3  0  2  1
9.2  0.4  2.8  0.5  3.9  2  1  0  2
5.5  5.3  4.9  0.6  6.6  3  1  0  3
Multi-dimensional classification
-
Neuronal cell-type classification (multi-dimensional)
X1, ...,Xn are morphological variables
C1, ..., Cd:
Animal species: rat, mouse, monkey, cat, human, ...
Neuronal cell type: pyramidal, interneuron, Purkinje, Martinotti, ...
Brain region: amygdala, cerebral cortex, hippocampus, ...
Age of the animal: young, adult
-
Dow Jones companies stock market values prediction (multi-dimensional + time)
X1, ...,Xn are the stock market values of the companies during the last week
C1, ..., Cd:
Company 1: [-5%, -2%), [-2%, -1%), [-1%, 0%), [0%, 1%), [1%, 2%), [2%, 5%)
Company 2: [-5%, -2%), [-2%, -1%), [-1%, 0%), [0%, 1%), [1%, 2%), [2%, 5%)
Company 3: [-5%, -2%), [-2%, -1%), [-1%, 0%), [0%, 1%), [1%, 2%), [2%, 5%)
Company 4: [-5%, -2%), [-2%, -1%), [-1%, 0%), [0%, 1%), [1%, 2%), [2%, 5%)
-
Weather forecast (multi-dimensional + time + spatial)
X1, ..., Xn are variables extracted from meteorological stations: thermometer, barometer, hygrometer, anemometer, wind vane, rain gauge, disdrometer, transmissometer, ceiling projector, ...
C1, ..., Cd:
Temperature: [-10°, 0°), [0°, 10°), [10°, 25°), [25°, 40°)
Relative humidity: [0, 30), [30, 60), [60, 80), [80, 100)
Rain: light rain (precipitation rate < 2.5 mm/h), moderate rain (2.5 mm/h - 10 mm/h), heavy rain (10 mm/h - 50 mm/h), violent rain (> 50 mm/h)
Wind: calm (< 1 km/h), light air (1 km/h - 5 km/h), ..., storm (103 km/h - 117 km/h), hurricane (> 117 km/h)
-
Learning multi-label classification models from data
A multi-label (multi-dimensional) data set is D = {(x(1), c(1)), ..., (x(N), c(N))}, where x(i) ∈ ΩX = ∏_{j=1}^{n} ΩXj and c(i) ∈ ΩC = ∏_{k=1}^{d} ΩCk
The learning task for a multi-label (multi-dimensional) classification paradigm is to output a function φ:

φ : ΩX ≡ ΩX1 × · · · × ΩXn → ΩC ≡ ΩC1 × · · · × ΩCd
    x ≡ (x1, ..., xn) ↦ c ≡ (c1, ..., cd)
Reviews on this topic: Tsoumakas and Katakis (2007), Zhang and Zhou (2014), and Gibaja and Ventura (2015)
-
Overview of learning methods for multi-label classification
Problem transformation methods
They transform the learning task into one or more single-label classification tasks. They are algorithm independent.
(a) Transform to binary classification: binary relevance (Godbole and Sarawagi, 2004), classifier chains (Read et al., 2011)
(b) Transform to multiclass classification: label powerset (Boutell et al., 2004), RAKEL (Tsoumakas et al., 2010)
(c) Identifying label dependencies: correlation-based pruning (Tsoumakas et al., 2009), LBPR algorithm (Tenenboim et al., 2010)
Algorithm adaptation methods
They extend specific learning algorithms in order to handle multi-label data directly.
Examples: classification trees (Clare and King, 2001), neural networks (Zhang and Zhou, 2006), k-nearest neighbors (Zhang and Zhou, 2007), support vector machines (Elisseeff and Weston, 2001), random forests (Madjarov et al., 2012), Bayesian networks (van der Gaag and de Waal, 2006)
-
Mean accuracy and exact match for multi-label classification
xi   C1 C2 C3 C4 C5   Ĉ1 Ĉ2 Ĉ3 Ĉ4 Ĉ5
x1   1  0  1  0  0    1  0  0  1  0
x2   0  1  0  1  0    0  1  0  1  0
x3   1  0  0  1  0    1  0  0  1  0
x4   0  1  1  0  0    0  1  0  0  0
x5   1  0  0  0  0    1  0  0  1  0
Mean accuracy computes the mean of the accuracies in each of the d class variables:

Mean accuracy(φ) = (1/d) ∑_{j=1}^{d} (1/N) ∑_{i=1}^{N} I(ĉj^i = cj^i)

where I(true) = 1 and I(false) = 0. In the table: Mean accuracy(φ) = (1/5)(1 + 1 + 0.6 + 0.6 + 1) = 0.84
Exact match computes the fraction of correctly classified instances. An instance is correctly classified if the binary vector containing the values of each binary class variable coincides with the binary vector containing their predictions:

Exact match(φ) = (1/N) ∑_{i=1}^{N} I(ĉi = ci)

In the table: Exact match(φ) = (1/5)(0 + 1 + 1 + 0 + 0) = 0.4
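As a sketch, both measures can be computed in a few lines of Python; the true and predicted label matrices below reproduce the table of this slide.

```python
# Mean accuracy and exact match on the table of this slide.
# Rows are instances x1..x5; columns are class variables C1..C5.
true = [[1, 0, 1, 0, 0],
        [0, 1, 0, 1, 0],
        [1, 0, 0, 1, 0],
        [0, 1, 1, 0, 0],
        [1, 0, 0, 0, 0]]
pred = [[1, 0, 0, 1, 0],
        [0, 1, 0, 1, 0],
        [1, 0, 0, 1, 0],
        [0, 1, 0, 0, 0],
        [1, 0, 0, 1, 0]]

def mean_accuracy(true, pred):
    """Average over the d class variables of the per-variable accuracy."""
    n, d = len(true), len(true[0])
    return sum(sum(t[j] == p[j] for t, p in zip(true, pred)) / n
               for j in range(d)) / d

def exact_match(true, pred):
    """Fraction of instances whose whole label vector is predicted exactly."""
    return sum(t == p for t, p in zip(true, pred)) / len(true)

print(mean_accuracy(true, pred))  # 0.84, as on the slide (up to float rounding)
print(exact_match(true, pred))    # 0.4
```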
-
Binary relevance (Godbole and Sarawagi, 2004)
X     C1 C2 C3 C4
x(1)  1  0  0  1
x(2)  0  0  1  1
x(3)  1  0  0  1
x(4)  0  1  0  0
x(5)  1  1  1  0

X     C1 | X     C2 | X     C3 | X     C4
x(1)  1  | x(1)  0  | x(1)  0  | x(1)  1
x(2)  0  | x(2)  0  | x(2)  1  | x(2)  1
x(3)  1  | x(3)  0  | x(3)  0  | x(3)  1
x(4)  0  | x(4)  1  | x(4)  0  | x(4)  0
x(5)  1  | x(5)  1  | x(5)  1  | x(5)  0
Learns one binary classifier for each label independently of the rest of labels
Outputs the concatenation of their predictions
Does not consider label relationships
Any supervised classification method can be used, e.g., Bayesian network-based classifiers
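A minimal binary relevance sketch. The base learner here is a toy nearest-centroid classifier, a hypothetical stand-in for any supervised method; the feature values reuse the first two columns of the earlier table, and the query point is illustrative.

```python
# Binary relevance: one independent binary classifier per label,
# predictions concatenated. The nearest-centroid base learner and the
# query point are illustrative assumptions, not part of the slide.

def centroid_fit(X, y):
    """Per-class feature means for a binary target y in {0, 1}."""
    cents = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        if rows:  # a value may be absent in some label column
            cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents

def centroid_predict(cents, x):
    """Class whose centroid is closest in squared Euclidean distance."""
    return min(cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, cents[c])))

def br_fit(X, Y):
    """Y is an N x d 0/1 label matrix; fit one classifier per column."""
    return [centroid_fit(X, [row[j] for row in Y]) for j in range(len(Y[0]))]

def br_predict(models, x):
    """Concatenate the d independent binary predictions."""
    return [centroid_predict(m, x) for m in models]

X = [[3.2, 1.4], [2.8, 6.3], [7.7, 6.2], [9.2, 0.4], [5.5, 5.3]]
Y = [[1, 0, 0, 1], [0, 0, 1, 1], [1, 0, 0, 1], [0, 1, 0, 0], [1, 1, 1, 0]]
models = br_fit(X, Y)
print(br_predict(models, [3.0, 1.5]))
```

Swapping the centroid learner for any other classifier only requires replacing the two `centroid_*` functions.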
-
Binary relevance with Bayesian network classifiers
c* = arg max_c p(c|x) = arg max_c p(x, c)
[Figure: Taxonomy of discrete Bayesian network classifiers according to the factorization of p(x, c) (Bielza and Larrañaga (2014)): naive Bayes, selective naive Bayes, semi-naive Bayes, ODE, TAN, SPODE, k-DB, BAN, Markov blanket-based, unrestricted, and Bayesian multinet. The factorizations in the figure range from p(c)p(x1, ..., xn|c), through p(c|pa(c)) ∏_{i=1}^{n} p(xi|pa(xi)), to the class-specific p(c) ∏_{i=1}^{n} p(xi|pac(xi)), all used to compute p(c|x1, ..., xn) ∝ p(c, x1, ..., xn).]
Decision boundary for discrete Bayesian network classifiers (Varando et al.,2015)
-
Binary relevance with Bayesian network classifiers
Naive Bayes (Minsky, 1961)
p(c|x) ∝ p(c) ∏_{i=1}^{n} p(xi|c)
A naive Bayes structure from which p(c|x) ∝ p(c)p(x1|c)p(x2|c)p(x3|c)p(x4|c)p(x5|c)
-
Binary relevance with Bayesian network classifiers
Selective naive Bayes (Langley and Sage, 1994)
p(c|x) ∝ p(c|xF) ∝ p(c) ∏_{i∈F} p(xi|c)

XF denotes the projection of X onto the selected feature subset F ⊆ {1, 2, ..., n}
A selective naive Bayes structure from which p(c|x) ∝ p(c)p(x1|c)p(x2|c)p(x4|c). The variables in the shaded nodes have not been selected
Feature subset selection can be filter-based (univariate or multivariate), using information theory measures such as mutual information, or wrapper-based
For multivariate filter and wrapper approaches, a heuristic search (best first, floating search, simulated annealing, tabu search, genetic algorithms, estimation of distribution algorithms) in the huge space of cardinality 2^n must be carried out
See Saeys et al. (2007) for a review of feature subset selection
-
Binary relevance with Bayesian network classifiers
One-dependence estimators (ODEs). TAN (Friedman et al. 1997)
One-dependence estimators generalize naive Bayes: each predictor variable is allowed to depend on at most one other predictor in addition to the class. For the tree-augmented naive Bayes (TAN):

p(c|x) ∝ p(c) p(xr|c) ∏_{i=1, i≠r}^{n} p(xi|c, xj(i))

where Xr denotes the root node and {Xj(i)} = Pa(Xi) \ {C}, for any i ≠ r
(a) A TAN structure, whose root node is X3, from which p(c|x) ∝ p(c)p(x1|c, x2)p(x2|c, x3)p(x3|c)p(x4|c, x3)p(x5|c, x4); (b) a selective TAN (Blanco et al., 2005), from which p(c|x) ∝ p(c)p(x2|c, x3)p(x3|c)p(x4|c, x3)
-
Binary relevance with Bayesian network classifiers
One-dependence estimators (ODEs). TAN (Friedman et al. 1997)
Algorithm 1: Learning a TAN structure
Input: A data set D = {(x1, c1), ..., (xN, cN)} with X = (X1, ..., Xn)
Output: A TAN structure
1. For i < j, i, j = 1, ..., n, compute MI(Xi, Xj|C) = ∑_{i,j,r} p(xi, xj, cr) log [p(xi, xj|cr) / (p(xi|cr) p(xj|cr))]
2. Build a complete undirected graph whose nodes are X1, ..., Xn. Annotate the weight of the edge connecting Xi and Xj with MI(Xi, Xj|C)
3. Build a maximum weighted spanning tree:
   3a. Select the two edges with the heaviest weights
   3b. While the tree contains fewer than n − 1 edges: if the next heaviest edge does not form a cycle with the previously selected edges, select it; otherwise reject it and continue
4. Transform the resulting undirected tree into a directed tree by choosing a root node and setting the direction of all edges to be outward from this node
5. Construct the TAN structure by adding a node C and an arc from C to each Xi
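Steps 2-3 can be sketched with Kruskal's algorithm. The numeric weights below are hypothetical placeholders; only their order matters, and it follows the conditional mutual information ranking of the worked example in this deck.

```python
# Maximum weighted spanning tree over the predictors (steps 2-3 of the TAN
# algorithm), via Kruskal's algorithm with a union-find structure.
def max_weight_spanning_tree(nodes, weighted_edges):
    parent = {v: v for v in nodes}

    def find(a):  # union-find root with path compression
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    tree = []
    for w, i, j in sorted(weighted_edges, reverse=True):  # heaviest first
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge does not close a cycle
            parent[ri] = rj
            tree.append((i, j))
        if len(tree) == len(nodes) - 1:
            break
    return tree

# (weight, Xi, Xj): hypothetical weights encoding the ordering
# MI(X1,X3) > MI(X2,X4) > MI(X1,X2) > MI(X3,X4) > MI(X1,X4) > MI(X3,X5) > ...
edges = [(10, 1, 3), (9, 2, 4), (8, 1, 2), (7, 3, 4), (6, 1, 4),
         (5, 3, 5), (4, 1, 5), (3, 2, 3), (2, 2, 5), (1, 4, 5)]
tree = max_weight_spanning_tree([1, 2, 3, 4, 5], edges)
print(sorted(tree))  # [(1, 2), (1, 3), (2, 4), (3, 5)]
```

The rejected edges X3−X4 and X1−X4 are exactly the cycle-forming candidates of the example on the next slide.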
-
Binary relevance with Bayesian network classifiers
One-dependence estimators (ODEs). TAN (Friedman et al. 1997)
MI(X1,X3|C) > MI(X2,X4|C) > MI(X1,X2|C) > MI(X3,X4|C) > MI(X1,X4|C) >MI(X3,X5|C) > MI(X1,X5|C) > MI(X2,X3|C) > MI(X2,X5|C) > MI(X4,X5|C)
[Figure: panels (a)-(h) showing an example of TAN structure construction.] (a-c) Edges are added in decreasing order of the conditional mutual information quantities listed above. (d-e) Edges X3−X4 and X1−X4 (dashed lines) cannot be added since they would form a cycle. (f) Maximum weighted spanning tree. (g) The corresponding directed tree obtained by choosing X1 as the root node. (h) Final TAN structure.
-
Binary relevance with Bayesian network classifiers
Markov blanket-based Bayesian classifier (Koller and Sahami, 1996)
If C has parents:
p(c|x) ∝ p(c|pa(c)) ∏_{i=1}^{n} p(xi|pa(xi))
The Markov blanket of C is the only knowledge needed to predict its behavior
Bayesian classifiers based on identifying the Markov blanket of the class variable
A Markov blanket structure for C, with MB(C) = {X1, X2, X3, X4}, from which p(c|x) ∝ p(c|x2)p(x1|c)p(x2)p(x3)p(x4|c, x3)
-
Discrete Bayesian network classifiers
c* = arg max_c p(c|x) = arg max_c p(x, c)
Taxonomy of discrete Bayesian network classifiers according to the factorization of p(x, c)(Bielza and Larrañaga (2014))
-
Classifier chain with Bayesian network classifiers
Classifier chains (Read et al., 2011) overcome the label independence assumption by transforming the multi-label learning problem into a chain of binary classification problems
Each binary classifier in the chain is built upon the (probabilistic) predictions of the preceding classifiers
A classifier chain learns d functions φi on augmented input spaces, taking ĉ1, ..., ĉi−1 as additional features:

φi : ΩX × {0, 1}^{i−1} → [0, 1]
     (x, ĉ1, ..., ĉi−1) ↦ p(ci|x, ĉ1, ..., ĉi−1)
φi can be learnt using c1, ..., ci−1 instead of ĉ1, ..., ĉi−1. However, when applied to new instances, the classifier chain method necessarily uses the predictions ĉ1, ..., ĉi−1
A total order among the class variables must be defined beforehand (the result depends on that order)
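A sketch of the chain, assuming a minimal discrete naive Bayes as base classifier and a small hypothetical binary data set (both are illustrative assumptions, not part of the slide):

```python
# Classifier chain: the classifier for label i sees the original features
# plus labels 1..i-1 (true labels at training time, predictions at test time).
import math
from collections import Counter, defaultdict

def nb_fit(X, y):
    """Fit a tiny naive Bayes on rows of 0/1 features and binary labels."""
    prior = Counter(y)
    counts = defaultdict(Counter)  # (feature index, class) -> value counts
    for x, c in zip(X, y):
        for f, v in enumerate(x):
            counts[(f, c)][v] += 1
    return prior, counts, len(X)

def nb_predict(model, x):
    prior, counts, n = model
    scores = {}
    for c in (0, 1):
        s = math.log((prior[c] + 1) / (n + 2))  # Laplace-smoothed prior
        for f, v in enumerate(x):
            s += math.log((counts[(f, c)][v] + 1) / (prior[c] + 2))
        scores[c] = s
    return max(scores, key=scores.get)

def chain_fit(X, Y):
    """One model per label, trained on features augmented with c_1..c_{i-1}."""
    d = len(Y[0])
    return [nb_fit([list(x) + list(row[:i]) for x, row in zip(X, Y)],
                   [row[i] for row in Y]) for i in range(d)]

def chain_predict(models, x):
    preds = []
    for m in models:  # feed earlier predictions to later classifiers
        preds.append(nb_predict(m, list(x) + preds))
    return preds

X = [[1, 0], [0, 1], [1, 1], [1, 0], [0, 0]]
Y = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [1, 0, 1], [0, 0, 0]]
models = chain_fit(X, Y)
print(chain_predict(models, [1, 0]))
```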
-
Classifier chain with Bayesian network classifiers
Tree naive Bayesian classifier chain (TNBCC) (Sucar et al., 2014)
Only one parent per class in a chain: A TAN structure for class variables
The root of the TAN can be selected in d different ways
TNBCC uses naive Bayes as baseline classifiers
Structure of a TNBCC
Several heuristics for selecting the root node: at random, the node with the largest number of incident edges, the node with the highest mutual information, ...
A single TNBCC versus an ensemble of TNBCCs (the ensemble gives the best results)
-
Label powerset. Boutell et al. (2004)
X     C1 C2 C3 C4   C
x(1)  1  0  0  1   10
x(2)  0  0  1  1    4
x(3)  1  0  0  1   10
x(4)  0  1  0  0    5
x(5)  1  1  1  0   15
Each different set of labels becomes a different class in a new single-label classification task
Most implementations of label powerset classifiers essentially ignore label combinations that are not present in the training set (they cannot predict unseen label sets)
Limited training examples for many classes
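The C column of the table above is consistent with numbering the 2^4 possible label vectors from 1 in binary order; under that (assumed) encoding, the transformation is:

```python
# Label powerset: each distinct label vector becomes one class of a
# single-label problem. The encoding below (binary value of the vector,
# plus one) is an assumption that reproduces the C column of the table.
def powerset_class(labels):
    value = 0
    for bit in labels:          # most significant label first: C1 C2 C3 C4
        value = 2 * value + bit
    return value + 1            # classes numbered 1 .. 2^d

rows = [[1, 0, 0, 1], [0, 0, 1, 1], [1, 0, 0, 1], [0, 1, 0, 0], [1, 1, 1, 0]]
print([powerset_class(r) for r in rows])  # [10, 4, 10, 5, 15], as in the table
```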
-
Label powerset. Multiple diagnosis problem with probabilistic models. Peng and Reggia (1987a; 1987b)
              X1 ... Xm     C1 ... Cd
(x(1), c(1))  x1(1) ... xm(1)   c1(1) ... cd(1)
(x(2), c(2))  x1(2) ... xm(2)   c1(2) ... cd(2)
...
(x(N), c(N))  x1(N) ... xm(N)   c1(N) ... cd(N)
Optimal diagnosis as abductive inference: searching for the most probable explanation(MPE)
(c1*, ..., cd*) = arg max_{(c1, ..., cd)} p(C1 = c1, ..., Cd = cd | X1 = x1, ..., Xm = xm)
               = arg max_{(c1, ..., cd)} p(C1 = c1, ..., Cd = cd) p(X1 = x1, ..., Xm = xm | C1 = c1, ..., Cd = cd)

Number of parameters to be estimated: 2^d − 1 + 2^d (2^m − 1)
-
Bayesian networks (Pearl (1988) and Koller and Friedman (2009))
A Bayesian network is a compact representation of the joint probabilitydistribution (JPD) p(X1, ...,Xn)
Specifying the JPD directly requires a number of parameters that grows exponentially in n; Bayesian networks avoid this by using the concept of conditional independence between triplets of variables
Two random variables X and Y are conditionally independent (c.i.) given another random variable Z if p(x|y, z) = p(x|z) for all x, y, z. An equivalent definition is p(x, y|z) = p(x|z)p(y|z) for all x, y, z. Let Ip(X, Y|Z) denote this condition
Suppose that for each Xi there is a subset Pa(Xi) ⊆ {X1, ..., Xi−1} such that, given Pa(Xi), Xi is conditionally independent of all variables in {X1, ..., Xi−1} \ Pa(Xi), i.e., p(Xi|X1, ..., Xi−1) = p(Xi|Pa(Xi)). Then the JPD factorizes as

p(X1, ..., Xn) = p(X1) p(X2|X1) p(X3|X1, X2) · · · p(Xn|X1, ..., Xn−1) = p(X1|Pa(X1)) · · · p(Xn|Pa(Xn))
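A small sketch of the saving, for binary variables and a hypothetical chain structure X1 → X2 → ... → X5 (one free parameter p(Xi = 1|pa) per parent configuration; other counting conventions give different totals):

```python
# Parameter counts for binary variables: the full joint table needs 2^n - 1
# free parameters, while the factorized model needs one free parameter per
# parent configuration of each variable. The chain structure is illustrative.
def full_joint_params(n):
    return 2 ** n - 1

def factorized_params(parent_sets):
    return sum(2 ** len(pa) for pa in parent_sets.values())

chain = {"X1": [], "X2": ["X1"], "X3": ["X2"], "X4": ["X3"], "X5": ["X4"]}
print(full_joint_params(5), factorized_params(chain))  # 31 9
```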
-
Bayesian networks. Structure and parameters. Example: Factory production
Years (Y) is the factory's age, where y denotes 'more than 10 years' and ¬y denotes 'less than 10 years'
Employees (E) represents the number of employees: more (e) or fewer (¬e) than 100 employees
Machines (M) has two values: m and ¬m for 'more than 20 machines' and 'less than 20 machines'
Pieces (P) also has two options: more (p) or fewer (¬p) than 10,000 produced pieces per year
Failures (F) includes f, which stands for 'more than two failures on average per month'; otherwise the state is ¬f
[Figure: the factory production Bayesian network, with nodes Years (Y), Employees (E), Machines (M), Pieces (P) and Failures (F); arcs Y → E, Y → M, E → P, M → P and M → F; and conditional probability tables p(Y), p(E|Y), p(M|Y), p(P|E,M) and p(F|M).]
Hypothetical Bayesian network modeling factory production. To fully specify the JPD directly: 2^5 − 1 = 31 parameters. The Bayesian network representation requires 17 input conditional probabilities
The Bayesian network factorizes the JPD as p(Y, E, M, P, F) = p(Y) p(E|Y) p(M|Y) p(P|E, M) p(F|M)
-
Bayesian networks. Markov condition
Markov condition or local directed Markov property
The descendant nodes of Xi are the nodes reachable from Xi by repeatedly following the arcs
Let ND(Xi) denote the non-descendant nodes of Xi. Following the arcs in the opposite direction, we find the ancestors
In a Bayesian network, each node is conditionally independent of its non-descendants given its parents: Ip(Xi, ND(Xi)|Pa(Xi))
[Figure: the factory production Bayesian network with its conditional probability tables, repeated from the previous slide.]
All nodes are descendants of Y. The descendants of M are P and F. The Markov condition for M states that E and M are c.i. given Y
All nodes are non-descendants of P, and hence P and {Y, F} are c.i. given {E, M}
-
Bayesian networks. u-separation
u-separation (Lauritzen et al., 1990): a graphical criterion for finding conditional independences beyond those given by the Markov condition
If X is u-separated from Y given Z, then X and Y are c.i. given Z, for any X, Y, Z disjoint random vectors
Checking whether X and Y are u-separated by Z is a three-step procedure:
1. Get the smallest subgraph containing X, Y, and Z and their ancestors. This is called the ancestral graph
2. Moralize the ancestral graph, i.e., add an undirected link between parents having a common child, and then drop the directions of all arcs
3. Z u-separates X and Y whenever every path between X and Y contains some node of Z
[Figure: Moralized ancestral graph of the factory production Bayesian network: undirected edges Y−E, Y−M, E−M (marrying the parents of P), E−P, M−P and M−F.]
Let us check whether P and F are u-separated by {E, M}. Since E or M is always found in every path (in the moralized ancestral graph) from P to F, P and F are u-separated by {E, M}. Hence P and F are c.i. given {E, M}
However, Y and F are not u-separated by P, because we can go from F to Y through M without crossing P
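The three-step procedure can be checked directly in code: build the moralized ancestral graph, then test whether every path between X and Y is blocked by Z. The arcs below encode the factory network; all function names are illustrative.

```python
# u-separation: ancestral graph, moralization, reachability avoiding Z.
def ancestral(arcs, keep):
    """Smallest node set containing `keep` and all of its ancestors."""
    result = set(keep)
    changed = True
    while changed:
        changed = False
        for p, c in arcs:
            if c in result and p not in result:
                result.add(p)
                changed = True
    return result

def moralize(arcs, nodes):
    """Undirected edges: dropped-direction arcs plus married co-parents."""
    edges = {frozenset(a) for a in arcs}
    for child in nodes:
        ps = [p for p, c in arcs if c == child]
        edges |= {frozenset((a, b)) for a in ps for b in ps if a != b}
    return edges

def u_separated(arcs, nodes, xs, ys, zs):
    sub = ancestral(arcs, xs | ys | zs)
    edges = moralize([a for a in arcs if set(a) <= sub], sub)
    # breadth-first search from xs that never enters zs
    frontier, seen = set(xs), set(xs)
    while frontier:
        nxt = set()
        for v in frontier:
            for e in edges:
                if v in e:
                    (w,) = e - {v}
                    if w not in seen and w not in zs:
                        nxt.add(w)
                        seen.add(w)
        frontier = nxt
    return not (seen & ys)  # separated iff no node of ys was reached

arcs = [("Y", "E"), ("Y", "M"), ("E", "P"), ("M", "P"), ("M", "F")]
nodes = {"Y", "E", "M", "P", "F"}
print(u_separated(arcs, nodes, {"P"}, {"F"}, {"E", "M"}))  # True
print(u_separated(arcs, nodes, {"Y"}, {"F"}, {"P"}))       # False
```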
-
Bayesian networks. Types of inference
Probabilistic reasoning: p(Xi|e), where E = e is the observed evidence
Abductive inference finds the values of a set of variables that best explain the observed evidence:
Total abduction: arg max_u p(u|e), i.e., we find the most probable explanation (MPE)
Partial abduction solves the same problem for a subset of variables of u (the explanation set), referred to as the partial maximum a posteriori (MAP)
Predictive reasoning: we predict the effect (produced pieces) given a cause(machines), p(p|m) = 0.62 (in the figure) and p(p|¬m) = 0.30 (not shown)
[Figure: Inference on the factory production example. (a) Prior distributions p(Xi). (b) p(Xi|m).]
Diagnostic reasoning: we diagnose the causes given the effects. Given the effect F = f, the probability of the cause being many machines is p(m|f) = 0.86
Intercausal reasoning: if we know that the factory has many employees (E = e), this would explain the observed high number of produced pieces p and would lower the probability of Machines = m being the cause: p(m|p, e) = 0.32 < p(m|p) = 0.45
-
Bayesian networks. Inference methods
Exact inference
Brute-force approach:

p(P) = ∑_{Y,E,M,F} p(Y, E, M, F, P) = ∑_{Y,E,M,F} p(Y) p(E|Y) p(M|Y) p(P|E, M) p(F|M)

Variable elimination algorithm (Zhang and Poole, 1994):

p(P) = ∑_Y p(Y) ∑_E p(E|Y) ∑_M p(M|Y) p(P|E, M) ∑_F p(F|M)

Junction tree algorithm (Lauritzen and Spiegelhalter, 1988), based on a message passing algorithm on a junction tree
Approximate inference
Probabilistic logic sampling (Henrion, 1988)
Likelihood weighting (Shachter and Peot, 1989)
Gibbs sampling (Pearl, 1987)
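The brute-force and variable elimination expressions above can be checked against each other in code. The CPT numbers below are hypothetical placeholders (the slide's exact tables are not reproduced here); the two methods must agree for any valid numbers.

```python
# Brute force vs variable elimination for p(P) in the factory network.
import itertools

p_y = {1: 0.75, 0: 0.25}
p_e_y = {(1, 1): 0.70, (0, 1): 0.30, (1, 0): 0.20, (0, 0): 0.80}  # p(e | y)
p_m_y = {(1, 1): 0.10, (0, 1): 0.90, (1, 0): 0.85, (0, 0): 0.15}  # p(m | y)
p1 = {(1, 1): 0.96, (1, 0): 0.40, (0, 1): 0.60, (0, 0): 0.05}     # p(P=1 | e, m)
p_p_em = {(1, e, m): v for (e, m), v in p1.items()}
p_p_em.update({(0, e, m): 1 - v for (e, m), v in p1.items()})
p_f_m = {(1, 1): 0.75, (0, 1): 0.25, (1, 0): 0.05, (0, 0): 0.95}  # p(f | m)

def brute_force_p(pv):
    """p(P = pv) by summing the full joint over Y, E, M, F."""
    return sum(p_y[y] * p_e_y[(e, y)] * p_m_y[(m, y)]
               * p_p_em[(pv, e, m)] * p_f_m[(f, m)]
               for y, e, m, f in itertools.product((0, 1), repeat=4))

def variable_elimination_p(pv):
    """Push each sum inwards: sum_Y p(Y) sum_E p(E|Y) sum_M ... sum_F p(F|M)."""
    total = 0.0
    for y in (0, 1):
        s_e = 0.0
        for e in (0, 1):
            s_m = 0.0
            for m in (0, 1):
                s_f = sum(p_f_m[(f, m)] for f in (0, 1))  # sums to 1
                s_m += p_m_y[(m, y)] * p_p_em[(pv, e, m)] * s_f
            s_e += p_e_y[(e, y)] * s_m
        total += p_y[y] * s_e
    return total

print(brute_force_p(1), variable_elimination_p(1))
```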
-
Bayesian networks. Learning parameters
For parameter learning, we need to have the structure
Ri = |ΩXi|
qi = |ΩPa(Xi)|, i.e., ΩPa(Xi) = {pa_i^1, ..., pa_i^qi}
The parameter θijk = p(Xi = k|Pa(Xi) = pa_i^j) is the conditional probability that Xi takes its k-th value given that its parents take their j-th value
The θijk parameters are organized into θ = (θ1, ..., θn), with a total of ∑_{i=1}^{n} Ri qi components, and are estimated from D = {x1, ..., xN}
Nijk is the number of cases in D where Xi = k and Pa(Xi) = pa_i^j are observed at the same time
Nij is the number of cases in D in which Pa(Xi) = pa_i^j is observed (Nij = ∑_{k=1}^{Ri} Nijk)
-
Bayesian networks. Learning parameters
Maximum likelihood estimation finds the θ̂ML that maximizes the likelihood of the data set given the model:

θ̂ML = arg max_θ L(θ|D, G) = arg max_θ p(D|G, θ) = arg max_θ ∏_{h=1}^{N} p(xh|G, θ)

Under the global parameter independence and local parameter independence assumptions (Spiegelhalter and Lauritzen, 1990):

L(θ|D, G) = ∏_{i=1}^{n} ∏_{j=1}^{qi} ∏_{k=1}^{Ri} θijk^Nijk, and θ̂ML_ijk = Nijk / Nij

With sparse data sets, the Laplace estimator is used: θ̂Lap_ijk = (Nijk + 1) / (Nij + Ri)
If D has incomplete instances (with missing values), the estimates can be computed with the EM algorithm (Lauritzen, 1995) or with the structural EM (Friedman, 1998), where not only the parameters but also the structure can be updated at each EM iteration
-
Bayesian networks. Learning parameters

[Figure: (a) a Bayesian network structure with arcs X1 → X3, X2 → X3 and X3 → X4]

X1 X2 X3 X4
 1  2  2  1
 1  1  2  1
 2  1  2  1
 2  1  2  2
 1  2  2  1
 1  3  1  1

(a) A Bayesian network structure with |ΩXi| = 2, i = 1, 3, 4, |ΩX2| = 3, and q1 = q2 = 0, q3 = 6, q4 = 2. (b) A data set with N = 6 for {X1, ..., X4} from which the structure in (a) has been learned

Parameters                          Meaning
θ1 = (θ1−1, θ1−2)                   (p(X1 = 1), p(X1 = 2))
θ2 = (θ2−1, θ2−2, θ2−3)             (p(X2 = 1), p(X2 = 2), p(X2 = 3))
θ3 = (θ311, θ312, ..., θ361, θ362)  (p(X3 = 1|X1 = 1, X2 = 1), p(X3 = 2|X1 = 1, X2 = 1), ..., p(X3 = 1|X1 = 2, X2 = 3), p(X3 = 2|X1 = 2, X2 = 3))
θ4 = (θ411, θ412, θ421, θ422)       (p(X4 = 1|X3 = 1), p(X4 = 2|X3 = 1), p(X4 = 1|X3 = 2), p(X4 = 2|X3 = 2))
To estimate θ1−1 = p(X1 = 1), we find four out of six instances with X1 = 1 in the X1 column, and hence θ̂ML1−1 = 2/3
To estimate θ321 = p(X3 = 1|X1 = 1, X2 = 2), we find that neither of the two instances with X1 = 1, X2 = 2 includes X3 = 1, so θ̂ML321 = 0 (this is a case where Nijk = 0)
To estimate θ361 = p(X3 = 1|X1 = 2, X2 = 3), we find that θ̂ML361 is undefined, since there are no instances with X1 = 2, X2 = 3 (i.e., Nij = 0). However, the Laplace estimates yield θ̂Lap321 = 1/4 and θ̂Lap361 = 1/2
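A sketch of both estimators on the data set of this example (column indices are 0-based):

```python
# ML and Laplace parameter estimates for the example data set, with
# structure X1 -> X3 <- X2, X3 -> X4.
data = [(1, 2, 2, 1), (1, 1, 2, 1), (2, 1, 2, 1),
        (2, 1, 2, 2), (1, 2, 2, 1), (1, 3, 1, 1)]

def counts(child, parents, child_val, parent_vals):
    """N_ijk and N_ij for given column indices and required values."""
    n_ij = sum(1 for row in data
               if all(row[p] == v for p, v in zip(parents, parent_vals)))
    n_ijk = sum(1 for row in data
                if row[child] == child_val
                and all(row[p] == v for p, v in zip(parents, parent_vals)))
    return n_ijk, n_ij

def ml(child, parents, child_val, parent_vals):
    n_ijk, n_ij = counts(child, parents, child_val, parent_vals)
    return None if n_ij == 0 else n_ijk / n_ij  # undefined when N_ij = 0

def laplace(child, parents, child_val, parent_vals, r_i):
    n_ijk, n_ij = counts(child, parents, child_val, parent_vals)
    return (n_ijk + 1) / (n_ij + r_i)

print(ml(0, [], 1, []))                  # p(X1=1): 2/3
print(ml(2, [0, 1], 1, [1, 2]))          # p(X3=1|X1=1,X2=2): 0.0
print(ml(2, [0, 1], 1, [2, 3]))          # undefined: None
print(laplace(2, [0, 1], 1, [1, 2], 2))  # 0.25
print(laplace(2, [0, 1], 1, [2, 3], 2))  # 0.5
```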
-
Bayesian networks. Learning structures. Constraint-based methods
Constraint-based methods use the data to statistically test conditionalindependences among triplets of variables
The goal is to build a DAG that represents a large percentage (and wheneverpossible all) of the identified conditional independence constraints
The PC algorithm (Spirtes and Glymour, 1991) starts with all nodes connectedby edges and follows three steps:
1 Step 1 outputs the adjacencies in the graph, i.e., the skeleton of thelearned structure
2. Step 2 identifies colliders (a collider or converging connection at node X is Y → X ← Z)
3. Step 3 orients the remaining edges and outputs the completed partially directed DAG (CPDAG), representing the Markov equivalence class of DAGs
-
Bayesian networks. Learning structures. Constraint-based methods
Algorithm 2: Step 1 of the PC algorithm: estimation of the skeleton
Input: A complete undirected graph and an ordering σ on the variables {X1, ..., Xn}
Output: Skeleton G of the learned structure
1. Form the complete undirected graph G on nodes {X1, ..., Xn}
2. t = −1
3. Repeat:
     t = t + 1
     Repeat:
       4. Select a pair of adjacent nodes Xi − Xj in G using ordering σ
       5. Find S ⊆ Adj(Xi) \ {Xj} in G with |S| = t (if any) using ordering σ
       6. Remove edge Xi − Xj from G iff Xi and Xj are c.i. given S
     until all ordered pairs of adjacent nodes have been tested
   until all adjacent pairs Xi − Xj in G satisfy |Adj(Xi) \ {Xj}| ≤ t
-
Bayesian networks. Score and search methods
[Figure: Methods for Bayesian network structure learning based on score and search.
Search spaces: DAGs, equivalence classes, orderings.
Scores: penalized likelihood (AIC, BIC); Bayesian (BD, K2, BDe, BDeu).
Search: approximate (greedy, simulated annealing, EDAs, genetic algorithms, MCMC); exact (dynamic programming, branch & bound, mathematical programming).]
-
Bayesian networks. Score and search methods
Finding a network structure that optimizes a score is NP-hard (Chickering, 1996)
Different search spaces:
Space of DAGs, whose cardinality was given by Robinson (1977): f(n) = ∑_{i=1}^{n} (−1)^{i+1} C(n, i) 2^{i(n−i)} f(n − i) for n > 1, with f(0) = f(1) = 1
Space of Markov equivalence classes, which is smaller than the space of DAGs: the #DAGs/#CPDAGs ratio approaches an asymptote of about 3.7 (Gillispie and Perlman, 2002)
Space of orderings of the variables: some learning algorithms (e.g., the K2 algorithm) only work with a fixed order. Cardinality: n!
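Robinson's recurrence is easy to evaluate; a short sketch:

```python
# Number of DAGs on n labeled nodes, by Robinson's (1977) recurrence.
from math import comb
from functools import lru_cache

@lru_cache(maxsize=None)
def num_dags(n):
    if n <= 1:
        return 1
    return sum((-1) ** (i + 1) * comb(n, i) * 2 ** (i * (n - i)) * num_dags(n - i)
               for i in range(1, n + 1))

print([num_dags(n) for n in range(1, 6)])  # [1, 3, 25, 543, 29281]
```

The super-exponential growth is what makes exhaustive search over DAGs infeasible.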
-
Bayesian networks. Score and search methods. Scores
The need for penalized likelihood scores
The estimated log-likelihood of the data given the BN:

log L(θ̂|D, G) = log p(D|G, θ̂) = log ∏_{i=1}^{n} ∏_{j=1}^{qi} ∏_{k=1}^{Ri} θ̂ijk^Nijk = ∑_{i=1}^{n} ∑_{j=1}^{qi} ∑_{k=1}^{Ri} Nijk log θ̂ijk

where θ̂ijk = θ̂ML_ijk = Nijk/Nij, the frequency counts in D
This score increases monotonically with the complexity of the model; the optimal structure would be the complete graph
[Figure: likelihood of training and test data versus DAG complexity, illustrated with three structures over X1, ..., X4 of increasing density.]
Structural overfitting: the likelihood of the training data is higher for denser graphs, but it degrades for the test data
-
Bayesian networks. Score and search methods. Scores
Penalized likelihood scores
General expression of a family of penalized log-likelihood scores:

QPen(D, G) = ∑_{i=1}^{n} ∑_{j=1}^{qi} ∑_{k=1}^{Ri} Nijk log(Nijk/Nij) − dim(G) pen(N)

where dim(G) = ∑_{i=1}^{n} (Ri − 1) qi denotes the model dimension (number of parameters needed in the Bayesian network), and pen(N) is a non-negative penalization function
Depending on pen(N):
If pen(N) = 1, the score is Akaike's information criterion (Akaike, 1974)
If pen(N) = (1/2) log N, the score is the Bayesian information criterion (BIC) (Schwarz, 1978)
-
Bayesian networks. Score and search methods. Scores
Bayesian scores
The Bayesian approach to structure learning aims to find the G that maximizes its a posteriori probability given the data, i.e., find arg max_G p(G|D)
Using Bayes' formula: p(G|D) ∝ p(D, G) = p(D|G)p(G)
The second factor, p(G), is the prior distribution over structures
The first factor, p(D|G), is the marginal likelihood of the data, defined as

p(D|G) = ∫ p(D|G, θ) f(θ|G) dθ

where p(D|G, θ) is the likelihood of the data given the Bayesian network (structure G and parameters θ), and f(θ|G) is the prior distribution over the parameters
-
Bayesian networks. Score and search methods. ScoresBayesian scores
Depending on f (θ|G), we have different scores: If (θij |G) follows a Dirichlet ofparameters αij1, ..., αijRi , we have the Bayesian Dirichlet score (BD score). A Dirichletdistribution is determined by hyperparameters αijk for all i, j, k
The K2 score (Cooper and Herskovits, 1992) uses the uninformative assignmentαijk = 1, for all i, j, k , resulting in
QK 2(D,G) = p(G)n∏
i=1
qi∏j=1
(Ri − 1)!(Nij + Ri − 1)!
Ri∏k=1
Nijk !.
The K2 algorithm uses a greedy search method and the K2 score. The user gives a node ordering and themaximum number of parents that any node is permitted to have. Starting with an empty structure, thealgorithm incrementally adds, from the set of nodes that precede each node Xi in the node ordering, theparent whose addition most increases the function:
g(Xi , Pa(Xi )) =qi∏
j=1
(Ri − 1)!(Nij + Ri − 1)!
Ri∏k=1
Nijk !.
When the score does not increase further with the addition of a single parent, no more parents are added to
node Xi , and we move on to the next node in the ordering
The likelihood-equivalent Bayesian Dirichlet score (BDe score) (Heckerman et al.(1995)) sets the hyperparameters as αijk = α p(Xi = k ,Pa(Xi ) = pa
ji |G). The
equivalent sample size α expresses the user’s confidence in the prior networkIn the BDeu score (Buntine, 1991), αijk = α 1qi Ri
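The local function g can be sketched as follows, evaluated for X4 with candidate parent X3 on the small data set from the parameter-learning example earlier in the deck (reusing that data here is an assumption for illustration):

```python
# K2 local score g(Xi, Pa(Xi)) = prod_j (Ri-1)!/(Nij+Ri-1)! * prod_k Nijk!
from math import factorial
from collections import Counter

data = [(1, 2, 2, 1), (1, 1, 2, 1), (2, 1, 2, 1),
        (2, 1, 2, 2), (1, 2, 2, 1), (1, 3, 1, 1)]

def g(child, parents, r_i):
    groups = Counter()   # N_ij per parent configuration
    cells = Counter()    # N_ijk per (configuration, child value)
    for row in data:
        cfg = tuple(row[p] for p in parents)
        groups[cfg] += 1
        cells[(cfg, row[child])] += 1
    score = 1.0
    for cfg, n_ij in groups.items():
        score *= factorial(r_i - 1) / factorial(n_ij + r_i - 1)
        for k in range(1, r_i + 1):
            score *= factorial(cells[(cfg, k)])
    return score

print(g(3, [2], 2))  # X4 (0-based column 3) with parent X3 (column 2)
```

In a full K2 run, g would be re-evaluated for each candidate parent of each node, following the user-supplied ordering.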
-
Dynamic, temporal and continuous time Bayesian networks
Dynamic Bayesian networks (Dean and Kanazawa, 1989)
[Figure] A dynamic Bayesian network structure with four variables X1, X2, X3 and X4 and three time slices (T = 3): (a) prior Bayesian network; (b) transition network, with the first-order Markovian transition assumption; (c) dynamic Bayesian network unfolded in time for three time slices
Temporal nodes Bayesian networks (Galán et al., 2007)
Continuous time Bayesian networks (Nodelman et al., 2002)
-
Bayesian networks. Software
HUGIN (https://www.hugin.com/), GeNIe (https://www.bayesfusion.com/), OpenMarkov (http://www.openmarkov.org/), gRain (https://CRAN.R-project.org/package=gRain) and bnlearn (http://www.bnlearn.com/)

The table in the slide marks, for each package, the availability of exact inference (junction tree), approximate inference (probabilistic logic sampling), constraint-based structure learning (PC algorithm) and score+search structure learning (K2 algorithm); each method is available in most, though not all, of the five packages
-
Multiple diagnosis problem. Direct approach
A data set with m feature variables and d class variables:

                X1 ... Xm       C1 ... Cd
(x(1), c(1))    x1(1) ... xm(1) c1(1) ... cd(1)
(x(2), c(2))    x1(2) ... xm(2) c1(2) ... cd(2)
...
(x(N), c(N))    x1(N) ... xm(N) c1(N) ... cd(N)

Optimal diagnosis as abductive inference: searching for the most probable explanation (MPE)

(c*1, ..., c*d) = arg max_{(c1,...,cd)} p(C1 = c1, ..., Cd = cd | X1 = x1, ..., Xm = xm)
                = arg max_{(c1,...,cd)} p(C1 = c1, ..., Cd = cd) p(X1 = x1, ..., Xm = xm | C1 = c1, ..., Cd = cd)

Number of parameters to be estimated in the case of binary predictors and classes: 2^d − 1 + 2^d (2^m − 1)
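The combinatorial cost of the direct approach is easy to see in code: a brute-force MPE search must enumerate all 2^d class configurations. A minimal sketch; the toy distributions at the bottom are hypothetical:

```python
from itertools import product


def mpe_direct(p_c, p_x_given_c, x, d):
    """Most probable explanation by exhaustive search over 2^d class vectors.

    p_c(c)           : joint prior p(C1=c1, ..., Cd=cd), c a tuple of d bits
    p_x_given_c(x, c): likelihood p(X1=x1, ..., Xm=xm | C=c)
    """
    best, best_score = None, -1.0
    for c in product([0, 1], repeat=d):  # 2^d candidate explanations
        score = p_c(c) * p_x_given_c(x, c)
        if score > best_score:
            best, best_score = c, score
    return best


# Toy example: the likelihood strongly favours the configuration (1, 0)
prior = lambda c: 0.25                      # uniform prior over the 4 configurations
lik = lambda x, c: 0.9 if c == (1, 0) else 0.1
print(mpe_direct(prior, lik, x=None, d=2))  # (1, 0)
```

For d in the dozens this enumeration is hopeless, which is what motivates the tractability results for MBCs presented later.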
-
Multiple diagnosis with multi-dimensional Bayesian network classifiers (MBCs) (van der Gaag and de Waal, 2006)
In a multi-dimensional Bayesian network classifier (MBC) the set of vertices V is partitioned into:

VC = {C1, ..., Cd}, the class variables, and
VX = {X1, ..., Xm}, the feature variables,

with d + m = n

Three subgraphs in the structure of a multi-dimensional Bayesian network classifier:

Class subgraph: GC
Bridge subgraph: GCX
Feature subgraph: GX
-
Examples of MBC structures
(a) Empty-empty MBC
(b) Tree-tree MBC
(c) Polytree-DAG MBC
-
Tractability of MPE in class-bridge decomposable MBCs (Bielza et al., 2011)
MPE is generally NP-hard in Bayesian networks (Kwisthout, 2011)
An MBC is class-bridge decomposable (CB-decomposable MBC) (Bielza et al., 2011) if:

1 GC ∪ GCX can be decomposed as GC ∪ GCX = ⋃_{i=1}^{r} (GC_i ∪ GCX_i), where GC_i ∪ GCX_i, i = 1, ..., r, are its maximal connected components

2 Non-shared children: Ch(VC_i) ∩ Ch(VC_j) = ∅ for i, j = 1, ..., r and i ≠ j, where Ch(VC_i) denotes the children of all the variables in VC_i
(a) A CB-decomposable MBC
(b) Its two maximal connected components
-
Tractability of MPE in class-bridge decomposable MBCs (Bielza et al., 2011)
Theorem (Bielza et al., 2011)

Given a CB-decomposable MBC where I_i = ∏_{C ∈ VC_i} Ω_C denotes the sample space associated with VC_i, then

max_{c1,...,cd} p(C1 = c1, ..., Cd = cd | X1 = x1, ..., Xm = xm)
∝ ∏_{i=1}^{r} max_{c↓VC_i ∈ I_i} [ ∏_{C ∈ VC_i} p(c | pa(c)) ∏_{X ∈ Ch(VC_i)} p(x | pa_VC(x), pa_VX(x)) ]

where c↓VC_i represents the projection of the vector c onto the coordinates found in VC_i

Example:

max_{c1,...,c5} p(C1 = c1, ..., C5 = c5 | X1 = x1, ..., X6 = x6)
∝ max_c p(c1)p(c2)p(c3|c2)p(c4)p(c5|c4) p(x1|c1)p(x2|c1,c2,x1)p(x3|c3)p(x4|c3,x3,x5,x6)p(x5|c4,c5,x6)p(x6|c5)
= max_{c1,c2,c3} p(c1)p(c2)p(c3|c2)p(x1|c1)p(x2|c1,c2,x1)p(x3|c3)p(x4|c3,x3,x5,x6) · max_{c4,c5} p(c4)p(c5|c4)p(x5|c4,c5,x6)p(x6|c5)

Here VC1 = {C1, C2, C3}, VC2 = {C4, C5}, Ch(VC1) = {X1, X2, X3, X4} and Ch(VC2) = {X5, X6}. Note that Ch(VC1) ∩ Ch(VC2) = ∅, as required by the theorem
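A quick numeric check of the factorization in the theorem, with made-up potentials for two components: maximizing each component separately gives the same value as maximizing over the joint class space.

```python
from itertools import product

# Hypothetical class potentials for two maximal connected components:
# phi1 over (c1, c2) collects the factors of component 1, phi2 over c3 those of component 2
phi1 = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.3, (1, 1): 0.2}
phi2 = {0: 0.7, 1: 0.3}

# Joint maximization over all 2^3 class configurations
joint = max(phi1[(c1, c2)] * phi2[c3]
            for c1, c2, c3 in product([0, 1], repeat=3))

# Component-wise maximization, as licensed by CB-decomposability
factored = max(phi1.values()) * max(phi2.values())

assert abs(joint - factored) < 1e-12  # both equal 0.4 * 0.7 = 0.28
```

With r components of at most k class variables each, this reduces the MPE search from exponential in d to r searches exponential only in k.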
-
Tractability of MPE in MBCs according to the treewidth
Exact methods for MPE computation in Bayesian networks are exponential in the treewidth of G

Treewidth: the size of the largest clique in the triangulated graph (all cycles of four or more nodes have a chord)

Several results bound the treewidth in MBCs:

1 For MBCs with an empty feature subgraph (Pastink and van der Gaag, 2015): treewidth(G) < treewidth(G′), where G′ is the pruned graph (first moralize G and then remove the feature nodes from the moral graph)

2 For general MBCs (DAG-DAG MBCs) (de Waal and van der Gaag, 2007): treewidth(G) ≤ treewidth(GX) + d

3 For CB-decomposable MBCs (Kwisthout, 2011): treewidth(G) ≤ treewidth(GX) + |d_max|, where |d_max| is the number of class variables in the component with the maximum number of class variables

4 For general MBCs (DAG-DAG MBCs), Benjumeda et al. (2018) characterize the circumstances under which MPE computation can be done in polynomial time
-
Learning MBCs from data. Cardinality of the search space
Theorem (Bielza et al., 2011)

The number of all possible MBC structures with d class variables and m feature variables, MBC(d, m), is

MBC(d, m) = S(d) · 2^{dm} · S(m),

where S(m) = Σ_{i=1}^{m} (−1)^{i+1} C(m, i) 2^{i(m−i)} S(m − i) is Robinson's formula, which counts the number of possible DAG structures over m nodes and is initialized with S(0) = S(1) = 1
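Robinson's recursion is straightforward to evaluate; the sketch below (the function names are mine) reproduces the well-known DAG counts S(2) = 3, S(3) = 25, S(4) = 543 and combines them into MBC(d, m).

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def num_dags(m):
    """S(m): Robinson's recursion for the number of DAGs on m labelled nodes."""
    if m <= 1:
        return 1  # S(0) = S(1) = 1
    return sum((-1) ** (i + 1) * comb(m, i) * 2 ** (i * (m - i)) * num_dags(m - i)
               for i in range(1, m + 1))


def num_mbc_structures(d, m):
    """MBC(d, m) = S(d) * 2^(dm) * S(m)."""
    return num_dags(d) * 2 ** (d * m) * num_dags(m)


print(num_dags(4))               # 543
print(num_mbc_structures(2, 2))  # 3 * 16 * 3 = 144
```

Even for tiny problems the count explodes, which is why exhaustive search over MBC structures is out of the question.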
Theorem (Bielza et al., 2011)

The number of all possible bridge subgraphs, BRS(d, m), m ≥ d, for MBCs satisfying the two conditions (a) for each Xi ∈ VX there is a Cj ∈ VC with (Cj, Xi) ∈ ACX, and (b) for each Cj ∈ VC there is an Xi ∈ VX with (Cj, Xi) ∈ ACX, is given by the recursive formula

BRS(d, m) = 2^{dm} − Σ_{k=0}^{m−1} C(dm, k) − Σ_{k=m}^{dm} Σ_{x≤d, y≤m, k≤xy≤dm−d} C(d, x) C(m, y) BRS(x, y, k)

where BRS(x, y, k) denotes the number of bridge subgraphs with k arcs in an MBC with x class variables and y feature variables, initialized with BRS(1, 1, 1) = BRS(1, 2, 2) = BRS(2, 1, 2) = 1
-
Learning MBCs from data. Empty-empty MBC
Empty-empty MBC learned: multi-label naive Bayes (MLNB) (Zhang et al., 2009)

A two-stage filter-wrapper feature selection strategy is incorporated:

First stage (filter): feature extraction techniques based on principal component analysis (PCA)
Second stage (wrapper): subset selection techniques based on a genetic algorithm (GA), on the space of principal components, are used to choose the most appropriate subset of features for classification

For continuous features a Gaussian assumption is made: the density of the feature variables given the class values follows a Gaussian distribution
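For intuition, the Gaussian assumption means each binary class variable gets its own naive Bayes posterior with class-conditional normal densities. A minimal sketch with hypothetical parameters (the function names and numbers are mine, not from Zhang et al.):

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def label_posterior(x, prior, params):
    """Posterior of one binary class variable under a Gaussian naive Bayes.

    prior     : dict {0: p(C=0), 1: p(C=1)}
    params[c] : list of (mu, sigma) pairs, one per feature, given C = c
    """
    unnorm = {}
    for c in (0, 1):
        p = prior[c]
        for xi, (mu, sigma) in zip(x, params[c]):
            p *= gaussian_pdf(xi, mu, sigma)  # conditional independence of features
        unnorm[c] = p
    z = sum(unnorm.values())
    return {c: p / z for c, p in unnorm.items()}


# Hypothetical parameters: the single feature is near 0 under C=0 and near 4 under C=1
post = label_posterior([0.1], {0: 0.5, 1: 0.5}, {0: [(0.0, 1.0)], 1: [(4.0, 1.0)]})
```

In MLNB this is repeated independently for each of the d class variables; the empty class and feature subgraphs mean no further dependencies are modelled.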
-
Learning MBCs from data. Tree-tree MBC
Tree-tree MBC learned with a three-step algorithm (van der Gaag and de Waal, 2006)

1 The class subgraph is learnt by searching for the maximum weighted undirected spanning tree and transforming it into a directed tree using Chow and Liu's (1968) algorithm

The weight of an edge is the mutual information between a pair of class variables

2 For a fixed bridge subgraph, the feature subgraph is then learnt by building a maximum weighted directed spanning tree

The weight of an arc is the conditional mutual information between pairs of feature variables given the parents (classes) of the second feature, as determined by the bridge subgraph

3 The bridge subgraph is greedily changed in a wrapper-like way, trying to improve the considered metric (e.g., exact match)
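Step 1 rests on two ingredients: empirical mutual information and a maximum-weight spanning tree. A self-contained Kruskal-style sketch (the helper names are my own):

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Empirical mutual information between two discrete samples (in nats)."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log((c * n) / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def max_spanning_tree(names, columns):
    """Maximum-weight spanning tree over pairwise MI edge weights (Kruskal)."""
    edges = sorted(((mutual_information(columns[a], columns[b]), a, b)
                    for a, b in combinations(names, 2)), reverse=True)
    parent = {v: v for v in names}  # union-find forest

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for _, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:          # adding the edge does not create a cycle
            parent[ra] = rb
            tree.append((a, b))
    return tree


# Example: C1 and C2 perfectly correlated, C3 independent of both
edges = max_spanning_tree(['C1', 'C2', 'C3'],
                          {'C1': [0, 0, 1, 1], 'C2': [0, 0, 1, 1], 'C3': [0, 1, 0, 1]})
```

Rooting the resulting undirected tree at any node and directing edges away from the root yields the directed class subgraph, as in Chow and Liu's construction.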
-
Learning MBCs from data. DAG-DAG MBC. Filter
DAG-DAG MBC (Bielza et al., 2011)
Score: penalized likelihood or Bayesian score

Greedy search in five steps:

1 Learn the class subgraph
2 Learn the feature subgraph
3 Propose a candidate bridge subgraph
4 Obtain a candidate feature subgraph
5 Decide on the bridge and feature subgraph candidates
-
Learning MBCs from data. DAG-DAG MBC. Filter
DAG-DAG MBC (Bielza et al., 2011)
1. Learn the class subgraph
-
Learning MBCs from data. DAG-DAG MBC. Filter
DAG-DAG MBC (Bielza et al., 2011)
2. Learn the feature subgraph and 3. Propose a candidate bridge subgraph
-
Learning MBCs from data. DAG-DAG MBC. Filter
DAG-DAG MBC (Bielza et al., 2011)
4. Obtain a candidate feature subgraph and 5. Decide on the bridge and feature subgraph candidates
-
Learning MBCs from data. DAG-DAG MBC. Wrapper
DAG-DAG MBC (Bielza et al., 2011)
i = 0
1. G(0) = ∅. Acc = Acc(0)
2. While there are arcs that can be added to G(i) (and not previously discarded): add one arc to GC(i), GCX(i) or GX(i) and obtain the new G(i+1) and Acc(i+1)
3. If Acc(i+1) > Acc(i): Acc = Acc(i+1), i = i + 1, and go to 2. Else discard the arc and go to 2
4. Stop and return G(i) and Acc
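The wrapper loop above is independent of how the accuracy is estimated. Abstracting the classifier behind an accuracy oracle, the search skeleton looks like this (a sketch with hypothetical names, not the authors' implementation):

```python
def greedy_wrapper(candidate_arcs, accuracy):
    """Forward greedy arc addition guided by an accuracy oracle.

    candidate_arcs : iterable of arcs that may be added to the structure
    accuracy       : function scoring a frozenset of arcs (e.g. estimated
                     exact-match accuracy of the MBC with that structure)
    Arcs whose addition does not improve the score are discarded and
    never reconsidered, as in step 3 above.
    """
    structure = set()
    acc = accuracy(frozenset(structure))
    discarded = set()
    improved = True
    while improved:
        improved = False
        for arc in candidate_arcs:
            if arc in structure or arc in discarded:
                continue
            new_acc = accuracy(frozenset(structure | {arc}))
            if new_acc > acc:
                structure.add(arc)
                acc = new_acc
                improved = True
            else:
                discarded.add(arc)
    return structure, acc


# Toy oracle: accuracy simply counts how many "useful" arcs the structure contains
useful = {('C1', 'C2'), ('C1', 'X1')}
structure, acc = greedy_wrapper([('C1', 'C2'), ('C2', 'X2'), ('C1', 'X1')],
                                lambda arcs: len(arcs & useful))
```

The cost is dominated by the oracle: each candidate arc triggers a full re-estimation of the classifier's accuracy, which is why wrapper learning is expensive.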
-
Learning CB-decomposable MBCs from data (Borchani et al., 2010)
Phase I. Learn bridge subgraph
Starting from an empty graphical structure, learn a selective naive Bayes for each class variable
Check the non-shared children property to induce an initial CB-decomposable MBC (remove all common children based on two criteria: feature insertion rank and accuracy)

The result of this phase is a simple CB-decomposable MBC where only the bridge subgraph is defined (class and feature subgraphs are empty)
-
Learning CB-decomposable MBCs from data (Borchani et al., 2010)
Phase II. Learn feature subgraph
Introduce the dependence relationships within the feature subgraph

Fix the maximum number of iterations

In each iteration, an arc between a pair of feature variables is selected at random. If there is an accuracy improvement, the arc is added; otherwise it is discarded
-
Learning CB-decomposable MBCs from data (Borchani et al., 2010)
Phase III. Merge maximal connected components
Merging the maximal connected components. Bridge subgraph update
All possible arc additions between the class variables are evaluated, adding the arc that improves the accuracy the most (going from r to r − 1 maximal connected components)

A bridge update step is performed inside the newly induced maximal connected component
Feature subgraph update
Update the feature subgraph by inserting, one by one, additional arcs between feature variables
This phase iterates over these three steps (stopping criteria: no more component merging can improve theaccuracy or r = 1)
-
Learning MBCs by a constraint-based approach (Borchani et al., 2013)
Markov blanket MBC (MB-MBC) learning algorithm
Apply the HITON algorithm (Aliferis et al., 2003) to each class variable to determine its Markov blanket

Given the MBC definition, the direct parents of any class variable Ci, i = 1, ..., d, can only be among the remaining class variables, whereas the direct children or spouses of Ci can include either class or feature variables
The MBC subgraphs are built from the results of the HITON algorithm:

Class subgraph: first insert an edge between each class variable Ci and any class variable belonging to its parents-children set PC(Ci); then direct all these edges using the PC algorithm's edge orientation rules
Bridge subgraph: built by inserting an arc from each class variable Ci to every feature variable belonging to PC(Ci)
Feature subgraph: for every feature X in the set MB(Ci) \ PC(Ci), i.e., for every spouse X, insert an arc from X to the corresponding common child given by PC(X) ∩ PC(Ci)
-
Multi-dimensional Bayesian network classifier trees (Gil-Begue et al., 2018)
A multi-dimensional Bayesian network classifier tree (MBCTree) is a classification tree with MBCs in the leaves

An internal node of an MBCTree corresponds to a feature variable Xi, as in standard classification trees, and has a labelled branch to a child for each of its possible values

A leaf of an MBCTree is an MBC over all the class variables and those features not present in the path from the root to the leaf

A wrapper approach (greedy search), guided by the exact match accuracy, is used for learning MBCTrees
-
Detecting multi-dimensional concept drift in data streams (Borchani et al., 2016)
Concept drift detection has usually been based on ensemble updating

We use a single MBC, and our drift detection method is performed locally

The average local log-likelihood of each variable in the MBC network:

ll_i^s = (1 / N^s) Σ_{j=1}^{q_i} Σ_{k=1}^{r_i} N_ijk^s log( N_ijk^s / N_ij^s )

Change point detection with the Page-Hinkley test:

CUM_s = Σ_{t=1}^{s} (LL_t − mean_{LL_t} − δ), where mean_{LL_t} = (1/t) Σ_{h=1}^{t} LL_h denotes the mean of the LL_1, ..., LL_t values, and δ is a tolerance parameter

MAX_s = max{ CUM_t, t = 1, ..., s }

The Page-Hinkley test value: PH_s = MAX_s − CUM_s. If PH_s > λ, the null hypothesis is rejected and the Page-Hinkley test signals a change; otherwise, no change is signaled
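The Page-Hinkley quantities can be maintained online in constant time per observation. An incremental sketch; the default δ and λ values are illustrative, not taken from the paper:

```python
def page_hinkley(lls, delta=0.01, lam=1.0):
    """Online Page-Hinkley test on a stream of log-likelihood values LL_1, LL_2, ...

    Implements CUM_s = sum_t (LL_t - mean_t - delta), MAX_s = max_t CUM_t,
    PH_s = MAX_s - CUM_s; returns the first s with PH_s > lam, or None.
    """
    cum, mx, total = 0.0, float('-inf'), 0.0
    for s, ll in enumerate(lls, start=1):
        total += ll
        mean = total / s          # running mean of LL_1, ..., LL_s
        cum += ll - mean - delta
        mx = max(mx, cum)
        if mx - cum > lam:        # PH_s exceeds the alarm threshold
            return s
    return None


# A stable stream raises no alarm; a sudden drop in log-likelihood does
print(page_hinkley([0.0] * 40))                # None
print(page_hinkley([0.0] * 20 + [-2.0] * 20))  # 21
```

δ controls tolerance to slow drift and λ trades false alarms against detection delay; the slide's method applies this test to the local log-likelihood stream of each variable.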
[Figure] Evolution of the average local log-likelihood values of four different class variables
-
Predicting human immunodeficiency virus (HIV) type 1 inhibitors (Borchani et al., 2013)
[Figure] A cell infected with HIV-1. Reverse transcription
-
Predicting human immunodeficiency virus (HIV) type 1 inhibitors (Borchani et al., 2013)
Therapies for HIV-1 are combinations or cocktails of antiretroviral drugs

We aim to gain insight into the different interactions between drugs and resistance mutations

Reverse transcriptase inhibitors (RTIs) consist of two groups of antiretroviral drugs preventing HIV-1 replication: nucleoside and nucleotide reverse transcriptase inhibitors (NRTIs) and non-nucleoside reverse transcriptase inhibitors (NNRTIs)

NRTIs consist of seven drugs: Abacavir (ABC), Didanosine (DDI), Emtricitabine (FTC), Lamivudine (3TC), Stavudine (D4T), Tenofovir (TDF), and Zidovudine (AZT)

NNRTIs comprise three drugs: Efavirenz (EFV), Nevirapine (NVP), and Delavirdine (DLV)

A total of 38 mutations associated with resistance to RTIs: 22 associated with NRTIs and 16 with NNRTIs
-
Predicting human immunodeficiency virus (HIV) type 1 inhibitors (Borchani et al., 2013)
We analyzed reverse transcriptase and protease data sets obtained from the online Stanford HIV-1 database (Rhee et al., 2003)

Treatment histories from 2855 patients that received either NRTIs, NNRTIs or both

The data set contained a total of 4884 samples

The number of RTIs varies from 1 to 8 drugs: 17 samples with 1 RTI, 25 with 2 RTIs, 157 with 3 RTIs, 698 with 4 RTIs, 1852 with 5 RTIs, 1600 with 6 RTIs, 483 with 7 RTIs, and 56 with 8 RTIs

                     Mean accuracy      Exact match
MB-MBC  maxCS = 1    0.7108 ± 0.0221    0.1151 ± 0.0466
        maxCS = 2    0.7062 ± 0.0191    0.0881 ± 0.0403
        maxCS = 3    0.7019 ± 0.0153    0.0780 ± 0.0363
        maxCS = 4    0.6995 ± 0.0145    0.0701 ± 0.0336
        maxCS = 5    0.6978 ± 0.0106    0.0646 ± 0.0241
Tree-Tree            0.6968 ± 0.0163    0.0364 ± 0.0101
DAG-DAG Filter       0.7074 ± 0.0063    0.0240 ± 0.0066
DAG-DAG Wrapper      0.7095 ± 0.0040    0.0291 ± 0.0008
CB-MBC               0.7261 ± 0.0113    0.0382 ± 0.0105

Estimated performance metrics (mean ± std. deviation) for the RTIs data set
-
Predicting human immunodeficiency virus (HIV) type 1 inhibitors (Borchani et al., 2013)
The graphical structure of the MBC learnt using RTIs data set
-
EQ-5D health states from PDQ-39 in Parkinson (Borchani et al., 2012)
Parkinson disease motor symptoms
-
EQ-5D health states from PDQ-39 in Parkinson (Borchani et al., 2012)
PDQ-39 and EQ-5D: quality of life instruments to measure the degree of disability in PD
39-item Parkinson’s Disease Questionnaire: a specific instrument
PDQ-39 captures the patient's perception of their illness, covering 8 dimensions:
1 Mobility
2 Activities of daily living
3 Emotional well-being
4 Stigma
5 Social support
6 Cognitions
7 Communication
8 Bodily discomfort
-
EQ-5D health states from PDQ-39 in Parkinson (Borchani et al., 2012)
European Quality of Life - 5 Dimensions: a generic instrument
EQ-5D is a generic measure of health for clinical and economic appraisal
-
EQ-5D health states from PDQ-39 in Parkinson (Borchani et al., 2012)
Mapping PDQ-39 to EQ-5D
PDQ1  PDQ2  ...  PDQ39  EQ1  EQ2  EQ3  EQ4  EQ5
3     1     ...  3      1    3    3    2    1
2     3     ...  2      1    1    2    3    2
5     2     ...  4      1    3    3    1    2
...   ...   ...  ...    ...  ...  ...  ...  ...
4     4     ...  3      3    1    2    3    2
4     4     ...  3      3    1    2    3    2
5     5     ...  4      2    3    2    3    3

φ : (PDQ1, ..., PDQ39) → (EQ1, ..., EQ5)
-
EQ-5D health states from PDQ-39 in Parkinson (Borchani et al., 2012)
488 Parkinson’s patients. Estimated measures over 5-fold cross-validation
Method    Mean accuracy      Exact match
MB-MBC    0.7119 ± 0.0338    0.2030 ± 0.0718
CB-MBC    0.6807 ± 0.0285    0.1865 ± 0.0429
MNL       0.6926 ± 0.0430    0.1802 ± 0.0713
OLS       0.4201 ± 0.0252    0.0123 ± 0.0046
CLAD      0.4254 ± 0.0488    0.0143 ± 0.0171
-
EQ-5D health states from PDQ-39 in Parkinson (Borchani et al., 2012)
MB-MBC graphical structure
-
MULTI-DIMENSIONAL BAYESIAN NETWORK CLASSIFIERS

Specialized Bayesian networks for solving multi-label and multi-dimensional problems

Advantages:

Transparency, interpretability
Variety of inference (MPE) and learning algorithms
Hierarchy of models organized by structural complexity
Competitive results with state-of-the-art methods when d is in the dozens

Disadvantages:

Wrapper approaches for learning are very computationally demanding
In large-d problems it is difficult to compete with state-of-the-art methods
-
References
H. Akaike (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716-723.
C.F. Aliferis, I. Tsamardinos and A.S. Statnikov (2003). HITON: A novel Markov blanket algorithm for optimal variable selection. AMIA Annual Symposium Proceedings, 21-25.
M. Benjumeda, C. Bielza and P. Larrañaga (2018). Tractability of most probable explanations in multidimensional Bayesian network classifiers. International Journal of Approximate Reasoning, 93, 74-87.
C. Bielza, G. Li and P. Larrañaga (2011). Multi-dimensional classification with Bayesian networks. International Journal of Approximate Reasoning, 52, 705-727.
C. Bielza and P. Larrañaga (2014). Discrete Bayesian network classifiers: A survey. ACM Computing Surveys, 47(1), Article 5.
R. Blanco, I. Inza, M. Merino, J. Quiroga and P. Larrañaga (2005). Feature selection in Bayesian classifiers for the prognosis of survival of cirrhotic patients treated with TIPS. Journal of Biomedical Informatics, 38(5), 376-388.
H. Borchani, C. Bielza and P. Larrañaga (2010). Learning CB-decomposable multi-dimensional Bayesian network classifiers. Proceedings of the 5th European Workshop on Probabilistic Graphical Models, 25-32.
H. Borchani, C. Bielza, P. Martínez-Martín and P. Larrañaga (2012). Multidimensional Bayesian network classifiers applied to predict the European quality of life-5 dimensions (EQ-5D) from the 39-item Parkinson's disease questionnaire (PDQ-39). Journal of Biomedical Informatics, 45, 1175-1184.
H. Borchani, C. Bielza and P. Larrañaga (2013). Predicting human immunodeficiency virus inhibitors using multi-dimensional Bayesian network classifiers. Artificial Intelligence in Medicine, 57(3), 219-229.
H. Borchani, P. Larrañaga, J. Gama and C. Bielza (2016). Mining multi-dimensional concept-drifting data streams using Bayesian network classifiers. Intelligent Data Analysis, 20(2), 257-280.
M.R. Boutell, J. Luo, X. Shen and C.M. Brown (2004). Learning multi-label scene classification. Pattern Recognition, 37, 1757-1771.
W.L. Buntine (1991). Theory refinement on Bayesian networks. Proceedings of the 7th Conference on Uncertainty in Artificial Intelligence, 52-60.
D.M. Chickering (1996). Learning Bayesian networks is NP-complete. Learning from Data: Artificial Intelligence and Statistics V, 121-130, Springer.
C.K. Chow and C.N. Liu (1968). Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14(3), 462-467.
A. Clare and R.D. King (2001). Knowledge discovery in multi-label phenotype data. Proceedings of the 5th European Conference on Principles of Data Mining and Knowledge Discovery, 42-53.
G.F. Cooper and E. Herskovits (1992). A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9, 309-347.
P.R. de Waal and L.C. van der Gaag (2007). Inference and learning in multi-dimensional Bayesian network classifiers. Proceedings of the 9th European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty, Springer, 501-511.
T. Dean and K. Kanazawa (1989). A model for reasoning about persistence and causation. Computational Intelligence, 5(3), 142-150.
K. Deb, A. Pratap, S. Agarwal and T. Meyarivan (2002). A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2), 182-197.
A. Elisseeff and J. Weston (2002). A kernel method for multi-labelled classification. Advances in Neural Information Processing Systems 14, 681-687.
N. Friedman (1998). The Bayesian structural EM algorithm. Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, 129-138.
N. Friedman, D. Geiger and M. Goldszmidt (1997). Bayesian network classifiers. Machine Learning, 29, 131-163.
S.F. Galán, G. Arroyo-Figueroa, F.J. Díez and L.E. Sucar (2007). Comparison of two types of event Bayesian networks: A case study. Applied Artificial Intelligence, 21(3), 185-209.
S. Gil-Begue, C. Bielza and P. Larrañaga (2018). Multi-dimensional Bayesian network classifier trees. The 9th International Conference on Probabilistic Graphical Models, submitted.
S.B. Gillispie and M.D. Perlman (2002). The size distribution for Markov equivalence classes of acyclic digraph models. Artificial Intelligence, 141(1/2), 137-155.
E. Gibaja and S. Ventura (2015). A tutorial on multi-label learning. ACM Computing Surveys, 47, 3, Article 52.
S. Godbole and S. Sarawagi (2004). Discriminative methods for multi-labeled classification. Proceedings of the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining, 22-30.
D. Heckerman, D. Geiger and D.M. Chickering (1995). Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20, 197-243.
M. Henrion (1988). Propagating uncertainty in Bayesian networks by probabilistic logic sampling. Uncertainty in Artificial Intelligence 2, 149-163, Elsevier Science.
D. Koller and N. Friedman (2009). Probabilistic Graphical Models: Principles and Techniques. The MIT Press.
D. Koller and M. Sahami (1996). Toward optimal feature selection. Proceedings of the 13th International Conference on Machine Learning, 284-292.
J. Kwisthout (2011). Most probable explanations in Bayesian networks: Complexity and tractability. International Journal of Approximate Reasoning, 52(9), 1452-1469.
P. Langley and S. Sage (1994). Induction of selective Bayesian classifiers. Proceedings of the 10th Conference on Uncertainty in Artificial Intelligence, 399-406.
S.L. Lauritzen (1995). The EM algorithm for graphical association models with missing data. Computational Statistics and Data Analysis, 19, 191-201.
S.L. Lauritzen, A.P. Dawid, B.N. Larsen and H.G. Leimer (1990). Independence properties of directed Markov fields. Networks, 20(5), 491-505.
S.L. Lauritzen and D.J. Spiegelhalter (1988). Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society, Series B (Methodological), 50(2), 157-224.
G. Madjarov, D. Kocev, D. Gjorgjevikj and S. Džeroski (2012). An extensive experimental comparison of methods for multi-label learning. Pattern Recognition, 45, 9, 3084-3104.
M. Minsky (1961). Steps toward artificial intelligence. Transactions on Institute of Radio Engineers, 49, 8-30.
U. Nodelman, C.R. Shelton and D. Koller (2002). Continuous time Bayesian networks. Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence, 378-387.
A. Pastink and L.C. van der Gaag (2015). Multi-classifiers of small treewidth. Proceedings of the 13th European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty, Springer, 199-209.
J. Pearl (1987). Evidential reasoning using stochastic simulation of causal models. Artificial Intelligence, 32(2), 245-257.
J. Pearl (1988). Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann.
Y. Peng and J.A. Reggia (1987a). A probabilistic causal model for diagnostic problem solving. Part I: Integrating symbolic causal inference with numeric probabilistic inference. IEEE Transactions on Systems, Man and Cybernetics, 17(2), 146-162.
Y. Peng and J.A. Reggia (1987b). A probabilistic causal model for diagnostic problem solving. Part II: Diagnostic strategy. IEEE Transactions on Systems, Man and Cybernetics, 17(3), 395-406.
J. Read, B. Pfahringer, G. Holmes and E. Frank (2011). Classifier chains for multi-label classification. Machine Learning, 85, 3, 333-359.
S.Y. Rhee, M.J. Gonzales, R. Kantor, J. Betts, J. Ravela and R.W. Shafer (2003). Human immunodeficiency virus reverse transcriptase and protease sequence database. Nucleic Acids Research, 31(1), 298-303.
R. Robinson (1977). Counting unlabeled acyclic digraphs. Lecture Notes in Mathematics, 622, 28-43, Springer.
Y. Saeys, I. Inza and P. Larrañaga (2007). A review of feature selection techniques in bioinformatics. Bioinformatics, 23(19), 2507-2517.
G. Schwarz (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461-464.
R.D. Shachter and M.A. Peot (1989). Simulation approaches to general probabilistic inference on belief networks. Proceedings of the 5th Annual Conference on Uncertainty in Artificial Intelligence, 221-234.
D.J. Spiegelhalter and S.L. Lauritzen (1990). Sequential updating of conditional probabilities on directed graphical structures. Networks, 20, 579-605.
P. Spirtes and C. Glymour (1991). An algorithm for fast recovery of sparse causal graphs. Social Science Computer Review, 9(1), 62-72.
E. Sucar, C. Bielza, E.F. Morales, P. Hernandez-Leal, J.H. Zaragoza and P. Larrañaga (2014). Multi-label classification with Bayesian network-based chain classifiers. Pattern Recognition Letters, 41, 14-22.
L. Tenenboim, L. Rokach and B. Shapira (2010). Identification of label dependencies for multi-label classification. Proceedings of the 2nd International Workshop on Learning from Multi-Label Data, 53-60.
G. Tsoumakas, A. Dimou, E. Spyromitros, V. Mezaris, I. Kompatsiaris and I. Vlahavas (2009). Correlation-based pruning of stacked binary relevance models for multi-label learning. Proceedings of the 1st International Workshop on Learning from Multi-Label Data, 101-116.
G. Tsoumakas, I. Katakis and I. Vlahavas (2010). Random k-labelsets for multi-label classification. IEEE Transactions on Knowledge and Data Engineering, 23, 7, 1079-1089.
G. Tsoumakas and I. Katakis (2007). Multi-label classification: An overview. International Journal of Data Warehousing and Mining, 3, 1-13.
L.C. van der Gaag and P.R. de Waal (2006). Multi-dimensional Bayesian network classifiers. Proceedings of the 3rd European Workshop on Probabilistic Graphical Models, 107-114.
G. Varando, C. Bielza and P. Larrañaga (2015). Decision boundary for discrete Bayesian network classifiers. Journal of Machine Learning Research, 16, 2725-2749.
G.I. Webb, J. Boughton and Z. Wang (2002). Not so naive Bayes: Aggregating one-dependence estimators. Machine Learning, 58, 5-24.
M.L. Zhang, J.M. Peña and V. Robles (2009). Feature selection for multi-label naive Bayes classification. Information Sciences, 179(19), 3218-3229.
N. Zhang and D. Poole (1994). A simple approach to Bayesian network computations. Proceedings of the 10th Biennial Canadian Conference on Artificial Intelligence, 171-178.
M.L. Zhang and Z.H. Zhou (2006). Multilabel neural networks with applications to functional genomics and text categorization. IEEE Transactions on Knowledge and Data Engineering, 18, 10, 1338-1351.
M.L. Zhang and Z.H. Zhou (2007). ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognition, 40, 7, 2038-2048.
M.L. Zhang and Z.H. Zhou (2014). A review on multi-label learning algorithms. IEEE Transactions on Knowledge and Data Engineering, 26, 8, 1819-1837.