Full Bayesian Network Classifiers
by Jiang Su and Harry Zhang

Flemming Jensen

November 2008

Source slides: people.cs.aau.dk/~tdn/Teaching/MI2008/Slides/FJ_2.pdf

Purpose

To introduce the full Bayesian network classifier (FBC).

Introduction

Bayesian networks are often used for the classification problem, where a learner attempts to construct a classifier from a given set of labeled training examples.

Since the number of possible network structures is extremely large, structure learning often has high computational complexity.

The idea behind the full Bayesian network classifier is to reduce the computational complexity of structure learning by using a full Bayesian network as the structure, and to represent variable independence in the conditional probability tables instead of in the network structure.

We use decision trees to represent the conditional probability tables, keeping a compact representation of the joint distribution.

Variable Independence

Definition - Conditional independence

Let X, Y, Z be subsets of the variable set W. The subsets X and Y are conditionally independent given Z if:

P(X | Y, Z) = P(X | Z)

Definition - Contextual independence

Let X, Y, Z, T be disjoint subsets of the variable set W. The subsets X and Y are contextually independent given Z and the context t (a particular assignment of T) if:

P(X | Y, Z, t) = P(X | Z, t)
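
To see the difference concretely, here is a minimal Python sketch (not from the slides; the joint distribution is invented for illustration). It builds a toy distribution over binary X, Y, Z in which X and Y are dependent given Z in general, but contextually independent in the context z = 0:

    from itertools import product

    # Invented joint distribution P(x, y, z) over three binary variables:
    # under z = 0, X and Y are independent; under z = 1 they are not.
    joint = {}
    for x, y, z in product([0, 1], repeat=3):
        if z == 0:
            joint[(x, y, z)] = 0.5 * (0.3 if x == 0 else 0.7) * 0.5
        else:
            joint[(x, y, z)] = 0.5 * (0.4 if x == y else 0.1)

    def p(assign):
        # Probability of a partial assignment, e.g. {"x": 0, "z": 1}.
        idx = {"x": 0, "y": 1, "z": 2}
        return sum(v for k, v in joint.items()
                   if all(k[idx[var]] == val for var, val in assign.items()))

    def p_cond(target, given):
        return p({**target, **given}) / p(given)

    # Conditional independence P(X | Y, Z) = P(X | Z) fails: 0.8 vs 0.5.
    print(p_cond({"x": 0}, {"y": 0, "z": 1}), p_cond({"x": 0}, {"z": 1}))
    # Contextual independence holds in the context z = 0: both ~0.3.
    print(p_cond({"x": 0}, {"y": 0, "z": 0}), p_cond({"x": 0}, {"z": 0}))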

Existence

Theorem - Existence

For any BN B, there exists an FBC FB, such that B and FB encode the same variable independencies.

Proof:

Since B is an acyclic graph, the nodes of B can be sorted on the basis of a topological ordering.

Go through each node X in the topological ordering, and add arcs to all the nodes ranked after X.

The resulting network FB is a full BN.

Build a CPT-tree for each node X in FB, such that any variable that is not in the parent set ΠX of X in B does not occur in the CPT-tree of X in FB.
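
As a minimal sketch of the constructive step (names and the example BN are mine, not from the slides), the ordering-and-arc-adding part can be written as:

    from graphlib import TopologicalSorter  # Python 3.9+

    # An example BN given as node -> set of parents (invented for illustration).
    bn_parents = {"C": set(), "X1": {"C"}, "X2": {"C"}}

    # Topologically order the nodes, then give every node an arc from each
    # node ranked before it: the result is a full BN over the same nodes.
    order = list(TopologicalSorter(bn_parents).static_order())
    full_bn_parents = {x: set(order[:i]) for i, x in enumerate(order)}
    print(order)            # e.g. ['C', 'X1', 'X2']
    print(full_bn_parents)  # the last-ranked feature now also has the other as parent

The CPT-tree restriction (keeping variables outside ΠX out of X's tree) is what lets FB discard the extra arcs' influence and encode exactly B's independencies.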

Example - FBC for Naive Bayes

Example of a naive Bayes network:

[Figure: class node C with arcs from C to each of the feature nodes X1, X2, X3, X4.]

Example of an FBC for the naive Bayes:

[Figure: the same network with arcs added among the feature nodes X1, X2, X3, X4 so that the structure is full; each Xi carries a CPT-tree whose only internal node is C, with leaf probabilities pi1, pi2, pi3, pi4.]

Learning Full Bayesian Network Classifiers

Learning an FBC consists of two parts:

Construction of a full BN.

Learning of decision trees to represent the CPT of each variable.

The full BN is implemented using a Bayesian multinet.

Definition - Bayesian multinet

A Bayesian multinet is a set of Bayesian networks, each of which corresponds to a value c of the class variable C.
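
In code, a natural representation is a mapping from each class value to its own per-class network. A minimal sketch with invented names (the c1 parent sets anticipate the worked example later in the deck):

    # A Bayesian multinet as a plain mapping: one full BN per class value.
    # Each per-class BN is itself a dict {variable: set of parents}.
    multinet = {
        "c1": {"B": set(), "A": {"B"}, "D": {"B", "A"}},  # learned from the C = c1 data
        "c2": {},  # filled in analogously from the C = c2 subset
    }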

Structure Learning

Learning the structure of a full BN actually means learning an order of the variables and then adding arcs from a variable to all the variables ranked after it.

A variable is ranked based on its total influence on the other variables.

The influence (dependency) between two variables can be measured by mutual information.

Definition - Mutual information

Let X and Y be two variables in a Bayesian network. The mutual information is defined as:

M(X; Y) = ∑_{x∈X, y∈Y} P(x, y) · log( P(x, y) / (P(x) · P(y)) )
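
A minimal sketch of this estimate from data (my code, not the authors'). The worked example later in the deck only matches base-10 logarithms, so log10 is used here:

    import math
    from collections import Counter

    def mutual_information(pairs):
        # pairs: list of observed (x, y) value pairs for the two variables.
        n = len(pairs)
        pxy = Counter(pairs)
        px = Counter(x for x, _ in pairs)
        py = Counter(y for _, y in pairs)
        return sum((nxy / n) * math.log10((nxy / n) / ((px[x] / n) * (py[y] / n)))
                   for (x, y), nxy in pxy.items())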

Structure Learning

It is possible that the dependency between two variables, measured by mutual information, is caused merely by noise.

Results by Friedman are used as a dependency threshold to filter out unreliable dependencies.

Definition - Dependency threshold

Let Xi and Xj be two variables in a Bayesian network, and let N be the number of training instances. The dependency threshold, denoted by φ, is defined as:

φ(Xi, Xj) = (log N / (2N)) × Tij, where Tij = |Xi| × |Xj|.
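
The threshold is a one-liner; this sketch (same base-10 assumption as above) reproduces the 0.013 used in the worked example:

    import math

    def dependency_threshold(n_instances, card_i, card_j):
        # phi(Xi, Xj) = (log N / (2N)) * |Xi| * |Xj|
        return (math.log10(n_instances) / (2 * n_instances)) * card_i * card_j

    print(round(dependency_threshold(400, 2, 2), 3))  # 0.013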

Structure Learning

The total influence of a variable on the other variables can now be defined:

Definition - Total influence

Let Xi be a variable in a Bayesian network. The total influence of Xi on the other variables, denoted by W(Xi), is defined as:

W(Xi) = ∑_{j ≠ i : M(Xi; Xj) > φ(Xi, Xj)} M(Xi; Xj)
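
Combining the two, W(Xi) sums only the mutual informations that survive the threshold. A minimal sketch, checked against the numbers from the worked example later in the deck (keying by unordered pair is my choice):

    def total_influence(i, variables, mi, phi):
        # mi, phi: dicts keyed by frozenset({Xi, Xj}).
        return sum(mi[frozenset((i, j))] for j in variables
                   if j != i and mi[frozenset((i, j))] > phi[frozenset((i, j))])

    variables = ["A", "B", "D"]
    mi  = {frozenset("AB"): 0.027, frozenset("AD"): 0.004, frozenset("BD"): 0.018}
    phi = {k: 0.013 for k in mi}
    print({v: total_influence(v, variables, mi, phi) for v in variables})
    # A: 0.027, B: 0.045, D: 0.018 (up to float rounding)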

Structure Learning Algorithm

Algorithm FBC-Structure(S, X)

1. B = empty.

2. Partition the training data S into |C| subsets Sc by the class value c.

3. For each training data set Sc:
   - Compute the mutual information M(Xi; Xj) and the dependency threshold φ(Xi, Xj) between each pair of variables Xi and Xj.
   - Compute W(Xi) for each variable Xi.
   - For all variables Xi in X:
     - Add all the variables Xj with W(Xj) > W(Xi) to the parent set ΠXi of Xi.
     - Add arcs from all the variables Xj in ΠXi to Xi.
   - Add the resulting network Bc to B.

4. Return B.
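
Put together, the whole structure-learning step fits in a short function. This is a sketch under my own conventions, reusing the mutual_information, dependency_threshold and total_influence helpers above; rows are dicts such as {"C": "c1", "A": "a1", ...}:

    from itertools import combinations

    def fbc_structure(rows, feature_vars, class_var="C"):
        multinet = {}
        for c in {r[class_var] for r in rows}:
            sub = [r for r in rows if r[class_var] == c]
            mi, phi = {}, {}
            for xi, xj in combinations(feature_vars, 2):
                key = frozenset((xi, xj))
                mi[key] = mutual_information([(r[xi], r[xj]) for r in sub])
                phi[key] = dependency_threshold(len(sub),
                                                len({r[xi] for r in sub}),
                                                len({r[xj] for r in sub}))
            w = {v: total_influence(v, feature_vars, mi, phi) for v in feature_vars}
            # Parent set of Xi: every Xj whose total influence is larger.
            multinet[c] = {xi: {xj for xj in feature_vars if w[xj] > w[xi]}
                           for xi in feature_vars}
        return multinet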

Example - Structure Learning Algorithm

Example using 1000 labeled instances, where C is the class variable and A, B, and D are feature variables.

C    A    B    D     #
c1   a1   b1   d1    11
c1   a1   b1   d2     5
c1   a1   b2   d1     7
c1   a1   b2   d2    17
c1   a2   b1   d1   227
c1   a2   b1   d2    97
c1   a2   b2   d1    11
c1   a2   b2   d2    25

C    A    B    D     #
c2   a1   b1   d1    36
c2   a1   b1   d2    36
c2   a1   b2   d1   259
c2   a1   b2   d2    29
c2   a2   b1   d1    96
c2   a2   b1   d2    96
c2   a2   b2   d1    43
c2   a2   b2   d2     5

Consider the 400 data instances where C = c1. First, estimate the joint distribution P(A, B) from the counts:

P(A, B)    b1                       b2
a1         (11+5)/400 = 0.04        (7+17)/400 = 0.06
a2         (227+97)/400 = 0.81      (11+25)/400 = 0.09

The product of the marginals is:

P(A)P(B)   b1                                b2
a1         (0.04+0.06)·(0.04+0.81) = 0.085   (0.04+0.06)·(0.06+0.09) = 0.015
a2         (0.81+0.09)·(0.04+0.81) = 0.765   (0.81+0.09)·(0.06+0.09) = 0.135

Plugging these into the definition of mutual information:

M(A; B) = 0.04 · log(0.04/0.085) + 0.81 · log(0.81/0.765) + 0.06 · log(0.06/0.015) + 0.09 · log(0.09/0.135) = 0.027

Mutual information:

M(A; B) = 0.027
M(A; D) = 0.004
M(B; D) = 0.018

Dependency threshold:

φ(Xi, Xj) = (log N / (2N)) × Tij
φ(A, B) = φ(A, D) = φ(B, D) = 4 · log(400) / 800 = 0.013

Total influence:

W(Xi) = ∑_{j ≠ i : M(Xi; Xj) > φ(Xi, Xj)} M(Xi; Xj)

W(A) = M(A; B) = 0.027
W(B) = M(A; B) + M(B; D) = 0.045
W(D) = M(B; D) = 0.018

(M(A; D) = 0.004 falls below the threshold 0.013 and is therefore excluded from every sum.)
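
The arithmetic above is easy to verify mechanically; this standalone check (my code, base-10 logs) reproduces M(A; B) and the threshold from the C = c1 counts:

    import math

    counts = {("a1", "b1"): 16, ("a1", "b2"): 24, ("a2", "b1"): 324, ("a2", "b2"): 36}
    n = sum(counts.values())  # 400

    pab = {k: v / n for k, v in counts.items()}
    pa = {a: sum(v for (x, _), v in pab.items() if x == a) for a in ("a1", "a2")}
    pb = {b: sum(v for (_, y), v in pab.items() if y == b) for b in ("b1", "b2")}

    m_ab = sum(p * math.log10(p / (pa[a] * pb[b])) for (a, b), p in pab.items())
    phi = (math.log10(n) / (2 * n)) * 2 * 2  # |A| = |B| = 2
    print(round(m_ab, 3), round(phi, 3))     # 0.027 0.013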

We now construct a full Bayesian network with the variables ordered by their total influence values:

W(A) = 0.027, W(B) = 0.045, W(D) = 0.018

W(B) > W(A) > W(D)

[Figure: the full BN Bc1 over the order B, A, D, with arcs B → A, B → D, and A → D.]

We now have the full Bayesian network Bc1, which is the part of the multinet that corresponds to C = c1. We now repeat the process on the C = c2 subset to construct Bc2, thereby completing the FBC structure learning.

CPT-tree Learning

We now need to learn a CPT-tree for each variable in the full BN.

A traditional decision tree learning algorithm, such as C4.5, could be used to learn the CPT-trees. However, since its time complexity is typically O(n² · N), the resulting FBC learning algorithm would have a complexity of O(n³ · N).

Instead, a fast decision tree learning algorithm is proposed.

The algorithm uses the mutual information to determine a fixed ordering of the variables from root to leaves.

The predefined variable ordering makes the algorithm faster than traditional decision tree learning algorithms.

CPT-tree Learning Algorithm

Algorithm Fast-CPT-Tree(ΠXi, S)

1 Create an empty tree T.
2 If (S is pure or empty) or (ΠXi is empty)
    Return T.
3 qualified = False.
4 While (qualified == False) and (ΠXi is not empty)
    Choose the variable Xj with the highest M(Xj; Xi).
    Remove Xj from ΠXi.
    Compute the local mutual information MS(Xi; Xj) on S.
    Compute the local dependency threshold φS(Xi, Xj) on S.
    If MS(Xi; Xj) > φS(Xi, Xj), qualified = True.
5 If qualified == True
    Create a root Xj for T.
    Partition S into disjoint subsets Sx, where x is a value of Xj.
    For all values x of Xj:
      - Tx = Fast-CPT-Tree(ΠXi, Sx)
      - Add Tx as a child of Xj.
6 Return T.
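
Below is a hedged Python sketch of these steps (not code from the paper). Instances are assumed to be dicts of discrete attribute values, and dependency_threshold is only an illustrative stand-in, since the actual φS formula is the one defined earlier in the talk:

import math
from collections import Counter

def mutual_information(S, xi, xj):
    """Local mutual information MS(Xi; Xj) estimated on the sample S."""
    n = len(S)
    pi = Counter(r[xi] for r in S)
    pj = Counter(r[xj] for r in S)
    pij = Counter((r[xi], r[xj]) for r in S)
    return sum((c / n) * math.log(c * n / (pi[vi] * pj[vj]))
               for (vi, vj), c in pij.items())

def dependency_threshold(S, xi, xj):
    """Illustrative stand-in for phiS(Xi, Xj); substitute the talk's formula."""
    return math.log(len(S)) / (2 * len(S))

class CPTNode:
    def __init__(self, var=None):
        self.var = var        # splitting variable; None means leaf
        self.children = {}    # value of var -> CPTNode subtree

def fast_cpt_tree(parents, S, xi, global_mi):
    """Fast-CPT-Tree(Pi_Xi, S): parents is Pi_Xi, global_mi[xj] holds the
    precomputed M(Xj; Xi) that fixes the root-to-leaf ordering."""
    tree = CPTNode()                                     # step 1
    if not S or len({r[xi] for r in S}) <= 1 or not parents:
        return tree                                      # step 2: pure/empty
    pool = sorted(parents, key=lambda v: global_mi[v], reverse=True)
    qualified, xj = False, None                          # step 3
    while not qualified and pool:                        # step 4
        xj = pool.pop(0)                                 # highest M(Xj; Xi) first
        if mutual_information(S, xi, xj) > dependency_threshold(S, xi, xj):
            qualified = True
    if qualified:                                        # step 5
        tree.var = xj
        for x in {r[xj] for r in S}:                     # partition S on Xj
            Sx = [r for r in S if r[xj] == x]
            tree.children[x] = fast_cpt_tree(pool, Sx, xi, global_mi)
    return tree                                          # step 6

Note how the candidate order is computed once from global_mi and each attribute is popped at most once per root-to-leaf path, rather than being re-scored at every node; this fixed ordering is what the slides credit for the speedup.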


Example - CPT-tree Learning Algorithm

We construct the CPT-tree for the variable D first.

Fast-CPT-Tree(ΠD = {A, B}, S)

M(D; B) = 0.018 > M(D; A) = 0.004, so Xj = B.
MS(D; B) = M(D; B) = 0.018 and φS(D, B) = φ(D, B) = 0.013.
MS(D; B) > φS(D, B), so qualified = True.
Since qualified == True, create a root for Xj = B and partition S into the subsets Sb1 and Sb2.
Recursively call Fast-CPT-Tree(ΠD = {A}, Sb1) and Fast-CPT-Tree(ΠD = {A}, Sb2), and add the resulting trees as children of Xj = B.

[CPT-tree so far: root B with two branches, b1 and b2]


Example - CPT-tree Learning Algorithm

Fast-CPT-Tree(ΠD = {A}, Sb1)

Only one parent variable remains, so Xj = A.

MSb1(D; A) = 7 · 10⁻⁶ and φSb1(D, A) = 0.015.
MSb1(D; A) ≯ φSb1(D, A), so qualified = False.

Since qualified == False, return the empty tree.

Fast-CPT-Tree(ΠD = {A}, Sb2)

Only one parent variable remains, so Xj = A.

MSb2(D; A) = 4 · 10⁻⁵ and φSb2(D, A) = 0.059.
MSb2(D; A) ≯ φSb2(D, A), so qualified = False.

Since qualified == False, return the empty tree.
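
To tie this trace to the algorithm sketch given earlier, a hypothetical call is shown below. The four-instance sample is invented purely for illustration (the deck's actual data table appears on an earlier slide), so only the call shape and the qualitative outcome, not the numbers, carry over:

# Hypothetical usage of the fast_cpt_tree sketch; the sample S is invented.
S = [
    {"A": "a1", "B": "b1", "D": "d1"},
    {"A": "a2", "B": "b1", "D": "d1"},
    {"A": "a1", "B": "b2", "D": "d2"},
    {"A": "a2", "B": "b2", "D": "d2"},
]
global_mi = {"B": 0.018, "A": 0.004}   # M(D; B) and M(D; A) from the slides

tree = fast_cpt_tree(["A", "B"], S, "D", global_mi)
print(tree.var)               # 'B': tried first and qualifies, as in the trace
print(sorted(tree.children))  # ['b1', 'b2']; both subtrees are leaves here,
                              # matching the deck, where A never qualifies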


Example - CPT-tree Learning Algorithm

We now only need to add Xi = D as a child of each branch of B and specify the probabilities, which are trivial to calculate.

[Finished CPT-tree: root B; branch b1 leads to a D leaf with P(d1 | b1) = 0.7 and P(d2 | b1) = 0.3; branch b2 leads to a D leaf with P(d1 | b2) = 0.3 and P(d2 | b2) = 0.7. The slides compute these from smoothed counts.]

We should repeat this process for each variable in each network.
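
As a sketch of the final counting step: the exact smoothing constants behind the 0.7/0.3 ratios did not survive the extraction, so the Laplace correction below is only an assumed stand-in for whatever smoothing the slides use:

from collections import Counter

def leaf_distribution(S_leaf, xi, values):
    """P(Xi | leaf) from the instances reaching the leaf, Laplace-smoothed
    (+1 per value; an assumption, not necessarily the slides' smoothing)."""
    counts = Counter(r[xi] for r in S_leaf)
    total = len(S_leaf) + len(values)
    return {v: (counts[v] + 1) / total for v in values}

# E.g. the b1 leaf should come out near P(d1 | b1) = 0.7, P(d2 | b1) = 0.3.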


Complexity

Let n be the number of variables and N the number of data instances.

FBC-Structure has time complexity O(n² · N).

Fast-CPT-Tree has time complexity O(n · N). Fast-CPT-Tree is called once for each variable in each of the |C| multinet parts, and each part is learned from roughly N/|C| instances. Hence the time complexity: O(|C| · n² · N/|C|) = O(n² · N).

Thus, the FBC learning algorithm has the time complexity O(n² · N).


Experiments - Results

33 UCI data sets, available in Weka, are used for the experiments.

Performance of an algorithm on each data set is observed via 10 runs of 10-fold cross-validation.

A two-tailed t-test at the 95% confidence level is conducted to compare each pair of algorithms on each data set.

Results on accuracy - classification (data sets won/draw/lost by FBC):

        AODE    HGC     TAN     NBT     C4.5     SMO
FBC     8/22/3  4/27/2  6/27/0  6/27/0  11/19/3  6/24/2

Results on AUC - ranking (data sets won/draw/lost by FBC):

        AODE    HGC     TAN     NBT     C4.5L    SMO
FBC     7/22/4  6/25/2  9/24/0  8/24/1  25/7/1   10/20/3


Experiments - Complexity

Complexity of the tested algorithms:

        Training      Classification
FBC     O(n² · N)     O(n)
AODE    O(n² · N)     O(n²)
HGC     O(n⁴ · N)     O(n)
TAN     O(n² · N)     O(n)
NBT     O(n³ · N)     O(n)
C4.5    O(n² · N)     O(n)
SMO     O(n^2.3)      O(n)

Page 149: Full Bayesian Network Classifiers by Jiang Su and Harry Zhangpeople.cs.aau.dk/~tdn/Teaching/MI2008/Slides/FJ_2.pdfBayesian network as the structure, and represent variable independence

Experiments - Conclusion

FBC demonstrates good performance in both classification and ranking.

FBC is among the most efficient algorithms in both training and classification time.

Overall, the performance of FBC is the best among the algorithms compared.
