Learning Maximum Likelihood Bounded Semi-Naïve Bayesian Network Classifier
Kaizhu Huang, Irwin King, Michael R. Lyu
Multimedia Information Processing Laboratory
The Chinese University of Hong KongShatin, NT. Hong Kong
{kzhuang, king, lyu}@cse.cuhk.edu.hk
SMC2002, October 8, 2002Hammamet, Tunisia
SMC 2002, October 8, 2002 The Chinese University of Hong Kong Multimedia Information Processing Lab
Outline
Abstract
Background
Classifiers: Naïve Bayesian Classifiers, Semi-Naïve Bayesian Classifiers, Chow-Liu Tree
Bounded Semi-Naïve Bayesian Classifiers
Experimental Results
Discussion
Conclusion
Abstract
We propose a technique for constructing semi-naïve Bayesian classifiers in which the number of variables that can be combined into a node is bounded.
It has a lower computational cost than traditional semi-naïve Bayesian networks.
Experiments show the proposed technique is more accurate.
A Typical Classification Problem
Given a set of symptoms, one wants to find out whether these symptoms give rise to a particular disease.
Classifiers
Given a pre-classified dataset D = {(x1, c1), ..., (xN, cN)}, where xi ∈ R^m is the training data in m-dimensional real space and ci is the class label.
A classifier is defined as a mapping function f: R^m → {C1, ..., Ck} that satisfies f(xi) = ci.
Background
Probabilistic Classifiers
The classification mapping function is defined as:
c(x) = argmax_C P(C | x) = argmax_C P(x | C) P(C) / P(x),
where P(x) is a constant for a given x.
The joint probability P(x | C) is not easily estimated from the dataset, so an assumption about the distribution has to be made, e.g., are the attributes dependent or independent?
Background
Naïve Bayesian Classifiers (NB)
Assumption: given the class label C, the attributes x1, ..., xm are independent:
P(x | C) = Π_{i=1..m} P(xi | C).
Classification mapping function:
c(x) = argmax_C P(C) Π_{i=1..m} P(xi | C).
Related Work
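The NB mapping function above can be sketched in a few lines. This is a hedged illustration, not the authors' implementation: a discrete naïve Bayes with Laplace smoothing, where all function names and the toy symptom data are invented for the example.

```python
from collections import Counter, defaultdict
from math import log

def train_nb(X, y, alpha=1.0):
    """Estimate P(C) and P(x_i | C) from discrete data with Laplace smoothing."""
    n = len(X[0])
    class_counts = Counter(y)
    cond = defaultdict(Counter)          # (class, attribute index) -> value counts
    values = [set() for _ in range(n)]   # observed values per attribute
    for xs, c in zip(X, y):
        for i, v in enumerate(xs):
            cond[(c, i)][v] += 1
            values[i].add(v)
    return class_counts, cond, values, alpha

def predict_nb(model, x):
    """argmax_C  log P(C) + sum_i log P(x_i | C)."""
    class_counts, cond, values, alpha = model
    total = sum(class_counts.values())
    best, best_lp = None, float("-inf")
    for c, cc in class_counts.items():
        lp = log(cc / total)
        for i, v in enumerate(x):
            num = cond[(c, i)][v] + alpha
            den = cc + alpha * len(values[i])
            lp += log(num / den)
        if lp > best_lp:
            best, best_lp = c, lp
    return best

# hypothetical symptoms -> disease data, echoing the earlier classification example
X = [("fever", "cough"), ("fever", "no_cough"),
     ("no_fever", "cough"), ("no_fever", "no_cough")]
y = ["flu", "flu", "cold", "healthy"]
model = train_nb(X, y)
print(predict_nb(model, ("fever", "cough")))   # most probable class under NB
```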
Related Work
Naïve Bayesian Classifiers: NB's performance is comparable with some state-of-the-art classifiers even when its independence assumption does not hold.
Question: can the performance be improved when the conditional independence assumption of NB is relaxed?
Semi-Naïve Bayesian Classifiers (SNB)
A looser assumption than NB: independence holds among the joined variables, given the class label C.
Related Work
Related Work
Chow-Liu Tree (CLT)
Another looser assumption than NB: a dependence tree exists among the variables, given the class variable C.
(Figure: a tree dependence structure.)
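The classical CLT construction builds a maximum-weight spanning tree over pairwise mutual information. A minimal sketch, assuming discrete samples given as tuples; the function names and sample data are illustrative, not from the paper:

```python
from collections import Counter
from math import log
from itertools import combinations

def mutual_information(samples, i, j):
    """Empirical I(X_i; X_j) from a list of discrete sample tuples."""
    n = len(samples)
    pi = Counter(s[i] for s in samples)
    pj = Counter(s[j] for s in samples)
    pij = Counter((s[i], s[j]) for s in samples)
    mi = 0.0
    for (a, b), c in pij.items():
        mi += (c / n) * log((c / n) / ((pi[a] / n) * (pj[b] / n)))
    return mi

def chow_liu_tree(samples, n_vars):
    """Maximum-weight spanning tree over pairwise MI (Kruskal with union-find)."""
    edges = sorted(((mutual_information(samples, i, j), i, j)
                    for i, j in combinations(range(n_vars), 2)), reverse=True)
    parent = list(range(n_vars))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]   # path halving
            u = parent[u]
        return u
    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree

# X2 copies X0, while X1 is unrelated: the tree should contain the edge (0, 2)
samples = [(0, 1, 0), (1, 0, 1), (0, 0, 0), (1, 1, 1), (0, 1, 0), (1, 0, 1)]
tree = chow_liu_tree(samples, 3)
print(tree)
```

With n variables the tree has n − 1 edges, so here it links the strongly dependent pair (0, 2) plus one weaker edge.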
Summary of Related Work
CLT: a conditional tree dependency assumption among variables; Chow & Liu (1968) developed a globally optimal, polynomial-time algorithm.
SNB: a conditional independence assumption among joined variables; traditional SNBs are not as well developed as CLT.
Problems of Traditional SNBs
Kononenko (1991): local heuristic; inefficient even when joining 3 variables.
Pazzani (1996): local heuristic; exponential time cost.
Neither approach is both efficient and accurate.
Our Novel Bounded Semi-Naïve Bayesian Network
Accurate? We use a global combinatorial optimization method.
Efficient? We find the network based on Linear Programming,
which can be solved in polynomial time.
Bounded Semi-Naïve Bayesian Network Model Definition
Joined variables
Completely covering the variable set without overlapping
Conditional independence
Bounded cardinality
Constraining the Search Space
The search space is large. It is reduced by adding the following constraint: the cardinality of each joined variable is exactly K.
Underlying principle: when K is small, a single joined variable of cardinality K approximates the distribution more accurately than splitting those variables into several smaller joined variables. Example: P(a,b)P(c,d) is closer to P(a,b,c,d) than P(a,b)P(c)P(d).
Search space after reduction:
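The stated principle can be checked numerically: for a distribution where (a,b) and (c,d) are internally correlated, the grouped approximation P(a,b)P(c,d) has lower KL divergence from the true joint than P(a,b)P(c)P(d). A small sketch; the distribution and all names are invented for illustration:

```python
from math import log
from itertools import product

# hypothetical joint: the pairs (a,b) and (c,d) are independent of each other,
# but within each pair the two bits are strongly correlated
pair = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
joint = {(a, b, c, d): pair[(a, b)] * pair[(c, d)]
         for a, b, c, d in product((0, 1), repeat=4)}

def marginal(dist, idxs):
    """Marginal distribution over the variables at positions idxs."""
    m = {}
    for xs, p in dist.items():
        key = tuple(xs[i] for i in idxs)
        m[key] = m.get(key, 0.0) + p
    return m

def kl_to_product(dist, groups):
    """KL( P || product over groups of P(group) )."""
    margs = [marginal(dist, g) for g in groups]
    kl = 0.0
    for xs, p in dist.items():
        q = 1.0
        for g, m in zip(groups, margs):
            q *= m[tuple(xs[i] for i in g)]
        kl += p * log(p / q)
    return kl

kl_pairs = kl_to_product(joint, [(0, 1), (2, 3)])      # P(a,b)P(c,d)
kl_split = kl_to_product(joint, [(0, 1), (2,), (3,)])  # P(a,b)P(c)P(d)
print(round(kl_pairs, 6), round(kl_split, 6))          # first is (near) zero
```

Here kl_pairs is essentially 0 (the grouping matches the true structure), while kl_split equals the mutual information I(c; d) thrown away by splitting the pair.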
Searching the K-Bounded-SNB Model
How to search for the appropriate model? Find the m = [n/K] K-cardinality subsets (joined variables) of the variable (feature) set which satisfy the SNB conditions and maximize the log-likelihood.
([x] means rounding x to the nearest integer.)
Global Optimization Procedure
Constraints: no overlap among the joined variables; together, the joined variables cover the whole variable set.
Relax the previous 0/1 constraints into 0 ≤ x ≤ 1: the integer programming (IP) problem is changed into a linear programming (LP) problem.
Rounding scheme: round the LP solution into an IP solution.
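The paper solves this search via the LP relaxation and rounding. As a stand-in that is only feasible for tiny n, the same objective (maximize the sum of per-group log-likelihood terms, i.e., negative empirical entropies) can be optimized by brute-force partition search. Everything below is an illustrative assumption, not the authors' algorithm:

```python
from collections import Counter
from math import log
from itertools import combinations

def neg_entropy(samples, subset):
    """Log-likelihood contribution of one joined variable: -H of its empirical joint."""
    n = len(samples)
    counts = Counter(tuple(s[i] for i in subset) for s in samples)
    return sum((c / n) * log(c / n) for c in counts.values())

def best_partition(samples, variables, K):
    """Exhaustively search partitions of `variables` into K-subsets,
    maximizing the total log-likelihood (the IP that the LP relaxes)."""
    if not variables:
        return [], 0.0
    first = variables[0]
    best, best_score = None, float("-inf")
    for rest in combinations(variables[1:], K - 1):
        subset = (first,) + rest
        remaining = [v for v in variables[1:] if v not in rest]
        sub_part, sub_score = best_partition(samples, remaining, K)
        score = neg_entropy(samples, subset) + sub_score
        if score > best_score:
            best, best_score = [subset] + sub_part, score
    return best, best_score

# 4 binary variables; 0 and 1 are copies, 2 and 3 are copies
samples = [(0, 0, 1, 1), (1, 1, 0, 0), (0, 0, 0, 0), (1, 1, 1, 1)] * 3
part, score = best_partition(samples, [0, 1, 2, 3], 2)
print(part)   # groups the correlated variables: [(0, 1), (2, 3)]
```

The brute force enumerates all valid partitions, which grows super-exponentially in n; the LP relaxation in the paper is what makes the search polynomial.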
Rounding Scheme
Experimental Setup
Datasets: 6 benchmark datasets from the UCI machine learning repository, and 1 synthetically generated dataset named “XOR”.
Experimental environment: Platform: Windows 2000; developing tool: Matlab 6.1.
Overall Prediction Rate (%)
We set the bound parameter K to 2 and 3. “2-BSNB” means the BSNB model with the bound parameter set to 2.
Experimental Results
Experimental Results
(Figure: average error rate chart comparing NB, CLT, 2-BSNB, and 3-BSNB; y-axis: error rate.)
Results on Tic-Tac-Toe Dataset
(Figure: 3×3 board showing positions 1–9, the 9 attributes of the Tic-Tac-Toe dataset.)
Observations
B-SNBs with a large K are not good for sparse datasets. Post dataset: 90 samples; with K = 3 the accuracy decreases.
Which value of K is good depends on the properties of the dataset. For example, Tic-Tac-Toe and Vehicle have a 3-variable bias; with K = 3 the accuracy increases.
Discussion
When n cannot be divided exactly by K, i.e., (n mod K) = l with l ≠ 0, the assumption that all joined variables have the same cardinality K is violated.
Solution: find an l-cardinality joined variable with minimum entropy, then do the optimization on the other n − l variables, since (n − l) mod K = 0.
How to choose K? When the number of samples in the dataset is small, a large K may not achieve good performance. A good K should be related to the nature of the dataset.
How to relax SNB further? SNB is still strongly constrained; it can be upgraded into a mixture of SNBs.
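The remainder-handling idea (set aside a minimum-entropy l-subset when n mod K = l ≠ 0) might be sketched as follows; the helper names and data are hypothetical:

```python
from collections import Counter
from math import log
from itertools import combinations

def entropy(samples, subset):
    """Empirical joint entropy of the variables in `subset`."""
    n = len(samples)
    counts = Counter(tuple(s[i] for i in subset) for s in samples)
    return -sum((c / n) * log(c / n) for c in counts.values())

def split_remainder(samples, n_vars, K):
    """When n mod K = l > 0: set aside the l-subset with minimum entropy,
    leaving the remaining n - l variables for the K-bounded optimization."""
    l = n_vars % K
    if l == 0:
        return (), list(range(n_vars))
    best = min(combinations(range(n_vars), l), key=lambda s: entropy(samples, s))
    rest = [v for v in range(n_vars) if v not in best]
    return best, rest

# 5 variables, K = 2: variable 4 is constant, so it has the lowest entropy
samples = [(0, 1, 0, 1, 0), (1, 0, 1, 0, 0), (0, 0, 1, 1, 0), (1, 1, 0, 0, 0)]
head, rest = split_remainder(samples, 5, 2)
print(head, rest)   # the constant variable 4 is set aside
```

Choosing the minimum-entropy subset sacrifices the least modeling power, since a low-entropy joined variable is the one whose distribution is easiest to capture on its own.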
Conclusion
A novel Bounded Semi-Naïve Bayesian classifier is proposed.
The direct combinatorial optimization method enables B-SNB to achieve global optimization.
The transformation from an IP problem into an LP problem reduces the computational complexity to polynomial time.
B-SNB outperforms NB and CLT in our experiments.