Discriminative Training of Chow-Liu Tree Multinet Classifiers
Discriminative Training of Chow-Liu Tree Multinet Classifiers
Huang, Kaizhu
Dept. of Computer Science and Engineering, CUHK
Outline
– Background
  » Classifiers: discriminative classifiers, generative classifiers
  » Bayesian Multinet Classifiers
– Motivation
– Discriminative Bayesian Multinet Classifiers
– Experiments
– Conclusion
Discriminative Classifiers
Directly maximize a discriminative function.
Example: SVM
Generative Classifiers
Estimate the distribution for each class, and then use the Bayes rule to perform classification:
P1(x|C1)
P2(x|C2)
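As a minimal sketch of this two-step scheme (the Gaussian densities, their parameters, and the priors below are illustrative assumptions, not from the slides), the Bayes rule picks the class with the larger posterior P(Ck | x) ∝ P(x | Ck) P(Ck):

```python
import numpy as np

# Hypothetical class-conditional densities P1(x|C1) and P2(x|C2),
# modeled here as 1-D Gaussians with unit variance (illustrative only).
def p1(x):
    """P1(x|C1): Gaussian with mean 0."""
    return np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)

def p2(x):
    """P2(x|C2): Gaussian with mean 2."""
    return np.exp(-0.5 * (x - 2) ** 2) / np.sqrt(2 * np.pi)

def classify(x, prior1=0.5, prior2=0.5):
    """Bayes rule: return the class with the larger posterior,
    P(Ck|x) proportional to P(x|Ck) * P(Ck)."""
    return 1 if p1(x) * prior1 >= p2(x) * prior2 else 2

print(classify(-0.5))  # near the mean of C1 -> 1
print(classify(2.3))   # near the mean of C2 -> 2
```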
Comparison
Example of Missing Information:
From left to right: Original digit, Cropped and resized digit, 50% missing digit, 75% missing digit, and occluded digit.
Comparison (Continued)
Discriminative classifiers cannot easily deal with missing-information problems.
Generative classifiers provide a principled way to handle missing-information problems.
When x_i is missing, we can use the marginalized P1 and P2 to perform classification. Starting from the full class-conditional distributions P1(x1, x2, ..., xm | C1) and P2(x1, x2, ..., xm | C2), sum out the missing feature:

Σ_{x_i} P1(x1, ..., x_{i-1}, x_i, x_{i+1}, ..., xm | C1) = P1(x1, ..., x_{i-1}, x_{i+1}, ..., xm | C1)
Σ_{x_i} P2(x1, ..., x_{i-1}, x_i, x_{i+1}, ..., xm | C2) = P2(x1, ..., x_{i-1}, x_{i+1}, ..., xm | C2)
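A minimal sketch of marginalizing out a missing feature for a discrete class-conditional distribution (the 2x2 table values are illustrative assumptions):

```python
import numpy as np

# Joint class-conditional distribution P1(x1, x2 | C1) over two binary
# features, stored as a 2x2 table (values are illustrative only).
P1_joint = np.array([[0.4, 0.2],   # x1 = 0; columns: x2 in {0, 1}
                     [0.1, 0.3]])  # x1 = 1

# If x2 is missing, sum it out:
# P1(x1 | C1) = sum over x2 of P1(x1, x2 | C1)
P1_marginal = P1_joint.sum(axis=1)
print(P1_marginal)  # [0.6 0.4]
```

Classification then compares these marginal probabilities across the classes using only the observed feature.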
Handling the Missing Information Problem
SVM
TJT: a generative model
Motivation
It seems that a good classifier should combine the strategies of discriminative classifiers and generative classifiers.
Our work trains one of the generative classifiers, the generative Bayesian Multinet Classifier, in a discriminative way.
Roadmap of Our Work
Classifiers
– Discriminative Classifiers
  » Support Vector Machines (SVM)
  » Others
– Generative Classifiers
  » Bayesian Network Classifiers (BNC): Naive Bayesian Classifiers, Tree-like Bayesian Classifiers, Bayesian Multinet Classifiers, Others
  » Gaussian Mixture Model (GMM)
  » Hidden Markov Model (HMM)
  » Other Models
How Does Our Work Relate to Other Work?
1. Combining discriminative classifiers and generative classifiers (Jaakkola and Haussler, NIPS 98)
   – Difference: Our method performs a reverse process: from generative classifiers to discriminative classifiers.
2. Discriminative training of HMM and GMM (Beaufays et al., ICASSP 99; Hastie et al., JRSS 96)
   – Difference: Our method is designed for Bayesian Multinet Classifiers, a more general classifier.
Classifiers
– Discriminative Classifiers
  » Support Vector Machines (SVM)
  » Others
– Generative Classifiers
  » Bayesian Network Classifiers (BNC): Bayesian Multinet Classifiers (Bayesian Chow-Liu tree Multinet, Others), Others
  » Gaussian Mixture Model (GMM)
  » Hidden Markov Model (HMM)
  » Other Models
Problems of Bayesian Multinet Classifiers
1. Start from a pre-classified dataset.
2. Split it into sub-dataset D1 for Class 1 and sub-dataset D2 for Class 2.
3. Estimate the distribution P1 to approximate D1 accurately.
4. Estimate the distribution P2 to approximate D2 accurately.
5. Use the Bayes rule to perform classification.
Comments: This framework discards the divergence information between classes.
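The per-class estimation step can be sketched with a Chow-Liu tree learner: compute pairwise mutual information on each class sub-dataset, then take a maximum-weight spanning tree over the features (the tiny binary sub-dataset D1 below is an illustrative assumption):

```python
import numpy as np
from itertools import combinations

def mutual_information(data, i, j):
    """Empirical mutual information between discrete columns i and j."""
    xi, xj = data[:, i], data[:, j]
    mi = 0.0
    for a in np.unique(xi):
        for b in np.unique(xj):
            p_ab = np.mean((xi == a) & (xj == b))
            p_a, p_b = np.mean(xi == a), np.mean(xj == b)
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def chow_liu_edges(data):
    """Maximum-weight spanning tree over pairwise mutual information
    (Kruskal's algorithm with a simple union-find)."""
    m = data.shape[1]
    edges = sorted(((mutual_information(data, i, j), i, j)
                    for i, j in combinations(range(m), 2)), reverse=True)
    parent = list(range(m))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree

# Generative BMC training fits one tree per class sub-dataset; D1 here
# is a tiny illustrative binary sub-dataset for Class 1.
D1 = np.array([[0, 0, 0], [1, 1, 0], [0, 0, 1], [1, 1, 1]])
print(chow_liu_edges(D1))  # features 0 and 1 are perfectly correlated
```

Fitting one such tree per class, independently on D1 and D2, is exactly the step that ignores the divergence between the classes.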
Our Training Scheme
Mathematical Explanation
Bayesian Multinet Classifiers (BMC)
Discriminative Training of BMC
Mathematical Explanation
Perror(1) = Σ_x P1(x) log [ P1(x) / P̂1(x) ]
Perror(2) = Σ_x P2(x) log [ P2(x) / P̂2(x) ]

where P̂1 and P̂2 denote the estimated distributions for Class 1 and Class 2.
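Divergence terms of the form Σ_x P(x) log[P(x) / P̂(x)] are KL divergences between a class distribution and its estimate. A minimal sketch of computing one (the distributions below are illustrative assumptions):

```python
import numpy as np

def kl_divergence(p, q):
    """KL(P || Q) = sum over x of P(x) * log(P(x) / Q(x))."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0  # terms with P(x) = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

P1 = [0.5, 0.3, 0.2]      # class distribution (illustrative)
P1_hat = [0.4, 0.4, 0.2]  # estimated distribution
print(kl_divergence(P1, P1_hat))  # positive: estimation error is penalized
```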
Finding P1 and P2
Experimental Setup
– Datasets
  » 2 benchmark datasets from the UCI machine learning repository: Tic-tac-toe and Vote
– Experimental environment
  » Platform: Windows 2000
  » Development tool: Matlab 6.5
Error Rate
Convergence Performance
Conclusion
– A discriminative training procedure for generative Bayesian Multinet Classifiers is presented.
– This approach significantly improves the recognition rate on two benchmark datasets.
– A theoretical exploration of the convergence performance of this approach is under way.