Scaling multi-class Support Vector Machines using inter-class confusion

Upload: peregrine-chase

Post on 29-Dec-2015


Page 1:

Scaling multi-class Support Vector Machines using inter-class confusion

Author: Shantanu, Sunita Sarawagi, Soumen Chakrabarti
Advisor: Dr Hsu
Graduate: Ching-wen Hong

Page 2:

Content

1. Motivation
2. Objective
3. Introduction: (1) SVM (2) Using SVM to solve multi-class problems (3) The method presented in this paper
4. Our approach: (1) Hierarchical approach (2) The GraphSVM algorithm
5. Experimental evaluation
6. Conclusion
7. Personal opinion

Page 3:

Motivation

Solve multi-class problems.

Page 4:

Objective

SVMs excel at two-class discriminative learning problems, and their accuracy is high.

SVMs are difficult to apply to multi-class problems because training time is long.

The naïve Bayes (NB) classifier is much faster than SVM in training time.

We propose a new technique for multi-way classification which exploits the accuracy of SVMs and the speed of NB classifiers.

Page 5:

Introduction

1. SVM:
Input: a training set S = {(x_1, y_1), …, (x_N, y_N)}, where each x_i is a vector and y_i = 1 or -1.
Output: a classifier f(x) = w · x + b.
For example, in medical diagnosis: x_i = (age, sex, blood, …, genome, …), and y_i indicates the risk of cancer.
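The classifier form f(x) = w · x + b can be illustrated with a minimal sketch. The weights, bias, and inputs below are hypothetical values chosen only for illustration, and the rule of taking the sign of f(x) as the label is the usual convention for linear classifiers rather than something stated on the slide.

```python
from typing import Sequence


def decision_value(w: Sequence[float], x: Sequence[float], b: float) -> float:
    """Return f(x) = w . x + b for a linear classifier."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict_label(w: Sequence[float], x: Sequence[float], b: float) -> int:
    """Map the decision value to a label in {1, -1} (assumed sign convention)."""
    return 1 if decision_value(w, x, b) >= 0 else -1


# Hypothetical weights and inputs, not taken from the slides.
w = [0.5, -1.0]
b = 0.25
print(predict_label(w, [2.0, 0.5], b))  # f(x) = 1.0 - 0.5 + 0.25 = 0.75
print(predict_label(w, [0.0, 1.0], b))  # f(x) = -1.0 + 0.25 = -0.75
```

Training an SVM amounts to choosing w and b; the sketch only covers applying an already trained classifier.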

Page 6:

1. Linear SVM

Page 7:

Linear SVM

Page 8:

Linear SVM

Page 9:

Linear SVM

Page 10:

Linear SVM

Page 11:

2. Using SVM to solve multi-class problems

1. “One-vs-others” approach
For each of the N classes, we construct a one-vs-others (yes/no) SVM for that class alone.
The winning SVM is the one that says yes and whose margin is the largest among all SVMs.
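The winner-selection rule can be sketched as follows, assuming each per-class SVM has already produced a signed margin for the document (positive meaning "yes"). The class names and margin values are hypothetical, and returning None when no SVM says yes is an assumption, since the slide does not cover that case.

```python
from typing import Dict, Optional


def one_vs_others_winner(margins: Dict[str, float]) -> Optional[str]:
    """Pick the class whose SVM says yes (margin > 0) with the largest margin.

    Returns None when no SVM says yes; that fallback is an assumption.
    """
    positives = {label: m for label, m in margins.items() if m > 0}
    if not positives:
        return None
    return max(positives, key=positives.get)


# Hypothetical margins from three one-vs-others SVMs for a single document.
margins = {"sports": 0.8, "politics": -0.3, "science": 1.4}
winner = one_vs_others_winner(margins)
print(winner)  # science
```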

Page 12:

Using SVM to solve multi-class problems

2. Accumulated votes approach
We construct SVMs between all possible pairs of classes.
The winning class is the one with the largest number of accumulated votes.
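A minimal sketch of the voting step, assuming a function `pairwise_predict(a, b)` that stands in for the trained SVM separating classes a and b and returns whichever of the two it prefers. That function, the class names, and the scores are hypothetical; ties are broken arbitrarily here because the slide does not specify a rule.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Sequence


def accumulated_votes_winner(
    classes: Sequence[str],
    pairwise_predict: Callable[[str, str], str],
) -> str:
    """Run one classifier per pair of classes and return the most-voted class.

    Needs at least two classes; with N classes there are N * (N - 1) / 2 pairs.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise_predict(a, b)] += 1
    return votes.most_common(1)[0][0]


# Stand-in for trained pairwise SVMs: prefer the class with the higher score.
scores = {"A": 0.2, "B": 0.9, "C": 0.5, "D": 0.1}


def toy_pairwise(a: str, b: str) -> str:
    return a if scores[a] >= scores[b] else b


vote_winner = accumulated_votes_winner(list(scores), toy_pairwise)
print(vote_winner)  # B wins all three of its pairings
```

The number of pairwise classifiers grows quadratically with the number of classes, which is one reason this approach becomes expensive for many classes.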

Page 13:

3. The method presented in this paper

1. Combine the scalability of NB classifiers w.r.t. the number of classes with the accuracy of SVMs.

First stage: use a multi-class NB classifier to obtain a confusion matrix.

Second stage: use SVMs with the “one-vs-others” approach.

Page 14:

OUR APPROACH

Confusion matrix: obtained using NB and a held-out validation dataset.
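As a rough sketch, a confusion matrix can be tallied from the true labels of the held-out documents and the labels the NB classifier predicts for them. The nested-dictionary layout (true class first, predicted class second) and the toy labels are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Sequence


def confusion_matrix(
    true_labels: Sequence[str],
    predicted_labels: Sequence[str],
) -> Dict[str, Dict[str, int]]:
    """counts[t][p] = number of held-out documents of true class t predicted as p."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    counts = defaultdict(lambda: defaultdict(int))
    for t, p in zip(true_labels, predicted_labels):
        counts[t][p] += 1
    return {t: dict(row) for t, row in counts.items()}


# Toy held-out labels and hypothetical NB predictions.
true_labels = ["a", "a", "a", "b", "b", "c"]
nb_predictions = ["a", "b", "a", "b", "b", "a"]
matrix = confusion_matrix(true_labels, nb_predictions)
print(matrix)
```

Off-diagonal entries such as matrix["c"]["a"] record how often one class is mistaken for another, which is the inter-class confusion the approach relies on.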

Page 15:

OUR APPROACH

Page 16:

Hierarchical Approach

Top level (L1): a classifier (NB or SVM) discriminates amongst the top-level clusters of labels.

Second level (L2): we build multi-class SVMs within each cluster of classes.
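The two-level prediction path can be sketched as below. The classifiers are passed in as plain functions; the keyword rules, cluster names, and labels are toy stand-ins for trained models, not anything described on the slides.

```python
from typing import Callable, Dict


def hierarchical_predict(
    doc: str,
    l1_classifier: Callable[[str], str],
    l2_classifiers: Dict[str, Callable[[str], str]],
) -> str:
    """Route the document to a cluster with L1, then pick a class with that cluster's L2."""
    cluster = l1_classifier(doc)
    return l2_classifiers[cluster](doc)


# Toy stand-ins for trained classifiers (hypothetical keyword rules).
def toy_l1(doc: str) -> str:
    return "religion" if "faith" in doc else "computers"


toy_l2 = {
    "religion": lambda doc: "alt.atheism" if "atheism" in doc else "soc.religion.christian",
    "computers": lambda doc: "comp.graphics",
}

label = hierarchical_predict("faith and atheism", toy_l1, toy_l2)
print(label)  # alt.atheism
```

If L1 routes a document to the wrong cluster, no L2 classifier in that cluster can recover the correct class.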

Page 17:

Evaluation of the hierarchical approach

We compare four methods:
MCNB (one-vs-others)
MCSVM (one-vs-others)
Hier-NB (L1: NB, L2: NB)
Hier-SVM (L1: NB, L2: SVM)

Page 18:

Evaluation of the hierarchical approach

Page 19:

Evaluation of the hierarchical approach

Page 20:

Evaluation of the hierarchical approach

NB-L2 accuracy is 89.01%; combined with NB-L1 (93.56%), Hier-NB reaches 83.28%, compared with 85.27% for MCNB.

SVM-L2 with NB-L1 reaches 92.04%; Hier-SVM reaches 86.12%, compared with 89.66% for MCSVM.

The main reason for the low accuracy of the hierarchical approaches is the compounding of errors at the two levels.

This led us to design a new algorithm, GraphSVM.
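The compounding of errors can be checked with simple arithmetic, under the assumption that a document is classified correctly only when both levels are correct and that the L2 figures are measured given a correct L1 decision. The accuracies are the ones reported on the slide; the function name is hypothetical.

```python
def compounded_accuracy(l1_accuracy: float, l2_accuracy: float) -> float:
    """Overall accuracy when a prediction is right only if both levels are right."""
    return l1_accuracy * l2_accuracy


# NB-L1 = 93.56%, NB-L2 = 89.01%, SVM-L2 = 92.04% (figures from the slide).
hier_nb = compounded_accuracy(0.9356, 0.8901)
hier_svm = compounded_accuracy(0.9356, 0.9204)
print(f"Hier-NB  ~ {hier_nb:.2%}")   # slide reports 83.28%
print(f"Hier-SVM ~ {hier_svm:.2%}")  # slide reports 86.12%
```

Under this assumption the products come out at roughly 83.28% and 86.11%, in line with the reported Hier-NB and Hier-SVM figures, and both fall below the flat MCNB and MCSVM accuracies.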

Page 21:

The GraphSVM algorithm

1. Obtain the confusion matrix using a fast multi-class NB classifier M1.

For each class i, F(i) = {classes of which more than a threshold t% of documents are misclassified as class i}.

In Figure 1, with i = alt.atheism and t = 3%, F(alt.atheism) = {talk.religion.misc, soc.religion.christian}.

2. Train a multi-class classifier M2(i) to distinguish among the classes {i} ∪ F(i).
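Step 1 can be sketched as a lookup over the confusion matrix. The sketch assumes the matrix is stored as confusion[j][k] = number of held-out documents of true class j that M1 predicted as k, and that the threshold t is given as a fraction; the counts below are made up for illustration and are not the values behind Figure 1.

```python
from typing import Dict, Set


def confused_classes(confusion: Dict[str, Dict[str, int]], i: str, t: float) -> Set[str]:
    """F(i): classes j != i with more than a fraction t of their documents predicted as i."""
    result = set()
    for j, row in confusion.items():
        if j == i:
            continue
        total = sum(row.values())
        if total > 0 and row.get(i, 0) / total > t:
            result.add(j)
    return result


# Hypothetical counts (100 held-out documents per class).
confusion = {
    "alt.atheism": {"alt.atheism": 90, "talk.religion.misc": 10},
    "talk.religion.misc": {"talk.religion.misc": 80, "alt.atheism": 15,
                           "soc.religion.christian": 5},
    "soc.religion.christian": {"soc.religion.christian": 95, "alt.atheism": 5},
    "comp.graphics": {"comp.graphics": 99, "alt.atheism": 1},
}

f_i = confused_classes(confusion, "alt.atheism", t=0.03)
m2_classes = {"alt.atheism"} | f_i  # classes M2(alt.atheism) is trained on
print(sorted(m2_classes))
```

Each M2(i) then only has to separate a handful of classes instead of all of them, which is where the savings in training time would come from.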

Page 22:

Experimental evaluation

1. Datasets
20-newsgroups: 18828 news wire articles from 20 Usenet groups. We randomly chose 70% of the documents for training and 30% for testing.

Reuters-21578: 135 classes, 8819 training documents and 1887 test documents.
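A random 70/30 split like the one described for 20-newsgroups might look like the sketch below. The integer document IDs stand in for the actual articles, and the fixed seed and rounding rule are assumptions; the slides do not say how the split was drawn.

```python
import random
from typing import List, Sequence, Tuple


def split_train_test(
    documents: Sequence[int],
    train_fraction: float = 0.7,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Shuffle a copy of the documents and cut it into training and test parts."""
    shuffled = list(documents)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-in document IDs for the 18828 articles.
doc_ids = list(range(18828))
train_ids, test_ids = split_train_test(doc_ids)
print(len(train_ids), len(test_ids))
```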

Page 23:

Overall comparison

Page 24:

Scalability with number of classes

Page 25:

Scalability with number of classes

Page 26:

Scalability with training set size

Page 27:

Effect of the threshold parameter

Page 28:

Conclusion

GraphSVM is accurate and efficient on multi-class problems.

GraphSVM outperforms SVMs w.r.t. training time and memory requirements.

GraphSVM is very simple to understand and requires negligible coding, yet it is useful for very large classification problems (tens of thousands of classes and millions of instances).

Page 29:

Personal opinion

GraphSVM may perform worse at a high positive value of the threshold t.

It is desirable that the accuracy of GraphSVM not be affected by the threshold t.