Machine Learning for Image Classification, Part II: Ensemble Approaches
Jianping Fan, Dept. of Computer Science, UNC-Charlotte
Course Website: http://webpages.uncc.edu/jfan/itcs5152.html


Page 1:

Machine Learning for Image Classification, Part II: Ensemble Approaches

Jianping Fan Dept of Computer Science

UNC-Charlotte

Course Website: http://webpages.uncc.edu/jfan/itcs5152.html

Page 2:

Ensemble Learning

A machine learning paradigm in which multiple learners are trained and combined to solve the same problem

[Diagram: one problem solved by a single learner vs. the same problem solved by multiple learners in parallel]

Previously: single classifier

Ensemble: multiple classifiers

$H_T(x) = \mathrm{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$

Page 3:

Ensemble Classifier

[Diagram: a dataset is sampled into Subset 1, Subset 2, …, Subset T; each subset trains a learner (ML) to produce classifiers f1, f2, …, fT, which are combined into the final classifier f]

$H_T(x) = \mathrm{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$

It is not a good idea to combine multiple classifiers together at random!

More helps most of the time, but more is less sometimes!

Page 4:

Wish List:

Each weak classifier should be different from the others: different focuses, different capabilities. Ideally, they can even compensate for each other!

Each of them plays a different role!

$H_T(x) = \mathrm{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$

Page 5:

Ensemble Classifier

Majority voting: winner take all!

Weighted voting: combine with weights

Averaging: combine with equal weights

$H_T(x) = \mathrm{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$
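A minimal sketch of the three combination rules above in Python (NumPy only; the classifier outputs and weights are made-up placeholders, not values from the lecture):

```python
import numpy as np

# Hypothetical outputs of T = 3 weak classifiers on 4 samples, each in {-1, +1}
h = np.array([[+1, -1, +1, +1],
              [+1, +1, -1, +1],
              [-1, +1, +1, +1]])
alpha = np.array([0.8, 0.4, 0.3])    # assumed per-classifier weights

majority = np.sign(h.sum(axis=0))    # majority voting: winner take all
weighted = np.sign(alpha @ h)        # weighted voting: sign(sum_t alpha_t h_t(x))
average  = np.sign(h.mean(axis=0))   # averaging: equal weights

# Note: the weighted vote can flip a sample relative to the majority vote
# when a heavily weighted classifier disagrees with the rest.
print(majority, weighted, average)
```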

Why do we learn from data subsets?

Page 6:

Ensemble Classifier

What may affect an ensemble classifier?

Diversity of the weak classifiers: we would not hire two "almost identical" people!

Weights for combining the weak classifiers: we know they play different, not equal, roles in the final decision!

$H_T(x) = \mathrm{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$

Page 7:

Ensemble Classifier

How to train a set of classifiers with diverse capabilities?

$H_T(x) = \mathrm{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$

1. Using different datasets for the same data-driven learning algorithm

2. Using different learning algorithms to train different classifiers from the same dataset

Page 8:

Ensemble Classifier

We may prefer weighted voting for the ensemble

$H_T(x) = \mathrm{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$

How to determine the weights automatically?

Page 9:

Wish Lists: NBA Championship Rule

Page 10:

Wish Lists: NBA Championship Rule

Without Shaq, Kobe could not even reach the playoffs for several years!

Page 11:

Wish Lists: NBA Championship Rule

Kobe finally finds the solution!

Page 12:

Wish Lists: NBA Championship Rule

How about this man with enough helpers?

Page 13:

Wish Lists: NBA Championship Rule

Yes, you can, after I retire or move to the Lakers!

Weak classifiers are not "weak" at all! They are all very strong in some places, and they know balance and compensation!

Page 14:

Our observations from the NBA examples

Diversity of weak classifiers is not sufficient; they should compensate for each other!

Weak classifiers are not weak! They are very strong in certain places!

Weights should depend on importance, potential contributions, or capabilities!

Page 15:

Diversity of Weak Classifiers

Train different weak classifiers from various data subsets!

Data-driven learning algorithms may then make these weak classifiers different!

Sample various subsets from the same big data set!

Page 16:

A Brief History

Resampling for estimating a statistic:
• Bootstrapping

Resampling for classifier design:
• Bagging
• Boosting (Schapire 1989)
• AdaBoost (Schapire 1995)

How to make weak classifiers diverse?

Page 17:

Bootstrap Estimation

• Repeatedly draw n samples from D (with replacement)

• For each set of samples, estimate a statistic

• The bootstrap estimate is the mean of the individual estimates

• Used to estimate a statistic (parameter) and its variance
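A minimal NumPy sketch of this procedure; the Gaussian toy data and the choice of the sample mean as the statistic are assumptions for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.normal(loc=5.0, scale=2.0, size=100)   # toy dataset D of n = 100 samples

B = 1000                                       # number of bootstrap draws
# Repeatedly draw n samples from D (with replacement), estimate the statistic
estimates = np.array([rng.choice(D, size=D.size, replace=True).mean()
                      for _ in range(B)])

boot_mean = estimates.mean()                   # bootstrap estimate of the statistic
boot_var = estimates.var()                     # and an estimate of its variance
print(boot_mean, boot_var)
```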

Page 18:

Bagging (Bootstrap Aggregating)

• For i = 1 .. M:
  • Draw n* < n samples from D with replacement
  • Learn classifier Ci
• Final classifier is a vote of C1 .. CM
• Increases classifier stability / reduces variance

(A sketch of this loop follows below.)
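A minimal sketch of bagging under assumptions: scikit-learn decision trees as base classifiers, toy data, and made-up values for M and n*:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, random_state=0)  # toy dataset D

M, n_star = 25, 300          # M rounds, n* < n samples per round (assumed values)
classifiers = []
for _ in range(M):
    idx = rng.choice(len(X), size=n_star, replace=True)   # draw with replacement
    clf = DecisionTreeClassifier().fit(X[idx], y[idx])    # learn classifier C_i
    classifiers.append(clf)

# Final classifier: majority vote of C_1 .. C_M (labels are 0/1 here)
votes = np.stack([c.predict(X) for c in classifiers])
y_pred = (votes.mean(axis=0) > 0.5).astype(int)
print("training accuracy:", (y_pred == y).mean())
```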

Page 19:

Bagging

[Diagram: bootstrap samples of the dataset each train a learner (ML), giving classifiers f1, f2, …, fT, which vote to form the final classifier f]

Page 20:

Boosting

[Diagram: the original training sample trains f1; successively reweighted samples train f2, …, fT; the classifiers are combined into the final classifier f]

Page 21:

Revisit Bagging

Page 22:

Boosting Classifier

Page 23:

Differences: Bagging vs. Boosting

Boosting builds connections and compensations between data subsets, i.e., the subsets "know" each other!

Boosting has special combination rules for integrating the weak classifiers.

Page 24:

Bagging vs Boosting

• Bagging: the construction of complementary base-learners is left to chance and to the instability of the learning method.

• Boosting: actively seeks to generate complementary base-learners by training the next base-learner on the mistakes of the previous ones.

Page 25:

Boosting (Schapire 1989)

• Randomly select n1 < n samples from D without replacement to obtain D1
• Train weak learner C1
• Select n2 < n samples from D, with half of the samples misclassified by C1, to obtain D2
• Train weak learner C2
• Select all samples from D on which C1 and C2 disagree
• Train weak learner C3
• Final classifier is a vote of the three weak learners (see the sketch below)
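A minimal sketch of this three-classifier procedure, assuming decision stumps as the weak learners, made-up subset sizes, and that C1 and C2 disagree on at least some samples:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1000, random_state=0)
n1 = n2 = 300                                    # assumed subset sizes (< n)

# D1: random subset without replacement -> train C1
i1 = rng.choice(len(X), size=n1, replace=False)
C1 = DecisionTreeClassifier(max_depth=1).fit(X[i1], y[i1])

# D2: half misclassified by C1, half correctly classified -> train C2
wrong = np.flatnonzero(C1.predict(X) != y)
right = np.flatnonzero(C1.predict(X) == y)
i2 = np.concatenate([rng.choice(wrong, n2 // 2), rng.choice(right, n2 // 2)])
C2 = DecisionTreeClassifier(max_depth=1).fit(X[i2], y[i2])

# D3: all samples on which C1 and C2 disagree -> train C3
i3 = np.flatnonzero(C1.predict(X) != C2.predict(X))
C3 = DecisionTreeClassifier(max_depth=1).fit(X[i3], y[i3])

# Final classifier: majority vote of C1, C2, C3 (labels are 0/1 here)
votes = C1.predict(X) + C2.predict(X) + C3.predict(X)
y_pred = (votes >= 2).astype(int)
print("training accuracy:", (y_pred == y).mean())
```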

Page 26:

AdaBoost (Schapire 1995)

• Instead of sampling, re-weight
• The previous weak learner has only 50% accuracy over the new distribution
• Can be used to learn weak classifiers
• Final classification is based on a weighted vote of the weak classifiers

Page 27:

AdaBoost Terms

• Learner = Hypothesis = Classifier

• Weak Learner: < 50% error over any distribution

• Strong Classifier: thresholded linear combination of weak learner outputs

Page 28:

AdaBoost = Adaptive Boosting

A learning algorithm for building a strong classifier out of a lot of weaker ones.

Page 29:

AdaBoost Concept

$h_1(x) \in \{-1, +1\}$
$h_2(x) \in \{-1, +1\}$
…
$h_T(x) \in \{-1, +1\}$

weak classifiers, slightly better than random

strong classifier: $H_T(x) = \mathrm{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$

Page 30:

AdaBoost

$H_T(x) = \mathrm{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$

How to train the weak classifiers and make them compensate for each other?

How to determine the weights automatically? We expect such weights to depend on the classifiers' performance and capabilities.

Page 31:

Weak Classifiers

$h_1(x) \in \{-1, +1\}$, $h_2(x) \in \{-1, +1\}$, …, $h_T(x) \in \{-1, +1\}$: weak classifiers, slightly better than random

strong classifier: $H_T(x) = \mathrm{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$

Each weak classifier learns by considering one simple feature.

The T most beneficial features for classification should be selected.

How to:
– define features?
– select beneficial features?
– train weak classifiers?
– manage (weight) training samples?
– associate a weight with each weak classifier?

(A decision-stump sketch follows below.)
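One common concrete choice for such a single-feature weak classifier is a decision stump: threshold one feature and output ±1. A minimal sketch under that assumption, searching features and thresholds for the smallest weighted error:

```python
import numpy as np

def train_stump(X, y, D):
    """Decision stump: pick the (feature, threshold, polarity) minimizing
    the weighted error sum_i D(i) [y_i != h(x_i)], with y in {-1, +1}."""
    best = (None, None, None, np.inf)          # (feature, threshold, polarity, error)
    for j in range(X.shape[1]):                # each stump considers one feature
        for thr in np.unique(X[:, j]):
            for s in (+1, -1):                 # polarity of the threshold test
                pred = s * np.sign(X[:, j] - thr)
                pred[pred == 0] = s
                err = D[pred != y].sum()       # weighted error w.r.t. D
                if err < best[3]:
                    best = (j, thr, s, err)
    return best

# Toy usage with uniform weights D_1(i) = 1/m
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([-1, -1, +1, +1])
D = np.full(len(y), 1 / len(y))
print(train_stump(X, y, D))   # -> (0, 3.0, 1, 0.0): predict +1 when x >= 3
```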

Page 32:

The Strong Classifier

$h_1(x) \in \{-1, +1\}$, $h_2(x) \in \{-1, +1\}$, …, $h_T(x) \in \{-1, +1\}$: weak classifiers, slightly better than random

strong classifier: $H_T(x) = \mathrm{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$

Page 33:

The AdaBoost Algorithm

Given: $(x_1, y_1), \ldots, (x_m, y_m)$ where $x_i \in X$, $y_i \in \{-1, +1\}$

Initialization: $D_1(i) = \frac{1}{m}$, $i = 1, \ldots, m$
($D_t(i)$: probability distribution of the $x_i$'s at time $t$)

For $t = 1, \ldots, T$:
• Find the classifier $h_t: X \to \{-1, +1\}$ that minimizes the error w.r.t. $D_t$, i.e., $h_t = \arg\min_{h_j} \epsilon_j$ where $\epsilon_j = \sum_{i=1}^{m} D_t(i)\,[y_i \neq h_j(x_i)]$   (minimize the weighted error)
• Weight the classifier: $\alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}$   (chosen to minimize the exponential loss)
• Update the distribution: $D_{t+1}(i) = \frac{D_t(i) \exp[-\alpha_t y_i h_t(x_i)]}{Z_t}$, where $Z_t$ is for normalization   (give misclassified patterns more chance for learning)

Page 34:

The AdaBoost Algorithm

Given: $(x_1, y_1), \ldots, (x_m, y_m)$ where $x_i \in X$, $y_i \in \{-1, +1\}$

Initialization: $D_1(i) = \frac{1}{m}$, $i = 1, \ldots, m$

For $t = 1, \ldots, T$:
• Find the classifier $h_t: X \to \{-1, +1\}$ that minimizes the error w.r.t. $D_t$, i.e., $h_t = \arg\min_{h_j} \epsilon_j$ where $\epsilon_j = \sum_{i=1}^{m} D_t(i)\,[y_i \neq h_j(x_i)]$
• Weight the classifier: $\alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}$
• Update the distribution: $D_{t+1}(i) = \frac{D_t(i) \exp[-\alpha_t y_i h_t(x_i)]}{Z_t}$, where $Z_t$ is for normalization

Output the final classifier: $H(x) = \mathrm{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$

(A runnable sketch of this loop follows below.)
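A minimal end-to-end sketch of this loop in Python, reusing the hypothetical train_stump helper sketched earlier (toy stump-based weak learners, not an optimized implementation):

```python
import numpy as np

def adaboost(X, y, T, weak_learner):
    """AdaBoost loop; y in {-1, +1}. weak_learner(X, y, D) must return a
    stump (feature, threshold, polarity, weighted_error), as sketched earlier."""
    m = len(y)
    D = np.full(m, 1.0 / m)                      # D_1(i) = 1/m
    alphas, stumps = [], []
    for _ in range(T):
        j, thr, s, eps = weak_learner(X, y, D)   # h_t minimizing error w.r.t. D_t
        pred = s * np.sign(X[:, j] - thr)
        pred[pred == 0] = s
        eps = np.clip(eps, 1e-10, 1 - 1e-10)     # guard the log against 0 or 1
        alpha = 0.5 * np.log((1 - eps) / eps)    # alpha_t = (1/2) ln((1 - eps_t)/eps_t)
        D *= np.exp(-alpha * y * pred)           # D_{t+1}(i) ~ D_t(i) e^{-alpha_t y_i h_t(x_i)}
        D /= D.sum()                             # Z_t: normalization
        alphas.append(alpha)
        stumps.append((j, thr, s))
    return alphas, stumps

def predict(X, alphas, stumps):
    H = np.zeros(len(X))
    for alpha, (j, thr, s) in zip(alphas, stumps):
        pred = s * np.sign(X[:, j] - thr)
        pred[pred == 0] = s
        H += alpha * pred                        # sum_t alpha_t h_t(x)
    return np.sign(H)                            # H(x) = sign(...)

# Usage with the train_stump sketch from earlier:
# alphas, stumps = adaboost(X, y, T=10, weak_learner=train_stump)
# y_hat = predict(X, alphas, stumps)
```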

Page 35:

Observations for AdaBoost

Diversity of the weak classifiers is enhanced by compensation: the current weak classifier focuses on the samples that the previous ones predicted wrongly!

$h_t = \arg\min_{h_j} \epsilon_j$ where $\epsilon_j = \sum_{i=1}^{m} D_t(i)\,[y_i \neq h_j(x_i)]$

Weights for combining the weak classifiers depend largely on their performance or capabilities!

$\alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}$, $\qquad H(x) = \mathrm{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$

Compare these with our wish list!

Page 36:

Boosting illustration

[Figure: Weak Classifier 1. Some samples are misclassified!]

Page 37:

Boosting illustration

[Figure: weights are increased for the misclassified samples, and the new weak classifier will pay more attention to them!]

Page 38:

The AdaBoost Algorithm

Update: $D_{t+1}(i) = \frac{D_t(i) \exp[-\alpha_t y_i h_t(x_i)]}{Z_t}$, typically with $\alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}$, where the weights of incorrectly classified examples are increased so that the base learner is forced to focus on the hard examples in the training set.

Page 39:

Boosting illustration

[Figure: Weak Classifier 2 alongside Weak Classifier 1. Weak classifier 2 pays little attention to the samples already predicted well by weak classifier 1!]

Page 40:

Boosting illustration

[Figure: weights are increased again for the misclassified samples, and the next weak classifier will pay more attention to them!]

Page 41:

Boosting illustration

[Figure: Weak Classifier 3. Weak classifier 3 pays little attention to the samples already predicted well by weak classifiers 1 & 2!]

Page 42:

Boosting illustration

[Figure: the final classifier is a combination of the 3 weak classifiers.]

Page 43:

Observations from this intuitive example

The current weak classifier pays more attention to the samples misclassified by the previous weak classifiers, so the classifiers can compensate for each other in the final decision!

This provides an easy-to-hard solution for weak classifier training!

Weights for combining the weak classifiers depend largely on their performance or capabilities!

Page 44:

The AdaBoost Algorithm

[The full AdaBoost algorithm from the previous slides is repeated here.]

What goal does AdaBoost want to reach?

Page 45:

The AdaBoost Algorithm

[The full AdaBoost algorithm is repeated, with the classifier weight $\alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}$ and the distribution update highlighted.]

What goal does AdaBoost want to reach? These choices are goal dependent.

Page 46:

Goal

Minimize the exponential loss:

$\mathrm{loss}_{\exp}(H(x)) = E_{x,y}\left[e^{-yH(x)}\right]$

Final classifier: $H(x) = \mathrm{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$
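For intuition, a quick worked evaluation of $e^{-yH(x)}$ at a few margins shows how this loss rewards confident correct predictions and heavily penalizes confident mistakes:

$$e^{-yH(x)} = \begin{cases} e^{-2} \approx 0.14, & yH(x) = 2 \text{ (confidently correct)} \\ e^{0} = 1, & yH(x) = 0 \text{ (undecided)} \\ e^{2} \approx 7.39, & yH(x) = -2 \text{ (confidently wrong)} \end{cases}$$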

Page 47:

Goal

Minimize the exponential loss $\mathrm{loss}_{\exp}(H(x)) = E_{x,y}\left[e^{-yH(x)}\right]$

Final classifier: $H(x) = \mathrm{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$

Minimizing $e^{-yH(x)}$ maximizes the margin $yH(x)$.

Page 48:

Goal

Final classifier: $H(x) = \mathrm{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$

Minimize $\mathrm{loss}_{\exp}(H(x)) = E_{x,y}\left[e^{-yH(x)}\right]$:

$E_{x,y}\left[e^{-yH_t(x)}\right] = E_x\left[E_{y|x}\left[e^{-yH_t(x)} \mid x\right]\right]$

Define $H_t(x) = H_{t-1}(x) + \alpha_t h_t(x)$ with $H_0(x) = 0$. Then $H(x) = H_T(x)$, and

$E_{x,y}\left[e^{-yH_t(x)}\right] = E_x\left[E_{y|x}\left[e^{-y[H_{t-1}(x) + \alpha_t h_t(x)]} \mid x\right]\right]$
$= E_x\left[E_{y|x}\left[e^{-yH_{t-1}(x)}\, e^{-\alpha_t y h_t(x)} \mid x\right]\right]$
$= E_x\left[e^{-yH_{t-1}(x)}\left(e^{-\alpha_t}\, P(y = h_t(x)) + e^{\alpha_t}\, P(y \neq h_t(x))\right)\right]$

Page 49:

Final classifier: $H(x) = \mathrm{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$

Minimize $\mathrm{loss}_{\exp}(H(x)) = E_{x,y}\left[e^{-yH(x)}\right]$

Define $H_t(x) = H_{t-1}(x) + \alpha_t h_t(x)$ with $H_0(x) = 0$. Then $H(x) = H_T(x)$, and

$E_{x,y}\left[e^{-yH_t(x)}\right] = E_x\left[e^{-yH_{t-1}(x)}\left(e^{-\alpha_t}\, P(y = h_t(x)) + e^{\alpha_t}\, P(y \neq h_t(x))\right)\right]$

$\alpha_t = ?$ Set $\frac{\partial\, E_{x,y}\left[e^{-yH_t(x)}\right]}{\partial \alpha_t} = 0$:

$E_x\left[e^{-yH_{t-1}(x)}\left(-e^{-\alpha_t}\, P(y = h_t(x)) + e^{\alpha_t}\, P(y \neq h_t(x))\right)\right] = 0$

Page 50:

Final classifier: $H(x) = \mathrm{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$

Minimize $\mathrm{loss}_{\exp}(H(x)) = E_{x,y}\left[e^{-yH(x)}\right]$

Define $H_t(x) = H_{t-1}(x) + \alpha_t h_t(x)$ with $H_0(x) = 0$. Then $H(x) = H_T(x)$.

$\alpha_t = ?$ Solving

$E_x\left[e^{-yH_{t-1}(x)}\left(-e^{-\alpha_t}\, P(y = h_t(x)) + e^{\alpha_t}\, P(y \neq h_t(x))\right)\right] = 0$

gives

$\alpha_t = \frac{1}{2} \ln \frac{P(y = h_t(x))}{P(y \neq h_t(x))} = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}$

with $\epsilon_t = P(\text{error}) = \sum_{i=1}^{m} D_t(i)\,[y_i \neq h_t(x_i)]$, taking $P(x_i, y_i) = D_t(i)$.
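A quick numeric check of this weight formula: a weak classifier at exactly 50% weighted error gets zero weight, and better classifiers get larger weights:

$$\epsilon_t = 0.5 \Rightarrow \alpha_t = 0; \qquad \epsilon_t = 0.3 \Rightarrow \alpha_t = \tfrac{1}{2}\ln\tfrac{0.7}{0.3} \approx 0.424; \qquad \epsilon_t = 0.1 \Rightarrow \alpha_t = \tfrac{1}{2}\ln 9 \approx 1.099$$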

Page 51:

Final classifier: $H(x) = \mathrm{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$

Minimize $\mathrm{loss}_{\exp}(H(x)) = E_{x,y}\left[e^{-yH(x)}\right]$

Define $H_t(x) = H_{t-1}(x) + \alpha_t h_t(x)$ with $H_0(x) = 0$. Then $H(x) = H_T(x)$.

$\alpha_t = ?$ Solving

$E_x\left[e^{-yH_{t-1}(x)}\left(-e^{-\alpha_t}\, P(y = h_t(x)) + e^{\alpha_t}\, P(y \neq h_t(x))\right)\right] = 0$

gives

$\alpha_t = \frac{1}{2} \ln \frac{P(y = h_t(x))}{P(y \neq h_t(x))} = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}$

with $\epsilon_t = P(\text{error}) = \sum_{i=1}^{m} D_t(i)\,[y_i \neq h_t(x_i)]$, taking $P(x_i, y_i) = D_t(i)$.

[The full AdaBoost algorithm is shown alongside, with the classifier-weight step highlighted.]

Page 52:

Final classifier: $H(x) = \mathrm{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$

Minimize $\mathrm{loss}_{\exp}(H(x)) = E_{x \sim D,\, y}\left[e^{-yH(x)}\right]$

Define $H_t(x) = H_{t-1}(x) + \alpha_t h_t(x)$ with $H_0(x) = 0$. Then $H(x) = H_T(x)$, and $\alpha_t = \frac{1}{2} \ln \frac{P(y = h_t(x))}{P(y \neq h_t(x))} = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}$ with $\epsilon_t = P(\text{error}) = \sum_{i=1}^{m} D_t(i)\,[y_i \neq h_t(x_i)]$.

$D_{t+1} = ?$

[The full AdaBoost algorithm is shown alongside, with the distribution-update step highlighted.]

Page 53:

Final classifier: $H(x) = \mathrm{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$

Minimize $\mathrm{loss}_{\exp}(H(x)) = E_{x \sim D,\, y}\left[e^{-yH(x)}\right]$

Define $H_t(x) = H_{t-1}(x) + \alpha_t h_t(x)$ with $H_0(x) = 0$. Then $H(x) = H_T(x)$.

$D_{t+1} = ?$

$E_{x,y}\left[e^{-yH_t}\right] = E_{x,y}\left[e^{-yH_{t-1}}\, e^{-\alpha_t y h_t}\right] \approx E_{x,y}\left[e^{-yH_{t-1}}\left(1 - \alpha_t y h_t + \frac{\alpha_t^2 y^2 h_t^2}{2}\right)\right]$

$h_t = \arg\min_h E_{x,y}\left[e^{-yH_{t-1}}\left(1 - \alpha_t y h + \frac{\alpha_t^2 y^2 h^2}{2}\right)\right]$

Since $y^2 h^2 = 1$:

$h_t = \arg\min_h E_{x,y}\left[e^{-yH_{t-1}}\left(1 - \alpha_t y h + \frac{\alpha_t^2}{2}\right)\right] = \arg\min_h E_x\left[E_{y|x}\left[e^{-yH_{t-1}}\left(1 - \alpha_t y h + \frac{\alpha_t^2}{2}\right) \mid x\right]\right]$

Page 54:

Final classifier: $H(x) = \mathrm{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$

Minimize $\mathrm{loss}_{\exp}(H(x)) = E_{x \sim D,\, y}\left[e^{-yH(x)}\right]$

Define $H_t(x) = H_{t-1}(x) + \alpha_t h_t(x)$ with $H_0(x) = 0$. Then $H(x) = H_T(x)$.

$D_{t+1} = ?$

$h_t = \arg\min_h E_x\left[E_{y|x}\left[e^{-yH_{t-1}}\left(1 - \alpha_t y h + \frac{\alpha_t^2}{2}\right) \mid x\right]\right]$
$= \arg\max_h E_x\left[E_{y|x}\left[e^{-yH_{t-1}}\, y\, h(x) \mid x\right]\right]$
$= \arg\max_h E_x\left[(+1)\, h(x)\, e^{-H_{t-1}(x)}\, P(y = 1 \mid x) + (-1)\, h(x)\, e^{H_{t-1}(x)}\, P(y = -1 \mid x)\right]$

Page 55:

Final classifier: $H(x) = \mathrm{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$

Minimize $\mathrm{loss}_{\exp}(H(x)) = E_{x \sim D,\, y}\left[e^{-yH(x)}\right]$

Define $H_t(x) = H_{t-1}(x) + \alpha_t h_t(x)$ with $H_0(x) = 0$. Then $H(x) = H_T(x)$.

$D_{t+1} = ?$

$h_t = \arg\max_h E_x\left[(+1)\, h(x)\, e^{-H_{t-1}(x)}\, P(y = 1 \mid x) + (-1)\, h(x)\, e^{H_{t-1}(x)}\, P(y = -1 \mid x)\right]$
$= \arg\max_h E_{x,\, y \sim e^{-yH_{t-1}(x)} P(y \mid x)}\left[y\, h(x)\right]$, which is maximized when $h(x) = y$ for each $x$, so

$h_t(x) = \mathrm{sign}\left(E_{y \sim e^{-yH_{t-1}(x)} P(y \mid x)}\left[y \mid x\right]\right) = \mathrm{sign}\left(P_{y \sim e^{-yH_{t-1}(x)} P(y \mid x)}(y = 1 \mid x) - P_{y \sim e^{-yH_{t-1}(x)} P(y \mid x)}(y = -1 \mid x)\right)$

Page 56:

Final classifier: $H(x) = \mathrm{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$

Minimize $\mathrm{loss}_{\exp}(H(x)) = E_{x \sim D,\, y}\left[e^{-yH(x)}\right]$

Define $H_t(x) = H_{t-1}(x) + \alpha_t h_t(x)$ with $H_0(x) = 0$. Then $H(x) = H_T(x)$.

$D_{t+1} = ?$

$h_t(x) = \mathrm{sign}\left(P_{y \sim e^{-yH_{t-1}(x)} P(y \mid x)}(y = 1 \mid x) - P_{y \sim e^{-yH_{t-1}(x)} P(y \mid x)}(y = -1 \mid x)\right)$

At time $t$: $x, y \sim e^{-yH_{t-1}(x)}\, P(y \mid x)$

Page 57:

Final classifier: $H(x) = \mathrm{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$

Minimize $\mathrm{loss}_{\exp}(H(x)) = E_{x \sim D,\, y}\left[e^{-yH(x)}\right]$

Define $H_t(x) = H_{t-1}(x) + \alpha_t h_t(x)$ with $H_0(x) = 0$. Then $H(x) = H_T(x)$.

$D_{t+1} = ?$

At time $t$: $x, y \sim e^{-yH_{t-1}(x)}\, P(y \mid x)$

At time 1: $x, y \sim P(y \mid x)$ with $P(y_i \mid x_i) = 1$, so $D_1(i) = \frac{1}{Z_1} = \frac{1}{m}$

At time $t+1$: $x, y \sim e^{-yH_t(x)}\, P(y \mid x) \propto D_t\, e^{-\alpha_t y h_t(x)}$, which gives the update

$D_{t+1}(i) = \frac{D_t(i) \exp[-\alpha_t y_i h_t(x_i)]}{Z_t}$, where $Z_t$ is for normalization.

[The full AdaBoost algorithm is shown alongside, with the distribution-update step highlighted.]

Page 58:

Pros and Cons of AdaBoost

Advantages
• Very simple to implement
• Does feature selection, resulting in a relatively simple classifier
• Fairly good generalization

Disadvantages
• Suboptimal solution
• Sensitive to noisy data and outliers

Page 59:

Intuition

• Train a set of weak hypotheses: h1, …, hT.

• The combined hypothesis H is a weighted majority vote of the T weak hypotheses.

Each hypothesis ht has a weight αt.

• During the training, focus on the examples that are misclassified.

At round t, example xi has the weight Dt(i).

Page 60:

Basic Setting

• Binary classification problem

• Training data: $(x_1, y_1), \ldots, (x_m, y_m)$ where $x_i \in X$, $y_i \in Y = \{-1, +1\}$

• $D_t(i)$: the weight of $x_i$ at round $t$; $D_1(i) = 1/m$

• A learner $L$ that finds a weak hypothesis $h_t: X \to Y$ given the training set and $D_t$

• The error of a weak hypothesis $h_t$: $\epsilon_t = \sum_{i:\, h_t(x_i) \neq y_i} D_t(i)$

Page 61:

The basic AdaBoost algorithm

For $t = 1, \ldots, T$:

• Train the weak learner using the training data and $D_t$

• Get $h_t: X \to \{-1, +1\}$ with error $\epsilon_t = \sum_{i:\, h_t(x_i) \neq y_i} D_t(i)$

• Choose $\alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}$

• Update

$D_{t+1}(i) = \frac{D_t(i)}{Z_t} \times \begin{cases} e^{-\alpha_t} & \text{if } h_t(x_i) = y_i \\ e^{\alpha_t} & \text{if } h_t(x_i) \neq y_i \end{cases} = \frac{D_t(i) \exp(-\alpha_t y_i h_t(x_i))}{Z_t}$
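For practical use, the same algorithm is available off the shelf; a minimal scikit-learn sketch on toy data (the dataset and hyperparameters here are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Decision stumps are the default base estimator; T = n_estimators rounds
clf = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```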

Page 62:

The general AdaBoost algorithm

Page 63:

Pros and Cons of AdaBoost

Advantages
• Very simple to implement
• Does feature selection, resulting in a relatively simple classifier
• Fairly good generalization

Disadvantages
• Suboptimal solution
• Sensitive to noisy data and outliers