
Page 1: Recognition II

Recognition II

Ali Farhadi

Page 2: Recognition II

We have talked about

• Nearest Neighbor
• Naïve Bayes
• Logistic Regression
• Boosting

Page 3: Recognition II

Support Vector Machines

Page 4: Recognition II

Linear classifiers – Which line is better?

Data: examples i, each with a feature vector x.

w·x = Σⱼ w(j) x(j)

Page 5: Recognition II

Pick the one with the largest margin!

w·x = Σⱼ w(j) x(j)

[Figure: the dividing line w·x + b = 0; points on one side have w·x + b > 0 (far points ≫ 0), points on the other have w·x + b < 0 (far points ≪ 0); γ marks the margin at the closest points.]

Margin γ: the value of w·x + b measures the "height" of the plane at each point and increases with distance from the dividing line.

Max margin: two equivalent forms
(1) maximize the margin directly: max over (γ, w, b) of γ, subject to yⱼ (w·xⱼ + b) ≥ γ for all j
(2) fix the margin via canonical hyperplanes and minimize w·w, subject to yⱼ (w·xⱼ + b) ≥ 1 for all j
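To make the score and margin concrete, here is a tiny numeric sketch (the weights, bias, and points are made up for illustration, not from the slides):

```python
import numpy as np

# Illustrative weights/bias and a few labeled points (y in {+1, -1}).
w, b = np.array([1.0, 2.0]), -1.0
X = np.array([[2.0, 1.0], [0.5, 0.5], [-1.0, -1.0]])
y = np.array([+1, +1, -1])

scores = X @ w + b                       # w.x + b : the sign picks the side of the line
margins = y * scores                     # positive iff the point is correctly classified
distances = margins / np.linalg.norm(w)  # geometric distance from the dividing line
print(scores, margins, distances)
# The classifier's margin is the smallest of these distances;
# "pick the one with the largest margin" maximizes that minimum.
```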

Page 6: Recognition II

How many possible solutions?

w·x + b = 0

Any other ways of writing the same dividing line?
• w·x + b = 0
• 2w·x + 2b = 0
• 1000w·x + 1000b = 0
• ….
• Any constant scaling has the same intersection with the z = 0 plane, so it gives the same dividing line!

Do we really want to max γ,w,b?

Page 7: Recognition II

Review: Normal to a plane

w·x + b = 0

Key terms:
• (w·xⱼ) / ||w|| — the projection of xⱼ onto w
• w / ||w|| — the unit vector in the direction of w, normal to the plane
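A quick numeric check of these key terms (the vectors w, b, and xⱼ below are arbitrary examples, not from the slides):

```python
import numpy as np

w = np.array([3.0, 4.0])
b = -2.0
x_j = np.array([1.0, 2.0])

unit_normal = w / np.linalg.norm(w)              # unit vector normal to the plane w.x + b = 0
projection = (w @ x_j) / np.linalg.norm(w)       # length of the projection of x_j onto w
signed_dist = (w @ x_j + b) / np.linalg.norm(w)  # signed distance from x_j to the plane
print(unit_normal, projection, signed_dist)      # [0.6 0.8]  2.2  1.8
```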

Page 8: Recognition II

Idea: constrained margin

[Figure: canonical hyperplanes w·x + b = +1, w·x + b = 0, and w·x + b = −1, with x⁺ on the positive line and x⁻ on the negative line; γ is the distance between them.]

Assume: x⁺ lies on the positive line, x⁻ on the negative line.

Generally: γ = 2 / ||w||

Final result: we can maximize the constrained margin by minimizing ||w||²!
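A reconstruction of the step the slide labels "Generally", under its stated assumption (x⁺ on the +1 line, x⁻ on the −1 line, with x⁺ reached from x⁻ by moving a distance γ along the unit normal):

```latex
% Margin width between the canonical hyperplanes (reconstructed derivation).
\begin{aligned}
  w \cdot x^{+} + b &= +1, \qquad w \cdot x^{-} + b = -1, \qquad
  x^{+} = x^{-} + \gamma\,\frac{w}{\lVert w \rVert} \\
  w \cdot \Bigl(x^{-} + \gamma\,\frac{w}{\lVert w \rVert}\Bigr) + b
    &= \underbrace{(w \cdot x^{-} + b)}_{=\,-1} + \gamma\,\frac{w \cdot w}{\lVert w \rVert}
     = -1 + \gamma \lVert w \rVert = +1 \\
  \Rightarrow\quad \gamma &= \frac{2}{\lVert w \rVert},
  \quad\text{so maximizing the margin is minimizing } \lVert w \rVert^{2}.
\end{aligned}
```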

Page 9: Recognition II

Max margin using canonical hyperplanes

The assumption of canonical hyperplanes (at +1 and -1) changes the objective and the constraints!

[Figure: the same canonical hyperplanes w·x + b = +1, 0, −1, with x⁺, x⁻ and the margin γ.]

Page 10: Recognition II

Support vector machines (SVMs)

• Solve efficiently by quadratic programming (QP) (see the sketch below)
  – Well-studied solution algorithms
  – Not simple gradient ascent, but close
• Hyperplane defined by support vectors
  – Could use them as a lower-dimensional basis to write down the line, although we haven't seen how yet
  – More on this later

[Figure: canonical hyperplanes w·x + b = +1, 0, −1, with the margin between them marked.]

Support vectors:
• data points on the canonical lines

Non-support vectors:
• everything else
• moving them will not change w
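As a concrete sketch (my own illustration, not part of the slides), here is how one might fit a linear SVM with a QP-based solver and read off the support vectors; scikit-learn's SVC and the toy data are assumptions made for the example:

```python
import numpy as np
from sklearn.svm import SVC  # libsvm under the hood: solves the SVM quadratic program

# Toy 2-D data: two linearly separable clusters labeled +1 / -1.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) + [2, 2], rng.randn(20, 2) - [2, 2]])
y = np.array([+1] * 20 + [-1] * 20)

clf = SVC(kernel="linear", C=1e3)   # large C ~ (nearly) hard margin on separable data
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, " b =", b)
print("margin width 2/||w|| =", 2 / np.linalg.norm(w))

# Only the support vectors (points on the canonical lines) determine w;
# moving any non-support vector slightly leaves the solution unchanged.
print("support vectors:\n", clf.support_vectors_)
```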

Page 11: Recognition II

What if the data is not linearly separable?

Add More Features!!!

What about overfitting?

Page 12: Recognition II

What if the data is still not linearly separable?

• First idea: jointly minimize w·w and the number of training mistakes
  – How to trade off the two criteria?
  – Pick C on a development / cross-validation set

• Trade off #(mistakes) and w·w: minimize  w·w + C · #(mistakes)
  – 0/1 loss
  – Not a QP anymore
  – Also doesn't distinguish near misses from really bad mistakes
  – NP-hard to find the optimal solution!!!

Page 13: Recognition II

Slack variables – Hinge loss

For each data point:
• If margin ≥ 1, don't care
• If margin < 1, pay a linear penalty (see the numeric sketch below)

[Figure: canonical hyperplanes w·x + b = +1, 0, −1; points inside the margin or misclassified have slack ξⱼ > 0.]

Objective: minimize  w·w + C Σⱼ ξⱼ
Constraints: yⱼ (w·xⱼ + b) ≥ 1 − ξⱼ,  ξⱼ ≥ 0

Slack penalty C > 0:
• C = ∞: have to separate the data!
• C = 0: ignore the data entirely!
• Select on a dev. set, etc.
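A small numeric sketch (my own, not from the slides) of how the slack/hinge penalty enters the objective for a fixed w and b:

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """w.w + C * sum_j xi_j, with xi_j = max(0, 1 - y_j (w.x_j + b)) (the hinge loss)."""
    margins = y * (X @ w + b)               # y_j (w.x_j + b) for every example
    slack = np.maximum(0.0, 1.0 - margins)  # xi_j = 0 when margin >= 1, linear penalty otherwise
    return w @ w + C * slack.sum()

# Points with margin >= 1 contribute nothing; points inside the margin
# (or misclassified) pay linearly, which keeps the problem a QP.
X = np.array([[2.0, 2.0], [-2.0, -2.0], [0.2, 0.1]])
y = np.array([+1.0, -1.0, +1.0])
w, b = np.array([0.5, 0.5]), 0.0
print(soft_margin_objective(w, b, X, y, C=1.0))
```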

Page 14: Recognition II

Side note: different losses
• Logistic regression: log loss
• Boosting: exponential loss
• SVM: hinge loss

All our new losses approximate the 0/1 loss!

[Figure: the 0/1 loss and hinge loss plotted against the margin yⱼ (w·xⱼ + b).]

Page 15: Recognition II

What about multiple classes?

Page 16: Recognition II

One against All

Learn 3 classifiers:
• + vs {0, −}, weights w+
• − vs {0, +}, weights w−
• 0 vs {+, −}, weights w0

Output for x: y = argmaxᵢ wᵢ·x (see the sketch below)

[Figure: the three weight vectors w+, w−, w0 and their decision regions.]

Any other way?
Any problems?
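A minimal sketch of the argmax rule; the weight vectors, class names, and test points are invented for illustration:

```python
import numpy as np

# One-vs-all prediction: one weight vector per class, stacked as rows of W.
W = np.array([[ 1.0,  1.0],    # w+ : "+" vs {0, -}
              [-1.0, -1.0],    # w- : "-" vs {0, +}
              [ 1.0, -1.0]])   # w0 : "0" vs {+, -}
classes = ["+", "-", "0"]

def predict(x):
    scores = W @ x                           # w_i . x for each class i
    return classes[int(np.argmax(scores))]  # pick the class with the largest score

print(predict(np.array([2.0, 1.5])))    # -> "+"
print(predict(np.array([-1.0, -2.0])))  # -> "-"
```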

Page 17: Recognition II

Learn 1 classifier: Multiclass SVM

Simultaneously learn 3 sets of weights:
• How do we guarantee the correct labels?
• Need new constraints! For j possible classes, the score of the correct class must beat the score of every other class.

[Figure: weight vectors w+, w−, w0.]

Page 18: Recognition II

Learn 1 classifier: Multiclass SVM

Also, can introduce slack variables, as before:
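The formula itself is not in the transcript; the standard multiclass SVM with slack variables, which is what the text describes, looks like this (the notation w⁽ʸ⁾, ξⱼ, C is mine):

```latex
% Standard multiclass SVM with slack (reconstruction; the slide's exact notation
% is not recoverable from the transcript).
\min_{w^{(1)},\dots,w^{(k)},\,\xi}\;
  \sum_{y=1}^{k} \lVert w^{(y)} \rVert^{2} \;+\; C \sum_{j} \xi_{j}
\quad \text{s.t.} \quad
  w^{(y_j)} \cdot x_j \;\ge\; w^{(y')} \cdot x_j + 1 - \xi_{j}
  \;\; \forall j,\; \forall y' \neq y_j,
\qquad \xi_{j} \ge 0 .
```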

Page 19: Recognition II

What if the data is not linearly separable?

Add More Features!!!

Page 20: Recognition II
Page 21: Recognition II

Example: Dalal-Triggs pedestrian detector

1. Extract a fixed-size (64×128 pixel) window at each position and scale

2. Compute HOG (histogram of oriented gradients) features within each window

3. Score the window with a linear SVM classifier

4. Perform non-maxima suppression to remove overlapping detections with lower scores (a code sketch of these steps follows)

Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005
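A rough sketch of this pipeline (my own illustration; the skimage HOG parameters and sliding-window settings are assumptions, not the detector's exact configuration):

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

WIN_H, WIN_W = 128, 64  # fixed detection window (pixels)

def window_features(window):
    """Step 2: HOG features for one 64x128 grayscale window."""
    return hog(window, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2-Hys")

def detect(image, clf, stride=8, threshold=0.0):
    """Steps 1 and 3: slide the window over one scale and score it with a linear SVM."""
    detections = []
    for y in range(0, image.shape[0] - WIN_H + 1, stride):
        for x in range(0, image.shape[1] - WIN_W + 1, stride):
            feats = window_features(image[y:y + WIN_H, x:x + WIN_W])
            score = clf.decision_function([feats])[0]
            if score > threshold:
                detections.append((score, x, y))
    # Step 4 (non-maxima suppression over overlapping detections) would go here.
    return detections

# Training sketch: positive/negative windows -> HOG features -> linear SVM, e.g.
#   X_train = np.array([window_features(w) for w in windows]); clf = LinearSVC().fit(X_train, labels)
```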

Page 22: Recognition II

Slides by Pete Barnum; Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005

Page 23: Recognition II

• Tested with:
  – RGB
  – LAB
  – Grayscale
• The color spaces give slightly better performance than grayscale

Page 24: Recognition II

Gradient filters tested: uncentered, centered, cubic-corrected, diagonal, Sobel. The simple centered [−1, 0, 1] filter outperforms the others.

Page 25: Recognition II

• Histogram of gradient orientations
  – Votes weighted by gradient magnitude
  – Bilinear interpolation between cells
• Orientation: 9 bins (for unsigned angles)
• Histograms in 8×8 pixel cells
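A toy sketch of one cell's histogram, assuming 9 unsigned-orientation bins and magnitude-weighted votes; bilinear interpolation between bins and cells is omitted, and all names here are illustrative:

```python
import numpy as np

def cell_histogram(cell, n_bins=9):
    """Orientation histogram for one 8x8 grayscale cell: each pixel votes for its
    (unsigned) gradient-orientation bin, weighted by its gradient magnitude."""
    gy, gx = np.gradient(cell.astype(float))
    magnitude = np.hypot(gx, gy)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0      # unsigned angles in [0, 180)
    bins = (angle / (180.0 / n_bins)).astype(int) % n_bins
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), magnitude.ravel())    # magnitude-weighted votes
    return hist

cell = np.arange(64, dtype=float).reshape(8, 8)  # dummy 8x8 cell
print(cell_histogram(cell))
```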


Page 26: Recognition II

Normalize with respect to surrounding cells


Page 27: Recognition II

Each window becomes one long feature vector x:

# features = 15 × 7 (cells) × 9 (orientations) × 4 (normalizations by neighboring cells) = 3780

(With 8×8 cells in a 64×128 window and 2×2-cell blocks at an 8-pixel stride, there are 7×15 block positions; each cell's 9-bin histogram appears once per block that contains it, i.e. up to 4 normalized copies.)

Page 28: Recognition II

Training set

Page 29: Recognition II

[Figure: visualization of the positive weights (pos w) and negative weights (neg w) of the learned classifier.]

Page 30: Recognition II

[Figure: the learned template resembles the outline of a pedestrian.]

Page 31: Recognition II

Detection examples

Page 32: Recognition II
Page 33: Recognition II

Each window is separately classified

Page 34: Recognition II
Page 35: Recognition II

Statistical template
• Object model = sum of scores of features at fixed positions

Example (threshold 7.5):
• +3 + 2 − 2 − 1 − 2.5 = −0.5  →  −0.5 > 7.5?  No  →  Non-object
• +4 + 1 + 0.5 + 3 + 0.5 = 10.5  →  10.5 > 7.5?  Yes  →  Object
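A tiny sketch of the template-scoring rule; the feature scores and the 7.5 threshold come from the example above, everything else is illustrative:

```python
def classify_window(feature_scores, threshold=7.5):
    """Statistical template: sum the scores of features at fixed positions
    and compare the total against a threshold."""
    total = sum(feature_scores)
    return ("object" if total > threshold else "non-object"), total

print(classify_window([+3, +2, -2, -1, -2.5]))  # ('non-object', -0.5)
```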

Page 36: Recognition II

Part Based Models

• Before that:

– Clustering