Overview - Pennsylvania State University


Page 1: Overview - Pennsylvania State University

Overview

•  Recall last class: Boosting is a way of generating a strong classifier as a weighted ensemble of weak ones

•  Today: Support Vector Machine (SVM) training generates a strong classifier directly

•  Case Study: Dalal and Triggs pedestrian detector

Page 2: Overview - Pennsylvania State University

Support Vector Machines

SVM slides from Kristen Grauman, UT-Austin

Other good resources:
•  Presentation slides from Christoph Lampert: https://sites.google.com/site/christophlampert/teaching/kernel-methods-for-object-recognition
•  Simple tutorial document by Chris Williams: http://www.inf.ed.ac.uk/teaching/courses/iaml/docs/svm.pdf
•  Video lecture by Pat Winston: https://www.youtube.com/watch?v=_PwhiWxHK8o

Page 3: Overview - Pennsylvania State University

Linear classifiers

Find linear function to separate positive and negative examples

Page 4: Overview - Pennsylvania State University

Lines in R2

ax + cy + b = 0

Let  w = [a, c]ᵀ,  x = [x, y]ᵀ

Page 5: Overview - Pennsylvania State University

Lines in R2

Let  w = [a, c]ᵀ,  x = [x, y]ᵀ

Then  ax + cy + b = 0  becomes  w · x + b = 0

Page 6: Overview - Pennsylvania State University

Linear classifiers

•  Find linear function to separate positive and negative examples

xi positive:  w · xi + b ≥ 0
xi negative:  w · xi + b < 0

Which line is best?
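Concretely, classifying with a given line is just a sign test on w · x + b. A minimal sketch in Python (the line and points are made-up numbers, not from the slides):

    import numpy as np

    # Line x - y + 0.5 = 0, i.e. w = [1, -1], b = 0.5
    w = np.array([1.0, -1.0])
    b = 0.5

    def classify(x):
        # positive if w.x + b >= 0, negative otherwise
        return 1 if w @ x + b >= 0 else -1

    print(classify(np.array([2.0, 1.0])))  # +1 (positive side)
    print(classify(np.array([0.0, 2.0])))  # -1 (negative side)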

Page 7: Overview - Pennsylvania State University

Support Vector Machines (SVMs)

•  Discriminative classifier based on optimal separating line (for 2d case)

•  Maximize the margin between the positive and negative training examples

Page 8: Overview - Pennsylvania State University

Support vector machines

•  Want line that maximizes the margin.

xi positive (yi = 1):   w · xi + b ≥ 1
xi negative (yi = −1):  w · xi + b ≤ −1

For support vectors:  w · xi + b = ±1

[Figure: separating line with margin and support vectors]

C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

Page 9: Overview - Pennsylvania State University

Lines in R2

w · x + b = 0,  where  w = [a, c]ᵀ,  x = [x, y]ᵀ  (i.e. ax + cy + b = 0)

What is the distance D from a point (x0, y0) to the line?

Page 10: Overview - Pennsylvania State University

Lines in R2

w · x + b = 0,  where  w = [a, c]ᵀ,  x = [x, y]ᵀ  (i.e. ax + cy + b = 0)

Distance from point (x0, y0) to the line:

D = (a x0 + c y0 + b) / √(a² + c²) = (wᵀ x0 + b) / ||w||


Page 12: Overview - Pennsylvania State University

Support vector machines

•  Want line that maximizes the margin.

xi positive (yi = 1):   w · xi + b ≥ 1
xi negative (yi = −1):  w · xi + b ≤ −1

For support vectors:  w · xi + b = ±1

Distance between point xi and the line:  |w · xi + b| / ||w||

For support vectors wᵀx + b = ±1, so each lies at distance 1/||w|| from the line, and the margin is

M = 1/||w|| − (−1/||w||) = 2/||w||

Page 13: Overview - Pennsylvania State University

Support vector machines

•  Want line that maximizes the margin.

xi positive (yi = 1):   w · xi + b ≥ 1
xi negative (yi = −1):  w · xi + b ≤ −1

For support vectors:  w · xi + b = ±1

Distance between point xi and the line:  |w · xi + b| / ||w||

Therefore, the margin is 2 / ||w||

Page 14: Overview - Pennsylvania State University

Finding the maximum margin line

1.  Maximize margin 2/||w||
2.  Correctly classify all training data points:

    xi positive (yi = 1):   w · xi + b ≥ 1
    xi negative (yi = −1):  w · xi + b ≤ −1

Quadratic optimization problem:

    Minimize  (1/2) wᵀw
    Subject to  yi (w · xi + b) ≥ 1

One constraint for each training point. Note the sign trick: multiplying by yi folds both inequalities into one.

C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
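In practice this quadratic program is handed to an off-the-shelf solver. A minimal sketch using scikit-learn (an assumed substitute for the optimization software the slides leave unnamed; a large C approximates the hard margin):

    import numpy as np
    from sklearn.svm import SVC

    # Toy linearly separable data, labels yi in {-1, +1}
    X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.5]])
    y = np.array([1, 1, -1, -1])

    # Solves: minimize (1/2) w'w  subject to  yi (w . xi + b) >= 1
    clf = SVC(kernel="linear", C=1e6)   # large C ~ hard margin
    clf.fit(X, y)

    w, b = clf.coef_[0], clf.intercept_[0]
    print("w =", w, "b =", b, "margin =", 2 / np.linalg.norm(w))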

Page 15: Overview - Pennsylvania State University

Finding the maximum margin line

•  Solution:  w = Σi αi yi xi
   (αi: learned weight;  xi: support vector)

C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

Page 16: Overview - Pennsylvania State University

Finding the maximum margin line

•  Solution:
     w = Σi αi yi xi
     b = yi − w · xi   (for any support vector)

•  Classification function:
     f(x) = sign(w · x + b) = sign(Σi αi yi xi · x + b)
   If f(x) < 0, classify as negative; if f(x) > 0, classify as positive.

•  Notice that it relies on an inner product between the test point x and the support vectors xi

•  (Solving the optimization problem also involves computing the inner products xi · xj between all pairs of training points)

C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
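The solution above can be read off a fitted linear SVM. A sketch with scikit-learn (attribute names are that library's convention: dual_coef_ stores αi yi for the support vectors):

    import numpy as np
    from sklearn.svm import SVC

    X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.5]])
    y = np.array([1, 1, -1, -1])
    clf = SVC(kernel="linear", C=1e6).fit(X, y)

    alpha_y = clf.dual_coef_[0]        # alpha_i * yi, support vectors only
    sv = clf.support_vectors_

    w = alpha_y @ sv                   # w = sum_i alpha_i yi xi
    b = clf.intercept_[0]              # equals yi - w . xi for any support vector

    x = np.array([1.5, 1.0])
    f = np.sign(alpha_y @ (sv @ x) + b)   # sign(sum_i alpha_i yi xi . x + b)
    print(f, np.sign(w @ x + b))          # the two forms agree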

Page 17: Overview - Pennsylvania State University

Questions

•  What if the features are not 2d?
•  What if the data is not linearly separable?
•  What to do for more than two classes?

Page 18: Overview - Pennsylvania State University

Questions

•  What if the features are not 2d?
   –  Generalizes to d dimensions: replace line with "hyperplane"

•  What if the data is not linearly separable?
•  What to do for more than two classes?

Page 19: Overview - Pennsylvania State University

Planes in R3

ax + by + cz + d = 0

Let  w = [a, b, c]ᵀ,  x = [x, y, z]ᵀ.  Then the plane is  w · x + d = 0

Distance from point (x0, y0, z0) to the plane:

D = (a x0 + b y0 + c z0 + d) / √(a² + b² + c²) = (wᵀ x0 + d) / ||w||

Page 20: Overview - Pennsylvania State University

Hyperplanes in Rn

w1 x1 + w2 x2 + … + wn xn + b = 0

Hyperplane H is the set of all vectors x ∈ Rn which satisfy:  wᵀx + b = 0

Distance from point x to hyperplane H:  D(H, x) = |wᵀx + b| / ||w||
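A quick numerical check of the distance formula (Python sketch; the plane and point are made up):

    import numpy as np

    # Hyperplane w . x + b = 0 in R^3
    w = np.array([1.0, 2.0, 2.0])
    b = -3.0

    x0 = np.array([2.0, 1.0, 1.0])
    D = abs(w @ x0 + b) / np.linalg.norm(w)   # |w'x0 + b| / ||w||
    print(D)   # (2 + 2 + 2 - 3) / 3 = 1.0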

Page 21: Overview - Pennsylvania State University

Questions

•  What if the features are not 2d?
•  What if the data is not linearly separable?

Page 22: Overview - Pennsylvania State University

Nonlinear SVMs

Slide from Andrew Zisserman

Page 23: Overview - Pennsylvania State University

Nonlinear SVMs

Slide from Andrew Zisserman

Page 24: Overview - Pennsylvania State University

Nonlinear SVMs

Slide from Andrew Zisserman

Page 25: Overview - Pennsylvania State University

The Kernel Trick

•  Recall we transformed linear regression into nonlinear regression using a feature vector Φ(x) and, ultimately, the "kernel trick."

•  We also use the kernel trick here to transform a linear classifier into a nonlinear one.

Page 26: Overview - Pennsylvania State University

Example Kernel

Slide from Andrew Zisserman

Page 27: Overview - Pennsylvania State University

Example Kernels

Slide from Andrew Zisserman

Page 28: Overview - Pennsylvania State University

Nonlinear SVMs

•  The kernel trick: instead of explicitly computing the lifting transformation φ(x), define a kernel function K such that

     K(xi, xj) = φ(xi) · φ(xj)

•  This gives a nonlinear decision boundary in the original feature space:

     Σi αi yi K(xi, x) + b

C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
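A sketch of this decision function evaluated by hand against scikit-learn's RBF SVM (the Gaussian kernel here is one common choice of K, not one the slide prescribes):

    import numpy as np
    from sklearn.svm import SVC

    # XOR-like data: not linearly separable in the original space
    X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
    y = np.array([1, 1, -1, -1])
    clf = SVC(kernel="rbf", gamma=1.0, C=10.0).fit(X, y)

    def K(a, b, gamma=1.0):
        # RBF kernel: K(a, b) = exp(-gamma ||a - b||^2)
        return np.exp(-gamma * np.sum((a - b) ** 2))

    x = np.array([0.9, 0.8])
    # sum_i alpha_i yi K(xi, x) + b, summed over support vectors
    f = sum(ay * K(sv, x) for ay, sv in zip(clf.dual_coef_[0], clf.support_vectors_))
    f += clf.intercept_[0]
    print(np.sign(f), clf.predict([x])[0])   # the two agree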

Page 29: Overview - Pennsylvania State University

Questions

•  What if the features are not 2d?
•  What if the data is not linearly separable?
•  What to do for more than two classes?

Page 30: Overview - Pennsylvania State University

Multi-class SVMs

•  Achieve a multi-class classifier by combining a number of binary classifiers

•  One vs. all
   –  Training: learn an SVM for each class vs. the rest
   –  Testing: apply each SVM to the test example and assign it the class of the SVM that returns the highest decision value

•  One vs. one
   –  Training: learn an SVM for each pair of classes
   –  Testing: each learned SVM "votes" for a class to assign to the test example

(Both schemes are sketched below.)
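Both schemes are available off the shelf. A sketch with scikit-learn (its estimators, not anything the slides name: LinearSVC trains one-vs-all by default, SVC one-vs-one):

    import numpy as np
    from sklearn.svm import SVC, LinearSVC

    rng = np.random.default_rng(0)
    X = rng.standard_normal((60, 2))
    y = np.repeat([0, 1, 2], 20)   # three toy classes

    ova = LinearSVC().fit(X, y)    # one binary SVM per class
    ovo = SVC(kernel="linear", decision_function_shape="ovo").fit(X, y)

    x = np.array([[0.5, -0.2]])
    print(ova.decision_function(x))  # 3 scores: pick the class with the highest
    print(ovo.decision_function(x))  # 3 pairwise scores: classes vote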

Page 31: Overview - Pennsylvania State University

Software for SVMs

Page 32: Overview - Pennsylvania State University

SVMs for recognition

1. Define a vector representation for each example.

2. Select a kernel function.

3. Compute pairwise kernel values between labeled examples.

4. Give this "kernel matrix" to SVM optimization software to identify support vectors & weights.

5. To classify a new example: compute kernel values between new input and support vectors, apply weights, check sign of output.
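This recipe maps directly onto a precomputed-kernel interface. A sketch with scikit-learn (its kernel="precomputed" option stands in for the unnamed "SVM optimization software"; the linear kernel is just an example):

    import numpy as np
    from sklearn.svm import SVC

    def K(A, B):
        # Step 2: choose a kernel; here the plain inner product
        return A @ B.T

    X_train = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.5]])  # step 1
    y_train = np.array([1, 1, -1, -1])

    gram = K(X_train, X_train)                           # step 3: kernel matrix
    clf = SVC(kernel="precomputed").fit(gram, y_train)   # step 4

    X_new = np.array([[1.5, 1.0]])
    k_new = K(X_new, X_train)   # step 5: kernel values vs. the training examples
    print(clf.predict(k_new))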

Page 33: Overview - Pennsylvania State University

Case Study: Pedestrian Detector

Navneet Dalal and Bill Triggs, “Histograms of Oriented Gradients for Human Detection,” CVPR 2005

Page 34: Overview - Pennsylvania State University

Dalal and Triggs CVPR’05

•  Detect upright pedestrians
•  Histogram of oriented gradient feature vector
•  Linear SVM classifier; sliding window detector

64×128 HOG descriptor

Page 35: Overview - Pennsylvania State University

HOG Feature Extraction

Page 36: Overview - Pennsylvania State University

HOG Feature Extraction: Cells

64×128 window → compute gradients

Each cell contains a histogram of gradient orientations, weighted by gradient magnitude
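A minimal sketch of one cell's histogram (Python; assumes the paper's defaults of 8×8 cells and 9 unsigned bins, and uses hard binning where the paper interpolates votes between neighboring bins):

    import numpy as np

    def cell_histogram(patch, n_bins=9):
        # patch: 8x8 grayscale cell; gradients from centered [-1, 0, 1] filters
        gx = np.zeros_like(patch); gy = np.zeros_like(patch)
        gx[:, 1:-1] = patch[:, 2:] - patch[:, :-2]
        gy[1:-1, :] = patch[2:, :] - patch[:-2, :]
        mag = np.hypot(gx, gy)
        ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned, 0-180 degrees

        # Each pixel votes its gradient magnitude into its orientation bin
        hist = np.zeros(n_bins)
        bins = (ang / (180.0 / n_bins)).astype(int) % n_bins
        np.add.at(hist, bins.ravel(), mag.ravel())
        return hist

    print(cell_histogram(np.random.rand(8, 8)))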

Page 37: Overview - Pennsylvania State University

HOG Feature Extraction: Blocks

“Each scalar cell response contributes several components to the final descriptor vector, each normalized with respect to a different block. This may seem redundant but good normalization is critical and including overlap significantly improves the performance.” — Dalal & Triggs, CVPR’05

2×2 block of cells → normalize → [ , , , , … , ]

Page 38: Overview - Pennsylvania State University

HOG Design Choices

Parameters
•  Gradient scale
•  Orientation bins
•  Block overlap area

Other choices
•  RGB or Lab, color/gray
•  Block normalization:  L2-norm  v ← v / √(||v||₂² + ε²);  L2-Hys;  or L1-sqrt, built on the L1 norm  v ← v / (||v||₁ + ε)

[Figure: cell, center bin, block geometry; R-HOG/SIFT vs. C-HOG block layouts]
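A sketch of the normalization schemes named above (numpy; the 0.2 clipping threshold in L2-Hys follows Lowe's SIFT convention, which the paper adopts):

    import numpy as np

    def l2_norm(v, eps=1e-5):
        # v <- v / sqrt(||v||_2^2 + eps^2)
        return v / np.sqrt(np.sum(v ** 2) + eps ** 2)

    def l2_hys(v, clip=0.2, eps=1e-5):
        # L2-normalize, clip components at 0.2, then renormalize
        return l2_norm(np.minimum(l2_norm(v, eps), clip), eps)

    def l1_sqrt(v, eps=1e-5):
        # v <- sqrt(v / (||v||_1 + eps))
        return np.sqrt(v / (np.sum(np.abs(v)) + eps))

    block = np.random.rand(36)   # 2x2 cells x 9 bins, concatenated
    print(l2_hys(block)[:4], l1_sqrt(block)[:4])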

Page 39: Overview - Pennsylvania State University

Parameter and design choices were guided by extensive experimentation measuring their empirical effect on detector performance (e.g., miss rate).

Page 40: Overview - Pennsylvania State University

Dalal & Triggs Detector

•  Default detector configuration:
   –  RGB colour space with no gamma correction;
   –  [−1, 0, 1] gradient filter with no smoothing;
   –  linear gradient voting into 9 orientation bins in 0°–180°;
   –  16×16 pixel blocks of four 8×8 pixel cells;
   –  Gaussian spatial window with σ = 8 pixels;
   –  L2-Hys (Lowe-style clipped L2 norm) block normalization;
   –  block spacing stride of 8 pixels (hence 4-fold coverage of each cell);
   –  64×128 detection window;
   –  linear SVM classifier.

(A library call matching most of these defaults is sketched below.)
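Most of these defaults map onto scikit-image's hog function; a sketch under the assumption that that library is acceptable here (it omits the Gaussian spatial window):

    import numpy as np
    from skimage.feature import hog

    window = np.random.rand(128, 64)   # one 64x128 detection window (grayscale)

    descriptor = hog(
        window,
        orientations=9,           # 9 bins over 0-180 degrees
        pixels_per_cell=(8, 8),   # 8x8 pixel cells
        cells_per_block=(2, 2),   # 16x16 pixel blocks of four cells
        block_norm="L2-Hys",      # Lowe-style clipped L2 normalization
    )
    print(descriptor.shape)   # (3780,) = 7x15 blocks x 4 cells x 9 bins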

Page 41: Overview - Pennsylvania State University

Detector Architecture

Learning phase:
   Create normalised training data set →
   Encode images into feature vectors →
   Learn binary classifier

Detection phase:
   Scan image at all scales and locations →
   Run classifier to obtain object/non-object decisions →
   Fuse multiple detections in 3-D position & scale space →
   Object detections with bounding boxes

Page 42: Overview - Pennsylvania State University

Positive and negative examples

[Positive training crops]  + thousands more…

[Negative training crops]  + millions more…

Page 43: Overview - Pennsylvania State University
Page 44: Overview - Pennsylvania State University
Page 45: Overview - Pennsylvania State University

Person detection with HOG & linear SVM

[Dalal and Triggs, CVPR 2005]

Soft (C=0.01) linear SVM trained with SVMLight.

Page 46: Overview - Pennsylvania State University

To detect people at all locations and scales:

•  Sliding window using learnt HOG template
•  Post-processing using non-maxima suppression

(A sketch of the scanning loop follows.)
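A sketch of the scanning loop (Python; score_fn is a hypothetical hook that would wrap HOG extraction plus the linear SVM's decision value):

    import numpy as np
    from skimage.transform import rescale

    def detect(image, score_fn, win=(128, 64), stride=8, scale_step=1.2, thresh=0.0):
        # Scan every location at every scale; keep windows the classifier accepts
        detections, s = [], 1.0
        img = image
        while img.shape[0] >= win[0] and img.shape[1] >= win[1]:
            for y in range(0, img.shape[0] - win[0] + 1, stride):
                for x in range(0, img.shape[1] - win[1] + 1, stride):
                    score = score_fn(img[y:y + win[0], x:x + win[1]])
                    if score > thresh:
                        # (x, y, w, h, score) in original-image coordinates
                        detections.append((x * s, y * s, win[1] * s, win[0] * s, score))
            s *= scale_step
            img = rescale(image, 1.0 / s)   # shrinking the image finds larger people
        return detections

Here score_fn could be, e.g., lambda w: clf.decision_function([hog(w)])[0], reusing the descriptor and classifier sketched earlier.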

Pages 47–52: Overview - Pennsylvania State University

To detect people at all locations and scales:

•  Sliding window using learnt HOG template
•  Post-processing using non-maxima suppression

[Pages 47–52 repeat this text over a sequence of detection-result images]

Page 53: Overview - Pennsylvania State University

Non-maximum Suppression across Scales
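A sketch of greedy non-maximum suppression over (x, y, w, h, score) boxes (a common variant; Dalal and Triggs themselves fuse detections with mean-shift mode finding in 3-D position/scale space):

    def iou(a, b):
        # Intersection-over-union of two (x, y, w, h, score) boxes
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2 = min(a[0] + a[2], b[0] + b[2])
        y2 = min(a[1] + a[3], b[1] + b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        return inter / (a[2] * a[3] + b[2] * b[3] - inter)

    def nms(boxes, iou_thresh=0.5):
        # Keep the highest-scoring box; drop any box overlapping a kept one
        kept = []
        for b in sorted(boxes, key=lambda b: b[4], reverse=True):
            if all(iou(b, k) < iou_thresh for k in kept):
                kept.append(b)
        return kept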

Page 54: Overview - Pennsylvania State University
Page 55: Overview - Pennsylvania State University
Page 56: Overview - Pennsylvania State University

Dalal and Triggs Summary

•  HOG feature representation
•  Linear SVM classifier; sliding window detector
•  Non-maximum suppression across scale
•  Use of detector performance metrics to guide tuning of system parameters
•  Detection rate 90% at 10⁻⁴ FP per window
•  Slower than Viola-Jones detector