SVM classifiers

cvml.ajou.ac.kr/wiki/images/1/1e/ch3_1_svm-1.pdf · 2017. 1. 12.

  • SVM classifiers

  • Binary classification

    Given training data (x_i, y_i) for i = 1 ... N, with x_i ∈ R^d and y_i ∈ {-1, 1},

    learn a classifier f(x) such that

    f(x_i) ≥ 0 if y_i = +1, and f(x_i) < 0 if y_i = −1

    i.e. y_i f(x_i) > 0 for a correct classification.

  • Linear separability

    [Figure: example point sets, one linearly separable and one not linearly separable]

  • Linear classifiers

    A linear classifier has the form

    f(x) = w^T x + b

    • In 2D the discriminant is a line

    • w is the normal to the line, and b the bias

    • w is known as the weight vector

  • Linear classifiers

    A linear classifier has the form

    f(x) = w^T x + b

    • In 3D the discriminant is a plane, and in nD it is a hyperplane

    For a K-NN classifier it was necessary to 'carry' the training data

    For a linear classifier, the training data is used to learn w and is then discarded

    Only w is needed for classifying new data (see the sketch below)

  • Reminder: The Perceptron Classifier

    Given linearly separable data x_i labelled into two categories y_i ∈ {-1, 1}, find a weight vector w such that the discriminant function

    f(x_i) = w^T x_i + b

    separates the categories for i = 1, …, N

    • How can we find this separating hyperplane?

    The Perceptron Algorithm

    Write the classifier as f(x_i) = w^T x_i + w_0 = w̃^T x̃_i, where w̃ = (w, w_0) and x̃_i = (x_i, 1)

    • Initialize w̃ = 0

    • Cycle through the data points {x̃_i, y_i}

    • If x̃_i is misclassified then w̃ ← w̃ + α y_i x̃_i

    • Until all the data is correctly classified

  • For example in 2D

    • Initialize w̃ = 0

    • Cycle through the data points {x̃_i, y_i}

    • If x̃_i is misclassified then w̃ ← w̃ + α y_i x̃_i

    • Until all the data is correctly classified (a runnable sketch of this loop follows)
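
    A runnable sketch of the loop above, operating on the augmented vectors x̃_i = (x_i, 1) (the 2D toy data is made up):

      import numpy as np

      def perceptron(X, y, alpha=1.0, max_epochs=1000):
          """Perceptron on augmented vectors x~_i = (x_i, 1), w~ = (w, w_0)."""
          Xa = np.hstack([X, np.ones((len(X), 1))])   # append 1 to absorb the bias
          w = np.zeros(Xa.shape[1])                   # initialize w = 0
          for _ in range(max_epochs):
              mistakes = 0
              for xi, yi in zip(Xa, y):               # cycle through the data points
                  if yi * (w @ xi) <= 0:              # x_i is misclassified
                      w += alpha * yi * xi            # w <- w + alpha * y_i * x_i
                      mistakes += 1
              if mistakes == 0:                       # all data correctly classified
                  break
          return w

      X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, 1.0]])
      y = np.array([1, 1, -1, -1])
      print(perceptron(X, y))   # augmented weights (w_1, w_2, w_0)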

  • • If the data is linearly separable, then the algorithm will converge

    • Convergence can be slow …

    • The separating line can end up close to the training data

    • We would prefer a larger margin for generalization

  • • Maximum margin solution: most stable under perturbations of the inputs

  • Support Vector Machine

  • SVM – sketch derivation

    • Since w^T x + b = 0 and c(w^T x + b) = 0 define the same plane, we have the freedom to choose the normalization of w

    • Choose the normalization such that w^T x_+ + b = +1 and w^T x_- + b = -1 for the positive and negative support vectors respectively

    • Then the margin is given by

      (w / ‖w‖) · (x_+ − x_-) = w^T (x_+ − x_-) / ‖w‖ = 2 / ‖w‖
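
    A quick numeric check of this formula (w, b, and the two support vectors below are constructed by hand):

      import numpy as np

      w = np.array([3.0, 4.0])      # ||w|| = 5
      b = 0.0
      x_pos =  w / (w @ w)          # satisfies w^T x_pos + b = +1
      x_neg = -w / (w @ w)          # satisfies w^T x_neg + b = -1

      margin = (w / np.linalg.norm(w)) @ (x_pos - x_neg)
      print(margin, 2 / np.linalg.norm(w))   # 0.4 0.4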

  • Support Vector Machine: linearly separable data

  • SVM – Optimization

    • Learning the SVM can be formulated as an optimization:

      max_w 2/‖w‖  subject to  w^T x_i + b ≥ +1 if y_i = +1, and w^T x_i + b ≤ −1 if y_i = −1, for i = 1 ... N

    • Or equivalently

      min_w ‖w‖²  subject to  y_i (w^T x_i + b) ≥ 1  for i = 1 ... N

    • This is a quadratic optimization problem subject to linear constraints, and there is a unique minimum (a solver sketch follows)
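
    In practice the QP is handed to a solver. A minimal sketch using scikit-learn's SVC with a linear kernel (a very large C approximates the hard-margin problem; the toy data is made up):

      import numpy as np
      from sklearn.svm import SVC

      X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
      y = np.array([1, 1, -1, -1])

      clf = SVC(kernel="linear", C=1e6).fit(X, y)   # C -> infinity: hard margin

      w = clf.coef_[0]       # learned weight vector
      b = clf.intercept_[0]  # learned bias
      print("margin =", 2 / np.linalg.norm(w))
      print("support vectors:\n", clf.support_vectors_)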

  • Linear separability again: What is the best w?

    • The points can be linearly separated, but there is a very narrow margin

    In general there is a trade-off between the margin and the number of mistakes on the training data

    • But possibly the large-margin solution is better, even though one constraint is violated

  • Introduce "slack" variables

  • "Soft" margin solution

    The optimization problem becomes

      min_{w ∈ R^d, ξ_i ∈ R^+}  ‖w‖² + C Σ_{i=1}^{N} ξ_i

    subject to

    y_i (w^T x_i + b) ≥ 1 − ξ_i for i = 1 ... N

    • Every constraint can be satisfied if ξ_i is sufficiently large

    • C is a regularization parameter:

    - small C allows constraints to be easily ignored → large margin

    - large C makes constraints hard to ignore → narrow margin

    - C = ∞ enforces all constraints: hard margin

    • This is still a quadratic optimization problem and there is a unique minimum.

    Note, there is only one parameter, C (its effect is sketched below).
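
    A sketch of the effect of C, recovering the slacks from a fitted model as ξ_i = max(0, 1 − y_i f(x_i)) (assumes scikit-learn; the toy data, with one point forcing a narrow margin, is made up):

      import numpy as np
      from sklearn.svm import SVC

      X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [1.0, 1.0]])
      y = np.array([1, 1, -1, -1])

      for C in (0.1, 10.0, 1e6):
          clf = SVC(kernel="linear", C=C).fit(X, y)
          f = clf.decision_function(X)     # f(x_i) = w^T x_i + b
          xi = np.maximum(0, 1 - y * f)    # slack variables xi_i
          width = 2 / np.linalg.norm(clf.coef_[0])
          print(f"C={C}: margin width={width:.3f}, total slack={xi.sum():.3f}")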

  • • The data is linearly separable

    • But only with a narrow margin

  • C = ∞ : hard margin

  • C = 10 : soft margin

  • Application: pedestrian detection in computer vision

    • Objective: detect (localize) standing humans in an image
      (c.f. face detection with a sliding-window classifier)

    • reduces object detection to binary classification

    • does an image window contain a person or not?

  • Detection problem → (binary) classification problem

  • Each window is separately classified

  • Training data

    • 64x128 images of humans cropped from a varied set of personal photos

    • Positive data – 1239 positive window examples (with reflections → 2478)

    • Negative data – 1218 person-free training photos (12180 patches)

  • Training

    • A preliminary detector

    • Trained with 2478 positive vs 12180 negative samples

    • Retraining

    • With an augmented data set

    • initial 12180 negatives + hard examples

    • Hard examples

    • The 1218 negative training photos are searched exhaustively for false positives (a sketch of this bootstrapping round follows)
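
    A hedged sketch of one such bootstrapping round (random stand-in feature vectors replace the real HOG windows; assumes scikit-learn):

      import numpy as np
      from sklearn.svm import LinearSVC

      rng = np.random.default_rng(0)
      pos = rng.normal(+1.0, 1.0, size=(200, 16))        # stand-in positive windows
      neg_pool = rng.normal(-1.0, 1.0, size=(5000, 16))  # all windows from person-free photos
      neg_train = neg_pool[:1000]                        # initially sampled negatives

      # Preliminary detector
      X0 = np.vstack([pos, neg_train])
      y0 = np.r_[np.ones(len(pos)), -np.ones(len(neg_train))]
      clf = LinearSVC(C=0.01).fit(X0, y0)

      # Exhaustive search of the negative pool for false positives ("hard examples")
      hard = neg_pool[clf.decision_function(neg_pool) > 0]

      # Retrain with the augmented set: initial negatives + hard examples
      X1 = np.vstack([X0, hard])
      y1 = np.r_[y0, -np.ones(len(hard))]
      clf = LinearSVC(C=0.01).fit(X1, y1)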

  • Feature: histogram of oriented gradients (HOG)

  • Averaged examples

  • Algorithm

    • Training (learning)

    • Represent each example window by a HOG feature vector

    • Train an SVM classifier

    • Testing (detection)

    • Sliding-window classifier (a sketch follows)

    f(x) = w^T x + b

    x_i ∈ R^d, with d = 1024

  • Learned model

    ๐‘“ ๐‘ฅ = ๐‘ค๐‘‡๐‘ฅ + b