CS 59000 Statistical Machine Learning, Lecture 18
Yuan (Alan) Qi, Purdue CS, Oct. 30, 2008


Page 1:

CS 59000 Statistical Machine Learning, Lecture 18

Yuan (Alan) Qi, Purdue CS

Oct. 30, 2008

Page 2:

Outline

• Review of Support Vector Machines for Linearly Separable Case

• Support Vector Machines for Overlapping Class Distributions

• Support Vector Machines for Regression

Page 3:

Support Vector Machines

Support Vector Machines: motivated by statistical learning theory.

Maximum margin classifiers

Margin: the smallest distance between the decision boundary and any of the samples
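As a concrete check, the margin of a fixed linear classifier can be computed directly from this definition; the weights, bias, and points below are made-up illustrative values, not from the lecture.

```python
import numpy as np

# Hypothetical linear classifier y(x) = w.x + b and a few labeled points
w = np.array([1.0, 1.0])
b = -3.0
X = np.array([[2.0, 2.0], [4.0, 3.0], [0.0, 0.0], [1.0, 0.0]])
t = np.array([1, 1, -1, -1])  # class labels in {-1, +1}

# Signed distance of each point to the decision boundary w.x + b = 0;
# multiplying by t makes it positive for correctly classified points
distances = t * (X @ w + b) / np.linalg.norm(w)

# The margin is the smallest such distance over the data set
margin = distances.min()
print(margin)
```

Here the closest point is (2, 2), at distance 1/sqrt(2) from the boundary.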

Page 4:

Maximizing Margin

Since rescaling w and b together does not change this ratio, we are free to set

t_n ( w^T φ(x_n) + b ) = 1

for the point closest to the decision boundary, so that all data points satisfy t_n ( w^T φ(x_n) + b ) ≥ 1.

In the case of data points for which the equality holds, the constraints are said to be active, whereas for the remainder they are said to be inactive.

Page 5:

Optimization Problem

Quadratic programming: minimize over w and b

(1/2) ||w||^2

Subject to

t_n ( w^T φ(x_n) + b ) ≥ 1, n = 1, ..., N
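For a small linearly separable data set this quadratic program can be handed to a general-purpose solver; the sketch below uses SciPy's SLSQP routine on made-up 2-D points (data and tolerances are illustrative, not from the lecture).

```python
import numpy as np
from scipy.optimize import minimize

# Toy separable data: x_n in R^2 with labels t_n in {-1, +1}
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]])
t = np.array([1.0, 1.0, -1.0, -1.0])

# Decision variables z = (w1, w2, b); objective (1/2)||w||^2
def objective(z):
    return 0.5 * (z[0] ** 2 + z[1] ** 2)

# One inequality constraint t_n (w.x_n + b) - 1 >= 0 per data point
constraints = [
    {"type": "ineq", "fun": (lambda z, x=x, tn=tn: tn * (z[:2] @ x + z[2]) - 1.0)}
    for x, tn in zip(X, t)
]

res = minimize(objective, np.zeros(3), method="SLSQP", constraints=constraints)
w, b = res.x[:2], res.x[2]

# Every point should satisfy the margin constraint at the optimum
print(t * (X @ w + b))
```

Points for which the printed value is (numerically) exactly 1 are the active constraints, i.e. the support vectors.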

Page 6:

Lagrange Multiplier

Maximize f(x)

Subject to g(x) = 0

Gradient of constraint: at a constrained stationary point, ∇f and ∇g must be parallel (otherwise we could move along the constraint surface and increase f), so

∇f(x) + λ ∇g(x) = 0

for some Lagrange multiplier λ.

Page 7:

Geometrical Illustration of Lagrange Multiplier

Page 8:

Lagrange Multiplier with Inequality Constraints

Page 9:

Karush-Kuhn-Tucker (KKT) Conditions

For maximizing f(x) subject to the inequality constraint g(x) ≥ 0, the solution satisfies

g(x) ≥ 0, λ ≥ 0, λ g(x) = 0,

so for each constraint either λ = 0 (the constraint is inactive) or g(x) = 0 (the constraint is active).

Page 10:

Lagrange Function for SVM

Quadratic programming: minimize (1/2)||w||^2 subject to t_n ( w^T φ(x_n) + b ) ≥ 1.

Lagrange function:

L(w, b, a) = (1/2)||w||^2 − Σ_n a_n { t_n ( w^T φ(x_n) + b ) − 1 }, with multipliers a_n ≥ 0.

Page 11:

Dual Variables

Setting the derivatives of L with respect to w and b to zero gives

w = Σ_n a_n t_n φ(x_n) and Σ_n a_n t_n = 0.

Page 12:

Dual Problem

Substituting these back into L eliminates w and b; we maximize

L~(a) = Σ_n a_n − (1/2) Σ_n Σ_m a_n a_m t_n t_m k(x_n, x_m)

subject to a_n ≥ 0 and Σ_n a_n t_n = 0, where k(x, x') = φ(x)^T φ(x') is the kernel function.

Page 13:

Prediction

New points are classified by the sign of

y(x) = Σ_n a_n t_n k(x, x_n) + b,

which involves only the data points with a_n > 0.

Page 14:

KKT Condition, Support Vectors, and Bias

The KKT conditions require a_n ≥ 0, t_n y(x_n) − 1 ≥ 0, and a_n { t_n y(x_n) − 1 } = 0, so for every data point either a_n = 0 or t_n y(x_n) = 1. The data points in the latter case are known as support vectors. We can then solve for the bias term by averaging over the support-vector set S:

b = (1/N_S) Σ_{n∈S} ( t_n − Σ_{m∈S} a_m t_m k(x_n, x_m) ).
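The same toy problem can be attacked from the dual side: solve for the multipliers a_n, identify the support vectors, and recover w and b. A hedged sketch with made-up data, using SciPy's general SLSQP routine rather than a dedicated QP solver:

```python
import numpy as np
from scipy.optimize import minimize

# Toy separable data
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]])
t = np.array([1.0, 1.0, -1.0, -1.0])
N = len(X)

# Dual objective, negated for a minimizer: (1/2) a^T Q a - sum(a),
# with Q_nm = t_n t_m x_n.x_m (linear kernel)
Q = (t[:, None] * t[None, :]) * (X @ X.T)
def neg_dual(a):
    return 0.5 * a @ Q @ a - a.sum()

res = minimize(
    neg_dual,
    np.zeros(N),
    method="SLSQP",
    bounds=[(0.0, None)] * N,                            # a_n >= 0
    constraints=[{"type": "eq", "fun": lambda a: a @ t}],  # sum_n a_n t_n = 0
)
a = res.x

# Support vectors are the points with a_n > 0 (up to numerical tolerance)
S = a > 1e-6
w = (a * t) @ X                # w = sum_n a_n t_n x_n
b = np.mean(t[S] - X[S] @ w)   # average the bias over support vectors

print(w, b, np.where(S)[0])
```

By the KKT conditions, every support vector should satisfy t_n y(x_n) = 1 at the optimum, which the test below checks numerically.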

Page 15:

Computational Complexity

Quadratic programming:

When the dimensionality of the feature space is smaller than the number of data points, solving the dual problem is more costly than solving the primal.

However, the dual representation allows the use of kernels, so it can handle feature spaces of high (even infinite) dimensionality.
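The kernel trick rests on the identity k(x, z) = φ(x)^T φ(z): the kernel evaluates an inner product in feature space without ever constructing φ. A quick sketch with the homogeneous second-degree polynomial kernel in 2-D, whose feature map is known in closed form:

```python
import numpy as np

# Explicit degree-2 feature map for x in R^2:
# phi(x) = (x1^2, sqrt(2) x1 x2, x2^2), so that phi(x).phi(z) = (x.z)^2
def phi(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def poly_kernel(x, z):
    return (x @ z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

# Both routes give the same value; the kernel never forms phi
lhs = poly_kernel(x, z)   # (1*3 + 2*(-1))^2 = 1
rhs = phi(x) @ phi(z)
print(lhs, rhs)
```

For kernels such as the Gaussian (RBF) kernel the corresponding φ is infinite-dimensional, so the kernel route is the only practical one.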

Page 16:

Example: SVM Classification

Page 17:

Classification for Overlapping Classes

Soft Margin: allow some data points to lie inside the margin, or even on the wrong side of the decision boundary, at a cost.

Page 18:

New Cost Function

To maximize the margin while softly penalizing points that lie on the wrong side of the margin (not decision) boundary, we introduce slack variables ξ_n ≥ 0 and minimize

C Σ_n ξ_n + (1/2)||w||^2

subject to t_n y(x_n) ≥ 1 − ξ_n, where C > 0 controls the trade-off between the slack penalty and the margin.

Page 19:

Lagrange Function

where we have Lagrange multipliers a_n ≥ 0 and μ_n ≥ 0:

L(w, b, ξ, a, μ) = (1/2)||w||^2 + C Σ_n ξ_n − Σ_n a_n { t_n y(x_n) − 1 + ξ_n } − Σ_n μ_n ξ_n.

Page 20:

KKT Condition

For the soft-margin problem the KKT conditions are

a_n ≥ 0, t_n y(x_n) − 1 + ξ_n ≥ 0, a_n { t_n y(x_n) − 1 + ξ_n } = 0,

μ_n ≥ 0, ξ_n ≥ 0, μ_n ξ_n = 0.

Page 21:

Gradients

Setting the derivatives of L to zero:

∂L/∂w = 0 gives w = Σ_n a_n t_n φ(x_n)

∂L/∂b = 0 gives Σ_n a_n t_n = 0

∂L/∂ξ_n = 0 gives a_n = C − μ_n

Page 22:

Dual Lagrangian

Since a_n = C − μ_n and μ_n ≥ 0, we have a_n ≤ C; combined with a_n ≥ 0 this gives the box constraints 0 ≤ a_n ≤ C. The ξ_n terms cancel, leaving the same dual form as in the separable case.

Page 23:

Dual Lagrangian with Constraints

Maximize

L~(a) = Σ_n a_n − (1/2) Σ_n Σ_m a_n a_m t_n t_m k(x_n, x_m)

Subject to

0 ≤ a_n ≤ C and Σ_n a_n t_n = 0.

Page 24:

Support Vectors

Two cases of support vectors (a_n > 0): if a_n < C, then μ_n > 0 forces ξ_n = 0, so the point lies exactly on the margin boundary; if a_n = C, the point lies inside the margin, either correctly classified (ξ_n ≤ 1) or misclassified (ξ_n > 1).

Page 25:

Solve Bias Term

Using the support vectors with 0 < a_n < C (for which ξ_n = 0 and hence t_n y(x_n) = 1), we can average to obtain

b = (1/N_M) Σ_{n∈M} ( t_n − Σ_{m∈S} a_m t_m k(x_n, x_m) ),

where M denotes the set of support vectors with 0 < a_n < C and S the set of all support vectors.

Page 26:

Interpretation from Regularization Framework

The soft-margin objective can equivalently be written as regularized empirical risk,

Σ_n E_hinge( t_n y(x_n) ) + λ ||w||^2,

where E_hinge(z) = max(0, 1 − z) is the hinge error function.

Page 27:

Regularized Logistic Regression

For logistic regression with targets t ∈ {−1, 1}, the regularized error function takes the analogous form

Σ_n ln( 1 + exp(−t_n y_n) ) + λ ||w||^2,

replacing the hinge error with the logistic error.

Page 28:

Visualization of Hinge Error Function
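Numerically, the hinge error E(z) = max(0, 1 − z), with z = t y(x), can be compared point-by-point against the logistic error ln(1 + e^(−z)) and the misclassification (0/1) error; a small sketch:

```python
import numpy as np

def hinge(z):
    # SVM hinge error: zero once the point is beyond the margin (z >= 1)
    return np.maximum(0.0, 1.0 - z)

def logistic(z):
    # Logistic error ln(1 + exp(-z)) from regularized logistic regression
    return np.log1p(np.exp(-z))

def zero_one(z):
    # Misclassification error: 1 when z < 0 (wrong side of the boundary)
    return (z < 0).astype(float)

z = np.array([-1.0, 0.0, 0.5, 1.0, 2.0])
print(hinge(z))     # [2.  1.  0.5 0.  0. ]
print(zero_one(z))  # [1. 0. 0. 0. 0.]
print(logistic(z))
```

Both the hinge and logistic errors are convex upper bounds on the 0/1 error, but only the hinge error is exactly zero beyond the margin, which is what produces sparse solutions.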

Page 29:

SVM for Regression

Using a sum-of-squares error with a quadratic regularizer, we minimize

(1/2) Σ_n ( y_n − t_n )^2 + (λ/2) ||w||^2.

However, the resulting ridge-regression solution is not sparse: every data point contributes to the predictor.
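The ridge solution has the closed form w = (λI + Φ^T Φ)^{-1} Φ^T t. A quick check on synthetic data (made up for illustration) shows that even when most true weights are zero, every ridge coefficient comes out nonzero, i.e. the solution is dense rather than sparse:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression problem: only 2 of 5 true weights are nonzero
Phi = rng.standard_normal((50, 5))
w_true = np.array([1.5, 0.0, 0.0, -2.0, 0.0])
targets = Phi @ w_true + 0.01 * rng.standard_normal(50)

# Closed-form ridge solution w = (lam*I + Phi^T Phi)^{-1} Phi^T t
lam = 1.0
w_ridge = np.linalg.solve(lam * np.eye(5) + Phi.T @ Phi, Phi.T @ targets)

# Coefficients are shrunk toward zero, but none is exactly zero
print(w_ridge)
```

This density is the motivation for the ε-insensitive error introduced next.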

Page 30:

ε-insensitive Error Function

To obtain sparse solutions, errors smaller than ε are ignored:

E_ε( y(x) − t ) = 0 if |y(x) − t| < ε, and |y(x) − t| − ε otherwise.

Minimize

C Σ_n E_ε( y(x_n) − t_n ) + (1/2)||w||^2.
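The ε-insensitive error is straightforward to evaluate; a minimal sketch (the ε value and residuals are made up):

```python
import numpy as np

def eps_insensitive(residual, eps=0.5):
    # E_eps(r) = 0 inside the tube (|r| < eps), |r| - eps outside
    return np.maximum(0.0, np.abs(residual) - eps)

r = np.array([-1.5, -0.3, 0.0, 0.4, 2.0])
print(eps_insensitive(r))  # [1.  0.  0.  0.  1.5]
```

Unlike the squared error, this function is exactly flat inside the tube, so small residuals incur no penalty at all.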

Page 31:

Slack Variables

How many slack variables do we need? Two for each data point: ξ_n ≥ 0 for points above the ε-tube (t_n > y(x_n) + ε) and ξ̂_n ≥ 0 for points below it (t_n < y(x_n) − ε).

Minimize

C Σ_n ( ξ_n + ξ̂_n ) + (1/2)||w||^2

subject to t_n ≤ y(x_n) + ε + ξ_n and t_n ≥ y(x_n) − ε − ξ̂_n.

Page 32:

Visualization of SVM Regression

Page 33:

Support Vectors for Regression

Which points will be support vectors for regression? Those that lie on the boundary of the ε-tube or outside it (ξ_n > 0 or ξ̂_n > 0).

Why? By the KKT conditions, points strictly inside the tube have a_n = â_n = 0 and therefore drop out of the prediction, which is what makes the solution sparse.
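The sparsity claim can be checked directly: points strictly inside the ε-tube contribute nothing to the (sub)gradient of the ε-insensitive loss, so they cannot influence the solution and are not support vectors. A sketch with a fixed, made-up linear predictor:

```python
import numpy as np

# Fixed linear predictor y(x) = w*x + b and targets (all values made up)
w, b, eps = 1.0, 0.0, 0.5
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
t = np.array([0.1, 1.2, 3.0, 2.9, 4.05])
residual = (w * x + b) - t

# Per-point subgradient of E_eps with respect to the prediction:
# 0 inside the tube, sign(residual) outside
sub = np.where(np.abs(residual) <= eps, 0.0, np.sign(residual))

inside = np.abs(residual) < eps
print(sub)      # only points outside the tube contribute
print(inside)   # these points cannot be support vectors
```

Here only the third point (residual −1.0, outside the ε = 0.5 tube) contributes; the other four are invisible to the optimizer.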

Page 34:

Sparsity Revisited

Discussion: sparsity can arise from the error function (as in the SVM's hinge and ε-insensitive losses) or from the regularizer (as in the Lasso's L1 penalty).