
Machine Learning

Supervised Learning and Support Vector Machine

Raj Kamal

[email protected]

Department of Mathematics

Indian Institute of Technology, Guwahati

Guwahati-781039, India


Seminar 1-1


Outline of the talk

Introduction

Motivation

Support Vector Machines

Software

Applications

Conclusion

Machine Learning

Machine learning, a branch of artificial intelligence, is a scientific discipline concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as sensor data or databases. Here the computer learns from experience.

Idea: synthesize computer programs by learning from representative examples of input (and output) data.

Rationale for learning from examples: (a) for many problems, there is no known method for computing the desired output from a set of inputs; (b) for other problems, computation according to the known correct method may be too expensive.

How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?


Continued

What is the learning problem?

Learning = improving with experience at some task:

1. improve over task T,

2. with respect to performance measure P,

3. based on experience E.

For example: T = classifying emails as spam or not, P = the fraction classified correctly, E = a corpus of hand-labeled emails.


Variants of Machine Learning

1. Supervised learning: given a set of labeled training data $(x_i, y_i)$, where the $x_i$ are samples and the $y_i$ are labels.

2. Unsupervised learning: given only a set of data $x_i$; learning without output values (data exploration, e.g. clustering).

3. Query learning: the learner can query the environment about the output associated with a particular input.

4. Reinforcement learning: the learner has a range of actions it can take to attempt to move towards states where it can expect high rewards. (Example: the cocktail party problem, with overlapping sounds to separate and verify.)

Such problems are solved using methods from statistics: regression, the EM algorithm, maximum likelihood estimation (MLE).


Supervised Learning

1. Training set: training examples whose input and output are known from experiment.

2. $x^{(i)}$: the $i$-th input value/vector.

3. $y^{(i)}$: the $i$-th output value/vector.

4. $(x^{(i)}, y^{(i)})$, $i = 1, \dots, m$: a training set of $m$ input and output examples.

5. $X$: the space of input values.

6. $Y$: the space of output values.

7. To describe the supervised learning problem: our goal is to learn a function $h : X \to Y$ such that $h(x)$ is a good predictor of the corresponding value of $y$.

8. $h$: the hypothesis.

Continued

1. When the target domain is continuous, we call the learning problem a regression problem.

2. When $Y$ takes discrete values, we call it a classification problem.

3. $x \in \Re^n$, where $n$ is the number of features.

4. $x_j^{(i)}$: the $j$-th feature of the $i$-th training example.

5. The $i$-th training example can have several features (shape, size, cost).

6. To perform supervised learning, we must decide how to represent the hypothesis.

7. $h_\theta(x) = \theta_0 + \theta_1 x_1 + \dots + \theta_n x_n$

8. $h_\theta(x) = \sum_i \theta_i x_i$, where $x_0 = 1$ (see the sketch after this list).

9. For a classifier the output is 0 or 1.
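As a minimal illustration (NumPy and the toy numbers are assumptions of this writeup, not part of the talk), the linear hypothesis above can be written as:

```python
import numpy as np

def h(theta, x):
    """Linear hypothesis h_theta(x) = sum_i theta_i * x_i, with x_0 = 1."""
    x = np.concatenate(([1.0], x))  # prepend the intercept feature x_0 = 1
    return theta @ x

theta = np.array([0.5, 2.0, -1.0])  # theta_0, theta_1, theta_2 (illustrative)
x = np.array([3.0, 4.0])            # two input features
print(h(theta, x))                  # 0.5 + 2.0*3.0 - 1.0*4.0 = 2.5
```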

Support Vector Machine (SVM)

Most classification tasks are not this simple; more complex structure is needed to make an optimal separation, and full separation would require a curve.

The original objects can be mapped, i.e. rearranged, using a set of mathematical functions called kernels; after the mapping they are linearly separable.

Instead of constructing the complex curve, all we have to do is find an optimal line that separates the positive and negative examples.

SVM is primarily a classifier method that performs the classification task by constructing a separating hyperplane.

Goal: to optimize the decision boundary.

Continued

Binary classifier: $y \in \{-1, 1\}$

$h_{\omega,b}(x) = g(\omega^T x + b)$

The parameters $\theta_i$ are replaced with $\omega_i$.

$g(z) = 1$ if $z \ge 0$, and $g(z) = -1$ otherwise.

$\omega = (\omega_1, \omega_2, \dots, \omega_n)^T$
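A minimal sketch of this thresholded predictor, under the same assumptions as the earlier example:

```python
import numpy as np

def g(z):
    """g(z) = 1 if z >= 0, else -1 (labels y in {-1, +1})."""
    return 1 if z >= 0 else -1

def predict(w, b, x):
    """h_{w,b}(x) = g(w^T x + b)."""
    return g(w @ x + b)

w = np.array([1.0, -2.0])                      # illustrative weights
print(predict(w, 0.5, np.array([2.0, 0.5])))   # 2.0 - 1.0 + 0.5 = 1.5 -> +1
print(predict(w, 0.5, np.array([-1.0, 1.0])))  # -1.0 - 2.0 + 0.5 = -2.5 -> -1
```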

Continued

Functional margin:

Given the $i$-th training example $(x^{(i)}, y^{(i)})$, we define the functional margin

$\hat{\gamma}^{(i)} = y^{(i)} (\omega^T x^{(i)} + b)$

If $y^{(i)} = -1$, then for the functional margin to be large we need $\omega^T x^{(i)} + b$ to be large and negative; if $y^{(i)} = 1$, we need it to be large and positive. We want the functional margin to be large, so that our prediction is correct and confident.

It is not a good measure by itself, though: scaling has an adverse effect, since replacing $(\omega, b)$ by $(2\omega, 2b)$ exploits the scaling freedom to make the functional margin arbitrarily large without changing the classifier.

Functional margin of the training set: $\hat{\gamma} = \min_{i=1,\dots,m} \hat{\gamma}^{(i)}$
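A minimal NumPy sketch of the functional margin and its scaling problem (the toy data and parameters are assumptions of this example, not from the talk):

```python
import numpy as np

def functional_margins(w, b, X, y):
    """gamma_hat^(i) = y^(i) * (w^T x^(i) + b) for every training example."""
    return y * (X @ w + b)

# Toy training set: two features per example, labels in {-1, +1}.
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -1.0]])
y = np.array([1, 1, -1])
w, b = np.array([1.0, 1.0]), -2.0

gammas = functional_margins(w, b, X, y)
print(gammas, gammas.min())                      # gamma_hat = min_i, here 1.0
print(functional_margins(2*w, 2*b, X, y).min())  # scaling inflates it to 2.0
```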

Continued

Geometric margin:

Consider the decision boundary corresponding to $(\omega, b)$, a training point $A = x^{(i)}$, and its projection $B$ onto the boundary, so that the distance $AB$ is the geometric margin $\gamma^{(i)}$. Since $\omega / \|\omega\|$ is the unit vector pointing in the same direction as $\omega$, the point $B$ is

$x^{(i)} - \gamma^{(i)} \, \omega / \|\omega\|$

and, lying on the decision boundary, it satisfies $\omega^T x + b = 0$. Solving:

$\gamma^{(i)} = (\omega / \|\omega\|)^T x^{(i)} + b / \|\omega\|$

Geometric margin:

$\gamma^{(i)} = y^{(i)} \left( (\omega / \|\omega\|)^T x^{(i)} + b / \|\omega\| \right)$

It is invariant to scaling.

$\gamma = \min_{i=1,\dots,m} \gamma^{(i)}$
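The same toy data shows that the geometric margin, by contrast, is scale-invariant (again a sketch under assumed values):

```python
import numpy as np

def geometric_margins(w, b, X, y):
    """gamma^(i) = y^(i) * ((w/||w||)^T x^(i) + b/||w||)."""
    return y * (X @ w + b) / np.linalg.norm(w)

X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -1.0]])
y = np.array([1, 1, -1])
w, b = np.array([1.0, 1.0]), -2.0

print(geometric_margins(w, b, X, y).min())      # gamma = min_i gamma^(i)
print(geometric_margins(2*w, 2*b, X, y).min())  # unchanged under scaling
```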

Continued

OPTIMAL MARGIN CLASSIFIER

Given a training set, a natural desideratum in view of the above is to find the decision boundary that maximizes the geometric margin, since this would reflect a very confident set of predictions on the training set and a good fit to the training data.

We want a classifier that separates the positive and negative training examples with a gap.

Continued

This leads to the following optimization problem:

$\max_{\gamma, \omega, b} \ \gamma$

such that $y^{(i)} (\omega^T x^{(i)} + b) \ge \gamma$, $i = 1, 2, \dots, m$, and $\|\omega\| = 1$.

The constraint $\|\omega\| = 1$ makes the functional margin equal to the geometric margin; every functional margin is at least $\gamma$, and we maximize the geometric margin.

Dropping the norm constraint instead:

$\max_{\hat{\gamma}, \omega, b} \ \hat{\gamma} / \|\omega\|$

such that $y^{(i)} (\omega^T x^{(i)} + b) \ge \hat{\gamma}$, $i = 1, 2, \dots, m$.

Imposing the scaling $\hat{\gamma} = 1$:

$\min_{\omega, b} \ \frac{1}{2} \|\omega\|^2$

such that $y^{(i)} (\omega^T x^{(i)} + b) \ge 1$, $i = 1, 2, \dots, m$.

This gives the optimal margin classifier, which we can solve with quadratic programming (QP) code.
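The talk leaves the QP to off-the-shelf code; as one practical illustration (scikit-learn and the toy data are assumptions of this writeup, not part of the talk), a linear-kernel SVC with a very large C approximates the hard-margin optimal margin classifier:

```python
import numpy as np
from sklearn.svm import SVC

# Linearly separable toy data (illustrative).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1, 1, -1, -1])

# A very large C approximates the hard-margin problem above.
clf = SVC(kernel="linear", C=1e10).fit(X, y)
print(clf.coef_, clf.intercept_)  # omega and b
print(clf.support_vectors_)       # the support vectors
```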

Continued

Write the constraints as $g_i(\omega) = -y^{(i)} (\omega^T x^{(i)} + b) + 1$.

Optimal margin classifier:

$\min_{\omega, b} \ \frac{1}{2} \|\omega\|^2$ such that $g_i(\omega) \le 0$

Dual:

$\max_\alpha \ W(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} y^{(i)} y^{(j)} \alpha_i \alpha_j \langle x^{(i)}, x^{(j)} \rangle$

such that $\alpha_i \ge 0$, $i = 1, 2, \dots, m$, and $\sum_i \alpha_i y^{(i)} = 0$.

Continued

On solving we get

$\omega = \sum_i \alpha_i y^{(i)} x^{(i)}$

$b = -\frac{\max_{i : y^{(i)} = -1} \omega^T x^{(i)} + \min_{i : y^{(i)} = 1} \omega^T x^{(i)}}{2}$

$f(x) = \omega^T x + b = \sum_{i=1}^m \alpha_i y^{(i)} \langle x^{(i)}, x \rangle + b$

$h_{\omega,b}(x) = g(\omega^T x + b)$
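scikit-learn also exposes the dual solution: dual_coef_ holds $\alpha_i y^{(i)}$ for the support vectors, so $\omega = \sum_i \alpha_i y^{(i)} x^{(i)}$ can be reassembled by hand. A sketch on the same assumed toy data as before:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1, 1, -1, -1])
clf = SVC(kernel="linear", C=1e10).fit(X, y)

# dual_coef_ holds alpha_i * y^(i) for each support vector, so
# omega = sum_i alpha_i y^(i) x^(i) is a single matrix product.
w = clf.dual_coef_ @ clf.support_vectors_
print(w, clf.coef_)    # the two agree
print(clf.intercept_)  # b

# f(x) = sum_i alpha_i y^(i) <x^(i), x> + b; its sign is the prediction
x_new = np.array([[1.0, 1.5]])
print(clf.decision_function(x_new))
```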

Continued

What if the data set is too hard to separate linearly?

We add slack variables $\xi_i$ to allow misclassification of difficult or noisy examples; the result is called the soft margin.

Primal:

$\min_{\omega, b, \xi} \ \frac{1}{2} \|\omega\|^2 + C \sum_{i=1}^m \xi_i$

such that

$y^{(i)} (\omega^T x^{(i)} + b) \ge 1 - \xi_i$, $i = 1, 2, \dots, m$,

$\xi_i \ge 0$, $i = 1, 2, \dots, m$.

Examples are now permitted to have functional margin less than 1. The parameter $C$ controls the trade-off between the slack penalty $\sum_i \xi_i$ and keeping $\|\omega\|$ small (a large margin).
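Once $(\omega, b)$ are fixed, each slack has the closed form $\xi_i = \max(0, 1 - y^{(i)}(\omega^T x^{(i)} + b))$, the hinge loss. A minimal NumPy sketch with assumed toy values:

```python
import numpy as np

def slacks(w, b, X, y):
    """xi_i = max(0, 1 - y^(i) (w^T x^(i) + b)): zero for points with
    functional margin >= 1, positive for margin violations."""
    return np.maximum(0.0, 1.0 - y * (X @ w + b))

X = np.array([[2.0, 2.0], [0.4, 0.4], [-1.0, -1.0]])
y = np.array([1, 1, -1])
w, b = np.array([1.0, 1.0]), 0.0

xi = slacks(w, b, X, y)
print(xi)                          # middle point violates the margin (xi > 0)
C = 1.0
print(0.5 * w @ w + C * xi.sum())  # soft-margin primal objective value
```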

Continued

What if the data set is too hard to handle? Then we map the input to a higher-dimensional space using kernels.

$\phi : x \mapsto \phi(x)$

$\phi(x)$ is a feature mapping, which maps attributes to input features.

$K(x, z) = \phi(x)^T \phi(z)$

Replace $\langle x, z \rangle$ with $K(x, z)$ and exploit this to use the SVM implicitly in the high-dimensional feature space.

Kernels: polynomial kernel, Gaussian kernel.

Continued

Polynomial kernel

For $n = 3$, the feature map corresponding to the quadratic polynomial kernel is

$\phi(x) = (x_1 x_1, \ x_1 x_2, \ x_1 x_3, \ x_2 x_1, \ x_2 x_2, \ x_2 x_3, \ x_3 x_1, \ x_3 x_2, \ x_3 x_3, \ \sqrt{2c}\, x_1, \ \sqrt{2c}\, x_2, \ \sqrt{2c}\, x_3, \ c)^T$

Continued

Polynomial kernel:

$K(x, z) = (x^T z + c)^d$

Gaussian kernel:

$K(x, z) = \exp\left( -\frac{\|x - z\|^2}{2 \sigma^2} \right)$

Kernels help computationally: the kernel value is evaluated in the original input space, reducing the time complexity compared with forming the high-dimensional features explicitly.
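This can be checked numerically: for $d = 2$ and $n = 3$, the explicit feature map $\phi$ from the previous slide reproduces $K(x, z) = (x^T z + c)^2$ as $\phi(x)^T \phi(z)$, while the kernel itself needs only a 3-dimensional inner product. A sketch with assumed toy inputs:

```python
import numpy as np

def phi(x, c):
    """Explicit feature map for K(x, z) = (x^T z + c)^2, x in R^3:
    all pairwise products x_i x_j, then sqrt(2c) x_i, then c."""
    pairs = np.outer(x, x).ravel()  # x_i x_j for all i, j
    return np.concatenate([pairs, np.sqrt(2 * c) * x, [c]])

def poly_kernel(x, z, c, d=2):
    return (x @ z + c) ** d

def gaussian_kernel(x, z, sigma):
    return np.exp(-np.linalg.norm(x - z) ** 2 / (2 * sigma ** 2))

x = np.array([1.0, 2.0, 3.0])
z = np.array([0.5, -1.0, 2.0])
c = 1.0

print(poly_kernel(x, z, c))        # computed in R^3: (4.5 + 1)^2 = 30.25
print(phi(x, c) @ phi(z, c))       # same value via the 13-dim feature map
print(gaussian_kernel(x, z, 1.0))  # Gaussian kernel for comparison
```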

Applications of Machine Learning

1. Natural language processing

2. Data mining

3. Speech recognition

4. Classifying web documents and emails

5. Statistics

6. Economics

7. Finance

8. Robotics

9. ... and so on