Support Vector Machines Part 2
Recap of SVM algorithm
Given training set S = {(x1, y1), (x2, y2), ..., (xm, ym) | (xi, yi) ∈ ℝ^n × {+1, -1}}
1. Choose a cheap-to-compute kernel function k(x,z)
2. Apply a quadratic programming procedure (using the kernel function k) to find {αi} and bias b, where αi ≠ 0 only if xi is a support vector.
3. Now, given a new instance x, find the classification of x by computing h(x) = sgn( Σi αi yi k(xi, x) + b ), where the sum runs over the support vectors.
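Step 3 can be sketched in code, assuming the standard dual form of the classifier, h(x) = sgn( Σi αi yi k(xi, x) + b ). The names `svm_classify` and `linear_kernel` and the toy values for the support vectors, αi, and b are made up for illustration; in practice they would come out of the quadratic programming step.

```python
import numpy as np

def linear_kernel(x, z):
    """Plain dot product, used here as the simplest possible kernel."""
    return float(np.dot(x, z))

def svm_classify(x, support_vectors, labels, alphas, b, kernel=linear_kernel):
    """Return +1 or -1 from the kernel expansion over the support vectors."""
    score = sum(a * y * kernel(sv, x)
                for sv, y, a in zip(support_vectors, labels, alphas))
    return 1 if score + b >= 0 else -1

# Hypothetical solution of the QP step (illustrative values, not from the slides).
support_vectors = [np.array([1.0, 1.0]), np.array([-1.0, -1.0])]
labels = [+1, -1]
alphas = [0.25, 0.25]
b = 0.0

pred_pos = svm_classify(np.array([2.0, 0.0]), support_vectors, labels, alphas, b)
pred_neg = svm_classify(np.array([-3.0, 1.0]), support_vectors, labels, alphas, b)
```

Only the support vectors enter the sum, so the cost of classifying a new instance depends on their number, not on the size of the full training set.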
Clarifications from last time
http://nlp.stanford.edu/IR-book/html/htmledition/img1260.png
Without changing the problem, we can rescale our data to set a = 1
Length of the margin
w is perpendicular to decision boundary
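A short derivation of this fact, assuming the decision boundary is the set of points x with w · x + b = 0 (the usual linear form; the symbols x_1 and x_2 are introduced here for illustration). Take any two points x_1 and x_2 on the boundary:

```latex
\begin{aligned}
\mathbf{w}\cdot\mathbf{x}_1 + b &= 0, \qquad \mathbf{w}\cdot\mathbf{x}_2 + b = 0 \\
\Rightarrow\quad \mathbf{w}\cdot(\mathbf{x}_1 - \mathbf{x}_2) &= 0
\end{aligned}
```

Since x_1 - x_2 can be any direction lying within the boundary, w has zero inner product with every such direction, which is what perpendicular means.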
More on Kernels
• So far we’ve seen kernels whose feature maps take instances in ℝ^n to instances in ℝ^z, where z > n.
• One way to create a kernel: figure out an appropriate feature space Φ(x), and find a kernel function k that defines the inner product on that space.
• More practically, we usually don’t know an appropriate feature space Φ(x).
• What people do in practice is either:
1. Use one of the “classic” kernels (e.g., polynomial), or
2. Define their own function that is appropriate for their task, and show that it qualifies as a kernel.
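As a small illustration of a "classic" kernel, the sketch below checks numerically that the degree-2 polynomial kernel k(x, z) = (x · z)^2 on ℝ^2 equals an inner product in ℝ^3. The explicit feature map `phi` and the test vectors are assumptions chosen for this example, not taken from the slides.

```python
import numpy as np

def poly2_kernel(x, z):
    """Degree-2 polynomial kernel: k(x, z) = (x . z)^2."""
    return float(np.dot(x, z)) ** 2

def phi(x):
    """Explicit feature map for poly2_kernel on 2-D inputs (maps R^2 to R^3)."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

k_val = poly2_kernel(x, z)                # computed in the original 2-D space
phi_val = float(np.dot(phi(x), phi(z)))   # computed in the 3-D feature space
```

Both values agree (up to floating-point rounding), but the kernel version never builds the higher-dimensional vectors, which is the point of choosing a cheap-to-compute kernel.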
How to define your own kernel
• Given training data (x1, x2, ..., xn)
• Algorithm for SVM learning uses kernel matrix (also called Gram matrix): K[i, j] = k(xi, xj) for i, j = 1, ..., n.
• We can choose some function k, and compute the kernel matrix K using the training data.
• We just have to guarantee that our kernel defines an inner product on some feature space.
• Not as hard as it sounds.
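Computing the kernel matrix from a chosen function k is mechanical. The sketch below assumes the definition K[i, j] = k(xi, xj); the helper name `gram_matrix`, the example kernel, and the three training points are hypothetical.

```python
import numpy as np

def gram_matrix(X, kernel):
    """Build the n x n kernel (Gram) matrix with K[i, j] = kernel(X[i], X[j])."""
    n = len(X)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(X[i], X[j])
    return K

def poly_kernel(x, z):
    """Inhomogeneous degree-2 polynomial kernel, used as the example k."""
    return (float(np.dot(x, z)) + 1.0) ** 2

# Three illustrative training points in R^2.
X = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])

K = gram_matrix(X, poly_kernel)
```

The double loop is written for clarity; for large training sets a vectorized computation would usually be preferred.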
What counts as a kernel?
• Mercer’s Theorem: If the kernel matrix K is symmetric positive semidefinite (symmetric, with no negative eigenvalues), it defines a kernel on the training data; that is, it defines an inner product in some feature space.
• We don’t even have to know what that feature space is! It can have a huge number of dimensions.
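One way to test this condition on a given kernel matrix is to check symmetry and look at the smallest eigenvalue. This is a minimal sketch: the function name `looks_like_kernel_matrix` and the tolerance are assumptions, and passing the check on one training set does not prove that the function is a kernel on all inputs.

```python
import numpy as np

def looks_like_kernel_matrix(K, tol=1e-10):
    """True if K is symmetric and has no eigenvalue below -tol."""
    K = np.asarray(K, dtype=float)
    if not np.allclose(K, K.T):
        return False
    # eigvalsh is intended for symmetric matrices and returns real eigenvalues.
    return bool(np.linalg.eigvalsh(K).min() >= -tol)

# Gram matrix of the polynomial kernel (x . z + 1)^2 on (0,0), (1,0), (0,1).
K_good = np.array([[1.0, 1.0, 1.0],
                   [1.0, 4.0, 1.0],
                   [1.0, 1.0, 4.0]])

# Matrix produced by k(x, z) = -||x - z||^2 on two points at distance 1.
# Its eigenvalues are +1 and -1, so it fails the check.
K_bad = np.array([[0.0, -1.0],
                  [-1.0, 0.0]])

good_ok = looks_like_kernel_matrix(K_good)
bad_ok = looks_like_kernel_matrix(K_bad)
```

The small tolerance allows for eigenvalues that come out slightly negative purely because of floating-point rounding.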
In-class exercises
Note for part (c):