Pattern Recognition and Machine Learning: Kernel Methods
Overview
Many linear parametric models can be recast into an equivalent dual representation in which the predictions are based on linear combinations of a kernel function evaluated at the training data points
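To make this concrete, here is a minimal numpy sketch of one such dual model: regularized least squares in its dual form (kernel ridge regression), where every prediction is a linear combination of kernel evaluations at the training points. The Gaussian kernel, bandwidth, regularizer, and data below are illustrative assumptions, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.uniform(-1.0, 1.0, size=(30, 1))                  # training inputs x_n
t = np.sin(np.pi * X[:, 0]) + 0.1 * rng.normal(size=30)   # noisy targets t_n

def k(x, xp):
    """One illustrative kernel choice: Gaussian with bandwidth 0.1."""
    return np.exp(-np.sum((x - xp) ** 2) / (2.0 * 0.1 ** 2))

# Dual representation: a = (K + lambda*I)^{-1} t, where K is the Gram matrix.
lam = 1e-3
K = np.array([[k(xn, xm) for xm in X] for xn in X])
a = np.linalg.solve(K + lam * np.eye(len(X)), t)

# Prediction is a linear combination of the kernel evaluated at the
# training data points: y(x) = sum_n a_n k(x, x_n).
x_new = np.array([0.3])
y_new = sum(a_n * k(x_new, x_n) for a_n, x_n in zip(a, X))
print(y_new)   # roughly sin(0.3 * pi)
```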
Kernel: k(x,x’) = Ф(x)ᵀФ(x’), where Ф(x) is a fixed nonlinear feature space mapping
The kernel is a symmetric function of its arguments, i.e. k(x,x’) = k(x’,x)
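A minimal sketch of this definition (the degree-2 polynomial feature map below is an illustrative assumption): choosing Ф(x) = (x1², √2·x1·x2, x2²) for 2-D inputs gives the kernel k(x,x’) = (xᵀx’)², and a quick numerical check confirms both the inner-product identity and the symmetry.

```python
import numpy as np

def phi(x):
    """Illustrative degree-2 polynomial feature map for 2-D inputs:
    phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])

def k(x, xp):
    """The corresponding kernel: k(x, x') = (x^T x')^2 = phi(x)^T phi(x')."""
    return np.dot(x, xp) ** 2

x  = np.array([1.0, 2.0])
xp = np.array([3.0, -1.0])

# The kernel equals the scalar product in feature space ...
assert np.isclose(k(x, xp), phi(x) @ phi(xp))
# ... and is symmetric in its arguments: k(x, x') = k(x', x).
assert np.isclose(k(x, xp), k(xp, x))
```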
Overview
Kernel trick or kernel substitution is the general idea that, if we have an algorithm formulated in such a way that the input vector x enters only in the form of scalar products, then we can replace the scalar product with some other choice of kernel
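One small illustration of the trick (a sketch; the quadratic kernel is just an example): squared distances in feature space expand entirely into scalar products, so they can be computed from kernel evaluations alone, without ever constructing Ф(x).

```python
import numpy as np

def k(x, xp):
    # Any valid kernel works here; a quadratic kernel as an example.
    return np.dot(x, xp) ** 2

def feature_space_dist_sq(x, xp):
    """Squared distance between phi(x) and phi(x'), obtained purely
    from kernel evaluations via the expansion
    ||phi(x) - phi(x')||^2 = k(x,x) - 2 k(x,x') + k(x',x')."""
    return k(x, x) - 2.0 * k(x, xp) + k(xp, xp)

x, xp = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(feature_space_dist_sq(x, xp))
```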
Stationary kernels – invariant to translations in input space k(x,x’) = k(x-x’)
Homogeneous kernels, also known as radial basis functions (RBF) – depend only on the magnitude of the distance between the arguments: k(x,x’) = k(||x-x’||)
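The Gaussian kernel is the standard example of both classes: it depends on its arguments only through ||x − x’||. A small sketch (the translation vector is arbitrary) checks the translation invariance numerically.

```python
import numpy as np

def gaussian_kernel(x, xp, sigma=1.0):
    """Gaussian (RBF) kernel: k(x, x') = exp(-||x - x'||^2 / (2 sigma^2)).
    It depends on x and x' only through ||x - x'||, so it is both
    stationary (translation invariant) and homogeneous (radial)."""
    return np.exp(-np.sum((x - xp) ** 2) / (2.0 * sigma ** 2))

x, xp = np.array([1.0, 2.0]), np.array([3.0, 0.0])
t = np.array([5.0, -7.0])   # an arbitrary translation

# Stationarity: translating both arguments leaves the kernel unchanged.
assert np.isclose(gaussian_kernel(x, xp), gaussian_kernel(x + t, xp + t))
```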
Constructing Kernels
Approach 1: Choose a feature space mapping and then use this to find the kernel
Constructing Kernels
Approach 2: Construct kernel functions directly, such that they correspond to a scalar product in some feature space
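A sketch of this approach, assuming the standard closure properties of valid kernels: positive scaling, sums, products, polynomials with nonnegative coefficients, and exponentials of valid kernels are again valid kernels, so complex kernels can be assembled from simple ones without ever writing down Ф(x).

```python
import numpy as np

def k1(x, xp):
    """A known valid kernel: the simple linear kernel."""
    return np.dot(x, xp)

def k_poly(x, xp):
    """(k1 + c)^2 with c >= 0: valid, since polynomials of a valid
    kernel with nonnegative coefficients are valid."""
    return (k1(x, xp) + 1.0) ** 2

def k_exp(x, xp):
    """exp(k1): valid, since the exponential of a valid kernel is valid."""
    return np.exp(k1(x, xp))

x, xp = np.array([1.0, 0.5]), np.array([-0.5, 2.0])
print(k_poly(x, xp), k_exp(x, xp))
```

The Gaussian kernel itself can be obtained this way, by combining exp(xᵀx’/σ²) with scaling factors of the form f(x)·k(x,x’)·f(x’); the Gram-matrix test sketched below can be used to sanity-check such constructions numerically.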
Constructing Kernels
A simpler way to test without having to construct Ф(x): use the necessary and sufficient condition that, for a function k(x,x’) to be a valid kernel, the Gram matrix K, whose elements are given by k(xn,xm), should be positive semidefinite for all possible choices of the set {xn}
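A numerical version of this test is easy to sketch. One caveat: the condition must hold for all choices of {xn}, so checking one sampled Gram matrix can only falsify validity, never prove it.

```python
import numpy as np

def gram_matrix(kernel, X):
    """Gram matrix K with K[n, m] = k(x_n, x_m) for the rows x_n of X."""
    return np.array([[kernel(xn, xm) for xm in X] for xn in X])

def looks_psd(K, tol=1e-10):
    """Numerical check of positive semidefiniteness: all eigenvalues
    of the (symmetric) Gram matrix are >= 0, up to round-off."""
    return bool(np.all(np.linalg.eigvalsh(K) >= -tol))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                       # one choice of the set {x_n}
K = gram_matrix(lambda a, b: np.dot(a, b) ** 2, X)
print(looks_psd(K))                                # True for this valid kernel
```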
Radial Basis Functions
Historically introduced for the purpose of exact function interpolation: f(x) = Σn wn h(||x − xn||), with one basis function centred on each data point
The values of the coefficients {wn} are found by least squares
Since there are as many coefficients as constraints, this results in a function that fits every target value exactly
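A minimal sketch of exact interpolation with Gaussian radial basis functions (the data and basis width are illustrative assumptions): with N data points and N coefficients, the linear system has a unique solution and the fit passes through every target.

```python
import numpy as np

X = np.linspace(0.0, 1.0, 10)       # 1-D training inputs x_n
t = np.sin(2.0 * np.pi * X)         # target values t_n

def h(r, s=0.1):
    """Gaussian radial basis function of the distance r."""
    return np.exp(-r ** 2 / (2.0 * s ** 2))

# One basis function centred on each data point:
# f(x) = sum_n w_n h(|x - x_n|), so the interpolation conditions
# f(x_m) = t_m form a square N x N linear system.
H = h(np.abs(X[:, None] - X[None, :]))
w = np.linalg.solve(H, t)           # N equations, N coefficients

# As many coefficients as constraints: the fit is exact at the data.
assert np.allclose(H @ w, t)
```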
Radial Basis Functions
Expansions in radial basis functions also arise if we imagine noise on the input variable x, described by a variable ξ having a distribution ν(ξ); the sum-of-squares error function is then E = ½ Σn ∫ {y(xn + ξ) − tn}² ν(ξ) dξ
Optimizing using the calculus of variations gives y(x) = Σn tn h(x − xn), with a basis function centred on every data point: the Nadaraya-Watson model
Nadaraya-Watson model
Can also be derived from kernel density estimation: with a Parzen estimator, the joint distribution is p(x,t) = (1/N) Σn f(x − xn, t − tn)
where f(x,t) is the component density function and there is one such component centred on each data point
We now find an expression for the regression function y(x), corresponding to the conditional average of the target variable conditioned on the input variable: assuming the components have zero mean in t, y(x) = E[t|x] = Σn k(x,xn) tn, with weights k(x,xn) = g(x − xn) / Σm g(x − xm) and g(x) = ∫ f(x,t) dt
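A minimal sketch of the resulting predictor with a Gaussian component density (the data and bandwidth are illustrative assumptions): the prediction is a weighted average of the training targets, with kernel weights that sum to one.

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.sort(rng.uniform(0.0, 1.0, size=50))                # training inputs x_n
t = np.sin(2.0 * np.pi * X) + 0.1 * rng.normal(size=50)    # noisy targets t_n

def nadaraya_watson(x, X, t, h=0.05):
    """Nadaraya-Watson prediction: y(x) = sum_n k(x, x_n) t_n with
    normalized weights k(x, x_n) = g(x - x_n) / sum_m g(x - x_m)."""
    g = np.exp(-(x - X) ** 2 / (2.0 * h ** 2))   # Gaussian component density
    return np.sum(g * t) / np.sum(g)

print(nadaraya_watson(0.25, X, t))               # roughly sin(pi/2) = 1
```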