Kernel Approaches to Supervised/Unsupervised Machine Learning

TRANSCRIPT

[Page 1]
S.Y. Kung
Princeton University
Kernel Approaches to Supervised/Unsupervised Machine Learning*
*Short course organized by NTU and IEEE Signal Processing Society, Taipei Section, June 29-30, 2009. For lecture notes, see the URL website of IEEE Signal Processing section, http://ewh.ieee.org/r10/taipei/
[Page 2]

Topics:

1. Kernel-Based Unsupervised/Supervised Machine Learning: Overview
2. Kernel-Based Representation Spaces
3. Kernel Spectral Analysis (Kernel PCA)
4. Kernel-Based Supervised Classifiers
5. Fast Clustering Methods for Positive-Definite Kernels (Kernel Component Analysis and the Kernel Trick)
6. Extension to Nonvectorial Data and NPD (Non-Positive-Definite) Kernels
[Page 3]

1. Overview

Machine learning techniques are useful for data mining and pattern recognition when only the pairwise relationships of the training objects are known, as opposed to the training objects themselves. Such a pairwise learning approach has special appeal for many practical applications, such as multimedia and bioinformatics, where a variety of heterogeneous sources of information are available. The primary information used in the kernel approach is either (1) the kernel matrix (K) associated with vectorial data or (2) the similarity matrix (S) associated with nonvectorial objects.
[Page 4]

From Linear to Kernel SVMs

Wolfe optimization:
• Linear inner-product objective function
• Kernel inner-product objective function

Subject to the constraints:

Decision Function:
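The Wolfe dual mentioned above takes the standard kernel-SVM form (a reconstruction, since the slide's own equations did not survive extraction):

```latex
\max_{\boldsymbol{\alpha}} \; L(\boldsymbol{\alpha})
  = \sum_{n=1}^{N}\alpha_{n}
  - \frac{1}{2}\sum_{m=1}^{N}\sum_{n=1}^{N}
    \alpha_{m}\alpha_{n}\,y_{m}y_{n}\,K(\mathbf{x}_{m},\mathbf{x}_{n}),
\quad \text{subject to } \sum_{n=1}^{N}\alpha_{n}y_{n}=0,\; \alpha_{n}\ge 0.
```

The linear objective is recovered with $K(\mathbf{x}_m,\mathbf{x}_n)=\mathbf{x}_m^{\top}\mathbf{x}_n$, and the decision function becomes $f(\mathbf{x})=\operatorname{sign}\!\big(\sum_{n}\alpha_n y_n K(\mathbf{x}_n,\mathbf{x}) + b\big)$.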
[Page 5]

Why the Kernel Approach: A New Distance Metric

Traditionally, we have the linear dot-product for two vectorial data x and y:

$\langle \mathbf{x}, \mathbf{y} \rangle = \mathbf{x}^{\top}\mathbf{y}$

with the distance D given by

$D^{2} = \|\mathbf{x}\|^{2} + \|\mathbf{y}\|^{2} - 2\,\mathbf{x}^{\top}\mathbf{y}
       = \langle \mathbf{x},\mathbf{x}\rangle + \langle \mathbf{y},\mathbf{y}\rangle - 2\,\langle \mathbf{x},\mathbf{y}\rangle.$

In the kernel approach, a new dot-product means a different distance (norm) metric in the new Hilbert space:

$\tilde{D}^{2} = K(\mathbf{x},\mathbf{x}) + K(\mathbf{y},\mathbf{y}) - 2\,K(\mathbf{x},\mathbf{y}).$
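The kernel-induced distance just described can be checked numerically; a minimal sketch, assuming a Gaussian kernel (the function names are illustrative):

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # K(x, y) = exp(-||x - y||^2 / (2 sigma^2))
    d = x - y
    return np.exp(-np.dot(d, d) / (2.0 * sigma**2))

def kernel_distance_sq(x, y, kernel):
    # the new distance metric: D~^2 = K(x, x) + K(y, y) - 2 K(x, y)
    return kernel(x, x) + kernel(y, y) - 2.0 * kernel(x, y)

x = np.array([1.0, 1.0])
y = np.array([-1.0, -1.0])
print(kernel_distance_sq(x, y, gaussian_kernel))  # 2 - 2 exp(-4), about 1.963
print(kernel_distance_sq(x, x, gaussian_kernel))  # 0: a point is at distance 0 from itself
```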
[Page 6]
Kernelization ⇒ Cornerization
[Page 7]

Application to the XOR Problem

(Figure: the four XOR training points, two labeled teacher = +1 and two labeled teacher = −1.)

Not linearly separable in X … but linearly separable in kernel spaces, e.g. F, E, and K.
[Page 8]
PCA vs. Kernel PCA
[Page 9]

Why the Kernel Approach: Nonvectorial Data

The learning techniques can be extended to situations where only the pairwise relationships of the training objects are known, as opposed to the training objects themselves. Such a pairwise learning approach has special appeal for many practical applications, such as multimedia and bioinformatics, where a variety of heterogeneous sources of information are available.
[Page 10]

Two Kinds of Kernel (or Similarity) Matrices

The primary information used in the kernel approach:

• Vectorial data: a PD kernel matrix, with a well-defined centroid/distance; e.g. the polynomial kernel or the Gaussian kernel. (The sigmoid kernel, rarely used, can be NPD.)
• Nonvectorial data: an NPD kernel matrix, i.e. a similarity matrix.
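As a quick numerical sketch (the kernel parameters here are illustrative choices), both kernels yield a PD kernel matrix on vectorial data, which can be verified through its eigenvalues:

```python
import numpy as np

def polynomial_kernel_matrix(X, degree=2):
    # K_ij = (1 + x_i . x_j)^degree
    return (1.0 + X @ X.T) ** degree

def gaussian_kernel_matrix(X, sigma=1.0):
    # K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma**2))

X = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])  # XOR data
for K in (polynomial_kernel_matrix(X), gaussian_kernel_matrix(X)):
    eigvals = np.linalg.eigvalsh(K)      # real eigenvalues of a symmetric matrix
    print(np.all(eigvals >= -1e-10))     # no negative eigenvalues (up to rounding)
```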
[Page 11]
(1) Kernel matrix (K) for vectorial data
Microarray gene expression data:
Kernel matrix for tissue samples:
Kernel matrix for genes:
[Page 12]

Microarray Data Matrix

Hierarchical clustering of gene expression data: the grouping of genes (rows) and of normal and malignant lymphocytes (columns). Depicted are the 1.8 million measurements of gene expression from 128 microarray analyses of 96 samples of normal and malignant lymphocytes.

Alizadeh et al., "Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling", Nature 403, 503–511 (2000).
[Page 13]
(2) Similarity matrix (S) for nonvectorial objects.
[Page 14]

Topics:

1. Kernel-Based Unsupervised/Supervised Machine Learning: Overview
2. Kernel-Based Representation Spaces
   2.1 Basic Kernel Hilbert Space
   2.2 Spectral Space
   2.3 Empirical Space
   2.4 Linear Mapping Between Spaces
3. Kernel Spectral Analysis (Kernel PCA)
4. Kernel-Based Supervised Classifiers
5. Fast Clustering Methods for Positive-Definite Kernels (Kernel Component Analysis and the Kernel Trick)
6. Extension to Nonvectorial Data and NPD Kernels
[Page 15]
Kernel Based Representation Spaces
New vector space:
“Linear” discriminant function:
[Page 16]
Polynomial Kernel Function
New vector space:
“Linear” discriminant function:
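For the second-order polynomial kernel in two variables, the new vector space can be written out explicitly (a standard expansion, added here for concreteness):

```latex
K(\mathbf{x},\mathbf{y}) = (1+\mathbf{x}^{\top}\mathbf{y})^{2}
  = \boldsymbol{\phi}(\mathbf{x})^{\top}\boldsymbol{\phi}(\mathbf{y}),
\qquad
\boldsymbol{\phi}(\mathbf{x}) =
  \begin{bmatrix} 1 & \sqrt{2}\,x_{1} & \sqrt{2}\,x_{2} & x_{1}^{2} & x_{2}^{2} & \sqrt{2}\,x_{1}x_{2} \end{bmatrix}^{\top},
```

so the induced basis is non-orthogonal with finite dimension J = 6.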
[Page 17]

XOR Training Dataset

(Figure: the four XOR points, two labeled teacher = +1 and two labeled teacher = −1.)
[Page 18]

The representation vector space has the following properties:
• It is a Hilbert space endowed with a dot-product.
• It has non-orthogonal bases with a finite dimension, J.
• It is independent of the training dataset.
[Page 19]

Kernel and Basis Functions: Orthogonal Basis Functions

Assume K(x, y) is positive-definite (PD) [21], i.e. $\lambda'_{k} \ge 0$.
[Page 20]

2.1 Basic Kernel Hilbert Space

These basis functions form the coordinate bases of the representation vector space with the dot-product:
[Page 21]

Basic Kernel Hilbert Space: F

It has the following properties:
• It is a Hilbert space endowed with a dot-product.
• It has ordered orthogonal bases with indefinite dimension.
• It is independent of the training dataset.
[Page 22]

Kernel Matrix

Two Factorizations of the Kernel Matrix:
(1) F: Basic Kernel Hilbert Space
(2) E: Kernel Spectral Space
[Page 23]
Factorization: Basic Kernel Hilbert Space
[Page 24]

Kernel Spectral Factorization

Apply the eigenvalue decomposition to the kernel matrix K:

(Figure: K decomposed into ordered spectral components 1, 2, …, N.)
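A minimal numerical sketch of this factorization, using the linear kernel matrix of the XOR points as an example (the variable names are illustrative): diagonalize K and reassemble it from its rank-one spectral components.

```python
import numpy as np

X = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])  # XOR data
K = X @ X.T                                  # linear kernel matrix of the 4 points

lam, U = np.linalg.eigh(K)                   # K = U diag(lam) U^T, lam ascending
components = [lam[m] * np.outer(U[:, m], U[:, m]) for m in range(len(lam))]
K_rebuilt = sum(components)                  # sum of rank-one spectral components

print(np.allclose(K, K_rebuilt))             # True: the factorization reproduces K
```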
[Page 25]
2.2 Spectral Vector Space: E
[Page 26]

Spectral Space: E

It has the following properties:
• It is a Hilbert space endowed with the conventional (linear) dot-product.
• It has a finite dimension, N, with ordered orthogonal bases.
• It is dependent on the training dataset. More precisely, E should be referred to as the kernel spectral space with respect to the training dataset.
[Page 27]

The Twin Spaces

F: Basic Kernel Hilbert Space ↔ E: Kernel Spectral Space
[Page 28]

The limiting case of KCA (i.e. with an indefinite number of training patterns covering the entire vector space R^N):

F: Basic Kernel Hilbert Space ↔ E: Kernel Spectral Space

When N is extremely large, with the training dataset uniformly distributed over the entire R^N, then

$\lambda'_{1} = \lambda_{1},\ \lambda'_{2} = \lambda_{2},\ \cdots,\ \lambda'_{N} = \lambda_{N}.$
[Page 29]

Preservation of Dot-Products: F ↔ E

K-Means on E
[Page 30]

Given the equivalence between the two spaces, the (same) supervised/unsupervised learning results can be (more easily) computed via the finite-dimensional spectral space, as opposed to the indefinite-dimensional space F; e.g. spectral K-means on E.

The spectral space E is effective for dimensionality reduction. Note that, although both kernel Hilbert spaces have ordered orthogonal bases, the latter is more effective for dimensionality reduction since it takes into account the distribution of the training dataset.
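Spectral K-means can be sketched as follows; a toy illustration on the XOR data, assuming a second-order polynomial kernel K(x, y) = (xᵀy)² and deterministic seeding (the helper names are illustrative):

```python
import numpy as np

def spectral_vectors(K):
    # K = U diag(lam) U^T; training point i maps to row i of U diag(sqrt(lam)),
    # so ordinary dot-products in E reproduce the kernel: E_i . E_j = K_ij
    lam, U = np.linalg.eigh(K)
    lam = np.clip(lam, 0.0, None)            # guard tiny negative rounding errors
    return U * np.sqrt(lam)

def kmeans(Z, k, iters=20):
    # plain Lloyd's algorithm; the first k points seed the centers (deterministic)
    centers = Z[:k].copy()
    labels = np.zeros(len(Z), dtype=int)
    for _ in range(iters):
        labels = np.argmin(((Z[:, None, :] - centers[None, :, :])**2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):          # skip empty clusters
                centers[j] = Z[labels == j].mean(axis=0)
    return labels

X = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])  # XOR data
K = (X @ X.T) ** 2                           # second-order polynomial kernel
E = spectral_vectors(K)
print(np.allclose(E @ E.T, K))               # True: dot-products in E preserve K
print(kmeans(E, 2))                          # [0 1 1 0]: the XOR classes
```

In this kernel's feature space the diagonal pairs of XOR points coincide, so ordinary K-means on the spectral vectors recovers the clustering that plain K-means on X cannot.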
[Page 31]
XOR Dataset
Empirical Vector
[Page 32]

XOR Dataset → Kernel Matrix → diagonalization (spectral factorization)
[Page 33]

Spectral Space: E (Empirical Vector)

XOR Dataset → diagonalization → ordered orthogonal components 1, 2, …, N
[Page 34]

(Figure: spectral-space coordinates of the XOR points x₁ and x₂, at [1.41, 1.41]ᵀ and [−1.41, −1.41]ᵀ; note 1.41 ≈ √2.)

Now verify:
[Page 35]

Comparison of Clustering Analyses: XOR Dataset

Solution by K-means

Solution by Kernel K-means
[Page 36]

2.3 Empirical Space: K

The empirical vector is defined as
[Page 37]

empirical vector:

XOR Dataset: 4 original vectors → 4 empirical vectors
[Page 38]

Empirical Space: K

It has the following properties:
• It is a Hilbert space endowed with the conventional (linear) dot-product.
• It has a finite dimension, N, with non-orthogonal bases. (Thus, unlike the spectral space, it cannot enjoy the advantage of having ordered orthogonal bases.)
• It is dependent on the training dataset.
[Page 39]

2.4 Linear Mapping: "The Three Spaces"
[Page 40]

Linear Mapping: Introduce a unifying vector "z"
[Page 41]

Linear Mapping: Kernel PCA
[Page 42]

Linear Mapping: Representer Theorem

A kernel solution w can always be represented by

The discriminant function (for SVM or KFD) can be expressed as
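In symbols, the two statements read (a standard reconstruction; the slide's equations did not survive extraction):

```latex
\mathbf{w} = \sum_{n=1}^{N} a_{n}\,\boldsymbol{\phi}(\mathbf{x}_{n}),
\qquad
f(\mathbf{x}) = \mathbf{w}^{\top}\boldsymbol{\phi}(\mathbf{x}) + b
  = \sum_{n=1}^{N} a_{n}\,K(\mathbf{x}_{n},\mathbf{x}) + b.
```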
[Page 43]

Non-Preservation of Dot-Products

Consequently, it opens up a new and promising clustering strategy, termed "empirical K-means".
[Page 44]

Topics:

1. Kernel-Based Unsupervised/Supervised Machine Learning: Overview
2. Kernel-Based Representation Spaces
3. Kernel Spectral Analysis
   3.1 Conventional PCA
   3.2 Kernel PCA
4. Kernel Fisher Discriminant (KFD) Analysis
5. Kernel-Based Support Vector Machine and Robustification of Training
6. Fast Clustering Methods for Positive-Definite Kernels (Kernel Component Analysis and the Kernel Trick)
7. Extension to Nonvectorial Data and NPD (Non-Positive-Definite) Kernels
[Page 45]

3.1 Conventional PCA

S =

Therefore, the PCA may be solved by applying eigenvalue decomposition (i.e. diagonalization) to

as opposed to
[Page 46]

For the four XOR training data,

which has two eigenvalues 4.0 and 4.0.

The corresponding pairwise score matrix is

The two nonzero eigenvalues are again 4.0 and 4.0, just the same as before.
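These eigenvalues are easy to verify; a quick check, assuming the zero-mean XOR data so that the scatter matrix is S = XᵀX and the pairwise score matrix is K = XXᵀ:

```python
import numpy as np

X = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])  # XOR data

S = X.T @ X                                  # 2x2 scatter matrix (zero-mean data)
K = X @ X.T                                  # 4x4 pairwise score (kernel) matrix

print(np.linalg.eigvalsh(S))                 # eigenvalues 4.0 and 4.0
print(np.linalg.eigvalsh(K))                 # same nonzero eigenvalues 4.0 and 4.0
```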
[Page 47]

3.2 Kernel Spectral Analysis [32]

Kernel PCA basically amounts to the spectral decomposition of K.
[Page 48]

Reconfirms the Representer Theorem:

Kernel PCA basically amounts to the spectral decomposition of K.
[Page 49]

Finding the m-th (kernel) components:

The m-th components of the entire set of training data:
[Page 50]
Kernel PCA
KCA
The eigenvector property can be confirmed as follows:
[Page 51]

Kernel PCA

From the spectral factorization perspective:
[Page 52]
Kernel PCA: XOR Dataset
[Page 53]

Kernel PCA

The two principal components

The two principal eigenvectors
[Page 54]
PCA vs. Kernel PCA
[Page 55]

PCA vs. Kernel PCA: Cornering Effect of the Kernel Hilbert Space
[Page 56]

[Page 57]
Topics:

1. Kernel-Based Unsupervised/Supervised Machine Learning: Overview
2. Kernel-Based Representation Spaces
3. Principal Component Analysis (Kernel PCA)
4. Supervised Kernel Classifiers
   4.1 Fisher Discriminant Analysis (FDA)
   4.2 Kernel Fisher Discriminant (KFD)
   4.3 Perfect KFD
   4.4 Perturbed Fisher Discriminant (PFD): Stochastic Modeling of the Training Dataset
   4.5 Hybrid Fisher-SVM Approach
5. Fast Clustering Methods for Positive-Definite Kernels
6. Extension to Nonvectorial Data and NPD (Non-Positive-Definite) Kernels
[Page 58]

4.1 Fisher Discriminant Analysis (FDA) [8] = Linear Discriminant Analysis (LDA)

In the traditional LDA:
• Between-Class Scatter Matrix
• Within-Class Scatter Matrix
• Total Scatter Matrix
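The three matrices carry their standard definitions (a reconstruction; here $\boldsymbol{\mu}$ is the global mean, $\boldsymbol{\mu}_{c}$ the class means, and $N_{c}$ the class sizes):

```latex
S_{W} = \sum_{c}\sum_{\mathbf{x}_{n}\in c}
          (\mathbf{x}_{n}-\boldsymbol{\mu}_{c})(\mathbf{x}_{n}-\boldsymbol{\mu}_{c})^{\top},
\qquad
S_{B} = \sum_{c} N_{c}\,(\boldsymbol{\mu}_{c}-\boldsymbol{\mu})(\boldsymbol{\mu}_{c}-\boldsymbol{\mu})^{\top},
\qquad
S = S_{W} + S_{B}.
```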
[Page 59]

Traditional FDA:
• Data-Distribution-Dependent
• Range: [0, ∞)

A useful variant:
• Range: [0, 1]

Both share the same optimal w.

Assume that N > M, so that S is invertible:
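A plausible reconstruction of the two criteria, consistent with the stated ranges (the slide's formulas did not survive extraction):

```latex
J(\mathbf{w}) = \frac{\mathbf{w}^{\top}S_{B}\,\mathbf{w}}{\mathbf{w}^{\top}S_{W}\,\mathbf{w}} \in [0,\infty),
\qquad
J'(\mathbf{w}) = \frac{\mathbf{w}^{\top}S_{B}\,\mathbf{w}}{\mathbf{w}^{\top}S\,\mathbf{w}}
  = \frac{J(\mathbf{w})}{1+J(\mathbf{w})} \in [0,1].
```

Since $J \mapsto J/(1+J)$ is monotone increasing, both criteria share the same optimal $\mathbf{w}$.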
[Page 60]
Note that
By definition:
[Page 61]

where $\|\mathbf{d}\| = 1$
[Page 62]

Closed-Form Solution of FDA (M ≥ N):

Fisher Score:

Generically,

where
[Page 63]
The optimal solution can be expressed as
[Page 64]

4.2 KFD Analysis

In the KFD analysis, we shall assume the generic situation in which the training vectors are distinct; other pathological distributions are excluded.

References [22–25]
[Page 65]
“universal” vector: z
Most common mappings:
[Page 66]
“Data” Matrix in Spectral Space: E
Pre-centered Matrix:
[Page 67]

[Page 68]

4.3 Perfect KFD
[Page 69]

Perfect KFD

A perfect KFD solution expressed in the "universal" vector space is:
[Page 70]

Within-Class Scatter Matrix → 0 after the projection.

Note that J = 1 implies that the positive and negative training vectors fall on two separate but parallel hyperplanes.
[Page 71]

Positive & Negative Hyperplanes

The entries of d take two different values:
• one value for the positive hyperplane, i.e. for all the positive training patterns;
• another value for the negative hyperplane, i.e. for all the negative training patterns.
FDA/LDA: XOR Example
The two class centroids coincide at the origin, so the between-class scatter vanishes:
$S_B = (\mu_+ - \mu_-)(\mu_+ - \mu_-)^T = 0$
The within-class scatter is
$S_W = \begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix} + \begin{bmatrix} 2 & -2 \\ -2 & 2 \end{bmatrix} = \begin{bmatrix} 4 & 0 \\ 0 & 4 \end{bmatrix}$
Since S_B = 0, FDA/LDA cannot discriminate the XOR classes in the original space.
The optima are Data-Distribution-Dependent.
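The scatter computation above can be checked numerically; a minimal pure-Python sketch, with the standard XOR class assignment (which points are positive vs. negative is assumed):

```python
# XOR example: positive class (1,1), (-1,-1); negative class (1,-1), (-1,1).
pos = [(1, 1), (-1, -1)]
neg = [(1, -1), (-1, 1)]

def mean(pts):
    # Component-wise mean of a list of 2-D points.
    return tuple(sum(c) / len(pts) for c in zip(*pts))

def scatter(pts, mu):
    # Scatter matrix: sum of (x - mu)(x - mu)^T over the points.
    S = [[0.0, 0.0], [0.0, 0.0]]
    for x in pts:
        d = [x[0] - mu[0], x[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                S[i][j] += d[i] * d[j]
    return S

mu_p, mu_n = mean(pos), mean(neg)
Sp, Sn = scatter(pos, mu_p), scatter(neg, mu_n)
Sw = [[Sp[i][j] + Sn[i][j] for j in range(2)] for i in range(2)]
print(mu_p, mu_n)  # both (0.0, 0.0): the between-class scatter vanishes
print(Sw)          # [[4.0, 0.0], [0.0, 4.0]]
```

Both centroids land on the origin, so S_B = 0 and the Fisher ratio is degenerate, exactly as stated on the slide.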
KFD: XOR Example
The KFD optimum is Data-Distribution-Independent, while the representation itself is Data-Distribution-Dependent (because E is Data-Distribution-Dependent).
Confirmed for this example:
The optima are Data-Distribution-Independent.
KFD: XOR Example
Next: More examples. Numerical Issues.
First example: Data-Distribution-Independence
(Figure: four training vectors x1–x4 at (1.1, 0.8), (1, 1), (−1.1, −0.8), and (−1, −1).)
>> K =
9.0000 8.4100 1.0000 0.8100
8.4100 8.1225 0.8100 0.7225
1.0000 0.8100 9.0000 8.4100
0.8100 0.7225 8.4100 8.1225
>> [UT, Lambda] = eigs(K)
UT =
0.5154 -0.5098 -0.4841 0.4900
0.4841 -0.4900 0.5154 -0.5098
0.5154 0.5098 -0.4841 -0.4900
0.4841 0.4900 0.5154 0.5098
Lambda =
18.6606 0 0 0
0 15.3059 0 0
0 0 0.1844 0
0 0 0 0.0941
LambdaSqrt =
4.3198 0 0 0
0 3.9123 0 0
0 0 0.4294 0
0 0 0 0.3067
>> E = LambdaSqrt*UT' =
2.2264 2.0913 2.2264 2.0913
-1.9943 -1.9172 1.9943 1.9172
-0.2079 0.2213 -0.2079 0.2213
0.1503 -0.1564 -0.1503 0.1564
>> data-centered E = ee =
0.0675 -0.0675 0.0675 -0.0675
-1.9943 -1.9172 1.9943 1.9172
-0.2146 0.2146 -0.2146 0.2146
0.1503 -0.1564 -0.1503 0.1564
>> denominator_matrix = ee*ee' =
0.0182 0.0000 -0.0580 0.0000
0.0000 15.3059 0.0000 0.0000
-0.0580 0.0000 0.1842 0.0000
0.0000 0.0000 0.0000 0.0941
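The spectral ("data") representation can also be recovered from K without `eigs`; a pure-Python sketch using plain power iteration for the dominant eigenpair. The four training vectors and the second-order polynomial kernel K(x, y) = (1 + x·y)² are inferred from the printed numbers, not stated on the slide:

```python
def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def power_iteration(M, iters=500):
    """Dominant eigenpair of a symmetric matrix M via power iteration."""
    v = [1.0] * len(M)
    for _ in range(iters):
        w = matvec(M, v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    w = matvec(M, v)
    lam = sum(vi * wi for vi, wi in zip(v, w))  # Rayleigh quotient
    return lam, v

# Training vectors and kernel inferred from the printed K above.
X = [(1, 1), (1.1, 0.8), (-1, -1), (-1.1, -0.8)]
K = [[(1 + a[0]*b[0] + a[1]*b[1]) ** 2 for b in X] for a in X]

lam1, u1 = power_iteration(K)
# First row of the spectral representation E = Lambda^{1/2} U^T:
e1 = [lam1 ** 0.5 * u for u in u1]
print(round(lam1, 4))  # top eigenvalue, ~18.66, matching the slide's Lambda
```

Deflating K by lam1·u1·u1ᵀ and repeating would recover the remaining rows of E, up to sign.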
KFD: Numerical Analysis
>> v = [0 1 0 0]^T is numerically stable, as it is robust w.r.t. perturbation.
>> v = [0 0 1 0]^T is numerically risky, as a small perturbation could cause a big effect.
Two different decision vectors, two different scenarios.
In general, >> v = [x x … x x]^T: each element has its own sensitivity.
Easy Case
[ rho  eta_opt ]
 1.0000  0.0000
[ score_ve  score_hybrid_AR  score_hybrid_dR  score_vopt ]
 0.9351  0.9383  0.9383  0.9383
>> numerator_matrix
0.0000 0.0000 0.0000 0.0000
0.0000 15.3 0.0000 0.0236
0.0000 0.0000 0.0000 0.0000
0.0000 0.0236 0.0000 0.00005
>> v = [0 1 0 0]^T
>> Fisher Score = 0.9996
Imperfect FDA score = 0.9997
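The printed numerator and denominator matrices suffice to reproduce the slide's Fisher scores; a hedged sketch, assuming the score is the ratio J(v) = (vᵀNv)/(vᵀDv), with the matrices copied from the easy-case output above:

```python
# N: numerator matrix, D: denominator matrix (ee*ee'), as printed above.
N = [[0.0,  0.0,    0.0, 0.0],
     [0.0, 15.3,    0.0, 0.0236],
     [0.0,  0.0,    0.0, 0.0],
     [0.0,  0.0236, 0.0, 0.00005]]
D = [[ 0.0182,  0.0,    -0.0580, 0.0],
     [ 0.0,    15.3059,  0.0,    0.0],
     [-0.0580,  0.0,     0.1842, 0.0],
     [ 0.0,     0.0,     0.0,    0.0941]]

def quad(M, v):
    # Quadratic form v^T M v.
    return sum(v[i] * M[i][j] * v[j] for i in range(4) for j in range(4))

def fisher_score(v):
    return quad(N, v) / quad(D, v)

print(round(fisher_score([0, 1, 0, 0]), 4))  # 0.9996, as on the slide
```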
Hard Case
[ rho  eta_opt ]
 1.0000  -0.2401
[ score_ve  score_hybrid_AR  score_hybrid_dR  score_vopt ]
 0.1684  0.0919  0.1571  0.1684
>> numerator_matrix = ee*d*d'*ee'
0.0182 -0.0000 -0.0580 0.0000
-0.0000 0.0000 0.0000 -0.0000
-0.0580 0.0000 0.1843 -0.0000
0.0000 -0.0000 -0.0000 0.0000
>> v = [0 0 1 0]^T
>> Fisher Score = 1.0000
“Imperfect” FDA score = 1.0000
Not-as-Hard Case (poorest result)
[ rho  eta_opt ]
 1.0000  0.0000
[ score_ve  score_hybrid_AR  score_hybrid_dR  score_vopt ]
 0.0860  0.0863  0.0863  0.0863
>> numerator_matrix = ee*d*d'*ee'
0.0000 -0.0000 0.0000 0.0000
-0.0000 0.0060 -0.0000 -0.0237
0.0000 -0.0000 0.0000 0.0000
0.0000 -0.0237 0.0000 0.0940
>> v = [0 0 0 1]^T
>> Fisher Score = 1.0000
>> vopt_dR =
 -0.0000
  0.0047
 -0.0000
  0.2803
Imperfect FDA score = 0.9634
It may be further verified that a vector is a perfect KFD if and only if it falls on the two-dimensional solution space prescribed above.
Solution Space of Perfect KFDs
Solution Space of Perfect KFDs
(Figure: the "signal" direction, the mass center, and three 90° angles marking the orthogonality of the perfect KFD.)
Illustration of why both the numerator and the denominator are pictorially shown to be equal to 1.
Invariant Properties
The denominator remains constant:
The numerator remains constant:
To create a maximum margin between the training vectors and the decision hyperplane, the threshold is set as the average of the two projected centroids (instead of the projected mass center).
Selection of decision threshold: b
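A minimal sketch of this threshold selection on made-up projection scores (all numbers hypothetical):

```python
# Projected values w.x for the training vectors of each class (made up).
pos_scores = [2.1, 2.3, 1.9]
neg_scores = [-1.2, -0.8, -1.0]

centroid_pos = sum(pos_scores) / len(pos_scores)
centroid_neg = sum(neg_scores) / len(neg_scores)

# Threshold at the midpoint of the two projected centroids; the decision
# rule is then sign(w.x + b).
b = -(centroid_pos + centroid_neg) / 2
print(round(b, 2))  # -0.55
```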
Two Case Studies:
This self-centered solution matches the SVM approach.
Case I:
Case II:
However, different perfect KFDs have their own margins of separation which have a simple formula:
This plays a vital role in the perturbation analysis, since the wider the separation, the more robust the classifier.
Margin of Separation
The solution can be obtained by minimization of
This perfect KFD has a maximal PFD among its own kind.
Maximum Separation by Perfect KFD
4.4 Perturbed Fisher Discriminant (PFD)
Perfect is not optimal!
Optimal is not perfect!
Stochastic Modeling of Training Dataset
Now each training pattern is viewed as an instantiation of a noisy measurement of the (noise-free) signal, i.e., the data matrix associated with the robustified (noise-added) training patterns can be expressed as:
X + N
With some extensive analysis, the following can be established for the robustification in the spectral space:
Application Motivation: Stochastic Model of Training Dataset
Perturbed Fisher Discriminant
Simulation Supports
Say No to Conventional KFD or Perfect KFD.
Say Yes to PFD !
Perturbed Fisher Discriminant
PFD for Perfect KFD
Maximal PFD Among Perfect KFDs
Best Perfect KFD from PFD perspective:
Perfect KFD vs. Imperfect KFD
(Figure: the perfect KFD projects onto the "signal" direction, at 90° to the noise and through the mass center; the imperfect KFD projects onto "total = signal + noise".)
where
Globally Optimal PFD Solution
and
Dataset 2 (easy case):
0.9247  1.1131
score(best perfect)  score(AR)  score(vopt)
rho=.04:  0.9825  0.9880  0.9891
rho=.4:   0.8486  0.9081  0.9292
Dataset 5 (hard case):
0.0671  1.0056
score(best perfect)  score(AR)  score(vopt)
rho=.04:  0.5448  0.6283  0.6910
rho=.4:   0.1069  0.1983  0.5190
Simulations: compare z = d versus the optimal z (5 points):
(1) for rho=1: PFD = 0.8017 versus 0.8464, with KFD = 0.9933;
(2) for rho=.1: PFD = 0.9675 versus 0.9778, with KFD = 0.9959; and
(3) again for rho=.1, the PFDs are 0.9573 and 0.9734 for the perfect optimal KFD and the regularized perfect KFD, respectively.
4.5 Hybrid Fisher-SVM
Empirical Space
Computationally speaking, when all three spaces are basically equivalent (e.g., for KFD and SVM), the empirical space is the most attractive, since the spectral space incurs the extra cost of spectral decomposition.
Positive & Negative Hyperplanes: in K
It is perfect, but not optimal!
The Optimal PFD Solution
Regularization by a stochastic model of the data.
Derivation:
An alternative and effective regularization approach is the imposition of constraints in the Wolfe optimization, a technique adopted by kernel-SVM classifiers to alleviate over-training problems.
Regularization by Constraints
Linear SVM
subject to
The Lagrange multipliers can be solved for by the Wolfe optimization:
Kernel SVM
subject to
Solution by the Wolfe optimization:
Since the dot-product is changed to:
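The kernel substitution can be sketched in a few lines; the second-order polynomial kernel below is an illustrative assumption, consistent with the XOR examples elsewhere in these notes:

```python
def dot(x, y):
    # Ordinary dot-product used by the linear SVM.
    return sum(a * b for a, b in zip(x, y))

def poly_kernel(x, y, degree=2):
    # Kernel SVM: the dot-product is replaced by a kernel evaluation.
    return (1 + dot(x, y)) ** degree

x, y = (1, 1), (1.1, 0.8)
print(round(poly_kernel(x, y), 2))  # 8.41 = (1 + 1.9)^2
```

Swapping `dot` for `poly_kernel` inside the Wolfe dual is the entire change from linear to kernel SVM.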
Ignoring for a moment the constraints on a_i, the optimal solution can be obtained by taking the derivative of the above with respect to a_i. This leads to:
Linear System Solution
XOR Example
Nonsingular K and exact solution:
Linear System Solution
KFD Solution
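The XOR linear system K a = d can be solved exactly; a sketch assuming the second-order polynomial kernel (1 + x·y)², under which K is nonsingular (the slide's exact kernel choice is an assumption here):

```python
def solve(A, b):
    """Plain Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

# XOR: first two points positive, last two negative.
X = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
d = [1, 1, -1, -1]
K = [[(1 + a[0]*b[0] + a[1]*b[1]) ** 2 for b in X] for a in X]

a = solve(K, d)
print([round(v, 4) for v in a])  # [0.125, 0.125, -0.125, -0.125]
```

Here K = 8I + 1·1ᵀ, so a = d/8 follows in closed form as well, confirming the elimination result.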
A Hybrid Classifier Combining KFD and SVM
(1) On one hand, the cost function in the optimization formulation follows that of KFD.
(2) On the other hand, a second regularization, based on the Wolfe optimization adopted by SVM, is incorporated.
Hybrid Fisher-SVM Classifier
Hybrid Fisher-SVM Classifier
where zi = di - η, and
This is exactly the same as the optimal PFD solution. Recall that
Constraint-Relaxed Solution
Hybrid Fisher-SVM Classifier
Class-Balanced Wolfe Optimization with Perturbation-Adjusted Kernel Matrix
The approach is based on class-balanced Wolfe optimization with perturbation-adjusted kernel matrix.
1. Just like KFD, the class-balanced constraints will be adopted.
2. Under the perturbation analysis, the kernel matrix should be replaced by its perturbation-adjusted, more realistic representation:
In summary, by imposing a stochastic model on the training data, we derive a hybrid Fisher-SVM classifier which contains Fisher solution and SVM solution as special cases. On one hand, it gives the same solution as the optimal Fisher when the constraints are lifted. On the other hand, it approaches the (class-balanced) SVM when the perturbation is small.
Does such a shift improve the predictability of the supervised classifiers?
K → K + s I
Kernel Shift
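A sketch of why the shift works: K → K + sI raises every eigenvalue by s, so a sufficiently large s renders an NPD similarity matrix positive definite. Checked on a toy 2×2 matrix with closed-form eigenvalues (the matrix is illustrative, not from the dataset):

```python
import math

def eigvals_2x2(M):
    # Eigenvalues of a symmetric 2x2 matrix from trace and determinant.
    a, b, c, d = M[0][0], M[0][1], M[1][0], M[1][1]
    tr, det = a + d, a * d - b * c
    disc = math.sqrt(tr * tr / 4 - det)
    return tr / 2 - disc, tr / 2 + disc

K = [[0.0, 2.0], [2.0, 0.0]]  # NPD: eigenvalues -2 and 2
s = 3.0
K_shift = [[K[i][j] + (s if i == j else 0.0) for j in range(2)]
           for i in range(2)]
print(eigvals_2x2(K_shift))  # (1.0, 5.0): all positive after the shift
```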
Subcellular Dataset: hy50
-------------------------------------------
Kernel  PNE     min_evalue    Accuracy (w/o shift --> w/ shift)
-------------------------------------------
K1      8.37%   -27.79        41.71% --> 52.66%
K2      20.10%  -23.27        49.47% --> 54.87%
K3      0%      4.0922e+004   54.73%
K4      10.16%  -5.9002e+005  54.82% --> 51.96%
K5      0%      1.6442e-010   75.08%
PNE: percentage of negative eigenvalues in the kernel before applying the diagonal shift.
1. Overview
2. Kernel Based Representation Spaces
3. Principal Component Analysis (Kernel PCA)
4. Supervised Classifiers
5. Clustering Methods for PD Kernels
• 5.1 K-means and Convergence
• 5.2 Kernel K-means
• 5.3 Spectral Space: Dimension-Reduction
• 5.4 Kernel NJ: Hierarchical Clustering
• 5.5 Fast Kernel Trick
6. Extension to Nonvectorial Data and NPD Kernels
Topics:
Unsupervised Clustering Analysis
Three major types of unsupervised clustering approaches (e.g., for gene expression data):
– K-means (EM)
– SOM
– HC (hierarchical clustering)
5.1 Conventional K-means and Convergence
References: [9][17][18]
Clustering Criteria:
• Intra-Cluster Compactness
• Inter-Cluster Divergence
Exclusive Membership in K-means
(Figure: sample x_t is exclusively assigned to one of the centroids μ1, μ2, μ3.)
The best parameters can be trained by maximizing the log-likelihood function:
Likelihood-Based Criteria
Let the K classes be denoted by
and parameterized by
Under the i.i.d. (independent and identically distributed) assumption,
Substituting Eq. ** into Eq.*, we obtain
**
*
exclusive membership
i.i.d.
K-means
The objective function to minimize is
K-means Flowchart
(Flowchart: Initiation → Winner Identification → Centroid Update → Convergence? If No, repeat; if Yes, Stop.)
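The flowchart above can be sketched directly in code; a minimal pure-Python K-means (the data and initial centroids are illustrative):

```python
def kmeans(points, centroids, max_iters=100):
    for _ in range(max_iters):
        # Winner identification: assign each point to its nearest centroid.
        labels = [min(range(len(centroids)),
                      key=lambda k: sum((p - c) ** 2
                                        for p, c in zip(x, centroids[k])))
                  for x in points]
        # Centroid update: move each centroid to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [x for x, l in zip(points, labels) if l == k]
            if members:
                new_centroids.append(tuple(sum(col) / len(members)
                                           for col in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # convergence check
            break
        centroids = new_centroids
    return labels, centroids

pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels, cents = kmeans(pts, [(0, 0), (10, 10)])
print(labels)  # [0, 0, 1, 1]
```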
Monotonic Convergence: Convergence is assured if (and only if) the pairwise matrix is positive definite. Thus, convergence is assured for all vector clustering problems, since the most commonly adopted kernel functions are positive definite.
Sufficiency & Necessity
M-Step: update the new centroids (two arrows)
(Figure: centroids μ1, μ2 being updated.)
$E(X) = \sum_{x_t \in X} \| x_t - \mu_{r(t)} \|^2$
…so as to reduce E(X):
Sufficiency
E-Step: winner identification, i.e., new memberships for some of the data (the 3rd and 5th samples will have a new owner/winner).
$E(X) = \sum_{x_t \in X} \| x_t - \mu_{r(t)} \|^2$
Does the new ownership reduce E(X)?
Sufficiency
E-Step: new memberships (color coded) are assigned.
$\sum_{x_t \in X} \| x_t - \mu_{r'(t)} \|^2 \le \sum_{x_t \in X} \| x_t - \mu_{r(t)} \|^2$
Convergence: Sufficiency
Reduction of Energy
Since each sample is reassigned to its nearest centroid, the distance term for every $x_t$ can only shrink, hence
$\sum_{x_t \in X} \| x_t - \mu_{r'(t)} \|^2 \le \sum_{x_t \in X} \| x_t - \mu_{r(t)} \|^2$
The new memberships help reduce E(X)!
Sufficiency
(Figure: centroids μ1, μ2 after the update.)
M-Step: Minimization
Updating each centroid to the mean of its new members can only reduce the energy, since the class mean minimizes the within-cluster sum of squared distances:
$\sum_{x_t \in X} \| x_t - \mu''_{r'(t)} \|^2 \le \sum_{x_t \in X} \| x_t - \mu'_{r'(t)} \|^2$
Reduction of Energy
Necessity: Proof by counter-example.
(Figure: vectors [1 0]^T, [0 0]^T, [0 1]^T with points A, B, C and an updated point C'.)
If the kernel matrix is NPD, then a reduction of energy after the centroid update is not guaranteed!
5.2 Kernel K-Means
In conventional K-means, the objective function is
Kernel K-means aims to minimize:
Kernel K-Means
Criterion A: centroid-based objective function
K-means has very limited capability to separate clusters that are non-linearly separable in input space.
Algorithm: Kernel K-means
(i) Winner Identification: Given any training vector x, compute its distances to the K centroids. Identify the winning cluster as the one with the shortest distance
(ii) Membership Update: If the vector x is added to a cluster Ck, then
update the centroid of Ck
(whether explicitly or not).
For PD Kernel, centroids are defined.
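Winner identification can be carried out without explicit centroids via the kernel trick; a sketch of the implicit distance formula (the linear kernel and data below are illustrative):

```python
# ||phi(x) - m_k||^2 = K(x,x) - (2/|C_k|) sum_j K(x, x_j)
#                      + (1/|C_k|^2) sum_{i,j} K(x_i, x_j)
def kernel_distance2(K, x_idx, cluster):
    """Squared feature-space distance from sample x_idx to the centroid of
    `cluster` (a list of sample indices), given the kernel matrix K."""
    n = len(cluster)
    cross = 2.0 / n * sum(K[x_idx][j] for j in cluster)
    within = sum(K[i][j] for i in cluster for j in cluster) / (n * n)
    return K[x_idx][x_idx] - cross + within

# Linear kernel on 1-D points 0, 1, 10: the distance from point 0 to the
# centroid of cluster {1, 10} should equal (0 - 5.5)^2.
pts = [0.0, 1.0, 10.0]
K = [[a * b for b in pts] for a in pts]
print(kernel_distance2(K, 0, [1, 2]))  # 30.25
```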
Spectral K-Means
Kernel trick: Implicit formula
• Kernel PCA provides a suitable Hilbert space which can adequately manifest the similarity of the objects, vectorial or non-vectorial.
• circumvents the explicit derivation of the centroids
• obviates the need to perform eigenvalue decomposition
Three (Equivalent) Objective Functions
Equivalence under the Mercer (PD) condition: assuming the pairwise (similarity) matrix K is PD, the following clustering criteria are equivalent.
[Page 138]
Kernel K-Means
A: centroid-based objective function
B: centroid-free but norm-based objective function [Duda73]
C: centroid-free and norm-free objective function
[Page 140]
Compare A and B
Expanding the centroid-based objective A shows that it equals the centroid-free (but norm-based) objective B.
[Page 141]
Compare B and C
The norm-based objective B can in turn be rewritten as C, which is centroid-free and norm-free!
[Page 142]
5.3 Spectral K-Means
• Kernel spectral analysis is an effective approach.
[Page 143]
Kernel K-Means and Spectral Clustering
In [7]: “… kernel k-means and spectral clustering … are related. Specifically, we can rewrite the weighted kernel k-means objective function as a trace maximization problem whose relaxation can be solved with eigenvectors. The result shows how a particular kernel and weight scheme is connected to the spectral algorithm of Ng, Jordan, and Weiss [27]. … By choosing the weights in particular ways, the weighted kernel k-means objective function is identical to the normalized cut.”
[Page 144]
Spectral Approximation
In the full dimension, the full column set of E is used; in the reduced dimension, only the leading m columns are retained.
The spectral “approximate” kernel matrix is the rank-m truncation K̃ = E_m Λ_m E_mᵀ of the eigen-decomposition K = E Λ Eᵀ.
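As a sketch of this truncation (our variable names; a small synthetic Gram matrix stands in for K), the spectral norm of the discarded “error” kernel is exactly the largest truncated eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((8, 3))
K = X @ X.T                        # a small PSD Gram matrix (rank 3)

lam, E = np.linalg.eigh(K)         # eigh returns ascending eigenvalues
lam, E = lam[::-1], E[:, ::-1]     # reorder to descending

m = 2
K_m = E[:, :m] @ np.diag(lam[:m]) @ E[:, :m].T   # rank-m "approximate" kernel
dK = K - K_m                                     # spectral "error" kernel

# dK consists of the truncated eigen-components, so its spectral norm
# equals the largest discarded eigenvalue, lam[m].
err = np.linalg.norm(dK, 2)
```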
[Page 145]
Spectral Error Analysis
The spectral “error” kernel matrix is defined as the residual ΔK = K − K̃, i.e. the discarded eigen-components, which enter the centroid-free and norm-free objective function as a perturbation.
[Page 146]
Spectral Error Analysis
Since the first summation is independent of the clustering, it may be dropped from the minimization; only the perturbation terms affect the comparison of clusterings.
[Page 147]
Perturbation Analysis
Each per-cluster perturbation term (weighted by 1/N_k) is upper bounded by λ_{m+1}; summed over the K clusters, the total deviation of the objective is bounded by K λ_{m+1}.
[Page 148]
Approximation in Empirical Eigen-Space
If the kernel matrix is NPD, it is necessary to convert it to a PD matrix. One way to achieve this is to adopt a PD kernel matrix in the empirical space.
In the full dimension, the full-size columns of the empirical data matrix are used as the training vectors; the dimension-reduced vectors are formed from its leading columns.
[Page 149]
Perturbation Analysis
By the same perturbation analysis, and noting in addition that the perturbation is upper bounded by the truncated eigenvalue of largest magnitude, the deviation is bounded by K · max{λ_{m+1}, |λ_N|}.
Here K denotes the number of clusters (instead of the kernel matrix).
[Page 150]
Comparison with Spectral Clustering
The popular spectral clustering for graph partitioning [27,34] resorts to a uniform (as opposed to weighted) truncation strategy, where the number of 1's is usually set equal to or greater than K.
In spectral clustering, the training patterns are now represented by the columns of E″. Unfortunately, a KCA-type error analysis is difficult to derive for spectral clustering.
[Page 151]
Comparison Table
[Page 152]
The perturbation analysis has several practical implications:
• As long as Kλ_{m+1} is within the tolerance level, the m-dimensional spectral K-means will suffice to yield a satisfactory clustering result.
• If the global optimum holds a margin over any local optimum by an amount greater than 2Kλ_{m+1} (the worst possible situation), then the full-dimensional and m-dimensional spectral K-means will have the same optimal solution.
• When an NPD kernel matrix is encountered, it is common practice to discard the components corresponding to negative spectral values. In this case, the same error analysis can be applied to assess the level of deviation.
[Page 153]
Kernel Trick for Unsupervised Learning
Kernel K-means (SOM/NJ) without explicitly computing centroids.
Many applications may be solved without any explicit representation space: the kernel trick needs no space at all.
[Page 154]
The kernel trick is effective for:
• the kernel Neighbor-Joining (NJ) algorithm;
• kernel K-means for highly sparse data.
[Page 155]
5.4 Hierarchical NJ Algorithm
[Page 156]
Hierarchical Clustering
Hierarchical clustering analysis plays an important role in biological taxonomic studies, as a phylogenetic tree or evolutionary tree is often adopted to show the evolutionary lineages among various biological entities.
[Page 157]
Phylogenetic Tree of SARS Viral Genes
The phylogenetic tree can help reveal the epidemic links. The severe acute respiratory syndrome (SARS) virus started its outbreak during the winter of 2002. All of the SARS viruses can be linked via a phylogenetic tree, obtained from genomic data by measuring the DNA similarities between various SARS viruses.
[Page 158]
Kernel-NJ
Five vectorial or nonvectorial objects (A, B, C, D, E) have the following (positive-definite) kernel matrix:
[Page 159]
Neighbor-Joining (NJ) Algorithm
The algorithm starts with N nodes, i.e. one object per node.
Step 1: Compute the N(N−1)/2 pairwise similarity/distance measures for the N objects. These values correspond to the off-diagonal entries of a symmetric N×N similarity matrix.
Step 2: Merge the two most similar nodes: insert the merged node and remove the two old nodes.
Step 3: After N−1 mergers, all the objects are merged into one common ancestor, i.e. the root of all objects.
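The three steps can be sketched as below, in the centroid-based (kernel-NJ) flavor described on the following slides; the function name and the toy similarity values are ours:

```python
import numpy as np

def kernel_nj(S, names):
    """Hierarchical merging driven by a (PD) kernel/similarity matrix S.

    At each step the pair with the smallest centroid distance
        d^2(i, j) = S[i,i] + S[j,j] - 2 S[i,j]
    is merged; the merged node is the centroid, so the kernel trick gives
        S[f, x] = (S[i, x] + S[j, x]) / 2,
        S[f, f] = (S[i, i] + 2 S[i, j] + S[j, j]) / 4.
    Returns the merge history as nested tuples ending at the root.
    """
    S = S.astype(float).copy()
    nodes = list(names)
    while len(nodes) > 1:
        n = len(nodes)
        # Find the closest pair under the centroid distance.
        best, bi, bj = np.inf, 0, 1
        for i in range(n):
            for j in range(i + 1, n):
                d2 = S[i, i] + S[j, j] - 2 * S[i, j]
                if d2 < best:
                    best, bi, bj = d2, i, j
        # Kernel-trick update: append merged node f, then drop i and j.
        f_row = (S[bi] + S[bj]) / 2
        f_auto = (S[bi, bi] + 2 * S[bi, bj] + S[bj, bj]) / 4
        S = np.vstack([S, f_row])
        S = np.hstack([S, np.append(f_row, f_auto)[:, None]])
        nodes.append((nodes[bi], nodes[bj]))
        keep = [k for k in range(n + 1) if k not in (bi, bj)]
        S = S[np.ix_(keep, keep)]
        nodes = [nodes[k] for k in keep]
    return nodes[0]
```

For three objects where A and B are the most similar pair, the algorithm merges (A, B) first and then joins C at the root.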
[Page 160]
The conventional NJ score update is approximation-based:
|CD| ≠ (|AD| + |BD|)/2.
An exact method based on centroids is natural, since distance is defined in a Hilbert space endowed with an inner product:
C = (A + B)/2.
[Page 161]
Kernel Trick
The winner can be identified as the cluster, say Ck, that yields the shortest distance. The computation can be carried out exclusively based on the kernel entries Kij.
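The reason the computation needs only the entries $K_{ij}$ is the standard expansion of the point-to-centroid distance in the induced space (with $\boldsymbol{\mu}_k$ the centroid of cluster $C_k$ of size $N_k$):

```latex
\left\| \phi(\mathbf{x}) - \boldsymbol{\mu}_k \right\|^2
  = K(\mathbf{x}, \mathbf{x})
  - \frac{2}{N_k} \sum_{j \in C_k} K(\mathbf{x}, \mathbf{x}_j)
  + \frac{1}{N_k^2} \sum_{i,\, j \in C_k} K(\mathbf{x}_i, \mathbf{x}_j)
```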
[Page 162]
Kernel Trick
[Page 163]
Kernel Trick for the NJ Algorithm
(i) Winner identification: given any training vector x, compute its distances to the K centroids; the winning cluster is the one with the shortest distance.
(ii) Membership update: if a node is added to a cluster Ck, merge that node into Ck to form C′k.
[Page 164]
Kernel-NJ Algorithm
Because s_DE = 0.8 is the highest pairwise similarity, (D, E) first merge into a new node F. This leads to a reduced (4×4) similarity matrix, with squared distances given by d²_XF = s_XX + s_FF − 2 s_XF, e.g.
d²_AF = 1 + 0.9 − 2 × 0.3 = 1.3,
d²_CF = 1 + 0.9 − 2 × 0.35 = 1.2.
[Page 165]
Kernel-NJ Algorithm
The successive merger centroids are denoted by F, G, and H.
[Page 166]
5.5 Fast Kernel Trick for Kernel K-Means
[Page 167]
Winner Identification
[Page 168]
Similarity Table for the Fast Kernel Trick
(N objects × K clusters)
[Page 169]
In the actual coding of the fast kernel trick, we must take into account that whenever a cluster wins a new member, another cluster loses an old member. As a result, two clusters (i.e. two columns of the N×K similarity table: the winner's column and the loser's column) have to be updated. For simplicity of illustration, we focus only on the winning cluster.
[Page 170]
Update of Centroids' Auto-Correlations:
Update of Similarity Table:
Slow Update
Fast Update
[Page 171]
Fast Kernel Trick
• Winner identification
• Winner update: winner column update; winner's auto-correlation
• Loser update: loser column update; loser's auto-correlation
(The per-move update cost is negligible.)
[Page 172]
Winner Column Update:
Loser's Auto-Correlation:
Winner's Auto-Correlation:
Loser Column Update:
Fast Kernel Trick
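A runnable sketch of these O(N)-per-move updates (our names throughout; the table T holds each object's summed similarity to every cluster, and A holds the centroids' unnormalized auto-correlations):

```python
import numpy as np

def build_tables(K, labels, n_clusters):
    """Build the similarity table T and auto-correlations A from scratch."""
    N = K.shape[0]
    T = np.zeros((N, n_clusters))
    A = np.zeros(n_clusters)
    sizes = np.zeros(n_clusters, dtype=int)
    for c in range(n_clusters):
        m = np.flatnonzero(labels == c)
        T[:, c] = K[:, m].sum(axis=1)   # T[i, c] = sum_{j in C_c} K[i, j]
        A[c] = K[np.ix_(m, m)].sum()    # A[c]    = sum_{j, l in C_c} K[j, l]
        sizes[c] = m.size
    return T, A, sizes

def move_point(K, T, A, sizes, labels, x, winner):
    """O(N) update when object x leaves its current cluster (the loser) for the winner."""
    loser = labels[x]
    if loser == winner:
        return
    # Loser update (T[x, loser] still counts x itself, hence the +K[x, x]).
    A[loser] += -2.0 * T[x, loser] + K[x, x]
    T[:, loser] -= K[:, x]
    sizes[loser] -= 1
    # Winner update (T[x, winner] does not yet count x).
    A[winner] += 2.0 * T[x, winner] + K[x, x]
    T[:, winner] += K[:, x]
    sizes[winner] += 1
    labels[x] = winner
```

The winner-identification distance is then K[i, i] − 2·T[i, c]/sizes[c] + A[c]/sizes[c]², with no pass over cluster members.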
[Page 173]
When to Use What? Comparison of Computational Cost
Note that K-means via kernel component analysis incurs a (one-time) cost of O(N³) to O(N⁴) for the factorization, i.e. a total of O(N³) + Nr Ne O(KMN).
In contrast, the kernel trick obviates the need to perform spectral decomposition.
[Page 174]
When to Use the (Fast) Kernel Trick?
As stated in [7]:
"Thus far, only eigenvector-based algorithms have been employed to minimize normalized cuts in spectral clustering and image segmentation. However, software to compute eigenvectors of large sparse matrices (often based on the Lanczos algorithm) can have substantial computational overheads, especially when a large number of eigenvectors are to be computed. In such situations, our equivalence has an important implication: we can use k-means-like iterative algorithms for directly minimizing the normalized cut of a graph."
For a sparse kernel matrix, the (fast) kernel trick is relatively more attractive.
[Page 175]
When Not to Use the Fast Kernel Trick?
The fast kernel trick has less advantage:
1. When the original vector dimension is small, or when the dimension can be substantially reduced with good approximation. (This is because the processing time of the kernel trick is independent of the vector dimension.)
2. For kernel NJ: NJ is itself a finite algorithm, so the plain kernel trick is already fast enough and there is no need for the fast kernel trick or spectral dimension reduction.
[Page 176]
Multiple Trials
MATLAB: kmeans(A,B,C,..)
The kernel trick is less effective if the required number of K-means trials, Nr, is large.
For a regular, non-sparse kernel matrix, spectral K-means is relatively more attractive.
[Page 177]
Topics:
1. Overview
2. Kernel-Based Representation Spaces
3. Principal Component Analysis (Kernel PCA)
4. Supervised Classifiers
5. Fast Clustering Methods for Positive-Definite Kernels
6. Extension to Nonvectorial Data and NPD (Non-Positive-Definite) Kernels
   6.1 The Shift Approach: (a) shift-invariance; (b) concerns about the shift approach
   6.2 The Truncation Approach
   6.3 Empirical Vector Space
[Page 178]
Extension from Vectorial to Nonvectorial Data
A vital assumption: a positive-definite S.
For vectorial data the centroid is defined, e.g. c = (A + B)/2; for nonvectorial data the centroid is undefined unless distance is defined in a Hilbert space endowed with an inner product.
[Page 179]
Application of K-Means to Graph Partitioning
By adjusting kernels and weights, the clustering algorithm can be applied to a broad spectrum of graph-partitioning applications. Some examples are:
• clustering of gene expression profiles;
• network analysis of protein–protein interactions;
• interconnection networks;
• parallel processing: algorithm partitioning;
• document data mining: word-by-document matrix;
• market basket data: transaction-by-item matrix.
[Page 180]
For all vector clustering problems: the problem is well defined and its convergence is assured if positive-definite (PD) kernel functions are adopted.
For nonvectorial data clustering problems: the problem definition may require some attention, and the convergence of K-means can be assured only after the similarity matrix is converted to a PD matrix.
[Page 181]
The ESSENCE of having a PD Kernel
[Page 182]
‘Forcing’ a PD Kernel: Three Approaches
• the shift approach;
• the truncation approach;
• the squared kernel, K².
[Page 183]
6.1 The Shift Approach
If K is NPD, it must be converted to a PD matrix.
Kernel shift: K → K + sI.
[Page 184]
6.1a Shift-Invariance
Does such a shift change the nature of the problem?
• Kernel NJ is not shift-invariant.
• Kernel K-means is shift-invariant.
• Kernel SOM can be shift-invariant.
[Page 185]
Kernel NJ is Not Shift-Invariant
After applying the shift, the squared distance is adjusted by s + s/(p+1), where p denotes the cluster size. This penalizes smaller clusters and favors larger ones: for kernel-NJ, a positive shift tends to favor identification of a larger cluster.
[Page 186]
Kernel-NJ Example
This leads to a reduced (4×4) similarity matrix. Without the shift:
d²_CF = 1 + 0.9 − 2 × 0.35 = 1.2,
d²_AB = 1 + 1 − 2 × 0.6 = 0.8,
so d²_AB < d²_CF. Now apply the shift K → K + sI, so that s′_AA = s′_BB = s′_CC = 1 + s and s′_FF = 0.9 + s/2. Then
d′²_CF = (1 + s) + (0.9 + s/2) − 2 × 0.35 = 1.2 + 3s/2,
d′²_AB = (1 + s) + (1 + s) − 2 × 0.6 = 0.8 + 2s.
With s = 2: d′²_CF = 1.2 + 3 = 4.2 while d′²_AB = 0.8 + 4 = 4.8, so now d′²_AB > d′²_CF: the shift reverses the merge order.
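The numbers in this example can be checked directly (values read off the slide; F is the merged (D, E) node):

```python
s = 2.0                                  # the kernel shift K -> K + s*I
s_AA = s_BB = s_CC = 1.0
s_AB, s_CF, s_FF = 0.6, 0.35, 0.9        # F = centroid of the merged pair (D, E)

# Without the shift, (A, B) is the closer pair.
d2_AB = s_AA + s_BB - 2 * s_AB           # 0.8
d2_CF = s_CC + s_FF - 2 * s_CF           # 1.2

# With the shift, singleton auto-correlations gain s but the
# two-point centroid F gains only s/2, so the comparison flips.
d2_AB_shift = (s_AA + s) + (s_BB + s) - 2 * s_AB       # 0.8 + 2s   = 4.8
d2_CF_shift = (s_CC + s) + (s_FF + s / 2) - 2 * s_CF   # 1.2 + 1.5s = 4.2
```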
[Page 187]
Kernel K-Means is Shift-Invariant
Equivalence under the Mercer (PD) condition:
• centroid-based objective function;
• centroid-free objective function;
• centroid-free and norm-free objective function.
[Page 188]
Example: centroid-free and norm-free objective function
[Page 189]
Kernel K-Means Admits a Centroid-Free Criterion
Equivalence under the Mercer (PD) condition: the centroid-based objective function equals the centroid-free objective function.
With an NPD kernel, the induced “norm” is only a pseudo-norm.
[Page 190]
Proof of Shift-Invariance
Under the shift K → K + sI, each term of the objective changes only by a clustering-independent constant: the trace term gains Ns and the K centroid terms gain a total of Ks, so the objective shifts by the constant (N − K)s and the optimal clustering is unchanged.
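The invariance can be verified exhaustively on a toy problem: under K → K + sI, the objective of every (non-degenerate) 2-cluster partition changes by the same constant (N − K)s, leaving the ranking untouched (our names throughout):

```python
from itertools import product
import numpy as np

def kkm_objective(K, labels, n_clusters):
    """Kernel K-means objective in trace form: sum_i K_ii - sum_k (1/N_k) 1^T K_k 1."""
    obj = float(np.trace(K))
    for c in range(n_clusters):
        m = np.flatnonzero(labels == c)
        if m.size:
            obj -= K[np.ix_(m, m)].sum() / m.size
    return obj

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 2))
K = X @ X.T
s = 3.0
Ks = K + s * np.eye(5)

# Objective change over all partitions with two non-empty clusters.
diffs = {round(kkm_objective(Ks, np.array(lab), 2) - kkm_objective(K, np.array(lab), 2), 9)
         for lab in product([0, 1], repeat=5) if 0 < sum(lab) < 5}
```

Here diffs collapses to the single value (N − K)s = (5 − 2)·3 = 9, so the argmin is unchanged.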
[Page 191]
Kernel K-Means is Shift-Invariant
Conversion of K into a PD matrix.
[Page 193]
6.1b Concerns about the Shift
Is the formulation rational? Test case: the tale of the twins.
[Page 194]
Necessity: proof by counter-example.
Consider the points A = [1, 0]ᵀ, B = [0, 0]ᵀ, and the “twins” C and C′ near [0, 1]ᵀ.
[Page 195]
Example for the Pseudo-Norm in Type B
The same points A = [1, 0]ᵀ, B = [0, 0]ᵀ, and the twins C and C′ near [0, 1]ᵀ, now represented by their empirical vectors.
[Page 196]
The concern is that, after the shift, the twins are separated far from each other: the embedded images e(C) and e(C′) of the (nearly identical) twins become distant, even though the original points nearly coincide.
[Page 197]
6.2 The Truncation Approach
A popular and simple means to make an NPD matrix PD is truncation of the negative spectral components [1,30]: keep the eigenvalues λ₁ ≥ … ≥ λ_m ≥ 0 > λ_{m+1}. The same perturbation analysis applies.
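A minimal sketch of the truncation recipe (the matrix below is our own toy example; any symmetric similarity matrix with a negative eigenvalue will do):

```python
import numpy as np

# Truncation: clip the negative eigenvalues of an NPD similarity matrix.
S = np.array([[1.0, 0.9, 0.2],
              [0.9, 1.0, 0.2],
              [0.2, 0.2, -0.5]])        # symmetric, but not PD

lam, E = np.linalg.eigh(S)
lam_clipped = np.clip(lam, 0.0, None)   # discard negative spectral components
S_pd = E @ np.diag(lam_clipped) @ E.T   # PSD by construction

# The deviation equals the magnitude of the largest discarded eigenvalue,
# which is what the perturbation analysis bounds.
deviation = np.linalg.norm(S - S_pd, 2)
```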
[Page 198]
6.3 Empirical Vector Space
The K² approach [36]:
• supervised learning;
• unsupervised learning: empirical K-means.
[Page 199]
Empirical K-Means: Empirical Vectors
Revisiting the twins example (A = [1, 0]ᵀ, B = [0, 0]ᵀ, twins C and C′ near [0, 1]ᵀ): the proximity of the twins is preserved in the empirical space.
[Page 200]
[Page 201]
Empirical Eigen-Space vs. Spectral Space
The empirical eigen-space:
• is a Hilbert space endowed with the conventional (linear) dot-product;
• has a finite dimension, N, with non-orthogonal bases;
• is dependent on the training dataset;
• has positive eigenvalues for all kinds of data (vectorial or nonvectorial);
• has a stronger denoising capability than the spectral space (its energy is more concentrated in the leading components).
[Page 202]
Approximation in Empirical Eigen-Space

If the kernel matrix is NPD, it is necessary to convert it to a PD matrix. One way to achieve this is to adopt a PD kernel matrix in the empirical space. In the full dimension, the full-size columns of [matrix not shown in transcript] are used as the training vectors; the dimension-reduced vectors are formed from the columns of [matrix not shown in transcript].
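Since the specific matrices are not visible in the transcript, here is one plausible sketch: decompose the PD kernel K = U Λ Uᵀ, use the columns of Λ^(1/2) Uᵀ as full-size training vectors, and keep only the m leading rows for the dimension-reduced vectors (all names and the toy matrix are assumptions):

```python
import numpy as np

# A PD kernel matrix in the empirical space (here formed as S @ S)
S = np.array([[1.0, 0.9, 0.2],
              [0.9, 1.0, 0.9],
              [0.2, 0.9, 1.0]])
K = S @ S

# Eigen-decompose: K = U diag(lam) U^T, sorted descending
lam, U = np.linalg.eigh(K)
order = lam.argsort()[::-1]
lam, U = lam[order], U[:, order]

# Full-dimension training vectors: columns of diag(sqrt(lam)) @ U^T
X_full = np.sqrt(lam)[:, None] * U.T
assert np.allclose(X_full.T @ X_full, K)

# Dimension-reduced vectors: keep only the m leading components
m = 2
X_red = X_full[:m]

# The reduced vectors reproduce K up to the discarded eigenvalues
K_hat = X_red.T @ X_red
assert np.linalg.norm(K - K_hat) <= lam[m:].sum() + 1e-9
```

Because the eigenvalues of K are nonnegative by construction, the square roots are always real, and the approximation error is controlled by the discarded (trailing) eigenvalues.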
[Page 203]
Perturbation Analysis

By using the same perturbation analysis, and noting in addition that the perturbation is upper bounded by the largest (in magnitude) of the truncated eigenvalue terms:

max { |λk| : m+1 ≤ k ≤ N }

Here K denotes the number of clusters (instead of the kernel matrix).
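As a numerical check of the truncation bound (assuming the perturbation is measured in the spectral norm, where it equals the largest-magnitude discarded eigenvalue; the toy matrix is an assumption):

```python
import numpy as np

# NPD similarity matrix (one negative eigenvalue)
S = np.array([[1.0, 0.9, 0.2],
              [0.9, 1.0, 0.9],
              [0.2, 0.9, 1.0]])
lam, U = np.linalg.eigh(S)

# Truncate the negative spectral components
K = (U * np.maximum(lam, 0.0)) @ U.T

# The spectral-norm perturbation equals the largest-magnitude
# discarded (negative) eigenvalue
err = np.linalg.norm(S - K, 2)
bound = np.abs(lam[lam < 0]).max()
assert np.isclose(err, bound)
```

This follows because S − K = U diag(min(λk, 0)) Uᵀ, whose singular values are exactly the magnitudes of the discarded eigenvalues.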
[Page 204]
Future Research: Kernel approach to the integration of heterogeneous data [33].
[Page 205]
References
For background reading on linear algebra:

G. Strang, "Introduction to Linear Algebra", Wellesley-Cambridge Press, 2003.

G. Golub and C. Van Loan, "Matrix Computations", Johns Hopkins University Press, 1989.