Characteristics of an Analysis Method
Elia Liitiainen (elia.liitiainen@hut.fi)
T-61.6050
Helsinki University of Technology, Finland
September 23, 2007
Introduction
Basic concepts are presented.
The PCA method is analyzed.
Motivation for more sophisticated methods is given.
2 / 26
Outline
1 Expected Functionalities
2 Basic Characteristics of DR algorithms
3 PCA
4 Categorization of DR methods
3 / 26
Basic Requirements
Estimation of the embedding dimension.
Dimensionality reduction.
Separation of latent variables.
4 / 26
Intrinsic Dimensionality
Let us assume that the data is in ℝ^D.
The basic assumption: the data can be embedded into ℝ^P with P < D.
5 / 26
Example: A Low Dimensional Manifold
6 / 26
Latent Variables vs. Dimensionality Reduction
When extracting latent variables, a model generating the data is assumed (e.g. ICA).
The embedding into a lower-dimensional space is done under this constraint.
Dimensionality reduction is easier: any low-dimensional representation is a solution.
Is DR therefore less interpretable?
7 / 26
Example: Dimensionality Reduction
8 / 26
Example: Recovery of Latent Variables
Dimensionality reduction under an independence constraint.
9 / 26
Fundamental Issues
Many dimensionality reduction algorithms assume that the data is generated by a model.
For example, in PCA it is assumed that a number of latent variables explain the data in a linear way.
For the same model, different algorithms are possible.
10 / 26
The Criterion
Dimensionality reduction is often done by optimizing a criterion.
One possible idea is to measure distance preservation.
One may try to preserve either the distances between points or, alternatively, the topology.
11 / 26
Projection as a criterion
Let P : ℝ^D → ℝ^P be a projection.
P⁻¹ denotes the reconstruction ℝ^P → ℝ^D.
A common criterion in dimensionality reduction is

E[‖y − P⁻¹P(x)‖²].
12 / 26
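As a minimal sketch (not from the slides): taking P(x) = Wᵀx for a matrix W with orthonormal columns, with reconstruction P⁻¹(z) = Wz, the criterion can be estimated from a sample as a mean squared reconstruction error.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))            # 200 sample points in R^3
X -= X.mean(axis=0)                      # mean centering

# orthonormal basis W of a 2-D subspace: P(x) = W^T x, P^{-1}(z) = W z
W = np.linalg.qr(rng.normal(size=(3, 2)))[0]

Z = X @ W                                # P(x) for every sample point
recon = Z @ W.T                          # P^{-1}(P(x)), back in R^3
crit = np.mean(np.sum((X - recon) ** 2, axis=1))  # empirical E[||x - P^{-1}P(x)||^2]
```

By Pythagoras, `crit` can never exceed the mean squared norm of the (centered) data, since the projection only removes energy orthogonal to the subspace.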
Derivation of PCA (1)
The basic model behind PCA is

y = Wx

with y a random variable in ℝ^D, x a latent variable in ℝ^P, and W a D × P matrix.
A sample Y_1, …, Y_N of observations is available (the latent X_i are not observed); mean centering is assumed.
Normalization/scaling is done according to prior knowledge.
13 / 26
Derivation of PCA (2)
Assume that W has orthonormal columns.
The projection criterion leads to

min_W E[‖y − WWᵀy‖²].

This corresponds to finding the subspace which allows the best possible reconstruction.
14 / 26
Derivation of PCA (3)
The optimization problem can be written equivalently as

max_W E[yᵀWWᵀy].

Let Y be the matrix of samples as column vectors.
Empirical approximation leads to

max_W tr[YᵀWWᵀY].
15 / 26
Derivation of PCA (4)
The singular value decomposition Y = VΣUᵀ yields

max_W tr[UΣVᵀWWᵀVΣUᵀ].

The solution is to take the columns of V corresponding to the largest singular values, which can be written as W = V I_{D×P}.
The reconstruction error depends only on the discarded singular values σ_{P+1}, …, σ_D.
16 / 26
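A sketch of this computation in NumPy (the data sizes are my own choice): it checks numerically that the squared reconstruction error of W = V I_{D×P} equals the sum of the squared discarded singular values.

```python
import numpy as np

rng = np.random.default_rng(1)
D, P, N = 4, 2, 200

# nearly 2-dimensional data in R^4, samples as columns (as on the slide)
Y = rng.normal(size=(D, P)) @ rng.normal(size=(P, N))
Y += 0.05 * rng.normal(size=(D, N))
Y -= Y.mean(axis=1, keepdims=True)        # mean centering

V, s, Ut = np.linalg.svd(Y, full_matrices=False)
W = V[:, :P]                              # W = V I_{DxP}: top-P left singular vectors

err = np.sum((Y - W @ W.T @ Y) ** 2)      # squared reconstruction error
print(np.isclose(err, np.sum(s[P:] ** 2)))  # → True: error = sigma_{P+1}^2 + ... + sigma_D^2
```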
Relation to the Covariance Matrix
Let C_y be the covariance matrix of the observations.
Finding the projection V is equivalent to finding the eigenvectors V_1, …, V_P of C_y corresponding to the largest eigenvalues.
The eigenvectors are the directions of maximal variance.
17 / 26
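The equivalence can be checked numerically (a sketch with synthetic data of my own): the eigenvalues of the sample covariance are the squared singular values of Y scaled by 1/N, and the leading eigenvector spans the same direction as the first left singular vector.

```python
import numpy as np

rng = np.random.default_rng(2)
Y = rng.normal(size=(3, 500))
Y -= Y.mean(axis=1, keepdims=True)
N = Y.shape[1]

C = Y @ Y.T / N                          # sample covariance of the observations
eigvals, eigvecs = np.linalg.eigh(C)     # eigh returns eigenvalues in ascending order
V, s, _ = np.linalg.svd(Y, full_matrices=False)

print(np.allclose(eigvals[::-1], s ** 2 / N))          # eigenvalues = squared singular values / N
print(np.isclose(abs(eigvecs[:, -1] @ V[:, 0]), 1.0))  # same leading direction (up to sign)
```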
Choosing the embedding dimensionality
A simple method is to plot the sorted eigenvalues.
After some point, the decrease is negligible.
This often fails; other choices include Akaike's information criterion and other complexity-penalization methods.
It is also possible to set a threshold: for example, require that 95% of the variance is preserved.
18 / 26
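The variance-threshold rule is easy to implement; a sketch (the helper name is my own):

```python
import numpy as np

def embedding_dim(eigvals, threshold=0.95):
    """Smallest P such that the top-P eigenvalues retain `threshold` of the total variance."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    ratio = np.cumsum(lam) / lam.sum()            # cumulative fraction of variance
    return int(np.searchsorted(ratio, threshold) + 1)

print(embedding_dim([4.0, 3.0, 2.0, 0.5, 0.5]))  # → 4 (the top 3 keep only 90%)
```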
Example: Determination of Intrinsic Dimensionality
Choose X ~ N(0, I) in ℝ²,

A = [0.1 0.2; 0.4 0.2; 0.3 0.3; 0.5 0.1; 0.1 0.4]

and Y = AX + ε with ε ~ N(0, 0.1 I).
19 / 26
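A sketch reproducing the example (the sample size is my own, and I read N(0, 0.1 I) as noise covariance 0.1 I): with enough samples, two eigenvalues stand clearly above the noise floor of about 0.1.

```python
import numpy as np

rng = np.random.default_rng(3)
A = np.array([[0.1, 0.2],
              [0.4, 0.2],
              [0.3, 0.3],
              [0.5, 0.1],
              [0.1, 0.4]])
N = 5000
X = rng.normal(size=(2, N))                         # X ~ N(0, I) in R^2
Y = A @ X + np.sqrt(0.1) * rng.normal(size=(5, N))  # noise with covariance 0.1 I

lam = np.sort(np.linalg.eigvalsh(np.cov(Y)))[::-1]
print(lam)  # two dominant eigenvalues; the last three sit near the noise level 0.1
```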
Example: Determination of Intrinsic Dimensionality (2)
Figure: Eigenvalues of the covariance matrix, sorted in decreasing order.
The first two contain most of the variance.
20 / 26
PCA for nonlinear data (1)
The model:

y = (4 cos(14x₁), 4 sin(14x₁), x₁ + x₂)ᵀ.
21 / 26
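A sketch of why PCA struggles here (uniform inputs on [0, 1] are my assumption): the data lies exactly on a 2-D surface, so an ideal nonlinear method could reach zero reconstruction error, but the best 2-D linear reconstruction is a plane and cannot follow the circular structure, leaving a nonzero residual.

```python
import numpy as np

rng = np.random.default_rng(4)
x1, x2 = rng.uniform(0, 1, (2, 2000))
Y = np.vstack([4 * np.cos(14 * x1),
               4 * np.sin(14 * x1),
               x1 + x2])                  # exactly 2-D manifold in R^3
Y -= Y.mean(axis=1, keepdims=True)

V, s, _ = np.linalg.svd(Y, full_matrices=False)
W = V[:, :2]                              # PCA onto 2 dimensions
err = np.mean(np.sum((Y - W @ W.T @ Y) ** 2, axis=0))
print(err > 0.01)  # → True: the plane cannot reconstruct the data exactly
```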
PCA for nonlinear data (2)
The reconstructed surface would be a plane.
22 / 26
DR vs. generative (latent variable) models
It is possible to model the data using latent variables and estimate the parameters.
In practice, it is simpler to directly learn a projection.
23 / 26
Local Dimensionality Reduction
A nonlinear manifold is locally approximately linear.
It is possible to derive a local PCA as a generalization to the nonlinear case.
24 / 26
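A rough sketch of the idea (the partitioning scheme and function name are my own): split the data into neighborhoods and fit a separate low-dimensional PCA in each. On a circle in ℝ², which is globally nonlinear, the piecewise-linear local fits leave a much smaller residual than any single global line.

```python
import numpy as np

def local_pca(Y, centers, P=1):
    """Fit a separate P-dimensional PCA in the region nearest to each center.

    Y: data as columns, shape (D, N); centers: shape (D, K).
    Returns cluster labels and a list of (mean, basis) pairs.
    """
    d2 = np.sum((Y[:, None, :] - centers[:, :, None]) ** 2, axis=0)  # (K, N)
    labels = np.argmin(d2, axis=0)
    models = []
    for k in range(centers.shape[1]):
        Yk = Y[:, labels == k]
        mu = Yk.mean(axis=1, keepdims=True)
        V = np.linalg.svd(Yk - mu, full_matrices=False)[0]
        models.append((mu, V[:, :P]))
    return labels, models

# a unit circle: globally nonlinear, locally close to 1-D linear
t = np.linspace(0, 2 * np.pi, 400, endpoint=False)
Y = np.vstack([np.cos(t), np.sin(t)])
labels, models = local_pca(Y, centers=Y[:, ::100], P=1)

errs = []
for k, (mu, W) in enumerate(models):
    Yk = Y[:, labels == k] - mu
    errs.append(np.mean(np.sum((Yk - W @ W.T @ Yk) ** 2, axis=0)))
err = float(np.mean(errs))
print(err < 0.05)  # → True: local linear pieces fit the circle well
```

A single global 1-D PCA on the same circle would leave a mean squared residual of about 0.5, so the local decomposition is far more accurate.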
Other Issues
Batch vs. online algorithms.
Local maxima vs. global optimization (PCA admits a global solution).
25 / 26
Conclusion
Many dimensionality reduction methods are based on the assumption that the data is approximately on a manifold.
PCA solves the linear case, but fails in nonlinear problems.
Thank you for your attention.
26 / 26